Math120 130 Text
Applied Mathematics
Fourth Edition
Alan Parks
Lawrence University
Appleton, Wisconsin
This book is the printed fourth edition of a text for the two-term
Applied Calculus course at Lawrence University. The original project
that grew into this text was begun during the summer of 2011.
© Alan Parks. αλασ Publishing, Appleton, Wisconsin, 2018. The principal
copy of this edition was printed on December 27, 2018.
Contents
Introduction v
Chapter 1. The Exponential Function 1
1. Exponentials and Logarithms 1
2. Exponential Models 7
Chapter 2. Recursion 15
1. Recursive Models 17
2. Investigating Recursive Models 22
Chapter 3. The Derivative 27
1. Discovering the Derivative 27
2. The Derivative at a Point 34
3. The Derivative of a Function 37
Chapter 4. Computing the Derivative 43
1. The Power Rule, Linearity 44
2. Products and Quotients 50
3. Exponentials and Logarithms 54
4. The Chain Rule 56
Chapter 5. Interpreting and Using the Derivative 63
1. Curve Sketching 65
2. Newton’s Method 72
3. The Chain Rule Revisited 75
4. Marginals 80
Chapter 6. Linear Optimization 85
1. Simple Examples 85
2. More Complicated Examples 88
3. Shadow Prices – Lagrange Multipliers 94
To be more specific about subject matter: during the first term (Applied
Calculus I), we will study recursive sequences, the derivative, and linear opti-
mization problems. The second term (Applied Calculus II) will involve inte-
gration, linear algebra, multivariate derivatives, and non-linear optimization
problems.
This text will be coordinated with class work and homework; above all,
this book is meant to be read carefully. Many students reading mathematics
seriously are surprised at how long it takes to wade through a short section
of text. Perhaps that’s because lines of reasoning that seemed clear when you
scanned them quickly dissolve when you try to reproduce or apply them. And
the ability to reproduce and apply what you have read is the principal test of
mathematical understanding.
Many problems are presented during the exposition; each is labeled with
the letter T, followed by the chapter number, a period, and the problem
number (for example, T1.4). There is a separate problems document with a great number
of exercises from which class examples and homework will be chosen.
Even though this book is intended to be self-contained, you are welcome
to consult other books if you find them helpful. Our bibliography (on p.237)
was taken, for the most part, from Lawrence University’s Seeley G. Mudd
Library. The books [1], [2], and [3], for example, are widely used textbooks.
Indeed, there is an enormous number of texts for courses such as this one,
and the student interested in additional problems or alternative approaches
will have no trouble finding help from libraries and online. You might be
helped occasionally by a textbook for the general calculus, such as [13].
We hope you will be drawn into applied mathematics in a way that tran-
scends the memorization of algorithmic calculation – we hope you will see
interesting corridors to other disciplines, and, more than anything else, that
you will gain confidence in recognizing applied problems and in addressing
them effectively. We have intended for the level of this material to increase
somewhat as the course moves along. We have observed that the reader’s
facility in reading mathematics increases correspondingly.
CHAPTER 1
Properties of $e^x$
(1) $e^a$ is a positive real number for every real number $a$
(2) $e^0 = 1$
(3) $e^{a+b} = e^a \cdot e^b$ for all real numbers $a, b$
(4) $(e^a)^b = e^{a \cdot b}$ for all real numbers $a, b$
The listed properties are a matter of algebra when a and b are rational
numbers – when a, b can be written as ratios of integers. We will use these
properties no matter what a, b are. Here is a consequence; make sure you see
which properties (1)-(4) are being used in each step.
$$e^a \cdot e^{-a} = e^{a-a} = e^0 = 1 \quad\text{so that}\quad e^{-a} = \frac{1}{e^a}$$
This shows that we can switch an exponential from the numerator to the
denominator by negating the exponent. This works for moving from the de-
nominator to the numerator as well:
$$\frac{1}{e^{-3}} = e^3 \quad\text{and}\quad \frac{1}{e^4} = e^{-4}$$
This identity also shows why $e^a$ can never be 0, for 0 has no reciprocal.
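As a quick sanity check (not a proof), properties (1)-(4) and the reciprocal identity can be spot-checked numerically with Python's `math.exp`. The sample exponents below are arbitrary choices, and the comparisons use `math.isclose` because floating-point results are only approximate.

```python
import math

# Arbitrary sample exponents (a, b) used only for illustration.
samples = [(-2.0, 0.5), (0.3, 1.7), (3.0, -4.0)]

for a, b in samples:
    assert math.exp(a) > 0                                            # property (1)
    assert math.isclose(math.exp(a + b), math.exp(a) * math.exp(b))   # property (3)
    assert math.isclose(math.exp(a) ** b, math.exp(a * b))            # property (4)
    assert math.isclose(math.exp(-a), 1 / math.exp(a))                # e^(-a) = 1/e^a

assert math.exp(0) == 1                                               # property (2)
print("all checks passed")
```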
There is an alternate notation used for the exponential function; we write
$e^x = \exp(x)$. This notation allows us to keep the exponent on the same level
as the function; that can make some expressions more readable. For instance,
compare
$$e^{(x^2 - 2\cdot x^3)/2} = \exp\big((x^2 - 2\cdot x^3)/2\big)$$
We see that the natural logarithm and the exponential function cancel each other
when one is applied right after the other. This means that they are function inverses of each
other. Function inverses are not algebraic inverses: $\ln \neq 1/\exp$; rather, the
equations (1.1) define the meaning of the inverse.
The logarithm has properties analogous to the properties of the exponential
function. We might have time to discuss one or two of these in class.
Properties of ln(x)
(1) ln(b) is defined for every positive number b
(2) ln(1) = 0
(3) ln(b · c) = ln(b) + ln(c) for all positive numbers b, c
(4) $\ln(b^a) = a \cdot \ln(b)$ for every real number $a$ and every positive number $b$
These properties are not as important as the properties of the exponential
function, but properties of ln(x) will come up in various problems during the
term.
Here is our first text problem. As we mentioned in the Introduction, text problems
are labeled with a capital T.
Solution. Suppose that a < b, and we need to show that ln(a) < ln(b). (Both
a, b are positive.) If not, then either ln(a) = ln(b) or ln(a) > ln(b).
If ln(a) = ln(b), then (1.2) shows that
a = exp(ln(a)) = exp(ln(b)) = b
contrary to the assumption that a < b. Similarly, if ln(b) < ln(a), then
remembering that the exponential function is increasing, we have
Solution. Since the x we want is in the exponent, we will use the logarithm
as in (1.1). In order to use those equations, we need to get the exponential
function all by itself.
$$3\cdot\exp(2\cdot x - 5) = 9$$
$$\exp(2\cdot x - 5) = 3 \qquad \text{now use (1.1)}$$
$$2\cdot x - 5 = \ln(3)$$
$$\ln(3) \approx 1.098612289$$
and so
$$x \approx \frac{1}{2}\cdot(1.098612289 + 5) = 3.049306144$$
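The same computation can be sketched in Python: solve for $x$ exactly using the natural logarithm, then substitute back into the original equation as a check. This is only an illustration of the algebra above, with `math.log` playing the role of $\ln$.

```python
import math

# From 2*x - 5 = ln(3) we get x = (ln(3) + 5) / 2.
x = (math.log(3) + 5) / 2
print(x)                         # approximately 3.049306144

# Check: substituting back should give (very nearly) 9.
print(3 * math.exp(2 * x - 5))
```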
Mathematicians are very careful about the difference between exact answers
and numerical answers. We insist on using the approximation sign x ≈ 3.049
whenever a numerical answer occurs. There is no general rule about the num-
ber of decimal places used for numerical answers; we will be fairly loose about
that.
Exact answers are almost always preferable to numerical answers. Some-
times we need to compare numbers to see which is larger, and for that purpose
numerical answers can be better, provided they are accurate enough. We will
do a fair amount of numerical work in the course, and so we will certainly get
our fill of approximations.
We mention an algebraic detail in the previous problem. The expression
2 · x − 5 contains both a product and a difference. By the precedence rules of
algebra, the product is done first; in other words
$$2\cdot x - 5 = [2\cdot x] - 5 \quad\text{NOT}\quad 2\cdot[x - 5]$$
The expression on the far right would be $2\cdot x - 10$. This applies to addition
and division, as well; for instance,
$$3\cdot x^2 + 4\cdot x + 1/x = [3\cdot x^2] + [4\cdot x] + \left[\frac{1}{x}\right]$$
If you are likely to make a mistake in this type of expression, use parentheses (or
brackets) to clarify. If you are unsure of the meaning of a string of operations
in any text problem or homework problem, please ask!
Multiplying both sides of the last inequality by −1, and remembering to switch
the inequality, we get
x > − ln(0.01) ≈ 4.6052
Here are two more abstract problems. First, we claimed that all exponential
functions can be written using the special exponential function $e^x$. Proving
the following identity furnishes a nice exercise in using the properties of $e^x$.
Problem T1.4. Let a > 0, and let b be an arbitrary number. Show that
Problem T1.5. Let a > 0 and let b be an arbitrary number. Show that
$$\log_a(b) = \frac{\ln(b)}{\ln(a)}$$
$$a^{\ln(b)/\ln(a)} = b$$
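Before proving these identities, it can help to see them hold numerically. The sketch below uses Python's `math.log(b, a)` (logarithm of `b` to base `a`); the sample pairs are arbitrary, and both formulas need $a > 0$, $a \neq 1$ (so that $\ln(a) \neq 0$) and $b > 0$.

```python
import math

# Arbitrary sample values with a > 0, a != 1, and b > 0.
for a, b in [(2.0, 10.0), (10.0, 0.5), (0.3, 7.0)]:
    ratio = math.log(b) / math.log(a)
    assert math.isclose(math.log(b, a), ratio)   # log_a(b) = ln(b)/ln(a)
    assert math.isclose(a ** ratio, b)           # a^(ln(b)/ln(a)) = b
print("identities hold for the samples")
```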
2. Exponential Models
Solution. We will use the model to get equations to solve. We will omit some
algebraic steps that use the properties (1)-(4) given above for the exponential
function; you should see if you can do them yourself, and we will discuss the
algebra in class.
If $P$ is the population of bacteria after $t$ hours, then we have
$$P = P_0 \cdot \exp(k\cdot t)$$
for some constant $k$. The phrase “If we start with 100” tells us that $P_0 = 100$.
When $t = 1$ hour we have
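Heating and Cooling says that $F = F_0 \cdot \exp(-k\cdot t)$, where $t$ is time, $k$ is a
positive constant, and $F_0$ is the initial difference in temperature.
$$20 = 30\cdot\exp(-40\cdot k)$$
This equation can be solved for $k$; we get $k = \ln(1.5)/40 \approx 0.0101$. We want to find $t$ so that
$F = 0.01$, that is, $0.01 = 30\cdot\exp(-k\cdot t)$; this gives $t = \ln(3000)/k \approx 790$ minutes.
(If $k$ is rounded to 0.01 before solving, we get $t \approx 801$ minutes instead; the answer is sensitive to rounding in $k$.)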
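Here is a short Python sketch of the two solving steps, assuming (as in the equation above) that $F_0 = 30$ and $F = 20$ at $t = 40$ minutes. It also shows how rounding $k$ to 0.01 too early shifts the final answer.

```python
import math

F0 = 30.0
k = math.log(F0 / 20.0) / 40.0             # from 20 = 30 * exp(-40 * k)
t = math.log(F0 / 0.01) / k                # solve 0.01 = 30 * exp(-k * t)
t_rounded_k = math.log(F0 / 0.01) / 0.01   # same step, but with k rounded to 0.01

print(round(k, 4), round(t, 1), round(t_rounded_k, 1))
```

Keeping full precision in intermediate steps and rounding only the final answer avoids this kind of drift.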
In the problem we just did, the fact that we start with F = 30, a positive
number, tells us that the object is warmer than the environment, and so it
is cooling off. An object colder than the environment would have a negative
value of F .
3 It is not necessary for you to understand anything technical about electrical circuits
here.
4 It may surprise you that the current doesn’t just jump to 0. According to this model,
the electrons gradually slow down.
Problem T1.9. Suppose we know that $R = 10^6$ ohms, that $I_0 = 1$ amp, and
that $I = 0.01$ amps when $t = 0.05$ seconds. What is $C$? (The units of $C$ are
farads.)
$$Y = a\cdot K^b\cdot L^c$$
where $a, b, c$ are positive constants.6 One of the main ways to deal with this
equation is to take the logarithm of both sides, using the identities on p.3.
$$Y + 0.23\cdot Y = 1.23\cdot Y$$
Thus, we have
$$1.23\cdot Y = a\cdot(2\cdot K)^b\cdot L^c = 2^b\cdot a\cdot K^b\cdot L^c = 2^b\cdot Y$$
We see that $1.23 = 2^b$. Taking the logarithm of both sides, $\ln(1.23) = b\cdot\ln(2)$,
and
$$b = \frac{\ln(1.23)}{\ln(2)} \approx 0.299$$
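The last step is a one-line computation; here it is in Python, with a check that $2^b$ recovers the factor 1.23.

```python
import math

# Doubling K multiplies Y by 2**b; setting 2**b = 1.23 gives b = ln(1.23)/ln(2).
b = math.log(1.23) / math.log(2)
print(round(b, 3))   # 0.299
```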
A Limited Population
In Chapter 2 we will meet the logistic equation that describes a
population growing by reproduction, but limited by an external factor. Here is a
generic formula for such a population $P$, where $t$ is time and $E, c$ are positive
constants.
$$P = \frac{E}{1 - c\cdot\exp(-E\cdot t)} \tag{1.4}$$
Problem T1.12. Show that P gets close to E as t gets larger and larger.7
Solution. We know that $E > 0$, and so $E\cdot t > 0$ when $t > 0$. As $t$ gets
larger, the quantity $-E\cdot t$ becomes large and negative. The graph of the exponential
function shows that $\exp(-E\cdot t)$ gets smaller and smaller as $t$ gets larger. As
the exponential term gradually vanishes, the denominator in formula (1.4) approaches 1, and so $P$ goes to $E$.
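A numerical table makes the same point. The values $E = 2$ and $c = 0.5$ below are hypothetical, chosen only for illustration ($c < 1$ keeps the denominator of (1.4) positive for $t \ge 0$).

```python
import math

E, c = 2.0, 0.5   # hypothetical sample constants

def P(t):
    """Formula (1.4): P = E / (1 - c * exp(-E * t))."""
    return E / (1 - c * math.exp(-E * t))

for t in [0, 1, 2, 5, 10]:
    print(t, P(t))   # the values settle down toward E as t grows
```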
7 This fact shows the role of the constant E in this model. It is an equilibrium – studied
in the next chapter.
[Figure: graph of y = ln(x), passing through the point (1, 0).]
CHAPTER 2
Recursion
$$P_0,\ P_1,\ P_2,\ P_3,\ \ldots$$
$$P_{n+1} = P_n + r\cdot P_n$$
Adding $r\cdot P_n$ accomplishes the increase. We can write this equation like this.
$$P_1 = (1 + r)\cdot P_0 \qquad \text{(2.1) with } n = 0$$
$$P_2 = (1 + r)\cdot P_1 \qquad \text{(2.1) with } n = 1$$
$$P_3 = (1 + r)\cdot P_2$$
$$P_4 = (1 + r)\cdot P_3$$
$$\vdots$$
These equations don’t tell us directly how to get the individual values, but
they tell us how to get from one value to the next. That’s recursion.
Next we observe that if we have a specific value of $r$ and if we know the
initial population $P_0$, then (2.1) gives us the rest of the sequence of populations.
For example, let $r = 1$, so that (2.1) tells us that $P_{n+1} = 2\cdot P_n$. Suppose that
$P_0 = 3$, and just use the recursive equation over and over, as we did above:
$$P_1 = 2\cdot P_0 = 2\cdot 3 = 6$$
$$P_2 = 2\cdot P_1 = 2\cdot 6 = 12$$
$$P_3 = 2\cdot P_2 = 2\cdot 12 = 24$$
$$P_4 = 2\cdot P_3 = 2\cdot 24 = 48$$
$$\vdots$$
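Recursion translates directly into a loop: start from $P_0$ and apply the update repeatedly. A minimal Python sketch of the computation above (the function name is ours, for illustration):

```python
def recursive_sequence(p0, r, steps):
    """Apply P_{n+1} = P_n + r*P_n = (1 + r)*P_n, starting from P_0 = p0."""
    values = [p0]
    for _ in range(steps):
        values.append((1 + r) * values[-1])
    return values

print(recursive_sequence(3, 1, 4))   # [3, 6, 12, 24, 48]
```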
sequence from (2.1) without a formula. There are many recursive sequences for
which no direct formula is available. And, as we mentioned above, recursion
is the natural way many sequences are given in the first place. So, we want to
be familiar and comfortable with recursion!
In the next section we give practical examples.
1. Recursive Models
$r$ stand for the finance charge, and let $q$ be the monthly payment, so that $r, q$
are parameters. Then
$$M_{n+1} = (1 + r)\cdot M_n - q \tag{2.3}$$
The finance charge is added in, and then the monthly payment is applied to
decrease the amount owed. If we know the finance charge $r$, the monthly
payment $q$, and the initial amount $M_0$, we should be able to construct the
sequence $M_1, M_2, \ldots$. We are probably interested in getting $M_n$ to be 0, so
that the loan is paid off.
The Logistic Equation. On p.12 we introduced the logistic equation⁴ using
the exponential function. That model imagines time moving along continuously
– any real number is a valid time in the model. Now we will introduce
a discrete version of the model – a version in which time moves along step by
step. For instance, we might think of time in days and make one measurement
each day.
In 1838, Verhulst suggested the logistic equation for a population P that
changes over time due to two phenomena: reproductive growth at a rate k
and an external (environmental) limitation E. The numbers k, E are positive
constants – they are parameters. Here is the discrete version of the logistic
equation.
$$P_{n+1} = P_n + k\cdot P_n\cdot\left(1 - \frac{P_n}{E}\right) \tag{2.4}$$
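To see (2.4) in action, here is a minimal Python sketch that iterates it. The parameter values $k = 0.5$, $E = 1000$, and $P_0 = 10$ are hypothetical, chosen only to illustrate the qualitative behavior: rapid early growth that levels off as $P_n$ nears $E$.

```python
def logistic(p0, k, E, steps):
    """Iterate (2.4): P_{n+1} = P_n + k * P_n * (1 - P_n / E)."""
    values = [p0]
    for _ in range(steps):
        p = values[-1]
        values.append(p + k * p * (1 - p / E))
    return values

# Hypothetical parameters: growth rate k = 0.5, limit E = 1000, start P_0 = 10.
P = logistic(10, 0.5, 1000, 40)
print([round(p, 1) for p in P[::5]])   # every fifth term
```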
The units of E are those of population. You can see that this recursive equation
is more complicated. Let’s notice a couple of features.
An equilibrium is a sequence that repeats the same value over and over.
We would have an equilibrium in the logistic equation if the sequence
$P_0, P_1, P_2, P_3, \ldots$ looks like $P, P, P, P, \ldots$
4 Sometimes the logistic equation is called Verhulst’s equation; it goes by other names
as well, having been rediscovered several times over the years. See [4, p.12].
Problem T2.1. Find the equilibria for sequences produced by the logistic
equation.
Having introduced equilibria, you might ask whether there was an equilib-
rium in the simple interest model. It is an important question regarding just
about any recursive model.
$$P_{n+1} = (1 + a)\cdot P_n - b\cdot P_n\cdot Q_n$$
$$Q_{n+1} = (1 - c)\cdot Q_n + d\cdot P_n\cdot Q_n$$
Let’s give a brief explanation of these equations; we will say more about them
in class. Look at the equation for $P_{n+1}$. The $(1+a)\cdot P_n$ term looks like Malthusian
growth – reproductive growth. If $Q_n = 0$ (if there are no predators), the
prey will increase in numbers exponentially. Apparently they have an abundant
food supply. The $-b\cdot P_n\cdot Q_n$ term in $P_{n+1}$ says that the prey will tend
to decrease the more predators there are. The product term $P_n\cdot Q_n$ imagines
random meetings between prey and predator; the greater the numbers $P_n$
and $Q_n$, the more likely these species are to encounter each other. Bad for
prey.
Now look at the equation for $Q_{n+1}$; the $(1 - c)\cdot Q_n$ term says that the
population of predators will decrease if left to itself (if $P_n = 0$, so that there
are no prey). The term $d\cdot P_n\cdot Q_n$ represents a tendency for the predator
population to increase in the presence of their food supply.
To get this model started, we need the initial number $P_0$ of prey and the
initial number $Q_0$ of predators. Then the equations take over, producing $P_1$
and $Q_1$; then producing $P_2$ and $Q_2$, and so on.
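A minimal Python sketch of this two-sequence recursion follows. The parameter values $a, b, c, d$ and the starting populations are hypothetical, chosen only so the sketch runs; the important detail is that both new values are computed from the old pair $(P_n, Q_n)$.

```python
def predator_prey(p0, q0, a, b, c, d, steps):
    """Iterate P_{n+1} = (1+a)P_n - b P_n Q_n and Q_{n+1} = (1-c)Q_n + d P_n Q_n."""
    P, Q = [p0], [q0]
    for _ in range(steps):
        p, q = P[-1], Q[-1]            # both updates use the old values p, q
        P.append((1 + a) * p - b * p * q)
        Q.append((1 - c) * q + d * p * q)
    return P, Q

# Hypothetical parameters and starting populations, for illustration only.
P, Q = predator_prey(100, 20, a=0.05, b=0.002, c=0.1, d=0.001, steps=10)
print([round(p, 1) for p in P])
print([round(q, 1) for q in Q])
```

Setting `q0 = 0` shows the pure Malthusian growth of the prey described above.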
This model has two equilibria; one of them has $P_n = 0 = Q_n$ for all $n$.
That makes perfect sense; if you have no prey and no predators, then you will
never get any! You should find the other equilibrium. Can you explain that
equilibrium in ecological terms?
Let’s show that the exponential model and the recursive Malthusian model
are related.
$$P = P_0\cdot\exp(k\cdot t)$$
Problem T2.3. In the loan repayment model, suppose that $10000 is bor-
rowed, that the (annual) interest rate is 5%, and that the monthly payment is
$500. How long does it take to pay off the loan?
Solution. As above, let $M_n$ be the amount of money left to pay off after $n$
months. Then $M_0 = 10000$. We have the recursion $M_{n+1} = (1 + r)\cdot M_n - q$.
Since 5% is the annual interest rate, the monthly interest rate is $r = 0.05/12$.
The payment is $q = 500$.
We computed $M_n$ for $n = 1, 2, \ldots$, looking for the place where $M_n$ becomes
negative. Here are the first few terms; the decimal parts are cents, and they have
been rounded.
    n:      0         1         2         3         4      . . .
  M_n:  10000   9541.67   9081.42   8619.26   8155.18    . . .
As we computed the $M_n$, we encountered these entries.
    n:      19        20        21
  M_n:  957.26    461.25    −36.83
After 20 months, there is less owed than the monthly payment, even after the
finance charge is applied. So, in the 21st month, we pay off the loan.
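The table above came from a spreadsheet; the same search can be sketched as a Python loop that applies (2.3) until the balance is no longer positive. The guard against payments that never cover the finance charge is our addition, to keep the loop from running forever.

```python
def months_to_payoff(principal, annual_rate, payment):
    """Iterate (2.3), M_{n+1} = (1 + r) * M_n - q, until the balance is <= 0."""
    r = annual_rate / 12                     # monthly interest rate
    if payment <= r * principal:
        raise ValueError("payment does not cover the monthly finance charge")
    balances = [principal]
    while balances[-1] > 0:
        balances.append((1 + r) * balances[-1] - payment)
    return len(balances) - 1, balances

n, M = months_to_payoff(10000, 0.05, 500)
print(n)                                # 21
print([round(m, 2) for m in M[:5]])     # [10000, 9541.67, 9081.42, 8619.26, 8155.18]
print(round(M[19], 2), round(M[20], 2), round(M[21], 2))
```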
Here is a new model; it measures the speed of a falling object.
$$V_0,\ V_1,\ V_2,\ \ldots$$
and we suppose that $V_{n+1} = (1 - k)\cdot V_n + g$, where $k, g$ are positive parameters.7
The units of speed here are feet per second. Find an equilibrium. Now let
$k = 0.1$ and $g = 20$; starting with $V_0 = 0$, does the sequence approach the
equilibrium?
Solution. To find an equilibrium, we write $V$ for both $V_n$ and $V_{n+1}$ and obtain the
equation
$$V = (1 - k)\cdot V + g \tag{2.5}$$
Solving, we get $V = g/k$ feet per second. The sequence that repeats $g/k$ over and
over is an equilibrium. (The ratio g/k is called the terminal speed of the falling
object.)
Now let $k = 0.1$ and $g = 20$, so that the equilibrium is $20/0.1 = 200$ feet
per second. Letting $V_0 = 0$, we used Excel to calculate the sequence. Here are
some values we obtained.
    n:    0     1     2    · · ·     70        71        72
  V_n:    0    20    38    · · ·   199.87    199.89    199.90
The sequence increases slowly; $V_{100} \approx 199.99$. It looks like the sequence is
approaching equilibrium.
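The spreadsheet computation can also be sketched in Python. Iterating $V_{n+1} = 0.9\cdot V_n + 20$ from $V_0 = 0$ reproduces the tabulated values; as a cross-check, these particular terms also agree with the closed form $200\cdot(1 - 0.9^n)$, which we state here without derivation.

```python
def falling_speeds(v0, k, g, steps):
    """Iterate V_{n+1} = (1 - k) * V_n + g, returning V_0, ..., V_steps."""
    V = [v0]
    for _ in range(steps):
        V.append((1 - k) * V[-1] + g)
    return V

k, g = 0.1, 20
V = falling_speeds(0, k, g, 100)
print(g / k)   # the equilibrium (terminal speed)
print(round(V[70], 2), round(V[71], 2), round(V[72], 2), round(V[100], 2))
```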
We add the probabilities here, since the two events (sunny day n, rainy day
n) do not overlap.
We calculate $R_{n+1}$ in a similar way. See if you can do it, and we’ll discuss
it in class.
$$R_{n+1} = 0.3\cdot S_n + 0.45\cdot R_n$$
As in the predator/prey model, we have two recursive sequences that are re-
lated to each other. You will experiment with the behavior of these sequences.
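For experimenting, here is one possible Python sketch. It avoids the equation for $S_{n+1}$ (work that one out yourself) by using the fact that exactly one of the two states occurs each day, so $S_n = 1 - R_n$. Starting on a day that is certainly sunny ($R_0 = 0$) is a hypothetical choice.

```python
def weather(r0, days):
    """Iterate R_{n+1} = 0.3*S_n + 0.45*R_n, using S_n = 1 - R_n."""
    R = [r0]
    for _ in range(days):
        s = 1 - R[-1]            # sunny and rainy are the only two states
        R.append(0.3 * s + 0.45 * R[-1])
    return R

R = weather(0.0, 20)             # hypothetical start: day 0 is sunny
print([round(r, 4) for r in R[:6]])
```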
A Markov process involves a sequence of time steps; at each step there are
a finite number of possibilities, exactly one of which must happen at that step.
The possibilities are usually called states. The states in the weather example
are sunny and rainy. For each pair of states $S, T$, there is a definite probability
of moving from state $S$ at time step $t$ to state $T$ at time step $t + 1$. We
will see several examples of Markov processes.
9 This problem is a typical counting problem that involves recursion. Books on
combinatorics, such as [16], contain many such problems. See [15, p.214ff] for information on
Fibonacci.
(3) mature female rabbits gestate each month, become pregnant, and
produce one male and one female at the end of the month
Fibonacci was able to find a recursion for the number $F_n$ of female rabbits at
the end of month $n$:
$$F_{n+2} = F_{n+1} + F_n \quad\text{for } n = 0, 1, 2, \ldots \tag{2.6}$$
This is a different type of recursion than what we have considered so far. In the
previous examples, you computed a sequence term from the term just previous
to it. In (2.6) you need two previous terms.
Can you show that (2.6) is correct? You might want to consider the number
$M_n$ of mature females at the end of month $n$, and the number $I_n$ of immature
females. Then $F_n = M_n + I_n$. Fibonacci’s assumptions show that
$$M_{n+1} = M_n + I_n$$
$$I_{n+1} = M_n$$
We will complete the details in class.
If you assume you have one immature female to start with, so that $M_0 = 0$
and $I_0 = 1$, then you get $F_0 = 1$ and $F_1 = 1$, and then the recursion (2.6) takes
over. This specific example gives what are called the Fibonacci numbers; they
have a remarkable number of interesting properties and they arise in a host of
counting problems.
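A short Python sketch can track the mature and immature females separately, using the two recursions above, and then confirm that their total obeys (2.6). It starts from the single immature female described above.

```python
def rabbits(months):
    """Track mature (M) and immature (I) females: M_{n+1} = M_n + I_n, I_{n+1} = M_n."""
    M, I = 0, 1                  # start with one immature female
    F = [M + I]
    for _ in range(months):
        M, I = M + I, M          # tuple assignment uses the old M and I
        F.append(M + I)
    return F

F = rabbits(10)
print(F)   # [1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]

# Check (2.6): F_{n+2} = F_{n+1} + F_n
assert all(F[n + 2] == F[n + 1] + F[n] for n in range(len(F) - 2))
```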
We have tried to keep to fairly simple models in this chapter, while pushing
in the direction of the complexity found in more realistic applications. The
reader might glance at Chapter 6 of [6], regarding gas exchange in the human
lung, to see a typical, much more complicated, example.
CHAPTER 3
The Derivative
The expression ∆x is not a product; it is one symbol (a word spelled with two
letters). The value of ∆x depends on two values of x: one designated as the
starting value and the other as the final value.
To work another example, suppose the variable z starts at 7 and ends up
at 3. Then
∆z = 3 − 7 = −4
Note that the change is final minus start regardless of the size of either number.
In this case, ∆z was negative – that means that the variable z decreased. In
the first example ∆x = 2 was positive, indicating that x increased.
Notice that if x is the starting value and the change is ∆x, then the ending
value is x + ∆x.
Later we will consider the situation of an object moving along a line – say
that the line is the x-axis. The value of x at some point in time is the object’s
position. The quantity ∆x, the change in position, is called the displacement.
If we undergo a displacement of −2, we mean that we move 2 units in the
negative direction on the x-axis; that would be to the left.
The delta notation will arise frequently with two variables related by a
function. Say y = f (x). Starting with a value of x, we have the corresponding
value of y = f (x). Now say we undergo a change ∆x in x, the ending value of
x is x + ∆x, and the y-value that goes with this is f (x + ∆x). So, the starting
value of y is f (x) and the ending value of y is f (x + ∆x), and
$$\Delta y = f(x + \Delta x) - f(x) \tag{3.1}$$
This equation is important; it relates the change in a variable y to values of a
function f (x) that computes that variable y.
Now we move toward this chapter’s main topic. Let’s consider two ap-
plication problems: finding the velocity of a moving object and finding the
slope of a graph. We will see that these problems are, in some sense, the same
problem; when we abstract out what the two problems have in common, we
will discover the derivative. This is very typical mathematics – a general idea
is discovered as a common feature of several concrete problems. An additional
application will be discussed in class.
Problem: Velocity.
We are interested in the discovery of a mathematical idea; it is interesting
that the approach of a particular physics text is identical. See [9, p.23ff].
We begin with some definitions that may be familiar to you – we want to
make sure everyone is on the same page. We imagine traveling along a straight
line – let’s say we are traveling along the x-axis. The speed of a moving body
measures the ratio of distance traveled to the time elapsed. A constant speed
of 30 miles per hour (MPH) predicts that we travel 30 miles each hour elapsed.
The velocity of a moving body is the speed along with a plus or minus sign
that indicates the direction of the motion: positive velocity means that we are
traveling in the positive direction on the x-axis (to the right); negative velocity
means that we are traveling to the left. So, if the velocity is -10 MPH, then
the speed is 10 MPH and we are moving to the left. Notice that speed and
velocity have the same units.
If the speed is constant, then we can calculate it by dividing distance
traveled by time elapsed. Say we start at x = 3 at time t = 5, and we end
up at x = −10 when t = 7. Then we have traveled a distance of 3 + 10 = 13
miles in 2 hours, and so our speed was 13/2 = 6.5 MPH. The velocity, on
the other hand, would be the displacement divided by the time elapsed, for
displacement takes direction into account. The displacement in this example
is
∆x = (ending x) − (starting x) = (−10) − (3) = −13
and so the velocity is −13/2 = −6.5 MPH. Notice that the denominator 2 in
the case of both speed and velocity is the change in time ∆t. In particular,
notice that the ratio x/t is not relevant to either quantity: for instance, we
had x = 3 when t = 5; the ratio 3/5 doesn’t tell us about speed.
Constant velocity is very unusual. For instance, motion often starts from
a standstill, speeding up and slowing down along the way. It is the business
of calculus to explain how to calculate velocity when it is not constant. Let’s
think about that.
Suppose that the x axis measures miles, and time t is in hours. Suppose
we want to know the velocity when t = 3. Let’s assume we are at x = 10
when t = 3, and we end up at x = −80 at t = 5. The displacement is
∆x = −80 − 10 = −90 miles, and the elapsed time is ∆t = 5 − 3 = 2 hours.
Their ratio is
$$\frac{\Delta x}{\Delta t} = \frac{-90}{2} = -45 \text{ miles per hour}$$
What does the number −45 tell us? It would be too much to expect that it
tells us the velocity at t = 3, for the velocity may have varied quite a bit during
the two hours elapsed. The ratio gives what is called an average velocity. An
average is meant to give one number that smooths out variation along the
way, but it doesn’t tell us much about what was happening exactly at t = 3.
Instead of measuring over an elapsed time of 2 hours, suppose we use
∆t = 0.25 hours; say we start at t = 3 and end at t = 3.25, and suppose
that the displacement is ∆x = −12 miles over this change in time. Then the
average velocity is
$$\frac{\Delta x}{\Delta t} = \frac{-12 \text{ miles}}{0.25 \text{ hours}} = -48 \text{ miles per hour}$$
The time interval 0.25 hours is much smaller than 2 hours, and so we expect
that −48 miles per hour is closer to the actual velocity at t = 3. Nonetheless,
there could be some variation in 0.25 hours, and so -48 miles per hour is still
an average velocity, and we still don’t know our exact velocity.
Now suppose we have ∆t = 0.01 hours, and the displacement is ∆x =
−0.47 miles. This time, the average velocity is
$$\frac{\Delta x}{\Delta t} = \frac{-0.47 \text{ miles}}{0.01 \text{ hours}} = -47 \text{ miles per hour}$$
Because we are measuring over one hundredth of an hour, we expect even less
variation in velocity than over the previous 0.25 hours; so this average velocity
is probably closer to the actual velocity at t = 3.
It looks like this: to get the average velocity to be close to the actual
velocity, use a very small elapsed time, so that the velocity can’t vary too
much. As the time interval gets smaller and smaller, we expect the average
velocity to be close to the actual velocity. The average velocity is
$$\frac{\Delta x}{\Delta t}$$
and here is the technical notation for letting the time interval ∆t get very
small.
$$\text{velocity} = \lim_{\Delta t \to 0} \frac{\Delta x}{\Delta t} \tag{3.2}$$
Problem: Slope
We are given a curve y = f (x) in the plane, and we consider the problem
of finding the slope at some point. The slope is, by definition, the slope of a
tangent line to the curve. If we are given a sketch of the curve, we can probably
determine a tangent line by eye; what we want is an exact, analytic way of
finding the slope of that line. Idea: we can find the slope of a line through two
points on the curve – such a line is a secant line. If the two points are very
close together, their secant line is approximately a tangent line.
Suppose we are interested in the slope at some point $(x_0, y_0)$ on the curve
$y = f(x)$. We will imagine a secant line through $(x_0, y_0)$ and some other point
on the curve. We can imagine this other point as resulting from a change of
∆x in x, and then the ratio ∆y/∆x gives the slope of the secant line.
[Figure: the curve y = f(x) with a secant line through the point (x0, y0) and a second point on the curve; the horizontal step between the points is ∆x and the vertical step is ∆y.]
If the two points at the ends of the secant are very close to each other – if
∆x is small – it makes sense that the secant line would be close to the tangent.
In other words,
$$\text{tangent slope} = \lim_{\Delta x \to 0} \frac{\Delta y}{\Delta x}$$
Let’s work a specific example. Let’s find the slope of $y = 1 - x^2$ at $x = 2$.
The point on the curve where $x = 2$ is $(2, -3)$. That’s our starting point
for secant lines. Stepping ∆x, we have x-coordinate $2 + \Delta x$, and then the
y-coordinate is $1 - (2 + \Delta x)^2$. And so, again using (3.3), we compute
$$= 1 - 4 - 4\cdot\Delta x - (\Delta x)^2 + 3 = -4\cdot\Delta x - (\Delta x)^2$$
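Dividing by ∆x gives the secant slope −4 − ∆x. A short Python sketch computes secant slopes for $y = 1 - x^2$ from $x = 2$ directly and compares them with that formula; the step sizes are arbitrary. The slopes approach −4 as ∆x shrinks, from either side.

```python
def f(x):
    return 1 - x**2

def secant_slope(f, x0, dx):
    """Slope of the secant through (x0, f(x0)) and (x0 + dx, f(x0 + dx))."""
    return (f(x0 + dx) - f(x0)) / dx

for dx in [0.1, 0.01, 0.001, -0.001]:
    print(dx, secant_slope(f, 2, dx), -4 - dx)   # numeric slope vs. -4 - dx
```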
Problem: Density
Assume we have a block of wood lying along the interval 0 ≤ x ≤ 2 on
the x-axis, where the x measures yards. (So the wood is 2 yards long.) We
want to think about the weight of the wood. Notice that it doesn’t make
sense to talk about the weight at a point along the block, since a point is not
three-dimensional, and so the weight at a point is 0 (ideally). We measure
how weight accumulates as we move from left to right over the log. For each
x with 0 ≤ x ≤ 2, let W (x) be the weight in pounds of the wood from the
left-hand end up to the point x. In other words, W (x) is the weight of x yards
of wood – the left-hand x yards. We will see that this function W (x) can be
used to calculate the density of the wood – the number of pounds per yard we
have at any given point.1
1 The density can vary over the block, either because the thickness of the block varies, or
because the characteristics of the wood vary. This approach to density – using the function
W that measures the weight up to various points – is consistent with the ideas of Isaac Newton
that led to the discovery of the general calculus. See [15]. Also, we mention that the word
density means different things in different contexts; we will point this out as we encounter
examples.
[Figure: the block along the x-axis, with the points 0, 1, and 1 + ∆x marked.]
$$\Delta W = W(1 + \Delta x) - W(1)$$
is the weight in pounds of the block between x = 1 and x = 1 + ∆x. The ratio
$$\frac{\Delta W}{\Delta x}$$
has units pounds per yard; it approximates the density at x = 1. Why approximation?
Because the thickness or shape could vary between 1 and 1 + ∆x. In
fact, the ratio ∆W/∆x is the average density on that piece of wood. Letting
∆x get smaller and smaller, we expect to get closer to the actual density at x = 1.
You might try the following problem.
where the steps ∆x were taken from a specific point $(x_0, y_0)$ on the curve
$y = f(x)$, so that $y_0 = f(x_0)$. The limit (3.4) is called the derivative of $f(x)$
at $x = x_0$.
Problem T3.2. Compute the derivative of 1/x at x = 4.
Solution. At the point x = 4 on the curve y = 1/x, we have y = 1/4. Thus,
x starts at 4 and y starts at 1/4. The definition (3.4) of the derivative wants
∆y:
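$$\Delta y = \frac{1}{4 + \Delta x} - \frac{1}{4} = \frac{4 - (4 + \Delta x)}{(4 + \Delta x)\cdot 4} = \frac{-\Delta x}{4\cdot(4 + \Delta x)}$$
And so
$$\frac{\Delta y}{\Delta x} = \frac{-\Delta x}{4\cdot(4 + \Delta x)}\cdot\frac{1}{\Delta x} = \frac{-1}{4\cdot(4 + \Delta x)}$$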
We see that when ∆x is small, the ratio is close to −1/(4 · 4). Thus, the slope
of y = 1/x at x = 4 is −1/16. Or, if y = 1/x describes an object traveling on
the y-axis, where x is time, then we have just computed the velocity of the object
at time x = 4.
Problem T3.3. Compute the derivative of 3 · x − 7 at x = −2.
Solution. On $y = 3\cdot x - 7$ at $x = -2$, we have $y = -13$. Then check that
$$\Delta y = 3\cdot(-2 + \Delta x) - 7 - (-13) = 3\cdot\Delta x$$
and so
$$\frac{\Delta y}{\Delta x} = \frac{3\cdot\Delta x}{\Delta x} = 3$$
The resulting expression 3 stays at that value no matter what ∆x does. Thus,
the derivative is 3.
Let’s interpret the derivative we just calculated. That derivative is the
slope of y = 3 · x − 7; that curve is a straight line of slope 3, so it’s no wonder
that the derivative is 3.
There are many other examples. We may have time to mention some of
them in class.
(1) Let v(t) be the velocity at time t of an object moving along an axis.
Then the derivative of v at $t = t_0$ is the acceleration $a(t_0)$ of the object
at time $t = t_0$. The units of acceleration are distance/(time)$^2$.
(2) If W(x) is the work done in moving an object along the x-axis to the
point x, then the derivative of W at $x = x_0$ is the force applied at the
point $x = x_0$ to produce that motion.
(3) If C(x) measures the cost of quantity x of some good, then the derivative
of C at $x = x_0$ is the cost per unit of x, at $x = x_0$. (The cost
per unit is the price; price can vary, depending on how much we buy.)
You might wonder about calculating the derivative numerically.
Problem T3.4. Approximate the derivative of $e^x$ at $x = 2$.
Solution. Let $y = e^x$. To compute ∆y, we use 2 as our starting value and
then 2 + ∆x will be our ending value. Then
$$\Delta y = \exp(2 + \Delta x) - \exp(2) \quad\text{and}\quad \frac{\Delta y}{\Delta x} = \frac{\exp(2 + \Delta x) - \exp(2)}{\Delta x}$$
We used Excel to compute the ratio for various small values of ∆x:
     ∆x:   0.1      0.05     0.01     0.005    0.001    0.0001   0.00001
  ∆y/∆x:   7.7711   7.5769   7.4261   7.4076   7.3928   7.3894   7.3891
It looks as if the derivative is 7.389, or so.
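The Excel computation is easy to repeat in other tools. Here is a Python sketch that recomputes the table; `difference_quotient` is a name we chose. It then compares the results with $e^2 \approx 7.389056$, the exact value of this derivative, which we will be able to find in the next chapter.

```python
import math


def difference_quotient(f, x, dx):
    """Secant slope (f(x + dx) - f(x)) / dx."""
    return (f(x + dx) - f(x)) / dx


for dx in [0.1, 0.05, 0.01, 0.005, 0.001, 0.0001, 0.00001]:
    # Four decimal places, as in the table above.
    print(f"{dx:<8} {difference_quotient(math.exp, 2, dx):.4f}")

print("e^2 =", math.exp(2))
```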
In the next chapter we will be able to find the exact value of the derivative of $e^x$. Numerical calculation raises the question of why we shouldn’t prefer that method to the algebraic method we have used to this point. There are two important reasons to prefer the algebraic method:

(1) In applications of the derivative we will want to understand the behavior of the derivative over a large set of points, all at once. Algebraic formulas disclose that behavior much more readily than tables of values.

(2) Many of the (practical!) uses of the derivative involve purely symbolic quantities, for which there will be no specific numerical values to compute.

Having said that, we do admit that numerical calculations are often useful, and they are necessary in situations where we do not have a formula for the function involved. We close by saying that the numerical calculation of derivatives, in general, is a hard problem with many nuances. We won’t take the time to introduce that subject in any formal way. Still, there are a couple of problems at the end of this chapter that invite numerical calculation.
$$\Delta y = (x+\Delta x)^2 - x^2$$
Expanding out:
$$\begin{aligned}\Delta y &= x^2 + 2\cdot x\cdot\Delta x + (\Delta x)^2 - x^2\\ &= 2\cdot x\cdot\Delta x + (\Delta x)^2\end{aligned}$$
And so
$$\frac{\Delta y}{\Delta x} = \frac{2\cdot x\cdot\Delta x + (\Delta x)^2}{\Delta x} = 2\cdot x + \Delta x$$
In the expression $2\cdot x + \Delta x$, the $x$ indicates a specific point on the curve; we just don’t want to specify which point. And the $\Delta x$ is supposed to get small. When that happens, the expression $2\cdot x + \Delta x$ gets close to $2\cdot x$. That’s the value of the derivative of $x^2$, in general.
Here is one of the notations we will use to designate the general derivative:
$$(x^2)' = 2\cdot x$$
Now we continue.
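$$\begin{aligned}\frac{f(x+\Delta x) - f(x)}{\Delta x} &= \left(\frac{1}{x+\Delta x} - \frac{1}{x}\right)\cdot\frac{1}{\Delta x} && \text{get a common denominator}\\ &= \frac{x - (x+\Delta x)}{(x+\Delta x)\cdot x}\cdot\frac{1}{\Delta x}\\ &= \frac{-\Delta x}{(x+\Delta x)\cdot x}\cdot\frac{1}{\Delta x} && \text{cancel } \Delta x\\ &= \frac{-1}{(x+\Delta x)\cdot x}\end{aligned}$$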
Let’s not lose track of this expression; it’s nothing more nor less than ∆y/∆x.
As $\Delta x$ gets close to 0, the expression looks like $-1/x^2$. Thus,
$$\left(\frac{1}{x}\right)' = -\frac{1}{x^2}$$
Here is a case where we can predict the answer before we compute it.
Problem T3.7. Compute the derivative of m · x + b, where m, b are constants.
Solution. This time we will use (3.6). The line $y = m\cdot x + b$ has slope $m$. Since the derivative computes slope, we expect that
$$(m\cdot x + b)' = m$$
Let’s check. We have
$$\begin{aligned}\Delta y &= m\cdot(x+\Delta x) + b - (m\cdot x + b)\\ &= m\cdot x + m\cdot\Delta x + b - m\cdot x - b\\ &= m\cdot\Delta x\end{aligned}$$
And then
$$\frac{\Delta y}{\Delta x} = \frac{m\cdot\Delta x}{\Delta x} = m$$
This secant slope is constant. (That’s because a secant line to a line is just
the line itself!) And so the derivative is m.
We have seen that similar patterns can occur in variables with different names. If we have $f(B)$, then $f'(B)$ asks for the derivative of $f$, just as $f'(x)$ does. Thus, $(B^2)' = 2\cdot B$, just as we already computed $(x^2)' = 2\cdot x$. Once we compute the derivative of a function abstractly, we have the same derivative formula no matter what variable is used. That efficiency is the point of having general formulas.
Suppose we have $y = f(x)$ and we know that $f'(3) > 0$. We claim that $y$ is increasing when $x$ increases from $x = 3$. This is because $\Delta y/\Delta x \to f'(3)$ as $\Delta x \to 0$, with $x = 3$. Since $f'(3)$ is positive, the ratios $\Delta y/\Delta x$ must be positive when $\Delta x$ is small. If $x$ increases from 3 by a very small amount, then $\Delta x > 0$. For $\Delta y/\Delta x$ to be positive, we must then have $\Delta y$ positive, as well. That $\Delta y$ is positive means that $y$ is increasing.
A similar argument can be made when $f'(x) < 0$. In that case, we say that $y$ is decreasing, which means that if $x$ increases, then $y$ decreases. This gives very practical information that we will use often.
Virtually every function for which there is a formula has a derivative that we can calculate, as we will see. But there are some (unusual) functions for which the ratio in (3.5) does not approach any single value as $\Delta x$ goes to 0. The functions $f(x)$ for which $f'(x)$ exists are said to be differentiable. The most common way for us to know that a function is differentiable is to have a formula for its derivative. In the next few sections of this text we will develop a set of rules for taking the derivative, and these rules will apply to almost every function formula.
When we have y = f (x), the equations (3.5) and (3.6) write the derivative
two ways:
$$f'(x) = \lim_{\Delta x\to 0}\frac{f(x+\Delta x) - f(x)}{\Delta x} = \lim_{\Delta x\to 0}\frac{\Delta y}{\Delta x} \tag{3.7}$$
where
$$\Delta y = f(x+\Delta x) - f(x)$$
The formulations represent different points of view. The ratio that uses f (x)
is thinking about the derivative as applying to a function; the ratio with ∆y
is thinking about the relationship between the variables x, y.
These two points of view are united in one more notation. It is customary
to write
$$\frac{dy}{dx} = \lim_{\Delta x\to 0}\frac{\Delta y}{\Delta x} \tag{3.8}$$
Equation (3.6) shows that $dy/dx$ is just another notation for the derivative:
$$\frac{dy}{dx} = f'(x) \tag{3.9}$$
Why have two notations for the same thing? As we mentioned, the two no-
tations represent two different points of view. It turns out that the dy/dx
notation will be useful in its own right. The $dy$ and $dx$ are differentials.²
Similar to the ∆x notation, dx is one symbol and not a product. It is not
trivial to explain exactly what differentials are; for our course, we will see that
they furnish a useful notation, especially when units are involved. Indeed,
suppose that x measures weight in pounds and y measures frogs, and suppose
there is a function y = f (x) that tells us how many frogs we get for so-many
pounds. (We can suppose that different frogs have different weights, so f (x)
may be quite complicated.) The units of the ratio ∆y/∆x would be frogs per
pound, that ratio coming directly from the units of y and the units of x. Since
∆y/∆x → dy/dx, the derivative dy/dx has the same units: frogs per pound.
This simple reasoning works in general. If we have an object moving along the x-axis, where that axis measures feet, and if $t$ is time in seconds, then the velocity $v(t)$ has units feet per second. The derivative formula
$$v(t) = \frac{dx}{dt}$$
shows the same ratio of units: feet per second. Taking this one step further, the derivative of $v(t)$ could be written
$$v'(t) = \frac{dv}{dt} \quad\text{in units}\quad \frac{\text{feet per second}}{\text{second}}$$
and a little fraction algebra gives
$$\frac{\text{feet per second}}{\text{second}} = \frac{\text{feet}}{\text{second}}\cdot\frac{1}{\text{second}} = \frac{\text{feet}}{\text{second}^2}$$
The units feet per second-squared are the units of acceleration, and in fact the derivative of the velocity is acceleration, as we have mentioned.

²The differential notation $dy/dx$ comes from Leibniz, one of the discoverers of the calculus. The book [15] has more information.
If W = f (x) is the weight in pounds of the left-hand x feet of a steel bar,
then
$$f'(x) = \frac{dW}{dx} \quad\text{pounds/foot}$$
As we know from our previous work, dW/dx is the density.
CHAPTER 4
Computing the Derivative
Let’s summarize the notation for the derivative. Given y = f (x), we have
two ways to write the derivative:
$$f'(x) = \frac{dy}{dx} \tag{4.1}$$
The definition of the derivative asks us to think about a ratio; here are the two formulations of that ratio.
$$\frac{\Delta y}{\Delta x} = \frac{f(x+\Delta x) - f(x)}{\Delta x} \tag{4.2}$$
The ratio of deltas is the average change in $y$ per unit change in $x$. When we let $\Delta x \to 0$, we get the derivative, the instantaneous rate of change of $y$ with respect to $x$:
$$f'(x) = \lim_{\Delta x\to 0}\frac{f(x+\Delta x) - f(x)}{\Delta x} \tag{4.3}$$
A function that has a derivative is a differentiable function. In terms of differentials:
$$\frac{dy}{dx} = \lim_{\Delta x\to 0}\frac{\Delta y}{\Delta x} \tag{4.4}$$
Another example: since $1/x = x^{-1}$, the Power Rule applies here, as well, for that rule would compute
$$(x^{-1})' = (-1)\cdot x^{-2} = -\frac{1}{x^2}$$
and this is what we got by the limit calculation of the derivative. We will prefer writing the derivative of $1/x$ as $-1/x^2$.
We will discuss a general argument for the Power Rule later. Here are some
special cases, using the limit (3.5). We will skirt some of the details, in case
you want to try them yourself!
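Solution. If $y = x^3$, then¹
$$\Delta y = (x+\Delta x)^3 - x^3 = 3\cdot x^2\cdot\Delta x + 3\cdot x\cdot(\Delta x)^2 + (\Delta x)^3$$
and so
$$\frac{\Delta y}{\Delta x} = 3\cdot x^2 + 3\cdot x\cdot\Delta x + (\Delta x)^2$$
As $\Delta x \to 0$, we get $(x^3)' = 3\cdot x^2$.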
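The simplification of $\Delta y/\Delta x$ is an exact algebraic identity, so it can be checked with exact rational arithmetic instead of decimals. This Python sketch uses the standard `fractions` module; the helper names are ours. It confirms the identity at a few sample points. It also shows the ratio approaching $3\cdot 2^2 = 12$ when $x = 2$.

```python
from fractions import Fraction


def cube_ratio(x, dx):
    """Exact value of ((x + dx)**3 - x**3) / dx."""
    return ((x + dx) ** 3 - x ** 3) / dx


def simplified(x, dx):
    """The simplified form 3*x**2 + 3*x*dx + dx**2 from the text."""
    return 3 * x ** 2 + 3 * x * dx + dx ** 2


x = Fraction(2)
for dx in [Fraction(1, 10), Fraction(1, 1000), Fraction(-1, 7)]:
    assert cube_ratio(x, dx) == simplified(x, dx)   # exact, no rounding
    print(dx, float(cube_ratio(x, dx)))
```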
Here are two more rules that will allow us to calculate the derivative of a wide variety of functions.
When we take the derivative of a negative, we don’t have to convert the minus sign to a $(-1)$. We just pull out the minus sign; the point is that we are pulling out a constant according to the Constant Multiple Rule: $\big(-f(x)\big)' = -f'(x)$.
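Example. We have
$$\left(\frac{1}{x} + x^5\right)' = \left(\frac{1}{x}\right)' + (x^5)'$$
We already know the derivative of $1/x$ and of $x^5$, and so
$$\left(\frac{1}{x} + x^5\right)' = \left(\frac{1}{x}\right)' + (x^5)' = -\frac{1}{x^2} + 5\cdot x^4$$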
The Constant Multiple Rule and the Addition Rule together are sometimes
called the linearity of the derivative. Here is an abstract version that combines
them: let f (x) and g(x) be differentiable functions, and let a, b be constants.
Then
$$\begin{aligned}\big(a\cdot f(x) + b\cdot g(x)\big)' &= \big(a\cdot f(x)\big)' + \big(b\cdot g(x)\big)' && \text{Addition}\\ &= a\cdot f'(x) + b\cdot g'(x) && \text{Constant Multiple}\end{aligned}$$
Example. Here we combine the power rule and linearity; beside each line we give the reason for that particular step.
$$\begin{aligned}\big(4\cdot x^5 - 3\cdot x^3 + 7\cdot x + 1\big)' &= (4\cdot x^5)' + (-3\cdot x^3)' + (7\cdot x)' + (1)' && \text{addition rule}\\ &= 4\cdot(x^5)' - 3\cdot(x^3)' + 7\cdot(x)' + 0 && \text{constant multiple, derivative of a constant}\\ &= 4\cdot 5\cdot x^4 - 3\cdot 3\cdot x^2 + 7\cdot 1 && \text{power rule, including } x^0 = 1\\ &= 20\cdot x^4 - 9\cdot x^2 + 7\end{aligned}$$
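A numerical spot check can catch slips in this kind of bookkeeping, such as writing $x^3$ where $x^2$ belongs. The Python sketch below compares our formula with a symmetric difference quotient $\big(p(x+h) - p(x-h)\big)/(2h)$. That quotient is our choice of approximation, not a method from the text, and the helper names are ours.

```python
def p(x):
    return 4 * x**5 - 3 * x**3 + 7 * x + 1


def p_prime(x):
    # The derivative computed above: 20x^4 - 9x^2 + 7
    return 20 * x**4 - 9 * x**2 + 7


def central_difference(f, x, h=1e-5):
    """Symmetric difference quotient (f(x + h) - f(x - h)) / (2h)."""
    return (f(x + h) - f(x - h)) / (2 * h)


for x in [-1.5, 0.0, 0.7, 2.0]:
    print(x, p_prime(x), central_difference(p, x))
```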
In the last two examples we belabored the steps to make sure you could see
exactly what we were doing. As we do more and more derivative calculations,
we will work at doing them more quickly, using the rules without thinking
Problem T4.4. We are moving along the y-axis; position on that axis is
measured in feet. At time $t$ seconds we are at $y = 3t - 12t^3$. What is our velocity when $t = -2$?
Problem T4.5. An object travels along the y-axis (measuring meters in its
usual vertical orientation) such that $y = 3\cdot t^2 - 12\cdot t + 14$ at time $t$ seconds.
When is the object moving down?
value of Q. The number 2 is 2/10=20% of the number 10. The ratio 20% is
the relative rate of Q. Thus, the relative rate of Q is
$$\frac{Q'}{Q} = \frac{1}{Q}\cdot\frac{dQ}{dt}$$
Of course, we must be assuming that $Q \neq 0$. Notice the units of relative rate here: percent per second.
Many uses of the derivative involve abstract quantities for which we do not
have equations. Here is an example.
Problem T4.6. Helium gas in a closed container obeys the Ideal Gas Law:²
V · P = k · T , where P is the pressure, V the volume, T the absolute tem-
perature, and k a positive constant. In a closed container, we would have V
constant, whereas P, T could change in time. Show that P, T have the same
relative rates.
Solution. We take the derivative of both sides of the Gas Law equation, using
time t as variable. Since V, k are constants, the Constant Multiple Rule applies
to them:
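$$V\cdot\frac{dP}{dt} = k\cdot\frac{dT}{dt}$$
To get at relative rates, we divide each side of this equation by the corresponding side of the original equation:
$$\frac{1}{VP}\cdot V\cdot\frac{dP}{dt} = \frac{1}{kT}\cdot k\cdot\frac{dT}{dt} \quad\text{so that}\quad \frac{1}{P}\cdot\frac{dP}{dt} = \frac{1}{T}\cdot\frac{dT}{dt}$$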
This says that P, T have the same relative rate.
Since $P$ and $T$ have the same relative rates, if $P$ is changing at, say, 23% per second, then $T$ is changing at 23% per second, as well.
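Here is a numerical illustration in Python. Everything in it is hypothetical: the values of $V$ and $k$ and the temperature schedule $T(t)$ are made up for the sketch. The pressure is then forced by $V\cdot P = k\cdot T$. If we approximate each derivative by a central difference, the two relative rates agree, as the argument above predicts.

```python
V = 2.0   # hypothetical fixed volume
k = 0.5   # hypothetical positive constant in V*P = k*T


def T(t):
    """Hypothetical temperature schedule; any positive differentiable T works."""
    return 300 + 5 * t + 0.1 * t**2


def P(t):
    # The Gas Law V*P = k*T with V fixed determines the pressure.
    return k * T(t) / V


def relative_rate(f, t, h=1e-4):
    """Approximate (1/f) * df/dt at t using a central difference."""
    return (f(t + h) - f(t - h)) / (2 * h) / f(t)


for t in [0.0, 3.0, 10.0]:
    print(t, relative_rate(P, t), relative_rate(T, t))
```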
The answer agrees with the Power Rule; in fact, the Product Rule implies the
Power Rule in the case that the exponent is a positive integer.
Here is a practical, abstract use of the Product Rule.
Problem T4.7. Suppose we sell tables, and the weekly demand D for tables
(how many we can sell) depends on the unit price p. The notation D(p)
expresses that demand is a function of price. Suppose that we don’t know D(p)
exactly, but we know that it is positive and we know that $D'(p) < 0$. Under
what circumstances should we raise the price in order to increase revenue?
Solution. That $D'(p) < 0$ means that $D(p)$ decreases when $p$ increases.³
The revenue $R$ is the product of demand and price: $R = D(p)\cdot p$. This is the product of the number of tables and the price of each table. To have an increase in price result in an increase in revenue, we want $dR/dp > 0$.
The Product Rule computes
$$\frac{dR}{dp} = D'(p)\cdot p + D(p)\cdot 1$$
In the inequality
$$D'(p)\cdot p + D(p) > 0$$
we want to divide by $D'(p)$. Since that quantity is negative, the division will reverse the inequality:
$$p + \frac{D(p)}{D'(p)} < 0 \quad\text{which is}\quad p < -\frac{D(p)}{D'(p)}$$
Again we remind you that $D'(p)$ is negative, and so the fraction $-D(p)/D'(p)$ is positive. The last inequality says that if the price is below the ratio $-D/D'$, then we should raise the price to increase $R$ (even though we will decrease demand).
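To see the criterion at work, here is a Python sketch with a made-up demand function $D(p) = 100 - 2p$. This demand is hypothetical, chosen only so that $D > 0$ and $D' < 0$ for $0 < p < 50$. For this $D$, the threshold $-D(p)/D'(p)$ equals $50 - p$, so the inequality above says to raise the price exactly when $p < 25$. The sketch checks that this matches the sign of $dR/dp$.

```python
def D(p):
    """Hypothetical demand: positive, with D'(p) < 0, for 0 < p < 50."""
    return 100 - 2 * p


def D_prime(p):
    return -2.0


def should_raise_price(p):
    # The text's criterion: p < -D(p)/D'(p)
    return p < -D(p) / D_prime(p)


def revenue_slope(p):
    # Product Rule for R = D(p)*p: dR/dp = D'(p)*p + D(p)
    return D_prime(p) * p + D(p)


for p in [10, 20, 24, 26, 40]:
    print(p, should_raise_price(p), revenue_slope(p) > 0)
```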
Problem T4.8. Use the Product Rule to show that $\big(f(x)^2\big)' = 2\cdot f(x)\cdot f'(x)$.
Solution. Write $f(x)^2 = f(x)\cdot f(x)$, and use the Product Rule:
$$\big[f(x)\cdot f(x)\big]' = f'(x)\cdot f(x) + f(x)\cdot f'(x) = 2\cdot f(x)\cdot f'(x)$$
³This fact about the sign of the derivative was mentioned on p.40.
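The derivative rule for ratios of functions is somewhat similar to the Product Rule.
Quotient Rule. Given differentiable functions $f(x)$ and $g(x)$, wherever $g(x) \neq 0$ we have
$$\left(\frac{f(x)}{g(x)}\right)' = \frac{f'(x)\cdot g(x) - f(x)\cdot g'(x)}{g(x)^2}$$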
The numerator of this formula should remind you of the Product Rule: we
take turns with each function. But there is a minus sign in between rather
than a plus sign. We need to take the derivative of the numerator f (x) first.
As in our first Product Rule example, there is no need to work further with
this expression if we are just trying to apply the rule.
As with the Product Rule, the Quotient Rule is consistent with the other
rules.
Example. We calculate the derivative of $x^{-5}$ using the Quotient Rule, applying the Power Rule to $x^5$.
$$\begin{aligned}(x^{-5})' &= \left(\frac{1}{x^5}\right)' && \text{algebra}\\ &= \frac{(1)'\cdot x^5 - 1\cdot(x^5)'}{(x^5)^2} && \text{Quotient Rule}\\ &= \frac{0\cdot x^5 - 5\cdot x^4}{x^{10}} && \text{Power Rule for } x^5\\ &= \frac{-5\cdot x^4}{x^{10}} && \text{algebra}\\ &= -5\cdot x^{4-10} = -5\cdot x^{-6}\end{aligned}$$
The final answer agrees with the Power Rule applied to $x^{-5}$.
We mentioned that the Product Rule can be used to establish the Power
Rule for xn when n is a positive integer. The previous example can be gen-
eralized to establish the Power Rule when n is a negative integer, using the
Quotient Rule. Thus, the Product Rule and Quotient Rule give the Power
Rule when the exponent is an integer.
Solution. Recall that M is increasing when dM/dt > 0. (By the way, the
derivative has units kg per hour.) We compute dM/dt using the Quotient
Rule.
$$\frac{dM}{dt} = \frac{1\cdot(t^2+9) - t\cdot 2\cdot t}{(t^2+9)^2}$$
We need to work with this derivative to find out where it is positive. So, we simplify the numerator:
$$\frac{1\cdot(t^2+9) - t\cdot 2\cdot t}{(t^2+9)^2} = \frac{t^2 + 9 - 2\cdot t^2}{(t^2+9)^2} = \frac{-t^2 + 9}{(t^2+9)^2}$$
We want to know where this is positive. The denominator $(t^2+9)^2$ is positive, and so the fraction is positive when the numerator is positive. That’s $-t^2 + 9 > 0$, or $9 > t^2$. Since $t \geq 0$, we see that $9 > t^2$ means that $3 > t$. Thus, $M$ is increasing when $t < 3$.
Solution. We are asking when $dy/dx > 0$. Use the Quotient Rule:
$$\left(\frac{e^x}{x+3}\right)' = \frac{(e^x)'\cdot(x+3) - e^x\cdot(x+3)'}{(x+3)^2} = \frac{e^x(x+3) - e^x}{(x+3)^2} = \frac{e^x(x+2)}{(x+3)^2}$$
(Notice, in the second equation, when we took the derivative of the $e^x$, it doesn’t look as if anything happened! That’s the simplicity of the derivative rule for the exponential function.) We want our slope to be positive. The exponential function is positive, and the $(x+3)^2$ in the denominator is positive (for $x \neq -3$). So the sign of the slope is determined by $x + 2$, and the slope is positive when $x > -2$.
The graph of the exponential function shows that it is always increasing. In
other words, its slope is always positive – but that slope is just the exponential
function, which is positive, so that makes sense.
On p.36, we estimated the derivative of $e^x$ at $x = 2$ numerically. Now we see that the exact value of that derivative is $e^2$. The formula $(e^x)' = e^x$ depends on the limit considered in the following.
This is zero when $\ln(x) + 1 = 0$, so that $\ln(x) = -1$, and that’s $x = e^{-1} = 1/e$.
Problem T4.13. Explain why ln(4x) and ln(x) have the same derivative.
The Chain Rule explains how to take the derivative of a composite function.
The pattern f (g(x)) sees an outside function f and an inside function g. The
outside function in our example is the 5-th power function. The inside function
is $x^4 + 3\cdot x^2$. The Chain Rule says to take the derivative of the outside function, leaving the inside function alone. That’s the expression $f'(g(x))$ in the Chain Rule formula. The derivative of the 5-th power function comes from the Power Rule; here is the formula, using $\square$ as the variable.
$$(\square^5)' = 5\cdot\square^4$$
The other factor in the Chain Rule formula is $g'(x)$; that’s the derivative of the inside function:
$$(x^4 + 3\cdot x^2)' = 4\cdot x^3 + 3\cdot 2\cdot x = 4\cdot x^3 + 6\cdot x$$
Let’s do another one; this time we will try to focus on the outside-then-
inside pattern and not clutter things with function notation.
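$$\begin{aligned}\left(\frac{1}{(5 - 7\cdot x^4)^7}\right)' &= \big[(5 - 7\cdot x^4)^{-7}\big]'\\ &= -7\cdot(5 - 7\cdot x^4)^{-8}\cdot(5 - 7\cdot x^4)' && (***)\\ &= -7\cdot(5 - 7\cdot x^4)^{-8}\cdot(0 - 7\cdot 4\cdot x^3)\end{aligned}$$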
At the equation marked (***) we took the derivative of the (−7)th power (the
outside function), leaving the inside function 5 − 7 · x4 alone. Then we took
the derivative of the inside.
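Here is a numerical check of this Chain Rule computation, sketched in Python; the helper names are ours. We compare the formula with a central-difference estimate at a few points where $5 - 7\cdot x^4 \neq 0$. That quantity vanishes near $x \approx 0.919$.

```python
def h(x):
    return 1 / (5 - 7 * x**4) ** 7


def h_prime(x):
    # Chain Rule result: -7*(5 - 7x^4)^(-8) * (0 - 7*4*x^3)
    return -7 * (5 - 7 * x**4) ** (-8) * (0 - 7 * 4 * x**3)


def central_difference(f, x, step=1e-6):
    """Symmetric difference quotient (f(x + step) - f(x - step)) / (2*step)."""
    return (f(x + step) - f(x - step)) / (2 * step)


for x in [0.3, 0.5, 0.8]:
    print(x, h_prime(x), central_difference(h, x))
```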
Solution. The function $e^{3t}$ is a composite; this is easier to see if we use the notation $\exp(3t)$. The outside function is the exponential function and the inside function is $3t$. Thus, we use the Chain Rule:
$$\big(\exp(3t)\big)' = \exp'(3t)\cdot(3t)' = \exp(3t)\cdot 3 = 3e^{3t}$$
The derivative formula $\exp' = \exp$ was used, leaving the inside $3t$ alone.
The previous problem can be done another way; write $e^{3t} = (e^t)^3$, and then
$$\big[(e^t)^3\big]' = 3\cdot(e^t)^2\cdot(e^t)' = 3e^{2t}\cdot e^t = 3e^{3t}$$
Same answer!
We take the previous problem a little further. Many of the exponential
models in Chapter 1 were of the form $y = y_0\cdot\exp(k\cdot t)$, where $y_0, k$ are constants. We can explain these models as showing that $y$ changes at a rate
Problem T4.18. Show that the relative rate of f (t) is the derivative of
ln(f (t)), assuming that f (t) is positive.
Problem T4.19. Show that the Chain Rule and the Product Rule imply the
Quotient Rule.
We will rewrite the numerator, letting $x = g(t)$, so that $\Delta x = g(t+\Delta t) - g(t)$, and then
$$\frac{f(g(t+\Delta t)) - f(g(t))}{\Delta t} = \frac{f(x+\Delta x) - f(x)}{\Delta t}$$
Supposing that $\Delta x \neq 0$, we have
$$\frac{f(x+\Delta x) - f(x)}{\Delta t} = \frac{f(x+\Delta x) - f(x)}{\Delta x}\cdot\frac{\Delta x}{\Delta t}$$
We get the derivative of $f(g(t))$ if we let $\Delta t \to 0$.
Letting $\Delta t \to 0$, the ratio $\Delta x/\Delta t$ goes to $dx/dt = g'(t)$. Also, it makes sense that if $\Delta t \to 0$, then $\Delta x \to 0$, and so
$$\frac{f(x+\Delta x) - f(x)}{\Delta x} \to f'(x) = f'(g(t))$$
Putting the two factors back together, we get
$$\big(f(g(t))\big)' = \lim_{\Delta t\to 0}\frac{f(x+\Delta x) - f(x)}{\Delta x}\cdot\frac{\Delta x}{\Delta t} = f'(g(t))\cdot g'(t)$$
and that’s the Chain Rule.
The foregoing raises two technical issues; although we will not resolve either
of them, honesty compels us to point them out. First, we have to consider
what happens when ∆x = 0. Second, we need the fact that ∆x → 0 when
∆t → 0. That second fact is the continuity of x = g(t); you can look up
continuity in Section 10-3 of [1], if you are curious.
The Chain Rule can be used to explain the Power Rule. Recall the formula
(1.3) on p.6; it says that
$$x^n = \exp(n\cdot\ln(x))$$
(We need $x > 0$ so that $\ln(x)$ is defined.) Taking the derivative with respect to $x$:
$$(x^n)' = \exp'(n\cdot\ln(x))\cdot(n\cdot\ln(x))' = \exp(n\cdot\ln(x))\cdot\frac{n}{x} = x^n\cdot\frac{n}{x} = n\cdot x^{n-1}$$
and that’s the Power Rule, at least when x > 0. We’ll be content with that
case.
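The derivation above only needs $x > 0$, so it covers exponents that are not integers. Here is a Python check, a sketch in which the function names are ours. It evaluates the Chain Rule expression $\exp(n\cdot\ln(x))\cdot n/x$ and compares it with $n\cdot x^{n-1}$ for a few non-integer exponents.

```python
import math


def power_via_exp(x, n):
    """x**n written as exp(n*ln(x)); valid for x > 0."""
    return math.exp(n * math.log(x))


def derivative_via_chain_rule(x, n):
    # exp'(n ln x) * (n ln x)' = exp(n ln x) * n / x
    return power_via_exp(x, n) * n / x


for x, n in [(2.0, 0.5), (3.0, -1.5), (1.7, 2.25)]:
    print(x, n, derivative_via_chain_rule(x, n), n * x ** (n - 1))
```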
CHAPTER 5
Interpreting and Using the Derivative
Problem T5.1. Let $f(x) = x^4 - x^3 - 6\cdot x^2$. Find where $f(x)$ is positive and where it is negative.
Solution. The roots of $f(x)$ are the $x$ values where $f(x)$ is 0. It is a theorem of elementary algebra that $x = a$ is a root of $f(x)$ if and only if $x - a$ is a factor of $f(x)$. Thus, to find where $f(x)$ is 0, we factor. We start by noticing the factor $x^2$ common to every term. For the other factors, there are several methods you may have learned; we will be happy to help you individually if you need some review.
$$x^4 - x^3 - 6\cdot x^2 = x^2\cdot(x^2 - x - 6) = x^2\cdot(x+2)\cdot(x-3)$$
The factors show that the roots of $f(x)$ are $x = 0, -2, 3$. We graph these on a number line, and this chops the real numbers into four intervals: $x < -2$, $-2 < x < 0$, $0 < x < 3$, and $3 < x$. On each of these intervals $f(x)$ is not 0, and so on each interval its sign cannot change.¹

There is a fairly simple way to figure out these signs, working from right to left along the intervals. For the rightmost interval $3 < x$, the sign is determined by the leading coefficient of the polynomial, the coefficient on the highest power of $x$. That highest power is $x^4$ and the coefficient is 1; thus, the sign on $3 < x$ is positive. If you wish, choose various values of $x$ greater than 3 and plug each of them into the polynomial; you should get a positive value each time.

¹The assertion that $f$ cannot change sign between roots follows from the fact already stated that roots come from factors. There is also a calculus theorem called the Intermediate Value Theorem that is relevant to this situation. You are encouraged to look up that result online or in a calculus text.
As we move to the left, across each root, we look at the exponent of the corresponding factor of $f(x)$. When we cross $x = 3$, going from $3 < x$ to $0 < x < 3$, the factor is $x - 3$ to the first power. The exponent 1 is odd, and so the sign changes as we cross $x = 3$. Thus, the sign on $0 < x < 3$ is negative.
Moving left across $x = 0$, the factor is $x^2$ and the exponent 2 is even. When we cross a root with an even exponent, the sign stays the same. The sign on $0 < x < 3$ was negative, and so the sign on $-2 < x < 0$ is negative, too.
When we cross $x = -2$, we encounter the factor $x + 2$, which has exponent 1. This exponent is odd, so the sign changes. Here is the resulting sign diagram.
$$\begin{array}{c|c|c|c|c} x & x < -2 & -2 < x < 0 & 0 < x < 3 & 3 < x\\ \hline f(x) & + & - & - & + \end{array}$$
For any number x, the diagram tells us quickly whether f (x) is positive, neg-
ative, or zero. For instance f (−1) < 0 and f (17) > 0 and f (1) < 0.
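The sign diagram can be double-checked by evaluating $f$ at one test point in each interval. This Python sketch does that, using test points $-3, -1, 1, 4$ of our own choosing. It also spot-checks the factorization on a range of integers.

```python
def f(x):
    return x**4 - x**3 - 6 * x**2


def factored(x):
    # The factorization found above: x^2 (x + 2)(x - 3)
    return x**2 * (x + 2) * (x - 3)


# One test point inside each interval of the sign diagram.
test_points = {"x < -2": -3, "-2 < x < 0": -1, "0 < x < 3": 1, "3 < x": 4}
signs = {}
for interval, x in test_points.items():
    signs[interval] = "+" if f(x) > 0 else "-"
    print(interval, x, f(x), signs[interval])
```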
Problem T5.2. Find the signs of $(x^2 - 4\cdot x - 5)/(x - 7)$.
Solution. Factor!
$$\frac{x^2 - 4\cdot x - 5}{x - 7} = \frac{(x+1)\cdot(x-5)}{x-7}$$
The roots are $x = -1$ and $x = 5$. The denominator root $x = 7$ is a singularity, a place where the function is not defined. We include singularities in the determination of sign, in the same way as roots. The leading coefficient on the function is 1, and each exponent is 1. Here is the sign diagram.
$$\begin{array}{c|c|c|c|c} x & x < -1 & -1 < x < 5 & 5 < x < 7 & 7 < x\\ \hline \text{sign} & - & + & - & + \end{array}$$
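$$\begin{aligned}10\cdot x^4 - 5\cdot x^3 - 5\cdot x^5 &= -5\cdot x^5 + 10\cdot x^4 - 5\cdot x^3\\ &= -5\cdot x^3\cdot(x^2 - 2\cdot x + 1)\\ &= -5\cdot x^3\cdot(x-1)^2\end{aligned}$$
Here is the sign diagram; make sure you can explain each of the signs!
$$\begin{array}{c|c|c|c} x & x < 0 & 0 < x < 1 & 1 < x\\ \hline \text{sign} & + & - & - \end{array}$$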
1. Curve Sketching
We show how to make a quick sketch of a curve y = f (x). We will see that
a great deal of information can be observed from such a graph.
We know that $dy/dx = f'(x)$ is the slope of the curve $y = f(x)$. Also, $y = f(x)$ is increasing when $f'(x) > 0$, and it is decreasing when $f'(x) < 0$. If $f'(x) = 0$ when $x = a$, then $x = a$ is called a critical point. At each critical point, the tangent has slope 0; the tangent is horizontal.
By a quick sketch we mean a sketch that plots critical points and that indicates where a curve is increasing and where it is decreasing. The increasing/decreasing feature of a curve is its oscillation. These features are determined by the signs of the derivative.
Solution. We find the sign diagram for the derivative, as in the last section.
$$\frac{dy}{dx} = 2\cdot x - 6 = 2\cdot(x-3)$$
[Figure: quick sketch of the curve, with the critical point $(3, -1)$ marked.]
Solution. The derivative uses the Quotient Rule; the first equation below
applies that rule carefully. We skip lightly over the rest of the algebra:
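$$\begin{aligned}y' &= \frac{(2\cdot x + 1)\cdot(x+1) - (x^2 + x + 4)\cdot 1}{(x+1)^2}\\ &= \frac{x^2 + 2\cdot x - 3}{(x+1)^2} = \frac{(x+3)\cdot(x-1)}{(x+1)^2}\end{aligned}$$
The critical points: $x = -3$ and $x = 1$. Here is the sign diagram, including the resulting information about the graph. Note carefully the use of $y'$ as opposed to $y$. We use the abbreviations inc and dec for increasing and decreasing, respectively.
$$\begin{array}{c|c|c|c|c} x & x < -3 & -3 < x < -1 & -1 < x < 1 & 1 < x\\ \hline y' & + & - & - & +\\ y & \text{inc} & \text{dec} & \text{dec} & \text{inc} \end{array}$$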
[Figure: quick sketch of the curve, with the line $x = -1$ marked.]
$$y' = 4\cdot x^3 - 8 = 4\cdot(x^3 - 2)$$
This has only one critical point: $x^3 - 2 = 0$ gives $x = \sqrt[3]{2}$. You may or may not remember how to factor $x^3 - 2$; let’s show that we don’t need to factor it. We can determine the signs of $y'$ by choosing points on either side of $\sqrt[3]{2} \approx 1.26$ and observing the sign of $y'$. Taking $x = 1$ to the left of $\sqrt[3]{2}$, we see that $y' = -4$; thus, $y' < 0$ to the left of the cube root. Taking $x = 2$ to the right of $\sqrt[3]{2}$, we compute $y' = 24$, so that $y' > 0$ to the right of the critical point. This shows us that $y = x^4 - 8\cdot x + 3$ has a minimum at $x = \sqrt[3]{2}$. Plugging in, we get $y \approx -4.56$. The point $(\sqrt[3]{2}, -4.56)$ is below the x-axis, and $y$ rises to the left and to the right² of $x = \sqrt[3]{2}$. So we see that $x^4 - 8\cdot x + 3$ has two real roots.
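The conclusion that there are two real roots can be confirmed numerically. The Python sketch below uses bisection, which is our choice here; Newton’s Method, introduced in the next section, would also work. It searches the intervals $[0, \sqrt[3]{2}]$ and $[\sqrt[3]{2}, 2]$, where $y$ changes sign because $y(0) = 3$, $y(\sqrt[3]{2}) \approx -4.56$, and $y(2) = 3$.

```python
def y(x):
    return x**4 - 8 * x + 3


def bisect(f, lo, hi, tol=1e-12):
    """Find a root of f in [lo, hi], assuming f(lo) and f(hi) have opposite signs."""
    assert f(lo) * f(hi) < 0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(lo) * f(mid) <= 0:
            hi = mid          # a sign change lies in [lo, mid]
        else:
            lo = mid          # otherwise it lies in [mid, hi]
    return (lo + hi) / 2


c = 2 ** (1 / 3)                     # the critical point, about 1.26
print(c, y(c))                       # the minimum value, about -4.56
left_root = bisect(y, 0.0, c)
right_root = bisect(y, c, 2.0)
print(left_root, right_root)
```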
Here is another kind of problem for which a quick sketch is relevant.
Problem T5.8. Find the maximum and minimum of $y = -2x^3 + 3x^2 + 12x + 10$ for $0 \leq x \leq 3$.
Solution. Factor $y'$:
²We assume you know that the absolute values of a nonconstant polynomial get large without bound as the x-values get large and positive or large and negative. For a polynomial of degree 4 with a positive leading coefficient, such as $x^4 - 8\cdot x + 3$, $y$ will get large and positive as $x$ gets large in both directions.
concave down. Between the two possibilities for the sign of $f'(x)$ and the two possibilities for the sign of $f''(x)$, there are four possibilities. The following circle picture gives the shape of the curve in each case.³ We will discuss this in class.
[Figure: a circle whose four arcs show the shape of the curve in each case. Upper left: $y' > 0$, $y'' < 0$. Upper right: $y' < 0$, $y'' < 0$. Lower left: $y' < 0$, $y'' > 0$. Lower right: $y' > 0$, $y'' > 0$.]
Solution. We use the Product Rule and the Chain Rule to compute
$$y' = -(x-1)\cdot\exp(-x)$$
$$y'' = (x-2)\cdot\exp(-x)$$
³We are using the arcs on the circle to indicate the shape. Of course, we are not saying that all curves are circular.
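$$\begin{array}{c|c|c|c} x & x < 1 & 1 < x < 2 & 2 < x\\ \hline y' & + & - & -\\ y'' & - & - & +\\ y & \text{inc} & \text{dec} & \text{dec}\\ & \text{dn} & \text{dn} & \text{up} \end{array}$$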
And here is the graph; notice the change in shape at x = 2. (In drawing the
picture, we have exaggerated that change somewhat to make it prominent.)
[Figure: graph of $y = x\cdot\exp(-x)$, with the points $(1, 1/e)$ and $(2, 2/e^2)$ marked.]
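The formulas for $y'$ and $y''$ and the labeled points can be checked numerically. In this Python sketch the helper names are ours. First and second central differences approximate $y'$ and $y''$, and the test points $0.5$, $1.5$, $3$ lie in the three intervals of the sign diagram.

```python
import math


def y(x):
    return x * math.exp(-x)


def y1(x):
    return -(x - 1) * math.exp(-x)   # y' from the text


def y2(x):
    return (x - 2) * math.exp(-x)    # y'' from the text


def first_difference(f, x, h=1e-4):
    """Central approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)


def second_difference(f, x, h=1e-4):
    """Central approximation of f''(x)."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / h**2


for x in [0.5, 1.5, 3.0]:
    print(x, y1(x), first_difference(y, x), y2(x), second_difference(y, x))

print(y(1), 1 / math.e)          # the point (1, 1/e)
print(y(2), 2 / math.e**2)       # the point (2, 2/e^2)
```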
2. Newton’s Method
There are many ways to find numerical solutions to equations of the form $f(x) = 0$. We will discuss a recursive method called Newton’s Method or the Newton-Raphson Method.⁴
We want to approximate a solution to the equation $f(x) = 0$. We start with a number $R_0$, and compute
$$R_{n+1} = R_n - \frac{f(R_n)}{f'(R_n)}$$
Of course, we need $f'(R_n) \neq 0$. This method is not infallible, but, remarkably often, if the initial $R_0$ is at all close to a solution, the sequence gives very accurate approximations to the solution.
Problem T5.12. Use Newton’s Method to approximate $\sqrt{2}$.
Solution. We need $\sqrt{2}$ to be a solution to an equation of the form $f(x) = 0$; let’s use $x^2 - 2 = 0$, so that $f(x) = x^2 - 2$ in the notation of the method. Then $f'(x) = 2\cdot x$, and the recursion becomes
$$R_{n+1} = R_n - \frac{R_n^2 - 2}{2\cdot R_n}$$
⁴As the names imply, the method goes back to Newton and/or Raphson, although the connection with calculus was, apparently, not elucidated at first.
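The recursion is easy to run on a computer. Here is a minimal Python sketch of Newton’s Method as described above; the function `newton` and the starting value $R_0 = 1$ are our own choices. Starting from 1, the iterates are $1$, $1.5$, $17/12 \approx 1.416667$, and so on. After five steps they agree with $\sqrt{2} \approx 1.414213562$ to about 15 decimal places.

```python
def newton(f, f_prime, r0, steps):
    """Return the Newton iterates R_0, R_1, ..., R_steps for f(x) = 0."""
    iterates = [r0]
    for _ in range(steps):
        r = iterates[-1]
        slope = f_prime(r)
        if slope == 0:
            raise ZeroDivisionError("Newton's Method needs f'(R_n) != 0")
        iterates.append(r - f(r) / slope)
    return iterates


# f(x) = x^2 - 2, so f'(x) = 2x and the positive root is sqrt(2).
roots = newton(lambda x: x**2 - 2, lambda x: 2 * x, r0=1.0, steps=5)
for n, r in enumerate(roots):
    print(n, f"{r:.15f}")
```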
We replace the derivatives on the far left and far right by differential expres-
sions, and we get the differential version of the Chain Rule.
Chain Rule – Differential Version. Given $x = g(t)$ and $y = f(x)$, where $g(t)$ and $f(x)$ are differentiable, we have
$$\frac{dy}{dt} = \frac{dy}{dx}\cdot\frac{dx}{dt} \tag{5.2}$$
Equation (5.2) looks like the dx factors were canceled. When we introduced
differentials, we said we would not give an explicit definition – in any case
they are not numbers, and so the dx factors don’t cancel literally. But the
Chain Rule does say that the equation (5.2) is true, so the point is that we can
apparently manipulate differentials as if they were numbers. We claimed before
that differentials would prove useful; the Chain Rule justifies that promise.
As with equation (5.2), it looks as if the differentials obey ordinary algebra; as we keep saying, that is why they are useful.
The functions $x^5$ and $y^{1/5}$ are called inverse functions; they undo each other:
$$(x^5)^{1/5} = x^{5/5} = x^1 = x \quad\text{and}\quad (y^{1/5})^5 = y^{5/5} = y^1 = y$$
You have seen this pattern before: $y = \exp(x)$ if and only if $\ln(y) = x$. And it occurs in other situations. Let’s show that (5.3) holds whenever we have differentiable inverse functions. If $y = f(x)$ and $x = g(y)$ are inverses, then $g(f(x)) = x$. Taking the derivative of both sides, using the Chain Rule on the left, gives
$$1 = g'(f(x))\cdot f'(x)$$
Now differentials: $g'(f(x)) = g'(y) = dx/dy$, and $f'(x) = dy/dx$. We see that (5.3) holds.
But there is an additional fact. It turns out that equation (5.3) holds even if we do not have an explicit formula for the inverse function, as long as the derivatives involved in (5.3) are not 0. This fact is called the Inverse Function Theorem.⁵ Here is an applied example.
Previously, we stated that $\ln'(x) = 1/x$, but we didn’t explain where that formula came from. Now we see that it comes from the formula $\exp'(x) = \exp(x)$ and the Inverse Function Theorem. If $y = \exp(x)$, so that $x = \ln(y)$, then $dx/dy = 1/(dy/dx) = 1/\exp(x) = 1/y$.
Here is a problem that combines (5.2) and (5.3).
Problem T5.16. A particle traces a curve in the xy-plane. At time $t$, we have $x = (t^2 - 1)/(t^2 + 1)$ and $y = 2t/(t^2 + 1)$. What is the slope of the path of the particle when $t = 2$? (Slope is $dy/dx$, as always!)
Solution. We see that we can calculate $dx/dt$ and $dy/dt$. Treating differentials algebraically, and not worrying about reasoning for the moment, we might expect that
$$\frac{dy}{dx} = \frac{dy/dt}{dx/dt}$$
provided that $dx/dt \neq 0$. Compute $dy/dt$ using the Quotient Rule.
$$\frac{dy}{dt} = \frac{2(t^2+1) - 2t(2t)}{(t^2+1)^2} = \frac{2 - 2t^2}{(t^2+1)^2}$$
Similarly,
$$\frac{dx}{dt} = \frac{2t(t^2+1) - (t^2-1)2t}{(t^2+1)^2} = \frac{4t}{(t^2+1)^2}$$
Then
$$\frac{dy}{dx} = \frac{dy/dt}{dx/dt} = \frac{(2 - 2t^2)/(t^2+1)^2}{4t/(t^2+1)^2} = \frac{2 - 2t^2}{4t}$$
When $t = 2$, we would have $dy/dx = -6/8 = -3/4$. That would be the slope of the path when $t = 2$.
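As a check on this answer, here is a Python sketch that approximates $dy/dt$ and $dx/dt$ by central differences at $t = 2$ and takes their ratio; the helper names are ours. It also uses an observation of our own, not made in the text. Since $x^2 + y^2 = 1$ for every $t$, the particle moves on the unit circle, whose tangent at $(3/5, 4/5)$ has slope $-x/y = -3/4$.

```python
def x_of(t):
    return (t**2 - 1) / (t**2 + 1)


def y_of(t):
    return 2 * t / (t**2 + 1)


def d(f, t, h=1e-6):
    """Central-difference approximation of df/dt."""
    return (f(t + h) - f(t - h)) / (2 * h)


t = 2.0
slope = d(y_of, t) / d(x_of, t)
print(slope)                          # close to -3/4
print(x_of(t) ** 2 + y_of(t) ** 2)    # 1, up to rounding: the path is the unit circle
print(-x_of(t) / y_of(t))             # tangent slope -x/y of the circle at (3/5, 4/5)
```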
We worked the previous problem in a typical way; people don’t usually worry too much about justifying differential calculations. For completeness, let’s show that what we did is valid. We had $dx/dt = 4t/(t^2+1)^2$, and that’s not zero when $t = 2$. The Inverse Function Theorem then says that $t$ is a function of $x$ near that point, and (5.3) kicks in to show that
$$\frac{dt}{dx} = \frac{1}{dx/dt} = \frac{(t^2+1)^2}{4\cdot t}$$
4. Marginals
Problem T5.17. The cost of producing x units of a good is the sum of three
quantities: a constant overhead of $6000 for machinery, a production cost of
$50 per unit, and a quantity discount $1000 times the square root of x. If we
currently produce 120 units, what is the marginal cost?
Let’s interpret the answer to the previous problem, 98/3 dollars per hour. We now make 10 tables, using 30 hours of labor. If we could increase labor by one hour, our production costs would go up by roughly 98/3 dollars.
Problem T5.20. Suppose that the supply S of pencils (sold in boxes of 1000)
satisfies S = 2 + 1.3 · p, where p is the price of a box in dollars. Suppose that
the demand D for pencils satisfies D = 35 − m · p, where m is a constant.
Describe m as a marginal. If m increases, what happens to the price where
supply and demand are equal? What happens to revenue when $m = 2$ and $m$ increases?
4. MARGINALS 83
$$2 + 1.3\cdot q = 35 - m\cdot q$$
And we have
$$q = \frac{33}{1.3 + m}$$
To measure the change in q as m increases, we take the derivative
$$\frac{dq}{dm} = \bigl(33\cdot(1.3+m)^{-1}\bigr)' = -33\cdot(1.3+m)^{-2} = -\frac{33}{(1.3+m)^2}$$
Since the derivative is negative, we see that q decreases as m increases.
As for revenue, that is R = D · p. When p = q, we get this formula for R:
$$R = D\cdot q = (35 - m\cdot q)\cdot q = 35\cdot q - m\cdot q^2$$
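The formulas for q, dq/dm, and R can be checked numerically at m = 2. In this Python sketch (our illustration; the step size h is arbitrary), difference quotients stand in for derivatives, and the last line estimates how revenue responds to a small increase in m.

```python
def equilibrium_price(m):
    # Solve 2 + 1.3*q = 35 - m*q for q
    return 33 / (1.3 + m)

def dq_dm(m):
    # Derivative from the text: -33 / (1.3 + m)^2
    return -33 / (1.3 + m) ** 2

def revenue(m):
    # R = 35*q - m*q^2 at the equilibrium price q
    q = equilibrium_price(m)
    return 35 * q - m * q ** 2

m, h = 2.0, 1e-6
q = equilibrium_price(m)
print(q, 2 + 1.3 * q, 35 - m * q)           # price, supply, demand
slope_q = (equilibrium_price(m + h) - equilibrium_price(m - h)) / (2 * h)
print(dq_dm(m), slope_q)                    # formula vs. difference quotient
slope_R = (revenue(m + h) - revenue(m - h)) / (2 * h)
print(revenue(m), slope_R)                  # revenue and its rate of change in m
```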
Linear Optimization
1. Simple Examples
For our first example we will focus strictly on the mathematical details of
a typical, small problem.
Solution. The quantity Z being optimized (in this case, maximized) is called
the objective.¹ The objective in this problem is a function of the variables x, y;
those are the problem variables. The other conditions on the variables are the
constraints.
The constraints may look complicated, but they are easy to graph. To
graph the inequality 3 · x + 5 · y ≤ 15, we solve for y:
$$3\cdot x + 5\cdot y \le 15 \quad\text{yields}\quad y \le \tfrac{1}{5}\cdot(15 - 3\cdot x) = 3 - \tfrac{3}{5}\cdot x$$
The inequality y ≤ 3 − (3/5) · x refers to points on or below the line y = 3 − (3/5) · x.
Similarly, the inequality 2 · x + y ≤ 4 comes to y ≤ 4 − 2 · x, and that’s the
points on or below the line y = 4 − 2 · x. The other two constraints x, y ≥ 0 say
we are in the first quadrant. Here, to the left, is a picture of the constraints;
we included the x-intercepts of the two lines, as well as the y-intercepts.
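[Figure. Left: the lines 3 · x + 5 · y = 15, with intercepts (0, 3) and (5, 0), and 2 · x + y = 4, with intercepts (0, 4) and (2, 0). Right: the quadrilateral of points satisfying all the constraints, with the intersection point (5/7, 18/7) of the two lines marked.]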
The problem requires all the constraints to hold. That means that the
constraints define the set of points on and inside the small quadrilateral defined
by the picture; that quadrilateral is pictured to the right. We have calculated
the intersection point between the two lines. We claim that the maximum of
Z cannot occur in the interior of the quadrilateral. This is easy to see: at an interior
point, we can increase x or y, or both, and Z = 6 · x + 7 · y will increase.
¹ Notice that this use of the word objective is not the usual one. Usually, an objective is a
goal. The word objective in an optimization problem refers to the quantity being optimized,
as opposed to the desire to optimize that quantity. This possibly confusing usage has become
standard.
We also claim that the maximum of Z occurs at one of the corner points of
the quadrilateral. This will best be shown in class: briefly, along each boundary
line we can write Z as a linear function of one variable. A linear function has
a maximum at one of its endpoints – the endpoints of the boundary lines are
the corners in the picture. We will give more specific details in class.
To find the maximum of Z, we have only to calculate its values at each
corner:
point: (0, 0) (2, 0) (0, 3) (5/7, 18/7)
Z:     0      12     21     156/7
Because 156/7 ≈ 22.3, we see that the maximum of Z is 156/7, occurring at
(5/7, 18/7).
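The corner-point check is easy to automate. This Python sketch (ours, not the Solver's method) uses exact fractions to confirm that each corner satisfies the constraints and to evaluate Z there.

```python
from fractions import Fraction as F

def Z(x, y):
    # Objective from the text
    return 6 * x + 7 * y

def feasible(x, y):
    # Constraints: 3x + 5y <= 15, 2x + y <= 4, x >= 0, y >= 0
    return 3 * x + 5 * y <= 15 and 2 * x + y <= 4 and x >= 0 and y >= 0

corners = [(F(0), F(0)), (F(2), F(0)), (F(0), F(3)), (F(5, 7), F(18, 7))]
assert all(feasible(x, y) for x, y in corners)

for x, y in corners:
    print(f"({x}, {y}): Z = {Z(x, y)}")

best = max(corners, key=lambda p: Z(*p))
print("maximum", Z(*best), "at", best)
```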
In the problem just worked, the point (5/7, 18/7) gives values of the prob-
lem variables: x = 5/7 and y = 18/7. We call this a solution to the problem.
To repeat: the solution gives the values of the problem variables, not the value
of the objective. The value of the objective Z = 156/7 at the solution is the
maximum of the objective.
Here is a similar problem that we will work in class.
2.1. Mixing. There are a host of linear optimization problems that in-
volve mixing ingredients together to make something: choosing foods to con-
struct a diet, choosing metals to manufacture an alloy, etc. In the problems
we consider there are many ways to mix the ingredients – that’s probably a
surprising feature, since you are used to situations where there is one recipe
for the desired mix. (e.g. There is only one way to make a water molecule out
of hydrogen and oxygen.)
Problem T6.5. We manufacture four models of desks.² Each desk is constructed in the carpentry shop and then sent to the finishing shop. The number of hours of labor required for each model at each stage is given in the table
below. We also list the per desk profit realized from the sale of each model.
How many desks of each type should we make to maximize profit, if we have
up to 6000 hours available for carpentry and 4000 for finishing? (Note: the
mix here is the mix of numbers of models scheduled to be constructed.)
                    model 1   model 2   model 3   model 4
carpentry (hours)       4         9         7        10
finishing (hours)     1.5       1.5         3        40
unit profit            12        20        18        40
Solution. The objective is profit Q, and it is a function of the number of each
desk model we make. Thus, we have four variables: M1 , M2 , M3 , M4 giving us
the number of each model made. Then profit is easily computed:
Q = 12 · M1 + 20 · M2 + 18 · M3 + 40 · M4
² Modified from an example in [12, p. 50].
We might wonder about the decimal answer for M4 in the last problem.
Can we make fractional numbers of desks? In general, it is a significant compli-
cation to require integer values in an optimization problem. The Solver allows
us to constrain some or all of the variables to be integers, and, if the problem
is not too large, then the Solver will find the constrained solution. However,
for larger problems the Solver can work for a long time without getting a
solution, and there’s no way to tell in advance how long it will take. When
we constrain the desk problem variables to be integers, we get M1 = 1380 and
M4 = 48 and M2 = M3 = 0, and the maximum of Q is 18480. Note that
this solution is not obtained merely by rounding the decimal values in the first
solution – that just does not work in general. In this text, we do not want to
introduce the many nuances possible when variables have to be integers, and
so we will allow decimal answers without worrying. But you should be aware
that there is a major issue here. We also remind you that we are dealing with
models, and the answer, decimal or integer, might be an approximation in any
case.
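The gap between the decimal and integer answers can be seen in a small Python sketch. It is our own illustration: instead of the Solver, it enumerates the corner points of the desk problem by brute force (keeping two nonzero variables among M1, M2, M3, M4 and two slack variables s1, s2 that measure unused carpentry and finishing hours), which is practical only for tiny problems. It then rounds the decimal solution and compares it with the integer solution M1 = 1380, M4 = 48 quoted above.

```python
from fractions import Fraction as F
from itertools import combinations

# Columns: M1, M2, M3, M4, then slacks s1 (unused carpentry), s2 (unused finishing)
A = [[F(4), F(9), F(7), F(10), F(1), F(0)],
     [F(3, 2), F(3, 2), F(3), F(40), F(0), F(1)]]
b = [F(6000), F(4000)]
profit = [12, 20, 18, 40, 0, 0]

def corner(i, j):
    """Solve the two constraint equations using only columns i and j."""
    det = A[0][i] * A[1][j] - A[0][j] * A[1][i]
    if det == 0:
        return None
    vi = (b[0] * A[1][j] - A[0][j] * b[1]) / det      # Cramer's rule
    vj = (A[0][i] * b[1] - b[0] * A[1][i]) / det
    if vi < 0 or vj < 0:
        return None                                    # outside the feasible region
    v = [F(0)] * 6
    v[i], v[j] = vi, vj
    return v

def Q(m):
    """Profit of a desk mix m = (M1, M2, M3, M4); slack entries earn nothing."""
    return sum(p * x for p, x in zip(profit, m))

def fits(m):
    """True if the mix respects both the carpentry and finishing limits."""
    return all(sum(a * x for a, x in zip(row, m)) <= limit
               for row, limit in zip(A, b))

corners = [v for i, j in combinations(range(6), 2) if (v := corner(i, j))]
lp = max(corners, key=Q)[:4]            # decimal (fractional) solution
rounded = [round(x) for x in lp]
solver_integer = [1380, 0, 0, 48]       # integer answer quoted in the text

for name, mix in [("LP", lp), ("rounded", rounded), ("integer", solver_integer)]:
    print(name, [float(x) for x in mix], fits(mix), float(Q(mix)))
```

In this sketch the rounded mix is feasible but earns less than the integer solution from the text.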
The coefficients come from the cost table. We have omitted some of the terms;
there are 12 of them altogether!
³ Shipping costs can vary due to many factors, such as distance.
As for constraints, each store needs to receive the correct number of pianos.
Store 1 needs 60 pianos, and so
There is a similar constraint for the other three stores. Also, each warehouse
can send no more pianos than it has in stock. Warehouse A has 100 pianos,
and so
Notice the inequality here. There are 380 pianos, total, at the warehouses, and
the stores want a total of 350, so not all pianos will be shipped. We get three
warehouse constraints. Finally, all the variables are non-negative, since we
cannot ship negative numbers of pianos!
In the Solver, the variables will be laid out in a 4 × 3 table, corresponding
to the shipping cost table, as we will see in class. The minimum cost is
Z = 4980.
Problem T6.6. Over a four-week period, each week we have workers
and trainees. It takes one week to turn a trainee into a worker. Workers are
divided into producers (making a product) and those who are idle. Here is a
table of revenue produced for each producer, the cost per idle worker, and the
cost per trainee. There is a maximum number of producers for each week in
the table, as well. Assuming we begin with 10 workers, figure out how many
producers, idle, and trainees we need each week to maximize total revenue.
week: 1 2 3 4
producer (unit revenue) 12 10 10 12
idle (unit cost) 5 4 2 4
trainee (unit cost) 6 7 4 -
maximum producers 8 30 25 40
Solution. As we have been advocating, we turn everything into a variable!
We’ll have workers W_n during weeks n = 1, 2, 3, 4, and trainees T_n during weeks
1, 2, 3. (Because there are only four weeks, it doesn’t make sense to pay for
trainees during week 4.) We’ll have producers P_n for weeks 1, 2, 3, 4 and idle
workers I_n for weeks 1, 2, 3, 4. The objective R is revenue: production minus costs.
R = 12 · P1 + 10 · P2 + 10 · P3 + 12 · P4
− 5 · I1 − 4 · I2 − 2 · I3 − 4 · I4
− 6 · T1 − 7 · T2 − 4 · T3
Even though this is complicated, it is just a SUMPRODUCT expression in Excel.
We want to maximize R as a function of all these variables.
For constraints, we just read through the problem carefully. Trainees turn
into workers the next week. Thus,
W2 = W1 + T1 W3 = W2 + T2 W4 = W3 + T3
These equations are called time sequence equations, since they show how the
variables step through the time intervals. Next, workers are divided into pro-
ducers and idle.
W1 = P1 + I1 W2 = P2 + I2
W3 = P3 + I3 W4 = P4 + I4
These equations are sometimes called material balance equations, since they
show how various groups are divided up – they provide a kind of accounting
of the workers. It is not crucial to use the terms time sequence equations
and material balance equations, but we think they help in setting up the more
complicated problems. Time sequence equations show how to get from one
time step to the next one; material balance equations show how items are
categorized at a single time step.
There is a maximum number of producers each week:
P1 ≤ 8 P2 ≤ 30 P3 ≤ 25 P4 ≤ 40
And we begin with W1 = 10. As usual, all the variables are non-negative, and
we have our problem set-up!
In class, we will describe using Excel to solve this problem. Here is a table
showing numbers of workers each week.
week: 1 2 3 4
workers: 10 30 30 40
Our solution shows that we will be training workers during weeks 1 and 3, and
we will have idle workers during those weeks, as well.
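For a problem this small, we can also check the table by brute force. The Python sketch below is our own illustration, not the Excel method described in class. It tries every whole number of trainees T1, T2, T3 from 0 to 40, uses the time sequence equations to get the workers, and assumes each week's producers are the smaller of W_n and that week's maximum; that choice is best for a fixed workforce, since a producer earns revenue while an idle worker costs money.

```python
from itertools import product

revenue_per_producer = [12, 10, 10, 12]
idle_cost = [5, 4, 2, 4]
trainee_cost = [6, 7, 4]          # no trainees in week 4
max_producers = [8, 30, 25, 40]

def plan(trainees):
    """Return (R, workers) for trainee numbers (T1, T2, T3), starting from W1 = 10."""
    workers = [10]
    for t in trainees:                         # time sequence: W(n+1) = W(n) + T(n)
        workers.append(workers[-1] + t)
    R = 0
    for w, cap, rev, idle in zip(workers, max_producers,
                                 revenue_per_producer, idle_cost):
        producers = min(w, cap)                # material balance: W = P + I
        R += rev * producers - idle * (w - producers)
    R -= sum(c * t for c, t in zip(trainee_cost, trainees))
    return R, workers

best = max(product(range(41), repeat=3), key=lambda ts: plan(ts)[0])
print("trainees:", best)
print("revenue and workers:", plan(best))
```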
The solution to this problem occurs where the two lines meet; we solve
those equations using our parameter A.
$$3\cdot x + 5\cdot y = A \ \text{ and } \ 2\cdot x + y = 4 \quad\text{gives}\quad x = \frac{20 - A}{7},\quad y = \frac{2\cdot A - 12}{7}$$
This solution gives maximum objective
$$\bar{Z} = \frac{8\cdot A + 36}{7}$$
We have used the notation Z̄ to distinguish the maximum value of the objective
from Z as a function of x, y. Notice that Z̄ is a function of the parameter A.
When A = 15, we get the solution originally obtained: x = 5/7 and y = 18/7
and Z̄ = 156/7.
It is easy to use our formula for Z̄ to get information about ∆Z̄. Since Z̄
is a linear function of the parameter A, we see that
$$\frac{\Delta\bar{Z}}{\Delta A} = \frac{d\bar{Z}}{dA} = \frac{8}{7}$$
If A increases from 15 to 16, so that ∆A = 1, then
$$\frac{8}{7} = \frac{d\bar{Z}}{dA} = \frac{\Delta\bar{Z}}{\Delta A} = \Delta\bar{Z}$$
We expect the maximum objective to increase by 8/7.
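Because Z̄ is given by a formula in A, the change in Z̄ is easy to compute exactly. The Python sketch below (our illustration) solves the two equations for a given A using fractions; it is only meaningful while the corner where the two lines meet remains the optimal corner.

```python
from fractions import Fraction as F

def optimum(A):
    """Corner where 3x + 5y = A meets 2x + y = 4; returns (x, y, Zbar)."""
    x = (20 - F(A)) / 7          # from substituting y = 4 - 2x
    y = 4 - 2 * x
    return x, y, 6 * x + 7 * y   # Zbar = Z evaluated at the corner

for A in (15, 16):
    x, y, Zbar = optimum(A)
    print(f"A = {A}: x = {x}, y = {y}, Zbar = {Zbar}")
print("change in Zbar:", optimum(16)[2] - optimum(15)[2])
```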
Let’s call attention to the shift in point of view here. While we are working
the optimization problem the objective Z is a function of the problem variables
x, y, and A is the constant 15. In other words, A is fixed while we do the
optimization problem. Once we do the problem, the maximum objective Z̄ is
a function of the parameter A, without the problem variables.
The derivative dZ̄/dA = 8/7 is a special case of the derivative of the
optimal objective with respect to a parameter. This derivative is identified
by several technical terms, for it arises in several contexts. Mathematicians
and practitioners of some other disciplines call it a Lagrange multiplier. In
economics problems, it may be called a shadow price – we will explain that
term momentarily. People also say that the derivative measures the sensitivity
of the maximum with respect to the parameter A. We will use the term
Lagrange multiplier, while nodding to other terms as they make sense.
Here is a typical applied problem.
Problem T6.7. We make (cheap) chairs and tables. It costs $10 to make
each chair, and $50 to make each table, and we can spend up to $500 to make
them. Each chair takes 3 hours of labor and each table 2 hours; we have 60
hours of labor available. We get $22 profit from each chair and $35 from each
table. How many chairs and tables should we make to maximize profit? What
would it be worth to us to increase the available hours of labor by 1?
Solution. Let x be the number of chairs we make, and let y be the number
of tables. Cost: 10 · x + 50 · y ≤ 500. Labor: 3 · x + 2 · y ≤ 60. Also, x, y ≥ 0.
The objective is profit P = 22 · x + 35 · y.
Let’s think about the second question: the question about what happens if
the amount of labor changes. The amount of labor L is a parameter, currently
60. We see that the objective profit P is a function of the variables x, y;
the profit does not mention the parameter L. The maximum value P̄ of P ,
however, will be a function of L. The Lagrange multiplier would be dP̄ /dL.
The Solver finds the solution
x = 15.4, y = 6.9, so that P̄ = 580.8
When the Solver reports that it has found a solution, the right side of the
dialogue box shows three options: Answer, Sensitivity, and Limits. If
you select Sensitivity, then a new worksheet will be produced that shows
the Lagrange multipliers (or it may call them shadow prices) – one multiplier
for each constraint. Write L for the amount of labor – currently 60 hours. The
Lagrange multiplier for L goes with the constraint 3 · x + 2 · y ≤ L. Excel
calculated the Lagrange multiplier for the labor constraint to be 5.8. Thus,
$$\frac{d\bar{P}}{dL} = 5.8$$
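The multiplier reported by Excel can be reproduced by hand, assuming (as the Solver's answer suggests) that the optimal corner is where the cost line 10 · x + 50 · y = 500 meets the labor line 3 · x + 2 · y = L. Solving those two equations gives y = (150 − L)/13 and x = (5 · L − 100)/13, so P̄ is linear in L. The Python sketch below (ours) computes P̄ exactly with fractions.

```python
from fractions import Fraction as F

def best_profit(L):
    """Corner where 10x + 50y = 500 (cost) meets 3x + 2y = L (labor)."""
    y = (150 - F(L)) / 13        # substitute x = 50 - 5y into 3x + 2y = L
    x = 50 - 5 * y
    return x, y, 22 * x + 35 * y

x, y, P = best_profit(60)
print(float(x), float(y), float(P))      # about 15.4, 6.9, 580.8
multiplier = best_profit(61)[2] - best_profit(60)[2]
print(multiplier, float(multiplier))     # 75/13, about 5.77
```

Since P̄ = (75 · L + 3050)/13 while this corner stays optimal, dP̄/dL = 75/13 ≈ 5.77, which rounds to the 5.8 reported by Excel.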
The Integral
1. Antiderivatives
Taking k = 6 as a specific example, both 3x^2 − ln(x) and 3x^2 − ln(x) + 6 are
antiderivatives of 6x − 1/x.
It turns out that all the antiderivatives of a given function can be described
in this way: find a particular antiderivative, and the rest of them are obtained
by adding constants.¹ Thus, since (x^3)' = 3x^2, we have
$$\text{If } f'(x) = 3x^2 \text{ then } f(x) = x^3 + k \text{ for some constant } k$$
We introduce the most common notation for the antiderivative. We will
concentrate on using this notation; it will be explained more fully in the next
section. We write
$$\int f(x)\cdot dx$$
for an antiderivative of f(x), also called the indefinite integral of f(x). The
∫ is the integral sign; it originated as a German S, standing for Summe, recognizable as the German word for sum. The function f(x) is the integrand,
and the differential dx indicates the variable. Since (x^2)' = 2 · x, we write
$$x^2 = \int 2\cdot x\cdot dx \qquad (7.1)$$
As we saw before, different functions can have the same derivative; that’s why
the indefinite integral is indefinite! We will see that this ambiguity is an advantage
in this setting. Some texts write
$$\int 2\cdot x\cdot dx = x^2 + k \quad\text{where } k \text{ is a constant}$$
This point of view regards the indefinite integral as standing for all antideriva-
tives of 2 · x simultaneously. Our preference is to use the indefinite integral
¹ This fact is a consequence of the Mean Value Theorem. You might look up a statement
of that result in a standard calculus text. When the Mean Value Theorem is expressed in
terms of the secant line on a graph, it seems obvious to most people.
to stand for some antiderivative rather than all of them. The thing to remem-
ber is that if you have one antiderivative of a function, you can get the rest of
them by adding constants.
Each rule for calculating derivatives can be turned into a rule for calculating
antiderivatives. We will begin with the power rule.
Explanation. Remember that ln(a) is defined only when a > 0. When x > 0,
we know that
$$\ln'(x) = \frac{1}{x} \quad\text{and so}\quad \ln(x) = \int\frac{dx}{x} \quad\text{when } x > 0$$
We will need to have an antiderivative for 1/x in the case that x < 0, as well.
In that case, notice that −x > 0, and so ln(−x) is defined. Use the Chain
Rule to compute:
$$\bigl(\ln(-x)\bigr)' = \ln'(-x)\cdot(-x)' = \frac{1}{-x}\cdot(-1) = \frac{1}{x}$$
Thus,
$$\ln(-x) = \int\frac{dx}{x} \quad\text{when } x < 0$$
When x < 0, we can write −x = |x|; when x > 0, we have x = |x|, so we
have the Logarithm Rule in both cases.
Don’t forget the absolute value in ln |x|. We will need to allow x to be
negative, and the logarithm of a negative number is undefined, but |x| is always positive
if x ≠ 0, and so ln |x| is defined.
The exponential function is its own derivative, and so it is its own anti-
derivative.
Recall the Addition Rule and Constant Multiple Rule for derivatives:
$$\bigl(f(x) + g(x)\bigr)' = f'(x) + g'(x) \quad\text{and}\quad \bigl(c\cdot f(x)\bigr)' = c\cdot f'(x)$$
where f (x) and g(x) are differentiable, and c is a constant. It follows that
we have an Addition Rule and a Constant Multiple Rule for the indefinite
integral.
Addition Rule, Constant Multiple Rule. If f(x) and g(x) are differentiable, and if c is a constant, then
$$\int \bigl(f(x) + g(x)\bigr)\cdot dx = \int f(x)\cdot dx + \int g(x)\cdot dx \qquad (7.3)$$
and
$$\int c\cdot f(x)\cdot dx = c\cdot\int f(x)\cdot dx \qquad (7.4)$$
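Example. We compute
$$\begin{aligned}
\int\Bigl(2\cdot e^x + \frac{5}{x^3}\Bigr)\cdot dx &= \int 2\cdot e^x\cdot dx + \int\frac{5}{x^3}\cdot dx && \text{addition rule}\\
&= 2\cdot\int e^x\cdot dx + 5\cdot\int x^{-3}\cdot dx && \text{constant multiple}\\
&= 2\cdot e^x + 5\cdot\frac{x^{-2}}{-2} && \text{exp, inverse power}\\
&= 2\cdot e^x - \frac{5}{2x^2}
\end{aligned}$$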
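Any antiderivative can be checked by differentiating it. Here is a numerical version of that check in Python (a sketch of ours; the sample points and step size are arbitrary), comparing a central difference of 2 · e^x − 5/(2x^2) with the integrand 2 · e^x + 5/x^3. The points avoid x = 0, where neither function is defined.

```python
import math

def antiderivative(x):
    # Result of the example: 2e^x - 5/(2x^2)
    return 2 * math.exp(x) - 5 / (2 * x**2)

def integrand(x):
    return 2 * math.exp(x) + 5 / x**3

def central_difference(f, x, h=1e-5):
    # Approximates f'(x) by the slope of a short secant line centered at x
    return (f(x + h) - f(x - h)) / (2 * h)

for x in (-2.0, 0.5, 1.0, 3.0):
    print(x, central_difference(antiderivative, x), integrand(x))
```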
In the following problem, pay attention to the use of the unknown constant
K.
Problem T7.1. A steel bar 2 feet long has density 10x2 − x3 + 2ex pounds
per foot at the point x feet from one end of the bar. What is the weight of
the bar?
Solution. Measure density and weight W(x) from the left-hand end of the bar.
Then W'(x) is the density, so that W'(x) = 10x^2 − x^3 + 2e^x, and so
$$W(x) = \int\bigl(10x^2 - x^3 + 2e^x\bigr)\cdot dx$$
Let’s use the various rules on the right side, and then we’ll bring W(x) back
into the picture. Make sure you understand which rules are being used in each
line of the following.
$$\begin{aligned}
\int\bigl(10x^2 - x^3 + 2e^x\bigr)\cdot dx &= \int 10x^2\cdot dx - \int x^3\cdot dx + \int 2e^x\cdot dx\\
&= 10\cdot\int x^2\cdot dx - \int x^3\cdot dx + 2\cdot\int e^x\cdot dx\\
&= 10\cdot\frac{x^3}{3} - \frac{x^4}{4} + 2\cdot e^x = \frac{10}{3}x^3 - \frac{1}{4}x^4 + 2e^x
\end{aligned}$$
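To finish the computation numerically, here is a Python sketch under the assumption (ours, matching the setup) that W(0) = 0, since no part of the bar lies to the left of the end where we measure. Then W(x) = F(x) + K with K = −F(0) = −2, and the weight of the 2-foot bar is W(2). As a cross-check, the sketch also adds up density times width over many thin slices of the bar.

```python
import math

def density(x):
    # pounds per foot, x feet from the end of the bar
    return 10 * x**2 - x**3 + 2 * math.exp(x)

def F(x):
    # Antiderivative found above: (10/3)x^3 - (1/4)x^4 + 2e^x
    return 10 / 3 * x**3 - x**4 / 4 + 2 * math.exp(x)

K = -F(0)                 # assumption W(0) = 0 gives K = -2
weight = F(2) + K         # W(2), the weight of the whole bar in pounds
print(K, weight)

# Cross-check: add density * width over many thin slices (midpoints)
n = 100_000
width = 2 / n
approx = sum(density((j + 0.5) * width) for j in range(n)) * width
print(approx)
```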
In the previous problem, we used the antiderivative rules to find the indef-
inite integral of the density. The weight function is a definite antiderivative.
The constant K synchronizes the indefinite antiderivative to the definite one
of the problem.
Sometimes we are asked to verify an antiderivative formula: that’s just
asking us to take the derivative.
Solution. As we said, we are just being asked to take the derivative; we need
to remember that (ln |x|)' = 1/x whether x > 0 or x < 0.
Compute
$$\bigl(x + 7\cdot\ln|x - 7|\bigr)' = 1 + \frac{7}{x - 7}$$
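The same check can be done numerically. This Python sketch (our illustration) compares a central difference of x + 7 · ln|x − 7| with 1 + 7/(x − 7) at points on both sides of x = 7, where the logarithm is undefined.

```python
import math

def G(x):
    return x + 7 * math.log(abs(x - 7))

def claimed_derivative(x):
    return 1 + 7 / (x - 7)

def central_difference(f, x, h=1e-6):
    # Approximates f'(x) by the slope of a short secant line centered at x
    return (f(x + h) - f(x - h)) / (2 * h)

# Points on both sides of x = 7, where ln|x - 7| is undefined
for x in (-3.0, 2.0, 6.5, 7.5, 12.0):
    print(x, central_difference(G, x), claimed_derivative(x))
```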
The Linear Inside Rule is somewhat abstract. If we ignore the linear term
ax + b, we see the pattern ∫ f'(x) · dx = f(x). The Rule says to imitate this
pattern, carrying the ax + b along the way we do in the Chain Rule. But we
modify the antiderivative with the fraction 1/a, dividing by the slope of the
linear term ax + b. Let’s see some uses of this rule.
Problem T7.3. Compute each of the following.
$$\text{(a) } \int (3x - 2)^4\cdot dx \qquad \text{(b) } \int \exp(-2t)\cdot dt \qquad \text{(c) } \int \sqrt{3y + 1}\cdot dy$$
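Answers obtained with the Linear Inside Rule can be checked by differentiation. In the Python sketch below, the candidate antiderivatives are our own worked answers, (3x − 2)^5/15, −exp(−2t)/2, and (2/9) · (3y + 1)^(3/2), each obtained by dividing the usual antiderivative by the slope of the linear term; the code compares their numerical derivatives with the integrands at a few sample points.

```python
import math

def central_difference(f, x, h=1e-6):
    # Approximates f'(x) by the slope of a short secant line centered at x
    return (f(x + h) - f(x - h)) / (2 * h)

# (integrand, candidate antiderivative) for parts (a), (b), (c)
cases = {
    "a": (lambda x: (3 * x - 2) ** 4, lambda x: (3 * x - 2) ** 5 / 15),
    "b": (lambda t: math.exp(-2 * t), lambda t: -math.exp(-2 * t) / 2),
    "c": (lambda y: math.sqrt(3 * y + 1), lambda y: 2 / 9 * (3 * y + 1) ** 1.5),
}

for part, (f, F) in cases.items():
    for x in (0.0, 0.7, 2.0):
        print(part, x, central_difference(F, x), f(x))
```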
For many integrals, there is more than one way to get the answer. For
instance, consider
$$\int (5 - 2\cdot t)\cdot dt$$
We introduce the definite integral, one of the most important and commonly occurring mathematical objects in all of science. We will discuss the
definition and computation of the integral, and we will give the two main ways
it occurs in applications.
There are many approaches to this material. We want to start with the
computation of the definite integral. That approach can be presented by build-
ing on the indefinite integral studied in the previous section.
First, we remind you about the closed interval notation. For instance,
[1, 4] stands for the set of all numbers x with 1 ≤ x ≤ 4. In general, if a, b are
numbers with a ≤ b, the closed interval [a, b] is the set of all numbers x with
a ≤ x ≤ b.
Second, we introduce notation that will help us evaluate functions at var-
ious values. Given F (x) and numbers a, b, we define
$$F(x)\Big|_a^b = F(b) - F(a)$$
Plug in the top number first, then subtract what you get when you plug in
the bottom number. Thus,
$$x^2\Big|_2^3 = 3^2 - 2^2 = 5 \qquad\text{and}\qquad \exp(x/2)\Big|_0^2 = \exp(1) - \exp(0) = e - 1$$
You might recognize that $F(x)\big|_a^b$ is ∆F. Hold that thought.
You have seen the indefinite integral
$$\int f(x)\cdot dx$$
As in the indefinite integral, the function f (x) is called the integrand and
the differential dx indicates the variable x for which f (x) is a function. The
number a is the lower limit of integration, and the number b is the upper limit
of integration.
We have many things to say about the definite integral; we begin with how
it is computed in most cases. First, find an antiderivative² F(x) for f(x) on
[a, b]. Then,
$$\int_a^b f(x)\cdot dx = F(x)\Big|_a^b = F(b) - F(a)$$
We see that the constant 5 cancels! This is what happens in general, for
suppose we have an antiderivative F (x) for f (x), so that F 0 (x) = f (x). In
Section 1 we said that any other antiderivative for f (x) is of the form F (x)+K
where K is a constant. Using F (x) + K in place of F (x) in the definition of
the definite integral we get
$$\begin{aligned}
\int_a^b f(x)\cdot dx &= \bigl(F(x) + K\bigr)\Big|_a^b\\
&= \bigl(F(b) + K\bigr) - \bigl(F(a) + K\bigr)\\
&= F(b) + K - F(a) - K = F(b) - F(a)
\end{aligned}$$
Thus, we compute the same value for the definite integral no matter which
antiderivative is used.
We see that the value of the definite integral is ∆F, where F'(x) = f(x)
is the integrand. That leads to many applications. Here is an example.
Problem T7.4. An object is moving along the x-axis with velocity v(t) =
5 − 2 · t^2, for t ≥ 0. What is the displacement as t goes from 1 to 3?
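For the displacement, any antiderivative of v(t) will do, for example F(t) = 5 · t − (2/3) · t^3. This Python sketch (our own worked computation, not given in the text) evaluates ∆F = F(3) − F(1) exactly and cross-checks it with a Riemann sum of the kind introduced in the next section; a negative displacement means the object ends up to the left of where it was at t = 1.

```python
from fractions import Fraction as Fr

def v(t):
    return 5 - 2 * t**2

def F(t):
    # One antiderivative of v(t): 5t - (2/3)t^3
    return 5 * t - Fr(2, 3) * t**3

displacement = F(3) - F(1)
print(displacement, float(displacement))

# Cross-check: right-endpoint Riemann sum with many subintervals
n = 100_000
width = 2 / n
approx = sum(v(1 + j * width) for j in range(1, n + 1)) * width
print(approx)
```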
3. Riemann Sums
We have computed the definite integral from an antiderivative for the inte-
grand. There are many functions for which there is no obvious formula for the
antiderivative – for instance, integrals involving the function exp(−x^2) come
up often in statistics. How are we to compute the definite integral when we
can’t find an antiderivative? If you accept the statement we made to the effect
that the integral occurs often, you might find this question compelling.
There is another problem with our approach to the integral. Many applications of the integral do not involve an antiderivative in any obvious way.³
The link from those applications to the integral involves looking at the inte-
gral differently. It turns out that this alternative point of view answers the
question posed in the last paragraph about the existence of antiderivatives.
Let’s introduce this new idea via a very specific example. We need a
function defined on a closed interval; let’s use x^2 − 3 on [1, 3]. Now we divide
the interval [1, 3] into some number of equal subintervals. Let’s divide it into
5 subintervals; the length of [1, 3] is 3 − 1 = 2, and so each of the 5 pieces will
have length 2/5 = 0.4. Here are the subintervals:
[1, 1.4], [1.4, 1.8], [1.8, 2.2], [2.2, 2.6], [2.6, 3]
Next we add up the function values at the right-hand endpoint of each
subinterval. Our function was x^2 − 3, and here are the values:
x:          1.4    1.8    2.2    2.6    3
x^2 − 3:  −1.04   0.24   1.84   3.76    6
And their sum:
−1.04 + 0.24 + 1.84 + 3.76 + 6 = 10.8
Finally, we multiply by the width of the subintervals; in this case, the width
is 0.4, and 0.4 · 10.8 = 4.32. The product of the sum and width is called a Riemann sum.
A single Riemann sum is not impressive, and the calculation of specific examples dissolves into arithmetic. What is interesting is what happens as the
number of intervals used gets larger and larger. Here are the Riemann sums
for x^2 − 3 on [1, 3] when we divide up the interval into n pieces for various n.
(In class we will explain how the sums were computed; the answers have been
rounded.)
n:              5      10     100    1000    10^6
Riemann sum:  4.32   3.48   2.75   2.675   2.667
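The table can be reproduced with a few lines of Python. This sketch (ours; the class computation may differ) uses right endpoints, as in the hand computation with n = 5.

```python
def riemann_sum(f, a, b, n):
    """Right-endpoint Riemann sum R(f(x), a, b, n), as in the hand computation."""
    width = (b - a) / n
    return width * sum(f(a + j * width) for j in range(1, n + 1))

def f(x):
    return x**2 - 3

for n in (5, 10, 100, 1000, 10**6):
    print(n, riemann_sum(f, 1, 3, n))
```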
Our table shows only a few sums, but as the number n of subintervals gets
larger and larger, the Riemann sums get closer and closer to (about) 2.667.
That number turns out to be the value of the definite integral using the same
function x^2 − 3 and the same interval [1, 3]:
$$\int_1^3 (x^2 - 3)\cdot dx = \Bigl(\frac{x^3}{3} - 3\cdot x\Bigr)\Big|_1^3 = 0 - \Bigl(-\frac{8}{3}\Bigr) = \frac{8}{3} \approx 2.67$$
It turns out that Riemann sums can be made close to the definite integral by
using more and more subintervals.
Let’s describe what we just did more generally. We are given a function
f (x) defined on a closed interval [a, b]. Given a positive integer n, there is a
Riemann sum which we will denote⁴ R(f(x), a, b, n). The sum is formed by
dividing the interval [a, b] into n equal subintervals, each of width (b − a)/n.
Denote the endpoints of the subintervals like this.
$$a = x_0 < x_1 < x_2 < \cdots < x_{n-1} < x_n = b \qquad (7.11)$$
We write a = x_0 and b = x_n to make the notation for the subintervals more
uniform.⁵ Here are the subintervals:
$$[x_0, x_1],\ [x_1, x_2],\ [x_2, x_3],\ \ldots,\ [x_{n-1}, x_n]$$
Here is the Riemann sum:
$$R(f(x), a, b, n) = \frac{b - a}{n}\cdot\bigl[f(x_1) + f(x_2) + f(x_3) + \cdots + f(x_n)\bigr] \qquad (7.12)$$
If f (x) is differentiable, then it turns out that as n gets larger and larger,
the sums approach a specific number; this number is the definite integral. In
notation
$$\lim_{n\to\infty} R(f(x), a, b, n) = \int_a^b f(x)\cdot dx \qquad (7.13)$$
The notation n → ∞ simply means that n gets larger and larger, without
bound. We are saying that the larger n gets, the closer the Riemann sums get
to the definite integral.
In our Riemann sums, the terms f(x_j) evaluate the function f(x) at the
right endpoint of each of the subintervals into which the main interval is partitioned. It is just as acceptable to use the left endpoints, or, in fact, any
point in each of the subintervals. The points can be chosen systematically or
randomly, as long as there is a term f(c_j) for some point c_j in the j-th interval.
Computer programs that calculate Riemann sums numerically use a variety of
schemes. And those schemes are important, since one use of (7.13) is to give
⁴ There is no universally used notation for Riemann sums. The notation we introduce
will be used only in this course.
⁵ We won’t need a precise formula for the x_j, although, in case you are interested,
x_j = a + j · (b − a)/n.
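To illustrate the remark about sample points, here is a Python sketch (the rule names are our own labels) comparing left-endpoint, right-endpoint, and midpoint sums for x^2 − 3 on [1, 3] with n = 100. Because this function is increasing on [1, 3], the left sums come out too small and the right sums too large; in this example, the midpoint sum is much closer to 8/3.

```python
def riemann_sum(f, a, b, n, rule="right"):
    """Riemann sum with n equal subintervals, sampling each subinterval at its
    left endpoint, right endpoint, or midpoint."""
    offset = {"left": 0.0, "midpoint": 0.5, "right": 1.0}[rule]
    width = (b - a) / n
    return width * sum(f(a + (j + offset) * width) for j in range(n))

def f(x):
    return x**2 - 3

exact = 8 / 3   # the definite integral of x^2 - 3 over [1, 3]
sums = {rule: riemann_sum(f, 1, 3, 100, rule) for rule in ("left", "right", "midpoint")}
for rule, s in sums.items():
    print(f"{rule:>8}: {s:.6f}  error {abs(s - exact):.6f}")
```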
Solution. We need to pick various n and form the Riemann sum (7.12). The
problem doesn’t tell us how to do this. We’ll start with n = 8 and compute
the sum by hand. When the interval [0, 6] is divided into 8 subintervals, each
has width 6/8 = 0.75. Here are the endpoints, as in (7.11).
0 < 0.75 < 1.5 < 2.25 < 3 < 3.75 < 4.5 < 5.25 < 6
1. Anti-Rates
Solution. For the first question, we notice that if the population is P(t), then
the rate of growth is P'(t). Thus,
$$P'(t) = 10\cdot(10 - t)$$
2. Area
Let f (x) be a differentiable and non-negative function on the interval [a, b].
Let R be the plane figure² bounded by the x-axis, the curve y = f(x), the line
x = a, and the line x = b. Here is a generic picture of R.
² A plane figure is nothing more than a set of points in the plane.
[Figure: the region R, bounded above by the curve y = f(x), below by the x-axis, and on the sides by the vertical lines x = a and x = b.]
[Figure: the area A(z) under y = f(x) from a to z, and the thin strip of additional area ∆A between z and z + ∆z; the point (z, f(z)) is marked on the curve, and a and b are marked on the x-axis.]
The reason for involving the derivative of A is that (8.2) shows that
$$A'(z) = \frac{1}{z} \quad\text{and so}\quad A'(x) = \frac{1}{x}$$
changing variable from z to x. Thus, (8.3) shows that
$$\text{area} = \int_1^3 \frac{1}{x}\cdot dx$$
In the solution just done, we emphasized the use of Newton’s function A(z).
The point of that function was to obtain the equation (8.1) and calculate the
area by integration. Going forward, we will use (8.1) without appealing to the
apparatus of Newton’s area function.
Because ln |x| is an antiderivative of 1/x, we see that areas under y = 1/x
are logarithms. Perhaps that is an unexpected interpretation of the logarithm.
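Since ln|x| is an antiderivative of 1/x, the area above is ln(3) − ln(1) = ln 3. A quick numerical check in Python (a sketch; the midpoint rule and the number of subintervals are our choices):

```python
import math

def midpoint_sum(f, a, b, n):
    # Riemann sum sampling each of n equal subintervals at its midpoint
    width = (b - a) / n
    return width * sum(f(a + (j + 0.5) * width) for j in range(n))

area = midpoint_sum(lambda x: 1 / x, 1, 3, 10_000)
print(area, math.log(3))   # both about 1.0986
```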
Even though we don’t have a formula for R(x), we will use it to good effect.
We have
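$$\pi = R(x)\Big|_{-1}^{1} = R(1) - R(-1) \qquad (8.4)$$
For r > 0, use the Chain Rule to compute
$$\begin{aligned}
\frac{d}{dx}R\Bigl(\frac{x}{r}\Bigr) &= R'\Bigl(\frac{x}{r}\Bigr)\cdot\frac{1}{r} = 2\cdot\sqrt{1 - \frac{x^2}{r^2}}\cdot\frac{1}{r}\\
&= 2\cdot\sqrt{\frac{r^2 - x^2}{r^2}}\cdot\frac{1}{r} = 2\cdot\frac{\sqrt{r^2 - x^2}}{r}\cdot\frac{1}{r} = \frac{2}{r^2}\cdot\sqrt{r^2 - x^2}
\end{aligned}$$
This shows us that
$$r^2\cdot R\Bigl(\frac{x}{r}\Bigr) = 2\cdot\int\sqrt{r^2 - x^2}\cdot dx$$
and so the area of a circle of radius r is this:
$$\begin{aligned}
\int_{-r}^{r} 2\cdot\sqrt{r^2 - x^2}\cdot dx &= r^2\cdot R\Bigl(\frac{x}{r}\Bigr)\Big|_{-r}^{r}\\
&= r^2\cdot\bigl(R(1) - R(-1)\bigr)
\end{aligned}$$
Looking at (8.4), we see that the area of the circle of radius r is π · r^2.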
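The formula π · r^2 can also be checked numerically by approximating the integral of 2 · √(r^2 − x^2) from −r to r with midpoint sums. This Python sketch is ours; the radii and number of subintervals are arbitrary.

```python
import math

def midpoint_sum(f, a, b, n):
    # Riemann sum sampling each of n equal subintervals at its midpoint
    width = (b - a) / n
    return width * sum(f(a + (j + 0.5) * width) for j in range(n))

def circle_area(r, n=100_000):
    # Integral of 2*sqrt(r^2 - x^2) from -r to r
    return midpoint_sum(lambda x: 2 * math.sqrt(r * r - x * x), -r, r, n)

for r in (0.5, 1.0, 3.0):
    print(r, circle_area(r), math.pi * r**2)
```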
The previous argument involves the abstract existence of an antiderivative
– a consequence of the Fundamental Theorem of Calculus. We will not pursue
this sort of argument very often – there will be one more use in this chapter
– but we need to say that this kind of calculation has some important uses in
applied mathematics.
We computed the area between the graph of a function and the x-axis,
assuming that the function was above the x-axis. More generally, we can
compute the area between y = f (x) and y = g(x) for a ≤ x ≤ b, assuming
that f (x) ≥ g(x).
Area between curves. If f(x) ≥ g(x) for a ≤ x ≤ b, then the area between
y = f(x) and y = g(x) on this interval is
$$\int_a^b \bigl(f(x) - g(x)\bigr)\cdot dx$$
Explanation. We can add a constant c to g(x) so that g(x) + c > 0 on [a, b]; that puts
y = g(x) + c above the x-axis. Since f (x) ≥ g(x), it follows that y = f (x) + c
is also above the x-axis. The area between y = f (x) and y = g(x) is the same
as the area between y = f (x) + c and y = g(x) + c, since the latter area is just
the former area moved up in the plane.
If you draw a generic picture, you will see that the area between y = f (x)+c
and y = g(x) + c is the difference between the area under y = f (x) + c and the
x-axis over the interval [a, b] and the area under y = g(x) + c and the x-axis
over [a, b]. Thus, the area we are trying to compute is this:
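$$\begin{aligned}
\int_a^b \bigl(f(x) + c\bigr)\cdot dx - \int_a^b \bigl(g(x) + c\bigr)\cdot dx &= \int_a^b f(x)\cdot dx + \int_a^b c\cdot dx - \int_a^b g(x)\cdot dx - \int_a^b c\cdot dx\\
&= \int_a^b f(x)\cdot dx - \int_a^b g(x)\cdot dx\\
&= \int_a^b \bigl(f(x) - g(x)\bigr)\cdot dx
\end{aligned}$$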
The area formula shows that we integrate the upper curve minus the lower
curve. We usually determine which is which from the graphs. In the next few
problems we deliberately do not draw the graphs in the text; we want to do
that interactively in class, or have you do it for yourself.
Problem T8.7. Find the area bounded by y = x2 and y = 2 · x + 8.
Solution. It is not hard to see how a parabola and line intersect. The area
bounded by these curves must lie between them, and because the parabola
is concave up, we expect the line to be the upper curve. Let’s see: to find
the boundaries of the area, we need to figure out where they intersect. The
equation x^2 = 2 · x + 8 leads to x^2 − 2 · x − 8 = 0, which is (x + 2) · (x − 4) = 0.
We see that the figure is bounded by the interval [−2, 4].
It is not hard to figure out which is the upper curve from the sign diagram
for the difference. Or we can plug in a number in the interval [−2, 4] (say
x = 0). One way or another we see that y = 2 · x + 8 is above y = x2 . Our
area is
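$$\begin{aligned}
\int_{-2}^{4}\bigl[(2\cdot x + 8) - x^2\bigr]\cdot dx &= \int_{-2}^{4}\bigl[2\cdot x + 8 - x^2\bigr]\cdot dx\\
&= \Bigl(x^2 + 8\cdot x - \frac{1}{3}\cdot x^3\Bigr)\Big|_{-2}^{4}\\
&= \Bigl[16 + 32 - \frac{64}{3}\Bigr] - \Bigl[4 - 16 + \frac{8}{3}\Bigr]\\
&= 36
\end{aligned}$$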
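The arithmetic can be double-checked with exact fractions. This Python sketch (ours) confirms the intersection points and evaluates the antiderivative at the limits of integration.

```python
from fractions import Fraction as Fr

def F(x):
    # Antiderivative of (2x + 8) - x^2
    return x**2 + 8 * x - Fr(1, 3) * x**3

left, right = -2, 4
# Both limits are intersection points: x^2 = 2x + 8
assert all(x**2 == 2 * x + 8 for x in (left, right))
print(F(right) - F(left))   # 36
```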
The curves intersect at x = 1 and x = 2, and so they trap area between them
there. Over that interval, plugging in x = 1.5 (or looking at the sign pattern),
we see that y = x2 + 3 · x is the upper curve, and y = 2 · x2 + 2 is the lower
curve. The area of the bubble between the curves is
$$\int_1^2 \bigl[(x^2 + 3\cdot x) - (2\cdot x^2 + 2)\bigr]\cdot dx = \int_1^2 \bigl(3\cdot x - x^2 - 2\bigr)\cdot dx = \frac{1}{6}$$
But there’s more! The boundary x = 3 must also be taken into account.
Between x = 2 and x = 3, the curves trap another bubble of area. In that
range, the curve y = 2 · x2 + 2 is the upper curve. (How did we know that?)
The additional area is this:
$$\int_2^3 \bigl[(2\cdot x^2 + 2) - (x^2 + 3\cdot x)\bigr]\cdot dx = \int_2^3 \bigl(x^2 + 2 - 3\cdot x\bigr)\cdot dx = \frac{5}{6}$$
The total area is (1/6) + (5/6) = 1.
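Splitting at x = 2 matters. In the Python sketch below (our illustration, with exact fractions), integrating the single difference (x^2 + 3 · x) − (2 · x^2 + 2) straight across [1, 3] gives −2/3, because the second bubble enters with a negative sign; integrating upper minus lower on each piece gives the area 1.

```python
from fractions import Fraction as Fr

def G(x):
    # Antiderivative of (x^2 + 3x) - (2x^2 + 2) = 3x - x^2 - 2
    return Fr(3, 2) * x**2 - Fr(1, 3) * x**3 - 2 * x

first = G(2) - G(1)          # x^2 + 3x is the upper curve on [1, 2]
second = -(G(3) - G(2))      # 2x^2 + 2 is the upper curve on [2, 3]
print(first, second, first + second)   # 1/6 5/6 1
print(G(3) - G(1))                     # -2/3, the signed integral without splitting
```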
Solution. From the graph, we see that y = 2 − x is the upper curve and
y = e^x is the lower curve, until they meet somewhere to the right of the y-axis.
To find the intersection point, we need to solve e^x = 2 − x. This equation
does not have a nice, algebraic solution, and so we use Newton’s Method to
approximate the solution.³ The equation is e^x + x − 2 = 0. Starting with
x = 0, we obtain the approximate solution α = 0.44285.
The boundary x ≥ 0 gives the left-hand limit of integration. The area is
then
$$\int_0^{\alpha} (2 - x - e^x) \cdot dx = \Big[2 \cdot x - \frac{x^2}{2} - e^x\Big]_0^{\alpha} = 2 \cdot \alpha - \frac{\alpha^2}{2} - e^{\alpha} + 1 \approx 0.2305$$
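A short Python sketch of the same computation, assuming a plain Newton iteration (the helper `newton`, its tolerance, and its step limit are illustrative choices, not from the text): solve $e^x + x - 2 = 0$ starting from x = 0, then evaluate the antiderivative at α.

```python
import math


def newton(g, dg, x0, tol=1e-12, max_steps=50):
    """Newton's Method: repeat x -> x - g(x)/g'(x) until the step is tiny."""
    x = x0
    for _ in range(max_steps):
        step = g(x) / dg(x)
        x -= step
        if abs(step) < tol:
            return x
    raise RuntimeError("Newton's Method did not converge")


# Solve e^x + x - 2 = 0, starting at x = 0 as in the text.
alpha = newton(lambda x: math.exp(x) + x - 2, lambda x: math.exp(x) + 1, 0.0)

# Antiderivative 2x - x^2/2 - e^x evaluated from 0 to alpha.
area = 2 * alpha - alpha ** 2 / 2 - math.exp(alpha) + 1
print(round(alpha, 5), round(area, 4))
```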
3You might wish to review Newton’s Method, since it is going to be used several times
in this chapter. See p.72.
128 8. INTERPRETING AND USING THE INTEGRAL
3. Probability
and this shows that c = 1/(b − a). Thus, for the constant distribution, we can
compute the density.
Problem T8.10. For a constant distribution on [−3, 5], what is the probabil-
ity that x > 0?
Solution. The constant is 1/(5 + 3) = 1/8. The probability is then
$$\int_0^5 \frac{1}{8} \cdot dx = \frac{x}{8}\Big|_0^5 = \frac{5}{8}$$
This is easy to interpret: the statement that x > 0 is the statement that
0 < x ≤ 5. Since that interval has length 5, and the entire interval [−3, 5] has
length 8, the probability that x > 0 is 5/8. Sadly, when the density is not
constant, we cannot use such straightforward reasoning.
$$\lambda \cdot \exp(-\lambda \cdot x) \tag{8.5}$$
Remembering that λ is positive, if we let b get larger and larger, the graph of
the exponential function shows that exp(−λ · b) → 0. Our improper integral
becomes 1, and so, as expected, the formula (8.6) holds.
Our answer says that about 73.6% of the time, there will be less than 4 seconds
between cars.
As we said, the infinite upper limit means that we think about what hap-
pens when x gets larger and larger (we can write x → ∞). We see that
exp(−x/5) → 0 as x → ∞, and so
P = 0 + exp(−6/5) ≈ 0.3012
Problem T8.13. Suppose that the variable x satisfies the standard normal
distribution. Find the probability that −1 ≤ x ≤ 1. (This is a famous
probability: the chance of landing within one standard deviation of the mean.)
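The solution is not reproduced in this section. Because the standard normal density has no elementary antiderivative, a numerical approach is natural; the sketch below assumes the usual density $(1/\sqrt{2\pi}) \cdot \exp(-x^2/2)$, the same one that appears in (8.11), integrates it with Simpson's rule, and uses Python's `math.erf` as an independent check. The result is the familiar value of about 68%.

```python
import math


def simpson(f, a, b, n=1000):
    """Composite Simpson's rule for the integral of f over [a, b] (n must be even)."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3


def standard_normal_density(x):
    # exp(-x^2/2) / sqrt(2*pi), as in (8.11).
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)


p = simpson(standard_normal_density, -1, 1)
# Cross-check with the error function: P(-1 <= x <= 1) = erf(1/sqrt(2)).
p_erf = math.erf(1 / math.sqrt(2))
print(round(p, 4), round(p_erf, 4))
```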
The Mean Many of the typical statistical quantities associated with prob-
abilities are computed via integration. We don’t want to get carried away
into statistics proper, but we mention two important examples, both of which
measure a kind of average for a continuous random variable.
Let x be a random variable in the interval [a, b] with probability density
function f (x). (Remember that it doesn’t really matter whether the endpoints
a, b are included, or not. For instance, we can have b = ∞.) The mean of x is
meant to be the best approximation to x by a constant. When we discuss least
squares at the end of the course, we can go into this in more detail, but the
idea is that the constant approximation is an average.⁸ Here is the formula for
the mean:
$$\text{mean of } x = \int_a^b x \cdot f(x) \cdot dx \tag{8.8}$$
Problem T8.14. Show that (a + b)/2 is the mean of the constant distribution
on [a, b].
Solution. The probability density function is the constant 1/(b − a). The
mean is then computed using (8.8).
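$$\begin{aligned}
\int_a^b x \cdot \frac{1}{b-a} \cdot dx &= \frac{1}{b-a} \cdot \frac{x^2}{2}\Big|_a^b = \frac{1}{2(b-a)} \cdot (b^2 - a^2) \\
&= \frac{1}{2(b-a)} \cdot (b-a)(b+a) = \frac{b+a}{2}
\end{aligned}$$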
Problem T8.15. Let λ = 1/5 in the exponential distribution. Verify that the
mean is 5.
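A numerical sketch for λ = 1/5, under stated assumptions: the improper integrals are truncated at an upper limit of 200 (where $\exp(-\lambda \cdot 200) = \exp(-40)$ is negligible), and Simpson's rule stands in for the anti-derivative. It checks that the density (8.5) has total probability 1, reproduces the tail probability $\exp(-6/5) \approx 0.3012$ computed above, and estimates the mean (8.8). The constants `UPPER` and `N` are illustrative choices.

```python
import math


def simpson(f, a, b, n):
    """Composite Simpson's rule for the integral of f over [a, b] (n must be even)."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3


LAM = 1 / 5       # lambda in the exponential density (8.5)
UPPER = 200.0     # stand-in for infinity: exp(-LAM * UPPER) = exp(-40) is negligible
N = 20_000        # number of Simpson subintervals (even)


def density(x):
    return LAM * math.exp(-LAM * x)


total = simpson(density, 0, UPPER, N)                   # total probability
tail = simpson(density, 6, UPPER, N)                    # P(x > 6)
mean = simpson(lambda x: x * density(x), 0, UPPER, N)   # mean, from (8.8)
print(round(total, 6), round(tail, 4), round(mean, 4))
```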
The functions F (−x) and F (x) have the same derivative, and so they differ
by a constant: F (−x) = F (x) + C for some constant C. Let x = 0, and we
see that C = 0. Thus, F (−x) = F (x). Here is what follows:
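$$\int_{-b}^{0} x \cdot \frac{1}{\sqrt{2\pi}} \cdot \exp\Big(-\frac{x^2}{2}\Big) \cdot dx = F(0) - F(-b) = -F(-b) = -F(b)$$
Thus,
$$\int_{-\infty}^{0} x \cdot \frac{1}{\sqrt{2\pi}} \cdot \exp\Big(-\frac{x^2}{2}\Big) \cdot dx = \lim_{b \to \infty} -F(b) \tag{8.11}$$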
The Median For some random variables, the mean doesn’t make sense as
an average. An example: in the distribution of income, the mean is typically
too large due to the influence of very high incomes that aren’t balanced by
very low incomes. It makes more sense to compute the value that splits the
probability in half. The median m of a random variable x has
the property that the probability of x < m is 50%, and the probability that
x > m is 50%. The mean and median are not always the same; each gives its
own idea of an average.
Problem T8.18. Find the median m for the random variable x with 0 ≤ x ≤ 1
and probability density function $3 \cdot x^2$.
Solution. The probability that 0 ≤ x ≤ m is 50%:
$$\frac{1}{2} = \int_0^m 3 \cdot x^2 \cdot dx = m^3$$
and so $m = 1/\sqrt[3]{2} \approx 0.7937$.
We computed the mean µ = 3/4 for this distribution; the median was not
the same. As we said, each of them can be used as an average depending on
the context.
Here is a typical problem: the median is the solution to an algebraic equa-
tion.
Problem T8.19. Find the median of the random variable x on [1, 3] with
probability density $(3 \cdot x^2 + 2)/30$.
Solution. If m is the median, then
$$\frac{1}{2} = \int_1^m \frac{3 \cdot x^2 + 2}{30} \cdot dx = \frac{x^3 + 2 \cdot x}{30}\Big|_1^m = \frac{m^3 + 2 \cdot m}{30} - \frac{3}{30}$$
Multiplying through by 30, we get
$$15 = m^3 + 2 \cdot m - 3, \quad \text{which is} \quad m^3 + 2 \cdot m - 18 = 0$$
We use Newton’s Method, starting at 2, since that’s in the range of the variable,
and we get m ≈ 2.367.
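Newton's Method is easy to sketch in Python. The code below solves $m^3 + 2 \cdot m - 18 = 0$ from the starting value 2, checks that the probability from 1 to m comes out to 1/2, and also evaluates the closed form $1/\sqrt[3]{2}$ from Problem T8.18. The helper `newton` and its tolerance are assumptions for illustration.

```python
def newton(g, dg, x0, tol=1e-12, max_steps=50):
    """Newton's Method: repeat x -> x - g(x)/g'(x) until the step is tiny."""
    x = x0
    for _ in range(max_steps):
        step = g(x) / dg(x)
        x -= step
        if abs(step) < tol:
            return x
    raise RuntimeError("Newton's Method did not converge")


# Problem T8.19: the median solves m^3 + 2m - 18 = 0; start at 2, inside [1, 3].
m = newton(lambda m: m ** 3 + 2 * m - 18, lambda m: 3 * m ** 2 + 2, 2.0)

# Check: the probability that 1 <= x <= m should be 1/2.
prob_below = (m ** 3 + 2 * m - 3) / 30

# Problem T8.18 has a closed form: m^3 = 1/2.
m_cubic = 0.5 ** (1 / 3)
print(round(m, 3), round(prob_below, 6), round(m_cubic, 4))
```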
4. Quantities in Economics
Many quantities that occur in economics are computed as
integrals. We survey a few of them.
Suppose we have an annual interest rate r on savings, constant for the
foreseeable future.9 Continuous compounding of savings is an exponential
model, where an initial amount P dollars saved for t years results in P ·exp(r·t)
dollars at the end of the t years. Write Q for the amount at the end of t years,
so that
Q = P · exp(r · t)
Turning time around, we regard Q as the given and solve for P .
Q · exp(−r · t) = P · exp(r · t) · exp(−r · t) = P · exp(0) = P
This value of P is called the present value of Q dollars t years from now.
$$\text{present value of } Q = Q \cdot \exp(-r \cdot t) \tag{8.12}$$
Problem T8.20. What is the present value of $1 million 50 years from now?
Assume an interest rate of 5%.
Solution. We have r = 0.05 (per year, as usual), and t = 50 years. Recall
that $1 million is 10⁶ dollars. The present value is given by (8.12):
$$\$10^6 \cdot \exp(-0.05 \cdot 50) \approx \$82{,}085$$
Explanation. This example was mentioned earlier in the book; the integral
comes about from Riemann sums rather than by an anti-derivative. We divide
the time interval [0, T ] into n subintervals, each of width T /n (years elapsed).
Let [p, q] be one of these subintervals. Over this time, the stream produces
roughly f (q) · T /n dollars. That’s since f (t) measures dollars per year and
T /n measures the years in [p, q]. The number f (q) · T /n only approximates
the amount of money, since f (t) may vary over the subinterval. The amount
f (q) · T /n is earned about q years from now, and so its present value is given
by (8.12)
$$f(q) \cdot \frac{T}{n} \cdot \exp(-r \cdot q)$$
This term has the form f (q)·exp(−r·q) times the width T /n of the subinterval.
When these terms are added up, they form a Riemann sum for the function
f (t) · exp(−r · t) on the interval [0, T ]. As n gets larger and larger, (7.13) shows
that the Riemann sum approaches the definite integral in the equation for the
present value of the stream.
Problem T8.21. Find the present value of the income stream 10 · t · (20 − t)
dollars per year over 20 years, if the annual interest rate is 5%.
Solution. If you graph y = 10 · t · (20 − t) you will see that the rate of income
increases over the first 10 years, and then it decreases over the final 10 years.
Problem T8.22. Show that each of these income streams produces a total
income of $10000 over 10 years: 1000 per year, and 200 · t per year. Given
an interest rate of 3%, which one is preferable in terms of having the smaller
present value?
(Obtained by the Riemann sum calculator.) The second stream has the smaller
present value.
Is it clear why we prefer the smaller present value? Because it is the
cheaper way to produce the income stream.
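The text's numbers for Problem T8.22 come from "the Riemann sum calculator", which is not reproduced here. A minimal sketch of the same idea, mirroring the Riemann sum in the Explanation above (right endpoints: the piece ending at q earns about f(q) · T/n dollars, discounted from q years out with (8.12)). The helper names and n = 100,000 are illustrative assumptions.

```python
import math


def present_value(Q, r, t):
    """(8.12): today's value of Q dollars received t years from now."""
    return Q * math.exp(-r * t)


def stream_present_value(f, r, T, n=100_000):
    """Riemann sum from the Explanation: split [0, T] into n pieces; the piece
    ending at q earns about f(q) * T/n dollars, discounted from q years out."""
    width = T / n
    return sum(present_value(f(k * width) * width, r, k * width)
               for k in range(1, n + 1))


pv_t820 = present_value(1e6, 0.05, 50)                              # Problem T8.20
flat = stream_present_value(lambda t: 1000.0, 0.03, 10)             # T8.22, 1000 per year
ramp = stream_present_value(lambda t: 200.0 * t, 0.03, 10)          # T8.22, 200t per year
hump = stream_present_value(lambda t: 10 * t * (20 - t), 0.05, 20)  # Problem T8.21
print(round(pv_t820), round(flat, 2), round(ramp, 2), round(hump, 2))
```

With these inputs the flat stream should come out near $8,639 and the ramp near $8,208, in line with the text's conclusion that the second stream has the smaller present value.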
4. QUANTITIES IN ECONOMICS 141
Consumer Surplus. (This is a standard topic in many texts; see [10], for
example.10) A demand curve relates the unit price of some good to the demand
– how many will be sold. It is natural to think of the demand x as a function
of the unit price p, but our calculation will be simpler if we think of p as a
function of x. So, we imagine $p = f(x)$. We usually suppose that $f(x)$ is
decreasing and concave up,¹¹ so that $f'(x) < 0$ and $f''(x) > 0$.
Suppose that the current unit price of apples12 is p0 , and demand is x0 , so
that p0 = f (x0 ). Some consumers are willing to pay more than p0 for apples;
the consumer surplus measures the total amount of money saved because some
consumers are paying less than they are willing to pay. On the graph p = f (x),
these consumers are to the left of x0 , where f (x) is higher than p0 .
To estimate consumer surplus, let n be a positive integer and divide the
interval [0, x0 ] into n subintervals of width ∆x = x0 /n. Let [x1 , x2 ] be one of
these intervals, so that x2 − x1 = ∆x. We think of this interval as standing
for the ∆x consumers who were willing to pay f (x2 ) for apples. Each of these
consumers paid p0 for the apples; each saved f (x2 ) − p0 . Their surplus is thus
[f (x2 ) − p0 ] · ∆x (the amount each saved times the number of consumers).
The total surplus for all consumers is the sum of all these terms, and that’s a
Riemann sum for the function f (x) − p0 over the interval [0, x0 ].
Letting the number n get larger and larger, we arrive at the following.
$$\text{consumer surplus} = \int_0^{x_0} \Big[f(x) - p_0\Big] \cdot dx$$
10The concept of consumer surplus is a kind of lightning rod, as well. See the article by
Currie, Murphy, and Schmitz, The Concept of Economic Surplus and Its Use in Economic
Analysis, The Economic Journal, 1971, pp.741-799. Available on the internet.
11This is a form of diminishing returns, for the rate of decrease of p is decreasing (in
absolute value).
12For a unit of apples, imagine a large crate.
Problem T8.24. The per unit tax on an item is supposed to generate revenue
equal to 10% of the consumer surplus. If the demand curve is p = 100 − √x,
in dollars, for 0 ≤ x ≤ 1000, and the current price is $75, what should the per
item tax be?
Solution. When p0 = 75, we solve 75 = 100 − √x0 to get x0 = 25² = 625. The consumer
surplus, from (8.13), is
$$\int_0^{625} \big(100 - \sqrt{x}\big) \cdot dx - 75 \cdot 625 \approx \$5208.33$$
The tax revenue is 10% of this surplus: $520.83. The demand is 625, and so
the per item tax is
$$\frac{\$520.83}{625} \approx \$0.83$$
that is, about 83¢ per item.
Relating taxes to surplus is intended to limit how much taxes cut into
spending. Good luck.
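A hedged Python sketch of Problem T8.24: it forms the Riemann sum from the consumer surplus derivation (right endpoints, since each slice of consumers was willing to pay f(x2)) and compares it with the exact value from the antiderivative $100 \cdot x - \frac{2}{3} \cdot x^{3/2}$. The function names and n = 100,000 are illustrative assumptions.

```python
import math


def demand_price(x):
    """Demand curve of Problem T8.24: unit price p = 100 - sqrt(x) dollars."""
    return 100 - math.sqrt(x)


def consumer_surplus_sum(f, x0, p0, n=100_000):
    """Riemann sum of [f(x) - p0] * dx over [0, x0], right endpoints as in the
    derivation (each slice of consumers was willing to pay f(x2))."""
    dx = x0 / n
    return sum((f(k * dx) - p0) * dx for k in range(1, n + 1))


p0 = 75.0
x0 = (100 - p0) ** 2                      # solve 75 = 100 - sqrt(x0): x0 = 625
approx = consumer_surplus_sum(demand_price, x0, p0)
# Exact value from the antiderivative 100x - (2/3) x^(3/2), minus p0 * x0.
exact = 100 * x0 - (2 / 3) * x0 ** 1.5 - p0 * x0
tax_per_item = 0.10 * exact / x0          # 10% of the surplus, spread over 625 units
print(round(exact, 2), round(approx, 2), round(tax_per_item, 4))
```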
CHAPTER 9
Matrix Algebra
1. Matrix Arithmetic
146 9. MATRIX ALGEBRA
Because real addition is commutative, we have A[i, j]+B[i, j] = B[i, j]+A[i, j].
And the definition of matrix addition shows that $A + B = B + A$.
Because matrix addition is done entry by entry, it is obvious that adding the
zero matrix to a matrix A does not change A.
Zero Matrix If A is an m × n matrix, then
A + Om×n = A = Om×n + A
Now we are ready to define matrix multiplication. There are several nu-
ances in this definition – make sure you understand the details. First, the
matrices A and B can be multiplied to form the product A · B if A has the
same number of columns as B has rows. The best way to check this is to put
the sizes of A and B side by side (keep A’s size on the left). Suppose that A
is 4 × 3 and B is 3 × 5:
4×3 3×5
We need the inside numbers to match: A has 3 columns and B has 3 rows.
This tells us that we can form the product A · B. The outside numbers give
us the size of the result; the matrix A · B will be 4 × 5.
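The size rule, together with the row-by-column dot product that gives each entry, can be sketched in a few lines of Python with a matrix stored as a list of rows. `mat_mult` is a hypothetical helper, not a library routine; it refuses to multiply when the inside numbers differ.

```python
def mat_mult(A, B):
    """Product A . B of matrices stored as lists of rows.

    Defined only when A has as many columns as B has rows (the "inside
    numbers" match); the result has A's row count and B's column count.
    """
    m, k = len(A), len(A[0])
    k2, n = len(B), len(B[0])
    if k != k2:
        raise ValueError(f"cannot multiply {m}x{k} by {k2}x{n}")
    # Entry [i][j] is the dot product of row i of A with column j of B.
    return [[sum(A[i][t] * B[t][j] for t in range(k)) for j in range(n)]
            for i in range(m)]


A = [[1] * 3 for _ in range(4)]   # a 4 x 3 matrix of ones
B = [[1] * 5 for _ in range(3)]   # a 3 x 5 matrix of ones
C = mat_mult(A, B)
print(len(C), "x", len(C[0]))
```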
If A is 9 × 9 and B is 7 × 9, then, putting the sizes next to each other:
9×9 7×9
we see that A · B is not defined. Notice, however, that if we put B first we get
7×9 9×9
1,1 1,2
2,1 2,2
3,1 3,2
To get each entry in the product, we compute the dot product of a row of
A and a column of B, the row and column indicated by the entry. Thus, to
get the 1, 2 entry of A · B, we want the dot product of row 1 of A and column
2 of B:
The matrix $I_n$ has $I_n[i, i] = 1$ for each i; we say that the 1's are on the diagonal.
The other entries of $I_n$ are 0.
$$I_m \cdot A = A = A \cdot I_n$$
You should check that the size of $I_m \cdot A$ is m × n and that the size of $A \cdot I_n$
is m × n. In class, we will discuss these identities; they follow directly from
the formula for matrix multiplication.
We have emphasized that matrix multiplication is not commutative. How-
ever, it does have many properties similar to those of numbers. We list these
properties to be clear and complete in our introduction to matrix algebra. As
we mentioned previously, we will use these properties without calling attention
to them. Throughout the list of rules, A, B, C are matrices.
(1) Associative Law If A · B and B · C are defined, then
(A · B) · C = A · (B · C)
k · (A · B) = (k · A) · B = A · (k · B)
$$\begin{aligned} A_{n+1} &= A_n + B_n, & A_0 &= 4 \\ B_{n+1} &= 3 \cdot A_n - B_n, & B_0 &= 1 \end{aligned}$$
=mmult($B$1:$C$2,B4:B5)
was placed in cell C4. We can fill-right to produce the remaining 9 terms
of the sequences. We get $A_9 = 1280$ and $B_9 = 2816$.
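The spreadsheet computation can be mirrored in Python. This sketch assumes the cells B1:C2 hold the coefficient matrix of the recursion, [[1, 1], [3, −1]], which is what the equations above require; each pass through the loop plays the role of one filled-right column.

```python
# Coefficient matrix of the recursion; presumably what sits in cells B1:C2.
M = [[1, 1],
     [3, -1]]

state = [4, 1]                 # [A_0, B_0]
terms = [state]
for n in range(9):             # produce [A_1, B_1] through [A_9, B_9]
    # One matrix-vector product per step, like =mmult(...) in each new column.
    state = [M[0][0] * state[0] + M[0][1] * state[1],
             M[1][0] * state[0] + M[1][1] * state[1]]
    terms.append(state)

print(terms[9])  # [1280, 2816]
```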
Here is another type of recursion that can be written as matrix multipli-
cation.
and $C_0 = 1$ and $C_1 = 0$.
Solution. The key idea is to pair $C_n$ and $C_{n+1}$. Let $B_n = C_{n+1}$ for n ≥ 0,
and then $B_0 = C_1 = 0$. The recursive equation is now
$$B_{n+1} = -6 \cdot B_n - 13 \cdot C_n$$
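Since $B_{n+1} = C_{n+2}$, the equation above says $C_{n+2} = -6 \cdot C_{n+1} - 13 \cdot C_n$ (the original problem statement is not shown in this section, so this form is read off from the solution). The sketch below iterates the paired form, which amounts to applying the matrix [[0, 1], [−13, −6]] to the column $(C_n, B_n)$, and compares it with direct iteration. Both function names are hypothetical.

```python
def paired(n_terms):
    """Iterate the pair (C_n, B_n) with B_n = C_{n+1}, i.e. the matrix
    [[0, 1], [-13, -6]] applied to the column (C_n, B_n)."""
    c, b = 1, 0                     # C_0 = 1 and B_0 = C_1 = 0
    terms = []
    for _ in range(n_terms):
        terms.append(c)
        c, b = b, -6 * b - 13 * c   # C_{n+1} = B_n, B_{n+1} = -6 B_n - 13 C_n
    return terms


def direct(n_terms):
    """Iterate C_{n+2} = -6 C_{n+1} - 13 C_n directly."""
    seq = [1, 0]
    while len(seq) < n_terms:
        seq.append(-6 * seq[-1] - 13 * seq[-2])
    return seq[:n_terms]


print(paired(6))
```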
2. APPLICATIONS OF MATRIX ALGEBRA 155
¹ One of Leslie's papers on the subject, On the use of matrices in certain population
mathematics, appeared in 1945 in the journal Biometrika, Vol. 33(3), pp. 183-212.
² This table is a simplified version of the more realistic (and larger) table in the paper
Parameters for Seasonally Breeding Populations, by G. Caughley, from the journal Ecology,
vol. 48, 1967, pp. 834-839.
Thus, $A_2$ is the number of female sheep aged 0-3 after 6 years; $B_4$ is the
number of female sheep aged 4-6 after 12 years, and so on. The birth rate
puts new animals into the An sequence. The survival rate puts animals in the
next category. In light of this, here are recursive equations for the populations.
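In matrices:
$$\begin{bmatrix} A_{n+1} \\ B_{n+1} \\ C_{n+1} \\ D_{n+1} \end{bmatrix} = \begin{bmatrix} 0.436 & 1.502 & 1.513 & 1.313 \\ 0.795 & 0 & 0 & 0 \\ 0 & 0.787 & 0 & 0 \\ 0 & 0 & 0.462 & 0 \end{bmatrix} \cdot \begin{bmatrix} A_n \\ B_n \\ C_n \\ D_n \end{bmatrix} \tag{9.2}$$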
Here is an example problem.
Problem T9.5. In the sheep model just considered, suppose we start with 10
sheep in category 1 and none in the other categories. How long until the total
population reaches 100 sheep?
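The solution to Problem T9.5 is not reproduced in this section. As a hedged sketch, the loop below applies the matrix of (9.2) repeatedly, starting from 10 sheep in the first category, and stops when the total reaches 100. Each step is taken to be 3 years, following the text's remark that $A_2$ is the count after 6 years; `leslie_step` and `steps_until` are hypothetical helper names.

```python
# Matrix from (9.2): first row = birth rates, sub-diagonal = survival rates.
LESLIE = [[0.436, 1.502, 1.513, 1.313],
          [0.795, 0.0, 0.0, 0.0],
          [0.0, 0.787, 0.0, 0.0],
          [0.0, 0.0, 0.462, 0.0]]


def leslie_step(M, v):
    """One time step: the matrix-vector product M . v."""
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]


def steps_until(M, v, target, max_steps=100):
    """Return (n, v_n) for the first n with total population >= target."""
    for n in range(1, max_steps + 1):
        v = leslie_step(M, v)
        if sum(v) >= target:
            return n, v
    raise RuntimeError("target not reached within max_steps")


n, population = steps_until(LESLIE, [10.0, 0.0, 0.0, 0.0], 100)
print(n, "steps, about", 3 * n, "years; total", round(sum(population), 1))
```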
Since Q[1] = 0.2, the probability is 20% that we are in state 1. State 2:
50% and state 3: 30%. If we are in state 2 for certain, then Q[2] = 1 and
Q[1] = Q[3] = 0. A probability vector has non-negative entries, one for each
state, that add up to 1=100%, since we have to be in exactly one of the states.
The key feature of a Markov process is its transition matrix ; that matrix
shows the probability of going from one state, at a particular time, to the
various states at the next time. Suppose, for example, we have three states
1,2,3, with the following transition matrix.
$$A = \begin{bmatrix} 0.1 & 0.4 & 0.2 \\ 0.6 & 0.1 & 0.7 \\ 0.3 & 0.5 & 0.1 \end{bmatrix} \tag{9.3}$$
Each column of A imagines we are in the state of that column and tells us the
chance that we move to each of the three states at the next time step. For
instance, A[3, 2] = 0.5. If we are in state 2, then the chance that we will be in
state 3 at the next time is 50%.
If $Q_n$ holds the probability of being in each state at time n, then
$$Q_{n+1} = A \cdot Q_n \tag{9.4}$$
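A minimal Python sketch of one step of (9.4), using the transition matrix (9.3) and the example probability vector from the start of this discussion (20%, 50%, 30%). It also checks that each column of A sums to 1, as a transition matrix's columns must. `markov_step` is a hypothetical name.

```python
# Transition matrix (9.3): column j lists the chances of moving from state j.
A = [[0.1, 0.4, 0.2],
     [0.6, 0.1, 0.7],
     [0.3, 0.5, 0.1]]


def markov_step(A, Q):
    """(9.4): Q_{n+1} = A . Q_n."""
    return [sum(A[i][j] * Q[j] for j in range(len(Q))) for i in range(len(A))]


# Each column of a transition matrix should sum to 1.
column_sums = [sum(A[i][j] for i in range(3)) for j in range(3)]

Q0 = [0.2, 0.5, 0.3]               # 20% state 1, 50% state 2, 30% state 3
Q1 = markov_step(A, Q0)
print([round(q, 2) for q in Q1])   # [0.28, 0.38, 0.34]
```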
$$A \cdot X + C \le X \tag{9.7}$$
The meaning of inequality here is that it applies to each entry on both sides.
For instance, the third entry on the left is less than or equal to the third entry
on the right. And so for all entries. If (9.7) holds, we say that X is a feasible
production schedule.
Later in the course we will see how to find a feasible production schedule,
if such a schedule exists. At this point, we introduce a related concept. If we
are given a feasible production schedule X0 , then we can form the recursive
sequence
equation.
$$A \cdot Y + C = Y \tag{9.9}$$
The equation shows that production Y exactly covers usage in production and
consumption. In other words, Y is a maximally efficient schedule.
Let's interpret equation (9.8). The right side $A \cdot X_n + C$ measures the goods
used in the economy: used for production and used for consumption. Setting
$X_{n+1}$ to this means that each producer produces next year exactly what was
used this year. In many situations, when each person adjusts his own activity
to make it more efficient, the efficiency of the group is unpredictable. For
instance, if everyone comes early to a concert to get a good seat, the task of
finding a good seat will be no easier than it was when everyone came on time.
But in Leontief’s model, individual adjustments lead to macro-efficiency. As
we converge to (9.9), the economy becomes maximally efficient: everything
that is produced is used.
Problem T9.7. We have a technology matrix A, consumption vector C, and
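feasible production schedule $X_0$, as follows:
$$A = \begin{bmatrix} 0 & 0.15 & 0.225 & 0.075 \\ 1.5 & 0.075 & 0.0075 & 0.15 \\ 0 & 0.3 & 0 & 0.075 \\ 0.75 & 0.225 & 0.075 & 0.15 \end{bmatrix}, \quad C = \begin{bmatrix} 30 \\ 10 \\ 100 \\ 50 \end{bmatrix}, \quad X_0 = \begin{bmatrix} 150 \\ 300 \\ 250 \\ 300 \end{bmatrix}$$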
Find the sequence Xn described above and estimate the schedule Y such that
(9.9) holds.
Solution. We have only to use the recursion (9.8). Excel shows that $X_{11} \approx X_{12}$, and
$$X_{12} = \begin{bmatrix} 140 \\ 286 \\ 207 \\ 277 \end{bmatrix}$$
This should be a good estimate of Y .
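A sketch of the recursion (9.8), $X_{n+1} = A \cdot X_n + C$, as the text describes it (this year's usage becomes next year's production), with the data of Problem T9.7 as transcribed from the display; the second row of A is taken as printed, beginning with 1.5. Rather than asserting the text's $X_{12}$, the sketch runs many more steps and measures how closely the limit satisfies (9.9). The names are hypothetical.

```python
# Data of Problem T9.7 (as transcribed).
A = [[0.0, 0.15, 0.225, 0.075],
     [1.5, 0.075, 0.0075, 0.15],
     [0.0, 0.3, 0.0, 0.075],
     [0.75, 0.225, 0.075, 0.15]]
C = [30.0, 10.0, 100.0, 50.0]
X0 = [150.0, 300.0, 250.0, 300.0]


def next_schedule(A, X, C):
    """(9.8): next year's production covers this year's usage, A . X + C."""
    return [sum(A[i][j] * X[j] for j in range(len(X))) + C[i]
            for i in range(len(A))]


history = [X0]
for _ in range(200):
    history.append(next_schedule(A, history[-1], C))

Y = history[-1]
# How far Y is from satisfying (9.9), A . Y + C = Y.
residual = max(abs(u - y) for u, y in zip(next_schedule(A, Y, C), Y))
print("X_12 ~", [round(v) for v in history[12]])
print("Y    ~", [round(v, 1) for v in Y], "residual", residual)
```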
Linear Equations
$$3 \cdot x_1 - 2 \cdot x_2 + 5 \cdot x_3 + x_4 = 1$$
A solution to this equation is a set of values of the variables. For instance, let
$$x_1 = 1, \quad x_2 = 4, \quad x_3 = 0, \quad x_4 = 6$$
and observe that these values make the linear equation true:¹
$$3 \cdot 1 - 2 \cdot 4 + 5 \cdot 0 + 6 = 1$$
1Don’t worry about how this solution was found; we’ll discuss that later.
2This form provides a partial explanation for the unexpected definition of matrix
multiplication.
164 10. LINEAR EQUATIONS
It is crucial that you see how the matrix product (10.2) corresponds directly
to the equations in (10.1). There is one row of the matrix on the left for each
equation. There is one column of the matrix on the left for each variable.
A solution to a system of equations is a solution to each of the equations
simultaneously. For the system (10.1) we can let
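$$\begin{bmatrix} a \\ b \\ c \end{bmatrix} = \begin{bmatrix} 3 \\ 2 \\ 1 \end{bmatrix}$$
and observe that (10.2) now gives us a true matrix equation:
$$\begin{bmatrix} 1 & -2 & -1 \\ 2 & 3 & 3 \\ 1 & 1 & 1 \end{bmatrix} \cdot \begin{bmatrix} 3 \\ 2 \\ 1 \end{bmatrix} = \begin{bmatrix} -2 \\ 15 \\ 6 \end{bmatrix}$$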
as you should verify! The single matrix equation is really three equations: one
for each entry – one for each of the three equations that the system requires
to hold simultaneously.
Abstractly, a system of linear equations looks like this:
$$A \cdot X = B \tag{10.3}$$
where A is a matrix of numbers – it is called the coefficient matrix. The
symbol X stands for a column of unknowns (variables), as many variables as
A has columns. The matrix B gives the right sides of each equation, and so B
is called the right side matrix. A solution to this matrix equation is a specific
matrix C, corresponding to X, such that
$$A \cdot C = B$$
The j-th entry of C tells the value of the j-th variable in X.
It is not hard to solve a system of linear equations. The main idea is
to use one of the equations to solve for one of the variables; then you can
substitute for that variable in the remaining equations, reducing the problem
by one equation and one variable, and then continuing in the same way. We
1. EQUATIONS AND SOLUTIONS 165
could take a rather ad-hoc approach to the solution of linear equations, but
we choose to be systematic for at least two reasons. First, there is information
hidden in the equation AX = B that is relevant to general applications of
matrices; a careful solution technique will disclose this information. Second, a
systematic approach to solutions will allow us to reach a solution in an efficient
way that avoids dead ends.
$$[A|B] = \begin{array}{rrr|r} a & b & c & = \\ \hline 1 & 0 & -1 & -1 \\ 2 & 0 & 2 & 2 \\ 1 & 3 & 3 & 3 \end{array}$$
The matrix [A|B] is the augmented matrix of the system. The augmented
matrix is just a table that contains all the numbers we are working with. We
have labeled the columns of A with the unknowns, so we don’t forget them!
And we labeled the last column with an equals sign – it is the right side B.
³ These operations are also called row operations and elementary row operations.
⁴ The claim about this last elementary operation is a little harder to see; look at a couple
of examples.
the operations do not change the set of solutions, the solutions of the given
system will be obvious from the transformed system.
Let’s go back to the example linear equation to demonstrate the Elimi-
nation algorithm. Then we will describe Elimination in general. In trying
the algorithm on our example, observe the notation we use for the elementary
operations. Rows are referred to by Roman numerals. We use −(1/3) · II for
multiplying row 2 by −1/3, and −4 · I + II for adding −4 times row 1 to row 2.
Here is the augmented matrix we had:
$$[A|B] = \begin{array}{rrr|r} a & b & c & = \\ \hline 1 & 0 & -1 & -1 \\ 2 & 0 & 2 & 2 \\ 1 & 3 & 3 & 3 \end{array}$$
(We drop the top row to avoid writing it over and over.) Now we use the pivot
1 in the [1, 1] position to eliminate the occurrence of the variable a in all the
other equations. To do this, we add multiples of row one to the other rows,
clearing the entries below a. Watch, and notice that we do not change row
one.
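$$\begin{array}{rrr|r} 1 & 0 & -1 & -1 \\ 2 & 0 & 2 & 2 \\ 1 & 3 & 3 & 3 \end{array} \;\xrightarrow{\substack{-2 \cdot \mathrm{I} + \mathrm{II} \\ -1 \cdot \mathrm{I} + \mathrm{III}}}\; \begin{array}{rrr|r} 1 & 0 & -1 & -1 \\ 0 & 0 & 4 & 4 \\ 0 & 3 & 4 & 4 \end{array}$$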
Do you see why we added −2 times row I to row II? We wanted to clear the
2 at [2, 1].
Because these equations were obtained from the original equations (10.4) by
elementary operations, the resulting equations have the same solutions as the
original. In other words, a = b = 0 and c = 1 is the unique solution to (10.4)!
Now we describe Elimination in general, given the system AX = B.
Elimination
Apply Steps 1-4, with row 1, 2, . . ., in turn, as the current row, until the bottom
row is reached or Step 1 fails.
Step 1. Find the leftmost column in the coefficient part of the augmented
matrix having a non-zero entry at the current row or below. If there is no such
entry, Step 1 fails. Otherwise, choose one such entry (this entry is a pivot).
Step 2. Switch rows, if necessary, to bring the pivot to the current row.
Step 3. Multiply the current row by the inverse of the pivot (so that the
pivot now has value 1).
Step 4. Add multiples of the current row to rows above and below it so that
the pivot is the only non-zero entry in its column.
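The four steps translate fairly directly into code. Below is a minimal Python sketch, assuming exact arithmetic with `fractions.Fraction` and always choosing the first non-zero entry in the column as the pivot (the text allows any non-zero choice; a different choice can change the intermediate matrices, but the final form reached by Steps 3 and 4 is the same). `eliminate` is a hypothetical helper name.

```python
from fractions import Fraction


def eliminate(aug, n_coef):
    """Elimination (Steps 1-4) on an augmented matrix given as a list of rows.

    n_coef is the number of coefficient columns (the last column is the right
    side). Fractions keep the arithmetic exact. Returns the final matrix and
    the list of pivot columns; len(pivots) is the rank.
    """
    M = [[Fraction(v) for v in row] for row in aug]
    n_rows = len(M)
    pivots = []
    col = 0
    for cur in range(n_rows):
        # Step 1: leftmost column with a non-zero entry at the current row or below.
        pivot_row = None
        while col < n_coef and pivot_row is None:
            candidates = [r for r in range(cur, n_rows) if M[r][col] != 0]
            if candidates:
                pivot_row = candidates[0]   # choose the first such entry
            else:
                col += 1
        if pivot_row is None:
            break                           # Step 1 fails: Elimination stops
        # Step 2: switch rows to bring the pivot to the current row.
        M[cur], M[pivot_row] = M[pivot_row], M[cur]
        # Step 3: multiply the current row by the inverse of the pivot.
        p = M[cur][col]
        M[cur] = [v / p for v in M[cur]]
        # Step 4: add multiples of the current row to clear the rest of the column.
        for r in range(n_rows):
            if r != cur and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[cur])]
        pivots.append(col)
        col += 1
    return M, pivots


# The example system (10.4) of this section: columns a, b, c, then the right side.
final, pivots = eliminate([[1, 0, -1, -1], [2, 0, 2, 2], [1, 3, 3, 3]], 3)
print([[str(v) for v in row] for row in final], pivots)
```

Applied to the five-variable example that follows, the same function should reproduce the final matrix shown there, with pivots in the first three columns.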
Example Elimination Let’s solve the system of equations given by the
following augmented matrix, where the unknowns are listed along the top.
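$$\begin{array}{rrrrr|r} x_1 & x_2 & x_3 & x_4 & x_5 & \\ \hline 2 & -3 & 1 & 4 & 1 & 17 \\ -4 & 6 & -1 & -6 & 0 & -27 \\ 1 & 1 & 2 & 7 & -1 & 20 \\ -4 & 1 & -4 & -16 & 3 & -50 \end{array}$$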
Step 1 might choose the 2 at the 1,1-entry as pivot. Step 2 is not needed,
and Step 3 and Step 4 compute
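$$\begin{array}{rrrrr|r} 2 & -3 & 1 & 4 & 1 & 17 \\ -4 & 6 & -1 & -6 & 0 & -27 \\ 1 & 1 & 2 & 7 & -1 & 20 \\ -4 & 1 & -4 & -16 & 3 & -50 \end{array} \;\xrightarrow{\substack{(1/2) \cdot \mathrm{I} \\ 4 \cdot \mathrm{I} + \mathrm{II} \\ -1 \cdot \mathrm{I} + \mathrm{III} \\ 4 \cdot \mathrm{I} + \mathrm{IV}}}\; \begin{array}{rrrrr|r} 1 & -3/2 & 1/2 & 2 & 1/2 & 17/2 \\ 0 & 0 & 1 & 2 & 2 & 7 \\ 0 & 5/2 & 3/2 & 5 & -3/2 & 23/2 \\ 0 & -5 & -2 & -8 & 5 & -16 \end{array}$$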
With row 2 as current row, the leftmost column at row 2 or below is column
2. We use 5/2 as pivot; Step 2 interchanges rows 2 and 3:
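$$\begin{array}{rrrrr|r} 1 & -3/2 & 1/2 & 2 & 1/2 & 17/2 \\ 0 & 0 & 1 & 2 & 2 & 7 \\ 0 & 5/2 & 3/2 & 5 & -3/2 & 23/2 \\ 0 & -5 & -2 & -8 & 5 & -16 \end{array} \;\xrightarrow{\mathrm{II} \leftrightarrow \mathrm{III}}\; \begin{array}{rrrrr|r} 1 & -3/2 & 1/2 & 2 & 1/2 & 17/2 \\ 0 & 5/2 & 3/2 & 5 & -3/2 & 23/2 \\ 0 & 0 & 1 & 2 & 2 & 7 \\ 0 & -5 & -2 & -8 & 5 & -16 \end{array}$$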
Then Step 3 and Step 4 come along.
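$$\begin{array}{rrrrr|r} 1 & -3/2 & 1/2 & 2 & 1/2 & 17/2 \\ 0 & 5/2 & 3/2 & 5 & -3/2 & 23/2 \\ 0 & 0 & 1 & 2 & 2 & 7 \\ 0 & -5 & -2 & -8 & 5 & -16 \end{array} \;\xrightarrow{\substack{(2/5) \cdot \mathrm{II} \\ (3/2) \cdot \mathrm{II} + \mathrm{I} \\ 5 \cdot \mathrm{II} + \mathrm{IV}}}\; \begin{array}{rrrrr|r} 1 & 0 & 7/5 & 5 & -2/5 & 77/5 \\ 0 & 1 & 3/5 & 2 & -3/5 & 23/5 \\ 0 & 0 & 1 & 2 & 2 & 7 \\ 0 & 0 & 1 & 2 & 2 & 7 \end{array}$$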
Now row 3 is the current row. We use the 1 at entry 3,3 as pivot. Step 2 and
Step 3 are skipped. Step 4:
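$$\begin{array}{rrrrr|r} 1 & 0 & 7/5 & 5 & -2/5 & 77/5 \\ 0 & 1 & 3/5 & 2 & -3/5 & 23/5 \\ 0 & 0 & 1 & 2 & 2 & 7 \\ 0 & 0 & 1 & 2 & 2 & 7 \end{array} \;\xrightarrow{\substack{-(3/5) \cdot \mathrm{III} + \mathrm{II} \\ -(7/5) \cdot \mathrm{III} + \mathrm{I} \\ -1 \cdot \mathrm{III} + \mathrm{IV}}}\; \begin{array}{rrrrr|r} 1 & 0 & 0 & 11/5 & -16/5 & 28/5 \\ 0 & 1 & 0 & 4/5 & -9/5 & 2/5 \\ 0 & 0 & 1 & 2 & 2 & 7 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{array}$$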
Continue to row 4 and Step 1 fails. We are done.
Recall that the coefficient matrix you get at the end of Elimination is in
row-echelon form; the echelons referred to are the columns with their pivots
organized left to right and top to bottom. The number of pivots (pivoted
variables) in Elimination is called the rank of the coefficient matrix.
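The last equation in this system simply says 0 = 0, which is always true
and, therefore, has no effect on solutions. Let's write out the three equations
that matter.
$$\begin{aligned} x_1 + \tfrac{11}{5} \cdot x_4 - \tfrac{16}{5} \cdot x_5 &= \tfrac{28}{5} \\ x_2 + \tfrac{4}{5} \cdot x_4 - \tfrac{9}{5} \cdot x_5 &= \tfrac{2}{5} \\ x_3 + 2 \cdot x_4 + 2 \cdot x_5 &= 7 \end{aligned}$$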
If x4 and x5 are chosen arbitrarily, then x1 and x2 and x3 are uniquely deter-
mined in a solution to the system. Thus, there are infinitely many solutions,
and, in choosing some particular solution, x4 and x5 are arbitrary. These ar-
bitrary variables did not get pivots in Elimination, whereas the determined
variables did get pivots. When a system is consistent (has solutions), the piv-
oted variables are determined by the non-pivoted variables. The non-pivoted
variables are said to be free since their values are arbitrary. The pivoted
variables are called basic. The number of basic variables equals the number of
pivots, which is the rank of the coefficient matrix.
Problem T10.1. Solve the system of equations given by the following aug-
mented matrix, in which the unknowns are listed along the top.
$$\begin{array}{rrr|r} x_1 & x_2 & x_3 & = \\ \hline 1 & 2 & 3 & 1 \\ 4 & 5 & 6 & 0 \\ 7 & 8 & 9 & 0 \end{array}$$
Solution. We put the relevant step from the Elimination algorithm in a square
at the top of each cluster of calculations. As we will see, we find pivots at
entries [1, 1] and [2, 2].
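$$\begin{array}{rrr|r} 1 & 2 & 3 & 1 \\ 4 & 5 & 6 & 0 \\ 7 & 8 & 9 & 0 \end{array} \;\xrightarrow{\substack{\boxed{4} \\ -4 \cdot \mathrm{I} + \mathrm{II} \\ -7 \cdot \mathrm{I} + \mathrm{III}}}\; \begin{array}{rrr|r} 1 & 2 & 3 & 1 \\ 0 & -3 & -6 & -4 \\ 0 & -6 & -12 & -7 \end{array}$$
$$\xrightarrow{\substack{\boxed{3} \\ -(1/3) \cdot \mathrm{II}}}\; \begin{array}{rrr|r} 1 & 2 & 3 & 1 \\ 0 & 1 & 2 & 4/3 \\ 0 & -6 & -12 & -7 \end{array} \;\xrightarrow{\substack{\boxed{4} \\ -2 \cdot \mathrm{II} + \mathrm{I} \\ 6 \cdot \mathrm{II} + \mathrm{III}}}\; \begin{array}{rrr|r} 1 & 0 & -1 & -5/3 \\ 0 & 1 & 2 & 4/3 \\ 0 & 0 & 0 & 1 \end{array}$$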
We are done, since there are no further pivots in the coefficient matrix. As
before, the final system has the same solutions as the original. Row 3
says that
$$0 \cdot x_1 + 0 \cdot x_2 + 0 \cdot x_3 = 1$$
and this is impossible. Thus, the equation in this problem has no solutions at
all. We say the system is inconsistent. This example shows the general form
2. Matrix Inverse
It looks as if this solves the linear equation! The trouble is that not all matrices
have an inverse, as we will see.
Because matrix multiplication is, in general, not commutative, we need to
be pickier in the argument we just gave. Consider the equation A · X = E,
and suppose there is a matrix B such that B · A is an identity matrix. Let’s
not worry about sizes at this point, and just write B · A = I. If X = D is a
solution to A · X = E, so that A · D = E, then we can multiply on the left by
B, and get this:
D = I · D = (B · A) · D = B · (A · D) = B · E
We are saying that equation (10.5) does not find a solution to the present
equation, since the present equation doesn’t have any solution at all.
If B · A is an identity matrix, then A · X = E has at most one solution for
each E, but sometimes there may be no solution at all. Let’s see how to get
the existence of a solution: if there is a matrix C such that A · C is an identity
matrix, then we claim that C · E is a solution to A · X = E. Indeed,
A · (C · E) = (A · C) · E = I · E = E
Going back to (10.5), we had the equation $P \cdot Q = I_2$, and so $P \cdot X = E$
has a solution for all E. If you do Elimination on P , you will see that it has
rank 2, so that every row gets a pivot and there cannot be an inconsistency.
However, because there is a free variable (in the third column), the solution
will never be unique.
We can put the two ideas together: if there are matrices B, C such that
B · A and A · C are identity matrices, then every equation A · X = E will have
a unique solution. In this case, we say that A is invertible. The situation with
B, C is a lot simpler than it looks, as we will see by embarking on an argument
to establish a circle of facts that will tell us which matrices are invertible and
how to find the B, C.
Suppose that A is invertible, and get matrices B, C so that B · A and A · C
are identity matrices. Let A be m × n. (Soon we will see that m = n, but, for
now, we work in general.) The matrix $B \cdot A$ has n columns, and so $I_n$ has to
be the identity matrix equal to that product: $B \cdot A = I_n$. This shows that B
is n × m. Similarly, $A \cdot C$ has m rows, and so $A \cdot C = I_m$, and so C is n × m.
Next we show that B = C! Indeed, using the associative law, we compute
B = B · Im = B · (A · C) = (B · A) · C = In · C = C
2. MATRIX INVERSE 175
(a) Suppose that A is m × n and has an inverse. Then m = n and A has rank
n.
(b) Suppose that A is an n × n matrix of rank n. Then the row-echelon form
of A is $I_n$.
(c) Suppose that the matrix A has row-echelon form equal to In . Then A has
an inverse.
Proof of (b): The first pivot goes in row 1, the second in row 2, and so on.
Since the rank is n, each row gets a pivot. There are n columns, as well, and so
each column gets a pivot. The pivots, which are 1’s in row-echelon form, move
left to right as we go down the rows, and it is easy to see that the pivots have
to be at the [i, i] entries for i = 1, 2, . . . , n. This shows that the row-echelon
form of A is $I_n$.
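Fact (c) says that a matrix whose row-echelon form is $I_n$ has an inverse. One standard way to compute that inverse (sketched here as an assumption; the text's own procedure is not shown in this section) is to run Elimination on the augmented matrix [A | I]: when the left block becomes $I_n$, the right block is $A^{-1}$. The sketch returns `None` when some column gets no pivot, which by fact (a) means there is no inverse. `inverse` is a hypothetical helper name.

```python
from fractions import Fraction


def inverse(A):
    """Row-reduce [A | I]; when the left block becomes I_n, the right block is
    A^{-1}. Returns None if some column gets no pivot (rank < n)."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        rows = [r for r in range(col, n) if M[r][col] != 0]
        if not rows:
            return None                 # no pivot here, so A is not invertible
        M[col], M[rows[0]] = M[rows[0]], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return [row[n:] for row in M]


A = [[1, -2, -1], [2, 3, 3], [1, 1, 1]]   # coefficient matrix of system (10.1)
A_inv = inverse(A)
# Multiplying by the inverse solves A . X = B for the right side of (10.1).
B = [-2, 15, 6]
X = [sum(A_inv[i][k] * B[k] for k in range(3)) for i in range(3)]
print(X)
```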
Solution. Recall that the equation for X says that it is efficient in the sense
that all that is produced is used. We can place the X as an unknown in a
linear equation:
$$A \cdot X + C = I_3 \cdot X \quad \text{so that} \quad C = (I_3 - A) \cdot X$$
Problem T10.4. Find coefficients a, b, c, so that (1, 4), (2, 5), and (3, 10) all
lie on the curve $y = a \cdot x^2 + b \cdot x + c$.
Solution. Plug the points into the equation to see what equations we need:
$$\begin{aligned} (1, 4):&\quad 4 = a \cdot 1^2 + b \cdot 1 + c \\ (2, 5):&\quad 5 = a \cdot 2^2 + b \cdot 2 + c \\ (3, 10):&\quad 10 = a \cdot 3^2 + b \cdot 3 + c \end{aligned}$$
$$y = a_0 + a_1 \cdot x + \cdots + a_n \cdot x^n$$
passing through the points. In the curve, the coefficients $a_j$ are the unknowns.
In the case n = 1, we are saying that there is a unique curve $y = a_0 + a_1 \cdot x$
passing through two points with distinct x-coordinates – that’s a unique line
through two points. We prove the general fact in two worked-out problems.
Solution. The circle of facts on p.175 shows that V will have an inverse if its
rank is n + 1. That would mean that the equation $V \cdot X = O$ would have a
unique solution. The matrix X is (n + 1) × 1; write it
$$X = \begin{bmatrix} a_0 \\ a_1 \\ \vdots \\ a_n \end{bmatrix}$$
$$\begin{aligned} a_0 + a_1 x_1 + a_2 x_1^2 + \cdots + a_n x_1^n &= 0 \\ a_0 + a_1 x_2 + a_2 x_2^2 + \cdots + a_n x_2^n &= 0 \\ &\;\;\vdots \end{aligned}$$
Problem T10.6. Let $(x_1, y_1), \ldots, (x_{n+1}, y_{n+1})$ be points where the $x_j$ are distinct.
Then there is a unique curve $y = a_0 + a_1 x + \cdots + a_n x^n$ passing through
these points.
Solution. The curve passes through the points exactly when these
equations hold.
$$\begin{aligned} a_0 + a_1 x_1 + \cdots + a_n x_1^n &= y_1 \\ &\;\;\vdots \\ a_0 + a_1 x_{n+1} + \cdots + a_n x_{n+1}^n &= y_{n+1} \end{aligned}$$
The coefficient matrix for this system is the matrix V of the previous prob-
lem. By that problem, V has an inverse, and this gives us the unique curve
coefficients ai that satisfy the equations.
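To see Problem T10.6 in action, here is a hedged sketch that builds the augmented matrix [V | Y] (row j is $1, x_j, x_j^2, \ldots, x_j^n$ followed by $y_j$) and reduces it with exact fractions. Distinct x-values guarantee a pivot in every column, as the previous problem shows. The example uses the three equations written in the solution of Problem T10.4; `interpolate` is a hypothetical name.

```python
from fractions import Fraction


def interpolate(points):
    """Coefficients [a_0, ..., a_n] of the unique polynomial of degree <= n
    through n+1 points with distinct x-values, by reducing [V | Y]."""
    n1 = len(points)
    # Row j of V is 1, x_j, x_j^2, ..., x_j^n; the last column holds y_j.
    M = [[Fraction(x) ** k for k in range(n1)] + [Fraction(y)] for x, y in points]
    for col in range(n1):
        # Distinct x-values mean V is invertible, so a pivot always exists.
        pivot_row = next(r for r in range(col, n1) if M[r][col] != 0)
        M[col], M[pivot_row] = M[pivot_row], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n1):
            if r != col:
                factor = M[r][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return [row[-1] for row in M]


# The points used in the solution of Problem T10.4; y = a x^2 + b x + c.
c, b, a = interpolate([(1, 4), (2, 5), (3, 10)])
print(a, b, c)
```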
Here is a problem in which there are fewer than n + 1 points for a curve of
degree n.
When the parameters are exponents, we can use the logarithm to bring
them down as linear factors.
Problem T10.8. Fit the curve $w = x^a \cdot y^b \cdot z^c$, where a, b, c are parameters,
to these points:
x y z w
2 3 3 4
2 5 5 4
3 4 7 6
To fit the curve, get a system of linear equations from the logarithm of the
equation.
Solution. Compute that
$$\ln(w) = a \cdot \ln(x) + b \cdot \ln(y) + c \cdot \ln(z)$$
Each row of the table gives one linear equation in a, b, c. We used Excel to
solve the resulting system of equations, since the coefficient matrix has an inverse.
a ≈ 2, b ≈ 0.724541939, c ≈ −0.724541939
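The text solved this system in Excel. A hedged Python equivalent builds one linear equation per data row from $\ln(w) = a \cdot \ln(x) + b \cdot \ln(y) + c \cdot \ln(z)$ and, since the system is only 3 × 3, solves it with Cramer's rule (a swapped-in technique, not the text's). For these data the exact answer works out to a = 2 and b = −c = ln(3/2)/ln(7/4), which agrees with the decimals above.

```python
import math

points = [  # (x, y, z, w) from the table in Problem T10.8
    (2, 3, 3, 4),
    (2, 5, 5, 4),
    (3, 4, 7, 6),
]


def det3(m):
    """Determinant of a 3 x 3 matrix by cofactor expansion along row 0."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(M, rhs):
    """Cramer's rule: replace column k by the right side for unknown k."""
    d = det3(M)
    return [det3([row[:k] + [r] + row[k + 1:] for row, r in zip(M, rhs)]) / d
            for k in range(3)]


# One equation per row: ln(w) = a ln(x) + b ln(y) + c ln(z).
M = [[math.log(x), math.log(y), math.log(z)] for x, y, z, _ in points]
rhs = [math.log(w) for *_, w in points]
a, b, c = solve3(M, rhs)
print(a, b, c)
```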
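[Figure: the circuit drawn as a graph with three vertices and wires a, b, c, d, e (left); the same graph with an arrow on each wire (right).]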
Our example has three vertices (junctions) and five edges (wires). The
wires have been labeled a,b,c,d,e. Each wire has a current associated with
it – we defined this real number on p.9; its absolute value is the number of
electrons passing through that wire per unit time. To indicate the direction of
the electrons, we put an arrow on each wire. That’s the picture on the right.
Let Ia be the current in wire a. If the current Ia is going in the direction of
the arrow, then Ia > 0; if the current is going in the opposite direction, then
Ia < 0. It doesn’t matter which way our arrows go, the current values can be
adjusted appropriately.
Now we have currents Ia , Ib , and so on. As mentioned earlier in this book,
the units of current are amperes, usually abbreviated amps.
One of the basic principles of electrical circuits is called Kirchhoff's Current
Rule.⁷ According to Kirchhoff's Rule, the sum of the currents coming into each
junction is equal to the sum of the currents leaving that junction.⁸ Here are
the equations for Kirchhoff's Rule. They are linear equations with right side 0.
$$\begin{aligned} \text{top vertex:}&\quad I_e + I_b + I_c + I_d = 0 \\ \text{left vertex:}&\quad 0 = I_e + I_b + I_a \\ \text{right vertex:}&\quad I_a = I_c + I_d \end{aligned}$$
It turns out that we can predict which variable currents can be free in
Kirchhoff's equations and which can be basic. To do this, we choose a tree in
the graph – a set of wires that does not contain a loop – a path from one
7See [9, p.785].
8Whether a current is coming in or leaving is determined by the arrow.
3. APPLICATIONS OF LINEAR EQUATIONS 183
junction back to that same junction. (The path does not have to follow the
arrow directions.) We give our tree as many edges as possible; it is a maximal
tree. For instance, a, c form a maximal tree, because if we try to add an additional
edge, say b, then we have a loop (b to a to c). It turns out that the variables
associated with a maximal tree can be basic in Kirchhoff's equations. We chose
a, c, and so $I_a$, $I_c$ can be basic variables, and $I_b$, $I_d$, $I_e$ would be free. Here is the
augmented matrix for the equations, putting the columns for $I_a$ and $I_c$ first, so
they will be found basic in Elimination. The second matrix is the row-echelon
form from Elimination.
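Columns, in order: $I_a, I_c, I_b, I_d, I_e$, then the right side.
$$\left[\begin{array}{rrrrr|r}
0 & 1 & 1 & 1 & 1 & 0\\
-1 & 0 & -1 & 0 & -1 & 0\\
1 & -1 & 0 & -1 & 0 & 0
\end{array}\right]
\Longrightarrow
\left[\begin{array}{rrrrr|r}
1 & 0 & 1 & 0 & 1 & 0\\
0 & 1 & 1 & 1 & 1 & 0\\
0 & 0 & 0 & 0 & 0 & 0
\end{array}\right]$$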
Sure enough, $I_a$, $I_c$ are basic. Remember that different sets of variables can be
basic. A different maximal tree would give a different set of basic variables.
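To check that $I_a$ and $I_c$ come out basic, here is a small exact row reducer in Python. It is a sketch of our own, not the book's Elimination macro; it uses `fractions.Fraction` so that no rounding creeps in. The coefficient columns are in the order $I_a, I_c, I_b, I_d, I_e$, and the right sides are omitted since they are all 0.

```python
from fractions import Fraction


def rref(rows):
    """Return the reduced row-echelon form and the list of pivot columns."""
    m = [[Fraction(v) for v in row] for row in rows]
    pivots, r = [], 0
    for col in range(len(m[0])):
        # Look for a nonzero entry at or below row r in this column.
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue  # no pivot here: this column's variable is free
        m[r], m[pivot] = m[pivot], m[r]
        lead = m[r][col]
        m[r] = [v / lead for v in m[r]]
        for i in range(len(m)):
            if i != r and m[i][col] != 0:
                factor = m[i][col]
                m[i] = [x - factor * y for x, y in zip(m[i], m[r])]
        pivots.append(col)
        r += 1
        if r == len(m):
            break
    return m, pivots


names = ["Ia", "Ic", "Ib", "Id", "Ie"]
coeffs = [
    [0, 1, 1, 1, 1],     # top vertex:   Ic + Ib + Id + Ie = 0
    [-1, 0, -1, 0, -1],  # left vertex:  -Ia - Ib - Ie = 0
    [1, -1, 0, -1, 0],   # right vertex: Ia - Ic - Id = 0
]
reduced, pivots = rref(coeffs)
basic = [names[p] for p in pivots]
print(basic)  # ['Ia', 'Ic']
```

The reduced rows agree with the row-echelon form shown above, and the zero third row reflects the fact that the three junction equations add up to 0 = 0.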
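[Figure: a 3 × 3 grid of interior points, with $x_1, x_2, x_3$ in the bottom row, $x_4, x_5, x_6$ in the middle row, and $x_7, x_8, x_9$ in the top row; the side temperatures are C on the left, A on the right, B on the top, and D on the bottom.]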
Now we interpret the condition that the temperature $x_1$ is the average
of the temperatures around it. We use the temperatures $x_2$, $x_4$, and the side
temperatures C, D:
$$x_1 = \frac{1}{4} \cdot \left[x_2 + x_4 + C + D\right]$$
This gives a linear equation:
$$4x_1 - x_2 - x_4 = C + D$$
We have used C + D as the right side, since C, D are given constants. Each
of the nine grid points has such an equation, and we arrive at a system of
nine equations in nine variables. Here is the augmented matrix of this system.
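Each row i comes from the average equation for grid point $x_i$; the columns are $x_1, \ldots, x_9$, followed by the right side.
$$\left[\begin{array}{rrrrrrrrr|c}
4 & -1 & 0 & -1 & 0 & 0 & 0 & 0 & 0 & C + D\\
-1 & 4 & -1 & 0 & -1 & 0 & 0 & 0 & 0 & D\\
0 & -1 & 4 & 0 & 0 & -1 & 0 & 0 & 0 & A + D\\
-1 & 0 & 0 & 4 & -1 & 0 & -1 & 0 & 0 & C\\
0 & -1 & 0 & -1 & 4 & -1 & 0 & -1 & 0 & 0\\
0 & 0 & -1 & 0 & -1 & 4 & 0 & 0 & -1 & A\\
0 & 0 & 0 & -1 & 0 & 0 & 4 & -1 & 0 & B + C\\
0 & 0 & 0 & 0 & -1 & 0 & -1 & 4 & -1 & B\\
0 & 0 & 0 & 0 & 0 & -1 & 0 & -1 & 4 & A + B
\end{array}\right]$$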
The elimination macro can be applied to the coefficient matrix to see that
it has rank 9. In other words, our equations have a unique solution once
A, B, C, D are given.
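Here is a sketch of solving the nine grid equations numerically, with the coefficient matrix copied from the augmented matrix above. The boundary temperatures in the example call (A = 100, B = C = D = 0) are hypothetical values of our own, and the solver is plain Gaussian elimination with partial pivoting rather than the book's macro.

```python
def grid_system(A, B, C, D):
    """Coefficient matrix and right side for the 3 x 3 averaging grid."""
    coeffs = [
        [4, -1, 0, -1, 0, 0, 0, 0, 0],
        [-1, 4, -1, 0, -1, 0, 0, 0, 0],
        [0, -1, 4, 0, 0, -1, 0, 0, 0],
        [-1, 0, 0, 4, -1, 0, -1, 0, 0],
        [0, -1, 0, -1, 4, -1, 0, -1, 0],
        [0, 0, -1, 0, -1, 4, 0, 0, -1],
        [0, 0, 0, -1, 0, 0, 4, -1, 0],
        [0, 0, 0, 0, -1, 0, -1, 4, -1],
        [0, 0, 0, 0, 0, -1, 0, -1, 4],
    ]
    rhs = [C + D, D, A + D, C, 0, A, B + C, B, A + B]
    return coeffs, rhs


def solve(matrix, vector):
    """Gaussian elimination with partial pivoting, then back substitution."""
    n = len(matrix)
    aug = [[float(v) for v in row] + [float(b)] for row, b in zip(matrix, vector)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][k] * x[k] for k in range(r + 1, n))) / aug[r][r]
    return x


temps = solve(*grid_system(A=100, B=0, C=0, D=0))
for start in (6, 3, 0):  # print the top row (x7, x8, x9) first
    print([round(t, 2) for t in temps[start:start + 3]])
```

Two easy sanity checks: if all four sides have the same temperature, every grid point gets that temperature; and with only side A hot, the center $x_5$ gets one quarter of A, since rotating the square permutes the sides.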
Our model can be any sort of graph. Here is an example we like.
[Figure: a graph with interior vertices $x_1, \ldots, x_9$ and seven outer vertices labeled a, b, c, d, e, f, g.]
The seven lettered vertices a, b, c, d, e, f, g are given values, and the variables $x_j$ should be determined by averaging. For instance, $x_1$ is connected to
$x_4$, $x_5$, a, and so its equation is this:
$$x_1 = \frac{1}{3} \cdot \left(x_4 + x_5 + a\right), \quad \text{which is} \quad 3 \cdot x_1 - x_4 - x_5 = a$$
Notice that $x_5$ is connected to six vertices, and so its value is an average of six
numbers. We will write down all the equations in class or on homework.
[Figure: warehouses 1, 2, 3, 4 joined by seven shipping routes labeled a through g.]
Here is a table giving the current supply of pallets at each vertex and the
eventual supply, accomplished by shipping.
warehouse: 1 2 3 4
current 5 25 0 50
eventual 15 5 40 20
(Notice that there are 80 pallets total in both supply lists.) We want to
determine how to ship the pallets around to get the eventual supply numbers.
We have seven routes, and we let their labels stand for the amount shipped:
a, b, . . . , g. At warehouse 1, we have 5 pallets and we need 15, so we will have
a net 10 pallets coming in. Pallets can come in along route g. Does that mean
g = 10? Maybe we ship pallets out along routes a, d. The net number of
pallets we have after shipping must be 10: −a − d + g = 10. We get similar
equations at warehouses 2, 3, 4. Here are all the resulting equations.
$$\begin{aligned}
-a - d + g &= 10\\
a - b + c &= -20\\
b - c + d - e + f &= 40\\
e - f - g &= -30
\end{aligned}$$
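Adding all four equations gives 0 = 10 − 20 + 40 − 30 = 0, so one equation is redundant and we expect rank 3. The sketch below row-reduces the augmented matrix with exact fractions and then builds one shipping plan from it. The values chosen for the free routes (g = 30, the others 0) are our own illustrative choice, not part of the problem.

```python
from fractions import Fraction


def rref(rows):
    """Reduced row-echelon form (exact arithmetic) and pivot columns."""
    m = [[Fraction(v) for v in row] for row in rows]
    pivots, r = [], 0
    for col in range(len(m[0]) - 1):  # the last column is the right side
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        lead = m[r][col]
        m[r] = [v / lead for v in m[r]]
        for i in range(len(m)):
            if i != r and m[i][col] != 0:
                factor = m[i][col]
                m[i] = [x - factor * y for x, y in zip(m[i], m[r])]
        pivots.append(col)
        r += 1
        if r == len(m):
            break
    return m, pivots


routes = "abcdefg"
augmented = [
    # a   b   c   d   e   f   g | net change
    [-1,  0,  0, -1,  0,  0,  1,  10],   # warehouse 1
    [ 1, -1,  1,  0,  0,  0,  0, -20],   # warehouse 2
    [ 0,  1, -1,  1, -1,  1,  0,  40],   # warehouse 3
    [ 0,  0,  0,  0,  1, -1, -1, -30],   # warehouse 4
]
reduced, pivots = rref(augmented)
print("rank:", len(pivots), "basic:", [routes[p] for p in pivots])

# Pick values for the free routes, then solve each pivot row for its basic route.
x = [Fraction(0)] * 7
x[routes.index("g")] = Fraction(30)
for row, p in zip(reduced, pivots):
    x[p] = row[-1] - sum(row[j] * x[j] for j in range(7) if j != p)

# The plan must satisfy every original warehouse equation.
assert all(sum(c * v for c, v in zip(eq[:-1], x)) == eq[-1] for eq in augmented)
print({name: int(v) for name, v in zip(routes, x)})
```

With this choice every shipment is nonnegative: 20 pallets on route a, 40 on route b, and 30 on route g.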
Partial Derivatives
1. Partial derivatives
at the function this way, it might help to block out expressions in y’s.
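$$y^2 + y \cdot z - e^z = \square^2 + \square \cdot z - e^z$$
Here is the derivative, with an explicit reminder of the basic rules for calculating it.
$$\frac{\partial}{\partial z}\left(\square^2 + \square \cdot z - e^z\right) = \frac{\partial}{\partial z}\square^2 + \frac{\partial}{\partial z}\left(\square \cdot z\right) - \frac{\partial}{\partial z}e^z \qquad \text{sum rule}$$
We continue:
$$\begin{aligned}
&= 0 + \frac{\partial}{\partial z}\left(\square \cdot z\right) - \frac{\partial}{\partial z}e^z && \text{derivative of constant}\\
&= \square \cdot \frac{\partial}{\partial z} z - \frac{\partial}{\partial z}e^z && \text{constant multiple rule}\\
&= \square \cdot 1 - e^z && \text{derivative of } z \text{ and } e^z
\end{aligned}$$
Now we remember that the $\square$ stood for y, and we have
$$\frac{\partial}{\partial z}\left(y^2 + y \cdot z - e^z\right) = y - e^z$$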
You may or may not want to block out other variables; the point is to focus
on the variable z and ignore everything else.
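To compute the other derivative, we regard y as variable and hold z constant. This time, we'll leave the constant z in the expression rather than block
it out. Notice that $e^z$ will also be constant.
$$\begin{aligned}
\frac{\partial}{\partial y}\left(y^2 + y \cdot z - e^z\right) &= \frac{\partial}{\partial y}y^2 + \frac{\partial}{\partial y}\left(y \cdot z\right) - \frac{\partial}{\partial y}e^z\\
&= 2 \cdot y + z \cdot \frac{\partial}{\partial y}y - 0\\
&= 2 \cdot y + z
\end{aligned}$$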
Be sure you can explain the steps here!
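A quick numerical sanity check of both partial derivatives: a sketch using central differences. The sample point (y, z) = (1.5, 0.3) and the step size are arbitrary choices of ours; the helper name `partial` is also ours.

```python
import math


def f(y, z):
    return y**2 + y * z - math.exp(z)


def partial(func, point, i, h=1e-6):
    """Central-difference estimate of the partial derivative in argument i."""
    up, down = list(point), list(point)
    up[i] += h
    down[i] -= h
    return (func(*up) - func(*down)) / (2 * h)


y, z = 1.5, 0.3
df_dz = partial(f, (y, z), 1)  # compare with y - e^z
df_dy = partial(f, (y, z), 0)  # compare with 2y + z
print(df_dz, y - math.exp(z))
print(df_dy, 2 * y + z)
```

Nudging one variable while the other stays fixed is exactly the "hold everything else constant" idea behind a partial derivative.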
The calculation of partial derivatives involves exactly the same rules as
the calculation of ordinary derivatives; this section is essentially a review of
differentiation. We say that the function f of several variables is differentiable
if it has a valid partial derivative with respect to each of its variables.
1. PARTIAL DERIVATIVES 191
Solution. We calculate
$$\frac{\partial T}{\partial x} = 2x - y \ \text{kg/m}^3$$
(Make sure you understand why we get the units we do!) The inequality
$\partial T/\partial x > 0$ is seen to be $2x - y > 0$, and that's $2x > y$. The inequality
describes a set of points on the square: the points below the line $y = 2x$.
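Solution. We use the blocking out method on the first one, writing $\square$ for t.
We need to remember that the derivative of $\ln|x|$ is $1/x$. (Here the primes denote derivatives with respect to z.)
$$\begin{aligned}
\frac{\partial}{\partial z}\,\frac{\ln|z^2+\square|}{z\cdot\square+2}
&= \frac{(\ln|z^2+\square|)'\cdot(z\cdot\square+2) - \ln|z^2+\square|\cdot(z\cdot\square+2)'}{(z\cdot\square+2)^2} && \text{quotient rule}\\
&= \frac{1}{(z\cdot\square+2)^2}\cdot\Big[(\ln|z^2+\square|)'\cdot(z\cdot\square+2) - \ln|z^2+\square|\cdot(z\cdot\square+2)'\Big] && \text{rewrite}\\
&= \frac{1}{(z\cdot\square+2)^2}\cdot\Big[\frac{(z^2+\square)'}{z^2+\square}\cdot(z\cdot\square+2) - \ln|z^2+\square|\cdot(z\cdot\square+2)'\Big] && \text{chain, derivative of logarithm}\\
&= \frac{1}{(z\cdot\square+2)^2}\cdot\Big[\frac{2\cdot z}{z^2+\square}\cdot(z\cdot\square+2) - \ln|z^2+\square|\cdot\square\Big] && \text{Power Rule, constant multiple}\\
&= \frac{1}{(z\cdot t+2)^2}\cdot\Big[\frac{2\cdot z}{z^2+t}\cdot(z\cdot t+2) - \ln|z^2+t|\cdot t\Big] && \text{unblock } t
\end{aligned}$$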
194 11. PARTIAL DERIVATIVES
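We'll do the other derivative without blocking and without giving reasons.
Make sure you understand each step, as always. (This time the primes denote derivatives with respect to t.)
$$\begin{aligned}
\frac{\partial}{\partial t}\,\frac{\ln|z^2+t|}{z\cdot t+2}
&= \frac{(\ln|z^2+t|)'\cdot(z\cdot t+2) - \ln|z^2+t|\cdot(z\cdot t+2)'}{(z\cdot t+2)^2}\\
&= \frac{1}{(z\cdot t+2)^2}\cdot\Big[\frac{1}{z^2+t}\cdot(z\cdot t+2) - \ln|z^2+t|\cdot z\Big]
\end{aligned}$$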
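The same kind of finite-difference check works for the quotient. This sketch codes the two formulas just derived and compares them with central differences at an arbitrary point (z, t) = (1.2, 0.7) of our choosing, where $z^2 + t > 0$ and $z \cdot t + 2 \ne 0$.

```python
import math


def g(z, t):
    return math.log(abs(z**2 + t)) / (z * t + 2)


def dg_dz(z, t):
    # Result of the blocked-out derivation with respect to z.
    return (2 * z / (z**2 + t) * (z * t + 2) - math.log(abs(z**2 + t)) * t) / (z * t + 2) ** 2


def dg_dt(z, t):
    # Result of the derivation with respect to t.
    return (1 / (z**2 + t) * (z * t + 2) - math.log(abs(z**2 + t)) * z) / (z * t + 2) ** 2


z, t, h = 1.2, 0.7, 1e-6
numeric_dz = (g(z + h, t) - g(z - h, t)) / (2 * h)
numeric_dt = (g(z, t + h) - g(z, t - h)) / (2 * h)
print(numeric_dz, dg_dz(z, t))
print(numeric_dt, dg_dt(z, t))
```

A check like this cannot prove a formula, but a mismatch is a reliable sign of an algebra slip.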
There are times when we need more than two partial derivatives. Once
you understand the notation for the second derivative, higher derivatives are
not hard to interpret.
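Problem T11.8. Let $G(v, x, y) = x v^3 + y v^4$. Compute
$$\frac{\partial^3 G}{\partial v\,\partial x\,\partial v}$$
Solution. We see that
$$\begin{aligned}
\frac{\partial^3 G}{\partial v\,\partial x\,\partial v} &= \frac{\partial}{\partial v}\left(\frac{\partial}{\partial x}\left(\frac{\partial G}{\partial v}\right)\right)\\
&= \frac{\partial}{\partial v}\left(\frac{\partial}{\partial x}\left(3xv^2 + 4yv^3\right)\right)\\
&= \frac{\partial}{\partial v}\left(3v^2\right) = 6v
\end{aligned}$$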
Many of the classical mathematical models are expressed via partial differential equations, referred to as PDEs. The unknown of a PDE is a function of
3 It turns out that it usually doesn't matter in which order the two differentiations in a second
derivative are taken, so if you forget the ordering, you will usually be OK.
several variables. The PDE gives an equation that must be satisfied by various
partial derivatives of the function. Here is an example.
∆f = ∆x f + ∆y f
Remembering that there are formulas for x, y in terms of t, we can write the
answer as a function of t:
$$\frac{df}{dt} = \left(2(t^2 + 2) + (1 - 3t)^3\right) \cdot (2t) + 3(t^2 + 2)(1 - 3t)^2 \cdot (-3)$$
As long as we remember that x, y are functions of t, either form of the answer
is fine.
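The function f behind this answer is not shown in this excerpt. Purely for illustration, we assume $f(x, y) = x^2 + x \cdot y^3$, a hypothetical choice whose partial derivatives ($2x + y^3$ and $3xy^2$) produce exactly the factors in the answer above, with $x = t^2 + 2$ and $y = 1 - 3t$. The sketch compares the Chain Rule formula with a direct finite-difference derivative of $t \mapsto f(x(t), y(t))$.

```python
def f(x, y):
    # Hypothetical f, chosen only because its partials match the answer above.
    return x**2 + x * y**3


def x_of(t):
    return t**2 + 2


def y_of(t):
    return 1 - 3 * t


def chain_rule(t):
    # (2x + y^3) * dx/dt + 3xy^2 * dy/dt, written in terms of t
    return (2 * (t**2 + 2) + (1 - 3 * t) ** 3) * (2 * t) + 3 * (t**2 + 2) * (1 - 3 * t) ** 2 * (-3)


t, h = 0.4, 1e-6
direct = (f(x_of(t + h), y_of(t + h)) - f(x_of(t - h), y_of(t - h))) / (2 * h)
print(direct, chain_rule(t))
```

The two numbers agree to many decimal places, which is what the Chain Rule promises for any differentiable f.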
We have stated the Chain Rule using two variables: x, y. Any number of
variables can be used, and it does not matter what they are called. Thus, if
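we have differentiable $f(x_1, \ldots, x_n)$, and if each $x_i$ is a differentiable function
of t, then the Chain Rule asserts that
$$\frac{df}{dt} = \frac{\partial f}{\partial x_1} \cdot \frac{dx_1}{dt} + \frac{\partial f}{\partial x_2} \cdot \frac{dx_2}{dt} + \cdots + \frac{\partial f}{\partial x_n} \cdot \frac{dx_n}{dt}$$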
3. THE CHAIN RULE 199
Problem T11.12. Given F (a, b, c), suppose that a, b, c are functions of the
variable w. Find dF/dw.
Problem T11.13. Suppose that we have g(x, y) and we define h(x) = g(x, x).
Find h'(x) in terms of the partial derivatives of g.
You might try a specific instance of the previous problem. Let $g(x, y) =
x + y^2$, so that $g(x, x) = x + x^2$. You can verify that
$$(x + x^2)' = \frac{\partial (x + y^2)}{\partial x} + \frac{\partial (x + y^2)}{\partial y} \quad \text{where } y = x$$
Suppose we are standing at (3, 2) in the xy-plane. We contemplate moving
away from this point in some direction. One natural way to indicate a direction
is to describe the changes in x and y that would occur in one time unit. If the
changes are to be (a, b), then consider
x=3+a·t
y = 2 + b · t where t ≥ 0
It is easy to see how to get such points: add the same positive quantity to both
5 and 6. The general such point would be (5+c, 6+c) for some positive number
c. Notice that this idea doesn’t involve time, as did the idea of the previous
paragraph. That idea suggests moving along x = 5 + c · t and y = 6 + c · t. The
various values of c tell how far we want to go in one time unit; the larger c
is, the farther we go. What this really means is that the specific c determines
our speed.
Now suppose we have f (x, y) and we want to know how f changes as we
move northeast from (5, 6). Let x = 5 + c · t and y = 6 + c · t. Then
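$$\begin{aligned}
\frac{df}{dt} &= \frac{\partial f}{\partial x} \cdot \frac{dx}{dt} + \frac{\partial f}{\partial y} \cdot \frac{dy}{dt}\\
&= \frac{\partial f}{\partial x} \cdot c + \frac{\partial f}{\partial y} \cdot c\\
&= \left(\frac{\partial f}{\partial x} + \frac{\partial f}{\partial y}\right) \cdot c
\end{aligned}$$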
Solution. To move from (1, −1) in direction (−3, 4), we let x = 1 − 3 · t and
y = −1 + 4 · t. The derivative dF/dt is the change in temperature with respect
to t. At t = 0, that derivative measures the change in F at the point (1, −1).
Thus, we compute
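$$\frac{dF}{dt} = \frac{\partial F}{\partial x} \cdot \frac{dx}{dt} + \frac{\partial F}{\partial y} \cdot \frac{dy}{dt} = 2xy \cdot (-3) + (x^2 + 3y^2) \cdot 4$$
When t = 0 we have x = 1 and y = −1, so $dF/dt = 2(1)(-1)(-3) + (1 + 3) \cdot 4 = 6 + 16 = 22$. Since dF/dt > 0, we see that F will
increase.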
It is instructive to rework the previous problem using a different set of
changes in the same direction as (−3, 4). For instance, we can move in the
direction (−6, 8) or in the direction (−1.5, 2). You should see that you get
different values of dF/dt but that the sign stays the same.
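Here is that rework as a short Python sketch. It uses only the two partial derivatives from the solution, $\partial F/\partial x = 2xy$ and $\partial F/\partial y = x^2 + 3y^2$, evaluated at (1, −1); the helper name `dF_dt` is ours.

```python
def dF_dt(x0, y0, a, b):
    """Chain rule along x = x0 + a*t, y = y0 + b*t, evaluated at t = 0."""
    F_x = 2 * x0 * y0           # partial F / partial x from the solution
    F_y = x0**2 + 3 * y0**2     # partial F / partial y from the solution
    return F_x * a + F_y * b


for direction in [(-3, 4), (-6, 8), (-1.5, 2)]:
    print(direction, dF_dt(1, -1, *direction))
# (-3, 4) 22
# (-6, 8) 44
# (-1.5, 2) 11.0
```

Since dF/dt is linear in (a, b), doubling the direction doubles dF/dt and halving it halves dF/dt; the sign, which tells us whether F increases, does not change.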
Solution. We let
x = 3 + a · t and y = 2 + b · t for t ≥ 0
If y = 0, then we see that x = ±2, and the Chain Rule equation says
that 2x(dx/dt) = 0. Since x = ±2 is not 0, dx/dt = 0. The tangent to the circle at (±2, 0) is
vertical.
Given a function of several variables, its derivative is the row matrix of its
partial derivatives. The derivative of the function f is often denoted Df; we
don't usually employ the prime notation f' that was used for functions of one
variable. If $f(x, y, z) = x^2 + y^2 + x \cdot z$, then the derivative Df is a 1 × 3 matrix:
$$Df = \begin{bmatrix} \dfrac{\partial f}{\partial x} & \dfrac{\partial f}{\partial y} & \dfrac{\partial f}{\partial z} \end{bmatrix} = \begin{bmatrix} 2x + z & 2y & x \end{bmatrix}$$
There isn’t much to this beyond a convenient notation that we will use in
the next chapter.
CHAPTER 12
Non-Linear Optimization
We deal first with optimization problems for which all the constraints are
open, or for which there are no constraints at all. Going back to the case of
a single variable, the minimum or maximum of a function in an open interval
occurs at a critical point – a point where the derivative is 0. Remember that
the converse is false: we can have derivative zero without having an extreme value
(for example, $y = x^3$ at $x = 0$).
206 12. NON-LINEAR OPTIMIZATION
$x = (x_1, \ldots, x_n)$
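Solution. We need to solve the equation Df = O. Here are the three equations, one for each partial derivative.
$$\begin{aligned}
0 &= \frac{\partial f}{\partial x_1} = 3 \cdot x_1^2 - 1\\
0 &= \frac{\partial f}{\partial x_2} = 4 \cdot x_2 - 2 \cdot x_3 - 4\\
0 &= \frac{\partial f}{\partial x_3} = -2 \cdot x_2 + 6 \cdot x_3 - 18
\end{aligned}$$
The first equation gives $x_1 = \pm 1/\sqrt{3}$. The second and third equations yield
$x_2 = 3$, $x_3 = 4$. Thus, there are two critical points:
$$(1/\sqrt{3},\ 3,\ 4) \quad \text{and} \quad (-1/\sqrt{3},\ 3,\ 4)$$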
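The same computation as a Python sketch: the first equation is solved directly, and the two linear equations by Cramer's rule (our choice for a 2 × 2 system). The objective f itself is not shown in this excerpt, so the check at the end uses only the three partial derivatives listed above.

```python
import math

# 3*x1^2 - 1 = 0 gives x1 = +/- 1/sqrt(3)
x1_values = [1 / math.sqrt(3), -1 / math.sqrt(3)]

# 4*x2 - 2*x3 = 4 and -2*x2 + 6*x3 = 18, solved by Cramer's rule
a11, a12, b1 = 4, -2, 4
a21, a22, b2 = -2, 6, 18
det = a11 * a22 - a12 * a21
x2 = (b1 * a22 - a12 * b2) / det
x3 = (a11 * b2 - a21 * b1) / det

critical_points = [(x1, x2, x3) for x1 in x1_values]


def gradient(x1, x2, x3):
    """The three partial derivatives from the solution, i.e. the row Df."""
    return [3 * x1**2 - 1, 4 * x2 - 2 * x3 - 4, -2 * x2 + 6 * x3 - 18]


for p in critical_points:
    print(p, gradient(*p))
```

At both points every entry of Df is 0, up to floating-point rounding in the first entry.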
It was very significant that we knew there was a solution in the previous
problem. Otherwise, the values we calculated at the critical points need not have
been extremes. For a problem with open constraints, this can be a
tricky business. For a problem all of whose constraints are closed, there is
a very important theorem that is relevant. To give that theorem, we need a
term: we say that the constraints of an optimization problem are bounded if
each variable is confined to a closed interval.
The Extreme Value Theorem If f (x) is a differentiable function sub-
ject to closed and bounded constraints, then f (x) has both a maximum and
minimum subject to the constraints.
Problem T12.4. Consider the problem: Maximize $x^4 \cdot y^5$, where $x^2 + 2 \cdot y^2 \le 9$.
Does the Extreme Value Theorem apply to this problem?
Solution. The single constraint is closed; it implies that $x^2 \le 9$, so that
$-3 \le x \le 3$. Similarly, we have $2 \cdot y^2 \le 9$, so that $-3/\sqrt{2} \le y \le 3/\sqrt{2}$. Therefore
the constraints are bounded.1 Thus, the Extreme Value Theorem applies.
The problem necessarily has a solution.
Let’s take the previous problem a little further. The constraint defines
an ellipse and its interior. First suppose that the maximum occurs inside the
ellipse, where the constraint is open: $x^2 + 2 \cdot y^2 < 9$. Then the First Derivative
Test applies and the solution occurs at a critical point, where $D(x^4 \cdot y^5) = O$.
On the other hand, the maximum might occur on the ellipse, where the
constraint is $x^2 + 2 \cdot y^2 = 9$. In the next section we will discuss a method for
dealing with both of these cases simultaneously.
It is instructive to use the Solver on the previous problem. When we
started with x = y = 0, the Solver found x = y = 0 as a solution. That's
obviously not correct; perhaps the Solver was confused by the fact that all
1 Note that we don't have to show that y can actually attain all the values between
$\pm 3/\sqrt{2}$; it is enough to show that y's values have to come from that closed interval.
2. LAGRANGE MULTIPLIERS 209
points on both the x-axis and y-axis are critical points. When we started at
x = y = 1, the Solver found x = 2, y = 1.581 and objective 158.1 as a solution.
That looks more plausible. We notice that this point is on the ellipse.
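To double-check the Solver's answer, here is a brute-force sketch that walks around the ellipse $x = 3\cos\theta$, $y = (3/\sqrt{2})\sin\theta$ and keeps the largest value of $x^4 \cdot y^5$. Searching only the boundary is our assumption, consistent with what the Solver found; informally, for y > 0, scaling (x, y) by a factor s > 1 multiplies $x^4 \cdot y^5$ by $s^9$, so the maximum cannot sit strictly inside. Only $0 \le \theta \le \pi$ is searched, since $y^5 < 0$ when y < 0.

```python
import math


def objective(x, y):
    return x**4 * y**5


steps = 200_000
best_value, best_x, best_y = -math.inf, None, None
for k in range(steps + 1):
    theta = math.pi * k / steps          # upper half of the ellipse, where y >= 0
    x = 3 * math.cos(theta)
    y = 3 / math.sqrt(2) * math.sin(theta)
    value = objective(x, y)
    if value > best_value:
        best_value, best_x, best_y = value, x, y

print(round(best_x, 3), round(best_y, 3), round(best_value, 1))
```

Because $x^4$ does not see the sign of x, the points with x = 2 and x = −2 tie; either one may be reported, with y ≈ 1.581 and objective ≈ 158.1, matching the Solver.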
We use the Extreme Value Theorem in all the problems we consider in the
rest of this chapter.
2. Lagrange Multipliers
Solution. Notice that the single constraint is closed, but it is not bounded.
Thus, the Extreme Value Theorem does not apply. We will discuss the exis-
tence of a solution later; for now, we will assume that the minimum exists.
Now to Lagrange’s method. We write this constraint as a function set
equal to a constant:
3 · y + 2 · x = 13
In this form, the constraint gets a Lagrange multiplier λ. (It is customary to
use the Greek letter lambda as a multiplier; L for Lagrange, we think.) The
multiplier is an additional variable in the problem. The Lagrange equation
says, at a solution to the problem, that the derivative of the objective is the
multiplier times the derivative of the function side of the constraint.
$$DZ = \lambda \cdot D(3 \cdot y + 2 \cdot x) \qquad (12.1)$$
λ for x − y + 3 · z
µ for 2 · x + y − z
2·x=λ·1+µ·2=λ+2·µ
4 · y = λ · (−1) + µ · 1 = µ − λ
2 · z = λ · 3 + µ · (−1) = 3 · λ − µ
$$0 = 2 \cdot \lambda_1 \cdot t$$
and that says that either $\lambda_1 = 0$ or $t = 0$. We noted above that if $y - x^2 < 10$,
then $t \ne 0$, and so $\lambda_1 = 0$ in this case.
These conditions can be stated without reference to the slack variable t.
We go back to the original variables x, y, and write (12.3) in terms of those
variables alone.
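$$\begin{bmatrix} 2x & 2y \end{bmatrix} = \lambda_1 \cdot \begin{bmatrix} -2x & 1 \end{bmatrix} + \lambda_2 \cdot \begin{bmatrix} 1 & 1 \end{bmatrix} \qquad (12.4)$$
$$x \approx 1.5, \quad y \approx 1.5, \quad \lambda_1 \approx 0, \quad \lambda_2 \approx 3$$
The maximum objective was ≈ 4.5. At the solution, the constraint $y - x^2 \le 10$
holds as a strict inequality, and so its multiplier is 0. Observe that the equation
(12.4) holds at the solution: $2x = 3 = \lambda_2$ and $2y = 3 = \lambda_1 + \lambda_2$.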
Here is a problem that involves pretty much everything we have discussed.
We remind you that we will not do many problems by hand, but it will be
helpful to see the Lagrange equation method in action.
Solution. Let Q = 2x − y, so that Q is the objective. It is not hard to see that
the constraints are closed and bounded. They consist of the region between the
parabola $y = x^2$ and the line $y = 2 + x$, which meet at x = −1 and x = 2. Thus, the Extreme Value Theorem says that
the maximum and minimum exist.
Let $\lambda_1$ be the multiplier for $y - x^2 \ge 0$ and let $\lambda_2$ be the multiplier for
$y - x \le 2$. Then
The second entry says that λ = −1, and the first entry equation is $2 =
(-1) \cdot (-2x) = 2x$. Thus, x = 1, and so $y - x^2 = 0$ says that y = 1. The
objective is Q = 1 in this case.
The entries disagree on the value of λ2 . There are no points in this case.
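As an independent check (brute force, not the Lagrange method), this sketch samples the region $x^2 \le y \le x + 2$, for $-1 \le x \le 2$, on a grid and records the largest and smallest values of Q = 2x − y. The grid size is an arbitrary choice of ours.

```python
def Q(x, y):
    return 2 * x - y


n = 600
best_max = (-float("inf"), None)
best_min = (float("inf"), None)
for i in range(n + 1):
    x = -1 + 3 * i / n                 # -1 <= x <= 2, where the curves meet
    low, high = x * x, x + 2           # the region lies between y = x^2 and y = x + 2
    for j in range(n + 1):
        y = low + (high - low) * j / n
        q = Q(x, y)
        if q > best_max[0]:
            best_max = (q, (x, y))
        if q < best_min[0]:
            best_min = (q, (x, y))

print("max", best_max)
print("min", best_min)
```

The grid agrees with the case worked above, giving the maximum Q = 1 at (1, 1), and it suggests that the minimum is Q = −3 at the corner (−1, 1), where the two curves meet.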
Dgk
If the rank of G is k at the maximum or minimum of the objective, then the
Lagrange equations have to hold. Thus, to use the Lagrange equations
completely, we also need to check the points where Dg has rank less than
k, plugging them into the objective to see whether the minimum or maximum occurs there.
Typically, there are no such points, but, to be complete, this case should be
considered.
Problem T12.9. We sell two related products A and B. Here are the demand
curves for each; they show the demand as a function of the price. The subscript
1 goes with A, 2 goes with B.
$$x_1 = 6 \cdot 10^5 \cdot \frac{p_2}{p_2 + 400} \cdot \frac{1}{p_1^2 + 10} \quad \text{and} \quad x_2 = 50 \cdot \exp(-p_2/200)$$
It takes 10 hours to make one unit of A and 40 hours for one unit of B. We
have up to 1000 hours for production of A and 500 hours for B.
(a) Find the prices and demand that maximize revenue.
(b) What would we be willing to pay for 10 more hours for producing A?
$$R = x_1 \cdot p_1 + x_2 \cdot p_2$$
M1 = (1 + r) · M0 , M2 = (1 + r) · M1 , M3 = (1 + r) · M2 , ...
Now suppose we observe the actual value of the investment each day. Here
is what we observe over the first few days.
day n : 0 1 2 3 4 5
value Mn : 100 101 100 105 110 111
We want to know the interest rate r. We are supposed to have $M_1 = (1 + r) \cdot M_0$, that is, $101 = (1 + r) \cdot 100$.
This gives r = 0.01. That was easy. But we also want to have $100 = (1 + r) \cdot 101$,
which solves as r ≈ −0.0099. (Apparently, day 2 was a bad day.) Then again $105 = (1 + r) \cdot 100$,
and this is r = 0.05. There is no one value of r that works in each equation!
How are we supposed to find r, given that the observed data don’t agree on
its value?
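Each pair of consecutive days gives its own value of r. This short sketch computes all five from the observed values in the table, making the inconsistency explicit; nothing else is assumed.

```python
values = [100, 101, 100, 105, 110, 111]   # observed M_0, ..., M_5

# Each consecutive pair gives its own equation M_{n+1} = (1 + r) * M_n.
rates = [values[n + 1] / values[n] - 1 for n in range(len(values) - 1)]
for n, r in enumerate(rates):
    print(f"day {n} to day {n + 1}: r = {r:.4f}")
# day 0 to day 1: r = 0.0100
# day 1 to day 2: r = -0.0099
# day 2 to day 3: r = 0.0500
# day 3 to day 4: r = 0.0476
# day 4 to day 5: r = 0.0091
```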
Let’s back up and realize that we should have expected something like this
to happen. Remember that we said that the existence of r is a theoretical
fact – that probably means that the value of the investment is determined by
several things and that the interest rate is an important factor, maybe the
most important factor – but perhaps it’s not the only factor. In that case, the
interest rate wouldn’t completely determine the sequence, and so we should
not be surprised at the inconsistencies. Here is another possible explanation:
suppose that the observed values are only estimates (approximations) of the
actual values. There might be errors in those estimates that make the equa-
tions inconsistent.
The first sort of error – that theory only approximates fact – suggests that
the theoretical r is approximate; the second sort of error – that observations
can be rough – suggests that the observations might not reveal the actual
r. Either way we are stuck with trying to find a theoretical parameter from
inconsistent data.
The situation we have just described is typical of virtually all experiments
in both the natural and social sciences. Some kind of underlying theory pre-
dicts what should happen, but not exactly; our observations show us what
actually happened, but not exactly. Experience shows that even when the
theory is rock solid and the observations are as precise as we can make them,
$$P_0, P_1, P_2, \ldots, P_n \quad \text{and} \quad Q_0, Q_1, Q_2, \ldots, Q_n$$
We are not yet in a position to think about the size of the norm (78.01 may seem
like a large number). The point is that 78.01 measures the distance between
the predicted sequence and the observed sequence.
When, as in the previous example, the square norm measures the distance
between predicted values and observed values, the norm is often called the
squares error.
Example. We think that the weight w (lbs) of a seven year old boy is predicted
by his height h (inches) via the formula w = 5 · h − 190. (So, this equation
is theoretical.) As an experiment, we record the height and weight of four
seven-year-olds in this chart:
height (inches): 47 49 45 46
observed weight (lbs): 45 50 38 45
The equation w = 5h − 190 predicts the following weights:
height (inches): 47 49 45 46
predicted weight (lbs): 45 55 35 40
The squares error measures the distance between the observed weights and the
theoretical weights: $(0)^2 + (-5)^2 + (3)^2 + (5)^2 = 59$.
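The same squares error in a few lines of Python, a sketch using only the four data points above:

```python
heights = [47, 49, 45, 46]          # inches
observed = [45, 50, 38, 45]         # lbs
predicted = [5 * h - 190 for h in heights]

# Squares error: the sum of the squared differences.
squares_error = sum((o - p) ** 2 for o, p in zip(observed, predicted))
print(predicted, squares_error)     # [45, 55, 35, 40] 59
```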
3.2. Least Squares. We have described the situation where we have some
sort of observed sequence and a theoretical prediction that it should be ex-
plained by particular parameters. In the height/weight example, we had a
particular theoretical curve w = 5 · h − 190; let’s only assume that the curve
should be a line w = m · h + b and we’ll try to find the parameters m, b.
3. FITTING A MODEL TO DATA 223
This is not easy to read; study it carefully! The value of m is stored in A2 and
the value of b in B2. The formula in B5 computes w = m · h + b for h = 47 in
B3. The cells C5:E5 were obtained by fill-right from B5. That's why
the references to m, b have dollar signs. Cell B6 holds the difference between
the predicted w (in B5, for h = 47) and the observed w in B4. The entries C6:E6 in
row 6 were obtained by fill-right from B6. The objective
square norm is in cell F6, which holds this formula:
=sumproduct(B6:E6,B6:E6)
We asked the Solver to minimize the cell F6, by changing the variables
A2:B2 (the parameters m, b), and it came back with
m ≈ 2.685, b ≈ −81.01
The minimum squares error was 9.89. (Note: when we invoked the Solver,
we were careful to uncheck the box for non-negative variables, since m, b could
be negative.)
The line that results from the solution to the previous problem is this:
w = 2.685 · h − 81.01
This line is called the line of best fit or the least squares line or the regression
line. It would be the business of statistics to argue from the minimum squares
error back to the plausibility of the theoretical curve. In this course, we will
stick with calculating the minimums.
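For a straight line, the minimization the Solver performs also has a well-known closed-form answer (the least-squares normal equations), a different route from the one in the text. This sketch applies it to the four data points: $m = S_{hw}/S_{hh}$ and $b = \bar{w} - m \cdot \bar{h}$, where $S_{hw}$ and $S_{hh}$ are sums of products of deviations from the means. It gives m ≈ 2.6857, b ≈ −81.057 and squares error ≈ 9.886. The slope and the error match the Solver's; the Solver's b ≈ −81.01 differs slightly, yet both lines have squares errors that round to 9.89.

```python
heights = [47, 49, 45, 46]
weights = [45, 50, 38, 45]
n = len(heights)

h_bar = sum(heights) / n
w_bar = sum(weights) / n
# Sums of products of deviations from the means.
s_hw = sum((h - h_bar) * (w - w_bar) for h, w in zip(heights, weights))
s_hh = sum((h - h_bar) ** 2 for h in heights)

m = s_hw / s_hh
b = w_bar - m * h_bar
squares_error = sum((m * h + b - w) ** 2 for h, w in zip(heights, weights))
print(round(m, 4), round(b, 3), round(squares_error, 3))
```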
Spreadsheet Formulas
1. Function Values
1 Microsoft Excel. Among the many, many software tools available for numerical work,
we have chosen Excel because of its general availability and usage. Apple iWork Numbers
and Google Docs Spreadsheet, for instance, have many of the functions of Excel, but they
lack the ability to handle the non-linear problems that arise in many applications. For
what it’s worth, as of 2015 Excel is more flexible and more widely used than its current
competitors.
228 13. SPREADSHEET FORMULAS
First we explain how to compute a single value of this function. Enter the
following symbols and numbers in the indicated cells. You can use the arrow
keys or return to indicate that you are finished entering something.
A B C ···
1 x y
2 4 =A2*A2
⋮
The expression =A2*A2 is a formula. The equals sign at the beginning
tells Excel to compute the value of this expression rather than to display the
formula itself. The A2 in the formula refers to the contents of that cell. Since
4 is in cell A2, cell B2 shows 16 when the formula is entered. If you select B2
and look at the formula strip just above the spreadsheet grid, you will see the
formula.
Change cell A2 to various other values, and notice that B2 shows the square
of A2 each time.
When you are typing in the formula, you can click on A2 instead of typing
it. Try it by re-entering =A2*A2.
The expression A2*A2 could also be written A2^2. The symbol ^ is exponentiation. Try changing B2 to =A2^2 and make sure you are still computing
the square after the formula is entered.
Let’s compute a list of function values, using a more complicated function.
We'll use $y = x^3 \cdot \exp(-\sqrt{x})$. Change B2 to
=(A2^3)*exp(-sqrt(A2))
Notice how $x^3$ is denoted; exp is the exponential function and sqrt is the
square root function. For the square root, we could also have written
A2^0.5 or A2^(1/2)
A B C ···
1 x y
2 4 =(A2^3)*exp(-sqrt(A2))
3 3
4 1
Cell B2 should show 8.661... We want to compute $x^3 \exp(-\sqrt{x})$ for the values
3 and 1, as well. We can do this without rewriting the formula over and over.
Select B2 and place the cursor over the small square at the lower right corner
of that cell. The cursor should turn into a solid cross. (Not a double-lined
cross and not the hand symbol.) Once you have the solid cross, click and hold
down the cursor. Drag the cursor down to the cells B3, B4 below B2 and let
go! Here is what you should see.
A B C ···
1 x y
2 4 8.661
3 3 4.776
4 1 0.367
(You may have more decimal places.) Click on cell B3, and look at the formula
strip above the spreadsheet: you will see that the formula that was in B2 has
been copied down, but the reference to A2 has been changed to A3. The
operation of pulling down with the solid cross is called a fill-down.
We will understand fill-down by using it, but here is an abstract description: when you fill down, the row numbers in the cell references are
increased by one for each row you drag down the column (a row number
preceded by a dollar sign is the exception).
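Outside a spreadsheet, a fill-down amounts to applying one formula to each entry of a column. A minimal Python sketch of the same three function values:

```python
import math

xs = [4, 3, 1]                                       # column A, rows 2-4
ys = [x**3 * math.exp(-math.sqrt(x)) for x in xs]    # column B after the fill-down
for x, y in zip(xs, ys):
    print(x, round(y, 3))
```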
Let’s get a list of function values where the function has a parameter. Let
y = ln(m · x − 2), where m is a parameter (constant). We want to compute y
when x = 1, 2, 3, 4. We’ll start with m = 10; later we will change it. Set up
the following cell contents. (Notice that we have started in cell E4; we’re just
making the point that the fun can start anywhere!)
E F G H
4 m 10
5 x y
6 1 =ln(F$4*E6-2)
7 2
8 3
9 4
In the entry F$4, ignore the dollar sign temporarily; that refers to cell F4,
which holds the value of m. When you enter the formula in F6, the number
2.079 is displayed; that’s the y-value when x = 1 (and m = 10).
Use fill-down from cell F6 to F9 to get the other y-values for m = 10.
Then look at the formulas in those cells. The E-column changes, row by row
– running over the four different x-values. But notice that the reference F$4
does not change. That’s the purpose of the dollar sign: to keep row 4 constant
as the fill-down is done.
Now we want to change m from 10 to 11, and then to 12. One way to do
this is simply to change the entry in F4 to 11. Try that, and notice that all
the y-values change as well! The y-values update to the new value of m.
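The layout this section builds, with m running along row 4 and x down column E, is a two-way table of y = ln(m · x − 2). Here is a sketch of the same table in Python for m = 10, 11, 12 and x = 1, 2, 3, 4: each row corresponds to one x-value (the fill-down direction) and each column to one value of m (the fill-right direction).

```python
import math

ms = [10, 11, 12]        # parameter values (cells F4, G4, H4)
xs = [1, 2, 3, 4]        # x-values (cells E6:E9)

# One row per x-value, one column per value of m, like the block F6:H9.
table = [[math.log(m * x - 2) for m in ms] for x in xs]
for x, row in zip(xs, table):
    print(x, [round(y, 3) for y in row])
```

The first entry is ln 8 ≈ 2.079, the value the text reports in F6.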
We want to show how to change m while keeping the previous values of m
around – say to study what happens as m changes. Change F4 back to 10 and
enter 11 in G4 and 12 in H4.
Next we need to change the formula in F6; change it to this:
=ln(F$4*$E6-2)
We will explain the dollar sign on E momentarily. Then, fill-down from F6
to F9.
Now click on F6 and shift-click on F9; the cells F6,F7,F8,F9 should all be
highlighted. Place the cursor over the small square at the lower right corner
of the box of cells, hold down, and drag to the right, so that the box extends
to the right. End up at H9. Now release the cursor, and all the cells between
2. RECURSION CALCULATIONS 231
2. Recursion Calculations
Let’s see how to compute a recursive sequence. We’ll start with a very
simple example:
=1.5*B1
Now B2 shows 3 = 1.5 · 2, which is Q1 . For the rest of the sequence, we simply
fill-down: select B2, get the solid cross in the lower right corner, and drag
down to B10. Cell B10 should show 76.88, or so. (That’s Q9 .)
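The same recursion in a few lines of Python, a sketch assuming, as the cells above indicate, $Q_0 = 2$ in B1 and $Q_{n+1} = 1.5 \cdot Q_n$:

```python
q = [2.0]                    # Q_0, the value in cell B1
for n in range(9):           # rows B2..B10 hold Q_1..Q_9
    q.append(1.5 * q[-1])    # same rule as the formula =1.5*B1, filled down
print(q[9])                  # 76.88671875
```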
3. Matrix Calculations
In Excel, the rectangle of cells with upper left corner at A1 and lower right
corner at C4 is referred to as A1:C4. This block of cells will hold a 4×3 matrix,
as you can see by selecting that range on a spreadsheet.
Excel matrix addition is straightforward. Suppose we want to compute the
sum of two 2 × 3 matrices, one in block A2:C3 and one in block D4:F5. We
select a block of 2 × 3 cells in which to store the answer: say B7:D8. In the
upper left cell B7 we type the formula
=A2:C3+D4:F5
What you do next depends on the type of your computer. On a PC running
Windows2, press CONTROL+SHIFT+ENTER; on a Mac3, press APPLE+RETURN. We
call this kind of return a matrix return. The matrix return will fill the block
with the matrix sum. Typing an ordinary RETURN will not give you the entire
matrix sum.
Scalar multiplication is similarly easy. Say we want to compute A − 3 · B
where A, B are 3 × 4. Assume that A is in cells A1:D3 and B is in F1:I3. If
we want to put the answer in B5:E7, then we select those cells, and type the
following formula in B5
=A1:D3-3*F1:I3
Do a matrix return to fill B5:E7 with the answer.
The formula mmult multiplies matrices. Here is an example: we want to
multiply the 3 × 4 matrix in A1:D3 by the 4 × 3 matrix in F1:H4.
2 Microsoft Windows
3 Apple Macintosh, iMac, MacBook Air, etc.
     A   B   C   D   E   F   G   H
1    1   0  -1   2       1   3  -1
2    1   1   4   6       0   1   5
3    0   2  -3   4      -2   3  -2
4                        1   0   9
Now we compute the product. It doesn’t matter where it goes, but we
need room for the product matrix. Our product is 3 × 3, so maybe we move
down a ways, starting in row 11. We select a 3 × 3 block of cells, starting with
the upper left cell in the block. Say we select A11:C13. Here is the formula as typed.
A B C
11 =mmult(A1:D3,F1:H4)
12
13
Don’t forget to do a matrix return after typing the mmult formula. The
matrix product should appear.
     A    B    C
11   5    0   19
12  -1   16   50
13  10   -7   52
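To double-check the mmult result, here is a sketch of matrix multiplication in plain Python on the same two matrices (the function name `matmul` is ours):

```python
A = [[1, 0, -1, 2],
     [1, 1, 4, 6],
     [0, 2, -3, 4]]          # the 3 x 4 matrix in A1:D3
B = [[1, 3, -1],
     [0, 1, 5],
     [-2, 3, -2],
     [1, 0, 9]]              # the 4 x 3 matrix in F1:H4


def matmul(X, Y):
    """Row-by-column product: entry (i, j) is row i of X dotted with column j of Y."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]


for row in matmul(A, B):
    print(row)
# [5, 0, 19]
# [-1, 16, 50]
# [10, -7, 52]
```

For example, the upper right entry is 1·(−1) + 0·5 + (−1)·(−2) + 2·9 = 19.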
To compute the inverse of a matrix, you use the minverse formula. Here
is a 2 × 2 example.
A B C D E
1 2 -4 =minverse(A1:B2)
2 7 6
The minverse formula is typed while D1:E2 is selected, using a matrix return,
as you would expect. If the matrix that you try to invert does not have an
inverse, each cell in the selected area will display #NUM!, indicating an error.
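For comparison, a sketch that inverts the same 2 × 2 matrix by the standard formula based on the determinant ad − bc (this formula is our addition; the text itself relies on minverse). Exact fractions avoid rounding.

```python
from fractions import Fraction


def inverse_2x2(M):
    """Inverse of [[a, b], [c, d]] via (1/(ad - bc)) * [[d, -b], [-c, a]]."""
    (a, b), (c, d) = M
    det = a * d - b * c
    if det == 0:
        # No inverse exists; this is the case where minverse shows #NUM!.
        raise ValueError("matrix has no inverse")
    return [[Fraction(d, det), Fraction(-b, det)],
            [Fraction(-c, det), Fraction(a, det)]]


inv = inverse_2x2([[2, -4], [7, 6]])
print([[float(v) for v in row] for row in inv])   # [[0.15, 0.1], [-0.175, 0.05]]
```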
The mdeterm formula can tell us whether a square matrix has an inverse.
(Recall that a non-square matrix cannot have an inverse.) This formula gives a
single number, the determinant of the matrix. A square matrix has an inverse
if and only if its determinant is not 0. In the previous example, we could
4. THE SOLVER 235
4. The Solver
Index