1070 e Text
1.3 Graphing
2.2 Modeling
3 Limits
3.2 Variations
3.3 Continuity
4 Derivatives
4.2 Definitions
5 Computing derivatives
7 Optimization
9 Summation
9.3 Some series you can explicitly sum
10 Integrals
13.3 Computing Taylor polynomials
Introduction
This e-textbook, while it could be used in a straight lecture class, was written for a flipped
classroom format.
In all but the introductory section (where students may not have had time to read before
class) there are many self-check exercises, numbered and offset in red. It is intended that
students do these when they come across them during the reading. From the self-check
exercises, students can tell what they need to learn and whether they are getting what they
should from reading the text. Unless marked with an asterisk, the self-check exercises are
at the level where we expect you can figure them out yourself as you do the reading.
Some conventions we adhere to are as follows. The first time a term is introduced it appears
in boldface. This marks it as important and makes it easier to find when looking back.
As discussed in Section 1.1, the colon-equal sign is used for defining equalities, reserving
the regular equal sign for propositions that could be true or not; this is consistent with
conventions in computer science.
The way this book covers content carries an expectation. Any math you know, you should
know well enough to
Related to this is the emphasis on mathematical modeling. In the old days no one seemed to
care about this. More recently, most calculus textbooks address this by including a number
of applied examples and problems. We take this a step beyond that by directly addressing
things you need to know in order to successfully apply math to physical problems, and by
including problems that are not just a reflection of the mathematical technique just learned,
but require thought, organization, choices and recognition of structure. We hope most of
them are also interesting.
Some of the topics you will see at the beginning of the course are thought of as high
school topics or pre-calculus. The reason they are here is that we find students to be at
a disadvantage if they have learned these topics enough to do well on the usual tests but
without time to see the connections and subtleties. For this reason, we revisit concepts
such as functions and graphing, inequalities, units, proportionality, sequences, limits and
continuity. We promise to keep away from the drill and kill versions of these subjects, which
you probably already had, but also to give you the instruction and support you need if even
these aspects are in need of attention.
Very little of the old-sounding material is pure review. Most of it revisits topics while
adding depth and connection. Pure review will be limited to this section. If the contents of
this section are not reasonably familiar to you, then you may have some catching up to
do. This may indicate the need for help from the tutoring center, as students are generally
expected to know these facts from Algebra II.
Definition 0.1. When b is a real number and x is a positive integer, the notation $b^x$ means
multiply together x copies of b. This is called exponentiation and read as “b to the power
x”.
Exponentiation obeys some basic rules. Being able to recall these by trying a simple example
is just as good as having them memorized.
$b^x \cdot b^y = b^{x+y}$.
$(b^x)^y = b^{xy}$.
Definition 0.4 (zero power). For all b > 0, define $b^0 := 1$.
Definition 0.5 (positive rational powers). For all real b > 0 and positive integers q, define
$b^{1/q} := \sqrt[q]{b}$. For all real b > 0 and positive rational x = p/q, define
$b^x := (\sqrt[q]{b})^p = \sqrt[q]{b^p}$.
Definition 0.6 (negative powers). For all real b > 0 and positive rational x = p/q, define
$b^{-x} = 1/b^x$.
This next definition may not be review, because it involves limits. This motivates our
upcoming discussion of limits!
Definition 0.7 (real powers). For all real b > 1 and real x, define $b^x = \lim_{y \to x} b^y$ as y
approaches x through rational numbers.
Because you have not yet seen limits, we include an alternate definition: $b^x$ is the least real
number z such that for all rational numbers y,
$b^y < z$ if and only if $y < x$. (0.1)
If this seems overly formal, you can understand it intuitively by realizing that the graph of
the function $f(x) := b^x$ over the domain of rationals looks like a smooth curve except the
domain is full of holes, and the definition for non-rational x is the one that smoothly fills
in the holes. We also remark that we have restricted to the case b > 1 so that $b^x$ will be an
increasing function of x and we can use a single inequality in (0.1). For b < 1 we can either
reverse the inequalities or, realizing that b < 1 means b = 1/c with c > 1, we can just define
$(1/c)^x = 1/c^x$.
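To see the "filling in the holes" idea numerically, here is a minimal sketch that approximates an irrational exponent by rational numbers and evaluates the rational powers as in Definition 0.5. The base 2, the exponent √2, and the helper name `rational_power` are illustrative choices, not part of the text.

```python
from fractions import Fraction
import math

def rational_power(b: float, r: Fraction) -> float:
    """Compute b**(p/q) as (q-th root of b)**p, in the spirit of Definition 0.5."""
    return (b ** (1.0 / r.denominator)) ** r.numerator

b = 2.0                 # illustrative base, b > 1
x = math.sqrt(2)        # illustrative irrational exponent
target = b ** x         # built-in real power, used only for comparison

errors = []
for max_q in (10, 100, 1000):
    # A rational number y = p/q close to x, with denominator at most max_q.
    y = Fraction(x).limit_denominator(max_q)
    approx = rational_power(b, y)
    errors.append(abs(approx - target))
    print(f"y = {y}   b^y = {approx:.8f}   error = {errors[-1]:.2e}")
```

As the rational exponents get closer to x, the rational powers settle toward a single value, which is the value the definition assigns to the real power.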
The logarithm, to a particular base b, is defined to be the inverse function to the function
$f(x) := b^x$. Formally,
Definition 0.8 (logarithm to the base b). For any real b > 1, define $\log_b(x)$ to be the
unique real number y such that $b^y = x$.
From the additive and multiplicative rules for exponentiation, we can derive identities for
logarithms.
Proposition 0.9 (identities for logarithms).
$\log_b(xy) = \log_b x + \log_b y$ (0.2)
$\log_b(x^c) = c \log_b x$ (0.3)
$\log_b(1/x) = -\log_b x$ (0.4)
$\log_b x = \log_c x / \log_c b$ (0.5)
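A quick numerical spot check can make these identities feel less abstract. The sketch below tests (0.2) through (0.5) with base b = 2 and c = 10, using Python's `math.log2` and `math.log10`; the sample values of x, y and the exponent p are arbitrary illustrative choices, and a check at one point is of course not a proof.

```python
import math

x, y, p = 8.0, 5.0, 3.0   # arbitrary positive sample values

checks = {
    # (0.2): log of a product is the sum of the logs
    "(0.2)": math.isclose(math.log2(x * y), math.log2(x) + math.log2(y)),
    # (0.3): log of a power pulls the exponent out front
    "(0.3)": math.isclose(math.log2(x ** p), p * math.log2(x)),
    # (0.4): log of a reciprocal is the negative of the log
    "(0.4)": math.isclose(math.log2(1 / x), -math.log2(x)),
    # (0.5): change of base, here from base 10 to base 2
    "(0.5)": math.isclose(math.log2(x), math.log10(x) / math.log10(2)),
}

for label, ok in checks.items():
    print(label, "holds at the sample point" if ok else "FAILS")
```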
Proposition 0.10 (definition of the number e). There is exactly one real number b for
which the slope of the graph of $b^x$ at the point (0, 1) is equal to 1. This number is roughly
2.71828 and is called e.
Definition 0.11 (exp function and natural log). Special notation for exponents and logs to
the base e are:
$\exp(x) := e^x$
$\ln(x) := \log_e(x)$
We can take logs (short for “logarithms”) to any base, but there are three bases that
are most commonly used: 2, e and 10. The reason for using e as a base is hinted at in
Proposition 0.10, namely that it simplifies many formulas.
The reason for using 2 as a base is that powers of 2 play a big role in computer science
and also are conceptually easy. We suggest you memorize at least the first ten of them:
2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, . . .. The fact that $2^{10} \approx 1000$ is useful in approximating
things. (For example, a kilobyte refers to 1024 bytes, not 1000 bytes.) A notation sometimes
used for $\log_2$ is lg.
Finally, the reason for using 10 as a base is that we are used to the base-10 numbering
system. The size of a number is most obvious to us when we compare it to powers of 10. If
a number is given in scientific notation, for example, as $3.124 \times 10^7$, we know immediately
that it is a little over 31 million. In terms of logarithms, the base-10 logarithm of 31 million is
7 plus the base-10 logarithm of 3.1, hence a little under 7.5. The base-10 log of a number
gives a direct handle on the size of the number.
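The decomposition in the last paragraph can be sketched in a few lines: the base-10 log splits into a whole-number part (the power of ten) and a fractional part (the log of the leading digits). The variable names are illustrative only.

```python
import math

n = 3.124e7                       # the number from the text
log_n = math.log10(n)             # base-10 logarithm of n
exponent = math.floor(log_n)      # whole-number part: the power of ten
mantissa_log = log_n - exponent   # fractional part: base-10 log of 3.124

print(f"log10(n) = {log_n:.4f} = {exponent} + {mantissa_log:.4f}")
```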
Exercise 0.1. If M is a fifteen digit integer then $\log_{10} M$ is approximately what? Give
lower and upper bounds: write $C \le \log_{10} M \le D$ where C and D are fairly simple numbers
and say whether either of these inequalities must be strict (<) or not (≤ but could be equal).
Exercise 0.2. Write the fact $2^{10} \approx 1000$ as a fact about logarithms.
This e-textbook is about using math for modeling and coming up with plausible analyses.
One of the course goals is number sense. Wikipedia defines this as “an intuitive understanding
of numbers, their magnitude, relationships, and how they are affected by operations.”
Exercise 0.3. If you have a model for spread of disease where the number of infections
doubles every three days, how long can this go on before the model has to change: A few
years? A few months? A few weeks? A few days?
If some kind of answer began to form in your mind without your stopping to get out a
calculator, then you have some of the ingredients of number sense already: perhaps you
understand exponential growth, perhaps you can remember about how many people there
are in the country or the world, perhaps you are familiar with powers of two and know how
they relate to this problem. It is useful to be able to think this way. It’s not important
whether you use a calculator to answer any given question, but realistically, how often will
you stop in casual conversation and whip out a calculator?
In this course, we’ll teach you a number of these ingredients: use of logs, converting to powers
of ten, tangent line approximation, Taylor polynomials, pairing off positive and negative
summands, approximating integrals with sums and vice versa. Discussions of these will be
brief. The point is to use them when you need them, which turns out to be nearly every
lesson.
Today we’re going to start with the tangent line approximation. You might think this odd
because we haven’t taught you calculus yet. Calculus provides a way of computing the slopes
of tangent lines to graphs. But conceptually, understanding the tangent line approximation
takes knowledge only of algebra and geometry, not calculus. So, we’ll preview the idea now,
and in fact several more ideas from the course, and then later see how to use calculus to do
these analyses more methodically.
I am hanging wind chimes on my balcony using a ladder 5 meters long. On the highest safe
step, my shoulders will be exactly at the top of the ladder, which I need to be at the height
of the balcony rail, 4 meters above the ground. Every time I reposition the ladder I scratch
the paint, so I’d rather not move it too many times. I need to get my shoulders within
a couple of centimeters of the right height in order to drive a nail into the lintel. Where
should I put the base of the ladder? The Pythagorean theorem tells me that it should be 3
meters from the wall; see Figure 1. Unfortunately, I didn’t measure right, maybe because
of the wide hedge at the base of the wall. I am 20 cm too low. Now what?
Solution: Let h be the function representing the height of the ladder as a function of the
position of the base; in other words, h(x) is the height of the ladder on the wall (in meters)
when the base is x meters from the wall. By the Pythagorean Theorem, $h(x) = \sqrt{25 - x^2}$.
[Figure 1 ("My Goal"): the ladder (5 m) against the wall, with its top at the lintel (4 m).]
[Figure 2 ("Reality"): the ladder, the hedge and the lintel, with the top of the ladder 0.2 meters too low.]
The height I am trying to reach is shown in Figure 2, which has x = 3 and h(x) = 4.
Instead I hit some other point z with h(z) = 3.8. Clearly z is too far from the wall. How
far do I need to scoot the ladder toward the wall? As you can see in Figure 2, due to the
balcony and the hedge, it was not feasible to measure either the height of the lintel or the
distance at which I placed the foot of the ladder more accurately.
[Figure 3: the graph of h with its tangent line at (3, 4).]
Figure 3 shows the graph of h and a tangent line to the graph of h at the point (3, 4). The
tangent line is a very good approximation to the graph near (3, 4). For values of x between
perhaps 2.6 and 3.4, the line is still visually indistinguishable from the graph. If we know
the slope of this line, m, we can write the equation of the line: (y − 4) = m(x − 3). Because
h(x) is very nearly equal to this y (because the curve nearly coincides with the line), we
can write h(x) ≈ 4 + m(x − 3). The wiggly equal sign is not a formal mathematical symbol.
Here, it means the two will be close, but comes with no guarantee of how close, and furthermore,
it is only supposed to be close when x is close to 3. This is called an estimate. Shortly,
we will talk about bounds: estimates that do come with guarantees.
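To get a feel for how close "close" is, the following sketch compares h(x) with the line 4 + m(x − 3) for x between 2.6 and 3.4. The slope value m = −3/4 is an assumption made for this illustration: it is what one gets from the geometric fact that a tangent to a circle is perpendicular to the radius, and the radius to (3, 4) has slope 4/3. The function names are likewise illustrative.

```python
import math

def h(x: float) -> float:
    """Height of the ladder top (meters) when the base is x meters from the wall."""
    return math.sqrt(25 - x * x)

m = -3 / 4   # ASSUMED slope of the tangent line at (3, 4); see the note above

def tangent(x: float) -> float:
    """Tangent line approximation: h(x) is roughly 4 + m(x - 3) near x = 3."""
    return 4 + m * (x - 3)

xs = [2.6 + 0.1 * k for k in range(9)]          # 2.6, 2.7, ..., 3.4
gaps = [abs(h(x) - tangent(x)) for x in xs]     # distance between curve and line

for x, gap in zip(xs, gaps):
    print(f"x = {x:.1f}   h(x) = {h(x):.4f}   line = {tangent(x):.4f}   gap = {gap:.4f}")
```

The gap is zero at x = 3 and grows as x moves away from 3, which is exactly the behavior the wiggly equal sign is meant to convey.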
The reason we chose this particular example to demonstrate the tangent line approximation
is that we could compute the slope with high school geometry. With calculus, we can do
this for pretty much any function we can write down. In fact the word calculus when it
was invented meant literally “a method of computing”.
Bounding
To get an upper bound on f (x) means to find a quantity U (x) that you understand better
than f (x) for which you can prove that U (x) ≥ f (x). A lower bound is a quantity L(x) that
you understand better than f (x) and that you can prove to satisfy L(x) ≤ f (x). If you have
both a lower and upper bound, then f (x) is stuck for certain in the interval [L(x), U (x)].
The smaller the upper bound and the bigger the lower bound, the better, because this traps
the value of f (x) in a smaller interval [L(x), U (x)].
While estimating produces statements that are not mathematically well defined, bounding
produces inequalities with precise mathematical meaning. Two ways we typically find
bounds are as follows.
First, if f is increasing then an easy upper bound for f (x) is f (u) for any u ≥ x for which
we can compute f (u). Similarly an easy lower bound is f (v) for any v ≤ x for which we
can compute f (v). If f is decreasing, you can swap the roles of u and v in finding upper
and lower bounds. There are even stupider bounds that are still useful, such as f (x) ≤ C
if f is a function that never gets above C. The goal in this case is to pick u and v as close
to x as possible while still being able to compute f (u) and f (v).
Example 0.12. Suppose f(x) = sin(x) and we want bounds on f(1) = sin(1). The easiest upper and lower bounds are 1 and −1
respectively because sin never goes above 1 or below −1. A better lower bound is 0 because
sin(x) remains positive until x = π/2 and 1 < π/2. You might in fact recall that one radian
is just a bit under 60°, meaning that $\sin(60^\circ) = \sqrt{3}/2 \approx 0.866\ldots$ is an upper bound for
sin(1). Computing more carefully, we find that a radian is also less than 58°. Is sin(58°) a
better upper bound? Probably not, because we don’t know how to calculate it, so it’s not
a quantity we understand better. Of course if we had an old-fashioned table of sines, and
all we could remember about one radian is that it is between 57° and 58°, then sin(58°)
would be an excellent upper bound.
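The chain of bounds in this example can be checked numerically. In the sketch below a computer plays the role of the old-fashioned table of sines; the variable names are illustrative.

```python
import math

value = math.sin(1)                        # the quantity being bounded (1 radian)

lower_easy, upper_easy = -1.0, 1.0         # sin never leaves [-1, 1]
lower_better = 0.0                         # sin is positive on (0, pi/2) and 1 < pi/2
upper_better = math.sqrt(3) / 2            # sin(60 degrees); one radian is under 60 degrees
upper_table = math.sin(math.radians(58))   # sin(58 degrees), as read from a "table"

print(f"{lower_better} < sin(1) = {value:.4f} < {upper_table:.4f} < {upper_better:.4f}")
```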
Exercise 0.4. Which is the best of these three choices for a lower bound for $\sqrt{10}$, and
why? (a) 3, because we know $3^2 = 9 < 10$; (b) $\sqrt{10} - 0.001$ because this is less than $\sqrt{10}$ by
definition, but not by much; (c) 2, because $2^2 = 4 < 10$ and you don’t have to think as hard
to see this.
Concavity
A more subtle bound comes when f is known to be concave upward or downward in some
region. A chord of a graph is a line segment connecting two points on the graph. By
definition, a concave upward function lies below its chords and a concave downward function
lies above its chords.
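[Figure: the graph y = f(x) over the interval [a, b], with the heights f(a) and f(b) marked, together with the chord y = C(x) and the tangent line y = l(x) at the point c.]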
In the figure, the function f (x) is concave down, meaning it bends downwards. As long as
x is in the interval [a, b], we are guaranteed to have C(x) ≤ f (x). Looking at tangent lines
instead of chords, if a function is concave down on an interval, then the function always lies
below the tangent line. Therefore l(x) is an upper bound for f (x) when x ∈ [a, b] no matter
at which point c ∈ [a, b] we choose to take the linear approximation. The figure shows the
function y = f (x) trapped between the chord and the tangent line over the interval spanned
by the chord.
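As a concrete instance of this sandwich, the sketch below uses the ladder function from earlier, which is concave down, on the interval [2.6, 3.4] with the tangent taken at c = 3. The interval, the choice of c, and the tangent slope −3/4 (from the perpendicular-to-the-radius property of a circle) are assumptions made for illustration.

```python
import math

def f(x: float) -> float:
    """The ladder height function, concave down on its domain."""
    return math.sqrt(25 - x * x)

a, b, c = 2.6, 3.4, 3.0   # illustrative interval [a, b] and tangent point c

def chord(x: float) -> float:
    """C(x): the line through (a, f(a)) and (b, f(b))."""
    slope = (f(b) - f(a)) / (b - a)
    return f(a) + slope * (x - a)

def tangent(x: float) -> float:
    """l(x): the tangent line at c = 3, with ASSUMED slope -3/4."""
    return f(c) + (-3 / 4) * (x - c)

xs = [a + (b - a) * k / 100 for k in range(101)]   # 101 sample points in [a, b]
tol = 1e-12                                        # allowance for rounding at the endpoints
sandwiched = all(chord(x) - tol <= f(x) <= tangent(x) + tol for x in xs)
print("C(x) <= f(x) <= l(x) at every sample point:", sandwiched)
```

Sampling can only fail to find a counterexample; the guarantee itself comes from concavity.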
Exercise 0.5. Did 0.15 meters over- or under-estimate how far we needed to move the base
of the ladder?
In the ladder example, we were lucky that the graph was a familiar geometric shape, a
quarter circle, which we know to be convex. We are able to conclude that the tangent line
remains above the graph because we know geometrically that the tangent line to a circle
touches the circle at one point and otherwise remains outside the circle. Calculus will give
us a far more general way to determine concavity.
And now for something completely different: Logarithm Cheat Sheet
e ≈ 2.7
ln(2) ≈ 0.7
ln(10) ≈ 2.3
$\log_{10}(2) \approx 0.3$
$\log_{10}(3) \approx 0.48$
$e^3 \approx 20$
Also useful sometimes: $\sqrt{3} = 1.732\ldots$ and $\sqrt{5} = 2.236\ldots$, both to within about
0.003%.
1 OK so technically $\sqrt{2}$ is about 1.015% greater than 1.4 and 0.7 is about 1.015% less than $\sqrt{1/2}$
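If you are curious how good the cheat sheet values are, a short sketch like the following compares each rounded value with what Python's `math` module computes. The dictionary layout and names are illustrative.

```python
import math

# Each entry: name -> (cheat sheet value, value computed by the math module)
cheat_sheet = {
    "e":        (2.7,  math.e),
    "ln(2)":    (0.7,  math.log(2)),
    "ln(10)":   (2.3,  math.log(10)),
    "log10(2)": (0.3,  math.log10(2)),
    "log10(3)": (0.48, math.log10(3)),
    "e^3":      (20.0, math.exp(3)),
}

rel_errors = {
    name: abs(approx - exact) / exact
    for name, (approx, exact) in cheat_sheet.items()
}

for name, err in rel_errors.items():
    print(f"{name:9s} relative error {100 * err:.2f}%")
```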
1 Variables, functions and graphs
In addition, there are some more routine things to discuss up front. In order to have a
shared language, we need to agree on notation and terminology. Normally it is a good
idea to read everything that is assigned; however if this notation is very familiar, you can
probably just answer the self-check questions and skip the reading. We apologize for the
length of this preliminary section. When the material becomes harder, the sections will be
shorter.
There are several ways to conceive of a function. One is that it is a rule that takes an input
and gives you an output. This is how most of us think of functions most of the time, but
it is not precise (rules are sentences which may be ambiguous or underspecified). For this
reason we also need a formal definition. A third way to understand functions is via their
graphs. We now discuss all three of these ways of characterizing a function, beginning with
the most formal.
Definition 1.1.
(i) A function is a set of ordered pairs with the property that no two ordered pairs have
the same first element.
(ii) The expression f (x) is defined to be equal to y if the ordered pair (x, y) is in the set of
ordered pairs defining f and undefined otherwise. Informally, f (x) is called the value
of the function f evaluated at the argument x.
(iii) The domain of f is defined to be the set of all first elements of the ordered pairs. The
range of f is defined to be the set of all second elements of the ordered pairs.
Now let’s say the same things verbally. The domain of a function is the set of allowed
inputs; the range is the set of all outputs. We often name functions with letters; f is the
typical choice, then g if another is needed, but of course we could name a function anything.
While it is common to refer to the function f as f (x), we will try to observe the distinction
that f is the function and f (x) is its value at the argument x, meaning the output when
you plug in x as an input. The condition that first coordinates are distinct corresponds to
the rule producing an unambiguous answer.
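The formal definition can be mirrored almost word for word in code, at least for finite sets of pairs. The sketch below is one possible rendering; the helper names (`is_function`, `domain`, `range_of`, `evaluate`) and the sample sets are hypothetical, chosen only to illustrate Definition 1.1.

```python
def is_function(pairs: set) -> bool:
    """True if no two ordered pairs share the same first element."""
    firsts = [x for x, _ in pairs]
    return len(firsts) == len(set(firsts))

def domain(pairs: set) -> set:
    """The set of all first elements."""
    return {x for x, _ in pairs}

def range_of(pairs: set) -> set:
    """The set of all second elements."""
    return {y for _, y in pairs}

def evaluate(pairs: set, x):
    """Return y if (x, y) is in the set; None stands in for 'undefined'."""
    for first, second in pairs:
        if first == x:
            return second
    return None

f = {(-2, 6), (-1, 3), (0, 2), (1, 3), (2, 6)}   # a finite function
not_f = {(1, 3), (1, 4)}                         # two pairs share first element 1

print(is_function(f), is_function(not_f))
print(sorted(domain(f)), sorted(range_of(f)), evaluate(f, 2), evaluate(f, 5))
```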
Finally, to describe the function f via its graph, we interpret the ordered pairs as points in
the plane, and draw this set as a curve. The condition that first coordinates are distinct
corresponds to the so-called vertical line rule: any vertical line (vertical lines being sets
with a single fixed x-coordinate but all possible y-coordinates) intersects the graph at most
once.
In common usage, one might encounter any of the three ways of defining or referring to a
function. We don’t want to drown in formality, so we usually use something only as formal
as needed. Let’s look at why we sometimes need formality.
Example 1.2. Suppose we define a function f by $f(x) := x^2 + 2$. Have we formally defined
this function? It sounds as if this is the set of ordered pairs
{. . . , (−2, 6), (−1, 3), (0, 2), (1, 3), (2, 6), . . .}.
That would be if we meant the domain to be the set of all integers. Maybe instead we meant
the domain to be the set of all real numbers. In that case, the “. . .” in the list is somewhat
misleading; we should probably write the set of ordered pairs like this: $\{(x, x^2 + 2) : x \in \mathbb{R}\}$
(we use the notation $\mathbb{R}$ for the real numbers and ∈ for “is an element of”). If this
function arose in a word problem where f (x) represented the value of some quantity at a
time x seconds after the start, maybe it makes sense to allow only nonnegative real numbers
as inputs. Formally, this would look like $\{(x, x^2 + 2) : x \text{ is real and nonnegative}\}$, which
could also be written $\{(x, x^2 + 2) : x \in [0, \infty)\}$ or $\{(x, x^2 + 2) : x \ge 0\}$, this last version
assuming we understood this to mean real numbers at least zero rather than, say, integers
at least zero.
is all nonnegative reals. You can see they are different functions: even though the defining
equation $f(x) := x^2 + 2$ is the same for all three, they are defined by different sets of ordered
pairs. On the other hand, for many purposes, we don’t care which of these functions was
intended. We can feel free to define the function by $f(x) := x^2 + 2$ without specifying the
domain unless and until we get into trouble with the ambiguity in the domain. If we try to
answer a question like “How many solutions are there to f (x) = 3?” then we will need to
be more precise about the domain.
Exercise 1.1. What are the respective numbers of solutions to f (x) = 3 when $f(x) := x^2 + 2$
and the domain is respectively (a) the integers, (b) the reals, (c) the nonnegative reals?
In the discussion so far, we have introduced four notations you are probably familiar with,
but to be completely explicit, we discuss each briefly.
Maps-to notation. Often we name a function when defining it, then refer to it by name,
but we can also refer to it using the “maps-to” symbol ↦. Thus, $x \mapsto x^2 + 2$ refers to
the function that we named f , above. We use this when mentioning a function but rarely
when evaluating it at an argument because the notation $(x \mapsto x^2 + 2)(3)$ is an atrocity (but
technically equal to 11).
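Readers who program may recognize maps-to notation as an anonymous function. A minimal sketch in Python, offered only as an analogy:

```python
def f(x):
    """The named function f, defined by f(x) := x^2 + 2."""
    return x ** 2 + 2

value_named = f(3)                            # evaluate the named function at 3
value_anonymous = (lambda x: x ** 2 + 2)(3)   # the unnamed function, applied directly

print(value_named, value_anonymous)
```

The second form is legal but hard to read, which is roughly the complaint made about evaluating maps-to notation at an argument.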
Open and closed interval notation. The interval [a, b] refers to all real numbers x such
that a ≤ x ≤ b. When both endpoints are included, this is called a closed interval. The
interval (a, b) refers to all real numbers x such that a < x < b. When both endpoints are
excluded, this is called an open interval. [Warning: the notation is exactly the same as for
an ordered pair! If there is any ambiguity we will try to specify which, for example, “Let
(a, b) be the open interval...”] The notations (a, b] and [a, b) are called half-open and refer to
an interval with one point (the one next to the square bracket) included and one excluded.
The defining colon-equal sign. We use := to mean that the quantity on the left is
defined to be the quantity on the right, and a regular equal sign to mean an equation that
could hold for some values of the variables and fail for others. Thus, $f(x) := x^2 + 2$ defines
a function, whereas $f(x) = x^2 + 2$ is an equation which is true when a given function f ,
evaluated at x, has the same value as $x^2 + 2$, and false otherwise.
Exercise 1.2. Suppose $f(x) := x^2 + 2$. For each of the domains (a)–(c) in Exercise 1.1,
write the set of values of x that make the equation $f(x) = 5 - 3x^2$ true. Please simplify
your answer(s). Here and throughout, the empty set is denoted by ∅.
One final remark about the basic definitions: there is an ambiguity in common usage of the
word “range”. Sometimes “range” is used to refer to a bigger set than in our definition,
namely the set of all things of the type that the function outputs. For example, someone
might say that the domain and range of a function $f(x) := x^2 + 2$ is all real numbers. We
won’t do that here, but you may come across it elsewhere. In this text, technically the
range is the set of real numbers that are at least 2.
Exercise 1.3. What are two formal mathematical ways of writing the set of real numbers
that are at least 2, one using set-builder notation and one using interval notation?
Definition by cases
As we said, the most familiar way of referring to a function is as a rule for converting input
to output. Usually the rule is an equation, such as $f(x) := C - x \cdot e^x$, but the rule could be
verbal, for example, “Let f (t) be the amount in tons of carbon dioxide emitted in t years.”
Sometimes we want to talk about functions that are defined by equations, but different ones
in different parts of the domain. This is called definition by cases. An example from a
recent research paper looks like this:
$$f(x) := \begin{cases} -9x & a \le -3 \\ 2x^2 - 3x & -3 < x < 1 \\ -a^3 & a \ge 1 \end{cases} .$$
A number of useful functions can be defined in this way. For example the absolute value of
x, denoted |x|, may also be defined in cases:
$$|x| := \begin{cases} x & x \ge 0 \\ -x & x < 0 \end{cases} .$$
1. Note that x and −x agree at x = 0, so we could have grouped zero with either case.
When this happens, writing
$$|x| := \begin{cases} x & x \ge 0 \\ -x & x \le 0 \end{cases}$$
emphasizes this. If x and −x did not agree at x = 0, this would be a badly formed
definition.
2. There is a period following the two example definitions but not the one in the first
remark. Why? Because well written math follows rules of basic grammar. These rules
can be a little different on occasion, but for the most part, you should expect this text
to read in complete sentences, to define variables and functions before using them, and
when used within sentences, to connect and flow logically, using connecting words like
“and”, “because”, “therefore”, and punctuation such as commas and periods.
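A definition by cases corresponds closely to an if/else statement. The sketch below writes the absolute value this way and compares it with Python's built-in `abs` on a few sample inputs; the function name and the samples are illustrative.

```python
def abs_by_cases(x: float) -> float:
    """Absolute value, written as a definition by cases."""
    if x >= 0:      # first case: x >= 0
        return x
    else:           # second case: x < 0
        return -x

samples = [-3.5, -1, 0, 2, 7.25]
agrees = all(abs_by_cases(x) == abs(x) for x in samples)
print("matches built-in abs on the samples:", agrees)
```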
Exercise 1.4. Which of the following defines a function whose domain is all real numbers?
Explain your reasoning.
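$$f(x) := \begin{cases} x + 1 & x > 2 \\ x - 1 & x < 2 \end{cases} ;$$
$$g(x) := \begin{cases} x + 1 & x \ge 2 \\ x - 1 & x < 2 \end{cases} ;$$
$$h(x) := \begin{cases} x + 1 & x \ge 2 \\ x - 1 & x \le 2 \end{cases} .$$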
In the defining statement $f(x) := x^2 + 2$, it would define the same function if instead we
said $f(u) := u^2 + 2$. It is the same set of ordered pairs, has the same graph, etc. The variable
x (or in the second case, u) is said to be a bound variable. The bound variable in this case
runs over all values in the domain of f . A variable that is not bound is free. For example,
in the definition $f(u) := u^2 + c$, the variable c is free. The definition of the function f
depends on the value of c. If c = 2, it boils down to the previous definition. If c = 1 it is a
different function. If c has not been assigned a value, then f is a function whose range is
not the real numbers but rather algebraic expressions in the variable c.
Bound variables arise many times throughout this course, in fact throughout math and
throughout life! Here is a list of some places bound variables occur in this course, the first
two of which you have already seen.
• In the definition of a function
• In the definition of a subset
• In quantifiers
• In limits
• In the definition of a derivative
• In summations
• In the definition of an integral
• In notions of orders of magnitude and asymptotic equivalence
• In Taylor’s theorem
A related notion is that of a quantifier. Typically we use two quantifiers, for all and there
exists. These two phrases are so important that there are symbols for them. Some people
find these intimidating so we won’t use them, but in case you encounter them elsewhere, in
math they are denoted ∀ and ∃. A typical use of quantifiers is as follows. A function f is
said to be differentiable on an open interval (a, b) if (a, b) is in the domain of f and if, for
all x ∈ (a, b), the derivative $f'(x)$ exists. In this case there was only one quantifier.
Exercise 1.5. (i) What was the one quantifier? (ii) In the above definition of differen-
tiability, among the variables a, b and x, which are bound and which are free? Intuitively
a variable is free if the final answer depends on what value you take for that variable, but
bound if you have to consider many values of the variable and put the information together.
Here are some more useful special functions. The greatest integer function at the argument
x is denoted $\lfloor x \rfloor$ and defined to be the greatest integer y such that y ≤ x. In other words,
if x is an integer then $\lfloor x \rfloor = x$; if x is positive and not an integer, then $\lfloor x \rfloor$ is the “whole
number you get when you write x as a decimal and ignore what comes after the decimal
point”; if x is negative and not an integer, it is −1 plus what you get when you ignore the
decimals. In older texts, the same function is sometimes denoted [x]. This square bracket
notation has largely been abandoned in favor of the “floor” notation, because (especially
in computer science) we often want to use the ceiling function as well. The ceiling
function at the argument x is denoted $\lceil x \rceil$ and is defined to be the least integer y such that
y ≥ x. Informally, $\lfloor x \rfloor$ rounds down to the nearest integer and $\lceil x \rceil$ rounds up.
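Most programming languages provide both functions. A small sketch using Python's `math.floor` and `math.ceil`, with sample inputs chosen for illustration, also shows the "−1 plus what you get when you ignore the decimals" rule for negative non-integers:

```python
import math

examples = [5, 2.7, -2.7]   # an integer, a positive and a negative non-integer

# Map each input to the pair (floor, ceiling).
results = {x: (math.floor(x), math.ceil(x)) for x in examples}
for x, (lo, hi) in results.items():
    print(f"x = {x}: floor = {lo}, ceiling = {hi}")

# int() ignores the decimals (truncates toward zero), so for a negative
# non-integer the floor is one less than the truncated value.
truncated = int(-2.7)
print(truncated - 1 == math.floor(-2.7))
```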
Exercise 1.6. What is $\lfloor x \rfloor$ when x is respectively 3, 9.4, $\sqrt{2}$, 0, −1.5? What is $\lceil x \rceil$?
Another useful function is the sign function, not to be confused with the sine function! This
is defined by
$$\operatorname{sgn}(x) = \begin{cases} 1 & x > 0 \\ 0 & x = 0 \\ -1 & x < 0 \end{cases} .$$
Another is the delta function defined by δ(x) = 1 when x = 0 and 0 when x ≠ 0.
Exercise 1.7. Write the delta function as a definition by cases.
We now list certain properties of functions to which we will often refer. A function f is said
to be odd if f (−x) = −f (x) for all x in the domain of f . It is unclear what is meant if the
domain contains x but not −x. Similarly an even function f is one satisfying f (−x) = f (x).
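One way to build intuition for these definitions is to test them on a handful of sample points. The sketch below does this for three functions chosen for illustration; the helper `classify_parity` is hypothetical, and passing a sample test suggests but does not prove that a function is odd or even.

```python
import math

def classify_parity(f, samples) -> str:
    """Classify f as even, odd, both, or neither, based only on sample points."""
    is_even = all(math.isclose(f(-x), f(x), abs_tol=1e-12) for x in samples)
    is_odd = all(math.isclose(f(-x), -f(x), abs_tol=1e-12) for x in samples)
    if is_even and is_odd:
        return "even and odd"
    if is_even:
        return "even"
    if is_odd:
        return "odd"
    return "neither"

samples = [0.5, 1, 2, 3.5]
print(classify_parity(abs, samples))                 # absolute value
print(classify_parity(lambda x: x ** 5, samples))    # fifth power
print(classify_parity(lambda x: 2 ** x, samples))    # exponential, base 2
```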
Exercise 1.8. For each of these functions, say whether it is odd, even or neither.
(a) $f(x) := x^2$
(b) f (x) = 3 − x
(c) $f(x) = x^3 + x$
(d) f (x) = sin x
(e) f (x) = cos x
A function f is said to be increasing if f (x) ≤ f (y) for all values of x and y in the
domain of f such that x < y. Informally, the value of an increasing function gets bigger
if the argument gets bigger. If you change the requirement that f (x) ≤ f (y) to the strict
inequality f (x) < f (y), this defines the notion of strictly increasing. Decreasing and
strictly decreasing functions are defined analogously but with one inequality reversed: f
is decreasing if f (x) ≥ f (y) for all x, y satisfying x < y. A (strictly) monotone function
is one that is either (strictly) increasing or (strictly) decreasing.
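The difference between increasing and strictly increasing shows up clearly on a function with flat stretches. The sketch below checks the defining inequality on a finite list of points; the helper `is_increasing` and the sample functions are illustrative, and a finite check can refute monotonicity but cannot prove it on the whole domain.

```python
import math

def is_increasing(f, points, strict: bool = False) -> bool:
    """Check f(x) <= f(y) (or f(x) < f(y) if strict) for sampled x < y.

    Comparing consecutive sorted points is enough, since the inequalities chain.
    """
    xs = sorted(set(points))
    pairs = list(zip(xs, xs[1:]))
    if strict:
        return all(f(x) < f(y) for x, y in pairs)
    return all(f(x) <= f(y) for x, y in pairs)

points = [-2, -1.5, -1, -0.5, 0, 0.5, 1, 1.5, 2]

print(is_increasing(math.floor, points))                  # floor: increasing
print(is_increasing(math.floor, points, strict=True))     # floor: not strictly
print(is_increasing(lambda x: x ** 3, points, strict=True))
print(is_increasing(lambda x: -x, points))
```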
We can also say when a function is increasing or decreasing on a part of the domain: f is
increasing on the open interval (a, b) if the above inequality holds for all x, y ∈ (a, b). For
any point c ∈ (a, b), we then also say that f is increasing at c. In other words, to say f is
increasing at a point c means there is some a < c < b such that f is increasing on the open
interval (a, b).
Exercise 1.9. Is the sign function strictly increasing, increasing, strictly decreasing,
decreasing, or none of the above?
1.3 Graphing
As you already know, points in the plane can be labeled by ordered pairs of real numbers. As
you also already know, the graph of a function f is the set of points in the plane corresponding
to the ordered pairs {(x, f (x)) : x ∈ domain of f }.
Often the graph of a function is a continuous curve, and can be quickly drawn, conveying
essential information about f to the eye much more efficiently than if the reader had to
wade through equations or set notation.
Exercise 1.11. Which of the four graphs, borrowed from Hughes-Hallett et al., best matches
each of the following stories?
(a) I had just left home when I realized I had forgotten my books, so I went back to pick
them up.
(c) I started out calmly but sped up when I realized I was going to be late.
[Figure: four graphs, labeled (1) through (4), each plotting distance from home against time.]
Some conventions make graphs even more effective at conveying information. The axes
should be labeled (more on that later) but more importantly, marked so that the scale is
clear. Rather than just mark where 1 is on the horizontal and vertical axes, it is often helpful
to mark any value where something interesting is going on: a discontinuity, an asymptote, a
local maximum or minimum, or a change of cases for functions defined in cases.
For example, if I graph x 7→ 1/(x2 − 3x + 2), I should mark vertical asymptotes (a certain
kind of discontinuity) on the x-axis at x = 1 and x = 2; a dashed vertical line is customary.
We should mark a local maximum of −4 (marked on the y-axis) occurring at x = 3/2
(marked on the x-axis). When graphing a function on the entire real line, we can’t go
to infinity and stay in scale, so we either go out of scale or draw a finite portion, large
enough to given the idea. Choosing the latter, the resulting picture should look something
like the graph in Figure 4. Another way to do this would be to label and mark the point
(3/2, −4) on the graph. There is a horizontal asymptote at zero, which we would mark
with a dashed horizontal line if it occurred anywhere else, but we don’t because it is hidden
by the x-axis. If there is a point where an otherwise continuous function fails to be well
defined, the convention is to put a small open circle. For example the function f(x) := x^2/x
is undefined at zero but is otherwise equal to x; its graph is shown on the left of Figure 5.
A solid circle is used to denote a point where the function is defined, as in the graph of the
floor function on the right of Figure 5.
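Before drawing such a graph, a quick numerical check of the features to be marked can help. The following Python sketch is only an illustration (the helper name g and the step size are our own choices, not part of the text): it evaluates x ↦ 1/(x^2 − 3x + 2) at x = 3/2 and just to either side of each vertical asymptote.

```python
def g(x):
    """The function x -> 1/(x^2 - 3x + 2) discussed above."""
    return 1.0 / (x * x - 3.0 * x + 2.0)

# Value at the local maximum between the two asymptotes.
peak = g(1.5)

# Sample just left and right of each asymptote to see which way the graph blows up.
step = 1e-3  # arbitrary small offset, chosen for illustration
near_asymptotes = {a: (g(a - step), g(a + step)) for a in (1.0, 2.0)}

print("g(3/2) =", peak)
for a, (left, right) in near_asymptotes.items():
    print(f"near x = {a}: left value {left:.1f}, right value {right:.1f}")
```

The signs of the sampled values tell you on which side of each dashed line the curve heads up and on which side it heads down.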
Here follows a list of tips on graphing an unfamiliar function, call it f . The last three tips
on shifting and scaling are ones we have found in the past that many students vaguely recall
but get wrong, so please make sure you know them.
(i) Is the domain all real numbers? If not, what is it? If the function has a piecewise
definition, try drawing each piece separately.
(ii) Is there an obvious symmetry? If f (−x) = f (x) for all x in the domain, then f is even
and there is a symmetry about the y-axis. If f (−x) = −f (x) then f is odd and there
is 180-degree rotational symmetry about the origin.
(iii) Are there discontinuities, and if so, where? Are there asymptotes?
(iv) Try values of the function near the discontinuities to get an idea of the shape – these
are particularly important places. If the domain includes points on both sides of a
discontinuity be sure to test points on each side.
(v) Try computing some easy points. Often f (0) or f (1) is easy to compute. Trig functions
are easily evaluated at certain multiples of π.
(vii) Where is f increasing and where is it decreasing? This will be easier once you know
some calculus.
(viii) Where is f concave upward versus concave downward? This will be a lot easier once
you know some calculus.
(ix) Where are the maxima and minima of f and what are its values there? This will be
a lot easier once you know some calculus.
(xi) Is there a function you understand better than f which is close enough to f that their
graphs look similar?
(xiii) Is the graph of f a shift of a more familiar graph? Graphing y = f (x) + c shifts
the graph up by c; this is pretty intuitive; if c is negative the graph shifts downward.
Graphing y = f (x + c) shifts the graph left or right by c. If c is positive, the graph
shifts left.
(xiv) Is the graph of f a rescaling of a more familiar graph? The graph y = cf (x) stretches
vertically by a factor of c. When c ∈ (−1, 1) this is a shrink rather than a stretch.
Exercise 1.12. What happens when c is negative? Sketch the specific example where
c = −2 and f(x) = x^2 on the domain [−1, 1].
(xv) The graph of y = f (cx) stretches or shrinks in the horizontal direction. When c > 1,
it is a shrink. Why? Try sketching y = cos x and on top of this sketch y = cos(2x).
Exercise 1.13. Explain in words why c ∈ (0, 1) produces a horizontal stretch. What
happens when c is negative?
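The shifting and scaling tips can be spot-checked numerically. The sketch below is a minimal illustration under our own naming (shifted, h_scaled and square are hypothetical helpers): it confirms that y = f(x + c) with c = 1 moves the vertex of the squaring function to x = −1, and that y = cos(2x) finishes a full cycle by x = π.

```python
import math

def shifted(f, c):
    """Return the function x -> f(x + c)."""
    return lambda x: f(x + c)

def h_scaled(f, c):
    """Return the function x -> f(c * x)."""
    return lambda x: f(c * x)

def square(x):
    return x * x

# y = (x + 1)^2 has its minimum where x + 1 = 0, that is at x = -1: a shift LEFT by 1.
left_shift = shifted(square, 1.0)
print(left_shift(-1.0))

# y = cos(2x) is back at its starting value when x = pi rather than x = 2*pi,
# so the graph of cos has been shrunk horizontally by a factor of 2.
cos2 = h_scaled(math.cos, 2.0)
print(cos2(math.pi), math.cos(2.0 * math.pi))
```

Trying other values of c in the same helpers is a reasonable way to test your answers to the exercises before sketching.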
1.4 Inverse functions
One method of solving the problem of Galileo’s experiment involved an inverse function.
Let’s be explicit about the definitions involved. The inverse function of a function f is the
function that answers the question,
In other words, if g is the inverse function of f then g(y) is whatever value x satisfies
f (x) = y. If there is more than one answer to this, then f has no inverse function; however,
you can usually restrict the domain so there is only one answer. If there is no answer, that’s
not a problem, it just means that y is not in the domain of g. This happens when y is not
in the range of f . Thus, the domain of g is the range of f . Likewise, the range of g is any
possible answer to the question above, therefore any x in the domain of f .
Exercise 1.14. Let f (x) := sin x on a domain of the form [−L, L], where L is some positive
real number. What is the largest value of L such that f is one-to-one and therefore has an
inverse?
The usual notation for the inverse function to f is f^{-1}. This is terrible notation because it
is the same as the notation for the −1 power of f, also known as 1/f. We tried changing
the inverse function notation to f^{inv} for the purposes of this class, but then students were
confused when they saw f^{-1}. We will stick with the terrible notation, and mention it when
confusion might arise.
There is a standard way that the domain is restricted on trig functions so that the inverse
function can be defined. For sin it is [−π/2, π/2]; for tan it is the open interval (−π/2, π/2),
because tan is undefined at ±π/2. The function cos when restricted
to [−π/2, π/2] is not one-to-one; the standard choice for cos is [0, π]. These are arbitrary
conventions, but are probably built in to your calculator, so we had better adopt them.
Along with sin^{-1}, cos^{-1} and tan^{-1}, the conventional names arcsin, arccos and arctan
are also used.
Exercise 1.15. Let f be the squaring function, f (x) := x2 . What is the standard name of
the inverse function to f , and what choice of domain of f is usually made so that f will be
one-to-one?
Inverse functions occur naturally in mathematical modeling. For example, if f(t) represents
how many miles you can walk in t hours, then f^{-1}(x) represents how many hours it takes
you to walk x miles. Note that in this explanation, x is a bound variable; we could have
used any other name, such as t again, only it helps readability if we use names such as t for
time and x for distance.
Figure 6: sin is one-to-one on [−π/2, π/2] (top) but cos is not (left) so we move the window
to [0, π] (right)
Exercise 1.16. Define f (x) to be the number of pounds you have to carry when planning
a backpacking excursion for x days.
(a) Give an interpretation for f^{-1}(v).
(b) Give interpretations for f^{-1}(v) + f^{-1}(w) and f^{-1}(v + w).
(c) Which do you think would be greater?
How does the graph of an inverse function relate to the graph of a function? The roles of x
and y have switched. When the first and second coordinate of an ordered pair are switched,
the point reflects across the diagonal line y = x. Thus, the graph of the inverse function
is the original graph (on the appropriate domain) reflected across the diagonal. The blue
curve in Figure 7 is the plot of f(x) := x^3 − 3x from x = 1 to x = 3, an interval on which
f is one-to-one. The red curve shows f^{-1} on the corresponding interval [−2, 18].
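When no formula for the inverse is handy, its values can still be computed numerically. The sketch below is one possible approach, not the method used to draw the figure: it inverts f(x) = x^3 − 3x on [1, 3] by bisection, relying on the fact that f is increasing there. The name f_inverse and the tolerance are our own assumptions.

```python
def f(x):
    return x**3 - 3.0 * x

def f_inverse(y, lo=1.0, hi=3.0, tol=1e-12):
    """Solve f(x) = y for x in [lo, hi] by bisection; f is increasing on [1, 3]."""
    if not f(lo) <= y <= f(hi):
        raise ValueError("y is outside the range of f on this interval")
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if f(mid) < y:
            lo = mid   # the solution lies to the right of mid
        else:
            hi = mid   # the solution lies at or to the left of mid
    return (lo + hi) / 2.0

# Each pair (y, f_inverse(y)) is the mirror image across y = x of a point on the graph of f.
for y in (0.0, 2.0, 18.0):
    x = f_inverse(y)
    print(f"f_inverse({y}) = {x:.6f}, mirror of the point ({x:.6f}, {y})")
```

Plotting the pairs (y, f_inverse(y)) for many values of y would trace out the reflected curve.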
2 Units, proportionality and mathematical modeling
One skill most students need practice with is writing formulas for functions given by verbal
descriptions. Try this multiple choice question before going on.
Exercise 2.1. Knowing that an inch is 2.54 centimeters, if f (x) is the mass of a bug x
centimeters long, what function represents the mass of a bug x inches long?
(b) f (x)/2.54
(c) f (2.54x)
(d) f (x/2.54)
It helps to think about all such problems in units. Although inches are bigger than centimeters
by a factor of 2.54, numbers giving lengths in inches are less than numbers giving
lengths in centimeters by exactly this same factor. Writing this in units prevents you from
making a mistake. The quantity 1 inch is the same as the quantity 2.54 centimeters, so
their quotient in either order is the number 1 (unitless). We can multiply by 1 without
changing anything. Thus,
x in × (2.54 cm / 1 in) = 2.54x cm.
This shows that replacing x by 2.54x converts the measurement, and therefore (c) is the
correct answer. Here are some more helpful facts about units.
1. You can’t add or subtract quantities unless they have the same units. That would be
like adding apples and oranges!
3. Taking a power raises the units to that power. For example, if x is in units of length,
say centimeters, then 3x^2 will have units of area, in this case square centimeters. Most
functions other than powers require unitless quantities for their input. For example, in a
formula y = e^{***} the quantity *** must be unitless. The same is true of logarithms and
trig functions: their arguments are always unitless.
4. Units tell you how a quantity transforms under scale changes. For example, a square
inch is 2.54^2 times as big as a square centimeter.
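The unit-conversion reasoning above translates directly into code. In the sketch below, mass_from_cm is a hypothetical stand-in for the bug-mass function f (its formula and coefficient are made up purely for illustration); the point is only that the inches version is obtained by converting the input to centimeters first.

```python
CM_PER_INCH = 2.54

def mass_from_cm(length_cm):
    """Hypothetical model: bug mass in grams for a length given in centimeters."""
    return 0.02 * length_cm**3   # made-up formula, for illustration only

def mass_from_inches(length_in):
    """Same bug, length given in inches: convert to centimeters, then apply the model."""
    return mass_from_cm(CM_PER_INCH * length_in)

# A bug 1 inch long is the same bug as one 2.54 cm long.
print(mass_from_inches(1.0), mass_from_cm(2.54))

# Areas scale by the square of the conversion factor.
SQ_CM_PER_SQ_INCH = CM_PER_INCH**2
print(SQ_CM_PER_SQ_INCH)
```

Keeping the conversion factor in one named constant, with its units in the name, is a small habit that mirrors writing the units out on paper.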
Exercise 2.2. Suppose a pear growing on a tree doubles in length over the course of two
weeks. By what factor does its volume increase?
Often what we can easily tell about a function is that it is proportional to some combination
of other quantities, where the constant of proportionality may or may not be known,
or may vary from one version of the problem to another. Constants of proportionality have
units, which may be computed from the fact that both sides of an equation must have the
same units.
Example 2.1. If the monetization of a social networking app is proportional to the square
of the number of subscribers (this representing perhaps the amount of messaging going on)
then one might write M = kN^2 where M is monetization, N is number of subscribers and
k is the constant of proportionality. You should always give units for such constants. They
can be deduced from the units of everything else. The units of N are people and the units of
M are dollars, so k is in dollars per square person. You can write the constant as k $/person^2.
Example 2.2. If the expected profit on a home sale is proportional to the assessed value
of the home and inversely proportional to the number of days it has been on the market,
we could capture that relation as P = kV/T where P is profit in dollars, V is assessed value
in dollars, T is number of days on the market, and k is a constant of proportionality.
units will not be assignable to the proportionality constant k in the formula BV = kL^{2.65}.
In this case we just have to live with the fact that k has units involving fractional powers
of length that won’t make much sense outside of this context.
An important point when writing up your work: You don’t just write M = kN^2 without
stating the interpretations of the three variables. Also, there would not usually be a :=
here, because you are not defining the function M(N) := kN^2 as much as you are saying
that two observed quantities M and N vary together in a way that satisfies the equation
M = kN^2. There isn’t a clear line here, but the style of the definition can be important in
conveying to the reader what’s going on.
Example 2.3. The present value under constant discounting is given by V(t) = V_0 e^{−αt}
where V_0 is the initial value and α is the discount rate. What are the units of α? They
have to be inverse time units because αt must be unitless. A typical discount rate is 2% per
year. You could say that as “0.02 inverse years.” We hope that by the end of the semester,
the notion of an inverse year is somewhat intuitive.
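One way to build intuition for inverse time units is to check that the discounted value does not depend on whether time is measured in years or months, as long as α is converted consistently. The sketch below assumes an illustrative initial value of 1000 dollars and a ten-year horizon; neither number comes from the text.

```python
import math

def present_value(v0, alpha, t):
    """V(t) = V0 * exp(-alpha * t); alpha and t must use matching time units."""
    return v0 * math.exp(-alpha * t)

v0 = 1000.0                               # illustrative initial value in dollars (an assumption)
alpha_per_year = 0.02                     # 2% per year, that is 0.02 inverse years
alpha_per_month = alpha_per_year / 12.0   # the same rate expressed in inverse months

in_years = present_value(v0, alpha_per_year, 10.0)      # t = 10 years
in_months = present_value(v0, alpha_per_month, 120.0)   # t = 120 months
print(in_years, in_months)
```

The product αt is the same unitless number in both computations, which is why the two results agree.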
Exercise 2.4. Write a formula expressing the statement that risk of viral infection in an enclosed
space is proportional to the square of the number of people and inversely proportional
to the cube root of the volume. Be sure to give the units of the constant of proportionality.
Often quantities are measured as proportions. For example, the proportional increase in
sales is the change in sales divided by sales. In an equation: the proportional increase in S
is ∆S/S. Here, ∆S is the difference between the new and old values of S. You can subtract
because both have the same units (sales), so ∆S has units of sales as well. That makes the
proportional increase unitless. In fact proportions are always unitless.
Percentage increases are always unitless. In fact they are proportional increases multiplied
by 100. Thus if the proportional increase is 0.183, the percentage increase is 18.3%. In
this class we aren’t going to be picky about proportion versus percentage. If you say the
percentage increase is 0.183 or the proportional change is 18.3%, everyone will know exactly
what you mean. But you may as well be precise.
Exercise 2.5. The proportional increase in an animal’s weight during the first week of life
is observed to be exponential in the percentage of a certain protein in the blood at birth. Do
the units make sense or not?
Units behave predictably under differentiation and integration as well. We will refer back
to this when we define the relevant concepts, but you may as well see a preview now. The
derivative (d/dx)f has units of f divided by units of x. You can see this easily on the graph
in Figure 8 because the derivative is a limit of rise over run, where rise has units of f and
run has units of x. The integral ∫ f(x) dx has units of f times units of x. Again you can
see it from a picture (Figure 9), because the integral is an area under a graph where the
y-axis has units of f and the x-axis has units of, well, x.
2.2 Modeling
Unpacking this, we see a number of features. First of all one must define mathematical
objects in the model: variables, sets, functions, equations and so forth. Secondly, one must
give interpretations of everything in the model. An interpretation tells what physical
quantity is associated with each of the constants and variables and what relation is meant
by each function. Physical quantities include units, so this part always involves stating
units. Note: the interpretation tells how the math relates to the scenario; it is not itself
mathematical. Thirdly, often one needs to add hypotheses about the scenario. These say
the circumstances under which you would expect the mathematics to be correct for the
model. These hypotheses are also physical, not mathematical. Lastly, if there are questions
given in the scenario, it is necessary to say what part of the mathematics answers the
question(s). After this, what is left is a math problem: solve for the quantities that answer
the questions.
In the following example, we have underlined parts of the modeling exercise that reflect the
outline we have given, such as naming of variables, interpretation, units and hypotheses.
Example 2.4.
Scenario: Galileo observes that objects falling a short time seem to fall a distance that is
proportional to the square of the time and independent of the object: 4 feet for an object
falling half a second, 9 feet after three quarters of a second, 16 feet after one second, and so
forth. Galileo decides to measure the Tower of Pisa by dropping a stone from the top of the
tower and measuring the time it takes for him to hear it hit the ground. Make a model for
this and use it to estimate the elapsed time Galileo measured between dropping the stone
and hearing the sound.
Model: Let f (t) be the distance in feet that an object falls in t seconds, starting from rest.
The wording of the scenario tells us that f(t) = c t^2 where c has units of feet per second squared.
This assumes we set t = 0 at the time of release and measure distance from the point of release.
We are asked to determine t such that f(t) = h, where h is the height of the Tower of Pisa.
Equivalently, we need to find f^{-1}(h). We assume that the model is accurate. What that
means in this case is that we can ignore things such as air resistance and the time lag for
the sound of impact to get back to Galileo’s ear.
Solution: We look up the height of the Tower of Pisa to find that h = 186 feet. We
solve for c given Galileo’s data for small distances and find that c = 16 (for example:
f(1/2) = c(1/2)^2 = 4 implies c = 16). We can solve 16t^2 = 186 directly for t, or we can
compute the inverse function to f, yielding f^{-1}(x) = √(x/16), and substitute 186 for x.
Either way we get t = √(186/16) ≈ 3.41 seconds. It may sound pedantic, but probably we should
justify our choice of the positive square root by saying the whole experiment only covers
time after the release, that is, t ≥ 0.
Were the hypotheses warranted? Many objects would be slowed by air resistance over such
a distance. Probably Galileo would have had to drop something like a rock in order for the
fall not to have been significantly slowed. Looking up the speed of sound, we find that the
sound would take an extra 1/4 second to reach Galileo. Probably Galileo could measure time
with greater accuracy than 1/4 second, so this hypothesis is definitely shaky.
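The arithmetic in the solution can be reproduced in a few lines. This is only a sketch of the computation described in the example: the variable names are ours, and c is simply taken from the first observation after checking that all three observations agree.

```python
import math

# Galileo's data from the scenario: (time in seconds, distance fallen in feet).
observations = [(0.5, 4.0), (0.75, 9.0), (1.0, 16.0)]

# Each observation gives an estimate of c in f(t) = c * t^2 (feet per second squared).
c_estimates = [d / t**2 for t, d in observations]
c = c_estimates[0]

def fall_time(height_ft):
    """Inverse of f: seconds needed to fall height_ft feet, taking the root with t >= 0."""
    return math.sqrt(height_ft / c)

print(c_estimates)
print(round(fall_time(186.0), 2))
```

Taking the nonnegative square root inside fall_time corresponds to the hypothesis that the experiment only covers times after the release.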
Squares and Powers of 2 Cheat Sheet
If you know the powers of 2 you can do the same thing with log_2 that you can do with log_10.
Because I indeed am a Geek, I have listed the first few powers of 2 and am suggesting you
be at least somewhat familiar with them. By the way, you should also recognize the first
twenty squares:
1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121, 144, 169, 196, 225, 256, 289, 324, 361, 400 .
No kidding, when you come across one of these numbers under a radical, you know immediately
it can be factored out. Here are the powers of 2.
2^0 = 1
2^1 = 2
2^2 = 4
2^3 = 8
2^4 = 16
2^5 = 32
2^6 = 64
2^7 = 128
2^8 = 256
2^9 = 512
2^10 = 1,024
2^11 = 2,048
2^12 = 4,096
2^13 = 8,192
2^14 = 16,384
2^15 = 32,768
2^16 = 65,536
2^20 ≈ 1,000,000
2^30 ≈ 1,000,000,000
2^100 ≈ 10^30
2.3 Exponential and logarithmic relationships
The log cheatsheet is there to encourage you to use logs for quick computations. The squares
and powers of two are just for fun (OK it was written by geeks). We’re going to take a
quick break from concepts to get the hang of computing with logs.
Example 2.5. What is the probability of getting all sixes when rolling 10 six-sided dice?
It’s 1 in 6^10 but how big is that? If we use base-10 logs, we see that log_10(6^10) = 10 log_10 6 =
10(log_10(2) + log_10(3)) ≈ 10(0.78) = 7.8. So the number we’re looking for is approximately
10^7.8, which is 10^7 × 10^0.8, or 10,000,000 times a shade over 10^0.78, this latter quantity being
very close to 6 according to the one-digit logs you computed. So we’re looking at a little
over sixty million to one odds against.
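It is easy to compare this kind of back-of-the-envelope estimate with the exact value. The sketch below uses 0.30 and 0.48 as two-digit stand-ins for log_10(2) and log_10(3) (their sum is the 0.78 used above); the variable names are ours.

```python
import math

log_estimate = 10 * (0.30 + 0.48)        # rough logs of 2 and 3, as in the example
exact_log = 10 * math.log10(6)

exponent = math.floor(exact_log)          # the power of ten
mantissa = 10 ** (exact_log - exponent)   # leading factor between 1 and 10

print(log_estimate, exact_log)
print(f"6^10 is about {mantissa:.2f} x 10^{exponent}; exactly {6**10}")
```

Splitting the logarithm into its integer part and fractional part is exactly the step of writing 10^7.8 as 10^7 × 10^0.8.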
Exercise 2.6. Roughly how big is 5^11? Just one significant digit is fine.
These are not just random examples; taking logs is always the best way to get a quick idea of the
size of a large power. When the base is 10 we already know how many digits it has, but
when the base is something else, we quickly compute log_10(b^a) = a · log_10(b).
Example 2.6. Why is the value ln(10) ≈ 2.3 on your log cheatsheet so important? It
converts back and forth between natural and base-10 logs. Remember, log_10 x = ln x / ln 10.
Thus the constant ln 10 is an important conversion constant that just happens to be closer
to 2.3 than it looks (the actual value is 2.302...). So for example,
e^8 ≈ 10^{8/2.3} ≈ 10^{3.5} = 1000 × 10^{0.5} = 1000 √10 ≈ 3,000.
Exercise 2.7. Estimate 2.7^{2.3} using the log cheatsheet, then use a calculator to find a more
accurate decimal approximation.
Exercise 2.8. A certain astronomical computation yields the number exp(24). How many
digits will this be? (Meaning, how many digits before the decimal point.)
Recall in the definition of e, the slope of the graph of e^x at (0, 1) is 1, therefore the tangent
line approximation is e^x ≈ 1 + x. In case you didn’t do practice problem #2, you should
know that this approximation is very good when |x| < 0.1. Let’s see what this means for
doing typical interest computations. Suppose, for example, your company grows in value
by 6% each year for 20 years. By what factor C does the value increase over this time? The
answer is 1.06^20, but about how big is that? For a quick answer, take logs. Using the fact
that ln 1.06 ≈ 0.06 (because e^0.06 ≈ 1.06), we see that ln C = ln(1.06^20) ≈ 20 × 0.06 = 1.2. We’d rather have this
in base ten, so we compute log_10 C = ln C / ln 10 ≈ ln C/2.3 ≈ 1.2/2.3 ≈ 0.5, maybe a little
bigger like 0.52 or so. Looking at the log cheatsheet shows this means C should be between
3 and 4, somewhat closer to 3. In fact to two significant figures, the growth factor is 3.2.
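The quick estimate and the exact growth factor can be compared directly. The following sketch just retraces the steps above, using the rounded constants 0.06 and 2.3 from the text; the variable names are our own.

```python
rate, years = 0.06, 20

ln_c_estimate = years * rate              # uses ln(1.06) ≈ 0.06
log10_c_estimate = ln_c_estimate / 2.3    # uses ln(10) ≈ 2.3
quick_estimate = 10 ** log10_c_estimate

exact = (1 + rate) ** years
print(round(log10_c_estimate, 3), round(quick_estimate, 2), round(exact, 2))
```

The quick estimate comes out slightly high because ln 1.06 is a little less than 0.06, but it lands between 3 and 4 as predicted.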
Exercise 2.9. Historical economists look at real (inflation-adjusted) growth rates over periods
of a century or more. If the real annual growth rate averages 2%, what should be the
growth factor over the century and a half from 1870 to 2020?
If you ask someone to state a relationship between the numbers 20 and 30, the most common
answer is that 30 is ten more than 20. A more fundamental answer is that 30 is 50% more,
or equivalently that 30 is three halves of 20. The section on proportionality is designed to
emphasize multiplicative thinking over additive thinking. Additive thinking is more common
only because we find it computationally easier to add than to multiply. Saying that
multiplicative thinking is more fundamental is not a precise mathematical statement, so
there’s no way to prove it. One reason to believe it is that the statement remains the same
no matter what units you use (as long as the 30 and the 20 are in the same units).
Exercise 2.10. A small city has 40,000 households. To organize an emergency response
system, the city wants to organize groups of households on a scale “halfway between the
individual household and entire city scale.” What size of groups of households best fulfills
this?
Exponentials and logarithms are built to express multiplicative facts. In fact the additive
laws of exponentiation and logarithms basically convert multiplicative facts to additive facts,
thereby converting the more fundamental fact to the type you can compute more easily.
Much of what you learn on the topic of exponential and logarithmic relationships consists of insights such
as this one:
If you observe that ln x has increased by about 0.7, what does this mean about
the increase that has occurred in x?
represent both quantities can lead to mess and confusion. Better to use different names such
as x_1 and x_2, or x_init and x_final, or possibly x and x′, etc. Using this idea on the question
above sets up an equation like this: ln x_2 ≈ ln x_1 + 0.7. From here, exponentiating leads to
x_2 ≈ e^{0.7} x_1 ≈ 2 x_1.
So, if you observe ln x increasing by about 0.7, you will know that x has approximately
doubled. This is what it means that logarithms transfer multiplicative scales to additive
ones. A multiplicative relation such as doubling transfers to an additive relation, namely
addition of about 0.7.
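A short numerical check makes the point concrete. In the sketch below the starting value x1 is arbitrary (our own choice); whatever positive value you pick, adding 0.7 to ln x multiplies x by the same factor.

```python
import math

x1 = 37.5                          # arbitrary positive starting value (an assumption)
x2 = math.exp(math.log(x1) + 0.7)  # add 0.7 to ln x, then exponentiate

ratio = x2 / x1                    # equals e^0.7 regardless of x1
print(ratio, math.log(2))
```

The second printed number, ln 2, shows why 0.7 is the additive shift that corresponds to doubling.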
Exercise 2.11. When x triples, what happens to the base-ten log of x? What about the
natural log of x?
One more thing to keep in mind about logarithms and exponentials is that they do not
scale with units. If I change the units of x from inches to centimeters, and if y = e^x, then
in the new units y′ = e^{2.54x} = y^{2.54}. The new exponential appears to be the old one to the
2.54 power. What does that even mean? It is a tipoff that x should not be exponentiated:
anything other than a unitless constant is likely to be meaningless when exponentiated.
The same is true for logarithms and trig functions.
If a quantity Q increases at a constant additive rate, it means that if you wait one unit of
time, Q always increases by the same additive amount. In fact, between any two times s
and t the increase will be c(t − s).
Exercise 2.13. What are the units of c in this case?
If a quantity Q increases at a constant multiplicative rate, it means waiting one unit of time
always multiplies Q by the same amount, and in general, between times s and t, the factor
by which Q increases will be c^{t−s} where c is the factor by which Q increases in one unit of
time.
Exercise 2.14. What are the units of c in this case?
To get back to the question of what it means for logs to relate additive to multiplicative
growth, if ln x = a + bt (constant additive growth over time) then x = e^{a+bt} = e^a e^{bt} = AB^t
where A = e^a and B = e^b. This is constant multiplicative growth.
Constant multiplicative growth rates occur in a lot of applications. This is also called
exponential growth because the formula for a quantity growing multiplicatively is Ae^{bt} (also
e^{a+bt} or AB^t). When b < 0, it is called exponential decay or decrease.
Here are a few examples. Equilibrating: if an item is hotter or colder than its environment
then the temperature difference between the object and its environment, as a function of
time, decreases exponentially (here, in Ae^{bt}, the coefficient b is negative). Money accumulating
(fixed rate) interest grows exponentially. So, unfortunately, does debt (just put a
minus sign on the money). Population tends to grow this way (again unfortunately, in most
cases). Radioactive substances decay exponentially. So does the portion of DNA remaining
unmutated. Present value analyses, under a fixed discount rate, imply exponential decrease
of the present value for revenue at future times. Time series data for which the correlations
decay exponentially are common.
If we get to assume a nice clean exponential model, and can observe at more than one time
point, then exponential growth/decay models are nearly as easy to solve as linear growth
models (a highlight of eighth grade math). You should learn this both conceptually and as
a mindless skill.
Example 2.7. A viral infection is spreading exponentially through the community. On
the first day that the outbreak had a name, there were 25 infections. A week later there
were 40 infections. How many infections will there be in another two weeks? When will the
number of infections reach 200,000, which is the size of the entire local population?
Solution #1 (plug in logs): Let N(t) denote the number of infections after t weeks.
Our model is N(t) = Ae^{bt}. The given information is that plugging in t = 0 and t = 1 gives
N = 25 and N = 40 respectively. Because e^0 = 1, we have 25 = A, while 40 = Ae^b. This
gives e^b = 40/25 = 8/5, hence b = ln(8/5). In another two weeks we will have t = 3, so
N(3) = 25 e^{3 ln(8/5)}. The number of infections reaches 200,000 when 25 e^{bt} = 200,000, that is,
when t = ln 8000 / ln(8/5).
Solution #2 (growth factor): If we use the growth factor B in the equation AB^t instead
of the exponential constant b in Ae^{bt} we may get away without logs. In a week the increase
was from 25 to 40, a factor of 8/5, so clearly B = 8/5. Thus N = 25(8/5)^t. In three
weeks we have N(3) = 25(8/5)^3 = 512/5 = 102.4. Evidently the expression 25e^{3 ln(8/5)} can
be simplified! The time needed to get to 200,000, a growth factor of 8000, is t such that
(8/5)^t = 8000. This is, by definition, log_{8/5} 8000. You can see the previous answer produces
this, because log_{8/5} 8000 is equal to ln 8000 / ln(8/5). But in fact you should realize it is
equal to log_b 8000 / log_b(8/5) for any base b. This can be useful if you know the logs in some
other base, for example base ten. Using our knowledge of base-ten logs, we approximate
log_10 8000 / (log_10 8 − log_10 5) ≈ 3.9/(0.9 − 0.7) = 19.5. So it should take between nineteen
and twenty weeks to saturate the city.
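Both solutions are easy to check by machine. The sketch below follows the example's model, with variable names of our own choosing; it computes the three-week count both ways and the saturation time as ln 8000 / ln(8/5).

```python
import math

n0, n1 = 25, 40            # infections at t = 0 and t = 1 (weeks)
population = 200_000

growth_factor = n1 / n0    # B = 8/5
b = math.log(growth_factor)

def infections(t):
    """N(t) = 25 * (8/5)^t, with t in weeks."""
    return n0 * growth_factor ** t

weeks_to_saturate = math.log(population / n0) / b   # ln 8000 / ln(8/5)

print(infections(3), n0 * math.exp(3 * b))
print(weeks_to_saturate)
```

The two printed values on the first line agree up to rounding, which is the sense in which the exponential-with-logs expression simplifies to the growth-factor form.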
Exercise 2.15. Is exponential growth a more realistic model when a small portion of the
population is infected or when a large portion is infected?
3 Limits
You might not think limits would show up in a calculus course oriented toward application.
Wrong! There are a lot of reasons why you need to understand the basics of limits. You
should know these reasons, so here they are.
1. You have already seen they show up in the definition of powers and logarithms when the
exponent is not rational.
5. Limits are needed to understand improper integrals, such as the integrals of probability
densities.
Intuitive
Pictorial
Formal
Computational
Intuitive: The limit as x → a of f(x) is the numerical value (if any) that f(x) gets close
to when x gets close to (but does not equal) a. This is denoted lim_{x→a} f(x). If we only let
x approach a from one side, say from the right, we get the one-sided limit lim_{x→a+} f(x).
Please observe the syntax: If I tell you a function f and a value a then the expression
lim_{x→a} f(x) takes on a numerical value or “undefined”. The variable x is a bound variable;
it does not have a value in the expression and does not appear in the answer; it stands for
a continuum of possible values approaching a. The variable a is free and does show up in
the answer; for example lim_{x→a} x^2 is equal to a^2.
Figure 10: Left: f(x) = x + 2 except that f is undefined at x = 2; Center: a wiggly function
near zero; Right: zooming in on the wiggly function at zero
Pictorial: If the graph of f appears to zero in on a point (a, b) as the x-coordinate gets
closer to a, then b is the limit, even if the actual point (a, b) is not on the graph. For
example, suppose f(x) = (x^2 − 4)/(x − 2). Canceling the factor of x − 2 from top and bottom,
you can see this is equal to x + 2, except when x = 2 because then you get zero divided
by zero. Functions like this are not just made up for this problem. They occur naturally
when solving simple differential equations, where indeed something different might happen
if x = 2. The graph of f has a hole in it, which we usually depict as an open circle, as in the
left side of Figure 10. The value of lim_{x→2} f(x) is 4, even though f is undefined precisely
at 2.
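A table of values is a simple way to see a limit numerically. The sketch below (offsets chosen arbitrarily by us) evaluates f(x) = (x^2 − 4)/(x − 2) at points closing in on 2 from both sides, and shows that evaluating at exactly 2 fails.

```python
def f(x):
    """(x^2 - 4)/(x - 2); undefined at x = 2."""
    return (x * x - 4.0) / (x - 2.0)

offsets = [0.1, 0.01, 0.001, 0.0001]
from_right = [f(2.0 + h) for h in offsets]
from_left = [f(2.0 - h) for h in offsets]

for h, r, l in zip(offsets, from_right, from_left):
    print(f"h = {h}: f(2 + h) = {r:.6f}, f(2 - h) = {l:.6f}")

# Exactly at x = 2 the formula is zero divided by zero, which Python refuses to compute.
try:
    f(2.0)
except ZeroDivisionError:
    print("f(2) is undefined")
```

A table like this suggests the value of the limit but does not prove it; the canceling argument in the text is what justifies the answer.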
In this example the function f behaved very nicely everywhere except 2, growing steadily
at a linear rate. The center figure shows the somewhat less well behaved function g(x) :=
x sin(1/x). This function is undefined at zero. As x approaches zero, the function wiggles
back and forth an infinite number of times, but the wiggles are smaller and smaller.
Intuitively, the value of the function g seems to approach zero as x approaches zero. Pictorially
we see this too: zooming in on x = 0 in the right-hand figure corroborates that g(x)
approaches zero.
Exercise 3.1. Sketch the function f(x) := e^{−x} if x ≥ 0, and f(x) := 0 if x < 0; see
Exercise 3.2, upcoming. Does the limit lim_{x→0} f(x) exist?
Formal: The precise definition of a limit is a little unexpected if you’ve never seen it
before. We don’t define the value of lim_{x→a} f(x). Instead, we define when the statement
lim_{x→a} f(x) = L is true. It can be true for at most one value L. If there is such an L, we
call this the limit. If there is no L, we say the limit does not exist. When asked for the
value of lim_{x→a} f(x), you should answer with either a real number, or “DNE”, for “does
not exist”. We won’t have to spend a lot of time on the formal definition. You should see
and grasp it at least once. Use of the Greek letters ε and δ for the bound variables is a
strong tradition.
Definition 3.1. If f is a function whose domain includes an open interval containing the
real number a, except possibly a itself, we say that lim_{x→a} f(x) = L if and only if the
following statement is true.
For any positive real number ε (think of this as acceptable tolerance in the y
value) there is a corresponding positive real δ (think of this as guaranteed accu-
racy in the x-value) such that for any x other than a in the interval [a − δ, a + δ],
f (x) is guaranteed to be in the interval [L − ε, L + ε].
3
A horizontal line can be an asymptote for f even if f crosses back and forth over the line; we will see a
formal definition soon in Definition 3.6.
In symbols, the logical implication that must hold is:
0 < |x − a| ≤ δ ⟹ |f(x) − L| ≤ ε.
Remark. Loosely speaking, you can think of ε as an acceptable error tolerance and δ as
how tightly you control the input. The limit statement says, you can meet even the pickiest
error tolerance provided you can tune the input sufficiently well. Why is this a difficult
definition? Chiefly because of the quantifiers. The logical form of the condition that must
hold is: For all ε > 0 there exists δ > 0 such that for all x ∈ [a − δ, a + δ], · · · . This has
three alternating quantifiers (for all... there exists... such that for all...) as well as an if-then
statement after all this. Experience shows that most people can easily grasp one quantifier
“for all” or “there exists”, but that two is tricky: “for all ε there exists a δ . . .”. A
three-quantifier statement usually takes mathematical training to unravel.
Some people find it easier to conceive of the formal definition as a game. Alice is trying to
show it’s true. Bob is trying to show it’s false. Alice says to Bob, no matter what ε you give
me, I can find a δ to make the implication true. (The implication is that all x-values fitting
into Alice’s δ-interval will give values of f (x) inside Bob’s interval.) Now they play the
game: Bob tries to come up with a value of ε so small as to thwart Alice. Then Alice has
to say her δ. If she can always do so (assuming Bob has not made a blunder in overlooking
the right choice of ε) she wins and the limit is L. If not (unless Alice has overlooked a δ
that would have worked), Bob has won and the limit is not L.
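The game can be acted out in code for a concrete function. The sketch below plays it for f(x) = (x^2 − 4)/(x − 2) at a = 2 with the claimed limit L = 4. Alice's strategy δ = ε and the sampling check are our own illustrative choices; testing finitely many sample points is evidence, not a proof, since the definition quantifies over all x in the interval.

```python
def f(x):
    return (x * x - 4.0) / (x - 2.0)

a, L = 2.0, 4.0

def alice_delta(eps):
    """Alice's reply to Bob's eps. Here delta = eps works because f(x) = x + 2 away from 2."""
    return eps

def bob_checks(eps, samples=1000):
    """Sample x in [a - delta, a + delta], x != a, and test |f(x) - L| <= eps.

    A tiny relative slack absorbs floating-point rounding at the interval's endpoints.
    """
    delta = alice_delta(eps)
    for k in range(1, samples + 1):
        offset = delta * k / samples
        for x in (a - offset, a + offset):
            if abs(f(x) - L) > eps * (1 + 1e-6):
                return False
    return True

results = {eps: bob_checks(eps) for eps in (1.0, 0.1, 0.001)}
print(results)
```

Changing L to some other number and rerunning shows Bob winning: for small enough ε no sampled x stays within tolerance.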
3.2 Variations
Before introducing computational apparatus for limits, we need to finish the definitions by
defining some variations: one-sided limits, limits at infinity and “limits of infinity” (which
are in quotes because technically they are not limits at all).
One-sided limits
Change the definition so that f (x) is only required to approach L when x → a if x is greater
than a. We say x “approaches a from the right,” thinking of a number line. If the value of
f (x) approaches L when x approaches a from the right, we say that the limit from the right
of f (x) at x = a is L, and denote this limx→a+ f (x) = L. If we require f (x) to approach L
when x approaches a but only for those x that are less than a, this is called having a limit
from the left and is denoted limx→a− f (x) = L.
Remark. Just like wind directions (North wind, South wind, etc.), one-sided limits are named for
the direction they come from, not the direction x is moving. Thus, $\lim_{x \to 0^+}$ is evaluated by
letting x approach zero from the positive direction, as shown to the right.
Exercise 3.2. The lifetime of a light bulb is often modeled as a random variable⁴ with
density $f(x) = c e^{-cx}$ when $x \ge 0$ and $f(x) = 0$ when $x < 0$ (light bulbs cannot have negative
lifetimes). Here c is some positive constant. What are $\lim_{x \to 0^+} f(x)$ and $\lim_{x \to 0^-} f(x)$?
Both kinds of one-sided limits require something less stringent, so the statement limx→a f (x) =
L automatically implies both limx→a+ f (x) = L and limx→a− f (x) = L. Likewise, if f (x)
is forced to approach L when x approaches a from the right, but also when x approaches
a from the left, then this covers all x, and the (unrestricted) limit will be L. If you want,
you can summarize this as a theorem – wait, no it’s too puny, let’s make it a proposition.
We won’t be referring to this too often, but here it is.
In words, a limiting value for a function exists at a point if and only if the two one-sided
limits exist and are equal.
Exercise 3.3. Suppose f is a function satisfying limx→4− f (x) = 2 and limx→4+ f (x) = 1.
Example 3.3 (one-sided limits). Let $f(x) = \lfloor x \rfloor$, the greatest integer function. Let’s
evaluate the one-sided limits and two-sided limit at a couple of values. First, take a = π, you
4
You haven’t studied probability densities yet, but all that matters here is the function f .
know, the irrational number beginning 3.14 . . .. If we just look near this value, say between
3.1 and 3.2, it is completely flat: a constant function, taking the value 3 everywhere. So
of course the limit at x = π will also be 3. This is the same by words or pictures; see
Figure 12. By the formal definition, no matter what ε is chosen, you can take δ = 0.1, say,
and f (x) will be within ε of 3 because it will be exactly 3. So the limit is 3, hence so are
both one-sided limits as in the picture just above.
Now take a to be an integer, say a = 5. The limit from the right looks like it did before,
with f (x) taking the value 5 for every sufficiently close x (here sufficient means within 1)
greater than 5. On the other hand, when x is close to 5 but less than 5, we will have
f (x) = 4, as in the picture below. Thus,
$$\begin{aligned}
\lim_{x \to 5^+} f(x) &= 5 \\
\lim_{x \to 5^-} f(x) &= 4 \\
\lim_{x \to 5} f(x) &= \text{DNE}.
\end{aligned}$$
The two-sided limit does not exist because the two one-sided limits are unequal; see
Figure 13.
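The one-sided behavior of the greatest integer function can also be seen numerically. The following is only an illustrative sketch: the helper name `one_sided_values` and the choice of sample points a ± 10^(-k) are assumptions made here, and sampling finitely many points suggests a limit but does not prove one.

```python
import math

def one_sided_values(f, a, side, steps=6):
    """Evaluate f at a + side * 10**(-k) for k = 1..steps.

    side = +1 approaches a from the right, side = -1 from the left.
    """
    return [f(a + side * 10.0 ** (-k)) for k in range(1, steps + 1)]

# Approaching the integer 5 from each side.
right = one_sided_values(math.floor, 5, +1)   # x = 5.1, 5.01, ...
left = one_sided_values(math.floor, 5, -1)    # x = 4.9, 4.99, ...

# Approaching pi from both sides; floor is constant near pi.
near_pi = (one_sided_values(math.floor, math.pi, +1)
           + one_sided_values(math.floor, math.pi, -1))

print(right)    # every entry is 5
print(left)     # every entry is 4
print(near_pi)  # every entry is 3
```

The two lists for a = 5 disagree, matching the conclusion that the two-sided limit at 5 does not exist, while all sampled values near π agree.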
Exercise 3.4. Let f (x) = sgn(x), the sign function. Use the verbal, pictorial or formal
definition, as you please, to give values of these limits.
• limx→0+ f (x)
• limx→0− f (x)
• limx→0 f (x)
How about if we take the absolute value: is limx→0 |sgn(x)| any different?
Limits at infinity
You have already seen the pictorial and verbal version of a limit at infinity. Here is the
formal definition. It repeats a lot of the definition of a limit at x = a. The only difference is
that instead of having to come up with an interval [a − δ, a + δ] guaranteeing f (x) is within
ε of the limit, you have to come up with an “interval near infinity”. This turns out to mean
an interval [M, ∞). In other words, there must be a real number M guaranteeing f (x) is
within ε of L when x > M .
Remark. Informally, “close to infinity” turns into “sufficiently large”. In the tolerance/accuracy
analogy, getting f (x) to be close to L to within the acceptable tolerance will result from
guaranteed largeness of the input rather than guaranteed closeness to a.
Definition 3.4. We say that limx→∞ f (x) = L if and only if L is a real number and:
For any positive real number ε (think of this as acceptable tolerance in the y
value) there is a corresponding real M (think of this as guaranteed minimum
value for x) such that for any x greater than M , f (x) is guaranteed to be in the
interval [L − ε, L + ε].
Exercise 3.5. True or false?
$$\lim_{x \to \infty} x + \frac{1}{x} = x$$
Limits at −∞ are defined exactly the same except for a single inequality that is reversed.
Now the implication that must hold is that for some (possibly very negative) M , every x
less than M has f (x) in the interval [L − ε, L + ε].
When this holds, we write limx→−∞ f (x) = L. When no such L exists, we write
limx→−∞ f (x) = DNE or just limx→−∞ f (x) DNE.
Example 3.5. Let $f(x) := \frac{x}{\sqrt{1+x^2}}$. Because $\sqrt{1+x^2}$ is a little bigger than |x| but almost
the same when x or −x is large, this function satisfies
$$\lim_{x \to \infty} f(x) = 1, \qquad \lim_{x \to -\infty} f(x) = -1.$$
Figure 14: graph of $x/\sqrt{1+x^2}$
The graph of this function is shown in Figure 14. It has horizontal asymptotes at 1 and −1.
This suggests how to define a horizontal asymptote.
Definition 3.6. A function f or its graph is said to have a horizontal asymptote at
height b if limx→∞ f (x) = b or limx→−∞ f (x) = b.
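As a rough numerical illustration of horizontal asymptotes, the sketch below evaluates the function x/√(1 + x²) at a few large positive and negative inputs. The sample inputs are arbitrary choices made here; the table only suggests the limits and is not a substitute for the formal definition.

```python
import math

def f(x):
    """The function x / sqrt(1 + x^2)."""
    return x / math.sqrt(1 + x * x)

# Larger |x| pushes f(x) toward 1 on the right and toward -1 on the left.
for x in (10.0, 1e3, 1e6):
    print(f"x = {x:>9}: f(x) = {f(x):.12f}, f(-x) = {f(-x):.12f}")
```

In the language of Definition 3.4, each row shows the output landing inside a narrower interval around 1 (or −1) once the input is large enough in absolute value.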
Exercise 3.6.
(i) Sketch a graph of a function f for which limx→−∞ f (x) exists but limx→+∞ f (x) does
not.
(ii) Give a formula defining a function g(x) := · · · such that limx→−∞ g(x) exists but
limx→+∞ g(x) does not.
(iii) Which of these two things was easier to do?
“Limits” of infinity
Consider the function $f(x) = 1/x^2$, defined for all real numbers except zero. What happens
to f (x) as x → 0? By our definitions, $\lim_{x \to 0} 1/x^2$ DNE. But we can see that f (x) “goes to
infinity”. Because infinity is not a number, the limit technically does not exist. However, it
is useful to classify DNE limits as ones where the function approaches ∞ (or −∞) versus
ones where there is no consistent behavior.
Remark. This time, instead of staying within a tolerance of ε in the output, we make the
output sufficiently large (greater than any given N ) or small. We do this by guaranteeing
δ accuracy in the input (for limits as x → a) or by making the input sufficiently large or
small (limits as x → ±∞).
Definition 3.7. If f is a function and a is a real number, we say that limx→a f (x) = +∞
if for every real N there is a δ > 0 such that 0 < |x − a| < δ implies f (x) > N .
Again, if we reverse the last inequality to require that f (x) < N (and N can be a very
negative number) we get the definition for a limit of negative infinity. Please remember
these are all subcases of limits that don’t exist! If you show that a limit is infinity, you have
shown that the limit does not exist (and you have specified a particular reason it doesn’t
exist).
Example 3.8. Let’s check that $\lim_{x \to 0} 1/x^2 = +\infty$. Given a positive real number N , how
can we ensure f (x) > N ? Answer: for positive numbers, f is decreasing and f (x) = N
precisely when $x = 1/\sqrt{N}$. Therefore, if we keep x positive but less than $1/\sqrt{N}$ then f (x)
will be greater than N . We have just shown that $\lim_{x \to 0^+} 1/x^2 = +\infty$. Similarly, when x is
negative, if we keep x in the interval $(-1/\sqrt{N}, 0)$ we ensure $1/x^2 > N$. So $\lim_{x \to 0^-}$ is also
+∞. Both one-sided limits are +∞, therefore
$$\lim_{x \to 0} \frac{1}{x^2} = +\infty.$$
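The choice δ = 1/√N in this example can be spot-checked numerically. The sketch below is an assumption-laden illustration: the names `delta_for` and `check` and the evenly spaced sample points are invented here, and checking samples is evidence, not a proof.

```python
import math

def delta_for(N):
    """Delta from Example 3.8: 0 < |x| < 1/sqrt(N) should force 1/x^2 > N (N > 0)."""
    return 1 / math.sqrt(N)

def check(N, samples=1000):
    """Test 1/x^2 > N at sample points strictly inside (0, delta) and (-delta, 0)."""
    d = delta_for(N)
    xs = [d * k / (samples + 1) for k in range(1, samples + 1)]
    return all(1 / x**2 > N and 1 / (-x) ** 2 > N for x in xs)

for N in (10, 1e6, 1e12):
    print(N, delta_for(N), check(N))
```

However large Bob makes N, the same recipe hands Alice a δ that works, which is what Definition 3.7 demands.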
For one-sided limits and limits at infinity, the DNE case also includes a case where the limit
would be said to be infinity. Stating all these would be repetitive. Try one, to make sure
you agree it’s straightforward.
Exercise 3.7. Write a formal definition for the statement limx→a+ f (x) = −∞.
Exercise 3.8. Consider the function 1/x. Write one-sided infinite limit statements for
limx→0+ 1/x and limx→0− 1/x.
Limit of a sequence
A special case of limits at infinity is when the domain of f is the natural numbers. When
f is only defined at the arguments 1, 2, 3, . . ., it is more usual to think of it as a sequence
b1 , b2 , b3 , . . ., where bk := f (k). The definition of a limit at infinity can be applied directly,
resulting in the definition of the limit of a sequence.
Definition 3.9 (limit of a sequence). Given a sequence {bn } and a real number L we say
limn→∞ bn = L if and only if for all ε > 0 there is an M such that |bn − L| < ε for every
n > M.
(iii) limn→∞ 2n
Pictorially, if a sequence has a limit L, then for every pair of parallel horizontal lines,
however narrow, enclosing the height L, the sequence must eventually stay between them.
This is shown in Figure 15.
Figure 15: For these two parallel lines, once k > 9, the height $b_k$ is between the lines
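To make the definition of the limit of a sequence concrete, here is a small sketch using the hypothetical sequence b_n = 1/n with limit L = 0 (this sequence is chosen here for illustration and is not taken from the text). For each ε it produces an M and checks that the next several thousand terms stay within ε of L.

```python
import math

def b(n):
    """Hypothetical example sequence b_n = 1/n, whose limit is 0."""
    return 1 / n

def M_for(eps):
    """One valid choice of M for b_n = 1/n: any n > 1/eps gives 1/n < eps."""
    return math.ceil(1 / eps)

def stays_within(eps, how_many=10_000):
    """Check |b_n - 0| < eps for the first `how_many` indices n > M."""
    M = M_for(eps)
    return all(abs(b(n) - 0) < eps for n in range(M + 1, M + 1 + how_many))

for eps in (0.1, 0.01, 1e-4):
    print(eps, M_for(eps), stays_within(eps))
```

The pair of parallel lines in the picture corresponds to the heights L − ε and L + ε, and M is the index past which the sequence stays between them.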
As you will see, Propositions 3.12 and 3.15 give ways to determine limits of more complicated
functions once you understand limits of some basic functions. Here is another piece of logic
that can help do the same thing. You will prove it in your homework.
Theorem 3.10 (sandwiching). Let a be a real number or ±∞ and let f, g and h be functions
satisfying f (x) ≤ g(x) ≤ h(x) for every x. If $\lim_{x \to a} f(x) = L$ and $\lim_{x \to a} h(x) = L$ then
also $\lim_{x \to a} g(x) = L$. If we know only that $\lim_{x \to a^+} f(x) = \lim_{x \to a^+} h(x) = L$ then we can
conclude $\lim_{x \to a^+} g(x) = L$, and the same for limits from the left.
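A quick numerical sketch of sandwiching, using an example chosen here rather than taken from the theorem: g(x) = x sin(1/x) is trapped between −|x| and |x| for x ≠ 0, and both bounds go to 0 as x → 0. The function names and sample points below are assumptions for illustration.

```python
import math

def lower(x):
    return -abs(x)

def upper(x):
    return abs(x)

def g(x):
    """x * sin(1/x), defined for x != 0."""
    return x * math.sin(1 / x)

# Sample points approaching 0 from both sides.
xs = [s * 10.0 ** (-k) for k in range(1, 8) for s in (+1, -1)]

# The sandwich inequality holds at every sample point.
squeezed = all(lower(x) <= g(x) <= upper(x) for x in xs)
print(squeezed)

for x in xs[::4]:
    print(x, lower(x), g(x), upper(x))
```

Since the outer two columns shrink to 0, the middle column has no room to do anything else, even though sin(1/x) itself oscillates wildly.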
The same fact is true of sequences: if an ≤ bn ≤ cn for these three sequences and the first
and last sequence converge to the same limit L, then so does the middle one. We will not
do anything with this now, but will get back to this fact in a week or two. The next exercise
brushes up on the logical syntax of limits.
Exercise 3.10. Evaluate limx→4 cx. Evaluate limt→a bt. In each of these two cases, say
which variables (any letter appearing in the expression other than letters spelling “lim”) are
free and which are bound. Did your answers involve only the free variables?
3.3 Continuity
Exercise 3.11. Is limx→a f (x) − f (a) = 0 the same as f being continuous at a? Explain
why or why not.
Continuity on regions
Before going on to use the notion of continuity to help us compute limits, we will state one
famous result which will seem either stupid and obvious or deep and tricky.
This says, basically, a continuous function can’t get from one value to another without
hitting everything in between. The theorem is most often used when there is a number we
can only define this way. For example, let $f(x) := e^x/x$, which is an increasing function on
the half-line [1, ∞). We want to say “let c be the value for which f (c) = 3.” How do we
know there is one? Well, f (1) = e, which is less than 3, and f (3) ≈ 6.695 which is greater
than 3. So there must be an argument between 1 and 3 where f takes value 3. There can
be only one because f is strictly increasing (you can prove this after another two sections).
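The argument above guarantees that c exists without saying what it is. One standard way to approximate such a value is bisection, sketched below; bisection is a technique brought in here for illustration, not something the text prescribes, and the function name `bisect` and the tolerance are assumptions.

```python
import math

def f(x):
    """The function e^x / x from the example."""
    return math.exp(x) / x

def bisect(func, target, lo, hi, tol=1e-12):
    """Approximate the point where func equals target.

    Assumes func is continuous and increasing on [lo, hi]
    with func(lo) < target < func(hi).
    """
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if func(mid) < target:
            lo = mid      # the crossing is to the right of mid
        else:
            hi = mid      # the crossing is at or to the left of mid
    return (lo + hi) / 2

c = bisect(f, 3, 1, 3)
print(c, f(c))   # c is roughly 1.512, and f(c) is very close to 3
```

Each step keeps an interval whose endpoints have values on opposite sides of 3, which is exactly the situation in which the theorem promises a crossing.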
Computing a limit by verifying the formal definition is a real pain. There is computational
apparatus that allows us to compute limits of many functions once we know limits of a few
simple ones. One approach we have seen in textbooks is to give a list of rules that work. It
looks something like this.
$$\lim_{x \to a} c f(x) = cL$$
$$\lim_{x \to a} f(x)^c = L^c \quad \text{provided } L > 0$$
Proposition 3.15. If f and g are functions and a, K, L are real numbers with limx→a f (x) =
K and limx→a g(x) = L, then
Exercise 3.14. Use Proposition 3.15 to evaluate two of these three limits. For the third,
can you find a way to evaluate it?
√
(a) limx→1 ln x − x
So that Propositions 3.12 and 3.15 don’t look like arbitrary rules from out of nowhere, you
should realize they can be proved, and in fact follow from one basic theorem.
Theorem 3.16 (composition with a continuous function). If the function f has a limit L
at x = a and the function H is continuous at L then H ◦ f will have the limit H(L) at
x = a. Formally,
$$\lim_{x \to a} H(f(x)) = H\left(\lim_{x \to a} f(x)\right) = H(L).$$
Why do the two propositions 3.12 and 3.15 follow from this principle? Let H(x) be the
continuous function cx. Then H ◦ f is cf (x) and we recover the first conclusion of
Proposition 3.12. Setting $H(x) := x^c$ recovers the second conclusion.
Exercise 3.15. A related fact about limits is computation by change of variables. Suppose
g is a function such that limx→0 g(x) = 3. What is limx→0 g(2x)? This question will be
discussed further. For now, give a short answer and try to explain in words.
Some more techniques and tricks
This course is more about using limits than it is about computational technique, but you
should at least see some of the standard techniques for cases that go beyond what’s in
Propositions 3.12 and 3.15.
Suppose you need to evaluate limx→a f (x)/g(x). If both f and g have nonzero limits at
a, say L and M , then Proposition 3.15 tells you limx→a f (x)/g(x) = L/M . In fact if
L = 0 but M ≠ 0, this still works. If M = 0 but L ≠ 0, then the question of evaluating
limx→a f (x)/g(x) also has an easy answer.
Exercise 3.16. What is the easy answer?
The remaining case, when L = M = 0, can be enigmatic. Calculus provides one solution
you will see in a few weeks (L’Hôpital’s rule), but you can often solve this with algebra. If
you can factor out (x − a) from both f and g, you may get a simpler expression for which
at least one of the functions has a nonzero limit.
Example 3.17. What is $\lim_{x \to 5} \frac{x^2 - 25}{x^2 - 5x}$?
Both numerator and denominator are continuous functions with values of zero (hence limits
of zero) at 5. That suggests dividing top and bottom by x − 5, resulting in $\lim_{x \to 5} \frac{x+5}{x}$. Both
numerator and denominator are continuous functions so we can just evaluate and get 10/5
so the answer is 2.
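A numerical sanity check of this example is sketched below. The function names `q` and `simplified` are labels invented here; the original quotient cannot be evaluated at x = 5 itself, so it is sampled at nearby points on both sides.

```python
def q(x):
    """The original quotient (x^2 - 25) / (x^2 - 5x); undefined at x = 5."""
    return (x**2 - 25) / (x**2 - 5 * x)

def simplified(x):
    """The quotient after cancelling the common factor x - 5."""
    return (x + 5) / x

for h in (1e-1, 1e-3, 1e-5):
    print(h, q(5 + h), q(5 - h), simplified(5 + h))

print(simplified(5))   # 2.0
```

Away from x = 5 the two formulas agree up to rounding, and only the simplified one can be evaluated at 5, which is why the cancellation is useful.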
Sometimes you have to do a little algebra to simplify. Here’s an example of one of the most
common simplification tricks.
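Example 3.18. What is $\lim_{x \to 0} \frac{\sqrt{x+1} - 1}{x}$?
Multiplying and dividing by the so-called conjugate expression, where a sum is turned into
a difference or vice versa, gives
$$\begin{aligned}
\lim_{x \to 0} \frac{\sqrt{x+1} - 1}{x}
  &= \lim_{x \to 0} \frac{\sqrt{x+1} - 1}{x} \cdot \frac{\sqrt{x+1} + 1}{\sqrt{x+1} + 1} \\
  &= \lim_{x \to 0} \frac{x}{x(\sqrt{x+1} + 1)} \\
  &= \lim_{x \to 0} \frac{1}{\sqrt{x+1} + 1}.
\end{aligned}$$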
The numerator and denominator are continuous at x = 0 with nonzero limits of 1 and 2
respectively, so the limit is equal to 1/2.
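The conjugate computation can be checked numerically with a short sketch. The names `original` and `conjugate_form` are labels chosen here; the first formula is undefined at x = 0, so it is sampled nearby.

```python
import math

def original(x):
    """(sqrt(x + 1) - 1) / x, undefined at x = 0."""
    return (math.sqrt(x + 1) - 1) / x

def conjugate_form(x):
    """1 / (sqrt(x + 1) + 1), obtained by multiplying by the conjugate."""
    return 1 / (math.sqrt(x + 1) + 1)

for x in (1e-1, 1e-3, 1e-6, -1e-6):
    print(x, original(x), conjugate_form(x))

print(conjugate_form(0))   # 0.5
```

Both columns approach 1/2 as x shrinks, and the conjugate form can simply be evaluated at 0.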
This algebra trick occurs so commonly throughout mathematics that you should always
think about conjugate radicals every time you see an expression with a square root added
to or subtracted from something!
Further tricks can wait until you’ve learned some more background. Although limits
are needed to define derivatives, you can then use derivatives to evaluate more limits
(L’Hôpital’s rule). Similarly, limits are used to define orders of growth, which can then
be used to evaluate more limits.
4 Derivatives
It’s easy to define your average speed for a trip: take the number of miles, divide by the
number of hours, and there’s your average speed in miles per hour. If you journey at
constant speed, then that’s also your speed at every moment of the trip. Most of us do not
travel at constant speed. What is your speed then? How do you define it? How do you
measure it? How do you compute it if you know some equation for your position at time t?
The concept of instantaneous speed is subtle. It is what spurred the invention of calculus
over a few decades near the year 1700. It is a very general notion. Average speed is distance
traveled per total time. Instantaneous speed is some instantaneous version.
Exercise 4.1.
If you replace “distance traveled” by “production price” and “time elapsed” by “units
produced” you get the notions of average production cost per unit; marginal cost per unit
is the instantaneous version. The list of applications is endless. Mathematically, they are
all the same: if f is a function and $x_0$ and $x_1$ are starting and ending arguments for f , then
the average rate of change of f over the interval is $(f(x_1) - f(x_0))/(x_1 - x_0)$; the instantaneous
rate of change of f with respect to x is called the derivative of f with respect to x and
denoted $f'(x)$.
Exercise 4.2. Suppose f (x) = mx + b. What is $f'(x)$?
In this section we will see how to understand $f'$ both physically and mathematically. We
will continue to use instantaneous speed as a running example of the physical concept, and
instantaneous rate of change of f (x) with respect to x as the corresponding mathematical
concept.
Important remark: we can take the slope of the function f at any point. Taking it at x
gives a value we call $f'(x)$. That means that $f'$ is a function: give it an argument x and
it will produce the slope of f at that point. It will be helpful to keep in mind that the
derivative operator takes as input functions f and produces as output their derivatives $f'$.
Operator is a fancy word for a function whose input and output are functions rather than
numbers. Taking derivatives is a linear operator. This is captured in Propositions 5.1 - 5.3
below.
Exercise 4.3. Suppose you replace “distance traveled” by “elevation of trail” and “time
elapsed” by “distance hiked”. What would be the physical interpretation of the instantaneous
rate?
4.2 Definitions
Most functions we use in mathematical modeling have unique tangent lines at most points.
The slope of the tangent line to the graph of f at the point (x, f (x)) seems like one reasonable
definition of $f'(x)$. In rare cases, such as you have already seen, we can use geometry to
prove there is exactly one line tangent to the graph of f at a point and compute the slope.
Figure 16: graphs of |x|, x sin(1/x) and $\sqrt[3]{x}$
Unfortunately, there are not many functions for which the graph is a well known geometric
object. In most cases we can’t use geometry to conclude that there is a tangent line, that
there is only one tangent line, or what the slope of this line is, if indeed there is exactly one.
Keeping this in mind, we will use limits to come up with a definition that works for most
functions and, when it does not work, as in the examples in Figure 16, gives an indication
of why. In cases when it does not work, in fact we would probably agree that there is no
good way to make sense of the instantaneous slope.
Exercise 4.4. The graphs of |x|, x sin(1/x) and $\sqrt[3]{x}$ are shown in Figure 16. All contain
the point (0, 0) provided we add zero to the domain of the second function and define the
function to be zero there. In each case, say whether there is one, none, or more than one
tangent line to the graph at (0, 0). In which of these cases do you think there is a well
defined slope of the tangent at (0, 0)?
We can take average slopes over any interval we want. The slope over the interval [a, b] is
the slope of the secant line passing through (a, f (a)) and (b, f (b)). This is also called the
difference quotient of f at the arguments a and b. What happens when one endpoint
of the interval is x and the other is very close to x? Pictorially, it looks like the slope gets
very close to the slope of the tangent line at (x, f (x)). Figure 17 shows an example where
a = 1/2 and secant lines (blue) are drawn through various values of b. These appear to
converge to the tangent line at (1/2, f (1/2)) which is black and dashed.
The derivative is a mathematical definition meant to compute the slope of the tangent line
at a. Definition 4.1, however, only talks about limits of slopes of secants, not of tangents.
Do you think these two notions will always coincide? There isn’t a right answer to this.
Definition 4.1. Let f be a function whose domain contains an interval around the point
a. Define
$$f'(a) := \lim_{b \to a} \frac{f(b) - f(a)}{b - a} \tag{4.1}$$
if the limit exists, and say that $f'(a)$ is undefined if the limit does not exist. Because we
want to emphasize that b − a is going to zero, we often define h := b − a and rewrite the
definition as
$$f'(a) := \lim_{h \to 0} \frac{f(a + h) - f(a)}{h}. \tag{4.2}$$
The two definitions (4.1) and (4.2) are algebraically equivalent.
Exercise 4.5.
(i) In (4.1), which variables are free and which are bound?
(ii) In Figure 17, what values of a and b are being illustrated?
(iii) Suppose a student complains that Figure 17 illustrates a limit of the form limb→a+ ,
not limb→a . What could you add to the picture to address her concerns?
Example 4.2. Let $f(x) = x^2$. Let’s use the definition to try to compute $f'(1)$. By definition,
this is
$$\lim_{b \to 1} \frac{f(b) - f(1)}{b - 1}.$$
Evaluating the numerator gives
$$\lim_{b \to 1} \frac{b^2 - 1}{b - 1} = \lim_{b \to 1} (b + 1) = 2.$$
The first equality is true because we can cancel the factors of b − 1 (remember, the limit
looks at values of b near 1 but not equal to 1). The second equality is true because we can
evaluate the limit of the polynomial b + 1 at a = 1 by plugging in 1 for b (Proposition 3.14).
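The secant slopes in this example can be tabulated directly. The sketch below is illustrative only: the helper names are chosen here, and a table of difference quotients suggests the value of the limit without proving it.

```python
def difference_quotient(f, a, b):
    """Slope of the secant line through (a, f(a)) and (b, f(b)); requires b != a."""
    return (f(b) - f(a)) / (b - a)

def square(x):
    return x * x

# Let b approach a = 1 from the right and from the left.
for h in (0.1, 0.01, 0.001, -0.001, -0.01, -0.1):
    b = 1 + h
    print(f"b = {b:<6} slope = {difference_quotient(square, 1, b):.6f}")
```

Each printed slope is b + 1 up to rounding, as the cancellation in the example predicts, so the slopes close in on 2 from both sides.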
Exercise 4.6. Let $f(x) = x^2 + 5$. Compute $f'(3)$ directly from the definition, as we did in
the previous example (show your work: you can upload a pdf, write it in text using a lot of
parentheses, or use the Canvas equation editor).
Definition 4.3 (one-sided derivative). Sometimes only a one-sided limit exists in
equation (4.1). We call this a one-sided derivative and denote it by $f'(a^+)$ or $f'(a^-)$.
Notation
We already agreed to use a prime after the function name as one way to denote a derivative.
Thus the derivative of f is $f'$, the derivative of g is $g'$, the derivative of Γ is $\Gamma'$, etc. We
may need to refer to the derivative of a function when it has not been given a name. One
could imagine something like the notation $(cx)'$ for the derivative of the function “multiply
by c”, or perhaps the more precise⁵ $(x \mapsto cx)'$.
To avoid ambiguity, we use the notation $\frac{df}{dx}$ for the derivative of f with respect to x. This
is better than $f'$ when there is more than one variable that could be differentiated. You can
also write this as $\frac{d}{dx} f$ when f is a big long cumbersome expression, for example,
$$\frac{d\left(\frac{e^{x^2-1}\sin x}{1+x}\right)}{dx} \quad \text{is the same as} \quad \frac{d}{dx}\left(\frac{e^{x^2-1}\sin x}{1+x}\right).$$
Then there is the question of how to write $f'(a)$, the value of the function $f'$ at argument
a, in this notation. Should we write $\frac{d f(a)}{dx}$ or $\frac{df}{dx}(a)$? The second is better, for example,
$\frac{d(x^3 - 3x + 1)}{dx}(a)$, because the first looks like you are differentiating a constant. Another
common way of writing this is
$$\left.\frac{d(x^3 - 3x + 1)}{dx}\right|_{x=a}.$$
Exercise 4.7. Suppose the number of feet an object has fallen after t seconds is given by
$16t^2 + ct$ where c is its initial downward velocity⁶. Write an expression for the downward
instantaneous speed of the object after s seconds. Please don’t compute any derivatives, just
write an expression in some notation involving a derivative.
You have seen examples in which derivatives represent speed. More generally, the derivative
of a function of time represents the rate of change of the quantity per time. Here are some
other things derivatives commonly represent.
Suppose you have a formula f (x) involving a quantity x that is measured, but with
measurement error. Then $f'(x)$ tells you how much error you get in f per amount of error in
measuring x.
5
More precise because it is distinguished from $(c \mapsto cx)'$, in which x is the free variable, c is the
bound variable, and the function is “multiply by x”.
6
This is in fact true when air resistance is ignored and the earth’s gravitational constant is approximated.
Example 4.4. A 4 × 8 foot board is cut parallel to the long side to obtain a 3 × 8 board.
The accuracy of the cut is 1/4 inch. What is the accuracy of the area, in square feet?
Writing $A = \ell \times w$ and differentiating gives $dA/dw = \ell = 8$ feet in our case. Therefore, the
error in area (in square feet) is 8 feet times the measurement error in the width (in linear
feet). Plugging in a measurement error of 1/4 inch, which equals 1/48 feet, we see the area
is accurate to within $8\ \text{ft} \times \frac{1}{48}\ \text{ft} = \frac{1}{6}\ \text{ft}^2$.
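The arithmetic in this example can be replayed in a few lines. The variable names below are chosen here for illustration; the numbers are the ones from the example.

```python
length_ft = 8.0                    # long side of the board, in feet
nominal_width_ft = 3.0             # intended width after the cut
width_error_ft = (1 / 4) / 12      # 1/4 inch converted to feet (1/48 ft)

# A = length * width, so dA/dw = length.
dA_dw = length_ft
area_error_sqft = dA_dw * width_error_ft
print(area_error_sqft)             # about 0.1667 square feet, i.e. 1/6

# Because A is linear in w, the derivative estimate matches the exact change.
exact_change = (length_ft * (nominal_width_ft + width_error_ft)
                - length_ft * nominal_width_ft)
print(exact_change)
```

For a function that is not linear in the measured quantity, the derivative estimate and the exact change would agree only approximately, with the agreement improving as the measurement error shrinks.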
The symbol ∆ is the upper case Greek letter Delta and is often used to denote change in a
quantity or error in a measurement.
Exercise 4.8. Let ∆x denote the possible error in x, and ∆f denote the possible resulting
error in f (x). Write a formula for these quantities in terms of the derivative of f .
Another interpretation is the marginal effect of the variable on the function. For example,
if f (x) represents the cost of producing x barrels of refined oil, then $f'(x)$ is the marginal
cost of production of more oil. Unless f is linear, this will depend on x. The marginal cost
of further production usually depends on the present level of production.
Knowledge of the derivative can help you sketch a function more accurately. The very first
practice problem asked you to incorporate slope information into a sketch. Sketching is
as much an art as a science, but there are methodical ways to use information about the
function and its derivatives.
To begin with, knowing where the derivative is positive and negative determines whether
the graph is sloping up or down as you move right. In other words, the sign of the derivative
indicates whether the function is increasing or decreasing. Where the sign of the derivative
changes from positive to negative as you move right, the function changes from increasing
to decreasing. That means someone hiking on the graph of the function from left to right
has been walking upwards and now begins to walk downward; see Figure 18.
Exercise 4.9. What does the hiker’s landscape look like if $f'$ is positive to the left of the
value x = a and negative to the right?
Because transitions in the sign of $f'$ correspond to hilltops and valley floors, finding values
of x that are maxima and minima for f (x) involves finding values of x for which $f'(x) = 0$.
Figure 18: The hilltop, where the function changes from increasing to decreasing, occurs
exactly where $f' = 0$ (with $f' > 0$ to its left and $f' < 0$ to its right).
We discuss this at greater length in Chapter 7. For the purposes of sketching, the moral
of the story is: know where $f'$ is positive and where it is negative, and use this to depict a
function that is increasing and decreasing in the right places.
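As a small illustration of reading increase and decrease off the sign of the derivative, the sketch below uses a hypothetical function f(x) = x³ − 3x, taking its derivative 3x² − 3 as given (rules for computing such derivatives appear in Chapter 5). The function and the labels are assumptions made here.

```python
def f_prime(x):
    """Derivative of the hypothetical f(x) = x**3 - 3*x, taken as given."""
    return 3 * x**2 - 3

def trend(x):
    """Classify the behavior of f at x from the sign of f'(x)."""
    d = f_prime(x)
    if d > 0:
        return "increasing"
    if d < 0:
        return "decreasing"
    return "flat"

for x in (-2, -1, 0, 1, 2):
    print(x, f_prime(x), trend(x))
```

The sign pattern positive, zero, negative, zero, positive says a sketch of this f should climb to a hilltop at x = −1, descend to a valley floor at x = 1, and climb again.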
Exercise 4.10. Sketch a function f such that $f'$ is positive when x < 1, dips to zero at
x = 1, is positive again until x = 3, is zero at x = 3 and is negative to the right of that.
See if you can also make f have a unique zero at x = −2.
A Japanese proverb says, “The other side also has another side.” The function f has a
derivative. This is also a function. Therefore, “The derivative also has a derivative.” Not
quite so poetic, but very useful for sketching functions. It is called the second derivative,
denoted $f''$ or $\frac{d^2 f}{dx^2}$. The sign of the derivative says where a function is increasing or
decreasing, therefore the sign of $f''$ indicates where the slope $f'$ is increasing or decreasing.
We use italics here as a visual reminder that there are a number of levels (original function,
first derivative, second derivative) and attributes (positive/negative, increasing/decreasing)
and it’s easy to get mixed up what corresponds to what.
Remark. The placement of the 2 in the numerator of $\frac{d^2 f}{dx^2}$ may seem strange, but it reflects
something important: $(d/dx)$ is a differential operator, and $(d^2/dx^2)$ is the result of applying
this operator twice. This becomes important in later courses such as Math 1410.
Of course, not every function is differentiable, and not every derivative is itself differentiable,
so $f''$ may not exist even if $f'$ exists.
We have talked informally about functions that are concave up or down. It is time to give
a definition. In fact we give two definitions, one algebraic and one pictorial. The pictorial
one is in fact more general because it works when $f'$ does not exist. When $f'$ exists on
(a, b), then the two definitions agree.
Definition 4.6 (concavity: pictorial definition). If (a, b) is an open interval in the domain
of f and if for every pair of numbers x, y ∈ (a, b) the graph of f between x and y lies below
the line segment connecting (x, f (x)) to (y, f (y)), we say that f is concave upward on (a, b).
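The pictorial definition can be tested at sample points. The sketch below is a rough check under assumptions made here: the helper `chord_above_graph`, the small tolerance, and the two test functions are illustrative choices, and passing at sampled points is only evidence of concavity, not a proof.

```python
def chord_above_graph(f, x, y, samples=100, tol=1e-12):
    """Return True if f(t) is at or below the chord from (x, f(x)) to (y, f(y))
    at evenly spaced sample points t strictly between x and y (requires x != y)."""
    for k in range(1, samples):
        t = x + (y - x) * k / samples
        chord = f(x) + (f(y) - f(x)) * (t - x) / (y - x)
        if f(t) > chord + tol:
            return False
    return True

def square(x):
    return x * x

def neg_square(x):
    return -x * x

print(chord_above_graph(square, -1, 2))       # True: the chord sits above x^2
print(chord_above_graph(neg_square, -1, 2))   # False: the chord sits below -x^2
```

To probe concavity on a whole interval (a, b), one would repeat this check over many pairs x, y drawn from the interval.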
Exercise 4.12. If $f''$ exists and is positive, can you conclude anything about concavity of
f ? How about if $f''$ exists and is negative?
Points of inflection
We never formally defined a tangent line. One definition would be “A line that touches a
graph of a function at precisely one point and stays on one side of the graph other than
this.” Here are four ways this definition may fail to capture what some people think a
tangent line should be. For each example, please say whether you think the given line
ought to count as a tangent line.
Exercise 4.13.
As you can see, the intuitive definition of a tangent line is subject to unanticipated judgment
calls. This motivates a more formal definition.
Exercise 4.14. Is the point (a, f (a)) always on this line? Explain why or why not.
One confusing case is when the second derivative is zero. What happens to the concavity
at such a point? Often it switches from up to down or vice versa. Wherever concavity
switches is called a point of inflection. The geometric concept of an inflection point does
not require calculus, though the notion seems not to have been discussed much before the
advent of calculus.
Exercise 4.15. Which of the figures in Exercise 4.13 shows a point of inflection?
Exercise 4.16.
(i) Sketch a graph of the sine function.
(ii) Mark the intervals where sine increases and those where it decreases.
(iii) On the same graph sketch the cosine function.
(iv) The derivative of sin is cos; what does this imply about the values of cosine on the
marked intervals?
(v) Where are the points of inflection for sine and what happens to the cosine at those
arguments?
5 Computing derivatives
There are a lot of rules for computing derivatives that are relatively easy to remember and
use. These rules are theorems – they can all be derived from the definition via limits and
some computation. You will get familiar enough with these rules that you will happily use
them without thinking. The structure of this chapter is backwards: we give you nearly
all the rules right away, then give arguments for some of them, postponing some of the
arguments until we have developed a few more tools. We do this because calculus is so
much more fun when you know enough to do a few computations!
The rules have two forms. Some just tell you the derivative of a particular function like
sin x or a class of functions like $b^x$. Others are rules for combining and transforming. They
tell you, if you know $f'$ and $g'$, what the derivatives are of f + g, f g, f ◦ g, and so forth.
Proposition 5.1 (sum rule). Let f and g be differentiable functions. Then $(f + g)' = f' + g'$.
Proposition 5.2 (difference rule). Let f and g be differentiable functions. Then
$(f - g)' = f' - g'$.
Proposition 5.3 (multiplication by a constant). Let f be a differentiable function and c
be a constant. Then $(cf)' = cf'$.
Exercise 5.1. Using the three propositions above, as well as examples you’ve worked out
earlier, compute the derivative of $x - 3\sqrt{x}$.
Proposition 5.4 (product rule). Let f and g be differentiable functions. Then
$(fg)' = f'g + g'f$.
Proposition 5.5 (quotient rule). Let f and g be differentiable functions. Then for any x
such that g(x) ≠ 0,
$$\frac{d}{dx}\,\frac{f(x)}{g(x)} = \frac{g f' - f g'}{g^2},$$
all functions on the right-hand side evaluated at x.
Proposition 5.6 (chain rule). Let f and g be differentiable functions. Let a be a real
number inside an open interval in the domain of g such that g(a) is inside an open interval
in the domain of f . Then
$$\left.\frac{d}{dx} f(g(x))\right|_{x=a} = \left(\left.\frac{df}{dx}\right|_{x=g(a)}\right)\left(\left.\frac{dg}{dx}\right|_{x=a}\right).$$
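We list a few that are either obvious from the definition or are ones you’ve worked out
already.
$$\begin{aligned}
\frac{d}{dx}\, c &= 0 \\
\frac{d}{dx}\, cx &= c \\
\frac{d}{dx}\, x^2 &= 2x \\
\frac{d}{dx}\, \sqrt{x} &= \frac{1}{2\sqrt{x}} \quad \text{for } x > 0.
\end{aligned}$$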
Exercise 5.2. Which functions f have the property that $f'$ is a constant function? Sketch
the graph of f in the case that $f'$ is the constant function 1/2.
Proposition 5.8 (powers and transcendental functions). In the following list, if no restrictions
are given on $x$, then the statement holds for all real $x$.
1. $\frac{d}{dx} x^n = n x^{n-1}$ when $n$ is a positive integer
2. $\frac{d}{dx} x^r = r x^{r-1}$ when $x \neq 0$ and $r$ is any nonzero real number
3. $\frac{d}{dx} e^x = e^x$
4. $\frac{d}{dx} a^x = a^x \cdot \ln a$ for $a > 0$ and all real $x$
5. $\frac{d}{dx} \ln x = \frac{1}{x}$ for $x > 0$
6. $\frac{d}{dx} \sin x = \cos x$
7. $\frac{d}{dx} \cos x = -\sin x$
8. $\frac{d}{dx} \tan x = \sec^2 x$ when this is finite
9. $\frac{d}{dx} \arcsin x = \frac{1}{\sqrt{1 - x^2}}$ for $-1 < x < 1$
10. $\frac{d}{dx} \arccos x = \frac{-1}{\sqrt{1 - x^2}}$ for $-1 < x < 1$
11. $\frac{d}{dx} \arctan x = \frac{1}{1 + x^2}$
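A quick way to build trust in a table like this is to compare each formula with a difference quotient at a sample point. The sketch below does that with Python's standard library only. The helper name `numerical_derivative`, the sample points, and the step size are choices made here for illustration; they are not part of the text, and a numerical match at one point is evidence, not a proof.

```python
import math

def numerical_derivative(f, x, h=1e-6):
    """Central difference quotient (f(x+h) - f(x-h)) / (2h); h is an assumed step size."""
    return (f(x + h) - f(x - h)) / (2 * h)

# Each entry: (label, function, formula claimed in Proposition 5.8, sample point in the domain).
RULES = [
    ("x^5",      lambda x: x**5,   lambda x: 5 * x**4,                 1.3),
    ("x^2.5",    lambda x: x**2.5, lambda x: 2.5 * x**1.5,             1.3),
    ("e^x",      math.exp,         math.exp,                           0.7),
    ("2^x",      lambda x: 2.0**x, lambda x: 2.0**x * math.log(2.0),   0.7),
    ("ln x",     math.log,         lambda x: 1 / x,                    1.3),
    ("sin x",    math.sin,         math.cos,                           0.7),
    ("cos x",    math.cos,         lambda x: -math.sin(x),             0.7),
    ("tan x",    math.tan,         lambda x: 1 / math.cos(x)**2,       0.7),
    ("arcsin x", math.asin,        lambda x: 1 / math.sqrt(1 - x**2),  0.3),
    ("arccos x", math.acos,        lambda x: -1 / math.sqrt(1 - x**2), 0.3),
    ("arctan x", math.atan,        lambda x: 1 / (1 + x**2),           0.3),
]

def max_rule_error():
    """Largest gap between the difference quotient and the claimed formula."""
    return max(abs(numerical_derivative(f, x) - df(x)) for _, f, df, x in RULES)

for name, f, df, x in RULES:
    print(f"{name:9s} numeric={numerical_derivative(f, x):.6f}  formula={df(x):.6f}")
```

The two columns should agree to several decimal places; a visible mismatch would usually mean a sign error or a wrong exponent in the formula being tested.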
Exercise 5.3. Use rule #4 to compute the slope of the function $f(x) := a^x$ at $x = 0$. For
which $a$ is this slope equal to 1? Is this consistent with Proposition 0.10?
Exercise 5.4. Let $f(x) := x^{-1}$ and $g(x) := x^3$. This exercise takes you step by step through
a test of the product rule.
(i) What is $f'$?
(ii) What is $g'$?
(iii) What is $(f')(g')$?
(iv) What does the product rule give you for $(fg)'$?
(v) What do you get for $(fg)'$ by first multiplying, then using rule #1 from Proposition 5.8
(the power rule)?
You are probably pretty experienced at taking apart algebraic expressions into sums and
differences of products and quotients of simpler expressions. Here are some more exercises
to check that you can do this and then apply the differentiation rules above.
Exercise 5.5. Use the sum, difference, product and quotient rules, along with derivatives
given in Proposition 5.8 to evaluate $f'(x)$ in each of these cases.
(i) $f(x) := x^3 e^x$
(ii) $f(x) := \frac{1}{x^{2.5}}$
(iii) $f(x) := x \ln x - x$
Taking apart algebraic expressions into compositions of functions, as is needed for the chain
rule, can be a little trickier.
Example 5.9. In order to differentiate $(1 + x^2)^{1/3}$ you need to recognize this as a composition
$f(g(x))$ with $f(x) = x^{1/3}$ and $g(x) = 1 + x^2$. The chain rule tells us that the derivative
of $(1 + x^2)^{1/3}$ at $x = a$ will be given by
\[ \left. \frac{d}{dx} x^{1/3} \right|_{x = 1 + a^2} \cdot \left. \frac{d}{dx} (1 + x^2) \right|_{x = a} . \tag{5.1} \]
The derivative of $x^{1/3}$ is $(1/3) x^{-2/3}$ by the power rule (the second identity in
Proposition 5.8); the derivative of $1 + x^2$ is $0 + 2x = 2x$ by the sum rule and the power rule. This
shows (5.1) to equal
\[ \left. \frac{1}{3} x^{-2/3} \right|_{x = 1 + a^2} \cdot \left( \left. 2x \right|_{x = a} \right) = \frac{1}{3} (1 + a^2)^{-2/3} (2a) . \]
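As a sanity check on Example 5.9, the chain-rule answer can be compared with a difference quotient of the original function. This is only a numerical sketch; the function names and the test point $a = 2$ are chosen here for illustration.

```python
def h(x):
    """The composite function (1 + x^2)^(1/3) from Example 5.9."""
    return (1 + x**2) ** (1 / 3)

def chain_rule_derivative(a):
    """f'(g(a)) * g'(a) with f(u) = u^(1/3) and g(x) = 1 + x^2."""
    return (1 / 3) * (1 + a**2) ** (-2 / 3) * (2 * a)

def difference_quotient(func, a, step=1e-6):
    """Central difference quotient of func at a; step is an assumed small increment."""
    return (func(a + step) - func(a - step)) / (2 * step)

a = 2.0
print("chain rule:         ", chain_rule_derivative(a))
print("difference quotient:", difference_quotient(h, a))
```

Note where each factor is evaluated: the outer derivative at $1 + a^2$, the inner derivative at $a$. Evaluating the outer derivative at $a$ instead is the most common chain-rule slip, and this comparison would expose it.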
The next few exercises check on your understanding of the chain rule. The first two tell you
how to choose f and g. The last two do not.
Exercise 5.6. Let $f(x) = e^x$ and $g(x) = -x$. Use the chain rule to evaluate the derivative
of $e^{-x}$.
Exercise 5.7. Let $f(x) = \sqrt{x}$ and $g(x) = 1 + x^2$. Use the chain rule to evaluate the
derivative of $\sqrt{1 + x^2}$.
Exercise 5.8. Evaluate $h'(x)$ where $h(x) := \ln(1 + x^2)$. To do so, first state a choice of
functions $f$ and $g$ such that $h(x) = f(g(x))$. Then use the chain rule.
Exercise 5.9. Evaluate $h'(x)$ where $h(x) := e^{-x^2/2}$. To do so, first state a choice of functions
$f$ and $g$ such that $h(x) = f(g(x))$. Then use the chain rule.
Proofs are for convincing others, as well as for deciding whether you know something for
sure, in all cases. The next two exercises ask for opinions on whether or not a proof is
needed. There’s no right answer, but we expect you to give a good sense of why or why
not.
Exercise 5.10 (sum rule - obvious or not?). The sum rule, in an applied setting, says
something like this. Suppose Dick’s net worth at time t, call it f (t), is increasing at a
certain rate, and Jane’s, call it g(t), is increasing at another rate. Then their joint fortune
(they are married) is increasing at a rate that is the sum of the two individual rates. Stated
in these terms, is the sum rule obvious or does it require proof ?
Exercise 5.11. In applied terms, suppose $f(t)$ is the length in meters of a turtle that is $t$
days old and $g(t) = 3.3 f(t)$ is the length in feet. Then $g'(t)$, the rate of increase of length
in feet per day, should be 3.3 times $f'(t)$, the rate of increase in meters per day. Obvious
or not?
In case some of you answered that it was not obvious, here is a mathematical proof. In
most of the upcoming proofs, we need to use the definition of the derivative as a limit of
difference quotients. We don’t need to use the ε-δ definition of limit, just known facts about
limits.
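The argument for the sum rule (Proposition 5.1) can be sketched along the following lines. This is the standard derivation, written out under the assumption that $f'(a)$ and $g'(a)$ both exist; it may differ in detail from the version presented in class.

```latex
\begin{align*}
(f+g)'(a)
  &= \lim_{h \to 0} \frac{(f+g)(a+h) - (f+g)(a)}{h}
     && \text{definition of the derivative} \\
  &= \lim_{h \to 0} \frac{f(a+h) + g(a+h) - f(a) - g(a)}{h}
     && \text{definition of } f+g \\
  &= \lim_{h \to 0} \left( \frac{f(a+h) - f(a)}{h} + \frac{g(a+h) - g(a)}{h} \right)
     && \text{algebra} \\
  &= \lim_{h \to 0} \frac{f(a+h) - f(a)}{h} + \lim_{h \to 0} \frac{g(a+h) - g(a)}{h}
     && \text{limit of a sum} \\
  &= f'(a) + g'(a)
     && \text{definition of the derivative.}
\end{align*}
```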
As you can see, the logic broke this down into small steps, justified by facts we have
accumulated. The proof didn’t add a whole lot to our understanding, although it does help
to nail down the fact that this holds whenever $f'(a)$ and $g'(a)$ exist, without exceptions for
when one of them is zero, or undefined for values other than $a$, or anything like that.
We’ll ask you to do one of these on your own, then not bother you with proofs of things
that are borderline obvious.
Exercise 5.12. Prove Proposition 5.3. It’s pretty similar to the proof for the sum rule but
a little easier.
We mentioned earlier what units a derivative has, but never discussed why. Now is a
good time. Taking the limit of an expression gives something with the same units. The
derivative is the limit of a difference quotient $(f(x + h) - f(x))/h$. The numerator is the
difference between two things with the same units, namely the units of the value of $f$. The
denominator has units of the argument of $f$. So the difference quotient has units of the
value of $f$ divided by the argument of $f$. For example, if $f(t)$ is distance traveled in the
time $t$, then $f'$ has units of distance per time (such as MPH).
Why is $(fg)'$ not equal to $f'g'$? There are many reasons, one of which is the units. In
an application, the values of $f$ and $g$ might have different units, but if both are being
differentiated with respect to $x$ then they must have the same input units. The units of
$(fg)'$ are, as we have just seen, units of $f$ times units of $g$ divided by units of $x$, the argument.
Unfortunately $f'g'$ has the units of $f/x$ times the units of $g/x$, so one too many units of $x$
in the denominator.
We now present three arguments for the product rule. When we’re done, we’ll take a poll
of which is most convincing.
Intuitive proof: If $f$ is a constant, so all the change in the product $fg$ comes from
changes in $g$, then we have seen $(fg)' = f \cdot g'$. If $g$ is a constant, then similarly, $(fg)' = g f'$.
In reality, both are changing, so the rate of change of the product is the sum of these two
individual rates.
Picture proof: Suppose f (t) is the length in meters of a growing rectangular blob at
time t seconds, and g(t) is its width. How fast is the area growing at time t?
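[Figure 19: a rectangle with sides $f$ and $g$ and area $fg$, enlarged by $\Delta f$ and $\Delta g$; the added regions have areas $f \cdot \Delta g$, $g \cdot \Delta f$ and $\Delta f \, \Delta g$.]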
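Figure 19 shows the classical pictorial argument. When time increases by a small quantity
$\Delta t$, both $f$ and $g$ increase by small quantities, which we respectively call $\Delta f$ and $\Delta g$, and
the area increases by $f \Delta g$ plus $g \Delta f$ plus $(\Delta f)(\Delta g)$. We know that $\Delta f$ is approximately
$f'(t) \Delta t$, because in the limit as $\Delta t \to 0$, the ratio $\Delta f / \Delta t$ converges to $f'(t)$. Similarly,
$\Delta g \approx g'(t) \Delta t$. From the picture, you can see that $\Delta(fg) = f \Delta g + g \Delta f + (\Delta g)(\Delta f)$. So
\[ \frac{\Delta (fg)}{\Delta t} = f \frac{\Delta g}{\Delta t} + g \frac{\Delta f}{\Delta t} + \frac{(\Delta f)(\Delta g)}{\Delta t} . \]
Taking limits on the right hand side as $\Delta t \to 0$ gives $f g' + g f' + \lim_{\Delta t \to 0} (\Delta f)(\Delta g)/\Delta t$.
This last limit should be zero. Why? Say $f'(t) = a$ and $g'(t) = b$. Then $\Delta f \approx a \Delta t$ and
$\Delta g \approx b \Delta t$, so
\[ \lim_{\Delta t \to 0} \frac{(\Delta f)(\Delta g)}{\Delta t} = \lim_{\Delta t \to 0} \frac{(a \Delta t)(b \Delta t)}{\Delta t} = \lim_{\Delta t \to 0} ab \, \Delta t , \]
which is zero.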
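To see the three pieces of the picture proof behave as claimed, one can tabulate them for shrinking $\Delta t$. The sketch below uses a hypothetical blob with length $f(t) = t^2$ and width $g(t) = \sin t$; these two functions and the time $t = 1$ are assumptions made only for illustration.

```python
import math

def f(t):
    return t**2          # hypothetical length of the blob

def g(t):
    return math.sin(t)   # hypothetical width of the blob

def area_change_terms(t, dt):
    """Return (f*dg/dt, g*df/dt, df*dg/dt): the three pieces of the change in area, per unit time."""
    df = f(t + dt) - f(t)
    dg = g(t + dt) - g(t)
    return f(t) * dg / dt, g(t) * df / dt, df * dg / dt

def product_rule_value(t):
    """f'(t) g(t) + g'(t) f(t), using f'(t) = 2t and g'(t) = cos t."""
    return 2 * t * math.sin(t) + math.cos(t) * t**2

t = 1.0
for dt in (1e-1, 1e-2, 1e-3, 1e-4):
    strip_g, strip_f, corner = area_change_terms(t, dt)
    print(f"dt={dt:g}  f*dg/dt={strip_g:.6f}  g*df/dt={strip_f:.6f}  corner={corner:.6f}")
print("product rule value:", product_rule_value(t))
```

The two strip terms settle down to $f g'$ and $g f'$, while the corner term shrinks in proportion to $\Delta t$, which is the numerical face of the limit computed above.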
Aside. We could have called $\Delta t$ something like $h$, in keeping with the notation in the definition
of derivative. We have purposely used different notation here to get you used to
seeing multiple different looks. All are common in textbooks. The different notations affect
your brain slightly differently. The ∆f and ∆t notation is designed to make you think of a
physical quantity changing as another physical quantity changes. The notation f (x + h) is
designed to make you think of a mathematical function with an argument x increased by a
small amount h. Both are important frames in which to think.
Formal proof: The simplest algebraic proof of the product rule is a bit more “out of the
blue” because it relies on this trick:
\[ f(x+h)g(x+h) - f(x)g(x) = f(x+h)g(x+h) - f(x+h)g(x) + f(x+h)g(x) - f(x)g(x) \]
and hence
\[ \frac{f(x+h)g(x+h) - f(x)g(x)}{h} = f(x+h) \, \frac{g(x+h) - g(x)}{h} + g(x) \, \frac{f(x+h) - f(x)}{h} . \]
The trick was, we added and subtracted $f(x + h)g(x)$ in order to be able to separate the
original difference quotient into two pieces, both of which look like a function times a simpler
difference quotient. Taking limits and using the fact that limits of sums are sums of limits,
and the same for products, gives
\begin{align*}
(fg)'(x) &= \lim_{h \to 0} \frac{f(x+h)g(x+h) - f(x)g(x)}{h} \\
&= \lim_{h \to 0} f(x+h) \, \frac{g(x+h) - g(x)}{h} + \lim_{h \to 0} g(x) \, \frac{f(x+h) - f(x)}{h} \\
&= \lim_{h \to 0} f(x+h) \cdot \lim_{h \to 0} \frac{g(x+h) - g(x)}{h} + \lim_{h \to 0} g(x) \cdot \lim_{h \to 0} \frac{f(x+h) - f(x)}{h} \\
&= f(x) g'(x) + g(x) f'(x) .
\end{align*}
Exercise 5.13. Because $f$ and $g$ are differentiable, they are continuous. The formal proof
above uses the fact that one of the two is continuous at $x$ but does not use continuity of
the other. Which continuity fact is needed and where is it used?
Suppose a toy car is moving around a circular track of radius one meter at a constant speed
of 1 meter per second, so that the coordinates of the car at time $t$ are $x = \cos t$, $y = \sin t$. By
definition of radian, its angle with respect to the horizontal increases at a rate of one radian
per second. The northward ($y$-direction) speed is the derivative of $\sin t$. Suppose at time $x$
a gate opens up and the car stops turning to stay on the track and coasts straight onward
at its present speed of 1. Its northward speed during the time interval $[x, x + 1]$ is the derivative
of the sine function at time $x$. To evaluate this, we just have to check how far northward
the car went from time $x$ to $x + 1$. This is just analytic geometry. The car goes one unit
tangent to the circle during this time interval, from the point $(\cos x, \sin x)$ (B in Figure 20)
to the point $(\cos x - \sin x, \sin x + \cos x)$ (C in the figure). Its $y$-coordinate increases by $\cos x$
in one unit of time. Therefore the derivative of $\sin$ is $\cos$. For free, we also get (by looking at
the $x$ coordinate, which changes by $-\sin x$) that the derivative of $\cos$ is $-\sin$.
Figure 20: the unit circle with labeled points A, B $(\cos(x), \sin(x))$, C $(\cos(x) - \sin(x), \sin(x) + \cos(x))$ and D $(-\sin(x), \cos(x))$, and the angle $x$.
ABCD is a square of side 1 tangent to the unit circle as shown.
At time x the car is at point B, making angle x with the x-axis.
From time x to time x + 1 the car travels in a straight line to C.
The easiest way to make sense of the chain rule is in terms of related rates. Think of x, u
and y as physical quantities related by rules. If you change x, it changes u. The specific
rule is u = g(x). If you change u it changes y. The specific rule is y = f (u).
\[ x \xrightarrow{\;g\;} u \xrightarrow{\;f\;} y \]
Aside. Suppose $x$ is time, $u$ is how many liters of air you breathed in that time and $y$ is
how much CO$_2$ you produced. If you breathe six liters per minute, and you produce 1/20
liter of CO$_2$ for every liter of air you breathe in, what is your rate of production of CO$_2$? This
simple word problem, which most of you would solve without much thought, turns into the
chain rule when your respiration rate and your rate of CO$_2$ production per liter breathed are
no longer constant, so that the rates involved are the present instantaneous rates.
What does this mean quantitatively? The rate of change of $u$ with respect to $x$ is $g'(x)$.
This is illustrated on the left side of Figure 21, where the infinitesimal changes $dx$ and
$du$ are depicted. The slope of the hypotenuse of the small triangle is $g'(x)$, where in the
diagram, the value of $x$ is roughly 1/2. On the right side of the figure, we see that this small
change in $u$ leads to a proportionate small change in $y$. The ratio $dy/du$ is equal to $f'(u)$.
One question remains: at what value of $u$ is this ratio evaluated? In the figure, it appears
$u \approx 1/8$. More precisely, if we originally took $x$ to be 1/2, the $u$ value will be $g(1/2)$. In
other words, the value from the $u$-axis (vertical in the first graph) is copied to the second
graph (where the $u$-axis is now the horizontal axis). That is, $f'$ is evaluated at $u$,
which is $g(x)$. Thus $\frac{dy}{dx} = \frac{du}{dx} \cdot \left. \frac{dy}{du} \right|_{u = g(x)}$.
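If $g(a + h)$ could be replaced by the tangent line approximation $g(a) + h g'(a)$ then the proof
would finish easily: letting $\varepsilon := h g'(a)$,
\[ \lim_{h \to 0} \frac{f(g(a) + h g'(a)) - f(g(a))}{h} = \lim_{\varepsilon \to 0} g'(a) \, \frac{f(g(a) + \varepsilon) - f(g(a))}{\varepsilon} = g'(a) \, f'(g(a)) . \]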
It is indeed true that the tangent line approximation is close enough to g itself to make this
work, but proving that takes a trickier argument than we want to go into here.
Aside. The first equality above used a change of variables between $\varepsilon$ and $h$ in the limit.
We hope you made a note of this following the class discussion of Exercise 3.15.
6 Asymptotic analysis and L’Hôpital’s rule
This is an optional (but fun) intro to “the infinity rules”. Recall from Chapter 3 that
limits (two-sided, one-sided, or a limit at $\pm\infty$) that evaluate to DNE may be broken into three
categories: limits of $+\infty$, limits of $-\infty$, or no limit at all, not even an infinite one, which we will
write as UND for “undefined”.
Jake is trying to evaluate $\lim_{x \to \infty} \frac{1}{x}$. He says that plugging in $\infty$ for $x$ you get $1/\infty$ which is 0.
Jen is trying to evaluate $\lim_{x \to \infty} \frac{\ln x}{\sqrt{x}}$. She says that plugging in $\infty$ for $x$ you get $\infty/\infty$ which
is 1. If your gut feeling is that Jake is right and Jen is wrong, then you have good instincts.
Jake’s logic is correct because every time $\lim_{x \to \infty} f(x) = 1$ and $\lim_{x \to \infty} g(x) = \infty$, it follows
that $\lim_{x \to \infty} f(x)/g(x) = 0$. Jen’s problem contains an indeterminate form, meaning that
when both $\lim_{x \to \infty} f(x) = \infty$ and $\lim_{x \to \infty} g(x) = \infty$, there are multiple possible values for
$\lim_{x \to \infty} f(x)/g(x)$, including any positive real number, $\infty$, or UND.
When you learn about complex numbers, they seem in one sense like make-believe but in
another sense like ordinary math because they obey clear rules. Learning about infinity is
different. The word is in the vocabulary of most children, but no one knows the rules! Is
infinity part of math? Part of philosophy? Science fiction? It turns out infinity does obey
some very clear rules, as long as you decide to define it as a limit. (Trust mathematicians
to take the fun out of it!)
Suppose, in addition to the real numbers, we include the numbers +∞, −∞ and UND.
These are the possible limits a function can have. The goal is to create combining rules for
limits under the basic operations: addition, subtraction, multiplication, division and taking
powers. One rule is that once something is undefined, it stays that way. Limits that DNE
could turn out to be ±∞ rather than UND, but once a limit gets classified as UND, nothing
can be inferred about what you get when you add it to something, multiply it, etc. Thus,
$\mathrm{UND} + 3$, $\mathrm{UND} - \infty$, $-\infty \cdot \mathrm{UND}$ and $\mathrm{UND} / \mathrm{UND}$ are all undefined.[7]
[7] Occasionally this classifies a limit as undefined when there is a value, but that’s OK as long as we
understand UND to mean that our combining rules alone don’t determine the value. Example: $2/0 = \mathrm{UND}$
but if you know the particular function with a limit of zero is always positive then the limit is actually $+\infty$.
Similarly, if $\lim_{x \to \infty} f(x) = \mathrm{UND}$ then also $\lim_{x \to \infty} f(x)/f(x) = \mathrm{UND}$ by this rule, not the obvious limit, 1.
We want a theory that makes Jake’s limit 0 and Jen’s limit UND. Let a and b be extended
real numbers, that is they are either real numbers, or +∞ or −∞. We don’t bother with
UND because we already agreed that once either a or b is UND then any combination of
them is UND. Since you’re reading this optional section for your own edification, stop before
you read the next definition and think how this theory would work.
Definition 6.1 (operations with infinity). If $a$, $b$ and $L$ are extended real numbers, we say
that $a + b = L$ if for every extended real number $c$ and every pair of functions $f$ and $g$
such that $\lim_{x \to c} f(x) = a$ and $\lim_{x \to c} g(x) = b$, it is true that $\lim_{x \to c} (f(x) + g(x)) = L$. If
no such extended real $L$ exists we say that $a + b$ is UND and call this an indeterminate
form. Extending this definition with any other binary operation in place of addition gives
definitions as well for subtraction, multiplication, division, to the power, etc.
Example 6.2 (2 + 2 = 4). Let’s be sure we haven’t destroyed anything we already knew!
We’ll check it on “2+2=4”, often used for something everyone knows.[8] Checking this is not
completely trivial (!) but it follows from Proposition 3.15.
Example 6.3 ($1/\infty$ (Jake’s example)). To check that $1/\infty = 0$ we need to show that
$\lim_{x \to c} f(x) = 1$ and $\lim_{x \to c} g(x) = \infty$ imply that $\lim_{x \to c} f(x)/g(x) = 0$. Briefly, if $f$ is
getting near 1 and $g$ is getting very large, you can see that $f/g$ must be getting very small,
i.e., close to 0. If you are curious, your TA or instructor can supply the formal proof you’d
see in an honors calculus class.
Example 6.4 ($\infty/\infty$ (Jen’s example)). Take $c = \infty$ and $f(x) = g(x) = x$. Then obviously
$\lim_{x \to \infty} x/x = 1$. On the other hand, changing $f(x)$ to $2x$, we get $\lim_{x \to \infty} f(x)/g(x) = 2$, or
if we take $f(x) = x^2$ and $g(x) = x$ we get $\lim_{x \to \infty} f(x)/g(x) = \infty$. Because many different
limits are possible, $\infty/\infty$ is undefined. Jen may or may not be right that $\lim_{x \to \infty} \ln x / \sqrt{x} = 1$,
but the argument that $\infty/\infty = 1$ is bogus.
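The three pairs in Example 6.4 can be tabulated directly. In the sketch below every numerator and denominator grows without bound, yet the quotients behave differently. The helper name `ratios_at` and the sample values of $x$ are choices made here for illustration.

```python
def ratios_at(x):
    """Quotients f(x)/g(x) for the three pairs (f, g) used in Example 6.4."""
    return {
        "x / x":   x / x,
        "2x / x":  (2 * x) / x,
        "x^2 / x": x**2 / x,
    }

for x in (1e2, 1e4, 1e6):
    print(x, ratios_at(x))
```

The first quotient stays at 1, the second at 2, and the third keeps growing, so no single value can be assigned to $\infty/\infty$ by the combining rules alone.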
Exercise 6.1. Using Definition 6.1 as a guide, say what you think the value (possibly ±∞
or UND) is for each of these three expressions. You don’t need a proof, just a guess.
(i) 4/∞
(ii) $1^{\infty}$
(iii) $3^{-\infty}$
[8] This famous identity is used as a test for brainwashing in George Orwell’s classic novel 1984.
6.2 L’Hôpital’s rule
The previous section is optional because you don’t ever NEED to know whether a form
is indeterminate. L’Hôpital’s rule allows us to resolve indeterminate forms in some cases.
The hypotheses involve particular indeterminate forms such as 0/0, but you don’t need the
infinity rules to use it. Rather, the infinity rules give you an alternate way to evaluate limits
when the expression is NOT really an indeterminate form.
Theorem 6.5 (L’Hôpital’s rule, first version[9]). Let $f$ and $g$ be functions differentiable on
an interval containing the point $a$, except possibly at the point $a$, where $f$ and $g$ are not
required to be defined. Suppose $f$ and $g$ both have limit zero at $a$ and suppose $g'$ is nonzero
on the interval. If $\lim_{x \to a} f'(x)/g'(x) = L$ for some finite $L$, then the limit $\lim_{x \to a} f(x)/g(x)$
exists and is equal to $L$.
Example 6.6. L’Hôpital’s rule computes $\lim_{x \to 0} \sin(x)/x$ much more easily than in the
video from a few weeks ago. Let $f(x) = \sin x$, $g(x) = x$ and $a = 0$ and observe that the
continuous functions $f$ and $g$ both vanish at zero, hence $\lim_{x \to 0} f(x) = \lim_{x \to 0} g(x) = 0$.
Therefore,
\[ \lim_{x \to 0} \frac{\sin x}{x} = \lim_{x \to 0} \frac{\cos x}{1} = \frac{\cos(0)}{1} = 1 . \]
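A numerical look at Example 6.6: the sketch below evaluates both the original quotient and the quotient of derivatives at points approaching 0. The function names and the sample points are chosen here for illustration only.

```python
import math

def ratio(x):
    """The original quotient sin(x)/x, defined for x != 0."""
    return math.sin(x) / x

def derivative_ratio(x):
    """The quotient of derivatives, cos(x)/1."""
    return math.cos(x) / 1.0

for x in (0.5, 0.1, 0.01, 0.001):
    print(f"x={x:<6} sin(x)/x={ratio(x):.8f}  cos(x)/1={derivative_ratio(x):.8f}")
```

The two columns are not equal at any fixed $x$; L’Hôpital’s rule only says they have the same limit as $x \to 0$.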
You might wonder, when we first evaluated this limit, why did we do it the hard way?
Remember, we did not and will not prove L’Hôpital’s rule. For this reason it’s good to see
some things that can be done without it.
Exercise 6.2. Use L’Hôpital’s rule to evaluate the following limits. Please state what are
$f$, $g$, $a$, $f'$ and $g'$, as well as the value of the limit.
(a) $\lim_{x \to 0} \frac{e^x - 1}{x}$
(b) $\lim_{x \to 10} \frac{\sqrt[3]{x} - \sqrt[3]{10}}{\sqrt{x} - \sqrt{10}}$
[9] L’Hôpital’s rule uses derivatives to compute limits. You might object that this is circular because limits
are used to define derivatives. It is not circular, because in each case, we use facts we already know to
compute ones we don’t. We should probably avoid using L’Hôpital’s rule to prove general theorems about
derivatives, given that we are not going to prove L’Hôpital’s rule and don’t know what theorems about
derivatives it relied on. But it’s safe to use L’Hôpital’s rule to evaluate individual derivatives. We promise
no individual derivative was used in the proof of L’Hôpital’s rule.
There are two common mistakes in applying L’Hôpital’s rule. One is trying to use it the
other way around. If $f/g$ has a limit at $a$, that doesn’t mean $f'/g'$ does, or that these even
exist. The other is to try to use it when $f$ or $g$ has a nonzero limit at $a$. For example, if
$\lim_{x \to a} f(x) = 5$ and $\lim_{x \to a} g(x) = 3$ then $\lim_{x \to a} f(x)/g(x) = 5/3$ (the nonzero quotient
rule) and is probably not equal to $\lim_{x \to a} f'(x)/g'(x)$.
Exercise 6.3. Which (possibly several, possibly none) of these uses of L’Hôpital’s rule are
valid (hypotheses are satisfied and conclusion is correctly applied)?
(i) $\lim_{x \to 3} \frac{x^2 - 10}{x - 3} = \lim_{x \to 3} \frac{2x}{1} = 6$.
(ii) $\lim_{x \to 2} \frac{x^2 - 4}{x - 2} = \lim_{x \to 2} \frac{2x}{1} = 4$.
(iii) $\lim_{x \to \infty} \frac{6 - e^{-x}}{3 - e^{-2x}} = 2$ and the respective derivatives on top and bottom are $e^{-x}$ and $2e^{-2x}$,
therefore $\lim_{x \to \infty} \frac{e^{-x}}{2e^{-2x}} = 2$.
If the hypotheses hold only from one side, for example $\lim_{x \to a^+} f(x) = \lim_{x \to a^+} g(x) = 0$,
then the conclusion still holds on that side: if $\lim_{x \to a^+} f'(x)/g'(x) = L$ then
$\lim_{x \to a^+} f(x)/g(x) = L$. Also, the limit can be taken at $\pm\infty$ and nothing changes.
(i) Suppose $f$ and $g$ are differentiable on an open interval $(a, b)$, with $f$ and $g$ both having
limit zero at $a$. Suppose that $g' \neq 0$ on $(a, b)$ and $\lim_{x \to a^+} f'(x)/g'(x) = L$. Then
$\lim_{x \to a^+} f(x)/g(x) = L$.
(ii) Suppose $f$ and $g$ are differentiable on an open interval $(b, a)$ with $f$ and $g$ both having
limit zero at $a$. Suppose that $g' \neq 0$ on $(b, a)$ and $\lim_{x \to a^-} f'(x)/g'(x) = L$. Then
$\lim_{x \to a^-} f(x)/g(x) = L$.
(iii) Suppose $f$ and $g$ are differentiable on an open interval $(b, \infty)$ with $f$ and $g$ both having
limit zero at infinity. Suppose that $g' \neq 0$ on $(b, \infty)$ and $\lim_{x \to \infty} f'(x)/g'(x) = L$.
Then $\lim_{x \to \infty} f(x)/g(x) = L$. The same holds for limits at $-\infty$, replacing the interval
with $(-\infty, b)$.
Exercise 6.4. Which of these would you use to evaluate the limit at zero from the right of
$\ln(1 + x)/\sqrt{x}$, and what is the limit?
The case 0 · ∞
Suppose $\lim_{x \to a} f(x) = 0$ and $\lim_{x \to a} g(x) = \infty$. How can we compute $\lim_{x \to a} f(x) \cdot g(x)$?
We know that $\lim_{x \to a} 1/g(x) = 1/\infty = 0$. Therefore, an easy trick is to replace multiplication
by $g$ with division by $1/g$. Letting $h$ denote $1/g$, we have
\[ \lim_{x \to a} f(x) g(x) = \lim_{x \to a} \frac{f(x)}{h(x)} . \]
Example 6.8. What is $\lim_{x \to 0^+} x \cot x$? Letting $f(x) = x$ and $g(x) = \cot x$ we see this has
the form $0 \cdot \infty$. Letting $h(x) = 1/g(x) = \tan x$ we see that
\[ \lim_{x \to 0^+} x \cot x = \lim_{x \to 0^+} \frac{x}{\tan x} = \lim_{x \to 0^+} \frac{x}{\sin x} \cdot \cos x . \]
The limit at 0 of $x/\sin x$ is 1 and the limit of the continuous function $\cos x$ is $\cos(0) = 1$,
therefore the answer is $1 \cdot 1 = 1$.
Exercise 6.5. Compute $\lim_{x \to \infty} \frac{x}{e^x}$.
The idea with indeterminate powers is to take the log, compute the limit, then exponentiate.
The reason this works is that $e^x$ is a continuous function. Theorem 3.16 says that if
$\lim_{x \to a} h(x) = L$ then $\lim_{x \to a} e^{h(x)} = e^L$.
The way we will use this when evaluating something of the form $\lim_{x \to a} f(x)^{g(x)}$ is to take
logarithms. Algebra tells us $\ln f(x)^{g(x)} = g(x) \ln f(x)$. If we can evaluate
$\lim_{x \to a} g(x) \ln f(x) = L$ then we can exponentiate to get $\lim_{x \to a} f(x)^{g(x)} = e^L$.
Exercise 6.6. What is $\lim_{x \to \infty} x^{1/\ln(x)}$? RP says: “Maybe I’m warped, but I think this
one is cute: surprising and easier than it looks.” 2024 update: we did this one already in a
scratchee! Instead: what is $\lim_{x \to 0} \frac{1 - \cos(x)}{\sin x}$?
Example 6.10 (continuous compounding). Suppose you have a million dollars earning a
12% annual interest rate for one year. You might think that after a year you will have 1.12
million dollars. But no, things are better than that. The bank compounds your interest
for you. They realize you could have cashed out after half a year with 1.06 million and
reinvested for another half year, giving you 1.1236 million, which doesn’t seem so different
but is actually 3600 dollars more. You could play this game more frequently, dividing
the year into $n$ periods and earning 12%/$n$ interest $n$ times, so your one million becomes
$(1 + 0.12/n)^n$ million.
With computerized trading, you could make the period of time a second, or even a microsecond.
Does this enable you to claim an unbounded amount of money after one year?
To answer that, let’s compute the amount you would get if you compounded continuously,
namely $\lim_{n \to \infty} (1 + 0.12/n)^n$. Taking logs gives $\ln (1 + 0.12/n)^n = n \ln(1 + 0.12/n)$. Compute,
using L’Hôpital’s rule for the second equality (numerator and denominator both tend to zero),
\[ \lim_{n \to \infty} n \ln(1 + 0.12/n) = \lim_{n \to \infty} \frac{\ln(1 + 0.12/n)}{1/n} = \lim_{n \to \infty} \frac{\dfrac{-0.12/n^2}{1 + 0.12/n}}{-1/n^2} = \lim_{n \to \infty} \frac{0.12}{1 + 0.12/n} = 0.12 . \]
Therefore, $\lim_{n \to \infty} (1 + 0.12/n)^n = e^{0.12} \approx 1.12749685$ million dollars. That’s better than
the \$120,000 you earn without compounding, or the \$3,600 more than that you earn
compounding once, but it’s not infinite, it’s just another \$3896.85 better.
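The convergence in Example 6.10 is easy to watch numerically. The sketch below computes the one-year growth factor $(1 + 0.12/n)^n$ for several compounding frequencies and compares it with $e^{0.12}$. The function name `growth_factor` and the list of frequencies are choices made here for illustration.

```python
import math

def growth_factor(rate, periods):
    """One-year growth factor when `rate` is split evenly over `periods` compoundings."""
    return (1 + rate / periods) ** periods

CONTINUOUS = math.exp(0.12)  # the limiting growth factor found in Example 6.10

for n in (1, 2, 12, 365, 10**6):
    print(f"n={n:>8}  growth factor={growth_factor(0.12, n):.8f}")
print(f"continuous  growth factor={CONTINUOUS:.8f}")
```

The growth factors increase with $n$ but stay below $e^{0.12}$, which is why more frequent compounding helps less and less.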
Exercise 6.7. What is $\lim_{t \to 0} (1 + t)^{1/t}$? This limit is sometimes used to define the famous
constant named after Euler.
Because interest is quoted in both continuous and annualized rates, we need to agree on
terminology to distinguish between these. Our terminology is reasonably consistent with
industry usage, however you should be warned that real world usage can vary quite a bit. For
us an interest rate always refers to a continuous exponential growth rate, $r$, quoted either
as a real number in units of inverse time, or a percentage $R$ so that $r = R/100$. For example,
an interest rate of $r = 0.07$ annually (which is in units of inverse time because “annually”
means “per year”) is the same as an annual interest rate of $R = 7\%$ and corresponds to
the way your money would grow in a savings account that offered 7% interest compounded
continuously. As we have seen in Example 6.10, this corresponds to a one-year growth
factor of $e^{0.07}$.
The growth factor for $t$ years instead of one year is easily seen to be $e^{rt}$; thus money growing
at a constant continuous interest rate is an example of exponential growth.
Exercise 6.8. Verify this by dividing the $t$ years into $tn$ intervals of size $1/n$ years (as was
done in Example 6.10 with $t = 1$) and computing $\lim_{n \to \infty} (1 + r/n)^{tn}$.
There’s also a name for the annual yield, which is what the interest amounts to if you
receive it in a lump sum at the end of a year. For example, an interest rate of $r = 0.07$ (that is, $R = 7\%$) gives
a one-year growth factor of $e^{0.07}$, leading to an annual yield of $e^{0.07} - 1$, which is a little
over 0.0725. Multiplying by 100 to write this as a percentage, we say that the APY, which
stands for Annual Percentage Yield, is a little over 7.25%. Because consumers find the
idea of annual yields easier to understand, banks have now for decades been required to
quote interest rates in terms of the APY.
Letting $r$ be the interest rate, so $R = 100r$ is the percentage interest rate, with $g$ denoting
the growth factor and $y = g - 1$ the annual yield, we can solve for any of these in terms of
any other to obtain
\begin{align}
g &= e^r \tag{6.1} \\
y &= e^r - 1 \tag{6.2} \\
r &= \ln g \tag{6.3} \\
r &= \ln(1 + y) \tag{6.4}
\end{align}
If you prefer things in percentages, the APY for example, in terms of the percentage interest
rate $R$, would be given by $\mathrm{APY} = 100(e^{R/100} - 1)$.
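Formulas (6.1) through (6.4) translate directly into small conversion helpers. The sketch below is one way to write them; the function names are hypothetical, chosen here for readability, and rates are plain decimal numbers unless the name says "percentage".

```python
import math

def growth_factor_from_rate(r):
    """(6.1): one-year growth factor g = e^r for a continuous rate r."""
    return math.exp(r)

def yield_from_rate(r):
    """(6.2): annual yield y = e^r - 1."""
    return math.exp(r) - 1

def rate_from_growth_factor(g):
    """(6.3): continuous rate r = ln g."""
    return math.log(g)

def rate_from_yield(y):
    """(6.4): continuous rate r = ln(1 + y)."""
    return math.log(1 + y)

def apy_from_percentage_rate(R):
    """APY = 100 (e^(R/100) - 1), with R a percentage interest rate."""
    return 100 * (math.exp(R / 100) - 1)

print(yield_from_rate(0.07))           # a little over 0.0725, as in the text
print(apy_from_percentage_rate(7.0))   # a little over 7.25
```

Converting a rate to a yield and back with `rate_from_yield(yield_from_rate(r))` should return the original `r` up to rounding, which is a convenient check that (6.2) and (6.4) are inverse to each other.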
Exercise 6.9. What is the inverse function for this, that writes R in terms of the APY?
Sometimes when trying to evaluate $\lim_{x \to a} f(x)/g(x)$ you find that $\lim_{x \to a} f'(x)/g'(x)$ appears
a bit simpler, but you still can’t tell what it is. You might try L’Hôpital’s rule twice.
If $f'(x)$ and $g'(x)$ tend to zero as $x \to a$ (if they don’t, you can probably tell what the limit
is), then you can use $f'$ in place of $f$ and $g'$ in place of $g$ in L’Hôpital’s rule. If you can
evaluate the limit of $f'(x)/g'(x)$ then this must be the limit of $f(x)/g(x)$. You can often do
a little better if you simplify $f'(x)/g'(x)$ to get a new numerator and denominator whose
derivatives will be less messy.
Example 6.11. Repeated L’Hôpital’s rule makes another limit that was formerly painful
into a piece of cake: $\lim_{x \to 0} (1 - \cos x)/x^2$. Both numerator and denominator are zero at
zero, so we apply L’Hôpital’s rule to see that the limit is equal to $\lim_{x \to 0} \sin x/(2x)$. You
can probably remember what this is, but in case not, one more application of L’Hôpital’s
rule shows it to be equal to $\lim_{x \to 0} (\cos x)/2 = \cos(0)/2 = 1/2$.
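The three quotients that appear in Example 6.11 can be compared numerically. In the sketch below the function names and the sample points are illustrative choices; the points are kept moderately small because $1 - \cos x$ loses floating-point accuracy when $x$ is extremely close to 0.

```python
import math

def original(x):
    """(1 - cos x) / x^2, the quotient we started with."""
    return (1 - math.cos(x)) / x**2

def after_one_step(x):
    """sin x / (2x), after one application of L'Hopital's rule."""
    return math.sin(x) / (2 * x)

def after_two_steps(x):
    """cos x / 2, after a second application."""
    return math.cos(x) / 2

for x in (0.5, 0.1, 0.01):
    print(f"x={x:<5} {original(x):.6f}  {after_one_step(x):.6f}  {after_two_steps(x):.6f}")
```

All three columns head toward 1/2 as $x$ shrinks, even though the functions themselves differ at every nonzero $x$.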
Exercise 6.10. Compute $\lim_{x \to \infty} \frac{x^3}{e^x}$.
Often in mathematical modeling, one hears statements such as “This model produces a
much smaller growth rate than the other model, as time gets large.” This statement sounds
vague: how much is “much smaller” and what are “large times”? In this section we will
give a precise meaning to statements such as this one.
Why are we spending our time making a science out of vague statements? Answer: (1)
people really think this way, and it clarifies your thinking to make these thoughts precise;
(2) a lot of theorems can be stated with these as hypotheses; (3) knowing the science of
orders of growth helps to fulfill the Number Sense mandate because you can easily fit an
unfamiliar function into the right place in the hierarchy of more familiar functions.
(i) We say the function $f$ is asymptotic to the function $g$, short for “asymptotically
equal to”, if
\[ \lim_{x \to \infty} \frac{f(x)}{g(x)} = 1 . \]
This is denoted $f \sim g$.
(ii) The function $f$ is said to be much smaller than $g$, or to grow “much more slowly” if
\[ \lim_{x \to \infty} \frac{f(x)}{g(x)} = 0 . \]
This is denoted $f \ll g$.
\[ \lim_{x \to \infty} \frac{x^2 + 3x}{x^2} = \lim_{x \to \infty} \left( 1 + \frac{3}{x} \right) = 1 , \]
therefore indeed $x^2 + 3x \sim x^2$.
(iii) $\ln x \ll x$
Example 6.13. Let’s compare two powers, say $x^3$ and $x^{3.1}$. Are they asymptotically equivalent
or does one grow much faster? Taking the limit at infinity we see that $\lim_{x \to \infty} x^3/x^{3.1} =
\lim_{x \to \infty} x^{-0.1} = 0$. Therefore, $x^3 \ll x^{3.1}$. This is shown on the left side of Figure 22.
What about comparing $100x^3$ with $0.001x^{3.1}$? The plot on the right side of Figure 22 appears
to show that $100x^3$ remains much greater than $0.001x^{3.1}$, at least beyond a duodecillion (look
it up). Doing the math gives
\[ \lim_{x \to \infty} \frac{100x^3}{0.001x^{3.1}} = \lim_{x \to \infty} 100000 \, x^{-0.1} = 0 . \]
Therefore, again, $100x^3 \ll 0.001x^{3.1}$. Whether or not you care what happens beyond $10^{42}$
depends on the application, but the math is pretty clear: if $a < b$, then $Kx^a \ll Lx^b$ for any
positive constants $K$ and $L$.
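How far out does one have to go before $0.001x^{3.1}$ actually overtakes $100x^3$? The ratio is $100000\,x^{-0.1}$, which equals 1 exactly when $x^{0.1} = 10^5$, that is, at $x = 10^{50}$. The sketch below tabulates the ratio at a few powers of ten; the function name `ratio` and the chosen exponents are illustrative.

```python
def ratio(x):
    """100 x^3 divided by 0.001 x^3.1; algebraically this is 100000 * x**(-0.1)."""
    return (100 * x**3) / (0.001 * x**3.1)

for exponent in (3, 12, 42, 50, 60):
    x = 10.0 ** exponent
    print(f"x = 10^{exponent:<2}  ratio = {ratio(x):.4g}")
```

The ratio is still above 1 at $10^{42}$ and only drops below 1 past $10^{50}$, so a plot over any everyday range of $x$ gives the opposite impression from the limit.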
Discussion
This is a general rule: the function $g(x) + h(x)$ will be asymptotic to $g(x)$ exactly when
$h(x) \ll g(x)$. Why? Because $(g(x) + h(x))/g(x)$ and $h(x)/g(x)$ differ by precisely 1. It
follows that if $g(x) + h(x) \sim g(x)$ then
\[ \frac{x + 1/x}{2 - e^{-x}} \sim \frac{x}{2} . \]
These two facts give important techniques for estimating. They allow you to clear away
irrelevant terms: in any sum, every term that is much less than one of the others can be
eliminated and the result will be asymptotic to what it was before. You can keep going
with products and quotients.
Example 6.15. Find a nice function asymptotically equal to $\sqrt{x^2 + 1}$. The notion of “nice”
is subjective; here it means a function you’re comfortable with, can easily estimate, and so
forth.
Because $1 \ll x^2$ we can ignore the 1 and get $\sqrt{x^2}$ which is equal to $x$ for all
positive $x$. Therefore, $\sqrt{1 + x^2} \sim x$.
(i) f ∼ g
(ii) h ∼ h
(iii) f − h ∼ g − h
It should be obvious that the relation $\sim$ is symmetric: $f \sim g$ if and only if $g \sim f$. Formally,
\[ \lim_{x \to \infty} \frac{f(x)}{g(x)} = 1 \iff \lim_{x \to \infty} \frac{g(x)}{f(x)} = 1 \]
because one is the reciprocal of the other. On the other hand, the relation $f \ll g$ is
anti-symmetric: it is not possible that both $f \ll g$ and $g \ll f$.
2. Exponentials grow at different rates and every exponential grows faster than every power.
3. Logarithms grow so slowly that any power of ln x is less than any positive power of x.
Everything we have discussed in this section has referred to limits at infinity. Also, all our
examples have been of functions getting large, not small, at infinity. But we could equally
have talked about functions such as $1/x$ and $1/x^2$, both of which go to zero at infinity. It
probably won’t surprise you to learn that $1/x^2$ is much smaller than $1/x$ at infinity.
Exercise 6.13. Use the definitions to verify that $1/x^2 \ll 1/x$.
These same notions may be applied elsewhere simply by taking a limit as x → a instead of
as x → ∞. The question then becomes: is one function much smaller than the other as the
argument approaches a? In this case it is more common that both functions are going to
zero than that both functions are going to infinity, though both cases do arise. Remember:
at a itself, the ratio of f to g might be 0/0 or ∞/∞, which of course is meaningless, and
can be made precise only by taking a limit as x approaches a.
The notation, unfortunately, is not built to reflect whether a = ∞ or some other number.
So we will have to spell out or understand by context whether the limits in the definitions
of $\ll$ and $\sim$ are intended to occur at infinity or some other specified location, $a$.
Example 6.17. Let’s compare $x$ and $x^2$ at $x = 0$. At infinity, we know $x \ll x^2$. At zero,
both go to zero but at possibly different rates. Have a look at Figure 23. You can see that
$x$ has a positive slope whereas $x^2$ has a horizontal tangent at zero. Therefore, $x^2 \ll x$ as
$x \to 0^+$. You can see it from Figure 23 or from L’Hôpital:
\[ \lim_{x \to 0^+} \frac{x^2}{x} = \lim_{x \to 0^+} \frac{2x}{1} = 0 . \]
Example 6.18. What about $x^2$ and $x^4$ near zero? Both have slope zero. By eye, $x^4$ is a
lot flatter. Maybe $x^4 \ll x^2$ near zero. It is not clearly settled by the picture (do you agree?
see Figure 24), but the limit is easy to compute.
Exercise 6.14. Compute the limit needed to settle the previous answer.
We try evaluating the ratio: $f(x)/g(x) = x^{1/2}/x^{1/3} = x^{1/2 - 1/3} = x^{1/6}$. Therefore,
\[ \lim_{x \to 0^+} \frac{f(x)}{g(x)} = \lim_{x \to 0^+} x^{1/6} = 0 \]
and indeed $x^{1/2} \ll x^{1/3}$. Intuitively, the square root of $x$ and the cube root of $x$ both go to
zero as $x$ goes to zero, but the cube root goes to zero a lot slower (that is, it remains bigger
for longer).
Figure 24: Comparing $x^2$ (red) and $x^4$ (black) at $x = 0$
Figure 25: Comparing $\sqrt{x}$ (black) and $\sqrt[3]{x}$ (red) at $x = 0$
Exercise 6.15. Let $a, b, K, L$ be positive constants with $a < b$. Determine which of $Kx^a$ or
$Lx^b$ is much greater than the other at $x = 0$, if either.
Suppose f and g are two nice functions, both of which are supposed to be approximations
to some more complicated function H near the argument a. The question of whether
$f - H \ll g - H$, or $g - H \ll f - H$, or neither as $x \to a$ is particularly important because
it tells us whether one of the two functions f and g is a much better approximation to H
than is the other. We will be visiting this question shortly in the context of the tangent
line approximation, and again later in the context of Taylor polynomial approximations.
Often when discussing comparisons at infinity we use the term “for sufficiently large x”.
That means that something is true for every value of x greater than some number M (you
don’t necessarily know what $M$ is). For example, is it true that $f \ll g$ implies $f < g$? No,
but it implies f (x) < g(x) for sufficiently large x. Any limit at infinity depends only on
what happens for sufficiently large x.
Example 6.20. We have seen that $\ln x \ll \sqrt{x - 5}$. It is not true that $\ln 6 < \sqrt{6 - 5}$ (the
corresponding values are about 1.8 and 1) and it is certainly not true that $\ln 1 < \sqrt{1 - 5}$
because the latter is not even defined. But we can be certain that $\ln x < \sqrt{x - 5}$ for
sufficiently large $x$. The crossover point is between 10 and 11.
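The claims in Example 6.20 are easy to spot-check numerically. The following is only a sketch: the helper names `log_below_root` and `crossover` are ours, not the text's, and the bisection assumes the sign change lies between 10 and 11 as stated above.

```python
import math

def log_below_root(x):
    """Return True when ln(x) < sqrt(x - 5). Requires x >= 5."""
    return math.log(x) < math.sqrt(x - 5)

def crossover(lo=10.0, hi=11.0, tol=1e-9):
    """Bisect for the point where ln(x) - sqrt(x - 5) changes sign.

    Assumes the difference is positive at lo and negative at hi.
    """
    def gap(x):
        return math.log(x) - math.sqrt(x - 5)

    while hi - lo > tol:
        mid = (lo + hi) / 2
        if gap(mid) > 0:
            lo = mid   # ln is still ahead, so the crossover is to the right
        else:
            hi = mid   # the square root has overtaken, so look to the left
    return (lo + hi) / 2

x_star = crossover()
print(log_below_root(6), log_below_root(10), log_below_root(11))
print(round(x_star, 3))
```

The inequality fails at 6 and at 10 but holds at 11, which is what "for sufficiently large $x$" allows: finitely many early failures do not matter.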
7 Optimization
Many of you have seen max-min problems before. If not, pay attention! Finding the
maximum or minimum of a function is one of the crowning achievements of calculus. This
occurs in business (maximize profit), medicine (minimize mortality), mechanical engineering
(what is the maximum load?), economics (maximize utility), population genetics (maximize
selective advantage), actuarial science (minimize risk), and further applications in every field
that uses mathematical models and methods.
The following definitions give precise meaning to notions you have probably already seen.
Some vocabulary may be new but none of it is rocket science.
Definition 7.1.
• A point x ∈ [a, b] such that f (x) ≤ f (y) for all y ∈ [a, b] is called a minimum
(Plural: minima). This is also called a global or absolute minimum on [a, b].
• A point x ∈ [a, b] such that f (x) ≥ f (y) for all y ∈ [a, b] is called a maximum
(Plural: maxima). This is also called a global or absolute maximum on [a, b].
• A local minimum is a value x such that f (x) ≤ f (y) for all y in some open interval
I containing x, which could be a lot smaller than the whole interval (a, b). The terms
local maximum and local extremum are defined analogously.
A subtle but important piece of vocabulary distinguishes between the location of the
extremum (the value of x) and the value of the extremum, namely f (x) where x is the
location. When we refer to “the maximum” without saying “location” or “value” it is
assumed we mean the value. Both are important though, as can be seen through these
examples.
• If I want to build a building to house my flying squirrels, I need to know what the
maximum height they’re capable of flying is, but I don’t really care when they get to
that height.
• If I need to build a window which admits the most possible light, what I care about
is how to set the dimensions (an input), but the amount of light actually let in (in
lumens, say) isn’t really needed.
• If I’m running a widget factory and I want to know what production level will maximize
my profit, the input where the maximum occurs (a number of widgets per hour)
is important, but for fiscal planning I also need to know what that maximum (a
number of dollars) actually is.
Before we start looking for extrema, it might occur to you to question whether they exist.
Exercise 7.1.
(i) Find a discontinuous function defined on the interval [−2, 1] with no absolute maximum
nor minimum on that interval.
(ii) Find a continuous function on (−2, 1) with no absolute maximum nor minimum on
that interval.
Now that you have seen some scenarios where functions have no absolute extrema on an
interval, here is a theorem guaranteeing the opposite.
Aside. Like the Intermediate Value Theorem, this theorem requires mathematical analysis
to prove; we would say it was obvious were it not for the counterexamples we paraded by
you in the self-check exercises!
Theorem 7.2. Let f be a continuous function on the closed interval [a, b]. Then f has at
least one absolute minimum on [a, b] and at least one absolute maximum on [a, b].
Exercise 7.2. What hypothesis of Theorem 7.2 is violated in each part of Exercise 7.1?
This result should seem very credible on an intuitive level. If $f'(c) > 0$ then moving to the
left from $c$ to $c - \varepsilon$ should produce a smaller value of $f$. Likewise, if $f'(c) < 0$ then moving
to the right should produce a smaller value. Either way, $c$ cannot be the location of a minimum.
This is the most intuitive justification we could write down, though not exactly airtight.
Figure 26: difference quotients between c and points to the right (red) are positive; those
to the left (blue) are negative
Let’s get the logic straight. It is of the form minimum ⇒ $f' = 0$. The converse is not
necessarily true: $f' = 0$ does not imply minimum. Nevertheless, everyone’s favorite procedure for finding
minima is to set $f'$ equal to zero. Why does this work, or rather, when does this work?
From Theorem 7.2, if $f$ is defined and continuous on a closed interval $[a, b]$, then indeed $f$
has to have a minimum somewhere on $[a, b]$. We can find it by using Theorem 7.3 to rule
out where it’s not: if $a < c < b$ and $f'(c) \neq 0$, then definitely the minimum does not occur
at $c$. Where can it be then? What’s left is the point $a$, the point $b$, every point where $f'$
is zero, and every point where $f'$ does not exist. An identical argument shows the same is
true for the maximum. Summing up:
Theorem 7.4. Suppose $f$ is continuous on $[a, b]$ and differentiable everywhere on $(a, b)$
except for a finite number of points $c_1, \ldots, c_k$. Then the minimum value of $f$ on $[a, b]$
occurs at one or more of the points $\{a, b, c_1, \ldots, c_k, \text{anywhere } f' = 0\}$, and nowhere else.
The maximum also occurs at one or more of these points and nowhere else.
Exercise 7.4. Let f (x) := |x| and let [a, b] be the interval [−2, 2]. Does the theorem say f
must have a minimum on this interval? If so, what does the theorem say about where the
minimum must be? Answer the same question for the maximum of f on [a, b].
Being differentiable except for a finite number of points is sometimes called being piecewise
differentiable, because the function is differentiable in pieces, the pieces being the intervals
$(a, c_1), (c_1, c_2), \ldots, (c_{k-1}, c_k), (c_k, b)$.
Figure 27
Exercise 7.5. Let $f$ be the “sawtooth” function shown in Figure 27, defined by letting $f(x)$
be the distance from $x$ to the nearest integer, either $\lfloor x \rfloor$ or $\lceil x \rceil$.
Is f piecewise differentiable on [−2, 2]? If so, give a value of k and c1 , . . . , ck that shows
this to be true. If not, say why not.
You can write Theorem 7.4 as a procedure if you want. Even if you’re looking only for the
minimum or only for the maximum, the procedure is the same so it will find both.
(5) For every point x on the list, compute f (x); the one(s) of these where f is greatest will
be the maxima; the one(s) where f is least will be the minima.
Example 7.6. Find the maximum of $f(x) := 5x - x^2$ on
the interval $[1, 3]$; see the figure at the right. Computing
$f'(x) = 5 - 2x$ and setting it equal to zero we see that $f'(x) = 0$
precisely when $x = 2\tfrac{1}{2}$. There are no points where $f'$ is
undefined, so our list consists of just the one point plus the
two endpoints: $\{1, 2\tfrac{1}{2}, 3\}$. Checking the values of $f$ there
produces $4, 6\tfrac{1}{4}, 6$. The maximum is the greatest of these, $6\tfrac{1}{4}$,
occurring at $x = 2\tfrac{1}{2}$.
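The candidate-list reasoning of Theorem 7.4, as used in Example 7.6, can be sketched in a few lines. This is an illustration under assumptions: the function name `extrema_on_closed_interval` is ours, and the interior candidates (points where $f'$ is zero or fails to exist) are assumed to have been found by hand, as in the example.

```python
def extrema_on_closed_interval(f, a, b, interior_candidates):
    """Compare f at the endpoints and at the supplied interior candidates.

    interior_candidates: points of (a, b) where f' is zero or does not exist,
    worked out beforehand. Returns ((x_min, f_min), (x_max, f_max)).
    If several points tie, only the first one on the list is reported.
    """
    points = [a, b] + [c for c in interior_candidates if a < c < b]
    values = [(x, f(x)) for x in points]
    lowest = min(values, key=lambda pair: pair[1])
    highest = max(values, key=lambda pair: pair[1])
    return lowest, highest

def f(x):
    return 5 * x - x ** 2

# Example 7.6: f'(x) = 5 - 2x vanishes only at x = 2.5
lowest, highest = extrema_on_closed_interval(f, 1, 3, [2.5])
print(lowest, highest)  # (1, 4) (2.5, 6.25)
```

The same call finds the minimum for free, which is the point made above: the list of candidates is identical whether you want the maximum or the minimum.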
Exercise 7.6. Find the maximum and minimum of x3 − x2 − 2x on the interval [−1, 3].
Figure 28: four panels, labeled (a), (b), (c) and (d)
Exercise 7.7. Here are some other things you may find when you use Procedure 7.5. Match
each of these verbal descriptions to the role of x in one of the four pictures in Figure 28.
Then state which picture has an endpoint that is not a global extremum.
• $f'(x) = 0$ but $f$ has neither a local minimum nor a local maximum at $x$
Example 7.7 (interval is not closed, function has no minimum). Let f (x) = x and consider
the half-closed interval (0, 1]. In this case we have a continuous function but not a closed
interval. This example represents a scenario where you make a donation in bitcoin to enter
a virtual tourist attraction and you want to spend as little as possible. You have 1 bitcoin,
so that’s the maximum you can donate; donations can be any positive real number but zero
is not allowed. The minimum of x on (0, 1] does not exist: there is no smallest positive
real number. The interpretation is clear: no matter how little you donate, you could have
donated less. Mathematically, this clarifies the need for a closed interval in Theorem 7.2.
Exercise 7.8.
(i) True or false: the function $e^{-x}$ has a global minimum on the whole real line?
(ii) True or false: the function $xe^{-x}$ has a global minimum on the nonnegative half-line
$[0, \infty)$?
Second derivatives
Recall that wherever $f$ has a second derivative, if $f'' \neq 0$ then the sign of $f''$ determines the
concavity of $f$. If $f''(x) > 0$ then $f$ is concave upward and if $f''(x) < 0$ then $f$ is concave
downward. At a point where $f' = 0$, if we know the concavity, we know whether $f$ has a
local maximum or local minimum: concave upward gives a local minimum and concave
downward gives a local maximum.
Figure 29: a critical point where $f'' < 0$ (left) and where $f'' > 0$ (right)
Example 7.8. What are the extrema of the function $f(x) := x^2 + 1/x$ on the interval
$(0, 2)$? The only critical point is where $f'(x) = 2x - 1/x^2 = 0$, hence $x = \sqrt[3]{1/2}$. Here,
$f''(x) = 2 + 2/x^3 > 0$, therefore this is a local minimum. There are not any local maxima.
This means $f$ has no global maximum on $(0, 2)$. It may have a global minimum, and indeed,
Figure 30 shows that $x = \sqrt[3]{1/2}$ is a global minimum. We will discuss further tools for arguing
whether a local extremum on a non-closed interval is a global extremum.
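If no picture is at hand, a rough numerical check of Example 7.8 serves the same purpose. This is a sketch only: the grid of sample points is an arbitrary choice of ours, and sampling can suggest, but never prove, that a critical point is a global minimum.

```python
def f(x):
    return x ** 2 + 1 / x

def f_prime(x):
    return 2 * x - 1 / x ** 2

def f_second(x):
    return 2 + 2 / x ** 3

x_crit = 0.5 ** (1 / 3)                      # cube root of 1/2
grid = [k / 1000 for k in range(1, 2000)]    # sample points inside (0, 2)
grid_min = min(grid, key=f)                  # sample point with the least f

print(x_crit, f_prime(x_crit), f_second(x_crit))
print(grid_min)
```

The first derivative vanishes at `x_crit` up to rounding, the second derivative is positive there, and the best sample point sits right next to it.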
Remark. If the second derivative vanishes along with the first, you won’t know any more
than you did already.
Applications
Example 7.9. The logistic equation models growth rate per unit time, call it R, of a
population as R(x) = Cx(A − x). Here C is a constant of proportionality, x is the present
population, and A is a theoretical limit on the population size supported by the habitat.
At what size is the population growing the fastest?
We need to find the maximum of R(x) := Cx(A − x) on [0, A]. The reason for restricting
to this interval is that we are told the population size is constrained to be at most A, and
of course it has to be nonnegative. Computing $R'(x) = C(A - 2x)$, we find $R' = 0$ for a
single value, $x = A/2$. Checking the endpoints, we find $R$ is zero at both, whereas
$R(A/2) = CA^2/4$ is positive. Therefore the maximum value occurs at $x = A/2$.
Exercise 7.9. What are the units of x, R, A and C?
Example 7.10. Suppose the cost of supplying a station is proportional to the distance
from the station to the nearest port, and the cost of the land for the station is inversely
proportional to the distance to the nearest port. Adding together these costs, what is the
least expensive distance at which to put the station?
Letting x be the distance to the nearest port and f (x) be the cost, we are told that f (x) =
ax + b/x where a and b are unspecified constants. The value of f (x) is defined for every
positive x and f is continuous on (0, ∞). We seek the global minimum of f on (0, ∞). We
are not guaranteed there is a minimum. When we solve for $f'(x) = 0$ we find
$$0 = f'(x) = a - \frac{b}{x^2} \quad \text{hence} \quad x = \sqrt{\frac{b}{a}}.$$
At this value, $f(x) = a\sqrt{b/a} + b/\sqrt{b/a} = 2\sqrt{ab}$. Checking what happens near 0 and $\infty$, we
find $\lim_{x \to 0^+} f(x) = \infty$ and $\lim_{x \to \infty} f(x) = \infty$. Therefore, there is a minimum value, which
we have determined to be $2\sqrt{ab}$, occurring at $x = \sqrt{b/a}$.
You might have noticed there are two free variables in this example, the unspecified constants
$a$ and $b$. It’s worth observing that everything interesting in the problem depends
only on the ratio $b/a$. One might check whether this makes sense from the units. The units
of $a$ are in dollars per distance. The units of $b$ are dollars per inverse distance, so dollars
times distance. Dividing and simplifying, we see that $b/a$ has units of distance squared, so
$\sqrt{b/a}$ has units of distance. This corroborates that $x = \sqrt{b/a}$ is a reasonable solution for
the location of the minimum, since
this really is a “location” as measured in distance to the nearest port.
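A quick numerical sanity check of Example 7.10 follows. It is a sketch under assumptions: the values $a = 3$ and $b = 12$ are hypothetical (the text leaves the constants unspecified), both are taken to be positive, and the function names are ours.

```python
import math

def station_cost(x, a, b):
    """Total cost a*x + b/x at distance x > 0 from the port."""
    return a * x + b / x

def best_distance(a, b):
    """Location sqrt(b/a) and value 2*sqrt(a*b) of the minimum, for a, b > 0."""
    return math.sqrt(b / a), 2 * math.sqrt(a * b)

a, b = 3.0, 12.0                        # hypothetical constants
x_star, cost_star = best_distance(a, b)
print(x_star, cost_star, station_cost(x_star, a, b))

# Doubling both constants leaves b/a, and hence the location, unchanged
print(best_distance(2 * a, 2 * b)[0])
```

With these numbers the best distance is 2 and the cost there is 12; nearby distances such as 1.9 or 2.1 cost slightly more.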
Example 7.11. The functions $x^{\gamma} e^{-x}$, for $x \geq 0$, arise in probability modeling. They are
called Gamma densities. We will return to these in Section 12.3. For now, we would like to
understand the shape of these functions. An example with $\gamma = 5$ is shown in Figure 31.
The place where one is most likely to find the random variable is where the maximum of the
density occurs. Where does the maximum of $f(x) := x^5 e^{-x}$ occur? We know that the value
is zero at $x = 0$ and positive everywhere else. We also know $\lim_{x \to \infty} f(x) = 0$. This means
there must be a maximum at some positive finite $x$. The function $f$ is differentiable for all
positive $x$, therefore the maximum can only occur where $f' = 0$. Solving
$$0 = f'(x) = 5x^4 e^{-x} - x^5 e^{-x} = x^4 e^{-x}(5 - x)$$
for positive $x$ gives $x = 5$.
Figure 31: Gamma-5 density
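The location of the peak in Figure 31 can be checked with a coarse search. This is a sketch, not part of the original example: the product rule gives $f'(x) = x^4 e^{-x}(5 - x)$ for $f(x) = x^5 e^{-x}$, and the grid on $[0, 30]$ with step $0.01$ is an arbitrary choice of ours.

```python
import math

def gamma5(x):
    """The function x^5 e^(-x) from Example 7.11."""
    return x ** 5 * math.exp(-x)

def gamma5_prime(x):
    """Its derivative, x^4 e^(-x) (5 - x), by the product rule."""
    return x ** 4 * math.exp(-x) * (5 - x)

grid = [k / 100 for k in range(0, 3001)]   # 0.00, 0.01, ..., 30.00
x_max = max(grid, key=gamma5)              # sample point with the largest value

print(x_max, gamma5(x_max), gamma5_prime(x_max))
```

The search lands on $x = 5$, exactly where the derivative's factor $(5 - x)$ vanishes.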
Exercise 7.10. Why does a limit of zero at infinity imply that f must have a maximum at
some positive, finite x? Any convincing argument is fine here.
Example 7.12. Let $h$ be the height of a member of a carnivore species. In this simple
model, the food gathering capability of an individual is given by $kh^2$ while its daily food
needs are given by $ch^3$.
(a) Why?
(b) What are the units of $k$ and $c$?
(c) To maximize food gathering ability minus food needs, how tall should members of this
species be?
(a) We can only make educated guesses about the reason the equations in the model have
this form. If an animal’s speed is proportional to its height then the model stipulates
territory is proportional to the square of this. Perhaps territory is the area that can be
reached in a given amount of time such as an hour or a day. As to why food needs would be
proportional to volume, one might imagine that sustaining and nourishing tissue requires
nutrients proportional to volume.
(b) Units of $c$ are food per length$^3$ and units of $k$ are food per length$^2$. For example, if
food is measured in kilograms and length in meters, then food per length$^3$ would be kg/m$^3$;
however one might measure food in other ways such as calories, or numbers of a particular
prey animal, etc.
(c) The objective function we want to maximize is $kh^2 - ch^3$. Having been told no limitations
on size, we assume $h$ can be any positive real number, though we may have to retract that
if the optimum turns out to have an unrealistic scale. Differentiating $f(h) := kh^2 - ch^3$
with respect to $h$ yields $2kh - 3ch^2$ and setting equal to zero gives the two solutions 0 and
$h^* := (2k)/(3c)$. This indeed has units of length. Clearly $f(0) = 0$. The value of the
objective function at $h^*$ is $4k^3/(27c^2)$, which is positive. Therefore the maximum of $f$ on
$[0, \infty)$ is either $4k^3/(27c^2)$ achieved at $h = (2k)/(3c)$ or there is no maximum because the
function can get arbitrarily large as $h \to \infty$. At infinity, $f(h) \sim -ch^3$ because $kh^2 \ll ch^3$
as $h \to \infty$, so $f(h) \to -\infty$. Therefore, $f$ has a maximum at a positive location, whose value is $4k^3/(27c^2)$.
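The formulas in part (c) can be checked on made-up numbers. In this sketch the values $k = 3$ and $c = 2$ are hypothetical, chosen only so the arithmetic comes out round, and the function names are ours.

```python
def surplus(h, k, c):
    """Food gathered minus food needed, k*h^2 - c*h^3."""
    return k * h ** 2 - c * h ** 3

def best_height(k, c):
    """Critical height 2k/(3c) and the surplus 4k^3/(27c^2) attained there."""
    return (2 * k) / (3 * c), 4 * k ** 3 / (27 * c ** 2)

k, c = 3.0, 2.0                     # hypothetical constants
h_star, value = best_height(k, c)
print(h_star, value, surplus(h_star, k, c))

# Far out, the cubic term dominates and the surplus is hugely negative
print(surplus(100.0, k, c))
```

Here the best height is 1 and the surplus there is 1, matching both the closed form and a direct evaluation.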
Exercise 7.11 (optional, it’s a bit of a computation, though not hard). Continuing the
previous example, suppose that for lions k = 0.001 gazelles per square meter, and c = 0.0004
gazelles per cubic meter. What length of lion maximizes its excess food gathering ability,
and how many gazelle carcasses per day will be left over for the other lions in the pride?
7.3 Applications
There’s no new material in this section, just some typical applications of optimization using
differential calculus.
Example 7.13. We’re going to build a window in the shape of a rectangle topped by an
equilateral triangle. We want to make a window which lets in the most light – that is, with
the greatest possible area. In order to build the window, we have to use wood trim. We
have 16 feet of wood trim to build the window with.
Such a window has two dimensions: the width $w$ and the height $h$ of the rectangle. The rectangular
portion has area $wh$ and the triangular portion has area $\frac{\sqrt{3}}{4} w^2$. So the total area is given
by
$$A(w, h) = wh + \frac{\sqrt{3}}{4} w^2.$$
We also need to take into account the fact that our supplies are limited. Two pieces of trim
with length $h$ and four of length $w$ add up to $2h + 4w$ which we can set equal to 16 because
if they add up to less we would increase $h$ to take up the slack, obviously giving us more
light. Thus $h = 8 - 2w$ and we can plug in to get
$$A(w) = w(8 - 2w) + \frac{\sqrt{3}}{4} w^2.$$
Clearly $w$ can’t be less than zero or greater than 4, so we are left with the calculus problem
of maximizing $A(w)$ over $w \in [0, 4]$. There’s only one critical point, when
$A'(w) = 8 - 4w + \frac{\sqrt{3}}{2} w = 0$, whose solution is $w^* = \frac{16}{8 - \sqrt{3}} \approx 2.55$ feet. The endpoint
values $A(0) = 0$ and $A(4) = 4\sqrt{3} \approx 6.93$ are smaller than the value at $w^*$. We are also interested in the
value of the maximal area which is $A(w^*) = 64/(8 - \sqrt{3}) \approx 10.21$ square feet.
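The numbers in Example 7.13 can be reproduced directly. This is a sketch with names of our own choosing; it uses the constraint $h = 8 - 2w$ and the area formula $A(w) = w(8 - 2w) + \frac{\sqrt{3}}{4} w^2$, and compares the critical point against the two endpoints of $[0, 4]$.

```python
import math

def window_area(w):
    """Area of the window when the trim constraint forces h = 8 - 2w."""
    h = 8 - 2 * w
    return w * h + math.sqrt(3) / 4 * w ** 2

w_star = 16 / (8 - math.sqrt(3))     # root of A'(w) = 8 - 4w + (sqrt(3)/2) w
area_star = window_area(w_star)

print(round(w_star, 2), round(area_star, 2))
print(window_area(0.0), round(window_area(4.0), 2))
```

Both endpoint areas come out smaller than the area at the critical point, so the critical point is where the maximum occurs.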
Optimization in business
Consider a company whose main business is producing and selling sneakers. In real life
it’s very complicated, taking into account things such as labor costs, transportation,
import/export taxes, management costs, durable equipment versus expendable supplies, and
so forth. But one can get a handle on basic decision making with a simplified model, taking
into account only a few variables, as follows.
Let p be the selling price of a pair of sneakers. This may seem like an odd choice for the
lone independent variable in such a model until one realizes that the retail price is the
one thing the company completely controls. According to economic theory, the demand
N (p) for the sneakers will be a function of the price; this is pretty credible. The equation
P (p) = N (p)(p − U (p)) represents the fact that the profit P is found by multiplying the
number of pairs of sneakers sold times the difference between the price p and the production
cost U (p) per pair. One might also write this as revenue minus cost, where revenue is your
gross sales pN (p) and the production cost U (p)N (p) is the unit cost times the number of
units.
The big simplification in this model involves the supposition that $N(p)$ and $U(p)$ are knowable
and furthermore have simple formulas. In fact, a lot about $N(p)$
might be inferred from marketing data and known demographics. In the region where the
maximum of $P$ occurs, $N$ may indeed be approximated by a simple formula. In the case
of the unit production cost, $U(p)$ might be difficult to know because of the huge basket of
things it includes (taxes, management costs, excess inventory, etc.), and while it is a function
of $p$ (because it is a function of $N$ and $N$ is a function of $p$) it is unlikely to satisfy a nice
formula or be mathematically tractable.
Example 7.14. Suppose that $U(p)$ is constant: no matter how many sneakers you make,
the marginal cost of producing one more is the same amount, say $c$ dollars. Suppose that
$N(p)$ obeys some power law, $N(p) = bp^{-\alpha}$. Thus,
$$P(p) = bp^{-\alpha}(p - c).$$
Can we determine the best price to set? Looking for critical points we find
$$P'(p) = bp^{-\alpha} - \alpha b p^{-\alpha - 1}(p - c).$$
Setting this equal to zero we factor out $bp^{-\alpha-1}$ and find that $0 = p - \alpha(p - c)$, and solving
for $p$ gives $p = c \, \dfrac{\alpha}{\alpha - 1}$.
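To see the price formula in action, here is a small sketch with hypothetical numbers ($b = 1000$, $\alpha = 2$, $c = 40$ dollars); none of these values come from the text, and the function names are ours. It uses the profit $P(p) = bp^{-\alpha}(p - c)$ and the critical price $p = c\,\alpha/(\alpha - 1)$, and it refuses to report a price when $\alpha \le 1$, where the formula is not meaningful.

```python
def profit(p, b, alpha, c):
    """Profit b * p^(-alpha) * (p - c) at selling price p > 0."""
    return b * p ** (-alpha) * (p - c)

def best_price(alpha, c):
    """Critical price c * alpha / (alpha - 1); only meaningful for alpha > 1."""
    if alpha <= 1:
        raise ValueError("the critical-point formula needs alpha > 1")
    return c * alpha / (alpha - 1)

b, alpha, c = 1000.0, 2.0, 40.0     # hypothetical values
p_star = best_price(alpha, c)

print(p_star, profit(p_star, b, alpha, c))
print(profit(70.0, b, alpha, c), profit(90.0, b, alpha, c))
```

With $\alpha = 2$ the best price is twice the cost, and pricing a little lower or a little higher both reduce the profit. Changing `b` rescales every profit but leaves `p_star` where it is.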
This is a good chance to practice asking questions. Before you read on, please stop and think
about what questions you should be asking. When fractional exponents are involved, units
are often nonsensical, so let’s not go there. What about the signs: is α positive or negative,
and does the formula make sense? It seems the way we set things up, α should be positive so
that the demand can decrease to zero, not increase to infinity, as the price rises. Something
must be messed up when α = 1, but what and why? In fact, something is messed up when
α ≤ 1: the critical point is a minimum rather than a maximum. In fact when α < 1, say
1/2 for example, the model is nonsensical. You can price the sneakers at a trillion dollars,
sell only 1/1,000,000 of a pair, and make a million dollars. The nonsense is that there’s no
good interpretation of selling a small fraction of one pair of sneakers. The same issue arises
in principle when α > 1, say α = 2, only it doesn’t matter, because if you sell a trillionth of
a pair for a million dollars per pair, almost no money (or sneakers) changes hands. It’s OK
to model $N(p)$ as a continuous variable when small values of $N$ correspond to irrelevant
parts of the scenario but not when the small values of N correspond to ridiculously huge
transactions.
Next question: say α > 1; do things make sense now? The best price point is the cost, c,
multiplied by α/(α − 1). It’s a good sign that α/(α − 1) > 1; it means you are setting the
price above cost. Notice that as α → ∞, α/(α−1) goes to 1 from above. You might interpret
that as saying that when consumers are very cost-sensitive (large α), then you shouldn’t
set the price much above your actual cost. What about the constant of proportionality b?
It doesn’t appear at all in the formula for the best price point. The profit you make at this
price point will be proportional to b but the price point doesn’t change with b. Whether
this makes intuitive sense is up to you. It kind of does to me. This is an oversimplified
model, to be sure, but seems to be getting at some real phenomena.
8 Further topics in differential calculus
Calculus has been around for 300 years. The applications and techniques don’t all fit nicely
into chapter length categories. Here, we tie up some loose ends and mention a few things
we think you shouldn’t miss.
This section pays back a debt by addressing those functions in Proposition 5.8 whose derivations
we have not yet discussed: powers, exponentials, logarithms and inverse trig functions.
To clarify our terminology, the reason $x^a$ is called a power, while $a^x$ is called an exponential,
is that we are differentiating with respect to $x$, while $a$ plays the role of a constant.
For positive integer powers $x^n$ there are many ways of computing the derivative. One is by
expanding it out:
$$(x + h)^n - x^n = nhx^{n-1} + \binom{n}{2} h^2 x^{n-2} + \cdots + nh^{n-1}x + h^n.$$
Dividing by $h$ and taking the limit as $h \to 0$ shows that the derivative of $x^n$ is $nx^{n-1}$.
Another way is to prove it by induction, using the product rule to get from $(d/dx)x^n =
nx^{n-1}$ to $(d/dx)x^{n+1} = (n + 1)x^n$.
For negative integer powers you can use the quotient rule, writing $x^{-n} = 1/x^n$ and using
the known derivative for positive integer values of $n$. For rational powers, it is easiest after
proving a combining rule that tells us how to compute the derivative of the inverse function
$f^{-1}$ if we know the derivative of $f$. The derivation is a quick use of the chain rule.
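Proposition 8.1.
$$\frac{d}{dx} f^{-1}(x) = \frac{1}{f'(f^{-1}(x))}. \tag{8.1}$$
Proof: Differentiating both sides of $f(f^{-1}(x)) = x$ with the chain rule gives
$$f'(f^{-1}(x)) \, \frac{d}{dx} f^{-1}(x) = 1$$
and dividing both sides by $f'(f^{-1}(x))$ yields the result.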
One of the instructors called this proof “efficient but unenlightening.” In case you feel the
same way, here is a pictorial proof.
Exercise 8.1. Suppose f has input units of people and output units of money. Do a unit
analysis of equation (8.1): what are the units of each side, and are they the same?
Example. Square root is the inverse function to squaring. Using Proposition 8.1 quickly
computes the derivative of the square root. Letting $f(x) = x^2$ in Proposition 8.1, so that
$f^{-1}(x) = \sqrt{x}$, and using $f'(x) = 2x$, the conclusion becomes
$$\frac{d}{dx} \sqrt{x} = \frac{1}{2\sqrt{x}}.$$
Exercise 8.2. Use a similar method to compute $\dfrac{d}{dx} \sqrt[3]{x}$. Show your work.
Similarly, this allows us to show $(d/dx)x^{1/n} = (1/n)x^{1/n-1}$. Using the chain rule, because
$x^{k/n} = (x^{1/n})^k$, we can then compute $(d/dx)x^{k/n}$ for any nonzero integers $k$ and $n$. So now
we have verified that the derivative of $x^r$ is $rx^{r-1}$ for all rational numbers $r$.
At the end of the section we will finish this argument by handling the case of exponents
that are not rational numbers.
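The inverse function rule (8.1) can be spot-checked numerically on the square root. This is only a sketch: the helper names are ours, the test point $x = 9$ is arbitrary, and the comparison value is a central difference quotient, which approximates the derivative rather than computing it exactly.

```python
import math

def inverse_derivative(f_prime, f_inverse, x):
    """Right-hand side of (8.1): 1 / f'(f^{-1}(x))."""
    return 1 / f_prime(f_inverse(x))

def central_difference(g, x, h=1e-6):
    """Symmetric difference quotient approximating g'(x)."""
    return (g(x + h) - g(x - h)) / (2 * h)

x = 9.0
# f(u) = u^2 has f'(u) = 2u and inverse sqrt, so the rule gives 1/(2*sqrt(9))
by_rule = inverse_derivative(lambda u: 2 * u, math.sqrt, x)
by_limit = central_difference(math.sqrt, x)

print(by_rule, by_limit)
```

The two numbers agree to many decimal places, and both are close to $1/6$.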
We’ve already computed the derivatives of the basic trig functions (parts 6, 7 and 8). What
remains are the inverse trig functions. Use the inverse function rule, obviously! For example,
if $f(x) := \sin x$ then the derivative of arcsin is computed by
$$\frac{d}{dx} \arcsin x = \frac{1}{\cos(\arcsin x)}.$$
Some of you may recognize the identity $\cos(\arcsin y) = \sqrt{1 - y^2}$. In case not, it’s an
easy piece of geometry. For any $y \in [-1, 1]$, $\arcsin y$ is a value between $-\pi/2$ and $\pi/2$,
denoted by $\theta$ in Figure 32. In the figure, the measure of BC is $|y|$ and the measure of
AC is $\cos\theta = \cos(\arcsin y)$, and the Pythagorean theorem shows what we want, namely
$\cos(\arcsin y) = \sqrt{1 - y^2}$. Hence $(d/dx)\arcsin x = 1/\sqrt{1 - x^2}$.
Figure 32
Sometimes two quantities vary with time and one is a function of the other. In this case,
the rate of change of one quantity determines the rate of change of the other. In old style
textbooks, this was a major topic even though there isn’t all that much to say. We think it
is more proportionate to illustrate with one example, give you a few practice problems and
call it a day.
Example 8.2. Suppose the volume of a balloon increases as a function of time. The radius,
being a function of the volume, will therefore increase at a different rate. Writing R = f (V )
and V = g(t), we have R = f (g(t)). Therefore by the chain rule,
$$\frac{dR}{dt} = \frac{dR}{dV} \, \frac{dV}{dt}.$$
This notation hides where each derivative is evaluated but the meaning is clear. Letting
primes denote time derivatives, $R' = V' \cdot dR/dV$.
The rate of increase of radius and the rate of increase of volume are therefore called related
rates. Knowing one always gives you the other, provided you know the present volume and
can compute $dR/dV$. For a spherical balloon, $V = (4\pi/3)R^3$, therefore $R = \sqrt[3]{3V/(4\pi)} =
(3/(4\pi))^{1/3} V^{1/3}$ and we can compute $dR/dV = (1/3) \cdot (3/(4\pi))^{1/3} V^{-2/3}$. In other words,
if the present volume is $V$, then the rate the radius is growing in, say, cm/sec, is equal
to $\sqrt[3]{3/(4\pi)}/3$ times the rate the volume is growing in cm$^3$/sec divided by the two thirds
power of the volume.
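Here is the balloon computation as a short sketch. The numbers are hypothetical (a balloon of radius 3 cm gaining 10 cm$^3$ of volume per second), and the function name is ours. As a cross-check, differentiating $V = (4\pi/3)R^3$ with respect to time gives $V' = 4\pi R^2 R'$, so the same rate should equal $V'/(4\pi R^2)$.

```python
import math

def radius_rate(volume, volume_rate):
    """dR/dt = (dR/dV) * (dV/dt) for a spherical balloon."""
    dR_dV = (1 / 3) * (3 / (4 * math.pi)) ** (1 / 3) * volume ** (-2 / 3)
    return dR_dV * volume_rate

radius = 3.0                                   # cm, hypothetical
volume = (4 * math.pi / 3) * radius ** 3       # cm^3
volume_rate = 10.0                             # cm^3 per second, hypothetical

rate = radius_rate(volume, volume_rate)
cross_check = volume_rate / (4 * math.pi * radius ** 2)
print(rate, cross_check)
```

The two expressions agree, which is reassuring since they are the same chain rule computation written in terms of $V$ in one case and $R$ in the other.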
(iv) The units of h were not given. Did you choose units? Does
this affect the answer?
Recall that $e$ was defined to be the positive real number such that $e^x$ has slope 1 at $x = 0$.
In other words, by definition,
$$\lim_{h \to 0} \frac{e^h - 1}{h} = 1.$$
From this we can compute the derivative of $e^x$ at any point. Let $f(x) := e^x$. Then
$$f'(x) = \lim_{h \to 0} \frac{e^{x+h} - e^x}{h} = \lim_{h \to 0} e^x \, \frac{e^h - 1}{h} = e^x \lim_{h \to 0} \frac{e^h - 1}{h} = e^x.$$
Next, for some $a > 0$, let $f(x) := a^x = (e^{\ln a})^x = e^{x \ln a}$. The chain rule gives
$$f'(x) = e^{x \ln a} \, \frac{d}{dx}(x \ln a) = a^x \ln a.$$
At this point, we have derived parts 1, 3 and 4 of Proposition 5.8. For Part 5, letting $f(x) :=
e^x$ so $f^{-1}(x) = \ln x$, we use the inverse function rule Proposition 8.1 and $(d/dx)e^x = e^x$ to
obtain
$$\frac{d}{dx} \ln x = \frac{1}{e^{\ln x}} = \frac{1}{x}.$$
Paying back a debt, this finishes off Part 2 of Proposition 5.8. For any real $r$ and positive
$x$, let $f(x) := x^r = e^{r \ln x}$ and use the chain rule to obtain
$$f'(x) = e^{r \ln x} \, \frac{d}{dx}(r \ln x) = x^r \cdot \frac{r}{x} = rx^{r-1}.$$
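These two formulas are easy to test against difference quotients, including for an irrational exponent. The sketch below makes arbitrary choices that are ours alone: base $a = 2$, exponent $r = \pi$, evaluation point $x = 1.5$, and a central difference with step $10^{-6}$ as the numerical stand-in for the derivative.

```python
import math

def central_difference(g, x, h=1e-6):
    """Symmetric difference quotient approximating g'(x)."""
    return (g(x + h) - g(x - h)) / (2 * h)

a, r, x = 2.0, math.pi, 1.5

exp_rule = a ** x * math.log(a)            # derivative of a^x is a^x ln a
power_rule = r * x ** (r - 1)              # derivative of x^r is r x^(r-1)

exp_numeric = central_difference(lambda t: a ** t, x)
power_numeric = central_difference(lambda t: t ** r, x)

print(exp_rule, exp_numeric)
print(power_rule, power_numeric)
```

In each row the formula and the difference quotient agree to several decimal places.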
Differential equations
The course after this one studies differential equations. This semester we get only a tiny
preview of this subject. A differential equation arises when you have a function that is
unknown and your information about it includes something about the derivative. The
simplest example is when you know the derivative outright, for example $f'(t) = 5 + 4t$.
Integral calculus then produces a formula for $f$. In this case, because you are familiar with
derivatives of polynomials, you can probably recognize the solution $f(t) = 5t + 2t^2$. There
are other possible solutions, all differing by a constant, for example $f(t) = 1 + 5t + 2t^2$. The
general solution is $f(t) = c + 5t + 2t^2$ where $c$ can be any constant. Further information is
required to figure out c; if you know even a single value of f , such as f (7) = −2, you can
solve for c.
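Solving for $c$ from the single value $f(7) = -2$ is one line of arithmetic: $c = -2 - (5 \cdot 7 + 2 \cdot 7^2) = -135$. The same step as a sketch, with function names of our own choosing:

```python
def general_solution(t, c):
    """The family f(t) = c + 5t + 2t^2 solving f'(t) = 5 + 4t."""
    return c + 5 * t + 2 * t ** 2

def constant_from_value(t0, f0):
    """Pick c so that the solution passes through (t0, f0)."""
    return f0 - (5 * t0 + 2 * t0 ** 2)

c = constant_from_value(7, -2)
print(c, general_solution(7, c))  # -135 -2
```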
The differential equation we will study here is the next simplest one: $f'(t) = kf(t)$. This is
more subtle because the derivative is not given outright but rather is related to the function
itself (of course $f$ represents the same function on both sides of the equation). The method
of solution of this equation is similar to the previous example. You can solve it because you
can recall a function that behaves this way, namely the function $f(t) := e^{kt}$. That is the
simplest looking solution but there are others. The most general solution is $f(t) = Ae^{kt}$
where $A$ can be any constant. When you study methodical solutions to differential equations
you will be able to prove that these are the only solutions. In the present course, we won’t
discuss the problem at that level but you are free to assume this is true: if $f'(t) = kf(t)$
for all $t$ then $f(t) = Ae^{kt}$ for some real number $A$. Note that the constant $k$ is not like the
constant $A$: the constant $k$ is part of the equation you were given; altering it will make the
function no longer a solution to the equation.
When $k = -\ell < 0$, this is called exponential decay. The classic example of exponential decay
is a radioactive material breaking down through alpha or beta decay. Other things that
decay exponentially under the right circumstances are temperature difference, correlations
in time series data and valuations of future goods. These examples were mentioned briefly
in Section 2.3. Calculus gives a reason to believe that exponential growth and decay are
plausible models for these physical phenomena. It is because the underlying mechanisms
force $f'$ to be proportional to $f$.
Exercise 8.4. Suppose the underlying mechanisms force $f'$ to be proportional to $f - c$ rather
than $f$. Write down a guess as to what the differential equation would look like.
Time constants
Suppose $f(t) := Ae^{kt}$ where $t$ is in units of time and $f$ is a quantity in some units we will
just refer to as “units of $f$”. Recall from the introduction to units early in the course that
the exponent $kt$ is required to be unitless if the expression is to make physical sense. That
means the constant k has to have units of inverse time. Such constants are called time
constants.
At first these can be difficult to make physical sense of. We understand the quantity 0.02
days, but what is the physical significance of the quantity 0.02 inverse days? Most directly
it means that if $t$ is the reciprocal, namely 50 days, then $kt = 1$ (unitless) and the quantity
$Ae^{kt}$ is $A \cdot e$, a factor of $e$ greater than it was at the start (because at the start, $Ae^{k \cdot 0} = A$).
Exercise 8.5. In March, 2020, the U.S. COVID-19 epidemic was increasing exponentially
with a time constant of 1.4 inverse weeks. By roughly what factor did the number of total
cases increase each week in March?
Which is bigger, an inverse second or an inverse minute? Minutes of course are much longer
than seconds: one minute equals 60 seconds. On paper one can convert between inverse time
units as well. For example,
$$1\,\text{sec}^{-1} = \frac{1}{\text{sec}} \cdot \frac{60\,\text{sec}}{1\,\text{minute}} = \frac{60}{\text{minute}} = 60\,\text{min}^{-1}$$
so one inverse second is 60 inverse minutes. To make this a little more intuitive, think of one
inverse second as 1/sec which we might say aloud as “one per second”. The phrases
“one per second” and “sixty per minute” should sound believably the same.
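Both observations about time constants can be checked in a few lines. This is a sketch with helper names of our own: it uses the time constant of 0.02 inverse days discussed above, and it confirms that a rate expressed per second or per minute describes the same growth over the same stretch of time.

```python
import math

def growth_factor(k, elapsed):
    """Factor by which A * e^(k t) is multiplied over `elapsed` time units."""
    return math.exp(k * elapsed)

def per_second_to_per_minute(rate_per_second):
    """Convert an inverse-second quantity to inverse minutes."""
    return rate_per_second * 60

# A time constant of 0.02 inverse days, run for 50 days, gives a factor of e
print(growth_factor(0.02, 50), math.e)

# One second of growth at 1 per second equals 1/60 minute at 60 per minute
k_sec = 1.0
k_min = per_second_to_per_minute(k_sec)
print(growth_factor(k_sec, 1.0), growth_factor(k_min, 1 / 60))
```

The product $kt$ is unitless in every case, which is why changing units for $k$ and $t$ together leaves the growth factor alone.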
8.4 Tangent line estimates and bounds using calculus
Let’s sum up what we already know about the tangent line approximation, this time in the
language of calculus. If f is a function differentiable in an open interval I containing a,
then the tangent line approximation to $f(x)$ at $a$ is the function
$$L(x) := f(a) + f'(a)(x - a).$$
Exercise 8.7. Compute the tangent line approximation for $f(x) := \sqrt[3]{1 + x}$ near $x = 0$.
What quick estimate does this give of $\sqrt[3]{1.06}$? Please check this against a numerical
computation on your computer and say how close the quick estimate was.
If f is twice differentiable in I and $f'' \geq 0$ on I then $L(x) \leq f(x)$ for all $x \in I$, that is, the
tangent line approximation is a lower bound for the actual value. Reversing the inequality
to $f'' \leq 0$ reverses the conclusion to $L(x) \geq f(x)$. Making the inequality strict makes the
conclusion strict, except at a where f and L always agree; see Figure 33.
Figure 33: $f'' < 0$ on the interval shown, hence for any a, $L_a(x) \geq f(x)$ with equality only
at x = a.
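The bound can be checked numerically. The sketch below is an illustration under assumptions, not part of the original text: it uses $f(x) = e^x$ at $a = 0$ (chosen because $f'' = e^x > 0$ everywhere), and the helper name `tangent_line` is made up for this example.

```python
import math

def tangent_line(f, fprime, a):
    """Return the function L(x) = f(a) + f'(a) * (x - a)."""
    value, slope = f(a), fprime(a)
    return lambda x: value + slope * (x - a)

# Illustrative choice (not from the text): f(x) = e^x, whose second
# derivative e^x is positive everywhere, so L should lie below f.
L = tangent_line(math.exp, math.exp, 0.0)   # here L(x) = 1 + x

for x in [-0.5, -0.1, 0.1, 0.5]:
    gap = math.exp(x) - L(x)                # f(x) - L(x), expected >= 0
    print(f"x = {x:5.2f}   L(x) = {L(x):.5f}   f(x) = {math.exp(x):.5f}   f - L = {gap:.5f}")
```

In every row the gap is nonnegative, and it shrinks quickly as x approaches a = 0.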
Exercise 8.8. Compute the tangent line approximation to sin(π/5) at any nearby point a
where you know the value of sin(a). Write the result as an algebraic expression involving π
and say whether this is an upper bound, lower bound or neither.
We have said before that f (x) ≈ L(x) when x is near a. How close are these two? At
the end of the course you will see that L is just the first in a series of estimates P1 , P2 , · · ·
that approximate f better and better. These are the Taylor polynomials, the first being
linear, the second quadratic, and so on, the nth one having degree n. Each one is the best
approximation for a polynomial of its degree. How good an approximation are they? The
degree n Taylor polynomial differs from f at x by a term on the order of $(x - a)^{n+1}$ (see footnote 10).
Because the tangent line approximation $L = P_1$ is the first, it differs from f by on the order
of $(x - a)^2$, meaning possibly $2(x - a)^2$ or $10(x - a)^2$ but not anything $\gg (x - a)^2$ as $x \to a$.
When talking about orders of magnitude of functions near a, recall that $(x - a)^2 \ll |x - a|$,
in other words the difference between f and L at any x is much less than the distance that x
is from a. The above facts about Taylor polynomials are a preview. We won’t discuss them
more now, but instead will focus only on $P_1$, which is also denoted L. The following proposition is
weaker than what we just told you about how close the tangent line approximation is, but
has the virtue of being easy to prove.
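Proposition 8.3. The tangent line approximation is better than linear, meaning that
$|L(x) - f(x)| \ll |x - a|$ as $x \to a$.
It is enough to show that
$$\lim_{x \to a} \frac{L(x) - f(x)}{x - a} = 0,$$
because the corresponding statement with absolute values then follows
by composition with the absolute value function, which is continuous. We evaluate this: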
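One way the evaluation can go is sketched here. It assumes the standard definition $L(x) = f(a) + f'(a)(x - a)$ and that f is differentiable at a; it is a reconstruction, not necessarily the authors' own wording.

```latex
\begin{align*}
\lim_{x \to a} \frac{L(x) - f(x)}{x - a}
  &= \lim_{x \to a} \frac{f(a) + f'(a)(x - a) - f(x)}{x - a} \\
  &= f'(a) - \lim_{x \to a} \frac{f(x) - f(a)}{x - a} \\
  &= f'(a) - f'(a) = 0 .
\end{align*}
```

The last limit is the definition of the derivative $f'(a)$, which is why differentiability at a is all that is needed.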
Exercise 8.9. Using a calculator, compute the difference between the cube root of 1.06 and
your tangent line estimate in Exercise 8.7. Does this corroborate Proposition 8.3? Does it
corroborate the assertion that $|P_1 - f|$ should be on the scale of $|x - a|^2$? In each case say
why or why not.
Footnote 10: We have not formally introduced the phrase “on the order of” but what we mean here is that the first
quantity should not be much more than the second: it should not be true that $|x - a|^{n+1} \ll |P_n(x) - f(x)|$.
The mean value theorem
In class we will discuss the following theorem. Please read it now to see whether it makes
intuitive sense to you. The hypotheses will be filled in after the class discussion centered
on the counterexamples in Figure 34.
Figure 34: In each case the dashed line illustrates the average slope $\frac{f(b) - f(a)}{b - a}$.
Theorem 8.4 (Mean value theorem). Let f be a function and a < b be real numbers.
Assuming some hypotheses , there must be a number
c ∈ (a, b) where the slope of f is equal to the average slope over (a, b), that is,
$$f'(c) = \frac{f(b) - f(a)}{b - a}. \tag{8.2}$$
Example 8.5. Let f (x) be the position (mile marker) of a PA Turnpike driver at time x.
Suppose the driver entered the Turnpike at Mile 75 (New Stanton) at 4pm and exited at Mile
328 (Valley Forge) at 7pm. What does the Mean Value Theorem tell you in this case? The
average slope of f over the interval [4pm, 7pm] is the difference quotient (f (7) − f (4))/(7 − 4) =
(328 − 75)/3 = $84\tfrac{1}{3}$. Thus there is some time c between 4pm and 7pm at which $f'(c) = 84\tfrac{1}{3}$
MPH, in other words, at which this driver was traveling at a speed of $84\tfrac{1}{3}$ MPH. Bonus question:
can the Mean Value Theorem be used in court by Law Enforcement? It has been ruled in
some states that this is legal evidence of the car having violated a speed limit, but not that
the particular driver has done so.
Exercise 8.10. Let f (x) := 1/x and let a < b be positive real numbers. What, explicitly in
terms of a and b, is the number c guaranteed by the Mean value theorem?
9 Summation
9.1 Sequences
On page 51 we briefly discussed sequences. When working with sums and the “Sigma”
notation for summations, you need to be able to write formulas for sequences you understand
intuitively. For example, if you want to write the sequence 7, 9, 11, 13, . . . in the notation
$\{b_n : n \geq 1\}$, so that $b_1 = 7$, $b_2 = 9$ and so on, one choice would be to say,
“Let $b_n := 5 + 2n$ for each $n \geq 1$.”
The subscript n is called the index (plural: indices; see footnote 11). Indexing can begin at any natural
number. In this case, as is most common, we began at n = 1. Defining {bn : n ≥ 3}
by bn := 1 + 2n yields the same sequence: 7, 9, 11, 13, . . .. Secondly, the informal notation
7, 9, 11, 13, . . . is not mathematically precise because it assumes we all agree exactly what
the pattern is. Producing a formula for the nth term removes any ambiguity. A formula is
often necessary if you want to sum the sequence or to use it to define other sequences. This
section considers some common types of sequences and gives you some practice writing a
formula for the general term.
Exercise 9.1. Write a formula for the general term of the “place value” sequence 1, 10, 100,
1000, 10000, . . .. You can choose any letter for the indexing variable (we chose n above),
the sequence name (we chose b above) and the starting index (we chose 1 at first, then 3
for contrast). Whatever you choose, write the definition in a full sentence, similar to the
quoted sentence above.
Definition 9.1. A sequence is called arithmetic (adjective, accent on the third syllable,
to rhyme with “alphabetic”) if the difference between successive terms is constant.
Our example sequence 7, 9, 11, 13, . . . is an arithmetic sequence with common difference 2.
It is particularly easy to write a formula for the general term of an arithmetic sequence if
you start indexing at zero. The nth term is the zeroth term plus n copies of the common
difference. In notation, if the common difference is d and the sequence is $\{a_k : k \geq 0\}$,
this means $a_k = a_0 + kd$. Setting $a_0 = 7$ and $d = 2$ gives $a_k = 7 + 2k$ for the sequence
7, 9, 11, 13, . . ..
Footnote 11: We don’t absolutely need new notation. A sequence could be thought of as a function $n \mapsto b_n$ from the
natural numbers to the real numbers, but the notation is useful because it sets us up to imagine that we
will be looking at the numbers $b_1, b_2, \ldots$ rather than the relationship between n and $b_n$. In fact we define
sequences to be the same if the numbers are the same, even if the indices are different.
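The point that one sequence can be written with different starting indices can be checked with a small sketch. The helper name `terms` is hypothetical, and the formula 5 + 2n for indexing from 1 is inferred from $b_1 = 7$, $b_2 = 9$.

```python
from itertools import count, islice

def terms(formula, start, how_many=4):
    """List the first few terms of a sequence from its formula and starting index."""
    return [formula(n) for n in islice(count(start), how_many)]

print(terms(lambda n: 5 + 2 * n, start=1))   # b_n := 5 + 2n for n >= 1
print(terms(lambda n: 1 + 2 * n, start=3))   # b_n := 1 + 2n for n >= 3
print(terms(lambda k: 7 + 2 * k, start=0))   # a_k := 7 + 2k for k >= 0
```

All three calls list the same numbers 7, 9, 11, 13, so by the convention in footnote 11 they define the same sequence.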
Exercise 9.2. Which of these sequences appear to be arithmetic sequences?
(i) 9, −11, 13, −15, . . .
(ii) sin(1), sin(3), sin(5), sin(7), . . .
(iii) 30, 27, 24, 21, . . .
(iv) the sequence defined for n ≥ 0 by bn := 1/(5 + 2n)
(v) the sequence defined for n ≥ 0 by bn := 14 − n/2
Definition 9.2. A sequence is called geometric if the ratio between successive terms is
constant. In other words, if the sequence is $\{u_j\}$, then the ratio $u_{j+1}/u_j$ has some common
value r for all j.
For example, the sequence 10, 20, 40, 80, 160, . . . is geometric with common ratio 2.
Exercise 9.3. Write a formula for the general term of this geometric sequence.
Sequences with alternating signs appear often enough that it’s a good idea to know a way
to write their general term. The key to being able to write such sequences is to notice that
$(-1)^n$ bounces back and forth between +1 and −1. The odd terms are negative, so starting
with n = 1 (or 3 or 5, etc.) starts with −1 whereas starting with 0 (or 2, −2, etc.) starts
with +1. You can incorporate this in a sum as a multiplicative factor and it will change
the sign of every second term. Thus for example, to write the sequence 1, −2, 3, −4, . . . you
can write $(-1)^{n+1} \cdot n$. Note that we used $(-1)^{n+1}$ rather than $(-1)^n$ so that the term
corresponding to n = 1 was positive rather than negative.
When the sequence has a pattern that takes a couple of steps to repeat, the greatest integer
function can be useful. For example, 1, 1, 1, 2, 2, 2, 3, 3, 3, . . . can be written as $a_n := \left\lfloor \frac{n+2}{3} \right\rfloor$
for $n \geq 1$. Actually, it comes out a little more simply if you index starting from zero:
$a_n := \left\lfloor \frac{n}{3} \right\rfloor + 1$ for $n \geq 0$.
Definitions by cases work for sequences just the way they do for functions. Suppose you
want to define a sequence with an opposite sign on every third term, such as −1, −1, 1, −1, −1, 1, . . ..
You can do this by cases as follows.
$$a_n := \begin{cases} -1 & n \text{ is not a multiple of } 3 \\ 1 & n \text{ is a multiple of } 3 \end{cases}$$
Plenty of sequences don’t fit any of these molds. Writing a formula for the general term is
a matter of trying an expression, seeing if it works, then if not, tinkering to get it right.
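The "try it, see if it works, tinker" loop is easy to run on a computer. The following sketch encodes the three formulas discussed above; the function names are made up for illustration, and Python's `//` plays the role of the greatest integer function for nonnegative integers.

```python
def alternating(n):
    """(-1)^(n+1) * n for n >= 1: 1, -2, 3, -4, ..."""
    return (-1) ** (n + 1) * n

def triples(n):
    """floor(n/3) + 1 for n >= 0: 1, 1, 1, 2, 2, 2, ..."""
    return n // 3 + 1

def every_third(n):
    """Definition by cases for n >= 1: -1, -1, 1, -1, -1, 1, ..."""
    return 1 if n % 3 == 0 else -1

print([alternating(n) for n in range(1, 5)])
print([triples(n) for n in range(0, 9)])
print([(n + 2) // 3 for n in range(1, 10)])   # same terms, indexed from 1
print([every_third(n) for n in range(1, 7)])
```

Printing the first few terms and comparing them with the intended pattern is a quick way to catch an off-by-one error in the index.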
Let’s talk for a minute about a notation you have likely seen before. It is called the “Sigma”
notation because Σ is a capital Greek Sigma. The notation involves an indexing variable
which runs between a lower limit and an upper limit. The lower and upper limits are
required to be integers (see footnote 12). If the indexing variable is n, the lower limit is L, the upper limit
is U and the general term is $b_n$, the summation looks like $\sum_{n=L}^{U} b_n$. What this means is to
add together all the values of $b_n$ starting with n = L and ending with n = U.
Example 9.3. $\sum_{n=1}^{5} 2^{-n}$ represents the sum $\frac{1}{2} + \frac{1}{4} + \frac{1}{8} + \frac{1}{16} + \frac{1}{32}$.
The summand, as you can see, is usually a function of the indexing variable; otherwise, the
summand would not change from term to term. There may be other variables, for example
$\sum_{k=3}^{6} kx$ evaluates to 3x + 4x + 5x + 6x, which is equal to 18x. Note that this other variable
x persists when the sum is evaluated. It is a free variable. On the other hand, the index of
summation, k in this case, is a bound variable. It runs over a set of values (in this case 3
to 6) and does not appear in the final value.
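In a program the distinction between bound and free variables shows up naturally. This is a minimal sketch; the helper name `sigma` and the sample value x = 10 are assumptions made for illustration.

```python
from fractions import Fraction

def sigma(term, lower, upper):
    """Add term(n) for every integer n with lower <= n <= upper."""
    return sum(term(n) for n in range(lower, upper + 1))

# Example 9.3: the index n is bound; it exists only inside the sum.
print(sigma(lambda n: Fraction(1, 2**n), 1, 5))

# The sum of k*x for k = 3..6: k is bound, x is free and must be
# supplied from outside the sum (10 is an arbitrary sample value).
x = 10
print(sigma(lambda k: k * x, 3, 6))   # 18 * x
```

Note that `range(lower, upper + 1)` is needed because Python ranges exclude their upper end, while Sigma notation includes the upper limit.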
Exercise 9.4. In the sum $\sum_{k=1}^{n} \frac{C}{k}$, which of the variables k, n and C are free and which are
bound?
Example 9.4. The sum $\sum_{n=5}^{19} \frac{3}{n-2}$ represents a series with 15 terms because there are 15
integers in the range from 5 to 19. Informally, we might write this sum by writing the
first few terms and the last term, with dots in between (traditionally the dots are centered
for series, as opposed to at the bottom of the line for sequences). Thus we would write
$\frac{3}{3} + \frac{3}{4} + \cdots + \frac{3}{17}$, assuming this conveyed enough information for the reader to understand
the precise sum. Of course there is no reason why the index should go from 5 to 19. There
have to be fifteen terms, but why not write the sum with the index going from 1 to 15?
Then it would look like
$$\sum_{n=1}^{15} \frac{3}{n+2}.$$
Exercise 9.5. Write a summation that sums the integers from 1 to 100 for which the lower
limit is −5.
The series in Example 9.4 sums to a rational number. According to Excel it is equal to
23763863/4084080. There isn’t any really nice formula for this sum in terms of the values
5 and 19 and the function $n \mapsto 3/(n - 2)$. In fact most series don’t have nice summation
formulas. Arithmetic and geometric series are exceptions. Because they are common and
the formulas are simple and useful, we include them in this course.
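The value quoted for Example 9.4 can be reproduced with exact rational arithmetic instead of a spreadsheet. This sketch uses Python's standard `fractions` module; the variable names are illustrative.

```python
from fractions import Fraction

# Example 9.4 with the index running from 5 to 19 ...
original = sum(Fraction(3, n - 2) for n in range(5, 20))
# ... and re-indexed so that the index runs from 1 to 15.
shifted = sum(Fraction(3, n + 2) for n in range(1, 16))

print(original, shifted, float(original))
```

Both forms give the same fraction, 23763863/4084080, which is roughly 5.82.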
Arithmetic series
Here’s an example of how to sum an arithmetic series, which generalizes easily to summing
any arithmetic series. This particular example is a well known piece of mathematical folklore
(google “Gauss child sum”).
Example 9.5. Problem: Sum the numbers from 1 to 100. Solution: Pair the numbers
starting from both ends: 1 pairs with 100, 2 pairs with 99, and so forth, ending at 50 paired
with 51. There are 50 pairs each summing to 101, so the sum is 50 × 101 = 5050.
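The pairing argument can be mirrored step by step in code. This is only a check of this particular example, with made-up variable names.

```python
# Pair the numbers from both ends: (1, 100), (2, 99), ..., (50, 51).
pairs = [(k, 101 - k) for k in range(1, 51)]

pair_sums = {low + high for low, high in pairs}   # every pair sums to 101
total_by_pairing = len(pairs) * 101
total_by_adding = sum(range(1, 101))

print(len(pairs), pair_sums, total_by_pairing, total_by_adding)
```

The two totals agree, which confirms that no number was paired twice or left out.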
Advice: this is more general than our usual exercise. You might find it easier to try a few
examples with numbers before doing the exercise with algebraic expressions.
Geometric series
The standard trick for summing geometric series is to notice that the sum and r times the
sum are very similar. It is easiest to explain with an example.
Example 9.7. Evaluate $\sum_{n=1}^{10} 7 \cdot 4^{n-1}$.
To do this we let S denote the value of the sum. We then evaluate S − 4S (because r = 4).
I have written this out so you can see the cancellation better.
$$\begin{aligned} S - 4S = {} & 7 + 28 + 112 + \cdots + 7 \cdot 4^9 \\ & - (28 + 112 + \cdots + 7 \cdot 4^9 + 7 \cdot 4^{10}). \end{aligned}$$
Thus,
$$(1 - 4)S = 7 - 7 \cdot 4^{10}.$$
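Solving for S finishes the example. The arithmetic below is a worked completion added for convenience, using $4^{10} = 1048576$.

```latex
\begin{align*}
-3S &= 7 - 7 \cdot 4^{10} \\
S &= \frac{7\,(4^{10} - 1)}{3} = \frac{7 \cdot 1048575}{3} = 7 \cdot 349525 = 2446675 .
\end{align*}
```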
Exercise 9.7. The chance that it takes precisely n rolls of a standard die in order to roll
your first 6 is $(5/6)^{n-1}(1/6)$. Sum 10 terms of a geometric series to find the chance that
you first see a 6 by the time of your tenth roll.
General case: Evaluate $\sum_{n=1}^{M} A \cdot r^{n-1}$.
$$S = A\,\frac{1 - r^M}{1 - r}.$$
When A and r are positive, all the terms are positive, hence the sum is positive as well.
When r < 1 this is very evident from the formula. When r > 1 it is true as well, but easier
to see after multiplying top and bottom by −1 so as to get $A(r^M - 1)/(r - 1)$. When r = 1 this
quotient is undefined, however the sum is very easy: M copies of A sum to A · M .
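The general formula, including the special case r = 1, might be coded as follows. The function name `geometric_sum` is hypothetical, and exact fractions are used so that the comparison with term-by-term addition is not blurred by rounding.

```python
from fractions import Fraction

def geometric_sum(A, r, M):
    """Sum of A * r**(n-1) for n = 1, ..., M, using the closed form."""
    if r == 1:
        return A * M                      # M copies of A; the quotient is undefined
    return A * (1 - r**M) / (1 - r)

# Example 9.7: A = 7, r = 4, M = 10.
closed_form = geometric_sum(Fraction(7), Fraction(4), 10)
term_by_term = sum(7 * 4 ** (n - 1) for n in range(1, 11))
print(closed_form, term_by_term)

# A ratio below 1 and the degenerate ratio r = 1.
print(geometric_sum(Fraction(1), Fraction(1, 2), 5))
print(geometric_sum(Fraction(3), Fraction(1), 5))
```

Handling r = 1 as a separate branch avoids a division by zero, matching the remark that the sum is then simply A · M.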
No discussion of series would be satisfying if it didn’t answer the question, “Is 0.9999 . . .
(repeating) actually equal to 1?” As you can probably guess, it is a matter of definition.
However, there is a standard definition, and therefore we can in fact supply an answer (see
below).
Definition 9.8.
$$\sum_{n=L}^{\infty} b_n := \lim_{M \to \infty} \sum_{n=L}^{M} b_n.$$
This definition might require a bit of unpacking. First of all, the colon-equal is right: the
symbol $\sum_{n=L}^{\infty} b_n$ on the left is not already defined, and we are defining it to be the value
on the right. So what we are saying is that the sum of an infinite series is the limit of a
certain sequence, called the sequence of partial sums.
Example 9.9. How does this definition apply to the so-called harmonic series, $\sum_{n=1}^{\infty} 1/n$?
It says that this infinite sum is equal to the limit of the sequence $\{H_M\}$, where $H_M$ is the
harmonic number $\sum_{n=1}^{M} 1/n$. The harmonic numbers $H_M$ are said to be the partial
sums of the harmonic series. Interpreting the infinite sum in this way doesn’t tell us
whether the limit is defined, or if so, what it is; it just tells us that if we can evaluate
the limit $\lim_{M \to \infty} H_M$, this is by definition the sum of the harmonic series. If the limit is
undefined, then the sum of the harmonic series is undefined.
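Partial sums are easy to tabulate, which gives some feel for the sequence whose limit is in question. This is a sketch with an assumed function name; floating point is used, so the printed values are approximate.

```python
def harmonic_number(M):
    """Partial sum H_M = 1/1 + 1/2 + ... + 1/M of the harmonic series."""
    return sum(1 / n for n in range(1, M + 1))

for M in [1, 10, 100, 1000, 10000]:
    print(f"H_{M} is approximately {harmonic_number(M):.4f}")
```

A table like this is only suggestive: a finite list of partial sums cannot by itself settle whether the limit exists.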
Exercise 9.8. The alternating harmonic series is the series 1 − 1/2 + 1/3 − 1/4 + · · · .
1. Write this as an infinite summation.
2. Write the value of this infinite sum as a limit.
3. State your guess as to whether this limit is defined; if so, estimate (unscientifically)
what it is; if not, say whether or not you think the limit is ∞ or −∞.
Because we know how to sum finite geometric series, we can sum infinite geometric series.
Example 9.10. Problem: evaluate 1 + 1/2 + 1/4 + 1/8 + · · · . Solution: this is the infinite
series $\sum_{n=0}^{\infty} (1/2)^n$. The value is the limit of the partial sums $S_M := \sum_{n=0}^{M} (1/2)^n$.
Evaluating these finite sums gives
$$S_M = \frac{1 - (1/2)^{M+1}}{1 - 1/2} = 2 - \frac{1}{2^M}.$$
The infinite sum is then $\lim_{M \to \infty} \left(2 - (1/2)^M\right)$, which is clearly equal to 2.
Exercise 9.9. Write 0.9999 . . . (repeating) as an infinite geometric series, then evaluate it
to see if it is really equal to 1.
Consider a mortgage loan (loan for a house) or car loan, at an annual interest rate r. Typically
payments on these are made monthly, which we will take to be every 1/12 of a year instead
of counting days (most car loans in fact assume this). Recall from Exercise 6.8 that the
one-month growth factor (the factor by which your debt grows each month) is $e^{r/12}$. That’s
only if you don’t pay off the loan. Actually, these loans are typically configured so you pay a
fixed amount every month until the loan is paid off in an integer number of months (usually,
in fact, an integer number of years). To agree on some notation, let r be the annual interest
rate, P be the principal, that is the initial debt, and let M be the monthly payment.
In order to deal successfully with car sales people, it’s helpful to understand how these
determine your balance over the successive months. The key relation is to understand what
happens from one month to the next. We will discuss this, then leave the rest of the balance
sheet computation for in-class discussion and homework. To determine your debt after a
month, just take your initial debt P, multiply by the factor $e^{r/12}$ for the growth of the debt
over the first month, and subtract the amount you just paid off, namely M. We can write
this as $P_1 = e^{r/12} P_0 - M$, where $P_0 = P$. It holds equally from any month to the next: $P_{n+1} = e^{r/12} P_n - M$,
where $P_n$ is your debt after n months.
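The month-to-month relation translates directly into a loop. In this sketch the function name `balances` and the sample numbers (a 10,000 dollar debt at 6% annual interest with 200 dollar payments) are invented for illustration only.

```python
import math

def balances(principal, annual_rate, payment, months):
    """Return [P_0, P_1, ..., P_months] using P_{n+1} = e^(r/12) * P_n - M."""
    growth = math.exp(annual_rate / 12)   # one-month growth factor
    history = [principal]
    for _ in range(months):
        history.append(growth * history[-1] - payment)
    return history

# Hypothetical loan: P = 10000, r = 0.06, M = 200.
for month, debt in enumerate(balances(10000, 0.06, 200, 6)):
    print(f"after {month} months: {debt:.2f}")
```

Each balance depends only on the previous one, which is why a single line inside the loop is enough.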
How about your retirement account? Say while you’re working, you put M dollars every
month into an interest bearing account. How much do you have after n months? It’s the
same formula, with an opposite sign because you’re adding to your balance, not subtracting.
Exercise 9.10. Write a formula for your retirement balance after n + 1 months, Pn+1 , in
terms of your balance Pn after n months.
A guaranteed rate annuity works similarly. By the time you retire you have put P dollars
into an account. (How did this happen? See Exercise 9.10.) You hand this over to a
company who guarantees you a certain APY every year, call it Y . Each year you also
withdraw a fixed amount to live on, call it M .
Exercise 9.11. Write a formula for Pn+1 in terms of Pn , Y and M .
The University of Pennsylvania’s endowment works something like this. The balance increases
by roughly 5% each year due to the growth of the investments and new donations.
Meanwhile, during the year, the university spends roughly 3.4% of the present endowment.
Unlike the formula for growth of a retirement fund or reduction of debt, this one is only
approximate because the actual return varies. Nevertheless, it is useful for forecasting. Let
En denote the size of the endowment after n years.
Exercise 9.12. What is the relation of En to En+1 ? In what way does this formula differ
from the other three (loan, retirement account, annuity)?
10 Integrals
10.1 Area
Integrals compute many things, the most fundamental of these being area. The definition
of area is more subtle than one might think. Most people’s understanding of area is based
on a physical concept of how much two-dimensional space is taken up. For example, if you
have to paint an irregular flat shape, how much paint does it take?
Looking back at the treatment of area in the pre-college math curriculum, you can see the
steps toward a mathematical definition. First, for rectangles with integer sides a and b,
you can count the number of 1 × 1 squares needed to make the rectangle, leading to the
area formula A = a × b. From the physical point of view this is a formula, but from the
mathematical point of view it is a definition, extended later to non-integer side lengths.
Areas of triangles are not studied until much later. For right triangles with sides a and
b and hypotenuse c, the area is shown to be equal to ab/2 by showing that two of these fit
together to make an a×b rectangle. This invokes a new principle: areas of congruent figures
are equal. To compute the area of a parallelogram or trapezoid, the dissection principle
is invoked: cutting up and rearranging the pieces of a figure preserves the area. These
principles, all of which make intuitive and physical sense, are illustrated in Figure 35.
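[Figure 35: two rows of dissection diagrams, with sides labeled b and c and heights labeled h, illustrating these area principles.]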
Exercise 10.1. Write a sentence for each of the two rows explaining how it proves an area
formula. What is being asserted to have the same area as what, why is this true, and what
is the conclusion?
The area of a circle is introduced, usually without much explanation. Do you know why
the area of a circle of radius r is equal to $\pi r^2$? One common explanation is that areas of
similar figures are related by a scaling principle. Recalling that area has units of squared
length, it makes sense that scaling a figure by $\lambda$ should scale the area by $\lambda^2$. All circles
are similar; it follows that the area of a circle should be $Kr^2$ for some constant K. We can
name this constant $\pi$ but that leaves a nagging question unanswered. Scaling also shows
that the circumference of a circle should be proportional to the radius, therefore $C = K'r$
for some other constant $K'$. This turns out to be $2\pi$. But why should $K'$ be double K? An
argument involving dissections and limits is shown in Figure 36.
Figure 36: a limit of dissections relates the constants for circumference and area
Exercise 10.2. In Figure 36 the measure πr on the right refers to the total curved length
of the bottom. We have not defined limits of shapes, but intuitively, what is the limiting
shape on the right and what are its dimensions?
Once limits are brought into the discussion, there is a way to define areas of much more
general shapes. The idea is this: put as many non-overlapping squares of some small side
length ε as you can inside the shape. These cover an area less than the area of the shape,
but if ε is small, it seems credible that the area is getting close to the area of the shape.
If the limit as ε → 0+ exists, this should be the area of the shape. Similarly, you could
completely cover the shape with squares of side ε if you are willing to cover a slightly too
big region. When ε is small, you don’t cover too much extra area. The limit as ε → 0+
should also be the area of the shape. To make a long story short (you can hear the full story
in Math 360), there are many shapes for which it is possible to prove that these two limits
exist and are equal. For these shapes we can define area to be this common limiting value.
This mathematical definition captures our existing physical intuition and is also consistent
with the principles we already adopted: congruence, scaling and dissection.
With this build up, we will mathematically define area
for a certain restricted class of shapes. The class of
shapes we start with will be the class of shapes that are
rectangular on three sides but whose top is described
by an arbitrary continuous function. More precisely,
let a < b be real numbers and let f be positive and
continuous on the closed interval [a, b]. We will define
the area of the region R bounded on the left by the
vertical line x = a, on the right by the vertical line
x = b, on the bottom by the x-axis (the line y = 0),
and on the top by the graph of f (the curve y = f (x)).
This region is shown in Figure 37.
Figure 37: region between the x-axis and the graph of a function
We now define the lower and upper Riemann sums with n rectangles for a function f on an
interval [a, b]. If you prefer a picture, refer to Figure 38.
Definition 10.1. Let f be a nonnegative continuous function on an interval [a, b] and let n
be a positive integer. Let I1 , . . . , In denote the intervals you get when you divide [a, b] into
n equally sized intervals. For each interval Ik , let yk be the minimum value of f on Ik and
let Rk be the rectangle with base Ik on the x-axis and height yk . The lower Riemann sum
for f on [a, b] with n rectangles is the sum of the areas of the rectangles Rk , for 1 ≤ k ≤ n.
The upper Riemann sum is defined similarly, with the maximum value instead of the
minimum value on each interval.
Exercise 10.4. What are the endpoints of the interval I2 in Figure 38? What is the
approximate value of y2 ?
Example 10.2. We are not given precise values for the function f in Figure 38, but we can
estimate from the graph. The rectangles each have width 4/3. The respective heights for
the lower Riemann sum appear to be roughly 1.9, 1.6 and 1.7, making the lower Riemann
sum equal to (4/3)(1.9) + (4/3)(1.6) + (4/3)(1.7) = (4/3)(5.2) ≈ 6.93. The upper Riemann sum
is computed from rectangles with approximate heights 3.3, 1.9 and 2.15, leading to a total
area of (4/3)(7.35) = 9.8.
The left Riemann sums and right Riemann sums are defined similarly, except that
instead of using the minimum or maximum values of the function on each sub-interval the
left Riemann sum uses the value at the left endpoint of each sub-interval Ik , while the right
Riemann sum uses the value at the right endpoint of each sub-interval Ik . Examples are
shown in Figure 39.
Exercise 10.5. Is the left Riemann sum on the left of Figure 39 or on the right?
The upper and lower Riemann sums give upper and lower bounds on the area of the figure.
The left and right Riemann sums are neither upper nor lower bounds for the area, but they
are sandwiched in between the lower and upper Riemann sums, so they also converge to
the area. They are useful because always choosing the left endpoint (or always choosing
the right endpoint) leads to a simpler formula.
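Left and right Riemann sums are short to program. The sketch below is an illustration under assumptions: the function names are made up, and the test function $x^2$ on [0, 3] is not the function pictured in the figures. Because $x^2$ is increasing on [0, 3], its minimum on each sub-interval is at the left endpoint, so here the left sum happens to equal the lower sum and the right sum equals the upper sum.

```python
def left_sum(f, a, b, n):
    """Left Riemann sum: heights taken at the left endpoint of each sub-interval."""
    width = (b - a) / n
    return width * sum(f(a + k * width) for k in range(n))

def right_sum(f, a, b, n):
    """Right Riemann sum: heights taken at the right endpoint of each sub-interval."""
    width = (b - a) / n
    return width * sum(f(a + (k + 1) * width) for k in range(n))

def square(x):
    """Hypothetical test function, increasing on [0, 3]."""
    return x * x

for n in [3, 10, 50, 1000]:
    print(n, left_sum(square, 0, 3, n), right_sum(square, 0, 3, n))
```

As n grows the two columns squeeze together, which is the behavior that Theorem 10.3 guarantees for continuous functions.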
Exercise 10.6. Write a summation formula for the left Riemann sum for f on [a, b] with
10 rectangles. It should have 10 terms and look like this: $\sum_{n=1}^{10} \cdots$.
The values of the lower and upper Riemann sums in Figure 38 are approximately 6.9 and
9.8. These are not very close to each other, leaving considerable uncertainty about the true
area. Replacing by the left (say) Riemann sums, we can program the sum into a computing
device and compute for much greater values of n. If we increase n from 3 to 10, as in
Figure 39, we find the Riemann sums come out to approximately 8.48 and 8.02 – somewhat
better. These are not necessarily bounds: the true value could be greater than both, or less
than both, or in between. Replacing n by 50 gives 8.28. This is again not a bound, however
the following theorem guarantees that as n → ∞, this will converge to the area.
Theorem 10.3. The upper Riemann sums for any continuous function f on any closed
interval [a, b] converge as n → ∞. The lower Riemann sums converge to the same value.
It follows that you can let yk = f (xk ) for any xk ∈ Ik and the sums of rectangle areas will
still converge to this common limit.
Definition 10.4. The common limit in Theorem 10.3 is called the definite integral of f
from a to b and is denoted $\int_a^b f(x)\,dx$.
Exercise 10.7. Let f be the constant function c. How far apart are the lower and upper
Riemann sums for $\int_3^9 c\,dx$ (pick any value of n)? What does that tell you about the definite
integral $\int_a^b c\,dx$?
Remark. The variable x is a bound variable; the notation $\int_a^b f(u)\,du$ would represent the
same thing. Also, as in the notation for derivatives, you shouldn’t try to interpret what
the symbol du means on its own. It evokes the width of an infinitesimal rectangle, but you
can’t always count on it to behave nicely in equations.
Exercise 10.8. From this construction and theorem, you can deduce some identities for
integrals. Simplify these definite integrals in terms of more basic ones.
(i) $\int_a^b f(x)\,dx + \int_b^c f(x)\,dx$
(ii) $\int_a^b [3 + 10f(x)]\,dx$
Area is the most visually obvious interpretation but there are many others. If material (or
charge, or mass, etc.) is spread out unevenly over an interval, the density at any point
is the amount of material per length near that point. It has units of material divided by
length. The total amount of material in the interval is gotten by summing the amount
of material over small intervals. When the interval is small enough, we can estimate the
amount of material as f (x) times the length of the interval, where f is the density and x is any point in the
interval. This is not exact because f generally will still vary over the interval, but not by
much when the interval is small. The limit as the interval lengths go to zero will be $\int_a^b f(x)\,dx$ and
will represent the total material.
Example 10.5. A 3-inch blade of grass is covered in mold. The amount of mold decreases
up the blade because it is killed by sunlight. The density of mold is $1000e^{-x/3}$
spores per inch at height x inches from the ground. The total number of spores on the
blade of grass is given by $\int_0^3 1000e^{-x/3}\,dx$.
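Even before learning to evaluate this integral exactly, one can bracket it with Riemann sums. The sketch below uses assumed helper names; since the density is decreasing in x, the left sum overestimates and the right sum underestimates the total.

```python
import math

def density(x):
    """Spores per inch at height x inches, as given in Example 10.5."""
    return 1000 * math.exp(-x / 3)

def riemann_sums(f, a, b, n):
    """Return (left sum, right sum) using n equal sub-intervals of [a, b]."""
    width = (b - a) / n
    left = width * sum(f(a + k * width) for k in range(n))
    right = width * sum(f(a + (k + 1) * width) for k in range(n))
    return left, right

left, right = riemann_sums(density, 0, 3, 1000)
print(f"total spores is between {right:.1f} and {left:.1f}")
```

The units work out as expected: each term is (spores per inch) times (inches), so the sum is a number of spores.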
Exercise 10.9. Why did we use 0 and 3 for the limits of integration in Example 10.5?
Integrals can also be used to give averages. For a finite collection, the average is defined to
be the total divided by the number you added to get the total. Averages over an interval
are defined similarly.
Definition 10.6 (average over an interval). The average of a quantity varying over an
interval [a, b] according to a function f is defined to be $\frac{1}{b-a} \int_a^b f(x)\,dx$.
Example 10.7. Suppose the temperature over a day is f (t) degrees Celsius t hours after
midnight. The average temperature over the day is then $\frac{1}{24} \int_0^{24} f(t)\,dt$.
Exercise 10.10. Suppose f (x) is some constant c on the interval [a, b]. Intuitively, what
is the average of f on [a, b]? Compute the average value of f on [a, b] directly from the
definitions and check that it is what you expected.
An integral is a limit of a sum of rectangle areas. The units are therefore the same units
as the rectangle areas. The rectangles live on a graph where the x-axis has units of the
argument variable and the y-axis has units of the function. Therefore the rectangle units,
hence the integral units, are units of the argument times units of the function. In the grass
example, the function was density (spores per inch) and the argument was inches, therefore
the integral had units of spores. It is a good thing that this agrees with our interpretation
of the integral as the total number of spores. In the temperature example, f has units of
temperature and t is in units of time, so the integral of f has units of temperature times
time. This sounds like a strange unit but it’s not unheard of. Severity of cold spells is
measured, for example, in heating degree-days. The average is the integral divided by the
time, so it is in units of temperature. Of course: the average temperature should be a
temperature!
In physics there are countless things represented by integrals. One is the moment. Suppose
mass is spread out along [a, b] with density f (you know what that means now, right?).
Integrate f and you get the total mass. If instead you compute $\int_a^b x f(x)\,dx$ you get the
moment about the origin, which tells you how much the weight counts when balancing (imagine
a teeter-totter pivoting on the origin). The closely related moment of inertia, $\int_a^b x^2 f(x)\,dx$,
tells you how much torque is needed to produce a given angular acceleration.
In probability theory, random quantities can be discrete or continuous. If the random quantity
X is discrete it means that there is a set of values x1 , x2 , . . . such that probabilities
for X = xk sum to 1. This could be a finite sum or the sum of an infinite sequence (you
now know the definition of an infinite sum, right?). For continuous quantities, you need
integrals. The probabilities for finding X to take various values are spread continuously
over an interval (possibly an infinite interval such as the whole real line). There will be a
probability density function f such that the probability of finding X in a given interval
[a, b] will be $\int_a^b f(x)\,dx$. We will say more about this in Section 12, after we have defined
integrals where one or both of the limits of integration can be infinite.
Going back to the area interpretation, you may ask
what about more general shapes? It turns out you
don’t really need straight sides. The vertical walls on
the left and right sides of the regions in Figures 37
and 38 can disappear. For example, letting $f(x) = \sqrt{1 - x^2}$ and [a, b] = [−1, 1] produces the upper half
of a circle.
Figure 40: the signed area A2 will be negative because it is below the x-axis
Another useful definition along the same lines switches the upper and lower limits.
Definition 10.8. If a < b then $\int_b^a f(x)\,dx$ is defined to equal $-\int_a^b f(x)\,dx$.
Suppose f and g are functions such that f ≥ g on [a, b]. One interpretation of $\int_a^b [f(x) - g(x)]\,dx$
is that it is the area of the shape with upper boundary f and lower boundary g.
We started out computing areas of a very specific set of shapes, looking like three sides
of a rectangle and a possibly curved upper boundary. Using the idea of upper and lower
boundaries we can use integrals to give the area of a much greater variety of shapes.
Exercise 10.11. On a coordinate axis, draw a heart shape (you know, the classic Valentine’s
heart). Then draw in values a and b on the x-axis and graphs of functions f and g
such that the area of the heart is computed by $\int_a^b [f(x) - g(x)]\,dx$.
The examples of densities of quantities spread out along a line are somewhat limited. When
quantities spread out, usually they spread over a region in a plane or in three dimensions.
The next calculus course covers multivariable integration. Still, there are some higher
dimensional things you can do with ordinary integrals. One of these is to compute a volume
of an object if you know the area of its cross-sections. Dividing the object into n very thin
slabs, the volume of the k-th one is roughly the thickness $\Delta_k$ times the cross-sectional area
of the k-th slab, call it $A_k$; see Figure 41.
The limit of $\sum_{k=1}^n A_k \Delta_k$ should give the volume. Line up the slabs so that the x-axis goes
perpendicular to the slabs. This limit looks awfully similar to the limit of $\sum_{k=1}^n f(x_k)\Delta_k$
where $x_k$ is any point on the x-axis inside the k-th slab and f is the function telling the
cross-sectional area at every x-value. Therefore, the volume is computed by $\int_a^b f(x)\,dx$ where a
and b are the x-values at the first and last slab respectively.
Example 10.9 (volume of a pyramid). We write an integral for the volume of a pyramid whose base
is a square of side length s and whose height is h. It corresponds best to the description
above if we orient it so the height is measured along the x-direction with the apex at the
origin. See Figure 42. The cross-section is a square with side increasing linearly from 0
to s as x increases from 0 to h. Thus, the side length is given by $\ell(x) = (s/h)x$, hence the
cross-sectional area is given by $f(x) = (s/h)^2 x^2$ between x = 0 and x = h. The volume
is therefore given by $\int_0^h (s/h)^2 x^2\,dx$. When you learn to compute integrals, this will be a
pretty easy one.
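To see the slab picture in action before any integration technique is available, here is a minimal Python sketch that adds up the volumes of n thin slabs of the pyramid. The function name, the sample values s = 2 and h = 3, and the choice of sampling each slab at its midpoint are our own assumptions for illustration; the comparison value s²h/3 is the familiar geometric formula for the volume of a pyramid.

```python
def pyramid_volume_by_slabs(s, h, n):
    """Approximate the volume of a square pyramid by summing n thin slabs."""
    dx = h / n                       # thickness of each slab
    total = 0.0
    for k in range(n):
        x_mid = (k + 0.5) * dx       # a point on the x-axis inside the k-th slab
        side = (s / h) * x_mid       # side length of the square cross-section
        total += side ** 2 * dx      # cross-sectional area times thickness
    return total


if __name__ == "__main__":
    s, h = 2.0, 3.0                  # hypothetical sample dimensions
    for n in (10, 100, 1000):
        print(n, pyramid_volume_by_slabs(s, h, n))
    print("s^2 h / 3 =", s * s * h / 3)
```

As n grows, the slab sums should settle toward the geometric formula, which is what the integral is expected to produce.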
The reason we make such a fuss over integrals is that they can often be exactly computed.
To see how this works, we look at the indefinite integral. Replacing the upper limit on
the integral by a variable yields a function of that variable. To say this in another way, we
may consider $\int_a^b f(x)\,dx$ as a function of the free variables a and b (it can’t be a function of
x because x is a bound variable). Let a remain a constant but consider b to be a variable.
We then have a function, $b \mapsto \int_a^b f(x)\,dx$. Denote this function by G, in other words
$G(b) := \int_a^b f(x)\,dx$.
Example 10.10. Let f(x) := 3x and a = 0. Then $G(b) := \int_0^b 3x\,dx$. Definite integrals
compute area, hence G(b) is the area of the triangle with vertices at the origin, (b, 0) and
(b, 3b). The triangle area formula gives $G(b) = (3/2)b^2$.
For fun (we have a warped sense of fun), compute $G'$. That’s an easy one: $G'(b) = 3b$. Note
that this is the integrand of the original integral, with the free variable b in place of the
bound variable x. This is not a coincidence, as the following theorem asserts.
$$G'(b^+) = \lim_{h \to 0^+} \frac{G(b + h) - G(b)}{h} .$$
Anti-derivatives
How do we find anti-derivatives? The next chapter is entirely about computing these.
Like rules for differentiation, rules for anti-differentiation start from a collection of known
results. For derivatives, we obtained these from the definition by computing limits. For
anti-derivatives, we will get these simply by remembering some basic derivatives. The
simple rule yielding the derivative of a polynomial may be run backwards. So for example
the monomial $ax^m$ has anti-derivative $\frac{a}{m+1} x^{m+1}$. We can sum these, obtaining the
anti-derivative of any polynomial: an anti-derivative of $\sum_{k=0}^m a_k x^k$ is given by
$\sum_{k=0}^m \frac{a_k}{k+1} x^{k+1}$.
In fact this works for negative or fractional powers, as long as the power is not −1.
Exercise 10.12.
(i) Why can’t the power be −1?
(ii) Compute an anti-derivative of $x^2 - 5x + 6$.
(iii) Compute a different anti-derivative of $x^2 - 5x + 6$.
We say “an anti-derivative” rather than “the anti-derivative” because there is more than
one. The functions G and G + c, where c is a constant, have the same derivative, so one
is an anti-derivative of f if the other is. This is the only way anti-derivatives can differ13 .
Once you know the value of the anti-derivative at any point, it is easy to reconstruct the
correct anti-derivative as an integral, as in the following example.
For concreteness, let’s see how this works with the example from above: f(x) = 3x. Then
$G(b) = 7 + \int_3^b 3x\,dx$. We already computed $\int_0^b 3x\,dx = \frac{3}{2} b^2$ and similarly
$\int_0^3 3x\,dx = (3/2)3^2 = 27/2$. Subtracting, $\int_3^b 3x\,dx = (3/2)b^2 - 27/2$. Thus the anti-derivative we are
looking for is $7 + (3/2)b^2 - 27/2 = (3/2)b^2 - 13/2$.
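As a quick sanity check (ours, not part of the original example), the formula just obtained can be tested against the two things it was built to satisfy: its derivative should be the integrand 3b, and its value at b = 3 should be 7, since the integral from 3 to 3 contributes nothing.

```latex
\begin{align*}
G(b)  &= \tfrac{3}{2} b^2 - \tfrac{13}{2}, \\
G'(b) &= 3b, \\
G(3)  &= \tfrac{27}{2} - \tfrac{13}{2} = 7 .
\end{align*}
```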
anti-derivative of f .
Note: this implies that H(b) − H(a) = G(b) − G(a) when H is any other anti-derivative
of f . In other words, differences of an anti-derivative at a specified pair of points do not
depend on which particular anti-derivative was chosen.
Exercise 10.13. Compute $\int_1^6 (x^2 - 5x + 6)\,dx$.
We have seen integrals interpreted as areas and volumes, totals and averages, moments, and
probabilities. One more use of an integral is to estimate a sum. In a way this is the reverse
of the definition, which tells you that an integral is estimated by Riemann sums, in fact is
a limit of such sums. Going the other way, if we have a sum, we can write an integral for
which it is a Riemann sum. We may then expect the integral to be a good approximation
for the sum. This will be easier when we know how to compute more integrals, but there
are plenty we can already compute. We illustrate with a long example. It starts with the
fact that the derivative of ln x is 1/x. This means that an anti-derivative of 1/x is ln x.
Example 10.15 (harmonic sum estimated by an integral). Problem: estimate the 100th
harmonic number 1 + 1/2 + 1/3 + · · · + 1/100. To solve this, we may as well estimate
$H_n := \sum_{k=1}^n 1/k$ for any positive integer n. Summing 1/k looks a lot like integrating 1/x.
In fact, suppose we write a Riemann sum for $\int_1^n 1/x\,dx$ that has precisely n − 1 rectangles.
Then the intervals Ik are just the intervals [1, 2], [2, 3], . . . , [n − 1, n]. Even better, we can
make areas of the rectangles be exactly the same as in the sum. We just need to use the
upper Riemann sum: 1 + 1/2 + · · · + 1/(n − 1); see the left-hand side of Figure 43 for a
picture of this when n = 9.
We have shown that $H_{n-1}$ is an upper Riemann sum for $\int_1^n 1/x\,dx$. By Proposition 10.14
the integral is the difference of anti-derivatives:
$$\int_1^n \frac{1}{x}\,dx = \ln n - \ln 1 = \ln n .$$
Therefore, we have shown the bound $H_{n-1} \geq \ln n$. In particular, choosing n = 101, we see
that $H_{100} \geq \ln 101 \approx 4.615$.
Is this an upper bound or a lower bound? It depends on your point of view. If we were
trying to figure out the integral up to 100, $H_{100}$ would be an upper bound on the value. But
in this case we know the integral and are trying to estimate $H_{100}$. The integral provides a
lower bound, in this case 4.615.
Figure 43: representing the harmonic sum as upper and lower Riemann sums
What about an upper bound on $H_{100}$? The obvious thing is to see if we can make the
same sum be a lower Riemann sum. Watch what happens when you try to do this. Take
the graph, shift all the rectangles one unit left, and voilà! (See the right-hand side of
Figure 43.) This shows that $H_{100}$ is a lower Riemann sum for a slightly different integral,
namely $\int_0^{100} \frac{1}{x}\,dx$. Alas, this is not an integral we can do because 1/x is not continuous at
x = 0. In fact, when we study improper integrals, we will see this evaluates to +∞. Sure,
we get the upper bound $H_{100} \leq \infty$, but that is hardly useful. All is not lost, however, if we
use some common sense. The same picture shows that an upper bound for the harmonic
sum starting at 2 instead of 1 is
$$\sum_{k=2}^{100} \frac{1}{k} \leq \int_1^{100} \frac{1}{x}\,dx = \ln 100 \approx 4.605 .$$
So, adding back the 1, we see that $H_{100} \leq 1 + \ln 100 \approx 5.605$. This is about as good as
we can do with the techniques we have so far: $4.615 \leq H_{100} \leq 5.605$. For the record,
$H_{100} \approx 5.1874$.
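The two bounds are easy to check by brute force. The following Python sketch (the function names are ours, chosen for illustration) sums the harmonic series directly and compares it with the lower bound ln(n + 1) and the upper bound 1 + ln n derived above.

```python
import math


def harmonic(n):
    """Return H_n = 1 + 1/2 + ... + 1/n by direct summation."""
    return sum(1.0 / k for k in range(1, n + 1))


def integral_bounds(n):
    """Return (lower, upper) bounds on H_n from comparison with the integral of 1/x."""
    lower = math.log(n + 1)        # H_n is an upper Riemann sum on [1, n + 1]
    upper = 1.0 + math.log(n)      # 1/2 + ... + 1/n is a lower Riemann sum on [1, n]
    return lower, upper


if __name__ == "__main__":
    n = 100
    lo, hi = integral_bounds(n)
    print(f"{lo:.3f} <= H_{n} = {harmonic(n):.4f} <= {hi:.3f}")
```

Trying larger n shows that the gap between the two bounds stays below 1, even though both bounds grow without limit.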
Trapezoidal approximation
Sometimes it can be frustrating using Riemann sums because a lot of calculation doesn’t
get you all that good an approximation. You can see a lot of “white space” between the
function f and the horizontal lines at the top of the rectangles that make up the upper or
lower Riemann sum. If instead you let the rectangle become a right trapezoid, with both
its top-left and top-right corner on the graph y = f(x), then you get what is known as the
trapezoidal approximation. The figure shows a trapezoidal approximation of an integral
$\int_0^4 f(x)\,dx$ with five trapezoids. Note that the first and last trapezoid are degenerate, that
is, one of the vertical sides has length zero and the trapezoid is actually a right triangle. It
is perfectly legitimate for one or more of the trapezoids to be degenerate.
Because the tops of the slices are allowed to slant, they remain much closer to the graph
y = f (x) than do the Riemann sums. Because the area of a right trapezoid is the average of
the areas of the two rectangles whose heights are the value of f at the two endpoints, it is
easy to compute the trapezoidal approximation: it is just the average of the left-Riemann
sum and the right-Riemann sum corresponding to the same partition into vertical strips.
Example 10.16. Compute the trapezoidal approximation for $\int_1^2 \frac{1}{1+x^2}\,dx$ with 10 trapezoids.
Averaging the left and right Riemann sums always gives a sum containing the n − 1 common
terms plus half the first term of the left Riemann sum and half the last term of the right
Riemann sum. In this case one gets
$$\frac{1}{2}\,\frac{f(1)}{10} + \frac{1}{2}\,\frac{f(2)}{10} + \sum_{j=1}^{9} \frac{1}{10}\, f\!\left(1 + \frac{j}{10}\right) .$$
The outcome of trapezoidal approximation in general can be summarized as, “Sum the
values of f along a regular grid of x-values, counting endpoints as half, and multiply by the
spacing between consecutive points.”
The trapezoidal estimate is usually much closer than the upper or lower estimate, though
it has the drawback of being neither an upper nor a lower bound. However, if you know the
function to be concave upward then the trapezoidal estimate is an upper bound. Similarly
if $f'' < 0$ on the interval then the trapezoidal estimate is a lower bound. In the figure, f
is concave downward and the trapezoidal estimate is indeed a lower bound.
Example 10.17. The function $1/(1+x^3)$ is concave upward on [1, 2] (compute and see that
the second derivative is a positive quantity divided by $(1 + x^3)^3$) so the trapezoidal estimate
should be not only very close but an upper bound. Indeed, the trapezoidal estimate is the
average of the upper and lower estimates previously computed and is equal to 0.25485... which is
indeed just slightly higher than the true value of 0.25435....
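The recipe "sum the values of f along a regular grid, counting endpoints as half, and multiply by the spacing" translates almost word for word into code. The sketch below is ours: the function names are illustrative, and the choice of ten equal strips on [1, 2] is an assumption made so the output can be compared with the figure of about 0.25485 quoted in the example.

```python
def left_sum(f, a, b, n):
    """Left Riemann sum of f on [a, b] with n equal strips."""
    h = (b - a) / n
    return h * sum(f(a + j * h) for j in range(n))


def right_sum(f, a, b, n):
    """Right Riemann sum of f on [a, b] with n equal strips."""
    h = (b - a) / n
    return h * sum(f(a + j * h) for j in range(1, n + 1))


def trapezoid(f, a, b, n):
    """Sum f on a regular grid, counting endpoints as half, times the spacing."""
    h = (b - a) / n
    interior = sum(f(a + j * h) for j in range(1, n))
    return h * (0.5 * f(a) + interior + 0.5 * f(b))


def f(x):
    return 1.0 / (1.0 + x ** 3)


if __name__ == "__main__":
    n = 10  # assumed number of strips
    print("left     ", left_sum(f, 1.0, 2.0, n))
    print("right    ", right_sum(f, 1.0, 2.0, n))
    print("trapezoid", trapezoid(f, 1.0, 2.0, n))
```

Because this f is decreasing on [1, 2], the left sum is the upper sum and the right sum is the lower sum, and the trapezoidal value is their average.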
Aside. Just as Riemann sums estimate by strips with constant height (degree zero) and
trapezoids estimate by strips whose height is a linear function, you could imagine using
higher degree polynomials (because you can still compute their areas exactly). Simpson’s rule,
for example, uses quadratic functions. It gets very good results! We won’t discuss it here
but you might want to ask your instructor about higher degree polynomial approximations,
which can be programmed without too much difficulty into a computation package or even a
spreadsheet.
11 Computing integrals
All continuous functions have anti-derivatives, but not all of the anti-derivatives have names.
For example, the definite integral $\int_3^8 \frac{1}{\ln x}\,dx$ is a well defined quantity; indeed $\int_a^b \frac{1}{\ln x}\,dx$ is
well defined for any b > a > 1, but the function $b \mapsto \int_a^b (1/\ln x)\,dx$ is not equal to any
combination of named functions such as powers, logs, exponentials and trig functions. The same
is true of the normal (bell curve) density function $e^{-x^2}$, or $\sqrt{\sin x}$ or $\sqrt{1 - 4x^2}/\sqrt{1 - x^2}$.
The prevalence of functions like this is the reason we need good numeric approximations
to integrals. In the remainder of this section we concentrate on anti-derivatives for which
reasonably nice exact expressions exist.
Computing derivatives, as you saw in Chapter 5, rests on combination rules and working
out some basic cases. For anti-derivatives the same is true, with “working out” replaced
by “remembering”. In other words, if you remember what the derivative of f is, then you
know how to compute an anti-derivative of $f'$. This is how we computed anti-derivatives for
polynomials, for example. The strategy is then: (1) list the derivatives we already know,
organized in a way that allows us to query what function goes with a given derivative; and
(2) give combining rules for anti-derivatives. This gives the following proposition. Note
that in each case, remembering allows us to identify just one of the antiderivatives; we trust
you can compute the others from that.
Notation: we use an integral sign without upper and lower limits to denote the antiderivative:
e.g., $\int (3x^2 + 1)\,dx$ is equal to $x^3 + x$, plus any constant. We usually write this as
$$\int (3x^2 + 1)\,dx = x^3 + x + C .$$
The variable x is bound, so the choice of letter does not affect the meaning.
Proposition 11.1. The following basic anti-derivatives are computed by reversing Proposition 5.8.
(i) $\int x^m\,dx = \frac{1}{m+1} x^{m+1} + C$ as long as $m \neq -1$.
(ii) $\int \frac{1}{x}\,dx = \ln x + C$
(iii) $\int \cos x\,dx = \sin x + C$
(iv) $\int \sin x\,dx = -\cos x + C$
(v) $\int \sec^2 x\,dx = \tan x + C$
(vi) $\int e^x\,dx = e^x + C$
(vii) $\int \frac{1}{\sqrt{1 - x^2}}\,dx = \arcsin x + C$
(viii) $\int \frac{1}{1 + x^2}\,dx = \arctan x + C$
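Since the proof of correctness of any anti-derivative is that its derivative is the integrand, each entry in the list can be spot-checked numerically. The Python sketch below is ours: it approximates the derivative of each claimed anti-derivative by a central difference at one sample point and compares it with the integrand. The table name, the sample points, the step size, and the choice m = 2.5 for item (i) are all assumptions made for illustration.

```python
import math

# (label, integrand f, claimed anti-derivative F, sample point inside the domain)
TABLE = [
    ("x^m with m = 2.5", lambda x: x ** 2.5, lambda x: x ** 3.5 / 3.5, 1.3),
    ("1/x", lambda x: 1 / x, math.log, 1.3),
    ("cos x", math.cos, math.sin, 0.4),
    ("sin x", math.sin, lambda x: -math.cos(x), 0.4),
    ("sec^2 x", lambda x: 1 / math.cos(x) ** 2, math.tan, 0.4),
    ("e^x", math.exp, math.exp, 0.4),
    ("1/sqrt(1-x^2)", lambda x: 1 / math.sqrt(1 - x * x), math.asin, 0.4),
    ("1/(1+x^2)", lambda x: 1 / (1 + x * x), math.atan, 0.4),
]


def central_difference(F, x, h=1e-5):
    """Approximate F'(x) by a symmetric difference quotient."""
    return (F(x + h) - F(x - h)) / (2 * h)


def max_discrepancy():
    """Largest gap between F'(x) and f(x) over the table."""
    return max(abs(central_difference(F, x) - f(x)) for _, f, F, x in TABLE)


if __name__ == "__main__":
    for label, f, F, x in TABLE:
        print(f"{label:16s} F'({x}) ~ {central_difference(F, x):.8f}   f({x}) = {f(x):.8f}")
```

A check like this cannot prove a formula, but it catches slips such as a wrong sign or a wrong constant factor very quickly.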
Exercise 11.1. Use Proposition 11.1 to compute this definite integral: $\int_0^1 \frac{1}{1 + x^2}\,dx$. You
will also need Proposition 10.14, which you should get used to using without even thinking
of it as an extra step.
The derivative of a sum or difference is the sum or difference of the derivatives. The
derivative of c·f is c times the derivative of f for any real constant c. This leads immediately
to the following proposition.
Proof: Let F be an anti-derivative of f and G be an anti-derivative of g. Then $(F + G)' = F' + G' = f + g$,
therefore (F + G) is an antiderivative of f + g, proving (11.1).
Exercise 11.2. The proof of the second statement of Proposition 11.2 is even shorter. See
if you can write it down.
The word “anti-derivative” is a mouthful and so is the verb form “anti-differentiate”. Because
computing integrals comes down to anti-differentiation, common practice is to use the verb
integrate in place of “anti-differentiate”. We also call an anti-derivative an “integral”.
Propositions 11.1 and 11.2 allow us to compute some more integrals.
Example 11.3. Compute the integral of $\frac{a\cos x + b/\cos x}{\cos x}$. Simplifying,
$$\frac{a\cos x + b/\cos x}{\cos x} = a + b\sec^2 x .$$
Therefore
$$\int \frac{a\cos x + b/\cos x}{\cos x}\,dx = \int [a + b\sec^2 x]\,dx = \int a\,dx + b\int \sec^2(x)\,dx = ax + b\tan x + C .$$
Exercise 11.3. One of your classmates argues this is wrong: $\int a\,dx = ax + C$ and $\int \sec^2(x)\,dx = \tan x + C$,
therefore the answer should be ax + tan x + 2C. Explain what is going on: is the
original answer right, or the new answer, or both?
Example 11.3 should worry you. Does it seem a bit contrived? The expression $\frac{a\cos x + b/\cos x}{\cos x}$
just happens to simplify into two expressions covered by the list of cases in Proposition 11.1.
If that seems like a piece of luck, it is. With only Propositions 11.1 and 11.2 you won’t get
very far. The next two sections give two rules for combining integrands that will greatly
increase your ability to integrate. Keep in mind though, that in some sense you are still
lucky whenever you can compute an analytic expression for an anti-derivative: many anti-
derivatives have no nice formula.
The sum rule for derivatives is simple enough that it leads directly to (11.1), which is an
identical rule for anti-derivatives. There is also a product rule, but it does not lead directly
to an identical rule for anti-derivatives. That’s because it is not symmetric. The derivative
of fg is not $f'g'$ but rather $f'g + g'f$. Therefore, if we want to run it backwards, we get
$$\int [f'(x)g(x) + g'(x)f(x)]\,dx = f(x)g(x) + C . \qquad (11.3)$$
The problem is, this doesn’t tell us how to integrate a product such as $f'g'$, but rather
$f'g + g'f$. This is great if someone asks us to compute the anti-derivative of $f'g + g'f$, but
this is rare, harder to spot, and does not answer the question as to the anti-derivative of
the product.
The best we can do is to exploit (11.3) as much as we can. This leads to the following
proposition.
Proposition 11.4 (integration by parts). Let u and v be differentiable functions. Suppose
$u'v$ is known to have anti-derivative G. Then $v'u$ has anti-derivative uv − G. In a single
equation,
$$\int v'u\,dx = uv - \int u'v\,dx . \qquad (11.4)$$
Proof: This is just the product rule run in reverse: $(uv)' = u'v + v'u$, therefore
$$(uv - G)' = u'v + v'u - G' = v'u .$$
The way this works in practice is that when integrating an expression, you try to identify
the expression as $v'u$ for some functions u and v. Then you check whether you already know
the anti-derivative to $u'v$. If so, you subtract this from uv and you are done. Sometimes
there are several possible ways to do this, in which case you may have to try them all until
you find one that works.
Example 11.5. Use integration by parts to integrate $xe^x$. Obviously this decomposes as a
product of x and $e^x$. One of these should be $v'$ and the other should be u. Let’s try setting
$$v' = x ; \qquad u = e^x .$$
At first this goes smoothly: the expression we chose for $v'$ has a known anti-derivative and
the one we chose for u has a known derivative, therefore we can find v and $u'$:
$$v = \frac{x^2}{2} + C ; \qquad u' = e^x .$$
Unfortunately the next step doesn’t work: $u'v = e^x (x^2/2 + C)$, which is not something
whose anti-derivative we recognize no matter what choice we make for the constant C.
So let’s try the other way around:
$$v' = e^x ; \qquad u = x .$$
Again it goes smoothly at first: the expression we chose for $v'$ has a known integral $e^x$ and
the one we chose for u has a known derivative 1, therefore
$$v = e^x + C ; \qquad u' = 1 .$$
Now we’re in better shape. Choose C = 0 (usually this works if anything does). Then
$u'v = e^x$, for which an integral is known, namely $e^x$. Therefore,
$$\int xe^x\,dx = \int uv'\,dx = uv - \int u'v\,dx = xe^x - \int e^x\,dx = xe^x - e^x + C .$$
We did a long-winded example to show you the process of trial and error and to show how
each step works. What would have happened if we chose a different value of C? It turns
out it always works exactly as well.
Exercise 11.4. Complete the computation in the previous example, choosing C = 7 instead
of C = 0, to see that it works out the same after some cancellation. [Bonus question: can
you see why this cancellation always happens?]
It usually takes several worked examples and a lot of practice before integration by parts
feels natural. Because “a lot of practice” means different things to different people, we
include only a few mandatory self-check and homework problems, putting a greater number
online for those who want to practice.
Example 11.6. Compute the definite integral $\int_0^{2\pi} x\sin(x)\,dx$. We start with the indefinite
integral, which we compute by parts. Based on what happened with $xe^x$, let’s decide to
start with the choice u = x, $v' = \sin x$. Then v = −cos x and $u' = 1$, which yields
$$\int x\sin(x)\,dx = -x\cos x - \int (-\cos x)\,dx = -x\cos x - (-\sin x) = \sin x - x\cos x + C .$$
Evaluating the definite integral (notice we chose C = 0),
$$\int_0^{2\pi} x\sin(x)\,dx = [\sin x - x\cos(x)]_{x=2\pi} - [\sin x - x\cos(x)]_{x=0}$$
$$= [\sin(2\pi) - 2\pi\cos(2\pi)] - [\sin(0) - 0\cdot\cos(0)]$$
$$= -2\pi .$$
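A result obtained by parts can always be cross-checked against a direct numerical estimate of the same definite integral. The sketch below is ours: it reuses the trapezoidal idea from Chapter 10 with an arbitrarily chosen 2000 strips and compares the result with the value given by the anti-derivative sin x − x cos x. The names and the number of strips are assumptions for illustration.

```python
import math


def trapezoid(f, a, b, n):
    """Trapezoidal approximation of the integral of f on [a, b] with n strips."""
    h = (b - a) / n
    return h * (0.5 * f(a) + sum(f(a + j * h) for j in range(1, n)) + 0.5 * f(b))


def integrand(x):
    return x * math.sin(x)


def antiderivative(x):
    # sin x - x cos x, the result of integrating by parts with u = x, v' = sin x
    return math.sin(x) - x * math.cos(x)


numeric = trapezoid(integrand, 0.0, 2 * math.pi, 2000)
exact = antiderivative(2 * math.pi) - antiderivative(0.0)
print("numerical estimate:", numeric)
print("from anti-derivative:", exact, " and -2*pi =", -2 * math.pi)
```

If the two printed values disagreed noticeably, that would point to a sign error somewhere in the integration by parts.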
Exercise 11.5. Evaluate $\int_0^{\pi} x\cos(x)\,dx$.
Here are a few more tips to help you use integration by parts. Also, you should see a
notational variation that is common in textbooks and on the web. Instead of
$\int v'u\,dx = uv - \int u'v\,dx$, you will often see
$$\int u\,dv = uv - \int v\,du .$$
Because u and v are functions of x, you can think of $du := u'(x)\,dx$ and $dv := v'(x)\,dx$,
whereby this form of the identity comes out to exactly the same thing as (11.4).
Sometimes integration by parts doesn’t quite get you to an expression $u'v$ that you know
how to evaluate, but it gets you closer, so that repeating the integration by parts solves the
problem.
$$\int x^2 e^x\,dx = x^2 e^x - \int 2xe^x\,dx .$$
That last expression isn’t covered by Proposition 11.1 but we just saw (take out the constant
factor 2) that it can be done by parts and integrates to $2(xe^x - e^x) = 2(x - 1)e^x$. Therefore,
$$\int x^2 e^x\,dx = x^2 e^x - 2(x - 1)e^x = (x^2 - 2x + 2)e^x .$$
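It is worth confirming this the way any anti-derivative is confirmed, by differentiating. The short computation below (ours, using only the product rule) recovers the original integrand:

```latex
\begin{align*}
\frac{d}{dx}\left[(x^2 - 2x + 2)e^x\right]
  &= (2x - 2)e^x + (x^2 - 2x + 2)e^x \\
  &= x^2 e^x .
\end{align*}
```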
It should be apparent you can integrate $p(x)e^x$ this way for any polynomial p. Some
textbooks have a separate algorithm for this called tabular integration. We won’t be
teaching that, but you can google it if you ever need the anti-derivative of $p(x)e^x$ where
p(x) has degree more than, say, 3 (doing it by hand gets longer and more complicated as
the degree of p grows). To see how this will go, try the following exercise, which is about
as much as we would ever ask you to do by hand.
Exercise 11.6. Compute $\int x^3 e^x\,dx$. Double check afterward by differentiating your answer.
You can always decompose any expression as itself times 1. In the language of $\int v\,du$ and
$\int u\,dv$, that says $\int f(x)\,dx$ can always be thought of as $\int u\,dv$ where u(x) = f(x) and dv = dx,
that is, $v' = 1$. This only sometimes works but it’s good to know.
Example 11.8. Compute $\int \ln(x)\,dx$. There’s only one term to decompose so we pretty
much have to use the dv = dx trick. Setting u(x) = ln x and dv = dx, gives (recalling that
the derivative of ln x is 1/x),
$$\int \ln(x)\,dx = (\ln x)(x) - \int x \cdot \frac{1}{x}\,dx = x\ln x - \int 1\,dx = x\ln x - x + C .$$
This is a good one to memorize - it’s very useful to recall quickly how to integrate the
natural log.
11.3 Substitution
Integration by parts is what you get from reversing the product rule. Reversing the chain
rule is called substitution. You can probably guess what it says. The chain rule says
$(d/dx)f(g(x)) = f'(g(x))g'(x)$. Therefore, we need a rule to tell us that
$\int f'(g(x))g'(x)\,dx = f(g(x)) + C$.
Theorem 11.9. Suppose g is differentiable on an interval (a, b) and let I (which will also
be a closed interval) be the range of g. Suppose h is differentiable on I. Then
$$\int h'(g(x))g'(x)\,dx = h(g(x)) + C .$$
where the identity $f = h'$ allows us to write the indefinite integral $\int f$ in place of h on
the right. This second form is sometimes clearer because we often arrive at the form
$f(g(x))g'(x)$ before we have identified the antiderivative of f, hence it makes sense for
the right-hand side to leave $\int f$ unevaluated.
Example 11.10. We compute the integral of $\frac{(\ln x)^2}{x}$. The numerator $(\ln x)^2$ looks like a
composition f(g(x)) where $f(x) = x^2$ and g(x) = ln x. We are in luck because $g'(x) = 1/x$
so there is already a $g'$ sitting there. The expression to be integrated looks like $f(g(x))g'(x)$,
so applying (11.5),
$$\int \frac{(\ln x)^2}{x}\,dx = \left(\int x^2\right) \circ \ln .$$
The indefinite integral of $x^2$ is $x^3/3$, so the final answer is that the indefinite integral of
$(\ln x)^2/x$ is $(\ln x)^3/3 + C$.
Exercise 11.7. Use the substitution method to evaluate $\int (2x)e^{x^2}\,dx$.
The substitution rule is very often stated in the language of science, with a variable u,
thought of as a physical quantity related to the variable x via u = g(x). For this reason
the substitution method is commonly referred to as “u-substitution”, a name which is a
little silly only because it ties the method to a particular variable name u when of course
you could choose any name. Instead of a theorem, this version is usually described as a
procedure.
Again, we let g be the function relating u to x via u = g(x), and again you need hypotheses,
namely the ones stated in Theorem 11.9. Then $du = g'(x)\,dx$. Usually you don’t do this
kind of substitution unless there will be a $g'(x)\,dx$ term waiting which you can then turn
into du. Also, you don’t do this unless the rest of the occurrences of x can also be turned
into u. If g has an inverse function, you can do this by substituting $g^{-1}(u)$ for x everywhere.
Now when you reach the fourth step, it’s easier because you can just plug in u = g(x) to
get things back in terms of x.
This notation gives a particularly nice simplification when u = x + c for some constant c.
Replacing x by x + c is called a translation. In the first unit of the course, we discussed
what this does to the graph. It is a very natural change of variables, corresponding to a
different starting point for a parametrization.
Example 11.11 (translation). Compute the indefinite integral of $\sqrt{x + 6}$. Let u = x + 6.
Then du = dx. Integrating the 1/2 power (one of the basic facts in Proposition 11.1),
$$\int \sqrt{x + 6}\,dx = \int \sqrt{u}\,du = \frac{2}{3} u^{3/2} = \frac{2}{3}(x + 6)^{3/2} + C .$$
The moral of this story is that you can “read off” integrals of translations. For example,
knowing $\int \cos x\,dx = \sin x$ allows you to read off $\int \cos(x - \pi/4)\,dx = \sin(x - \pi/4)$. Don’t let
this example fool you into thinking it works this way for functions other than translations.
Thinking that $\int \cos(\sqrt{x})\,dx = \sin(\sqrt{x}) + C$ is wrong; it is the calculus equivalent of the
algebra mistake $(a + b)^2 = a^2 + b^2$.
is easily evaluated as $u^{n+1}/(n + 1) + C$. Now plug back in u = sin x and you get the answer
$$\frac{\sin^{n+1} x}{n + 1} + C .$$
You might think to worry whether the substitution had the right domain and range, was
one to one, etc., but you don’t need to. When computing an indefinite integral you are
computing an anti-derivative and the proof of correctness is whether the derivative is what
you started with. You can easily check that the derivative of $\sin^{n+1} x/(n + 1)$ is $\sin^n x \cos x$.
After a translation, the next simplest substitution is a dilation, where u(x) = cx for some
nonzero real number c. This is the other case in which substitution always succeeds: if you
can integrate f (x) you can always integrate f (cx). We leave it to you to work this out, first
in an example, then in the general case.
Exercise 11.8.
(i) Use substitution to integrate cos(5x).
(ii) Suppose you know the anti-derivative for f; say $f = h'$. Use substitution to work out
the general formula for $\int f(cx)\,dx$.
When evaluating a definite integral you can compute the indefinite integral as above and
then evaluate. A second option is to change variables, including the limits of integration,
and then never change back.
Example 11.13. Compute $\int_1^2 \frac{x}{x^2 + 1}\,dx$.
If we let $u = x^2 + 1$ then du = 2x dx, so the integrand becomes (1/2) du/u. If x goes from 1
to 2 then u goes from 2 to 5, thus the integral becomes
$$\int_2^5 \frac{1}{2}\,\frac{du}{u} = \frac{1}{2}(\ln 5 - \ln 2) .$$
Of course you can get the same answer in the usual way: the indefinite integral is (1/2) ln u;
we substitute back and get $(1/2)\ln(x^2 + 1)$. Now we evaluate at 2 and 1 instead of 5 and 2,
but the result is the same: (1/2)(ln 5 − ln 2).
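The claim that changing variables and limits together leaves the value unchanged can be tested numerically. In the sketch below (ours; the midpoint rule, the function name, and the 10,000 strips are assumptions for illustration), the integral is estimated once in the variable x on [1, 2] and once in the variable u on [2, 5], and both are compared with (1/2)(ln 5 − ln 2).

```python
import math


def midpoint(f, a, b, n=10_000):
    """Midpoint Riemann sum of f on [a, b] with n equal strips."""
    h = (b - a) / n
    return h * sum(f(a + (j + 0.5) * h) for j in range(n))


in_x = midpoint(lambda x: x / (x * x + 1), 1.0, 2.0)   # original integral in x
in_u = midpoint(lambda u: 0.5 / u, 2.0, 5.0)           # after u = x^2 + 1, limits 2 to 5
closed_form = 0.5 * (math.log(5) - math.log(2))

print(in_x, in_u, closed_form)
```

Note that the two integrands and the two intervals are different; only the resulting numbers agree, which is exactly what the substitution rule promises.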
Backwards substitution
There are times when the best substitution is of the form x = g(u) rather than u = g(x).
No matter what f and g are, the substitution x = g(u), dx = g 0 (u) du always leads to a
new integral, it’s just hard to choose g in a way that makes the new integral simpler than
the old one. It turns out there are some integrals, not apparently involving trig functions,
where substituting x = g(u) for some trig function g will magically unlock a dead end.
Knowing tricks for dealing with a wide class of anti-derivative extractions is not the aim of
this course, therefore we will not be featuring this method in the text. If you’re interested
in seeing one of these, try googling “integrate sqrt(1-x^2)”.
Looking it up
Math is about understanding relations of a precise nature, about abstraction, and about
making models of physical phenomena. It is also about building a library of computational
tricks, but that’s only a small part of math, and it’s somewhat time-consuming. We have
taught you what we think it is reasonable for you to know and remember – to have in
your quick-access library. For all the other integrals currently known to mankind, there are
lookup tables. The following integral table is stolen from a popular calculus book. Use it
as a handy reference, as needed.
12 Integrals over the whole real line
12.1 Definitions
The situation when integrating out to infinity is similar to the situation with infinite sums.
Because there is no already assigned meaning for summing infinitely many things, we de-
fined this as a limit, which in each case needs to be evaluated:
$$\sum_{k=1}^{\infty} a_k := \lim_{M \to \infty} \sum_{k=1}^{M} a_k .$$
It is the same when one tries to integrate over the whole real line. We define such integrals
by integrating over a bigger and bigger piece and taking the limit. In fact the definition is
even pickier than that. We only let one of the limits of integration go to infinity at a time.
Consider first an integral over a half-line [a, ∞).
Definition 12.1 (one-sided integral to infinity). Let a be a real number and let f be a
continuous function on the infinite interval [a, ∞). We define
$$\int_a^{\infty} f(x)\,dx := \lim_{M \to \infty} \int_a^M f(x)\,dx . \qquad (12.1)$$
Exercise 12.1. Write down the defining limit for $\int_{-\infty}^{3} e^x\,dx$ and evaluate the limit.
We remark that you can often substitute ∞ into the antiderivative and subtract:
$\int_1^{\infty} dx/x^2 = (-1/x)|_1^{\infty} = 0 - (-1) = 1$. If the value of −1/(∞) were not obvious, you would need limits.
Aside. When you say −1/∞ = 0, recalling Definition 6.1, you are really saying
$\lim_{M \to \infty} -1/M = 0$ and then quickly evaluating that limit in your head.
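The limit hiding behind "substituting ∞" can be watched directly. The sketch below is ours: it evaluates the integral of 1/x² from 1 to M using the anti-derivative −1/x for a few increasingly large values of M. The function name and the particular values of M are assumptions for illustration.

```python
def integral_to_M(M):
    """Value of the integral of 1/x^2 from 1 to M, via the anti-derivative -1/x."""
    return (-1.0 / M) - (-1.0 / 1.0)


values = [(M, integral_to_M(M)) for M in (10, 100, 1000, 10 ** 6)]

for M, value in values:
    print(f"M = {M:>8}: {value}")
```

Each value falls short of 1 by exactly 1/M, so the values increase toward 1 without ever reaching it, which is what the statement that the limit equals 1 means.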
If we want both limits to be infinite then we require the two parts to be defined separately.
Definition 12.2 (two-sided integral to infinity). Let f be a
continuous function on the whole real line. Pick a real number c and define
$$\int_{-\infty}^{\infty} f(x)\,dx := \int_{-\infty}^{c} f(x)\,dx + \int_{c}^{\infty} f(x)\,dx . \qquad (12.2)$$
If either of these two limits is undefined, the whole integral is said not to exist.
Example 12.3. What is $\int_{-\infty}^{\infty} \frac{x}{x^2 + 1}\,dx$? Choosing c = 0, we see it is the sum of two one-sided
infinite integrals $\int_0^{\infty} x/(x^2 + 1)\,dx + \int_{-\infty}^{0} x/(x^2 + 1)\,dx$. Going back to the definition replaces
each one-sided infinite integral by a limit:
$$\lim_{M \to \infty} \int_0^M \frac{x}{x^2 + 1}\,dx + \lim_{M \to \infty} \int_{-M}^0 \frac{x}{x^2 + 1}\,dx .$$
It looks as if this limit is going to come out to be zero because $x/(x^2 + 1)$ is an odd
function. Integrating from −M to M will produce exactly zero, therefore
$$\lim_{M \to \infty} \int_{-M}^{M} \frac{x}{x^2 + 1}\,dx = \lim_{M \to \infty} 0 = 0 . \qquad (12.3)$$
Be careful! The definition says not to evaluate (12.3) but rather to evaluate the two one-
sided integrals separately and sum them. We will come back to finish this example later.
The answer to the first question is, pick c to be anything, you’ll always get the same answer.
This is important because otherwise, what we wrote isn’t really a definition. The reason the
integral does not depend on c is that if one changes c from, say, 3 to 4, then the first of the
two integrals in (12.2) gains a piece: $\int_3^4 f(x)\,dx$. But the second integral loses this same piece, so the
sum is unchanged. This is true even if one or both pieces is infinite. Adding or subtracting
the finite quantity $\int_3^4 f(x)\,dx$ won’t change that.
The answer to the second question is yes, sometimes you can be more specific. The one-
sided integral to infinity is a limit. Cases where a finite limit does not exist can be resolved
into limits of ∞ or −∞, along with the remaining cases where no limit exists even allowing
for infinite limits. Because integrals over the whole real line are sums of one-sided (possibly
infinite) limits, the rules for infinity from Sections 3.2 and 6.2 can be applied. In other
words, integrals over the whole real line are the sum of two one-sided limits; we can add real
numbers and ±∞ according to the rules in Definition 6.1: ∞ + ∞ = ∞ (and analogously
with −∞), ∞ + a = ∞ when a is real (and analogously with −∞), ∞ − ∞ = UND,
UND + anything = UND, and so on.
The third question is also a matter of definition. The reason we make the choice to do it
this way is illustrated by the integral of the sign function
$$f(x) = \operatorname{sign}(x) = \begin{cases} 1 & x > 0 \\ 0 & x = 0 \\ -1 & x < 0 \end{cases}$$
On one hand, $\int_{-M}^{M} f(x)\,dx$ is always zero, because the positive and negative parts exactly
cancel. On the other hand, $\int_M^{\infty} f(x)\,dx$ and $\int_{-\infty}^{M} f(x)\,dx$ are always undefined. Do we
want the answer for the whole integral $\int_{-\infty}^{\infty} f(x)\,dx$ to be undefined or zero? There is no
intrinsically correct choice here but it is a lot safer to have it undefined. If it has a value,
one could make a case for values other than zero by centering the integral somewhere else,
as in the following exercise.
Exercise 12.2. What is $\lim_{M \to \infty} \int_{7-M}^{7+M} \operatorname{sign}(x)\,dx$?
Example 12.4. The function sin(x)/x is not defined at x = 0 but you might recall it does
have a limit at 0, namely $\lim_{x \to 0} \sin(x)/x = 1$. Therefore the function
$$\operatorname{sinc}(x) := \begin{cases} \sin(x)/x & x \neq 0 \\ 1 & x = 0 \end{cases}$$
is a continuous function on the whole real line. Its graph is shown in Figure 44. To write
down a limit that defines this integral, we first choose any c. Choosing c = 0 makes things
symmetric. The integral is then defined as the sum of two integrals,
$\int_{-\infty}^{0} \operatorname{sinc}(x)\,dx + \int_{0}^{\infty} \operatorname{sinc}(x)\,dx$. Going back to the definition of one-sided integrals as limits, this sum of
integrals is equal to
$$\lim_{M \to -\infty} \int_M^0 \operatorname{sinc}(x)\,dx + \lim_{M \to \infty} \int_0^M \operatorname{sinc}(x)\,dx .$$
Figure 44: graph of the function sinc
It is not obvious whether these limits exist. One thing is easy to discern: because sinc is
an even function, the two limits have the same value (whether finite or not). We can safely
say:
\[
\int_{-\infty}^{\infty} \operatorname{sinc}(x)\,dx = 2 \cdot \lim_{M \to \infty} \int_{0}^{M} \operatorname{sinc}(x)\,dx .
\]
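If you are curious whether the limit for sinc exists, you can at least look at the partial integrals numerically. This is a sketch, not a proof: `simpson` is our own helper implementing composite Simpson's rule, and the values of M are arbitrary.

```python
import math


def sinc(x):
    """sin(x)/x, extended by its limit value 1 at x = 0."""
    return 1.0 if x == 0 else math.sin(x) / x


def simpson(f, a, b, n):
    """Composite Simpson's rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


for M in (10, 50, 200, 800):
    half = simpson(sinc, 0.0, M, 100 * M)  # about 100 subintervals per unit length
    print(M, 2 * half)                      # twice the one-sided integral up to M
```

The printed values wobble less and less as M grows rather than running off to infinity, which suggests (but does not prove) that the limit exists.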
Exercise 12.3. Evaluate $\int_{-\infty}^{\infty} x\,dx$ by writing down the definition via limits and then
evaluating.
12.2 Convergence
The central question of this section is: how do we tell whether a limit such as $\int_b^\infty f(x)\,dx$
exists? If so, we would like to evaluate it if possible, and estimate it otherwise. When
discussing convergence you should realize that $\int_a^\infty f(x)\,dx$ either diverges for all values of
a or converges for all values of a as long as f is defined and continuous on [a, ∞). For this
reason, we use the notation $\int^\infty f(x)\,dx$ or, to be really blunt, $\int_{\text{who cares}}^\infty f(x)\,dx$.
Exercise 12.4. Explain the “you should realize” comment in a concrete context by stating
a reason why $\int_3^\infty e^{-3\ln(\ln x)}\,dx$ converges if and only if $\int_6^\infty e^{-3\ln(\ln x)}\,dx$ converges. Hint:
remember the questions we said should bother you, “What is c? Does it matter?”
Case 1: you know how to compute the definite integral
Suppose $\int_b^M f(x)\,dx$ is something for which you know how to compute an explicit formula.
The formula will have M in it. You have to evaluate the limit as M → ∞. How do you
do that? There is no one way, but that’s why we studied limits before. Apply what you
know. What about b, do you have to take a limit in b as well? I hope you already knew the
answer to that. In this definition, b is any fixed number. You don’t take a limit.
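Here is the recipe carried out once, on an example of our own choosing (picked so as not to give away the special cases that follow): $f(x) = 1/(x(x+1))$ with b = 1. We compute the integral up to M by partial fractions, then take the limit in M only.

```latex
\begin{align*}
\int_1^M \frac{dx}{x(x+1)}
  &= \int_1^M \left( \frac{1}{x} - \frac{1}{x+1} \right) dx
   = \ln\frac{M}{M+1} - \ln\frac{1}{2}, \\
\int_1^\infty \frac{dx}{x(x+1)}
  &= \lim_{M \to \infty} \left( \ln\frac{M}{M+1} + \ln 2 \right) = \ln 2 .
\end{align*}
```

The lower limit 1 stays fixed throughout; only M moves.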
These special cases will become theorems once you have worked them out.
    $\int_b^\infty e^{kx}\,dx$
    power test: $\int_b^\infty x^p\,dx$
    $\int_b^\infty \frac{(\ln x)^q}{x}\,dx$
You will work out these cases in class: write each as a limit, evaluate the limit, state
whether it converges, which will depend on the value of the parameter, k, p or q. Go
ahead and pencil them in once you’ve done this. The second of these, especially, is worth
remembering because it is not obvious until you do the computation where the break should
be between convergence and not.
Exercise 12.5. Work out the first special case: for what real k does the integral converge?
Case 2: you don’t know how to compute the definite integral
In this case you can’t even get to the point of having a difficult limit to evaluate. So probably
you can’t evaluate the improper integral. But you can and should still try to answer whether
the integral has a finite value versus being undefined. This is where comparison tests come
in. You build up a library of cases where you do know the answer and then, for the rest of
functions, you try to compare them to functions in your library.
Sometimes a comparison is informative, sometimes it isn’t. Suppose that f and g are
positive functions and f (x) ≤ g(x) for all x. Consider several pieces of information you
might have about these functions.
Comparison tests
(a) $\int_b^\infty f(x)\,dx$ converges to a finite value L. conclusion:
(b) $\int_b^\infty f(x)\,dx$ does not converge. conclusion:
(c) $\int_b^\infty g(x)\,dx$ converges to a finite value L. conclusion:
(d) $\int_b^\infty g(x)\,dx$ does not converge. conclusion:
In which cases can you conclude something about the other function? We are doing this
in class. Once you have the answer, either by working it out yourself or from the class
discussion, please pencil it in here so you’ll have it for later reference.
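Exercise 12.6. Suppose you want to show that $\int_1^\infty \frac{3 + \sin(x)}{x^2}\,dx$ converges. Which pair
of facts allows you to do this?
(a) $\frac{3+\sin x}{x^2} \ge \frac{2}{x^2}$ and $\int^\infty \frac{2}{x^2}\,dx$ converges
(b) $\frac{3+\sin x}{x^2} \le \frac{4}{x^2}$ and $\int^\infty \frac{4}{x^2}\,dx$ does not converge
(c) $\frac{3+\sin x}{x^2} \le \frac{4}{x^2}$ and $\int^\infty \frac{4}{x^2}\,dx$ converges
(d) $\frac{3+\sin x}{x^2} \le \frac{4}{x^2}$ and $\int^\infty \frac{2}{x^2}\,dx$ does not converge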
Here are two key ideas that help your comparison tests work more often, based on the fact
that the question “convergence or not?” is not sensitive to certain things.
(1) Multiplying by a constant does not change whether an integral converges. That’s because
if $\lim_{M \to \infty} \int_b^M f(x)\,dx$ converges to the finite constant L then $\lim_{M \to \infty} \int_b^M K f(x)\,dx$ converges
to the finite constant KL.
Exercise 12.7. Does $\int^\infty \frac{10}{x}\,dx$ converge or not? In either case, give a reason why. If
it converges, say to what. If it does not converge, is the value ∞ or −∞ or is it truly
undefined?
(2) It doesn’t matter whether f(x) ≤ g(x) for every single x, as long as the inequality is true for
sufficiently large x. For example, if f(x) ≤ g(x) once x ≥ 100, then you can apply the
comparison test to compare $\int_b^\infty f(x)\,dx$ to $\int_b^\infty g(x)\,dx$ as long as b ≥ 100. But even if not,
once you compare $\int_{100}^\infty f(x)\,dx$ to $\int_{100}^\infty g(x)\,dx$, then adding the finite quantity $\int_b^{100} f(x)\,dx$
or $\int_b^{100} g(x)\,dx$ will not change whether either of these converges.
Putting these two ideas together leads to the conclusion that if f(x) ≤ Kg(x) from some
point onward and $\int_b^\infty g(x)\,dx$ converges, then so does $\int_b^\infty f(x)\,dx$. The theorem we just
proved is:
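Theorem 12.5 (asymptotic comparison). If f and g are positive functions on some interval
(b, ∞) and if there are some constants M and K such that
\[
f(x) \le K g(x) \quad \text{for all } x \ge M, \tag{12.4}
\]
then convergence of $\int_b^\infty g(x)\,dx$ implies convergence of $\int_b^\infty f(x)\,dx$.
In particular, if $f(x) \ll g(x)$ as x → ∞ then (12.4) holds, hence convergence of the integral
$\int_b^\infty g(x)\,dx$ implies convergence of the integral $\int_b^\infty f(x)\,dx$.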
Exercise 12.8. Let f (x) := 3x3 /(x − 17) and g(x) := x2 . Is it true that f (x) ≤ Kg(x)
from some point onward? Explain.
Example 12.6 (power times negative exponential). Does $\int_1^\infty x^8 e^{-x}\,dx$ converge? One
way to do this is by computing the integral exactly. This takes eight integrations by parts,
and is probably too messy unless you figured out how to do “tabular” integration (optional
when you learned integration by parts). In any case, there’s an easier way if you only want
to know whether it converges, but not to what.
We claim that $x^8 e^{-x} \ll e^{-(1/2)x}$ (you could use $e^{-\beta x}$ in this argument for any β ∈ (0, 1)).
It follows from the asymptotic comparison test that convergence of $\int_1^\infty e^{-(1/2)x}\,dx$ implies
convergence of $\int_1^\infty x^8 e^{-x}\,dx$. We check the claim by evaluating
\[
\lim_{x \to \infty} \frac{x^8 e^{-x}}{e^{-(1/2)x}} = \lim_{x \to \infty} \frac{x^8}{e^{(1/2)x}} = 0
\]
because we know the power $x^8$ is much less than the exponential $e^{(1/2)x}$.
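To get a feel for the limit in Example 12.6, you can tabulate the ratio for a few values of x. This is only a numerical illustration with sample points of our choosing; the function name `ratio` is ours.

```python
import math


def ratio(x):
    """x^8 e^{-x} divided by e^{-x/2}, which simplifies to x^8 / e^{x/2}."""
    return x**8 / math.exp(x / 2)


for x in (1, 10, 50, 100, 200):
    print(x, ratio(x))
```

The ratio is large at first (it peaks near x = 16) before collapsing toward zero. That is fine: the comparison only needs the inequality to hold from some point onward.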
Exercise 12.9. Does $\int_{18}^\infty \frac{x^3}{x - 17}\, e^{-x}\,dx$ converge? [You can use the result of Exercise 12.8.]
A particular case of Theorem 12.5 is when f(x) ∼ g(x). When two functions are asymptotically
equivalent, then each can be upper bounded by a constant multiple of the other,
hence we have the following proposition.
Proposition 12.7. If f and g are positive functions and f ∼ g then $\int^\infty f(x)\,dx$ converges
if and only if $\int^\infty g(x)\,dx$ converges.
Example 12.8.
(i) Does $\int_1^\infty \frac{dx}{x^2 + 3x}$ converge?
Answer: We can use comparison test (c) here: $\frac{1}{x^2 + 3x} \le \frac{1}{x^2}$ and we know
$\int_1^\infty \frac{dx}{x^2}$ converges, hence so does $\int_1^\infty \frac{dx}{x^2 + 3x}$.
(ii) Does $\int_4^\infty \frac{dx}{x^2 - 3x}$ converge?
Answer: Now the inequality goes the other way, so we are in case (a) of the comparison
test and we cannot conclude anything from direct comparison. However, we also know
$\frac{1}{x^2 - 3x} \sim \frac{1}{x^2}$ as x → ∞, therefore we can conclude convergence again by Proposition 12.7.
Did you wonder about the lower limit of 4 in part (ii)? That wasn’t just randomly added
so you’d be more flexible about the lower limits of integrals to infinity. It was put there to
ensure that f was continuous; note the discontinuity at x = 3.
Exercise 12.10. Find a simple function g such that $(3x + \cos(x))/x^3 \sim g(x)$ as x → ∞.
Then determine whether $\int_1^\infty \frac{3x + \cos x}{x^3}\,dx$ converges.
Students have varied backgrounds when it comes to probability. A few have taken courses
in probability. Most have seen a little probability theory in high school. Some have never
studied anything to do with probability. Because of the varied backgrounds, we take a
couple of paragraphs to discuss the key concepts.
The first thing students usually learn is discrete probability, where the random variables
take values in a finite set, with given probabilities for each outcome. That’s because this
can be studied with middle school mathematics. For example, rolling two 6-sided dice leads
to 36 possible outcomes, each equally likely; this in turn leads to 11 possible outcomes for
the sum of the two dice, with probabilities ranging from 1/36 for 2 and 12 to 6/36 for 7.
All questions about rolls of finitely many dice can be answered with careful analysis and
basic arithmetic.
Random variables whose values are spread over all real numbers, or a real interval, require
calculus to define and study. These are called continuous random variables, and are the
topic of this section.
For discrete random variables you answer this type of question by summing the probability
that X is equal to y for every y in the set A. For continuous random variables, the probability
of being equal to any one real number is zero. In the example with the dart, the probability
that it lands exactly $\sqrt{3}$ feet from the left edge (or 1 foot, or 1/3 of a foot, or any other
real number of feet) is zero. The only way to get a nonzero probability is to consider an
entire interval of values. Thus the most basic questions we ask about X are: what is the
probability that X ∈ [a, b], where a < b are fixed real numbers. These probabilities will be
governed by a probability density, which is a nonnegative function telling how likely it
is for X to be in an interval centered at any given real number.
Exercise 12.12. Why do we require f to integrate to 1?
Sometimes f is defined only on an interval [a, b] and not on the whole real line. The
interpretation is that the random variable X takes values only in [a, b]. Probabilities for X
are then defined by integrating in sub-intervals of [a, b]. Often one extends the definition of
f to all real numbers by making it zero off of [a, b]. This may result in f being discontinuous
but its definite integrals are still defined.
Example 12.10. The standard exponential random variable has density $e^{-x}$ on [0, ∞). If
X has this density, what is P(X ∈ [−1, 1])? This is the same as P(X ∈ [0, 1]), because X
cannot be negative. We compute it by $\int_0^1 e^{-x}\,dx = \left. -e^{-x} \right|_0^1 = 1 - e^{-1}$. As a quick reality
check we observe that the quantity $1 - \frac{1}{e}$ is indeed between zero and one, therefore it makes
sense for this to be a probability.
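The computation in Example 12.10 can be double-checked numerically. In this sketch the density is extended by zero to the negative half-line, so integrating over [−1, 1] and over [0, 1] should agree; `midpoint_integral` is our own helper, not something from the text.

```python
import math


def midpoint_integral(f, a, b, n=10_000):
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


def density(x):
    """Standard exponential density: e^{-x} on [0, inf), zero elsewhere."""
    return math.exp(-x) if x >= 0 else 0.0


prob = midpoint_integral(density, -1.0, 1.0)
print(prob, 1 - math.exp(-1))   # numerical value next to the exact answer
```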
Often the model dictates the form of the function f but not a multiplicative constant.
Example 12.11. For example, if we know that f(x) should be of the form $Cx^{-3}$ on [1, ∞)
then we would need to find the right constant C to make this a probability density. The
function f has to integrate to 1, meaning we have to solve
\[
\int_1^\infty C x^{-3}\,dx = 1
\]
for C. Solving this results in C = 2, therefore the density f is $2/x^3$ on [1, ∞).
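In case the step from the equation to C = 2 went by too fast, here it is written out, treating the integral to infinity as a limit in the usual way.

```latex
\[
\int_1^\infty C x^{-3}\,dx
  = \lim_{M \to \infty} \left[ -\frac{C}{2}\, x^{-2} \right]_1^M
  = \lim_{M \to \infty} \frac{C}{2} \left( 1 - M^{-2} \right)
  = \frac{C}{2} ,
\]
```

so the requirement C/2 = 1 forces C = 2.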
Exercise 12.14. Suppose X has density proportional to cos(x) on the interval [−π/2, π/2].
What value of C makes C cos x a probability density on this interval?
Several important quantities associated with a probability distribution are the mean, the
variance, the standard deviation and the median. Again, a couple of paragraphs don’t do
justice to these ideas, but we hope they explain the concepts at least a little and make the
math seem more motivated and relevant.
Probably the simplest concept intuitively is the median. This is the 50th percentile of the
distribution.
Definition 12.12. The median of a random variable X having probability density f is the
real number m such that
\[
\mathbb{P}(X > m) = \mathbb{P}(X < m) = \frac{1}{2} . \tag{12.5}
\]
Definition 12.13.
1. If X has probability density f, the mean or expectation of X (the two terms are
synonyms) is the quantity $EX := \int_{-\infty}^{\infty} x\, f(x)\,dx$. A variable commonly used for the
mean of a distribution is µ.
2. If X has probability density f and mean µ, the variance of X is the quantity
\[
\operatorname{Var}(X) := \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\,dx .
\]
To understand these intuitively, you might recall what happens when rolling a die. Each of
the six numbers comes up about 1/6 of the time, so in a large number N of dice rolls you
will get about N/6 of each of the six outcomes. The average will therefore be
\[
\frac{1}{N} \left[ (N/6) \cdot 1 + (N/6) \cdot 2 + (N/6) \cdot 3 + (N/6) \cdot 4 + (N/6) \cdot 5 + (N/6) \cdot 6 \right] .
\]
We can write this in summation notation as
\[
\sum_{j=1}^{6} j \cdot \mathbb{P}(X = j) .
\]
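You can watch the long-run average of die rolls approach this sum with a short simulation. This is a sketch using Python's standard pseudo-random generator; the seed and the sample sizes are arbitrary choices of ours.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable


def average_of_rolls(n):
    """Average of n simulated rolls of a fair six-sided die."""
    return sum(random.randint(1, 6) for _ in range(n)) / n


# The sum of j * P(X = j) for j = 1..6, with P(X = j) = 1/6.
expected = sum(j * (1 / 6) for j in range(1, 7))

for n in (100, 10_000, 200_000):
    print(n, average_of_rolls(n), expected)
```

With few rolls the average can be noticeably off; with many rolls it hugs the value of the sum.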
Exercise 12.16. A carnival game that costs a dollar to play gives you a quarter for each
spot on a roll of a die (e.g., 75 cents if you roll a 3). When you have spent N dollars, about
how many quarters will you have received?
When instead there are infinitely many possible outcomes spread over an interval, the sum
is replaced by an integral
\[
\int_{-\infty}^{\infty} x \cdot f(x)\,dx .
\]
A famous theorem in probability theory, called the Strong Law of Large Numbers, says that
this still computes the long term average: the long term average of independent draws from
a distribution with probability density function f will converge to $\int x \cdot f(x)\,dx$.
Exercise 12.17. The random variable X has probability density 2x on [0, 1]. If you sample
a million times and take the average of the samples, roughly what will you get?
It is more difficult to understand why the variance has the precise definition it does, but it
is easy to see that the formula produces bigger values when the random variable X tends to
be farther from its mean value µ. The standard deviation is another measure of dispersion.
To see why it might be more physically relevant, consider the units.
Probabilities such as P(X ∈ [a, b]) can be considered to be unitless because they represent
ratios of like things: frequency of occurrences within the interval [a, b] divided by frequency
of all occurrences. Probability densities, integrated against the variable x (which may have
units of length, time, etc.) give probabilities. Therefore, probability densities have units of
“probability per unit x-value”, or in other words, inverse units to the independent variable.
The units of the mean are units of $\int x f(x)\,dx$, which is units of f times $x^2$; but f has units
of inverse x, so the mean has units of x. This makes sense because the mean represents
a point on the x-axis. Similarly, the variance has units of $x^2$. It is hard to see what the
variance represents physically. The standard deviation, however, has units of x. Therefore,
it is a measure of dispersion having the same units as the mean. It represents a distance on
the x-axis which is some kind of average discrepancy from the mean.[14]
Exercise 12.18. Here are three probability densities with mean 1. Rank them in order from
greatest to least standard deviation. You don’t have to compute precisely unless you want
to; just state an answer and justify it intuitively. The three densities are graphed to the right.
[14] To be precise, a root-mean-square discrepancy.
Some common probability densities
There are a zillion different functions commonly used for probability densities. Three of
the most common are named in this section: the exponential, the uniform, and the normal.
These are common in probability for reasons analogous to why exponential behavior is
common in evolving systems. They come from simple properties.
The uniform, as the name implies, arises when a random quantity is uniformly likely to
be anywhere in an interval. It is often used as an “uninformed” model when all you know
is that a quantity has to be somewhere in a fixed interval. The normal arises when many
small independent contributions are summed. It is often used to model observational error.
The exponential is the so-called memoryless distribution. It arises when the probability
of finding X in the next small interval, given that you haven’t already found it, is always
constant.
All three of these are parametrized families of distributions. Once values are picked for the
parameters you get a particular distribution. This section concludes by giving definitions
of each and discussing typical applications.
The exponential distribution has a parameter µ which can be any positive real number. Its
density is $(1/\mu)e^{-x/\mu}$ on the positive half-line [0, ∞). This is obviously the same as the
density $Ce^{-Cx}$ (just take C = 1/µ) but we use the parameter µ rather than C because a
quick computation shows that the mean of the distribution is equal to µ.
Exercise 12.19. Integrate by parts with u = x and $dv = \mu^{-1} e^{-x/\mu}\,dx$ to show that the mean
of the exponential with parameter µ is µ. Don’t forget to write integrals to ∞ as limits.
The median of the exponential distribution with mean µ is also easy to compute. Solving
$\int_0^M \mu^{-1} e^{-x/\mu}\,dx = 1/2$ gives M = µ · ln 2. When X is a random waiting time, the interpretation
is that it is equally likely to occur before ln 2 times its mean as after. Because ln 2 ≈ 0.7,
the median is significantly less than the mean. When modeling with exponentials, it is good
to remember it produces values that are unbounded but always positive.
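For the record, here is the short computation behind the median of the exponential, with M standing for the median as in the text.

```latex
\[
\int_0^M \mu^{-1} e^{-x/\mu}\,dx
  = \left[ -e^{-x/\mu} \right]_0^M
  = 1 - e^{-M/\mu} = \frac{1}{2}
\;\Longrightarrow\;
e^{-M/\mu} = \frac{1}{2}
\;\Longrightarrow\;
M = \mu \ln 2 .
\]
```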
Any of you who have studied radioactive decay know that each atom acts randomly and
independently of the others, decaying at a random time with an exponential distribution.
The fraction remaining after time t is the same as the probability that each individual
remains undecayed at time t, namely $e^{-t/\mu}$, so another interpretation for the median is the
half-life: the time at which only half the original substance remains. Other examples are
the life span of an organism that faces environmental hazards but does not age, or time for
an electronic component to fail (they don’t seem to age either).
The uniform distribution on the interval [a, b] is the probability distribution whose density is a
constant on this interval: the constant will be 1/(b − a). This is often thought of as the least
informative distribution if you know that the quantity must be between the values a
and b. The mean and median are both (a + b)/2.
Aside. The uniform distribution is less common in nature than the exponential or normal.
On the other hand, if you ask a computer to generate a random number in some range, it
will pick from the uniform distribution unless you program it otherwise.
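As an illustration of the aside, here is what asking a computer for uniform random numbers looks like in Python, using the standard library call `random.uniform`. The endpoints 2 and 5 are hypothetical, chosen only so the output is easy to compare with (a + b)/2.

```python
import random

random.seed(1)  # fixed seed so the run is repeatable

a, b = 2.0, 5.0   # hypothetical endpoints, chosen only for illustration
n = 100_000
samples = [random.uniform(a, b) for _ in range(n)]

sample_mean = sum(samples) / n
sample_median = sorted(samples)[n // 2]
print(sample_mean, sample_median, (a + b) / 2)
```

Both the sample mean and the sample median land close to the midpoint of the interval.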
Exercise 12.20. Use calculus to prove that a constant function C on an interval [a, b] is
a probability density if and only if C = 1/(b − a).
Example 12.14. In your orienteering class you are taken to a far away location and spun
around blindfolded when you arrive. When the blindfold comes off, you are facing at a
random compass angle (usually measured clockwise from due north). It would be reasonable
to model this as a uniform random variable from the interval [0, 360] in units of degrees.
Exercise 12.21. The mean and median are both 180°. Why are these not meaningful
measures of the center of the distribution in this case?
The normal distribution
The normal density with mean µ and standard deviation σ is the density
\[
\frac{1}{\sigma \sqrt{2\pi}}\, e^{-(x-\mu)^2 / (2\sigma^2)} .
\]
The standard normal is the one with µ = 0 and σ = 1. There is a very cool mathematical
reason for this formula, which we will not go into. When a random variable is the result of
summing a bunch of smaller random variables all acting independently, the result is usually
well approximated by a normal. It is possible (though very tricky) to show that the definite
integral of this density over the whole real line is in fact 1 (in other words, that we have
chosen the right constant to make it a probability density).
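Proving that the normal density integrates to 1 is tricky, but checking it numerically is not. The sketch below integrates the standard normal density over [−10, 10] with the midpoint rule; the window and the helper names are our own choices, and the tails outside the window are far too small to matter at this precision.

```python
import math


def normal_density(x, mu=0.0, sigma=1.0):
    """Normal density with mean mu and standard deviation sigma."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))


def midpoint_integral(f, a, b, n=20_000):
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


total = midpoint_integral(normal_density, -10.0, 10.0)
print(total)
```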
13 Taylor polynomials
Polynomials are simpler than most other functions. This leads to the idea of approximating
a complicated function by a polynomial. Taylor realized that this is possible provided there
is an “easy” point at which you know how to compute the function and its derivatives.
Given a function f (x) and a value a, we will define for each degree n a polynomial Pn (x)
which is the “best nth degree polynomial approximation to f (x) near x = a.”
It pays to start very simply. A zero-degree polynomial is a constant. What is the best
constant approximation to f (x) near x = a? Clearly, the constant f (a). What is the best
linear approximation? We already know this, and have given it the notation L(x). It is the
tangent line to the graph of f(x) at x = a and its equation is $L(x) = f(a) + f'(a)(x - a)$.
So now we know that
\begin{align*}
P_0(x) &= f(a) \\
P_1(x) &= f(a) + f'(a)(x - a)
\end{align*}
Figure 45: A function (red), its constant (blue), and linear (black) approximations at x = 2
Figure 45 shows the graph of a function f along with its zeroth and first degree Taylor polynomials
at x = 2. The zeroth degree polynomial is the flat line and the first degree Taylor
polynomial is the tangent line. To refresh your memory on how well these approximate
f (x) near x = a, you might want to look back at Proposition 8.3 and Exercise 8.9.
Exercise 13.1. Suppose $f'(a) \neq 0$, which is true in Figure 45, for example, at a = 2.
Multiple choice question: How good an approximation is $P_0$ near x = a?
(i) $f(x) - P_0(x) \sim a$
(ii) $f(x) - P_0(x) \sim x - a$
(iii) $f(x) - P_0(x) \sim f'(a) \cdot (x - a)$
(iv) $f(x) - P_0(x) \sim f'(a) \cdot (x - a)^2$
To figure out the best degree-n polynomial approximation for all n, the one idea you need
is that the polynomial Pn should match all the derivatives of f up through the first n (the
zeroth being the value of f itself). Let’s check we’ve already done this for P0 and P1 . Check:
$P_0$ was chosen to match the function value at a. Check: $P_1$ matches the first derivative
because $P_1(x)$ is a line; it has the same derivative everywhere, $f'(a)$, chosen to match the
derivative of f at the point a.
Figure 46 shows P3 , P4 and P5 at x = 2 for the same function, with P5 shown in long
dashes, P4 in shorter dashes and P3 in dots. As n grows, notice how Pn becomes a better
approximation and stays close to f (shown in red) for longer.
Proposition 8.3 showed that $|P_1 - f| \ll |x - a|$ near x = a and Exercise 8.9 gave evidence
that in fact $|P_1 - f|$ was on the scale of $|x - a|^2$, at least for a particular example. In
the coming sections we will see that this is true in general, and in fact that $|P_n - f|$ is of
the scale $|x - a|^{n+1}$ near x = a. This is one of the main motivations for studying Taylor
approximations.
Figure 46: Successive Taylor approximations P3 (dots), P4 (short dashes), P5 (long dashes)
and f in red
Exercise 13.2. The Taylor series for 1/x near x = 1 happens to obey the approximation
$|P_n(x) - f(x)| \approx |x - a|^{n+1}$ very closely. About how many digits after the decimal point
would the approximation $P_6(1.01)$ capture of the true value of 1/1.01?
There is a formula for computing Pn . It’s easiest to see what’s going on when computing
Taylor polynomials near x = 0. The algebra for these is enough simpler that these Taylor
polynomials have a different name. A Taylor polynomial near x = 0 is called a MacLaurin
polynomial.
The formula for Taylor and MacLaurin polynomials uses some possibly unfamiliar notation:
$f^{(k)}$ refers to the kth derivative of the function f. This is better than $f'$, $f''$, etc., because
we can use it in a formula as k varies. In this notation, $f^{(0)}$ denotes f itself.
Proposition 13.1 (MacLaurin’s formula). Let f be a function that is n times differentiable
on an interval containing 0. The polynomial $P_n$ whose 0th through nth derivatives at 0 match
those of f is given by the formula
\[
P_n(x) = \sum_{k=0}^{n} \frac{f^{(k)}(0)}{k!}\, x^k . \tag{13.1}
\]
Exercise 13.3. Use formula (13.1) to compute P4 near x = 0 for the function f (x) =
cos(x).
The reason it’s easy to check MacLaurin’s formula is that $(d/dx)^j x^k$ is a simple computation.
When j > k, you get zero. When j = k you get the constant k!. When j < k you get
$k(k-1)\cdots(k-j+1)\,x^{k-j}$ which may seem messy but at the value x = 0 is zero.
Exercise 13.4. What is the 6th derivative evaluated at x = 0 of the polynomial
$10 + 11x + 12x^2 + 13x^3 + 14x^4 + 15x^5 + 16x^6 + 17x^7 + 18x^8$?
match those of f at the point a if you write it instead as a sum $\sum_{k=0}^{n} b_k (x - a)^k$. This is
still a polynomial of degree at most n, now written in a way that makes it easier to evaluate
repeated derivatives at the point a. In fact the same argument proves the following more
general formula.
Proposition 13.2 (Taylor’s formula). Let a be any real number and let f be a function
that can be differentiated at least n times at the point a. The Taylor polynomial for f of
order n about the point a is the polynomial $P_n(x)$ defined by
\[
P_n(x) := \sum_{k=0}^{n} \frac{f^{(k)}(a)}{k!}\, (x - a)^k . \tag{13.2}
\]
Exercise 13.5. Identify the free and the bound variables on the right-hand side of (13.2).
Do all the free variables appear on the left? What does that tell you about the notation
Pn (x)?
Remember to read this sort of thing slowly. Here is roughly the thought process you should
go through when seeing MacLaurin’s formula and Taylor’s formula for the first time.
• It looks as if $P_n$ is a polynomial in the variable x with n + 1 terms.
• Really the polynomial depends on both n and a. It should really be called $P_{n,a}(x)$.
• Hey, what’s the zeroth derivative $f^{(0)}(a)$? Oh, it’s just f(a).
• The degree of $P_n(x)$ will be n unless the coefficient on the highest power $(x - a)^n$ is
zero, in which case the degree will be less.
Example 13.3. Let f(x) := x, with n = 3 and a = 2. The value of f(a) is 2 and the first
three derivatives of f(x) are constants: 1, 0, 0. Therefore
\[
P_3(x) = 2 + 1 \cdot (x - 2) + \frac{0}{2!}(x - 2)^2 + \frac{0}{3!}(x - 2)^3 .
\]
In other words, P3 (x) = x. Obviously P4 , P5 and so on will also be x. Maybe this example
was too trivial. But it does point out a fact: if f is a polynomial of degree d then the terms
of the Taylor polynomial beyond degree d vanish because the derivatives of f vanish. In
fact, Pn (x) = f (x) for all n ≥ d. When a = 0 the Taylor polynomials for n < d are also
pretty simple:
Proposition 13.4. If $f(x) = \sum_{k=0}^{d} a_k x^k$ is a degree-d polynomial, then $P_n(x) = f(x)$ for
Exercise 13.6. What are the Maclaurin polynomials P0 , P1 , P2 , P3 and P4 for f (x) := x2 ?
Example 13.5. $f(x) = e^x$, n = 3 and a = 0. We list the function and its derivatives out
to the third one.

    k    $f^{(k)}(x)$    $f^{(k)}(a)$    $\frac{f^{(k)}(a)}{k!}(x - a)^k$
    0    $e^x$           1               1
    1    $e^x$           1               $x$
    2    $e^x$           1               $x^2/2$
    3    $e^x$           1               $x^3/6$

Summing the last column we find that the cubic Maclaurin polynomial is given by
$P_3(x) = 1 + x + x^2/2 + x^3/6$.
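The bookkeeping in the table is mechanical enough to hand to a computer. The sketch below evaluates a Taylor polynomial from a list of derivative values at a, following formula (13.2); the function name `taylor_poly` and the sample points are our own.

```python
import math


def taylor_poly(derivs_at_a, a, x):
    """Evaluate sum_k derivs_at_a[k] / k! * (x - a)^k, as in formula (13.2)."""
    return sum(d / math.factorial(k) * (x - a) ** k for k, d in enumerate(derivs_at_a))


# For f(x) = e^x at a = 0, every derivative equals 1 there.
derivs = [1, 1, 1, 1]   # f(0), f'(0), f''(0), f'''(0)
for x in (0.1, 0.5, 1.0):
    print(x, taylor_poly(derivs, 0.0, x), math.exp(x))
```

Notice that the agreement with $e^x$ is best for x close to 0 and degrades as x moves away.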
Example 13.6. Let $f(x) = \ln\sqrt{x}$ and expand around a = 1. We’ll do the first two terms
this time.

    k    $f^{(k)}(x)$       $f^{(k)}(a)$    $\frac{f^{(k)}(a)}{k!}(x - a)^k$
    0    $\ln\sqrt{x}$      0               0
    1    $\frac{1}{2x}$     $\frac{1}{2}$   $\frac{1}{2}(x - 1)$
    2    $-\frac{1}{2x^2}$  $-\frac{1}{2}$  $-\frac{1}{4}(x - 1)^2$

Summing the last column we find that $P_2(x) = \frac{x - 1}{2} - \frac{(x - 1)^2}{4}$. If you don’t have a
computing device and you need a quick estimate of $\ln\sqrt{1.4}$, this is one you can do in your
head (really!).
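Here is the mental arithmetic for that estimate, taking the function to be $\ln\sqrt{x}$ and using $P_2(x) = (x-1)/2 - (x-1)^2/4$. The comparison value at the end comes from a calculator and is rounded.

```latex
\[
P_2(1.4) = \frac{0.4}{2} - \frac{(0.4)^2}{4} = 0.2 - 0.04 = 0.16,
\qquad
\ln\sqrt{1.4} \approx 0.168 .
\]
```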
You can always compute a Taylor polynomial using the formula. But sometimes the deriva-
tives get messy and you can save time and mistakes by building up from pieces. Taylor
polynomials follow the usual rules for addition, multiplication and composition. If f and
g have Taylor polynomials P and Q of order n then f + g has Taylor polynomial P + Q.
This is easy to see because the derivative is just the sum of the derivatives. Furthermore,
the order n Taylor polynomial for f g is P · Q (ignore terms of order higher than n). This is
because the product rule for the derivative of f g looks exactly like the rule for multiplying
polynomials. I won’t present a proof here but you can feel free to use this fact.
Example 13.7. What is the cubic Maclaurin polynomial for $e^x \sin x$? The respective cubic
Taylor polynomials are $1 + x + x^2/2 + x^3/6$ and $x - x^3/6$. Multiplying these and ignoring
terms with a power beyond 3 we get
\[
P_3(x) = x \left( 1 + x + \frac{x^2}{2} \right) - \frac{x^3}{6} \cdot 1 = x + x^2 + \frac{x^3}{3} .
\]
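Multiplying and discarding high powers is easy to automate. The following sketch stores a polynomial as a list of coefficients, lowest degree first, and uses exact fractions so no rounding creeps in; the function name `multiply_truncated` is ours.

```python
from fractions import Fraction


def multiply_truncated(p, q, n):
    """Multiply coefficient lists p and q (lowest degree first), dropping terms above degree n."""
    result = [Fraction(0)] * (n + 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            if i + j <= n:
                result[i + j] += pi * qj
    return result


exp_cubic = [Fraction(1), Fraction(1), Fraction(1, 2), Fraction(1, 6)]   # 1 + x + x^2/2 + x^3/6
sin_cubic = [Fraction(0), Fraction(1), Fraction(0), Fraction(-1, 6)]     # x - x^3/6

product = multiply_truncated(exp_cubic, sin_cubic, 3)
print(product)   # coefficients of 1, x, x^2, x^3
```

The coefficients come out as 0, 1, 1 and 1/3, matching the hand computation.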
You can do the same thing with division, assuming you learned polynomial long division
(this is useful? Who knew!). If you have Taylor series around a point a other than zero,
you will be dealing with polynomials in (x − a) rather than in x.
Perhaps the most useful manipulation is composition. I will illustrate this by example.
The Maclaurin polynomial for $e^{x^2}$ is obtained by plugging in $x^2$ for x in the Maclaurin
polynomial for $e^x$: $1 + (x^2) + (x^2)^2/2! + \cdots$.
One last trick arises when computing the Taylor series for a function defined as an integral.
Suppose $f(x) = \int_b^x g(t)\,dt$. Then $f'(x) = g(x)$ so if you know g and its derivatives, you
know the derivatives of f. If g has no nice indefinite integral, then you don’t know the value
of f itself, except at one point, namely f(b) = 0. Therefore, a Taylor series at b is the most
common choice for a function defined as $\int_b^x$ of another function.
Example 13.9. Suppose $f(x) = \int_1^x \sqrt{1 + t^3}\,dt$. The Taylor series can be computed about
the point a = 1. From $f'(x) = \sqrt{1 + x^3}$, $f''(x) = 3x^2 / (2\sqrt{1 + x^3})$ we get
\[
f(1) = 0, \quad f'(1) = \sqrt{2}, \quad f''(1) = 3/(2\sqrt{2})
\]
and therefore $P_2(x) = \sqrt{2}\,(x - 1) + \frac{3}{4\sqrt{2}}(x - 1)^2$.
The next section gives precise statements about how closely Taylor polynomials approximate
function values. For now, we will take this on faith and see how to use Taylor polynomials.
Example 13.10. What’s a good approximation to $e^{0.05}$? The Maclaurin polynomial will
provide a very accurate estimate with only a few terms. The linear approximation, 1.05, is
already not bad. The quadratic approximation is $1 + 0.05 + (0.05)^2/2 = 1.05125$.
Taylor series are sometimes useful in approximating integrals when you can’t do the integral:
you approximate the integrand by a Taylor polynomial, then integrate precisely (polynomial
antiderivatives are easy to calculate).
Example 13.11. Is it easier to approximate $\int_0^{1/2} \cos(\pi x^2)\,dx$ via trapezoidal approximation
or Taylor integration?
The Taylor approach starts by computing some $P_n$ at some point in the interval. The midpoint
1/4 would probably give the greatest accuracy but computations would be messier.
Instead take a = 0. There, $P_4$ is easily computed by substituting $\pi x^2$ for x in the Maclaurin
polynomial for cosine. Which one? $P_2(x) = 1 - x^2/2$ is good enough to get all terms up to
degree 4 after the substitution: plugging in $\pi x^2$ for x gives
\[
P_4(x) = 1 - \frac{\pi^2}{2}\, x^4 .
\]
Integrating,
\[
\int_0^{1/2} P_4(x)\,dx = \left[ x - \frac{\pi^2}{10}\, x^5 \right]_0^{1/2} .
\]
This comes out to $1/2 - \pi^2/320 \approx 0.46916$ which is accurate to within 0.001. The trapezoidal
approximation gives roughly 0.464907 which is off by four times as much.
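The accuracy claim for the Taylor estimate can be checked against a careful numerical integration. In this sketch the reference value comes from composite Simpson's rule with many subintervals; that choice, and the helper names, are ours and are not part of the example.

```python
import math


def simpson(f, a, b, n=2000):
    """Composite Simpson's rule with n (even) subintervals; used here as a reference value."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def integrand(x):
    return math.cos(math.pi * x**2)


taylor_estimate = 0.5 - math.pi**2 / 320   # integral of 1 - (pi^2 / 2) x^4 over [0, 1/2]
reference = simpson(integrand, 0.0, 0.5)

print(taylor_estimate, reference, abs(taylor_estimate - reference))
```

The last printed number is the error of the Taylor estimate, which you can compare with the bound of 0.001 stated in the example.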
Exercise 13.9. Estimate $\int_{-1}^{1} e^{-x^2}\,dx$ by integrating the quadratic Taylor polynomial exactly.
How close do you get to the numerical answer of 1.49?
Aside. This last section is ambitious. If you understand this section, you will have absorbed
a good dose of mathematical reasoning. You will probably be a lot better prepared for further
study in math than some students who place into higher courses.
The central question for this section is, how good an approximation to f is Pn ? We will
give a rough answer and then a more precise one.
Rough answer: $P_n(x) - f(x) \sim K(x - a)^{n+1}$ near x = a. For example, the linear approximation
$P_1$ is off from the actual value by a quadratic quantity $K(x - a)^2$. If x differs from
a by about 0.1 then $P_1(x)$ will differ from f(x) by something like 0.01 (we are being rough
here and pretending K = 1). If x agrees with a to four decimal places, then $P_1(x)$ should
agree with f(x) to about eight places. Similarly, the quadratic approximation $P_2$ differs
from f by a multiple of $(x - a)^3$, and so on.
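The rough answer is easy to test on a function you know well. The sketch below uses $f(x) = e^x$ and a = 0 (our choice of example), where the linear approximation is 1 + x, and prints the error next to the error divided by $x^2$.

```python
import math


def linear_error(x):
    """Error of the linear Maclaurin approximation 1 + x to e^x."""
    return math.exp(x) - (1 + x)


for x in (0.1, 0.01, 0.001):
    print(x, linear_error(x), linear_error(x) / x**2)
```

Shrinking x by a factor of 10 shrinks the error by a factor of about 100, and the last column settles near 1/2, so for this function the constant K is closer to 1/2 than to 1.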
You can skip the justification of this answer, but we thought we’d include the derivation
for those who want it because it’s just an application of L’Hôpital’s rule. Once you guess
that $P_n(x) - f(x) \sim K(x - a)^{n+1}$, you can verify it by starting with the expression
\[
\lim_{x \to a} \frac{f(x) - P_n(x)}{(x - a)^{n+1}} ,
\]
and repeatedly applying L’Hôpital’s rule until the denominator is not zero at x = a. Because
the derivatives of f and $P_n$ at a match through order n, it takes at least n + 1 derivatives
to get something nonzero, at which point the denominator has become the nonzero constant
(n + 1)!. The limit is therefore $f^{(n+1)}(a)/(n + 1)!$, which may or may not be zero but is
surely finite.
We know the Taylor polynomial is an order $(x-a)^{n+1}$ approximation, but there is a constant
$K$ in the expression which could be huge. What actual bounds can we obtain on
$f(x) - P_n(x)$? These are given by the second, more precise answer, which is called Taylor's
Theorem with Remainder.
Theorem 13.12 (Taylor's Theorem with Remainder). Let $f$ be a function with $n + 1$
continuous derivatives on an interval $[a, x]$ or $[x, a]$ and let $P_n$ be the order $n$
Taylor polynomial for $f$ about the point $a$. Then
$$f(x) - P_n(x) = \frac{f^{(n+1)}(c)}{(n+1)!}\,(x-a)^{n+1}$$
for some $c$ between $a$ and $x$. This is illustrated in Figure 47.
Figure 47: the difference $f(x) - P_2(x)$ is equal to $(x-a)^3$ times $f^{(3)}(c)/3!$ for
some $c$ between $a$ and $x$
The theorem tells us that the constant $K$ in the rough answer is, up to sign,
$f^{(n+1)}(c)/(n+1)!$ for this unknown $c$ (the sign flips because the rough answer was
written for $P_n - f$ rather than $f - P_n$). This is at first a little mysterious and
difficult to use, which is why we'll be doing some practice. The exact value of $c$ will
depend on $a$, $x$, $n$ and $f$ and will not be known. However, it will always be between
$a$ and $x$.
Example 13.13. Suppose $f(x) := \sqrt{x}$, $a = 9$ and $n = 1$. Observing that $f(9) = 3$
and $f'(9) = 1/(2\sqrt{9}) = 1/6$, we see that the tangent line approximation $P_1(x)$ is
equal to $3 + (x-9)/6$. What can we infer about the value of $\sqrt{10}$ from this? With
$x = 10$, Theorem 13.12 tells us that
$$\sqrt{10} - \left(3 + \frac{10-9}{6}\right) = \frac{f''(c)}{2!}\,(10-9)^2$$
for some $c$ between 9 and 10. Using $f''(x) = -(1/4)x^{-3/2}$, this simplifies to
$$\sqrt{10} - 3\tfrac{1}{6} = -\frac{1}{8}\,c^{-3/2}.$$
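In practice $c$ is unknown, but in this example a computer already knows $\sqrt{10}$, so we can cheat and solve the last equation for $c$ to see the theorem in action. The sketch below is only an illustration of that idea; the variable names are assumptions.

```python
import math

# Left-hand side of the simplified equation: the actual error f(10) - P1(10).
lhs = math.sqrt(10) - (3 + 1 / 6)

# Solve lhs = -(1/8) * c**(-3/2) for c.
# Rearranged: c**(-3/2) = -8 * lhs, so c = (-8 * lhs) ** (-2/3).
c = (-8 * lhs) ** (-2 / 3)

print(f"error = {lhs:.6f}")
print(f"c     = {c:.4f}")
print("c lies between 9 and 10:", 9 < c < 10)
```

The value of $c$ that comes out lies a little above 9.3, inside the interval from 9 to 10 as the theorem promises.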
Exercise 13.10. Suppose $f(x) := e^{-x}$, $a = 0$ and $n = 1$. What does Theorem 13.12 say
about $f(0.4)$?
In Example 13.13, we still don't know which number between 9 and 10 is the actual $c$. Often
we can get a good idea of the error by examining the possible values of the right-hand side
a little more closely. Frequently, for example, the sign of $f^{(n+1)}$ does not change and is
known to us. Also frequently its magnitude is greatest at the point $a$, where we can compute
everything exactly. For example, if $f^{(n+1)}$ is known to be positive on the interval
$[a, x]$, and known to be greater at $a$ than at larger values, we can conclude that
$f(x) - P_n(x)$ is between 0 and $(x-a)^{n+1} \cdot f^{(n+1)}(a)/(n+1)!$. Here is a very
similar example, except that $f^{(n+1)}$ is known to be negative.
Example (13.13 continued). We don't know which number between 9 and 10 is $c$, but
examining values of $c^{-3/2}$ when $c$ is between 9 and 10, we see that they are all positive,
with a maximum of $9^{-3/2} = 1/27$. This is pretty small, which is nice for us because it
implies that the error $\sqrt{10} - 3\tfrac{1}{6}$ is a negative number whose magnitude is no
greater than $1/(8 \cdot 27)$, which is less than 0.005 because 8 times 27 is more than 200.
Evidently $3\tfrac{1}{6}$ is a very good approximation to $\sqrt{10}$.
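The conclusion of this example is easy to confirm by machine. The following sketch (variable names are assumptions made for illustration) computes the true error and checks it against the bound $1/(8 \cdot 27)$ obtained from the worst case $c = 9$.

```python
import math

approx = 3 + 1 / 6                   # P1(10), the tangent line estimate
error = math.sqrt(10) - approx       # f(10) - P1(10)

# Worst case of (1/8) * c**(-3/2) on [9, 10] occurs at c = 9.
bound = (1 / 8) * 9 ** (-1.5)        # equals 1 / (8 * 27) = 1/216

print(f"error = {error:.6f}")
print(f"bound = {bound:.6f}")
print("error is negative:", error < 0)
print("magnitude within bound:", abs(error) <= bound)
print("bound below 0.005:", bound < 0.005)
```

All three checks should come out true, and the actual error sits just inside the bound. That is typical when $c$ is confined to a short interval on which $f^{(n+1)}$ changes little.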
Exercise 13.11. Use the same technique to say how good an approximation $3\tfrac{1}{12}$ is
to $\sqrt{9\tfrac{1}{2}}$.
Things don't always work out so nicely. It is pretty common that you know the sign of
$f^{(n+1)}$, and almost always you can compute $f^{(n+1)}(x)$ precisely at $x = a$, but it is
only moderately likely that its magnitude will be maximized at $x = a$.
Exercise 13.12. Why do you usually know the exact value of $f^{(n+1)}(a)$?
Here is an example of what you can do when you don't know the maximum magnitude of
$f^{(n+1)}$ on $[a, x]$.
Example 13.14. Let $f(x) = e^x$, $a = 0$ and $n = 2$. Because $f^{(n)}(x) = e^x$ for all $n$,
and $e^0 = 1$, we see that $f^{(n)}(0) = 1$ for all $n$, and in particular that
$$P_2(x) = 1 + x + \frac{x^2}{2}.$$
Let's use $P_2$ to estimate $e^{0.4}$. This is just like Exercise 13.10 except with $e^x$
instead of $e^{-x}$. First, $P_2(0.4) = 1 + 0.4 + (0.4)^2/2 = 1.48$ precisely. Plugging in
$f'''(x) = e^x$, Theorem 13.12 tells us that
$$e^{0.4} - 1.48 = \frac{f'''(c)}{3!}\,(0.4)^3 \approx 0.0107\,e^c$$
for some $c \in [0, 0.4]$. We can see the maximum of the right-hand side is attained at
$c = 0.4$ rather than $c = 0$. The value of $f'''$ there is $e^{0.4}$, which happens to be the
quantity we are going to a lot of trouble to estimate. So of course we don't already know
what it is. The trick is to use any crude upper bound. For example, $e$ is less than 3 and
0.4 is less than 1/2, so $e^{0.4} < \sqrt{3}$, which we happen to know to be approximately
1.732. If we didn't know this, we could use $e < 4$ instead of $e < 3$, leading to
$e^{0.4} < 4^{0.5} = 2$. That's pretty rock solid. So then the error, which is known to be
positive, is less than $2 \cdot 0.0107 = 0.0214$ and we have $1.48 < e^{0.4} < 1.5014$. The
true value to three decimals is 1.492.
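The bracketing argument in Example 13.14 can be reproduced step by step. The sketch below uses assumed variable names and is only an illustration. It computes $P_2(0.4)$, the coefficient $(0.4)^3/3!$ that multiplies $e^c$ in the remainder, and the resulting interval when $e^c$ is replaced by the crude upper bound 2, then checks that the true value falls inside.

```python
import math

x = 0.4
p2 = 1 + x + x**2 / 2                 # quadratic Taylor polynomial of e^x at 0.4
coeff = x**3 / math.factorial(3)      # the factor multiplying e^c in the remainder

crude_upper = 4 ** 0.5                # e^0.4 < 4^0.5 = 2, using e < 4 and 0.4 < 1/2
lower = p2                            # remainder is positive, so P2 underestimates
upper = p2 + coeff * crude_upper

true_value = math.exp(x)
print(f"P2(0.4)      = {p2:.4f}")
print(f"coefficient  = {coeff:.6f}")
print(f"bracket      = ({lower:.4f}, {upper:.4f})")
print(f"true e^0.4   = {true_value:.4f}")
print("true value inside bracket:", lower < true_value < upper)
```

Swapping `crude_upper` for $\sqrt{3}$ tightens the upper end of the bracket slightly, at the cost of needing to know $\sqrt{3}$.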