Master
Copyright © 2004. This work is covered by the GNU Free Documentation License.
Loosely speaking, here are the terms of this license:
• You are free to copy, redistribute, change, print, sell, and otherwise use in
any manner part or all of this document.
• Any work derived from these notes must also be covered by the GFDL. (This
only applies to that part of your work derived from these notes. It is up to
you whether those parts of your work which are not based on these notes are
covered by the GFDL. Also, you can quote, with attribution, and subject
to fair use provisions, from these notes like you would from any copyrighted
work.)
• Anyone distributing works covered by the GFDL must provide source code
or other editable files for the material which is distributed. In the case of
these notes that means the LATEX code, as well as the source documents for
creating the graphics.
• If you make changes to these notes and redistribute them, you should name
the finished product something different from “The Free Speech Calculus
Text” or “The Free Speech Calculus Text: original version”. You may choose
derivative names like “The Free Speech Calculus Text: the John Doe version”.
Contents
1 Background
1.1 The numbers
1.2 Functions
1.3 Using functions
1.4 End of chapter problems
2 Limits
2.1 Elementary limits
2.2 Formal limits
2.3 Foundations of the real numbers
2.4 Continuity
2.5 Limits at infinity
3 Derivatives
3.1 The idea of the derivative of a function
3.2 Derivative Shortcuts
3.3 An alternative approach to derivatives
3.4 Derivatives of transcendental functions
3.5 Product and quotient rule
3.6 Chain rule
3.7 Hyperbolic functions
3.8 Tangent and Normal Lines
3.9 End of chapter problems
5 Integration
5.1 Basic integration formulas
11 Vectors
11.1 Basic vector arithmetic
11.2 Limits and Continuity in Vector calculus
11.3 Derivatives in vector calculus
11.4 Div, Grad, Curl, and other operators
Introduction
Discussion.
[author=garrett,style=friendly,label=introduction_to_whole_work,version=1, file
=text_files/introduction_to_whole]
Relax. Calculus doesn’t have to be hard, and its basic ideas can be understood
by anyone. So, why does it have a reputation? Well, Calculus can be hard. Huh?
That’s right, it doesn’t have to be hard, but it can be.
Calculus itself just involves two new processes, differentiation and integration,
and applications of these new things to the solution of problems that would have
been impossible otherwise.
For better or for worse, most Calculus classes increase the overall level of
difficulty above pre-Calculus, while teaching you the subject matter. What this
means is that in addition to learning how to take the derivative, you’ll set up word
problems, solve some equations, interpret the results etc. Most of this is algebra,
but the pieces are all held together by the Calculus.
The three hardest parts about a typical Calculus class are (in my opinion):
(1) Setting problems up (reading word problems, setting up equations etc), (2)
manipulating functions and equations in an algebraic way, (3) keeping track of a
bunch of parts of the problem and putting them together. Note that (1) and (3)
should be really important for anyone who is going to use their head for a living.
Note that none of these “hard parts” is taking derivatives or integrating. Of course,
I could be biased since I teach the class!
Another thing to think about as you take this course is the role of the calculator.
On the one hand, since we have graphing calculators, the way we use calculus
should be a little different than how it was used in the past. However, math
teachers are still figuring out what parts should change and what parts should
stay the same. So please forgive us if we don’t have it perfect quite yet.
On the other hand, some of the parts that have changed are now harder. In the
past, it might have been a useful exercise (in a practical sense) for a student
to learn how to graph $y = e^x/x$ by hand. Now the use of this exercise is probably
one of the following: (1) purely for the sake of learning, (2) because we have to be
able to double check/understand what the calculators are telling us, (3) as a warm
up for the problems that the calculator can’t solve. Option (3) might mean being
able to analyze all functions of the form $y = ae^{bx}/x$ where $a$ and $b$ are constants,
but we don’t know what they are ahead of time. Note that the calculator can’t
graph $y = ae^{bx}$; we can (and will) learn to do it, but this problem is a little harder
than graphing $y = e^x/x$.
Discussion.
[author=duckworth,style=formal,label=introduction_to_whole_work,version=2, file
=text_files/introduction_to_whole]
This textbook aims to provide both insight into the essential problems of Calculus
(and the related field of mathematical analysis) and a rigorous proof of all of the
standard material in a Calculus class. However, we will not put rigor in the way
of leisure or explanation. Thus, while we will prove everything, we will not always
do so in the most sophisticated or efficient manner.
One of the special features of this text is to include discussion of the historical
controversies and so-called paradoxes which made Calculus such an exciting and
hard-won mathematical field.
Discussion.
[author=duckworth,style=middle,label=introduction_to_whole_work,version=3, file
=text_files/introduction_to_whole]
This text will attempt to introduce the student to all of the varied roles which
Calculus plays in science and academia. Calculus is an applied subject which forms
the basis of elementary calculations in physics, biology, psychology, statistics,
engineering, etc. Calculus is the first math class that most people have taken where
they have to learn concepts that are not immediate generalizations of arithmetic
or geometric intuition. Finally, and related to both of the above, Calculus is the
first math class that many people take where statements are given that are not
exercises in proof, like in geometry, but still need to be proven.
It is of course not easy to satisfy all of the above goals at the same time, so
we will have to take a middle-of-the-road approach: we will offer a little bit of
material towards each goal.
In addition to the elusive goals just laid out, there is the difficulty of having
a widely defined audience: in a typical college Calculus class roughly one third
of the students have seen Calculus before, and remember the material fairly well.
Another third of the class has seen Calculus before, but did not absorb a significant
part of the course. The final third of the class has not seen Calculus before.
It is of course not easy to write a book addressed to all of the above parts of
the audience, so we will again take a middle-of-the-road approach: we will offer
enough explanation that a student who has never seen the material before can,
with diligence, learn Calculus. The phrase “with diligence” is supposed to suggest
that such a student will have to expect to spend more time figuring out examples
and discussion in this book than they needed for previous math classes. Such a
student might also want to access extra material from study guides.
Chapter 1
Background
Discussion.
having only finitely many digits. For example, although π has an infinite number
of digits, we can approximate it as π ≈ 3.1415926. This is how our calculators
and computers work: they approximate the set of all real numbers using only
numbers with finitely many digits. This is why their answers are sometimes wrong:
they’re based on approximations.
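As a small illustration of the point about approximations, here is a sketch in Python. It assumes the usual double-precision floating-point numbers that Python uses for `float`; the variable name `total` is just for this example.

```python
import math

# A Python float keeps only finitely many digits, so math.pi is an
# approximation of pi, not pi itself.
print(math.pi)

# Decimal fractions such as 0.1 are also stored approximately, so a sum that
# is exact on paper can come out slightly off on the machine.
total = 0.1 + 0.2
print(total)
print(total == 0.3)

# Comparing with a tolerance is one common way to live with the approximation.
print(math.isclose(total, 0.3))
```

The exact comparison fails while the tolerant comparison succeeds, which is the kind of "wrong answer" the paragraph above describes.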
Discussion.
[author=duckworth,uses=complex_numbers,uses=extended_reals,uses=hyperreals, file
=text_files/basics_about_numbers]
It is sometimes convenient to add some extra numbers to the real numbers. When
we do so we go beyond many people’s intuition, and this might make some
students uncomfortable. Good! This discomfort is a sign of something interesting; I
encourage you to explore any topic here which you think is strange, or suspicious.
I’ll just briefly say now that everything we do here can be rigorously justified, and
that it’s great fun to introduce new objects into your mathematics. With these
new objects you can do things that were previously “forbidden”: take the square
root of negative numbers and divide by 0.
The complex numbers are denoted by C and are obtained by taking the real
numbers R and joining them with the “imaginary” number i, which satisfies
$i^2 = -1$. By “joining” I mean that you also take all sums, differences, products and
quotients of things in R together with i. In other words, every complex number can
be written in the form a + bi where a and b are real numbers. The arithmetic
in C is defined by the two rules:
The extended real numbers do not have a standard symbol. They are obtained
by taking the real numbers R and joining them with infinity ∞. Again, “joining”
means taking all sums, differences, products, and quotients of things in R with ∞.
The arithmetic in the extended real numbers is (loosely speaking) defined by the
following rules:
The hyperreal numbers are similar to the extended real numbers. They are
obtained by taking the real numbers R and joining them with ∞, as well as an
infinitesimal . The arithmetic in the hyperreals is (loosely speaking) defined by
These three extended systems of the real numbers have quite different uses,
and mathematicians view them quite differently. The complex numbers are seen
as a “simple” extension of the real numbers. They are used almost exactly the
same way that real numbers are: to solve equations, to define polynomials,
exponentials, logarithms, trigonometry, derivatives and anti-derivatives. The extended
real numbers are viewed as a notational convenience. They allow one to write things
like $\frac{5}{\infty} = 0$, which is useful when calculating limits. The hyperreals are a
modern version of the ideas that Newton and Leibniz first used to develop Calculus.
They are mathematically rigorous, deep, and can be used to prove all the results
of Calculus we will use later. They also seem more abstract or foreign to many
students than the complex numbers or the extended reals.
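For readers who like to experiment, two of these systems have rough counterparts in Python. The sketch below uses Python's built-in complex type and its floating-point infinity; this is only an analogy for the extended reals, and the hyperreals have no built-in counterpart. The names `i`, `z` and `inf` are chosen for this example.

```python
import cmath
import math

# Complex numbers: Python writes the imaginary unit i as 1j.
i = 1j
print(i * i)             # i squared
print(cmath.sqrt(-1))    # a square root of a negative number

z = 3 + 4j               # the form a + bi with a = 3 and b = 4
print(z.real, z.imag)

# Extended reals (loosely): floats include an infinity value.
inf = math.inf
print(5 / inf)           # mirrors the convention 5/∞ = 0
print(inf + 1 == inf)    # adding a real number to ∞ gives ∞ again
```

Note that Python still refuses to divide a float by 0 (it raises `ZeroDivisionError`), so the analogy with the extended reals is only partial.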
Discussion.
[author=wikibooks,uses=algebraic_axioms_for_reals,label=algebra_axiom_for_real_
number_field, file =text_files/rules_of_basic_algebra]
The following axioms, or rules, are satisfied by the real numbers.
• Addition is commutative: a + b = b + a
• Addition is associative: (a + b) + c = a + (b + c)
• Multiplication is commutative: a · b = b · a
• Multiplication is associative: (a · b) · c = a · (b · c)
Comment.
[author=wikibooks,author=duckworth, file =text_files/rules_of_basic_algebra]
The above laws are true for all a, b, and c. This also means that the laws are true
if a, b and c represent unknowns, or combinations of unknowns; in other words a,
b and c can be variables, functions, formulas, etc. All the algebra we do in this
class (or any other class) follows from these rules (as well as the rules of logic,
and the rule that you can do the same thing to both sides of an equation and you
will still have an equation). Of course, all of us know lots of other algebraic rules,
but each of these other rules must be built up, or derived, from the simple ones
above.
Example 1.1.1.
[author=wikibooks,author=duckworth, file =text_files/rules_of_basic_algebra]
When you want to cancel or simplify something, if you’re not sure what rule you’re
trying to use, look up the rule. For instance, occasionally people do the following,
which is incorrect:
\[ \frac{2 \cdot (x + 2)}{2} = \frac{2}{2} \cdot \frac{x + 2}{2} = \frac{x + 2}{2}. \]
So, how do the algebraic axioms for the real numbers apply to this situation? Well
first, let’s review our rules for multiplying fractions. So, can we figure out
$\frac{a}{b} \cdot \frac{c}{d}$ from the axioms? Well, $\frac{a}{b}$ doesn’t appear in the axioms. In fact,
$\frac{a}{b}$ is shorthand notation for $a \cdot \frac{1}{b}$, which does appear in the axioms. So
$\frac{a}{b} \cdot \frac{c}{d}$ really equals $ac \, \frac{1}{b} \frac{1}{d}$. Ok, now what? Now I claim that
$\frac{1}{b} \frac{1}{d}$ must equal $\frac{1}{bd}$. Why? Well, by Axiom 1.1.1 there is a unique number
which is the inverse of $bd$, and that number has the unique property that when you
multiply it by $bd$ you get 1. Well,
\begin{align*}
\left( \tfrac{1}{b} \tfrac{1}{d} \right) bd &= \left( \tfrac{1}{b} b \right) \left( \tfrac{1}{d} d \right) && \text{by Axioms 1.1.1 and 1.1.1} \\
&= 1 \cdot 1 && \text{by Axiom 1.1.1} \\
&= 1 && \text{by Axiom 1.1.1}
\end{align*}
Therefore, $\frac{1}{b} \frac{1}{d}$ equals the inverse of $bd$, thus $\frac{1}{b} \frac{1}{d} = \frac{1}{bd}$. Therefore,
\[ \frac{a}{b} \cdot \frac{c}{d} = a \frac{1}{b} c \frac{1}{d} = ac \, \frac{1}{bd} = \frac{ac}{bd}. \]
Note: I would never suggest that you go through these steps every time. We
have just shown how to multiply two fractions; from now on, I would always just
use the property we just derived.
Ok, now that we know how to multiply two fractions, we can straighten out
the mistake above. It is not the case that $\frac{2(x+2)}{2} = \frac{2}{2} \cdot \frac{x+2}{2}$. Rather, we
should have
\[ \frac{2(x+2)}{2} = \frac{2}{2} \cdot \frac{x+2}{1} = 1(x + 2) = x + 2. \]
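A quick numerical check can catch this kind of cancelling mistake. The sketch below uses exact fractions from Python's standard library; the function names `original`, `wrong` and `right` are made up for this illustration.

```python
from fractions import Fraction

def original(x):
    # The expression 2(x + 2) / 2, computed exactly.
    return 2 * (x + 2) / Fraction(2)

def wrong(x):
    # The incorrect simplification (x + 2) / 2.
    return (x + 2) / Fraction(2)

def right(x):
    # The correct simplification x + 2.
    return x + 2

samples = [Fraction(n, 3) for n in range(-6, 7)]

# The correct simplification agrees with the original at every sample.
print(all(original(x) == right(x) for x in samples))

# The incorrect one disagrees almost everywhere (it only matches at x = -2).
mismatches = [x for x in samples if original(x) != wrong(x)]
print(len(mismatches), "of", len(samples), "samples disagree")

# The rule derived above: (a/b)(c/d) = (ac)/(bd), checked on one example.
print(Fraction(2, 3) * Fraction(5, 7) == Fraction(2 * 5, 3 * 7))
```

Agreement at a handful of sample points is not a proof, but a single disagreement is enough to show that a proposed simplification is wrong.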
Example 1.1.2.
For example, if you’re not sure whether it’s ok to cancel the $x + 3$ in the following
expression $\frac{(x+2)(x+3)}{x+3}$ you could justify the steps as follows:
\begin{align*}
\frac{(x+2)(x+3)}{x+3} &= (x + 2)(x + 3) \cdot \frac{1}{x+3} && \text{(Division definition)} \\
&= (x + 2) \cdot 1 && \text{(Associative law and Inverse law)} \\
&= x + 2 && \text{(One law)}
\end{align*}
Discussion.
[author=duckworth,label=discussion_of_what_less_than_means, file =text_files/
inequalities]
The real numbers are split in half; the positive numbers are on the right half of
the number line and the negative numbers are on the left half.
For any real numbers a and b we say a < b if a is to the left of b on the real
number line. This is equivalent to having b − a be positive.
Next, we’re going to review basic facts and arithmetic about positive and
negative numbers, and inequalities.
[author=duckworth,label=order_axioms_for_reals,label=order_axioms_for_reals,
file =text_files/inequalities]
In addition to the algebraic axioms for the real numbers (see 1.1), we also have the
following order axioms:
• The trichotomy law: for all real numbers a and b we have a < b or b < a or
a = b.
• Transitivity: if a ≤ b and b ≤ c then a ≤ c.
• Addition preserves order: if a ≤ b and c is any real number then a+c ≤ b+c.
• Multiplication by positives preserves order: if a ≤ b and c ≥ 0 then ac ≤ bc.
Rule 1.1.1.
[author=garrett,label=rules_for_multiplying_pos_negatives, file =text_files/inequalities]
First, a person must remember that the only way for a product of numbers to be
zero is that one or more of the individual numbers be zero. As silly as this may
seem, it is indispensable.
Next, there is the collection of slogans:
Or, more cutely: the product of two numbers of the same sign is positive, while
the product of two numbers of opposite signs is negative.
Extending this just a little: for a product of real numbers to be positive, the
number of negative ones must be even. If the number of negative ones is odd then
the product is negative. And, of course, if there are any zeros, then the product is
zero.
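The counting rule above can be sketched in a few lines of Python. The helper names `sign_of_product` and `sign` are hypothetical, chosen for this example; the first one decides the sign without ever multiplying.

```python
def sign_of_product(factors):
    """Return 0, +1 or -1: the sign of the product, found by counting signs."""
    if any(f == 0 for f in factors):
        return 0                      # any zero factor makes the product zero
    negatives = sum(1 for f in factors if f < 0)
    return 1 if negatives % 2 == 0 else -1

def sign(x):
    """Sign of a single number, for comparison."""
    return (x > 0) - (x < 0)

examples = [[2, 3], [-2, -3], [-2, 3], [-1, -2, -3], [-1, 0, 5]]
for factors in examples:
    product = 1
    for f in factors:
        product *= f
    # The two signs printed on each line should agree.
    print(factors, sign_of_product(factors), sign(product))
```

The same counting idea is what one uses on a factored expression such as (x − 1)(x + 1): pick a test value of x in each interval and count how many factors are negative there.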
Notation.
[author=wikibooks, file =text_files/interval_notation]
The notation used to denote intervals is very simple, but sometimes ambiguous
because of the similarity to ordered pair notation.
Let a and b be any real numbers, or ±∞, with a ≤ b. We define the following
sets, called intervals, on the real line:
Unfortunately the notation (a, b) is the same notation as is used for x, y points.
I’m sorry but mathematicians re-use notation and hope that the context makes it
clear which meaning is intended.
There is also notation for combining intervals. The union notation ∪ means
combine the intervals. Thus (1, 2) ∪ (3, 4) means the set of numbers that are in
(1, 2) or in (3, 4).
Note: the use of the word “or” here is sometimes confusing. You might think
of (1, 2) ∪ (3, 4) as equalling the interval (1, 2) and the interval (3, 4). You’re not
wrong if you think this way. But, mathematicians have learned through experience
that it’s best, linguistically, to talk about a single number x rather than infinite sets
of numbers. Thus, a single number x is in (1, 2) ∪ (3, 4) if x is in (1, 2) or x is in
(3, 4).
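The "single number x" point of view translates directly into a membership test. This is a minimal sketch; the function names `in_open_interval` and `in_union` are invented for the illustration.

```python
def in_open_interval(x, a, b):
    """True when x lies in the open interval (a, b)."""
    return a < x < b

def in_union(x):
    """True when x lies in (1, 2) ∪ (3, 4): x is in (1, 2) OR x is in (3, 4)."""
    return in_open_interval(x, 1, 2) or in_open_interval(x, 3, 4)

for x in [0.5, 1, 1.5, 2.5, 3.5, 4]:
    print(x, in_union(x))
```

The endpoints 1 and 4 are reported as not belonging to the union because the intervals are open.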
Exercises
1. Find the intervals on which f (x) = x(x − 1)(x + 1) is positive, and the
intervals on which it is negative.
2. Find the intervals on which f (x) = (3x − 2)(x − 1)(x + 1) is positive, and
the intervals on which it is negative.
3. Find the intervals on which f (x) = (3x − 2)(3 − x)(x + 1) is positive, and
the intervals on which it is negative.
1.2 Functions
Definition 1.2.1.
[author=duckworth,label=definition_of_function, file =text_files/what_is_a_function]
A function is something which takes a set of numbers as inputs, and converts each
input into exactly one output.
Comment.
[author=duckworth,label=comment_explaining_functions, file =text_files/what_is_
a_function]
In our definition of function, “something” means rule or algorithm or procedure.
The most familiar “something” is a formula like $x^2$ or $x + 3$.
The function sin(x) gives an example of something which you might think of as
a formula, but actually depends upon a procedure. To find sin(.57) one “draws”
a right triangle which contains the angle .57, and then sin(.57) equals the ratio of
the opposite side over the hypotenuse. People are often bothered by this definition
when they first learn it, because it’s not a formula. Eventually, time and experience
make people more comfortable with sin(x) and we actually start to view it as one
of our basic functions, as if we knew its formula.
Comment.
2. Piecewise: Giving more than one formula and piecing them together.
3. Graphically: Giving a graph with inputs on one axis and outputs on the
other.
Definition 1.2.2.
files/what_is_a_function]
The collection of all possible inputs is called the domain of the function. The
collection of all possible outputs is the range.
If the domain has not been stated explicitly, then we assume that the domain
equals all real numbers which make the function defined. In this case it is usually
easy, with a little work, to find an explicit description of the domain. The range
is not usually explicitly stated and it is sometimes difficult to find an explicit
description of it.
Discussion.
(We note that some of these things aren’t so bad if one is willing to work with
the complex numbers, or the hyperreals.)
Discussion.
Example 1.2.2.
[author=garrett,label=example_finding_domain_sqrt_x^2-1, file =text_files/what_
is_a_function]
For example, what is the domain of the function
\[ y = \sqrt{x^2 - 1}? \]
The square root requires $x^2 - 1 \ge 0$. Factoring,
\[ x^2 - 1 = (x - 1)(x + 1) = (x - 1)(x - (-1)). \]
This is negative exactly on the interval (−1, 1), so this is the interval we must
prohibit in order to have just the domain of the function. That is, the domain is
the union of two intervals:
\[ (-\infty, -1] \cup [1, +\infty) \]
[Figure: graph of $y = \sqrt{x^2 - 1}$, sqrt_x_squared_minus_1]
You can also verify our answer by looking at the graph. Of course, we will
always try to solve problems algebraically when possible, rather than just relying
upon the graph. In any case, on the graph we don’t see any points between x = −1
and x = 1, which is equivalent to saying that the domain equals what we described
above.
Example 1.2.3.
[author=wikibooks,label=example_finding_domain_top_half_of_circle, file =text_
files/what_is_a_function]
Let $y = \sqrt{1 - x^2}$ define a function. Then this formula is only defined for values
of x between −1 and 1, because the square root function is not defined (in the
world of real numbers) for negative values. Thus, the domain would be [−1, 1].
This agrees with the fact that the graph is the top half of a circle, and not defined
outside of [−1, 1].
[Figure: graph of $y = \sqrt{1 - x^2}$, top_half_of_unit_circle]
In this case it is easy to see that $\sqrt{1 - x^2}$ can only equal values from 0 to 1.
Thus, the range of this function is [0, 1].
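The domain and range found above can be sanity-checked numerically. The following Python sketch tests a few inputs and then samples the domain to estimate the range; the helper names `f` and `in_domain`, and the choice of 1001 sample points, are assumptions made for this example.

```python
import math

def f(x):
    # The formula from the example: the square root of 1 - x^2.
    return math.sqrt(1 - x * x)

def in_domain(x):
    # The formula is defined (over the reals) exactly when 1 - x^2 >= 0.
    return 1 - x * x >= 0

for x in [-2, -1, 0, 0.5, 1, 2]:
    if in_domain(x):
        print(x, f(x))
    else:
        print(x, "is not in the domain")

# Sample the domain [-1, 1] at 1001 evenly spaced points to estimate the range.
xs = [-1 + 2 * k / 1000 for k in range(1001)]
values = [f(x) for x in xs]
print(min(values), max(values))
```

Sampling only suggests the range; the algebraic argument in the example is what actually establishes it.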
Example 1.2.4.
[author=duckworth,label=example_function_given_by_graph, file =text_files/what_
is_a_function]
Let f (x) be defined by the graph below.
[Figure: graph of f(x), generic_cubic]
To determine a function value from the graph we read the y-value (off the vertical
axis) which corresponds to some given x-value (on the horizontal axis). For
example, given the input of x = 3 the output is y = 14.
A graph shows us lots of information about the function, and much of what we
learn later will be how to find this information without relying upon the graph.
For example, we can see that there is a certain type of maximum at x = 0.
In problems like this, that depend upon the graph, we will generally not require
very accurate answers. The answers only need to be accurate enough to show that
we’ve read the graph correctly.
Example 1.2.5.
[author=duckworth,label=example_function_given_by_numbers, file =text_files/what_
is_a_function]
Let the table of numbers below define a function, where x is the input and y is
the output.
Example 1.2.6.
[author=duckworth,label=example_piecewise_function, file =text_files/what_is_
a_function]
Let y be defined by the following formulas, each applying to just one range of
inputs.
\[ y = \begin{cases} x^2 & \text{if } x \le 0 \\ -x^2 & \text{if } 0 < x \le 3 \\ e^x & \text{if } 3 < x \end{cases} \]
Which formula you use depends upon which x-value you are plugging in. To
plug in x = −1 we use the first formula. So an input of x = −1 has an output of
$(-1)^2 = 1$. To plug in x = 2 we use the second formula, so the output is $-2^2 = -4$.
Similarly an input of x = 4 has an output of $y = e^4$.
We can also graph y. In this case it looks like $x^2$ on the left (i.e. for x ≤ 0); it
looks like $-x^2$ in the middle (for 0 < x ≤ 3) and it looks like $e^x$ on the right (for
x > 3). Notice that the graph looks “unnatural,” especially at x = 3 where it is
discontinuous.
[Figure: graph of y, two_parbs_and_exponential]
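A piecewise definition like this one maps naturally onto an if/else chain. Here is a minimal Python sketch of the function from this example; the function name `y` simply mirrors the notation above.

```python
import math

def y(x):
    """Piecewise function from the example: the formula depends on the range of x."""
    if x <= 0:
        return x ** 2          # first formula, for x <= 0
    elif x <= 3:
        return -(x ** 2)       # second formula, for 0 < x <= 3
    else:
        return math.exp(x)     # third formula, for 3 < x

for x in [-1, 2, 3, 4]:
    print(x, y(x))
```

Evaluating just to the left and right of x = 3 (for instance at 3 and 3.001) shows the jump that makes the graph discontinuous there.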
Example 1.2.7.
[author=duckworth,label=example_function_implicit, file =text_files/what_is_a_
function]
Let y be defined as a function of x, x < 0, by the equation:
\[ x^3 + y^3 = 6xy \]
For example, given the input x = −.5, I would solve
\[ (-.5)^3 + y^3 = 6 \cdot (-.5)y \]
for y (actually I’ll probably have to enter it in the calculator using x instead of
y!) to find y ≈ 0.04164259578. Similarly, I could do this for any negative value
for x; this is how y can be viewed as a function of x (only for negative values of x
though).
To make this more concrete, but still not rely upon a formula, I could fill in a
small table of numbers:
What happens when we try to plug in a positive value for x like x = 1? There
is more than one solution for y. This means that y is not a function of x for x > 0.
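What the calculator does in this example can be imitated with a simple root finder. The sketch below uses bisection, which is an assumption of this illustration (the text does not say which method the calculator uses); the function name `solve_for_y` is made up.

```python
def solve_for_y(x):
    """Solve x^3 + y^3 = 6xy for y by bisection, assuming x < 0."""
    if x >= 0:
        raise ValueError("this sketch only handles x < 0")

    def g(y):
        # We look for y with g(y) = 0.
        return x ** 3 + y ** 3 - 6 * x * y

    # For x < 0, g is increasing in y, g(0) = x^3 < 0 and g(-x) = 6x^2 > 0,
    # so exactly one solution lies between 0 and -x.
    low, high = 0.0, -x
    for _ in range(200):              # a fixed number of halvings is plenty
        middle = (low + high) / 2
        if g(middle) < 0:
            low = middle
        else:
            high = middle
    return (low + high) / 2

print(solve_for_y(-0.5))
```

Because there is exactly one solution for each negative x, the procedure returns a single output for each input, which is what it means for y to be a function of x there.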
Discussion.
[author=livshits,uses=function_extensions,label=discussion_extension_restriction_
of_functions, file =text_files/what_is_a_function]
We think of a function as a rule by which we can figure out f (x) from x. Strictly
speaking, we have to specify what objects x are being used; the collection of all
these objects is called the (definition) domain of the function.
The home address is a real life example of a function. This function is defined
for all the people that have a home address; in other words, the definition domain
of the home address is the collection of all the people who live at home. The
home address is not defined for homeless people. On the other hand, some
homeless individuals pick up their mail at the post office and therefore have their
postal addresses. For people who live at home their postal address and their home
address coincide.
We say that the postal address is an extension of the home address to the
homeless individuals who pick up their mail at the post office.
We also say that the home address is a restriction of the postal address to the
individuals who live at home.
The notions of restriction and extension of functions are central to our approach
to differentiation.
Discussion.
[author=duckworth,label=discussion_types_of_basic_functions, file =text_files/
list_of_basic_functions]
In practice, in this class, we don’t have that many basic functions. Here’s most of
them.
Polynomials These are positive powers of x, combined with addition and
multiplication by numbers. We call the highest power that appears the degree
of the polynomial. The numbers which are multiplied by the powers of x are
called the coefficients. The leading coefficient is the coefficient of the highest
power of x. The constant term is the number which has no power of x.
We can write a general expression for a polynomial, but since we don’t know
exactly what the degree will be, we need to use a letter to represent it; we
will use n. Since we don’t know how big the degree is, we can’t write all the
terms, thus we will leave out some number of terms in the middle, and will
write “. . . ” in their place. Similarly, we will need to use letters to represent
the coefficients. The number of coefficients varies with the degree, so we
don’t know how many letters we’ll need. For this reason we
don’t usually write a general polynomial with letters of the form a, b, c, . . . ,
but rather we use $a_0$, $a_1$, $a_2$, etc. We summarize this terminology and show
some examples in figure 1.2.
Trigonometric Functions $\sin(x)$, $\cos(x)$, $\tan(x)$, $\sin^{-1}(x)$, $\cos^{-1}(x)$, $\tan^{-1}(x)$,
$\csc(x)$, $\sec(x)$, $\cot(x)$
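The polynomial terminology above can be made concrete in code. In this Python sketch a polynomial is stored as the list of its coefficients [a_0, a_1, ..., a_n]; that representation, the function name `evaluate`, and the sample polynomial are all choices made for this illustration.

```python
def evaluate(coefficients, x):
    """Evaluate a_0 + a_1 x + ... + a_n x^n, given the list [a_0, a_1, ..., a_n]."""
    result = 0
    for a in reversed(coefficients):   # Horner's method: work from a_n down to a_0
        result = result * x + a
    return result

# Hypothetical example: 4x^3 - 2x^2 + 5, so a_0 = 5, a_1 = 0, a_2 = -2, a_3 = 4.
coefficients = [5, 0, -2, 4]

degree = len(coefficients) - 1          # assumes the last coefficient is not 0
leading_coefficient = coefficients[-1]
constant_term = coefficients[0]

print(degree, leading_coefficient, constant_term)
print(evaluate(coefficients, 2))        # 4*8 - 2*4 + 5
```

Indexing the coefficients by the power of x they multiply is the same reason the text prefers the letters a_0, a_1, a_2 over a, b, c.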
We show some graphs of some of these functions in the next few figures.
[Figure 1.3: $y = x^3$ (x_cubed_-3_to_3_manual)]
[Figure: 1_over_x_-3_to_3_manual_fit]
[Figure: sin_0_to_4pi]
[Figure: tan_inverse_-3pi_to_3pi]
[Figure 1.7: $y = e^x$ (e_to_the_x_-3_to_3_manual)]
[Figure: ln_of_x_neg_1_to_5]
Notation.
the same thing with ln, cos, tan and the other trig functions. In this book, we
will always use parentheses for these functions, unless the notation becomes too
complicated and it seems that leaving out some parentheses would simplify it.
Example 1.2.8.
[author=wikibooks,label=example_simple_function_notation, file =text_files/function_
notation]
For example, if we write f (x) = 3x + 2, then we mean that f is the function which,
if you give it a number, will give you back three times that number, plus two. We
call the input number the argument of the function, or the independent variable.
The output number is the value of the function, or the dependent variable.
For example, f (2), (i.e. the value of f when given argument 2) is equal to
f (2) = 3 · 2 + 2 = 6 + 2 = 8.
Example 1.2.9.
[author=duckworth, file =text_files/function_notation]
Let f be the function given by $f(x) = x^2$. Then x represents the input and the
output is $x^2$. For instance f (2) = 4.
Discussion.
Now, if you really understand the notation, you should be able to say what
f (x + 3) is without a moment’s hesitation. . . . I hope you said $(x + 3)^2$, but
if not, keep practicing!
Example 1.2.10.
[author=duckworth,label=example_function_notation_sin,uses=sin, file =text_files/
function_notation]
Let f (x) = sin(x). Then f (π/2) = sin(π/2). Now it so happens that sin(π/2)
equals 1, so we can say that f (π/2) = 1. Similarly, f (π/2 + 1) = sin(π/2 + 1).
Believe it or not, I don’t know what sin(π/2 + 1) equals. It is not equal to sin(π/2) + 1.
According to my calculator, sin(π/2 + 1) is approximately equal to 0.54.
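The function-notation examples above can be tried out in Python, where function definitions work much like f (x) notation. In this sketch the name `square` stands in for the second f (to avoid reusing one name for two functions); it is a choice made for the illustration.

```python
import math

def f(x):
    return 3 * x + 2          # f(x) = 3x + 2

def square(x):
    return x ** 2             # the function f(x) = x^2, renamed for this sketch

print(f(2))
print(square(2))

# Plugging in x + 3 means replacing every x by (x + 3). With x = 1:
x = 1
print(square(x + 3))          # (x + 3)^2
print(square(x) + 3)          # x^2 + 3, which is a different number

# The same warning applies to sin: sin(pi/2 + 1) is not sin(pi/2) + 1.
print(math.sin(math.pi / 2 + 1))
print(math.sin(math.pi / 2) + 1)
```

In each pair the two printed values differ, which is the point of the examples: the input is substituted as a whole, and nothing can be pulled outside the function.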
Examples 1.2.11.
−y
ex dx, if y > 0
6. f (y) = .
0, if y ≤ 0
This function takes an input called y, and uses it as boundary values for an
integration (which we’ll learn about later).
Example 1.2.12.
[author=livshits,label=example_absolute_value_function,uses=absolute_value, file
=text_files/function_examples]
Here’s one way to define the absolute value function:
\[ |x| = \begin{cases} x & \text{if $x$ is already positive, or 0} \\ -x & \text{if $x$ is negative} \end{cases} \]
[Figure: graph of $|x|$, abs_value_function]
You can think of this function as “making x positive” or “stripping the sign from
x”. You can also think of |x| as the distance from x to 0 on the real number line;
this is a nice way to think about it, because it’s geometric, and because the main
reason we use absolute values is to give a mathematical expression to distances.
Discussion.
[author=wikibooks,label=discussion_arithmetic_with_functions, file =text_files/
combining_functions]
Functions can be manipulated in the same manner as any other variable: they
can be added, multiplied, raised to powers, etc. For instance, let $f(x) = 3x + 2$
and $g(x) = x^2$.
We define f + g to be the function which takes an input x to f (x) + g(x). If
you completely understand function notation then you know what the formula for
f (x) + g(x) is . . .
$f(x) + g(x) = (3x + 2) + (x^2)$. Of course, this formula can be simplified to
$f(x) + g(x) = x^2 + 3x + 2$.
Similarly,
However, there is one particular way to combine functions which is not like
the usual arithmetic we do with variables: you can plug one function inside of
the other! This possibility really opens the door to many wonderful areas of
mathematics way beyond Calculus, but for now we won’t go there.
Definition 1.2.3.
[author=wikibooks,label=definition_function_composition, file =text_files/combining_
functions]
Plugging one function inside of another is called composition. Composition is
denoted by f ◦ g, where (f ◦ g)(x) = f (g(x)). In this case, g is applied first, and then
f is applied to the output of g.
Example 1.2.13.
Examples 1.2.14.
files/combining_functions]
Let $f(x) = x^2 + 1$ and $g(x) = \sin(x)$.
5. Find f (g(2)).
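Composition is easy to experiment with in code, since a composed function is just one function called on the output of another. The sketch below uses the f and g of these examples; the helper `compose` is a hypothetical name introduced for this illustration.

```python
import math

def compose(outer, inner):
    """Return the function x -> outer(inner(x))."""
    return lambda x: outer(inner(x))

def f(x):
    return x ** 2 + 1

def g(x):
    return math.sin(x)

f_of_g = compose(f, g)   # g is applied first, then f
g_of_f = compose(g, f)   # f is applied first, then g

print(f_of_g(2))         # sin(2)^2 + 1
print(g_of_f(2))         # sin(2^2 + 1) = sin(5)
```

The two results differ, which shows that the order of composition matters.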
Definition 1.2.4.
[author=duckworth,label=definition_one_to_one_function, file =text_files/one_
to_one_functions_and_inverses]
A function f (x) is one-to-one if it does not ever take two different inputs to the
same output. In symbols: if a ≠ b then f (a) ≠ f (b).
Example 1.2.15.
[author=wikibooks,label=example_circle_is_not_one_to_one, file =text_files/one_
to_one_functions_and_inverses]
The function $f(x) = \sqrt{1 - x^2}$ is not one-to-one, because both x = 1/2 and
x = −1/2 result in $f(x) = \sqrt{3/4}$. You can see this graphically as the fact that the
horizontal line $y = \sqrt{3/4}$ crosses the graph twice.
[Figure: graph of $y = \sqrt{1 - x^2}$, top_half_of_unit_circle]
The function f (x) = x + 2 is one-to-one because for every possible value of
f (x) we never have two different inputs going to the same output. In symbols: if
a ≠ b then a + 2 ≠ b + 2, and therefore f (a) ≠ f (b).
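One way to hunt for a failure of the one-to-one property is to look for two sample inputs with the same output. This Python sketch does that on a small, arbitrarily chosen list of inputs; the function name `collisions` is invented for the example.

```python
import math

def collisions(func, inputs):
    """Return pairs of different inputs that func sends to the same output."""
    seen = {}       # maps each output to the first input that produced it
    pairs = []
    for x in inputs:
        y = func(x)
        if y in seen and seen[y] != x:
            pairs.append((seen[y], x))
        else:
            seen[y] = x
    return pairs

inputs = [-1, -0.5, 0, 0.5, 1]
print(collisions(lambda x: math.sqrt(1 - x * x), inputs))
print(collisions(lambda x: x + 2, inputs))
```

A collision proves that a function is not one-to-one. Finding none on a finite sample proves nothing by itself; for f (x) = x + 2 it is the algebraic argument above that settles the matter.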
Definition 1.2.5.
[author=duckworth,label=definition_function_inverses, file =text_files/one_to_
one_functions_and_inverses]
Let f (x) be a function. We say that another function g(x) is the inverse function
of f (x) if f (g(x)) = x and g(f (x)) = x for all x. This means that f and g cancel
each other.
An equivalent definition is that f (a) = b if and only if g(b) = a. This means
that g reverses inputs and outputs compared to f .
An equivalent way to think about this is that g(x) is the answer to the question:
f of what equals x?
A related fact is that f (x) has an inverse function
if and only if f (x) is one-to-one.
Example 1.2.16.
[author=wikibooks,label=example_function_inverse, file =text_files/one_to_one_
functions_and_inverses]
For example, the inverse of f(x) = x + 2 is g(x) = x − 2. To verify this note that
f(g(x)) = f(x − 2) = (x − 2) + 2 = x and g(f(x)) = g(x + 2) = (x + 2) − 2 = x.
The function f(x) = √(1 − x^2) has no inverse (as we saw above). The function
√(1 − x^2) is close, but it works only for positive values of x. To verify this note
that f(√(1 − x^2)) = √(1 − (√(1 − x^2))^2) = √(1 − (1 − x^2)) = √(x^2) = |x|, where |x| is
the absolute value of x.
Example 1.2.17.
[author=duckworth,label=example_function_inverse_e^x_and_ln,uses=e^x,uses=ln,
file =text_files/one_to_one_functions_and_inverses]
Let’s consider e^x and ln(x). These functions are inverses. So, for example,
since e^2 ≈ 7.39 we must have ln(7.39) ≈ 2. Another way to state this is that
e^{ln(7.39)} = 7.39 and ln(e^2) = 2.
However, these numerical examples are not really how we use the fact that
ln(x) and ex are inverses. The following would be a much more common example.
Suppose that the amount of money in someone’s bank account is given by
1000e^{0.05t} where t is measured in years. Find out how many years it will take
before they have $3000.
This means that we want to solve 3000 = 1000e^{0.05t}. Dividing both sides by
1000 we get a new equation
3 = e^{0.05t}.
Now we can take ln(x) of both sides:
ln(3) = ln(e^{0.05t})
ln(3) = 0.05t
whence
t = ln(3)/0.05
This is a perfectly good expression for the final answer. Of course, some readers
would rather get an explicit number for t; this is understandable, but you should
practice being comfortable with answers that are formulas.
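For readers who do want a number, here is a short Python sketch of the same calculation. The variable names are our own; the only facts used are the formula 1000e^{0.05t} and the target of $3000 from the example.

```python
import math

principal = 1000.0   # starting balance in dollars
rate = 0.05          # growth rate per year
target = 3000.0      # balance we want to reach

# Solve target = principal * e^(rate * t) for t by taking ln of both sides.
t = math.log(target / principal) / rate

print(t)                                # about 21.97 years
print(principal * math.exp(rate * t))   # plugging t back in gives about 3000
```

Plugging t back into the original formula is a cheap way to confirm that ln(x) really did cancel e^x.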
Example 1.2.18.
[author=duckworth,label=example_function_inverse_cos,uses=cos, file =text_files/
one_to_one_functions_and_inverses]
Let’s consider cos(x) and cos^{−1}(x). If I write x = cos(y) (and y is between 0
and π) this is (mathematically) equivalent to writing y = cos^{−1}(x). In other
words, the two equations will be satisfied by exactly the same values of x and y.
Thus, saying that cos(π/2) = 0 is equivalent to saying that π/2 = cos^{−1}(0).
(We will use this idea later to find the derivative of cos^{−1}(x). We will start
with y = cos(x), solve this for x = cos^{−1}(y) and then apply implicit derivatives.)
Discussion.
[author=duckworth,label=discussion_how_to_find_inverses, file =text_files/one_
to_one_functions_and_inverses]
Many of us learned how to find inverses by following these steps: given an equation
y = . . ., (1) reverse x and y, (2) solve the new equation for y.
I think this procedure sometimes makes people confused. To clear up the
confusion, I hope you realize that step (1) is purely cosmetic. In other words, the
only part of this procedure that matters is step (2); the reason we do step (1) is that
we’re not used to having a function of the form x = . . ..
Let’s illustrate. The equation which translates Fahrenheit into Celsius is
C = (5/9)(F − 32). The inverse of this equation will translate Celsius into Fahrenheit. We
find the inverse by solving for F:
C = (5/9)(F − 32) −→ (9/5)C = F − 32 −→ F = (9/5)C + 32
Now, wasn’t that simple?
If you follow the same steps for the equation y = (5/9)(x − 32) you get x = (9/5)y + 32,
and this is the sort of equation that step (1) was meant to prevent.
The moral of this story should be: don’t get too hung up on the roles of x
and y, they just represent two numbers. If you get too fixated on what the
input and output should look like, then you will sometimes have extra work to do,
to sort out purely cosmetic problems.
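The temperature example can be checked with a few lines of Python. This is a minimal sketch; the function names are our own choice. It verifies that converting to Celsius and back returns the temperature we started with, which is exactly what it means for two functions to be inverses.

```python
def celsius_from_fahrenheit(F):
    """C = (5/9)(F - 32)."""
    return 5.0 / 9.0 * (F - 32.0)

def fahrenheit_from_celsius(C):
    """F = (9/5)C + 32, obtained by solving the formula above for F."""
    return 9.0 / 5.0 * C + 32.0

for F in [-40.0, 32.0, 98.6, 212.0]:
    C = celsius_from_fahrenheit(F)
    # The third column should reproduce the first, up to rounding.
    print(F, C, fahrenheit_from_celsius(C))
```

Notice that neither function mentions x or y; the names of the variables play no role in the two functions being inverses.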
Exercises
1. Find the domain of the function
f(x) = (x − 2)/(x^2 + x − 2)
That is, find the largest subset of the real line on which this formula can be
evaluated meaningfully.
8. The function is defined by the formula h(x) = |x|, the domain of h is all the
numbers x such that −10 ≤ x ≤ 5. What is the range of h?
9. What is the graph of this function?
10. What is f g? What is 1/f ? What are their graphs? What is the domain of
1/f ? What is the range of 1/f ?
11. v(x) = x − 3, what is the graph of |v|?
12. u(x) = (x + 1)/(x − 1), q(x) = (x^2 − 1)/(x − 1). Find the domains of u and
q.
13. p(x) = x2 + 2x + 5, what is the range of p?
14. What is the degree of the product f g of 2 polynomials? Hint: What is the
highest degree term of f g?
15. Let f and g be 2 nonzero polynomials. Can f g be zero? Hint: What is the
leading term of f g?
16. Find the domain of r(x). Check that r(x) = u(x) = (x+1)/(x−1) for x ≠ 0.
17. Find the domain of z(x) = 1/(1/x). Check that for x ≠ 0, z(x) = x.
18. Extend the function q(x) = (x^2 − 1)/(x − 1) to x = 1 by a polynomial; in
other words, find a polynomial p(x) such that p(x) = q(x) for x ≠ 1.
[Figure: point_in_plane, showing the point (a, b) and the origin (0, 0)]
Discussion.
[author=duckworth,label=intro_to_manipulating_functions, file =text_files/intro_
to_manipulating_functions]
In this section we lay out some of the basic tools for using functions. This is a sort
of grab-bag of techniques.
We start by reviewing how equations can represent lines, circles, and other
geometric objects.
We review some applications of functions to model real-world data.
We review how to solve some equations and inequalities.
Discussion.
[author=duckworth,label=discussion_of_point_in_xy_plane, file =text_files/cartesian_
coords_and_graphs]
Recall that the x, y-plane refers to an ideal mathematical plane labelled with an
x-axis which is horizontal, a y-axis which is vertical and the origin which is where
the two axes intersect. Every point in the plane can be labelled with x and y
coordinates which measure the horizontal and vertical distance respectively between
the point and the origin, see figure 1.9.
1.3. USING FUNCTIONS 25
Definition 1.3.1.
[author=duckworth,label=definition_graph_of_function, file =text_files/cartesian_
coords_and_graphs]
The graph of a function f (x) is the set of points (x, y) such that x is in the do-
main and y equals f (x). Given any equation involving x and y the graph of the
equation is the set of points (x, y) which satisfy the equation.
Discussion.
[author=wikibooks,label=discussion_of_how_to_graph, file =text_files/cartesian_
coords_and_graphs]
Functions may be graphed by finding the value of f for various x, and plotting
the points (x, f (x)) in the x, y-plane.
Plotting points like this is laborious (unless you have your calculator do it).
Fortunately, many functions’ graphs fall into general patterns, and we can learn
these patterns. For example, consider a function of the form f(x) = mx. The graph
of f (x) is a straight line, and m controls how steeply angled the line is. Similarly
we can learn about the graphs of our other basic functions, and later we will learn
how to find out useful information about more complicated graphs as well.
Example 1.3.1.
[author=duckworth,label=example_plotting_points, file =text_files/cartesian_coords_
and_graphs]
Draw a picture of the graph of f(x) = 3x^3 − 10x by plotting a few points.
First, we calculate some points:
x      −3    −2.5   −2   −1.5   −1   −0.5   0   0.5    1     1.5   2   2.5    3
f(x)   −51   −21.9  −4   4.9    7    4.6    0   −4.6   −7    −4.9  4   21.9   51
These points are shown in Figure 1.10. Note, we could have saved some effort if we
had only calculated half of these points. This function is odd (in a technical sense
that we’ll define later), and this would have told us that the values in the right
half of our table would equal the negative of the values in the left half. Now we
draw a smooth line through the points and get the graph shown in figure 1.11.
[Figure 1.10: cubic_plot_points]
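If you would rather have a computer fill in the table for f(x) = 3x^3 − 10x, a few lines of Python will do it. This is only a sketch of the point-plotting idea; the names `xs`, `table` and `is_odd_on_table` are our own. It also checks the symmetry f(−x) = −f(x) on the sample points.

```python
def f(x):
    return 3 * x**3 - 10 * x

xs = [k / 2 for k in range(-6, 7)]   # -3, -2.5, ..., 2.5, 3
table = [(x, f(x)) for x in xs]

for x, y in table:
    print(f"{x:5.1f} {y:8.3f}")

# f is odd: f(-x) = -f(x), so the left half of the table mirrors the right half.
is_odd_on_table = all(abs(f(-x) + f(x)) < 1e-12 for x in xs)
print(is_odd_on_table)
```

The printed values carry three decimal places, so they agree with a hand-computed table only after rounding to one decimal place.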
Definitions 1.3.2.
[author=garrett,author=duckworth,label=defintion_slopes_equations_lines, file =text_
files/lines_and_circles]
The simplest graphs are straight lines. The main things to remember are:
• A vertical line has equation x = a for some number a. A horizontal line has
equation y = c for some number c.
• The slope-intercept form of the equation of a line is y = mx + b. This
form is convenient for graphing by hand, but it is not as convenient for some
other purposes.
• The point-slope form of the equation of a line with slope m and containing
a point (x_0, y_0) is given by
y = m(x − x_0) + y_0.
This is by far the most convenient form of the equation of a line for us to
use in Calculus.
Example 1.3.2.
Definition 1.3.3.
[author=duckworth,label=definition_distance_formula, file =text_files/lines_and_
circles]
Given two points (x_0, y_0) and (x_1, y_1) in the x, y-plane, their distance apart can
be computed by drawing a right triangle that contains them and applying the
Pythagorean theorem (see figure 1.12). This gives distance as
d = √((x_1 − x_0)^2 + (y_1 − y_0)^2)
Example 1.3.3.
[Figure: example_of_distance_formula, showing the points (2, 5) and (5, 1) with D = \sqrt{3^2 + 4^2}]
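The distance formula translates directly into code. The sketch below uses a helper of our own, `distance`, and applies it to the two points (2, 5) and (5, 1) that label the figure for this example.

```python
import math

def distance(p, q):
    """Distance between points p = (x0, y0) and q = (x1, y1)."""
    (x0, y0), (x1, y1) = p, q
    return math.sqrt((x1 - x0)**2 + (y1 - y0)**2)

print(distance((2, 5), (5, 1)))  # legs 3 and 4, so the distance is 5.0
```

The horizontal and vertical legs of the triangle are 3 and 4, which is the familiar 3-4-5 right triangle.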
Definition 1.3.4.
The equation of a circle with center at the point (a, b) and radius r (see
figure ??) is given by
(x − a)^2 + (y − b)^2 = r^2
Example 1.3.4.
[author=livshits,label=example_graph_of_circle_and_lines, file =text_files/lines_
and_circles]
In figure 1.16 we show the graphs of the unit circle x^2 + y^2 = 1, and the straight
lines y = x, y = −x and y = −1/2.
[Figure: generic_circle]
[Figure: top_half_of_circle]
[Figure 1.16: circle_and_axes]
[Figure 1.17: short_rational_fuction_standard_window]
[Figure 1.18: short_rational_fuction_fit]
Discussion.
[author=duckworth,label=discussion_graphing_on_calculators_not_always_easy, file
=text_files/complicated_graphs]
Using calculators does not always make it perfectly easy to graph a function. We
collect now a few examples of things which take some work to graph.
Example 1.3.5.
[Figure 1.19: sin_of_5000_x_calculator]
[Figure 1.20: sin_of_5000_x_normal_sample]
Example 1.3.6.
[Figure 1.21: sin_of_5000_x_massive_sample]
[Figure 1.22: sin_of_5000_x_narrow_range]
we should use our knowledge of sin(x). We know that sin(x) oscillates. It turns out
that sin(ax) still oscillates, but it oscillates faster if a is greater than 1. To show
a graph that is oscillating faster, we need a smaller window. Roughly speaking,
to graph sin(5000x) we should use a window that is 5000 times smaller than usual.
Thus, we try graphing with the range −π/5000 ≤ x ≤ π/5000. The results are shown in
figure 1.22.
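To see why the window matters, we can imitate what a plotting device does: sample the function at a fixed number of points and look at the samples. The Python sketch below is an illustration under our own assumptions (100 sample points, taken at the midpoints of equal subintervals); it is not a description of how any particular calculator works. It counts how many sign changes the samples reveal in each window.

```python
import math

def midpoint_samples(func, a, b, n):
    """Evaluate func at the midpoints of n equal subintervals of [a, b]."""
    h = (b - a) / n
    return [func(a + (k + 0.5) * h) for k in range(n)]

def sign_changes(values):
    """Count how often consecutive samples have opposite signs."""
    return sum(1 for u, v in zip(values, values[1:]) if u * v < 0)

def f(x):
    return math.sin(5000 * x)

n = 100  # hypothetical number of plotted points
wide = sign_changes(midpoint_samples(f, -math.pi, math.pi, n))
narrow = sign_changes(midpoint_samples(f, -math.pi / 5000, math.pi / 5000, n))

print(wide)    # at most n - 1, far fewer than the roughly 10000 true crossings
print(narrow)  # one full oscillation is resolved: a single crossing at x = 0
```

On the wide window the samples cannot possibly show the roughly 10000 zero crossings of sin(5000x), so whatever picture they produce is misleading. On the narrow window the same 100 samples trace out exactly one oscillation.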
Discussion.
Definition 1.3.5.
[author=duckworth,label=definition_mathematical_models, file =text_files/mathematical_
models]
A mathematical model is a function that is used to describe a real-world set of
data. Sometimes this can be done by exactly solving equations for various
parameters. Sometimes we can only find the model which comes closest to matching
some data; in this case we usually need to use our calculators or computers.
Example 1.3.7.
[author=duckworth,label=example_modelling_population,uses=e^x,uses=ln, file =text_
files/mathematical_models]
Use an exponential model (i.e. P = Ce^{kt}) to match the following populations
Year   Population
1980   4 billion
2000   5 billion
Measuring t in years since 1980 and P in billions, we wish to find k and C such
that the following two equations are satisfied:
4 = Ce^0
5 = Ce^{20k}
From the first equation we see that C = 4. Plugging this into the second equation
we get
5 = 4e^{20k}.
As soon as you see an equation with an unknown as an exponent, you can be sure
that we will use ln(x) to find that unknown. In this case, I’ll divide by 4 first:
5/4 = e^{20k}
and then take ln(x) of both sides (using the cancelling property of ln(x) and e^x,
see Example 1.2)
ln(5/4) = 20k
whence
k = (1/20) ln(5/4).
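Here is a short Python sketch that evaluates k and checks the model against the two data points. The function name `population` is our own, and we assume t counts years since 1980 with P measured in billions, as the equation 4 = Ce^0 suggests.

```python
import math

C = 4.0                       # population in billions at t = 0 (the year 1980)
k = math.log(5.0 / 4.0) / 20  # from 5 = 4 e^(20k)

def population(t):
    """Model P = C e^(kt), with t in years since 1980 and P in billions."""
    return C * math.exp(k * t)

print(k)               # about 0.0112
print(population(0))   # 4.0
print(population(20))  # about 5
```

Once k and C are known, the same function can be evaluated at other values of t, although how well an exponential model describes years outside the data is a separate question.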
Example 1.3.8.
[author=garrett,label=example_solving_polynomial_inequality,version=1, file =text_
files/solving_inequalities]
Solve the following inequality:
5(x − 1)(x + 4)(x − 2)(x + 3) < 0
The roots of this polynomial are 1, −4, 2, −3, which we put in order (from left to
right)
. . . < −4 < −3 < 1 < 2 < . . .
The roots of the polynomial P break the numberline into the intervals
(−∞, −4), (−4, −3), (−3, 1), (1, 2), (2, +∞)
On each of these intervals the polynomial is either positive all the time, or negative
all the time, since if it were positive at one point and negative at another then it
would have to be zero at some intermediate point!
For input x to the right (larger than) all the roots, all the factors x + 4, x + 3,
x − 1, x − 2 are positive, and the number 5 in front also happens to be positive.
Therefore, on the interval (2, +∞) the polynomial P (x) is positive.
Next, moving across the root 2 to the interval (1, 2), we see that the factor
x − 2 changes sign from positive to negative, while all the other factors x − 1,
x + 3, and x + 4 do not change sign. (After all, if they would have done so, then
they would have had to be 0 at some intermediate point, but they weren’t, since
we know where they are zero...). Of course the 5 in front stays the same sign.
Therefore, since the function was positive on (2, +∞) and just one factor changed
sign in crossing over the point 2, the function is negative on (1, 2).
Similarly, moving across the root 1 to the interval (−3, 1), we see that the
factor x − 1 changes sign from positive to negative, while all the other factors
x − 2, x + 3, and x + 4 do not change sign. (After all, if they would have done
so, then they would have had to be 0 at some intermediate point). The 5 in front
stays the same sign. Therefore, since the function was negative on (1, 2) and just
one factor changed sign in crossing over the point 1, the function is positive on
(−3, 1).
Similarly, moving across the root −3 to the interval (−4, −3), we see that the
factor x + 3 = x − (−3) changes sign from positive to negative, while all the other
factors x − 2, x − 1, and x + 4 do not change sign. (If they would have done so,
then they would have had to be 0 at some intermediate point). The 5 in front
stays the same sign. Therefore, since the function was positive on (−3, 1) and just
one factor changed sign in crossing over the point −3, the function is negative on
(−4, −3).
Last, moving across the root −4 to the interval (−∞, −4), we see that the
factor x + 4 = x − (−4) changes sign from positive to negative, while all the other
factors x − 2, x − 1, and x + 3 do not change sign. (If they would have done so,
then they would have had to be 0 at some intermediate point). The 5 in front
stays the same sign. Therefore, since the function was negative on (−4, −3) and
just one factor changed sign in crossing over the point −4, the function is positive
on (−∞, −4).
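In summary, we have
P(x) is positive on (−∞, −4),
P(x) is negative on (−4, −3),
P(x) is positive on (−3, 1),
P(x) is negative on (1, 2),
P(x) is positive on (2, +∞).
There’s another way to write this. The polynomial is negative on (1, 2) ∪ (−4, −3).
(The notation (1, 2) ∪ (−4, −3) means all those x-values between 1 and 2, together
with all those x-values between −4 and −3.)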
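The sign analysis can be double-checked by evaluating the polynomial at one test point in each interval. The Python sketch below assumes the polynomial is P(x) = 5(x − 1)(x + 4)(x − 2)(x + 3), which is what the factors and the number 5 named in the discussion add up to; the choice of test points is ours.

```python
def P(x):
    return 5 * (x - 1) * (x + 4) * (x - 2) * (x + 3)

roots = [-4, -3, 1, 2]

# One test point per interval: left of all roots, between neighbours, right of all.
test_points = ([roots[0] - 1]
               + [(a + b) / 2 for a, b in zip(roots, roots[1:])]
               + [roots[-1] + 1])

signs = ["+" if P(x) > 0 else "-" for x in test_points]
print(list(zip(test_points, signs)))
```

Since the polynomial cannot change sign inside an interval between consecutive roots, one test point per interval is enough to settle the sign on the whole interval.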
Example 1.3.9.
is positive and on which it’s negative. We have to factor it a bit more: recall
that we have nice facts
x^2 − a^2 = (x − a) (x + a) = (x − a) (x − (−a))
x^2 − 2ax + a^2 = (x − a) (x − a)
so that we get
P(x) = −3 (1 + x^2) (x − 2) (x + 2) (x − 1) (x − 1)
It is important to note that the equation x^2 + 1 = 0 has no real roots, since the
square of any real number is non-negative. Thus, we can’t factor any further than
this over the real numbers. That is, the roots of P, in order, are
−2 < 1 (twice) < 2
For x larger than all the roots (meaning x > 2) all the factors x + 2, x − 1,
x − 1, x − 2 are positive, while the factor of −3 in front is negative. Thus, on the
interval (2, +∞) P (x) is negative.
Next, moving across the root 2 to the interval (1, 2), we see that the factor
x − 2 changes sign from positive to negative, while all the other factors 1 + x^2,
(x − 1)^2, and x + 2 do not change sign. (After all, if they would have done so,
then they would have to be 0 at some intermediate point, but they aren’t). The
−3 in front stays the same sign. Therefore, since the function was negative on
(2, +∞) and just one factor changed sign in crossing over the point 2, the function
is positive on (1, 2).
A new feature in this example is that the root 1 occurs twice in the factor-
ization, so that crossing over the root 1 from the interval (1, 2) to the interval
(−2, 1) really means crossing over two roots. That is, two changes of sign means
no changes of sign, in effect. And the other factors (1 + x^2), x + 2, x − 2 do not
change sign, and the −3 does not change sign, so since P (x) was positive on (1, 2)
it is still positive on (−2, 1). (The rest of this example is the same as the first
example).
Again, the point is that each time a root of the polynomial is crossed over, the
polynomial changes sign. So if two are crossed at once (if there is a double root)
then there is really no change in sign. If three roots are crossed at once, then the
effect is to change sign.
Generally, if an even number of roots are crossed-over, then there is no change
in sign, while if an odd number of roots are crossed-over then there is a change in
sign.
Exercises
1. Write the equation for the line passing through the two points (1, 2) and
(3, 8).
2. Write the equation for the line passing through the two points (−1, 2) and
(3, 8).
3. Write the equation for the line passing through the point (1, 2) with slope 3.
4. Write the equation for the line passing through the point (11, −5) with slope
−1.
1.4. END OF CHAPTER PROBLEMS 35
Exercises
1. Two mathematicians (A and B) are taking a walk and chatting.
A: I have 3 children.
B: How old are they?
A: The product of their ages is 36.
B: I can’t figure out how old they are.
A: The number on the house that we are passing is the sum of their ages.
B: I still can’t figure it out.
A: My oldest child is having a soccer match tomorrow.
B: Now I can figure it out!
How old are the children?
Make a list of the possible ages whose product is 36.
The only possibilities for these three ages are 1, 1, 36; 1, 6, 6; 1, 2, 18; 1, 3, 12;
1, 4, 9; 2, 2, 9; 2, 3, 6; 3, 3, 4. The sums of these ages are 38, 13, 21, 16, 14, 13, 11, 10.
If the house number had been any of these sums except 13, mathematician
B would have known the ages, so the street number must have been 13. If
there is only one oldest child, then 2, 2, 9 must be the ages.
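The reasoning in this hint is a finite search, so it can be checked by brute force. The Python sketch below is our own illustration: it assumes the ages are positive whole numbers, lists every triple with product 36, finds which sums are shared by more than one triple, and then keeps the triples with a single oldest child.

```python
from collections import defaultdict

# All triples a <= b <= c of positive whole-number ages with product 36.
triples = [(a, b, c)
           for a in range(1, 37)
           for b in range(a, 37)
           for c in range(b, 37)
           if a * b * c == 36]

by_sum = defaultdict(list)
for t in triples:
    by_sum[sum(t)].append(t)

# B cannot decide from the house number, so the sum must be ambiguous.
ambiguous = {s: ts for s, ts in by_sum.items() if len(ts) > 1}

# "My oldest child" means the largest age occurs exactly once.
answers = [t for ts in ambiguous.values() for t in ts if t.count(max(t)) == 1]

print(sorted(triples))
print(ambiguous)
print(answers)
```

The search confirms that 13 is the only sum shared by two triples, and that only one of those two triples has a unique oldest child.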
2. You have 2 identical ropes, a pair of scissors and a box of matches. Each rope, when
ignited at one of its ends, burns for 1 hour. Figure out how to measure off
45 minutes by burning these ropes. Notice: the ropes may not be uniform,
so they can burn in starts and stops, not at a constant speed.
Ignited from both ends simultaneously, how long will a rope burn?
3. Repeat example ??, where you replace the leaky cone-shaped bucket with a
leaky cylindrical bucket.
The surface area A will be proportional to H^2, i.e. A = cH^2; the equation
dH/dt = −(a/A)v(H) = −(a/A)√(2gH) still holds. Stick the expression for A
into it and try to work out the rest.
Chapter 2
Limits
Discussion.
[author=duckworth,label=discussion_overview_of_limits,style=historical, file =text_
files/limits_overview]
Before we begin to learn limits, it might be worth describing how the way we use
limits today is the reverse of how they came to be developed historically.
Almost all modern Calculus courses (see exceptions below) start with the def-
inition of limit, and then everything which follows is built upon this definition:
vertical and horizontal asymptotes are described using limits; derivatives are de-
fined in terms of limits, as are definite integrals; sequences and series, Taylor poly-
nomials, L’Hospital’s rule, all deal directly with limits. So the modern approach
is limits first, and then everything else.
But the modern approach reverses the order of history! Newton and Leibniz
invented a lot of what we think of as Calculus, and they never used the concept
of limits. In fact, their work was finished around 1700, but it wasn’t until around
the 1850s that limits were carefully and precisely defined. Even then, it took about
another 100 years for Calculus books to base everything on limits. (Over this 100
year period the use of limits gradually trickled from more advanced subjects down
to a college freshman level Calculus course.)
So, if you find it a little difficult to understand exactly what limits are, how
they are used, and why we discuss them so much, don’t feel bad! Geniuses like
Newton, Leibniz, Euler, Gauss, Lagrange, the Bernoullis, etc. didn’t understand
them either! On the other hand, by now, limits have been re-worked and simplified
so much that anyone can use them, but they still take work. The moral: don’t
feel bad if they don’t make sense at first, but don’t give up or decide that you just
can’t get it; keep working hard.
So, are there alternatives to a limits based Calculus? Yes. In the 1960’s the
infinitesimal approach was put on rigorous grounds, and this made it acceptable
to mathematicians to write Calculus books which based their results on this
approach. Infinitesimals are very similar to how Leibniz thought about derivatives
and integrals. They involve doing calculation with infinitely small quantities; this
is a strange idea and the strangeness of it is why mathematicians didn’t feel that
it was rigorously justified until the work in the 1960’s mentioned above was
complete. (For more about this approach and the one which is described next
37
38 CHAPTER 2. LIMITS
line which zig-zags up and down, and then imagine that if you magnify the picture,
that there are more zig-zags that were too small for you to see before; and if you
magnify the picture again, there are more zig-zags, etc. This example is correct;
to prove that it is correct, you need to understand limits and continuity; but more
importantly, it shows that you cannot rely on intuition to say things like “it’s clear
that a continuous function is differentiable”.
Here’s another idea: integrals are calculated by finding anti-derivatives. For
example, the area under the curve y = x^2 between x = 0 and x = 1 is calculated by
finding the anti-derivative (1/3)x^3, and plugging in x = 1 and x = 0 to get the area of
1/3. But what about the function e^{−x^2}? Does that have an anti-derivative? Well,
it turns out that there is no formula for the anti-derivative of e^{−x^2}. So, how can
we calculate areas under this curve? Well, with limits we can define the integral
∫_0^b e^{−x^2} dx and we can show that the limit exists, and therefore the integral exists,
even though we cannot write down a formula for it.
Discussion.
Example 2.1.1.
[author=duckworth,label=example_glimpse_of_deriv_as_limit, file =text_files/limits_
overview]
Here’s a brief glimpse of something that’s coming later. We show it now because
it’s so important; in fact, it’s the whole reason we introduced limits now! Let
f(x) = x^2. Then the derivative of f(x) at x = 3 will be defined (later) to be
f′(3) = lim_{x→3} (f(x) − f(3))/(x − 3).
We interpret the derivative to be the slope of the tangent line at x = 3, or the
instantaneous velocity.
Example 2.1.2.
The point of this exercise is to see the steps that we’re about to do as leading
to the idea of limits, which we’ll define in the next section, and that limits will
allow us to define derivatives.
The definition of average velocity is ∆h/∆t where ∆h is the change in height h
and ∆t is the change in time t. The problem is that the example did not tell us to
find the velocity from t = 4 to, say, t = 6. We were given only one point in time,
and so ∆t appears to be 0. We can’t plug 0 into our definition of velocity or we
would be dividing by zero.
The elementary way out of this dilemma is to find the average velocity from
t = 4 to t = 4.1, and figure that this is pretty close to the instantaneous velocity
at t = 4. We have:
velocity from t = 4 to t = 4.1 = (h(4.1) − h(4))/(4.1 − 4) = −7.938/0.1 = −79.38 m/s
(I’ve done all the calculations in my calculator using y1 = −9.8x^2 + 23x + 100 and
entering y1(4.1), etc.) Now, this answer is probably pretty close to the correct
value. But, to make sure, we should probably compute a few more velocities over
shorter intervals of time; this should get closer to the correct answer at t = 4.
t = 4 to t = 4.01:    (h(4.01) − h(4))/(4.01 − 4) = −78.498
t = 3.999 to t = 4:   (h(4) − h(3.999))/(4 − 3.999) = −78.3902
This makes it pretty clear that the “real” answer should be somewhere around
−78 m/s. We can’t be sure how accurate our calculations are until we learn later
how to get the exact expression.
The idea of limit will be to take the calculations just done, and try to figure
out what the limit as t approaches 4 of the velocity function (h(t) − h(4))/(t − 4) is.
Discussion.
[author=duckworth,label=discussion_looking_forward_to_derivatives, file =text_
files/limits_overview]
Looking ahead to the chapter on derivatives: Once we decide that formulas for
derivatives are more useful than finding the derivative at lots of randomly chosen
numbers, we want to get a list of shortcuts. To prove that these shortcuts are
correct we need to use the long definition given above. But we only have to do
that once for each shortcut and then we will always use the shortcut.
Discussion.
[author=wikibooks,label=discussion_introducing_section_on_limits, file =text_
files/basic_limits]
Now that we have done a review of functions, we come to the central idea of cal-
culus, the concept of limit.
Example 2.1.3.
[author=wikibooks,label=example_removable_discontinuity_leading_to_limits, file
=text_files/basic_limits]
Let’s start with a function, f(x) = x^2. Now we know that f(2) = 4. But let’s be
a bit mischievous and create a gap at 2. We can do this by creating the function
f(x) = x^2 (x − 2)/(x − 2).
This function is undefined at x = 2, but as x gets close to 2 its values get close
to 4. We write this as
lim_{x→2} f(x) = 4.
Notice it doesn’t matter what f(x) is at x = 2; in this case we have left it undefined,
but it could be 2 or 15 or 1,000,000. The idea of the limit is that you can
talk about how a function behaves as it gets closer and closer to a value, without
talking about how it behaves at that value. Now using variables we can say that
L is the “limit” of the function f(x) as x approaches c if f(x) ≈ L whenever x ≈ c.
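A quick table of values shows this behavior. The following Python sketch is only numerical evidence, not a proof; the step sizes are our own choice. It evaluates the function with the gap on both sides of x = 2, and then confirms that the function really is undefined at x = 2 itself.

```python
def f(x):
    """x^2 (x - 2) / (x - 2): equal to x^2 except at x = 2, where it is undefined."""
    return x**2 * (x - 2) / (x - 2)

for h in [0.1, 0.01, 0.001, 0.0001]:
    print(2 - h, f(2 - h), 2 + h, f(2 + h))

try:
    f(2)
    defined_at_2 = True
except ZeroDivisionError:
    defined_at_2 = False
print(defined_at_2)  # False: the gap at x = 2 does not affect the limit
```

The values on both sides settle toward 4 even though asking for the value at 2 fails with a division by zero.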
Definition 2.1.1.
[author=duckworth,label=definition_of_a_limit,style=informal, file =text_files/
basic_limits]
The notation lim_{x→a} f(x) = L means any of the following equivalent statements
(choose whichever one makes the most sense to you):
1. If x is close to a (but ≠ a), then f(x) is close to L.
Strategy.
2. Make a table of numbers for x and f (x) as x gets close to a and look for the
pattern of y-values.
Once you’ve found the limit, you still might be asked to verify that it satisfies
the definition. In particular, you might be given f(x), L, a and ε and asked to
find δ. Essentially, you do this graphically as follows: find the closest x-value
corresponding to y = L ± ε, and δ is the distance from this x-value to x = a.
Discussion.
[author=wikibooks,label=discussion_of_limits_after_definition, file =text_files/
basic_limits]
Now this idea of talking about a function as it approaches something was a major
breakthrough, because it lets us talk about things that we couldn’t before. For
example, consider the function 1/x. As x gets very big, 1/x gets very small. In
fact 1/x gets closer and closer to zero, the bigger x gets. Now without limits it’s
very difficult to talk about this fact, because 1/x never actually gets to zero. But
the language of limits exists precisely to let us talk about the behavior of a func-
tion as it approaches something, without caring about the fact that it will never
get there. So we can say
lim_{x→∞} 1/x = 0.
Notice that we could use “=” instead of saying “close to”. Saying that the limit
equals 0 already means that 1/x is close to 0.
Exercises
1. Find lim_{x→5} (2x^2 − 3x + 4).
2. Find lim_{x→2} (x + 1)/(x^2 + 3).
3. Find lim_{x→1} √(x + 1).
2.2. FORMAL LIMITS 43
Discussion.
[author=wikibooks,label=discussion_intro_to_formal_limits, file =text_files/formal_
limits]
In preliminary calculus, the definition of a limit is probably the most difficult con-
cept to grasp. If nothing else, it took some of the most brilliant mathematicians
150 years to arrive at it.
The intuitive definition of a limit is adequate in most cases, as the limit of a
function is the function of the limit. But what is our meaning of “close”? How
close is close? We consider this question with the aid of an example.
Example 2.2.1.
[author=duckworth,label=example_limit_sin_over_x,uses=sin,uses=limits, file =text_
files/formal_limits]
Consider the function f(x) = sin(x)/x. What happens to f(x) as x gets close to 0?
Well, if you try to plug x = 0 in, you get f(0) = 0/0, and this is undefined. But
if you graph the function you get figure 2.1. It seems clear that the y-value “at” (or
near) x = 0 should be 1.
[Figure 2.1: sin_over_x]
How do we convert that intuition into a rigorous statement? What do I mean by
“rigorous statement”? Well, we need a statement that doesn’t depend on looking
at graphs. Why can we not depend on graphs? Well, we need to be able to find
limits of functions like x^n, without knowing what n is! So if we don’t know n,
how can we graph the function? Also, we need a statement that will work for
other kinds of limits, like those we will use when we define definite integrals, and
like those we will use when we do calculus in three (or higher) dimensions, where
we can’t rely on a graph. Finally, sometimes graphs can be misleading, or even
wrong. See Section 1.3 for examples of this.
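Before making things rigorous, it does no harm to look at some numbers. The Python sketch below tabulates sin(x)/x for a few inputs near 0 (the particular inputs are our own choice). Like a graph, a table is only evidence and not a proof, which is precisely why the formal definition is needed.

```python
import math

def f(x):
    return math.sin(x) / x

# Evaluate on both sides of 0; f itself is undefined at x = 0.
for x in [1.0, 0.1, 0.01, 0.001]:
    print(x, f(x), f(-x))
```

The printed values move toward 1 from both sides as x shrinks.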
Discussion.
[author=duckworth,label=discussion_limit_means_infinitely_close,uses=sin,uses=
limits, file =text_files/formal_limits]
So, to say, for example, that lim_{x→0} sin(x)/x = 1, how close does sin(x)/x have to get
to 1? Infinitely close. In mathematics, we usually want answers that are exactly
correct, not just “close enough” (actually, there are many parts of math where
“close enough” is of interest, but if it’s possible, then exactly right is always
better). So, how can we define infinitely close? The human brain doesn’t deal well
with “infinite” statements. So in fact, we translate infinite statements into finite
ones.
A first attempt at this might give something like: lim_{x→0} sin(x)/x = 1 means that
lim_{x→0} sin(x)/x is closer to 1 than any other real number. This attempt has the
problem that it’s circular: we explained what “lim_{x→0} sin(x)/x” means by talking
about “lim_{x→0} sin(x)/x” itself.
No, we need to describe what this limit means by referring only to sin(x)/x. What
should this be doing? It should be close to 1. How close? Infinitely close. How can
I state this using only “finite” concepts? By saying something like the following:
“for every distance you want to pick, sin(x)/x will get at least that close to 1”. The
formal definition of limit merely names “distance” with the letter ε.
Definition 2.2.1.
[author=wikibooks,label=definition_of_limit_formal,style=formal,uses=limits,
file =text_files/formal_limits]
Let f(x) be a function. We write
lim_{x→a} f(x) = L
if for every number ε > 0, there exists a number δ > 0 such that |x − a| < δ and x ≠ a
implies that |f(x) − L| < ε.
Comment.
[author=wikibooks,label=comment_about_what_limit_definition_means, file =text_
files/formal_limits]
Note that instead of saying f (x) approximately equals L, the formal definition
says that the difference between f (x) and L is less than any number epsilon.
Definition 2.2.2.
[author=duckworth,label=defintion_of_one_sided_limits,uses=limits,style=formal,
file =text_files/formal_limits]
Let f(x) be a function. We write
lim_{x→a+} f(x) = L
if for every number ε > 0, there exists a number δ > 0 such that a < x < a + δ implies that
|f(x) − L| < ε. We write
lim_{x→a−} f(x) = L
if for every number ε > 0, there exists a number δ > 0 such that a − δ < x < a implies that
|f(x) − L| < ε.
Comment.
[author=wikibooks,label=comment_how_to_read_one_sided_limits, file =text_files/
formal_limits]
We read lim_{x→a−} f(x) as the limit of f(x) as x approaches a from the left, and
lim_{x→a+} f(x) as the limit of f(x) as x approaches a from the right.
Fact.
[author=wikibooks,label=fact_limit_implies_equality_of_two_sided_limits, file =text_
files/formal_limits]
Example 2.2.2.
Example 2.2.3.
Let $\epsilon > 0$ be some (small) number, and suppose that we want to get our $y$ values
to within a distance $\epsilon$ of $2a + 1$. In other words we want $|2x + 1 - (2a + 1)| < \epsilon$
when we make $x$ close enough to $a$. How close do we need $x$ to be to $a$?
We want
\[ |2x + 1 - (2a + 1)| = |2x - 2a| = 2|x - a| < \epsilon. \]
Ah ha!! Whatever $\epsilon$ is, we can guarantee that $|y(x) - y(a)| < \epsilon$ if we pick $x$'s
with $|x - a| < \epsilon/2$.
We have just proven that $\lim_{x\to a} (2x + 1) = 2a + 1$. Note that this applies to stuff
we did before with the difference quotient where we simplified an expression down
to something like $\lim_{h\to 0} (2h + 1) = 1$.
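The choice of $\delta = \epsilon/2$ in this example can be spot-checked numerically. The sketch below is only a sanity check on sample points, not a proof; the function name `within_epsilon` and the sample values are illustrative assumptions.

```python
def within_epsilon(a, eps, samples=1000):
    """Check |(2x + 1) - (2a + 1)| < eps at sample points with 0 < |x - a| < eps/2."""
    delta = eps / 2
    for i in range(1, samples + 1):
        offset = delta * i / (samples + 1)   # strictly between 0 and delta
        for x in (a - offset, a + offset):
            if not abs((2 * x + 1) - (2 * a + 1)) < eps:
                return False
    return True

print(within_epsilon(a=3.0, eps=0.01))
```

Any other sample values of `a` and `eps` should behave the same way, since the distance $2|x - a|$ stays below $2\delta = \epsilon$.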
Example 2.2.4.
[author=wikibooks,label=example_limit_of_x^2,style=formal, file =text_files/formal_
limits]
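\begin{align*}
|x^2 - 16| &= |x - 4| \cdot |x + 4| \\
&< \delta \cdot (\delta + 8) \\
&= (\sqrt{16 + \epsilon} - 4) \cdot (\sqrt{16 + \epsilon} + 4) \\
&= (\sqrt{16 + \epsilon})^2 - 4^2 \\
&= \epsilon.
\end{align*}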
Example 2.2.5.
[author=wikibooks,label=example_limit_of_sin_of_1_over_x_dne,style=formal, file
=text_files/formal_limits]
Show that the limit of $\sin(1/x)$ as $x$ approaches 0 does not exist.
We proceed by contradiction: suppose the limit exists and is $L$. We
first show that $L \neq 1$ leads to a contradiction; the case $L = 1$ is similar. Choose $\epsilon = |L - 1|$;
then for every $\delta > 0$ there exists a large enough $n$ such that $0 < x_0 = \frac{1}{\pi/2 + 2\pi n} < \delta$,
but $|\sin(1/x_0) - L| = |1 - L| = \epsilon$, a contradiction.
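To see why no single value $L$ can work, it may help to evaluate $\sin(1/x)$ at points approaching 0 where it equals 1 and where it equals $-1$. This is a numerical illustration only; the second family of points (with $3\pi/2$) is an added assumption for contrast and does not appear in the proof above.

```python
import math

def peaks_and_troughs(n_values):
    """Return rows (x_peak, sin(1/x_peak), x_trough, sin(1/x_trough))."""
    rows = []
    for n in n_values:
        x_peak = 1 / (math.pi / 2 + 2 * math.pi * n)         # sin(1/x) = 1 in theory
        x_trough = 1 / (3 * math.pi / 2 + 2 * math.pi * n)   # sin(1/x) = -1 in theory
        rows.append((x_peak, math.sin(1 / x_peak),
                     x_trough, math.sin(1 / x_trough)))
    return rows

for row in peaks_and_troughs([1, 10, 100, 1000]):
    print("x = %.6f  sin(1/x) = %+.6f   x = %.6f  sin(1/x) = %+.6f" % row)
```

Both columns of x values shrink toward 0, while the function values stay near $+1$ and $-1$ respectively.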
2.3. FOUNDATIONS OF THE REAL NUMBERS 47
Example 2.2.6.
Discussion.
Definition 2.3.1.
Comment.
of calculus without it, and you can see how far you get. Seriously, that would be a
fun exercise. But, if you want, you can simply preface all the statements later in
calculus with the invisible statement “If the least upper bound axiom holds, then
. . . ” where “. . . ” might be some rule about limits, or some rule about derivatives,
or some rule about max and mins of a function. In this way, all the statements
which follow are hypothetical statements, which are logically perfect, and then one
can debate if they are “really” true, which is to say, does the least upper bound
axiom “really” hold!
Comment.
[author=duckworth,label=comment_that_lub_implies_glb, file =text_files/foundations_
of_reals]
The least upper bound axiom is not symmetric, in that it talks only about upper
bounds and not lower ones. However, the real numbers are quite symmetric, and
multiplying by −1 turns lower bounds into upper bounds and vice versa. The
following theorem makes this more precise.
Theorem 2.3.1.
[author= wikibooks,label= theroem_ existence_ of_ glb , file =text_files/foundations_
of_reals]
Every non-empty set of real numbers which is bounded below has a greatest lower
bound.
Proof.
[author=duckworth,label=proof_that_existence_of_lub_implies_glb, file =text_files/
foundations_of_reals]
Let E be a non-empty set of real numbers which is bounded below. Then −E
is bounded above (check this assertion). Let M be a least upper bound for −E.
Then −M is a greatest lower bound for E (check this assertion).
Notation.
Lemma 2.3.1.
[author= wikibooks,label= lemma_ facts_ about_ infs_ and_ sups_ and_ subsets , file =text_
files/foundations_of_reals]
Let A and B be two nonempty subsets of the real numbers. The following hold:
1. A ⊆ B ⇒ sup A ≤ sup B
2. A ⊆ B ⇒ inf A ≥ inf B
3. sup(A ∪ B) = max(sup A, sup B)
4. inf(A ∪ B) = min(inf A, inf B)
Figure 2.2:
Proof.
[author=duckworth, file =text_files/foundations_of_reals]
Case 1: x and y are positive. Then |x + y| = x + y and |x| + |y| = x + y.
Case 2: x is positive, y is negative, and x + y is positive. Then |x + y| = x + y
and |x| + |y| = x − y. Now we calculate:
\[ x + y \le x - y \iff y \le -y \iff 2y \le 0 \iff y \le 0, \]
which is true because $y$ is negative.
2.4 Continuity
Discussion.
Definition 2.4.1.
[author=wikibooks,label=definition_of_continuity_at_point, file =text_files/continuity]
We say that $f(x)$ is continuous at $c$ if $\lim_{x\to c} f(x) = f(c)$.
Discussion.
[author=duckworth,label=discussion_of_continuous_definition, file =text_files/
continuity]
The definition of continuous is a technical version of something that is supposed
to be intuitive. This is not done to make an easy thing seem hard. Rather, it is
done so that results can be rigorously proven. In fact, in every technical field it
is common to take an intuitive idea, often an idea that exists outside of the
field, and translate it into a technical statement that can be used within the field.
The main intuitive ideas of continuity that this definition is supposed to capture
are these:
1. Continuous should mean that there are no holes in the graph. If you think
about it there are two types of holes, and in both cases, what is happening
at the number x = c is different than what is happening near the number.
2. The slope between x = c, y = f (c) and another point on the curve of f (x)
is bounded, i.e. the absolute value of this slope does not become infinitely
large. The same pictures we drew showing holes in a discontinuous function
should also show you that the slopes become infinite. It will take us some
time to prove that the slope is bounded for a continuous function.
Figures: a jump discontinuity and a removable discontinuity.
Discussion.
3. $f(x) = \sqrt{x}$, $x = 0$, then we need $2n$ accurate decimal places in $x$ to get $n$
accurate decimal places in $f(x)$, and it will work for $x > 0$ as well.
4. $f(x) = 1/x$ and $|x| > 10^{-k}$, then we can get $n$ accurate decimal places in
$f(x)$ by taking $n + 2k$ accurate decimal places in $x$.
5. f (x) = sin(x), then we can get n accurate places in f (x) by taking n accurate
places in x.
The examples above suggest that as long as x stays away from the “bad” values
(such as x = 0 for f (x) = 1/x) and from infinity (which means that there is an
estimate of the form |x| < A, like in example 2), we can answer the question in a
satisfactory manner. In other words, given n, we can, by taking enough (but still
a finite number of) accurate decimal places in x, get n accurate decimal places in
f (x).
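The claim in example 4 can be tried out numerically. The following sketch perturbs $x$ by half a unit in decimal place $n + 2k$ and measures the resulting change in $1/x$; the helper name `reciprocal_error` and the sample value $x = 0.0123$ (so $k = 2$) are assumptions made for illustration.

```python
def reciprocal_error(x, n, k):
    """Perturb x by half a unit in decimal place n + 2k; return the change in 1/x.

    Assumes |x| > 10**(-k), as in example 4.
    """
    perturbation = 0.5 * 10 ** (-(n + 2 * k))
    return abs(1 / (x + perturbation) - 1 / x)

# Hypothetical sample value: x = 0.0123 satisfies |x| > 10**(-2), so k = 2.
for n in (2, 4, 6):
    err = reciprocal_error(0.0123, n, k=2)
    print(n, err, err < 10 ** (-n))
```

The reason it works is that $|1/x - 1/x'| = |x - x'| / (|x| \cdot |x'|)$, and the denominator is roughly at least $10^{-2k}$.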
Definition 2.4.2.
Discussion.
Discussion.
[author=livshits, file =text_files/continuity]
Continuous functions are rather reasonable, in particular, continuous functions
It follows that our approach to differentiation (see section 2.1) works for
continuous functions, i.e. the rule that $f'(a)$ is $(f(x) - f(a))/(x - a)$ evaluated at
$x = a$ defines $f'(a)$ unambiguously if the division is carried out in the class of
continuous functions.
It follows from the observation that any 2 continuous functions $g$ and $h$ such
that $(x - a)(g(x) - h(x)) = 0$ must be equal because they are equal for $x \neq a$ as
well as for $x = a$ ($g - h$ can’t jump).
Discussion.
[author=wikibooks, file =text_files/finding_limits]
Now we will concentrate on finding limits, rather than proving them. In the proofs
above, we started off with the value of the limit. How did we find it to even begin
our proofs?
First, if the function is continuous at a particular point c, then the limit is
simply the value of the function at c, due to the definition of continuity. All
polynomial, trigonometric, logarithmic, and exponential functions are continuous
over their domains.
If the function is not continuous at c, then in many cases (as with rational
functions) the function is continuous all around it, but there is a discontinuity
at that isolated point. In that case, we want to find a similar function, except
with the hole filled in. The limit of this function at c will be the same, as can
be seen from the definition of a limit. The function is the same as the previous
one except at the point c. The limit definition depends on f(x) only at the points where
0 < |x − c| < δ. When x = c, that inequality is false, and so the limit at c does not
depend on the value of the function at c. Therefore, the limit is the same. And
since our new function is continuous, we can now just evaluate the function at c
as before.
Lastly, note that the limit might not exist at all. There are a number of ways
that this can occur. For example, there may be a gap (more than a point wide) in the function where
the function is not defined.
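As a small illustration of filling in a hole, consider the rational function $(x^2 - 4)/(x - 2)$, which is undefined at $x = 2$ but agrees with $x + 2$ everywhere else. This particular function is an assumed example, not one taken from the text.

```python
def f(x):
    """Rational function with a hole at x = 2 (undefined there)."""
    return (x**2 - 4) / (x - 2)

def g(x):
    """Same function with the hole filled in: (x - 2)(x + 2)/(x - 2) = x + 2."""
    return x + 2

for h in (0.1, 0.001, 0.00001):
    print(h, f(2 - h), f(2 + h))   # both columns approach 4

print(g(2))   # 4, the value of the limit
```

Since `g` is continuous and equals `f` away from 2, the limit of `f` at 2 is simply `g(2)`.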
Example 2.4.1.
the middle of the graph. Note also that the function also has no limit at the
endpoints of the two curves generated (at x=-4 and x=4). For the limit to exist,
the point must be approachable from both the left and the right. Note also that
there is no limit at a totally isolated point on the graph.
Discussion.
Now g is continuous nowhere! For let x be a real number; we show that g isn’t
continuous at x. Let $\epsilon = 1$; then if g were continuous at x, there’d be a number $\delta$
such that whenever y was a real number at distance less than $\delta$ from x, we’d have
|g(x) − g(y)| < 1. But no matter how small we make $\delta$ we can find a number y
within $\delta$ of x such that |g(x) − g(y)| = 2, for if x is rational, just pick y irrational
and if x is irrational, pick y rational. Thus g fails to be continuous at every real
number!
Discussion.
Definition 2.4.3.
Example 2.4.2.
Exercises
Example 2.5.1.
\[ \lim_{x\to\infty} \frac{2x + 3}{5 - x} = \lim_{x\to\infty} \frac{2 + \frac{3}{x}}{\frac{5}{x} - 1} = \lim_{y\to 0} \frac{2 + 3y}{5y - 1} = \frac{2 + 3\cdot 0}{5\cdot 0 - 1} = -2 \]
Discussion.
[author=garrett, file =text_files/limits_at_infinity]
The point is that we called 1/x by a new name, ‘y’, and rewrote the original limit
as x → ∞ as a limit as y → 0. Since 0 is a genuine number that we can do
arithmetic with, this brought us back to ordinary everyday arithmetic. Of course,
it was necessary to rewrite the thing we were taking the limit of in terms of 1/x
(renamed ‘y’).
Notice that this is an example of a situation where we used the letter ‘y’ for
something other than the name or value of the vertical coordinate.
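The substitution $y = 1/x$ can be checked with a few numbers. In this sketch (function names are illustrative), `original` is the expression in $x$ and `rewritten` is the same expression in $y$; the two agree when $y = 1/x$, and only the second can be evaluated at the limiting value $y = 0$.

```python
def original(x):
    return (2 * x + 3) / (5 - x)

def rewritten(y):
    """The same expression after substituting y = 1/x."""
    return (2 + 3 * y) / (5 * y - 1)

for x in (10.0, 1e3, 1e6):
    print(x, original(x), rewritten(1 / x))

print(rewritten(0))   # -2.0
```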
Discussion.
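\begin{align*}
2^n &= 2 \times 2 \times 2 \times \ldots \times 2 && (n \text{ factors}) \\
10^n &= 10 \times 10 \times 10 \times \ldots \times 10 && (n \text{ factors}) \\
\left(\frac{1}{2}\right)^n &= \frac{1}{2} \times \frac{1}{2} \times \frac{1}{2} \times \ldots \times \frac{1}{2} && (n \text{ factors})
\end{align*}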
From this idea it’s not hard to understand the fundamental properties of
exponents (they’re not laws at all):
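\begin{align*}
a^{m+n} &= \underbrace{a \times a \times a \times \ldots \times a}_{m+n} \quad (m + n \text{ factors}) \\
&= \underbrace{(a \times a \times a \times \ldots \times a)}_{m} \times \underbrace{(a \times a \times a \times \ldots \times a)}_{n} = a^m \times a^n
\end{align*}
and also
\begin{align*}
a^{mn} &= \underbrace{(a \times a \times a \times \ldots \times a)}_{mn} \\
&= \underbrace{\underbrace{(a \times a \times a \times \ldots \times a)}_{m} \times \ldots \times \underbrace{(a \times a \times a \times \ldots \times a)}_{m}}_{n} = (a^m)^n
\end{align*}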
at least for positive integers m, n. Even though we can only easily see that these
properties are true when the exponents are positive integers, the extended notation
is guaranteed (by its meaning, not by law ) to follow the same rules.
Discussion.
\[ a^{-n} = a^{n \times (-1)} = (a^n)^{-1} = \frac{1}{a^n} \]
(whether n is positive or not). Just to check one example of consistency with the
properties above, notice that
\[ a = a^1 = a^{(-1)\times(-1)} = \frac{1}{a^{-1}} = \frac{1}{1/a} = a \]
This is not supposed to be surprising, but rather reassuring that we won’t reach
false conclusions by such manipulations.
Also, fractional exponents fit into this scheme. For example
\[ a^{1/2} = \sqrt{a} \qquad a^{1/3} = \sqrt[3]{a} \]
\[ a^{1/4} = \sqrt[4]{a} \qquad a^{1/5} = \sqrt[5]{a} \]
This is consistent with earlier notation: the fundamental property of the nth root
of a number is that its nth power is the original number. We can check:
\[ a = a^1 = (a^{1/n})^n = a \]
One hazard is that, if we want to have only real numbers (as opposed to
complex numbers) come up, then we should not try to take square roots, 4th
roots, 6th roots, or any even order root of negative numbers.
For general real exponents $x$ we likewise should not try to understand $a^x$ except
for $a > 0$ or we’ll have to use complex numbers (which wouldn’t be so terrible).
But the value of $a^x$ can only be defined as a limit: let $r_1, r_2, \ldots$ be a sequence of
rational numbers approaching $x$, and define
\[ a^x = \lim_{i} a^{r_i} \]
We would have to check that this definition does not accidentally depend upon
the sequence approaching x (it doesn’t), and that the same properties still work
(they do).
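The limit definition can be illustrated for $2^{\sqrt{2}}$. The sketch below uses decimal truncations of $\sqrt{2}$ as the rational sequence $r_i$; that choice, and the helper name `approximate_power`, are assumptions. Each rational exponent is converted to a float before taking the power, so this shows the idea rather than computing exact roots.

```python
import math
from fractions import Fraction

def approximate_power(a, rationals):
    """Return a**r for each rational exponent r (evaluated in floating point)."""
    return [a ** float(r) for r in rationals]

# Decimal truncations of sqrt(2): 1.4, 1.41, 1.414, ...
digits = "1.41421356"
sequence = [Fraction(digits[:k]) for k in range(3, len(digits) + 1)]

for r, value in zip(sequence, approximate_power(2.0, sequence)):
    print(r, value)

print(2.0 ** math.sqrt(2))   # value the sequence should approach
```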
Discussion.
Discussion.
With the definitions in mind it is easier to make sense of questions about limits
of exponential functions. The two companion issues are to evaluate
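\[ \lim_{x\to+\infty} a^x \]
\[ \lim_{x\to-\infty} a^x \]
Since we are allowing the exponent $x$ to be real, we’d better demand that $a$ be a
positive real number (if we want to avoid complex numbers, anyway). Then
\[ \lim_{x\to+\infty} a^x = \begin{cases} +\infty & \text{if } a > 1 \\ 1 & \text{if } a = 1 \\ 0 & \text{if } 0 < a < 1 \end{cases} \]
\[ \lim_{x\to-\infty} a^x = \begin{cases} 0 & \text{if } a > 1 \\ 1 & \text{if } a = 1 \\ +\infty & \text{if } 0 < a < 1 \end{cases} \]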
Exercises
1. Find $\lim_{x\to\infty} \frac{x+1}{x^2+3}$.
2. Find $\lim_{x\to\infty} \frac{x^2+3}{x+1}$.
3. Find $\lim_{x\to\infty} \frac{x^2+3}{3x^2+x+1}$.
4. Find $\lim_{x\to\infty} \frac{1-x^2}{5x^2+x+1}$.
5. Find $\lim_{x\to\infty} e^{-x^2}$.
Chapter 3
Derivatives
Discussion.
Example 3.1.1.
Definition 3.1.1.
Definition 3.1.2.
Example 3.1.2.
Discussion.
[author=wikibooks, file =text_files/velocity_problem_as_limit]
To see the power of the limit, let’s go back to the moving car we talked about in
the introduction. Suppose we have a car whose position is linear with respect to
time (that is, a graph plotting the position with respect to time will show a
straight line). We want to find the velocity. This is easy to do from algebra: we
just take the slope, and that’s our velocity.
But unfortunately (or perhaps fortunately if you are a calculus teacher), things
in the real world don’t always travel in nice straight lines. Cars speed up, slow
down, and generally behave in ways that make it difficult to calculate their
velocities. (figure 2)
Now what we really want to do is to find the velocity at a given moment.
(figure 3) The trouble is that in order to find the velocity we need two points,
while at any given time, we only have one point. We can, of course, always find
the average speed of the car, given two points in time, but we want to find the
speed of the car at one precise moment.
Here is where the basic trick of differential calculus comes in. We take the
average speed at two moments in time, and then make those two moments in time
closer and closer together. We then see what the limit of the slope is as these two
moments in time are closer and closer, and as those two moments get closer and
closer, the slope comes out to be closer and closer to the slope at a single instant.
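The shrinking-interval idea can be sketched numerically. The position function $s(t) = 16t^2$ below is an assumption chosen for illustration (this passage does not specify one), and the function names are hypothetical.

```python
def position(t):
    """Assumed position function s(t) = 16 t^2 (not specified in this passage)."""
    return 16 * t**2

def average_velocity(t0, t1):
    """Slope of the secant between the points at times t0 and t1."""
    return (position(t1) - position(t0)) / (t1 - t0)

for dt in (1.0, 0.1, 0.01, 0.001):
    print(dt, average_velocity(2.0, 2.0 + dt))
```

For this choice of $s$, the average velocity over $[2, 2 + dt]$ works out to $64 + 16\,dt$, so the printed values settle toward 64 as the two moments get closer.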
Discussion.
Example 3.1.3.
Example 3.1.4.
You may notice that it is the same trick “upside down”, because if we put $z = \sqrt{t}$
and $Z = \sqrt{T}$, the undefined expression to take care of becomes $(z - Z)/(z^2 - Z^2)$
which is the same as $(z - Z)/((z - Z)(z + Z))$.
Here is one more similar problem that is easy enough to do “with the bare
hands”.
Example 3.1.5.
Figure: the graph of y = 1/x with the tangent at (a, 1/a) and the secant through (a, 1/a) and (x, 1/x).
Definition 3.1.3.
[author=wikibooks, file =text_files/derivatives_definition]
\[ f'(x) = \lim_{h\to 0} \frac{f(x + h) - f(x)}{h} \]
Example 3.1.6.
This is consistent with the definition of the derivative as the slope of a function.
Example 3.1.7.
\begin{align*}
&= \lim_{h\to 0} \frac{2xh + h^2}{h} \\
&= \lim_{h\to 0} (2x + h) \\
&= 2x.
\end{align*}
Though this may seem surprising, because $y = x^2$ fits $y = mx + c$ if $m = x$
and $c = 0$, it becomes intuitive when one realizes that the slope changes twice as
fast as with $m = x$ because there are two $x$s that vary.
Discussion.
in the second problem it is the class of rational functions of $\sqrt{t}$ and $\sqrt{T}$, while in
the third problem it is the class of rational functions of $x$ and $a$.
Why do we need a special class of functions? Why can’t we consider all func-
tions whatsoever? Because the class of all functions is too wide to disambiguate
the ambiguous ratio 0/0. Indeed, if we allow p(x, a) to be any function such that
$q(x, a) = p(x, a)$ for $x \neq a$, we can get no information about $p(a, a)$ because $p(a, a)$
can be changed to any number if we admit all the functions into the game. We
see that some restrictions on the functions that we treat are inevitable.
The following property of the functions we treated so far was crucial for our
success: any 2 of such functions that are defined for $x = a$ and coincide for all $x \neq a$
also coincide for $x = a$. It means that the value $p(a, a)$ is defined unambiguously by
the condition that $p(x, a) = q(x, a)$ for $x \neq a$ (see the last paragraph of section ??).
Later on we will describe some other classes of functions, much more general
than the ones we dealt with so far, but still nice enough for our machinery to work.
To summarize briefly, the function $f$ is differentiable if the increment $f(x) - f(a)$
factors as $f(x) - f(a) = (x - a)p(x, a)$ and the function $p(x, a)$ is well defined for
$x = a$. The derivative is $f'(a) = p(a, a)$.
In the next section we will consider some elementary properties (the rules) of
differentiation that will be handy in calculations.
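The factoring point of view can be illustrated with $f(x) = x^3$, where $x^3 - a^3 = (x - a)(x^2 + xa + a^2)$. The example function is an assumption chosen for illustration; the cofactor `p` is well defined at $x = a$ and gives $3a^2$ there.

```python
def f(x):
    return x**3

def p(x, a):
    """Cofactor in x^3 - a^3 = (x - a)(x^2 + x a + a^2); defined for x = a too."""
    return x**2 + x * a + a**2

a = 2
for x in (5, 3, -1):
    # The increment really does factor as (x - a) * p(x, a).
    assert f(x) - f(a) == (x - a) * p(x, a)

print(p(a, a))   # 12, which equals 3 a^2
```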
Discussion.
Notation.
[author=livshits, file =text_files/derivative_notation]
The standard notation (due to Lagrange) for the derivative of $f$ for $x = a$ is $f'(a)$.
We can also consider it as a function of $a$ and then differentiation becomes the
operation of passing from a function $f$ to its derivative $f'$ (which is also a function
of $x$).
The other notation for $f'$ (due to Leibniz) is $df/dx$. In particular, we can say
that we calculated $s'(2)$ in our first problem, $r'(T)$ in our second problem and
$dy/dx(a)$ in our third problem. We can also write the results we got so far as
$(16t^2)' = 32t$, $(\sqrt{t})' = 1/(2\sqrt{t})$ and $d(1/x)/dx = -1/x^2$.
Newton used dots on top of the letters denoting functions as the differentiation
sign; for example, by solving problem 1, we got $\dot{s}(t) = 32t$. This notation is still
popular in mechanics.
3.2. DERIVATIVE SHORTCUTS 67
On each step we multiplied the divisor by the monomial to kill the leading
term of the remainder obtained at the previous step. This way the degree of the
remainder dropped by one every step of the process. The process stops when the
degree of the remainder is less than the degree of the divisor. The remainder in
our example is 6976. On the other hand, p(3) = 6976 too. Is it a coincidence? No,
because the result of the division can be written as
\[ p(x) = 3x^7 + 5x^4 + x^2 + 1 = (3x^6 + 9x^5 + 27x^4 + 86x^3 + 258x^2 + 775x + 2325)(x - 3) + 6976 \]
and we can plug x = 3 into this formula to see that p(3) = 6976. In general, p(a)
is the remainder of the division of p(x) by x − a, in particular, p(a) = 0 if and only
if x − a divides p(x) evenly, i.e. with zero remainder.
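The division by $x - 3$ described above can be reproduced with synthetic division (Horner's scheme), which is one standard way to organize the same steps. The function name `divide_by_linear` is illustrative; the coefficients are those of $p(x) = 3x^7 + 5x^4 + x^2 + 1$.

```python
def divide_by_linear(coeffs, a):
    """Divide a polynomial by (x - a) using synthetic division.

    coeffs lists the coefficients from the highest power down to the constant.
    Returns (quotient coefficients, remainder).
    """
    carried = []
    value = 0
    for c in coeffs:
        value = value * a + c
        carried.append(value)
    return carried[:-1], carried[-1]

# p(x) = 3x^7 + 5x^4 + x^2 + 1, divided by x - 3
quotient, remainder = divide_by_linear([3, 0, 0, 5, 0, 1, 0, 1], 3)
print(quotient)    # [3, 9, 27, 86, 258, 775, 2325]
print(remainder)   # 6976
```

The last carried value is exactly $p(3)$, which is why the remainder and the value of the polynomial agree.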
This is a very important fact. Assume that a1 , ..., ak are the roots of p(x).
Then each x − aj divides p(x), whence p(x) = (x − a1 )...(x − ak )g(x), so the degree
of p is at least k. It follows that a polynomial of degree d can not have more than
d different roots. In particular, no nonzero polynomial can have an infinite number
of roots; in other words, if a polynomial has an infinite number of roots, it is
zero. Also two polynomial functions that coincide on an infinite set must coincide
everywhere (consider their difference!).
We can also see that any rational function is well defined for all the values of
the argument except for the finite number of values at which some denominator
involved in this function vanishes.
It also follows that a rational function can have at most a finite number of
zeroes, in particular, any two rational functions that coincide on an infinite set
coincide wherever they are both defined (exercise!).
We can use this fact to check our algebraic manipulations. For example, if we
rewrite some formula in a different form, to catch a mistake it is usually enough
to plug in some random number into both formulas and see if they give different
results. The probability that this approach fails is zero.
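A minimal sketch of this random check, using exact rational arithmetic so that rounding cannot hide or create a difference. The helper `probably_equal` and the sample formulas are assumptions for illustration.

```python
import random
from fractions import Fraction

def probably_equal(f, g, trials=5, seed=0):
    """Compare two formulas at a few random rational points (exact arithmetic)."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = Fraction(rng.randint(-10**6, 10**6), rng.randint(1, 10**6))
        if f(x) != g(x):
            return False
    return True

original = lambda x: (x + 1) ** 3
correct = lambda x: x**3 + 3 * x**2 + 3 * x + 1
mistaken = lambda x: x**3 + 3 * x**2 + 3 * x - 1   # sign error in the last term

print(probably_equal(original, correct))    # True
print(probably_equal(original, mistaken))   # False
```

A `True` result is strong evidence rather than a proof; a `False` result is a definite counterexample.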
Discussion.
Rule 3.2.1.
Rule 3.2.2.
\[ \frac{d}{dx} c = 0 \]
for any constant c.
Rule 3.2.3.
The third thing, which reflects the innocuous role of constants in calculus, is that
for any function $f$ of $x$ we have
\[ \frac{d}{dx} (c \cdot f) = c \cdot \frac{d}{dx} f \]
The fourth is that for any two functions $f, g$ of $x$, the derivative of the sum is the
sum of the derivatives:
\[ \frac{d}{dx} (f + g) = \frac{d}{dx} f + \frac{d}{dx} g \]
Rule 3.2.4.
\[ \frac{d}{dx}(a x^m + b x^n + c x^p) = a \cdot m x^{m-1} + b \cdot n x^{n-1} + c \cdot p x^{p-1} \]
and so on, with more summands than just the three, if so desired. And in any
case here are some examples with numbers instead of letters:
\begin{align*}
\frac{d}{dx} 5x^3 &= 5 \cdot 3x^{3-1} = 15x^2 \\
\frac{d}{dx}(3x^7 + 5x^3 - 11) &= 3 \cdot 7x^6 + 5 \cdot 3x^2 - 0 = 21x^6 + 15x^2 \\
\frac{d}{dx}(2 - 3x^2 - 2x^3) &= 0 - 3 \cdot 2x - 2 \cdot 3x^2 = -6x - 6x^2 \\
\frac{d}{dx}(-x^4 + 2x^5 + 1) &= -4x^3 + 2 \cdot 5x^4 + 0 = -4x^3 + 10x^4
\end{align*}
Even if you do catch on to this idea right away, it is wise to practice the
technique so that not only can you do it in principle, but also in practice.
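For practice it can help to have a way of checking answers. The sketch below applies the rules above term by term to a polynomial stored as a dictionary mapping exponents to coefficients; this representation and the name `differentiate` are assumptions made for illustration.

```python
def differentiate(poly):
    """Differentiate a polynomial given as {exponent: coefficient}.

    Applies d/dx (c x^n) = c n x^(n-1) term by term; constants drop out.
    """
    return {n - 1: c * n for n, c in poly.items() if n != 0}

print(differentiate({7: 3, 3: 5, 0: -11}))   # {6: 21, 2: 15}
print(differentiate({0: 2, 2: -3, 3: -2}))   # {1: -6, 2: -6}
```

The two calls correspond to the second and third numerical examples above.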
Rule 3.2.5.
Both rules together say that differentiation is a linear operation. These rules
are sort of obvious. For example, to calculate $(f + g)'(a)$ we consider the difference
quotient $(f(x) + g(x) - (f(a) + g(a)))/(x - a)$ which can be rewritten as
$(f(x) - f(a))/(x - a) + (g(x) - g(a))/(x - a)$. Since both additive terms make sense for
$x = a$ and produce $f'(a)$ and $g'(a)$, we are done.
Examples 3.2.2.
[author=garrett, file =text_files/deriv_powers]
It’s important to remember some of the other possibilities for the exponential
notation $x^n$. For example
\begin{align*}
x^{1/2} &= \sqrt{x} \\
x^{-1} &= \frac{1}{x} \\
x^{-1/2} &= \frac{1}{\sqrt{x}}
\end{align*}
and so on. The good news is that the rule given just above for taking the derivative
of powers of $x$ still is correct here, even for exponents which are negative or fractions
or even real numbers:
\[ \frac{d}{dx} x^r = r x^{r-1} \]
Thus, in particular,
\begin{align*}
\frac{d}{dx} \sqrt{x} &= \frac{d}{dx} x^{1/2} = \frac{1}{2} x^{-1/2} \\
\frac{d}{dx} \frac{1}{x} &= \frac{d}{dx} x^{-1} = -1 \cdot x^{-2} = \frac{-1}{x^2}
\end{align*}
When combined with the sum rule and so on from above, we have the obvious
possibilities:
Example 3.2.3.
\[ \frac{d}{dx}\left(3x^2 - 7\sqrt{x} + \frac{5}{x^2}\right) = \frac{d}{dx}\left(3x^2 - 7x^{1/2} + 5x^{-2}\right) = 6x - \frac{7}{2} x^{-1/2} - 10x^{-3} \]
Comment.
Discussion.
[author=wikibooks, file =text_files/derivative_rules]
The process of differentiation is tedious for large functions. Therefore, rules for
differentiating general functions have been developed, and can be proved with a
little effort. Once sufficient rules have been proved, it will be possible to differen-
tiate a wide variety of functions. Some of the simplest rules involve the derivative
of linear functions.
Rule 3.2.6.
[author=wikibooks, file =text_files/derivative_rules]
Constant rule: $\frac{d}{dx} c = 0$
Linear functions: $\frac{d}{dx} mx = m$
The special case $\frac{dy}{dx} = 1$ (when $y = x$) shows the advantage of the $d/dx$ notation: rules
are intuitive by basic algebra, though this does not constitute a proof, and
can lead to misconceptions about what exactly $dx$ and $dy$ actually are.
Constant multiple and addition rules: Since we already know the rules for
some very basic functions, we would like to be able to take the derivative
of more complex functions and break them up into simpler functions. Two
tools that let us do this are the constant multiple rule and the addition rule.
The constant multiple rule is $\frac{d}{dx} c f(x) = c \frac{d}{dx} f(x)$.
The reason, of course, is that one can factor the $c$ out of the numerator, and
then out of the entire limit, in the definition.
Example 3.2.4.
Rule 3.2.7.
Addition rule: $\frac{d}{dx}(f(x) + g(x)) = \frac{d}{dx} f(x) + \frac{d}{dx} g(x)$
Subtraction rule: $\frac{d}{dx}(f(x) - g(x)) = \frac{d}{dx} f(x) - \frac{d}{dx} g(x)$
Example 3.2.5.
[author=wikibooks, file =text_files/derivative_rules]
Example: what is $\frac{d}{dx}(3x^2 + 5x)$?
\begin{align*}
\frac{d}{dx}(3x^2 + 5x) &= \frac{d}{dx} 3x^2 + \frac{d}{dx} 5x \\
&= 6x + \frac{d}{dx} 5x \\
&= 6x + 5
\end{align*}
Comment.
[author=wikibooks, file =text_files/derivative_rules]
The fact that both of these rules work is extremely significant mathematically
because it means that differentiation is linear. You can take an expression, break
it up into terms, figure out the derivative of each term individually and build the answer back
up, and nothing odd will happen.
Rule 3.2.8.
[author=wikibooks, file =text_files/derivative_rules]
The Power Rule: $\frac{d}{dx} x^n = n x^{n-1}$; that is, bring down the power and reduce it
by one.
Example 3.2.6.
[author=wikibooks, file =text_files/derivative_rules]
For example, in the case of $x^2$, the derivative is $2x^1 = 2x$, as was established
earlier.
Example 3.2.7.
[author=wikibooks, file =text_files/derivative_rules]
The power rule also applies to fractional and negative powers, therefore
\[ \frac{d}{dx}[\sqrt{x}] = \frac{d}{dx} x^{1/2} = \frac{1}{2} x^{-1/2} = \frac{1}{2\sqrt{x}} = \frac{\sqrt{x}}{2x} \]
Comment.
Example 3.2.8.
Exercises
1. Find $\frac{d}{dx}(3x^7 + 5x^3 - 11)$
2. Find $\frac{d}{dx}(x^2 + 5x^3 + 2)$
3. Find $\frac{d}{dx}(-x^4 + 2x^5 + 1)$
4. Find $\frac{d}{dx}(-3x^2 - x^3 - 11)$
5. Find $\frac{d}{dx}(3x^7 + 5\sqrt{x} - 11)$
6. Find $\frac{d}{dx}\left(\frac{2}{x} + 5\sqrt[3]{x} + 3\right)$
7. Find $\frac{d}{dx}\left(7 - \frac{5}{x^3} + 5x^7\right)$
Discussion.
Example 3.3.1.
Figure: the cubic $y = x^3$ and its tangent at $x = a$; the vertical distance between them at $x$ is $|x^3 - a^3 - 3a^2(x - a)|$.
We see that this distance has a factor (x − a)2 in it. The other factor, |x + 2a|
will be bounded by some constant K if we restrict x and a to some finite segment
[A, B], in other words, if we demand that A ≤ x ≤ B and A ≤ a ≤ B (in fact we
can take K = 3max{|A|, |B|}).
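The bound can be spot-checked on a grid. The sketch below assumes $f(x) = x^3$, so that the distance to the tangent is $|x^3 - a^3 - 3a^2(x - a)| = (x - a)^2 |x + 2a|$; the segment $[-2, 3]$, the grid, and the small rounding tolerance are illustrative assumptions.

```python
def tangent_gap(x, a):
    """Vertical distance between y = x^3 and its tangent line at a."""
    return abs(x**3 - a**3 - 3 * a**2 * (x - a))

A, B = -2, 3                      # a sample segment [A, B] (an assumption)
K = 3 * max(abs(A), abs(B))

points = [A + (B - A) * i / 20 for i in range(21)]
ok = all(tangent_gap(x, a) <= K * (x - a) ** 2 + 1e-9   # tolerance for rounding
         for x in points for a in points)
print(K, ok)   # 9 True
```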
Now the whole estimate can be rewritten as $|f(x) - f(a) - f'(a)(x - a)| \le K(x - a)^2$
for $x$ and $a$ in $[A, B]$. Here $K$ may depend only on function $f$ and on
segment $[A, B]$, but not on $x$ and $a$. We can also see that $|(f(x) - f(a))/(x - a) - f'(a)| \le K|x - a|$
for $x \neq a$.
The same kind of estimates hold when $f$ is any polynomial or a rational function
defined everywhere in $[A, B]$; it is also true if $f$ is sin or cos.
Definition 3.3.1.
[author=livshits, file =text_files/increasing_function_theorem]
We say that $f$ is uniformly Lipschitz differentiable on $[A, B]$ if for some constant
$K$ we have
\[ |f(x) - f(a) - f'(a)(x - a)| \le K(x - a)^2 \tag{3.1} \]
for all $x$ and $a$ in $[A, B]$.
3.3. AN ALTERNATIVE APPROACH TO DERIVATIVES 75
Comment.
Figure: the graph of $y = f(x)$ near $x = a$, lying between the parabolas $y = f(a) + f'(a)(x - a) + K(x - a)^2$ and $y = f(a) + f'(a)(x - a) - K(x - a)^2$, together with the tangent line $y = f(a) + f'(a)(x - a)$.
Comment.
[author=livshits, file =text_files/increasing_function_theorem]
Another motivation for this definition is related to the idea of viewing differentiation
as factoring of functions of a certain class, which was developed in section 2.1. Let
us say that we want to deal only with the functions that don’t change too abruptly.
To ensure this we can demand that |f (x) − f (a)| can be estimated in terms of |x − a|;
the simplest estimate of this kind is used in the following definition.
Definition 3.3.2.
Comment.
[author=livshits, file =text_files/increasing_function_theorem]
Important: the constant L (which is called a Lipschitz constant for g and [A, B]) in
this definition depends only on the function and the interval, but not on individual
x or a.
Definition 3.3.3.
Definition 3.3.4.
[author=livshits, file =text_files/increasing_function_theorem]
Now let us say that $f(x) - f(a)$ factors as $f(x) - f(a) = (x - a)p(x, a)$ where
$p(x, a)$ is a ULC function of $x$ and $f'(a) = p(a, a)$. Then the following inequality
holds for $x \neq a$:
\[ \left| \frac{f(x) - f(a)}{x - a} - f'(a) \right| \le L(a)|x - a|. \]
Here the function $L(a)$ may be rather nasty, but if it is bounded by a constant,
that is if $L(a) \le K$ for all $a$ between $A$ and $B$, we arrive (by multiplying both
sides by $|x - a|$ and replacing $L(a)$ by $K$) at (3.1).
Proof.
[author=livshits, file =text_files/increasing_function_theorem]
Case 1. We assume that if $f'(x) \ge C$ for some $C > 0$ then $f$ is increasing.
It follows from this result that $f$ will be increasing if $f' \ge 0$. Here is how.
According to exercise ??, for any $C > 0$ the function $f(x) + Cx$ will be increasing,
i.e. for any $a < b$ we will have $f(a) + Ca \le f(b) + Cb$, whence $f(b) - f(a) \ge -C(b - a)$,
and since $C$ is arbitrary, we must have $f(a) \le f(b)$.
Case 2. The idea is the most popular one in Calculus: to chop up the segment
$[A, B]$ into $N$ equal pieces, use the estimate from our definition on each piece, and
then notice what happens when $N$ becomes large.
Let us take $x_n = A + n(B - A)/N$ for $n = 0, \ldots, N$ and let us take $a = x_{n-1}$
and $x = x_n$ in the estimate from the definition. The estimate from ?? can be
rewritten as
\[ -K(x_n - x_{n-1})^2 \le f(x_n) - f(x_{n-1}) - f'(x_{n-1})(x_n - x_{n-1}) \le K(x_n - x_{n-1})^2. \]
Since $f' \ge 0$ and $x_n \ge x_{n-1}$ and therefore $f'(x_{n-1})(x_n - x_{n-1}) \ge 0$, we can (by
also noticing that $x_n - x_{n-1} = (B - A)/N$) get the following estimate:
\[ f(x_n) - f(x_{n-1}) \ge -K(B - A)^2/N^2. \]
Now let us replace $f(B) - f(A)$ with the following telescoping sum:
\[ f(B) - f(A) = (f(x_1) - f(x_0)) + (f(x_2) - f(x_1)) + \ldots + (f(x_N) - f(x_{N-1})). \]
There are $N$ terms in this sum, each one is $\ge -K(B - A)^2/N^2$, therefore the whole
sum is $\ge -K(B - A)^2/N$. But the whole sum is equal to $f(B) - f(A)$, therefore
\[ f(B) - f(A) \ge -K(B - A)^2/N. \]
This inequality can hold for all $N$ only if $f(B) - f(A) \ge 0$ (this is called Archimedes
Principle), therefore $f(A) \le f(B)$.
Corollary 3.3.1.
[author= livshits , file =text_files/increasing_function_theorem]
If $f'(x) = 0$ for all $x$, then $f$ is a constant function.
Proof.
[author=livshits, file =text_files/increasing_function_theorem]
Let $f$ be ULD on $[A, B]$ and $f' = 0$. IFT tells us that $f(A) \le f(B)$. But $(-f)' = 0$
too, so $-f(A) \le -f(B)$, and $f(A) \ge f(B)$, therefore $f(A) = f(B)$. Taking $A = u$
and $B = x$, $u \le x$ finishes the proof.
Corollary 3.3.2.
[author= livshits , file =text_files/increasing_function_theorem]
From this result we can conclude that any two ULD antiderivatives of the same
function may differ only by a constant, and therefore if $F' = f$ then all the ULD
antiderivatives of $f$ are of the form $F + C$, where $C$ is a constant.
Theorem 3.3.2.
[author= livshits , file =text_files/increasing_function_theorem]
The derivative of a ULD function is ULC.
Proof.
[author=livshits, file =text_files/increasing_function_theorem]
For $x \neq a$, by dividing both sides of ?? by $|x - a|$, we get
\[ \left| \frac{f(x) - f(a)}{x - a} - f'(a) \right| \le K|x - a|. \tag{3.3} \]
This estimate may be handy to check your differentiation. If your formula for $f'$
is right, the left side of (3.3) will be small for $x$ close to $a$ (how close will depend
on $K$); if it is wrong, it will not be so.
Interchanging $x$ and $a$ in formula (3.3) leads to
\[ \left| \frac{f(a) - f(x)}{a - x} - f'(x) \right| \le K|a - x|, \]
but
\[ \frac{f(x) - f(a)}{x - a} = \frac{f(a) - f(x)}{a - x} \]
and $|a - x| = |x - a|$, so $f'(x)$ and $f'(a)$ are less than $K|x - a|$ away from the same
number, and therefore less than $2K|x - a|$ apart, i.e.
\[ |f'(x) - f'(a)| \le 2K|x - a|. \tag{3.4} \]
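The remark that estimate (3.3) is handy for checking a differentiation can be tried directly. The sketch below evaluates the left side of (3.3) at $x = a + h$ for shrinking $h$, once with a correct derivative formula and once with a deliberately wrong one; the test function $x^3$ and all names are assumptions for illustration.

```python
def derivative_check(f, fprime, a, steps=(1e-1, 1e-2, 1e-3)):
    """Left side of (3.3) at x = a + h for each step h."""
    return [abs((f(a + h) - f(a)) / h - fprime(a)) for h in steps]

f = lambda x: x**3
right = lambda x: 3 * x**2
wrong = lambda x: 2 * x**2   # a deliberately wrong formula

print(derivative_check(f, right, 1.0))   # shrinks as h shrinks
print(derivative_check(f, wrong, 1.0))   # stays near 1
```

With the right formula the values behave like $3h + h^2$ at $a = 1$; with the wrong one they never drop below 1.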
Comment.
Comment.
[author=livshits, file =text_files/increasing_function_theorem]
It is natural to ask whether any ULC function has a ULD primitive. Later on, after
taking a closer look at area and integration, we show that it is true. Combining
this fact with IFT, we can derive positivity of definite integrals that was promised
at the end of section ??.
It is also clear that uniform Lipschitz differentiability is stronger than mere
divisibility of $f(x) - f(a)$ by $x - a$ in the class of ULC functions of $x$. As an
example, consider $f(x) = x^2 \sin(1/x)$. We have $f(0) = f'(0) = 0$, but the x-axis
doesn’t look like a tangent: near $x = 0$ it cuts the graph of $f$ (that looks like fuzz)
at infinitely many points. However, if $f'$, understood in the spirit of section 2.1,
turns out to be ULC, $f$ will be ULD. To prove this fact one needs some rather
delicate property of the real numbers (completeness) that will be treated in another
chapter.
Derivation.
Figure: two diagrams on the unit circle, one showing that $\frac{\sin(t+u) - \sin(t)}{2\sin(u/2)} = \frac{|CD|}{|DB|} = \cos(t + u/2)$ and one showing that $\sin(u) < u < \tan(u)$.
Dividing the inequality $\sin(u) < u < \tan(u)$ by $u$ (assuming $\pi/4 > u > 0$), we
get
\[ \frac{\sin(u)}{u} < 1 < \frac{\tan(u)}{u} = \frac{\sin(u)/u}{\cos(u)}, \]
therefore
\[ \cos(u) < \frac{\sin(u)}{u} < 1 \]
which holds for $-\pi/4 < u < 0$ as well since $\cos(-u) = \cos(u)$ and $\sin(-u) = -\sin(u)$,
whence $\sin(-u)/(-u) = \sin(u)/u$. Now
To conclude our proof that $\sin'(u) = \cos(u)$ we have to get an estimate
\[ \left| \cos(t) - \cos(t + u/2) \frac{\sin(u/2)}{u/2} \right| \le K|u| \]
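The squeeze $\cos(u) < \sin(u)/u < 1$ used in this derivation can be spot-checked at a few sample values with $0 < |u| < \pi/4$. This is a numerical illustration only; the sample points and the name `squeeze_holds` are assumptions.

```python
import math

def squeeze_holds(u):
    """Check cos(u) < sin(u)/u < 1 for a nonzero u with |u| < pi/4."""
    ratio = math.sin(u) / u
    return math.cos(u) < ratio < 1

samples = [0.7, 0.1, 0.01, -0.01, -0.1, -0.7]
print([squeeze_holds(u) for u in samples])
```

Very small values of `u` are avoided on purpose, because in floating point the ratio eventually rounds to exactly 1.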
Exercises
3. Try to prove the statement “The figure showing the upper and lower parabolas suggests
that any ULD function with a positive derivative will be increasing” and
see that it is not easy.
5. Show that functions with positive derivatives are increasing. Can you use
IFT to make the argument easy?
6. Fill in the details of “This theorem together with the estimate 3.3 demonstrates
that the time derivative of the distance is a reasonable mathematical
metaphor for instantaneous velocity if the distance is a ULD function of time.
Indeed, in this case the average velocity over a short enough time interval
will be close to the time derivative of the distance at any time during this
interval.”
3.4. DERIVATIVES OF TRANSCENDENTAL FUNCTIONS 81
Rule 3.4.1.
Rule 3.4.2.
Rule 3.4.3.
Comment.
Rule 3.4.4.
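\begin{align*}
\frac{d}{dx} a^x &= \ln a \cdot a^x \\
\frac{d}{dx} \log_a x &= \frac{1}{\ln a \cdot x} \\
\frac{d}{dx} \sec x &= \tan x \sec x \\
\frac{d}{dx} \csc x &= -\cot x \csc x \\
\frac{d}{dx} \cot x &= -\csc^2 x \\
\frac{d}{dx} \arccos x &= \frac{-1}{\sqrt{1 - x^2}} \\
\frac{d}{dx} \operatorname{arccot} x &= \frac{-1}{1 + x^2} \\
\frac{d}{dx} \operatorname{arccsc} x &= \frac{-1}{x\sqrt{x^2 - 1}}
\end{align*}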
Comment.
[author=garrett, file =text_files/deriv_transcend]
(There are always some difficulties in figuring out which of the infinitely-many
possibilities to take for the values of the inverse trig functions, and this is especially
bad with arccsc, for example. But we won’t have time to worry about such things).
Comment.
Discussion.
First we will solve this for the specific case of an exponent with a base of e
and then extend it to the general case with a base of a where a is a positive real
number.
Derivation.
[author=wikibooks, file =text_files/derivative_exponentials]
First we set up our problem using f(x) = e^x:
\[ \frac{d}{dx}\, e^x = \lim_{h \to 0} \frac{e^{x+h} - e^{x-h}}{2h} \]
Treating e^x as a constant with respect to what we are taking the limit of, we
can use the limit rules to move it to the outside, leaving us with
\[ \frac{d}{dx}\, e^x = e^x \cdot \lim_{h \to 0} \frac{e^{h} - e^{-h}}{2h} \]
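The remaining limit can be explored numerically. The sketch below (the function name is an assumption of this example) evaluates the symmetric quotient for shrinking h; the values suggest that the limit is 1, which is what the derivation needs in order to conclude that the derivative of e^x is e^x.

```python
import math

def symmetric_quotient(h):
    """(e^h - e^(-h)) / (2h), the factor left after pulling e^x out of the limit."""
    return (math.exp(h) - math.exp(-h)) / (2.0 * h)

# The quotient should approach 1 as h shrinks.
for h in (0.1, 0.01, 0.001):
    print(h, symmetric_quotient(h))
```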
Derivation.
Derivation.
Next we will raise e to the power of both sides in an attempt to remove the
logarithm from the right hand side
\[ e^y = x \]
Now, applying the chain rule and the property of exponents we derived earlier,
we take the derivative of both sides
\[ \frac{dy}{dx} \cdot e^y = 1 \]
This leaves us with the derivative
\[ \frac{dy}{dx} = \frac{1}{e^y} = \frac{1}{x} \]
Derivation.
[author=wikibooks, file =text_files/derivatives_logarithms]
If we wanted, we could go through that same process again for a generalized base,
but it is easier just to use properties of logs and realize that
\[ \log_b(x) = \frac{\ln(x)}{\ln(b)} \]
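Since ln(b) is a constant, the change-of-base identity gives the derivative of log_b(x) as 1/(x ln(b)). A small numerical sketch of that consequence follows; the base 10, the point x = 3 and the helper names are arbitrary choices for illustration.

```python
import math

def log_base(b, x):
    """log_b(x) computed via the change-of-base identity ln(x)/ln(b)."""
    return math.log(x) / math.log(b)

def central_difference(f, x, h=1e-6):
    """Symmetric difference quotient, a numerical stand-in for f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

b, x = 10.0, 3.0
numeric = central_difference(lambda t: log_base(b, t), x)
formula = 1.0 / (x * math.log(b))
print(numeric, formula)
```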
Discussion.
Derivation.
[author=wikibooks,uses=complexnumbers, file =text_files/derivatives_trig_functions]
There are two basic ways to determine the derivative of these functions. The first
is to sit down with a table of trigonometric identities and work your way through
using the formal equation for the derivative. This is tedious and requires either
memorizing or using a table with a lot of equations on it. It is far simpler to just
use Euler’s Formula
Euler’s Formula:
\[ e^{ix} = \cos(x) + i \sin(x) \]
where $i = \sqrt{-1}$.
This leads us to the equations for the sine and cosine
\[ \sin(x) = \frac{e^{ix} - e^{-ix}}{2i}, \qquad \cos(x) = \frac{e^{ix} + e^{-ix}}{2} \]
Derivation.
[author=wikibooks, file =text_files/derivatives_trig_functions]
Let us find the derivative of sin(x), using the above definition.
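\[ f(x) = \sin(x) \]
\[
\begin{aligned}
f'(x) &= \lim_{h \to 0} \frac{\sin(x+h) - \sin(x)}{h} \\
&= \lim_{h \to 0} \frac{\sin(x)\cos h + \cos(x)\sin h - \sin(x)}{h} \\
&= \lim_{h \to 0} \frac{\sin(x)(\cos h - 1) + \cos(x)\sin h}{h} \\
&= \lim_{h \to 0} \left( \frac{\sin(x)(\cos h - 1)}{h} + \frac{\cos(x)\sin h}{h} \right) \\
&= \lim_{h \to 0} \frac{\sin(x)(\cos h - 1)}{h} + \lim_{h \to 0} \frac{\cos(x)\sin h}{h} \\
&= 0 + \lim_{h \to 0} \frac{\cos(x)\sin h}{h} \\
&= \cos(x)
\end{aligned}
\]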
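The last two steps rely on the limits (cos h − 1)/h → 0 and sin(h)/h → 1. These are not proved by the code below; it only tabulates the two quotients, together with the difference quotient of sin at the arbitrarily chosen point x = 1, as a plausibility check.

```python
import math

def sin_difference_quotient(x, h):
    """(sin(x+h) - sin(x)) / h, the quotient whose limit defines sin'(x)."""
    return (math.sin(x + h) - math.sin(x)) / h

x = 1.0
for h in (1e-1, 1e-3, 1e-5):
    cos_term = (math.cos(h) - 1.0) / h   # should tend to 0
    sin_term = math.sin(h) / h           # should tend to 1
    print(h, cos_term, sin_term, sin_difference_quotient(x, h))

print("cos(1) =", math.cos(x))
```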
Derivation.
[author=wikibooks, file =text_files/derivatives_trig_functions]
Derivation.
[author=wikibooks, file =text_files/derivatives_trig_functions]
For secants, we just need to apply the chain rule to the derivatives we have already
determined.
\[ \sec(x) = \frac{1}{\cos(x)} \]
Simplifying, we get
Derivative of the Secant: \[ \frac{d}{dx} \sec(x) = \sec(x)\tan(x) \]
Derivation.
We get
Derivative of the Cosecant: \[ \frac{d}{dx} \csc(x) = -\csc(x)\cot(x) \]
Using the same procedure for the cotangent that we used for the tangent, we
get
Derivative of the Cotangent: \[ \frac{d}{dx} \cot(x) = -\csc^2(x) \]
Discussion.
[Figure deriv_sin_cos_circle: unit circle with origin O, the angle t, and the horizontal
coordinate cos(t) marked.]
You can see from the figure that sin'(t) = cos(t) and cos'(t) = −sin(t).
Exercises
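1. Find $\frac{d}{dx}\left(e^{\cos x}\right)$
2. Find $\frac{d}{dx}\left(\arctan(2 - e^x)\right)$
3. Find $\frac{d}{dx}\left(\sqrt{\ln(x - 1)}\right)$
4. Find $\frac{d}{dx}\left(e^{2\cos x + 5}\right)$
5. Find $\frac{d}{dx}\left(\arctan(1 + \sin 2x)\right)$
6. Find $\frac{d}{dx}\cos(e^x - x^2)$
7. Find $\frac{d}{dx}\sqrt[3]{1 - \ln 2x}$
8. Find $\frac{d}{dx}\,\frac{e^x - 1}{e^x + 1}$
9. Find $\frac{d}{dx}\left(\sqrt{\ln(1/x)}\right)$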
3.5. PRODUCT AND QUOTIENT RULE 87
Rule 3.5.1.
Comment.
Example 3.5.1.
\[ \frac{d}{dx}\left[(x^3 + x^2 + x + 1)(x^4 + x^3 + 2x + 1)\right] \]
we could multiply out and then take the derivative term-by-term as we did with
several polynomials above. This would be at least mildly irritating because we’d
have to do a bit of algebra. Rather, just apply the product rule without feeling
compelled first to do any algebra:
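\[ \frac{d}{dx}\left[(x^3 + x^2 + x + 1)(x^4 + x^3 + 2x + 1)\right] = (3x^2 + 2x + 1)(x^4 + x^3 + 2x + 1) + (x^3 + x^2 + x + 1)(4x^3 + 3x^2 + 2) \]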
Now if we were somehow still obliged to multiply out, then we’d still have to do
some algebra. But we can take the derivative without multiplying out, if we want
to, by using the product rule.
Comment.
[author=garrett, file =text_files/product_rule]
For that matter, once we see that there is a choice about doing algebra either
before or after we take the derivative, it might be possible to make a choice which
minimizes our computational labor. This could matter.
Rule 3.5.2.
[author=livshits, file =text_files/product_rule]
Product or Leibniz Rule: (fg)' = f'g + fg'
Derivation.
[Figure leibnitz_rule: diagram with labels f(a)g(a), g(a) and g(x).]
Discussion.
[author=garrett, file =text_files/quotient_rule]
The quotient rule is one of the more irritating and goofy things in elementary
calculus, but it just couldn’t have been any other way.
Rule 3.5.3.
[author=garrett, file =text_files/quotient_rule]
Quotient Rule:
\[ \frac{d}{dx}\left(\frac{f}{g}\right) = \frac{f'g - g'f}{g^2} \]
Comment.
[author=garrett, file =text_files/quotient_rule]
The main hazard is remembering that the numerator is as it is, rather than acci-
dentally reversing the roles of f and g, and then being off by ±, which could be
fatal in real life.
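One way to guard against sign and ordering slips is to compare the product and quotient rule formulas with a numerical derivative. The sketch below does this for the two polynomials of Example 3.5.1 at the arbitrarily chosen point x = 0.5; the helper `central_difference` is an assumption of this example, not part of the notes.

```python
def f(x):  return x**3 + x**2 + x + 1
def df(x): return 3*x**2 + 2*x + 1
def g(x):  return x**4 + x**3 + 2*x + 1
def dg(x): return 4*x**3 + 3*x**2 + 2

def central_difference(func, x, h=1e-5):
    """Symmetric difference quotient, a numerical stand-in for func'(x)."""
    return (func(x + h) - func(x - h)) / (2.0 * h)

x0 = 0.5  # g(x0) is nonzero here, so the quotient is defined
product_rule = df(x0) * g(x0) + f(x0) * dg(x0)
quotient_rule = (df(x0) * g(x0) - dg(x0) * f(x0)) / g(x0) ** 2

product_numeric = central_difference(lambda x: f(x) * g(x), x0)
quotient_numeric = central_difference(lambda x: f(x) / g(x), x0)

print(product_rule, product_numeric)
print(quotient_rule, quotient_numeric)
```

Swapping f and g in the numerator of the quotient rule flips the sign of the result, which such a comparison exposes immediately.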
Example 3.5.2.
Example 3.5.3.
[author=garrett, file =text_files/quotient_rule]
Example 3.5.4.
[author=garrett, file =text_files/quotient_rule]
Example 3.5.5.
[author=livshits, file =text_files/quotient_rule]
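\[ \left.\frac{f(x)/g(x) - f(a)/g(a)}{x - a}\right|_{x=a} = \left.\frac{\bigl(f(x)/g(x) - f(x)/g(a)\bigr) + \bigl(f(x) - f(a)\bigr)/g(a)}{x - a}\right|_{x=a} \]
\[ = \left.\frac{f(x)}{g(x)g(a)} \cdot \frac{g(a) - g(x)}{x - a}\right|_{x=a} + \left.\frac{f(x) - f(a)}{x - a} \cdot \frac{1}{g(a)}\right|_{x=a} \]
\[ = -\frac{f(x)g'(x)}{g(x)^2} + \frac{f'(x)}{g(x)} = \frac{f'(x)g(x) - f(x)g'(x)}{g(x)^2} \]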
Discussion.
Rule 3.5.4.
Derivation.
”quotient rule”
Rule 3.5.5.
Which some people remember with the mnemonic “low D-high minus high D-
low over the square of what’s below.”
Comment.
[author=wikibooks, file =text_files/product_quotient_rules]
Remember that the derivative of a product/quotient is not the product/quotient of
the derivatives. (That is, differentiation does not distribute over multiplication
or division.) However, one can distribute before taking the derivative. That is,
\[ \frac{d}{dx}\bigl((a + b) \times (c + d)\bigr) \equiv \frac{d}{dx}(ac + ad + bc + bd) \]
Comment.
Exercises
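1. Find $\frac{d}{dx}\left((x^3 - 1)(x^6 + x^3 + 1)\right)$
2. Find $\frac{d}{dx}\left((x^2 + x + 1)(x^4 - x^2 + 1)\right)$
3. Find $\frac{d}{dx}\left((x^3 + x^2 + x + 1)(x^4 + x^2 + 1)\right)$
4. Find $\frac{d}{dx}\left((x^3 + x^2 + x + 1)(2x + \sqrt{x})\right)$
5. Find $\frac{d}{dx}\left(\frac{x - 1}{x - 2}\right)$
6. Find $\frac{d}{dx}\left(\frac{1}{x - 2}\right)$
7. Find $\frac{d}{dx}\left(\frac{\sqrt{x} - 1}{x^2 - 5}\right)$
8. Find $\frac{d}{dx}\left(\frac{1 - x^3}{2 + \sqrt{x}}\right)$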
Rule 3.6.1.
Comment.
Example 3.6.1.
[author=garrett, file =text_files/chain_rule]
\[ F(x) = (1 + x^2)^{100} \]
is really obtained by first using x as input to the function which squares and adds
1 to its input. Then the result of that is used as input to the function which takes
the 100th power. It is necessary to think about it this way or we’ll make a mistake.
The derivative is evaluated as
\[ \frac{d}{dx}(1 + x^2)^{100} = 100(1 + x^2)^{99} \cdot 2x \]
To see that this is a special case of the general formula, we need to see what
corresponds to the f and g in the general formula. Specifically, let
\[ f(\text{input}) = (\text{input})^{100} \]
3.6. CHAIN RULE 93
\[ g(\text{input}) = 1 + (\text{input})^2 \]
The reason for writing ‘input’ and not ‘x’ for the moment is to avoid a certain
kind of mistake. But we can compute that
\[ f'(\text{input}) = 100(\text{input})^{99} \]
\[ g'(\text{input}) = 2(\text{input}) \]
The hazard here is that the input to f is not x, but rather is g(x). So the general
formula gives
\[ \frac{d}{dx}(1 + x^2)^{100} = f'(g(x)) \cdot g'(x) = 100\,g(x)^{99} \cdot 2x = 100(1 + x^2)^{99} \cdot 2x \]
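As a sanity check, the chain rule result for this example can be compared with a numerical derivative. The evaluation point x = 0.1 and the helper `central_difference` are choices made for this sketch; a small x keeps the 100th power at a manageable size.

```python
def F(x):
    """The composite function f(g(x)) with g(x) = 1 + x^2 and f(u) = u^100."""
    return (1 + x**2) ** 100

def dF(x):
    """Chain rule: f'(g(x)) * g'(x) = 100 (1 + x^2)^99 * 2x."""
    return 100 * (1 + x**2) ** 99 * 2 * x

def central_difference(func, x, h=1e-6):
    """Symmetric difference quotient, a numerical stand-in for func'(x)."""
    return (func(x + h) - func(x - h)) / (2.0 * h)

x0 = 0.1
numeric = central_difference(F, x0)
exact = dF(x0)
relative_error = abs(numeric - exact) / abs(exact)
print(exact, numeric, relative_error)
```

Forgetting the inner factor 2x would give a value five times too large at x = 0.1, which the comparison would expose.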
Examples 3.6.2.
\[ \frac{d}{dx}(3x^5 - x + 14)^{11} = 11(3x^5 - x + 14)^{10} \cdot (15x^4 - 1) \]
Example 3.6.3.
Example 3.6.4.
[author=garrett, file =text_files/chain_rule]
Of course, this idea can be combined with polynomials, quotients, and products
to give enormous and excruciating things where we need to use the chain rule, the
quotient rule, the product rule, etc., and possibly several times each. But this is
not hard, merely tedious, since the only things we really do come in small steps.
For example:
\[ \frac{d}{dx}\,\frac{1 + \sqrt{x + 2}}{(1 + 7x)^{33}} = \frac{(1 + \sqrt{x + 2})' \cdot (1 + 7x)^{33} - (1 + \sqrt{x + 2}) \cdot ((1 + 7x)^{33})'}{((1 + 7x)^{33})^2} \]
\[ \frac{d}{dx}\sqrt{x + 2} = \frac{1}{2}(x + 2)^{-1/2} \cdot (x + 2)' = \frac{1}{2}(x + 2)^{-1/2} \]
Then we use the chain rule again to take the derivative of that big power of 1 + 7x,
so the whole thing becomes
\[ \frac{\left(\frac{1}{2}(x + 2)^{-1/2}\right) \cdot (1 + 7x)^{33} - (1 + \sqrt{x + 2}) \cdot \left(33(1 + 7x)^{32} \cdot 7\right)}{((1 + 7x)^{33})^2} \]
Although we could simplify a bit here, let’s not. The point about having to do
several things in a row to take a derivative is pretty clear without doing algebra
just now.
Discussion.
[author=wikibooks, file =text_files/chain_rule]
We know how to differentiate regular polynomial functions. For example,
\[ \frac{d}{dx}(3x^3 - 6x^2 + x) = 9x^2 - 12x + 1 \]
However, we’ve not yet explored the derivative of an
unexpanded expression. If we are given the function y = (x + 5)^2, we currently
have no choice but to expand it: y = x^2 + 10x + 25, so f'(x) = 2x + 10. However,
there is a useful rule known as the “chain rule”. The function above (y = (x + 5)^2)
can be consolidated into y = u^2, where u = (x + 5). Therefore y = f(u) = u^2 and
u = g(x) = x + 5, so y = f(g(x)).
Rule 3.6.2.
Example 3.6.5.
[author=wikibooks, file =text_files/chain_rule]
We can now investigate the original function:
\[ \frac{dy}{dx} = 2u \cdot 1, \qquad \frac{dy}{dx} = 2(x + 5) = 2x + 10 \]
Example 3.6.6.
Rule 3.6.3.
[author=livshits, file =text_files/chain_rule]
Chain Rule: (f(g(x)))' = f'(g(x)) g'(x)
Exercises
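1. Find $\frac{d}{dx}\left((1 - x^2)^{100}\right)$
2. Find $\frac{d}{dx}\sqrt{x - 3}$
3. Find $\frac{d}{dx}\left(x^2 - \sqrt{x^2 - 3}\right)$
4. Find $\frac{d}{dx}\left(\sqrt{x^2 + x + 1}\right)$
5. Find $\frac{d}{dx}\left(\sqrt[3]{x^3 + x^2 + x + 1}\right)$
6. Find $\frac{d}{dx}\left((x^3 + \sqrt{x + 1})^{10}\right)$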
\[ \sinh x = \frac{1}{2}(e^x - e^{-x}) \]
\[ \cosh x = \frac{1}{2}(e^x + e^{-x}) \]
\[ \tanh x = \frac{e^x - e^{-x}}{e^x + e^{-x}} = \frac{\sinh x}{\cosh x} \]
The reciprocal functions cosech, sech, coth are defined from these functions.
Facts.
[author=wikibooks, file =text_files/hyperbolics]
The hyperbolic trigonometric functions satisfy identities very similar to those sat-
isfied by the regular trigonometric functions.
\[ \cosh^2 x - \sinh^2 x = 1 \]
\[ 1 - \tanh^2 x = \operatorname{sech}^2 x \]
\[ \sinh 2x = 2 \sinh x \cosh x \]
\[ \cosh 2x = \cosh^2 x + \sinh^2 x \]
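These identities are easy to spot-check numerically. In the sketch below (function name, tolerance and sample points are choices for this example) the double-angle identity for sinh is checked in the form sinh 2x = 2 sinh x cosh x, with the factor 2.

```python
import math

def check_identities(x, tol=1e-9):
    """Return True if the four hyperbolic identities hold at x up to tol."""
    sinh, cosh, tanh = math.sinh(x), math.cosh(x), math.tanh(x)
    sech = 1.0 / cosh
    return (
        abs(cosh**2 - sinh**2 - 1.0) < tol
        and abs(1.0 - tanh**2 - sech**2) < tol
        and abs(math.sinh(2 * x) - 2.0 * sinh * cosh) < tol
        and abs(math.cosh(2 * x) - (cosh**2 + sinh**2)) < tol
    )

samples = (-2.0, -0.5, 0.0, 0.7, 2.0)
identities_ok = all(check_identities(x) for x in samples)
print(identities_ok)
```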
Rules 3.7.1.
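\[ \frac{d}{dx} \sinh x = \cosh x \]
\[ \frac{d}{dx} \cosh x = \sinh x \]
\[ \frac{d}{dx} \tanh x = \operatorname{sech}^2 x \]
\[ \frac{d}{dx} \operatorname{cosech} x = -\operatorname{cosech} x \coth x \]
\[ \frac{d}{dx} \operatorname{sech} x = -\operatorname{sech} x \tanh x \]
\[ \frac{d}{dx} \coth x = -\operatorname{cosech}^2 x \]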
3.8. TANGENT AND NORMAL LINES 97
Definition 3.7.2.
[author=wikibooks, file =text_files/hyperbolics]
We define inverse functions for the hyperbolic functions. As with the usual trigono-
metric functions, we sometimes need to restrict the domain to obtain a function
which is one-to-one.
• sinh(x) is one-to-one on the whole real number line, and its range is the whole
real number line. Therefore, $\sinh^{-1}$ is defined on the whole real number line.
The formula is given by $\sinh^{-1} z = \ln(z + \sqrt{z^2 + 1})$.
• cosh(x) is one-to-one on the domain [0, ∞). Its range is [1, ∞). Therefore
$\cosh^{-1}$ is defined on the interval [1, ∞). The formula is given by
$\cosh^{-1} z = \ln(z + \sqrt{z^2 - 1})$.
• tanh(x) is one-to-one on the whole real number line and its range is the
interval (−1, 1). Therefore $\tanh^{-1}$ is defined on the interval (−1, 1). The
formula is given by $\tanh^{-1} z = \ln \sqrt{\frac{1+z}{1-z}}$.
Rules 3.7.2.
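• $\frac{d}{dx} \sinh^{-1}(x) = \frac{1}{\sqrt{1 + x^2}}$.
• $\frac{d}{dx} \cosh^{-1}(x) = \frac{1}{\sqrt{x^2 - 1}}$, x > 1.
• $\frac{d}{dx} \tanh^{-1}(x) = \frac{1}{1 - x^2}$, −1 < x < 1.
• $\frac{d}{dx} \operatorname{cosech}^{-1}(x) = -\frac{1}{|x|\sqrt{1 + x^2}}$, x ≠ 0.
• $\frac{d}{dx} \operatorname{sech}^{-1}(x) = -\frac{1}{x\sqrt{1 - x^2}}$, 0 < x < 1.
• $\frac{d}{dx} \coth^{-1}(x) = \frac{1}{1 - x^2}$, |x| > 1.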
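The logarithmic formulas for the inverse functions, and the first three derivative rules, can be compared against Python's built-in `math.asinh`, `math.acosh` and `math.atanh`. This is a numerical spot check at hand-picked points, written as a sketch; the helper names are assumptions of this example. Note that the derivative of the inverse hyperbolic tangent is checked in the form 1/(1 − x^2), without a square root.

```python
import math

def asinh_log(z): return math.log(z + math.sqrt(z * z + 1.0))
def acosh_log(z): return math.log(z + math.sqrt(z * z - 1.0))          # needs z >= 1
def atanh_log(z): return math.log(math.sqrt((1.0 + z) / (1.0 - z)))    # needs |z| < 1

def central_difference(func, x, h=1e-6):
    """Symmetric difference quotient, a numerical stand-in for func'(x)."""
    return (func(x + h) - func(x - h)) / (2.0 * h)

# Formula checks against the library implementations.
formula_gaps = [
    abs(asinh_log(0.5) - math.asinh(0.5)),
    abs(acosh_log(2.0) - math.acosh(2.0)),
    abs(atanh_log(0.5) - math.atanh(0.5)),
]

# Derivative checks: numerical derivative versus the stated rule.
derivative_gaps = [
    abs(central_difference(math.asinh, 0.5) - 1.0 / math.sqrt(1.0 + 0.5**2)),
    abs(central_difference(math.acosh, 2.0) - 1.0 / math.sqrt(2.0**2 - 1.0)),
    abs(central_difference(math.atanh, 0.5) - 1.0 / (1.0 - 0.5**2)),
]
print(formula_gaps, derivative_gaps)
```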
Rule 3.8.1.
\[ y - y_o = m(x - x_o) \]
In the present context, the slope is f'(x_o) and the point is (x_o, f(x_o)), so the
equation of the tangent line to the graph of f at (x_o, f(x_o)) is
\[ y - f(x_o) = f'(x_o)(x - x_o) \]
Rule 3.8.2.
[author=garrett, file =text_files/tangent_normal_lines]
The normal line to a curve at a particular point is the line through that point and
perpendicular to the tangent. A person might remember from analytic geometry
that the slope of any line perpendicular to a line with slope m is the negative
reciprocal −1/m. Thus, just changing this aspect of the equation for the tangent
line, we can say generally that the equation of the normal line to the graph of f at
(x_o, f(x_o)) is
\[ y - f(x_o) = \frac{-1}{f'(x_o)}(x - x_o) \]
The main conceptual hazard is to mistakenly name the fixed point ‘x’, as well
as naming the variable coordinate on the tangent line ‘x’. This causes a person
to write down some equation which, whatever it may be, is not the equation of a
line at all.
Another popular boo-boo is to forget the subtraction −f (xo ) on the left hand
side. Don’t do it.
Example 3.8.1.
[author=garrett, file =text_files/tangent_normal_lines]
So, as the simplest example: let’s write the equation for the tangent line to the
curve y = x^2 at the point where x = 3. The derivative of the function is y' = 2x,
which has value 2 · 3 = 6 when x = 3. And the value of the function is 3 · 3 = 9
when x = 3. Thus, the tangent line at that point is
\[ y - 9 = 6(x - 3) \]
The normal line at that point has the negative reciprocal slope −1/6, so it is
\[ y - 9 = \frac{-1}{6}(x - 3) \]
So the question of finding the tangent and normal lines at various points of
the graph of a function is just a combination of the two processes: computing
the derivative at the point in question, and invoking the point-slope form of the
equation for a straight line.
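The two-step recipe (derivative at the point, then point-slope form) translates directly into code. The sketch below is one possible rendering; the function name `tangent_and_normal` is an assumption of this example, and it assumes the derivative at the point is nonzero so that the normal line has a finite slope.

```python
def tangent_and_normal(f, df, x0):
    """Return (tangent, normal) as functions of x for the graph of f at x0.

    Assumes df(x0) != 0, so the normal line is not vertical.
    """
    y0 = f(x0)
    m = df(x0)
    tangent = lambda x: y0 + m * (x - x0)    # y - y0 = m (x - x0)
    normal = lambda x: y0 - (x - x0) / m     # y - y0 = (-1/m)(x - x0)
    return tangent, normal

# The curve y = x^2 at x = 3: slope 6, point (3, 9).
tangent, normal = tangent_and_normal(lambda x: x**2, lambda x: 2 * x, 3)
print(tangent(4), normal(9))
```

Both lines pass through (3, 9); moving one unit right along the tangent raises y by 6, and moving six units right along the normal lowers y by 1.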
Exercises
1. Write the equation for both the tangent line and normal line to the curve
y = 3x^2 − x + 1 at the point where x = 1.
2. Write the equation for both the tangent line and normal line to the curve
y = (x − 1)/(x + 1) at the point where x = 0.
Exercises
1. Derive the multiplier rule from the Leibniz rule.
2. Find the formulas for (1/f)' and (g/f)' using the Leibniz rule (Hint: differentiate
the identity (1/f)f = 1 and solve for (1/f)').
3. To make our guess a theorem we observe that every time we turn the crank
to get from (x^n)' to (x^{n+1})' the pattern persists (exercise: check it).
4. Write x^8 as ((x^2)^2)^2 and use the chain rule 2 times to get (x^8)'. Differentiate
x^{81} using a similar approach.
5. Use the chain rule to get an easy solution for ex.1.6
6. Use the fact that (x^{1/7})^7 = x and the chain rule to get (x^{1/7})'.
7. Differentiate some polynomials using the differentiation rules.
8. Do the calculations (Hint: use the chain rule to get d(x(t)^5)/dt)
9. Redo problem 1.3 without solving for y (Hint: go implicit).
10. sin' = cos (see section 2.4 for details). Compute arcsin' (Hint: go implicit,
starting from sin(arcsin(x)) = x and use sin^2 + cos^2 = 1).
11. Differentiate everything that moves to get more practice.
12. For some f see how q(x, a) = (f(x) − f(a))/(x − a) behaves when x − a gets
small.
13. As in the example ??: more generally, the velocity at time t will be 32t
(exercise)
14. Differentiate x^3, x^5, x^6, x^n, c (= a constant).
15. Differentiate x^{1/3}, x^{1/5}, x^{1/7}, x^{1/n}.
16. Find the slope of the tangent to the unit circle at the point (a, sqrt(1 − a^2)).
Hint: the equation of the unit circle is x^2 + y^2 = 1.
17. Differentiate x^{m/n}. Guess the formula for (x^b)', b real.
18. Give an argument that (f + g)' = f' + g' and for any constant c, (cf)' = cf'.
19. Differentiate (1 + x)^7 and find a neat formula for the answer.
Differentiate the following functions
20. x^4 + 4x^4 − 5x^3 + x + 1
Use the constant multiplier rule, sums rule and the formula dx^n/dx = nx^{n−1}
20x^3 − 15x^2 + 1
21. (x^2 + 3x + 2)^{10}
Use chain rule
10(x^2 + 3x + 2)^9 (2x + 3)
22. [(x^3 + 2x + 1)^6 + (x^5 + x^3 + 2)^5]^{10}
Use chain rule
10[(x^3 + 2x + 1)^6 + (x^5 + x^3 + 2)^5]^9 [6(x^3 + 2x + 1)^5 (3x^2 + 2) + 5(x^5 + x^3 +
2)^4 (5x^4 + 3x^2)]
3.9. END OF CHAPTER PROBLEMS 101
36. ln(x^3 + 3)
Chain rule
(1/(x^3 + 3)) 3x^2
38. e^{10x}
Chain rule
10e^{10x}
41. Find a solution to the equation y'' = −y such that y(0) = 1 and y'(0) = 2.
Find a multiple of sin plus a multiple of cos which satisfies the extra conditions
cos(x) + 2 sin(x)
Find the following integrals.
42. ∫ e^{5x} dx
U-subst
(e^{5x}/5) + C
43. ∫ x e^{−x^2} dx
U-subst
∫ x e^{−x^2} dx = −(1/2) ∫ e^{−x^2} d(−x^2) = −e^{−x^2}/2 + C
U-subst
∫ sin(x^2) d(x^2) = −cos(x^2) + C
45. ∫ dx/(x + 1)
U-subst
ln|1 + x| + C
46. ∫ x^2 dx/(x^3 + 3)
U-subst
(1/3) ∫ d(x^3 + 3)/(x^3 + 3) = (1/3) ln|x^3 + 3| + C
47. ∫ x^3 e^x dx
48. ∫ e^{x + e^x} dx
Try U = e^x
∫ e^x e^{e^x} dx = ∫ e^{e^x} (e^x)' dx = ∫ e^{e^x} d(e^x) = e^{e^x} + C
49. ∫ x^2 cos(x) dx
Integrate by parts once more to get the power of x down to 0 (sin = −cos').
50. [Figures rope_sliding (labels y(t), y) and shock_absorber (labels −d/dt(Mv), M, Ma, m, y).]
Chapter 4
Applications of Derivatives
Discussion.
Comment.
Definition 4.1.2.
Rule 4.1.1.
[author=garrett, file =text_files/derivs_and_graphs]
Further, for the kind of functions we’ll deal with here, there is a fairly systematic
way to get all this information: to find the intervals of increase and decrease of a
function f :
Comment.
[author=garrett, file =text_files/derivs_and_graphs]
It is certainly true that there are many possible shortcuts to this procedure, es-
pecially for polynomials of low degree or other rather special functions. However,
if you are able to quickly compute values of (derivatives of!) functions on your
calculator, you may as well use this procedure as any other.
Exactly which auxiliary points we choose does not matter, as long as they fall
in the correct intervals, since we just need a single sample on each interval to find
out whether f 0 is positive or negative there. Usually we pick integers or some other
kind of number to make computation of the derivative there as easy as possible.
It’s important to realize that even if a question does not directly ask for critical
points, and maybe does not ask about intervals either, still it is implicit that we
have to find the critical points and see whether the function is increasing or
decreasing on the intervals between critical points. Examples:
Example 4.1.1.
[author=garrett, file =text_files/derivs_and_graphs]
Find the critical points and intervals on which f(x) = x^2 + 2x + 9 is increasing
and decreasing: Compute f'(x) = 2x + 2. Solve 2x + 2 = 0 to find only one critical
point −1. To the left of −1 let’s use the auxiliary point t_0 = −2 and to the right
use t_1 = 0. Then f'(−2) = −2 < 0, so f is decreasing on the interval (−∞, −1).
And f'(0) = 2 > 0, so f is increasing on the interval (−1, ∞).
Example 4.1.2.
[author=garrett, file =text_files/derivs_and_graphs]
Find the critical points and intervals on which f(x) = x^3 − 12x + 3 is increasing,
decreasing. Compute f'(x) = 3x^2 − 12. Solve 3x^2 − 12 = 0: this simplifies to
x^2 − 4 = 0, so the critical points are ±2. To the left of −2 choose auxiliary point
t_0 = −3, between −2 and +2 choose auxiliary point t_1 = 0, and to the right of
+2 choose t_2 = 3. Plugging in the auxiliary points to the derivative, we find that
f'(−3) = 27 − 12 > 0, so f is increasing on (−∞, −2). Since f'(0) = −12 < 0, f is
decreasing on (−2, +2), and since f'(3) = 27 − 12 > 0, f is increasing on (2, ∞).
Notice too that we don’t really need to know the exact value of the derivative
at the auxiliary points: all we care about is whether the derivative is positive or
negative. The point is that sometimes some tedious computation can be avoided by
stopping as soon as it becomes clear whether the derivative is positive or negative.
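The sign test at auxiliary points is mechanical enough to write down as code. The following is a minimal sketch under the assumption that the critical points have already been found by hand and that one auxiliary point per interval is supplied; the function name `monotonicity` is made up for this example. Only the sign of the derivative is used, as noted above.

```python
def monotonicity(df, critical_points, aux_points):
    """Label each interval between critical points by the sign of df at one auxiliary point."""
    edges = [float("-inf")] + sorted(critical_points) + [float("inf")]
    labels = []
    for left, right, t in zip(edges, edges[1:], aux_points):
        if not (left < t < right):
            raise ValueError("auxiliary point must lie inside its interval")
        labels.append((left, right, "increasing" if df(t) > 0 else "decreasing"))
    return labels

# f(x) = x^3 - 12x + 3, so f'(x) = 3x^2 - 12 with critical points -2 and 2.
result = monotonicity(lambda x: 3 * x**2 - 12, [-2, 2], [-3, 0, 3])
for left, right, label in result:
    print(f"({left}, {right}): {label}")
```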
Exercises
1. Find the critical points and intervals on which f(x) = x^2 + 2x + 9 is increasing,
decreasing.
2. Find the critical points and intervals on which f(x) = 3x^2 − 6x + 7 is
increasing, decreasing.
3. Find the critical points and intervals on which f(x) = x^3 − 12x + 3 is
increasing, decreasing.
4.2. MINIMIZATION AND MAXIMIZATION 109
Theorem 4.2.1.
[author= wikibooks , file =text_files/extreme_values]
The extreme value theorem states that a function f(x), continuous on the closed
interval [a, b], must attain its maximum and minimum value each at least
once. Mathematically, there exist numbers m and M such that m ≤ f(x) ≤ M for all
x in [a, b], and there exist some c and d in [a, b] such that f(c) = m and f(d) = M.
Comment.
Corollary 4.2.1.
[author= wikibooks , file =text_files/extreme_values]
An important result that the extreme value theorem establishes is the following:
suppose that f is differentiable and that f has a local maximum or a local minimum
at x = c. Then f'(c) = 0.
Definition 4.2.2.
Example 4.2.1.
[author=duckworth, file =text_files/max_mins]
Make up graphs showing some of each kind of thing.
Theorem 4.2.2.
[author= duckworth , file =text_files/max_mins]
1. Find f'(x), solve f'(x) = 0 and identify where f'(x) is undefined (i.e. find
the critical numbers).
2. Plug the critical numbers (which you found in step 1) and a and b into f(x).
This makes a list of y-values. The biggest y-value on this list is the absolute
maximum. The smallest y-value on this list is the absolute minimum.
Rule 4.2.2.
• Find f'(x), solve f'(x) = 0 and identify where f'(x) is undefined (i.e. find
the critical numbers).
• Test each critical number (which you found in the previous step) using the first
derivative test or the second derivative test.
Comment.
Definition 4.2.3.
[Figure generic_increasing_function: graph of a generic increasing function.]
If f''(x) > 0 we say that f(x) is concave up. This means that it is curving
more upwards (it does not mean that it is increasing). If f''(x) < 0 we say that
f(x) is concave down. This means that it is curving more downwards.
Comment.
[author=duckworth, file =text_files/max_mins]
Note that concavity is not related to whether or not the graph is increasing or
decreasing. In fact you can have any combination of concavity (up or down) with
increasing or decreasing. This gives four possible pictures which you might want
to keep in mind.
• [Figure increasing_concave_down_graph] f''(x) < 0, f'(x) > 0: concave down, increasing.
• [Figure decreasing_concave_down_graph] f''(x) < 0, f'(x) < 0: concave down, decreasing.
• [Figure increasing_concave_up_graph] f''(x) > 0, f'(x) > 0: concave up, increasing.
• [Figure decreasing_concave_up_graph] f''(x) > 0, f'(x) < 0: concave up, decreasing.
(Note, you can get these pictures from the four quadrants of a circle.)
Rule 4.2.3.
[author=duckworth, file =text_files/max_mins]
First and second derivative tests. You can figure out what the tests should
be, just by looking at pictures of max and mins, and thinking about what the first
or second derivative is doing there.
[Figure smooth_local_max: f' > 0 to the left of the peak, f' < 0 to the right.
A second panel shows a smooth local minimum: f' < 0 to the left of the trough, f' > 0 to the right.
A third panel shows f' > 0 on both sides of the point.]
Comment.
Rule 4.2.4.
1. Solve for when f'(x) = 0 or is undefined. These are the only places where
f'(x) can change signs. (By the Intermediate Value Theorem! Yay! You
thought you could forget about this. Not!)
2. Test a single value of x between each pair of numbers you found in step 1
(including a value to the right of all the numbers and a value to the left of
all the numbers).
Example 4.2.2.
[author=duckworth, file =text_files/max_mins]
Let f (x) = x − 2 sin(x).
Example 4.2.3.
Discussion.
[author=garrett, file =text_files/max_mins]
The fundamental idea which makes calculus useful in understanding problems of
maximizing and minimizing things is that at a peak of the graph of a function, or
at the bottom of a trough, the tangent is horizontal. That is, the derivative f'(x_o)
is 0 at points x_o at which f(x_o) is a maximum or a minimum.
Well, a little sharpening of this is necessary: sometimes for either natural or
artificial reasons the variable x is restricted to some interval [a, b]. In that case, we
can say that the maximum and minimum values of f on the interval [a, b] occur
among the list of critical points and endpoints of the interval.
And, if there are points where f is not differentiable, or is discontinuous, then
these have to be added in, too. But let’s stick with the basic idea, and just ignore
some of these complications.
Rule 4.2.5.
Example 4.2.4.
[author=garrett, file =text_files/max_mins]
Find the minima and maxima of the function f(x) = x^4 − 8x^2 + 5 on the interval
[−1, 3]. First, take the derivative and set it equal to zero to solve for critical points:
this is
4x^3 − 16x = 0
or, more simply, dividing by 4, it is x^3 − 4x = 0. Luckily, we can see how to factor
this: it is
x(x − 2)(x + 2)
So the critical points are −2, 0, +2. Since the interval does not include −2, we
drop it from our list. And we add to the list the endpoints −1, 3. So the list
of numbers to consider as potential spots for minima and maxima is −1, 0, 2, 3.
Plugging these numbers into the function, we get (in that order) −2, 5, −11, 14.
Therefore, the maximum is 14, which occurs at x = 3, and the minimum is −11,
which occurs at x = 2.
Notice that in the previous example the maximum did not occur at a critical
point, but by coincidence did occur at an endpoint.
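The procedure used in this example (collect critical points inside the interval, add the endpoints, compare function values) can be sketched in a few lines. The function name `extrema_on_interval` is an assumption of this example, and the critical points are taken as given rather than solved for.

```python
def extrema_on_interval(f, critical_points, a, b):
    """Return ((x_min, f(x_min)), (x_max, f(x_max))) for f on [a, b].

    Candidates are the endpoints plus the critical points lying strictly inside (a, b).
    """
    candidates = [a, b] + [c for c in critical_points if a < c < b]
    values = {x: f(x) for x in candidates}
    x_min = min(values, key=values.get)
    x_max = max(values, key=values.get)
    return (x_min, values[x_min]), (x_max, values[x_max])

f = lambda x: x**4 - 8 * x**2 + 5
min_pt, max_pt = extrema_on_interval(f, [-2, 0, 2], -1, 3)
print(min_pt, max_pt)
```

The critical point −2 is filtered out automatically because it lies outside [−1, 3].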
Example 4.2.5.
[author=garrett, file =text_files/max_mins]
You have 200 feet of fencing with which you wish to enclose the largest possible
rectangular garden. What is the largest garden you can have?
Let x be the length of the garden, and y the width. Then the area is simply
xy. Since the perimeter is 200, we know that 2x + 2y = 200, which we can solve
to express y as a function of x: we find that y = 100 − x. Now we can rewrite the
area as a function of x alone, which sets us up to execute our procedure:
area = xy = x(100 − x)
The derivative of this function with respect to x is 100 − 2x. Setting this equal to
0 gives the equation
100 − 2x = 0
to solve for critical points: we find just one, namely x = 50.
Now what about endpoints? What is the interval? In this example we must
look at ‘physical’ considerations to figure out what interval x is restricted to.
Certainly a width must be a positive number, so x > 0 and y > 0. Since y = 100−x,
the inequality on y gives another inequality on x, namely that x < 100. So x is in
[0, 100].
When we plug the values 0, 50, 100 into the function x(100 − x), we get 0, 2500, 0,
in that order. So the maximum occurs at x = 50. The corresponding value of y is
100 − 50 = 50, and the maximal possible area is 50 · 50 = 2500.
Definition 4.2.4.
Example 4.2.6.
the direction of the graph changes suddenly at that point, so there is no well-
defined tangent line (and so no derivative) for f at 0. We call the point 0 a
”critical point” of f.
Definition 4.2.5.
Example 4.2.7.
[author=wikibooks, file =text_files/max_mins]
For example, a critical point for the function f(x) = x^2 is 0, since f'(x) = 2x and
f'(0) = 0. In fact, it is the only critical number.
Comment.
[author=wikibooks, file =text_files/max_mins]
Critical numbers are significant because extrema only occur at critical numbers.
However, the converse is not true. An example of this is f(x) = x^3: since
f'(x) = 3x^2 and f'(0) = 0, it has one critical number, 0. However, 0 is not an extremum.
Example 4.2.8.
\[ f(x) = \frac{(x+1)^2}{2x}, \qquad f'(x) = \frac{(x-1)(x+1)}{2x^2} \]
\[ f'(x) = \frac{(x-1)(x+1)}{2x^2} = 0 \]
x − 1 = 0, x = 1;   x + 1 = 0, x = −1
intermediate values and checking them. We now pick intermediate values and test
to see whether they show that the function value indicates an extreme value. Use
convenient values when possible.
x    = −2    −1   −1/2   0     1/2    1   2
f(x) = −1/4  0    −1/4   DNE   2.25   2   2.25
Since f(−1) = 0 is greater than the values around it, f has a local maximum at
x = −1. Also, since f(1) = 2 is lower than the values around it, f has a local minimum
at x = 1. However, since f(0) is undefined, x = 0 is not an extremum.
Exercises
1. Olivia has 200 feet of fencing with which she wishes to enclose the largest
possible rectangular garden. What is the largest garden she can have?
2. Find the minima and maxima of the function f(x) = 3x^4 − 4x^3 + 5 on the
interval [−2, 3].
3. The cost per hour of fuel to run a locomotive is v^2/25 dollars, where v is
speed, and other costs are $100 per hour regardless of speed. What is the
speed that minimizes cost per mile?
Comment.
Rule 4.3.1.
• Between each pair x_i < x_{i+1} of points in the list, choose an auxiliary point
t_{i+1}. Evaluate the derivative f' at all the auxiliary points.
• For each critical point x_i, we have the auxiliary points to each side of it:
t_i < x_i < t_{i+1}. There are four cases, best remembered by drawing a picture:
• if f'(t_i) > 0 and f'(t_{i+1}) < 0 (so f is increasing to the left of x_i and
decreasing to the right of x_i), then f has a local maximum at x_i.
• if f'(t_i) < 0 and f'(t_{i+1}) > 0 (so f is decreasing to the left of x_i and
increasing to the right of x_i), then f has a local minimum at x_i.
4.3. LOCAL MINIMA AND MAXIMA (FIRST DERIVATIVE TEST) 119
• if f'(t_i) < 0 and f'(t_{i+1}) < 0 (so f is decreasing to the left of x_i and also
decreasing to the right of x_i), then f has neither a local maximum nor a local
minimum at x_i.
• if f'(t_i) > 0 and f'(t_{i+1}) > 0 (so f is increasing to the left of x_i and also
increasing to the right of x_i), then f has neither a local maximum nor a local
minimum at x_i.
The endpoints require separate treatment: There is the auxiliary point t_0 just
to the right of the left endpoint a, and the auxiliary point t_n just to the left of the
right endpoint b:
Comment.
Example 4.3.1.
Comment.
Exercises
1. Find all the local (=relative) minima and maxima of the function f(x) =
(x + 1)^3 − 3(x + 1) on the interval [−2, 1].
2. Find the local (=relative) minima and maxima on the interval [−3, 2] of the
function f(x) = (x + 1)^3 − 3(x + 1).
3. Find the local (relative) minima and maxima of the function f(x) = 1 −
12x + x^3 on the interval [−3, 3].
4. Find the local (relative) minima and maxima of the function f(x) = 3x^4 −
8x^3 + 6x^2 + 17 on the interval [−3, 3].
4.4. AN ALGEBRA TRICK 121
\[ f'(x) = k(x-a)^{k-1}(x-b)^{\ell}(x-c)^{m} + (x-a)^{k}\,\ell(x-b)^{\ell-1}(x-c)^{m} + (x-a)^{k}(x-b)^{\ell}\,m(x-c)^{m-1} \]
which we can take out, using the distributive law in reverse: we have
\[ f'(x) = (x-a)^{k-1}(x-b)^{\ell-1}(x-c)^{m-1}\,[\,k(x-b)(x-c) + \ell(x-a)(x-c) + m(x-a)(x-b)\,] \]
The minor miracle is that the big expression inside the square brackets is a mere
quadratic polynomial in x.
Then to determine critical points we have to figure out the roots of the equation
f'(x) = 0: If k − 1 > 0 then x = a is a critical point, if k − 1 ≤ 0 it isn’t. If
ℓ − 1 > 0 then x = b is a critical point, if ℓ − 1 ≤ 0 it isn’t. If m − 1 > 0 then x = c
is a critical point, if m − 1 ≤ 0 it isn’t. And, last but not least, the two roots of
the quadratic equation
\[ k(x-b)(x-c) + \ell(x-a)(x-c) + m(x-a)(x-b) = 0 \]
are critical points.
Example 4.4.1.
[author=garrett, file =text_files/algebra_for_first_deriv_test]
A very simple numerical example: suppose we are to find the critical points of
the function
\[ f(x) = x^{5/2}(x - 1)^{4/3} \]
Implicitly, we have to find the critical points first. We compute the derivative by
using the product rule, the power function rule, and a tiny bit of chain rule:
\[ f'(x) = \frac{5}{2}x^{3/2}(x - 1)^{4/3} + x^{5/2}\,\frac{4}{3}(x - 1)^{1/3} \]
And now solve this for x? It’s not at all a polynomial, and it is a little ugly.
But our algebra trick transforms this issue into something as simple as solving
a linear equation: first figure out the largest power of x that occurs in all the
terms: it is x^{3/2}, since x^{3/2} occurs in the first term and x^{5/2} in the second. The
largest power of x − 1 that occurs in all the terms is (x − 1)^{1/3}, since (x − 1)^{4/3}
occurs in the first, and (x − 1)^{1/3} in the second. Taking these common factors out
(using the distributive law ‘backward’), we rearrange to
\[
\begin{aligned}
f'(x) &= \frac{5}{2}x^{3/2}(x - 1)^{4/3} + x^{5/2}\,\frac{4}{3}(x - 1)^{1/3} \\
&= x^{3/2}(x - 1)^{1/3}\left(\frac{5}{2}(x - 1) + \frac{4}{3}x\right) \\
&= x^{3/2}(x - 1)^{1/3}\left(\frac{5}{2}x - \frac{5}{2} + \frac{4}{3}x\right) \\
&= x^{3/2}(x - 1)^{1/3}\left(\frac{23}{6}x - \frac{5}{2}\right)
\end{aligned}
\]
Now to see when this is 0 is not so hard: first, since the power of x appearing
in front is positive, x = 0 makes this expression 0. Second, since the power of x − 1
appearing in front is positive, if x − 1 = 0 then the whole expression is 0. Third,
and perhaps unexpectedly, from the simplified form of the complicated factor, if
$\frac{23}{6}x - \frac{5}{2} = 0$ then the whole expression is 0, as well. So, altogether, the critical
points would appear to be
\[ x = 0, \ \frac{15}{23}, \ 1 \]
Many people would overlook the critical point $\frac{15}{23}$, which is visible only after the
algebra we did.
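The factoring can be double-checked by comparing the product-rule form of the derivative with the factored form at a few points, and by solving the linear factor exactly. In this sketch the sample points are taken with x > 1, a restriction chosen only so that Python's fractional powers of x − 1 stay real; the function names are assumptions of this example.

```python
from fractions import Fraction

def product_rule_form(x):
    """f'(x) as it comes out of the product rule."""
    return 2.5 * x**1.5 * (x - 1) ** (4 / 3) + x**2.5 * (4 / 3) * (x - 1) ** (1 / 3)

def factored_form(x):
    """f'(x) after pulling out x^(3/2) (x - 1)^(1/3)."""
    return x**1.5 * (x - 1) ** (1 / 3) * ((23 / 6) * x - 2.5)

samples = [1.5, 2.0, 3.0]
max_gap = max(abs(product_rule_form(x) - factored_form(x)) for x in samples)

# Root of the linear factor (23/6) x - 5/2 = 0, computed exactly.
root = Fraction(5, 2) / Fraction(23, 6)
print(max_gap, root)
```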
Exercises
1. Find the critical points and intervals of increase and decrease of f(x) =
x^{10}(x − 1)^{12}.
2. Find the critical points and intervals of increase and decrease of f(x) =
x^{10}(x − 2)^{11}(x + 2)^3.
3. Find the critical points and intervals of increase and decrease of f(x) =
x^{5/3}(x + 1)^{6/5}.
4. Find the critical points and intervals of increase and decrease of f(x) =
x^{1/2}(x + 1)^{4/3}(x − 1)^{−11/3}.
4.5. LINEAR APPROXIMATIONS: APPROXIMATION BY DIFFERENTIALS
Rule 4.5.1.
[author=duckworth, file =text_files/linear_approximation]
Linearization. Let f be a function, fix an x-value x = a and let L(x) be the
tangent line of f (x) at x = a. Then f (x) and L(x) are approximately equal for
those x-values near x = a. In symbols: $f(x) \approx L(x)$ for $x$ near $a$.
Comment.
[author=garrett,author=duckworth, file =text_files/linear_approximation]
We note the following:
• One formula for L(x) is given by $L(x) = f'(a)(x-a) + f(a)$. One really
important thing about this formula is that to make it explicit, we only need
to know two numbers: $f(a)$ and $f'(a)$.
• We will not spend time making precise what “near” means in this definition
or how good the approximation “≈” is. However we note that $f(a) = L(a)$,
so that at x = a the value of L(x) is exactly equal to that of f(x). Also,
$f'(a) = \frac{d}{dx}L(x)\big|_{x=a}$, i.e. the derivative of f at x = a equals the derivative of L(x) at
x = a. So again, at x = a, the slope of L(x) is exactly equal to the slope of
f(x).
Notation.
[author=garrett, file =text_files/linear_approximation]
The approximation statement has many paraphrases in varying choices of symbols,
and a person needs to be able to recognize all of them. For example, one of
the more traditional paraphrases, which introduces some slightly silly but
oh-so-traditional notation, is the following one. We might also say that y is a function
of x given by y = f(x). Let
∆x = small change in x
∆y = f(x + ∆x) − f(x) = corresponding change in y
Then the approximation says
∆y ≈ f'(x) ∆x
Some texts also write
dy = f'(x) dx
dx = ∆x
and call the dx and dy ‘differentials’. And then this whole procedure is ‘approxi-
mation by differentials’. A not particularly enlightening paraphrase, using the
previous notation, is
dy ≈ ∆y
Even though you may see people writing this, don’t do it.
More paraphrases, with varying symbols:
\[ f(x + \delta) \approx f(x) + f'(x)\,\delta \]
\[ f(x + h) \approx f(x) + f'(x)\,h \]
\[ f(x + \Delta x) - f(x) \approx f'(x)\,\Delta x \]
\[ y + \Delta y \approx f(x) + f'(x)\,\Delta x \]
\[ \Delta y \approx f'(x)\,\Delta x \]
Comment.
[author=duckworth, file =text_files/linear_approximation]
Try to keep the following in mind as we do some examples. We will start with
examples that are easy, or historically relevant, but do not show you, a modern
reader, why linearization is a useful thing. These examples include anything where
we have a formula for f (x), and we are trying to approximate f (b) for some number
b. The examples that are useful to us today will come later.
Historically, using linearization to approximate $\sqrt{10}$ would have been a useful
trick for most of the last 1000 years. Today, we (or our machines) can calculate
$\sqrt{10}$; we will do this example just as a means of practicing linearization.
However, linearization is still very useful today. The following applications are
incredibly important and we’ll return to them later in these notes.
2. Suppose that we know only a little bit about a function. For example,
suppose that we know that some moving object has position p(t) satisfying
p(0) = 7, and that the velocity is given by v(t) = (t − 1) cos(t). Can we
approximate p(.5), p(1), etc.?
Let L(t) be the linear approximation of p(t) at t = 0. To write down a
formula for L(t) we only need to know two numbers: p(0) and p'(0). Well,
we were explicitly told that p(0) = 7, and we can find p'(0) = v(0) =
(0 − 1) cos(0) = −1. Therefore L(t) = −t + 7, and so p(.5) ≈ L(.5) = 6.5
and p(1) ≈ L(1) = 6.
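The computation above can be sketched in a few lines of Python. This is only an illustration: the function names `linearization` and `velocity` are made up here, and the only inputs are the two numbers the text says are needed, p(0) = 7 and p'(0) = v(0).

```python
import math

def linearization(f_at_a, fprime_at_a, a):
    """Return L(x) = f'(a) (x - a) + f(a) as a Python function."""
    return lambda x: fprime_at_a * (x - a) + f_at_a

def velocity(t):
    # v(t) = (t - 1) cos(t), as given in the text
    return (t - 1.0) * math.cos(t)

# p(0) = 7 is given; p'(0) = v(0)
L = linearization(7.0, velocity(0.0), 0.0)

if __name__ == "__main__":
    print(L(0.5), L(1.0))
```

Note that the code never needs a formula for p(t) itself, which is the point of the example.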
Example 4.5.1.
∆x = 17 − 16 = 1
and
\[ \sqrt{17} = f(17) \approx f(16) + f'(16)\,\Delta x = \sqrt{16} + \frac{1}{2\sqrt{16}}\cdot 1 = 4 + \frac{1}{8} \]
Similarly, if we wanted to approximate $\sqrt{18}$ ‘by differentials’, we’d again take
$f(x) = \sqrt{x} = x^{1/2}$. Still we imagine that we are doing this ‘by hand’, and then
of course we can ‘easily evaluate’ the function f and its derivative f' at the point
x = 16 which is ‘near’ 18. Thus, here
\[ \Delta x = 18 - 16 = 2 \]
and
\[ \sqrt{18} = f(18) \approx f(16) + f'(16)\,\Delta x = \sqrt{16} + \frac{1}{2\sqrt{16}}\cdot 2 = 4 + \frac{1}{4} \]
Why not use the ‘good’ point 25 as the ‘nearby’ point to find $\sqrt{18}$? Well, in
broad terms, the further away your ‘good’ point is, the worse the approximation
will be. Yes, it is true that we have little idea how good or bad the approximation
is anyway.
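Although the notes do not quantify the error, a machine lets us simply look at it. The following Python sketch (the function name `sqrt_by_differentials` is made up for this illustration) compares the approximations of √17 and √18 from the ‘good’ point 16, and of √18 from the farther ‘good’ point 25, against the library square root.

```python
import math

def sqrt_by_differentials(x, a):
    """Approximate sqrt(x) from the 'good' point a: sqrt(a) + (x - a) / (2 sqrt(a))."""
    return math.sqrt(a) + (x - a) / (2.0 * math.sqrt(a))

if __name__ == "__main__":
    for x, a in ((17, 16), (18, 16), (18, 25)):
        approx = sqrt_by_differentials(x, a)
        # columns: x, good point a, approximation, absolute error
        print(x, a, approx, abs(approx - math.sqrt(x)))
```

In this instance the error from the nearer point 16 comes out smaller than the error from 25, in line with the remark above.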
Comment.
Example 4.5.2.
punching the buttons, and from a contemporary perspective may seem senseless.
Example 4.5.3.
∆x = (x + 2) − x = 2
Example 4.5.4.
∆x = (e + 2) − e = 2
so we have
\[ \ln(e+2) = f(e+2) \approx f(e) + f'(e)\cdot 2 = \ln e + \frac{2}{e} = 1 + \frac{2}{e} \]
since ln e = 1.
Exercises
1. Approximate $\sqrt{101}$ ‘by differentials’ in terms of $\sqrt{100} = 10$.
2. Approximate $\sqrt{x+1}$ ‘by differentials’, in terms of $\sqrt{x}$.
3. Granting that $\frac{d}{dx}\ln x = \frac{1}{x}$, approximate ln(x + 1) ‘by differentials’, in terms
of ln x and x.
4. Granting that $\frac{d}{dx}e^x = e^x$, approximate $e^{x+1}$ in terms of $e^x$.
5. Granting that $\frac{d}{dx}\cos x = -\sin x$, approximate cos(x + 1) in terms of cos x
and sin x.
Example 4.6.1.
\[ y^5 - xy + x^5 = 1 \]
and we are to find some useful expression for dy/dx. Notice that it is not likely
that we’d be able to solve this equation for y as a function of x (nor vice-versa,
either), so our previous methods do not obviously do anything here! But both
sides of that equality are functions of x, and are equal, so their derivatives are
equal, surely. That is,
\[ 5y^4\frac{dy}{dx} - 1\cdot y - x\frac{dy}{dx} + 5x^4 = 0 \]
Here the trick is that we can ‘take the derivative’ without knowing exactly what
y is as a function of x, but just using the rules for differentiation.
Specifically, to take the derivative of the term $y^5$, we view this as a composite
function, obtained by applying the take-the-fifth-power function after applying the
(not clearly known!) function y. Then use the chain rule!
Likewise, to differentiate the term xy, we use the product rule
\[ \frac{d}{dx}(x\cdot y) = \frac{dx}{dx}\cdot y + x\cdot\frac{dy}{dx} = y + x\cdot\frac{dy}{dx} \]
since, after all,
\[ \frac{dx}{dx} = 1 \]
And the term $x^5$ is easy to differentiate, obtaining the $5x^4$. The other side of
the equation, the function ‘1’, is constant, so its derivative is 0. (The fact that this
means that the left-hand side is also constant should not be misused: we need
to use the very non-trivial looking expression we have for that constant function,
there on the left-hand side of that equation!)
Now the amazing part is that this equation can be solved for y', if we tolerate
a formula involving not only x, but also y: first, regroup terms depending on
whether they have a y' or not:
\[ y'(5y^4 - x) = y - 5x^4 \]
4.6. IMPLICIT DIFFERENTIATION 129
Then divide through by $5y^4 - x$ to obtain
\[ y' = \frac{y - 5x^4}{5y^4 - x} \]
Yes, this is not as good as if there were a formula for y' not needing the y. But,
on the other hand, the initial situation we had did not present us with a formula
for y in terms of x, so it was necessary to lower our expectations.
Yes, if we are given a value of x and told to find the corresponding y', it would
be impossible without luck or some additional information. For example, in the
case we just looked at, if we were asked to find y' when x = 1 and y = 1, it’s easy:
just plug these values into the formula for y' in terms of both x and y: when x = 1
and y = 1, the corresponding value of y' is
\[ y' = \frac{1 - 5\cdot 1^4}{5\cdot 1^4 - 1} = -4/4 = -1 \]
If, instead, we were asked to find y and y' when x = 1, not knowing in advance
that y = 1 fits into the equation when x = 1, we’d have to hope for some luck.
First, we’d have to try to solve the original equation for y with x replaced by its
value 1: solve
\[ y^5 - y + 1 = 1 \]
By luck indeed, there is some cancellation, and the equation becomes
\[ y^5 - y = 0 \]
Since $y^5 - y = y(y-1)(y+1)(y^2+1)$, there are actually three real numbers which
work as y for x = 1: the values −1, 0, +1. There is no clear way to see which is ‘best’.
But in any case, any one of these three values could be used as y in substituting
into the formula
\[ y' = \frac{y - 5x^4}{5y^4 - x} \]
we obtained above.
Yes, there are really three solutions, three functions, etc.
Note that we could have used the Intermediate Value Theorem and/or New-
ton’s Method to numerically solve the equation, even without too much luck. In
‘real life’ a person should be prepared to do such things.
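To make the "three solutions" concrete, the following Python sketch evaluates the slope formula at each of the three points with x = 1. It is an illustration only: the function names `on_curve` and `slope` are made up here, and exact fractions are used so that no rounding is involved.

```python
from fractions import Fraction

def on_curve(x, y):
    # The relation y^5 - x y + x^5 = 1 from the example
    return y**5 - x * y + x**5 == 1

def slope(x, y):
    # y' = (y - 5 x^4) / (5 y^4 - x), valid where 5 y^4 - x != 0
    return Fraction(y - 5 * x**4, 5 * y**4 - x)

if __name__ == "__main__":
    for y in (-1, 0, 1):
        # columns: y, whether (1, y) satisfies the relation, slope y' there
        print(y, on_curve(1, y), slope(1, y))
```

Each of the three points lies on the curve, and each gets its own slope, which is why the value of x alone does not determine y'.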
Discussion.
Example 4.6.2.
Comment.
Rule 4.6.1.
At this point we need to go back to the unit triangle. Since y is the angle and
the opposite side is sin(y) (which is equal to x), the adjacent side is cos(y) (which
is equal to $\sqrt{1-x^2}$, by the Pythagorean theorem), and
the hypotenuse is 1. Since we have determined the value of cos(y) from the
unit triangle, we can substitute it back into the above equation and get
Derivative of the Arcsine: $\frac{d}{dx}\arcsin(x) = \frac{1}{\sqrt{1-x^2}}$
Rule 4.6.2.
[author=wikibooks, file =text_files/derivatives_inverse_trig]
We can use an identical procedure for the arccosine and arctangent.
Derivative of the Arccosine: $\frac{d}{dx}\arccos(x) = \frac{-1}{\sqrt{1-x^2}}$
Derivative of the Arctangent: $\frac{d}{dx}\arctan(x) = \frac{1}{1+x^2}$
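One way to gain confidence in these three formulas, namely 1/√(1 − x²) for the arcsine, −1/√(1 − x²) for the arccosine and 1/(1 + x²) for the arctangent, is to compare them with a numerical derivative. The sketch below is not from the notes; the helper `central_difference` and the step size `h` are choices made for this illustration.

```python
import math

def central_difference(f, x, h=1e-6):
    """Numerical derivative estimate (f(x+h) - f(x-h)) / (2h)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

# Each entry pairs a function with its claimed derivative formula
formulas = {
    "arcsin": (math.asin, lambda x: 1.0 / math.sqrt(1.0 - x * x)),
    "arccos": (math.acos, lambda x: -1.0 / math.sqrt(1.0 - x * x)),
    "arctan": (math.atan, lambda x: 1.0 / (1.0 + x * x)),
}

if __name__ == "__main__":
    x = 0.3
    for name, (f, df) in formulas.items():
        print(name, central_difference(f, x), df(x))
```

The two columns should agree to several decimal places for any x strictly between −1 and 1.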
Exercises
1. Suppose that y is a function of x and
$y^5 - xy + x^5 = 1$
$y^3 - xy^2 + x^2y + x^5 = 7$
Find dy/dx at the point x = 1, y = 2. Find $\frac{d^2y}{dx^2}$ at that point.
Example 4.7.1.
[author=garrett, file =text_files/related_rates]
Continuing with the idea of describing a function by a relation, we could have two
unknown functions x and y of t, related by some formula such as
\[ x^2 + y^2 = 25 \]
Discussion.
• You have a formula which has more than one independent variable in it.
Each variable is a letter. You let each letter represent a function of t, and
then you take the derivative with respect to t. For instance, if you have
4.7. RELATED RATES 133
$A = fg$ where f and g are each functions of t, then the product rule says
that $\frac{dA}{dt} = \frac{df}{dt}\,g + f\,\frac{dg}{dt}$.
• Now you look at the information in the problem and plug in numbers for
everything in the formula except one unknown quantity, which you can solve
for.
• We interpret $\frac{df}{dt}$ as the rate of change of f with respect to t. Similarly for
$\frac{dA}{dt}$ and $\frac{dg}{dt}$.
• Often, the hardest part is just figuring out what formula to start with.
Example 4.7.2.
3. Suppose now that you know f = t + 10 and g = cos(t). Find dA dt 0 by taking
Example 4.7.3.
1. Find the formula for $\frac{dV}{dt}$.
2. Suppose you know that the radius of the sphere is 5, and it is increasing at
a rate of 10 m/s. How fast is the volume increasing?
3. Suppose that you know that the radius of the sphere is 10, and that the
volume is decreasing at a rate of $-3\ \mathrm{m}^3/\mathrm{s}$. How fast is the radius decreasing?
Example 4.7.4.
1. Find a formula for the distance D between the cars in terms of x and y.
2. Find a formula for $\frac{dD}{dt}$. (Hint: if you can’t figure out where to put $\frac{dx}{dt}$ and
$\frac{dy}{dt}$ think about where the chain rule says you should put the derivative of
the inside.)
3. Suppose you know that car A is travelling at 60 m/h and is 100 miles from
the starting point. Suppose you know that car B is travelling 30 m/h and
is 50 miles from the starting point. Find how fast the distance between the
cars is increasing.
4. Suppose you know that the distance between the cars is increasing at the
rate of 37 m/h. Suppose you know that car A is 75 miles from the starting
point and going 30 m/h. Suppose you know that car B is 55 miles from the
starting point. How fast is car B going?
Example 4.7.5.
1. Find a formula for $\frac{dV}{dt}$.
2. Suppose $\frac{dV}{dt} = 3$, $r = 2$, $\frac{dr}{dt} = 5$ and $h = 7$. Find $\frac{dh}{dt}$.
3. Suppose you know that the volume of water is 1000, and that the height is
10. Suppose also that you know that the radius is increasing at a rate of
1 and that the volume is increasing at a rate of 5. How fast is the height
increasing?
Discussion.
[author=duckworth, file =text_files/related_rates]
The basic idea here is that we have a formula, and the letters in the formula stand
for some function of t. We can take $\frac{d}{dt}$ of both sides of the formula and treat every
letter as a function of time. Then you plug numbers into every spot except the
one you’re solving for. Then you solve for the unknown.
In general, I emphasize the formula first, and taking the derivative. Afterwards
I go back to the problem and see how to plug the numbers in.
Example 4.7.6.
A ladder is leaning against the wall, and sliding downwards. The ladder is 10 feet
long.
The equation is $x^2 + y^2 = 10^2$. Taking $\frac{d}{dt}$ of both sides gives $2x\frac{dx}{dt} + 2y\frac{dy}{dt} = 0$.
(Note, we need $\frac{dx}{dt}$ because x is some function of t. If we knew the formula for x
we could write the formula for $\frac{dx}{dt}$.)
Now suppose that you know the base is 2 feet from the wall and moving at the
rate of $\frac{1}{4}$ ft/sec. How fast is the top sliding down? We plug these numbers in and
we have $2\cdot 2\cdot\frac{1}{4} + 2y\frac{dy}{dt} = 0$. We need to get rid of y before we can solve for $\frac{dy}{dt}$.
Use the Pythagorean theorem: $2^2 + y^2 = 10^2$ so $y = \sqrt{100 - 4} \approx 9.8$. Then we
have $2\cdot 2\cdot\frac{1}{4} + 2\cdot 9.8\cdot\frac{dy}{dt} = 0$ whence $\frac{dy}{dt} \approx -.05$ ft/sec.
Notice that we took $\frac{d}{dt}$ of both sides first, and then plugged in numbers. In
general, you cannot plug in any numbers before taking $\frac{d}{dt}$ unless they are numbers
which cannot change in the problem.
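The ladder computation can be packaged as a short Python sketch. It assumes the relation x² + y² = 10² for the 10-foot ladder, with x the distance of the base from the wall and y the height of the top; the function name `ladder_top_speed` is made up for this illustration.

```python
import math

def ladder_top_speed(length, x, dx_dt):
    """dy/dt for a ladder of the given length with base at distance x moving at dx_dt.

    From x^2 + y^2 = length^2 we get 2x dx/dt + 2y dy/dt = 0,
    so dy/dt = -x (dx/dt) / y.
    """
    y = math.sqrt(length**2 - x**2)   # height of the top, from the Pythagorean theorem
    return -x * dx_dt / y

if __name__ == "__main__":
    # base 2 ft from the wall, moving at 1/4 ft/sec
    print(ladder_top_speed(10.0, 2.0, 0.25))
```

Note that the derivative is taken symbolically first (inside the docstring's formula), and only then are the numbers substituted, as the text advises.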
Exercises
1. Suppose that x, y are both functions of t, and that $x^2 + y^2 = 25$. Express
$\frac{dx}{dt}$ in terms of x, y, and $\frac{dy}{dt}$. When x = 3 and y = 4 and $\frac{dy}{dt} = 6$, what is $\frac{dx}{dt}$?
Comment.
[author=garrett, file =text_files/intermediate_value_theorem]
The assertion of the Intermediate Value Theorem is something which is probably
‘intuitively obvious’, and is also provably true.
This result has many relatively ‘theoretical’ uses, but for our purposes can be
used to give a crude but simple way to locate the roots of functions. There is a
lot of guessing, or trial-and-error, involved here, but that is fair. Again, in this
situation, it is to our advantage if we are reasonably proficient in using a calculator
to do simple tasks like evaluating polynomials! If this approach to estimating roots
is taken to its logical conclusion, it is called the method of interval bisection, for
a reason we’ll see below. We will not pursue this method very far, because there
are better methods to use once we have invoked this just to get going.
Example 4.8.1.
[author=garrett, file =text_files/intermediate_value_theorem]
For example, we probably don’t know a formula to solve the cubic equation
$x^3 - x + 1 = 0$.
Example 4.8.2.
[author=garrett, file =text_files/intermediate_value_theorem]
If we continue with this method, we can obtain as good an approximation as we
want! But there are faster ways to get a really good approximation, as we’ll see.
Unless a person has an amazing intuition for polynomials, there is really no
way to anticipate what guess is better than any other in getting started.
Invoke the Intermediate Value Theorem to find an interval of length 1 or less
in which there is a root of $x^3 + x + 3 = 0$: Let $f(x) = x^3 + x + 3$. Just guessing, we
compute f(0) = 3 > 0. Realizing that the $x^3$ term probably ‘dominates’ f when
x is large positive or large negative, and since we want to find a point where f is
4.9. NEWTON’S METHOD 137
negative, our next guess will be a ‘large’ negative number: how about −1? Well,
f (−1) = 1 > 0, so evidently −1 is not negative enough. How about −2? Well,
f (−2) = −7 < 0, so we have succeeded. Further, the failed guess −1 actually was
worthwhile, since now we know that f (−2) < 0 and f (−1) > 0. Then, invoking
the Intermediate Value Theorem, there is a root in the interval [−2, −1].
Of course, typically polynomials have several roots, but the number of roots of
a polynomial is never more than its degree. We can use the Intermediate Value
Theorem to get an idea where all of them are.
Invoke the Intermediate Value Theorem to find three different intervals of
length 1 or less in each of which there is a root of $x^3 - 4x + 1 = 0$: let
$f(x) = x^3 - 4x + 1$. First, just
starting anywhere, f(0) = 1 > 0. Next, f(1) = −2 < 0. So, since f(0) > 0 and
f(1) < 0, there is at least one root in [0, 1], by the Intermediate Value Theorem.
Next, f(2) = 1 > 0. So, with some luck here, since f(1) < 0 and f(2) > 0, by the
Intermediate Value Theorem there is a root in [1, 2]. Now if we somehow imagine
that there is a negative root as well, then we try −1: f(−1) = 4 > 0. So we know
nothing about roots in [−1, 0]. But continue: f(−2) = 1 > 0, and still no new
conclusion. Continue: f(−3) = −14 < 0. Aha! So since f(−3) < 0 and f(−2) > 0,
by the Intermediate Value Theorem there is a third root in the interval [−3, −2].
Notice how even the ‘bad’ guesses were not entirely wasted.
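The guess-and-check search for sign changes, and the interval bisection that the earlier comment mentions, can be sketched in Python as follows. This is an illustration under the assumption that f is continuous; the function names `sign_change_intervals` and `bisect` and the tolerance `tol` are choices made here, not something from the notes.

```python
def sign_change_intervals(f, lo, hi):
    """Return unit intervals (n, n+1) with lo <= n < hi on which f changes sign."""
    found = []
    for n in range(lo, hi):
        if f(n) * f(n + 1) < 0:
            found.append((n, n + 1))
    return found

def bisect(f, a, b, tol=1e-6):
    """Interval bisection: assumes f is continuous and f(a), f(b) have opposite signs."""
    fa = f(a)
    while b - a > tol:
        mid = (a + b) / 2.0
        fm = f(mid)
        if fm == 0:
            return mid
        if fa * fm < 0:
            b = mid            # the sign change is in the left half
        else:
            a, fa = mid, fm    # the sign change is in the right half
    return (a + b) / 2.0

if __name__ == "__main__":
    f = lambda x: x**3 - 4 * x + 1
    for a, b in sign_change_intervals(f, -5, 5):
        print((a, b), bisect(f, a, b))
```

Each bisection step halves the interval, so roughly ten steps are needed per three additional decimal places; this is the slowness that motivates faster methods.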
$y = f'(a)(x - a) + f(a)$.
Definition 4.9.1.
$f(x) \cong L(x)$ for x near a.
Example 4.9.1.
[author=duckworth,uses=sin,uses=linear_approximation, file =text_files/newtons_method]
Let f(x) = sin(x). Then the equation of the tangent line at x = 0 is L(x) = x.
Then $\sin(x) \cong x$ for x near 0. If you like, make a table of some values of y1 = sin(x)
and y2 = x for x near 0. (By the way, this explains why $\lim_{x\to 0}\frac{\sin(x)}{x} = 1$.)
Rule 4.9.1.
Example 4.9.2.
Rule 4.9.2.
Program.
To get → you hit the STO→ button (right above ON ). To get End you hit
PRGRM and choose 7. To get Disp you hit PRGRM and choose I/O and then
choose 3. To get y1 you hit VARS , then choose y-vars, then choose 1. To get
nDeriv( you hit MATH , then choose nDeriv( .
Example 4.9.3.
[author=duckworth,uses=e^x,uses=newtons_method, file =text_files/newtons_method]
Using the program, find an approximation of the solution of $x + e^x = 0$. Start
with x = 0 and run five steps. Set y1 = $x + e^x$ and y2 = $1 + e^x$. Run NEWT with an initial guess of
x = 0, and try 5 steps. You should get −.567. If you look at the graph of y1 this
should be very close to the x-intercept.
By the way, if you want to run it again you can just hit enter after you’ve run
the program, but before you hit anything else.
Discussion.
Derivation.
[author=garrett,uses=newtons_method, file =text_files/newtons_method]
Let’s derive the relevant formula: if our blind guess for a root of f is $x_o$, then the
tangent line is
\[ y - f(x_o) = f'(x_o)(x - x_o) \]
‘Sliding down’ the tangent line to hit the x-axis means to find the intersection of
this line with the x-axis: this is where y = 0. Thus, we solve for x in
\[ 0 - f(x_o) = f'(x_o)(x - x_o) \]
to find
\[ x = x_o - \frac{f(x_o)}{f'(x_o)} \]
Well, let’s call this first serious guess $x_1$. Then, repeating this process, the
second serious guess would be
\[ x_2 = x_1 - \frac{f(x_1)}{f'(x_1)} \]
and generally, if we have the nth guess $x_n$ then the (n+1)th guess $x_{n+1}$ is
\[ x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)} \]
OK, that’s the formula for improving our guesses. How do we decide when to
quit? Well, it depends upon how many decimal places of accuracy we want.
Basically, if we want, for example, 3 decimal places accuracy, then
as soon as $x_n$ and $x_{n+1}$ agree to three decimal places, we can presume that those
are the true decimals of the true root of the equation. This will be illustrated in
the examples below.
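The iteration and the stopping rule just described can be sketched in Python as follows. This is a minimal illustration, not the calculator program from the notes: the function name `newton`, the parameters `places` and `max_steps`, and the use of "successive guesses differ by less than 10^(-places)" as a stand-in for "agree to that many decimal places" are all choices made here.

```python
import math

def newton(f, fprime, x0, places=6, max_steps=100):
    """Iterate x_{n+1} = x_n - f(x_n)/f'(x_n).

    Stops when two successive guesses differ by less than 10**(-places),
    or after max_steps iterations. Returns the list of all guesses,
    starting with x0.
    """
    guesses = [x0]
    for _ in range(max_steps):
        x = guesses[-1]
        x_next = x - f(x) / fprime(x)   # raises ZeroDivisionError if fprime(x) == 0
        guesses.append(x_next)
        if abs(x_next - x) < 10.0 ** (-places):
            break
    return guesses

if __name__ == "__main__":
    f = lambda x: x**3 - x + 1
    fprime = lambda x: 3 * x**2 - 1
    for n, x in enumerate(newton(f, fprime, -1.5)):
        print(n, x)
```

Returning the whole list of guesses, rather than only the last one, makes it easy to see how quickly (or slowly) the guesses settle down for a given starting point.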
Comment.
Example 4.9.4.
[author=garrett,uses=newtons_method, file =text_files/newtons_method]
Approximate a root of $x^3 - x + 1 = 0$ using the intermediate value theorem to get
started, and then Newton’s method:
First let’s see what happens if we are a little foolish here, in terms of the ‘blind’
guess we start with. If we ignore the advice about using the intermediate value
theorem to guarantee a root in some known interval, we’ll waste time. Let’s see:
The general formula
\[ x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)} \]
becomes
\[ x_{n+1} = x_n - \frac{x_n^3 - x_n + 1}{3x_n^2 - 1} \]
If we take x1 = 1 as our ‘blind’ guess, then plugging into the formula gives
x2 = 0.5
x3 = 3
x4 = 2.0384615384615383249
This is discouraging, since the numbers are jumping around somewhat. But if we
are stubborn and can compute quickly with a calculator (not by hand!), we’d see
what happens:
x5 = 1.3902821472167361527
x6 = 0.9116118977179270555
x7 = 0.34502849674816926662
x8 = 1.4277507040272707783
x9 = 0.94241791250948314662
x10 = 0.40494935719938018881
x11 = 1.7069046451828553401
x12 = 1.1557563610748160521
x13 = 0.69419181332954971175
x14 = −0.74249429872066285974
x15 = −2.7812959406781381233
x16 = −1.9827252470441485421
x17 = −1.5369273797584126484
x18 = −1.3572624831877750928
x19 = −1.3256630944288703144
x20 = −1.324718788615257159
x21 = −1.3247179572453899876
Well, after quite a few iterations of ‘sliding down the tangent’, the last two
numbers we got, x20 and x21 , agree to 5 decimal places. This would make us think
that the true root is approximated to five decimal places by −1.32471.
The stupid aspect of this little scenario was that our initial ‘blind’ guess was
too far from an actual root, so that there was some wacky jumping around of the
numbers before things settled down. If we had been computing by hand this would
have been hopeless.
Let’s try this example again using the Intermediate Value Theorem to pin down
a root with some degree of accuracy: First, f (1) = 1 > 0. Then f (0) = +1 > 0
also, so we might doubt that there is a root in [0, 1]. Continue: f (−1) = 1 > 0
again, so we might doubt that there is a root in [−1, 0], either. Continue: at last
f (−2) = −5 < 0, so since f (−1) > 0 by the Intermediate Value Theorem we do
indeed know that there is a root between −2 and −1. Now to start using Newton’s
Method, we would reasonably guess
xo = −1.5
since this is the midpoint of the interval on which we know there is a root. Then
computing by Newton’s method gives:
x1 = −1.3478260869565217295
x2 = −1.3252003989509069104
x3 = −1.324718173999053672
x4 = −1.3247179572447898011
Example 4.9.5.
[author=garrett,uses=newtons_method, file =text_files/newtons_method]
Approximate all three roots of $x^3 - 3x + 1 = 0$ using the intermediate value theorem
to get started, and then Newton’s method. Here you have to take a little care in
choice of beginning ‘guess’ for Newton’s method:
In this case, since we are told that there are three roots, then we should
certainly be wary about where we start: presumably we have to start in different
places in order to successfully use Newton’s method to find the different
roots. So, start thinking in terms of the intermediate value theorem: letting
$f(x) = x^3 - 3x + 1$, we have f(2) = 3 > 0. Next, f(1) = −1 < 0, so by
the Intermediate Value Theorem we know there is a root in [1, 2]. Let’s try to
approximate it pretty well before looking for the other roots: The general formula
for Newton’s method becomes
\[ x_{n+1} = x_n - \frac{x_n^3 - 3x_n + 1}{3x_n^2 - 3} \]
Our initial ‘blind’ guess might reasonably be the midpoint of the interval in which
we know there is a root: take
xo = 1.5
Then we can compute
x1 = 1.533333333333333437
x2 = 1.5320906432748537807
x3 = 1.5320888862414665521
x4 = 1.5320888862379560269
x5 = 1.5320888862379560269
x6 = 1.5320888862379560269
So it appears that we have quickly approximated a root in that interval! To what
looks like 19 decimal places!
Continuing with this example: f(0) = 1 > 0, so since f(1) = −1 < 0 we know by the
intermediate value theorem that there is a root in [0, 1]. So
as our blind guess let’s use the midpoint of this interval to start Newton’s Method:
that is, now take xo = 0.5:
x1 = 0.33333333333333337034
x2 = 0.3472222222222222654
x3 = 0.34729635316386797683
x4 = 0.34729635533386071788
x5 = 0.34729635533386060686
x6 = 0.34729635533386071788
x7 = 0.34729635533386060686
x8 = 0.34729635533386071788
so we have a root evidently approximated to 3 decimal places after just 2 appli-
cations of Newton’s method. After 8 applications, we have apparently 15 correct
decimal places.
Discussion.
Let us try to figure out how fast the approximation improves. We get:
\[ x_{n+1}^2 - a = \frac{(x_n + a/x_n)^2}{4} - a = \frac{x_n^2 + 2a + a^2/x_n^2}{4} - a = \frac{x_n^2 - 2a + a^2/x_n^2}{4} = \frac{(x_n - a/x_n)^2}{4} = \frac{(x_n^2 - a)^2}{4x_n^2} \]
So, roughly speaking, every iteration doubles the number of accurate decimal
places in the approximation if the approximation is good enough to begin with. If
the approximation is not good, then, starting with the second iteration, we will
get twice as close to the solution every time we turn the crank.
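The doubling of accurate digits is easy to watch numerically. The Python sketch below assumes the averaging step x_{n+1} = (x_n + a/x_n)/2, which is the formula implied by the computation of x_{n+1}² − a above; the function name `babylonian_sqrt_steps` is made up for this illustration.

```python
import math

def babylonian_sqrt_steps(a, x0, steps):
    """Return the guesses x0, x1, ..., where x_{n+1} = (x_n + a / x_n) / 2."""
    guesses = [x0]
    for _ in range(steps):
        x = guesses[-1]
        guesses.append((x + a / x) / 2.0)
    return guesses

if __name__ == "__main__":
    a = 2.0
    for n, x in enumerate(babylonian_sqrt_steps(a, 1.0, 5)):
        # columns: step, guess, size of x^2 - a
        print(n, x, abs(x * x - a))
```

Printing |x² − a| at each step shows the error roughly squaring from one step to the next, until it reaches the limits of floating-point precision.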
This trick was already known to the Babylonians about 4000 years ago (see pp.
21-23 in Analysis by Its History). By looking at it from a more modern perspective
we will arrive at Newton’s method. Here is how.
Derivation.
\[ f(x) = 0 \tag{4.1} \]
where f is ULD. For x close to $x_n$, f(x) is well approximated by $f(x_n) + f'(x_n)(x - x_n)$,
so we may hope that the solution to the approximate equation
\[ f(x_n) + f'(x_n)(x - x_n) = 0 \tag{4.2} \]
will be a good approximation to the solution of our original equation. But the
approximate equation is easy to solve because it is linear. Its solution is
\[ x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)} \tag{4.3} \]
[Figure parab_newtons_method: Newton’s method applied to the parabola $y = x^2 - a$, with $x_{n-1}$ and $x_n$ marked on the x-axis]
Discussion.
[author=livshits, file =text_files/newtons_method]
Now we want to show that Newton’s method always works for a ULD f that
changes sign and has a positive and increasing derivative.
Assume that we start with the original guess x0 , then calculate x1 using 4.3
with n = 0, then by taking n = 1 in 4.3 we get x2 , then x3 by taking n = 2 and
so on. Notice that f (x1 ) ≥ 0 no matter what x0 is.
[Figure generic_newtons_method: graph of y = f(x), with b, $x_0$, $x_1$, $x_{n-1}$, $x_n$, $x_{n+1}$ and the point $x_n - f(x_n)/f'(b)$ marked on the x-axis]
We can see next that for n ≥ 1 we will have $x_{n+1} \le x_n$, so the sequence
$x_1, x_2, \dots, x_n, \dots$ will be decreasing.
On the other hand, there is b such that f(b) < 0 (we assumed that f changes
sign), therefore, since f is increasing (because f' > 0), we can conclude that
$b < x_n$. It follows that for any given t > 0 there will be m such that $x_m - x_{m+1} < t$
(otherwise $b < x_n$ will break), whence we will have $f(x_m) = (x_m - x_{m+1})f'(x_m) < t f'(x_m) \le t f'(x_1)$,
and for any n > m it will be $0 \le f(x_n) \le f(x_m) \le t f'(x_1)$.
Now we can take t small enough for the fast convergence mentioned in exercise 3
to kick in and demonstrate that Newton’s method works. Here are some details.
By taking $a = x_n$ and $x = x_{n+1}$ in estimate ?? from section 2.4, and taking into
account the equation 4.2 and the formula 4.3, we get
\[ |f(x_{n+1})| \le K(x_n - x_{n+1})^2 = K\left(\frac{f(x_n)}{f'(x_n)}\right)^2 < \frac{K}{f'(b)^2}\, f(x_n)^2 = M f(x_n)^2, \]
where $M = K/f'(b)^2$ is a (positive) constant. Now, if $M < 10^k$ and $|f(x_n)| < 10^{-l}$,
then $|f(x_{n+1})| < 10^{k-2l}$. To estimate how well $x_n$ approximates the true solution
we notice that $f(x_n - f(x_n)/f'(b)) \le 0$, while $f(x_n) \ge 0$ (for n > 0), therefore
the true solution will be between $x_n - f(x_n)/f'(b)$ and $x_n$, and will be not farther
than $f(x_n)/f'(b)$ from $x_n$.
Comment.
1. As you may have noticed, all the action took place on the segment $[b, x_1]$, so
we can assume that the constant K that appeared in our finite analysis of
approximation is good only for this segment.
2. We assumed that the true solution to the equation (there can be only one) was
between $x_n - f(x_n)/f'(b)$ and $x_n$ without justifying that assumption. It is
clear that the solution cannot be anywhere outside of $[x_n - f(x_n)/f'(b), x_n]$,
but we haven’t shown that it exists. To do it requires some properties of
the real numbers that we will discuss later. For now we can be content that
Newton’s method allows us to get an approximate solution of as high quality
as we want, and rather quickly at the final stage of the computation.
3. The whole argument was a bit heavy; it can be made more elegant by using
convergence of sequences, a powerful tool that we will learn about later.
Exercises
1. Approximate a root of $x^3 - x + 1 = 0$ using the intermediate value theorem
to get started, and then Newton’s method.
2. Approximate a root of $3x^4 - 16x^3 + 18x^2 + 1 = 0$ using the intermediate value
theorem to get started, and then Newton’s method. You might have to be
sure to get sufficiently close to a root to start so that things don’t ‘blow up’.
3. Approximate all three roots of $x^3 - 3x + 1 = 0$ using the intermediate value
theorem to get started, and then Newton’s method. Here you have to take
a little care in choice of beginning ‘guess’ for Newton’s method.
4. Approximate the unique positive root of cos x = x.
5. Approximate a root of $e^x = 2x$.
6. Approximate a root of sin x = ln x. Watch out.
7. Try to prove that the algorithm given in the text gives better and better
approximations of the square root of a number. Also see what
happens when a = 0; play with a calculator and try to understand what is
going on.
8. Check that if we take $f(x) = x^2 - a$ we will arrive at the same Babylonian
formula that we started with.
9. Investigate how Newton’s iteration will improve the approximate solution,
assuming that f' > c > 0 and the approximation that we start with is
good enough. Do some calculations to get a feel for the performance of the
method. Hint: use the inequality ?? from section 2.4 together with 4.3 to
estimate $f(x_{n+1})$ and then to estimate $|x_{n+1} - x_{n+2}|$ in terms of $|x_n - x_{n+1}|$.
10. Now we want to show that Newton’s method always works for a ULD f that
changes sign and has a positive and increasing derivative.
Assume that we start with the original guess $x_0$, then calculate $x_1$ using 4.3
with n = 0, then by taking n = 1 in 4.3 we get $x_2$, then $x_3$ by taking n = 2
and so on. Notice that $f(x_1) \ge 0$ no matter what $x_0$ is.
Look at the diagram and see why, then prove it.
11. We can see next that for n ≥ 1 we will have $x_{n+1} \le x_n$, so the sequence
$x_1, x_2, \dots, x_n, \dots$ will be decreasing. Prove it.
12. While Newton’s method is really good for making a good approximation
to the solution much better, its performance may become very sluggish if the
original approximation is not good.
Play with the equation $e^x = 2$ to see that.
Rule 4.10.1.
[author=garrett, file =text_files/lhospitals_rule]
Suppose we want to evaluate
\[ \lim_{x\to a}\frac{f(x)}{g(x)} \]
No, this is not the quotient rule. No, it is not so clear why this would help, either,
but we’ll see in examples.
Example 4.10.1.
Example 4.10.2.
[author=garrett, file =text_files/lhospitals_rule]
Find $\lim_{x\to 0} x/(e^{2x} - 1)$: both numerator and denominator go to 0, so we are
entitled to use L’Hospital’s rule:
\[ \lim_{x\to 0}\frac{x}{e^{2x} - 1} = \lim_{x\to 0}\frac{1}{2e^{2x}} \]
In the new expression, the numerator and denominator are both non-zero when
x = 0, so we just plug in 0 to get
\[ \lim_{x\to 0}\frac{x}{e^{2x} - 1} = \lim_{x\to 0}\frac{1}{2e^{2x}} = \frac{1}{2e^{0}} = \frac{1}{2} \]
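A numerical check of this limit is straightforward. The Python sketch below (the function name `ratio` is made up here) evaluates x/(e^{2x} − 1) for small x on both sides of 0. It uses `math.expm1`, which computes e^u − 1 accurately for small u; that is an implementation choice for this illustration, not part of the notes.

```python
import math

def ratio(x):
    # x / (e^(2x) - 1); math.expm1 keeps accuracy for small x
    return x / math.expm1(2.0 * x)

if __name__ == "__main__":
    for x in (0.1, 0.01, 0.001, -0.001):
        print(x, ratio(x))
```

The printed values should drift toward 1/2 as x shrinks, from either side.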
Example 4.10.3.
[author=garrett, file =text_files/lhospitals_rule]
Find $\lim_{x\to 0^+} x\ln x$: The $0^+$ means that we approach 0 from the positive side,
since otherwise we won’t have a real-valued logarithm. This problem illustrates
the possibility as well as necessity of rearranging a limit to make it be a ratio of
things, in order to legitimately apply L’Hospital’s rule. Here, we rearrange to
\[ \lim_{x\to 0^+} x\ln x = \lim_{x\to 0^+}\frac{\ln x}{1/x} \]
In the new expression the top goes to −∞ and the bottom goes to +∞ as x goes
to 0 (from the right). Thus, we are entitled to apply L’Hospital’s rule, obtaining
\[ \lim_{x\to 0^+} x\ln x = \lim_{x\to 0^+}\frac{\ln x}{1/x} = \lim_{x\to 0^+}\frac{1/x}{-1/x^2} \]
Now it is very necessary to rearrange the expression inside the last limit: we have
\[ \lim_{x\to 0^+}\frac{1/x}{-1/x^2} = \lim_{x\to 0^+} -x = 0 \]
Comment.
[author=garrett, file =text_files/lhospitals_rule]
It is often necessary to apply L’Hospital’s rule repeatedly: Let’s find $\lim_{x\to+\infty} x^2/e^x$:
both numerator and denominator go to ∞ as x → +∞, so we are entitled to apply
L’Hospital’s rule, to turn this into
\[ \lim_{x\to+\infty}\frac{2x}{e^x} \]
But still both numerator and denominator go to ∞, so apply L’Hospital’s rule
again: the limit is
\[ \lim_{x\to+\infty}\frac{2}{e^x} = 0 \]
since now the numerator is fixed while the denominator goes to +∞.
Example 4.10.4.
\[ \lim_{x\to 0^+} x^x \]
4.10. L’HOSPITAL’S RULE 149
It is less obvious now, but we can’t just plug in 0 for $x$: on one hand, we are taught
to think that $x^0 = 1$, but also that $0^x = 0$; but then surely $0^0$ can’t be both at
once. And this exponential expression is not a ratio.
The trick here is to take the logarithm:
\[ \ln\left(\lim_{x\to 0^+} x^x\right) = \lim_{x\to 0^+} \ln(x^x) \]
The reason that we are entitled to interchange the logarithm and the limit is that
logarithm is a continuous function (on its domain). Now we use the fact that
$\ln(a^b) = b \ln a$, so the log of the limit is
\[ \lim_{x\to 0^+} x \ln x \]
Aha! The question has been turned into one we already did! But ignoring that,
and repeating ourselves, we’d first rewrite this as a ratio
\[ \lim_{x\to 0^+} x \ln x = \lim_{x\to 0^+} \frac{\ln x}{1/x} \]
and then apply L’Hospital’s rule to get
\[ \lim_{x\to 0^+} \frac{1/x}{-1/x^2} = \lim_{x\to 0^+} -x = 0 \]
But we have to remember that we’ve computed the log of the limit, not the limit.
Therefore, the actual limit is
\[ \lim_{x\to 0^+} x^x = e^0 = 1 \]
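The two-step structure of this example (first the logarithm tends to 0, then the expression itself tends to $e^0 = 1$) can be watched numerically. The following is only a sketch with assumed helper names `log_of_x_to_the_x` and `x_to_the_x`; the sample points are arbitrary.

```python
import math

def log_of_x_to_the_x(x):
    """ln(x^x) = x ln x, defined for x > 0."""
    return x * math.log(x)

def x_to_the_x(x):
    """x^x for x > 0."""
    return x ** x

# As x shrinks toward 0 from the right, x ln x should approach 0
# and x^x should approach e^0 = 1.
for k in range(1, 9):
    x = 10.0 ** (-k)
    print(f"x = {x:.0e}   x ln x = {log_of_x_to_the_x(x):+.3e}   x^x = {x_to_the_x(x):.8f}")
```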
Example 4.10.5.
[author=garrett, file =text_files/lhospitals_rule]
Here is another issue of rearranging to fit into accessible form: Find
\[ \lim_{x\to+\infty} \sqrt{x^2+x+1} - \sqrt{x^2+1} \]
This is not a ratio, but certainly is ‘indeterminate’, since it is the difference of two
expressions both of which go to $+\infty$. To make it into a ratio, we take out the
largest reasonable power of $x$:
\[ \lim_{x\to+\infty} \sqrt{x^2+x+1} - \sqrt{x^2+1} = \lim_{x\to+\infty} x \cdot \left( \sqrt{1 + \frac{1}{x} + \frac{1}{x^2}} - \sqrt{1 + \frac{1}{x^2}} \right) \]
\[ = \lim_{x\to+\infty} \frac{\sqrt{1 + \frac{1}{x} + \frac{1}{x^2}} - \sqrt{1 + \frac{1}{x^2}}}{1/x} \]
The last expression here fits the requirements of the L’Hospital rule, since both
numerator and denominator go to 0. Thus, by invoking L’Hospital’s rule, it becomes
\[ = \lim_{x\to+\infty} \frac{ \frac{1}{2} \cdot \frac{-\frac{1}{x^2} - \frac{2}{x^3}}{\sqrt{1 + \frac{1}{x} + \frac{1}{x^2}}} - \frac{1}{2} \cdot \frac{-\frac{2}{x^3}}{\sqrt{1 + \frac{1}{x^2}}} }{-1/x^2} \]
This is a large but actually tractable expression: multiply top and bottom by
$x^2$, so that it becomes
\[ = \lim_{x\to+\infty} \frac{\frac{1}{2} + \frac{1}{x}}{\sqrt{1 + \frac{1}{x} + \frac{1}{x^2}}} + \frac{-\frac{1}{x}}{\sqrt{1 + \frac{1}{x^2}}} \]
At this point, we can replace every $\frac{1}{x}$ by 0, finding that the limit is equal to
\[ \frac{\frac{1}{2} + 0}{\sqrt{1+0+0}} + \frac{0}{\sqrt{1+0}} = \frac{1}{2} \]
Exercises
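1. Find $\lim_{x\to 0} (\sin x)/x$
2. Find $\lim_{x\to 0} (\sin 5x)/x$
3. Find $\lim_{x\to 0} (\sin(x^2))/x^2$
4. Find $\lim_{x\to 0} x/(e^{2x}-1)$
5. Find $\lim_{x\to 0} x \ln x$
6. Find $\lim_{x\to 0^+} (e^x - 1) \ln x$
7. Find $\lim_{x\to 1} \frac{\ln x}{x-1}$
8. Find $\lim_{x\to+\infty} \frac{\ln x}{x}$
9. Find $\lim_{x\to+\infty} \frac{\ln x}{x^2}$
10. Find $\lim_{x\to 0} (\sin x)^x$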
4.11. EXPONENTIAL GROWTH AND DECAY: A DIFFERENTIAL EQUATION151
Example 4.11.1.
\[ y_1 = c e^{k t_1} \]
\[ y_2 = c e^{k t_2} \]
Even though we certainly do have ‘two equations and two unknowns’, these equations
involve the unknown constants in a manner we may not be used to. But it’s
still not so hard to solve for $c, k$: dividing the first equation by the second and
using properties of the exponential function, the $c$ on the right side cancels, and
we get
\[ \frac{y_1}{y_2} = e^{k(t_1 - t_2)} \]
Taking a logarithm (base $e$, of course) we get
\[ \ln y_1 - \ln y_2 = k(t_1 - t_2) \]
Dividing by $t_1 - t_2$, this is
\[ k = \frac{\ln y_1 - \ln y_2}{t_1 - t_2} \]
Substituting back in order to find $c$, we first have
\[ y_1 = c\, e^{\frac{\ln y_1 - \ln y_2}{t_1 - t_2} t_1} \]
Taking the logarithm of both sides, this is
\[ \ln y_1 = \ln c + \frac{\ln y_1 - \ln y_2}{t_1 - t_2} t_1 \]
Rearranging, this is
\[ \ln c = \ln y_1 - \frac{\ln y_1 - \ln y_2}{t_1 - t_2} t_1 = \frac{t_1 \ln y_2 - t_2 \ln y_1}{t_1 - t_2} \]
Therefore, in summary, the two equations
\[ y_1 = c e^{k t_1} \]
\[ y_2 = c e^{k t_2} \]
allow us to solve for $c, k$, giving
\[ k = \frac{\ln y_1 - \ln y_2}{t_1 - t_2} \]
\[ c = e^{\frac{t_1 \ln y_2 - t_2 \ln y_1}{t_1 - t_2}} \]
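The two summary formulas translate directly into a small routine. The sketch below assumes positive values $y_1, y_2$ and distinct times $t_1 \neq t_2$; the function name `fit_exponential` and the sample data are hypothetical, chosen only so that the answer is easy to recognize.

```python
import math

def fit_exponential(t1, y1, t2, y2):
    """Return (c, k) such that y1 = c*e^(k*t1) and y2 = c*e^(k*t2).

    Assumes t1 != t2 and y1, y2 > 0, as in the derivation above.
    """
    k = (math.log(y1) - math.log(y2)) / (t1 - t2)
    ln_c = (t1 * math.log(y2) - t2 * math.log(y1)) / (t1 - t2)
    return math.exp(ln_c), k

# Hypothetical data taken from y = 2 * 3^t, so we expect c = 2 and k = ln 3.
c, k = fit_exponential(1.0, 6.0, 3.0, 54.0)
print(f"c = {c:.6f}, k = {k:.6f}")
```

Substituting the returned pair back into $c e^{k t}$ at the two given times is a cheap way to confirm that no sign was lost along the way.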
Example 4.11.2.
[author=garrett, file =text_files/expon_growth_diffeq]
A herd of llamas has 1000 llamas in it, and the population is growing exponentially.
At time t = 4 it has 2000 llamas. Write a formula for the number of llamas at
arbitrary time t.
Here there is no direct mention of differential equations, but use of the buzz-
phrase ‘growing exponentially’ must be taken as indicator that we are talking
about the situation
\[ f(t) = c e^{kt} \]
where here f (t) is the number of llamas at time t and c, k are constants to be
determined from the information given in the problem. And the use of language
should probably be taken to mean that at time t = 0 there are 1000 llamas, and at
time t = 4 there are 2000. Then, either repeating the method above or plugging
into the formula derived by the method, we find
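\[ c = \text{value of } f \text{ at } t = 0 = 1000 \]
\[ k = \frac{\ln f(t_1) - \ln f(t_2)}{t_1 - t_2} = \frac{\ln 1000 - \ln 2000}{0 - 4} = \frac{\ln \frac{1000}{2000}}{-4} = \frac{\ln \frac{1}{2}}{-4} = (\ln 2)/4 \]
Therefore,
\[ f(t) = 1000\, e^{\frac{\ln 2}{4} t} = 1000 \cdot 2^{t/4} \]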
This is the desired formula for the number of llamas at arbitrary time t.
Example 4.11.3.
[author=garrett, file =text_files/expon_growth_diffeq]
A colony of bacteria is growing exponentially. At time $t = 0$ it has 10 bacteria in
it, and at time $t = 4$ it has 2000. At what time will it have 100,000 bacteria?
Even though it is not explicitly demanded, we need to find the general formula
for the number $f(t)$ of bacteria at time $t$, set this expression equal to 100,000, and
solve for $t$. Again, we can take a little shortcut here since we know that $c = f(0)$
and we are given that $f(0) = 10$. (This is easier than using the bulkier more
general formula for finding $c$). And use the formula for $k$:
\[ k = \frac{\ln f(t_1) - \ln f(t_2)}{t_1 - t_2} = \frac{\ln 10 - \ln 2{,}000}{0 - 4} = \frac{\ln \frac{10}{2{,}000}}{-4} = \frac{\ln 200}{4} \]
Therefore, we have
\[ f(t) = 10 \cdot e^{\frac{\ln 200}{4} t} = 10 \cdot 200^{t/4} \]
as the general formula. Now we try to solve
\[ 100{,}000 = 10 \cdot e^{\frac{\ln 200}{4} t} \]
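One way to finish is to divide by 10 and take logarithms, which gives $t = \ln(100{,}000/10)/k$ with $k = (\ln 200)/4$. The following sketch carries out that arithmetic; the helper name `time_to_reach` is an assumption made for illustration.

```python
import math

def time_to_reach(c, k, target):
    """Solve target = c * e^(k t) for t by taking logarithms."""
    return math.log(target / c) / k

c = 10.0                 # f(0), as given in the example
k = math.log(200) / 4    # growth constant found above
t_hit = time_to_reach(c, k, 100_000)
print(f"f(t) reaches 100,000 at t = {t_hit:.4f}")
```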
Exercises
1. A herd of llamas is growing exponentially. At time t = 0 it has 1000 llamas
in it, and at time t = 4 it has 2000 llamas. Write a formula for the number
of llamas at arbitrary time t.
2. A herd of elephants is growing exponentially. At time t = 2 it has 1000
elephants in it, and at time t = 4 it has 2000 elephants. Write a formula for
the number of elephants at arbitrary time t.
3. A colony of bacteria is growing exponentially. At time t = 0 it has 10
bacteria in it, and at time t = 4 it has 2000. At what time will it have
100,000 bacteria?
4. A colony of bacteria is growing exponentially. At time t = 2 it has 10
bacteria in it, and at time t = 4 it has 2000. At what time will it have
100,000 bacteria?
4.12. THE SECOND AND HIGHER DERIVATIVES 155
\[ y'' = f''(x) = \frac{d^2}{dx^2} f = \frac{d^2 f}{dx^2} = \frac{d^2 y}{dx^2} \]
The third derivative is
\[ y''' = f'''(x) = \frac{d^3}{dx^3} f = \frac{d^3 f}{dx^3} = \frac{d^3 y}{dx^3} \]
And, generally, we can put on a ‘prime’ for each derivative taken. Or write
\[ \frac{d^n}{dx^n} f = \frac{d^n f}{dx^n} = \frac{d^n y}{dx^n} \]
for the $n$th derivative. There is yet another notation for high order derivatives
where the number of ‘primes’ would become unwieldy:
\[ \frac{d^n f}{dx^n} = f^{(n)}(x) \]
as well.
The geometric interpretation of the higher derivatives is subtler than that of
the first derivative, and we won’t do much in this direction, except for the next
little section.
Exercises
1. Find $f''(x)$ for $f(x) = x^3 - 5x + 1$.
2. Find $f''(x)$ for $f(x) = x^5 - 5x^2 + x - 1$.
3. Find $f''(x)$ for $f(x) = \sqrt{x^2 - x + 1}$.
4. Find $f''(x)$ for $f(x) = \sqrt{x}$.
Rule 4.13.1.
• Compute the second derivative $f''$ of $f$, and solve the equation $f''(x) = 0$ for
$x$ to find all the inflection points, which we list in order as $x_1 < x_2 < \ldots < x_n$.
(Any points of discontinuity, etc., should be added to the list!)
• We need some auxiliary points: To the left of the leftmost inflection point
$x_1$ pick any convenient point $t_0$, between each pair of consecutive inflection
points $x_i, x_{i+1}$ choose any convenient point $t_i$, and to the right of the
rightmost inflection point $x_n$ choose a convenient point $t_n$.
Example 4.13.1.
4.13. INFLECTION POINTS, CONCAVITY UPWARD AND DOWNWARD157
\[ f(x) = 3x^2 - 9x + 6 \]
First, the second derivative is just $f''(x) = 6$. Since this is never zero, there are
no points of inflection. And the value of $f''$ is always 6, so is always $> 0$, so the
curve is entirely concave upward.
Example 4.13.2.
First, the second derivative is $f''(x) = 12x - 24$. Thus, solving $12x - 24 = 0$, there
is just the one inflection point, 2. Choose auxiliary points $t_0 = 0$ to the left of
the inflection point and $t_1 = 3$ to the right of the inflection point. Then
$f''(0) = -24 < 0$, so on $(-\infty, 2)$ the curve is concave downward. And
$f''(3) = 12 > 0$, so on $(2, \infty)$ the curve is concave upward.
Example 4.13.3.
\[ f(x) = x^4 - 24x^2 + 11 \]
the second derivative is $f''(x) = 12x^2 - 48$. Solving the equation $12x^2 - 48 = 0$, we
find inflection points $\pm 2$. Choosing auxiliary points $-3, 0, 3$ placed between and
to the left and right of the inflection points, we evaluate the second derivative:
First, $f''(-3) = 12 \cdot 9 - 48 > 0$, so the curve is concave upward on $(-\infty, -2)$.
Second, $f''(0) = -48 < 0$, so the curve is concave downward on $(-2, 2)$. Third,
$f''(3) = 12 \cdot 9 - 48 > 0$, so the curve is concave upward on $(2, \infty)$.
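The bookkeeping in this example (list the inflection candidates, pick one auxiliary point per interval, read off the sign of the second derivative) can be sketched in a few lines. The helper name `concavity_report` is an assumption, and the second derivative is typed in by hand rather than computed symbolically.

```python
def second_derivative(x):
    """f''(x) = 12x^2 - 48 for f(x) = x^4 - 24x^2 + 11."""
    return 12 * x**2 - 48

def concavity_report(f2, candidates, auxiliary_points):
    """Label each interval between consecutive candidates by the sign of f2 at its auxiliary point."""
    edges = [float("-inf")] + sorted(candidates) + [float("inf")]
    report = []
    for left, right, t in zip(edges, edges[1:], auxiliary_points):
        if not left < t < right:
            raise ValueError(f"auxiliary point {t} is not inside ({left}, {right})")
        value = f2(t)
        if value > 0:
            label = "concave upward"
        elif value < 0:
            label = "concave downward"
        else:
            label = "undetermined"
        report.append(((left, right), label))
    return report

# Inflection candidates -2 and 2, auxiliary points -3, 0, 3 as in the example.
report = concavity_report(second_derivative, [-2, 2], [-3, 0, 3])
for (left, right), label in report:
    print(f"({left}, {right}): {label}")
```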
Exercises
1. Find the inflection points and intervals of concavity up and down of
$f(x) = 3x^2 - 9x + 6$.
2. Find the inflection points and intervals of concavity up and down of
$f(x) = 2x^3 - 12x^2 + 4x - 27$.
3. Find the inflection points and intervals of concavity up and down of
$f(x) = x^4 - 2x^2 + 11$.
Derivation.
Exercises
4.14. ANOTHER DIFFERENTIAL EQUATION: PROJECTILE MOTION 159
1. You drop a rock down a deep well, and it takes 10 seconds to hit the bottom.
How deep is it?
2. You drop a rock down a well, and the rock is going 32 feet per second when
it hits bottom. How deep is the well?
3. If I throw a ball straight up and it takes 12 seconds for it to go up and come
down, how high did it go?
Definition 4.15.1.
Example 4.15.1.
Definition 4.15.2.
[author=garrett, file =text_files/graphing_with_calculus]
A horizontal asymptote of the graph of a function $f$ occurs if either limit
\[ \lim_{x\to+\infty} f(x) \]
or
\[ \lim_{x\to-\infty} f(x) \]
exists.
Example 4.15.2.
[author=garrett, file =text_files/graphing_with_calculus]
Find asymptotes, critical points, intervals of increase and decrease, inflection
points, and intervals of concavity up and down of $f(x) = \frac{x+3}{2x-6}$: First, let’s find
the asymptotes. The denominator is 0 for $x = 3$ (and this is not cancelled by the
numerator) so the line $x = 3$ is a vertical asymptote. And as $x$ goes to $\pm\infty$, the
function values go to $1/2$, so the line $y = 1/2$ is a horizontal asymptote.
The derivative is
\[ f'(x) = \frac{1 \cdot (2x-6) - (x+3) \cdot 2}{(2x-6)^2} = \frac{-12}{(2x-6)^2} \]
Since a ratio of polynomials can be zero only if the numerator is zero, this $f'(x)$ can
never be zero, so there are no critical points. There is, however, the discontinuity
at $x = 3$ which we must take into account. Choose auxiliary points 0 and 4 to
the left and right of the discontinuity. Plugging in to the derivative, we have
$f'(0) = -12/(-6)^2 < 0$, so the function is decreasing on the interval $(-\infty, 3)$. To
the right, $f'(4) = -12/(8-6)^2 < 0$, so the function is also decreasing on $(3, +\infty)$.
The second derivative is $f''(x) = 48/(2x-6)^3$. This is never zero, so there
are no inflection points. There is the discontinuity at $x = 3$, however. Again
choosing auxiliary points 0, 4 to the left and right of the discontinuity, we see
$f''(0) = 48/(-6)^3 < 0$ so the curve is concave downward on the interval $(-\infty, 3)$.
And $f''(4) = 48/(8-6)^3 > 0$, so the curve is concave upward on $(3, +\infty)$.
Plugging in just two or so values into the function then is enough to enable a
person to make a fairly good qualitative sketch of the graph of the function.
Exercises
1. Find all asymptotes of $f(x) = \frac{x-1}{x+2}$.
2. Find all asymptotes of $f(x) = \frac{x+2}{x-1}$.
3. Find all asymptotes of $f(x) = \frac{x^2-1}{x^2-4}$.
4. Find all asymptotes of $f(x) = \frac{x^2-1}{x^2+1}$.
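[Figure mean_value_theorem: over the interval from $a$ to $b$, the tangent line at $c$, with slope $m = f'(c)$, and the secant line, with slope $m = \frac{f(b)-f(a)}{b-a}$, have equal slopes.]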
Comment.
Example 4.16.1.
Comment.
if we know something about $f'(c)$, then we can say something about $f(b) - f(a)$.
This is incredibly important. It gives us a formula for how the derivative affects
what we know about $f(x)$.
Before we had this equation, if I told you that $f'(x)$ is always $\geq 1$, then all
you would have been able to conclude is that $f(b)$ is always $\geq f(a)$ (since $f(x)$ is
increasing). Now, I can tell you exactly how much bigger $f(b)$ has to be.
Example 4.16.2.
\[ \frac{f(b) - f(a)}{b - a} = f'(x) \geq 1 \]
we drop the middle term “$f'(x)$”, and multiply by $b - a$ to get:
\[ f(b) - f(a) \geq b - a \]
whence $f(b) \geq f(a) + b - a$. Thus, for $b \geq 4$ we can say that $f(b) \geq 7 + b - 4 = b + 3$.
Comment.
[author=duckworth, file =text_files/mean_value_theorem]
In the previous example we used the MVT to take information about $f'(x)$ and
turn it into very specific, quantitative information about $f(x)$. This idea will be
crucial when we prove the Fundamental Theorem of Calculus. In fact, you can
already imagine how the proof will go, in heuristic terms. In the previous example
I used one piece of information about the derivative, namely that it was bigger
than 1, to tell us one piece of information about $f(x)$ (when $x \geq 4$), namely that
$f(x)$ was bigger than $x + 3$. Now, suppose I told you exactly what the derivative
was at a whole bunch of points. Then you should be able to say more precisely
what $f(x)$ is. If I told you what $f'(x)$ is at every point, then you should be able
to say what $f(x)$ is at every point.
Chapter 5
Integration
Discussion.
then
\[ \int f(x)\,dx = F(x) + C \]
The extra C, called the constant of integration, is really necessary, since after
all differentiation kills off constants, which is why integration and differentiation
are not exactly inverse operations of each other.
Rules 5.1.1.
[author=garrett, file =text_files/integration_basics]
Since integration is almost the inverse operation of differentiation, recollection
of formulas and processes for differentiation already tells the most important
Rule 5.1.2.
Example 5.1.1.
[author=garrett, file =text_files/integration_basics]
For example, it is easy to integrate polynomials, even including terms like $\sqrt{x}$ and
more general power functions. The only thing to watch out for is terms $x^{-1} = \frac{1}{x}$,
since these integrate to $\ln x$ instead of a power of $x$. So
Rule 5.1.3.
Example 5.1.2.
[author=garrett, file =text_files/integration_basics]
Sums of constant multiples of all these functions are easy to integrate: for example,
\[ \int 5 \cdot 2^x - \frac{23}{x\sqrt{x^2-1}} + 5x^2 \; dx = \frac{5 \cdot 2^x}{\ln 2} - 23 \operatorname{arcsec} x + \frac{5x^3}{3} + C \]
Discussion.
Discussion.
Now we make the time chunks smaller and smaller until we have approximated
the smooth curve.
Since we have a new function we need a new set of symbols to represent it, this
looks like
Discussion.
Example 5.1.3.
[author=duckworth, file =text_files/integration_basics]
Let’s find an anti-derivative by guessing, or “thinking backwards”. Let $f(x) = x^2$.
Can we guess what function we would take the derivative of to get $x^2$?
Well, at least in your head, check the derivative of a bunch of our basic functions.
You should quickly decide that we won’t get $x^2$ by taking the derivative of
$e^x$, $\ln(x)$, $\sin(x)$, etc. We need to take the derivative of a power of $x$ in order to
get $x^2$.
In fact, we will have to take the derivative of something of the form $x^3$ in
order to get $x^2$. So let’s check: how close to the right answer is $F(x) = x^3$? Well,
$\frac{d}{dx} F(x) = \frac{d}{dx} x^3 = 3x^2$. We’re trying to get $x^2$, not $3x^2$, so we need to change our
guess for $F(x)$ a little bit. We want to cancel the 3. A little more thought leads
to our next guess of $F(x) = \frac{x^3}{3}$. Now it’s easy to check that
$\frac{d}{dx} F(x) = \frac{d}{dx} \frac{x^3}{3} = \frac{3x^2}{3} = x^2$.
So, we’ve done it, $F(x) = \frac{x^3}{3}$ is an anti-derivative of $x^2$.
Is $F(x)$ the only solution of this problem? Well, if you go back through our
thought process above you can see that no other power of $x$ will work, and the
coefficient has to be $\frac{1}{3}$. Well, you can change $F(x)$ by adding something whose
derivative will be 0. Thus, $F(x) = \frac{x^3}{3} + 12$, or $F(x) = \frac{x^3}{3} - 13427$, are also
anti-derivatives.
In general, every function of the form $F(x) = \frac{x^3}{3} + C$ is an anti-derivative of
$x^2$.
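A guessed anti-derivative can always be tested by differentiating it. The sketch below does this numerically with a central difference, which is an approximation chosen here for convenience and not part of the text’s method; the names `derivative`, `candidates` and `max_error` are assumptions.

```python
def derivative(F, x, h=1e-5):
    """Central-difference approximation to F'(x)."""
    return (F(x + h) - F(x - h)) / (2 * h)

def f(x):
    return x**2

# Candidate anti-derivatives x^3/3 + C for the constants used in the example.
candidates = {C: (lambda x, C=C: x**3 / 3 + C) for C in (0, 12, -13427)}

sample_points = [-2.0, 0.5, 3.0]
max_error = max(
    abs(derivative(F, x) - f(x))
    for F in candidates.values()
    for x in sample_points
)
print(f"largest mismatch between F'(x) and x^2: {max_error:.2e}")
```

Every choice of the constant gives the same derivative up to rounding, which is the numerical face of the “+ C”.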
Example 5.1.4.
fact that differentiating $2x^3$ yields $6x^2$, but this is not the only solution: we also
have $2x^3 + 1$, $2x^3 + 2$, even $2x^3 - 98999$ giving us this same solution! Constants
"disappear" on differentiation, so we generally write the integral of a function
with an arbitrary constant added to the end to show all the possible solutions. So
we write the full equation as $\int 6x^2 \, dx = 2x^3 + C$.
The method we have described is terribly ad-hoc, but we will be able to generalize
it, and obtain the polynomial formula in the next section.
Rule 5.1.4.
Comment.
Example 5.1.5.
Derivation.
In this section we will concern ourselves with determining the integrals of other
functions, such as sin(x), cos(x), tan(x), and others.
Recall the following:
\[ D \sin(x) = \cos(x), \quad D \cos(x) = -\sin(x), \quad D \tan(x) = (\sec(x))^2 \]
and given the above rule that if $Df(x) = g(x)$ then $\int g(x)\,dx = f(x) + C$,
we instantly have the integrals of $\cos(x)$, $\sin(x)$, and $(\sec x)^2$:
\[ \int \cos(x)\,dx = \sin(x) + C, \quad \int \sin(x)\,dx = -\cos(x) + C, \quad \int (\sec(x))^2\,dx = \tan(x) + C \]
Derivation.
Discussion.
Notation.
[author=livshits, file =text_files/integration_basics]
Meanwhile we will assume that it is true and introduce the notation
\[ \int f(x)\,dx \]
for the set of all the antiderivatives of a given function $f$. This set is also called
the indefinite integral of $f$. Since all the antiderivatives of $f$ are of the form $F + C$
where $F$ is one of them and $C$ is a constant, we can write
\[ \int f(x)\,dx = F(x) + C \]
Example 5.1.6.
[Figure stone_in_freefall: a free fall of a stone, showing the height $y(t)$, the velocity $v(t) = y'(t)$, the acceleration $-g = y''(t)$, and the ground level $y = 0$.]
It follows from Newton’s Second law: $F = ma$, where $m$ is the mass of the
stone, $a = y''$ is its acceleration and $F = -gm$ is the force of gravity. We also
have two additional conditions:
\[ y(0) = y_0 \quad \text{and} \quad y'(0) = v_0. \]
The equation simply says that the acceleration equals $-g$. We can find the velocity
by integrating acceleration and using the initial velocity to get the integration
constant. This gives us
\[ v(t) = y'(t) = v_0 - gt. \]
To find the position we integrate the velocity and use the initial position to figure
out the integration constant. By doing so we get
\[ y(t) = y_0 + v_0 t - gt^2/2. \]
In case of zero initial velocity ($v_0 = 0$) the velocity and the position of the stone
at time $t$ will be:
\[ v(t) = -gt \quad \text{and} \quad y(t) = y_0 - gt^2/2. \]
In particular, it will take $T = \sqrt{2y_0/g}$ seconds for the stone to hit the ground.
At that point its speed (which is the absolute value of the velocity) will be
$|v(T)| = gT = \sqrt{2gy_0}$. While the stone drops, it loses height, but it picks up
speed. However, the energy
\[ E = \frac{1}{2} mv^2 + mgy \]
will stay the same. The energy of the stone consists of 2 parts:
\[ K = \frac{1}{2} mv^2 \]
is called the kinetic energy, it is the energy of motion, it depends only on the
speed, and
\[ P = mgy \]
is called the potential energy, it depends only on the position of the stone.
Conservation of energy is one of the most important principles in physics.
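The claim that $E$ stays the same along the fall can be checked by evaluating it at several instants. This is a sketch under stated assumptions: $g = 9.81$, a mass of 2 and a starting height of 20 are arbitrary illustrative values, and the function names are hypothetical.

```python
import math

G = 9.81  # assumed value of g (m/s^2), for illustration only

def free_fall_state(t, y0, v0=0.0, g=G):
    """Velocity and height from v(t) = v0 - g t and y(t) = y0 + v0 t - g t^2 / 2."""
    v = v0 - g * t
    y = y0 + v0 * t - g * t * t / 2
    return v, y

def energy(m, v, y, g=G):
    """Total energy E = m v^2 / 2 + m g y."""
    return 0.5 * m * v * v + m * g * y

m, y0 = 2.0, 20.0                      # hypothetical mass and starting height
T = math.sqrt(2 * y0 / G)              # time to reach the ground when v0 = 0
energies = [energy(m, *free_fall_state(T * i / 10, y0)) for i in range(11)]
impact_speed = abs(free_fall_state(T, y0)[0])

print(f"energy spread over the fall: {max(energies) - min(energies):.2e}")
print(f"impact speed {impact_speed:.6f} vs sqrt(2 g y0) = {math.sqrt(2 * G * y0):.6f}")
```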
Example 5.1.7.
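[Figure leaky_buckets: a leaky bucket with water level $H(t)$, initial level $H_0$, hole of area $a$ and outflow velocity $v(t)$, shown next to a bucket full of slick rods sliding out at velocity $v(t)$.]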
Let A be the area of the horizontal cross-section of the bucket, a be the area of
the hole and H0 be the original water level in the bucket. Assume that the hole in
the bucket was opened up at time 0, so H(0) = H0 where H(t) is the water level
at time t.
This problem of the detailed description of the flow of water is rather compli-
cated, so we will add some simplifying assumptions to make things manageable.
We first have to figure out how fast the water is squirting out of the hole,
depending on the level of water in the bucket. Let us say it is squirting out at
velocity $v$. If a small mass of water, say $m$, escapes through the hole, the mass of
the water left in the bucket will be reduced by $m$, and the reduction will take place
at the level $H$, so the potential energy of the water will drop by $mgH$. On the
other hand, the kinetic energy of mass $m$ of water moving at velocity $v$ is $mv^2/2$,
and the water that escapes has the potential energy zero because the hole is at
the level zero. From the conservation of energy we must have
\[ \frac{1}{2} mv^2 = mgH, \quad \text{therefore} \quad v(t) = \sqrt{2gH(t)}. \qquad (5.1) \]
In other words, the velocity $v(t)$ at which the water escapes is the same velocity
that it would pick up by a free fall from level $H(t)$ to level 0 where the hole is
(compare to the results from the previous problem). In deriving this formula for
$v(t)$ we neglected a few things, such as the internal friction in water, the change
in the flow pattern inside the bucket and the variations in velocity across the jet
of water squirting out of the hole.
Now, after we get a handle on how fast the water is flowing out, it is easy to
see how fast the water level will drop. Indeed, the rate at which the volume of
the water in the bucket decreases is $-AH'(t)$, and that must be equal to the rate at
which the water passes through the hole, which is $av(t)$, and, using the formula 5.1
for $v(t)$, we get
\[ H'(t) = -\frac{a}{A} \sqrt{2gH(t)}. \qquad (5.2) \]
Dividing both sides by $2\sqrt{H(t)}$ we can rewrite Equation 5.2 as
\[ \left(\sqrt{H(t)}\right)' = -\frac{a}{A} \sqrt{g/2}. \]
Taking into account that $H(0) = H_0$, we get
\[ \sqrt{H(t)} = \sqrt{H_0} - \frac{a}{A}\, t \sqrt{g/2}. \]
Finally, solving the equation $H(T) = 0$ leads to
\[ T = \frac{A}{a} \sqrt{2H_0/g}. \]
Let us take a closer look at this formula and see why it makes sense.
The case $a = A$ corresponds to the bottom of the bucket falling off, so all the
water will be in a free fall. As we know, it will take just $\sqrt{2H_0/g}$ seconds for the
water to drop the distance $H_0$, and that’s exactly what the formula says.
The formula also says that the time it takes the bucket to empty out is proportional
to the cross-section of the bucket and inversely proportional to the size
of the hole, which makes sense.
Now assume that the bucket is slightly inclined and filled with a bunch of
identical well lubricated metal rods, assume that each rod fits into the hole snugly,
so it slides out, as soon as it gets to it (see the picture). It takes $\sqrt{2H_0/g}$ seconds
for each rod to slide out, there are $A/a$ of them that will fit into the bucket, and
we arrive at the same formula for $T$.
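The formula for $T$ can be compared against a direct step-by-step simulation of Equation 5.2. The sketch below uses Euler’s method, which is a numerical technique brought in here for the comparison and not something the text relies on; the bucket dimensions, the step size and the function names are all hypothetical.

```python
import math

def drain_time_formula(A, a, H0, g=9.81):
    """T = (A/a) * sqrt(2 H0 / g), the closed form derived above."""
    return (A / a) * math.sqrt(2 * H0 / g)

def drain_time_euler(A, a, H0, g=9.81, dt=0.01):
    """Step H'(t) = -(a/A) sqrt(2 g H) forward in time until the level reaches 0."""
    H, t = H0, 0.0
    while H > 0:
        H -= dt * (a / A) * math.sqrt(2 * g * H)
        t += dt
    return t

# Hypothetical bucket: cross-section 0.05 m^2, hole 1 cm^2, initial level 0.3 m.
A, a, H0 = 0.05, 1e-4, 0.3
T_formula = drain_time_formula(A, a, H0)
T_euler = drain_time_euler(A, a, H0)
print(f"formula: {T_formula:.2f} s   simulation: {T_euler:.2f} s")
```

The two numbers should agree to within a small fraction of a percent for a step this small; shrinking `dt` further tightens the agreement at the cost of more steps.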
Exercises
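1. $\int 4x^3 - 3\cos x + \frac{7}{x} + 2 \; dx = ?$
2. $\int 3x^2 + e^{2x} - 11 + \cos x \; dx = ?$
3. $\int \sec^2 x \; dx = ?$
4. $\int \frac{7}{1+x^2} \; dx = ?$
5. $\int 16x^7 - \sqrt{x} + \frac{3}{\sqrt{x}} \; dx = ?$
6. $\int 23 \sin x - \frac{2}{\sqrt{1-x^2}} \; dx = ?$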
This diagram shows the logical structure of the heart of Calculus. The main
results are on top, the Fundamental Theorem of Calculus parts I and II. Each
result is only true if the results below it are true, so the whole thing builds up
piece by piece. It’s pretty amazing that you can have something that works and
that has this many logical steps, each one of which could break the whole structure!
If one part of this were really false, there would probably be satellites falling out
of the sky!!
[Diagram: logical structure of the proof of the Fundamental Theorem of Calculus. The boxes, from top to bottom, read:
• Fundamental Theorem of Calculus part II: $\int_a^b f(x)\,dx = F(b) - F(a)$ where $F(x)$ is any anti-derivative
• Fundamental Theorem of Calculus part I: $\frac{d}{dx} \int_0^x f(s)\,ds = f(x)$
• $f' = g' \Rightarrow f = g + C$
• Squeeze theorem (limits)
• Mean Value Theorem
• Extreme Value Theorem: $f$ has abs. max/min on $[a, b]$
• Fermat’s Theorem: if $x = c$ is local max/min then $f'(c) = 0$]
Proof.
[author=livshits, file =text_files/the_fundamental_theorem]
We have to establish the inequality
\[ |F(c) - F(b) - f(b)(c-b)| \leq K(c-b)^2 \]
but by our integration rules the LHS can be rewritten as
\[ \left| \int_b^c (f(x) - f(b))\,dx \right| \leq \left| \int_b^c |f(x) - f(b)|\,dx \right| \leq \left| \int_b^c L|x-b|\,dx \right| = (L/2)(b-c)^2 \]
Examples 5.3.1.
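\[ \int \cos(ax+b)\,dx = \frac{1}{a} \cdot \sin(ax+b) + C \]
\[ \int e^{ax+b}\,dx = \frac{1}{a} \cdot e^{ax+b} + C \]
\[ \int \sqrt{ax+b}\,dx = \frac{1}{a} \cdot \frac{(ax+b)^{3/2}}{3/2} + C \]
\[ \int \frac{1}{ax+b}\,dx = \frac{1}{a} \cdot \ln(ax+b) + C \]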
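Each of these formulas can be spot-checked by comparing a numerical integral over an interval with the difference $F(\text{hi}) - F(\text{lo})$ of the proposed anti-derivative, using the relation between definite integrals and anti-derivatives stated elsewhere in these notes. The midpoint rule, the constants $a = 3$, $b = 2$ and the interval $[0, 1]$ are all assumptions chosen for this sketch.

```python
import math

def midpoint_integral(f, lo, hi, n=20000):
    """Approximate the integral of f over [lo, hi] with n midpoint rectangles."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))

a, b = 3.0, 2.0        # hypothetical constants; ax + b stays positive on [0, 1]
lo, hi = 0.0, 1.0

# Each entry pairs an integrand with the anti-derivative proposed above (C omitted).
cases = {
    "cos(ax+b)":  (lambda x: math.cos(a * x + b),  lambda x: math.sin(a * x + b) / a),
    "e^(ax+b)":   (lambda x: math.exp(a * x + b),  lambda x: math.exp(a * x + b) / a),
    "sqrt(ax+b)": (lambda x: math.sqrt(a * x + b), lambda x: (a * x + b) ** 1.5 / (1.5 * a)),
    "1/(ax+b)":   (lambda x: 1 / (a * x + b),      lambda x: math.log(a * x + b) / a),
}

errors = {}
for name, (f, F) in cases.items():
    errors[name] = abs(midpoint_integral(f, lo, hi) - (F(hi) - F(lo)))
    print(f"{name:12s} mismatch = {errors[name]:.2e}")
```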
Examples 5.3.2.
Comment.
Rule 5.3.1.
[author=livshits, file =text_files/integration_simple_subst]
\[ \int f(g(x))\,g'(x)\,dx = \int f(g)\,dg \]
Exercises
1. $\int e^{3x+2} \; dx = ?$
2. $\int \cos(2-5x) \; dx = ?$
3. $\int \sqrt{3x-7} \; dx = ?$
4. $\int \sec^2(2x+1) \; dx = ?$
5. $\int (5x^7 + e^{6-2x} + 23 + \frac{2}{x}) \; dx = ?$
6. $\int \cos(7-11x) \; dx = ?$
5.4 Substitutions
Discussion.
[author=garrett, file =text_files/integration_subst]
The chain rule can also be ‘run backward’, and is called change of variables or
substitution or sometimes u-substitution. Some examples of what happens are
straightforward, but others are less obvious. It is at this point that the capacity
to recognize derivatives from past experience becomes very helpful.
Examples 5.4.1.
Notice how for ‘bookkeeping purposes’ we put the $\frac{1}{2}$ into the integral (to
make the constants right there) and put a compensating 2 outside.
4. Since (by the chain rule)
\[ \frac{d}{dx} \sin^7(3x+1) = 7 \cdot \sin^6(3x+1) \cdot \cos(3x+1) \cdot 3 \]
then we have
\[ \int \cos(3x+1) \sin^6(3x+1)\,dx = \frac{1}{21} \int 7 \cdot 3 \cdot \cos(3x+1) \sin^6(3x+1)\,dx = \frac{1}{21} \sin^7(3x+1) + C \]
Exercises
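1. $\int \cos x \sin x \; dx = ?$
2. $\int 2x\, e^{x^2} \; dx = ?$
3. $\int 6x^5 e^{x^6} \; dx = ?$
4. $\int \frac{\cos x}{\sin x} \; dx = ?$
5. $\int \cos x\, e^{\sin x} \; dx = ?$
6. $\int \frac{1}{2\sqrt{x}}\, e^{\sqrt{x}} \; dx = ?$
7. $\int \cos x \sin^5 x \; dx = ?$
8. $\int \sec^2 x \tan^7 x \; dx = ?$
9. $\int (3\cos x + x)\, e^{6 \sin x + x^2} \; dx = ?$
10. $\int e^x \sqrt{e^x + 1} \; dx = ?$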
Examples 5.5.1.
[author=wikibooks, file =text_files/summation_notation]
It’s easiest to learn summation notation by example, so we list a number of exam-
ples now.
1. $\sum_{i=1}^{5} i = 1 + 2 + 3 + 4 + 5$
Here, the dummy variable is $i$, the lower limit of summation is 1, and the
upper limit is 5.
2. $\sum_{j=2}^{7} j^2 = 2^2 + 3^2 + 4^2 + 5^2 + 6^2 + 7^2$
Here, the dummy variable is $j$, the lower limit of summation is 2, and the
upper limit is 7.
3. The name of the dummy variable doesn’t matter. For example, the following
are all the same:
\[ \sum_{i=1}^{4} i = \sum_{j=1}^{4} j = \sum_{\alpha=1}^{4} \alpha = 1 + 2 + 3 + 4. \]
This means we can change the name of the dummy variable whenever we
like. Conventionally we use the letters $i, j, k, m$.
4. Sometimes, you will see summation signs with no dummy variable specified,
e.g.
\[ \sum_{1}^{4} i^3 = 100 \]
In such cases the correct dummy variable should be clear from the context.
5. You may also see cases where the limits are unspecified. Here too, they must
be deduced from the context. For example, later we will always be studying
infinite summations that start at 0 or 1, so
\[ \sum \frac{1}{n} \]
would mean (in that context)
\[ \sum_{n=1}^{\infty} \frac{1}{n} \]
5.5. AREA AND DEFINITE INTEGRALS 181
Examples 5.5.2.
[author=wikibooks, file =text_files/summation_notation]
Here are some common summations, together with a closed form formula for their
sum (note: having a closed form formula is quite rare in general)
1. $\sum_{i=1}^{n} c = c + c + \ldots + c = nc$ (for any real number $c$)
2. $\sum_{i=1}^{n} i = 1 + 2 + 3 + \ldots + n = \frac{n(n+1)}{2}$
3. $\sum_{i=1}^{n} i^2 = 1^2 + 2^2 + 3^2 + \ldots + n^2 = \frac{n(n+1)(2n+1)}{6}$
4. $\sum_{i=1}^{n} i^3 = 1^3 + 2^3 + 3^3 + \ldots + n^3 = \frac{n^2(n+1)^2}{4}$
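These closed forms are easy to test by brute force for small $n$. The sketch below adds the terms one by one and compares with each formula; the helper name `closed_forms_hold` is an assumption, and the constant is taken to be an integer so that every comparison is exact.

```python
def closed_forms_hold(n, c=7):
    """Compare the four sums, added term by term, with their closed forms."""
    indices = range(1, n + 1)
    return (
        sum(c for _ in indices) == n * c
        and sum(indices) == n * (n + 1) // 2
        and sum(i**2 for i in indices) == n * (n + 1) * (2 * n + 1) // 6
        and sum(i**3 for i in indices) == n**2 * (n + 1) ** 2 // 4
    )

# Check every n from 1 to 200; a single failure would make this False.
all_ok = all(closed_forms_hold(n) for n in range(1, 201))
print("all four closed forms agree for n = 1..200:", all_ok)
```

A check like this is evidence, not proof; the formulas themselves are usually proved by induction.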
Notation.
denotes the sum of the values of f(k) for k=1, k=2, etc., up to k=n. For
example,
\[ \sum_{k=1}^{4} 2k = (2 \cdot 1) + (2 \cdot 2) + (2 \cdot 3) + (2 \cdot 4) = 2 + 4 + 6 + 8 = 20 \]
Definition 5.5.1.
Intuitively, this can be thought of as adding the areas of ”bars” in the curve
to obtain an approximation of the area, and it gets more accurate as the number
of bars (n) increases.
Definition 5.5.2.
[author=garrett, file =text_files/area_defn_integrals]
The actual definition of ‘integral’ is as a limit of sums, which might easily be viewed
as having to do with area. One of the original issues integrals were intended to
address was computation of area.
First we need more notation. Suppose that we have a function $f$ whose integral
is another function $F$:
\[ \int f(x)\,dx = F(x) + C \]
The left-hand side of this equality is just notation for the definite integral. The
use of the word ‘limit’ here has little to do with our earlier use of the word, and
means something more like ‘boundary’, just like it does in more ordinary English.
A similar notation is to write
Example 5.5.3.
Comment.
Example 5.5.4.
[author=garrett, file =text_files/area_defn_integrals]
For example, since $y = x^2$ is certainly always positive (or at least non-negative,
which is really enough), the area ‘under the curve’ (and, implicitly, above the
$x$-axis) between $x = 0$ and $x = 1$ is just
\[ \int_0^1 x^2\,dx = \left[\frac{x^3}{3}\right]_0^1 = \frac{1^3 - 0^3}{3} = \frac{1}{3} \]
More generally, the area below $y = f(x)$, above $y = g(x)$, and between $x = a$
and $x = b$ is
\[ \text{area} = \int_a^b f(x) - g(x)\,dx = \int_{\text{left limit}}^{\text{right limit}} (\text{upper curve} - \text{lower curve})\,dx \]
It is important that $f(x) \geq g(x)$ throughout the interval $[a, b]$.
For example, the area below $y = e^x$ and above $y = x$, and between $x = 0$ and
$x = 2$ is
\[ \int_0^2 e^x - x\,dx = \left[e^x - \frac{x^2}{2}\right]_0^2 = (e^2 - 2) - (e^0 - 0) = e^2 - 3 \]
since it really is true that $e^x \geq x$ on the interval $[0, 2]$.
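Since the integral is defined as a limit of sums, the value of the area between $y = e^x$ and $y = x$ on $[0, 2]$ can be approximated by adding up thin rectangles. The sketch below uses midpoint rectangles; the function name `riemann_midpoint` and the number of rectangles are assumptions.

```python
import math

def riemann_midpoint(f, lo, hi, n=20000):
    """Sum of n rectangles of width (hi - lo)/n, each sampled at its midpoint."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))

def gap(x):
    """Upper curve minus lower curve: e^x - x."""
    return math.exp(x) - x

area_numeric = riemann_midpoint(gap, 0.0, 2.0)
area_exact = (math.exp(2) - 2) - (math.exp(0) - 0)   # value of [e^x - x^2/2] from 0 to 2
print(f"rectangles: {area_numeric:.6f}   antiderivative: {area_exact:.6f}")
```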
As a person might be wondering, in general it may be not so easy to tell
whether the graph of one curve is above or below another. The procedure to
examine the situation is as follows: given two functions f, g, to find the intervals
where f (x) ≤ g(x) and vice-versa:
• Find where the graphs cross by solving f (x) = g(x) for x to find the x-coordinates
of the points of intersection.
• Between any two solutions x1 , x2 of f (x) = g(x) (and also to the left and right
of the left-most and right-most solutions!), plug in one auxiliary point of your
choosing to see which function is larger.
Of course, this procedure works for a similar reason that the first derivative
test for local minima and maxima worked: we implicitly assume that the f and
g are continuous, so if the graph of one is above the graph of the other, then the
situation can’t reverse itself without the graphs actually crossing.
Example 5.5.5.
In some cases the ‘side’ boundaries are redundant or only implied. For example,
the question might be to find the area between the curves $y = 2 - x$ and $y = x^2$.
What is implied here is that these two curves themselves enclose one or more finite
pieces of area, without the need of any ‘side’ boundaries of the form $x = a$. First,
we need to see where the two curves intersect, by solving $2 - x = x^2$: the solutions
are $x = -2, 1$. So we infer that we are supposed to find the area from $x = -2$ to
$x = 1$, and that the two curves close up around this chunk of area without any
need of assistance from vertical lines $x = a$. We need to find which curve is higher:
plugging in the point 0 between $-2$ and 1, we see that $y = 2 - x$ is higher. Thus,
the desired integral is
\[ \text{area} = \int_{-2}^{1} (2 - x) - x^2 \, dx \]
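The three steps used here (confirm the crossing points, test one auxiliary point, integrate upper minus lower) can be sketched with exact rational arithmetic. The anti-derivative $2x - x^2/2 - x^3/3$ is worked out by hand for this sketch and is not stated in the text; the function names are likewise assumptions.

```python
from fractions import Fraction

def line(x):
    return 2 - x

def parabola(x):
    return x * x

# Step 1: the crossing points found above, checked by substitution.
left, right = Fraction(-2), Fraction(1)
crossings_ok = line(left) == parabola(left) and line(right) == parabola(right)

# Step 2: one auxiliary point between the crossings decides which curve is higher.
aux = Fraction(0)
line_is_upper = line(aux) > parabola(aux)

# Step 3: integrate (upper - lower) using an anti-derivative of (2 - x) - x^2.
def antiderivative(x):
    return 2 * x - x**2 / 2 - x**3 / 3

area = antiderivative(right) - antiderivative(left)
print("crossings verified:", crossings_ok, "  line on top:", line_is_upper, "  area =", area)
```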
Definition 5.5.3.
V being any primitive of v. Going back to our usual notation and using the rules
of integration for indefinite integrals, we get
\[ \int_a^b f = F(b) - F(a), \quad \text{where $F$ is any primitive of $f$.} \]
Rule 5.5.1.
• Sums Rule:
\[ \int_a^b (f + g) = \int_a^b f + \int_a^b g \]
• Multiplier Rule:
\[ \int_a^b cf = c \int_a^b f, \quad \text{where $c$ is a constant} \]
• Integration by Parts:
\[ \int_a^b f' g = fg\big|_a^b - \int_a^b f g', \]
[Figure integration_by_parts: a plot against the variable $g$, between $g(a)$ and $g(b)$, with the labels $f(x)\,dg$ and $f(a)g(a)$.]
• Change of Variable:
\[ \int_a^b f(g(x))\,g'(x)\,dx = \int_{g(a)}^{g(b)} f(g)\,dg \]
Example 5.5.6.
Discussion.
[author=livshits, file =text_files/area_defn_integrals]
As we saw in section ?? (Theorem 3.3.2), the derivative of a ULD function is
ULC. It is natural to ask whether any ULC function is a derivative of some ULD
function. In this section we will see that it is indeed the case. In other words,
any ULC function has a ULD primitive and it makes sense to talk about definite
and indefinite integrals of any ULC function. We will also take a closer look at
the notion of area and prove the Newton-Leibniz theorem for ULC functions. This
will provide a rigorous foundation for Calculus in the realm of ULC and ULD
functions.
The central idea is to approximate a ULC function $f$ from above by $\overline{f}$ and
from below by $\underline{f}$ with some simple (piecewise-linear) functions that are easy to
integrate. Then, using positivity of definite integral (that is equivalent to IFT) we
can conclude that
\[ \int_a^b \underline{f}(x)\,dx \leq \int_a^b \overline{f}(x)\,dx \]
(we assume that $a < b$), and if we want to keep positivity, we conclude that
\[ \int_a^b \underline{f}(x)\,dx \leq \int_a^b f(x)\,dx \leq \int_a^b \overline{f}(x)\,dx \qquad (5.3) \]
The assumption that $f$ is ULC will allow us to take $\overline{f}$ and $\underline{f}$ as close to each
other as we want, therefore their integrals can be made as close to each other
as we want, and this will define $\int_a^b f(x)\,dx$ uniquely. After this construction is
understood, the Newton-Leibniz theorem becomes an easy check and provides a
construction for a ULD primitive of $f$.
[Figure approx_integral: graphs of $y = \overline{f}(x)$, $y = f(x)$, $y = \tilde{f}(x)$ and $y = \underline{f}(x)$ in the $X$, $Y$ plane, with the points $a$, $b$, $c$ marked on the $X$ axis.]
Proof.
[author=livshits, file =text_files/area_defn_integrals]
So let us assume that f is defined on the segment [a, b] and is ULC, i.e. |f (x) −
f (u)| ≤ L|x − u|. First we introduce a mesh of points a = x0 < x1 < ... <
xn−1 < xn = b such that xk − xk−1 ≤ h. Then we put f (xk ) = f (xk ) − 2Lh and
f (xk ) = f (xk ) + 2Lh for k = 0, ..., n and assume f and f to be linear on each
segment [xk−1 , xk ]. It is easy to check that f (x) ≤ f (x) ≤ f (x) for any x in [a, b].
Also f − f = 4Lh, therefore
Z b Z b
f (x)dx − f (x)dx = 4Lh(b − a) (5.4)
a a
Since the h > 0 is arbitrary, there is at most one real number I such that
Rb Rb
a
f (x)dx ≤ I ≤ a f (x)dx for any piecewise-linear f and f such that f ≤ f ≤ f .
5.5. AREA AND DEFINITE INTEGRALS 187
And there will be such a number because $\underline{f} \le f \le \overline{f}$ implies $\int_a^b \underline{f}(x)\,dx \le \int_a^b \overline{f}(x)\,dx$, so we can define $\int_a^b f(x)\,dx = I$. This works when $a < b$, and we can put $\int_a^a f(x)\,dx = 0$ and $\int_a^b f(x)\,dx = -\int_b^a f(x)\,dx$ when $b < a$.
The piecewise-linear function $\tilde{f}$ such that $\tilde{f}(x_k)$ equals $f(x_k)$ and $\tilde{f}$ is linear on every $[x_{k-1}, x_k]$ approximates $f$ better than $\overline{f}$ or $\underline{f}$ because it sits between them together with $f$, so $\int_a^b \tilde{f}(x)\,dx$ is often used in practical calculations of $\int_a^b f(x)\,dx$. It is called the trapezoid rule because the approximating integral is the sum of the (appropriately signed) areas of a bunch of trapezoids.
In particular, we can conclude from the estimate 5.4 that
$$\left|\int_a^b \tilde{f}(x)\,dx - \int_a^b f(x)\,dx\right| \le 4Lh|b - a|$$
and the previous exercise shows that the factor 4 in the right-hand side can be dropped.
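The estimate can be watched at work numerically. The following is a minimal sketch, not part of the construction: the test function (sin on [0, π], which is ULC with L = 1 and has integral 2) and the helper name `trapezoid` are choices made here for illustration only.

```python
import math

def trapezoid(f, a, b, n):
    """Integral of the piecewise-linear interpolant of f on a uniform mesh of n pieces."""
    h = (b - a) / n
    xs = [a + k * h for k in range(n + 1)]
    # Each mesh segment contributes the (signed) area of one trapezoid.
    return sum(h * (f(xs[k - 1]) + f(xs[k])) / 2 for k in range(1, n + 1))

# Illustrative assumption: f = sin on [0, pi], Lipschitz constant L = 1, exact integral 2.
a, b, L = 0.0, math.pi, 1.0
exact = 2.0

results = []
for n in (4, 16, 64):
    h = (b - a) / n
    approx = trapezoid(math.sin, a, b, n)
    error = abs(approx - exact)
    bound = 4 * L * h * (b - a)  # right-hand side of the estimate in the text
    results.append((n, error, bound))
    print(n, approx, error, bound)
```

In this run the actual error should sit far below the bound, which is expected: the bound uses only the Lipschitz constant, not any smoothness of f.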
Now, using this estimate, it is easy to see that the definite integral that we
have just constructed for ULC functions possesses the positivity and additivity
properties and satisfies the sums and the constant multiple rules from section ??.
It inherits these properties from the approximations, so to speak.
Discussion.
[author=livshits, file =text_files/area_defn_integrals]
For example, to prove positivity, we can observe that from $f \ge 0$ it follows that $\tilde{f} \ge 0$ and therefore $\int_a^b \tilde{f}(x)\,dx \ge 0$ (we assume here that $a \le b$ and we know that positivity holds for the piecewise-linear functions), so we can conclude that $\int_a^b f(x)\,dx \ge -4Lh(b - a)$, and therefore $\int_a^b f(x)\,dx \ge 0$, because we can take $h = (b - a)/n$ with $n$ as large as we like (Archimedes principle again). Additivity and the sums and the constant multiple rules are demonstrated in a similar fashion (exercise).
There is an important and easy consequence of positivity of our newly constructed definite integral that will be handy soon:
$$\left|\int_a^b f(x)\,dx\right| \le \int_a^b |f(x)|\,dx$$
Exercises
1. Find the area between the curves $y = x^2$ and $y = 2x + 3$.
5. It is easy to check that $\underline{f}(x) \le f(x) \le \overline{f}(x)$ for any $x$ in $[a, b]$.
6. For example, to prove positivity, we can observe that from $f \ge 0$ it follows that $\tilde{f} \ge 0$ and therefore $\int_a^b \tilde{f}(x)\,dx \ge 0$ (we assume here that $a \le b$ and we know that positivity holds for the piecewise-linear functions), so we can conclude that $\int_a^b f(x)\,dx \ge -4Lh(b - a)$, and therefore $\int_a^b f(x)\,dx \ge 0$, because we can take $h = (b - a)/n$ (Archimedes principle again). Additivity and the sums and the constant multiple rules are demonstrated in a similar fashion (exercise).
7. It is not too difficult to see that $\overline{f}$ and $\underline{f}$ can be chosen 4 times closer together because already $\overline{f}(x_k) = f(x_k) + Lh/2$ and $\underline{f}(x_k) = f(x_k) - Lh/2$ will guarantee $\underline{f} \le f \le \overline{f}$.
5.6. TRANSCENDENTAL INTEGRATION 189
Discussion.
[Figure area_under_1_over_x: the area under $y = 1/x$ over the marks 1, 2, 3, 6 on the x axis, illustrating A(1, 2·3) = A(1, 6) = A(1, 3) + A(3, 6) = A(1, 2) + A(1, 3), since A(3, 6) = A(1, 2).]
Notice that the formulas will hold for positive a, b, or x less than 1 if we take into
account that ln(x) = −ln(1/x) for 0 < x < 1. These formulas can be extended
even to the negative x as well by replacing ln(x) with ln|x|, ln(a) with ln|a| and
190 CHAPTER 5. INTEGRATION
ln(b) with ln|b|, but should be treated with some caution since ln|x| and 1/x blow
up at 0.
Now we have yet another function that we can differentiate:
$$(\ln|x|)' = 1/x$$
Definition 5.6.1.
[author=livshits,uses=e^x,establishes=deriv_of_e^x, file =text_files/transcendental_
integration]
The base of the natural logarithm is called the Euler number and denoted $e$, so we can write
$$\ln(e^x) = x \quad\text{and}\quad e^{\ln(a)} = a \quad\text{for any } a > 0.$$
Sometimes $e^x$ is written as $\exp(x)$, so
$$\ln(\exp(x)) = x \text{ for any } x \quad\text{and}\quad \exp(\ln(x)) = x \text{ for } x > 0.$$
We can use implicit differentiation to figure out $\frac{d}{dx}\exp(x)$:
$$1 = x' = (\ln(\exp(x)))' = \ln'(\exp(x))\,\exp'(x) = (1/\exp(x))\,\exp'(x),$$
so
$$\exp'(x) = \exp(x).$$
Use $u = x^3$.
$u = x^3$, $du = 3x^2\,dx$, so the integral becomes $\int (u + 10)\,du = u^2/2 + 10u + C$,
$-1/(u^2 + 3) + C$
7. $\int x^2\sqrt{x^3 + 2}\,dx$
$(x^3)' = 3x^2$, use u-substitution.
$u = x^3 + 2$, $du = 3x^2\,dx$, so the integral becomes $\int (1/3)u^{1/2}\,du = (2/9)u^{3/2} + C = (2/9)(x^3 + 2)^{3/2} + C$
8. Water is poured into a conical bucket at a rate of 50 cubic inches per minute. How fast is the water level in the bucket rising at the moment when the area of the water surface is 100 square inches?
Differentiate the formula for the volume of a cone.
The volume of the cone of height $h$ and base area $A$ is $V = Ah/3$; in our problem $A = ah^2$, so $V = ah^3/3$. The time derivative is $V' = ah^2h'$ and finally $h' = V'/(ah^2) = V'/A = 50/100 = 1/2$ inches per minute.
9. A spherical balloon is pumped up at 5 cubic inches per second. How fast is its area growing when its radius is 10 inches?
Differentiate the formula for the volume of a ball.
The volume of the balloon is $V = (4/3)\pi r^3$, its surface area is $A = 4\pi r^2$, so $V' = 4\pi r^2 r'$ and $A' = 8\pi r r' = 2V'/r = 10/10 = 1$ square inch per second.
10. Conservation of energy via chain rule.
(a) Check that the gravity force pulling the stone down is equal to −dP/dy
where P (y) is the potential energy of the stone.
(b) Check that Newton’s Second law can be rewritten as $my'' + dP/dy = 0$.
(c) Use the chain rule to calculate the time derivative $E'$ of the energy and use the equation from (b) to show that $E' = 0$, which implies that $E$ does not change with time, i.e. energy is conserved.
For part (c) note that $(y'^2)' = 2y'y''$.
11. $\int (x^5 + 3x^4 - 7)^{10}(5x^4 + 12x^3)\,dx$
Applications of Integration
Discussion.
Rule 6.1.1.
Example 6.2.1.
[author=duckworth, file =text_files/arc_length]
Find the distance travelled by a ball which has path given by $y = -x^2 + 4$.
I’ll pretend I don’t know how to solve this exactly and do an approximation in
3 steps. Thus ∆x = 4/3 and so I will have points at x equal to −2, −2/3, 2/3, 2.
The y-values corresponding to these x-values are 0, 32/9, 32/9, 0. Between these
points I will use straight lines, thus the distance at each step will be given by the
194 CHAPTER 6. APPLICATIONS OF INTEGRATION
distance formula (i.e. $\sqrt{\Delta x^2 + \Delta y^2}$). So we have:
$$\text{Arc-length} \approx \sqrt{(4/3)^2 + (32/9)^2} + \sqrt{(4/3)^2 + 0^2} + \sqrt{(4/3)^2 + (32/9)^2} = 8.928$$
Now, I want to get an exact answer. That means that I need to figure out how to replace each of those square roots by something of the form $* \cdot \Delta x$. If I can do that then I can integrate $\int_{-2}^{2} *\,dx$. So this is a trick; each of those square roots was of the form $\sqrt{\Delta x^2 + \Delta y^2}$, and if I really want something times $\Delta x$ (which I do) I’ll factor that out to get:
$$\sqrt{1 + \frac{\Delta y^2}{\Delta x^2}} \cdot \Delta x.$$
Thus, what we should integrate is
$$\sqrt{1 + \left(\frac{dy}{dx}\right)^2}\,dx$$
Note that the function is even, so we can integrate from 0 to 2 and multiply the result by 2. Also, $(-2x)^2$ equals $(2x)^2$, so we can find
$$2\int_0^2 \sqrt{1 + (2x)^2}\,dx$$
We look up this integral in the back of our book (because we’ve already done integrals like this in chapter 7) to get, with $u = 2x$,
$$\left[\frac{u}{2}\sqrt{1 + u^2} + \frac{1}{2}\ln\left(u + \sqrt{1 + u^2}\right)\right]_0^4 = 2\sqrt{17} + \frac{1}{2}\ln(4 + \sqrt{17}) - \left(0 + \frac{1}{2}\ln(1 + 0)\right) = 9.29$$
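To see the 3-step estimate creep up toward the exact answer, the straight-line approximation can be repeated with more steps. This is just a sketch; the function names `polyline_length` and `path` are mine, and the "exact" value is the closed form from the computation above.

```python
import math

def polyline_length(f, a, b, n):
    """Length of the n straight segments joining points on the graph of f."""
    dx = (b - a) / n
    xs = [a + k * dx for k in range(n + 1)]
    # math.hypot(dx, dy) is the distance formula sqrt(dx^2 + dy^2).
    return sum(
        math.hypot(xs[k] - xs[k - 1], f(xs[k]) - f(xs[k - 1]))
        for k in range(1, n + 1)
    )

def path(x):
    return -x**2 + 4

three_steps = polyline_length(path, -2.0, 2.0, 3)
many_steps = polyline_length(path, -2.0, 2.0, 3000)
exact = 2 * math.sqrt(17) + 0.5 * math.log(4 + math.sqrt(17))

print(three_steps, many_steps, exact)
```

With 3 steps this reproduces the rough estimate, and with thousands of steps the polyline length should agree with the closed form to several decimal places.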
Definition 6.2.1.
[author=duckworth, file =text_files/arc_length]
Based on this experience, we define arc-length as follows:
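$$\text{Arc-length} = s = \int_a^b ds \quad\text{where}\quad ds = \sqrt{1 + \left(\frac{dy}{dx}\right)^2}\,dx \ \text{ or } \ \sqrt{\left(\frac{dx}{dy}\right)^2 + 1}\,dy \approx \sqrt{\Delta x^2 + \Delta y^2}$$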
6.2. LENGTHS OF CURVES 195
Comment.
[author=duckworth, file =text_files/arc_length]
The problems in this section can take a long time just because there’s lots of
simplification and/or manipulation to get the integral into the right form. Here’s
some advice:
Rule 6.2.1.
This formula comes from approximating the curve by straight lines connecting
successive points on the curve, using the Pythagorean Theorem to compute the
lengths of these segments in terms of the change in x and the change in y. In
one way of writing, which also provides a good heuristic for remembering the
formula, if a small change in x is dx and a small change in y is dy, then the length
of the hypotenuse of the right triangle with base dx and altitude dy is (by the
Pythagorean theorem)
$$\text{hypotenuse} = \sqrt{dx^2 + dy^2} = \sqrt{1 + \left(\frac{dy}{dx}\right)^2}\,dx$$
Unfortunately, by the nature of this formula, most of the integrals which come
up are difficult or impossible to ‘do’. But if one of these really mattered, we could
still estimate it by numerical integration.
Exercises
1. Find the length of the curve $y = \sqrt{1 - x^2}$ from $x = 0$ to $x = 1$.
2. Find the length of the curve $y = \frac{1}{4}(e^{2x} + e^{-2x})$ from $x = 0$ to $x = 1$.
3. Set up (but do not evaluate) the integral to find the length of the piece of the parabola $y = x^2$ from $x = 3$ to $x = 4$.
6.3. NUMERICAL INTEGRATION 197
Example 6.3.1.
[author=duckworth, file =text_files/numerical_integration]
Consider $\int_0^1 e^{-x^2}\,dx$. Let’s approximate this in four steps. So we have $n = 4$ and $\Delta x = \frac{1}{4}$. We have:
Rule   $x_1^*,\ x_2^*,\ x_3^*,\ x_4^*$   $\frac{1}{4}\left(e^{-(x_1^*)^2} + e^{-(x_2^*)^2} + e^{-(x_3^*)^2} + e^{-(x_4^*)^2}\right)$
LHR    $0,\ \frac{1}{4},\ \frac{2}{4},\ \frac{3}{4}$   $\frac{1}{4}\left(e^{-0^2} + e^{-(1/4)^2} + e^{-(2/4)^2} + e^{-(3/4)^2}\right) = .821999$
RHR    $\frac{1}{4},\ \frac{2}{4},\ \frac{3}{4},\ \frac{4}{4} = 1$   $\frac{1}{4}\left(e^{-(1/4)^2} + e^{-(2/4)^2} + e^{-(3/4)^2} + e^{-1}\right) = .663969$
MP     $\frac{1}{8},\ \frac{3}{8},\ \frac{5}{8},\ \frac{7}{8}$   $\frac{1}{4}\left(e^{-(1/8)^2} + e^{-(3/8)^2} + e^{-(5/8)^2} + e^{-(7/8)^2}\right) = .748747$
The obvious questions at this point are: which one of these is best, and how
close is it? You might think just by looking at these numbers that .821999 is too
high and .663969 is too low. In this case, this is right, but the correct way to see
this is to graph f (x) and note that it is decreasing. This implies that the LHR is
too high and the RHR is too low.
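The three rules differ only in where each sample point sits inside its subinterval, so one small function covers all of them. This is a sketch; the name `riemann_sum` and its `offset` parameter are my own labels, not notation from the text.

```python
import math

def riemann_sum(f, a, b, n, offset):
    """Sum of f at one sample per subinterval, times dx.

    offset = 0.0 samples left endpoints (LHR), 1.0 right endpoints (RHR),
    and 0.5 midpoints (MP).
    """
    dx = (b - a) / n
    return dx * sum(f(a + (i + offset) * dx) for i in range(n))

def f(x):
    return math.exp(-x**2)

lhr = riemann_sum(f, 0.0, 1.0, 4, 0.0)
rhr = riemann_sum(f, 0.0, 1.0, 4, 1.0)
mp = riemann_sum(f, 0.0, 1.0, 4, 0.5)

print(round(lhr, 6), round(rhr, 6), round(mp, 6))
```

Since f is decreasing on [0, 1], the left-hand value should come out largest and the right-hand value smallest, with the midpoint value in between.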
Rule 6.3.1.
[author=duckworth, file =text_files/numerical_integration]
We can summarize this for all functions:
• If f(x) is increasing then LHR < $\int_a^b f(x)\,dx$ < RHR.
• If f(x) is decreasing then LHR > $\int_a^b f(x)\,dx$ > RHR.
What about the MP rule? What about averaging the LHR and RHR? Let’s define a new rule: TRAP = $\frac{1}{2}$(LHR + RHR). We give the outcome of this rule, together with how to calculate in terms of the $x_i^*$:
So which is better, the MP or the TRAP? To figure this out draw one “rectangle” in f(x) with a quarter of a circle on top (or see the picture in the book or in lecture notes). The TRAP gives the area formed by the trapezoid connecting the right side to the left where the vertical lines hit the curve for f(x). Draw the MP with a horizontal line coming half-way between the left and right sides (this is not the same as a horizontal line half-way between the top and the bottom of the curve). You can re-draw the MP by drawing a tangent line at the point where the MP line intersects f(x). The trapezoid formed with this tangent line has the same area as the rectangle formed with a horizontal line at the mid-point (you can see this because you just cut off one corner of the rectangle and move it to the other side to form the trapezoid). We can finally see whether MP or TRAP is better and which is too big/too small.
• If f(x) is concave down, then MP > $\int_a^b f(x)\,dx$ > TRAP
• If f(x) is concave up, then MP < $\int_a^b f(x)\,dx$ < TRAP
• In all cases MP is better than TRAP
The preceding discussion justifies a new rule. We want a formula for something between MP and TRAP, which comes out a little closer to MP. This is Simpson’s rule (as applied to the previous example):
rule   as an average   formula
SIMP   $\frac{2\,\text{MP} + \text{TRAP}}{3}$   $\frac{1}{3} \cdot \frac{1}{8}\left(f(0) + 4f(1/8) + 2f(2/8) + 4f(3/8) + \dots + 4f(7/8) + f(1)\right)$
Discussion.
Discussion.
Yes, all the values have a factor of ‘2’ except the first and the last. (This method
approximates the area under the curve by trapezoids inscribed under the curve in
each subinterval).
Midpoint rule: Let $\bar{x}_i = \frac{1}{2}(x_{i-1} + x_i)$ be the midpoint of the subinterval $[x_{i-1}, x_i]$. Then the midpoint rule says that
$$\int_a^b f(x)\,dx \approx \Delta x\,[f(\bar{x}_1) + \dots + f(\bar{x}_n)]$$
(This method approximates the area under the curve by rectangles whose height is the value of f at the midpoint of each subinterval).
Simpson’s rule: This rule says that
$$\int_a^b f(x)\,dx \approx \frac{\Delta x}{3}\,[f(x_0) + 4f(x_1) + 2f(x_2) + 4f(x_3) + \dots + 2f(x_{n-2}) + 4f(x_{n-1}) + f(x_n)]$$
Yes, the first and last coefficients are ‘1’, while the ‘inner’ coefficients alternate ‘4’
and ‘2’. And n has to be an even integer for this to make sense. (This method
approximates the curve by pieces of parabolas).
In general, the smaller the $\Delta x$ is, the better these approximations are. We can be more precise: the error estimates for the trapezoidal and midpoint rules depend upon the second derivative: suppose that $|f''(x)| \le M$ for some constant $M$, for all $a \le x \le b$. Then
$$\text{error in trapezoidal rule} \le \frac{M(b - a)^3}{12n^2}$$
$$\text{error in midpoint rule} \le \frac{M(b - a)^3}{24n^2}$$
The error estimate for Simpson’s rule depends on the fourth derivative: suppose that $|f^{(4)}(x)| \le N$ for some constant $N$, for all $a \le x \le b$. Then
$$\text{error in Simpson’s rule} \le \frac{N(b - a)^5}{180n^4}$$
From these formulas estimating the error, it looks like the midpoint rule is
always better than the trapezoidal rule. And for high accuracy, using a large
number n of subintervals, it looks like Simpson’s rule is the best.
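Here is a small sketch that puts the three rules and their error bounds side by side for $\int_0^1 e^{-x^2}\,dx$ with n = 4. Two things are assumptions of this sketch rather than anything stated above: the reference value is taken from the error function in Python's `math` module, and the constants M = 2 and N = 12 are bounds on $|f''|$ and $|f^{(4)}|$ on [0, 1] that I worked out by hand for this particular integrand.

```python
import math

def f(x):
    return math.exp(-x**2)

def trapezoid(f, a, b, n):
    dx = (b - a) / n
    inner = sum(f(a + i * dx) for i in range(1, n))
    return dx * (f(a) / 2 + inner + f(b) / 2)

def midpoint(f, a, b, n):
    dx = (b - a) / n
    return dx * sum(f(a + (i + 0.5) * dx) for i in range(n))

def simpson(f, a, b, n):
    if n % 2:
        raise ValueError("Simpson's rule needs an even n")
    dx = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        # Inner coefficients alternate 4, 2, 4, ..., 4.
        total += (4 if i % 2 else 2) * f(a + i * dx)
    return dx / 3 * total

a, b, n = 0.0, 1.0, 4
exact = math.sqrt(math.pi) / 2 * math.erf(1.0)  # reference value via erf
M, N = 2.0, 12.0  # assumed bounds on |f''| and |f''''| over [0, 1]

errors = {
    "trap": abs(trapezoid(f, a, b, n) - exact),
    "mid": abs(midpoint(f, a, b, n) - exact),
    "simp": abs(simpson(f, a, b, n) - exact),
}
bounds = {
    "trap": M * (b - a) ** 3 / (12 * n**2),
    "mid": M * (b - a) ** 3 / (24 * n**2),
    "simp": N * (b - a) ** 5 / (180 * n**4),
}
for name in errors:
    print(name, errors[name], bounds[name])
```

As a side check, Simpson's rule on 2n subintervals is the weighted average (2·MP + TRAP)/3 of the midpoint and trapezoid rules on n subintervals, so `simpson(f, 0, 1, 8)` and `(2 * midpoint(f, 0, 1, 4) + trapezoid(f, 0, 1, 4)) / 3` should agree up to rounding.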
Definition 6.4.1.
Example 6.4.1.
Discussion.
[author=garrett, file =text_files/average_of_function]
A weighted average is an average in which some of the items to be averaged
are ‘more important’ or ‘less important’ than some of the others. The weights are
(non-negative) numbers which measure the relative importance.
For example, the weighted average of a list of numbers $x_1, \dots, x_n$ with corresponding weights $w_1, \dots, w_n$ is
$$\frac{w_1 \cdot x_1 + w_2 \cdot x_2 + \dots + w_n \cdot x_n}{w_1 + w_2 + \dots + w_n}$$
Note that if the weights are all just 1, then the weighted average is just a plain
average.
Definition 6.4.2.
[author=garrett, file =text_files/average_of_function]
The continuous analogue of a weighted average can be obtained as an integral,
6.5. CENTERS OF MASS (CENTROIDS) 201
using a notation which matches better: let f be a function on an interval [a, b],
with weight w(x), a non-negative function on [a, b]. Then
$$\text{weighted average value of } f \text{ on the interval } [a, b] \text{ with weight } w = \frac{\int_a^b w(x) \cdot f(x)\,dx}{\int_a^b w(x)\,dx}$$
Notice that in the special case that the weight is just 1 all the time, then the
weighted average is just a plain average.
Example 6.4.2.
Example 6.4.3.
[author=duckworth, file =text_files/average_of_function]
One of the best examples to think of for average value of a function is the temper-
ature outside over one full day. It’s easy to understand what the high temperature
means, and what the low temperature means. Suppose you want to know the
average temperature, how many times do you need to measure the temperature?
1? Not enough. 4 times? Not enough if you want the most accuracy. 24 times? In
practical terms this might be enough, but in math we always want infinite precision. That leads to the following definition. The average of $f$ on an interval $[a, b]$ is:
$$f_{\text{Avg}} = \frac{1}{b - a}\int_a^b f(x)\,dx.$$
A rectangle with base $b - a$ and height equal to the number $f_{\text{Avg}}$ has the same area as $\int_a^b f(x)\,dx$. This can be used to define/understand what we mean by the average.
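The "how many measurements?" question can be tried out directly. In this sketch the function $f(x) = x^2$ on [0, 3] is a stand-in chosen only because its average, $\frac{1}{3}\int_0^3 x^2\,dx = 3$, is easy to check; the name `sample_average` is mine.

```python
def sample_average(f, a, b, n):
    """Plain average of n equally spaced measurements of f on [a, b]."""
    dx = (b - a) / n
    # One measurement in the middle of each of the n equal time slots.
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) / n

def f(x):
    return x**2

for n in (1, 4, 24, 100000):
    print(n, sample_average(f, 0.0, 3.0, n))
```

The plain averages should approach 3 as the number of measurements grows, which is the sense in which the integral formula is the "infinite precision" average.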
The simplest case is that of a rectangle: it is pretty clear that the centroid
is the ‘center’ of the rectangle. That is, if the corners are (0, 0), (u, 0), (0, v) and
(u, v), then the centroid is
$$\left(\frac{u}{2}, \frac{v}{2}\right)$$
The formulas below are obtained by ‘integrating up’ this simple idea:
Definition 6.5.1.
[author=garrett, file =text_files/centers_of_mass]
For the center of mass (centroid) of the plane region described by f (x) ≤ y ≤ g(x)
and a ≤ x ≤ b, we have
Comment.
Example 6.5.1.
And
$$\text{y-coordinate of the centroid} = \frac{\int_0^1 \frac{1}{2}[(x^2)^2 - 0]\,dx}{\int_0^1 [x^2 - 0]\,dx} = \frac{\frac{1}{2}[x^5/5]_0^1}{[x^3/3]_0^1} = \frac{\frac{1}{2}(1/5 - 0)}{1/3 - 0} = \frac{3}{10}$$
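The value 3/10 can be double-checked numerically. The sketch below treats the region $0 \le x \le 1$, $0 \le y \le x^2$ with a midpoint sum. The y-coordinate uses the same quotient as the computation above; the x-coordinate uses the standard formula $\int x\,(g - f)\,dx \big/ \int (g - f)\,dx$, which I am assuming here since it does not appear in the surviving text. Helper names are mine.

```python
def midpoint_integral(h, a, b, n=20000):
    """Midpoint-rule approximation of the integral of h over [a, b]."""
    dx = (b - a) / n
    return dx * sum(h(a + (i + 0.5) * dx) for i in range(n))

def centroid(lower, upper, a, b):
    """Centroid of the region lower(x) <= y <= upper(x), a <= x <= b."""
    area = midpoint_integral(lambda x: upper(x) - lower(x), a, b)
    x_bar = midpoint_integral(lambda x: x * (upper(x) - lower(x)), a, b) / area
    y_bar = midpoint_integral(
        lambda x: 0.5 * (upper(x) ** 2 - lower(x) ** 2), a, b
    ) / area
    return x_bar, y_bar

x_bar, y_bar = centroid(lambda x: 0.0, lambda x: x**2, 0.0, 1.0)
print(x_bar, y_bar)
```

The second number should be very close to 0.3, matching the hand computation.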
Exercises
1. Find the center of mass (centroid) of the region $0 \le x \le 1$ and $0 \le y \le x^2$.
2. Find the center of mass (centroid) of the region defined by 0 ≤ x ≤ 1, 0 ≤
y ≤ 1 and x + y ≤ 1.
3. Find the center of mass (centroid) of a homogeneous plate in the shape of
an equilateral triangle.
Rule 6.6.1.
[author=duckworth, file =text_files/volumes_cross_section]
Let V be the volume of a shape between x = a and x = b, which has cross-sectional
area given by the function A(x). Then V is given by
$$V = \int_a^b A(x)\,dx.$$
Comment.
Discussion.
Rule 6.6.2.
Example 6.6.1.
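$$= \int_{-1}^{1} \pi(1 - x^2)\,dx = \pi\left[x - \frac{x^3}{3}\right]_{-1}^{+1} = \pi\left[\left(1 - \frac{1}{3}\right) - \left(-1 - \frac{(-1)^3}{3}\right)\right] = \frac{2\pi}{3} + \frac{2\pi}{3} = \frac{4\pi}{3}$$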
Exercises
1. Find the volume of a circular cone of radius 10 and height 12 (not by a
formula, but by cross sections).
2. Find the volume of a cone whose base is a square of side 5 and whose height
is 6, by cross-sections.
Rule 6.7.1.
[author=garrett, file =text_files/solids_revolution]
If we rotate the plane region described by f (x) ≤ y ≤ g(x) and a ≤ x ≤ b around
the x-axis, then the volume of the resulting solid is
$$V = \int_a^b \pi\left(g(x)^2 - f(x)^2\right)dx = \int_{\text{left limit}}^{\text{right limit}} \pi\left(\text{upper curve}^2 - \text{lower curve}^2\right)dx$$
It is necessary to suppose that f (x) ≥ 0 for this to be right.
Comment.
[author=garrett, file =text_files/solids_revolution]
This formula comes from viewing the whole thing as sliced up into slices of thick-
ness dx, so that each slice is a disk of radius g(x) with a smaller disk of radius
f (x) removed from it. Then we use the formula
and ‘add them all up’. The hypothesis that $f(x) \ge 0$ is necessary to keep different
pieces of the solid from ‘overlapping’ each other by accident, which would count the same chunk
of volume twice.
If we rotate the plane region described by f (x) ≤ y ≤ g(x) and a ≤ x ≤ b
around the y-axis (instead of the x-axis), the volume of the resulting solid is
$$\text{volume} = \int_a^b 2\pi x\,(g(x) - f(x))\,dx = \int_{\text{left}}^{\text{right}} 2\pi x\,(\text{upper} - \text{lower})\,dx$$
This second formula comes from viewing the whole thing as sliced up into thin
cylindrical shells of thickness dx encircling the y-axis, of radius x and of height
g(x) − f (x). The volume of each one is
Example 6.7.1.
[author=garrett, file =text_files/solids_revolution]
As an example, let’s consider the region $0 \le x \le 1$ and $x^2 \le y \le x$. Note that for $0 \le x \le 1$ it really is the case that $x^2 \le x$, so $y = x$ is the upper curve of the two, and $y = x^2$ is the lower curve of the two. Invoking the formula above, the volume of the solid obtained by rotating this plane region around the x-axis is
$$\text{volume} = \int_{\text{left}}^{\text{right}} \pi\left(\text{upper}^2 - \text{lower}^2\right)dx = \int_0^1 \pi\left((x)^2 - (x^2)^2\right)dx = \pi\left[x^3/3 - x^5/5\right]_0^1 = \pi(1/3 - 1/5)$$
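The ‘add them all up’ idea can be imitated literally by summing many thin washers. This is only a numerical sketch of the disk-with-a-hole picture; the function name `washer_volume` and the slice count are my own choices.

```python
import math

def washer_volume(upper, lower, a, b, n=20000):
    """Sum the volumes of n thin washers of thickness dx around the x-axis."""
    dx = (b - a) / n
    total = 0.0
    for i in range(n):
        x = a + (i + 0.5) * dx
        # Area of a disk of radius upper(x) minus a disk of radius lower(x).
        total += math.pi * (upper(x) ** 2 - lower(x) ** 2) * dx
    return total

volume = washer_volume(lambda x: x, lambda x: x**2, 0.0, 1.0)
print(volume, math.pi * (1 / 3 - 1 / 5))
```

Both printed numbers should agree to many decimal places.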
Example 6.7.2.
Discussion.
[author=duckworth, file =text_files/solids_revolution]
For some functions it’s easier to slice the volume a different way. If you rotate a
little bump around the y-axis, then cross-section slices aren’t very good. In this
case, think about a little vertical rectangle in the bump, being rotated around the
y-axis and making a cylindrical shell. If we add a bunch of these shells together
we’ll have the whole volume.
Derivation.
where x is the radius because you’re rotating about the y-axis, and you need to figure out a and b (which equal the smallest and the largest radii) from the picture.
Sometimes, your region isn’t defined by a single function f(x). In this case, you draw a single shell and figure out what h is.
For example, if the region is defined as being between two functions f and g you’d have
h              2πr    ∆r
f(x) − g(x)    2πx    ∆x
$$\int_a^b (f(x) - g(x))\,2\pi x\,dx$$
Exercises
1. Find the volume of the solid obtained by rotating the region 0 ≤ x ≤ 1, 0 ≤
y ≤ x around the y-axis.
2. Find the volume of the solid obtained by rotating the region 0 ≤ x ≤ 1, 0 ≤
y ≤ x around the x-axis.
3. Set up the integral which expresses the volume of the doughnut obtained by
rotating the region $(x - 2)^2 + y^2 \le 1$ around the y-axis.
6.8 Work
Discussion.
[author=duckworth, file =text_files/work_application]
The amount of work required to move an object is:
W =F ·d
where F is a (positive) force acting in the opposite direction of the movement and
d is the distance the object is moved. Here we assume that F is constant. Also,
we change this definition slightly if the force is acting in the same direction as the
movement: then we use −F instead of F .
Usually we deal with problems where the force is changing or the distance is
changing. In this case we figure out:
(a) the formula for doing part of the work (where “part” refers to either moving
part of the object of thickness ∆x or to figuring out the force over a certain
distance of length ∆x)
(b) and then we integrate the formula we found in part (a)
Rule 6.8.1.
Example 6.8.1.
(a) Here the force is changing. Let ∆x be a little distance that the object will
move, at position x (for example, ∆x = .1 and x = 0 would represent the
work to move from x = 0 to x = .1). On this segment, the force will be approximately
sin(x) (from x = 0 to x = .1 we would take x in [0, .1], maybe sin(0), sin(.1)
or sin(.05)). So the work to move a distance of ∆x around position x would
be
part of work ≈ sin(x)∆x
(b) The total work is $\int_0^1 \sin(x)\,dx$.
6.9. SURFACES OF REVOLUTION 211
Rule 6.8.2.
[author=duckorth, file =text_files/work_application]
Suppose we have a substance (usually water, gravel, dirt, lengths of rope or chain)
which is being moved. Suppose that the substance covers positions from x = a to
x = b. Let ∆x be given, and let the phrase “the substance at position x” mean the
total volume of the substance which is contained in any interval of length ∆x which
contains x (for example we could pick the interval $[x - \frac{1}{2}\Delta x,\ x + \frac{1}{2}\Delta x]$). We first approximate the amount of work required to move the substance at position x, by using a constant value for the distance the substance is moved and, if necessary, using a constant value for the force. Let $I(x)\Delta x$ be a formula for this constant approximation of the work required to move all the substance at position x. Then the total work is given by
$$W = \int_a^b I(x)\,dx.$$
Example 6.8.2.
[author=duckworth, file =text_files/work_application]
Suppose we are pumping water out of a tank which is a cylinder of radius 2 m and
height 9 m. Find the work required to empty the tank.
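(a) Here, the distance being lifted is changing. Let’s measure x from the top and consider a slice of the cylinder of thickness ∆x at depth x. The work to lift this slice of water is
part of work ≈ π · 4 · 1000 · 9.8 · x ∆x
(the volume π · 2² · ∆x of the slice, times the density 1000, times 9.8 for gravity, times the distance x).
(b) The total work is $\int_0^9 \pi \cdot 4 \cdot 1000 \cdot 9.8 \cdot x\,dx$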
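Evaluating the integral in (b) takes one line. The numerical value below assumes SI units throughout (metres, a density of 1000 kg/m³ for water, and 9.8 m/s² for gravity), which the example suggests but does not state; under that assumption the answer is in joules.

```latex
\int_0^9 \pi \cdot 4 \cdot 1000 \cdot 9.8 \cdot x \, dx
  = 39200\pi \left[ \frac{x^2}{2} \right]_0^9
  = 39200\pi \cdot \frac{81}{2}
  = 1\,587\,600\,\pi
  \approx 4.99 \times 10^{6}
```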
Definition 6.9.1.
The surface area generated by rotating the graph of a function around one of the axes is
$$SA = \int_a^b 2\pi r\,ds \quad\text{where}\quad ds = \sqrt{1 + \left(\frac{dy}{dx}\right)^2}\,dx \ \text{ or } \ \sqrt{\left(\frac{dx}{dy}\right)^2 + 1}\,dy \approx \sqrt{\Delta x^2 + \Delta y^2}$$
Here r is the radius of revolution. If you’re rotating around the x-axis, and your formula is given as a function of x, then you will use r = function of x and $ds = \sqrt{1 + \left(\frac{dy}{dx}\right)^2}\,dx$. If you’re rotating around the x-axis and your formula is given in terms of y then you’ll use r = y and $ds = \sqrt{\left(\frac{dx}{dy}\right)^2 + 1}\,dy$.
Comment.
[author=duckworth, file =text_files/surface_revolution]
One way to understand this formula is to think of ds as being approximately the length of a diagonal line $\ell$ between two points on the curve. Then 2πr times this length is the area of a rectangle with length 2πr and height $\ell$. This rectangle has approximately the same area as one gets by rotating the line $\ell$ around a radius of r.
Definition 6.9.2.
This formula comes from extending the ideas of the previous section: the length of a little piece of the curve is
$$\sqrt{dx^2 + dy^2}$$
This gets rotated around the perimeter of a circle of radius $y = f(x)$, so it approximately gives a band of width $\sqrt{dx^2 + dy^2}$ and length $2\pi f(x)$, which has area
$$2\pi f(x)\sqrt{dx^2 + dy^2} = 2\pi f(x)\sqrt{1 + \left(\frac{dy}{dx}\right)^2}\,dx$$
Integrating this (as if it were a sum!) gives the formula.
As with the formula for arc length, it is very easy to obtain integrals which are difficult or impossible to evaluate except numerically.
Similarly, we might rotate the curve $y = f(x)$ around the y-axis instead. The same general ideas apply to compute the area of the resulting surface. The width of each little band is still $\sqrt{dx^2 + dy^2}$, but now the length is $2\pi x$ instead. So the band has area
$$\text{width} \times \text{length} = 2\pi x\sqrt{dx^2 + dy^2}$$
Therefore, in this case the surface area is obtained by integrating this, yielding the formula
$$\text{area} = \int_a^b 2\pi x\sqrt{1 + \left(\frac{dy}{dx}\right)^2}\,dx$$
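Since these integrals usually have to be done numerically anyway, here is a sketch of how that might look. The two test curves are chosen by me because their answers are known in closed form: rotating $y = x$, $0 \le x \le 1$, around the x-axis gives a cone of lateral area $\pi\sqrt{2}$, and rotating $y = x^2$, $0 \le x \le 1$, around the y-axis gives $\pi(5\sqrt{5} - 1)/6$. The function names and the `axis` argument are hypothetical.

```python
import math

def midpoint_integral(g, a, b, n=20000):
    dx = (b - a) / n
    return dx * sum(g(a + (i + 0.5) * dx) for i in range(n))

def surface_area(f, df, a, b, axis):
    """Area of the surface swept by y = f(x), a <= x <= b.

    df is the derivative of f; axis is "x" or "y" (axis of rotation).
    """
    def integrand(x):
        radius = f(x) if axis == "x" else x
        # band length 2*pi*radius times band width sqrt(1 + (dy/dx)^2) dx
        return 2 * math.pi * radius * math.sqrt(1 + df(x) ** 2)
    return midpoint_integral(integrand, a, b)

cone = surface_area(lambda x: x, lambda x: 1.0, 0.0, 1.0, "x")
bowl = surface_area(lambda x: x * x, lambda x: 2 * x, 0.0, 1.0, "y")
print(cone, bowl)
```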
Exercises
1. Find the area of the surface obtained by rotating the curve $y = \frac{1}{4}(e^{2x} + e^{-2x})$ with $0 \le x \le 1$ around the x-axis.
2. Just set up the integral for the surface obtained by rotating the curve $y = \frac{1}{4}(e^{2x} + e^{-2x})$ with $0 \le x \le 1$ around the y-axis.
3. Set up the integral for the area of the surface obtained by rotating the curve $y = x^2$ with $0 \le x \le 1$ around the x-axis.
4. Set up the integral for the area of the surface obtained by rotating the curve $y = x^2$ with $0 \le x \le 1$ around the y-axis.
Chapter 7
Techniques of Integration
$$\text{Integration by parts} \qquad \int f' \cdot g = f \cdot g - \int f \cdot g'$$
The book writes this a different way. Let $u = f(x)$ and $v = g(x)$ so $du = f'(x)\,dx$ and $dv = g'(x)\,dx$. Then we have:
$$\text{Integration by parts} \qquad \int v\,du = u \cdot v - \int u\,dv$$
Usually you are given something to integrate that looks like a product. You have to choose which thing to call $f'$ (or $du$) and which to call $g$ (or $v$). The point is that $\int f \cdot g'$ should be easier for some reason than $\int f' \cdot g$.
Example 7.1.1.
Derivation.
[author=garrett, file =text_files/integration_by_parts]
Strangely, the subtlest standard method is just the product rule run backwards.
This is called integration by parts. (This might seem strange because often
people find the chain rule for differentiation harder to get a grip on than the
216 CHAPTER 7. TECHNIQUES OF INTEGRATION
Sometimes this is written another way: if we use the notation that for a function u of x,
$$du = \frac{du}{dx}\,dx$$
then for two functions u, v of x the rule is
$$\int u\,dv = uv - \int v\,du$$
Yes, it is hard to see how this might be helpful, but it is. The first theme we’ll
see in examples is where we could do the integral except that there is a power of
x ‘in the way’:
Example 7.1.2.
Example 7.1.3.
[author=garrett, file =text_files/integration_by_parts]
A similar example is
$$\int x\cos x\,dx = \int x\,d(\sin x) = x\sin x - \int \sin x\,dx = x\sin x + \cos x + C$$
Example 7.1.4.
$$= x^2e^x - 2xe^x + 2e^x + C$$
7.1. INTEGRATION BY PARTS 217
Here we integrate by parts twice. After the first integration by parts, the integral
we come up with is $\int xe^x\,dx$, which we had dealt with in the first example.
Example 7.1.5.
[author=garrett, file =text_files/integration_by_parts]
Sometimes it is easier to integrate the derivative of something than to integrate
the thing:
$$\int \ln x\,dx = \int \ln x\,d(x) = x\ln x - \int x\,d(\ln x)$$
$$= x\ln x - \int x\,\frac{1}{x}\,dx = x\ln x - \int 1\,dx = x\ln x - x + C$$
We took $u = \ln x$ and $v = x$.
Example 7.1.6.
[author=garrett, file =text_files/integration_by_parts]
Again in this example it is easier to integrate the derivative than the thing itself:
$$\int \arctan x\,dx = \int \arctan x\,d(x) = x\arctan x - \int x\,d(\arctan x)$$
$$= x\arctan x - \int \frac{x}{1 + x^2}\,dx = x\arctan x - \frac{1}{2}\int \frac{2x}{1 + x^2}\,dx$$
$$= x\arctan x - \frac{1}{2}\ln(1 + x^2) + C$$
since we should recognize the
$$\frac{2x}{1 + x^2}$$
as being the derivative (via the chain rule) of $\ln(1 + x^2)$.
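A cheap sanity check for any integration-by-parts answer is to differentiate it and see whether the integrand comes back. The sketch below does this numerically with a central difference for three of the antiderivatives found in these examples; the helper `derivative`, the step size, and the sample points are my own choices.

```python
import math

def derivative(F, x, h=1e-5):
    """Central-difference approximation of F'(x)."""
    return (F(x + h) - F(x - h)) / (2 * h)

# (claimed antiderivative, original integrand)
pairs = [
    (lambda x: x * math.sin(x) + math.cos(x), lambda x: x * math.cos(x)),
    (lambda x: x * math.log(x) - x, math.log),
    (lambda x: x * math.atan(x) - 0.5 * math.log(1 + x * x), math.atan),
]
points = [0.5, 1.0, 2.0]

max_gap = max(abs(derivative(F, x) - f(x)) for F, f in pairs for x in points)
print(max_gap)
```

If an antiderivative were wrong (say a sign error), the printed gap would be of ordinary size rather than tiny.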
Rule 7.1.1.
[author=livshits, file =text_files/integration_by_parts]
Integration by Parts
$$\int f'g = fg - \int fg'$$
Example 7.1.7.
Exercises
1. $\int \ln x\,dx = ?$
2. $\int xe^x\,dx = ?$
3. $\int (\ln x)^2\,dx = ?$
4. $\int xe^{2x}\,dx = ?$
5. $\int \arctan 3x\,dx = ?$
6. $\int x^3\ln x\,dx = ?$
7. $\int \ln 3x\,dx = ?$
8. $\int x\ln x\,dx = ?$
7.2. PARTIAL FRACTIONS 219
All the rest of our work is to break down more complicated problems into pieces
that are polynomials, or which use the formulas just given.
Procedure.
• If degree top poly ≥ degree bottom poly, then perform polynomial division so that this is no longer the case.
• Factor the bottom poly, so that we have only linear and quadratic factors. Then do partial fractions so that we have separate fractions, each of the form $\frac{*}{x \pm a}$ or $\frac{*}{x^2 + ax + b}$ (in each case $*$ should be something with a lower degree than the bottom).
• Perform completing the square on any fractions with quadratic factors so we have:
$$\frac{*}{x^2 + ax + b} \to \frac{*}{u^2 \pm c^2}.$$
• This reduces the original integral as follows:
$$\int \frac{\text{poly}}{\text{bottom poly}} = \int \text{poly} + \frac{*}{x \pm a} + \frac{*}{x \pm b} + \dots + \frac{*}{u^2 \pm c^2} + \frac{*}{u^2 \pm d^2} + \dots$$
• We should be able to finish the integral using our knowledge of how to do $\int \text{poly}$, $\int \frac{*}{x \pm a}$, $\int \frac{*}{x^2 \pm a^2}$ (again, $*$ always represents something with lower degree than the bottom).
Discussion.
Example 7.2.1.
[author=duckworth, file =text_files/partial_fractions]
Find $\frac{123}{9}$. We rewrite this as 9 ) 123. We will put first a 1 on top because 9 goes into 12 once:
                  1               13
9 ) 123  →  9 ) 123  →  9 ) 123
                 −9              −9
                  3               33
                                 −27
                                   6
So we have a remainder of 6. We write this as
$$\frac{123}{9} = 13 + \frac{6}{9}.$$
Example 7.2.2.
            x^2
x^2 + x ) x^4 + 0x^3 − 2x^2 + 17x + 2
        −(x^4 +  x^3)
              − x^3 − 2x^2
Next, we put −x on top because when we multiply this by x^2 + x we can kill off
the −x^3:
            x^2 − x
x^2 + x ) x^4 + 0x^3 − 2x^2 + 17x + 2
        −(x^4 +  x^3)
              − x^3 − 2x^2
           −(− x^3 −  x^2)
                    − x^2 + 17x
→
            x^2 − x − 1
x^2 + x ) x^4 + 0x^3 − 2x^2 + 17x + 2
        −(x^4 +  x^3)
              − x^3 − 2x^2
           −(− x^3 −  x^2)
                    − x^2 + 17x
                 −(− x^2 −   x)
                            18x + 2
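The same bookkeeping can be handed to a short program. This is a sketch of schoolbook long division on coefficient lists (highest power first), not a library routine; the name `poly_divmod` is mine, and exact fractions are used so that no rounding creeps in.

```python
from fractions import Fraction

def poly_divmod(num, den):
    """Divide polynomials given as coefficient lists, highest degree first.

    Returns (quotient, remainder) as lists of Fractions.
    """
    rem = [Fraction(c) for c in num]
    den = [Fraction(c) for c in den]
    quotient = []
    while len(rem) >= len(den):
        # Choose the next term on top so that the leading term is killed off.
        factor = rem[0] / den[0]
        quotient.append(factor)
        for i, d in enumerate(den):
            rem[i] -= factor * d
        rem.pop(0)  # the leading coefficient is now zero
    return quotient, rem

# (x^4 + 0x^3 - 2x^2 + 17x + 2) divided by (x^2 + x)
q, r = poly_divmod([1, 0, -2, 17, 2], [1, 1, 0])
print(q, r)
```

Here `q` holds the coefficients of the quotient $x^2 - x - 1$ and `r` those of the remainder $18x + 2$, matching the division above.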
Discussion.
[author=duckworth, file =text_files/partial_fractions]
Partial Fractions. This is a way to rewrite a single fraction with factors on the
Example 7.2.3.
$$\frac{1}{x} + \frac{3}{x^2 + 1} = \frac{x^2 + 1}{x(x^2 + 1)} + \frac{3x}{x(x^2 + 1)} = \frac{x^2 + 3x + 1}{x(x^2 + 1)}.$$
Now suppose we started with $\frac{x^2 + 3x + 1}{x(x^2 + 1)}$ and didn’t know that it was originally written as two fractions. We could figure out those fractions as follows. Solve for A, B and C:
$$\frac{x^2 + 3x + 1}{x(x^2 + 1)} = \frac{A}{x} + \frac{Bx + C}{x^2 + 1}.$$
Multiplying both sides by $x(x^2 + 1)$ we get:
$$x^2 + 3x + 1 = A(x^2 + 1) + (Bx + C)x = (A + B)x^2 + Cx + A.$$
Now, for these sides of the equation to be equal, we need the coefficients of $x^2$ to be the same on both sides, we need the coefficients of $x$ to be the same on both sides, and we need the constant terms on both sides to be the same. This leads to the following equations:
x^2 coeff : 1 = A + B
x coeff : 3 = C
constant : 1 = A
This gives us A = 1, B = 0 and C = 3. Thus we have found:
$$\frac{x^2 + 3x + 1}{x(x^2 + 1)} = \frac{1}{x} + \frac{3}{x^2 + 1}.$$
Of course, in this example we already knew this, but the point is we figured out
how to take the fraction on the left, and write it as the sum of fractions on the
right.
Procedure.
• Distinct linear factors: each gets represented once on the right hand side:
$$\frac{*}{(x + a)(x + b)\cdots} = \frac{A}{x + a} + \frac{B}{x + b} + \dots \qquad (a \ne b)$$
• Repeated linear factors: the ones that are repeated get represented multiple times on the right hand side:
$$\frac{*}{(x + a)^4(x + b)\cdots} = \frac{A}{x + a} + \frac{B}{(x + a)^2} + \frac{C}{(x + a)^3} + \frac{D}{(x + a)^4} + \frac{E}{x + b} + \dots \qquad (a \ne b)$$
Hopefully the pattern is clear about what to do if you replaced $(x + a)^4$ with $(x + a)^9$.
• Distinct quadratic factors: each gets represented once on the right hand side:
$$\frac{*}{(x^2 + ax + b)(x^2 + cx + d)\cdots} = \frac{Ax + B}{x^2 + ax + b} + \frac{Cx + D}{x^2 + cx + d} + \dots$$
• Repeated quadratic factors: the ones that are repeated get represented multiple times on the right hand side:
$$\frac{*}{(x^2 + ax + b)^3(x^2 + cx + d)} = \frac{Ax + B}{x^2 + ax + b} + \frac{Cx + D}{(x^2 + ax + b)^2} + \frac{Ex + F}{(x^2 + ax + b)^3} + \frac{Gx + H}{x^2 + cx + d}$$
After you get the above equation set up, you multiply both sides by the de-
nominator from the left, you multiply everything out on the right, you gather the
x-terms, you gather the x2 -terms, the x3 -terms etc. Then you get a new system
of equations by requiring that the coefficients of x be the same on both sides, the
coefficients of x2 to be the same on both sides, etc.
Discussion.
Example 7.2.4.
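Take the coefficient 6 of x, divide it by 2 to get 3, square that to get 9, and then add and subtract 9:
$$x^2 + 6x + 7 \ \to\ x^2 + 6x + 9 - 9 + 7$$
$$x^2 + 6x + 7 = (x + 3)^2 - 2$$
Alternatively, write
$$x^2 + 6x + 7 = (x + a)^2 + b$$
and solve for a and b. You get $x^2 + 6x + 7 = x^2 + 2ax + a^2 + b$ so you see that $a = 3$ (because we need $6x = 2ax$) thus $7 = a^2 + b$ implies that $b = -2$.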
Discussion.
[author=garrett, file =text_files/partial_fractions]
Now we return to a more special but still important technique of doing indefinite
integrals. This depends on a good trick from algebra to transform complicated
rational functions into simpler ones. Rather than try to formally describe the
general fact, we’ll do the two simplest families of examples.
Example 7.2.5.
[author=garrett, file =text_files/partial_fractions]
Consider the integral
$$\int \frac{1}{x(x - 1)}\,dx$$
As it stands, we do not recognize this as the derivative of anything. However, we have
$$\frac{1}{x - 1} - \frac{1}{x} = \frac{x - (x - 1)}{x(x - 1)} = \frac{1}{x(x - 1)}$$
Therefore,
$$\int \frac{1}{x(x - 1)}\,dx = \int \frac{1}{x - 1} - \frac{1}{x}\,dx = \ln(x - 1) - \ln x + C$$
That is, by separating the fraction 1/x(x − 1) into the ‘partial’ fractions 1/x and
1/(x − 1) we were able to do the integrals immediately by using the logarithm.
How to see such identities?
Rule 7.2.1.
[author=garrett, file =text_files/partial_fractions]
Well, let’s look at a situation
\[ \frac{cx + d}{(x - a)(x - b)} = \frac{A}{x - a} + \frac{B}{x - b} \]
where $a, b$ are given numbers (not equal) and we are to find $A, B$ which make this
true. If we can find the $A, B$ then we can integrate $(cx + d)/((x - a)(x - b))$ simply
by using logarithms:
\[ \int \frac{cx + d}{(x - a)(x - b)}\,dx = \int \left( \frac{A}{x - a} + \frac{B}{x - b} \right) dx = A \ln(x - a) + B \ln(x - b) + C \]
To find the $A, B$, multiply through by $(x - a)(x - b)$ to get
\[ cx + d = A(x - b) + B(x - a) \]
Setting $x = a$ and then $x = b$ in this identity gives
\[ c \cdot a + d = A(a - b), \qquad c \cdot b + d = B(b - a) \]
That is,
\[ A = \frac{c \cdot a + d}{a - b}, \qquad B = \frac{c \cdot b + d}{b - a} \]
So, yes, we can find the constants to break the fraction $(cx + d)/((x - a)(x - b))$
down into simpler ‘partial’ fractions.
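The two formulas for $A$ and $B$ are easy to mechanize. The following is a minimal sketch, assuming exact rational arithmetic is wanted; the function name `two_term_partial_fractions` is a hypothetical helper introduced only for this illustration.

```python
from fractions import Fraction

def two_term_partial_fractions(c, d, a, b):
    """Return (A, B) with (c*x + d)/((x - a)*(x - b)) = A/(x - a) + B/(x - b)."""
    if a == b:
        raise ValueError("a and b must be different")
    c, d, a, b = (Fraction(v) for v in (c, d, a, b))
    A = (c * a + d) / (a - b)  # from plugging x = a into cx + d = A(x - b) + B(x - a)
    B = (c * b + d) / (b - a)  # from plugging x = b into the same identity
    return A, B

# The fraction 1/(x(x - 1)) from Example 7.2.5: c = 0, d = 1, a = 0, b = 1
A, B = two_term_partial_fractions(0, 1, 0, 1)
print(A, B)
```

For $1/(x(x - 1))$ this reproduces the decomposition $-1/x + 1/(x - 1)$ found by hand earlier.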
Further, if the numerator is of bigger degree than 1, then before executing the
previous algebra trick we must first divide the numerator by the denominator to get
a remainder of smaller degree.
Example 7.2.6.
[author=garrett, file =text_files/partial_fractions]
A simple example is
\[ \frac{x^3 + 4x^2 - x + 1}{x(x - 1)} = ? \]
We must recall how to divide polynomials by polynomials and get a remainder of
lower degree than the divisor. Here we would divide the $x^3 + 4x^2 - x + 1$ by
$x(x - 1) = x^2 - x$ to get a remainder of degree less than 2 (the degree of $x^2 - x$).
We would obtain
\[ \frac{x^3 + 4x^2 - x + 1}{x(x - 1)} = x + 5 + \frac{4x + 1}{x(x - 1)} \]
since the quotient is $x + 5$ and the remainder is $4x + 1$. Thus, in this situation
\[ \int \frac{x^3 + 4x^2 - x + 1}{x(x - 1)}\,dx = \int \left( x + 5 + \frac{4x + 1}{x(x - 1)} \right) dx \]
Now we are ready to continue with the first algebra trick.
In this case, the first trick is applied to
\[ \frac{4x + 1}{x(x - 1)} \]
We want constants $A, B$ so that
\[ \frac{4x + 1}{x(x - 1)} = \frac{A}{x} + \frac{B}{x - 1} \]
As above, multiply through by $x(x - 1)$ to get
\[ 4x + 1 = A(x - 1) + Bx \]
and plug in $x = 0$ and $x = 1$:
\[ 4 \cdot 0 + 1 = -A, \qquad 4 \cdot 1 + 1 = B \]
That is, $A = -1$ and $B = 5$.
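The polynomial division step in this example can be checked mechanically. The sketch below is one possible way to do it, assuming polynomials are stored as coefficient lists with the highest power first; `poly_divmod` is a hypothetical helper name, not something from the text.

```python
from fractions import Fraction

def poly_divmod(num, den):
    """Divide polynomials given as coefficient lists, highest power first.

    Returns (quotient, remainder) with deg(remainder) < deg(den).
    """
    num = [Fraction(c) for c in num]
    den = [Fraction(c) for c in den]
    if not den or den[0] == 0:
        raise ValueError("leading coefficient of the divisor must be nonzero")
    quotient = []
    while len(num) >= len(den):
        factor = num[0] / den[0]        # next coefficient of the quotient
        quotient.append(factor)
        for i, c in enumerate(den):     # subtract factor * divisor
            num[i] -= factor * c
        num.pop(0)                      # the leading term is now zero
    return quotient, num

# x^3 + 4x^2 - x + 1 divided by x^2 - x
q, r = poly_divmod([1, 4, -1, 1], [1, -1, 0])
print(q, r)
```

The quotient coefficients correspond to $x + 5$ and the remainder coefficients to $4x + 1$, matching the division above.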
Rule 7.2.2.
[author=garrett, file =text_files/partial_fractions]
In a slightly different direction: we can do any integral of the form
\[ \int \frac{ax + b}{1 + x^2}\,dx \]
because we know two different sorts of integrals with that same denominator:
\[ \int \frac{1}{1 + x^2}\,dx = \arctan x + C, \qquad \int \frac{2x}{1 + x^2}\,dx = \ln(1 + x^2) + C \]
where in the second one we use a substitution. Thus, we have to break the given
integral into two parts to do it:
\[ \int \frac{ax + b}{1 + x^2}\,dx = \frac{a}{2} \int \frac{2x}{1 + x^2}\,dx + b \int \frac{1}{1 + x^2}\,dx = \frac{a}{2} \ln(1 + x^2) + b \arctan x + C \]
Example 7.2.7.
Example 7.2.8.
Rule 7.2.3.
[author=wikibooks, file =text_files/partial_fractions]
More generally, if we have a $Q(x)$ which is the product of $p$ factors of the form
$(x - a_i)^{n_i}$ and $q$ factors of the form $(x - b_i)^2 - c_i$ then we can write any $P/Q$
as a sum of simpler terms, each with a power of only one factor in the denominator:
\[ \frac{P(x)}{Q(x)} = \frac{d_{1,1}}{x - a_1} + \cdots + \frac{d_{p,n_p}}{(x - a_p)^{n_p}} + \frac{f_{1,1} + g_{1,1} x}{(x - b_1)^2 - c_1} + \cdots + \frac{f_{q,n_q} + g_{q,n_q} x}{((x - b_q)^2 - c_q)^{n_q}} \]
then solve for the new constants. If we were using complex numbers none of the
factors of $Q$ would be quadratic.
Example 7.2.9.
[author=wikibooks, file =text_files/partial_fractions]
We will consider a few more examples, to see how the procedure goes. Consider
ln (x+3) (x+7)
4 2
13
(x+5) 2
Example 7.2.10.
[author=wikibooks, file =text_files/partial_fractions]
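$P(x) = 1$, $Q(x) = (x + 1)(x + 2)^2$. We first write
\[ \frac{1}{(x + 1)(x + 2)^2} = \frac{a}{x + 1} + \frac{b}{x + 2} + \frac{c}{(x + 2)^2} \]
Multiply both sides by the denominator:
\[ 1 = a(x + 2)^2 + b(x + 1)(x + 2) + c(x + 1) \]
Substitute in three values of $x$ to get three equations for the unknown constants,
\[ \begin{array}{ll} x = 0: & 1 = 2^2 a + 2b + c \\ x = -1: & 1 = a \\ x = -2: & 1 = -c \end{array} \]
so $a = 1$, $b = -1$, $c = -1$, and
\[ \frac{1}{(x + 1)(x + 2)^2} = \frac{1}{x + 1} - \frac{1}{x + 2} - \frac{1}{(x + 2)^2} \]
We can now integrate the left hand side.
\[ \int \frac{dx}{(x + 1)(x + 2)^2} = \ln \frac{x + 1}{x + 2} + \frac{1}{x + 2} \]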
Exercises
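1. $\int \frac{1}{x(x - 1)}\,dx = ?$
2. $\int \frac{1 + x}{1 + x^2}\,dx = ?$
3. $\int \frac{2x^3 + 4}{x(x + 1)}\,dx = ?$
4. $\int \frac{2 + 2x + x^2}{1 + x^2}\,dx = ?$
5. $\int \frac{2x^3 + 4}{x^2 - 1}\,dx = ?$
6. $\int \frac{2 + 3x}{1 + x^2}\,dx = ?$
7. $\int \frac{x^3 + 1}{(x - 1)(x - 2)}\,dx = ?$
8. $\int \frac{x^3 + 1}{x^2 + 1}\,dx = ?$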
Procedure.
[author=duckworth, file =text_files/trigonometric_integrals]
For $\int \sin^n(x) \cos^m(x)\,dx$ use:
• if $n$ is odd get rid of all but 1 power of sin using $\sin^2 = 1 - \cos^2$, then use
$u = \cos$ and $du = -\sin\,dx$.
• if $m$ is odd get rid of all but 1 power of cos using $\cos^2 = 1 - \sin^2$, then use
$u = \sin$ and $du = \cos\,dx$.
• if $n$ and $m$ are even, use $\sin^2(x) = \frac{1}{2}(1 - \cos(2x))$ and $\cos^2(x) = \frac{1}{2}(1 + \cos(2x))$
(may have to repeat this step) to get everything in terms of $\cos(2x)$, $\cos(4x)$
etc.
Procedure.
For $\int \tan^n(x) \sec^m(x)\,dx$ use:
• if $n$ is odd get rid of all but 1 power of tan using $\tan^2 = \sec^2 - 1$, force one
power of sec out next to tan, and use $u = \sec$, $du = \sec \tan\,dx$.
• if $m$ is even get rid of all but 2 powers of sec using $\sec^2 = \tan^2 + 1$, use
$u = \tan$ and $du = \sec^2\,dx$.
• if $n$ is even and $m$ is odd get rid of all powers of tan using $\tan^2 = \sec^2 - 1$.
Now we have only powers of sec, use integration by parts and $\int \sec(x)\,dx =
\ln|\sec(x) + \tan(x)|$.
Example 7.3.1.
[author=duckworth, file =text_files/trigonometric_integrals]
$\int \sin^7(x) \cos^2(x)\,dx$. We get rid of $\sin^6(x)$ by rewriting it as $(1 - \cos^2(x))^3$. Then
we have:
\[ \int \sin^7(x) \cos^2(x)\,dx = \int \sin(x) (1 - \cos^2(x))^3 \cos^2(x)\,dx = -\int (1 - u^2)^3 u^2\,du \]
Example 7.3.2.
[author=duckworth, file =text_files/trigonometric_integrals]
$\int \tan^2(x) \sec(x)\,dx$. We get rid of $\tan^2(x)$ by rewriting it as $\sec^2(x) - 1$. Then we
have:
\[ \int \tan^2(x) \sec(x)\,dx = \int (\sec^2(x) - 1) \sec(x)\,dx = \int \left( \sec^3(x) - \sec(x) \right) dx \]
Discussion.
[author=garrett, file =text_files/trigonometric_integrals]
Here we’ll just have a sample of how to use trig identities to do some more com-
plicated integrals involving trigonometric functions. This is ‘just the tip of the
iceberg’. We don’t do more for at least two reasons: first, hardly anyone remem-
bers all these tricks anyway, and, second, in real life you can look these things up
in tables of integrals. Perhaps even more important, in ‘real life’ there are more
sophisticated viewpoints which even make the whole issue a little silly, somewhat
like evaluating $\sqrt{26}$ ‘by differentials’ without your calculator seems silly.
The only identities we’ll need in our examples are
\[ \begin{array}{ll} \cos^2(x) + \sin^2(x) = 1 & \text{Pythagorean identity} \\ \sin(x) = \sqrt{\frac{1 - \cos(2x)}{2}} & \text{half-angle formula} \\ \cos(x) = \sqrt{\frac{1 + \cos(2x)}{2}} & \text{half-angle formula} \end{array} \]
Example 7.3.3.
If we ignore all trig identities, there is no easy way to do this integral. But if we
use the Pythagorean identity to rewrite it, then things improve:
\[ \int \sin^3 x\,dx = \int (1 - \cos^2 x) \sin x\,dx = -\int (1 - \cos^2 x)(-\sin x)\,dx \]
In the latter expression, we can view the $-\sin x$ as the derivative of $\cos x$, so with
the substitution $u = \cos x$ this integral is
\[ -\int (1 - u^2)\,du = -u + \frac{u^3}{3} + C = -\cos x + \frac{\cos^3 x}{3} + C \]
Example 7.3.4.
(polynomial in u) du
Example 7.3.5.
[author=garrett, file =text_files/trigonometric_integrals]
But this Pythagorean identity trick does not help us on the relatively simple-looking
integral
\[ \int \sin^2(x)\,dx \]
Example 7.3.6.
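\[ \frac{1}{8} \int \cos^3 2x\,dx = \frac{1}{8} \int (1 - \sin^2 2x) \cos 2x\,dx = \frac{1}{16} \left[ \sin 2x - \frac{\sin^3 2x}{3} \right] + C \]
(the substitution $u = \sin 2x$, $du = 2 \cos 2x\,dx$ contributes the extra factor of $\frac{1}{2}$).
Putting it all together, we have
\[ \int \sin^6 x\,dx = \frac{x}{8} + \frac{-3}{16} \sin 2x + \frac{3x}{16} + \frac{3}{64} \sin 4x - \frac{1}{16} \left[ \sin 2x - \frac{\sin^3 2x}{3} \right] + C \]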
This last example is typical of the kind of repeated application of all the tricks
necessary in order to treat all the possibilities.
Example 7.3.7.
There is something distasteful about this rationalization, but at this level of tech-
nique we’re stuck with it.
Comment.
[author=garrett, file =text_files/trigonometric_integrals]
Maybe this is enough of a sample. There are several other tricks that one would
have to know in order to claim to be an ‘expert’ at this, but it’s not really sensible
to want to be ‘expert’ at these games, because there are smarter alternatives.
Discussion.
Example 7.3.8.
\[ = \frac{1}{3} (\sin(x))^3 - \frac{1}{5} (\sin(x))^5 + C \]
Rule 7.3.1.
• for $m$ odd (the power of $\cos x$) substitute $u = \sin x$ and use the fact that $(\cos x)^2 = 1 - (\sin x)^2$
• for $n$ odd (the power of $\sin x$) substitute $u = \cos x$ and use the fact that $(\sin x)^2 = 1 - (\cos x)^2$
• for $m$ and $n$ both even, use the fact that $(\sin x)^2 = \frac{1}{2}(1 - \cos 2x)$ and
$(\cos x)^2 = \frac{1}{2}(1 + \cos 2x)$
Example 7.3.9.
[author=wikibooks, file =text_files/trigonometric_integrals]
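For example, for $m$ and $n$ even, say $I = \int (\sin x)^2 (\cos x)^4\,dx$, making the substitutions
gives
\[ I = \int \tfrac{1}{2}(1 - \cos 2x) \left( \tfrac{1}{2}(1 + \cos 2x) \right)^2 dx \]
Expanding this out,
\[ I = \frac{1}{8} \int \left( 1 - \cos^2 2x + \cos 2x - \cos^3 2x \right) dx \]
\[ I = \frac{x}{16} - \frac{\sin 4x}{64} + \frac{\sin^3 2x}{48} + C. \]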
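A quick sanity check on an antiderivative like this one is to differentiate it numerically and compare with the integrand. The sketch below does that with a central difference; the sample points and the step size `h` are arbitrary choices made for illustration.

```python
import math

def antiderivative(x):
    # Result of Example 7.3.9, with the constant C set to 0
    return x / 16 - math.sin(4 * x) / 64 + math.sin(2 * x) ** 3 / 48

def integrand(x):
    return math.sin(x) ** 2 * math.cos(x) ** 4

def central_difference(f, x, h=1e-5):
    """Numerical derivative of f at x."""
    return (f(x + h) - f(x - h)) / (2 * h)

# Largest mismatch between the numerical derivative and the integrand
max_gap = max(
    abs(central_difference(antiderivative, x) - integrand(x))
    for x in (0.0, 0.3, 1.0, 2.0, -1.5)
)
print(max_gap)
```

A mismatch near zero (up to rounding and discretization error) supports the formula; it is evidence, not a proof.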
Discussion.
Example 7.3.10.
Example 7.3.11.
[author=wikibooks, file =text_files/trigonometric_integrals]
For example, if we are considering the integral
\[ I = \int_{-1}^{1} \frac{\sqrt{1 - x^2}}{1 + x^2}\,dx \]
In effect, we’ve removed the square root from the original integrand. We could
do this with a single change of variables, but doing it in two steps gives us the
opportunity of doing the trigonometric integral another way.
Having done this, we can split the new integrand into partial fractions, and
integrate.
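\begin{align*}
I &= \int_{-1}^{1} \frac{2 - \sqrt{2}}{t^2 + 3 - \sqrt{8}}\,dt + \int_{-1}^{1} \frac{2 + \sqrt{2}}{t^2 + 3 + \sqrt{8}}\,dt - \int_{-1}^{1} \frac{2}{1 + t^2}\,dt \\
&= \frac{4 - \sqrt{8}}{\sqrt{3 - \sqrt{8}}} \tan^{-1}\left( \sqrt{3 + \sqrt{8}} \right) + \frac{4 + \sqrt{8}}{\sqrt{3 + \sqrt{8}}} \tan^{-1}\left( \sqrt{3 - \sqrt{8}} \right) - \pi
\end{align*}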
Example 7.3.12.
[author=wikibooks, file =text_files/trigonometric_integrals]
E.g., in this last example, once we deduced
\[ I = \int_{-\pi/2}^{\pi/2} \frac{\cos^2 \theta}{1 + \sin^2 \theta}\,d\theta \]
we could have used the double angle formulae, since this contains only even powers
of cos and sin. Doing that gives
\[ I = \int_{-\pi/2}^{\pi/2} \frac{1 + \cos 2\theta}{3 - \cos 2\theta}\,d\theta = \frac{1}{2} \int_{-\pi}^{\pi} \frac{1 + \cos \phi}{3 - \cos \phi}\,d\phi \]
Rule 7.3.2.
[author=wikibooks, file =text_files/trigonometric_integrals]
For the integrals $\int \sin nx \cos mx\,dx$, $\int \sin nx \sin mx\,dx$, $\int \cos nx \cos mx\,dx$ use the
following identities: $2 \sin a \cos b = \sin(a + b) + \sin(a - b)$, $2 \sin a \sin b = \cos(a - b) - \cos(a + b)$,
$2 \cos a \cos b = \cos(a - b) + \cos(a + b)$.
Example 7.3.13.
Example 7.3.14.
Rule 7.3.3.
[author=wikibooks, file =text_files/trigonometric_integrals]
A reduction formula is one that enables us to solve an integral problem by reducing
it to a problem of solving an easier integral problem, and then reducing that to
7.3. TRIGONOMETRIC INTEGRALS 235
Example 7.3.15.
Exercises
1. $\int \cos^2 x\,dx = ?$
2. $\int \cos x \sin^2 x\,dx = ?$
3. $\int \cos^3 x\,dx = ?$
4. $\int \sin^2 5x\,dx = ?$
5. $\int \sec(3x + 7)\,dx$
6. $\int \sin^2(2x + 1)\,dx = ?$
7. $\int \sin^3(1 - x)\,dx = ?$
Procedure.
Example 7.4.1.
(a) Find the area under a circle with radius 1, from $x = 0$ to $x = 1/2$. This
is $\int_0^{1/2} \sqrt{1 - x^2}\,dx$. The hard part is coming up with the definite integral.
Let $x = \sin(\theta)$, then $dx = \cos(\theta)\,d\theta$. Note that $\sqrt{1 - x^2} = \sqrt{1 - \sin^2(\theta)} =
\cos(\theta)$. We also translate the endpoints of the integral. When $x = 0$ we have
$\sin(\theta) = 0$ so $\theta = 0$. When $x = 1/2$ we have $\sin(\theta) = 1/2$ so $\theta = \pi/6$. So we
have
\[ \int_0^{1/2} \sqrt{1 - x^2}\,dx = \int_0^{\pi/6} \cos(\theta) \cdot \cos(\theta)\,d\theta = \int_0^{\pi/6} \cos^2(\theta)\,d\theta. \]
We look up this integral from section 7.1 or 7.2 as $\frac{1}{2}\theta + \frac{1}{2}\sin(\theta)\cos(\theta)$ so the
final answer is found by plugging in $\theta = \pi/6$ and $\theta = 0$.
(b) Find the indefinite integral in part (a) (i.e. the anti-derivative $\int \sqrt{1 - x^2}\,dx$).
Well, we know this is $\frac{1}{2}\theta + \frac{1}{2}\sin(\theta)\cos(\theta)$, so we just need to translate from $\theta$
back to $x$. By the definition of our substitution we have $x = \sin(\theta)$. To find
$\cos(\theta)$ in terms of $x$ you can draw a right triangle, label an angle as $\theta$, the
opposite side as $x$, the hypotenuse as 1 (this is because $\sin(\theta) = x$) and solve
for the missing side. You should find that $\cos(\theta) = \sqrt{1 - x^2}$ (by the way, it
always works out this way; the missing side is the $\sqrt{\ }$ that you started with
in the integral). Finally, $\theta = \sin^{-1}(x)$ (because $\sin(\theta) = x$). Thus,
\[ \int \sqrt{1 - x^2}\,dx = \frac{1}{2}\theta + \frac{1}{2}\sin(\theta)\cos(\theta) = \frac{1}{2}\sin^{-1}(x) + \frac{1}{2} x \sqrt{1 - x^2}. \]
(If you want, you can get the same answer as in (a) by plugging in $x = 1/2$
and $x = 0$ to evaluate this definite integral, i.e. to find the area under the
curve.)
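To see that parts (a) and (b) agree, one can compare the antiderivative from (b) with a direct numerical estimate of the area. This is a small sketch under the assumption that a midpoint rule with 10,000 strips is accurate enough here; `midpoint_rule` and `F` are names introduced for the illustration.

```python
import math

def midpoint_rule(f, a, b, n=10_000):
    """Approximate the integral of f over [a, b] with n midpoint rectangles."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

def F(x):
    # Antiderivative from part (b), with the constant set to 0
    return 0.5 * math.asin(x) + 0.5 * x * math.sqrt(1 - x * x)

exact = F(0.5) - F(0.0)  # equals pi/12 + sqrt(3)/8
approx = midpoint_rule(lambda x: math.sqrt(1 - x * x), 0.0, 0.5)
print(exact, approx)
```

Both numbers should agree to many decimal places, and the exact value is $\pi/12 + \sqrt{3}/8$, which is what plugging $\theta = \pi/6$ into $\frac{1}{2}\theta + \frac{1}{2}\sin(\theta)\cos(\theta)$ gives.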
7.4. TRIGONOMETRIC SUBSTITUTIONS 237
Discussion.
Example 7.4.2.
[author=garrett, file =text_files/trigonometric_subst]
For example, in
\[ \int \sqrt{1 - x^2}\,dx \]
Now we have an integral we know how to integrate: using the half-angle formula,
this is
\[ \int \cos^2 u\,du = \int \frac{1 + \cos 2u}{2}\,du = \frac{u}{2} + \frac{\sin 2u}{4} + C \]
And there still remains the issue of substituting back to obtain an expression in
terms of x rather than u. Since x = sin u, it’s just the definition of inverse function
that
u = arcsin x
To express $\sin 2u$ in terms of $x$ is more aggravating. We use the double-angle
formula
\[ \sin 2u = 2 \sin u \cos u \]
Then
\[ \frac{1}{4} \sin 2u = \frac{1}{4} \cdot 2 \sin u \cos u = \frac{1}{2} x \cdot \sqrt{1 - x^2} \]
where ‘of course’ we used the Pythagorean identity to give us
\[ \cos u = \sqrt{1 - \sin^2 u} = \sqrt{1 - x^2} \]
Whew.
Rule 7.4.1.
Example 7.4.3.
[author=garrett, file =text_files/trigonometric_subst]
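For example, in
\[ \int \frac{\sqrt{1 + x^2}}{x}\,dx \]
we use
\[ x = \tan u, \qquad dx = \sec^2 u\,du \]
and turn the integral into
\begin{align*}
\int \frac{\sqrt{1 + x^2}}{x}\,dx &= \int \frac{\sqrt{1 + \tan^2 u}}{\tan u} \sec^2 u\,du \\
&= \int \frac{\sqrt{\sec^2 u}}{\tan u} \sec^2 u\,du = \int \frac{\sec u}{\tan u} \sec^2 u\,du = \int \frac{1}{\sin u \cos^2 u}\,du
\end{align*}
by rewriting everything in terms of $\cos u$ and $\sin u$.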
Rule 7.4.2.
[author=garrett, file =text_files/trigonometric_subst]
For integrals containing $\sqrt{x^2 - 1}$, use $x = \sec u$ in order to invoke the Pythagorean
identity
\[ \sec^2 u - 1 = \tan^2 u \]
so as to be able to ‘take the square root’. Let’s not execute any examples of this,
since nothing new really happens.
Discussion.
Example 7.4.4.
[author=garrett, file =text_files/trigonometric_subst]
For example, consider
\[ \int \sqrt{-2x - x^2}\,dx \]
The quadratic polynomial inside the square-root is not one of the three simple
types we’ve looked at. But, by completing the square, we’ll be able to rewrite it
in essentially such forms:
\[ -2x - x^2 = -(x^2 + 2x) = -\left( (x + 1)^2 - 1 \right) = 1 - (x + 1)^2 \]
Note that always when completing the square we ‘take out’ the coefficient in front
of x2 in order to see what’s going on, and then put it back at the end.
So, in this case, we’d let
sin u = 1 + x, cos u du = dx
Example 7.4.5.
Rather than put the whole ‘−4’ back, we only keep track of the ±, and take a ‘+4’
outside the square root entirely:
\begin{align*}
\int \sqrt{8x - 4x^2}\,dx &= \int \sqrt{-4(-1 + (x - 1)^2)}\,dx \\
&= 2 \int \sqrt{-(-1 + (x - 1)^2)}\,dx = 2 \int \sqrt{1 - (x - 1)^2}\,dx
\end{align*}
Rule 7.4.3.
x = a sin(θ) dx = a cos(θ) dθ
This will transform the integrand to a trigonometric function. If the new integrand
can’t be integrated on sight then the tan-half-angle substitution described
below will generally transform it into a more tractable algebraic integrand.
Example 7.4.6.
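[author=wikibooks, file =text_files/trigonometric_subst]
Find the integral of $\sqrt{1 - x^2}$,
\begin{align*}
\int_0^1 \sqrt{1 - x^2}\,dx &= \int_0^{\pi/2} \sqrt{1 - \sin^2 \theta} \cos \theta\,d\theta \\
&= \int_0^{\pi/2} \cos^2 \theta\,d\theta \\
&= \frac{1}{2} \int_0^{\pi/2} (1 + \cos 2\theta)\,d\theta \\
&= \frac{\pi}{4}
\end{align*}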
Example 7.4.7.
[author=wikibooks, file =text_files/trigonometric_subst]
Find the integral of $\sqrt{1 + x}/\sqrt{1 - x}$. We first rewrite this as
\[ \sqrt{\frac{1 + x}{1 - x}} = \sqrt{\frac{1 + x}{1 + x} \cdot \frac{1 + x}{1 - x}} = \frac{1 + x}{\sqrt{1 - x^2}} \]
Rule 7.4.4.
[author=wikibooks, file =text_files/trigonometric_subst]
If the integrand contains a factor of the form $\sqrt{x^2 - a^2}$ we use the substitution
\[ x = a \sec \theta, \qquad dx = a \sec \theta \tan \theta\,d\theta, \qquad \sqrt{x^2 - a^2} = a \tan \theta \]
This will transform the integrand to a trigonometric function. If the new integrand
can’t be integrated on sight then another substitution may transform it to
a more tractable algebraic integrand.
Example 7.4.8.
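[author=wikibooks, file =text_files/trigonometric_subst]
Find the integral of $\sqrt{x^2 - 1}/x$.
We use substitution, with $z > 1$ and $\alpha = \sec^{-1} z$, so that $\tan \alpha = \sqrt{\sec^2 \alpha - 1} = \sqrt{z^2 - 1}$:
\begin{align*}
\int_1^z \frac{\sqrt{x^2 - 1}}{x}\,dx &= \int_0^{\alpha} \frac{\tan \theta}{\sec \theta} \sec \theta \tan \theta\,d\theta \\
&= \int_0^{\alpha} \tan^2 \theta\,d\theta \\
&= [\tan \theta - \theta]_0^{\alpha} \\
&= \tan \alpha - \alpha \\
&= \sqrt{z^2 - 1} - \sec^{-1} z
\end{align*}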
Since the integrand is approximately 1 for large x we should expect the integral
at large z to be z plus a constant. It is actually z − π/2, as we expected. We can
use this line of reasoning to check our calculations.
Example 7.4.9.
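[author=wikibooks, file =text_files/trigonometric_subst]
Find the integral of $\sqrt{x^2 - 1}/x^2$.
Note that the integrand is approximately $1/x$ for large $x$, so the antiderivative
should be approximately $\ln x$. Using the substitution we find, with $z > 1$ and $\alpha = \sec^{-1} z$,
\[ \int_1^z \frac{\sqrt{x^2 - 1}}{x^2}\,dx = \int_0^{\alpha} \frac{\tan \theta}{\sec^2 \theta} \sec \theta \tan \theta\,d\theta = \int_0^{\alpha} \frac{\sin^2 \theta}{\cos \theta}\,d\theta \]
We can now integrate by parts
\begin{align*}
\int_1^z \frac{\sqrt{x^2 - 1}}{x^2}\,dx &= -[\tan \theta \cos \theta]_0^{\alpha} + \int_0^{\alpha} \sec \theta\,d\theta \\
&= -\sin \alpha + [\ln(\sec \theta + \tan \theta)]_0^{\alpha} \\
&= \ln(\sec \alpha + \tan \alpha) - \sin \alpha \\
&= \ln\left( z + \sqrt{z^2 - 1} \right) - \frac{\sqrt{z^2 - 1}}{z}
\end{align*}
which for large $z$ behaves like $\ln z + \ln 2 - 1$, just as expected.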
Rule 7.4.5.
Example 7.4.10.
[author=wikibooks, file =text_files/trigonometric_subst]
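Find the integral of $(x^2 + a^2)^{-3/2}$.
We make the substitution, with $z > 0$ and $\alpha = \tan^{-1}(z/a)$:
\begin{align*}
\int_0^z \left( x^2 + a^2 \right)^{-3/2} dx &= a^{-2} \int_0^{\alpha} \cos \theta\,d\theta \\
&= a^{-2} [\sin \theta]_0^{\alpha} \\
&= a^{-2} \sin \alpha \\
&= a^{-2} \frac{z/a}{\sqrt{1 + z^2/a^2}} = \frac{1}{a^2} \frac{z}{\sqrt{a^2 + z^2}}
\end{align*}
If the integral is
\[ I = \int_0^z \sqrt{x^2 + a^2}\,dx, \qquad z > 0 \]
then on making this substitution we find, with $\alpha = \tan^{-1}(z/a)$,
\begin{align*}
I &= a^2 \int_0^{\alpha} \sec^3 \theta\,d\theta \\
&= a^2 \int_0^{\alpha} \sec \theta\,d\tan \theta \\
&= a^2 [\sec \theta \tan \theta]_0^{\alpha} - a^2 \int_0^{\alpha} \sec \theta \tan^2 \theta\,d\theta \\
&= a^2 \sec \alpha \tan \alpha - a^2 \int_0^{\alpha} \sec^3 \theta\,d\theta + a^2 \int_0^{\alpha} \sec \theta\,d\theta \\
&= a^2 \sec \alpha \tan \alpha - I + a^2 \int_0^{\alpha} \sec \theta\,d\theta
\end{align*}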
Example 7.4.11.
[author=wikibooks, file =text_files/trigonometric_subst]
Consider the problem
\[ \int \frac{1}{x^2 + a^2}\,dx \]
with the substitution $x = a \tan(\theta)$, we have $dx = a \sec^2 \theta\,d\theta$, so that
\[ \int \frac{1}{x^2 + a^2}\,dx = \frac{\arctan(x/a)}{a} \]
Exercises
1. Tell what trig substitution to use for $\int x^8 \sqrt{x^2 - 1}\,dx$
2. Tell what trig substitution to use for $\int \sqrt{25 + 16x^2}\,dx$
3. Tell what trig substitution to use for $\int \sqrt{1 - x^2}\,dx$
4. Tell what trig substitution to use for $\int \sqrt{9 + 4x^2}\,dx$
5. Tell what trig substitution to use for $\int x^9 \sqrt{x^2 + 1}\,dx$
6. Tell what trig substitution to use for $\int x^8 \sqrt{x^2 - 1}\,dx$
7.5. OVERVIEW OF INTEGRATION 243
• Familiarize yourself with a list of basic anti-derivatives, like the one given
in class or elsewhere in these notes. This does mean memorizing part of
the list. The part of the list that you don’t memorize you should at least
recognize.
• Try u-substitution.
– Trigonometric functions.
– Rational functions.
– Integration by parts.
– Radicals ($\sqrt{\pm x^2 \pm a^2}$ is in 7.3, and $\sqrt[n]{ax + b}$ often reduces to a rational
function and 7.4 via $u = \sqrt[n]{ax + b}$).
• $\int_{-\infty}^{b} f(x)\,dx = \lim_{a \to -\infty} \int_a^b f(x)\,dx$. If we can find $F(x)$ then this equals
$\lim_{a \to -\infty} F(x) \big|_a^b$.
• $\int_a^{\infty} f(x)\,dx = \lim_{b \to \infty} \int_a^b f(x)\,dx$. If we can find $F(x)$ then this equals
$\lim_{b \to \infty} F(x) \big|_a^b$.
• $\int_{-\infty}^{\infty} = \int_{-\infty}^{0} + \int_0^{\infty}$ where both of the integrals on the right hand side have
to exist.
• If $x = c$ is a VA (vertical asymptote) then $\int_a^c f(x)\,dx = \lim_{t \to c^-} \int_a^t f(x)\,dx$. If we can find $F(x)$
then this equals $\lim_{t \to c^-} F(x) \big|_a^t$.
• If $x = c$ is a VA then $\int_c^b f(x)\,dx = \lim_{t \to c^+} \int_t^b f(x)\,dx$. If we can find $F(x)$
then this equals $\lim_{t \to c^+} F(x) \big|_t^b$.
• If $x = c$ is a VA and $c$ is in $(a, b)$ then $\int_a^b = \int_a^c + \int_c^b$ and both of these
integrals have to exist.
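The limit definitions in this list can be watched numerically. The sketch below uses two integrands chosen here purely for illustration (they are not from the text): $1/x^2$ on $[1, \infty)$, and $1/\sqrt{x}$ on $(0, 1]$, which has a vertical asymptote at $x = 0$. Both have simple antiderivatives $F$.

```python
import math

def tail_integral(b):
    """Integral of 1/x**2 from 1 to b, using the antiderivative F(x) = -1/x."""
    F = lambda x: -1.0 / x
    return F(b) - F(1.0)

def near_asymptote_integral(t):
    """Integral of 1/sqrt(x) from t to 1, using F(x) = 2*sqrt(x)."""
    F = lambda x: 2.0 * math.sqrt(x)
    return F(1.0) - F(t)

# Let b grow, and let t shrink toward the asymptote at 0
tails = [tail_integral(b) for b in (10.0, 1e3, 1e6)]
near = [near_asymptote_integral(t) for t in (1e-1, 1e-3, 1e-6)]
print(tails)
print(near)
```

The first list creeps up toward 1 and the second toward 2, which are the values the limit definitions assign to these two improper integrals.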
Chapter 8
Note that the latter expression is the formula for the slope of the ‘chord’ or
‘secant’ line connecting the two points (a, f (a)) and (b, f (b)) on the graph of f .
And the $f'(c)$ can be interpreted as the slope of the tangent line to the curve at
the point (c, f (c)).
In many traditional scenarios a person is expected to commit the statement of
the Mean Value Theorem to memory. And be able to respond to issues like ‘Find
a point c in the interval [0, 1] satisfying the conclusion of the Mean Value Theorem
for the function f (x) = x2 .’ This is pointless and we won’t do it.
Discussion.
246 CHAPTER 8. TAYLOR POLYNOMIALS AND SERIES
k! = 1 · 2 · 3 · 4 · . . . · (k − 1) · k
Comment.
Comment.
[author=garrett, file =text_files/taylor_poly_formula]
There are many other possible forms for the error/remainder term. The one here
was chosen partly because it resembles the other terms in the main part of the
expansion.
Comment.
Comment.
[author=garrett, file =text_files/taylor_poly_formula]
The general idea here is to approximate ‘fancy’ functions by polynomials, especially
if we restrict ourselves to a fairly small interval around some given point. (That
‘approximation by differentials’ circus was a very crude version of this idea).
It is at this point that it becomes relatively easy to ‘beat’ a calculator, in the
sense that the methods here can be used to give whatever precision is desired.
So at the very least this methodology is not as silly and obsolete as some earlier
traditional examples.
But even so, there is more to this than getting numbers out: it ought to be
of some intrinsic interest that pretty arbitrary functions can be approximated as
well as desired by polynomials, which are so readily computable (by hand or by
machine)!
One element under our control is choice of how high degree polynomial to use.
Typically, the higher the degree (meaning more terms), the better the approxi-
mation will be. (There is nothing comparable to this in the ‘approximation by
differentials’).
Of course, for all this to really be worth anything either in theory or in practice,
we do need a tangible error estimate, so that we can be sure that we are within
whatever tolerance/error is required. (There is nothing comparable to this in the
‘approximation by differentials’, either).
And at this point it is not at all clear what exactly can be done with such
formulas. For one thing, there are choices.
8.2. TAYLOR POLYNOMIALS: FORMULAS 249
Notation.
[author=duckworth, file =text_files/taylor_poly_formula]
Recall the notation $f^{(k)}$ means the $k$th derivative of $f$. Recall the definition of $n!$:
$0! = 1$, $1! = 1$, $2! = 2$, $3! = 3 \cdot 2$, $4! = 4 \cdot 3 \cdot 2$ and in general $k! = k(k - 1) \cdots 3 \cdot 2$.
Theorem 8.2.1.
[author= duckworth , file =text_files/taylor_poly_formula]
If f (x) is a nice function near x = 0, then f (x) may be approximated by the
following degree n polynomial
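For reference, the polynomial in question is the standard degree-$n$ Maclaurin polynomial. Written with the notation $f^{(k)}$ and $k!$ recalled above, it reads as follows (the label $P_n$ is introduced here only for convenience and is an assumption, not the text's own notation):

```latex
\[
P_n(x) = f(0) + f'(0)\,x + \frac{f''(0)}{2!}\,x^2 + \frac{f^{(3)}(0)}{3!}\,x^3 + \dots + \frac{f^{(n)}(0)}{n!}\,x^n
       = \sum_{k=0}^{n} \frac{f^{(k)}(0)}{k!}\,x^k
\]
```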
Comment.
Comment.
Comment.
[author=duckworth, file =text_files/taylor_poly_formula]
What does “nice” mean in Theorem 8.2.1? It means that f has as many derivatives
as we want, all continuous, on some open interval containing x = 0.
Example 8.2.1.
[author=duckworth, file =text_files/taylor_poly_formula]
Let’s find the Maclaurin polynomial for $f(x) = \sin(x)$. For the above recipe we
need to calculate $f^{(k)}(x)$, i.e. a bunch of derivatives, and we need to calculate
Thus we have
\[ \sin(x) = x - \frac{1}{3!} x^3 + \frac{1}{5!} x^5 - \frac{1}{7!} x^7 + \dots \]
We’ll worry about how to write the “last” term later, and we’ll worry about Σ
notation later. Note that only odd terms remain in this polynomial. That’s
because sin(x) is an odd function.
Just to see how good this approximation is, let’s take a look at some graphs.
Let’s graph $y_1 = \sin(x)$, $y_2 = x - \frac{1}{3!} x^3$, $y_3 = x - \frac{1}{3!} x^3 + \frac{1}{5!} x^5$ and
$y_4 = x - \frac{1}{3!} x^3 + \frac{1}{5!} x^5 - \frac{1}{7!} x^7$.
Comment.
[author=duckworth, file =text_files/taylor_poly_formula]
What about the “last” term in Example 8.2.1? Judging from the above pattern
we know it will be odd. We can write any odd number as $2n + 1$, so the last term
will be of the form $\pm \frac{1}{(2n+1)!} x^{2n+1}$. But that’s not very satisfying, is it “+” or is
it “−”? Well, that alternates. The first term is positive, the next is negative, the
next positive, etc. So we want a formula that alternates like this between positive
and negative. The most common formula for this is $(-1)^n$. Thus, including the
last term we have:
\[ \text{Maclaurin poly for } \sin(x) \text{ is } x - \frac{1}{3!} x^3 + \frac{1}{5!} x^5 - \frac{1}{7!} x^7 + \dots + \frac{(-1)^n}{(2n + 1)!} x^{2n+1} \]
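The general term makes the polynomial easy to evaluate in a loop. Here is a minimal sketch; the function name `maclaurin_sin` is a hypothetical choice, and `n` plays the same role as in the formula above (the last term has degree $2n + 1$).

```python
import math

def maclaurin_sin(x, n):
    """Sum of (-1)^k x^(2k+1) / (2k+1)! for k = 0, ..., n."""
    return sum(
        (-1) ** k * x ** (2 * k + 1) / math.factorial(2 * k + 1)
        for k in range(n + 1)
    )

# Compare a few polynomials with the library value of sin(1)
for n in (0, 1, 2, 3):
    print(n, maclaurin_sin(1.0, n), math.sin(1.0))
```

Taking `n = 1` gives the degree three polynomial $x - \frac{1}{3!} x^3$, and each further term brings the value closer to `math.sin`.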
Example 8.2.2.
Thus we have
\[ \cos(x) = 1 - \frac{1}{2!} x^2 + \frac{1}{4!} x^4 - \frac{1}{6!} x^6 + \dots \]
Notice that only even terms appear in this polynomial. That’s because $\cos(x)$ is
an even function.
Example 8.2.3.
[author=duckworth, file =text_files/taylor_poly_formula]
Let’s find the Maclaurin polynomial for f (x) = ex . We calculate:
\[ \begin{array}{ll} f(x) = e^x & f(0) = 1 \\ f'(x) = e^x & f'(0) = 1 \\ f''(x) = e^x & f''(0) = 1 \\ \vdots & \vdots \end{array} \]
(After this it repeats)
Thus we have
\[ e^x = 1 + x + \frac{1}{2!} x^2 + \frac{1}{3!} x^3 + \frac{1}{4!} x^4 + \dots \]
Discussion.
Theorem 8.2.2.
[author= duckworth , file =text_files/taylor_poly_formula]
If f (x) is a nice function near x = a, then f (x) may be approximated by the
following polynomial:
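For reference, the polynomial meant here is the standard degree-$n$ Taylor polynomial centered at $x = a$. In the same notation as before (the label $P_n$ is again an assumption made for this illustration), it reads:

```latex
\[
P_n(x) = f(a) + f'(a)\,(x - a) + \frac{f''(a)}{2!}\,(x - a)^2 + \dots + \frac{f^{(n)}(a)}{n!}\,(x - a)^n
       = \sum_{k=0}^{n} \frac{f^{(k)}(a)}{k!}\,(x - a)^k
\]
```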
Comment.
[author=duckworth, file =text_files/taylor_poly_formula]
The polynomial in Theorem 8.2.2 is called the Taylor polynomial of $f(x)$ at $x = a$.
People also say that the polynomial is defined at x = a or centered at x = a
or that a is the center of the polynomial.
Note that for a = 0 this formula is identical to the formula for the Maclaurin
polynomial. In other words, the Maclaurin polynomial is just a special case of the
Taylor polynomial. However, this “special” case is the one which we will see most
often.
Example 8.2.4.
Thus we have
Exercises
1. Write the first three terms of the Taylor series at 0 of f (x) = 1/(1 + x).
2. Write the first three terms of the Taylor series at 2 of f (x) = 1/(1 − x).
3. Write the first three terms of the Taylor series at 0 of $f(x) = e^{\cos x}$.
8.3. CLASSIC EXAMPLES OF TAYLOR POLYNOMIALS 253
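1. $\frac{1}{1 - x} = 1 + x + x^2 + x^3 + x^4 + x^5 + x^6 + \dots$
2. $e^x = 1 + \frac{x}{1!} + \frac{x^2}{2!} + \frac{x^3}{3!} + \frac{x^4}{4!} + \dots$
3. $\cos x = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \frac{x^6}{6!} + \frac{x^8}{8!} \dots$
4. $\sin x = \frac{x}{1!} - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \dots$
5. $\log(1 + x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{4} + \frac{x^5}{5} - \frac{x^6}{6} + \dots$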
where here the dots mean to continue to whatever term you want, then stop, and
stick on the appropriate remainder term.
It is entirely reasonable if you can’t really see that these are what you’d get,
but in any case you should do the computations to verify that these are right. It’s
not so hard.
Note that the expansion for cosine has no odd powers of x (meaning that
the coefficients are zero), while the expansion for sine has no even powers of x
(meaning that the coefficients are zero).
Comment.
[author=garrett, file =text_files/taylor_examples]
At this point it is worth repeating that we are not talking about infinite sums
(series) at all here, although we do allow arbitrarily large finite sums. Rather
than worry over an infinite sum that we can never truly evaluate, we use the error
or remainder term instead. Thus, while in other contexts the dots would mean
‘infinite sum’, that’s not our concern here.
The first of these formulas you might recognize as being a geometric series, or
at least a part of one. The other patterns might be new to you. A person
would want to learn to recognize these on sight, as if by reflex!
The most straightforward way to deal with this is just to do what is indicated by
the formula: take however high order derivatives you need and plug in. However,
very often this is not at all the most efficient.
Especially in a situation where we are interested in a composite function of
the form f (xn ) or, more generally, f (polynomial in x) with a ‘familiar’ function
f , there are alternatives.
Example 8.4.1.
\[ e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \frac{e^c}{4!} x^4 \]
with some $c$ between 0 and $x$, where our choice to cut it off after that many terms
was simply a whim. But then replacing $x$ by $x^3$ gives
\[ e^{x^3} = 1 + x^3 + \frac{x^6}{2!} + \frac{x^9}{3!} + \frac{e^c}{4!} x^{12} \]
with some $c$ between 0 and $x^3$. Yes, we need to keep track of $c$ in relation to the
new $x$.
So we get a polynomial plus that funny term with the ‘c’ in it, for the remainder.
Yes, this gives us a different-looking error term, but that’s fine.
So we obtain, with relative ease, the expansion of degree eleven of this function,
which would have been horrible to obtain by repeated differentiation and direct
application of the general formula. Why ‘eleven’? Well, the error term has the
$x^{12}$ in it, which means that the polynomial itself stopped with an $x^{11}$ term. Why
didn’t we see that term? Well, evidently the coefficients of $x^{11}$, and of $x^{10}$ (not to
mention $x$, $x^2$, $x^4$, $x^5$, $x^7$, $x^8$!) are zero.
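The remainder term can be tested numerically. The sketch below assumes $x \ge 0$, so that $0 \le c \le x^3$ and hence $1 \le e^c \le e^{x^3}$; under that assumption the true error must lie between $x^{12}/4!$ and $e^{x^3} x^{12}/4!$. The sample point $x = 0.5$ is an arbitrary choice.

```python
import math

def exp_cubed_poly(x):
    """1 + x^3 + x^6/2! + x^9/3!, the polynomial part of the expansion of e^(x^3)."""
    u = x ** 3
    return 1 + u + u ** 2 / 2 + u ** 3 / 6

x = 0.5
error = math.exp(x ** 3) - exp_cubed_poly(x)
lower = x ** 12 / math.factorial(4)                      # remainder with e^c = 1
upper = math.exp(x ** 3) * x ** 12 / math.factorial(4)   # remainder with e^c = e^(x^3)
print(lower, error, upper)
```

The printed error should sit between the two bounds, which is what the remainder formula with some $c$ between 0 and $x^3$ predicts.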
Example 8.4.2.
[author=garrett, file =text_files/taylor_calculation_tricks]
As another example, let’s get the degree-eight expansion of $\cos x^2$ at 0. Of course,
it makes sense to use
\[ \cos x = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} + \frac{-\sin c}{5!} x^5 \]
with $c$ between 0 and $x$, where we note that $-\sin x$ is the fifth derivative of $\cos x$.
Replacing $x$ by $x^2$, this becomes
\[ \cos x^2 = 1 - \frac{x^4}{2!} + \frac{x^8}{4!} + \frac{-\sin c}{5!} x^{10} \]
Exercises
1. Use a shortcut to compute the Taylor expansion at 0 of $\cos(x^5)$.
2. Use a shortcut to compute the Taylor expansion at 0 of $e^{(x^2 + x)}$.
3. Use a shortcut to compute the Taylor expansion at 0 of $\log\left( \frac{1}{1 - x} \right)$.
Theorem 8.5.1.
[author= duckworth , file =text_files/new_taylor_series_from_old]
Suppose $f(x) \approx c_0 + c_1 (x - a) + c_2 (x - a)^2 + c_3 (x - a)^3 + \dots$. Then we may find
polynomial approximations for a function $g(x)$ as follows:
1. If $g(x) = f(x^2)$ (or $f(2x)$, or $f(-x^2)$ or . . . ) the polynomial for $g(x)$ is found
by substituting $x^2$ (or $2x$, or $-x^2$ or . . . ) in place of $x$ in the polynomial for
$f(x)$.
2. If $g(x)$ is the anti-derivative of $f(x)$ (or the derivative) then the polynomial
for $g(x)$ is found by taking the anti-derivative (or the derivative) of the
polynomial for $f(x)$.
3. If $g(x)$ equals $f(x)$ times $x$ (or $\sin(x)$, or $e^x$, or . . . , or divided by one of
these) then the polynomial for $g(x)$ is found by multiplying by $x$ (or by the
polynomial for $\sin(x)$, or the polynomial for $e^x$) or by dividing by one of
these.
Example 8.5.1.
Example 8.5.2.
anti-derivative.
\begin{align*}
\ln(x) &= \int \frac{1}{x}\,dx + C \\
&= \int \left( 1 - (x - 1) + (x - 1)^2 - (x - 1)^3 + \dots \right) dx + C \\
&= x - \frac{(x - 1)^2}{2} + \frac{(x - 1)^3}{3} - \frac{(x - 1)^4}{4} + \dots + C
\end{align*}
But what’s $C$? Well, we know that we should have $\ln(1) = 0$. Plugging this in we
get
\[ 0 = 1 - 0 + 0 - 0 + \dots + C \]
so $C = -1$ and we can write
\[ \ln(x) = (x - 1) - \frac{(x - 1)^2}{2} + \frac{(x - 1)^3}{3} - \frac{(x - 1)^4}{4} + \dots \]
Example 8.5.3.
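\[ \frac{1}{x} = 1 - (x - 1) + (x - 1)^2 - (x - 1)^3 + (x - 1)^4 - \dots \]
↓
\begin{align*}
\frac{1}{1 + x^2} &= 1 - (1 + x^2 - 1) + (1 + x^2 - 1)^2 - (1 + x^2 - 1)^3 + (1 + x^2 - 1)^4 - \dots \\
&= 1 - x^2 + x^4 - x^6 + x^8 - \dots
\end{align*}
↓
\begin{align*}
\tan^{-1}(x) &= \int \left( 1 - x^2 + x^4 - x^6 + x^8 - \dots \right) dx + C \\
&= x - \frac{x^3}{3} + \frac{x^5}{5} - \frac{x^7}{7} + \dots + C
\end{align*}
Again you can find $C$ by plugging in $\tan^{-1}(0) = 0$. In this case you find that
$C = 0$, thus:
\[ \tan^{-1}(x) = x - \frac{x^3}{3} + \frac{x^5}{5} - \frac{x^7}{7} + \dots \]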
Example 8.5.4.
[author=duckworth, file =text_files/new_taylor_series_from_old]
Find the Maclaurin series for $e^x \sin(x)$. Actually, just find the first few terms. The
idea here is just that you multiply the polynomials for $e^x$ and $\sin(x)$. So we have
\[ e^x \sin(x) = \left( 1 + x + \frac{x^2}{2} + \frac{x^3}{3!} + \dots \right) \left( x - \frac{x^3}{3!} + \frac{x^5}{5!} - \dots \right) \]
Not everyone knows how to multiply things like this together. If you apply the
distributive law over and over again the result is this: pick a term on the left,
multiply it by each term on the right, then move to the next term on the left.
Thus we get:
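\begin{align*}
e^x \sin(x) ={}& 1 \cdot x - 1 \cdot \frac{x^3}{3!} + 1 \cdot \frac{x^5}{5!} - \dots && \text{(1 times the polynomial on the right)} \\
&+ x \cdot x - x \cdot \frac{x^3}{3!} + x \cdot \frac{x^5}{5!} - \dots && \text{($x$ times the polynomial on the right)} \\
&+ \frac{x^2}{2} \cdot x - \frac{x^2}{2} \cdot \frac{x^3}{3!} + \frac{x^2}{2} \cdot \frac{x^5}{5!} - \dots && \text{($\tfrac{x^2}{2}$ times the polynomial on the right)} \\
&+ \frac{x^3}{3!} \cdot x - \frac{x^3}{3!} \cdot \frac{x^3}{3!} + \frac{x^3}{3!} \cdot \frac{x^5}{5!} - \dots && \text{($\tfrac{x^3}{3!}$ times the polynomial on the right)} \\
={}& x - \frac{x^3}{3!} + \frac{x^5}{5!} - \dots \\
&+ x^2 - \frac{x^4}{3!} + \frac{x^6}{5!} - \dots \\
&+ \frac{x^3}{2} - \frac{x^5}{2 \cdot 3!} + \frac{x^7}{2 \cdot 5!} - \dots \\
&+ \frac{x^4}{3!} - \frac{x^6}{3! \cdot 3!} + \frac{x^8}{3! \cdot 5!} - \dots
\end{align*}
Now one collects the constant terms in front, then all the $x$ terms, then all the $x^2$
terms, etc.
\[ = x + x^2 + \left( -\frac{1}{3!} + \frac{1}{2} \right) x^3 + \left( -\frac{1}{3!} + \frac{1}{3!} \right) x^4 + \left( \frac{1}{5!} - \frac{1}{2 \cdot 3!} + \frac{1}{4!} \right) x^5 + \dots \]
(The $\frac{1}{4!}$ in the $x^5$ coefficient comes from the next term $\frac{x^4}{4!}$ of $e^x$ times $x$; that row was not written out above.)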
Note that here there is not a clear pattern as to what the next term would look
like.
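Collecting terms by hand is error-prone, so it can help to let a short program multiply the two polynomials and keep only the low-degree terms. This is a sketch assuming coefficient lists indexed by power (entry `k` is the coefficient of $x^k$) and exact fractions; `multiply_truncated` is a hypothetical helper name.

```python
from fractions import Fraction
from math import factorial

def multiply_truncated(p, q, degree):
    """Multiply coefficient lists (index k holds the x^k coefficient), keeping terms up to x^degree."""
    result = [Fraction(0)] * (degree + 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            if i + j <= degree:
                result[i + j] += a * b
    return result

N = 5
# Maclaurin coefficients of e^x and sin(x) up to x^N
exp_coeffs = [Fraction(1, factorial(k)) for k in range(N + 1)]
sin_coeffs = [
    Fraction(0) if k % 2 == 0 else Fraction((-1) ** (k // 2), factorial(k))
    for k in range(N + 1)
]
product = multiply_truncated(exp_coeffs, sin_coeffs, N)
print(product)
```

Because both input lists go up to $x^5$, every coefficient of the product up to $x^5$ is complete: the result corresponds to $x + x^2 + \frac{1}{3} x^3 + 0 \cdot x^4 - \frac{1}{30} x^5$.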
As a special case of the last question, we can consider the question of approximating
$f(x)$ to within a given tolerance/error in terms of $f(x_o)$, $f'(x_o)$, $f''(x_o)$
and higher derivatives of $f$ evaluated at a given point $x_o$.
In ‘real life’ this last question is not really so important as the third of the
questions listed above, since evaluation at just one point can often be achieved
more simply by some other means. Having a polynomial approximation that
works all along an interval is a much more substantive thing than evaluation at a
single point.
It must be noted that there are also other ways to approach the issue of best
approximation by a polynomial on an interval. And beyond worry over approxi-
mating the values of the function, we might also want the values of one or more
of the derivatives to be close, as well. The theory of splines is one approach to
approximation which is very important in practical applications.
Discussion.
Definition 8.6.1.
In other words, Rn (x) is the gap between the original function f and the polyno-
mial.
Comment.
1. Note, usually we will find $M$ by finding the absolute max and min of $f^{(n+1)}(x)$
on the interval $[a, b]$. Sometimes, however, we can find a value for $M$ without
calculating absolute max’s and min’s. For example, if $f(x)$ equals $\sin(x)$,
then we can always take $M = 1$.
2. Note that this theorem gives some impression of why Maclaurin approximations
get better by using more terms. As $n$ gets bigger, the fraction $\frac{M}{(n+1)!}$
will almost always get smaller. Why? Because $(n + 1)!$ gets really big. O.K.,
so does $\frac{M}{(n+1)!}$ always get smaller? Well, to be rigorous, $M$ might change
with $n$. Off the top of my head, I can’t think of a function where $M$ would
change enough to prevent $\frac{M}{(n+1)!}$ from getting smaller, but I believe such a
function exists.
3. Note, we are often interested in bounding $R_n(x)$ on an interval; in such a
case we replace $|x|^{n+1}$ by its absolute max on the interval. In other words,
if the interval is $[a, b]$, we’ll replace $|x|^{n+1}$ by $|a|^{n+1}$ or $|b|^{n+1}$, whichever is
bigger.
Example 8.6.1.
[author=duckworth, file =text_files/taylor_questions]
Consider the Maclaurin polynomial for sin(x).
(a) Find an upper bound for the error of approximating sin(.5) using the degree
three Maclaurin polynomial.
(b) Find n so that the error would be at most .00001.
Solution: (a) By the work above the degree three polynomial is $x - \frac{1}{3!}x^3$. Thus
the error is
\[ R_3(.5) \le \frac{M}{4!}(.5)^4 \]
where $M$ is an upper bound on the fourth derivative of $\sin(x)$. You should always
remember, for $\sin(x)$ and $\cos(x)$, you can always take 1 as an upper bound on any
derivative. So, let $M = 1$. Then we have
\[ R_3(.5) \le \frac{1}{4!}(.5)^4 = .0026 \]
To make this more concrete, let’s calculate the approximation we’re discussing
(note that so far in this problem, we’ve calculated the error without ever knowing
what the approximation is; this is kind of strange). The approximation is
\[ \sin(.5) \approx .5 - \frac{(.5)^3}{3!} = .4792 \]
and our calculation above says that the “real” number is within .0026 of this.
(b) Now we don’t know n, but we use the same value for M . So we want:
\[ \frac{1}{(n+1)!}(.5)^{n+1} \le .00001 \]
Now, the truth is, I don’t know how to solve this for n. So, I’ll just guess and
check. Note, I’ll just guess odd values for n since there are no even terms in sin(x).
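\[ n = 5 \Rightarrow \frac{1}{6!}(.5)^6 = .000022 \]
\[ n = 7 \Rightarrow \frac{1}{8!}(.5)^8 = .968 \times 10^{-7} \]
The bound for $n = 5$ is still a little too big, but the bound for $n = 7$ is below
$.00001$, so $n = 7$ works.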
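Guess and check is exactly the kind of job a few lines of code can do. Here is a small Python sketch of that search, assuming (as in the solution above) that $M = 1$ and that only odd degrees are tried; the function names `remainder_bound` and `first_degree_within` are invented for this illustration.

```python
from math import factorial

def remainder_bound(x, n, M=1.0):
    """Upper bound M * |x|^(n+1) / (n+1)! on the size of R_n(x)."""
    return M * abs(x) ** (n + 1) / factorial(n + 1)

def first_degree_within(x, tolerance, candidates, M=1.0):
    """Return the first candidate degree whose bound is within tolerance."""
    for n in candidates:
        if remainder_bound(x, n, M) <= tolerance:
            return n
    return None

odd_degrees = range(1, 40, 2)  # sin(x) has no even-degree terms
bound_degree_3 = remainder_bound(0.5, 3)
needed_degree = first_degree_within(0.5, 0.00001, odd_degrees)
print(bound_degree_3, needed_degree)
```

With these inputs the bound for degree three is about .0026 and the first odd degree that meets the tolerance is 7, matching the hand calculation.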
Comment.
have). Then I would need the first 8 nonzero terms of the sine polynomial (i.e.
up to degree 15). That’s a lot more calculation than before, especially since this
would mean raising something to the 15th power. So is there another shortcut?
Sure, I can think of another approach, and although I'm sure that the calculator
uses something vaguely like this, it probably is much more sophisticated
(i.e. efficient, but complicated) than what I'm presenting here. The goal is to
reduce 1.47 to something closer to zero. One trick would be to use the identity
sin(x) = cos(x − π/2). Then sin(1.47) = cos(−0.100796327) and now I would use
the cosine polynomial, probably with only a few terms since what I’m plugging in
is close to zero. If I use the first four nonzero terms of the Maclaurin polynomial
I find that cos(−0.100796327) ≈ 0.9949243497.
Using all of the above tricks would reduce calculations of sin(x), and cos(x),
for all values of x to calculations involving only x between −π/4 and π/4.
What if this still isn’t good enough? What if calculating sin(.78) takes too long?
Note that .78 is near π/4 and can’t be made any closer to zero by subtracting π,
or 2π, or π/2 etc. Well, then you could use other identities. Remember, in
trigonometry, there are a million identities! So, you could use the double angle
identity: sin(x) = 2 sin(x/2) cos(x/2). So, you could calculate sin(.78/2) and
cos(.78/2) using Maclaurin series, and then multiply these together and multiply
by 2.
Well, you get the idea. If time matters (which it usually does) and if calcu-
lations take time (which they always do) and if you’re doing lots of calculations
(which is probably the case in most interesting problems) then it’s worth your
time to optimize the process using whatever tricks you can. The tricks I’ve shown
you here are “naive” in the sense that they didn’t use anything more than basic
trigonometry. In real life, there are whole books and classes full of tricks to speed
calculations. This topic is part of numerical analysis and numerical recipes.
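To make the shifting trick concrete, here is a minimal Python sketch of the sin(1.47) calculation described above. It is only an illustration of the idea, not what any real calculator does; the names `cos_maclaurin` and `sin_via_shift` are made up here.

```python
import math

def cos_maclaurin(x, terms):
    """Sum of the first `terms` nonzero terms of the Maclaurin series of cos."""
    return sum((-1) ** k * x ** (2 * k) / math.factorial(2 * k)
               for k in range(terms))

def sin_via_shift(x, terms=4):
    """Approximate sin(x) as cos(x - pi/2); useful when x is near pi/2."""
    return cos_maclaurin(x - math.pi / 2, terms)

approx = sin_via_shift(1.47)
print(approx, math.sin(1.47))
```

Four terms of the cosine polynomial at the shifted input already agree with the library value of sin(1.47) to many decimal places, whereas plugging 1.47 straight into the sine polynomial would need many more terms.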
Discussion.
[author=garrett, file =text_files/taylor_error]
This section treats a simple example of the second kind of question mentioned
above: ‘Given a Taylor polynomial approximation to a function, expanded at some
given point, and given an interval around that given point, within what tolerance
does the Taylor polynomial approximate the function on that interval?’
Example 8.7.1.
[author=garrett, file =text_files/taylor_error]
Let's look at the approximation $1 - \frac{x^2}{2} + \frac{x^4}{4!}$ to $f(x) = \cos x$ on the interval $[-\frac{1}{2}, \frac{1}{2}]$.
We might ask ‘Within what tolerance does this polynomial approximate cos x on
that interval?’
To answer this, we first recall that the error term we have after those first
terms is
\[ \frac{-\sin c}{5!}x^5 \]
For x in the indicated interval, we want to know the worst-case scenario for the size
of this thing. A sloppy but good and simple estimate on sin c is that | sin c| ≤ 1,
regardless of what c is. This is a very happy kind of estimate because it’s not so
bad and because it doesn't depend at all upon $x$. And the biggest that $x^5$ can be
is $(\frac{1}{2})^5 \approx 0.03$. Then the error is estimated as
\[ \left|\frac{-\sin c}{5!}x^5\right| \le \frac{1}{2^5 \cdot 5!} \le 0.0003 \]
This is not so bad at all!
We could have been a little clever here, taking advantage of the fact that a lot
of the terms in the Taylor expansion of cosine at 0 are already zero. In particular,
we could choose to view the original polynomial $1 - \frac{x^2}{2} + \frac{x^4}{4!}$ as including the
fifth-degree term of the Taylor expansion as well, which simply happens to be zero, so
is invisible. Thus, instead of using the remainder term with the ‘5’ in it, we are
actually entitled to use the remainder term with a ‘6’. This typically will give a
better outcome.
That is, instead of the remainder we had just above, we would have an error
term
\[ \frac{-\cos c}{6!}x^6 \]
Again, in the worst-case scenario $|-\cos c| \le 1$. And still $|x| \le \frac{1}{2}$, so we have the
error estimate
\[ \left|\frac{-\cos c}{6!}x^6\right| \le \frac{1}{2^6 \cdot 6!} \le 0.000022 \]
This is less than a tenth as much as in the first version.
But what happened here? Are there two different answers to the question of
how well that polynomial approximates the cosine function on that interval? Of
course not. Rather, there were two approaches taken by us to estimate how well
it approximates cosine. In fact, we still do not know the exact error!
The point is that the second estimate (being a little wiser) is closer to the
truth than the first. The first estimate is true, but is a weaker assertion than we
are able to make if we try a little harder.
This already illustrates the point that ‘in real life’ there is often no single ‘right’
or ‘best’ estimate of an error, in the sense that the estimates that we can obtain by
practical procedures may not be perfect, but represent a trade-off between time,
effort, cost, and other priorities.
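One way to see that both estimates are true while neither is the exact error is to sample the interval numerically. The sketch below is an illustration added here, not part of the original argument; it compares the two bounds with the largest error actually observed at 1001 sample points.

```python
import math

def cos_poly(x):
    """The polynomial 1 - x^2/2 + x^4/4! from the example."""
    return 1 - x ** 2 / 2 + x ** 4 / math.factorial(4)

half = 0.5
bound_with_5 = half ** 5 / math.factorial(5)  # remainder term with the '5'
bound_with_6 = half ** 6 / math.factorial(6)  # remainder term with the '6'

# Sample the interval [-1/2, 1/2] and record the largest observed error.
samples = [-half + k / 1000 for k in range(1001)]
largest_observed = max(abs(math.cos(x) - cos_poly(x)) for x in samples)
print(bound_with_5, bound_with_6, largest_observed)
```

The observed error sits just under the second bound, which in turn sits well under the first, in line with the remark that the second estimate is closer to the truth.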
Exercises
Example 8.8.1.
25 ≤ c ≤ x
So, in the worst-case scenario, the value of $c^{-3/2}$ is at most $25^{-3/2} = 1/125$.
And we can rearrange the equation:
\[ \sqrt{x} - \left[5 + \frac{1}{10}(x - 25)\right] = -\frac{1}{8}\,\frac{1}{c^{3/2}}\,(x - 25)^2 \]
Taking absolute values in order to talk about error, this is
\[ \left|\sqrt{x} - \left[5 + \frac{1}{10}(x - 25)\right]\right| = \left|\frac{1}{8}\,\frac{1}{c^{3/2}}\,(x - 25)^2\right| \]
Now let's use our estimate $\left|\frac{1}{c^{3/2}}\right| \le 1/125$ to write
\[ \left|\sqrt{x} - \left[5 + \frac{1}{10}(x - 25)\right]\right| \le \left|\frac{1}{8}\,\frac{1}{125}\,(x - 25)^2\right| \]
OK, having done this simplification, now we can answer questions like: For
what range of $x \ge 25$ does $5 + \frac{1}{10}(x - 25)$ approximate $\sqrt{x}$ to within .001? We
cannot hope to tell exactly, but only to give a range of values of $x$ for which we can
be sure based upon our estimate. So the question becomes: solve the inequality
\[ \left|\frac{1}{8}\,\frac{1}{125}\,(x - 25)^2\right| \le .001 \]
(with $x \ge 25$). Multiplying out by the denominator of $8 \cdot 125$ gives (by coincidence?)
\[ |x - 25|^2 \le 1 \]
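Together with $x \ge 25$, the inequality $|x - 25|^2 \le 1$ describes the range $25 \le x \le 26$. As a sanity check (a sketch added for illustration, with made-up function names), the following Python snippet compares the actual error of the linear approximation with the estimate at 101 sample points of that range.

```python
import math

def tangent_line(x):
    """Linear Taylor polynomial of sqrt(x) expanded at 25."""
    return 5 + (x - 25) / 10

def error_estimate(x):
    """The estimate (1/8)(1/125)(x - 25)^2 for x >= 25."""
    return (x - 25) ** 2 / (8 * 125)

samples = [25 + k / 100 for k in range(101)]  # points in [25, 26]
actual_errors = [abs(math.sqrt(x) - tangent_line(x)) for x in samples]
print(max(actual_errors), error_estimate(26))
```

The largest actual error on the sampled range stays below .001, and at every sample the actual error is no bigger than the estimate.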
Exercises
1. For what range of values of $x$ is $x - \frac{x^3}{6}$ within 0.01 of $\sin x$?
Example 8.9.1.
[author=garrett, file =text_files/taylor_adjusting_degree]
For example, let's get a Taylor polynomial approximation to $e^x$ which is within
0.001 on the interval $[-\frac{1}{2}, +\frac{1}{2}]$. We use
\[ e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \dots + \frac{x^n}{n!} + \frac{e^c}{(n+1)!}x^{n+1} \]
for some $c$ between 0 and $x$, and where we do not yet know what we want $n$ to
be. It is very convenient here that the $n$th derivative of $e^x$ is still just $e^x$! We are
wanting to choose $n$ large-enough to guarantee that
\[ \left|\frac{e^c}{(n+1)!}x^{n+1}\right| \le 0.001 \]
for all $x$ in that interval (without knowing anything too detailed about what the
corresponding $c$'s are!).
The error term is estimated as follows, by thinking about the worst-case scenario
for the sizes of the parts of that term: we know that the exponential function
is increasing along the whole real line, so in any event $c$ lies in $[-\frac{1}{2}, +\frac{1}{2}]$ and
\[ |e^c| \le e^{1/2} \le 2 \]
(where we've not been too fussy about being accurate about how big the square
root of $e$ is!). And for $x$ in that interval we know that
\[ |x^{n+1}| \le \left(\frac{1}{2}\right)^{n+1} \]
So we are wanting to choose $n$ large-enough to guarantee that
\[ \left|\frac{e^c}{(n+1)!}\left(\frac{1}{2}\right)^{n+1}\right| \le 0.001 \]
Since
\[ \left|\frac{e^c}{(n+1)!}\left(\frac{1}{2}\right)^{n+1}\right| \le \frac{2}{(n+1)!}\left(\frac{1}{2}\right)^{n+1} \]
we can be confident of the desired inequality if we can be sure that
\[ \frac{2}{(n+1)!}\left(\frac{1}{2}\right)^{n+1} \le 0.001 \]
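The last inequality is easiest to settle by simply trying $n = 0, 1, 2, \dots$ in turn. A minimal Python sketch of that search, with a made-up helper name `worst_case_bound`, might look like this:

```python
from math import factorial

def worst_case_bound(n):
    """The estimate 2/(n+1)! * (1/2)^(n+1) from the text."""
    return 2 / factorial(n + 1) * 0.5 ** (n + 1)

n = 0
while worst_case_bound(n) > 0.001:
    n += 1
print(n, worst_case_bound(n))
```

Under this estimate the search stops at $n = 4$, where the bound is roughly 0.0005.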
Exercises
1. Determine how many terms are needed in order to have the corresponding
Taylor polynomial approximate $e^x$ to within 0.001 on the interval $[-1, +1]$.
2. Determine how many terms are needed in order to have the corresponding
Taylor polynomial approximate $\cos x$ to within 0.001 on the interval $[-1, +1]$.
3. Determine how many terms are needed in order to have the corresponding
Taylor polynomial approximate $\cos x$ to within 0.001 on the interval $[-\frac{\pi}{2}, \frac{\pi}{2}]$.
4. Determine how many terms are needed in order to have the corresponding
Taylor polynomial approximate $\cos x$ to within 0.001 on the interval
$[-0.1, +0.1]$.
5. Approximate $e^{1/2} = \sqrt{e}$ to within .01 by using a Taylor polynomial with
remainder term, expanded at 0. (Do NOT add up the finite sum you get!)
6. Approximate $\sqrt{101} = (101)^{1/2}$ to within $10^{-15}$ using a Taylor polynomial
with remainder term. (Do NOT add up the finite sum you get! One point
here is that most hand calculators do not easily give 15 decimal places. Hah!)
8.10 Integrating Taylor polynomials: first example
Example 8.10.1.
[author=garrett, file =text_files/taylor_integration]
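As a promising example: on one hand, it's not too hard to compute that
\[ \int_0^T \frac{dx}{1-x} = [-\log(1-x)]_0^T = -\log(1-T) \]
On the other hand, integrating the geometric series $\frac{1}{1-x} = 1 + x + x^2 + x^3 + \dots$
term by term from $0$ to $T$, and then renaming $T$ as $x$, gives
\[ -\log(1-x) = x + \frac{x^2}{2} + \frac{x^3}{3} + \frac{x^4}{4} + \dots \]
(For the moment let's not worry about what happens to the error term for the
Taylor polynomial).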
This little computation has several useful interpretations. First, we obtained a
Taylor polynomial for − log(1 − T ) from that of a geometric series, without going
to the trouble of recomputing derivatives. Second, from a different perspective,
we have an expression for the integral
\[ \int_0^T \frac{dx}{1-x} \]
without necessarily mentioning the logarithm: that is, with some suitable
interpretation of the trailing dots,
\[ \int_0^T \frac{dx}{1-x} = T + \frac{T^2}{2} + \frac{T^3}{3} + \frac{T^4}{4} + \dots \]
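A quick numerical check makes the "suitable interpretation of the trailing dots" less mysterious. The Python sketch below, added as an illustration with the arbitrary choice $T = 0.5$, adds up the first 40 terms and compares with $-\log(1-T)$.

```python
import math

def log_series_partial_sum(T, terms):
    """T + T^2/2 + ... + T^terms/terms."""
    return sum(T ** k / k for k in range(1, terms + 1))

T = 0.5
series_value = log_series_partial_sum(T, 40)
exact_value = -math.log(1 - T)
print(series_value, exact_value)
```

For this $T$ the two numbers agree to about twelve decimal places; for $T$ closer to 1 many more terms would be needed.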
$c$ is in the range $[0, x)$ which is inside $[0, 1)$, keeping $c$ away from 1. To do this we
might demand that $0 \le T < 1$.
For simplicity, and to illustrate the point, let's just take $0 \le T \le \frac{1}{2}$. Then in
the worst-case scenario
\[ \left|\frac{1}{(1-c)^{n+1}}\right| \le \frac{1}{(1-\frac{1}{2})^{n+1}} = 2^{n+1} \]
\[ {} \le \frac{2^{n+1}(\frac{1}{2})^{n+2}}{(n+1)(n+2)} = \frac{1}{2(n+1)(n+2)} \]
\[ \left|-\log(1-T) - \left[T + \frac{T^2}{2} + \dots + \frac{T^n}{n}\right]\right| \le \frac{1}{2(n+1)(n+2)} \]
\[ \left|-\log(1-T) - \left[T + \frac{T^2}{2} + \dots + \frac{T^n}{n}\right]\right| \le \frac{2^{n+1}T^{n+2}}{(n+1)(n+2)} \]
and the latter expression shrinks rapidly as $T$ approaches 0.
Example 8.12.1.
If we add the first three terms here we get .7666. As a rough idea of how accurate
this is, suppose we added the next term. This would change the result to .7429.
This isn’t much of a change. If we added one more term, this would change it even
less.
Discussion.
To get these coefficients we can look at Pascal's triangle. In this triangle, the
numbers on row $n$ are the coefficients used in $(a + b)^n$. You get a coefficient by
adding the two numbers above it.
               1   1
             1   2   1
           1   3   3   1
         1   4   6   4   1
       1   5  10  10   5   1
     1   6  15  20  15   6   1
This triangle is great, but what if we want to find $(a + b)^{27}$? Do we really want to
write down 27 rows of this triangle? I think not. Then, is there a closed formula
for the coefficients? Yes.
\[ \text{Define: } \binom{n}{k} := \frac{\overbrace{n(n-1)\cdots(n-k+1)}^{k\ \text{factors}}}{k!} \]
Then we have
\[ (a+b)^n = a^n + na^{n-1}b + \binom{n}{2}a^{n-2}b^2 + \binom{n}{3}a^{n-3}b^3 + \dots + nab^{n-1} + b^n \]
How does this relate to polynomials? Newton realized first that we could
replace whole numbers for n by any real numbers, and secondly, we could replace
b by x. (A critic of Newton once said that “any clever school boy could have
thought of this”!). The following theorem is due to Newton.
Theorem 8.12.1.
[author= duckworth , file =text_files/binomial_series]
For any real number $n$, and for $|x| < 1$, we have
\[ (1+x)^n = 1 + nx + \binom{n}{2}x^2 + \binom{n}{3}x^3 + \dots \]
Proof.
[author=duckworth, file =text_files/binomial_series]
To prove that the binomial series is correct one just applies the Maclaurin series
to $(1 + x)^n$. To use the binomial series for something like $(a + b)^n$ you factor
out the larger number. So suppose $a \ge b$, then we write $(a + b)^n = a^n\left(1 + \frac{b}{a}\right)^n$.
Also, $(1 - x)^n$ we treat as $(1 + u)^n$ and substitute $-x$ in for $u$. This will give an
alternating series.
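To see the binomial series at work for an exponent that is not a whole number, here is a short Python sketch. It is only an illustration: the function names are invented, and the exponent $1/2$ and the value $x = 0.2$ are arbitrary choices. The coefficient is built from $k$ factors on top and $k!$ on the bottom, exactly as in the definition of the binomial coefficient.

```python
import math

def gen_binomial(alpha, k):
    """alpha (alpha - 1) ... (alpha - k + 1) / k!, for any real alpha."""
    c = 1.0
    for i in range(k):
        c = c * (alpha - i) / (i + 1)
    return c

def binomial_series(alpha, x, terms):
    """Partial sum of the binomial series for (1 + x)^alpha."""
    return sum(gen_binomial(alpha, k) * x ** k for k in range(terms))

approx_root = binomial_series(0.5, 0.2, 20)
print(approx_root, math.sqrt(1.2))
print(gen_binomial(5, 2))
```

For whole-number exponents the same function reproduces the entries of Pascal's triangle, and for exponent $1/2$ twenty terms already match $\sqrt{1.2}$ very closely.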
Example 8.12.2.
To get the series for $\frac{\sin(x)}{\sqrt{1+x/4}}$ one would substitute $x/4$ into the series for
$(1 + x)^{-1/2}$, then multiply the result by the series for $\sin(x)$.
Chapter 9
Infinite Series
Definition 9.0.1.
9.1 Convergence
Definition 9.1.1.
Example 9.1.1.
Example 9.1.2.
Example 9.1.3.
[author=wikibooks, file =text_files/introduction_to_series_convergence]
Perhaps a more surprising and interesting fact is that for $|r| < 1$, $S_n(r)$ will
converge to a finite value. Specifically, it is possible to show that
$\lim_{n\to\infty} S_n(r) = \frac{r}{1-r}$. Indeed, consider the quantity
\[ (1-r)S_n(r) = (1-r)\sum_{i=1}^{n} r^i = \sum_{i=1}^{n} r^i - \sum_{i=2}^{n+1} r^i = r - r^{n+1} \]
Since $r^{n+1} \to 0$ as $n \to \infty$ for $|r| < 1$, this shows that
$(1 - r)S_n(r) \to r$ as $n \to \infty$. The quantity $1 - r$ is non-zero and doesn't depend
on $n$ so we can divide by it and arrive at the formula we want.
We’d like to be able to draw similar conclusions about any series.
Unfortunately, there is no simple way to sum a series. The most we will be
able to do is determine if it converges.
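The geometric series is one of the few cases where the partial sums can be compared directly with a closed formula. The following Python sketch, an added illustration that takes $S_n(r) = r + r^2 + \dots + r^n$ as in the example above, prints a few partial sums next to $\frac{r}{1-r}$.

```python
def geometric_partial_sum(r, n):
    """S_n(r) = r + r^2 + ... + r^n."""
    return sum(r ** i for i in range(1, n + 1))

r = 0.5
for n in (5, 10, 50):
    print(n, geometric_partial_sum(r, n), r / (1 - r))
```

The identity $(1-r)S_n(r) = r - r^{n+1}$ can be checked the same way for any particular $r$ and $n$.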
Example 9.1.4.
[author=wikibooks, file =text_files/introduction_to_series_convergence]
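It is obvious that for a series to converge, the $a_n$ must tend to zero, but this is not
sufficient.
Consider the harmonic series, the sum of $1/n$, and group terms
\[
\begin{aligned}
\sum_{n=1}^{2^m} \frac{1}{n} &= 1 + \frac{1}{2} + \left(\frac{1}{3} + \frac{1}{4}\right) + \left(\frac{1}{5} + \dots + \frac{1}{8}\right) + \dots + \left(\frac{1}{2^{m-1}+1} + \dots + \frac{1}{2^m}\right)\\
&\ge \frac{3}{2} + \frac{1}{4}\cdot 2 + \frac{1}{8}\cdot 4 + \dots + \frac{1}{2^m}\cdot 2^{m-1}\\
&= \frac{3}{2} + \frac{1}{2} + \frac{1}{2} + \dots + \frac{1}{2} = 1 + \frac{m}{2}
\end{aligned}
\]
As $m$ tends to infinity, so does this final sum, hence the series diverges.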
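The grouping argument says that the first $2^m$ terms of the harmonic series add up to at least $1 + m/2$. That is easy to test numerically; the sketch below is an added illustration (the helper name is made up) that prints the partial sum next to this lower bound for $m = 1, \dots, 15$.

```python
import math

def harmonic_partial_sum(count):
    """1 + 1/2 + ... + 1/count."""
    return math.fsum(1 / n for n in range(1, count + 1))

for m in range(1, 16):
    total = harmonic_partial_sum(2 ** m)
    print(m, total, 1 + m / 2)
```

The partial sums do keep growing past every bound, but only very slowly: doubling the number of terms adds a bit more than one half.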
We can also deduce something about how quickly it diverges. Using the same
grouping of terms, we can get an upper limit on the sum of the first so many terms,
the partial sums.
\[ 1 + \frac{m}{2} \le \sum_{n=1}^{2^m} \frac{1}{n} \le 1 + m \quad\text{or}\quad 1 + \frac{\log_2 m}{2} \le \sum_{n=1}^{m} \frac{1}{n} \le 1 + \log_2 m \]
and the partial
Test.
[author=wikibooks, file =text_files/introduction_to_series_convergence]
Comparison test This is a convergence test (also known as the direct comparison
test) we can apply to any pair of series. If $\sum |b_n|$ converges and $|a_n| \le |b_n|$ then $\sum a_n$
converges (absolutely). If $\sum |b_n|$ diverges and $|a_n| \ge |b_n|$ then $\sum a_n$ does not converge absolutely.
There are many such tests, the most important of which we’ll describe in this
chapter.
Definition 9.1.2.
Theorem 9.1.1.
[author= wikibooks , file =text_files/absolute_convergence_of_series]
If the series of absolute values, $\sum_{n=1}^{\infty} |a_n|$, converges, then so does the series
$\sum_{n=1}^{\infty} a_n$.
Comment.
[author=wikibooks, file =text_files/absolute_convergence_of_series]
We say such a series converges absolutely.
The converse does not hold. The series 1-1/2+1/3-1/4 ... converges, even
though the series of its absolute values diverges.
A series like this that converges, but not absolutely, is said to converge condi-
tionally.
Comment.
[author=wikibooks, file =text_files/absolute_convergence_of_series]
If a series converges absolutely, we can add terms in any order we like. The limit
will still be the same.
If a series converges conditionally, rearranging the terms can change the limit. In
fact, we can make the series converge to any limit we like by choosing a suitable
rearrangement.
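The claim about rearrangements sounds strange until you try it. Below is a minimal Python sketch of one possible greedy rearrangement of $1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \dots$ (an illustration added here, not a proof): add unused positive terms while the running total is at or below the target, otherwise subtract the next unused negative term.

```python
def rearranged_partial_sum(target, steps):
    """Greedily reorder 1 - 1/2 + 1/3 - 1/4 + ... to approach `target`."""
    total = 0.0
    next_odd = 1   # denominator of the next unused positive term
    next_even = 2  # denominator of the next unused negative term
    for _ in range(steps):
        if total <= target:
            total += 1 / next_odd
            next_odd += 2
        else:
            total -= 1 / next_even
            next_even += 2
    return total

for target in (0.0, 1.5, 3.0):
    print(target, rearranged_partial_sum(target, 100000))
```

Every term of the original series is used exactly once in each rearrangement, yet the running totals settle near whichever target was chosen.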
E.g, in the series 1-1/2+1/3-1/4 ..., we can add only positive terms until the
partial sum exceeds 100, subtract 1/2, add only positive terms until the partial
sum exceeds 100, subtract 1/4, and so on, getting a sequence with the same terms
Rule 9.2.1.
[author=duckworth, file =text_files/ratio_test]
Consider the series $\sum_{i=0}^{\infty} a_i$. Let $\lim_{i\to\infty} \left|\frac{a_{i+1}}{a_i}\right| = L$. If $L < 1$ then the series is
absolutely convergent. If $L > 1$ (or $L = \infty$) then the series diverges. If $L = 1$ then
the test tells you nothing.
Comment.
[author=duckworth, file =text_files/ratio_test]
Note: when we say that the ratio test tells us nothing about the case L = 1, this
means that there are convergent series with L = 1 and there are divergent series
with L = 1. Note, the test is easy to remember because for convergence we need
(for positive numbers) that the terms decrease; if the terms decrease this means
that $a_{i+1}$ should be smaller than $a_i$, and if this is the case then $\frac{a_{i+1}}{a_i} < 1$. Note,
we have learned how to find $\lim_{i\to\infty}$ of many fractions.
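A few numerical ratios give a feel for the three cases of the ratio test. The Python sketch below is an added illustration; the four example series are chosen here and are not from the text. Recall that $\sum 1/i$ diverges while $\sum 1/i^2$ converges, even though both have $L = 1$.

```python
from math import factorial

def ratio(term, i):
    """|a_{i+1} / a_i| for a sequence given as a function of i."""
    return abs(term(i + 1) / term(i))

examples = {
    "2^i / i!": lambda i: 2 ** i / factorial(i),
    "1 / i": lambda i: 1 / i,
    "1 / i^2": lambda i: 1 / i ** 2,
    "2^i": lambda i: 2.0 ** i,
}
for name, term in examples.items():
    print(name, [round(ratio(term, i), 4) for i in (10, 100)])
```

The first ratio heads to 0 (so the series converges), the last is always 2 (so the series diverges), and the middle two both creep up towards 1, where the test is silent.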
Example 9.2.1.
Rule 9.2.2.
Comment.
[author=duckworth, file =text_files/root_test_for_series]
Note: the statement that the test tells us nothing when L = 1 means that there
are convergent series with L = 1 and there are divergent series with L = 1. Note:
it is often easier to apply the ratio test than the root test. So the root test is best
Rule 9.2.3.
Rule 9.2.4.
If $-p+1 > 0$, then this last fraction has more $x$'s on top and therefore $\lim_{b\to\infty} x^{-p+1} = \infty$
and the series diverges. If $-p+1 < 0$, then this last fraction has $x$'s on the bottom
and therefore $\lim_{b\to\infty} x^{-p+1} = 0$.
Rule 9.2.5.
[author=duckworth, file =text_files/comparison_test_for_series]
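(a) If $a_n \ge b_n \ge 0$ and $\sum_{n=0}^{\infty} a_n$ exists then so does $\sum_{n=0}^{\infty} b_n$.
(b) If $a_n, b_n > 0$ and $\lim_{n\to\infty} \frac{a_n}{b_n}$ equals a non-zero number, then $\sum_{n=0}^{\infty} a_n$ exists $\iff$ $\sum_{n=0}^{\infty} b_n$
exists.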
Example 9.2.2.
Rule 9.2.6.
[author=wikibooks, file =text_files/limit_comparison_test]
If $\sum b_n$ is a convergent series of positive terms, and $\lim \frac{|a_n|}{b_n} < \infty$ then $\sum a_n$ converges.
Example 9.2.3.
[author=wikibooks, file =text_files/limit_comparison_test]
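Let $a_n = n^{-\frac{n+1}{n}}$.
For large $n$, the terms of this series are similar to, but smaller than, those of
the harmonic series. We compare the limits, writing $c_n = \frac{1}{n}$ for the terms of the harmonic series:
\[ \lim \frac{|a_n|}{c_n} = \lim \frac{n}{n^{\frac{n+1}{n}}} = \lim \frac{1}{n^{\frac{1}{n}}} = 1 > 0 \]
so this series diverges.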
Definition 9.2.1.
[author=wikibooks, file =text_files/alternating_series_test]
If the signs of the $a_n$ alternate, $a_n = (-1)^n |a_n|$, and the $|a_n|$ are decreasing, then we
call this an alternating series.
Theorem 9.2.1.
[author= wikibooks , file =text_files/alternating_series_test]
The series sum converges provided that $\lim_{n\to\infty} a_n = 0$.
The error in a partial sum of an alternating series is smaller than the first
omitted term.
\[ \left|\sum_{n=1}^{\infty} a_n - \sum_{n=1}^{m} a_n\right| < |a_{m+1}| \]
Comment.
Theorem 9.2.2.
[author= duckworth , file =text_files/alternating_series_test]
If $b_i \ge b_{i+1}$ (for all $i$) and $\lim_{i\to\infty} b_i = 0$, then $\sum_{i=0}^{\infty} (-1)^i b_i$ converges. Furthermore,
if $R_n = \sum_{i=0}^{\infty} (-1)^i b_i - \sum_{i=0}^{n} (-1)^i b_i$ is the error, then $|R_n| \le b_{n+1}$.
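The error estimate $|R_n| \le b_{n+1}$ can be watched in action on the series $1 - \frac{1}{2} + \frac{1}{3} - \dots$, whose sum is $\ln(2)$. The following Python sketch is an added illustration with $b_i = \frac{1}{i+1}$; the function names are made up.

```python
import math

def alternating_partial_sum(b, n):
    """Sum of (-1)^i * b(i) for i = 0, ..., n."""
    return sum((-1) ** i * b(i) for i in range(n + 1))

def b(i):
    return 1 / (i + 1)  # 1 - 1/2 + 1/3 - ... converges to ln(2)

for n in (5, 50, 500):
    error = abs(math.log(2) - alternating_partial_sum(b, n))
    print(n, error, b(n + 1))
```

In each row the actual error is smaller than the first omitted term, as the theorem promises.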
Comment.
Definition 9.3.1.
Theorem 9.3.1.
[author= duckworth , file =text_files/power_series]
Given a power series $\sum_{i=0}^{\infty} c_i (x - a)^i$, one of the following situations holds:
Comment.
[author=duckworth, file =text_files/power_series]
Note: This statement does not tell us what happens for x = a ± R, though
sometimes we can figure this out by using another test. In general, we need
to use the root or ratio test to find R.
Example 9.3.1.
[author=duckworth, file =text_files/power_series]
Series                                                         R
$\sum_{i=0}^{\infty} x^i$                                      $R = 1$
$e^x = \sum_{i=0}^{\infty} \frac{x^i}{i!}$                     $R = \infty$
$\ln(x) = \sum_{n=1}^{\infty} (-1)^{n+1} \frac{(x-1)^n}{n}$    $R = 1$
Discussion.
[author=wikibooks, file =text_files/power_series]
The study of power series is concerned with series that can approximate
some function over some interval.
Recall from elementary calculus that we can obtain a line that touches a curve
at one point by using differentiation. So in a sense we are getting an approximation
to a curve at one point. This does not help us very much however.
Let’s look at the case of y = cos(x), about the point x = 0. We have a first
approximation using differentiation by the line y = 1. Observe that cos(x) looks
like a parabola upside-down at x = 0. So naturally we think “what parabola could
approximate cos(x) at this point?” The parabola $1 - x^2/2$ will do. In fact, it is
the best estimate using polynomials of degree 2. But how do we know this is so?
This is the study of power series: finding optimal approximations to functions using
polynomials.
Definition 9.3.2.
Theorem 9.3.2.
[author= wikibooks , file =text_files/power_series]
Radius of convergence We can only use the equation $f(x) = \sum_{j=0}^{\infty} a_j x^j$ to
study $f(x)$ when the power series converges. This may happen for a finite range,
or for all real numbers.
If the series converges only for $x$ in some interval, then the radius of convergence
is half of the length of this interval.
Example 9.3.2.
[author=wikibooks, file =text_files/power_series]
Consider the series $\frac{1}{1-x} = \sum_{n=0}^{\infty} x^n$ (a geometric series). This converges when
$|x| < 1$, so the radius of convergence is 1.
Next consider $e^x = \sum_{n=0}^{\infty} \frac{x^n}{n!}$. Using the ratio test, this series
converges when the ratio of successive terms is less than one,
\[ \lim_{n\to\infty} \left|\frac{x^{n+1}}{(n+1)!}\,\frac{n!}{x^n}\right| < 1 \quad\text{or}\quad \lim_{n\to\infty} \left|\frac{x}{n+1}\right| < 1 \]
which is always true, so this power series has an infinite radius of convergence.
If we use the ratio test on an arbitrary power series, we find it converges when
$\lim \frac{|a_{n+1} x|}{|a_n|} < 1$ and diverges when $\lim \frac{|a_{n+1} x|}{|a_n|} > 1$. The radius of convergence is
therefore $r = \lim \frac{|a_n|}{|a_{n+1}|}$. If this limit diverges to infinity, the series has an infinite
radius of convergence.
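The formula $r = \lim \frac{|a_n|}{|a_{n+1}|}$ can be explored numerically by evaluating the ratio at a largish $n$. The sketch below is an added illustration; the three coefficient sequences and the choice $n = 50$ are assumptions made here, and a single ratio is of course only a hint at the limit.

```python
from math import factorial

def radius_estimate(coefficient, n):
    """|a_n| / |a_{n+1}|, whose limit is the radius of convergence."""
    return abs(coefficient(n)) / abs(coefficient(n + 1))

print(radius_estimate(lambda n: 1.0, 50))               # geometric series
print(radius_estimate(lambda n: 2.0 ** n, 50))           # sum of (2x)^n
print(radius_estimate(lambda n: 1 / factorial(n), 50))   # series for e^x
```

For the geometric series the ratio is 1, for $\sum (2x)^n$ it is $\frac{1}{2}$, and for the exponential series the ratio is about $n + 1$, which keeps growing with $n$, consistent with an infinite radius of convergence.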
Fact.
\[ \int_0^x \sum_{j=0}^{\infty} a_j z^j \, dz = \sum_{j=1}^{\infty} \frac{a_{j-1}}{j} x^j \]
Both the derivative and the integral have the same radius of convergence as
the original series.
Example 9.3.3.
[author=wikibooks, file =text_files/power_series]
This allows us to sum exactly suitable power series. E.g. $\frac{1}{1+x} = 1 - x + x^2 - x^3 + \dots$.
This is a geometric series, which converges for $|x| < 1$. Integrate both sides, and we
get $\ln(1 + x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \dots$ which will also converge for $|x| < 1$. When $x = -1$
this is the negative of the harmonic series, which diverges. When $x = 1$ this is an alternating
series with diminishing terms, which converges to $\ln(2)$.
It also lets us write power series for integrals we cannot do exactly. E.g.
\[ e^{-x^2} = \sum_{n=0}^{\infty} (-1)^n \frac{x^{2n}}{n!} \]
The left hand side can not be integrated exactly, but the right hand
side can be.
\[ \int_0^z e^{-x^2}\,dx = \sum_{n=0}^{\infty} \frac{(-1)^n z^{2n+1}}{(2n+1)\,n!} \]
This gives us a power series for the integral,
which has an infinite radius of convergence, letting us approximate the integral as
closely as we like.
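To see how quickly this series settles down, here is a small Python sketch, added as an illustration, that evaluates its partial sums at $z = 1$. For comparison it uses the standard library's error function, relying on the identity $\int_0^z e^{-x^2}\,dx = \frac{\sqrt{\pi}}{2}\,\mathrm{erf}(z)$; the function name `integral_series` is made up.

```python
import math

def integral_series(z, terms):
    """Partial sum of sum_{n>=0} (-1)^n z^(2n+1) / ((2n+1) n!)."""
    return sum((-1) ** n * z ** (2 * n + 1) / ((2 * n + 1) * math.factorial(n))
               for n in range(terms))

for terms in (3, 4, 10):
    print(terms, integral_series(1.0, terms))

reference = math.sqrt(math.pi) / 2 * math.erf(1.0)
print(reference)
```

Three terms give roughly .7667 and four terms roughly .7429, and by ten terms the partial sum agrees with the reference value to many decimal places.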
Definition 9.3.3.
Here, $n!$ is the factorial of $n$ and $f^{(n)}(a)$ denotes the $n$th derivative of $f$ at the
point a. If this series converges for every x in the interval (a − r, a + r) and the
sum is equal to f (x), then the function f (x) is called analytic. If a = 0, the series
is also called a Maclaurin series.
Comment.
[author=wikibooks, file =text_files/taylor_series_in_context_of_power_series]
To check whether the series converges towards f (x), one normally uses estimates
for the remainder term of Taylor's theorem. A function is analytic if and only if it
can be represented as a power series; the coefficients in that power series are then
necessarily the ones given in the above Taylor series formula.
Comment.
Example 9.3.4.
Comment.
[author=wikibooks, file =text_files/taylor_series_in_context_of_power_series]
Some functions cannot be written as Taylor series because they have a singularity.
In these cases, one can often still achieve a series expansion if one also allows
negative powers of the variable $x$; see Laurent series. For example, $f(x) = e^{-1/x^2}$
can be written as a Laurent series.
Examples 9.3.5.
[author=wikibooks, file =text_files/taylor_series_in_context_of_power_series]
Several important Taylor series expansions follow. All these expansions are also
valid for complex arguments x.
$e^x = \sum_{n=0}^{\infty} \frac{x^n}{n!}$ for all $x$
$\ln(1+x) = \sum_{n=1}^{\infty} \frac{(-1)^{n+1}}{n} x^n$ for $|x| < 1$
$\frac{1}{1-x} = \sum_{n=0}^{\infty} x^n$ for $|x| < 1$
$(1+x)^{\alpha} = \sum_{n=0}^{\infty} C(\alpha, n) x^n$ for all $|x| < 1$, and all complex $\alpha$, and the $C(\alpha, n)$ are the Binomial coefficients which are defined somewhere else, or which can be calculated on a case-by-case basis
$\sin x = \sum_{n=0}^{\infty} \frac{(-1)^n}{(2n+1)!} x^{2n+1}$ for all $x$
$\cos x = \sum_{n=0}^{\infty} \frac{(-1)^n}{(2n)!} x^{2n}$ for all $x$
$\tan x = \sum_{n=1}^{\infty} \frac{B_{2n} (-4)^n (1-4^n)}{(2n)!} x^{2n-1}$ for $|x| < \frac{\pi}{2}$, and the $B_{2n}$ are the Bernoulli numbers which are defined somewhere else, or which can be calculated on a case-by-case basis
$\sec x = \sum_{n=0}^{\infty} \frac{(-1)^n E_{2n}}{(2n)!} x^{2n}$ for $|x| < \frac{\pi}{2}$, and the $E_{2n}$ are the Euler numbers which are defined somewhere else, or which can be calculated on a case-by-case basis
$\arcsin x = \sum_{n=0}^{\infty} \frac{(2n)!}{4^n (n!)^2 (2n+1)} x^{2n+1}$ for $|x| < 1$
$\arctan x = \sum_{n=0}^{\infty} \frac{(-1)^n}{2n+1} x^{2n+1}$ for $|x| < 1$
$\sinh x = \sum_{n=0}^{\infty} \frac{1}{(2n+1)!} x^{2n+1}$ for all $x$
$\cosh x = \sum_{n=0}^{\infty} \frac{1}{(2n)!} x^{2n}$ for all $x$
$\tanh x = \sum_{n=1}^{\infty} \frac{B_{2n} 4^n (4^n - 1)}{(2n)!} x^{2n-1}$ for $|x| < \frac{\pi}{2}$, and the $B_{2n}$ are the Bernoulli numbers which are defined somewhere else, or which can be calculated on a case-by-case basis
$\sinh^{-1} x = \sum_{n=0}^{\infty} \frac{(-1)^n (2n)!}{4^n (n!)^2 (2n+1)} x^{2n+1}$ for $|x| < 1$
$\tanh^{-1} x = \sum_{n=0}^{\infty} \frac{1}{2n+1} x^{2n+1}$ for $|x| < 1$
$W_0(x) = \sum_{n=1}^{\infty} \frac{(-n)^{n-1}}{n!} x^n$ for $|x| < \frac{1}{e}$
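A table like this is easy to mistype, so it is worth spot-checking a few rows against a library. The Python sketch below is an added illustration: it sums 30 terms of four of the series at the arbitrarily chosen point $x = 0.3$ and compares with the corresponding `math` functions. The term functions and the helper `partial_sum` are invented names.

```python
import math

def partial_sum(term, x, start, count):
    """Add `count` terms term(n, x), beginning at n = start."""
    return sum(term(n, x) for n in range(start, start + count))

def sin_term(n, x):
    return (-1) ** n * x ** (2 * n + 1) / math.factorial(2 * n + 1)

def log1p_term(n, x):
    return (-1) ** (n + 1) * x ** n / n

def arctan_term(n, x):
    return (-1) ** n * x ** (2 * n + 1) / (2 * n + 1)

def arcsin_term(n, x):
    coefficient = math.factorial(2 * n) / (4 ** n * math.factorial(n) ** 2 * (2 * n + 1))
    return coefficient * x ** (2 * n + 1)

x = 0.3
checks = [
    ("sin", partial_sum(sin_term, x, 0, 30), math.sin(x)),
    ("ln(1+x)", partial_sum(log1p_term, x, 1, 30), math.log1p(x)),
    ("arctan", partial_sum(arctan_term, x, 0, 30), math.atan(x)),
    ("arcsin", partial_sum(arcsin_term, x, 0, 30), math.asin(x)),
]
for name, series_value, library_value in checks:
    print(name, series_value, library_value)
```

Note that the logarithm series starts at $n = 1$ while the others start at $n = 0$, which is why `partial_sum` takes the starting index as a parameter.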
Comment.
[author=wikibooks, file =text_files/taylor_series_in_context_of_power_series]
The Taylor series may be generalised to functions of more than one variable with
the formula
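\[ \sum_{n_1=0}^{\infty} \cdots \sum_{n_d=0}^{\infty} \frac{\partial^{n_1}}{\partial x_1^{n_1}} \cdots \frac{\partial^{n_d}}{\partial x_d^{n_d}} \frac{f(a_1, \cdots, a_d)}{n_1! \cdots n_d!} (x_1 - a_1)^{n_1} \cdots (x_d - a_d)^{n_d} \]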
Of course, to use this formula one must know how to take derivatives in more
than one dimension! In fact, one way to define derivatives in any dimension, is to
say that they are the functions which give you the correct coefficients for a Taylor
polynomial to work!
Comment.
[author=wikibooks, file =text_files/convergent_sequences_and_series]
We can also think of a sequence as a function from the natural numbers to the
real numbers: this just means that given $n$ there's some description of what $a_n$ is.
In fact, most of the sequences we work with are explicitly given by a function, like
“$a_n = 1/2^n$”.
Since we can list the integers 1, 2, 3, ... we likewise list a sequence $f(1), f(2),
f(3), f(4), \dots$ We shall denote a sequence by an italic capital letter, the set of real
values that function takes by the same non-italic capital letter, and the elements
of that set with the corresponding lower case letter, and subscripts. For example
sequence $S$ takes values in the set $\mathrm{S}$ with elements $s_1, s_2, s_3, \dots$.
$\mathrm{S}$ is a set of reals, $S$ is a function from the integers to the reals, two different
concepts. While we are being rigorous we must be careful not to confuse the two,
but in general usage the concepts are interchangeable.
We can also denote sequences by their function. For example if we say the
function $S$ is $3k$, then the sequence consists of $3, 6, 9, \dots$.
In particular we will be interested in special types of sequences that converge.
We first introduce three definitions.
Definition 9.4.2.
Definition 9.4.3.
Definition 9.4.4.
[author=wikibooks, file =text_files/convergent_sequences_and_series]
A sequence, $S$, converges if there exists a number, $s$, such that for all $\epsilon > 0$,
there exists an integer $n(\epsilon)$ such that for all $k > n(\epsilon)$, $|s - s_k| < \epsilon$. If the
sequence is convergent we call the number, $s$, the limit of the sequence $S$. We write,
$s = \lim_{n\to\infty} s_n$
Theorem 9.4.1.
[author= wikibooks , file =text_files/convergent_sequences_and_series]
If there exists a number, $s$, such that for all $\epsilon > 0$, there exists an integer $n(\epsilon)$
such that for all $k > n(\epsilon)$, $|s - s_k| < f(\epsilon)$ where $f$ is such that $\delta$ smaller than or
equal to some $\delta(\epsilon)$ implies $f(\delta) \le \epsilon$, $f(x)$ is positive for all positive $x$, and $f(0) = 0$
then $S$ converges.
Proof.
[author=wikibooks, file =text_files/convergent_sequences_and_series]
For any $\epsilon$ consider $n(\delta(\epsilon))$.
If $n > n(\delta(\epsilon))$ then
from the given conditions, $|s - s_n| < f(\delta(\epsilon)) \le \epsilon$.
So, $S$ meets the conditions in Definition 3 (Definition 9.4.4).
Comment.
Discussion.
Theorem 9.4.2.
[author= wikibooks , file =text_files/convergent_sequences_and_series]
Every Cauchy sequence is bounded above and below.
Proof.
[author=wikibooks, file =text_files/convergent_sequences_and_series]
We prove only that the sequence is bounded above. By definition 1 with $\epsilon = 1$,
$\exists N$ such that $\forall n > N$, $|s_n - s_N| < 1$. Define $r = 2 + \sup\{s_1, s_2, \dots, s_N\}$. Then, by
definition, $s_n < r$ $\forall n \le N$. If $n > N$, $s_n \le 1 + s_N < r$. Therefore the sequence meets
definition 2 with $s_+ = r$: the Cauchy sequence is bounded above.
Definition 9.4.5.
Theorem 9.4.3.
[author= wikibooks , file =text_files/convergent_sequences_and_series]
a) If $S$ is bounded above and monotonically increasing, $S$ converges to $\sup S$.
b) If $S$ is bounded below and monotonically decreasing, $S$ converges to $\inf S$.
Proof.
[author=wikibooks, file =text_files/convergent_sequences_and_series]
For a monotonically increasing sequence $S$ bounded above, and all $\epsilon > 0$, we must
have $s_N > \sup S - \epsilon$ for some $N$, else $\sup S - \epsilon$ is an upper bound of $S$, contradicting
the definition of sup. For all $n > N$, $s_N \le s_n$, by definition of monotonically increasing.
Combine with the first inequality to get $\sup S \ge s_n > \sup S - \epsilon$; rearrange to get
$|s_n - \sup S| < \epsilon$ for all $n$ larger than some $N$. Hence $\sup S$ is the limit of $S$.
Part b) is proved similarly.
Theorem 9.4.4.
[author= wikibooks , file =text_files/convergent_sequences_and_series]
(The sandwich theorem) Given three sequences, $R$, $S$, $T$: if $R$ and $T$ both converge,
$\lim R = \lim T$, and $\exists N$ $\forall n > N$, $r_n \le s_n \le t_n$, then sequence $S$ converges to the same
limit.
Proof.
[author=wikibooks, file =text_files/convergent_sequences_and_series]
Let $s = \lim R = \lim T$. For any $\epsilon > 0$, by definition of convergence, there exist
$M, N$ such that $\forall n > M$, $|r_n - s| < \epsilon$ and $\forall n > N$, $|t_n - s| < \epsilon$. Combining these two
inequalities with the conditions on $R$ and $T$ gives $s - \epsilon < r_n \le s_n \le t_n < s + \epsilon$
for all $n$ greater than the maximum of $M$ and $N$ (taken large enough that
$r_n \le s_n \le t_n$ also holds). On rearrangement, $S$ satisfies the
definition of convergence, with limit $s$, and $n(\epsilon) = \max\{M, N\}$.
Theorem 9.4.5.
[author= wikibooks , file =text_files/convergent_sequences_and_series]
If $R$ and $S$ are both convergent sequences and $r_n \le s_n$ for all $n$, then $\lim r_n \le \lim s_n$.
Theorem 9.4.6.
[author= wikibooks , file =text_files/convergent_sequences_and_series]
A sequence $S$ is convergent if and only if it is Cauchy.
Proof.
[author=wikibooks, file =text_files/convergent_sequences_and_series]
Convergence implies Cauchy. Assume $S$ is convergent, with limit $s$. For any $\epsilon > 0$,
choose $n$ such that $\forall k > n$, $|s_k - s| < \epsilon/2$ (always possible by definition of
convergence). Via the triangle inequality, $|s_k - s_j| \le |s_k - s| + |s - s_j| <
\epsilon/2 + \epsilon/2 = \epsilon$, for $j, k > n$. This is the definition of Cauchy.
Cauchy implies convergence. Let $S$ be a Cauchy sequence. Define two sequences
$R$ and $T$ by $r_n = \inf\{s_m : m \ge n\}$ and $t_n = \sup\{s_m : m \ge n\}$. Then
$r_n = \min(s_n, r_{n+1}) \le r_{n+1}$, so $R$ is monotonically increasing. Similarly $T$ is
monotonically decreasing. $\forall m > n$, $r_1 \le r_n \le s_m \le t_n \le t_1$, so $R$ and $T$ are
bounded above and below respectively.
Being bounded and monotonic, they converge to their supremum and infimum
respectively: $r = \lim_{n\to\infty} r_n = \sup \inf\{s_m\}$ and $t = \lim_{n\to\infty} t_n = \inf \sup\{s_m\}$. By
theorem 5, since $r_n \le t_n$ for all $n$, $r \le t$. If, for some $N$, all $s_n$ with $n > N$
are greater than $r$, $r$ is a lower bound to the $s_n$ but it is also an upper bound.
For $r$ to be both the $s_n$ must be constant, making the sequence trivially convergent.
Similarly for $t$. So, for all $N$, there must be $n, m$ larger than $N$, with $s_n \le r$ and $s_m \ge t$:
$\forall N$ $\exists n, m > N$, $|s_n - s_m| \ge |r - t|$. If $r \ne t$ this contradicts the definition of Cauchy,
so $r = t$. But $S$ is bounded between $R$ and $T$, so by the sandwich theorem, $S$ is
convergent.
Comment.
[author=wikibooks, file =text_files/convergent_sequences_and_series]
We can now use Cauchy and convergent interchangeably, as convenient. We will
often prove a sequence is convergent by proving it to be Cauchy.
Discussion.
Fact.
[author=wikibooks, file =text_files/convergent_sequences_and_series]
We define $(S + T)$ by $(s + t)_n = s_n + t_n$.
Addition on sequences inherits the group properties of the reals.
If S and T both converge, to s and t respectively, then for all $\epsilon > 0$ there is an N
such that for all $n > N$, $|s_n - s| < \epsilon/2$ and $|t_n - t| < \epsilon/2$ (definition of limit).
By the triangle inequality, $|s_n - s| + |t_n - t| \ge |s_n + t_n - s - t|$,
hence $|s_n + t_n - (s + t)| < \epsilon$. So, by definition of limit, S + T converges to s + t.
Fact.
Ordinary Differential
Equations
Discussion.
Discussion.
[author=duckworth, file =text_files/introduction_to_ordinary_diffeqs]
A differential equation is an equation involving $x$, $y$, $y'$, possibly $y''$ etc., that we
are trying to solve for y (which is a function). Here are all of the types of equations
we will solve in this course:
• You will be given an equation and told what form the solution y should take.
For example:
• Exponential growth and decay. All the problems in this section are variations
on the following: $y' = cy$, or $\frac{dP}{dt} = rP$, or “the rate of change of the
population is proportional to the population”. Although this type of equation
is very useful, it’s kind of stupid that the book waited until the fourth
section to introduce it: we know already how to solve these, they’re all of
the form $Ce^{rt}$!
Definition 10.1.1.
Example 10.1.1.
[author=wikibooks, file =text_files/ordinary_diffeqs]
Consider the differential equation $3f''(x) + 5xf(x) = 11$. Since the highest
derivative in the equation is the second derivative, we say that the differential equation is of order 2.
Discussion.
[author=wikibooks, file =text_files/ordinary_diffeqs]
A key idea in solving differential equations will be that of integration.
Let us consider the second order differential equation $f'' = 2$.
How would we go about solving this? It tells us that on differentiating twice,
we obtain the constant 2 so, if we integrate twice, we should obtain our result.
Integrating once first of all,
\[ \int f''\,dx = \int 2\,dx, \qquad f' = 2x + C_1. \]
Integrating once more,
\[ \int f'\,dx = \int (2x + C_1)\,dx, \qquad f = x^2 + C_1 x + C_2. \]
This is the solution to the differential equation. We will get $f'' = 2$ for all values
of $C_1$ and $C_2$.
The values of $C_1$ and $C_2$ are fixed by what are known as initial conditions.
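As a quick sanity check on the claim that any choice of the two constants works, the sketch below estimates the second derivative of $f(x) = x^2 + C_1 x + C_2$ by finite differences. The function names, the step size h, and the sample points are illustrative assumptions, not part of the notes.

```python
def f(x, c1, c2):
    """Candidate general solution f(x) = x**2 + c1*x + c2."""
    return x**2 + c1 * x + c2


def second_derivative(func, x, h=1e-3):
    """Central finite-difference estimate of the second derivative at x."""
    return (func(x + h) - 2 * func(x) + func(x - h)) / h**2


def check(c1, c2, xs=(-2.0, 0.0, 1.5)):
    """Estimate f'' at a few sample points for the given constants."""
    return [second_derivative(lambda x: f(x, c1, c2), x) for x in xs]


if __name__ == "__main__":
    # Different constants, same second derivative (approximately 2).
    print(check(0.0, 0.0))
    print(check(3.0, -7.0))
```

Changing the constants shifts and tilts the parabola but should leave every estimate close to 2.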
Discussion.
[author=wikibooks, file =text_files/ordinary_diffeqs]
Why are initial conditions useful? ODEs (ordinary differential equations) are
useful in modelling physical conditions. We may wish to model a certain physical
system which is initially at rest (so one initial condition may be zero), or wound
up to some point (so an initial condition may be nonzero and be say 5 for instance)
and we may wish to see how the system reacts under such an initial condition.
When we solve a system with given initial conditions, we substitute them during
our process of integration. Without initial conditions, the answer we obtain is the
most general solution.
Example 10.1.2.
Example 10.1.3.
Derivation.
[author=wikibooks, file =text_files/separable_ordinary_differential_equations]
A separable equation is one of the form
\[ \frac{dy}{dx} = \frac{f(x)}{g(y)}. \]
In this context people always use dy/dx notation. Previously we have only dealt
with simple differential equations with g(y) = 1. How do we solve such a separable
equation as above?
We group x and dx terms together, and y and dy terms together as well. This
gives
g(y) dy = f (x) dx.
Integrating both sides (on the left we integrate with respect to y, and on the right
with respect to x) we get
\[ \int g(y)\,dy = \int f(x)\,dx + C. \]
The resulting equation gives an implicit solution for y(x). In practice, it is often
possible to solve this equation for y.
Example 10.2.1.
Example 10.2.2.
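Separating variables in $\frac{dy}{dx} = 3x^2 y$ gives $\frac{dy}{y} = 3x^2\,dx$. Integrating,
\[ \int \frac{dy}{y} = \int 3x^2\,dx, \qquad \ln y = x^3 + C, \qquad y = e^{x^3 + C}. \]
Letting $k = e^C$, where k is a constant, we obtain $y = ke^{x^3}$, which is the general
solution.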
Example 10.2.3.
[author=wikibooks, file =text_files/separable_ordinary_differential_equations]
Just for practice, let’s verify that our answer in Example 10.2.2 really was a solu-
tion of the given differential equation. Note, this step is only for practice, it is not
usually part of finding a solution.
We obtained $y = ke^{x^3}$ as the solution to $\frac{dy}{dx} = 3x^2 y$.
Differentiating the solution, $\frac{dy}{dx} = 3kx^2 e^{x^3}$.
Since $y = ke^{x^3}$, we can write $\frac{dy}{dx} = 3x^2 y$. We see that we obtain our original
differential equation, so we can confirm our working as being correct.
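The same verification can be done numerically. The sketch below, with hypothetical helper names and an assumed step size, compares a finite-difference estimate of $\frac{dy}{dx}$ against $3x^2 y$ for $y = ke^{x^3}$.

```python
import math


def y(x, k):
    """Candidate solution y = k * exp(x**3)."""
    return k * math.exp(x**3)


def dydx_numeric(x, k, h=1e-6):
    """Central finite-difference estimate of dy/dx."""
    return (y(x + h, k) - y(x - h, k)) / (2 * h)


def residual(x, k):
    """dy/dx - 3 x^2 y, which should be close to zero for a solution."""
    return dydx_numeric(x, k) - 3 * x**2 * y(x, k)


if __name__ == "__main__":
    for x, k in [(0.5, 2.0), (1.0, -1.5), (-0.8, 0.3)]:
        print(x, k, residual(x, k))
```

A residual near zero for several values of x and k is evidence, not proof, that the formula solves the equation; the algebra above is the proof.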
Discussion.
\[ \frac{dy}{dx} = \text{rate in} - \text{rate out}. \]
See, that’s not so hard. The part about translating is making “rate in” and “rate
out” into formulas. Usually, one of these you’ve been given directly in the problem,
e.g. “salty water with .5kg of salt per liter flows into the tank at rate of 7 liters per
minute”. In this case you would have rate in = .5 × 7. The other rate is usually
found by multiplying the concentration (which is like density) by the amount of
flow. E.g. “thoroughly mixed water flows out of the tank at the same rate as water
flows in”. In this case, we have:
\[ \frac{dy}{dx} = .5 \times 7 - \frac{y}{100} \times 7 \]
Example 10.2.4.
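\[ \frac{dy}{dx} = .5 \times 7 - \frac{y}{100} \times 7. \]
\[ \frac{1}{.5 \times 7 - 7y/100}\,dy = dx \]
Let’s move those constants around. Multiply both sides by 7, multiply the top
and bottom of the fraction by 100.
\[ \frac{100}{50 - y}\,dy = 7\,dx \]
\[ \int \frac{100}{50 - y}\,dy = \int 7\,dx \]
\[ -100 \ln |50 - y| = 7x + C \]
\[ \ln |50 - y| = -\frac{1}{100}(7x + C) \]
\[ \ln |50 - y| = -\frac{7}{100}x + C \quad \text{(new C!)} \]
\[ |50 - y| = Ce^{-.07x} \quad \text{(new C!)} \]
\[ y = 50 \pm Ce^{-.07x} \]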
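One way to check a solution like this is to step the differential equation forward numerically and compare. The sketch below uses Euler's method on $\frac{dy}{dx} = .5 \times 7 - \frac{y}{100} \times 7$ and compares it with the decaying closed form $y = 50 + (y_0 - 50)e^{-.07x}$. The starting amount y0, the step count, and the function names are assumptions made for illustration.

```python
import math


def exact(x, y0):
    """Closed form with the constant fixed by y(0) = y0 (y0 is an assumed initial amount)."""
    return 50 + (y0 - 50) * math.exp(-0.07 * x)


def euler(y0, x_end, steps):
    """Euler's method for dy/dx = rate in - rate out = 0.5*7 - (y/100)*7."""
    dx = x_end / steps
    y = y0
    for _ in range(steps):
        y += dx * (0.5 * 7 - y / 100 * 7)
    return y


if __name__ == "__main__":
    # Tank assumed to start with no salt; compare after 10 minutes.
    print(euler(0.0, 10.0, 100000), exact(10.0, 0.0))
```

Both numbers should agree closely, and both approach 50 as x grows, which matches the intuition that the tank settles at the inflow concentration (.5 kg per liter times 100 liters).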
Example 10.2.5.
The main idea is to translate these questions into ones involving derivatives.
So, (a) is equivalent to: which values of y make $\frac{dy}{dt} = 0$? (b) is equivalent to:
which values of y make $\frac{dy}{dt}$ positive? (c) is equivalent to: which values of y make
$\frac{dy}{dt}$ negative? Since we have an equation for $\frac{dy}{dt}$, these questions are easily solved:
\begin{align*}
\frac{dy}{dt} = 0 \quad &\text{is equivalent to} \quad 0 = y^4 - 6y^3 + 5y^2 \\
&\text{is equivalent to} \quad 0 = y^2 (y - 5)(y - 1) \\
&\text{is equivalent to} \quad y = 0, 5, 1
\end{align*}
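Questions (b) and (c) come down to the sign of $\frac{dy}{dt}$ between consecutive roots. A minimal sketch of that bookkeeping follows; the sample offset and the function names are illustrative choices.

```python
def dydt(y):
    """Right-hand side of the equation: y^4 - 6y^3 + 5y^2."""
    return y**4 - 6 * y**3 + 5 * y**2


def sign_on_intervals(roots, sample_offset=0.5):
    """Sample dy/dt once left of the roots, once between each pair, once to the right."""
    roots = sorted(roots)
    samples = [roots[0] - sample_offset]
    samples += [(a + b) / 2 for a, b in zip(roots, roots[1:])]
    samples.append(roots[-1] + sample_offset)
    return [(s, "positive" if dydt(s) > 0 else "negative") for s in samples]


if __name__ == "__main__":
    for sample, sign in sign_on_intervals([0, 1, 5]):
        print(sample, sign)
```

Because of the factor $y^2$, the sign does not change at $y = 0$; it changes only at $y = 1$ and $y = 5$.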
So far we’ve looked at this problem without actually solving it. But, if we know
the method of separation we can solve this differential equation.
We divide both sides of $\frac{dy}{dt} = y^2 (y - 5)(y - 1)$ by $y^2 (y - 5)(y - 1)$ to get
\[ \frac{1}{y^2 (y - 5)(y - 1)}\,dy = dt \]
To integrate the left hand side we need partial fractions (yay! I’m so glad we
learned that already!). We set that up as follows:
\[ \frac{1}{y^2 (y - 5)(y - 1)} = \frac{A}{y} + \frac{B}{y^2} + \frac{C}{y - 5} + \frac{D}{y - 1} \]
This is a relatively simple example to solve. Multiply both sides by $y^2 (y - 5)(y - 1)$
to get
[Figure: ODE_lengthy_example_first_graph]
[Figure: ODE_lengthy_example_whole_graph]
Now, I want to graph this solution directly, and compare it to the graph we sort
of made up in Figure 10.1. But how can we graph it? I can’t solve this equation
for y as a function of t. The trick is to graph t as a function of y; this is like
graphing the inverse of the function that we really want. Thus, the picture we get
will be like the one in Figure 10.1, except that we’ve switched the x and y axes
(you can think about this as reflection of the graph across the line y = x, or you
can think about this as “flipping” the graph so that the x-axis goes where the
y-axis was, and you’re looking at the graph through the back of the picture). So
the graphs below put t on the vertical axis and y on the horizontal axis (this is like
entering on our calculators t ↔ y1 and y ↔ x). Note that the different values of
C cause the graph to shift up and down. You can see here again why there should
be infinitely many solutions of this differential equation.
Figure 10.2 shows the whole graph for three different values of C.
It’s kind of hard to see the behavior of the graph when you look at the whole
picture. This is one way that hand-drawn graphs are better than real ones: I could
make each feature pretty clear in Figure 10.1, even though (actually because) it
was not perfectly accurate.
We can work around the limitations of the real graph by looking closely at each
part. In Figures 10.3–10.8 I zoom in on various parts of the graph. In each case
you get (roughly) the shape that I drew in Figure 10.1.
Well, I think we’ve analyzed this problem from every way we can. The point
was just to do the problem in two different ways; we combined tricks from integration
(partial fractions) and used the method of separation. I guess we also got
practice in looking at a graph, and even a little bit of review of inverse functions
(i.e. reversing the roles of x and y). Thanks for reading through this.
[Figures 10.3–10.8: zoomed views of the graph, including ODE_lengthy_example_graph-2to0,
ODE_lengthy_example_graph_1to5, ODE_lengthy_example_graph_2to5, ODE_lengthy_example_graph_5to7]
Definition 10.2.1.
Example 10.2.6.
[author=wikibooks, file =text_files/homogeneous_ordinary_differential_equations]
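We have the equation $\frac{dy}{dx} = \frac{y^2 + x^2}{yx}$.
This does not appear to be immediately separable, but let us expand to get
\[ \frac{dy}{dx} = \frac{y^2}{yx} + \frac{x^2}{yx}, \qquad \frac{dy}{dx} = \frac{x}{y} + \frac{y}{x}. \]
Substituting y = xv, which is the same as substituting v = y/x, gives $\frac{dy}{dx} = 1/v + v$.
Now $\frac{dy}{dx} = v + x\frac{dv}{dx}$, so $v + x\frac{dv}{dx} = 1/v + v$. Cancelling v from both sides,
$x\frac{dv}{dx} = 1/v$. Separating, $v\,dv = dx/x$. Integrating both sides,
\[ \frac{v^2}{2} + C = \ln(x), \qquad \frac{1}{2}\left(\frac{y}{x}\right)^2 = \ln(x) - C, \qquad y^2 = 2x^2 \ln(x) - 2Cx^2, \qquad y = x\sqrt{2\ln(x) - 2C} \]
which is our desired solution.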
Definition 10.2.2.
Rule 10.2.1.
[author=wikibooks, file =text_files/linear_ordinary_differential_equations]
Multiplying or dividing a linear first order differential equation by any non-zero
function of x makes no difference to its solutions so we could always divide by a(x)
to make the coefficient of the derivative 1, but writing the equation in this more
general form may offer insights.
At first glance, it is not possible to integrate the left hand side, but there is
one special case. If b happens to be the derivative of a then we can write
\[ a(x)\frac{dy}{dx} + b(x)y = a(x)\frac{dy}{dx} + y\frac{da}{dx} = \frac{d}{dx}\bigl(a(x)y\bigr) \]
and integration is now straightforward.
Since we can freely multiply by any function, let’s see if we can use this freedom
to write the left hand side in this special form.
We multiply the entire equation by an arbitrary I(x), getting
\[ aI\frac{dy}{dx} + bIy = cI \]
then impose the condition
\[ \frac{d}{dx}(aI) = bI. \]
If this is satisfied the new left hand side will have the special form. Note that
multiplying I by any constant will leave this condition still satisfied.
Rearranging this condition gives
\[ \frac{1}{I}\frac{dI}{dx} = \frac{b - \frac{da}{dx}}{a} \]
We can integrate this to get
\[ \ln I(x) = \int \frac{b(z)}{a(z)}\,dz - \ln a(x) + c, \qquad I(x) = \frac{k}{a(x)}\,e^{\int \frac{b(z)}{a(z)}dz}. \]
We can set the constant k to be 1, since this makes no difference. Next we use I
on the original differential equation, getting
\[ e^{\int \frac{b(z)}{a(z)}dz}\,\frac{dy}{dx} + e^{\int \frac{b(z)}{a(z)}dz}\,\frac{b(x)}{a(x)}\,y = e^{\int \frac{b(z)}{a(z)}dz}\,\frac{c(x)}{a(x)}. \]
Because we’ve chosen I to put the left hand side in the special form we can rewrite
this as
\[ \frac{d}{dx}\left(y\,e^{\int \frac{b(z)}{a(z)}dz}\right) = e^{\int \frac{b(z)}{a(z)}dz}\,\frac{c(x)}{a(x)}. \]
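To see the rule on a concrete case, take the hypothetical equation $\frac{dy}{dx} + 2y = e^x$ (so a = 1, b = 2, c = $e^x$, chosen here purely for illustration). The integrating factor is $I = e^{2x}$, the rule gives $\frac{d}{dx}(ye^{2x}) = e^{3x}$, and so $y = \frac{e^x}{3} + Ke^{-2x}$. The sketch below checks that formula against the equation numerically; the names and step size are assumptions.

```python
import math


def y(x, k):
    """Candidate solution y = e^x / 3 + k e^(-2x) of the example equation y' + 2y = e^x."""
    return math.exp(x) / 3 + k * math.exp(-2 * x)


def residual(x, k, h=1e-6):
    """Left side minus right side of y' + 2y = e^x, using a central difference for y'."""
    dydx = (y(x + h, k) - y(x - h, k)) / (2 * h)
    return dydx + 2 * y(x, k) - math.exp(x)


if __name__ == "__main__":
    for x, k in [(0.3, 1.7), (-1.0, -4.0), (2.0, 0.0)]:
        print(x, k, residual(x, k))
```

The residual should be close to zero for every choice of the constant K, which reflects the one free constant expected in a first order equation.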
Example 10.2.7.
Definition 10.2.3.
Rule 10.2.2.
Example 10.2.8.
Derivation.
\[ \frac{d^2 y}{dx^2} = \frac{du}{dx} = \frac{du}{dy}\cdot\frac{dy}{dx} = \frac{du}{dy}\cdot u. \]
Substitute these two expressions into the equation and we get
\[ F\left(y, u, \frac{du}{dy}\cdot u\right) = 0 \]
which is a first order ODE.
Example 10.3.1.
[author=wikibooks, file =text_files/reducible_higher_order_ODEs]
Solve $1 + 2y^2 D^2 y = 0$ if at $x = 0$, $y = Dy = 1$.
Definition 10.3.1.
equations are much simpler to solve than typical non-linear ODE’s. Though only
a few special cases can be solved exactly in terms of elementary functions, there is
much that can be said about the solution of a generic linear ODE. A full account
would be beyond the scope of this book. If $F(x) = 0$ for all x the ODE is called
homogeneous.
Fact.
Rule 10.3.1.
Example 10.3.2.
[author=wikibooks, file =text_files/linear_higher_order_ODEs]
Consider $\frac{d^2 y}{dx^2} + \frac{2}{x}\frac{dy}{dx} - \frac{6}{x^2}y = 0$.
Rearrange and simplify: $x^2 D^2 z + 6x Dz = 0$. This is first order for $Dz$. We can
solve it to get $z = Ax^{-5}$, $y = Ax^{-3}$.
Since the equation is linear we can add this to any multiple of the other solution
to get the general solution,
\[ y = Ax^{-3} + Bx^2 \]
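A numerical spot check of the general solution is sketched below: it plugs $y = Ax^{-3} + Bx^2$ into $\frac{d^2 y}{dx^2} + \frac{2}{x}\frac{dy}{dx} - \frac{6}{x^2}y$ using finite differences. The helper names, the step size, and the sample values of A, B and x are illustrative assumptions.

```python
def general_solution(a, b):
    """Return y(x) = a * x**-3 + b * x**2 as a function of x."""
    return lambda x: a * x**-3 + b * x**2


def residual(y, x, h=1e-4):
    """Evaluate y'' + (2/x) y' - (6/x^2) y with central finite differences."""
    d1 = (y(x + h) - y(x - h)) / (2 * h)
    d2 = (y(x + h) - 2 * y(x) + y(x - h)) / h**2
    return d2 + (2 / x) * d1 - (6 / x**2) * y(x)


if __name__ == "__main__":
    for a, b in [(1.0, 0.0), (0.0, 1.0), (2.0, -1.0)]:
        print(a, b, residual(general_solution(a, b), 1.5))
```

The first two cases test each basic solution on its own; the third tests a combination, which works because the equation is linear and homogeneous.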
Rule 10.3.2.
Rule 10.3.3.
Rule 10.3.4.
Making these substitutions will give a set of simultaneous linear equations for
the coefficients of the polynomials.
Rule 10.3.5.
[author=wikibooks, file =text_files/linear_higher_order_ODEs]
Non-Linear ODE’s If the ODE is not linear, first check if it is reducible. If it is
neither linear nor reducible there is no generic method of solution. You may, with
sufficient ingenuity and algebraic skill, be able to transform it into a linear ODE.
If that is not possible, solving the ODE is beyond the scope of this book.
Chapter 11
Vectors
Discussion.
Discussion.
Definition 11.1.1.
Comment.
Definition 11.1.2.
[author=wikibooks, file =text_files/vector_operations]
The magnitude of a vector is defined as
\[ |\vec{u}| = \sqrt{u_x^2 + u_y^2} \]
where $u_x$ is the width, or run, of the vector and $u_y$ is the height, or rise, of the vector.
You should recognize this formula as simply the distance formula between two
points. It is – the magnitude is the distance between the initial point and the
terminal point.
Definition 11.1.3.
[author=wikibooks, file =text_files/vector_operations]
The direction of a vector is defined as,
\[ \tan\theta = \frac{u_y}{u_x} \]
where θ is the direction of the vector. This formula is simply the tangent formula
for right triangles.
Comment.
[author=duckworth, file =text_files/vector_operations]
Note that the definition of direction of a vector assumes that you have fixed x
and y axes in the $\mathbb{R}^2$ plane. In more general settings “direction” of a vector is
too vague; instead, one would refer more specifically to “the angle between two
vectors.”
Definition 11.1.4.
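[author=duckworth, file =text_files/vector_operations]
Let $\vec{u} = \begin{pmatrix} u_x \\ u_y \end{pmatrix}$ and $\vec{v} = \begin{pmatrix} v_x \\ v_y \end{pmatrix}$ be any vectors. We define $\vec{u} + \vec{v}$ to be the vector given
by
\[ \begin{pmatrix} u_x + v_x \\ u_y + v_y \end{pmatrix}. \]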
Comment.
Example 11.1.1.
Definition 11.1.5.
[author=wikibooks, file =text_files/vector_operations]
Let c be a real number and $\vec{u}$ any vector. We define the scalar product $c\vec{u}$ as
the vector:
\[ c\vec{u} = \begin{pmatrix} cu_x \\ cu_y \end{pmatrix} \]
Comment.
Example 11.1.2.
Fact.
Definition 11.1.6.
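[author=duckworth, file =text_files/vector_operations]
Let $\vec{u} = \begin{pmatrix} u_x \\ u_y \end{pmatrix}$ and $\vec{v} = \begin{pmatrix} v_x \\ v_y \end{pmatrix}$ be any vectors. We define the dot product $\vec{u} \cdot \vec{v}$ to
be the real number given by
\[ u_x \cdot v_x + u_y \cdot v_y. \]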
Comment.
[author=duckworth, file =text_files/vector_operations]
Note, we have used the notation “·” both for multiplying vectors and for multiplying
real numbers. We rely on the reader to tell from context whether the things being
multiplied are vectors or real numbers.
Definition 11.1.7.
Fact.
[author=duckworth, file =text_files/vector_operations]
Two vectors $\vec{u}$ and $\vec{v}$ are perpendicular to each other if and only if $\vec{u} \cdot \vec{v} = 0$.
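The dot product and the perpendicularity test are easy to try out. The sketch below represents a two-dimensional vector as a pair (x-component, y-component); the function names and the tolerance used for floating-point comparison are assumptions made for illustration.

```python
def dot(u, v):
    """Dot product of two 2D vectors given as (x, y) pairs."""
    return u[0] * v[0] + u[1] * v[1]


def is_perpendicular(u, v, tol=1e-12):
    """True when the dot product is zero, up to a small tolerance."""
    return abs(dot(u, v)) <= tol


if __name__ == "__main__":
    print(dot((1, 2), (3, 4)))
    print(is_perpendicular((1, 2), (-2, 1)))
    print(is_perpendicular((1, 2), (3, 4)))
```

Swapping the components of a vector and negating one of them, as in (1, 2) and (-2, 1), is a quick way to produce a perpendicular vector in the plane.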
Definition 11.1.8.
Definition 11.1.9.
Comment.
[author=duckworth, file =text_files/vector_operations]
Using the standard unit vectors we can write an arbitrary vector $\vec{u}$ this way
\[ \vec{u} = u_x \hat{\imath} + u_y \hat{\jmath} \]
where $u_x$ and $u_y$ are the x and y-components of $\vec{u}$, respectively.
Discussion.
[author=wikibooks, file =text_files/polar_coordinates]
Polar coordinates are an alternative two-dimensional coordinate system, which is
often useful when rotations are important. Instead of specifying the position along
the x and y axes, we specify the distance from the origin, r, and the direction, an
angle θ .
Looking at this diagram, we can see that the values of x and y are related to
those of r and θ by the equations
\[ x = r\cos\theta, \qquad r = \sqrt{x^2 + y^2} \]
\[ y = r\sin\theta, \qquad \tan\theta = \frac{y}{x} \]
Because $\tan^{-1}$ is multivalued, care must be taken to select the right value.
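In code, the usual way to pick the right value of the angle is a two-argument arctangent, which looks at the signs of x and y separately. The sketch below uses Python's standard `math.atan2` and `math.hypot`; the wrapper names `to_polar` and `to_cartesian` are illustrative.

```python
import math


def to_polar(x, y):
    """Convert Cartesian (x, y) to polar (r, theta), with theta in (-pi, pi]."""
    r = math.hypot(x, y)
    theta = math.atan2(y, x)  # quadrant-aware, unlike atan(y / x)
    return r, theta


def to_cartesian(r, theta):
    """Convert polar (r, theta) back to Cartesian (x, y)."""
    return r * math.cos(theta), r * math.sin(theta)


if __name__ == "__main__":
    # (-1, -1) lies in the third quadrant, but atan(y / x) cannot tell.
    print(to_polar(-1.0, -1.0))
    print(math.atan(-1.0 / -1.0))
```

For the point (-1, -1), `atan(y / x)` returns the first-quadrant angle, while `atan2` returns the third-quadrant one; converting back with `to_cartesian` recovers the original point only in the second case.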
Just as for Cartesian coordinates the unit vectors that point in the x and y
directions are special, so in polar coordinates the unit vectors that point in the r and
θ directions are special.
We will call these vectors $\hat{r}$ and $\hat{\theta}$, pronounced r-hat and theta-hat. Putting
a circumflex over a vector this way is often used to mean the unit vector in that
direction.
Again, on looking at the diagram we see,
\[ \vec{i} = \hat{r}\cos\theta - \hat{\theta}\sin\theta, \qquad \hat{r} = \frac{x}{r}\vec{i} + \frac{y}{r}\vec{j} \]
\[ \vec{j} = \hat{r}\sin\theta + \hat{\theta}\cos\theta, \qquad \hat{\theta} = -\frac{y}{r}\vec{i} + \frac{x}{r}\vec{j} \]
Discussion.
[author=wikibooks, file =text_files/three_dimensional_vectors]
Two-dimensional Cartesian coordinates as we’ve discussed so far can be easily
extended to three-dimensions by adding one more value z. If the standard (x, y)
coordinate axes are drawn on a sheet of paper, the z axis would extend upwards
off of the paper.
Similar to the two coordinate axes in two-dimensional coordinates, there are
three coordinate planes in space. These are the xy-plane, the yz-plane, and the
xz-plane. Each plane is the “sheet of paper” that contains both axes the name
mentions. For instance, the yz-plane contains both the y and z axes and is
perpendicular to the x axis.
Therefore, vectors can be extended to three dimensions by simply adding the
z value. For example:
\[ \vec{u} = \begin{pmatrix} x \\ y \\ z \end{pmatrix} \]
To facilitate standard form notation, we add another standard unit vector
\[ \vec{k} = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} \]
Again, both forms (component and standard) are equivalent. For example,
\[ \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix} = 1\vec{i} + 2\vec{j} + 3\vec{k} \]
Definition 11.1.10.
[author=wikibooks, file =text_files/three_dimensional_vectors]
The cross product of two vectors is defined as the following determinant:
\[ \vec{u} \times \vec{v} = \begin{vmatrix} \vec{i} & \vec{j} & \vec{k} \\ u_x & u_y & u_z \\ v_x & v_y & v_z \end{vmatrix} \]
and is a vector.
The cross product of two vectors is at right angles to both vectors. The magnitude
of the cross product is the product of the magnitudes of the vectors and
$\sin(\theta)$, where θ is the angle between the two vectors:
\[ |\vec{u} \times \vec{v}| = |\vec{u}|\,|\vec{v}| \sin(\theta) \]
This magnitude is the area of the parallelogram defined by the two vectors.
Fact.
\[ \vec{u} \times (a\vec{v} + b\vec{w}) = a\,\vec{u} \times \vec{v} + b\,\vec{u} \times \vec{w} \]
and
\[ \vec{u} \times \vec{v} = -\vec{v} \times \vec{u} \]
If both vectors point in the same direction, their cross product is zero.
Facts.
\[ \vec{u} \cdot (\vec{v} \times \vec{w}) \]
\[ \vec{u} \cdot (\vec{v} \times \vec{w}) = (\vec{u} \times \vec{v}) \cdot \vec{w} \]
Either way, the absolute value of this product is the volume of the parallelepiped
defined by the three vectors, u, v, and w.
\[ \vec{u} \times (\vec{v} \times \vec{w}) = (\vec{u} \cdot \vec{w})\vec{v} - (\vec{u} \cdot \vec{v})\vec{w} \]
\[ \vec{u} \times (\vec{v} \times \vec{w}) \ne (\vec{u} \times \vec{v}) \times \vec{w}. \]
There are special cases where the two sides are equal, but in general the brackets
matter and must not be omitted.
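These identities can be checked on concrete vectors. The sketch below writes out the determinant formula for the cross product component by component and tests the facts above on three sample integer vectors; the vectors and the helper names are arbitrary choices for illustration.

```python
def dot(u, v):
    """Dot product of two 3D vectors given as tuples."""
    return sum(a * b for a, b in zip(u, v))


def cross(u, v):
    """Cross product, expanded from the determinant with rows (i j k), u, v."""
    ux, uy, uz = u
    vx, vy, vz = v
    return (uy * vz - uz * vy, uz * vx - ux * vz, ux * vy - uy * vx)


def scale(c, u):
    """Scalar multiple c * u."""
    return tuple(c * a for a in u)


def sub(u, v):
    """Vector difference u - v."""
    return tuple(a - b for a, b in zip(u, v))


if __name__ == "__main__":
    u, v, w = (1, 2, 3), (4, 5, 6), (7, 8, 10)
    print(cross(u, v), cross(v, u))                      # opposite signs
    print(dot(u, cross(v, w)), dot(cross(u, v), w))      # equal triple products
    print(cross(u, cross(v, w)))                         # u x (v x w)
    print(sub(scale(dot(u, w), v), scale(dot(u, v), w))) # (u.w)v - (u.v)w
    print(cross(cross(u, v), w))                         # (u x v) x w, different
```

With these sample vectors the last line differs from the two before it, which illustrates why the brackets cannot be dropped.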
Discussion.
[author=wikibooks, file =text_files/three_dimensional_vectors]
We will use $\vec{r}$ to denote the position of a point.
The multiples of a vector $\vec{a}$ all lie on a line through the origin. Adding a
constant vector $\vec{b}$ will shift the line, but leave it straight, so the equation of a line
is $\vec{r} = \vec{a}s + \vec{b}$.
This is a parametric equation. The position is specified in terms of the parameter s.
Any linear combination of two vectors $\vec{a}$ and $\vec{b}$ lies on a single plane through
the origin, provided the two vectors are not collinear. We can shift this plane by a
constant vector again and write $\vec{r} = \vec{a}s + \vec{b}t + \vec{c}$.
If we choose $\vec{a}$ and $\vec{b}$ to be orthonormal vectors in the plane (i.e. unit vectors
at right angles) then s and t are Cartesian coordinates for points in the plane.
These parametric equations can be extended to higher dimensions.
Instead of giving parametric equations for the line and plane, we could use
constraints. E.g., for any point in the xy-plane, $z = 0$.
For a plane through the origin, the single vector normal to the plane, $\vec{n}$, is at
right angles to every vector in the plane, by definition, so $\vec{r} \cdot \vec{n} = 0$ is a plane
through the origin, normal to $\vec{n}$.
For planes not through the origin we get $(\vec{r} - \vec{a}) \cdot \vec{n} = 0$, i.e. $\vec{r} \cdot \vec{n} = a$,
where $a = \vec{a} \cdot \vec{n}$ is a constant.
A line lies on the intersection of two planes, so it must obey the constraint for
both planes, i.e. $\vec{r} \cdot \vec{n} = a$ and $\vec{r} \cdot \vec{m} = b$.
These constraint equations can also be extended to higher dimensions.
Discussion.
[author=wikibooks, file =text_files/three_dimensional_vectors]
For any curve given by a vector function of t, $\vec{f}(t)$, we can define a unit tangent
vector $\vec{t}$,
\[ \vec{t} = \frac{1}{|d\vec{f}/dt|}\frac{d\vec{f}}{dt}, \]
where $\vec{t}$ depends only on the geometry of the curve, not on the parameterisation.
For any unit vector $\vec{v}$ that varies with t,
\begin{align*}
1 &= \vec{v} \cdot \vec{v} \\
1 &= v_x v_x + v_y v_y + v_z v_z \\
0 &= 2v_x \dot{v}_x + 2v_y \dot{v}_y + 2v_z \dot{v}_z \\
0 &= \vec{v} \cdot \dot{\vec{v}}
\end{align*}
so the derivative of a unit vector is perpendicular to it. Applying this to $\vec{t}$ gives a second unit vector,
\[ \vec{n} = \frac{1}{|d\vec{t}/dt|}\frac{d\vec{t}}{dt}, \]
where $\vec{n}$ is called the normal to the curve. The curve lies in its n-t-plane near any point.
This plane is called the osculating plane.
Since we’ve got two perpendicular unit vectors we can define a third.
\[ \vec{b} = \vec{t} \times \vec{n} \]
This vector is called the binormal. All three of these vectors depend only on the
geometry of the curve, which makes them useful when studying that curve.
We can, for example, use them to define curvature.
Discussion.
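where s is the length measured along the curve and $\vec{v}$ is the derivative of $\vec{x}$ with
respect to t, analogous to velocity.
Integrating this, we get
\[ s = \int \sqrt{v_x^2 + v_y^2 + v_z^2}\,dt, \qquad \frac{ds}{dt} = |\vec{v}| \]
For a circle, $\vec{x} = (a\cos(t), a\sin(t), 0)$, this gives $\frac{ds}{dt} = a$ and the circumference
of the circle as $2\pi a$ just as expected.
The curvature of a curve $\vec{x}$ is defined to be
\[ \kappa = \left|\frac{d\vec{t}}{ds}\right| \]
We can get the general expression for κ by writing $\vec{v}$ and $\vec{a}$ in terms of $\vec{t}$ and $\vec{n}$:
\begin{align*}
\vec{v} &= \vec{t}\,\frac{ds}{dt} \\
\vec{a} &= \frac{d}{dt}\left(\vec{t}\,\frac{ds}{dt}\right) \\
&= \frac{d^2 s}{dt^2}\,\vec{t} + \frac{ds}{dt}\frac{d\vec{t}}{dt} \\
&= \frac{d^2 s}{dt^2}\,\vec{t} + \left(\frac{ds}{dt}\right)^2 \frac{d\vec{t}}{ds} \\
&= \frac{d^2 s}{dt^2}\,\vec{t} + \left(\frac{ds}{dt}\right)^2 \kappa\,\vec{n}
\end{align*}
where the last line follows from the definitions of $\vec{n}$ and κ.
We can now take the cross product of velocity and acceleration to get
\[ \vec{v} \times \vec{a} = \kappa \left(\frac{ds}{dt}\right)^3 \vec{b} \]
but $\vec{b}$ is a unit vector and $|ds/dt| = |\vec{v}|$ so
\[ \kappa = \frac{|\vec{v} \times \vec{a}|}{|\vec{v}|^3} \]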
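The final formula is easy to test on the circle used earlier, where the curvature should come out as one over the radius. In the sketch below the velocity and acceleration of the circle are entered by hand from its parametrization; the function names are illustrative assumptions.

```python
import math


def cross(u, v):
    """Cross product of two 3D vectors given as tuples."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def norm(u):
    """Euclidean length of a vector."""
    return math.sqrt(sum(c * c for c in u))


def curvature(v, a):
    """kappa = |v x a| / |v|^3 from velocity v and acceleration a."""
    return norm(cross(v, a)) / norm(v) ** 3


def circle_motion(radius, t):
    """Velocity and acceleration of x(t) = (radius cos t, radius sin t, 0)."""
    v = (-radius * math.sin(t), radius * math.cos(t), 0.0)
    a = (-radius * math.cos(t), -radius * math.sin(t), 0.0)
    return v, a


if __name__ == "__main__":
    v, a = circle_motion(2.0, 0.7)
    print(curvature(v, a))  # expected to be close to 1 / 2
```

The result does not depend on the parameter value t, as expected for a quantity that depends only on the geometry of the curve.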
Discussion.
[author=wikibooks, file =text_files/formal_issues_of_vector_calculus]
Topology in $\mathbb{R}^n$. We are already familiar with the nature of the regular real number
line, which is the set $\mathbb{R}$, and the two-dimensional plane, $\mathbb{R}^2$. This examination of
topology in $\mathbb{R}^n$ attempts to look at a generalization of the nature of n-dimensional
spaces: $\mathbb{R}$, $\mathbb{R}^2$, $\mathbb{R}^3$, or $\mathbb{R}^n$.
Discussion.
[author=wikibooks, file =text_files/formal_issues_of_vector_calculus]
Lengths and distances. If we have a vector in $\mathbb{R}^2$, we can calculate its length
using the Pythagorean theorem. For instance, the length of the vector (2, 3) is
$\sqrt{2^2 + 3^2} = \sqrt{13}$.
Definition 11.2.1.
[author=wikibooks, file =text_files/formal_issues_of_vector_calculus]
We can generalize this to $\mathbb{R}^n$. We define a vector’s length, written $|x|$, as the
square root of the sum of the squares of its components. That is, if we have a vector
$x = (x_1, \ldots, x_n)$, then $|x| = \sqrt{x_1^2 + x_2^2 + \cdots + x_n^2}$.
Now that we have established some concept of length, we can establish the
distance between two vectors. We define this distance to be the length of the two
vectors’ difference. We write this distance $d(x, y)$, and it is
\[ d(x, y) = |x - y| = \sqrt{\sum (x_i - y_i)^2} \]
This distance function is sometimes referred to as a metric. Other metrics
arise in different circumstances. The metric we have just defined is known as the
Euclidean metric.
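The length and distance definitions translate directly into a few lines of code that work in any number of dimensions. This is a minimal sketch with vectors stored as tuples; the names `length` and `distance` are illustrative.

```python
import math


def length(x):
    """Euclidean length: square root of the sum of squared components."""
    return math.sqrt(sum(c * c for c in x))


def distance(x, y):
    """Euclidean distance: the length of the difference x - y."""
    return length([a - b for a, b in zip(x, y)])


if __name__ == "__main__":
    print(length((2, 3)))                 # the sqrt(13) example from the text
    print(distance((1, 2, 3), (4, 6, 3)))
```

The same two functions serve for $\mathbb{R}^2$, $\mathbb{R}^3$, or any $\mathbb{R}^n$, since nothing in them depends on the number of components.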
Definition 11.2.2.
Definition 11.2.3.
Example 11.2.1.
[author=wikibooks, file =text_files/formal_issues_of_vector_calculus]
It is important to note that most sets are neither open nor closed. Think about a box
in $\mathbb{R}^2$ with its top and bottom included, and its left and right sides open - this
set is $\{(x, y) : |x| < 1 \text{ and } |y| \le 1\}$.
Definition 11.2.4.
[author=wikibooks, file =text_files/formal_issues_of_vector_calculus]
Limit points A limit point of some set S is a point where, if we construct a
neighbourhood about that point, that neighbourhood always contains some other
point in S.
Example 11.2.2.
Definition 11.2.5.
[author=wikibooks, file =text_files/formal_issues_of_vector_calculus]
If we take a set S together with all of its limit points, we call the resulting set the
closure of S, and we write it $\overline{S}$.
Comment.
[author=wikibooks, file =text_files/formal_issues_of_vector_calculus]
Limit points allow us to also characterize whether a set is open or closed - a set is
closed if it contains all its limit points.
Definition 11.2.6.
Example 11.2.3.
[author=wikibooks, file =text_files/formal_issues_of_vector_calculus]
The boundary of a closed ball in $\mathbb{R}^2$ is the circle surrounding the interior of that
ball. In symbols this means that $\partial B((0, 0), 1) = \{(x, y) : x^2 + y^2 = 1\}$.
Definition 11.2.7.
[author=wikibooks, file =text_files/formal_issues_of_vector_calculus]
Bounded sets A set S is bounded if it is contained in some ball centered at 0.
Definition 11.2.8.
[author=wikibooks, file =text_files/formal_issues_of_vector_calculus]
Curves and parametrizations. If we have a function $f : \mathbb{R} \to \mathbb{R}^n$, we say that
the image of f (i.e. the set $\{f(t) : t \in \mathbb{R}\}$) is a curve in $\mathbb{R}^n$ and that f is its
parametrization.
Parametrizations are not necessarily unique - for example, $f(t) = (\cos t, \sin t)$
such that $t \in [0, 2\pi)$ is one parametrization of the unit circle, and $g(t) = (\cos 7t, \sin 7t)$
such that $t \in [0, 2\pi/7)$ is another parametrization.
Definition 11.2.9.
[author=wikibooks, file =text_files/formal_issues_of_vector_calculus]
Collision and intersection points. Say we have two different curves. It may be
important to consider when the two curves cross each other - where they intersect -
or when the two curves hit each other at the same time - where they collide.
Definition 11.2.10.
[author=wikibooks, file =text_files/formal_issues_of_vector_calculus]
Intersection points. Firstly, we have two parametrizations f(t) and g(t), and we
want to find out where they intersect. This means that we want to know when the
function values of each parametrization are the same, so we need
to solve $f(t) = g(s)$, because we’re seeking the function values independent of the
times at which each curve reaches them.
Example 11.2.4.
[author=wikibooks, file =text_files/formal_issues_of_vector_calculus]
For example, if we have $f(t) = (t, 3t)$ and $g(t) = (t, t^2)$, and we want to find
intersection points, we solve $f(t) = g(s)$: $(t, 3t) = (s, s^2)$, so $t = s$ and $3t = s^2$, with solutions
$(t, s) = (0, 0)$ and $(3, 3)$.
So, the two curves intersect at the points (0, 0) and (3, 9).
However, if we want to know when the points “collide”, with f(t) and g(t), we
need to know when both the function values and the times are the same, so we
need to solve instead $f(t) = g(t)$.
For example, using the same functions as before, $f(t) = (t, 3t)$ and $g(t) = (t, t^2)$,
to find collision points we solve $f(t) = g(t)$: $(t, 3t) = (t, t^2)$, so $t = t$ and $3t = t^2$,
which gives solutions $t = 0, 3$. So the collision points are (0, 0) and (3, 9).
We may want to do this to actually model physical problems, such as in
ballistics.
Definition 11.2.11.
Definition 11.2.12.
similar.
We can expect the tangent vector to depend on $f'(t)$ and we know that a line
is its own tangent, so looking at a parametrised line will show us precisely how to
define the tangent vector for a curve.
An arbitrary line is $f(t) = at + b$, with $f_i(t) = a_i t + b_i$, so $f_i'(t) = a_i$ and
$f'(t) = a$, which is the direction of the line, its tangent vector.
Similarly, for any curve, the tangent vector is $f'(t)$.
The gradient of the line f(t) in the one-variable case is $f'(t)$; likewise, the
tangent vector to a curve in the several variable case is the vector $f'(t)$ (this
vector must not be 0).
Definition 11.2.13.
Definition 11.2.14.
Definition 11.2.15.
[author=wikibooks, file =text_files/formal_issues_of_vector_calculus]
Different parametrizations. One such parametrization of a curve is not necessarily
unique. Curves can have several different parametrizations. For example,
we already saw that the unit circle can be parametrized by $g(t) = (\cos(at), \sin(at))$
such that $t \in [0, 2\pi/a)$.
Generally, if f is one parametrization of a curve, and g is another, with $f(t_0) = g(s_0)$,
there is a function u(t) such that $u(t_0) = s_0$ and $g(u(t)) = f(t)$ near $t_0$.
This means, in a sense, the function u(t) “speeds up” the curve, but keeps the
curve’s shape.
Definition 11.2.16.
[author=wikibooks, file =text_files/formal_issues_of_vector_calculus]
Surfaces. A surface in space can be described by the image of a function $f : \mathbb{R}^2 \to \mathbb{R}^n$.
We call f the parametrization of that surface.
Example 11.2.5.
Comment.
[author=wikibooks, file =text_files/formal_issues_of_vector_calculus]
Surfaces can also be described explicitly, as the graph of a function $z = f(x, y)$,
which has a standard parametrization as $(x, y) \mapsto (x, y, f(x, y))$, or implicitly, in
the form $f(x, y, z) = c$.
Definition 11.2.17.
[author=wikibooks, file =text_files/formal_issues_of_vector_calculus]
Level sets. The concept of the level set (or contour) is an important one. If
you have a function f(x, y, z), a level set in $\mathbb{R}^3$ is a set of the form $\{(x, y, z) :
f(x, y, z) = c\}$. Each of these level sets is a surface.
Level sets can be similarly defined in any $\mathbb{R}^n$.
Level sets in two dimensions may be familiar from maps, or weather charts.
Each line represents a level set. For example, on a map, each contour represents
all the points where the height is the same. On a weather chart, the contours
represent all the points where the air pressure is the same.
Discussion.
Discussion.
Definition 11.2.18.
if for all positive $\epsilon$, there is a corresponding positive number δ such that $|f(x) - b| < \epsilon$
whenever $|x - a| < \delta$, with $x \ne a$.
Comment.
[author=duckworth, file =text_files/formal_issues_of_vector_calculus]
Definition 11.2.18 means that by making the difference between x and a smaller, we
can make the difference between f(x) and b as small as we want.
For grammatical convenience we sometimes describe the situation in
Definition 11.2.18 in different ways.
We read this definition as “the limit of f(x), as x approaches a, equals b.” We
also write “$f(x) \to b$ as $x \to a$”. We also will write “$\lim_{x \to a} f = b$” (where we
leave out the “x” in “f(x)”), or even $\lim f = b$ (where we leave out the “$x \to a$”).
These abbreviated forms are not used just out of laziness; it’s sometimes better to
simplify notation by leaving out unnecessary details.
Fact.
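• $\lim_{x \to a} (f + g) = b + c$,
• $\lim_{x \to a} (h(x) f(x)) = Hb$,
• $\lim_{x \to a} (f \cdot g) = b \cdot c$,
• $\lim_{x \to a} (f \times g) = b \times c$.
• If $n = 1$ and $c \ne 0$, then $\lim_{x \to a} \frac{f}{g} = \frac{b}{c}$.
• If $H \ne 0$ then $\lim_{x \to a} \frac{f}{h} = \frac{b}{H}$.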
Discussion.
[author=wikibooks, file =text_files/formal_issues_of_vector_calculus]
Continuity Again, we can use a similar definition to the one variable case to
formulate a definition of continuity for multiple variables.
Definition 11.2.19.
[author=wikibooks, file =text_files/formal_issues_of_vector_calculus]
If $f : \mathbb{R}^m \to \mathbb{R}^n$, then f is continuous at a point a in $\mathbb{R}^m$ if f(a) is defined and
Comment.
Comment.
[author=wikibooks, file =text_files/formal_issues_of_vector_calculus]
From these facts we also have that if A is some matrix which is n × m in size, with
x in $\mathbb{R}^m$, a function $f(x) = Ax$ is continuous, since the function can be expanded
in the form $x_1 a_1 + \ldots + x_m a_m$, which can be easily verified from the points above.
Fact.
Fact.
Comment.
Before we define the derivative in higher dimensions, it’s worth looking again at
the definition of derivative in one variable.
For one variable the definition of the derivative at a point p is
\[ \lim_{x \to p} \frac{f(x) - f(p)}{x - p} = f'(p) \]
We can’t divide by vectors, so this definition can’t be immediately extended to
the multiple variable case. However, we can divide by the absolute value of a
vector, so let’s rewrite this definition in terms of absolute values:
\[ \lim_{x \to p} \frac{|f(x) - f(p) - f'(p)(x - p)|}{|x - p|} = 0 \]
after pulling $f'(p)$ inside and putting it over a common denominator.
So, how can we use this for the several-variable case?
If we switch all the variables over to vectors and replace the constant (which
performs a linear map in one dimension) with a matrix (which is also a linear map),
we have
\[ \lim_{x \to p} \frac{|f(x) - f(p) - A(x - p)|}{|x - p|} = 0 \]
If, for some $f : \mathbb{R}^m \to \mathbb{R}^n$, there is an n × m matrix A for which this limit holds,
we refer to this matrix as being the
derivative and we write it as $D_p f$.
A point on terminology - in referring to the action of taking the derivative, we
write $D_p f$, but in referring to this matrix itself, it is known as the Jacobian matrix
and is also written $J_p f$. More on the Jacobian later.
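To make the matrix A concrete, the sketch below approximates it numerically by nudging one input coordinate at a time. It assumes the entries of the matrix are the partial derivatives of the component functions, which is the usual description of the Jacobian; the example function, the step size h, and the helper names are all illustrative.

```python
def jacobian(f, p, h=1e-6):
    """Central-difference estimate of the n x m matrix for f: R^m -> R^n at p."""
    m = len(p)
    n = len(f(p))
    J = [[0.0] * m for _ in range(n)]
    for j in range(m):
        forward, backward = list(p), list(p)
        forward[j] += h
        backward[j] -= h
        f_plus, f_minus = f(forward), f(backward)
        for i in range(n):
            # Row i is the output component, column j the input variable.
            J[i][j] = (f_plus[i] - f_minus[i]) / (2 * h)
    return J


def example(x):
    """A hypothetical map from R^2 to R^3, chosen only for illustration."""
    return [x[0] * x[1], x[0] ** 2, x[0] + 3 * x[1]]


if __name__ == "__main__":
    for row in jacobian(example, [2.0, 5.0]):
        print(row)
```

For a map from $\mathbb{R}^2$ to $\mathbb{R}^3$ the result has three rows and two columns, matching the shape needed for $A(x - p)$ to be a vector in $\mathbb{R}^3$.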
Discussion.
Discussion.
Discussion.
[author=wikibooks, file =text_files/derivatives_in_vector_calculus]
Continuity and differentiability
Furthermore, if all the partial derivatives exist, and are continuous in some
neighbourhood of a point p, then f is differentiable at p. This has the conse-
quence that functions which have their component functions built from continuous
Discussion.
[author=wikibooks, file =text_files/derivatives_in_vector_calculus]
Rules of taking Jacobians If $f, g : R^m \to R^n$ and $h : R^m \to R$ are differentiable at $p$, then
$$J_p(f+g) = J_p f + J_p g$$
$$J_p(hf) = h(p)\,J_p f + f(p)\,J_p h$$
$$J_p(f\cdot g) = g^T J_p f + f^T J_p g$$
Important: make sure the order is right, since matrix multiplication is not commutative!
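The product rule for Jacobians can be checked numerically. The following is only a sketch: the functions f and h and the point p are hypothetical examples chosen for illustration, and the Jacobians are approximated by central differences rather than computed exactly.

```python
def jacobian(func, p, eps=1e-6):
    """Central-difference Jacobian of func at p, returned as a list of rows (n x m)."""
    m = len(p)
    cols = []
    for j in range(m):
        hi = list(p)
        lo = list(p)
        hi[j] += eps
        lo[j] -= eps
        f_hi, f_lo = func(hi), func(lo)
        cols.append([(a - b) / (2 * eps) for a, b in zip(f_hi, f_lo)])
    n = len(cols[0])
    return [[cols[j][i] for j in range(m)] for i in range(n)]

def f(x):   # hypothetical f : R^2 -> R^2
    return [x[0] * x[1], x[0] + x[1] ** 2]

def h(x):   # hypothetical h : R^2 -> R, returned as a 1-vector
    return [x[0] ** 2 + 3.0 * x[1]]

def hf(x):  # the product h f : R^2 -> R^2
    return [h(x)[0] * fi for fi in f(x)]

p = [1.5, -0.5]
Jf, Jh = jacobian(f, p), jacobian(h, p)
lhs = jacobian(hf, p)
# Right hand side of the rule: h(p) J_p f + f(p) J_p h (a column times a row).
rhs = [[h(p)[0] * Jf[i][j] + f(p)[i] * Jh[0][j] for j in range(2)]
       for i in range(2)]
max_err = max(abs(lhs[i][j] - rhs[i][j]) for i in range(2) for j in range(2))
```

Here `max_err` measures how far the two sides of the rule differ at p; it should be small compared with the entries themselves, limited only by the finite-difference approximation.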
Discussion.
Discussion.
[author=wikibooks, file =text_files/derivatives_in_vector_calculus]
Alternate notations For simplicity, we will often use various standard abbrevi-
ations, so we can write most of the formulae on one line. This can make it easier
to see the important details.
Notation.
If we are using subscripts for both the components of a vector and for partial
derivatives we will separate them with a comma.
$$u_{x,y} = \frac{\partial u_x}{\partial y}$$
The most widely used notation is $h_x$. Both $h_1$ and $\partial_1 h$ are also quite widely used whenever the axes are numbered. The notation $\partial_x h$ is used least frequently.
We will use whichever notation best suits the equation we are working with.
Discussion.
Discussion.
Definition 11.4.1.
This is the dot product of $dr$ with a vector whose components are the partial derivatives of $f$, called the gradient of $f$:
$$\operatorname{grad} f = \nabla f = \left(\frac{\partial f(p)}{\partial x_1}, \cdots, \frac{\partial f(p)}{\partial x_n}\right)$$
We can write the action of taking the gradient vector by writing this as an
operator. Recall that in the one-variable case we can write d/dx for the action
of taking the derivative with respect to x. This case is similar, but ∇ acts like a
vector.
We can also write the action of taking the gradient vector as
$$\nabla = \left(\frac{\partial}{\partial x_1}, \frac{\partial}{\partial x_2}, \cdots, \frac{\partial}{\partial x_n}\right)$$
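As an illustration of the gradient as a vector of partial derivatives, here is a minimal sketch that approximates each component by a central difference. The helper name `grad` and the test function are assumptions made for this example only.

```python
def grad(f, p, eps=1e-6):
    """Approximate (df/dx_1, ..., df/dx_n) at the point p by central differences."""
    g = []
    for i in range(len(p)):
        hi = list(p)
        lo = list(p)
        hi[i] += eps
        lo[i] -= eps
        g.append((f(hi) - f(lo)) / (2 * eps))
    return g

def f(x):  # hypothetical example: f(x, y, z) = x^2 + y^2 + z^2
    return sum(c * c for c in x)

g = grad(f, [1.0, 2.0, 3.0])  # the exact gradient here is (2x, 2y, 2z) = (2, 4, 6)
```

Each component of `g` is the rate of change of f along one coordinate axis, which is exactly what applying the operator $\nabla$ component by component produces.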
Comment.
Example 11.4.1.
Fact.
Definition 11.4.2.
[author=wikibooks, file =text_files/gradients_divergence_curl]
Divergence If the vector function u maps Rn to itself, then we can take the dot
11.4. DIV, GRAD, CURL, AND OTHER OPERATORS 329
Comment.
[author=wikibooks, file =text_files/gradients_divergence_curl]
div u tells us how much u is converging or diverging. It is positive when the vector field is diverging from some point, and negative when the vector field is converging on that point.
Example 11.4.2.
[author=wikibooks, file =text_files/gradients_divergence_curl]
Define the vector function $v = (1 + x^2, xy)$. Then div $v = 2x + x = 3x$, which is positive to the right of the origin, where v is diverging, and negative to the left of the origin, where v is converging.
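The value 3x in this example can be confirmed numerically. The sketch below estimates the divergence of the same field v by central differences; the helper name `divergence` and the sample points are assumptions for illustration.

```python
def v(x, y):
    """The field of the example: v = (1 + x^2, x y)."""
    return (1.0 + x * x, x * y)

def divergence(field, x, y, eps=1e-6):
    """Central-difference estimate of d(v_x)/dx + d(v_y)/dy at (x, y)."""
    dvx_dx = (field(x + eps, y)[0] - field(x - eps, y)[0]) / (2 * eps)
    dvy_dy = (field(x, y + eps)[1] - field(x, y - eps)[1]) / (2 * eps)
    return dvx_dx + dvy_dy

samples = [(-2.0, 1.0), (-0.5, 3.0), (0.0, -1.0), (1.0, 2.0)]
# Difference between the numerical divergence and the claimed value 3x.
errors = [abs(divergence(v, x, y) - 3.0 * x) for x, y in samples]
```

The sign of the computed divergence changes with the sign of x, independent of y, in agreement with the formula.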
Fact.
Comment.
Comment.
Definition 11.4.3.
The curl of u tells us if the vector u is rotating around a point. The direction of curl u is the axis of rotation.
We can treat vectors in two dimensions as a special case of three dimensions, with $u_z = 0$ and $D_z u = 0$. We can then extend the definition of curl u to two-dimensional vectors and obtain curl $u = D_x u_y - D_y u_x$, the $z$ component of the three-dimensional curl. This two-dimensional curl is a scalar. In four, or more, dimensions there is no vector equivalent to the curl.
Example 11.4.3.
Example 11.4.4.
Comment.
Rules 11.4.1.
Rules 11.4.2.
11.5. INTEGRATION IN VECTOR CALCULUS 331
Definition 11.4.4.
Discussion.
[author=wikibooks, file =text_files/gradients_divergence_curl]
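We can also take the Laplacian of a vector,
$$\nabla^2 u(x_1, x_2, \ldots, x_n) = \frac{\partial^2 u}{\partial x_1^2} + \frac{\partial^2 u}{\partial x_2^2} + \ldots + \frac{\partial^2 u}{\partial x_n^2}$$
The Laplacian of a vector is not the same as the gradient of its divergence; the two differ by the curl of the curl:
$$\nabla(\nabla\cdot u) - \nabla^2 u = \nabla\times(\nabla\times u)$$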
Both the curl of the gradient and the divergence of the curl are always zero.
$$\nabla\times\nabla f = 0, \qquad \nabla\cdot(\nabla\times u) = 0$$
This pair of rules will prove useful.
Discussion.
[author=wikibooks, file =text_files/integration_in_vector_calculus]
We have already considered differentiation of functions of more than one variable,
which leads us to consider how we can meaningfully look at integration.
In the single variable case, we interpret the definite integral of a function to
mean the area under the function. There is a similar interpretation in the multiple
Definition 11.5.1.
Definition 11.5.2.
component.
Definition 11.5.3.
Example 11.5.1.
where C is the curve being integrated along, and t is the unit vector tangent to
the curve.
Rule 11.5.1.
[author=wikibooks, file =text_files/integration_in_vector_calculus]
There are some particularly natural ways to integrate a vector function, u, along a curve,
$$\int_C u\,ds, \qquad \int_C u\cdot dr, \qquad \int_C u\times dr, \qquad \int_C u\cdot n\,ds$$
If the curve is planar and u a vector lying in the same plane, the third integral can be usefully rewritten. Say, $u = u_t t + u_n n + u_b b$ where t, n, and b are the tangent, normal, and binormal vectors uniquely defined by the curve.
Then $u \times t = -b\,u_n + n\,u_b$.
For the 2-d curves specified b is the constant unit vector normal to their plane, and $u_b$ is always zero.
Therefore, for such curves, $\int_C u\times dr = -b\int_C u\cdot n\,ds$.
Discussion.
Example 11.5.2.
[author=wikibooks, file =text_files/integration_in_vector_calculus]
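For example, if $V = r^2$ then $\nabla V = 2(x, y, z) = 2\mathbf{r}$ and, with $\mathbf{u} = (u, v, w)$,
$$\int_{\mathbf 0}^{\mathbf r} 2\mathbf{u}\cdot d\mathbf{u} = 2\int_{\mathbf 0}^{\mathbf r}(u\,du + v\,dv + w\,dw) = \left[u^2\right]_{\mathbf 0}^{\mathbf r} + \left[v^2\right]_{\mathbf 0}^{\mathbf r} + \left[w^2\right]_{\mathbf 0}^{\mathbf r} = x^2 + y^2 + z^2 = r^2$$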
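The same line integral can be approximated numerically along different paths. This is a sketch under stated assumptions: the end point (1, 2, 3), the two paths, and the helper `line_integral` are hypothetical choices made for illustration.

```python
def line_integral(field, path, n=2000):
    """Midpoint-rule approximation of the integral of field . dr along path(t), 0 <= t <= 1."""
    total = 0.0
    for k in range(n):
        a, b = path(k / n), path((k + 1) / n)
        mid = [(ai + bi) / 2 for ai, bi in zip(a, b)]
        total += sum(fi * (bi - ai) for fi, ai, bi in zip(field(mid), a, b))
    return total

def grad_V(r):
    """Gradient of V = x^2 + y^2 + z^2, i.e. 2 r."""
    return [2 * c for c in r]

end = (1.0, 2.0, 3.0)  # hypothetical end point, so r^2 = 1 + 4 + 9 = 14

def straight(t):
    return [t * c for c in end]

def curved(t):  # a second, arbitrary path with the same end points
    return [t ** 2 * end[0], t * end[1], t ** 3 * end[2]]

I_straight = line_integral(grad_V, straight)
I_curved = line_integral(grad_V, curved)
```

Both integrals come out as $r^2$ at the end point, whichever path is used.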
Example 11.5.3.
The curl of u is
$$\frac{1}{2}\begin{vmatrix} i & j & k \\ D_x & D_y & D_z \\ -y & x & 0 \end{vmatrix} = k = v$$
as expected.
Comment.
[author=wikibooks, file =text_files/integration_in_vector_calculus]
We will soon see that these three integrals do not depend on the path, apart from
a constant.
Discussion.
dS = ndS.
For a scalar function V and a vector function v this gives us the integrals
$$\int_A V\,dS, \qquad \int_A v\cdot dS, \qquad \int_A v\times dS$$
These integrals can be reduced to parametric integrals but, written this way, it is
clear that they reflect more of the geometry of the surface.
When working in three dimensions, dV is a scalar, so there is only one option
for integrals over volumes.
Discussion.
[author=wikibooks, file =text_files/integration_in_vector_calculus]
Gauss's divergence theorem We know that, in one dimension, $\int_a^b Df\,dx = f\,|_a^b$. Integration is the inverse of differentiation, so integrating the derivative of a function returns the original function.
This can be extended to two or more dimensions in a natural way, drawing on
the analogies between single variable and multivariable calculus.
The analog of D is ∇ , so we should consider cases where the integrand is a
divergence.
Instead of integrating over a one-dimensional interval, we need to integrate over an n-dimensional volume.
In one dimension, the integral depends on the values at the edges of the interval,
so we expect the result to be connected with values on the boundary.
This suggests the following theorem.
Theorem 11.5.1.
[author=wikibooks, file =text_files/integration_in_vector_calculus]
$$\int_V \nabla\cdot u\,dV = \int_{\partial V} n\cdot u\,dS$$
Comment.
[author=wikibooks, file =text_files/integration_in_vector_calculus]
This is indeed true, for vector fields in any number of dimensions.
This is called Gauss's theorem.
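A two-dimensional instance of the theorem can be checked numerically. The sketch below reuses the field v = (1 + x^2, xy) of Example 11.4.2, whose divergence is 3x; the choice of the unit square as the region and of the midpoint rule as the integration method are assumptions made for illustration.

```python
def v(x, y):
    """The field of Example 11.4.2: v = (1 + x^2, x y)."""
    return (1.0 + x * x, x * y)

def div_v(x, y):
    return 3.0 * x  # divergence computed in Example 11.4.2

n = 400
h = 1.0 / n
mids = [(k + 0.5) * h for k in range(n)]

# Left hand side: integral of div v over the unit square (midpoint rule).
volume_integral = sum(div_v(x, y) for x in mids for y in mids) * h * h

# Right hand side: outward flux n . v through the four edges of the square.
flux = 0.0
flux += sum(v(1.0, y)[0] for y in mids) * h   # right edge,  n = (1, 0)
flux -= sum(v(0.0, y)[0] for y in mids) * h   # left edge,   n = (-1, 0)
flux += sum(v(x, 1.0)[1] for x in mids) * h   # top edge,    n = (0, 1)
flux -= sum(v(x, 0.0)[1] for x in mids) * h   # bottom edge, n = (0, -1)
```

Both `volume_integral` and `flux` evaluate to 3/2 here: the interior integral of the divergence matches the net flow out through the boundary.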
Theorem 11.5.2.
[author= wikibooks , file =text_files/integration_in_vector_calculus]
There are two other, closely related, theorems for grad and curl,
$$\int_V \nabla u\,dV = \int_{\partial V} u\,n\,dS,$$
and
$$\int_V \nabla\times u\,dV = \int_{\partial V} n\times u\,dS,$$
with the last theorem only being valid where curl is defined.
Discussion.
Theorem 11.5.3.
[author= wikibooks , file =text_files/integration_in_vector_calculus]
$$\int_S \nabla\cdot u\,dS = \oint_{\partial S} n\cdot u\,ds$$
where s is arclength along the boundary curve and the vector n is the unit normal to the curve that lies in the surface S, i.e. in the tangent plane of the surface at its boundary, which is not necessarily the same as the unit normal associated with the boundary curve itself.
Theorem 11.5.4.
[author= wikibooks , file =text_files/integration_in_vector_calculus]
Similarly, we get
$$\int_S \nabla\times u\,dS = \int_C n\times u\,ds \qquad (1),$$
where C is the boundary of S.
Comment.
The left hand side is an integral over a closed surface bounding some volume V so we can use Gauss's divergence theorem.
$$\int_{S_1+S_2} \nabla\times u\cdot dS = \int_V \nabla\cdot\nabla\times u\,dV$$
but we know this integrand is always zero so the right hand side of (2) must always be zero, i.e. the integral is independent of the surface.
This means we can choose the surface so that the normal to the curve lying in the surface is the same as the curve's intrinsic normal.
Then, if u itself lies in the surface, we can write
u = (u · n) n + (u · t) t
just as we did for line integrals in the plane earlier, and substitute this into (1)
to get the following.
Partial Differential
Equations
Discussion.
Discussion.
Discussion.
[author=wikibooks, file =text_files/partial_diffeqs]
Now consider the slightly more complex PDE $a_x u_x + a_y u_y + a_z u_z = h(u)$ (2)
where h can be any function, and each a is a real constant.
340 CHAPTER 12. PARTIAL DIFFERENTIAL EQUATIONS
We recognise the left hand side as being $a\cdot\nabla u$, so this equation says that the derivative of u in the a direction is h(u). Comparing this with the first equation suggests that the solution can be written as an arbitrary function on the plane normal to a combined with the solution of an ODE.
Remembering from earlier that any vector r can be split up into components parallel and perpendicular to a,
$$r = r_\perp + r_\parallel = \left(r - \frac{(r\cdot a)a}{|a|^2}\right) + \frac{(r\cdot a)a}{|a|^2}$$
we will use this to split the components of r in a way suggested by the analogy with (1).
Let's write
$$r = (x, y, z) = r_\perp + s a, \qquad s = \frac{r\cdot a}{a\cdot a}$$
and substitute this into (2), using the chain rule. Because we are only differentiating in the a direction, adding any function of the perpendicular vector to s will make no difference.
First we calculate grad s, for use in the chain rule,
$$\nabla s = \frac{a}{a^2}, \qquad\text{so that}\qquad \nabla u(s) = \frac{a}{a^2}\frac{d}{ds}u(s)$$
On making the substitution into (2), we get,
$$h(u) = a\cdot\nabla s\,\frac{d}{ds}u(s) = \frac{a\cdot a}{a\cdot a}\frac{d}{ds}u(s) = \frac{du}{ds}$$
which is an ordinary differential equation with the solution
$$s = c(r_\perp) + \int^{u}\frac{dt}{h(t)}$$
The constant c can depend on the perpendicular components, but not upon the
parallel coordinate. Replacing s with a monotonic scalar function of s multiplies
the ODE by a function of s, which doesn’t affect the solution.
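To make this concrete, take the special case h(u) = u. The solution formula then gives s = c + ln u, i.e. u = C(r_perp) e^s for an arbitrary function C of the perpendicular part. The sketch below checks numerically that such a u satisfies a . grad u = u; the vector a, the function C and the test point are hypothetical choices made for illustration.

```python
import math

a = (1.0, 2.0, 2.0)                      # hypothetical constant vector
aa = sum(c * c for c in a)               # a . a

def u(r):
    s = sum(ri * ai for ri, ai in zip(r, a)) / aa        # parallel coordinate s
    r_perp = [ri - s * ai for ri, ai in zip(r, a)]        # perpendicular part of r
    c = 1.0 + sum(p * p for p in r_perp)                  # arbitrary function of r_perp
    return c * math.exp(s)

def a_dot_grad(func, r, eps=1e-6):
    """Directional derivative a . grad(func) at r, by central differences."""
    fwd = func([ri + eps * ai for ri, ai in zip(r, a)])
    bwd = func([ri - eps * ai for ri, ai in zip(r, a)])
    return (fwd - bwd) / (2 * eps)

point = [0.3, -1.2, 0.7]
residual = a_dot_grad(u, point) - u(point)   # should vanish when h(u) = u
```

Moving along a changes s but leaves r_perp fixed, which is why the arbitrary factor behaves like a constant in the ODE.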
Example 12.1.1.
Derivation.
Consider a PDE,
$$a(x,y)u_x + b(x,y)u_y = c(x,y,u)$$
For the suggested solution, $u = f(x(s), y(s))$, the chain rule gives
$$\frac{du}{ds} = \frac{dx}{ds}u_x + \frac{dy}{ds}u_y$$
Comparing coefficients then gives
$$\frac{dx}{ds} = a(x,y), \qquad \frac{dy}{ds} = b(x,y), \qquad \frac{du}{ds} = c(x,y,u)$$
so we've reduced our original PDE to a set of simultaneous ODEs. This procedure can be reversed.
The curves (x(s), y(s)) are called characteristics of the equation.
Example 12.1.2.
[author=wikibooks, file =text_files/partial_diffeqs]
Solve $y u_x = x u_y$ given $u = f(x)$ for $x \ge 0$. The ODEs are
$$\frac{dx}{ds} = y, \qquad \frac{dy}{ds} = -x, \qquad \frac{du}{ds} = 0$$
subject to the initial conditions at s = 0,
$$x(0) = r, \qquad y(0) = 0, \qquad u(0) = f(r), \qquad r \ge 0$$
These ODEs are easily solved, giving
$$x(s) = r\cos s, \qquad y(s) = -r\sin s, \qquad u(s) = f(r)$$
so the characteristics are concentric circles round the origin, and in polar coordinates $u(r, \theta) = f(r)$.
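The characteristic ODEs of this example can also be integrated numerically. The following sketch uses a classical fourth-order Runge-Kutta step, a technique chosen here for illustration rather than taken from the text; the initial data f, the starting radius and the step size are likewise assumptions.

```python
import math

def rk4_step(state, ds):
    """One classical Runge-Kutta step for dx/ds = y, dy/ds = -x, du/ds = 0."""
    def rhs(st):
        x, y, u = st
        return (y, -x, 0.0)

    def shifted(st, k, w):
        return tuple(si + w * ki for si, ki in zip(st, k))

    k1 = rhs(state)
    k2 = rhs(shifted(state, k1, ds / 2))
    k3 = rhs(shifted(state, k2, ds / 2))
    k4 = rhs(shifted(state, k3, ds))
    return tuple(si + ds / 6 * (p + 2 * q + 2 * v + w)
                 for si, p, q, v, w in zip(state, k1, k2, k3, k4))

def f(x):  # hypothetical initial data u = f(x)
    return math.exp(-x)

r = 2.0
state = (r, 0.0, f(r))            # x(0) = r, y(0) = 0, u(0) = f(r)
ds, steps = 0.001, 3000           # integrate up to s = 3
for _ in range(steps):
    state = rk4_step(state, ds)
x, y, u = state
radius = math.hypot(x, y)         # distance from the origin after integration
```

Along the computed characteristic the distance from the origin stays at r and u keeps its initial value f(r), which is the content of u(r, θ) = f(r).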
Considering the logic of this method, we see that the independence of a and b from u has not been used either, so that assumption too can be dropped, giving the general method for equations of this quasilinear form.
Discussion.
[author=wikibooks, file =text_files/quasilinear_partial_diffeqs]
Summarising the conclusions of the last section, to solve a PDE
$$a_1(u,x)\frac{\partial u}{\partial x_1} + a_2(u,x)\frac{\partial u}{\partial x_2} + \cdots + a_n(u,x)\frac{\partial u}{\partial x_n} = b(u,x)$$
subject to the initial condition that on the surface $(x_1(r_1,\ldots,r_{n-1}), \ldots, x_n(r_1,\ldots,r_{n-1}))$, $u = f(r_1,\ldots,r_{n-1})$, this being an arbitrary parametrisation of the initial surface, we proceed as follows.
First, we transform the equation to the equivalent set of ODEs,
$$\frac{dx_1}{ds} = a_1, \quad \ldots, \quad \frac{dx_n}{ds} = a_n, \quad \frac{du}{ds} = b$$
subject to the initial conditions $x_i(0) = x_i(r_1,\ldots,r_{n-1})$, $u(0) = f(r_1, r_2, \ldots, r_{n-1})$.
Second, solve the ODEs, giving $x_i$ and u as functions of s and the $r_i$.
Third, invert this to get s and the $r_i$ as functions of the $x_i$.
Finally, substitute these inverse functions into the expression for u as a function of s and the $r_i$ obtained in the second step.
Both the second and third steps may be troublesome.
The set of ODEs is generally non-linear and without analytical solution. It
may even be easier to work with the PDE than with the ODEs.
In the third step, the $r_i$ together with s form a coordinate system adapted for the PDE. We can only make the inversion at all if the Jacobian of the transformation to Cartesian coordinates is not zero,
$$\begin{vmatrix} \frac{\partial x_1}{\partial r_1} & \cdots & \frac{\partial x_1}{\partial r_{n-1}} & a_1 \\ \vdots & \ddots & \vdots & \vdots \\ \frac{\partial x_n}{\partial r_1} & \cdots & \frac{\partial x_n}{\partial r_{n-1}} & a_n \end{vmatrix} \neq 0$$
This is equivalent to saying that the vector $(a_1, \ldots, a_n)$ is never in the tangent plane to a surface of constant s.
Even if this condition holds when s = 0, it may fail as the equations are integrated. We will soon consider ways of dealing with the problems this can cause.
Example 12.2.1.
[author=wikibooks, file =text_files/quasilinear_partial_diffeqs]
To see how this works in practice, we will consider the PDE $u u_x + u_y + u_t = 0$ with generic initial condition, $u = f(x, y)$ on $t = 0$.
Naming variables for future convenience, the corresponding ODEs are
$$\frac{dx}{d\tau} = u, \qquad \frac{dy}{d\tau} = 1, \qquad \frac{dt}{d\tau} = 1, \qquad \frac{du}{d\tau} = 0$$
subject to the initial conditions at $\tau = 0$,
$$x = r, \qquad y = s, \qquad t = 0, \qquad u = f(r, s)$$
These ODEs are easily solved to give
$$x = r + f(r,s)\tau, \qquad y = s + \tau, \qquad t = \tau, \qquad u = f(r,s)$$
These are the parametric equations of a set of straight lines, the characteristics.
The determinant of the Jacobian of this coordinate transformation is
$$\begin{vmatrix} 1+\tau\frac{\partial f}{\partial r} & \tau\frac{\partial f}{\partial s} & f \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{vmatrix} = 1 + \tau\frac{\partial f}{\partial r}$$
This determinant is 1 when t = 0, but if $f_r$ is anywhere negative this determinant will eventually be zero, and this solution fails.
In this case, the failure is because the surface $\tau f_r = -1$ is an envelope of the characteristics.
For arbitrary f we can invert the transformation and obtain an implicit expression for u,
$$u = f(x - tu,\ y - t)$$
If f is given this can be solved for u.
Example 12.2.2.
Example 12.2.3.
[author=wikibooks, file =text_files/quasilinear_partial_diffeqs]
Consider the case $f(x, y) = x^2$. The implicit solution is
$$u = (x - tu)^2 \Rightarrow u = \frac{1 + 2tx - \sqrt{1 + 4tx}}{2t^2}$$
[Figure: plot of this solution]
This solution clearly fails when $1 + 4tx < 0$, and the boundary $1 + 4tx = 0$ is just where $\tau f_r = -1$. For any $t > 0$ this happens somewhere. As t increases this point of failure moves toward the origin.
Notice that the point where u=0 stays fixed. This is true for any solution of
this equation, whatever f is.
We will see later that we can find a solution after this time, if we consider
discontinuous solutions. We can think of this as a shockwave.
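For the initial condition f = x^2, solving the quadratic u = (x - tu)^2 for the root that reduces to x^2 as t tends to 0 gives u = (1 + 2tx - sqrt(1 + 4tx)) / (2t^2). The sketch below checks numerically that this expression satisfies both the implicit equation and the PDE u u_x + u_y + u_t = 0; the sample point and the finite-difference step are assumptions made for illustration.

```python
import math

def u_exact(x, t):
    """Root of u = (x - t*u)^2 that tends to x^2 as t -> 0; needs 1 + 4*t*x >= 0."""
    return (1 + 2 * t * x - math.sqrt(1 + 4 * t * x)) / (2 * t * t)

def pde_residual(x, t, eps=1e-5):
    """u*u_x + u_t by central differences; u has no y dependence here, so u_y = 0."""
    u = u_exact(x, t)
    u_x = (u_exact(x + eps, t) - u_exact(x - eps, t)) / (2 * eps)
    u_t = (u_exact(x, t + eps) - u_exact(x, t - eps)) / (2 * eps)
    return u * u_x + u_t

x, t = 0.8, 0.5                            # hypothetical sample point with 1 + 4tx > 0
u = u_exact(x, t)
implicit_gap = u - (x - t * u) ** 2        # zero if u solves the implicit equation
residual = pde_residual(x, t)              # zero (up to discretisation) if u solves the PDE
u_at_origin = u_exact(0.0, t)              # the point where u = 0 stays at x = 0
```

The square root is only real where 1 + 4tx is non-negative, which is where this smooth solution exists.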
12.3. INITIAL VALUE PROBLEMS 343
Example 12.2.4.
Example 12.2.5.
[author=wikibooks, file =text_files/quasilinear_partial_diffeqs]
We can also consider the closely related PDE $u u_x + u_y + u_t = y$. The corresponding ODEs are
$$\frac{dx}{d\tau} = u, \qquad \frac{dy}{d\tau} = 1, \qquad \frac{dt}{d\tau} = 1, \qquad \frac{du}{d\tau} = y$$
subject to the initial conditions at $\tau = 0$,
$$x = r, \qquad y = s, \qquad t = 0, \qquad u = f(r, s)$$
These ODEs are easily solved to give
$$x = r + \tau f + \tfrac{1}{2}s\tau^2 + \tfrac{1}{6}\tau^3, \qquad y = s + \tau, \qquad t = \tau, \qquad u = f + s\tau + \tfrac{1}{2}\tau^2$$
Writing f in terms of u, s, and $\tau$, then substituting into the equation for x gives an implicit solution
$$u(x, y, t) = f\left(x - ut + \tfrac{1}{2}yt^2 - \tfrac{1}{6}t^3,\ y - t\right) + yt - \tfrac{1}{2}t^2$$
It is possible to solve this for u in some special cases, but in general we can
only solve this equation numerically. However, we can learn much about the global
properties of the solution from further analysis
If we were to simply use the general solution of this equation for smooth initial
conditions,
u(x, t) = u(x + ct, 0)
we would get
$$u(x,t) = \begin{cases} 1, & x + ct \ge 0 \\ 0, & x + ct < 0 \end{cases}$$
which appears to be a solution to the original equation. However, since the partial derivatives are undefined on the characteristic x + ct = 0, it becomes unclear what it means to say that the equation is true at that point.
We need to investigate further, starting by considering the possible types of
discontinuities.
If we look at the derivations above, we see we've never used any second or higher order derivatives, so it doesn't matter if they aren't continuous; the results above will still apply.
The next simplest case is when the function is continuous, but the first derivative is not, e.g. |x|. We'll initially restrict ourselves to the two-dimensional case, u(x, t) for the generic equation.
Now we let the width of the strip fall to zero. The right hand side also tends
to zero but the left hand side reduces to the difference between two integrals along
the part of the boundary of R parallel to the curve.
$$\int \left(a u_+\,dt - b u_+\,dx\right) - \int \left(a u_-\,dt - b u_-\,dx\right) = 0$$
The integrals along the opposite sides of R have different signs because they are in opposite directions.
For the last equation to always be true, the integrand must always be zero, i.e.
$$\left(a\frac{dt_0}{ds} - b\frac{dx_0}{ds}\right)[u] = 0$$
Since, by assumption, [u] isn't zero, the other factor must be, which immediately implies the curve of discontinuity is a characteristic.
Once again, discontinuities propagate along characteristics.
Discussion.
At larger t the solution u is more spread out than at t=0 but still the same
shape.
We can also consider what happens when a tends to 0, so that u itself is
discontinuous at x=0.
If we write the PDE in conservation form then use Green's theorem, as we did above for the linear case, we get
$$[u]\frac{dx_0}{ds} = \frac{1}{2}[u^2]\frac{dt_0}{ds}$$
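As a worked step (a sketch, writing $u_+$ and $u_-$ for the values of u on the two sides of the discontinuity), the jump relation $[u]\,dx_0/ds = \tfrac{1}{2}[u^2]\,dt_0/ds$ can be solved for the speed at which the discontinuity moves:

```latex
\frac{dx_0}{dt_0}
  = \frac{dx_0/ds}{dt_0/ds}
  = \frac{1}{2}\,\frac{[u^2]}{[u]}
  = \frac{1}{2}\,\frac{u_+^2 - u_-^2}{u_+ - u_-}
  = \frac{u_+ + u_-}{2}
```

Read this way, a step discontinuity travels at the average of the values of u on its two sides, provided $[u] \neq 0$ so that the division is allowed.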
Example 12.3.1.
This has slope discontinuities at x=0 and x=a, dividing the solution into three
regions.
The boundaries between these regions are given by the characteristics through
these initial points, namely the two lines
$$x = t, \qquad x = a$$
These characteristics intersect at t=a, so the nature of the solution must change
then.
In between these two discontinuities, the characteristic through x=b at t=0 is
clearly
$$x = \left(1 - \frac{b}{a}\right)t + b, \qquad 0 \le b \le a$$
This is the reverse of what we saw for the initial condition previously consid-
ered, two slope discontinuities merging into a step discontinuity rather than vice
versa. Which actually happens depends entirely on the initial conditions. Indeed,
examples could be given for which both processes happen.
In the two examples above, we started with a discontinuity and investigated how it evolved. It is also possible for solutions which are initially smooth to become discontinuous.
For example, we saw earlier for this particular PDE that the solution with the initial condition $u = x^2$ breaks down when 4xt + 1 = 0. At these points the solution becomes discontinuous.
Typically, discontinuities in the solution of any partial differential equation, not merely ones of first order, arise when solutions break down in this way and propagate similarly, merging and splitting in the same fashion.
Discussion.
Example 12.4.1.
Note that the partial derivatives are constant on the characteristics. This always happens when the PDE contains only partial derivatives, simplifying the procedure.
These equations are readily solved to give
$$x = r(1 - 4\tau), \qquad y = s(1 - 4\tau), \qquad t = \tau, \qquad u = (r^2 + s^2)(1 - 4\tau)$$
On eliminating the parameters we get the solution, $u = \frac{x^2 + y^2}{1 - 4t}$, which can easily be checked.
12.5. HIGHER ORDER PDE’S 349
$$a(x,y)u_{xx} + b(x,y)u_{xy} + c(x,y)u_{yy} = d(x,y)u_x + e(x,y)u_y + p(x,y)u + q(x,y) \qquad (1)$$
The natural approach, after our experience with ordinary differential equations and with simple algebraic equations, is to attempt a factorisation. Let's see how far this takes us.
We would expect factoring the left hand side of (1) to give us an equivalent equation of the form
$$a(x,y)\left(D_x + \alpha_+(x,y)D_y\right)\left(D_x + \alpha_-(x,y)D_y\right)u$$
and we can immediately divide through by a. This suggests that those particular combinations of first order derivatives will play a special role.
Now, when studying first order PDEs we saw that such combinations were
equivalent to the derivatives along characteristic curves. Effectively, we changed
to a coordinate system defined by the characteristic curve and the initial curve.
Here, we have two combinations of first order derivatives each of which may
define a different characteristic curve. If so, the two sets of characteristics will
define a natural coordinate system for the problem, much as in the first order
case.
In the new coordinates, each of the factors will have become a differentiation along its respective characteristic curve, and the left hand side will become simply $u_{rs}$, giving us an equation of the form
$$u_{rs} = A(r,s)u_r + B(r,s)u_s + C(r,s)u + D(r,s).$$
If A, B, and C all happen to be zero, the solution is obvious. If not, we can hope
that the simpler form of the left hand side will enable us to make progress.
However, before we can do all this, we must see if (1) can actually be factorised.
Multiplying out the factors gives
$$u_{xx} + \frac{b(x,y)}{a(x,y)}u_{xy} + \frac{c(x,y)}{a(x,y)}u_{yy} = u_{xx} + (\alpha_+ + \alpha_-)u_{xy} + \alpha_+\alpha_- u_{yy}$$
On comparing coefficients, and solving for the $\alpha$s, we see that they are the roots of
$$a(x,y)\alpha^2 - b(x,y)\alpha + c(x,y) = 0$$
Since we are discussing real functions, we are only interested in real roots, so
the existence of the desired factorisation will depend on the discriminant of this
quadratic equation.
If $b(x,y)^2 > 4a(x,y)c(x,y)$ then we have two factors, and can follow the procedure outlined above. Equations like this are called hyperbolic.
If $b(x,y)^2 = 4a(x,y)c(x,y)$ then we have only one factor, giving us a single characteristic curve. It will be natural to use distance along these curves as one coordinate, but the second must be determined by other considerations. The same line of argument as before shows that using the characteristic curve this way gives a second order term of the form $u_{rr}$, where we've only taken the second derivative with respect to one of the two coordinates. Equations like this are called parabolic.
If $b(x,y)^2 < 4a(x,y)c(x,y)$ then we have no real factors. In this case the best we can do is reduce the second order terms to the simplest possible form satisfying this inequality, i.e. $u_{rr} + u_{ss}$. It can be shown that this reduction is always possible. Equations like this are called elliptic.
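The three cases depend only on the sign of the discriminant at each point, which is easy to express in code. This is a minimal sketch; the function name `classify` and the variable-coefficient example are assumptions chosen for illustration.

```python
def classify(a, b, c):
    """Classify a*u_xx + b*u_xy + c*u_yy = ... at a point from the sign of b^2 - 4ac."""
    disc = b * b - 4 * a * c
    if disc > 0:
        return "hyperbolic"
    if disc == 0:
        return "parabolic"
    return "elliptic"

# Hypothetical example with a variable coefficient: u_xx + y*u_yy = 0,
# so a = 1, b = 0, c = y and the type depends on the sign of y.
types = [classify(1.0, 0.0, y) for y in (-1.0, 0.0, 1.0)]
```

Because the coefficients may vary with position, the same equation can receive different labels in different regions of the plane, as the hypothetical example shows across the line y = 0.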
It can be shown that, just as for first order PDEs, discontinuities propagate
along characteristics. Since elliptic equations have no real characteristics, this
implies that any discontinuities they may have will be restricted to isolated points
i.e, that the solution is almost everywhere smooth.
This is not true for hyperbolic equations. Their behaviour is largely controlled
by the shape of their characteristic curves.
These differences mean different methods are required to study the three types of second order equation. Fortunately, changing variables as indicated by the factorisation above lets us reduce any second order PDE to one in which the coefficients of the second order terms are constant, which means it is sufficient to consider only three standard equations.
$$u_{xx} + u_{yy} = 0, \qquad u_{xx} - u_{yy} = 0, \qquad u_{xx} - u_y = 0$$
We could also consider the cases where the right hand side of these equations
is a given function, or proportional to u or to one of its first order derivatives,
but all the essential properties of hyperbolic, parabolic, and elliptic equations are
demonstrated by these three standard forms.
Derivation.
When the coefficients are not constant, an equation can be hyperbolic in some
regions of the xy plane, and elliptic in others. If so, different methods must be
used for the solutions in the two regions.
Derivation.
$$\nabla^2 h = h_t$$
$$\int_{-a}^{b} h_t\,dx = \int_{-a}^{b} h_{xx}\,dx$$
$$\frac{d}{dt}\int_{-a}^{b} h\,dx = \left[h_x\right]_{-a}^{b}$$
Provided that $h_x$ tends to zero for large x, we can take the limit as a and b tend to infinity, deducing
$$\frac{d}{dt}\int_{-\infty}^{\infty} h\,dx = 0$$
so the integral of h over all space is constant.
This means this PDE can be thought of as describing some conserved quantity,
initially concentrated but spreading out, or diffusing, over time.
This last result can be extended to two or more dimensions, using the theorems
of vector calculus.
We can also differentiate any solution with respect to any coordinate to obtain another solution. E.g. if h is a solution then
$$\nabla^2 h_x = \partial_x \nabla^2 h = \partial_x \partial_t h = \partial_t h_x$$
Derivation.
$$r = \alpha x, \qquad \tau = \alpha^2 t$$
then the equation retains the same form. This suggests that the combination of variables $x^2/t$, which is unaffected by this variable change, may be significant.
We therefore assume this equation to have a solution of the special form
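$$h(x,t) = f(\eta) \quad\text{where}\quad \eta = \frac{x}{t^{1/2}}$$
then
$$h_x = \eta_x f_\eta = t^{-1/2} f_\eta, \qquad h_t = \eta_t f_\eta = -\frac{\eta}{2t} f_\eta$$
and substituting into the diffusion equation eventually gives
$$f_{\eta\eta} + \frac{\eta}{2} f_\eta = 0$$
which is an ordinary differential equation.
Integrating once gives
$$f_\eta = A e^{-\eta^2/4}$$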
Reverting to h, we find
$$h_x = \frac{A}{\sqrt{t}} e^{-\eta^2/4}$$
$$h = \frac{A}{\sqrt{t}}\int_{-\infty}^{x} e^{-s^2/4t}\,ds + B = 2A\int_{-\infty}^{x/2\sqrt{t}} e^{-z^2}\,dz + B$$
where the substitution $s = 2\sqrt{t}\,z$ supplies the factor $2\sqrt{t}$.
This last integral can not be written in terms of elementary functions, but its values are well known.
In particular the limiting values of h at infinity are
$$h(-\infty, t) = B, \qquad h(\infty, t) = B + 2A\sqrt{\pi}.$$
We see that the initial discontinuity is immediately smoothed out. The solution at later times retains the same shape, but is more stretched out.
The derivative of this solution with respect to x
$$h_x = \frac{A}{\sqrt{t}} e^{-x^2/4t}$$
is itself a solution, with h spreading out from its initial peak, and plays a significant
role in the further analysis of this equation.
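Both similarity solutions can be checked numerically. The sketch below builds the smoothed step as the antiderivative in x of the spreading peak $(A/\sqrt{t})\,e^{-x^2/4t}$, written with the standard error function, then tests that each satisfies $h_t = h_{xx}$ and that the integral of the peak over all x does not change with time. The constants A and B, the sample point, and the helper names are assumptions made for illustration.

```python
import math

A, B = 1.0, 0.0                       # hypothetical constants

def h_peak(x, t):
    """Spreading peak: (A / sqrt t) * exp(-x^2 / 4t)."""
    return A / math.sqrt(t) * math.exp(-x * x / (4.0 * t))

def h_step(x, t):
    """Antiderivative of h_peak in x that tends to B as x -> -infinity."""
    return B + A * math.sqrt(math.pi) * (1.0 + math.erf(x / (2.0 * math.sqrt(t))))

def diffusion_residual(h, x, t, eps=1e-3):
    """h_t - h_xx by central differences; near zero for a solution of h_t = h_xx."""
    h_t = (h(x, t + eps) - h(x, t - eps)) / (2 * eps)
    h_xx = (h(x + eps, t) - 2 * h(x, t) + h(x - eps, t)) / (eps * eps)
    return h_t - h_xx

def total(h, t, half_width=60.0, n=24000):
    """Trapezoidal estimate of the integral of h(x, t) over [-half_width, half_width]."""
    dx = 2 * half_width / n
    s = 0.5 * (h(-half_width, t) + h(half_width, t))
    s += sum(h(-half_width + k * dx, t) for k in range(1, n))
    return s * dx

res_step = diffusion_residual(h_step, 0.7, 1.3)
res_peak = diffusion_residual(h_peak, 0.7, 1.3)
totals = [total(h_peak, t) for t in (0.5, 1.0, 2.0)]   # each close to 2 * A * sqrt(pi)
```

The peak gets lower and wider as t grows, but the area under it stays the same, which is the conservation property derived for this equation.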
The same similarity method can also be applied to some non-linear equations.
12.6. SYSTEMS OF PARTIAL DIFFERENTIAL EQUATIONS 353
Derivation.
[author=wikibooks, file =text_files/second_order_partial_diffeqs]
We can also obtain some solutions of this equation by separating variables.
Preamble
states that this License applies to the Document. These Warranty Disclaimers
are considered to be included by reference in this License, but only as regards
disclaiming warranties: any other implication that these Warranty Disclaimers
may have is void and has no effect on the meaning of this License.
VERBATIM COPYING
You may copy and distribute the Document in any medium, either commer-
cially or noncommercially, provided that this License, the copyright notices, and
the license notice saying this License applies to the Document are reproduced in
all copies, and that you add no other conditions whatsoever to those of this Li-
cense. You may not use technical measures to obstruct or control the reading or
further copying of the copies you make or distribute. However, you may accept
compensation in exchange for copies. If you distribute a large enough number of
copies you must also follow the conditions in section 3.
You may also lend copies, under the same conditions stated above, and you
may publicly display copies.
COPYING IN QUANTITY
If you publish printed copies (or copies in media that commonly have printed
covers) of the Document, numbering more than 100, and the Document’s license
notice requires Cover Texts, you must enclose the copies in covers that carry,
clearly and legibly, all these Cover Texts: Front-Cover Texts on the front cover,
and Back-Cover Texts on the back cover. Both covers must also clearly and legibly
identify you as the publisher of these copies. The front cover must present the full
title with all words of the title equally prominent and visible. You may add other
material on the covers in addition. Copying with changes limited to the covers, as
long as they preserve the title of the Document and satisfy these conditions, can
be treated as verbatim copying in other respects.
If the required texts for either cover are too voluminous to fit legibly, you
should put the first ones listed (as many as fit reasonably) on the actual cover,
and continue the rest onto adjacent pages.
If you publish or distribute Opaque copies of the Document numbering more
than 100, you must either include a machine-readable Transparent copy along
with each Opaque copy, or state in or with each Opaque copy a computer-network
location from which the general network-using public has access to download using
public-standard network protocols a complete Transparent copy of the Document,
free of added material. If you use the latter option, you must take reasonably
prudent steps, when you begin distribution of Opaque copies in quantity, to ensure
that this Transparent copy will remain thus accessible at the stated location until
at least one year after the last time you distribute an Opaque copy (directly or
through your agents or retailers) of that edition to the public.
It is requested, but not required, that you contact the authors of the Document
well before redistributing any large number of copies, to give them a chance to
provide you with an updated version of the Document.
MODIFICATIONS
You may copy and distribute a Modified Version of the Document under the
conditions of sections 2 and 3 above, provided that you release the Modified Ver-
sion under precisely this License, with the Modified Version filling the role of the
Document, thus licensing distribution and modification of the Modified Version
to whoever possesses a copy of it. In addition, you must do these things in the
Modified Version:
A. Use in the Title Page (and on the covers, if any) a title distinct from that of
the Document, and from those of previous versions (which should, if there
were any, be listed in the History section of the Document). You may use
the same title as a previous version if the original publisher of that version
gives permission.
B. List on the Title Page, as authors, one or more persons or entities respon-
sible for authorship of the modifications in the Modified Version, together
with at least five of the principal authors of the Document (all of its prin-
cipal authors, if it has fewer than five), unless they release you from this
requirement.
C. State on the Title page the name of the publisher of the Modified Version,
as the publisher.
F. Include, immediately after the copyright notices, a license notice giving the
public permission to use the Modified Version under the terms of this License,
in the form shown in the Addendum below.
G. Preserve in that license notice the full lists of Invariant Sections and required
Cover Texts given in the Document’s license notice.
I. Preserve the section Entitled “History”, Preserve its Title, and add to it
an item stating at least the title, year, new authors, and publisher of the
Modified Version as given on the Title Page. If there is no section Entitled
“History” in the Document, create one stating the title, year, authors, and
publisher of the Document as given on its Title Page, then add an item
describing the Modified Version as stated in the previous sentence.
J. Preserve the network location, if any, given in the Document for public access
to a Transparent copy of the Document, and likewise the network locations
given in the Document for previous versions it was based on. These may
be placed in the “History” section. You may omit a network location for a
work that was published at least four years before the Document itself, or if
the original publisher of the version it refers to gives permission.
L. Preserve all the Invariant Sections of the Document, unaltered in their text
and in their titles. Section numbers or the equivalent are not considered part
of the section titles.
Copyright © YEAR YOUR NAME. Permission is granted to copy, distribute
and/or modify this document under the terms of the GNU Free
Documentation License, Version 1.2 or any later version published by
the Free Software Foundation; with no Invariant Sections, no Front-
Cover Texts, and no Back-Cover Texts. A copy of the license is included
in the section entitled “GNU Free Documentation License”.
If you have Invariant Sections, Front-Cover Texts and Back-Cover Texts, re-
place the “with...Texts.” line with this:
with the Invariant Sections being LIST THEIR TITLES, with the
Front-Cover Texts being LIST, and with the Back-Cover Texts being
LIST.
If you have Invariant Sections without Cover Texts, or some other combination
of the three, merge those two alternatives to suit the situation.
If your document contains nontrivial examples of program code, we recommend
releasing these examples in parallel under your choice of free software license, such
as the GNU General Public License, to permit their use in free software.