LectureNotes HT22 Part2

Don’t Panic

Part 2

A guide to MATA21 Analysis in One Variable

Version: August 19, 2022

Jan-Fredrik Olsen
Contents

6 Limits for functions
6.1 A first look at limits for functions
6.2 The definition of the limit for functions
6.3 The rulebook for limits of functions
6.4 How to use the rulebook in practice
6.5 Exam exercises
6.6 Answers to selected exercises

7 What is Calculus
7.1 The derivative and the definite integral
7.2 A first look at differential equations
7.3 Answers to selected exercises

8 The derivative
8.1 Computational rules for the derivative
8.2 Proof of the computational rules for the derivative
8.3 Differentiation formulas for elementary functions
8.4 Exam exercises
8.5 Answers to selected exercises

9 More on functions and limits
9.1 A first look at the Mean Value Theorem
9.2 Consequences of the Mean Value Theorem
9.3 A nice little trick: L’Hopital’s rule
9.4 A closer look at the Mean Value Theorem
9.5 Proofs of a few deep theorems related to continuity
9.6 Exercises from previous exams
9.7 Answers to selected exercises

10 The indefinite integral
10.1 A first look at the indefinite integral
10.2 The rulebook for indefinite integration
10.3 A big bag of integration tricks
10.4 Exam exercises
10.5 Answers to selected exercises

11 Differential equations
11.1 First order differential equations
11.2 Second order linear ODEs with constant coefficients
11.3 Relevant exercises from previous exams
11.3 Answers to selected exercises

12 The definite integral
12.1 The definition of the definite integral
12.2 Basic rulebook for the definite integral
12.3 More advanced rules for the definite integral
12.4 Applications to geometry and elementary functions
12.5 Computing and estimating unbounded areas
12.6 Relevant exercises from previous exams
12.7 Answers to selected exercises

13 Taylor polynomials
13.1 A first look at Taylor polynomials
13.2 Error estimates for Taylor polynomials
13.3 A uniqueness theorem for Taylor polynomials
13.4 The Big-oh "calculus" for error terms
13.5 Relevant exercises from previous exams
13.6 Answers to selected exercises

E Additional results on the definite integral
E.1 Some words on the Lebesgue integral

F Taylor and power series
F.1 A first look at Taylor and power series
F.2 Uniform convergence and continuity of power series
F.3 When can we integrate and differentiate power series?
F.4 Summary on how to define the elementary functions
F.5 Power series and differential equations

Chapter 6

Limits for functions

Introduction

In this chapter we study limits for functions. For sequences, we considered what
happened to a sequence $a_n$ as $n \to \infty$. Now, we mainly consider what happens to a
function $f(x)$ as $x \to a$. Even though these two situations are quite similar, we need
to be a bit more nuanced when it comes to limits of functions.

Remark 6.1 (Selected problems from previous exams based on this chapter)

1. (a) Formulate the epsilon-delta definition of $\lim_{x \to a} f(x) = L$.
(b) Explain what it means that $f(x)$ is continuous at $x = a$.
(c) Use this definition to show that

\[ \lim_{x \to 2} (3x^3 - 9x + 1) = 7. \]

2. We consider the function

\[ f(x) = \begin{cases} x \sin(1/x) & x \neq 0 \\ C & x = 0 \end{cases} \]

(a) For what value of $C$ is $f(x)$ continuous at $x = 0$?
(b) Use the epsilon-delta definition of the limit to prove your answer in (a).
(c) What is the derivative of $f$ for $x \in \mathbb{R} \setminus \{0\}$?
(d) Is $f$ differentiable at $x = 0$?


6.1 A first look at limits for functions


Some words on neighbourhoods
Typically, we want to take the limit of f (x) as x approaches some point a that is not
in the domain Df . However, for this to make sense, the point a has to lie "near" the
domain. We therefore introduce the notion of a neighbourhood of a point.

Definition 6.2 (neighbourhood) We say that a set $D \subset \mathbb{R}$ is a neighbourhood of a
point $x$ if there exists some radius $\delta > 0$ so that

\[ (x - \delta, x + \delta) \subset D. \]

The point is that if $D$ is a neighbourhood of a point $x$, then this means that there is
a little bit of wiggle-room both to the left and the right of $x$ inside of $D$.

Example 6.3 The function $f(x) = \sqrt{x}$ is defined in a neighbourhood of the point $x = 1$,
but it is not defined in a neighbourhood of $x = 0$ (however, it is defined in a one-sided
neighbourhood of $x = 0$, see Exercise 6.11).

In particular, if a function $f$ is defined on a neighbourhood of a point, then this means
that $f$ is defined "near" that point. But what if we do not care if the function is defined
"at" the point?

Definition 6.4 (punctured neighbourhood) We say that a set $D \subset \mathbb{R}$ is a punctured
neighbourhood of a point $x$ if there exists some radius $\delta > 0$ so that

\[ (x - \delta, x) \cup (x, x + \delta) \subset D. \]

Example 6.5 The natural domain of the function f (x) = sin x/x is R\{0}. This
function is therefore defined in a punctured neighbourhood of x = 0.

As we will see on the next page, it therefore makes sense (at least intuitively) to
study the limit of f (x) = sin x/x as x approaches 0.

An informal definition of the limit for functions


Here is an informal definition of what we mean by the limit of a function.

Informal definition 6.6 (limit of a function) Suppose that $f$ is defined (at least)
in a punctured neighbourhood of a point $a$. Then, if $f(x)$ "approaches" some value $L$ as
$x$ "approaches" $a$, we write

\[ \lim_{x \to a} f(x) = L \quad \text{or} \quad f(x) \to L \text{ as } x \to a. \]

If this does not happen, we say that the function diverges as $x \to a$.

Of course, strictly speaking, this "informal definition" does not really define anything.
But in the following example, we try to illustrate what we are getting at.

Example 6.7 Consider

\[ f(x) = \frac{\sin x}{x}. \]

Since the natural domain of f is R\{0}, we have that f is defined in a punctured
neighbourhood of 0. To get an idea of how f behaves as x approaches 0, we consider the
following plots made in Python:

Fig. 1. Two visualisations of f (x) where we indicated the computed values of f (x)
with red dots. Notice that x = 0 is not in the domain of f .

From the above figures, it seems like the values of $f(x)$ approach 1 as $x \to 0$. It is
therefore reasonable to believe that

\[ \lim_{x \to 0} \frac{\sin x}{x} = 1. \]
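
The numerical evidence behind this guess can be reproduced with a few lines of Python. The following is only a minimal sketch (it is not the code used to produce Fig. 1, and the helper name `f` is our own choice): it tabulates $\sin x / x$ at points approaching 0 from both sides, never evaluating at $x = 0$ itself.

```python
import math


def f(x: float) -> float:
    """Evaluate sin(x)/x; only meaningful for x != 0."""
    return math.sin(x) / x


# Approach 0 from the right and from the left, without touching x = 0.
for k in range(1, 6):
    h = 10.0 ** (-k)
    print(f"x = {h:g}: f(x) = {f(h):.12f}    x = {-h:g}: f(x) = {f(-h):.12f}")
```

The printed values creep towards 1 from both sides. Of course, this is evidence and not a proof.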

Let us summarise the insight from the above example in a rule of thumb:

Remark 6.8 (Rule of thumb) To visually determine a limit, follow your finger!

That is, to visually determine the limit of some function as, say, $x \to 0$, just let your
finger trace the graph as $x$ approaches 0. If the finger becomes confused along the way,
then the limit (most likely) does not exist!

Fig. 2. The finger knows :-D

Notice the following technical point about the finger.

Remark 6.9 (The finger does not care about the point itself) When discussing
if $\lim_{x \to a} f(x) = L$, we are not allowed to use any information about $f(x)$ at the point
$x = a$ itself. This is a feature of the formal definition of the limit which is necessary
since we typically want to investigate what happens as we approach points just outside
the domain of $f$. For this reason, the finger is blind to any information at $x = a$ itself.

Exercise 6.10 What appears to be the limit for each of the functions in Figure 3 as
x approaches 0?

Fig. 3. What is the limit as $x \to 0$?



One-sided limits

The function shown in Figure 4 has no limit as $x \to 0$. Indeed, the finger gets confused
since it arrives at a different value when coming in from the left or from the right (the
limit exists if and only if our finger approaches the same value when coming in from both
sides, whenever possible). To discuss such situations, we introduce one-sided limits.

Fig. 4. A graph with a sudden jump.

Exercise 6.11 Use Definitions 6.2 and 6.4 as inspiration to define what we ought to
mean by one-sided (punctured) neighbourhoods to the right and to the left of a point,
respectively.
Remark: The point is to find a condition that ensures that the domain of a function
f is such that we can evaluate it as x approaches the point a from the left or from the
right, respectively.

Once we have agreed on a notion of one-sided neighbourhoods, the following definition
makes sense.

Informal definition 6.12 (one-sided limits) Suppose that $f$ is defined (at least)
on a punctured one-sided neighbourhood to the right of the point $a$. Then, if $f(x)$
"approaches" some value $L$ as $x$ "approaches" $a$ from the right, we write

\[ \lim_{x \to a^+} f(x) = L \quad \text{or} \quad f(x) \to L \text{ as } x \to a^+. \]

If the same holds, but with the word "right" replaced by "left" above, we write

\[ \lim_{x \to a^-} f(x) = L \quad \text{or} \quad f(x) \to L \text{ as } x \to a^-. \]

Exercise 6.13 (a) What are the one-sided limits of the function shown in Figure 4,
above?
(b) What are the one-sided limits of the function f (x) = sin x/x studied in Example
6.7, above?

One-sided limits versus two-sided limits


Notice that the limit, as we described it in Informal Definition 6.6, requires that $f$ is
defined on both sides of the point we want $x$ to approach. It is therefore natural to call
it the two-sided limit. The relation between the two-sided limit and the one-sided limits
is rather straightforward.

Proposition 6.14 Suppose that $f$ is defined (at least) on a punctured neighbourhood
of $a \in \mathbb{R}$. Then

\[ \lim_{x \to a} f(x) = L \iff \lim_{x \to a^+} f(x) = L \ \text{ and } \ \lim_{x \to a^-} f(x) = L. \]

This proposition, which you are asked to prove in Exercise 6.35, is rather useful since
it sometimes lets us replace an annoying limit (involving, say, absolute values) by two
hopefully friendlier one-sided limits.

Example 6.15 Does the limit

\[ \lim_{x \to 0} \frac{x}{|x|} \]

exist? Well, if we consider the cases $x < 0$ and $x > 0$ separately, we can open up the
absolute value. This allows the following, one-sided, computations:

\[ \lim_{x \to 0^+} \frac{x}{|x|} = \lim_{x \to 0^+} \frac{x}{x} = 1 \quad \text{and} \quad \lim_{x \to 0^-} \frac{x}{|x|} = \lim_{x \to 0^-} \frac{x}{-x} = -1. \]

Since the one-sided limits are different, we conclude that the limit does not exist.
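
As a sanity check of the two one-sided computations, here is a small Python sketch (the names `g`, `right` and `left` are illustrative choices of ours, not part of the notes). It evaluates $x/|x|$ at points approaching 0 from the right and from the left.

```python
def g(x: float) -> float:
    """Evaluate x/|x|; only meaningful for x != 0."""
    return x / abs(x)


# Sample points 10^-1, ..., 10^-5 on either side of 0.
right = [g(10.0 ** (-k)) for k in range(1, 6)]
left = [g(-(10.0 ** (-k))) for k in range(1, 6)]

print("from the right:", right)
print("from the left: ", left)
```

The values from the right are all 1 and the values from the left are all -1, matching the computation above: the finger arrives at two different values.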

Exercise 6.16 Compute

\[ \lim_{x \to 2} \frac{x^2 - x - 2}{|x - 2|}. \]

Finally, we make the following definition (which is not "informal").

Definition 6.17 If $f$ is defined only in a one-sided neighbourhood of a point $a$, then by

\[ \lim_{x \to a} f(x) = L, \]

we mean the appropriate one-sided limit.

That is, in situations where it only makes sense to consider one-sided limits, then by
the limit of a function, we always mean its one-sided limit.
Example 6.18 The limit of $f(x) = \sqrt{x}$ as $x \to 0$ is equal to 0.

Continuity
Above, we remarked that when computing $\lim_{x \to a} f(x)$ our finger does not care
about what happens at $x = a$ itself. But what if we do care?

Definition 6.19 (Continuity) We say that a function $f$ is continuous at a point
$a \in D_f$ if one of the following holds:

• we can evaluate the limit of $f$ at $a$ and $\lim_{x \to a} f(x) = f(a)$,
• $a$ is an isolated point of $D_f$.

Moreover, a function is continuous on a set $I$ if it is continuous at all $a \in I$, and we
simply call a function continuous if it is continuous at all points in its domain.

Here is an informal, not quite accurate, but quite useful description of what it means
for a function to be continuous at a point.

Remark 6.20 (An informal definition of continuity) A function is continuous at


a point in its domain if its graph has no sudden "jump" there.

Fig. 5. Guess which one of these is continuous at x = 0? :-)


Exercise 6.21 Consider

\[ f(x) = \begin{cases} \dfrac{\sin x}{x} & \text{if } x \neq 0 \\ C & \text{if } x = 0 \end{cases} \]

Based on Example 6.7, for what value of $C$ is this function continuous at $x = 0$?
Exercise 6.22 (a) Are any of the graphs shown in Figure 3 continuous at x = 0?
(b) Suppose that you are allowed to change the value of each of these graphs at x = 0
(if needed). Can you make any of them continuous by doing this?
Exercise 6.23 Does the graph of f (x) = 1/x have any jumps? Is this a continuous
function? Is this function continuous at x = 0?
Exercise 6.24 A sequence $(a_n)_{n=0}^{\infty}$ can be thought of as a function $f(n) = a_n$ with
domain $D = \mathbb{N}$. Is this function continuous?

6.2 The definition of the limit for functions


We now state the definition of the limit of a function.

Definition 6.25 Suppose that $f$ is defined (at least) in a punctured neighbourhood of
a point $a$. Then we say that $f(x)$ has $L$ as its limit as $x$ tends to $a$ if the following holds:
For every $\epsilon > 0$, there exists a $\delta > 0$ so that

\[ \begin{cases} |x - a| < \delta \\ x \neq a \end{cases} \implies |f(x) - L| < \epsilon. \]

If this holds, we write $f(x) \to L$ as $x \to a$, or, which is the same,

\[ \lim_{x \to a} f(x) = L. \]

Notice that the property that $D_f \setminus \{a\}$ has points arbitrarily close to $a$ just means
that the left-hand side of the implication in the above definition is not empty.

Remark 6.26 When working with limits, sometimes it is enough to assert that some
property holds for $f$ when $x \neq a$ is "sufficiently close" to $a$. By this we mean that there
exists some $\delta > 0$ so that the property holds for $x \in D_f \setminus \{a\}$ whenever $|x - a| < \delta$.

How to use the definition on specific examples


We begin by considering the following example.
Example 6.27 We prove that

\[ \lim_{x \to 2} (2x - 1) = 3. \tag{6.1} \]

As we did for sequences, we begin by letting $\epsilon > 0$ be some unknown, but fixed, number.

Fig. 6. Here, $y = 2x - 1$ is illustrated along with the epsilon challenge $\epsilon = 1$.

Our goal is now to show that we get $(2x - 1)$ closer than $\epsilon$ to 3 by moving $x$ sufficiently
close to 2. That is, we want to show that

\[ |x - 2| \text{ small} \implies |(2x - 1) - 3| < \epsilon. \]

The first step towards achieving this is to simplify the expression we are considering:

\[ |(2x - 1) - 3| = |2x - 4| = 2|x - 2|. \]

As if by a miracle, the expression $|x - 2|$ suddenly appears, and we can observe that

\[ |x - 2| < \frac{\epsilon}{2} \implies 2|x - 2| < \epsilon. \]

In particular, this means that the requirement of Definition 6.25 is satisfied by choosing
$\delta = \epsilon/2$. We have found our $\delta$-response to the $\epsilon$-challenge, and we are done!
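
To see the challenge-response game in action, the following Python sketch tests the response $\delta = \epsilon/2$ on a grid of sample points with $0 < |x - 2| < \delta$. The function names and the sampling scheme are assumptions made for illustration, and testing finitely many points can only support the proof above, never replace it.

```python
def delta_response(eps: float) -> float:
    """The response delta = eps / 2 found in Example 6.27."""
    return eps / 2.0


def response_works(eps: float, samples: int = 1000) -> bool:
    """Check |(2x - 1) - 3| < eps at sample points with 0 < |x - 2| < delta."""
    delta = delta_response(eps)
    for i in range(1, samples):
        t = delta * i / samples  # 0 < t < delta
        for x in (2.0 - t, 2.0 + t):
            if not abs((2.0 * x - 1.0) - 3.0) < eps:
                return False
    return True


for eps in (1.0, 0.5, 1e-6):
    print(f"eps = {eps:g}: delta = {delta_response(eps):g}, works = {response_works(eps)}")
```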

Fig. 7. Here we see appropriate responses to the challenges $\epsilon = 1$ and $\epsilon = 1/2$.

Exercise 6.28 What is the appropriate delta response to $\epsilon = 1$ and $\epsilon = 1/2$,
respectively, according to the computations in the above example? Does this match what
we see in Figure 7?
Exercise 6.29 In this exercise you are to use the definition of the limit to show that

\[ \lim_{x \to 3} (4x - 3) = 9. \]

(a) How small does $|x - 3|$ have to be to beat the challenge $\epsilon = 1$?
(b) How small does $|x - 3|$ have to be to beat the challenge $\epsilon = 1/10$?
(c) How small does $|x - 3|$ have to be to beat a general challenge $\epsilon > 0$?

Example 6.30 Let us do a more difficult example. We use the epsilon-delta definition
of the limit to show that

\[ f(x) = x^3 - 2x^2 - 5x + 8 \]

is continuous at $x = 1$. That is, we need to use the epsilon-delta definition to show that

\[ \lim_{x \to 1} (x^3 - 2x^2 - 5x + 8) = f(1). \]

Again, we begin by supposing that we are given a fixed but unknown $\epsilon > 0$.

Fig. 8. Here, we illustrate the function $f(x) = x^3 - 2x^2 - 5x + 8$ with two possible
epsilon challenges. The question is, how do we respond?

The point is now to prove that

\[ |x - 1| \text{ small} \implies |(x^3 - 2x^2 - 5x + 8) - f(1)| < \epsilon. \]

The first step towards achieving this is to simplify the expression we are considering:

\[ |(x^3 - 2x^2 - 5x + 8) - f(1)| = |(x^3 - 2x^2 - 5x + 8) - 2| = |x^3 - 2x^2 - 5x + 6|. \]

To have any hope of succeeding, we need the factor $x - 1$ to appear. But this is exactly
what happens since $x = 1$ is a root of this expression (check this!). Using what we
learned in Chapter 1, we obtain

\[ |x^3 - 2x^2 - 5x + 6| = |x^2 - x - 6| \, |x - 1|. \]

This is all good, but how to deal with $|x^2 - x - 6|$? We now use a trick: while we do
not yet know how small we need to make $|x - 1|$, let us at least agree that we
will make this distance smaller than 1 (because why not?). Opening the absolute value,
we see that

\[ |x - 1| < 1 \iff -1 < x - 1 < 1 \iff 0 < x < 2. \]

Next, we see that, under the condition $0 < x < 2$, the triangle inequality gives us that

\[ |x^2 - x - 6| \le x^2 + |x| + 6 < 2^2 + 2 + 6 = 12. \]

That is, by combining what we have done so far, we get

\[ |x - 1| < 1 \implies |(x^3 - 2x^2 - 5x + 8) - 2| < 12|x - 1|. \]



But now, we are in exactly the same situation as in the previous example. Here, the
final observation is that

\[ |x - 1| < \frac{\epsilon}{12} \implies 12|x - 1| < \epsilon. \]

In conclusion, we have shown that

\[ \begin{cases} |x - 1| < 1 \\ |x - 1| < \epsilon/12 \end{cases} \implies |(x^3 - 2x^2 - 5x + 8) - 2| < \epsilon. \]

But wait! Does this satisfy the definition of the limit? Yes, what this means is that given
a challenge $\epsilon > 0$, the proper response is to let $\delta$ be the smaller of the two numbers 1
and $\epsilon/12$, whichever that may be. We express this as choosing $\delta = \min\{1, \epsilon/12\}$. Done!
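
The response $\delta = \min\{1, \epsilon/12\}$ can also be tested numerically. The Python sketch below is an illustration under our own assumptions (function names, sample grid and the chosen values of $\epsilon$ are not from the notes): it checks the implication at finitely many points with $0 < |x - 1| < \delta$.

```python
def f(x: float) -> float:
    """The polynomial from Example 6.30; note that f(1) = 2."""
    return x**3 - 2.0 * x**2 - 5.0 * x + 8.0


def delta_response(eps: float) -> float:
    """The response delta = min{1, eps/12} derived in the example."""
    return min(1.0, eps / 12.0)


def response_works(eps: float, samples: int = 2000) -> bool:
    """Check |f(x) - 2| < eps at sample points with 0 < |x - 1| < delta."""
    delta = delta_response(eps)
    for i in range(1, samples):
        t = delta * i / samples  # 0 < t < delta
        for x in (1.0 - t, 1.0 + t):
            if not abs(f(x) - 2.0) < eps:
                return False
    return True


for eps in (4.0, 2.0, 0.1, 30.0):
    print(f"eps = {eps:g}: delta = {delta_response(eps):g}, works = {response_works(eps)}")
```

For the large challenge $\epsilon = 30$ the minimum selects $\delta = 1$, which is exactly the case where the preliminary agreement $|x - 1| < 1$ is the binding condition.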

The first three exercises below are meant to help you understand the above example.

Exercise 6.31 Draw the graph of the function $\delta(\epsilon) = \min\{1, \epsilon/12\}$.
Exercise 6.32 In Example 6.30, what seem like appropriate delta responses based
on Figure 8 (where epsilon is equal to 4 and 2, respectively)? What delta response is
suggested by the computations in the example? Does it matter that they are not the same?
Exercise 6.33 In this exercise you are to use the definition of the limit to show that

\[ \lim_{x \to 2} (x^3 - 4x^2 + 10x - 1) = 11. \]

In particular, make sure that you answer the following:

(a) How small does $|x - 2|$ have to be to beat the challenge $\epsilon = 1$?
(b) How small does $|x - 2|$ have to be to beat the challenge $\epsilon = 1/10$?
(c) How small does $|x - 2|$ have to be to beat a general challenge $\epsilon > 0$?

If you are able to solve the following exercise, then you have an excellent understanding
of the epsilon-delta type proofs and the techniques involved. Most students will need
more than one attempt to solve it.

Exercise 6.34 (Challenging) In this exercise, we investigate how to prove that

\[ f(x) = \frac{1}{x - 1} \]

is continuous at $x = 2$.

(a) According to the definition of continuity, what limit do we need to prove?
(b) Draw the graph of $f(x)$, and visually insert the epsilon challenges $\epsilon = 1$ and
$\epsilon = 1/2$. What seem like appropriate delta responses?
(c) Is there any epsilon we can beat by choosing $\delta = 2$? (You are also supposed to
answer this question by inspecting the graph visually.)
(d) Use the definition of the limit to verify that $f(x)$ is continuous at $x = 2$.

Remark: Part (c) is included to help you notice an added difficulty when solving (d).
The point of this exercise is to figure out how to successfully deal with this in part (d).

Exercise 6.35 (a) Formulate an epsilon-type definition for the one-sided limits
appearing in Informal definition 6.12 and Proposition 6.14.
(b) Prove the $\Longleftarrow$ part of Proposition 6.14.
(c) Prove the $\Longrightarrow$ part of Proposition 6.14.
Hint: Once you have done (a), the rest of this exercise is just a matter of comparing
the definitions for one- and two-sided limits.

Remark 6.36 In the definition of continuity (Definition 6.19), it may strike you as
arbitrary that we choose to call a function continuous at isolated points. However, one
reason why this is natural is that it is equivalent to the following way to define continuity
at a point:
We say that $f(x)$ is continuous at a point $a \in D_f$ if, for every $\epsilon > 0$, there exists a
$\delta > 0$ so that

\[ |x - a| < \delta \implies |f(x) - f(a)| < \epsilon. \]

Exercise 6.37 Show that the definition in the above remark is equivalent to
Definition 6.19.
Remark: This means that you need to prove that if f is continuous at a point according
to the one definition, then this implies that it is also continuous according to the other.
That is, there are two implications to check.

Definition of the limit of a function as x tends to infinity


We now consider limits involving infinities such as

\[ \lim_{x \to \infty} f(x) = L \quad \text{and} \quad \lim_{x \to a} f(x) = \infty. \]

To define these, we need to capture formally what we would want to mean by

\[ x \text{ large} \implies f(x) \text{ close to } L \]

and

\[ x \text{ close to } a \implies f(x) \text{ large}, \]

respectively. Since this does not require any new ideas (just look at the definitions for
the limits of sequences and functions that we have come across so far), we leave this as
an exercise.

Exercise 6.38 Formulate an $\epsilon$-type definition for both limits mentioned above.

6.3 The rulebook for limits of functions


We now state the “rulebook” for how to deal with limits for functions. As you see, this
rulebook is more or less identical to that of sequences.

Proposition 6.39 (Rulebook for the limit of functions) Suppose that the limits
$\lim_{x \to a} f(x)$ and $\lim_{x \to a} g(x)$ exist (and are finite). Then the following hold:

(i) $\lim_{x \to a} \big( f(x) \pm g(x) \big) = \lim_{x \to a} f(x) \pm \lim_{x \to a} g(x)$

(ii) $\lim_{x \to a} f(x) \cdot g(x) = \lim_{x \to a} f(x) \cdot \lim_{x \to a} g(x)$

If, in addition, $\lim_{x \to a} g(x) \neq 0$, then

(iii) $\lim_{x \to a} f(x)/g(x) = \lim_{x \to a} f(x) \,/\, \lim_{x \to a} g(x)$

The next two rules state that the limit respects inequalities. We note that in both rules,
it is enough for the inequalities to hold for $x \neq a$ that are sufficiently close to $a$:

(iv) $f(x) \le g(x) \implies \lim_{x \to a} f(x) \le \lim_{x \to a} g(x)$

Here is the Squeeze theorem for the limits of functions:

(v) $f(x) \le g(x) \le h(x)$ and $\lim_{x \to a} f(x) = \lim_{x \to a} h(x) = L \implies \lim_{x \to a} g(x) = L$.

We now state a "composition rule" for the limit. It says that if we can evaluate the limit
of $f \circ g$ as $x$ tends to $a$, then

(vi) $\begin{cases} f \text{ is continuous, and} \\ \lim_{x \to a} g(x) \text{ exists and is in } D_f \end{cases} \implies \lim_{x \to a} f \circ g(x) = f\big( \lim_{x \to a} g(x) \big)$

Here is essentially the same rule as above, formulated as a "change of variables" rule. In
this case, we also need to assume that $g(x)$ is not equal to $b$ for $x$ sufficiently close to $a$:

(vi’) $\lim_{x \to a} g(x) = b$ and $\lim_{u \to b} f(u)$ exists $\implies \lim_{x \to a} f \circ g(x) = \lim_{u \to b} f(u)$.

The following concrete limits are useful enough to be included here:

(vii) $\lim_{x \to a} C = C$ for all constants $C \in \mathbb{R}$, (viii) $\lim_{x \to a} x = a$

Finally, the same extensions to infinite limits as for the limits of sequences hold, and,
moreover, the above rules also apply if we replace $a$ by $+\infty$ or $-\infty$.

Proof of Proposition 6.39(i): The sum rule for limits


To see how to prove a result on limits for functions, we consider the following rule.

Proposition 6.40

\[ \lim_{x \to a} f(x) = L \ \text{ and } \ \lim_{x \to a} g(x) = M \implies \lim_{x \to a} \big( f(x) + g(x) \big) = L + M. \]

Proof of Proposition 6.40. To prove that $f(x) + g(x) \to L + M$, we begin by assuming,
just as for sequences, that we are given a fixed but unknown $\epsilon > 0$. According to the
definition of the limit, we must respond to this challenge by showing that the following
implication is true:

\[ |x - a| \text{ small} \implies |(f(x) + g(x)) - (L + M)| < \epsilon. \tag{6.2} \]

Note that the statement "$|x - a|$ is small" is exactly what we mean when we write $|x - a| < \delta$.
In particular, the hypothesis of the proposition, i.e., the statements $f(x) \to L$ and
$g(x) \to M$, mean the following: for all $\epsilon_1, \epsilon_2 > 0$, there exist numbers $\delta_1, \delta_2 > 0$, so that,
for $x \neq a$,

\[ \begin{aligned} |x - a| < \delta_1 &\implies |f(x) - L| < \epsilon_1 \\ |x - a| < \delta_2 &\implies |g(x) - M| < \epsilon_2. \end{aligned} \tag{6.3} \]

As in the corresponding proof for sequences, we are almost done with the proof at this
point. Recall that our goal is to somehow use the information in (6.3) to obtain (6.2).
So, what we do is to connect these expressions by using the triangle inequality as follows:

\[ |(f(x) + g(x)) - (L + M)| = |(f(x) - L) + (g(x) - M)| \le |f(x) - L| + |g(x) - M|. \]

Next, observe that if we choose $\epsilon_1 = \epsilon/2$ and $\epsilon_2 = \epsilon/2$, then we are guaranteed that there
exist numbers $\delta_1, \delta_2$ so that (6.3) holds. Combining this with the above, we find that

\[ \begin{cases} |x - a| < \delta_1 \\ |x - a| < \delta_2 \\ x \neq a \end{cases} \implies |(f(x) + g(x)) - (L + M)| < \epsilon_1 + \epsilon_2 = \epsilon. \]

In other words, the definition of the limit $\lim_{x \to a} (f(x) + g(x)) = L + M$ is satisfied if we
choose $\delta = \min\{\delta_1, \delta_2\}$.
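
To make the choice $\delta = \min\{\delta_1, \delta_2\}$ concrete, here is a small Python sketch. The two functions are our own illustrative picks and do not come from the proof: $f(x) = 2x - 1$ with $L = 3$ and response $\delta_1 = \epsilon_1/2$ (as in Example 6.27), and $g(x) = 4x - 3$ with $M = 5$ and response $\delta_2 = \epsilon_2/4$, both as $x \to 2$. The code hands each function the challenge $\epsilon/2$ and then tests the combined response on sample points.

```python
def f(x: float) -> float:
    return 2.0 * x - 1.0  # tends to L = 3 as x -> 2


def g(x: float) -> float:
    return 4.0 * x - 3.0  # tends to M = 5 as x -> 2


def delta_f(eps1: float) -> float:
    return eps1 / 2.0  # since |f(x) - 3| = 2|x - 2|


def delta_g(eps2: float) -> float:
    return eps2 / 4.0  # since |g(x) - 5| = 4|x - 2|


def delta_sum(eps: float) -> float:
    """Response for f + g: split eps into two halves and take the minimum."""
    return min(delta_f(eps / 2.0), delta_g(eps / 2.0))


def sum_response_works(eps: float, samples: int = 1000) -> bool:
    """Check |(f + g)(x) - (L + M)| < eps at sample points with 0 < |x - 2| < delta."""
    a, L, M = 2.0, 3.0, 5.0
    delta = delta_sum(eps)
    for i in range(1, samples):
        t = delta * i / samples  # 0 < t < delta
        for x in (a - t, a + t):
            if not abs((f(x) + g(x)) - (L + M)) < eps:
                return False
    return True


for eps in (1.0, 0.1, 0.001):
    print(f"eps = {eps:g}: delta = {delta_sum(eps):g}, works = {sum_response_works(eps)}")
```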

Exercise 6.41 Modify the proof of the product rule for limits of sequences so that it
applies to limits for functions.

Exercise 6.42 Modify the proof of the squeeze theorem for limits of sequences so
that it applies to limits for functions.

Proof of Proposition 6.39(vi’): Change of variables formula


We begin by essentially reformulating the change of variables formula. However, note
that this formulation is slightly different from the one given in the rulebook.

Proposition 6.43 Suppose that $f \circ g(x)$ is defined in some punctured neighbourhood
of $x = a$, and that $g(x) \neq b$ there. Then

\[ \lim_{x \to a} g(x) = b \ \text{ and } \ \lim_{u \to b} f(u) = L \implies \lim_{x \to a} f(g(x)) = L. \]

Proof of Proposition 6.43. In slightly sloppy language, what we know is the following:

\[ \begin{aligned} \forall \epsilon_1 > 0 &: \ |x - a| \text{ small} \implies |g(x) - b| < \epsilon_1 \\ \forall \epsilon_2 > 0 &: \ |u - b| \text{ small} \implies |f(u) - L| < \epsilon_2 \end{aligned} \tag{6.4} \]

What we need to prove is that given some unknown, but fixed, $\epsilon > 0$, then

\[ |x - a| \text{ small} \implies |f(g(x)) - L| < \epsilon. \]

But this is rather reasonable. Indeed, by using the connection $u = g(x)$, we ought to
be able to combine the two lines in (6.4) to arrive at the following chain of implications:

\[ |x - a| \text{ small} \implies |u - b| \text{ small} \implies |f(u) - L| < \epsilon, \quad \text{where } u = g(x). \]

In the figure below, we illustrate the basic objective of the proof. You start out with
an epsilon target around $L$ and are supposed to find some delta on the $x$-axis that you
know answers this challenge. To do this, you need to take into account what happens
on the $u$-axis in the middle.
Exercise 6.44 In this exercise you are asked to complete the proof.

(a) First identify a suitable choice for $\epsilon_2$, and write out the condition this gives
on $|u - b|$.
(b) Next, since we have put $u = g(x)$, we can use the condition on $|u - b|$ to make
a suitable choice for $\epsilon_1$. This in turn gives a condition on $|x - a|$. Write out
this condition.
(c) Finally, make a suitable choice for a number $\delta$ so that the following implication holds:

\[ |x - a| < \delta \ \text{ and } \ x \neq a \implies |f(g(x)) - L| < \epsilon. \]

Fig. 9. The basic idea of the proof.

6.4 How to use the rulebook in practice


The intuitions and rules of thumb we gained when working with limits of sequences also
hold for limits of functions. However, at the start of this chapter we formulated a new
rule of thumb (the finger!). Here, we formulate yet another one (an algebraic finger!).
We also take another look at how to use the change of variables formula and at how to
compute asymptotes.

A final rule of thumb


One difference when working with limits of functions as opposed to limits of sequences is
that we are no longer letting $n$ tend to $\infty$, but rather we are letting $x$ tend to some specific
value. For this reason, the following rule of thumb is usually helpful in determining when
an expression is no longer of an indeterminate form.

Remark 6.45 (A final rule of thumb – an algebraic finger) If you can get away
with replacing x by its limit, then go for it!

We illustrate what we mean by this in the following example.

Example 6.46 We compute the limit

\[ \lim_{x \to 1} \frac{x^2 - 1}{x - 1}. \]

Since this limit is of the form [0/0], we must begin by rewriting the expression:

\[ \lim_{x \to 1} \frac{x^2 - 1}{x - 1} = \left[ \frac{0}{0} \right] = \lim_{x \to 1} \frac{(x - 1)(x + 1)}{x - 1} = \lim_{x \to 1} (x + 1). \]

Here, we end up with an expression where it makes complete sense to replace $x$ by 1.
Following our rule of thumb, we conclude that

\[ \lim_{x \to 1} \frac{x^2 - 1}{x - 1} = \lim_{x \to 1} (x + 1) = 2. \]
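
A quick numerical look supports this answer. The sketch below (with the illustrative helper name `q`, our own choice) evaluates the original quotient at points near 1, where the expression itself is undefined, and compares it with the simplified expression $x + 1$.

```python
def q(x: float) -> float:
    """The original quotient (x^2 - 1)/(x - 1); undefined at x = 1."""
    return (x**2 - 1.0) / (x - 1.0)


# Approach x = 1 from both sides and compare with the simplified form x + 1.
for k in range(1, 7):
    h = 10.0 ** (-k)
    for x in (1.0 - h, 1.0 + h):
        print(f"x = {x:.6f}: quotient = {q(x):.8f}, x + 1 = {x + 1.0:.8f}")
```

Away from $x = 1$ the two columns agree up to rounding, and both approach 2, which is what cancelling the factor $x - 1$ predicts.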

Exercise 6.47 Use the rulebook to justify the steps in the above example.
Exercise 6.48 Determine the following limits.

(a) $\lim_{x \to 2} \dfrac{x - 2}{x^2 + x - 6}$ (b) $\lim_{x \to -1} \left( \dfrac{1}{x + 1} + \dfrac{2}{x^2 - 1} \right)$.

The composition rule versus the change of variables rule


We now discuss the relation between rules (vi) and (vi’). To emphasise how they are
essentially two formulations of the same underlying "phenomenon", we consider an ex-
ample that can be computed using both.

Example 6.49 Suppose we want to compute the limit

\[ \lim_{x \to 1} (4x + 1)^2. \]

First, we observe that we can of course compute this limit using neither rule (vi) nor (vi’).
For instance, the following computation is straightforward:

\[ \lim_{x \to 1} (4x + 1)^2 = \lim_{x \to 1} (16x^2 + 8x + 1) = 16 + 8 + 1 = 25. \]

Next, suppose that we know that the function $y = x^2$ is continuous (you are asked to
verify this below). Then we can use (vi) to simplify the above computation as follows:

\[ \lim_{x \to 1} (4x + 1)^2 = \Big( \lim_{x \to 1} (4x + 1) \Big)^2 = 5^2 = 25. \]

Alternatively, we can perform what amounts to basically the same simplification
using rule (vi’). In the notation of the change of variables formula, we put $g(x) =
4x + 1$ and $f(u) = u^2$. Then, since $\lim_{x \to 1} g(x) = 5$ and $\lim_{u \to 5} f(u) = \lim_{u \to 5} u^2 = 25$,
we are justified in writing

\[ \lim_{x \to 1} (4x + 1)^2 = \lim_{u \to 5} u^2 = 25. \]

Now this last computation might not seem so appealing as the notation was a bit
cumbersome by comparison to the application of (vi). But consider the following, more
elegant way of presenting the computation using rule (vi’):
Since $u = 4x + 1 \to 5$ as $x \to 1$, we have

\[ \lim_{x \to 1} (4x + 1)^2 \overset{u = 4x + 1}{=} \lim_{u \to 5} u^2 = 25. \]

Exercise 6.50 In this exercise you are asked to study the connection between rules
(vi) and (vi’).

(a) Explain why it is that if we assume that $f(u)$ is continuous at $u = b$, then we
can remove the condition that $g(x) \neq b$ in a punctured neighbourhood from rule
(vi’).
(b) Under the modification from (a), prove that rule (vi’) implies rule (vi).

Elementary functions are continuous


Our goal is to use the rulebook for the limit to prove the following extremely useful
proposition.

Theorem 6.51 All elementary functions are continuous.

Recall that the elementary functions include all functions in the following list, as well
as all combinations of these functions using a finite number of the operations of addition,
subtraction, multiplication, division and composition (no inverses though!):

• polynomials and rational functions,


• the trigonometric functions
• the logarithm
• the exponential and power functions
• the inverse trigonometric functions

As a first step to proving the above theorem, we establish the following proposition,
which more or less follows immediately from the rulebook for the limit of functions.

Proposition 6.52 Suppose that f and g are continuous. Then all combinations of f
and g using a finite number of the operations of addition, subtraction, multiplication,
division and composition are continuous.

Proof. Suppose that $f$ and $g$ are continuous, and that $a$ is some point in the domain of
$f + g$. If this is an isolated point of the domain, we are done. If not, we check the limit:

\[ \lim_{x\to a} \bigl(f(x) + g(x)\bigr) = \lim_{x\to a} f(x) + \lim_{x\to a} g(x) = f(a) + g(a). \]

Since the limit of $f + g$ as $x$ tends to $a$ equals the value of $(f + g)(a)$, we conclude that
the sum must be continuous.
The proofs of the continuity of $f - g$, $f \cdot g$, $f/g$ and $f \circ g$ are almost identical to the
one above, and so we leave these as an exercise.

Exercise 6.53 Complete the proof of the above proposition.

Note that the above proposition does not say anything about continuity of inverse
functions. This is because even if a function is continuous and invertible, its inverse is
not necessarily continuous. We also record the following fact, which we will need.

Proposition 6.54 If $f$ is defined, continuous and invertible on an interval, then $f^{-1}$
is continuous.

While we postpone the proof of this proposition until the end of Chapter 9, we invite
you to do the following exercise.

Exercise 6.55 (Challenge) Find an example of a function that is continuous and
invertible, but whose inverse function is not continuous.
Hint: To understand how such an example would have to look, you should read the
statement of Proposition 6.54 carefully.

By the above propositions, in order to prove Theorem 6.51, we actually only need to
check that a few of the functions in the list of elementary functions are continuous.

Lemma 6.56

(i) If all polynomials are continuous, then so are all rational functions.
(ii) If the logarithm is continuous, then so is the exponential function and all power
functions.
(iii) If the sine and cosine functions are continuous, then so is the tangent function and
the inverse trigonometric functions.

Exercise 6.57 Prove the above lemma.


Hint: You need to apply propositions 6.52 and 6.54.

In light of Lemma 6.56, the following three lemmas establish Theorem 6.51.

Lemma 6.58 All polynomials are continuous.

Proof. From the rulebook of the limit, we already know that the constants and $f(x) = x$
are continuous. This is enough to do a proof by induction to show that all polynomials
of degree $n$, for all $n \in \mathbb{N}$, are continuous. In particular, the continuity of the constants is
exactly the base case $n = 0$.
To do the induction step, we assume that all polynomials of degree $n$ are continuous.
Our goal is to prove that this implies that all polynomials of degree $n + 1$ are continuous.
So, suppose that $p(x)$ is a polynomial of degree $n + 1$. Now, observe that
$y = p(x) - p(0)$ has a zero at $x = 0$. By the factor theorem, this means
we can factor out $x$ and write $p(x) - p(0) = x\,r(x)$ for some polynomial $r(x)$ of degree
$n$, which, by the induction hypothesis, is continuous. But now we have basically won!
Indeed, solving for $p(x)$, we obtain the formula $p(x) = x\,r(x) + p(0)$, which allows us to
use Proposition 6.52 to conclude that $p(x)$ is continuous (keep in mind, we already know
that $y = x$ is continuous).

We now turn to the logarithm, the sine and the cosine. We first recall that in the
"geometrically obvious" facts listed on these functions in remarks 2.49 and 2.65, we also
said that these functions have no "jumps" in their graphs. That is, we take it as obviously
true that these functions are continuous. Nevertheless, below, we indicate how
the continuity of these functions can be deduced from some facts that we will
be able to prove later, once we define the sine, cosine and logarithm in a non-geometric
fashion.

Lemma 6.59 The logarithm is continuous.

Proof. Let us first prove that the logarithm is continuous at $x = 1$. Since $\ln(1) = 0$, this
amounts to showing that

\[ \lim_{x\to 1} \ln(x) = 0. \]

But this follows immediately by applying the squeeze theorem to the following inequality
(recall Proposition 2.70):

\[ \frac{x-1}{x} \le \ln x \le x - 1, \quad \forall x > 0. \]

Next, we use the logarithmic laws in combination with the change of variables formula to
prove that this implies that the logarithm is continuous at every point $a > 0$. Here is one
way to compute this (note that we are, yet again, relying on the trick of adding 0):

\begin{align*}
\lim_{x\to a} \ln(x) &= \lim_{x\to a} \bigl( \ln(x) - \ln(a) + \ln(a) \bigr) \\
&= \lim_{x\to a} \bigl( \ln(x) - \ln(a) \bigr) + \ln(a) \\
&= \lim_{x\to a} \ln\Bigl(\frac{x}{a}\Bigr) + \ln(a) \\
&\overset{u = x/a}{=} \lim_{u\to 1} \ln(u) + \ln(a) = \ln(a).
\end{align*}
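As a quick sanity check (not a proof), the inequality used in the squeeze argument above can be tested numerically near $x = 1$. The following is a minimal sketch; the helper name `log_bounds` and the sample points are chosen here for illustration only.

```python
import math

def log_bounds(x):
    """Return (lower bound, ln(x), upper bound) from (x-1)/x <= ln(x) <= x-1."""
    return (x - 1) / x, math.log(x), x - 1

# Near x = 1 both bounds are close to 0, which is what squeezes ln(x) to 0.
for x in [0.5, 0.9, 0.99, 1.01, 1.1, 2.0]:
    lower, value, upper = log_bounds(x)
    print(f"x = {x:<5} {lower:+.5f} <= {value:+.5f} <= {upper:+.5f}")
```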

Lemma 6.60 The sine and cosine are continuous.



Proof. This proof is rather similar to the one for the logarithm above. Indeed, we start
out by establishing that the sine is continuous at $x = 0$. Since $\sin(0) = 0$, this amounts
to showing that

\[ \lim_{x\to 0} \sin x = 0. \tag{6.5} \]

As for the logarithm, we again apply an inequality from Chapter 2 (recall Proposition
2.60):

\[ 0 \le \sin x \le x \le \tan x, \quad \forall x \in [0, \pi/2). \]

Here, we only need the two left-most parts of this triple inequality, namely that
$0 \le \sin x \le x$. From this, it immediately follows from the squeeze theorem that (6.5) holds
when $x \to 0^+$. Using that $\sin x$ is an odd function, the corresponding one-sided limit
from the left also holds:

\[ \lim_{x\to 0^-} \sin x \overset{u = -x}{=} \lim_{u\to 0^+} \sin(-u) = -\lim_{u\to 0^+} \sin(u) = 0. \]

Next, we establish that it now follows that the cosine function is also continuous at $x = 0$.
This can be done in several ways. For instance, we can apply the half-angle formula in
the form

\[ \cos(x) = 1 - 2\sin^2\Bigl(\frac{x}{2}\Bigr). \]

Indeed, taking the limit of this expression as $x \to 0$, we find that

\begin{align*}
\lim_{x\to 0} \cos x &= 1 - 2 \lim_{x\to 0} \sin^2\frac{x}{2} \\
&= 1 - 2 \lim_{x\to 0} \Bigl( \sin\frac{x}{2} \Bigr)^2 \\
&\overset{u = x/2}{=} 1 - 2 \Bigl( \lim_{u\to 0} \sin u \Bigr)^2 = 1 - 2 \cdot 0^2 = 1.
\end{align*}

(Notice that we used the continuity of the polynomial $y = x^2$ to be able to pass the limit
inside of the square in the last line!)
Now, all that remains is to use the continuity of the sine and cosine at $x = 0$ to prove
that they are continuous at all $x$. Here is how to prove that the sine is continuous at
an arbitrary point $a \in \mathbb{R}$ using, yet again, trigonometric identities and the change of
variables rule:

\begin{align*}
\lim_{x\to a} \sin(x) &\overset{u = x - a}{=} \lim_{u\to 0} \sin(u + a) \\
&= \lim_{u\to 0} \bigl( \sin u \cos a + \sin a \cos u \bigr) \\
&= \cos a \cdot \lim_{u\to 0} \sin u + \sin a \cdot \lim_{u\to 0} \cos u \\
&= \cos a \cdot 0 + \sin a \cdot 1 = \sin a.
\end{align*}

We leave the almost identical proof for the cosine as an exercise.

Exercise 6.61 Prove that the cosine is continuous.

Exercise 6.62 Prove that $f(x) = x^\beta$ is continuous for all $\beta > 0$.

Hint: Recall the definition of $x^\beta$.

Remark 6.63 We remark that Theorem 6.51, above, is the answer to the following
question: "How do I justify that a function is continuous on the final exam?" That is,
just remark that it is an elementary function! (This is something we will practice doing
over and over and over again in Chapter 9!)

Some important limits


We now push the techniques used to prove the continuity of the elementary functions a
bit further to establish some limits that occur surprisingly often in mathematics,
physics and engineering.
The first limit we consider is:

Proposition 6.64
\[ \lim_{x\to 0} \frac{\ln(x+1)}{x} = 1. \]

Since this limit can be established using basically the same ideas as when we showed
that the logarithm was continuous at $x = 1$, we outline the proof as an exercise. As
before, the central players will be the logarithmic laws and the inequality

\[ \frac{x-1}{x} \le \ln(x) \le x - 1, \quad \forall x > 0. \]
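Before proving the limit, it can help to see it numerically. The sketch below simply tabulates the quotient for a few small values of $x$ on both sides of 0; the function name `log_quotient` and the sample points are illustrative choices, and the table is of course evidence rather than a proof.

```python
import math

def log_quotient(x):
    # math.log1p(x) computes ln(1 + x) accurately even when x is tiny
    return math.log1p(x) / x

for x in [0.1, 0.01, 0.001, -0.001, -0.01, -0.1]:
    print(f"x = {x:>6}: ln(1 + x)/x = {log_quotient(x):.6f}")
```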
Exercise 6.65 (a) Verify the limit visually by considering a suitable plot.
(b) Prove the above limit by following the proof of the continuity of the logarithm
at x = 1, line by line.

The next limits we consider are:

Proposition 6.66
\[ \text{(i)} \ \lim_{x\to 0} \frac{\sin x}{x} = 1 \qquad \text{(ii)} \ \lim_{x\to 0} \frac{1 - \cos(x)}{x} = 0 \]

Again we give the proofs of these limits as an exercise, since the plan is to mimic what
we did when proving that the sine and cosine were continuous. Specifically, the point is
to use trigonometric identities in combination with the inequality

\[ 0 \le \sin x \le x \le \tan x, \quad \forall x \in [0, \pi/2). \]
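The two limits in Proposition 6.66 can also be explored numerically. The following minimal sketch evaluates both quotients for shrinking values of $x$; the function names are chosen here for illustration, and the output only suggests the limits, it does not prove them.

```python
import math

def sine_quotient(x):
    return math.sin(x) / x

def cosine_quotient(x):
    return (1 - math.cos(x)) / x

for x in [1.0, 0.1, 0.01, 0.001]:
    print(f"x = {x:<6} sin(x)/x = {sine_quotient(x):.8f}   "
          f"(1 - cos(x))/x = {cosine_quotient(x):.8f}")
```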

Exercise 6.67 (a) Verify the limits visually by considering a suitable plot.
(b) Prove the above limits by following the proof of the continuity of the sine and
cosine at x = 0, "essentially" line by line.

Exercise 6.68 Use the above limits to compute the limits

\[ \text{(a)} \ \lim_{x\to 1} \frac{\sin(x^2 - 1)}{x - 1} \qquad \text{(b)} \ \lim_{x\to 0} \frac{e^x - 1}{x} \]

Some important limits involving infinities


We now move on to consider some important special cases of limits where either x or
the involved functions tend to infinity (or both!). First, we note that since the involved
definitions are so similar, the following result is proven exactly in the same way as that
of sequences (this was contained in the table of growth from Chapter 5):

Proposition 6.69
\[ \lim_{x\to\infty} \frac{\ln x}{x^\alpha} = 0, \quad \alpha > 0. \]

We now give an example of how we can use this limit in a computation. Note that
the rulebook for limits is well adapted to deal with limits where $x$ tends to infinity.

Example 6.70 We use the above limit in combination with a change of variables to
show that for $\beta > 0$, we have

\[ \lim_{x\to\infty} \frac{x^\beta}{e^x} = 0. \]

Notice that this limit is of the form $[\infty/\infty]$. So, this is just another example of the
exponential function winning essentially every fight it is involved in.
Let us try the change of variables $e^x = u$, which is the same as $x = \ln u$. Our hope
is that this will transform this expression into the limit from the above proposition. We
note that $x \to +\infty$ implies that $u = e^x \to +\infty$. This allows the following computation:

\begin{align*}
\lim_{x\to\infty} \frac{x^\beta}{e^x} &\overset{u = e^x}{=} \lim_{u\to\infty} \frac{(\ln u)^\beta}{u} = \lim_{u\to\infty} \Bigl( \frac{\ln u}{u^{1/\beta}} \Bigr)^\beta \\
&= \Bigl( \lim_{u\to\infty} \frac{\ln u}{u^{1/\beta}} \Bigr)^\beta = 0.
\end{align*}

In the last line, we first used the fact that $y = x^\beta$ is continuous (see exercise 6.62, above),
followed by Proposition 6.69 for $\alpha = 1/\beta$.
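To get a feeling for how decisively the exponential wins, here is a small numerical sketch. The exponent of the power in the numerator is called `p` in the code (a name chosen for this illustration), and the value `p = 5` is an arbitrary example.

```python
import math

def power_over_exp(x, p):
    """Evaluate x**p / e**x for a fixed exponent p > 0."""
    return x**p / math.exp(x)

# The quotient may grow at first, but the exponential eventually dominates.
for x in [1, 10, 50, 100, 200]:
    print(f"x = {x:<4} x^5 / e^x = {power_over_exp(x, 5):.3e}")
```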

Exercise 6.71 For $\alpha > 0$, compute the limit

\[ \lim_{x\to 0^+} x^\alpha \ln x. \]

Exercise 6.72 Compute the limits

\[ \text{(a)} \ \lim_{x\to\infty} \Bigl( 1 + \frac{1}{3x} \Bigr)^{2x} \qquad \text{(b)} \ \lim_{x\to 0^+} \Bigl( 1 + \frac{1}{3x} \Bigr)^{2x} \]

Computations involving asymptotes


In Chapter A, we discussed informally various types of asymptotes that you (should) have
seen in high school (vertical, horizontal and skew). We now point out how the notions of
asymptotes are related to the limits involving infinities discussed on the previous pages.

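Definition 6.73 (asymptotes) We say that:

• $f$ has the straight line $y = L$ as a horizontal asymptote if

\[ \lim_{x\to\infty} f(x) = L \quad \text{or} \quad \lim_{x\to -\infty} f(x) = L. \]

• $f$ has the vertical line $x = a$ as a vertical asymptote if

\[ \lim_{x\to a^-} f(x) = \pm\infty \quad \text{or} \quad \lim_{x\to a^+} f(x) = \pm\infty \]

(here, by $\pm\infty$, we mean "$\infty$ or $-\infty$").

• $f$ has the straight line $y = kx + m$ as a skew asymptote if

\[ \lim_{x\to\infty} \bigl( f(x) - (kx + m) \bigr) = 0 \quad \text{or} \quad \lim_{x\to -\infty} \bigl( f(x) - (kx + m) \bigr) = 0. \]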

In Chapter A, we only considered asymptotes of rational functions. Let us now
consider an example where we check the vertical and horizontal asymptotes of a
non-rational function.

Example 6.74 (Vertical asymptotes) Let us determine any vertical and horizontal
asymptotes of the expression

\[ f(x) = \frac{1}{\sqrt{x^2 - 2x} - x}. \]

To find the vertical asymptotes, we need to identify all points in $\mathbb{R}$ where the function
can tend to infinity. First, note that when $x$ approaches points where the denominator
is defined and non-zero, the "additional rule of thumb" applies and we get a finite limit.
To figure out where we can have an infinite limit, we investigate
$\sqrt{x^2 - 2x} = \sqrt{x(x - 2)}$. We see that this root is defined on $(-\infty, 0] \cup [2, +\infty)$, and, moreover,
that it is zero when $x \in \{0, 2\}$. We check the following limits:

\[ \lim_{x\to 2^+} f(x) = \frac{1}{0 - 2} = -\frac{1}{2} \quad \text{and} \quad \lim_{x\to 0^-} f(x) = \Bigl[ \frac{1}{0^+ - 0^-} \Bigr] = \Bigl[ \frac{1}{0^+} \Bigr] = +\infty. \]

That is, we have a vertical asymptote at $x = 0$. Here, we used the symbols $0^+$ and $0^-$
to indicate whether the zeroes are approached from the positive or negative side.
Moreover, we checked one-sided limits since we cannot approach from inside $(0, 2)$.
The final possibility for vertical asymptotes is at points where the root is defined,
but $\sqrt{x^2 - 2x} - x = 0$. That is, at points where $x$ satisfies:

\[ \sqrt{x^2 - 2x} = x \implies x^2 - 2x = x^2 \iff x = 0. \]

But this is the point we already detected, so we have found all vertical asymptotes.
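The two one-sided limits in this example can be checked numerically. The sketch below evaluates the function of Example 6.74 just to the left of 0 and just to the right of 2; the step sizes `h` are arbitrary illustrative choices, and the printed values only suggest the limiting behaviour.

```python
import math

def f(x):
    # f(x) = 1 / (sqrt(x^2 - 2x) - x), defined for x <= 0 or x >= 2
    return 1 / (math.sqrt(x**2 - 2 * x) - x)

# Approach 0 from the left and 2 from the right (the only directions available).
for h in [1e-1, 1e-2, 1e-4, 1e-6]:
    print(f"h = {h:g}: f(-h) = {f(-h):.4g}, f(2 + h) = {f(2 + h):.6f}")
```

The values of `f(-h)` grow without bound as `h` shrinks, while `f(2 + h)` settles near a finite number, in line with the computation above.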

To check for horizontal asymptotes, we need to figure out if the expression approaches
some constant as $x$ approaches $+\infty$ or $-\infty$. In this case, the computations more or less
act as if we were computing with sequences and letting $n \to \infty$.
Let us revisit the above example.

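Example 6.75 (Horizontal asymptotes) We check whether the function $f(x)$ in the
previous example has any horizontal asymptotes. First, we check if there is a horizontal
asymptote as $x \to +\infty$:

\begin{align*}
\lim_{x\to\infty} \frac{1}{\sqrt{x^2 - 2x} - x} &= \Bigl[ \frac{1}{\infty - \infty} \Bigr] = \lim_{x\to\infty} \frac{1}{\sqrt{x^2 - 2x} - x} \cdot \frac{\sqrt{x^2 - 2x} + x}{\sqrt{x^2 - 2x} + x} \\
&= \lim_{x\to\infty} \frac{\sqrt{x^2 - 2x} + x}{-2x} \\
&= \lim_{x\to\infty} \frac{x \bigl( \sqrt{1 - \frac{2}{x}} + 1 \bigr)}{-2x} = \frac{\sqrt{1 - 0} + 1}{-2} = -1.
\end{align*}

(Notice how we used practically every rule of thumb here.)
We leave it as an exercise to check for a horizontal asymptote as $x \to -\infty$.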

Exercise 6.76 (a) Determine whether the function in the previous example has a
horizontal asymptote as $x \to -\infty$.
(b) Use some visualisation tool to draw the graph of $f(x)$ to verify your answer in
(a).

Exercise 6.77 (a) Determine any vertical and horizontal asymptotes of

\[ f(x) = \frac{1}{x^2} \sin\Bigl( \frac{1}{x} \Bigr). \]

(b) Use some visualisation tool to draw the graph of the function in (a) to verify
your answers.

We now turn to the question of how to compute skew asymptotes (which, we remind
you, are sometimes also called oblique asymptotes). In Appendix A, we explain how we
can identify skew asymptotes of rational functions by using polynomial division. Here is
a recipe for finding skew asymptotes that also works for functions that are not rational.

Method 6.78 (Finding skew asymptotes) To determine the oblique asymptote
$y = kx + m$ of a function as $x \to \infty$ one can use the following algorithm:

Step 1. Compute $A = \lim_{x\to\infty} f(x)/x$.

Step 2. Compute $B = \lim_{x\to\infty} (f(x) - Ax)$.

Step 3. Conclude: If both limits $A$ and $B$ exist, $y = Ax + B$ is the oblique
asymptote as $x \to +\infty$. If either $A$ or $B$ does not exist, then $f$ has no oblique
asymptote as $x \to \infty$.

By replacing $\infty$ by $-\infty$, the same method allows us to find skew asymptotes as $x \to -\infty$.
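The two limits in Method 6.78 can be estimated numerically to form a guess before computing them by hand. The following is only a rough sketch: the function name `skew_asymptote_estimate`, the evaluation points `x_slope` and `x_offset`, and the test function `g` are all assumptions made for this illustration, and replacing a limit by a single evaluation at a large $x$ is a heuristic, not a proof.

```python
import math

def skew_asymptote_estimate(f, x_slope=1e12, x_offset=1e4):
    """Crude numerical stand-in for Steps 1 and 2 (as x -> +infinity)."""
    # Step 1: approximate A = lim f(x)/x by evaluating at one very large x.
    A = f(x_slope) / x_slope
    # Step 2: approximate B = lim (f(x) - A*x) at a moderately large x, chosen
    # much smaller than x_slope so that the error in A is not amplified.
    B = f(x_offset) - A * x_offset
    return A, B

def g(x):
    # Hypothetical test function: sqrt(x^2 + 4x), whose skew asymptote is y = x + 2.
    return math.sqrt(x**2 + 4 * x)

A, B = skew_asymptote_estimate(g)
print(f"A is roughly {A:.6f}, B is roughly {B:.4f}")
```

The estimate for `B` is evaluated at a smaller point than the one used for `A` on purpose: if both were computed at the same point, `f(x) - A*x` would be exactly zero by construction.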

Remark 6.79 An example of finding a skew asymptote is worked out in the YouTube-
film linked to Method 6.78.

Exercise 6.80 Use the above method to find the skew asymptotes as $x \to \infty$ of the
following functions.

\[ \text{(a)} \ f(x) = 3x - 2 \qquad \text{(b)} \ f(x) = \frac{x^3 + 2x^2 - 5x}{x^2 + 1} \qquad \text{(c)} \ f(x) = x e^{1/x}. \]

Hint: The skew asymptote in part (b) can also be found using polynomial division, as
we did in Chapter 2. Try both methods to see if they match up. In (c), a standard
limit may come in handy.

Exercise 6.81 In this exercise, we prove that Method 6.78 will always give the correct
answer.

(a) Prove that if $f(x)$ has $y = kx + m$ as an oblique asymptote as $x \to \infty$, then the
above method gives $A = k$ and $B = m$.
(b) Prove that if the above method indicates that $f(x)$ has $y = Ax + B$ as an oblique
asymptote, then $y = Ax + B$ satisfies the definition of being an oblique asymptote
of $f(x)$.

6.5 Exam exercises


Exercise 6.82 (Lund, January 2016) On this exercise you could get a maximum
of 5 points.

(a) (1 point) Define what we mean by $\lim_{x\to c} f(x) = L$.

(b) Use the definition you gave in (a) to verify one of the following limits.

(i) (2 points) $\lim_{x\to 2} (3x^2 - 9x + 1) = -5$

(ii) (3 points) $\lim_{x\to 2} (3x^3 - 9x + 1) = 7$

(iii) (4 points) $\lim_{x\to 2} \dfrac{x^3 - 3x + 1}{2x - 3} = 3$

Exercise 6.83 (Lund, May 2015) Do one of the following exercises (both are worth
the same number of points).

(a) Give the definition of $\lim_{x\to a} f(x) = L$ and use it to show that

\[ \lim_{x\to 1} (x^2 + 3x + 1) = 5. \]

(b) Give the definition of $\lim_{n\to\infty} a_n = +\infty$ and use it to show that

\[ \lim_{n\to\infty} n!/20^n = +\infty. \]

Remark: Part (b) actually belongs to Chapter 5.

Exercise 6.84 (Lund, August 2014)

(a) Formulate the epsilon-delta definition of the limit of a function at a point.


(b) Explain what it means for a function to be continuous at a point.
(c) Prove, using the epsilon-delta definition, that $f(x) = 2x^2 - 3x + 1$ is continuous
at $x = 1$.

Exercise 6.85 (Lund, May 2014) Explain briefly what it means for a function $f$
to be continuous at a point $x$. Then use the epsilon-delta definition of the limit to
show that $f(x) = x^3 + 2x^2 + 3x + 1$ is continuous at $x = 2$.

6.6 Answers to selected exercises


6.10 Clockwise, starting at top-left: diverges to 1, 0, diverges, diverges.

6.21 C = 1.

6.22 No.

6.23 The function is continuous, but not at x = 0 (how can this be?).

6.28 $\delta = 1/2$ and $\delta = 1/4$, respectively (or any $\delta$ smaller than this).

6.29 (a) $|x - 3| < 1/4$, (b) $|x - 3| < 1/40$, (c) $|x - 3| < \epsilon/4$.

6.32 It is a bit hard to see, but based on the first figure $\delta = 1/2$ (or even something
slightly larger than this) seems to work, and in the second figure, $\delta = 1/4$ seems to
work. The formula from the example gives $\delta = 4/12 = 1/3$ and $\delta = 2/12 = 1/6$,
respectively. It does not matter that these are not the same (since if one $\delta$ works,
then all smaller $\delta$ also automatically work – the point is to find some $\delta$ that is
small enough).

6.33 (c) Following the steps of Example 6.30, we arrive at $\delta = \min\{1, \epsilon/21\}$. (But
just changing the steps slightly may lead to other choices for $\delta$ which are also
acceptable).

6.41 Basically, the proofs are the same except conditions of the type $n > N$ are replaced
by conditions of the type $|x - a| < \delta$.

6.44 (a) ✏2 = ✏, (b) ✏1 = 2, (c) = 1.

6.47 You need rules (i), (vi) and (vii).

6.48 (a) 1/5, (b) 1/2.

6.68 (a) 2, (b) 1.

6.71 The limit is equal to 0. Do a change of variables to make it into standard limit
(iii).

6.72 Rewrite the expressions using $a^x = e^{x \log a}$, then you will find the answers (a) $e^{2/3}$,
(b) 1.

6.76 (a) $y = 0$ is a horizontal asymptote when $x \to -\infty$.

6.77 (a) No vertical asymptotes, $y = 0$ is a horizontal asymptote as $x \to \pm\infty$.

6.80 (a) $y = 3x - 2$, (b) $y = x + 2$, (c) $y = x + 1$.



6.81 In this exercise, it is important to keep in mind that $y = kx + m$ is a skew asymptote
for $f(x)$ if $\lim_{x\to\infty} (f(x) - kx - m) = 0$. In (a), the strategy is to add 0 in both
the expressions for $A$ and $B$ to make $kx + m$ appear, and then to use the definition
of the skew asymptote. In (b), it is enough to rewrite the computation that gives
you $B$ in order to verify that $y = Ax + B$ satisfies the definition of being a skew
asymptote.
Chapter 7

What is Calculus

As in Chapter 3, we now take the time to explain one of the major ideas of mathematics.
In Chapter 3, the point was to explain how mathematical analysis can be understood
as the study of mathematical objects in terms of limits of "infinite" processes. Here, we
discuss some of the central ideas of Calculus, which is the part of mathematical analysis
that deals with the interplay between the derivative and the definite integral.

7.1 The derivative and the definite integral


A first look at the derivative
Here is an informal description of what we mean by the derivative of a function.

Informal definition 7.1 (Day job of the derivative) The derivative $f'(x)$ of a
function $f$ denotes the slope of the line tangent to the graph of $f$ at the point $x$.

Visually, this looks as follows:

Fig. 1. The slope of the blue line is exactly what we mean by $f'(1)$.

Based on the informal definition, it may come as a surprise that the derivative is
probably among the most important scientific objects ever "discovered". Indeed, it is
central for our understanding of our physical reality in mathematical terms!

Fig. 2. The derivative is like Batman. It has a seemingly boring day job, but is a
superhero by night (describing gravitational waves and curing cancer!).

To motivate a formal definition of the derivative, we consider the following example.

Example 7.2 Let us try to determine the derivative of $f(x) = x^2$ at $x = 1$. Suppose,
for a moment, that we are a computer that only has knowledge of $f$ on a finite number
of points represented by the red dots in the figure to the right.

In this case, the slope of the straight line through the neighbouring points $(1, f(1))$ and
$(2, f(2))$ would be the best approximation we could get of the slope of $f$ at $x = 1$. That is,

\[ f'(1) \approx \frac{f(2) - f(1)}{2 - 1} = \frac{2^2 - 1^2}{1} = 3. \]

Fig. 3. A first guess of the slope of $f$ at $x = 1$.

How to improve this estimate? Well, let us make the gaps between the red dots smaller:

Fig. 4. We estimate $f'(1)$ by comparing the value of $f$ at $x = 1$ to its value at
$x = 1 + 1/2$ (left) and $x = 1 + 1/10$ (right), respectively.

That is, we get the following approximations:

\[ f'(1) \approx \frac{f(1 + \frac{1}{2}) - f(1)}{(1 + \frac{1}{2}) - 1} = \frac{(1 + \frac{1}{2})^2 - 1^2}{\frac{1}{2}} = 2.5, \]

\[ f'(1) \approx \frac{f(1 + \frac{1}{10}) - f(1)}{(1 + \frac{1}{10}) - 1} = \frac{(1 + \frac{1}{10})^2 - 1^2}{\frac{1}{10}} = 2.1. \]

What happens if we make the gaps between the red dots even smaller? Could it be that the
approximations for $f'(1)$ stabilise at 2? As you are supposed to check in the following
exercise, this is indeed the case!
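The approximations in this example are easy to reproduce on a computer. Here is a minimal sketch; the helper name `difference_quotient` and the list of gap sizes are choices made for this illustration.

```python
def difference_quotient(f, x, h):
    """Slope of the line through (x, f(x)) and (x + h, f(x + h))."""
    return (f(x + h) - f(x)) / h

def f(x):
    return x**2

# Shrinking the gap h between the two points used to measure the slope at x = 1.
for h in [1, 1/2, 1/10, 1/100, 1/1000]:
    print(f"h = {h:<6} slope = {difference_quotient(f, 1, h):.6f}")
```

The first three rows reproduce the values computed by hand above (up to rounding in floating-point arithmetic).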

Exercise 7.3 We consider the function $f(x) = x^2$ from the above example.
(a) What approximation for $f'(1)$ do you get if you compute the slope of the straight
line through the points $(1, f(1))$ and $(1 + 1/100, f(1 + 1/100))$?
(b) Write out an expression for the slope of the line passing through the points
$(1, f(1))$ and $(1 + h, f(1 + h))$, where $h$ is some unknown number. Try to determine the limit
as $h \to 0$ both numerically using Python and analytically using computational
rules for the limit.

Inspired by the above discussion, we now give the definition of what we mean by the
derivative and tangent lines, respectively, of a function f at a point x.

Definition 7.4 (The derivative) We define the derivative of $f$ at the point $x$ to be

\[ f'(x) \overset{\text{def}}{=} \lim_{h\to 0} \frac{f(x + h) - f(x)}{h} \]

at all points where this limit exists. Moreover, if the limit exists, we say that $f$ is
differentiable at $x$. If $f$ is differentiable at all points in its domain, we simply say that $f$
is differentiable. Note that we sometimes write $\frac{d}{dx} f(x)$ or $\frac{df}{dx}(x)$ instead of $f'(x)$.

Definition 7.5 (Tangent lines) If $f$ is differentiable at a point $x = a$, then we call

\[ y = f'(a)(x - a) + f(a) \]

the tangent line of $f$ at $x = a$.
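To connect the two definitions, the sketch below builds an approximate tangent line by replacing $f'(a)$ with a difference quotient for one small, fixed $h$. The name `tangent_line` and the default `h = 1e-6` are assumptions made for this illustration; the exact tangent line requires the limit $h \to 0$.

```python
def tangent_line(f, a, h=1e-6):
    """Return the function x -> slope*(x - a) + f(a), where slope approximates f'(a)."""
    slope = (f(a + h) - f(a)) / h   # difference quotient with a small, fixed h
    return lambda x: slope * (x - a) + f(a)

# Approximate tangent line of f(x) = x^2 at a = 1.
line = tangent_line(lambda x: x**2, 1)
print(line(1), line(2))
```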

Exercise 7.6 Use the definition of the derivative to calculate the derivatives of:
(a) $f(x) = C$   (b) $f(x) = x$   (c) $f(x) = x^2$
(d) $f(x) = 1/x$   (e) $f(x) = kx + m$   (f) $f(x) = e^{Cx}$

Physical interpretation of the derivative

Above, we discussed the very basics of the derivative. But why should we care
about it? The thing is that the derivative describes how a function is changing at a
specific value of $x$. For instance, suppose that the graph in Example 7.2 describes
the distance travelled by an object as a function of time starting at time $x = 0$.
Then the average velocity this object has from time $x = 1$ to time $x = 2$ is equal
to the distance travelled divided by the time this took:

\[ \frac{f(2) - f(1)}{2 - 1}. \]

Fig. 5. If the top graph represents the position of an object with time as the
variable, then the following two graphs represent its velocity and acceleration.

If we want to know more precisely the velocity this object had at time $x = 1$,
then it would make sense to consider the average velocity over a shorter time interval:

\[ \frac{f(1 + \frac{1}{10}) - f(1)}{(1 + \frac{1}{10}) - 1}. \]

Pushing this physical argument to its logical conclusion, it seems clear that if we want
to know the exact velocity of the object at time $x = 1$, we need to compute the limit

\[ \lim_{h\to 0} \frac{f(1 + h) - f(1)}{h}. \]

That is, if a function describes the distance of an object with time as the variable, then
the derivative is exactly its velocity. Similarly, if a function describes the velocity of an
object with time as the variable, then the derivative is exactly its acceleration.
These are crucial observations, since, for instance, Newton’s laws of physics are
formulated as equations involving position, velocity and acceleration. In fact, most – if
not all – laws of nature are in one way or the other formulated in terms of how
various quantities change, and therefore, in terms of derivatives. Indeed, this is true for
Maxwell’s laws for electromagnetism, Einstein’s theories of relativity and Schrödinger’s
theory for Quantum Mechanics. The list goes on. In fact, we could also include less
obvious fields which depend more and more on understanding the slopes of graphs –
these include finance, chemistry, biology and medicine. In short, we can describe nature
in terms of change, and the mathematical language of change is the derivative! That is,
the derivative truly is some type of super-hero!

A first look at the definite integral


Here is an informal description of what we mean by the definite integral of a function.

Informal definition 7.7 (Day job of the definite integral) The definite integral

\[ \int_a^b f(x)\,dx \]

denotes the area under the graph of the function $f(x)$ on the interval $[a,b]$, where the
area below the x-axis is to be interpreted as being negative.

Fig. 6. Left: The definite integral of $y = x^2$ over the interval $[0,1]$. Right: The
definite integral of $y = \sin x$ over the interval $[0,8]$. The area below the x-axis is
counted as "negative area".

There is more than one way to set up a limiting process for computing the area under
the graph of a function, and all of them lead to a version of the definite integral. In fact,
it is said that every mathematician in the 18th century had his own version of the defi-
nite integral. The Riemann integral, which is the one we learn in this course, is mostly
considered a pedagogical tool. In practice, mathematicians, physicists and engineers use
the more advanced Lebesgue integral which you will meet in later courses.

Fig. 7. If the derivative is like Batman, then it makes sense to think of the definite
integral as Robin. Indeed, as we shall see below, together they form a formidable
crime fighting duo!

To motivate the definition of the definite integral, we consider an example.


Example 7.8 Let us try to compute the area under the graph of $f(x) = \sqrt{1 - x^2}$ for
$x \in [0,1]$. As in high school, our strategy is to approximate this area by using rectangles,
and, as with the derivative, we make such approximations pretending that we only have
knowledge of the graph of $f$ at a finite number of evenly spread red dots.

Fig. 8. To the left, we approximate the area under the graph of $f(x) = \sqrt{1 - x^2}$ using
4 rectangles with base lengths 1/4, to the right, we approximate using 8 rectangles
with base lengths 1/8 (in both cases, the right-most rectangle has height zero).

In the above figures, we see two examples of lower Riemann sums. That is, finite
sums of the areas of rectangles lying below some graph. For convenience, we denote the
lower Riemann sums shown above by $L_4$ and $L_8$, respectively.
In particular, if we write $x_k = k/4$ for $k \in \{0, 1, 2, 3, 4\}$, then the area expressed in
the left-most figure, above, is equal to

\[ \underbrace{f(x_1)(x_1 - x_0)}_{\text{Area of 1st rectangle}} + \underbrace{f(x_2)(x_2 - x_1)}_{\text{Area of 2nd rectangle}} + \underbrace{f(x_3)(x_3 - x_2)}_{\text{Area of 3rd rectangle}} + \underbrace{f(x_4)(x_4 - x_3)}_{\text{Area of 4th rectangle}} \]

Writing $\Delta x_k = x_k - x_{k-1}$ for the base-lengths of the rectangles, we can express the
resulting approximation as

\[ \int_0^1 f(x)\,dx \approx \sum_{k=1}^{4} f(x_k) \Delta x_k = \sum_{k=1}^{4} \sqrt{1 - (k/4)^2} \cdot \frac{1}{4} = 0.62\ldots, \]

where we used Python to compute the sum in the last step.

Exercise 7.9 (a) Use high school geometry to compute the exact value of the area
considered in the above example. (Hint: Rewrite $y = \sqrt{1 - x^2}$.)
(b) Compute $L_8$, $L_{100}$, $L_{1000}$ and compare these values to the value obtained in (a).
Hint: To do (b), feel free to use the code given in Example 7.10, below.

Example 7.10 (Riemann sums in Python) To compute, say, $L_4$, in the example
above, we can use the following code:

    import numpy as np

    def f(x):
        # Integrand from Example 7.8
        return (1 - x**2) ** (1/2)

    a = 0; b = 1; N = 4
    X = np.linspace(a, b, N + 1)   # the N+1 equally spaced points x_0, ..., x_N

    S = 0
    for k in range(1, N + 1):
        # Add the area of the k-th rectangle: height f(x_k), base x_k - x_{k-1}
        S = S + f(X[k]) * (X[k] - X[k-1])

Notice how the code mirrors the mathematical notation (and that we, in this code, split
the interval into N equally long pieces)! Also, note that we could replace the line S = 0
and the loop following it by the single line
S = sum([f(X[k]) * (X[k] - X[k-1]) for k in range(1, N+1)]).

Inspired by the above discussion, we now give the definition of what we mean by the
definite integral of a function f with respect to some interval [a,b].

Definition 7.11 (Definition of the definite integral for continuous functions)
Suppose that $f$ is continuous on $[a,b]$. For each $N \in \mathbb{N}$, let $x_0 < x_1 < \ldots < x_N$ be
equally spaced points inside of $[a,b]$ with $x_0 = a$ and $x_N = b$, and let
$\Delta x = (b - a)/N$ denote their spacing. Then

\[ \int_a^b f(x)\,dx = \lim_{N\to\infty} \sum_{k=1}^{N} f(x_k) \Delta x. \]

At the moment, even though it may seem reasonable, we do not know if the limit in
Definition 7.11 exists for all continuous functions. This is something we prove is true in
Chapter 12.

Exercise 7.12 For each of the following integrals, (a) make a drawing of the actual
area that the integrals represent, and use what you know about high school geometry
to compute their exact values, and (b) adapt the code in the above example to
approximate them numerically.

\[ \text{(i)} \ \int_1^3 x\,dx, \qquad \text{(ii)} \ \int_0^{2\pi} \sin(x)\,dx. \]

Determining the quality of approximations of definite integrals


Above, we used rectangles to obtain approximations of the area below a graph. However,
without already knowing the answer, we cannot determine how good this approximation
is. To this end, we need to also consider upper Riemann sums.

Example 7.13 Let us denote the upper Riemann sum illustrated in Figure 9 by $U_4$.
Writing, as above, $x_k = k/4$ for $k \in \{0,1,2,3,4\}$ and letting $\Delta x_k = x_k - x_{k-1}$, we obtain

\[ U_4 = \sum_{k=1}^{4} f(x_{k-1}) \Delta x_k = 0.87\ldots \]

(Here, we use the left end-points of each sub-interval to compute the heights for the upper
Riemann sums.) Since $L_4$ is smaller than the area we want to approximate, and $U_4$ is larger,
this tells us that

\[ 0.62\ldots \le \int_0^1 \sqrt{1 - x^2}\,dx \le 0.87\ldots \]

In particular, this tells us that the error we make when approximating the integral by $L_4$
(or by $U_4$) is less than $0.88 - 0.61 = 0.27$.

Fig. 9. Here we approximate the area under the graph of $f(x) = \sqrt{1 - x^2}$ using 4
rectangles that all lie above the graph.
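The lower and upper sums from the two examples can be computed side by side. The sketch below uses only the standard library; the function name `riemann_sums` is chosen for this illustration, and the identification of right/left end-point sums with lower/upper sums relies on the integrand being decreasing on the interval, as it is here.

```python
import math

def f(x):
    # max(0.0, ...) guards against tiny negative values caused by rounding near x = 1
    return math.sqrt(max(0.0, 1 - x**2))

def riemann_sums(f, a, b, n):
    """Right- and left-end-point sums on n equal pieces.

    For a decreasing f these are the lower and upper Riemann sums, respectively.
    """
    dx = (b - a) / n
    points = [a + k * dx for k in range(n + 1)]
    lower = sum(f(points[k]) * dx for k in range(1, n + 1))
    upper = sum(f(points[k - 1]) * dx for k in range(1, n + 1))
    return lower, upper

for n in [4, 8, 100]:
    lower, upper = riemann_sums(f, 0, 1, n)
    print(f"n = {n:<4} L_n = {lower:.4f}  U_n = {upper:.4f}  gap = {upper - lower:.4f}")
```

The printed gap between the two sums bounds the error of either approximation, which is the point made in the example above.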

In the following exercise, we make the important point that for monotone functions,
we can figure out the quality of approximations by Riemann sums without having to
compute the Riemann sums themselves.

Exercise 7.14 Suppose that $f$ is a monotone and continuous function on $[a,b]$. Split
$[a,b]$ into $n$ equally long pieces and consider the corresponding lower and upper Riemann
sums $L_n$ and $U_n$. Determine how large $n$ has to be in order for $L_n$ to approximate
$\int_a^b f(x)\,dx$ with an accuracy of at least $\epsilon$.
Hint: Consider the figure to the right. Notice that the difference between the upper and
lower Riemann sums can be represented as the area of a column (which has a friendly
formula!).

Fig. 10. Illustration for exercise 7.14.

Exercise 7.15 Use the result of the previous exercise to determine how large we have
to choose $n$ in order for $L_n$ to approximate $\int_0^1 \sqrt{1 - x^2}\,dx$ with an error of less than
1/1000. Use Python to verify that this is correct.

The connection between the derivative and the definite integral


So far, we have two ways of computing a definite integral:

1. Use a geometric argument (after all, the definite integral is an area).


2. Use the definition of the definite integral to compute a numerical approximation.

However, there is also a third way to compute definite integrals – one which you prob-
ably used the most in high school. This method uses the connection between derivatives
and the definite integral as discovered by Newton and Leibniz in the 17th century. Some
even claim that this is the most important scientific discovery ever made!
3. Use the evaluation formula. That is, if F is a primitive function of f on [a,b] (that is, $F'(x) = f(x)$ for all $x \in [a,b]$), then
$$\int_a^b f(t)\,dt = F(b) - F(a).$$

Fig. 11. The derivative and the definite integral, finally together, kicking ass!

When reading the above result, keep in mind that the definite integral and the derivative come from two completely different geometric notions: areas and tangent lines, respectively. It was therefore quite surprising, and very useful, when Newton and Leibniz realised that these two concepts are mirror images of one another! (In fact, historically, the definite integral was studied long before anyone came up with the derivative.) At the time, there was a huge argument between the two on who discovered "Calculus" – the link between the theory of integration and the theory of differentiation – first. In the end, Newton was credited for the discovery, but we are using Leibniz' notation.

Fig. 12. Gottfried Wilhelm von Leibniz (1646–1716) shown biting Sir Isaac Newton (1642–1727).
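
To see the evaluation formula at work numerically, the following sketch compares a Riemann-sum approximation with $F(b) - F(a)$. The choice $f(t) = t^2$ with primitive $F(t) = t^3/3$ on [0,1] is only an illustrative assumption, and so is the helper name `midpoint_sum`.

```python
def midpoint_sum(f, a, b, n):
    """Approximate the integral of f over [a, b] by n rectangles (midpoint heights)."""
    dx = (b - a) / n
    return dx * sum(f(a + (k + 0.5) * dx) for k in range(n))

def f(t):
    return t ** 2

def F(t):
    return t ** 3 / 3  # a primitive of f, since F'(t) = t^2

approx = midpoint_sum(f, 0.0, 1.0, 1000)  # method 2: numerical approximation
exact = F(1.0) - F(0.0)                   # method 3: evaluation formula
print(approx, exact)
```

The two numbers should agree to several decimals, even though they are computed in completely different ways.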

Exercise 7.16 Use the evaluation formula in combination with exercise 7.6 to compute the following integrals.
(a) $\int_1^3 x\,dx$   (b) $\int_1^3 \frac{dx}{x^2}$   (c) $\int_1^3 e^{5x}\,dx$.

Exercise 7.17 Compute the following expressions by first using the evaluation formula, and then differentiating (you can assume that f has a primitive). Do the answers surprise you?
(a) $\frac{d}{dx} \int_1^x \frac{dx}{x^2}$   (b) $\frac{d}{dx} \int_1^x f(t)\,dt$

7.2 A first look at differential equations


Differential equations are used by physicists and engineers to model reality, and the
tools of Calculus were originally developed by Newton to solve such equations. For
instance, Newton’s theory of gravity, Maxwell’s theory for electromagnetic waves and
Schrödinger’s theory for quantum physics are all just a bunch of differential equations.
In this section, we shall try to understand how such equations move "information"
through time (and/or space). We do this by first considering a sandwich.

Two discrete-time models for population growth


Suppose you make yourself a delicious sandwich and leave it on the kitchen counter in the summer heat. In particular, what if there is a single E. coli bacteria on it? To the right, you see "The veterinary's night snack", a Danish speciality consisting of lard of pig, liver of pig and ham of... well, pig. If you look carefully, you might just see the E. coli bacteria sitting there, starting a family.

Example 7.18 (The discrete-time proportional growth model) Under optimal conditions, E. coli bacteria are practically immortal and grow by splitting into two new bacteria roughly every 20 minutes. So, if we let $a_n$ denote the number of E. coli bacteria on our sandwich after $20 \cdot n$ minutes, then we ought to have
$$a_{n+1} = a_n + (\text{number of new bacteria}) = a_n + a_n = 2a_n.$$
The assumption that we have one single bacteria at time zero means that we put $a_0 = 1$. By the above formula for $a_{n+1}$, we immediately obtain
$$a_1 = 2a_0 = 2, \qquad a_2 = 2a_1 = 4,$$
and so forth. That is, the system of equations
$$\begin{cases} a_{n+1} = 2a_n \\ a_0 = 1 \end{cases}$$
allows us to compute the number of bacteria at any given time. We call this system of equations the proportional growth model for the E. coli population on our sandwich. In particular, we call $a_{n+1} = 2a_n$ the evolution equation of the model, and $a_0 = 1$ its initial condition (e.g., if we initially had more bacteria, we would just adjust $a_0$).
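
The evolution equation and the initial condition translate almost word for word into a loop. The following is a minimal sketch; the function name `proportional_growth` is our own choice.

```python
def proportional_growth(a0, steps):
    """Return [a_0, a_1, ..., a_steps] for the model a_{n+1} = 2 a_n."""
    terms = [a0]                     # initial condition
    for _ in range(steps):
        terms.append(2 * terms[-1])  # evolution equation
    return terms

population = proportional_growth(1, 6)  # six steps of 20 minutes = 2 hours
print(population)  # [1, 2, 4, 8, 16, 32, 64]
```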

Exercise 7.19 (a) According to Google, around 50 E. coli bacteria are more than enough to poison the average person. How long does it take for the E. coli population to reach this size?
(b) The earth weighs $6 \cdot 10^{24}$ kg and a bacteria about $10^{-15}$ kg. How long does it take for the colony of bacteria to weigh more than the earth? Is this reasonable?
A point of the above exercise is to illustrate that this model for bacterial growth is not particularly realistic in the long run. The following model improves on this by taking into account that bacteria produce fewer offspring when food and space become scarce.

Example 7.20 (The discrete-time logistic growth model) The evolution equation of the so-called logistic growth model is obtained by making the assumption that the number of new bacteria produced at time step n + 1 is given by the formula
$$(\text{number of new bacteria}) = a_n \left(1 - \frac{a_n}{L}\right). \qquad (7.1)$$
Here, L is a parameter reflecting the maximum number of bacteria supported by the sandwich (note that the expression (7.1) approaches 0 as $a_n$ approaches L). This means that the evolution equation from the proportional growth model should be replaced by
$$a_{n+1} = a_n + a_n \left(1 - \frac{a_n}{L}\right) = a_n \left(2 - \frac{a_n}{L}\right).$$

Fig. 13. Left: A plot of the 20 first terms from the logistic growth model with L = 100 and initial condition $a_0 = 1$. Right: Two actual growth curves of E. coli taken from a report published at http://2009.igem.org/Team:SJTU-BioX-Shanghai/Results.
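
The terms plotted in the left part of the figure can be reproduced with a few lines of Python. This is only a sketch: the function name `logistic_growth` is our own, and the values L = 100 and $a_0 = 1$ are the ones stated in the caption.

```python
def logistic_growth(a0, L, steps):
    """Return [a_0, ..., a_steps] for the model a_{n+1} = a_n * (2 - a_n / L)."""
    terms = [a0]
    for _ in range(steps):
        a = terms[-1]
        terms.append(a * (2 - a / L))  # logistic evolution equation
    return terms

terms = logistic_growth(1, 100, 19)  # the 20 first terms a_0, ..., a_19
for n, a in enumerate(terms):
    print(n, round(a, 2))
```

Notice how the population roughly doubles in the beginning (as in the proportional model), and then flattens out as it approaches L.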

Exercise 7.21 (a) Suppose an E. coli bacteria has a diameter of $10^{-6}$ meters. Use this to estimate how many E. coli will fit on your sandwich (i.e., the value of L).
(b) Compute how long it takes for the colony to reach 50 bacteria according to the logistic growth model.
(c) Compute how many bacteria there are on the sandwich after 2 days, according to the logistic growth model. Compare this to the answer in (b) of exercise 7.19.

Two continuous-time models for population growth


We now turn to differential equations. They are "continuous-time" analogues of the
discrete-time models we got by considering sequences in the previous section.

Fig. 14. That is, the time variable will now move smoothly along the real line,
instead of only taking certain values through a sequence of discrete jumps.
An advantage of working with differential equations is that we can apply the tools of
Calculus to solving them. A disadvantage, however, is that they are more difficult to
simulate (due to the lack of discrete time-jumps).
Example 7.22 (Continuous-time growth models) Suppose that the function y(t) describes the number of bacteria on the sandwich from Example 7.18 at time t. This means that the derivative $y'(t)$ describes the rate of growth of the colony. Assuming that we start out with one lonely bacteria at time t = 0, our initial condition is y(0) = 1.
If we assume that the bacterial growth is simply proportional to the number of bacteria present (similar to the discrete-time proportional growth model), then we arrive at the continuous-time proportional growth model
$$\begin{cases} y'(t) = Cy(t) \\ y(0) = 1, \end{cases}$$
for some experimentally determined constant C. Here, $y'(t) = Cy(t)$ is the evolution equation of the model, and y(0) = 1 is its initial condition. As in the discrete model, if we initially had more bacteria, we would adjust y(0).
Similar to the discrete-time case, a more realistic model is given by the continuous-time logistic growth model, where we replace the evolution equation by
$$y'(t) = Cy(t) \left(1 - \frac{y(t)}{L}\right).$$
While there is no nice formula for the solution of the discrete-time logistic growth model, we will see later on that using the tools of Calculus, a formula for the solution is possible to find in the continuous-time case.

Exercise 7.23 (a) Verify that $y(t) = De^{Ct}$ solves the evolution equation of the continuous-time proportional growth model for all $D \in \mathbb{R}$. Hint: Recall exercise 7.6.
(b) Suppose that y(t) from (a) models the growth of the bacteria on our sandwich after t minutes. Use the equations y(0) = 1 and y(20) = 2 to find C and D.
(c) How long does it take for the colony to reach 50 bacteria according to this model?

Visual solutions of differential equations using slope fields


We now discuss how to visually "solve" differential equations. The first step is to figure
out how to draw slope fields. We explain this in the following example.

Example 7.24 (Slope fields) We consider the differential equation
$$y' = xy. \qquad (7.2)$$
While we do not know what the solution to this equation is, we do know that if its solution has the value y = 2 at x = 1, then the derivative $y'$ of the solution at x = 1 must satisfy
$$y'(1) = 1 \cdot 2 = 2.$$
In this way, we can play a game of "what-if", and compute the prescribed slope for any solution at a bunch of coordinates in the xy-plane. Drawing a short line with the computed slope in each such coordinate, we obtain the picture shown on the right, called a slope field.

Fig. 15. A slope field for $y' = xy$.
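
The "what-if" game is easy to hand over to a computer. The sketch below only tabulates the prescribed slopes for $y' = xy$ on a small grid; the grid values are an arbitrary choice of ours, and drawing the short lines is left out.

```python
def slope(x, y):
    return x * y  # right-hand side of the differential equation y' = xy

xs = [-2, -1, 0, 1, 2]
ys = [2, 1, 0, -1, -2]  # listed from top to bottom, like the rows of a picture

for y in ys:
    # One row of the slope field: the slope at (x, y) for each x in xs.
    print("  ".join(f"{slope(x, y):3d}" for x in xs))
```

Each printed number is the slope of the short line that should be drawn at the corresponding grid point. For instance, at (1, 2) we recover the slope 2 computed above.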

Exercise 7.25 Match the slope fields shown below to the differential equations:
(a) $y' = 1/x$   (b) $y' = y$

(i) (ii)

Exercise 7.26 Suppose we use the continuous-time logistic growth model to model the number of bacteria on our sandwich, with C as in exercise 7.23 and L as in exercise 7.21. Draw a slope field for the evolution equation for this model (use reasonable values for x and y).
Hint: Try not to be too detailed here. A quite rough slope field will be sufficient for understanding how the solutions behave.

The second step in obtaining a visual solution is to place your pen somewhere in the
slope field, and then trace the path suggested by the "flow" of the slope field. We make
this more precise in the following example:

Example 7.27 Again, we consider the differential equation $y' = xy$ from the previous example. Let us assume that the solution we are looking for satisfies $y(-1) = 1$. That is, we are looking for a solution of the system
$$\begin{cases} y' = xy \\ y(-1) = 1 \end{cases}$$
As in the discrete-time case, we call $y' = xy$ the evolution equation of this system, and $y(-1) = 1$ its initial condition.
Now, if we place our pen at the point $(x,y) = (-1,1)$, and then trace out the path suggested by the "flow" of the slope field, we arrive at the following:

Fig. 16. Here we have used the slope field to visualise the solution of the differential equation with initial condition $y(-1) = 1$.

In the above example, notice how the solution seems to have a slope at each point
that matches the slope prescribed by the differential equation! That is, we have visually
solved the differential equation for the given initial condition. Intuitively, we can think
of the slope field as describing the current of some river, and the solution being the path
of a leaf dropped at the coordinate of the initial condition.

Exercise 7.28 (Exercise 7.25, continued) Use the above slope fields to visually solve the following differential equations with given initial values:
(a) $\begin{cases} y' = 1/x \\ y(1) = 0 \end{cases}$   (b) $\begin{cases} y' = y \\ y(0) = 1 \end{cases}$

Exercise 7.29 (Exercise 7.26, continued) Draw a few solutions for the continuous-
time logistic growth model for our E. coli infested sandwich. How do the solutions
behave with respect to different initial conditions at x = 0?

How to simulate a first order differential equation


The problem with simulating differential equations and continuous-time models is that, given a point $t_0 \in \mathbb{R}$, it is not clear what we mean by the "next" point (as opposed to working with discrete-time models). So, what to do? Well, it turns out to be a good idea to break down the process of visually solving differential equations into a step-by-step process.

Fig. 17. Here, we have drawn, by hand, a "solution" of the differential equation $y' = x^2 - y^2$, with initial condition $y(-1) = 1$. The idea is to follow the blue arrows!

To analyse, step-by-step, how we came up with the red curve in the above figure, we assume that the differential equation can be expressed in the form
$$y' = f(x,y), \qquad (7.3)$$
where f(x,y) is some expression in terms of x and y (for instance, the differential equation in Example 7.24 is of this form with f(x,y) = xy).
Step 1: Place the point of your pen in an initial coordinate $(x_0, y_0)$ and choose a step-size $\Delta x$.
Step 2: Check the slope of the blue line of the slope field at $(x_0, y_0)$. According to formula (7.3), the value of $y'$ is given by $f(x_0, y_0)$.
Step 3: Let your pen follow the blue line as you move a distance of $\Delta x$ in the horizontal direction. Since the blue line has a slope of $f(x_0,y_0)$, this means you also move a distance of $\Delta y = f(x_0,y_0)\,\Delta x$ in the vertical direction. In this way, you end up moving diagonally from $(x_0,y_0)$ to $(x_1, y_1) = (x_0 + \Delta x, y_0 + \Delta y)$.

Fig. 18. When drawing the blue lines one after another, we start to get something looking like a graph.

Step 4: We now repeat steps 1, 2 and 3, with the point $(x_0, y_0)$ replaced by $(x_1, y_1)$, to obtain a new line segment ending at the point $(x_2, y_2)$, and so on, until you have reached as far as you want (note that it is usual to keep the same $\Delta x$ at each consecutive step).
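
The four steps above can be sketched as a small function that performs one move of the pen. The name `euler_step`, the starting point (1, 2) and the choice of the equation $y' = xy$ from Example 7.24 are our own illustrative assumptions.

```python
def euler_step(f, x, y, dx):
    """Follow the slope f(x, y) over a horizontal distance dx (steps 2 and 3)."""
    dy = f(x, y) * dx       # vertical distance travelled along the blue line
    return x + dx, y + dy   # the new point (x1, y1)

def f(x, y):
    return x * y            # the differential equation y' = xy

point = (1.0, 2.0)          # step 1: a hypothetical initial coordinate (x0, y0)
for _ in range(3):          # step 4: repeat, starting from the new point
    point = euler_step(f, point[0], point[1], 0.1)
    print(point)
```

After the first pass through the loop, the pen has moved from (1, 2) to approximately (1.1, 2.2), since the slope at (1, 2) is 2.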

We now illustrate the above steps in a concrete example.

Example 7.30 Let us, by hand, simulate the differential equation $y' = x^2 - y^2$ with initial condition $(x_0, y_0) = (-1, 1)$ and time-step $\Delta x = 0.1$. This means that $f(x,y) = x^2 - y^2$. According to step 2, above, we compute $y' = f(-1,1) = 0$. This gives $\Delta y = f(-1,1)\,\Delta x = 0$. This means that our new point is
$$(x_1,y_1) = (-1 + \Delta x, 1 + \Delta y) = (-1 + 0.1, 1 + 0) = (-0.9, 1).$$
To move a second "time-step" of $\Delta x = 0.1$, we repeat the process with the point $(-1,1)$ replaced by $(-0.9,1)$. Note that this time $y' = f(-0.9,1) = (-0.9)^2 - 1^2 = -0.19$.

Next, we explain how to do the above steps in Python.

Example 7.31 (Simulating a differential equation in Python) The following code simulates the differential equation $y' = x^2 - y^2$ under the initial condition $y(-1) = 1$.

import matplotlib.pyplot as plt

def f(x, y):
    # Right-hand side of the differential equation y' = f(x, y)
    return x**2 - y**2

deltaX = 0.1; N = 20  # Stepsize and number of steps
X = [-1 + n*deltaX for n in range(0, N+1)]; Y = [1]  # Init conditions

for n in range(0, N):  # Loop simulating Y
    slope = f(X[n], Y[n])
    Y.append(Y[n] + slope*deltaX)

plt.plot(X, Y, "darkred", linewidth=2)
plt.show()

Fig. 19. A selection of simulated curves. The orange is the one produced in the
above example. The right-most image is the one produced by this code.

Exercise 7.32 Consider the differential equation $y' = xy$ with $(x_0,y_0) = (-2,1)$.
(a) Where do you end up if you take a "time-step" of length $\Delta x = 0.1$? Compare this with the figure in Example 7.27.
(b) Where do you end up if you take a second time step? There are now two sources
of error. Which ones?
(c) Use the code from Example 7.31 to simulate the differential equation with the
above initial condition. Does it match the figure in Example 7.27?
Exercise 7.33 (Exercise 7.28, continued.) Use the code from Example 7.31 to sim-
ulate the initial value problems from Exercise 7.28. How do the simulations compare
to your visual solutions?
Exercise 7.34 (Exercise 7.29, continued.) Use the code from Example 7.31 to simu-
late the continuous-time logistic growth model for our sandwich. Choose initial con-
ditions at y(0) that will result in different types of behaviour.

Remark 7.35 (The danger of butterflies) The code in Example 7.31 has an extreme
– and unnecessary – weakness which has led to people being killed, and stock markets to
crash. The problem is that there is a round-off error which experiences a snowball effect
as the code runs.
To understand what is going on, recall that a computer cannot represent every real number. Instead, it has a finite number of floating point numbers that it can represent. Unfortunately, 1/10 is not one of them. Instead, the variable deltaX will contain something that is close to 1/10, but which contains an error in the 16th digit. That is, $\Delta x = 1/10 + \epsilon$ where $\epsilon \approx 10^{-17}$ (an important detail: why do we claim that the error is of the size $10^{-17}$ and not $10^{-16}$?). After n repetitions of the for-loop, this means that the error in the x variable is equal to $n \cdot \epsilon \approx n \cdot 10^{-17}$.
To the right, we see the Patriot defensive missile system used by the Americans, for instance, in their wars against Iraq. It had a targeting computer that made a "tick" every 1/10 seconds. Some poor programmer needed to make a vector containing a schedule for all the "ticks" from the targeting computer, and did this using a for-loop essentially executing the command t = t + 0.1 (just as we do in Example 7.31). The result was that the missile system became useless after a certain amount of time.

Fig. 20.
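
The round-off error described in this remark can be observed directly. The sketch below assumes that Python's floats are standard double-precision numbers (which is the case for the usual Python installations); it uses the standard library module `fractions` to compare the stored value of 0.1 with the exact number 1/10.

```python
from fractions import Fraction

# Fraction(0.1) is the exact value stored in the float 0.1.
epsilon = Fraction(0.1) - Fraction(1, 10)
print(float(epsilon))  # the error in a single "tick"

# Adding 0.1 ten times does not give exactly 1.
t = 0.0
for _ in range(10):
    t = t + 0.1
print(t == 1.0, t)
```

The first printed number is the size of the error in one step, and the second line shows that the accumulated error is already visible after only ten additions.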

Exercise 7.36 (a) Compute the error accumulated in the time variable 100 hours
after a Patriot missile system was turned on.
(b) Suggest a way to change the code to avoid the problem of round-off errors alto-
gether in the list X.

7.3 Answers to selected exercises


7.3 (a) 2.01, (b) $\frac{(1+h)^2 - 1}{h} = 2 + h \to 2$ as $h \to 0$.

7.6 (a) $f'(x) = 0$, (b) $f'(x) = 1$, (c) $f'(x) = 2x$, (d) $f'(x) = -1/x^2$, (e) $f'(x) = k$, (f) $f'(x) = Ce^{Cx}$.

7.12 (a): The area represented by (i) is a rectangle plus a triangle with combined area
4. The area represented by (ii) has equal size above and below the x-axis, and so,
by symmetry, the integral is equal to 0.

7.14 The combined area of the difference between $U_N$ and $L_N$ is equal to a column with area $(f(b) - f(a)) \cdot (b - a)/N$. For this to be less than $\epsilon > 0$, we must have $N > (f(b) - f(a))(b - a)/\epsilon$.

7.16 (a) 4, (b) 2/3, (c) $(1/5)(e^{15} - e^5)$.

7.17 (a) $1/x^2$, (b) $f(x)$.

7.19 (a) After 2 hours there are 64 bacteria. This is just enough to get food poisoning,
(b) just less than 44 hours.

7.23 (b) D = 1, C = (ln 2)/20, which yields $y(t) = 2^{t/20}$, (c) A bit less than 113 minutes.

7.32 You should get the following answers: (a) After one step, the simulated solution gives $y(-1.9) \approx 0.8$, while the exact solution gives $y(-1.9) = e^{-0.195} = 0.8228\ldots$
(b) After two time steps, the simulated solution gives $y(-1.8) \approx 0.648$. The main sources of error are now that we are relying on an approximate value of $y(-1.9)$ (in step one, we had an exact value of $y(-2)$), as well as making a mistake by following this slope for the full length of the time-step $\Delta x = 0.1$ (in the first step, this was the only source of error).
Additionally, the computer makes a round-off error, but this has an extremely
minor effect at this point.
Chapter 8

The derivative

In this chapter we figure out the computational rules for the derivative and prove the
differentiation formulas for the functions most commonly appearing in these lecture
notes.

Remark 8.1 (Selected problems from previous exams based on this chapter)

1. The following elementary functions are closely related to the trigonometric functions, and are called hyperbolic functions:
$$\sinh x = \frac{e^x - e^{-x}}{2} \quad\text{and}\quad \cosh x = \frac{e^x + e^{-x}}{2}.$$
(a) Prove the differentiation formulas
$$(\sinh x)' = \cosh x, \quad\text{and}\quad (\cosh x)' = \sinh x.$$
(b) Prove the "hyperbolic identity": $\cosh^2 x - \sinh^2 x = 1$.
(c) Show that sinh x is an invertible function on $\mathbb{R}$.
(d) Define the inverse hyperbolic function arcsinh x by letting $y = \operatorname{arcsinh} x \iff \sinh y = x$. Determine
$$\frac{d}{dx} \operatorname{arcsinh} x.$$


8.1 The computational rules for the derivative


Definition for the derivative
We begin by briefly recalling the definition of the derivative from Chapter 7.

Definition 8.2 (The derivative) We define the derivative of f at the point x to be
$$f'(x) \overset{\text{def}}{=} \lim_{h \to 0} \frac{f(x + h) - f(x)}{h}$$
at all points where this limit exists. Moreover, if the limit exists, we say that f is differentiable at x. If f is differentiable at all points in its domain, we simply say that f is differentiable. Note that we sometimes write $\frac{d}{dx} f(x)$ or $\frac{df}{dx}(x)$ instead of $f'(x)$.
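
To get a feeling for the limit in the definition, one can let a computer evaluate the quotient for smaller and smaller values of h. The following is only a numerical sketch (not a proof); the function name `difference_quotient` and the choice $f(x) = x^2$ at the point x = 1 are our own.

```python
def difference_quotient(f, x, h):
    """The expression (f(x + h) - f(x)) / h from the definition of the derivative."""
    return (f(x + h) - f(x)) / h

def square(x):
    return x ** 2

for h in [0.1, 0.01, 0.001, 0.0001]:
    print(h, difference_quotient(square, 1.0, h))
```

The printed values should approach 2, which agrees with the formula $(x^2)' = 2x$ at x = 1.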

Exercise 8.3 (a) In Chapter 7 we computed the derivative of $f(x) = x^2$ using the definition. Check that you can also compute this derivative by using the limit
$$\lim_{u \to x} \frac{f(x) - f(u)}{x - u}.$$

(b) Is it always true that the limit in (a) is equal to the derivative of f (x) for all
differentiable functions f ? If yes, prove this, or, if not, then find a counter-
example.

Remark 8.4 (Leibniz notation) The letter h in the definition of the derivative denotes how far we move away from the point x on the x-axis. Similarly, the quantity $f(x + h) - f(x)$ denotes how far this pushes the function away from the value f(x) along the y-axis. Denoting these changes to the values x and f(x) by $\Delta x$ and $\Delta f$, respectively, Leibniz came up with the notation
$$\frac{df}{dx}$$
for the derivative. (Here, we should point out that the Greek letter $\Delta$ corresponds to the Latin letter "d" and stands for "difference".) Explicitly, we have
$$\frac{df}{dx} = \lim_{\Delta x \to 0} \frac{\Delta f}{\Delta x} = \lim_{\Delta x \to 0} \frac{f(x + \Delta x) - f(x)}{\Delta x}.$$
In order to remember this notation, the following is worth keeping in mind:

"In the limit, Greek letters turn into Latin letters."

(In fact, this also holds true in the case of the notation for the definite integral.)

Derivatives of piece-wise defined functions


As an exercise in applying the definition of the derivative, we can try to compute the
derivative of a piecewise-defined function. Here is an example.

Example 8.5 Consider the function
$$f(x) = \begin{cases} x^2 + x & x \ge 0 \\ Cx & x < 0 \end{cases}$$
For what values of C is this function differentiable at x = 0? To determine this, we need to check if the following limit exists:
$$\lim_{h \to 0} \frac{f(0 + h) - f(0)}{h} = \lim_{h \to 0} \frac{f(h)}{h}.$$
Here, we used that f(0) = 0, according to the definition of f. Now, to plug in a formula for f(h), we need to know if h is positive or negative (since f has different formulas depending on the sign of h). This forces us to consider the one-sided limits separately:
$$\lim_{h \to 0^+} \frac{f(h)}{h} = \lim_{h \to 0^+} \frac{h^2 + h}{h} = \lim_{h \to 0^+} (h + 1) = 1,$$
$$\lim_{h \to 0^-} \frac{f(h)}{h} = \lim_{h \to 0^-} \frac{Ch}{h} = \lim_{h \to 0^-} C = C.$$
Since these two limits are equal if and only if the two-sided limit exists (Proposition 6.14), it follows that f is differentiable at x = 0 exactly if C = 1.
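
The two one-sided limits in this example can also be explored numerically. In the sketch below, the helper names `make_f` and `one_sided_quotients`, as well as the step h = 1e-6, are our own choices; the output only suggests the values of the limits and does not replace the computation above.

```python
def make_f(C):
    """Return the piecewise function from the example for a given constant C."""
    def f(x):
        return x ** 2 + x if x >= 0 else C * x
    return f

def one_sided_quotients(f, h):
    """Difference quotients at 0 from the left and from the right (h > 0)."""
    right = (f(0 + h) - f(0)) / h
    left = (f(0 - h) - f(0)) / (-h)
    return left, right

for C in [1, 2]:
    print(C, one_sided_quotients(make_f(C), 1e-6))
```

For C = 1 the two quotients are (almost) equal, while for C = 2 they clearly differ, in agreement with the conclusion of the example.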

Exercise 8.6 Consider the function in the above example. Suppose that x > 0 is
some fixed number. Do we have to take both formulas of f into consideration when
computing f 0 (x)? Explain why.

Exercise 8.7 Determine values for C and D so that the following function is both continuous and differentiable at x = 0:
$$f(x) = \begin{cases} \sqrt{x + 1} + D & x \ge 0 \\ C(x + 1) & x < 0 \end{cases}$$

Some common differentiation formulas


In Chapter 7, we proved the differentiation formulas for functions such as $y = 1/x$, $y = x^2$ and $y = \sqrt{x}$. Later on in this chapter, we prove how to differentiate the other
elementary functions we encounter in these lecture notes. However, since most of these
formulas should be familiar to you from high school, we ask you already now to fill in
as much as you can in the below table, and to feel free to use these formulas throughout
most of this chapter (recall that F is called a primitive function for f if F 0 = f ).

Proposition 8.8 (Derivatives and your favourite primitives of some elementary functions)

Primitive function F(x)  |  f(x)          |  d/dx f(x)
                         |  $x^2$         |  $2x$
                         |  $x$           |  $1$
                         |  $1/x$         |
                         |  $\sqrt{x}$    |
                         |  $x^\alpha$    |
                         |  $e^x$         |
                         |  $\ln x$       |
                         |  $\sin x$      |
                         |  $\cos x$      |
                         |  $\tan x$      |
                         |  $\arcsin x$   |
                         |  $\arccos x$   |
                         |  $\arctan x$   |

The rulebook for the derivative


A second goal of this chapter is to prove the following computational rules for the deriva-
tive, and learn how to use them to compute expressions made up of combinations of
functions from the table in Proposition 8.8, above.

Proposition 8.9 (Rulebook for the derivative)
Suppose that both f and g have a derivative at the point x. Then,

(i) $\big(f(x) \pm g(x)\big)' = f'(x) \pm g'(x)$ (sum rule)

(ii) $\big(f(x) \cdot g(x)\big)' = f'(x)g(x) + f(x)g'(x)$ (product rule)

Moreover, if $g(x) \neq 0$, it also holds that

(iii) $\left(\dfrac{1}{g(x)}\right)' = -\dfrac{g'(x)}{g(x)^2}$ (reciprocal rule)

(iv) $\left(\dfrac{f(x)}{g(x)}\right)' = \dfrac{f'(x)g(x) - f(x)g'(x)}{g(x)^2}$ (quotient rule)

Finally, if f(u) is differentiable at the point u = g(x), we have

(v) $\dfrac{d}{dx} f\big(g(x)\big) = f'\big(g(x)\big) \cdot g'(x)$ (chain rule)

We prove these rules in Section 8.2, below. As we shall see, these rules are all con-
sequences of the computational rules for the limit. For this reason, it may be surprising
that the sum rule looks very similar to the one for the limit, while others do not.
While the above computational rules should be more or less familiar from high school,
the following rule is probably not.

Proposition 8.10 Suppose that f is an invertible function that is differentiable at a point y and satisfies $f'(y) \neq 0$. Then $f^{-1}$ is differentiable at $x = f(y)$, and
$$\frac{d}{dx} f^{-1}(x) = \frac{1}{f'\big(f^{-1}(x)\big)}.$$

We immediately note that while this result may be hard to read and apply, it is
actually not that hard to prove. Indeed, it follows almost immediately from the chain
rule. We shall return to this when we discuss implicit differentiation later in the chapter.
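
As a quick illustration of how the formula in Proposition 8.10 is read, take $f(y) = e^y$, so that $f^{-1}(x) = \ln x$. Assuming the familiar high school formula $f'(y) = e^y$ (which we are allowed to use from the table in Proposition 8.8), the proposition gives:

```latex
\frac{d}{dx} \ln x
  = \frac{1}{f'\big(f^{-1}(x)\big)}
  = \frac{1}{e^{\ln x}}
  = \frac{1}{x}, \qquad x > 0.
```

Here the condition $f'(y) \neq 0$ holds automatically, since $e^y$ is never zero.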

How to use the computational rules for the derivative


In the first example, we illustrate the use of rules (i) to (iv). In a sense, these are
the "easy" rules (keep in mind that you are allowed to use everything from the table of
derivatives listed in Proposition 8.8, above).

Example 8.11 Here are some examples to illustrate rules (i) to (iv).

(i) $\dfrac{d}{dx}\big(x^3 + \sin x\big) = 3x^2 + \cos x$

(ii) $\dfrac{d}{dx}\big(x^3 \sin x\big) = 3x^2 \sin x + x^3 \cos x = x^2\big(3 \sin x + x \cos x\big)$

(iii) $\dfrac{d}{dx}\left(\dfrac{1}{x^3 \sin x}\right) = -\dfrac{1}{\big(x^3 \sin x\big)^2} \cdot \dfrac{d}{dx}\big(x^3 \sin x\big) = -\dfrac{x^2\big(3 \sin x + x \cos x\big)}{x^6 \sin^2 x} = -\dfrac{3 \sin x + x \cos x}{x^4 \sin^2 x}$

Notice how we in (iii) do not try to solve everything in one line. Instead, the first step was essentially to recall the reciprocal rule. Indeed, to have a bit of patience when computing derivatives often helps us avoid mistakes.

(iv) $\dfrac{d}{dx}\left(\dfrac{x^3}{\sin x}\right) = \dfrac{3x^2 \sin x - x^3 \cos x}{\sin^2 x} = x^2 \cdot \dfrac{3 \sin x - x \cos x}{\sin^2 x}$
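
A cheap way to catch mistakes in computations like these is to compare the formula with a numerical difference quotient. The sketch below does this for (ii); the helper `central_difference`, the step size and the test point 1.3 are our own arbitrary choices, and a numerical agreement is of course only a sanity check, not a proof.

```python
import math

def central_difference(f, x, h=1e-6):
    """Symmetric difference quotient, a numerical stand-in for f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

def product(x):
    return x ** 3 * math.sin(x)

def product_rule_answer(x):
    # The result of (ii): 3x^2 sin x + x^3 cos x
    return 3 * x ** 2 * math.sin(x) + x ** 3 * math.cos(x)

x0 = 1.3  # arbitrary test point
print(central_difference(product, x0), product_rule_answer(x0))
```

If the two printed numbers differ already in the first few decimals, it is a good idea to recheck the computation by hand.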

We include one more example to illustrate the importance of patience when using the product rule:

Example 8.12 Applying the product rule twice, we can differentiate the product of three functions:
$$\frac{d}{dx}\big(\ln x \cdot \sin x \cdot \arctan x\big) = \frac{d}{dx}\Big(\underbrace{\ln x}_{f} \cdot \underbrace{\sin x \cdot \arctan x}_{g}\Big)$$
$$= \Big(\frac{d}{dx} \ln x\Big) \cdot \sin x \cdot \arctan x + \ln x \cdot \frac{d}{dx}\big(\sin x \cdot \arctan x\big)$$
$$= \frac{\sin x \cdot \arctan x}{x} + \ln x \cdot \Big(\cos x \cdot \arctan x + \frac{\sin x}{1 + x^2}\Big)$$
$$= \frac{\sin x \cdot \arctan x}{x} + \ln x \cdot \cos x \cdot \arctan x + \frac{\ln x \cdot \sin x}{1 + x^2}.$$

Exercise 8.13 Compute the derivatives of

(a) $\dfrac{x^2 + 1}{\arctan(x)}$   (b) $\dfrac{\sin x}{1 + \cos x}$   (c) $\dfrac{\sqrt{x}}{\ln x}$

Remark: You can avoid the use of the chain rule in this exercise.
Exercise 8.14 Show that for constants a, b, c, d, where c and d are not both zero, we have
$$\frac{d}{dx}\left(\frac{ax + b}{cx + d}\right) = \frac{ad - bc}{(cx + d)^2}.$$

Exercise 8.15 (a) Use the product rule to prove by induction that
$$\frac{d}{dx} x^n = nx^{n-1}, \quad \forall n \in \{1, 2, 3, \ldots\}.$$
(b) Combine the formula from (a) with the reciprocal rule to prove that
$$\frac{d}{dx} x^n = nx^{n-1}, \quad \forall n \in \{-1, -2, -3, \ldots\}.$$
We now move on to rule (v), namely, the chain rule.

Example 8.16 Using the chain rule, we get
$$\frac{d}{dx} \sin(x^3) = \cos(x^3) \cdot \frac{d}{dx} x^3 = 3x^2 \cos(x^3).$$
Here, we use the chain rule as formulated in Proposition 8.9 with $f(x) = \sin x$ and $g(x) = x^3$. In particular, since $f'(x) = \cos x$, this means that $f'\big(g(x)\big) = \cos(x^3)$.

Exercise 8.17 Compute the derivatives of the following functions.

(a) $e^{\sin x}$   (b) $\ln(1 + x^2)$   (c) $\arctan(1/x)$   (d) $xe^{(x^2)}$

Remark: At first, the chain rule can be confusing. If you are struggling with this
exercise, continue reading, and then try again after taking a look at Example 8.20.

The chain rule is usually the computational rule for the derivative that requires the most effort to master. The main reason is probably that the notation is sort of bad. Indeed, notice that in our formulation of the chain rule,
$$\frac{d}{dx} f\big(g(x)\big) \neq f'\big(g(x)\big). \qquad (8.1)$$
So what is going on? Well, in the expression to the left, we are trying to say that one should first compose f and g, to get $f(g(x)) = \sin(x^3)$, and then take the derivative of this composition. In the expression to the right, on the other hand, we mean to say that you should first take the derivative of $f(x) = \sin(x)$, and afterwards compose the result with g(x).
This difference is really not at all clear from how we write these expressions. So, to make the chain rule easier to understand, it is common to introduce different letters for the variables and write f(u) for the outer function and g(x) for the inner function. With the Leibniz notation for the derivative (recall Remark 8.4), we can now write the right-most expression in (8.1) as follows:
$$\frac{d}{du} f(u) \quad\text{or}\quad \frac{d}{du} f(u)\Big|_{u = g(x)}.$$
These two expressions mean the same thing. However, in the right-most variant, we make the extra effort of reminding the reader that only after taking the derivative, do we put u = g(x). The chain rule can now be expressed as
$$\frac{d}{dx} f\big(g(x)\big) = \frac{d}{dx} f(u) = \underbrace{\frac{df}{du}}_{\text{outer der.}} \cdot \underbrace{\frac{du}{dx}}_{\text{inner der.}}.$$

Example 8.16 (continued) In the case of $\sin(x^3)$ then $f(u) = \sin u$ is the outer function and $g(x) = x^3$ is the inner function. This means that $f'(u) = \cos u$ is the outer derivative and $g'(x) = 3x^2$ is the inner derivative. By the chain rule, we get
$$\frac{d}{dx} \sin(x^3) \overset{u = x^3}{=} \frac{d}{dx} \sin(u) = \cos u \cdot \frac{du}{dx} = \cos(x^3) \cdot \frac{d}{dx} x^3 = \cos(x^3) \cdot 3x^2.$$

Exercise 8.18 Compute the derivatives in exercise 8.17 using this notation.

Exercise 8.19 Check the definition of the indefinite integral in Appendix A, and use it to pair the following integrals with the suitable expression. Note that to solve this exercise, you only need to be able to compute derivatives. (Why? Also, note that you do not even need to know what the derivative of arctan x is.)

(a) $\int \dfrac{dx}{4 + x^2}$       (i) $\dfrac{1}{2} \ln(x^2 + 4) + C$
(b) $\int \dfrac{dx}{4 + x}$         (ii) $\dfrac{1}{4}\big(\ln(2 + x) - \ln(2 - x)\big) + C$
(c) $\int \dfrac{dx}{(4 + x)^2}$     (iii) $\ln(4 + x) + C$
(d) $\int \dfrac{x\,dx}{4 + x^2}$    (iv) $-\dfrac{1}{4 + x} + C$
(e) $\int \dfrac{dx}{4 - x^2}$       (v) $\dfrac{1}{2} \arctan\dfrac{x}{2} + C$

Let us consider one more example where we illustrate the use of the chain rule.

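Example 8.20 We wish to compute
$$\frac{d}{dx} \sin\sqrt{1 + x^3}.$$
As with the product of three factors, we need patience, and we need to apply the chain rule more than once. We typically work as follows:
$$\frac{d}{dx} \sin\sqrt{1 + x^3} \overset{u = \sqrt{1 + x^3}}{=} \frac{d}{dx} \sin u$$
$$\overset{\text{chain rule}}{=} \cos u \cdot \frac{du}{dx}$$
$$= \cos\sqrt{1 + x^3} \cdot \frac{d}{dx}\sqrt{1 + x^3}$$
$$\overset{v = 1 + x^3}{=} \cos\sqrt{1 + x^3} \cdot \frac{d}{dx}\sqrt{v}$$
$$\overset{\text{chain rule}}{=} \cos\sqrt{1 + x^3} \cdot \frac{1}{2\sqrt{v}} \cdot \frac{dv}{dx}$$
$$= \cos\sqrt{1 + x^3} \cdot \frac{1}{2\sqrt{1 + x^3}} \cdot 3x^2 = \frac{3x^2 \cos\sqrt{1 + x^3}}{2\sqrt{1 + x^3}}.$$
In the last line, we used that $dv/dx = (1 + x^3)' = 3x^2$.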
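
Since a double application of the chain rule is an easy place to slip, it can be reassuring to test the final formula numerically. The following sketch compares it with a symmetric difference quotient; the helper name `central_difference`, the step size and the test point x = 2 are our own choices, and the comparison is a sanity check rather than a proof.

```python
import math

def central_difference(f, x, h=1e-6):
    """Symmetric difference quotient, a numerical stand-in for f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

def composite(x):
    return math.sin(math.sqrt(1 + x ** 3))

def chain_rule_answer(x):
    # The result of the example: 3x^2 cos(sqrt(1 + x^3)) / (2 sqrt(1 + x^3))
    root = math.sqrt(1 + x ** 3)
    return 3 * x ** 2 * math.cos(root) / (2 * root)

x0 = 2.0  # arbitrary test point
print(central_difference(composite, x0), chain_rule_answer(x0))
```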

Exercise 8.21 Compute the derivative of $y = \ln\big(\ln(\ln x)\big)$.



Exercise 8.22 Compute the derivatives of the following functions. Note that they all have something in common. In particular, after having done this exercise, think about what this means for their graphs, and plot them to see if you are correct.
(a) $f(x) = \arctan\dfrac{1}{x} + \arctan x$
(b) $f(x) = \arcsin\dfrac{x}{\sqrt{1 + x^2}} - \arctan x$
(c) $f(x) = 2\arctan\big(x - \sqrt{x^2 - 1}\big) + \arctan\sqrt{x^2 - 1}$

Hint: These functions – and how to compute their derivatives – have all appeared on
recent exams. You can find these exams, with full solutions, on the course website.

8.2 Proof of the computational rules for the derivative


In this section, we take the point of view that proofs for the computational rules for the
derivative are essentially just examples of applications of the computational rules for the
limit.

Example 8.23 (Proof of the sum rule) We use the computational rules for the limit to show that
$$\big(f(x) + g(x)\big)' = f'(x) + g'(x).$$
Starting from the definition of the derivative, we do the following computation:
$$\frac{d}{dx}\big(f(x) + g(x)\big) = \lim_{h \to 0} \frac{\big(f(x + h) + g(x + h)\big) - \big(f(x) + g(x)\big)}{h}$$
$$= \lim_{h \to 0} \left(\frac{f(x + h) - f(x)}{h} + \frac{g(x + h) - g(x)}{h}\right)$$
$$= \underbrace{\lim_{h \to 0} \frac{f(x + h) - f(x)}{h}}_{= f'(x)} + \underbrace{\lim_{h \to 0} \frac{g(x + h) - g(x)}{h}}_{= g'(x)} = f'(x) + g'(x).$$
Notice that we could use the summation rule for the limit since we knew that both the limits $f'(x)$ and $g'(x)$ exist.

Next, we turn to proving the product rule. While it has the same flavour as the proof for the summation rule, it is slightly more complicated. In particular, we need to use the fact that if you are differentiable at a point, then you are also continuous there.
In the following two exercises you are asked to verify that the diagram to the right is correct.

Fig. 1. Differentiable functions are always continuous.

Exercise 8.24 Show that if $f$ has a derivative at $x$, then it is also continuous there. What part of the diagram in Figure 1 does this justify?
Hint: Recall the formula from exercise 8.3, and find a chain of equalities showing that
$$\lim_{u \to x} \big(f(u) - f(x)\big) = \cdots = 0.$$

Keep in mind that you know that $f'(x)$ exists...



Exercise 8.25 We consider the function

$$f(x) = \begin{cases} x^2 & \text{if } x \ge 1, \\ x & \text{if } x < 1. \end{cases}$$

Use the definition of the derivative to determine whether $f'(1)$ exists or not. What does this example say about the diagram in Figure 1 on the previous page?
Remark: Pictured to the right is the rather extreme Weierstrass function $f(x) = \sum_{n=1}^{\infty} 2^{-n}\cos(3^n \pi x)$. It is continuous at every point but is nowhere differentiable.
Fig. 2. The Weierstrass function.
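To get a feeling for what "nowhere differentiable" means in practice, here is a small numerical sketch in Python. It works with a partial sum of the series only (the cut-off `terms=30` and the choice of the point $x = 0$ are our own assumptions), and it looks at the difference quotients for the step sizes $h = 3^{-m}$. Instead of settling down towards a limit, the quotients grow in size as $h$ shrinks.

```python
import math

def weierstrass_partial(x, terms=30):
    """Partial sum of sum_{n>=1} 2**(-n) * cos(3**n * pi * x)."""
    return sum(2.0 ** (-n) * math.cos(3.0 ** n * math.pi * x)
               for n in range(1, terms + 1))

def difference_quotient(x, h, terms=30):
    """(f(x+h) - f(x)) / h for the partial sum f."""
    return (weierstrass_partial(x + h, terms) - weierstrass_partial(x, terms)) / h

# Difference quotients at x = 0 for the shrinking steps h = 3**(-m).
for m in range(1, 8):
    h = 3.0 ** (-m)
    print(m, difference_quotient(0.0, h))
```

This is of course not a proof, since a computer can only handle finitely many terms and finitely many values of $h$, but it does show the kind of behaviour that prevents the limit in the definition of the derivative from existing.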
Exercise 8.26 Prove the product rule for the derivative.
Mega long hint: What you need to do is to obtain a chain of equalities proving that

$$\frac{d}{dx}\big(f(x) \cdot g(x)\big) = \lim_{h \to 0} \frac{f(x+h)g(x+h) - f(x)g(x)}{h} = \cdots = f'(x)g(x) + f(x)g'(x).$$

As in the last steps of the proof of the sum rule, you need to make both expressions $(f(x+h) - f(x))/h$ and $(g(x+h) - g(x))/h$ appear. To do this, you can use basically the same trick as we used for the product rule for limits. That is, you can add 0 in a clever way. Or, you could use the illustration to the right as inspiration. Here, for the sake of simplicity, we suppose that $f$ and $g$ are both increasing functions. Then $f(x)g(x)$ can be thought of as the blue area, while $f(x+h)g(x+h)$ is the total area shown. The difference $f(x+h)g(x+h) - f(x)g(x)$ can then be written as the sum of the red and green areas. Express this difference algebraically, and you can use it to rewrite the numerator in the expression for the derivative of $f(x)g(x)$.
Fig. 3.

Exercise 8.27 (a) Use the definition of the derivative to figure out a formula for
$$\frac{d}{dx} \frac{1}{g(x)}.$$
What assumption do we need to make on $g(x)$?

(b) Use the formula found in (a) and the product rule for derivatives to derive a formula for
$$\frac{d}{dx} \frac{f(x)}{g(x)}.$$
Finally, we turn to the chain rule. It is, by far, the most difficult of the computational rules to prove. To prepare us for the proof, we give a simple, but unfortunately false, argument that helps us understand what is going on (curiously, this "proof" may be found in numerous high school textbooks).
Fig. 4. Fake proof ahead.
Example 8.28 (Fake "proof" of chain rule) We wish to compute the limit

$$\frac{d}{dx} f\big(g(x)\big) = \lim_{h \to 0} \frac{f(g(x+h)) - f(g(x))}{h}.$$

The trick is now to multiply by 1 in a creative way. Namely,

$$\begin{aligned}
\lim_{h \to 0} \frac{f(g(x+h)) - f(g(x))}{h} &= \lim_{h \to 0} \frac{f(g(x+h)) - f(g(x))}{h} \cdot \frac{g(x+h) - g(x)}{g(x+h) - g(x)} \\
&= \lim_{h \to 0} \frac{f(g(x+h)) - f(g(x))}{g(x+h) - g(x)} \cdot \frac{g(x+h) - g(x)}{h} \\
&= \underbrace{\lim_{h \to 0} \frac{f(g(x+h)) - f(g(x))}{g(x+h) - g(x)}}_{(*)} \cdot \underbrace{\lim_{h \to 0} \frac{g(x+h) - g(x)}{h}}_{=g'(x)}.
\end{aligned}$$

To end the proof, we need to compute the limit labeled by $(*)$. To do this, we make a change of variables. That is, we set $k = g(x+h) - g(x)$. Note that as $h \to 0$, then $k \to 0$ (this is true since differentiable functions are automatically continuous). This allows us to write
$$\lim_{h \to 0} \frac{f\big(g(x+h)\big) - f\big(g(x)\big)}{g(x+h) - g(x)} = \lim_{k \to 0} \frac{f\big(g(x) + k\big) - f\big(g(x)\big)}{k} = f'\big(g(x)\big).$$

Notice that the last expression here means the derivative of the function $f$ evaluated at the point $g(x)$.

Exercise 8.29 (a) What is the problem with the above proof? (A correct proof is
supplied on the following page.)
(b) This "fake proof" can be used to come up with a correct proof for the differenti-
ation formula for inverse functions from Proposition 8.10.
Hint: In (b), use the "fake proof" to differentiate $f^{-1}(f(x))$.

Correct proof of the chain rule


Again, our goal is to compute the limit
$$\frac{d}{dx} f\big(g(x)\big) = \lim_{h \to 0} \frac{f(g(x+h)) - f(g(x))}{h}.$$

Now, the problem with the fake proof of the chain rule occurs when we multiply by one. Indeed, the expression
$$\frac{g(x+h) - g(x)}{g(x+h) - g(x)}$$
may be of the form $0/0$ an infinite number of times as $h \to 0$. That is, we need to find an alternative approach that avoids division by $g(x+h) - g(x)$.
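To see the problem concretely, here is a minimal Python sketch. It assumes the simplest possible inner function, a constant $g$ (our own choice for illustration, together with $f = \sin$), so that $g(x+h) - g(x) = 0$ for every $h$. A less trivial example would be $g(x) = x^2 \sin(1/x)$ with $g(0) = 0$, for which $g(h) - g(0) = 0$ at every $h = 1/(n\pi)$.

```python
import math

def f(u):
    return math.sin(u)

def g(x):
    # Constant inner function: an assumption chosen only for illustration.
    return 5.0

def fake_proof_quotient(x, h):
    """The factor (*) from the fake proof: division by g(x+h) - g(x)."""
    k = g(x + h) - g(x)
    return (f(g(x + h)) - f(g(x))) / k

def honest_quotient(x, h):
    """The difference quotient from the definition: division by h only."""
    return (f(g(x + h)) - f(g(x))) / h

try:
    fake_proof_quotient(1.0, 1e-3)
except ZeroDivisionError:
    print("fake proof: division by zero")

print("definition:", honest_quotient(1.0, 1e-3))
```

The difference quotient from the definition is perfectly well behaved here, while the factor $(*)$ from the fake proof cannot even be evaluated.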
For this reason, let us now consider the definition of $f'$, which we write up as follows:
$$f'(u) = \lim_{k \to 0} \frac{f(u+k) - f(u)}{k}. \tag{8.2}$$
As in the fake proof, we want to put $k = g(x+h) - g(x)$. However, the problem we mentioned above then becomes precisely that $k$ may be zero for various values of $h$, and we may therefore not divide by it. But here is the crucial step: before we make the connection between $k$ and $g(x+h) - g(x)$, we define
$$E(k) = \begin{cases} f'(u) - \dfrac{f(u+k) - f(u)}{k}, & k \neq 0, \\ 0, & k = 0, \end{cases} \tag{8.3}$$

where we keep in mind that we consider $u$ as being fixed and $k$ as the variable. Here, we also notice that since (8.2) holds, it follows that

$$\lim_{k \to 0} E(k) = 0.$$

That is, when defined in this way, the function $E(k)$ is continuous at the origin. Why did we do all of this? Well, by multiplying up $k$, and rearranging (8.3), we can now write
$$f(u+k) - f(u) = \big(f'(u) - E(k)\big) k.$$

Since this expression is fine for $k = 0$, it is now safe to put $u = g(x)$ and make the connection $k = g(x+h) - g(x)$, which, in particular, means that $k \to 0$ as $h \to 0$ and that we can write
$$f(g(x) + k) - f(g(x)) = \big(f'(g(x)) - E(k)\big)\big(g(x+h) - g(x)\big).$$

Here, we did not write out the first $k$ since this will make the computation that follows below slightly easier to read. Also, notice that we can rewrite $k = g(x+h) - g(x)$ as $g(x+h) = g(x) + k$.

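Finally, we have gathered all the necessary pieces needed to make the following computation:

$$\begin{aligned}
\frac{d}{dx} f\big(g(x)\big) &= \lim_{h \to 0} \frac{f(g(x+h)) - f(g(x))}{h} \\
&= \lim_{h \to 0} \frac{\big(f'(g(x)) - E(k)\big) k}{h} \\
&= \lim_{h \to 0} \big(f'(g(x)) - E(k)\big) \cdot \lim_{h \to 0} \frac{k}{h} \\
&= \lim_{k \to 0} \big(f'(g(x)) - E(k)\big) \cdot \lim_{h \to 0} \frac{g(x+h) - g(x)}{h} \\
&= f'(g(x)) \cdot g'(x).
\end{aligned}$$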

And we are done!
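The bookkeeping with $E(k)$ can also be followed numerically. The sketch below is only an illustration under our own assumptions: we pick $f = \sin$ and $g(x) = x^2$ as sample functions and the sample point $x = 0.7$. It rewrites the difference quotient as $(f'(u) - E(k)) \cdot k/h$ with $u = g(x)$ and $k = g(x+h) - g(x)$, exactly as in the proof, and compares it with $f'(g(x)) \cdot g'(x)$.

```python
import math

def f(u):
    return math.sin(u)

def f_prime(u):
    return math.cos(u)

def g(x):
    return x * x

def E(u, k):
    """The error function from (8.3), with u held fixed."""
    if k == 0.0:
        return 0.0
    return f_prime(u) - (f(u + k) - f(u)) / k

def chain_rule_quotient(x, h):
    """(f(g(x+h)) - f(g(x))) / h, written as (f'(u) - E(k)) * k / h."""
    u = g(x)
    k = g(x + h) - g(x)
    return (f_prime(u) - E(u, k)) * k / h

x = 0.7
for h in (1e-1, 1e-2, 1e-3, 1e-4):
    # Second column: rewritten quotient. Third column: f'(g(x)) * g'(x).
    print(h, chain_rule_quotient(x, h), f_prime(g(x)) * 2 * x)
```

Note that `chain_rule_quotient` never divides by $k$ when $k = 0$, which is precisely the point of introducing $E(k)$.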

Remark 8.30 While the above proof seems to be more complicated than the "fake"
proof, the general idea is basically the same. However, here, things get more complicated
as we need to do some extra bookkeeping to make sure that nothing bad happens if k
happens to be zero for h arbitrarily close to 0 (but not equal to 0).

Exercise 8.31 (Discussion) Is the "fake" proof really that bad? Can you think of
any conditions under which it will actually work, and do the functions we normally
consider in these lecture notes satisfy such conditions?

Exercise 8.32 Use the chain rule to prove the differentiation formula for invertible functions from Proposition 8.10. Here, you may assume that both $f$ and $f^{-1}$ are differentiable.
Remark: You should compare this exercise to exercise 8.29.

Remark 8.33 In the YouTube film linked in the margin, another proof of the formula for the derivative of an inverse function is given.

8.3 Differentiation formulas for elementary functions


In this section, we take a closer look at the formulas for the derivatives of the trigono-
metric functions sin x, cos x, tan x, the logarithm and the exponential function, as well as
the inverse trigonometric functions. But we start out by observing that the elementary
functions are less well-behaved with respect to differentiation than continuity.

Are all elementary functions differentiable? What, no!?


In Chapter 6.4, we observed that all elementary functions are continuous. Unfortunately,
the corresponding statement is not true for differentiability as is shown by the following
example.

Example 8.34
Let $f(x) = \arcsin(x)$ and $g(x) = \sin(x)$. Surely, these functions are differentiable. However, this is not the case for the composition $f \circ g(x) = \arcsin(\sin x)$. As we see in the figure to the right, there are plenty of pointy edges! The point is that $\arcsin(x)$ is not differentiable at the endpoints of its domain, and this causes trouble when $x$ is such that $\sin x = \pm 1$.
Fig. 5. Look! Pointy edges!
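One way to detect a pointy edge without a picture is to compare one-sided difference quotients. The following Python sketch does this for $\arcsin(\sin x)$ at the point $x = \pi/2$, where $\sin x = 1$. The step size $10^{-4}$ is an arbitrary choice on our part.

```python
import math

def composition(x):
    """The function arcsin(sin x)."""
    return math.asin(math.sin(x))

def one_sided_slope(x, h):
    """Difference quotient with step h (h > 0: from the right, h < 0: from the left)."""
    return (composition(x + h) - composition(x)) / h

x0 = math.pi / 2
slope_from_right = one_sided_slope(x0, 1e-4)
slope_from_left = one_sided_slope(x0, -1e-4)
print("from the left: ", slope_from_left)
print("from the right:", slope_from_right)
```

The two one-sided slopes come out close to $+1$ and $-1$, respectively. Since they do not agree, the limit defining the derivative at $x = \pi/2$ cannot exist.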

But all is not lost. The following proposition is analogous to Proposition 6.52 (notice that it contains a little bit of "fine print").

Proposition 8.35 Suppose that $f$ and $g$ are differentiable. Then the same is true for the functions $f \pm g$, $f \cdot g$ and $f/g$ (the latter wherever $g \neq 0$). Moreover, if $g$ is differentiable at a point $x$ and $f$ is differentiable at the point $g(x)$, then $f \circ g$ is also differentiable at $x$.

Exercise 8.36 Prove Proposition 8.35.


Hint: This is just a matter of using the rulebook for differentiation. For instance, the differentiation rule $(fg)' = f'g + fg'$ in particular states that if $f, g$ are differentiable at a point $x$, then so is the product $fg$.
Exercise 8.37 The inverse function of an invertible and differentiable function is not always differentiable. This statement is true even when adding the assumption that the function is defined on an interval. For instance, the function $y = x^3$ is continuous and differentiable for all $x \in \mathbb{R}$, but the same is not true for its inverse $y = x^{1/3}$. Explain.

A first look at differentiation formulas for elementary functions


The point is now to establish the following differentiation formulas.

Proposition 8.38
(i) $\dfrac{d}{dx}\log x = \dfrac{1}{x}, \quad x > 0$
(ii) $\dfrac{d}{dx}\sin x = \cos x$
(iii) $\dfrac{d}{dx}\cos x = -\sin x$

As it happens, we have already done the hard work in proving this proposition in
the previous chapter (see page 198). In the following exercises, the point is to help you
realise this.
Exercise 8.39 We now ask you to prove part (i) of the above proposition.
(a) Write out the definition of the derivative of the logarithm at x = 1 and verify
that we have already proved that formula (i) holds at this point.
(b) Use the logarithmic laws, and the change of variables rule for the limit, to extend
the differentiation formula to all other x.
Hint: The solution to (a) should reveal where to look for inspiration for (b).

Exercise 8.40 In this exercise, we ask you to prove parts (ii) and (iii) of the above
proposition.
(a) Write out the definition of the derivative of the sine and cosine at x = 0 and
verify that we have already proved that formulas (ii) and (iii) hold there.
(b) Use suitable trigonometric formulas, and the change of variables rule for the limit,
to extend the differentiation formulas to all other x.
Hint: The solution to (a) should reveal where to look for inspiration for (b).

Now, an interesting observation is that while the logarithm has domain $x > 0$, its derivative $y = 1/x$ has the much larger domain $x \neq 0$. This leads us to ponder if we can somehow extend the logarithm to negative $x$ in such a way that the derivative of this extension is equal to $1/x$ there. This is the point of the following exercise:
Exercise 8.41 (a) Use the chain rule for the derivative to prove that for $x < 0$ we have
$$\frac{d}{dx}\log(-x) = \frac{1}{x}.$$
(b) Use what you learned in (a) to write a formula for a function $f(x)$ with domain $\mathbb{R} \setminus \{0\}$ such that $f'(x) = 1/x$ there.

Implicit differentiation and more on derivatives of elementary functions


One of our main goals is to now use what we know about the derivatives of the logarithm
and the trigonometric functions to find the derivatives of their inverses. Now, technically
speaking, once we have proved Proposition 8.10 on the derivatives for inverse functions,
this should be smooth sailing. However, as students tend to find that formula hard to
use, we introduce a special technique called implicit differentiation in order to make it
as transparent as possible that taking derivatives of inverse functions is nothing but a
clever use of the chain rule.
So, what is implicit differentiation? Essentially, it is just an application of the chain
rule. We illustrate the main idea of differentiating "implicitly" by considering the prob-
lem of finding tangent lines to the unit circle:

Example 8.42 (Implicit versus explicit differentiation)

Our goal is to compute the slope of the tangent line to the unit circle at the point $(1/2, \sqrt{3}/2)$. We first do this in the "obvious" way. Indeed, the unit circle is described by the equation $x^2 + y^2 = 1$. Solving for $y$ leads to $y = \pm\sqrt{1 - x^2}$. Choosing $+$ gives us the equation for the upper half circle:
$$y = +\sqrt{1 - x^2}.$$
Taking the derivative of this expression with respect to $x$, we obtain the desired slope as follows:

$$y' = \frac{d}{dx}\sqrt{1 - x^2} = \frac{-x}{\sqrt{1 - x^2}} \implies y'(1/2) = \frac{-1/2}{\sqrt{1 - 1/4}} = -1/\sqrt{3}.$$

Fig. 6.

Fig. 7. The function $y = +\sqrt{1 - x^2}$ describes the upper semi-circle and $y = -\sqrt{1 - x^2}$ describes the lower semi-circle.


Determining the slope using implicit differentiation: We now show how to find this slope without first solving $y$ explicitly as a function of $x$. The idea is to assume that our curve is defined by some function $y = f(x)$ close to the point we care about. The point is that we can solve our problem without ever needing to know the formula for $f(x)$.
The first step is to put $y = f(x)$ into the equation for the circle:

$$x^2 + f(x)^2 = 1.$$

Since the left-hand side is identical to the right-hand side for all $x$, the derivative of the left-hand side has to be equal to the derivative of the right-hand side. We obtain:
$$\frac{d}{dx}\big(x^2 + f(x)^2\big) = \frac{d}{dx}\big(1\big) \implies 2x + 2f(x)f'(x) = 0.$$
Here, $f'(x)$ appears since it is the inner derivative when we use the chain rule to get $(f(x)^2)' = 2f(x)f'(x)$. Rewriting the above expression, we get
$$f'(x) = -\frac{x}{f(x)} \iff y' = -\frac{x}{y}.$$

In the last step, we just used that $y = f(x)$ and $y' = f'(x)$. But this means that when $(x,y)$ is equal to $(1/2, \sqrt{3}/2)$, we get $y' = -1/\sqrt{3}$.
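As a sanity check, the two approaches can be compared numerically. The Python sketch below approximates the slope of the upper half circle at $x = 1/2$ by a symmetric difference quotient (the step size `h=1e-6` is our own choice) and compares it with the value $-x/y$ obtained from implicit differentiation.

```python
import math

def upper_half_circle(x):
    """The explicit solution y = +sqrt(1 - x^2)."""
    return math.sqrt(1.0 - x * x)

def explicit_slope(x, h=1e-6):
    """Numerical slope of the explicit solution (symmetric difference quotient)."""
    return (upper_half_circle(x + h) - upper_half_circle(x - h)) / (2 * h)

def implicit_slope(x, y):
    """The formula y' = -x/y obtained by implicit differentiation."""
    return -x / y

x0, y0 = 0.5, math.sqrt(3) / 2
print("explicit (numerical):", explicit_slope(x0))
print("implicit formula:    ", implicit_slope(x0, y0))
print("-1/sqrt(3):          ", -1 / math.sqrt(3))
```

Notice that `implicit_slope` needs the point $(x, y)$ on the curve, but never the formula for $y$ in terms of $x$.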

To summarise, we say that we differentiate implicitly when we take the derivative of an expression where $y$ is not explicitly given in terms of $x$. If we leave out the step where we give $y$ the name $f(x)$ (which is not really necessary), the above example can be (partially) summed up as follows:

explicit solution: $x^2 + y^2 = 1 \implies y = \pm\sqrt{1 - x^2} \overset{\text{explicit diff.}}{\implies} y' = \mp\dfrac{x}{\sqrt{1 - x^2}}$
implicit solution: $x^2 + y^2 = 1 \overset{\text{implicit diff.}}{\implies} 2x + 2yy' = 0 \implies y' = -\dfrac{x}{y}$.

Observe that since $y = \pm\sqrt{1 - x^2}$ both methods really give the same result.

Exercise 8.43 Consider the curve given by the equation

$$x^4 - y^4 - x^2 + y^2 = 0.$$

(a) Determine $y'$ as a formula of $x$ and $y$ by using implicit differentiation.
(b) Show that the points $(\sqrt{3}/2, 1/2)$ and $(3,3)$ are on the curve. Determine the tangent lines through these points.
(c) Draw the curve in WolframAlpha or some other program. How can the curve most easily be described in terms of other, more familiar, curves? First make a guess, and then try to prove that your guess is true.
(d) What value do you get for $y'$ at the point $(1/\sqrt{2}, 1/\sqrt{2})$? Why do you think this is?

Remark 8.44 (Implicit Function Theorem) To use implicit differentiation, we need to know that we can think of the variable $y$ as a function of $x$ even if we cannot explicitly solve $y$ in terms of $x$. The theoretical justification of this is given by a result called the Implicit Function Theorem. Since it is a result from multivariable calculus, we will just assume that this justification works in these lecture notes. We remark that the Implicit Function Theorem also states conditions for when $y$ is differentiable.

We now apply implicit differentiation to the problem of finding the derivative of inverse functions. The nice thing is that the technique allows us to use the differentiation formula for inverse functions without actually using it. To explain what we mean by this apparent nonsense, let us consider an example:

Example 8.45 We now use implicit differentiation to compute the derivative of the function $y = \arcsin(x)$ by using the fact that it is the inverse function of the sine. That is, for $x \in D_{\arcsin}$ we have
$$y = \arcsin x \iff \sin y = x,$$

where $D_{\arcsin} = R_{\sin} = [-1,1]$ and $R_{\arcsin} = D_{\sin} = [-\pi/2, \pi/2]$.

The point is that we consider the relation $\sin y = x$ as an implicit equation for $y$. That is, in considering $y$ as a function of $x$, we can use implicit differentiation to compute
$$\frac{d}{dx}\sin y = \frac{d}{dx} x \iff \cos y \cdot y' = 1 \iff y' = \frac{1}{\cos y}.$$
Notice that the only thing we did here was to use the chain rule to get $(\sin y)' = \cos y \cdot y'$. Next, plugging in $y = \arcsin x$, we can write the last expression above as
$$(\arcsin x)' = \frac{1}{\cos(\arcsin x)}.$$
By the result of Example 2.86, we know that $\cos(\arcsin x) = \sqrt{1 - x^2}$, and so
$$(\arcsin x)' = \frac{1}{\sqrt{1 - x^2}}.$$

Remark 8.46 Observe that when using implicit differentiation to study the derivatives of inverse functions as we do above, then we have no need for the Implicit Function Theorem. Indeed, since the function is differentiable, we know that we can consider both $y$ as a function of $x$ and $x$ as a function of $y$ (if needed).

Exercise 8.47 Use implicit differentiation to show that

(a) $\dfrac{d}{dx}\arccos(x) = -\dfrac{1}{\sqrt{1 - x^2}}$ (b) $\dfrac{d}{dx}\arctan(x) = \dfrac{1}{1 + x^2}$.
Hint: In (b), keep in mind that there is more than one way to express $(\tan x)'$.
Exercise 8.48 Use the fact that $\exp(x)$ is the inverse function of $\log x$, and that $(\log x)' = 1/x$ to prove that
$$\frac{d}{dx}\exp(x) = \exp(x).$$
Exercise 8.49 Use the definition of the complex exponential to prove that
$$\frac{d}{dx}\exp(ix) = i\exp(ix).$$
Exercise 8.50 Use what you know about the logarithm and exponential functions to prove that for all $\alpha \in \mathbb{R}$, we have
$$\frac{d}{dx}x^{\alpha} = \alpha x^{\alpha - 1}, \quad x > 0.$$
Exercise 8.51 Use implicit differentiation to prove the differentiation formula for inverse functions from Proposition 8.10.

Here is a summary of some of the differentiation formulas proven in this chapter.

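Proposition 8.52
(i) $\dfrac{d}{dx}\exp(x) = \exp(x)$
(ii) $\dfrac{d}{dx}\exp(ix) = i\exp(ix)$
(iii) $\dfrac{d}{dx}x^{\alpha} = \alpha x^{\alpha - 1}, \quad \alpha \in \mathbb{R},\ x > 0$
(iv) $\dfrac{d}{dx}\arcsin x = \dfrac{1}{\sqrt{1 - x^2}}, \quad x \in (-1,1)$,
(v) $\dfrac{d}{dx}\arccos x = -\dfrac{1}{\sqrt{1 - x^2}}, \quad x \in (-1,1)$,
(vi) $\dfrac{d}{dx}\arctan x = \dfrac{1}{1 + x^2}, \quad x \in \mathbb{R}$.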

8.4 Exam exercises


There are no questions on the previous exams simply asking you to compute the derivative of a function. Therefore, all exercises below are part of some larger problem, and if you consult the suggested solutions of the exam problems, you should find the details of most of these computations.

Exercise 8.53 (Exam 2015-05-27, part of 5) Make a table of signs for the derivative of the function
$$f(x) = \ln x - \frac{x-1}{x+1}, \quad x \ge 1.$$
Exercise 8.54 (Exam 2014-08-18, part of 2) Make a table of signs for the derivative of the function
$$f(x) = 2\arctan x + \frac{1}{x}, \quad x \neq 0.$$
Exercise 8.55 (Exam 2014-05-26, part of 3) Make a table of signs for the derivative, and the second derivative, of the function
$$f(x) = |x|e^{-1/x}, \quad x \neq 0.$$

Exercise 8.56 (Exam 2012-12-19, part of 4) Make a table of signs for the derivative of the function

$$f(x) = |x^3 - 6x^2 + 9x - 4|, \quad x \in [0,5].$$

Exercise 8.57 (Exam 2012-05-28, part of 1) Make a table of signs for the derivative of the function
$$f(x) = e^{-x^2/2}\sqrt{x^2 + 1}, \quad x \in \mathbb{R}.$$

Exercise 8.58 (Exam 2012-05-28, part of 3) Make a table of signs for the deriva-
tive of the function

x 1
f (x) = ln(1 + e ) , x 2 R.
ex +1

8.5 Answers to selected exercises


8.1 In (i) and (ii), the point is to use the expressions defining $\sinh x$ and $\cosh x$. In (iii), use the fact that the derivative of $\sinh x$ is strictly positive for all $x \in \mathbb{R}$. In (iv), use implicit differentiation.

8.6 No.

8.7 C = 1/2, D = 1/2.

8.13 (a) $(2x\arctan(x) - 1)/\arctan(x)^2$, (b) $1/(1 + \cos(x))$, (c) $2(\ln(x) - 1)/(\ln x)^2$.

8.17 (a) $\cos x \, e^{\sin x}$, (b) $2x/(1 + x^2)$, (c) $1/(1 + x^2)$, (d) $(1 + 2x^2)e^{x^2}$

8.19 (a) - (v), (b) - (iii), (c) - (iv), (d) - (i), (e) - (ii).

8.22 (a) 0, (b) 0, (c) 0.

8.24 Here is an additional hint: multiply the expression in the original hint by one in such a way that you can take advantage of the fact that the limit
$$\lim_{u \to x} \frac{f(u) - f(x)}{u - x}$$
exists and is equal to some finite number.

8.25 It says that there are functions that are continuous but not differentiable. That is,
that the two areas in the Venn diagram do not coincide.

8.27 (a)
$$\lim_{h \to 0} \frac{\frac{1}{g(x+h)} - \frac{1}{g(x)}}{h} = \lim_{h \to 0} \frac{g(x) - g(x+h)}{h\,g(x)g(x+h)} = -\frac{1}{g(x)} \lim_{h \to 0} \frac{1}{g(x+h)} \cdot \frac{g(x+h) - g(x)}{h} = -\frac{g'(x)}{g(x)^2}.$$

(b) Apply the product rule to $f(x) \cdot 1/g(x)$.


8.43 (a) $y' = \dfrac{x(1 - 2x^2)}{y(1 - 2y^2)}$, (b) the tangent lines are $y = -\sqrt{3}x + 2$ and $y = x$, respectively, (c) it is the union of the circle $x^2 + y^2 = 1$ and the lines $y = x$ and $y = -x$. This can be seen algebraically from the identity $x^4 - y^4 - x^2 + y^2 = (1 - x^2 - y^2)(y - x)(y + x)$.

8.48 Let $y = e^x$, then $\log y = x$. Now differentiate implicitly on both sides, and then substitute back.

8.51 Let $y = f^{-1}(x)$, then $f(y) = x$. Now differentiate implicitly on both sides, and then substitute back.
Chapter 9

More on functions and limits

Introduction
The main focus of the chapter is the Mean Value Theorem. Basically the first half of
the chapter is about understanding its statement and exploring how it has been playing
a part in our lives since high-school mathematics. The second half of the chapter is
about proving the Mean Value Theorem. Along the way, we will meet the Intermediate
Value Theorem, the Min-Max Theorem and, perhaps the most important of them all,
the Bolzano-Weierstrass theorem.

Remark 9.1 (Selected problems from previous exams based on this chapter)

1. (a) Formulate the Mean Value Theorem.
   (b) Show that if $f'(x) = 0$ on an interval, then $f(x)$ is constant there.
   (c) Show that $2\arctan\big(x - \sqrt{x^2 - 1}\big) + \arctan\big(\sqrt{x^2 - 1}\big) = \dfrac{\pi}{2}$ for $x \ge 1$.
2. (a) Formulate the Mean Value Theorem.
   (b) Show that if $f'(x) > 0$ holds on an interval, then $f(x)$ is growing there.
   (c) Show that $\ln x \ge \dfrac{x - 1}{x + 1}$ for $x \ge 1$.
3. Sketch the graph of $f(x) = |x|e^{-1/x}$ for $x \neq 0$. In particular, determine where $f$ is growing, decreasing, convex and concave, respectively. Moreover, point out any extremal points (local and global) of $f$.
4. Determine the range of the function
   $$f(x) = \frac{\sqrt{3} + x}{1 + x^2} + \arctan x, \quad x \in \mathbb{R}.$$

9.1 A first look at the Mean Value Theorem


In this section, our goal is to understand the statement of the Mean Value Theorem and
some of its immediate consequences.

The statement of the Mean Value Theorem


We begin with the statement.

Theorem 9.2 (The Mean Value Theorem for Derivatives) Suppose that a function $f$ is continuous on $[a,b]$ and differentiable on $(a,b)$. Then there exists (at least) one point $c \in (a,b)$ such that
$$f'(c) = \frac{f(b) - f(a)}{b - a}.$$

The literal meaning of the Mean Value Theorem is illustrated to the right. Indeed, given a continuous and differentiable function on an interval (which essentially means that the graph of the function is nice and smooth), the Mean Value Theorem says that there exists a point $c$ in the interval $(a,b)$ so that the tangent line of $f$ at $x = c$ is parallel to the straight line from $(a, f(a))$ to $(b, f(b))$.
Fig. 1. Standard illustration for the Mean Value Theorem.
A more common sense explanation is as follows: If your Granny starts her car in Lund at 08:00 and arrives in Malmö 5 minutes later at 08:05, should you be worried? Let us think about this for a moment. The distance between Malmö and Lund is about 20 kilometres. So, her average velocity would have to have been
$$\frac{20\ \text{km}}{5\ \text{min}} = 4\ \frac{\text{km}}{\text{min}} = 4\ \frac{\text{km}}{\frac{1}{60}\ \text{h}} = 240\ \frac{\text{km}}{\text{h}}.$$

This is quite fast! Now, it is clear that if she always drove strictly slower (or always drove strictly faster) than 240 km/h, then this could not have been her average velocity. So, she would actually have to have driven at exactly 240 km/h at some moment in time.
Fig. 2. A less standard illustration.

Let us formulate the Granny example in a mathematical language: Suppose that the function $f(x)$ describes the position of Granny at time $x$, she starts at time $x = a$, and stops at time $x = b$. Her average velocity is then expressed as the quotient

$$\frac{f(b) - f(a)}{b - a}.$$
Now, according to our intuitive reasoning, at some point in time she would actually have had to drive at the average velocity. In other words, there has to be some time $x = c$, with $a < c < b$, so that her actual velocity, given by $f'(c)$, was equal to her average velocity. That is, there has to exist a $c \in (a,b)$ so that

$$f'(c) = \frac{f(b) - f(a)}{b - a}.$$
This is exactly the conclusion of the Mean Value Theorem!
Fig. 3. Busted!
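To make the Granny example concrete, here is a Python sketch that actually finds such a time $c$. Everything about Granny's driving is made up: we assume the position function $f(t) = 20(3s^2 - 2s^3)$ with $s = t/5$ (in kilometres, with $t$ in minutes), which starts and stops smoothly and covers 20 km in 5 minutes. The time $c$ is located with the bisection method on the first half of the trip, where her velocity goes from below to above the average.

```python
def position(t):
    """Hypothetical position (km) after t minutes: 20 km covered in 5 minutes."""
    s = t / 5.0
    return 20.0 * (3 * s**2 - 2 * s**3)

def velocity(t):
    """Derivative of position(t), in km/min: 20 * (6s - 6s^2) / 5."""
    s = t / 5.0
    return 24.0 * s * (1 - s)

def bisect(func, lo, hi, steps=60):
    """Locate a zero of func in [lo, hi], assuming func(lo) < 0 < func(hi)."""
    for _ in range(steps):
        mid = (lo + hi) / 2
        if func(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

a, b = 0.0, 5.0
average = (position(b) - position(a)) / (b - a)       # km/min
c = bisect(lambda t: velocity(t) - average, a, b / 2)

print("average velocity:", average * 60, "km/h")
print("velocity at c:   ", velocity(c) * 60, "km/h at c =", c, "min")
```

For this particular Granny there is in fact a second such time in the second half of the trip, which is why the theorem only promises at least one point $c$.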
Let us now consider the hypotheses of the Mean Value Theorem. Why do we need to
assume that f is continuous on [a,b] and differentiable on (a,b)? Well, suppose we allow
f to have discontinuities. This would mean that Granny can teleport! And if Granny
has a teleportation device in her living room, this means that she can easily get from
Lund to Malmö in 5 minutes without ever breaking any laws (of traffic, at least).

Fig. 4. Left: If Granny can travel in a discontinuous manner, then she can teleport. Right: If Granny can travel without being differentiable, she will break her neck.

And what about being differentiable? Well, if $f(x)$ is not differentiable, then the slope may change instantaneously. That is, Granny's velocity can go from 90 km/h to 300 km/h and back to 90 km/h without ever passing the average velocity. While this is bad news for anyone who wants to avoid whiplash, it also means that we cannot expect the Mean Value Theorem to hold if the function is not differentiable.
We consider the proof of the Mean Value Theorem in Section 9.4.

An immediate consequence of the Mean Value Theorem


In this course, the following result can be said to be the main consequence of the Mean
Value Theorem. It is actually this result that allows us to say that if a function has a
positive derivative then it has to be growing. (However, the Mean Value Theorem is not
needed for the converse statement. See Exercise 9.5 below.)

Corollary 9.3 Suppose that $f(x)$ is continuous and differentiable on an interval $I$. Then the following holds:

(i) $f'(x) > 0$ for all $x \in I \implies f(x)$ is strictly increasing on $I$.
(ii) $f'(x) = 0$ for all $x \in I \implies f(x)$ is constant on $I$.
(iii) $f'(x) < 0$ for all $x \in I \implies f(x)$ is strictly decreasing on $I$.

Proof. We prove part (ii), and leave parts (i) and (iii) as an exercise.
So, we assume that $f'(x) = 0$ on $I$, and want to show that $f(x)$ is constant on $I$. To do this, we choose two points $x_1, x_2 \in I$ with $x_1 < x_2$. By the Mean Value Theorem, there exists a point $c \in (x_1, x_2)$ so that

$$\frac{f(x_2) - f(x_1)}{x_2 - x_1} = f'(c).$$

Since $f'(c) = 0$, and the denominator is non-zero ($x_1 \neq x_2$), it follows that the numerator has to be zero. That is,
$$f(x_2) = f(x_1).$$
Since $x_1$ and $x_2$ were arbitrary, this means that $f$ takes the same value at all points in $I$ and is therefore constant.

Exercise 9.4 Prove parts (i) and (iii) of Corollary 9.3.
Hint: Recall Definition 2.41.
Exercise 9.5 Show that with a minor modification, the implication $\Longleftarrow$ also holds in Corollary 9.3. (For this, you only need to use the definition of the derivative.)

Corollary 9.3 may not seem like much, but this is far from true. The following exercise is meant to give a hint of its power.

Exercise 9.6 Suppose you only know the differentiation formulas $(\sin x)' = \cos x$, $(\cos x)' = -\sin x$ and that $\sin 0 = 0$ and $\cos 0 = 1$. Use this and Corollary 9.3 to prove that
$$\sin^2 x + \cos^2 x = 1, \quad \forall x \in \mathbb{R}.$$

Hint: What is the derivative of $f(x) = \cos^2 x + \sin^2 x - 1$?



Two important theorems on continuity


We now state two innocent looking theorems on continuous functions. The first reflects
the intuition that a function is continuous on an interval if you can draw its graph there
without lifting your pen.

Theorem 9.7 (The Intermediate Value Theorem) Suppose that $f$ is continuous on the finite and closed interval $[a,b]$. If $f(a)$ and $f(b)$ have opposite signs, then there exists (at least) one number $c \in (a,b)$ so that $f(c) = 0$.

It may (or should) come as a surprise that this is one of the most difficult results to prove in these lecture notes. Indeed, if a function that is continuous on an interval $[a,b]$ is negative at $x = a$ and positive at $x = b$, how hard can it be to show that it has to be 0 at some point in between? Well, the problem is that we basically need to prove that there are no "holes" in the real line that it can somehow pass through!
Fig. 5. This is how I imagine a confused graph.

In order to convince you that this explanation is not completely ridiculous, consider the following. What if the x-axis only consisted of rational points $\mathbb{Q}$ (and for a computer, it consists of even fewer points, namely the floating point numbers)? Let us call this the $\mathbb{Q}$-axis. Notice that the continuous function $f(x) = x^2 - 2$ satisfies $f(0) < 0$ and $f(2) > 0$, but there is no point $x$ on the $\mathbb{Q}$-axis so that $f(x) = 0$. That is, the $\mathbb{Q}$-axis has holes!
Fig. 6. The function $f(x) = x^2 - 2$ can pass from being negative to positive without ever touching the rational x-axis. That is, the $\mathbb{Q}$-axis has holes!
As it turns out, the real x-axis has no holes. It is what we call complete, and the Intermediate Value Theorem is one reflection of this property (we will not give an abstract definition of what it means for a set to be complete). As we indicated above, the rational numbers $\mathbb{Q}$ are not complete (the $\mathbb{Q}$-line has holes!). The reason for this is that $\mathbb{R}$ satisfies the completeness axiom, and $\mathbb{Q}$ does not (in particular, we will need the completeness axiom when proving the Intermediate Value Theorem).
We now state the second, and related, theorem for continuous functions.
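The hole in the $\mathbb{Q}$-axis can be explored with exact rational arithmetic. The Python sketch below runs the bisection method on $f(x) = x^2 - 2$ using only rational numbers (via the standard `fractions` module). The starting interval $[0, 2]$ is taken from the text, while the number of steps is an arbitrary choice of ours. The interval with a sign change shrinks forever, but no rational midpoint ever hits a zero of $f$.

```python
from fractions import Fraction

def f(x):
    return x * x - 2

def rational_bisection(lo, hi, steps):
    """Halve [lo, hi] repeatedly, keeping f(lo) < 0 < f(hi), using exact rationals."""
    assert f(lo) < 0 < f(hi)
    for _ in range(steps):
        mid = (lo + hi) / 2
        value = f(mid)
        if value == 0:
            return mid, mid  # cannot happen: the square root of 2 is irrational
        if value < 0:
            lo = mid
        else:
            hi = mid
    return lo, hi

lo, hi = rational_bisection(Fraction(0), Fraction(2), 40)
print("lo =", lo, "with f(lo) < 0")
print("hi =", hi, "with f(hi) > 0")
print("hi - lo =", hi - lo)
```

On the real line, the completeness axiom guarantees that these shrinking intervals close in on an actual point, and that point is the $c$ promised by the Intermediate Value Theorem.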
We now state the second, and related, theorem for continuous functions.

Theorem 9.8 (The Min-Max Theorem) Suppose that $f$ is continuous on the finite and closed interval $[a,b]$. Then there exist (at least) two points $\ell, u \in [a,b]$ so that for all $x \in [a,b]$ we have
$$f(\ell) \le f(x) \le f(u).$$

Fig. 7. The Min-Max Theorem says that if a function is continuous on a finite and closed interval, then it will have a global minimum value ($f(\ell)$) and a global maximum value ($f(u)$). Since we do not know if $f$ is differentiable, these may be at stationary points, singular points, or even at the end points of the interval.

The Min-Max Theorem is closely related to the Intermediate Value Theorem, and is difficult to prove for the same reason. Namely, could it happen that the max and min points happen at "holes" in the real line? In order to build a little more mathematical "muscle" before tackling the proofs of these theorems, we postpone further study until Section 9.5 (we will not need these results until that section anyway).

Exercise 9.9 Give examples that show how (a) the Intermediate Value Theorem
and (b) the Min-Max Theorem both fail if we remove the condition that f is continuous
on a closed and finite interval.

Exercise 9.10 Show that the following statement is a consequence of the Intermediate Value Theorem. "Suppose that $f$ is continuous on $[a,b]$ and $s \in \mathbb{R}$, and that one of the following conditions holds:

(i) $f(a) \le s$ and $f(b) \ge s$, (ii) $f(a) \ge s$ and $f(b) \le s$.

Then there exists a point $c \in [a,b]$ so that $f(c) = s$."

Exercise 9.11 Use the previous exercise and the Min-Max Theorem to prove that if
the domain of a continuous function f is a closed interval, then the range of f is also
an interval.
Remark: This is also true if f is continuous but defined on an interval that is not
closed. As an extra challenge, feel free to try to extend your proof to also cover this
situation.

9.2 Consequences of the Mean Value Theorem


Something that ought to be familiar from high school is that the derivative can be used
to study graphs of functions. This is something we briefly touched on in Chapter 2, and
that we are going to properly deal with now.

Example 1: How to optimise a function using the derivative


To optimise a function is the same as determining its largest and smallest values. The
following is a very typical high-school type example of how to use the derivative.

Example 9.12 For $x$ in $[-2,3]$, we determine the largest and smallest values of
$f(x) = x^3 - 3x + 1$.
Following what we (should have) learned in high school, we compute the derivative
$f'(x) = 3x^2 - 3 = 3(x^2 - 1)$ and observe that $f'(x) = 0$ holds exactly when $x = \pm 1$.
To efficiently use the information contained in $f'(x)$ to sketch the function, one can
make a table of signs for the derivative, and then apply Corollary 9.3 as follows:

Fig. 8. In the first three rows, we figure out the sign of $f'(x)$. In the last row, we
use Corollary 9.3 to translate this into how $f(x)$ behaves.
Based on the above table, we see that $f$ has its smallest value at either $x = -2$ or $x = 1$,
and its largest value at either $x = -1$ or $x = 3$. To better understand what is going on,
we compute the values of $f$ at these points:

minimum points        maximum points
$f(-2) = -1$          $f(-1) = 3$
$f(1) = -1$           $f(3) = 19$

We conclude that $f(-2) = f(1) = -1$ and $f(3) = 19$ are the smallest and largest values of
$f$, respectively.

Fig. 9. Here is a computer plot of the function from Example 9.12.

Let us now recall some vocabulary relevant to the above example. We call a point
$x = a$ a local maximum point of a function $f$ if $f(x) \le f(a)$ for all $x$ in a neighbourhood
of $x = a$. Local minimum points are defined similarly (at end-points we only require
this to hold in one-sided neighbourhoods). Collectively, these are called local extremal
points. The local extremal points at which the function takes its largest and smallest
values are called global extremal points. In the above example, $x = 3$ was a global
maximum point, while both $x = -2$ and $x = 1$ were global minimum points.
Points where $f'(x) = 0$ are called stationary points, while points $x \in D_f$ where $f'(x)$ does
not exist are called singular points. We remark that the definition of singular points varies
between textbooks. (For instance, some textbooks do not require singular points to be in the
domain, in order to be able to say that, e.g., $f(x) = 1/x$ has a singular point at $x = 0$.)

Fig. 10. Restricted to $[-1,2)$, $f(x) = |x|$ has two extremal points. Do you see which these are?

Next, we mention a result that is due to Fermat, and which leads to an explanation of
where a function can have maximum and minimum points. While this result is sometimes
referred to as "Fermat’s theorem", it should not be confused with "Fermat’s last theorem", which is
a famous problem that took more than 300 years to solve (Simon Singh’s book on Fermat’s last
theorem was basically what got me into mathematics in the first place).

Fig. 11. Pierre de Fermat 1607 – 1665.

Note that all points of an interval that are not endpoints are called inner points.

Theorem 9.13 (Fermat’s theorem) Let $f$ be defined on an interval $I$, and suppose
that $c \in I$ is both an inner point of $I$ and a local extremal point for $f$. If $f$ is differentiable
at $c$ then $c$ is a stationary point for $f$.

Proof. Suppose that $x = c$ is a local maximum of $f$. By hypothesis $f'(c)$ exists, and so,
by definition, we have
\[ f'(c) = \lim_{x \to c} \frac{f(x) - f(c)}{x - c}. \tag{9.1} \]
Since $c$ is not an endpoint of the interval $I$, we can consider one-sided limits from each
side. Since $c$ is a local maximum, we know that $f(x) - f(c) \le 0$ when $x$ is close to $c$. This allows us to
use knowledge of the sign of $(x - c)$ in the one-sided limits to deduce that
\[ \lim_{x \to c^+} \frac{f(x) - f(c)}{x - c} \le 0 \quad \text{and} \quad \lim_{x \to c^-} \frac{f(x) - f(c)}{x - c} \ge 0. \]
Since $f'(c)$ must be equal to both of these limits, it follows that $f'(c) = 0$. The proof in
the case of a local minimum is similar.

A consequence of Fermat’s theorem is a classification of where a function can have
extremal points, something which is useful to know when optimising functions.

Corollary 9.14 (Classification of extremal points) Let $f$ be defined on an interval
$[a,b]$. If $x \in [a,b]$ is an extremal point for $f$, then at least one of the following three has to hold:

• x is a stationary point of f .

• x is an endpoint of [a,b].

• x is a singular point for f .

Proof. Suppose that $x$ is an extremal point of $f$. We can split the proof into two cases:
(i) $x$ is an endpoint of $[a,b]$, (ii) $x$ is an inner point of $[a,b]$.
In the first case, we are done (since being an endpoint is one of the conclusions we aim for). So,
we deal with case (ii). This we can split into two sub-cases: (ii-a) $f$ is differentiable
at $x$, (ii-b) $f$ is not differentiable at $x$. In the first of these sub-cases, we use Fermat’s
theorem to conclude that $x$ is a stationary point. In the second, we are immediately
done, since this means that $x$ is a singular point of $f$.

Finally, we remark that it is possible for stationary points not to be extremal points. The
classical example is $f(x) = x^3$. Indeed, then $f'(0) = 0$ but $x = 0$ is neither a local maximum
nor a local minimum. While not very common in the English literature, some call such points
terrace points.

Fig. 12. The function $f(x) = x^3$.

Example 9.15 (Example 9.12 revisited) Again, for $x \in [-2,3]$, we consider

$f(x) = x^3 - 3x + 1$.

As an alternative to making a table of signs to optimise $f$, we can use Corollary 9.14 in
combination with the Min-max theorem.
Indeed, since $f$ is continuous (it is a polynomial) and $[-2,3]$ is a closed and bounded
interval, it follows from the Min-max theorem that it has a global maximum and a global
minimum. Moreover, according to Corollary 9.14, these can only occur at stationary
points, endpoints and singular points.

Since $f$ has no singular points, this means that we are guaranteed that the global
maximum and minimum points of $f$ will be among the points in the following
list: $x = -2$, $x = 3$ (the endpoints) or $x = \pm 1$ (the stationary points of $f$).
By comparing the values of $f$ computed in the table of Example 9.12, we see that
$f(-2) = f(1) = -1$ is the global minimum and $f(3) = 19$ is the global maximum.
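The candidate-point method of this example can be sketched in a few lines of Python. This is only an illustration: the helper name `optimise_on_closed_interval` is made up for this sketch, and the list of candidates (endpoints and stationary points) is typed in by hand from the example rather than computed.

```python
def f(x):
    # The function from Example 9.12 / 9.15.
    return x**3 - 3*x + 1


def optimise_on_closed_interval(func, candidates):
    """Compare func at the candidate points and return
    (minimum points, minimum value, maximum points, maximum value)."""
    values = {x: func(x) for x in candidates}
    lo = min(values.values())
    hi = max(values.values())
    argmin = sorted(x for x, v in values.items() if v == lo)
    argmax = sorted(x for x, v in values.items() if v == hi)
    return argmin, lo, argmax, hi


# Endpoints of [-2, 3] together with the stationary points x = -1 and x = 1.
candidates = [-2, 3, -1, 1]
argmin, lo, argmax, hi = optimise_on_closed_interval(f, candidates)
print(argmin, lo)  # [-2, 1] -1
print(argmax, hi)  # [3] 19
```

Note that the comparison is only justified because the Min-max theorem guarantees that a global maximum and minimum exist on the closed interval; the code itself does not check this.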

While the above approach is somewhat less complicated than making a table of signs,
it also gives less information. For instance, without making further arguments, we do not
know if $x = -1$ is a local maximum, local minimum or a terrace point of $f$ – something
which would be immediately clear from a table of signs for $f'$.

Remark 9.16 (Warning) Please note that while elegant, the justification of the approach
from Example 9.15 collapses completely if $f$ is only known to be continuous and differentiable on
an open interval. For instance, if we take the same function, that is $f(x) = x^3 - 3x + 1$,
but now consider it on $(-1,1)$, then none of the points considered above are global
maximum or minimum points for $f$ (in fact, on this interval, $f$ has no global extremal
points!).

Exercise 9.17 Optimise the function $f(x) = x^3 - 3x^2 - 9x + 2$ on $[-1,4]$.

Exercise 9.18 Consider $f(x) = |x^2 + x - 2|$ on $\mathbb{R}$.

(a) Determine where $f$ is increasing and decreasing, respectively.

(b) Determine all local and global extremal points of $f$.

Exercise 9.19 Determine all extreme points and asymptotes of the function
\[ f(x) = e^{-x^2/4} \sqrt{x^2 + 1}. \]
Classify the extreme points and determine whether they are global. Illustrate your
answer in a simple sketch of the function.

Exercise 9.20 Consider the function given by
\[ f(x) = \begin{cases} \sqrt{1 + x^2}, & x > 0, \\ 1 - x^2, & x \le 0. \end{cases} \]
(a) Determine if $f$ is continuous at $x = 0$.

(b) Use the definition of the derivative to determine if $f$ is differentiable at $x = 0$.

(c) Specify any possible asymptotes, possible minimum and maximum points, and
where the function is increasing or decreasing.

Example 2: How to sketch a graph using information from the second derivative

In the previous example, we used the table of signs to make crude sketches of the graph
of a function based on information on its derivative. Here, we take a look at how this
can be improved by taking the second derivative into account.
First, recall that the second derivative is the derivative of the derivative, the third
derivative is the derivative of the second derivative, and so on. That is,
\[ f''(x) \overset{\text{def}}{=} \frac{d}{dx} f'(x), \qquad f'''(x) \overset{\text{def}}{=} \frac{d}{dx} f''(x), \qquad \cdots. \]
For higher order derivatives, it becomes silly to keep writing $f'''$, $f''''$ and so on, so it is
practical to use the notation $f^{(3)}, f^{(4)}, \ldots$ instead.

Fig. 13. The mascots of the Rice Krispies cereals are apparently called “Snap”, “Crackle” and
“Pop”. Which, apparently, is exactly what engineers call $y^{(4)}$, $y^{(5)}$ and $y^{(6)}$, respectively.

Also, recall that if $y$ denotes position with respect to time, then $y'$ represents velocity (change
of position), $y''$ acceleration (change of velocity), and $y'''$ is called the “jerk” (change of
acceleration).
To understand, visually, what it means that $f''$ is positive or negative, we apply Corollary
9.3 to $f'$ (instead of to $f$). This tells us that if $f''$ is positive, then $f'$ is increasing, and if $f''$ is
negative, then $f'$ is decreasing. By definition, we say that a function $f$ is convex on an interval
$I$ if $f'$ is increasing on $I$, and concave if $f'$ is decreasing on $I$ (see Figure 14).

Fig. 14. The happy graph has a growing derivative, and the sad graph has a decreasing derivative.

The point where a function changes from being convex to concave (or vice versa) is called an
inflection point. We require, as is usual, that for a point $x$ to be called an inflection point for
a function $f$, the function $f$ must be differentiable at $x$.
Now, let us make the following observation from Figure 15. If a stationary point is in an
interval where the graph is happy (convex), then it has to lie at the bottom of the smile, and thus
it must be a local minimum. This is a special case of the second-derivative test:

Fig. 15. A happy function with a stationary point.

Proposition 9.21 (The second derivative test) Suppose that $f(x)$ is twice differentiable
on $(a,b)$ and has a stationary point at $c \in (a,b)$. Then the following holds.

(i) $f''(c) > 0 \implies f(c)$ is a local minimum.

(ii) $f''(c) < 0 \implies f(c)$ is a local maximum.

If $f''(c) = 0$, then anything can happen. (It could be an extreme point, or it could be
an inflection point.)

Proof. This proof is actually kind of fun and kind of similar to that of Fermat’s
theorem, above. Suppose we are in situation (ii). Our first observation is that since $x = c$
is a stationary point for $f$ we have
\[ f''(c) = \lim_{x \to c} \frac{f'(x) - f'(c)}{x - c} = \lim_{x \to c} \frac{f'(x)}{x - c}. \]
We now proceed by considering the one-sided limits separately. We do this in order to get the
fact that $f''(c) < 0$ into play. First, we look at
\[ 0 > f''(c) = \lim_{x \to c^+} \frac{f'(x)}{x - c}. \]
Here, the denominator $x - c$ must be positive. But since the fraction $f'(x)/(x - c)$ is supposed to
give a negative limit, this means that $f'(x)$ has to become negative as $x \to c^+$. For similar reasons,
$f'(x)$ has to become positive as $x \to c^-$. By Corollary 9.3, $f$ is therefore increasing just to the
left of $c$ and decreasing just to the right of $c$, so $c$ is a local maximum. Situation (i) is handled
in the same way.

Fig. 16. In some (possibly very small) neighbourhood of $x = c$, we know the
sign of $f'(x)$. This means that we know the behaviour of $f(x)$.

Let us now consider a first example.

Example 9.22 (use of the second derivative test) Let us find the extremal points
of $f(x) = x^3 - 3x + 1$ without making a table of signs. We can do this by considering
\[ f'(x) = 3(x^2 - 1) = 3(x - 1)(x + 1) \]
and
\[ f''(x) = 6x. \]
We notice that this function is continuous and differentiable on all of $\mathbb{R}$. This means
that the only possible extremal points are the stationary points $x = \pm 1$. Since
\[ f''(1) > 0 \quad \text{and} \quad f''(-1) < 0, \]
it follows by the second-derivative test that $x = 1$ is a local minimum and $x = -1$ is a
local maximum.
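As a small illustration, the second-derivative test can be written as a three-way case distinction. The function name `classify_stationary_point` is invented for this sketch, and the code assumes that the point passed in is already known to be stationary.

```python
def classify_stationary_point(second_derivative, c):
    """Apply the second-derivative test at a point c that is assumed to be stationary."""
    value = second_derivative(c)
    if value > 0:
        return "local minimum"
    if value < 0:
        return "local maximum"
    # f''(c) = 0: the test gives no information.
    return "inconclusive"


def f_second(x):
    # Second derivative of f(x) = x^3 - 3x + 1.
    return 6 * x


results = {c: classify_stationary_point(f_second, c) for c in (-1, 1)}
print(results)  # {-1: 'local maximum', 1: 'local minimum'}
```

For $f(x) = x^3$, whose second derivative is also $6x$, the stationary point $x = 0$ gives "inconclusive", in line with the terrace point discussed earlier.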

Here is a second, more complicated example, where we use the second derivative to
provide a relatively detailed sketch of a function.

Example 9.23

Suppose we are given the function
\[ f(x) = 2 \arctan x + \frac{1}{x}, \quad x \ne 0, \]
and the computer generated plot in Figure 17.
Since it is not very clear from the image, our mission is to see if we can find any hidden
features of this graph by making a sketch by hand.

Fig. 17. A plot of $f$ using 100 data-points.

The first thing we do is to compute the first two derivatives of the function. We leave it as an
exercise to check that
\[ f'(x) = \frac{x^2 - 1}{x^2 (x^2 + 1)} = \frac{(x + 1)(x - 1)}{x^2 (x^2 + 1)}, \]
\[ f''(x) = -2\, \frac{x^4 - 2x^2 - 1}{x^3 (x^2 + 1)^2} = -2\, \frac{(x^2 - \sqrt{2} - 1)(x^2 + \sqrt{2} - 1)}{x^3 (x^2 + 1)^2}. \]
Since the derivative is defined on all of the domain $D_f = \{x \ne 0\}$, this means that the function
has no singular points on its domain. Therefore, all extremal points have to be at the stationary
points $x = \pm 1$ (there are no end-points to consider).

Fig. 18. Table of signs for $f$.

To figure out how the function acts, we start out by doing a table of signs for the first
derivative (Figure 18). From it, we clearly see that $x = -1$ is a local maximum and $x = 1$ a local
minimum. Notice that since $x^2$ is in the denominator of $f'(x)$, its zero becomes a skull to indicate
that the derivative does not exist at this point.

Fig. 19. Table of signs for $f''(x)$.
To improve our sketch, we also consider a table of signs for the second derivative
(Figure 19). From it, we clearly see where the function is convex and concave, respectively.
Finally, to get the full picture, we also check the asymptotes of the function. By
the formula for $f(x)$, we see directly that it has a vertical asymptote at $x = 0$. The
horizontal asymptotes follow from the computation:
\[ \lim_{x \to \pm\infty} \left( 2 \arctan x + \frac{1}{x} \right) = 2 \cdot \left( \pm\frac{\pi}{2} \right) + 0 = \pm\pi. \]
Taking into account the information from the table of signs, and the fact that
$\sqrt{\sqrt{2} + 1} \approx \sqrt{1.4 + 1} = \sqrt{2.4} > 1$, we arrive at the following sketch.

Fig. 20. Our sketch for $f$. Notice how, in certain respects, it is more informative
than the computer generated image (Figure 17).
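Since the formulas for $f'$ and $f''$ were left as an exercise, a quick numerical sanity check may be reassuring. The following sketch compares the closed formulas with central difference quotients at a few sample points; the step size `h` and the sample points are arbitrary choices made for this illustration, and a numerical check of this kind is of course not a proof.

```python
import math


def f(x):
    return 2 * math.atan(x) + 1 / x


def f1(x):
    # Claimed formula for f'(x).
    return (x**2 - 1) / (x**2 * (x**2 + 1))


def f2(x):
    # Claimed formula for f''(x).
    return -2 * (x**4 - 2 * x**2 - 1) / (x**3 * (x**2 + 1) ** 2)


def central_difference(g, x, h=1e-5):
    # Symmetric difference quotient approximating g'(x).
    return (g(x + h) - g(x - h)) / (2 * h)


sample_points = [-3.0, -1.5, -0.5, 0.5, 1.5, 3.0]
max_error_f1 = max(abs(central_difference(f, x) - f1(x)) for x in sample_points)
max_error_f2 = max(abs(central_difference(f1, x) - f2(x)) for x in sample_points)

# The positive zero of f'' on the right-hand branch, i.e. the inflection point.
inflection = math.sqrt(math.sqrt(2) + 1)
print(max_error_f1, max_error_f2, inflection, f2(inflection))
```

Both errors should come out tiny, and `f2(inflection)` should vanish up to rounding, which matches the factorisation of $f''$ used in the table of signs.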

Exercise 9.24 Sketch the graph of the function $f(x) = x e^{-1/x}$. In particular, determine
its domain, all asymptotes (vertical, horizontal and skew), all extremal points
(local and global) and where it is convex and concave.

Exercise 9.25 Repeat the previous exercise for the function $f(x) = (x^2 - 1)^{2/3}$.
Hint: The first and second derivatives can be expressed quite nicely; however, the
computation is messy if you are not careful.

Exercise 9.26 In Figure 21, you see a partial plot of the graph of
\[ f(x) = 3 \arctan(x) + \ln(1 + x^2) \]
using 100 data points on the interval $[-3,3]$. Determine the exact location of the local
minimum we see in the graph. Also, determine whether this is an actual global minimum.
That is, study how the function behaves outside of the interval shown in the figure.

Fig. 21. A computer generated plot of $f(x)$ in exercise 9.26.

Example 3: How to prove identities


Note that the following example is similar to exercise 9.6.

Example 9.27 Let us prove that the identity
\[ \ln\left(\frac{1}{x}\right) = -\ln(x), \quad x > 0, \]
is a consequence of the identities $(\ln x)' = 1/x$ and $\ln(1) = 0$. Similar to what we did in
the example on inequalities, the trick is to define
\[ f(x) = \ln\left(\frac{1}{x}\right) + \ln(x) \]
and to prove that $f(x)$ is equal to $0$ for all $x > 0$. So, the question becomes, how to do
this? Well, using the chain rule, we compute the derivative as follows:
\[ f'(x) = \frac{1}{(1/x)} \cdot \left( -\frac{1}{x^2} \right) + \frac{1}{x} = -\frac{1}{x} + \frac{1}{x} = 0. \]
By Corollary 9.3, this means that $f(x)$ is constant. If we put $x = 1$, we see that
\[ f(1) = \ln\frac{1}{1} + \ln 1 = 0 + 0 = 0. \]
That is, $f(x)$ is constantly equal to $0$, and we are done.

Exercise 9.28 As in the above example, use only that $(\ln x)' = 1/x$ and $\ln 1 = 0$ to
prove that for all $a > 0$, we have
\[ \ln(ax) = \ln a + \ln x, \quad x > 0. \]

Exercise 9.29 For which $x$ does it hold that
\[ \arcsin x = \frac{\pi}{2} - \arccos x? \]

Exercise 9.30 (Exam 2004-07-25) Show that
\[ \arctan \sqrt{x^2 - 1} + \arcsin \frac{1}{x} \]
is constant for $x \ge 1$, and determine the value of this constant.

Example 4: How to prove inequalities

Example 9.31 (Exam 2013-05-29, slightly modified)

Prove that for all $x \in \mathbb{R}$:
\[ \frac{\sqrt{3} + x}{1 + x^2} + \arctan(x) \le \sqrt{3} + \frac{\pi}{6}. \]
Here, the trick is to define a "help function"
\[ f(x) = \frac{\sqrt{3} + x}{1 + x^2} + \arctan(x) - \sqrt{3} - \frac{\pi}{6}. \]
Formulated in terms of this function, our task is to show that $f(x) \le 0$ for all $x \in \mathbb{R}$.
We do this by studying the graph of $f$. Notice that since $f$ is an elementary function,
it is continuous on its domain, which we see is all of $\mathbb{R}$. To see where it is differentiable,
we compute $f'$ below.
Our first step to understand the function is to compute its derivative
\[ f'(x) = 2\, \frac{1 - \sqrt{3}\,x}{(1 + x^2)^2} \]
and make the table of signs for its derivative as shown in Figure 22 (we leave the details to the
reader). We clearly see that $x = 1/\sqrt{3}$ is the only local maximum (note that we can ignore
the denominator as it is always strictly positive).

Fig. 22. Table of signs for derivative of $f$.

This means that to figure out if $f(x) \le 0$ for all $x \in \mathbb{R}$, we only need to check that
$f(1/\sqrt{3}) \le 0$. By what we did in Chapter 2, we have $\arctan(1/\sqrt{3}) = \pi/6$. Moreover, a
computation shows that
\[ \frac{\sqrt{3} + \frac{1}{\sqrt{3}}}{1 + \frac{1}{3}} = \sqrt{3}. \]
Piecing this together, we arrive at
\[ f(1/\sqrt{3}) = 0, \]
which proves the inequality.
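For readers who like to double-check such arguments numerically, here is a short sketch that evaluates the help function on a grid and at the critical point. The grid (step 0.01 on the interval from -50 to 50) is an arbitrary choice for this illustration; a finite grid can support the claim but never replaces the proof above.

```python
import math


def f(x):
    # Help function from Example 9.31; the claim is f(x) <= 0 for all real x.
    return (math.sqrt(3) + x) / (1 + x**2) + math.atan(x) - math.sqrt(3) - math.pi / 6


# Evaluate f on a grid and at the stationary point x = 1/sqrt(3).
grid = [k / 100 for k in range(-5000, 5001)]
largest = max(f(x) for x in grid)
at_peak = f(1 / math.sqrt(3))
print(largest, at_peak)
```

The largest grid value should be slightly negative (the grid does not hit $1/\sqrt{3}$ exactly), while `at_peak` should be zero up to rounding errors.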

Exercise 9.32 Show that for all $x \in \mathbb{R}$ it holds that
\[ \frac{\sqrt{3} + x}{1 + x^2} + \arctan(x) > -\frac{\pi}{2}. \]

Exercise 9.33 (Exam 95-05-31) Show that $\ln(1 + 4x) > \arctan 3x$ for all $x > 0$.

Example 5: Determining number of roots for expressions


If we combine the Mean Value Theorem with the Intermediate Value Theorem, we can
determine the number of roots of a function, even if we cannot figure out exactly where
they are. A key observation is contained in the following exercise.
Exercise 9.34 Use the Mean Value Theorem to prove that if f is continuous, differ-
entiable and strictly monotone on an interval I, then f has at most one zero there.

Example 9.35 (Part of problem on exam 1996-01-12) We determine the number of
solutions of the equation
\[ \frac{x^2 - 2^x}{x^2 + 2^x} = 0, \]
and more or less where these are (if any). To this end, we put the left-hand side equal to $f(x)$, and
compute the derivative:
\[ f'(x) = x\, 2^{x+1} \cdot \frac{2 - x \ln 2}{(x^2 + 2^x)^2}. \]

Fig. 23. Table of signs for $f'$.

Since $f$ is differentiable on $\mathbb{R}$, we see from the table of signs that, by the above exercise,
$f$ has at most one zero on each of the intervals $(-\infty, 0]$, $(0, 2/\ln 2)$ and $[2/\ln 2, +\infty)$. That is,
at most 3 zeroes on $\mathbb{R}$. To investigate further, we check that
\[ \lim_{x \to \infty} f(x) = \lim_{x \to \infty} \frac{x^2 - 2^x}{x^2 + 2^x} = -1 < 0, \]
\[ f(0) = -1 < 0 \quad \text{and} \quad f(2/\ln 2) = \frac{\frac{4}{(\ln 2)^2} - 2^{2/\ln 2}}{\frac{4}{(\ln 2)^2} + 2^{2/\ln 2}} > 0, \]
\[ \lim_{x \to -\infty} f(x) = \lim_{x \to -\infty} \frac{x^2 - 2^x}{x^2 + 2^x} = 1 > 0. \]
In particular, this means $f$ changes sign on each of the three intervals mentioned above,
and therefore, by the Intermediate Value Theorem applied to the continuous function
$f$ three times, we find that $f$ has to have at least three zeroes on $\mathbb{R}$. We can therefore
conclude that $f$ has exactly three zeroes on $\mathbb{R}$.
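The sign pattern used in this argument is easy to confirm numerically. In the sketch below, the points $-10$ and $10$ are arbitrary stand-ins for the limits at $\mp\infty$ (an assumption of this illustration, not part of the example), and we also observe that two of the three zeroes can be written down exactly.

```python
import math


def f(x):
    # Left-hand side of the equation in Example 9.35.
    return (x**2 - 2**x) / (x**2 + 2**x)


# Check points: far left, 0, the turning point 2/ln 2, far right.
turning_point = 2 / math.log(2)
checkpoints = [-10.0, 0.0, turning_point, 10.0]
signs = [1 if f(x) > 0 else -1 for x in checkpoints]

# Each change of sign between neighbouring check points traps one zero.
sign_changes = sum(1 for s, t in zip(signs, signs[1:]) if s != t)
print(signs, sign_changes)  # [1, -1, 1, -1] 3
print(f(2), f(4))           # 0.0 0.0
```

The two exact zeroes $x = 2$ and $x = 4$ lie in $(0, 2/\ln 2)$ and $[2/\ln 2, +\infty)$, respectively; the third zero is the one trapped between $-10$ and $0$.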

Exercise 9.36 Let $f(x) = \cos x$ and $g(x) = x$. Prove that $f(x)$ and $g(x)$ intersect
each other on the interval $[0,\pi/2]$ (a) at least once, and (b) at most once.

Exercise 9.37 (Exam 2001-10-31) Determine the number of real roots of
\[ \arctan x + \frac{1}{x + 2} = 1. \]

Example 6: Determining the range of functions


Using the Intermediate Value Theorem in combination with a table of signs for f 0 allows
us to say something about its range. To prepare for this, we formulate an observation in
the following exercise.

Exercise 9.38 Suppose that $f$ is continuous on an interval $I$ with endpoints $a < b$
that need not belong to $I$ or even be finite. Then show the following, where we consider
$f$ with $D_f = I$:

(a) If $f$ attains its maximum and minimum value for points $u, \ell \in I$, respectively,
then $R_f = [c,d]$ with $c = f(\ell)$ and $d = f(u)$.
(b) If $f$ attains neither its maximum nor its minimum value on $I$, but the limits
\[ \lim_{x \to a^+} f(x) \quad \text{and} \quad \lim_{x \to b^-} f(x) \]
both exist or are equal to infinity, then $R_f = (c,d)$ with $c$ and $d$ being equal to
the smallest and largest limit, respectively.
(c) Formulate what happens if, say, $f$ attains its maximum value at some point $u \in I$,
but not its minimum value, and both limits mentioned in (b) exist.

We now apply the results of the above exercise to determine the range of a function.

Example 9.39 Let us determine the range of the function from Example 9.23. Since
it is not continuous at $x = 0$, we need to consider the function restricted to the intervals
$(-\infty, 0)$ and $(0,\infty)$ separately.
Let us first consider $f$ restricted to the interval $(-\infty,0)$. By what was done in
Example 9.23, we see that $f$ attains its maximum value on that interval at the point
$x = -1$, and that $f(-1) = -\pi/2 - 1$. Moreover, we observed that
\[ \lim_{x \to -\infty} f(x) = -\pi \quad \text{and} \quad \lim_{x \to 0^-} f(x) = -\infty. \]
By the above exercise, this means that the range of $f$ when restricted to $(-\infty,0)$ is equal
to $(-\infty, -\pi/2 - 1]$.
Similarly, when we consider $f$ restricted to the interval $(0,\infty)$, we find that its range
is equal to $[\pi/2 + 1,\infty)$. We therefore conclude that $f$ with $D_f = (-\infty,0) \cup (0,\infty)$ has
range
\[ R_f = (-\infty, -\pi/2 - 1] \cup [\pi/2 + 1,\infty). \]

Exercise 9.40 Determine the range of the function from Example 9.35.

9.3 A nice little trick: L’Hopital’s rule

Here is a trick for computing limits that makes life easier. In fact, this trick is so powerful that
some university teachers in Sweden, and even in Lund, refuse to include it in their classes.
The trick is named after the French mathematician Guillaume de l’Hopital who included it
in his textbook on mathematical analysis, which is also considered to be the first textbook on
the subject. Legend has it that l’Hopital bought this, and other results, from poor mathematicians,
hanging out in bars, who were in need of money, and then published them in his own name.

Fig. 24. Guillaume de l’Hopital (1661 – 1704)

Theorem 9.41 (L’Hopital’s rule) Suppose that $f, g$ are differentiable in a punctured
neighbourhood around $x = c$, that $1/g'$ and $1/g$ are defined there and that the limit
\[ \lim_{x \to c} \frac{f'(x)}{g'(x)} \]
exists.
Then, if we have either
\[ \text{(i)} \ \lim_{x \to c} \frac{f(x)}{g(x)} = \left[ \frac{0}{0} \right] \quad \text{or} \quad \text{(ii)} \ \lim_{x \to c} \frac{f(x)}{g(x)} = \left[ \frac{\infty}{\infty} \right], \]
it follows that
\[ \lim_{x \to c} \frac{f(x)}{g(x)} = \lim_{x \to c} \frac{f'(x)}{g'(x)}. \tag{9.2} \]
This is even true for one-sided limits or if $c = \infty$ or $c = -\infty$.

Before discussing how to prove this result, let us illustrate how it works. Here is a
first example:

Example 9.42 Let us use L’Hopital’s rule to compute the familiar limit
\[ \lim_{x \to 0} \frac{\sin x}{x}. \]
Since this expression is of the form $[0/0]$, we want to use L’Hopital’s rule. If we, for
the moment, ignore all conditions we need to check to make sure the rule applies, we see
that formula (9.2) gives that:
\[ \lim_{x \to 0} \frac{\sin x}{x} = \left[ \frac{0}{0} \right] \overset{\text{L’Hop.}}{=} \lim_{x \to 0} \frac{\cos x}{1} = 1. \]
But can we trust this? Well, we need to check that all conditions of L’Hopital’s rule
are satisfied. But this follows since $\sin x$ and $x$ are elementary functions defined in a
neighbourhood of $x = 0$, and since the limit of $\cos x/1$ exists as $x \to 0$. Done!

Let us now consider an example of a limit of the form $[\infty/\infty]$.

Example 9.43 We now use L’Hopital’s rule to compute the limit
\[ \lim_{x \to \infty} \frac{x^2}{e^x}. \]
This limit is of the form $[\infty/\infty]$. As in the previous example, we start by applying
formula (9.2) of L’Hopital’s rule without really worrying about whether the involved limits
exist:
\[ \lim_{x \to \infty} \frac{x^2}{e^x} = \left[ \frac{\infty}{\infty} \right] \overset{\text{L’Hop.}}{=} \lim_{x \to \infty} \frac{2x}{e^x}. \]
This limit is also of the type $[\infty/\infty]$. So, what to do? Well, again, let us hope for the
best and apply (9.2) to get:
\[ \lim_{x \to \infty} \frac{2x}{e^x} = \left[ \frac{\infty}{\infty} \right] \overset{\text{L’Hop.}}{=} \lim_{x \to \infty} \frac{2}{e^x} = \left[ \frac{2}{\infty} \right] = 0. \]
Ok, so now we ended up with a concrete value, which is nice. But can we trust the
computation? That is, are the conditions of L’Hopital’s rule met? Now, the crucial
condition in each step is whether the limit of $f'/g'$ exists. As it turns out, we can justify
this condition "backwards". Indeed, in the final step, we verified that the limit of $2/e^x$
exists, and therefore the second application of L’Hopital’s rule is justified. But this
means that the limit of $2x/e^x$ exists, and therefore the first application of L’Hopital’s
rule is justified, and all is good!

Finally, we look at an example where not checking the condition that the limit f 0 /g 0
exists does get us into trouble.

Example 9.44 Let us try to use L’Hopital’s rule to compute the limit
\[ \lim_{x \to \infty} \frac{x + \sin x}{x}. \]
As in the previous examples, let us just use L’Hopital’s rule and worry about conditions
later. This gives:
\[ \lim_{x \to \infty} \frac{x + \sin x}{x} = \left[ \frac{\infty}{\infty} \right] \overset{\text{L’Hop.}}{=} \lim_{x \to \infty} \frac{1 + \cos x}{1}. \]
Now, since $\cos x$ diverges as $x \to \infty$, this limit does not exist. But what does this say
about our original expression? Well, not so much. Indeed, here is what happens when
we try to compute it without using L’Hopital’s rule:
\[ \lim_{x \to \infty} \frac{x + \sin x}{x} = \lim_{x \to \infty} \left( 1 + \frac{\sin x}{x} \right) = 1 + 0 = 1. \]
That is, in this case, L’Hopital’s theorem gives us the wrong answer. And the reason is
that the limit of $f'/g'$ does not exist, and so L’Hopital’s rule does not apply here.
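The contrast between the two quotients in this example can also be seen numerically. The sketch below evaluates the original quotient at increasing powers of ten and the quotient of derivatives at even and odd multiples of $\pi$; the sample points and the value of `n` are arbitrary choices made for this illustration.

```python
import math


def ratio(x):
    # The original quotient (x + sin x) / x.
    return (x + math.sin(x)) / x


def derivative_ratio(x):
    # The quotient of derivatives (1 + cos x) / 1.
    return (1 + math.cos(x)) / 1


# The original quotient settles down near 1 ...
ratio_values = [ratio(10.0**k) for k in range(1, 7)]

# ... while the quotient of derivatives keeps returning to both 2 and 0.
n = 1000
at_even_multiples = derivative_ratio(2 * n * math.pi)
at_odd_multiples = derivative_ratio((2 * n + 1) * math.pi)
print(ratio_values[-1], at_even_multiples, at_odd_multiples)
```

However large we take `n`, the quotient of derivatives returns to values near 2 and near 0, which is why its limit does not exist.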

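Exercise 9.45 Use L’Hopital’s rule to compute the following limits.
\[ \text{(i)} \ \lim_{x \to 0} \frac{\tan x}{x} \qquad \text{(ii)} \ \lim_{x \to 0} \frac{\arctan x}{x} \]
\[ \text{(iii)} \ \lim_{x \to 0} \frac{\ln(1 + x)}{x} \qquad \text{(iv)} \ \lim_{x \to 0} \frac{e^x - 1}{x} \]
\[ \text{(v)} \ \lim_{x \to 0} \frac{\sin x}{x} \qquad \text{(vi)} \ \lim_{x \to 0^+} x \ln x. \]
Hint: In (vi), start by rewriting the expression on the form $[\infty/\infty]$.

Exercise 9.46 Is the following computation correct?
\[ \lim_{x \to 0} \frac{\sin x}{x^3} \overset{\text{L’Hop.}}{=} \lim_{x \to 0} \frac{\cos x}{3x^2} \overset{\text{L’Hop.}}{=} \lim_{x \to 0} \frac{-\sin x}{6x} \overset{\text{L’Hop.}}{=} \lim_{x \to 0} \frac{-\cos x}{6} = -\frac{1}{6}. \]

Exercise 9.47 What happens if you try to use L’Hopital’s rule to compute the limit
\[ \lim_{x \to \infty} \frac{x}{\sqrt{x^2 - 1}}? \]

Exercise 9.48 Compare what happens if you try to compute the limit
\[ \lim_{x \to 0} \frac{f(x)}{g(x)}, \]
with $f(x) = x^2 \sin(1/x)$ and $g(x) = x$, using (i) L’Hopital’s rule, and (ii) without
using L’Hopital’s rule. Does this make sense?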

A naive “proof” in the $[0/0]$ case in which $c$ is a finite point goes something like this:
\[ \lim_{x \to c} \frac{f(x)}{g(x)} = \lim_{x \to c} \frac{\dfrac{f(x) - f(c)}{x - c}}{\dfrac{g(x) - g(c)}{x - c}} = \frac{\lim\limits_{x \to c} \dfrac{f(x) - f(c)}{x - c}}{\lim\limits_{x \to c} \dfrac{g(x) - g(c)}{x - c}} = \frac{f'(c)}{g'(c)} = \lim_{x \to c} \frac{f'(x)}{g'(x)}. \]
The problem with this argument is that L’Hopital’s rule is supposed to work even if $g$ and
$f$ are not differentiable at the point $x = c$ itself (for instance, this is the case in part (vi)
of exercise 9.45). One elegant way to get around this is to use the so-called Generalised
Mean Value Theorem due to Cauchy (you are asked to prove it in exercise 9.55).

Proposition 9.49 (The Generalised Mean Value Theorem for Derivatives)
Suppose that the functions $f, g$ are continuous on $[a,b]$, differentiable on $(a,b)$ and $g'(x) \ne 0$
on $(a,b)$. Then there exists (at least) a point $c \in (a,b)$ such that
\[ \frac{f'(c)}{g'(c)} = \frac{f(b) - f(a)}{g(b) - g(a)}. \]

Exercise 9.50 In this exercise, we ask you to prove L’Hopital’s rule in the case when
$x$ tends to a finite number.

(a) Prove that L’Hopital’s rule is a consequence of the above proposition in the
case where $c$ is a finite number, and the limit $f/g$ is of the type $[0/0]$.
(b) Deduce from (a) that L’Hopital’s rule holds for limits of the type $[0/0]$ when
$x \to \infty$.

Hint: In (b), just use a change of variables.

Exercise 9.51 (a) It would seem that Proposition 9.49 has a problem for functions
g such that g(a) = g(b). Explain why this potential problem is already taken
care of by the hypothesis of the proposition.
(b) Some textbooks on Calculus formulate Proposition 9.49 in such a way that there
is no problem if g(a) = g(b). Can you suggest such a formulation?

9.4 A closer look at the Mean Value Theorem


We now consider the question of how to prove the Mean Value Theorem.

Theorem 9.2 (The Mean Value Theorem for the derivative) Suppose that a
function $f$ is continuous on $[a,b]$ and differentiable on $(a,b)$. Then there exists (at least)
a point $c \in (a,b)$ such that
\[ f'(c) = \frac{f(b) - f(a)}{b - a}. \]

Rolle’s theorem
The first step to proving the Mean Value Theorem is to establish a baby-version usually
called Rolle’s theorem. We choose to call it a “lemma”, since this is the customary word
to use for results that are purely preparatory in nature.

Lemma 9.52 (Rolle’s theorem) Suppose that $f$ is continuous on $[a,b]$ and differentiable
on $(a,b)$. Also, suppose that $f(a) = f(b) = 0$. Then there exists a point $c \in (a,b)$
such that $f'(c) = 0$.

Proof: Let $f$ be a function satisfying the hypotheses of Rolle’s theorem. In particular,
$f$ is continuous on $[a,b]$, and so by the Min-max theorem, there exist $u, \ell \in [a,b]$ so that
$f(\ell) \le f(x) \le f(u)$ for all $x \in [a,b]$. Now, we consider the following cases:
Case 1: Both $u$ and $\ell$ are at the end-points of $[a,b]$. Since $f(a) = f(b) = 0$, this means that
$f(\ell) = f(u) = 0$, and so, by what we said above, $f(x) = 0$ for all $x \in [a,b]$. In particular,
this means that $f$ is constant on $[a,b]$, and so $f'(c) = 0$ for all $c \in (a,b)$, and we are done.
Case 2: At least one of $u$ or $\ell$ is an inner point of $[a,b]$. By Fermat’s theorem
(Theorem 9.13), this means that this point must be stationary, and we are done.

Fig. 25. Rolle’s theorem is the Mean Value Theorem when $f(a) = f(b) = 0$.

Exercise 9.53 In Rolle’s theorem we made the assumption that f (a) = f (b) = 0. Is
this needed for the conclusion to be true? Motivate your answer with either a proof
or an example, as appropriate.

Proof that the Mean Value Theorem follows from Rolle’s theorem
Suppose that we are given a function $f$ that is continuous on $[a,b]$ and differentiable on $(a,b)$.
The point of the proof is to create a situation where we can apply Rolle’s theorem.
To this end, we define the following function
\[ h(x) \overset{\text{def}}{=} f(x) - \left( f(a) + \frac{f(b) - f(a)}{b - a}\,(x - a) \right). \]
In Figure 26, we have illustrated the meaning of the function $h(x)$. Indeed, its value is equal to
the height difference between $f(x)$ and the secant line through $(a,f(a))$ and $(b,f(b))$. Notice that we
are not using absolute values, so $h(x) > 0$ whenever $f(x)$ lies above the secant line, $h(x) = 0$
at the points where $f(x)$ crosses the secant line, and $h(x) < 0$ whenever $f(x)$ lies below the secant
line.

Fig. 26. Illustration of the function $h(x)$.

The function $h(x)$ is continuous on $[a,b]$ and differentiable on $(a,b)$ (why?), and also
satisfies $h(a) = h(b) = 0$ (we see this from the figure, but can also verify this directly by
putting $x = a$ and $x = b$ into the formula for $h(x)$). Hence, Rolle’s theorem implies that
there exists a point $c \in (a,b)$ so that $h'(c) = 0$.
But notice that the derivative of $h(x)$ is given by:
\[ h'(x) = f'(x) - \frac{f(b) - f(a)}{b - a}. \]
From this, it follows that by putting $x = c$ and using $h'(c) = 0$, we obtain
\[ f'(c) = \frac{f(b) - f(a)}{b - a}. \]
This ends the proof.
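To see the help function $h$ at work, here is a sketch for one concrete choice of $f$ and $[a,b]$. We reuse $f(x) = x^3 - 3x + 1$ on $[-2,3]$ from Example 9.12; this choice, and solving $h'(c) = 0$ by hand for this particular polynomial, are assumptions of the illustration and not part of the proof.

```python
import math


def f(x):
    return x**3 - 3*x + 1


def f_prime(x):
    return 3*x**2 - 3


a, b = -2.0, 3.0
slope = (f(b) - f(a)) / (b - a)  # slope of the secant line, here 4.0


def h(x):
    # Height difference between the graph of f and the secant line.
    return f(x) - (f(a) + slope * (x - a))


def h_prime(x):
    return f_prime(x) - slope


# Solving 3c^2 - 3 = 4 by hand gives c = +-sqrt(7/3); both lie in (a, b).
c_values = [-math.sqrt(7 / 3), math.sqrt(7 / 3)]
print(h(a), h(b), [h_prime(c) for c in c_values])
```

In this case there are two points where the tangent is parallel to the secant line, which illustrates the words "at least" in the statement of the Mean Value Theorem.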

Exercise 9.54 Motivate why the function h(x) is continuous on [a,b] and differen-
tiable on (a,b).
Hint: Use Proposition 6.52.

Exercise 9.55 Modify the above proof to prove the Generalised Mean Value Theorem
for derivatives (Proposition 9.49).
Hint: The point is to modify the help function so that when we apply Rolle’s theorem,
the Generalised Mean Value Theorem follows.

9.5 Proofs of a few deep theorems related to continuity


In the proof of Rolle’s theorem, we used the Min-max theorem for the first time (at least
in terms of the theory), which we stated without proof in Chapter 6. To make sure that
Rolle’s theorem is true, we now take the time to prove the Min-max theorem, as well
as its close relative: the Intermediate Value Theorem. For convenience, we restate both
results here:

Theorem 6.7 (The Intermediate Value Theorem) Suppose that $f$ is continuous
on the finite and closed interval $[a,b]$. If $f(a)$ and $f(b)$ have opposite signs, then there
exists (at least) one number $c \in (a,b)$ so that $f(c) = 0$.

Theorem 6.8 (The Min-max theorem) Suppose that $f$ is continuous on the finite
and closed interval $[a,b]$. Then there exist two points $\ell, u \in [a,b]$ so that for
all $x \in [a,b]$ we have
\[ f(\ell) \le f(x) \le f(u). \]

Since some students find the logic of this chapter a bit hard to follow, here is an
overview that indicates how the main results of this chapter are connected (some of
these results will be formulated for the first time in the following pages):

Remark 9.56 (Overview of the theory in this chapter)

Completeness axiom (in the form of the Balloon lemma)
⇓
Bolzano-Weierstrass theorem (inspired by proof of Intermediate value theorem)
⇓
Bounded function theorem
⇓
Min-max theorem (proof also needs another use of Bolzano-Weierstrass)
⇓
Rolle’s theorem
⇓
Mean value theorem

Finally, once we have the Mean value theorem, we get Corollary 9.3 which is our main
tool for studying the graphs of functions using the derivative (a technique that also allows
us to establish identities and inequalities).
276 CHAPTER 9. MORE ON FUNCTIONS AND LIMITS

The Intermediate Value Theorem


Before we give the proof, we try to explain the idea. Suppose that $f$ satisfies the
hypothesis of the Intermediate Value Theorem and that its graph looks something like
the one in Figure 27. That is, it is continuous (no teleporting!) and has different signs
on the two endpoints. Our goal is to prove that it has to cross the x-axis in at least one
point. Our strategy is to divide and conquer.

Fig. 27. A continuous function with endpoint values of opposite signs.

Here is an intuitive explanation of how this would look for this specific $f$:
Step 1: We put $m = (a + b)/2$. This is the midpoint of $[a,b]$. We check the sign of
$f(m)$, which turns out to be negative. This means that $f(m)$ and $f(a)$ have the same
sign, while $f(m)$ and $f(b)$ have opposite signs. Therefore, we put $a_1 = m$ and $b_1 = b$,
and we look for a zero on $[a_1,b_1]$ in the next step.
Step 2: We put $m_1 = (a_1 + b_1)/2$. This is the midpoint of $[a_1,b_1]$. This time we see
that $f(m_1)$ and $f(a_1)$ have opposite signs. Therefore, we put $a_2 = a_1$ and $b_2 = m_1$,
and we look for a zero on $[a_2,b_2]$ in the next step.
Step 3: We put $m_2 = (a_2 + b_2)/2$. This is the midpoint of $[a_2,b_2]$. This time we see
that $f(m_2)$ and $f(b_2)$ have opposite signs. Therefore, we put $a_3 = m_2$ and $b_3 = b_2$,
and look for a zero on $[a_3,b_3]$ in the next step.

Fig. 28. Above, we see illustrations of steps 1, 2 and 3.
The point of the above steps is to identify smaller and smaller intervals where we
hope to capture a zero of our function. Here is how to turn this "strategy" into a proof:
Proof of the Intermediate Value Theorem. So, suppose that f is continuous on [a,b] and
that f (a) and f (b) have different signs. Let us first consider the case when f (a) < 0 and
f (b) > 0. So, what to do? Well, the key is to think like a programmer.
First, we put m = (a + b)/2 and tell our imaginary "computer" to check the sign of
f (m). One of the three following cases will now occur:

Case 1: f (m) = 0, Case 2: f (m) > 0, or Case 3: f (m) < 0.

We instruct our imaginary computer to react as follows:

• Case 1: We found a zero! We end the search and return the value m as our c.
• Case 2: f (a) and f (m) have opposite signs, and we put a1 = a and b1 = m.
• Case 3: f (m) and f (b) have opposite signs, and we put a1 = m and b1 = b.
What happens now? Well, if Case 1 applies, we are done. So let us assume that one of
cases 2 or 3 holds. In either case, we end up with a new interval $[a_1,b_1]$ for which
$$\text{(i)}\ f(a_1) < 0, \qquad \text{(ii)}\ f(b_1) > 0, \qquad \text{(iii)}\ |b_1 - a_1| = \frac{|b - a|}{2}.$$
But this means that the function $f$ satisfies exactly the same hypotheses on $[a_1,b_1]$ as on
$[a,b]$. This allows us to repeat the above process on $[a_1,b_1]$ to produce an interval that
we now call $[a_2,b_2]$. Continuing in this way, we obtain a sequence of nested intervals
$$[a,b] \supset [a_1,b_1] \supset [a_2,b_2] \supset [a_3,b_3] \supset \cdots$$
for which
$$\text{(i)}\ f(a_n) < 0, \qquad \text{(ii)}\ f(b_n) > 0, \qquad \text{(iii)}\ |b_n - a_n| = \frac{|b - a|}{2^n}.$$
Now, one of two things may happen. Either case 1 kicks in after a finite number of
steps, and we have found a zero of f (and the proof ends). Or it does not, and we end
up with an infinite sequence of intervals [an ,bn ]. The point is now to prove that even if
the process does not stop after a finite number of steps, we still find a zero of f .

To do this, we first make the observation that since the intervals $[a_n,b_n]$ are contained
in each other, as indicated above, it follows that:

• $(a_n)_{n=1}^\infty$ is increasing and bounded above by $b$.
• $(b_n)_{n=1}^\infty$ is decreasing and bounded below by $a$.

Fig. 29. Here, we illustrate the relative placement of the intervals.

But this implies, by the Balloon lemma (Proposition 5.25), that the sequence $a_n$
converges to some number $A$, and that the sequence $b_n$ converges to some number $B$. In
fact, $A = B$, as follows from the computation
$$B - A = \lim_{n\to\infty} b_n - \lim_{n\to\infty} a_n = \lim_{n\to\infty} (b_n - a_n) = \lim_{n\to\infty} \frac{b - a}{2^n} = 0.$$
This means that we can put $c = A = B$. Moreover, since the $(a_n)_{n=1}^\infty$ are bounded above
by $b$, and the $(b_n)_{n=1}^\infty$ are bounded below by $a$, it follows that $A \leq b$ and $B \geq a$, and so
$c \in [a,b]$.
The last step of this proof is to show that $f(c) = 0$. To this end, recall that $f$ is
continuous on $[a,b]$, and that $f(a_n) < 0$ and $f(b_n) > 0$ for all $n$. From this, we obtain
both
$$f(c) = f\left(\lim_{n\to\infty} a_n\right) = \lim_{n\to\infty} f(a_n) \leq 0,$$
and
$$f(c) = f\left(\lim_{n\to\infty} b_n\right) = \lim_{n\to\infty} f(b_n) \geq 0.$$
In other words, $0 \leq f(c) \leq 0$, and so we conclude that $f(c) = 0$. Since $f(a)$ and $f(b)$
are both non-zero, $c$ cannot be an endpoint, so in fact $c \in (a,b)$. Finally, the remaining
case $f(a) > 0$ and $f(b) < 0$ follows by applying the above argument to the function $-f$.
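Since the proof asks us to "think like a programmer", the divide and conquer procedure can also be written out as an actual program. The following is only a sketch under the same assumption as in the proof, namely $f(a) < 0$ and $f(b) > 0$; the function name `bisect_zero`, the fixed number of steps and the test function $x^2 - 2$ are choices made here for illustration.

```python
def bisect_zero(f, a, b, steps=60):
    """Halve [a, b] repeatedly while keeping f(a_n) < 0 < f(b_n), as in the proof."""
    if not (f(a) < 0 < f(b)):
        raise ValueError("this sketch assumes f(a) < 0 and f(b) > 0")
    for _ in range(steps):
        m = (a + b) / 2
        fm = f(m)
        if fm == 0:      # Case 1: we found a zero and stop the search
            return m
        if fm > 0:       # Case 2: f(a) and f(m) have opposite signs, keep [a, m]
            b = m
        else:            # Case 3: f(m) and f(b) have opposite signs, keep [m, b]
            a = m
    # After n steps the interval has length (b - a) / 2**n, so its midpoint is
    # within that distance of a zero of f.
    return (a + b) / 2


root = bisect_zero(lambda x: x * x - 2, 0.0, 2.0)
print(root)  # an approximation of the positive zero of x^2 - 2
```

Note the difference between the program and the proof: the program stops after finitely many steps and returns an approximation, while the proof uses the Balloon lemma and continuity to show that the limit of the endpoints is an exact zero.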
The Bolzano-Weierstrass Theorem


Before tackling the Min-Max Theorem, we are going to make a slight detour.

Theorem 9.57 (Bolzano-Weierstrass’ Theorem) Every sequence $(c_n)_{n=0}^\infty$ of points
in an interval of the form $[a,b]$ has a convergent subsequence with limit in $[a,b]$.

The Bolzano-Weierstrass theorem is named after its discoverers Karl Weierstrass, a
famous German mathematician, and Bernard Bolzano, the Czech mathematician who was
the first to properly define what we mean by the limit. Weierstrass was essentially
responsible for discovering what Bolzano had done (no one had noticed at the time) and
for making sure he got proper credit... posthumously.

Fig. 30. Karl Weierstrass (1815–1897)

The Bolzano-Weierstrass Theorem can be thought of as an abstract way of capturing the
"divide and conquer" method from the proof of the Intermediate Value Theorem. This
result survives generalisation to many settings, and is an important tool in many areas
of mathematics (it leads to a concept called "compactness"). In these lecture notes, we
use it (twice) to prove the Min-max theorem as well as a key result on definite integrals.
Let us now explain what we mean by a subsequence. Informally, we say that a
sequence $(d_k)_{k=1}^\infty$ is a subsequence of a sequence $(c_n)_{n=1}^\infty$ if:

(i) $(d_k)_{k=1}^\infty$ only consists of elements taken from the sequence $(c_n)_{n=1}^\infty$.

(ii) $(d_k)_{k=1}^\infty$ respects the order of the sequence $(c_n)_{n=1}^\infty$.

We illustrate what we mean by this in an example:

Example 9.58 Let $(c_n)_{n=1}^\infty = (1/n)_{n=1}^\infty$. Then $(d_k)_{k=1}^\infty = (1/2^k)_{k=1}^\infty$ is a subsequence of
$(c_n)_{n=1}^\infty$. To see this a bit more clearly, let us write out the first few terms of the two
sequences as follows:
$$1,\ \underbrace{\frac{1}{2}}_{=d_1},\ \frac{1}{3},\ \underbrace{\frac{1}{4}}_{=d_2},\ \frac{1}{5},\ \frac{1}{6},\ \frac{1}{7},\ \underbrace{\frac{1}{8}}_{=d_3},\ \frac{1}{9},\ \ldots$$

As we see, the sequence $d_1, d_2, d_3, \ldots$ only consists of numbers taken from the list
$c_1, c_2, c_3, \ldots$ and the order of the original sequence has been respected.

In the above example, notice that $d_1 = c_2$, $d_2 = c_4$, $d_3 = c_8$ and so forth. In fact,
$d_k = c_{2^k}$. This means that if we write $n_k = 2^k$, then
$$d_k = c_{n_k}.$$
This is a typical way of expressing subsequences. Indeed, so much so, that we take the
following to be our formal definition of a subsequence:

Definition 9.59 If $(n_k)_{k=1}^\infty$ is a strictly increasing sequence of integers, then we say
that $(c_{n_k})_{k=1}^\infty$ is a subsequence of $(c_n)_{n=1}^\infty$.
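To make the definition concrete, the subsequence from Example 9.58 can be written as a short program. This is only an illustrative sketch: the names `c`, `n_index` and `d` are chosen here, and exact fractions are used so that no rounding is involved.

```python
from fractions import Fraction


def c(n):
    """The original sequence c_n = 1/n for n = 1, 2, 3, ..."""
    return Fraction(1, n)


def n_index(k):
    """The index sequence n_k = 2^k, which is strictly increasing."""
    return 2 ** k


def d(k):
    """The subsequence d_k = c_{n_k}."""
    return c(n_index(k))


indices = [n_index(k) for k in range(1, 6)]
terms = [d(k) for k in range(1, 6)]
print(indices)  # the positions in (c_n) from which the d_k are picked
print(terms)    # the corresponding entries 1/2, 1/4, 1/8, ...
```

The only thing the definition asks of the index sequence is that it is strictly increasing; any other strictly increasing choice of `n_index` would give another subsequence of the same sequence.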

Exercise 9.60 Prove that if $(x_{n_k})_{k=1}^\infty$ is a subsequence of $(x_n)_{n=1}^\infty$, then $n_k \geq k$.

Exercise 9.61 Suppose that $c_n = 1/(2n)$ for $n = 1, 2, \ldots$. Write the following entries
from a subsequence $d_k$ of $c_n$ in the form $c_{n_k}$.
$$d_1 = \frac{1}{2^2}, \quad d_2 = \frac{1}{10}, \quad d_3 = \frac{1}{5!}, \quad d_4 = \frac{1}{2^{10}}.$$
Remark: The point is to figure out suitable values for the $n_k$ for these four terms.

Now that we know what a subsequence is, we can finally understand what the
Bolzano-Weierstrass theorem actually says. Indeed, it says that no matter the sequence
$c_1, c_2, c_3, \ldots$ of points from $[a,b]$, you can always pick a subsequence that converges.

Fig. 31. The first terms of the sequence.
Proof of the Bolzano-Weierstrass Theorem. Suppose that $(c_n)_{n=0}^\infty$ is a sequence of points
from $[a,b]$. In constructing a convergent subsequence $d_k$, our first choice is to put $d_0 = c_0$.
To choose $d_1$, we apply the method of "divide and conquer". As in the proof of the
Intermediate Value Theorem, we put $m = (a + b)/2$ and consider the following cases:

• Case 1: Only the half $[a,m]$ has an infinite number of entries of $(c_n)_{n=0}^\infty$. Denote
by $n_1$ the index of the first $c_n$ with $n \geq 1$ to appear inside this interval. We choose
$d_1 = c_{n_1}$ and put $a_1 = a$, $b_1 = m$.
• Case 2: Only the half $[m,b]$ has an infinite number of entries of $(c_n)_{n=0}^\infty$. Denote
by $n_1$ the index of the first $c_n$ with $n \geq 1$ to appear inside this interval. We choose
$d_1 = c_{n_1}$ and put $a_1 = m$, $b_1 = b$.
• Case 3: Both halves $[a,m]$ and $[m,b]$ have an infinite number of entries from
$(c_n)_{n=0}^\infty$. In this case, we choose $d_1$, $a_1$ and $b_1$ exactly as in Case 1.

Notice that at least one of these cases has to hold true. Indeed, if none of them is true,
then all of $[a,b]$ only contains a finite number of terms from $(c_n)_{n=0}^\infty$. But this is absurd
since, by assumption, the sequence has an infinite number of entries from $[a,b]$.

Fig. 32. One half of the interval contains an infinite number of entries.
Whichever case holds, we have now chosen $d_0$ and $d_1$. To choose $d_2$, we consider
cases 1, 2, and 3 above, modified so that we seek the first entry $c_{n_2}$ in the relevant half
of $[a_1,b_1]$ so that $n_2 > n_1$ (we need this condition for the $c_{n_k}$ to be a subsequence of $c_n$).
If we keep on repeating this process, we end up with an infinite chain of intervals

$$[a,b] \supset [a_1,b_1] \supset [a_2,b_2] \supset [a_3,b_3] \supset \cdots$$

and a strictly growing sequence of indices

$$n_1 < n_2 < n_3 < \cdots.$$

In particular, this means that the sequence $(d_k)_{k=0}^\infty$ where $d_k = c_{n_k}$ is a subsequence of
$(c_n)_{n=0}^\infty$. Moreover, we are in a good position to prove that $d_k$ converges to some limit
$L \in [a,b]$. Since this is done by repeating, word by word, the last lines of the proof of
the Intermediate Value Theorem, we leave this to the reader.

Exercise 9.62 Complete the proof of the Bolzano-Weierstrass Theorem.

For the interested student, we also mention a completely different way of proving the
Bolzano-Weierstrass theorem. It is based on the following, rather interesting, lemma.

Lemma 9.63 Every sequence $(c_n)_{n=1}^\infty$ has (at least) one monotone subsequence.

We ask you to explore this strategy in the following exercise.

Exercise 9.64 (Challenge)

(a) Explain how the Bolzano-Weierstrass theorem follows from the lemma.
(b) To prove the lemma itself, suppose that $(c_n)_{n=1}^\infty$ has no decreasing subsequence.
Use this assumption to prove that there can only be a finite number of $n$ with
the property that for all $m \geq n$ we have $c_m \leq c_n$ (do this by contradiction).
(c) Use what you proved in (b) to prove that $(c_n)_{n=1}^\infty$ must have an increasing
subsequence.
(d) Explain why combining (a), (b) and (c) proves the Bolzano-Weierstrass theorem.

Remark: Most (all?) proofs of the Bolzano-Weierstrass theorem on YouTube follow
this strategy (it is sometimes called the "room with a view" strategy – or something to
this effect – since if $n$ satisfies the property in (b), then $c_n$ has a nice view!).
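To get a feeling for the property in (b), one can compute the indices "with a view" for a finite list of numbers. This is only a finite analogue and not a proof: in a finite list the last index always has a view, whereas the lemma concerns infinite sequences. The function name `peak_indices` and the sample list are choices made for this sketch.

```python
def peak_indices(c):
    """Return the indices n such that c[m] <= c[n] for all m >= n."""
    peaks = []
    running_max = float("-inf")
    # Scan from the right, so that running_max is the largest entry
    # strictly to the right of the current position.
    for n in range(len(c) - 1, -1, -1):
        if c[n] >= running_max:
            peaks.append(n)
            running_max = c[n]
    return peaks[::-1]


c = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]
peaks = peak_indices(c)
view = [c[n] for n in peaks]
print(peaks)  # indices with a view
print(view)   # the entries at those indices never increase
```

The entries picked out in this way never increase. This is the mechanism behind the exercise: if infinitely many indices have a view, they give a monotone subsequence, and if only finitely many do, one can step past the last of them and build an increasing subsequence instead.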
The Min-Max Theorem


We now turn to the final result of this section, the Min-Max Theorem. The first step to
proving this result is to establish the following lemma (it will act like a stepping stone
to the Min-Max Theorem, sort of like how Rolle’s Theorem was a stepping stone to the
Mean Value Theorem).

Lemma 9.65 (The Bounded Function Theorem) Suppose $f(x)$ is continuous on
the finite and closed interval $[a,b]$. Then there exists a constant $C > 0$ so that

$$-C \leq f(x) \leq C, \quad \forall x \in [a,b].$$

Before we give the proof, note how this lemma says less than the Min-Max Theorem.
Indeed, we are not claiming that the function $f$ is able to attain the values $-C$ and $C$
at some points on the interval $[a,b]$. Instead, we are merely saying that the function is
bounded by these constants.
Proof of the lemma. We begin by proving the existence of an upper bound for $f(x)$. We
do this by a contradiction argument involving the Bolzano-Weierstrass theorem. To this
end, suppose that $f(x)$ is not bounded from above. That is, for all $n \in \mathbb{N}$ there exists
a $c_n \in [a,b]$ so that

$$f(c_n) \geq n. \tag{9.3}$$

This is illustrated in Figure 33.

Fig. 33. $f(x)$ has to take at least one value above each of the lines $y = 1$, $y = 2$, $y = 3, \ldots$

Applying the Bolzano-Weierstrass theorem to the sequence $(c_n)_{n=1}^\infty$, we obtain a
subsequence $(d_k)_{k=1}^\infty$ which converges to some limit $L$ in $[a,b]$.
Next, notice that by (9.3), we have
$$f(d_k) = f(c_{n_k}) \geq n_k. \tag{9.4}$$
By exercise 9.60 and inequality (9.4), it follows that for all $k \in \mathbb{N}$, we have

$$f(d_k) \geq k.$$

Moreover, taking into account the continuity of $f(x)$, we obtain

$$f(L) = f\left(\lim_{k\to\infty} d_k\right) = \lim_{k\to\infty} f(d_k) \geq \lim_{k\to\infty} k = \infty.$$

But this is absurd since $L \in D_f$ and so, in particular, $f(L)$ has to be a finite number.
We conclude that $f(x)$ has to be bounded above by some constant $C > 0$.
Exercise 9.66 Prove the second half of the lemma. That is, prove that the function
must have a bound from below.
Hint: Using the part we already proved, this can be done in one or two lines.

We now turn to the proof of the Min-Max Theorem itself. Since the argument follows
the same pattern as the ones we have seen above, we only outline the proof.

Outline of the proof of the Min-Max Theorem. We restrict ourselves to showing that
there exists a number $u \in [a,b]$ so that $f(x) \leq f(u)$ for all $x \in [a,b]$. That is, $f$ attains
its global maximum.
Our plan is to apply the Bolzano-Weierstrass theorem to find a sequence $d_k$ converging
to a number $u$ so that $f(u)$ is the global maximum of $f$. This we can do in the
following way:

Step 1. Establish that the set $M = \{f(x) : x \in [a,b]\}$ has a least upper bound. Denote
this least upper bound by $L$.
Step 2. Prove that there exists a sequence $c_n$ so that for every $n \in \mathbb{N}$, we have
$f(c_n) \geq L - 1/n$.
Step 3. Use Bolzano-Weierstrass to prove the existence of a number $u$ so that $f(u) \geq L$.
Step 4. Conclude that $f(u) = L$.

Fig. 34. Illustration for the proof of the Min-Max Theorem.

As in the proof of the previous lemma, the existence of a point such that the global
minimum is attained can be obtained in a line or two once the first part of the theorem
has been established.

Exercise 9.67 Complete the proof of the Min-Max theorem.

(a) Do steps 1 and 2.


(b) Do steps 3 and 4.
(c) Prove that the existence of a point so that the global minimum is attained follows
from the first part of the theorem.
Continuity of inverse functions


Let us begin by restating the following result, announced already in Chapter 6 (recall
Proposition 6.54 – see also exercise 9.55).

Proposition 9.68 Suppose that $f$ is defined, continuous and invertible on an interval.
Then $f^{-1}$ is continuous.

We now indicate how this proposition is a consequence of the Bolzano-Weierstrass
theorem. First of all, denote the domain of $f$ by $D_f = [a,b]$. By combining the Min-max
theorem and the Intermediate value theorem, we obtain that the range is of the form
$R_f = [c,d]$. We point out that this means that $D_{f^{-1}} = R_f = [c,d]$ and $R_{f^{-1}} = D_f = [a,b]$.
We now formulate the proof strategy:
Step 1: To get a contradiction, suppose that $y$ is a number in $[c,d]$ where $f^{-1}$ is
not continuous. Write $f^{-1}(y) = x$.
Step 2: Show that this means that there has to exist a number $\epsilon > 0$ and a
sequence $y_n$ in $[c,d]$ so that $y_n \to y$ but $|f^{-1}(y_n) - f^{-1}(y)| \geq \epsilon$ for all $n$. Write $f^{-1}(y_n) = x_n$.
Step 3: Use the Bolzano-Weierstrass theorem to obtain a subsequence $x_{n_k}$ of
$x_n$ that converges to some number $L \in [a,b]$.
Step 4: Finally, to obtain a contradiction, combine the continuity and invertibility
of $f$ to conclude that $x = L$.

Fig. 35. Illustration of the situation of the above proof. The key to getting the
contradiction is to show that there exists a sequence $y_n$ that converges to $y$, but so
that the sequence $x_n = f^{-1}(y_n)$ stays away from $x = f^{-1}(y)$.

Exercise 9.69 Complete the proof of the above proposition by filling out the above
steps. In particular, point out why we arrive at a contradiction in Step 4.
Exercise 9.70 (Challenge) Prove that if f is defined, continuous and invertible on
an interval, then f has to be monotone.
Hint: Use exercise 2.44.
9.6 Exercises from previous exams


Exercise 9.71 (Exam 2016-01-23)

(a) Formulate the Mean Value Theorem.


(b) Show that if $f'(x) = 0$ on an interval, then $f(x)$ is constant there.
(c) Show that
$$2\arctan\left(x - \sqrt{x^2 - 1}\right) + \arctan\sqrt{x^2 - 1} = \frac{\pi}{2}, \quad x \geq 1.$$

Exercise 9.72 (Exam 2016-01-07)

(a) Use implicit differentiation to prove that

$$\frac{d}{dx}\arcsin(x) = \frac{1}{\sqrt{1 - x^2}}, \quad x \in (-1,1).$$

(b) Use (a) to prove that for all $x \in \mathbb{R}$, we have

$$\arcsin\left(\frac{x}{\sqrt{1 + x^2}}\right) = \arctan x.$$

Exercise 9.73 (Exam 2015-05-27)

(a) Formulate the Mean Value Theorem.


(b) Use the Mean Value Theorem to show that if $f'(x) \geq 0$ holds for all $x \in (a,b)$,
then for $x_1, x_2 \in [a,b]$ we have

$$x_1 \leq x_2 \implies f(x_1) \leq f(x_2).$$

(c) Show that

$$\ln x \geq \frac{x - 1}{x + 1}, \quad x \geq 1.$$
Exercise 9.74 (Exam 2014-12-18)

(a) Formulate the Mean Value Theorem.


(b) Let the function $f$ be defined and continuous on $\mathbb{R}$. Use the Mean Value Theorem
to prove that if $f'(x) = 0$ for all $x \in \mathbb{R}$, then $f(x)$ is a constant.
(c) Prove the formula

$$\arctan(x) + \arctan\left(\frac{1}{x}\right) = \frac{\pi}{2}, \quad x > 0.$$

Exercise 9.75 (Exam 2014-08-18)


(a) Say that a function $f$ takes the y-values 1 and 3. Under what assumptions can we
guarantee that $f$ must also take the y-value 2? (Here, you are supposed to refer
to and formulate a theorem from the course.)
(b) Give an example of a function that takes the y-values 1 and 3, but not the y-value
2. Point out why your example does not contradict the theorem you cited in (a).
(c) Determine the range of the function $f(x) = 2\arctan(x) + \frac{1}{x}$.

Exercise 9.76 (Exam 2014-05-26) Let $f(x) = |x|e^{-1/x}$ for $x \neq 0$. Make a sketch
of this function which shows where it is growing/decreasing, convex/concave and any
asymptotes it may have. In your sketch, make sure you also point out any extreme
points, and how the function behaves close to $x = 0$.
Exercise 9.77 (Exam 2014-01-09) Determine the number of zeros of the function
$f(x) = e^x - x^2$.
Exercise 9.78 (Exam 2013-12-18) Let
$$f(x) = \frac{1}{\sqrt{x^2 - 2x} - x}.$$
(a) Determine the domain of $f$.
(b) Determine where the function is positive and negative, respectively.
(c) Determine where the function is continuous.
(d) Determine any horizontal and vertical asymptotes the function may have.
(e) Determine all local and global extreme points.
It is important that you illustrate your answers in a sketch of the function.
Exercise 9.79 (Exam 2013-08-21) Prove the inequality
$$2x\ln x < x^2 - 1, \quad x > 1.$$

Exercise 9.80 (Exam 2013-08-21)


(a) Let $n$ be a natural number. Show that there is exactly one point $x_n$ in each
interval $\left((n - 1)\pi, (n - \tfrac{1}{2})\pi\right)$ so that $1 = x_n \tan x_n$. Determine also $\lim_{n\to\infty} x_n/n$.
(b) Determine all extreme points of $f(x) = \sin(e^x)$. Determine also the inflexion
points of $f$ in terms of the points $x_n$. Illustrate your answer in a minimalistic
figure.
Exercise 9.81 (Exam 2013-05-29) Determine the range of the following function:
$$f(x) = \frac{\sqrt{3} + x}{1 + x^2} + \arctan x, \quad x \in \mathbb{R}.$$
In particular, determine whether the function has any global extreme points, and if
so, determine their values.
Exercise 9.82 (Exam 2012-12-19) Determine all local and global extreme points
of the function
$$f(x) = |x^3 - 6x^2 + 9x - 4|$$
on the interval $[0,5]$ and sketch its graph there.

Exercise 9.83 (Exam 2012-05-28) Prove the inequality

$$\ln(1 + e^{-x}) > \frac{1}{e^x + 1}, \quad x \in \mathbb{R}.$$
9.7 Answers to selected exercises


9.4 Use the Mean Value Theorem to obtain the relation $f(x_2) - f(x_1) = f'(c)(x_2 - x_1)$
for some $c \in (a,b)$. Next, investigate the sign of the expression on the right-hand
side.

9.5 The modification is that the strict inequalities have to be replaced by non-strict
inequalities (to see why, think of the example $f(x) = x^3$). Next, since the derivative
of $f$ exists, we know that

$$\lim_{h\to 0^+} \frac{f(x + h) - f(x)}{h} = f'(x) = \lim_{h\to 0^-} \frac{f(x + h) - f(x)}{h}.$$

Investigate the sign of the two one-sided limits.

9.6 Put $f(x) = \cos^2 x + \sin^2 x - 1$. Then $f'(x) = -2\sin x\cos x + 2\sin x\cos x = 0$. Hence,
by Proposition 9.3, $f$ is equal to some constant. What remains is to figure out the
value of this constant.

9.17 Global minimum $f(3) = -25$ and global maximum $f(-1) = 7$. There is also a
local maximum $f(4) = -18$.

9.18 As a general point of strategy, it is wise to first study the graph of $g(x) = x^2 + x - 2 =
(x + 2)(x - 1)$. Now, $g$ has zeroes at $x = -2$ and $x = 1$, and doing a table of signs
for $g'$ reveals that it has a local minimum at $x = -1/2$. When taking the absolute
value to obtain $f(x) = |g(x)|$, the part of $g$ that is negative becomes positive –
that is, reflected with respect to the x-axis:

Fig. 36. A plot of $g(x)$ (left) and $f(x)$ (right). Notice how the zeroes of $g$ become the
global minima of $f$.

In particular, zeroes of $g(x)$ become global minima of $f(x)$, and the local minimum
for $g$ at $x = -1/2$ becomes a local maximum $f(-1/2) = 9/4$. There are no global
maxima. Here is an illustration of the situation:
9.24 The crucial thing is to compute

$$f'(x) = \frac{x + 1}{x}\, e^{-1/x} \quad\text{and}\quad f''(x) = \frac{1}{x^3}\, e^{-1/x}.$$
Features of interest are (see also figure below):

– $f$ defined, continuous and differentiable on $\mathbb{R}\setminus\{0\}$.
– $f$ increases on $(-\infty, -1]$, decreases on $[-1, 0)$, has a singularity at $x = 0$, and
increases on $(0,\infty)$.
– $f$ has a local maximum $f(-1) = -e$ (which is not global).
– $f$ is concave on $(-\infty,0)$ and convex on $(0,\infty)$.
– $f$ has a vertical asymptote at $x = 0$ (the limit is $-\infty$ from the left, and $0$
from the right).
– $y = x - 1$ is a skew asymptote as $x \to \pm\infty$ (see also exercise 6.80b).

Fig. 37. Here is a plot of $y = xe^{-1/x}$.

9.25 To solve this exercise, it is crucial to correctly compute

$$f'(x) = \frac{4x}{3}(x^2 - 1)^{-1/3}$$
$$f''(x) = \frac{4}{9}(x^2 - 1)^{-4/3}(x^2 - 3).$$
Features of interest are (see also figure below):

– $f$ defined and continuous on $\mathbb{R}$.
– $f'$ and $f''$ defined on $\mathbb{R}\setminus\{-1,1\}$.
– $f$ decreases on $(-\infty, -1]$, increases on $[-1,0]$, decreases on $[0,1]$ and increases
on $[1,\infty)$.
– $f$ has global minima $f(-1) = f(1) = 0$ and local maximum $f(0) = 1$. No
global maximum.
– $f$ is convex on $(-\infty, -\sqrt{3})$ and $(\sqrt{3},\infty)$.
– $f$ is concave on $(-\sqrt{3}, -1)$, $(-1,1)$ and on $(1, \sqrt{3})$.
– $f$ has no asymptotes.

Fig. 38. Here is a plot of $y = (x^2 - 1)^{2/3}$.

9.26 To solve this exercise, we first compute

$$f'(x) = \frac{3 + 2x}{1 + x^2}.$$
A table of signs now reveals a global minimum at $x = -3/2$.
9.32 The point is to define
$$f(x) = \frac{\sqrt{3} + x}{1 + x^2} + \arctan(x) + \frac{\pi}{2}$$
and prove that this function is always strictly bigger than zero. To do this, we
study
$$f'(x) = 2\,\frac{1 - \sqrt{3}x}{(1 + x^2)^2}.$$
A table of signs reveals that $f$ is increasing on $(-\infty, 1/\sqrt{3}]$ and decreasing on
$[1/\sqrt{3},\infty)$. After checking that $\lim_{x\to\pm\infty} f(x) \geq 0$ we may conclude that $f(x) > 0$
for all $x \in \mathbb{R}$.
9.33 We put $f(x) = \ln(1 + 4x) - \arctan(3x)$ and seek to prove that this function satisfies
$f(x) > 0$ for all $x > 0$. To do this, we compute
$$f'(x) = \frac{(6x - 1)^2}{(4x + 1)(9x^2 + 1)}$$
and study a table of signs. This table reveals that $f$ is strictly increasing on $[0,\infty)$.
Since also $f(0) = 0$, it therefore follows that $0 = f(0) < f(x)$ for all $0 < x$, and we
are done.
9.28 Put $f(x) = \ln(ax) - \ln a - \ln x$ and study its derivative.

9.29 Put $f(x) = \arcsin x - \pi/2 + \arccos x$ and study its derivative. The formula is true
for $-1 \leq x \leq 1$.

9.30 Put $f(x) = \arctan\sqrt{x^2 - 1} + \arcsin(1/x)$ and study its derivative. The value of
the constant is $\pi/2$.

9.45 (i) 1, (ii) 1, (iii) 1, (iv) 1, (v) 1, (vi) 0.

9.46 No, the second equality is false.

9.47 If you just use L’Hopital (without simplifying the expression first), then the ex-
pression will just repeat itself (more or less – you will get its reciprocal).

9.50 To use the Generalised Mean Value Theorem, we need for $f$ and $g$ to be defined
and continuous at $x = c$ (but not necessarily differentiable there). If this is the
case, then fine. If not, consider instead the "extended" functions
$$F(x) = \begin{cases} f(x) & x \neq c \\ 0 & x = c \end{cases} \qquad G(x) = \begin{cases} g(x) & x \neq c \\ 0 & x = c \end{cases}$$

These functions are both defined, and continuous at $x = c$. Moreover, since they
agree with $f$ and $g$, respectively, for $x \neq c$, we also have that

$$\lim_{x\to c} \frac{f(x)}{g(x)} = \lim_{x\to c} \frac{F(x)}{G(x)}.$$

The Generalised Mean Value Theorem may now be applied.

9.54 h is the difference of two continuous and differentiable functions.

9.55 Replace $b - a$ by $g(b) - g(a)$ and $(x - a)$ by $g(x) - g(a)$, respectively.

9.60 First, notice that $n_1 \geq 1$ (in fact, all $n_k$ are larger than or equal to one). Next, we
observe that since $n_2$ is an integer, then the inequality $n_2 > n_1 \geq 1$ implies $n_2 \geq 2$.
Using these observations, one should be able to set up an induction proof.

9.61 d1 = c2 , d2 = c5 , d3 = c60 , d4 = c512 . That is, n1 = 2, n2 = 5, n3 = 60, n4 = 512.

9.62 The next step in the proof is to follow, word-by-word, the portion of the proof of
the intermediate value theorem establishing that $\lim_{k\to\infty} a_k = \lim_{k\to\infty} b_k = c$ for
some $c \in [a,b]$. Since $a_k \leq d_k \leq b_k$, the conclusion now follows from the squeeze
theorem.

9.66 Hint: To obtain a lower bound for $f(x)$ is the same as getting an upper bound for
$g(x) = -f(x)$...
9.67 Here are some pointers:

– Step 1: This follows from the Bounded Function Theorem.
– Step 2: If $L$ is the supremum of $R_f = \{f(x) : x \in [a,b]\}$, then $L - 1/n$ is not
an upper bound for $R_f$. So, for every $n \in \mathbb{N}$ there exists a $c_n \in [a,b]$ such that
$f(c_n) > L - 1/n$.
– Step 3: By Bolzano-Weierstrass, $c_n$ has a convergent subsequence $c_{n_k}$ (denote
the limit by $u$). But then we know that

$$f(u) = f\left(\lim_{k\to\infty} c_{n_k}\right) = \lim_{k\to\infty} f(c_{n_k}) \geq \lim_{k\to\infty} (L - 1/n_k) = L.$$

Since $L$ is an upper bound for $R_f$, we also have $f(u) \leq L$, and therefore
$f(u) = L$.
– Step 4: We are done :-)

9.69 Here are some pointers:

– Step 1: Nothing needs to be done.
– Step 2: This is the negation of what it means for $f^{-1}$ to be continuous at $y$.
– Step 3: Use Bolzano-Weierstrass to replace $x_n$ by a convergent subsequence
$x_{n_k}$ (call the limit $L$).
– Step 4: Since $f$ is continuous, we get both

$$\lim_{k\to\infty} f(x_{n_k}) = f\left(\lim_{k\to\infty} x_{n_k}\right) = f(L),$$

and
$$\lim_{k\to\infty} f(x_{n_k}) = \lim_{k\to\infty} y_{n_k} = y = f(x).$$

Since $f$ is invertible (and therefore satisfies the horizontal line criterion), this
means that $x = L$. By what we did in Step 2, this gives a contradiction
(please make sure you see why this is).
Chapter 10

The indefinite integral

In this chapter we study the indefinite integral. Since the indefinite integral is just a
matter of taking the derivative "backwards", all facts about the indefinite integral are
obtained from corresponding facts on the derivative.

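Remark 10.1 (Selected problems from previous exams based on this chapter)

1. Compute the integrals

(a) $\displaystyle\int \frac{\sin x}{1 + \cos^2 x}\,dx$
(b) $\displaystyle\int \frac{\sin 2x}{1 + \cos^2 x}\,dx$
(c) $\displaystyle\int \sin\sqrt{x}\,dx$
(d) $\displaystyle\int \frac{\sqrt{1 + \sqrt{x}}}{\sqrt{x}}\,dx$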

10.1 A first look at the indefinite integral


Since the indefinite integral is just the derivative,
but "backwards", we will see that many compu-
tations (and even proofs) involving the indefinite
integral resemble a game of "Guess who?". To
see what we mean by this, let us consider an ex-
ample.


Example 10.2 (Free fall example) Suppose that $y(x)$ describes the vertical position
of an object in free fall at time $x$ ("free fall" means that the only force acting upon the
object is gravity). According to introductory physics, its acceleration is described by

$$y''(x) = -9.82.$$

We call this a differential equation since it is an equation that involves derivatives. To
solve a differential equation, we need to find a function $y$ which makes the left-hand side
of the differential equation equal to the right-hand side.
To solve this differential equation, why not try guessing? First, if $y'' = -9.82$, what
is $y'$? For someone experienced with computing derivatives, a natural guess (which you
are asked to check below) is
$$y'(x) = -9.82x.$$

We now have an equation for the velocity $y'$. But we want an expression for $y$.
Guessing once more (again, you are asked to check this below), it seems like the following
guy does the trick:
$$y(x) = -\frac{9.82}{2}x^2.$$

Exercise 10.3 By using the definition of the derivative, verify that if $y = (-9.82/2)x^2$,
then $y' = -9.82x$ and $y'' = -9.82$.

Our solution method can be described as doing derivatives backwards. That is, we
are looking for primitive functions:

Definition 10.4 (Primitive functions) $F$ is a primitive function of $f$ if $F'(x) = f(x)$.

Let us now look closer at Example 10.2, above. Is it really reasonable that the
motion of your mobile phone is described by the expression $y = -9.82x^2/2$? Well, not
necessarily. For instance, notice that this expression forces the position at time $x = 0$ to
be $y(0) = 0$. Moreover, it also forces the velocity at time $x = 0$ to be $y'(0) = 0$.
The expression for $y$ found in the above example matches the situation where you
drop your phone down a hole in the ground. That is, the initial height and velocity are
0. But what if the situation is the one to the right, where the initial height and velocity
are different from zero? The problem is that in the example, we missed a lot of solutions.
A function does not only have one primitive function, it has lots of them.
Fig. 1. It does not seem like too much to ask for Newton’s laws of nature to be able
to describe the physics in both of these situations.

Exercise 10.5 Suppose that $f$ is a function on some interval $[a,b]$. We now ask you
to show that if $F$ is a primitive function of $f$, then all primitive functions of $f$ are
exactly of the form $F + C$, where $C$ is any constant.
(a) Suppose that $F(x)$ is a primitive of $f(x)$. By using the definition of the derivative,
verify that $G(x) = F(x) + C$ is also a primitive of $f(x)$, no matter the constant
$C \in \mathbb{R}$.
(b) Suppose that $F(x)$ and $G(x)$ are two primitive functions of $f(x)$. Use the Mean
Value Theorem to show that there must exist a constant $C$ so that $G(x) =
F(x) + C$.

Example 10.6 (Free fall example, continued) By exercise 10.5, we can modify our
first guess to be
$$y'(x) = -9.82x + C.$$
Indeed, no matter the constant $C$, if we differentiate this expression, we get $y''(x) =
-9.82$. By noticing that $y'(0) = C$, we see that we should interpret $C$ as the initial
velocity.
Next, we modify our second guess to be
$$y(x) = -\frac{9.82}{2}x^2 + Cx + D.$$
Indeed, by differentiating this expression, we get $y'(x) = -9.82x + C$. Here, we see
that $y(0) = D$, so the constant $D$ should be interpreted as the initial position. This
means that we have found an equation for free fall that is flexible with respect to initial
velocities and positions.
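The role of the two constants can also be checked numerically. The sketch below assumes the value 9.82 for the gravitational acceleration, as in the example; the names `position`, `velocity` and `difference_quotient`, as well as the sample values of the initial velocity and height, are hypothetical choices made for this illustration.

```python
G = 9.82  # gravitational acceleration in m/s^2, the value used in the example


def position(x, v0=0.0, y0=0.0):
    """y(x) = -(G/2) x^2 + C x + D with C = v0 (initial velocity), D = y0 (initial height)."""
    return -G / 2 * x**2 + v0 * x + y0


def velocity(x, v0=0.0):
    """y'(x) = -G x + C with C = v0."""
    return -G * x + v0


def difference_quotient(func, x, h=1e-6):
    """Symmetric difference quotient, used here as a numerical stand-in for the derivative."""
    return (func(x + h) - func(x - h)) / (2 * h)


v0, y0 = 3.0, 1.5  # hypothetical initial velocity (m/s) and initial height (m)
approx_velocity = difference_quotient(lambda t: position(t, v0, y0), 0.7)
print(approx_velocity, velocity(0.7, v0))  # the two numbers should nearly agree
print(position(0.0, v0, y0), velocity(0.0, v0))  # the initial height and velocity
```

Changing `v0` and `y0` changes the solution but not the acceleration, which is exactly the point of keeping track of the constants $C$ and $D$.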
Exercise 10.7 Suppose that you throw your mobile phone upwards. The initial height
is roughly 2 meters (your hand is extended upwards as you release it) and it takes 3
seconds for the mobile phone to hit the ground. (a) What was the initial velocity?
(b) How far up did the phone go?
Keeping track of these constants means that our equations (hopefully) model all
relevant physical situations, and not just a particular one. Since these constants are so
important, we introduce the following definition.

Definition 10.8 (Indefinite integral) For a function $f$ we use the notation

$$\int f(x)\,dx$$

to denote all primitive functions of $f$. By what we did in exercise 10.5, it follows that if
$f$ is defined on an interval, and $F$ is any primitive function of $f$, then we can write
$$\int f(x)\,dx = F(x) + C, \quad C \in \mathbb{R}.$$

Remark 10.9 If you read the definition of the indefinite integral carefully, you may
realise that it would be more correct to write
$$\int f(x)\,dx = \{F(x) + C : C \in \mathbb{R}\}.$$

However, as long as we keep track of the constant $C$, we can safely skip the set notation
in most situations (note that if $f$ is not defined on an interval, then we should adjust
this slightly – do you see why and how?).

The day job of the indefinite integral is to represent all primitive functions of f (x).
Because of this, the operations of taking derivatives and taking indefinite integrals are
by definition inverse to each other.

Fig. 2. That is, what the one does, the other tries to undo.

Exercise 10.10 Determine the following indefinite integrals.
\[ \text{(a)} \int dx \qquad \text{(b)} \int x\,dx \qquad \text{(c)} \int \frac{1}{x^2}\,dx \qquad \text{(d)} \int 0\,dx. \]
Hint: Take a look at the derivatives we computed in the previous section.

10.2 The rulebook for indefinite integration


The rulebook, part 1: Guess and check!
The discussion from the previous section can more or less be summarised as follows:
\[ \int f(x)\,dx = F(x) + C \quad \overset{\text{by definition}}{\iff} \quad f(x) = \frac{d}{dx} F(x). \]
In particular, everything we wish to say about the indefinite integral has to come from
some fact about the derivative. In fact, we shall see that basically every computational
rule for the derivative yields a computational rule for the indefinite integral. The point
is now to collect these, one by one.
The secret behind becoming a master of solving indefinite integrals is the following:

Remark 10.11 (Most important computational rule) Guess and check!

Here, we illustrate what we mean by this.

Example 10.12 (Solving an integral by guessing and checking) We want to solve
\[ \int x \sin(x^2)\,dx. \]
Since we know that sin x and cos x are related by the derivative, we guess that a primitive
of $x \sin(x^2)$ ought to be $F(x) = \cos(x^2)$. To check whether this is correct, we differentiate:
\[ F'(x) = -2x \sin(x^2). \]
Since this is not equal to $x \sin(x^2)$, our guess was wrong! However, only by the factor
$-2$. To try to compensate for this, we make the modified guess $F(x) = -(1/2) \cos(x^2)$.
This time differentiation gives $F'(x) = x \sin(x^2)$, and our guess is shown to be correct!
In conclusion, we have proved that
\[ \int x \sin(x^2)\,dx = -\frac{1}{2} \cos(x^2) + C. \]
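
The same two-step procedure can be sketched on a closely related integrand. The example below is not taken from the notes; it is only an illustration of guessing, checking by differentiation, and then correcting the constant factor.

```latex
% Illustrative guess-and-check for \int x \cos(x^2) dx (example chosen here, not from the notes)
\begin{align*}
\text{First guess:}    \quad F(x) &= \sin(x^2),             & F'(x) &= 2x \cos(x^2) \quad \text{(off by the factor 2)} \\
\text{Modified guess:} \quad F(x) &= \tfrac{1}{2} \sin(x^2), & F'(x) &= x \cos(x^2)  \quad \text{(correct)}
\end{align*}
\[ \int x \cos(x^2)\,dx = \tfrac{1}{2} \sin(x^2) + C. \]
```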

Exercise 10.13 Use the method of guessing and checking to compute the following
indefinite integrals.
\[ \text{(a)} \int \frac{dx}{x^3} \qquad \text{(b)} \int x e^{-x^2}\,dx \qquad \text{(c)} \int \frac{3x^2}{1 + x^3}\,dx \qquad \text{(d)} \int \tan^2 x\,dx \]

Hint: In (d), it helps to know a few ways of how to express the derivative of tan x.

The rulebook, part 2: The summation rule and a weak product rule
The method of guessing and checking also allows us to obtain other computational rules
for the indefinite integral. We begin with the following pair of basic rules:

Proposition 10.14 For all functions f, g and constants $k \in \mathbb{R} \setminus \{0\}$, we have
\[ \text{(i)} \quad \int k f(x)\,dx = k \int f(x)\,dx, \]
\[ \text{(ii)} \quad \int \big( f(x) + g(x) \big)\,dx = \int f(x)\,dx + \int g(x)\,dx. \]

While these formulas are rather straight-forward, there is a complication here. In-
deed, there is something mysterious going on with the constant C. Let us look at an
example to see what is going on.

Example 10.15 (A closer look at the undetermined constant) Since $(x^2)' = 2x$,
it holds by the definition of the indefinite integral that
\[ \int 2x\,dx = x^2 + C. \]
On the other hand, by part (i) of Proposition 10.14, we get
\[ \int 2x\,dx = 2 \int x\,dx = 2 \Big( \frac{1}{2} x^2 + C \Big) = x^2 + 2C. \]
The observant reader will notice something strange here. Indeed, if both of these
computations are correct, then we have $x^2 + C = x^2 + 2C$, which leads to the rather strange
conclusion that $C = 0$.
So, what is going on? Well, the thing is that at this point we need to take into
consideration that the indefinite integral is not a specific function, but rather a set of
functions (recall Remark 10.9). That is, $\int 2x\,dx = 2 \int x\,dx$ holds since
\[ \{x^2 + 2C : C \in \mathbb{R}\} = \{x^2 + C : C \in \mathbb{R}\}. \]
Now, in practice, we tend to avoid using set notation when dealing with indefinite
integrals. Instead, we usually justify the above identity by using different symbols for the
indefinite constants showing up. For instance, we write
\[ \int 2x\,dx = x^2 + C \quad \text{and} \quad 2 \int x\,dx = x^2 + 2D, \]
and then observe that as C and D run through all of $\mathbb{R}$, these two formulas describe
exactly the same functions, and we may therefore conclude that $\int 2x\,dx = 2 \int x\,dx$.

Fig. 3. Here we see a selection of primitive functions for y = 2x. Notice that these
functions all have the same slope for a given value of x, and that all can be obtained
from both the formula $x^2 + C$ and $x^2 + 2D$.

In fact, since the indefinite integral is really a set of functions, we are allowed to
simplify the constants involved. For instance, the computation in the above example is
often expressed as
\[ \int 2x\,dx = 2 \int x\,dx = 2 \Big( \frac{1}{2} x^2 + C \Big) = x^2 + 2C = x^2 + D, \]
or even
\[ \int 2x\,dx = 2 \int x\,dx = 2 \Big( \frac{1}{2} x^2 + C \Big) = x^2 + 2C = x^2 + C. \]
Note that in the second computation, we actually change the role of C in the middle of
the computation. While sloppy, this is quite common. To warn people that this is about
to happen, mathematicians sometimes write "the constant C may change from line to
line" at the start of certain computations. We do this when the actual value of C does
not matter.

Exercise 10.16 Prove Proposition 10.14.

Hint: In (i), a suitable guess is that $\int k f(x)\,dx$ is equal to $kF(x) + C$, where $F'(x) = f(x)$.
Verify this guess, and explain why it leads to the formula. In (ii), you should
use a similar strategy.

Exercise 10.17 Compute the integral
\[ \int \frac{x^2}{1 + x^2}\,dx. \]

Hint: No fancy techniques are needed. Just rewrite the expression in some useful way.

The rulebook, part 3: A stronger product rule


We will now obtain a computational rule for the indefinite integral by using the product
rule for the derivative. To this end, we begin by recalling the product formula:

\[ \frac{d}{dx} \Big( f(x) g(x) \Big) = f'(x) g(x) + f(x) g'(x). \]
But this means that $f(x)g(x)$ is a primitive of the expression $f'(x)g(x) + f(x)g'(x)$. By
the definition of the indefinite integral, this means that
\[ \int \Big( f'(x) g(x) + f(x) g'(x) \Big)\,dx = f(x) g(x) + C. \]
Next, using part (ii) of Proposition 10.14, we can rewrite this as
\begin{align*}
\int f'(x) g(x)\,dx &= f(x) g(x) + C - \int f(x) g'(x)\,dx \\
&= f(x) g(x) - \int f(x) g'(x)\,dx.
\end{align*}

In the last equality, we combine the constant C with the constant still inside of the
integral appearing on the left-hand side. This means that the constant is still there, we
just do not see it in the formula.
We have now proved the formula called integration by parts (or partial integration):

Proposition 10.18 (integration by parts)
\[ \int f'(x) g(x)\,dx = f(x) g(x) - \int f(x) g'(x)\,dx. \]

We illustrate how this computational rule works in the following example:

Example 10.19 We use partial integration to compute
\[ \int x \sin x\,dx. \]
To make the formula work, we have to pretend that one of the factors is the term $f'$
from the integration by parts formula, and that the other is g.
The following is a bad choice:
\[ \int \underbrace{x}_{f'} \underbrace{\sin x}_{g}\,dx = \underbrace{\frac{1}{2} x^2 \sin x}_{f \cdot g} - \int \underbrace{\frac{1}{2} x^2 \cos x}_{f \cdot g'}\,dx. \]
Here, we first choose candidates for $f'$ and g, compute f and $g'$, and then insert everything
into the partial integration formula. However, it turns out that this was a bad idea since
our next expression looks even worse than the one we began with.
We therefore try another choice for $f'$ and g:
\begin{align*}
\int \underbrace{x}_{g} \underbrace{\sin x}_{f'}\,dx &= \underbrace{x \cdot (-\cos x)}_{g \cdot f} - \int \underbrace{1 \cdot (-\cos x)}_{g' \cdot f}\,dx \\
&= -x \cos x + \int \cos x\,dx.
\end{align*}
This time, we see that after integrating by parts, we arrive at a new expression that is
much easier to compute than the one we started with. Indeed, since $\int \cos x\,dx = \sin x + C$,
we conclude that
\[ \int x \sin x\,dx = -x \cos x + \sin x + C. \]

Important: If you doubt your answer when computing an indefinite integral, or just
want to double check that your answer is correct, then recall that you can guess and check.
If taking the derivative of the answer does not give you back the original function, you
have messed up. In the above example, the following verifies that our answer is correct:
\[ \frac{d}{dx} \Big( -x \cos x + \sin x + C \Big) = -\cos x + x \sin x + \cos x = x \sin x. \]
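
For comparison, here is a sketch of the same recipe on a neighbouring integrand. The example $\int x \cos x\,dx$ is chosen here for illustration and does not appear in the notes; as in Example 10.19, we let the polynomial factor play the role of g, so that it disappears when differentiated.

```latex
% Illustrative integration by parts with g = x, f' = \cos x, hence f = \sin x, g' = 1
\begin{align*}
\int x \cos x\,dx &= x \sin x - \int 1 \cdot \sin x\,dx \\
                  &= x \sin x + \cos x + C.
\end{align*}
% Check by differentiating the answer:
\[ \frac{d}{dx} \Big( x \sin x + \cos x + C \Big) = \sin x + x \cos x - \sin x = x \cos x. \]
```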
The following three exercises more or less sum up the basic tricks related to partial
integration.

Exercise 10.20 Use partial integration to compute
\[ \text{(a)} \int x e^x\,dx \qquad \text{(b)} \int x^2 \cos x\,dx \qquad \text{(c)} \int \frac{\ln x}{x^2}\,dx \]

Exercise 10.21 Use partial integration to compute
\[ \text{(a)} \int \ln x\,dx \qquad \text{(b)} \int \arctan x\,dx \]
Hint: $\ln x = 1 \cdot \ln x$ and $\arctan x = 1 \cdot \arctan x$.

Exercise 10.22 Use partial integration to compute the more tricky integrals
\[ \text{(a)} \int e^x \sin x\,dx \qquad \text{(b)} \int \sin^2 x\,dx \qquad \text{(c)} \int \frac{\ln x}{x}\,dx \]
Hint: In (b), you need two tricks. The first is $\sin^2 x = \sin x \cdot \sin x$. Try to figure out
the second yourself. (Using this trick twice will not work!)

The rulebook, part 4: A chain rule for indefinite integrals

Next, we are going to investigate how we can make the chain rule give us a computational
rule for the indefinite integral. This will result in the most useful computational rule for
the indefinite integral, namely the change of variables formula.
To get some intuition of what is going on, let us first look at some examples.

Example 10.23 We want to compute
\[ \int \sin 3x\,dx. \]
This essentially means that we have to guess a primitive for sin 3x. The guess cos 3x
seems natural, but it is slightly wrong (check this yourself!). However, if we adjust this
guess – compensating for the constant just as in example 10.12 – we obtain the correct
formula
\[ \int \sin 3x\,dx = -\frac{1}{3} \cos 3x + C. \]

Example 10.24 We want to compute
\[ \int \frac{\sin \sqrt{x}}{2 \sqrt{x}}\,dx. \]
Again, we try to guess. It is a bit harder this time, but we try with $\cos \sqrt{x}$, and see what
happens:
\[ \frac{d}{dx} \cos \sqrt{x} = \frac{1}{2 \sqrt{x}} (-\sin \sqrt{x}). \]
And, again, this is almost correct. With only a minor modification, we get
\[ \int \frac{\sin \sqrt{x}}{2 \sqrt{x}}\,dx = -\cos \sqrt{x} + C. \]

Both examples above are typically handled by the change of variables formula for the
indefinite integral, which is a consequence of the chain rule for derivatives. To this end,
we begin by writing up the chain rule as follows:
\[ \frac{d}{dx} F\big(g(x)\big) = f\big(g(x)\big) g'(x). \]
Here, we use F to denote a primitive of f. But this means that $F(g(x))$ is a primitive
function of $f(g(x)) g'(x)$. In other words:
\[ \int f\big(g(x)\big) g'(x)\,dx = F\big(g(x)\big) + C. \]

If you examine the two previous examples closely, then you see that this is actually the
formula we used. However, this is not the formula that most mathematicians think of
when they say change of variables.
So, let us massage the above expression a bit, introducing the variable u = g(x). The
job of this variable is to hide the complexity of g(x). Indeed, we can write the following:
\[ F\big(g(x)\big) + C = F(u) + C = \int f(u)\,du. \]
Here, the integral $\int f(u)\,du$ tells us that we are to find the primitive of f(u) as if u is
the variable of f. That is, when computing $\int f(u)\,du$, we are allowed to forget about the
connection u = g(x).
Combining the above, we get:

Proposition 10.25 (change of variables) Setting u = g(x), we can write
\[ \int f\big(g(x)\big) g'(x)\,dx = \int f(u)\,du. \]

To remember this formula, we use Leibniz notation. Notice that when we put u =
g(x), then computing the derivative of u with respect to x gives
\[ \frac{du}{dx} = g'(x) \iff du = g'(x)\,dx. \]
The expression to the right does not really mean anything (since we have only given
meaning to the symbol du/dx, and not to du and dx separately), but the notation is
excellent as a reminder of how to use the change of variables formula.
Let us revisit some of the above examples.

Example 10.26 We wish to calculate
\[ \int \sin 2x\,dx. \]
The point is now more or less to think in the same way as when we used the chain rule.
Here sin u is a friendly outer function (since we know how to integrate it with respect to
u), so we put u = 2x. (Note that we often call sin u the outer function, and u = 2x the
inner function.) We compute the derivative of the inner function (usually just called the
inner derivative), and change it to the form indicated above:
\[ \frac{du}{dx} = 2 \iff du = 2\,dx. \]
By the change of variables formula, this yields
\begin{align*}
\int \sin 2x\,dx &= \frac{1}{2} \int \sin 2x \cdot 2\,dx \\
&= \frac{1}{2} \int \sin u\,du \\
&= \frac{1}{2} \Big( -\cos u + C \Big) = -\frac{1}{2} \cos 2x + C.
\end{align*}
(Note that the constant C changed in the last equality. We could have used a new name,
like D, and pointed out that C/2 = D, but it does not really matter, since it is an
“indefinite constant”.)

Example 10.27 We compute
\[ \int \frac{\sin \sqrt{x}}{2 \sqrt{x}}\,dx. \]
This time it is a bit harder to see what the proper outer and inner functions are. Let us
make a guess and again choose sin u as the outer function. This means that we choose
$u = \sqrt{x}$ as the inner function. Computing the inner derivative reveals that
\[ \frac{du}{dx} = \frac{1}{2 \sqrt{x}} \iff du = \frac{dx}{2 \sqrt{x}}. \]
This is just what we need! (Such miracles occur frequently on exams.) The change of
variables formula now yields
\begin{align*}
\int \frac{\sin \sqrt{x}}{2 \sqrt{x}}\,dx &= \int \sin \sqrt{x} \cdot \frac{dx}{2 \sqrt{x}} \\
&= \int \sin u\,du \\
&= -\cos u + C = -\cos \sqrt{x} + C.
\end{align*}
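
To see the bookkeeping once more, here is a sketch with a different outer function. The integrand below is chosen for illustration only and is not from the notes; the outer function is $u^3$, the inner function is $u = \sin x$, and the inner derivative $\cos x$ is already present.

```latex
% Illustrative change of variables: u = \sin x, so du = \cos x\,dx
\begin{align*}
\int \sin^3 x \cos x\,dx &= \int u^3\,du \\
                         &= \frac{u^4}{4} + C = \frac{\sin^4 x}{4} + C.
\end{align*}
% Check: \frac{d}{dx} \frac{\sin^4 x}{4} = \sin^3 x \cos x.
```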

Exercise 10.28 Use change of variables together with Proposition 8.8 to compute:
\[ \text{(a)} \int (2x + 3)^{10}\,dx \qquad \text{(b)} \int \cos x \, e^{\sin x}\,dx \]
\[ \text{(c)} \int \frac{x}{1 + x^2}\,dx \qquad \text{(d)} \int \frac{\sin(2x)}{1 + \cos^2 x}\,dx \]
Hint: Most of these should be rather straight-forward. However, to solve part (d), you
may want to browse the various trigonometric identities from Chapter A.
Exercise 10.29 Use a suitable change of variables to show that
\[ \int \frac{x^2 (11x - 9)}{3 (x - 1)^{1/3}}\,dx = x^3 (x - 1)^{2/3} + C. \]
Remark: This integral looks much worse than it actually is...

With a bit more experience, these exercises will get easier, since we will be more used
to spotting suitable pairs of inner and outer functions. When we reach the exam, these
should (hopefully) be routine exercises. Note that what makes the change of variables formula
difficult, in the way we are using it here, is that it is not enough to spot a suitable pair
of inner and outer functions – the derivative of the inner function must also appear!
However, this is not strictly necessary. The change of variables formula can also
be used even if the derivative of the inner function does not appear. As the following
example shows, there is quite a lot of freedom.

Example 10.30 We wish to compute
\[ \int \sin \sqrt{x}\,dx. \]
This is similar to the previous example, however this time there is no sign of a suitable
inner derivative. Still, we can try to force the change of variables. So, again, we put
$u = \sqrt{x}$ and compute $du = dx/(2\sqrt{x})$. But since $u = \sqrt{x}$, we can also express this as
$du = dx/(2u)$, which is the same as $2u\,du = dx$.
Using this,
\[ \int \sin \sqrt{x}\,dx = \int \sin u \cdot 2u\,du = 2 \int u \sin u\,du. \]
We have arrived at the same expression as in example 10.19.
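
For completeness, the remaining steps can be sketched as follows, using the result of example 10.19 with u in place of x and then substituting back $u = \sqrt{x}$. The factor 2 is absorbed into the indefinite constant.

```latex
% Finishing Example 10.30 with the formula \int u \sin u\,du = -u \cos u + \sin u + C
\begin{align*}
\int \sin \sqrt{x}\,dx &= 2 \int u \sin u\,du \\
                       &= 2 \big( -u \cos u + \sin u \big) + C \\
                       &= 2 \sin \sqrt{x} - 2 \sqrt{x} \cos \sqrt{x} + C.
\end{align*}
% Check: differentiating gives
% \frac{\cos\sqrt{x}}{\sqrt{x}} - \frac{\cos\sqrt{x}}{\sqrt{x}} + \sin\sqrt{x} = \sin\sqrt{x}.
```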

Exercise 10.31 Compute
\[ \int \arctan \sqrt{x}\,dx. \]

Exercise 10.32 Parts (iii) and (iv) of Proposition 8.9 also yield integration formulas,
even though they are practically never used. Determine these.

10.3 A big bag of integration tricks


We end this chapter by considering some additional tricks for computing indefinite inte-
grals.

First trick: Partial fraction decomposition


Partial fraction decomposition is a trick that, in theory, allows us to compute all integrals
of rational functions
\[ \int \frac{p(x)}{q(x)}\,dx, \qquad p(x),\ q(x) \text{ polynomials}. \]
Here is an example.

Example 10.33 (The basic idea) To compute the integral
\[ \int \frac{dx}{x^2 - 1}, \]
the point is to observe that one can, for suitable constants A, B, write
\[ \frac{1}{(x + 1)(x - 1)} = \frac{A}{x + 1} + \frac{B}{x - 1}. \]
This is called a partial fraction decomposition of $1/(x^2 - 1)$. Indeed, once such A, B are
found, the integral is solved as follows
\[ \int \frac{dx}{x^2 - 1} = A \ln |x + 1| + B \ln |x - 1| + C. \tag{10.1} \]
This leads to two questions: how do we know that we can decompose $1/(x^2 - 1)$ in
this way, and how do we determine the constants A and B? We address the second
question first.

Example 10.33 (continued – how to determine A, B)

We describe two approaches. In both, we start out by getting rid of the denominator to
get
\[ 1 = A(x - 1) + B(x + 1). \tag{10.2} \]
Method 1 – system of equations: By rewriting the right-hand side, we get
\[ 1 = (A + B)x + (B - A). \]
Notice that for the left-hand side to be equal to the right-hand side, it suffices to have
$A + B = 0$ and $B - A = 1$. That is, we get the system of equations
\begin{align*}
A + B &= 0 \\
B - A &= 1.
\end{align*}
Solving this system, we get the solutions $A = -1/2$ and $B = 1/2$.
Method 2 – laying on hands: The following method is often easier and more direct
than the one above. The trick is to take advantage of the fact that (10.2) is supposed to
hold for all x. In particular, choosing $x = 1$ and $x = -1$, we get
\begin{align*}
x = 1 &\implies 1 = A(1 - 1) + B(1 + 1) = 0 + 2B \implies B = \frac{1}{2} \\
x = -1 &\implies 1 = A(-1 - 1) + B(-1 + 1) = -2A + 0 \implies A = -\frac{1}{2}.
\end{align*}
(Can you guess why the method is called laying on hands?)
We conclude that
\[ \int \frac{dx}{x^2 - 1} = \frac{1}{2} \ln \left| \frac{x - 1}{x + 1} \right| + C. \]
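
As a sanity check, the decomposition found above can be verified directly by putting the two fractions back on a common denominator. The short computation below is only a sketch of that check, together with how the logarithms in (10.1) combine.

```latex
% Verifying A = -1/2, B = 1/2 by recombining the fractions
\[ -\frac{1}{2} \cdot \frac{1}{x + 1} + \frac{1}{2} \cdot \frac{1}{x - 1}
   = \frac{-(x - 1) + (x + 1)}{2 (x + 1)(x - 1)}
   = \frac{2}{2 (x^2 - 1)}
   = \frac{1}{x^2 - 1}. \]
% Inserting A and B into (10.1) and using \ln a - \ln b = \ln(a/b):
\[ -\frac{1}{2} \ln |x + 1| + \frac{1}{2} \ln |x - 1| + C
   = \frac{1}{2} \ln \left| \frac{x - 1}{x + 1} \right| + C. \]
```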

Exercise 10.34 Modify the solution procedure in the above example a tiny bit to
compute the integral
\[ \int \frac{x\,dx}{x^2 - 1}. \]
Exercise 10.35 Combine the strategies from 10.33 and 10.34 to compute the integral
\[ \int \frac{3x + 2}{x^2 - x - 2}\,dx. \]

Exercise 10.36 (a) Explain why it is impossible to decompose $x^2/(x^2 - 1)$ by using
the same type of partial fraction decomposition as in 10.33 and 10.34.
(b) By first doing a polynomial division, compute the integral
\[ \int \frac{x^2}{x^2 - 1}\,dx. \]
What we have seen so far is the “light” version of partial fraction decompositions. We
summarise it as follows:

Method 10.37 (Partial fraction decomposition) A partial fraction decomposition
of a rational function p(x)/q(x) is obtained as follows:

1. Make sure that the degree of the numerator p(x) is strictly less than the degree
of the denominator q(x). If this is not the case, perform a polynomial division to
write
\[ \frac{p(x)}{q(x)} = r(x) + \frac{p_1(x)}{q(x)}, \]
where r(x) and $p_1(x)$ are polynomials and $\deg p_1(x) < \deg q(x)$.
2. Factorise the denominator q(x) as much as possible. Here, any factorisation is fine
as long as no two factors have common zeroes.
3. Make a suitable guess for the partial fraction decomposition of p(x)/q(x) (or, if
you needed to do a polynomial division in Step 1, of $p_1(x)/q(x)$). For each factor
of the form $(x + a)^n$ in q(x), we include in the guess either
\[ \frac{A_0 + A_1 x + \cdots + A_{n-1} x^{n-1}}{(x + a)^n} \quad \text{or} \quad \frac{B_0}{x + a} + \frac{B_1}{(x + a)^2} + \cdots + \frac{B_{n-1}}{(x + a)^n}, \]
and for each factor of the form $(ax^2 + bx + c)^n$, we include in the guess either
\[ \frac{A_0 + A_1 x + \cdots + A_{2n-1} x^{2n-1}}{(ax^2 + bx + c)^n} \quad \text{or} \quad \frac{B_0 + C_0 x}{ax^2 + bx + c} + \frac{B_1 + C_1 x}{(ax^2 + bx + c)^2} + \cdots + \frac{B_{n-1} + C_{n-1} x}{(ax^2 + bx + c)^n}, \]
where the $A_j$, $B_j$ and $C_j$ are all constants (see below for some examples).
4. Determine the constants $A_j$, $B_j$ and $C_j$.
5. Double check that your formula is correct (this is how you know whether or not
you have made a mistake – and will actually be the proof that your formula is
true).

We now take a closer look at how to make suitable guesses in Step 3 above. The
thing is that the guess we need to make for the partial fraction decomposition sometimes
needs to be adjusted. Here are some pointers on how this is done.

Method 10.37 continued (pointers on how to make suitable guesses)

1) Here is an example of a suitable guess when $\deg p(x) < \deg q(x)$, and all factors
in q(x) are of the form $(x + a)^n$ with n = 1:
\[ \frac{16x + 61}{(x + 2)(x - 3)(x - 5)} = \frac{A}{x + 2} + \frac{B}{x - 3} + \frac{C}{x - 5}. \tag{10.3} \]
2) Here is an example where one factor is of the form $(x + a)^n$ with n > 1. According
to our method, each first degree factor with higher multiplicity can be included in one of
two ways: either once with a numerator of degree one less than the multiplicity,
or several times with increasing multiplicity as shown in the second line below:
\begin{align}
\frac{16x + 61}{(x + 2)^3 (x - 3)} &= \frac{Ax^2 + Bx + C}{(x + 2)^3} + \frac{D}{x - 3} \tag{10.4} \\
&= \frac{E}{x + 2} + \frac{F}{(x + 2)^2} + \frac{G}{(x + 2)^3} + \frac{H}{x - 3} \tag{10.5}
\end{align}
3) Finally, if you have a second degree factor that you “cannot” factorise then you
can compensate by choosing a suitable degree for the numerator. For instance:
\[ \frac{16x + 61}{(x^2 + 2)(x - 3)} = \frac{Ax + B}{x^2 + 2} + \frac{C}{x - 3} \tag{10.6} \]
With the above guidelines, you essentially know all you need to know about partial
fraction decompositions of rational functions (see Remark 10.41, below).
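
To make pointer 3) concrete, here is a sketch of the full procedure for a small rational function with an irreducible second degree factor. The function $1/((x^2 + 1)(x - 1))$ is chosen here for illustration and does not appear in the notes; the coefficients are found by a mix of "laying on hands" and comparing coefficients.

```latex
% Illustrative guess: \frac{1}{(x^2+1)(x-1)} = \frac{Ax+B}{x^2+1} + \frac{C}{x-1}
\[ 1 = (Ax + B)(x - 1) + C(x^2 + 1) \]
\begin{align*}
x = 1:                    &\quad 1 = 2C       &&\implies C = \tfrac{1}{2} \\
\text{coefficient of } x^2: &\quad 0 = A + C    &&\implies A = -\tfrac{1}{2} \\
\text{constant term}:     &\quad 1 = -B + C   &&\implies B = -\tfrac{1}{2}
\end{align*}
% The coefficient of x gives 0 = B - A, which indeed holds, so the guess is consistent.
\[ \int \frac{dx}{(x^2 + 1)(x - 1)}
   = \frac{1}{2} \ln |x - 1| - \frac{1}{4} \ln(x^2 + 1) - \frac{1}{2} \arctan x + C. \]
```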

Exercise 10.38 (a) Inspired by the guidelines above, state an appropriate guess for
\[ \frac{x^3 + 2x + 1}{(x - 1)^2 (x^2 - 1)(x^2 + 1)^2}. \]
(Here, you do NOT have to explicitly compute the coefficients.)

(b) Without determining the coefficients explicitly, what can you say about the
integral of this function? (Here, it may be useful to check out exercise 10.57.)

Exercise 10.39 (a) Try to determine coefficients A, B, C so that
\[ \frac{2x + 1}{(x - 1)(x^2 - 1)} = \frac{A}{x - 1} + \frac{Bx + C}{x^2 - 1}. \]
What goes wrong? (b) Adjust the partial fraction decomposition so that it works and
compute the integral.

Exercise 10.40 Compute the following integrals:
\[ \text{(a)} \int \frac{dx}{(x - 3)(x + 2)} \qquad \text{(b)} \int \frac{dx}{x^2 (1 + x^2)} \qquad \text{(c)} \int \frac{x + 4}{x^2 - 5x + 6}\,dx \]

Remark 10.41 Note that we do not state or prove a general theorem on partial fraction
decompositions (the techniques needed to prove such a result go beyond the scope of this
course). This means that you need to prove your partial fraction decomposition each and
every time you use this technique. (But this is in any case a good idea as this amounts
to double checking whatever such formula you come up with!)
Next, we point out that the guesses mentioned in Method 10.37 are all that we need.
Indeed, any factor in q(x) can always be expressed as a product of factors of the form
$(x + a)^n$ and/or $(ax^2 + bx + c)^n$. A simple example of this would be
\[ x^3 - 1 = (x - 1)(x^2 + x + 1). \]
Moreover, if we allow complex numbers, then we can express any factor of q(x) as a
product of factors $(x + a)^n$. An example of this would be
\[ x^3 - 1 = (x - 1)(x - e^{2i\pi/3})(x - e^{-2i\pi/3}). \]
In other words, by allowing for complex numbers, we could simplify the types of guesses
we need to make! Alas, this would get us into trouble, as we do not know how to integrate
factors of the form 1/(x + a) when a is a complex number (it would require us to define
the logarithm of a complex number – which requires a deeper discussion on complex
numbers).

Exercise 10.42 Prove that all third degree polynomials can be written as a product
of a second degree polynomial and a first degree polynomial with real coefficients.
Hint: Use the graph sketching techniques from Chapter 9 in combination with Propo-
sition A.34.
Remark: Note that if we would allow complex coefficients, then by the fundamental
theorem of algebra, every polynomial of degree three can be written as a product of
three first degree polynomials.

Second trick: Integrals of rational functions that we cannot partial fraction decompose

Finally, we remark that if we try to compute the integral, say,
\[ \int \frac{dx}{x^2 + 4x + 5}, \]
using partial fraction decomposition, we will get into trouble¹. The trick to comfortably
compute such an integral is to complete the square of the denominator, and then make
"an obvious" change of variables to simplify what you get.

Example 10.43 To compute the above integral, we start by completing the square
of the denominator (if the numerator was of equal or higher degree, we would start by
doing a polynomial division):
\[ \int \frac{dx}{x^2 + 4x + 5} = \int \frac{dx}{(x + 2)^2 + 1}. \]
Next, we make the change of variables x + 2 = u. This gives dx = du, and we compute
\begin{align*}
\int \frac{dx}{(x + 2)^2 + 1} &= \int \frac{du}{u^2 + 1} = \arctan u + C \\
&= \arctan(x + 2) + C.
\end{align*}
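
When the constant left over after completing the square is not 1, a rescaling is needed as well. The following sketch uses an integrand chosen here for illustration (it is not from the notes): the square is completed and then the substitution $x + 1 = 2u$ is used to reach the arctan form.

```latex
% Illustrative example: completing the square leaves the constant 4 instead of 1
\begin{align*}
\int \frac{dx}{x^2 + 2x + 5} &= \int \frac{dx}{(x + 1)^2 + 4}
   && \text{(complete the square)} \\
 &= \int \frac{2\,du}{4u^2 + 4}
   && \text{(put } x + 1 = 2u,\ dx = 2\,du \text{)} \\
 &= \frac{1}{2} \int \frac{du}{u^2 + 1} = \frac{1}{2} \arctan u + C \\
 &= \frac{1}{2} \arctan \Big( \frac{x + 1}{2} \Big) + C.
\end{align*}
```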

Exercise 10.44 Compute the following integrals by first completing the square of the
denominator.
\[ \text{(a)} \int \frac{dx}{x^2 + 4x + 5} \qquad \text{(b)} \int \frac{dx}{x^2 + 4x + 3} \]
Exercise 10.45 Compute
\[ \int \frac{dx}{(x + 1)(x^2 + x + 1)}. \]
Exercise 10.46 Compute
\[ \text{(a)} \int \frac{dx}{x^3 - 3x^2 + 2x} \qquad \text{(b)} \int \frac{x^3}{x^2 + 5}\,dx \]

Exercise 10.47 Compute the following integral (which is surprisingly annoying):
\[ \int \frac{x\,dx}{x^2 + 4x + 5}. \]
¹ Unless we allow ourselves to use complex numbers, as mentioned in Remark 10.41.

Third trick: Expressions involving trigonometric functions


Here, we indicate a few tricks to help you compute integrals of trigonometric functions.

Example 10.48 (Trick 1) The integral
\[ \int \cos^n x\,dx \]
is fairly easy to compute when n is odd, and quite annoying when n is even. To see why
this is, we consider n = 5. Then, by the Pythagorean identity, we get
\[ \int \cos^5 x\,dx = \int \cos^4 x \cdot \cos x\,dx = \int (1 - \sin^2 x)^2 \cos x\,dx. \]
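
The same idea is perhaps easiest to see in the smallest odd case, n = 3, which is sketched below as an illustration (it is not worked out in the notes). One factor $\cos x$ is split off to serve as the inner derivative for $u = \sin x$.

```latex
% Illustrative odd-power case n = 3, with u = \sin x and du = \cos x\,dx
\begin{align*}
\int \cos^3 x\,dx &= \int (1 - \sin^2 x) \cos x\,dx \\
                  &= \int (1 - u^2)\,du \\
                  &= u - \frac{u^3}{3} + C = \sin x - \frac{\sin^3 x}{3} + C.
\end{align*}
```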

Exercise 10.49 Complete the computation of the integral in the previous example by
making a suitable change of variables.

Exercise 10.50 Use the same strategy to compute
\[ \text{(a)} \int \frac{dx}{\sin x} \qquad \text{(b)} \int \frac{1}{\tan x}\,dx \]
Hint: First, multiply with 1 so that you can use Pythagoras without square roots.

Example 10.51 (Trick 2) But how to solve
\[ \int \cos^n x\,dx \]
when n is an even number? Well, the strategy is to reduce the number n as much as
possible by using the half-angle formulas
\[ \cos^2 x = \frac{1 + \cos 2x}{2} \quad \text{and} \quad \sin^2 x = \frac{1 - \cos 2x}{2}. \]
To illustrate this, suppose that n = 4. Then, by the half-angle formula,
\begin{align*}
\int \cos^4 x\,dx &= \int \Big( \frac{1 + \cos 2x}{2} \Big)^2\,dx \\
&= \int \frac{1 + 2 \cos 2x + \cos^2(2x)}{4}\,dx.
\end{align*}
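
In the smallest even case, n = 2, a single application of the half-angle formula already finishes the job. The computation below is a sketch added for illustration; the last rewriting uses the double-angle identity $\sin 2x = 2 \sin x \cos x$.

```latex
% Illustrative even-power case n = 2
\begin{align*}
\int \cos^2 x\,dx &= \int \frac{1 + \cos 2x}{2}\,dx \\
                  &= \frac{x}{2} + \frac{\sin 2x}{4} + C \\
                  &= \frac{x + \sin x \cos x}{2} + C.
\end{align*}
```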

Exercise 10.52 Complete the computation of the integral in the previous example by
applying the half-angle formula once more.
Exercise 10.53 Compute the indefinite integral of $\sin^6 x$.

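Exercise 10.54 Pair the following integrals with suitable primitive functions.
\begin{align*}
&\text{(a)} \int \frac{\sin 2x}{1 - \cos^2 x}\,dx & &\text{(i)} \ \frac{1}{6} \sin^6 x - \frac{1}{8} \sin^8 x + C \\
&\text{(b)} \int (\sin x)^5 (\cos x)^3\,dx & &\text{(ii)} \ \frac{x}{2} - \frac{1}{4} \sin(2x) + C \\
&\text{(c)} \int \frac{dx}{\cos x} & &\text{(iii)} \ 2 \ln(\sin x) + C \\
&\text{(d)} \int \sin x \, (\cos^5 x + \cos^3 x)^{1/3}\,dx & &\text{(iv)} \ \frac{1}{2} \ln \frac{1 + \sin x}{1 - \sin x} + C \\
&\text{(e)} \int \sin^2 x\,dx & &\text{(v)} \ -\frac{3}{8} (\cos^2 x + 1)^{4/3} + C
\end{align*}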
Remark: You should make it your business to understand how to compute all of these
integrals, and not just verify their solution using differentiation.

The fourth trick: Inverse trigonometric substitutions

Example 10.55 (The basic idea) We consider the integral
\[ \int \sqrt{1 - x^2}\,dx, \qquad x \in [-1, 1]. \]
So, what to do? Well, wouldn’t it be really nice if x = sin u? Indeed, if this was the case,
we could simplify $\sqrt{1 - x^2} = \sqrt{1 - \sin^2 u} = \sqrt{\cos^2 u} = |\cos u|$. Well, there is nothing
stopping us, so let’s try! Taking into account that dx = cos u du, we get
\[ \int \sqrt{1 - x^2}\,dx = \int |\cos u| \cos u\,du. \]
Can we get rid of the absolute value? Yes, to see how, you need to notice that the change
of variables we really made is not x = sin u, but rather arcsin(x) = u (this is why we
call it an inverse substitution – if this confuses you, go back and carefully re-read the
section on the change of variables formula, and compare to what is going on here). Since
the range of arcsin(x) is $[-\pi/2, \pi/2]$, it follows that |cos u| = cos u (see Example 2.86).
Yay! In other words, we need to compute the integral
\[ \int \sqrt{1 - x^2}\,dx = \int \cos^2 u\,du. \]
By exercise 10.54 (sort of), and the fact that x = sin u is the same as arcsin x = u, we
get
\begin{align*}
\int \cos^2 u\,du &= \frac{u + \sin u \cos u}{2} + C \\
&= \frac{\arcsin x + \sin(\arcsin x) \cos(\arcsin x)}{2} + C \\
&= \frac{\arcsin x + x \sqrt{1 - x^2}}{2} + C.
\end{align*}
Here, we used in the last step that sin(arcsin x) = x and $\cos(\arcsin x) = \sqrt{1 - x^2}$ for
$x \in [-1, 1]$ (recall Example 2.86).
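
In the spirit of guess and check, the result can be confirmed by differentiation. The sketch below carries this out for $x \in (-1, 1)$, where $\arcsin x$ and $\sqrt{1 - x^2}$ are both differentiable.

```latex
% Checking the answer of Example 10.55 by differentiating, for -1 < x < 1
\begin{align*}
\frac{d}{dx} \, \frac{\arcsin x + x \sqrt{1 - x^2}}{2}
  &= \frac{1}{2} \Big( \frac{1}{\sqrt{1 - x^2}} + \sqrt{1 - x^2} - \frac{x^2}{\sqrt{1 - x^2}} \Big) \\
  &= \frac{1}{2} \Big( \frac{1 - x^2}{\sqrt{1 - x^2}} + \sqrt{1 - x^2} \Big) \\
  &= \sqrt{1 - x^2}.
\end{align*}
```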

Exercise 10.56 (challenge)

(a) Use the trigonometric substitution y = sin u to compute
\[ \int \frac{\sqrt{1 - y^2}}{y}\,dy, \qquad y \in (0, 1]. \]
(b) Use the substitution $u = 1 - y^2$ to compute this integral.
(c) Use the substitution $v = \sqrt{1 - y^2}$ to compute this integral.

Exercise 10.57 Use an inverse trigonometric substitution to compute
\[ \int \frac{dt}{(1 + t^2)^2}. \]
Hint: $1 + \tan^2 u = 1/\cos^2 u$.

Exercise 10.58 Compute the integral
\[ \int \frac{dx}{\sqrt{x^2 + 2x + 2}}. \]

(Optional) A fifth trick: Tangent of the "half-angle" inverse substitution


Integrals of the form
\[ \int f(\sin x, \cos x)\,dx, \tag{10.7} \]
where f is a rational function in two variables, can be computed using partial fraction
decompositions. The trick is to use the change of variables
\[ y = \tan \frac{x}{2}. \]
The point is that with this change of variables, it is possible (but not easy) to compute
that
\[ dx = \frac{2\,dy}{1 + y^2}, \qquad \sin x = \frac{2y}{1 + y^2}, \qquad \cos x = \frac{1 - y^2}{1 + y^2}. \]
This means that the integral in (10.7) can be rewritten and computed using the partial
fraction decomposition technique. While we will not really pursue this technique (the
expressions you end up with tend to be really horrible), here are two exercises to get a
feel of what is going on.
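
Before the exercises, here is a sketch of the substitution on an unusually friendly integrand. The example $\int dx/(1 + \cos x)$ is chosen here for illustration and is not from the notes; it uses only the three formulas displayed above.

```latex
% Illustrative use of y = \tan(x/2): first simplify the denominator
\[ 1 + \cos x = 1 + \frac{1 - y^2}{1 + y^2} = \frac{2}{1 + y^2} \]
\begin{align*}
\int \frac{dx}{1 + \cos x}
  &= \int \frac{1 + y^2}{2} \cdot \frac{2\,dy}{1 + y^2} \\
  &= \int dy = y + C = \tan \frac{x}{2} + C.
\end{align*}
```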

Exercise 10.59 Use the substitution y = tan(x/2) to solve
\[ \int \frac{dx}{\sin x}. \]

Exercise 10.60 Prove the formulas for dx, sin x and cos x in the above remark.

10.4 Exam exercises


A few of the exercises below require knowledge of the definite integral. However, to solve
these, you basically only need to solve the corresponding indefinite integrals. Indeed, as
we will see in the chapter on definite integration – and should be familiar to you from
high school – if
\[ \int f(x)\,dx = F(x) + C, \]
then the corresponding definite integral satisfies
\[ \int_a^b f(x)\,dx = F(b) - F(a). \]

Exercise 10.61 (Exam 2016-01-23, 3)

(a) State the definition of the indefinite integral and explain the difference to definite
integrals.
(b) Compute the integrals
\[ \text{(i)} \int \sin \sqrt{x}\,dx \qquad \text{(ii)} \int_0^3 |x^2 - 5x + 4|\,dx \]

Remark: In part (ii), the definite integral actually requires some thinking before it can
be solved using "indefinite" techniques.

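Exercise 10.62 (Exam 2014-10-04, 1) Compute
\[ \text{(a)} \int \frac{\sin x}{1 + 4 \cos^2 x}\,dx \qquad \text{(b)} \int \frac{dx}{e^{-4x} + e^{4x}} \]
\[ \text{(c)} \int \frac{1}{1 + 4x^2}\,dx \qquad \text{(d)} \int \sqrt{\frac{1}{4 + x}}\,dx \]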

Exercise 10.63 (Exam 2014-08-18, 3) Consider the definite integral
\[ \int_0^a \sqrt{1 - x^2}\,dx, \qquad a \in [0, 1]. \]
(a) Make a figure illustrating the area determined by the above integral. In particular,
determine $\int_0^1 \sqrt{1 - x^2}\,dx$ without actually integrating.
(b) Prove that $\sin(2 \arcsin a) = 2a \sqrt{1 - a^2}$ for all $a \in [0, 1]$.
(c) Compute the integral by making the substitution x = sin u.

Exercise 10.64 (Exam 2014-05-26, 2)

(a) Explain briefly the relation between a primitive function $\int f(x)\,dx$ and the
integral $\int_a^b f(x)\,dx$.
(b) Find all primitive functions of
\[ \frac{\tan x}{\sin x + 1}. \]
(c) Compute
\[ \int_1^3 \frac{dx}{2 + |x^2 - 2|}. \]
Remark: As in the first exercise above, part (c) requires some thought before the in-
definite techniques can be used.

Exercise 10.65 (Exam 2012-12-19, 3) Compute
\[ \int \frac{\sin 2\theta}{\cos^2 \theta + 6 \cos \theta + 10}\,d\theta. \]

Exercise 10.66 (Exam 2012-05-28, 4) Calculate the integrals
\[ \text{(a)} \int_{3^2}^{2^6} \frac{\sqrt{1 + \sqrt{x}}}{\sqrt{x}}\,dx \qquad \text{(b)} \int_1^{e^{\pi/2}} \sin(\ln x)\,dx \]

10.5 Answers to selected exercises


10.1 (i) $\arctan(\cos x) + C$, (ii) $\ln(1 + \cos^2 x) + C$, (iii) $2 \sin(\sqrt{x}) - 2 \sqrt{x} \cos(\sqrt{x}) + C$,
(iv) $(4/3)(\sqrt{x} + 1)^{3/2} + C$.

10.5
\[ (F(x) + C)' = \lim_{h \to 0} \frac{(F(x + h) + C) - (F(x) + C)}{h} = \lim_{h \to 0} \frac{F(x + h) - F(x)}{h} = f(x). \]

10.7 (a) About 14 m/s, which is roughly the same as 51 km/h. (b) About 12 meters
(you do not need to differentiate to figure this out. For instance, you can deduce
this by completing the square of the expression – indeed, this was how we found
the range of a second degree expression in Chapter 0.).

10.10 (a) $x + C$, (b) $\frac{x^2}{2} + C$, (c) $-x^{-1} + C$, (d) $C$.

10.13 (a) $-1/(2x^2) + C$, (b) $-e^{-x^2}/2 + C$, (c) $\ln(1 + x^3) + C$, (d) $\tan x - x + C$. (In (d),
we used the fact that we already know that $(\tan x)' = 1 + \tan^2 x$.)

10.16 We give the argument for (i). On the one hand, if F(x) is a primitive of f(x), then
it follows by the product rule for derivatives that $(kF(x))' = kf(x)$. It therefore
follows by the definition of indefinite integrals that
\[ \int k f(x)\,dx = kF(x) + C. \]
On the other hand,
\[ k \int f(x)\,dx = k(F(x) + D) = kF(x) + kD. \]
As we allow C and D to run through all of $\mathbb{R}$, the two formulas above
describe exactly the same functions (given that $k \neq 0$). Hence, these two indefinite
integrals represent the same sets of functions, and we conclude that
\[ \int k f(x)\,dx = k \int f(x)\,dx. \]

10.17 Hint: Write
\[ \int \frac{x^2}{1 + x^2}\,dx = \int \frac{x^2 + 1 - 1}{1 + x^2}\,dx = \int \Big( 1 - \frac{1}{1 + x^2} \Big)\,dx. \]

10.20 (a) $x e^x - e^x + C$, (b) $(x^2 - 2) \sin x + 2x \cos x + C$, (c) $-(\ln(x) + 1)/x + C$.

10.21 (a) $x \ln(x) - x + C$, (b) $x \arctan(x) - (1/2) \log(x^2 + 1) + C$.

10.22 (a) $(1/2) e^x (\sin x - \cos x) + C$, (b) $(1/2)(x - \sin x \cos x) + C$, (c) $(\ln(x))^2/2 + C$.

10.28 (a) $(1/22)(2x + 3)^{11} + C$, (b) $e^{\sin x} + C$, (c) $(1/2) \ln(x^2 + 1) + C$, (d) $-\ln(1 + \cos^2 x) + C$.

10.29 The change of variable $x - 1 = u$ will work.

10.31 $(x + 1) \arctan(\sqrt{x}) - \sqrt{x} + C$.

10.32 For instance, rule (iii) gives the formula
\[ \int \frac{g'(x)}{g(x)^2}\,dx = -\frac{1}{g(x)} + C. \]

10.34 You need to find $A, B$ that solve
\[ \frac{x}{(x-1)(x+1)} = \frac{A}{x+1} + \frac{B}{x-1}. \]

That is, you need to find $A, B$ that solve $x = A(x-1) + B(x+1)$. Using either
method from Example 10.33 gives $A = B = 1/2$.

10.35 First express the integral as
\[ 3\int \frac{x\,dx}{(x+1)(x-2)} + 2\int \frac{dx}{(x+1)(x-2)}, \]
and then use the method from Example 10.33.

10.36 (a) Compare the limit of x ! 1 of both expressions, can they ever be the same?
(b) x + ln |x2 1| ln |x+1|
2 + C.

10.38 (a)
\[ \frac{A + Bx + Cx^2}{(x-1)^3} + \frac{D}{x+1} + \frac{E + Fx + Gx^2 + Hx^3}{(x^2+1)^2} \]
or
\[ \frac{A}{x-1} + \frac{B}{(x-1)^2} + \frac{C}{(x-1)^3} + \frac{D}{x+1} + \frac{E + Fx}{x^2+1} + \frac{G + Hx}{(x^2+1)^2}. \]

(b) According to the second of the above expressions, the integral will be equal to
\[ A\log|x-1| - \frac{B}{x-1} - \frac{C}{2(x-1)^2} + D\log|x+1| + E\arctan(x) + \frac{F}{2}\log(1+x^2) + G\,??? - \frac{H}{2}\cdot\frac{1}{x^2+1} + C, \]
where ??? stands for whatever the answer to exercise 10.57 is.

10.39 (a) Not possible. The factor $(x - 1)$ appears in both expressions on the right-hand
side.

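10.40 Use the partial fraction decompositions:
\[ \text{(a) } \frac{A}{x-3} + \frac{B}{x+2} \qquad \text{(b) } \frac{A}{x} + \frac{B}{x^2} + \frac{C + Dx}{1 + x^2} \qquad \text{(c) } \frac{A}{x-2} + \frac{B}{x-3}. \]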

10.44 (a) $\arctan(x + 2) + C$, (b) $(1/2)\log\left|\frac{x+1}{x+3}\right| + C$.

10.45 $(-1/2)\log(x^2 + x + 1) + \log|x + 1| + (1/\sqrt{3})\arctan((2x + 1)/\sqrt{3}) + C$.

10.49 $\sin x - (2/3)\sin^3 x + (1/5)\sin^5 x + C$.

10.50 In (a), the trick is to consider $\int \sin(x)/\sin^2(x)\,dx$, and in (b) to consider $\int \sin x\cos x/\cos^2 x\,dx$,
and then apply Pythagoras to both denominators before proceeding with Trick 1.

10.54 (a) - (iii), (b) - (i), (c) - (iv), (d) - (v), (e) - (ii).

10.56 (a) A rather involved computation gives
\[ \cos(\arcsin(y)) - \frac{1}{2}\ln\frac{1 + \cos(\arcsin y)}{1 - \cos(\arcsin y)} + C. \]

Plugging in the expression for $\cos(\arcsin(y))$, and then simplifying, leads you to
the expression
\[ \sqrt{1 - y^2} - \log\big(\sqrt{1 - y^2} + 1\big) + \log(y) + C. \]

(b) To make this method work, start out by writing
\[ \int \frac{\sqrt{1 - y^2}}{y}\,dy = \int \frac{\sqrt{1 - y^2}}{y^2}\,y\,dy. \]

10.57 The substitution $t = \tan u$ gives the answer
\[ \frac{1}{2}\Big(\frac{t}{1 + t^2} + \arctan(t)\Big) + C. \]

10.60 Rewrite $y = \tan(x/2)$ as $\arctan y = x/2$. With this, implicit differentiation will
immediately give you the formula for $dx$. To get the formulas for $\sin x$ and $\cos x$,
observe that $x = 2\arctan y$ and that the expressions you need to compute are
$\sin(2\arctan y)$ and $\cos(2\arctan y)$. You can now do as we did in Chapter 0 (that
is, either work geometrically or use trigonometric identities).
Chapter 11

Differential equations

Introduction
Differential equations are the main mathematical tool used by scientists to formulate the
laws of nature. In Chapter 7, we discussed the intuition behind differential equations and
how to simulate them. Here, we discuss how to solve them using the tools of Calculus.

Remark 11.1 (Selected problems from previous exams based on this chapter)

1. Determine all solutions to the following differential equations.

(i) $\dfrac{1}{x\sin x} = \dfrac{1 + y^2}{y'}$, (ii) $y' - \dfrac{y}{x} = \dfrac{1}{1 + x^2}$, $x > 0$, (iii) $y'' + y' - 2y = e^x$.

11.1 Solution methods for first order differential equations


A first order differential equation is any equation involving a function $y$, its variable $x$
and its derivative $y'$ (but no higher order derivative).
We are going to study the following methods for finding formulas for the solutions of
first order differential equations:
1. Guess and check.
2. Two methods based on using the chain rule and the product rule for derivatives
backwards.
Note that these are essentially the same strategies that we have for computing indefinite
integrals. This should not come as a big surprise, as computing $\int f(x)\,dx$ is the same as
solving the most basic of all first order differential equations, namely $y' = f(x)$.


First solution method: First order separable differential equations

A first order differential equation is said to be separable if it can be written in the form
\[ g(y)\,y' = f(x). \tag{11.1} \]
In particular, both the continuous-time proportional growth model and logistic-growth
model considered in Chapter 7 are of this form. We now describe a method that can be
used to solve them.

Example 11.2 (Solving separable differential equations – first explanation)


To solve
\[ y' = 2x\cdot(1 + y^2), \]
we first "separate the variables" by rewriting this in the form (11.1):
\[ \frac{y'}{1 + y^2} = 2x. \tag{11.2} \]
The trick is now to identify the left-hand side as the result of the use of the chain rule.
In this case, it is the result of computing the expression
\[ \frac{d}{dx}\arctan(y). \]
This means that we can solve the differential equation as follows:
\begin{align*}
\frac{y'}{1 + y^2} = 2x &\iff \frac{d}{dx}\arctan(y) = 2x \iff \arctan(y) = \int 2x\,dx \\
&\iff \arctan(y) = x^2 + C \iff y = \tan(x^2 + C).
\end{align*}

Note that for the last line in the above chain of equivalences to hold, $x^2 + C$ must be in the
interval $(-\pi/2, \pi/2)$ (do you see why?). However, this restriction is not that important.
Indeed, treating $y = \tan(x^2 + C)$ as a guess for the solution, we can see, by checking,
that everything is fine whenever $x^2 + C$ is not an integer multiple of $\pi/2$ (again, why?).

In the above example, we more or less tried to explain exactly what was going on.
However, since the level of detail may hide the solution strategy, we now try to express
the above solution in a way that (hopefully) reveals the strategy more clearly.

Example 11.3 (Solving separable differential equations – second explanation)


Let us now take advantage of the Leibniz notation to give a more "compact" explanation
of how to solve the differential equation $y' = 2x\cdot(1 + y^2)$. The point is to write $y' = dy/dx$
and to pretend that we can separate this "fraction". That is, the above solution method
can now be expressed as follows:
\begin{align*}
y' = 2x\cdot(1 + y^2) &\iff \frac{dy}{dx} = 2x\cdot(1 + y^2) \\
&\iff \frac{dy}{1 + y^2} = 2x\,dx && \text{(the variables are now separated)} \\
&\iff \int \frac{dy}{1 + y^2} = \int 2x\,dx && \text{(we "integrate" both sides)} \\
&\iff \arctan(y) = x^2 + C \iff y = \tan(x^2 + C).
\end{align*}

Here, the comment that we made at the end of Example 11.2 also applies.
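
The advice to verify a solution by inserting it into the equation can also be followed numerically. The following is a minimal sketch, not part of the method itself: the helper names `y` and `residual`, the constant C = 0.2 and the step h are our own illustrative choices. It estimates y' by a central difference and checks that y' - 2x(1 + y^2) is close to zero at a few points where x^2 + C stays inside (-pi/2, pi/2).

```python
import math

def y(x, C):
    # Candidate solution y = tan(x^2 + C) from Examples 11.2 and 11.3.
    return math.tan(x * x + C)

def residual(x, C, h=1e-6):
    # y'(x) - 2x(1 + y(x)^2), with y' estimated by a central difference.
    dy = (y(x + h, C) - y(x - h, C)) / (2 * h)
    return dy - 2 * x * (1 + y(x, C) ** 2)

# Sample points chosen so that x^2 + C stays inside (-pi/2, pi/2).
for x in (0.0, 0.3, 0.7, 1.0):
    print(x, residual(x, C=0.2))
```

The printed residuals should be tiny compared with the size of y' itself; they are not exactly zero because the derivative is only approximated.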

Exercise 11.4 Use the above method to solve the following differential equations
(and verify that your solutions satisfy the original differential equations!).

(a) $y' = (1 + e^x)y^2$ (b) $(1 + x^2)y' = xy$ (c) $xy' + e^{-y} = 1$, $x > 0$.

Exercise 11.5 Solve the differential equation in the model for continuous-time
proportional growth (recall Example 7.22).

Exercise 11.6 (Challenging) Solve the differential equation in the continuous-time
model for logistic growth (recall Example 7.22).

For easy reference, here is a brief summary of the above solution method.

Method 11.7 (Solution method for separable first order differential equations)
\begin{align*}
g(y)\,y' = f(x) &\iff g(y)\frac{dy}{dx} = f(x) \\
&\iff g(y)\,dy = f(x)\,dx \\
&\iff \int g(y)\,dy = \int f(x)\,dx.
\end{align*}

Here, the same remark as we made in the text following Proposition 8.25 applies.

Second solution method: First order linear differential equations


A differential equation is said to be first order linear¹ if it can be written in the form
\[ y' + p(x)y = q(x). \tag{11.3} \]

The most classical textbook example of a physical situation modelled by such a differential
equation is the speed attained by an object in free fall where we take air resistance into
account. To see how this is, we notice that by the laws of Newton, such a model ought to
look something like
\[ mv' = m\cdot 9.82 - \text{air resistance}, \]
where $m$ is the mass of the object, and we have chosen "downwards speed" as being
positive. Now, a standard simplifying assumption (which is not very realistic – so, why
do you think we make this assumption?) is that the force exerted by air resistance is
proportional to the speed. That is, the above model is of the form
\[ mv' = m\cdot 9.82 - kv. \tag{11.4} \]

Fig. 1. Meet Elsa. Elsa just jumped out of a plane...

Exercise 11.8 Explain exactly why this differential equation is of the form (11.3).

We now describe a solution method tailored for these types of differential equations.

Example 11.9 (The basic idea) Let us solve
\[ xy' + y = \sin(x). \]
Since this differential equation is not separable, we need a new idea for how to solve it.
So what to do? This time, the trick is to identify the left-hand side as the result of the
use of the product rule for derivatives. In this case, it is the result of computing the
expression
\[ \frac{d}{dx}\Big(x\,y(x)\Big). \]
This means that we can solve the differential equation as follows:
\begin{align*}
xy' + y = \sin(x) &\iff (xy)' = \sin x \iff xy = \int \sin x\,dx \\
&\iff xy = -\cos x + C \\
&\iff y = \frac{C - \cos x}{x}.
\end{align*}

¹ We explain why we use the word linear when we deal with second order differential equations.

Unfortunately, this method has an obvious weakness: what if there is no obvious way
of applying the product rule "backwards"? For instance, how to solve
\[ y' + 2y = \sin x\,? \]

The thing is that there is a miraculous trick that always fixes this problem. That is, we
can always use the product rule backwards if we first multiply by a suitable integrating
factor!

Example 11.10 (A first look at integrating factors) We now try to solve the
differential equation
\[ y' + 2y = \sin x. \]
Here, it is not immediately clear how to apply the product rule backwards to simplify
the left-hand side. But look what happens if we multiply by $e^{2x}$ on both sides:
\[ e^{2x}y' + 2e^{2x}y = e^{2x}\sin x. \]

Suddenly, the left-hand side is the result of a product rule. Indeed, the left-hand side is
equal to
\[ \frac{d}{dx}\Big(e^{2x}y\Big). \]
The rest of the solution is now more or less as in the previous example (but with a harder
integral to compute), and we leave it as an exercise.

Exercise 11.11 Complete the above example.

Exercise 11.12 Below, we explain how to identify appropriate integrating factors. But
before reading on, try to figure out the integrating factors needed to solve the following
differential equations:

(a) $y' + 3y = x$ (b) $y' + 2xy = x$

But how to determine the integrating factor? Well, suppose your differential equation
is of the form
\[ y' + p(x)y = \sin(x). \]
The trick is to multiply by the factor $e^{P(x)}$, where $P(x)$ is a primitive of $p(x)$. Indeed,
the differential equation becomes
\[ e^{P(x)}y' + e^{P(x)}p(x)y = e^{P(x)}\sin(x). \]

You can now verify yourself that the left-hand side is of the form (do this!)
\[ \frac{d}{dx}\Big(e^{P(x)}y\Big). \]

Exercise 11.13 Identify the appropriate integrating factors, and use them to solve:

(a) $y' - \dfrac{1}{x}\cdot y = x^2e^x$, $x > 0$ (b) $(1 + x^2)\cdot y' + y = 1$ (c) $y' + xy = x^3$.
Exercise 11.14 (a) Determine the solutions of the ODE
\[ \frac{1}{x^2}y' - \frac{2}{x^3}y + \frac{1}{x^2 + x} = 0, \quad x > 0. \]
(b) Which of these solutions satisfies the terminal condition $\lim_{x\to\infty} \dfrac{f(x)}{x^2} = 1$?

Exercise 11.15 (a) Use the method of integrating factors to solve the differential
equation for the free fall with air resistance given by the equation
\[ mv' = m\cdot 9.82 - kv. \]
(b) In the initial stages of a skydive, people tend to lie horizontally to increase air
resistance, and thus decrease the speed of the fall. The stable velocity at this
stage is (according to the web) about 200 km/h. Supposing this is true for Elsa,
determine $k$ by considering a suitable limit of the differential equation in (a).
For easy reference, here is a brief summary of the above solution method.

Method 11.16 (Solution method for linear first order ODEs)
\begin{align*}
y' + p(x)y = q(x) &\iff e^{P(x)}y' + p(x)e^{P(x)}y = q(x)e^{P(x)} \\
&\iff \frac{d}{dx}\Big(e^{P(x)}y\Big) = q(x)e^{P(x)} \\
&\iff e^{P(x)}y = \int q(x)e^{P(x)}\,dx \iff y = e^{-P(x)}\int q(x)e^{P(x)}\,dx.
\end{align*}

Here $P(x)$ is a primitive of $p(x)$. Note that we are justified in multiplying these equations
by $e^{P(x)}$ since this expression is never zero.
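
To see the formula of Method 11.16 at work on numbers, here is a small sketch that evaluates it with numerical integration. Everything in it is an illustrative assumption rather than part of the method: the function names `simpson` and `solve_linear`, the use of Simpson's rule, and the choice to fix the constant through an initial value y(x0) = y0 (so that P(x0) = 0). As a test case we take Example 11.9 in the form y' + y/x = sin(x)/x on x > 0, where the exact solution with y(1) = 0 is y = (cos 1 - cos x)/x.

```python
import math

def simpson(f, a, b, n=200):
    # Composite Simpson rule on [a, b] with n (even) subintervals.
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def solve_linear(p, q, x0, y0, x):
    # Value at x of the solution of y' + p(x) y = q(x) with y(x0) = y0,
    # using y = e^{-P(x)} (y0 + integral of q e^{P} from x0 to x),
    # where P is the primitive of p with P(x0) = 0.
    P = lambda t: simpson(p, x0, t)
    integral = simpson(lambda t: q(t) * math.exp(P(t)), x0, x)
    return math.exp(-P(x)) * (y0 + integral)

# Example 11.9 rewritten as y' + (1/x) y = sin(x)/x, with y(1) = 0.
approx = solve_linear(lambda t: 1 / t, lambda t: math.sin(t) / t, 1.0, 0.0, 2.0)
exact = (math.cos(1.0) - math.cos(2.0)) / 2.0
print(approx, exact)
```

The two printed numbers should agree to many decimals. The nested integration is wasteful, but it keeps the code as close as possible to the formula.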

We now remark that since at every step of the above description of the solution
procedure for first order linear differential equations we have equivalences, it follows that
the procedure gives us every solution of the differential equation. We cannot make the
same claim about our solution method for separable differential equations since one step,
strictly speaking, does not make sense (treating $dy/dx$ as a quotient). We ask you to fix
this in the next exercise.
Exercise 11.17 (Challenge) Inspired by the above solution method for first order
linear differential equations, try to rewrite the solution method for separable differential
equations so that it becomes clear that it gives all solutions (in implicit form).
Hint: Involve the primitive functions $G$ and $F$ of $g$ and $f$, respectively.

11.2 Second order linear ODEs with constant coefficients


We now study second order linear ODEs with constant coefficients. These are exactly
the differential equations of the form
\[ ay'' + by' + cy = g(x) \tag{11.5} \]
or, what is basically the same,
\[ y'' + py' + qy = f(x) \tag{11.6} \]
where $a, b, c, p, q$ are constants and $g(x), f(x)$ functions. Below, we discuss the physical
significance of these equations and study how to solve them (at some point we also
discuss why we use the term "linear" here).
Physical significance
Linear second order differential equations give a mathematical model for the oscillation
of a mass-spring system. For non-physicists, this may seem like a contrived example, but
it cannot be overstated how fundamental this system is to the understanding of many
(perhaps all?) phenomena studied by physicists (indeed, everything vibrates – even
vacuum!). When interpreted this way, we should consider $x$ to represent time, $y$ the
displacement of the spring from its equilibrium point, the constants $a$, $b$ and $c$ represent
inherent properties of the spring-mass system (see the figure), while $f(x)$ should be
thought of as an external force (for example gravity, wind, or a motor that is attached
to the system).

Fig. 2. Explanation of the spring-mass system.

As an example, let us consider the initial value problem
\[ \begin{cases} y'' + y' + 4y = 0 \\ y(0) = 1 \\ y'(0) = 0 \end{cases} \]
Here, the mass of the system is $a = 1$, the friction of the spring is $b = 1$, the stiffness
is $c = 4$ (we ignore units) and there is no external force. The two initial conditions,
both at time $x = 0$, say that the object has an initial displacement of $y = 1$ and speed
$y' = 0$.

Fig. 3. The initial condition $y(0) = 1$ essentially means that we push the mass up one
unit, and then release at time $x = 0$. The condition $y'(0) = 0$ means that there is no
initial speed.

In Fig. 4, we see a simulation of $y'' + y' + 4y = 0$ with initial conditions $y(0) = 1$ and
$y'(0) = 0$ and step size $\Delta x = 0.1$. Notice how the friction is slowly killing off the
oscillations. Also note that since this is a second order differential equation, we cannot
visualise it in a slope field. Moreover, we need to modify Euler's method ever so slightly
to make the simulation (we discuss how to simulate second order linear differential
equations further below).

Fig. 4.

Note that there are two initial conditions above. In general, you should impose the
same number of initial conditions as the order of the differential equation. To get a feeling
for why, recall that when solving $y'' = 10$, you integrate twice, giving you the solution
$y = 5x^2 + Cx + D$. To determine both coefficients, we need two initial conditions. This
is morally true for all second order equations (even if the solution methods vary).
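
One common way to simulate such an equation, and to see where the two initial conditions enter, is to introduce the speed v = y' and apply Euler's method to the pair y' = v, v' = f(x) - p v - q y. The sketch below does this; it is our own assumption about a reasonable scheme (the function name `euler_second_order` and its parameters are made up for illustration) and need not coincide with the modification of Euler's method used for the figures.

```python
def euler_second_order(p, q, f, y0, v0, dx, steps):
    # Euler's method for y'' + p y' + q y = f(x), rewritten as the system
    #   y' = v,   v' = f(x) - p v - q y,
    # started at x = 0 with y(0) = y0 and y'(0) = v0.
    x, y, v = 0.0, y0, v0
    points = [(x, y)]
    for _ in range(steps):
        # Both updates use the old values of y and v.
        y, v = y + dx * v, v + dx * (f(x) - p * v - q * y)
        x += dx
        points.append((x, y))
    return points

# The damped spring y'' + y' + 4y = 0 with y(0) = 1, y'(0) = 0 and step 0.1.
points = euler_second_order(p=1.0, q=4.0, f=lambda x: 0.0,
                            y0=1.0, v0=0.0, dx=0.1, steps=100)
for x, y in points[:5]:
    print(round(x, 1), y)
```

Both initial values are needed just to take the first step, which is another way of seeing why a second order equation requires two initial conditions.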

Fig. 5. Here is what happens if we change the initial conditions to $y(0) = 0$ and
$y'(0) = 2$. This means that we start the motion by giving the object a vertical
punch upwards resulting in an initial motion with speed 2 (whatever this means).
To the right, we have simulated (with $\Delta x = 0.1$) the effect of such an initial condition
on a system with evolution equation $y'' + y' + 4y = 0$. Notice that the dots are more
spread at first, which indicates a high initial velocity.

Exercise 11.18 Consider the system
\[ \begin{cases} y'' + by' + 4y = 0 \\ y(0) = 1 \\ y'(0) = 0 \end{cases} \]
Depending on $b$, the system is either underdamped ($b < 0$), harmonic ($b = 0$), damped
($0 < b < 4$), critically damped ($b = 4$) or overdamped ($b > 4$).
(a) Find some app/program on the internet that lets you simulate these differential
equations and experiment with different values of $b$ to see if you can recognise
these different types of behaviour.
(b) If the stiffness is changed from 4 to 1, try to figure out for what values of $b$ these
different behaviours occur.

Differential operators
We now start our discussion of how to solve second order differential equations of the
form
\[ y'' + py' + qy = f(x), \quad p, q \in \mathbb{R}. \tag{11.7} \]
The basic idea is to figure out a way to factorise such second order differential equations
into first order linear differential equations of the form
\[ y' + g(x)y = h(x), \]
for which we already have a nice solution method (integrating factors!). To this end, we
first recall that we can express the derivative of, say, a sum as follows:
\[ f'(x) + g'(x) = \frac{d}{dx}\Big(f(x) + g(x)\Big). \]
Now, in the same way, for some function $f = f(x)$, we should be able to write
\[ f'(x) + 2f(x) = \Big(\frac{d}{dx} + \mathbf{2}\Big)f(x). \]
Here, $(d/dx + \mathbf{2})$ is what we call a differential operator. In particular, just as the symbol
$d/dx$ on its own is not a function, neither is the symbol $\mathbf{2}$. Instead, both symbols are what we
call operators, that is, functions that act on functions (we use boldface font in
order to distinguish between the number $2$ and the operator $\mathbf{2}$). Specifically, the operator
$d/dx$ takes a function $f$ as input and gives the function $f'$ as output, and the operator
$\mathbf{2}$ takes a function $f$ as input and gives the function $2f$ as output.

Example 11.19 We now give an example to show why it is important to distinguish
between the operator $\mathbf{2}$ and the number $2$. Indeed, consider the expressions
\[ \text{(i) } \frac{d}{dx}2 \quad \text{and} \quad \text{(ii) } \frac{d}{dx}\mathbf{2}. \]
Here, the expression (i) is just the derivative of the constant function $y = 2$, which is
equal to zero. The expression (ii), on the other hand, is a composition of differential
operators (since multiplication of operators always means composition, we usually skip
the "$\circ$"-symbol). This differential operator first multiplies a function by 2, and then
takes its derivative. Specifically, by the rules of composition, we compute
\[ \Big(\frac{d}{dx}\mathbf{2}\Big)(f) = \frac{d}{dx}(2f) = 2f'(x). \]

Exercise 11.20 Suppose $g$ is some differentiable function, and let $\mathbf{g}$ be the operator of
multiplication by $g$. Check whether or not it is true in general that $(d/dx)\mathbf{g} = \mathbf{g}(d/dx)$.

How to factorise second order differential operators


We now observe that differential operators act very similarly to polynomials. To help us
explain this, we make the following definition (where $d^2/dx^2 = (d/dx)^2$, as is usual).

Definition 11.21 (characteristic polynomial) Given $p, q \in \mathbb{R}$ and a differential
operator
\[ \Big(\frac{d^2}{dx^2} + p\frac{d}{dx} + q\Big), \]
its characteristic polynomial is given by
\[ p(r) = r^2 + pr + q. \]

The main point is that by factorising the characteristic polynomial of a differential
operator, we can also factorise the differential operator itself.

Example 11.22 (How to factorise a differential equation) Let us consider the
differential operator
\[ \Big(\frac{d^2}{dx^2} + 3\frac{d}{dx} + 2\Big)y = y'' + 3y' + 2y. \]
The point is now to convince you that this differential operator behaves similarly to its
characteristic polynomial
\[ p(r) = r^2 + 3r + 2. \]
Now, by using the pq-formula, we find that
\[ p(r) = (r + 1)(r + 2). \]

But what happens if we now replace each occurrence of $r$ by $d/dx$? Well, this gives us
a differential operator that acts as follows:
\begin{align*}
\Big(\frac{d}{dx} + 1\Big)\Big(\frac{d}{dx} + 2\Big)y &= \Big(\frac{d}{dx} + 1\Big)(y' + 2y) \\
&= \frac{d}{dx}(y' + 2y) + (y' + 2y) \\
&= (y'' + 2y') + (y' + 2y) \tag{11.8} \\
&= y'' + 3y' + 2y.
\end{align*}
That is, by considering the characteristic polynomial, we have arrived at the identity
\[ \Big(\frac{d^2}{dx^2} + 3\frac{d}{dx} + 2\Big) = \Big(\frac{d}{dx} + 1\Big)\Big(\frac{d}{dx} + 2\Big). \]
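
The slogan that operators are functions acting on functions can be made concrete in code. In the sketch below, which is only an illustration, a polynomial is stored as its list of coefficients [c0, c1, c2, ...], and the hypothetical helper `D_plus(c)` returns the operator (d/dx + c) as a Python function. We then check the factorisation of Example 11.22 on one arbitrarily chosen polynomial.

```python
def D(poly):
    # Derivative of a polynomial given as coefficients [c0, c1, c2, ...].
    return [k * c for k, c in enumerate(poly)][1:] or [0]

def add(p, q):
    # Sum of two polynomials, padding the shorter list with zeros.
    n = max(len(p), len(q))
    p = p + [0] * (n - len(p))
    q = q + [0] * (n - len(q))
    return [a + b for a, b in zip(p, q)]

def scale(c, p):
    # The polynomial c * p.
    return [c * a for a in p]

def D_plus(c):
    # The operator (d/dx + c): takes a polynomial, returns a polynomial.
    return lambda p: add(D(p), scale(c, p))

y = [1, -2, 0, 5]                                        # y = 1 - 2x + 5x^3
factored = D_plus(1)(D_plus(2)(y))                       # (d/dx + 1)(d/dx + 2) y
direct = add(add(D(D(y)), scale(3, D(y))), scale(2, y))  # y'' + 3y' + 2y
print(factored == direct)
```

Note that composing the operators is literally composing the Python functions: the inner operator is applied first, exactly as in (11.8).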

Exercise 11.23 Factorise the differential operator corresponding to $y'' + y' - 2y$.



How to solve a factored second order differential equation


We now use the observation from Example 11.22 to solve the differential equation.

Example 11.24 (First part) We now solve the differential equation
\[ y'' + 3y' + 2y = e^x. \]

By Example 11.22, we know that it can be written in the factorised form
\[ \Big(\frac{d}{dx} + 1\Big)\Big(\frac{d}{dx} + 2\Big)y = e^x. \]
But how to proceed? Let us jump out of the example for a while to discuss the strategy.

The idea is basically to use the factorised form to express the second order differential
equation as a system of two first order differential equations. This can be done as follows:

• The first equation: Put $u = (d/dx + 2)y$. This gives $u = y' + 2y$.

• The second equation: Compute $(d/dx + 1)u = e^x$. This gives $u' + u = e^x$.

That is, for $y$ to solve $y'' + 3y' + 2y = e^x$ we must have both
\[ \begin{cases} y' + 2y = u \\ u' + u = e^x. \end{cases} \]

Conversely, note that if $y$ (together with $u$) solves this system, then $y$ is a solution
of the original differential equation! We formulate this as a proposition.

Proposition 11.25 (Solution theorem for second order linear differential equations)
Let $p, q \in \mathbb{R}$, and consider the differential equation
\[ y'' + py' + qy = f(x). \]

Moreover, suppose that the characteristic polynomial can be factored in the form
\[ p(r) = r^2 + pr + q = (r - r_1)(r - r_2). \]

Then $y$ is a solution of the differential equation if and only if it solves the system
\[ \begin{cases} y' - r_2y = u \\ u' - r_1u = f(x). \end{cases} \]

Exercise 11.26 Write down a proof for the above proposition.



Let us illustrate how to solve such a system by continuing our work on the above
example. Notice that by using the theorem, we do not have to mention differential
operators at all (from now on, they do their work in the background!):

Example 11.24 (Second part) We again consider the differential equation
\[ y'' + 3y' + 2y = e^x. \]
Since the characteristic polynomial is
\[ p(r) = r^2 + 3r + 2 = (r + 1)(r + 2), \]
we need, by Proposition 11.25, to solve the system
\[ \begin{cases} y' + 2y = u \\ u' + u = e^x. \end{cases} \]

Step 1: To solve $u' + u = e^x$, we use the integrating factor $e^x$:
\begin{align*}
u' + u = e^x &\iff u'e^x + ue^x = e^{2x} \\
&\iff (ue^x)' = e^{2x} \\
&\iff ue^x = \frac{1}{2}e^{2x} + A \\
&\iff u = \frac{1}{2}e^x + Ae^{-x}.
\end{align*}
Step 2: We plug the expression for $u$ into $u = y' + 2y$ to get the differential equation
\[ y' + 2y = (1/2)e^x + Ae^{-x}. \]

This time, we multiply by the integrating factor $e^{2x}$:
\begin{align*}
y' + 2y = \frac{1}{2}e^x + Ae^{-x} &\iff e^{2x}y' + 2e^{2x}y = \frac{1}{2}e^{3x} + Ae^x \\
&\iff (e^{2x}y)' = \frac{1}{2}e^{3x} + Ae^x \\
&\iff e^{2x}y = \frac{1}{6}e^{3x} + Ae^x + B \\
&\iff y = \frac{1}{6}e^x + Ae^{-x} + Be^{-2x}.
\end{align*}
Done!
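
As a sanity check of the final formula, we can insert it back into the equation. The sketch below is only an illustration: the function name `residual` and the sample values of A and B are our own choices. It uses the derivatives of y = (1/6)e^x + Ae^{-x} + Be^{-2x} computed by hand and evaluates y'' + 3y' + 2y - e^x, which should vanish for every choice of A and B.

```python
import math

def residual(x, A, B):
    # y'' + 3y' + 2y - e^x for y = e^x/6 + A e^{-x} + B e^{-2x}.
    e1, em1, em2 = math.exp(x), math.exp(-x), math.exp(-2 * x)
    y = e1 / 6 + A * em1 + B * em2
    dy = e1 / 6 - A * em1 - 2 * B * em2      # y'
    d2y = e1 / 6 + A * em1 + 4 * B * em2     # y''
    return d2y + 3 * dy + 2 * y - e1

for A, B in [(0.0, 0.0), (1.0, -2.0), (3.5, 0.25)]:
    print(A, B, [residual(x, A, B) for x in (-1.0, 0.0, 1.0)])
```

Up to rounding errors, every printed residual is zero, no matter which constants are used. This reflects that the terms with A and B are homogeneous solutions.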

Exercise 11.27 Solve $y'' + y' = \sin x$.



A structure theorem for homogeneous solutions


Given any differential equation of the form
\[ y'' + py' + qy = f(x), \]
we call the solutions to
\[ y'' + py' + qy = 0 \tag{11.9} \]
its homogeneous solutions. That is, these are the solutions to the spring-mass system
when it is not influenced by external forces.
As it turns out, the homogeneous solutions are rather well behaved. In particular,
they satisfy the two following propositions².

Proposition 11.28 Suppose that $y_1, y_2$ are two solutions of (11.9). Then for all $A, B \in \mathbb{C}$,
the linear combinations $y = Ay_1 + By_2$ are also solutions to (11.9).

The next result provides us with a formula for these solutions.

Proposition 11.29 (Structure theorem for homogeneous solutions) For all $p, q \in \mathbb{R}$,
the following holds:

(i) If the characteristic polynomial of (11.9) has two different roots (i.e., $r_0 \neq r_1$),
then all solutions of (11.9) are given by
\[ y = Ae^{r_0x} + Be^{r_1x}, \quad A, B \text{ constants.} \]

(ii) If the characteristic polynomial of (11.9) has a repeated root (i.e., $r_0 = r_1$), then
all solutions of (11.9) are given by
\[ y = Ae^{r_0x} + Bxe^{r_0x}, \quad A, B \text{ constants.} \]
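
Deciding which case of the structure theorem applies only requires the roots of the characteristic polynomial. The following sketch computes them with the quadratic formula; the function names `characteristic_roots` and `homogeneous_case` are hypothetical, and comparing the two roots with == is only reliable for "nice" coefficients such as the integers used here.

```python
import cmath

def characteristic_roots(p, q):
    # Roots r0, r1 of r^2 + p r + q, returned as complex numbers.
    s = cmath.sqrt(p * p - 4 * q)
    return (-p + s) / 2, (-p - s) / 2

def homogeneous_case(p, q):
    # Which case of Proposition 11.29 applies to y'' + p y' + q y = 0.
    r0, r1 = characteristic_roots(p, q)
    return "(ii) repeated root" if r0 == r1 else "(i) two different roots"

for p, q in [(3, 2), (4, 4), (0, 1)]:
    print((p, q), characteristic_roots(p, q), homogeneous_case(p, q))
```

We use `cmath.sqrt` rather than `math.sqrt` so that a negative discriminant gives complex roots instead of an error.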

Exercise 11.30 Prove Proposition 11.28.


Exercise 11.31 Combine Propositions 11.25 and 11.28 to obtain the structure theorem
for homogeneous solutions.
Exercise 11.32 Use the structure theorem to solve

(a) $y'' + 5y + 4 = 0$ (b) $y'' + 4y' + 4y = 0$ (c) $y'' + 2y' + 4y = 0$ (d) $y'' + 4y = 0$.

² After taking the courses in linear algebra, you will recognise that the first proposition says that the
homogeneous solutions of a second order linear differential equation always form a vector space, and the
second that this vector space is always of dimension 2.

The real form of complex homogeneous solutions


Let us now consider the following curious detail. Even though $y'' + py' + qy = 0$
represents a reasonable physical system if the parameters $p, q$ are real and positive, it is
still possible for the roots of its characteristic polynomial to be complex, and therefore
for the solutions to be complex (see, e.g., the example below).
This is a bit shocking since such a system should have real solutions, at least for real
initial conditions, since a simulation should then produce a real solution curve. As it
turns out, there is no reason for concern.

Fig. 6. A simulation of the differential equation $y'' + y = 0$ with initial values $y(0) = 1$
and $y'(0) = 0$.

Example 11.33 (Complex solutions on real form) According to the structure
theorem for homogeneous solutions, the differential equation
\[ y'' + y = 0 \]
has complex solutions $y = Ae^{ix} + Be^{-ix}$. Let us see what happens when we combine this
with the formula $e^{ix} = \cos(x) + i\sin(x)$ for the complex exponential:
\begin{align*}
y &= Ae^{ix} + Be^{-ix} \\
&= A\big(\cos(x) + i\sin(x)\big) + B\big(\cos(-x) + i\sin(-x)\big) \tag{11.10} \\
&= \underbrace{(A + B)}_{=C}\cos(x) + \underbrace{(iA - iB)}_{=D}\sin(x) = C\cos(x) + D\sin(x).
\end{align*}

That is, it seems that by allowing $C, D$ to be (possibly) complex, we get the solutions on
"real form". But this is not really the case. Indeed, if we solve for the initial conditions
$y(0) = 1$ and $y'(0) = 0$, we obtain $C = 1$ and $D = 0$. That is, for real initial conditions
the constants $C, D$ were real all along; it is the constants $A, B$ that may be complex
(here $A = B = 1/2$, but the initial conditions $y(0) = 0$ and $y'(0) = 1$, for instance, give
$A = -i/2$ and $B = i/2$). In particular, in our example $y = \cos x$.
(Compare this with the simulation shown in Figure 6!)
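
The relation between the complex constants A, B and the real constants C, D is easy to explore numerically. In the sketch below, the function names are our own, and the second pair of constants, A = -i/2 and B = i/2, is an extra illustration corresponding to the initial conditions y(0) = 0 and y'(0) = 1 (which give C = 0, D = 1 and y = sin x).

```python
import cmath
import math

def y_complex(x, A, B):
    # The complex form y = A e^{ix} + B e^{-ix}.
    return A * cmath.exp(1j * x) + B * cmath.exp(-1j * x)

def real_constants(A, B):
    # C = A + B and D = iA - iB, as in (11.10).
    return A + B, 1j * A - 1j * B

print(real_constants(0.5, 0.5))      # constants for y(0) = 1, y'(0) = 0
print(real_constants(-0.5j, 0.5j))   # constants for y(0) = 0, y'(0) = 1
for x in (0.0, 0.5, 1.0, 2.0):
    print(x, y_complex(x, 0.5, 0.5), math.cos(x),
          y_complex(x, -0.5j, 0.5j), math.sin(x))
```

In both cases the imaginary parts of the computed values of y are zero up to rounding, even though the second pair of constants A, B is genuinely complex.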

Exercise 11.34 Suppose that $y'' + py' + qy = 0$, where $p, q \in \mathbb{R}$, has characteristic
polynomial with complex zeroes $r_1 \neq r_2$.
(a) Show that we can write $r_1 = k + i\omega$ and $r_2 = k - i\omega$ for $k, \omega \in \mathbb{R}$.
(b) Use (a) to show that we can now write
\[ y = e^{kx}\big(C\cos(\omega x) + D\sin(\omega x)\big), \quad C, D \text{ constants.} \]

Exercise 11.35 Express the solutions of parts (c) and (d) of Exercise 11.32 on real form.

A structure theorem for general solutions


Before we formulate our description of the structure of the solutions of second order linear
differential equations with constant coefficients, we consider the following proposition.

Proposition 11.36 Suppose that $y_p$ is any solution of the differential equation
\[ y'' + py' + qy = f(x), \quad p, q \in \mathbb{R}. \tag{11.11} \]
Then $y$ is a solution of (11.11) if and only if
\[ y = y_p + y_h, \]
where $y_h$ is a homogeneous solution (we usually call $y_p$ a particular solution).

Exercise 11.37 Prove this proposition. That is, (a) prove that if $y_h$ and $y_p$ are as
above, then $y = y_p + y_h$ solves (11.11), and (b) prove that if $y$ and $y_p$ are two solutions
of (11.11), then $y - y_p$ is a homogeneous solution.
Hint: The proof is quite similar to that of Proposition 11.28.
We now formulate our most "general structure theorem" for the solutions.

Proposition 11.38 (The general structure theorem) Suppose that $y_p$ is any solution
of the differential equation
\[ y'' + py' + qy = f(x). \tag{11.12} \]

Moreover, suppose that $r_1, r_2$ are the (possibly repeated) roots of the polynomial
\[ p(r) = r^2 + pr + q. \]
Then $y$ is a solution of (11.12) if and only if
\begin{align*}
y &= y_p + y_h \\
&= y_p + Ay_1(x) + By_2(x),
\end{align*}
where:

• Case 1: If $r_1 \neq r_2$, then $y_1(x) = e^{r_1x}$ and $y_2(x) = e^{r_2x}$.

• Case 2: If $r_1 = r_2$, then $y_1(x) = e^{r_1x}$ and $y_2(x) = xe^{r_1x}$.

Moreover, if $p, q$ are real and the roots in Case 1 are complex, then there exist $k, \omega \in \mathbb{R}$ so
that $r_1 = k + i\omega$ and $r_2 = k - i\omega$, and we can write
\[ y_1(x) = e^{kx}\cos(\omega x) \quad \text{and} \quad y_2(x) = e^{kx}\sin(\omega x). \]
We will explore how to use this theorem on the next couple of pages.

How to use the general structure theorem to "guess and check"


The structure theorem (Proposition 11.38) should be compared with a "structure
theorem" that we are already familiar with. Namely that if $F$ is a primitive of $f$ on an
interval, then
\[ \int f(x)\,dx = F(x) + C \]
expresses all solutions of the differential equation $y' = f(x)$ there. In particular, $F(x)$
plays the part of the particular solution and $C$ the part of the homogeneous solution.
Notice that in both structure theorems, we get complete information on how the
homogeneous solution looks, but no information on the particular solution. That is, to
find particular solutions we have to study each case separately. Here is a first example:

Example 11.39 (First "guess and check" example, part 1) Let us consider the
differential equation
\[ y'' + 3y' + 2y = 2(x^2 + 1). \]
Since the characteristic polynomial is
\[ p(r) = r^2 + 3r + 2 = (r + 1)(r + 2), \]
we know by the structure theorem that the homogeneous solutions are given by
\[ y_h = Ae^{-x} + Be^{-2x}. \]

Now, to find all solutions, we must also determine a particular solution $y_p$. That is, we
seek a function $y_p$ that, when inserted into the above equation, makes the left-hand
side equal to the right-hand side.

Before finding the particular solution, let us make a note on the physical interpretation
of what we are doing. Namely, the inhomogeneous differential equation represents a
physical spring-mass system, as explained in the discussion of physical significance above,
with external force $f(x)$. Below, we simulate how the system evolves when starting from
initial values $y(0) = 1$, $y'(0) = 0$.

Fig. 7. Left: Simulation of the differential equation with $f(x) = 2(x^2 + 1)$. Right:
For comparison, the same simulation, but with $f(x) = \sin x$.

In both cases, the solution behaves – after a transitional period – more or less like the
external force. This suggests to us the following: Guess that the particular solution is
more or less on the same form as the external force!

Example 11.39 (part 2) Since $f(x)$ is a second degree polynomial, we guess that our
solution is of the form
\[ y_p = C_2x^2 + C_1x + C_0. \]

The point is now to use this guess and insert $y_p, y_p', y_p''$ into the differential equation, and
see what this says about the constants $C_0, C_1, C_2$. The calculation is as follows:
\begin{align*}
2C_2 + 3(2C_2x + C_1) + 2(C_2x^2 + C_1x + C_0) &= 2(x^2 + 1) \\
2C_2x^2 + (6C_2 + 2C_1)x + (2C_2 + 3C_1 + 2C_0) &= 2(x^2 + 1) \\
C_2x^2 + (3C_2 + C_1)x + \frac{1}{2}(2C_2 + 3C_1 + 2C_0) &= x^2 + 1
\end{align*}
For the last line to hold for every $x$, we need the polynomial on the left-hand side to be
the same as the polynomial on the right-hand side. That is, we need:
\[ C_2 = 1, \quad 3C_2 + C_1 = 0 \quad \text{and} \quad 2C_2 + 3C_1 + 2C_0 = 2. \]

Solving this system of equations yields $C_2 = 1$, $C_1 = -3$ and $C_0 = 9/2$. This leads us to
conclude that a solution is
\[ y_p = x^2 - 3x + \frac{9}{2}. \]
Finally, by the structure theorem, we conclude that the general solution is given by
\[ y = y_h + y_p = Ae^{-x} + Be^{-2x} + x^2 - 3x + \frac{9}{2}. \]
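
The comparison of coefficients above is mechanical enough to be written as a few lines of code. The sketch below handles a general equation y'' + py' + qy = a2 x^2 + a1 x + a0 with q different from zero; the function name `quadratic_guess` and its parameter names are hypothetical, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

def quadratic_guess(p, q, a2, a1, a0):
    # Coefficients (C2, C1, C0) of y_p = C2 x^2 + C1 x + C0 solving
    #   y'' + p y' + q y = a2 x^2 + a1 x + a0,   assuming q != 0.
    # Inserting y_p into the left-hand side gives
    #   q*C2 x^2 + (2p*C2 + q*C1) x + (2*C2 + p*C1 + q*C0),
    # and we match coefficients from the highest power downwards.
    C2 = Fraction(a2) / q
    C1 = (a1 - 2 * p * C2) / q
    C0 = (a0 - 2 * C2 - p * C1) / q
    return C2, C1, C0

# Example 11.39: y'' + 3y' + 2y = 2x^2 + 0x + 2.
print(quadratic_guess(3, 2, 2, 0, 2))
```

For Example 11.39 this reproduces the constants found above. If q = 0, the division fails, which is a first hint that the natural guess does not always work.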

Exercise 11.40 Solve the initial value problem (shown in Figure 7):
\[ \begin{cases} y'' + 3y' + 2y = 2(x^2 + 1) \\ y(0) = 1 \\ y'(0) = 0 \end{cases} \]

Next, we address the following question: what are natural guesses for other choices of
$f(x)$? Well, the thing is that for general $f(x)$, we cannot hope to find solutions (just as
we cannot hope to solve $\int f(x)\,dx$ for general $f$). However, there are some specific $f$'s
that allow for reasonable guesses:

Here, we provide a list of some choices of $f(x)$ along with natural guesses for particular
solutions:

Table 11.41 (Natural guesses for the particular solution)

$f(x)$                      Natural guess
polynomial of degree $n$    $y_p = C_nx^n + C_{n-1}x^{n-1} + \cdots + C_1x + C_0$
$e^{ax}$                    $y_p = Ce^{ax}$
$\sin ax$ or $\cos ax$      $y_p = C\sin ax + D\cos ax$

Moreover, if $f(x)$ is either a sum or product of the above functions, then our guess should
be the corresponding sum or product of the guesses.

Exercise 11.42 Find all solutions to the following differential equations.

(a) $y'' + 2y' + 4y = 2x + 3$ (b) $y'' + 4y' + 4y = e^x$ (c) $y'' + 6y' + 4y = \cos(x)$

Exercise 11.43 Solve the initial value problem (simulated in Figure 7):
\[ \begin{cases} y'' + 3y' + 2y = \sin x \\ y(0) = 1 \\ y'(0) = 0 \end{cases} \]

Remark: When you are done, plot your solution. Does it match the simulation shown
in Figure 7?

Why natural guesses sometimes fail and how to deal with this
Unfortunately, it turns out that even if you make a guess based on Table 11.41, it may
be that it does not work! In the following example, we explain why this happens.

Example 11.44 (Part 1: Failure of a natural guess) We consider the differential equation
    y'' + y = sin(x).    (11.13)
By Table 11.41, we are supposed to make the guess y_p = C sin x + D cos x. Unfortunately, this will not work. To understand why, note that the characteristic polynomial is given by
    p(r) = r^2 + 1 = (r + i)(r - i).
This means that the homogeneous solutions can be expressed in the form

    y_h = C sin x + D cos x.

But this is identical to the natural guess! In other words, when plugged into (11.13), the
natural guess will make the left-hand side equal to zero, and not sin x.

Exercise 11.45 Try to solve the differential equation y'' + 4y' + 4y = e^{-2x} by making the natural guess y_p = Ce^{-2x}. Why does this fail?

So, what to do? For inspiration, let us try to solve two homogeneous equations by
using the guess and check method.

Example 11.46 (Use of "guess and check" to find homogeneous solutions) First, let us consider
    y'' + 3y' + 2y = 0.
The structure theorem tells us that the solutions are of the form
    y_h = Ae^{-x} + Be^{-2x}.

But let us suppose we do not know this. Instead, we observe that the first order linear differential equation 3y' + 2y = 0 has solution y = Ce^{-(2/3)x}, and therefore suspect that the solution could be of the form y = e^{rx} (by the structure theorem for homogeneous solutions, we may ignore the constant). To check if this works for some value of r, we compute

    y' = re^{rx}

    y'' = r^2 e^{rx}

and plug these expressions into the homogeneous differential equation to get

    y'' + 3y' + 2y = 0  ⟺  r^2 e^{rx} + 3re^{rx} + 2e^{rx} = 0

                        ⟺  e^{rx}(r^2 + 3r + 2) = 0

                        ⟺  r^2 + 3r + 2 = 0.

That is, we observe that y = e^{rx} is a solution if and only if r is a root of the characteristic polynomial! This should not be surprising, since this gives us the two solutions y_1 = e^{-x} and y_2 = e^{-2x} promised by the structure theorem.

Here is an example where the "guess and check" strategy for finding homogeneous solutions, indicated above, fails.

Example 11.47 (Use of "guess and check" to find homogeneous solutions) We now consider
    y'' + 2y' + y = 0.
Again, we make the guess y = e^{rx}. By plugging this into the differential equation, following the same steps as above, we find that y = e^{rx} is a solution if and only if

    r^2 + 2r + 1 = 0.

But since this polynomial has a repeated root at r = -1, we only find the solution y_1 = e^{-x} in this way. That is, the solution y_2 = xe^{-x} is "missing".

Now, let us return to Example 11.44, and the failure of the natural guess. Note that while we missed the solution y_2 = xe^{-x} in Example 11.47, we do obtain it by multiplying y_1 by x. Could it be that we can fix failed guesses for finding particular solutions simply by multiplying by x? Well, that would just be too good to be true... or?

Example 11.44 (Part 2: When in doubt, multiply by x) Again, we consider the differential equation
    y'' + y = sin(x).
This time, let us modify the natural guess by multiplying it by x:

    y_p = x(C sin x + D cos x).

To see if this works, we compute:

    y' = (C sin x + D cos x) + x(C cos x - D sin x),

    y'' = (C cos x - D sin x) + (C cos x - D sin x) + x(-C sin x - D cos x)

        = -2D sin x + 2C cos x - Cx sin x - Dx cos x.

If we plug the expressions for y'' and y into the differential equation y'' + y = sin x, the terms containing x sin x and x cos x cancel, and we get the equation
    -2D sin x + 2C cos x = sin x.
This implies that for the left-hand and right-hand sides to agree for all x, we need D = -1/2 and C = 0. That is,
    y_p = -(x/2) cos x.
Taking the real form of the homogeneous solution into account (recall Example 11.33), we conclude that the general solution is given by
    y = y_p + y_h = -(x/2) cos x + A sin x + B cos x.
Success!
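As a sanity check on the computation above, one can test numerically that y_p = -(x/2) cos x really satisfies y'' + y = sin x. The sketch below is only an illustration: it approximates y'' by a central difference (a technique not used in the example itself), and the function names and sample points are our own choices.

```python
import math

def y_p(x):
    """Candidate particular solution y_p = -(x/2) cos x."""
    return -0.5 * x * math.cos(x)

def second_derivative(f, x, h=1e-4):
    """Central-difference approximation of f''(x)."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)

def residual(x):
    """Value of y'' + y - sin(x) for y = y_p; this should be close to 0."""
    return second_derivative(y_p, x) + y_p(x) - math.sin(x)

sample_points = [0.0, 0.7, 2.0, 5.0]
largest = max(abs(residual(x)) for x in sample_points)
print(largest)  # small, limited only by the finite-difference and rounding errors
```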

Exercise 11.48 Use the trick of multiplying by x to solve:

(a) y'' - y = e^{-x}    (b) y'' + 2y' + y = e^{-x}.
Finally, we mention that there are situations when the natural guess is not salvaged
by multiplying by x. In the following exercise, you are asked to try out one last trick:

Exercise 11.49 Solve the initial value problem

    y'' + 2y' + y = xe^{-x},
    y(0) = 1,
    y'(0) = 0.

(a) Check that the "multiply by x" strategy fails.
(b) Assume that the particular solution is of the form y_p = C(x)e^{-x} for some function C(x). Use this to solve the differential equation.

Remark 11.50 Some students find it frustrating that there is no general method
to solve inhomogeneous second order linear differential equations (as opposed to their
homogeneous counterparts). Please keep in mind that if we want exact formulas for the
solutions, then just as there is no general strategy to solve all indefinite integrals, there is
no general strategy for solving all second order differential equations. However, if all we
need is to solve the differential equation numerically, then simulations essentially always
work (see Example 11.56 for how to simulate second order differential equations).

The physical significance of "multiplying by x" – Resonance!


We now discuss the connection to resonance. This phenomenon occurs perhaps most clearly in Part 2 of Example 11.44, above, where we considered the differential equation

    y'' + y = sin(x).    (11.14)

Indeed, here, the homogeneous solution is

    y_h = A sin x + B cos x.

One important feature of this solution is that no matter the values of A and B, the homogeneous solution will always have the same frequency. Indeed, it is possible to show that given A, B you can always find C, D such that

    A sin x + B cos x = C sin(x + D).

Exercise 11.51 Use the addition formula for sin x and Pythagoras to prove this.
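To make the claim concrete, here is a small numerical illustration (it does not replace the proof asked for in the exercise). We assume the choice C = sqrt(A^2 + B^2) and take D to be the angle of the point (A, B), computed with `math.atan2` so that all signs of A and B are handled; the function name `amplitude_phase` is our own.

```python
import math

def amplitude_phase(A, B):
    """Return (C, D) with A sin x + B cos x = C sin(x + D)."""
    C = math.hypot(A, B)   # modulus of the point (A, B)
    D = math.atan2(B, A)   # angle of the point (A, B), so C cos D = A and C sin D = B
    return C, D

A, B = 3.0, -4.0
C, D = amplitude_phase(A, B)

# Compare both sides of the identity on a grid of x values.
xs = [0.1 * k for k in range(100)]
worst = max(abs(A * math.sin(x) + B * math.cos(x) - C * math.sin(x + D)) for x in xs)
print(C, D, worst)
```

In particular, whatever A and B are, the sum is a single sine wave with the same frequency, only with a different amplitude C and phase shift D.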

That is, the homogeneous solution of (11.14) really dictates the frequency with which
this system wants to oscillate when there is no outside force. Now, what happens when
we impose an external force that oscillates with exactly this frequency (as we are doing
above)? Well, resonance:

To the right, we see a simulation of y'' + y = sin(x) with y(0) = 1 and y'(0) = 0. Resonance lets an outside force of finite magnitude make the system oscillate with an amplitude going to infinity. Mathematically, the resonance occurs exactly when we need to multiply our natural guess by x!

Fig. 8. Illustration of resonance.

Exercise 11.52 Determine the solution to the differential equation in (11.14) with initial conditions y(0) = 1 and y'(0) = 0. Does it match the simulation shown above?

Exercise 11.53 Consider the differential equation y'' + y = sin(bx).

(a) For what value of b should resonance occur?
(b) Solve the differential equation. What happens with the solution as b → 1?
(c) Compare your answer from (b) to the solution of y'' + y = sin x from Part 2 of Example 11.44.

Remark 11.54 (Examples of resonance and its use)


The Tacoma Narrows Bridge collapse in 1940 is a standard example of resonance. A bridge acts like a spring-mass system and has some natural frequencies that it likes to oscillate at (they are the "internal" solutions). When an outside force such as wind acts on the bridge at a frequency matching the frequencies of these internal solutions, then resonance occurs. That is, solutions get multiplied by x (morally).

Fig. 9. The Tacoma bridge collapse.

MRI (Nuclear Magnetic Resonance Imaging) machines produce images such as the one in Fig. 10 by sending electromagnetic waves through the body at certain frequencies, and then measuring the resonance this causes with water molecules. (The 'N' is removed from the abbreviation since people are afraid of anything with 'Nuclear' in it. No kidding!)

Fig. 10. An MRI image.

Spectrograms of sound are made by recording the sound and then checking how sin(ax) and cos(ax) resonate with the recorded signal (we Fourier transform the signal). High resonance means that the signal contains a lot of this frequency; low resonance means that this frequency is not present. (The human ear basically does the same thing: it uses resonance to generate electric impulses based on the sounds you are exposed to. What you "hear" is the Fourier transform of the brain!)

Fig. 11. On the bottom, we see a spectrogram for a sound signal.

The JPEG format takes an image and measures its resonance with certain building blocks for images (these are basis vectors for the vector space of images). The image is compressed by only keeping the blocks corresponding to the strongest resonance. When using only these blocks, the image looks the same to humans, even though a lot of information has been thrown away (MP3 compression works in exactly the same way). As with sound, cells inside of your eye resonate in different ways to the light hitting the retina. This is then turned into electric impulses, and what you "see" is again the result of a Fourier transform performed by the brain.

Fig. 12. An analysis of the building blocks of a jpeg image.

Some words on n’th order linear differential equations


Above, we have discussed a solution method for second order linear differential equations. Here we briefly discuss the question of how much of this procedure is valid for n'th order linear differential equations
    c_n y^{(n)} + c_{n-1} y^{(n-1)} + ... + c_1 y' + c_0 y = f(x).
Basically, the answer is: pretty much everything. To this end, we begin by noting that given an n'th order linear differential equation, such as the one above, we define its characteristic polynomial to be
    p(r) = c_n r^n + c_{n-1} r^{n-1} + ... + c_1 r + c_0.
We can now formulate the following higher order structure theorem:

Proposition 11.55 (The structure theorem) Suppose that y_p is any solution of the differential equation
    y^{(n)} + p_{n-1} y^{(n-1)} + ... + p_1 y' + p_0 y = f(x).    (11.15)

Moreover, suppose that the characteristic polynomial

    p(r) = r^n + p_{n-1} r^{n-1} + ... + p_1 r + p_0

has distinct zeroes r_1, r_2, ..., r_k with multiplicities m_1, m_2, ..., m_k, respectively. Then y is a solution of (11.15) if and only if
    y = y_p + y_h
where

    y_h = (A_1 + B_1 x + ...)e^{r_1 x} + (A_2 + B_2 x + ...)e^{r_2 x} + ... + (A_k + B_k x + ...)e^{r_k x},

and the polynomial multiplying e^{r_j x} has degree at most m_j - 1 (that is, it has m_j free coefficients; compare with the case n = 2, where a double root gives (A + Bx)e^{rx}).

If all coefficients p_j are real, then any complex roots appear in conjugate pairs having the same multiplicity. In particular, if r_i, r_j form such a pair with multiplicities m_j = m_i = m, then there exist k, ω ∈ R so that r_i = k + iω, r_j = k - iω and the corresponding two "groups" of terms from the formula for y_h can be replaced by the single "group" of terms
    e^{kx} ( (C_1 + C_2 x + ...) cos(ωx) + (D_1 + D_2 x + ...) sin(ωx) ),
where both polynomials have degree at most m - 1.

While we will neither prove nor use this result, we remark that the proof is a straightforward, albeit tedious, extension of the proof for the case n = 2.
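To see what the proposition says in practice, here are two small worked examples of our own choosing (they do not appear elsewhere in these notes). In both, a root of multiplicity m contributes a polynomial factor with m free coefficients.

```latex
% Example 1: a real root of multiplicity 3.
y''' - 3y'' + 3y' - y = 0, \qquad
p(r) = r^3 - 3r^2 + 3r - 1 = (r - 1)^3,
\qquad\Longrightarrow\qquad
y_h = (A + Bx + Cx^2)\, e^{x}.

% Example 2: a conjugate pair r = \pm i (so k = 0, \omega = 1), each of multiplicity 2.
y'''' + 2y'' + y = 0, \qquad
p(r) = r^4 + 2r^2 + 1 = (r^2 + 1)^2 = (r - i)^2 (r + i)^2,
\qquad\Longrightarrow\qquad
y_h = (C_1 + C_2 x)\cos x + (D_1 + D_2 x)\sin x.
```

In each case the number of free constants equals the order of the equation (three in the first example, four in the second).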

How to simulate second order linear differential equations (optional)


Earlier, we showed how to express a second order linear differential equation in terms
of two first order differential equations. A consequence of this is that we can use what
we know about simulating first order differential equations to simulate the second order
one. We explain this in an example that you can use as a blueprint.

Example 11.56 Let us explain how to simulate the differential equation
    y'' + 3y' + 2y = e^x
by exploiting the fact that we can express it in the form
    y' + 2y = u    (11.16)

    u' + u = e^x.    (11.17)
Our strategy is basically this:

    Step 1: Choose initial conditions y(x_0) and y'(x_0), and a step size Δx.
    Step 2: Simulate the first order differential equation (11.17) (here, we use (11.16) to translate initial conditions for y into initial conditions for u).
    Step 3: Simulate the first order differential equation (11.16) (here, we use the simulated values for u from Step 2).

A (rather naive) Python code may therefore look something like this:

    import numpy as np

    def dy(y, u):                 # This is basically (11.16): y' = u - 2y
        return u - 2 * y

    def du(x, u):                 # This is basically (11.17): u' = e^x - u
        return np.exp(x) - u      # This line requires the numpy package

    deltaX = 1 / 2**3; N = 2**5                  # Step size and number of steps
    X = [n * deltaX for n in range(0, N + 1)]    # X values
    Y = [1]; DY = [0]                            # Initial conditions for Y and Y'
    U = [DY[0] + 2 * Y[0]]                       # Translating initial conditions to U via (11.16)

    for n in range(0, N):                        # Loop simulating U
        slope = du(X[n], U[n])
        U.append(U[n] + slope * deltaX)

    for n in range(0, N):                        # Loop simulating Y
        slope = dy(Y[n], U[n])
        Y.append(Y[n] + slope * deltaX)

If we also import the pyplot package and add the line plt.plot(X,Y,"bo") to the code from the above example, then we get the output shown in Fig. 13.

Fig. 13. Simulation of the differential equation y'' + 3y' + 2y = e^x.

Exercise 11.57 Based on the above code, simulate the differential equation from Exercise 11.49.
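How good is such a simulation? One way to get a feeling for this is to compare it with the exact solution for a few step sizes. The sketch below uses the same update rule as Example 11.56, but wrapped in a function and written without numpy; the function names, the end point x = 4 and the step counts are our own choices. For the initial conditions y(0) = 1 and y'(0) = 0, the method of this section gives the exact solution y = (3/2)e^{-x} - (2/3)e^{-2x} + e^x/6 (we encourage you to verify this before trusting the code).

```python
import math

def simulate(num_steps, x_end=4.0, y0=1.0, dy0=0.0):
    """Simulate u' = e^x - u and y' = u - 2y with num_steps steps; return the value at x_end."""
    h = x_end / num_steps
    x, y = 0.0, y0
    u = dy0 + 2.0 * y0                  # u = y' + 2y, from (11.16)
    for _ in range(num_steps):
        y_next = y + h * (u - 2.0 * y)          # step for (11.16), using the current u
        u_next = u + h * (math.exp(x) - u)      # step for (11.17)
        x, y, u = x + h, y_next, u_next
    return y

def exact(x):
    """Exact solution of y'' + 3y' + 2y = e^x with y(0) = 1, y'(0) = 0."""
    return 1.5 * math.exp(-x) - (2.0 / 3.0) * math.exp(-2.0 * x) + math.exp(x) / 6.0

error_coarse = abs(simulate(32) - exact(4.0))     # step size 1/8, as in Example 11.56
error_fine = abs(simulate(3200) - exact(4.0))     # step size 1/800
print(error_coarse, error_fine)
```

The error shrinks as the step size shrinks, which is what we expect from this simple simulation method.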
The approach we suggest above has a drawback. Indeed, what if the factorisation involves complex roots? Well, either we need to let Python deal with complex numbers (which it can), or we rewrite the algorithm so that complex numbers do not appear. In the following remark, we explain how to modify the algorithm.

Remark 11.58 (Complex roots and simulations) We simulate y'' + 3y' + 2y = e^x starting from y(0) = 1 and y'(0) = 0 using a minor modification of our algorithm for simulating first order differential equations:
1. Plug initial conditions y(0) and y'(0) into the differential equation to obtain y''(0).
2. Use y(0) and y'(0) to obtain y(Δx) ≈ y(0) + Δx · y'(0) (this is the same as always).
3. Use y'(0) and y''(0) to obtain y'(Δx) ≈ y'(0) + Δx · y''(0) (as above, with y replaced by y').
4. Repeat the above with y(Δx) and y'(Δx) in place of y(0) and y'(0).
Here is one (rather naive) way of translating this into Python code:

    import numpy as np

    def ddy(x, y, dy):                           # y'' = e^x - 2y - 3y'
        return np.exp(x) - 2 * y - 3 * dy        # This requires the numpy package

    deltaX = 1 / 2**3; N = 2**5                  # Step size and number of steps
    X = [0 + n * deltaX for n in range(0, N + 1)]   # X values starting at 0
    Y = [1]; DY = [0]                            # Initial conditions for Y and DY

    for n in range(0, N):                        # Loop simulating Y and DY
        slope = DY[n]
        slope_of_slope = ddy(X[n], Y[n], DY[n])
        Y.append(Y[n] + slope * deltaX)
        DY.append(DY[n] + slope_of_slope * deltaX)   # Step 3: update y' using y''

Exercise 11.59 Use the above code to simulate y'' + y = 0 with initial conditions y(0) = 1 and y'(0) = 0.
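The algorithm of Remark 11.58 does not care what the right-hand side looks like, so it can be packaged as a reusable function. The following is a sketch under our own naming (`simulate_second_order` is not part of the notes). As a test case we use the resonance equation y'' + y = sin(x) with y(0) = 1 and y'(0) = 0, and compare with the closed-form solution y = (1/2) sin x + cos x - (1/2) x cos x from the answer to Exercise 11.52.

```python
import math

def simulate_second_order(ddy, y0, dy0, x_end, num_steps):
    """Simulate y'' = ddy(x, y, y') from x = 0 to x_end; return the approximation of y(x_end)."""
    h = x_end / num_steps
    x, y, dy = 0.0, y0, dy0
    for _ in range(num_steps):
        # Both updates use the old values of y and dy (steps 2 and 3 of Remark 11.58).
        y, dy = y + h * dy, dy + h * ddy(x, y, dy)
        x += h
    return y

def resonance_rhs(x, y, dy):
    """Right-hand side of y'' = sin(x) - y, that is, y'' + y = sin(x)."""
    return math.sin(x) - y

def resonance_exact(x):
    """Closed-form solution with y(0) = 1 and y'(0) = 0."""
    return 0.5 * math.sin(x) + math.cos(x) - 0.5 * x * math.cos(x)

approx = simulate_second_order(resonance_rhs, 1.0, 0.0, 10.0, 20000)
print(approx, resonance_exact(10.0))
```

With this many steps the two numbers agree to a couple of decimal places; with fewer steps the simulated oscillation visibly drifts away from the exact one.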

11.3 Relevant exercises from previous exams


Exercise 11.60 (Exam 2016-01-23, Problem 5) We consider the differential equation
    y'' + y' - 2y = 2x^2 + 4x + 1.

(a) Show that if y_p and ỹ_p are two solutions of this equation, then their difference solves the homogeneous equation.
(b) Find all solutions of the differential equation. Comment on the relevance of what you did in part (a).

Exercise 11.61 (Exam 2016-01-07, Problem 2) Determine all solutions to the following differential equations.

(a) 1/(1 + y^2) = (x sin x)/y', x ∈ R,    (b) y' - y/x = 1/(1 + x^2), x > 0.

Exercise 11.62 (Exam 2015-08-17, Problem 3) Solve the initial value problem

    (4 + e^{2x})y' = ye^x,
    y(0) = 1.

Exercise 11.63 (Exam 2015-05-27, Problem 2) Determine all solutions to the following differential equations.

(a) (1 + cos^2 x)yy' = (1 + y^2) sin x,    (b) y' - (1/x)y = x/(x^2 + 2x + 2), x > 0.

Exercise 11.64 (Exam 2015-01-08, Problem 4) Determine all solutions to the following differential equations.

(a) xy' - y = x^3 sin x,    (b) y' = x(3 + e^y)

Hint for (b): Make the change of variables e^y = z.

Exercise 11.65 (Exam 2014-12-18, Problem 3) Determine all solutions to the following differential equations.

(a) y' + y/x = arctan(x),    (b) (1 + cos x)y' = (1 + y^2) tan x.

Exercise 11.66 (Exam 2014-10-04, part of Problem 3) Determine a solution to


the differential equation
✓ ◆
0 1 p
y = x y, y(1) = 1/e.
x

Exercise 11.67 (Exam 2014-08-18, Problem 1) Determine all solutions to


1 0 2
(a) y 0 + y = e(x ) (b) y 0 = y 2 + 4y + 5.
x

Exercise 11.68 (Exam 2014-05-26, Problem 1) Determine all solutions to

xy 0 + y = x3 , x 2]0,1[.

Exercise 11.69 (Exam 2014-01-09, part of Problem 2) Solve the differential equation

    (e^{2x} + 4e^x + 4)y' = e^x,
    y(0) = 2.

Exercise 11.70 (Exam 2013-12-18, Problem 2) Solve the initial value problem

    y' + y/x = 1,
    y(1) = 2.

Exercise 11.71 (Exam 2013-08-21, Problem 3) Determine all solutions to the


differential equation
xy 0 + 2y = sin x, x > 0.

Exercise 11.72 (Exam 2013-05-29, Problem 5) Solve the differential equation

    xy' = √(1 - x^2)/y^3, x ∈ [0,1].

To get all the points for this exercise, you need to simplify the answer as much as possible.

Exercise 11.73 (Exam 2013-01-09, Problem 2) Determine the solution of the


differential equation
(x + 1)(x + 2)y 0 = 1
that satisfies y(0) = 1.

Exercise 11.74 (Exam 2012-12-19, Problem 2) Determine the general solution to the differential equation
    y'' + y' - 2y = e^{-2x}.

Exercise 11.75 (Exam 2012-05-28, Problem 2) Solve the initial value problem

    xy' = y + xy,
    y(1) = 1.

11.3 Answers to selected exercises


11.4 (a) y = 1/(x + e^x + C), (b) y = C√(1 + x^2), (c) y = ln(Cx + 1)

11.6 Using the method, we get

    y' = Cy(1 - y/L)  ⟺  ∫ L dy / (y(L - y)) = ∫ C dx.

The integral involving y's is solved using partial fraction decomposition.

11.8 It can be rewritten as v 0 + p(x)v = q(x) with p(x) = k/m and q(x) = 9.82 (so,
both p(x) and q(x) are constants).

11.11 (-1/5)(cos(x) - 2 sin(x)) + Ce^{-2x}.

11.12 The integrating factors are (a) e^{3x}, (b) e^{(x^2)}.

11.13 (a) y = (x - 1)xe^x + Cx, (b) y = 1 - Ce^{-arctan(x)}, (c) y = x^2 - 2 + Ce^{-x^2/2}.

11.14 (a) y = x2 ln x + x2 ln(x + 1) + Cx2 , (b) C = 1.

11.15 (a) v = (m/k) · 9.82 + Ce kx/m , (b) assuming Elsa weighs m = 100 kg’s, then
k = 4.91 (we ignore the units).

11.18 (b) Underdamped (b < 0), harmonic (b = 0), damped (0 < b < 2), critically
damped (b = 2), and overdamped (b > 2).

11.30 For y = Ay_1 + By_2, compute y' and y''. Plug this into the differential equation ay'' + by' + cy = 0, and observe that the left-hand side becomes equal to the right-hand side (you need to use the fact that both y_1 and y_2 satisfy this differential equation on their own).

11.40 y = -4e^{-x} + (1/2)e^{-2x} + x^2 - 3x + 9/2.

11.42 (a) y = x/2 + 1/2 + e^{-x}(C sin(√3 x) + D cos(√3 x)), (b) y = e^x/9 + (A + Bx)e^{-2x}, (c) y = (2 sin x + cos x)/15 + Ae^{(-3+√5)x} + Be^{(-3-√5)x}.

11.43 y = (1/10) sin(x) + (-3/10) cos(x) + Ae^{-x} + Be^{-2x} with A = 5/2 and B = -6/5.

11.45 See example 11.44.

11.48 (a) y = Ae^x + Be^{-x} - (1/2)xe^{-x}, (b) y = Ae^{-x} + Bxe^{-x} + (1/2)x^2 e^{-x}.

11.49 (a) The "natural" guess y_p = Cx^2 e^{-x} fails, (b) this gives the general solution y = Ae^{-x} + Bxe^{-x} + (1/6)x^3 e^{-x}, and the initial conditions then give y = e^{-x} + xe^{-x} + (1/6)x^3 e^{-x}.

11.51 By the addition formula C sin(x + D) = C cos(D) sin(x) + C sin(D) cos(x). This gives the relations A = C cos(D) and B = C sin(D). This means that C and D are the modulus and angle of the point (A,B). In other words, we can put C = √(A^2 + B^2), and, say, D = arctan(B/A).

11.52 y = (1/2) sin x + cos x - (1/2)x cos x.

11.53 (a) At b = 1, (b) y = (1/(1 - b^2)) sin(bx) + C sin(x) + D cos(x). As b → 1, this formula seems to break down, (c) by considering the limit of (1/(1 - b^2)) sin(bx) as b → 1 as a derivative, it is possible to see the connection to the solution formula for the differential equation when b = 1 found in Part 2 of Example 11.44.
Chapter 12

The definite integral

Introduction
In this chapter, we study the definite integral. A highlight is the proof of the Fundamental
Theorem of Calculus, which explains why definite integration and differentiation are, in
a sense, "opposite" operations.

Remark 12.1 (Selected problems from previous exams based on this chapter)

1. (a) Formulate the Fundamental Theorem of Calculus.
   (b) Determine, without actually integrating the following expression, the points x where the function
           f(x) = ∫_0^x (1 - u^2)/(1 + u^2) du
       has local extreme points, and determine whether or not these are global. Present your answers in a simple sketch where you also point out where the function is concave and convex, respectively.

2. Explain what we mean by an improper integral, and determine whether the following improper integrals converge or diverge:
       (a) ∫_2^∞ dx / (x (ln x)^2)    (b) ∫_0^1 dx / √(sin x)

3. By comparing the following sum with an integral, show that
       Σ_{n=1}^{N} arctan(n) ≥ N arctan(N) - (1/2) ln(1 + N^2).


12.1 The definition of the definite integral


In Chapter 7, we discussed the following definition of the definite integral.

Definition 12.2 (Definition of the definite integral for continuous functions) Suppose that f is continuous on [a,b]. For each N ∈ N, let x_0 < x_1 < ... < x_N be equally spaced points inside of [a,b] with x_0 = a and x_N = b, and let Δx = (b - a)/N denote the spacing. Then
    ∫_a^b f(x) dx = lim_{N→∞} Σ_{k=1}^{N} f(x_k) Δx.
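To get a feeling for the limit in Definition 12.2, we can simply compute the sums for a few values of N. The sketch below does this for f(x) = x^2 on [0,1], where we expect the value 1/3; the function name `riemann_sum_right` and the choice of f are ours.

```python
def riemann_sum_right(f, a, b, N):
    """Sum of f(x_k) * dx over the equally spaced points x_k = a + k*dx, k = 1, ..., N."""
    dx = (b - a) / N
    return sum(f(a + k * dx) for k in range(1, N + 1)) * dx

def square(x):
    return x * x

approximations = {N: riemann_sum_right(square, 0.0, 1.0, N) for N in (10, 100, 1000)}
for N, value in approximations.items():
    print(N, value, abs(value - 1 / 3))
```

The printed errors shrink roughly like 1/(2N), in line with the limit in the definition.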

An immediate issue is that we do not actually know if this definition holds for some
continuous functions, or for all continuous functions. Or could it be that it holds for
more than just continuous functions?

A naive definition of the definite integral and why it fails


We begin with the following question. Could it be that Definition 12.2 is a reasonable definition for the area under the graphs of positive, bounded functions on finite intervals? The following example shows that the answer is a resounding "no"!

Example 12.3 (The Dirichlet function) We consider a rather strange function, namely the so-called Dirichlet function

    f(x) = 0 if x ∈ Q,
    f(x) = 1 if x ∈ R\Q.

In Figure 1, we try to visualise f. Since, in every open interval, no matter how short, there are both infinitely many rational and irrational points (see Corollary 1.124), the graph jumps between the heights y = 0 and y = 1 infinitely often on every interval, no matter how short.

Fig. 1. Since both the Q and R\Q-axes have infinitely many holes, the graph of Dirichlet's function can be thought of as some type of foam at the heights y = 0 and y = 1.

Now, suppose that we use Definition 12.2 as our definition of the definite integral for bounded functions on finite intervals. Since f is a nonnegative function, the area under its graph over the interval [0,1], whatever it is, ought to be larger than the area with respect to the interval [0, 1/√2] (indeed, if not, then our definition of "area" would be quite useless). That is, we ought to have
    ∫_0^1 f(x) dx > ∫_0^{1/√2} f(x) dx.    (12.1)

But notice the following. If we extend Definition 12.2 in this way, then we would have
    ∫_0^1 f(x) dx = lim_{N→∞} Σ_{k=1}^{N} f(k/N) · (1/N) = lim_{N→∞} Σ_{k=1}^{N} 0 · (1/N) = 0
(here x_k = k/N, which is rational), and
    ∫_0^{1/√2} f(x) dx = lim_{N→∞} Σ_{k=1}^{N} f(k/(√2 N)) · 1/(√2 N) = lim_{N→∞} Σ_{k=1}^{N} 1 · 1/(√2 N) = 1/√2
(here x_k = k/(√2 N), which is irrational).

This is completely unreasonable (since it does not respect inequality (12.1)), and therefore we cannot hope to extend Definition 12.2 naively!
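The two computations above can be mimicked on a computer, but some care is needed: floating point numbers are all rational, so they cannot tell the two cases apart. In the sketch below we therefore assume that every sample point is stored exactly as a pair (p, q) of fractions representing the number p + q√2, which is rational precisely when q = 0. This representation and the function names are our own device.

```python
from fractions import Fraction
import math

def dirichlet(rational_part, sqrt2_coeff):
    """Dirichlet function at the point rational_part + sqrt2_coeff * sqrt(2)."""
    return 0 if sqrt2_coeff == 0 else 1

def naive_sum_unit_interval(N):
    """Sum from Definition 12.2 on [0,1]: sample points x_k = k/N, width 1/N."""
    heights = sum(dirichlet(Fraction(k, N), Fraction(0)) for k in range(1, N + 1))
    return heights * Fraction(1, N)

def naive_sum_short_interval(N):
    """Sum on [0, 1/sqrt(2)]: x_k = k/(sqrt(2) N) = (k/(2N)) * sqrt(2), width 1/(sqrt(2) N)."""
    heights = sum(dirichlet(Fraction(0), Fraction(k, 2 * N)) for k in range(1, N + 1))
    return heights / (math.sqrt(2) * N)

for N in (10, 100, 1000):
    print(N, naive_sum_unit_interval(N), naive_sum_short_interval(N))
```

For every N, the first sum is exactly 0 while the second equals 1/√2 (up to rounding), which is the contradiction with (12.1) described in the example.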

Exercise 12.4 Show that the Dirichlet function is discontinuous at all points in R.

Exercise 12.5 (Challenge) In Appendix C, it was shown that Q is a countable set. That is, there exists a sequence (r_n)_{n=1}^{∞} so that Q = {r_n : n ∈ N}. Define a function by

    f(x) = 1/(n + 1) if x = r_n for some n ∈ N,
    f(x) = 0 otherwise.

Show that:

(a) f is discontinuous at every x ∈ Q.
(b) f is continuous at all other x.

In what follows, we explain two historically important ways of fixing the above "definition" for the definite integral. These are due to Bernhard Riemann, one of the most famous mathematicians of all time, and Gaston Darboux, a less famous mathematician. The thing was that Riemann's theory, which was the first correct and useful theory for definite integrals, was based on a definition that was something of a mess. Darboux, who only became an active mathematician after Riemann had died, figured out how to make Riemann's theory more accessible to both students and other mathematicians.

Fig. 2. Left: Bernhard Riemann (1826-1866). Right: Gaston Darboux (1842-1917).

The definite integral according to Riemann


To avoid the problems caused by the Dirichlet function, Riemann considered a wider
family of Riemann sums than what we did in the naive definition, above. To this end,
we define what we mean by a partition and its mesh size.

Definition 12.6 (Partitions and mesh size) A finite set of points P ⊂ [a,b] is called a partition of the interval [a,b] if we can write P = {x_k}_{k=0}^{N} with

    a = x_0 < x_1 < x_2 < ... < x_{N-1} < x_N = b.

We call the size of the largest "gap" Δx_k = x_k - x_{k-1} the mesh size of the partition P, and denote it by mesh(P) (the notation ‖P‖ is also common). Note that unless otherwise stated, we always express a partition in such a way that the above inequalities hold.
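In code, a partition is conveniently stored as an increasing list of points, and the mesh size is then the largest difference between neighbours. The following small sketch (with our own function name `mesh` and our own example partitions) also illustrates that adding points to a partition does not necessarily make its mesh size smaller.

```python
def mesh(partition):
    """Largest gap x_k - x_{k-1} of a partition given as an increasing list of points."""
    gaps = [right - left for left, right in zip(partition, partition[1:])]
    return max(gaps)

coarse = [0.0, 0.5, 1.0]                          # two pieces
finer = [0.0, 0.5, 0.625, 0.75, 0.875, 1.0]       # five pieces, but [0, 0.5] is untouched
print(mesh(coarse), mesh(finer))
```

Both partitions have mesh size 0.5, since the large gap between 0 and 0.5 is present in both.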

Fig. 3. Here we show two Riemann sums for the function f(x) = √(1 - x^2). The partition used on the right has more points, but both have the same mesh size. Notice that a large mesh size indicates that the Riemann sum is probably bad.

Next, given a partition P, we will allow the height of the rectangle with base [x_{k-1}, x_k] to be computed with respect to any height f(c_k) as long as c_k ∈ [x_{k-1}, x_k]. That is, we define a Riemann sum as follows.

Definition 12.7 (Riemann sums) Let f be a bounded function defined on [a,b]. Then the sum
    Σ_{k=1}^{N} f(c_k) Δx_k,
where P_N = {x_k}_{k=0}^{N} is any partition of [a,b] into N pieces, and c_k ∈ [x_{k-1}, x_k] for all k ∈ {1, ..., N}, is called a Riemann sum for f.

Fig. 4. Illustration of two Riemann sums that lie neither above nor below the graph. As we see from the figure on the right, when the mesh size is small, it really should not matter that much which c_k we choose in each subinterval [x_{k-1}, x_k].

We can now formulate Riemann’s definition of the definite integral.

Definition 12.8 (Riemann integrability and the Riemann integral) Let f be a bounded function defined on the interval [a,b]. We say that f is Riemann integrable on [a,b] if the limit
    lim_{N→∞} Σ_{k=1}^{N} f(c_k) Δx_k    (12.2)
exists and takes the same value for all sequences of partitions P_N with mesh size going to 0 as N → ∞, and all choices c_k ∈ [x_{k-1}, x_k]. Moreover, we call the common value of these limits the Riemann integral of f over [a,b] and denote it by
    ∫_a^b f(x) dx.

If the above does not hold, we say that f is not Riemann integrable on [a,b].

Note that, above, we are tacitly assuming that P_N partitions [a,b] into N pieces. While this is rather standard, it is not really necessary: the important thing is that the mesh size of P_N goes to 0 as N → ∞. Moreover, while both the c_k and Δx_k depend on N, we ignore this in the notation, as is customary, to make the expression (12.2) less scary.
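Definition 12.8 can be explored numerically by generating partitions that are not equally spaced and choosing the points c_k at random. The sketch below does this for f(x) = x^2 on [0,1]. The way the partitions are generated (a uniform grid whose interior points are moved a little) is just one convenient assumption that guarantees a mesh size of at most 1.4/N; the function names are our own.

```python
import random

def riemann_sum(f, partition, tags):
    """Sum of f(c_k) * (x_k - x_{k-1}) for a partition and points c_k in [x_{k-1}, x_k]."""
    pieces = zip(partition, partition[1:], tags)
    return sum(f(c) * (right - left) for left, right, c in pieces)

def random_tagged_partition(a, b, N, rng):
    """A non-uniform partition of [a,b] into N pieces, with a random point c_k in each piece."""
    step = (b - a) / N
    # Each interior point is moved by at most 0.2 * step, so the points stay in increasing order.
    interior = [a + (k + 0.4 * (rng.random() - 0.5)) * step for k in range(1, N)]
    partition = [a] + interior + [b]
    tags = [rng.uniform(left, right) for left, right in zip(partition, partition[1:])]
    return partition, tags

def square(x):
    return x * x

rng = random.Random(0)   # fixed seed, so that the run is reproducible
values = [riemann_sum(square, *random_tagged_partition(0.0, 1.0, N, rng)) for N in (10, 100, 1000)]
print(values)
```

Even though the partitions and the points c_k are random, the sums settle down near 1/3 as the mesh size shrinks, as the definition requires of an integrable function.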

Exercise 12.9 What does this definition say about the Dirichlet function from Ex-
ample 12.3?

The definite integral according to Darboux


Darboux realised that using Riemann’s definition to check whether or not a function
was integrable was cumbersome since one would have to take all possible partitions of
an interval into account (in particular, his students were complaining about this). He
therefore set out to find a more "user friendly" criterion for integrability. His main
insight was that we only actually need to consider lower and upper Riemann sums.

Definition 12.10 (Lower and upper Riemann sums) Let f be a bounded function defined on [a,b], and let P be a partition of [a,b]. For each subinterval [x_{k-1}, x_k], denote by m_k and M_k the infimum and supremum, respectively, of f over this interval. We now define the lower and upper Riemann sums of f with respect to P by
    L(f,P) = Σ_{k=1}^{N} m_k Δx_k   and   U(f,P) = Σ_{k=1}^{N} M_k Δx_k.

Exercise 12.11 (a) Suppose that f is a bounded function defined on [a,b]. Are the lower and upper Riemann sums for f always Riemann sums? Consider some suitable example and explain.
(b) What if we, in addition, assume that f is continuous?

We are now ready to formulate Darboux’ definition.

Definition 12.12 (Darboux integrability) Let f be a bounded function defined on the interval [a,b]. We say that f is Darboux integrable if for all ε > 0 there exists a partition P of [a,b] so that
    U(f,P) - L(f,P) < ε.

An advantage of this definition is that it is an ε-type definition, much like our definitions of various limits. In Fig. 5, we illustrate what it means. Indeed, we say that a function is Darboux integrable if, for every ε > 0, you can find a partition for which the difference between the combined area of the upper rectangles and the combined area of the lower rectangles is smaller than this ε.

Fig. 5. Illustration for Definition 12.12.
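For an increasing function, the infimum and supremum on each subinterval are attained at the left and right end points, so the lower and upper sums are easy to compute. The sketch below uses this assumption (it would give wrong values for functions that are not increasing) to compute U(f,P) - L(f,P) for f(x) = x^2 on [0,1] with equally spaced partitions; the function names are our own.

```python
def lower_upper_sums_increasing(f, partition):
    """L(f,P) and U(f,P) for an increasing f: m_k = f(x_{k-1}) and M_k = f(x_k)."""
    pieces = list(zip(partition, partition[1:]))
    lower = sum(f(left) * (right - left) for left, right in pieces)
    upper = sum(f(right) * (right - left) for left, right in pieces)
    return lower, upper

def uniform_partition(a, b, N):
    """Equally spaced partition of [a,b] into N pieces."""
    return [a + (b - a) * k / N for k in range(N + 1)]

def square(x):
    return x * x

gaps = {}
for N in (10, 100, 1000):
    lower, upper = lower_upper_sums_increasing(square, uniform_partition(0.0, 1.0, N))
    gaps[N] = upper - lower
    print(N, lower, upper, gaps[N])
```

Here the difference U(f,P) - L(f,P) equals (f(1) - f(0))/N = 1/N, so it can be made smaller than any given ε > 0 by choosing N large enough.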

Exercise 12.13 Prove that if f is monotone on [a,b], then f is Darboux integrable on [a,b].
Hint: Consider the illustration in Exercise 7.14.

A disadvantage of Darboux' definition is that it is rather indirect. Indeed, it only states when a function is to be considered integrable, but says nothing about the actual value of the area under its graph. To address this, we need another result (note that while the proof is a bit involved, it does not involve any fancy theorems).

Lemma 12.14 Suppose that f is Darboux integrable on [a,b]. Then there exists a unique number A with the property that for all partitions P, P' of [a,b], we have
    L(f,P) ≤ A ≤ U(f,P').    (12.3)

Proof. Let f be a Darboux integrable function on [a,b]. In Exercise 12.15, below, you are asked to prove that if P, P' are two partitions of [a,b], then
    L(f,P) ≤ U(f,P').
Geometrically, this should be clear, since all lower Riemann sums should lie completely below the graph of the function, while the upper Riemann sums lie completely above the graph (however, as Example 12.3 teaches us, we need to tread carefully when studying the concept of areas below graphs).
For any fixed P', the number U(f,P') is an upper bound for L(f,P) for all partitions P of [a,b]. Therefore, by the completeness axiom, sup_P L(f,P) exists. Similarly, inf_{P'} U(f,P') exists. Moreover, by Exercise 1.114, we have that
    sup_P L(f,P) ≤ inf_{P'} U(f,P').

Next, we note that if a number A satisfies (12.3), then it must also satisfy
    sup_P L(f,P) ≤ A ≤ inf_{P'} U(f,P').    (12.4)

(Do you see why?)

Finally, to obtain a contradiction, we assume that there exist two numbers A_1 < A_2 that both satisfy (12.3), and therefore inequality (12.4). In particular, for any partition P, this implies that
    A_2 - A_1 ≤ U(f,P) - L(f,P).

Fig. 6. The sets of values obtained by the L(f,P) and U(f,P') as P, P' run through all possible partitions of [a,b] are indicated in red and blue, respectively. In particular, the distance between A_1 and A_2 should be smaller than the distance between any upper and lower Riemann sums.

We now invoke the Darboux integrability of the function f for the first time. This implies that for all ε > 0, there exists a partition P so that
    U(f,P) - L(f,P) < ε.
Since A_2 - A_1 > 0, we can choose ε = (A_2 - A_1)/2. But this implies that
    A_2 - A_1 ≤ U(f,P) - L(f,P) < (A_2 - A_1)/2,
which is absurd. Done!
Exercise 12.15 Let $P$ be a partition of $[a,b]$.
(a) Prove that if $P = P'$, then $L(f,P) \le U(f,P')$.
(b) Prove that if you extend $P$ by putting $P' = P \cup \{y\}$, for $y \in [a,b]$, then
\[ L(f,P) \le L(f,P') \quad\text{and}\quad U(f,P') \le U(f,P). \tag{12.5} \]
(c) Prove, by induction, that if you extend $P$ by putting $P'' = P \cup \{y_1, \ldots, y_M\}$, for
$y_j \in [a,b]$, then (12.5) holds with $P'$ replaced by $P''$.
(d) Combine parts (a) and (c) to show that for all partitions $P, P'$ we have
$L(f,P) \le U(f,P')$.
Hint: For (b), see the figure below. In (c), you can use (b) to prove both the base case
and the induction step. In (d), you need to define, in terms of $P$ and $P'$, some third
partition $P''$ to which you can compare both $P$ and $P'$.

Fig. 7. Here is what happens to a lower Riemann sum when we add some extra point
y1 to the partition.
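
To get a feeling for inequality (12.5) before proving it, here is a minimal numerical sketch. It assumes an increasing function, so that the infimum and supremum on each subinterval sit at the endpoints; the function `cube`, the partition `P` and the extra point 1.5 are hypothetical choices made only for illustration.

```python
def lower_upper(f, points):
    """Lower and upper Riemann sums for an increasing f on the partition `points`.

    For an increasing function, the inf and sup on [x0, x1] are f(x0) and f(x1).
    """
    lower = 0.0
    upper = 0.0
    for x0, x1 in zip(points, points[1:]):
        lower += f(x0) * (x1 - x0)
        upper += f(x1) * (x1 - x0)
    return lower, upper

def cube(x):
    return x ** 3

P = [0.0, 0.5, 1.0, 2.0]
P_refined = sorted(P + [1.5])   # P' = P with the extra point y = 1.5
L, U = lower_upper(cube, P)
L_ref, U_ref = lower_upper(cube, P_refined)
print(L, L_ref, U_ref, U)       # the lower sum goes up, the upper sum goes down
```

Of course, one example proves nothing; the point is only to see the lower sum increase and the upper sum decrease when a point is added.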

With the above lemma in hand, the following definition makes sense, and explains
what we mean by the Darboux integral.

Definition 12.16 (Darboux integral) Let $f$ be a Darboux integrable function on $[a,b]$.
Then we call the unique number $A$ from the above lemma the Darboux integral of $f$ over
$[a,b]$ and denote it by
\[ \int_a^b f(x)\,dx. \]
12.1. THE DEFINITION OF THE DEFINITE INTEGRAL 361

Darboux and Riemann integrability of smooth functions


To get a more detailed sense of Darboux and Riemann’s definitions of the definite integral,
we consider the case of smooth functions. We use the following definition.

Definition 12.17 (Smooth functions) We say that a function $f$ is smooth on an
interval $I$ if $f'$ exists and is continuous on $I$.

We immediately note that most of the functions we encounter in these lecture notes
are smooth (such as polynomials, rational functions, the exponential function and the
logarithm, as well as the trigonometric functions). However, there are plenty of non-
smooth functions. Even elementary functions can fail to be smooth. A basic example is
the absolute value function (which is not differentiable at x = 0).

Exercise 12.18 Prove that if $f$ is smooth on an interval $[a,b]$, then there exists a
constant $M$ so that $|f(x_1) - f(x_0)| \le M|x_1 - x_0|$ for all $x_1, x_0 \in [a,b]$.
Hint: The proof is short and sweet, but uses some of the main results of the course.

We now prove that all smooth functions are Darboux integrable.

Proposition 12.19 If f is smooth on [a,b], then f is Darboux integrable there.

Proof. By the definition, we need to check that for all $\epsilon > 0$, there exists a partition $P$ of
$[a,b]$ so that $U(f,P) - L(f,P) < \epsilon$. To this end, we first let $\epsilon > 0$ be some fixed number.
Next, we study the expression $U(f,P) - L(f,P)$. To obtain a more explicit formula,
we denote the points in the partition $P$ by $a = x_0 < x_1 < \ldots < x_N = b$. Moreover, since
$f$ is continuous, it follows by the Min-Max theorem that the infimum and supremum
of $f$ on each of the subintervals $[x_{k-1}, x_k]$ exist and are attained at points $\ell_k$ and $u_k$,
respectively. In particular, this means that the formulas for the lower and upper Riemann
sums, respectively, can be expressed as
\[ L(f,P) = \sum_{k=1}^N f(\ell_k)\,\Delta x_k \quad\text{and}\quad U(f,P) = \sum_{k=1}^N f(u_k)\,\Delta x_k, \]
which implies that
\begin{align*}
U(f,P) - L(f,P) &= \sum_{k=1}^N f(u_k)\,\Delta x_k - \sum_{k=1}^N f(\ell_k)\,\Delta x_k \\
&= \sum_{k=1}^N \big( f(u_k) - f(\ell_k) \big)\,\Delta x_k.
\end{align*}

Next, we invoke exercise 12.18, which says that there exists some $M > 0$ so that
\begin{align*}
|f(u_k) - f(\ell_k)| &\le |u_k - \ell_k|\,M \\
&\le \Delta x_k\,M \\
&\le \mathrm{mesh}(P)\,M.
\end{align*}
Here, in the second step, we used that, for each $k$, the points $u_k, \ell_k$ are from the same
subinterval $[x_{k-1}, x_k]$, and in the final step, we used that the largest $\Delta x_k$ in a partition
is called its mesh size $\mathrm{mesh}(P)$.
Inserting this into the expression for $U(f,P) - L(f,P)$, we obtain the inequality
\[ |U(f,P) - L(f,P)| \le \mathrm{mesh}(P)\,M \sum_{k=1}^N \Delta x_k = \mathrm{mesh}(P)\,M\,(b - a). \]
Choosing $P$ so that $\mathrm{mesh}(P) < \epsilon/(M(b - a))$, we are done.
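
The bound $U(f,P) - L(f,P) \le \mathrm{mesh}(P)\,M\,(b-a)$ can be observed numerically. The following is a minimal sketch, not a proof: the helper `darboux_gap` is hypothetical, it uses uniform partitions, and it only approximates the supremum and infimum on each subinterval by sampling (which can only underestimate the true gap). As an example we assume $f = \sin$ on $[0,3]$, where $M = 1$ works since $|\cos x| \le 1$.

```python
import math

def darboux_gap(f, a, b, n, samples=200):
    """Approximate U(f,P) - L(f,P) on the uniform partition of [a,b] with n pieces.

    The sup and inf on each subinterval are approximated by sampling,
    which is an assumption of this sketch.
    """
    dx = (b - a) / n
    gap = 0.0
    for k in range(n):
        left = a + k * dx
        values = [f(left + j * dx / samples) for j in range(samples + 1)]
        gap += (max(values) - min(values)) * dx
    return gap

a, b, M = 0.0, 3.0, 1.0
gaps = {}
for n in (10, 100, 1000):
    gaps[n] = darboux_gap(math.sin, a, b, n)
    bound = ((b - a) / n) * M * (b - a)   # mesh(P) * M * (b - a)
    print(n, gaps[n], bound)
```

As the mesh shrinks by a factor of 10, so does the bound, and the computed gap stays below it.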

As we mentioned earlier, an advantage of the Darboux integral is that it is comparatively
easy to check if a function is Darboux integrable or not (as opposed to being
Riemann integrable). However, as we have seen, the definition of the Darboux integral
was rather indirect, which makes it awkward to compute. For this reason, the following
result, which connects the Darboux and Riemann integral for smooth functions, is helpful
(in particular, it tells us that the definition used in Chapter 7 works just fine for
smooth functions).

Proposition 12.20 If $f$ is smooth on $[a,b]$, then $f$ is integrable in both the sense of
Darboux and Riemann. In particular, for any sequence of partitions $P_N = \{x_0, \ldots, x_N\}$
of $[a,b]$ with $\mathrm{mesh}(P_N) \to 0$, and $c_k \in [x_{k-1}, x_k]$, we have
\[ \int_a^b f(x)\,dx = \lim_{N\to\infty} \sum_{k=1}^N f(c_k)\,\Delta x_k. \]

Proof. The idea of this proof is essentially to push the techniques used in the proof of
Proposition 12.19 just a little bit further. Indeed, for each $N \in \mathbb{N}$, let $P_N = \{x_k\}_{k=0}^N$
and $\{c_k\}_{k=1}^N$ be as in the hypothesis.
First, we note that by Proposition 12.19, $f$ is Darboux integrable on $[a,b]$. We denote
the value of its Darboux integral on $[a,b]$ by $A$ (this $A$ is then the same as the one in
Lemma 12.12.3).
Next, let $f(\ell_k)$ and $f(u_k)$ denote the smallest and largest value of $f$ on $[x_{k-1}, x_k]$.
This means that for all $c_k \in [x_{k-1}, x_k]$, we have $f(\ell_k) \le f(c_k) \le f(u_k)$. In particular,

this implies that
\[ L(f,P_N) \le \sum_{k=1}^N f(c_k)\,\Delta x_k \le U(f,P_N). \]
Therefore, if we can prove that
\[ \lim_{N\to\infty} L(f,P_N) = \lim_{N\to\infty} U(f,P_N) = A, \]
then the conclusion of the proposition follows from the Squeeze theorem.
To end the proof, we prove
\[ \lim_{N\to\infty} U(f,P_N) = A. \]
(The limit for $L(f,P_N)$ is obtained in the same way.) To this end, let $\epsilon > 0$ be given,
but unknown. Our goal is then to determine some $N_0$ sufficiently large so that for all
$N \in \mathbb{N}$ we have
\[ N > N_0 \implies |U(f,P_N) - A| < \epsilon. \]
Since $A$ lies between $L(f,P_N)$ and $U(f,P_N)$, it follows that
\[ |U(f,P_N) - A| \le |U(f,P_N) - L(f,P_N)|. \]
Exercise 12.21 Finish the proof by repeating parts of what we did in the proof of
Proposition 12.19.
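
To see the statement of Proposition 12.20 in action, here is a small sketch in which the points $c_k$ are chosen at random in each subinterval. The helper `riemann_sum`, the uniform partitions and the example $f = \cos$ on $[0,1]$ are assumptions made for illustration; we also take for granted that the integral equals $\sin 1$. Since $|f'| \le 1$ here, the proof above suggests that the error should be at most $\mathrm{mesh}(P_N) = 1/N$, whatever the $c_k$ are.

```python
import math
import random

def riemann_sum(f, a, b, n, rng):
    """Riemann sum on the uniform partition with a random c_k in each [x_{k-1}, x_k]."""
    dx = (b - a) / n
    total = 0.0
    for k in range(1, n + 1):
        c_k = a + (k - 1) * dx + rng.random() * dx
        total += f(c_k) * dx
    return total

rng = random.Random(0)          # fixed seed, so the run is reproducible
exact = math.sin(1.0)           # value of the integral of cos over [0, 1]
errors = {}
for n in (10, 100, 1000):
    errors[n] = abs(riemann_sum(math.cos, 0.0, 1.0, n, rng) - exact)
    print(n, errors[n])
```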

As we mentioned above, a problem with developing a theory for the definite integral
for smooth functions is that it will not apply to, for instance, the absolute value function.
However, this can be easily fixed by noticing that the absolute value function is what we
call piece-wise smooth. We explain what we mean by this in the following example.

Example 12.22 Define the function
\[ f(x) = \begin{cases} \tan x & \text{for } x \in [0,\pi/3], \\ \sin x & \text{for } x \in (\pi/3,\pi/2], \\ \cos x & \text{for } x \in (\pi/2,\pi]. \end{cases} \]
We say that $f$ is piecewise smooth since its graph can, more or less, be thought of as
consisting of three pieces, each smooth on a closed and finite interval. The fact that
this is not really true makes a proper definition slightly annoying to formulate.
Fig. 8.

Exercise 12.23 (a) Define what it means for a function to be piece-wise smooth.
(b) Prove that piece-wise smooth functions are Darboux integrable.
Remark: This definition of piece-wise smooth is not entirely standard. Usually,
one also asks for the function to be continuous at the "break points".

Darboux and Riemann integrability of continuous functions

Before we move on to establishing computational rules for the definite integral, it is


important to point out that Proposition 12.20 also holds when the word "smooth" is
replaced by "continuous". That is, the following is true:

Proposition 12.24 If $f$ is continuous on $[a,b]$, then $f$ is integrable in both the sense of
Darboux and Riemann. In particular, for any sequence of partitions $P_N = \{x_0, \ldots, x_N\}$
of $[a,b]$ with $\mathrm{mesh}(P_N) \to 0$, and $c_k \in [x_{k-1}, x_k]$, we have
\[ \int_a^b f(x)\,dx = \lim_{N\to\infty} \sum_{k=1}^N f(c_k)\,\Delta x_k. \]

If you re-read the proof of Proposition 12.20, you will notice that the only point
where we used the hypothesis that $f$ is smooth, is when we obtain, for all $k$, the estimate
\[ |f(u_k) - f(\ell_k)| \le \mathrm{mesh}(P)\,M. \]
The point of this estimate is that by choosing $\mathrm{mesh}(P)$ small enough, we can get
$|f(u_k) - f(\ell_k)|$ to be as small as we want, for all $k$.
While this seems like just the thing continuity would imply (i.e., $|u_k - \ell_k|$ small
implies $|f(u_k) - f(\ell_k)|$ small), unfortunately, the fact that we need to get this "for all
$k$" causes problems. To explain why this is, we recall the definition of continuity.

Definition 12.25 (Continuity) A function $f$ is said to be continuous on an interval
$I$ if for all $x \in I$, the following holds:

For all $\epsilon > 0$ there exists $\delta > 0$ such that for all $y \in I$ we have
\[ |x - y| < \delta \implies |f(y) - f(x)| < \epsilon. \]

So why is using continuity problematic? Well, given a partition of $[a,b]$, then to know
if $|u_k - \ell_k|$ small implies $|f(u_k) - f(\ell_k)|$ small, we would have to apply the definition
of continuity to each point $\ell_k$. For each $k$, this would give a different value $\delta_k$, and
we would have no guarantee that these $\delta_k$'s are large enough for $|u_k - \ell_k| < \delta_k$ to hold
for all (or even any) $k$. Potentially, we could respond to this by making the partition
finer (thus moving the $u_k$'s and $\ell_k$'s closer together), but this would result in a new
set of $u_k$'s and $\ell_k$'s, and therefore also new $\delta_k$'s, leaving us back at square one.
To be able to deal with this, we need another type of continuity.

Definition 12.26 (Uniform continuity) A function $f$ is said to be uniformly continuous
on some interval $I$ if the following holds:

For all $\epsilon > 0$, there exists $\delta > 0$ such that for all $x, y \in I$ we have
\[ |x - y| < \delta \implies |f(x) - f(y)| < \epsilon. \]

Here, an important detail is to notice that while usual continuity1 is always relative
to some specific point, uniform continuity is always relative to an interval.

Example 12.27 (a) The function $f(x) = x^2$ is not uniformly continuous on $\mathbb{R}$. Indeed,
let us show that the definition fails for $\epsilon = 1$ by finding two points that are arbitrarily
close together, but such that $|f(x) - f(y)| \ge 1$. To this end, let $\delta > 0$ be some arbitrary,
but fixed number, and consider the points $x + \delta$ and $x$, for some $x > 0$ to be decided.
With this, we obtain
\[ |f(x + \delta) - f(x)| = |(x + \delta)^2 - x^2| = 2x\delta + \delta^2. \]
To get this larger than $\epsilon = 1$, it suffices to choose $x = 1/\delta$, and we are done.

(b) The function $f(x) = x^2$ is uniformly continuous on the interval $[0,10]$. Indeed,
given any $x_0 \in \mathbb{R}$, by the procedures from Chapter 4, we find that, say,
\[ |x - x_0| < \delta = \min\Big\{ 1, \frac{\epsilon}{2|x_0| + 1} \Big\} \implies |f(x) - f(x_0)| < \epsilon. \]
(Here we use that $|x - x_0| < 1$ gives $|x + x_0| < 2|x_0| + 1$.)
For $x_0 \in [0,10]$, the smallest value for $\delta$ prescribed by this formula would be $\min\{1, \epsilon/21\}$.
Since this value is non-zero, and works for all $x_0 \in [0,10]$, we are done.
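
The computation in part (a) is easy to replay numerically. The following minimal sketch (the list `deltas` is just a hypothetical selection of test values) takes two points at distance $\delta$, placed at $x = 1/\delta$, and checks that the function values stay at least $1$ apart no matter how small $\delta$ is.

```python
def f(x):
    return x * x

deltas = [1.0, 0.1, 0.01, 0.001]
gaps = []
for delta in deltas:
    x = 1.0 / delta                    # the choice made in part (a)
    gaps.append(f(x + delta) - f(x))   # equals 2*x*delta + delta**2 = 2 + delta**2
    print(delta, x, gaps[-1])
```

The points move off towards infinity as $\delta$ shrinks, which is exactly why the same argument is not available on the bounded interval $[0,10]$ in part (b).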

Exercise 12.28 (a) Show that f (x) = 1/x is not uniformly continuous on (0,1].
(b) Show that f (x) = 1/x is uniformly continuous on [1/2,1].
Exercise 12.29 Show that smooth functions are uniformly continuous on intervals
[a,b]. Hint: Recall exercise 12.18.
Exercise 12.30 Show that if $f$ is uniformly continuous on $[a,b]$, then for all $\epsilon > 0$ there exists a
$\delta > 0$ so that for all partitions $P$ with mesh size $\mathrm{mesh}(P) < \delta$, we have $U(f,P) - L(f,P) < \epsilon$.
Explain why this proves Proposition 12.20 with "smooth" replaced by "uniformly
continuous".
Remark: This exercise seems challenging at first, but once you solve it, you will see that
you needed to do almost nothing (since the concept of uniform continuity essentially is
tailored for the task).
1
When also discussing uniform continuity, we sometimes call the usual form of continuity "point wise
continuity".

In light of exercise 12.30, the following result immediately implies Proposition 12.24.

Proposition 12.31 If f is continuous on [a,b], then f is uniformly continuous on [a,b].

Proof of Proposition 12.31. We do a proof by contradiction, where the Bolzano-Weierstrass
theorem will play a central part ($f$ is continuous on a closed and bounded interval!).
The first step is therefore to establish that the negation of $f$ being uniformly continuous
is as follows (you are asked to provide the details in exercise 12.32, below):
There exists $\epsilon > 0$ such that for all $\delta > 0$ there exist $x, y \in I$ such that
$|x - y| < \delta$ and $|f(x) - f(y)| \ge \epsilon$.
To apply the Bolzano-Weierstrass theorem, we need to work with sequences. To
this end, we observe that if $f$ is not uniformly continuous, then there exists some fixed
$\epsilon > 0$ so that for all $\delta > 0$, there exist $x, y \in [a,b]$ that are "close" while $f(x), f(y)$ are
"far apart". Specifically, for each $\delta = 1/n$, with $n \in \mathbb{N}$, there exists a pair of points
$x_n, y_n \in [a,b]$ so that
\[ |x_n - y_n| < \frac{1}{n} \quad\text{and}\quad |f(x_n) - f(y_n)| \ge \epsilon \quad\text{both hold.} \]
Suppose now that the sequences $x_n$ and $y_n$ converge (you are asked to address this
in exercise 12.33, below). In particular, they have to converge to the same limit since
$|x_n - y_n| < 1/n$. Denote this limit by $L$.
But now we almost have our contradiction! Indeed, since $f$ is a continuous function
on $[a,b]$, we are allowed to make the following computation:
\begin{align*}
0 = |f(L) - f(L)| &= \Big| f\big( \lim_{k\to\infty} x_{n_k} \big) - f\big( \lim_{k\to\infty} y_{n_k} \big) \Big| \\
&= \lim_{k\to\infty} \big| f(x_{n_k}) - f(y_{n_k}) \big| \ge \epsilon.
\end{align*}
Done!

Exercise 12.32 Use the following three rules for forming negations to deduce the above
negation of uniform continuity:
(i) The negation of "for all x then A holds" is "there exists x such that A is false".
(ii) The negation of "there exists x such that A holds" is "for all x then A is false".
(iii) The negation of "for all x we have A $\implies$ B" is "there exists x so that A and
the negation of B both hold".
Exercise 12.33 There is no reason why the sequences $x_n$ and $y_n$ should converge in
the above argument. Use Bolzano-Weierstrass to fix this, and explain why the limits
are the same.
Hint: It is a bad idea to apply Bolzano-Weierstrass separately to $x_n$ and $y_n$.

12.2 Basic rulebook for the definite integral


We now motivate and prove the basic rules for the definite integral. Note that we only
make the assumption that f is Riemann integrable here.
First, we formulate two rules that we actually take as definitions.

Definition 12.34 Suppose that $f$ is a Riemann integrable function on $[a,b]$ and that
$c \in [a,b]$. Then we define:
\[ \text{(i)}\ \int_c^c f(x)\,dx = 0 \qquad \text{(ii)}\ \int_b^a f(x)\,dx = -\int_a^b f(x)\,dx \]

Why do we formulate these rules in a definition and not in a proposition that we
prove? Well, in the case of (i), the problem is that we cannot partition an interval of
length zero such as $[a,a]$ (why?). This means that either we modify what we mean by a
partition, or we extend the definition of the definite integral by defining that
$\int_a^a f(x)\,dx = 0$. To avoid complicating other parts of the theory, we choose to do the latter.
Next, rule (ii) falls outside of our theoretical framework since we have not made
sense of what it means to partition an interval “backwards”. By accepting (ii) as being
true by definition, we again avoid complicating other parts of the theory.
We now formulate the basic computational rules that we can give proofs for.

Proposition 12.35 (Basic rulebook for the definite integral) Suppose that $f, g, h$
are Riemann integrable on $[a,b]$, $c \in (a,b)$ and $k \in \mathbb{R}$. Then we have
\begin{align*}
\text{(iii)}\quad & \int_a^b dx = b - a, \\
\text{(iv)}\quad & \int_a^c f(x)\,dx + \int_c^b f(x)\,dx = \int_a^b f(x)\,dx, \\
\text{(v)}\quad & \int_a^b \big( f(x) + g(x) \big)\,dx = \int_a^b f(x)\,dx + \int_a^b g(x)\,dx, \\
\text{(vi)}\quad & \int_a^b k f(x)\,dx = k \int_a^b f(x)\,dx, \\
\text{(vii)}\quad & f(x) \le g(x) \text{ on } [a,b] \implies \int_a^b f(x)\,dx \le \int_a^b g(x)\,dx.
\end{align*}
Finally, we have the triangle inequality for definite integrals: if $a < b$, then
\[ \text{(viii)}\quad \Big| \int_a^b f(x)\,dx \Big| \le \int_a^b |f(x)|\,dx. \]

Discussion of rules (iii) through (vii)


Let us now consider the second “group” of rules
from the rulebook. What these have in common
is that they are fairly easy to visualise (at least
in special cases), and they can be given rather
pleasant proofs based on Definition 12.8. Below,
we prove rule (iv), and ask you in the exercises
to produce similar proofs for the remaining rules.
Fig. 9. Illustration of rule (iv).

Proof of rule (iv). For each $N \in \mathbb{N}$, we partition $[a,c]$ and $[c,b]$ into $N$ equally long
subintervals. We denote these partitions by $\{x_0, x_1, \ldots, x_N\}$ and $\{y_0, y_1, \ldots, y_N\}$, respectively
(in particular, this means that $x_N = y_0$). Since the mesh sizes of both partitions tend
to zero as $N$ grows, it follows by Definition 12.8 that
\[ \int_a^c f(x)\,dx = \lim_{N\to\infty} \sum_{k=1}^N f(x_k)\,\Delta x_k, \quad\text{and}\quad \int_c^b f(x)\,dx = \lim_{N\to\infty} \sum_{k=1}^N f(y_k)\,\Delta y_k. \]
The point is now that $\{x_0, x_1, \ldots, x_N\} \cup \{y_0, y_1, \ldots, y_N\}$ is a partition of $[a,b]$ into
subintervals which, even if they are not all of equal length, have a mesh size that
tends to 0 as $N$ increases. Therefore, by applying Definition 12.8 once more, we obtain
\[ \int_a^b f(x)\,dx = \lim_{N\to\infty} \Big( \sum_{k=1}^N f(x_k)\,\Delta x_k + \sum_{k=1}^N f(y_k)\,\Delta y_k \Big) = \int_a^c f(x)\,dx + \int_c^b f(x)\,dx. \]
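
The construction in this proof can be imitated numerically. In the sketch below, the helper `right_sum` (a hypothetical name) forms a Riemann sum with equally long subintervals, evaluated at the right endpoints $x_k$. We assume $f = \exp$ on $[a,b] = [0,2]$ with $c = 0.5$, purely as an example. The sum of the two pieces is exactly the Riemann sum over the union of the two partitions, and we compare it with a Riemann sum over an equally spaced partition of all of $[a,b]$.

```python
import math

def right_sum(f, a, b, n):
    """Riemann sum with n equal subintervals, evaluated at the right endpoints x_k."""
    dx = (b - a) / n
    total = 0.0
    for k in range(1, n + 1):
        total += f(a + k * dx) * dx
    return total

a, c, b = 0.0, 0.5, 2.0
n = 2000
left_piece = right_sum(math.exp, a, c, n)    # approximates the integral over [a, c]
right_piece = right_sum(math.exp, c, b, n)   # approximates the integral over [c, b]
whole = right_sum(math.exp, a, b, 2 * n)     # approximates the integral over [a, b]
print(left_piece + right_piece, whole)
```

The two printed numbers are not identical, since the partitions differ, but both approximate the same integral and the difference shrinks as `n` grows.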

Exercise 12.36 The simplest rule is rule (iii). Show that it follows almost immediately
from Definition 12.8.

Exercise 12.37 Use Definition 12.8 to prove rules (v), (vi) and (vii). (The arguments
are all minor variations of the one used to prove rule (iv).)

(a) Prove rule (v).


(b) Prove rule (vi).
(c) Prove rule (vii).
(d) Does a rule such as (vii) hold for indefinite integrals?

Exercise 12.38 (Discussion) Although we claimed that rules (i) and (ii) do not
really fit into our framework, it is possible to modify what we mean by a partition so
that we can give proofs for these rules.

(a) Do this for rule (i).


(b) Do this for rule (ii).

Discussion of rule (viii), the triangle inequality for definite integrals


Recall that the usual triangle inequality says that
\[ |a + b| \le |a| + |b|. \tag{12.6} \]
If we know that the triangle inequality holds for the sum of two numbers, then it
can almost immediately be extended to a sum of three numbers as follows:
\begin{align*}
|a + b + c| = |(a + b) + c| &\le |a + b| + |c| \\
&\le (|a| + |b|) + |c| = |a| + |b| + |c|.
\end{align*}
Notice that in both of the inequalities in this computation, we only used the usual
triangle inequality (12.6).

Exercise 12.39 (a) Prove by induction that
\[ \Big| \sum_{k=1}^N a_k \Big| \le \sum_{k=1}^N |a_k|. \]
(b) Use Definition 12.8, in combination with (a), to prove rule (viii), namely that
\[ \Big| \int_a^b f(t)\,dt \Big| \le \int_a^b |f(t)|\,dt. \]

(c) Illustrate this result visually.



12.3 More advanced rules for the definite integral


We now establish more advanced rules for the definite integral.

The Mean Value Theorem for Integrals


As an application of the basic rules obtained above, we now prove a counter-part of the
Mean Value Theorem for integrals. This result will be the key ingredient in the proof of
the Fundamental Theorem of Calculus.

Proposition 12.40 (Mean Value Theorem for Integrals) Suppose that $f$ is continuous
on $[a,b]$. Then there exists a $c \in [a,b]$ so that
\[ f(c)(b - a) = \int_a^b f(x)\,dx. \]

Before we begin the proof, let us think about what this result actually says. Now,
by the Min-Max Theorem, there exist points $\ell, u \in [a,b]$ so that $f(\ell)$ and $f(u)$ are the
smallest and largest values of $f$ on $[a,b]$. Consider the following figure:

Fig. 10. The red rectangle has area $f(\ell)(b - a)$, the blue rectangle has area $f(u)(b - a)$.
The Mean Value Theorem for Integrals just says that there has to be some "average"
height $f(c)$ so that the area $f(c)(b - a)$ of the corresponding rectangle is exactly equal
to the green area under the graph on the interval $[a,b]$.

Proof of the Mean Value Theorem for Integrals. As discussed above, let $f(\ell)$ and $f(u)$
be the minimum and maximum of $f$ on $[a,b]$, respectively. This gives us the inequalities
\[ f(\ell) \le f(x) \le f(u) \quad \forall x \in [a,b]. \]
We apply the basic rule (vii) to integrate these inequalities to obtain
\[ \int_a^b f(\ell)\,dx \le \int_a^b f(x)\,dx \le \int_a^b f(u)\,dx. \]

Since the values $f(\ell)$ and $f(u)$ do not depend on $x$, we can use rule (vi) to rewrite
these inequalities as
\[ f(\ell) \int_a^b dx \le \int_a^b f(x)\,dx \le f(u) \int_a^b dx, \]
which, by rule (iii), is the same as
\[ f(\ell)(b - a) \le \int_a^b f(x)\,dx \le f(u)(b - a). \]
Dividing by $(b - a)$ on all sides, we get
\[ f(\ell) \le \underbrace{\frac{1}{b - a} \int_a^b f(x)\,dx}_{=s} \le f(u). \]
Notice that the value which we label by $s$ is between $f(\ell)$ and $f(u)$. This means that by
the Intermediate Value Theorem (or more specifically, exercise 9.10), there must exist
some value $c \in [a,b]$ so that $s = f(c)$. This is exactly what we needed to prove!
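
The proof is constructive enough to be imitated on a computer. The sketch below first approximates the average height $s$ by a Riemann sum (with midpoints as the $c_k$), and then locates a point $c$ with $f(c) = s$ by bisection. The helpers `midpoint_sum` and `find_c` are hypothetical, and the bisection assumes, in addition to the hypotheses of the proposition, that $f$ is increasing on $[a,b]$. As an example we take $f(x) = x^2$ on $[0,3]$, where $s = 3$ and so $c = \sqrt{3}$.

```python
def midpoint_sum(f, a, b, n=10000):
    """Riemann sum on n equal subintervals, evaluated at the midpoints."""
    dx = (b - a) / n
    total = 0.0
    for k in range(n):
        total += f(a + (k + 0.5) * dx)
    return total * dx

def find_c(f, a, b, s, tol=1e-10):
    """Bisection for f(c) = s, assuming f is continuous and increasing on [a, b]."""
    lo, hi = a, b
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < s:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def square(x):
    return x * x

a, b = 0.0, 3.0
s = midpoint_sum(square, a, b) / (b - a)   # the average height s from the proof
c = find_c(square, a, b, s)
print(s, c)
```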

As for the Mean Value Theorem for derivatives, there also exists a Generalised Mean
Value Theorem for Integrals which we shall need in a later chapter. Since the proof is
just a minor variation of the above proof, we formulate this result, and leave the proof
to the exercises.

Proposition 12.41 (Generalised Mean Value Theorem for Integrals) Suppose
that $f$ and $g$ are continuous functions on $[a,b]$, and that $g(x) \ge 0$ for all $x \in [a,b]$. Then
there exists a value $c \in [a,b]$ so that
\[ \int_a^b f(x)g(x)\,dx = f(c) \int_a^b g(x)\,dx. \]

Exercise 12.42 (a) Prove the Generalised Mean Value Theorem for Integrals by
modifying the proof of the Mean Value Theorem for integrals.
(b) Does the conclusion of the Generalised Mean Value Theorem for Integrals hold
if we replace the assumption that g is positive on [a,b], by the assumption that g
is negative on [a,b]?

Exercise 12.43 It is possible to improve both Mean Value Theorems for Integrals
formulated above, so that in the conclusion we can replace $c \in [a,b]$ by $c \in (a,b)$.
Modify the proof of Proposition 12.40 so that this holds.
Hint: Take a look at the strategy for the proof of Rolle’s theorem.

The Fundamental Theorem of Calculus


In practice, there are basically three ways to compute a definite integral:

1. "Cheat" by using a geometric argument.


2. Use "brute force" and compute the integral numerically with Python by approxi-
mating with Riemann sums.
3. Use the "evaluation formula" to compute the definite integral in terms of the
primitive function (which we sometimes also call the anti-derivative).
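
As a reminder of what the second method looks like in practice, here is a minimal sketch for the integral of $\sqrt{1 - x^2}$ over $[0,1]$. The helper `midpoint_sum` is a hypothetical name, and using midpoints as the points $c_k$ is just one possible choice. The geometric answer is the area of a quarter of the unit disc, that is, $\pi/4$.

```python
import math

def midpoint_sum(f, a, b, n):
    """Riemann sum on n equal subintervals, evaluated at the midpoints."""
    dx = (b - a) / n
    total = 0.0
    for k in range(n):
        total += f(a + (k + 0.5) * dx)
    return total * dx

def quarter_circle(x):
    return math.sqrt(1.0 - x * x)

approx = midpoint_sum(quarter_circle, 0.0, 1.0, 100000)
print(approx, math.pi / 4)
```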

Above, we have already seen how to both "cheat" and use "brute force" to study
the integral $\int_0^1 \sqrt{1 - x^2}\,dx$. The main point of this section is to explain how and why
the third method works. To this end, we need to establish what some claim is the most
important scientific discovery ever made:

Theorem 12.44 (The Fundamental Theorem of Calculus) Suppose that $f$ is a
continuous function on some interval $I$. Then the following holds:
(i) If $x, a$ are inner points of $I$, it holds that
\[ \frac{d}{dx} \int_a^x f(t)\,dt = f(x). \]
(ii) (the evaluation formula) If $F$ is a primitive function of $f$ on $I$, then for all $a, b \in I$
we have
\[ \int_a^b f(t)\,dt = F(b) - F(a). \]
Note that we often use the notation $[F(t)]_a^b$ in place of $F(b) - F(a)$.

When reading the above result, keep in mind that the definite integral and the derivative
come from two completely different geometric notions: areas and tangent lines, respectively.
It was therefore quite surprising, and very useful, when Newton and Leibniz realised that
these two concepts are mirror images of one another! (In fact, historically, people started
working on the definite integral before the derivative.) At the time, there was a huge
argument between the two on who discovered "Calculus" – the link between the theory of
integration and the theory of differentiation – first. In the end, Newton was credited for
the discovery, but we are using Leibniz' notation.

Fig. 11. Gottfried Wilhelm von Leibniz (1646 – 1716) shown biting Sir Isaac Newton (1642 – 1727).

Proof of part (i). Suppose that $f$ is continuous on some interval that contains $x$ and $a$
as inner points. For instance, this seems to be the case for the function in Figure 12.
By Proposition 12.24, this means that $f$ is integrable and we can define the function
\[ G(x) = \int_a^x f(t)\,dt. \]
Our goal is to show that $G(x)$ is a primitive function of $f(x)$. In other words, we need
to compute the derivative of $G(x)$.

Fig. 12. Illustration of $f(t)$ (the red graph) and $G(x)$ (the green "area function").

Using the basic computational rule (iv) for the definite integral, we begin the computation
of $G'(x)$ as follows:
\begin{align*}
G'(x) &= \lim_{h\to 0} \frac{G(x + h) - G(x)}{h} \\
&= \lim_{h\to 0} \frac{1}{h} \Big( \int_a^{x+h} f(t)\,dt - \int_a^x f(t)\,dt \Big) \\
&= \lim_{h\to 0} \frac{1}{h} \int_x^{x+h} f(t)\,dt.
\end{align*}
Next, by the Mean Value Theorem for Integrals, we continue the computation by
\begin{align*}
\cdots &= \lim_{h\to 0} \frac{1}{h} f(c)(x + h - x) \\
&= \lim_{h\to 0} f(c).
\end{align*}
Finally, we observe that since $c \in [x, x + h]$, by the squeeze theorem, it follows that $h \to 0$
implies $c \to x$, and so, we complete the computation using the continuity of $f$:
\[ \cdots = \lim_{c\to x} f(c) = f(x). \]

Proof of part (ii). Fix $a, b \in I$ and some primitive function of $f$ on $I$ that we call $F$ (we
emphasize that at the moment, we know nothing about $F$, except that $F' = f$). Letting
$G$ be as in the first part of the theorem, since also $G' = f$ on $I$, we already know from
the theory of indefinite integration that there exists a constant $C$ so that
\begin{align*}
F(x) &= G(x) + C \\
&= \int_a^x f(t)\,dt + C.
\end{align*}

But this implies that
\begin{align*}
F(b) - F(a) &= \Big( \int_a^b f(t)\,dt + C \Big) - \Big( \int_a^a f(t)\,dt + C \Big) \\
&= \int_a^b f(t)\,dt - \int_a^a f(t)\,dt = \int_a^b f(t)\,dt,
\end{align*}
which completes the proof of the Fundamental Theorem of Calculus!
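
Both parts of the theorem can be checked numerically, at least approximately. In the following sketch, the helper `integral` (a hypothetical name) approximates the definite integral by a Riemann sum with midpoints. We assume $f = \cos$, with the known primitive function $F = \sin$; the values of `a`, `x`, `h` and `b` are arbitrary choices.

```python
import math

def integral(f, a, b, n=20000):
    """Midpoint Riemann sum approximation of the definite integral of f over [a, b]."""
    dx = (b - a) / n
    total = 0.0
    for k in range(n):
        total += f(a + (k + 0.5) * dx)
    return total * dx

f = math.cos
a, x, h = 0.0, 1.2, 1e-4

def G(s):
    """The 'area function' from the proof of part (i)."""
    return integral(f, a, s)

# Part (i): the difference quotient of G at x should be close to f(x).
derivative_estimate = (G(x + h) - G(x)) / h
print(derivative_estimate, f(x))

# Part (ii): compare the Riemann sum with F(b) - F(a) for F = sin.
b = 2.0
numeric = integral(f, a, b)
evaluation = math.sin(b) - math.sin(a)
print(numeric, evaluation)
```

The difference quotient only approximates the derivative, so the first pair of numbers agrees to a few decimals rather than exactly.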

Exercise 12.45 Compute the following integrals by first identifying the relevant primitive
functions (that is, by first computing the relevant indefinite integrals).
\[ \text{(a)}\ \int_0^4 \sin\sqrt{x}\,dx \qquad \text{(b)}\ \int_2^4 \frac{dx}{x^2 - 1} \qquad \text{(c)}\ \int_1^e \frac{dx}{x(1 + (\ln x)^2)} \]
Exercise 12.46 Compute the following definite integrals by first splitting up the definite
integral using rule (iv) from Proposition 12.35.
\[ \text{(a)}\ \int_0^{2\pi} |\sin x|\,dx \qquad \text{(b)}\ \int_0^3 |x^4 - 16|\,dx \]

Exercise 12.47 (From an old exam) What is the largest value taken by the following
function?
\[ g(x) = \int_0^x \frac{1 - t}{1 + t^2}\,dt, \quad x \in \mathbb{R}. \]

Exercise 12.48 Let $F(x)$ be some primitive function of $f(t) = \arctan(t^2)$. By the
evaluation formula it holds that
\[ \int_{x^2}^{x^4} \arctan(t^2)\,dt = F(x^4) - F(x^2). \]
Use this to find a formula for the derivative of the integral on the left-hand side with
respect to $x$ (note that you never need to figure out who $F$ actually is).

Exercise 12.49 An alternative proof of the evaluation formula starts as follows: Choose
a partition $P = \{x_0, x_1, \ldots, x_n\}$ of $[a,b]$ and add 0 $n$-times by writing
\[ F(b) - F(a) = F(b) - F(x_{n-1}) + F(x_{n-1}) - \cdots - F(x_1) + F(x_1) - F(a). \]
Use the Mean Value theorem to complete this proof.



Change of variable formula for definite integrals


When making changes of variables, it is possible to use the formulas for the indefinite
integrals to compute the relevant primitive functions, and then apply the evaluation
formula. However, to save a bit of work, we can formulate this formula directly in the
language of definite integrals.

Proposition 12.50 (The change of variable formula) Suppose that $g$ has a continuous
derivative on $[a,b]$, and suppose that $f$ is continuous on the range of $g$. Then
\[ \int_a^b f(g(t))\,g'(t)\,dt = \int_{g(a)}^{g(b)} f(u)\,du. \]

Proof. Suppose that $F$ is a primitive function of $f$. By the chain rule, we know that
\[ f(g(t))\,g'(t) = \frac{d}{dt} F(g(t)). \]
On the one hand, integrating both sides over the interval $[a,b]$, and applying the evaluation
formula to the right-hand side, we get
\begin{align*}
\int_a^b f(g(t))\,g'(t)\,dt &= \int_a^b \frac{d}{dt} F(g(t))\,dt \\
&= F(g(b)) - F(g(a)).
\end{align*}
On the other hand, since $F$ is a primitive function of $f$, the evaluation formula gives
\[ \int_{g(a)}^{g(b)} f(u)\,du = F(g(b)) - F(g(a)). \]
Combining these two computations, we arrive at the desired formula.
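
A quick numerical sanity check of the formula can be reassuring. The sketch below assumes the hypothetical choices $f(u) = \sqrt{u}$ and $g(t) = 1 + t^2$ on $[a,b] = [0,2]$, and approximates both sides by Riemann sums with midpoints (the helper `integral` is our own). Both sides should be close to $\tfrac{2}{3}(5^{3/2} - 1)$.

```python
import math

def integral(f, a, b, n=20000):
    """Midpoint Riemann sum approximation of the definite integral of f over [a, b]."""
    dx = (b - a) / n
    total = 0.0
    for k in range(n):
        total += f(a + (k + 0.5) * dx)
    return total * dx

def g(t):
    return 1.0 + t * t        # the substitution u = g(t)

def g_prime(t):
    return 2.0 * t

def integrand(t):
    return math.sqrt(g(t)) * g_prime(t)   # f(g(t)) g'(t) with f = sqrt

a, b = 0.0, 2.0
lhs = integral(integrand, a, b)           # integral in the variable t over [a, b]
rhs = integral(math.sqrt, g(a), g(b))     # integral in the variable u over [g(a), g(b)]
print(lhs, rhs)
```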

Exercise 12.51 Compute the following integrals.


Z ⇡/3 Z ⇡/2 Z 1
sin 2x dx 2 p
(a) (b) x cos x dx (c) arctan x dx
0 sin2 (x) + 1 0 0

Exercise 12.52 (a) Suppose that $f$ is an odd and continuous function on the interval
$[-a,a]$. What does this say about the integral $\int_{-a}^{a} f(x)\,dx$?

(b) Compute $\int_{-1}^{1} x\sqrt{1 - x^2}\,dx$.

Exercise 12.53 Prove Proposition 12.50 by modifying the argument in exercise 12.49.

The partial integration formula for definite integrals


As for changes of variables, it is possible to do integration by parts via the indefinite
integral. However, as it sometimes is convenient to have this formula in the language of
the definite integral, we formulate it here.

Proposition 12.54 (The partial integration formula) Suppose that $f, g$ have continuous
derivatives on $[a,b]$. Then
\[ \int_a^b f'(t)g(t)\,dt = [f(t)g(t)]_a^b - \int_a^b f(t)g'(t)\,dt. \]

Proof. As in the proof of the partial integration formula for the indefinite integral, we
begin by considering the product rule for the derivative:
\[ (fg)' = f'g + fg'. \]
Taking the definite integral on both sides of this expression, and applying the evaluation
formula to the left-hand side, we get
\[ f(b)g(b) - f(a)g(a) = \int_a^b f'(t)g(t)\,dt + \int_a^b f(t)g'(t)\,dt, \]
which is equivalent to the desired formula.
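
As with the change of variable formula, the partial integration formula can be sanity-checked numerically. The sketch below assumes the hypothetical choices $f(t) = e^t$ (so that $f' = f$) and $g(t) = t$ on $[0,1]$, and again uses a Riemann sum helper `integral` of our own. Both sides should be close to $1$.

```python
import math

def integral(f, a, b, n=20000):
    """Midpoint Riemann sum approximation of the definite integral of f over [a, b]."""
    dx = (b - a) / n
    total = 0.0
    for k in range(n):
        total += f(a + (k + 0.5) * dx)
    return total * dx

def left_integrand(t):
    return math.exp(t) * t        # f'(t) g(t) with f = exp and g(t) = t

def right_integrand(t):
    return math.exp(t) * 1.0      # f(t) g'(t), since g'(t) = 1

a, b = 0.0, 1.0
lhs = integral(left_integrand, a, b)
boundary = math.exp(b) * b - math.exp(a) * a      # [f(t) g(t)] evaluated from a to b
rhs = boundary - integral(right_integrand, a, b)
print(lhs, rhs)
```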

Exercise 12.55 Compute
\[ \text{(a)}\ \int_0^{\pi} x \sin x\,dx \qquad \text{(b)}\ \int_0^{\pi/4} x \arctan(x)\,dx. \]

Exercise 12.56 Consider
\[ I_n = \int_0^1 \ln(1 + x^n)\,dx, \quad n = 1, 2, 3, \ldots. \]

(a) Compute I1 , I2 , I3 .
(b) Show that the sequence In converges.

Hint: (b) becomes easier when you realise that you do not have to show what the limit
is (however, the limit is possible to determine – so if you want an extra challenge, try
to do also this).

Exercise 12.57 Prove by induction that for $n = 0, 1, 2, 3, \ldots$, we have
\[ \frac{d^n}{dx^n} \frac{\sin x}{x} = \frac{1}{x^{n+1}} \int_0^x t^n \sin\Big( t + (n + 1)\frac{\pi}{2} \Big)\,dt. \]

12.4 Applications to geometry and elementary functions


Some formulas on geometric quantities

In the following exercises you need to use the full flexibility of Riemann sums to find
formulas describing certain geometric properties of various objects.

Exercise 12.58 We begin by considering how to find a formula for the length of a graph.

(a) Use Pythagoras' theorem to express each distance $\Delta s_k$ using $\Delta x_k = x_k - x_{k-1}$ and
$\Delta f_k = f(x_k) - f(x_{k-1})$. (See figure 13.)
(b) Use the Mean Value Theorem on the $\Delta f_k$ to make $\Delta x_k$ appear.
(c) Deduce the following formula for the total length:
\[ s = \lim_{N\to\infty} \sum_{k=1}^N \Delta s_k = \int_a^b \sqrt{1 + f'(x)^2}\,dx. \]
(d) Find the length of $y = x^{3/2}$, $x \in [0,1]$.

Fig. 13. An approximation of the length of the graph is given by the total length of the
straight lines, each with length $\Delta s_k$.
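
The approximation by straight lines is easy to try out on a computer. In the sketch below, the helper `polygon_length` (a hypothetical name) adds up the lengths $\Delta s_k$ for equally spaced points $x_k$. To avoid spoiling part (d), we assume a different example, $f(x) = \cosh x$ on $[0,1]$, where the formula from (c) gives the length $\int_0^1 \cosh x\,dx = \sinh 1$.

```python
import math

def polygon_length(f, a, b, n):
    """Total length of the n straight segments joining consecutive points on the graph."""
    dx = (b - a) / n
    total = 0.0
    for k in range(1, n + 1):
        x_prev = a + (k - 1) * dx
        x_k = a + k * dx
        # Pythagoras: each segment has legs Delta x_k and Delta f_k.
        total += math.hypot(x_k - x_prev, f(x_k) - f(x_prev))
    return total

approx = polygon_length(math.cosh, 0.0, 1.0, 1000)
print(approx, math.sinh(1.0))
```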

Exercise 12.59 In this exercise, we consider volumes with rotational symmetries. We think
of the volume we are trying to compute in terms of a thinly sliced baguette.

(a) Find an expression that approximates the volume of the slice $\Delta V_k$ corresponding to
$\Delta x_k$ (see Figure 14).
(b) Deduce the following formula for the total volume $V$:
\[ V = \lim_{N\to\infty} \sum_{k=1}^N \Delta V_k = \int_a^b \pi f(x)^2\,dx. \]
(c) Compute the volume of a ball with radius $R > 0$.

Fig. 14. Given a function $f(x)$, we consider the volume obtained by rotating the graph
around the $x$-axis. A reasonable approximation of the volume of each slice is obtained by
assuming that the function is constant there. We denote this approximation of the volume
of the slice by $\Delta V_k$.

Exercise 12.60 (Challenge) In this last exercise, we consider how to compute the
surface area of an object with rotational symmetry. As in the previous exercise, we
think of the object as a thinly sliced baguette.

(a) First, consider the case when the baguette is just a cylinder (that is, $f(x)$ is
constant). Explain why the surface area of the slice denoted by $\Delta S_k$ is equal to
$2\pi f(x_k)\,\Delta s_k$, where $\Delta s_k$ is defined as in exercise 12.58.
(b) Next, we consider the more general situation shown in Figure 15, where the slice
is not a cylinder. Use the figure as inspiration, and explain why it follows from
the intermediate value theorem that almost the same formula for $\Delta S_k$ holds.
(c) Deduce the following formula for the total surface:
\[ S = \lim_{N\to\infty} \sum_{k=1}^N \Delta S_k = \int_a^b 2\pi f(x) \sqrt{1 + f'(x)^2}\,dx. \]
(d) Compute the surface area of $y = \sqrt{x}$, $x \in [0,1]$.

Fig. 15. Left: The thinly sliced baguette. To approximate the area of its "crust",
we think of the function as being a straight line on each slice (just as when we
computed the length of the graph). As before, we denote the length of this line by
$\Delta s_k$. This allows us to approximate the "crust-area" of each slice by the surface
area of a sequence of bands. Right: To compute the area of the blue band, think
of it as being made of rubber (that is flexible). If you keep the width $\Delta s_k$, but
stretch/squish the band so that it becomes cylindrical with the same radius on both
sides, then the total area either gets larger (green) or smaller (red). This is a key
observation to getting a nice formula in part (b) of exercise 12.60.

Remark 12.61 Above, we ask you to compute formulas for length, volume and surface
area without actually defining what we mean by these concepts. While not ideal, we
allow ourselves to be slightly sloppy since we only want to briefly showcase how, in
certain situations, these concepts can be computed in terms of one variable definite
integrals.
For the actual definitions, we refer the interested student to any text on several
variable calculus, where these concepts all belong naturally. For instance, the two-
dimensional analogue of the Darboux-Riemann integral defines what we mean by vol-
ume.
12.4. APPLICATIONS TO GEOMETRY AND ELEMENTARY FUNCTIONS 379

Revisiting the logarithm


In Chapter 2, we defined the logarithm in terms of an area under the graph y = 1/t.
Finally, we can formulate this definition without resorting to "geometric arguments":

Definition 12.62 (the logarithm) For $x \in (0,\infty)$, we define
\[ \ln(x) = \int_1^x \frac{dt}{t}. \]

We now ask you to use the theory of the definite integral to deduce some basic
properties of the logarithm.

Exercise 12.63 Using only properties of the definite integral, prove that:

(a) $\ln 1 = 0$.
(b) $(\ln x)' = 1/x$ for $x > 0$.
(c) $\ln(1/x) = -\ln x$ for $x > 0$.
(d) $\ln(xy) = \ln(x) + \ln(y)$ for $x, y > 0$.
(e) $\ln(x^a) = a \ln(x)$ for $x > 0$ and $a \in \mathbb{R}$.
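
A numerical experiment is no substitute for the proofs requested above, but it can make the definition feel concrete. The following minimal sketch approximates the integral of $1/t$ from 1 to $x$ with a midpoint Riemann sum and checks some of the listed properties up to rounding. The function name `ln_integral` and the number of subintervals are our own illustrative choices, not part of the course material.

```python
import math

def ln_integral(x, n=100_000):
    """Midpoint-rule approximation of the integral of 1/t from 1 to x (x > 0)."""
    h = (x - 1.0) / n  # signed step: negative when x < 1
    return sum(h / (1.0 + (i + 0.5) * h) for i in range(n))

x, y = 2.0, 3.5
print(ln_integral(1.0))                                       # property (a)
print(ln_integral(1 / x) + ln_integral(x))                    # property (c): close to 0
print(ln_integral(x * y) - ln_integral(x) - ln_integral(y))   # property (d): close to 0
print(ln_integral(x), math.log(x))                            # compare with the built-in logarithm
```

The last line compares the Riemann sum with `math.log`; the two should agree to many decimal places.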

Remark 12.64 Based on the above exercise, the proofs given for the properties of the
exponential function are now correct.

12.5 Computing and estimating unbounded areas


Until now, we have focused on how to integrate functions assumed to be continuous on
a closed and bounded interval [a,b]. In this section, we are going to see how to loosen up
this restriction in two different ways. This leads us to the notion of improper integrals.

How to integrate functions with either unbounded domains or ranges


First, what if we replace the interval $[a,b]$ by, say, $[a,\infty)$? For instance, how do we compute
\[ \int_0^\infty \frac{1}{1+x^2}\,dx\,? \]

Fig. 16. This is the "infinitely long" area under $1/(1+x^2)$ represented by the above
integral.

Exercise 12.65 In this exercise, we explain the general idea behind computing
"infinitely long" areas.

(a) Fix a number $c > 0$. Compute the integral of $f(x) = 1/(1+x^2)$ over $[0,c]$.
(b) Take the limit as $c \to \infty$ in the answer from (a). What do we get? Use the above
figure to explain why it makes sense to consider such a limit.

We are so happy with the procedure from the above exercise that we make the
following definition.

Definition 12.66 Let $f$ be continuous on $[a,\infty)$. Then we define the improper
integral of $f$ on $[a,\infty)$ to be the limit
\[ \int_a^\infty f(x)\,dx = \lim_{c\to\infty} \int_a^c f(x)\,dx. \]

We say that the integral converges if this limit exists, and that it diverges otherwise.
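
To see the definition in action, here is a small sketch that approximates $\int_0^c dx/(1+x^2)$ for growing $c$ with a midpoint Riemann sum and compares it with $\arctan(c)$, the value from the primitive function. The helper name `integral_0_to_c` and the step count are assumptions made for this illustration only.

```python
import math

def integral_0_to_c(c, n=200_000):
    """Midpoint-rule approximation of the integral of 1/(1+x^2) over [0, c]."""
    h = c / n
    return sum(h / (1.0 + ((i + 0.5) * h) ** 2) for i in range(n))

for c in (1, 10, 100, 1000):
    # Columns: c, Riemann sum, exact value arctan(c)
    print(c, integral_0_to_c(c), math.atan(c))
print("pi/2 =", math.pi / 2)
```

As $c$ grows, the printed values creep up towards $\pi/2$, which is the value the definition assigns to the improper integral.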
12.5. COMPUTING AND ESTIMATING UNBOUNDED AREAS 381

Next, we see what happens if we consider functions that are only continuous on a
non-closed interval of the type, say, $(a,b]$. The point is that this allows a vertical
asymptote at $x = a$. We start by considering (see Figure 17)
\[ \int_0^1 \frac{1}{\sqrt{x}}\,dx. \]

Exercise 12.67 In this exercise, we explain the general idea behind computing
"infinitely tall" areas.

(a) Fix a number $c \in (0,1)$. Compute the integral of $f(x) = 1/\sqrt{x}$ over $[c,1]$.
(b) Take the limit as $c \to 0^+$ in the answer from (a). What do we get? Use Figure 17
to explain why it makes sense to consider such a limit.

Fig. 17. This is the "infinitely tall" area under $f(x) = 1/\sqrt{x}$.

Again, we are so satisfied with this procedure that we make the following definition.

Definition 12.68 Suppose that $f$ is continuous on $(a,b]$. Then we define the improper
integral of $f$ over $(a,b]$ to be the limit
\[ \int_a^b f(x)\,dx = \lim_{c\to a^+} \int_c^b f(x)\,dx. \]

We say that the integral converges if this limit exists, and that it diverges otherwise.

We now consider some examples and exercises.

Example 12.69 The improper integral
\[ \int_0^\infty e^{-x}\,dx \]
is convergent and is equal to 1. Indeed, this follows from the computation
\[ \int_0^\infty e^{-x}\,dx = \lim_{c\to\infty} \int_0^c e^{-x}\,dx = \lim_{c\to\infty} \left[-e^{-x}\right]_0^c = \lim_{c\to\infty} (1 - e^{-c}) = 1. \]

Exercise 12.70 Do the following improper integrals converge or diverge?
\[ \text{(a)}\ \int_1^\infty \frac{dx}{\sqrt{x}(1+\sqrt{x})} \qquad \text{(b)}\ \int_1^\infty \frac{dx}{\sqrt{x}(1+x)} \qquad \text{(c)}\ \int_0^\infty \sin x\,dx. \]

Remark: When $\infty$ is replaced by $c$, these integrals can all be computed by first finding
primitive functions.

Exercise 12.71 Determine exactly for which $\alpha \in \mathbb{R}$ the following integral diverges and
converges, respectively:
\[ \int_1^\infty \frac{dx}{x^\alpha} \]

Exercise 12.72 Do the following improper integrals converge or diverge?


Z 1 Z 1
dx dx
(a) p p (b) p
0 x(1 + x) 0 x(1 + x)

Exercise 12.73 Determine exactly for which $\alpha \in \mathbb{R}$ the following integral diverges and
converges, respectively:
\[ \int_0^1 \frac{dx}{x^\alpha} \]

Remark 12.74 (i) Note that while there also exist other types of improper integrals,
they are mostly just minor variations of the above ideas. For instance, if $f$ is continuous
on $[a,b)$ with a vertical asymptote at $x = b$, then to obtain its integral over $[a,b)$, we
should first integrate over $[a,c]$ for some $a < c < b$, and then take the limit $c \to b^-$.
(ii) However, one variation that requires some care is if an integral is improper for
two (or more) reasons. For instance, this is the case with
\[ \int_0^\infty \frac{dx}{x}. \tag{12.7} \]

In such cases, we must split up the integral so that each "issue" can be considered separately.
That is, here, we should first make the split
\[ \int_0^\infty \frac{dx}{x} = \int_0^1 \frac{dx}{x} + \int_1^\infty \frac{dx}{x}, \]

and then consider the two integrals on the right-hand side (which are both improper for
one single reason) separately. If both converge, we say that (12.7) converges. If at least
one of them diverges, then we say that (12.7) diverges.

Exercise 12.75 Is the following improper integral convergent? Explain.


Z 1
dx
.
1 x

The connection between improper integrals and infinite series


Improper integrals and infinite series are intimately connected. To illustrate this
connection, consider the following way to visualise
\[ \sum_{k=1}^\infty \frac{1}{1+k^2} \tag{12.8} \]
as an area:

Fig. 18. Here, we have plotted the graph of $f(x) = 1/(1+x^2)$ and indicated the
values $1/(1+k^2)$ for $k = 1, 2, 3, \ldots$ with vertical red line segments topped with a
dot. Next to each vertical line segment, we have placed a rectangle with base length
1. This means that the area of each rectangle is equal to its height, and that the
combined area of all the rectangles is equal to the value of the series $\sum_{k=1}^\infty 1/(1+k^2)$.

The point of the above figure is that the rectangles completely lie under the graph of the
function. In particular, merely using that a function $f$ is continuous and decreasing, the
following computation is justified:
\begin{align*}
\int_0^N f(x)\,dx &= \int_0^1 f(x)\,dx + \int_1^2 f(x)\,dx + \cdots + \int_{N-1}^N f(x)\,dx \\
&\geq \int_0^1 f(1)\,dx + \int_1^2 f(2)\,dx + \cdots + \int_{N-1}^N f(N)\,dx \\
&= f(1) + f(2) + \cdots + f(N) = \sum_{k=1}^N f(k).
\end{align*}

In the case that $f(x) = 1/(1+x^2)$, this implies that
\[ \sum_{k=1}^N \frac{1}{1+k^2} \leq \int_0^N \frac{dx}{1+x^2} \leq \int_0^\infty \frac{dx}{1+x^2} = \frac{\pi}{2}. \]

Hence, by the Balloon lemma (or rather, the dichotomy for positive series), it follows
that the series (12.8) converges.
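
The chain of inequalities above can be checked numerically. The sketch below computes partial sums of the series (12.8) and compares them with $\int_0^N dx/(1+x^2) = \arctan(N)$ and with $\pi/2$. The function name `partial_sum` is our own choice for this illustration.

```python
import math

def partial_sum(N):
    """Partial sum of 1/(1+k^2) for k = 1, ..., N."""
    return sum(1.0 / (1.0 + k * k) for k in range(1, N + 1))

for N in (1, 10, 100, 1000):
    # Columns: N, partial sum, integral over [0, N] (= arctan N), upper bound pi/2
    print(N, partial_sum(N), math.atan(N), math.pi / 2)
```

In every row the partial sum stays below the integral, which in turn stays below $\pi/2$, so the increasing sequence of partial sums is bounded.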

Exercise 12.76 You can also justify the connection between the partial sums and the
integral using what we know about Riemann sums. Do this.

Note that above, we were able to bound an infinite series by an integral. The opposite
is also possible. Indeed, consider the following figure:

Fig. 19. Here, we have placed the rectangles to the right of their respective heights
instead of to the left. Their combined area has not changed, but now the rectangles
completely cover a part of the area under the graph.

Exercise 12.77 Suppose that $f$ is a continuous and decreasing function. Use the above
figure as inspiration to prove that
\[ \sum_{k=1}^N f(k) \geq \int_1^{N+1} f(x)\,dx. \]

Exercise 12.78 (a) Use the above exercise to find a new proof of the fact that the
harmonic series diverges (that does not use Oresme’s trick).
(b) Use the above techniques to prove that $\sum_{k=1}^\infty 1/k^\alpha$ converges if and only if $\alpha > 1$.

By the above, the following proposition should now seem reasonable.

Proposition 12.79 (Integral test for infinite series) Suppose that $f$ is positive,
continuous and decreasing on $[a,\infty)$, and let $A$ be any integer such that $A > a$. Then

(i) $\int_a^\infty f(x)\,dx$ converges $\iff$ $\sum_{k=A}^\infty f(k)$ converges,
(ii) $\int_a^\infty f(x)\,dx$ diverges $\iff$ $\sum_{k=A}^\infty f(k)$ diverges.

Proof. By what we did above, the implication $\Longrightarrow$ of (i) and $\Longleftarrow$ of (ii)
hold. To obtain the remaining implications, we need a Balloon lemma for monotone
functions, in order to get a dichotomy result for improper integrals (cf. Proposition 5.24).
Since these results are established by repeating, essentially word-by-word, the proofs in
the case of infinite sequences and series, respectively, we leave them to the interested
reader.

Example 12.80 Let us investigate whether or not the series
\[ \sum_{k=1}^\infty k e^{-k^2/2} \]
converges. To apply the integral test, we first need to check whether or not the terms
are decreasing. To this end, we compute the derivative of the function $f(x) = x e^{-x^2/2}$:
\[ f'(x) = e^{-x^2/2} - x^2 e^{-x^2/2} = (1 - x^2) e^{-x^2/2}. \]

We see that the derivative of $f$ is negative whenever $1 - x^2 < 0$. Since this holds
when $x > 1$, it follows that the function is decreasing on $[1,\infty)$. By the integral test
(Proposition 12.79), this means that the series converges if and only if the same is true
for
\[ \int_1^\infty x e^{-x^2/2}\,dx. \]
To determine whether or not this improper integral converges, we do the following:
\begin{align*}
\int_1^\infty x e^{-x^2/2}\,dx &= \lim_{c\to\infty} \int_1^c x e^{-x^2/2}\,dx \\
&\overset{u = x^2/2}{=} \lim_{c\to\infty} \int_{1/2}^{c^2/2} e^{-u}\,du \\
&= \lim_{c\to\infty} \left[-e^{-u}\right]_{1/2}^{c^2/2} = \lim_{c\to\infty} \left(e^{-1/2} - e^{-c^2/2}\right) = e^{-1/2}.
\end{align*}

By the integral test, we conclude that the series is convergent.
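
As a sanity check of this example, the following sketch evaluates $\int_1^c x e^{-x^2/2}\,dx$ from the primitive $-e^{-x^2/2}$ for a few values of $c$, and also sums the first terms of the series. The names `f` and `integral_1_to_c`, and the cut-off at 50 terms, are choices made only for this illustration.

```python
import math

def f(x):
    """The function x * exp(-x^2/2) whose values at integers are the series terms."""
    return x * math.exp(-x * x / 2)

def integral_1_to_c(c):
    """Integral of f over [1, c], computed from the primitive -exp(-x^2/2)."""
    return math.exp(-0.5) - math.exp(-c * c / 2)

for c in (2, 5, 10):
    print(c, integral_1_to_c(c))
print("limit e^(-1/2) =", math.exp(-0.5))

# Partial sum of the series; the terms decay very quickly.
print(sum(f(k) for k in range(1, 51)))
```

The printed integrals approach $e^{-1/2}$, and the partial sum settles down after only a handful of terms, in line with the conclusion of the integral test.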

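Exercise 12.81 Use the integral test to determine whether the following infinite series
converge or not.
\[ \text{(a)}\ \sum_{k=1}^\infty \frac{1}{k} \qquad \text{(b)}\ \sum_{k=1}^\infty \frac{1}{k^2} \qquad \text{(c)}\ \sum_{k=2}^\infty \frac{1}{k \ln k} \qquad \text{(d)}\ \sum_{k=2}^\infty \frac{1}{k(\ln k)^2}. \]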

Remark: The point is to study these series using the tools we have for integrals.

Exercise 12.82 Use figures similar to Figures 18 and 19 to show that for $N = 1, 2, 3, \ldots$
we have
\[ \ln N \leq \sum_{k=1}^N \frac{1}{k} \leq 1 + \ln N. \]

Hint: If you run into problems at $x = 0$ in the integral expression, then study the sum
$\sum_{k=2}^N 1/k$ instead.
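
Before attempting the proof, it can be reassuring to check the two bounds numerically for a few values of $N$. This is only a sketch; the helper name `harmonic` is our own.

```python
import math

def harmonic(N):
    """Partial sum 1 + 1/2 + ... + 1/N of the harmonic series."""
    return sum(1.0 / k for k in range(1, N + 1))

for N in (1, 10, 100, 10_000):
    # Columns: N, lower bound ln N, partial sum, upper bound 1 + ln N
    print(N, math.log(N), harmonic(N), 1 + math.log(N))
```

The partial sums grow without bound, but only as slowly as the logarithm.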

Convergence tests for improper integrals of positive functions


Here are two comparison tests for improper integrals. The first should be compared to
Proposition 5.26, while the second should be compared to Proposition 4.39.

Proposition 12.83 (Standard comparison test for improper integrals) Suppose
that $f, g$ are positive functions continuous on $[a,\infty)$ such that $0 \leq f(x) \leq g(x)$ for all
$x \geq a$. Then the following holds:
\[ \int_a^\infty g(x)\,dx < \infty \implies \int_a^\infty f(x)\,dx < \infty. \]

This result remains true if we replace the sense in which the integrals are improper.

Exercise 12.84 Adapt the proof for the comparison test for infinite series to prove the
above proposition.

Fig. 20. Here, a = 1. It should be intuitively clear that if the larger area is finite,
then so is the smaller area. And correspondingly, if the smaller is infinite, then so
is the larger one.

Proposition 12.85 (Limit comparison test for improper integrals) Suppose $f, g$
are two positive continuous functions on $[a,\infty)$. If there exists $L > 0$ so that
\[ \lim_{x\to\infty} \frac{f(x)}{g(x)} = L, \]
then
\[ \int_a^\infty f(x)\,dx < \infty \iff \int_a^\infty g(x)\,dx < \infty. \]
If $L = 0$ or $L = \infty$, then in each case half of the result holds (can you see which?).
This result remains true if we replace the sense in which the integrals are improper.

Exercise 12.86 Adapt the proof for the limit comparison test for infinite series to
prove the above proposition.

As with series, in order to use these comparison tests, we need something to compare
with. And, just like with series, we call the most useful class of integrals "$\alpha$-integrals".

Proposition 12.87 (Convergence of $\alpha$-integrals)
\[ \int_1^\infty \frac{dx}{x^\alpha} \text{ converges} \iff \alpha > 1 \]
\[ \int_0^1 \frac{dx}{x^\alpha} \text{ converges} \iff \alpha < 1 \]

Exercise 12.88 Prove this proposition (without using the integral test).

Example 12.89 We are to determine whether or not the improper integral
\[ \int_0^\infty \frac{x+2}{\sqrt{x^3+x^5}}\,dx \]
converges. Here, we first note that this integral is improper at both $x = 0$ and
$x = \infty$. This means that we have to split it into two parts (see Remark 12.74), say,
\[ \int_0^\infty \frac{x+2}{\sqrt{x^3+x^5}}\,dx = \underbrace{\int_0^1 \frac{x+2}{\sqrt{x^3+x^5}}\,dx}_{\text{(I)}} + \underbrace{\int_1^\infty \frac{x+2}{\sqrt{x^3+x^5}}\,dx}_{\text{(II)}}. \]

If both (I) and (II) converge, then the integral is convergent. If one, or both, of (I) and
(II) diverge, then the integral is divergent.
We first check (I). In much the same way as we used comparison tests for series, we
start out by using our intuition of what happens when $x \approx 0$:
\[ \int_0^1 \frac{x+2}{\sqrt{x^3+x^5}}\,dx \approx \int_0^1 \frac{2}{\sqrt{x^3+0}}\,dx = \int_0^1 \frac{2}{x^{3/2}}\,dx. \]
By Proposition 12.87, this indicates that we ought to expect divergence since $\alpha = 3/2 > 1$.
To verify this, we use the limit comparison test (Proposition 12.85):
\[ \lim_{x\to0^+} \frac{\dfrac{x+2}{\sqrt{x^3+x^5}}}{\dfrac{1}{x^{3/2}}} = \lim_{x\to0^+} \frac{x^{3/2}(x+2)}{\sqrt{x^3+x^5}} = \lim_{x\to0^+} \frac{x^{3/2}(x+2)}{x^{3/2}\sqrt{1+x^2}} = \frac{0+2}{\sqrt{1+0}} = 2. \]
By the limit comparison test, this means that the integral (I) is divergent. But this
means that the total integral (I) + (II) is divergent, and we are done.
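
The limit used in the comparison can also be observed numerically. The sketch below evaluates the quotient of the integrand and the comparison function $1/x^{3/2}$ for small positive $x$. The names `f` and `g` are our own labels for these two functions.

```python
import math

def f(x):
    """The integrand (x + 2) / sqrt(x^3 + x^5)."""
    return (x + 2) / math.sqrt(x**3 + x**5)

def g(x):
    """The comparison function 1 / x^(3/2)."""
    return x ** -1.5

for x in (1e-1, 1e-2, 1e-4, 1e-6):
    print(x, f(x) / g(x))
```

The quotient approaches 2 as $x \to 0^+$, in agreement with the computation above.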

Exercise 12.90 Determine if (II) from the above example is convergent or not.

Exercise 12.91 Determine the convergence of the following improper integrals.


Z 1 Z 1
dx dx
(a) p (b) p
4
x +1+x 2 sin(x)
1 0

Exercise 12.92 For which $\alpha \in \mathbb{R}$ does the following integral converge or diverge:
\[ \int_0^\infty \frac{dx}{\sqrt{x}(1 + x^\alpha)}\,? \]

Exercise 12.93 Motivate why the following integrals converge, and show that for $a > 0$,
we have
\[ \int_0^\infty x^n e^{-ax}\,dx = \frac{n!}{a^{n+1}}, \qquad n = 1, 2, 3, \ldots. \]

Some remarks on improper integrals of non-positive functions


As with infinite series, we can study the convergence of improper integrals of functions
that change sign via the concept of absolute convergence. Since this works essentially
exactly the same way as for infinite series, we leave the details to the following exercises.

Exercise 12.94 Define what it means for an improper integral to (a) converge abso-
lutely, and (b) converge conditionally.
Remark: As we have done above, to keep things reasonable, it is enough to consider the
case of functions that are either unbounded or are defined on an unbounded domain.
Exercise 12.95 Prove that if an improper integral converges absolutely, then it also
converges in the usual sense.
Hint: The proof is essentially, word-by-word, the same as for infinite series.
Exercise 12.96 Determine if the following improper integral converges absolutely,
conditionally or if it diverges.
\[ \int_1^\infty \frac{\sin x}{x^2}\,dx \]

Hint: It is hopeless to try to find a primitive of $\sin x/x^2$.

Exercise 12.97 (Challenge) Show that the following improper integral converges
conditionally.
\[ \int_1^\infty \sin(e^x)\,dx \]
What does this say about the existence of a divergence test for improper integrals?
Hint: It is hopeless to try to find a primitive of $\sin(e^x)$.
12.6. RELEVANT EXERCISES FROM PREVIOUS EXAMS 389

12.6 Relevant exercises from previous exams


Keep in mind that essentially all these exercises have full solutions on the course webpage.

Exercise 12.98 (Exam 2016-01-23, part of exercise) Compute the integral
\[ \int_0^3 |x^2 - 5x + 4|\,dx. \]

Exercise 12.99 (Exam 2016-01-23, part of exercise) Determine whether the following
improper integral converges or diverges. If it converges, compute its value.
\[ \int_2^\infty \frac{dx}{x(\ln x)^2} \]

Exercise 12.100 (Exam 2015-01-08) (a) By comparing the following sum with an
integral, show that
\[ \sum_{n=1}^N \arctan(n) \geq N \arctan(N) - \frac{1}{2}\ln(1 + N^2). \]

(b) Determine numbers $\alpha, \beta \in \mathbb{R}$ so that
\[ \lim_{N\to\infty} N^\alpha \sum_{n=1}^N \arctan(n) = \beta \]
with $\beta \neq 0$.

Exercise 12.101 (Exam 2014-12-18) (a) Formulate the Fundamental Theorem of
Calculus.
(b) Determine, without actually computing its integral, the values of $x$ for which the
function
\[ f(x) = \int_0^x \frac{1 - u^2}{1 + u^2}\,du \]
has its local extreme points. Moreover, determine whether these are global, and
determine where the function is convex and concave. You should illustrate your answer
with a simple sketch of the graph of $f(x)$.

Exercise 12.102 (Exam 2014-12-18, part of exercise) Explain why the length of
the graph of $y = f(x)$ for $x \in [a,b]$ is given by the formula
\[ \int_a^b \sqrt{1 + f'(x)^2}\,dx. \]
a

Exercise 12.103 (Exam 2014-10-04, part of exercise) (a) Suppose that $f$ is continuous
on $(0,1]$. Define what we mean when we say that the improper integral $\int_0^1 f(x)\,dx$
is convergent or divergent, respectively.
(b) Determine whether or not the following integral is convergent.
\[ \int_0^1 \frac{dx}{\sqrt{\sin x}}. \]

Exercise 12.104 (Exam 2014-08-18) In this exercise we consider the integral
\[ \int_0^a \sqrt{1 - x^2}\,dx, \qquad a \in [0,1]. \]

(a) Draw a figure which illustrates the area that this integral represents. Also, without
computing the integral, determine its value for $a = 1$.
(b) Show that for $a \in [0,1]$ it holds that $\sin(2 \arcsin a) = 2a\sqrt{1 - a^2}$. (Hint: The
addition formula for the sine.)
(c) Compute the integral by using the change of variables $x = \sin u$. Double-check
your answer by using that for $a = 1/2$ the value is supposed to be $\sqrt{3}/8 + \pi/12$.

Exercise 12.105 (Exam 2014-05-26, part of exercise) (a) Explain the difference
between definite and indefinite integrals. (b) Solve
\[ \int_1^3 \frac{dx}{2 + |x^2 - 2|}. \]

Exercise 12.106 (Exam 2014-05-26, part of exercise) Determine a curve for which
the following integral denotes the length:
\[ \int_0^{\pi/6} \sqrt{1 + \sin^2(x)}\,dx. \]

Exercise 12.107 (Exam 2014-01-09, part of exercise) Solve the integral
\[ \int_{-2}^{2} \frac{dx}{x^2 + 4|x| + 4}. \]

Exercise 12.108 (Exam 2014-01-09, part of exercise) Determine whether or not
the following improper integral converges.
\[ \int_1^\infty \frac{\sqrt{1 + x^3}}{1 + 3x + x^2}\,dx \]

Exercise 12.109 (Exam 2013-12-18, part of exercise) Determine whether or not
the following improper integral converges.
\[ \int_1^\infty \frac{dx}{e^x - e^{-x}}. \]

Exercise 12.110 (Exam 2013-08-21, part of exercise) Compute the integral
\[ \int_0^1 \frac{x+1}{x^2 + 4x + 3}\,dx. \]

Exercise 12.111 (Exam 2013-05-29) Compute the integrals
\[ \text{(a)}\ \int_0^4 |x^2 - 4x + 3|\,dx \qquad \text{(b)}\ \int_0^{\ln 4} \frac{e^x}{e^{2x} + 4e^x + 5}\,dx. \]

Exercise 12.112 (Exam 2013-01-09, part of exercise) Compute
\[ \int_0^3 \ln(1 + x)\,dx. \]

Exercise 12.113 (Exam 2013-01-09, part of exercise) Investigate the convergence
of
\[ \int_1^\infty \frac{t - 1}{1 + t^2}\,dt. \]

Exercise 12.114 (Exam 2013-01-09) Determine all local and global extreme points
of the function
\[ f(x) = \int_0^x \frac{1 - t}{1 + t^2}\,dt \]
on $\mathbb{R}$. Illustrate your answer in a sketch.

Exercise 12.115 (Exam 2012-12-19, part of exercise) Investigate the convergence
of
\[ \int_0^1 \frac{dx}{\sqrt{\sin x}}. \]

Exercise 12.116 (Exam 2012-05-28) Compute the integrals
\[ \text{(a)}\ \int_{3^2}^{2^6} \frac{\sqrt{1 + \sqrt{x}}}{\sqrt{x}}\,dx \qquad \text{(b)}\ \int_1^{e^{\pi/2}} \sin(\ln x)\,dx \]

Exercise 12.117 (Exam 2012-05-28) (a) Let $a > 0$. For which $s > 0$ does the
improper integral
\[ \int_a^\infty \frac{dx}{x^s} \]
converge? Compute the exact value for the cases when it does converge.
(b) Show (for instance by making a suitable illustration) that
\[ \int_2^{N+1} \frac{dx}{x^s} \leq \sum_{n=2}^N \frac{1}{n^s} \leq \int_1^N \frac{dx}{x^s}. \]

(c) Determine
\[ \lim_{s\to1^+} (s - 1) \sum_{n=1}^\infty \frac{1}{n^s}. \]
n=1

Exercise 12.118 (From old exam) Show that for $N = 1, 2, 3, \ldots$ we have
\[ \sum_{k=1}^N \frac{1}{\sqrt{k}(1 + \sqrt{k})} \geq 2 \ln \frac{1 + \sqrt{N+1}}{2}. \]

Exercise 12.119 (From old exam) Show that for $N = 1, 2, 3, \ldots$ we have
\[ \sum_{k=1}^N \frac{1}{\sqrt{k^2 + 1} + k} > \frac{1}{2} \ln \frac{2N + 1}{3}. \]
p
Remark: This one is slightly tricky. It is much easier to get the lower bound ( 2 +
1) ln(N + 1).

Exercise 12.120 (Lund, May 2016) Choose one of the following two exercises. (Both
give full credit.)

(a) It is a famous result in mathematics that
\[ \sum_{k=1}^\infty \frac{1}{k^2} = \frac{\pi^2}{6}. \]

Since the proof of this equality requires techniques that are not included in this
course, you should not attempt to prove it. Instead, determine how large we need
to choose $N$ so that the difference between the partial sum
\[ \sum_{k=1}^N \frac{1}{k^2} \]

and the series (that is, the error) is less than 1/1000.
(b) The following Python code computes the value of a partial sum of some series.

    N = 100
    S = 0
    for n in range(0,N):
        S = S + n/(n**4 + 1)**(1/2)
    print(S)

(i) In mathematical notation, write down the partial sum and series.
(ii) In case the series is convergent, explain how many terms are needed for the
partial sum to approximate the value of the series with an error of at most 1/1000.
In case the series is divergent, explain how many terms are needed for the partial
sum to be larger than 1000.

12.7 Answers to selected exercises


12.9 The Dirichlet function is not Riemann integrable.

12.11 (a) No, (b) yes.

12.37 To prove (iv), choose a sequence of partitions $P_N$ with mesh size going to 0. This
yields a chain of equalities:
\[ \int_a^b \big(f(x) + g(x)\big)\,dx = \lim_{N\to\infty} \sum_{k=1}^N \big(f(x_k) + g(x_k)\big)\,\Delta x_k = \cdots = \int_a^b f(x)\,dx + \int_a^b g(x)\,dx. \]

(d) No.

12.39 (a) The main point in the induction step is to notice that
\[ \sum_{k=1}^N a_k = \sum_{k=1}^{N-1} a_k + a_N. \]

(b) This argument is now very similar to those from exercise 12.37. (c)

12.45 (a) 2 sin(2) 4 cos(2) (see Example 10.30), (b) (1/2) ln(9/5) (see also Example
10.33), (c) ⇡/4 (use substitution u = ln(x)).

12.46 (a) 4, (b) 259/5.

12.47 The largest value is $\pi/4 - (\log 2)/2$ (to see why this is, use the Fundamental theorem
of calculus to make a table of signs for $g'(x)$).

12.48
\[ \frac{d}{dx} \int_{x^2}^{x^4} \arctan(t^2)\,dt = 4x^3 \arctan(x^8) - 2x \arctan(x^4). \]

12.49 Additional hint: After applying the Mean Value Theorem, the expression should
be exactly on the form required by Definition 12.12.2.

12.51 (a) $\log(7/4)$ (hint: use the double angle formula), (b) $\pi^2/4 - 2$ (hint: partial
integration), (c) $\pi/2 - 1$ (hint: substitute $u = \sqrt{x}$).
12.7. ANSWERS TO SELECTED EXERCISES 395

12.52 (a) 0 (compute this by expressing the integral as a limit of Riemann sums, and
then use $f(-x) = -f(x)$), (b) 0.
12.58 (a) $s_k = \sqrt{(\Delta x_k)^2 + (\Delta f_k)^2}$, (b) $\Delta f_k = \Delta x_k \cdot f'(x_k)$, (c) the point is to take the
limit of the Riemann sums $\sum s_k = \sum \sqrt{1 + f'(c_k)^2}\,\Delta x_k$, (d) $(1/27)(13\sqrt{13} - 8) \approx 1.439\ldots$

12.59 (a) $V_k = \pi f(x_k)^2 \cdot \Delta x_k$ (this is the formula for the volume of a cylinder with
"radius" $f(x_k)$ and thickness $\Delta x_k$), (b) the point is to take the limit of the Riemann
sums $\sum V_k = \sum \pi f(x_k)^2\,\Delta x_k$, (c) $(4/3)\pi R^3$.

12.60 (a) this follows by the standard formula for the surface area for the side of a
cylinder (google it), (b) write up the expression of the surface areas for the sides
of the red and green cylinders, respectively. By the intermediate value theorem, the
surface area of the side of the blue object has to be $S_k = 2\pi f(c_k)\, s_k$ for some
$c_k \in [x_{k-1}, x_k]$, (c) the point is to take the limit of the Riemann sum
$\sum S_k = \sum 2\pi f(c_k)\, s_k = \sum 2\pi f(c_k) \sqrt{1 + f'(d_k)^2}\,\Delta x_k$ where $c_k, d_k \in [x_{k-1}, x_k]$. (Note that
here we have a problem since we are dealing with both $c_k$'s and $d_k$'s; you can
choose to either ignore this, or you can take the extra challenge and try to figure
out how to deal with this! Hint: it can be dealt with in a pretty naive way.) (d)
$(\pi/6)(5^{3/2} - 1)$.

12.65 (a) $\arctan(c)$, (b) $\pi/2$.

12.70 (a) Diverges to $\infty$, (b) converges to $\pi/2$, (c) diverges.

12.71 By computing the improper integral, we see that it converges for all $\alpha > 1$ and
diverges for all $\alpha \leq 1$ (you need to consider the cases $\alpha \neq 1$ and $\alpha = 1$ separately).

12.67 (a) $2(1 - \sqrt{c})$, (b) 2.

12.72 (a) converge, (b) diverge.

12.73 Converges for all $\alpha < 1$ and diverges for all $\alpha \geq 1$ (be sure to compare this to the
result of exercise 12.71).

12.75 By the remark prior to the exercise, it diverges.

12.81 (a) Diverges, (b) converges, (c) diverges, (d) converges.


12.82 For the lower bound, use Figure 19, for the upper bound, use Figure 18 on $\sum_{k=2}^N 1/k$
(notice that the lower summation limit is $k = 2$ and not $k = 1$).

12.90 (II) roughly acts like $\int_1^\infty x^{-3/2}\,dx$ and therefore is convergent (use, say, limit
comparison test to confirm this).

12.91 (a) compare to $\int_1^\infty dx/x^2$ (convergent), (b) compare to $\int_0^1 dx/\sqrt{x}$ (convergent).

12.92 Converges for all $\alpha > 1/2$ (be sure to take into account that this integral is improper
for two reasons, and therefore needs to be split up).

12.96 It converges absolutely.

12.97 Make the change of variables $u = e^x$.


Chapter 13

Taylor polynomials

Introduction
We have now come to the final chapter of these lecture notes, where we combine most
of what we have learned so far in the study of Taylor polynomials.

Remark 13.1 (Selected problems from a single previous exam)


1. (a) Define what we mean by $f(x) = g(x) + O(h(x))$ as $x \to 0$.
(b) Find the Taylor approximation of order 3 centered at $x = 0$, with Big-oh error
term, for
\[ \ln(1 - \sin x). \]
(c) Use (b) to determine the limit
\[ \lim_{k\to\infty} \left(1 - \sin\frac{1}{k}\right)^k. \]
2. (a) Show that for $x > 0$ (this is not a misprint) we have
\[ \sqrt{1 + x^4} = x^2 + \frac{1}{2x^2} - \frac{1}{8x^6} + E(x) \]
with
\[ |E(x)| < \frac{1}{16x^{10}}. \]
(b) Use (a) to approximate the improper integral
Z 1 ⇣p ⌘
1 + x4 x2 dx
1
with an error less than 10 9.

3. Determine whether the following infinite series converge or diverge.


1 ⇣p ⌘ 1 ⇣
1⌘
X1 X X
1/k2 2 1
(a) e (b) 4
k +1 k (c) sin arctan .
k k
k=1 k=1 k=1

398 CHAPTER 13. TAYLOR POLYNOMIALS

13.1 A first look at Taylor polynomials


The basic idea of the Taylor polynomials is that they are supposed to be higher order
"tangent curves". As an initial example, let us consider the function $f(x) = \sin x$, whose
Taylor polynomials of orders 1, 3 and 5, centered at $x = 0$, turn out to be
\begin{align*}
T_1(x) &= x, \\
T_3(x) &= x - \frac{x^3}{3!}, \\
T_5(x) &= x - \frac{x^3}{3!} + \frac{x^5}{5!}.
\end{align*}

Fig. 1. Comparison of sin x with T1 , T3 and T5 , respectively.

Note that $T_1$ is identical to the tangent line of $\sin(x)$ at $x = 0$, while $T_3$ and $T_5$ are
what we ought to call the "tangent cubic" and "tangent quintic", respectively, of $\sin x$
at $x = 0$. For fun, below, we illustrate how $T_{37}$ approximates $\sin x$.

Fig. 2. Comparison of $y = \sin x$ and $y = T_{37}(x) = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \cdots + \frac{x^{37}}{37!}$.

Remark 13.2 (Python code for Taylor polynomials of the sine function)

    import math as m

    def T(k, x):  # Returns value of Taylor polynomial of order n=2k+1.
        # Terms (-1)^j x^(2j+1)/(2j+1)! for j = 0, 1, ..., k.
        C = [(-1)**j * x**(2*j+1) / m.factorial(2*j+1) for j in range(0, k+1)]
        return sum(C)

Exercise 13.3 How large does n have to be in the Taylor polynomials for sin x cen-
tered at x = 0 for Tn (1) to match 4 digits of sin(1)? (Use the code in Remark 13.2.)
13.1. A FIRST LOOK AT TAYLOR POLYNOMIALS 399

Taylor polynomials are named in honor of the English mathematician Brook Taylor, who also
invented (or discovered?) integration by parts. Interestingly, he was also on a committee tasked
with settling the biggest scientific dispute of all time: who discovered Calculus first?
Taylor’s personal life was rather tragic. From a relatively wealthy family, he married out of love
at 36, which made his father cut all contact with him. Two years later, both his wife and first-born
child died in childbirth. He married again a few years later, this time with his father’s blessing, to
a wealthy woman. Fortune did not smile on Taylor this time either, as his second wife also died
in childbirth (but this time the child survived). A broken man, Taylor died at age 46.

Fig. 3. Brook Taylor (1685 – 1731). Successful in math, not so much in life.

Basic example 1: Using Taylor polynomials to compute limits


Here, we indicate a first example of how Taylor polynomials can be of use to us.

Example 13.4 We now indicate how to use Taylor polynomials for $\sin x$ to compute
the standard limit
\[ \lim_{x\to0} \frac{\sin x}{x}. \]

By what we see in Figure 1, we ought to have
\[ \sin x \approx T_1(x) = x \]
close to $x = 0$. Therefore, it ought to follow that
\[ \lim_{x\to0} \frac{\sin x}{x} \approx \lim_{x\to0} \frac{T_1(x)}{x} = \lim_{x\to0} \frac{x}{x} = 1. \]

What could be simpler than this? The problem with this computation, of course, is
that $\sin x$ is not actually equal to $x$. Expressed more correctly, the above computation
is actually
\[ \lim_{x\to0} \frac{\sin x}{x} = \lim_{x\to0} \frac{T_1(x) + \text{error}}{x} = \lim_{x\to0} \frac{T_1(x)}{x} + \lim_{x\to0} \frac{\text{error}}{x} = 1 + \lim_{x\to0} \frac{\text{error}}{x}. \]

For this reason, to make the computation work, we need to prove that the error term
goes to 0 faster than $x$ as $x \to 0$. Understanding the error we make when we replace
functions by their Taylor polynomials is one of the main goals of this chapter.
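
To get a feeling for how fast the error term shrinks, one can simply tabulate it. The sketch below prints $\sin(x)/x$ together with the quotient error$/x$, where error $= \sin x - T_1(x)$. The function name `error_over_x` is our own; the experiment only suggests the behaviour and is not a proof.

```python
import math

def error_over_x(x):
    """Return (sin x - T1(x)) / x, where T1(x) = x."""
    return (math.sin(x) - x) / x

for x in (1.0, 0.1, 0.01, 0.001):
    # Columns: x, sin(x)/x, error/x
    print(x, math.sin(x) / x, error_over_x(x))
```

Each time $x$ shrinks by a factor of 10, the quotient error$/x$ shrinks by roughly a factor of 100, which suggests that the error behaves like a multiple of $x^3$ near $x = 0$.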

Exercise 13.5 What happens if you replace T1 by T3 or T5 in Example 13.4?



Basic example 2: Using Taylor polynomials to approximate integrals

Example 13.6 We want to compute the integral
\[ \frac{1}{\sqrt{2\pi}} \int_0^1 e^{-t^2/2}\,dt. \]
This integral is important in statistics since it gives the probability of the event that a
so-called Gaussian random variable takes a value in the interval $[0,1]$. It is difficult to
compute since there is no "nice" formula for the primitive of $e^{-t^2/2}$. Let us approximate
it using Taylor polynomials for $e^x$:
\begin{align*}
T_1(x) &= 1 + x \\
T_2(x) &= 1 + x + \frac{x^2}{2!} \\
T_3(x) &= 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!}
\end{align*}
and so forth. For instance, by plugging $x = -t^2/2$ into $T_1(x)$, we ought to have
\[ e^{-t^2/2} \approx T_1(-t^2/2) = 1 - \frac{t^2}{2} \]
close to $t = 0$.

Fig. 4. The function $e^x$ together with its Taylor polynomials $T_1$, $T_2$ and $T_3$.

Integrating this, we ought to get
\[ \frac{1}{\sqrt{2\pi}} \int_0^1 e^{-t^2/2}\,dt \approx \frac{1}{\sqrt{2\pi}} \int_0^1 \left(1 - \frac{t^2}{2}\right) dt = \frac{1}{\sqrt{2\pi}} \left(1 - \frac{1}{6}\right) = \frac{5}{6\sqrt{2\pi}} = 0.3324\ldots \]
As in the previous example, to make this computation accurate, we must take the error
made when approximating by Taylor polynomials into account. This gives the expression
\[ \frac{1}{\sqrt{2\pi}} \int_0^1 e^{-t^2/2}\,dt = \frac{1}{\sqrt{2\pi}} \int_0^1 \Big( \underbrace{1 - \frac{t^2}{2}}_{\text{Taylor approximation}} + \text{error} \Big)\,dt = 0.3324\ldots + \text{total error}. \]

In particular, understanding the error allows us to figure out what order Taylor polynomial
to use in order to beat any $\epsilon$ threshold for the desired accuracy of this computation.
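
The computation in the example can be repeated for higher order Taylor polynomials. In the sketch below, we integrate $T_n(-t^2/2)$ term by term over $[0,1]$, using that $\int_0^1 (-t^2/2)^j/j!\,dt = (-1)^j / (2^j\, j!\, (2j+1))$. As a reference value we use the identity $\frac{1}{\sqrt{2\pi}}\int_0^1 e^{-t^2/2}\,dt = \frac{1}{2}\operatorname{erf}(1/\sqrt{2})$, which relies on the error function from Python's standard library and is not part of the text above. The function name `approx` is our own.

```python
import math

def approx(n):
    """Integrate T_n(-t^2/2) over [0, 1] term by term and divide by sqrt(2*pi)."""
    # The j-th term integrates to (-1)^j / (2^j * j! * (2j + 1)).
    total = sum((-1) ** j / (2 ** j * math.factorial(j) * (2 * j + 1))
                for j in range(n + 1))
    return total / math.sqrt(2 * math.pi)

reference = 0.5 * math.erf(1 / math.sqrt(2))  # value via the error function
for n in (1, 2, 3):
    # Columns: order n, approximation, total error
    print(n, approx(n), approx(n) - reference)
```

For $n = 1$ this reproduces the value $5/(6\sqrt{2\pi})$ from the example, and the total error shrinks quickly as $n$ increases.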

Exercise 13.7 How many terms from the Taylor polynomials for $e^x$, centered at
$x = 0$, do you need to approximate the above integral so that you match the 6 first
decimal digits of its true value? (Use Table 13.18 in combination with trial and error.)

Tangent lines and tangent parabolas


To motivate our definition of Taylor polynomials, we first point out the following formu-
lation of the definition of a tangent line of a function at a point.

Definition 13.8 (Tangent lines)

Suppose $f$ is defined in a neighbourhood of the point $x = a$ and that $f'$ exists at this
point, and let $T_1$ be a polynomial of degree at most one. Then we say that $T_1$ is the
tangent line at $x = a$ if we have
\[ \begin{cases} T_1(a) = f(a) \\ T_1'(a) = f'(a). \end{cases} \]

Fig. 5. Example of tangent line of $f(x) = \ln(1 + x)$ at $x = 0$.

Exercise 13.9 Prove that if $T_1$ satisfies the above definition, then
$T_1(x) = f(a) + f'(a)(x - a)$.
Hint: A polynomial $p(x)$ is of degree at most one exactly if it is of the form
$p(x) = c_0 + c_1 x$. Indeed, if $c_1 = 0$, then it is of degree 0. Otherwise it is of degree 1.
Next, we consider how to define a second order tangent curve – what we probably
should call a tangent parabola. In light of the above formulation of the definition for
tangent lines, the following definition is rather natural.

Definition 13.10 (Tangent parabolas)

Suppose $f$ is defined in a neighbourhood of the point $x = a$ and that $f'$, $f''$ exist at
this point, and let $T_2$ be a polynomial of degree at most two. Then we say that $T_2$ is
the tangent parabola at $x = a$ if we have
\[ \begin{cases} T_2(a) = f(a) \\ T_2'(a) = f'(a) \\ T_2''(a) = f''(a). \end{cases} \]

Fig. 6. Example of tangent parabola of $f(x) = \ln(1 + x)$ at $x = 0$.

Exercise 13.11 Prove that if $T_2$ satisfies the above definition for $a = 0$, then
\[ T_2(x) = f(0) + f'(0)x + \frac{f''(0)}{2} x^2. \]
Exercise 13.12 Determine the tangent parabola at $x = 0$ for the following functions.
(a) $f(x) = e^x$ (b) $f(x) = \ln(1 + x)$ (c) $f(x) = \sin x$.

In the above exercise, you were asked to find the formula for tangent parabolas
at $x = 0$. While it is not that hard to directly find the corresponding formula near
some $x \neq 0$, the computations do become annoying if you are not careful, especially
when we consider Taylor polynomials of large order. Here is a "trick" that helps keep
computations simple.

Remark 13.13 (Trick for computing the tangent parabola formula if $a \neq 0$)

The trick is to "center" the notation for the polynomial at $x = a$. Indeed, if $T_2(x)$ is
a polynomial of degree at most 2, then so is the translate $T_2(x + a)$. This means there
exist coefficients $c_0, c_1, c_2$ such that
\[ T_2(x + a) = c_0 + c_1 x + c_2 x^2. \]

And so, by replacing $x$ by $x - a$ in this formula, we obtain that if $T_2$ is a polynomial of
degree at most 2, then we can express it in the form
\[ T_2(x) = c_0 + c_1 (x - a) + c_2 (x - a)^2. \]

As should be clear, this trick extends directly to higher order polynomials.

Example 13.14 Let us express $p(x) = 1 + x + x^2$ in notation centered at $x = 1$. To this end, we first compute $p(x + 1) = 1 + (x + 1) + (x + 1)^2 = 3 + 3x + x^2$. Replacing $x$ by $x - 1$ in this formula, we get the desired expression $p(x) = 3 + 3(x - 1) + (x - 1)^2$.
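The re-centering trick can also be carried out mechanically. The following is a minimal Python sketch, assuming the polynomial is given by its list of coefficients; the function name `recenter` is ours and not part of the notes. It expands $p(x + a)$ with the binomial theorem and reads off the coefficients $c_k$.

```python
from math import comb

def recenter(coeffs, a):
    """Return [c_0, ..., c_n] with p(x) = sum c_k (x - a)^k,
    where p(x) = sum coeffs[j] * x^j (hypothetical helper)."""
    n = len(coeffs) - 1
    # Coefficient of x^k in p(x + a) is sum_{j >= k} coeffs[j] * C(j, k) * a^(j - k).
    return [
        sum(coeffs[j] * comb(j, k) * a ** (j - k) for j in range(k, n + 1))
        for k in range(n + 1)
    ]

# p(x) = 1 + x + x^2 centered at x = 1, as in Example 13.14
print(recenter([1, 1, 1], 1))  # prints [3, 3, 1]
```

The output matches the coefficients 3, 3, 1 found by hand above.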

Exercise 13.15 Use the formula from the above remark to show that the tangent parabola of a function at $x = a$ is given by
\[ T_2(x) = f(a) + f'(a)(x - a) + \frac{f''(a)}{2}(x - a)^2. \]

Exercise 13.16 An approximation that is often used by engineers is $(1 + x)^{1/m} \simeq 1 + x/m$ for "small values" of $x$ (this approximation is even true for negative $m$).
(a) Visualise the approximation you get of $\sqrt{1 + x}$ by using the tangent line at $x = 0$ (that is, compare the two graphs). What approximation for $\sqrt{2}$ does it give? How accurate is it?
(b) What approximation do you get in (a) if you use the tangent parabola of $\sqrt{1 + x}$ at $x = 0$ instead? Visualise the approximation, and determine how accurate it is.
(c) Approximate the value of $\sqrt{43}$ by using the tangent line for $\sqrt{x}$ at $x = 36$. What approximation does this give? How accurate is it?
(d) What approximation do you get in (c) if you use the tangent parabola instead? How accurate is it?

The definition of Taylor polynomials and Taylor’s formula


We are now ready to formulate our definition of n’th order Taylor polynomials.

Definition 13.17 (Taylor polynomial) Suppose $f$ is defined in a neighbourhood of the point $x = a$ and that $f', f'', \ldots, f^{(n)}$ all exist at this point, and let $T_n$ be a polynomial of degree at most $n$.
Then, we say that $T_n$ is the Taylor polynomial for $f$ of order $n$ centered at $x = a$ if
\[ T_n^{(k)}(a) = f^{(k)}(a), \quad \text{for all } k \in \{0, 1, 2, \ldots, n\}. \]

In the table below, we indicate the Taylor polynomials most important for us. Notice that all are centered at $x = 0$. This is because, in practice, we almost always prefer to approximate functions near $x = 0$ to keep formulas simple. E.g., instead of approximating $\sqrt{x}$ near $x = 36$, we usually approximate its translate $\sqrt{x + 36}$ near $x = 0$.

Fig. 7. Here, we see the 9 first Taylor polynomials of $f(x) = \ln(1 + x)$.

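Table 13.18 (Some useful/common Taylor approximations centered at $x = 0$)

\[ e^x \approx 1 + x + \frac{x^2}{2!} + \cdots + \frac{x^n}{n!} = \sum_{k=0}^{n} \frac{x^k}{k!} \]

\[ \sin x \approx x - \frac{x^3}{3!} + \frac{x^5}{5!} - \cdots + (-1)^n \frac{x^{2n+1}}{(2n+1)!} = \sum_{k=0}^{n} (-1)^k \frac{x^{2k+1}}{(2k+1)!} \]

$\cos x \approx$

$\dfrac{1}{1 - x} \approx$

$\ln(1 + x) \approx$

$\arctan(x) \approx$

$(1 + x)^{\alpha} \approx$

$\arcsin(x) \approx$

$\arccos(x) \approx$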

We highly recommend that you fill out the above table every time you encounter a
new Taylor polynomial throughout this chapter.

Remark 13.19 Since Taylor polynomials centered at $x = 0$ turn out to be particularly useful in practice, applied mathematicians, physicists and engineers (but rarely pure mathematicians) tend to call these Maclaurin polynomials after the Scottish mathematician Colin Maclaurin (who also discovered the integral test for the convergence of infinite series). In contrast to Taylor, Maclaurin led a happier life, and was renowned as an excellent teacher. Still, he died at age 48, not long after having helped (unsuccessfully) defend his home city of Edinburgh during the Jacobite rebellion.

Fig. 8. Colin Maclaurin (1698 – 1746).

The following theorem extends the formulas for the tangent lines and parabolas on
page 401. It allows us to compute most (but not all) Taylor polynomials in Table 13.18.

Theorem 13.20 (Taylor’s formula) Suppose $f$ is defined in a neighbourhood of the point $x = a$ and that $f', f'', \ldots, f^{(n)}$ all exist at this point. Then the n’th order Taylor polynomial of $f$ centered at $x = a$ is given by

\[ T_n(x) = f(a) + f'(a)(x - a) + \frac{f''(a)}{2}(x - a)^2 + \cdots + \frac{f^{(n)}(a)}{n!}(x - a)^n. \]

Proof. By the trick from Remark 13.13, we can write

\[ T_n(x) = c_0 + c_1(x - a) + c_2(x - a)^2 + \cdots + c_n(x - a)^n. \tag{13.1} \]

Differentiating this expression, we obtain
\begin{align*}
T_n'(x) &= 0 + c_1 + c_2 \cdot 2(x - a) + \cdots + c_n \cdot n(x - a)^{n-1} \\
T_n''(x) &= 0 + 0 + c_2 \cdot 2 + \cdots + c_n \cdot n(n - 1)(x - a)^{n-2} \\
&\;\;\vdots \\
T_n^{(n)}(x) &= 0 + 0 + 0 + \cdots + c_n \cdot n(n - 1)(n - 2) \cdots 2 \cdot 1.
\end{align*}
Letting $x = a$, we observe that in each expression only the term without a factor $(x - a)$ survives, and that we get
\begin{align*}
T_n(a) &= c_0 \\
T_n'(a) &= c_1 \\
T_n''(a) &= c_2 \cdot 2 \\
&\;\;\vdots \\
T_n^{(n)}(a) &= c_n \cdot n!
\end{align*}
But, by hypothesis, we have $T_n^{(k)}(a) = f^{(k)}(a)$. Solving for the $c_k$, and inserting this into (13.1), above, we obtain the desired formula.

Example 13.21 We compute the (perhaps surprisingly important) Taylor polynomials, centered at $x = 0$, of
\[ f(x) = \frac{1}{1 - x}. \]
To apply Taylor’s formula, we compute a general formula for $f^{(k)}(x)$. To get a better idea of how it should look, we start by computing the first few derivatives:
\begin{align*}
f'(x) &= (1 - x)^{-2} \\
f''(x) &= 2 \cdot (1 - x)^{-3} \\
f'''(x) &= 2 \cdot 3 \cdot (1 - x)^{-4} \\
f^{(4)}(x) &= 2 \cdot 3 \cdot 4 \cdot (1 - x)^{-5}.
\end{align*}
Based on this, it would seem that a reasonable guess for the k’th derivative would be
\[ f^{(k)}(x) = k!\,(1 - x)^{-(k+1)}. \]
Let us now prove this formula by induction. Since we already took care of proving the "base case", all that remains is to do the "induction step". That is, we show that if the formula holds for $k$, then it also holds for $k + 1$. We do this as follows:
\begin{align*}
f^{(k+1)}(x) = \frac{d}{dx} f^{(k)}(x) &= \frac{d}{dx}\, k!\,(1 - x)^{-(k+1)} \\
&= k!\,(-k - 1)(1 - x)^{-(k+1)-1} \cdot (-1) \\
&= (k + 1)!\,(1 - x)^{-(k+2)}.
\end{align*}
(We point out that in this computation, the induction hypothesis was used when we wrote out the formula for $f^{(k)}$. Also, note that the factor $(-1)$ appearing in the middle line is the derivative of the inner function $1 - x$.)
Inserting $x = 0$ into the expression for the k’th derivative of $f$, we get
\[ f^{(k)}(0) = k! \]
And so, by Taylor’s formula, we find that the Taylor polynomials of $f$ centered at $x = 0$ are exactly
\[ T_n(x) = 1 + x + x^2 + \cdots + x^n. \]
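As a sanity check, Taylor’s formula can be evaluated directly from a list of derivative values. Below is a small Python sketch, assuming the derivative values $f^{(k)}(a)$ are supplied by hand; the helper name `taylor_eval` is ours. With $f^{(k)}(0) = k!$ it should reproduce the geometric sum and come close to $1/(1 - x)$ for small $x$.

```python
from math import factorial

def taylor_eval(derivs_at_a, a, x):
    """Evaluate T_n(x) = sum_k f^(k)(a)/k! * (x - a)^k, where
    derivs_at_a = [f(a), f'(a), ..., f^(n)(a)] (hypothetical helper)."""
    return sum(d / factorial(k) * (x - a) ** k for k, d in enumerate(derivs_at_a))

n = 10
derivs = [factorial(k) for k in range(n + 1)]  # f^(k)(0) = k! for f(x) = 1/(1 - x)
x = 0.3
print(taylor_eval(derivs, 0.0, x))         # Taylor's formula
print(sum(x ** k for k in range(n + 1)))   # 1 + x + ... + x^n
print(1 / (1 - x))                         # the function itself
```

The first two printed numbers should agree up to rounding, and the third should differ from them only slightly, since $x = 0.3$ is well inside the region where the approximation is good.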

Exercise 13.22 Compute the Taylor polynomials, of all orders, centered at $x = 0$, for

(a) $f(x) = e^x$ (b) $f(x) = \ln(1 + x)$ (c) $f(x) = \sin x$ (d) $f(x) = \cos x$.

Exercise 13.23 (Exercise 13.16, continued.) (a) What order Taylor polynomial do you need to approximate $\sqrt{2}$ with an accuracy of 2 decimal digits? (b) Next, approximate $\sqrt{43}$ with 3 decimal digits. (Use trial and error in both (a) and (b).)

13.2 Error estimates for Taylor polynomials


Error estimate for tangent lines
Suppose we want to understand how well, say, $f(x) = \sin x$ is approximated by its first order Taylor polynomial $T_1(x) = x$. Then, we need to study the error function
\[ E_1(x) \overset{\text{def}}{=} f(x) - T_1(x) = \sin x - x, \]
which is illustrated in the figure to the right.

Fig. 9. Here, $f(x) = \sin x$ is in red, and $T_1(x)$ is in blue.

The following result gives a first estimate on the error made when approximating functions by their tangent lines.

Proposition 13.24 Suppose $f, f', f''$ are defined and continuous in a neighbourhood of the point $x = a$. Then on a (possibly smaller) neighbourhood of $x = a$, we have

\[ f(x) = f(a) + f'(a)(x - a) + E_1(x) \quad \text{with} \quad |E_1(x)| \le C(x - a)^2, \]

where $C$ can be chosen to be the maximum of $|f''(d)|$ for $d$ on this neighbourhood.

Proof. Let $T_1(x)$ be the tangent line of $f$ at $x = a$. To find a useful expression for the corresponding error function, we use the Mean Value Theorem (MVT) twice:
\begin{align*}
E_1(x) &= f(x) - T_1(x) \\
&= f(x) - \big( f(a) + f'(a)(x - a) \big) \\
&= \underbrace{f(x) - f(a)}_{\text{apply MVT}} - f'(a)(x - a) \\
&= f'(c)(x - a) - f'(a)(x - a) \\
&= \underbrace{\big( f'(c) - f'(a) \big)}_{\text{apply MVT}} (x - a) \\
&= f''(d)(x - a)(c - a).
\end{align*}

Fig. 10. The relative positions of the points $a$, $c$, $d$ and $x$. You should think of the points $a$ and $x$ as being placed first. Then $c$ between $a$ and $x$, followed by $d$ between $a$ and $c$. For us, the important thing to notice is that both $c$ and $d$ are always between $a$ and $x$ (even if $x$ is placed to the left of $a$).

Although it may not seem like much, this is actually a rather helpful formula. Indeed, since $c$ is between $a$ and $x$, we have $|c - a| \le |x - a|$, and therefore, we get

\[ |E_1(x)| = |f(x) - T_1(x)| = |f''(d)(c - a)(x - a)| \le |f''(d)|(x - a)^2. \]

Since $d$ is between $a$ and $x$, we have that $d$ is in the same neighbourhood of $a$ as $x$. By shrinking the neighbourhood, if necessary, we can assume that it is closed. Since $f''$, and therefore also $|f''|$, is continuous, it follows by the Min-max theorem that $|f''|$ has a largest value on this closed interval. Denoting this value by $C$ ends the proof.
Below, we give some examples and exercises to illustrate how this proposition is used.

Example 13.25 How well is $f(x) = \sin x$ approximated on the interval $[-1/10, 1/10]$ by its tangent line $T_1(x)$ centered at $x = 0$? By what we did above, we know that

\[ |E_1(x)| = |\sin x - T_1(x)| \le C x^2, \]

where $C$ is the maximum of $|f''(c)| = |{-\sin c}|$ for $c$ in $[-1/10, 1/10]$. Using the estimate $|\sin c| \le 1$, this leads to the estimate

\[ |E_1(x)| \le x^2. \]

In particular, this means that the error we make on the interval $[-1/10, 1/10]$, when we replace $f$ by $T_1$, is no larger than $1/10^2 = 1/100$.

Fig. 11. In green, the "zone" where our second estimate allows the error to live, and in red, the actual error.

However, it is also true that $|\sin(c)| \le |c|$, which is a better inequality if $c$ is close to 0 (which is the case here). Now, since $c$ is between 0 and $x$, it also follows that $|c| \le |x|$, and so we get
\[ |E_1(x)| \le |x| \cdot x^2 = |x|^3. \]

In particular, this means that the error we make on the interval $[-1/10, 1/10]$, when we replace $f$ by $T_1$, is no larger than $1/10^3 = 1/1000$.
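Both estimates can be probed numerically. The following Python sketch, with a helper name of our own choosing, samples the interval $[-1/10, 1/10]$ on a grid and reports the largest value of $|\sin x - x| / |x|^p$ for $p = 2$ and $p = 3$. A grid check is only an illustration and not a proof, but both ratios should stay below 1 if the estimates hold.

```python
import math

def max_error_ratio(power, half_width=0.1, samples=2001):
    """Largest value of |sin x - x| / |x|^power over a grid on
    [-half_width, half_width], skipping x = 0 (hypothetical helper)."""
    worst = 0.0
    for i in range(samples):
        x = -half_width + 2 * half_width * i / (samples - 1)
        if x != 0.0:
            worst = max(worst, abs(math.sin(x) - x) / abs(x) ** power)
    return worst

print(max_error_ratio(2))  # compare with the first estimate |E_1(x)| <= x^2
print(max_error_ratio(3))  # compare with the second estimate |E_1(x)| <= |x|^3
```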

Exercise 13.26 (a) In the above example we first showed that $\sin x = x + E_1(x)$, where $|E_1(x)| \le x^2$ for $x$ close to 0. Use this to "fix" the computation in Example 13.4.
(b) The second estimate above was $|E_1(x)| \le |x|^3$. Does this matter when applied to Example 13.4? Would it make a difference if the estimate was $|E_1(x)| \le C|x|^n$ where $C > 0$ is any constant and $n \ge 2$?
Exercise 13.27 Use Proposition 13.24 to determine the following limits.
(a) $\displaystyle \lim_{x \to 0} \frac{\ln(1 + x)}{x}$ (b) $\displaystyle \lim_{x \to 0} \frac{e^x - 1}{x}$ (c) $\displaystyle \lim_{x \to 0} \frac{\arcsin(x)}{x}$.

We now consider an example with a slightly different flavour.

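Example 13.28 (Example 13.6 revisited) Using Proposition 13.24, we obtain that for $f(x) = e^x$ and $x \in [-1, 1]$ we have

\[ f(x) = 1 + x + E_1(x) \quad \text{with} \quad |E_1(x)| \le e x^2. \]

Letting $x = -t^2/2$ in this estimate, we find that for $t \in [-\sqrt{2}, \sqrt{2}]$, we have

\[ e^{-t^2/2} = 1 - \frac{t^2}{2} + E_1(-t^2/2) \quad \text{with} \quad |E_1(-t^2/2)| \le \frac{e t^4}{4}. \]

But this means that
\[ \frac{1}{\sqrt{2\pi}} \int_0^1 e^{-t^2/2}\, dt = \underbrace{\frac{1}{\sqrt{2\pi}} \int_0^1 T_1(-t^2/2)\, dt}_{\text{the approximation}} + \underbrace{\frac{1}{\sqrt{2\pi}} \int_0^1 E_1(-t^2/2)\, dt}_{\text{the error made}}. \]

To estimate the error, we use the triangle inequality for integrals:
\begin{align*}
\left| \frac{1}{\sqrt{2\pi}} \int_0^1 E_1(-t^2/2)\, dt \right| &\le \frac{1}{\sqrt{2\pi}} \int_0^1 |E_1(-t^2/2)|\, dt \\
&\le \frac{1}{\sqrt{2\pi}} \int_0^1 \frac{e t^4}{4}\, dt = \frac{e}{20\sqrt{2\pi}} \approx 0.05.
\end{align*}
That is, the error we make is at most 0.05.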
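The numbers in this example are easy to check with a few lines of Python. The sketch below is only an illustration; it relies on the standard identity $\frac{1}{\sqrt{2\pi}} \int_0^1 e^{-t^2/2}\, dt = \frac{1}{2}\operatorname{erf}(1/\sqrt{2})$ to obtain a reference value, which is our choice and not something used in the notes. The variable names are ours.

```python
import math

norm = 1 / math.sqrt(2 * math.pi)

# Reference value of (1/sqrt(2*pi)) * integral_0^1 exp(-t^2/2) dt via the error function.
exact = 0.5 * math.erf(1 / math.sqrt(2))
# Integral of T_1(-t^2/2) = 1 - t^2/2 over [0, 1] is 1 - 1/6.
approx = norm * (1 - 1 / 6)
# The error estimate e / (20 * sqrt(2*pi)) derived above.
bound = math.e / (20 * math.sqrt(2 * math.pi))

print(f"reference value : {exact:.4f}")
print(f"approximation   : {approx:.4f}")
print(f"actual error    : {abs(exact - approx):.4f}")
print(f"error estimate  : {bound:.4f}")
```

Comparing the last two printed lines shows how much room there is between the estimate and the error actually made.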

We now make the following observation: the actual error made in Example 13.6 is roughly
\[ 0.3413 - 0.3324 \approx 0.009, \]
which is much smaller than the estimate found above. This motivates the following questions:

1. Can the error estimate from Proposition 13.24 be improved?


2. Can our use of the error estimate in the above example be improved?

As it turns out, the answer to both questions is yes. In particular, since understanding
how to get a better value for the constant C in Proposition 13.24 will help us extend the
proposition to higher order Taylor polynomials, we will now explain how this is done.

Proof of Proposition 13.24, revisited. A problem in our original proof is that we use the Mean Value Theorem in the first steps. Indeed, the "unknown" quantities $c$ and $d$ are rather hard to handle since we do not know exactly where they are. Instead, we can try to use the Fundamental Theorem of Calculus (FTC) to start the computation as follows:
\begin{align*}
E_1(x) = f(x) - T_1(x) &= f(x) - \big( f(a) + f'(a)(x - a) \big) \\
&= \underbrace{f(x) - f(a)}_{\text{apply FTC}} - f'(a)(x - a) \\
&= \int_a^x f'(t)\, dt - f'(a)(x - a).
\end{align*}
Notice how we have simplified the expression for $E_1(x)$, somewhat, without having introduced anything beyond our control. There are now several ways of continuing this computation. One interesting variant is to use integration by parts¹. For this to work, we need to be slightly clever, and observe that inside of the integral, $t$ is the variable and $x$ just a constant. In particular, it is true that $(t - x)$ is a primitive of 1. Using this, we can continue the computation as follows:
\begin{align*}
\ldots &= \underbrace{\int_a^x f'(t) \cdot 1\, dt}_{\substack{\text{apply integration by parts} \\ \text{with } (t - x) \text{ as the primitive of } 1}} - f'(a)(x - a) \\
&= \Big( \underbrace{\big[ f'(t)(t - x) \big]_a^x}_{= f'(a)(x - a)} - \int_a^x f''(t)(t - x)\, dt \Big) - f'(a)(x - a) \\
&= \int_a^x f''(t)(x - t)\, dt,
\end{align*}
where we flipped $(t - x)$ to $(x - t)$ to get rid of a minus sign in the last step. Moreover, observe how we are still in total control since there are no $c$’s or $d$’s anywhere. But this is as far as it goes. Because now, in the last step, we apply the Generalised Mean Value Theorem for Integrals (Proposition 12.41):
\[ \ldots = f''(c) \int_a^x (x - t)\, dt = \frac{f''(c)}{2}(x - a)^2, \]
where, finally, $c$, which is some number between $x$ and $a$, appears.
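The integral formula $E_1(x) = \int_a^x f''(t)(x - t)\, dt$ obtained in this proof can be tested numerically. Here is a minimal Python sketch for $f = \sin$, using a hand-written composite Simpson rule as the quadrature; the choice of Simpson's rule and all function names are ours, and any reasonable quadrature would serve.

```python
import math

def simpson(g, a, b, n=200):
    """Composite Simpson rule with n (even) subintervals; works for b < a too."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3

def e1_direct(x, a=0.0):
    # E_1(x) = f(x) - f(a) - f'(a)(x - a) for f = sin
    return math.sin(x) - math.sin(a) - math.cos(a) * (x - a)

def e1_integral(x, a=0.0):
    # E_1(x) = integral_a^x f''(t)(x - t) dt with f'' = -sin
    return simpson(lambda t: -math.sin(t) * (x - t), a, x)

for x in (0.5, 1.0, -0.7):
    print(x, e1_direct(x), e1_integral(x))
```

For each $x$, the two printed values should agree to many decimal places.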

Exercise 13.29 Use the result of the above computation to find a better value for the
constant C appearing in Proposition 13.24 (you should probably record this improved
value in the margin next to the proposition).
Exercise 13.30 Use the error estimate from exercise 13.29, in addition to any other
computational tweak you can think of, to improve the error estimate in Example 13.28.
Exercise 13.31 (Exercises 13.16 and 13.23, continued.) Use the error estimate from exercise 13.29 to estimate the error made when we used tangent lines to approximate $\sqrt{2}$. (Compare this to the actual error.)

¹ Let us think of this as an homage to Taylor, as he invented the technique.

Lagrange’s error formula for Taylor polynomials


We now turn to the last major result of these lecture notes. It is a formula for the error functions
\[ E_n(x) = f(x) - T_n(x). \]
The formula is named in honor of the Italian mathematician Lagrange (his birth name was actually Giuseppe Lodovico Lagrangia). Unlike Taylor and Maclaurin, he lived a long and prosperous life, and made important contributions to most, if not all, fields of mathematics.

Fig. 12. Joseph-Louis Lagrange (1736 – 1813).

His formula is as follows:

Theorem 13.32 (Lagrange’s error formula) Suppose that $f, f', f'', \ldots, f^{(n+1)}$ are all defined and continuous in an open neighbourhood of $x = a$, and let $T_n(x)$ be the Taylor polynomial of $f$ of order $n$ centered at $x = a$. Then for $x$ in this neighbourhood,
\[ f(x) = T_n(x) + E_n(x) \quad \text{with} \quad E_n(x) = \frac{f^{(n+1)}(c)}{(n + 1)!}(x - a)^{n+1}, \]
where $c$ is some number between $x$ and $a$.

As it turns out, we are in a good position to prove this result. Indeed, the second proof
for Proposition 13.24 actually establishes Lagrange’s formula for n = 1, and basically
contains all the ideas required. In the following exercise, you are guided, step-by-step,
into proving the full result.

Exercise 13.33 We now prove Theorem 13.32. To this end, let $f$ be as in the statement, and let $E_n = f - T_n$.
(a) The main part of the proof is to establish, by induction, the following integral formula:
\[ E_n(x) = \int_a^x f^{(n+1)}(t)\, \frac{(x - t)^n}{n!}\, dt. \]
(i) Re-read the second proof for Proposition 13.24, and observe that we have already shown that the base case $n = 1$ holds.
(ii) To better understand how the induction step should work, first justify why
\[ E_2(x) = E_1(x) - \frac{f''(a)}{2}(x - a)^2. \]
(iii) Next, use the relation in (ii) in combination with a suitable application of integration by parts, to prove that the integral formula for $n = 2$ follows from the one for $n = 1$.
(iv) Use induction to prove the integral formula for general $n$.
(b) Use the Generalised Mean Value Theorem for Integrals (Proposition 12.41) to deduce Lagrange’s error formula from the integral error formula.

Over the next few pages, we are going to consider some examples and exercises. To
help you out, in the following remark, we summarise what we have shown above.

Remark 13.34 Suppose $f, f', \ldots, f^{(n)}$ are defined at the point $x = a$. Then we can compute the n’th order Taylor polynomials of $f$, centered at $x = a$, to be

\[ T_n(x) = f(a) + f'(a)(x - a) + \frac{f''(a)}{2}(x - a)^2 + \cdots + \frac{f^{(n)}(a)}{n!}(x - a)^n. \]

Moreover, if $f, f', \ldots, f^{(n)}$ as well as $f^{(n+1)}$ are continuous in a neighbourhood of the point $x = a$, the Taylor polynomial approximates $f$ close to $x = a$ in the sense that

\[ f(x) = T_n(x) + E_n(x) \quad \text{with} \quad E_n(x) = \frac{f^{(n+1)}(c)}{(n + 1)!}(x - a)^{n+1}, \]

where $c$ is some number between $x$ and $a$.

Finally, if we restrict ourselves to a closed neighbourhood of $x = a$ where $f^{(n+1)}$ is continuous (which can always be achieved by "shrinking" the open neighbourhood of $x = a$), we can apply the Bounded Function Theorem (Lemma 9.65) to obtain that there exists a constant $C > 0$ so that for $x$ in this closed neighbourhood, we have

\[ f(x) = T_n(x) + E_n(x) \quad \text{with} \quad |E_n(x)| \le C|x - a|^{n+1}. \]

Exercise 13.35 (a) Show in detail how we use the Bounded Function Theorem to
obtain the last estimate for En (x) in the above remark from the Lagrange error
formula.
(b) What formula does this give for the constant C?

Exercise 13.36 (Exercises 13.16, 13.23 and 13.31, continued.)

(a) Use Lagrange’s error formula in some form to determine the order of the Taylor polynomial of $f(x) = \sqrt{1 + x}$ centered at $x = 0$ needed to approximate $\sqrt{2}$ with an accuracy of at least $10^{-6}$.
(b) Figure out how to approximate $\sqrt{43}$ with an accuracy of at least $10^{-6}$.

Example 1: How to use error estimates for point-wise approximations

Example 13.37 How large do we have to choose $n$ for $T_n(x)$ (centered at $x = 0$) to approximate $\sin x$ with an accuracy of at least 1/100 on the interval $[-6, 6]$? Based on Figure 1, we see that while $T_7$ is not yet good enough, it may be that $T_{37}$ does the trick. To figure this out, we investigate the error function.
Note that since the derivatives of $\sin x$ are of the form $\pm \sin x$ or $\pm \cos x$, we have
\[ \sup_{c \in [-6, 6]} |f^{(n+1)}(c)| \le 1. \]

This means that Lagrange’s error formula (or Remark 13.34) yields the estimate
\[ |E_n(x)| = |f^{(n+1)}(c)|\, \frac{|x|^{n+1}}{(n + 1)!} \le \frac{|x|^{n+1}}{(n + 1)!}. \]
The question now becomes, when is this less than 1/100 for all $x \in [-6, 6]$? First, note that $|x|^{n+1}$ is the largest when $|x| = 6$. This means we have to figure out how large $n$ has to be for

\[ \frac{6^{n+1}}{(n + 1)!} \le \frac{1}{100} \]

to hold. Using, say, Python, we obtain the table shown to the right. We see that the estimate of the error starts out getting worse and worse, before it finally starts to drop. At $n = 18$, it is below 1/100.

Fig. 13. Values for $6^{n+1}/(n + 1)!$.

Here is an alternative way to figure out a suitable value for $n$ where we do not rely on a computer doing part of the job:
\begin{align*}
|E_n(x)| \le \frac{6^{n+1}}{(n + 1)!} &= \frac{6 \cdot 6 \cdot 6 \cdots 6}{1 \cdot 2 \cdot 3 \cdots (n + 1)} \\
&= \underbrace{\frac{6^5}{1 \cdot 2 \cdot 3 \cdots 5}}_{= 324/5} \cdot \underbrace{\frac{6^{n-5}}{6 \cdot 7 \cdots n}}_{\le 1} \cdot \frac{6}{n + 1} \le \frac{1944}{5(n + 1)}.
\end{align*}

So, to determine an $n$ which makes this less than 1/100, we compute

\[ \frac{1944}{5(n + 1)} < \frac{1}{100} \implies 1944 \cdot 20 < n + 1 \implies 38879 < n. \]
In other words, working by hand, we get the estimate $n \ge 38880$, while using the table in Figure 13 gives us the estimate $n \ge 18$. While both are correct, one is certainly better than the other.
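A table like the one in Figure 13 takes only a few lines to produce. The following Python sketch is one way to do it, not necessarily the script used for the figure; the function name `error_bound` is ours.

```python
from math import factorial

def error_bound(n, x=6):
    """The estimate |x|^(n+1) / (n+1)! for |E_n(x)| when f = sin (hypothetical helper)."""
    return abs(x) ** (n + 1) / factorial(n + 1)

# Print the table of values for n = 0, ..., 20.
for n in range(0, 21):
    print(f"{n:2d}  {error_bound(n):.6f}")

# Smallest n for which the estimate drops below 1/100.
first_n = next(n for n in range(100) if error_bound(n) < 1 / 100)
print("first n with estimate below 1/100:", first_n)  # prints 18
```

The values first grow (up to about 64.8 around $n = 5$) and then decay quickly, which is the behaviour described in the example.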

Exercise 13.38 In the above example, we basically figured out how large to choose the order for $\sin(6)$ to be approximated by $T_n(6)$, where the $T_n$ are Taylor polynomials centered at $x = 0$. Determine what order you need for this approximation to hold with the same accuracy (i.e., 1/100), if you instead use Taylor polynomials for $\sin x$ centered at $x = 2\pi \approx 6.28$.

Exercise 13.39 (a) Let $T_{2n+1}$ be the Taylor expansion of $\sin(x)$ of order $2n + 1$ centered at $x = 0$. Show that the error function $E_{2n+1}$ satisfies the inequality

\[ |E_{2n+1}(x)| \le \frac{|x|^{2n+3}}{(2n + 3)!}. \]

(b) According to the error estimate in (a), what order Taylor polynomial do you need for it to approximate $\sin(1)$ with an error of less than $10^{-4}$? What about $10^{-16}$?

Exercise 13.40 Show that the error function $E_n$ for the Taylor polynomial $T_n$ of $f(x) = 1/(1 - x)$ is given by
\[ E_n(x) = \frac{1}{(1 - c)^{n+2}}\, x^{n+1}, \]

where $c$ is some number between 0 and $x$.

Exercise 13.41 In this exercise we are to use the Taylor polynomials $T_n$ for $e^x$ (centered at $x = 0$), to estimate the value for $e$. We suppose that we only know the derivatives of $e^x$ and that $e \le 3$.

(a) Show that on the interval $[-1, 1]$, the error function for the Taylor polynomials $T_n(x)$ satisfies the relation
\[ |E_n(x)| \le \frac{3|x|^{n+1}}{(n + 1)!}. \]
(b) How many terms from the Taylor polynomial of $e^x$ are needed to approximate $e = e^1$ with an accuracy of 16 digits? (Why not make a table such as in the above example?).

Example 2: More tricky error estimates

Example 13.42 (Example 13.6, revisited, again) Let us again consider the integral
\[ \frac{1}{\sqrt{2\pi}} \int_0^1 e^{-t^2/2}\, dt. \tag{13.2} \]
We now use Lagrange’s formula for the error function to determine how many terms from the Taylor polynomial $T_n$ for $f(x) = e^x$ (centered at $x = 0$) we need for the integral to be approximated by
\[ \frac{1}{\sqrt{2\pi}} \int_0^1 T_n\Big( -\frac{t^2}{2} \Big)\, dt \tag{13.3} \]
with an error less than $10^{-7}$.

First, we use the relation $e^x = T_n(x) + E_n(x)$ to obtain that

\[ \frac{1}{\sqrt{2\pi}} \int_0^1 e^{-t^2/2}\, dt = \frac{1}{\sqrt{2\pi}} \int_0^1 T_n\Big( -\frac{t^2}{2} \Big)\, dt + \underbrace{\frac{1}{\sqrt{2\pi}} \int_0^1 E_n\Big( -\frac{t^2}{2} \Big)\, dt}_{R_n}. \]

Here, $R_n$ represents the error we make when we approximate the integral (13.2) by the integral (13.3). So, our goal is to estimate how large it is.
By Lagrange’s formula, we obtain

\[ E_n(x) = f^{(n+1)}(c)\, \frac{x^{n+1}}{(n + 1)!} = e^c\, \frac{x^{n+1}}{(n + 1)!}, \]

where $c$ is between 0 and $x$. Putting $x = -t^2/2$, this becomes

\[ E_n\Big( -\frac{t^2}{2} \Big) = e^c (-1)^{n+1} \frac{t^{2n+2}}{2^{n+1}(n + 1)!}, \]

where $c$ is between 0 and $-t^2/2$. Taking absolute values, and using that $e^c \le 1$ since $c \le 0$, this means that
\[ \Big| E_n\Big( -\frac{t^2}{2} \Big) \Big| \le \frac{t^{2n+2}}{2^{n+1}(n + 1)!}. \]
By using the triangle inequality for integrals, this implies that
\begin{align*}
|R_n| \le \frac{1}{\sqrt{2\pi}} \int_0^1 \Big| E_n\Big( -\frac{t^2}{2} \Big) \Big|\, dt &\le \frac{1}{\sqrt{2\pi}} \int_0^1 \frac{t^{2n+2}}{2^{n+1}(n + 1)!}\, dt \\
&= \frac{1}{\sqrt{2\pi}} \cdot \frac{1}{2^{n+1}(2n + 3)(n + 1)!}.
\end{align*}
You are asked to use this estimate to complete this example in Exercise 13.43, below.
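To experiment with the approximation (13.3), one can integrate the polynomial $T_n(-t^2/2)$ term by term. The Python sketch below does this; the function name `approx_integral` is ours, and the reference value is obtained from the standard identity $\frac{1}{\sqrt{2\pi}} \int_0^1 e^{-t^2/2}\, dt = \frac{1}{2}\operatorname{erf}(1/\sqrt{2})$, which is an assumption of this sketch rather than something used in the notes.

```python
import math

def approx_integral(n):
    """(1/sqrt(2*pi)) * integral_0^1 T_n(-t^2/2) dt, integrated term by term
    (hypothetical helper)."""
    # T_n(-t^2/2) = sum_{k=0}^n (-1)^k t^(2k) / (2^k k!),
    # and integral_0^1 t^(2k) dt = 1/(2k + 1).
    total = sum(
        (-1) ** k / (2 ** k * math.factorial(k) * (2 * k + 1))
        for k in range(n + 1)
    )
    return total / math.sqrt(2 * math.pi)

reference = 0.5 * math.erf(1 / math.sqrt(2))  # value of (13.2) via the error function

for n in (1, 3, 5):
    print(n, approx_integral(n), abs(reference - approx_integral(n)))
```

The last column shows the error actually made for each $n$, which can then be compared with the estimate for $|R_n|$ derived above.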

Exercise 13.43 In this exercise we consider the above example.

(a) Use, say, Python to make a table such as the one in Example 13.37. How large do you have to choose $n$ to get $|R_n| \le 10^{-7}$? Does this match what you found in Example 13.7?
(b) Compare the estimated error and the "actual error". Which is largest? Does this make sense? (Here, to find the "actual error", why not use the value for the integral as given by WolframAlpha?)

Exercise 13.44 (From old exam) Use Taylor’s formula with Lagrange error term to decide whether or not
\[ \int_0^1 \cos(x^2)\, dx > 9/10. \]

Exercise 13.45 (From old exam)

(a) Show that for $0 < x < \pi/2$ we have

\[ \sin x = x - \frac{x^3}{6} + R(x), \quad \text{where } 0 < R(x) < \frac{x^5}{120}. \]

(b) Compute the integral

\[ \int_{\pi/3}^{\pi/2} \sin(\cos t)\, dt \]

with an error of less than $(1/2) \cdot 10^{-3}$.

Exercise 13.46 (From old exam)

(a) For $t \ge 0$, show that

\[ \sqrt{1 + t} = 1 + \frac{t}{2} - \frac{t^2}{8} + R(t) \quad \text{with} \quad |R(t)| \le \frac{t^3}{16}. \]

(b) Use this to approximate the generalised integral

\[ \int_0^{\infty} e^{-x} \sqrt{100 + e^{-x}}\, dx. \]

How large is the error?


13.3 A uniqueness theorem for Taylor polynomials


Sometimes it is quite difficult to compute Taylor polynomials directly from the definition.
A notorious example, which can cause trouble on exams, is given by the ones for the
arctangent.

Example 13.47 Let us try to find the Taylor expansion of $f(x) = \arctan(x)$ of order $n$ (centered at $x = 0$). Since $\arctan x$ is our favourite function, we expect this to work in an extremely beautiful way. But here is what happens when we compute its derivatives:
\begin{align*}
f'(x) &= \frac{1}{1 + x^2} \\
f''(x) &= \frac{-2x}{(1 + x^2)^2} \\
f'''(x) &= \frac{-2(1 + x^2)^2 - (-2x) \cdot 2(1 + x^2)(2x)}{(1 + x^2)^4} \\
f^{(4)}(x) &= \ldots
\end{align*}

Fig. 14. Aaaaagh!

This is rather curious since it turns out that the arctangent has rather nice Taylor polynomials. The problem is that we need some tool other than Taylor’s formula to determine them. Such a tool is actually offered by the uniqueness theorem for Taylor polynomials.

Taylor polynomials give the best approximations by polynomials

Before we state the uniqueness theorem, we need to know exactly what we should mean
by a "good" approximation near the point x = a. We therefore recall that, by Remark
13.34, we know that the n’th order error function satisfies

\[ |E_n(x)| \le C|x - a|^{n+1}, \]

for some constant $C$ whose value, it turns out, does not matter here. Indeed, the uniqueness
theorem says that any polynomial approximating f with an error smaller than or equal
to what is given in the above estimate (no matter how large C is) must itself be a Taylor
polynomial.
In fact, recall that also in exercise 13.26, we saw a situation where the exact value of
the constant C in the error term estimate did not matter. Therefore, we might as well
introduce the following definition.

Definition 13.48 (Big-oh) Let $g(x)$ and $h(x)$ be two functions such that the inequality
\[ |g(x)| \le C|h(x)| \]
holds (at least) in a punctured neighbourhood of some point $x = a$ for some constant $C > 0$. Then we write
\[ g(x) = O(h(x)) \quad \text{as } x \to a, \]
and say that $g$ is of at most order $h$ for $x$ close to $a$ (or we just say that $g$ is Big-oh of $h$ near $a$).

Fig. 15. Here, $g(x)$ (in red) is dominated by $h(x) = x^3$ (in green).

In terms of this notation, we can formulate the uniqueness theorem for Taylor polynomials. When you get used to the Big-oh notation, you will start to notice how it makes these types of statements a bit easier to read (indeed, compare it to the last statement of Remark 13.34).

Proposition 13.49 (The uniqueness theorem for Taylor polynomials) Suppose $f, f', f'', \ldots, f^{(n+1)}$ are defined and continuous on some neighbourhood of $x = a$, and that $p$ is a polynomial of degree at most $n$. Then

\[ f(x) = p(x) + O\big( (x - a)^{n+1} \big) \text{ as } x \to a \iff p(x) = T_n(x), \]

where $T_n$ is the n’th order Taylor polynomial of $f$ centered at $x = a$.

Proof. As the implication "$\Longleftarrow$" is just Remark 13.34, it only remains to prove the implication "$\Longrightarrow$". To this end, we start by expressing the polynomial $p$ in the form (recall Remark 13.13)
\[ p(x) = c_0 + c_1(x - a) + c_2(x - a)^2 + \cdots + c_n(x - a)^n. \]
By the hypothesis, and the definition of the Big-oh, we have for $x$ close to $a$ that
\[ |f(x) - c_0 - c_1(x - a) - c_2(x - a)^2 - \cdots - c_n(x - a)^n| \le C|x - a|^{n+1}. \tag{13.4} \]
Since $f$ is continuous, letting $x \to a$, we get $|f(a) - c_0| = 0$. That is,
\[ c_0 = f(a). \]
Inserting this into (13.4), and then dividing by $|x - a|$ on both sides, we get
\[ \left| \frac{f(x) - f(a)}{x - a} - c_1 - c_2(x - a) - \cdots - c_n(x - a)^{n-1} \right| \le C|x - a|^n. \]
Since $f$ is differentiable, letting $x \to a$, we get $|f'(a) - c_1| = 0$. That is, $c_1 = f'(a)$. Continuing in this way, we obtain the desired result.

Exercise 13.50 Complete the proof of the uniqueness theorem.



Example 1: Geometric sums


As a first, and hopefully friendly, example of how to determine Taylor polynomials by using the uniqueness theorem, we revisit an old friend of ours: the formula for the partial sums of the Geometric series.
While the example may not seem like all that much, it is surprisingly important, and basically lets us compute the Taylor polynomials for the arctangent.

Fig. 16. Archimedes says "hello"!

Example 13.51 We now show that the function
\[ f(x) = \frac{1}{1 - x} \]
has the polynomial
\[ p(x) = 1 + x + x^2 + \cdots + x^n \]
as its n’th order Taylor polynomial centered at $x = 0$. In order to use the uniqueness theorem, we need to show that $f(x) = p(x) + O(x^{n+1})$. That is, we must show that on some neighbourhood of $x = 0$, there exists a constant $C > 0$ so that

\[ |E_n(x)| = |f(x) - p(x)| \le C|x|^{n+1}. \]

To this end, we recall Lemma 5.13, which, in the context of the language of Taylor polynomials says that

\[ f(x) = p(x) + E_n(x) \quad \text{with} \quad E_n(x) = \frac{x^{n+1}}{1 - x}. \]
Now, to obtain the desired estimate on the error term, all we have to do is find some neighbourhood of $x = 0$ where $1/(1 - x)$ does not become too large. One such interval is $[-1/2, 1/2]$, where we have $1/2 \le 1 - x \le 3/2$. This implies that on this interval we have

\[ |E_n(x)| = \left| \frac{x^{n+1}}{1 - x} \right| = \frac{1}{|1 - x|}\, |x|^{n+1} \le 2|x|^{n+1}. \]
From this, it follows by the uniqueness theorem for Taylor polynomials that $p(x) = T_n(x)$.
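The error formula and the estimate in this example can be illustrated numerically. The Python sketch below, with a helper name of our own, compares $f(x) - p(x)$ with $x^{n+1}/(1 - x)$ and with the estimate $2|x|^{n+1}$ at a few sample points of $[-1/2, 1/2]$.

```python
def geometric_error(n, x):
    """E_n(x) = 1/(1 - x) - (1 + x + ... + x^n) (hypothetical helper)."""
    return 1 / (1 - x) - sum(x ** k for k in range(n + 1))

n = 5
for x in (-0.5, -0.25, 0.1, 0.5):
    err = geometric_error(n, x)
    # Columns: x, E_n(x), the closed form x^(n+1)/(1 - x), the estimate 2|x|^(n+1).
    print(x, err, x ** (n + 1) / (1 - x), 2 * abs(x) ** (n + 1))
```

In each row, the second and third columns should agree up to rounding, and the absolute value of the second should not exceed the fourth.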

Exercise 13.52 Determine a C > 0, so that for x 2 [ , ], the inequality |En (x)| 
C|x|n+1 holds.

Exercise 13.53 Compare the error term found in the above example to the one from
exercise 13.40. Which one do you think is better?

Example 2: The logarithm

As a warm up exercise before moving on to the arctangent, we visit another old friend of ours. Indeed, in this second example, we show how to take advantage of the previous example to find Taylor polynomials for $\ln(1 + x)$. As a consequence, we will finally be able to deduce Mengoli’s formula for the alternating harmonic series:
\[ \sum_{k=1}^{\infty} \frac{(-1)^{k+1}}{k} = \ln(2). \]

Fig. 17. Mengoli says "hello"!

Example 13.54 We show that the n’th order Taylor polynomial, centered at $x = 0$, of

\[ f(x) = \ln(1 + x) \]

is given by
\[ p(x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \cdots + (-1)^{n-1} \frac{x^n}{n}. \]
To do this, the first step is to plug $x = -t$ into the formula for the $(n - 1)$’st partial sums of the Geometric series. This yields the expression
\[ \frac{1}{1 + t} = 1 - t + t^2 - \cdots + (-1)^{n-1} t^{n-1} + \frac{(-1)^n t^n}{1 + t}. \tag{13.5} \]
The point is that when we integrate both sides of this expression from 0 to $x$, we obtain
\begin{align*}
\ln(1 + x) &= \int_0^x \Big( 1 - t + t^2 - \cdots + (-1)^{n-1} t^{n-1} \Big)\, dt + \int_0^x \frac{(-1)^n t^n}{1 + t}\, dt \\
&= \underbrace{x - \frac{x^2}{2} + \frac{x^3}{3} - \cdots + (-1)^{n-1} \frac{x^n}{n}}_{= p(x)} + \int_0^x \frac{(-1)^n t^n}{1 + t}\, dt.
\end{align*}

This is great! In particular, this gives the following formula for the error term:
\[ E_n(x) = f(x) - p(x) = \int_0^x \frac{(-1)^n t^n}{1 + t}\, dt. \]
Using the Generalised Mean Value Theorem for Integrals (Proposition 12.41), we obtain

\[ E_n(x) = \frac{1}{1 + c} \cdot \frac{(-1)^n x^{n+1}}{n + 1}, \]

where $c$ is some number between 0 and $x$. Next, if we restrict $x$ to, say, the interval $[-1/2, 1/2]$, this implies that $c \in [-1/2, 1/2]$, and so, we have $1/2 \le 1 + c \le 3/2$. In particular, this means that
\[ |E_n(x)| \le \frac{2}{n + 1} |x|^{n+1}. \]
This means that by the Uniqueness Theorem for Taylor Polynomials, we have that $p(x)$ is the Taylor polynomial for $\ln(1 + x)$ centered at $x = 0$.
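As a quick illustration of the estimate just obtained, the following Python sketch evaluates $p(x)$ and compares the actual error with $\frac{2}{n+1}|x|^{n+1}$ at a few points of $[-1/2, 1/2]$. The function name `log_taylor` is ours, and `math.log1p` is used as the reference value for $\ln(1 + x)$.

```python
import math

def log_taylor(n, x):
    """p(x) = x - x^2/2 + ... + (-1)^(n-1) x^n / n (hypothetical helper)."""
    return sum((-1) ** (k - 1) * x ** k / k for k in range(1, n + 1))

n = 8
for x in (-0.5, -0.2, 0.3, 0.5):
    err = abs(math.log1p(x) - log_taylor(n, x))
    bound = 2 * abs(x) ** (n + 1) / (n + 1)
    print(x, err, bound)  # the error should not exceed the bound
```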

Exercise 13.55 In this exercise we consider the above example.

(a) Modify the above argument to find an estimate for $|E_n(x)|$ that is valid for $x \in [-9/10, 9/10]$.
(b) Find an estimate for $|E_n(x)|$ that is valid for $x = 1$.
(c) Use (b) to prove Mengoli’s formula for the value of the Alternating Harmonic series.

Example 3: Inverse trigonometric functions


While some developments from ancient Greek and early European mathematics are fairly well known, Indian mathematics also has a rich history. In fact, it is reasonable to claim that mathematical analysis was invented by the Indian mathematician Madhava Sangamagrama around 1350 and not by Newton and Leibniz 300 years later. One of the main achievements by Madhava was the discovery of the infinite series representation of the inverse trigonometric functions!

Fig. 18. Madhava Sangamagrama (1350 – 1425) says "hello"!
In the exercises below, you are to use the techniques discussed on the previous pages
to determine the Taylor polynomials for the arctangent, arcsine and arccosine.
Exercise 13.56 We now deduce a formula for the Taylor polynomials of $\arctan(x)$.
(a) Let $f(x) = 1/(1 + x^2)$. Do as in Example 13.51, and express $f(x) = T_{2n}(x) + E_{2n}(x)$, where $T_{2n}$ is the Taylor polynomial of $f$ of order $2n$, centered at $x = 0$, and $E_{2n}$ the corresponding error function.
(b) Follow Example 13.54, and use (a) to determine the Taylor polynomials $T_{2n+1}$, centered at $x = 0$, for $\arctan(x)$. In particular, show that the error function satisfies
\[ |E_{2n+1}(x)| \le \frac{|x|^{2n+3}}{2n + 3}. \]
Exercise 13.57 We now ask you to find the Taylor polynomials of the function $f(x) = (1 + x)^{\alpha}$ centered at $x = 0$ (these are needed below to find the Taylor polynomials for the arcsine and arccosine). In doing this, the Pochhammer symbol may be helpful:
\[ (\beta)_k \overset{\text{def}}{=} \beta \cdot (\beta - 1) \cdot (\beta - 2) \cdots (\beta - k + 1), \quad \beta \in \mathbb{R}. \]
(a) As a warm up exercise, express $\binom{n}{m}$ as a quotient of two Pochhammer symbols.
(b) Compute the first few derivatives of $(1 + x)^{\alpha}$, and use the Pochhammer symbol to guess a formula for the n’th derivative.
(c) Prove the formula guessed in (b) using induction.
(d) Write out the formula for the n’th order Taylor polynomial for $f(x) = (1 + x)^{\alpha}$ with some suitable expression for the error term.
Exercise 13.58 (a) Determine the Taylor polynomials of arcsin(x) by using the uniqueness theorem, in combination with exercise 13.57 and the formula
\[ \frac{d}{dx} \arcsin(x) = \frac{1}{\sqrt{1 - x^2}}. \]
(b) Use some suitable trigonometric formula to immediately deduce the formula for the Taylor polynomials of arccos(x).

13.4 The Big-oh "calculus" for error terms


We now go all in on the Big-oh notation since it allows us to simplify, and thereby speed
up, the computation of limits using Taylor polynomials.

First computational rules and a motivating example


Before we consider an example, we discuss two easy-to-prove properties of the Big-oh that will be of immediate use to us. We formulate them in the case a = 0 since this is basically the only case that we will care about. The first rule is that
\[ f(x) = O(x^2) \text{ as } x \to 0 \implies \frac{f(x)}{x} = O(x) \text{ as } x \to 0. \]
This is true since, by the definition of the Big-oh, we have near x = 0 that
\[ \left| \frac{f(x)}{x} \right| = \frac{|O(x^2)|}{|x|} \le \frac{C|x^2|}{|x|} = C|x|. \]
The second property is that
\[ f(x) = O(x) \text{ as } x \to 0 \implies \lim_{x \to 0} f(x) = 0. \]
This follows from the Squeeze theorem since, close to x = 0, we have
\[ 0 \le |f(x)| = |O(x)| \le C|x|. \]
These two observations are usually expressed, a bit sloppily, in the following form:

Proposition 13.59
\[ \text{(i)} \quad \frac{O(x^2)}{x} = O(x) \qquad\qquad \text{(ii)} \quad \lim_{x \to 0} O(x) = 0. \]

Let us now consider an example where this is used.

Example 13.60 (Example 13.4, revisited, again) We consider the limit
\[ \lim_{x \to 0} \frac{\sin x}{x}. \]
It follows from, say, the Lagrange error formula that we can write
\[ \sin x = x + E_1(x) \quad \text{where } |E_1(x)| \le C|x|^2 \text{ for } x \text{ close to } 0. \]
But this means that by the definition of the Big-oh, we can write
\[ \sin x = x + O(x^2) \quad \text{as } x \to 0. \tag{13.6} \]
Finally, by the observations recorded in Proposition 13.59, we obtain
\[ \lim_{x \to 0} \frac{\sin x}{x} = \lim_{x \to 0} \frac{x + O(x^2)}{x} = \lim_{x \to 0} \bigl( 1 + O(x) \bigr) = 1 + 0 = 1. \]

Notice how the Big-oh notation allows us to make the computation in the above
example using exactly the information we need, but nothing more. As we will see,
keeping the level of detail to a minimum allows us to compute more efficiently.
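
The statement $\sin x = x + O(x^2)$ can also be checked numerically. The following is only a minimal sketch, not part of the course material: the function name `big_oh_ratio` and the choice of sample points are assumptions made for illustration. It prints the quotient $|\sin x - x|/x^2$, which should stay bounded near 0, next to $\sin x / x$, which should approach 1.

```python
import math

def big_oh_ratio(x: float) -> float:
    """Return |sin(x) - x| / x**2. By the definition of the Big-oh,
    sin x = x + O(x^2) means that this quotient stays bounded near x = 0."""
    return abs(math.sin(x) - x) / x**2

# Sample points approaching 0 (an arbitrary choice for illustration).
samples = [10.0 ** (-k) for k in range(1, 6)]

for x in samples:
    ratio = big_oh_ratio(x)       # bounded, so the error term is O(x^2)
    quotient = math.sin(x) / x    # equals 1 + O(x), so it approaches 1
    print(f"x = {x:.0e}  |sin x - x|/x^2 = {ratio:.3e}  sin x / x = {quotient:.12f}")
```

A table like this is of course no proof, but it makes the role of the constant C in the definition of the Big-oh concrete: any upper bound for the middle column works as C.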

Exercise 13.61 Try to compute the following “standard limits” using the Taylor approximations of order 1, 2 and 3 using the Big-oh notation for the error term. How does the answer depend on the order?
\[ \text{(a)} \ \lim_{x \to 0} \frac{\sin x}{x} \qquad \text{(b)} \ \lim_{x \to 0} \frac{\ln(1 + x)}{x} \qquad \text{(c)} \ \lim_{x \to 0} \frac{1 - \cos x}{x^2} \qquad \text{(d)} \ \lim_{x \to 0} \frac{e^x - 1}{x}. \]

Exercise 13.62 Complete the following table.

Table 13.63 (Some Taylor approximations with Big-oh error terms as $x \to 0$)

$e^x =$

$\sin x =$

$\cos x = 1 - \dfrac{x^2}{2} + \dfrac{x^4}{4!} - \cdots + (-1)^n \dfrac{x^{2n}}{(2n)!} + O(x^{2n+2})$

$\dfrac{1}{1 - x} =$

$\ln(1 + x) =$

$\arctan(x) = x - \dfrac{x^3}{3} + \dfrac{x^5}{5} - \cdots + (-1)^n \dfrac{x^{2n+1}}{2n + 1} + O(x^{2n+3})$

$(1 + x)^\alpha =$

$\arcsin x =$

$\arccos x =$

The rulebook for the Big-oh


Before we look at more advanced examples, let us take a moment to formulate and prove some computational rules for the Big-oh. Since we will only ever use these for the case when $x \to 0$, we only formulate the rules in this case (where we use the same, slightly sloppy notation, as in Proposition 13.59).

Proposition 13.64 (Computational rules) First, we formulate rules (i) and (ii), from above, in a more general way. Indeed, for $\alpha, \beta \in \mathbb{R}$, the following holds as $x \to 0$:

(i) $x^\alpha \cdot O(x^\beta) = O(x^{\beta + \alpha})$.

(ii) $\alpha > 0 \implies \lim_{x \to 0} O(x^\alpha) = 0$.

Moreover, the following holds:

(iii) $C \neq 0 \implies C \cdot O(x^\alpha) = O(x^\alpha)$.

(iv) $O(x^\alpha) \cdot O(x^\beta) = O(x^{\alpha + \beta})$.

(v) $\beta \le \alpha \implies x^\alpha + O(x^\beta) = O(x^\beta)$.

(vi) $O(x^\alpha) + O(x^\beta) = O(x^{\min\{\alpha, \beta\}})$.

Fig. 19. Notice how almost all of these rules are about how the Big-oh eats stuff up. As we will see, this means that the Big-oh acts a lot like Pacman!

Proof. We only prove rule (v), and leave the others as exercises. So, assume that $\beta \le \alpha$. The following computation then holds for all $x \in [-1,1] \setminus \{0\}$:
\[ |x^\alpha + O(x^\beta)| \le |x|^\alpha + |O(x^\beta)| \le |x|^\alpha + C|x|^\beta = \bigl( |x|^{\alpha - \beta} + C \bigr) \cdot |x|^\beta \le (1 + C)|x|^\beta. \]
Note that in the last step, we used the fact that $\alpha - \beta \ge 0$ and that $|x| \le 1$. This is fine since the Big-oh only demands that the estimate holds (at least) on a punctured neighbourhood of x = 0.
Exercise 13.65 Prove the remaining rules of the above proposition.
Hint: Rules (i) and (ii) are proved just like their counterparts in Proposition 13.59. Rules (iii) and (iv) follow almost immediately from the definition of the Big-oh. Finally, the proof of rule (vi) is more or less the same as that of rule (v).
Exercise 13.66 While efficient, the slightly sloppy notation we use to express the computational rules for the Big-oh is not without flaws. For instance, we could claim both that $O(x^3) = O(x^2)$ is true, but that $O(x^2) = O(x^3)$ is not. Can you explain how to make sense of this apparent nonsense?

Example 1: How to manipulate expressions involving the Big-oh


To illustrate how the Pac-man rules for the Big-oh work in practice, we begin with some
examples where we use the uniqueness theorem for Taylor polynomials to identify Taylor
polynomials of various functions.

Example 13.67 Let us try to find the third order Taylor polynomial (centered at x = 0)
of
f (x) = sin x cos x.
A naive approach would be to just multiply the third order Taylor polynomials of sin x and cos x and hope for the best. So, let us do this! Multiplying
\[ \sin x = x - \frac{x^3}{3!} + O(x^5) \quad \text{and} \quad \cos x = 1 - \frac{x^2}{2} + O(x^4) \]
we get
\[
\begin{aligned}
\sin x \cos x &= \Bigl( x - \frac{x^3}{3!} + O(x^5) \Bigr) \Bigl( 1 - \frac{x^2}{2} + O(x^4) \Bigr) \\
&= x - \frac{x^3}{2} - \frac{x^3}{3!} + \frac{x^5}{2 \cdot 3!} \\
&\quad + x \cdot O(x^4) - \frac{x^3}{3!} \cdot O(x^4) + O(x^5) \cdot O(x^4) \\
&\quad + O(x^5) - \frac{x^2}{2} \cdot O(x^5).
\end{aligned}
\tag{13.7}
\]
This looks absolutely horrible! But not to worry. When we use the Pacman-rules of the Big-oh, the term $O(x^5)$ eats up all terms with x of the same or higher exponent, resulting in
\[ \sin x \cos x = x - \frac{x^3}{2} - \frac{x^3}{3!} + O(x^5) = x - \frac{2x^3}{3} + O(x^5). \]
By the uniqueness theorem, we have actually found the Taylor polynomial of sin x cos x of order 4 (!) centered at x = 0.
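
The bookkeeping in this example (multiply everything out, then let $O(x^5)$ swallow every power $x^5$ and higher) can be mimicked by multiplying coefficient lists and simply discarding high powers. The following is a minimal sketch under that interpretation; the helper name `multiply_truncated` and the list representation of polynomials are assumptions made for illustration, not notation from these notes.

```python
from fractions import Fraction

def multiply_truncated(p, q, order):
    """Multiply two polynomials given as coefficient lists (index = power of x),
    discarding every power x**order and higher, as the term O(x**order) would."""
    result = [Fraction(0)] * order
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            if i + j < order:
                result[i + j] += a * b
    return result

# sin x = x - x^3/6 + O(x^5): coefficients of x^0, ..., x^4.
sin_coeffs = [Fraction(0), Fraction(1), Fraction(0), Fraction(-1, 6), Fraction(0)]
# cos x = 1 - x^2/2 + O(x^4): coefficients of x^0, ..., x^3.
cos_coeffs = [Fraction(1), Fraction(0), Fraction(-1, 2), Fraction(0)]

# Coefficients of x^0, ..., x^4 in sin x cos x.
product = multiply_truncated(sin_coeffs, cos_coeffs, order=5)
print(product)
```

The unknown $x^4$ coefficient of cos x, hidden in $O(x^4)$, does no harm here: in the product it only meets the constant term of sin x, which is 0, or produces powers $x^5$ and higher. The printed coefficients are those of $x - 2x^3/3$.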

Exercise 13.68 Let f(x) = sin x cos x. Without computing any derivatives, use the above example to determine the values of $f'(0)$, $f''(0)$, $f'''(0)$ and $f^{(4)}(0)$.

Here are two additional examples.



Example 13.69 We now compute the general expression for the Taylor polynomial of
\[ f(x) = x \arctan(x^2) \]
centered at x = 0. To this end, we first recall that
\[ \arctan(x) = x - \frac{x^3}{3} + \frac{x^5}{5} - \cdots + (-1)^n \frac{x^{2n+1}}{2n + 1} + O(x^{2n+3}). \]
But this means that
\[ \arctan(x^2) = x^2 - \frac{x^6}{3} + \frac{x^{10}}{5} - \cdots + (-1)^n \frac{x^{4n+2}}{2n + 1} + O(x^{4n+6}), \]
and finally, where we need to use rule (i) of the rulebook for the Big-oh, we get
\[ x \arctan(x^2) = x^3 - \frac{x^7}{3} + \frac{x^{11}}{5} - \cdots + (-1)^n \frac{x^{4n+3}}{2n + 1} + O(x^{4n+7}). \]

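Example 13.70 We now compute the Taylor polynomial of order 5 of
\[ f(x) = \sin x - x \cos x, \]
centered at x = 0. By considering the expression for f, we need to use the Taylor polynomials of sin x and cos x of orders 5 and 4, respectively. These are:
\[ \sin x = x - \frac{x^3}{3!} + \frac{x^5}{5!} + O(x^7), \qquad \cos x = 1 - \frac{x^2}{2} + \frac{x^4}{4!} + O(x^6). \]
Using the rulebook for the Big-oh, and keeping track only of the terms up to order 3, we get
\[
\begin{aligned}
\sin x - x \cos x &= \Bigl( x - \frac{x^3}{6} + O(x^5) \Bigr) - x \cdot \Bigl( 1 - \frac{x^2}{2} + O(x^4) \Bigr) \\
&= x - \frac{x^3}{6} + O(x^5) - x + \frac{x^3}{2} + O(x^5) = \frac{x^3}{3} + O(x^5).
\end{aligned}
\]
Note that this truncated computation only determines f up to $O(x^5)$, that is, it gives the Taylor polynomial of order 4. Carrying the terms $x^5/5!$ and $x^4/4!$ through the same computation gives the coefficient of $x^5$, namely $1/120 - 1/24 = -1/30$.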

Exercise 13.71 Find the third order Taylor approximation centered at x = 0 of:
\[ \text{(a)} \ f(x) = \ln(1 + x) - \sin x \qquad \text{(b)} \ f(x) = \Bigl( 1 - x + \frac{x^2}{2} \Bigr) e^x \qquad \text{(c)} \ f(x) = \sin(\sin x). \]

Example 2: How to compute limits using the Big-oh


We now discuss how to compute limits using Taylor polynomials with Big-oh error term.

Example 13.72 Let us consider the limit
\[ \lim_{x \to 0} \frac{x \arctan(x^2)}{\sin x - x \cos x}. \tag{13.8} \]
This limit is of the type [0/0], and from the plot in Fig. 20, we have reason to believe that the limit ought to be 3 (or something really close to 3).

Fig. 20. The graph of the expression inside the limit (13.8).

The idea is to replace the functions in both the numerator and denominator by Taylor polynomials to obtain an expression of the form
\[ \lim_{x \to 0} \frac{\text{strong main term} + \text{weak error term}}{\text{strong main term} + \text{weak error term}}. \]
The point is that this puts us in an excellent position to use Rule of Thumb 2 from Chapter 4 (let the strong main terms fight each other!). In this case, we use the expressions found in examples 13.69 and 13.70, in combination with rules (i) and (ii) from the rulebook of the Big-oh, to get
\[ \lim_{x \to 0} \frac{x \arctan(x^2)}{\sin x - x \cos x} = \lim_{x \to 0} \frac{x^3 + O(x^7)}{\frac{x^3}{3} + O(x^5)} = \lim_{x \to 0} \frac{x^3}{x^3} \cdot \frac{1 + O(x^4)}{\frac{1}{3} + O(x^2)} = \frac{1 + 0}{\frac{1}{3} + 0} = 3, \]
and we are done!
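
As a sanity check on the value 3, one can evaluate the quotient numerically for a few small values of x. This is only a rough sketch; the function name `quotient` and the sample points are assumptions for illustration, and x should not be taken extremely small, since the denominator then suffers from floating point cancellation.

```python
import math

def quotient(x: float) -> float:
    """The expression inside the limit (13.8): x*arctan(x^2) / (sin x - x cos x)."""
    return x * math.atan(x**2) / (math.sin(x) - x * math.cos(x))

# The values should approach 3 as x decreases towards 0.
for x in [0.5, 0.1, 0.05, 0.01]:
    print(f"x = {x:<5}  quotient = {quotient(x):.8f}")
```

Such a computation can suggest the value of a limit, but only the Big-oh argument above actually proves it.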

Exercise 13.73 What happens if we increase the order of any of the Taylor approx-
imations in Example 13.72? In particular, add one term to each of the three Taylor
approximations (and adjust the corresponding Big-oh terms accordingly). How does
this affect the computation of the limit? Is there any way to recognise that you used
"too many" terms?

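Exercise 13.74 (From previous exams) Compute the limits
\[ \text{(a)} \ \lim_{x \to 0} \frac{\ln(1 + x) - \sin x}{x^2} \qquad\qquad \text{(b)} \ \lim_{x \to 0} \frac{(1 + x^2)^{-1/2} - \cos x}{x \sin x - \ln(1 + x^2)} \]
\[ \text{(c)} \ \lim_{x \to 0} \frac{\ln(1 + x^2) - \arctan(x)^2}{x^4} \qquad\qquad \text{(d)} \ \lim_{x \to 0} \frac{\sin^2 x - \ln(1 + x^2)}{x^2 - x \arctan x} \]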

Example 3: Using Big-oh to study infinite series and improper integrals

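Example 13.75 Determine whether the following infinite series converges or diverges:
\[ \sum_{k=1}^{\infty} \frac{\frac{1}{k} - \arctan\frac{1}{k}}{\sin\frac{1}{k}}. \]

Intuitive step: We need to figure out how the terms in the series behave for large k. This means that 1/k is small, and it makes sense to consider the Taylor approximations of arctan x and sin x as x tends to 0. To express both the numerator and denominator in the form “main term” + “error term”, we make the following observation:
\[
\begin{cases} \arctan x = x - \dfrac{x^3}{3} + O(x^5) \\[1ex] \sin x = x + O(x^3) \end{cases} \text{ as } x \to 0
\implies
\begin{cases} \arctan\dfrac{1}{k} \approx \dfrac{1}{k} - \dfrac{1}{3k^3} \\[1ex] \sin\dfrac{1}{k} \approx \dfrac{1}{k} \end{cases} \text{ as } k \to \infty.
\]
This implies that
\[ \frac{\frac{1}{k} - \arctan\frac{1}{k}}{\sin\frac{1}{k}} \approx \frac{\bigl( \frac{1}{3k^3} \bigr)}{\bigl( \frac{1}{k} \bigr)} = \frac{1}{3k^2}, \quad \text{as } k \to \infty, \]
and so, we expect that the series will behave like the convergent $\alpha$-series $\sum_{k=1}^{\infty} 1/k^2$.

Formal step: We use the limit comparison test, making sure to include Big-oh terms:
\[
\begin{aligned}
\lim_{k \to \infty} \frac{\Bigl( \frac{\frac{1}{k} - \arctan\frac{1}{k}}{\sin\frac{1}{k}} \Bigr)}{\bigl( \frac{1}{k^2} \bigr)}
&= \lim_{k \to \infty} k^2 \cdot \frac{\frac{1}{k} - \arctan\frac{1}{k}}{\sin\frac{1}{k}} \\
&= \lim_{k \to \infty} k^2 \cdot \frac{\frac{1}{3k^3} + O\bigl( \frac{1}{k^5} \bigr)}{\frac{1}{k} + O\bigl( \frac{1}{k^3} \bigr)}
= \lim_{k \to \infty} \frac{\frac{1}{3} + O\bigl( \frac{1}{k^2} \bigr)}{1 + O\bigl( \frac{1}{k^2} \bigr)}
= \frac{\frac{1}{3} + 0}{1 + 0} = \frac{1}{3}.
\end{aligned}
\]

In conclusion, by the limit comparison test, the original series is convergent.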
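
The two steps of this example can be illustrated numerically. The sketch below is only an illustration under assumed names (`term`, `partial_sums`): it prints $k^2$ times the k'th term, which should approach 1/3, together with a few partial sums, which should stabilise if the series converges.

```python
import math

def term(k: int) -> float:
    """The k'th term (1/k - arctan(1/k)) / sin(1/k) of the series in Example 13.75."""
    x = 1.0 / k
    return (x - math.atan(x)) / math.sin(x)

# Limit comparison with 1/k^2: the products k^2 * term(k) should approach 1/3.
for k in [10, 100, 1000]:
    print(f"k = {k:<5}  k^2 * term(k) = {k**2 * term(k):.8f}")

# Partial sums for a few cut-offs; they should change less and less.
partial_sums = {n: sum(term(k) for k in range(1, n + 1)) for n in [10, 100, 1000]}
print(partial_sums)
```

The numbers only support the conclusion; the convergence itself follows from the limit comparison test as carried out above.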

Exercise 13.76 (From previous exam) Determine whether the following series and generalised integrals converge:
\[ \text{(a)} \ \sum_{k=1}^{\infty} k \cdot \Bigl( \frac{1}{k} - \sin\frac{1}{k} \Bigr) \qquad\qquad \text{(b)} \ \int_1^{\infty} \Bigl( \frac{1}{x} - \arctan\frac{1}{x} \Bigr) \, dx. \]

Exercise 13.77 (From previous exam) Use Taylor approximations with Big-oh error term to determine for what values of a the following infinite series converges:
\[ \sum_{k=1}^{\infty} \Bigl( \sqrt{k^{2a} + 1} - k^a \Bigr). \]

Example 4: The Big-oh in Physics – The ultraviolet catastrophe


When an object is heated, it emits electromagnetic radiation. Stars are among a class of objects called (strangely enough) black bodies. These emit electromagnetic radiation at every wavelength, with the intensity depending on the wavelength.

According to the Rayleigh-Jeans law (late 19th century), the energy density of this radiation at the wavelength $\lambda$ is given by the function
\[ f_{RJ}(\lambda) = \frac{8\pi k T}{\lambda^4}. \]
(The wavelength $\lambda$ is measured in meters, the temperature T in Kelvin, and k is Boltzmann's constant.) However, this law has a problem: while it agrees with observations for large $\lambda$, it fails in a spectacular fashion as $\lambda \to 0$. Indeed, it predicts that $f_{RJ}(\lambda) \to \infty$. Not only did this not agree with observations, it is quite clear that nothing can emit an infinite amount of energy, so, by the physicists at the time, this was known as the ultraviolet catastrophe.

Fig. 21. Here, we see both the energy densities predicted by the Rayleigh-Jeans law, and the later Planck's law, for our sun (T = 5700).

In 1900, the physicist Max Planck suggested the following model for the energy density of black body radiation:
\[ f_P(\lambda) = \frac{8\pi h c \, \lambda^{-5}}{e^{hc/(\lambda k T)} - 1}. \]
(Here, $\lambda$, T, k are as above, c is the speed of light, and h is the Planck constant.)
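
To get a feeling for how the two laws compare, one can simply evaluate both formulas. The following is a minimal numerical sketch, not a derivation: the function names are assumptions for illustration, the constants are the SI values of h, c and k, and T = 5700 is the temperature used in Fig. 21.

```python
import math

H = 6.62607015e-34   # Planck constant in J*s
C = 299792458.0      # speed of light in m/s
K = 1.380649e-23     # Boltzmann constant in J/K

def rayleigh_jeans(wavelength: float, temperature: float) -> float:
    """Energy density according to the Rayleigh-Jeans law."""
    return 8 * math.pi * K * temperature / wavelength**4

def planck(wavelength: float, temperature: float) -> float:
    """Energy density according to Planck's law."""
    u = H * C / (wavelength * K * temperature)
    # math.expm1(u) computes e^u - 1 accurately, also when u is small.
    return 8 * math.pi * H * C * wavelength**-5 / math.expm1(u)

T = 5700.0  # temperature in Kelvin, as in Fig. 21
for wavelength in [1e-7, 1e-6, 1e-5, 1e-4, 1e-3]:  # in meters
    ratio = planck(wavelength, T) / rayleigh_jeans(wavelength, T)
    print(f"wavelength = {wavelength:.0e} m   Planck / Rayleigh-Jeans = {ratio:.6f}")
```

The ratio is close to 1 for the longest wavelengths in the list and very small for the shortest one, in line with the description above.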

Exercise 13.78 (Optional)

(a) Use L'Hopital's rule to verify that $f_P(\lambda)$ resolves the ultraviolet catastrophe.
(b) Use a Taylor polynomial with Big-oh error term to show that for large wavelengths, Planck's law becomes more and more like the Rayleigh-Jeans law.
(c) For what value of $\lambda$ is $f_P$ the largest for our sun? Google a bit, and use realistic values for the various constants appearing in the expression. (Hint: The answer is kind of indicated in the figure above.)
(d) As in (c), estimate the maximum of $f_P$ for the stars Betelgeuse (T = 3400), Procyon (T = 6400) and Sirius (T = 9200). In particular, why do you think Sirius is thought of as a "blue" star, while Betelgeuse is thought of as a "red" star?

Example 5: The Big-oh in Physics – Special relativity


A rather nice example of the use of the Big-oh notation is found in Einstein’s theory of
special relativity.

Fig. 22. Albert Einstein (1879 – 1955). Personally, I choose to believe that there is
a Big-oh lurking behind the guy. (A prize to the one who finds an actual Big-oh on
a blackboard behind Einstein – such a photo has to exist!)

In Newtonian physics, the energy of an object is given by the formula
\[ E_{cl} = \frac{1}{2} m v^2. \]
(Here, we are setting the potential energy equal to 0 for the sake of simplicity.) The higher the speed, the higher the energy. If the speed is 0, then you have no energy.
In the theory of (special) relativity, the formula for energy is different, and is given by the formula
\[ E_{rel} = \Bigl( 1 - \frac{v^2}{c^2} \Bigr)^{-1/2} m c^2, \]
where v is the speed of the object, and c is the speed of light (again, we are ignoring the potential energy by setting it equal to 0).

Exercise 13.79 (Optional)

(a) What happens with $E_{rel}$ as $v \to c$?
(b) Determine the first order Taylor expansion $(1 - x)^{-1/2} = P_1(x) + E_1(x)$, and express the error term using Big-oh error term.
(c) Replace $(1 - v^2/c^2)^{-1/2}$ in the formula for relativistic energy using the Taylor expansion with Big-oh error found in (b). This yields an expression with 3 terms. Can you interpret these three? In particular, what happens when v/c is small?
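
A numerical experiment can be a useful companion to part (c). The sketch below is only an illustration with assumed names and an assumed mass of 1 kg: it compares the classical energy with the part of the relativistic energy that remains after subtracting $mc^2$, for a few values of v/c.

```python
C = 299792458.0  # speed of light in m/s

def classical_energy(m: float, v: float) -> float:
    """E_cl = m v^2 / 2."""
    return 0.5 * m * v**2

def relativistic_energy(m: float, v: float) -> float:
    """E_rel = (1 - v^2/c^2)^(-1/2) * m c^2."""
    return (1.0 - v**2 / C**2) ** -0.5 * m * C**2

m = 1.0  # mass in kg (an arbitrary choice for illustration)
for beta in [0.5, 0.1, 0.01]:  # beta = v/c
    v = beta * C
    remainder = relativistic_energy(m, v) - m * C**2
    print(f"v/c = {beta:<5}  (E_rel - m c^2) / E_cl = {remainder / classical_energy(m, v):.6f}")
```

Much smaller values of v/c are best avoided in such an experiment, since the subtraction of two nearly equal floating point numbers then loses accuracy.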

13.5 Relevant exercises from previous exams


Exercises on Taylor expansions with explicit error estimates
Exercise 13.80 (Exam 2016-01-07)

(a) Show that for x > 0 (this is not a misprint), we have
\[ \sqrt{1 + x^4} = x^2 + \frac{1}{2x^2} - \frac{1}{8x^6} + E(x) \quad \text{with} \quad |E(x)| < \frac{1}{16 x^{10}}. \]
(b) Use (a) to approximate the generalised integral
\[ \int_1^{\infty} \Bigl( \sqrt{1 + x^4} - x^2 \Bigr) \, dx \]
with an error less than $10^{-2}$.

Exercise 13.81 (Exam 2015-08-17)

(a) Determine the Taylor expansion of order 4n + 1 with error term for the function
\[ f(x) = \int_0^x \frac{1}{1 + 2t^4} \, dt. \]
(b) For which x can you guarantee that the error term goes to 0 as $n \to \infty$? Bonus question: Show that the Taylor expansion diverges as $n \to \infty$ for the remaining x.

Exercise 13.82 (Exam 2015-05-27)

(a) Formulate Taylor's formula centered at x = 0 with an explicit estimate for the error (make sure you state under which assumptions the statement holds).
(b) Show that
\[ \int_0^x \cos(t^2) \, dt = x - \frac{x^5}{10} + R(x) \quad \text{with} \quad |R(x)| < \frac{x^9}{216}. \]
(c) Determine an approximation for $\int_0^{0.1} \cos(t^2) \, dt$ with an error of at most 10 decimal digits. Your approximation should be given in the form of a rational number (i.e., a fraction).

Exercise 13.83 (Exam 2014-12-18)

(a) Explain why the length of the graph of y = f(x) as $x \in [a,b]$ is given by the formula
\[ \int_a^b \sqrt{1 + f'(x)^2} \, dx. \]
(b) Show that
\[ (1 + u)^{1/2} = 1 + \frac{1}{2} u - \frac{1}{8} u^2 + R(u) \quad \text{with} \quad |R(u)| < \frac{u^3}{16} \ \text{for } u \ge 0. \]
(c) Use these results to approximate the length of the curve y = sin x as $x \in [0, \pi/2]$, and give the smallest estimate for this error that you are able to.

Exercise 13.84 (Exam 2014-10-04)

(a) Formulate Taylor's formula with explicit error term estimate for a function f(x) when the approximation is centered at x = 0.
(b) Given a function with the graph shown in Figure 23, what can we say about the three first coefficients of its Taylor expansion centered at x = 0?
(c) Determine what order of the Taylor polynomial of $f(x) = \sqrt{4 + x^2}$ is needed for the error to be less than $1/10^5$ on the interval $[-1,1]$.

Fig. 23. Illustration for exercise 13.84.

Exercise 13.85 (Exam 2014-08-18) From a previous exercise on this exam we know that
\[ \int_0^{1/2} \sqrt{1 - x^2} \, dx = \frac{\sqrt{3}}{8} + \frac{\pi}{12}. \]
We are now supposed to use this to find a rational number that approximates $\pi$.

(a) Use a suitable Taylor expansion to determine a rational number that approximates $\sqrt{3}$ with an error of at most 1/100.
(b) Use a suitable Taylor expansion to determine a rational number that approximates the integral above with the same accuracy.
(c) Use parts (a) and (b) to give a rational number that approximates $\pi$. What is the error of this approximation?
Exercise 13.86 (Exam 2014-05-26)
(a) Determine a curve for which the following integral gives the length:
\[ \int_0^{\pi/6} \sqrt{1 + \sin^2(x)} \, dx. \]
(b) Find a fraction that approximates the value of this length with an error of at most 1/10. (Bonus points if you manage to approximate with an error less than 1/200.)
Exercise 13.87 (Exam 2014-01-09) As all exercises, this gives at most 5 points. Here, we are going to approximate the value of
\[ \int_0^1 \cos(x^2) \, dx. \]
Do one of the following:
(a) Show that $\cos x = 1 + R_1(x)$ with $|R_1(x)| \le x^2/2$. Use this to give an approximation of the integral and an estimate of the error. (3 points)
(b) Show that $\cos x = 1 - x^2/2 + R_2(x)$ with $|R_2(x)| \le x^4/24$. Use this to give an approximation of the integral and an estimate of the error. (4 points)
(c) Give a rational number approximating the above integral with an error less than 1/1000. (5 points)
Exercise 13.88 (Exam 2013-12-18)
(a) Show that
\[ \frac{1}{1 + x^2} = 1 - x^2 + x^4 - x^6 + \cdots + (-1)^n x^{2n} + R_n(x) \quad \text{with} \quad |R_n(x)| \le x^{2n+2}. \]
(b) Use part (a) to find the formula for the Taylor expansion of arctan(x). In particular, what inequality for the error term do you get?
(c) Use this to find a rational number approximating $\pi$ with an error less than 1/100. (This rational number can be given as a sum of fractions.)
Exercise 13.89 (Exam 2013-05-29) Motivate why the function $\sin(x^2)/x^2$ can be integrated over the interval $[-1,1]$, and determine a rational approximation of
\[ \int_{-1}^{1} \frac{\sin(x^2)}{x^2} \, dx \]
by using Taylor approximation. Give this approximation of the integral both with

(i) an error less than 1/100, and

(ii) an error less than some unknown, but fixed, $\alpha > 0$.

(The answers may be given as sums of fractions.)

Exercises on Taylor expansions with Big-oh error estimates


The exercises on limits, below, are usually among the first on the exam. Those on series
are usually placed in the middle portion of the exam.

Exercise 13.90 (Exam 2016-01-23, part of exercise) Determine whether the following series converges:
\[ \sum_{k=1}^{\infty} k \Bigl( 1 - \cos\frac{1}{k} \Bigr) \]

Exercise 13.91 (Exam 2016-01-23)

(a) Determine the limit
\[ \lim_{x \to 0} \frac{\cos^2(x) - \sqrt{1 - 2x^2}}{x^2 \sin(x^2)} \]
(b) For what value of L is the following function continuous at x = 0?
\[ f(x) = \begin{cases} \dfrac{\cos^2(x) - \sqrt{1 - 2x^2}}{x^2 \sin(x^2)} & x \neq 0 \\[1ex] L & x = 0 \end{cases} \]

Exercise 13.92 (Exam 2016-01-07)

(a) Define what we mean by $f(x) = g(x) + O(h(x))$ as $x \to 0$.
(b) Find the Taylor approximation of order 3 centered at x = 0, with Big-oh error term, for
\[ \ln(1 - \sin x) \]
(c) Use (b) to determine the limit
\[ \lim_{k \to \infty} \Bigl( 1 - \sin\frac{1}{k} \Bigr)^k. \]

Exercise 13.93 (Exam 2016-01-07, part of exercise) Determine whether the following series diverges or converges:
\[ \sum_{k=1}^{\infty} \Bigl( \sin\frac{1}{k} - \arctan\frac{1}{k} \Bigr). \]

Exercise 13.94 (Exam 2015-08-17) Determine the limit


⇣ 1 1 ⌘
ln(1 + x) arctan x
lim
x!0 x

Exercise 13.95 (Exam 2015-05-27)

(a) Define what we mean when we write $r(x) = O(x^3)$ as $x \to 0$.
(b) Determine the limit
\[ \lim_{x \to 0} \frac{e^{(x^2)} - x \arctan x - 1}{x^3 \sin x} \]
Exercise 13.96 (Exam 2014-12-18) Determine the limit
\[ \lim_{x \to 0} \frac{\cos x - \sqrt{1 - x^2}}{x^4} \]

Exercise 13.97 (Exam 2014-05-26, part of exercise) Determine whether the following series converges or diverges.
\[ \sum_{k=1}^{\infty} \frac{\tan\bigl( \frac{1}{k} \bigr) - \frac{1}{k}}{\sqrt{\sin\frac{1}{k}}}. \]

Exercise 13.98 (Exam 2014-01-09) Let
\[ f(x) = \frac{x (e^x - 1)}{\cos(x) + \sin(2x^2) - 1}. \]
(a) Determine the limit of f(x) as $x \to 0$.
(b) What do we mean when we say that a function is continuous at a point x = a? How should we define the above function at x = 0 for it to be continuous there?
Exercise 13.99 (Exam 2013-12-18) Determine for what real numbers a the following limit exists:
\[ \lim_{x \to 0} \frac{\cos(x) \sin(ax) - \tan(x)}{x^3}. \]
Exercise 13.100 (Exam 2013-08-21, part of exercise) Determine the limit
1 cos x
lim
x!0 1 e x2

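Exercise 13.101 (Exam 2013-08-21, part of exercise) Investigate the convergence of
\[ \text{(a)} \ \sum_{k=2}^{\infty} \Bigl( \sqrt{k + 1} - \sqrt{k - 1} \Bigr) \qquad\qquad \text{(b)} \ \sum_{k=2}^{\infty} k^{3/2} \Bigl( \frac{1}{k} - \sin\frac{1}{k} \Bigr). \]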

Exercise 13.102 (Exam 2013-05-29) Determine the limit
\[ \lim_{x \to 0} \frac{\tan x - \arctan x}{\sin(x) - x \cos(x)}. \]

Exercise 13.103 (Exam 2013-01-09) Determine the limit
\[ \lim_{x \to 0} \frac{1 - \cos(\arctan x)}{(\arctan x)^2}. \]

Exercise 13.104 (Exam 2012-12-19) Determine the limit
\[ \lim_{x \to 0} \frac{x - \tan x}{x - \sin x}. \]

13.6 Answers to selected exercises


13.3 4 terms, which gives the approximation $\sin(1) \approx 1 - 1/3! + 1/5! - 1/7!$.

13.5 You get the same answer as with T1 (all additional terms just give extra zeroes).
13.7 6 terms, which gives the approximation $\frac{1}{\sqrt{2\pi}} \int_0^1 e^{-t^2/2} \, dt \approx 1 - 1/(3 \cdot 2) + 1/(5 \cdot 2^2 \cdot 2!) - 1/(7 \cdot 2^3 \cdot 3!) + 1/(9 \cdot 2^4 \cdot 4!) - 1/(11 \cdot 2^5 \cdot 5!)$.

13.12 (a) $T_2(x) = 1 + x + x^2/2$, (b) $T_2(x) = x - x^2/2$, (c) $T_2(x) = x$.


13.16 (a) $\sqrt{2} \approx T_1(1) = 1 + 1/2 = 1.5$, (b) $\sqrt{2} \approx T_2(1) = 1 + 1/2 - 1/8 = 1.375$, (c) $\sqrt{43} \approx T_1(43) = 6 + 7/12 = 6.5833...$, (d) $\sqrt{43} \approx T_2(43) = 6 + 7/12 - 7^2/1728 = 6.5549...$

13.22 (a) $T_n(x) = \sum_{k=0}^{n} x^k/k!$, (b) $T_n(x) = \sum_{k=1}^{n} (-1)^{k-1} x^k/k$, (c) $T_{2n+1}(x) = \sum_{k=0}^{n} (-1)^k x^{2k+1}/(2k+1)!$, (d) $T_{2n}(x) = \sum_{k=0}^{n} (-1)^k x^{2k}/(2k)!$.

13.23 (a) To match the first 2 digits, you need 4 terms, which gives the approximation $\sqrt{2} \approx 1 + 1/2 - 1/8 + 1/16$. (b) If, in (b), you use the Taylor polynomial of $\sqrt{x}$ centered at x = 36, then the approximation $\sqrt{43} \approx T_2(43) = 6 + 7/12 - 7^2/1728 = 6.5549...$ gives 3 correct digits (the true value seems to be 6.5574...).

13.26 (b) No and no.

13.27 (a) 1, (b) 1, (c) 1.

13.29 C can be chosen to be the maximum of $|f''(x)|/2$ on your favourite neighbourhood of x = 0.

13.30 It suffices to apply the error estimate for $x \in [-1/2, 0]$. Using this, in combination with the improved error estimate, we find that the total error will not be larger than approximately 0.00997....

13.31 The error is less than the maximum of $|f''(c)|/2! = \frac{1}{8(1+c)^{3/2}}$ for c in [0,1]. That is, the error is less than 1/4.

13.33 (a): (ii) Use the relation $E_2 = f - T_2$ followed by the connection $T_2 = T_1 + f''(a)(x - a)^2/2$. (iii) Replace $E_1$ by $\int_a^x f''(t)(x - t) \, dt$ in the expression from (ii), and then integrate by parts (differentiating $f''$ and integrating $(x - t)$). (iv) Do the same as in (iii), but now for general n.

13.38 Using order n = 1 gives an error of slightly less than 0.004.

13.39 (b) According to the error estimate, putting n = 3 works.



13.40 (There is an error in the formula: the term $(-1)^n$ should not be there.) The point of the exercise is to show (by induction) that
\[ f^{(n)}(x) = \frac{n!}{(1 - x)^{n+1}}. \]

13.41 (b) n = 18.

13.43 (a) n = 6 will do, (b) the estimated error is roughly $4.1 \cdot 10^{-8}$, while the distance computed between the integral and the estimate obtained using Taylor polynomials with n = 6 is roughly $3.9 \cdot 10^{-8}$.

13.44 The point is to write
\[ \int_0^1 \cos(x^2) \, dx = \underbrace{\int_0^1 T_{2n}(x^2) \, dx}_{\text{main term}} + \underbrace{\int_0^1 E_{2n}(x^2) \, dx}_{\text{error term}}. \]
If we are so lucky that the main term is equal to 9/10 for some value of n, then all we need to do is figure out if we can tell which sign the error term has...

13.45 (a) Use the Taylor formula with Lagrange error function, (b) the point is to observe that
\[ \int_{\pi/3}^{\pi/2} \sin(\cos t) \, dt = \underbrace{\int_{\pi/3}^{\pi/2} \Bigl( \cos t - \frac{\cos^3 t}{6} \Bigr) dt}_{\text{main term}} + \underbrace{\int_{\pi/3}^{\pi/2} R(\cos t) \, dt}_{\text{error term}}. \]

13.46 (a) This follows by using the Taylor formula with Lagrange error term, (b) $(1/64) \cdot 10^{-8}$.

13.52 C = 1/(1 ) for sufficiently small (well, strictly smaller than 1).

13.53 The one from the example has no unknown "c", and should therefore be easier to
handle.

13.55 (a) $|E(x)| \le 10 |x|^{n+1}/(n + 1)$, (b) $|E(x)| \le |x|^{n+1}/(n + 1)$, (c) the point is that $\sum_{k=1}^{\infty} (-1)^{k-1}/k = \lim_{n \to \infty} T_n(1) + \lim_{n \to \infty} E_n(1)$, and with an appropriate estimate on the error function, we can see that the last term goes to zero.

13.56 (a) It follows from exercise 13.56 that
\[ \frac{1}{1 + x^2} = P_n(-x^2) + R_n(-x^2), \]
where $P_n$ is the Taylor expansion of $1/(1 - x)$ of order n centered at x = 0, and $R_n$ is the corresponding error function (here, we use the notations $P_n$ and $R_n$ since $T_n$ and $E_n$ are to be used for the Taylor polynomials and error functions for arctan(x)).
(b) The Taylor polynomials for arctan(x) are $T_{2n+1} = \sum_{k=0}^{n} (-1)^k x^{2k+1}/(2k + 1)$ with error term given by
\[ E_{2n+1}(x) = \int_0^x R_n(-t^2) \, dt. \]
Basically, playing around with this integral expression as in Example 13.54 solves the exercise.
13.57 (a) $\binom{n}{m} = (n)_m/m! = (n)_m/(m)_m$, (b) $f^{(n)}(x) = (\alpha)_n (1 + x)^{\alpha - n}$, (c) do as in example 13.21, (d) $T_n(x) = \sum_{k=0}^{n} (\alpha)_k x^k/k!$.

13.58 (a) $T_{2n+1}(x) = \sum_{k=0}^{n} (-1)^k \frac{(-1/2)_k}{k! \cdot (2k+1)} x^{2k+1}$, (b) use $\arccos(x) = \pi/2 - \arcsin(x)$.

13.61 (a) answer 1 for orders 1,2,3, (b) 1 for orders 1,2,3, (c) no answer for order 1, for
order 2,3 answer is 1/2 (d) answer 1 for orders 1,2,3.
13.65 Here is a proof for (iii):
\[ |C \cdot O(x^\alpha)| \le |C| \cdot D \cdot |x^\alpha| = E \cdot |x^\alpha|, \]
where C, D, E are constants. This implies, by definition, that $C \cdot O(x^\alpha) = O(x^\alpha)$.
13.66 The point is that the formulas are formulated in a sloppy way. If we are to be careful, they should be formulated as
\[ f(x) = O(x^3) \implies f(x) = O(x^2), \]
and
\[ f(x) = O(x^2) \implies f(x) = O(x^3). \]
Now, the first implication is always true, while the second one is not (why?).
13.71 (a) $-x^2/2 + x^3/2 + O(x^4)$, (b) $1 + x^3/6 + O(x^4)$, (c) $x - x^3/3 + O(x^5)$.
13.73 It does not affect the result. The extra terms all give extra 0's when we take the limit (which is unnecessary).
13.74 (a) $-1/2$ (b) 1 (c) 1/6 (d) 1/2.
13.76 both (a) and (b) converge.
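13.77 The point is to start by writing
\[ \sum_{k=1}^{\infty} \Bigl( \sqrt{k^{2\alpha} + 1} - k^\alpha \Bigr) = \sum_{k=1}^{\infty} k^\alpha \Bigl( \sqrt{1 + \frac{1}{k^{2\alpha}}} - 1 \Bigr). \]
Using Taylor polynomials for $\sqrt{1 + x}$ centered at x = 0 we see that convergence happens if and only if $\alpha > 1$.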

13.78 (b) Use T1 for f (x) = 1/(x 1).

13.79 (a) the (relativistic) energy goes to infinity, (c) when v/c is small, the relativistic
energy gets closer to the classical energy.
Appendix E

Additional results on the definite integral

E.1 Some words on the Lebesgue integral

At the more advanced level, certain shortcomings of the Riemann-Darboux integral force us to replace it with the more powerful Lebesgue integral. This is not only necessary for mathematicians, but also for physicists and engineers. For this reason, it does not make sense to study the Riemann integral more deeply than what is strictly necessary.

Fig. 1. Henri Lebesgue (1875 – 1941).

The Lebesgue integral is nowadays considered by most mathematicians (and engineers and physicists) as being the correct notion of the integral. At this level, it is hard to explain exactly what is wrong with the Riemann integral. However, it is more or less correct to say that the set of all Riemann integrable functions is a bit like $\mathbb{Q}$, while the set of all Lebesgue integrable functions is like $\mathbb{R}$. That is, there are functions that we ought to be able to integrate but that the Riemann integral refuses to integrate! For instance, this is the case for the Dirichlet function from Example 12.3. According to Lebesgue, it has definite integral 1 over [0,1] (see Remark E.1, below, for more details).
But why does it matter for physicists and engineers that we can integrate functions beyond what the Riemann integral is capable of doing? Well, one of the more important applications of the definite integral in physics is to measure the energy of signals. Indeed, if f(t) describes a sound wave, starting at t = a and ending at t = b, then
\[ \sqrt{\int_a^b |f(t)|^2 \, dt} \]
yields exactly the energy of this sound-wave. Or, if f(t) is the probability distribution of a quantum particle, then the same expression yields the probability of this particle to be located in the interval [a,b]. For reasons such as these, it is useful to have a concept of integration which allows one to compute (or at least define) the definite integral for a large class of functions. (We note that there are versions of the definite integral that go, in different ways, beyond the Lebesgue integral. Examples include the Bochner integral for certain vector valued functions and the Ito integral for "stochastic" functions).

Remark E.1 (The Lebesgue integral of Dirichlet's function) Let us now discuss briefly why the Dirichlet function, as we defined it in Example 5.3, satisfies
\[ \int_0^1 f(x) \, dx = 1 \]
according to Lebesgue. In terms of the general approach, the main difference between the Riemann integral and the Lebesgue integral is that in the latter, we consider a "partition" of the y-axis and not the x-axis. In particular, we can express the Lebesgue integral of f as follows:
\[
\begin{aligned}
\int_0^1 f(x) \, dx &= 0 \cdot |\{x : f(x) = 0\}| + 1 \cdot |\{x : f(x) = 1\}| \\
&= 0 \cdot |[0,1] \cap \mathbb{Q}| + 1 \cdot |[0,1] \cap (\mathbb{R} \setminus \mathbb{Q})| \\
&= |[0,1] \cap (\mathbb{R} \setminus \mathbb{Q})|,
\end{aligned}
\]
where, for $A \subset \mathbb{R}$, we use the notation |A| to denote the "size" or "measure" of the set of points A. (To indicate why this makes sense, note that if f(x) = 3 for x in, say [0,1/2], and zero for all other x, then the Lebesgue integral of f over [0,1] is equal to 3 · 1/2, which coincides with the Riemann integral of this function.)
Now, a problem with the Lebesgue integral is that it is quite difficult to make a theory for how to measure the size of subsets of $\mathbb{R}$. In fact, this gives rise to a separate branch of mathematics called "measure theory" on which "integration theory" is built. Here, we mention the following facts, which should all be reasonable for any naive notion of "measure" for subsets of the real line:

• The size of an interval [a,b] is given by $|[a,b]| = b - a$.

• It follows from the previous fact that $|\{a\}| = |[a,a]| = 0$.

• Suppose that A and B are two subsets of $\mathbb{R}$ that can be "measured" (not all sets can be measured!), then:

– if A, B disjoint, then $A \cup B$ can be measured, and $|A \cup B| = |A| + |B|$,
– if $A \subset B$, then $|A| \le |B|$,
– if $A \subset B$, then $|B \setminus A| = |B| - |A|$.

• Finally, we mention that the summation rule holds for sequences $(A_n)_{n=1}^{\infty}$ of disjoint sets in the sense that
\[ |A_1 \cup A_2 \cup A_3 \cup \cdots| = \sum_{n=1}^{\infty} |A_n|. \]

Let us now see how we can apply these facts to figure out the Lebesgue integral of the Dirichlet function. As we have seen in Appendix B, Cantor showed that the rational numbers are countable. That is, we can write $\mathbb{Q} = (r_n)_{n=1}^{\infty}$. But this means that
\[ |\mathbb{Q}| = \sum_{n=1}^{\infty} |\{r_n\}| = 0. \]
But then it follows that
\[ |[0,1] \cap \mathbb{Q}| \le |\mathbb{Q}| = 0 \]
and
\[ |[0,1] \cap (\mathbb{R} \setminus \mathbb{Q})| = |[0,1] \setminus ([0,1] \cap \mathbb{Q})| = 1 - 0 = 1. \]
So, in conclusion, according to Lebesgue, we get that
\[ \int_0^1 f(x) \, dx = 1. \]
Appendix F

Taylor and power series

We now discuss what happens when we combine what we have seen on Taylor polynomials with what we know about infinite series. As we shall try to indicate here, and in later parts of the epilogue, this will send us down a path where we ultimately will have the power to play with functions on the level of their DNA.

Fig. 1. The DNA of the... eh... logarithm?

Remark F.1 (Selected problems from previous exams) Note that while this chapter is not, strictly speaking, part of the course, there have been "difficult" problems on previous exams that explore parts of what we discuss here. Here are two examples of such problems:

1. (a) Determine the interval of convergence of
\[ f(x) = \sum_{n=1}^{\infty} (-1)^n \frac{4^n x^{2n}}{2n}. \]
Be careful to check the convergence in the endpoints.
(b) Determine the exact value of f(x) by comparing the above power series to that of $1/(1 - x)$.
2. (a) State Taylor's theorem with some explicit version (that is, not Big-oh) of the error term.
(b) Prove that for all $x \in \mathbb{R}$ we have
\[ e^x = \sum_{k=0}^{\infty} \frac{x^k}{k!}. \]


F.1 A first look at Taylor and power series

Definition of Taylor series and some examples

Taylor series are exactly what you get when you take the limit $n \to \infty$ in a Taylor
expansion of the form
\[
f(x) = T_n(x) + E_n(x).
\]
As we will indicate here, this allows us to think about certain functions as being
"polynomials of infinite degree".
Here is a first example.

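Example F.2 (Taylor series of the exponential function) The $n$’th order Taylor
expansion of the exponential function centered at $x = 0$ can be expressed as
\[
e^x = \sum_{k=0}^{n} \frac{x^k}{k!} + \frac{e^c}{(n+1)!}\, x^{n+1}, \tag{F.1}
\]
where the error term is the one given by Lagrange’s formula, with $c$ some number
between $0$ and $x$. By using the quotient test for the convergence of series, we
observe that the series
\[
\sum_{k=0}^{\infty} \frac{x^k}{k!}
\]
converges absolutely for all fixed choices of $x$. Moreover, by the table of growth, we find
that for all fixed choices of $x$, we have
\[
\lim_{n \to \infty} \frac{x^{n+1}}{(n+1)!} = 0.
\]
Since $c$ lies between $0$ and $x$, we have $0 < e^c \le e^{|x|}$, so the error term in (F.1)
also tends to $0$. This implies, by the computational rules for the limit, that
\[
e^x = \lim_{n \to \infty} e^x
    = \lim_{n \to \infty} \Bigl( \sum_{k=0}^{n} \frac{x^k}{k!} + \frac{e^c}{(n+1)!}\, x^{n+1} \Bigr)
    = \sum_{k=0}^{\infty} \frac{x^k}{k!} + 0
    = \sum_{k=0}^{\infty} \frac{x^k}{k!}.
\]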

This formula expresses the Taylor series of f (x) = ex centered at x = 0.
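To get a feeling for how fast the error term in (F.1) disappears, here is a small numerical sketch. The function names `exp_partial_sum` and `lagrange_bound` are our own choices for this illustration, and the bound uses the estimate $e^c \le e^{|x|}$ for $c$ between $0$ and $x$.

```python
import math

def exp_partial_sum(x: float, n: int) -> float:
    """Return T_n(x) = sum_{k=0}^{n} x^k / k! (hypothetical helper name)."""
    total, term = 0.0, 1.0  # term holds x^k / k!
    for k in range(n + 1):
        total += term
        term *= x / (k + 1)  # turn x^k / k! into x^(k+1) / (k+1)!
    return total

def lagrange_bound(x: float, n: int) -> float:
    """Upper bound e^{|x|} |x|^{n+1} / (n+1)! for the error |E_n(x)|.

    Assumption: we bound e^c by e^{|x|}, which is valid for c between 0 and x.
    """
    return math.exp(abs(x)) * abs(x) ** (n + 1) / math.factorial(n + 1)

if __name__ == "__main__":
    x = 2.0
    for n in (2, 5, 10, 20):
        error = abs(math.exp(x) - exp_partial_sum(x, n))
        print(n, error, lagrange_bound(x, n))
```

In each printed row, the actual error should sit below the Lagrange bound, and both should shrink rapidly as $n$ grows.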

Now, the exponential function is by no means the only function that can be repre-
sented by a Taylor series. Here is a second example.

Example F.3 (Taylor series for the arctangent) By exercise 13.56, we know that
the $(2n+1)$’st order Taylor expansion of the arctangent, centered at $x = 0$, is given by
\[
\arctan x = \sum_{k=0}^{n} (-1)^k \frac{x^{2k+1}}{2k+1} + E_{2n+1}(x)
\quad \text{where} \quad
|E_{2n+1}(x)| \le \frac{|x|^{2n+3}}{2n+3}.
\]
By the alternating series test, we observe that
\[
\sum_{k=0}^{\infty} (-1)^k \frac{x^{2k+1}}{2k+1}
\]
converges for all $|x| \le 1$, and by the divergence test, this series diverges for all $|x| > 1$.
Moreover, for fixed $|x| \le 1$, we have
\[
\lim_{n \to \infty} |E_{2n+1}(x)| \le \lim_{n \to \infty} \frac{|x|^{2n+3}}{2n+3} \le \lim_{n \to \infty} \frac{1}{2n+3} = 0.
\]
From this, it follows that
\[
\arctan(x) = \sum_{k=0}^{\infty} (-1)^k \frac{x^{2k+1}}{2k+1}, \quad \forall x \in [-1,1], \tag{F.2}
\]
while for $x$ outside of this interval, the right-hand side of this expression diverges.
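The error estimate in this example can be checked numerically. The following is only a sketch: the helper names are ours, and the small tolerance is an assumption made to absorb floating point rounding.

```python
import math

def arctan_partial_sum(x: float, n: int) -> float:
    """Sum of (-1)^k x^(2k+1) / (2k+1) for k = 0, ..., n (hypothetical helper name)."""
    return sum((-1) ** k * x ** (2 * k + 1) / (2 * k + 1) for k in range(n + 1))

def arctan_error_bound(x: float, n: int) -> float:
    """The bound |x|^(2n+3) / (2n+3) on |E_{2n+1}(x)| from the example."""
    return abs(x) ** (2 * n + 3) / (2 * n + 3)

if __name__ == "__main__":
    for x in (0.5, 1.0, -1.0):
        for n in (5, 50):
            error = abs(math.atan(x) - arctan_partial_sum(x, n))
            print(x, n, error, arctan_error_bound(x, n))
```

Notice how slowly the error shrinks at the endpoints $x = \pm 1$ compared to $x = 0.5$. This is typical for convergence at the edge of the interval of convergence.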

Notice how the two examples we consider above differ in that the Taylor series for the
exponential function represents its function for all $x \in \mathbb{R}$, while the one for the arctangent
only represents its function for $x \in [-1,1]$. This motivates the following definition.

Definition F.4 (Entire function) If a function $f$ is such that its Taylor series centered
at $x = 0$ converges to $f(x)$ for all $x \in \mathbb{R}$, then we call it entire.

The following proposition, which we do not prove here, makes identifying entire
functions easier.

Proposition F.5 Let $f$ be defined on $\mathbb{R}$ and have derivatives of all orders at some point
$x = a$. If the Taylor series for $f$ centered at $x = a$ converges to $f(x)$ for all
$x \in \mathbb{R}$, then $f$ is an entire function.

Exercise F.6 Show that sin x and cos x are entire functions.
Exercise F.7 Are inverse functions of invertible entire functions themselves entire?
Exercise F.8 Use the Taylor series of $e^x$ to find an expression for $e^{ix}$ in terms of an
infinite series. What are the real and imaginary parts of this series?

Definition of power series


In general, a power series centered at a point $x = a$ is a function of the form
\[
p(x) = \sum_{k=0}^{\infty} c_k (x - a)^k.
\]
That is, the term power series basically means polynomial of degree $\infty$. Note that every
Taylor series is a power series. Moreover, it can be shown that every power series is the
Taylor series for the function it defines.
When mathematicians work with power series, we sometimes feel a bit like a mad
scientist. Indeed, by carefully choosing the coefficients $c_k$, we can genetically engineer
functions that have very specific properties. That is, working with power series gives a
remarkable combination of flexibility and control which is fundamental to many
directions of mathematical research and applications.

Fig. 2. Haha! The power of creation!
Here are three fundamental questions we need to ask about power series:
(i) How can we determine for which x a power series converges (if any)?
(ii) Given a power series, does it represent any of the "standard" functions seen in this
course (that is, is it the Taylor series of some elementary function)?
(iii) Are we allowed to differentiate and integrate power series in a reasonable way?
Answering question (i) can be rather hard (or impossible) if the coefficients of the
power series behave erratically, but we do have some tools. Indeed, to know whether
or not a power series converges for a specific $x$, we can use the convergence tests for
series that we have encountered previously in the course. Here, the comparison test, the
alternating series test, as well as the root and quotient tests are useful (often in
combination with the concept of absolute convergence, if the $c_k$’s change sign sporadically).
Moreover, we shall prove below that power series always converge on intervals.
Question (ii) can only be addressed in very special cases where the coefficients $c_k$
follow some clear pattern. Indeed, if you just "blindly" choose coefficients $c_k$ then there is
no reason to believe that the function has anything in common with any of the standard
functions we have seen in the course (however, chances are that you still end up with a
function that, in some way or the other, makes sense).
Question (iii) is related to question (i). However, it is not enough for a power series
to converge in order for us to be allowed to differentiate (or integrate) term-by-term.
Indeed, what is required is that the power series converges uniformly for this to be
allowed. We discuss this further below.

Power series converge on intervals


As a first step in understanding the behaviour of power series in terms of their conver-
gence, we take a look at the following example.

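Example F.9 We wish to determine for which $x$ the following power series converges:
\[
\sum_{k=1}^{\infty} \frac{(3x - 1)^k}{k + 1}.
\]
First, we suppose that $x \in \mathbb{R}$ is fixed, but unknown, and study absolute convergence of
the series. That is, we study the convergence of
\[
\sum_{k=1}^{\infty} \underbrace{\frac{|3x - 1|^k}{k + 1}}_{= |a_k|}.
\]
Let us do the quotient test:
\[
\lim_{k \to \infty} \left| \frac{a_{k+1}}{a_k} \right|
= \lim_{k \to \infty} \frac{\left( \frac{|3x - 1|^{k+1}}{k + 2} \right)}{\left( \frac{|3x - 1|^{k}}{k + 1} \right)}
= \lim_{k \to \infty} \frac{|3x - 1|^{k+1}}{|3x - 1|^{k}} \cdot \frac{k + 1}{k + 2}
= |3x - 1|.
\]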
By the quotient test, the series is absolutely convergent whenever this limit is strictly
less than 1. Hence, the series is absolutely convergent when
\[
|3x - 1| < 1 \iff -1 < 3x - 1 < 1 \iff 0 < 3x < 2 \iff 0 < x < 2/3.
\]
Moreover, the quotient test says that the series diverges whenever
\[
|3x - 1| > 1 \iff x < 0 \text{ or } x > 2/3.
\]
However, the quotient test gives no information about what happens when the limit is
equal to 1, that is, when $x = 0$ or $x = 2/3$. So, we have to investigate these cases
separately. Plugging these values into the series, we see that
\[
x = 0 \implies \sum_{k=1}^{\infty} \frac{(-1)^k}{k + 1}
\quad \text{and} \quad
x = 2/3 \implies \sum_{k=1}^{\infty} \frac{1}{k + 1}.
\]
That is, for $x = 0$ we obtain the alternating harmonic series, which converges, and for
$x = 2/3$ we obtain the harmonic series, which diverges.

In conclusion, the power series converges for $x \in [0,2/3)$ and diverges for all $x$ outside
of this interval. In particular, the convergence is absolute for all $x \in (0,2/3)$.
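The three cases in this example can also be explored numerically. Below is a minimal sketch, with helper names of our own choosing. To avoid rounding issues at the endpoint, we assume that $x$ is passed either as an integer or as a `Fraction`.

```python
import math
from fractions import Fraction

def term(x, k):
    """k'th term (3x - 1)^k / (k + 1) of the series in Example F.9."""
    u = float(3 * x - 1)  # exact when x is an int or a Fraction such as 2/3
    return u ** k / (k + 1)

def ratio(x, k):
    """The quotient |a_{k+1} / a_k| used in the quotient test."""
    return abs(term(x, k + 1) / term(x, k))

def partial_sum(x, n):
    """Sum of the terms with index k = 1, ..., n."""
    return sum(term(x, k) for k in range(1, n + 1))

if __name__ == "__main__":
    # Inside (0, 2/3) the quotients approach |3x - 1| < 1.
    print(ratio(Fraction(1, 2), 500))
    # At x = 0 the partial sums settle down (the limit is ln(2) - 1).
    print(partial_sum(0, 10**5), math.log(2) - 1)
    # At x = 2/3 the partial sums keep growing, roughly like ln(n).
    print(partial_sum(Fraction(2, 3), 10**2), partial_sum(Fraction(2, 3), 10**4))
```

Of course, a computation like this proves nothing about convergence. It only illustrates what the quotient test and the endpoint analysis above tell us.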

Let us now make some observations with respect to the above example. First of all,
the power series from the example is centered at $x = 1/3$. This is seen by writing
\[
\sum_{k=1}^{\infty} \frac{(3x - 1)^k}{k + 1} = \sum_{k=1}^{\infty} \frac{3^k}{k + 1} (x - 1/3)^k.
\]

Next, observe that the power series converges in the interval [0,2/3). While this interval
is a fairly ugly one, it has some important features:

• this interval is centered at the same point as the power series (that is, at x = 1/3),
• the power series converges absolutely for all inner points of the interval,
• the power series converges conditionally on one end-point (x = 0), and diverges at
the other (x = 2/3).

These features can be recognised in the following result on power series.

Theorem F.10 For all power series
\[
f(x) = \sum_{k=0}^{\infty} c_k (x - a)^k
\]
there exists a number $R \ge 0$ called the radius of convergence (which we also allow to be
$\infty$) such that:

(i) the power series converges precisely on an interval with radius $R \ge 0$ centered at
$x = a$ (we call this the interval of convergence of the power series),
(ii) the power series converges absolutely on all interior points of this interval,
(iii) the power series may or may not converge (conditionally or absolutely) on its
endpoints.

In particular, if $R = 0$, the power series only converges at the point $x = a$ (which it
always trivially does), and if $R = \infty$, the power series converges for all $x$ (it is entire).

Proof. To prove (i) and (ii), it suffices to prove that if the power series converges at a
point $x_0 \ne a$, then it converges absolutely for all $x$ such that
\[
|x - a| < |x_0 - a| \tag{F.3}
\]
(that is, it will converge absolutely at all points that are strictly closer to $a$ than $x_0$).
Indeed, this shows that the convergence has to be on an interval, and that at the inner
points, we have absolute convergence. As we saw in the above example, it is possible to
have both divergence and conditional convergence taking place at endpoints (we leave it
to the reader to find an example of a power series that has absolute convergence at at
least one endpoint). So, suppose the power series converges at $x_0 \ne a$. That is,
\[
\sum_{k=0}^{\infty} c_k (x_0 - a)^k
\]
converges. By the divergence test, this implies that $c_k (x_0 - a)^k \to 0$ as $k \to \infty$. In
particular, this implies that the sequence $c_k (x_0 - a)^k$ is bounded. But then there exists
a number $M$ so that $|c_k (x_0 - a)^k| \le M$ for all $k$, and so we obtain that
\[
\sum_{k=0}^{\infty} |c_k (x - a)^k|
= \sum_{k=0}^{\infty} |c_k| |x_0 - a|^k \cdot \left| \frac{x - a}{x_0 - a} \right|^k
\le M \sum_{k=0}^{\infty} \left| \frac{x - a}{x_0 - a} \right|^k.
\]
Finally, by inequality (F.3), we conclude that the series on the right-hand side is a
convergent geometric series, and we are done.

We remark that since absolute convergence in some sense is "better" than conditional
convergence, this means that a power series has "better" convergence behaviour in the
interior than at the endpoints of its interval of convergence. Below, we shall show that
power series converge in an even nicer way if we keep a positive distance to the
endpoints of the interval of convergence.

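Exercise F.11 We consider the convergence of the power series
\[
\text{(i)} \ \sum_{k=0}^{\infty} \frac{x^k}{(k + 1) 2^k}
\qquad
\text{(ii)} \ \sum_{k=2}^{\infty} \frac{3^k x^k}{\ln(k) \cdot k!}
\qquad
\text{(iii)} \ \sum_{k=2}^{\infty} \frac{(2x - 1)^k}{k^2 \ln k}.
\]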

For each of these, answer the following questions:


(a) Use the root and/or quotient tests to determine for which x it converges absolutely
and diverges.
(b) Use some convergence test to determine if it converges conditionally for any x.
Exercise F.12 Can you give an example of a power series that converges only for
x = 0 and diverges for all other x?
Hint: All you need is the divergence test and some inspiration from the table of growth
for sequences.

Exercise F.13 Not every problem in the known universe can be solved using Taylor
series. Consider
\[
f(x) =
\begin{cases}
e^{-1/x^2} & x \ne 0 \\
0 & x = 0
\end{cases}
\]
(a) Show that $f(x)$ is continuous at $x = 0$.
(b) Use induction to determine $f^{(n)}(0)$ for all $n$. What does this say about $T_n(x)$?

F.2 Uniform convergence and continuity of power series


Uniform convergence of power series
Above, we proved that power series converge absolutely on all interior points of their
domains of convergence. Below, when we study how to differentiate and integrate power
series, this will not be enough. In fact, we need to establish that power series satisfy an
even stronger type of convergence that takes place as soon as we stay at a positive fixed
distance from the endpoints, namely uniform convergence.
The difference between "normal" convergence and uniform convergence is analogous
to the difference between "normal" continuity and uniform continuity. That is, the
"normal" variant is defined at specific points, while the uniform variant is defined on
intervals¹ (or, more generally, on sets). With this motivation, we now define what
uniform convergence means.

Definition F.14 (Uniform convergence for power series) We say that a power
series
\[
\sum_{k=0}^{\infty} c_k (x - a)^k
\]
converges uniformly on an interval $I$ if, for all $\varepsilon > 0$ there exists a number $N$ so that for
all $x \in I$ we have
\[
n > N \implies \Bigl| \underbrace{\sum_{k=0}^{\infty} c_k (x - a)^k - \sum_{k=0}^{n} c_k (x - a)^k}_{\text{this is the $n$'th tail}} \Bigr| < \varepsilon.
\]

Before considering an example where we use the above definition, let us immediately
point out that the above definition is not really about power series, or even about series
at all.

Definition F.15 (Uniform convergence) Let $g_n$ be a sequence of functions defined
on some set $I$. Then we say that $g_n$ converges uniformly to some function $g$ on the set
$I$ if, for every $\varepsilon > 0$, there exists a number $N$ so that for all $x \in I$, we have
\[
n > N \implies |g(x) - g_n(x)| < \varepsilon.
\]
Note that Definition F.14 follows from Definition F.15 by letting $g_n$ be the $n$’th
partial sum of the power series. Also, note that in the above definition, we mean that
the $g_n$ are defined at least on the set $I$ (which does not need to be an interval at all).
¹ For this reason our "normal" continuity is often referred to as pointwise continuity, and the "normal"
convergence of a power series is often referred to as pointwise convergence.

Let us take a look at what this means in practice.

Example F.16 Let us again consider the power series from Example F.9,
\[
\sum_{k=1}^{\infty} \frac{(3x - 1)^k}{k + 1}.
\]
As we saw above, it converges on the interval $[0,2/3)$. Let us now show that it converges
uniformly on the interval $[1/6,3/6]$. That is, for $\varepsilon > 0$ given, our goal is to prove the
existence of a number $N$ so that for $n > N$ and all $x \in [1/6,3/6]$ we have
\[
\left| \sum_{k=n+1}^{\infty} \frac{(3x - 1)^k}{k + 1} \right| < \varepsilon.
\]
So, how to do this? Well, first of all, let us observe that
\[
|3x - 1| = 3|x - 1/3|.
\]
Next, notice that the assumption $x \in [1/6,3/6]$ implies that $x - 1/3 \in [-1/6, 1/6]$. That
is,
\[
x \in [1/6,3/6] \implies |3x - 1| \le 3 \cdot \frac{1}{6} = \frac{1}{2}.
\]
So, if we apply the triangle inequality to the infinite series, we obtain the following chain
of inequalities:
\[
\left| \sum_{k=n+1}^{\infty} \frac{(3x - 1)^k}{k + 1} \right|
\le \sum_{k=n+1}^{\infty} \frac{1}{2^k (k + 1)}
\le \sum_{k=n+1}^{\infty} \frac{1}{2^k}.
\]
Now, using what we know about geometric series, we observe that
\[
\sum_{k=n+1}^{\infty} \frac{1}{2^k}
= \frac{1}{2^{n+1}} \sum_{k=0}^{\infty} \frac{1}{2^k}
= \frac{1}{2^{n+1}} \cdot \frac{1}{1 - \frac{1}{2}}
= \frac{1}{2^n}.
\]
Finally, solving $1/2^n < \varepsilon$ shows that when $n > N$ with $N = -\log \varepsilon / \log 2$, the above
expression is less than $\varepsilon$ for all $x \in [1/6,3/6]$, and we are done!
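The point of uniform convergence is that one and the same $N$ works for every $x$ in the interval. Here is a rough numerical illustration of the estimate in this example. The helper names, the grid of sample points and the truncation of each tail after 200 terms are all assumptions made for the sketch; the neglected terms are far below floating point precision.

```python
import math

def tail(x: float, n: int, extra: int = 200) -> float:
    """Approximate the n'th tail, the sum over k > n of (3x - 1)^k / (k + 1).

    Assumption: we only add the first `extra` terms of the tail.
    """
    u = 3 * x - 1
    return sum(u ** k / (k + 1) for k in range(n + 1, n + 1 + extra))

def sup_tail(n: int, a: float = 1 / 6, b: float = 1 / 2, points: int = 201) -> float:
    """Largest |tail| over an evenly spaced grid of sample points in [a, b]."""
    xs = [a + (b - a) * i / (points - 1) for i in range(points)]
    return max(abs(tail(x, n)) for x in xs)

def threshold(eps: float) -> float:
    """The number N = -log(eps) / log(2) found in the example."""
    return -math.log(eps) / math.log(2)

if __name__ == "__main__":
    for n in (1, 5, 10, 20):
        print(n, sup_tail(n), 2.0 ** -n)  # sampled sup of the tail vs. the bound 1/2^n
    print(threshold(1e-3))  # any n above this value gives tails below 0.001
```

A grid can only sample the interval, so this does not replace the proof. It does, however, make the bound $1/2^n$ visible.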

Exercise F.17 We consider again the same power series as in the above example.
(a) Show that the power series converges uniformly on the even larger interval $[1/9, 5/9]$.
(b) (Challenge) The power series does not converge uniformly on the full interval
$[0,2/3)$. Show that this is the case.
Exercise F.18 On what intervals does the Taylor series for $\exp(x)$ converge uniformly?
Is there any interval where this series does not converge uniformly?

Continuity of power series


We now state and prove an important application of the concept of uniform convergence.
Namely, a result which we can use to prove that certain limits of sequences of functions
are continuous.

Proposition F.19 Suppose $g_n$ is a sequence of continuous functions on some interval
$I$, and that $g_n$ converges uniformly on $I$ to some function $g$. Then $g$ is continuous on $I$.

Proof. Let $c \in I$ and $\varepsilon > 0$ be given. Our goal is to find some $\delta > 0$ so that for all $x \in I$,
we have
\[
|x - c| < \delta \implies |g(x) - g(c)| < \varepsilon.
\]
Our first observation is that since $g_n$ converges to $g$ uniformly on $I$, it follows that there
exists a number $N$ so that we have
\[
n > N \implies
\begin{cases}
|g_n(c) - g(c)| < \varepsilon/3 \\
|g_n(x) - g(x)| < \varepsilon/3
\end{cases},
\]
where $c$ is the fixed number from above, and $x$ any number in $I$.

Fix the number $n = N + 1$. Our second observation is that since the function $g_n$
(for our fixed value of $n$) is continuous at $c$, there exists a number $\delta > 0$ so that
\[
|x - c| < \delta \implies |g_n(x) - g_n(c)| < \varepsilon/3.
\]
While it may not be immediately clear, we have now won! This $\delta$ is exactly the one we
are looking for. Indeed, if $|x - c| < \delta$ and $n = N + 1$, then the following computation is
justified:
\[
\begin{aligned}
|g(x) - g(c)| &= |g(x) \underbrace{{} - g_n(x) + g_n(x)}_{=0} \underbrace{{} - g_n(c) + g_n(c)}_{=0} - g(c)| \\
&\le |g(x) - g_n(x)| + |g_n(x) - g_n(c)| + |g_n(c) - g(c)| \\
&< \frac{\varepsilon}{3} + \frac{\varepsilon}{3} + \frac{\varepsilon}{3} = \varepsilon.
\end{aligned}
\]
Done!

Exercise F.20 Show that if $f_k(x)$ is a sequence of continuous functions on some
interval $I$, and if $\sum_{k=0}^{\infty} f_k(x)$ converges uniformly on $I$, then this infinite series is
continuous on $I$.
Hint: Apply the above proposition to the sequence of partial sums $\sum_{k=0}^{n} f_k(x)$.

Example F.21 By Example F.16, the power series
\[
\sum_{k=1}^{\infty} \frac{(3x - 1)^k}{k + 1}
\]

converges uniformly on [1/6,3/6]. Since each partial sum of this power series is a finite
sum of continuous functions, and therefore itself continuous, it follows from Proposition
F.19 that the function defined by the power series is continuous on [1/6,3/6].

Now, in the above examples, we showed that the power series we studied converges
uniformly on [1/6,3/6] and is therefore continuous on this interval. However, in exercise
F.17, you were asked to show that it converges uniformly on the larger interval [1/9,5/9],
and therefore is also continuous there. The next result shows how far we can push this.

Proposition F.22 A power series converges uniformly on every closed and bounded
interval contained in the interior of its domain of convergence.

While it is not that hard to give a direct proof of this proposition, we choose to first
establish a lemma which is important in its own right, as it gives a seemingly modest,
but quite useful, convergence test to establish uniform convergence.

Proposition F.23 (Weierstrass M-test) Let $f_k(x)$ be a sequence of functions defined
on some interval $I$ and suppose that there exists a sequence of numbers $M_k > 0$ so that
$|f_k(x)| \le M_k$ for all $k \in \mathbb{N}$ and $x \in I$. Then,
\[
\sum_{k=0}^{\infty} M_k < \infty \implies \sum_{k=0}^{\infty} f_k(x) \text{ converges uniformly on } I.
\]

Exercise F.24 Use Example F.16 as a blueprint to prove the M -test.


Exercise F.25 Use the M-test to prove Proposition F.22.

In combination, propositions F.19 and F.22, yield most of the following result.

Proposition F.26 Power series are continuous at all points in their domain of conver-
gence.

Exercise F.27 (Challenge) Complete the proof of the above proposition. That is,
prove that if a power series converges at an endpoint of its domain of convergence,
then it is continuous there.

F.3 When can we integrate and differentiate power series?


Having established that power series are continuous whenever they converge, we turn to
the question of how to differentiate and integrate power series. As we shall see, uniform
convergence plays a central part also for this question.

A motivational example and a warning


First, let us consider an example that shows why we would want to differentiate power
series.

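Example F.28 Let us consider the identity
\[
e^x = 1 + x + \frac{x^2}{2} + \frac{x^3}{3!} + \frac{x^4}{4!} + \cdots.
\]
Are we allowed to differentiate the infinite sum? Well, we do not know this yet, but let
us take a look at what happens if we differentiate both sides of this identity:
\[
\begin{aligned}
\frac{d}{dx} e^x &= \frac{d}{dx} \Bigl( 1 + x + \frac{x^2}{2} + \frac{x^3}{3!} + \frac{x^4}{4!} + \cdots \Bigr) \\
&= \frac{d}{dx} 1 + \frac{d}{dx} x + \frac{d}{dx} \frac{x^2}{2} + \frac{d}{dx} \frac{x^3}{3!} + \frac{d}{dx} \frac{x^4}{4!} + \cdots \\
&= 0 + 1 + x + \frac{x^2}{2} + \frac{x^3}{3!} + \cdots = e^x.
\end{aligned}
\]
Nice, no? :-)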
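The same computation can be done on the level of coefficients. In the sketch below, which uses helper names of our own, a truncated power series $\sum_k c_k x^k$ is stored as the list of its coefficients, and term-by-term differentiation sends $c_k$ to $(k+1)c_{k+1}$. We use exact fractions so that no rounding is involved.

```python
from fractions import Fraction
from math import factorial

def exp_coefficients(n: int) -> list:
    """Coefficients 1/k! for k = 0, ..., n of the Taylor series of e^x."""
    return [Fraction(1, factorial(k)) for k in range(n + 1)]

def differentiate(coeffs: list) -> list:
    """Formal term-by-term derivative: the coefficient list of sum_k k c_k x^(k-1)."""
    return [k * c for k, c in enumerate(coeffs)][1:]

if __name__ == "__main__":
    print(differentiate(exp_coefficients(5)))
    print(exp_coefficients(4))
```

Both lines print the same coefficients, which is the formal version of the computation in the example. Whether the formal derivative really is the derivative of the function is exactly the question we study below.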

Exercise F.29 Differentiate the formula (F.2) for the arctangent term-by-term. What
does this say about the value of the derivative of $\arctan(x)$ at $x = 1$? Does this equal
the value of $(\arctan x)' = 1/(1 + x^2)$ at $x = 1$?

The point of the above example and exercise is to show that while it is natural to want
to differentiate an infinite sum of functions term by term, doing so can also be
dangerous. That is, we need some type of proposition to tell us when we are allowed
to write
\[
\frac{d}{dx} \sum_{k=0}^{\infty} f_k(x) = \sum_{k=0}^{\infty} \frac{d}{dx} f_k(x), \quad \text{and} \tag{F.4}
\]
\[
\int_a^x \Bigl( \sum_{k=0}^{\infty} f_k(t) \Bigr) dt = \sum_{k=0}^{\infty} \Bigl( \int_a^x f_k(t)\,dt \Bigr). \tag{F.5}
\]

When are we allowed to integrate power series term by term?


As it turns out, it is convenient to attack the second problem first.

Proposition F.30 Suppose $g_n$ is a sequence of continuous functions on some closed
interval $I$ that converges uniformly on $I$. Then, for all $a, x \in I$, we have
\[
\int_a^x \Bigl( \lim_{n \to \infty} g_n(t) \Bigr) dt = \lim_{n \to \infty} \int_a^x g_n(t)\,dt.
\]

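Proof. Let us denote the uniform limit of the sequence $g_n$ on $I$ by $g$. We first
observe that since a uniform limit of continuous functions is itself continuous, it follows
that $g$ is integrable on $I$. Next, let $\varepsilon > 0$ be given. Since $g_n$ converges uniformly to $g$,
there exists an $N$ so that for $n > N$, we have
\[
|g_n(t) - g(t)| < \frac{\varepsilon}{|I|} \quad \forall t \in I.
\]
But then it follows that
\[
\begin{aligned}
\left| \int_a^x g_n(t)\,dt - \int_a^x g(t)\,dt \right|
&= \left| \int_a^x \bigl( g_n(t) - g(t) \bigr) dt \right| \\
&\le \left| \int_a^x |g_n(t) - g(t)|\,dt \right|
\le |x - a| \cdot \frac{\varepsilon}{|I|} \le \varepsilon.
\end{aligned}
\]
That is, we have proved that
\[
\lim_{n \to \infty} \int_a^x g_n(t)\,dt = \int_a^x g(t)\,dt,
\]
as required.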

Exercise F.31 Use the above proposition to prove that if $f_k$ is a sequence of functions
continuous on a closed interval $I$, and if $\sum_{k=0}^{\infty} f_k(x)$ converges uniformly on $I$, then
(F.5) holds.
Hint: Apply the above proposition to the sequence of partial sums $\sum_{k=0}^{n} f_k(x)$.

Exercise F.32 Explain how the above result applies to power series.

When are we allowed to differentiate term-by-term?

Let us now turn to a result which tells us when we can differentiate the limit of a
sequence of functions. Since this result follows by an application of Proposition F.30, we
only indicate the proof in the exercises below.

Proposition F.33 Suppose that $g_n$ is a sequence of functions differentiable on some
interval $I$, and that each $g_n'$ is continuous on $I$. If there exists (at least) one point $a \in I$
such that the sequence $g_n(a)$ converges, and the sequence of derivatives $g_n'$ converges
uniformly on $I$, then
\[
\frac{d}{dx} \Bigl( \lim_{n \to \infty} g_n(x) \Bigr) = \lim_{n \to \infty} g_n'(x).
\]

Exercise F.34 (Challenge) Combine Proposition F.30 with the Fundamental theo-
rem of calculus to prove the above proposition.
Hint: $\int_a^x g_n'(t)\,dt = g_n(x) - g_n(a)$.

Exercise F.35 Use the above proposition to prove the following. Let $f_k$ be a sequence
of functions differentiable on some interval $I$. If the infinite series $\sum_{k=0}^{\infty} f_k(x)$ converges
at at least one point $x_0 \in I$, and the infinite series $\sum_{k=0}^{\infty} f_k'(x)$ converges uniformly
on $I$, then (F.4) holds.

Exercise F.36 Explain how the above result applies to power series.

Exercise F.37 In this exercise you are to explore a situation where we are not allowed
to interchange the order of limits and the derivative. Specifically, let $f_k(x) =
\sin(kx)/k$, and show that in this case
\[
\lim_{k \to \infty} \frac{d}{dx} f_k(x) \ne \frac{d}{dx} \lim_{k \to \infty} f_k(x).
\]

Exercise F.38 One application of the above proposition is to determine the sum of
certain power series. In particular, can you suggest, and prove, a formula for the
function defined by the power series
\[
\sum_{k=1}^{\infty} k x^{k-1}.
\]

F.4 Summary on how to define the elementary functions


A theme in the lecture notes² is the study of the properties of the elementary functions

• polynomials and rational functions,
• the trigonometric functions and the inverse trigonometric functions,
• the natural logarithm, the exponential function and the power functions.

As we have pointed out, a critical defect of any pretense for a "rigorous" construction
of the theory of these functions is that for all of them, except the polynomials and
rational functions, our definitions have rested on geometric considerations.
We now discuss and summarise how all definitions can be put on a safe foundation,
and how to obtain their fundamental properties.

How to define the logarithm and exponential function

We begin with the logarithm. Now, the definition for the logarithm is on firm foundation,
since the geometric definition we used in Chapter 2 can be formulated in terms of a
definite integral.³

Definition F.39 For all $x > 0$ we define
\[
\ln(x) = \int_1^x \frac{dt}{t}.
\]
In particular, it follows more or less immediately from this definition that the
logarithm has the following properties:

• The logarithm has domain $(0,\infty)$ and satisfies $\ln(1) = 0$.
• From the Fundamental theorem of calculus, we obtain that $(\ln x)' = 1/x$. In
particular, this means that the logarithm is continuous and differentiable.
• From what we know about improper integrals, the range of the logarithm is $\mathbb{R}$.
• Finally, using either the fundamental properties of the integral, or the techniques
from Chapter 9 (see Example 9.27 and exercise 9.28), we can deduce all laws of
the logarithm.
² Well, an implicit theme in the Lecture notes as published at the start of the term, but a very explicit
theme following the suggested updates posted on the course Canvas page.
³ This observation is not made explicitly in the lecture notes (I think), but usually is discussed in the
lectures. In any case, I will include an exercise on this in the next update.

Let us move on to the exponential function. Now, since the logarithm has a strictly
positive derivative on its domain, it follows that it is an invertible function. We call this
inverse $\exp(x)$. Here is how we can now obtain all the fundamental properties we need
of this function.

• By its definition, we obtain that the exponential function satisfies $\exp(0) = 1$, has
domain $\mathbb{R}$ and range $(0,\infty)$.
• Since the exponential function is the inverse function of a continuous function
defined on an interval, it is itself continuous.
• By what we did in exercise 8.48, which only used the properties for the logarithm
listed above, we know that $(\exp x)' = \exp x$.
• In Proposition W2.33 (in the suggested changes document), we explained how all
the laws for the exponential function follow from the laws of the logarithm. In
particular, we explained why $\exp(x) = e^x$ with $e \overset{\text{def}}{=} \exp(1)$.
• By what we did in Example F.2, which only used what we have discussed above,
we know that
\[
\exp(x) = \sum_{k=0}^{\infty} \frac{x^k}{k!}.
\]
• Using the Taylor series for the exponential function, we can compute the value
$e \overset{\text{def}}{=} \exp(1)$ to any desired accuracy. In fact, this approximation formula is so good
that to get at least 16 correct decimals, we only need the partial sum for which
$k! \ge 10^{16}$, which first happens when $k = 19$.
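The last claim is easy to check with exact arithmetic. The following is a small sketch with helper names of our own. Here we read "the partial sum" as the one that includes all terms up to and including $k = 19$, which is an assumption about what is meant.

```python
from fractions import Fraction
from math import factorial

def first_k_with_factorial_at_least(bound: int) -> int:
    """Smallest k such that k! >= bound."""
    k = 0
    while factorial(k) < bound:
        k += 1
    return k

def e_partial_sum(n: int) -> Fraction:
    """Exact value of the sum of 1/k! for k = 0, ..., n."""
    return sum(Fraction(1, factorial(k)) for k in range(n + 1))

if __name__ == "__main__":
    k = first_k_with_factorial_at_least(10**16)
    print(k)
    approximation = e_partial_sum(k)
    # Print 20 decimals using integer arithmetic only (no floating point rounding).
    digits = approximation.numerator * 10**20 // approximation.denominator
    print(f"{digits // 10**20}.{digits % 10**20:020d}")
```

The printed decimals can be compared with any table of the decimal expansion of $e$.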

Finally, we explain how we can use the above to extend the exponential function to
the complex plane, thus obtaining the definition of the complex exponential. Indeed,
while we, in Chapter 2, defined the complex exponential in terms of the sine and cosine,
we can use the Taylor series of the exponential function to give a direct definition. To
prepare for this, we prove the following result.

Proposition F.40 (Extension of the exponential function to complex numbers)
The Taylor series for the exponential function converges for all $z \in \mathbb{C}$.

Proof. The point of the proof is that absolute convergence also works when we sum up
complex numbers. That is, if $c_k$ is a sequence of complex numbers, then
\[
\sum_{k=0}^{\infty} |c_k| < \infty \implies \sum_{k=0}^{\infty} c_k \text{ converges}.
\]
Since this is a rather modest extension, we leave it as an exercise below. Here, the point
is simply to notice that since, for any complex number $z$, we have $|z^k| = |z|^k$, we now
obtain
\[
\sum_{k=0}^{\infty} \frac{|z|^k}{k!} < \infty \implies \sum_{k=0}^{\infty} \frac{z^k}{k!} \text{ converges}.
\]

Exercise F.41 Let $c_n$ be a sequence of complex numbers. Prove that if the infinite
series $\sum c_n$ converges absolutely, then it converges.

We now make the following definition.

Definition F.42 For all $z \in \mathbb{C}$, we define
\[
\exp(z) = \sum_{k=0}^{\infty} \frac{z^k}{k!}.
\]

Not surprisingly, we can obtain all properties of the complex exponential from this
definition. Since this can be a bit tedious, let us restrict ourselves to proving one such
property (that we will need below, when discussing the trigonometric functions).

Proposition F.43 For all $z, w \in \mathbb{C}$, we have $\exp(z + w) = \exp(z) \exp(w)$.

Proof. To prove the above formula, let us work on the $n$’th partial sum for $\exp(z + w)$:
\[
\sum_{k=0}^{n} \frac{(z + w)^k}{k!}
= \sum_{k=0}^{n} \frac{1}{k!} \sum_{j=0}^{k} \binom{k}{j} z^j w^{k-j}
= \sum_{k=0}^{n} \sum_{j=0}^{k} \frac{z^j w^{k-j}}{j!\,(k - j)!}.
\]

At this point, we interchange the order of the two summation signs. This can be a bit
awkward, but notice that the above expression says that "for all $k \in \{0,1, \dots, n\}$, we
sum over all $j \in \mathbb{N}$ so that $0 \le j \le k$". If we keep track of which pairs of indices
$(k,j)$ this includes, then we get the diagram shown in Fig. 3.
If we want to switch the order of summation, then we need to be able to say something
to the effect of "for all $j$ in some set, we sum over $k$ such that bla bla". But if you look
at the above diagram, notice that it is exactly described by the following: "for all
$j \in \{0,1, \dots, n\}$, we sum over all $k \in \mathbb{N}$ so that $j \le k \le n$". That is, we are justified
in continuing the above computation as follows:
\[
\cdots = \sum_{j=0}^{n} \sum_{k=j}^{n} \frac{z^j w^{k-j}}{j!\,(k - j)!}
= \sum_{j=0}^{n} \sum_{k=0}^{n-j} \frac{z^j w^k}{j!\,k!}.
\]

Fig. 3. Here, we see a visualisation of the pairs of indices $(k,j)$ that we sum over above.

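Notice that in the last step, we made a change of summation index in the inner sum,
replacing $k - j$ by a new index that we again call $k$. Finally, by taking the limit as
$n \to \infty$ on both sides of our computation, we arrive at
\[
\sum_{k=0}^{\infty} \frac{(z + w)^k}{k!}
= \sum_{j=0}^{\infty} \sum_{k=0}^{\infty} \frac{z^j w^k}{j!\,k!}
= \Bigl( \sum_{j=0}^{\infty} \frac{z^j}{j!} \Bigr) \Bigl( \sum_{k=0}^{\infty} \frac{w^k}{k!} \Bigr).
\]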

(This is a bit dodgy since n occurs in two different places, making the "partial sum"
on the right-hand side into not really being a partial sum. We ask you to fix this in an
exercise below.)
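The finite identity obtained before taking limits, that is, that the $n$'th partial sum for $\exp(z+w)$ equals the "triangular" double sum over $0 \le j \le n$, $0 \le k \le n - j$, can be verified exactly for rational $z$ and $w$. The sketch below, with function names of our own, does this and then compares with the product $\exp(z)\exp(w)$ for a pair of complex numbers.

```python
import cmath
from fractions import Fraction
from math import factorial

def partial_sum_of_exp_sum(z, w, n):
    """Sum over k = 0, ..., n of (z + w)^k / k!."""
    return sum((z + w) ** k / factorial(k) for k in range(n + 1))

def triangular_double_sum(z, w, n):
    """Sum of z^j w^k / (j! k!) over all j, k >= 0 with j + k <= n."""
    return sum(
        z ** j * w ** k / (factorial(j) * factorial(k))
        for j in range(n + 1)
        for k in range(n - j + 1)
    )

if __name__ == "__main__":
    z, w = Fraction(1, 2), Fraction(-3, 4)
    # With exact fractions, the two finite sums agree exactly.
    print(partial_sum_of_exp_sum(z, w, 8) == triangular_double_sum(z, w, 8))
    # For complex numbers and a large n, both are close to exp(z) * exp(w).
    z, w = 1 + 2j, 0.5 - 1j
    print(partial_sum_of_exp_sum(z, w, 40), cmath.exp(z) * cmath.exp(w))
```

Note that the triangular sum is not the product of two partial sums, which would run over the full square $0 \le j, k \le n$. This is precisely the gap that the exercise below asks you to close.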

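Exercise F.44 (Challenge) Show that if $c_{k,j}$ is absolutely convergent, in the sense
that
\[
\sum_{j=0}^{\infty} \sum_{k=0}^{\infty} |c_{k,j}| < \infty,
\]
then
\[
\lim_{n \to \infty} \sum_{j=0}^{n} \sum_{k=0}^{n-j} c_{k,j} = \lim_{n \to \infty} \sum_{j=0}^{n} \sum_{k=0}^{n} c_{k,j}.
\]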

Remark: This exercise looks worse than it is. The point is just to use the absolute
convergence to show that "the tail" will be as small as you want.

Remark F.45 The above exercise can be extended in the sense that if the double sum
converges absolutely, then all ways of summing that double series will converge and give
the same result.

Remark F.46 The scheme for extending the power series of the exponential function
to the complex plane works for all power series. Indeed, since all power series converge
absolutely on interior points of their intervals of convergence, it follows by the same
argument that every power series converges on an open disc in the complex plane with
the same center and radius as its interval of convergence.

How to define the trigonometric functions and their inverse functions


In Chapter 2, we gave a geometric definition of the trigonometric functions that depended
on the concept of arclength. While this was discussed in Chapter 12, and could be used
to put the definition on firm ground, defining the trigonometric functions in terms of
their Taylor series is cleaner.

Definition F.47 For all $x \in \mathbb{R}$, we define
\[
\sin x = \sum_{k=0}^{\infty} \frac{(-1)^k x^{2k+1}}{(2k + 1)!}
\]
and
\[
\cos x = \sum_{k=0}^{\infty} \frac{(-1)^k x^{2k}}{(2k)!}.
\]
(Notice that we do not need to make any new definition of the tangent, since, as before,
we just put $\tan x = \sin x / \cos x$.)
Let us now indicate how to collect all fundamental facts on the sine and cosine
starting from the above definition.

• The domains of the sine and cosine are $\mathbb{R}$, and $\sin 0 = 0$ and $\cos 0 = 1$.
• $\sin x$ is an odd function and $\cos x$ is an even function.
• Since the power series defining the sine and cosine converge uniformly on all finite
intervals in $\mathbb{R}$, it follows that we can differentiate them term by term. This
yields $(\sin x)' = \cos x$ and $(\cos x)' = -\sin x$.
• As we saw in exercise 9.6, the Pythagorean identity $\cos^2 x + \sin^2 x = 1$ follows from
the above points using the main corollary of the Mean Value Theorem.
• By comparing the above power series to that of the complex exponential, we get
$\exp(ix) = \cos x + i \sin x$.
• By equating the real and imaginary parts of the left and right-hand sides of the
identity $\exp(i(a + b)) = \exp(ia) \exp(ib)$, the addition formulas of the sine and
cosine follow immediately.
• By combining the Pythagorean identity with the addition formulas, we obtain the
double-angle and half-angle formulas.
• By using the error term estimate of the alternating series test, we can check that
$\cos(2) < 0$. Since $\cos 0 = 1 > 0$ and the cosine is continuous, the intermediate value
theorem implies that there exists at least one point $c \in (0,2)$ so that $\cos(c) = 0$.
Let $c$ be the smallest such point, and define $\pi = 2c$.
F-20 APPENDIX F. TAYLOR AND POWER SERIES

• Since $\cos(x)$ is strictly positive on $(0, \pi/2)$, it follows that $\sin x$ is strictly
increasing on $[0, \pi/2]$. In particular, this implies that $\sin(\pi/2)$ is positive, and we
can then deduce from the Pythagorean identity that $\sin(\pi/2) = 1$.
• Using the addition formulas in combination with the values of the sine and cosine at
$\pi/2$, we can obtain all translation formulas for the sine and cosine involving $\pi/2$,
$\pi$ and $2\pi$.
• In particular, from the above, we find that the cosine is strictly positive on
$(-\pi/2, \pi/2)$ and that the sine is strictly positive on $(0, \pi)$. By the differentiation
formulas, this implies that the sine is strictly increasing on $[-\pi/2, \pi/2]$ and that the
cosine is strictly decreasing on $[0, \pi]$.
• The values of the sine and cosine at $\pi/3$ follow by using the above to play around
with the expression $\sin(3x)$. The values at $\pi/6$ can then be deduced.
• It follows from the Pythagorean identity that the ranges of the sine and cosine
are contained in $[-1, 1]$. Using the fact that $\sin(\pm\pi/2) = \pm 1$, $\cos(0) = 1$ and
$\cos(\pi) = -1$ in combination with the intermediate value theorem, we obtain that their
ranges are exactly $[-1, 1]$.
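
To make the list above a little more concrete, here is a minimal numerical sketch in Python. It is an illustration only, not part of the formal development: the function names (`sin_series`, `cos_series`, `smallest_zero_of_cos`), the number of terms and the tolerance are our own choices. It sums the defining series, checks that $\cos(2) < 0$, and then locates a zero of the cosine in $(0,2)$ by bisection. The code assumes, rather than proves, that this zero is the smallest positive one.

```python
import math  # used only to compare against the library value of pi


def sin_series(x, terms=30):
    """Partial sum of sum_{k>=0} (-1)^k x^(2k+1) / (2k+1)!."""
    total, term = 0.0, x  # term holds (-1)^k x^(2k+1) / (2k+1)!
    for k in range(terms):
        total += term
        term *= -x * x / ((2 * k + 2) * (2 * k + 3))
    return total


def cos_series(x, terms=30):
    """Partial sum of sum_{k>=0} (-1)^k x^(2k) / (2k)!."""
    total, term = 0.0, 1.0  # term holds (-1)^k x^(2k) / (2k)!
    for k in range(terms):
        total += term
        term *= -x * x / ((2 * k + 1) * (2 * k + 2))
    return total


def smallest_zero_of_cos(lo=0.0, hi=2.0, tol=1e-14):
    """Bisection on [lo, hi], using cos(lo) > 0 > cos(hi).

    Assumption: the cosine has exactly one zero in (0, 2), so the zero
    found here is the smallest positive one.
    """
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if cos_series(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


c = smallest_zero_of_cos()
pi_from_series = 2 * c  # the definition pi = 2c from the list above

print("cos(2) from the series:", cos_series(2.0))
print("pi from the series:    ", pi_from_series)
print("difference to math.pi: ", abs(pi_from_series - math.pi))
print("sin(c) from the series:", sin_series(c))
```

Note that the terms are updated by a ratio instead of computing powers and factorials from scratch; this is just a convenience to avoid very large intermediate numbers.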

Above, you should note one thing that might strike you as shocking. We are defining
$\pi$ to be twice the value of the smallest positive zero of the cosine. Below, we point out
an algorithm for computing its value to as high a level of accuracy as we would ever want.
Also, we point out that since all properties of the tangent are derived from its definition
$\tan(x) = \sin(x)/\cos(x)$, as soon as we know everything about the cosine and sine
functions, we also obtain all facts on the tangent function.
But before we do this, let us now turn to the inverse trigonometric functions. By
the list of points above, it follows that the cosine is invertible if its domain is restricted
to $[0, \pi]$ and that the sine is invertible if its domain is restricted to $[-\pi/2, \pi/2]$, and
that when restricted to these domains, these functions retain their full range. Similarly,
we obtain that the tangent function is invertible when restricted to $(-\pi/2, \pi/2)$ with
range $\mathbb{R}$. Now, since we have already, in earlier chapters, explained how to obtain all
properties of the inverse trigonometric functions from the sine, cosine and tangent, we
need not repeat this here. However, we do recall the Taylor series

\[
\arctan x = \sum_{k=0}^{\infty} (-1)^k \frac{x^{2k+1}}{2k+1}, \qquad x \in [-1, 1],
\]

(see exercise 13.56 and Example F.3 for an explanation of this). Since $\arctan(1) = \pi/4$,
it follows that

\[
\pi = 4 \sum_{k=0}^{\infty} \frac{(-1)^k}{2k+1} = 4 \Big( 1 - \frac{1}{3} + \frac{1}{5} - \cdots \Big).
\]

Unfortunately, while this formula for $\pi$ is very pretty, it converges rather slowly, meaning
we have to compute a lot of terms to get a reasonable accuracy for $\pi$. But, hey, at least
it is an alternating series, and the error is therefore easily tracked.
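
To get a feeling for how slowly the formula converges, here is a small Python sketch (the function name `leibniz_pi` and the choices of the number of terms are ours, for illustration only). It sums the first $n$ terms and returns, together with the approximation, the bound from the alternating series test: the error is at most the size of the first omitted term, that is, $4/(2n+1)$.

```python
import math  # used only to compare against the library value of pi


def leibniz_pi(n_terms):
    """Return (approximation, error_bound) for pi = 4 * sum (-1)^k / (2k+1).

    The bound 4 / (2 * n_terms + 1) is the size of the first omitted term,
    as given by the alternating series test.
    """
    partial = 0.0
    for k in range(n_terms):
        partial += (-1) ** k / (2 * k + 1)
    return 4.0 * partial, 4.0 / (2 * n_terms + 1)


for n in (10, 100, 1000, 10000):
    approx, bound = leibniz_pi(n)
    print(n, approx, "actual error:", abs(approx - math.pi), "bound:", bound)
```

Since the bound only shrinks like $1/n$, each additional correct decimal costs roughly ten times as many terms.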
F.5. POWER SERIES AND DIFFERENTIAL EQUATIONS F-21

F.5 Power series as a solution method to linear differential equations with non-constant coefficients

We end this part of the epilogue with an example meant to illustrate how we can use
the "power of creation", mentioned on page F-4, to "genetically engineer" solutions to a
differential equation. This is also relevant to our discussion on how to define the
elementary functions. Indeed, by using this technique, we can start from various differential
equations that we want our elementary functions to satisfy, and then "genetically create"
their solutions from scratch.
As a motivational example, let us consider Airy’s differential equation, which the
techniques we have seen so far in the course would not allow us to solve.

Example F.48 (Airy’s equation) To give an idea of what we can accomplish using
power series, we consider a famous equation from physics known as Airy’s equation:

\[
y'' - xy = 0.
\]

Why would we be interested in the Airy equation? Well, the following quote is taken
from Wikipedia (and is meant to make this equation sound important):

[The function solving the Airy equation] is the solution to Schrödinger’s equation for a
particle confined within a triangular potential well, [it] underlies the form of the intensity
near an optical directional caustic, [and] is also important in microscopy and astronomy;
it describes the pattern, due to diffraction and interference, produced by a point source
of light.

Ok, so how to solve Airy’s equation? The first observation is that the Airy equation
is a homogeneous and linear differential equation of order 2 – but with non-constant
coefficients! For this reason, none of the solution methods we have seen for differential
equations, so far, will work. This means we need a new idea. So, let us try to "genetically
engineer" (!) a solution of the form

\[
y(x) = \sum_{k=0}^{\infty} c_k x^k.
\]

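This is actually a very good strategy, which (almost immediately) leads to the two
(linearly independent) solutions

\[
y_0(x) = 1 + \frac{x^3}{2 \cdot 3} + \frac{x^6}{2 \cdot 3 \cdot 5 \cdot 6}
       + \frac{x^9}{2 \cdot 3 \cdot 5 \cdot 6 \cdot 8 \cdot 9} + \cdots
\]

and

\[
y_1(x) = x + \frac{x^4}{3 \cdot 4} + \frac{x^7}{3 \cdot 4 \cdot 6 \cdot 7}
       + \frac{x^{10}}{3 \cdot 4 \cdot 6 \cdot 7 \cdot 9 \cdot 10} + \cdots .
\]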

As in the case of 2nd order linear differential equations with constant coefficients, here,
the solution space is still linear and 2-dimensional, so all solutions are of the form

\[
y = A y_0 + B y_1.
\]

While the solution method of using Taylor series should be understandable for students
having read this far in the lecture notes, we do not include the details as an act of mercy.
Instead, we refer the interested student to exercise F.51, below, and/or the YouTube
video https://www.youtube.com/watch?v=0jnXdXfIbKk for the details.
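
Without giving away the derivation, we can at least check numerically that the two series displayed above do solve Airy's equation. The following Python sketch is an illustration under our own assumptions: the function names and the truncation at 60 coefficients are arbitrary choices, and the rule $c_2 = 0$, $c_{k+3} = c_k / ((k+2)(k+3))$ is simply read off from the pattern of denominators in $y_0$ and $y_1$. It differentiates the truncated series term by term and evaluates the residual $y'' - xy$.

```python
def airy_series_coeffs(c0, c1, n_coeffs=60):
    """Coefficients c_0, ..., c_{n_coeffs-1} of y = sum c_k x^k.

    Follows the pattern of the displayed series: c_2 = 0 and
    c_{k+3} = c_k / ((k + 2) * (k + 3)).
    (c0, c1) = (1, 0) gives y_0 and (c0, c1) = (0, 1) gives y_1.
    """
    coeffs = [0.0] * n_coeffs
    coeffs[0], coeffs[1] = c0, c1
    for k in range(n_coeffs - 3):
        coeffs[k + 3] = coeffs[k] / ((k + 2) * (k + 3))
    return coeffs


def evaluate(coeffs, x):
    """Evaluate the polynomial sum c_k x^k at x."""
    return sum(c * x ** k for k, c in enumerate(coeffs))


def second_derivative(coeffs):
    """Term-by-term second derivative: sum (k+2)(k+1) c_{k+2} x^k."""
    return [(k + 2) * (k + 1) * coeffs[k + 2] for k in range(len(coeffs) - 2)]


def residual(coeffs, x):
    """Value of y'' - x*y for the truncated series at x."""
    return evaluate(second_derivative(coeffs), x) - x * evaluate(coeffs, x)


y0 = airy_series_coeffs(1.0, 0.0)
y1 = airy_series_coeffs(0.0, 1.0)

for x in (-2.0, -0.5, 0.5, 1.0, 2.0):
    print(x, residual(y0, x), residual(y1, x))
```

The residuals are not exactly zero, both because the series is truncated and because of rounding, but on a moderate interval such as $[-2, 2]$ they should be negligibly small.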

Suitable linear combinations of the functions $y_0(x)$ and $y_1(x)$, expressed above, turn
out to be so useful that they are called the Airy functions, and given the symbols
$\mathrm{Ai}(x)$ and $\mathrm{Bi}(x)$. In fact, some mathematicians consider these to be among
the "standard" functions of mathematics! (Note that if you think about it, $\sin x$ and
$\cos x$ are nothing but two independent solutions to the differential equation $y'' + y = 0$.)
In Fig. 4, we show a plot of the graphs of Airy’s functions. Note that the blue one seems
to be the love child of the sine function and the exponential function, while the red one
seems to be the offspring of the cosine and the exponential function.

Fig. 4. Airy’s functions. Which is which?

Exercise F.49 Suppose that $y(x) = \sum_{k=0}^{\infty} c_k x^k$ satisfies the initial value problem

\[
\begin{cases}
y' = y, \\
y(0) = 1.
\end{cases}
\]

(a) By differentiating the power series of $y$ term-by-term, show that

\[
y' = \sum_{k=0}^{\infty} (k + 1) c_{k+1} x^k.
\]

(b) Use the relation $y' = y$ to show that we must have the relation

\[
c_{k+1} = \frac{c_k}{k+1}, \qquad \forall k \geq 0.
\]

(c) Use the above, in combination with the initial condition $y(0) = 1$, to determine
the values of all $c_k$. Do you recognise the function $y(x)$? Is this surprising?

Exercise F.50 In this exercise, we invite you to repeat the procedure from the above
exercise to find a solution to the initial value problem

\[
\begin{cases}
y'' + y = 0, \\
y(0) = 0, \\
y'(0) = 1.
\end{cases}
\]

(a) Find an expression for $y''$.
(b) Use the relation $y'' + y = 0$ to find a formula for $c_{k+2}$ in terms of $c_k$.
(c) Use the above, in combination with the initial conditions $y(0) = 0$ and $y'(0) = 1$,
to determine the values of all $c_k$. Do you recognise the function $y(x)$? Is this
surprising?
(d) What happens if you change the initial conditions to $y(0) = 1$ and $y'(0) = 0$?

Exercise F.51 Repeat the procedure from the above exercise to solve Airy’s equation.
