OIL Text
Prepared by Bruno Belevan, Parham Hamidi, Nisha Malhotra, and Elyse Yeager
Adapted from CLP Calculus by Joel Feldman, Andrew Rechnitzer, and Elyse Yeager
This document was typeset on Thursday 7th January, 2021, and is compatible
with the 12 December 2020 version.
§§ Licenses and Attributions
Copyright © 2020 Bruno Belevan, Parham Hamidi, Nisha Malhotra, and Elyse Yeager
This textbook contains new material as well as material adapted from open sources.
• Chapters 1 and 2 (and their associated appendix sections) were adapted with minor
changes from Chapters 1 and 2 of CLP 3 – Multivariable Calculus by Feldman, Rechnitzer,
and Yeager under a Creative Commons Attribution-NonCommercial-ShareAlike
4.0 International license.
• Chapters 3 and 5 (and their associated appendix sections) and Appendix B were
adapted with minor changes from Chapters 1 and 3, Section 2.4, and Appendix A
of CLP 2 – Integral Calculus by Feldman, Rechnitzer, and Yeager under a Creative
Commons Attribution-NonCommercial-ShareAlike 4.0 International license.
• Chapter 4 contains content adapted with significant changes from Sections 1.1, 3.1,
Ch 4 introduction, 4.1, and 4.2 of Introductory Statistics by Ilowsky and Dean under
a Creative Commons Attribution License v4.0.
§§ Acknowledgements
UBC Point Grey campus sits on the traditional, ancestral and unceded territory of the
xʷməθkʷəy̓əm (Musqueam). Musqueam and UBC have an ongoing relationship sharing
insight, knowledge, and labour. Those interested in learning more about this relationship
might start here.
Matt Coles of the University of British Columbia has been an important member of the
project to develop quality open resources for Math 105. Thanks to Andrew Rechnitzer at
UBC Mathematics for help with converting LaTeX to PreTeXt.
The development of this text was supported by an OER Implementation Grant,
provided through the UBC Open Educational Resources Fund.
§§ Contact
To report a mistake, or to let us know you’re using this book in a course you’re teaching,
please email [email protected]
Contents
2 Partial Derivatives
2.1 Partial Derivatives
2.2 Higher Order Derivatives
2.3 Local Maximum and Minimum Values
2.3.1 Critical Points
2.3.2 Classifying Critical Points
2.4 Absolute Minima and Maxima
2.4.1 (Optional) Parametrization
2.5 Lagrange Multipliers
2.5.1 Bounded vs Unbounded Constraints
2.6 (Optional) Utility and Demand Functions
2.6.1 Constrained Optimization of the Utility Function
2.6.2 Demand Curves
3 Integration
3.1 Definition of the Integral
3.1.1 Summation Notation
3.1.2 The Definition of the Definite Integral
3.1.3 Using Known Areas to Evaluate Integrals
3.1.4 (Optional) Surplus
3.2 Basic Properties of the Definite Integral
4 Probability
4.1 Introduction
4.1.1 Foundational Vocabulary and Notation
4.1.2 Discrete vs Continuous
4.1.3 Combining Events
4.1.4 Equally Likely Outcomes
4.2 Probability Mass Function (PMF)
4.2.1 Limitations of Probability Mass Function (PMF)
4.3 Cumulative Distribution Function (CDF)
4.3.1 Dot Diagrams
4.4 Probability Density Function (PDF)
4.5 Expected Value
4.5.1 Motivation: Long-Term Average
4.5.2 Definition and Examples
4.5.3 Checking your Expectation Calculation
Chapter 1
Before we get started doing calculus in two and three dimensions we need to brush up
on some basic geometry that we will use a lot. We are already familiar with the Cartesian
plane¹, but we’ll start from the beginning.
1.1 Points
Each point in two dimensions may be labeled by two coordinates² (x, y) which specify the
position of the point in some units with respect to some axes as in the figure below.
[Figure: the point (x, y) plotted on x- and y-axes]
1 René Descartes (1596–1650) was a French scientist and philosopher, who lived in the Dutch Republic
for roughly twenty years after serving in the (mercenary) Dutch States Army. He is viewed as the father
of analytic geometry, which uses numbers to study geometry.
2 This is why the xy-plane is called “two dimensional” — the name of each point consists of two real
numbers.
3 Not surprisingly, the 2 in $\mathbb{R}^2$ signifies that each point is labelled by two numbers and the $\mathbb{R}$ in $\mathbb{R}^2$
signifies that the numbers in question are real numbers. There are more advanced applications (for
example in signal analysis and in quantum mechanics) where complex numbers are used. The space of
all pairs $(z_1, z_2)$, with $z_1$ and $z_2$ complex numbers, is denoted $\mathbb{C}^2$.
Vectors and Geometry 1.1 Points
Similarly, each point in three dimensions may be labeled by three coordinates ( x, y, z),
as in the two figures below.
[Figures: two views of the point (x, y, z) plotted on three-dimensional x-, y- and z-axes]
The set of all points in three dimensions is denoted $\mathbb{R}^3$. The plane that contains, for
example, the x- and y-axes is called the xy-plane.
More generally,
• The set of all points (x, y, z) that obey z = c is a plane that is parallel to the xy-plane
and is a distance |c| from it. If c > 0, the plane z = c is above the xy-plane. If
c < 0, the plane z = c is below the xy-plane. We say that the plane z = c is a signed
distance c from the xy-plane.
• The set of all points ( x, y, z) that obey y = b is a plane that is parallel to the xz-plane
and is a signed distance b from it.
• The set of all points ( x, y, z) that obey x = a is a plane that is parallel to the yz-plane
and is a signed distance a from it.
[Figures: the planes z = c, y = b and x = a, each drawn on three-dimensional axes]
To see that the distance from the point $(x, y, z)$ to the origin $(0, 0, 0)$ is indeed $\sqrt{x^2 + y^2 + z^2}$,
• apply Pythagoras to the right-angled triangle with vertices $(0, 0, 0)$, $(x, 0, 0)$ and
$(x, y, 0)$ to see that the distance from $(0, 0, 0)$ to $(x, y, 0)$ is $\sqrt{x^2 + y^2}$ and then
• apply Pythagoras to the right-angled triangle with vertices $(0, 0, 0)$, $(x, y, 0)$ and
$(x, y, z)$ to see that the distance from $(0, 0, 0)$ to $(x, y, z)$ is
$\sqrt{\left(\sqrt{x^2 + y^2}\right)^2 + z^2} = \sqrt{x^2 + y^2 + z^2}$.
[Figure: the point (x, y, z) above the point (x, y, 0), with the right-angled triangles through (x, 0, 0) and (x, y, 0)]
Notice that this gives us the equation for a sphere quite directly. All the points on a sphere
are equidistant from the centre of the sphere. So, for example, the equation of the sphere
centered on (1, 2, 3) with radius 4, that is, the set of all points ( x, y, z) whose distance from
(1, 2, 3) is 4, is
$(x - 1)^2 + (y - 2)^2 + (z - 3)^2 = 16$
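As a quick numerical check of the two-step Pythagoras argument and of the sphere equation, here is a minimal Python sketch. The function names `distance_to_origin` and `on_sphere` are our own illustrative choices, not notation from the text.

```python
import math

def distance_to_origin(x, y, z):
    # Step 1: Pythagoras in the xy-plane gives the distance from (0,0,0) to (x,y,0).
    base = math.sqrt(x**2 + y**2)
    # Step 2: Pythagoras again, with legs `base` and |z|.
    return math.sqrt(base**2 + z**2)

def on_sphere(point, centre, radius, tol=1e-9):
    # A point lies on the sphere when its distance from the centre equals the radius.
    dx, dy, dz = (p - c for p, c in zip(point, centre))
    return abs(distance_to_origin(dx, dy, dz) - radius) < tol

# (5, 2, 3) is 4 units from (1, 2, 3), so it lies on the sphere above.
print(on_sphere((5, 2, 3), (1, 2, 3), 4))  # True
```

The tolerance `tol` is there because floating point square roots are only approximate.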
If you’re having a hard time picturing the three-dimensional axes, Appendix section A.1
will lead you through folding a model out of a piece of paper.
1.2 Vectors
In many of our applications in 2d and 3d, we will encounter quantities that have both a
magnitude (like a distance) and also a direction. Such quantities are called vectors. That is,
a vector is a quantity which has both a direction and a magnitude, like a velocity. If you are
moving, the magnitude (length) of your velocity vector is your speed (distance travelled
per unit time) and the direction of your velocity vector is your direction of motion. To
specify a vector in three dimensions you have to give three components, just as for a
point. To draw the vector with components a, b, c you can draw an arrow from the point
(0, 0, 0) to the point (a, b, c). Similarly, to specify a vector in two dimensions you have to
give two components. To draw the vector with components a and b, you can draw an
arrow from the point (0, 0) to the point (a, b).
[Figures: the vector with components a, b drawn from (0, 0) to (a, b), and the vector with components a, b, c drawn from (0, 0, 0) to (a, b, c)]
There are many situations in which it is preferable to draw a vector with its tail at
some point other than the origin. For example, it is natural to draw the velocity vector
of a moving particle with the tail of the velocity vector at the position of the particle,
whether or not the particle is at the origin. The sketch below shows a moving particle and
its velocity vector at two different times.
[Figure: a moving particle and its velocity vector v at two different times]
As a second example, suppose that you are analyzing the motion of a pendulum. There
are three forces acting on the pendulum bob: gravity g, which is pulling the bob straight
down, tension t in the rod, which is pulling the bob in the direction of the rod, and air
resistance r, which is pulling the bob in a direction opposite to its direction of motion. All
three forces are acting on the bob. So it is natural to draw all three arrows representing the
forces with their tails at the bob.
[Figure: a pendulum with the forces g, t and r drawn as arrows with their tails at the bob]
In this text, we will use bold-faced letters, like v, t, g, to designate vectors. In handwriting,
it is clearer to use a small overhead arrow⁴, as in $\vec{v}$, $\vec{t}$, $\vec{g}$, instead. Also, when we
want to emphasize that some quantity is a number, rather than a vector, we will call the
number a scalar.
Both points and vectors in 2d are specified by two numbers. Until you get used to this,
it might confuse you sometimes — does a given pair of numbers represent a point or a
vector? To distinguish⁵ between the components of a vector and the coordinates of the
point at its head, when its tail is at some point other than the origin, we shall use angle
brackets rather than round brackets around the components of a vector. For example, the
figure below shows the two-dimensional vector $\langle 2, 1 \rangle$ drawn in three different positions.
In each case, when the tail is at the point (u, v) the head is at (2 + u, 1 + v). We warn you
that, out in the real world⁶, no one uses notation that distinguishes between components
of a vector and the coordinates of its head; usually round brackets are used for both. It
is up to you to keep straight which is being referred to.
[Figure: the vector $\langle 2, 1 \rangle$ drawn in three different positions in the xy-plane, with the labelled points (0, 0), (8, 0) and (6, 3)]
By way of summary,
Notation 1.2.1.
we use
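$\mathbf{a} = \langle a_1, a_2 \rangle,\ \mathbf{b} = \langle b_1, b_2 \rangle \implies \mathbf{a} + \mathbf{b} = \langle a_1 + b_1, a_2 + b_2 \rangle$
$\mathbf{a} = \langle a_1, a_2 \rangle,\ s \text{ a number} \implies s\mathbf{a} = \langle s a_1, s a_2 \rangle$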
Pictorially, you add the vector b to the vector a by drawing b with its tail at the head
of a and then drawing a vector from the tail of a to the head of b, as in the figure on the
left below. For a number s, we can draw the vector sa, by just
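[Figures: left, the sum $\mathbf{a} + \mathbf{b}$ drawn with the tail of $\mathbf{b}$ at the head of $\mathbf{a}$; right, the vectors $\mathbf{a}$, $2\mathbf{a}$ and $-2\mathbf{a}$]
The special case of multiplication by $s = -1$ appears so frequently that $(-1)\mathbf{a}$ is given the
shorter notation $-\mathbf{a}$. That is,
$-\langle a_1, a_2 \rangle = \langle -a_1, -a_2 \rangle$
[Figure: the vectors $\mathbf{a}$, $\mathbf{b}$, $-\mathbf{b}$ and $\mathbf{a} - \mathbf{b}$]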
The operations of addition and multiplication by a scalar that we have just defined are
quite natural and rarely cause any problems, because they inherit from the real numbers
the properties of addition and multiplication that you are used to.
We have just been introduced to many definitions. Let’s see some of them in action.
Example 1.2.4
For example, if
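$\mathbf{a} = \langle 1, 2, 3 \rangle \qquad \mathbf{b} = \langle 3, 2, 1 \rangle \qquad \mathbf{c} = \langle 1, 0, 1 \rangle$
then
$2\mathbf{a} = 2 \langle 1, 2, 3 \rangle = \langle 2, 4, 6 \rangle$
$-\mathbf{b} = -\langle 3, 2, 1 \rangle = \langle -3, -2, -1 \rangle$
$3\mathbf{c} = 3 \langle 1, 0, 1 \rangle = \langle 3, 0, 3 \rangle$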
and
Example 1.2.4
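The componentwise rules for addition and multiplication by a scalar translate directly into code. The following is a minimal sketch in Python, representing vectors as tuples; the helper names `add` and `scale` are our own and are only illustrative.

```python
def add(a, b):
    # Vector plus vector: add matching components.
    return tuple(ai + bi for ai, bi in zip(a, b))

def scale(s, a):
    # Scalar times vector: multiply every component by s.
    return tuple(s * ai for ai in a)

a, b, c = (1, 2, 3), (3, 2, 1), (1, 0, 1)
print(scale(2, a))   # (2, 4, 6)
print(scale(-1, b))  # (-3, -2, -1)
print(scale(3, c))   # (3, 0, 3)
print(add(a, b))     # (4, 4, 4)
```

The same two functions work unchanged for two-dimensional vectors, since they never assume a particular number of components.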
There are some vectors that occur sufficiently commonly that they are given special
names. One is the vector 0. Some others are the “standard basis vectors”.
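Definition 1.2.5.
We’ll explain the little hats in the notation $\hat{\imath}$, $\hat{\jmath}$, $\hat{k}$ shortly. Some people rename $\hat{\imath}$, $\hat{\jmath}$ and
$\hat{k}$ to $\mathbf{e}_1$, $\mathbf{e}_2$ and $\mathbf{e}_3$ respectively. Using the above properties we have, for all vectors,
$\mathbf{a} = \langle a_1, a_2 \rangle \implies |\mathbf{a}| = \sqrt{a_1^2 + a_2^2}$
$\mathbf{a} = \langle a_1, a_2, a_3 \rangle \implies |\mathbf{a}| = \sqrt{a_1^2 + a_2^2 + a_3^2}$
A unit vector is a vector of length one. We’ll sometimes use the accent ˆ to emphasise
that the vector $\hat{\mathbf{a}}$ is a unit vector. That is, $|\hat{\mathbf{a}}| = 1$.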
Example 1.2.7
Recall that multiplying a vector $\mathbf{a}$ by a positive number s changes the length of the vector
by a factor s without changing the direction of the vector. So (assuming that $|\mathbf{a}| \ne 0$) $\frac{\mathbf{a}}{|\mathbf{a}|}$
is a unit vector that has the same direction as $\mathbf{a}$. For example, $\frac{\langle 1, 1, 1 \rangle}{\sqrt{3}}$ is a unit vector that
points in the same direction as $\langle 1, 1, 1 \rangle$.
Example 1.2.7
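To see the length formula and the unit-vector construction at work, here is a small Python sketch. The function names `length` and `unit` are hypothetical helpers introduced only for this illustration.

```python
import math

def length(a):
    # |a| is the square root of the sum of the squared components.
    return math.sqrt(sum(ai**2 for ai in a))

def unit(a):
    # Divide a by its length; this is only possible when |a| is not zero.
    n = length(a)
    if n == 0:
        raise ValueError("the zero vector has no direction")
    return tuple(ai / n for ai in a)

u = unit((1, 1, 1))
print(u)          # each component is approximately 1/sqrt(3)
print(length(u))  # approximately 1
```

Because of floating point rounding, the computed length of `u` may differ from 1 in the last few digits.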
• “scalar plus scalar”, “scalar plus vector” and “vector plus vector”
• “scalar times scalar”, “scalar times vector” and “vector times vector”
We have been using “scalar plus scalar” and “scalar times scalar” since childhood. “Vector
plus vector” and “scalar times vector” were just defined above. There is no sensible way
to define “scalar plus vector”, so we won’t. This leaves “vector times vector”. There are
actually two widely used such products. The first is the dot product, which is the topic of
this section, and which is used to easily determine the angle θ (or more precisely, cos θ)
between two vectors. (The second widely-used product of two vectors, the cross product,
is not a part of this course.)
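$\mathbf{a} = \langle a_1, a_2 \rangle,\ \mathbf{b} = \langle b_1, b_2 \rangle \implies \mathbf{a} \cdot \mathbf{b} = a_1 b_1 + a_2 b_2$
$\mathbf{a} = \langle a_1, a_2, a_3 \rangle,\ \mathbf{b} = \langle b_1, b_2, b_3 \rangle \implies \mathbf{a} \cdot \mathbf{b} = a_1 b_1 + a_2 b_2 + a_3 b_3$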
Proof. Properties 0 through 5 are almost immediate consequences of the definition. For
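example, for property 3 (which is called the distributive law) in dimension 2,
$\mathbf{a} \cdot (\mathbf{b} + \mathbf{c}) = \langle a_1, a_2 \rangle \cdot \langle b_1 + c_1, b_2 + c_2 \rangle$
$= a_1(b_1 + c_1) + a_2(b_2 + c_2) = a_1 b_1 + a_1 c_1 + a_2 b_2 + a_2 c_2$
$\mathbf{a} \cdot \mathbf{b} + \mathbf{a} \cdot \mathbf{c} = \langle a_1, a_2 \rangle \cdot \langle b_1, b_2 \rangle + \langle a_1, a_2 \rangle \cdot \langle c_1, c_2 \rangle$
$= a_1 b_1 + a_2 b_2 + a_1 c_1 + a_2 c_2$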
Property 6 is sufficiently important that it is often used as the definition of dot product.
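It is not at all an obvious consequence of the definition. To verify it, we just write $|\mathbf{a} - \mathbf{b}|^2$
in two different ways. The first expresses $|\mathbf{a} - \mathbf{b}|^2$ in terms of $\mathbf{a} \cdot \mathbf{b}$. It is
$|\mathbf{a} - \mathbf{b}|^2 \overset{1}{=} (\mathbf{a} - \mathbf{b}) \cdot (\mathbf{a} - \mathbf{b})$
$\overset{3}{=} \mathbf{a} \cdot \mathbf{a} - \mathbf{a} \cdot \mathbf{b} - \mathbf{b} \cdot \mathbf{a} + \mathbf{b} \cdot \mathbf{b}$
$\overset{1,2}{=} |\mathbf{a}|^2 + |\mathbf{b}|^2 - 2\,\mathbf{a} \cdot \mathbf{b}$
Here, $\overset{1}{=}$, for example, means that the equality is a consequence of property 1. The second
way we write $|\mathbf{a} - \mathbf{b}|^2$ involves cos θ and follows from the cosine law for triangles. Just
in case you don’t remember the cosine law, we’ll derive it right now! Start by applying
Pythagoras to the shaded triangle in the right hand figure of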
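[Figures: left, the triangle formed by $\mathbf{a}$, $\mathbf{b}$ and $\mathbf{a} - \mathbf{b}$, with angle θ between $\mathbf{a}$ and $\mathbf{b}$; right, the same triangle with side lengths $|\mathbf{a}|$, $|\mathbf{b}|$ and $|\mathbf{a} - \mathbf{b}|$, and with the segments $|\mathbf{a}| \sin\theta$ and $|\mathbf{a}| \cos\theta$ marked]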
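The computation that the figure supports can be sketched as follows. This is the standard derivation, under the assumption that the shaded right-angled triangle has legs $|\mathbf{a}| \sin\theta$ and $|\mathbf{b}| - |\mathbf{a}| \cos\theta$ and hypotenuse $|\mathbf{a} - \mathbf{b}|$, as the labels in the figure suggest.

```latex
\begin{align*}
|\mathbf{a} - \mathbf{b}|^2
  &= \big(|\mathbf{b}| - |\mathbf{a}|\cos\theta\big)^2 + \big(|\mathbf{a}|\sin\theta\big)^2 \\
  &= |\mathbf{b}|^2 - 2|\mathbf{a}|\,|\mathbf{b}|\cos\theta
     + |\mathbf{a}|^2\cos^2\theta + |\mathbf{a}|^2\sin^2\theta \\
  &= |\mathbf{a}|^2 + |\mathbf{b}|^2 - 2|\mathbf{a}|\,|\mathbf{b}|\cos\theta
\end{align*}
```

The last step uses $\cos^2\theta + \sin^2\theta = 1$.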
This is precisely the cosine law⁸. Observe that, when $\theta = \frac{\pi}{2}$, this reduces to (surprise!)
Pythagoras’ Theorem.
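Setting our two expressions for $|\mathbf{a} - \mathbf{b}|^2$ equal to each other,
$|\mathbf{a}|^2 + |\mathbf{b}|^2 - 2\,\mathbf{a} \cdot \mathbf{b} = |\mathbf{a}|^2 + |\mathbf{b}|^2 - 2|\mathbf{a}|\,|\mathbf{b}| \cos\theta$
and cancelling the common terms gives $\mathbf{a} \cdot \mathbf{b} = |\mathbf{a}|\,|\mathbf{b}| \cos\theta$, which is property 6.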
8 You may be used to seeing it written as $c^2 = a^2 + b^2 - 2ab \cos C$, where a, b and c are the lengths of the
three sides of the triangle and C is the angle opposite the side of length c.
[Figure: the vectors $\langle -1, 1, 1 \rangle$, $\langle 1, 0, 1 \rangle$ and $\langle 1, 1, 0 \rangle$ drawn on three-dimensional axes]
Example 1.2.10
9 The concepts of the dot product and perpendicularity have been generalized a lot in mathematics (for
example, from 2d and 3d vectors to functions). The generalization of the dot product is called the “inner
product” and the generalization of perpendicularity is called “orthogonality”.
1.3 Equations of Planes in 3d
[Figure: a plane, its normal vector $\mathbf{n}$, and a point (x, y, z) on the plane]
to that of the plane does uniquely determine the plane. If (x, y, z) is any point on the plane
then the vector $\langle x - x_0, y - y_0, z - z_0 \rangle$, whose tail is at $(x_0, y_0, z_0)$ and whose head is at
(x, y, z), lies entirely inside the plane and so must be perpendicular to $\mathbf{n}$. That is,
$\mathbf{n} \cdot \langle x - x_0, y - y_0, z - z_0 \rangle = 0$
$n_x (x - x_0) + n_y (y - y_0) + n_z (z - z_0) = 0$ or $n_x x + n_y y + n_z z = d$
where $d = n_x x_0 + n_y y_0 + n_z z_0$.
Give an equation of the plane that passes through the point (5, 7, 13) and has normal
vector $\langle 8, 4, 2 \rangle$.
Solution. As we saw in Equation 1.3.1, the components of the normal vector are the coefficients
of the variables:
$8x + 4y + 2z = d$
and
$d = \langle 8, 4, 2 \rangle \cdot \langle 5, 7, 13 \rangle = 8 \cdot 5 + 4 \cdot 7 + 2 \cdot 13 = 94$
So, the equation of the plane is
8x + 4y + 2z = 94
Example 1.3.2
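The recipe used in this example (read the coefficients off the normal vector, then compute d as a dot product with the known point) can be sketched in a few lines of Python. The function name `plane_through` and the choice to return the tuple (a, b, c, d) are our own conventions for this illustration.

```python
def plane_through(point, normal):
    # The plane is a*x + b*y + c*z = d with (a, b, c) = normal
    # and d = normal . point.
    d = sum(n * p for n, p in zip(normal, point))
    return (*normal, d)

# Coefficients (a, b, c, d) of the plane through (5, 7, 13) with normal <8, 4, 2>.
print(plane_through((5, 7, 13), (8, 4, 2)))  # (8, 4, 2, 94)
```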
The normal vector to a plane determines the orientation of the plane in space.
Two planes are orthogonal if their normal vectors are orthogonal. Two planes
are parallel if their normal vectors are parallel.
A plane is parallel to itself, but when we ask for parallel planes, it is usually
implied that they are distinct.
Example 1.3.4
We have just seen that if we write the equation of a plane in the standard form
ax + by + cz = d
then it is easy to read off a normal vector for the plane. It is just $\langle a, b, c \rangle$. So for example
the planes
$P : x + 2y + 3z = 4 \qquad P_1 : 3x + 6y + 9z = 7$
have normal vectors $\mathbf{n} = \langle 1, 2, 3 \rangle$ and $\mathbf{n}_1 = \langle 3, 6, 9 \rangle$, respectively. Since $\mathbf{n}_1 = 3\mathbf{n}$, the two
normal vectors $\mathbf{n}$ and $\mathbf{n}_1$ are parallel to each other. This tells us that the planes $P$ and $P_1$
are parallel to each other.
When the normal vectors of two planes are perpendicular to each other, we say that
the planes are perpendicular to each other. For example the planes
$P : x + 2y + 3z = 4 \qquad P_2 : 2x - y = 7$
have normal vectors $\mathbf{n} = \langle 1, 2, 3 \rangle$ and $\mathbf{n}_2 = \langle 2, -1, 0 \rangle$, respectively. Since
$\mathbf{n} \cdot \mathbf{n}_2 = 1 \times 2 + 2 \times (-1) + 3 \times 0 = 0$
the normal vectors $\mathbf{n}$ and $\mathbf{n}_2$ are mutually perpendicular, so the corresponding planes $P$
and $P_2$ are perpendicular to each other.
Example 1.3.4
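Both tests in this example can be automated using only the dot product. The sketch below is one possible approach, not the text's own method: perpendicularity is the condition $\mathbf{n} \cdot \mathbf{m} = 0$, and for parallelism we use the fact that $(\mathbf{n} \cdot \mathbf{m})^2 = |\mathbf{n}|^2 |\mathbf{m}|^2$ holds exactly when $\cos\theta = \pm 1$. The function names are hypothetical, and the exact `==` comparisons assume integer components; floating point inputs would need a tolerance.

```python
def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def are_perpendicular(n, m):
    # Normals are perpendicular exactly when their dot product is zero.
    return dot(n, m) == 0

def are_parallel(n, m):
    # For nonzero vectors, (n . m)^2 = |n|^2 |m|^2 exactly when cos(theta) = +1 or -1,
    # that is, when one vector is a scalar multiple of the other.
    return dot(n, m) ** 2 == dot(n, n) * dot(m, m)

n, n1, n2 = (1, 2, 3), (3, 6, 9), (2, -1, 0)
print(are_parallel(n, n1))       # True
print(are_perpendicular(n, n2))  # True
print(are_parallel(n, n2))       # False
```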
Example 1.3.5
P : 4x + 3y + 2z = 12
A good way to prepare for sketching a plane is to find the intersection points of the plane
with the x-, y- and z-axes, just as you are used to doing when sketching lines in the xy-
plane. For example, any point on the x axis must be of the form ( x, 0, 0). For ( x, 0, 0)
• only sketch the part of the surface in the first octant. That is, the part with $x \ge 0$,
$y \ge 0$ and $z \ge 0$.
• To do so, sketch the curve of intersection of the surface with the part of the xy-plane
in the first octant and,
• similarly, sketch the curve of intersection of the surface with the part of the xz-plane
in the first octant and the curve of intersection of the surface with the part of the
yz-plane in the first octant.
That’s what we’ll do. The intersection of the plane P with the xy-plane is the straight line
through the two points (3, 0, 0) and (0, 4, 0). So the part of that intersection in the first
octant is the line segment from (3, 0, 0) to (0, 4, 0). Similarly the part of the intersection
of P with the xz-plane that is in the first octant is the line segment from (3, 0, 0) to (0, 0, 6)
and the part of the intersection of P with the yz-plane that is in the first octant is the line
segment from (0, 4, 0) to (0, 0, 6). So we just have to sketch the three line segments joining
the three axis intercepts (3, 0, 0), (0, 4, 0) and (0, 0, 6). That’s it.
[Figure: the first-octant part of the plane P, a triangle with vertices (3, 0, 0), (0, 4, 0) and (0, 0, 6)]
Example 1.3.5
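The axis intercepts used in this sketch come from setting two of the three variables to zero. A minimal Python sketch of that step, assuming all of a, b and c are nonzero (the function name `axis_intercepts` is our own):

```python
def axis_intercepts(a, b, c, d):
    # On the x-axis y = z = 0, so a*x = d; the other two axes work the same way.
    # Assumes a, b and c are all nonzero.
    return ((d / a, 0, 0), (0, d / b, 0), (0, 0, d / c))

# The plane P : 4x + 3y + 2z = 12
print(axis_intercepts(4, 3, 2, 12))  # ((3.0, 0, 0), (0, 4.0, 0), (0, 0, 6.0))
```

If one of the coefficients is zero, the plane is parallel to the corresponding axis and has no intercept on it (or contains the whole axis when d is also zero), so that case would need separate handling.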
Find the equation of the plane that contains the three points (1, −1, 0), (2, 0, 1), and (5, 0, −1).
Solution. Solution 1
We know that the equation of the plane will have the form ax + by + cz = d, where
$\langle a, b, c \rangle$ is a normal vector to the plane. So, we will start by finding a normal vector.
First, let’s find two vectors in the plane. We do this by choosing two pairs of points (it
doesn’t matter which two) and subtracting their coordinates.
[Figure: the vector $\langle 1, 1, 1 \rangle$ from (1, −1, 0) to (2, 0, 1) and the vector $\langle 4, 1, -1 \rangle$ from (1, −1, 0) to (5, 0, −1)]
The normal vector will be a vector $\langle a, b, c \rangle$ that is perpendicular (orthogonal) to the two
vectors $\langle 4, 1, -1 \rangle$ and $\langle 1, 1, 1 \rangle$. The usual way of finding such a vector is by using the cross
product, but that’s a topic for another course. We find it by solving a system of equations.
Remember that two nonzero vectors are perpendicular if their dot product is zero.
• There will be infinitely many normal vectors, all parallel to one another (i.e. scalar
multiples of one another). So, it’s fine that we have all our coordinates in terms of
b. Our normal vectors have the form $\langle -\frac{2}{5}b, b, -\frac{3}{5}b \rangle$. Setting b = 5 gives us integer
coordinates:
$\langle -2, 5, -3 \rangle$
• Now that we have a normal vector, we know our plane equation will look like
$-2x + 5y - 3z = d$
for some constant d. Plugging in any of our three points will let us find d. For
example, the point (1, −1, 0) tells us −2 − 5 + 0 = d, so d = −7.
$-2x + 5y - 3z = -7$
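It is worth checking an answer like this one. The short Python sketch below verifies that $\langle -2, 5, -3 \rangle$ is perpendicular to both in-plane vectors and that all three points satisfy $-2x + 5y - 3z = -7$. The helper names `dot` and `sub` are our own.

```python
def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def sub(p, q):
    # The vector with tail at q and head at p.
    return tuple(pi - qi for pi, qi in zip(p, q))

points = [(1, -1, 0), (2, 0, 1), (5, 0, -1)]
normal = (-2, 5, -3)

u = sub(points[1], points[0])  # (1, 1, 1)
v = sub(points[2], points[0])  # (4, 1, -1)

# The normal must be perpendicular to both vectors lying in the plane.
print(dot(normal, u), dot(normal, v))     # 0 0
# Every point must give the same value of d = -2x + 5y - 3z.
print([dot(normal, p) for p in points])   # [-7, -7, -7]
```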
Solution 2 We know that the equation of the plane will have the form ax + by + cz = d.
The three points give us a system of linear equations, which we can solve using substitu-
tion.
Any nonzero value of d will give an equation of our plane. To get integer coefficients,
we let d = 7.
$2x - 5y + 3z = 7$
Notice this answer is the negative of the answer from Solution 1. They are equivalent
expressions, as is (for example)
$\frac{1}{7}x - \frac{5}{14}y + \frac{3}{14}z = \frac{1}{2}$
Example 1.3.6
1.4 Functions of Two Variables
y = f (x)
where y is the dependent variable and x is the independent variable. Similarly, in a two-
variable function, we generally write
z = f ( x, y)
$z^3 x + z^2 y + xyz - 1 = 0$
we can think of z as an implicitly defined function of x and y. You’ve already seen two
families of implicitly defined functions: planes and spheres.
Example 1.4.1
$z^3 x + z^2 y + xyz - 1 = 0$ ?
$1 + y + y - 1 = 0$
$f(x, y) = \sin(x + y)$
or
$g(x, y) = e^{x^2 + y^2}$
and think that the sine and exponential functions are different from the sine and exponen-
tial functions we’ve seen in two dimensions. They aren’t! When x and y are real numbers,
then $(x + y)$ and $(x^2 + y^2)$ are real numbers as well. We’re taking the sine of a real number
in the first equation, and e to a real power in the second equation, just as we always have.
Functions of two (or more) variables are not so different from functions of one variable
in other ways as well.
Let f ( x, y) be a function that takes pairs of real numbers as inputs, and gives a
real number as its output.
The set of points ( x, y) that can be input to f is the domain of that function. The
set of outputs of f over its entire domain is the range of that function.
Solution. There are three operations in our function: exponentiation, subtraction, and
taking of a square root. We can subtract anything from anything; and we can raise e to
any power. So the only thing that could “break” our function is if we tried to take the
square root of a negative number. This tells us that, in order for f(x, y) to be defined, we
need
$e^{x^2 + y^2} - 2 \ge 0 \implies e^{x^2 + y^2} \ge 2 \implies x^2 + y^2 \ge \ln 2$
One way of describing the domain of this function is to call it “all points (x, y) with
$x^2 + y^2 \ge \ln 2$.” A more standard way is to describe the shape this set makes in $\mathbb{R}^2$: all
points on or outside the circle centred at the origin with radius $\sqrt{\ln 2} \approx 0.83$.
[Figure: the domain, shaded: the region on or outside the circle of radius $\sqrt{\ln 2}$ centred at the origin]
To help you visualize what we mean, take a point in the shaded area above. For example,
(1, .5). If we plug that into our function, it causes no problems:
$f(1, .5) = \sqrt{e^{1^2 + .5^2} - 2} = \sqrt{e^{1.25} - 2} \approx \sqrt{1.49} \approx 1.22$
On the other hand, take a point in the white area. For example, (.5, .5). If we try to plug
this into our function, we end up with
$f(.5, .5) = \sqrt{e^{.5^2 + .5^2} - 2} = \sqrt{e^{0.5} - 2} \approx \sqrt{1.65 - 2} \approx \sqrt{-0.35}$
which is not a real number.
[Figure: the same shaded domain, with the point (1, .5) inside the domain and the point (.5, .5) outside it]
Now, let’s think about range. By choosing larger and larger values of x and y, we can
make $x^2 + y^2$ into larger and larger numbers. So within our restricted domain, the range
of $x^2 + y^2$ is $[\ln 2, \infty)$; so the range of $e^{x^2 + y^2}$ is $[e^{\ln 2}, \infty) = [2, \infty)$; so the range of $e^{x^2 + y^2} - 2$
is $[0, \infty)$; so the range of f(x, y) is $[0, \infty)$.
Again, note that the domain of f consists of ordered pairs of real numbers, while its
range consists of real numbers.
Example 1.4.3
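The domain test from this example is easy to experiment with numerically. The sketch below assumes, as the solution indicates, that the function is $f(x, y) = \sqrt{e^{x^2 + y^2} - 2}$. Returning `None` outside the domain is our own convention for this illustration.

```python
import math

def in_domain(x, y):
    # The domain found above: points on or outside the circle of radius sqrt(ln 2).
    return x**2 + y**2 >= math.log(2)

def f(x, y):
    inside = math.exp(x**2 + y**2) - 2
    if inside < 0:
        return None  # square root of a negative number: (x, y) is outside the domain
    return math.sqrt(inside)

print(in_domain(1, 0.5), f(1, 0.5))      # True, and a value of roughly 1.22
print(in_domain(0.5, 0.5), f(0.5, 0.5))  # False None
```

Trying a few more points, for instance along the x-axis, is a quick way to convince yourself that the outputs are never negative and grow without bound.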
Example 1.4.4
Solution. Let’s start with domain. We can take the sine of any number we like, so that
part of the function doesn’t limit the domain. The things limiting the domain are that we
cannot take the square root of a negative number, and we can’t divide by zero.
• Because we can’t take the square root of a negative number, we must have $y \ge 0$.
• Because we can’t divide by 0, we must have $\sqrt{y} \ne 0$, i.e. $y \ne 0$.
Combining these restrictions, we can only have values of y in the interval $(0, \infty)$; x can be
any real number. So, our domain is the upper half of the xy plane, excluding the x-axis.
In general, the range of sin x is $[-1, 1]$. So, we certainly can’t get a larger range than
this. We should check that our range is no smaller. When y = 1, our function becomes
$f(x, 1) = \sin(x/1) = \sin x$. Since x can be any real number, indeed the range of our
function is $[-1, 1]$.
Example 1.4.4
Example 1.4.5
Solution. First, let’s think about the arctangent and logarithm function in the context of
single-variable functions. The domain of arctangent is all real numbers, and its range is
$\left(-\frac{\pi}{2}, \frac{\pi}{2}\right)$. The domain of the natural logarithm is all positive numbers, and its range is all
real numbers.
[Figures: the graphs of z = arctan t and z = ln t]
Since only positive numbers may be input into the natural logarithm, we require
arctan(x + y) > 0. That requires (x + y) > 0. So, our domain is the collection of all points (x, y) such
that x + y > 0; put another way, all points above the line y = −x.
[Figure: the domain, the region above the line y = −x]
If our domain is points (x, y) such that x + y > 0, then the range of the function (x + y)
is $(0, \infty)$; so the numbers being plugged into the arctangent function are $(0, \infty)$. So,
the numbers coming out of the arctangent function are $\left(0, \frac{\pi}{2}\right)$. Then the numbers from
$\left(0, \frac{\pi}{2}\right)$ are plugged into the natural logarithm, so the range of our function is $\left(-\infty, \ln\frac{\pi}{2}\right)$.
[Figures: left, the graph of z = arctan t, captioned “If 0 < t, then 0 < arctan t < π/2”; right, the graph of z = ln t, captioned “If 0 < t < π/2, then −∞ < ln t < ln(π/2)”]
Example 1.4.5
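A few sample evaluations can make this domain and range argument more concrete. The sketch below assumes the function in this example is $f(x, y) = \ln(\arctan(x + y))$, which is what the solution describes. Returning `None` outside the domain is our own convention.

```python
import math

def f(x, y):
    # Defined only when x + y > 0, so that arctan(x + y) is positive.
    if x + y <= 0:
        return None
    return math.log(math.atan(x + y))

upper = math.log(math.pi / 2)  # the supremum of the range, never attained

print(f(-1, 0))            # None: the point (-1, 0) lies below the line y = -x
print(f(1e-9, 0))          # a very negative number
print(f(1e6, 0) < upper)   # True: large inputs approach ln(pi/2) from below
```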
We may sometimes restrict the domain of a function more than is mathematically nec-
essary in order for it to make sense in a model. For example, we may have a function
that only makes sense in our model when it gives positive values. In this case, we might
restrict the domain to a model domain, the set of inputs for which the function is not only
defined, but sensible in the context of our model.
Example 1.4.6
A large pharmaceutical company determines its research budget for a new vaccine
according to the formula
$R(x, y) = \ln(xy)$
where x is the size of the customer base they expect to have and y is the revenue they
expect per dose.
For each variable x, y, and R, negative values don’t make sense in the model. So
although we could compute R(−1, −1) = 0, and we could compute R(0.5, 0.5) ≈ −1.39,
they wouldn’t be sensible in the context of our model.
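The distinction between “defined” and “sensible in the model” can be illustrated with a short Python sketch. Since (−1)(−1) = 1 and ln 1 = 0, the formula happily returns a value at (−1, −1) even though a negative customer base is meaningless. The function `in_model_domain` is a hypothetical helper encoding the assumption that x, y and R must all be nonnegative (with x and y positive so that the logarithm is defined).

```python
import math

def R(x, y):
    return math.log(x * y)

def in_model_domain(x, y):
    # x and y must be positive for ln(xy) to make sense in the model,
    # and R >= 0 requires ln(xy) >= 0, that is, xy >= 1.
    return x > 0 and y > 0 and x * y >= 1

print(R(-1, -1))                    # 0.0, defined but not sensible
print(R(0.5, 0.5))                  # roughly -1.39, a negative budget
print(in_model_domain(-1, -1))      # False
print(in_model_domain(0.5, 0.5))    # False
print(in_model_domain(1000, 20))    # True
```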
• Since x and y need to be nonnegative, we will only consider points (x, y) in the first
quadrant of the Cartesian plane: $x \ge 0$ and $y \ge 0$.
The two restrictions above give us the model domain shaded below.
[Figure: the model domain, shaded, in the first quadrant of the xy-plane]
Depending on the specifics of how the function is being used, the model domain may
be restricted even further. For example, perhaps the firm has a maximum budget for any
given project; perhaps the amount they can charge is limited by law; etc.
Example 1.4.6
1.5 Sketching Surfaces in 3d
Definition 1.5.1.
The trace of a surface is the intersection of that surface with a plane that is parallel
to one of the coordinate planes.
So, one kind of trace (the intersection with a plane parallel to the xy-plane) is found by
setting z equal to a constant; another (the intersection with a plane parallel to the yz-plane)
is found by setting x equal to a constant; and the final kind (the intersection with a plane
parallel to the xz-plane) is found by setting y equal to a constant.
10 Of course you could instead use some fancy graphing software, but part of the point is to build intuition.
Not to mention that you can’t use fancy graphing software on your exam.
One can often get a pretty good idea of what a surface looks like by sketching a bunch
of cross-sections. Here are some examples.
Example 1.5.2 $4x^2 + y^2 - z^2 = 1$
Solution. We’ll start by fixing any number $z_0$ and sketching the part of the surface that
lies in the horizontal plane $z = z_0$.
[Figure: the horizontal plane $z = z_0$]
The intersection of our surface with that horizontal plane is a horizontal cross-section.
Any point (x, y, z) lying on that horizontal cross-section satisfies both
$z = z_0$ and $4x^2 + y^2 - z^2 = 1$
$\iff z = z_0$ and $4x^2 + y^2 = 1 + z_0^2$
[Figure: the ellipse $4x^2 + y^2 = 1 + z_0^2$ in the xy-plane, passing through the points $\left(0, \sqrt{1 + z_0^2}\right)$ and $\left(\tfrac{1}{2}\sqrt{1 + z_0^2}, 0\right)$]
11 The semi-axes of an ellipse are the line segments from the centre of the ellipse to the farthest point on
the curve and to the nearest point on the curve. For a circle the lengths of both of these line segments
are just the radius.
Remember that this ellipse is the part of our surface that lies in the plane z = z0 . Imagine
that the sketch of the ellipse is on a single sheet of paper. Lift the sheet of paper up, move
it around so that the x- and y-axes point in the directions of the three dimensional x- and
y-axes and place the sheet of paper into the three dimensional sketch at height z0 . This
gives a single horizontal ellipse in 3d, as in the figure below.
[Figure: a single horizontal ellipse at height $z = z_0$ on three-dimensional axes]
We can build up the full surface by stacking many of these horizontal ellipses, one for
each possible height $z_0$. So we now draw a few of them as in the figure below. To reduce
the amount of clutter in the sketch, we have only drawn the first octant (i.e. the part of
three dimensions that has $x \ge 0$, $y \ge 0$ and $z \ge 0$).
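To get a feel for how the stacked ellipses grow with height, we can tabulate their semi-axes. The sketch below uses the cross-section equation $4x^2 + y^2 = 1 + z_0^2$ from above; the function name `semi_axes` is our own.

```python
import math

def semi_axes(z0):
    # Cross-section at height z0: 4x^2 + y^2 = 1 + z0^2.
    # Setting y = 0 gives x = (1/2) sqrt(1 + z0^2); setting x = 0 gives y = sqrt(1 + z0^2).
    r = math.sqrt(1 + z0**2)
    return (r / 2, r)  # (semi-axis along x, semi-axis along y)

for z0 in (0, 1, 2, 3):
    x_semi, y_semi = semi_axes(z0)
    print(z0, round(x_semi, 3), round(y_semi, 3))
```

The ellipse is smallest at $z_0 = 0$ and widens as $|z_0|$ increases, always twice as long in the y-direction as in the x-direction.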
[Figure: the first-octant parts of the horizontal ellipses at heights z = 1, z = 2 and z = 3]
Here is why it is OK, in this case, to just sketch the first octant. Replacing x by −x in
the equation $4x^2 + y^2 - z^2 = 1$ does not change the equation. That means that a point
(x, y, z) is on the surface if and only if the point (−x, y, z) is on the surface. So the surface
is invariant under reflection in the yz-plane. Similarly, the equation $4x^2 + y^2 - z^2 = 1$ does
not change when y is replaced by −y or z is replaced by −z. Our surface is also invariant
under reflection in the xz- and xy-planes. Once we have the part in the first octant, the remaining
octants can be obtained simply by reflecting about the coordinate planes.
We can get a more visually meaningful sketch by adding in some vertical cross-sections.
The x = 0 and y = 0 cross-sections (also called traces — they are the parts of our surface
that are in the yz- and xz-planes, respectively) are
$x = 0,\ y^2 - z^2 = 1$ and $y = 0,\ 4x^2 - z^2 = 1$
These equations describe hyperbolae¹². If you don’t remember how to sketch them, don’t
worry. We’ll do it now. We’ll first sketch them in 2d. Since
[Figures: the hyperbola $y^2 - z^2 = 1$ in the yz-plane, together with the line z = y, and the hyperbola $4x^2 - z^2 = 1$ in the xz-plane]
Now we’ll incorporate them into the 3d sketch. Once again imagine that each is a single
sheet of paper. Pick each up and move it into the 3d sketch, carefully matching up the
axes. The red (blue) parts of the hyperbolas above become the red (blue) parts of the 3d
sketch below (assuming of course that you are looking at this on a colour screen).
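[Figure: the first-octant 3d sketch with the horizontal ellipses at z = 1, z = 2 and z = 3 and the two hyperbolas added]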
Now that we have a pretty good idea of what the surface looks like we can clean up and
simplify the sketch. Here are a couple of possibilities.
Example 1.5.3 ($4x^2 + y^2 - z^2 = -1$)
Solution. As in the last example, we’ll start by fixing any number $z_0$ and sketching the
part of the surface that lies in the horizontal plane $z = z_0$. The intersection of our surface
with that horizontal plane is
\[ 4x^2 + y^2 = z_0^2 - 1, \qquad z = z_0 \]
Think of $z_0$ as a constant.
The first octant parts of a few of these horizontal cross-sections are drawn in the figure
below.
[Figure: the first-octant parts of the horizontal cross-sections at heights $z = 1.02$, $z = 2$ and $z = 3$]
Next we add in the $x = 0$ and $y = 0$ cross-sections (i.e. the parts of our surface that are in
the $yz$- and $xz$-planes, respectively)
\[ x = 0,\ z^2 = 1 + y^2 \qquad\text{and}\qquad y = 0,\ z^2 = 1 + 4x^2 \]
[Figure: the same first-octant sketch with the $x = 0$ and $y = 0$ cross-sections added]
Now that we have a pretty good idea of what the surface looks like we clean up and
simplify the sketch.
Solution. This surface has a special property that makes it relatively easy to sketch. There
are no $x$’s in the equation $yz = 1$. That means that if some $y_0$ and $z_0$ obey $y_0 z_0 = 1$, then
the point $(x, y_0, z_0)$ lies on the surface $yz = 1$ for all values of $x$. As $x$ runs from $-\infty$ to $\infty$,
the point $(x, y_0, z_0)$ sweeps out a straight line parallel to the $x$-axis. So the surface $yz = 1$
is a union of lines parallel to the $x$-axis. It is invariant under translations parallel to the
$x$-axis. To sketch $yz = 1$, we just need to sketch its intersection with the $yz$-plane and then
translate the resulting curve parallel to the $x$-axis to sweep out the surface.
We’ll start with a sketch of the hyperbola $yz = 1$ in two dimensions.
[Figure: the hyperbola $yz = 1$ in the $yz$-plane]
Next we’ll move this 2d sketch into the yz-plane, i.e. the plane x = 0, in 3d, except that
we’ll only draw in the part in the first octant.
Example 1.5.4
Solution. We’ll sketch this surface using much the same procedure as we used in Examples
1.5.2 and 1.5.3. We’ll only sketch the part of the surface in the first octant. The remaining
parts (in the octants with $x, y < 0$, $z \ge 0$, with $x, z < 0$, $y \ge 0$ and with $y, z < 0$, $x \ge 0$) are
just reflections of the first octant part.
As usual, we start by fixing any number $z_0$ and sketching the part of the surface that
lies in the horizontal plane $z = z_0$. The intersection of our surface with that horizontal
plane is the hyperbola
\[ z = z_0 \qquad\text{and}\qquad xy = \frac{1}{z_0} \]
Note that $x \to \infty$ as $y \to 0$ and that $y \to \infty$ as $x \to 0$. So the hyperbola has both the $x$-axis
and the $y$-axis as asymptotes, when drawn in the $xy$-plane. The first octant parts of a few
of these horizontal cross-sections (namely, $z_0 = 4$, $z_0 = 2$ and $z_0 = \frac{1}{2}$) are drawn in the
figure below.
[Figure: the first-octant parts of the horizontal cross-sections at heights $z = 4$, $z = 2$ and $z = \frac{1}{2}$]
Next we add some vertical cross-sections. We can’t use x = 0 or y = 0 because any point
on xyz = 1 must have all of x, y, z nonzero. So we use
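\[ x = 4,\ yz = 1 \qquad\text{and}\qquad y = 4,\ xz = 1 \]
[Figure: the first-octant sketch with the vertical cross-sections in the planes $x = 4$ and $y = 4$ added]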
Example 1.5.5
Often the reason you are interested in a surface in 3d is that it is the graph z = f ( x, y)
of a function of two variables f ( x, y). Another good way to visualize the behaviour of a
function f ( x, y) is to sketch what are called its level curves.
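Definition 1.5.6.
A level curve of $f(x, y)$ is a curve whose equation is of the form $f(x, y) = C$ for some
constant $C$. It is the set of points in the $xy$-plane where $f$ takes the value $C$.
Because a level curve is a curve in 2d, it is usually easier to sketch than the graph of $f$.
Here are a couple of examples.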
Example 1.5.7 ($f(x, y) = x^2 + 4y^2 - 2x + 2$)
Solution. Fix any real number $C$. Then, for the specified function $f$, the level curve
$f(x, y) = C$ is the set of points $(x, y)$ that obey
\begin{align*}
x^2 + 4y^2 - 2x + 2 = C &\iff x^2 - 2x + 1 + 4y^2 + 1 = C \\
&\iff (x - 1)^2 + 4y^2 = C - 1
\end{align*}
Now $(x - 1)^2 + 4y^2$ is the sum of two squares, and so is always at least zero. So if $C - 1 < 0$,
i.e. if $C < 1$, there is no curve $f(x, y) = C$. If $C - 1 = 0$, i.e. if $C = 1$, then
$(x - 1)^2 + 4y^2 = C - 1 = 0$ if and only if both $(x - 1)^2 = 0$ and $4y^2 = 0$, and so the level
curve consists of the single point $(1, 0)$. If $C > 1$, then $f(x, y) = C$ becomes
$(x - 1)^2 + 4y^2 = C - 1 > 0$, which describes an ellipse centred on $(1, 0)$. It intersects the
$x$-axis when $y = 0$ and
\[ (x - 1)^2 = C - 1 \iff x - 1 = \pm\sqrt{C - 1} \iff x = 1 \pm \sqrt{C - 1} \]
and it intersects the line $x = 1$ (i.e. the vertical line through the centre) when
\[ 4y^2 = C - 1 \iff 2y = \pm\sqrt{C - 1} \iff y = \pm\tfrac{1}{2}\sqrt{C - 1} \]
So, when $C > 1$, $f(x, y) = C$ is the ellipse centred on $(1, 0)$ with $x$ semi-axis $\sqrt{C - 1}$ and
$y$ semi-axis $\frac{1}{2}\sqrt{C - 1}$. Here is a sketch of some representative level curves of
$f(x, y) = x^2 + 4y^2 - 2x + 2$.
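[Figure: the level curves $f = 1$ (the single point $(1, 0)$), $f = 2$, $f = 5$, $f = 10$ and $f = 17$, drawn in the $xy$-plane together with the line $x = 1$]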
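The case analysis for the level curves of $f(x, y) = x^2 + 4y^2 - 2x + 2$ can be turned into a short computation. This is only a sketch: `level_curve` and `point_on_curve` are hypothetical helper names, and the angle parametrization of the ellipse is an assumption made for illustration.

```python
import math

def f(x, y):
    return x**2 + 4 * y**2 - 2 * x + 2

def level_curve(C):
    """Classify the level curve f(x, y) = C, i.e. (x-1)^2 + 4y^2 = C - 1.

    Returns (kind, x_semi_axis, y_semi_axis).
    """
    rhs = C - 1
    if rhs < 0:
        return ("empty", None, None)
    if rhs == 0:
        return ("point", 0.0, 0.0)          # the single point (1, 0)
    return ("ellipse", math.sqrt(rhs), 0.5 * math.sqrt(rhs))

def point_on_curve(C, t):
    """Point on the ellipse f = C at parameter angle t (needs C > 1)."""
    _, a, b = level_curve(C)
    return (1 + a * math.cos(t), b * math.sin(t))

for C in (0, 1, 2, 5, 10, 17):
    print(C, level_curve(C))
```

Evaluating `f` at `point_on_curve(C, t)` for any angle `t` should return `C` up to rounding, which is a quick way to confirm the completed-square form.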
The function $f(x, y)$ is given implicitly by the equation $e^{x+y+z} = 1$. Sketch the level curves
of $f$.
Solution. This one is not as nasty as it appears. That “$f(x, y)$ is given implicitly by the
equation $e^{x+y+z} = 1$” means that, for each $x, y$, the solution $z$ of $e^{x+y+z} = 1$ is $f(x, y)$. So,
for the specified function $f$ and any fixed real number $C$, the level curve $f(x, y) = C$ is the
set of points $(x, y)$ that obey
\[ e^{x+y+C} = 1 \iff x + y + C = 0 \]
This is of course a straight line. It intersects the $x$-axis when $y = 0$ and $x = -C$ and it
intersects the $y$-axis when $x = 0$ and $y = -C$. Here is a sketch of some level curves.
[Figure: the level curves $f = -3, -2, -1, 0, 1, 2, 3$, a family of parallel straight lines in the $xy$-plane]
Example 1.5.8
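As a small check of the implicit-function example, the sketch below solves $e^{x+y+z} = 1$ for $z$ by hand (giving $z = -x - y$) and confirms the intercepts of a level curve. The function names are hypothetical and chosen only for this illustration.

```python
import math

def f(x, y):
    """Solution z of e^(x+y+z) = 1, i.e. x + y + z = 0."""
    return -x - y

def level_curve_intercepts(C):
    """x- and y-intercepts of the line f(x, y) = C, i.e. x + y = -C."""
    return (-C, 0.0), (0.0, -C)

# The defining equation holds at an arbitrary sample point.
x, y = 1.5, -0.25
print(math.exp(x + y + f(x, y)))

# Both intercepts of the level curve f = 2 really do have f = 2.
for p in level_curve_intercepts(2):
    print(p, f(*p))
```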
We have just seen that sketching the level curves of a function f ( x, y) can help us
understand the behaviour of f . We can generalise this to functions F ( x, y, z) of three vari-
ables. A level surface of F ( x, y, z) is a surface whose equation is of the form F ( x, y, z) = C
for some constant C. It is the set of points ( x, y, z) at which F takes the value C.
Example 1.5.9 ($F(x, y, z) = x^2 + y^2 + z^2$)
[Figure: the level surfaces $F = 1$, $F = 4$ and $F = 9$]
Example 1.5.9
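Example 1.5.10 ($F(x, y, z) = x^2 + z^2$)
[Figure: the cross-section of the level surface $F = C$ in the plane $y = y_0$, and the level surfaces $F = 1$, $F = 4$ and $F = 9$]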
Example 1.5.10
[Figure: the level surfaces $F = e$, $F = e^2$ and $F = e^3$]
Example 1.5.11
There are some classes of relatively simple, but commonly occurring, surfaces that are
given their own names. One such class is cylindrical surfaces. You are probably used to
thinking of a cylinder as being something that looks like $x^2 + y^2 = 1$.
[Figure: the cylinder $x^2 + y^2 = 1$]
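Definition 1.5.12 (Cylinder).
A cylinder is a surface that consists of all points that are on all lines that are
parallel to a given line and pass through a given fixed curve that lies in a fixed plane
not parallel to the given line.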
Example 1.5.13
Here are sketches of three cylinders. The familiar cylinder on the left below
[Figure: the cylinders $x^2 + y^2 = 1$ (left) and $x^2 + (y - z)^2 = 1$ (right)]
is called a right circular cylinder, because the given fixed plane curve (x2 + y2 = 1, z = 0)
is a circle and the given line (the z-axis) is perpendicular (i.e. at right angles) to the fixed
plane curve.
The cylinder on the left above can be thought of as a vertical stack of circles. The
cylinder on the right above can also be thought of as a stack of circles, but the centre of the
circle at height z has been shifted rightward to (0, z, z). For that cylinder, the given fixed
plane curve is once again the circle x2 + y2 = 1, z = 0, but the given line is y = z, x = 0.
We have already seen the third cylinder
[Figure: the part of the cylinder $yz = 1$ with $x, y, z > 0$]
in Example 1.5.4. It is called a hyperbolic cylinder. In this example, the given fixed plane
curve is the hyperbola yz = 1, x = 0 and the given line is the x-axis.
Example 1.5.13
13 Technically, we should also require that the polynomial can’t be factored into the product of two poly-
nomials of degree one.
14 This statement can be justified using a linear algebra eigenvalue/eigenvector analysis. It is beyond
what we can cover here, but is not too difficult for a standard linear algebra course.
• We saw the quadric surface $4x^2 + y^2 - z^2 = 1$ in Example 1.5.2.
Its constant $z$ cross sections are ellipses and its $x = 0$ and $y = 0$ cross sections are
hyperbolae. It is called a hyperboloid of one sheet.
• We saw the quadric surface $x^2 + y^2 = 1$ in Example 1.5.13.
Its constant $z$ cross sections are circles and its $x = 0$ and $y = 0$ cross sections are
straight lines. It is called a right circular cylinder.
• We saw the quadric surface $x^2 + (y - z)^2 = 1$ in Example 1.5.13.
• We saw the quadric surface $yz = 1$ in Example 1.5.4.
Appendix A.2 contains other quadric surfaces.
Example 1.5.15 (Indifference curves)
Suppose a function U ( x, y) gives the happiness15 (or utility) a consumer gains when they
purchase x units of Good X and y units of Good Y. The level curves of the surface
z = U ( x, y) are called indifference curves, because every point along that curve results
in the same benefit to the consumer.
Suppose $U(x, y) = x\sqrt{y}$. Then purchasing 2 units of Good X and one unit of Good Y
produces the same benefit as purchasing 1 unit of Good X and 4 units of Good Y, because
both these combinations are on the level curve $U(x, y) = 2$.
[Figure: the indifference curve $x\sqrt{y} = 2$ in the $xy$-plane]
Let’s make a small contour map of our surface $U(x, y) = x\sqrt{y}$, plotting several
indifference curves. (Note $x\sqrt{y} = c$ is equivalent to $y = \frac{c^2}{x^2}$ in our model domain.)
15 An amusing thought experiment is to propose units for measuring happiness. “The one-point increase
in GDP was associated with an average increase of 3.7 wrinkly puppy faces of happiness nation-wide.”
[Figure: contour map showing the indifference curves $U = 1$, $U = 2$, $U = 3$, $U = 4$ and $U = 5$ in the $xy$-plane]
Not surprisingly, if we move roughly in the direction of the vector $\langle 1, 1 \rangle$ (that is,
increasing both $x$ and $y$), our happiness $U(x, y)$ goes up.
Note that none of the indifference curves touch either of the x or y axes. It is clear
enough from the formula that U (0, y) = U ( x, 0) = 0. This is a common feature of utility
functions: that to maximize utility, a consumer will have at least a little of both products,
rather than consuming only one type.
Example 1.5.15
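The indifference-curve example can be reproduced with a few lines of code. This is a minimal sketch for the utility $U(x, y) = x\sqrt{y}$; the helper name `indifference_y` is hypothetical.

```python
import math

def U(x, y):
    """Utility of x units of Good X and y units of Good Y."""
    return x * math.sqrt(y)

def indifference_y(c, x):
    """The y-value on the indifference curve U = c at a given x > 0."""
    return c**2 / x**2

# The two bundles from the example lie on the same indifference curve.
print(U(2, 1), U(1, 4))

# A few more points on the curve U = 2.
for x in (0.5, 1, 2, 3, 4):
    y = indifference_y(2, x)
    print(x, y, U(x, y))
```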
Chapter 1 (excluding Section 1.4) was adapted from Chapter 1 of CLP–3 Multivariable
Calculus by Feldman, Rechnitzer, and Yeager under a Creative Commons
Attribution-NonCommercial-ShareAlike 4.0 International license.
Chapter 2
PARTIAL DERIVATIVES
\[ \frac{dB}{dx}(a) = \lim_{h \to 0} \frac{B(a + h) - B(a)}{h} = \lim_{h \to 0} \frac{f(a + h, b) - f(a, b)}{h} \]
This is called the “partial derivative of $f$ with respect to $x$ at $(a, b)$” and is denoted
$\left(\frac{\partial f}{\partial x}\right)_y (a, b)$. Here
PARTIAL D ERIVATIVES 2.1 PARTIAL D ERIVATIVES
◦ the symbol $\partial$, which is read “partial”, indicates that we are dealing with a function
of more than one variable, and
◦ the subscript $y$ on $\left(\frac{\partial f}{\partial x}\right)_y$ indicates that $y$ is being held fixed, i.e. being treated as a
constant, and
◦ the $x$ in $\frac{\partial f}{\partial x}$ indicates that we are differentiating with respect to $x$.
◦ $\frac{\partial f}{\partial x}$ is read “partial dee $f$ dee $x$”.
Do not write $\frac{d}{dx}$ when $\frac{\partial}{\partial x}$ is appropriate. (There exist situations when $\frac{d}{dx} f$ and
$\frac{\partial}{\partial x} f$ are both defined and have different meanings.)
If, instead, we are passing through the point ( x, y) = ( a, b) and are walking parallel to
the y-axis (in the positive direction), then our x-coordinate will be constant, always taking
the value x = a. So we can think of the measured temperature as the function of one
variable A(y) = f ( a, y) and we will observe the rate of change of temperature
\[ \frac{dA}{dy}(b) = \lim_{h \to 0} \frac{A(b + h) - A(b)}{h} = \lim_{h \to 0} \frac{f(a, b + h) - f(a, b)}{h} \]
This is called the “partial derivative of $f$ with respect to $y$ at $(a, b)$” and is denoted
$\left(\frac{\partial f}{\partial y}\right)_x (a, b)$.
Just as was the case for the ordinary derivative $\frac{df}{dx}(x)$, it is common to treat the partial
derivatives of $f(x, y)$ as functions of $(x, y)$ simply by evaluating the partial derivatives at
$(x, y)$ rather than at $(a, b)$.
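Definition 2.1.1 (Partial Derivatives).
The $x$- and $y$-partial derivatives of the function $f(x, y)$ are
\[ \frac{\partial f}{\partial x}(x, y) = \lim_{h \to 0} \frac{f(x + h, y) - f(x, y)}{h} \qquad \frac{\partial f}{\partial y}(x, y) = \lim_{h \to 0} \frac{f(x, y + h) - f(x, y)}{h} \]
respectively. The partial derivatives of functions of more than two variables are
defined analogously.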
Partial derivatives are used a lot, and there are many notations for them.
Notation 2.1.2.
The partial derivative $\left(\frac{\partial f}{\partial x}\right)_y$ of a function $f(x, y)$ is also denoted
\[ \frac{\partial f}{\partial x} \qquad f_x \qquad D_x f \qquad D_1 f \]
The subscript 1 on $D_1 f$ indicates that $f$ is being differentiated with respect to its
first variable. The partial derivative $\left(\frac{\partial f}{\partial x}\right)_y (a, b)$ is also denoted
\[ \frac{\partial f}{\partial x}\bigg|_{(a,b)} \]
with the subscript $(a, b)$ indicating that $\frac{\partial f}{\partial x}$ is being evaluated at $(x, y) = (a, b)$.
The abbreviated notation $\frac{\partial f}{\partial x}$ for $\left(\frac{\partial f}{\partial x}\right)_y$ is extremely commonly used. But it is
dangerous to do so when it is not clear from the context that it is the variable $y$
that is being held fixed.
Remark 2.1.3 (The Geometric Interpretation of Partial Derivatives). We’ll now develop
a geometric interpretation of the partial derivative
\[ \left(\frac{\partial f}{\partial x}\right)_y (a, b) = \lim_{h \to 0} \frac{f(a + h, b) - f(a, b)}{h} \]
in terms of the shape of the graph $z = f(x, y)$ of the function $f(x, y)$. That graph appears
in the figure below. It looks like the part of a deformed sphere that is in the first octant.
The definition of $\left(\frac{\partial f}{\partial x}\right)_y (a, b)$ concerns only points on the graph that have $y = b$, in
other words, the curve of intersection of the surface $z = f(x, y)$ with the plane $y = b$. That
is the red curve in the figure. The two blue vertical line segments in the figure have heights
$f(a, b)$ and $f(a + h, b)$, which are the two numbers in the numerator of $\frac{f(a+h,b) - f(a,b)}{h}$.
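[Figure: the graph $z = f(x, y)$ in the first octant, with its curve of intersection with the plane $y = b$ and vertical segments of heights $f(a, b)$ and $f(a + h, b)$ above the points $(a, b, 0)$ and $(a + h, b, 0)$, which are a distance $h$ apart]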
A side view of the curve (looking from the left side of the $y$-axis) is sketched in the figure
below. Again, the two blue vertical line segments in the figure have heights $f(a, b)$
and $f(a + h, b)$, which are the two numbers in the numerator of $\frac{f(a+h,b) - f(a,b)}{h}$.
[Figure: side view of the curve $z = f(x, b)$, $y = b$, with vertical segments of heights $f(a, b)$ and $f(a + h, b)$ above $(a, b, 0)$ and $(a + h, b, 0)$ and the difference $f(a + h, b) - f(a, b)$ marked]
So the numerator $f(a + h, b) - f(a, b)$ and denominator $h$ are the rise and run, respectively, of
the curve $z = f(x, b)$ from $x = a$ to $x = a + h$. Thus $\left(\frac{\partial f}{\partial x}\right)_y (a, b)$ is exactly the slope of (the
tangent to) the curve of intersection of the surface $z = f(x, y)$ and the plane $y = b$ at the point
$\big(a, b, f(a, b)\big)$. In the same way $\left(\frac{\partial f}{\partial y}\right)_x (a, b)$ is exactly the slope of (the tangent to) the curve of
intersection of the surface $z = f(x, y)$ and the plane $x = a$ at the point $\big(a, b, f(a, b)\big)$.
From the above discussion, we see that we can readily compute partial derivatives $\frac{\partial}{\partial x}$ by
using what we already know about ordinary derivatives $\frac{d}{dx}$. More precisely,
• To evaluate $\frac{\partial f}{\partial x}(x, y)$, treat the $y$ in $f(x, y)$ as a constant and differentiate the resulting
function of $x$ with respect to $x$.
• To evaluate $\frac{\partial f}{\partial y}(x, y)$, treat the $x$ in $f(x, y)$ as a constant and differentiate the resulting
function of $y$ with respect to $y$.
• To evaluate $\frac{\partial f}{\partial x}(a, b)$, treat the $y$ in $f(x, y)$ as a constant and differentiate the resulting
function of $x$ with respect to $x$. Then evaluate the result at $x = a$, $y = b$.
• To evaluate $\frac{\partial f}{\partial y}(a, b)$, treat the $x$ in $f(x, y)$ as a constant and differentiate the resulting
function of $y$ with respect to $y$. Then evaluate the result at $x = a$, $y = b$.
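Let
\[ f(x, y) = x^3 + y^2 + 4xy^2 \]
Then, since $\frac{\partial}{\partial x}$ treats $y$ as a constant,
\begin{align*}
\frac{\partial f}{\partial x} = \left(\frac{\partial f}{\partial x}\right)_y &= \frac{\partial}{\partial x}(x^3) + \frac{\partial}{\partial x}(y^2) + \frac{\partial}{\partial x}(4xy^2) \\
&= 3x^2 + 0 + 4y^2 \frac{\partial}{\partial x}(x) \\
&= 3x^2 + 4y^2
\end{align*}
and, since $\frac{\partial}{\partial y}$ treats $x$ as a constant,
\begin{align*}
\frac{\partial f}{\partial y} = \left(\frac{\partial f}{\partial y}\right)_x &= \frac{\partial}{\partial y}(x^3) + \frac{\partial}{\partial y}(y^2) + \frac{\partial}{\partial y}(4xy^2) \\
&= 0 + 2y + 4x \frac{\partial}{\partial y}(y^2) \\
&= 2y + 8xy
\end{align*}
In particular, at $(x, y) = (1, 0)$ these partial derivatives take the values
\begin{align*}
\frac{\partial f}{\partial x}(1, 0) &= 3(1)^2 + 4(0)^2 = 3 \\
\frac{\partial f}{\partial y}(1, 0) &= 2(0) + 8(1)(0) = 0
\end{align*}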
Example 2.1.4
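One way to sanity-check hand-computed partial derivatives is to compare them with a difference quotient in which only one variable is moved. The sketch below uses a central difference with a step `h = 1e-6`; both the step size and the helper names are assumptions made for illustration, and the central difference is a numerical stand-in for the limit in the definition.

```python
def f(x, y):
    return x**3 + y**2 + 4 * x * y**2

def fx_exact(x, y):
    """Partial derivative with respect to x, with y held fixed."""
    return 3 * x**2 + 4 * y**2

def fy_exact(x, y):
    """Partial derivative with respect to y, with x held fixed."""
    return 2 * y + 8 * x * y

def partial_x(g, x, y, h=1e-6):
    """Central difference in x; y is held fixed."""
    return (g(x + h, y) - g(x - h, y)) / (2 * h)

def partial_y(g, x, y, h=1e-6):
    """Central difference in y; x is held fixed."""
    return (g(x, y + h) - g(x, y - h)) / (2 * h)

for point in [(1.0, 0.0), (0.7, -1.3)]:
    print(point,
          fx_exact(*point), partial_x(f, *point),
          fy_exact(*point), partial_y(f, *point))
```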
Example 2.1.5
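Let
\[ f(x, y) = y \cos x + x e^{xy} \]
Then, since $\frac{\partial}{\partial x}$ treats $y$ as a constant, $\frac{\partial}{\partial x} e^{yx} = y e^{yx}$ and
\begin{align*}
\frac{\partial f}{\partial x}(x, y) &= -y \sin x + e^{xy} + xy e^{xy} \\
\frac{\partial f}{\partial y}(x, y) &= \cos x + x^2 e^{xy}
\end{align*}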
Example 2.1.5
Let’s move up to a function of four variables. Things generalize in a quite straightforward
way.
Example 2.1.6
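Let
\[ f(x, y, z, t) = x \sin(y + 2z) + t^2 e^{3y} \ln z \]
Then
\begin{align*}
\frac{\partial f}{\partial x}(x, y, z, t) &= \sin(y + 2z) \\
\frac{\partial f}{\partial y}(x, y, z, t) &= x \cos(y + 2z) + 3t^2 e^{3y} \ln z \\
\frac{\partial f}{\partial z}(x, y, z, t) &= 2x \cos(y + 2z) + t^2 e^{3y} / z \\
\frac{\partial f}{\partial t}(x, y, z, t) &= 2t e^{3y} \ln z
\end{align*}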
Example 2.1.6
Now here is a more complicated example — our function takes a special value at (0, 0).
To compute derivatives there we have to revert to the definition.
Example 2.1.7
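\begin{align*}
f_x(0, 0) &= \lim_{h \to 0} \frac{f(h, 0) - f(0, 0)}{h} \\
&= \lim_{h \to 0} \frac{\frac{\cos h - 1}{h} - 0}{h} && \text{(Recall that $h \ne 0$ in the limit.)} \\
&= \lim_{h \to 0} \frac{\cos h - 1}{h^2} \\
&= \lim_{h \to 0} \frac{-\sin h}{2h} && \text{(By l’Hôpital’s rule.)} \\
&= \lim_{h \to 0} \frac{-\cos h}{2} && \text{(By l’Hôpital again.)} \\
&= -\frac{1}{2}
\end{align*}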
Example 2.1.7
Example 2.1.8
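Again set
\[ f(x, y) = \begin{cases} \frac{\cos x - \cos y}{x - y} & \text{if } x \ne y \\ 0 & \text{if } x = y \end{cases} \]
We’ll now compute $f_y(x, y)$ for all $(x, y)$.
The case $y \ne x$: When $y \ne x$,
\begin{align*}
f_y(x, y) &= \frac{\partial}{\partial y} \frac{\cos x - \cos y}{x - y} \\
&= \frac{(x - y) \frac{\partial}{\partial y}(\cos x - \cos y) - (\cos x - \cos y) \frac{\partial}{\partial y}(x - y)}{(x - y)^2} && \text{by the quotient rule} \\
&= \frac{(x - y) \sin y + \cos x - \cos y}{(x - y)^2}
\end{align*}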
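The case $y = x$: When $y = x$,
\begin{align*}
f_y(x, y) &= \lim_{h \to 0} \frac{f(x, y + h) - f(x, y)}{h} = \lim_{h \to 0} \frac{f(x, x + h) - f(x, x)}{h} \\
&= \lim_{h \to 0} \frac{\frac{\cos x - \cos(x + h)}{x - (x + h)} - 0}{h} && \text{(Recall that $h \ne 0$ in the limit.)} \\
&= \lim_{h \to 0} \frac{\cos(x + h) - \cos x}{h^2}
\end{align*}
Now we apply l’Hôpital’s rule, remembering that, in this limit, $x$ is a constant and $h$ is the
variable. One application gives
\[ f_y(x, y) = \lim_{h \to 0} \frac{-\sin(x + h)}{2h} \]
As $h \to 0$, the denominator tends to zero while the numerator tends to $-\sin x$. So if
$\sin x \ne 0$ the limit, and hence $f_y(x, x)$, does not exist. If $\sin x = 0$, i.e. if $x$ is an integer
multiple of $\pi$, we may apply l’Hôpital’s rule a second time, giving
\begin{align*}
f_y(x, y) &= \lim_{h \to 0} \frac{-\cos(x + h)}{2} \\
&= -\frac{\cos x}{2}
\end{align*}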
Example 2.1.8
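When a partial derivative has to be computed from the definition, it can help to watch the difference quotient numerically as $h$ shrinks. The sketch below does this for the piecewise function of the last two examples; the helper names and the sample values of $h$ are assumptions made for illustration. At the origin the $x$-quotient settles near $-\frac{1}{2}$. On the diagonal, the $y$-quotient settles near $-\frac{\cos x}{2}$ at $x = \pi$, where $\sin x = 0$, whereas at $x = 1$ it keeps growing roughly like $-\sin(1)/h$.

```python
import math

def f(x, y):
    """The piecewise function from the text: 0 on the line x = y."""
    if x == y:
        return 0.0
    return (math.cos(x) - math.cos(y)) / (x - y)

def quotient_x_at_origin(h):
    """Difference quotient (f(h, 0) - f(0, 0)) / h used for f_x(0, 0)."""
    return (f(h, 0.0) - f(0.0, 0.0)) / h

def quotient_y_on_diagonal(x, h):
    """Difference quotient (f(x, x + h) - f(x, x)) / h used for f_y(x, x)."""
    return (f(x, x + h) - f(x, x)) / h

for h in (1e-1, 1e-2, 1e-3):
    print(h,
          quotient_x_at_origin(h),
          quotient_y_on_diagonal(math.pi, h),
          quotient_y_on_diagonal(1.0, h))
```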
Our next example uses implicit differentiation.
Example 2.1.9
The equation
\[ z^5 + y^2 e^z + e^{2x} = 0 \]
implicitly determines $z$ as a function of $x$ and $y$. For example, when $x = y = 0$, the
equation reduces to
\[ z^5 = -1 \]
which forces$^1$ $z(0, 0) = -1$. Let’s find the partial derivative $\frac{\partial z}{\partial x}(0, 0)$.
We are not going to be able to explicitly solve the equation for $z(x, y)$. All we know is
that
\[ z(x, y)^5 + y^2 e^{z(x,y)} + e^{2x} = 0 \]
for all $x$ and $y$. We can turn this into an equation for $\frac{\partial z}{\partial x}(0, 0)$ by differentiating$^2$ the whole
equation with respect to $x$, giving
\[ 5 z(x, y)^4 \frac{\partial z}{\partial x}(x, y) + y^2 e^{z(x,y)} \frac{\partial z}{\partial x}(x, y) + 2e^{2x} = 0 \]
and then setting $x = y = 0$, giving
\[ 5 z(0, 0)^4 \frac{\partial z}{\partial x}(0, 0) + 2 = 0 \]
As we already know that $z(0, 0) = -1$,
\[ \frac{\partial z}{\partial x}(0, 0) = -\frac{2}{5 z(0, 0)^4} = -\frac{2}{5} \]
1 The only real number $z$ which obeys $z^5 = -1$ is $z = -1$. However there are four other complex numbers
which also obey $z^5 = -1$.
2 You should have already seen this technique, called implicit differentiation, in your first Calculus
course.
PARTIAL D ERIVATIVES 2.2 H IGHER O RDER D ERIVATIVES
Example 2.1.9
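The implicit differentiation result can be cross-checked without ever finding a formula for $z(x, y)$. The sketch below solves $z^5 + y^2 e^z + e^{2x} = 0$ for $z$ numerically and then differences the solution in $x$. The use of bisection, the bracket $[-10, 10]$ and the step size are all assumptions made for this illustration; bisection works here because the left-hand side is increasing in $z$.

```python
import math

def G(x, y, z):
    """Left-hand side of the equation z^5 + y^2 e^z + e^(2x) = 0."""
    return z**5 + y**2 * math.exp(z) + math.exp(2 * x)

def solve_z(x, y, lo=-10.0, hi=10.0):
    """Bisection for the real root z of G(x, y, z) = 0.

    G is increasing in z (its z-derivative is 5z^4 + y^2 e^z >= 0),
    so the root in the bracket is unique.
    """
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if G(x, y, mid) > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

h = 1e-5
z00 = solve_z(0.0, 0.0)
dz_dx = (solve_z(h, 0.0) - solve_z(-h, 0.0)) / (2 * h)
print(z00, dz_dx)   # compare with z(0, 0) = -1 and dz/dx(0, 0) = -2/5
```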
Next we have a partial derivative disguised as a limit.
Example 2.1.10
\[ \lim_{z \to 0} \frac{(x + y + z)^3 - (x + y)^3}{(x + y) z} \]
The critical observation is that, in taking the limit $z \to 0$, $x$ and $y$ are fixed. They do not
change as $z$ is getting smaller and smaller. Furthermore this limit is exactly of the form
of the limits in the Definition 2.1.1 of partial derivative, disguised by some obfuscating
changes of notation.
Set
\[ f(x, y, z) = \frac{(x + y + z)^3}{x + y} \]
Then
\begin{align*}
\lim_{z \to 0} \frac{(x + y + z)^3 - (x + y)^3}{(x + y) z} &= \lim_{z \to 0} \frac{f(x, y, z) - f(x, y, 0)}{z} = \lim_{h \to 0} \frac{f(x, y, 0 + h) - f(x, y, 0)}{h} \\
&= \frac{\partial f}{\partial z}(x, y, 0) \\
&= \left[ \frac{\partial}{\partial z} \frac{(x + y + z)^3}{x + y} \right]_{z = 0}
\end{align*}
so that
\begin{align*}
\lim_{z \to 0} \frac{(x + y + z)^3 - (x + y)^3}{(x + y) z} &= 3 \frac{(x + y + z)^2}{x + y} \bigg|_{z = 0} \\
&= 3(x + y)
\end{align*}
Example 2.1.10
y. They can both be differentiated with respect to x and they can both be differentiated
with respect to y. So there are four possible second order derivatives. Here they are,
together with various alternate notations.
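\begin{align*}
\frac{\partial}{\partial x} \frac{\partial f}{\partial x}(x, y) &= \frac{\partial^2 f}{\partial x^2}(x, y) = f_{xx}(x, y) \\
\frac{\partial}{\partial y} \frac{\partial f}{\partial x}(x, y) &= \frac{\partial^2 f}{\partial y \partial x}(x, y) = f_{xy}(x, y) \\
\frac{\partial}{\partial x} \frac{\partial f}{\partial y}(x, y) &= \frac{\partial^2 f}{\partial x \partial y}(x, y) = f_{yx}(x, y) \\
\frac{\partial}{\partial y} \frac{\partial f}{\partial y}(x, y) &= \frac{\partial^2 f}{\partial y^2}(x, y) = f_{yy}(x, y)
\end{align*}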
Warning 2.2.1.
In $\frac{\partial^2 f}{\partial y \partial x} = \frac{\partial}{\partial y} \frac{\partial}{\partial x} f$, the derivative closest to $f$, in this case $\frac{\partial}{\partial x}$, is applied first. So we
work through the variables in the bottom right-to-left.
In $f_{xy}$, the derivative with respect to the variable closest to $f$, in this case $x$, is
applied first. So we work through the subscript variables left-to-right.
The difference in “direction” highlighted in the warning seems confusing at first, but
it stems from the way the first partial derivative is written. In the fractional notation, if $f$
is being differentiated with respect to $x$, we write $\frac{\partial f}{\partial x}$ or $\frac{\partial}{\partial x} f$. So the operator $\frac{\partial}{\partial x}$ is added
to the left of the function. Now suppose we want to differentiate $\frac{\partial f}{\partial x}$ with respect to $y$.
By analogy, we would write $\frac{\partial}{\partial y} \left[ \frac{\partial f}{\partial x} \right]$, or $\frac{\partial^2 f}{\partial y \partial x}$. This leads to the order of variables being
right-to-left.
With the subscript notation, if $f$ is being differentiated with respect to $x$, we write $f_x$,
with the variable on the right of the function. So now if we take the second derivative with
respect to $y$, it makes sense by analogy to add that new variable to the right: $(f_x)_y$, or $f_{xy}$,
in left-to-right order.
Example 2.2.2
Example 2.2.2
Example 2.2.3
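\begin{align*}
f_x &= \alpha e^{\alpha x + \beta y} & f_y &= \beta e^{\alpha x + \beta y} \\
f_{xx} &= \alpha^2 e^{\alpha x + \beta y} & f_{yx} &= \beta \alpha e^{\alpha x + \beta y} \\
f_{xy} &= \alpha \beta e^{\alpha x + \beta y} & f_{yy} &= \beta^2 e^{\alpha x + \beta y}
\end{align*}
More generally, for any integers $m, n \ge 0$,
\[ \frac{\partial^{m+n} f}{\partial x^m \partial y^n} = \alpha^m \beta^n e^{\alpha x + \beta y} \]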
Example 2.2.3
Example 2.2.4
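\begin{align*}
\frac{\partial^4 f}{\partial x_1 \partial x_2 \partial x_3 \partial x_4} &= \frac{\partial^3}{\partial x_1 \partial x_2 \partial x_3} \left( x_1^4 x_2^3 x_3^2 \right) \\
&= \frac{\partial^2}{\partial x_1 \partial x_2} \left( 2 x_1^4 x_2^3 x_3 \right) \\
&= \frac{\partial}{\partial x_1} \left( 6 x_1^4 x_2^2 x_3 \right) \\
&= 24 x_1^3 x_2^2 x_3
\end{align*}
and
\begin{align*}
\frac{\partial^4 f}{\partial x_4 \partial x_3 \partial x_2 \partial x_1} &= \frac{\partial^3}{\partial x_4 \partial x_3 \partial x_2} \left( 4 x_1^3 x_2^3 x_3^2 x_4 \right) \\
&= \frac{\partial^2}{\partial x_4 \partial x_3} \left( 12 x_1^3 x_2^2 x_3^2 x_4 \right) \\
&= \frac{\partial}{\partial x_4} \left( 24 x_1^3 x_2^2 x_3 x_4 \right) \\
&= 24 x_1^3 x_2^2 x_3
\end{align*}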
Example 2.2.4
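Notice that in Example 2.2.2,
\[ f_{xy} = f_{yx} = -mn e^{my} \sin(nx) \]
and in Example 2.2.3
\[ f_{xy} = f_{yx} = \alpha \beta e^{\alpha x + \beta y} \]
and in Example 2.2.4
\[ \frac{\partial^4 f}{\partial x_1 \partial x_2 \partial x_3 \partial x_4} = \frac{\partial^4 f}{\partial x_4 \partial x_3 \partial x_2 \partial x_1} = 24 x_1^3 x_2^2 x_3 \]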
In all of these examples, it didn’t matter what order we took the derivatives in. The
following theorem$^3$ shows that this was no accident.
Theorem 2.2.5.
If the partial derivatives $\frac{\partial^2 f}{\partial x \partial y}$ and $\frac{\partial^2 f}{\partial y \partial x}$ exist and are continuous at $(x_0, y_0)$, then
\[ \frac{\partial^2 f}{\partial x \partial y}(x_0, y_0) = \frac{\partial^2 f}{\partial y \partial x}(x_0, y_0) \]
The proof of Theorem 2.2.5 can be found in Appendix A.3.1. An example of a function
$f(x, y)$ where $\frac{\partial^2 f}{\partial x \partial y}(x_0, y_0) \ne \frac{\partial^2 f}{\partial y \partial x}(x_0, y_0)$ can be found in Appendix A.3.2.
Suppose a function $f(x, y)$ has continuous partial derivatives of all orders over all of $\mathbb{R}^2$.
Suppose further
\[ f_{xx}(x, y) = y e^x \]
What is $f_{xyxy}(x, y)$?
Solution. Since the partial derivatives are continuous, Theorem 2.2.5 applies. So:
\begin{align*}
f_{xyxy}(x, y) &= \big( (f_x)_{yx} \big)_y (x, y) = \big( (f_x)_{xy} \big)_y (x, y) = f_{xxyy}(x, y) \\
f_{xxy}(x, y) &= \frac{\partial}{\partial y} [y e^x] = e^x \\
f_{xxyy}(x, y) &= \frac{\partial}{\partial y} [e^x] = 0
\end{align*}
Example 2.2.6
Example 2.2.7
3 The history of this important theorem is pretty convoluted. See “A note on the history of mixed partial
derivatives” by Thomas James Higgins which was published in Scripta Mathematica 7 (1940), 59-62.
4 Alexis Clairaut (1713–1765) was a French mathematician, astronomer, and geophysicist.
5 Hermann Schwarz (1843–1921) was a German mathematician.
PARTIAL D ERIVATIVES 2.3 L OCAL M AXIMUM AND M INIMUM VALUES
Definition 2.3.1.
Another complication is that more variables lead to more (partial) derivatives. It’s
convenient to group the partial derivatives into a vector.
Definition 2.3.2.
The vector $\langle f_x(a, b), f_y(a, b) \rangle$ is denoted $\nabla f(a, b)$ and is called “the gradient of
the function $f$ at the point $(a, b)$”.
Suppose that the largest value of f ( x ) is f ( a). What does that tell us about a?
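[Figure: the graph $y = f(x)$]
Let’s recall why that’s true. Suppose that the largest value of $f(x)$ is $f(a)$. Then
$f(a + h) \le f(a)$ for all $h$, so that
\begin{align*}
f(a + h) \le f(a) \implies f(a + h) - f(a) \le 0 &\implies \frac{f(a + h) - f(a)}{h} \le 0 && \text{if } h > 0 \\
f(a + h) \le f(a) \implies f(a + h) - f(a) \le 0 &\implies \frac{f(a + h) - f(a)}{h} \ge 0 && \text{if } h < 0
\end{align*}
Taking the limit $h \to 0$ in the first line gives $f'(a) \le 0$, and in the second line gives
$f'(a) \ge 0$. So, if $f'(a)$ exists, $f'(a) = 0$.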
Let’s use the ideas of the above discourse to extend the study of local maxima and
local minima to functions of more than one variable. Suppose that the function $f(x, y)$
is defined for all $(x, y)$ in some subset $R$ of $\mathbb{R}^2$, that $(a, b)$ is a point of $R$ that is not on the
boundary of $R$, and that $f$ has a local maximum at $(a, b)$. See the figure below.
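[Figure: the graph $z = f(x, y)$ above the region $R$ in the $xy$-plane, with the point $(a, b)$ in $R$ and the point $(a, b, f(a, b))$ on the graph]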
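Then the function $f(x, y)$ cannot increase in value as $(x, y)$ moves a small distance away
from $(a, b)$ in any direction. In particular, if we change the $x$-coordinate a little, $f(x, y)$
must not increase. So, for all $h$ sufficiently close to zero:
\begin{align*}
f(a + h, b) \le f(a, b) \implies f(a + h, b) - f(a, b) \le 0 &\implies \frac{f(a + h, b) - f(a, b)}{h} \le 0 && \text{if } h > 0 \\
f(a + h, b) \le f(a, b) \implies f(a + h, b) - f(a, b) \le 0 &\implies \frac{f(a + h, b) - f(a, b)}{h} \ge 0 && \text{if } h < 0
\end{align*}
Taking the limit $h \to 0$ in both lines shows that $f_x(a, b) = 0$, if it exists. The same
argument applied to the $y$-coordinate shows that $f_y(a, b) = 0$, if it exists.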
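Theorem 2.3.3.
Let the function $f(x, y)$ be defined for all $(x, y)$ in some subset $R$ of $\mathbb{R}^2$. Assume that
$(a, b)$ is a point of $R$ that is not on the boundary of $R$, that $f$ has a local maximum or
local minimum at $(a, b)$, and that the partial derivatives of $f$ exist at $(a, b)$.
Then
\[ \nabla f(a, b) = \mathbf{0}. \]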
Definition 2.3.4.
Warning 2.3.5.
Note that some people (and texts) do not include the case “$\nabla f(a, b)$ does not
exist” in the definition of a critical point. Points where the gradient does
not exist would then (usually) be referred to as singular points of the function. We do
not use this terminology.
Warning 2.3.6.
Theorem 2.3.3 tells us that every local maximum or minimum (in the interior of
the domain of a differentiable function) is a critical point. Beware that it does not$^8$
tell us that every critical point is either a local maximum or a local minimum.
In fact, as we shall see in Example 2.3.13, there are critical points that are neither local
maxima nor local minima. Nonetheless, Theorem 2.3.3 is very useful because often functions
have only a small number of critical points. To find the local maxima and minima of such
a function, we only need to consider its critical points. We’ll return later to the question of
how to tell if a critical point is a local maximum, local minimum or neither. For now, we’ll
just practice finding critical points.
Example 2.3.7 ($f(x, y) = x^2 - 2xy + 2y^2 + 2x - 6y + 12$)
8 A very common error of logic that people make is “Affirming the consequent”. “If P then Q” is true,
does not imply that “If Q then P” is true . The statement “If he is Shakespeare then he is dead” is true.
But concluding from “That sheep is dead” that “He must be Shakespeare” is just silly.
or equivalently (dividing by two and moving the constants to the right hand side)
\begin{align*}
x - y &= -1 \tag{E1} \\
-x + 2y &= 3 \tag{E2}
\end{align*}
This is a system of two equations in two unknowns ($x$ and $y$). One strategy for solving
a system like this is to
• First use one of the equations to solve for one of the unknowns in terms of the other
unknown. For example, (E1) tells us that $y = x + 1$. This expresses $y$ in terms of $x$.
We say that we have solved for $y$ in terms of $x$.
• Then substitute the result, $y = x + 1$ in our case, into the other equation, (E2). In our
case, this gives
\[ -x + 2(x + 1) = 3 \iff x + 2 = 3 \iff x = 1 \]
• We have now found that $x = 1$, $y = x + 1 = 2$ is the only solution. So the only critical
point is $(1, 2)$. Of course it only takes a moment to verify that $\nabla f(1, 2) = \langle 0, 0 \rangle$. It is
a good idea to do this as a simple check of our work.
An alternative strategy for solving a system of two equations in two unknowns, like (E1)
and (E2), is to
• add equations (E1) and (E2) together. This gives
\[ (x - y) + (-x + 2y) = -1 + 3 \iff y = 2 \]
The point here is that adding equations (E1) and (E2) together eliminates the
unknown $x$, leaving us with one equation in the unknown $y$, which is easily solved.
For other systems of equations you might have to multiply the equations by some
numbers before adding them together.
• Then substitute $y = 2$ into one of the original equations, say (E1), giving
\[ x - 2 = -1 \implies x = 1 \]
• Once again (thankfully) we have found that the only critical point is $(1, 2)$.
Example 2.3.7
This was pretty easy because we only had to solve linear equations, which in turn was a
consequence of the fact that f ( x, y) was a polynomial of degree two. Here is an example
with some slightly more challenging algebra.
Example 2.3.8 ($f(x, y) = 2x^3 - 6xy + y^2 + 4y$)
And here is an example for which the algebra requires a bit more thought.
Example 2.3.9 ($f(x, y) = xy(5x + y - 15)$)
is satisfied. The second equation, $x(5x + 2y - 15) = 0$, is satisfied if at least one of the two
factors $x$, $(5x + 2y - 15)$ is zero. So the second equation is satisfied if at least one of the
two equations
\begin{align*}
x &= 0 \tag{E2a} \\
5x + 2y &= 15 \tag{E2b}
\end{align*}
is satisfied.
So both critical point equations (E1) and (E2) are satisfied if and only if at least one
of (E1a), (E1b) is satisfied and in addition at least one of (E2a), (E2b) is satisfied. That is,
if and only if at least one of the following four possibilities holds.
• (E1a) and (E2a) are satisfied if and only if $x = y = 0$
• (E1a) and (E2b) are satisfied if and only if $y = 0$, $5x + 2y = 15 \iff y = 0$, $x = 3$
• (E1b) and (E2a) are satisfied if and only if $10x + y = 15$, $x = 0 \iff y = 15$, $x = 0$
• (E1b) and (E2b) are satisfied if and only if $10x + y = 15$, $5x + 2y = 15$. We can use, for
example, the second of these equations to solve for $x$ in terms of $y$: $x = \frac{1}{5}(15 - 2y)$.
When we substitute this into the first equation we get $2(15 - 2y) + y = 15$, which we
can solve for $y$. This gives $-3y = 15 - 30$ or $y = 5$ and then $x = \frac{1}{5}(15 - 2 \times 5) = 1$.
In conclusion, the critical points are $(0, 0)$, $(3, 0)$, $(0, 15)$ and $(1, 5)$.
A more compact way to write what we have just done is
\begin{align*}
& f_x(x, y) = 0 \text{ and } f_y(x, y) = 0 \\
\iff{} & y(10x + y - 15) = 0 \text{ and } x(5x + 2y - 15) = 0 \\
\iff{} & \{ y = 0 \text{ or } 10x + y = 15 \} \text{ and } \{ x = 0 \text{ or } 5x + 2y = 15 \} \\
\iff{} & \{ y = 0,\ x = 0 \} \text{ or } \{ y = 0,\ 5x + 2y = 15 \} \text{ or } \{ 10x + y = 15,\ x = 0 \} \text{ or } \{ 10x + y = 15,\ 5x + 2y = 15 \} \\
\iff{} & \{ x = y = 0 \} \text{ or } \{ y = 0,\ x = 3 \} \text{ or } \{ x = 0,\ y = 15 \} \text{ or } \{ x = 1,\ y = 5 \}
\end{align*}
Example 2.3.9
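The four-way case split can be mirrored in code. The sketch below pairs each factor of $f_x$ with each factor of $f_y$, solves each resulting pair of linear equations, and checks that the gradient vanishes at every solution. It solves the pairs with Cramer's rule rather than the substitution used in the text, and `solve_pair` is a hypothetical helper name.

```python
from fractions import Fraction

def grad_f(x, y):
    """Gradient of f(x, y) = xy(5x + y - 15), in factored form."""
    return (y * (10 * x + y - 15), x * (5 * x + 2 * y - 15))

def solve_pair(eq1, eq2):
    """Solve two linear equations a*x + b*y = c by Cramer's rule.

    Each equation is a tuple (a, b, c); the determinant is assumed nonzero.
    """
    (a1, b1, c1), (a2, b2, c2) = eq1, eq2
    det = a1 * b2 - a2 * b1
    return (Fraction(c1 * b2 - c2 * b1, det), Fraction(a1 * c2 - a2 * c1, det))

E1 = [(0, 1, 0), (10, 1, 15)]   # (E1a) y = 0,  (E1b) 10x + y = 15
E2 = [(1, 0, 0), (5, 2, 15)]    # (E2a) x = 0,  (E2b) 5x + 2y = 15

critical_points = [solve_pair(e1, e2) for e1 in E1 for e2 in E2]
for x, y in critical_points:
    print((int(x), int(y)), grad_f(x, y))
```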
Let’s try a more practical example, something from the real world. Well, a
mathematician’s “real world”. The interested reader should search-engine their way to a
discussion of “idealisation”, “game theory”, “Cournot models” and “Bertrand models”. But
don’t spend too long there. A discussion of breweries is about to take place.
Example 2.3.10
In a certain community, there are two breweries in competition9 , so that sales of each neg-
atively affect the profits of the other. If brewery A produces x litres of beer per month and
brewery B produces $y$ litres per month, then the profits of the two breweries are given by
\[ P = 2x - \frac{2x^2 + y^2}{10^6} \qquad Q = 2y - \frac{4y^2 + x^2}{2 \times 10^6} \]
respectively. Find the sum of the two profits if each brewery independently sets its own
production level to maximize its own profit and assumes that its competitor does likewise.
Then, assuming cartel behaviour, find the sum of the two profits if the two breweries
cooperate so as to maximize that sum$^{10}$.
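Solution. If A adjusts $x$ to maximize $P$ (for $y$ held fixed) and B adjusts $y$ to maximize $Q$
(for $x$ held fixed) then we want to find the $(x, y)$ using
\begin{align*}
P_x &= 2 - \frac{4x}{10^6} \\
Q_y &= 2 - \frac{8y}{2 \times 10^6}
\end{align*}
Note that $P_x$ and $Q_y$ exist everywhere. Then $x$ and $y$ are determined by the equations
\begin{align*}
P_x &= 0 \tag{E1} \\
Q_y &= 0 \tag{E2}
\end{align*}
Equation (E1) yields $x = \frac{1}{2} 10^6$ and equation (E2) yields $y = \frac{1}{2} 10^6$. Knowing $x$ and $y$ we
can determine $P$, $Q$ and the total profit
\begin{align*}
P + Q &= 2(x + y) - \frac{1}{10^6} \left( \tfrac{5}{2} x^2 + 3y^2 \right) \\
&= 10^6 \left( 1 + 1 - \tfrac{5}{8} - \tfrac{3}{4} \right) = \tfrac{5}{8} 10^6
\end{align*}
On the other hand if (A, B) adjust $(x, y)$ to maximize $P + Q = 2(x + y) - \frac{1}{10^6} \left( \frac{5}{2} x^2 + 3y^2 \right)$,
then $x$ and $y$ are determined by
\begin{align*}
(P + Q)_x &= 2 - \frac{5x}{10^6} = 0 \tag{E1} \\
(P + Q)_y &= 2 - \frac{6y}{10^6} = 0 \tag{E2}
\end{align*}
Equation (E1) yields $x = \frac{2}{5} 10^6$ and equation (E2) yields $y = \frac{1}{3} 10^6$. Again knowing $x$ and $y$
we can determine the total profit
\begin{align*}
P + Q &= 2(x + y) - \frac{1}{10^6} \left( \tfrac{5}{2} x^2 + 3y^2 \right) \\
&= 10^6 \left( \tfrac{4}{5} + \tfrac{2}{3} - \tfrac{2}{5} - \tfrac{1}{3} \right) = \tfrac{11}{15} 10^6
\end{align*}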
So cooperating really does help their profits. Unfortunately, like a very small tea-pot,
consumers will be a little poorer11 .
Example 2.3.10
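The arithmetic in the brewery example is easy to confirm with exact fractions. The sketch below recomputes the total profit under competition and under cooperation; the variable names are hypothetical, and the production levels are taken from the critical point equations in the solution.

```python
from fractions import Fraction

M = 10**6

def total_profit(x, y):
    """P + Q = 2(x + y) - (5/2 x^2 + 3 y^2) / 10^6."""
    return 2 * (x + y) - (Fraction(5, 2) * x**2 + 3 * y**2) / M

# Competition: P_x = 2 - 4x/10^6 = 0 and Q_y = 2 - 8y/(2 * 10^6) = 0.
x_comp = Fraction(2 * M, 4)
y_comp = Fraction(2 * 2 * M, 8)

# Cooperation: (P+Q)_x = 2 - 5x/10^6 = 0 and (P+Q)_y = 2 - 6y/10^6 = 0.
x_coop = Fraction(2 * M, 5)
y_coop = Fraction(2 * M, 6)

profit_comp = total_profit(x_comp, y_comp)
profit_coop = total_profit(x_coop, y_coop)
print(profit_comp / M, profit_coop / M)   # as fractions of 10^6
```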
Moving swiftly away from the last pun, let’s do something a little more geometric.
Example 2.3.11
Equal angle bends are made at equal distances from the two ends of a 100 metre long fence
so the resulting three segment fence can be placed along an existing wall to make an en-
closure of trapezoidal shape. What is the largest possible area for such an enclosure?
Solution. This is a very geometric problem (fenced off from pun opportunities), and as
such we should start by drawing a sketch and introducing some variable names.
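[Figure: the fence placed against the wall, with two end segments of length $x$ each making angle $\theta$ with the wall, a middle segment of length $100 - 2x$, and height $x \sin\theta$]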
The area enclosed by the fence is the area inside the blue rectangle (in the figure on the
right above) plus the area inside the two blue triangles.
A( x, θ ) = (100 ´ 2x ) x sin θ + 2 ¨ 12 ¨ x sin θ ¨ x cos θ
= (100x ´ 2x2 ) sin θ + x2 sin θ cos θ
To maximize the area, we need to solve
\[ 0 = \frac{\partial A}{\partial x} = (100 - 4x) \sin\theta + 2x \sin\theta \cos\theta \]
\[ 0 = \frac{\partial A}{\partial \theta} = (100x - 2x^2) \cos\theta + x^2 \big(\cos^2\theta - \sin^2\theta\big) \]
Note that $\frac{\partial A}{\partial x}$ and $\frac{\partial A}{\partial \theta}$ are defined everywhere in their domain (so here the
critical points are the points where the gradient is zero). Both terms in the first equation contain
the factor $\sin\theta$ and all terms in the second equation contain the factor $x$. If either $\sin\theta$ or $x$ is zero
the area $A(x, \theta)$ will also be zero, and so will certainly not be maximal. So we may divide
the first equation by $\sin\theta$ and the second equation by $x$, giving
\[ (100 - 4x) + 2x \cos\theta = 0 \tag{E1} \]
\[ (100 - 2x) \cos\theta + x \big(\cos^2\theta - \sin^2\theta\big) = 0 \tag{E2} \]
These equations might look a little scary. But there is no need to panic. They are not as
bad as they look because $\theta$ enters only through $\cos\theta$ and $\sin^2\theta$, which we can easily write
in terms of $\cos\theta$. Furthermore we can eliminate $\cos\theta$ by observing that the first equation
forces $\cos\theta = -\frac{100 - 4x}{2x}$ and hence $\sin^2\theta = 1 - \cos^2\theta = 1 - \frac{(100 - 4x)^2}{4x^2}$. Substituting these
into the second equation gives
\[ -(100 - 2x)\,\frac{100 - 4x}{2x} + x \Big[ \frac{(100 - 4x)^2}{2x^2} - 1 \Big] = 0 \]
\[ \implies 6x^2 - 200x = 0 \]
\[ \implies x = \frac{100}{3} \qquad \cos\theta = -\frac{-100/3}{200/3} = \frac12 \qquad \theta = 60^\circ \]
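As a sanity check on the critical point $x = \frac{100}{3}$, $\theta = 60^\circ$, the following sketch evaluates $A(x, \theta)$ there and compares it with a coarse grid search. The grid spacing and the names `area`, `area_star` and `grid_best` are our own choices for illustration, not part of the text.

```python
import math

def area(x, theta):
    """A(x, theta) = (100x - 2x^2) sin(theta) + x^2 sin(theta) cos(theta)."""
    return (100 * x - 2 * x**2) * math.sin(theta) + x**2 * math.sin(theta) * math.cos(theta)

x_star, theta_star = 100 / 3, math.pi / 3
area_star = area(x_star, theta_star)

# Coarse grid: x = 0.5, 1.0, ..., 50.0 metres and theta = 1, 2, ..., 179 degrees
grid_best = max(
    area(0.5 * i, math.radians(d))
    for i in range(1, 101)
    for d in range(1, 180)
)
print(area_star, grid_best)
```

The value at the critical point is $\frac{2500}{\sqrt3} \approx 1443.4$ square metres, and no grid point does better.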
Example 2.3.11
Now here is a very useful (even practical!) statistical example — finding the line that
best fits a given collection of points.
Example 2.3.12 (Linear regression)
[Figure: data points, including $(x_n, y_n)$, scattered about the line $y = mx + b$.]
Note that
• term number $i$ in $E(m, b)$ is the square of the difference between $y_i$, which is the $i$th
measured value of $y$, and $\big[mx + b\big]_{x = x_i}$, which is the approximation to $y_i$ given by
the line $y = mx + b$.
• All terms in the sum are positive, regardless of whether the points ( xi , yi ) are above
or below the line.
Our problem is to find the $m$ and $b$ that minimize $E(m, b)$. This technique for drawing a
line through a bunch of data points is called “linear regression”. It is used a lot. Even
in the real world, and not just the real world that you find in mathematics problems.
The actual real world that involves jobs.
Solution. We wish to choose $m$ and $b$ so as to minimize $E(m, b)$. So we need to determine
where the gradient of $E$ does not exist, or exists and is equal to zero.
\[ \frac{\partial E}{\partial m} = \sum_{i=1}^n 2(m x_i + b - y_i)\, x_i
   = m \Big[\sum_{i=1}^n 2x_i^2\Big] + b \Big[\sum_{i=1}^n 2x_i\Big] - \Big[\sum_{i=1}^n 2x_i y_i\Big] \]
\[ \frac{\partial E}{\partial b} = \sum_{i=1}^n 2(m x_i + b - y_i)
   = m \Big[\sum_{i=1}^n 2x_i\Big] + b \Big[\sum_{i=1}^n 2\Big] - \Big[\sum_{i=1}^n 2y_i\Big] \]
There are a lot of symbols here. But remember that all of the xi ’s and yi ’s are given con-
stants. They come from, for example, experimental data. The only unknowns are m and
b. To emphasize this, and to save some writing, define the constants
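\[ S_x = \sum_{i=1}^n x_i \qquad S_y = \sum_{i=1}^n y_i \qquad S_{x^2} = \sum_{i=1}^n x_i^2 \qquad S_{xy} = \sum_{i=1}^n x_i y_i \]
The partial derivatives of $E$ exist everywhere so we only need to find where they are
equal to zero. The equations which determine the critical points are (after dividing by
two)
\[ S_{x^2}\, m + S_x\, b = S_{xy} \tag{E1} \]
\[ S_x\, m + n\, b = S_y \tag{E2} \]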
These are two linear equations in the unknowns $m$ and $b$. They may be solved in any of
the usual ways. One is to use (E2) to solve for $b$ in terms of $m$
\[ b = \frac1n \big( S_y - S_x m \big) \tag{E3} \]
and then substitute this into (E1) to get the equation
\[ S_{x^2}\, m + \frac1n S_x \big( S_y - S_x m \big) = S_{xy}
   \implies \big( n S_{x^2} - S_x^2 \big) m = n S_{xy} - S_x S_y \]
for $m$. We can then solve this equation for $m$ and substitute back into (E3) to get $b$. This
gives
\[ m = \frac{n S_{xy} - S_x S_y}{n S_{x^2} - S_x^2} \qquad
   b = -\frac{S_x S_{xy} - S_y S_{x^2}}{n S_{x^2} - S_x^2} \]
Another way to solve the system of equations is
\[ n\,\text{(E1)} - S_x\,\text{(E2)}: \quad \big[ n S_{x^2} - S_x^2 \big] m = n S_{xy} - S_x S_y \]
\[ -S_x\,\text{(E1)} + S_{x^2}\,\text{(E2)}: \quad \big[ n S_{x^2} - S_x^2 \big] b = -S_x S_{xy} + S_y S_{x^2} \]
you don’t need any calculus to apply the formulae, you do need calculus to understand
where they came from. The same technique can be extended to other types of curve fitting
problems. For example, polynomial regression.
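The formulae for $m$ and $b$ translate directly into a few lines of code. This is a minimal sketch, assuming $E(m, b) = \sum_i (m x_i + b - y_i)^2$ (the form implied by the partial derivatives in the solution); the function name `fit_line` and the sample data are hypothetical.

```python
def fit_line(points):
    """Least-squares slope m and intercept b for a list of (x, y) pairs."""
    n = len(points)
    s_x = sum(x for x, _ in points)
    s_y = sum(y for _, y in points)
    s_xx = sum(x * x for x, _ in points)    # S_{x^2}
    s_xy = sum(x * y for x, y in points)
    denom = n * s_xx - s_x**2
    if denom == 0:
        raise ValueError("all x values are equal, so no unique line exists")
    m = (n * s_xy - s_x * s_y) / denom
    b = (s_y * s_xx - s_x * s_xy) / denom
    return m, b

# Hypothetical data lying exactly on the line y = 2x + 1
m, b = fit_line([(0, 1), (1, 3), (2, 5), (3, 7)])
print(m, b)
```

For data that lie exactly on a line, the fit recovers that line; for scattered data it returns the line with the smallest sum of squared vertical errors.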
Example 2.3.12
The figure below shows some level curves of f . Observe from the level curves that
• f increases as you leave (0, 0) walking along the x axis
[Figure: level curves $f = -9, -4, -1, 0, 1, 4, 9$ in the $xy$-plane.]
Example 2.3.13
Definition 2.3.14.
The critical point $(a, b)$ is called a saddle point for the function $f(x, y)$ if, for each
$r > 0$,
• there is at least one point $(x, y)$, within a distance $r$ of $(a, b)$, for which
$f(x, y) > f(a, b)$ and
• there is at least one point $(x, y)$, within a distance $r$ of $(a, b)$, for which
$f(x, y) < f(a, b)$.
Understanding what the graph of a function looks like is a powerful tool for classifying
critical points, but it can be very time-consuming. The Second Derivative Test (below) is
a more algebraic approach to classification. This test is often faster than graphing, but the
drawback is that it is sometimes inconclusive.
Let $r > 0$ and assume that all second order derivatives of the function $f(x, y)$ are
continuous at all points $(x, y)$ that are within a distance $r$ of $(a, b)$. Assume that
$f_x(a, b) = f_y(a, b) = 0$. Define
\[ D(x, y) = f_{xx}(x, y)\, f_{yy}(x, y) - f_{xy}(x, y)^2 \]
The proof of Theorem 2.3.15 is beyond the scope of Math 105, but there is some intu-
ition supporting it that is more accessible. Extremely informally, we can think of saddle
points as places with inconsistent concavity: in some directions the surface looks concave
up, in other directions it looks concave down. On the other hand, at a local extremum, the
concavity is the same in all directions.
Let’s do thought experiments on a few simple cases to expand those ideas.
1. Suppose at ( a, b), the surface looks like a minimum if y is held constant, but it looks
like a maximum if x is held constant. (In particular, this means ( a, b) is the location
of a saddle point.)
[Figure: a saddle-shaped surface through the point $(a, b, f(a, b))$, with the cross-sections
$z = f(x, b)$ and $z = f(a, y)$.]
Since $f_{xx}(a, b)$ and $f_{yy}(a, b)$ have different signs (or at least one of them is zero):
\[ f_{xx}(a, b)\, f_{yy}(a, b) \le 0 \]
\[ f_{xx}(a, b)\, f_{yy}(a, b) - f_{xy}(a, b)^2 \le -f_{xy}(a, b)^2 \le 0 \]
\[ D(a, b) \le 0 \]
2. Suppose $D(a, b) > 0$.
\[ 0 < f_{xx}(a, b)\, f_{yy}(a, b) - f_{xy}(a, b)^2 \]
\[ f_{xy}(a, b)^2 < f_{xx}(a, b)\, f_{yy}(a, b) \]
This tells us that f xx ( a, b) and f yy ( a, b) have the same sign – either they’re both pos-
itive or they’re both negative. So, the function’s concavity is the same whether we
hold the x-value or the y-value constant. The function might have the same concav-
ity in all directions – unlike the saddle point example we saw above. So, it seems
plausible that critical points with positive discriminants are local extrema, rather
than saddle points.
[Figure: the surface $z = f(x, y)$ and its cross-section $z = f(x, b)$ near $(a, b)$.]
This doesn’t go so far as to show us that $D(a, b) \ge 0$, but it does accord with the test
of $f_{xx}(a, b)$ in the second bullet point of Theorem 2.3.15.
4. Similarly, suppose the surface has a local minimum at $(a, b)$.
Holding $y = b$ constant, we can think of $z = f(x, b)$ as a one-variable function, in
which case $f_{xx}(a, b) \ge 0$ by the single-variable second derivative test.
[Figure: the surface $z = f(x, y)$ and its cross-section $z = f(x, b)$ near a local minimum at $(a, b)$.]
You might wonder why, in the local maximum/local minimum cases of Theorem
2.3.15, $f_{xx}(a, b)$ appears rather than $f_{yy}(a, b)$. The answer is only that x is before y in the
alphabet14. You can use $f_{yy}(a, b)$ just as well as $f_{xx}(a, b)$. The reason is that if $D(a, b) > 0$
(as in the first two bullets of the theorem), then because $D(a, b) = f_{xx}(a, b)\, f_{yy}(a, b) -
f_{xy}(a, b)^2 > 0$, we necessarily have $f_{xx}(a, b)\, f_{yy}(a, b) > 0$ so that $f_{xx}(a, b)$ and $f_{yy}(a, b)$
must have the same sign: either both are positive or both are negative.
You might also wonder why we cannot draw any conclusions when D ( a, b) = 0 and
what happens then. The second derivative test for functions of two variables was derived
in precisely the same way as the second derivative test for functions of one variable is
derived — you approximate the function by a polynomial that is of degree two in ( x ´ a),
(y ´ b) and then you analyze the behaviour of the quadratic polynomial near ( a, b). For
this to work, the contributions to f ( x, y) from terms that are of degree two in ( x ´ a),
(y ´ b) had better be bigger than the contributions to f ( x, y) from terms that are of degree
three and higher in ( x ´ a), (y ´ b) when ( x ´ a), (y ´ b) are really small. If this is not
the case, for example when the terms in f ( x, y) that are of degree two in ( x ´ a), (y ´ b)
all have coefficients that are exactly zero, the analysis will certainly break down. That’s
exactly what happens when D ( a, b) = 0. Here are some examples. The functions
\[ f_1(x, y) = x^4 + y^4 \qquad f_2(x, y) = -x^4 - y^4 \qquad f_3(x, y) = x^3 + y^3 \qquad f_4(x, y) = x^4 - y^4 \]
all have (0, 0) as the only critical point and all have $D(0, 0) = 0$. The first, $f_1$, has its
minimum there. The second, $f_2$, has its maximum there. The third and fourth have a
saddle point there.
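Since the second derivative test says nothing when $D = 0$, one informal way to see the three behaviours is to sample each function on a small circle around $(0, 0)$. The sketch below does that; the function `behaviour_near_origin`, the radius and the number of samples are our own illustrative choices, and sampling a single circle is only a heuristic, not a proof.

```python
import math

def behaviour_near_origin(f, radius=1e-2, samples=360):
    """Guess the type of the critical point (0, 0), assuming f(0, 0) = 0,
    from the signs of f on a small circle around it."""
    values = [
        f(radius * math.cos(2 * math.pi * k / samples),
          radius * math.sin(2 * math.pi * k / samples))
        for k in range(samples)
    ]
    if all(v > 0 for v in values):
        return "local min"
    if all(v < 0 for v in values):
        return "local max"
    return "saddle"

functions = {
    "f1": lambda x, y: x**4 + y**4,
    "f2": lambda x, y: -x**4 - y**4,
    "f3": lambda x, y: x**3 + y**3,
    "f4": lambda x, y: x**4 - y**4,
}
report = {name: behaviour_near_origin(f) for name, f in functions.items()}
print(report)
```

All four functions have the same (zero) second order partial derivatives at the origin, yet the sampled signs differ, which is why no test based only on $D(0, 0)$ can tell them apart.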
14 The shackles of convention are not limited to mathematics. Election ballots often have the candidates
listed in alphabetic order.
Here are sketches of some level curves for each of these four functions (with all renamed
to simply f ).
[Figure: four panels of level curves in the $xy$-plane, one for each of $f_1$, $f_2$, $f_3$ and $f_4$.]
Example 2.3.17 ($f(x, y) = 2x^3 - 6xy + y^2 + 4y$)
\[ f = 2x^3 - 6xy + y^2 + 4y \]
\[ f_x = 6x^2 - 6y \qquad f_{xx} = 12x \qquad f_{xy} = -6 \]
\[ f_y = -6x + 2y + 4 \qquad f_{yy} = 2 \qquad f_{yx} = -6 \]
(Of course, f xy and f yx have to be the same. It is still useful to compute both, as a way to
catch some mechanical errors.)
We have already found, in Example 2.3.8, that the critical points are (1, 1), (2, 4). The
classification is
critical point | $f_{xx} f_{yy} - f_{xy}^2$        | $f_{xx}$ | type
(1, 1)         | $12 \times 2 - (-6)^2 < 0$        |          | saddle point
(2, 4)         | $24 \times 2 - (-6)^2 > 0$        | 24       | local min
We were able to leave the $f_{xx}$ entry in the top row blank, because
• we knew that $f_{xx}(1, 1)\, f_{yy}(1, 1) - f_{xy}(1, 1)^2 < 0$, and
• we knew, from Theorem 2.3.15, that $f_{xx}(1, 1)\, f_{yy}(1, 1) - f_{xy}(1, 1)^2 < 0$, by itself, was
enough to ensure that (1, 1) was a saddle point.
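The bookkeeping in the table can be sketched as a small function. It encodes the usual statement of the second derivative test (negative $D$: saddle point; positive $D$: local minimum if $f_{xx} > 0$ and local maximum if $f_{xx} < 0$; zero $D$: no conclusion). The names `classify` and `second_partials` are ours.

```python
def classify(f_xx, f_yy, f_xy):
    """Second derivative test, given the second partials at a critical point."""
    d = f_xx * f_yy - f_xy**2
    if d < 0:
        return "saddle point"
    if d > 0:
        # d > 0 forces f_xx to be nonzero
        return "local min" if f_xx > 0 else "local max"
    return "inconclusive"

def second_partials(x, y):
    """(f_xx, f_yy, f_xy) for f = 2x^3 - 6xy + y^2 + 4y."""
    return 12 * x, 2, -6

results = {p: classify(*second_partials(*p)) for p in [(1, 1), (2, 4)]}
print(results)
```

This reproduces the two rows of the table: a saddle point at (1, 1) and a local minimum at (2, 4).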
Here is a sketch of some level curves of our $f(x, y)$. They are not needed to answer this
question, but can give you some idea as to what the graph of f looks like.
[Figure: level curves of $f$, with the saddle point (1, 1), where $f(1, 1) = 1$, and the local
minimum (2, 4), where $f(2, 4) = 0$, marked.]
Example 2.3.17
of f ( x, y) in Example 2.3.9. Again, to classify the critical points we need the second order
partial derivatives. They are
\[ f_{xx}(x, y) = 10y \]
\[ f_{yy}(x, y) = 2x \]
\[ f_{xy}(x, y) = (1)(10x + y - 15) + y(1) = 10x + 2y - 15 \]
\[ f_{yx}(x, y) = (1)(5x + 2y - 15) + x(5) = 10x + 2y - 15 \]
(Once again, we have computed both $f_{xy}$ and $f_{yx}$ to guard against mechanical errors.) We
have already found, in Example 2.3.9, that the critical points are (0, 0), (0, 15), (3, 0) and
(1, 5). The classification is
critical point | $f_{xx} f_{yy} - f_{xy}^2$        | $f_{xx}$ | type
(0, 0)         | $0 \times 0 - (-15)^2 < 0$        |          | saddle point
(0, 15)        | $150 \times 0 - 15^2 < 0$         |          | saddle point
(3, 0)         | $0 \times 6 - 15^2 < 0$           |          | saddle point
(1, 5)         | $50 \times 2 - 5^2 > 0$           | 50       | local min
Here is a sketch of some level curves of our $f(x, y)$. $f$ is negative in the shaded regions
and $f$ is positive in the unshaded regions. Again this is not needed to answer this
question, but can give you some idea as to what the graph of f looks like.
[Figure: level curves $f = 0$, $f = \pm 20$ and $f = -10$, with the critical points (0, 0), (0, 15)
and (3, 0), where $f = 0$, and (1, 5), where $f(1, 5) = -25$, marked.]
Example 2.3.18
Example 2.3.19
Find and classify all of the critical points of f ( x, y) = x3 + xy2 ´ 3x2 ´ 4y2 + 4.
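Solution. We know the drill now. We start by computing all of the partial derivatives of f
up to order 2.
\[ f = x^3 + xy^2 - 3x^2 - 4y^2 + 4 \]
\[ f_x = 3x^2 + y^2 - 6x \qquad f_{xx} = 6x - 6 \qquad f_{xy} = 2y \]
\[ f_y = 2xy - 8y \qquad f_{yy} = 2x - 8 \qquad f_{yx} = 2y \]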
$f_x$ and $f_y$ are defined everywhere. So the critical points are then the solutions of $f_x = 0$,
$f_y = 0$. That is
\[ f_x = 3x^2 + y^2 - 6x = 0 \tag{E1} \]
\[ f_y = 2y(x - 4) = 0 \tag{E2} \]
The second equation, $2y(x - 4) = 0$, is satisfied if and only if at least one of the two
equations $y = 0$ and $x = 4$ is satisfied.
• When $y = 0$, equation (E1) forces $x$ to obey
\[ 0 = 3x^2 + 0^2 - 6x = 3x(x - 2) \]
so that $x = 0$ or $x = 2$.
• When $x = 4$, equation (E1) forces $y$ to obey
\[ 0 = 3 \times 4^2 + y^2 - 6 \times 4 = 24 + y^2 \]
which is impossible.
So, there are two critical points: (0, 0), (2, 0). Here is a table that classifies the critical
points.
critical point | $f_{xx} f_{yy} - f_{xy}^2$          | $f_{xx}$   | type
(0, 0)         | $(-6) \times (-8) - 0^2 > 0$        | $-6 < 0$   | local max
(2, 0)         | $6 \times (-4) - 0^2 < 0$           |            | saddle point
Example 2.3.19
Example 2.3.20
A manufacturer wishes to make an open rectangular box of given volume V using the least
possible material. Find the design specifications.
Solution. Denote by x, y and z, the length, width and height, respectively, of the box.
[Figure: an open rectangular box with length $x$, width $y$ and height $z$.]
The box has two sides of area xz, two sides of area yz and a bottom of area xy. So the total
surface area of material used is
S = 2xz + 2yz + xy
However the three dimensions x, y and z are not independent. The requirement that the
box have volume V imposes the constraint
xyz = V
We can use this constraint to eliminate one variable. Since z is at the end of the alphabet
(poor z), we eliminate $z$ by substituting $z = \frac{V}{xy}$. Note that if $x$ (or $y$) is equal to zero then
the volume of the box would equal zero. What is the point of a box with zero volume?!
So if we assume the box has non-zero volume then $x \ne 0$ and $y \ne 0$. So we have to find the
values of $x$ and $y$ that minimize the function
\[ S(x, y) = \frac{2V}{y} + \frac{2V}{x} + xy \]
Let’s start by finding the critical points of $S$. We have
\[ S_x(x, y) = -\frac{2V}{x^2} + y \]
\[ S_y(x, y) = -\frac{2V}{y^2} + x \]
Note that the partial derivatives are not defined when $x = 0$ or $y = 0$, but we have already
eliminated the case where $x$ or $y$ is equal to zero. So $(x, y)$ is a critical point if and only if
\[ x^2 y = 2V \tag{E1} \]
\[ x y^2 = 2V \tag{E2} \]
Solving (E1) for $y$ gives $y = \frac{2V}{x^2}$. Substituting this into (E2) gives
\[ x\,\frac{4V^2}{x^4} = 2V \implies x^3 = 2V \implies x = \sqrt[3]{2V} \quad\text{and}\quad
   y = \frac{2V}{(2V)^{2/3}} = \sqrt[3]{2V} \]
As there is only one critical point, we would expect it to give the minimum15 . But let’s use
the second derivative test to verify that at least the critical point is a local minimum. The
15 Indeed one can use the facts that $0 < x < \infty$, that $0 < y < \infty$, and that $S \to \infty$ as $x \to 0$ and as $y \to 0$
and as $x \to \infty$ and as $y \to \infty$ to prove that the single critical point gives the global minimum.
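To make the critical point concrete, here is a small numerical sketch for one hypothetical volume, $V = 4$ (the value is ours, chosen so the answer comes out in whole numbers). It computes the dimensions at the critical point $x = y = \sqrt[3]{2V}$, recovers $z = \frac{V}{xy}$, and compares the surface area with a few nearby designs.

```python
def surface_area(x, y, volume):
    """S(x, y) = 2V/y + 2V/x + xy, the open box with z = V/(xy) eliminated."""
    return 2 * volume / y + 2 * volume / x + x * y

def optimal_box(volume):
    """Dimensions (x, y, z) at the single critical point x = y = (2V)^(1/3)."""
    side = (2 * volume) ** (1 / 3)
    return side, side, volume / (side * side)

V = 4.0                      # hypothetical volume
x, y, z = optimal_box(V)
s_min = surface_area(x, y, V)

# Surface areas of a few nearby designs with the same volume
nearby = [surface_area(x + dx, y + dy, V)
          for dx in (-0.1, 0.0, 0.1) for dy in (-0.1, 0.0, 0.1)]
print((x, y, z), s_min, min(nearby))
```

For $V = 4$ this gives a box with a $2 \times 2$ base and height 1, so the height is half the side of the base, and none of the nearby designs uses less material.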
PARTIAL DERIVATIVES 2.4 ABSOLUTE MINIMA AND MAXIMA
[Figure: the graph of $y = f(x) = x$ for $0 \le x \le 1$.]
So to find the maximum and minimum of the function f ( x ) on the interval [0, 1], you:
16 Recall that “extremal value” means “either maximum value or minimum value”.
Find the maximum and minimum values of $f(x, y) = x^3 + xy^2 - 3x^2 - 4y^2 + 4$ on the disk
$x^2 + y^2 \le 1$.
Solution. Again, we first find all critical points, and then we analyze the boundary.
Interior: If $f$ takes its maximum or minimum value at a point in the interior, $x^2 + y^2 < 1$,
then that point must be a critical point of $f$. To find the critical points we compute the
first order derivatives.
\[ f_x = 3x^2 + y^2 - 6x \qquad f_y = 2xy - 8y \]
These are polynomials (in two variables) and they are defined everywhere. So the critical
points are the solutions of
\[ f_x = 3x^2 + y^2 - 6x = 0 \tag{E1} \]
\[ f_y = 2y(x - 4) = 0 \tag{E2} \]
The second equation, $2y(x - 4) = 0$, is satisfied if and only if at least one of the two
equations $y = 0$ and $x = 4$ is satisfied.
• When $y = 0$, equation (E1) forces $x$ to obey
\[ 0 = 3x^2 + 0^2 - 6x = 3x(x - 2) \]
so that $x = 0$ or $x = 2$.
• When $x = 4$, equation (E1) forces $y$ to obey
\[ 0 = 3 \times 4^2 + y^2 - 6 \times 4 = 24 + y^2 \]
which is impossible.
So, there are only two critical points: (0, 0), (2, 0).
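Boundary: Our boundary is $x^2 + y^2 = 1$. We know that $(x, y)$ satisfies $x^2 + y^2 = 1$, and
hence $y^2 = 1 - x^2$. Examining the formula for $f(x, y)$, we see that it contains only even20
powers of $y$, so we can eliminate $y$ by substituting $y^2 = 1 - x^2$ into the formula.
\[ f = x^3 + x(1 - x^2) - 3x^2 - 4(1 - x^2) + 4 = x + x^2 \]
On the boundary $-1 \le x \le 1$, so the max and min of $x + x^2$ must occur either
• when $x = -1$ ($\Rightarrow y = 0$, $f = 0$) or
• when $x = +1$ ($\Rightarrow y = 0$, $f = 2$) or
• when $0 = \frac{d}{dx}(x + x^2) = 1 + 2x$ (so $x = -\frac12$, $y = \pm\sqrt{\frac34}$, $f = -\frac14$).
[Figure: the unit disk, with the boundary points $\big(-\frac12, \frac{\sqrt3}{2}\big)$ and $\big(-\frac12, -\frac{\sqrt3}{2}\big)$ marked.]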
20 If it contained odd powers too, we could consider the cases $y \ge 0$ and $y \le 0$ separately and substitute
$y = \sqrt{1 - x^2}$ in the former case and $y = -\sqrt{1 - x^2}$ in the latter case.
Note that the point (2, 0) is outside the allowed region21 . So all together, we have the
following candidates for max and min, with the max and min indicated.
point      | (0, 0) | (−1, 0) | (1, 0) | $\big(-\frac12, \pm\frac{\sqrt3}{2}\big)$
value of f | 4      | 0       | 2      | $-\frac14$
           | max    |         |        | min
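The candidate values are easy to double-check numerically. The sketch below evaluates $f$ at each candidate and then, as an independent cross-check, samples the whole disk on a polar grid. The grid resolution and the names `candidates` and `grid` are our own illustrative choices.

```python
import math

def f(x, y):
    return x**3 + x * y**2 - 3 * x**2 - 4 * y**2 + 4

# Candidates: the interior critical point and the three kinds of boundary point
candidates = {
    (0.0, 0.0): f(0.0, 0.0),
    (-1.0, 0.0): f(-1.0, 0.0),
    (1.0, 0.0): f(1.0, 0.0),
    (-0.5, math.sqrt(3) / 2): f(-0.5, math.sqrt(3) / 2),
    (-0.5, -math.sqrt(3) / 2): f(-0.5, -math.sqrt(3) / 2),
}

# Brute-force cross-check on a polar grid covering the disk x^2 + y^2 <= 1
grid = [
    f(r / 100 * math.cos(math.radians(d)), r / 100 * math.sin(math.radians(d)))
    for r in range(101)
    for d in range(360)
]
print(candidates)
print(max(grid), min(grid))
```

Both approaches agree: the maximum is 4, at (0, 0), and the minimum is $-\frac14$, at the two boundary points with $x = -\frac12$.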
Example 2.4.1
Example 2.4.2
Find the maximum and minimum values of $f(x, y) = xy - x^3 y^2$ when $(x, y)$ runs over the
square $0 \le x \le 1$, $0 \le y \le 1$.
Solution. As usual, let’s examine the critical points and boundary in turn.
Interior: If $f$ takes its maximum or minimum value at a point in the interior, $0 < x < 1$,
$0 < y < 1$, then that point must be a critical point of $f$. To find the critical points we
compute the first order derivatives.
\[ f_x(x, y) = y - 3x^2 y^2 \qquad f_y(x, y) = x - 2x^3 y \]
Again, these functions are polynomials in two variables and they are smooth everywhere
in their domain, so the gradient exists everywhere in the interior. This means that the
critical points are the solutions of
\[ f_x = 0 \iff y(1 - 3x^2 y) = 0 \iff y = 0 \text{ or } 3x^2 y = 1 \]
\[ f_y = 0 \iff x(1 - 2x^2 y) = 0 \iff x = 0 \text{ or } 2x^2 y = 1 \]
• If $y = 0$, we cannot have $2x^2 y = 1$, so we must have $x = 0$.
• If $3x^2 y = 1$, we cannot have $x = 0$, so we must have $2x^2 y = 1$. Dividing gives
$1 = \frac{3x^2 y}{2x^2 y} = \frac32$ which is impossible.
So the only critical point in the square is (0, 0). There $f = 0$.
Boundary: The region is a square, so its boundary consists of its four sides.
• First, we look at the part of the boundary with x = 0. On that entire side f = 0.
• Next, we look at the part of the boundary with y = 0. On that entire side f = 0.
• Next, we look at the part of the boundary with $y = 1$. There $f = f(x, 1) = x - x^3$. To
find the maximum and minimum of $f(x, y)$ on the part of the boundary with $y = 1$,
we must find the maximum and minimum of $x - x^3$ when $0 \le x \le 1$.
Recall that, in general, the maximum and minimum of a function $h(x)$ on the interval
$a \le x \le b$, must occur either at $x = a$ or at $x = b$ or at an $x$ for which either $h'(x) = 0$
or $h'(x)$ does not exist. In this case, $\frac{d}{dx}(x - x^3) = 1 - 3x^2$, so the max and min of
$x - x^3$ for $0 \le x \le 1$ must occur
21 We found (2, 0) as a solution to the critical point equations (E1), (E2). That’s because, in the course of
solving those equations, we ignored the constraint that $x^2 + y^2 \le 1$.
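– either at $x = 0$, where $f = 0$,
– or at $x = \frac{1}{\sqrt3}$, where $f = \frac{2}{3\sqrt3}$,
– or at $x = 1$, where $f = 0$.
• Finally, we look at the part of the boundary with $x = 1$. There $f = f(1, y) = y - y^2$. As
$\frac{d}{dy}(y - y^2) = 1 - 2y$, the max and min of $y - y^2$ for $0 \le y \le 1$ must occur
– either at $y = 0$, where $f = 0$,
– or at $y = \frac12$, where $f = \frac14$,
– or at $y = 1$, where $f = 0$.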
All together, we have the following candidates for max and min, with the max and min
indicated.
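[Figure: the square $0 \le x \le 1$, $0 \le y \le 1$, with the candidate points (0, 0), (1, 0), (0, 1),
$\big(\frac{1}{\sqrt3}, 1\big)$, (1, 1) and $\big(1, \frac12\big)$ marked.]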
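A short computation can pick out the extremes among the candidates marked on the square. This is a sketch only; the dictionary labels, the grid step and the variable names are ours.

```python
import math

def f(x, y):
    return x * y - x**3 * y**2

candidates = {
    "(0, 0)": f(0, 0),
    "(1, 0)": f(1, 0),
    "(0, 1)": f(0, 1),
    "(1, 1)": f(1, 1),
    "(1/sqrt(3), 1)": f(1 / math.sqrt(3), 1),
    "(1, 1/2)": f(1, 0.5),
}
largest = max(candidates, key=candidates.get)
smallest_value = min(candidates.values())

# Cross-check: no point of a 201 x 201 grid on the square beats the largest candidate
steps = 200
grid_max = max(f(i / steps, j / steps)
               for i in range(steps + 1) for j in range(steps + 1))
print(largest, candidates[largest], smallest_value, grid_max)
```

The largest candidate value is $\frac{2}{3\sqrt3} \approx 0.385$, at $\big(\frac{1}{\sqrt3}, 1\big)$, and the smallest is 0, which is taken along the sides $x = 0$ and $y = 0$ and at the corner (1, 1).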
Example 2.4.2
Example 2.4.3
Find the high and low points of the surface $z = \sqrt{x^2 + y^2}$ with $(x, y)$ varying over the
• the minimum of f ( x, y) is achieved at the point in the square that is nearest the
origin — namely the origin itself. So (0, 0, 0) is the lowest point on the surface and
is at height 0.
Even though we have already answered this question, it will be instructive to see what
we would have found if we had followed our usual protocol. The partial derivatives of
$f(x, y) = \sqrt{x^2 + y^2}$ are defined for $(x, y) \ne (0, 0)$ and are
\[ f_x(x, y) = \frac{x}{\sqrt{x^2 + y^2}} \qquad f_y(x, y) = \frac{y}{\sqrt{x^2 + y^2}} \]
• As we mentioned above, at the point $(x, y) = (0, 0)$ the gradient is not defined. But
(0, 0) is in the interior of the domain of our function. Therefore, (0, 0) is a critical
point.
• The boundary of the square consists of its four sides. One side is
\[ \big\{ (x, y) \;\big|\; x = 1,\ -1 \le y \le 1 \big\} \]
Example 2.4.3
$T_x$ and $T_y$ exist everywhere in their domain, so the gradient is defined at every point in
the interior of the region. Moving on, because the exponential $e^{-x^2 - y^2}$ is never zero, the
critical points are the solutions of
\[ T_x = 0 \iff 2x(x + y) = 1 \]
\[ T_y = 0 \iff 2y(x + y) = 1 \]
• As both $2x(x + y)$ and $2y(x + y)$ are nonzero, we may divide the two equations,
which gives $\frac{x}{y} = 1$, forcing $x = y$.
As all $t$’s are allowed, this function takes its max and min at zeroes of
\[ \frac{dT}{dt} = \big( -\sin t + \cos t \big) e^{-1} \]
[Figure: the unit circle, with the point $(\cos t, \sin t)$ at angle $t$.]
That is, $(\cos t + \sin t)e^{-1}$ takes its max and min
• when $\sin t = \cos t$,
All together, we have the following candidates for max and min, with the max and min
indicated.
The following sketch shows all of the critical points. It is a good idea to make such a sketch
so that you don’t accidentally include a critical point that is outside of the allowed region.
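[Figure: the unit disk, with the interior critical points $\big(\frac12, \frac12\big)$ and $\big(-\frac12, -\frac12\big)$ and the
boundary points $\big(\frac{1}{\sqrt2}, \frac{1}{\sqrt2}\big)$ and $\big(-\frac{1}{\sqrt2}, -\frac{1}{\sqrt2}\big)$ marked.]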
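The four points in the sketch can be compared numerically. The formula for $T$ is not restated in this part of the text, so the code below assumes $T(x, y) = (x + y)e^{-x^2 - y^2}$, which is the form suggested by the critical point equations $2x(x + y) = 1$, $2y(x + y) = 1$ and by the boundary expression $(\cos t + \sin t)e^{-1}$. Treat that formula, and the names used, as assumptions.

```python
import math

def temperature(x, y):
    # Assumed form of T, inferred from the equations in the text
    return (x + y) * math.exp(-(x**2 + y**2))

half, root = 0.5, 1 / math.sqrt(2)
candidates = [(half, half), (-half, -half), (root, root), (-root, -root)]
values = {p: temperature(*p) for p in candidates}

hottest = max(values, key=values.get)
coldest = min(values, key=values.get)
print(hottest, values[hottest])
print(coldest, values[coldest])
```

Under that assumption the interior points give $\pm\frac{1}{\sqrt e} \approx \pm 0.607$ and the boundary points give $\pm\frac{\sqrt2}{e} \approx \pm 0.520$, so on the unit disk the extremes are at the interior critical points.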
Example 2.4.4
In the last example, we analyzed the behaviour of f on the boundary of the region
of interest by using the parametrization x = cos t, y = sin t of the circle x2 + y2 = 1.
Sometimes using this parametrization is not so clean. And worse, some curves don’t have
such a simple parametrization. For our purposes, we’ll only use parametrization on circles
and ellipses.
Example 2.4.5
• First, we find the critical points. $T_x$ and $T_y$ are defined at all points in the interior
and therefore the critical points are the solutions of
\[ T_x = 0 \iff 2x(x + y) = 1 \]
\[ T_y = 0 \iff 2y(x + y) = 1 \]
PARTIAL DERIVATIVES 2.5 LAGRANGE MULTIPLIERS
All together, we have the following candidates for max and min.
\[ \min = -\frac{1}{\sqrt e} \qquad \max = \frac{1}{\sqrt e} \]
Example 2.4.5
Definition2.5.1.
Such problems are quite common. As we said above, we have already encountered
them in the last section on absolute maxima and minima, when we were looking for the
extreme values of a function on the boundary of a region. In economics “utility functions”
are used to model the relative “usefulness” or “desirability” or “preference” of various
economic choices. For example, a utility function U (w, κ ) might specify the relative level
of satisfaction a consumer would get from purchasing a quantity w of wine and κ of coffee.
If the consumer wants to spend $100 and wine costs $20 per unit and coffee costs $5 per
unit, then the consumer would like to maximize U (w, κ ) subject to the constraint that
20w + 5κ = 100.
To this point we have always solved such constrained optimization problems either by
However, quite often the function g( x, y) is so complicated that one cannot explicitly solve
g( x, y) = 0 for y as a function of x or for x as a function of y and one also cannot explicitly
parametrize g( x, y) = 0. Or sometimes you can, for example, solve g( x, y) = 0 for y as
a function of x, but the resulting solution is so complicated that it is really hard, or even
virtually impossible, to work with. Direct attacks become even harder in higher dimen-
sions when, for example, we wish to optimize a function f ( x, y, z) subject to a constraint
g( x, y, z) = 0.
There is another procedure called the method of “Lagrange22 multipliers” that comes
to our rescue in these scenarios. Here is the two-dimensional version of the method. There
are obvious analogues in other dimensions.
22 Joseph-Louis Lagrange was actually born Giuseppe Lodovico Lagrangia in Turin, Italy in 1736. He
moved to Berlin in 1766 and then to Paris in 1786. He eventually acquired French citizenship and then
the French claimed he was a French mathematician, while the Italians continued to claim that he was
an Italian mathematician.
∇ f ( a, b) = λ∇ g( a, b)
that is
f x ( a, b) = λ gx ( a, b)
f y ( a, b) = λ gy ( a, b)
f x ( x, y) = λ gx ( x, y)
f y ( x, y) = λ gy ( x, y)
g( x, y) = 0
Note that there are three equations and three unknowns, namely x, y, and λ.
2. Then you evaluate $f(x, y)$ at each $(x, y)$ on the list of candidates. The biggest of these
candidate values is the absolute maximum, if an absolute maximum exists. The
smallest of these candidate values is the absolute minimum, if an absolute minimum
exists.
Theorem 2.5.2 can be extended to functions of more variables in a natural way. Using
higher-dimensional Lagrange isn’t in our learning goals, but for interest, we want you to
see how easily the method generalizes. The calculus is the same – it’s only the algebra that
gets longer.
23 Note that this implies the gradients of these functions are defined in this region
∇ f ( a, b, c) = λ∇ g( a, b, c)
that is
f x ( a, b, c) = λ gx ( a, b, c)
f y ( a, b, c) = λ gy ( a, b, c)
f z ( a, b, c) = λ gz ( a, b, c)
Find the maximum and minimum of the function $x^2 - 10x - y^2$ on the ellipse whose equation
is $x^2 + 4y^2 = 16$.
Solution. For this first example, we’ll do out the algebra in truly gory detail. Once you
get the hang of it, it’ll go much faster.
Our objective function (the one we want to maximize and/or minimize) is $f(x, y) =
x^2 - 10x - y^2$ and the constraint function is $g(x, y) = x^2 + 4y^2 - 16$. To apply the method
of Lagrange multipliers we need $\nabla f$ and $\nabla g$. So we start by computing the first-order
derivatives of these functions.
\[ f_x = 2x - 10 \qquad f_y = -2y \qquad g_x = 2x \qquad g_y = 8y \]
So, according to the method of Lagrange multipliers, we need to find all solutions to the
following system of equations.
\[ 2x - 10 = \lambda (2x) \tag{E1} \]
\[ -2y = \lambda (8y) \tag{E2} \]
\[ x^2 + 4y^2 - 16 = 0 \tag{E3} \]
(E1) In equation (E1), if $2x$ is nonzero, then we can divide both sides of the equation by it,
to find $\lambda = \frac{2x - 10}{2x}$, i.e. $\lambda = \frac{x - 5}{x}$. If $2x = 0$, then the equation becomes $-10 = 0\lambda$,
which is not true for any $\lambda$.
(E2) In equation (E2), if $8y$ is nonzero, then we can divide both sides of the equation by it,
to find $\lambda = \frac{-2y}{8y}$, i.e. $\lambda = -\frac14$. If $8y = 0$, then we also get a solution $y = 0$ for any
$\lambda$.
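(E1)+(E2) We need all three equations to be true at the same time (that is, for the same
values of $x$, $y$, and $\lambda$). We’ve found two ways for both (E1) and (E2) to be true:
either $\lambda = \frac{x - 5}{x}$ and $\lambda = -\frac14$, or $\lambda = \frac{x - 5}{x}$ and $y = 0$.
(E3) Now we’ll see which points make (E1) and (E2) true while also making (E3) true.
• If $\lambda = \frac{x - 5}{x}$ and $\lambda = -\frac14$, then
\[ \frac{x - 5}{x} = -\frac14 \implies -4x + 20 = x \implies x = 4 \]
and (E3) becomes
\[ 0 = 4^2 + 4y^2 - 16 \implies 0 = y \]
• If $y = 0$, then (E3) becomes
\[ 0 = x^2 + 4 \cdot 0^2 - 16 \implies 16 = x^2 \implies x = \pm 4 \]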
Now we’ve found the only possible solutions to all three equations: (˘4, 0). (λ has
to exist, but we don’t actually care what it is.) So the method of Lagrange multipliers,
Theorem 2.5.2, gives that the only possible locations of the maximum and minimum of
the function f are (4, 0) and (´4, 0). To complete the problem, we only have to compute f
at those points.
Hence the maximum value of x2 ´ 10x ´ y2 on the ellipse is 56 and the minimum value is
´24.
[Figure: the ellipse $x^2 + 4y^2 = 16$ in the $xy$-plane, with the points (4, 0) and (−4, 0) marked.]
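One way to cross-check a Lagrange multiplier answer, when the constraint happens to be easy to parametrize, is simply to walk around the constraint curve. The sketch below uses the parametrization $x = 4\cos t$, $y = 2\sin t$ of the ellipse (our choice, not part of the solution above) and samples $f$ at one-degree steps.

```python
import math

def f(x, y):
    return x**2 - 10 * x - y**2

# Points on x^2 + 4y^2 = 16, parametrized as (4 cos t, 2 sin t)
samples = [
    f(4 * math.cos(math.radians(d)), 2 * math.sin(math.radians(d)))
    for d in range(360)
]
print(max(samples), min(samples))
```

The sampled maximum and minimum are 56 and −24, taken at $t = 180^\circ$ and $t = 0^\circ$, that is, at (−4, 0) and (4, 0), in agreement with the values stated in the example.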
Example 2.5.4
In the previous example, we had to make a lot of decisions about how to solve for the
solutions to the system of three equations. Actually, we can start our Lagrange system-
solving the same way every time. The first observation we make is that the partial deriva-
tives of g can be 0, or nonzero. If they’re zero, this may or may not lead to a solution; if
they’re nonzero, this tells us something about λ.
In the textbook and problem book, we will consistently use the same method to solve
the system of equations. It’s certainly not the only way, and you are free to use other
methods. Once you get used to the computations, you’ll probably start finding ways to
make them faster based on the specifics of individual problems.
Example 2.5.5 (Solving Lagrange in General)
Suppose you want to find all points ( x, y) for which a solution exists to the system below.
f x = λgx (E1)
f y = λgy (E2)
g( x, y) = 0 (E3)
where λ is some real constant. Our method below will hinge on the observation from the
last example that we get different solutions for zero vs. nonzero partial derivatives of the
constraint.
• If $g_x \ne 0$ and $g_y \ne 0$, then from (E1) we see $\lambda = \frac{f_x}{g_x}$, and from (E2) we see $\lambda = \frac{f_y}{g_y}$. So,
choosing a pair $(x, y)$ such that
\[ \frac{f_x}{g_x} = \frac{f_y}{g_y} \]
means that for some $\lambda$, that pair makes (E1) and (E2) true. Simplify the equation
above to find the necessary relationship between $x$ and $y$, then find which pairs with
that relationship make (E3) true.
• If $g_x = 0$, then from (E1) we see also $f_x = 0$. Then (E1) is true for any $\lambda$ that we like.
We can check that there exists some $\lambda$ that makes (E2) true as well. Then, we find
the points $(x, y)$ that make (E3) true as well as $g_x = f_x = 0$.
• If $g_y = 0$, then from (E2) we see also $f_y = 0$. Then (E2) is true for any $\lambda$ that we like.
We can check that there exists some $\lambda$ that makes (E1) true as well. Then, we find
the points $(x, y)$ that make (E3) true as well as $g_y = f_y = 0$.
Sometimes, one or more of these cases won’t lead to any solutions. In Example 2.5.4,
we were immediately able to discard the possibility gx = 0, because it didn’t lead to
a solution. Once you’re practiced with these types of problems, you’ll often see quite
quickly which cases you get to discard.
Example 2.5.5
Example 2.5.6
\[ g(x, y) = x^2 - 2x + y^2 - 4y - 20 = 0 \]
We start by setting up the first two equations from the method of Lagrange multipliers.
\[ f_x = \lambda g_x: \qquad \frac{2x - 2}{x^2 - 2x + 5} = \lambda (2x - 2) \tag{E1} \]
\[ f_y = \lambda g_y: \qquad \frac{2y - 4}{y^2 - 4y + 13} = \lambda (2y - 4) \tag{E2} \]
\[ g(x, y) = 0: \qquad x^2 - 2x + y^2 - 4y = 20 \tag{E3} \]
• $g_x \ne 0$ and $g_y \ne 0$. From (E1), this means $\lambda = \frac{1}{x^2 - 2x + 5}$. From (E2), $\lambda = \frac{1}{y^2 - 4y + 13}$.
\[ \frac{1}{x^2 - 2x + 5} = \frac{1}{y^2 - 4y + 13} \]
\[ x^2 - 2x + 5 = y^2 - 4y + 13 \]
\[ x^2 - 2x = y^2 - 4y + 8 \]
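This gives us the relationship between $x$ and $y$ that must hold for (E1) and (E2) to be
true under the assumption $g_x \ne 0$ and $g_y \ne 0$. Now, in order for (E3) to be true as
well:
\begin{align*}
0 &= (x^2 - 2x) + y^2 - 4y - 20 \\
  &= (y^2 - 4y + 8) + y^2 - 4y - 20 \\
  &= 2y^2 - 8y - 12 \\
0 &= y^2 - 4y - 6 \\
y &= \frac{4 \pm \sqrt{16 - 4(1)(-6)}}{2} = \frac{4 \pm \sqrt{40}}{2} = 2 \pm \sqrt{10}
\end{align*}
So,
\begin{align*}
0 &= (x^2 - 2x) + y^2 - 4y - 20 \\
  &= x^2 - 2x + \big(2 \pm \sqrt{10}\big)^2 - 4\big(2 \pm \sqrt{10}\big) - 20 \\
  &= x^2 - 2x + 4 \pm 4\sqrt{10} + 10 - 8 \mp 4\sqrt{10} - 20
     && \text{(note } \pm 4\sqrt{10} \mp 4\sqrt{10} = 0\text{)} \\
  &= x^2 - 2x + 4 + 10 - 8 - 20 \\
  &= x^2 - 2x - 14 \\
x &= \frac{2 \pm \sqrt{4 - 4(-14)}}{2} = \frac{2 \pm 2\sqrt{15}}{2} = 1 \pm \sqrt{15}
\end{align*}
This gives us four points to consider:
$\big(1 + \sqrt{15}, 2 + \sqrt{10}\big)$, $\big(1 - \sqrt{15}, 2 + \sqrt{10}\big)$, $\big(1 + \sqrt{15}, 2 - \sqrt{10}\big)$, and $\big(1 - \sqrt{15}, 2 - \sqrt{10}\big)$.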
• If $g_x = 0$, then $x = 1$, and (E1) is true for any $\lambda$. Then we can choose whatever $\lambda$ is
necessary to make (E2) true. By (E3):
\begin{align*}
0 &= x^2 - 2x + y^2 - 4y - 20 \\
  &= 1 - 2 + y^2 - 4y - 20 \\
  &= y^2 - 4y - 21 \\
  &= (y - 7)(y + 3) \\
y &= 7, \quad y = -3
\end{align*}
This gives us two points to consider: (1, 7) and (1, −3).
• If $g_y = 0$, then $y = 2$, and (E2) is true for any $\lambda$. Then we can choose whatever $\lambda$ is
necessary to make (E1) true. By (E3):
\begin{align*}
0 &= x^2 - 2x + y^2 - 4y - 20 \\
  &= x^2 - 2x + 4 - 8 - 20 \\
  &= x^2 - 2x - 24 \\
  &= (x - 6)(x + 4) \\
x &= 6, \quad x = -4
\end{align*}
This gives us two points to consider: (−4, 2) and (6, 2).
So, all together we have eight points that satisfy our three Lagrange equations. It’s left
only to decide which of those points lead to maxima and to minima.
point      | $(1 + \sqrt{15}, 2 + \sqrt{10})$ | $(1 - \sqrt{15}, 2 + \sqrt{10})$ | $(1 + \sqrt{15}, 2 - \sqrt{10})$ | $(1 - \sqrt{15}, 2 - \sqrt{10})$
value of f | $\ln 361$                        | $\ln 361$                        | $\ln 361$                        | $\ln 361$
           | max                              | max                              | max                              | max
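All eight candidate points can be evaluated in a few lines. The objective function is not restated in this part of the text, so the sketch assumes $f(x, y) = \ln(x^2 - 2x + 5) + \ln(y^2 - 4y + 13)$, the form (up to an additive constant) whose partial derivatives match the left sides of (E1) and (E2). That formula and the variable names are assumptions.

```python
import math

def f(x, y):
    # Assumed objective, reconstructed from f_x and f_y in (E1) and (E2)
    return math.log(x**2 - 2 * x + 5) + math.log(y**2 - 4 * y + 13)

def g(x, y):
    """Constraint function; g = 0 on the allowed curve."""
    return x**2 - 2 * x + y**2 - 4 * y - 20

r15, r10 = math.sqrt(15), math.sqrt(10)
points = [(1 + sx * r15, 2 + sy * r10) for sx in (1, -1) for sy in (1, -1)]
points += [(1, 7), (1, -3), (-4, 2), (6, 2)]

values = {p: f(*p) for p in points}
for p, v in values.items():
    print(p, round(v, 4), "constraint residual:", round(g(*p), 12))
```

Under that assumption the first four points all give $\ln 361$, the points (1, 7) and (1, −3) give $\ln 136$, and the points (−4, 2) and (6, 2) give $\ln 261$, so the largest value is $\ln 361$ and the smallest is $\ln 136$.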
Example 2.5.7
Find the ends of the major and minor axes of the ellipse 3x2 ´ 2xy + 3y2 = 4. They are the
points on the ellipse that are farthest from and nearest to the origin.
Solution. Let $(x, y)$ be a point on $3x^2 - 2xy + 3y^2 = 4$. This point is at the end of a major
axis when it maximizes its distance from the centre of the ellipse, (0, 0). It is at the end
of a minor axis when it minimizes its distance from (0, 0). So we wish to maximize and
minimize the distance $\sqrt{x^2 + y^2}$ subject to the constraint
\[ g(x, y) = 3x^2 - 2xy + 3y^2 - 4 = 0 \]
Equivalently (see footnote 24), we may maximize and minimize the square of the distance
\[ f(x, y) = x^2 + y^2 \]
which we will do, because it makes the derivatives cleaner. Again, we use Lagrange
multipliers to solve this problem, so we start by finding the partial derivatives.
\[ f_x(x, y) = 2x \qquad f_y(x, y) = 2y \qquad g_x(x, y) = 6x - 2y \qquad g_y(x, y) = -2x + 6y \]
24 The function $S(z) = z^2$ is a strictly increasing function for $z \ge 0$. So, for $a, b \ge 0$, the statement “$a < b$” is equivalent to the statement “$S(a) < S(b)$”.
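• If $g_x \ne 0$ and $g_y \ne 0$, then $\lambda = \frac{2x}{6x - 2y} = \frac{x}{3x - y}$ by (E1), and $\lambda = \frac{2y}{-2x + 6y} = \frac{y}{-x + 3y}$ by (E2).
\begin{align*}
\frac{x}{3x - y} &= \frac{y}{-x + 3y} \\
-x^2 + 3xy &= 3xy - y^2 \\
x^2 &= y^2 \\
x &= \pm y
\end{align*}
So if x = ±y, then the appropriate λ will make both (E1) and (E2) true. Now let’s see what makes (E3) true. Substituting x = y into $3x^2 - 2xy + 3y^2 = 4$ gives $4x^2 = 4$, so x = ±1; substituting x = −y gives $8x^2 = 4$, so $x = \pm\frac{1}{\sqrt{2}}$.
This gives us four points to check: the two points $\pm\left(\frac{1}{\sqrt{2}}, -\frac{1}{\sqrt{2}}\right)$ and the two points ±(1, 1).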
The distance from (0, 0) to ±(1, 1), namely $\sqrt{2}$, is larger than the distance from (0, 0) to $\pm\left(\frac{1}{\sqrt{2}}, -\frac{1}{\sqrt{2}}\right)$, namely 1. So the ends of the minor axes are $\pm\left(\frac{1}{\sqrt{2}}, -\frac{1}{\sqrt{2}}\right)$ and the ends of the major axes are ±(1, 1). Those ends are sketched in the figure on the left below. Once we have the ends, it is an easy matter (footnote 25) to sketch the ellipse as in the figure on the right below.
25 if you tilt your head so that the line through (1, 1) and (´1, ´1) appears horizontal
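[Figure: on the left, the axis endpoints $(1,1)$, $(-1,-1)$, $(-1,1)/\sqrt{2}$ and $(1,-1)/\sqrt{2}$ plotted in the xy-plane; on the right, the ellipse $3x^2 - 2xy + 3y^2 = 4$ drawn through the same four points.]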
Example 2.5.7
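The result of Example 2.5.7 can be cross-checked numerically without Lagrange multipliers. The sketch below is only an illustration: it assumes the polar substitution x = r cos θ, y = r sin θ, under which the ellipse equation becomes r²(3 − sin 2θ) = 4, and then scans directions θ on a grid. The helper name `radius` and the grid size are arbitrary choices.

```python
import math

def radius(theta):
    """Distance from the origin to the ellipse 3x^2 - 2xy + 3y^2 = 4 in direction theta.

    Substituting x = r cos(theta), y = r sin(theta) gives r^2 (3 - sin(2 theta)) = 4.
    """
    return math.sqrt(4.0 / (3.0 - math.sin(2.0 * theta)))

n = 3600  # number of sample directions (arbitrary)
thetas = [2.0 * math.pi * k / n for k in range(n)]

far = max(thetas, key=radius)    # direction of a major-axis end
near = min(thetas, key=radius)   # direction of a minor-axis end

far_point = (radius(far) * math.cos(far), radius(far) * math.sin(far))
near_point = (radius(near) * math.cos(near), radius(near) * math.sin(near))
print("farthest:", far_point, "distance", radius(far))
print("nearest: ", near_point, "distance", radius(near))
```

The scan should report a largest distance close to √2 in the direction of ±(1, 1) and a smallest distance close to 1, in agreement with the points found above.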
In the previous examples, the objective function and the constraint were specified ex-
plicitly. That will not always be the case. In the next example, we have to do a little
geometry to extract them.
Example 2.5.8
Find the rectangle of largest area (with sides parallel to the coordinate axes) that can be inscribed in the ellipse $x^2 + 2y^2 = 1$.
Solution. Since this question is so geometric, it is best to start by drawing a picture.
[Figure: the ellipse $x^2 + 2y^2 = 1$ with an inscribed rectangle whose upper right corner is labelled (x, y).]
Call the coordinates of the upper right corner of the rectangle (x, y), as in the figure above. Note that $x \ge 0$ and $y \ge 0$; and if x = 0 or y = 0, then the area of the rectangle is 0, which is certainly not a maximum. So the global maximum must occur at some point where x and y are both positive. This will also be a local maximum, so we should be able to find it using the method of Lagrange multipliers.
The four corners of the rectangle are $(\pm x, \pm y)$ so the rectangle has width 2x and height 2y and the objective function is f(x, y) = 4xy. The constraint function for this problem is $g(x, y) = x^2 + 2y^2 - 1$. Again, to use Lagrange multipliers we need the first order partial derivatives.
\[ f_x = 4y \qquad f_y = 4x \qquad g_x = 2x \qquad g_y = 4y \]
So, according to the method of Lagrange multipliers, we need to find all solutions to
4y = λ(2x ) (E1)
4x = λ(4y) (E2)
x2 + 2y2 ´ 1 = 0 (E3)
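• If $g_x \ne 0$ and $g_y \ne 0$, then $\lambda = \frac{4y}{2x} = \frac{2y}{x}$ from (E1) and $\lambda = \frac{4x}{4y} = \frac{x}{y}$ from (E2). So,
\begin{align*}
\frac{2y}{x} &= \frac{x}{y} \\
2y^2 &= x^2 \\
x &= (\pm\sqrt{2})\, y
\end{align*}
From (E3),
\begin{align*}
\left((\pm\sqrt{2})\, y\right)^2 + 2y^2 - 1 &= 0 \\
2y^2 + 2y^2 &= 1 \\
4y^2 &= 1 \\
y &= \pm\frac{1}{2} \\
x &= (\pm\sqrt{2})\, y = \pm\frac{1}{\sqrt{2}}
\end{align*}
So there are four points to consider: $\left(\pm\frac{1}{\sqrt{2}}, \pm\frac{1}{2}\right)$.
We now have four possible values of (x, y), namely $\left(\frac{1}{\sqrt{2}}, \frac{1}{2}\right)$, $\left(-\frac{1}{\sqrt{2}}, -\frac{1}{2}\right)$, $\left(\frac{1}{\sqrt{2}}, -\frac{1}{2}\right)$ and $\left(-\frac{1}{\sqrt{2}}, \frac{1}{2}\right)$. They are the four corners of a single rectangle. We said that we wanted (x, y) to be the upper right corner, i.e. the corner in the first quadrant. It is $\left(\frac{1}{\sqrt{2}}, \frac{1}{2}\right)$.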
How do we interpret the other three points we found? The global min of the function
4xy subject to the constraint x2 + 2y2 = 1 will occur at one of these points, but those
points aren’t in our model domain. When x and y have different signs, 4xy no longer
gives the area of a rectangle, since it’s negative. Over our model domain, we kind of
have “endpoints:” x = 0 and y = 0. Our maximum occurred somewhere between our
endpoints; our model minimum occurs at the endpoints.
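The interplay between the interior maximum and the "endpoints" in Example 2.5.8 can be seen numerically. The following sketch is an illustration under an assumed parametrization: the first-quadrant arc of the ellipse is written as x = cos t, y = sin t / √2 for 0 ≤ t ≤ π/2, which satisfies x² + 2y² = 1. The name `area` and the grid size are arbitrary.

```python
import math

def area(t):
    """Area 4xy of the rectangle whose upper right corner is (cos t, sin t / sqrt 2)."""
    x = math.cos(t)
    y = math.sin(t) / math.sqrt(2.0)
    return 4.0 * x * y

n = 1000  # grid resolution on 0 <= t <= pi/2 (arbitrary)
ts = [(math.pi / 2.0) * k / n for k in range(n + 1)]

best_t = max(ts, key=area)
best_x = math.cos(best_t)
best_y = math.sin(best_t) / math.sqrt(2.0)

print("best corner:", (best_x, best_y), "area:", area(best_t))
print("areas at the endpoints:", area(ts[0]), area(ts[-1]))
```

The largest area on the grid should occur near (1/√2, 1/2), and the area at both endpoints of the arc should be zero up to rounding, matching the discussion above.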
Example 2.5.8
1. If our constraint function is a closed curve (circle, ellipse, square, etc.) and our ob-
jective function is continuous over it, then there will certainly be an absolute max
and absolute min over the constraint; and these will certainly also be local extrema.
So when our constraint is a closed curve, and our objective function is continuous
over it, we are guaranteed that the absolute max and min exist, and are at points that
satisfy the Lagrange equations.
2. If our constraint function is not a closed curve (e.g. a line, a line segment, a function
like xy = 1, etc.) then the system is more complicated. Assume that the objective
function is continuous over the constraint curve. Since our constraint curve is one-
dimensional (like a line, but a line that has some orientation in space), we’re in a
similar position as we were in single-variable calculus: extrema can occur at end-
points, or at “critical points.” In our case, “critical points” translate to solutions to
the Lagrange equations; “endpoints” mean pretty much the same thing they always
have.
(a) If the constraint function is bounded, we must consider its endpoints as well
as solutions to the Lagrange system. There will be an absolute maximum and
minimum, and these will definitely occur at solutions to the Lagrange system
or at the endpoints of the constraint.
(b) If the constraint function is unbounded, there may or may not exist absolute
extrema. This is where you’ll most heavily rely on your intuition about function
shape and behaviour. Limits can be useful here.
Example 2.5.9
\[ U(w, \kappa) = 6w^{2/3}\kappa^{1/3} \]
subject to the constraint $4w + 2\kappa = 12$
Solution. The constraint $4w + 2\kappa = 12$ is simple enough that we can easily use it to express κ in terms of w, then substitute $\kappa = 6 - 2w$ into $U(w, \kappa)$, and then maximize $U(w, 6 - 2w) = 6w^{2/3}(6 - 2w)^{1/3}$ using the techniques of last semester.
However, for practice purposes, we’ll use Lagrange multipliers with the objective function $U(w, \kappa) = 6w^{2/3}\kappa^{1/3}$ and the constraint function $g(w, \kappa) = 4w + 2\kappa - 12$. The first order derivatives of these functions are
\[ U_w = 4w^{-1/3}\kappa^{1/3} \qquad U_\kappa = 2w^{2/3}\kappa^{-2/3} \qquad g_w = 4 \qquad g_\kappa = 2 \]
The boundary values (“endpoints”) w = 0 and κ = 0 give utility 0, which is obviously not
going to be the maximum utility. So it suffices to consider only local maxima. According
to the method of Lagrange multipliers, we need to find all solutions to
\begin{align*}
4w^{-1/3}\kappa^{1/3} &= 4\lambda \tag{E1} \\
2w^{2/3}\kappa^{-2/3} &= 2\lambda \tag{E2} \\
4w + 2\kappa - 12 &= 0 \tag{E3}
\end{align*}
Then we see $g_w \ne 0$ and $g_\kappa \ne 0$, so we only have one of our usual three cases.
• Equation (E1) gives $\lambda = w^{-1/3}\kappa^{1/3}$.
• Substituting this into (E2) gives $w^{2/3}\kappa^{-2/3} = \lambda = w^{-1/3}\kappa^{1/3}$ and hence w = κ.
• Then substituting w = κ into (E3) gives 6κ = 12.
So w = κ = 2 and the maximum utility is U(2, 2) = 12.
Note in this example we had a bounded (but not closed) curve. It has endpoints (0, 6)
and (3, 0). Since the maximum didn’t occur at the endpoints, then the global maximum
was also a local maximum, and so it showed up as a solution to the system of Lagrange
equations.
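The comparison between endpoints and the Lagrange solution in Example 2.5.9 can be sketched in a few lines of Python. This is an illustration only: the helper names `utility` and `on_constraint` are hypothetical, and the grid scan is a crude stand-in for the calculus argument, not a replacement for it.

```python
def utility(w, kappa):
    """U(w, kappa) = 6 w^(2/3) kappa^(1/3)."""
    return 6.0 * w ** (2.0 / 3.0) * kappa ** (1.0 / 3.0)

def on_constraint(w):
    """Return the point (w, kappa) on the line 4w + 2 kappa = 12."""
    return w, 6.0 - 2.0 * w

# The two endpoints of the constraint segment and the Lagrange solution.
candidates = {
    "endpoint w = 0": on_constraint(0.0),
    "endpoint kappa = 0": on_constraint(3.0),
    "Lagrange point": on_constraint(2.0),
}
values = {name: utility(w, k) for name, (w, k) in candidates.items()}
for name, value in values.items():
    print(f"{name:20s} U = {value:.4f}")

# A coarse scan of the whole segment 0 <= w <= 3 as a cross-check.
ws = [3.0 * k / 300 for k in range(301)]
best_w = max(ws, key=lambda w: utility(*on_constraint(w)))
print("grid maximizer: w =", best_w)
```

Both endpoints give utility 0, while the Lagrange point gives 12, and the scan should find no point on the segment that does better.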
Example 2.5.9
PARTIAL D ERIVATIVES 2.6 (O PTIONAL ) U TILITY AND D EMAND F UNCTIONS
$\frac{\partial^2 u}{\partial x^2} < 0$ and $\frac{\partial^2 u}{\partial y^2} < 0$ everywhere.
Example 2.6.1
Suppose
u( x, y) = x2 + 2y
gives the utility of x and y units of two goods, respectively, for a particular consumer.
For these goods, “more is better” (because $u_x > 0$ and $u_y > 0$ for all positive x and y) without diminishing returns. (Indeed, $\frac{\partial^2 u}{\partial x^2} > 0$, meaning each subsequent unit of the good associated with x brings more happiness than the last.)
Suppose the consumer has 10 dollars, and x and y cost 2 and 3 dollars per unit respec-
tively. Find the optimal consumption of x and y, and the corresponding maximum utility.
g( x, y) = 2x + 3y ´ 10 = 0
26 Utilities are meaningful in comparison to one another, but generally don’t have particular units. It’s
hard to say what exactly it means to have “one-and-a-half satisfaction,” but we can say that a utility of
1.5 is better than a utility of 1.25.
(Since “more is better,” there’s no incentive to spend less than our budget of ten dollars.)
We can solve this by substitution. From our constraint, we see $y = \frac{10 - 2x}{3}$. That turns our utility function into the following:
\[ u(x, y) = x^2 + 2y = x^2 + \frac{2}{3}(10 - 2x) = x^2 - \frac{4}{3}x + \frac{20}{3} \]
This is a parabola pointing up, so its maximum will be at an endpoint of our interval. Since x and y are quantities, we require $x \ge 0$ and $y \ge 0$.
\[ 0 \le y = \frac{10 - 2x}{3} \implies x \le 5 \]
Our model domain is $0 \le x \le 5$. The endpoint x = 5 corresponds to all $10 going to the first good (and y = 0). The endpoint x = 0 corresponds to all $10 going to the second good (with $y = \frac{10}{3}$).
\begin{align*}
u(5, 0) &= 5^2 + 2(0) = 25 \\
u\left(0, \tfrac{10}{3}\right) &= 0^2 + 2\left(\tfrac{10}{3}\right) = \tfrac{20}{3}
\end{align*}
Our utility is maximized when we spend all $10 on the first good, purchasing x = 5
and y = 0. That maximum utility is 25.
Example 2.6.1
Two functions that are often used to model utility are natural logarithms and the Cobb-
Douglas function:
\[ u(x, y) = \alpha \ln(x) + \beta \ln(y) \qquad\qquad u(x, y) = A x^{\alpha} y^{\beta} \]
(where A, α, and β are constants.) The reasons why these equations make good models go
beyond the scope of this class. However, you have all the tools required to solve problems
involving these two equations.
Example 2.6.2
Alejandro has recently found a true passion for baking. He likes making two types of
bread: ciabatta (c) and pita (p). Ciabatta costs 20 dollars per unit to make and pita 10
dollars per unit. Alejandro wants to spend 60 dollars on bread, and his utility function27
is as follows:
u(c, p) = ln(c) + 2 ln( p)
Find the optimal consumption for Alejandro and the corresponding maximum utility.
Solution. The utility function will be the objective function and the constraint will be the
27 We’re not averse to having negative utility values. Again, utility doesn’t have absolute units, but rather
is useful as a relative scale. Higher utility is better, whether the numbers are positive or not.
This particular utility function has the interesting feature that c = 0 or p = 0 will minimize utility. This is
actually a common property of utility functions. It avoids having an optimal solution where one good
is not consumed at all.
budget constraint. The budget constraint is 20c + 10p = 60. We can find the maximum
utility using substitution or the method of Lagrange multipliers.
Solution 1: substitution
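Since 20c + 10p = 60, we see p = 6 − 2c. Then our utility function is:
\[ u(c, 6 - 2c) = \ln(c) + 2\ln(6 - 2c) = \ln\left(c(6 - 2c)^2\right) \]
Because ln is an increasing function, it suffices to maximize $f(c) = c(6 - 2c)^2$. Its derivative $f'(c) = (6 - 2c)^2 - 4c(6 - 2c) = (6 - 2c)(6 - 6c)$ is zero at c = 1 and c = 3, where
f(1) = 16
f(3) = 0
\[ 0 \le p = 6 - 2c \implies c \le 3 \]
The endpoints of our interval are c = 0 (all pita) and c = 3 (all ciabatta). We’ve already found f(3) = 0.
f(0) = 0
The function $c(6 - 2c)^2$ has a maximum of 16 when c = 1, so the function $\ln\left(c(6 - 2c)^2\right)$ has a maximum of ln 16 when c = 1. Since c = 1 means p = 4, utility is maximized when Alejandro spends $20 on ciabatta, and $40 on pita.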
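The substitution solution can be checked with a simple grid search along the budget line. The sketch below is illustrative: `utility` is a hypothetical helper that evaluates ln(c) + 2 ln(p) with p = 6 − 2c already substituted, and the grid avoids c = 0 and c = 3, where the logarithm is undefined.

```python
import math

def utility(c):
    """u(c, p) = ln(c) + 2 ln(p) along the budget line p = 6 - 2c."""
    p = 6.0 - 2.0 * c
    return math.log(c) + 2.0 * math.log(p)

n = 3000  # grid resolution (arbitrary)
grid = [3.0 * k / n for k in range(1, n)]  # strictly between 0 and 3

best_c = max(grid, key=utility)
best_p = 6.0 - 2.0 * best_c
print("c =", best_c, "p =", best_p, "utility =", utility(best_c))
print("ln 16 =", math.log(16))
```

The search should land on c = 1, p = 4 with utility ln 16, the same answer as the calculus argument.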
Solution 2: Lagrange
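\[
\begin{cases} u_c = \lambda g_c \\ u_p = \lambda g_p \\ g(c, p) = 0 \end{cases}
\implies
\begin{cases} \frac{1}{c} = \lambda \cdot 20 \\ \frac{2}{p} = \lambda \cdot 10 \\ 20c + 10p - 60 = 0 \end{cases}
\]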
Suppose u(x, y) is the utility function for quantities x and y of two goods. Let these goods have unit prices $p_x$ and $p_y$, respectively, and let the consumer have a budget I. Then the function
\[ x^m(p_x, p_y, I) \]
giving the optimal consumption of x to maximize u(x, y) subject to the budget constraint $p_x x + p_y y = I$ is called the Marshallian demand function.
Note: the superscript m in the function name $x^m$ isn’t a power. Rather than denoting a variable, m simply stands for “Marshallian.”
Example 2.6.4
Let’s go back to Alejandro and his passion for baking. This weekend he would like to make ciabatta (c) and focaccia (f). Ciabatta costs $p_c$ dollars to make and focaccia $p_f$ dollars (footnote 28). For
28 When he knows we’ll be thinking about price p, Alejandro helpfully bakes a type of bread that does
not start with “p”.
this weekend, Alejandro wants to spend I dollars on bread, and his utility function is as
follows:
u(c, f ) = ln(c) + 2 ln( f )
Find the optimal consumption for Alejandro of each bread type.
Solution. The utility function will be the objective function and the constraint will be the
budget constraint. The budget constraint is $p_c c + p_f f = I$, so we take the constraint function $b(c, f) = c p_c + f p_f - I$.
As in Example 2.6.2, the endpoints of our interval (c = 0, f = 0) minimize utility, so
the maximum will be at some interior point. We can find it using the method of Lagrange
multipliers.
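\[
\begin{cases} u_c = \lambda \cdot b_c \\ u_f = \lambda \cdot b_f \\ b(c, f) = 0 \end{cases}
\implies
\begin{cases} \frac{1}{c} = \lambda \cdot p_c \\ \frac{2}{f} = \lambda \cdot p_f \\ c p_c + f p_f - I = 0 \end{cases}
\]
The first two equations give $\lambda = \frac{1}{c p_c} = \frac{2}{f p_f}$, so $f = 2c\,\frac{p_c}{p_f}$. Substituting this into the third equation:
\begin{align*}
0 &= c p_c + f p_f - I \\
  &= c p_c + \left(2c\,\frac{p_c}{p_f}\right) p_f - I \\
  &= c p_c + 2c p_c - I \\
I &= 3c p_c \\
c &= \frac{I}{3p_c}
\end{align*}
The point $c = \frac{I}{3p_c}$, $f = \frac{2I}{3p_f}$ is the only point to consider for a max. Since $c \ge 0$ and $f \ge 0$, it’s within our model domain. So, it gives the optimal consumption of ciabatta and focaccia.
Let’s think of the optimal consumption of each bread type as a function of budget and prices, and name these functions $c^m$ and $f^m$ (m for “Marshallian”). Then
\[ c^m(I, p_c, p_f) = \frac{I}{3p_c} \qquad\qquad f^m(I, p_c, p_f) = \frac{2I}{3p_f} \]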
give the Marshallian demand curves for ciabatta and focaccia, respectively.
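To see the Marshallian demand functions in action, the sketch below evaluates the closed forms for one set of numbers and compares them with a brute-force search along the budget line. The numbers I = 60, $p_c$ = 20, $p_f$ = 10 are borrowed from Example 2.6.2 purely for illustration (there the second bread was pita, at the same price); the function names are hypothetical.

```python
import math

def marshallian_demand(income, p_c, p_f):
    """Closed-form demands from Example 2.6.4: c = I/(3 p_c), f = 2I/(3 p_f)."""
    return income / (3.0 * p_c), 2.0 * income / (3.0 * p_f)

def best_on_budget_line(income, p_c, p_f, n=6000):
    """Brute-force maximizer of ln(c) + 2 ln(f) over a grid on the budget line."""
    best_c, best_u = None, -math.inf
    for k in range(1, n):
        c = (income / p_c) * k / n        # 0 < c < I / p_c
        f = (income - p_c * c) / p_f      # spend the rest on focaccia
        u = math.log(c) + 2.0 * math.log(f)
        if u > best_u:
            best_c, best_u = c, u
    return best_c, (income - p_c * best_c) / p_f

c_m, f_m = marshallian_demand(60.0, 20.0, 10.0)
c_grid, f_grid = best_on_budget_line(60.0, 20.0, 10.0)
print("closed form:", c_m, f_m)
print("grid search:", c_grid, f_grid)
```

With these inputs both approaches should give c = 1 and f = 4, which is the optimum found in Example 2.6.2.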
Example 2.6.4
It is possible to use the Marshallian Demand to do some analysis on the goods. A normal
good is defined as a product for which quantity demanded increases as income increases.
An inferior good is defined as a product for which quantity demanded decreases as income
increases.
Definition 2.6.5 (Normal and Inferior Goods).
If
\[ \frac{\partial x^m(p_x, p_y, I)}{\partial I} > 0 \]
everywhere then the good is a normal good. If
\[ \frac{\partial x^m(p_x, p_y, I)}{\partial I} < 0 \]
everywhere then the good is an inferior good.
You can go back to Alejandro’s example 2.6.4 and verify that, in that case, both are
normal goods.
Let X and Y be two goods with positive unit prices $p_x$ and $p_y$, respectively, subject to the budget constraint $p_x x + p_y y = I$. If X is an inferior good, then Y is a normal good.
\[ y = \frac{I - p_x x}{p_y} \]
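Since X is an inferior good, by Definition 2.6.5, $\frac{\partial x^m}{\partial I} < 0$. Since also $p_x$ and $p_y$ are positive, then the term $-\frac{p_x}{p_y}\frac{\partial x^m}{\partial I}$ is positive:
\[ \frac{\partial y^m}{\partial I} = \frac{1}{p_y} - \frac{p_x}{p_y}\frac{\partial x^m}{\partial I} \implies \frac{\partial y^m}{\partial I} \ge \frac{1}{p_y} > 0 \]
So, Y is a normal good by Definition 2.6.5.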
Using the first partial derivative, we can also analyse how changes in prices affect the
Marshallian Demand.
Definition 2.6.7 (Price Effect).
\[ \frac{\partial x^m(p_x, p_y, I)}{\partial p_x} \]
\[ \frac{\partial x^m(p_x, p_y, I)}{\partial p_y} \]
In Example 2.6.4, if the price of making focaccia increases, how will this affect the amount
of ciabatta Alejandro makes? Assume everything else stays the same – the utility function
stays the same, the price of ciabatta stays the same, the budget I stays the same, and the
assumption remains that Alejandro will maximize utility subject to his budget constraint.
Are the ciabatta and focaccia normal goods, or inferior goods?
Solution. Surprisingly, the price of focaccia doesn’t affect the consumption of ciabatta at all! The Marshallian demand of ciabatta is
\[ c^m(I, p_c, p_f) = \frac{I}{3p_c} \]
Since $p_f$ doesn’t even show up, the derivative is easy to take:
\[ \frac{\partial c^m}{\partial p_f} = 0 \]
That means the price effect of focaccia on Alejandro’s ciabatta baking is zero. (If the price of focaccia goes up, the impact on his baking habits is that he will make less focaccia.)
To decide whether ciabatta and focaccia are normal or inferior goods, we should take
their partial derivatives with respect to I.
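\[ c^m(I, p_c, p_f) = \frac{I}{3p_c} \qquad\qquad f^m(I, p_c, p_f) = \frac{2I}{3p_f} \]
\[ \frac{\partial c^m}{\partial I} = \frac{1}{3p_c} > 0 \qquad\qquad \frac{\partial f^m}{\partial I} = \frac{2}{3p_f} > 0 \]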
Since both derivatives are positive everywhere, both breads are normal goods.
Example 2.6.8
Example 2.6.9
Kenechukwu is doing groceries for the week, and as usual he has I dollars to spend on fruits and berries. If he consumes $a$ kg of apples and $s$ kg of strawberries, then his utility function is:
\[ u(a, s) = a^{1/2} s^{1/4} \]
Apples cost $p_a$ dollars per kg, and strawberries cost $p_s$ dollars per kg.
Find Kenechukwu’s Marshallian demand function for apples. What is the price effect of $p_s$ on apples? Are apples normal or inferior goods?
Solution. The utility function will be the objective function and the constraint will be the
budget constraint. As in Example 2.6.2, the endpoints a = 0 and s = 0 are minima of the
utility function. (We see this because setting either a = 0 or s = 0 leads to u = 0; and since
u involves even roots, it never returns a negative value.) So, the maximum will happen at
some internal point, which we can find using Lagrange multipliers.
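The budget constraint is $p_a a + p_s s = I$, so we take the constraint function $b(a, s) = p_a a + p_s s - I$.
\[
\begin{cases} u_a = \lambda \cdot b_a \\ u_s = \lambda \cdot b_s \\ b(a, s) = 0 \end{cases}
\implies
\begin{cases} \frac{1}{2} a^{-1/2} s^{1/4} = \lambda \cdot p_a \\ \frac{1}{4} a^{1/2} s^{-3/4} = \lambda \cdot p_s \\ p_a a + p_s s - I = 0 \end{cases}
\]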
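\[ \frac{\partial a^m}{\partial p_s} = 0 \]
\[ \frac{\partial a^m}{\partial I} = \frac{2}{3p_a} > 0 \qquad\qquad \frac{\partial s^m}{\partial I} = \frac{1}{3p_s} > 0 \]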
So far, our paradigm has been to optimize happiness, given a fixed budget. Thinking
about utility another way, we could fix the desired amount of utility, and try to minimize
the cost required to achieve it. In this paradigm, our utility function is our constraint,
while our cost function is the objective function we want to minimize. This gives rise to
Hicksian demand.
Suppose u(x, y) is the utility function for quantities x and y of two goods, and suppose the consumer requires a utility value of at least U, where U is some positive constant. Let these goods have unit prices $p_x$ and $p_y$. Then the function
\[ x^h(p_x, p_y, U) \]
The definition requires that the utility be at least some fixed constant. In practice, we
can usually assume that the utility is equal to that fixed constant. That’s because if we
have a higher utility than necessary, we can usually save some money by bringing our
utility down to its minimum allowable level. This could only fail to be the case if, at some
point, our utility function had a negative partial derivative. A negative partial derivative
indicates that we might increase utility as we decrease consumption.
Example 2.6.11
Let’s go back to Alejandro and his passion for baking. This weekend he would like to make ciabatta (c) and baguettes (b). Ciabatta costs $p_c$ dollars to make and baguettes $p_b$ dollars. His utility function is as follows:
\[ u(c, b) = \sqrt{cb} \]
Fixing Alejandro’s utility as the constant u(c, b) = U, find his Hicksian Demand for both breads.
Solution. The cost will be the objective function, $f(c, b) = p_c c + p_b b$. The utility function gives our constraint, $U = \sqrt{cb}$. We can find the constrained minimum of f(c, b) using substitution.
\begin{align*}
U &= \sqrt{cb} \\
U^2 &= cb \\
c &= \frac{U^2}{b}
\end{align*}
Plugging this into our objective function,
\[ f(c, b) = p_c c + p_b b = p_c \frac{U^2}{b} + p_b b = U^2 p_c b^{-1} + p_b b \]
This is a function of one variable. Let’s find the critical points.
\begin{align*}
0 &= -U^2 p_c b^{-2} + p_b \\
U^2 p_c b^{-2} &= p_b \\
\frac{U^2 p_c}{p_b} &= b^2 \\
b &= U\sqrt{\frac{p_c}{p_b}}
\end{align*}
At that point, $c = \frac{U^2}{b} = U\sqrt{\frac{p_b}{p_c}}$.
To verify that this critical point gives a global minimum, consider the second derivative of our one-variable function.
\[ \frac{d}{db}\left[ -U^2 p_c b^{-2} + p_b \right] = 2U^2 p_c b^{-3} \]
Our model domain only allows for non-negative values of b, so the second derivative is non-negative everywhere. That means its global minimum is at its sole critical point. That means that, all together, we found that the quantities $c = U\sqrt{\frac{p_b}{p_c}}$ and $b = U\sqrt{\frac{p_c}{p_b}}$ minimize the cost function $f(c, b) = p_c c + p_b b$ subject to the constraint u(c, b) = U. So, our Hicksian demand functions are:
\[ c^h(p_c, p_b, U) = U\sqrt{\frac{p_b}{p_c}} \qquad\text{and}\qquad b^h(p_c, p_b, U) = U\sqrt{\frac{p_c}{p_b}} \]
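A small numerical check can make the Hicksian demand functions more concrete. In the sketch below the prices and the utility level are illustrative numbers chosen for easy arithmetic, not values from the text, and the function names are hypothetical. It evaluates the closed forms, confirms the utility constraint holds, and compares the cost with the cost at a few other points on the same utility curve.

```python
import math

def hicksian_demand(p_c, p_b, U):
    """Closed forms from Example 2.6.11: c = U sqrt(p_b/p_c), b = U sqrt(p_c/p_b)."""
    return U * math.sqrt(p_b / p_c), U * math.sqrt(p_c / p_b)

def cost_along_constraint(b, p_c, p_b, U):
    """Cost p_c c + p_b b after substituting c = U^2 / b."""
    return p_c * U ** 2 / b + p_b * b

p_c, p_b, U = 4.0, 9.0, 6.0  # illustrative numbers only
c_h, b_h = hicksian_demand(p_c, p_b, U)
min_cost = cost_along_constraint(b_h, p_c, p_b, U)

# Costs at other values of b on the same utility curve sqrt(c b) = U.
nearby = [cost_along_constraint(b_h * s, p_c, p_b, U) for s in (0.5, 0.9, 1.1, 2.0)]

print("c_h =", c_h, "b_h =", b_h, "utility =", math.sqrt(c_h * b_h))
print("minimum cost:", min_cost, "nearby costs:", nearby)
```

With these inputs the demands come out to c = 9 and b = 4, the cost is 72, and every other sampled point on the curve costs more.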
Example 2.6.11
In Example 2.6.11, note $\frac{\partial c^h}{\partial p_b} \ne 0$. This is in contrast to examples 2.6.8 and 2.6.9, where
the price effects of one good’s price on the other good’s consumption were both 0. Hick-
sian demand is sometimes used to study the substitution effect, where a change in price in
one good causes a change in consumption of another good.
Chapter 3
I NTEGRATION
• Integration — at its most basic, allows us to analyse the area under a curve. Of
course, its application and importance extend far beyond areas and it plays a central
role in solving differential equations.
It is not immediately obvious that these two topics are related to each other. However, as
we shall see, they are indeed intimately linked.
I NTEGRATION 3.1 D EFINITION OF THE I NTEGRAL
§§ A Motivating Example
Let us find the area under the curve $y = e^x$ (and above the x–axis) for $0 \le x \le 1$. That is, the area of $\{ (x, y) \mid 0 \le y \le e^x,\ 0 \le x \le 1 \}$.
Example 3.1.1
experience with e x in differential calculus, that the curve y = e x is not easily written in
terms of other simpler functions, so it is very unlikely that we would be able to write the
area as a combination of simpler geometric objects such as triangles, rectangles or circles.
So rather than trying to write down the area exactly, our strategy is to approximate the
area and then make our approximation more and more precise1 . We choose2 to approx-
imate the area as a union of a large number of tall thin (vertical) rectangles. As we take
more and more rectangles we get better and better approximations. Taking the limit as
the number of rectangles goes to infinity gives the exact area3 .
As a warm up exercise, we’ll now just use four rectangles. In Example 3.1.2, below,
we’ll consider an arbitrary number of rectangles and then take the limit as the number of
rectangles goes to infinity. So
1 This should remind the reader of the approach taken to compute the slope of a tangent line way way
back at the start of differential calculus.
2 Approximating the area in this way leads to a definition of integration that is called Riemann integra-
tion. This is the most commonly used approach to integration. However we could also approximate the
area by using long thin horizontal strips. This leads to a definition of integration that is called Lebesgue
integration. We will not be covering Lebesgue integration in these notes.
3 If we want to be more careful here, we should construct two approximations, one that is always a little
smaller than the desired area and one that is a little larger. We can then take a limit using the Squeeze
Theorem and arrive at the exact area. More on this later.
• subdivide the interval 0 ď x ď 1 into 4 equal subintervals each of width 1/4, and
• subdivide the area of interest into four corresponding vertical strips, as in the figure
below.
The area we want is exactly the sum of the areas of all four strips.
[Figure: the region under $y = e^x$ for $0 \le x \le 1$, cut into four vertical strips at x = 1/4, 1/2, 3/4.]
Each of these strips is almost, but not quite, a rectangle. While the bottom and sides are
fine (the sides are at right-angles to the base), the top of the strip is not horizontal. This
is where we must start to approximate. We can replace each strip by a rectangle by just
levelling off the top. But now we have to make a choice — at what height do we level off
the top?
Consider, for example, the leftmost strip. On this strip, x runs from 0 to 1/4. As x runs from 0 to 1/4, the height y runs from $e^0$ to $e^{1/4}$. It would be reasonable to choose the height of the approximating rectangle to be somewhere between $e^0$ and $e^{1/4}$. Which height should we choose? Well, actually it doesn’t matter. When we eventually take the limit of
infinitely many approximating rectangles all of those different choices give exactly the
same final answer. We’ll say more about this later.
In this example we’ll do two sample computations.
• For the first computation we approximate each slice by a rectangle whose height is
the height of the left hand side of the slice.
– On the first slice, x runs from 0 to 1/4, and the height y runs from $e^0$, on the left hand side, to $e^{1/4}$, on the right hand side.
– So we approximate the first slice by the rectangle of height $e^0$ and width 1/4, and hence of area $\frac{1}{4} e^0 = \frac{1}{4}$.
– On the second slice, x runs from 1/4 to 1/2, and the height y runs from $e^{1/4}$ to $e^{1/2}$.
– So we approximate the second slice by the rectangle of height $e^{1/4}$ and width 1/4, and hence of area $\frac{1}{4} e^{1/4}$.
– And so on.
– All together, we approximate the area of interest by the sum of the areas of the four approximating rectangles, which is
\[ \frac{1}{4}\left( 1 + e^{1/4} + e^{1/2} + e^{3/4} \right) \approx 1.5124 \]
[Figure: two plots of $y = e^x$ on $0 \le x \le 1$, each with the x-axis marked at 1/4, 2/4, 3/4, 4/4.]
• For the second computation we approximate each slice by a rectangle whose height
is the height of the right hand side of the slice.
– On the first slice, x runs from 0 to 1/4, and the height y runs from $e^0$, on the left hand side, to $e^{1/4}$, on the right hand side.
– So we approximate the first slice by the rectangle of height $e^{1/4}$ and width 1/4, and hence of area $\frac{1}{4} e^{1/4}$.
– On the second slice, x runs from 1/4 to 1/2, and the height y runs from $e^{1/4}$ to $e^{1/2}$.
– So we approximate the second slice by the rectangle of height $e^{1/2}$ and width 1/4, and hence of area $\frac{1}{4} e^{1/2}$.
– And so on.
– All together, we approximate the area of interest by the sum of the areas of the four approximating rectangles, which is
\[ \frac{1}{4}\left( e^{1/4} + e^{1/2} + e^{3/4} + e^{1} \right) \approx 1.9420 \]
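The two sample computations are easy to reproduce. The following Python sketch is one possible way to do it; the helper names `left_sum` and `right_sum` are illustrative, and the code simply adds up rectangle areas using the left or right edge of each slice for the height.

```python
import math

def left_sum(f, a, b, n):
    """Sum of n rectangle areas on [a, b], heights taken at the left edge of each slice."""
    width = (b - a) / n
    return sum(f(a + i * width) for i in range(n)) * width

def right_sum(f, a, b, n):
    """Sum of n rectangle areas on [a, b], heights taken at the right edge of each slice."""
    width = (b - a) / n
    return sum(f(a + (i + 1) * width) for i in range(n)) * width

left4 = left_sum(math.exp, 0.0, 1.0, 4)
right4 = right_sum(math.exp, 0.0, 1.0, 4)
print(f"{left4:.4f} {right4:.4f}")  # prints "1.5124 1.9420"
```

Because $e^x$ is increasing, the left-edge sum underestimates the area and the right-edge sum overestimates it, so the true area lies between the two printed values.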
Example 3.1.1
Now for the full computation that gives the exact area.
Example 3.1.2
strategy is to approximate this area by the area of a union of a large number of very thin rectangles, and then take the limit as the number of rectangles goes to infinity. In Example 3.1.1, we used just four rectangles. Now we’ll consider a general number of rectangles, that we’ll call n. Then we’ll take the limit $n \to \infty$. So
• subdivide the interval 0 ď x ď 1 into n equal subintervals each of width 1/n, and
• subdivide the area of interest into corresponding thin strips, as in the figure below.
The area we want is exactly the sum of the areas of all of the thin strips.
[Figure: the region under $y = e^x$ for $0 \le x \le 1$, cut into n thin vertical strips at x = 1/n, 2/n, ..., n/n.]
Each of these strips is almost, but not quite, a rectangle. As in Example 3.1.1, the only
problem is that the top is not horizontal. So we approximate each strip by a rectangle, just
by levelling off the top. Again, we have to make a choice — at what height do we level off
the top?
Consider, for example, the leftmost strip. On this strip, x runs from 0 to 1/n. As x runs from 0 to 1/n, the height y runs from $e^0$ to $e^{1/n}$. It would be reasonable to choose the height of the approximating rectangle to be somewhere between $e^0$ and $e^{1/n}$. Which height should we choose?
Well, as we said in Example 3.1.1, it doesn’t matter. We shall shortly take the limit $n \to \infty$ and, in that limit, all of those different choices give exactly the same final answer. We won’t justify that statement in this example, but Appendix section A.10 provides the justification. For this example we just, arbitrarily, choose the height of each rectangle to be the height of the graph $y = e^x$ at the smallest value of x in the corresponding strip (footnote 4). The figure on the left below shows the approximating rectangles when n = 4 and the figure on the right shows the approximating rectangles when n = 8.
[Figure: the approximating rectangles under $y = e^x$ for n = 4 (left) and n = 8 (right).]
4 Notice that since e x is an increasing function, this choice of heights means that each of our rectangles is
smaller than the strip it came from.
• and so on and
So
\[ \text{Total approximating area} = \frac{1}{n}\left( 1 + r + r^2 + \cdots + r^{n-1} \right) \]
The sum in brackets is known as a geometric sum and satisfies a nice simple formula:
\[ 1 + r + r^2 + \cdots + r^{n-1} = \frac{r^n - 1}{r - 1} \qquad \text{provided } r \ne 1 \]
The derivation of the above formula is not too difficult. So let’s derive it in a little aside.
\[ S = 1 + r + r^2 + \cdots + r^{n-1} \]
Notice that if we multiply the whole sum by r we get back almost the same thing:
\begin{align*}
rS &= r\left( 1 + r + r^2 + \cdots + r^{n-1} \right) \\
   &= r + r^2 + r^3 + \cdots + r^n
\end{align*}
This right hand side differs from the original sum S only in that
• the right hand side is missing the “1+” that S starts with and
• the right hand side has an extra “$+r^n$” at the end that does not appear in S.
That is
\begin{align*}
rS &= S - 1 + r^n \\
(r - 1)S &= (r^n - 1) \\
S &= \frac{r^n - 1}{r - 1}
\end{align*}
as required. Notice that the last step in the manipulations only works providing $r \ne 1$ (otherwise we are dividing by zero).
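The geometric sum formula just derived can be spot-checked by adding the terms one at a time. The sketch below uses exact rational arithmetic from Python's standard `fractions` module so that the comparison is not blurred by rounding; the function names are illustrative.

```python
from fractions import Fraction

def geometric_sum_direct(r, n):
    """1 + r + r^2 + ... + r^(n-1), added term by term."""
    return sum(r ** k for k in range(n))

def geometric_sum_formula(r, n):
    """Closed form (r^n - 1) / (r - 1); requires r != 1."""
    if r == 1:
        raise ValueError("the closed form divides by r - 1, so r must not equal 1")
    return (r ** n - 1) / (r - 1)

r, n = Fraction(3, 2), 10
print(geometric_sum_direct(r, n), geometric_sum_formula(r, n))
```

The guard for r = 1 mirrors the proviso in the derivation: in that case the sum is simply n, and the closed form does not apply.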
To get the exact area (footnote 5) all we need to do is make the approximation better and better by taking the limit $n \to \infty$. The limit will look more familiar if we rename 1/n to X. As n tends to infinity, X tends to 0, so
\begin{align*}
\text{Area} &= \lim_{n \to \infty} \frac{1}{n}\, \frac{e - 1}{e^{1/n} - 1} \\
            &= (e - 1) \lim_{n \to \infty} \frac{1/n}{e^{1/n} - 1} \\
            &= (e - 1) \lim_{X \to 0} \frac{X}{e^X - 1} \qquad \text{(with $X = 1/n$)}
\end{align*}
Examining this limit we see that both numerator and denominator tend to zero as $X \to 0$, and so we cannot evaluate this limit by computing the limits of the numerator and denominator separately and then dividing the results. Despite this, the limit is not too hard to evaluate; here we give two ways:
5 We haven’t proved that this will give us the exact area, but it should be clear that taking this limit will
give us a lower bound on the area. To complete things rigorously we also need an upper bound and
the Squeeze Theorem. We do this in the next optional subsection.
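• Perhaps the easiest way to compute the limit is by using l’Hôpital’s rule (footnote 6). Since both numerator and denominator go to zero, this is a 0/0 indeterminate form. Thus
\[ \lim_{X \to 0} \frac{X}{e^X - 1} = \lim_{X \to 0} \frac{\frac{d}{dX} X}{\frac{d}{dX}\left( e^X - 1 \right)} = \lim_{X \to 0} \frac{1}{e^X} = 1 \]
• Another way (footnote 7) to evaluate the same limit is to observe that it can be massaged into the form of the limit definition of the derivative. First notice that
\[ \lim_{X \to 0} \frac{X}{e^X - 1} = \left[ \lim_{X \to 0} \frac{e^X - 1}{X} \right]^{-1} \]
provided this second limit exists and is nonzero. This second limit should look a little familiar:
\[ \lim_{X \to 0} \frac{e^X - 1}{X} = \lim_{X \to 0} \frac{e^X - e^0}{X - 0} \]
The right hand side is the limit definition of the derivative of $e^X$ at X = 0, which equals $e^0 = 1$.
So, after this short aside into limits, we may now conclude that
\begin{align*}
\text{Area} &= (e - 1) \lim_{X \to 0} \frac{X}{e^X - 1} \\
            &= e - 1
\end{align*}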
Example 3.1.2
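The limit in Example 3.1.2 can also be watched numerically. The sketch below computes the left-endpoint sum directly and through the closed form $\frac{1}{n}\,\frac{e - 1}{e^{1/n} - 1}$ that appears in the limit above, which corresponds to taking $r = e^{1/n}$ in the geometric sum (an identification assumed here). The function names are illustrative.

```python
import math

def left_riemann_sum_exp(n):
    """Left-endpoint approximation of the area under y = e^x on [0, 1] with n strips."""
    return sum(math.exp(i / n) for i in range(n)) / n

def closed_form(n):
    """The same sum via the geometric-sum formula with r = e^(1/n)."""
    r = math.exp(1.0 / n)
    return (math.e - 1.0) / (n * (r - 1.0))

exact = math.e - 1.0
for n in (4, 8, 100, 10000):
    approx = left_riemann_sum_exp(n)
    print(n, approx, closed_form(n), exact - approx)
```

The last column, the shortfall from e − 1, should stay positive and shrink as n grows, which is the behaviour the limit argument predicts.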
6 If you do not recall L’Hôpital’s rule and indeterminate forms then we recommend you skim over your
differential calculus notes on the topic.
7 Say if you don’t recall l’Hôpital’s rule and have not had time to revise it.
material below without this notation, proper summation notation is well worth learning,
so we advise the reader to persevere.
Writing out the summands explicitly can become quite impractical — for example, say
we need the sum of the first 11 squares:
\[ 1 + 2^2 + 3^2 + 4^2 + 5^2 + 6^2 + 7^2 + 8^2 + 9^2 + 10^2 + 11^2 \]
This becomes tedious. Where the pattern is clear, we will often skip the middle few terms and instead write
\[ 1 + 2^2 + \cdots + 11^2. \]
A far more precise way to write this is using Σ (capital-sigma) notation. For example, we can write the above sum as
\[ \sum_{k=1}^{11} k^2 \]
This is read as
More generally
Notation 3.1.4.
\[ f(m) + f(m + 1) + f(m + 2) + \cdots + f(n - 1) + f(n). \]
Similarly we write
\[ \sum_{i=m}^{n} a_i \]
to mean
It is important to note that the right hand side of this expression evaluates to a number8 ; it
does not contain “k”. The summation index k is just a “dummy” variable and it does not
have to be called k. For example
\[ \sum_{k=3}^{7} \frac{1}{k^2} = \sum_{i=3}^{7} \frac{1}{i^2} = \sum_{j=3}^{7} \frac{1}{j^2} = \sum_{\ell=3}^{7} \frac{1}{\ell^2} \]
Also the summation index has no meaning outside the sum. For example
\[ k \sum_{k=3}^{7} \frac{1}{k^2} \]
are equal.
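Summation notation maps directly onto a loop over the index. The short Python sketch below illustrates this with the two sums discussed above: the sum of the first 11 squares, and the sum of 1/k² for k from 3 to 7, written twice with different dummy index names. Exact fractions from the standard `fractions` module are used so the result can be compared with the value quoted in footnote 8.

```python
from fractions import Fraction

# Sum of the first 11 squares: the index k runs over 1, 2, ..., 11.
sum_of_squares = sum(k**2 for k in range(1, 12))

# The same sum of reciprocal squares written with two different dummy indices.
with_k = sum(Fraction(1, k**2) for k in range(3, 8))
with_j = sum(Fraction(1, j**2) for j in range(3, 8))

print(sum_of_squares)    # 506
print(with_k)            # 46181/176400
print(with_k == with_j)  # True
```

Note that `range(3, 8)` stops before 8, so it covers exactly the indices 3 through 7. Renaming the loop variable changes nothing about the value, which is the sense in which the summation index is a dummy variable.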
Here is a theorem that gives a few rules for manipulating summation notation.
(b) $\sum_{i=m}^{n} (a_i + b_i) = \sum_{i=m}^{n} a_i + \sum_{i=m}^{n} b_i$
(c) $\sum_{i=m}^{n} (a_i - b_i) = \sum_{i=m}^{n} a_i - \sum_{i=m}^{n} b_i$
8 Some careful addition shows it is $\frac{46181}{176400}$.
Proof. We can prove this theorem by just writing out both sides of each equation, and
observing that they are equal, by the usual laws of arithmetic9 . For example, for the first
equation, the left and right hand sides are
\[ \sum_{i=m}^{n} c a_i = c a_m + c a_{m+1} + \cdots + c a_n \qquad\text{and}\qquad c \sum_{i=m}^{n} a_i = c\,(a_m + a_{m+1} + \cdots + a_n) \]
They are equal by the usual distributive law. The “distributive law” is the fancy name for
c( a + b) = ca + cb.
Not many sums can be computed exactly10 . Here are some that can. The first few are
used a lot.
Theorem 3.1.6.
(a) $\sum_{i=0}^{n} a r^i = a\,\frac{1 - r^{n+1}}{1 - r}$, for all real numbers $a$ and $r \ne 1$ and all integers $n \ge 0$.
(b) $\sum_{i=1}^{n} 1 = n$, for all integers $n \ge 1$.
(c) $\sum_{i=1}^{n} i = \frac{1}{2} n(n+1)$, for all integers $n \ge 1$.
(d) $\sum_{i=1}^{n} i^2 = \frac{1}{6} n(n+1)(2n+1)$, for all integers $n \ge 1$.
(e) $\sum_{i=1}^{n} i^3 = \left[ \frac{1}{2} n(n+1) \right]^2$, for all integers $n \ge 1$.
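As a quick sanity check, the closed forms in Theorem 3.1.6 can be compared against brute-force sums. The Python sketch below is only illustrative: the helper names `geometric_sum` and `power_sum` and the sample values $a = 3$, $r = \frac{1}{2}$ are assumptions made for this example, not part of the text.

```python
from fractions import Fraction

def geometric_sum(a, r, n):
    """Brute-force sum of a * r**i for i = 0..n."""
    return sum(a * r**i for i in range(n + 1))

def power_sum(p, n):
    """Brute-force sum of i**p for i = 1..n."""
    return sum(i**p for i in range(1, n + 1))

# Exact rational arithmetic avoids floating point round-off.
a, r = Fraction(3), Fraction(1, 2)   # hypothetical sample values, r != 1

for n in range(1, 21):
    assert geometric_sum(a, r, n) == a * (1 - r**(n + 1)) / (1 - r)   # part (a)
    assert power_sum(0, n) == n                                        # part (b)
    assert power_sum(1, n) == n * (n + 1) // 2                         # part (c)
    assert power_sum(2, n) == n * (n + 1) * (2 * n + 1) // 6           # part (d)
    assert power_sum(3, n) == (n * (n + 1) // 2) ** 2                  # part (e)

print(power_sum(2, 11))  # the sum of the first 11 squares: 506
```

A check like this is not a proof, since it only tests finitely many values of $n$, but it is a cheap way to catch a misremembered formula.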
9 Since all the sums are finite, this isn’t too hard. More care must be taken when the sums involve an
infinite number of terms. We will examine this in Chapter 5.
10 Of course, any finite sum can be computed exactly — just sum together the terms. What we mean by
“computed exactly” in this context, is that we can rewrite the sum as a simple, and easily evaluated,
formula involving the terminals of the sum. For example
\[ \sum_{k=m}^{n} r^k = \frac{r^{n+1} - r^m}{r - 1} \qquad \text{provided } r \ne 1 \]
No matter what finite integers we choose for m and n, we can quickly compute the sum in just a few
arithmetic operations. On the other hand, the sums,
\[ \sum_{k=m}^{n} \frac{1}{k} \qquad \sum_{k=m}^{n} \frac{1}{k^2} \]
cannot be expressed in such clean formulas (though you can rewrite them quite cleanly using integrals).
To explain more clearly we would need to go into a more detailed and careful discussion that is beyond
the scope of this course.
i =0
which is just the left hand side of equation (3.1.3), with n replaced by n + 1 and then
multiplied by a.
(b) The second sum is just n copies of 1 added together, so of course the sum is n.
(c) The sum $\sum_{i=1}^{n} i = 1 + 2 + 3 + \cdots + n$ can be visualized as the area of the red stairsteps below: the first column has area 1, the second column has area 2, and so on.
[Figure: red stairsteps with columns of heights $1, 2, 3, \ldots, n$]
If we duplicate those stairsteps and spin them around, we make a rectangle with base
n + 1 and height n.
Since the red stairsteps are exactly half the total area of that rectangle,
\[ \sum_{i=1}^{n} i = \frac{1}{2}(n)(n+1) \]
(d) The last two identities are proved in Question 32 of Section 5.2 of the practice book.
Before we explain more precisely what the definite integral actually is, a few remarks
(actually — a few interpretations) are in order.
• If $f(x) \ge 0$ and $a \le b$, one interpretation of the symbol $\int_a^b f(x)\,dx$ is “the area of the region $\{ (x,y) \mid a \le x \le b,\ 0 \le y \le f(x) \}$”.
[Figure: the region under $y = f(x)$ between $x = a$ and $x = b$]
In this way we can rewrite the area in Example 3.1.1 as the definite integral $\int_0^1 e^x\,dx$.
• This interpretation breaks down when either $a > b$ or $f(x)$ is not always positive, but it can be repaired by considering “signed areas”.
• If $a \le b$, but $f(x)$ is not always positive, one interpretation of $\int_a^b f(x)\,dx$ is “the signed area between $y = f(x)$ and the x–axis for $a \le x \le b$”. For “signed area” (which is also called the “net area”), areas above the x–axis count as positive while areas below the x–axis count as negative. In the example below, we have the graph of the function
\[ f(x) = \begin{cases} -1 & \text{if } 1 \le x \le 2 \\ 2 & \text{if } 2 < x \le 4 \\ 0 & \text{otherwise} \end{cases} \]
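The $2 \times 2$ shaded square above the x–axis has signed area $+2 \times 2 = +4$. The $1 \times 1$ shaded square below the x–axis has signed area $-1 \times 1 = -1$. So, for this $f(x)$,
\[ \int_0^5 f(x)\,dx = +4 - 1 = 3 \]
[Figure: graph of $f(x)$; the square above the x–axis for $2 < x \le 4$ has signed area $+4$ and the square below the x–axis for $1 \le x \le 2$ has signed area $-1$]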
• We start by selecting any natural number $n$ and subdividing the interval from $a$ to $b$ into $n$ equal subintervals. Each subinterval has width $\frac{b-a}{n}$.
• Just as was the case in Example 3.1.1 we will eventually take the limit as $n \to \infty$, which squeezes the width of each subinterval down to zero.
12 We’ll eventually allow $a$ and $b$ to be any two real numbers, not even requiring $a < b$. But it is easier to start off assuming $a < b$, and that’s what we’ll do.
[Figure: the graph of $y = f(x)$ with the interval from $a$ to $b$ subdivided at the points $a = x_0, x_1, x_2, x_3, \ldots, x_{n-1}, x_n = b$]
by the rectangle
each subinterval by the value it took at the leftmost point in that subinterval.
• So, when there are $n$ subintervals our approximation to the signed area between the curve $y = f(x)$ and the x–axis, with $x$ running from $a$ to $b$, is
\[ \sum_{i=1}^{n} f(x_{i,n}^*) \cdot \frac{b-a}{n} \]
We interpret this as the signed area since the summands $f(x_{i,n}^*) \cdot \frac{b-a}{n}$ need not be positive.
• Finally we define the definite integral by taking the limit of this sum as $n \to \infty$.
Oof! This is quite an involved process, but we can now write down the definition we need. (A more mathematically rigorous definition of the definite integral $\int_a^b f(x)\,dx$ can be found in Appendix A.7.)
Definition 3.1.8.
Let $a$ and $b$ be two real numbers and let $f(x)$ be a function that is defined for all $x$ between $a$ and $b$. Then we define
\[ \int_a^b f(x)\,dx = \lim_{n\to\infty} \sum_{i=1}^{n} f(x_{i,n}^*) \cdot \frac{b-a}{n} \]
when the limit exists and takes the same value for all choices of the $x_{i,n}^*$’s. In this
Of course, it is not immediately obvious when this limit should exist. Thankfully it is
easier for a function to be “integrable” than it is for it to be “differentiable”.
Theorem3.1.9.
• f ( x ) is continuous on [ a, b], or
We will not justify this theorem. But a slightly weaker statement is proved in (the
optional) Section A.7. Of course this does not tell us how to actually evaluate any definite
integrals — but we will get to that in time.
Some comments:
• Note that, in Definition 3.1.8, we allow $a$ and $b$ to be any two real numbers. We do not require that $a < b$. That is, even when $a > b$, the symbol $\int_a^b f(x)\,dx$ is still defined by the formula of Definition 3.1.8. We’ll get an interpretation for $\int_a^b f(x)\,dx$, when $a > b$, later.
• It is important to note that the definite integral $\int_a^b f(x)\,dx$ represents a number, not a function of $x$. The integration variable $x$ is another “dummy” variable, just like the summation index $i$ in $\sum_{i=m}^{n} a_i$ (see Section 3.1.1). The integration variable does not have to be called $x$. For example
\[ \int_a^b f(x)\,dx = \int_a^b f(t)\,dt = \int_a^b f(u)\,du \]
Just as with summation variables, the integration variable $x$ has no meaning outside of $f(x)\,dx$. For example
\[ x \int_0^1 e^x\,dx \qquad\text{and}\qquad \int_0^x e^x\,dx \]
The sum inside definition 3.1.8 is named after Bernhard Riemann13 who made the first
rigorous definition of the definite integral and so placed integral calculus on rigorous
footings.
13 Bernhard Riemann was a 19th century German mathematician who made extremely important con-
tributions to many different areas of mathematics — far too many to list here. Arguably two of the
most important (after Riemann sums) are now called Riemann surfaces and the Riemann hypothesis
(he didn’t name them after himself).
Definition 3.1.10.
where $\Delta x = \frac{b-a}{n}$.
• If we choose each $x_{i,n}^* = x_{i-1} = a + (i-1)\frac{b-a}{n}$ to be the left hand end point of the $i$th interval, $[x_{i-1}, x_i]$, we get the approximation
\[ \sum_{i=1}^{n} f\!\left(a + (i-1)\frac{b-a}{n}\right) \frac{b-a}{n} \]
which is called the “left Riemann sum approximation to $\int_a^b f(x)\,dx$ with $n$ subintervals”. This is the approximation used in Example 3.1.1.
• In the same way, if we choose $x_{i,n}^* = x_i = a + i\,\frac{b-a}{n}$ we obtain the approximation
\[ \sum_{i=1}^{n} f\!\left(a + i\,\frac{b-a}{n}\right) \frac{b-a}{n} \]
which is called the “right Riemann sum approximation to $\int_a^b f(x)\,dx$ with $n$ subintervals”. The word “right” signifies that, on each subinterval $[x_{i-1}, x_i]$ we approximate $f$ by its value at the right–hand end–point, $x_i = a + i\,\frac{b-a}{n}$, of the subinterval.
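The left and right Riemann sums translate almost directly into code. The following Python sketch is one possible way to write them; the function name `riemann_sum` and its `rule` parameter are assumptions introduced for illustration, and the `"mid"` option (sampling at the midpoint of each subinterval) is an extra convenience rather than something defined above.

```python
import math

def riemann_sum(f, a, b, n, rule="left"):
    """Riemann sum of f on [a, b] with n equal subintervals.

    rule selects the sample point in each subinterval:
    "left" -> x_{i-1}, "right" -> x_i, "mid" -> the midpoint.
    """
    dx = (b - a) / n
    shift = {"left": 0.0, "right": 1.0, "mid": 0.5}[rule]
    # i runs over 1..n; the i-th sample point is a + (i - 1 + shift) * dx
    return sum(f(a + (i - 1 + shift) * dx) for i in range(1, n + 1)) * dx

# Left and right sums for the integral of e^x over [0, 1].
for n in (10, 100, 1000):
    print(n,
          riemann_sum(math.exp, 0.0, 1.0, n, "left"),
          riemann_sum(math.exp, 0.0, 1.0, n, "right"))
```

As $n$ grows, both columns should approach the same number, which is what the limit in Definition 3.1.8 asks for.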
compute the limit of the sum as the number of summands goes to infinity. This approach is
not always feasible and we will soon arrive at other means of computing definite integrals
based on antiderivatives. However, Riemann sums also provide us with a good means of
approximating definite integrals — if we take n to be a large, but finite, integer, then the
corresponding Riemann sum can be a good approximation of the definite integral. Under
certain circumstances this can be strengthened to give rigorous bounds on the integral.
Let us revisit Example 3.1.1.
Example 3.1.11
Let’s say we are again interested in the integral $\int_0^1 e^x\,dx$. We can follow the same procedure as we used previously to construct Riemann sum approximations. However since the integrand $f(x) = e^x$ is an increasing function, we can make our approximations into upper and lower bounds without much extra work.
More precisely, we approximate $f(x)$ on each subinterval $x_{i-1} \le x \le x_i$
• by its smallest value on the subinterval, namely $f(x_{i-1})$, when we compute the left Riemann sum approximation and
• by its largest value on the subinterval, namely $f(x_i)$, when we compute the right Riemann sum approximation.
This is illustrated in the two figures below. The shaded region in the left hand figure is
the left Riemann sum approximation and the shaded region in the right hand figure is the
right Riemann sum approximation.
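[Figure: two graphs of $y = e^x$ for $0 \le x \le 1$, with subdivision points $\frac{1}{n}, \frac{2}{n}, \ldots, \frac{n}{n}$ on the x–axis. Left panel: left Riemann sum rectangles. Right panel: right Riemann sum rectangles.]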
We can see that exactly because f ( x ) is increasing, the left Riemann sum describes an area
smaller than the definite integral while the right Riemann sum gives an area larger14 than
the integral.
When we approximate the integral $\int_0^1 e^x\,dx$ using $n$ subintervals, then, on interval number $i$,
• $x$ runs from $\frac{i-1}{n}$ to $\frac{i}{n}$ and
14 When a function is decreasing the situation is reversed — the left Riemann sum is always larger than the
integral while the right Riemann sum is smaller than the integral. For more general functions that both
increase and decrease it is perhaps easiest to study each increasing (or decreasing) interval separately.
• $y = e^x$ runs from $e^{(i-1)/n}$, when $x$ is at the left hand end point of the interval, to $e^{i/n}$, when $x$ is at the right hand end point of the interval.
Consequently, the left Riemann sum approximation to $\int_0^1 e^x\,dx$ is $\sum_{i=1}^{n} e^{(i-1)/n}\,\frac{1}{n}$ and the right Riemann sum approximation is $\sum_{i=1}^{n} e^{i/n}\,\frac{1}{n}$. So
\[ \sum_{i=1}^{n} e^{(i-1)/n}\,\frac{1}{n} \ \le\ \int_0^1 e^x\,dx \ \le\ \sum_{i=1}^{n} e^{i/n} \cdot \frac{1}{n} \]
Thus $L_n = \sum_{i=1}^{n} e^{(i-1)/n}\,\frac{1}{n}$, which for any $n$ can be evaluated by computer, is a lower bound on the exact value of $\int_0^1 e^x\,dx$ and $R_n = \sum_{i=1}^{n} e^{i/n}\,\frac{1}{n}$, which for any $n$ can also be evaluated by computer, is an upper bound on the exact value of $\int_0^1 e^x\,dx$. For example, when $n = 1000$, $L_n = 1.7174$ and $R_n = 1.7191$ (both to four decimal places) so that, again to four decimal places,
\[ 1.7174 \le \int_0^1 e^x\,dx \le 1.7191 \]
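The bounds for $n = 1000$ are easy to reproduce. Here is a minimal Python sketch that evaluates $L_n$ and $R_n$ directly from their definitions; the function name `left_right_bounds` is a hypothetical choice for this illustration.

```python
import math

def left_right_bounds(n):
    """Return (L_n, R_n) for the integral of e^x over [0, 1]."""
    left = sum(math.exp((i - 1) / n) for i in range(1, n + 1)) / n
    right = sum(math.exp(i / n) for i in range(1, n + 1)) / n
    return left, right

L, R = left_right_bounds(1000)
print(f"{L:.4f} <= integral <= {R:.4f}")   # 1.7174 <= integral <= 1.7191

# The two sums share all terms except e^0 and e^1, so the gap
# R_n - L_n equals (e - 1)/n up to round-off.
print(R - L, (math.e - 1) / 1000)
```

The second line of output shows why the bounds tighten as $n$ grows: the gap between them shrinks like $1/n$.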
So far, we have only a single interpretation15 for definite integrals — namely areas
under graphs. In the following example, we develop a second interpretation.
Example 3.1.12 (Another Interpretation for Definite Integrals)
Suppose that a particle is moving along the x–axis and suppose that at time $t$ its velocity is $v(t)$ (with $v(t) > 0$ indicating rightward motion and $v(t) < 0$ indicating leftward motion). What is the change in its x–coordinate between time $a$ and time $b > a$?
We’ll work this out using a procedure similar to our definition of the integral. First pick a natural number $n$ and divide the time interval from $a$ to $b$ into $n$ equal subintervals, each of width $\frac{b-a}{n}$. We are working our way towards a Riemann sum (as we have done several times above) and so we will eventually take the limit $n \to \infty$.
15 If this were the only interpretation then integrals would be a nice mathematical curiosity and unlikely
to be the core topic of a large first year mathematics course.
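namely $v\!\left(a + \frac{b-a}{n}\right)$. So during the second subinterval the particle’s x–coordinate changes by approximately $v\!\left(a + \frac{b-a}{n}\right)\frac{b-a}{n}$.
\begin{align*}
& v(a)\,\frac{b-a}{n} + v\!\left(a + \frac{b-a}{n}\right)\frac{b-a}{n} + \cdots + v\!\left(a + (i-1)\frac{b-a}{n}\right)\frac{b-a}{n} + \cdots \\
& \qquad + v\!\left(a + (n-1)\frac{b-a}{n}\right)\frac{b-a}{n} \\
& = \sum_{i=1}^{n} v\!\left(a + (i-1)\frac{b-a}{n}\right)\frac{b-a}{n}
\end{align*}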
This is exactly the left Riemann sum approximation to the integral of $v$ from $a$ to $b$ with $n$ subintervals. The limit as $n \to \infty$ is exactly the definite integral $\int_a^b v(t)\,dt$. Following tradition, we have called the (dummy) integration variable $t$ rather than $x$ to remind us that it is time that is running from $a$ to $b$.
The conclusion of the above discussion is that if a particle is moving along the x–axis and its x–coordinate and velocity at time $t$ are $x(t)$ and $v(t)$, respectively, then, for all $b > a$,
\[ x(b) - x(a) = \int_a^b v(t)\,dt. \]
Example 3.1.12
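To see the displacement formula in action, one can pick a motion whose position is known and compare. The Python sketch below assumes a hypothetical position function $x(t) = t^3 - 4t$ (so $v(t) = 3t^2 - 4$); neither the function nor the name `displacement` comes from the text.

```python
def displacement(v, a, b, n):
    """Left Riemann sum approximation to the change in position over [a, b]."""
    dt = (b - a) / n
    return sum(v(a + (i - 1) * dt) for i in range(1, n + 1)) * dt

# Hypothetical motion: x(t) = t**3 - 4*t, so v(t) = x'(t) = 3*t**2 - 4.
x = lambda t: t**3 - 4 * t
v = lambda t: 3 * t**2 - 4

a, b = 0.0, 3.0
exact = x(b) - x(a)                 # 15.0
for n in (10, 100, 1000, 10000):
    print(n, displacement(v, a, b, n), exact)
```

Note that this $v(t)$ is negative for small $t$, so the particle first moves left; the sum still converges to the net change in position, not the total distance travelled.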
Example 3.1.13
Example 3.1.14
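Let $b > 0$. The integral $\int_0^b x\,dx$ is the area of the shaded triangle (of base $b$ and of height $b$) in the figure on the right below. So
\[ \int_0^b x\,dx = \frac{1}{2}\, b \times b = \frac{b^2}{2} \]
[Figure: the shaded triangle under $y = x$ for $0 \le x \le b$]
The integral $\int_{-b}^0 x\,dx$ is the signed area of the shaded triangle (again of base $b$ and of height $b$) in the figure on the right below. So
\[ \int_{-b}^0 x\,dx = -\frac{b^2}{2} \]
[Figure: the shaded triangle between $y = x$ and the x–axis for $-b \le x \le 0$, lying below the x–axis]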
Example 3.1.14
Notice that it is very easy to extend this example to the integral $\int_0^b cx\,dx$ for any real numbers $b, c > 0$ and find
\[ \int_0^b cx\,dx = \frac{c}{2}\, b^2. \]
Example 3.1.15
In this example, we shall evaluate $\int_{-1}^1 (1 - |x|)\,dx$. Recall that
\[ |x| = \begin{cases} -x & \text{if } x \le 0 \\ x & \text{if } x \ge 0 \end{cases} \]
so that
\[ 1 - |x| = \begin{cases} 1 + x & \text{if } x \le 0 \\ 1 - x & \text{if } x \ge 0 \end{cases} \]
To picture the geometric figure whose area the integral represents observe that
• at the left hand end of the domain of integration $x = -1$ and the integrand $1 - |x| = 1 - |-1| = 1 - 1 = 0$ and
• as $x$ increases from $-1$ towards 0, the integrand $1 - |x| = 1 + x$ increases linearly, until
• when $x$ hits 0 the integrand hits $1 - |x| = 1 - |0| = 1$ and then
• as $x$ increases from 0, the integrand $1 - |x| = 1 - x$ decreases linearly, until
• when $x$ hits $+1$, the right hand end of the domain of integration, the integrand hits $1 - |x| = 1 - |1| = 0$.
So the integral $\int_{-1}^1 (1 - |x|)\,dx$ is the area of the shaded triangle (of base 2 and of height 1) in the figure on the right below and
\[ \int_{-1}^1 (1 - |x|)\,dx = \frac{1}{2} \times 2 \times 1 = 1 \]
[Figure: the shaded triangle with vertices $(-1, 0)$, $(0, 1)$ and $(1, 0)$]
Example 3.1.15
Example 3.1.16
The integral $\int_0^1 \sqrt{1 - x^2}\,dx$ has integrand $f(x) = \sqrt{1 - x^2}$. So it represents the area under $y = \sqrt{1 - x^2}$ with $x$ running from 0 to 1. But we may rewrite
\[ y = \sqrt{1 - x^2} \qquad\text{as}\qquad x^2 + y^2 = 1,\ y \ge 0 \]
But this is the (implicit) equation for a circle; the extra condition that $y \ge 0$ makes it the equation for the semi-circle centred at the origin with radius 1 lying on and above the x-axis. Thus the integral represents the area of the quarter circle of radius 1, as shown in the figure on the right below. So
\[ \int_0^1 \sqrt{1 - x^2}\,dx = \frac{1}{4}\pi (1)^2 = \frac{\pi}{4} \]
[Figure: the shaded quarter circle of radius 1 in the first quadrant]
Example 3.1.16
This next one is a little trickier and relies on us knowing the symmetries of the sine
function.
Example 3.1.17
The integral $\int_{-\pi}^{\pi} \sin x\,dx$ is the signed area of the shaded region in the figure on the right below. It naturally splits into two regions, one on either side of the y-axis. We don’t know the formula for the area of either of these regions (yet), however the two regions are very nearly the same. In fact, the part of the shaded region below the x–axis is exactly the reflection, in the x–axis, of the part of the shaded region above the x–axis. So the signed area of the part of the shaded region below the x–axis is the negative of the signed area of the part of the shaded region above the x–axis and
\[ \int_{-\pi}^{\pi} \sin x\,dx = 0 \]
[Figure: the graph of $y = \sin x$ for $-\pi \le x \le \pi$, shaded between the curve and the x–axis]
Example 3.1.17
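The values found geometrically in the last few examples can be cross-checked numerically. The sketch below uses a midpoint Riemann sum, which is just one convenient choice of sample points; the helper name `midpoint_sum`, the value $b = 2$ and the number of subintervals are assumptions made for this illustration.

```python
import math

def midpoint_sum(f, a, b, n=100_000):
    """Midpoint Riemann sum of f on [a, b] with n equal subintervals."""
    dx = (b - a) / n
    return sum(f(a + (i - 0.5) * dx) for i in range(1, n + 1)) * dx

# (description, integrand, lower limit, upper limit, value found geometrically)
checks = [
    ("x on [0, 2]",             lambda x: x,                    0.0, 2.0, 2.0),
    ("1 - |x| on [-1, 1]",      lambda x: 1 - abs(x),          -1.0, 1.0, 1.0),
    ("sqrt(1 - x^2) on [0, 1]", lambda x: math.sqrt(1 - x * x), 0.0, 1.0, math.pi / 4),
    ("sin x on [-pi, pi]",      math.sin,                  -math.pi, math.pi, 0.0),
]
for name, f, a, b, expected in checks:
    approx = midpoint_sum(f, a, b)
    print(f"{name}: {approx:.6f} (geometry gives {expected:.6f})")
```

Agreement to several decimal places is reassuring, though of course the geometric arguments give the exact values and the Riemann sums only approximate them.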
16 The more natural way of thinking about this is reversed: given the price, how much quantity will the
consumer purchase. But formulating the relationship where price is a function of quantity (rather than
the other way around) is standard practice in economics texts, so we follow it here.
products at a unit price of pe . (If they make more goods, to sell them all they’d have to
charge less than they are willing to accept. If they make fewer goods, they will not meet
consumer demand.)
[Figure: supply curve $S(q)$ and demand curve $D(q)$, with price $p$ on the vertical axis and quantity $q$ on the horizontal axis, intersecting at $(q_e, p_e)$]
The consumer would have been happy to buy their first good at the price $D(1)$. We can say then that the first good has a value of $D(1)$ for the consumer. If they paid a lower price $p_e$, then the number $D(1) - p_e$ is a surplus to the consumer: they gained $D(1)$ units of value by paying only $p_e$ units of value. This surplus can be visualized as the shaded area below.
[Figure: the same supply and demand curves, with a shaded rectangle of width 1 between the heights $p_e$ and $D(1)$]
Similarly, the consumer would have been happy to buy their second good at the unit price $D(2)$. If they paid a smaller price $p_e$, then their surplus from that second good is $D(2) - p_e$: its value to them, minus what they actually paid. Their combined surplus after buying two goods can be visualized as the shaded rectangles below.
[Figure: the same supply and demand curves, with two shaded rectangles of width 1 rising from height $p_e$ to heights $D(1)$ and $D(2)$]
All together, we expect the consumer to buy qe units. Their total surplus is represented
by the shaded rectangles below.
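[Figure: the same supply and demand curves, with shaded rectangles between the line $p = p_e$ and the demand curve for $q$ from 0 to $q_e$]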
This motivates the definition of consumer surplus. Producer surplus behaves similarly.
Definition 3.1.18.
Consider a supply curve $S(q)$ and a demand curve $D(q)$ with intersection point $(q_e, p_e)$, graphed on the $(q, p)$-plane. The consumer surplus is the area from $q = 0$ to $q = q_e$ under $D(q)$ and above the line $p = p_e$. The producer surplus is the area from $q = 0$ to $q = q_e$ over $S(q)$ and under the line $p = p_e$. The total surplus is the sum of consumer surplus and producer surplus.
[Figure: supply curve $S(q)$ and demand curve $D(q)$ intersecting at $(q_e, p_e)$; the region labelled $P$ lies between $S(q)$ and the line $p = p_e$]
Given a sale of $q_e$ items at unit price $p_e$, we think of the consumer surplus as the net benefit to the consumer, and the producer surplus as the net benefit to the producer. To calculate these, we need a little geometric intuition. The consumer surplus is the area $\int_0^{q_e} D(q)\,dq$ minus the area of the rectangle with width $q_e$ and height $p_e$. So, the consumer surplus is
\[ C = \int_0^{q_e} D(q)\,dq - p_e q_e \]
Finally, the total surplus is the value gained by everybody, producers and consumers combined:
\[ T = C + P = \int_0^{q_e} D(q)\,dq - \int_0^{q_e} S(q)\,dq \]
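A small numerical example may make these formulas concrete. The Python sketch below assumes hypothetical linear curves $D(q) = 20 - 2q$ and $S(q) = 2 + q$, which intersect at $(q_e, p_e) = (6, 8)$; these curves are not from the text. The producer surplus is computed as $p_e q_e - \int_0^{q_e} S(q)\,dq$, which is how Definition 3.1.18 reads when the area over $S(q)$ and under $p = p_e$ is written as an integral.

```python
def midpoint_sum(f, a, b, n=10_000):
    """Midpoint Riemann sum of f on [a, b] with n equal subintervals."""
    dx = (b - a) / n
    return sum(f(a + (i - 0.5) * dx) for i in range(1, n + 1)) * dx

# Hypothetical linear curves, chosen to intersect at (q_e, p_e) = (6, 8).
D = lambda q: 20 - 2 * q    # demand
S = lambda q: 2 + q         # supply
q_e, p_e = 6.0, 8.0

consumer = midpoint_sum(D, 0.0, q_e) - p_e * q_e   # C = int_0^{q_e} D(q) dq - p_e q_e
producer = p_e * q_e - midpoint_sum(S, 0.0, q_e)   # P = p_e q_e - int_0^{q_e} S(q) dq
total = consumer + producer                        # T = C + P
print(consumer, producer, total)                   # approximately 36, 18 and 54
```

For straight-line curves both surpluses are areas of triangles, so the results can also be confirmed by hand: $\frac{1}{2} \times 6 \times 12 = 36$ and $\frac{1}{2} \times 6 \times 6 = 18$.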
INTEGRATION 3.2 BASIC PROPERTIES OF THE DEFINITE INTEGRAL
\[ \lim_{x\to a} (f(x) + g(x)) = \lim_{x\to a} f(x) + \lim_{x\to a} g(x) \qquad\text{and}\qquad \frac{d}{dx}(f(x) + g(x)) = \frac{df}{dx} + \frac{dg}{dx} \]
Some of these rules have very natural analogues for integrals and we discuss them below.
Unfortunately the analogous rules for integrals of products of functions or integrals of
compositions of functions are more complicated than those for limits or derivatives. We
discuss those rules at length in subsequent sections. For now let us consider some of the
simpler rules of the arithmetic of integrals.
It is not too hard to prove this result from the definition of the definite integral. Addi-
tionally we only really need to prove (d) and (e) since
• (a) follows from (d) by setting A = B = 1,
Proof. As noted above, it suffices for us to prove (d) and (e). Since (e) is easier, we will
start with that. It is also a good warm-up for (d).
• The definite integral in (e), $\int_a^b 1\,dx$, can be interpreted geometrically as the area of the rectangle with height 1 running from $x = a$ to $x = b$; this area is clearly $b - a$. We can also prove this formula from the definition of the integral (Definition 3.1.8):
\begin{align*}
\int_a^b dx &= \lim_{n\to\infty} \sum_{i=1}^{n} f(x_{i,n}^*)\,\frac{b-a}{n} && \text{by definition} \\
&= \lim_{n\to\infty} \sum_{i=1}^{n} 1 \cdot \frac{b-a}{n} && \text{since } f(x) = 1 \\
&= \lim_{n\to\infty} (b-a) \sum_{i=1}^{n} \frac{1}{n} && \text{since } a, b \text{ are constants} \\
&= \lim_{n\to\infty} (b-a) \\
&= b - a
\end{align*}
as required.
as required.
Using this Theorem we can integrate sums, differences and constant multiples of functions
we know how to integrate. For example:
Example 3.2.2
In Example 3.1.1 we saw that $\int_0^1 e^x\,dx = e - 1$. So
\begin{align*}
\int_0^1 \left(e^x + 7\right) dx &= \int_0^1 e^x\,dx + 7 \int_0^1 1\,dx \\
& \qquad \text{by Theorem 3.2.1(d) with } A = 1,\ f(x) = e^x,\ B = 7,\ g(x) = 1 \\
&= (e - 1) + 7 \times (1 - 0) \\
& \qquad \text{by Example 3.1.1 and Theorem 3.2.1(e)} \\
&= e + 6
\end{align*}
Example 3.2.2
When we gave the formal definition of $\int_a^b f(x)\,dx$ in Definition 3.1.8 we explained that the integral could be interpreted as the signed area between the curve $y = f(x)$ and the x-axis on the interval $[a, b]$. In order for this interpretation to make sense we required that $a < b$, and though we remarked that the integral makes sense when $a > b$ we did not explain any further. Thankfully there is an easy way to express the integral $\int_a^b f(x)\,dx$ in terms of $\int_b^a f(x)\,dx$, making it always possible to write an integral so the lower limit of integration is less than the upper limit of integration. Theorem 3.2.3, below, tells us that, for example, $\int_7^3 e^x\,dx = -\int_3^7 e^x\,dx$. The same theorem also provides us with two other simple manipulations of the limits of integration.
as required.
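• Consider now the definite integral $\int_a^b f(x)\,dx$. We will sneak up on the proof by first examining Riemann sum approximations to both this and $\int_b^a f(x)\,dx$. The midpoint Riemann sum approximation to $\int_a^b f(x)\,dx$ with 4 subintervals (so that each subinterval has width $\frac{b-a}{4}$) is
\begin{align*}
& \left\{ f\!\left(a + \tfrac{1}{2}\tfrac{b-a}{4}\right) + f\!\left(a + \tfrac{3}{2}\tfrac{b-a}{4}\right) + f\!\left(a + \tfrac{5}{2}\tfrac{b-a}{4}\right) + f\!\left(a + \tfrac{7}{2}\tfrac{b-a}{4}\right) \right\} \cdot \frac{b-a}{4} \\
&= \left\{ f\!\left(\tfrac{7}{8}a + \tfrac{1}{8}b\right) + f\!\left(\tfrac{5}{8}a + \tfrac{3}{8}b\right) + f\!\left(\tfrac{3}{8}a + \tfrac{5}{8}b\right) + f\!\left(\tfrac{1}{8}a + \tfrac{7}{8}b\right) \right\} \cdot \frac{b-a}{4}
\end{align*}
Now we do the same for $\int_b^a f(x)\,dx$ with 4 subintervals. Note that $b$ is now the lower limit on the integral and $a$ is now the upper limit on the integral. This is likely to cause confusion when we write out the Riemann sum, so we’ll temporarily rename $b$ to $A$ and $a$ to $B$. The midpoint Riemann sum approximation to $\int_A^B f(x)\,dx$ with 4 subintervals is
\begin{align*}
& \left\{ f\!\left(A + \tfrac{1}{2}\tfrac{B-A}{4}\right) + f\!\left(A + \tfrac{3}{2}\tfrac{B-A}{4}\right) + f\!\left(A + \tfrac{5}{2}\tfrac{B-A}{4}\right) + f\!\left(A + \tfrac{7}{2}\tfrac{B-A}{4}\right) \right\} \cdot \frac{B-A}{4} \\
&= \left\{ f\!\left(\tfrac{7}{8}A + \tfrac{1}{8}B\right) + f\!\left(\tfrac{5}{8}A + \tfrac{3}{8}B\right) + f\!\left(\tfrac{3}{8}A + \tfrac{5}{8}B\right) + f\!\left(\tfrac{1}{8}A + \tfrac{7}{8}B\right) \right\} \cdot \frac{B-A}{4}
\end{align*}
Thus we see that the Riemann sums for the two integrals are nearly identical: the only difference is the factor of $\frac{b-a}{4}$ versus $\frac{a-b}{4}$. Hence the two Riemann sums are negatives of each other.
The same computation with $n$ subintervals shows that the midpoint Riemann sum approximations to $\int_b^a f(x)\,dx$ and $\int_a^b f(x)\,dx$ with $n$ subintervals are negatives of each other. Taking the limit $n \to \infty$ gives $\int_b^a f(x)\,dx = -\int_a^b f(x)\,dx$.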
• Finally consider (c). We will not give a formal proof of this, but instead will interpret it geometrically. Indeed one can also interpret (a) geometrically. In both cases these become statements about areas:
\[ \int_a^a f(x)\,dx = 0 \qquad\text{and}\qquad \int_a^b f(x)\,dx = \int_a^c f(x)\,dx + \int_c^b f(x)\,dx \]
are
\[ \text{Area}\{ (x,y) \mid a \le x \le a,\ 0 \le y \le f(x) \} = 0 \]
and
\begin{align*}
\text{Area}\{ (x,y) \mid a \le x \le b,\ 0 \le y \le f(x) \} &= \text{Area}\{ (x,y) \mid a \le x \le c,\ 0 \le y \le f(x) \} \\
& \quad + \text{Area}\{ (x,y) \mid c \le x \le b,\ 0 \le y \le f(x) \}
\end{align*}
respectively. Both of these geometric statements are intuitively obvious. See the figures below.
[Figure: left panel: the graph of $y = f(x)$ with the single point $x = a$ marked; right panel: the region under $y = f(x)$ from $a$ to $b$, split at $x = c$]
Note that we have assumed that $a \le c \le b$ and that $f(x) \ge 0$. One can remove these restrictions and also make the proof more formal, but it becomes quite tedious and less intuitive.
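Both manipulations of the limits of integration can be observed numerically. The following Python sketch is illustrative only: the helper `midpoint_sum`, the integrand $e^x$ and the values $a = 3$, $b = 7$, $c = 4.5$ are assumptions chosen for the example.

```python
import math

def midpoint_sum(f, a, b, n):
    """Midpoint Riemann sum; it also works when a > b, since (b - a)/n is then negative."""
    dx = (b - a) / n
    return sum(f(a + (i - 0.5) * dx) for i in range(1, n + 1)) * dx

f = math.exp                 # sample integrand
a, b, c = 3.0, 7.0, 4.5

# Reversing the limits: same four sample points, opposite sign of the width.
forward = midpoint_sum(f, a, b, 4)
backward = midpoint_sum(f, b, a, 4)
print(forward, backward)

# Splitting the interval at c: the two pieces add up to the whole.
whole = midpoint_sum(f, a, b, 20_000)
parts = midpoint_sum(f, a, c, 20_000) + midpoint_sum(f, c, b, 20_000)
print(whole, parts)
```

The first pair of numbers should be negatives of each other up to round-off, and the second pair should agree up to the (small) discretization error of the sums.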
Example 3.2.4
Back in Example 3.1.14 we saw that when $b > 0$, $\int_0^b x\,dx = \frac{b^2}{2}$. We’ll now verify that $\int_0^b x\,dx = \frac{b^2}{2}$ is still true when $b = 0$ and also when $b < 0$.
• First consider $b = 0$. Then the statement $\int_0^b x\,dx = \frac{b^2}{2}$ becomes
\[ \int_0^0 x\,dx = 0 \]
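So we have
\begin{align*}
\int_0^b x\,dx = \int_0^{-B} x\,dx &= -\int_{-B}^0 x\,dx && \text{by Theorem 3.2.3(b)} \\
&= -\left(-\frac{B^2}{2}\right) && \text{by Example 3.1.14} \\
&= \frac{B^2}{2} = \frac{b^2}{2}
\end{align*}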
Example 3.2.4
Example 3.2.5
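Applying Theorem 3.2.3 yet again, we have, for all real numbers $a$ and $b$,
\begin{align*}
\int_a^b x\,dx &= \int_a^0 x\,dx + \int_0^b x\,dx && \text{by Theorem 3.2.3(c) with } c = 0 \\
&= \int_0^b x\,dx - \int_0^a x\,dx && \text{by Theorem 3.2.3(b)} \\
&= \frac{b^2 - a^2}{2} && \text{by Example 3.2.4, twice}
\end{align*}
We can also understand this result geometrically.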
We can also understand this result geometrically.
• (left) When $0 < a < b$, the integral represents the area in green which is the difference of two right–angle triangles, the larger with area $b^2/2$ and the smaller with area $a^2/2$.
• (centre) When $a < 0 < b$, the integral represents the signed area of the two displayed triangles. The one above the axis has area $b^2/2$ while the one below has area $-a^2/2$ (since it is below the axis).
• (right) When $a < b < 0$, the integral represents the signed area in purple of the difference between the two triangles, the larger with area $-a^2/2$ and the smaller with area $-b^2/2$.
Example 3.2.5
Theorem 3.2.3(c) shows us how we can split an integral over a larger interval into integrals over two (or more) smaller intervals. This is particularly useful for dealing with piecewise functions, like $|x|$.
Example 3.2.6
Using Theorem 3.2.3, we can readily evaluate integrals involving $|x|$. First, recall that
\[ |x| = \begin{cases} x & \text{if } x \ge 0 \\ -x & \text{if } x < 0 \end{cases} \]
Now consider (for example) $\int_{-2}^3 |x|\,dx$. Since the integrand changes at $x = 0$, it makes sense to split the interval of integration at that point:
\begin{align*}
\int_{-2}^3 |x|\,dx &= \int_{-2}^0 |x|\,dx + \int_0^3 |x|\,dx && \text{by Theorem 3.2.3} \\
&= \int_{-2}^0 (-x)\,dx + \int_0^3 x\,dx && \text{by definition of } |x| \\
&= -\int_{-2}^0 x\,dx + \int_0^3 x\,dx && \text{by Theorem 3.2.1(c)} \\
&= -(-2^2/2) + (3^2/2) = (4 + 9)/2 \\
&= 13/2
\end{align*}
We can go further still: given a function $f(x)$ we can rewrite the integral of $f(|x|)$ in terms of the integral of $f(x)$ and $f(-x)$.
\begin{align*}
\int_{-1}^1 f\big(|x|\big)\,dx &= \int_{-1}^0 f\big(|x|\big)\,dx + \int_0^1 f\big(|x|\big)\,dx \\
&= \int_{-1}^0 f(-x)\,dx + \int_0^1 f(x)\,dx
\end{align*}
Example 3.2.6
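The splitting strategy for $|x|$ can be written as a tiny program. This Python sketch relies on the formula $\int_a^b x\,dx = \frac{b^2 - a^2}{2}$ from Example 3.2.5; the function names are hypothetical, and the helper only handles the case $a \le 0 \le b$ treated above.

```python
def integral_of_x(a, b):
    """Exact value of the integral of x from a to b, namely (b^2 - a^2)/2."""
    return (b**2 - a**2) / 2

def integral_of_abs_x(a, b):
    """Integral of |x| from a to b for a <= 0 <= b, split at x = 0."""
    left = -integral_of_x(a, 0)    # on [a, 0], |x| = -x
    right = integral_of_x(0, b)    # on [0, b], |x| = x
    return left + right

print(integral_of_abs_x(-2, 3))    # 6.5, i.e. 13/2
```

The same two-piece pattern applies to any integrand built from $|x|$: handle $x \le 0$ and $x \ge 0$ separately, then add.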
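Now we apply parts (a) and (b) of Theorem 3.2.1, and then
\begin{align*}
\int_{-1}^1 \big(1 - |x|\big)\,dx &= \int_{-1}^0 1\,dx + \int_{-1}^0 x\,dx + \int_0^1 1\,dx - \int_0^1 x\,dx \\
&= [0 - (-1)] + \frac{0^2 - (-1)^2}{2} + [1 - 0] - \frac{1^2 - 0^2}{2} \\
&= 1
\end{align*}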
Example 3.2.7
18 We haven’t done this in this course, but you should have seen it in your differential calculus course or
perhaps even earlier.
Definition3.2.8.
Of course most functions are neither even nor odd, but many of the standard functions
you know are.
Example 3.2.9 (Even functions)
[Figure: two graphs plotted for $-\pi \le x \le \pi$, with the vertical axis running from $-1$ to $1$]
are reflections of each other in the y–axis and so have the same signed area. That is
\[ \int_0^a f(x)\,dx = \int_{-a}^0 f(x)\,dx \]
Example 3.2.9
[Figure: two graphs plotted for $-\pi \le x \le \pi$, with the vertical axis running from $-1$ to $1$]
• In particular, if $f(x)$ is an odd function and $a > 0$, then the signed areas of the two sets
are negatives of each other: to get from the first set to the second set, you flip it upside down, in addition to reflecting it in the y–axis. That is
\[ \int_0^a f(x)\,dx = -\int_{-a}^0 f(x)\,dx \]
Example 3.2.10
in order to simplify the integration of even and odd functions over intervals of the form
[´a, a].
Let $a > 0$.
When f is even, the two terms on the right hand side are equal. When f is odd, the two
terms on the right hand side are negatives of each other.
(d) We have
\[ \left| \int_a^b f(x)\,dx \right| \le \int_a^b |f(x)|\,dx \]
Proof. (a) By interpreting the integral as the signed area, this statement simply says that if the curve $y = f(x)$ lies above the x–axis and $a \le b$, then the signed area of $\{ (x,y) \mid a \le x \le b,\ 0 \le y \le f(x) \}$ is at least zero. This is quite clear. Alternatively, we could argue more algebraically from Definition 3.1.8. We observe that when we define $\int_a^b f(x)\,dx$ via Riemann sums, every summand, $f(x_{i,n}^*)\,\frac{b-a}{n} \ge 0$. Thus the whole sum is nonnegative and consequently, so is the limit, and thus so is the integral.
(b) We can argue this from (a) with a little massaging. Let $g(x) = M - f(x)$, then since $f(x) \le M$, we have $g(x) = M - f(x) \ge 0$ so that
\[ \int_a^b \big(M - f(x)\big)\,dx = \int_a^b g(x)\,dx \ge 0. \]
INTEGRATION 3.3 THE FUNDAMENTAL THEOREM OF CALCULUS
Example 3.2.13 ($\int_0^{\pi/3} \sqrt{\cos x}\,dx$)
This is not so easy to compute exactly19 , but we can bound it quite quickly.
For $x$ between 0 and $\frac{\pi}{3}$, the function $\cos x$ takes values20 between 1 and $\frac{1}{2}$. Thus the function $\sqrt{\cos x}$ takes values between 1 and $\frac{1}{\sqrt{2}}$. That is
\[ \frac{1}{\sqrt{2}} \le \sqrt{\cos x} \le 1 \qquad \text{for } 0 \le x \le \frac{\pi}{3}. \]
Example 3.2.13
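The pointwise bounds on $\sqrt{\cos x}$ can be compared with a numerical estimate of the integral. In the Python sketch below, multiplying each bound by the interval length $\frac{\pi}{3}$ gives candidate lower and upper bounds $\frac{\pi}{3\sqrt{2}}$ and $\frac{\pi}{3}$; the helper `midpoint_sum` and the choice of 100000 subintervals are assumptions made for illustration.

```python
import math

def midpoint_sum(f, a, b, n=100_000):
    """Midpoint Riemann sum of f on [a, b] with n equal subintervals."""
    dx = (b - a) / n
    return sum(f(a + (i - 0.5) * dx) for i in range(1, n + 1)) * dx

b = math.pi / 3
approx = midpoint_sum(lambda x: math.sqrt(math.cos(x)), 0.0, b)

lower = b / math.sqrt(2)   # integrand is at least 1/sqrt(2) on the whole interval
upper = b * 1.0            # integrand is at most 1 on the whole interval
print(lower, approx, upper)
```

The numerical estimate should land between the two bounds, and it can be compared with the value quoted in footnote 19.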
19 It is not too hard to use Riemann sums and a computer to evaluate it numerically: 0.948025319 . . . .
20 You know the graphs of sine and cosine, so you should be able to work this out without too much
difficulty.
21 You learned these near the end of your differential calculus course. Now is a good time to revise — but
we’ll go over them here since they are so important in what follows.
\[ F'(x) = f(x) \]
Part 2: Let $G(x)$ be any function which is defined and continuous on $[a, b]$. Further let $G(x)$ be differentiable with $G'(x) = f(x)$ for all $a < x < b$. Then
\[ \int_a^b f(x)\,dx = G(b) - G(a) \qquad\text{or equivalently}\qquad \int_a^b G'(x)\,dx = G(b) - G(a) \]
Before we prove this theorem and look at a bunch of examples of its application, it is important that we recall one definition from differential calculus: antiderivatives. If $F'(x) = f(x)$ on some interval, then $F(x)$ is called an antiderivative of $f(x)$ on that interval. So Part 2 of the Fundamental Theorem of Calculus tells us how to evaluate the definite integral of $f(x)$ in terms of any of its antiderivatives: if $G(x)$ is any antiderivative of $f(x)$ then
\[ \int_a^b f(x)\,dx = G(b) - G(a) \]
The form $\int_a^b G'(x)\,dx = G(b) - G(a)$ of the Fundamental Theorem relates the rate of change of $G(x)$ over the interval $a \le x \le b$ to the net change of $G$ between $x = a$ and $x = b$. For that reason, it is sometimes called the “net change theorem”.
We’ll start with a simple example. Then we’ll see why the Fundamental Theorem is
true and then we’ll do many more, and more involved, examples.
Example 3.3.2 (A first example)
Consider the integral $\int_a^b x\,dx$ which we have explored previously in Example 3.2.5.
• The integrand is $f(x) = x$.
• We can readily verify that $G(x) = \frac{x^2}{2}$ satisfies $G'(x) = f(x)$ and so is an antiderivative of the integrand.
• Part 2 of Theorem 3.3.1 then tells us that
\begin{align*}
\int_a^b f(x)\,dx &= G(b) - G(a) \\
\int_a^b x\,dx &= \frac{b^2}{2} - \frac{a^2}{2}
\end{align*}
which is precisely the result we obtained (with more work) in Example 3.2.5.
Example 3.3.2
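Part 2 of the Fundamental Theorem can be tested numerically by comparing a Riemann sum with $G(b) - G(a)$. The sketch below is illustrative: the helper names, the limits $a = -1$, $b = 3$ and the second test case (using the fact that $e^x$ is its own antiderivative) are assumptions introduced for this example.

```python
import math

def midpoint_sum(f, a, b, n=2000):
    """Midpoint Riemann sum of f on [a, b] with n equal subintervals."""
    dx = (b - a) / n
    return sum(f(a + (i - 0.5) * dx) for i in range(1, n + 1)) * dx

def ftc_value(G, a, b):
    """Part 2 of the Fundamental Theorem: G(b) - G(a) for an antiderivative G."""
    return G(b) - G(a)

# (integrand f, antiderivative G, lower limit, upper limit)
cases = [
    (lambda x: x, lambda x: x**2 / 2, -1.0, 3.0),   # Example 3.3.2 with a = -1, b = 3
    (math.exp,    math.exp,            0.0, 1.0),   # gives e - 1, as in Example 3.1.1
]
for f, G, a, b in cases:
    print(midpoint_sum(f, a, b), ftc_value(G, a, b))
```

The antiderivative route needs two function evaluations, while the Riemann sum needs thousands, which hints at why the theorem is so useful in practice.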
We do not give completely rigorous proofs of the two parts of the theorem — that is
not really needed for this course. We just give the main ideas of the proofs so that you can
understand why the theorem is true.
Part 1. We wish to show that if
\[ F(x) = \int_a^x f(t)\,dt \qquad\text{then}\qquad F'(x) = f(x) \]
\[ F'(x) = \lim_{h\to 0} \frac{F(x+h) - F(x)}{h} \]
[Figure: the graph of $y = f(t)$ with $a$, $x$ and $x + h$ marked on the $t$–axis and the region between $t = x$ and $t = x + h$ darkly shaded]
• We will be taking the limit $h \to 0$. So suppose that $h$ is very small. Then, as $t$ runs from $x$ to $x + h$, $f(t)$ runs only over a very narrow range of values22 , all close to $f(x)$.
• So the darkly shaded region is almost a rectangle of width $h$ and height $f(x)$ and so has an area which is very close to $f(x)h$. Thus $\frac{F(x+h) - F(x)}{h}$ is very close to $f(x)$.
• In the limit $h \to 0$, $\frac{F(x+h) - F(x)}{h}$ becomes exactly $f(x)$, which is precisely what we want.
We can make the above more rigorous using the Mean Value Theorem23 .
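The heuristic argument for Part 1 can be watched numerically: approximate $F$ by a Riemann sum and form the difference quotient for shrinking $h$. In this Python sketch the integrand $e^{-t^2}$, the point $x = 0.8$ and the helper name `F` with its parameters are assumptions chosen for illustration.

```python
import math

def F(x, f, a=0.0, n=20_000):
    """Approximate F(x) = integral of f from a to x by a midpoint Riemann sum."""
    dt = (x - a) / n
    return sum(f(a + (i - 0.5) * dt) for i in range(1, n + 1)) * dt

f = lambda t: math.exp(-t * t)     # sample integrand

x = 0.8
for h in (0.1, 0.01, 0.001):
    quotient = (F(x + h, f) - F(x, f)) / h
    print(h, quotient, f(x))
```

As $h$ shrinks, the middle column should approach the last column, $f(x)$, in line with $F'(x) = f(x)$.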
Part 2. We want to show that $\int_a^b f(t)\,dt = G(b) - G(a)$. To do this we exploit the fact that the derivative of a constant is zero.
• Let
\[ H(x) = \int_a^x f(t)\,dt - G(x) + G(a) \]
Then the result we wish to prove is that $H(b) = 0$. We will do this by showing that $H(x) = 0$ for all $x$ between $a$ and $b$.
• We first show that $H(x)$ is constant by computing its derivative:
\[ H'(x) = \frac{d}{dx} \int_a^x f(t)\,dt - \frac{d}{dx}\big(G(x)\big) + \frac{d}{dx}\big(G(a)\big) \]
Since $G(a)$ is a constant, its derivative is 0 and by assumption the derivative of $G(x)$ is just $f(x)$, so
\[ H'(x) = \frac{d}{dx} \int_a^x f(t)\,dt - f(x) \]
Now Part 1 of the theorem tells us that this derivative is just $f(x)$, so
\[ H'(x) = f(x) - f(x) = 0 \]
Hence $H$ is constant.
• To determine which constant we just compute $H(a)$:
\begin{align*}
H(a) &= \int_a^a f(t)\,dt - G(a) + G(a) \\
&= \int_a^a f(t)\,dt \\
&= 0 && \text{by Theorem 3.2.3(a)}
\end{align*}
as required.
The simple example we did above (Example 3.3.2) demonstrates the application of
part 2 of the Fundamental Theorem of Calculus. Before we do more examples (and there
will be many more over the coming sections) we should do some examples illustrating
the use of part 1 of the Fundamental Theorem of Calculus. Then we’ll move on to part 2.
Example 3.3.3 ($\frac{d}{dx}\int_0^x t\,dt$)
Consider the integral $\int_0^x t\,dt$. We know how to evaluate this: it is just Example 3.3.2 with
a = 0, b = x. So we have two ways to compute the derivative. We can evaluate the integral
and then take the derivative, or we can apply Part 1 of the Fundamental Theorem.
We’ll do both, and check that the two answers are the same.
First, Example 3.3.2 gives
\[ F(x) = \int_0^x t\,dt = \frac{x^2}{2} \]
so $F'(x) = x$. Second, Part 1 of the Fundamental Theorem with f(t) = t and a = 0 gives
$F'(x) = f(x) = x$, and the two answers agree.
In the previous example we were able to evaluate the integral explicitly, so we did not
need the Fundamental Theorem to determine its derivative. Here is an example that really
does require the use of the Fundamental Theorem.
Example 3.3.4 ($\frac{d}{dx}\int_0^x e^{-t^2}\,dt$)
We would like to find $\frac{d}{dx}\int_0^x e^{-t^2}\,dt$. In the previous example, we were able to compute the
corresponding derivative in two ways: we could explicitly compute the integral and
then differentiate the result, or we could apply part 1 of the Fundamental Theorem of Calculus.
In this example we do not know the integral explicitly. Indeed it is not possible
to express²⁴ the integral $\int_0^x e^{-t^2}\,dt$ as a finite combination of standard functions such as
polynomials, exponentials, trigonometric functions and so on.
Despite this, we can find its derivative by just applying the first part of the Fundamental
Theorem of Calculus with $f(t) = e^{-t^2}$ and a = 0. That gives
\begin{align*}
\frac{d}{dx}\int_0^x e^{-t^2}\,dt &= \frac{d}{dx}\int_0^x f(t)\,dt \\
&= f(x) = e^{-x^2}
\end{align*}
24 The integral $\int_0^x e^{-t^2}\,dt$ is closely related to the “error function” which is an extremely important function
in mathematics. While we cannot express this integral (or the error function) as a finite combination of
polynomials, exponentials etc, we can express it as an infinite series
\[ \int_0^x e^{-t^2}\,dt = x - \frac{x^3}{3\cdot 1} + \frac{x^5}{5\cdot 2} - \frac{x^7}{7\cdot 3!} + \frac{x^9}{9\cdot 4!} + \cdots + (-1)^k \frac{x^{2k+1}}{(2k+1)\cdot k!} + \cdots \]
Example 3.3.4
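The conclusion of this example can also be checked numerically. The following is only a sketch, not part of the text's argument: it approximates $F(x) = \int_0^x e^{-t^2}\,dt$ with composite Simpson's rule and compares a central difference quotient of F with $e^{-x^2}$. The helper names (`simpson`, `F`, `difference_quotient`) and the step sizes are assumptions made for this illustration.

```python
import math


def simpson(f, a, b, n=1000):
    """Composite Simpson's rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def f(t):
    """The integrand e^(-t^2)."""
    return math.exp(-t * t)


def F(x):
    """Numerical approximation of the integral of f from 0 to x."""
    return simpson(f, 0.0, x)


def difference_quotient(x, h=1e-4):
    """Central difference approximation to F'(x)."""
    return (F(x + h) - F(x - h)) / (2 * h)


for x in (0.5, 1.0, 2.0):
    # The two printed columns should agree to several decimal places.
    print(x, difference_quotient(x), f(x))
```

The agreement is only approximate, since both the integral and the derivative are computed numerically.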
Let us ratchet up the complexity of the previous example: we can make the limits
of the integral more complicated functions. So consider the previous example with the
upper limit x replaced by $x^2$:
Example 3.3.5 ($\frac{d}{dx}\int_0^{x^2} e^{-t^2}\,dt$)
Consider the integral $\int_0^{x^2} e^{-t^2}\,dt$. We would like to compute its derivative with respect to x
using part 1 of the Fundamental Theorem of Calculus.
The Fundamental Theorem tells us how to compute the derivative of functions of the
form $\int_a^x f(t)\,dt$, but the integral at hand is not of the specified form because the upper limit
we have is $x^2$, rather than x, so more care is required. Thankfully we can deal with this
obstacle with only a little extra work. The trick is to define an auxiliary function by simply
changing the upper limit to x. That is, define
\[ E(x) = \int_0^x e^{-t^2}\,dt \]
The derivative $E'(x)$ can be found via part 1 of the Fundamental Theorem of Calculus (as
we did in Example 3.3.4) and is $E'(x) = e^{-x^2}$. We can then use this fact with the chain rule
to compute the derivative we need:
\begin{align*}
\frac{d}{dx}\int_0^{x^2} e^{-t^2}\,dt &= \frac{d}{dx}E(x^2) && \text{use the chain rule} \\
&= 2x\,E'(x^2) \\
&= 2x\,e^{-x^4}
\end{align*}
Example 3.3.5
What if both limits of integration are functions of x? We can still make this work, but
we have to split the integral using Theorem 3.2.3.
Example 3.3.6 ($\frac{d}{dx}\int_x^{x^2} e^{-t^2}\,dt$)
As was the case in the previous example, we have to do a little pre-processing before we
can apply the Fundamental Theorem.
This time (by design), not only is the upper limit of integration $x^2$ rather than x, but the
lower limit of integration also depends on x. This is different from the integral $\int_a^x f(t)\,dt$
in the Fundamental Theorem where the lower limit of integration is a constant.
Fortunately we can use the basic properties of integrals (Theorem 3.2.3(b) and (c)) to
split $\int_x^{x^2} e^{-t^2}\,dt$ into pieces whose derivatives we already know.
\begin{align*}
\int_x^{x^2} e^{-t^2}\,dt &= \int_x^0 e^{-t^2}\,dt + \int_0^{x^2} e^{-t^2}\,dt && \text{by Theorem 3.2.3(c)} \\
&= -\int_0^x e^{-t^2}\,dt + \int_0^{x^2} e^{-t^2}\,dt && \text{by Theorem 3.2.3(b)}
\end{align*}
With this pre-processing, both integrals are of the right form. Using what we have learned
in the previous two examples,
\begin{align*}
\frac{d}{dx}\int_x^{x^2} e^{-t^2}\,dt &= \frac{d}{dx}\left( -\int_0^x e^{-t^2}\,dt + \int_0^{x^2} e^{-t^2}\,dt \right) \\
&= -\frac{d}{dx}\int_0^x e^{-t^2}\,dt + \frac{d}{dx}\int_0^{x^2} e^{-t^2}\,dt \\
&= -e^{-x^2} + 2x\,e^{-x^4}
\end{align*}
Example 3.3.6
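As a sanity check on this result, the sketch below compares the formula $-e^{-x^2} + 2x\,e^{-x^4}$ with a numerical derivative of $G(x) = \int_x^{x^2} e^{-t^2}\,dt$. It is an illustration under stated assumptions: the names `simpson`, `G`, `formula` and `numerical_derivative`, and the choice of Simpson's rule and a central difference, are not from the text.

```python
import math


def simpson(f, a, b, n=1000):
    """Composite Simpson's rule; also works when b < a (the result changes sign)."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def integrand(t):
    return math.exp(-t * t)


def G(x):
    """Integral of e^(-t^2) from t = x to t = x^2, computed numerically."""
    return simpson(integrand, x, x * x)


def formula(x):
    """The derivative obtained in the example."""
    return -math.exp(-x ** 2) + 2 * x * math.exp(-x ** 4)


def numerical_derivative(x, h=1e-4):
    """Central difference approximation to G'(x)."""
    return (G(x + h) - G(x - h)) / (2 * h)


for x in (0.5, 1.0, 1.5):
    print(x, numerical_derivative(x), formula(x))
```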
Definition 3.3.7 (Antiderivatives).
Lemma 3.3.8.
Proof. There are two parts to the lemma and we prove each in turn.
\begin{align*}
\frac{d}{dx}\big(F(x) + C\big) &= \frac{d}{dx}\big(F(x)\big) + \frac{d}{dx}(C) \\
&= f(x) + 0
\end{align*}
\[ \frac{d}{dx}H(x) = \frac{d}{dx}\big(G(x) - F(x)\big) = \frac{d}{dx}G(x) - \frac{d}{dx}F(x) = f(x) - f(x) = 0 \]
25 This follows from the Mean Value Theorem. Say H(x) were not constant, then there would be two
numbers a < b so that $H(a) \neq H(b)$. Then the MVT tells us that there is a number c between a and b so
that
\[ H'(c) = \frac{H(b) - H(a)}{b - a}. \]
Since both numerator and denominator are non-zero, we know the derivative at c is nonzero. But
this would contradict the assumption that the derivative of H is zero. Hence we cannot have a < b with
$H(a) \neq H(b)$ and so H(x) must be constant.
Definition 3.3.9.
where the C is an arbitrary constant. In this context, the constant C is also often
called a “constant of integration”.
Notation 3.3.10.
The symbol
\[ \int f(x)\,dx\,\Big|_a^b \]
Notice that this notation allows us to write part 2 of the Fundamental Theorem as
\begin{align*}
\int_a^b f(x)\,dx &= \int f(x)\,dx\,\Big|_a^b \\
&= F(x)\Big|_a^b = F(b) - F(a)
\end{align*}
\[ \frac{d}{dx}\int f(x)\,dx = f(x). \]
• This is pretty close to what we want except for the factor of 2. Since this is a constant
we can just divide both sides by 2 to obtain:
\[ \frac{1}{2}\cdot\frac{d}{dx}x^2 = \frac{1}{2}\cdot 2x \quad\text{which becomes}\quad \frac{d}{dx}\left(\frac{x^2}{2}\right) = x \]
which is exactly what we need. It tells us that $x^2/2$ is an antiderivative of x.
• Once one has an antiderivative, it is easy to compute the indefinite integral
\[ \int x\,dx = \frac{1}{2}x^2 + C \]
as well as the definite integral:
\begin{align*}
\int_1^2 x\,dx &= \frac{1}{2}x^2\Big|_1^2 && \text{since $x^2/2$ is an antiderivative of $x$} \\
&= \frac{1}{2}2^2 - \frac{1}{2}1^2 = \frac{3}{2}
\end{align*}
26 Of course, this assumes that you did your differential calculus course last term. If you did that course at
a different time then please think back to that point in time. If it is long enough ago that you don’t quite
remember when it was, then you should probably do some revision of derivatives of simple functions
before proceeding further.
Example 3.3.11
While the previous example could be computed using signed areas, the following example
would be very difficult to compute without using the Fundamental Theorem of Calculus.
Example 3.3.12
Compute $\int_0^{\pi/2} \sin x\,dx$.
Solution.
• Once again, the crux of the solution is guessing the antiderivative of sin x, that is,
finding a function whose derivative is sin x.
• The standard derivative that comes closest to sin x is
\[ \frac{d}{dx}\cos x = -\sin x \]
which is the derivative we want, multiplied by a factor of $-1$.
• Just as we did in the previous example, we multiply this equation by a constant to
remove this unwanted factor:
\[ (-1)\cdot\frac{d}{dx}\cos x = (-1)\cdot(-\sin x) \quad\text{giving us}\quad \frac{d}{dx}(-\cos x) = \sin x \]
This tells us that $-\cos x$ is an antiderivative of sin x.
• Now it is straightforward to compute the integral:
\begin{align*}
\int_0^{\pi/2} \sin x\,dx &= -\cos x\,\Big|_0^{\pi/2} && \text{since $-\cos x$ is an antiderivative of $\sin x$} \\
&= -\cos\frac{\pi}{2} + \cos 0 \\
&= 0 + 1 = 1
\end{align*}
Example 3.3.12
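To see how much work the Fundamental Theorem saves, the sketch below compares the antiderivative evaluation with a brute-force midpoint Riemann sum. It is only an illustration; the function names and the choice of 10,000 subintervals are assumptions.

```python
import math


def midpoint_sum(f, a, b, n):
    """Midpoint Riemann sum of f over [a, b] with n equal subintervals."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


def antiderivative(x):
    """-cos x, an antiderivative of sin x."""
    return -math.cos(x)


# Part 2 of the Fundamental Theorem: F(b) - F(a).
exact = antiderivative(math.pi / 2) - antiderivative(0.0)

# Direct approximation of the area under sin x on [0, pi/2].
approx = midpoint_sum(math.sin, 0.0, math.pi / 2, 10_000)

print(exact, approx)
```

The Riemann sum needs thousands of function evaluations to approach the value that the antiderivative gives in two.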
Example 3.3.13
Find $\int_1^2 \frac{1}{x}\,dx$.
Solution.
• Once again, the crux of the solution is guessing a function whose derivative is $\frac{1}{x}$.
Our standard way to differentiate powers of x, namely
\[ \frac{d}{dx}x^n = n x^{n-1}, \]
doesn’t work in this case, since it would require us to pick n = 0 and this would
give
\[ \frac{d}{dx}x^0 = \frac{d}{dx}1 = 0. \]
\[ \frac{d}{dx}\ln x = \frac{1}{x} \]
which is exactly the derivative we want.
Example 3.3.13
Example 3.3.14
Find $\int_{-2}^{-1} \frac{1}{x}\,dx$.
Solution.
\[ \frac{d}{dx}\ln x = \frac{1}{x} \]
and if we naively use this here, then we will obtain
\[ \int_{-2}^{-1} \frac{1}{x}\,dx = \ln(-1) - \ln(-2) \]
which makes no sense since the logarithm is only defined for positive numbers²⁸.
• We can work around this problem using a slight variation of the logarithm: $\ln|x|$.
27 To align with what you probably saw in high school, we’ll use ln x to denote the natural logarithm.
This is unambiguous: ln x is always the same as $\log_e x$.
On the other hand, the precise meaning of log x is not universal. The implied base may be 10 (com-
mon in chemistry and physics), e (common in math and computer languages like Java, C, Python, and
MATLAB), or 2 (common in computer science).
28 This is not entirely true — one can extend the definition of the logarithm to negative numbers, but to
do so one needs to understand complex numbers which is a topic beyond the scope of this course.
Example 3.3.14
This next example raises a nasty issue that requires a little care. We know that the
function 1/x is not defined at x = 0 — so can we integrate over an interval that contains
x = 0 and still obtain an answer that makes sense? More generally can we integrate a
function over an interval on which that function has discontinuities?
Example 3.3.15
Find $\int_{-1}^{1} \frac{1}{x^2}\,dx$.
Solution. Beware that this is a particularly nasty example, which illustrates a booby trap
hidden in the Fundamental Theorem of Calculus. The booby trap explodes when the
theorem is applied sloppily.
• The sloppy solution starts, as our previous examples have, by finding an antiderivative
of the integrand. In this case we know that
\[ \frac{d}{dx}\frac{1}{x} = -\frac{1}{x^2} \]
which means that $-x^{-1}$ is an antiderivative of $x^{-2}$.
\begin{align*}
\int_{-1}^{1} x^{-2}\,dx &= -\frac{1}{x}\,\Big|_{-1}^{1} && \text{since $-1/x$ is an antiderivative of $1/x^2$} \\
&= -\frac{1}{1} - \left(-\frac{1}{-1}\right) \\
&= -2
\end{align*}
Unfortunately,
• At this point we should really start to be concerned. This answer cannot be correct.
Our integrand, being a square, is positive everywhere. So our integral represents the
area of a region above the x–axis and must be positive.
• So what has gone wrong? The flaw in the computation is that the Fundamental
Theorem of Calculus, which says that
\[ \text{if } F'(x) = f(x) \text{ then } \int_a^b f(x)\,dx = F(b) - F(a), \]
is only applicable when $F'(x)$ exists and equals f(x) for all x between a and b.
• In this case $F'(x) = \frac{1}{x^2}$ does not exist for x = 0. So we cannot apply the Fundamental
Theorem of Calculus as we tried to above.
An integral, like $\int_{-1}^{1} \frac{1}{x^2}\,dx$, whose integrand is undefined somewhere in the domain of
integration is called improper. We’ll give a more thorough treatment of improper integrals
later in the text. For now, we’ll just say that the correct way to define (and evaluate)
improper integrals is as a limit of well-defined approximating integrals. We shall later see
that, not only is $\int_{-1}^{1} \frac{1}{x^2}\,dx$ not negative, it is infinite.
Example 3.3.15
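One way to get a feel for "a limit of well-defined approximating integrals" is to cut a small interval $(-\varepsilon, \varepsilon)$ out of the domain and apply the Fundamental Theorem on each remaining piece, where $-1/x$ really is an antiderivative of $1/x^2$. The sketch below does this; the symmetric cut and the name `truncated_integral` are assumptions made for illustration, and the text's formal treatment of improper integrals comes later.

```python
def F(x):
    """-1/x, an antiderivative of 1/x^2 on any interval that avoids x = 0."""
    return -1.0 / x


def truncated_integral(eps):
    """Integral of 1/x^2 over [-1, -eps] plus the integral over [eps, 1]."""
    left = F(-eps) - F(-1.0)   # equals 1/eps - 1
    right = F(1.0) - F(eps)    # equals 1/eps - 1
    return left + right


for eps in (0.1, 0.01, 0.001, 0.0001):
    # The values are positive and grow without bound as eps shrinks.
    print(eps, truncated_integral(eps))
```

Every approximating integral is positive, and the values increase as the excluded interval shrinks, which is consistent with the integral being infinite rather than $-2$.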
The above examples have illustrated how we can use the fundamental theorem of
calculus to convert knowledge of derivatives into knowledge of integrals. We are now in
a position to easily build a table of integrals. Here is a short table of the most important
derivatives that we know.
Of course we know other derivatives, such as those of sec x and cot x, however the ones
listed above are arguably the most important ones. From this table (with a very little
massaging) we can write down a short table of indefinite integrals.
$f(x)$                        $\int f(x)\,dx$
$1$                           $x + C$
$x^n$                         $\frac{1}{n+1}x^{n+1} + C$   provided that $n \neq -1$
$\frac{1}{x}$                 $\ln|x| + C$
$e^x$                         $e^x + C$
$\sin x$                      $-\cos x + C$
$\cos x$                      $\sin x + C$
$\sec^2 x$                    $\tan x + C$
$\frac{1}{\sqrt{1-x^2}}$      $\arcsin x + C$
$\frac{1}{1+x^2}$             $\arctan x + C$
Example 3.3.17
(i) $\int_2^7 e^x\,dx$
(ii) $\int_{-2}^{2} \frac{1}{1+x^2}\,dx$
(iii) $\int_0^3 (2x^3 + 7x - 2)\,dx$
Solution. We can proceed with each of these as before: find the antiderivative and then
apply the Fundamental Theorem. The third integral is a little more complicated, but we
can split it up into monomials using Theorem 3.2.1 and do each separately.
(ii) An antiderivative of $\frac{1}{1+x^2}$ is arctan(x), so
\begin{align*}
\int_{-2}^{2} \frac{1}{1+x^2}\,dx &= \arctan(x)\,\Big|_{-2}^{2} \\
&= \arctan(2) - \arctan(-2)
\end{align*}
We can simplify this a little further by noting that arctan(x) is an odd function, so
$\arctan(-2) = -\arctan(2)$ and thus our integral is
\[ = 2\arctan(2) \]
Example 3.3.17
Total cost is, therefore, the sum of fixed and variable costs:
TC(q) = FC + VC(q)
Fixed cost encompasses all expenses that do not change with quantity (such as rent on
a factory space, which is the same whether you make 1 or 1000 units). Fixed cost is a
constant, and generally nonzero. We can think of these expenses as being incurred before
the first unit is ever produced, hence the definition of fixed costs as TC(0).
Variable cost consists of expenses that depend on quantity. A typical example of such
an expense is raw materials: producing more units means using more raw materials.
Consider the cost of making “one more unit” of output, after having already made q
units: $TC(q+1) - TC(q)$. Using the definition of the derivative, we can approximate this
quantity by $\frac{dTC}{dq}$:
Definition 3.3.19.
Given a good with total cost function TC(q), the marginal cost of production of
the good is defined as
\[ MC(q) = \frac{d}{dq}TC. \]
Suppose we know the marginal cost function, MC(q), and we want to find the total
cost function, TC(q). By the Fundamental Theorem of Calculus,
\[ TC(q) = \int MC(q)\,dq + C \]
\[ TC(0) = FC. \]
Suppose a product has fixed cost $100, and marginal cost function MC(q) = e^{-q} + 3. What
is its total cost function?
Solution. Using the Fundamental Theorem of Calculus Part 1, given the definition
MC(q) = d/dq TC, we see:
\[ TC(q) = \int MC\,dq + C = \int \left(e^{-q} + 3\right) dq + C \]
Antidifferentiating by inspection,
\[ = -e^{-q} + 3q + C \]
Using FC = TC(0):
\[ 100 = TC(0) = -e^{-0} + 3\cdot 0 + C = -1 + C \]
\[ 101 = C \]
All together,
\[ TC(q) = -e^{-q} + 3q + 101 \]
Example 3.3.20
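The closed form can be cross-checked against the relationship "total cost = fixed cost + accumulated marginal cost". The sketch below is an illustration under stated assumptions (the function names and the use of Simpson's rule are not from the text): it integrates MC numerically from 0 to q, adds the fixed cost of 100, and compares with the formula above.

```python
import math

FIXED_COST = 100.0  # TC(0), as given in the example


def marginal_cost(q):
    """MC(q) = e^(-q) + 3."""
    return math.exp(-q) + 3.0


def total_cost(q):
    """Closed form found in the example: TC(q) = -e^(-q) + 3q + 101."""
    return -math.exp(-q) + 3.0 * q + 101.0


def total_cost_numeric(q, n=1000):
    """Fixed cost plus a Simpson's-rule estimate of the integral of MC over [0, q]."""
    h = q / n
    s = marginal_cost(0.0) + marginal_cost(q)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * marginal_cost(i * h)
    return FIXED_COST + s * h / 3


for q in (0.0, 1.0, 5.0, 20.0):
    print(q, total_cost(q), total_cost_numeric(q))
```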
In addition to considering total and marginal costs, we can consider total and marginal
revenue.
Definition 3.3.21.
Suppose the total revenue collected from q units of output is given by the function
TR(q), with TR(0) = 0 (since selling no products leads to no revenue). We
define the marginal revenue to be
\[ MR(q) = \frac{d}{dq}TR(q). \]
The unit price is
\[ P(q) = \frac{TR(q)}{q} \]
for q > 0.
We think of marginal revenue as the extra revenue gained by producing one extra unit
of output.
Example 3.3.22
Suppose the marginal revenue function for a product is $MR(q) = 10 - \frac{3}{1+q^2}$. What is the
unit price of the product, if 10 units are sold?
I NTEGRATION 3.4 S UBSTITUTION
Solution. First, we use the Fundamental Theorem of Calculus Part 1 to find the total
revenue function.
\[ TR = \int MR\,dq + C = \int \left(10 - \frac{3}{1+q^2}\right) dq + C \]
Referring to Theorem 3.3.16,
\[ = 10q - 3\arctan q + C \]
Now we use the initial value TR(0) = 0.
\[ 0 = TR(0) = 10(0) - 3\arctan 0 + C \]
\[ 0 = C \]
All together,
\[ TR(q) = 10q - 3\arctan q \]
If 10 units are sold, the unit price is
\[ P(10) = \frac{TR(10)}{10} = \frac{10(10) - 3\arctan(10)}{10} = 10 - 0.3\arctan(10) \approx 9.56 \]
Example 3.3.22
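The final arithmetic is easy to get wrong by hand, so here is a short numerical evaluation of the unit price. This is a sketch only; the function names are assumptions, and `math.atan` works in radians, which is what the calculus formulas require.

```python
import math


def total_revenue(q):
    """TR(q) = 10q - 3 arctan(q), from the example."""
    return 10.0 * q - 3.0 * math.atan(q)


def unit_price(q):
    """P(q) = TR(q) / q, defined for q > 0."""
    if q <= 0:
        raise ValueError("unit price is only defined for q > 0")
    return total_revenue(q) / q


print(round(unit_price(10), 2))
```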
3.4 Substitution
In the previous section we explored the Fundamental Theorem of Calculus and the link it
provides between definite integrals and antiderivatives. Indeed, integrals with simple in-
tegrands are usually evaluated via this link. In this section we start to explore methods for
integrating more complicated integrals. We have already seen — via Theorem 3.2.1 — that
integrals interact very nicely with addition, subtraction and multiplication by constants:
\[ \int_a^b \big(A f(x) + B g(x)\big)\,dx = A\int_a^b f(x)\,dx + B\int_a^b g(x)\,dx \]
for A, B constants. By combining this with the list of indefinite integrals in Theorem 3.3.16,
we can compute integrals of linear combinations of simple functions. For example
\begin{align*}
\int_1^4 \left(e^x - 2\sin x + 3x^2\right) dx &= \int_1^4 e^x\,dx - 2\int_1^4 \sin x\,dx + 3\int_1^4 x^2\,dx \\
&= \left( e^x + (-2)\cdot(-\cos x) + 3\,\frac{x^3}{3} \right)\Big|_1^4 \quad\text{and so on}
\end{align*}
Of course there are a great many functions that can be approached in this way, however
there are some very simple examples that cannot.
\[ \int \sin(\pi x)\,dx \qquad \int x e^x\,dx \qquad \int \frac{x}{x^2 - 5x + 6}\,dx \]
In each case the integrands are not linear combinations of simpler functions; in order to
compute them we need to understand how integrals (and antiderivatives) interact with
compositions, products and quotients. We reached a very similar point in our differential
calculus course where we understood the linearity of the derivative,
\[ \frac{d}{dx}\big(A f(x) + B g(x)\big) = A\frac{df}{dx} + B\frac{dg}{dx}, \]
but had not yet seen the chain, product and quotient rules²⁹. While we will develop tools
to find the second and third integrals in later sections, we should really start with how to
integrate compositions of functions.
It is important to state up front, that in general one cannot write down the integral of
the composition of two functions — even if those functions are simple. This is not because
the integral does not exist. Rather it is because the integral cannot be written down as
a finite combination of the standard functions we know. A very good example of this,
which we encountered in Example 3.3.4, is the composition of $e^x$ and $-x^2$. Even though
we know
\[ \int e^x\,dx = e^x + C \quad\text{and}\quad \int -x^2\,dx = -\frac{1}{3}x^3 + C \]
there is no simple function that is equal to the indefinite integral
\[ \int e^{-x^2}\,dx, \]
even though the indefinite integral exists. In this way integration is very different from
differentiation.
With that caveat out of the way, we can introduce the substitution rule. The substitu-
tion rule is obtained by antidifferentiating the chain rule. In some sense it is the chain rule
in reverse. For completeness, let us restate the chain rule:
Theorem 3.4.1 (The chain rule).
\[ \frac{d}{dx}F\big(u(x)\big) = F'\big(u(x)\big)\cdot u'(x) \]
Equivalently, if y(x) = F(u(x)), then
\[ \frac{dy}{dx} = \frac{dF}{du}\cdot\frac{du}{dx}. \]
Consider a function f(u), which has antiderivative F(u). Then we know that
\[ \int f(u)\,du = \int F'(u)\,du = F(u) + C \]
29 If your memory of these rules is a little hazy then you really should go back and revise them before
proceeding. You will definitely need a good grasp of the chain rule for what follows in this section.
Now take the above equation and substitute into it u = u(x), i.e. replace the variable u
with any (differentiable) function of x, to get
\[ \int f(u)\,du\,\Big|_{u=u(x)} = F(u(x)) + C \]
But now the right-hand side is a function of x, so we can differentiate it with respect to x
to get
\[ \frac{d}{dx}F(u(x)) = F'(u(x))\cdot u'(x) \]
This tells us that $F(u(x))$ is an antiderivative of the function $F'(u(x))\cdot u'(x) = f(u(x))\,u'(x)$.
Thus we know
\[ \int f\big(u(x)\big)\cdot u'(x)\,dx = F\big(u(x)\big) + C = \int f(u)\,du\,\Big|_{u=u(x)} \]
In order to apply the substitution rule successfully we will have to write the integrand
in the form $f(u(x))\cdot u'(x)$. To do this we need to make a good choice of the function u(x);
after that it is not hard to then find f(u) and $u'(x)$. Unfortunately there is no one strategy
for choosing u(x). This can make applying the substitution rule more art than science³⁰.
Here we suggest two possible strategies for picking u(x):
(1) Factor the integrand and choose one of the factors to be $u'(x)$. For this to work, you
must be able to easily find the antiderivative of the chosen factor. The antiderivative
will be u(x).
(2) Look for a factor in the integrand that is a function with an argument that is more
complicated than just “x”. That factor will play the role of $f(u(x))$. Choose u(x) to be
the complicated argument.
Here are two examples which illustrate each of those strategies in turn.
Example 3.4.3
30 Thankfully this does become easier with experience and we recommend that the reader read some
examples and then practice a LOT.
We want to massage this into the form of the integrand in the substitution rule, namely
$f(u(x))\cdot u'(x)$. Our integrand can be written as the product of the two factors
\[ \underbrace{9\sin^8(x)}_{\text{first factor}} \cdot \underbrace{\cos(x)}_{\text{second factor}} \]
and we start by determining (or guessing) which factor plays the role of $u'(x)$. We can
choose $u'(x) = 9\sin^8(x)$ or $u'(x) = \cos(x)$.
• If we choose $u'(x) = 9\sin^8(x)$, then antidifferentiating this to find u(x) is really not
very easy. So it is perhaps better to investigate the other choice before proceeding
further with this one.
\[ = \int 9u^8\,du\,\Big|_{u=\sin(x)} \quad\text{by the substitution rule} \]
We are now left with the problem of antidifferentiating a monomial; this we can do with
Theorem 3.3.16.
\begin{align*}
&= \left(u^9 + C\right)\Big|_{u=\sin(x)} \\
&= \sin^9(x) + C
\end{align*}
Note that $9\sin^8(x)\cos(x)$ is a function of x. So our answer, which is the indefinite integral
of $9\sin^8(x)\cos(x)$, must also be a function of x. This is why we have substituted
u = sin(x) in the last step of our solution: it makes our solution a function of x.
Example 3.4.3
Example 3.4.4
Solution. Again we are going to use the substitution rule and helpfully our integrand is a
product of two factors
\[ \underbrace{3x^2}_{\text{first factor}} \cdot \underbrace{\cos(x^3)}_{\text{second factor}} \]
The second factor, $\cos(x^3)$, is a function, namely cos, with a complicated argument, namely
$x^3$. So we try $u(x) = x^3$. Then $u'(x) = 3x^2$, which is the other factor in the integrand. So
the integral becomes
\begin{align*}
\int 3x^2\cos(x^3)\,dx &= \int u'(x)\cos\big(u(x)\big)\,dx && \text{just swap order of factors} \\
&= \int \cos\big(u(x)\big)\,u'(x)\,dx && \text{by the substitution rule} \\
&= \int \cos(u)\,du\,\Big|_{u=x^3} \\
&= \big(\sin(u) + C\big)\Big|_{u=x^3} && \text{using Theorem 3.3.16} \\
&= \sin(x^3) + C
\end{align*}
Example 3.4.4
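Any antiderivative found by substitution can be checked by differentiating it. The sketch below does this numerically for the two answers above, $\sin^9(x)$ and $\sin(x^3)$, using a central difference. It is only an illustration; the helper names and sample points are assumptions.

```python
import math


def derivative(F, x, h=1e-5):
    """Central difference approximation to F'(x)."""
    return (F(x + h) - F(x - h)) / (2 * h)


# Pairs of (candidate antiderivative, integrand) from Examples 3.4.3 and 3.4.4.
candidates = [
    (lambda x: math.sin(x) ** 9,
     lambda x: 9 * math.sin(x) ** 8 * math.cos(x)),
    (lambda x: math.sin(x ** 3),
     lambda x: 3 * x ** 2 * math.cos(x ** 3)),
]


def max_error(F, f, points):
    """Largest gap between the numerical derivative of F and f at the sample points."""
    return max(abs(derivative(F, x) - f(x)) for x in points)


sample_points = [0.3, 1.0, 1.7]
for F, f in candidates:
    print(max_error(F, f, sample_points))
```

Small errors at a handful of points do not prove the antiderivative is right, but a large error reliably signals a mistake.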
Now let’s look at a definite integral.
Example 3.4.5 ($\int_0^1 e^x \sin(e^x)\,dx$)
Compute
\[ \int_0^1 e^x \sin\big(e^x\big)\,dx. \]
• The integrand is again the product of two factors and we can choose $u'(x) = e^x$ or
$u'(x) = \sin(e^x)$.
\begin{align*}
\int e^x \sin\big(e^x\big)\,dx &= \int \sin\big(u(x)\big)\,u'(x)\,dx && \text{apply the substitution rule} \\
&= \int \sin(u)\,du\,\Big|_{u=e^x} \\
&= \big(-\cos(u) + C\big)\Big|_{u=e^x} \\
&= -\cos\big(e^x\big) + C
\end{align*}
• But what happened to the limits of integration? We can incorporate them now. We
have just shown that the indefinite integral is $-\cos(e^x)$, so by the Fundamental Theorem
of Calculus
\begin{align*}
\int_0^1 e^x \sin\big(e^x\big)\,dx &= \Big[-\cos\big(e^x\big)\Big]_0^1 \\
&= -\cos(e^1) - \big(-\cos(e^0)\big) \\
&= -\cos(e) + \cos(1)
\end{align*}
Example 3.4.5
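The value $-\cos(e) + \cos(1)$ can be compared against a direct numerical integration of the original integrand. The sketch below uses composite Simpson's rule; the helper names and the number of subintervals are assumptions for illustration.

```python
import math


def simpson(f, a, b, n=1000):
    """Composite Simpson's rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def integrand(x):
    """e^x * sin(e^x), the integrand of the example."""
    return math.exp(x) * math.sin(math.exp(x))


numeric = simpson(integrand, 0.0, 1.0)
closed_form = -math.cos(math.e) + math.cos(1.0)

print(numeric, closed_form)
```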
The example below introduces a special case where the “inside” function is linear.
Example 3.4.6
Solution.
• Starting with the first integral, we see that it is not too hard to spot the complicated
argument. If we set u(x) = 2x + 1 then the integrand is just $\sqrt{u}$.
• Hence we substitute $2x + 1 \to u$ and $dx \to \frac{1}{u'(x)}\,du = \frac{1}{2}\,du$:
\begin{align*}
\int \sqrt{2x+1}\,dx &= \int \sqrt{u}\,\frac{1}{2}\,du \\
&= \int \frac{1}{2}u^{1/2}\,du \\
&= \left(\frac{2}{3}u^{3/2}\cdot\frac{1}{2} + C\right)\Big|_{u=2x+1} \\
&= \frac{1}{3}(2x+1)^{3/2} + C
\end{align*}
• We can evaluate the second integral in much the same way. Set u(x) = 3x − 2 and
replace dx by $\frac{1}{u'(x)}\,du = \frac{1}{3}\,du$:
\begin{align*}
\int e^{3x-2}\,dx &= \int e^u\,\frac{1}{3}\,du \\
&= \left(\frac{1}{3}e^u + C\right)\Big|_{u=3x-2} \\
&= \frac{1}{3}e^{3x-2} + C
\end{align*}
Example 3.4.6
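Both antiderivatives found in this example can be tested by using them to evaluate a definite integral and comparing with a numerical estimate. The sketch below does so on the interval [0, 1], which is an arbitrary choice made for illustration, as are the helper names.

```python
import math


def simpson(f, a, b, n=1000):
    """Composite Simpson's rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


# (integrand, antiderivative found by the substitution u = ax + b)
checks = [
    (lambda x: math.sqrt(2 * x + 1), lambda x: (2 * x + 1) ** 1.5 / 3),
    (lambda x: math.exp(3 * x - 2), lambda x: math.exp(3 * x - 2) / 3),
]


def gap(f, F, a=0.0, b=1.0):
    """Difference between a numerical integral of f and F(b) - F(a)."""
    return abs(simpson(f, a, b) - (F(b) - F(a)))


for f, F in checks:
    print(gap(f, F))
```

In both cases the inner function is linear, so the only effect of the substitution is the constant factor 1/2 or 1/3.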
This last example illustrates that substitution can be used to easily deal with arguments of
the form ax + b, i.e. that are linear functions of x, and suggests the following theorem.
Theorem 3.4.7.
\[ \int f(ax+b)\,dx = \frac{1}{a}F(ax+b) + C \]
Proof. We can show this using the substitution rule. Let u(x) = ax + b so $u'(x) = a$, then
\begin{align*}
\int f(ax+b)\,dx &= \int f(u)\cdot\frac{1}{u'(x)}\,du \\
&= \int \frac{1}{a}f(u)\,du \\
&= \frac{1}{a}\int f(u)\,du && \text{since $a$ is a constant} \\
&= \frac{1}{a}F(u)\Big|_{u=ax+b} + C && \text{since $F(u)$ is an antiderivative of $f(u)$} \\
&= \frac{1}{a}F(ax+b) + C.
\end{align*}
Notice that to get from the integral on the left hand side to the integral on the right
hand side you
• substitute³¹ $u(x) \to u$ and $u'(x)\,dx \to du$,
• set the lower limit for the u integral to the value of u (namely u(a)) that corresponds
to the lower limit of the x integral (namely x = a), and
• set the upper limit for the u integral to the value of u (namely u(b)) that corresponds
to the upper limit of the x integral (namely x = b).
Also note that we now have two ways to evaluate definite integrals of the form $\int_a^b f\big(u(x)\big)\,u'(x)\,dx$.
• We can find the indefinite integral $\int f\big(u(x)\big)\,u'(x)\,dx$, using Theorem 3.4.2, and then
evaluate the result between x = a and x = b. This is what was done in Example 3.4.5.
• Or we can apply Theorem 3.4.8. This entails finding the indefinite integral $\int f(u)\,du$
and evaluating the result between u = u(a) and u = u(b). This is what we will do
in the following example.
Example 3.4.9 ($\int_0^1 x^2 \sin(x^3+1)\,dx$)
Compute
\[ \int_0^1 x^2 \sin\big(x^3+1\big)\,dx \]
Solution.
• In this example the integrand is already neatly factored into two pieces. While we
could deploy either of our two strategies, it is perhaps easier in this case to choose
u(x) by looking for a complicated argument.
• The second factor of the integrand is $\sin(x^3+1)$, which is the function sin evaluated
at $x^3 + 1$. So set $u(x) = x^3 + 1$, giving $u'(x) = 3x^2$ and $f(u) = \sin(u)$.
• The first factor of the integrand is $x^2$, which is not quite $u'(x)$; however, we can easily
massage the integrand into the required form by multiplying and dividing by 3:
\[ x^2 \sin\big(x^3+1\big) = \frac{1}{3}\cdot 3x^2\cdot\sin\big(x^3+1\big). \]
• We want this in the form of the substitution rule, so we do a little massaging:
\begin{align*}
\int_0^1 x^2\sin\big(x^3+1\big)\,dx &= \int_0^1 \frac{1}{3}\cdot 3x^2\cdot\sin\big(x^3+1\big)\,dx \\
&= \frac{1}{3}\int_0^1 \sin\big(x^3+1\big)\cdot 3x^2\,dx && \text{by Theorem 3.2.1(c)}
\end{align*}
31 A good way to remember this last step is that we replace $\frac{du}{dx}\,dx$ by just du, which looks like we
cancelled out the dx terms: $\frac{du}{dx}\,dx = du$. While using “cancel the dx” is a good mnemonic (memory
aid), you should not think of the derivative $\frac{du}{dx}$ as a fraction; you are not dividing du by dx.
Example 3.4.9
There is another, and perhaps easier, way to view the manipulations in the previous
example. Once you have chosen u( x ) you
but we do not have to manipulate the integrand so as to make u1 ( x ) explicit. Let us redo
the previous example by this approach.
Example 3.4.10 (Example 3.4.9 revisited)
Solution.
• We have already observed that one factor of the integrand is $\sin(x^3+1)$, which is
sin evaluated at $x^3 + 1$. Thus we try setting $u(x) = x^3 + 1$.
• This makes $u'(x) = 3x^2$, and we replace $u(x) = x^3 + 1 \to u$ and
$dx \to \frac{1}{u'(x)}\,du = \frac{1}{3x^2}\,du$:
\begin{align*}
\int_0^1 x^2\sin\big(x^3+1\big)\,dx &= \int_{u(0)}^{u(1)} x^2\,\underbrace{\sin\big(x^3+1\big)}_{=\sin(u)}\,\frac{1}{3x^2}\,du \\
&= \int_1^2 \sin(u)\,\frac{x^2}{3x^2}\,du \\
&= \int_1^2 \frac{1}{3}\sin(u)\,du \\
&= \frac{1}{3}\int_1^2 \sin(u)\,du
\end{align*}
Example 3.4.10
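The change of limits can be checked numerically: the original x-integral over [0, 1] and the transformed u-integral over [1, 2] should give the same number. The sketch below also compares both against $\frac{1}{3}(\cos 1 - \cos 2)$, which is what one gets by finishing the u-integral with the antiderivative $-\cos u$; that closed form and the helper names are supplied here for illustration and are not quoted from the text.

```python
import math


def simpson(f, a, b, n=1000):
    """Composite Simpson's rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


# Original integral in x, with limits x = 0 and x = 1.
x_integral = simpson(lambda x: x ** 2 * math.sin(x ** 3 + 1), 0.0, 1.0)

# After u = x^3 + 1 the limits become u(0) = 1 and u(1) = 2.
u_integral = simpson(math.sin, 1.0, 2.0) / 3

# Evaluating (1/3) * [-cos(u)] between u = 1 and u = 2.
closed_form = (math.cos(1.0) - math.cos(2.0)) / 3

print(x_integral, u_integral, closed_form)
```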
We can do the following example using the substitution rule or Theorem 3.4.7:
Example 3.4.11 ($\int_0^{\pi/2} \cos(3x)\,dx$)
Compute $\int_0^{\pi/2} \cos(3x)\,dx$.
• In this example we should set u = 3x, and substitute $dx \to \frac{1}{u'(x)}\,du = \frac{1}{3}\,du$. When
we do this we also have to convert the limits of the integral: u(0) = 0 and
$u(\pi/2) = 3\pi/2$. This gives
\begin{align*}
\int_0^{\pi/2} \cos(3x)\,dx &= \int_0^{3\pi/2} \cos(u)\,\frac{1}{3}\,du \\
&= \left[\frac{1}{3}\sin(u)\right]_0^{3\pi/2} \\
&= \frac{\sin(3\pi/2) - \sin(0)}{3} \\
&= \frac{-1 - 0}{3} = -\frac{1}{3}.
\end{align*}
• We can also do this example more directly using the above theorem. Since sin(x) is
an antiderivative of cos(x), Theorem 3.4.7 tells us that $\frac{\sin(3x)}{3}$ is an antiderivative of
cos(3x). Hence
\begin{align*}
\int_0^{\pi/2} \cos(3x)\,dx &= \left[\frac{\sin(3x)}{3}\right]_0^{\pi/2} \\
&= \frac{\sin(3\pi/2) - \sin(0)}{3} \\
&= -\frac{1}{3}.
\end{align*}
Example 3.4.11
This integral looks a lot like that of Example 3.4.9. It makes sense to try $u(x) = 1 - x^3$ since
it is the argument of $\sin(1 - x^3)$. We
• substitute $u = 1 - x^3$ and
• replace dx with $\frac{1}{u'(x)}\,du = \frac{1}{-3x^2}\,du$,
• when x = 1, we have $u = 1 - 1^3 = 0$.
So
\begin{align*}
\int_0^1 x^2 \sin\big(1 - x^3\big)\,dx &= \int_1^0 x^2 \sin(u)\cdot\frac{1}{-3x^2}\,du \\
&= \int_1^0 -\frac{1}{3}\sin(u)\,du.
\end{align*}
Note that the lower limit of the u-integral, namely 1, is larger than the upper limit, which
is 0. There is absolutely nothing wrong with that. We can simply evaluate the u-integral
in the normal way. Since $-\cos(u)$ is an antiderivative of sin(u):
\begin{align*}
&= \left[\frac{\cos(u)}{3}\right]_1^0 \\
&= \frac{\cos(0) - \cos(1)}{3} \\
&= \frac{1 - \cos(1)}{3}.
\end{align*}
Example 3.4.12
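Example 3.4.13 ($\int_0^1 \frac{1}{(2x+1)^3}\,dx$)
Compute $\int_0^1 \frac{1}{(2x+1)^3}\,dx$.
We could do this one using Theorem 3.4.7, but it’s not too hard to do without. We can
think of the integrand as the function “one over a cube” with the argument 2x + 1. So it
makes sense to substitute u = 2x + 1. That is
• set u = 2x + 1 and
• replace $dx \to \frac{1}{u'(x)}\,du = \frac{1}{2}\,du$.
• when x = 1, we have $u = 2 \times 1 + 1 = 3$.
So
\begin{align*}
\int_0^1 \frac{1}{(2x+1)^3}\,dx &= \int_1^3 \frac{1}{u^3}\cdot\frac{1}{2}\,du \\
&= \frac{1}{2}\int_1^3 u^{-3}\,du \\
&= \frac{1}{2}\left[\frac{u^{-2}}{-2}\right]_1^3 \\
&= \frac{1}{2}\left(\frac{1}{-2}\cdot\frac{1}{9} - \frac{1}{-2}\cdot\frac{1}{1}\right) \\
&= \frac{1}{2}\left(\frac{1}{2} - \frac{1}{18}\right) = \frac{1}{2}\cdot\frac{8}{18} \\
&= \frac{2}{9}
\end{align*}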
Example 3.4.13
Example 3.4.14 ($\int_0^1 \frac{x}{1+x^2}\,dx$)
Evaluate $\int_0^1 \frac{x}{1+x^2}\,dx$.
Solution.
• The integrand can be rewritten as $x\cdot\frac{1}{1+x^2}$. This second factor suggests that we should
try setting $u = 1 + x^2$, and so we interpret the second factor as the function “one
over” evaluated at argument $1 + x^2$.
– set $u = 1 + x^2$,
– substitute $dx \to \frac{1}{2x}\,du$, and
– translate the limits of integration: when x = 0, we have $u = 1 + 0^2 = 1$ and
when x = 1, we have $u = 1 + 1^2 = 2$.
Example 3.4.14
Example 3.4.15 ($\int x^3 \cos(x^4+2)\,dx$)
Compute the integral $\int x^3 \cos\big(x^4+2\big)\,dx$.
Solution.
• The integrand is the product of cos evaluated at the argument $x^4 + 2$ times $x^3$, which
aside from a factor of 4, is the derivative of the argument $x^4 + 2$.
• Hence we set $u = x^4 + 2$ and then substitute $dx \to \frac{1}{u'(x)}\,du = \frac{1}{4x^3}\,du$.
\begin{align*}
\int x^3 \cos\big(x^4+2\big)\,dx &= \int x^3 \cos(u)\,\frac{1}{4x^3}\,du\,\Big|_{u=x^4+2} \\
&= \int \frac{1}{4}\cos(u)\,du\,\Big|_{u=x^4+2} \\
&= \left(\frac{1}{4}\sin(u) + C\right)\Big|_{u=x^4+2} \\
&= \frac{1}{4}\sin(x^4+2) + C.
\end{align*}
Example 3.4.15
The next two examples are more involved and require more careful thinking.
Example 3.4.16 ($\int \sqrt{1+x^2}\,x^3\,dx$)
Compute $\int \sqrt{1+x^2}\,x^3\,dx$.
• An obvious choice of u is the argument inside the square root. So substitute
$u = 1 + x^2$ and $dx \to \frac{1}{2x}\,du$.
• When we do this we obtain
\begin{align*}
\int \sqrt{1+x^2}\cdot x^3\,dx &= \int \sqrt{u}\cdot x^3\cdot\frac{1}{2x}\,du \\
&= \int \frac{1}{2}\sqrt{u}\cdot x^2\,du
\end{align*}
Unlike all our previous examples, we have not cancelled out all of the x’s from the
integrand. However before we do the integral with respect to u, the integrand must
be expressed solely in terms of u; no x’s are allowed. (Look at the integrand on the
right hand side of Theorem 3.4.2.)
• But all is not lost. We can rewrite the factor $x^2$ in terms of the variable u. We know
that $u = 1 + x^2$, so this means $x^2 = u - 1$. Substituting this into our integral gives
\begin{align*}
\int \sqrt{1+x^2}\cdot x^3\,dx &= \int \frac{1}{2}\sqrt{u}\cdot x^2\,du \\
&= \int \frac{1}{2}\sqrt{u}\cdot(u-1)\,du \\
&= \frac{1}{2}\int \left(u^{3/2} - u^{1/2}\right) du \\
&= \frac{1}{2}\left(\frac{2}{5}u^{5/2} - \frac{2}{3}u^{3/2}\right)\Big|_{u=x^2+1} + C \\
&= \left(\frac{1}{5}u^{5/2} - \frac{1}{3}u^{3/2}\right)\Big|_{u=x^2+1} + C \\
&= \frac{1}{5}(x^2+1)^{5/2} - \frac{1}{3}(x^2+1)^{3/2} + C.
\end{align*}
Oof!
• Don’t forget that you can always check the answer by differentiating:
\begin{align*}
\frac{d}{dx}\left(\frac{1}{5}(x^2+1)^{5/2} - \frac{1}{3}(x^2+1)^{3/2} + C\right) &= \frac{d}{dx}\left(\frac{1}{5}(x^2+1)^{5/2}\right) - \frac{d}{dx}\left(\frac{1}{3}(x^2+1)^{3/2}\right) \\
&= \frac{1}{5}\cdot 2x\cdot\frac{5}{2}\cdot(x^2+1)^{3/2} - \frac{1}{3}\cdot 2x\cdot\frac{3}{2}\cdot(x^2+1)^{1/2} \\
&= x(x^2+1)^{3/2} - x(x^2+1)^{1/2} \\
&= x\left[(x^2+1) - 1\right]\cdot\sqrt{x^2+1} \\
&= x^3\sqrt{x^2+1}.
\end{align*}
I NTEGRATION 3.5 I NTEGRATION BY PARTS
Example 3.4.16
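Besides differentiating by hand, one can test the antiderivative by using it to evaluate a definite integral and comparing with a numerical estimate. The sketch below does this on [0, 1]; the interval, the helper names and the use of Simpson's rule are assumptions made for illustration.

```python
import math


def simpson(f, a, b, n=1000):
    """Composite Simpson's rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def integrand(x):
    """sqrt(1 + x^2) * x^3."""
    return math.sqrt(1 + x * x) * x ** 3


def antiderivative(x):
    """(1/5)(x^2+1)^(5/2) - (1/3)(x^2+1)^(3/2), the answer found above (with C = 0)."""
    u = x * x + 1
    return u ** 2.5 / 5 - u ** 1.5 / 3


numeric = simpson(integrand, 0.0, 1.0)
via_antiderivative = antiderivative(1.0) - antiderivative(0.0)

print(numeric, via_antiderivative)
```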
Solution.
• At first glance there is nothing to manipulate here and so very little to go on. However
we can rewrite tan x as $\frac{\sin x}{\cos x}$, making the integral $\int \frac{\sin x}{\cos x}\,dx$. This gives us more
to work with.
• Now think of the integrand as being the product $\frac{1}{\cos x}\cdot\sin x$. This suggests that we set
u = cos x and that we interpret the first factor as the function “one over” evaluated
at u = cos x.
• Substitute u = cos x and $dx \to \frac{1}{-\sin x}\,du$ to give:
\begin{align*}
\int \frac{\sin x}{\cos x}\,dx &= \int \frac{\sin x}{u}\cdot\frac{1}{-\sin x}\,du\,\Big|_{u=\cos x} \\
&= \int -\frac{1}{u}\,du\,\Big|_{u=\cos x} \\
&= -\ln|\cos x| + C && \text{and if we want to go further} \\
&= \ln\left|\frac{1}{\cos x}\right| + C \\
&= \ln|\sec x| + C.
\end{align*}
Example 3.4.17
\[ \int \frac{d}{dx}\big(F(x)\big)\,dx = F(x) + C \]
We can exploit this in order to develop another rule for integration, in particular a rule
to help us integrate products of simpler functions such as
\[ \int x e^x\,dx \]
Recall the product rule:
\[ \frac{d}{dx}\,u(x)\,v(x) = u'(x)\,v(x) + u(x)\,v'(x) \]
Integrating this gives
\begin{align*}
\int \big[u'(x)\,v(x) + u(x)\,v'(x)\big]\,dx &= \big[\text{a function whose derivative is } u'v + uv'\big] + C \\
&= u(x)\,v(x) + C
\end{align*}
Now this, by itself, is not terribly useful. In order to apply it we need to have an integral
whose integrand is a sum of products that is in exactly this form $u'(x)v(x) + u(x)v'(x)$.
This is far too specialised.
However if we tease this apart a little:
\[ \int \big[u'(x)\,v(x) + u(x)\,v'(x)\big]\,dx = \int u'(x)\,v(x)\,dx + \int u(x)\,v'(x)\,dx \]
In this form we take the integral of one product and express it in terms of the integral of
a different product. If we express it like that, it doesn’t seem too useful. However, if the
second integral is easier, then this process helps us.
Let us do a simple example before explaining this more generally.
Example 3.5.1 ($\int x e^x\,dx$)
Compute the integral $\int x e^x\,dx$.
Solution.
• Now set $u(x) = x$ and $v'(x) = e^x$. How did we know how to make this choice? We
will explain some strategies later. For now, let us just accept this choice and keep
going.
• In order to use the formula we need to know $u'(x)$ and $v(x)$. In this case it is quite
straightforward: $u'(x) = 1$ and $v(x) = e^x$.
• Plugging these into the formula gives
\[ \int x e^x\,dx = x e^x - \int 1\cdot e^x\,dx \]
So our original more difficult integral has been turned into a question of computing
an easy one.
\[ = x e^x - e^x + C \]
• We can check the answer by differentiating:
\begin{align*}
\frac{d}{dx}\left(x e^x - e^x + C\right) &= \underbrace{x e^x + 1\cdot e^x}_{\text{by product rule}} - e^x + 0 \\
&= x e^x \quad\text{as required.}
\end{align*}
Example 3.5.1
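Besides differentiating, one can also sanity-check an antiderivative numerically. The short Python sketch below is an illustration only and is not part of the original text: the helper name `simpson`, the interval $[0, 2]$ and the number of subintervals are all our own assumptions. It compares a Simpson's rule estimate of $\int_0^2 x e^x\,dx$ with the value predicted by the antiderivative $x e^x - e^x$ found above.

```python
import math


def simpson(f, a, b, n=1000):
    """Composite Simpson's rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        # interior points alternate between weight 4 (odd i) and 2 (even i)
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def antiderivative(x):
    # candidate from Example 3.5.1, taking C = 0
    return x * math.exp(x) - math.exp(x)


a, b = 0.0, 2.0  # illustrative interval, chosen arbitrarily
numeric = simpson(lambda x: x * math.exp(x), a, b)
exact = antiderivative(b) - antiderivative(a)
print(abs(numeric - exact) < 1e-8)
```

Agreement on one interval does not prove the formula, of course, but a disagreement would immediately flag an algebra slip.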
The process we have used in the above example is called “integration by parts”. When
our integrand is a product we try to write it as $u(x)v'(x)$; we need to choose one factor
to be $u(x)$ and the other to be $v'(x)$. We then compute $u'(x)$ and $v(x)$ and then apply the
following theorem:
\[ \int u(x)\,v'(x)\,dx = u(x)\,v(x) - \int v(x)\,u'(x)\,dx \]
If we write $dv$ for $v'(x)\,dx$ and $du$ for $u'(x)\,dx$ (as the substitution rule suggests),
then the formula becomes
\[ \int u\,dv = u\,v - \int v\,du \]
Integration by parts is not as easy to apply as the product rule for derivatives. This is
because it relies on us
(3) that the integral $\int u'(x)v(x)\,dx$ is easier than the integral we started with.
Notice that any antiderivative of $v'(x)$ will do. All antiderivatives of $v'(x)$ are of the
form $v(x) + A$ with $A$ a constant. Putting this into the integration by parts formula gives
\begin{align*}
\int u(x)v'(x)\,dx &= u(x)\big(v(x) + A\big) - \int u'(x)\big(v(x) + A\big)\,dx \\
&= u(x)v(x) + A\,u(x) - \int u'(x)v(x)\,dx - \underbrace{A\int u'(x)\,dx}_{=A u(x) + C} \\
&= u(x)v(x) - \int u'(x)v(x)\,dx + C
\end{align*}
$x$ and $e^x$
\[ u(x) = x,\quad v'(x) = e^x \qquad\text{or}\qquad u(x) = e^x,\quad v'(x) = x \]
\[ u'(x) = 1 \quad\text{and}\quad v(x) = e^x \]
which means we will need to integrate (in the right-hand side of the integration by
parts formula)
\[ \int u'(x)v(x)\,dx = \int 1\cdot e^x\,dx \]
which looks straightforward. This is a good indication that this is the right choice of
$u(x)$ and $v'(x)$.
2. But before we do that, we should also explore the other choice, namely $u(x) = e^x$
and $v'(x) = x$. This implies that
\[ u'(x) = e^x \quad\text{and}\quad v(x) = \frac{1}{2}x^2 \]
which means we need to integrate
\[ \int u'(x)v(x)\,dx = \int \frac{1}{2}x^2\cdot e^x\,dx. \]
This is at least as hard as the integral we started with. Hence we should try the first
choice.
\[ \int x e^x\,dx = x e^x - \int e^x\,dx = x e^x - e^x + C. \]
The above reasoning is a very typical workflow when using integration by parts.
Example 3.5.3
Integration by parts is often used
• to eliminate factors of $x$ from an integrand like $x e^x$ by using that $\frac{d}{dx}x = 1$ and
• to eliminate a $\ln x$ from an integrand by using that $\frac{d}{dx}\ln x = \frac{1}{x}$ and
• to eliminate inverse trig functions, like $\arctan x$, from an integrand by using that, for
example, $\frac{d}{dx}\arctan x = \frac{1}{1+x^2}$.
Solution.
\[ u'(x) = 1 \quad\text{and}\quad v(x) = -\cos x \]
\[ u'(x) = \cos x \quad\text{and}\quad v(x) = \frac{1}{2}x^2 \]
which is looking worse: we’d need to integrate $\int \frac{1}{2}x^2\cos x\,dx$.
\[ = -x\cos x + \sin x + C \]
\begin{align*}
\frac{d}{dx}\left(-x\cos x + \sin x + C\right) &= -\cos x + x\sin x + \cos x + 0 \\
&= x\sin x \quad\checkmark
\end{align*}
Once we have practised this a bit we do not really need to write as much. Let us solve
it again, but showing only what we need to.
Solution.
• We use integration by parts to solve the integral.
\[ = -x\cos x + \sin x + C. \]
Example 3.5.4
It is pretty standard practice to reduce the notation even further in these problems. As
noted above, many people write the integration by parts formula as
\[ \int u\,dv = uv - \int v\,du \]
where $du$, $dv$ are shorthand for $u'(x)\,dx$, $v'(x)\,dx$. Let us write up the previous example
using this notation.
Example 3.5.5 ($\int x\sin x\,dx$ yet again)
Solution. Using integration by parts, we set $u = x$ and $dv = \sin x\,dx$. This makes $du = 1\,dx$
and $v = -\cos x$. Consequently
\begin{align*}
\int x\sin x\,dx &= \int u\,dv \\
&= uv - \int v\,du \\
&= -x\cos x + \int \cos x\,dx \\
&= -x\cos x + \sin x + C
\end{align*}
You can see that this is a very neat way to write up these problems and we will continue
using this shorthand in the examples that follow below.
Example 3.5.5
We can also use integration by parts to eliminate higher powers of x. We just need to
apply the method more than once.
Example 3.5.6 ($\int x^2 e^x\,dx$)
Solution.
• So we have reduced the problem of computing the original integral to one of
integrating $2x e^x$. We know how to do this: just integrate by parts again.
\begin{align*}
\int x^2 e^x\,dx &= x^2 e^x - \int 2x e^x\,dx && \text{set } u = 2x,\ dv = e^x\,dx \\
&= x^2 e^x - \left(2x e^x - \int 2e^x\,dx\right) && \text{since } du = 2\,dx,\ v = e^x \\
&= x^2 e^x - 2x e^x + 2e^x + C
\end{align*}
\begin{align*}
\frac{d}{dx}\left(x^2 e^x - 2x e^x + 2e^x + C\right) &= \left(x^2 e^x + 2x e^x\right) - \left(2x e^x + 2e^x\right) + 2e^x + 0 \\
&= x^2 e^x \quad\checkmark
\end{align*}
Now let us look at integrands containing logarithms. We don’t know the antiderivative
of ln x, but we can eliminate ln x from an integrand by using integration by parts with
u = ln x.
Example 3.5.7 ($\int x\ln x\,dx$)
Solution.
Example 3.5.7
It is not immediately obvious that one should use integration by parts to compute the
integral
\[ \int \ln x\,dx \]
since the integrand is not a product. But we should persevere: indeed this is a situation
where our shorter notation helps to clarify how to proceed.
Solution.
32 We will soon.
• In the previous example we saw that we could remove the factor $\ln x$ by setting
$u = \ln x$ and using integration by parts. Let us try repeating this. When we make
this choice, we are then forced to take $dv = dx$, that is, we choose $v'(x) = 1$. Once
we have made this sneaky move everything follows quite directly.
\begin{align*}
\int \ln x\,dx &= x\ln x - \int \frac{1}{x}\cdot x\,dx \\
&= x\ln x - \int 1\,dx \\
&= x\ln x - x + C
\end{align*}
• As always, it is a good idea to check our result by verifying that the derivative of the
answer really is the integrand.
\[ \frac{d}{dx}\left(x\ln x - x + C\right) = \ln x + x\cdot\frac{1}{x} - 1 + 0 = \ln x \]
Example 3.5.8
The same method works almost exactly to compute the antiderivatives of $\arcsin(x)$
and $\arctan(x)$:
Compute the antiderivatives of the inverse sine and inverse tangent functions.
Solution.
• Again neither of these integrands is a product, but that is no impediment. In both
cases we set $dv = dx$ (i.e. $v'(x) = 1$) and choose $v(x) = x$.
• For inverse tan we choose $u = \arctan(x)$, so $du = \frac{1}{1+x^2}\,dx$:
\begin{align*}
\int \arctan(x)\,dx &= x\arctan(x) - \int x\cdot\frac{1}{1+x^2}\,dx && \text{now use substitution rule} \\
&= x\arctan(x) - \int \frac{w'(x)}{2}\cdot\frac{1}{w}\,dx && \text{with } w(x) = 1 + x^2,\ w'(x) = 2x \\
&= x\arctan(x) - \frac{1}{2}\int \frac{1}{w}\,dw \\
&= x\arctan(x) - \frac{1}{2}\ln|w| + C \\
&= x\arctan(x) - \frac{1}{2}\ln|1 + x^2| + C && \text{but } 1 + x^2 > 0,\text{ so} \\
&= x\arctan(x) - \frac{1}{2}\ln(1 + x^2) + C
\end{align*}
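• For inverse sine we choose $u = \arcsin(x)$, so $du = \frac{1}{\sqrt{1-x^2}}\,dx$:
\begin{align*}
\int \arcsin(x)\,dx &= x\arcsin(x) - \int \frac{x}{\sqrt{1-x^2}}\,dx && \text{now use substitution rule} \\
&= x\arcsin(x) - \int \frac{-w'(x)}{2}\cdot w^{-1/2}\,dx && \text{with } w(x) = 1 - x^2,\ w'(x) = -2x \\
&= x\arcsin(x) + \frac{1}{2}\int w^{-1/2}\,dw \\
&= x\arcsin(x) + \frac{1}{2}\cdot 2w^{1/2} + C \\
&= x\arcsin(x) + \sqrt{1 - x^2} + C
\end{align*}
• Both can be checked quite quickly by differentiating, but we leave that as an
exercise for the reader.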
Example 3.5.9
There are many other examples we could do, but we’ll finish with a tricky one.
Example 3.5.10 ($\int e^x\sin x\,dx$)
Solution. Let us attempt this one a little naively and then we’ll come back and do it more
carefully (and successfully).
• We can choose either $u = e^x$, $dv = \sin x\,dx$ or the other way around.
1. Let $u = e^x$, $dv = \sin x\,dx$. Then $du = e^x\,dx$ and $v = -\cos x$. This gives
\[ \int e^x\sin x\,dx = -e^x\cos x + \int e^x\cos x\,dx \]
So we are left with an integrand that is very similar to the one we started with.
What about the other choice?
2. Let $u = \sin x$, $dv = e^x\,dx$. Then $du = \cos x\,dx$ and $v = e^x$. This gives
\[ \int e^x\sin x\,dx = e^x\sin x - \int e^x\cos x\,dx \]
So we are again left with an integrand that is very similar to the one we started
with.
• How do we proceed? It turns out to be easier if you do both $\int e^x\sin x\,dx$ and
$\int e^x\cos x\,dx$ together, as the next example shows.
Example 3.5.10
Example 3.5.11 ($\int_a^b e^x\sin x\,dx$ and $\int_a^b e^x\cos x\,dx$)
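• First
\begin{align*}
I_1 = \int_a^b e^x\sin x\,dx &= \int_a^b u\,dv && \text{with } u = e^x,\ dv = \sin x\,dx \\
& && \text{so } v = -\cos x,\ du = e^x\,dx \\
&= \Big[-e^x\cos x\Big]_a^b + \int_a^b e^x\cos x\,dx
\end{align*}
• So summarising, and writing $I_2 = \int_a^b e^x\cos x\,dx$, we have
\[ I_1 = \Big[-e^x\cos x\Big]_a^b + I_2 \qquad\qquad I_2 = \Big[e^x\sin x\Big]_a^b - I_1 \]
• So now, substitute the expression for $I_2$ from the second equation into the first
equation to get
\[ I_1 = \Big[-e^x\cos x + e^x\sin x\Big]_a^b - I_1 \quad\text{which implies}\quad I_1 = \frac{1}{2}\Big[e^x\big(\sin x - \cos x\big)\Big]_a^b \]
That is,
\[ \int_a^b e^x\sin x\,dx = \frac{1}{2}\Big[e^x\big(\sin x - \cos x\big)\Big]_a^b \qquad \int_a^b e^x\cos x\,dx = \frac{1}{2}\Big[e^x\big(\sin x + \cos x\big)\Big]_a^b \]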
I NTEGRATION 3.6 T RIGONOMETRIC I NTEGRALS
• This also says, for example, that $\frac{1}{2}e^x\big(\sin x - \cos x\big)$ is an antiderivative of $e^x\sin x$ so
that
\[ \int e^x\sin x\,dx = \frac{1}{2}e^x\big(\sin x - \cos x\big) + C \]
The somewhat magical thing about this example is that we found our antiderivative,
in the end, using algebra. There wasn’t a step where we evaluated an antiderivative in
the usual way; we just generated an equation, and then solved it. To our knowledge,
this technique has no particular name. Because we somehow ended up where we started,
with the integral $\int e^x\sin x\,dx$, this author likes to call the technique integrating around in a
circle.
Since there was no clear “antiderivative” step, the results of this example can feel
suspicious. We can always check whether an antiderivative is correct! This one is correct if
and only if the derivative of the right hand side is $e^x\sin x$. Here goes. By the product rule:
\[ \frac{d}{dx}\left[\frac{1}{2}e^x\big(\sin x - \cos x\big) + C\right] = \frac{1}{2}\Big[e^x\big(\sin x - \cos x\big) + e^x\big(\cos x + \sin x\big)\Big] = e^x\sin x \]
which is the desired derivative.
Example 3.5.11
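Because these two formulas came out of algebra rather than a direct antidifferentiation, a numerical cross-check can be reassuring. The Python sketch below is our own illustration, not part of the original text; the helper `simpson` and the choice $a = 0$, $b = 1$ are assumptions made purely for the demonstration. It compares Simpson's rule estimates of both definite integrals with the closed forms derived in Example 3.5.11.

```python
import math


def simpson(f, a, b, n=1000):
    """Composite Simpson's rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def closed_form_sin(a, b):
    # (1/2) [e^x (sin x - cos x)] evaluated from a to b
    g = lambda x: 0.5 * math.exp(x) * (math.sin(x) - math.cos(x))
    return g(b) - g(a)


def closed_form_cos(a, b):
    # (1/2) [e^x (sin x + cos x)] evaluated from a to b
    g = lambda x: 0.5 * math.exp(x) * (math.sin(x) + math.cos(x))
    return g(b) - g(a)


a, b = 0.0, 1.0  # illustrative limits, chosen arbitrarily
err_sin = abs(simpson(lambda x: math.exp(x) * math.sin(x), a, b) - closed_form_sin(a, b))
err_cos = abs(simpson(lambda x: math.exp(x) * math.cos(x), a, b) - closed_form_cos(a, b))
print(err_sin < 1e-9, err_cos < 1e-9)
```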
There is another way to find $\int e^x\sin x\,dx$ and $\int e^x\cos x\,dx$ that, in contrast to the above
computations, doesn’t involve any trickery. But it does require the use of complex
numbers and so is beyond the scope of this course. The secret is to use that
$\sin x = \frac{e^{ix} - e^{-ix}}{2i}$ and $\cos x = \frac{e^{ix} + e^{-ix}}{2}$, where $i$ is the square root of $-1$ of the complex number system.
Equation 3.6.1.
\[ \sin^2 x + \cos^2 x = 1 \]
Equation 3.6.2.
\[ \sin(2x) = 2\sin x\cos x \]
Equation 3.6.3.
\begin{align*}
\cos(2x) &= \cos^2 x - \sin^2 x \\
&= 2\cos^2 x - 1 \\
&= 1 - 2\sin^2 x
\end{align*}
Notice that the last two lines of Equation (3.6.3) follow from the first line by replacing
either $\sin^2 x$ or $\cos^2 x$ using Equation (3.6.1). It is also useful to rewrite these last two lines:
Equation 3.6.4.
\[ \sin^2 x = \frac{1 - \cos(2x)}{2} \]
Equation 3.6.5.
\[ \cos^2 x = \frac{1 + \cos(2x)}{2} \]
These last two are particularly useful since they allow us to rewrite higher powers of
sine and cosine in terms of lower powers. For example:
\begin{align*}
\sin^4(x) &= \left[\frac{1 - \cos(2x)}{2}\right]^2 && \text{by Equation (3.6.4)} \\
&= \frac{1}{4} - \frac{1}{2}\cos(2x) + \frac{1}{4}\underbrace{\cos^2(2x)}_{\text{do it again}} && \text{use Equation (3.6.5)} \\
&= \frac{1}{4} - \frac{1}{2}\cos(2x) + \frac{1}{8}\big(1 + \cos(4x)\big) \\
&= \frac{3}{8} - \frac{1}{2}\cos(2x) + \frac{1}{8}\cos(4x)
\end{align*}
So while it was hard to integrate $\sin^4(x)$ directly, the final expression is quite
straightforward (with a little substitution rule).
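Power-reduction computations like this one are easy to get wrong by a sign or a factor of two, so it can help to spot-check the final identity numerically. The following Python sketch is an illustration of ours rather than part of the original text; the sample grid is an arbitrary assumption. It evaluates both sides of $\sin^4 x = \frac38 - \frac12\cos(2x) + \frac18\cos(4x)$ at a range of points and records the largest gap.

```python
import math


def lhs(x):
    return math.sin(x) ** 4


def rhs(x):
    # the reduced form obtained above from Equations (3.6.4) and (3.6.5)
    return 3 / 8 - math.cos(2 * x) / 2 + math.cos(4 * x) / 8


# sample points -5.0, -4.9, ..., 5.0 (an arbitrary illustrative grid)
max_gap = max(abs(lhs(k * 0.1) - rhs(k * 0.1)) for k in range(-50, 51))
print(max_gap < 1e-12)
```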
There are many such tricks for integrating powers of trigonometric functions. Here we
concentrate on two families
\[ \int \sin^m x\cos^n x\,dx \qquad\text{and}\qquad \int \tan^m x\sec^n x\,dx \]
for integer $n$, $m$. The details of the technique depend on the parity of $n$ and $m$, that is,
whether $n$ and $m$ are even or odd numbers.
\[ = \frac{1}{3}u^3 + C = \frac{1}{3}\sin^3 x + C \]
This method can be used whenever $n$ is an odd integer.
• Substitute $u = \sin x$ and $du = \cos x\,dx$.
• This leaves an even power of cosines. Convert them using $\cos^2 x = 1 - \sin^2 x = 1 - u^2$.
Here is an example.
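Example 3.6.6 ($\int \sin^2 x\cos^3 x\,dx$)
Start by factoring off one power of $\cos x$ to combine with $dx$ to get $\cos x\,dx = du$.
\begin{align*}
\int \sin^2 x\cos^3 x\,dx &= \int \underbrace{\sin^2 x}_{=u^2}\ \underbrace{\cos^2 x}_{=1-u^2}\ \underbrace{\cos x\,dx}_{=du} && \text{set } u = \sin x \\
&= \int u^2(1 - u^2)\,du \\
&= \frac{u^3}{3} - \frac{u^5}{5} + C \\
&= \frac{\sin^3 x}{3} - \frac{\sin^5 x}{5} + C
\end{align*}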
Example 3.6.6
Of course if $m$ is an odd integer we can use the same strategy with the roles of $\sin x$
and $\cos x$ exchanged. That is, we substitute $u = \cos x$, $du = -\sin x\,dx$ and
$\sin^2 x = 1 - \cos^2 x = 1 - u^2$.
By (3.6.5)
\[ \int \cos^2 x\,dx = \frac{1}{2}\int \big[1 + \cos(2x)\big]\,dx = \frac{1}{2}\Big[x + \frac{1}{2}\sin(2x)\Big] + C \]
Example 3.6.7
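Example 3.6.8 ($\int \cos^4 x\,dx$)
First we’ll prepare the integrand $\cos^4 x$ for easy integration by applying (3.6.5) a couple
times. We have already used (3.6.5) once to get
\[ \cos^2 x = \frac{1}{2}\big[1 + \cos(2x)\big] \]
Squaring it gives
\[ \cos^4 x = \frac{1}{4}\big[1 + \cos(2x)\big]^2 = \frac{1}{4} + \frac{1}{2}\cos(2x) + \frac{1}{4}\cos^2(2x) \]
Now by (3.6.5) a second time
\begin{align*}
\cos^4 x &= \frac{1}{4} + \frac{1}{2}\cos(2x) + \frac{1}{4}\cdot\frac{1 + \cos(4x)}{2} \\
&= \frac{3}{8} + \frac{1}{2}\cos(2x) + \frac{1}{8}\cos(4x)
\end{align*}
Now it’s easy to integrate
\begin{align*}
\int \cos^4 x\,dx &= \frac{3}{8}\int dx + \frac{1}{2}\int \cos(2x)\,dx + \frac{1}{8}\int \cos(4x)\,dx \\
&= \frac{3}{8}x + \frac{1}{4}\sin(2x) + \frac{1}{32}\sin(4x) + C
\end{align*}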
Example 3.6.8
Example 3.6.9 ($\int \cos^2 x\sin^2 x\,dx$)
Oof! We could also have done this one using (3.6.2) to write the integrand as $\frac{1}{4}\sin^2(2x)$ and
then used (3.6.4) to write it in terms of $\cos(4x)$.
Example 3.6.9
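Example 3.6.10 ($\int_0^\pi \cos^2 x\,dx$ and $\int_0^\pi \sin^2 x\,dx$)
Of course we can compute the definite integral $\int_0^\pi \cos^2 x\,dx$ by using the antiderivative for
$\cos^2 x$ that we found in Example 3.6.7. But here is a trickier way to evaluate that
integral, and also the integral $\int_0^\pi \sin^2 x\,dx$ at the same time, very quickly without needing the
antiderivative.
• The two integrals represent the same area. Look at the graphs below: the darkly
shaded regions in the two graphs have the same area and the lightly shaded regions
in the two graphs have the same area.
[Figure: graphs of $y = \cos^2 x$ and $y = \sin^2 x$ for $0 \le x \le \pi$, with shaded regions under each curve; the x-axis is marked at $\pi/2$ and $\pi$.]
• Consequently,
\begin{align*}
\int_0^\pi \cos^2 x\,dx = \int_0^\pi \sin^2 x\,dx &= \frac{1}{2}\left[\int_0^\pi \sin^2 x\,dx + \int_0^\pi \cos^2 x\,dx\right] \\
&= \frac{1}{2}\int_0^\pi \big[\sin^2 x + \cos^2 x\big]\,dx \\
&= \frac{1}{2}\int_0^\pi dx \\
&= \frac{\pi}{2}
\end{align*}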
Example 3.6.10
The strategy for dealing with these integrals is similar to the strategy that we used to
evaluate integrals of the form $\int \sin^m x\cos^n x\,dx$ and again depends on the parity of the
exponents $n$ and $m$.
These will become much more clear after an example (or two).
(1) When $m$ is odd and any $n$: rewrite the integrand in terms of $\sin x$ and $\cos x$:
\begin{align*}
\tan^m x\sec^n x\,dx &= \left(\frac{\sin x}{\cos x}\right)^m\left(\frac{1}{\cos x}\right)^n dx \\
&= \frac{\sin^{m-1} x}{\cos^{n+m} x}\,\sin x\,dx
\end{align*}
and then substitute $u = \cos x$, $du = -\sin x\,dx$, $\sin^2 x = 1 - \cos^2 x = 1 - u^2$. See
Examples 3.6.11 and 3.6.12.
(2) Alternatively, if $m$ is odd and $n \ge 1$, move one factor of $\sec x\tan x$ to the side so that
you can see $\sec x\tan x\,dx$ in the integral, and substitute $u = \sec x$, $du = \sec x\tan x\,dx$
and $\tan^2 x = \sec^2 x - 1 = u^2 - 1$. See Example 3.6.13.
(3) If $n$ is even with $n \ge 2$, move one factor of $\sec^2 x$ to the side so that you can see $\sec^2 x\,dx$
in the integral, and substitute $u = \tan x$, $du = \sec^2 x\,dx$ and $\sec^2 x = 1 + \tan^2 x = 1 + u^2$.
See Example 3.6.14.
(4) When $m$ is even and $n = 0$, that is, the integrand is just an even power of tangent,
we can still use the $u = \tan x$ substitution, after using $\tan^2 x = \sec^2 x - 1$ (possibly
more than once) to create a $\sec^2 x$. See Example 3.6.16.
(5) This leaves the case $n$ odd and $m$ even. There are strategies like those above for
treating this case. But they are more complicated and also involve more tricks (that
basically have to be memorized). Examples using them are provided in the optional
section entitled “Integrating $\sec x$, $\csc x$, $\sec^3 x$ and $\csc^3 x$”, below. A more
straightforward strategy uses another technique called “partial fractions”. We shall return to this
strategy after we have learned about partial fractions. See Examples 3.8.4 and 3.8.5 in
Section 3.8.
Solution.
• Write the integrand $\tan x = \frac{1}{\cos x}\sin x$.
34 You will need to memorise the derivatives of tangent and secant. However there is no need to memorise
$1 + \tan^2 x = \sec^2 x$. To derive it very quickly just divide $\sin^2 x + \cos^2 x = 1$ by $\cos^2 x$.
Example 3.6.11
Example 3.6.12 ($\int \tan^3 x\,dx$)
Solution.
• Write the integrand $\tan^3 x = \frac{\sin^2 x}{\cos^3 x}\sin x$.
• Again substitute $u = \cos x$, $du = -\sin x\,dx$. We rewrite the remaining even powers
of $\sin x$ using $\sin^2 x = 1 - \cos^2 x = 1 - u^2$.
• Hence
\begin{align*}
\int \tan^3 x\,dx &= \int \frac{\sin^2 x}{\cos^3 x}\sin x\,dx && \text{substitute } u = \cos x \\
&= \int \frac{1 - u^2}{u^3}(-1)\,du \\
&= \frac{u^{-2}}{2} + \ln|u| + C \\
&= \frac{1}{2\cos^2 x} + \ln|\cos x| + C && \text{can rewrite in terms of secant} \\
&= \frac{1}{2}\sec^2 x - \ln|\sec x| + C
\end{align*}
Example 3.6.12
§§§ m is Odd and n ≥ 1 (Odd Power of Tangent and at Least One Secant)
Here we collect a factor of $\tan x\sec x$ and then substitute $u = \sec x$ and $du = \sec x\tan x\,dx$.
We can then rewrite any remaining even powers of $\tan x$ in terms of $\sec x$ using
$\tan^2 x = \sec^2 x - 1 = u^2 - 1$.
Example 3.6.13 ($\int \tan^3 x\sec^4 x\,dx$)
Solution.
• Start by factoring off one copy of $\sec x\tan x$ and combine it with $dx$ to form $\sec x\tan x\,dx$,
which will be $du$.
• Now substitute $u = \sec x$, $du = \sec x\tan x\,dx$ and $\tan^2 x = \sec^2 x - 1 = u^2 - 1$.
• This gives
\begin{align*}
\int \tan^3 x\sec^4 x\,dx &= \int \underbrace{\tan^2 x}_{u^2-1}\ \underbrace{\sec^3 x}_{u^3}\ \underbrace{\sec x\tan x\,dx}_{du} \\
&= \int \big[u^2 - 1\big]u^3\,du \\
&= \frac{u^6}{6} - \frac{u^4}{4} + C \\
&= \frac{1}{6}\sec^6 x - \frac{1}{4}\sec^4 x + C
\end{align*}
Example 3.6.13
Solution.
• Factor off one copy of $\sec^2 x$ and combine it with $dx$ to form $\sec^2 x\,dx$, which will be
$du$.
• Then substitute $u = \tan x$, $du = \sec^2 x\,dx$ and rewrite any remaining even powers of
$\sec x$ as powers of $\tan x = u$ using $\sec^2 x = 1 + \tan^2 x = 1 + u^2$.
• This gives
\begin{align*}
\int \sec^4 x\,dx &= \int \underbrace{\sec^2 x}_{1+u^2}\ \underbrace{\sec^2 x\,dx}_{du} \\
&= \int \big[1 + u^2\big]\,du \\
&= u + \frac{u^3}{3} + C \\
&= \tan x + \frac{1}{3}\tan^3 x + C
\end{align*}
Example 3.6.14
Example 3.6.15 ($\int \tan^3 x\sec^4 x\,dx$, redux)
Solution. Let us revisit this example using this slightly different approach.
• Factor off one copy of $\sec^2 x$ and combine it with $dx$ to form $\sec^2 x\,dx$, which will be
$du$.
• Then substitute $u = \tan x$, $du = \sec^2 x\,dx$ and rewrite any remaining even powers of
$\sec x$ as powers of $\tan x = u$ using $\sec^2 x = 1 + \tan^2 x = 1 + u^2$.
• This gives
\begin{align*}
\int \tan^3 x\sec^4 x\,dx &= \int \underbrace{\tan^3 x}_{u^3}\ \underbrace{\sec^2 x}_{1+u^2}\ \underbrace{\sec^2 x\,dx}_{du} \\
&= \int \big[u^3 + u^5\big]\,du \\
&= \frac{u^4}{4} + \frac{u^6}{6} + C \\
&= \frac{1}{4}\tan^4 x + \frac{1}{6}\tan^6 x + C
\end{align*}
• This is not quite the same as the answer we got above in Example 3.6.13. However
we can show they are (nearly) equivalent. To do so we substitute $v = \sec x$ and
$\tan^2 x = \sec^2 x - 1 = v^2 - 1$:
\begin{align*}
\frac{1}{6}\tan^6 x + \frac{1}{4}\tan^4 x &= \frac{1}{6}(v^2 - 1)^3 + \frac{1}{4}(v^2 - 1)^2 \\
&= \frac{1}{6}(v^6 - 3v^4 + 3v^2 - 1) + \frac{1}{4}(v^4 - 2v^2 + 1) \\
&= \frac{v^6}{6} - \frac{v^4}{2} + \frac{v^2}{2} - \frac{1}{6} + \frac{v^4}{4} - \frac{v^2}{2} + \frac{1}{4} \\
&= \frac{v^6}{6} - \frac{v^4}{4} + 0\cdot v^2 + \left(\frac{1}{4} - \frac{1}{6}\right) \\
&= \frac{1}{6}\sec^6 x - \frac{1}{4}\sec^4 x + \frac{1}{12}.
\end{align*}
So while $\frac{1}{6}\tan^6 x + \frac{1}{4}\tan^4 x \ne \frac{1}{6}\sec^6 x - \frac{1}{4}\sec^4 x$, they only differ by a constant. Hence
both are valid antiderivatives of $\tan^3 x\sec^4 x$.
Example 3.6.15
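The claim that the two answers differ only by the constant $\frac{1}{12}$ is easy to test numerically. The Python sketch below is our own illustration, not part of the original text; the function names and sample points are assumptions chosen for the demonstration. It evaluates both antiderivatives (with $C = 0$) at several points in $(-\pi/2, \pi/2)$ and looks at their difference.

```python
import math


def via_tan(x):
    # antiderivative from Example 3.6.15 (C = 0)
    t = math.tan(x)
    return t ** 6 / 6 + t ** 4 / 4


def via_sec(x):
    # antiderivative from Example 3.6.13 (C = 0)
    s = 1 / math.cos(x)
    return s ** 6 / 6 - s ** 4 / 4


# arbitrary sample points inside (-pi/2, pi/2), where tan and sec are defined
samples = (-1.0, -0.3, 0.0, 0.5, 1.2)
differences = [via_tan(x) - via_sec(x) for x in samples]
print(all(abs(d - 1 / 12) < 1e-9 for d in differences))
```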
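We integrate this by setting $u = \tan x$. For this to work we need to pull one factor of $\sec^2 x$
to one side to form $du = \sec^2 x\,dx$. To find this factor of $\sec^2 x$ we (perhaps repeatedly)
apply the identity $\tan^2 x = \sec^2 x - 1$.
Example 3.6.16 ($\int \tan^4 x\,dx$)
Solution.
• There is no $\sec^2 x$ term present, so we try to create it from $\tan^4 x$ by using
$\tan^2 x = \sec^2 x - 1$.
\begin{align*}
\int \tan^4 x\,dx &= \int \big[\tan^2 x\sec^2 x - \sec^2 x + 1\big]\,dx \\
&= \int \big[u^2 - 1\big]\,du + x + C \\
&= \frac{u^3}{3} - u + x + C \\
&= \frac{\tan^3 x}{3} - \tan x + x + C
\end{align*}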
Example 3.6.16
Example 3.6.17 ($\int \tan^8 x\,dx$)
The first term is now ready to be integrated, but we need to reapply the method to
the second term:
\begin{align*}
\tan^8 x &= \tan^6 x\sec^2 x - \tan^4 x\cdot\big[\sec^2 x - 1\big] \\
&= \tan^6 x\sec^2 x - \tan^4 x\sec^2 x + \tan^4 x && \text{do it again} \\
&= \tan^6 x\sec^2 x - \tan^4 x\sec^2 x + \tan^2 x\cdot\big[\sec^2 x - 1\big] \\
&= \tan^6 x\sec^2 x - \tan^4 x\sec^2 x + \tan^2 x\sec^2 x - \tan^2 x && \text{and again} \\
&= \tan^6 x\sec^2 x - \tan^4 x\sec^2 x + \tan^2 x\sec^2 x - \big[\sec^2 x - 1\big]
\end{align*}
• Hence
\begin{align*}
\int \tan^8 x\,dx &= \int \Big[\tan^6 x\sec^2 x - \tan^4 x\sec^2 x + \tan^2 x\sec^2 x - \sec^2 x + 1\Big]\,dx \\
&= \int \Big[\tan^6 x - \tan^4 x + \tan^2 x - 1\Big]\sec^2 x\,dx + \int dx \\
&= \int \Big[u^6 - u^4 + u^2 - 1\Big]\,du + x + C \\
&= \frac{u^7}{7} - \frac{u^5}{5} + \frac{u^3}{3} - u + x + C \\
&= \frac{1}{7}\tan^7 x - \frac{1}{5}\tan^5 x + \frac{1}{3}\tan^3 x - \tan x + x + C
\end{align*}
\[ \int \tan^{2k} x\,dx = \frac{1}{2k-1}\tan^{2k-1}(x) - \frac{1}{2k-3}\tan^{2k-3} x + \cdots - (-1)^k\tan x + (-1)^k x + C \]
Example 3.6.17
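The general formula for $\int \tan^{2k} x\,dx$ was only suggested by a pattern, so it is worth testing. The Python sketch below is our own illustration and not part of the original text; the function names, the step size `h`, the sample points and the tolerance are all assumptions. It builds the conjectured antiderivative for a given $k$ and compares a central-difference estimate of its derivative with $\tan^{2k} x$.

```python
import math


def tan_even_antiderivative(k, x):
    """Conjectured antiderivative of tan^(2k) x, taking C = 0."""
    t = math.tan(x)
    total = 0.0
    sign = 1.0
    # terms tan^(2k-1)/(2k-1) - tan^(2k-3)/(2k-3) + ... down to the tan x term
    for power in range(2 * k - 1, 0, -2):
        total += sign * t ** power / power
        sign = -sign
    # after k sign flips, sign equals (-1)^k, the coefficient of x
    return total + sign * x


def central_difference(f, x, h=1e-5):
    # symmetric finite-difference approximation to f'(x)
    return (f(x + h) - f(x - h)) / (2 * h)


max_error = 0.0
for k in range(0, 5):
    for x in (-0.9, -0.2, 0.4, 1.0):
        approx = central_difference(lambda s: tan_even_antiderivative(k, s), x)
        max_error = max(max_error, abs(approx - math.tan(x) ** (2 * k)))
print(max_error < 1e-5)
```

The finite-difference derivative is only approximate, which is why the comparison uses a fairly loose tolerance.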
This last example also shows how we might integrate an odd power of tangent:
Example 3.6.18 ($\int \tan^7 x\,dx$)
I NTEGRATION 3.7 T RIGONOMETRIC S UBSTITUTION
• Now we can substitute $u = \tan x$ and $du = \sec^2 x\,dx$ and also use the result from
Example 3.6.11 to take care of the last term:
\begin{align*}
\int \tan^7 x\,dx &= \int \big[\tan^5 x\sec^2 x - \tan^3 x\sec^2 x + \tan x\sec^2 x\big]\,dx - \int \tan x\,dx
\end{align*}
Now factor out the common $\sec^2 x$ term and integrate $\tan x$ via Example 3.6.11
\begin{align*}
&= \int \big[\tan^5 x - \tan^3 x + \tan x\big]\sec^2 x\,dx - \ln|\sec x| + C \\
&= \int \big[u^5 - u^3 + u\big]\,du - \ln|\sec x| + C \\
&= \frac{u^6}{6} - \frac{u^4}{4} + \frac{u^2}{2} - \ln|\sec x| + C \\
&= \frac{1}{6}\tan^6 x - \frac{1}{4}\tan^4 x + \frac{1}{2}\tan^2 x - \ln|\sec x| + C
\end{align*}
This example suggests that for integer $k \ge 0$:
\[ \int \tan^{2k+1} x\,dx = \frac{1}{2k}\tan^{2k}(x) - \frac{1}{2k-2}\tan^{2k-2} x + \cdots - (-1)^k\frac{1}{2}\tan^2 x + (-1)^k\ln|\sec x| + C \]
Example 3.6.18
Of course we have not considered integrals involving powers of cot x and csc x. But
they can be treated in much the same way as tan x and sec x were.
Integrating $\int \tan^m x\sec^n x\,dx$ when $n$ is odd and $m$ is even uses strategies similar to
those of the previous cases. However, the computations are often more involved and more tricks
need to be deployed. For this reason you will not be asked to compute integrals of that
type. (However, you should memorize the antiderivative of the secant function.)
Section A.8 in the appendix gives some examples, if you’re curious what these computations
look like. In particular, the derivation of $\int \sec x\,dx$ has quite an interesting trick to it.
• eliminate $\sqrt{a^2 - x^2}$ from an integrand by substituting $x = a\sin u$ to give
\[ \sqrt{a^2 - x^2} = \sqrt{a^2 - a^2\sin^2 u} = \sqrt{a^2\cos^2 u} = |a\cos u| \]
• eliminate $\sqrt{a^2 + x^2}$ from an integrand by substituting $x = a\tan u$ to give
\[ \sqrt{a^2 + x^2} = \sqrt{a^2 + a^2\tan^2 u} = \sqrt{a^2\sec^2 u} = |a\sec u| \]
• eliminate $\sqrt{x^2 - a^2}$ from an integrand by substituting $x = a\sec u$ to give
\[ \sqrt{x^2 - a^2} = \sqrt{a^2\sec^2 u - a^2} = \sqrt{a^2\tan^2 u} = |a\tan u| \]
Be very careful with signs and absolute values when using this substitution. See
Example 3.7.6.
When we have used substitutions before, we usually gave the new integration
variable, $u$, as a function of the old integration variable $x$. Here we are doing the reverse:
we are giving the old integration variable, $x$, in terms of the new integration variable
$u$. We may do so, as long as we may invert to get $u$ as a function of $x$. For example, with
$x = a\sin u$, we may take $u = \arcsin\frac{x}{a}$. This is a good time for you to review the definitions
of $\arcsin\theta$, $\arctan\theta$ and $\operatorname{arcsec}\theta$.
As a warm-up, consider the area of a quarter of the unit circle.
Example 3.7.1 (Quarter of the unit circle)
Compute the area of the unit circle lying in the first quadrant.
Solution. We know that the answer is $\pi/4$, but we can also compute this as an integral
(we saw this way back in Example 3.1.16):
\[ \text{area} = \int_0^1 \sqrt{1 - x^2}\,dx \]
• To simplify the integrand we substitute $x = \sin u$. With this choice $\frac{dx}{du} = \cos u$ and
so $dx = \cos u\,du$.
• We also need to translate the limits of integration and it is perhaps easiest to do this
by writing $u$ as a function of $x$, namely $u(x) = \arcsin x$. Hence $u(0) = 0$ and
$u(1) = \pi/2$.
• Hence the integral becomes
\begin{align*}
\int_0^1 \sqrt{1 - x^2}\,dx &= \int_0^{\pi/2} \sqrt{1 - \sin^2 u}\cdot\cos u\,du \\
&= \int_0^{\pi/2} \sqrt{\cos^2 u}\cdot\cos u\,du \\
&= \int_0^{\pi/2} \cos^2 u\,du
\end{align*}
Notice that here we have used that the positive square root $\sqrt{\cos^2 u} = |\cos u| = \cos u$
because $\cos(u) \ge 0$ for $0 \le u \le \pi/2$.
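\begin{align*}
\int_0^1 \sqrt{1 - x^2}\,dx &= \int_0^{\pi/2} \cos^2 u\,du && \text{and since } \cos^2 u = \frac{1 + \cos 2u}{2} \\
&= \frac{1}{2}\int_0^{\pi/2} \big(1 + \cos(2u)\big)\,du \\
&= \frac{1}{2}\left[u + \frac{1}{2}\sin(2u)\right]_0^{\pi/2} \\
&= \frac{1}{2}\left(\frac{\pi}{2} - 0 + \frac{\sin\pi}{2} - \frac{\sin 0}{2}\right) \\
&= \frac{\pi}{4} \quad\checkmark
\end{align*}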
Example 3.7.1
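A substitution changes the integrand and the limits at the same time, so it is reassuring to confirm numerically that the original and the substituted integrals agree. The Python sketch below is our own illustration, not part of the original text; the helper `midpoint`, the number of subintervals and the tolerance are assumptions. It uses the midpoint rule, which never evaluates the integrand at the endpoint $x = 1$ where $\sqrt{1 - x^2}$ has an infinite slope.

```python
import math


def midpoint(f, a, b, n=100_000):
    """Composite midpoint rule on [a, b] with n subintervals."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


# the integral in the original variable x
original = midpoint(lambda x: math.sqrt(1 - x * x), 0.0, 1.0)
# the integral after substituting x = sin u, with limits u = 0 to u = pi/2
substituted = midpoint(lambda u: math.cos(u) ** 2, 0.0, math.pi / 2)

print(abs(original - math.pi / 4) < 1e-6)
print(abs(substituted - math.pi / 4) < 1e-6)
```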
Example 3.7.2 ($\int \frac{x^2}{\sqrt{1-x^2}}\,dx$)
• To simplify the integrand we substitute $x = \sin u$. With this choice $\frac{dx}{du} = \cos u$ and
so $dx = \cos u\,du$. Also note that $u = \arcsin x$.
\begin{align*}
\int \frac{x^2}{\sqrt{1 - x^2}}\,dx &= \int \frac{\sin^2 u}{\sqrt{1 - \sin^2 u}}\cdot\cos u\,du \\
&= \int \frac{\sin^2 u}{\sqrt{\cos^2 u}}\cdot\cos u\,du
\end{align*}
• To proceed further we need to get rid of the square-root. Since $u = \arcsin x$ has
domain $-1 \le x \le 1$ and range $-\pi/2 \le u \le \pi/2$, it follows that $\cos u \ge 0$ (since cosine
is non-negative on these inputs). Hence
\[ \sqrt{\cos^2 u} = \cos u \quad\text{when } -\pi/2 \le u \le \pi/2 \]
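\begin{align*}
\int \frac{x^2}{\sqrt{1 - x^2}}\,dx &= \int \frac{\sin^2 u}{\sqrt{\cos^2 u}}\cdot\cos u\,du \\
&= \int \frac{\sin^2 u}{\cos u}\cdot\cos u\,du \\
&= \int \sin^2 u\,du \\
&= \frac{1}{2}\int (1 - \cos 2u)\,du && \text{by Equation (3.6.4)} \\
&= \frac{u}{2} - \frac{1}{4}\sin 2u + C \\
&= \frac{1}{2}\arcsin x - \frac{1}{4}\sin(2\arcsin x) + C
\end{align*}
• We can simplify this further using a double-angle identity. Recall that $u = \arcsin x$
and that $x = \sin u$. Then
\[ \sin 2u = 2\sin u\cos u \]
We can replace $\cos u$ using $\cos^2 u = 1 - \sin^2 u$. Taking a square-root of this formula
gives $\cos u = \pm\sqrt{1 - \sin^2 u}$. We need the positive branch here since $\cos u \ge 0$ when
$-\pi/2 \le u \le \pi/2$ (which is exactly the range of $\arcsin x$). Continuing along:
\begin{align*}
\sin 2u &= 2\sin u\cdot\sqrt{1 - \sin^2 u} \\
&= 2x\sqrt{1 - x^2}
\end{align*}
\begin{align*}
\int \frac{x^2}{\sqrt{1 - x^2}}\,dx &= \frac{1}{2}\arcsin x - \frac{1}{4}\sin(2\arcsin x) + C \\
&= \frac{1}{2}\arcsin x - \frac{1}{2}x\sqrt{1 - x^2} + C
\end{align*}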
Example 3.7.2
The above two examples illustrate the main steps of the approach. The next example is
similar, but with more complicated limits of integration.
Example 3.7.3 ($\int_a^r \sqrt{r^2 - x^2}\,dx$)
Let’s find the area of the shaded region in the sketch below.
[Figure: the circle $x^2 + y^2 = r^2$, with the region under the circle between $x = a$ and $x = r$ shaded.]
We’ll set up the integral using vertical strips. The strip in the figure has width $dx$ and
height $\sqrt{r^2 - x^2}$. So the area is given by the integral
\[ \text{area} = \int_a^r \sqrt{r^2 - x^2}\,dx \]
which is very similar to the previous example.
Which is very similar to the previous example.
Solution.
• To evaluate the integral we substitute
\[ x = x(u) = r\sin u \qquad dx = \frac{dx}{du}\,du = r\cos u\,du \]
It is also helpful to write $u$ as a function of $x$, namely $u = \arcsin\frac{x}{r}$.
• The integral runs from $x = a$ to $x = r$. These correspond to
\begin{align*}
u(r) &= \arcsin\frac{r}{r} = \arcsin 1 = \frac{\pi}{2} \\
u(a) &= \arcsin\frac{a}{r} \quad\text{which does not simplify further}
\end{align*}
• The integral then becomes
\begin{align*}
\int_a^r \sqrt{r^2 - x^2}\,dx &= \int_{\arcsin(a/r)}^{\pi/2} \sqrt{r^2 - r^2\sin^2 u}\cdot r\cos u\,du \\
&= \int_{\arcsin(a/r)}^{\pi/2} r^2\sqrt{1 - \sin^2 u}\cdot\cos u\,du \\
&= r^2\int_{\arcsin(a/r)}^{\pi/2} \sqrt{\cos^2 u}\cdot\cos u\,du
\end{align*}
To proceed further (as we did in Examples 3.7.1 and 3.7.2) we need to think about
whether $\cos u$ is positive or negative.
• Since $a$ (as shown in the diagram) satisfies $0 \le a \le r$, we know that $u(a)$ lies between
$\arcsin(0) = 0$ and $\arcsin(1) = \pi/2$. Hence the variable $u$ lies between $0$ and $\pi/2$, and
on this range $\cos u \ge 0$. This allows us to get rid of the square-root:
\[ \sqrt{\cos^2 u} = |\cos u| = \cos u \]
Recall the identity $\cos^2 u = \frac{1 + \cos 2u}{2}$ from Section 3.6
\begin{align*}
\int_a^r \sqrt{r^2 - x^2}\,dx &= r^2\int_{\arcsin(a/r)}^{\pi/2} \cos^2 u\,du \\
&= \frac{r^2}{2}\int_{\arcsin(a/r)}^{\pi/2} (1 + \cos 2u)\,du \\
&= \frac{r^2}{2}\left[u + \frac{1}{2}\sin(2u)\right]_{\arcsin(a/r)}^{\pi/2} \\
&= \frac{r^2}{2}\left(\frac{\pi}{2} + \frac{1}{2}\sin\pi - \arcsin(a/r) - \frac{1}{2}\sin(2\arcsin(a/r))\right) \\
&= \frac{r^2}{2}\left(\frac{\pi}{2} - \arcsin(a/r) - \frac{1}{2}\sin(2\arcsin(a/r))\right)
\end{align*}
Oof! But there is a little further to go before we are done.
• We can again simplify the term $\sin(2\arcsin(a/r))$ using a double angle identity. Set
$\theta = \arcsin(a/r)$. Then $\theta$ is the angle in the triangle on the right below. By the double
angle formula for $\sin(2\theta)$ (Equation (3.6.2))
Example 3.7.3
Example 3.7.4 ($\int_a^r x\sqrt{r^2 - x^2}\,dx$)
The integral $\int_a^r x\sqrt{r^2 - x^2}\,dx$ looks a lot like the integral we just did in the previous 3
examples. It can also be evaluated using the trigonometric substitution $x = r\sin u$, but that is
unnecessarily complicated. Just because you have now learned how to use trigonometric
substitution$^{35}$ doesn’t mean that you should forget everything you learned before.
Solution. This integral is much more easily evaluated using the simple substitution
$u = r^2 - x^2$.
Example 3.7.4
Solution. As per our guidelines at the start of this section, the presence of the square root
term $\sqrt{3^2 + x^2}$ tells us to substitute $x = 3\tan u$.
• Substitute
\[ x = 3\tan u \qquad dx = 3\sec^2 u\,du \]
\[ \int \frac{dx}{x^2\sqrt{9 + x^2}} = \int \frac{3\sec^2 u}{9\tan^2 u\cdot 3|\sec u|}\,du \]
35 To paraphrase the Law of the Instrument, possibly Mark Twain and definitely some psychologists,
when you have a shiny new hammer, everything looks like a nail.
• To remove the absolute value we must consider the range of values of $u$ in the
integral. Since $x = 3\tan u$ we have $u = \arctan(x/3)$. The range$^{36}$ of arctangent is
$-\pi/2 < \arctan < \pi/2$ and so $u = \arctan(x/3)$ will always lie between $-\pi/2$ and
$+\pi/2$. Hence $\cos u$ will always be positive, which in turn implies that $|\sec u| = \sec u$.
• Using this fact our integral becomes:
\begin{align*}
\int \frac{dx}{x^2\sqrt{9 + x^2}} &= \int \frac{3\sec^2 u}{27\tan^2 u\,|\sec u|}\,du \\
&= \frac{1}{9}\int \frac{\sec u}{\tan^2 u}\,du && \text{since } \sec u > 0
\end{align*}
• The original integral was a function of $x$, so we still have to rewrite $\sin u$ in terms of
$x$. Remember that $x = 3\tan u$ or $u = \arctan(x/3)$. So $u$ is the angle shown in the
triangle below and we can read off the triangle that
\[ \sin u = \frac{x}{\sqrt{9 + x^2}} \implies \int \frac{dx}{x^2\sqrt{9 + x^2}} = -\frac{\sqrt{9 + x^2}}{9x} + C \]
[Figure: right triangle with angle $u$, opposite side $x$, adjacent side $3$ and hypotenuse $\sqrt{9 + x^2}$.]
Example 3.7.5
Example 3.7.6 ($\int \frac{x^2}{\sqrt{x^2-1}}\,dx$)
Solution. This one requires a secant substitution, but otherwise is very similar to those
above.
36 To be pedantic, we mean the range of the “standard” arctangent function or its “principal value”. One
can define other arctangent functions with different ranges.
• As before we need to consider the range of $u$ values in order to determine the sign
of $\tan u$. Notice that the integrand is only defined when either $x < -1$ or $x > 1$; thus
we should treat the cases $x < -1$ and $x > 1$ separately. Let us assume that $x > 1$ and
we will come back to the case $x < -1$ at the end of the example.
When $x > 1$, our $u = \operatorname{arcsec} x$ takes values in $(0, \pi/2)$. This follows since when
$0 < u < \pi/2$, we have $0 < \cos u < 1$ and so $\sec u > 1$. Further, when $0 < u < \pi/2$, we
have $\tan u > 0$. Thus $|\tan u| = \tan u$.
\begin{align*}
\int \frac{x^2}{\sqrt{x^2 - 1}}\,dx &= \int \sec^3 u\cdot\frac{\tan u}{|\tan u|}\,du \\
&= \int \sec^3 u\,du && \text{since } \tan u \ge 0
\end{align*}
• Since we started with a function of $x$ we need to finish with one. We know that
$\sec u = x$ and then we can use trig identities
\[ \tan^2 u = \sec^2 u - 1 = x^2 - 1 \quad\text{so } \tan u = \pm\sqrt{x^2 - 1},\text{ but we know } \tan u \ge 0,\text{ so}\quad \tan u = \sqrt{x^2 - 1} \]
Thus
\[ \int \frac{x^2}{\sqrt{x^2 - 1}}\,dx = \frac{1}{2}x\sqrt{x^2 - 1} + \frac{1}{2}\ln\left|x + \sqrt{x^2 - 1}\right| + C \]
• The above holds when $x > 1$. We can confirm that it is also true when $x < -1$ by
showing the right-hand side is a valid antiderivative of the integrand. To do so we
must differentiate our answer. Notice that we do not need to consider the sign of
$x + \sqrt{x^2 - 1}$ when we differentiate since we have already seen that
\[ \frac{d}{dx}\ln|x| = \frac{1}{x} \]
The method, as we have demonstrated it above, works when our integrand contains
the square root of very specific families of quadratic polynomials. In fact, the same method
works for more general quadratic polynomials: all we need to do is complete the square$^{37}$.
Example 3.7.7 ($\int_3^5 \frac{\sqrt{x^2 - 2x - 3}}{x - 1}\,dx$)
This time we have an integral with a square root in the integrand, but the argument of the
square root, while a quadratic function of $x$, is not in one of the standard forms $\sqrt{a^2 - x^2}$,
$\sqrt{a^2 + x^2}$, $\sqrt{x^2 - a^2}$. The reason that it is not in one of those forms is that the argument,
$x^2 - 2x - 3$, contains a term, namely $-2x$, that is of degree one in $x$. So we try to manipulate
it into one of the standard forms by completing the square.
37 If you have not heard of “completing the square” don’t worry. It is not a difficult method and it will
only take you a few moments to learn. It refers to rewriting a quadratic polynomial
$P(x) = ax^2 + bx + c$ as $P(x) = a(x + d)^2 + e$
for new constants $d$, $e$.
Solution.
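\[ x^2 - 2x - 3 = (x - a)^2 + b = (x^2 - 2ax + a^2) + b \]
Matching the coefficients of $x$ and the constant terms gives $a = 1$ and $b = -4$, so
\[ x^2 - 2x - 3 = (x - 1)^2 - 4 \]
Many of you may have seen this method when learning to sketch parabolas.
• Once this is done we can convert the square root of the integrand into a standard
form by making the simple substitution $y = x - 1$. Here goes
\begin{align*}
\int_3^5 \frac{\sqrt{x^2 - 2x - 3}}{x - 1}\,dx &= \int_3^5 \frac{\sqrt{(x - 1)^2 - 4}}{x - 1}\,dx \\
&= \int_2^4 \frac{\sqrt{y^2 - 4}}{y}\,dy && \text{with } y = x - 1,\ dy = dx \\
&= \int_0^{\pi/3} \frac{\sqrt{4\sec^2 u - 4}}{2\sec u}\,2\sec u\tan u\,du && \text{with } y = 2\sec u \\
& && \text{and } dy = 2\sec u\tan u\,du
\end{align*}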
I NTEGRATION 3.8 PARTIAL F RACTIONS
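In taking the square root of $\sec^2 u - 1 = \tan^2 u$ we used that $\tan u \ge 0$ on the range
$0 \le u \le \frac{\pi}{3}$.
\begin{align*}
\int_0^{\pi/3} \frac{\sqrt{4\sec^2 u - 4}}{2\sec u}\,2\sec u\tan u\,du &= 2\int_0^{\pi/3} \tan^2 u\,du \\
&= 2\int_0^{\pi/3} \big[\sec^2 u - 1\big]\,du && \text{since } \sec^2 u = 1 + \tan^2 u,\text{ again} \\
&= 2\Big[\tan u - u\Big]_0^{\pi/3} \\
&= 2\big[\sqrt{3} - \pi/3\big]
\end{align*}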
Example 3.7.7
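\[ \frac{x^3+x}{x^2-1} = x + \frac{1}{x+1} + \frac{1}{x-1} \]
then the integration becomes nearly trivial:
\begin{align*}
\int \frac{x^3+x}{x^2-1}\,dx &= \int \Big( x + \frac{1}{x+1} + \frac{1}{x-1} \Big)\,dx \\
&= \frac{1}{2}x^2 + \ln|x+1| + \ln|x-1| + C
\end{align*}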
We are not (typically) presented with a rational function nicely sectioned into neat little
pieces. Partial fraction decomposition is a strategy for breaking rational functions up into
these small, nicely integrable parts.
Suppose that $N(x)$ and $D(x)$ are polynomials. The basic strategy behind the method of partial fractions is to write $\frac{N(x)}{D(x)}$ as a sum of very simple, easy-to-integrate rational functions, namely:
(1) polynomials (we shall see below that these are needed when the degree39 of $N(x)$ is equal to or strictly bigger than the degree of $D(x)$);
(2) rational functions of the particularly simple form $\frac{A}{(ax+b)^n}$; and
(3) rational functions of the form40 $\frac{Ax+B}{(ax^2+bx+c)^m}$.
\[ x + \frac{1}{x+1} + \frac{1}{x-1} = \frac{x(x+1)(x-1) + (x-1) + (x+1)}{(x+1)(x-1)} = \frac{x^3+x}{x^2-1} \]
(1) The denominators on the left-hand side are the factors of the denominator $x^2 - 1 = (x-1)(x+1)$ on the right-hand side.
(2) Use $P(x)$ to denote the polynomial on the left hand side, and then use $N(x)$ and $D(x)$ to denote the numerator and denominator of the right hand side. That is
\[ P(x) = x \qquad N(x) = x^3 + x \qquad D(x) = x^2 - 1. \]
Then the degree of $N(x)$ is the sum of the degrees of $P(x)$ and $D(x)$. This is because the highest degree term in $N(x)$ is $x^3$, which comes from multiplying $P(x)$ by $D(x)$, as we see in
\[ x + \frac{1}{x+1} + \frac{1}{x-1} = \frac{\overbrace{x}^{P(x)}\ \overbrace{(x+1)(x-1)}^{D(x)} + (x-1) + (x+1)}{(x+1)(x-1)} = \frac{x^3+x}{x^2-1} \]
More generally, the presence of a polynomial on the left hand side is signalled on the right hand side by the fact that the degree of the numerator is at least as large as the degree of the denominator.
Solution.
40 You might notice these examples conveniently absent in the discussion that follows. In this class, we
will skip decompositions requiring such functions. For the extra-curious, the rest of the story can be
found in Appendix Section A.9.1
\[ \frac{x-3}{x^2-3x+2} = \frac{A}{x-1} + \frac{B}{x-2} \]
for some constants $A$ and $B$. More generally, if the denominator consists of $n$ different linear factors, then we decompose the ratio as
\[ \text{rational function} = \frac{A_1}{\text{linear factor 1}} + \frac{A_2}{\text{linear factor 2}} + \cdots + \frac{A_n}{\text{linear factor } n} \]
To proceed we need to determine the values of the constants $A$, $B$ and there are several different methods to do so. Here are two methods.
• Step 3 – Algebra Method. This approach has the benefit of being conceptually clearer
and easier, but the downside is that it is more tedious.
To determine the values of the constants $A$, $B$, we put42 the right-hand side back over the common denominator $(x-1)(x-2)$.
\[ \frac{x-3}{x^2-3x+2} = \frac{A}{x-1} + \frac{B}{x-2} = \frac{A(x-2) + B(x-1)}{(x-1)(x-2)} \]
The fraction on the far left is the same as the fraction on the far right if and only if their numerators are the same.
\[ x - 3 = A(x-2) + B(x-1) \]
Write the right hand side as a polynomial in standard form (i.e. collect up all $x$ terms and all constant terms)
\[ x - 3 = (A+B)x + (-2A - B) \]
41 We will soon get to an example (Example 3.8.2 in fact) in which the numerator degree is at least as
large as the denominator degree — in that situation we have to extract a polynomial P( x ) before we
can move on to step 2.
42 That is, we take the decomposed form and sum it back together.
For these two polynomials to be the same, the coefficient of $x$ on the left hand side and the coefficient of $x$ on the right hand side must be the same. Similarly the coefficients of $x^0$ (i.e. the constant terms) must match. This gives us a system of two equations.
\[ A + B = 1 \qquad -2A - B = -3 \]
– Solving the first equation for $A$ gives
\[ A = 1 - B \]
– Substituting this into the remaining equation eliminates the $A$ from the second equation, leaving one equation in the one unknown $B$, which can then be solved for $B$:
\begin{align*}
-2A - B &= -3 && \text{substitute } A = 1 - B \\
-2(1-B) - B &= -3 && \text{clean up} \\
-2 + B &= -3 && \text{so } B = -1
\end{align*}
– Substituting $B = -1$ back into $A = 1 - B$ gives
\[ A = 1 - B = 1 - (-1) = 2 \]
Hence
\[ \frac{x-3}{x^2-3x+2} = \frac{2}{x-1} - \frac{1}{x-2} \]
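As a sanity check on the algebra method, here is a minimal Python sketch that solves the two equations $A + B = 1$, $-2A - B = -3$ exactly and then spot-checks the decomposition at a few values of $x$. The helper name `solve_2x2` and the use of Cramer's rule are our own choices for illustration, not part of the text's method.

```python
from fractions import Fraction

def solve_2x2(a11, a12, b1, a21, a22, b2):
    """Solve a11*A + a12*B = b1 and a21*A + a22*B = b2 by Cramer's rule.

    Hypothetical helper for illustration; needs a nonzero determinant.
    """
    det = Fraction(a11 * a22 - a12 * a21)
    if det == 0:
        raise ValueError("system has no unique solution")
    A = (b1 * a22 - a12 * b2) / det
    B = (a11 * b2 - b1 * a21) / det
    return A, B

# Coefficient matching gave: A + B = 1 and -2A - B = -3
A, B = solve_2x2(1, 1, 1, -2, -1, -3)
print(A, B)  # 2 -1

# Spot-check (x - 3)/(x^2 - 3x + 2) = A/(x - 1) + B/(x - 2), avoiding x = 1, 2
for x in (0, 3, 5):
    assert Fraction(x - 3, x * x - 3 * x + 2) == A / (x - 1) + B / (x - 2)
```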
• Step 3 – Sneaky Method. This takes a little more work to understand, but it is more
efficient than the algebra method.
We wish to find $A$ and $B$ for which
\[ \frac{x-3}{(x-1)(x-2)} = \frac{A}{x-1} + \frac{B}{x-2} \]
Note that the denominator on the left hand side has been written in factored form.
– To determine $A$, we multiply both sides of the equation by $x - 1$, which gives
\[ \frac{x-3}{x-2} = A + \frac{(x-1)B}{x-2} \]
and then we completely eliminate $B$ from the equation by evaluating at $x = 1$. This value of $x$ is chosen to make $x - 1 = 0$.
\[ \frac{x-3}{x-2}\bigg|_{x=1} = A + \frac{(x-1)B}{x-2}\bigg|_{x=1} \implies A = \frac{1-3}{1-2} = 2 \]
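The sneaky method is easy to mechanise when the denominator is a product of distinct linear factors $(x - r_1)(x - r_2)\cdots$: the coefficient over $x - r_i$ is the numerator evaluated at $r_i$, divided by the product of the remaining factors evaluated at $r_i$. The sketch below assumes exactly that situation; the function name `cover_up` is ours.

```python
from fractions import Fraction

def cover_up(numerator, roots):
    """Coefficients A_i in numerator(x) / prod(x - r_i) = sum A_i / (x - r_i).

    Illustrative sketch: assumes the roots are distinct, the denominator is
    monic, and the numerator has smaller degree than the denominator.
    """
    coeffs = []
    for i, r in enumerate(roots):
        rest = Fraction(1)
        for j, s in enumerate(roots):
            if j != i:
                rest *= Fraction(r) - Fraction(s)  # other factors at x = r
        coeffs.append(Fraction(numerator(r)) / rest)
    return coeffs

A, B = cover_up(lambda x: x - 3, [1, 2])
print(A, B)  # 2 -1
```

Evaluating at $x = 1$ reproduces $A = 2$, and the same procedure at $x = 2$ gives $B = \frac{2-3}{2-1} = -1$, in agreement with the algebra method.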
Example 3.8.1
Perhaps the first thing that you notice is that this process takes quite a few steps43. However no single step is all that complicated; it only takes practice. With that said, let’s do another, slightly more complicated, one.
Example 3.8.2 $\int \frac{3x^3-8x^2+4x-1}{x^2-3x+2}\,dx$
In this example, we integrate $\frac{N(x)}{D(x)} = \frac{3x^3-8x^2+4x-1}{x^2-3x+2}$.
Solution.
• Step 1. We first check to see if the degree of the numerator $N(x)$ is strictly smaller than the degree of the denominator $D(x)$. In this example, the numerator, $3x^3 - 8x^2 + 4x - 1$, has degree three and the denominator, $x^2 - 3x + 2$, has degree two. As $3 \ge 2$, we have to implement the first step.
The goal of the first step is to write $\frac{N(x)}{D(x)}$ in the form
\[ \frac{N(x)}{D(x)} = P(x) + \frac{R(x)}{D(x)} \]
43 Though, in fairness, we did step 3 twice — and that is the most tedious bit. . . Actually — sometimes
factoring the denominator can be quite challenging. We’ll consider this issue in more detail shortly.
In the above expression, the denominator is on the left, the numerator is on the right and $3x$ is written above the highest order term of the numerator. Always put lower powers of $x$ to the right of higher powers of $x$. This mirrors how you do long division with numbers; lower powers of ten sit to the right of higher powers of ten.
– Now we subtract 3x times the denominator, x2 ´ 3x + 2, which is 3x3 ´ 9x2 + 6x,
from the numerator.
3x
x2 − 3x + 2 3x3 − 8x2 + 4x − 1
3x3 − 9x2 + 6x 3x(x2 − 3x + 2)
x2 − 2x − 1
– This has left a remainder of x2 ´ 2x ´ 1. To get from the highest degree term in
the denominator (x2 ) to the highest degree term in the remainder (x2 ), we have
to multiply by 1. So we write,
3x + 1
x2 − 3x + 2 3x3 − 8x2 + 4x − 1
3x3 − 9x2 + 6x
x2 − 2x − 1
– Now we subtract 1 times the denominator, x2 ´ 3x + 2, which is x2 ´ 3x + 2,
from the remainder.
3x + 1
x2 − 3x + 2 3x3 − 8x2 + 4x − 1
3x3 − 9x2 + 6x 3x(x2 − 3x + 2)
x2 − 2x − 1
x2 − 3x + 2 1 (x2 − 3x + 2)
x− 3
Moving the $(3x+1)(x^2-3x+2)$ to the right hand side and dividing the whole equation by $x^2 - 3x + 2$ gives
\[ \frac{3x^3 - 8x^2 + 4x - 1}{x^2 - 3x + 2} = 3x + 1 + \frac{x-3}{x^2-3x+2} \]
And we can easily check this expression just by summing the two terms on the right-hand side.
We have written the integrand in the form $\frac{N(x)}{D(x)} = P(x) + \frac{R(x)}{D(x)}$, with the degree of $R(x)$ strictly smaller than the degree of $D(x)$, which is what we wanted. Observe that $R(x)$ is the final remainder of the long division procedure and $P(x)$ is at the top of the long division computation.
\[
\begin{array}{rll}
 & 3x + 1 & \leftarrow P(x) \\
D(x) \rightarrow \quad x^2 - 3x + 2 \ \big) & 3x^3 - 8x^2 + 4x - 1 & \leftarrow N(x) \\
 & 3x^3 - 9x^2 + 6x & \leftarrow 3x \cdot D(x) \\
 & x^2 - 2x - 1 & \leftarrow N(x) - 3x \cdot D(x) \\
 & x^2 - 3x + 2 & \leftarrow 1 \cdot D(x) \\
 & x - 3 & \leftarrow R(x) = N(x) - (3x+1) D(x)
\end{array}
\]
This is the end of Step 1. Oof! You should definitely practice this step.
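Since Step 1 is the part most worth practising, it can help to have a way of checking your long division. The following is a minimal sketch, assuming polynomials are stored as lists of coefficients with the highest power first; the name `poly_divmod` is hypothetical.

```python
from fractions import Fraction

def poly_divmod(num, den):
    """Polynomial long division on coefficient lists (highest power first).

    Returns (quotient, remainder) with deg(remainder) < deg(den).
    Illustrative sketch; den must have a nonzero leading coefficient.
    """
    rem = [Fraction(c) for c in num]
    den = [Fraction(c) for c in den]
    if not den or den[0] == 0:
        raise ValueError("leading coefficient of the divisor must be nonzero")
    quotient = []
    while len(rem) >= len(den):
        factor = rem[0] / den[0]     # next term of the quotient
        quotient.append(factor)
        for i, c in enumerate(den):  # subtract factor * den, aligned at the top
            rem[i] -= factor * c
        rem.pop(0)                   # the leading term has been cancelled
    return quotient, rem

# N(x) = 3x^3 - 8x^2 + 4x - 1 divided by D(x) = x^2 - 3x + 2
P, R = poly_divmod([3, -8, 4, -1], [1, -3, 2])
print([int(c) for c in P], [int(c) for c in R])  # [3, 1] [1, -3]
```

The output lists encode $P(x) = 3x + 1$ and $R(x) = x - 3$, as found by hand above.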
• Step 2. The second step is to factor the denominator
\[ x^2 - 3x + 2 = (x-1)(x-2) \]
We already did this in Example 3.8.1.
• Step 3. The third step is to write $\frac{x-3}{x^2-3x+2}$ in the form
\[ \frac{x-3}{x^2-3x+2} = \frac{A}{x-1} + \frac{B}{x-2} \]
for some constants $A$ and $B$. We already did this in Example 3.8.1. We found $A = 2$ and $B = -1$.
• Step 4. The final step is to integrate.
\begin{align*}
\int \frac{3x^3 - 8x^2 + 4x - 1}{x^2 - 3x + 2}\,dx &= \int (3x+1)\,dx + \int \frac{2}{x-1}\,dx + \int \frac{-1}{x-2}\,dx \\
&= \frac{3}{2}x^2 + x + 2\ln|x-1| - \ln|x-2| + C
\end{align*}
You can see that the integration step is quite quick — almost all the work is in preparing
the integrand.
Example 3.8.2
The best thing after working through a few nice medium-length examples is to do a nice long example; it is excellent practice44. We recommend that the reader attempt the problem before reading through our solution.
Problems like this are a vehicle for practicing problem solving in general. Take a big problem, cut it up into smaller chunks, solve those chunks, and put everything back together into a single solution, without getting lost or giving up. Even if you forget partial fraction decompositions the minute you walk out of the final exam, the ability to solve a big problem by cutting it into smaller ones will remain a useful life skill.
Example 3.8.3 $\int \frac{4x^3+23x^2+45x+27}{x^3+5x^2+8x+4}\,dx$
• Step 1. The degree of the numerator $N(x)$ is equal to the degree of the denominator $D(x)$, so the first step is to write $\frac{N(x)}{D(x)}$ in the form
\[ \frac{N(x)}{D(x)} = P(x) + \frac{R(x)}{D(x)} \]
with $P(x)$ being a polynomial (which should be of degree 0, i.e. just a constant) and $R(x)$ being a polynomial of degree strictly smaller than the degree of $D(x)$. By long division
\[
\begin{array}{rl}
 & 4 \\
x^3 + 5x^2 + 8x + 4 \ \big) & 4x^3 + 23x^2 + 45x + 27 \\
 & 4x^3 + 20x^2 + 32x + 16 \\
 & 3x^2 + 13x + 11
\end{array}
\]
so
\[ \frac{4x^3+23x^2+45x+27}{x^3+5x^2+8x+4} = 4 + \frac{3x^2+13x+11}{x^3+5x^2+8x+4} \]
44 At the risk of quoting Nietzsche, “That which does not kill us makes us stronger.” Though this author
always preferred the logically equivalent contrapositive — “That which does not make us stronger will
kill us.” However no one is likely to be injured by practicing partial fractions or looking up quotes
on Wikipedia. It’s also a good excuse to remind yourself of what a contrapositive is, though we will
likely look at them again when we get to sequences and series.
– Notice that we could have instead checked whether or not $\pm 2$ are roots
We now know that both $-1$ and $-2$ are roots of $x^3 + 5x^2 + 8x + 4$ and hence both $(x+1)$ and $(x+2)$ are factors of $x^3 + 5x^2 + 8x + 4$. Because $x^3 + 5x^2 + 8x + 4$ is of degree three and the coefficient of $x^3$ is 1, we must have $x^3 + 5x^2 + 8x + 4 = (x+1)(x+2)(x+a)$ for some constant $a$. Multiplying out the right hand side shows that the constant term is $2a$. So $2a = 4$ and $a = 2$.
This is the end of step 2. We now know that
\[ \frac{3x^2+13x+11}{(x+1)(x+2)^2} = \frac{A}{x+1} + \frac{B}{x+2} + \frac{C}{(x+2)^2} \]
for some constants $A$, $B$ and $C$.
Note that there are two terms on the right hand side arising from the factor $(x+2)^2$. One has denominator $(x+2)$ and one has denominator $(x+2)^2$. More generally, for each factor $(x+a)^n$ in the denominator of the rational function on the left hand side, we include
\[ \frac{A_1}{x+a} + \frac{A_2}{(x+a)^2} + \cdots + \frac{A_n}{(x+a)^n} \]
To determine the constants, we put the right hand side back over the common denominator:
\begin{align*}
\frac{3x^2+13x+11}{(x+1)(x+2)^2} &= \frac{A}{x+1} + \frac{B}{x+2} + \frac{C}{(x+2)^2} \\
&= \frac{A(x+2)^2 + B(x+1)(x+2) + C(x+1)}{(x+1)(x+2)^2}
\end{align*}
The fraction on the far left is the same as the fraction on the far right if and only if their numerators are the same.
\[ 3x^2 + 13x + 11 = A(x+2)^2 + B(x+1)(x+2) + C(x+1) \]
As in the previous examples, there are a couple of different ways to determine the
values of A, B and C from this equation.
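• Step 3 – Algebra Method. The conceptually clearest procedure is to write the right hand side as a polynomial in standard form (i.e. collect up all $x^2$ terms, all $x$ terms and all constant terms)
\[ 3x^2 + 13x + 11 = (A+B)x^2 + (4A+3B+C)x + (4A+2B+C) \]
For these two polynomials to be the same, the coefficient of $x^2$ on the left hand side and the coefficient of $x^2$ on the right hand side must be the same. Similarly the coefficients of $x^1$ and the coefficients of $x^0$ (i.e. the constant terms) must match. This gives us a system of three equations,
\[ A + B = 3 \qquad 4A + 3B + C = 13 \qquad 4A + 2B + C = 11 \]
Substituting $A = 3 - B$ from the first equation into the other two gives
\[ 4(3-B) + 3B + C = 13 \qquad 4(3-B) + 2B + C = 11 \]
or
\[ -B + C = 1 \qquad -2B + C = -1 \]
Subtracting the second of these from the first gives $B = 2$, and then $C = 1 + B = 3$ and $A = 3 - B = 1$.
Hence
\[ \frac{3x^2+13x+11}{(x+1)(x+2)^2} = \frac{1}{x+1} + \frac{2}{x+2} + \frac{3}{(x+2)^2} \]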
• Step 3 – Sneaky Method. The second, sneakier, method for finding $A$, $B$ and $C$ exploits the fact that $3x^2 + 13x + 11 = A(x+2)^2 + B(x+1)(x+2) + C(x+1)$ must be true for all values of $x$. In particular, it must be true for $x = -1$. When $x = -1$, the factor $(x+1)$ multiplying $B$ and $C$ is exactly zero. So $B$ and $C$ disappear from the equation, leaving us with an easy equation to solve for $A$:
\[ 3x^2 + 13x + 11\Big|_{x=-1} = \Big[ A(x+2)^2 + B(x+1)(x+2) + C(x+1) \Big]_{x=-1} \implies 1 = A \]
Substituting $A = 1$ and moving $(x+2)^2$ to the left hand side gives $2x^2 + 9x + 7 = B(x+1)(x+2) + C(x+1)$. Since $(x+1)$ is a factor on the right hand side, it must also be a factor on the left hand side. Indeed $2x^2 + 9x + 7 = (x+1)(2x+7)$, so cancelling $(x+1)$ leaves $2x + 7 = B(x+2) + C$.
For the coefficients of $x$ to match, $B$ must be 2. For the constant terms to match, $2B + C$ must be 7, so $C$ must be 3. Hence we again have
\[ \frac{3x^2+13x+11}{(x+1)(x+2)^2} = \frac{1}{x+1} + \frac{2}{x+2} + \frac{3}{(x+2)^2} \]
Example 3.8.3
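The three-equation system from the algebra method of Example 3.8.3, namely $A + B = 3$, $4A + 3B + C = 13$, $4A + 2B + C = 11$, can also be solved mechanically. The sketch below uses Gauss-Jordan elimination with exact fractions; the helper `solve_linear` is our own and assumes the system has a unique solution.

```python
from fractions import Fraction

def solve_linear(matrix, rhs):
    """Gauss-Jordan elimination over exact fractions.

    Illustrative helper; assumes the system has a unique solution.
    """
    n = len(rhs)
    rows = [[Fraction(v) for v in row] + [Fraction(b)]
            for row, b in zip(matrix, rhs)]
    for col in range(n):
        # find a row at or below the diagonal with a nonzero entry in this column
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        rows[col] = [v / rows[col][col] for v in rows[col]]
        for r in range(n):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [v - factor * p for v, p in zip(rows[r], rows[col])]
    return [row[-1] for row in rows]

A, B, C = solve_linear([[1, 1, 0], [4, 3, 1], [4, 2, 1]], [3, 13, 11])
print(A, B, C)  # 1 2 3

# Spot-check the decomposition at a few points away from x = -1 and x = -2
for x in (0, 1, 3):
    lhs = Fraction(3 * x * x + 13 * x + 11, (x + 1) * (x + 2) ** 2)
    assert lhs == A / (x + 1) + B / (x + 2) + C / (x + 2) ** 2
```

The values $A = 1$, $B = 2$, $C = 3$ agree with those found by the sneaky method.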
Solution. In this example, we integrate $\sec x$. It is not yet clear what this integral has to do with partial fractions. To get to a partial fractions computation, we first make one of our old substitutions.
\begin{align*}
\int \sec x\,dx &= \int \frac{1}{\cos x}\,dx && \text{massage the expression a little} \\
&= \int \frac{\cos x}{\cos^2 x}\,dx && \text{substitute } u = \sin x,\ du = \cos x\,dx \\
&= -\int \frac{du}{u^2-1} && \text{and use } \cos^2 x = 1 - \sin^2 x = 1 - u^2
\end{align*}
So we now have to integrate $\frac{1}{u^2-1}$, which is a rational function of $u$, and so is perfect for partial fractions.
• Step 1. The degree of the numerator, 1, is zero, which is strictly smaller than the degree of the denominator, $u^2 - 1$, which is two. So the first step is skipped.
• Step 2. The second step is to factor the denominator:
\[ u^2 - 1 = (u-1)(u+1) \]
• Step 3. The third step is to write $\frac{1}{u^2-1}$ in the form
\[ \frac{1}{u^2-1} = \frac{1}{(u-1)(u+1)} = \frac{A}{u-1} + \frac{B}{u+1} \]
Putting the right hand side over the common denominator and comparing numerators gives
\[ 1 = A(u+1) + B(u-1) \]
Setting $u = 1$ gives
\[ 1 = 2A \quad \text{so } A = 1/2. \]
Another example in the same spirit, though a touch harder. Again, we saw this problem in Sections 3.6 and 3.7.
Example 3.8.5 $\int \sec^3 x\,dx$
Solution.
• We’ll start by converting it into the integral of a rational function using the substitution $u = \sin x$, $du = \cos x\,dx$.
\begin{align*}
\int \sec^3 x\,dx &= \int \frac{1}{\cos^3 x}\,dx && \text{massage this a little} \\
&= \int \frac{\cos x}{\cos^4 x}\,dx && \text{replace } \cos^2 x = 1 - \sin^2 x = 1 - u^2 \\
&= \int \frac{\cos x\,dx}{[1 - \sin^2 x]^2} \\
&= \int \frac{du}{[1-u^2]^2}
\end{align*}
• We could now find the partial fraction decomposition of the integrand $\frac{1}{[1-u^2]^2}$ by executing the usual four steps. But it is easier to use
\[ \frac{1}{u^2-1} = \frac{1}{2}\Big[ \frac{1}{u-1} - \frac{1}{u+1} \Big] \]
which we worked out in Example 3.8.4 above.
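One way to use this shortcut (our own sketch of the next step, not necessarily the route the text takes) is to square the identity and then apply it once more to the cross term $\frac{2}{(u-1)(u+1)}$, which would give
$\frac{1}{[1-u^2]^2} = \frac{1}{4}\big[\frac{1}{(u-1)^2} - \frac{1}{u-1} + \frac{1}{u+1} + \frac{1}{(u+1)^2}\big]$.
The short Python check below compares both sides of this candidate decomposition exactly at a handful of rational sample points. Agreement at sample points is evidence, not a proof.

```python
from fractions import Fraction

def lhs(u):
    # the integrand 1 / (1 - u^2)^2
    return 1 / (1 - u * u) ** 2

def rhs(u):
    # candidate decomposition obtained by squaring the identity for 1/(u^2 - 1)
    return Fraction(1, 4) * (
        1 / (u - 1) ** 2 - 1 / (u - 1) + 1 / (u + 1) + 1 / (u + 1) ** 2
    )

# rational sample points, none equal to +1 or -1
samples = [Fraction(n, 7) for n in (-20, -3, 0, 2, 5, 30)]
agree = all(lhs(u) == rhs(u) for u in samples)
print(agree)  # True
```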
Example 3.8.5
In this subsection we fill that gap by describing the general46 form of partial fraction
decompositions. The justification of these forms is not part of the course, but the interested
reader is invited to read Appendix section A.9.2 where such justification is given. In the
following it is assumed that
• N ( x ) and D ( x ) are polynomials with the degree of N ( x ) strictly smaller than the
degree of D ( x ).
• K is a constant.
Equation 3.8.6.
\[ \frac{N(x)}{D(x)} = \frac{A_1}{x-a_1} + \frac{A_2}{x-a_2} + \cdots + \frac{A_j}{x-a_j} \]
\[ \int \frac{A}{x-a}\,dx = A\ln|x-a| + C. \]
Equation 3.8.7.
46 Well — not the completely general form, in the sense that we are not allowing the use of complex
numbers and we are not allowing irreducible quadratic factors.
I NTEGRATION 3.9 N UMERICAL I NTEGRATION
Every polynomial can be factored into the product of linear and irreducible quadratic47 factors. The general method of partial fractions makes use of this and does indeed give us the ability to antidifferentiate any rational function. However, for the purposes of this class, we will restrict our study to decomposing rational functions whose denominators can be factored into linear terms48. For the extra-curious, the cases involving irreducible quadratics can be found in Appendix Section A.9.1.
dard functions49 . Such integrals are not merely mathematical curiosities, but arise very
naturally in many contexts. For example, the error function
\[ \operatorname{erf}(x) = \frac{2}{\sqrt{\pi}} \int_0^x e^{-t^2}\,dt \]
is extremely important in many areas of mathematics, and also in many practical applica-
tions of statistics.
47 An irreducible quadratic equation is a quadratic equation that can’t be factored (at least if we restrict
ourselves to real numbers). This is the same as a quadratic equation with no (real) roots. The simplest
example is x2 + 1.
48 Actually, if you allow complex numbers, all polynomials can be factored into linear terms, because no
quadratics are irreducible. But again, we’ll contain our excitement and restrict ourselves to performing
partial fraction decompositions on rational functions whose denominators are the product of linear
factors with real terms.
49 We apologise for being a little sloppy here — but we just want to say that it can be very hard or even
impossible to write some integrals as some finite sized expression involving polynomials, exponen-
tials, logarithms and trigonometric functions. We don’t want to get into a discussion of computability,
though that is a very interesting topic.
In such applications we need to be able to evaluate this integral (and many others) at a given numerical value of $x$. In this section we turn to the problem of how to find (approximate) numerical values for integrals, without having to evaluate them algebraically. To develop these methods we return to Riemann sums and our geometric interpretation of the definite integral as the signed area.
We start by describing (and applying) three simple algorithms for generating, numerically, approximate values for the definite integral $\int_a^b f(x)\,dx$. In each algorithm, we begin in much the same way as we approached Riemann sums.
[Figure: the graph of $y = f(x)$, with the interval from $a = x_0$ to $b = x_n$ on the $x$-axis divided at the points $x_1, x_2, x_3, \cdots, x_{n-1}$.]
\[ \int_a^b f(x)\,dx = \int_{x_0}^{x_1} f(x)\,dx + \int_{x_1}^{x_2} f(x)\,dx + \cdots + \int_{x_{n-1}}^{x_n} f(x)\,dx \]
Each subintegral $\int_{x_{j-1}}^{x_j} f(x)\,dx$ is approximated by the area of a simple geometric figure. The three algorithms we consider approximate the area by rectangles, trapezoids and parabolas (respectively).
We will explain these rules in detail below, but we give a brief overview here:
(1) The midpoint rule approximates each subintegral by the area of a rectangle of height given by the value of the function at the midpoint of the subinterval
\[ \int_{x_{j-1}}^{x_j} f(x)\,dx \approx f\Big( \frac{x_{j-1}+x_j}{2} \Big) \Delta x \]
This is illustrated in the leftmost figure above.
(2) The trapezoidal rule approximates each subintegral by the area of a trapezoid with vertices at $(x_{j-1}, 0)$, $(x_{j-1}, f(x_{j-1}))$, $(x_j, f(x_j))$, $(x_j, 0)$:
\[ \int_{x_{j-1}}^{x_j} f(x)\,dx \approx \frac{1}{2}\Big[ f(x_{j-1}) + f(x_j) \Big] \Delta x \]
The trapezoid is illustrated in the middle figure above. We shall derive the formula for the area shortly.
(3) Simpson’s rule approximates two adjacent subintegrals by the area under a parabola that passes through the points $(x_{j-1}, f(x_{j-1}))$, $(x_j, f(x_j))$ and $(x_{j+1}, f(x_{j+1}))$:
\[ \int_{x_{j-1}}^{x_{j+1}} f(x)\,dx \approx \frac{1}{3}\Big[ f(x_{j-1}) + 4f(x_j) + f(x_{j+1}) \Big] \Delta x \]
The parabola is illustrated in the right hand figure above. We shall derive the formula for the area shortly.
Notation 3.9.1 (Midpoints).
In what follows we need to refer to the midpoint between $x_{j-1}$ and $x_j$ very frequently. To save on writing (and typing) we introduce the notation
\[ \bar x_j = \frac{1}{2}\big( x_{j-1} + x_j \big). \]
The area of the approximating rectangle is $f(\bar x_j)\Delta x$, and the midpoint rule approximates each subintegral by
\[ \int_{x_{j-1}}^{x_j} f(x)\,dx \approx f(\bar x_j)\Delta x. \]
Applying this approximation to each subinterval and summing gives us the following
approximation of the full integral:
\begin{align*}
\int_a^b f(x)\,dx &= \int_{x_0}^{x_1} f(x)\,dx + \int_{x_1}^{x_2} f(x)\,dx + \cdots + \int_{x_{n-1}}^{x_n} f(x)\,dx \\
&\approx f(\bar x_1)\Delta x + f(\bar x_2)\Delta x + \cdots + f(\bar x_n)\Delta x
\end{align*}
So notice that the approximation is the sum of the function evaluated at the midpoint
of each interval and then multiplied by ∆x. Our other approximations will have similar
forms.
In summary:
where $\Delta x = \frac{b-a}{n}$ and
Example 3.9.3 $\int_0^1 \frac{4}{1+x^2}\,dx$
We approximate the above integral using the midpoint rule with $n = 8$ steps.
Solution.
• First we set up all the $x$-values that we will need. Note that $a = 0$, $b = 1$, $\Delta x = \frac{1}{8}$ and
\[ x_0 = 0 \qquad x_1 = \tfrac{1}{8} \qquad x_2 = \tfrac{2}{8} \qquad \cdots \qquad x_7 = \tfrac{7}{8} \qquad x_8 = \tfrac{8}{8} = 1 \]
Consequently
\[ \bar x_1 = \tfrac{1}{16} \qquad \bar x_2 = \tfrac{3}{16} \qquad \bar x_3 = \tfrac{5}{16} \qquad \cdots \qquad \bar x_8 = \tfrac{15}{16} \]
• We now apply Equation (3.9.2) to the integrand $f(x) = \frac{4}{1+x^2}$:
• In this case we can compute the integral exactly (which is one of the reasons it was chosen as a first example):
\[ \int_0^1 \frac{4}{1+x^2}\,dx = 4\arctan x \Big|_0^1 = \pi \]
• So the error in the approximation generated by eight steps of the midpoint rule is
\[ |3.1429 - \pi| = 0.0013 \]
\[ \text{percentage error} = 100 \times \frac{|\text{approximate} - \text{exact}|}{\text{exact}} = 0.04\% \]
That is, the error is about 0.04% of the exact value.
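The eight-step computation is tedious by hand but only a few lines on a computer. Here is a minimal Python sketch of the midpoint rule applied to this example; the function name `midpoint_rule` is our own, and ordinary floating point arithmetic is assumed to be accurate enough for four decimal places.

```python
import math

def midpoint_rule(f, a, b, n):
    """Approximate the integral of f over [a, b] with n midpoint-rule steps."""
    dx = (b - a) / n
    # the j-th midpoint is a + (j - 1/2) * dx for j = 1, ..., n
    return dx * sum(f(a + (j - 0.5) * dx) for j in range(1, n + 1))

approx = midpoint_rule(lambda x: 4 / (1 + x * x), 0.0, 1.0, 8)
error = abs(approx - math.pi)
print(round(approx, 4), round(error, 4))  # 3.1429 0.0013
print(round(100 * error / math.pi, 2))    # 0.04 (percentage error)
```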
Example 3.9.3
The midpoint rule gives us quite good estimates of the integral without too much work —
though it is perhaps a little tedious to do by hand50 . Of course, it would be very helpful to
quantify what we mean by “good” in this context and that requires us to discuss errors.
Definition 3.9.4.
As a second example, we apply the midpoint rule with $n = 8$ steps to the above integral.
Consequently,
\[ \bar x_1 = \tfrac{\pi}{16} \qquad \bar x_2 = \tfrac{3\pi}{16} \qquad \cdots \qquad \bar x_7 = \tfrac{13\pi}{16} \qquad \bar x_8 = \tfrac{15\pi}{16} \]
• Again, we have chosen this example so that we can compare it against the exact value:
\[ \int_0^\pi \sin x\,dx = \Big[ -\cos x \Big]_0^\pi = -\cos\pi + \cos 0 = 2. \]
[Figure: two sketches of the graph between $x_{j-1}$ and $x_j$, with the heights $f(x_{j-1})$ and $f(x_j)$ marked; in the right-hand sketch the approximating trapezoid is shaded.]
The trapezoidal approximation of the integral $\int_{x_{j-1}}^{x_j} f(x)\,dx$ is the shaded region in the figure on the right above. It has width $x_j - x_{j-1} = \Delta x$. Its left hand side has height $f(x_{j-1})$ and its right hand side has height $f(x_j)$.
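As the figure below shows, the area of a trapezoid is its width times its average height.
[Figure: a trapezoid of width $w$ whose left side has height $\ell$ and whose right side has height $r$, split into a rectangle of area $\ell w$ and a triangle of area $(r-\ell)w/2$, for a total area of $(r+\ell)w/2$.]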
51 This method is also called the “trapezoid rule” and “trapezium rule”.
52 A trapezoid is a four sided polygon, like a rectangle. But, unlike a rectangle, the top and bottom of a
trapezoid need not be parallel.
Applying this approximation to each subinterval and then summing the result gives us the following approximation of the full integral
\begin{align*}
\int_a^b f(x)\,dx &= \int_{x_0}^{x_1} f(x)\,dx + \int_{x_1}^{x_2} f(x)\,dx + \cdots + \int_{x_{n-1}}^{x_n} f(x)\,dx \\
&\approx \tfrac{f(x_0)+f(x_1)}{2}\Delta x + \tfrac{f(x_1)+f(x_2)}{2}\Delta x + \cdots + \tfrac{f(x_{n-1})+f(x_n)}{2}\Delta x \\
&= \Big[ \tfrac{1}{2}f(x_0) + f(x_1) + f(x_2) + \cdots + f(x_{n-1}) + \tfrac{1}{2}f(x_n) \Big] \Delta x
\end{align*}
So notice that the approximation has a very similar form to the midpoint rule, excepting
that
In summary:
where
\[ \Delta x = \tfrac{b-a}{n}, \quad x_0 = a, \quad x_1 = a + \Delta x, \quad x_2 = a + 2\Delta x, \quad \cdots, \quad x_{n-1} = b - \Delta x, \quad x_n = b \]
To compare and contrast we apply the trapezoidal rule to the examples we did above with the midpoint rule.
Example 3.9.7 $\int_0^1 \frac{4}{1+x^2}\,dx$ (using the trapezoidal rule)
Solution. We proceed very similarly to Example 3.9.3 and again use $n = 8$ steps.
• We again have $f(x) = \frac{4}{1+x^2}$, $a = 0$, $b = 1$, $\Delta x = \frac{1}{8}$ and
\[ x_0 = 0 \qquad x_1 = \tfrac{1}{8} \qquad x_2 = \tfrac{2}{8} \qquad \cdots \qquad x_7 = \tfrac{7}{8} \qquad x_8 = \tfrac{8}{8} = 1 \]
Example 3.9.7
Solution. We proceed very similarly to Example 3.9.5 and again use $n = 8$ steps.
• We again have $a = 0$, $b = \pi$, $\Delta x = \frac{\pi}{8}$ and
\[ x_0 = 0 \qquad x_1 = \tfrac{\pi}{8} \qquad x_2 = \tfrac{2\pi}{8} \qquad \cdots \qquad x_7 = \tfrac{7\pi}{8} \qquad x_8 = \tfrac{8\pi}{8} = \pi \]
Example 3.9.8
These two examples suggest that the midpoint rule is more accurate than the trapezoidal rule. Indeed, this observation is borne out by a rigorous analysis of the error (see Section 3.9.4).
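The comparison between the two rules can be reproduced numerically. The sketch below implements both rules in Python (the function names are ours) and prints the absolute errors for the two running examples with $n = 8$ steps, together with the ratio of the trapezoidal error to the midpoint error.

```python
import math

def midpoint_rule(f, a, b, n):
    """Midpoint rule with n steps on [a, b]."""
    dx = (b - a) / n
    return dx * sum(f(a + (j - 0.5) * dx) for j in range(1, n + 1))

def trapezoidal_rule(f, a, b, n):
    """Trapezoidal rule with n steps on [a, b]."""
    dx = (b - a) / n
    inner = sum(f(a + j * dx) for j in range(1, n))  # interior points x_1..x_{n-1}
    return dx * (0.5 * f(a) + inner + 0.5 * f(b))

examples = [
    ("4/(1+x^2) on [0,1]", lambda x: 4 / (1 + x * x), 0.0, 1.0, math.pi),
    ("sin x on [0,pi]", math.sin, 0.0, math.pi, 2.0),
]
ratios = []
for name, f, a, b, exact in examples:
    err_mid = abs(midpoint_rule(f, a, b, 8) - exact)
    err_trap = abs(trapezoidal_rule(f, a, b, 8) - exact)
    ratios.append(err_trap / err_mid)
    print(name, err_mid, err_trap, err_trap / err_mid)
```

In both examples the trapezoidal error comes out roughly twice the midpoint error.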
[Figure: the parabola through the three points $(x_0, f(x_0))$, $(x_1, f(x_1))$ and $(x_2, f(x_2))$, drawn over the interval from $x_0$ to $x_2$.]
this on the next pair of subintervals and approximate $\int_{x_2}^{x_4} f(x)\,dx$ by the area between the $x$–axis and the part of a parabola with $x_2 \le x \le x_4$. This parabola passes through the three points $(x_2, f(x_2))$, $(x_3, f(x_3))$ and $(x_4, f(x_4))$. And so on. Because Simpson’s rule does the approximation two slices at a time, $n$ must be even.
To derive Simpson’s rule formula, we first find the equation of the parabola that passes through the three points $(x_0, f(x_0))$, $(x_1, f(x_1))$ and $(x_2, f(x_2))$. Then we find the area
53 Simpson’s rule is named after the 18th century English mathematician Thomas Simpson, despite its
use a century earlier by the German mathematician and astronomer Johannes Kepler. In many German
texts the rule is often called Kepler’s rule.
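between the $x$–axis and the part of that parabola with $x_0 \le x \le x_2$. To simplify this computation consider a parabola passing through the points $(-h, y_{-1})$, $(0, y_0)$ and $(h, y_1)$.
Write the equation of the parabola as
\[ y = Ax^2 + Bx + C \]
Then the area between it and the $x$-axis with $x$ running from $-h$ to $h$ is
\begin{align*}
\int_{-h}^{h} \big( Ax^2 + Bx + C \big)\,dx &= \Big[ \frac{A}{3}x^3 + \frac{B}{2}x^2 + Cx \Big]_{-h}^{h} \\
&= \frac{2A}{3}h^3 + 2Ch && \text{it is helpful to write it as} \\
&= \frac{h}{3}\big( 2Ah^2 + 6C \big)
\end{align*}
Now, the three points $(-h, y_{-1})$, $(0, y_0)$ and $(h, y_1)$ lie on this parabola if and only if
\[ Ah^2 - Bh + C = y_{-1} \qquad C = y_0 \qquad Ah^2 + Bh + C = y_1 \]
Adding the first and third equations gives $2Ah^2 + 2C = y_{-1} + y_1$, so that $2Ah^2 + 6C = y_{-1} + 4y_0 + y_1$ and
\begin{align*}
\text{area} = \int_{-h}^{h} \big( Ax^2 + Bx + C \big)\,dx &= \frac{h}{3}\big( 2Ah^2 + 6C \big) \\
&= \frac{h}{3}\big( y_{-1} + 4y_0 + y_1 \big)
\end{align*}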
Note that here
• $y_{-1}$ is the height of the parabola at the left hand end of the interval under consideration
• $y_0$ is the height of the parabola at the middle point of the interval under consideration
• $y_1$ is the height of the parabola at the right hand end of the interval under consideration
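The area formula $\frac{h}{3}(y_{-1} + 4y_0 + y_1)$ can be checked against the exact integral $\frac{2A}{3}h^3 + 2Ch$ for any particular parabola. The following Python sketch does this with exact fractions for a few arbitrarily chosen values of $A$, $B$, $C$ and $h$; the function names are ours.

```python
from fractions import Fraction

def exact_area(A, B, C, h):
    """Exact integral of A*x**2 + B*x + C over [-h, h]; the B term integrates to 0."""
    return Fraction(2, 3) * A * h ** 3 + 2 * C * h

def three_point_area(A, B, C, h):
    """(h/3) * (y_{-1} + 4*y_0 + y_1) using heights at x = -h, 0, h."""
    def y(x):
        return A * x * x + B * x + C
    return Fraction(h) / 3 * (y(-h) + 4 * y(0) + y(h))

cases = [(1, 0, 0, 1), (2, -3, 5, 4), (-1, 7, 2, Fraction(1, 2))]
matches = [exact_area(*c) == three_point_area(*c) for c in cases]
print(matches)  # [True, True, True]
```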
and
\[ \int_{x_2}^{x_4} f(x)\,dx \approx \tfrac{1}{3}\Delta x \Big[ f(x_2) + 4f(x_3) + f(x_4) \Big] \]
and so on. Summing these all together gives:
\begin{align*}
\int_a^b f(x)\,dx &= \int_{x_0}^{x_2} f(x)\,dx + \int_{x_2}^{x_4} f(x)\,dx + \int_{x_4}^{x_6} f(x)\,dx + \cdots + \int_{x_{n-2}}^{x_n} f(x)\,dx \\
&\approx \tfrac{\Delta x}{3}\Big[ f(x_0) + 4f(x_1) + f(x_2) \Big] + \tfrac{\Delta x}{3}\Big[ f(x_2) + 4f(x_3) + f(x_4) \Big] \\
&\qquad + \tfrac{\Delta x}{3}\Big[ f(x_4) + 4f(x_5) + f(x_6) \Big] + \cdots + \tfrac{\Delta x}{3}\Big[ f(x_{n-2}) + 4f(x_{n-1}) + f(x_n) \Big] \\
&= \Big[ f(x_0) + 4f(x_1) + 2f(x_2) + 4f(x_3) + 2f(x_4) + \cdots + 2f(x_{n-2}) + 4f(x_{n-1}) + f(x_n) \Big] \tfrac{\Delta x}{3}
\end{align*}
In summary
Notice that Simpson’s rule requires essentially no more work than the trapezoidal rule. In both rules we must evaluate $f(x)$ at $x = x_0, x_1, \ldots, x_n$, but we add those terms multiplied by different constants54.
Let’s put it to work on our two running examples.
Example 3.9.10 $\int_0^1 \frac{4}{1+x^2}\,dx$ (using Simpson’s rule)
Solution. We proceed almost identically to Example 3.9.7 and again use $n = 8$ steps.
54 There is an easy generalisation of Simpson’s rule that uses cubics instead of parabolas. It leads to the formula
\[ \int_a^b f(x)\,dx \approx \frac{3\Delta x}{8}\Big[ f(x_0) + 3f(x_1) + 3f(x_2) + 2f(x_3) + 3f(x_4) + 3f(x_5) + 2f(x_6) + \cdots + 3f(x_{n-1}) + f(x_n) \Big] \]
where $n$ is a multiple of 3. This result is known as Simpson’s second rule and Simpson’s 3/8 rule. While one can push this approach further (using quartics, quintics etc), it can sometimes lead to larger errors. The interested reader should look up Runge’s phenomenon.
Solution. We proceed almost identically to Example 3.9.8 and again use $n = 8$ steps.
\begin{align*}
&= 15.280932 \times 0.130900 \\
&= 2.00027
\end{align*}
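The 1, 4, 2, 4, ..., 2, 4, 1 weighting of Simpson’s rule is straightforward to code. Below is a minimal Python sketch (the function name `simpson_rule` is ours) applied to $\int_0^\pi \sin x\,dx$ with $n = 8$ steps, which reproduces the value found above.

```python
import math

def simpson_rule(f, a, b, n):
    """Simpson's rule with n steps on [a, b]; n must be even."""
    if n % 2 != 0:
        raise ValueError("Simpson's rule needs an even number of steps")
    dx = (b - a) / n
    total = f(a) + f(b)                # end points have weight 1
    for j in range(1, n):
        weight = 4 if j % 2 == 1 else 2  # odd-numbered points 4, even-numbered 2
        total += weight * f(a + j * dx)
    return total * dx / 3

approx = simpson_rule(math.sin, 0.0, math.pi, 8)
print(round(approx, 5))  # 2.00027
```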
55 Indeed, even beyond the “real world” of many applications in first year calculus texts, some of the
methods we have described are used by actual people (such as ship builders, engineers and surveyors)
to estimate areas and volumes of actual objects!
Observe that
• Using 101 evaluations of $f$ worth of Simpson’s rule gives an error 80 times smaller than 1000 evaluations of $f$ worth of the midpoint rule.
• The trapezoidal rule error with $n$ steps is about twice the midpoint rule error with $n$ steps.
• With the midpoint rule, increasing the number of steps by a factor of 10 appears to reduce the error by about a factor of $100 = 10^2 = n^2$.
• With the trapezoidal rule, increasing the number of steps by a factor of 10 appears to reduce the error by about a factor of $10^2 = n^2$.
So it looks like
\begin{align*}
\text{approx value of } \int_a^b f(x)\,dx \text{ given by } n \text{ midpoint steps} &\approx \int_a^b f(x)\,dx + K_M \cdot \frac{1}{n^2} \\
\text{approx value of } \int_a^b f(x)\,dx \text{ given by } n \text{ trapezoidal steps} &\approx \int_a^b f(x)\,dx + K_T \cdot \frac{1}{n^2} \\
\text{approx value of } \int_a^b f(x)\,dx \text{ given by } n \text{ Simpson’s steps} &\approx \int_a^b f(x)\,dx + K_S \cdot \frac{1}{n^4}
\end{align*}
Figure 3.9.1. A log-log plot of the error in the $n$ step approximation to $\int_0^\pi \sin x\,dx$. Both panels plot $y = \log_2 e_n$ against $x = \log_2 n$. Left panel: midpoint rule, with fitted line $y = -0.2706 - 2.0011\,x$. Right panel: Simpson’s rule, with fitted line $y = 0.35 - 4.03\,x$.
To test these conjectures for the behaviour of the errors we apply our three rules with about ten different choices of $n$ of the form $n = 2^m$ with $m$ integer. Figure 3.9.1 contains two graphs of the results. The left-hand plot shows the results for the midpoint and trapezoidal rules and the right-hand plot shows the results for Simpson’s rule.
For each rule we are expecting (based on our conjectures above) that the error
\[ e_n = |\text{exact value} - \text{approximate value}| \]
with $n$ steps is (roughly) of the form
\[ e_n = K \frac{1}{n^k} \]
for some constants $K$ and $k$. We would like to test if this is really the case, by graphing $Y = e_n$ against $X = n$ and seeing if the graph “looks right”. But it is not easy to tell whether or not a given curve really is $Y = \frac{K}{X^k}$, for some specific $k$, by just looking at it. However, your eye is pretty good at determining whether or not a graph is a straight line. Fortunately, there is a little trick that turns the curve $Y = \frac{K}{X^k}$ into a straight line, no matter what $k$ is.
Instead of plotting $Y$ against $X$, we plot $\log Y$ against $\log X$.56 This transformation57 works because when $Y = \frac{K}{X^k}$
\[ \log Y = \log K - k \log X \]
So plotting $y = \log Y$ against $x = \log X$ gives the straight line $y = \log K - kx$, which has slope $-k$ and $y$–intercept $\log K$.
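The log-log trick can be tried out directly. The Python sketch below computes the midpoint-rule error for $\int_0^\pi \sin x\,dx$ with $n = 2^m$ steps, $m = 1, \ldots, 10$ (our choice of range, which need not match the one used for the figure), and fits a least-squares line to the points $(\log_2 n, \log_2 e_n)$. The helper names are ours, and the fitted numbers depend on which values of $n$ are included.

```python
import math

def midpoint_rule(f, a, b, n):
    """Midpoint rule with n steps on [a, b]."""
    dx = (b - a) / n
    return dx * sum(f(a + (j - 0.5) * dx) for j in range(1, n + 1))

def fit_line(xs, ys):
    """Least-squares slope and intercept of the line y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

xs, ys = [], []
for m in range(1, 11):
    n = 2 ** m
    err = abs(midpoint_rule(math.sin, 0.0, math.pi, n) - 2.0)
    xs.append(math.log2(n))    # x = log_2 n
    ys.append(math.log2(err))  # y = log_2 e_n

slope, intercept = fit_line(xs, ys)
print(slope, intercept)  # the slope should be close to -2, i.e. k is about 2
```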
The three graphs in Figure 3.9.1 plot $y = \log_2 e_n$ against $x = \log_2 n$ for our three rules. Note that we have chosen to use logarithms58 with this “unusual base” because it makes it very clear how much the error is improved if we double the number of steps used. To be more precise, one unit step along the $x$-axis represents changing $n \mapsto 2n$. For example, applying Simpson’s rule with $n = 2^4$ steps results in an error of 0.0000166, so the point $(x = \log_2 2^4 = 4,\ y = \log_2 0.0000166 = \frac{\ln 0.0000166}{\ln 2} = -15.8)$ has been included on the graph. Doubling the effort used (that is, doubling the number of steps to $n = 2^5$) results in an error of 0.00000103. So, the data point $(x = \log_2 2^5 = 5,\ y = \log_2 0.00000103 = \frac{\ln 0.00000103}{\ln 2} = -19.9)$ lies on the graph. Note that the $x$-coordinates of these points differ by 1 unit.
For each of the three sets of data points, a straight line has also been plotted “through” the data points. A procedure called linear regression59 has been used to decide precisely which straight line to plot. It provides a formula for the slope and $y$–intercept of the straight line which “best fits” any given set of data points. From the three lines, it sure looks like $k = 2$ for the midpoint and trapezoidal rules and $k = 4$ for Simpson’s rule. It also looks like the ratio between the value of $K$ for the trapezoidal rule, namely $K = 2^{0.7253}$, and the value of $K$ for the midpoint rule, namely $K = 2^{-0.2706}$, is pretty close to 2: $2^{0.7253}/2^{-0.2706} = 2^{0.9959}$.
The intuition, about the error behaviour, that we have just developed is in fact correct
— provided the integrand f ( x ) is reasonably smooth. To be more precise
56 Note in footnote 27 in section 3.3 we mentioned that the notation log x was ambiguous, and may mean
loge x, log10 x, or log2 x in different contexts. In this paragraph, the base actually doesn’t matter. Our
claim that the transformed function will be a line is true for all of these bases.
57 There is a variant of this trick that works even when you don’t know the answer to the integral ahead of time. Suppose that you suspect that the approximation satisfies
\[ M_n = A + K\frac{1}{n^k} \]
where $A$ is the exact value of the integral and suppose that you don’t know the values of $A$, $K$ and $k$. Then
\[ M_n - M_{2n} = K\frac{1}{n^k} - K\frac{1}{(2n)^k} = K\Big( 1 - \frac{1}{2^k} \Big)\frac{1}{n^k} \]
so plotting $y = \log(M_n - M_{2n})$ against $x = \log n$ gives the straight line $y = \log\big[ K\big( 1 - \frac{1}{2^k} \big) \big] - kx$.
58 Now is a good time for a quick revision of logarithms .
59 Linear regression is not part of this course as its derivation requires some multivariable calculus. It is a
very standard technique in statistics.
Assume that $|f''(x)| \le M$ for all $a \le x \le b$. Then
the total error introduced by the midpoint rule is bounded by $\frac{M}{24}\frac{(b-a)^3}{n^2}$
and
the total error introduced by the trapezoidal rule is bounded by $\frac{M}{12}\frac{(b-a)^3}{n^2}$
when approximating $\int_a^b f(x)\,dx$. Further, if $|f^{(4)}(x)| \le L$ for all $a \le x \le b$, then
the total error introduced by Simpson’s rule is bounded by $\frac{L}{180}\frac{(b-a)^5}{n^4}$.
The first of these error bounds is proven in Appendix section A.10. Here are some
examples which illustrate how they are used. First let us check that the above result is
consistent with our data in Figure 3.9.1.
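Example 3.9.13 Midpoint rule error approximating $\int_0^\pi \sin x\,dx$
So we take $M = 1$.
$$|e_n| \le \frac{M}{24}\frac{(b-a)^3}{n^2} = \frac{\pi^3}{24}\frac{1}{n^2} \approx 1.29\,\frac{1}{n^2}$$
$$|e_n| \approx 2^{-0.2706}\,\frac{1}{n^2} = 0.83\,\frac{1}{n^2}$$
which is consistent with the bound $|e_n| \le \frac{\pi^3}{24}\frac{1}{n^2}$.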
Example 3.9.13
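As a quick numerical sanity check of this comparison, here is a minimal Python sketch. It is not part of the original text, and the helper name `midpoint` is our own. It applies the midpoint rule to $\int_0^\pi \sin x\,dx = 2$ and prints the observed error next to the bound $\frac{\pi^3}{24n^2}$.

```python
import math

def midpoint(f, a, b, n):
    """Composite midpoint rule with n equal subintervals."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

exact = 2.0  # the integral of sin x from 0 to pi
rows = []
for n in (2, 4, 8, 16, 32):
    err = abs(midpoint(math.sin, 0.0, math.pi, n) - exact)
    bound = math.pi ** 3 / (24 * n ** 2)  # M (b-a)^3 / (24 n^2) with M = 1
    rows.append((n, err, bound))
    print(f"n={n:3d}  error={err:.6f}  bound={bound:.6f}  error*n^2={err * n * n:.4f}")
```

In every row the observed error should sit below the bound, and the last column should settle near the constant $0.83$ read off from the graph.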
Suppose, for example, that we wish to use the midpoint rule to evaluate60
$$\int_0^1 e^{-x^2}\,dx$$
to within an accuracy of $10^{-6}$.
Solution.
• The integral has $a = 0$ and $b = 1$.
• The first two derivatives of the integrand are
$$\frac{d}{dx}e^{-x^2} = -2xe^{-x^2} \quad\text{and}$$
$$\frac{d^2}{dx^2}e^{-x^2} = \frac{d}{dx}\big(-2xe^{-x^2}\big) = -2e^{-x^2} + 4x^2e^{-x^2} = 2(2x^2-1)e^{-x^2}$$
• As $x$ runs from 0 to 1, $2x^2 - 1$ increases from $-1$ to $1$, so that
$$0 \le x \le 1 \implies |2x^2-1| \le 1,\ e^{-x^2} \le 1 \implies \big|2(2x^2-1)e^{-x^2}\big| \le 2$$
So we take $M = 2$.
• The error introduced by the $n$ step midpoint rule is at most
$$e_n \le \frac{M}{24}\frac{(b-a)^3}{n^2} \le \frac{2}{24}\frac{(1-0)^3}{n^2} = \frac{1}{12n^2}$$
• We need this error to be smaller than $10^{-6}$ so
$$e_n \le \frac{1}{12n^2} \le 10^{-6} \quad\text{and so}$$
$$12n^2 \ge 10^6 \quad\text{clean up}$$
$$n^2 \ge \frac{10^6}{12} = 83333.3\ldots \quad\text{square root both sides}$$
$$n \ge 288.7$$
So 289 steps of the midpoint rule will do the job.
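The step count above can be reproduced with a short Python sketch. This is an illustration only: the helper `midpoint` and the variable names are our own, and the reference value uses the standard identity $\int_0^1 e^{-x^2}\,dx = \frac{\sqrt{\pi}}{2}\operatorname{erf}(1)$, which the text itself does not rely on.

```python
import math

def midpoint(f, a, b, n):
    """Composite midpoint rule with n equal subintervals."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

def f(x):
    return math.exp(-x * x)

# Reference value (assumption: we use the error function as ground truth).
exact = 0.5 * math.sqrt(math.pi) * math.erf(1.0)

M, a, b, tol = 2.0, 0.0, 1.0, 1e-6
# Smallest n with M (b-a)^3 / (24 n^2) <= tol.
n_required = math.ceil(math.sqrt(M * (b - a) ** 3 / (24 * tol)))
err = abs(midpoint(f, a, b, n_required) - exact)
print(n_required, err)
```

The bound is a guarantee, not a prediction, so the observed error is expected to come out somewhat smaller than $10^{-6}$.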
60 This is our favourite running example of an integral that cannot be evaluated algebraically — we need
to use numerical methods.
Example 3.9.14
That seems like far too much work, and the trapezoidal rule will have twice the error. So
we should look at Simpson’s rule.
Example 3.9.15
Suppose now that we wish to evaluate $\int_0^1 e^{-x^2}\,dx$ to within an accuracy of $10^{-6}$, but now
using Simpson’s rule. How many steps should we use?
Solution.
• Again we have $a = 0$, $b = 1$.
• We then need to bound $\frac{d^4}{dx^4}e^{-x^2}$ on the domain of integration, $0 \le x \le 1$.
$$\frac{d^3}{dx^3}e^{-x^2} = \frac{d}{dx}\Big\{2(2x^2-1)e^{-x^2}\Big\} = 8xe^{-x^2} - 4x(2x^2-1)e^{-x^2} = 4(-2x^3+3x)e^{-x^2}$$
$$\frac{d^4}{dx^4}e^{-x^2} = \frac{d}{dx}\Big\{4(-2x^3+3x)e^{-x^2}\Big\} = 4(-6x^2+3)e^{-x^2} - 8x(-2x^3+3x)e^{-x^2} = 4(4x^4-12x^2+3)e^{-x^2}$$
• Now, for any $x$, $e^{-x^2} \le 1$. Also, for $0 \le x \le 1$,
$$0 \le x^2, x^4 \le 1 \quad\text{so}$$
$$3 \le 4x^4 + 3 \le 7 \quad\text{and}$$
$$-12 \le -12x^2 \le 0 \quad\text{adding these together gives}$$
$$-9 \le 4x^4 - 12x^2 + 3 \le 7$$
Consequently $|4x^4 - 12x^2 + 3| \le 9$ for $0 \le x \le 1$, and so $\big|\frac{d^4}{dx^4}e^{-x^2}\big| \le 4\cdot 9 = 36$.
So take $L = 36$.
• The error introduced by the $n$ step Simpson’s rule is at most
$$e_n \le \frac{L}{180}\frac{(b-a)^5}{n^4} \le \frac{36}{180}\frac{(1-0)^5}{n^4} = \frac{1}{5n^4}$$
• We need this error to be smaller than $10^{-6}$, so
$$e_n \le \frac{1}{5n^4} \le 10^{-6} \quad\text{and so}$$
$$5n^4 \ge 10^6$$
$$n^4 \ge 200000 \quad\text{take fourth root}$$
$$n \ge 21.15$$
• $n = 22$ steps actually results in an error of $3.5\times 10^{-8}$. The reason that we get an
error so much smaller than we need is that we have overestimated the number of
steps required. This, in turn, occurred because we made quite a rough bound of
$\big|\frac{d^4}{dx^4}f(x)\big| \le 36$. If we are more careful then we will get a slightly smaller $n$. It
actually turns out61 that you only need $n = 10$ to approximate within $10^{-6}$.
Example 3.9.15
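The claims about $n = 22$ and $n = 10$ can be checked numerically. The following Python sketch is our own illustration: the function `simpson` is a plain composite Simpson’s rule, and the reference value again assumes the identity $\int_0^1 e^{-x^2}\,dx = \frac{\sqrt{\pi}}{2}\operatorname{erf}(1)$.

```python
import math

def simpson(f, a, b, n):
    """Composite Simpson's rule with n equal subintervals (n must be even)."""
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        # Odd-indexed interior points get weight 4, even-indexed get weight 2.
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def f(x):
    return math.exp(-x * x)

exact = 0.5 * math.sqrt(math.pi) * math.erf(1.0)
errors = {n: abs(simpson(f, 0.0, 1.0, n) - exact) for n in (8, 10, 22)}
for n, err in errors.items():
    print(f"n={n:2d}  error={err:.3e}")
```

Comparing the printed errors with the tolerance $10^{-6}$ shows how conservative the bound $L = 36$ is.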
Appendix section A.10 gives some idea of where the error bounds come from.
• a bounded integrand f ( x ) (and in fact continuous except possibly for finitely many
jump discontinuities).
Definition 3.10.1.
INTEGRATION 3.10 IMPROPER INTEGRALS
The first has an infinite domain of integration and the integrand of the second tends to $\infty$
as x approaches the left end of the domain of integration. We’ll start with an example that
illustrates the traps that you can fall into if you treat such integrals sloppily. Then we’ll
see how to treat them carefully.
Example 3.10.2 $\int_{-1}^1 \frac{1}{x^2}\,dx$
[Figure: the graph of $y = f(x)$, with the points $x = a$ and $x = R$ marked on the $x$-axis]
62 Very wrong. But it is not an example of “not even wrong” — which is a phrase attributed to the physicist
Wolfgang Pauli who was known for his harsh critiques of sloppy arguments. The phrase is typically
used to describe arguments that are so incoherent that not only can one not prove they are true, but
they lack enough coherence to be able to show they are false. The interested reader should do a little
searchengineing and look at the concept of falsifiability.
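Example 3.10.3 $\int_a^\infty \frac{dx}{1+x^2}$
Solution.
$$\int_a^R \frac{dx}{1+x^2} = \arctan x\Big|_a^R = \arctan R - \arctan a$$
$$\int_a^\infty \frac{dx}{1+x^2} = \lim_{R\to\infty}\int_a^R \frac{dx}{1+x^2} = \lim_{R\to\infty}\big[\arctan R - \arctan a\big] = \frac{\pi}{2} - \arctan a.$$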
Example 3.10.3
(a) If the integral $\int_a^R f(x)\,dx$ exists for all $R > a$, then
$$\int_a^\infty f(x)\,dx = \lim_{R\to\infty}\int_a^R f(x)\,dx$$
when both limits exist (and are finite). Any c can be used.
When the limit(s) exist, the integral is said to be convergent. Otherwise it is said
to be divergent.
We must also be able to treat an integral like $\int_0^1 \frac{dx}{x}$ that has a finite domain of integration
but whose integrand is unbounded near one limit of integration63. Our approach is similar:
we sneak up on the problem. We compute the integral on a smaller domain, such as
$\int_t^1 \frac{dx}{x}$, with $t > 0$, and then take the limit $t \to 0+$.
Example 3.10.5 $\int_0^1 \frac{1}{x}\,dx$
Solution.
63 This will, in turn, allow us to deal with integrals whose integrand is unbounded somewhere inside the
domain of integration.
Example 3.10.5
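[Figure: the graph of $y = \frac{1}{x}$, with the points $x = t$ and $x = 1$ marked on the $x$-axis]
(a) If the integral $\int_t^b f(x)\,dx$ exists for all $a < t < b$, then
$$\int_a^b f(x)\,dx = \lim_{t\to a+}\int_t^b f(x)\,dx$$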
When the limit(s) exist, the integral is said to be convergent. Otherwise it is said
to be divergent.
Notice that (c) is used when the integrand is unbounded at some point in the middle
of the domain of integration, such as was the case in our original example
$$\int_{-1}^1 \frac{1}{x^2}\,dx$$
More generally, if an integral has more than one “source of impropriety” (for example,
an infinite domain of integration together with an unbounded integrand, or
multiple infinite discontinuities) then you split it up into a sum of integrals with a single
“source of impropriety” in each. For the integral, as a whole, to converge every term in
that sum has to converge.
For example
Example 3.10.7 $\int_{-\infty}^\infty \frac{dx}{(x-2)x^2}$
$$\int_{-\infty}^\infty \frac{dx}{(x-2)x^2}$$
where
3.10.2 §§ Examples
With the more formal definitions out of the way, we are now ready for some (important)
examples.
Example 3.10.8 $\int_1^\infty \frac{dx}{x^p}$ with $p > 0$
Solution.
• Fix any $p > 0$.
• The domain of the integral $\int_1^\infty \frac{dx}{x^p}$ extends to $+\infty$ and the integrand $\frac{1}{x^p}$ is continuous
and bounded on the whole domain.
• So we write this integral as the limit
$$\int_1^\infty \frac{dx}{x^p} = \lim_{R\to\infty}\int_1^R \frac{dx}{x^p}$$
• The antiderivative of $1/x^p$ changes when $p = 1$, so we will split the problem into
three cases, $p > 1$, $p = 1$ and $p < 1$.
• When $p > 1$,
$$\int_1^R \frac{dx}{x^p} = \frac{1}{1-p}x^{1-p}\Big|_1^R = \frac{R^{1-p} - 1}{1-p}$$
Taking the limit as $R \to \infty$ gives
$$\int_1^\infty \frac{dx}{x^p} = \lim_{R\to\infty}\int_1^R \frac{dx}{x^p} = \lim_{R\to\infty}\frac{R^{1-p} - 1}{1-p} = \frac{-1}{1-p} = \frac{1}{p-1}$$
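since $1 - p < 0$.
• Similarly when $p < 1$ we have
$$\int_1^\infty \frac{dx}{x^p} = \lim_{R\to\infty}\int_1^R \frac{dx}{x^p} = \lim_{R\to\infty}\frac{R^{1-p} - 1}{1-p} = +\infty$$
because $1 - p > 0$ and the term $R^{1-p}$ diverges to $+\infty$.
• Finally when $p = 1$
$$\int_1^R \frac{dx}{x} = \ln|R| - \ln 1 = \ln R$$
Then taking the limit as $R \to \infty$ gives us
$$\int_1^\infty \frac{dx}{x} = \lim_{R\to\infty}\ln|R| = +\infty.$$
• So summarising, we have
$$\int_1^\infty \frac{dx}{x^p} = \begin{cases}\text{divergent} & \text{if } p \le 1\\ \frac{1}{p-1} & \text{if } p > 1\end{cases}$$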
Example 3.10.8
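To see the three cases side by side, the following Python sketch tabulates the partial integrals $\int_1^R \frac{dx}{x^p}$ for growing $R$, using the antiderivatives computed above. The function name `partial_integral` and the sample values of $p$ and $R$ are our own choices for illustration.

```python
import math

def partial_integral(p, R):
    """Exact value of the integral of x**(-p) from 1 to R, for R > 1 and p > 0."""
    if p == 1:
        return math.log(R)
    return (R ** (1 - p) - 1) / (1 - p)

for p in (0.5, 1, 2):
    values = [partial_integral(p, R) for R in (1e1, 1e3, 1e6)]
    print(f"p={p}: " + ", ".join(f"{v:.6g}" for v in values))
```

For $p = 2$ the values approach $\frac{1}{p-1} = 1$, while for $p = \frac{1}{2}$ and $p = 1$ they keep growing as $R$ increases.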
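Example 3.10.9 $\int_0^1 \frac{dx}{x^p}$ with $p > 0$
Solution.
• Again, the antiderivative changes at $p = 1$, so we split the problem into three cases.
• When $p > 1$ we have
$$\int_t^1 \frac{dx}{x^p} = \frac{1}{1-p}x^{1-p}\Big|_t^1 = \frac{1 - t^{1-p}}{1-p}$$
Since $1 - p < 0$ when we take the limit as $t \to 0$ the term $t^{1-p}$ diverges to $+\infty$ and
we obtain
$$\int_0^1 \frac{dx}{x^p} = \lim_{t\to 0+}\frac{1 - t^{1-p}}{1-p} = +\infty$$
• When $p = 1$ we similarly obtain
$$\int_0^1 \frac{dx}{x} = \lim_{t\to 0+}\int_t^1 \frac{dx}{x} = \lim_{t\to 0+}\big(-\ln t\big) = +\infty$$
• Finally, when $p < 1$ we have
$$\int_0^1 \frac{dx}{x^p} = \lim_{t\to 0+}\int_t^1 \frac{dx}{x^p} = \lim_{t\to 0+}\frac{1 - t^{1-p}}{1-p} = \frac{1}{1-p}$$
since $1 - p > 0$.
• In summary
$$\int_0^1 \frac{dx}{x^p} = \begin{cases}\frac{1}{1-p} & \text{if } p < 1\\ \text{divergent} & \text{if } p \ge 1\end{cases}$$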
Example 3.10.9
Example 3.10.10 $\int_0^\infty \frac{dx}{x^p}$ with $p > 0$
Solution.
• So we split the domain in two. Given our last two examples, the obvious place to
cut is at $x = 1$:
$$\int_0^\infty \frac{dx}{x^p} = \int_0^1 \frac{dx}{x^p} + \int_1^\infty \frac{dx}{x^p}$$
• We saw, in Example 3.10.9, that the first integral diverged whenever $p \ge 1$, and we
also saw, in Example 3.10.8, that the second integral diverged whenever $p \le 1$.
• So the integral $\int_0^\infty \frac{dx}{x^p}$ diverges for all values of $p$.
Example 3.10.10
Example 3.10.11 $\int_{-1}^1 \frac{dx}{x}$
This is a pretty subtle example. Look at the sketch below:
[Figure: the graph of $y = \frac{1}{x}$ for $-1 \le x \le 1$]
This suggests that the signed area to the left of the $y$-axis should exactly cancel the area
to the right of the $y$-axis making the value of the integral $\int_{-1}^1 \frac{dx}{x}$ exactly zero.
But both of the integrals
$$\int_0^1 \frac{dx}{x} = \lim_{t\to 0+}\int_t^1 \frac{dx}{x} = \lim_{t\to 0+}\Big[\ln x\Big]_t^1 = \lim_{t\to 0+}\ln\frac{1}{t} = +\infty$$
$$\int_{-1}^0 \frac{dx}{x} = \lim_{T\to 0-}\int_{-1}^T \frac{dx}{x} = \lim_{T\to 0-}\Big[\ln|x|\Big]_{-1}^T = \lim_{T\to 0-}\ln|T| = -\infty$$
diverge so $\int_{-1}^1 \frac{dx}{x}$ diverges. Don’t make the mistake of thinking that $\infty - \infty = 0$. It is undefined.
And it is undefined for good reason.
For example, we have just seen that the area to the right of the $y$-axis is
$$\lim_{t\to 0+}\int_t^1 \frac{dx}{x} = +\infty$$
and that the area to the left of the $y$-axis is (substitute $-7t$ for $T$ above)
$$\lim_{t\to 0+}\int_{-1}^{-7t} \frac{dx}{x} = -\infty$$
$$= \lim_{T\to 0-}\Big[-\frac{1}{x}\Big]_{-1}^T + \lim_{t\to 0+}\Big[-\frac{1}{x}\Big]_t^1$$
$$= \infty + \infty$$
Hence the integral diverges to $+\infty$.
Example 3.10.12
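Example 3.10.13 $\int_{-\infty}^\infty \frac{dx}{1+x^2}$
Since
$$\lim_{R\to\infty}\int_0^R \frac{dx}{1+x^2} = \lim_{R\to\infty}\Big[\arctan x\Big]_0^R = \lim_{R\to\infty}\arctan R = \frac{\pi}{2}$$
$$\lim_{r\to-\infty}\int_r^0 \frac{dx}{1+x^2} = \lim_{r\to-\infty}\Big[\arctan x\Big]_r^0 = \lim_{r\to-\infty}-\arctan r = \frac{\pi}{2}$$
the integral $\int_{-\infty}^\infty \frac{dx}{1+x^2}$ converges and takes the value $\pi$.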
Example 3.10.13
Example 3.10.14
For what values of $p$ does $\int_e^\infty \frac{dx}{x(\ln x)^p}$ converge?
Solution.
• For $x \ge e$, the denominator $x(\ln x)^p$ is never zero. So the integrand is bounded on the
entire domain of integration and this integral is improper only because the domain
of integration extends to $+\infty$ and we proceed as usual.
• We have
$$\int_e^\infty \frac{dx}{x(\ln x)^p} = \lim_{R\to\infty}\int_e^R \frac{dx}{x(\ln x)^p} \qquad\text{use substitution}$$
$$= \lim_{R\to\infty}\int_1^{\ln R}\frac{du}{u^p} \qquad\text{with } u = \ln x,\ du = \frac{dx}{x}$$
$$= \lim_{R\to\infty}\begin{cases}\frac{1}{1-p}\Big[(\ln R)^{1-p} - 1\Big] & \text{if } p \ne 1\\ \ln(\ln R) & \text{if } p = 1\end{cases}$$
$$= \begin{cases}\text{divergent} & \text{if } p \le 1\\ \frac{1}{p-1} & \text{if } p > 1\end{cases}$$
In this last step we have used logic similar to that used in Example 3.10.8, but with
$R$ replaced by $\ln R$.
Example 3.10.14
$$\Gamma(t) = \int_0^\infty x^{t-1}e^{-x}\,dx$$
$$\Gamma(1) = \int_0^\infty e^{-x}\,dx = \lim_{R\to\infty}\int_0^R e^{-x}\,dx = \lim_{R\to\infty}\Big[-e^{-x}\Big]_0^R = 1$$
• Then compute
$$\Gamma(2) = \int_0^\infty xe^{-x}\,dx$$
$$= \lim_{R\to\infty}\int_0^R xe^{-x}\,dx \qquad\text{use integration by parts with } u = x,\ dv = e^{-x}dx,\ v = -e^{-x},\ du = dx$$
$$= \lim_{R\to\infty}\bigg[-xe^{-x}\Big|_0^R + \int_0^R e^{-x}\,dx\bigg]$$
$$= \lim_{R\to\infty}\Big[-xe^{-x} - e^{-x}\Big]_0^R$$
$$= 1$$
• Now we move on to general $n$, using the same type of computation as we just used
to evaluate $\Gamma(2)$. For any natural number $n$,
$$\Gamma(n+1) = \int_0^\infty x^ne^{-x}\,dx$$
$$= \lim_{R\to\infty}\int_0^R x^ne^{-x}\,dx \qquad\text{again integrate by parts with } u = x^n,\ dv = e^{-x}dx,\ v = -e^{-x},\ du = nx^{n-1}dx$$
$$= \lim_{R\to\infty}\bigg[-x^ne^{-x}\Big|_0^R + \int_0^R nx^{n-1}e^{-x}\,dx\bigg]$$
$$= \lim_{R\to\infty} n\int_0^R x^{n-1}e^{-x}\,dx$$
$$= n\Gamma(n)$$
• Now that we know $\Gamma(2) = 1$ and $\Gamma(n+1) = n\Gamma(n)$, for all $n \in \mathbb{N}$, we can compute
all of the $\Gamma(n)$’s.
$$\Gamma(2) = 1$$
$$\Gamma(3) = \Gamma(2+1) = 2\Gamma(2) = 2\cdot 1$$
$$\Gamma(4) = \Gamma(3+1) = 3\Gamma(3) = 3\cdot 2\cdot 1$$
$$\Gamma(5) = \Gamma(4+1) = 4\Gamma(4) = 4\cdot 3\cdot 2\cdot 1$$
$$\vdots$$
$$\Gamma(n) = (n-1)\cdot(n-2)\cdots 4\cdot 3\cdot 2\cdot 1 = (n-1)!$$
That is, the factorial is just64 the Gamma function shifted by one.
Example 3.10.15
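The relation $\Gamma(n) = (n-1)!$ is easy to spot-check numerically. In the sketch below, `gamma_truncated` is a hypothetical helper of our own that approximates $\int_0^R x^{t-1}e^{-x}\,dx$ with the midpoint rule for a large but finite $R$, mimicking the limit $R \to \infty$ in the definition; `math.gamma` and `math.factorial` are the standard-library functions.

```python
import math

def gamma_truncated(t, R, steps=20_000):
    """Midpoint-rule estimate of the integral of x**(t-1) * exp(-x) from 0 to R."""
    h = R / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        total += x ** (t - 1) * math.exp(-x)
    return h * total

for n in range(1, 7):
    approx = gamma_truncated(n, 60.0)
    print(f"n={n}  truncated integral={approx:.6f}  "
          f"math.gamma={math.gamma(n):.1f}  (n-1)!={math.factorial(n - 1)}")
```

The cutoff $R = 60$ is an arbitrary choice; it works here because $x^{t-1}e^{-x}$ is already negligible that far out for the small values of $t$ used.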
Remark 3.10.16. For pedagogical purposes, we are going to concentrate on the problem
of determining whether or not an integral $\int_a^\infty f(x)\,dx$ converges, when $f(x)$ has no singularities
for $x \ge a$. Recall that the first step in analyzing any improper integral is to write it
as a sum of integrals each of which has only a single “source of impropriety”: either a domain
of integration that extends to $+\infty$, or a domain of integration that extends to $-\infty$, or an
integrand which is singular at one end of the domain of integration. So we are now going
to consider only the first of these three possibilities. But the techniques that we are about
to see have obvious analogues for the other two possibilities.
Now let’s start. Imagine that we have an improper integral $\int_a^\infty f(x)\,dx$, that $f(x)$ has
no singularities for $x \ge a$ and that $f(x)$ is complicated enough that we cannot evaluate the
integral explicitly66. The idea is to find another improper integral $\int_a^\infty g(x)\,dx$
• with $g(x)$ simple enough that we can evaluate the integral $\int_a^\infty g(x)\,dx$ explicitly, or
at least determine easily whether or not $\int_a^\infty g(x)\,dx$ converges, and
• with $g(x)$ behaving enough like $f(x)$ for large $x$ that the integral $\int_a^\infty f(x)\,dx$ converges
if and only if $\int_a^\infty g(x)\,dx$ converges.
So far, this is a pretty vague strategy. Here is a theorem which starts to make it more
precise.
64 The Gamma function is far more important than just a generalisation of the factorial. It appears all over
mathematics, physics, statistics and beyond. It has all sorts of interesting properties and its definition
can be extended from natural numbers $n$ to all numbers excluding $0, -1, -2, -3, \ldots$. For example, one
can show that
$$\Gamma(1-z)\,\Gamma(z) = \frac{\pi}{\sin \pi z}.$$
65 Applying numerical integration methods to a divergent integral may result in perfectly reasonable
looking but very wrong answers.
66 You could, for example, think of something like our running example $\int_a^\infty e^{-t^2}\,dt$.
Theorem 3.10.17 (Comparison).
Let $a$ be a real number. Let $f$ and $g$ be functions that are defined and continuous
for all $x \ge a$ and assume that $g(x) \ge 0$ for all $x \ge a$.
(a) If $|f(x)| \le g(x)$ for all $x \ge a$ and if $\int_a^\infty g(x)\,dx$ converges then $\int_a^\infty f(x)\,dx$ also
converges.
(b) If $f(x) \ge g(x)$ for all $x \ge a$ and if $\int_a^\infty g(x)\,dx$ diverges then $\int_a^\infty f(x)\,dx$ diverges.
We will not prove this theorem, but, hopefully, the following supporting arguments
should at least appear reasonable to you. Consider the figure below:
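• If $\int_a^\infty g(x)\,dx$ converges, then the area of
$\big\{(x,y) \,\big|\, x \ge a,\ 0 \le y \le g(x)\big\}$ is finite.
When $|f(x)| \le g(x)$, the region
$\big\{(x,y) \,\big|\, x \ge a,\ 0 \le y \le |f(x)|\big\}$ is contained inside $\big\{(x,y) \,\big|\, x \ge a,\ 0 \le y \le g(x)\big\}$
and so must also have finite area. Consequently the areas of both the regions
$\big\{(x,y) \,\big|\, x \ge a,\ 0 \le y \le f(x)\big\}$ and $\big\{(x,y) \,\big|\, x \ge a,\ f(x) \le y \le 0\big\}$
are finite too.
• If $\int_a^\infty g(x)\,dx$ diverges, then the area of
$\big\{(x,y) \,\big|\, x \ge a,\ 0 \le y \le g(x)\big\}$ is infinite.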
Example 3.10.18 $\int_1^\infty e^{-x^2}\,dx$
We cannot evaluate the integral $\int_1^\infty e^{-x^2}\,dx$ explicitly68, however we would still like to un-
• The integral
$$\int_1^\infty e^{-x}\,dx = \lim_{R\to\infty}\int_1^R e^{-x}\,dx = \lim_{R\to\infty}\Big[-e^{-x}\Big]_1^R = \lim_{R\to\infty}\Big[e^{-1} - e^{-R}\Big] = e^{-1}$$
converges.
• So, by Theorem 3.10.17, with $a = 1$, $f(x) = e^{-x^2}$ and $g(x) = e^{-x}$, the integral
$\int_1^\infty e^{-x^2}\,dx$ converges too (it is approximately equal to 0.1394).
Example 3.10.18
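The numbers in this example can be checked with a few lines of Python. This sketch is our own addition; it assumes the standard identity $\int_x^\infty e^{-t^2}\,dt = \frac{\sqrt{\pi}}{2}\operatorname{erfc}(x)$, which is not used in the text, and it only spot-checks the pointwise inequality at sample points rather than proving it.

```python
import math

# erfc(x) = (2/sqrt(pi)) * (integral of exp(-t^2) from x to infinity),
# so the tail integral from 1 to infinity is (sqrt(pi)/2) * erfc(1).
tail = 0.5 * math.sqrt(math.pi) * math.erfc(1.0)
bound = math.exp(-1.0)  # exact value of the integral of exp(-x) from 1 to infinity

# Spot-check exp(-x^2) <= exp(-x) at some sample points x >= 1.
samples = [1.0 + 0.25 * i for i in range(40)]
pointwise_ok = all(math.exp(-x * x) <= math.exp(-x) for x in samples)

print(f"tail = {tail:.4f}, comparison integral = {bound:.4f}, inequality holds: {pointwise_ok}")
```

The tail value should agree with the quoted 0.1394 and sit below $e^{-1}$, as the comparison theorem requires.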
Example 3.10.19 $\int_{1/2}^\infty e^{-x^2}\,dx$
Solution.
• The integral $\int_{1/2}^\infty e^{-x^2}\,dx$ is quite similar to the integral $\int_1^\infty e^{-x^2}\,dx$ of Example 3.10.18.
But we cannot just repeat the argument of Example 3.10.18 because it is not true that
$e^{-x^2} \le e^{-x}$ when $0 < x < 1$.
• In fact, for $0 < x < 1$, $x^2 < x$ so that $e^{-x^2} > e^{-x}$.
• However the difference between the current example and Example 3.10.18 is
$$\int_{1/2}^\infty e^{-x^2}\,dx - \int_1^\infty e^{-x^2}\,dx = \int_{1/2}^1 e^{-x^2}\,dx$$
which is clearly a well defined finite number (it’s actually about 0.286). It is important
to note that we are being a little sloppy by taking the difference of two integrals like
this: we are assuming that both integrals converge. More on this below.
• So we would expect that $\int_{1/2}^\infty e^{-x^2}\,dx$ should be the sum of the proper integral
$\int_{1/2}^1 e^{-x^2}\,dx$ and the convergent integral $\int_1^\infty e^{-x^2}\,dx$ and so should be a convergent
integral. This is indeed the case. The Theorem below provides the justification.
Example 3.10.19
Theorem 3.10.20.
Let $a$ and $c$ be real numbers with $a < c$ and let the function $f(x)$ be continuous
for all $x \ge a$. Then the improper integral $\int_a^\infty f(x)\,dx$ converges if and only if the
improper integral $\int_c^\infty f(x)\,dx$ converges.
Proof. By definition the improper integral $\int_a^\infty f(x)\,dx$ converges if and only if the limit
$$\lim_{R\to\infty}\int_a^R f(x)\,dx = \lim_{R\to\infty}\bigg[\int_a^c f(x)\,dx + \int_c^R f(x)\,dx\bigg] = \int_a^c f(x)\,dx + \lim_{R\to\infty}\int_c^R f(x)\,dx$$
exists and is finite. (Remember that, in computing the limit, $\int_a^c f(x)\,dx$ is a finite constant
independent of $R$ and so can be pulled out of the limit.) But that is the case if and only if
the limit $\lim_{R\to\infty}\int_c^R f(x)\,dx$ exists and is finite, which in turn is the case if and only if the
integral $\int_c^\infty f(x)\,dx$ converges.
Example 3.10.21
Does the integral $\int_1^\infty \frac{\sqrt{x}}{x^2+x}\,dx$ converge or diverge?
Solution.
• Our first task is to identify the potential sources of impropriety for this integral.
• The domain of integration extends to $+\infty$, but we must also check to see if the integrand
contains any singularities. On the domain of integration $x \ge 1$ so the denominator
is never zero and the integrand is continuous. So the only problem is at $+\infty$.
• Our second task is to develop some intuition69 . As the only problem is that the
domain of integration extends to infinity, whether or not the integral converges will
be determined by the behavior of the integrand for very large x.
69 This takes practice, practice and more practice. At the risk of alliteration — please perform plenty of
practice problems.
• When $x$ is very large, $x^2$ is much much larger than $x$ (which we can write as $x^2 \gg x$)
so that the denominator $x^2 + x \approx x^2$ and the integrand
$$\frac{\sqrt{x}}{x^2+x} \approx \frac{\sqrt{x}}{x^2} = \frac{1}{x^{3/2}}$$
• By Example 3.10.8, with $p = 3/2$, the integral $\int_1^\infty \frac{dx}{x^{3/2}}$ converges. So we would expect
that $\int_1^\infty \frac{\sqrt{x}}{x^2+x}\,dx$ converges too.
$$x^2 + x > x^2$$
$$\frac{1}{x^2+x} < \frac{1}{x^2}$$
Multiply both sides by $\sqrt{x}$ (which is always positive, so the sign of the inequality
does not change)
$$\frac{\sqrt{x}}{x^2+x} < \frac{\sqrt{x}}{x^2} = \frac{1}{x^{3/2}}$$
• So Theorem 3.10.17(a) and Example 3.10.8, with $p = 3/2$ do indeed show that the
integral $\int_1^\infty \frac{\sqrt{x}}{x^2+x}\,dx$ converges.
Example 3.10.21
Notice that in this last example we managed to show that the integral exists by finding
an integrand that behaved the same way for large x. Our intuition then had to be bolstered
with some careful inequalities to apply the comparison Theorem 3.10.17. It would be
nice to avoid this last step and be able to jump from the intuition to the conclusion without
messing around with inequalities. Thankfully there is a variant of Theorem 3.10.17 that is
often easier to apply and that also fits well with the sort of intuition that we developed to
solve Example 3.10.21.
A key phrase in the previous paragraph is “behaves the same way for large x”. A good
way to formalise this expression, “$f(x)$ behaves like $g(x)$ for large $x$”, is to require
that the limit
$$\lim_{x\to\infty}\frac{f(x)}{g(x)} \quad\text{exists and is a finite nonzero number.}$$
Suppose that this is the case and call the limit $L \ne 0$. Then
• the ratio $\frac{f(x)}{g(x)}$ must approach $L$ as $x$ tends to $+\infty$.
• So when $x$ is very large (say $x > B$, for some big number $B$) we must have that
$$\frac{1}{2}L \le \frac{f(x)}{g(x)} \le 2L \quad\text{for all } x > B$$
Equivalently, $f(x)$ lies between $\frac{L}{2}g(x)$ and $2Lg(x)$, for all $x \ge B$.
• Consequently, the integral of f ( x ) converges if and only if the integral of g( x ) con-
verges, by Theorems 3.10.17 and 3.10.20.
These considerations lead to the following variant of Theorem 3.10.17.
Let $-\infty < a < \infty$. Let $f$ and $g$ be functions that are defined and continuous for all
$x \ge a$ and assume that $g(x) \ge 0$ for all $x \ge a$.
(a) If $\int_a^\infty g(x)\,dx$ converges and the limit
$$\lim_{x\to\infty}\frac{f(x)}{g(x)}$$
exists, then $\int_a^\infty f(x)\,dx$ converges.
(b) If $\int_a^\infty g(x)\,dx$ diverges and the limit
$$\lim_{x\to\infty}\frac{f(x)}{g(x)}$$
exists and is nonzero, then $\int_a^\infty f(x)\,dx$ diverges.
Note that in (b) the limit must exist and be nonzero, while in (a) we only require
that the limit exists (it can be zero).
• Our first task is to identify the potential sources of impropriety for this integral.
• The domain of integration extends to $+\infty$. On the domain of integration the denominator
is never zero so the integrand is continuous. Thus the only problem is at $+\infty$.
• Our second task is to develop some intuition about the behavior of the integrand
for very large $x$. A good way to start is to think about the size of each term when $x$
becomes big.
Notice that we are using $A \ll B$ to mean that “$A$ is much much smaller than $B$”.
Similarly $A \gg B$ means “$A$ is much much bigger than $B$”. We don’t really need to be
too precise about its meaning beyond this in the present context.
• Now, since $\int_1^\infty \frac{dx}{x}$ diverges, we would expect $\int_1^\infty \frac{x+\sin x}{e^{-x}+x^2}\,dx$ to diverge too.
• Our final task is to verify that our intuition is correct. To do so, we set
$$f(x) = \frac{x + \sin x}{e^{-x} + x^2} \qquad g(x) = \frac{1}{x}$$
and compute
$$\lim_{x\to\infty}\frac{f(x)}{g(x)} = \lim_{x\to\infty}\frac{x + \sin x}{e^{-x} + x^2} \div \frac{1}{x}$$
$$= \lim_{x\to\infty}\frac{(1 + \sin x/x)\,x}{(e^{-x}/x^2 + 1)\,x^2}\times x$$
$$= \lim_{x\to\infty}\frac{1 + \sin x/x}{e^{-x}/x^2 + 1}$$
$$= 1$$
• Since $\int_1^\infty g(x)\,dx = \int_1^\infty \frac{dx}{x}$ diverges, by Example 3.10.8 with $p = 1$, Theorem 3.10.22(b)
now tells us that $\int_1^\infty f(x)\,dx = \int_1^\infty \frac{x+\sin x}{e^{-x}+x^2}\,dx$ diverges too.
Example 3.10.23
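The limit computed in this example can be illustrated numerically. The short Python sketch below is our own; the sample points are arbitrary, and a table of values only suggests the limit, it does not prove it.

```python
import math

def f(x):
    return (x + math.sin(x)) / (math.exp(-x) + x * x)

def g(x):
    return 1.0 / x

# Evaluate the ratio f(x)/g(x) at increasingly large x.
ratios = [f(x) / g(x) for x in (1e1, 1e2, 1e4, 1e6)]
for x, r in zip((1e1, 1e2, 1e4, 1e6), ratios):
    print(f"x={x:.0e}  f(x)/g(x)={r:.8f}")
```

The ratios drift toward 1, matching the limit used to apply the limiting comparison theorem.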
INTEGRATION 3.11 OVERVIEW OF INTEGRATION TECHNIQUES
$$\int_a^b 1\,dx = (b - a)\times(1) = b - a$$
[Figure: the rectangle of height 1 over the interval from $x = a$ to $x = b$]
A special case where this method is useful is with half and quarter circles. If we wanted
to use the Fundamental Theorem of Calculus to evaluate the integral below, we’d need a
trigonometric substitution. It’s much easier to recognize that the area in question is one
quarter of the unit circle.
$$\int_0^1 \sqrt{1 - x^2}\,dx = \frac{1}{4}\pi(1)^2 = \frac{\pi}{4}$$
[Figure: the quarter of the unit circle lying in the first quadrant]
You can also take advantage of a function’s symmetry. For example, $\int_{-\pi}^\pi \sin x\,dx = 0$
because the positive area on the right exactly cancels out the negative (net) area on the
left.
$$\int_{-\pi}^\pi \sin x\,dx = 0$$
[Figure: the graph of $y = \sin x$ for $-\pi \le x \le \pi$]
The “inside function” $2x^2 + 3x$ has derivative $(4x + 3)$, and we see precisely that
derivative multiplied to the rest of the integrand. So this is a great candidate for the
substitution $u = 2x^2 + 3x$.
Substitution is sometimes a first step to get a function in a better form for a second technique.
For example, the function $\frac{e^x + e^{3x}}{e^x(1 - e^x)(2 - e^x)}$ is a rational function (and a candidate for
(Contrast this with the substitution rule: both often operate on integrands that are the
product of functions.)
There are two special cases you should know where integration by parts is useful but
not immediately obvious. One case is with the antiderivatives of logarithms and inverse
trig functions. (See Examples 3.5.8 and 3.5.9 for details.) For example, to antidifferentiate
the natural logarithm, we use integration by parts with u = ln x and dv = dx.
$$\int \ln x\,dx = x\ln x - \int x\cdot\frac{1}{x}\,dx$$
The other special case is integrating around in a circle. We’ve seen this with the integral
$$\int e^x \sin x\,dx$$
and $\int \tan^n x \sec^m x\,dx$. You’ll decide on a substitution (say, $u = \sin x$) and then use trigono-
We can’t cancel out the square root and the $x^2$ in its current form, so we set $x = \tan\theta$,
$dx = \sec^2\theta\,d\theta$. Now:
$$\int \sqrt{1 + x^2}\,dx = \int \sqrt{1 + \tan^2\theta}\,\sec^2\theta\,d\theta$$
Now we can cancel out the square root and the squared function.
$$= \int \sec\theta\cdot\sec^2\theta\,d\theta$$
cannot be evaluated using the techniques we’ve learned so far. Their definite integrals,
however, can be approximated using the Midpoint Rule, the Trapezoid Rule, or Simpson’s
Rule. These rules come with error bounds, so we can make sure our error is within a given
tolerance.
One special application of numerical integration is finding a decimal approximation
for an irrational number. In Question 28 of Section 3.9 in the practice book, we find a
decimal approximation of ln 2 by applying Simpson’s Rule to the integral
$$\int_1^2 \frac{1}{x}\,dx.$$
are both improper. The second one is a dangerous type: it’s easy to try to apply the
Fundamental Theorem of Calculus to evaluate it, without realizing that your computation
is nonsense. Both types of improper integrals are evaluated with limits. If the limits don’t
exist (including limits going to infinity), then we say the integral diverges.
INTEGRATION 3.12 DIFFERENTIAL EQUATIONS
We are just going to scratch the surface of the study of differential equations. Most universities
offer half a dozen different undergraduate courses on various aspects of differential
equations. We’ll focus here on one important type of differential equation: separable
differential equations.
We’ve already seen one type of differential equation: finding an antiderivative.
Example 3.12.1
$$\frac{dy}{dx} = e^x.$$
What is $y$?
Solution. We know the derivative of our function $y$, as a function of $x$, so we just antidifferentiate.
$$y(x) = e^x + C$$
for some constant $C$.
Note the answer to the question is a function.
Example 3.12.1
Before we talk about solving more complicated differential equations, let’s get more
practice working with them. The biggest paradigm shift between solving a differential
equation, and the type of equation-solving you’re used to, is that we’re solving for a func-
tion instead of a variable.
Example 3.12.2
Choose the function(s) listed below that solve this differential equation:
$$\frac{dy}{dx} + x^2 - 1 = y$$
A. $y = x^2 + 2x + 1$
B. $y = x^2 + 1$
Solution. We want to check whether $y$ is a solution, so we replace $y$ and $\frac{dy}{dx}$ in the differential
equation, and check whether the equation is true.
In the case $y = x^2 + 2x + 1$, we have $\frac{dy}{dx} = 2x + 2$. We plug these into our differential equation:
$$\frac{dy}{dx} + x^2 - 1 = y$$
$$2x + 2 + x^2 - 1 = x^2 + 2x + 1$$
$$x^2 + 2x + 1 = x^2 + 2x + 1$$
This is true: the function on the left and the function on the right are the same. So the
function $y = x^2 + 2x + 1$ is a solution to the differential equation $\frac{dy}{dx} + x^2 - 1 = y$.
Now let’s think about the other function we were asked to consider, $y = x^2 + 1$. For
this function, $\frac{dy}{dx} = 2x$. We plug these into our differential equation:
$$\frac{dy}{dx} + x^2 - 1 = y$$
$$2x + x^2 - 1 = x^2 + 1$$
$$x^2 + 2x - 1 = x^2 + 1$$
This is not true: the function on the left and the function on the right are not the same.
So the function $y = x^2 + 1$ is not a solution to the differential equation $\frac{dy}{dx} + x^2 - 1 = y$.
You don’t have enough tools yet to come up with solutions like $y = x^2 + 2x + 1$; those
will come shortly. For this example, we only want you to understand what it means for a
function to be a solution to a differential equation.
Example 3.12.2
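Checking a candidate solution is mechanical enough to hand to a computer. The Python sketch below is our own illustration: for each candidate we supply $y$ and its derivative by hand, then evaluate the residual $\frac{dy}{dx} + x^2 - 1 - y$ at a few arbitrary sample points. A residual that vanishes at sample points is only evidence, not proof, but a nonzero residual does rule a candidate out.

```python
def residual(y, dy, x):
    """Left side minus right side of dy/dx + x^2 - 1 = y at the point x."""
    return dy(x) + x ** 2 - 1 - y(x)

# Each candidate is a pair (y, dy/dx), with the derivative computed by hand.
candidates = {
    "A": (lambda x: x ** 2 + 2 * x + 1, lambda x: 2 * x + 2),
    "B": (lambda x: x ** 2 + 1, lambda x: 2 * x),
}

xs = [-2.0, -1.0, 0.0, 0.5, 3.0]
is_solution = {}
for name, (y, dy) in candidates.items():
    is_solution[name] = all(abs(residual(y, dy, x)) < 1e-12 for x in xs)
    print(name, "passes the check" if is_solution[name] else "fails the check")
```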
Definition 3.12.3.
It may take some rearranging to get a differential equation in this form. The “separable”
refers to the mechanics of getting all terms containing $y$ and $y'$ on one side of the equation,
and all terms containing $x$ on the other side.
We’ll start by developing a recipe for solving separable differential equations. Then
we’ll look at many examples. Usually one suppresses the argument of $y(x)$ and writes the
equation as below:
$$g(y)\frac{dy}{dx} = f(x)$$
If the functions on the left and right sides of the equation are the same (and they should be;
otherwise that equals sign has no business being there) then their antiderivatives with respect
to $x$ should be the same as well, up to the usual additive constant.
$$\int g(y)\frac{dy}{dx}\,dx = \int f(x)\,dx$$
The left-hand side of the equation above is in a perfect form for a substitution.
$$\int g(y)\,dy = \int f(x)\,dx \qquad (*)$$
In this way, we’ve turned the problem of finding solutions to our separable differential
equation into the problem of finding two antiderivatives.
Note the work above didn’t really depend on what, exactly, $f(x)$ and $g(y)$ were. So, to
skip to the end, we use the following mnemonic algorithm. It looks strange, but you can
simply think of it as shorthand for the work we just did above.
$$g(y)\cdot\frac{dy}{dx} = f(x)$$
$$g(y)\,dy = f(x)\,dx \qquad (1)$$
$$\int g(y)\,dy = \int f(x)\,dx \qquad (2)$$
In Step (1), we separate all $x$’s and $y$’s, including in $\frac{dy}{dx}$, by “multiplying” both sides of
the equation by $dx$. In Step (2), we add an integral sign to both sides of the equation.
This looks illegal, and indeed is illegal: $\frac{dy}{dx}$ is not a fraction. Again, this procedure is
simply a mnemonic device to help you remember the result (*).
Example 3.12.4
$$\int e^y\,dy = \int x\,dx \iff e^y = \frac{x^2}{2} + C$$
The $C$ on the right hand side contains both the arbitrary constant for the indefinite integral
$\int e^y\,dy$ and the arbitrary constant for the indefinite integral $\int x\,dx$. Finally, we solve for $y$,
$$y(x) = \ln\Big(\frac{x^2}{2} + C\Big)$$
Note that C is an arbitrary constant. It can take any value. It cannot be determined
by the differential equation itself. In applications C is usually determined by a require-
ment that y take some prescribed value (determined by the application) when x is some
prescribed value. (We call these types of problems “initial value” problems. The given
constants are “initial conditions.”) For example, suppose that we wish to find a function
y( x ) that obeys both
$$\frac{dy}{dx} = xe^{-y} \quad\text{and}\quad y(0) = 1$$
We know that, to have $\frac{dy}{dx} = xe^{-y}$ satisfied, we must have $y(x) = \ln\big(\frac{x^2}{2} + C\big)$, for some
constant $C$. To also have the initial condition $y(0) = 1$, we must have
$$1 = y(0) = \ln\Big(\frac{x^2}{2} + C\Big)\Big|_{x=0} = \ln C \iff \ln C = 1 \iff C = e$$
So our final solution is $y(x) = \ln\big(\frac{x^2}{2} + e\big)$.
Example 3.12.4
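As a sanity check on this initial value problem, the following Python sketch (our own, with arbitrary sample points and step size) verifies numerically that $y(x) = \ln\big(\frac{x^2}{2} + e\big)$ satisfies $y(0) = 1$ and approximately satisfies $\frac{dy}{dx} = xe^{-y}$, using a central difference in place of the exact derivative.

```python
import math

def y(x):
    return math.log(x * x / 2 + math.e)

def dydx(x, h=1e-5):
    """Central-difference approximation to y'(x)."""
    return (y(x + h) - y(x - h)) / (2 * h)

xs = [-2.0, -0.5, 0.0, 1.0, 3.0]
# Difference between the two sides of dy/dx = x * exp(-y) at each sample point.
residuals = [abs(dydx(x) - x * math.exp(-y(x))) for x in xs]
print("y(0) =", y(0.0))
print("largest residual =", max(residuals))
```

The residuals are not exactly zero because the derivative is approximated, but they should be tiny compared with the sizes of the two sides.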
Example 3.12.5
Solve $\frac{dy}{dx} = y^2$
$$\frac{dy}{dx} = y^2$$
$$\frac{dy}{y^2} = dx$$
$$\int \frac{dy}{y^2} = \int dx$$
$$\frac{y^{-1}}{-1} = x + C$$
$$y = -\frac{1}{x + C}$$
When $y = 0$, this computation breaks down because $\frac{dy}{y^2}$ contains a division by 0. We can
check if the function $y(x) = 0$ satisfies the differential equation by just subbing it in:
$$y(x) = 0 \implies y'(x) = 0,\ y(x)^2 = 0 \implies y'(x) = y(x)^2$$
$$y(x) = 0 \quad\text{or}\quad y(x) = -\frac{1}{x + C},\ \text{for any constant } C$$
Example 3.12.5
In the article War Moods: 1$^{70}$, researcher Lewis Richardson models the proportion of a
population eager for war using a model previously applied to the spread of infectious
diseases. (We note here an important quote from the paper: “To describe a phenomenon
is not to praise it.” Understanding the social psychology of public support for war may
lead to strategies for preventing conflicts.)
A simplified version of Richardson’s model for the lead-up to hostilities is as follows.
Let y be the proportion of a population that supports going to war, with the rest of the
population against going to war. Then the rate of change of y over time is proportional
to the product of the proportion of people who are pro-war and the proportion of people
who are anti-war. The reasoning is roughly71 that y changes as pro-war people encounter
anti-war people.
That corresponds to the differential equation
$$\frac{dy}{dt} = Cy(1 - y)$$
70 War Moods: 1 by Richardson, PSYCHOMETRIKA–Vol. 13, no. 3 September, 1948. You can access the
full text with your UBC CWL at this link.
71 The actual paper has more subtlety, including considering populations of rival nations, and the pro-
gression of public sentiment as a war drags on.
$$\frac{dy}{dt} = Cy(1 - y)$$
$$\frac{1}{y(1 - y)}\,dy = C\,dt$$
$$\int \frac{1}{y(1 - y)}\,dy = \int C\,dt$$
In this model, there is a quick change from low support for war (y « 0) to high support
for war (y « 1). The paper notes, regarding the first world war: “There is evidence ... that
the majority of Britishers changed their opinions about war with Germany during a week
in 1914 between July 24 and August 4.”
Example 3.12.6
Professor Daniel Pauly of the UBC Institute for the Oceans and Fisheries considered the
following model of fish growth in the paper A précis of Gill-Oxygen Limitation Theory
(GOLT), with some Emphasis on the Eastern Mediterranean 72 .
Let $w(t)$ be the weight of an individual fish over time. The rate at which it is able to
synthesize proteins (and other necessary substances) is proportional to $w^d$, while the rate
at which its proteins need to be replaced is proportional to $w$. So,
$$\frac{dw}{dt} = Hw^d - kw$$
where $Hw^d$ is the rate at which new proteins are built, and $kw$ is the rate at which they
need to be replaced. Because the rate of production is limited to the rate of oxygen intake
(which itself is proportional to gill size), the exponent $d$ is less than one.
The paper notes that researchers often neglect oxygen impacts on fish growth— a de-
cision not supported by this model.
Suppose d = 0.5 for a particular small species of fish. What is w(t)? How large would
the fish grow, if it grew indefinitely?
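\begin{align*}
\frac{dw}{dt} &= H\sqrt{w} - kw \\
\frac{1}{H\sqrt{w} - kw}\,dw &= dt \\
\int \frac{1}{H\sqrt{w} - kw}\,dw &= \int 1\,dt \\
\int \frac{1}{\sqrt{w}\,\big(H - k\sqrt{w}\big)}\,dw &= \int 1\,dt
\end{align*}
For the left-hand side, we use the substitution \(u = H - k\sqrt{w}\), \(-\frac{2}{k}\,du = \frac{1}{\sqrt{w}}\,dw\)
\begin{align*}
\int -\frac{2}{k}\cdot\frac{1}{u}\,du &= \int 1\,dt \\
-\frac{2}{k}\ln|u| &= t + C \\
\ln|u| &= -\frac{k}{2}t + C \qquad \text{(remember $C$ is an arbitrary constant)} \\
|u| &= e^{-\frac{k}{2}t + C} \\
\big|H - k\sqrt{w}\big| &= e^{-\frac{k}{2}t + C}
\end{align*}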
72 PAULY, D. (2019). A précis of Gill-Oxygen Limitation Theory (GOLT), with some Emphasis on the
Eastern Mediterranean. Mediterranean Marine Science, 20(4), 660-668. doi:http://dx.doi.org/10.12681/mms.19285
\[ \lim_{t\to\infty} \frac{1}{k^2}\left(H - e^{-\frac{k}{2}t + C}\right)^2 = \left(\frac{H}{k}\right)^2 \]
So, aging fish who grow according to our model approach the weight \(\left(\frac{H}{k}\right)^2\).
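As a numerical sanity check, the sketch below evaluates \(w(t) = \frac{1}{k^2}\left(H - De^{-\frac{k}{2}t}\right)^2\) with \(D = e^C\), which is what the last displayed equation gives when \(H - k\sqrt{w}\) is positive. The parameter values H = 2, k = 0.5 and D = 1 are hypothetical, chosen only so the numbers are easy to read.

```python
import math

def fish_weight(t, H=2.0, k=0.5, D=1.0):
    """w(t) = ((H - D*exp(-k*t/2)) / k)**2, assuming H - D*exp(-k*t/2) >= 0.

    H, k and D are illustrative assumptions, not values from the paper.
    """
    root = (H - D * math.exp(-k * t / 2.0)) / k  # this is sqrt(w)
    return root * root

def limiting_weight(H=2.0, k=0.5):
    """The weight (H/k)**2 that w(t) approaches as t grows."""
    return (H / k) ** 2

if __name__ == "__main__":
    for t in (0, 5, 10, 20, 40):
        print(t, round(fish_weight(t), 4), "limit:", limiting_weight())
```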
Example 3.12.7
Definition 3.12.8.
A differential equation of the form
\[ \frac{dy}{dx} = a(y-b) \]
where a and b are constants is called a first-order linear differential equation.
“First-order” means the equation has a first derivative, but no higher-order derivatives
(e.g. no second derivatives). The right hand side is a linear expression in the variable y.
Example 3.12.9
Let a and b be any two constants. We’ll now solve the family of differential equations
\[ \frac{dy}{dx} = a(y-b) \]
\begin{align*}
\frac{dy}{y-b} = a\,dx
&\implies \int \frac{dy}{y-b} = \int a\,dx
\implies \ln|y-b| = ax + c
\implies |y-b| = e^{ax+c} = e^c e^{ax} \\
&\implies y - b = Ce^{ax}
\end{align*}
where C is either \(+e^c\) or \(-e^c\). Note that as c runs over all real numbers, \(+e^c\) runs over
all strictly positive real numbers and \(-e^c\) runs over all strictly negative real numbers. So,
so far, C can be any real number except 0. But we were a bit sloppy here. We implicitly
assumed that \(y - b\) was nonzero, so that we could divide it across. Nonetheless, the
constant function y = b, which corresponds to C = 0, is a perfectly good solution: when
y is the constant function y = b, both \(\frac{dy}{dx}\) and \(a(y-b)\) are zero. So the general solution to
\(\frac{dy}{dx} = a(y-b)\) is \(y(x) = Ce^{ax} + b\), where the constant C can be any real number. Note that
when \(y(x) = Ce^{ax} + b\) we have \(y(0) = C + b\). So \(C = y(0) - b\) and the general solution is
\[ y(x) = \big(y(0) - b\big)e^{ax} + b \]
Example 3.12.9
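The general solution found in this example is easy to check numerically. The following is a small sketch, not part of the original derivation: it evaluates \(y(x) = (y(0) - b)e^{ax} + b\) and compares a finite-difference slope with \(a(y-b)\). The sample values of a, b and y(0) are arbitrary.

```python
import math

def linear_ode_solution(x, a, b, y0):
    """y(x) = (y0 - b) * exp(a*x) + b, the solution of dy/dx = a*(y - b) with y(0) = y0."""
    return (y0 - b) * math.exp(a * x) + b

if __name__ == "__main__":
    a, b, y0 = 0.7, 3.0, 5.0   # arbitrary illustrative constants
    x, h = 1.2, 1e-6
    slope = (linear_ode_solution(x + h, a, b, y0)
             - linear_ode_solution(x - h, a, b, y0)) / (2 * h)
    # The two printed numbers should agree to several decimal places.
    print(slope, a * (linear_ode_solution(x, a, b, y0) - b))
```

Starting at y(0) = b gives the constant solution y = b, matching the C = 0 case discussed in the example.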
Theorem 3.12.10.
\[ \frac{dy}{dt} = a(y-b) \]
We call this the “steady state” solution. Steady, because y is never changing – it’s always
b.
Definition 3.12.11.
\[ \frac{dy}{dt} = g(y) \]
is a steady state solution. The constant b is a steady state of the differential equation
if \(\left.\frac{dy}{dt}\right|_{y=b} = g(b) = 0\)
Example 3.12.12
\[ P(t+t_g) = \underbrace{P(t) + \beta\,\frac{P(t)}{2}}_{\text{parents+offspring}} - \underbrace{P(t)}_{\text{parents die}} = \frac{\beta}{2}P(t) \]
where \(t_g\) denotes the lifespan of one generation. The rate of change of the size of the
population per unit time is
\[ \frac{P(t+t_g) - P(t)}{t_g} = \frac{1}{t_g}\left[\frac{\beta}{2}P(t) - P(t)\right] = bP(t) \]
where \(b = \frac{\beta-2}{2t_g}\) is the net birthrate per member of the population per unit time. If we
approximate
\[ \frac{P(t+t_g) - P(t)}{t_g} \approx \frac{dP}{dt}(t) \]
we get the differential equation
\[ \frac{dP}{dt} = bP(t) \tag{3.12.1} \]
By Theorem 3.12.10,
This is called the Malthusian73 growth model. It is, of course, very simplistic. One of its
main characteristics is that, since \(P(t+T) = P(0)\cdot e^{b(t+T)} = P(t)\cdot e^{bT}\), every time you add
T to the time, the population size is multiplied by \(e^{bT}\). In particular, the population size
doubles every \(\frac{\ln 2}{b}\) units of time.
Example 3.12.13
In 1927 the population of the world was about 2 billion. In 1974 it was about 4 billion. Esti-
mate when it reached 6 billion. What will the population of the world be in 2100, assuming
the Malthusian growth model?
Solution.
• Let P(t) be the world’s population, in billions, t years after 1927. Note that 1974
corresponds to \(t = 1974 - 1927 = 47\).
73 This is named after Rev. Thomas Robert Malthus. He described this model in a 1798 paper called “An
essay on the principle of population”.
• Finally, our crude model predicts that the population is 6 billion at the time t that
obeys
\begin{align*}
P(t) = 2e^{bt} &= 6 && \text{clean up} \\
e^{bt} &= 3 && \text{take the log and clean up} \\
t &= \frac{\ln 3}{b} = 47\,\frac{\ln 3}{\ln 2} = 74.5
\end{align*}
which corresponds75 to the middle of 2001.
Example 3.12.13
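The arithmetic in this example can be reproduced with a short script. This is a sketch under the example's own assumptions: the Malthusian model \(P(t) = P(0)e^{bt}\), with P(0) = 2 in 1927 and P(47) = 4 in 1974, which forces \(b = \frac{\ln 2}{47}\). The function names are ours.

```python
import math

P0 = 2.0                      # world population in billions at t = 0 (the year 1927)
b = math.log(2.0) / 47.0      # chosen so that P(47) = 4, i.e. the population doubles by 1974

def population(t):
    """Malthusian prediction, in billions, t years after 1927."""
    return P0 * math.exp(b * t)

def time_to_reach(target):
    """Years after 1927 at which the model predicts `target` billion people."""
    return math.log(target / P0) / b

if __name__ == "__main__":
    print("6 billion reached at t =", round(time_to_reach(6.0), 1))
    print("model prediction for 2100:", round(population(2100 - 1927), 1), "billion")
```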
The Malthusian growth model can be a reasonably good model only when the popu-
lation size is very small compared to its environment76 . A more sophisticated model of
population growth takes into account the “carrying capacity of the environment.”
Logistic growth adds one more wrinkle to the simple population model. It assumes
that the population only has access to limited resources. As the size of the population
grows the amount of food available to each member decreases. This in turn causes the net
birth rate b to decrease. In the logistic growth model \(b = b_0\left(1 - \frac{P}{K}\right)\), where K is called the
carrying capacity of the environment, so that
\[ P'(t) = b_0\left(1 - \frac{P(t)}{K}\right)P(t) \]
74 The 2015 Revision of World Population, a publication of the United Nations, predicts that the world’s
population in 2100 will be about 11 billion. But “about” covers a pretty large range. They give an 80%
confidence interval running from 10 billion to 12.5 billion.
75 The world population really reached 6 billion in about 1999.
76 That is, the population has plenty of food and space to grow.
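Figure 3.12.1.
Below is a graph of \(P'(t) = b_0\left(1 - \frac{P(t)}{K}\right)P(t)\). Pay attention to the axis labels: the
independent (horizontal) axis is population, P. It is not time. The dependent (vertical) axis is
rate of change of population, \(\frac{dP}{dt}\).
[Figure: graph of \(\frac{dP}{dt}\) (vertical axis) against P (horizontal axis), with K marked on the horizontal axis.]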
• When P = 0, there are no individuals in the population, so its growth rate is zero.
(Extinct animals do not usually reproduce.)
• When \(0 < P < K\), the population is less than the carrying capacity of its environment,
so the population grows (\(\frac{dP}{dt} > 0\)).
• When \(P > K\), the population has outgrown the capacity of the environment to support
it. Then \(\frac{dP}{dt} < 0\), as the population experiences a higher death rate than birth
rate.
\[ \frac{dP}{dt}(t) = \big(6000 - 3P(t)\big)P(t) \]
We’ll sketch the graphs of four functions P(t) that obey this equation.
The sketches will be based on the observation that \((6000 - 3P)P = 3(2000 - P)P\)
Consequently
\[ \frac{dP}{dt}(t) \begin{cases} = 0 & \text{if } P(t) = 0 \\ > 0 & \text{if } 0 < P(t) < 2000 \\ = 0 & \text{if } P(t) = 2000 \\ < 0 & \text{if } P(t) > 2000 \end{cases} \]
Thus if P(t) is some function that obeys \(\frac{dP}{dt}(t) = \big(6000 - 3P(t)\big)P(t)\), then as the graph of
P(t) passes through the point \(\big(t, P(t)\big)\)
[Figure: axes with t horizontal and P(t) vertical; the vertical axis is marked at 1000, 2000 and 3000.]
As a result,
• if P(0) = 0, the graph starts out horizontally. In other words, as t starts to increase,
P(t) remains at zero, so the slope of the graph remains at zero. The population size
remains zero for all time. As a check, observe that the function P(t) = 0 obeys
\(\frac{dP}{dt}(t) = \big(6000 - 3P(t)\big)P(t)\) for all t.
• Similarly, if P(0) = 2000, the graph again starts out horizontally. So P(t) remains at
2000 and the slope remains at zero. The population size remains 2000 for all time.
Again, the function P(t) = 2000 obeys \(\frac{dP}{dt}(t) = \big(6000 - 3P(t)\big)P(t)\) for all t.
• If P(0) = 3000, the graph starts out with negative slope. So P(t) decreases with
t. As P(t) decreases towards 2000, the slope \(\big(6000 - 3P(t)\big)P(t)\), while remaining
negative, gets closer and closer to zero. As the graph approaches height 2000, it
becomes more and more horizontal. The graph cannot actually cross from above
2000 to below 2000, because to do so it would have to have negative slope for some
value of P below 2000, which is not allowed.
These curves are sketched in the figure below. We conclude that for any initial population
size P(0), except P(0) = 0, the population size approaches 2000 as \(t \to \infty\).
[Figure: sketches of the solution curves, with P(t) on the vertical axis marked at 1000, 2000 and 3000.]
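The qualitative conclusion can also be checked by stepping the equation forward numerically. The sketch below uses Euler's method, a simple approximation scheme that is not part of the argument above; the step size and number of steps are assumptions chosen small enough for the approximation to behave well.

```python
def simulate_logistic(p0, dt=1e-5, steps=2000):
    """Approximate P at time dt*steps for dP/dt = (6000 - 3P) P with P(0) = p0.

    Uses Euler's method; dt and steps are illustrative choices.
    """
    p = p0
    for _ in range(steps):
        p += dt * (6000.0 - 3.0 * p) * p
    return p

if __name__ == "__main__":
    for start in (0.0, 1000.0, 2000.0, 3000.0):
        print(start, "->", round(simulate_logistic(start), 3))
```

Starting values of 0 and 2000 stay put, while the other starting values drift towards 2000.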
Now we’ll do an example in which we explicitly solve the logistic growth equation.
Example 3.12.14
In 1986, the population of the world was 5 billion and was increasing at a rate of 2% per
year. Using the logistic growth model with an assumed maximum population of 100 bil-
lion, predict the population of the world in the years 2000, 2100 and 2500.
Solution. Let y(t) be the population of the world, in billions of people, at time 1986 + t.
The logistic growth model assumes
\[ y' = ay(K - y) \]
where K is the carrying capacity and \(a = \frac{b_0}{K}\).
First we’ll determine the values of the constants a and K from the given data.
• We know that, if at time zero the population is below K, then as time increases the
population increases, approaching the limit K as t tends to infinity. So in this problem
K is the maximum population. That is, K = 100.
• We are also told that, at time zero, the percentage rate of change of population, \(100\frac{y'}{y}\),
is 2, so that, at time zero, \(\frac{y'}{y} = 0.02\). But, from the differential equation, \(\frac{y'}{y} = a(K - y)\).
Hence at time zero, \(0.02 = a(100 - 5)\), so that \(a = \frac{2}{9500}\).
We now know a and K and can solve the (separable) differential equation
\begin{align*}
\frac{dy}{dt} = ay(K-y)
&\implies \int \frac{dy}{y(K-y)} = \int a\,dt
\implies \int \frac{1}{K}\left[\frac{1}{y} - \frac{1}{y-K}\right]dy = \int a\,dt \\
&\implies \frac{1}{K}\big[\ln|y| - \ln|y-K|\big] = at + C \\
&\implies \ln\frac{|y|}{|y-K|} = aKt + CK
\implies \left|\frac{y}{y-K}\right| = De^{aKt}
\end{align*}
with \(D = e^{CK}\). We know that y remains between 0 and K, so that \(\left|\frac{y}{y-K}\right| = \frac{y}{K-y}\) and our
solution obeys
\[ \frac{y}{K-y} = De^{aKt} \]
At this stage, we know the values of the constants a and K, but not the value of the constant
D. We are given that at t = 0, y = 5. Subbing in this, and the values of K and a,
\[ \frac{5}{100-5} = De^0 \implies D = \frac{5}{95} \]
So the solution obeys the algebraic equation
\[ \frac{y}{100-y} = \frac{5}{95}e^{2t/95} \]
which we can solve to get y as a function of t.
\begin{align*}
y = (100-y)\frac{5}{95}e^{2t/95}
&\implies 95y = (500 - 5y)e^{2t/95} \\
&\implies \big(95 + 5e^{2t/95}\big)y = 500e^{2t/95} \\
&\implies y = \frac{500e^{2t/95}}{95 + 5e^{2t/95}} = \frac{100e^{2t/95}}{19 + e^{2t/95}} = \frac{100}{1 + 19e^{-2t/95}}
\end{align*}
Finally,
• In the year 2000, t = 14 and \(y = \frac{100}{1 + 19e^{-28/95}} \approx 6.6\) billion.
• In the year 2100, t = 114 and \(y = \frac{100}{1 + 19e^{-228/95}} \approx 36.7\) billion.
Example 3.12.14
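The formula from this example is straightforward to evaluate by machine. The sketch below rebuilds the constants a and D from the given data (K = 100, y(0) = 5, initial growth rate 2%) instead of hard-coding \(\frac{2}{95}\); the function name and default arguments are our own.

```python
import math

def world_population(t, K=100.0, y0=5.0, rate=0.02):
    """Logistic prediction, in billions, t years after 1986.

    Solves y/(K - y) = D * exp(a*K*t) for y, with a and D fixed by the data at t = 0.
    """
    a = rate / (K - y0)        # from y'/y = a*(K - y) at time zero
    D = y0 / (K - y0)          # from y/(K - y) = D at time zero
    growth = D * math.exp(a * K * t)
    return K * growth / (1.0 + growth)

if __name__ == "__main__":
    for year in (2000, 2100, 2500):
        print(year, round(world_population(year - 1986), 1))
```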
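• The first interest payment is made at time \(t = \frac{1}{n}\). Because the balance in the account
during the time interval \(0 < t < \frac{1}{n}\) is $P and interest is being paid for \(\left(\frac{1}{n}\right)^{\rm th}\) of a year,
that first interest payment is \(\frac{1}{n} \times \frac{r}{100} \times P\). After the first interest payment, the balance
in the account is \(P + \frac{1}{n} \times \frac{r}{100} \times P = \left(1 + \frac{r}{100n}\right)P\).
• The second interest payment is made at time \(t = \frac{2}{n}\). Because the balance in the
account during the time interval \(\frac{1}{n} < t < \frac{2}{n}\) is \(\left(1 + \frac{r}{100n}\right)P\) and interest is being paid
for \(\left(\frac{1}{n}\right)^{\rm th}\) of a year, the second interest payment is \(\frac{1}{n} \times \frac{r}{100} \times \left(1 + \frac{r}{100n}\right)P\). After the
second interest payment, the balance in the account is
\(\left(1 + \frac{r}{100n}\right)P + \frac{1}{n} \times \frac{r}{100} \times \left(1 + \frac{r}{100n}\right)P = \left(1 + \frac{r}{100n}\right)^2 P\).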
• And so on.
In general, at time \(t = \frac{m}{n}\) (just after the \(m^{\rm th}\) interest payment), the balance in the account is
\[ B(t) = \left(1 + \frac{r}{100n}\right)^m P = \left(1 + \frac{r}{100n}\right)^{nt} P \tag{3.12.3} \]
Three common values of n are 1 (interest is paid once a year), 12 (i.e. interest is paid
once a month) and 365 (i.e. interest is paid daily). The limit \(n \to \infty\) is called continuous
compounding77 . Under continuous compounding, the balance at time t is
\[ B(t) = \lim_{n\to\infty}\left(1 + \frac{r}{100n}\right)^{nt} P \]
You may have already seen the limit
If you haven’t seen (3.12.4) before, that’s OK. In the following example, we rederive
(3.12.5) using a differential equation instead of (3.12.4).
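To get a feel for how quickly the balance in (3.12.3) settles down as n grows, here is a small sketch comparing yearly, monthly and daily compounding with the continuous-compounding balance \(Pe^{rt/100}\). The deposit of $1000, the rate of 5% and the 10-year horizon are hypothetical.

```python
import math

def balance(P, r, t, n):
    """Balance after t years at r% per year, compounded n times per year (formula 3.12.3)."""
    return P * (1.0 + r / (100.0 * n)) ** (n * t)

def balance_continuous(P, r, t):
    """Balance after t years at r% per year under continuous compounding."""
    return P * math.exp(r * t / 100.0)

if __name__ == "__main__":
    P, r, t = 1000.0, 5.0, 10.0   # illustrative values only
    for n in (1, 12, 365):
        print(n, round(balance(P, r, t, n), 2))
    print("continuous", round(balance_continuous(P, r, t), 2))
```

More frequent compounding raises the balance, but each increase is smaller than the last.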
Example 3.12.15
Suppose, again, that you deposit $P in a bank account at time t = 0, and that the account
pays r% interest per year compounded n times per year, and denote by B(t) the balance
at time t. Suppose that you have just received an interest payment at time t. Then the next
interest payment will be made at time \(t + \frac{1}{n}\) and will be \(\frac{1}{n} \times \frac{r}{100} \times B(t) = \frac{r}{100n}B(t)\). So,
calling \(\frac{1}{n} = h\),
\[ B(t+h) = B(t) + \frac{r}{100}B(t)h \qquad\text{or}\qquad \frac{B(t+h) - B(t)}{h} = \frac{r}{100}B(t) \]
To get continuous compounding we take the limit \(n \to \infty\) or, equivalently, \(h \to 0\). This
gives
\[ \lim_{h\to 0}\frac{B(t+h) - B(t)}{h} = \frac{r}{100}B(t) \qquad\text{or}\qquad \frac{dB}{dt}(t) = \frac{r}{100}B(t) \]
By Theorem 3.12.10, with \(a = \frac{r}{100}\) and b = 0,
77 There are banks that advertise continuous compounding. You can find some by googling “interest is
compounded continuously and paid”
Example 3.12.16
(a) A bank advertises that it compounds interest continuously and that it will double your
money in ten years. What is the annual interest rate?
(b) A bank advertises that it compounds monthly and that it will double your money in
ten years. What is the annual interest rate?
Solution. (a) Let the interest rate be r% per year. If you start with $P, then after t years,
you have \(Pe^{rt/100}\), under continuous compounding. This was (3.12.5). After 10 years you
have \(Pe^{r/10}\). This is supposed to be 2P, so
\[ Pe^{r/10} = 2P \implies e^{r/10} = 2 \implies \frac{r}{10} = \ln 2 \implies r = 10\ln 2 = 6.93\% \]
(b) Let the interest rate be r% per year. If you start with $P, then after t years, you have
\(P\left(1 + \frac{r}{100\times 12}\right)^{12t}\), under monthly compounding. This was (3.12.3). After 10 years you have
\(P\left(1 + \frac{r}{100\times 12}\right)^{120}\). This is supposed to be 2P, so
\begin{align*}
P\left(1 + \frac{r}{100\times 12}\right)^{120} = 2P
&\implies \left(1 + \frac{r}{1200}\right)^{120} = 2
\implies 1 + \frac{r}{1200} = 2^{1/120} \\
&\implies \frac{r}{1200} = 2^{1/120} - 1
\implies r = 1200\left(2^{1/120} - 1\right) = 6.95\%
\end{align*}
Example 3.12.16
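Both answers can be reproduced with a couple of one-line functions. This is only a sketch of the algebra above, generalized to any doubling period and compounding frequency; the function names are ours.

```python
import math

def doubling_rate_continuous(years):
    """Annual rate (in percent) that doubles money in `years` under continuous compounding."""
    return 100.0 * math.log(2.0) / years

def doubling_rate_compounded(years, n):
    """Annual rate (in percent) that doubles money in `years` when compounding n times a year."""
    return 100.0 * n * (2.0 ** (1.0 / (n * years)) - 1.0)

if __name__ == "__main__":
    print(round(doubling_rate_continuous(10), 2))      # part (a)
    print(round(doubling_rate_compounded(10, 12), 2))  # part (b)
```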
Example 3.12.17
A 25 year old graduate of UBC is given $50,000 which is invested at 5% per year com-
pounded continuously. The graduate also intends to deposit money continuously at the
rate of $2000 per year.
(a) Find a differential equation that the account balance A(t) obeys, where t is measured in
years after the graduate turns 25, assuming that the interest rate remains 5%.
(b) Determine the amount of money in the account when the graduate is 65.
(c) At age 65, the graduate will start withdrawing money continuously at the rate of W
dollars per year. If the money must last until the person is 85, what is the largest
possible value of W?
Solution. (a) Let’s consider what happens to A over a very short time interval from time
t to time t + ∆t. At time t the account balance is A(t). During the (really short) specified
time interval the balance remains very close to A(t) and so earns interest of \(\frac{5}{100} \times \Delta t \times A(t)\).
During the same time interval, the graduate also deposits an additional $2000∆t. So
\[ A(t+\Delta t) \approx A(t) + 0.05A(t)\Delta t + 2000\Delta t \implies \frac{A(t+\Delta t) - A(t)}{\Delta t} \approx 0.05A(t) + 2000 \]
In the limit as \(\Delta t \to 0\), this becomes
\[ \frac{dA}{dt} = 0.05A + 2000 \]
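(b) The amount of money at time t obeys
\[ \frac{dA}{dt} = 0.05A(t) + 2{,}000 = 0.05\big(A(t) + 40{,}000\big) \]
So by Theorem 3.12.10 (with a = 0.05 and b = −40,000),
\[ A(t) = \big(A(0) + 40{,}000\big)e^{0.05t} - 40{,}000 \]
At time 0 (when the graduate is 25), A(0) = 50,000, so the amount of money at time t is
\[ A(t) = 90{,}000\,e^{0.05t} - 40{,}000 \]
In particular, when the graduate is 65, t = 40 and \(A(40) = 90{,}000\,e^{2} - 40{,}000 \approx 625{,}015.05\) dollars.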
(c) When the graduate stops depositing money and instead starts withdrawing money at
a rate W, the equation for A becomes
\[ \frac{dA}{dt} = 0.05A - W = 0.05(A - 20W) \]
assuming that the interest rate remains 5%. This time, Theorem 3.12.10 (with a = 0.05 and
b = 20W) gives
\[ A(t) = \big(A(0) - 20W\big)e^{0.05t} + 20W \]
If we now reset our clock so that t = 0 when the graduate is 65, A(0) = 625,015.05. So the
amount of money at time t is
\[ A(t) = \big(625{,}015.05 - 20W\big)e^{0.05t} + 20W \]
We want the account to be depleted when the graduate is 85. So, we want A(20) = 0. This
is the case if
\[ \big(625{,}015.05 - 20W\big)e + 20W = 0 \implies W = \frac{625{,}015.05\,e}{20(e-1)} \approx 49{,}437.96 \]
Example 3.12.17
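The numbers in this example can be checked with the sketch below. It uses the solution formula \(A(t) = (A(0) - b)e^{at} + b\) quoted from Theorem 3.12.10 in the example, with a = 0.05, and solves A(20) = 0 for the withdrawal rate W. The function names and default arguments are our own.

```python
import math

def balance_while_saving(t, A0=50_000.0, rate=0.05, deposit=2_000.0):
    """Balance t years after age 25: solves dA/dt = rate*A + deposit with A(0) = A0."""
    b = -deposit / rate                       # steady state, here -40,000
    return (A0 - b) * math.exp(rate * t) + b

def max_withdrawal(A0, years=20.0, rate=0.05):
    """Withdrawal rate W for which dA/dt = rate*A - W empties the account after `years`."""
    growth = math.exp(rate * years)
    # From (A0 - W/rate)*growth + W/rate = 0.
    return rate * A0 * growth / (growth - 1.0)

if __name__ == "__main__":
    at_65 = balance_while_saving(40.0)
    print("balance at 65:", round(at_65, 2))
    print("largest W:", round(max_withdrawal(at_65), 2))
```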
Suppose you borrow $750,000 from the bank under the following conditions:
[Figure: Option 1 payments over the life of the loan, with the vertical axis marked at $4381.25 and $1881.25.
The dotted line shows monthly interest payments; the solid line shows
monthly payments ($2500+interest).]
In this option, your actual monthly payments to the bank vary quite a bit over the 25
years of the loan. If you expect your salary to grow over time, you pay the highest
payments early on, when you make the least amount of money. So, this option is not
ideal.
Option 2: Let’s figure out how to pay off the loan in such a way that your monthly payments
are the same each month, for all 300 months. Again, let P(t) be the amount of
the loan left to repay the bank after you’ve made t monthly payments. Each month,
you pay back some portion of the loan, plus an interest payment of \(\frac{0.25}{100}P(t)\).
The amount of the loan you’ve paid back in month t is P(t − 1) − P(t). In particular,
\[ P(t-1) - P(t) = -\frac{P(t) - P(t-1)}{1} \approx -\lim_{h\to 0}\frac{P(t) - P(t-h)}{h} = -P'(t) \]
Thinking of our monthly payments on the loan as how fast P(t) changes, it makes
sense to approximate them by the rate of change of P. The important detail is that
P(t) is decreasing as you pay positive amounts, which is why we use \(-P'(t)\) as the
approximation of the amount you paid.
All together, the amount you pay each month is about:
\[ \text{loan payment} + \text{interest} = -P'(t) + \frac{0.25}{100}P(t) \]
In Option 2, we want this amount to be constant. Let’s call that constant monthly
payment C. This gives us a linear differential equation, \(C = -P'(t) + \frac{0.25}{100}P(t)\), or
\[ P'(t) = \frac{1}{400}\big(P(t) - 400C\big) \]
By Theorem 3.12.10,
\begin{align*}
P(t) &= \big(P(0) - 400C\big)e^{t/400} + 400C \\
&= \big(750{,}000 - 400C\big)e^{t/400} + 400C
\end{align*}
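Setting P(300) = 0, so that the loan is repaid after 300 months, and solving for C gives
\(C = \frac{1875\,e^{3/4}}{e^{3/4} - 1} \approx 3553.60\).
So, a monthly payment of roughly $3553.60 would be sufficient to pay off the loan
in 25 years. The amount of that monthly payment that goes to the loan itself is about
\(-P'(t) = (C - 1875)e^{t/400} = \frac{1875}{e^{3/4} - 1}e^{t/400}\), while the rest is interest.
[Figure: Option 2 payments over the life of the loan, with the vertical axis marked at $3553.60 and $1875.00.
The dotted line shows monthly interest payments; the solid line shows
total monthly payments (principal+interest).]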
Initial payments consist of roughly equal parts interest and principal. Over time,
payments consist of more and more principal, with less and less interest.
We note here that the Government of Canada mortgage calculator gives a monthly payment of
$3,549.34 for a mortgage of $750,000 with annual rate of 3% (0.25% × 12) and amortization
period 25 years. It also mentions that the total interest paid will be $314,802.37.
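The constant payment in Option 2 can be computed directly from the continuous model. The sketch below assumes the model described above: the balance obeys \(P'(t) = \frac{1}{400}\big(P(t) - 400C\big)\) with P(0) = 750,000, and C is chosen so that P(300) = 0. The function names are ours.

```python
import math

def level_payment(principal=750_000.0, monthly_rate=0.0025, months=300):
    """Constant monthly payment C that makes the continuous-model balance hit zero at `months`."""
    growth = math.exp(monthly_rate * months)
    # From (principal - C/monthly_rate)*growth + C/monthly_rate = 0.
    return monthly_rate * principal * growth / (growth - 1.0)

def principal_part(t, principal=750_000.0, monthly_rate=0.0025, months=300):
    """Approximate portion -P'(t) of the payment in month t that reduces the loan."""
    C = level_payment(principal, monthly_rate, months)
    return (C - monthly_rate * principal) * math.exp(monthly_rate * t)

if __name__ == "__main__":
    print("monthly payment:", round(level_payment(), 2))
    print("principal part at t = 0:", round(principal_part(0.0), 2))
```

The small gap between this figure and the calculator's figure is to be expected, since the model replaces discrete monthly steps with a continuous rate of change.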
Aside from monthly payments, we can also look at the total amount of interest paid in
the two scenarios. In Option 1, the amount of interest paid in month t was \(\frac{25}{4}(301 - t)\). So,
over 300 months, the total interest paid was:
\begin{align*}
\sum_{t=1}^{300} \frac{25}{4}(301 - t) &= \frac{25}{4}\left(301\cdot 300 - \sum_{t=1}^{300} t\right) \\
&= \frac{25}{4}\left(301\cdot 300 - \frac{300\cdot 301}{2}\right) = \frac{25\cdot 301\cdot(300 - 150)}{4} \\
&= 282{,}187.50
\end{align*}
For Option 2, in month t, interest paid was approximately \(\frac{1}{400}P(t) = \frac{1875}{e^{3/4} - 1}\left(e^{3/4} - e^{t/400}\right)\).
Total interest is then approximately:
\[ \sum_{t=1}^{300} \frac{1875}{e^{3/4} - 1}\left(e^{3/4} - e^{t/400}\right) = \frac{1875}{e^{3/4} - 1}\left(300\cdot e^{3/4} - \sum_{t=1}^{300} e^{t/400}\right) \]
For lack of a nice formula, we’ll interpret the sum as a Riemann sum. It corresponds to the
right-hand Riemann sum for the area under the curve \(f(t) = e^{t/400}\) from t = 0 to t = 300,
using 300 intervals.
\begin{align*}
&\approx \frac{1875}{e^{3/4} - 1}\left(300\cdot e^{3/4} - \int_{t=0}^{300} e^{t/400}\,dt\right) \\
&= \frac{1875}{e^{3/4} - 1}\left(300\cdot e^{3/4} - \Big[400e^{t/400}\Big]_{t=0}^{300}\right) \\
&= \frac{1875}{e^{3/4} - 1}\left(300\cdot e^{3/4} - 400\left(e^{3/4} - 1\right)\right) \\
&\approx 316081.01
\end{align*}
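Since the sum has only 300 terms, a computer can also add it up directly, which shows how close the integral approximation is. The sketch below does both; the names are ours, and the monthly interest formula is the one stated above for Option 2.

```python
import math

SCALE = 1875.0 / (math.exp(0.75) - 1.0)

def interest_in_month(t):
    """Approximate Option 2 interest paid in month t."""
    return SCALE * (math.exp(0.75) - math.exp(t / 400.0))

def total_interest_exact_sum():
    """Add the 300 monthly interest amounts term by term."""
    return sum(interest_in_month(t) for t in range(1, 301))

def total_interest_integral():
    """The same total with the sum of e^(t/400) replaced by its integral from 0 to 300."""
    return SCALE * (300.0 * math.exp(0.75) - 400.0 * (math.exp(0.75) - 1.0))

if __name__ == "__main__":
    print("term-by-term sum:     ", round(total_interest_exact_sum(), 2))
    print("integral approximation:", round(total_interest_integral(), 2))
```

Because \(e^{t/400}\) is increasing, its right-hand Riemann sum is larger than the integral, so the integral version slightly overestimates the total interest.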
Chapter 3 of this work was adapted from Chapter 1 and Section 2.4 of CLP 2 – Integral
Calculus by Feldman, Rechnitzer, and Yeager under a Creative Commons
Attribution-NonCommercial-ShareAlike 4.0 International license.
Chapter 4
PROBABILITY
4.1 Introduction
Before we start, a note. Most terms in this introductory section (probability, event, value)
accord pretty well with their usage in everyday life. However, later on in the chapter we
will introduce new vocabulary and notation (PDF, CDF, E) whose interpretations are far
less obvious. Keeping track of definitions will be key to understanding what’s going on.
Make flashcards if you have a hard time remembering different terms. If you read a term
whose meaning you’ve forgotten, look it up! If we don’t have the same vocabulary, then
we aren’t speaking the same language – so it will be difficult to explain things.
Definition4.1.1.
Definition4.1.2.
Events result in values. The event of rolling a dice might result in the value 1; the
event of flipping a coin might result in the value heads; and the event of choosing a person
might result in the value Parham. We usually use lower-case letters as variables specifying
values.
We’ll mostly use events that result in numerical values, although coin flips are a handy
experiment as well. Unless otherwise specified, you can assume values will be numbers.
(Otherwise our formulas become quite abstract – we won’t ask you to average people or
integrate colours.)
Example 4.1.3
Let X be the random variable corresponding to the event of rolling a standard 6-sided
dice1 . X can result in any of the values 1, 2, 3, 4, 5, or 6.
Suppose we are playing a game and our points are determined by doubling the num-
ber rolled. We might write the following:
If X = x, the number of points earned is 2x.
The equation X = x looks jarring at first, but becomes natural with practice. Remem-
ber that X corresponds to the event (rolling a dice), while x corresponds to the outcome of
that event (e.g. 5).
Example 4.1.3
Notation4.1.4.
We’ll use the shorthand Pr ( A) to mean “the probability that A happens.” For
example:
Pr ( X = x )
denotes “the probability that the event X results in the value x.”
Example 4.1.5
1 In the interest of clarity, we’ll use “dice” as its own singular (as is common in colloquial English), rather
than “die” (which is more standard in academic English).
where X is the event (dice roll), 5 is the value, and \(\frac{1}{6}\) is the probability.
Example 4.1.5
Example 4.1.6
\[ \Pr(X = 1 \text{ or } X = 2) = \frac{1}{3} \]
Example 4.1.6
Example 4.1.7
Let X be the event of flipping a fair coin (that is, a coin that is equally likely to come up
heads or tails), and let x be one of the values “heads” or “tails.” Then:
\[ \Pr(X = x) = \frac{1}{2} \]
Example 4.1.7
Definition4.1.8.
The sample space of an event is the set of all possible outcomes. We will use S
to denote the sample space.
Example 4.1.9
Warning 4.1.10.
\[ \frac{dx}{dt} = yP_{yx}(x, s) - xP_{xy}(x, s) \]
where \(y = 1 - x\) is the complementary fraction of the population speaking Y
at time t.
2 that’s farther than driving at 100 kph around the clock for one million years, so it’s safe to say no car
has more mileage than this
• y is the fraction of the population speaking language Y at a given time. Under the
simplified assumptions of the model in the paper, everyone speaks either X or Y,
but not both, at a particular time. So, \(y = 1 - x\).
• \(\frac{dx}{dt}\) is the rate of change of speakers of language X over time. So if \(\frac{dx}{dt}\) is positive, then
x is increasing and so people are changing from language Y to language X; if \(\frac{dx}{dt}\) is
negative, then people are changing from language X to language Y.
• The random event in question is a person changing their language. The three values
in its sample space are: person does not change; person changes from X to Y; and
person changes from Y to X.
• The paper uses notation that is different from this textbook. They write \(P_{yx}\) for “the
probability that a person changes from Y to X,” and they write \(P_{xy}\) for “the probability
that a person changes from X to Y.”
• The probabilities come with arguments: \(P_{yx}(x, s)\) and \(P_{xy}(x, s)\). The variables inside
the parentheses are function variables. How likely someone is to switch languages is
not a fixed constant, but rather a function depending on how many people speak the
language, and how much status that language is perceived to have. So, \(P_{yx}\) and \(P_{xy}\)
are functions of multiple variables, like the functions we worked with in Chapter 2.
Now that we understand all the notation, we can figure out where the equation in the
quote came from.
• x increases as people switch from speaking Y to speaking X. \(P_{yx}\) is the proportion of
speakers of Y that we expect to change to X. The number of speakers of Y is y. So,
we expect \(yP_{yx}\) people to change from Y to X.
• All together, the change in x is (number of people coming to X from Y) minus (number
of people going to Y from X), or
\[ \frac{dx}{dt} = yP_{yx} - xP_{xy} \]
which is exactly the equation from the article.
Example 4.1.11
Definition4.1.12.
If the sample space of a random variable can be written as a list (as opposed to
existing on a continuum), then the sample space and the random variable are
discrete.
Let X be the random variable corresponding to choosing a whole number in [1, 10].
[Figure: number line with the whole numbers 1 through 10 marked.]
The values in the sample space can be listed: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10. So, X is discrete.
Example 4.1.13
Example 4.1.14
Let Y be the random variable corresponding to choosing any real number in [1, 10].
[Figure: number line from 1 to 10 showing the sample space S = [1, 10].]
There are infinitely many possible values, along a continuum, that could
result. Y is not discrete; it is continuous.
Example 4.1.14
Example 4.1.15
For each of the following events, describe the sample space as discrete or continuous,
where we are still using “continuous” informally as the opposite of “discrete.”
3 A set is countable if there exists an injective (one-to-one) function from that set to the natural numbers.
4 We aren’t monsters–there’s no such thing as half a pet.
5 Imagine you have perfect precision.
Solution.
1. The sample space is discrete, S = {3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18}.
2. This is also discrete. The number of pets you have is a whole number: 0, 1, 2, 3, etc.
3. You can be any age from 0 to, say, 500 years. If we have exact precision, your age is
a real number: you might be 19.015 years old, or 19.016 years old, or somewhere in
between. So with exact precision, we’re taking numbers along a continuum. This is
a continuous sample space.
4. Similar to the above, if we have exact precision, our answer can be any non-negative
real number, so this is a continuous sample space.
Example 4.1.15
When our variables are describing physical processes, the line between discrete and
continuous can be somewhat blurry. For example, suppose we’re measuring the amount
of water in a container. Volumes in general exist on continuums (or continua), like the
volume of a box in Example 4.1.15, so we could think of this as a continuous sample
space. Alternately, we could think of the amount of water as a discrete quantity, because
the number of molecules is an integer.
Remembering back to our definition of the definite integral (Definition 3.1.8), we ap-
proximated a curvy area with lots of small rectangles. In a similar way, continuous sample
spaces can be approximated with discrete sample spaces. The reason we need the distinc-
tion is less important for actual measurements, and more important for deciding how to
perform calculations. You’ll see as we progress through the chapter that some calculations
only make sense in one type of variable, and not the other.
Two outcomes of an event are disjoint if no value in the sample space can be
described by both outcomes.
For example, consider the event of rolling a dice, corresponding as usual to the discrete
random variable X. The outcomes X = 1 and X = 2 are disjoint, because no dice roll will
result in both of them being true. On the other hand, the outcomes \(X > 2\) and “X is even”
are not disjoint, because a roll of 4 or 6 makes both of them true.
Example 4.1.17
Let X be a continuous random variable with sample space [0, 10]. For each collection of
outcomes below, decide whether the outcomes are disjoint or not.
1. \(X < 5\); \(X \ge 5\)
2. \(X \ge 9\); \(X \ge 8\)
3. \(1 < X < 2\); X even; X odd
Solution.
1. These are disjoint; no number is both less than five, and also greater than or equal to
five.
2. These are not disjoint: X = 9, for example, makes both true. (So does X = 9.5,
X = 10, etc.)
3. These are disjoint. If X is an integer, then it is even or odd but not both, and it is not
in the interval (1, 2). If X is in the interval (1, 2), then it is not an integer, so it is not
even and not odd.
Example 4.1.17
Theorem4.1.18.
Warning4.1.19.
In a mathematical context, “or” has a slightly different meaning from its collo-
quial use. When we say “A or B,” we mean “A, B, or both.”
A common joke is that when a mathematician is sharing cookies, and tells you
that you may choose peanut butter or chocolate, you are free to take two cookies.
Proof. This comes from the interpretation of a probability as the proportion of trials where
an outcome occurs. If A and B never occur in the same trial, then the proportion where
one or the other occurs is simply the sum of the proportions where each one occurs.
Example 4.1.20
\[ \Pr(X = 1) + \Pr(X = 2) + \Pr(X = 3) = \frac{1}{6} + \frac{1}{6} + \frac{1}{6} = \frac{1}{2}. \]
Example 4.1.20
Example 4.1.21
Suppose there is a lottery where you pick five numbers, and you win a prize if at least
three of your five picks accord with the winning five numbers. Suppose you know that
the probability of matching exactly three numbers is \(\frac{1}{100}\); the probability of matching
exactly four numbers is \(\frac{1}{1000}\); and the probability of matching exactly five numbers is \(\frac{1}{10000}\).
Then the probability of winning something is the probability of matching 3, 4, or 5
numbers: \(\frac{1}{100} + \frac{1}{1000} + \frac{1}{10000}\).
Example 4.1.21
Example 4.1.22
Suppose for the province of British Columbia, the probability that a randomly chosen
adult resident will apply for employment insurance (EI) benefits in 2021 is \(\frac{3}{100}\), while the
probability that a randomly chosen adult resident will be laid off from their job in 2021 is
\(\frac{7}{100}\).
True or false: the probability that a randomly chosen adult resident will apply for EI
or be laid off is \(\frac{1}{10}\).
Solution. Not necessarily true (and almost certainly false). These are not disjoint events,
so Theorem 4.1.18 does not apply.
Example 4.1.22
Suppose you roll one fair six-sided die, with the numbers {1, 2, 3, 4, 5, 6} on its faces, and
you need to roll at least 5 to win a game.
There are two values that win you the game, 5 and 6. Each is expected to occur \(\frac{1}{6}\) of
the time. So,
\[ \Pr(X \ge 5) = \frac{1}{6} + \frac{1}{6} = \frac{1}{3} \]
If you were to roll the die only a few times, you would not be surprised if your observed
results did not match the probability. If you were to roll the die a very large number
of times, you would expect that, overall, roughly (if not exactly) \(\frac{1}{3}\) of the rolls would
result in an outcome of “at least five”.
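This long-run behaviour is easy to try out with simulated rolls. The sketch below is only an illustration: it uses Python's pseudorandom number generator as a stand-in for a fair die, and the number of rolls and the seed are arbitrary choices.

```python
import random

def estimate_at_least(threshold=5, rolls=100_000, seed=0):
    """Fraction of simulated fair-die rolls that come up `threshold` or higher.

    `rolls` and `seed` are illustrative assumptions; a fixed seed makes the run repeatable.
    """
    rng = random.Random(seed)
    wins = sum(1 for _ in range(rolls) if rng.randint(1, 6) >= threshold)
    return wins / rolls

if __name__ == "__main__":
    print("10 rolls:     ", estimate_at_least(rolls=10))
    print("100,000 rolls:", estimate_at_least())
```

With only a few rolls the observed fraction can be far from \(\frac{1}{3}\); with many rolls it settles close to it.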
Example 4.1.23
It is important to realize that in many situations, the outcomes are not equally likely. Look
at the dice in a game you have at home; the spots on each face are usually small holes
carved out and then painted to make the spots visible. Your dice may or may not be
biased; it is possible that the outcomes may be affected by the slight weight differences
due to the different numbers of holes in the faces. Casino dice have flat faces; the holes
are completely filled with paint having the same density as the material that the dice are
made out of so that each face is equally likely to occur.
The continuous analog of “equally likely” is uniformly distributed.
Definition4.1.24.
Example 4.1.25
Corollary4.1.26.
Example 4.1.27
Let X be a continuous random variable that is uniformly distributed on its sample space,
the interval [0, 10]. What is \(\Pr(7 \le X \le 9)\)?
Solution.
The interval [7, 9] has length 2; the sample space interval [0, 10] has length 10. So,
\[ \Pr(7 \le X \le 9) = \frac{2}{10} = \frac{1}{5} \]
Example 4.1.27
Example 4.1.28
Let X be a continuous random variable that is uniformly distributed across its sample
space \([-8, 17]\). Calculate the probabilities below.
1. \(\Pr(1 \le X \le 2)\)
2. \(\Pr(-5 \le X)\)
3. \(\Pr(-10 \le X \le 10)\)
Solution.
1. By Corollary 4.1.26, \(\Pr(1 \le X \le 2) = \frac{2-1}{17-(-8)} = \frac{1}{25}\)
2. Since X only takes on values in its sample space \([-8, 17]\): \(\Pr(-5 \le X) = \Pr(-5 \le X \le 17)\).
By Corollary 4.1.26, \(\Pr(-5 \le X \le 17) = \frac{17-(-5)}{17-(-8)} = \frac{22}{25}\)
3. Since X only takes on values in its sample space \([-8, 17]\): \(\Pr(-10 \le X \le 10) = \Pr(-8 \le X \le 10)\).
Now the interval \([-8, 10]\) is inside our sample space, unlike the
interval \([-10, 10]\), so we can apply Corollary 4.1.26.
\(\Pr(-8 \le X \le 10) = \frac{10-(-8)}{17-(-8)} = \frac{18}{25}\)
PROBABILITY 4.2 PROBABILITY MASS FUNCTION (PMF)
Example 4.1.28
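The length-ratio rule used in the solutions above can be checked with a short computation. The sketch below is only an illustration: the helper name `uniform_interval_prob` is our own, and it assumes, as the solution to part 3 does, that a query interval is first trimmed to the sample space before lengths are compared.

```python
from fractions import Fraction
import math


def uniform_interval_prob(lo, hi, a, b):
    """Pr(a <= X <= b) for X uniformly distributed on the sample space [lo, hi].

    The query interval [a, b] is clipped to [lo, hi] first; the answer is then
    (length of clipped interval) / (length of sample space).
    """
    left = max(a, lo)
    right = min(b, hi)
    if right <= left:
        return Fraction(0)
    return Fraction(right - left, hi - lo)


# The three parts of Example 4.1.28, with sample space [-8, 17].
part1 = uniform_interval_prob(-8, 17, 1, 2)
part2 = uniform_interval_prob(-8, 17, -5, math.inf)  # no upper bound on X
part3 = uniform_interval_prob(-8, 17, -10, 10)
print(part1, part2, part3)
```

Exact fractions are used so the results can be compared directly with the hand computations.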
Example 4.1.29
Suppose the continuous variable X is the age of a randomly chosen living person, measured
in years with exact precision. Then X is more likely to be near 50 than it is to be near
110. So, X is not uniformly distributed.
Example 4.1.29
For a discrete random variable, the description of the probabilities of all events in its
sample space is its probability mass function.
Definition 4.2.1.
A probability mass function (PMF) for a discrete random variable X is the function
$f(x)$ from $\mathbb{R}$ to $[0, 1]$, where
\[ f(x) = \Pr(X = x) \]
For example,
\[ f(x) = \begin{cases} \frac{1}{6} & x = 1, 2, 3, 4, 5, \text{ or } 6 \\ 0 & \text{else} \end{cases} \]
for a dice roll. In particular, $f(x) = 0$ for every value x not in the sample space of the
random variable.
Notation 4.2.2.
Rather than writing a piecewise function every time, we will represent the probability
mass function (PMF) of a random variable X using a table, set up like
this:
x    Pr(X = x)
1    Pr(X = 1) = 1/6
2    Pr(X = 2) = 1/6
3    Pr(X = 3) = 1/6
4    Pr(X = 4) = 1/6
5    Pr(X = 5) = 1/6
6    Pr(X = 6) = 1/6
where events not in the sample space do not show up in the table.
Theorem 4.2.3.
• the sum of the probabilities of all values in the sample space is one.
Warning 4.2.4.
Notation 4.2.5.
The notation
\[ \sum_{x} f(x) \]
means we take the sum of $f(x)$ for every value x in some set. In this context, that
set is understood to be the sample space of a random variable.
We may also omit the bound, writing simply
\[ \sum f(x) \]
The sample space may or may not be a range of integers, which is why this notation is
slightly different from the sigma notation we use in the other chapters of this book.
Example 4.2.6
A child psychologist is interested in the number of times per night a newborn baby’s crying
wakes its parent. They record this number for 100 different parents.
Number of awakenings    Number of parents
0                       5
1                       5
2                       40
3                       23
4                       13
5                       10
6                       0
7                       3
8                       1
Suppose we pick one parent uniformly at random. Let X be the number of times per
night that parent is woken up. X takes on the values 0, 1, 2, 3, 4, 5, 6, 7, 8.
x    P(X = x)
0    P(X = 0) = 5/100
1    P(X = 1) = 5/100
2    P(X = 2) = 40/100
3    P(X = 3) = 23/100
4    P(X = 4) = 13/100
5    P(X = 5) = 10/100
6    P(X = 6) = 0/100
7    P(X = 7) = 3/100
8    P(X = 8) = 1/100
\[ \sum_{x} \Pr(X = x) = \frac{5}{100} + \frac{5}{100} + \frac{40}{100} + \frac{23}{100} + \frac{13}{100} + \frac{10}{100} + \frac{0}{100} + \frac{3}{100} + \frac{1}{100} = 1 \]
Example 4.2.6
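The step from recorded counts to a probability mass function can be sketched in a few lines. This is an illustration only: the names `counts`, `pmf_from_counts`, and `f` are our own, and the dictionary simply restates the table of 100 parents from Example 4.2.6.

```python
from fractions import Fraction

# Counts from Example 4.2.6: number of awakenings -> number of parents.
counts = {0: 5, 1: 5, 2: 40, 3: 23, 4: 13, 5: 10, 6: 0, 7: 3, 8: 1}


def pmf_from_counts(counts):
    """Turn a table of counts into a table of probabilities Pr(X = x)."""
    total = sum(counts.values())
    return {x: Fraction(n, total) for x, n in counts.items()}


pmf = pmf_from_counts(counts)


def f(x):
    """The PMF as a function of any real x: zero outside the sample space."""
    return pmf.get(x, Fraction(0))


print(f(2), f(9), sum(pmf.values()))
```

Storing only the sample space in the table and returning 0 everywhere else mirrors the convention of Notation 4.2.2.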
Example 4.2.7
A hospital researcher is interested in the number of times an average post-op patient will
ring the nurse during a 12-hour shift. For a random sample of 50 patients, the following
information was obtained.
Let X be the number of times a patient rings the nurse during a 12-hour shift.
x    P(X = x)
0    P(X = 0) = 4/50
1    P(X = 1) = 8/50
2    P(X = 2) = 16/50
3    P(X = 3) = 14/50
4    P(X = 4) = 6/50
5    P(X = 5) = 2/50
Example 4.2.7
Example 4.2.8
Suppose Nancy has classes three days a week. She attends classes three days a week 80%
of the time, two days 15% of the time, one day 4% of the time, and no days 1% of the time.
Suppose one week is randomly selected.
a. What is the random variable in this case? Call it X.
b. What values does X take on?
c. Construct a probability mass table (called a PM table) like the one in Example 4.2.6.
The table should have two columns, labelled x and P(X = x).
d. What does the P(X = x) column sum to?
Solution.
a. X is the number of days Nancy went to class on the randomly selected week.
b. From the description, X has sample space $\{0, 1, 2, 3\}$.
c.
x    P(X = x)
0    P(X = 0) = 0.01
1    P(X = 1) = 0.04
2    P(X = 2) = 0.15
3    P(X = 3) = 0.8
d. $\sum_{x=0}^{3} \Pr(X = x) = 0.01 + 0.04 + 0.15 + 0.8 = 1$, which accords with Definition 4.2.1.
Example 4.2.8
Example 4.2.9
Suppose a person is chosen at random from a group. Let X be the discrete random variable
describing the number of siblings that person has, and suppose the following probabilities
hold for X:
x    P(X = x)
0    P(X = 0) = 0.25
1    P(X = 1) = 0.3
2    P(X = 2) = 0.25
3    P(X = 3) = 0.1
4    P(X = 4) = 0.05
The probabilities in this table sum to 0.25 + 0.3 + 0.25 + 0.1 + 0.05 = 0.95, not 1.
That tells us this is not a probability mass function (PMF). Since all probabilities are
numbers in the interval [0, 1], it must be the case that we haven’t summed over all values
in the sample space. That is, in our sample of people, there must be some people who
haven’t been described here, e.g. people with more than four siblings. (Indeed, these
folks would make up 5% of the group.)
Example 4.2.9
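Checking whether a table can be a probability mass function is mechanical: every entry must lie in [0, 1] and the entries must sum to one. The sketch below illustrates that check; the function name `is_valid_pmf` and the tolerance are our own assumptions (a small tolerance is needed because decimal probabilities are stored as floating-point numbers).

```python
import math


def is_valid_pmf(table, tol=1e-9):
    """True when every probability is in [0, 1] and the total is 1 (within tol)."""
    probs = list(table.values())
    in_range = all(0 <= p <= 1 for p in probs)
    return in_range and math.isclose(sum(probs), 1.0, abs_tol=tol)


# Table from Example 4.2.8 (days Nancy attends class).
nancy = {0: 0.01, 1: 0.04, 2: 0.15, 3: 0.8}
# Table from Example 4.2.9 (number of siblings).
siblings = {0: 0.25, 1: 0.3, 2: 0.25, 3: 0.1, 4: 0.05}

# Probability not accounted for by the siblings table.
missing_mass = 1 - sum(siblings.values())

print(is_valid_pmf(nancy), is_valid_pmf(siblings), round(missing_mass, 2))
```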
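[Figure: plot with y-axis tick 1/3 and x-axis ticks 1, 2, 3]
Furthermore, $\Pr(X \le 2) = \frac{2}{3}$.
[Figure: plot with y-axis tick 1/5 and x-axis ticks 1, 1.5, 2, 2.5, 3]
For example, $\Pr(X = 2) = \frac{1}{5}$ and $\Pr(X = 7) = 0$. Furthermore, $\Pr(X \le 2) = \frac{3}{5}$.
[Figure: plot with y-axis tick 1/21 and x-axis ticks 1, 2, 3]
Furthermore, $\Pr(X \le 2) = \frac{11}{21}$.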
• So far, all the examples have been discrete systems. What if we want X to be a
continuous variable? We want to be able to choose any real number from 1 to 3. In
this case, there are infinitely many numbers to choose from. So, the probability of
choosing any of them is... zero!
P ROBABILITY 4.3 C UMULATIVE D ISTRIBUTION F UNCTION (CDF)
[Figure: plot with x-axis ticks 1, 2, 3]
This is a problem! We know we’re choosing numbers between 1 and 3, but we have
$\Pr(X = 1) = 0$ and $\Pr(X = 4) = 0$. So the probability mass function (PMF) is not
useful for describing continuous random variables. We need a different tool.
On the other hand, it’s easy to imagine that $\Pr(X \le 2) = \frac{1}{2}$. So somehow this
calculation didn’t break when we moved from a discrete system to a continuous
system.
Definition 4.3.1.
This might seem like a weirdly specific definition. Secretly, our main purpose in creating
this function is to use it as a tool to define two other things: a continuous random
variable, and the probability density function. Our motivation for defining the cumulative
distribution function (CDF) may lie with continuous random variables, but the definition
applies to discrete random variables as well.
Example 4.3.2
2. $\Pr(X > 10)$
3. $\Pr(X \le 0)$
4. $\Pr(X \ge 200)$
5. $\Pr(10 < X \le 20)$
Solution.
1. By Definition 4.3.1: $\Pr(X \le 50) = F(50)$; using the formula given for $F(x)$, this is
$\frac{50^2}{10^4} = \frac{1}{4}$.
2. $\Pr(X > 10)$ is the probability that X is not less than or equal to 10, so
\[ \Pr(X > 10) = 1 - \Pr(X \le 10) = 1 - F(10) = 1 - \frac{10^2}{10^4} = 1 - \frac{1}{100} = \frac{99}{100} \]
3. $\Pr(X \le 0) = F(0) = 0$. Note this tells us that X never takes negative values.
4. Note $\Pr(X \le 100) = F(100) = 1$. That tells us that X always takes values less than
or equal to 100. Combined with our last note, that means the only values X ever
takes are in the interval $[0, 100]$. So, $\Pr(X \ge 200) = 0$.
5. We can think of the interval $(10, 20]$ as “numbers that are less than or equal to 20, except
numbers less than or equal to 10.” We rewrite $\Pr(10 < X \le 20)$ in a manner similar
to Problem 2:
\[ \Pr(10 < X \le 20) = \Pr(X \le 20 \text{ and } X \not\le 10) = \Pr(X \le 20) - \Pr(X \le 10) \]
\[ = F(20) - F(10) = \frac{20^2}{10^4} - \frac{10^2}{10^4} = \frac{3}{100} \]
Example 4.3.2
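The five answers above all come from evaluating the same function F. As a hedged illustration, the sketch below encodes the formula used in the solution, F(x) = x²/10⁴ on [0, 100] (with F = 0 to the left and F = 1 to the right), and repeats the calculations with exact fractions. The helper names `prob_greater` and `prob_between` are our own.

```python
from fractions import Fraction


def F(x):
    """CDF used in the solution of Example 4.3.2: x^2 / 10^4 on [0, 100]."""
    if x < 0:
        return Fraction(0)
    if x > 100:
        return Fraction(1)
    return Fraction(x) ** 2 / 10**4


def prob_greater(a):
    """Pr(X > a) = 1 - F(a)."""
    return 1 - F(a)


def prob_between(a, b):
    """Pr(a < X <= b) = F(b) - F(a)."""
    return F(b) - F(a)


print(F(50), prob_greater(10), F(0), prob_greater(200), prob_between(10, 20))
```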
The ideas in the calculations of 2 and 5 above give us the following corollary.
Corollary 4.3.3.
1. $\Pr(X > a) = 1 - F(a)$, and
2. $\Pr(a < X \le b) = F(b) - F(a)$
Proof. The probability $\Pr(X > a)$ is the probability that X is not less than or equal to a, so
\[ \Pr(X > a) = 1 - \Pr(X \le a) = 1 - F(a) \]
The probability $\Pr(a < X \le b)$ is the probability that X is less than or equal to b and X
is not less than or equal to a.
\[ \Pr(a < X \le b) = \Pr(X \le b) - \Pr(X \le a) = F(b) - F(a) \]
Example 4.3.4
Let X be a discrete random variable with probability mass function (PMF) below.
x     Pr(X = x)
10    1/16
20    3/16
30    5/16
40    7/16
Note that the only values taken on by X are the numbers 10, 20, 30, and 40.
Let $F(x)$ be the cumulative distribution function (CDF) of X.
\[ F(10) = \Pr(X \le 10) = \Pr(X = 10) = \frac{1}{16} \]
• Similarly,
\[ F(30) = \Pr(X \le 30) = \Pr(X = 10 \text{ or } X = 20 \text{ or } X = 30) = \frac{1}{16} + \frac{3}{16} + \frac{5}{16} = \frac{9}{16} \]
• $F(11) = \Pr(X \le 11) = \Pr(X = 10)$, since 10 is the only number ever taken by X that
is less than or equal to 11. So, $F(11) = F(10) = \frac{1}{16}$. Indeed, $F(x) = F(10)$ for all x in
the interval $[10, 20)$.
[Figure: graph of F(x); y-axis ticks 1/16, 1/4, 9/16; x-axis ticks 10, 20, 30, 40, 50]
Example 4.3.4
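For a discrete random variable, the CDF is a running total of the PMF. The sketch below illustrates this with the table from Example 4.3.4; the names `pmf` and `cdf` are our own, and the function simply adds up the probability of every value that is at most x.

```python
from fractions import Fraction

# PMF from Example 4.3.4.
pmf = {
    10: Fraction(1, 16),
    20: Fraction(3, 16),
    30: Fraction(5, 16),
    40: Fraction(7, 16),
}


def cdf(x):
    """F(x) = Pr(X <= x): total probability of all values not exceeding x."""
    return sum((p for value, p in pmf.items() if value <= x), Fraction(0))


for x in (5, 10, 11, 19.99, 20, 30, 40):
    print(x, cdf(x))
```

Evaluating at 10, 11 and 19.99 gives the same result, which is the flat stretch of the CDF on [10, 20) described in the example.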
Example 4.3.5
Let U be a random variable that is chosen uniformly at random from all real numbers in
the interval [0, 1]. Understanding the cumulative distribution function (CDF) F ( x ) of U
can help us understand what “uniformly” means in this case.
As we saw in section 4.2.1, it’s not useful to note that Pr (U = x ) is the same for every
number in [0, 1], because that probability is 0. We can get at the meaning of “uniformly”
in a more useful way by examining ranges of numbers.
If we were to divide our interval⁶ in half, then the uniformity of distribution tells us
that half the time, U is in one half, and half the time, U is in the other half. In particular,
\[ \Pr\left(0 \le U \le \frac{1}{2}\right) = \Pr\left(\frac{1}{2} \le U \le 1\right) = \frac{1}{2} \]
6 Since $\Pr(U = \frac{1}{2}) = 0$, it won’t matter whether we use the interval [0, 1/2] or [0, 1/2).
If we were to divide our interval into equal tenths, then the uniformity of distribution
tells us that U should fall in each interval one-tenth of the time. For example,
\[ \Pr\left(0 \le U \le \frac{1}{10}\right) = \Pr\left(\frac{1}{10} \le U \le \frac{2}{10}\right) = \Pr\left(\frac{2}{10} \le U \le \frac{3}{10}\right) = \frac{1}{10} \]
So,
\[ F\left(\frac{1}{10}\right) = \frac{1}{10} \]
In general, if x is a number in the interval $[0, 1]$, then x describes the proportion of $[0, 1]$
taken up by the interval $[0, x]$, so $F(x) = x$.
\[ F(x) = \begin{cases} 0 & x < 0 \\ x & 0 \le x \le 1 \\ 1 & 1 < x \end{cases} \]
[Figure: graph of F(x); x-axis tick 1]
Example 4.3.5
The cumulative distribution function (CDF) will give us our actual definition of a continuous
random variable. Thinking of “continuous” as the opposite of “discrete” is not
sufficiently accurate.
Definition 4.3.6.
Example 4.3.7
The random variables from Examples 4.3.2 and 4.3.5 are continuous random variables.
The random variable from Example 4.3.4 is not a continuous random variable.
Example 4.3.7
Corollary 4.3.8.
\[ \Pr(X = a) = 0 \]
Furthermore,
\[ \Pr(X < a) = \Pr(X \le a) \quad\text{and}\quad \Pr(X > a) = \Pr(X \ge a) \]
\[ \lim_{x \to a^-} F(x) = \lim_{x \to a^-} \Pr(X \le x) = \Pr(X < a) \]
\[ \lim_{x \to a^-} F(x) = F(a) \]
So,
\[ \Pr(X < a) = \Pr(X \le a) \]
\[ \Pr(X \le a) - \Pr(X < a) = 0 \]
\[ \Pr(X = a) = 0 \]
\[ \Pr(X \le a) = \Pr(X < a) + \Pr(X = a) = \Pr(X < a) \]
\[ \Pr(X \ge a) = \Pr(X > a) + \Pr(X = a) = \Pr(X > a) \]
Example 4.3.9
V is a number chosen at random from all real numbers in the intervals $[-3, -1]$ or $[1, 3]$ as
follows:
• First, a fair 6-sided dice is rolled. If the outcome of the roll is 1 or 2, then V is chosen
to be in the interval $[-3, -1]$. If the outcome of the roll is 3, 4, 5, or 6, then V is chosen
to be in the interval $[1, 3]$.
• Within the selected interval, V is chosen uniformly at random.
Determine the cumulative distribution function (CDF) of V and decide whether or not
V is continuous.
Solution. From the first step, we see that V is in the interval $[-3, -1]$ one-third of the time,
and in the interval $[1, 3]$ two-thirds of the time.
\[ \Pr(-3 \le V \le -1) = \frac{1}{3}, \qquad \Pr(1 \le V \le 3) = \frac{2}{3} \]
Also,
\[ \Pr(-3 \le V \le -2) + \Pr(-2 \le V \le -1) = \Pr(-3 \le V \le -1) = \frac{1}{3} \]
So,
\[ \Pr(-3 \le V \le -2) = \Pr(-2 \le V \le -1) = \frac{1}{6} \]
Following the reasoning in Example 4.3.5, we see on the interval $[-3, -1]$, the function
$F(x)$ is a straight line from $F(-3) = 0$ to $F(-1) = \frac{1}{3}$.
When $-1 < x < 1$, then $F(x) = \Pr(V \le x) = \Pr(V \le -1) = F(-1)$, since no values of
V are ever less than 1 without also being less than or equal to $-1$. Then, by Corollary 4.3.8,
also $F(1) = \Pr(V \le 1) = \Pr(V < 1) = \Pr(V \le -1) = F(-1)$.
On the interval $[1, 3]$, V is uniformly distributed. Following the familiar line of reasoning,
the function $F(x)$ is a straight line from $F(1) = \frac{1}{3}$ to $F(3) = 1$. All together:
[Figure: graph of F(x); y-axis tick 1/3; x-axis ticks −3, −1, 1, 3]
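The description of F(x) in the solution can be collected into one piecewise function. The sketch below is an illustration under that reading of the solution (two straight-line pieces joined by a flat piece at height 1/3); the name `F_V` is our own choice.

```python
from fractions import Fraction


def F_V(x):
    """CDF of V from Example 4.3.9, assembled from the pieces in the solution."""
    x = Fraction(x)
    if x < -3:
        return Fraction(0)
    if x <= -1:
        return (x + 3) / 6                   # line from F(-3) = 0 to F(-1) = 1/3
    if x < 1:
        return Fraction(1, 3)                # no values of V fall in (-1, 1)
    if x <= 3:
        return Fraction(1, 3) + (x - 1) / 3  # line from F(1) = 1/3 to F(3) = 1
    return Fraction(1)


for x in (-3, -2, -1, 0, 1, 2, 3):
    print(x, F_V(x))
```

The pieces agree where they meet (at x = −3, −1, 1 and 3), so this function has no jumps.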
2. $F(x)$ is nondecreasing
3. $\lim_{x \to \infty} F(x) = 1$
4. $\lim_{x \to -\infty} F(x) = 0$
2. Suppose $a < b$.
\[ F(a) = \Pr(X \le a) \le \Pr(X \le a) + \Pr(a < X \le b) = \Pr(X \le b) = F(b) \]
3. Rather than a rigorous proof, we offer the following hand-wavey intuition: if infinity
were a number, we’d expect $F(\infty) = \Pr(X \le \infty) = 1$.
4. Rather than a rigorous proof, we offer the following hand-wavey intuition: if negative
infinity were a number, we’d expect $F(-\infty) = \Pr(X \le -\infty) = 0$.
[Figures: dot diagrams, each on an axis with ticks 1 through 6]
6. After 100 choices, our dots would start being so close together, they would be
indistinguishable, so we might choose to make the dots slightly transparent. Then darker
dots would represent the same heights (or nearly the same heights) being repeated.
[Figure: dot diagram on an axis with ticks 1 through 6]
Example 4.3.11
The dots (trial outcomes) are twice as dense on the right interval. Inside the right
interval, and inside the left interval, the dots are fairly evenly distributed.
Example 4.3.11
Example 4.3.12
Match the dot diagrams to the variable descriptions so that every description corresponds
to exactly one dot diagram.
P ROBABILITY 4.4 P ROBABILITY D ENSITY F UNCTION (PDF)
A. $\Pr(X \le 0) = \Pr(X \ge 0)$.
B. X is uniformly distributed.
C. $\Pr(X \le 0) < \Pr(X \ge 0)$.
D. $\Pr(X \le 0) > \Pr(X \ge 0)$.
[Figures: four dot diagrams, labelled 1 through 4]
Solution. In both 1 and 2, it seems like (roughly) the same number of trials resulted in
positive and negative values of X. So in both cases, A holds. However, in 2, the distribution
is not uniform: trials are more likely to have large absolute values than to be near 0.
So, we match B to 1 and A to 2.
In 3, more trials gave $X \le 0$ than $X \ge 0$, so we match that to D.
In 4, more trials gave $X \ge 0$ than $X \le 0$, so we match that to C.
Example 4.3.12
Suppose the dot diagram above represents some continuous random variable X, and
we want to measure the probability density near the indicated point a. We start by defining
a small interval around a. As is tradition, we take the interval between a and a + h,
where h is some small⁷ real number.
It doesn’t make sense to count the dots in this interval, since the actual number will
change as we do different trials, so instead we measure how likely our random variable
is to be in this interval: $\Pr(a \le X \le a + h)$. The length of the interval is h. So, our
probability density around a is:
\[ \frac{\Pr(a \le X \le a + h)}{h} \]
If $F(x)$ is our cumulative distribution function (CDF), then we can re-write this using
Corollaries 4.3.3 and 4.3.8.
\[ = \frac{F(a + h) - F(a)}{h} \]
Since we only consider small values of h, we recognize the definition of a derivative.
\[ \lim_{h \to 0} \frac{F(a + h) - F(a)}{h} = F'(a) \]
This motivates our definition of the probability density function (PDF) of a continuous
random variable. Probability mass functions (PMFs) and probability density functions
(PDFs) serve similar purposes: describing which values a variable tends to take on.
Definition 4.4.1.
The observant reader will note that the conventional use of F ( x ) as an antiderivative
of f ( x ) squares nicely with our use of F ( x ) for a cumulative distribution function (CDF)
and f ( x ) for a probability density function (PDF).
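The difference quotient that motivates the PDF can be tried numerically. The sketch below is an illustration, not a replacement for differentiating by hand: it reuses the CDF from the solution of Example 4.3.2, F(x) = x²/10⁴ on [0, 100], and the names `density_estimate` and the step size `h` are our own assumptions.

```python
def F(x):
    """CDF from Example 4.3.2: x^2 / 10^4 on [0, 100]."""
    if x < 0:
        return 0.0
    if x > 100:
        return 1.0
    return x**2 / 10**4


def density_estimate(cdf, a, h=1e-6):
    """Approximate the probability density near a by Pr(a <= X <= a + h) / h."""
    return (cdf(a + h) - cdf(a)) / h


# Differentiating by hand gives F'(x) = 2x / 10^4 = x / 5000 inside (0, 100).
for a in (10, 50, 90):
    print(a, density_estimate(F, a), a / 5000)
```

As h shrinks, the estimate approaches the derivative of the CDF at a, which is the quantity the definition of the PDF is built on.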
Warning 4.4.2.
7 By “small,” we mean $|h| \approx 0$. In the discussion that follows, we’re considering the case $h > 0$; the case
$h < 0$ proceeds in the same way.
Example 4.4.3
\[ F(x) = \begin{cases} 0 & x < 0 \\ \frac{x^2}{10^4} & 0 \le x \le 100 \\ 1 & x > 100 \end{cases} \]
\[ f(x) = \begin{cases} 0 & x < 0 \\ \frac{x}{5000} & 0 \le x < 100 \\ 0 & x > 100 \end{cases} \]
Translating f ( x ) into a dot diagram, to help build intuition about the behaviour of this
variable, we expect to see
• an increasing density of dots from left to right on the interval [0, 100].
Example 4.4.3
Notation 4.4.4.
and it is understood that $f(x)$ is zero or doesn’t exist when x is not in the interval
$[0, 100)$.
Another time-saving measure is to use the words “else” or “otherwise” in a
piecewise-defined function. In the context of this function:
\[ f(x) = \begin{cases} \frac{x}{5000} & 0 \le x < 100 \\ 0 & \text{else} \end{cases} \]
“else” means “for all values of x other than the ones that have already been defined,”
i.e. for all values of x outside the interval $[0, 100)$.
Example 4.4.5
\[ F(x) = \begin{cases} 0 & x < 0 \\ x & 0 < x < 1 \\ 1 & 1 < x \end{cases} \]
\[ f(x) = \begin{cases} 0 & x < 0 \\ 1 & 0 < x < 1 \\ 0 & 1 < x \end{cases} \]
Notice that the density is constant on the interval (0, 1). This is a hallmark of uniformly
distributed variables: in the interval in question, no one region is denser than any other
region.
Example 4.4.6
The places where the probability density function (PDF) is 0 are telling: these are the
regions our variable never reaches (values that cannot occur).
8 You can see this by comparing the right and left limits of the limit definition of the derivative of $F(x)$ at
these points.
Our intuition about $f(x)$ is that higher $f(x)$ means more “hits” near x. In the dot
diagram above, $f(2) > f(-2)$, and indeed the dots are denser near 2 than near −2.
Example 4.4.6
Warning 4.4.7.
Corollary 4.4.8.
The first property of Corollary 4.4.8 is a key piece of intuition for working with probability
density functions (PDFs): the probability density function (PDF) of a continuous
random variable X is a function $f(x)$ with the property that the area under the curve of
$f(x)$ from a to b is equal to the probability that X lies between a and b.
[Figure: graph of y = f(x), with the region under the curve between x = a and x = b shaded. Shaded area: $\Pr(a \le X \le b)$]
Example 4.4.9
Solution.
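1. By Corollary 4.4.8,
\[
\begin{aligned}
1 &= \int_{-\infty}^{\infty} \frac{a}{x^2+1}\,dx = a\int_{-\infty}^{\infty} \frac{1}{x^2+1}\,dx \\
&= a\left[\lim_{b \to -\infty} \int_b^0 \frac{1}{x^2+1}\,dx + \lim_{c \to \infty} \int_0^c \frac{1}{x^2+1}\,dx\right] \\
&= a\left[\lim_{b \to -\infty} (\arctan 0 - \arctan b) + \lim_{c \to \infty} (\arctan c - \arctan 0)\right] \\
&= a\left[0 - \left(-\frac{\pi}{2}\right) + \frac{\pi}{2} - 0\right] = a \cdot \pi
\end{aligned}
\]
So, $a = \frac{1}{\pi}$.
2. By Corollary 4.4.8,
\[ \Pr(0 \le X \le 10) = \int_0^{10} f(x)\,dx = \int_0^{10} \frac{1/\pi}{x^2+1}\,dx = \frac{1}{\pi}\left[\arctan 10 - \arctan 0\right] = \frac{\arctan(10)}{\pi} \approx 0.47 \]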
P ROBABILITY 4.5 E XPECTED VALUE
Example 4.4.9
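Both answers can be sanity-checked numerically. The sketch below is an illustration that assumes, as the solution does, a density of the form f(x) = a/(x² + 1) with a = 1/π; the midpoint-rule helper `midpoint_integral` and the cut-off of ±1000 for the "whole real line" are our own choices, so the total mass comes out slightly below 1.

```python
import math


def f(x, a=1 / math.pi):
    """Density from the solution of Example 4.4.9, with a = 1/pi."""
    return a / (x**2 + 1)


def midpoint_integral(g, lo, hi, n=100_000):
    """Midpoint-rule approximation of the integral of g over [lo, hi]."""
    width = (hi - lo) / n
    return width * sum(g(lo + (i + 0.5) * width) for i in range(n))


# Part 2: exact value from the antiderivative arctan, and a numerical check.
prob_exact = math.atan(10) / math.pi
prob_numeric = midpoint_integral(f, 0, 10)

# Part 1: the total area should be close to 1 (the tails beyond +-1000 are cut off).
total_mass = midpoint_integral(f, -1000, 1000, n=200_000)

print(round(prob_exact, 2), round(prob_numeric, 4), round(total_mass, 3))
```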
We can formalize the last part of the previous exercise as a corollary to Corollary 4.4.8.
Corollary 4.4.10.
\[ \Pr(-\infty < X \le x) = \int_{-\infty}^{x} f(t)\,dt \]
Suppose I throw a 4-sided dice a large number of times, and record the number that comes
up each time. What will the average (mean) of those numbers be?
To calculate the mean, I’ll add up the results of my rolls and divide by the number of
rolls I took.
The numerator will consist of the numbers 1 through 4, since these are the numbers resulting
from a 4-sided dice roll. Let’s regroup the numerator so we add up all the 1s first,
then all the 2s second, etc.
\[ = \frac{(1 + 1 + \cdots) + (2 + 2 + \cdots) + (3 + 3 + \cdots) + (4 + 4 + \cdots)}{\text{total number of rolls}} \]
\[ = \frac{(1 + 1 + \cdots)}{\text{total rolls}} + \frac{(2 + 2 + \cdots)}{\text{total rolls}} + \frac{(3 + 3 + \cdots)}{\text{total rolls}} + \frac{(4 + 4 + \cdots)}{\text{total rolls}} \]
This calculation, what we expect to have as our average if we perform the dice roll a
large number of times, motivates Definition 4.5.1 below.
Definition 4.5.1.
Note the similarities between the continuous and discrete cases. A sum in the discrete
case turns into an integral in the continuous case; $\Pr(X = x)$ turns into the probability
density function (PDF) $f(x)$; and “every possible value of X” turns into the range
$(-\infty, \infty)$.
Example 4.5.2
In Example 4.2.6, we saw the following probability mass function (PMF) for the random
variable X:
x    P(X = x)
0    P(X = 0) = 5/100
1    P(X = 1) = 5/100
2    P(X = 2) = 40/100
3    P(X = 3) = 23/100
4    P(X = 4) = 13/100
5    P(X = 5) = 10/100
6    P(X = 6) = 0/100
7    P(X = 7) = 3/100
8    P(X = 8) = 1/100
\[
\begin{aligned}
E(X) &= \sum_{x=0}^{8} x \cdot \Pr(X = x) \\
&= 0 \cdot \Pr(X = 0) + 1 \cdot \Pr(X = 1) + 2 \cdot \Pr(X = 2) + 3 \cdot \Pr(X = 3) + 4 \cdot \Pr(X = 4) \\
&\quad + 5 \cdot \Pr(X = 5) + 6 \cdot \Pr(X = 6) + 7 \cdot \Pr(X = 7) + 8 \cdot \Pr(X = 8) \\
&= 0 \cdot \frac{5}{100} + 1 \cdot \frac{5}{100} + 2 \cdot \frac{40}{100} + 3 \cdot \frac{23}{100} + 4 \cdot \frac{13}{100} + 5 \cdot \frac{10}{100} + 6 \cdot \frac{0}{100} + 7 \cdot \frac{3}{100} + 8 \cdot \frac{1}{100} \\
&= \frac{285}{100} = 2.85
\end{aligned}
\]
The most literal interpretation of expected value in this context is this:
Suppose we choose a parent from a list at random many times, and each time
record the number of awakenings, X. After a large number of trials, we expect
the average of these X values to approach 2.85.
The average number of times a parent was woken up in our trial was 2.85.
Of course, no parent was woken up exactly 2.85 times in the night. Expected values
refer to averages, and do not necessarily accord well with individual trials.
Example 4.5.2
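The expected value computed above is a probability-weighted sum, which is short to write out in code. The sketch below is an illustration; the names `counts`, `pmf` and `expected_value` are our own, and the data are the 100 parents from Example 4.2.6.

```python
from fractions import Fraction

# Number of awakenings -> number of parents (Example 4.2.6).
counts = {0: 5, 1: 5, 2: 40, 3: 23, 4: 13, 5: 10, 6: 0, 7: 3, 8: 1}
total = sum(counts.values())
pmf = {x: Fraction(n, total) for x, n in counts.items()}


def expected_value(pmf):
    """E(X) for a discrete random variable: sum of x * Pr(X = x)."""
    return sum(x * p for x, p in pmf.items())


mean = expected_value(pmf)
print(mean, float(mean))
```

Because every parent is equally likely to be chosen, this is the same as the ordinary average of the 100 recorded numbers.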
Probability does not describe the short-term results of an experiment. It gives informa-
tion about what can be expected in the long term. The Law of Large Numbers states that, as
the number of trials in a probability experiment increases, the difference between the the-
oretical probability of an event and the relative frequency approaches zero (the theoretical
probability and the relative frequency get closer and closer together).
Example 4.5.3
Suppose we flip a fair coin a large number of times. We want to record the average number
of times the flip resulted in heads.
Let X be the random variable corresponding to a coin flip, with X = 1 when the flip is
heads and X = 0 when the flip is tails. Using these assignments, if we add up the values
of X from each experiment, that sum tells us how many flips were heads. The expected
value of X is
\[ E(X) = \sum_{x=0}^{1} x \cdot \Pr(X = x) = 0 \cdot \frac{1}{2} + 1 \cdot \frac{1}{2} = \frac{1}{2} \]
Consider interpreting the expected value as a long-term average, using the law of large
numbers. If we were to flip a fair coin a large number of times, we would expect the
average value of X to be $\frac{1}{2}$. That is, we would expect roughly $\frac{1}{2}$ of the tosses to result in
heads.
In 2009, intrepid undergraduate students at Berkeley tossed coins 40,000 times⁹. The
tosses resulted in 20,217 heads. The fraction of coin tosses resulting in heads, therefore,
was
\[ \frac{20{,}217}{40{,}000} = 0.505425 \]
which is indeed fairly close to $\frac{1}{2}$.
Example 4.5.3
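The long-run behaviour described above can be imitated with simulated coin flips. The sketch below is only an illustration: it uses Python's pseudo-random generator rather than real coins, the function name `heads_fraction` and the fixed seed are our own choices, and the exact fractions printed depend on the seed.

```python
import random


def heads_fraction(num_flips, seed=0):
    """Simulate num_flips fair coin flips (1 = heads, 0 = tails); return the average."""
    rng = random.Random(seed)
    heads = sum(rng.randint(0, 1) for _ in range(num_flips))
    return heads / num_flips


# The average of X should settle near E(X) = 1/2 as the number of flips grows.
for n in (10, 100, 1_000, 40_000):
    print(n, heads_fraction(n))
```

With few flips the fraction can be far from 1/2; with 40,000 flips it is typically within about one percentage point, in line with the Berkeley experiment.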
Example 4.5.4
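\[ f(x) = ax^2(10 - x), \quad 0 \le x \le 10 \]
where a is a constant.
Find a and E(X).
Solution.
From Corollary 4.4.8 part 3:
\[
\begin{aligned}
1 &= \int_{-\infty}^{\infty} f(x)\,dx = 0 + \int_0^{10} ax^2(10 - x)\,dx = a\int_0^{10} (10x^2 - x^3)\,dx \\
&= a\left[\frac{10}{3}x^3 - \frac{1}{4}x^4\right]_0^{10} = a\left(\frac{10^4}{3} - \frac{10^4}{4}\right) = a\,\frac{10^4}{12} \\
a &= \frac{12}{10^4}
\end{aligned}
\]
Note where $f(x) = 0$, we have $\int_a^b x \cdot f(x)\,dx = \int_a^b 0\,dx = 0$.
\[
\begin{aligned}
E(X) &= \int_{-\infty}^{\infty} x \cdot f(x)\,dx \\
&= 0 + \int_0^{10} ax^3(10 - x)\,dx = a\int_0^{10} (10x^3 - x^4)\,dx \\
&= a\left[\frac{10}{4}x^4 - \frac{1}{5}x^5\right]_0^{10} = a\left(\frac{10^5}{4} - \frac{10^5}{5}\right) = a\,\frac{10^5}{20} \\
&= \frac{12}{10^4} \cdot \frac{10^5}{20} = 6
\end{aligned}
\]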
Example 4.5.4
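Since the density here is a polynomial, both integrals can be checked with exact arithmetic. The sketch below is an illustration; `integrate_poly` is a small helper of our own that integrates a polynomial given by its list of coefficients (constant term first).

```python
from fractions import Fraction


def integrate_poly(coeffs, lo, hi):
    """Exact integral over [lo, hi] of sum(coeffs[k] * x**k)."""
    def antiderivative(x):
        return sum(
            Fraction(c, k + 1) * Fraction(x) ** (k + 1)
            for k, c in enumerate(coeffs)
        )
    return antiderivative(hi) - antiderivative(lo)


# f(x) = a * (10x^2 - x^3) on [0, 10]; the total probability must be 1.
unscaled_mass = integrate_poly([0, 0, 10, -1], 0, 10)
a = 1 / unscaled_mass

# E(X) = integral of x * f(x) = a * (10x^3 - x^4) over [0, 10].
expected = a * integrate_poly([0, 0, 0, 10, -1], 0, 10)

print(a, expected)
```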
Example 4.5.5
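\[ f(x) = e^x, \quad x \le 0 \]
Find E(Y).
Solution.
From Definition 4.5.1,
\[ E(Y) = \int_{-\infty}^{\infty} x \cdot f(x)\,dx = 0 + \int_{-\infty}^{0} x \cdot e^x\,dx = \lim_{a \to -\infty}\left[\int_a^0 x \cdot e^x\,dx\right] \]
Integrating by parts,
\[ = \lim_{a \to -\infty}\Big[xe^x - e^x\Big]_a^0 = \lim_{a \to -\infty}\left[-1 - ae^a + e^a\right] \]
Note $\lim_{a \to -\infty} e^a = 0$, so $\lim_{a \to -\infty} -ae^a$ has the indeterminate form $0 \cdot \infty$. We use l’Hôpital’s rule.
\[ = \lim_{a \to -\infty}\Bigg[\underbrace{\frac{-a}{e^{-a}}}_{\substack{\text{num} \to \infty \\ \text{den} \to \infty}}\Bigg] - 1 + 0 = \lim_{a \to -\infty}\left[\frac{1}{e^{-a}}\right] - 1 = \lim_{a \to -\infty}\left[e^a\right] - 1 = -1 \]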
Example 4.5.6
Let Z be a continuous random variable with probability density function (PDF) $f(x) = \frac{1}{x^2}$,
$x \ge 1$. Find E(Z).
Solution.
From Definition 4.5.1,
\[
\begin{aligned}
E(Z) &= \int_{-\infty}^{\infty} x \cdot f(x)\,dx = 0 + \int_1^{\infty} x \cdot x^{-2}\,dx = \int_1^{\infty} x^{-1}\,dx \\
&= \lim_{b \to \infty}\left[\int_1^b x^{-1}\,dx\right] = \lim_{b \to \infty}\left[\ln b\right] = \infty
\end{aligned}
\]
It is sometimes the case that the expectation of a continuous random variable is infinite.
How should we interpret that?
A random variable Z with the given probability density function (PDF) has sample
space $[1, \infty)$. It takes on finite values, but there is no limit to how large those values can
be. (It is true that smaller values are more likely, since $f(x) = x^{-2}$ is a decreasing function.
However, Z also takes on extremely large values from time to time.) $E(Z) = \infty$ tells us
that if we run our experiment Z a lot of times, over time the average will increase without
bound.
Example 4.5.6
Theorem 4.5.7.
Theorem 4.5.8.
Proof. Intuitively, an increasing $f(x)$ means we have more high values than low values,
so when we average them together, the average will be high. Similarly, decreasing $f(x)$
means we have more low values than high values, so when we average them together, the
average will be low.
More rigorously:
\[
\begin{aligned}
\int_a^b x f(x)\,dx - \frac{a+b}{2} &= \int_a^b x f(x)\,dx - \frac{a+b}{2}\int_a^b f(x)\,dx \\
&= \int_a^b \left(x - \frac{a+b}{2}\right) f(x)\,dx \\
&= \int_a^{\frac{a+b}{2}} \left(x - \frac{a+b}{2}\right) f(x)\,dx + \int_{\frac{a+b}{2}}^{b} \left(x - \frac{a+b}{2}\right) f(x)\,dx \\
&= \int_{\frac{a+b}{2}}^{a} \left(\frac{a+b}{2} - x\right) f(x)\,dx + \int_{\frac{a+b}{2}}^{b} \left(x - \frac{a+b}{2}\right) f(x)\,dx
\end{aligned}
\]
Using the substitution $y = a + b - x$ in the first integral and noting that $dy = -dx$,
\[ = -\int_{\frac{a+b}{2}}^{b} \left(y - \frac{a+b}{2}\right) f(a + b - y)\,dy + \int_{\frac{a+b}{2}}^{b} \left(x - \frac{a+b}{2}\right) f(x)\,dx \]
If $f(x)$ is increasing, then $f(a + b - x) < f(x)$ whenever $x \in \left(\frac{a+b}{2}, b\right]$:
\[ \int_a^b x f(x)\,dx - \frac{a+b}{2} = \int_{\frac{a+b}{2}}^{b} \underbrace{\left(x - \frac{a+b}{2}\right)}_{\text{positive}} \underbrace{\Big[-f(a + b - x) + f(x)\Big]}_{\text{positive}}\,dx > 0 \]
\[ \text{so } \int_a^b x f(x)\,dx > \frac{a+b}{2} \]
If $f(x)$ is decreasing, then $f(a + b - x) > f(x)$ whenever $x \in \left(\frac{a+b}{2}, b\right]$:
\[ \int_a^b x f(x)\,dx - \frac{a+b}{2} = \int_{\frac{a+b}{2}}^{b} \underbrace{\left(x - \frac{a+b}{2}\right)}_{\text{positive}} \underbrace{\Big[-f(a + b - x) + f(x)\Big]}_{\text{negative}}\,dx < 0 \]
\[ \text{so } \int_a^b x f(x)\,dx < \frac{a+b}{2} \]
Example 4.5.9
for some appropriate constant a. Using the two theorems in this section, give a range for
E( X ).
Example 4.5.10
You calculate expected values for the various random variables described below. Which
of the values can you immediately, with very little computation, say are wrong? Which
seem reasonable?
1. W is a random variable that takes values from [4, 5], and you calculate E(W ) = 4.75.
2. X is a random variable that takes values from $[-1, 0]$, and you calculate $E(X) = 0.5$.
For E(W), we don’t have enough information to apply Theorem 4.5.8. However, it
passes the test of Theorem 4.5.7. So E(W) is reasonable, though we have no way of knowing
whether it is correct.
For E(A), Theorem 4.5.8 doesn’t apply, since the values of A do not lie in a finite
interval. However, it passes the test of Theorem 4.5.7. So E(A) is reasonable. (Indeed, if
you go through the calculation, it is correct.)
Example 4.5.10
The paper On the Viability of Conspiratorial Beliefs¹⁰ investigates a probabilistic model¹¹ for
the length of time a conspiracy theory can remain secret. In particular, the author uses the
formula
\[ L(t) = 1 - e^{-t\left(1 - (1 - p)^{N(t)}\right)} \]
where $L(t)$ is the probability that, after t years, a leak has occurred that would cause the
conspiracy to be exposed; $N(t)$ is the number of people involved in the conspiracy at time
t; and p is the probability that a person involved will cause a leak in any particular year.
(It is implied that $L(t) = 0$ for negative values of t.)
For this example, we’ll only use a very basic version of the full model. Suppose there
are 100 (immortal) people involved in a conspiracy, no new people are ever brought into
the conspiracy, and each person has a 1% chance of causing a leak in one year.
(a) Using the model above, what is the expected amount of time it will take for a leak to
occur?
(b) Using the model above, what is the probability that the conspiracy can survive with-
out a leak for at least 5 years?
Solution.
(a) $L(t)$ is the probability that, at time t, at least one leak has occurred. Let T be the
time that the first leak occurs. Then $L(t) = \Pr(T \le t)$. So, the function $L(t)$ is the
cumulative distribution function of T. In order to find E(T), we’ll need the probability
density function of T, which will be $L'(t)$ (by Definition 4.4.1).
Let’s start by filling in our constants: $N(t) = 100$ and $p = \frac{1}{100}$.
\[ L(t) = 1 - e^{-t\left(1 - (1 - p)^{N(t)}\right)} = 1 - e^{-t\left(1 - 0.99^{100}\right)} = 1 - e^{t\left(0.99^{100} - 1\right)} \]
10 Grimes DR (2016) On the Viability of Conspiratorial Beliefs. PLoS ONE 11(1): e0147905.
https://doi.org/10.1371/journal.pone.0147905
11 The assumptions made that lead to this model are that every member of the conspiracy is equally likely
to cause a leak (whether by negligence or on purpose); that leak events are independent of one another;
and that the probability of a conspirator causing a leak in any given year is constant. The full derivation
is beyond the scope of the text, but the interested reader may look up “Poisson distribution.”
The paper goes on to approximate p using conspiracy theories that have been exposed. They also use
demographic data to approximate N (t). They apply the model to famous conspiracy theories (e.g. the
moon landing being faked) to discuss whether such a plot could realistically remain secret until present
day.
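Note that $0.99^{100} - 1$ is a constant. In order to make the work below clearer, we’ll
replace it with c.
\[ L(t) = 1 - e^{ct} \quad \text{where } c = 0.99^{100} - 1 \]
\[ L'(t) = -ce^{ct} \]
\[
\begin{aligned}
E(T) &= \int_0^{\infty} t \cdot L'(t)\,dt \\
&= \lim_{b \to \infty}\left[\Big[-te^{ct}\Big]_0^b - \int_0^b -e^{ct}\,dt\right] \\
&= \lim_{b \to \infty}\left[-be^{bc} + \left[\frac{1}{c}e^{ct}\right]_0^b\right] \\
&= \lim_{b \to \infty}\left[-be^{bc} + \frac{1}{c}e^{bc} - \frac{1}{c}\right] \\
&= \lim_{b \to \infty}\left[\left(\frac{1}{c} - b\right)e^{bc} - \frac{1}{c}\right]
\end{aligned}
\]
Since $c < 0$, $\lim_{b \to \infty} e^{bc} = 0$ (*). So, $\left(\frac{1}{c} - b\right)e^{bc}$ has the indeterminate form $-\infty \cdot 0$. We will
re-write this in order to use l’Hôpital’s rule.
\[
\begin{aligned}
&= \lim_{b \to \infty}\left[\frac{\frac{1}{c} - b}{e^{-bc}} - \frac{1}{c}\right] \\
&= \lim_{b \to \infty}\left[\frac{\frac{d}{db}\left[\frac{1}{c} - b\right]}{\frac{d}{db}\left[e^{-bc}\right]} - \frac{1}{c}\right] \\
&= \lim_{b \to \infty}\left[\frac{-1}{-ce^{-bc}} - \frac{1}{c}\right] \\
&= \lim_{b \to \infty}\left[\frac{1}{c}e^{bc} - \frac{1}{c}\right] \\
&= 0 - \frac{1}{c} \quad \text{using (*)} \\
&= -\frac{1}{0.99^{100} - 1} = \frac{1}{1 - 0.99^{100}} \approx 1.58
\end{aligned}
\]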
So, the expected value of the time it would take for this conspiracy theory to be leaked
is about 19 months.
P ROBABILITY 4.6 VARIANCE AND S TANDARD D EVIATION
\[ 1 - L(5) = e^{5\left(0.99^{100} - 1\right)} \approx 0.04 \]
So, there’s about a 4% chance that the conspiracy would survive at least 5 years with-
out any leaks.
Example 4.5.11
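The two numerical answers in this example follow directly from the formula for L(t) with constant N(t) and p. The sketch below is an illustration of that simplified model only (100 conspirators, a 1% yearly chance of a leak each); the function names are our own, and the closed form for the expected time is the 1/(1 − 0.99¹⁰⁰) obtained in part (a).

```python
import math


def leak_cdf(t, n_people=100, p=0.01):
    """L(t): probability that at least one leak has occurred by time t (in years)."""
    if t < 0:
        return 0.0
    rate = 1 - (1 - p) ** n_people
    return 1 - math.exp(-t * rate)


def expected_leak_time(n_people=100, p=0.01):
    """E(T) = 1 / (1 - (1 - p)^N), the value found in part (a)."""
    return 1 / (1 - (1 - p) ** n_people)


def survival(t, n_people=100, p=0.01):
    """Probability of no leak up to time t, i.e. 1 - L(t)."""
    return 1 - leak_cdf(t, n_people, p)


years = expected_leak_time()
print(round(years, 2), round(years * 12), round(survival(5), 2))
```

Changing `n_people` or `p` shows how sensitive the expected time is to the size of the group, within the assumptions of this basic version of the model.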
• The expected value is not an integer. So no matter who we choose, we are guaran-
teed to not choose a parent with the expected number of awakenings. So, a “usual”
experience is not the same as actually achieving the expected value.
• If we choose a parent with 3 awakenings, that’s as close as we can get to the expectation.
It seems reasonable that when $X \approx E(X)$, that’s a fairly “usual” trial.
• Parents with two awakenings are the most numerous. So although these parents are
farther from average, we are more likely to choose one of them than we are to choose
any other. So it is not enough to look for values of X that are closest to E(X).
• Suppose we choose a parent with 4 awakenings. Is this so far above average that it
is very unusual (and so possibly a cause for concern) or is it still within a reasonably
common range? This question will bring us to the heart of the matter: how far from
E(X) is still “usual”?
To quantify the last bullet point, let’s compare each parent’s experience to the expected
value. If your baby woke you up twice during the night, then your experience differs from
the average by 0.85. If your baby woke you up three times during the night, then your
experience differs from the average by 0.15. Let’s give that difference its own variable
name, Y. Larger values of Y mean a larger difference between the individual experience
and the expectation. So parents with a high Y value are “less usual” than parents with a
low Y-value.
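\[
\begin{aligned}
E(Y) &= \sum_{x=0}^{8} \underbrace{|x - 2.85|}_{Y} \cdot \Pr(X = x) \\
&= 2.85 \cdot \frac{5}{100} + 1.85 \cdot \frac{5}{100} + 0.85 \cdot \frac{40}{100} + 0.15 \cdot \frac{23}{100} + 1.15 \cdot \frac{13}{100} \\
&\quad + 2.15 \cdot \frac{10}{100} + 3.15 \cdot \frac{0}{100} + 4.15 \cdot \frac{3}{100} + 5.15 \cdot \frac{1}{100} \\
&\approx 1.15
\end{aligned}
\]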
That is, when we choose parents at random, on average their number of awakenings
differs from the expected number of awakenings by 1.15.
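The average distance from the expectation can be recomputed in a few lines. The sketch below is an illustration; the names `mean` and `mean_abs_deviation` are our own, and the data are again the 100 parents from Example 4.2.6.

```python
from fractions import Fraction

# Number of awakenings -> number of parents (Example 4.2.6).
counts = {0: 5, 1: 5, 2: 40, 3: 23, 4: 13, 5: 10, 6: 0, 7: 3, 8: 1}
total = sum(counts.values())

# E(X): the expected number of awakenings.
mean = sum(Fraction(x * n, total) for x, n in counts.items())

# E(Y), where Y = |X - E(X)| is the distance between a parent's experience and the expectation.
mean_abs_deviation = sum(abs(x - mean) * Fraction(n, total) for x, n in counts.items())

print(float(mean), float(mean_abs_deviation))
```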
With that in mind, we might say a parent who wakes up between 1.15 − 2.38 = −1.23
and 1.15 + 2.38 = 3.53 times wakes up a “usual” number of times, while the other parents
have experiences that are “unusual”. A parent whose baby wakes them up four times during
the night is “unusual,” in that their experience is quite different from the expectation,
but a parent whose baby never wakes them up is still in the range of “usual”.
To generalize what we just computed:
• X is a random variable
used this as a measure of how far off from E( X ) our variable X could be and still be
considered “usual”.
Recall from Definition 4.5.1 that the definition of E( X ) depends on whether X is con-
tinuous or discrete.
Corollary 4.6.2.
\[ \mathrm{Var}(X) = \sum_{x} (x - \mathbb{E}(X))^2 \cdot \Pr(X = x) \]
Note the similarities between Var( X ) and E(Y ) from the end of the last subsection,
4.6.1. Their interpretations are similar: Var( X ) measures the expected squared difference
between X and E( X )12 .
One reason we replace $|X - \mathbb{E}(X)|$ with $(X - \mathbb{E}(X))^2$ is that $f(x) = |x - \mathbb{E}(X)|$ is
not differentiable at $x = \mathbb{E}(X)$, while $f(x) = (x - \mathbb{E}(X))^2$ is differentiable everywhere. We want
to be able to use calculus tools, so differentiability is desirable.
Example 4.6.3
Consider the random variable X with probability mass function (PMF) given below.
x     Pr(X = x)
0     1/2
10    1/2
12 To explore why we need absolute values or squares, see Question 13 in Section 4.6 of the practice book.
X takes on values from [0, 10], with E( X ) = 5. Every value of X differs from E( X ) by
5. However,
\[ \mathrm{Var}(X) = \sum_{x} (x - 5)^2 \cdot \Pr(X = x) = (0 - 5)^2 \cdot \frac{1}{2} + (10 - 5)^2 \cdot \frac{1}{2} = 25 \]
We take the square root of Var( X ) to somehow atone for our previous transgression
of squaring |X ´ E( X )|. Informally, we think of the standard deviation as the “usual”
difference between X and E( X ).
Example 4.6.3
Example 4.6.4
One thousand students take a midterm, and we choose one student uniformly at random.
X is the mark the student got on the midterm, out of 100. For this particular group of 1000
students, E( X ) = 65 and σ( X ) = 15.
• Suppose we select Student A, who earned 60 points. Although this is below the class
average, it is within one standard deviation of the expectation. That is,
$|X - \mathbb{E}(X)| = 5 < 15 = \sigma(X)$.
• If we select Student B who scored 90, not only are they above the class average, they
are well above the class average. The difference between X and E( X ) is greater than
usual.
• If we select Student C who scored 45, not only are they below the class average, they
are well below the class average. The difference between X and E( X ) is greater than
usual.
Example 4.6.4
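Corollary 4.6.5.
\[ \mathrm{Var}(X) = \mathbb{E}(X^2) - [\mathbb{E}(X)]^2 \]
\begin{align*}
\mathrm{Var}(X) &= \sum_{x} (x - \mathbb{E}(X))^2 \cdot \Pr(X = x) \\
&= \sum_{x} \left( x^2 - 2x \cdot \mathbb{E}(X) + [\mathbb{E}(X)]^2 \right) \cdot \Pr(X = x) \\
&= \sum_{x} x^2 \cdot \Pr(X = x) - 2 \cdot \mathbb{E}(X) \cdot \sum_{x} x \Pr(X = x) + [\mathbb{E}(X)]^2 \sum_{x} \Pr(X = x) \\
&= \mathbb{E}(X^2) - 2[\mathbb{E}(X)]^2 + [\mathbb{E}(X)]^2 = \mathbb{E}(X^2) - [\mathbb{E}(X)]^2
\end{align*}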
Example 4.6.6
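\begin{align*}
\mathbb{E}(X) &= \int_{-\infty}^{\infty} x \cdot f(x)\,dx = \int_{0}^{10} x \cdot \frac{x}{50}\,dx \\
&= \int_{0}^{10} \frac{x^2}{50}\,dx = \left. \frac{x^3}{150} \right|_{0}^{10} = \frac{10^3}{150} = \frac{20}{3} \\
\mathrm{Var}(X) &= \int_{-\infty}^{\infty} (x - \mathbb{E}(X))^2 \cdot f(x)\,dx = \int_{0}^{10} \left( x - \frac{20}{3} \right)^2 \cdot \frac{x}{50}\,dx \\
&= \int_{0}^{10} \left( x^2 - \frac{40}{3}x + \frac{400}{9} \right) \cdot \frac{x}{50}\,dx = \frac{1}{50} \int_{0}^{10} \left( x^3 - \frac{40}{3}x^2 + \frac{400}{9}x \right) dx \\
&= \frac{1}{50} \left[ \frac{x^4}{4} - \frac{40x^3}{9} + \frac{200x^2}{9} \right]_{0}^{10} = \frac{1}{50} \left( \frac{10^4}{4} - \frac{40 \cdot 10^3}{9} + \frac{200 \cdot 10^2}{9} \right) \\
&= \frac{10^4}{50} \left( \frac{1}{4} - \frac{4}{9} + \frac{2}{9} \right) = \frac{50}{9}
\end{align*}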
Example 4.6.6
Example 4.6.7
Calculate the variance (two ways) and standard deviation of a dice roll.
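\[ \mathbb{E}(X) = \sum_{x=1}^{6} x \cdot \Pr(X = x) = \frac{1}{6} \sum_{x=1}^{6} x = \frac{1}{6} \cdot \frac{6 \cdot 7}{2} = \frac{7}{2} \]
\begin{align*}
\mathrm{Var}(X) &= \sum_{x} (x - \mathbb{E}(X))^2 \cdot \Pr(X = x) = \sum_{x=1}^{6} \left( x - \frac{7}{2} \right)^2 \cdot \frac{1}{6} \\
&= \sum_{x=1}^{6} \frac{1}{6} \left( x^2 - 7x + \frac{49}{4} \right) = \frac{1}{6} \sum_{x=1}^{6} x^2 - \frac{7}{6} \sum_{x=1}^{6} x + \frac{1}{6} \sum_{x=1}^{6} \frac{49}{4} \\
&= \frac{1}{6} \cdot \frac{6 \cdot 7 \cdot 13}{6} - \frac{7}{6} \cdot \frac{7 \cdot 6}{2} + \frac{49}{4} \\
&= \frac{35}{12}
\end{align*}
\begin{align*}
\mathrm{Var}(X) &= \mathbb{E}(X^2) - [\mathbb{E}(X)]^2 = \sum_{x=1}^{6} x^2 \cdot \Pr(X = x) - \left( \frac{7}{2} \right)^2 \\
&= \frac{1}{6} \sum_{x=1}^{6} x^2 - \frac{49}{4} = \frac{1}{6} \cdot \frac{6 \cdot 7 \cdot 13}{6} - \frac{49}{4} = \frac{35}{12}
\end{align*}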
(Computing the variance two different ways is not usually necessary, but it can be a
good way to double-check your work.)
Using Definition 4.6.1, $\sigma(X) = \sqrt{\mathrm{Var}(X)} = \sqrt{\frac{35}{12}} \approx 1.7$
Example 4.6.7
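The two computations in this example can be double-checked with a few lines of code. The following is only a sketch: the helper name `expectation` is made up for illustration, and exact fractions are used so that both methods can be compared without rounding.

```python
from fractions import Fraction
from math import sqrt

# PMF of a fair six-sided die: each face 1..6 has probability 1/6.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

def expectation(pmf, g=lambda x: x):
    """E(g(X)) for a discrete random variable with the given PMF (hypothetical helper)."""
    return sum(g(x) * p for x, p in pmf.items())

mean = expectation(pmf)

# Method 1: the definition, E((X - E(X))^2).
var_definition = expectation(pmf, lambda x: (x - mean) ** 2)

# Method 2: the shortcut, E(X^2) - [E(X)]^2.
var_shortcut = expectation(pmf, lambda x: x ** 2) - mean ** 2

std_dev = sqrt(var_definition)
print(mean, var_definition, var_shortcut, round(std_dev, 4))
```

Both methods give the same fraction, which is the kind of double-check the example recommends.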
Example 4.6.8
Calculate the variance and standard deviation of W. For practice, use both methods dis-
cussed in this section for computing variance.
Solution.
We use the variance to calculate the standard deviation; we use the expected value to
calculate the variance; we use the probability density function (PDF) to calculate the expected value;
and we use the cumulative distribution function (CDF) to find the probability density function
(PDF). Working backwards, this gives us a plan for performing the necessary calculations.
Step 1 Definition 4.4.1 tells us the probability density function (PDF) is the derivative of
the cumulative distribution function (CDF).
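\[ F(x) = \begin{cases} 0 & x < 0 \\ e^x - 1 & 0 \le x \le \ln 2 \\ 1 & x > \ln 2 \end{cases} \]
\[ f(x) = \begin{cases} 0 & x < 0 \\ e^x & 0 < x < \ln 2 \\ 0 & x > \ln 2 \end{cases} = \begin{cases} e^x & 0 < x < \ln 2 \\ 0 & \text{else} \end{cases} \]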
\begin{align*}
&= (2 \ln^2 2 - 4 \ln 2 + 2) - (2 \ln 2 - 1)^2 \\
&= 1 - 2 \ln^2 2 \approx 0.039
\end{align*}
Example 4.6.8
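As a sanity check on this example, the integrals can be approximated numerically. The sketch below assumes that W has the PDF found in Step 1, f(x) = e^x on (0, ln 2) and 0 elsewhere; the function `midpoint_integral` is a hypothetical helper written for this illustration, not something from the text.

```python
from math import exp, log

def midpoint_integral(g, a, b, steps=100_000):
    """Approximate the integral of g over [a, b] with the midpoint rule."""
    h = (b - a) / steps
    return h * sum(g(a + (i + 0.5) * h) for i in range(steps))

upper = log(2)
pdf = exp  # assumed PDF: f(x) = e^x on (0, ln 2), and 0 elsewhere

total_probability = midpoint_integral(pdf, 0, upper)
mean = midpoint_integral(lambda x: x * pdf(x), 0, upper)
second_moment = midpoint_integral(lambda x: x * x * pdf(x), 0, upper)

variance = second_moment - mean ** 2      # E(W^2) - [E(W)]^2
closed_form = 1 - 2 * log(2) ** 2         # the value obtained in the example
print(total_probability, mean, variance, closed_form)
```

The numerical variance should agree with 1 − 2 ln² 2 to many decimal places, and the total probability should come out very close to 1.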
Let $a, b$ be real numbers with $a < b$ and suppose a random variable $X$ takes values
from the interval $[a, b]$. Then
\[ 0 \le \sigma(X) \le \frac{b - a}{2} \]
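Proof. First, consider what happens when we replace $\mathbb{E}(X)$ with $\frac{b+a}{2}$ (the midpoint of the
sample space) in the definition of variance (Definition 4.6.1).
\begin{align*}
\mathbb{E}\left( \left( X - \frac{b+a}{2} \right)^2 \right) &= \int_{a}^{b} \left( x - \frac{b+a}{2} \right)^2 \cdot f(x)\,dx \\
&= \int_{a}^{b} \left( x^2 - (b+a)x + \left( \frac{b+a}{2} \right)^2 \right) \cdot f(x)\,dx \\
&= \int_{a}^{b} x^2 f(x)\,dx - (b+a) \int_{a}^{b} x f(x)\,dx + \left( \frac{b+a}{2} \right)^2 \int_{a}^{b} f(x)\,dx \\
&= \mathbb{E}(X^2) - (b+a)\mathbb{E}(X) + \left( \frac{b+a}{2} \right)^2 \\
&= \mathbb{E}(X^2) - [\mathbb{E}(X)]^2 + [\mathbb{E}(X)]^2 - (b+a)\mathbb{E}(X) + \left( \frac{b+a}{2} \right)^2 \\
&= \mathbb{E}(X^2) - [\mathbb{E}(X)]^2 + \left( \mathbb{E}(X) - \frac{b+a}{2} \right)^2 \\
&\ge \mathbb{E}(X^2) - [\mathbb{E}(X)]^2 = \mathrm{Var}(X) \tag{*}
\end{align*}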
Since X takes values in the interval [ a, b]:
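\begin{align*}
& a \le X \le b \\
\implies\ & a - \frac{b+a}{2} \le X - \frac{b+a}{2} \le b - \frac{b+a}{2} \\
\implies\ & -\frac{b-a}{2} \le X - \frac{b+a}{2} \le \frac{b-a}{2} \\
\implies\ & 0 \le \left( X - \frac{b+a}{2} \right)^2 \le \left( \frac{b-a}{2} \right)^2
\end{align*}
By Theorem 4.5.7,
\[ 0 \le \mathbb{E}\left( \left( X - \frac{b+a}{2} \right)^2 \right) \le \left( \frac{b-a}{2} \right)^2 \]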
Example 4.6.10
If the random variable $X$ takes on values from the interval $[1, 5]$, then $0 \le \sigma(X) \le 2$. Since
$\sigma(X) = \sqrt{\mathrm{Var}(X)}$, then $0 \le \mathrm{Var}(X) \le 4$.
Example 4.6.10
Chapter 4 contains content adapted by Bruno Belevan, Parham Hamidi, and Elyse
Yeager from Sections 1.1, 3.1, Ch 4 introduction, 4.1, and 4.2 of Introductory Statistics by
Ilowsky and Dean under a Creative Commons Attribution License v4.0.
Chapter 5
You have probably learned about Taylor polynomials$^{1}$ and, in particular, that
\[ e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots + \frac{x^n}{n!} + E_n(x) \]
where $E_n(x)$ is the error introduced when you approximate $e^x$ by its Taylor polynomial of
degree $n$. You may have even seen a formula for $E_n(x)$. We are now going to ask what
happens as $n$ goes to infinity. Does the error go to zero, giving an exact formula for $e^x$? We
shall later see that it does and that
\[ e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots = \sum_{n=0}^{\infty} \frac{x^n}{n!} \]
At this point we haven’t defined, or developed any understanding of, this infinite sum.
How do we compute the sum of an infinite number of terms? Indeed, when does a sum
of an infinite number of terms even make sense? Clearly we need to build up foundations
to deal with these ideas. Along the way we shall also see other functions for which the
corresponding error obeys $\lim_{n \to \infty} E_n(x) = 0$ for some values of $x$ and not for other values of
$x$.
To motivate the next section, consider using the above formula with x = 1 to compute
the number e:
\[ e = 1 + 1 + \frac{1}{2!} + \frac{1}{3!} + \cdots = \sum_{n=0}^{\infty} \frac{1}{n!} \]
As we stated above, we don’t yet understand what to make of this infinite number of
terms, but we might try to sneak up on it by thinking about what happens as we take
1 Now would be an excellent time to quickly read over your notes on the topic.
S EQUENCE AND S ERIES 5.1 S EQUENCES
1 term     1 = 1
2 terms    1 + 1 = 2
3 terms    1 + 1 + 1/2 = 2.5
4 terms    1 + 1 + 1/2 + 1/6 = 2.666666...
5 terms    1 + 1 + 1/2 + 1/6 + 1/24 = 2.708333...
6 terms    1 + 1 + 1/2 + 1/6 + 1/24 + 1/120 = 2.716666...
By looking at the infinite sum in this way, we naturally obtain a sequence of numbers.
The key to understanding the original infinite sum is to understand the behaviour of this
sequence of numbers. In particular, what do the numbers do as we go further and
further? Do they settle down$^{2}$ to a given limit?
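The sequence of sums tabulated above is easy to generate by machine. The following sketch (the function name `partial_sums` is ours, chosen for illustration) lists the first few sums of 1/0! + 1/1! + 1/2! + ··· alongside their distance from the floating-point value of e.

```python
from math import e, factorial

def partial_sums(count):
    """Return the list [s_1, ..., s_count], where s_k adds the first k terms 1/n!."""
    sums = []
    total = 0.0
    for n in range(count):
        total += 1 / factorial(n)
        sums.append(total)
    return sums

sums = partial_sums(10)
for k, s in enumerate(sums, start=1):
    print(f"{k:2d} terms: {s:.6f}   distance from e: {e - s:.2e}")
```

The printed distances shrink quickly, which is numerical evidence (not a proof) that the sequence settles down to e.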
5.1 Sequences
In the discussion above we used the term “sequence” without giving it a precise mathe-
matical meaning. Let us rectify this now.
Definition 5.1.1.
where f (n) is some function from the natural numbers to the real numbers.
2 You will notice a great deal of similarity between the results of the next section and “limits at infinity”
which was covered last term.
3 For the more pedantic reader, here we mean a list of countably infinitely many numbers. The interested
(pedantic or otherwise) reader should look up countable and uncountable sets.
Example 5.1.2
It is not necessary that there be a simple explicit formula for the nth term of a sequence.
For example, the decimal digits of π form a perfectly good sequence:
3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9, 3, 2, 3, 8, 4, 6, 2, 6, 4, 3, 3, 8, ···
Definition 5.1.3.
A sequence $\{a_n\}_{n=1}^{\infty}$ is said to converge to the limit $A$ if $a_n$ approaches $A$ as $n$
tends to infinity. If so, we write
\[ \lim_{n \to \infty} a_n = A \quad \text{or} \quad a_n \to A \text{ as } n \to \infty \]
The reader should immediately recognise the similarity with limits at infinity
\[ \lim_{x \to \infty} f(x) = L \quad \text{if} \quad f(x) \to L \text{ as } x \to \infty \]
Example 5.1.4
4 There is, however, a remarkable result due to Bailey, Borwein and Plouffe that can be used to compute
the nth binary digit of π (i.e. writing π in base 2 rather than base 10) without having to work out the
preceding digits.
\[ \lim_{n \to \infty} \frac{1}{n} = 0 \]
Example 5.1.4
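Example 5.1.5 ($\lim_{n \to \infty} \frac{n}{2n+1}$)
Here is a little less trivial example. To study the behaviour of $\frac{n}{2n+1}$ as $n \to \infty$, it is a good
idea to write it as
\[ \frac{n}{2n+1} = \frac{1}{2 + \frac{1}{n}} \]
As $n \to \infty$, the $\frac{1}{n}$ in the denominator tends to zero, so that the denominator $2 + \frac{1}{n}$ tends to
2 and $\frac{1}{2 + \frac{1}{n}}$ tends to $\frac{1}{2}$. So
\[ \lim_{n \to \infty} \frac{n}{2n+1} = \lim_{n \to \infty} \frac{1}{2 + \frac{1}{n}} = \frac{1}{2} \]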
Example 5.1.5
Notice that in this last example, we are really using techniques that we used before to
study limits at infinity like $\lim_{x \to \infty} f(x)$. This experience can be easily transferred to dealing
with $\lim_{n \to \infty} a_n$ limits by using the following result.
Theorem 5.1.6.
If
\[ \lim_{x \to \infty} f(x) = L \]
and if $a_n = f(n)$ for all positive integers $n$, then
\[ \lim_{n \to \infty} a_n = L \]
5 If the digits of π were to converge, then π would have to be a rational number. The irrationality of π
(that it cannot be written as a fraction) was first proved by Lambert in 1761. Niven’s 1947 proof is more
accessible and we invite the interested reader to use their favourite search engine to find step–by–step
guides to that proof.
Example 5.1.7 ($\lim_{n \to \infty} e^{-n}$)
Example 5.1.7
The bulk of the rules for the arithmetic of limits of functions that you already know also
apply to the limits of sequences. That is, the rules you learned to work with limits such as
$\lim_{x \to \infty} f(x)$ also apply to limits like $\lim_{n \to \infty} a_n$.
\[ \lim_{n \to \infty} a_n = A \qquad \lim_{n \to \infty} b_n = B \]
(d) $\lim_{n \to \infty} a_n b_n = A B$
(The limit of the product is the product of the limits.)
(e) If $B \ne 0$ then $\lim_{n \to \infty} \frac{a_n}{b_n} = \frac{A}{B}$
(The limit of the quotient is the quotient of the limits provided the limit of the
denominator is not zero.)
We use these rules to evaluate limits of more complicated sequences in terms of the
limits of simpler sequences — just as we did for limits of functions.
Example 5.1.9
Example 5.1.9
\[ \lim_{n \to \infty} a_n = \lim_{n \to \infty} b_n = L \]
then
\[ \lim_{n \to \infty} c_n = L \]
Example 5.1.11
\[ \pi_1 = 3 \quad \pi_2 = 1 \quad \pi_3 = 4 \quad \pi_4 = 1 \quad \pi_5 = 5 \quad \pi_6 = 9 \quad \cdots \]
Example 5.1.11
Finally, recall that we can compute the limit of the composition of two functions using
continuity. In the same way, we have the following result:
\[ \lim_{n \to \infty} g(a_n) = g(L) \]
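Example 5.1.13 ($\lim_{n \to \infty} \sin\frac{\pi n}{2n+1}$)
Write $\sin\left( \frac{\pi n}{2n+1} \right) = g\left( \frac{n}{2n+1} \right)$ with $g(x) = \sin(\pi x)$. We saw, in Example 5.1.5, that
\[ \lim_{n \to \infty} \frac{n}{2n+1} = \frac{1}{2} \]
Since $g(x)$ is continuous at $x = \frac{1}{2}$,
\[ \lim_{n \to \infty} \sin\frac{\pi n}{2n+1} = \lim_{n \to \infty} g\left( \frac{n}{2n+1} \right) = g\left( \frac{1}{2} \right) = \sin\frac{\pi}{2} = 1 \]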
Example 5.1.13
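A numerical look at this limit can make the composition argument more concrete. The sketch below uses the names `a` and `g` to mirror the notation of the example; it only evaluates the sequence at a few large values of n and is no substitute for the continuity argument.

```python
from math import pi, sin

def a(n):
    """The inner sequence a_n = n / (2n + 1), which tends to 1/2."""
    return n / (2 * n + 1)

def g(x):
    """The continuous outer function g(x) = sin(pi * x)."""
    return sin(pi * x)

for n in (1, 10, 100, 10_000, 1_000_000):
    print(f"n = {n:>9}: a_n = {a(n):.8f}, g(a_n) = {g(a(n)):.12f}")
```

As n grows, the middle column approaches 1/2 and the last column approaches g(1/2) = 1.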
between two notes relies on the ratio of their two frequencies, which is why we use a ratio
and not a difference when measuring intervals.
Example 5.1.14
Consider the three pairs of notes below. Which pairs will sound roughly the same distance
from each other, and which will sound different?
Solution. To quantify how far apart two notes sound, we take the ratio of their frequencies.
1. 110 Hz and 193.25 Hz have a ratio of $\frac{193.25}{110} \approx 1.75682$
2. 440 Hz and 523.25 Hz have a ratio of $\frac{523.25}{440} \approx 1.18920$
3. 587.33 Hz and 698.46 Hz have a ratio of $\frac{698.46}{587.33} \approx 1.18921$
The last two pairs of notes sound about the same distance away from one another, because
their ratios are nearly identical. The first pair of notes will sound farther apart from one
another than the other pairs.
Incidentally, the interval spanned in 2 and 3 has a name: a minor third. For listeners in
the Western tradition, the sound of two notes of such an interval being played together is
often evocative of a melancholy or enigmatic mood.
Example 5.1.14
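The three ratios in this example can be reproduced directly. This is a minimal sketch; the variable names are ours, and the frequencies are the ones given in the example.

```python
# Each pair is (lower frequency, higher frequency) in Hz, as listed in the example.
pairs = [(110.0, 193.25), (440.0, 523.25), (587.33, 698.46)]

# The perceived distance between two notes depends on the ratio of their frequencies.
ratios = [high / low for low, high in pairs]

for (low, high), ratio in zip(pairs, ratios):
    print(f"{low} Hz and {high} Hz have a ratio of {ratio:.5f}")
```

Comparing the last two printed ratios shows how close the second and third intervals are, while the first is clearly larger.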
A scale is a collection of notes. There are many different scales that are used, and
many more that are theoretically possible. Scales in context usually refer to the collection
of notes that make up most of a single piece of music. So, one song might mainly consist of
notes from a scale named “B Minor,” and another song might mainly consist of notes from
a scale named “G major pentatonic.” Generally speaking7 , standardized scales consist of
notes that people have decided they like hearing played together.
Example 5.1.15
The interval between some frequency a and the frequency 2a is called an octave. Some
popular musical scales divide the octave into twelve intervals. (In the partial piano schematic
below, the key labelled 13 would produce a note with twice the frequency of the key la-
belled 1.)
7 Precision in describing the things that people do is much harder to attain than precision in mathematics.
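[Figure: partial piano keyboard with keys labelled 1 through 13. Keys 2, 5, 7, 10, and 12 are in the upper row; keys 1, 3, 4, 6, 8, 9, 11, and 13 are in the lower row.]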
We call a scale “even-tempered” if consecutive notes always sound like they’re the
same distance apart from one another. Since the sound of notes in relation to each other
is determined by the ratio of their frequencies, this means that the ratio of the
frequencies of two consecutive notes is the same, no matter which two consecutive notes
we’re considering.
Suppose the key labelled 1 makes the note 440 Hz, and the key labelled 13 makes the
note 880 Hz (one octave above 440). If the piano is tuned to an even-tempered scale, what
are the frequencies associated with the keys labelled 2 through 12?
Solution.
Let the notes on the piano form the first part$^{8}$ of a sequence, with key 1 making note
$a_1$, key 2 making note $a_2$, and so on. We know three pieces of information:
1. $a_1 = 440$
2. $a_{13} = 880$
3. $\frac{a_2}{a_1} = \frac{a_3}{a_2} = \frac{a_4}{a_3} = \cdots = \frac{a_{13}}{a_{12}}$
(3 comes from the description of even-tempering.) Let’s give the number $\frac{a_2}{a_1}$ the name $r$
(because it’s a ratio). This gives us a recurrence relation to describe our partial sequence:
since $\frac{a_{n+1}}{a_n} = r$, then $a_{n+1} = r a_n$. We can now write out each element of the partial sequence
in terms of $r$.
\begin{align*}
a_1 &= 440 \\
a_2 &= 440r \\
a_3 &= (440r)r = 440r^2 \\
a_4 &= (440r^2)r = 440r^3 \\
&\ \ \vdots \\
a_{12} &= 440r^{11} \\
a_{13} &= 440r^{12}
\end{align*}
Since we’re given $a_{13} = 880$, we can solve for $r$.
\begin{align*}
880 &= 440r^{12} \\
r &= 2^{1/12}
\end{align*}
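With r in hand, the frequencies of keys 1 through 13 follow from $a_n = 440 r^{n-1}$. The short sketch below tabulates them; the variable names are ours, and the printed values are rounded to two decimal places.

```python
# Common ratio of the even-tempered scale: twelve equal steps make one octave.
r = 2 ** (1 / 12)

# a_n = 440 * r**(n - 1) for keys n = 1, ..., 13.
frequencies = [440 * r ** (n - 1) for n in range(1, 14)]

for key, freq in enumerate(frequencies, start=1):
    print(f"key {key:2d}: {freq:7.2f} Hz")
```

Key 13 comes out at 880 Hz (up to floating-point rounding), confirming that twelve steps of ratio r span exactly one octave.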
8 we defined sequences to be infinite, but pianos have only finitely many keys
Example 5.1.15
When we say “the interval between consecutive notes is the same,” we mean “the ratio
between consecutive notes is the same.” Having a common ratio between consecutive
terms is the defining characteristic of a geometric sequence.
Definition5.1.16.
(If we were to “add up” the terms of a geometric sequence, we’d get a geometric series;
see Example 5.2.4.)
When a tone is made by a vibrating physical9 object, although we may primarily pick
up on one frequency (the “fundamental”), usually waves of many different frequencies
are being generated. If we make a tone by causing a string to vibrate, as on a violin or
guitar, the waves that make noise have frequencies that are whole-number multiples of
the fundamental frequency. To explain this behaviour, note that the ends of the string are
fixed, so they can’t move up and down. So, the only waves that can occur on the string
are waves that keep these points fixed. The fundamental is the longest wave. The other
waves that are generated are called harmonics. The nth harmonic has frequency n times
the fundamental.
9 For the following “physical” discussion, we’re relying on a very simplified model. However, the results
are indeed relevant to how actual musical instruments sound.
In the figure above, a string10 is fixed between two dots. We imagine it vibrating up
and down in a wave pattern, moving between the positions shown by the dashed and
dotted lines. The wavelength of these waves is inversely proportional to the frequency
they generate – so dividing a wavelength by (say) three causes the frequency to triple.
Example 5.1.17
A string, when played, has a fundamental tone of 100 Hz, with a wavelength of 1 m. Let
$\{f_n\}$ be the sequence of frequencies of the harmonics of the string, organized by increasing
pitch (with $f_1 = 100$). Let $\{\ell_n\}$ be the sequence of corresponding wavelengths (so $\ell_1 = 1$).
What are $\{\ell_n\}$ and $\{f_n\}$?
Solution. The frequencies of harmonic tones are integer multiples of the fundamental, so
\[ f_1 = 100, \quad f_2 = 200, \quad f_3 = 300, \quad \ldots, \quad f_n = 100n \]
The wavelengths are inversely proportional to the frequencies. So, if frequency $f_n$ is $f_1 \cdot n$,
then wavelength $\ell_n$ is $\frac{\ell_1}{n}$.
\[ \ell_1 = 1, \quad \ell_2 = \frac{1}{2}, \quad \ell_3 = \frac{1}{3}, \quad \ldots, \quad \ell_n = \frac{1}{n} \]
Example 5.1.17
The sequence $\left\{ \frac{1}{n} \right\}_{n=1}^{\infty}$ is called the harmonic sequence. (We’ll consider the harmonic
series in Example 5.3.4.) In music textbooks, you might see the sequence of harmonic
notes referred to as the “harmonic series.” This isn’t because the notes are added together;
it’s simply a different use of the word “series.”
Example 5.1.18
Consider an even-tempered musical scale with twelve intervals in each octave, the lowest
note of which is 250 Hz.
Suppose we have a string whose fundamental tone is 250 Hz. Which harmonics of the
string are also notes of the even-tempered scale?
Solution.
The even-tempered musical scale is given by the geometric sequence $\{e_n = a \cdot r^n\}_{n=0}^{\infty}$
where $a = 250$ and $r = 2^{1/12}$. The harmonic sequence of the string is $\{h_n = 250n\}_{n=1}^{\infty}$.
All frequencies in the harmonic sequence are integer multiples of 250, and so are whole
numbers. The only whole numbers in the geometric sequence $e_n$ occur when 2 is raised
to a whole-number power, i.e. when $n$ is a multiple of 12. So our only candidates for
frequencies that appear in both sequences have the form $250 \cdot 2^k$. It’s quick to see that
these occur in both: $250 \cdot 2^k = e_n$ when $n = 12k$, and $250 \cdot 2^k = h_n$ when $n = 2^k$.
So, the only intervals from the even-tempered scale that perfectly line up with the nat-
ural harmonics of the string are octaves: the fundamental, twice the fundamental, twice
that frequency, etc.
10 Similar wave behaviour occurs in tubes of air, like you might find in a brass instrument or woodwind.
Brass players can emphasize different harmonic notes by changing they way they blow into their in-
strument.
S EQUENCE AND S ERIES 5.2 S ERIES
Example 5.1.18
Harmonics are produced naturally, so it’s nice if they’re “in tune” with the scale notes.
The dearth of overlap between harmonic and geometric sequences is one reason that even-
tempered scales are sometimes unpopular. However, many harmonic notes are approximated
by the even-tempered scale above. For example, $2^{19/12} \approx 2.9966 \approx 3$, so $e_{19}$ is a fair
approximation to $h_3$.
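To see how well other harmonics are approximated, one can find, for each harmonic, the nearest note of the even-tempered scale. The sketch below assumes the scale of Example 5.1.18 (fundamental 250 Hz, ratio $2^{1/12}$); the function `nearest_scale_step` is a hypothetical helper, and "nearest" is measured on the logarithmic (ratio) scale.

```python
from math import log2

FUNDAMENTAL = 250.0  # Hz, the lowest note of the scale and of the string

def nearest_scale_step(harmonic_number):
    """Return (k, ratio): the scale note FUNDAMENTAL * 2**(k/12) closest, in ratio
    terms, to the harmonic FUNDAMENTAL * harmonic_number."""
    k = round(12 * log2(harmonic_number))
    return k, 2 ** (k / 12)

for n in range(1, 9):
    k, ratio = nearest_scale_step(n)
    # A harmonic is matched exactly only when it is a whole number of octaves up.
    is_octave = (k % 12 == 0) and (2 ** (k // 12) == n)
    tag = "exact" if is_octave else "approximate"
    print(f"harmonic {n}: {FUNDAMENTAL * n:7.1f} Hz, "
          f"nearest scale note (step {k}): {FUNDAMENTAL * ratio:8.2f} Hz ({tag})")
```

Only the harmonics 1, 2, 4, 8 are tagged exact, in line with the conclusion that the two sequences overlap only at octaves.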
Example 5.1.19
Suppose we were to make a scale that consisted only of harmonics. The frequencies would
make up the sequence $\{h_n = an\}_{n=1}^{\infty}$, where $a$ is the fundamental.
How would such a scale sound if we played the notes one after the other? Remember,
the way two notes sound depends on the ratio of their frequencies. A bigger ratio sounds
like a bigger “step” from one note to the next. So, let’s define a sequence $\{r_n\}_{n=2}^{\infty}$ to be the
ratio of the nth harmonic to the note before it. A value of $r_n$ that is close to 1 means the
two notes sound nearly the same. A value of $r_n$ that is far from 1 means the two notes sound
different.
frequency:   a    2a    3a    4a    5a    6a    7a    ···    na
ratio:            2     3/2   4/3   5/4   6/5   7/6   ···    n/(n−1)
The sequences $h_n$ and $r_n$ have different limits, each with a musical interpretation.
• $\lim_{n \to \infty} h_n = \infty$ tells us that the notes of this sequence have no upper bound. We can
find notes as high as we please in this scale.
• $\lim_{n \to \infty} r_n = 1$ tells us that notes of the scale sound more and more alike as we go higher.
The picture painted by these two limits is that the scale climbs higher and higher,
but does so in tiny increments, so that many different high-pitched notes are virtually
indistinguishable from one another. (On the other hand, the first step is huge: an entire
octave!)
Example 5.1.19
With this introduction to sequences and some tools to determine their limits, we can
now return to the problem of understanding infinite sums.
5.2 Series
A series is a sum
\[ a_1 + a_2 + a_3 + \cdots + a_n + \cdots \]
You already have a lot of experience with series, though you might not realise it. When
you write a number using its decimal expansion you are really expressing it as a series.
Perhaps the simplest example of this is the decimal expansion of $\frac{1}{3}$:
\[ \frac{1}{3} = 0.3333 \cdots \]
Recall that the expansion written in this way actually means
\[ 0.333333 \cdots = \frac{3}{10} + \frac{3}{100} + \frac{3}{1000} + \frac{3}{10000} + \cdots = \sum_{n=1}^{\infty} \frac{3}{10^n} \]
The summation index n is of course a dummy index. You can use any symbol you like
(within reason) for the summation index.
\[ \sum_{n=1}^{\infty} \frac{3}{10^n} = \sum_{i=1}^{\infty} \frac{3}{10^i} = \sum_{j=1}^{\infty} \frac{3}{10^j} = \sum_{\ell=1}^{\infty} \frac{3}{10^\ell} \]
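A series can be expressed using summation notation in many different ways. For example
the following expressions all represent the same series:
\begin{align*}
\sum_{n=1}^{\infty} \frac{3}{10^n} &= \overbrace{\frac{3}{10}}^{n=1} + \overbrace{\frac{3}{100}}^{n=2} + \overbrace{\frac{3}{1000}}^{n=3} + \cdots \\
\sum_{j=2}^{\infty} \frac{3}{10^{j-1}} &= \overbrace{\frac{3}{10}}^{j=2} + \overbrace{\frac{3}{100}}^{j=3} + \overbrace{\frac{3}{1000}}^{j=4} + \cdots \\
\sum_{\ell=0}^{\infty} \frac{3}{10^{\ell+1}} &= \overbrace{\frac{3}{10}}^{\ell=0} + \overbrace{\frac{3}{100}}^{\ell=1} + \overbrace{\frac{3}{1000}}^{\ell=2} + \cdots \\
\frac{3}{10} + \sum_{n=2}^{\infty} \frac{3}{10^n} &= \frac{3}{10} + \overbrace{\frac{3}{100}}^{n=2} + \overbrace{\frac{3}{1000}}^{n=3} + \cdots
\end{align*}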
We can get from the first line to the second line by substituting $n = j - 1$. Don’t forget to
also change the limits of summation (so that $n = 1$ becomes $j - 1 = 1$ which is rewritten
as $j = 2$). To get from the first line to the third line, substitute $n = \ell + 1$ everywhere,
including in the limits of summation (so that $n = 1$ becomes $\ell + 1 = 1$ which is rewritten
as $\ell = 0$).
Whenever you are in doubt as to what series a summation notation expression repre-
sents, it is a good habit to write out the first few terms, just as we did above.
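Writing out terms can also be done by machine. The sketch below compares the index choices discussed above on the first N = 8 terms, using exact fractions; the names `by_n`, `by_j`, `by_l` and `split` are ours and correspond to indexing by n from 1, by j from 2, by ℓ from 0, and to peeling off the first term.

```python
from fractions import Fraction

N = 8  # number of terms to compare (an arbitrary choice for illustration)

# Index n running from 1 to N.
by_n = sum(Fraction(3, 10 ** n) for n in range(1, N + 1))

# Substituting n = j - 1: index j runs from 2 to N + 1.
by_j = sum(Fraction(3, 10 ** (j - 1)) for j in range(2, N + 2))

# Substituting n = l + 1: index l runs from 0 to N - 1.
by_l = sum(Fraction(3, 10 ** (l + 1)) for l in range(0, N))

# Peeling off the first term and summing the rest from n = 2.
split = Fraction(3, 10) + sum(Fraction(3, 10 ** n) for n in range(2, N + 1))

print(by_n == by_j == by_l == split, float(by_n))
```

All four expressions add up the same eight terms, so they agree exactly.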
Of course, at this point, it is not clear whether the sum of infinitely many terms adds up
to a finite number or not. In order to make sense of this we will recast the problem in terms
of the convergence of sequences (hence the discussion of the previous section). Before we
proceed more formally let us illustrate the basic idea with a few simple examples.
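Example 5.2.1 ($\sum_{n=1}^{\infty} \frac{3}{10^n}$)
As we have just seen above the series $\sum_{n=1}^{\infty} \frac{3}{10^n}$ is
\[ \sum_{n=1}^{\infty} \frac{3}{10^n} = \overbrace{\frac{3}{10}}^{n=1} + \overbrace{\frac{3}{100}}^{n=2} + \overbrace{\frac{3}{1000}}^{n=3} + \cdots \]
It sure looks like, as we add more and more terms, we get closer and closer to $0.\dot{3} = \frac{1}{3}$.
So it is very reasonable$^{11}$ to define $\sum_{n=1}^{\infty} \frac{3}{10^n}$ to be $\frac{1}{3}$.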
Example 5.2.1
Example 5.2.2 ($\sum_{n=1}^{\infty} 1$ and $\sum_{n=1}^{\infty} (-1)^n$)
Every term in the series $\sum_{n=1}^{\infty} 1$ is exactly 1. So the sum of the first $N$ terms is exactly $N$.
As we add more and more terms this grows unboundedly. So it is very reasonable to say
that the series $\sum_{n=1}^{\infty} 1$ diverges.
The series
\[ \sum_{n=1}^{\infty} (-1)^n = \overbrace{(-1)}^{n=1} + \overbrace{1}^{n=2} + \overbrace{(-1)}^{n=3} + \overbrace{1}^{n=4} + \overbrace{(-1)}^{n=5} + \cdots \]
So the sum of the first $N$ terms is 0 if $N$ is even and $-1$ if $N$ is odd. As we add more and
more terms from the series, the sum alternates between 0 and $-1$ for ever and ever. So the
11 Of course we are free to define the series to be whatever we want. The hard part is defining it to be
something that makes sense and doesn’t lead to contradictions. We’ll get to a more systematic definition
shortly.
In the above examples we have tried to understand the series by examining the sum
of the first few terms and then extrapolating as we add in more and more terms. That is,
we tried to sneak up on the infinite sum by looking at the limit of (partial) sums of the
first few terms. This approach can be made into a more formal, rigorous definition. More
precisely, to define what is meant by the infinite sum $\sum_{n=1}^{\infty} a_n$, we approximate it by the
sum of its first $N$ terms and then take the limit as $N$ tends to infinity.
Definition 5.2.3.
The $N$th partial sum of the series $\sum_{n=1}^{\infty} a_n$ is the sum of its first $N$ terms
\[ S_N = \sum_{n=1}^{N} a_n. \]
The partial sums form a sequence $\{S_N\}_{N=1}^{\infty}$. If this sequence of partial sums
converges $S_N \to S$ as $N \to \infty$ then we say that the series $\sum_{n=1}^{\infty} a_n$ converges to $S$
and we write
\[ \sum_{n=1}^{\infty} a_n = S \]
If the sequence of partial sums diverges, we say that the series diverges.
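The definition can be explored numerically by listing partial sums. The sketch below (with a helper `partial_sums` of our own naming) does this for the three series from Examples 5.2.1 and 5.2.2. Listing finitely many partial sums can only suggest convergence or divergence; it does not prove either.

```python
from fractions import Fraction
from itertools import accumulate

def partial_sums(term, how_many):
    """Return [S_1, ..., S_how_many] for the series whose nth term is term(n), n >= 1."""
    return list(accumulate(term(n) for n in range(1, how_many + 1)))

decimal_thirds = partial_sums(lambda n: Fraction(3, 10 ** n), 6)  # 3/10 + 3/100 + ...
all_ones = partial_sums(lambda n: 1, 6)                           # 1 + 1 + 1 + ...
alternating = partial_sums(lambda n: (-1) ** n, 6)                # -1 + 1 - 1 + ...

print([float(s) for s in decimal_thirds])
print(all_ones)
print(alternating)
```

The first list creeps up towards 1/3, the second grows without bound, and the third keeps alternating between −1 and 0, matching the behaviour described in the examples.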
Let $a$ and $r$ be any two fixed real numbers with $a \ne 0$. The series
\[ a + ar + ar^2 + \cdots + ar^n + \cdots = \sum_{n=0}^{\infty} ar^n \]
12 It is actually quite common in computer science to think of 0 as the first integer. In that context, the set
of natural numbers is defined to contain 0:
\[ \mathbb{N} = \{0, 1, 2, \ldots\} \]
$ar^{n-1}\big|_{n=2} = ar^{2-1} = ar$, and so on$^{13}$. Regardless of how we write the geometric series, $a$ is
The secret to evaluating this sum is to see what happens when we multiply it by $r$:
\begin{align*}
rS_N &= r\left( a + ar + ar^2 + \cdots + ar^N \right) \\
&= ar + ar^2 + ar^3 + \cdots + ar^{N+1}
\end{align*}
Notice that this is almost the same$^{14}$ as $S_N$. The only differences are that the first term, $a$,
is missing and one additional term, $ar^{N+1}$, has been tacked on the end. So
\begin{align*}
S_N &= a + ar + ar^2 + \cdots + ar^N \\
rS_N &= \phantom{a + {}} ar + ar^2 + \cdots + ar^N + ar^{N+1}
\end{align*}
Hence taking the difference of these expressions cancels almost all the terms:
\[ (1 - r)S_N = a - ar^{N+1} = a(1 - r^{N+1}) \]
So, when $r \ne 1$, we can divide by $1 - r$:
\[ S_N = a \cdot \frac{1 - r^{N+1}}{1 - r}. \]
On the other hand, if $r = 1$, then
\[ S_N = \underbrace{a + a + \cdots + a}_{N+1 \text{ terms}} = a(N + 1) \]
So in summary:
\[ S_N = \begin{cases} a \dfrac{1 - r^{N+1}}{1 - r} & \text{if } r \ne 1 \\ a(N + 1) & \text{if } r = 1 \end{cases} \tag{5.2.1} \]
\[ \mathbb{Z}^+ = \{1, 2, 3, \ldots\} \]
is used to denote the (strictly) positive integers. Remember that in this text, as is more standard in
mathematics, we define the set of natural numbers to be the set of (strictly) positive integers.
13 This reminds the authors of the paradox of Hilbert’s hotel. The hotel with an infinite number of rooms
is completely full, but can always accommodate one more guest. The interested reader should use their
favourite search engine to find more information on this.
14 One can find similar properties of other special series, that allow us, with some work, to cancel many
terms in the partial sums. We will shortly see a good example of this. The interested reader should look
up “creative telescoping” to see how this idea might be used more generally, though it is somewhat
beyond this course.
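Now that we have this expression we can determine whether or not the series converges.
If $|r| < 1$, then $r^{N+1}$ tends to zero as $N \to \infty$, so that $S_N$ converges to $\frac{a}{1-r}$ as $N \to \infty$
and
\[ \sum_{n=0}^{\infty} ar^n = \frac{a}{1 - r} \quad \text{provided } |r| < 1. \tag{5.2.2} \]
On the other hand if $|r| \ge 1$, $S_N$ diverges. To understand this divergence, consider the
following 4 cases:
• If $r > 1$, then $r^N$ grows to $\infty$ as $N \to \infty$.
• If $r = 1$, then $S_N = a(N + 1)$ grows without bound in magnitude as $N \to \infty$.
• If $r = -1$, then $S_N$ bounces between $a$ (when $N$ is even) and 0 (when $N$ is odd).
• If $r < -1$, then the magnitude of $r^N$ grows to $\infty$, and the sign of $r^N$ oscillates between
+ and −, as $N \to \infty$.
In each case the sequence of partial sums does not converge and so the series does not
converge.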
Example 5.2.4
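Let $a$ and $r$ be fixed real numbers, and let $N$ be a positive integer. Then
\[ \sum_{n=0}^{N} ar^n = \begin{cases} a \cdot \dfrac{1 - r^{N+1}}{1 - r} & \text{if } r \ne 1 \\ a(N + 1) & \text{if } r = 1 \end{cases} \]
and
\[ \sum_{n=0}^{\infty} ar^n = \frac{a}{1 - r} \quad \text{provided } |r| < 1. \]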
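The closed form for the partial sums can be tested against a term-by-term sum. The following sketch uses exact fractions so that the two agree exactly; the function names are ours, and the values a = 3, r = 1/2 are arbitrary choices for illustration.

```python
from fractions import Fraction

def geometric_partial_sum(a, r, N):
    """Closed form for a + a*r + ... + a*r**N."""
    if r == 1:
        return a * (N + 1)
    return a * (1 - r ** (N + 1)) / (1 - r)

def direct_sum(a, r, N):
    """The same sum, added up term by term."""
    return sum(a * r ** n for n in range(N + 1))

a, r = Fraction(3), Fraction(1, 2)
for N in (0, 1, 5, 20):
    print(N, geometric_partial_sum(a, r, N), direct_sum(a, r, N))

# For |r| < 1 the partial sums approach a / (1 - r).
limit = a / (1 - r)
print(float(limit - geometric_partial_sum(a, r, 50)))
```

The last line prints the gap between the limit a/(1 − r) and the partial sum with N = 50, which is already tiny.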
Bitcoin is a virtual currency that mimics traditional currencies in a number of ways. One
of those ways is controlled supply15 . That is, new bitcoins enter circulation over time in a
controlled manner.
15 Source for the specifics in this example: Controlled Supply, Bitcoin Wiki, url https://en.bitcoin.it/wiki/Controlled_supply accessed 16 Aug 2020
New blocks16 are searched for by computers. When a block is found, it is converted
into a set number of new bitcoins (owned by the finder). This is the reward for finding a
block.
This process is analogous to mining precious metals which then are added to the cur-
rency supply, so the process of finding new blocks is often called mining. Importantly,
the bitcoins given in the reward are new bitcoins that did not exist before the block was
found. So, finding blocks is how bitcoins are created.
The reward for finding a block started at 50 bitcoins, but it halves every 210,000 blocks.
The miners who found block 0, block 1, and so on up to block 209,999 each got a reward of 50 bitcoins.
Then, the miners who found block 210,000 through block 419,999 each got a reward of 25
bitcoins, and so on.
For the purposes of this example, we will assume that miners will always be able to
find blocks. (That is, blocks never run out.) We will also assume that rewards for finding
blocks are the only ways bitcoins are ever created, and that bitcoins are never destroyed.
(a) Suppose bitcoins are infinitely divisible. (That is, you can have an arbitrarily small
portion of a bitcoin, such as one-trillionth of a bitcoin, without a limit on how small
that portion can be.) If miners continue finding blocks for an infinite period of time,
what will happen to the total supply of bitcoins?
(b) One satoshi (or one sat) is equal to 1/100,000,000 bitcoin. Suppose when the reward
for a block is scheduled to be less than one satoshi, the block finder actually gets a
reward of 0 bitcoins. That is, there are no more bitcoins created when the reward for
finding a new block dips below one satoshi. If miners continue finding blocks for an
infinite period of time, what will happen to the total supply of bitcoins?
Solution.
(a) Let’s model the number of bitcoins by grouping together collections of 210,000 blocks.
• For the first collection of 210,000 blocks, the number of bitcoins created is 50 each,
for a total of $210{,}000 \cdot 50$ bitcoins created.
• For the second collection of 210,000 blocks, the number of bitcoins created is $\frac{50}{2} = 25$
each, for a total of $210{,}000 \cdot \frac{50}{2}$ bitcoins created.
• For the third collection of 210,000 blocks, the number of bitcoins created is $\frac{50}{4} = \frac{25}{2} = 12.5$
each, for a total of $210{,}000 \cdot \frac{50}{4}$ bitcoins created.
• In general, for the nth collection of 210,000 blocks, the total number of bitcoins
created by those blocks is $210{,}000 \cdot \frac{50}{2^{n-1}}$ bitcoins.
• All together, the number of bitcoins created by an infinite collection of blocks is
\[ \sum_{n=1}^{\infty} 210{,}000 \cdot \frac{50}{2^{n-1}} = 210{,}000 \cdot 50 \sum_{n=1}^{\infty} \left( \frac{1}{2} \right)^{n-1} \]
16 For the purposes of this question, the technical details are not important. What you need to know about
blocks is that you find them and they get turned into currency.
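This series almost, but not exactly, looks like the series from Theorem 5.2.5. We’ll
expand the series$^{17}$ in order to see how we might have indexed the terms differently.
\begin{align*}
210{,}000 \cdot 50 \sum_{n=1}^{\infty} \left( \frac{1}{2} \right)^{n-1} &= 210{,}000 \cdot 50 \left( \left( \frac{1}{2} \right)^0 + \left( \frac{1}{2} \right)^1 + \left( \frac{1}{2} \right)^2 + \cdots \right) \\
&= 210{,}000 \cdot 50 \sum_{n=0}^{\infty} \left( \frac{1}{2} \right)^n \\
&= 210{,}000 \cdot 50 \cdot \frac{1}{1 - \frac{1}{2}} = 210{,}000 \cdot 50 \cdot 2 = 21{,}000{,}000
\end{align*}
As blocks are mined, the total number of bitcoins will approach 21 million. It will
never exceed 21 million.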
(b) For this part we assume that after a certain number of blocks, no more bitcoins are
created. So, we will look at a finite sum, rather than an infinite series. Let’s start by
figuring out when the reward for a block drops below 1 satoshi.
Each block in the nth batch of 210,000 blocks earns $\frac{50}{2^{n-1}}$ bitcoins, as long as that number is greater
than or equal to one satoshi. That is, we create bitcoins as long as:
\[ \frac{50}{2^{n-1}} \ge \frac{1}{100{,}000{,}000} = \frac{1}{10^8} \]
Solving for $n$:
\begin{align*}
5 \cdot 10^9 &\ge 2^{n-1} \\
\log_2(5 \cdot 10^9) &\ge n - 1 \\
1 + \log_2(5 \cdot 10^9) &\ge n
\end{align*}
Note $n$ only makes sense as an integer. Using a calculator, $1 + \log_2(5 \cdot 10^9) \approx 33.2$. So
when $n = 33$, blocks earn rewards, but when $n \ge 34$, they do not.
This means the total supply of bitcoins that could ever be created under this system is:
\begin{align*}
\sum_{n=1}^{33} 210{,}000 \cdot \frac{50}{2^{n-1}} &= 210{,}000 \cdot 50 \left( \left( \frac{1}{2} \right)^0 + \left( \frac{1}{2} \right)^1 + \left( \frac{1}{2} \right)^2 + \cdots + \left( \frac{1}{2} \right)^{32} \right) \\
&= 210{,}000 \cdot 50 \sum_{n=0}^{32} \frac{1}{2^n}
\end{align*}
17 indexing from 0 (starting with the 0th collection, then the 1st collection in the bullet list) would have
eliminated this upcoming step. We described the creation of the series using the indexing that we
thought would be most intuitive to our readers, rather than the indexing that would lead to the least
amount of algebra.
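Now we can apply Theorem 5.2.5 with $r = \frac{1}{2}$ and $N = 32$.
$$210{,}000 \cdot 50 \sum_{n=0}^{32} \frac{1}{2^{n}} = 210{,}000 \cdot 50 \cdot \frac{1 - \left(\frac{1}{2}\right)^{33}}{1 - \frac{1}{2}} = 210{,}000 \cdot 100 \cdot \left( 1 - \frac{1}{2^{33}} \right)$$
Using a calculator, this is approximately 20,999,999.99755528.
So the total supply of bitcoins approaches 20,999,999 bitcoins and 99,755,528 satoshi,
but never exceeds this amount.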
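The arithmetic in part (b) can be double-checked with exact rational arithmetic. The sketch below is only an illustration: the name `capped_supply` and its parameters are our own, and it assumes the reward schedule described above (50 bitcoins per block, halved after every 210,000 blocks, 33 rewarded batches).

```python
from fractions import Fraction

SATOSHI_PER_BITCOIN = 10**8

def capped_supply(batches=33, blocks_per_batch=210_000, first_reward=50):
    """Exact sum of blocks_per_batch * first_reward / 2**(n-1) for n = 1..batches."""
    return sum(
        Fraction(blocks_per_batch * first_reward, 2 ** (n - 1))
        for n in range(1, batches + 1)
    )

total = capped_supply()
# Closed form from the finite geometric sum: 210,000 * 100 * (1 - 1/2**33)
closed_form = 210_000 * 100 * (1 - Fraction(1, 2**33))

whole_bitcoins = total // 1                                   # integer part
satoshi = round((total - whole_bitcoins) * SATOSHI_PER_BITCOIN)  # fractional part, rounded
print(whole_bitcoins, satoshi)
```

The term-by-term sum and the closed form agree exactly, and the rounded satoshi count matches the figure quoted above.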
Example 5.2.6
Now that we know how to handle geometric series let’s return to Example 5.2.1.
$$0.3333\cdots = \frac{3}{10} + \frac{3}{100} + \frac{3}{1000} + \frac{3}{10000} + \cdots = \sum_{n=1}^{\infty} \frac{3}{10^{n}}$$
is a geometric series with the first term $a = \frac{3}{10}$ and the ratio $r = \frac{1}{10}$. So, by Example 5.2.4,
$$0.3333\cdots = \sum_{n=1}^{\infty} \frac{3}{10^{n}} = \frac{3/10}{1 - 1/10} = \frac{3/10}{9/10} = \frac{1}{3}$$
$$0.16161616\cdots = \frac{16}{100} + \frac{16}{10000} + \frac{16}{1000000} + \cdots$$
This is another geometric series with the first term $a = \frac{16}{100}$ and the ratio $r = \frac{1}{100}$. So, by
Example 5.2.4,
$$0.16161616\cdots = \sum_{n=1}^{\infty} \frac{16}{100^{n}} = \frac{16/100}{1 - 1/100} = \frac{16/100}{99/100} = \frac{16}{99}$$
again, as expected. In this way any periodic decimal expansion converges to a ratio of two
integers — that is, to a rational number.
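The same computation works for any repeating block of digits. Here is a minimal sketch using exact fractions; the helper name `repeating_decimal` is our own and it only handles decimals of the form 0.(block)(block)... with no non-repeating prefix.

```python
from fractions import Fraction

def repeating_decimal(block: str) -> Fraction:
    """Value of 0.(block)(block)... viewed as a geometric series with
    first term a = block / 10**k and ratio r = 1 / 10**k, where k = len(block)."""
    k = len(block)
    a = Fraction(int(block), 10**k)
    r = Fraction(1, 10**k)
    return a / (1 - r)   # the geometric series formula a / (1 - r)

print(repeating_decimal("3"))   # 1/3
print(repeating_decimal("16"))  # 16/99
```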
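ries18 that has been rigged to illustrate a phenomenon called “telescoping”. Notice that the
$n$th term can be rewritten as
$$\frac{1}{n(n+1)} = \frac{1}{n} - \frac{1}{n+1}$$
and so we have
$$a_n = b_n - b_{n+1} \quad \text{where } b_n = \frac{1}{n}.$$
Because of this we get big cancellations when we add terms together. This allows us to
get a simple formula for the partial sums of this series.
\begin{align*}
S_N &= \frac{1}{1 \cdot 2} + \frac{1}{2 \cdot 3} + \frac{1}{3 \cdot 4} + \cdots + \frac{1}{N \cdot (N+1)} \\
    &= \left( \frac{1}{1} - \frac{1}{2} \right) + \left( \frac{1}{2} - \frac{1}{3} \right) + \left( \frac{1}{3} - \frac{1}{4} \right) + \cdots + \left( \frac{1}{N} - \frac{1}{N+1} \right)
\end{align*}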
18 Well. . . this sort of series does show up when you start to look at the Maclaurin polynomial of functions
like (1 ´ x ) ln(1 ´ x ). So it is not totally artificial. At any rate, it illustrates the basic idea of telescoping
very nicely, and the idea of “creative telescoping” turns out to be extremely useful in the study of series
— though it is well beyond the scope of this course.
The second term of each bracket exactly cancels the first term of the following bracket. So
the sum “telescopes” leaving just
$$S_N = 1 - \frac{1}{N+1}$$
and we can now easily compute
$$\sum_{n=1}^{\infty} \frac{1}{n(n+1)} = \lim_{N \to \infty} S_N = \lim_{N \to \infty} \left( 1 - \frac{1}{N+1} \right) = 1$$
Example 5.2.8
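The telescoping formula for the partial sums is easy to confirm by brute force. The following sketch (function name ours) adds the terms $\frac{1}{n(n+1)}$ exactly and compares the result with $1 - \frac{1}{N+1}$.

```python
from fractions import Fraction

def partial_sum(N: int) -> Fraction:
    """Add 1/(n(n+1)) for n = 1..N with exact rational arithmetic."""
    return sum(Fraction(1, n * (n + 1)) for n in range(1, N + 1))

for N in (1, 10, 1000):
    # Compare the term-by-term sum with the telescoped formula
    print(N, partial_sum(N), 1 - Fraction(1, N + 1))
```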
$$a_n = b_n - b_{n+1}$$
for some other known sequence $b_n$, then the series telescopes and we can compute partial
sums using
\begin{align*}
\sum_{n=1}^{N} a_n &= \sum_{n=1}^{N} (b_n - b_{n+1}) \\
  &= \sum_{n=1}^{N} b_n - \sum_{n=1}^{N} b_{n+1} \\
  &= b_1 - b_{N+1}.
\end{align*}
and hence
$$\sum_{n=1}^{\infty} a_n = b_1 - \lim_{N \to \infty} b_{N+1}$$
provided this limit exists. Often $\lim_{N \to \infty} b_{N+1} = 0$ and then $\sum_{n=1}^{\infty} a_n = b_1$. But this does not
always happen. Here is an example.
Example 5.2.9 (A Divergent Telescoping Series)
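In this example, we are going to study the series $\sum_{n=1}^{\infty} \log\left(1 + \frac{1}{n}\right)$. (We don’t specify the base;
any base greater than one will behave the same way.) Let’s start by just writing out the
first few terms.
\begin{align*}
\sum_{n=1}^{\infty} \log\left(1 + \frac{1}{n}\right)
  &= \underbrace{\log\left(1 + \frac{1}{1}\right)}_{n=1} + \underbrace{\log\left(1 + \frac{1}{2}\right)}_{n=2} + \underbrace{\log\left(1 + \frac{1}{3}\right)}_{n=3} + \underbrace{\log\left(1 + \frac{1}{4}\right)}_{n=4} + \cdots \\
  &= \log(2) + \log\left(\frac{3}{2}\right) + \log\left(\frac{4}{3}\right) + \log\left(\frac{5}{4}\right) + \cdots
\end{align*}
\begin{align*}
S_N &= \sum_{n=1}^{N} \log\left(1 + \frac{1}{n}\right) \\
  &= \underbrace{\log\left(1 + \frac{1}{1}\right)}_{n=1} + \underbrace{\log\left(1 + \frac{1}{2}\right)}_{n=2} + \underbrace{\log\left(1 + \frac{1}{3}\right)}_{n=3} + \cdots + \underbrace{\log\left(1 + \frac{1}{N-1}\right)}_{n=N-1} + \underbrace{\log\left(1 + \frac{1}{N}\right)}_{n=N} \\
  &= \log(2) + \log\left(\frac{3}{2}\right) + \log\left(\frac{4}{3}\right) + \cdots + \log\left(\frac{N}{N-1}\right) + \log\left(\frac{N+1}{N}\right) \\
  &= \log\left( 2 \times \frac{3}{2} \times \frac{4}{3} \times \cdots \times \frac{N}{N-1} \times \frac{N+1}{N} \right) \\
  &= \log(N+1)
\end{align*}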
Uh oh!
This telescoping series diverges! There is an important lesson here. Telescoping series can
diverge. They do not always converge to b1 .
Example 5.2.9
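To see the divergence numerically, one can compare the partial sums with $\log(N+1)$. This is a small sketch using natural logarithms (any base greater than one would show the same behaviour); the function name is our own.

```python
import math

def partial_sum(N: int) -> float:
    """S_N = sum of log(1 + 1/n) for n = 1..N, using natural logs."""
    return sum(math.log(1 + 1 / n) for n in range(1, N + 1))

for N in (10, 1000, 100_000):
    # The two columns agree up to rounding, and both keep growing with N
    print(N, partial_sum(N), math.log(N + 1))
```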
As was the case for limits, differentiation and antidifferentiation, we can compute more
complicated series in terms of simpler ones by understanding how series interact with
the usual operations of arithmetic. It is, perhaps, not so surprising that there are simple
rules for addition and subtraction of series and for multiplication of a series by a constant.
Unfortunately there are no simple general rules for computing products or ratios of series.
Let $A$, $B$ and $C$ be real numbers and let the two series $\sum_{n=1}^{\infty} a_n$ and $\sum_{n=1}^{\infty} b_n$ converge
to $S$ and $T$ respectively. That is, assume that
$$\sum_{n=1}^{\infty} a_n = S \qquad \sum_{n=1}^{\infty} b_n = T$$
(b) $\sum_{n=1}^{\infty} C a_n = CS$.
Example 5.2.11
As a simple example of how we use the arithmetic of series Theorem 5.2.10, consider
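$$\sum_{n=1}^{\infty} \left[ \frac{1}{7^{n}} + \frac{2}{n(n+1)} \right]$$
We recognize that we know how to compute parts of this sum. We know that
$$\sum_{n=1}^{\infty} \frac{1}{7^{n}} = \frac{1/7}{1 - 1/7} = \frac{1}{6}$$
because it is a geometric series (Example 5.2.4) with first term $a = \frac{1}{7}$ and ratio $r = \frac{1}{7}$. And
we know that
$$\sum_{n=1}^{\infty} \frac{1}{n(n+1)} = 1$$
by Example 5.2.8. We can now use Theorem 5.2.10 to build the specified “complicated”
series out of these two “simple” pieces.
\begin{align*}
\sum_{n=1}^{\infty} \left[ \frac{1}{7^{n}} + \frac{2}{n(n+1)} \right]
  &= \sum_{n=1}^{\infty} \frac{1}{7^{n}} + \sum_{n=1}^{\infty} \frac{2}{n(n+1)} && \text{by Theorem 5.2.10.a} \\
  &= \sum_{n=1}^{\infty} \frac{1}{7^{n}} + 2 \sum_{n=1}^{\infty} \frac{1}{n(n+1)} && \text{by Theorem 5.2.10.b} \\
  &= \frac{1}{6} + 2 \cdot 1 = \frac{13}{6}
\end{align*}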
Example 5.2.11
$$D \text{ future dollars} = \frac{D}{1.1^{t}} \text{ present-day dollars}$$
In a conventional cost-benefit analysis (CBA), returns that will happen in the future
are subject to precisely this form of discounting. To quantify the value of a project, units
of Present Value (PV) are used. Given a discounting rate20 of $\delta$, possession of $D$ dollars
today has the same value as a gain of $(1 + \delta)^{t} D$ dollars $t$ years from now. Rearranged, the
present value of $D$ dollars that will be gained $t$ years in the future is given by
$$\mathrm{PV}(D, t) = \frac{D}{(1 + \delta)^{t}}$$
Future discounting is human nature, but it doesn’t always make for good policy. In
particular, “high discount rates favour myopic fisheries policies resulting in global over-
fishing” (p. 334) since the model makes the health of an ecosystem one hundred years
from now worth almost nothing today.
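To get a feel for how quickly conventional discounting shrinks future value, here is a short sketch of the PV formula. The function name `present_value` and the sample rate of 10% per year are our own choices for illustration.

```python
def present_value(D: float, t: float, delta: float) -> float:
    """PV(D, t) = D / (1 + delta)**t for a discount rate delta."""
    return D / (1 + delta) ** t

# One dollar promised t years from now, discounted at 10% per year
for t in (1, 10, 50, 100):
    print(t, present_value(1.0, t, 0.10))
```

At this rate a dollar promised one hundred years from now is worth less than one hundredth of a cent today, which is the effect the quoted passage is describing.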
Sumaila proposes an intergenerational model, where discounting still happens within
a generation of people, but different generations are considered together. Quoting the
article:
“The benefits to the current generation from the use of ecosystem resources to-
day would never have appeared in the conventional CBA[Cost-Benefit Analy-
sis] of the generations that were here a hundred years ago. Similarly, the gen-
eration that will be here in a hundred years time, would receive benefits from
restored marine ecosystems that would mean much to them but would not ap-
pear in the current generation’s conventional CBA. Therefore, to capture the
benefits to all generations from ecosystem restoration projects, it is necessary
to use [an intergenerational] CBA approach” (p. 336).
19 Sumaila UR. Intergenerational cost–benefit analysis and marine ecosystem restoration. Fish and
fisheries (Oxford, England). 2004;5(4):329-43. You can access the full text online with your UBC
CWL (campus-wide login) here: https://libkey.io/libraries/498/articles/30981866/full-text-file?utm_source=api_542.
20 To better understand the rate, note that if δ = 0, then $1 today is worth the same to us as $1 one year
from now, 100 years from now, or at any other time in the future.
The approach proposed by Sumaila is as follows. We divide up the future into distinct
generations, each of which reigns over a (non-overlapping) interval of time. Each gener-
ation has its own Present Value calculation, measured from the start of its reign. So the
Present Value of the promise of $D$ dollars in year $t$, to a generation that started its reign in
year $t_0$, is
$$\mathrm{PV}(D, t) = \frac{D}{(1 + \delta)^{t - t_0}}$$
The difference between this calculation and the conventional PV calculation is that “present”
is relative for each generation.
Now that we have these components, we can create an expression for a cost-benefit
analysis (CBA) of a long-term project.
Suppose in year $t$, the costs incurred by the project are given by $C_t$, and the benefits
are given by $V_t$. The net value gained in that year is $V_t - C_t$, before future discounting
is applied. If the generation started its reign in year $t = t_0$, then the present value of
$(V_t - C_t)$ to that generation is $\frac{V_t - C_t}{(1 + \delta)^{t - t_0}}$. If the generation reigns from $t = t_0$ to $t = t_1$, then
we combine the net present value of each of those years to find the net present value to
the generation of the entire project.
To include a collection of generations, we add up each generation’s Net Present Value.
To express this in sigma notation, let NPVk be the Net Present Value for the kth generation.
We’ll index years as follows. The first generation reigns from $t = t_0 + 1$ to $t = t_1$; the
second generation reigns from $t = t_1 + 1$ to $t = t_2$; and (in general) the $k$th generation
reigns from $t = t_{k-1} + 1$ to $t = t_k$. (Considering the first year to be $t = t_0 + 1$ looks weird,
but makes the indices more consistent with one another.)
[Figure: timeline with the years $t_0$, $t_1$, $t_2$, $t_3$, $t_4$ marked]
All together, the intergenerational Net Present Value of a project, from generation 1 to
generation $L$, is
\begin{align*}
\mathrm{NPV} &= \sum_{k=1}^{L} \mathrm{NPV}_k \\
  &= \sum_{k=1}^{L} \sum_{t = t_{k-1}+1}^{t_k} \frac{V_t - C_t}{(1 + \delta)^{t - t_{k-1}}}
\end{align*}
If the NPV is positive, then the project is a good investment: adjusting for discount-
ing, but considering future generations, the benefits will exceed the costs. If the NPV is
negative, then the project is a bad investment.
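The double sum can be sketched in a few lines of code. Everything named here is hypothetical: the function `intergenerational_npv`, the made-up project (a cost of 10 in year 1, then a net benefit of 1 per year), and the choice of two 20-year generations. We also assume that each generation discounts from the start of its own reign, $t_{k-1}$.

```python
def intergenerational_npv(net_values, boundaries, delta):
    """net_values[t] holds V_t - C_t for each year t.
    boundaries = [t_0, t_1, ..., t_L]; generation k covers years
    t_{k-1}+1 .. t_k and (by assumption) discounts relative to t_{k-1}."""
    total = 0.0
    for k in range(1, len(boundaries)):
        start, end = boundaries[k - 1], boundaries[k]
        for t in range(start + 1, end + 1):
            total += net_values[t] / (1 + delta) ** (t - start)
    return total

# Hypothetical project: pay 10 in year 1, then gain 1 per year through year 40
net = {t: (-10.0 if t == 1 else 1.0) for t in range(1, 41)}

conventional = intergenerational_npv(net, [0, 40], 0.10)          # one 40-year "generation"
two_generations = intergenerational_npv(net, [0, 20, 40], 0.10)   # two 20-year generations
print(conventional, two_generations)
```

With a single generation the function reduces to the conventional calculation; splitting the same years into two generations restarts the discounting clock at year 20, so the later benefits count for more.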
S EQUENCE AND S ERIES 5.3 C ONVERGENCE T ESTS
• So $a_N = S_N - S_{N-1} \to S - S = 0$.
This tells us that, if we already know that a given series $\sum a_n$ is convergent, then the $n$th
term of the series, $a_n$, must converge to 0 as $n$ tends to infinity. In this form, the test is not
so useful. However the contrapositive21 of the statement is a useful test for divergence.
Example 5.3.2
Let $a_n = \frac{n}{n+1}$. Then
$$\lim_{n \to \infty} a_n = \lim_{n \to \infty} \frac{n}{n+1} = \lim_{n \to \infty} \frac{1}{1 + 1/n} = 1 \ne 0$$
21 Given a statement of the form “If A is true, then B is true” the contrapositive is “If B is not true, then A
is not true”. The two statements in quotation marks are logically equivalent — if one is true, then so is
the other. In the present context we have
If ($\sum a_n$ converges) then ($a_n$ converges to 0).
So the series $\sum_{n=1}^{\infty} \frac{n}{n+1}$ diverges.
Example 5.3.2
Warning 5.3.3.
The divergence test is a “one way test”. It tells us that if $\lim_{n \to \infty} a_n$ is nonzero,
or fails to exist, then the series $\sum_{n=1}^{\infty} a_n$ diverges. But it tells us absolutely nothing
when $\lim_{n \to \infty} a_n = 0$. In particular, it is perfectly possible for a series $\sum_{n=1}^{\infty} a_n$
to diverge even though $\lim_{n \to \infty} a_n = 0$. An example is $\sum_{n=1}^{\infty} \frac{1}{n}$. We’ll show in
Example 5.3.6, below, that it diverges.
Now while convergence or divergence of series like $\sum_{n=1}^{\infty} \frac{1}{n}$ can be determined using
some clever tricks, it would be much better to have methods that are more systematic and
rely less on being sneaky. Over the next subsections we will discuss several methods for
testing series for convergence.
Note that while these tests will tell us whether or not a series converges, they do not
(except in rare cases) tell us what the series adds up to. For example, the test we will see
in the next subsection tells us quite immediately that the series
$$\sum_{n=1}^{\infty} \frac{1}{n^{3}}$$
gle of height $\frac{1}{n}$ and width 1. The limit of the series is then the limiting area of this union
of rectangles. Consider the sketch on the left below.
22 This series converges to Apéry’s constant 1.2020569031 . . . . The constant is named for Roger Apéry
(1916–1994) who proved that this number must be irrational. This number appears in many contexts
including the following cute fact — the reciprocal of Apéry’s constant gives the probability that three
positive integers, chosen at random, do not share a common prime factor.
It shows that the area of the shaded columns, $\sum_{n=1}^{4} \frac{1}{n}$, is bigger than the area under the
curve $y = \frac{1}{x}$ with $1 \le x \le 5$. That is
$$\sum_{n=1}^{4} \frac{1}{n} \ge \int_{1}^{5} \frac{1}{x} \, dx$$
If we were to continue drawing the columns all the way out to infinity, then we would
have
$$\sum_{n=1}^{\infty} \frac{1}{n} \ge \int_{1}^{\infty} \frac{1}{x} \, dx$$
That is the area under the curve diverges to $+\infty$ and so the area represented by the
columns must also diverge to $+\infty$.
It should be clear that the above argument can be quite easily generalised. For example
the same argument holds mutatis mutandis23 for the series
$$\sum_{n=1}^{\infty} \frac{1}{n^{2}}$$
$$\sum_{n=2}^{N} \frac{1}{n^{2}} \le \int_{1}^{N} \frac{1}{x^{2}} \, dx$$
23 Latin for “Once the necessary changes are made”. This phrase still gets used a little, but these days
mathematicians tend to write something equivalent in English. Indeed, English is pretty much the
lingua franca for mathematical publishing. Quidquid erit.
and hence
$$\sum_{n=2}^{\infty} \frac{1}{n^{2}} \le \int_{1}^{\infty} \frac{1}{x^{2}} \, dx$$
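[Figure: graph of $y = f(x)$ with the values $a_1$, $a_2$, $a_3$, $a_4$ marked at $x = 1, 2, 3, 4$]
Then
$$\sum_{n=1}^{\infty} a_n \text{ converges} \iff \int_{N_0}^{\infty} f(x) \, dx \text{ converges}$$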
• So $s_\ell$ must either converge to some finite number or increase to infinity. That is,
[Figure: graph of $y = f(x)$ with shaded rectangles]
Look at the figure above. The shaded area in the figure is $\sum_{n=I}^{\infty} a_n$ because
• the first shaded rectangle has height $a_I$ and width 1, and hence area $a_I$ and
• the second shaded rectangle has height $a_{I+1}$ and width 1, and hence area $a_{I+1}$, and
so on
This shaded area is smaller than the area under the curve $y = f(x)$ for $I - 1 \le x < \infty$. So
$$\sum_{n=I}^{\infty} a_n \le \int_{I-1}^{\infty} f(x) \, dx$$
and, if the integral is finite, the sum $\sum_{n=I}^{\infty} a_n$ is finite too. Furthermore, the desired bound
on the truncation error is just the special case of this inequality with $I = N + 1$:
$$\sum_{n=1}^{\infty} a_n - \sum_{n=1}^{N} a_n = \sum_{n=N+1}^{\infty} a_n \le \int_{N}^{\infty} f(x) \, dx$$
[Figure: graph of $y = f(x)$ with shaded rectangles]
For the “divergence case” look at the figure above. The (new) shaded area in the figure
is again $\sum_{n=I}^{\infty} a_n$ because
• the first shaded rectangle has height $a_I$ and width 1, and hence area $a_I$ and
• the second shaded rectangle has height $a_{I+1}$ and width 1, and hence area $a_{I+1}$, and
so on
This time the shaded area is larger than the area under the curve $y = f(x)$ for $I \le x < \infty$.
So
$$\sum_{n=I}^{\infty} a_n \ge \int_{I}^{\infty} f(x) \, dx$$
Now that we have the integral test, it is straightforward to determine for which values
of p the series24
$$\sum_{n=1}^{\infty} \frac{1}{n^{p}}$$
converges.
Example 5.3.6 (The p test: $\sum_{n=1}^{\infty} \frac{1}{n^{p}}$)
Let $p > 0$. We’ll now use the integral test to determine whether or not the series $\sum_{n=1}^{\infty} \frac{1}{n^{p}}$
(which is sometimes called the p–series) converges.
• To do so, we need a function $f(x)$ that obeys $f(n) = a_n = \frac{1}{n^{p}}$ for all $n$ bigger than
some $N_0$. Certainly $f(x) = \frac{1}{x^{p}}$ obeys $f(n) = \frac{1}{n^{p}}$ for all $n \ge 1$. So let’s pick this $f$ and
try $N_0 = 1$. (We can always increase $N_0$ later if we need to.)
• This function also obeys the other two conditions of Theorem 5.3.5:
24 This series, viewed as a function of p, is called the Riemann zeta function, ζ ( p), or the Euler-Riemann
zeta function. It is extremely important because of its connections to prime numbers (among many
other things). Indeed Euler proved that
$$\zeta(p) = \sum_{n=1}^{\infty} \frac{1}{n^{p}} = \prod_{P \text{ prime}} \left( 1 - P^{-p} \right)^{-1}$$
Riemann showed the connections between the zeros of this function (over complex numbers p) and
the distribution of prime numbers. Arguably the most famous unsolved problem in mathematics, the
Riemann hypothesis, concerns the locations of zeros of this function.
So we conclude that $\sum_{n=1}^{\infty} \frac{1}{n^{p}}$ converges if and only if $p > 1$. This is sometimes called the
p–test.
• In particular, the series $\sum_{n=1}^{\infty} \frac{1}{n}$, which is called the harmonic series, has $p = 1$ and so
diverges. As we add more and more terms of this series together, the terms we add,
namely $\frac{1}{n}$, get smaller and smaller and tend to zero, but they tend to zero so slowly
that the full sum is still infinite.
• On the other hand, the series $\sum_{n=1}^{\infty} \frac{1}{n^{1.000001}}$ has $p = 1.000001 > 1$ and so converges.
This time as we add more and more terms of this series together, the terms we add,
namely $\frac{1}{n^{1.000001}}$, tend to zero (just) fast enough that the full sum is finite. Mind you,
for this example, the convergence takes place very slowly: you have to take a huge
number of terms to get a decent approximation to the full sum. If we approximate
$\sum_{n=1}^{\infty} \frac{1}{n^{1.000001}}$ by the truncated series $\sum_{n=1}^{N} \frac{1}{n^{1.000001}}$, we make an error of at most
$$\int_{N}^{\infty} \frac{dx}{x^{1.000001}} = \lim_{R \to \infty} \int_{N}^{R} \frac{dx}{x^{1.000001}} = \lim_{R \to \infty} -\frac{1}{0.000001} \left[ \frac{1}{R^{0.000001}} - \frac{1}{N^{0.000001}} \right] = \frac{10^{6}}{N^{0.000001}}$$
This does tend to zero as $N \to \infty$, but really slowly.
Example 5.3.6
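To appreciate just how slowly that error bound shrinks, we can evaluate $\frac{10^6}{N^{0.000001}}$ for some enormous values of $N$. This is a sketch only; the function name `error_bound` is our own.

```python
def error_bound(N: float, p: float = 1.000001) -> float:
    """Integral of x**(-p) from N to infinity, valid for p > 1."""
    return N ** (1 - p) / (p - 1)

for exponent in (6, 100, 300):
    N = 10.0**exponent
    print(f"N = 1e{exponent}: error bound = {error_bound(N):.1f}")
```

Even with $N = 10^{300}$ terms the bound on the error is still in the hundreds of thousands, so truncating this series is useless as a practical way to approximate its sum.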
We now know that the dividing line between convergence and divergence of $\sum_{n=1}^{\infty} \frac{1}{n^{p}}$
occurs at $p = 1$. We can dig a little deeper and ask ourselves how much more quickly than
$\frac{1}{n}$ the $n$th term needs to shrink in order for the series to converge. We know that for large
$x$, the function $\log x$ (of any base) is smaller than $x^{a}$ for any positive $a$; you can convince
yourself of this with a quick application of L’Hôpital’s rule. So it is not unreasonable to
ask whether the series
$$\sum_{n=2}^{\infty} \frac{1}{n \ln n}$$
25 We could go even further and see what happens if we include powers of ln(ln(n)) and other more
exotic slow-growing functions.
“Proof”. We will not prove this theorem here. We’ll just observe that it is very reason-
able. That’s why there are quotation marks around “Proof”. For an actual proof see the
appendix section A.11.
(a) If $\sum_{n=0}^{\infty} c_n$ converges to a finite number and if the terms in $\sum_{n=0}^{\infty} a_n$ are smaller than the
terms in $\sum_{n=0}^{\infty} c_n$, then it is no surprise that $\sum_{n=0}^{\infty} a_n$ converges too.
(b) If $\sum_{n=0}^{\infty} d_n$ diverges (i.e. adds up to $\infty$) and if the terms in $\sum_{n=0}^{\infty} a_n$ are larger than the terms
in $\sum_{n=0}^{\infty} d_n$, then of course $\sum_{n=0}^{\infty} a_n$ adds up to $\infty$, and so diverges, too.
The comparison test for series is also used in much the same way as is the comparison
test for improper integrals. Of course, one needs a good series to compare against, and
often the series $\sum n^{-p}$ (from Example 5.3.6), for some $p > 0$, turns out to be just what is
needed.
Example 5.3.9 ($\sum_{n=1}^{\infty} \frac{1}{n^{2}+2n+3}$)
We could determine whether or not the series $\sum_{n=1}^{\infty} \frac{1}{n^{2}+2n+3}$ converges by applying the
integral test. But it is not worth the effort26. Whether or not any series converges is de-
termined by the behaviour of the summand27 for very large $n$. So the first step in tackling
such a problem is to develop some intuition about the behaviour of $a_n$ when $n$ is very
large.
• Step 2: Verify intuition. We can use the comparison test to confirm that this is indeed
the case. For any $n \ge 1$, $n^{2} + 2n + 3 > n^{2}$, so that $\frac{1}{n^{2}+2n+3} \le \frac{1}{n^{2}}$. So the compari-
son test, Theorem 5.3.8, with $a_n = \frac{1}{n^{2}+2n+3}$ and $c_n = \frac{1}{n^{2}}$, tells us that $\sum_{n=1}^{\infty} \frac{1}{n^{2}+2n+3}$
converges.
26 Go back and quickly scan Theorem 5.3.5; to apply it we need to show that $\frac{1}{n^{2}+2n+3}$ is positive and
decreasing (it is), and then we need to integrate $\int \frac{1}{x^{2}+2x+3} \, dx$. To do that we reread the notes on partial
and then arctangent appears, etc etc. Urgh. Okay — let’s go back to the text now and see how to avoid
this.
27 To understand this consider any series $\sum_{n=1}^{\infty} a_n$. We can always cut such a series into two parts: pick
some huge number like $10^{6}$. Then
$$\sum_{n=1}^{\infty} a_n = \sum_{n=1}^{10^{6}} a_n + \sum_{n=10^{6}+1}^{\infty} a_n$$
The first sum, though it could be humongous, is finite. So the left hand side, $\sum_{n=1}^{\infty} a_n$, is a well–defined
finite number if and only if $\sum_{n=10^{6}+1}^{\infty} a_n$, is a well–defined finite number. The convergence or divergence
of the series is determined by the second sum, which only contains $a_n$ for “large” $n$.
28 The symbol “$\gg$” means “much larger than”. Similarly, the symbol “$\ll$” means “much less than”. Good
shorthand symbols can be quite expressive.
Example 5.3.9
Of course the previous example was “rigged” to give an easy application of the com-
parison test. It is often relatively easy, using arguments like those in Example 5.3.9, to find
a “simple” series $\sum_{n=1}^{\infty} b_n$ with $b_n$ almost the same as $a_n$ when $n$ is large. However it is
pretty rare that $a_n \le b_n$ for all $n$. It is much more common that $a_n \le K b_n$ for some constant
$K$. This is enough to allow application of the comparison test. Here is an example.
Example 5.3.10 ($\sum_{n=1}^{\infty} \frac{n + \cos n}{n^{3} - 1/3}$)
As in the previous example, the first step is to develop some intuition about the behaviour
of an when n is very large.
• Step 1: Develop intuition. When $n$ is very large,
◦ $n \gg |\cos n|$ so that the numerator $n + \cos n \approx n$ and
◦ $n^{3} \gg 1/3$ so that the denominator $n^{3} - 1/3 \approx n^{3}$.
So when $n$ is very large
$$a_n = \frac{n + \cos n}{n^{3} - 1/3} \approx \frac{n}{n^{3}} = \frac{1}{n^{2}}$$
We already know from Example 5.3.6, with $p = 2$, that $\sum_{n=1}^{\infty} \frac{1}{n^{2}}$ converges, so we
would expect that $\sum_{n=1}^{\infty} \frac{n + \cos n}{n^{3} - 1/3}$ converges too.
• Step 2: Verify intuition. We can use the comparison test to confirm that this is indeed
the case. To do so we need to find a constant $K$ such that $|a_n| = \frac{|n + \cos n|}{n^{3} - 1/3} = \frac{n + \cos n}{n^{3} - 1/3}$ is
smaller than $\frac{K}{n^{2}}$ for all $n$. A good way29 to do that is to factor the dominant term (in
this case $n$) out of the numerator and also factor the dominant term (in this case $n^{3}$)
out of the denominator.
$$a_n = \frac{n + \cos n}{n^{3} - 1/3} = \frac{n}{n^{3}} \cdot \frac{1 + \frac{\cos n}{n}}{1 - \frac{1}{3n^{3}}} = \frac{1}{n^{2}} \cdot \frac{1 + \frac{\cos n}{n}}{1 - \frac{1}{3n^{3}}}$$
So now we need to find a constant $K$ such that $\frac{1 + (\cos n)/n}{1 - 1/(3n^{3})}$ is smaller than $K$ for all $n \ge 1$.
29 This is very similar to how we computed limits at infinity way way back near the beginning of first-
semester calculus.
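∗ $\frac{1}{1 - 1/(3n^{3})}$ is between $\frac{3}{2}$ and 1.
◦ As the numerator $1 + (\cos n) \frac{1}{n}$ is always smaller than 2 and $\frac{1}{1 - 1/(3n^{3})}$ is always
smaller than $\frac{3}{2}$, the fraction
$$\frac{1 + \frac{\cos n}{n}}{1 - \frac{1}{3n^{3}}} \le 2 \cdot \frac{3}{2} = 3$$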
Example 5.3.10
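A numerical spot check of the bound with $K = 3$ is reassuring, although of course it proves nothing beyond the range tested. In this sketch the function name `a` and the cutoff of 100,000 terms are our own choices.

```python
import math

def a(n: int) -> float:
    """The n-th term (n + cos n) / (n**3 - 1/3)."""
    return (n + math.cos(n)) / (n**3 - 1 / 3)

# Check 0 < a_n <= 3 / n**2 for the first 100,000 terms
bound_holds = all(0 < a(n) <= 3 / n**2 for n in range(1, 100_001))
print(bound_holds)

# The partial sums stay below 3 * sum(1/n**2), as the comparison test predicts
print(sum(a(n) for n in range(1, 100_001)))
```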
The last example was actually a relatively simple application of the Comparison Theo-
rem — finding a suitable constant K can be really tedious30 . Fortunately, there is a variant
of the comparison test that completely eliminates the need to explicitly find K.
The idea behind this isn’t too complicated. We have already seen that the convergence
or divergence of a series depends not on its first few terms, but just on what happens
when n is really large. Consequently, if we can work out how the series terms behave for
really big n then we can work out if the series converges. So instead of comparing the
terms of our series for all n, just compare them when n is big.
$$\lim_{n \to \infty} \frac{a_n}{b_n} = L$$
exists.
Proof. (a) Because we are told that $\lim_{n \to \infty} \frac{a_n}{b_n} = L$, we know that,
• when $n$ is large, $\frac{a_n}{b_n}$ is very close to $L$, so that $\left| \frac{a_n}{b_n} \right|$ is very close to $|L|$.
• In particular, there is some natural number $N_0$ so that $\left| \frac{a_n}{b_n} \right| \le |L| + 1$, for all $n \ge N_0$,
and hence
30 Really, really tedious. And you thought some of those partial fractions computations were bad . . .
(b) Let’s suppose that $L > 0$. (If $L < 0$, just replace $a_n$ with $-a_n$.) Because we are told that
$\lim_{n \to \infty} \frac{a_n}{b_n} = L$, we know that,
• when $n$ is large, $\frac{a_n}{b_n}$ is very close to $L$.
• In particular, there is some natural number $N$ so that $\frac{a_n}{b_n} \ge \frac{L}{2}$, and hence
• $a_n \ge K b_n$ with $K = \frac{L}{2} > 0$, for all $n \ge N$.
The next two examples illustrate how much of an improvement the above theorem is
over the straight comparison test (though of course, we needed the comparison test to
develop the limit comparison test).
Example 5.3.12 ($\sum_{n=1}^{\infty} \frac{\sqrt{n+1}}{n^{2}-2n+3}$)
Set $a_n = \frac{\sqrt{n+1}}{n^{2}-2n+3}$. We first try to develop some intuition about the behaviour of $a_n$ for large
$n$ and then we confirm that our intuition was correct.
• Step 1: Develop intuition. When $n \gg 1$, the numerator $\sqrt{n+1} \approx \sqrt{n}$, and the denom-
inator $n^{2} - 2n + 3 \approx n^{2}$ so that $a_n \approx \frac{\sqrt{n}}{n^{2}} = \frac{1}{n^{3/2}}$ and it looks like our series should
converge by Example 5.3.6 with $p = \frac{3}{2}$.
• Step 2: Verify intuition. To confirm our intuition we set $b_n = \frac{1}{n^{3/2}}$ and compute the
limit
$$\lim_{n \to \infty} \frac{a_n}{b_n} = \lim_{n \to \infty} \frac{\frac{\sqrt{n+1}}{n^{2}-2n+3}}{\frac{1}{n^{3/2}}} = \lim_{n \to \infty} \frac{n^{3/2} \sqrt{n+1}}{n^{2} - 2n + 3}$$
Again it is a good idea to factor the dominant term out of the numerator and the
dominant term out of the denominator.
$$\lim_{n \to \infty} \frac{a_n}{b_n} = \lim_{n \to \infty} \frac{n^{2} \sqrt{1 + 1/n}}{n^{2} \left( 1 - 2/n + 3/n^{2} \right)} = \lim_{n \to \infty} \frac{\sqrt{1 + 1/n}}{1 - 2/n + 3/n^{2}} = 1$$
We already know that the series $\sum_{n=1}^{\infty} b_n = \sum_{n=1}^{\infty} \frac{1}{n^{3/2}}$ converges by Example 5.3.6
with $p = \frac{3}{2}$. So our series converges by the limit comparison test, Theorem 5.3.11.
Example 5.3.12
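The limit in Step 2 can also be watched numerically. This is a small sketch with function names of our own choosing; it evaluates $a_n / b_n$ for increasingly large $n$.

```python
import math

def a(n: int) -> float:
    return math.sqrt(n + 1) / (n**2 - 2 * n + 3)

def b(n: int) -> float:
    return 1 / n**1.5

for n in (10, 1000, 10**6):
    # The ratio should settle down towards the limit L = 1
    print(n, a(n) / b(n))
```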
Example 5.3.13 ($\sum_{n=1}^{\infty} \frac{\sqrt{n+1}}{n^{2}-2n+3}$, again)
We can also try to deal with the series of Example 5.3.12, using the comparison test directly.
$n + 1 \le 2n$ and so
$$\sqrt{n+1} \le \sqrt{2n}$$
• The denominator is quite a bit more tricky, since we need a lower bound, rather than
an upper bound, and we cannot just write $|n^{2} - 2n + 3| \ge n^{2}$, which is false. Instead
we have to make a more careful argument. In particular, we’d like to find $N_0$ and $K'$
so that $n^{2} - 2n + 3 \ge K' n^{2}$, i.e. $\frac{1}{n^{2}-2n+3} \le \frac{1}{K' n^{2}}$ for all $n \ge N_0$. For $n \ge 4$, we have
$2n = \frac{1}{2} 4n \le \frac{1}{2} n \cdot n = \frac{1}{2} n^{2}$. So for $n \ge 4$,
$$n^{2} - 2n + 3 \ge n^{2} - \frac{1}{2} n^{2} + 3 \ge \frac{1}{2} n^{2}$$
$$\sum_{n=1}^{\infty} \frac{(-1)^{n}}{n}$$
31 There’s a really convenient test for convergence of series that alternate signs every term, the aptly-named
Alternating Series Test. You can find more information in Appendix A.12.1. The Alternating Series Test,
however, is not on our syllabus.
and Appendix A.13.) You can take our word for it that the rearrangement below does not
impact the convergence of this particular series.
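\begin{align*}
\sum_{n=1}^{\infty} \frac{(-1)^{n}}{n} &= -1 + \frac{1}{2} - \frac{1}{3} + \frac{1}{4} - \frac{1}{5} + \frac{1}{6} - \cdots \\
  &= \left( -1 + \frac{1}{2} \right) + \left( -\frac{1}{3} + \frac{1}{4} \right) + \left( -\frac{1}{5} + \frac{1}{6} \right) + \cdots
\end{align*}
We can compare $\sum \frac{-1}{2n(2n-1)}$ with $\sum \frac{1}{n^{2}}$ using the Limit Comparison Test:
$$a_n = \frac{-1}{2n(2n-1)} \qquad b_n = \frac{1}{n^{2}}$$
\begin{align*}
L &= \lim_{n \to \infty} \frac{a_n}{b_n} \\
  &= \lim_{n \to \infty} \frac{\frac{-1}{2n(2n-1)}}{\frac{1}{n^{2}}} \\
  &= \lim_{n \to \infty} \frac{-n^{2}}{2n(2n-1)} \\
  &= -\frac{1}{4}
\end{align*}
Since $\sum_{n=1}^{\infty} b_n$ converges, and $L = -\frac{1}{4}$ exists, $\sum_{n=1}^{\infty} a_n$ converges as well. That is, the
alternating harmonic series converges by the limit comparison test (and by trust in your
authors that the rearrangement we started with is, indeed, allowed).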
Example 5.3.14
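The pairing of consecutive terms can be checked numerically: the first $2M$ terms of the alternating harmonic series add up to the same number as the first $M$ grouped terms $\frac{-1}{2n(2n-1)}$. The sketch below uses function names of our own. The comparison with $-\ln 2$ relies on a standard fact about this series that is not derived in this text.

```python
import math

def alternating_partial(N: int) -> float:
    """Sum of (-1)**n / n for n = 1..N."""
    return sum((-1) ** n / n for n in range(1, N + 1))

def grouped_partial(M: int) -> float:
    """Sum of -1 / (2n(2n-1)) for n = 1..M, i.e. M consecutive pairs."""
    return sum(-1 / (2 * n * (2 * n - 1)) for n in range(1, M + 1))

for M in (5, 500, 50_000):
    print(M, alternating_partial(2 * M), grouped_partial(M), -math.log(2))
```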
converges when $|r| < 1$ and diverges otherwise. So the convergence of this series is com-
pletely determined by the number $r$. This number is just the ratio of successive terms,
that is $r = a_{n+1} / a_n$.
In general the ratio of successive terms of a series, $\frac{a_{n+1}}{a_n}$, is not constant, but depends on
$n$. However, as we have noted above, the convergence of a series $\sum a_n$ is determined by
the behaviour of its terms when $n$ is large. In this way, the behaviour of this ratio when
$n$ is small tells us nothing about the convergence of the series, but the limit of the ratio as
$n \to \infty$ does. This is the basis of the ratio test.
Theorem 5.3.15 (Ratio Test).
Warning 5.3.16.
Beware that the ratio test provides absolutely no conclusion about the conver-
gence or divergence of the series $\sum_{n=1}^{\infty} a_n$ if $\lim_{n \to \infty} \left| \frac{a_{n+1}}{a_n} \right| = 1$. See Example ??, below.
Proof. (a) Pick any number $R$ obeying $L < R < 1$. We are assuming that $\left| \frac{a_{n+1}}{a_n} \right|$ approaches
$L$ as $n \to \infty$. In particular there must be some natural number $M$ so that $\left| \frac{a_{n+1}}{a_n} \right| \le R$ for all
$n \ge M$. So $|a_{n+1}| \le R |a_n|$ for all $n \ge M$. In particular
\begin{align*}
|a_{M+1}| &\le R \, |a_M| \\
|a_{M+2}| &\le R \, |a_{M+1}| \le R^{2} \, |a_M| \\
|a_{M+3}| &\le R \, |a_{M+2}| \le R^{3} \, |a_M| \\
  &\;\;\vdots \\
|a_{M+\ell}| &\le R^{\ell} \, |a_M|
\end{align*}
for all $\ell \ge 0$. The series $\sum_{\ell=0}^{\infty} R^{\ell} |a_M|$ is a geometric series with ratio $R$ smaller than one in
magnitude and so converges. Consequently, by the comparison test with $a_n$ replaced by
$A_\ell = a_{M+\ell}$ and $c_n$ replaced by $C_\ell = R^{\ell} |a_M|$, the series $\sum_{\ell=1}^{\infty} a_{M+\ell} = \sum_{n=M+1}^{\infty} a_n$ converges. So
the series $\sum_{n=1}^{\infty} a_n$ converges too.
(b) We are assuming that $\left| \frac{a_{n+1}}{a_n} \right|$ approaches $L > 1$ as $n \to \infty$. In particular there must be
some natural number $M > N$ so that $\left| \frac{a_{n+1}}{a_n} \right| \ge 1$ for all $n \ge M$. So $|a_{n+1}| \ge |a_n|$ for all
Example 5.3.17 ($\sum_{n=0}^{\infty} a n x^{n-1}$)
$$\sum_{n=0}^{\infty} a_n \quad \text{with} \quad a_n = a \, n \, x^{n-1}$$
The ratio test now tells us that the series $\sum_{n=0}^{\infty} a \, n \, x^{n-1}$ converges if $|x| < 1$ and diverges if
$|x| > 1$. It says nothing about the cases $x = \pm 1$. But in both of those cases $a_n = a \, n \, (\pm 1)^{n}$
does not converge to zero as $n \to \infty$ and the series diverges by the divergence test.
Example 5.3.17
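For this series the ratio of successive terms is $\left| \frac{a_{n+1}}{a_n} \right| = \frac{n+1}{n} |x|$, which tends to $|x|$. The sketch below evaluates that ratio numerically. The function names are our own, and we take the constant $a = 1$ purely for illustration (any nonzero constant cancels in the ratio).

```python
def term(n: int, x: float, a: float = 1.0) -> float:
    """The n-th term a * n * x**(n-1); only n >= 1 is used here."""
    return a * n * x ** (n - 1)

def ratio(n: int, x: float) -> float:
    """|a_{n+1} / a_n| for n >= 1."""
    return abs(term(n + 1, x) / term(n, x))

for x in (0.5, -0.9, 2.0):
    # The ratios approach |x| as n grows
    print(x, [round(ratio(n, x), 4) for n in (1, 10, 100)])
```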
Notice that in the above example, we had to apply another convergence test in addition
to the ratio test. This will be commonplace when we reach power series and Taylor series
— the ratio test will tell us something like
• Divergence Test
– works well when the nth term in the series fails to converge to zero as n tends to
infinity
• Integral Test
32 We shall see later, in Theorem 5.5.12, that the function $\sum_{n=0}^{\infty} a \, n \, x^{n-1}$ is indeed the derivative of the
function $\sum_{n=0}^{\infty} a x^{n}$. Of course, such a statement only makes sense where these series converge; how
can you differentiate a divergent series? (This is not an allusion to a popular series of dystopian novels.)
Actually, there is quite a bit of interesting and useful mathematics involving divergent series, but it is
well beyond the scope of this course.
S EQUENCE AND S ERIES 5.4 A BSOLUTE AND C ONDITIONAL C ONVERGENCE
– works well when, if you substitute x for n in the nth term you get a function,
f ( x ), that you can integrate
– don’t forget to check that f ( x ) ě 0 and that f ( x ) decreases as x increases
• Ratio Test
– works well when $\frac{a_{n+1}}{a_n}$ simplifies enough that you can easily compute $\lim_{n \to \infty} \left| \frac{a_{n+1}}{a_n} \right| = L$
– this often happens when $a_n$ contains powers, like $7^{n}$, or factorials, like $n!$
– don’t forget that $L = 1$ tells you nothing about the convergence/divergence of
the series
• Comparison Test and Limit Comparison Test
– works well when, for very large $n$, the $n$th term $a_n$ is approximately the same as
a simpler term $b_n$ (see Example 5.3.10) and it is easy to determine whether or
not $\sum_{n=1}^{\infty} b_n$ converges
– don’t forget to check that bn ě 0
– usually the Limit Comparison Test is easier to apply than the Comparison Test
This is a simple geometric series and we know it converges. We have also seen, as ex-
amples 5.3.17 and ?? showed us, that we can multiply or divide the nth term by n and it
will still converge. We can even multiply the nth term by (´1)n , and it will still converge.
Pretty robust.
On the other hand, we have explored the Harmonic series and its relatives quite a lot
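and we know it is much more delicate. While
$$\sum_{n=1}^{\infty} \frac{1}{n}$$
diverges, the two series
$$\sum_{n=1}^{\infty} \frac{1}{n^{1.00000001}} \qquad \sum_{n=1}^{\infty} \frac{(-1)^{n}}{n}$$
both converge33.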
33 The first is a p-series with p ą 1; the second is the alternating harmonic series, which we found to
converge in Example 5.3.14.
This suggests that the divergence of the Harmonic series is much more delicate. In this
section, we discuss one way to characterize this sort of delicate convergence — especially
in the presence of changes of sign.
(a) A series $\sum_{n=1}^{\infty} a_n$ is said to converge absolutely if the series $\sum_{n=1}^{\infty} |a_n|$ converges.
(b) If $\sum_{n=1}^{\infty} a_n$ converges but $\sum_{n=1}^{\infty} |a_n|$ diverges we say that $\sum_{n=1}^{\infty} a_n$ is conditionally
convergent.
If you consider these definitions for a moment, it should be clear that absolute con-
vergence is a stronger condition than just simple convergence. All the terms in $\sum_n |a_n|$
are forced to be positive (by the absolute value signs), so that $\sum_n |a_n|$ must be bigger than
$\sum_n a_n$, making it easier for $\sum_n |a_n|$ to diverge. This is formalised by the following the-
orem, which is an immediate consequence of the comparison test, Theorem 5.3.8.a, with
$c_n = |a_n|$.
Recall that some of our convergence tests (for example, the integral test) may only be
applied to series with positive terms. Theorem 5.4.2 opens up the possibility of applying
“positive only” convergence tests to series whose terms are not all positive, by checking
for “absolute convergence” rather than for plain “convergence”.
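Example 5.4.3 ($\sum_{n=1}^{\infty} (-1)^{n-1} \frac{1}{n^{2}}$)
Because the series $\sum_{n=1}^{\infty} \left| (-1)^{n-1} \frac{1}{n^{2}} \right| = \sum_{n=1}^{\infty} \frac{1}{n^{2}}$ of Example 5.3.6 converges (by the integral
test), the series $\sum_{n=1}^{\infty} (-1)^{n-1} \frac{1}{n^{2}}$ converges absolutely, and hence converges.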
Example 5.4.3
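The behaviour described in Example 5.4.3 can be watched numerically. The following is only a sketch: the helper names are ours, and the two reference values $\pi^2/12$ and $\pi^2/6$ are standard facts that are not proved in this text.

```python
import math

def partial_sum(term, n_terms):
    """Add up term(n) for n = 1, 2, ..., n_terms."""
    return sum(term(n) for n in range(1, n_terms + 1))

def alternating_term(n):
    # n-th term of the series in Example 5.4.3
    return (-1) ** (n - 1) / n**2

def absolute_term(n):
    # absolute value of that term, i.e. the p-series with p = 2
    return 1 / n**2

if __name__ == "__main__":
    for n_terms in (10, 100, 1000, 10000):
        print(n_terms,
              partial_sum(alternating_term, n_terms),
              partial_sum(absolute_term, n_terms))
    # Known limits, quoted for comparison only.
    print("pi^2/12 =", math.pi**2 / 12, " pi^2/6 =", math.pi**2 / 6)
```

Both columns settle down as the number of terms grows, which is what absolute convergence (and hence convergence) predicts.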
Imagine flipping a coin infinitely many times. Set $\sigma_n = +1$ if the nth flip comes up heads and $\sigma_n = -1$ if the nth flip comes up tails. We know that the series
$$\sum_{n=1}^\infty \left|\sigma_n \frac{1}{n^2}\right| = \sum_{n=1}^\infty \frac{1}{n^2}$$
converges. So $\sum_{n=1}^\infty \sigma_n \frac{1}{n^2}$ converges absolutely, and hence converges.
Example 5.4.4
With series that converge conditionally, arithmetic can get a little tricky. For some interesting examples of this trickiness, see Appendix A.13.
SEQUENCE AND SERIES 5.5 POWER SERIES
Let us return to the geometric series
$$\sum_{n=0}^\infty x^n$$
where $x$ is some real number. As we have seen (back in Example 5.2.4), for $|x| < 1$ this series converges to a limit that varies with $x$, while for $|x| \ge 1$ the series diverges. Consequently we can consider this series to be a function of $x$
$$f(x) = \sum_{n=0}^\infty x^n \qquad \text{on the domain } |x| < 1.$$
Furthermore (also from Example 5.2.4) we know what the function is.
$$f(x) = \sum_{n=0}^\infty x^n = \frac{1}{1-x}.$$
Hence we can consider the series $\sum_{n=0}^\infty x^n$ as a new way of representing the function $\frac{1}{1-x}$ when $|x| < 1$. This series is an example of a power series.
Of course, representing a function as simple as $\frac{1}{1-x}$ by a series doesn't seem like it is going to make life easier. However the idea of representing a function by a series turns out to be extremely helpful. Power series turn out to be very robust mathematical objects and interact very nicely with not only standard arithmetic operations, but also with differentiation and integration (see Theorem 5.5.12). This means, for example, that
$$\begin{aligned}
\frac{d}{dx}\left\{\frac{1}{1-x}\right\} &= \frac{d}{dx}\sum_{n=0}^\infty x^n && \text{provided } |x| < 1 \\
&= \sum_{n=0}^\infty \frac{d}{dx} x^n && \text{just differentiate term by term} \\
&= \sum_{n=0}^\infty n x^{n-1}
\end{aligned}$$
We are hiding some mathematics under the word “just” in the above, but you can see that
once we have a power series representation of a function, differentiation and integration
become very straightforward.
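As a sanity check on the term-by-term differentiation above, one can compare partial sums of $\sum n x^{n-1}$ with the derivative $\frac{1}{(1-x)^2}$ obtained by the usual rules. This is a numerical sketch only; the function names and the choice of 500 terms are ours.

```python
def geometric_derivative_partial_sum(x, n_terms):
    """Partial sum of sum_{n>=1} n x^(n-1); the n = 0 term is zero."""
    return sum(n * x ** (n - 1) for n in range(1, n_terms + 1))

def exact_derivative(x):
    """d/dx of 1/(1-x), computed by the chain rule."""
    return 1 / (1 - x) ** 2

if __name__ == "__main__":
    for x in (-0.5, 0.1, 0.5, 0.9):
        print(x, geometric_derivative_partial_sum(x, 500), exact_derivative(x))
```

The agreement is only expected for $|x| < 1$, and it gets slower as $|x|$ approaches 1.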
So we should set as our goal for this section, the development of machinery to define
and understand power series. This will allow us to answer questions34 like
$$\text{Is } e^x = \sum_{n=0}^\infty \frac{x^n}{n!}\ ?$$
Our starting point (now that we have equipped ourselves with basic ideas about series),
is the definition of power series.
5.5.1 §§ Definitions
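Definition 5.5.1.
A series of the form
$$\sum_{n=0}^\infty A_n (x-c)^n = A_0 + A_1 (x-c) + A_2 (x-c)^2 + \cdots$$
is called a power series in $(x-c)$, or a power series centred on $c$. The numbers $A_n$ are called the coefficients of the power series.
For example $\sum_{n=0}^\infty \frac{x^n}{n!}$ is the power series with $c = 0$ and $A_n = \frac{1}{n!}$. Typically, as in that case, the coefficients $A_n$ are given fixed numbers, but the "$x$" is to be thought of as a variable. Thus each power series is really a whole family of series, with a different series for each value of $x$.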
Equation 5.5.2.
$$R = \frac{1}{A} = \left[\lim_{n\to\infty}\left|\frac{A_{n+1}}{A_n}\right|\right]^{-1}$$
35 By convention, when the term $(x-c)^0$ appears in a power series, it has value 1 for all values of $x$, even $x = c$.
36 The use of the word “radius” might seem a little odd here, since we are really describing the interval
in the real line where the series converges. However, when one starts to consider power series over
complex numbers, the radius of convergence does describe a circle inside the complex plane and so
“radius” is a more natural descriptor.
Definition 5.5.3.
(a) Let $0 < R < \infty$. If $\sum_{n=0}^\infty A_n (x-c)^n$ converges for $|x-c| < R$, and diverges for $|x-c| > R$, then we say that the series has radius of convergence $R$.
(b) If $\sum_{n=0}^\infty A_n (x-c)^n$ converges for every number $x$, we say that the series has an infinite radius of convergence.
(c) If $\sum_{n=0}^\infty A_n (x-c)^n$ diverges for every $x \ne c$, we say that the series has radius of convergence zero.
as expected.
Example 5.5.4
Example 5.5.7
n =1
37 Because of this, it might seem that such a series is fairly pointless. However there are all sorts of
mathematical games that can be played with them without worrying about their convergence. Such
“formal” power series can still impart useful information and the interested reader is invited to look up
“generating functions” with their preferred search engine.
we see that
$$A_0 = 1 \quad A_1 = 2 \quad A_2 = 1 \quad A_3 = 2 \quad A_4 = 1 \quad A_5 = 2 \quad \cdots$$
so that
$$\frac{A_1}{A_0} = 2 \quad \frac{A_2}{A_1} = \frac{1}{2} \quad \frac{A_3}{A_2} = 2 \quad \frac{A_4}{A_3} = \frac{1}{2} \quad \frac{A_5}{A_4} = 2 \quad \cdots$$
and $\frac{A_{n+1}}{A_n}$ does not converge as $n \to \infty$. Since the limit of the ratios does not exist, we cannot tell anything from the ratio test. Nonetheless, we can still figure out for which $x$'s our power series converges.
• Because every coefficient $A_n$ is either 1 or 2, the nth term in our series obeys
$$\left|A_n x^n\right| \le 2|x|^n$$
Example 5.5.8
Let's construct a series from the digits of π. Now to avoid dividing by zero, let us set
$$A_n = 1 + \text{the } n\text{th digit of } \pi$$
Since $\pi = 3.141592\ldots$
$$A_0 = 4 \quad A_1 = 2 \quad A_2 = 5 \quad A_3 = 2 \quad A_4 = 6 \quad A_5 = 10 \quad A_6 = 3 \quad \cdots$$
Consequently every $A_n$ is an integer between 1 and 10 and gives us the series
$$\sum_{n=0}^\infty A_n x^n = 4 + 2x + 5x^2 + 2x^3 + 6x^4 + 10x^5 + \cdots$$
The number π is irrational and consequently the ratio $\frac{A_{n+1}}{A_n}$ cannot have a limit as $n \to \infty$. If you do not understand why this is the case then don't worry too much about it38. As in the last example, the limit of the ratios does not exist and we cannot tell anything from the ratio test. But we can still figure out for which $x$'s it converges.
38 This is a little beyond the scope of the course. Roughly speaking, think about what would happen if the limit of the ratios did exist. If the limit were smaller than 1, then it would tell you that the terms of our series must be getting smaller and smaller and smaller, which is impossible because they are all integers between 1 and 10. Similarly if the limit existed and were bigger than 1 then the terms of the series would have to get bigger and bigger and bigger, which is also impossible. Hence if the limit exists then it must be equal to 1. But in that case, because the terms are integers, they would have to be all equal when $n$ became big enough. That means that the expansion of π would be eventually periodic, something that only rational numbers do.
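• Because every coefficient $A_n$ is no bigger (in magnitude) than 10, the nth term in our series obeys
$$\left|A_n x^n\right| \le 10|x|^n$$
and so is no bigger than the nth term of the geometric series $\sum_{n=0}^\infty 10|x|^n$, which converges when $|x| < 1$. By the comparison test, our series converges absolutely for all $|x| < 1$.
• Since every $A_n$ is at least one, the nth term in our series obeys
$$\left|A_n x^n\right| \ge |x|^n$$
If $|x| \ge 1$, the terms $A_n x^n$ cannot converge to zero as $n \to \infty$, and our series diverges by the divergence test.
In conclusion, our series converges if and only if $|x| < 1$, and so has radius of convergence 1.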
Example 5.5.8
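The two-sided squeeze used in Example 5.5.8 can be illustrated with a short computation. This sketch hard-codes the first twenty digits of π (3.1415926535897932384) and compares a partial sum of $\sum A_n x^n$ with the geometric sums that bound it; the names are ours and only a finite piece of the series is examined.

```python
PI_DIGITS = "31415926535897932384"  # first 20 digits of pi, typed in by hand

def coefficients():
    """A_n = 1 + (n-th digit of pi), for the digits stored above."""
    return [1 + int(d) for d in PI_DIGITS]

def partial_sum(x):
    """Partial sum of sum A_n x^n over the stored digits."""
    return sum(a * x**n for n, a in enumerate(coefficients()))

def geometric_partial_sum(x):
    """Partial sum of sum |x|^n with the same number of terms."""
    return sum(abs(x) ** n for n in range(len(PI_DIGITS)))

if __name__ == "__main__":
    x = 0.5
    g = geometric_partial_sum(x)
    print(g, "<=", partial_sum(x), "<=", 10 * g)
```

For positive $x$ the partial sum is trapped between the geometric sum and ten times the geometric sum, mirroring the two bullet points of the example.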
Though we won't prove it, it is true that every power series has a radius of convergence, whether or not the limit $\lim_{n\to\infty}\left|\frac{A_{n+1}}{A_n}\right|$ exists.
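Theorem 5.5.9.
Let $\sum_{n=0}^\infty A_n (x-c)^n$ be a power series. Then one of the following alternatives must hold.
(a) The power series converges for every number $x$. In this case we say that the radius of convergence is $\infty$.
(b) There is a number $0 < R < \infty$ such that the series converges for $|x-c| < R$ and diverges for $|x-c| > R$. In this case we say that the radius of convergence is $R$.
(c) The series converges for $x = c$ and diverges for all $x \ne c$. In this case, we say that the radius of convergence is 0.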
Definition 5.5.10.
Consider the power series $\sum_{n=0}^\infty A_n (x-c)^n$. The set of real $x$-values for which it converges is called the interval of convergence of the series.
Suppose that the power series $\sum_{n=0}^\infty A_n (x-c)^n$ has radius of convergence $R$. Then from Theorem 5.5.9, we have that
• if $0 < R < \infty$, then we know that the series converges for any $x$ which obeys
$$|x-c| < R \quad\text{or equivalently}\quad -R < x-c < R \quad\text{or equivalently}\quad c-R < x < c+R$$
But we do not (yet) know whether or not the series converges at the two end points of that interval. We do know, however, that its interval of convergence must be one of
$$(c-R,\ c+R) \qquad [c-R,\ c+R) \qquad (c-R,\ c+R] \qquad [c-R,\ c+R]$$
To reiterate: while the radius of convergence, $R$ with $0 < R < \infty$, tells us that the series converges for $|x-c| < R$ and diverges for $|x-c| > R$, it does not (by itself) tell us whether or not the series converges when $|x-c| = R$, i.e. when $x = c \pm R$. We will not generally concern ourselves with these final details. (Determining the behaviour at the endpoints of the interval of convergence often goes smoothest with the Alternating Series Test, which is available for your interest in Appendix A.12 but is not a part of our syllabus.)
Example 5.5.11
We are told that a certain power series with centre c = 3, converges at x = 4 and diverges
at x = 1. What else can we say about the convergence or divergence of the series for other
values of x?
We are told that the series is centred at 3, so its terms are all powers of $(x-3)$ and it is of the form
$$\sum_{n \ge 0} A_n (x-3)^n.$$
A good way to summarise the convergence data we are given is with a figure like the one
below. Green dots mark the values of x where the series is known to converge. (Recall
that every power series converges at its centre.) The red dot marks the value of x where
the series is known to diverge. The bull’s eye marks the centre.
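[Figure: number line marking $x = 1$ (red, divergence), $x = 3$ (bull's eye, the centre) and $x = 4$ (green, convergence).]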
Can we say more about the convergence and/or divergence of the series for other values
of x? Yes!
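Let us think about the radius of convergence, $R$, of the series. We know that it must exist and the information we have been given allows us to bound $R$. Recall that the series converges at $x$ if $|x-3| < R$ and diverges at $x$ if $|x-3| > R$.
• We have been told that the series converges at $x = 4$, so
◦ $x = 4$ cannot obey $|x-3| > R$ so
◦ $x = 4$ must obey $|x-3| \le R$, i.e. $|4-3| \le R$, i.e. $R \ge 1$
• We have been told that the series diverges at $x = 1$, so
◦ $x = 1$ cannot obey $|x-3| < R$ so
◦ $x = 1$ must obey $|x-3| \ge R$, i.e. $|1-3| \ge R$, i.e. $R \le 2$
We still do not know $R$ exactly, but we do know that $1 \le R \le 2$. Consequently,
• since 1 is the smallest that $R$ could be, the series certainly converges at $x$ if $|x-3| < 1$, i.e. if $2 < x < 4$ and
• since 2 is the largest that $R$ could be, the series certainly diverges at $x$ if $|x-3| > 2$, i.e. if $x > 5$ or if $x < 1$.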
The following figure provides a resume of all of this convergence data: there is convergence at green $x$'s and divergence at red $x$'s.
[Figure: number line with ticks at 1, 2, 3, 4, 5; green (convergence) for $2 < x \le 4$, red (divergence) for $x \le 1$ and for $x > 5$.]
Notice that from the data given we cannot say anything about the convergence or diver-
gence of the series on the intervals (1, 2] and (4, 5].
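One lesson that we can draw from this example is that, if a power series has centre $c$ and converges at $a$,
• then it also converges at all points between $c$ and $a$, as well as at all points of distance strictly less than $|a-c|$ from $c$ on the other side of $c$ from $a$.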
Example 5.5.11
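The reasoning of Example 5.5.11 is mechanical enough to write down as a small routine. The sketch below is our own illustration (the function names are hypothetical): it turns a list of points of known convergence and divergence into bounds on $R$, and then classifies other points as far as the data allow.

```python
def radius_bounds(centre, converges_at, diverges_at):
    """Return (lower, upper) with lower <= R <= upper."""
    # Convergence at x forces R >= |x - centre|.
    lower = max((abs(x - centre) for x in converges_at), default=0)
    # Divergence at x forces R <= |x - centre|.
    upper = min((abs(x - centre) for x in diverges_at), default=float("inf"))
    if lower > upper:
        raise ValueError("the data are inconsistent for a power series")
    return lower, upper

def classify(x, centre, converges_at, diverges_at):
    """Say what the data force at x: 'converges', 'diverges' or 'unknown'."""
    if x == centre or x in converges_at:
        return "converges"
    if x in diverges_at:
        return "diverges"
    lower, upper = radius_bounds(centre, converges_at, diverges_at)
    distance = abs(x - centre)
    if distance < lower:
        return "converges"
    if distance > upper:
        return "diverges"
    return "unknown"

if __name__ == "__main__":
    print(radius_bounds(3, [4], [1]))
    for x in (0, 1.5, 2, 2.5, 4.5, 5, 6):
        print(x, classify(x, 3, [4], [1]))
```

With the data of the example (centre 3, convergence at 4, divergence at 1) the routine reports $1 \le R \le 2$ and leaves the points of $(1, 2]$ and $(4, 5]$ undecided.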
Just as we have done previously with limits, differentiation and integration, we can con-
struct power series representations of more complicated functions by using those of sim-
pler functions. Here is a theorem that helps us to do so.
Assume that the functions $f(x)$ and $g(x)$ are given by the power series
$$f(x) = \sum_{n=0}^\infty A_n (x-c)^n \qquad g(x) = \sum_{n=0}^\infty B_n (x-c)^n$$
Example 5.5.13
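The last statement of Theorem 5.5.12 might seem a little odd, but consider the following two power series centred at 0:
$$\sum_{n=0}^\infty 2^n x^n \qquad\text{and}\qquad \sum_{n=0}^\infty (1-2^n) x^n.$$
The ratio test tells us that they both have radius of convergence $R = \frac{1}{2}$. However their sum is
$$\sum_{n=0}^\infty 2^n x^n + \sum_{n=0}^\infty (1-2^n) x^n = \sum_{n=0}^\infty x^n$$
which has the larger radius of convergence 1.
A more extreme example of the same phenomenon is supplied by the two series
$$\sum_{n=0}^\infty 2^n x^n \qquad\text{and}\qquad \sum_{n=0}^\infty (-2^n) x^n.$$
They are both geometric series with radius of convergence $R = \frac{1}{2}$. But their sum is
$$\sum_{n=0}^\infty 2^n x^n + \sum_{n=0}^\infty (-2^n) x^n = \sum_{n=0}^\infty (0) x^n$$
which has infinite radius of convergence.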
We'll now use this theorem to build power series representations for a bunch of functions out of the one simple power series representation that we know, namely the geometric series
$$\frac{1}{1-x} = \sum_{n=0}^\infty x^n \qquad\text{for all } |x| < 1$$
Example 5.5.14 ($\frac{1}{1-x^2}$)
Find a power series representation for $\frac{1}{1-x^2}$.
Solution. The secret to finding power series representations for a good many functions is to manipulate them into a form in which $\frac{1}{1-y}$ appears and use the geometric series representation $\frac{1}{1-y} = \sum_{n=0}^\infty y^n$. We have deliberately renamed the variable to $y$ here; it does not have to be $x$. We can use that strategy to find a power series expansion for $\frac{1}{1-x^2}$. We just have to recognize that $\frac{1}{1-x^2}$ is the same as $\frac{1}{1-y}$ if we set $y$ to $x^2$.
$$\begin{aligned}
\frac{1}{1-x^2} &= \frac{1}{1-y}\bigg|_{y=x^2} = \left[\sum_{n=0}^\infty y^n\right]_{y=x^2} && \text{if } |y| < 1, \text{ i.e. } |x| < 1 \\
&= \sum_{n=0}^\infty \left(x^2\right)^n = \sum_{n=0}^\infty x^{2n} \\
&= 1 + x^2 + x^4 + x^6 + \cdots
\end{aligned}$$
This is a perfectly good power series. There is nothing wrong with the power of $x$ being $2n$. (This just means that the coefficients of all odd powers of $x$ are zero.) In fact, you should try to always write power series in forms that are as easy to understand as possible. The geometric series that we used at the end of the first line converges for
$$|y| < 1 \iff \left|x^2\right| < 1 \iff |x| < 1$$
So our power series has radius of convergence 1 and interval of convergence $-1 < x < 1$.
Example 5.5.14
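Example 5.5.15 ($\frac{x}{2+x^2}$)
Find a power series representation for $\frac{x}{2+x^2}$.
Solution. This example is just a more algebraically involved variant of the last one. Again, the strategy is to manipulate $\frac{x}{2+x^2}$ into a form in which $\frac{1}{1-y}$ appears.
$$\begin{aligned}
\frac{x}{2+x^2} &= \frac{x}{2}\,\frac{1}{1+x^2/2} = \frac{x}{2}\,\frac{1}{1-\left(-x^2/2\right)} && \text{set } -\frac{x^2}{2} = y \\
&= \frac{x}{2}\,\frac{1}{1-y}\bigg|_{y=-\frac{x^2}{2}} = \frac{x}{2}\left[\sum_{n=0}^\infty y^n\right]_{y=-\frac{x^2}{2}} && \text{if } |y| < 1 \\
&= \frac{x}{2}\sum_{n=0}^\infty \left(-\frac{x^2}{2}\right)^n = \frac{x}{2}\sum_{n=0}^\infty \frac{(-1)^n}{2^n} x^{2n} = \sum_{n=0}^\infty \frac{(-1)^n}{2^{n+1}} x^{2n+1} && \text{by Theorem 5.5.12, twice} \\
&= \frac{x}{2} - \frac{x^3}{4} + \frac{x^5}{8} - \frac{x^7}{16} + \cdots
\end{aligned}$$
The geometric series that we used in the second line converges when
$$|y| < 1 \iff \left|-x^2/2\right| < 1 \iff |x|^2 < 2 \iff |x| < \sqrt{2}$$
So the given power series has radius of convergence $\sqrt{2}$ and interval of convergence $-\sqrt{2} < x < \sqrt{2}$.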
Example 5.5.15
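A quick numerical experiment makes the radius of convergence $\sqrt{2}$ found in Example 5.5.15 tangible. The sketch below (function names are ours) evaluates partial sums of $\sum \frac{(-1)^n}{2^{n+1}} x^{2n+1}$ at one point inside the interval of convergence and one point outside it.

```python
def series_partial_sum(x, n_terms):
    """Partial sum of sum_{n>=0} (-1)^n x^(2n+1) / 2^(n+1)."""
    return sum((-1) ** n * x ** (2 * n + 1) / 2 ** (n + 1)
               for n in range(n_terms))

def exact(x):
    """The function being represented, x / (2 + x^2)."""
    return x / (2 + x * x)

if __name__ == "__main__":
    # Inside the interval of convergence: partial sums approach exact(x).
    print(series_partial_sum(1.0, 60), exact(1.0))
    # Outside it (1.5 > sqrt(2)): partial sums blow up although exact(x) is fine.
    print(series_partial_sum(1.5, 200), exact(1.5))
```

Note that $\frac{x}{2+x^2}$ itself is perfectly well defined at $x = 1.5$; it is only the series representation that fails there.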
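Example 5.5.16
Find a power series representation for $\frac{1}{5-x}$ with centre 3.
Solution. Since the centre is to be 3, we need powers of $(x-3)$ rather than powers of $x$. We can force an $(x-3)$ to appear by adding and subtracting 3.
$$\frac{1}{5-x} = \frac{1}{5-(x-3)-3} = \frac{1}{2-(x-3)}$$
Now we continue, as in the last example, by manipulating $\frac{1}{2-(x-3)}$ into a form in which $\frac{1}{1-y}$ appears.
$$\begin{aligned}
\frac{1}{5-x} &= \frac{1}{2-(x-3)} = \frac{1}{2}\,\frac{1}{1-\frac{x-3}{2}} && \text{set } \frac{x-3}{2} = y \\
&= \frac{1}{2}\,\frac{1}{1-y}\bigg|_{y=\frac{x-3}{2}} = \frac{1}{2}\left[\sum_{n=0}^\infty y^n\right]_{y=\frac{x-3}{2}} && \text{if } |y| < 1 \\
&= \frac{1}{2}\sum_{n=0}^\infty \left(\frac{x-3}{2}\right)^n = \sum_{n=0}^\infty \frac{(x-3)^n}{2^{n+1}} \\
&= \frac{1}{2} + \frac{x-3}{4} + \frac{(x-3)^2}{8} + \cdots
\end{aligned}$$
The geometric series that we used in the second line converges when
$$|y| < 1 \iff \left|\frac{x-3}{2}\right| < 1 \iff |x-3| < 2 \iff -2 < x-3 < 2 \iff 1 < x < 5$$
So the power series has radius of convergence 2 and interval of convergence $1 < x < 5$.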
Example 5.5.16
In the previous two examples, to construct a new series from an existing series, we
replaced x by a simple function. The following theorem gives us some more (but certainly
not all) commonly used substitutions.
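Theorem 5.5.17.
Assume that the function $f(x)$ is given by the power series
$$f(x) = \sum_{n=0}^\infty A_n x^n$$
for all $x$ in the interval $I$. Also let $K$ and $k$ be real constants. Then
$$f\left(Kx^k\right) = \sum_{n=0}^\infty A_n K^n x^{kn}$$
whenever $Kx^k$ is in $I$.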
Example 5.5.18 ($\frac{1}{(1-x)^2}$)
Find a power series representation for $\frac{1}{(1-x)^2}$.
Solution. Once again the trick is to express $\frac{1}{(1-x)^2}$ in terms of $\frac{1}{1-x}$. Notice that
$$\begin{aligned}
\frac{1}{(1-x)^2} &= \frac{d}{dx}\left\{\frac{1}{1-x}\right\} \\
&= \frac{d}{dx}\left\{\sum_{n=0}^\infty x^n\right\} \\
&= \sum_{n=1}^\infty n x^{n-1} && \text{by Theorem 5.5.12}
\end{aligned}$$
Note that the $n = 0$ term has disappeared because, for $n = 0$,
$$\frac{d}{dx} x^n = \frac{d}{dx} x^0 = \frac{d}{dx} 1 = 0$$
Also note that the radius of convergence of this series is one. We can see this via Theorem 5.5.12. That theorem tells us that the radius of convergence of a power series is not changed by differentiation, and since $\sum_{n=0}^\infty x^n$ has radius of convergence one, so too does its derivative.
Without much more work we can determine the interval of convergence by testing at $x = \pm 1$. When $x = \pm 1$ the terms of the series do not go to zero as $n \to \infty$ and so, by the divergence test, the series does not converge there. Hence the interval of convergence for the series is $-1 < x < 1$.
Example 5.5.18
Notice that, in this last example, we differentiated a known series to get to our answer. As
per Theorem 5.5.12, the radius of convergence didn’t change. In addition, in this par-
ticular example, the interval of convergence didn’t change. This is not always the case.
Differentiation of some series causes the interval of convergence to shrink. In particular
the differentiated series may no longer be convergent at the end points of the interval39 .
Similarly, when we integrate a power series the radius of convergence is unchanged, but
the interval of convergence may expand to include one or both ends, as illustrated by the
next example.
39 Consider the power series $\sum_{n=1}^\infty \frac{x^n}{n}$. We know that its interval of convergence is $-1 \le x < 1$. (Indeed see the next example.) When we differentiate the series we get the geometric series $\sum_{n=0}^\infty x^n$ which has interval of convergence $-1 < x < 1$.
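Example 5.5.19 ($\ln(1+x)$)
Find a power series representation for $\ln(1+x)$.
Solution. Recall that $\frac{d}{dx}\ln(1+x) = \frac{1}{1+x}$ so that $\ln(1+t)$ is an antiderivative of $\frac{1}{1+t}$ and
$$\begin{aligned}
\ln(1+x) &= \int_0^x \frac{dt}{1+t} = \int_0^x \left[\sum_{n=0}^\infty (-t)^n\right] dt \\
&= \sum_{n=0}^\infty \int_0^x (-t)^n \, dt && \text{by Theorem 5.5.12} \\
&= \sum_{n=0}^\infty (-1)^n \frac{x^{n+1}}{n+1} \\
&= x - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{4} + \cdots
\end{aligned}$$
Theorem 5.5.12 guarantees that the radius of convergence is exactly one (the radius of convergence of the geometric series $\sum_{n=0}^\infty (-t)^n$) and that
$$\ln(1+x) = \sum_{n=0}^\infty (-1)^n \frac{x^{n+1}}{n+1} \qquad\text{for all } -1 < x < 1$$
In general, we won't worry about the endpoints of the interval of convergence. So, in general, we wouldn't bother testing $x = 1$ and $x = -1$. However, in this instance, both endpoints are pretty accessible. We include them below for interest.
When $x = -1$ our series reduces to $\sum_{n=0}^\infty \frac{-1}{n+1}$, which is (minus) the harmonic series and so diverges. When $x = 1$ our series reduces to the alternating harmonic series, which converges (Example 5.3.14).
Example 5.5.19
A similar computation, integrating the geometric series for $\frac{1}{1+t^2}$ term by term from $0$ to $x$, gives
$$\arctan x = \sum_{n=0}^\infty (-1)^n \frac{x^{2n+1}}{2n+1} \qquad\text{for all } -1 < x < 1$$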
S EQUENCE AND S ERIES 5.6 TAYLOR S ERIES
Since we’re not generally concerned with the endpoints of the interval of convergence,
we’ll leave as a mystery whether the series converges at x = 1 and x = ´1.
Example 5.5.20
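The two series obtained by term-by-term integration can be compared with the library implementations of the logarithm and the arctangent. The following is a sketch with names of our own choosing; it only probes points strictly inside the interval $-1 < x < 1$.

```python
import math

def log1p_series(x, n_terms):
    """Partial sum of sum_{n>=0} (-1)^n x^(n+1) / (n+1), the series for ln(1+x)."""
    return sum((-1) ** n * x ** (n + 1) / (n + 1) for n in range(n_terms))

def arctan_series(x, n_terms):
    """Partial sum of sum_{n>=0} (-1)^n x^(2n+1) / (2n+1), the series for arctan x."""
    return sum((-1) ** n * x ** (2 * n + 1) / (2 * n + 1) for n in range(n_terms))

if __name__ == "__main__":
    for x in (-0.5, 0.25, 0.5):
        print(x, log1p_series(x, 60), math.log(1 + x))
        print(x, arctan_series(x, 40), math.atan(x))
```

Close to the endpoints many more terms are needed, which is one practical reason the endpoint behaviour is treated separately.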
The operations on power series dealt with in Theorem 5.5.12 are fairly easy to apply. Unfortunately taking the product, ratio or composition of two power series is more involved and is beyond the scope of this course40. Moreover, Theorem 5.5.12 alone will not get us power series representations of many of our standard functions (like $e^x$ and $\sin x$). Fortunately we can find such representations by extending Taylor polynomials41 to Taylor series.
• The quadratic approximation to $f(x)$ near the point $a$ is
$$f(x) \approx f(a) + f'(a)\,(x-a) + \frac{1}{2} f''(a)\,(x-a)^2$$
• In general, the Taylor polynomial of degree $n$, for the function $f(x)$, about the expansion point $a$, is the polynomial, $T_n(x)$, determined by the requirements that $f^{(k)}(a) = T_n^{(k)}(a)$ for all $0 \le k \le n$. That is, $f$ and $T_n$ have the same derivatives at $a$, up to order $n$. Explicitly,
$$\begin{aligned}
f(x) \approx T_n(x) &= f(a) + f'(a)\,(x-a) + \frac{1}{2} f''(a)\,(x-a)^2 + \cdots + \frac{1}{n!} f^{(n)}(a)\,(x-a)^n \\
&= \sum_{k=0}^n \frac{1}{k!} f^{(k)}(a)\,(x-a)^k
\end{aligned}$$
These are, of course, approximations (often very good approximations near $x = a$) but still just approximations. One might hope that if we let the degree, $n$, of the approximation go to infinity then the error in the approximation might go to zero. If that is the case then the "infinite" Taylor polynomial would be an exact representation of the function. Let's see how this might work.
40 As always, a quick visit to your favourite search engine will direct the interested reader to more information.
41 Now is a good time to review your notes from last term, though we'll give you a whirlwind review over the next page or two.
42 Please review your notes from last term if this material is feeling a little unfamiliar.
Fix a real number $a$ and suppose that all derivatives of the function $f(x)$ exist. Then, for any natural number $n$,
Equation 5.6.1.
$$f(x) = T_n(x) + E_n(x)$$
where $T_n(x)$ is the Taylor polynomial of degree $n$ for the function $f(x)$ expanded about $a$, and $E_n(x) = f(x) - T_n(x)$ is the error in our approximation. The Taylor polynomial43 is given by the formula
Equation 5.6.1-a
$$T_n(x) = f(a) + f'(a)\,(x-a) + \cdots + \frac{1}{n!} f^{(n)}(a)\,(x-a)^n$$
while the error satisfies
Equation 5.6.1-b
$$E_n(x) = \frac{1}{(n+1)!} f^{(n+1)}(c)\,(x-a)^{n+1}$$
for some $c$ strictly between $a$ and $x$. Note that we typically do not know the value of $c$ in the formula for the error. Instead we use the bounds on $c$ to find bounds on $f^{(n+1)}(c)$ and so bound the error44.
In order for our Taylor polynomial to be an exact representation of the function $f(x)$ we need the error $E_n(x)$ to be zero. This will not happen when $n$ is finite unless $f(x)$ is a polynomial. However it can happen in the limit as $n \to \infty$, and in that case we can write $f(x)$ as the limit
$$f(x) = \lim_{n\to\infty} T_n(x) = \lim_{n\to\infty} \sum_{k=0}^n \frac{1}{k!} f^{(k)}(a)\,(x-a)^k$$
which is a power series representation of the function. Let us formalise this in a definition.
Definition 5.6.2.
The Taylor series for the function $f(x)$ expanded around $a$ is the power series
$$f(x) = \sum_{n=0}^\infty \frac{1}{n!} f^{(n)}(a)\,(x-a)^n$$
provided the series converges. When $a = 0$ it is also called the Maclaurin series of $f(x)$.
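For $f(x) = e^x$ expanded about $a = 0$, every derivative obeys $f^{(n)}(0) = e^0 = 1$, so Equation 5.6.1 reads
$$e^x = f(x) = 1 + x + \frac{x^2}{2!} + \cdots + \frac{x^n}{n!} + E_n(x)$$
We shall see, in the optional Example 5.6.6 below, that, for any fixed $x$, $\lim_{n\to\infty} E_n(x) = 0$. Consequently, for all $x$,
$$e^x = \lim_{n\to\infty}\left[1 + x + \frac{1}{2}x^2 + \frac{1}{3!}x^3 + \cdots + \frac{1}{n!}x^n\right] = \sum_{n=0}^\infty \frac{1}{n!} x^n$$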
Example 5.6.3
We have now seen power series representations for the functions
$$\frac{1}{1-x} \qquad \frac{1}{(1-x)^2} \qquad \ln(1+x) \qquad \arctan(x) \qquad e^x.$$
We do not think that you, the reader, will be terribly surprised to see that we develop
series for sine and cosine next.
Example 5.6.4 (Sine and Cosine Series)
The trigonometric functions sin x and cos x also have widely used Maclaurin series ex-
pansions (i.e. Taylor series expansions about a = 0). To find them, we first compute all
derivatives at general $x$.
$$f(x) = \sin x \quad f'(x) = \cos x \quad f''(x) = -\sin x \quad f^{(3)}(x) = -\cos x \quad f^{(4)}(x) = \sin x \quad \cdots$$
$$g(x) = \cos x \quad g'(x) = -\sin x \quad g''(x) = -\cos x \quad g^{(3)}(x) = \sin x \quad g^{(4)}(x) = \cos x \quad \cdots$$
Now set $x = a = 0$.
$$f(x) = \sin x \quad f(0) = 0 \quad f'(0) = 1 \quad f''(0) = 0 \quad f^{(3)}(0) = -1 \quad f^{(4)}(0) = 0 \quad \cdots$$
$$g(x) = \cos x \quad g(0) = 1 \quad g'(0) = 0 \quad g''(0) = -1 \quad g^{(3)}(0) = 0 \quad g^{(4)}(0) = 1 \quad \cdots$$
For $\sin x$, all even numbered derivatives (at $x = 0$) are zero, while the odd numbered derivatives alternate between 1 and $-1$. Very similarly, for $\cos x$, all odd numbered derivatives (at $x = 0$) are zero, while the even numbered derivatives alternate between 1 and $-1$. So, the Taylor polynomials that best approximate $\sin x$ and $\cos x$ near $x = a = 0$ are
$$\sin x \approx x - \frac{1}{3!}x^3 + \frac{1}{5!}x^5 - \cdots$$
$$\cos x \approx 1 - \frac{1}{2!}x^2 + \frac{1}{4!}x^4 - \cdots$$
We shall see, in the optional Example 5.6.8 below, that, for both $\sin x$ and $\cos x$, we have $\lim_{n\to\infty} E_n(x) = 0$ so that
$$f(x) = \lim_{n\to\infty}\left[f(0) + f'(0)\,x + \cdots + \frac{1}{n!} f^{(n)}(0)\,x^n\right]$$
$$g(x) = \lim_{n\to\infty}\left[g(0) + g'(0)\,x + \cdots + \frac{1}{n!} g^{(n)}(0)\,x^n\right]$$
Reviewing the patterns we found in the derivatives, we conclude that, for all $x$,
$$\sin x = x - \frac{1}{3!}x^3 + \frac{1}{5!}x^5 - \cdots = \sum_{n=0}^\infty (-1)^n \frac{1}{(2n+1)!} x^{2n+1}$$
$$\cos x = 1 - \frac{1}{2!}x^2 + \frac{1}{4!}x^4 - \cdots = \sum_{n=0}^\infty (-1)^n \frac{1}{(2n)!} x^{2n}$$
and, in particular, both of the series on the right hand sides converge for all $x$.
We could also test for convergence of the series using the ratio test. Computing the ratios of successive terms in these two series gives us
$$\left|\frac{A_{n+1}}{A_n}\right| = \frac{|x|^{2n+3}/(2n+3)!}{|x|^{2n+1}/(2n+1)!} = \frac{|x|^2}{(2n+3)(2n+2)}$$
$$\left|\frac{A_{n+1}}{A_n}\right| = \frac{|x|^{2n+2}/(2n+2)!}{|x|^{2n}/(2n)!} = \frac{|x|^2}{(2n+2)(2n+1)}$$
for sine and cosine respectively. Hence as $n \to \infty$ these ratios go to zero and consequently both series are convergent for all $x$. (This is very similar to what was observed in Example 5.5.5.)
Example 5.6.4
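To see the Maclaurin series of Example 5.6.4 at work, one can sum a modest number of terms and compare with the library sine and cosine. This is a sketch only; the function names and the number of terms are our own choices.

```python
import math

def sin_series(x, n_terms):
    """Partial sum of sum_{n>=0} (-1)^n x^(2n+1) / (2n+1)!."""
    return sum((-1) ** n * x ** (2 * n + 1) / math.factorial(2 * n + 1)
               for n in range(n_terms))

def cos_series(x, n_terms):
    """Partial sum of sum_{n>=0} (-1)^n x^(2n) / (2n)!."""
    return sum((-1) ** n * x ** (2 * n) / math.factorial(2 * n)
               for n in range(n_terms))

if __name__ == "__main__":
    for x in (0.1, 1.0, 2.0, 5.0):
        print(x, sin_series(x, 20), math.sin(x), cos_series(x, 20), math.cos(x))
```

Because the factorials in the denominators grow so quickly, a handful of terms already gives many correct digits for moderate $x$, in line with the ratio test computation above.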
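Theorem 5.6.5.
$$e^x = \sum_{n=0}^\infty \frac{x^n}{n!} = 1 + x + \frac{1}{2!}x^2 + \frac{1}{3!}x^3 + \cdots \qquad\text{for all } -\infty < x < \infty$$
$$\sin(x) = \sum_{n=0}^\infty (-1)^n \frac{1}{(2n+1)!} x^{2n+1} = x - \frac{1}{3!}x^3 + \frac{1}{5!}x^5 - \cdots \qquad\text{for all } -\infty < x < \infty$$
$$\cos(x) = \sum_{n=0}^\infty (-1)^n \frac{1}{(2n)!} x^{2n} = 1 - \frac{1}{2!}x^2 + \frac{1}{4!}x^4 - \cdots \qquad\text{for all } -\infty < x < \infty$$
$$\frac{1}{1-x} = \sum_{n=0}^\infty x^n = 1 + x + x^2 + x^3 + \cdots \qquad\text{for all } -1 < x < 1$$
$$\ln(1+x) = \sum_{n=0}^\infty (-1)^n \frac{x^{n+1}}{n+1} = x - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{4} + \cdots \qquad\text{for all } -1 < x \le 1$$
$$\arctan x = \sum_{n=0}^\infty (-1)^n \frac{x^{2n+1}}{2n+1} = x - \frac{x^3}{3} + \frac{x^5}{5} - \cdots \qquad\text{for all } -1 \le x \le 1$$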
Notice that the series for sine and cosine sum to something that looks very similar to
45 The reader might ask whether or not we will give the series for other trigonometric functions or their
inverses. While the tangent function has a perfectly well defined series, its coefficients are not as simple
as those of the series we have seen — they form a sequence of numbers known (perhaps unsurprisingly)
as the “tangent numbers”. They, and the related Bernoulli numbers, have many interesting properties,
links to which the interested reader can find with their favourite search engine. The Maclaurin series
for inverse sine is
$$\arcsin(x) = \sum_{n=0}^\infty \frac{4^{-n}}{2n+1}\,\frac{(2n)!}{(n!)^2}\, x^{2n+1}$$
which is quite tidy, but proving it is beyond the scope of the course.
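For any natural number $n$,
$$e^x = 1 + x + \frac{x^2}{2!} + \cdots + \frac{x^n}{n!} + E_n(x)$$
By (5.6.1-b)
$$E_n(x) = \frac{1}{(n+1)!}\, e^c x^{n+1}$$
for some (unknown) $c$ between 0 and $x$. Fix any real number $x$. We'll now show that $E_n(x)$ converges to zero as $n \to \infty$.
To do this we need to bound the size of $e^c$, and to do this, consider what happens if $x$ is positive or negative.
• If $x < 0$ then $x \le c \le 0$ and hence $e^x \le e^c \le e^0 = 1$.
• If $x \ge 0$ then $0 \le c \le x$ and hence $1 = e^0 \le e^c \le e^x$.
In either case we have $0 \le e^c \le 1 + e^x$, so that
$$|E_n(x)| \le \left[e^x + 1\right] \frac{|x|^{n+1}}{(n+1)!}$$
We claim that this upper bound, and hence the error $E_n(x)$, quickly shrinks to zero as $n \to \infty$.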
Call the upper bound (except for the factor $e^x + 1$, which is independent of $n$) $e_n(x) = \frac{|x|^{n+1}}{(n+1)!}$. To show that this shrinks to zero as $n \to \infty$, let's write it as follows.
$$e_n(x) = \frac{|x|^{n+1}}{(n+1)!} = \overbrace{\frac{|x|}{1} \cdot \frac{|x|}{2} \cdot \frac{|x|}{3} \cdots \frac{|x|}{n} \cdot \frac{|x|}{n+1}}^{n+1 \text{ factors}}$$
46 Warning: antique sign–sine pun. No doubt the reader first saw it many years syne.
Now let $k$ be an integer bigger than $|x|$. We can split the product
$$\begin{aligned}
e_n(x) &= \overbrace{\left(\frac{|x|}{1} \cdot \frac{|x|}{2} \cdot \frac{|x|}{3} \cdots \frac{|x|}{k}\right)}^{k \text{ factors}} \cdot \left(\frac{|x|}{k+1} \cdots \frac{|x|}{n+1}\right) \\
&\le \underbrace{\left(\frac{|x|}{1} \cdot \frac{|x|}{2} \cdot \frac{|x|}{3} \cdots \frac{|x|}{k}\right)}_{= Q(x)} \cdot \left(\frac{|x|}{k+1}\right)^{n+1-k} \\
&= Q(x) \cdot \left(\frac{|x|}{k+1}\right)^{n+1-k}
\end{aligned}$$
Since $k$ does not depend on $n$ (though it does depend on $x$), the function $Q(x)$ does not change as we increase $n$. Additionally, we know that $|x| < k+1$ and so $\frac{|x|}{k+1} < 1$. Hence as we let $n \to \infty$ the above bound must go to zero.
Alternatively, compare $e_n(x)$ and $e_{n+1}(x)$.
$$\frac{e_{n+1}(x)}{e_n(x)} = \frac{\ \frac{|x|^{n+2}}{(n+2)!}\ }{\frac{|x|^{n+1}}{(n+1)!}} = \frac{|x|}{n+2}$$
When $n$ is bigger than, for example $2|x|$, we have $\frac{e_{n+1}(x)}{e_n(x)} < \frac{1}{2}$. That is, increasing the index on $e_n(x)$ by one decreases the size of $e_n(x)$ by a factor of at least two. As a result $e_n(x)$ must tend to zero as $n \to \infty$.
Consequently, for all $x$, $\lim_{n\to\infty} E_n(x) = 0$, as claimed, and we really have
$$e^x = \lim_{n\to\infty}\left[1 + x + \frac{1}{2}x^2 + \frac{1}{3!}x^3 + \cdots + \frac{1}{n!}x^n\right] = \sum_{n=0}^\infty \frac{1}{n!} x^n$$
Example 5.6.6
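The quantity $e_n(x) = \frac{|x|^{n+1}}{(n+1)!}$ from Example 5.6.6 may grow for a while before the factorial takes over. The sketch below (names are ours) tabulates it for $x = 10$ so that both phases are visible.

```python
import math

def error_bound(x, n):
    """e_n(x) = |x|^(n+1) / (n+1)!, the n-dependent part of the bound on E_n(x)."""
    return abs(x) ** (n + 1) / math.factorial(n + 1)

def ratio(x, n):
    """e_{n+1}(x) / e_n(x), which simplifies to |x| / (n+2)."""
    return error_bound(x, n + 1) / error_bound(x, n)

if __name__ == "__main__":
    x = 10
    for n in (0, 5, 9, 20, 40, 60):
        print(n, error_bound(x, n), ratio(x, n))
```

Once $n$ exceeds $2|x|$ the ratio drops below one half, so each further term at least halves the bound, exactly as argued in the example.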
There is another way to prove that the series $\sum_{n=0}^\infty \frac{x^n}{n!}$ converges to the function $e^x$. Rather than looking at how the error term $E_n(x)$ behaves as $n \to \infty$, we can show that the series satisfies the same simple differential equation47 and the same initial condition as the function.
Example 5.6.7 (Optional: another approach to showing that $\sum_{n=0}^\infty \frac{1}{n!}x^n$ is $e^x$)
We already know from Example 5.5.5, that the series $\sum_{n=0}^\infty \frac{1}{n!}x^n$ converges to some function $f(x)$ for all values of $x$. All that remains to do is to show that $f(x)$ is really $e^x$. We will do this by showing that $f(x)$ and $e^x$ satisfy the same differential equation with the same initial conditions48. We know that $y = e^x$ satisfies
$$\frac{dy}{dx} = y \qquad\text{and}\qquad y(0) = 1$$
and by Theorem 3.12.10 (with $a = 1$, $b = 0$ and $y(0) = 1$), this is the only solution. So it suffices to show that $f(x) = \sum_{n=0}^\infty \frac{x^n}{n!}$ satisfies
$$\frac{df}{dx} = f(x) \qquad\text{and}\qquad f(0) = 1.$$
47 Recall, you studied that differential equation in the section on separable differential equations (Theorem 3.12.10 in Section 3.12) as well as wayyyy back in the section on exponential growth and decay in differential calculus.
• By Theorem 5.5.12,
$$\begin{aligned}
\frac{df}{dx} &= \frac{d}{dx}\left\{\sum_{n=0}^\infty \frac{1}{n!} x^n\right\} = \sum_{n=1}^\infty \frac{n}{n!} x^{n-1} = \sum_{n=1}^\infty \frac{1}{(n-1)!} x^{n-1} \\
&= \overbrace{1}^{n=1} + \overbrace{x}^{n=2} + \overbrace{\frac{x^2}{2!}}^{n=3} + \overbrace{\frac{x^3}{3!}}^{n=4} + \cdots \\
&= f(x)
\end{aligned}$$
• When we substitute $x = 0$ into the series we get (see the discussion after Definition 5.5.1)
$$f(0) = 1 + \frac{0}{1!} + \frac{0}{2!} + \cdots = 1.$$
Hence $f(x)$ solves the same initial value problem and we must have $f(x) = e^x$.
Example 5.6.7
We can show that the error terms in Maclaurin polynomials for sine and cosine go to
zero as n Ñ 8 using very much the same approach as in Example 5.6.6.
Example 5.6.8 (Optional: why $\sum_{n=0}^\infty \frac{(-1)^n}{(2n+1)!} x^{2n+1} = \sin x$ and $\sum_{n=0}^\infty \frac{(-1)^n}{(2n)!} x^{2n} = \cos x$)
Let $f(x)$ be either $\sin x$ or $\cos x$. We know that every derivative of $f(x)$ will be one of $\pm\sin(x)$ or $\pm\cos(x)$. Consequently, when we compute the error term using equation (5.6.1-b) we always have $\left|f^{(n+1)}(c)\right| \le 1$ and hence
$$|E_n(x)| \le \frac{|x|^{n+1}}{(n+1)!}.$$
In Example 5.6.6, we showed that $\frac{|x|^{n+1}}{(n+1)!} \to 0$ as $n \to \infty$, so all the hard work is already done. Since the error term shrinks to zero for both $f(x) = \sin x$ and $f(x) = \cos x$, we have
$$f(x) = \lim_{n\to\infty}\left[f(0) + f'(0)\,x + \cdots + \frac{1}{n!} f^{(n)}(0)\,x^n\right]$$
as required.
48 Recall that when we solve a separable differential equation our general solution will have an arbitrary constant in it. That constant cannot be determined from the differential equation alone and we need some extra data to find it. This extra information is often information about the system at its beginning (for example when position or time is zero), hence "initial conditions". Of course the reader is already familiar with this because it was covered back in Section 3.12.
Example 5.6.8
There are numerous methods for computing π to any desired degree of accuracy49. Many of them use the Maclaurin expansion
$$\arctan x = \sum_{n=0}^\infty (-1)^n \frac{x^{2n+1}}{2n+1}$$
of Theorem 5.6.5. Since $\arctan(1) = \frac{\pi}{4}$, setting $x = 1$ gives
$$\frac{\pi}{4} = \arctan 1 = \sum_{n=0}^\infty \frac{(-1)^n}{2n+1}$$
$$\pi = 4\left(1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \cdots\right)$$
Unfortunately, this series is not very useful for computing π because it converges so slowly. If we approximate the series by its $N$th partial sum, then the alternating series test (Theorem A.12.1 in the appendix) tells us that the error is bounded by the first term we drop. To guarantee that we have 2 decimal digits of π correct, we need to sum about the first 200 terms!
A much better way to compute π using this series is to take advantage of the fact that $\tan\frac{\pi}{6} = \frac{1}{\sqrt{3}}$:
$$\begin{aligned}
\pi = 6\arctan\frac{1}{\sqrt{3}} &= 6\sum_{n=0}^\infty (-1)^n \frac{1}{2n+1}\,\frac{1}{(\sqrt{3})^{2n+1}} \\
&= 2\sqrt{3}\sum_{n=0}^\infty (-1)^n \frac{1}{2n+1}\,\frac{1}{3^n} \\
&= 2\sqrt{3}\left(1 - \frac{1}{3\times 3} + \frac{1}{5\times 9} - \frac{1}{7\times 27} + \frac{1}{9\times 81} - \frac{1}{11\times 243} + \cdots\right)
\end{aligned}$$
Again, this is an alternating series and so (via Theorem A.12.1 in the appendix) the error we introduce by truncating it is bounded by the first term dropped. For example, if we keep ten terms, stopping at $n = 9$, we get $\pi \approx 3.141591$ (to 6 decimal places) with an error between zero and
$$\frac{2\sqrt{3}}{21\times 3^{10}} < 3\times 10^{-6}$$
49 The computation of π has a very, very long history and your favourite search engine will turn up many sites that explore the topic. For a more comprehensive history one can turn to books such as "A history of Pi" by Petr Beckmann and "The joy of π" by David Blatner.
In 1699, the English astronomer/mathematician Abraham Sharp (1653–1742) used 150
terms of this series to compute 72 digits of π — by hand!
This is just one of very many ways to compute π. Another one, which still uses the
Maclaurin expansion of arctan x, but is much more efficient, is
$$\pi = 16\arctan\frac{1}{5} - 4\arctan\frac{1}{239}$$
This formula was used by John Machin in 1706 to compute π to 100 decimal digits —
again, by hand.
(You won’t be asked to compute errors using Theorem A.12.1, but we include them
here for interest.)
Example 5.6.9
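The three ways of computing π discussed in Example 5.6.9 are easy to compare on a computer. The sketch below uses ordinary double precision floats, so it can only show the first fifteen or so digits; the function names are ours.

```python
import math

def arctan_series(x, n_terms):
    """Partial sum of the Maclaurin series of arctan x."""
    return sum((-1) ** n * x ** (2 * n + 1) / (2 * n + 1) for n in range(n_terms))

def pi_from_arctan_one(n_terms):
    """pi = 4 arctan(1), the slowly converging series."""
    return 4 * arctan_series(1.0, n_terms)

def pi_from_sqrt3(n_terms):
    """pi = 6 arctan(1/sqrt(3)), written as 2 sqrt(3) sum (-1)^n / ((2n+1) 3^n)."""
    return 2 * math.sqrt(3) * sum((-1) ** n / ((2 * n + 1) * 3**n)
                                  for n in range(n_terms))

def pi_from_machin(n_terms):
    """Machin's formula pi = 16 arctan(1/5) - 4 arctan(1/239)."""
    return 16 * arctan_series(1 / 5, n_terms) - 4 * arctan_series(1 / 239, n_terms)

if __name__ == "__main__":
    for n_terms in (5, 10, 200):
        print(n_terms, pi_from_arctan_one(n_terms),
              pi_from_sqrt3(n_terms), pi_from_machin(n_terms))
    print("math.pi =", math.pi)
```

With ten terms the first series is still wrong in the first decimal place, the second is accurate to about $2\times 10^{-6}$, and Machin's formula is already at the limit of floating point accuracy.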
Power series also give us access to new functions which might not be easily expressed
in terms of the functions we have been introduced to so far. The following is a good
example of this.
Example 5.6.10 (Error function)
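The error function
$$\operatorname{erf}(x) = \frac{2}{\sqrt{\pi}}\int_0^x e^{-t^2}\,dt$$
is used in computing "bell curve" probabilities. The integrand has no antiderivative that can be written in terms of the functions we have seen so far, but we can still evaluate the integral to any desired accuracy using a power series. Substituting $x = -t^2$ into the Maclaurin series for $e^x$ gives
$$e^{-t^2} = \sum_{n=0}^\infty \frac{(-1)^n}{n!}\, t^{2n}$$
and integrating term by term (Theorem 5.5.12) gives
$$\operatorname{erf}(x) = \frac{2}{\sqrt{\pi}}\sum_{n=0}^\infty (-1)^n \frac{x^{2n+1}}{(2n+1)\,n!}$$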
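For example, for the bell curve, the probability of being within one standard deviation of the mean50, is
$$\begin{aligned}
\operatorname{erf}\left(1/\sqrt{2}\right) &= \frac{2}{\sqrt{\pi}}\sum_{n=0}^\infty (-1)^n \frac{\left(1/\sqrt{2}\right)^{2n+1}}{(2n+1)\,n!} = \frac{2}{\sqrt{2\pi}}\sum_{n=0}^\infty (-1)^n \frac{1}{(2n+1)\,2^n\,n!} \\
&= \sqrt{\frac{2}{\pi}}\left(1 - \frac{1}{3\times 2} + \frac{1}{5\times 2^2\times 2} - \frac{1}{7\times 2^3\times 3!} + \frac{1}{9\times 2^4\times 4!} - \cdots\right)
\end{aligned}$$
This is yet another alternating series. If we keep five terms, stopping at $n = 4$, we get 0.68271 (to 5 decimal places) with, by Theorem A.12.1 in the appendix again, an error between zero and the first dropped term, which is minus
$$\sqrt{\frac{2}{\pi}}\,\frac{1}{11\times 2^5\times 5!} < 2\times 10^{-5}$$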
(You won’t be asked to compute such an error, but we include it for interest.)
Example 5.6.10
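The five-term approximation quoted in Example 5.6.10 can be reproduced and compared with Python's built-in `math.erf`. This is only a sketch; `erf_series` is our own helper, built from the series $\frac{2}{\sqrt{\pi}}\sum (-1)^n \frac{x^{2n+1}}{(2n+1)\,n!}$.

```python
import math

def erf_series(x, n_terms):
    """Partial sum of (2/sqrt(pi)) * sum_{n>=0} (-1)^n x^(2n+1) / ((2n+1) n!)."""
    total = sum((-1) ** n * x ** (2 * n + 1) / ((2 * n + 1) * math.factorial(n))
                for n in range(n_terms))
    return 2 / math.sqrt(math.pi) * total

if __name__ == "__main__":
    x = 1 / math.sqrt(2)
    approx = erf_series(x, 5)      # five terms, stopping at n = 4
    print(approx, math.erf(x), approx - math.erf(x))
```

The difference printed last is positive and smaller than $2\times 10^{-5}$, consistent with the alternating series error bound used in the example.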
Example 5.6.11
Evaluate
$$\sum_{n=1}^\infty \frac{(-1)^{n-1}}{n\,3^n} \qquad\text{and}\qquad \sum_{n=1}^\infty \frac{1}{n\,3^n}$$
Solution. There are not very many series that can be easily evaluated exactly. But occasionally one encounters a series that can be evaluated simply by realizing that it is exactly one of the series in Theorem 5.6.5, just with a specific value of $x$. The left hand given series is
$$\sum_{n=1}^\infty \frac{(-1)^{n-1}}{n}\,\frac{1}{3^n} = \frac{1}{3} - \frac{1}{2}\,\frac{1}{3^2} + \frac{1}{3}\,\frac{1}{3^3} - \frac{1}{4}\,\frac{1}{3^4} + \cdots$$
50 If you don't know what this means (forgive the pun) don't worry, because it is not part of the course. Standard deviation is a way of quantifying variation within a population.
x2 x3 x4
ln(1 + x ) = x ´ + ´ ´¨¨¨
2 3 4
Indeed
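\begin{align*}
\sum_{n=1}^{\infty} \frac{(-1)^{n-1}}{n}\,\frac{1}{3^n}
&= \frac{1}{3} - \frac{1}{2}\,\frac{1}{3^2} + \frac{1}{3}\,\frac{1}{3^3} - \frac{1}{4}\,\frac{1}{3^4} + \cdots \\
&= \Big[ x - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{4} + \cdots \Big]_{x=\frac{1}{3}} \\
&= \Big[ \ln(1+x) \Big]_{x=\frac{1}{3}} \\
&= \ln\frac{4}{3}
\end{align*}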
The right hand series above differs from the left hand series above only in that the signs of
the left hand series alternate while those of the right hand series do not. We can flip every
second sign in a power series just by using a negative x.
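\begin{align*}
\Big[ \ln(1+x) \Big]_{x=-\frac{1}{3}}
&= \Big[ x - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{4} + \cdots \Big]_{x=-\frac{1}{3}} \\
&= -\frac{1}{3} - \frac{1}{2}\,\frac{1}{3^2} - \frac{1}{3}\,\frac{1}{3^3} - \frac{1}{4}\,\frac{1}{3^4} - \cdots
\end{align*}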
which is exactly minus the desired right hand series. So
\[ \sum_{n=1}^{\infty} \frac{1}{n3^n} = -\Big[ \ln(1+x) \Big]_{x=-\frac{1}{3}} = -\ln\frac{2}{3} = \ln\frac{3}{2} \]
Example 5.6.11
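Both closed forms in Example 5.6.11 are easy to sanity-check by adding up partial sums. The sketch below is only a numerical check, not part of the solution; the name `partial_sum` is ours.

```python
import math

def partial_sum(sign, terms):
    """Sum of sign^(n-1) / (n * 3^n) for n = 1..terms; sign = -1 alternates, sign = +1 does not."""
    return sum(sign ** (n - 1) / (n * 3 ** n) for n in range(1, terms + 1))

alternating = partial_sum(-1, 40)
positive = partial_sum(+1, 40)

print(alternating, math.log(4 / 3))
print(positive, math.log(3 / 2))
```

Because the terms shrink faster than a geometric series with ratio 1/3, forty terms are far more than enough for double precision.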
Example 5.6.12
51 We could get a computer algebra system to do it for us without much difficulty — but we wouldn’t
learn much in the process. The point of this example is to illustrate that one can do more than just
represent a function with Taylor series. More on this in the next section.
• We know, or at least can easily find, the Taylor series for $\sin(2x^3)$.
\begin{align*}
\sin y &= y - \frac{1}{3!}y^3 + \frac{1}{5!}y^5 - \cdots \\
\sin(2x^3) &= 2x^3 - \frac{1}{3!}(2x^3)^3 + \frac{1}{5!}(2x^3)^5 - \cdots \\
&= 2x^3 - \frac{8}{3!}x^9 + \frac{2^5}{5!}x^{15} - \cdots
\end{align*}
• So the coefficient of $x^{15}$ in the Taylor series of $f(x) = \sin(2x^3)$ with expansion point
$a = 0$ is $\frac{2^5}{5!}$
and we have
\[ f^{(15)}(0) = 15! \times \frac{2^5}{5!} = 348{,}713{,}164{,}800 \]
Example 5.6.12
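The final number in Example 5.6.12 is large enough that it is worth confirming with exact integer arithmetic. The variable names below are ours.

```python
from fractions import Fraction
from math import factorial

# Coefficient of x^15 in the Taylor series of sin(2x^3), read off from the expansion above.
coefficient = Fraction(2 ** 5, factorial(5))

# The n-th derivative at 0 is n! times the coefficient of x^n.
derivative_at_zero = factorial(15) * coefficient

print(derivative_at_zero)
```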
\[ e^x = 1 + x + \frac{x^2}{2!} + \cdots + \frac{x^n}{n!} + \frac{1}{(n+1)!}\, e^c x^{n+1} \]
for some (unknown) c between 0 and x. This can be used to approximate the number e,
with any desired degree of accuracy. Setting x = 1 in this equation gives
\[ e = 1 + 1 + \frac{1}{2!} + \cdots + \frac{1}{n!} + \frac{1}{(n+1)!}\, e^c \]
for some $c$ between 0 and 1. Even though we don’t know $c$ exactly, we can bound that
term quite readily. We do know that $e^c$ is an increasing function52 of $c$, and so
$1 = e^0 \le e^c \le e^1 = e$. Thus we know that
\[ \frac{1}{(n+1)!} \le e - \Big( 1 + 1 + \frac{1}{2!} + \cdots + \frac{1}{n!} \Big) \le \frac{e}{(n+1)!} \]
So we have a lower bound on the error, but our upper bound involves $e$, which is precisely
the quantity we are trying to get a handle on.
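But all is not lost. Let’s look a little more closely at the right-hand inequality when
$n = 1$:
\begin{align*}
e - (1 + 1) &\le \frac{e}{2} && \text{move the $e$’s to one side} \\
\frac{e}{2} &\le 2 && \text{and clean it up} \\
e &\le 4.
\end{align*}
Now this is a pretty crude bound53 but it isn’t hard to improve. Try this again with $n = 2$:
\begin{align*}
e - \Big( 1 + 1 + \frac{1}{2} \Big) &\le \frac{e}{6} && \text{move $e$’s to one side} \\
\frac{5e}{6} &\le \frac{5}{2} \\
e &\le 3.
\end{align*}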
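\[ \frac{1}{(n+1)!} \le e - \Big( 1 + 1 + \frac{1}{2!} + \cdots + \frac{1}{n!} \Big) \le \frac{e}{(n+1)!} \le \frac{3}{(n+1)!} \]
So the error is between $\frac{1}{120}$ and $\frac{3}{120} = \frac{1}{40}$; this approximation isn’t guaranteed to give us
the first 2 decimal places. If we ramp $n$ up to 9 however, we get
\[ \frac{1}{10!} \le e - \Big( 1 + 1 + \frac{1}{2} + \cdots + \frac{1}{9!} \Big) \le \frac{3}{10!} \]
Since $10! = 3628800$, the upper bound on the error is $\frac{3}{3628800} < \frac{3}{3000000} = 10^{-6}$, and we can
approximate $e$ by
\begin{align*}
& 1 + 1 + \frac{1}{2!} + \frac{1}{3!} + \frac{1}{4!} + \frac{1}{5!} + \frac{1}{6!} + \frac{1}{7!} + \frac{1}{8!} + \frac{1}{9!} \\
&= 1 + 1 + 0.5 + 0.1\dot{6} + 0.041\dot{6} + 0.008\dot{3} + 0.0013\dot{8} + 0.0001984 + 0.0000248 + 0.0000028 \\
&= 2.718282
\end{align*}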
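The same computation can be reproduced in a few lines. This sketch adds up the terms exactly and then checks the error against the two bounds $\frac{1}{10!}$ and $\frac{3}{10!}$; the helper name `e_partial_sum` is ours, and `math.e` is used only as a reference value.

```python
from fractions import Fraction
from math import e, factorial

def e_partial_sum(n):
    """1 + 1 + 1/2! + ... + 1/n! as an exact fraction."""
    return sum(Fraction(1, factorial(k)) for k in range(n + 1))

n = 9
approx = e_partial_sum(n)
lower = Fraction(1, factorial(n + 1))  # lower bound on the error
upper = Fraction(3, factorial(n + 1))  # upper bound on the error, using e <= 3
error = e - float(approx)

print(f"{float(approx):.6f}")
print(float(lower), error, float(upper))
```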
53 The authors hope that by now we all “know” that e is between 2 and 3, but maybe we don’t know how
to prove it.
The first thing to notice about this limit is that, as x tends to zero, both the numerator, sin x,
and the denominator, x, tend to 0. So we may not evaluate the limit of the ratio by simply
dividing the limits of the numerator and denominator. To find the limit, or show that it
does not exist, we are going to have to exhibit a cancellation between the numerator and
the denominator. Let’s start by taking a closer look at the numerator. By Example 5.6.4,
\[ \sin x = x - \frac{1}{3!}x^3 + \frac{1}{5!}x^5 - \cdots \]
Consequently54
\[ \frac{\sin x}{x} = 1 - \frac{1}{3!}x^2 + \frac{1}{5!}x^4 - \cdots \]
Every term in this series, except for the very first term, is proportional to a strictly positive
power of x. Consequently, as x tends to zero, all terms in this series, except for the very
first term, tend to zero. In fact the sum of all terms, starting with the second term, also
tends to zero. That is,
\[ \lim_{x\to 0} \Big[ -\frac{1}{3!}x^2 + \frac{1}{5!}x^4 - \cdots \Big] = 0 \]
We won’t justify that statement here, but it will be justified in the following (optional)
subsection. So
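\begin{align*}
\lim_{x\to 0} \frac{\sin x}{x}
&= \lim_{x\to 0} \Big[ 1 - \frac{1}{3!}x^2 + \frac{1}{5!}x^4 - \cdots \Big] \\
&= 1 + \lim_{x\to 0} \Big[ -\frac{1}{3!}x^2 + \frac{1}{5!}x^4 - \cdots \Big] \\
&= 1
\end{align*}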
54 We are hiding some mathematics behind this “consequently”. What we are really doing is using our knowledge
of Taylor polynomials to write
\[ f(x) = \sin(x) = x - \frac{1}{3!}x^3 + \frac{1}{5!}x^5 + E_5(x) \]
where $E_5(x) = \frac{f^{(6)}(c)}{6!}x^6$ and $c$ is between 0 and $x$. We are effectively hiding “$E_5(x)$” inside the “$\cdots$”.
Now we can divide both sides by $x$ (assuming $x \ne 0$):
\[ \frac{\sin(x)}{x} = 1 - \frac{1}{3!}x^2 + \frac{1}{5!}x^4 + \frac{E_5(x)}{x} \]
and everything is fine provided the term $\frac{E_5(x)}{x}$ stays well behaved.
Example 5.6.14
The limit in the previous example can also be evaluated relatively easily using l’Hôpital’s
rule55. While the following limit can also, in principle, be evaluated using l’Hôpital’s rule,
it is much more efficient to use Taylor series56.
Example 5.6.15
\[ \arctan x = x - \frac{x^3}{3} + \frac{x^5}{5} - \cdots \]
so the numerator
\[ \arctan x - x = -\frac{x^3}{3} + \frac{x^5}{5} - \cdots \]
By Example 5.6.4,
\[ \sin x = x - \frac{1}{3!}x^3 + \frac{1}{5!}x^5 - \cdots \]
so the denominator
\[ \sin x - x = -\frac{1}{3!}x^3 + \frac{1}{5!}x^5 - \cdots \]
and the ratio
\[ \frac{\arctan x - x}{\sin x - x} = \frac{-\frac{x^3}{3} + \frac{x^5}{5} - \cdots}{-\frac{1}{3!}x^3 + \frac{1}{5!}x^5 - \cdots} \]
Notice that every term in both the numerator and the denominator contains a common
factor of $x^3$, which we can cancel out.
\[ \frac{\arctan x - x}{\sin x - x} = \frac{-\frac{1}{3} + \frac{x^2}{5} - \cdots}{-\frac{1}{3!} + \frac{1}{5!}x^2 - \cdots} \]
As x tends to zero,
55 Many of you learned about l’Hôpital’s rule in school and all of you should have seen it last term in your
differential calculus course.
56 It takes 3 applications of l’Hôpital’s rule and some careful cleaning up of the intermediate expressions.
Oof!
so we may now legitimately evaluate the limit of the ratio by simply dividing the limits
of the numerator and denominator.
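\begin{align*}
\lim_{x\to 0} \frac{\arctan x - x}{\sin x - x}
&= \lim_{x\to 0} \frac{-\frac{1}{3} + \frac{x^2}{5} - \cdots}{-\frac{1}{3!} + \frac{1}{5!}x^2 - \cdots} \\
&= \frac{\lim_{x\to 0} \big[ -\frac{1}{3} + \frac{x^2}{5} - \cdots \big]}{\lim_{x\to 0} \big[ -\frac{1}{3!} + \frac{1}{5!}x^2 - \cdots \big]} \\
&= \frac{-1/3}{-1/3!} \\
&= 2
\end{align*}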
Example 5.6.15
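A quick numerical experiment supports the value 2 found in Example 5.6.15. This is only a sanity check under floating point arithmetic: for much smaller $x$ the subtractions in the numerator and denominator lose too many digits, so we stop at $x = 0.01$. The function name `ratio` is ours.

```python
import math

def ratio(x):
    """(arctan x - x) / (sin x - x), the quantity whose limit as x -> 0 we computed."""
    return (math.atan(x) - x) / (math.sin(x) - x)

for x in (0.1, 0.01):
    print(x, ratio(x))
```

The leading terms of the two series suggest that the ratio behaves like $2 - 1.1x^2$ for small $x$, which matches the printed values.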
Chapter 5 of this work was adapted from Chapter 3 of CLP 2 – Integral Calculus by
Feldman, Rechnitzer, and Yeager under a Creative Commons Attribution-NonCommercial-
ShareAlike 4.0 International license.
425
Appendix A
3. Your paper now has a triangle sitting on top of a rectangle. Where the triangle ends,
make a crease in the underlying rectangle shapes.
426
P ROOFS AND S UPPLEMENTS A.2 C ONIC S ECTIONS AND Q UADRIC S URFACES
[Figure: the folded paper, with the crease marked]
4. Your paper has four layers, with the triangle shapes on top. Open the paper so that
three layers are on top, and one is on the bottom. The result should look like the
inside corner of a box.
[Figure: the paper being opened into a corner]
Your octant is created! The vertical crease is the z axis, the crease to the left is the x axis,
and the crease to the right is the y axis. In the picture below, the blue sphere indicates that
the octant is open towards you: if you were to put a marble inside the paper structure, it
would sit as shown.
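[Figure: the finished paper octant, with the x, y and z axes labelled and a blue sphere sitting inside it]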
To practice with your octant, label the following points directly on the paper:
• (1, 1, 0)
• (0, 1, 1)
• (1, 0, 1)
The next collection of points will exist out in space, not on any of the paper sides. Point to
their positions relative to your octant:
• (1, 1, 1)
• (1, 2, 3)
• (1, ´1, 1)
• (1, 1, ´1)
used) definition is that a conic section is the set of all points in the xy–plane that obey
Q( x, y) = 0 with
\[ Q(x, y) = Ax^2 + By^2 + Cxy + Dx + Ey + F = 0 \]
being a polynomial of degree two3. By rotating and translating our coordinate system the
equation of the conic section can be brought into one of the forms4
• $\alpha x^2 + \beta y^2 = \gamma$ with $\alpha, \beta, \gamma > 0$, which is an ellipse (or a circle),
• $\alpha x^2 - \beta y^2 = \gamma$ with $\alpha, \beta > 0$, $\gamma \ne 0$, which is a hyperbola,
• $x^2 = \delta y$, with $\delta \ne 0$, which is a parabola.
The three dimensional analogs of conic sections, surfaces in three dimensions given by
quadratic equations, are called quadrics. An example is the sphere $x^2 + y^2 + z^2 = 1$. Here
are some tables giving all of the quadric surfaces.
name                        | elliptic cylinder                        | parabolic cylinder | hyperbolic cylinder                      | sphere
equation in standard form   | $\frac{x^2}{a^2} + \frac{y^2}{b^2} = 1$  | $y = ax^2$         | $\frac{x^2}{a^2} - \frac{y^2}{b^2} = 1$  | $x^2 + y^2 + z^2 = r^2$
x = constant cross–section  | two lines                                | one line           | two lines                                | circle
y = constant cross–section  | two lines                                | two lines          | two lines                                | circle
z = constant cross–section  | ellipse                                  | parabola           | hyperbola                                | circle
sketch                      | [figure]                                 | [figure]           | [figure]                                 | [figure]
name                        | ellipsoid                                                   | elliptic paraboloid                                | elliptic cone
equation in standard form   | $\frac{x^2}{a^2} + \frac{y^2}{b^2} + \frac{z^2}{c^2} = 1$   | $\frac{x^2}{a^2} + \frac{y^2}{b^2} = \frac{z}{c}$  | $\frac{x^2}{a^2} + \frac{y^2}{b^2} = \frac{z^2}{c^2}$
x = constant cross–section  | ellipse                                                     | parabola                                           | two lines if $x = 0$, hyperbola if $x \ne 0$
y = constant cross–section  | ellipse                                                     | parabola                                           | two lines if $y = 0$, hyperbola if $y \ne 0$
z = constant cross–section  | ellipse                                                     | ellipse                                            | ellipse
sketch                      | [figure]                                                    | [figure]                                           | [figure]
429
P ROOFS AND S UPPLEMENTS A.3 M IXED PARTIAL D ERIVATIVES
sketch
Section A.2 of this work was adapted from Appendix G of CLP 3 – Multivariable Calculus
by Feldman, Rechnitzer, and Yeager under a Creative Commons Attribution-NonCommercial-
ShareAlike 4.0 International license.
Here is an outline of the proof of Theorem 2.2.5. The (numbered) details are in the subsec-
tion below.
Fix real numbers $x_0$ and $y_0$ and define
\[ F(h, k) = \frac{1}{hk} \Big[ f(x_0+h, y_0+k) - f(x_0, y_0+k) - f(x_0+h, y_0) + f(x_0, y_0) \Big]. \]
We define $F(h, k)$ in this way because both partial derivatives $\frac{\partial^2 f}{\partial x\partial y}(x_0, y_0)$ and $\frac{\partial^2 f}{\partial y\partial x}(x_0, y_0)$
are limits of $F(h, k)$ as $h, k \to 0$. We show in item (1) in the details below that
\begin{align*}
\frac{\partial}{\partial y}\frac{\partial f}{\partial x}(x_0, y_0) &= \lim_{k\to 0}\lim_{h\to 0} F(h, k) \\
\frac{\partial}{\partial x}\frac{\partial f}{\partial y}(x_0, y_0) &= \lim_{h\to 0}\lim_{k\to 0} F(h, k)
\end{align*}
and therefore the partial derivatives $\frac{\partial^2 f}{\partial x\partial y}(x_0, y_0)$ and $\frac{\partial^2 f}{\partial y\partial x}(x_0, y_0)$ are identical except for
the order in which the limits are taken.
Now, by applying the Mean Value Theorem multiple times (see items (2) to (5) for more
details) we get
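\begin{align*}
F(h, k) &\overset{(2)}{=} \frac{1}{h} \Big[ \frac{\partial f}{\partial y}(x_0+h, y_0+\theta_1 k) - \frac{\partial f}{\partial y}(x_0, y_0+\theta_1 k) \Big] \\
&\overset{(3)}{=} \frac{\partial}{\partial x}\frac{\partial f}{\partial y}(x_0+\theta_2 h, y_0+\theta_1 k) \\
F(h, k) &\overset{(4)}{=} \frac{1}{k} \Big[ \frac{\partial f}{\partial x}(x_0+\theta_3 h, y_0+k) - \frac{\partial f}{\partial x}(x_0+\theta_3 h, y_0) \Big] \\
&\overset{(5)}{=} \frac{\partial}{\partial y}\frac{\partial f}{\partial x}(x_0+\theta_3 h, y_0+\theta_4 k)
\end{align*}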
as desired. To complete the proof we just have to justify the details (1), (2), (3), (4) and (5).
Similarly,
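\begin{align*}
\frac{\partial}{\partial x}\frac{\partial f}{\partial y}(x_0, y_0)
&= \lim_{h\to 0} \frac{1}{h} \Big[ \frac{\partial f}{\partial y}(x_0+h, y_0) - \frac{\partial f}{\partial y}(x_0, y_0) \Big] \\
&= \lim_{h\to 0} \frac{1}{h} \Big[ \lim_{k\to 0} \frac{f(x_0+h, y_0+k) - f(x_0+h, y_0)}{k} - \lim_{k\to 0} \frac{f(x_0, y_0+k) - f(x_0, y_0)}{k} \Big] \\
&= \lim_{h\to 0}\lim_{k\to 0} \frac{f(x_0+h, y_0+k) - f(x_0+h, y_0) - f(x_0, y_0+k) + f(x_0, y_0)}{hk} \\
&= \lim_{h\to 0}\lim_{k\to 0} F(h, k)
\end{align*}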
(2) The Mean Value Theorem (probably covered in your last calculus class) says that, for
any differentiable function ϕ( x ),
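• the slope of the line joining the points $\big(x_0, \varphi(x_0)\big)$ and $\big(x_0+k, \varphi(x_0+k)\big)$ on the
graph of $\varphi$
is the same as
• the slope of the tangent to the graph at some point between $x_0$ and $x_0+k$.
\[ \frac{\varphi(x_0+k) - \varphi(x_0)}{k} = \frac{d\varphi}{dx}(x_0+\theta_1 k) \]
[Figure: the graph of $y = \varphi(x)$, with the points $x_0$, $x_0+\theta_1 k$ and $x_0+k$ marked on the $x$-axis]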
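\begin{align*}
\frac{G(y_0+k) - G(y_0)}{k} &= \frac{dG}{dy}(y_0+\theta_1 k) && \text{for some } 0 < \theta_1 < 1 \\
&= \frac{\partial f}{\partial y}(x_0+h, y_0+\theta_1 k) - \frac{\partial f}{\partial y}(x_0, y_0+\theta_1 k)
\end{align*}
(3) Define $H(x) = \frac{\partial f}{\partial y}(x, y_0+\theta_1 k)$. By the Mean Value Theorem,
\begin{align*}
F(h, k) &= \frac{1}{h} \big[ H(x_0+h) - H(x_0) \big] \\
&= \frac{dH}{dx}(x_0+\theta_2 h) && \text{for some } 0 < \theta_2 < 1 \\
&= \frac{\partial}{\partial x}\frac{\partial f}{\partial y}(x_0+\theta_2 h, y_0+\theta_1 k)
\end{align*}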
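(5) Define $B(y) = \frac{\partial f}{\partial x}(x_0+\theta_3 h, y)$. By the Mean Value Theorem
\begin{align*}
F(h, k) &= \frac{1}{k} \big[ B(y_0+k) - B(y_0) \big] \\
&= \frac{dB}{dy}(y_0+\theta_4 k) && \text{for some } 0 < \theta_4 < 1 \\
&= \frac{\partial}{\partial y}\frac{\partial f}{\partial x}(x_0+\theta_3 h, y_0+\theta_4 k)
\end{align*}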
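A.3.2 §§ An Example of $\frac{\partial^2 f}{\partial x\partial y}(x_0, y_0) \ne \frac{\partial^2 f}{\partial y\partial x}(x_0, y_0)$
In Theorem 2.2.5, we showed that $\frac{\partial^2 f}{\partial x\partial y}(x_0, y_0) = \frac{\partial^2 f}{\partial y\partial x}(x_0, y_0)$ if the partial derivatives $\frac{\partial^2 f}{\partial x\partial y}$
and $\frac{\partial^2 f}{\partial y\partial x}$ exist and are continuous at $(x_0, y_0)$. Here is an example which shows that if
the partial derivatives $\frac{\partial^2 f}{\partial x\partial y}$ and $\frac{\partial^2 f}{\partial y\partial x}$ are not continuous at $(x_0, y_0)$, then it is possible that
$\frac{\partial^2 f}{\partial x\partial y}(x_0, y_0) \ne \frac{\partial^2 f}{\partial y\partial x}(x_0, y_0)$.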
Define
\[ f(x, y) = \begin{cases} xy\,\dfrac{x^2-y^2}{x^2+y^2} & \text{if } (x, y) \ne (0, 0) \\ 0 & \text{if } (x, y) = (0, 0) \end{cases} \]
This function is continuous everywhere. Note that $f(x, 0) = 0$ for all $x$ and $f(0, y) = 0$ for
all $y$. We now compute the first order partial derivatives. For $(x, y) \ne (0, 0)$,
\begin{align*}
\frac{\partial f}{\partial x}(x, y) &= y\,\frac{x^2-y^2}{x^2+y^2} + xy\,\frac{2x}{x^2+y^2} - xy\,\frac{2x(x^2-y^2)}{(x^2+y^2)^2} = y\,\frac{x^2-y^2}{x^2+y^2} + xy\,\frac{4xy^2}{(x^2+y^2)^2} \\
\frac{\partial f}{\partial y}(x, y) &= x\,\frac{x^2-y^2}{x^2+y^2} - xy\,\frac{2y}{x^2+y^2} - xy\,\frac{2y(x^2-y^2)}{(x^2+y^2)^2} = x\,\frac{x^2-y^2}{x^2+y^2} - xy\,\frac{4yx^2}{(x^2+y^2)^2}
\end{align*}
433
P ROOFS AND S UPPLEMENTS A.4 T HE ( MULTIVARIABLE ) CHAIN RULE
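Both $\frac{\partial f}{\partial x}(x, y)$ and $\frac{\partial f}{\partial y}(x, y)$ are continuous. Finally, we compute
\begin{align*}
\frac{\partial^2 f}{\partial x\partial y}(0, 0) &= \frac{d}{dx} f_y(x, 0)\Big|_{x=0} = \lim_{h\to 0} \frac{1}{h} \big[ f_y(h, 0) - f_y(0, 0) \big] = \lim_{h\to 0} \frac{1}{h} \Big[ h\,\frac{h^2-0^2}{h^2+0^2} - 0 \Big] = 1 \\
\frac{\partial^2 f}{\partial y\partial x}(0, 0) &= \frac{d}{dy} f_x(0, y)\Big|_{y=0} = \lim_{k\to 0} \frac{1}{k} \big[ f_x(0, k) - f_x(0, 0) \big] = \lim_{k\to 0} \frac{1}{k} \Big[ k\,\frac{0^2-k^2}{0^2+k^2} - 0 \Big] = -1
\end{align*}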
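The two different answers can also be seen numerically. The sketch below approximates the first partial derivatives with central differences and then takes one more difference quotient at the origin. The step sizes `d` and `h` are our own choices, picked so that the inner step is much smaller than the outer one.

```python
def f(x, y):
    """The function from the example above, set to 0 at the origin."""
    if x == 0.0 and y == 0.0:
        return 0.0
    return x * y * (x * x - y * y) / (x * x + y * y)

def f_x(x, y, d=1e-6):
    """Central difference approximation to the partial derivative of f in x."""
    return (f(x + d, y) - f(x - d, y)) / (2 * d)

def f_y(x, y, d=1e-6):
    """Central difference approximation to the partial derivative of f in y."""
    return (f(x, y + d) - f(x, y - d)) / (2 * d)

h = 1e-3
d2f_dxdy = (f_y(h, 0.0) - f_y(0.0, 0.0)) / h  # differentiate in y first, then in x
d2f_dydx = (f_x(0.0, h) - f_x(0.0, 0.0)) / h  # differentiate in x first, then in y

print(d2f_dxdy, d2f_dydx)
```

The printed values are close to 1 and −1 respectively, in agreement with the limits computed above.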
Section A.3.2 of this work was adapted from Section 2.3.2 of CLP 3 – Multivariable Calculus
by Feldman, Rechnitzer, and Yeager under a Creative Commons Attribution-NonCommercial-
ShareAlike 4.0 International license.
\[ \frac{d}{dt} f\big(x(t)\big) = \frac{df}{dx}\big(x(t)\big)\,\frac{dx}{dt}(t) \]
in doing computations like
\[ \frac{d}{dt} \sin(t^2) = \cos(t^2)\, 2t \]
In this example, f ( x ) = sin( x ) and x (t) = t2 .
We now generalize the chain rule to functions of more than one variable. For concreteness,
we concentrate on the case in which all functions are functions of two variables.
That is, we find the partial derivatives $\frac{\partial F}{\partial s}$ and $\frac{\partial F}{\partial t}$ of a function $F(s, t)$ that is defined as a
composition
\[ F(s, t) = f\big(x(s, t), y(s, t)\big) \]
We are using the name $F$ for the new function $F(s, t)$ as a reminder that it is closely related
to, though not the same as, the function $f(x, y)$. The partial derivative $\frac{\partial F}{\partial s}$ is the rate of
change of $F$ when $s$ is varied with $t$ held constant. When $s$ is varied, both the $x$-argument,
$x(s, t)$, and the $y$-argument, $y(s, t)$, in $f\big(x(s, t), y(s, t)\big)$ vary. Consequently, the chain rule
for $f\big(x(s, t), y(s, t)\big)$ is a sum of two terms: one resulting from the variation of the $x$-argument
and the other resulting from the variation of the $y$-argument.
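Theorem A.4.1 (The Chain Rule).
Assume that all first order partial derivatives of $f(x, y)$, $x(s, t)$ and $y(s, t)$ exist
and are continuous. Then the same is true for $F(s, t) = f\big(x(s, t), y(s, t)\big)$ and
\begin{align*}
\frac{\partial F}{\partial s}(s, t) &= \frac{\partial f}{\partial x}\big(x(s, t), y(s, t)\big)\,\frac{\partial x}{\partial s}(s, t) + \frac{\partial f}{\partial y}\big(x(s, t), y(s, t)\big)\,\frac{\partial y}{\partial s}(s, t) \\
\frac{\partial F}{\partial t}(s, t) &= \frac{\partial f}{\partial x}\big(x(s, t), y(s, t)\big)\,\frac{\partial x}{\partial t}(s, t) + \frac{\partial f}{\partial y}\big(x(s, t), y(s, t)\big)\,\frac{\partial y}{\partial t}(s, t)
\end{align*}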
We will give the proof of this theorem in §A.4.2, below. It is common to state this chain
rule as
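\begin{align*}
\frac{\partial F}{\partial s} &= \frac{\partial f}{\partial x}\frac{\partial x}{\partial s} + \frac{\partial f}{\partial y}\frac{\partial y}{\partial s} \\
\frac{\partial F}{\partial t} &= \frac{\partial f}{\partial x}\frac{\partial x}{\partial t} + \frac{\partial f}{\partial y}\frac{\partial y}{\partial t}
\end{align*}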
That is, it is common to suppress the function arguments. But you should make sure that
you understand what the arguments are before doing so.
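To make the role of the arguments concrete, here is a small numerical check of the first formula in Theorem A.4.1. The particular functions $f(x, y) = x^2 y$, $x(s, t) = s\cos t$ and $y(s, t) = s + t^2$ are hypothetical choices made only for this illustration.

```python
import math

# Hypothetical example functions, chosen only for illustration.
def f(x, y):
    return x * x * y

def x_of(s, t):
    return s * math.cos(t)

def y_of(s, t):
    return s + t * t

def F(s, t):
    """The composition F(s, t) = f(x(s, t), y(s, t))."""
    return f(x_of(s, t), y_of(s, t))

def chain_rule_dF_ds(s, t):
    """dF/ds from the chain rule, with f_x and f_y evaluated at (x(s, t), y(s, t))."""
    x, y = x_of(s, t), y_of(s, t)
    f_x = 2 * x * y      # partial derivative of x^2 y in x
    f_y = x * x          # partial derivative of x^2 y in y
    x_s = math.cos(t)    # partial derivative of s cos t in s
    y_s = 1.0            # partial derivative of s + t^2 in s
    return f_x * x_s + f_y * y_s

def finite_difference_dF_ds(s, t, h=1e-6):
    """Central difference approximation to dF/ds, holding t constant."""
    return (F(s + h, t) - F(s - h, t)) / (2 * h)

print(chain_rule_dF_ds(1.3, 0.7), finite_difference_dF_ds(1.3, 0.7))
```

Note that `f_x` and `f_y` are evaluated at the point $(x(s,t), y(s,t))$, not at $(s, t)$; that is exactly the detail the suppressed-argument notation hides.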
Theorem A.4.1 is given for the case that F is the composition of a function of two
variables, f ( x, y), with two functions, x (s, t) and y(s, t), of two variables each. There is
nothing magical about the number two. There are obvious variants for any numbers of
variables. For example,
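Equation A.4.2.
if $F(t) = f\big(x(t), y(t), z(t)\big)$, then
\begin{align*}
\frac{dF}{dt}(t) &= \frac{\partial f}{\partial x}\big(x(t), y(t), z(t)\big)\,\frac{dx}{dt}(t) + \frac{\partial f}{\partial y}\big(x(t), y(t), z(t)\big)\,\frac{dy}{dt}(t) \\
&\qquad + \frac{\partial f}{\partial z}\big(x(t), y(t), z(t)\big)\,\frac{dz}{dt}(t)
\end{align*}
and
Equation A.4.3.
if $F(s, t) = f\big(x(s, t)\big)$, then
\[ \frac{\partial F}{\partial t}(s, t) = \frac{df}{dx}\big(x(s, t)\big)\,\frac{\partial x}{\partial t}(s, t) \]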
To give you an idea of how the proof of Theorem A.4.1 will go, we first review the
proof of the familiar one dimensional chain rule.
A.4.1 §§ Review of the Proof of $\frac{d}{dt} f\big(x(t)\big) = \frac{df}{dx}\big(x(t)\big)\,\frac{dx}{dt}(t)$
As a warm up, let’s review the proof of the one dimensional chain rule
\[ \frac{d}{dt} f\big(x(t)\big) = \frac{df}{dx}\big(x(t)\big)\,\frac{dx}{dt}(t) \]
We wish to find the derivative of $F(t) = f\big(x(t)\big)$. By definition
\begin{align*}
F'(t) &= \lim_{h\to 0} \frac{F(t+h) - F(t)}{h} \\
&= \lim_{h\to 0} \frac{f\big(x(t+h)\big) - f\big(x(t)\big)}{h}
\end{align*}
Notice that the numerator is the difference of f ( x ) evaluated at two nearby values of
x, namely x1 = x (t + h) and x0 = x (t). The Mean Value Theorem is a good tool for
studying the difference in the values of f ( x ) at two nearby points. Recall that the Mean
Value Theorem says that, for any given x0 and x1 , there exists an (in general unknown) c
between them so that
\[ f(x_1) - f(x_0) = f'(c)\,(x_1 - x_0) \]
For this proof, we choose $x_0 = x(t)$ and $x_1 = x(t+h)$. Then the Mean Value Theorem tells
us that there exists a $c_h$ so that
\[ f\big(x(t+h)\big) - f\big(x(t)\big) = f(x_1) - f(x_0) = f'(c_h)\,\big[ x(t+h) - x(t) \big] \]
We have put the subscript $h$ on $c_h$ to emphasise that $c_h$, which is between $x_0 = x(t)$ and
$x_1 = x(t+h)$, may depend on $h$. Now since $c_h$ is trapped between $x(t)$ and $x(t+h)$ and
since $x(t+h) \to x(t)$ as $h \to 0$, we have that $c_h$ must also tend to $x(t)$ as $h \to 0$. Plugging
this into the definition of $F'(t)$,
\begin{align*}
F'(t) &= \lim_{h\to 0} \frac{f\big(x(t+h)\big) - f\big(x(t)\big)}{h} \\
&= \lim_{h\to 0} \frac{f'(c_h)\,\big[ x(t+h) - x(t) \big]}{h} \\
&= \lim_{h\to 0} f'(c_h)\ \lim_{h\to 0} \frac{x(t+h) - x(t)}{h} \\
&= f'\big(x(t)\big)\, x'(t)
\end{align*}
as desired.
We wish to find the partial derivative with respect to $s$ of $F(s, t) = f\big(x(s, t), y(s, t)\big)$.
By definition
\begin{align*}
\frac{\partial F}{\partial s}(s, t) &= \lim_{h\to 0} \frac{F(s+h, t) - F(s, t)}{h} \\
&= \lim_{h\to 0} \frac{f\big(x(s+h, t), y(s+h, t)\big) - f\big(x(s, t), y(s, t)\big)}{h}
\end{align*}
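The numerator is the difference of $f(x, y)$ evaluated at two nearby values of $(x, y)$, namely
$(x_1, y_1) = \big(x(s+h, t), y(s+h, t)\big)$ and $(x_0, y_0) = \big(x(s, t), y(s, t)\big)$. In going from $(x_0, y_0)$ to
$(x_1, y_1)$, both the $x$ and $y$-coordinates change. By adding and subtracting we can separate
the change in the $x$-coordinate from the change in the $y$-coordinate.
\[ f(x_1, y_1) - f(x_0, y_0) = \big\{ f(x_1, y_1) - f(x_0, y_1) \big\} + \big\{ f(x_0, y_1) - f(x_0, y_0) \big\} \]
The first half, $\big\{ f(x_1, y_1) - f(x_0, y_1) \big\}$, has the same $y$ argument in both terms and so is the
difference of the function of one variable $g(x) = f(x, y_1)$ (viewing $y_1$ just as a constant)
evaluated at the two nearby values, $x_0$, $x_1$, of $x$. So, by the mean value theorem,
\begin{align*}
f(x_1, y_1) - f(x_0, y_1) = g(x_1) - g(x_0) &= g'(c_{x,h})\,[x_1 - x_0] = \frac{\partial f}{\partial x}(c_{x,h}, y_1)\,[x_1 - x_0] \\
&= \frac{\partial f}{\partial x}\big(c_{x,h}, y(s+h, t)\big)\,\big[ x(s+h, t) - x(s, t) \big]
\end{align*}
for some (unknown) $c_{x,h}$ between $x_0 = x(s, t)$ and $x_1 = x(s+h, t)$.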
We have introduced the two subscripts in $c_{x,h}$ to remind ourselves that it may depend on
$h$ and that it lies between the two $x$-values $x_0$ and $x_1$.
Similarly, the second half, $\big\{ f(x_0, y_1) - f(x_0, y_0) \big\}$, is the difference of the function of
one variable $h(y) = f(x_0, y)$ (viewing $x_0$ just as a constant) evaluated at the two nearby
values, $y_0$, $y_1$, of $y$. So, by the mean value theorem,
\begin{align*}
f(x_0, y_1) - f(x_0, y_0) = h(y_1) - h(y_0) &= h'(c_{y,h})\,[y_1 - y_0] = \frac{\partial f}{\partial y}(x_0, c_{y,h})\,[y_1 - y_0] \\
&= \frac{\partial f}{\partial y}\big(x(s, t), c_{y,h}\big)\,\big[ y(s+h, t) - y(s, t) \big]
\end{align*}
for some (unknown) $c_{y,h}$ between $y_0 = y(s, t)$ and $y_1 = y(s+h, t)$. Again, the two subscripts
in $c_{y,h}$ remind ourselves that it may depend on $h$ and that it lies between the two
$y$-values $y_0$ and $y_1$. So, noting that, as $h$ tends to zero, $c_{x,h}$, which is trapped between
$x(s, t)$ and $x(s+h, t)$, must tend to $x(s, t)$, and $c_{y,h}$, which is trapped between $y(s, t)$ and
We can of course follow the same procedure to evaluate the partial derivative with respect
to t. This concludes the proof of Theorem A.4.1.
z0 = f ( x, y( x ))
Now we can think about the single-variable function g( x ) = f ( x, y( x )). Since this
function is equal to the constant value z0 , its derivative is zero. Then, using the chain rule:
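\begin{align*}
0 = g'(x) = \frac{d}{dx}\big[ f(x, y(x)) \big] &= \frac{\partial f}{\partial x}\frac{dx}{dx} + \frac{\partial f}{\partial y}\frac{dy}{dx} \\
&= f_x \cdot 1 + f_y \cdot \frac{dy}{dx}
\end{align*}
If $f_y \ne 0$, then
\[ \frac{dy}{dx} = -\frac{f_x}{f_y} \]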
438
P ROOFS AND S UPPLEMENTS A.5 L AGRANGE M ULTIPLIERS : P ROOF OF T HEOREM 2.5.2
Theorem A.4.5.
The derivative of the curve in the xy plane that is implicitly defined by the equation
\[ z_0 = f(x, y) \]
for some constant $z_0$ and some differentiable function $f(x, y)$ is
\[ \frac{dy}{dx} = -\frac{f_x}{f_y} \]
as long as $f_y \ne 0$.
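As an illustration of Theorem A.4.5, consider the level curve $25 = x^2 + y^2$ (a hypothetical example chosen because the curve can also be solved explicitly for $y$). The sketch compares the slope $-f_x/f_y$ at the point $(3, 4)$ with a finite difference slope of the explicit upper semicircle.

```python
import math

# Hypothetical example: f(x, y) = x^2 + y^2, level curve f(x, y) = 25.
def f_x(x, y):
    return 2 * x

def f_y(x, y):
    return 2 * y

def implicit_slope(x, y):
    """dy/dx = -f_x / f_y, valid wherever f_y is nonzero."""
    return -f_x(x, y) / f_y(x, y)

def explicit_y(x):
    """The upper half of the circle, solved explicitly for y."""
    return math.sqrt(25 - x * x)

x0 = 3.0
y0 = explicit_y(x0)  # equals 4.0
h = 1e-6
numeric_slope = (explicit_y(x0 + h) - explicit_y(x0 - h)) / (2 * h)

print(implicit_slope(x0, y0), numeric_slope)
```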
Example A.4.4
Corollary A.4.6.
Proof. First, suppose $f_y(a, b) \ne 0$. Using Theorem A.4.5, the line tangent to the level curve
has slope (in the xy plane) $-\frac{f_x}{f_y}$. So, one vector in the direction tangent to the level curve is
$\langle -f_y, f_x \rangle$. Then
\[ \langle -f_y, f_x \rangle \cdot \langle f_x, f_y \rangle = 0 \]
so $\langle -f_y, f_x \rangle$ and $\nabla f = \langle f_x, f_y \rangle$ are perpendicular.
Second, consider the case $f_y(a, b) = 0$. In this case, at $(a, b)$ the level curve has a
vertical tangent line. If $\nabla f(a, b) \ne 0$, then $f_x(a, b) \ne 0$, so the gradient $\nabla f(a, b) = \langle f_x, 0 \rangle$
is horizontal.
Section A.4 of this work was adapted from Section 2.4 of CLP 3 – Multivariable Calculus
by Feldman, Rechnitzer, and Yeager under a Creative Commons Attribution-NonCommercial-
ShareAlike 4.0 International license.
5 If you’re walking along hilly terrain, changing direction can cause you to change from going uphill to
downhill. Direction definitely matters!
∇ f ( a, b, c) = λ∇ g( a, b, c)
that is
f x ( a, b, c) = λ gx ( a, b, c)
f y ( a, b, c) = λ gy ( a, b, c)
f z ( a, b, c) = λ gz ( a, b, c)
So $F(t)$ is the value of $f$ that we see on our walk at time $t$. Then for all $t$ close to 0,
$\big(x(t), y(t), z(t)\big)$ is close to $\big(x(0), y(0), z(0)\big) = (a, b, c)$ so that
\[ F(0) = f\big(x(0), y(0), z(0)\big) = f(a, b, c) \le f\big(x(t), y(t), z(t)\big) = F(t) \]
for all $t$ close to zero. So $F(t)$ has a local minimum at $t = 0$ and consequently $F'(0) = 0$.
By the chain rule, Theorem A.4.1,
\begin{align*}
F'(0) &= \frac{d}{dt} f\big(x(t), y(t), z(t)\big)\Big|_{t=0} \\
&= f_x(a, b, c)\, x'(0) + f_y(a, b, c)\, y'(0) + f_z(a, b, c)\, z'(0) = 0 \tag{$*$}
\end{align*}
440
P ROOFS AND S UPPLEMENTS A.6 A M ORE R IGOROUS A REA C OMPUTATION
This is true for all paths on $S$ that pass through $(a, b, c)$ at time 0. In particular it is true for
all vectors $\langle x'(0), y'(0), z'(0) \rangle$ that are tangent to $S$ at $(a, b, c)$. So $\nabla f(a, b, c)$ is perpendicular
to $S$ at $(a, b, c)$.
But we already know, by the three-dimensional analogue to Corollary A.4.6, that $\nabla g(a, b, c)$
is also perpendicular to $S$ at $(a, b, c)$. So $\nabla f(a, b, c)$ and $\nabla g(a, b, c)$ have to be parallel vectors.
That is,
\[ \nabla f(a, b, c) = \lambda \nabla g(a, b, c) \]
for some number $\lambda$. That’s the Lagrange multiplier rule of our theorem.
Section A.5 of this work was adapted from Section 2.10 of CLP 3 – Multivariable Calculus
by Feldman, Rechnitzer, and Yeager under a Creative Commons Attribution-NonCommercial-
ShareAlike 4.0 International license.
claimed that upon taking the number of rectangles to infinity, the approximation of the
area became the exact area. However we did not justify the claim. The purpose of this
optional section is to make that calculation rigorous.
The broad set-up is the same. We divide the region up into n vertical strips, each of
width 1/n and we then approximate those strips by rectangles. However rather than an
uncontrolled approximation, we construct two sets of rectangles — one set always smaller
than the original area and one always larger. This then gives us lower and upper bounds
on the area of the region. Finally we make use of the squeeze theorem6 to establish the
result.
• To find our upper and lower bounds we make use of the fact that $e^x$ is an increasing
function. We know this because the derivative $\frac{d}{dx} e^x = e^x$ is always positive. Consequently,
the smallest and largest values of $e^x$ on the interval $a \le x \le b$ are $e^a$ and $e^b$,
respectively.
• In particular, for $0 \le x \le 1/n$, $e^x$ takes values only between $e^0$ and $e^{1/n}$. As a result,
the first strip
\[ \big\{ (x, y) \ \big|\ 0 \le x \le 1/n,\ 0 \le y \le e^x \big\} \]
Hence
\[ \frac{1}{n} e^0 \le \text{Area}\big\{ (x, y) \ \big|\ 0 \le x \le 1/n,\ 0 \le y \le e^x \big\} \le \frac{1}{n} e^{1/n} \]
[Figure: two sketches of $y = e^x$, showing the strips under the curve with the heights $e^0$, $e^{1/n}$ and $e^{2/n}$ marked]
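• Similarly, for the second, third, . . . , last strips, as in the figure on the right above,
\begin{align*}
\frac{1}{n} e^{1/n} &\le \text{Area}\big\{ (x, y) \ \big|\ 1/n \le x \le 2/n,\ 0 \le y \le e^x \big\} \le \frac{1}{n} e^{2/n} \\
\frac{1}{n} e^{2/n} &\le \text{Area}\big\{ (x, y) \ \big|\ 2/n \le x \le 3/n,\ 0 \le y \le e^x \big\} \le \frac{1}{n} e^{3/n} \\
&\ \ \vdots \\
\frac{1}{n} e^{(n-1)/n} &\le \text{Area}\big\{ (x, y) \ \big|\ (n-1)/n \le x \le n/n,\ 0 \le y \le e^x \big\} \le \frac{1}{n} e^{n/n}
\end{align*}
• Adding these $n$ inequalities together gives
\begin{align*}
\frac{1}{n} \Big( 1 + e^{1/n} + \cdots + e^{(n-1)/n} \Big)
&\le \text{Area}\big\{ (x, y) \ \big|\ 0 \le x \le 1,\ 0 \le y \le e^x \big\} \\
&\le \frac{1}{n} \Big( e^{1/n} + e^{2/n} + \cdots + e^{n/n} \Big)
\end{align*}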
• We now apply the Squeeze Theorem to the above inequalities. In particular, the
limits of the lower and upper bounds are
\[ \lim_{n\to\infty} \frac{1}{n}\,\frac{e-1}{e^{1/n}-1} = (e-1) \lim_{X = 1/n \to 0} \frac{X}{e^X - 1} = e - 1 \]
442
P ROOFS AND S UPPLEMENTS A.7 C AREFUL D EFINITION OF THE I NTEGRAL
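\begin{align*}
\lim_{n\to\infty} \frac{1}{n}\, e^{1/n}\,\frac{e-1}{e^{1/n}-1}
&= (e-1) \lim_{X = 1/n \to 0} \frac{X e^X}{e^X - 1} \\
&= (e-1) \lim_{X\to 0} e^X \cdot \lim_{X\to 0} \frac{X}{e^X - 1} \\
&= (e-1) \cdot 1 \cdot 1
\end{align*}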
Thus, since the exact area is trapped between the lower and upper bounds, the
squeeze theorem then implies that
Exact area = e ´ 1.
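The squeeze can be watched happening numerically. The following sketch computes the lower and upper rectangle sums from the argument above for a few values of $n$; the names `lower_sum` and `upper_sum` are ours.

```python
import math

def lower_sum(n):
    """(1/n) * (e^(0/n) + e^(1/n) + ... + e^((n-1)/n)): rectangles that sit under the curve."""
    return sum(math.exp(j / n) for j in range(n)) / n

def upper_sum(n):
    """(1/n) * (e^(1/n) + e^(2/n) + ... + e^(n/n)): rectangles that cover the curve."""
    return sum(math.exp(j / n) for j in range(1, n + 1)) / n

for n in (10, 100, 1000):
    print(n, lower_sum(n), upper_sum(n))
print(math.e - 1)
```

The gap between the two sums is exactly $(e-1)/n$, since they differ only in their first and last terms, so both are forced towards $e - 1$.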
Section A.6 of this work was adapted from Section 1.1.1 of CLP 2 – Integral Calculus by
Feldman, Rechnitzer, and Yeager under a Creative Commons Attribution-NonCommercial-
ShareAlike 4.0 International license.
\[ a = x_0 < x_1 < x_2 < \cdots < x_{n-1} < x_n = b. \]
The subinterval number $i$ runs from $x_{i-1}$ to $x_i$. This formulation does not require
the subintervals to have the same size. However we will eventually require that the
widths of the subintervals shrink towards zero as $n \to \infty$.
• Then for each subinterval we select a value of $x$ in that interval. That is, for $i =
1, 2, \ldots, n$, choose $x_i^*$ satisfying $x_{i-1} \le x_i^* \le x_i$. We will use these values of $x$ to help
approximate $f(x)$ on each subinterval.
[Figure: the graph of $y = f(x)$ over the partition $a = x_0, x_1, x_2, x_3, \cdots, x_{n-1}, x_n = b$]
• The area between the graph of $y = f(x)$ and the $x$–axis, with $x$ running from $x_{i-1}$
to $x_i$, i.e. the contribution, $\int_{x_{i-1}}^{x_i} f(x)\,dx$, from interval number $i$ to the integral, is
approximated by the area of a rectangle. The rectangle has width $x_i - x_{i-1}$ and height
$f(x_i^*)$.
• Of course every different choice of $n$ and $x_1, x_2, \ldots, x_{n-1}$ and $x_1^*, x_2^*, \ldots, x_n^*$ gives a
different approximation. So to simplify the discussion that follows, let us denote a
particular choice of all these numbers by $\mathbb{P}$:
\[ \mathbb{P} = (n, x_1, x_2, \cdots, x_{n-1}, x_1^*, x_2^*, \cdots, x_n^*). \]
Similarly let us denote the resulting approximation by $I(\mathbb{P})$:
\[ I(\mathbb{P}) = f(x_1^*)[x_1 - x_0] + f(x_2^*)[x_2 - x_1] + \cdots + f(x_n^*)[x_n - x_{n-1}] \]
• We claim that, for any reasonable7 function $f(x)$, if you take any reasonable8 sequence
of these approximations you always get exactly the same limiting value.
We define $\int_a^b f(x)\,dx$ to be this limiting value.
• Let’s be more precise. We can take the limit of these approximations in two equiv-
alent ways. Above we did this by taking the number of subintervals n to infinity.
When we did this, the width of all the subintervals went to zero. With the formu-
lation we are now using, simply taking the number of subintervals to be very large
does not imply that they will all shrink in size. We could have one very large subin-
terval and a large number of tiny ones. Thus we take the limit we need by taking the
width of the subintervals to zero. So for any choice $\mathbb{P}$, we define
\[ M(\mathbb{P}) = \max\big\{ x_1 - x_0,\ x_2 - x_1,\ \cdots,\ x_n - x_{n-1} \big\} \]
that is the maximum width of the subintervals used in the approximation determined
by $\mathbb{P}$. By forcing the maximum width to go to zero, the widths of all the
subintervals go to zero.
• We then define the definite integral as the limit
\[ \int_a^b f(x)\,dx = \lim_{M(\mathbb{P})\to 0} I(\mathbb{P}). \]
Of course, one is now left with the question of determining when the above limit exists. A
proof of the very general conditions which guarantee existence of this limit is beyond the
scope of this course, so we instead give a weaker result (with stronger conditions) which
is far easier to prove.
For the rest of this section, assume
• that $f(x)$ is continuous for $a \le x \le b$,
• that $f(x)$ is differentiable for $a < x < b$, and
• that $f'(x)$ is bounded, i.e. $|f'(x)| \le F$ for some constant $F$.
We will now show that, under these hypotheses, as M(P) approaches zero, I(P) always
approaches the area, A, between the graph of y = f ( x ) and the x–axis, with x running
from a to b.
These assumptions are chosen to make the argument particularly transparent. With a
little more work one can weaken the hypotheses considerably. We are cheating a little by
implicitly assuming that the area A exists. In fact, one can adjust the argument below to
remove this implicit assumption.
• Consider $A_j$, the part of the area coming from $x_{j-1} \le x \le x_j$.
\[ f(\underline{x}_j)[x_j - x_{j-1}] \le A_j \le f(\overline{x}_j)[x_j - x_{j-1}]. \]
• So both the true area, $A_j$, and our approximation of that area $f(x_j^*)[x_j - x_{j-1}]$ have
to lie between $f(\underline{x}_j)[x_j - x_{j-1}]$ and $f(\overline{x}_j)[x_j - x_{j-1}]$. Combining these bounds we
have that the difference between the true area and our approximation of that area is
bounded by
\[ \big| A_j - f(x_j^*)[x_j - x_{j-1}] \big| \le [f(\overline{x}_j) - f(\underline{x}_j)] \cdot [x_j - x_{j-1}]. \]
(To see this think about the smallest the true area can be and the largest our approximation
can be and vice versa.)
• Now since our function, $f(x)$ is differentiable we can apply one of the main theorems
we learned in first-semester calculus: the Mean Value Theorem10. The MVT
implies that there exists a $c$ between $\underline{x}_j$ and $\overline{x}_j$ such that
\[ f(\overline{x}_j) - f(\underline{x}_j) = f'(c) \cdot [\overline{x}_j - \underline{x}_j] \]
• By the assumption that $|f'(x)| \le F$ for all $x$ and the fact that $\underline{x}_j$ and $\overline{x}_j$ must both be
between $x_{j-1}$ and $x_j$
\[ \big| f(\overline{x}_j) - f(\underline{x}_j) \big| \le F \cdot \big| \overline{x}_j - \underline{x}_j \big| \le F \cdot [x_j - x_{j-1}] \]
\[ \big| A_j - f(x_j^*)[x_j - x_{j-1}] \big| \le F \cdot [x_j - x_{j-1}]^2. \]
9 Here we are using the Extreme Value Theorem; its proof is beyond the scope of this course. The
theorem says that any continuous function on a closed interval must attain a minimum and maximum
at least once. In this situation this implies that for any continuous function $f(x)$, there are $x_{j-1} \le
\underline{x}_j, \overline{x}_j \le x_j$ such that $f(\underline{x}_j) \le f(x) \le f(\overline{x}_j)$ for all $x_{j-1} \le x \le x_j$.
10 Recall that the Mean Value Theorem states that for a function continuous on $[a, b]$ and differentiable on
$(a, b)$, there exists a number $c$ between $a$ and $b$ so that
\[ f'(c) = \frac{f(b) - f(a)}{b - a}. \]
446
P ROOFS AND S UPPLEMENTS A.8 I NTEGRATING sec x, csc x, sec3 x AND csc3 x
• That was just the error in approximating $A_j$. Now we bound the total error by combining
the errors from approximating on all the subintervals. This gives
\begin{align*}
|A - I(\mathbb{P})| &= \Big| \sum_{j=1}^{n} A_j - \sum_{j=1}^{n} f(x_j^*)[x_j - x_{j-1}] \Big| \\
&= \Big| \sum_{j=1}^{n} \big( A_j - f(x_j^*)[x_j - x_{j-1}] \big) \Big| && \text{triangle inequality} \\
&\le \sum_{j=1}^{n} \big| A_j - f(x_j^*)[x_j - x_{j-1}] \big| \\
&\le \sum_{j=1}^{n} F \cdot [x_j - x_{j-1}]^2 && \text{from above}
\end{align*}
Now do something a little sneaky. Replace one of these factors of $[x_j - x_{j-1}]$ (which
is just the width of the $j$th subinterval) by the maximum width of the subintervals:
\begin{align*}
&\le \sum_{j=1}^{n} F \cdot M(\mathbb{P}) \cdot [x_j - x_{j-1}] && \text{$F$ and $M(\mathbb{P})$ are constant} \\
&\le F \cdot M(\mathbb{P}) \cdot \sum_{j=1}^{n} [x_j - x_{j-1}] && \text{sum is total width} \\
&= F \cdot M(\mathbb{P}) \cdot (b - a).
\end{align*}
• Since $a$, $b$ and $F$ are fixed, this tends to zero as the maximum rectangle width $M(\mathbb{P})$
tends to zero.
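The bound $|A - I(\mathbb{P})| \le F \cdot M(\mathbb{P}) \cdot (b - a)$ can be tested on a concrete example. The sketch below uses $f(x) = \sin x$ on $[0, 2]$, for which $|f'(x)| \le 1$ and the exact area is $1 - \cos 2$, together with randomly chosen partition points and sample points. The function, the interval and the helper names are our own choices for illustration.

```python
import math
import random

def riemann_sum(f, points, samples):
    """I(P): sum of f(x_j^*) * (x_j - x_{j-1}) over the subintervals."""
    return sum(f(xs) * (right - left)
               for left, right, xs in zip(points, points[1:], samples))

def random_partition(a, b, n, rng):
    """A choice P: n subintervals of unequal width and one sample point in each."""
    interior = sorted(rng.uniform(a, b) for _ in range(n - 1))
    points = [a] + interior + [b]
    samples = [rng.uniform(left, right) for left, right in zip(points, points[1:])]
    return points, samples

rng = random.Random(0)
a, b, F = 0.0, 2.0, 1.0          # |d/dx sin x| = |cos x| <= 1 = F
exact = 1.0 - math.cos(2.0)      # area under sin x for 0 <= x <= 2

for n in (10, 100, 1000):
    points, samples = random_partition(a, b, n, rng)
    max_width = max(right - left for left, right in zip(points, points[1:]))
    error = abs(exact - riemann_sum(math.sin, points, samples))
    print(n, error, F * max_width * (b - a))
```

In each row the actual error should come out below the bound, usually by a wide margin, since the bound assumes the worst case on every subinterval.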
Thus, we have proven
Theorem A.7.1.
Section A.7 of this work was adapted from Section 1.1.6 of CLP 2 – Integral Calculus by
Feldman, Rechnitzer, and Yeager under a Creative Commons Attribution-NonCommercial-
ShareAlike 4.0 International license.
when n is odd and m is even, we instead give some examples to give the idea of what to
expect.
• Notice now that the numerator of this expression is exactly the derivative of its denominator.
Hence we can substitute $u = \sec x + \tan x$ and $du = (\sec x \tan x + \sec^2 x)\,dx$.
• Hence
\begin{align*}
\int \sec x\,dx &= \int \sec x\,\frac{\sec x + \tan x}{\sec x + \tan x}\,dx = \int \frac{\sec^2 x + \sec x \tan x}{\sec x + \tan x}\,dx \\
&= \int \frac{1}{u}\,du \\
&= \ln|u| + C \\
&= \ln|\sec x + \tan x| + C
\end{align*}
• The above trick appears both totally unguessable and very hard to remember. For-
tunately, there is a simple way11 to recover the trick. Here it is.
Example A.8.1
There is a second method for integrating $\int \sec x\,dx$, that is more tedious, but more straightforward. In particular, it does not involve a memorized trick. The integral $\int \sec x\,dx$ is converted into the integral $\int \frac{du}{1-u^2}$ by using the substitution $u = \sin x$, $du = \cos x\,dx$. The integral $\int \frac{du}{1-u^2}$ is then integrated by the method of partial fractions, which we shall learn
about in Section 3.8 “Partial Fractions”. The details are in Example 3.8.4 in those notes.
This second method gives the answer
\[ \int \sec x\,dx = \frac{1}{2}\ln\frac{1+\sin x}{1-\sin x} + C \]
which appears to be different than the answer in Example A.8.1. But they really are the same since
\begin{align*}
\frac{1+\sin x}{1-\sin x} &= \frac{(1+\sin x)^2}{1-\sin^2 x} = \frac{(1+\sin x)^2}{\cos^2 x} \\
\implies \frac{1}{2}\ln\frac{1+\sin x}{1-\sin x} &= \frac{1}{2}\ln\frac{(1+\sin x)^2}{\cos^2 x} = \ln\left|\frac{\sin x + 1}{\cos x}\right| = \ln|\tan x + \sec x|
\end{align*}
Oof!
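As a sanity check on the algebra above, the two forms of the antiderivative can be compared numerically. The sketch below is illustrative only: the sample points and the helper names are assumptions, and the derivative is approximated by a central difference rather than computed exactly.

```python
import math

def sec(x):
    return 1.0 / math.cos(x)

def antiderivative_trick(x):
    # ln|sec x + tan x|, the form found in Example A.8.1
    return math.log(abs(sec(x) + math.tan(x)))

def antiderivative_partial_fractions(x):
    # (1/2) ln((1 + sin x)/(1 - sin x)), the form from the second method
    return 0.5 * math.log((1 + math.sin(x)) / (1 - math.sin(x)))

def central_difference(g, x, h=1e-6):
    # Numerical approximation of g'(x)
    return (g(x + h) - g(x - h)) / (2 * h)

# Sample points chosen (arbitrarily) inside (-pi/2, pi/2)
sample_points = [-1.2, -0.5, 0.0, 0.3, 1.0, 1.4]

max_form_gap = max(
    abs(antiderivative_trick(x) - antiderivative_partial_fractions(x))
    for x in sample_points
)
max_derivative_gap = max(
    abs(central_difference(antiderivative_trick, x) - sec(x))
    for x in sample_points
)
print(max_form_gap, max_derivative_gap)
```

Both gaps should be at the level of floating point and finite-difference error, consistent with the two answers being the same function on this interval.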
Example A.8.2 $\int \csc x\,dx$ – by the $u = \tan\frac{x}{2}$ substitution
Solution. The integral $\int \csc x\,dx$ may also be evaluated by both the methods above. That is either
• by multiplying the integrand by a cleverly chosen $1 = \frac{\cot x - \csc x}{\cot x - \csc x}$ and then substituting $u = \cot x - \csc x$, $du = (-\csc^2 x + \csc x\cot x)\,dx$, or
• by substituting $u = \cos x$, $du = -\sin x\,dx$ to give $\int \csc x\,dx = -\int \frac{du}{1-u^2}$ and then
• To express $\sin x$ and $\cos x$ in terms of $u$, we first use the double angle trig identities (Equations 3.6.2 and 3.6.3 with $x \mapsto \frac{x}{2}$) to express $\sin x$ and $\cos x$ in terms of $\sin\frac{x}{2}$ and $\cos\frac{x}{2}$:
\begin{align*}
\sin x &= 2\sin\frac{x}{2}\cos\frac{x}{2} \\
\cos x &= \cos^2\frac{x}{2} - \sin^2\frac{x}{2}
\end{align*}
12 A rational function of $\sin x$ and $\cos x$ is a ratio with both the numerator and denominator being finite sums of terms of the form $a\sin^m x\cos^n x$, where $a$ is a constant and $m$ and $n$ are positive integers.
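[Figure: a right triangle with angle $\frac{x}{2}$, bottom side $1$, right hand side $u$, and hypotenuse $\sqrt{1+u^2}$.]
to express $\sin\frac{x}{2}$ and $\cos\frac{x}{2}$ in terms of $u$. The bottom and right hand sides of the triangle have been chosen so that $\tan\frac{x}{2} = u$. This tells us that
\begin{align*}
\sin\frac{x}{2} &= \frac{u}{\sqrt{1+u^2}} \qquad \cos\frac{x}{2} = \frac{1}{\sqrt{1+u^2}} \\
\sin x &= 2\sin\frac{x}{2}\cos\frac{x}{2} = 2\,\frac{u}{\sqrt{1+u^2}}\,\frac{1}{\sqrt{1+u^2}} = \frac{2u}{1+u^2} \\
\cos x &= \cos^2\frac{x}{2} - \sin^2\frac{x}{2} = \frac{1}{1+u^2} - \frac{u^2}{1+u^2} = \frac{1-u^2}{1+u^2}
\end{align*}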
Oof!
\begin{align*}
\int \csc x\,dx &= \int \frac{1}{\sin x}\,dx = \int \frac{1+u^2}{2u}\,\frac{2}{1+u^2}\,du = \int \frac{1}{u}\,du = \ln|u| + C \\
&= \ln\left|\tan\frac{x}{2}\right| + C
\end{align*}
To see that this answer is really the same as that in (A.8.1), note that
Example A.8.2
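The half-angle formulas used in Example A.8.2 are easy to spot-check numerically. The sketch below is a hedged illustration: the sample points are arbitrary choices in $(0,\pi)$, the function names are made up for this example, and the derivative is approximated by a central difference.

```python
import math

def half_angle_gaps(x):
    """Compare sin x and cos x with their expressions in u = tan(x/2)."""
    u = math.tan(x / 2)
    sin_from_u = 2 * u / (1 + u * u)
    cos_from_u = (1 - u * u) / (1 + u * u)
    return abs(sin_from_u - math.sin(x)), abs(cos_from_u - math.cos(x))

def csc_antiderivative(x):
    # ln|tan(x/2)|, the answer obtained in Example A.8.2 (with C = 0)
    return math.log(abs(math.tan(x / 2)))

def central_difference(g, x, h=1e-6):
    # Numerical approximation of g'(x)
    return (g(x + h) - g(x - h)) / (2 * h)

sample_points = [0.3, 1.0, 2.0, 3.0]

identity_gap = max(max(half_angle_gaps(x)) for x in sample_points)
derivative_gap = max(
    abs(central_difference(csc_antiderivative, x) - 1 / math.sin(x))
    for x in sample_points
)
print(identity_gap, derivative_gap)
```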
Example A.8.3 $\int \sec^3 x\,dx$ – by trickery
Solution. The standard trick used to evaluate $\int \sec^3 x\,dx$ is integration by parts.
• Set $u = \sec x$, $dv = \sec^2 x\,dx$. Hence $du = \sec x\tan x\,dx$, $v = \tan x$ and
\begin{align*}
\int \sec^3 x\,dx &= \int \underbrace{\sec x}_{u}\ \underbrace{\sec^2 x\,dx}_{dv} \\
&= \underbrace{\sec x}_{u}\ \underbrace{\tan x}_{v} - \int \underbrace{\tan x}_{v}\ \underbrace{\sec x\tan x\,dx}_{du}
\end{align*}
\[ \int \sec^3 x\,dx = \sec x\tan x + \ln|\sec x + \tan x| + C - \int \sec^3 x\,dx \]
where we used $\int \sec x\,dx = \ln|\sec x + \tan x| + C$, which we saw in Example A.8.1.
• Now moving the $\int \sec^3 x\,dx$ from the right hand side to the left hand side
\begin{align*}
2\int \sec^3 x\,dx &= \sec x\tan x + \ln|\sec x + \tan x| + C \quad\text{and so} \\
\int \sec^3 x\,dx &= \frac{1}{2}\sec x\tan x + \frac{1}{2}\ln|\sec x + \tan x| + C
\end{align*}
for a new arbitrary constant $C$ (which is just one half the old one).
Example A.8.3
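One way to gain confidence in the formula from Example A.8.3 is to compare it with a direct numerical integration. The sketch below is an illustration under assumptions: the interval $[0,1]$ and the number of subintervals are arbitrary choices, and the numerical method is a standard composite Simpson's rule written out by hand.

```python
import math

def sec_cubed(x):
    return 1.0 / math.cos(x) ** 3

def sec_cubed_antiderivative(x):
    # (1/2) sec x tan x + (1/2) ln|sec x + tan x|, with C = 0
    sec_x = 1.0 / math.cos(x)
    tan_x = math.tan(x)
    return 0.5 * sec_x * tan_x + 0.5 * math.log(abs(sec_x + tan_x))

def simpson(f, a, b, n=1000):
    """Composite Simpson's rule with n (even) subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        # Odd-indexed interior points get weight 4, even-indexed get weight 2
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

numeric = simpson(sec_cubed, 0.0, 1.0)
closed_form = sec_cubed_antiderivative(1.0) - sec_cubed_antiderivative(0.0)
print(numeric, closed_form)
```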
• Substitute $u = \sin x$, $du = \cos x\,dx$ to convert $\int \sec^3 x\,dx$ into $\int \frac{du}{[1-u^2]^2}$ and evaluate the latter using the method of partial fractions. This is done in Example 3.8.5 in Section 3.8.
• Use the $u = \tan\frac{x}{2}$ substitution. We use this method to evaluate $\int \csc^3 x\,dx$ in Example A.8.4, below.
Example A.8.4 $\int \csc^3 x\,dx$ – by the $u = \tan\frac{x}{2}$ substitution
Solution. Let us use the half-angle substitution that we introduced in Example A.8.2.
PROOFS AND SUPPLEMENTS A.9 PARTIAL FRACTION DECOMPOSITIONS
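• This is a perfectly acceptable answer. But if you don’t like the $\frac{x}{2}$’s, they may be eliminated by using
\begin{align*}
\tan^2\frac{x}{2} - \cot^2\frac{x}{2} &= \frac{\sin^2\frac{x}{2}}{\cos^2\frac{x}{2}} - \frac{\cos^2\frac{x}{2}}{\sin^2\frac{x}{2}} \\
&= \frac{\sin^4\frac{x}{2} - \cos^4\frac{x}{2}}{\sin^2\frac{x}{2}\cos^2\frac{x}{2}} \\
&= \frac{\big(\sin^2\frac{x}{2} - \cos^2\frac{x}{2}\big)\big(\sin^2\frac{x}{2} + \cos^2\frac{x}{2}\big)}{\sin^2\frac{x}{2}\cos^2\frac{x}{2}} \\
&= \frac{\sin^2\frac{x}{2} - \cos^2\frac{x}{2}}{\sin^2\frac{x}{2}\cos^2\frac{x}{2}} && \text{since } \sin^2\frac{x}{2} + \cos^2\frac{x}{2} = 1 \\
&= \frac{-\cos x}{\frac{1}{4}\sin^2 x} && \text{by (3.6.2) and (3.6.3)}
\end{align*}
and
\begin{align*}
\tan\frac{x}{2} &= \frac{\sin\frac{x}{2}}{\cos\frac{x}{2}} = \frac{\sin^2\frac{x}{2}}{\sin\frac{x}{2}\cos\frac{x}{2}} \\
&= \frac{\frac{1}{2}[1-\cos x]}{\frac{1}{2}\sin x} && \text{by (3.6.2) and (3.6.3)}
\end{align*}
So we may also write
\[ \int \csc^3 x\,dx = -\frac{1}{2}\cot x\csc x + \frac{1}{2}\ln|\csc x - \cot x| + C \]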
Example A.8.4
Section A.8 of this work was adapted from Section 1.8.3 of CLP 2 – Integral Calculus by Feldman, Rechnitzer, and Yeager under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license.
Equation A.9.1.
\[ \frac{N(x)}{D(x)} = \frac{A_1}{x-a_1} + \cdots + \frac{A_j}{x-a_j} + \frac{B_1x+C_1}{x^2+b_1x+c_1} + \cdots + \frac{B_kx+C_k}{x^2+b_kx+c_k} \]
Note that the numerator of each term on the right hand side has degree one smaller
than the degree of the denominator.
The quadratic terms $\frac{Bx+C}{x^2+bx+c}$ are integrated in a two-step process that is best illustrated with a simple example (see also Example A.9.5 below).
Example A.9.2 $\int \frac{2x+7}{x^2+4x+13}\,dx$
Solution.
Example A.9.2
Equation A.9.3.
We have already seen how to integrate the simple and general linear terms, and the
simple quadratic terms. Integrating general quadratic terms is not so straightforward.
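Example A.9.4 $\int \frac{dx}{(x^2+1)^n}$
\begin{align*}
\int \frac{1}{(x^2+1)^n}\,dx &= \int \frac{(x^2+1-x^2)}{(x^2+1)^n}\,dx && \text{sneaky} \\
&= \int \frac{1}{(x^2+1)^{n-1}}\,dx - \int \frac{x^2}{(x^2+1)^n}\,dx \\
&= I_{n-1} - \int \frac{x^2}{(x^2+1)^n}\,dx
\end{align*}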
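\[ \int \frac{x^2}{(x^2+1)^n}\,dx = \int \frac{x}{2}\cdot\frac{2x}{(x^2+1)^n}\,dx \qquad \text{sneaky} \]
\begin{align*}
\int \frac{x}{2}\cdot\frac{2x}{(x^2+1)^n}\,dx &= -\frac{x}{2(n-1)(x^2+1)^{n-1}} + \int \frac{dx}{2(n-1)(x^2+1)^{n-1}} \\
&= -\frac{x}{2(n-1)(x^2+1)^{n-1}} + \frac{1}{2(n-1)}\cdot I_{n-1}
\end{align*}
\begin{align*}
I_n &= \int \frac{1}{(x^2+1)^n}\,dx \\
&= I_{n-1} + \frac{x}{2(n-1)(x^2+1)^{n-1}} - \frac{1}{2(n-1)}\cdot I_{n-1} \\
&= \frac{2n-3}{2(n-1)}\,I_{n-1} + \frac{x}{2(n-1)(x^2+1)^{n-1}}
\end{align*}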
• We can then use this recurrence to write down $I_n$ for the first few $n$:
\begin{align*}
I_2 &= \frac{1}{2}I_1 + \frac{x}{2(x^2+1)} \\
&= \frac{1}{2}\arctan x + \frac{x}{2(x^2+1)} + C \\
I_3 &= \frac{3}{4}I_2 + \frac{x}{4(x^2+1)^2} \\
&= \frac{3}{8}\arctan x + \frac{3x}{8(x^2+1)} + \frac{x}{4(x^2+1)^2} + C \\
I_4 &= \frac{5}{6}I_3 + \frac{x}{6(x^2+1)^3} \\
&= \frac{5}{16}\arctan x + \frac{5x}{16(x^2+1)} + \frac{5x}{24(x^2+1)^2} + \frac{x}{6(x^2+1)^3} + C
\end{align*}
and so forth. You can see why partial fraction questions involving denominators
with repeated quadratic factors do not often appear on exams.
Example A.9.4
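The recurrence $I_n = \frac{2n-3}{2(n-1)}I_{n-1} + \frac{x}{2(n-1)(x^2+1)^{n-1}}$ from Example A.9.4 translates directly into a short recursive function. The sketch below is illustrative: it takes $I_1 = \arctan x$ with the constant of integration set to zero, and the function names and sample points are assumptions made for this example. It checks the recurrence against the closed forms listed above and against a numerical derivative.

```python
import math

def I(n, x):
    """Antiderivative of 1/(x^2+1)^n (with C = 0), built from the recurrence."""
    if n == 1:
        return math.atan(x)
    return ((2 * n - 3) / (2 * (n - 1))) * I(n - 1, x) \
        + x / (2 * (n - 1) * (x * x + 1) ** (n - 1))

def I3_closed_form(x):
    return (3 / 8) * math.atan(x) + 3 * x / (8 * (x * x + 1)) \
        + x / (4 * (x * x + 1) ** 2)

def I4_closed_form(x):
    return (5 / 16) * math.atan(x) + 5 * x / (16 * (x * x + 1)) \
        + 5 * x / (24 * (x * x + 1) ** 2) + x / (6 * (x * x + 1) ** 3)

def central_difference(g, x, h=1e-6):
    # Numerical approximation of g'(x)
    return (g(x + h) - g(x - h)) / (2 * h)

sample_points = [-2.0, 0.5, 1.5]

closed_form_gap = max(
    max(abs(I(3, x) - I3_closed_form(x)), abs(I(4, x) - I4_closed_form(x)))
    for x in sample_points
)
# d/dx I_n(x) should reproduce the integrand 1/(x^2+1)^n
derivative_gap = max(
    abs(central_difference(lambda t: I(n, t), x) - 1 / (x * x + 1) ** n)
    for n in (2, 3, 4)
    for x in sample_points
)
print(closed_form_gap, derivative_gap)
```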
Example A.9.5 $\int \frac{x^4+5x^3+16x^2+26x+22}{x^3+3x^2+7x+5}\,dx$
Solution.
• Step 1. Again, we start by comparing the degrees of the numerator and denominator. In this example, the numerator, $x^4+5x^3+16x^2+26x+22$, has degree four and the denominator, $x^3+3x^2+7x+5$, has degree three. As $4 \ge 3$, we must execute the first step, which is to write $\frac{N(x)}{D(x)}$ in the form
\[ \frac{N(x)}{D(x)} = P(x) + \frac{R(x)}{D(x)} \]
– We start by observing that to get from the highest degree term in the denominator ($x^3$) to the highest degree term in the numerator ($x^4$), we have to multiply by $x$. So we write,
\[
\begin{array}{r|rrrrr}
 & & & & x & \\
\hline
x^3+3x^2+7x+5 & x^4 & {}+5x^3 & {}+16x^2 & {}+26x & {}+22
\end{array}
\]
– The remainder was $2x^3+9x^2+21x+22$. To get from the highest degree term in the denominator ($x^3$) to the highest degree term in the remainder ($2x^3$), we have to multiply by $2$. So we write,
\[
\begin{array}{r|rrrrr}
 & & & & x & {}+2 \\
\hline
x^3+3x^2+7x+5 & x^4 & {}+5x^3 & {}+16x^2 & {}+26x & {}+22 \\
 & x^4 & {}+3x^3 & {}+7x^2 & {}+5x & \\
\cline{2-6}
 & & 2x^3 & {}+9x^2 & {}+21x & {}+22
\end{array}
\]
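\[
\begin{array}{r|rrrrrl}
 & & & & x & {}+2 & \\
\hline
x^3+3x^2+7x+5 & x^4 & {}+5x^3 & {}+16x^2 & {}+26x & {}+22 & \\
 & x^4 & {}+3x^3 & {}+7x^2 & {}+5x & & x(x^3+3x^2+7x+5) \\
\cline{2-6}
 & & 2x^3 & {}+9x^2 & {}+21x & {}+22 & \\
 & & 2x^3 & {}+6x^2 & {}+14x & {}+10 & 2(x^3+3x^2+7x+5) \\
\cline{2-6}
 & & & 3x^2 & {}+7x & {}+12 &
\end{array}
\]
– This leaves a remainder of $3x^2+7x+12$. Because the remainder has degree 2, which is smaller than the degree of the denominator, which is 3, we stop.
– In this example, when we subtracted $x(x^3+3x^2+7x+5)$ and $2(x^3+3x^2+7x+5)$ from $x^4+5x^3+16x^2+26x+22$ we ended up with $3x^2+7x+12$. That is,
Moving the $(x+2)(x^3+3x^2+7x+5)$ to the right hand side and dividing the whole equation by $x^3+3x^2+7x+5$ gives
\[
\begin{array}{r|rrrrrl}
 & & & & x & {}+2 & P(x) \\
\hline
x^3+3x^2+7x+5 & x^4 & {}+5x^3 & {}+16x^2 & {}+26x & {}+22 & \\
 & x^4 & {}+3x^3 & {}+7x^2 & {}+5x & & \\
\cline{2-6}
 & & 2x^3 & {}+9x^2 & {}+21x & {}+22 & \\
 & & 2x^3 & {}+6x^2 & {}+14x & {}+10 & \\
\cline{2-6}
 & & & 3x^2 & {}+7x & {}+12 & R(x)
\end{array}
\]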
– The trick exploits the fact that most polynomials that appear in homework as-
signments and on tests have integer coefficients and some integer roots. Any
13 One does not typically think of mathematics assignments or exams as nice kind places. . . The polyno-
mials that appear in the “real world” are not so forgiving. Nature, red in tooth and claw — to quote
Tennyson inappropriately (especially when this author doesn’t know any other words from the poem).
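\[
\begin{array}{r|rrrrl}
 & x^2 & {}+2x & {}+5 & & \\
\hline
x+1 & x^3 & {}+3x^2 & {}+7x & {}+5 & \\
 & x^3 & {}+x^2 & & & x^2(x+1) \\
\cline{2-5}
 & & 2x^2 & {}+7x & {}+5 & \\
 & & 2x^2 & {}+2x & & 2x(x+1) \\
\cline{2-5}
 & & & 5x & {}+5 & \\
 & & & 5x & {}+5 & 5(x+1) \\
\cline{2-5}
 & & & & 0 &
\end{array}
\]
\[ x^3+3x^2+7x+5 - x^2(x+1) - 2x(x+1) - 5(x+1) = 0 \]
or
\[ x^3+3x^2+7x+5 = x^2(x+1) + 2x(x+1) + 5(x+1) \]
or
\[ x^3+3x^2+7x+5 = (x^2+2x+5)(x+1) \]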
14 Appendix B.16 contains several simple tricks for factoring polynomials. We recommend that you have
a look at them.
– It isn’t quite time to stop yet; we should attempt to factor the quadratic factor, $x^2+2x+5$. We can use the quadratic formula15 to find the roots of $x^2+2x+5$:
\[ \frac{-b \pm \sqrt{b^2-4ac}}{2a} = \frac{-2 \pm \sqrt{4-20}}{2} = \frac{-2 \pm \sqrt{-16}}{2} \]
Since this expression contains the square root of a negative number the equation $x^2+2x+5 = 0$ has no real solutions; without the use of complex numbers, $x^2+2x+5$ cannot be factored.
\[ \frac{3x^2+7x+12}{(x+1)(x^2+2x+5)} = \frac{A}{x+1} + \frac{Bx+C}{x^2+2x+5} \]
for some constants $A$, $B$ and $C$.
Note that the numerator, $Bx+C$, of the second term on the right hand side is not just a constant. It is of degree one, which is exactly one smaller than the degree of the denominator, $x^2+2x+5$. More generally, if the denominator consists of $n$ different linear factors and $m$ different quadratic factors, then we decompose the ratio as
\begin{align*}
\text{rational function} &= \frac{A_1}{\text{linear factor 1}} + \frac{A_2}{\text{linear factor 2}} + \cdots + \frac{A_n}{\text{linear factor } n} \\
&\quad + \frac{B_1x+C_1}{\text{quadratic factor 1}} + \frac{B_2x+C_2}{\text{quadratic factor 2}} + \cdots + \frac{B_mx+C_m}{\text{quadratic factor } m}
\end{align*}
To determine the values of the constants $A$, $B$, $C$, we put the right hand side back over the common denominator $(x+1)(x^2+2x+5)$.
\[ \frac{3x^2+7x+12}{(x+1)(x^2+2x+5)} = \frac{A}{x+1} + \frac{Bx+C}{x^2+2x+5} = \frac{A(x^2+2x+5) + (Bx+C)(x+1)}{(x+1)(x^2+2x+5)} \]
The fraction on the far left is the same as the fraction on the far right if and only if their numerators are the same.
\[ 3x^2+7x+12 = A(x^2+2x+5) + (Bx+C)(x+1) \]
Again, as in Example 3.8.1, there are a couple of different ways to determine the values of $A$, $B$ and $C$ from this equation.
• Step 3 – Algebra Method. The conceptually clearest procedure is to write the right hand side as a polynomial in standard form (i.e. collect up all $x^2$ terms, all $x$ terms and all constant terms)
\[ 3x^2+7x+12 = (A+B)x^2 + (2A+B+C)x + (5A+C) \]
For these two polynomials to be the same, the coefficient of $x^2$ on the left hand side and the coefficient of $x^2$ on the right hand side must be the same. Similarly the coefficients of $x^1$ must match and the coefficients of $x^0$ must match.
This gives us a system of three equations
\[ A+B = 3 \qquad 2A+B+C = 7 \qquad 5A+C = 12 \]
in the three unknowns $A$, $B$, $C$. We can solve this system by
– using the first equation, namely $A+B=3$, to determine $A$ in terms of $B$: $A = 3-B$.
– Substituting this into the remaining two equations eliminates the $A$’s from these two equations, leaving two equations in the two unknowns $B$ and $C$.
\begin{align*}
A = 3-B \qquad & 2A+B+C = 7 & 5A+C &= 12 \\
\Rightarrow\qquad & 2(3-B)+B+C = 7 & 5(3-B)+C &= 12 \\
\Rightarrow\qquad & -B+C = 1 & -5B+C &= -3
\end{align*}
\begin{align*}
3x^2+7x+12 &= 2(x^2+2x+5) + (Bx+C)(x+1) \\
x^2+3x+2 &= (Bx+C)(x+1)
\end{align*}
Since $(x+1)$ is a factor on the right hand side, it must also be a factor on the left hand side.
\[ (x+2)(x+1) = (Bx+C)(x+1) \ \Rightarrow\ (x+2) = (Bx+C) \ \Rightarrow\ B = 1,\ C = 2 \]
\[ \frac{3x^2+7x+12}{(x+1)(x^2+2x+5)} = \frac{2}{x+1} + \frac{x+2}{x^2+2x+5} \quad \checkmark \]
• Step 4. Now we can finally integrate! The first two pieces are easy.
\[ \int (x+2)\,dx = \frac{1}{2}x^2 + 2x \qquad \int \frac{2}{x+1}\,dx = 2\ln|x+1| \]
(We’re leaving the arbitrary constant to the end of the computation.)
The final piece is a little harder. The idea is to complete the square16 in the denominator
\[ \frac{x+2}{x^2+2x+5} = \frac{x+2}{(x+1)^2+4} \]
and then make a change of variables to make the fraction look like $\frac{ay+b}{y^2+1}$. In this case
\[ \frac{x+2}{(x+1)^2+4} = \frac{1}{4}\,\frac{x+2}{\left(\frac{x+1}{2}\right)^2+1} \]
$Q(x) = ax^2+bx+c$
rewrite it as
$Q(x) = a(x+d)^2+e$.
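so we make the change of variables $y = \frac{x+1}{2}$, $dy = \frac{dx}{2}$, $x = 2y-1$, $dx = 2\,dy$
\begin{align*}
\int \frac{x+2}{(x+1)^2+4}\,dx &= \frac{1}{4}\int \frac{x+2}{\left(\frac{x+1}{2}\right)^2+1}\,dx \\
&= \frac{1}{4}\int \frac{(2y-1)+2}{y^2+1}\,2\,dy = \frac{1}{2}\int \frac{2y+1}{y^2+1}\,dy \\
&= \int \frac{y}{y^2+1}\,dy + \frac{1}{2}\int \frac{1}{y^2+1}\,dy
\end{align*}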
Example A.9.5
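The long division and the partial fraction decomposition in Example A.9.5 can both be checked mechanically. The sketch below is a minimal illustration, not a general-purpose tool: polynomials are represented as coefficient lists with the highest degree first (a convention chosen here), and `poly_divmod` and `decomposition_gap` are hypothetical helper names.

```python
def poly_divmod(numerator, denominator):
    """Polynomial long division on coefficient lists (highest degree first).
    Returns (quotient, remainder)."""
    remainder = list(numerator)
    quotient = []
    while len(remainder) >= len(denominator):
        # Multiple of the divisor needed to cancel the leading term
        factor = remainder[0] / denominator[0]
        quotient.append(factor)
        for i, coeff in enumerate(denominator):
            remainder[i] -= factor * coeff
        remainder.pop(0)  # the leading coefficient is now zero
    return quotient, remainder

# Step 1: (x^4+5x^3+16x^2+26x+22) divided by (x^3+3x^2+7x+5)
P, R = poly_divmod([1, 5, 16, 26, 22], [1, 3, 7, 5])

# Factoring the denominator: (x^3+3x^2+7x+5) divided by (x+1)
cofactor, leftover = poly_divmod([1, 3, 7, 5], [1, 1])

def decomposition_gap(x):
    """Difference between R(x)/D(x) and 2/(x+1) + (x+2)/(x^2+2x+5)."""
    lhs = (3 * x * x + 7 * x + 12) / ((x + 1) * (x * x + 2 * x + 5))
    rhs = 2 / (x + 1) + (x + 2) / (x * x + 2 * x + 5)
    return abs(lhs - rhs)

max_gap = max(decomposition_gap(x) for x in (-3.0, 0.0, 0.5, 2.0, 10.0))
print(P, R, cofactor, leftover, max_gap)
```

The quotient and remainder lists correspond to $P(x) = x+2$ and $R(x) = 3x^2+7x+12$, and the second division recovers the factor $x^2+2x+5$ with zero remainder.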
A.9.2 §§ Proofs
We will now see the justification for the form of the partial fraction decompositions from
Section 3.8.3. We start by considering the case in which the denominator has only linear
factors. Then we’ll consider the case in which quadratic factors are allowed too17 .
17 In fact, quadratic factors are completely avoidable because, if we use complex numbers, then every
polynomial can be written as a product of linear factors. This is the Fundamental Theorem of Algebra.
We now show that this decomposition can always be achieved, under the assumptions that the $a_i$’s are all different and $N(x)$ is a polynomial of degree at most $d-1$. To do so, we shall repeatedly apply the following Lemma.
Lemma A.9.6.
\[ \frac{N(x)}{D(x)\,(x-a)} = \frac{P(x)}{D(x)} + \frac{A}{x-a} \]
• Now look at the polynomial on the left hand side. Every term in $\tilde P(z)z$ has at least one power of $z$. So the constant term on the left hand side is exactly the constant term in $A\tilde D(z)$, which is equal to $A\tilde D(0)$. The constant term on the right hand side is equal to $\tilde N(0)$. So the constant terms on the left and right hand sides are the same if we choose $A = \frac{\tilde N(0)}{\tilde D(0)}$. Recall that $\tilde D(0)$ cannot be zero, so $A$ is well defined.
The constant terms in $\tilde N(z)$ and $A\tilde D(z)$ are the same, so the right hand side contains no constant term and the right hand side is of the form $\tilde N_1(z)z$ for some polynomial $\tilde N_1(z)$.
• Since $\tilde N(z)$ is of degree at most $d$ and $A\tilde D(z)$ is of degree exactly $d$, $\tilde N_1$ is a polynomial of degree $d-1$. It now suffices to choose $\tilde P(z) = \tilde N_1(z)$.
Now back to
\[ \frac{N(x)}{(x-a_1)\times\cdots\times(x-a_d)} \]
Lemma A.9.7.
\[ \frac{N(x)}{D(x)\,(x-a)^m} = \frac{P(x)}{D(x)} + \frac{A_1}{x-a} + \frac{A_2}{(x-a)^2} + \cdots + \frac{A_m}{(x-a)^m} \]
18 If we allow ourselves to use complex numbers as roots, this is the general case. We don’t need to
consider quadratic (or higher) factors since all polynomials can be written as products of linear factors
with complex coefficients.
• Now look at the polynomial on the left hand side. Every single term on the left hand side, except for the very last one, $A_m\tilde D(z)$, has at least one power of $z$. So the constant term on the left hand side is exactly the constant term in $A_m\tilde D(z)$, which is equal to $A_m\tilde D(0)$. The constant term on the right hand side is equal to $\tilde N(0)$. So the constant terms on the left and right hand sides are the same if we choose $A_m = \frac{\tilde N(0)}{\tilde D(0)}$. Recall that $\tilde D(0) \ne 0$ so $A_m$ is well defined.
The constant terms in $\tilde N(z)$ and $A_m\tilde D(z)$ are the same, so the right hand side contains no constant term and the right hand side is of the form $\tilde N_1(z)z$ with $\tilde N_1$ a polynomial of degree at most $d+m-2$. (Recall that $\tilde N$ is of degree at most $d+m-1$ and $\tilde D$ is of degree at most $d$.) Divide the whole equation by $z$ to get
• Now, we can repeat the previous argument. The constant term on the left hand side, which is exactly equal to $A_{m-1}\tilde D(0)$, matches the constant term on the right hand side, which is equal to $\tilde N_1(0)$, if we choose $A_{m-1} = \frac{\tilde N_1(0)}{\tilde D(0)}$. With this choice of $A_{m-1}$
• There is no constant term on the right side so that $\tilde N_{m-1}(z) - A_1\tilde D(z)$ is of the form $\tilde N_m(z)z$ with $\tilde N_m$ a polynomial of degree $d-1$. Choosing $\tilde P(z) = \tilde N_m(z)$ completes the proof.
Now back to
\[ \frac{N(x)}{(x-a_1)^{n_1}\times\cdots\times(x-a_d)^{n_d}} \]
Apply Lemma A.9.7, with $D(x) = (x-a_2)^{n_2}\times\cdots\times(x-a_d)^{n_d}$, $m = n_1$ and $a = a_1$. It says
\begin{align*}
&\frac{N(x)}{(x-a_1)^{n_1}\times\cdots\times(x-a_d)^{n_d}} \\
&\qquad= \frac{A_{1,1}}{x-a_1} + \frac{A_{1,2}}{(x-a_1)^2} + \cdots + \frac{A_{1,n_1}}{(x-a_1)^{n_1}} + \frac{P(x)}{(x-a_2)^{n_2}\times\cdots\times(x-a_d)^{n_d}}
\end{align*}
Apply Lemma A.9.7 a second time, with $D(x) = (x-a_3)^{n_3}\times\cdots\times(x-a_d)^{n_d}$, $N(x) = P(x)$, $m = n_2$ and $a = a_2$. And so on. Eventually, we end up with
\[ \left[\frac{A_{1,1}}{x-a_1} + \cdots + \frac{A_{1,n_1}}{(x-a_1)^{n_1}}\right] + \cdots + \left[\frac{A_{d,1}}{x-a_d} + \cdots + \frac{A_{d,n_d}}{(x-a_d)^{n_d}}\right] \]
which is exactly what we were trying to show.
(with $b_\ell^2 - 4c_\ell < 0$ for all $1 \le \ell \le k$ so that no quadratic factor can be written as a product of linear factors with real coefficients) then there are real numbers $A_{i,j}$, $B_{i,j}$, $C_{i,j}$ such that
$Q_2(x)$ are factored as in (E1), no two of them have a common linear or quadratic factor. As an example, no two of
\begin{align*}
P(x) &= 2(x-3)(x-4)(x^2+3x+3) \\
Q_1(x) &= 2(x-1)(x^2+2x+2) \\
Q_2(x) &= 2(x-2)(x^2+2x+3)
\end{align*}
have such a common factor. But, for
\begin{align*}
P(x) &= 2(x-3)(x-4)(x^2+x+1) \\
Q_1(x) &= 2(x-1)(x^2+2x+2) \\
Q_2(x) &= 2(x-2)(x^2+x+1)
\end{align*}
\[ \frac{P(x)}{Q_1(x)\,Q_2(x)} = \frac{P_1(x)}{Q_1(x)} + \frac{P_2(x)}{Q_2(x)} \]
19 It appears in Euclid’s Elements, which was written about 300 BC, and it was probably known even
before that.
• The second step is to apply long division to $\frac{Q_2(x)}{r_0(x)}$ to find polynomials $n_1(x)$ and $r_1(x)$ such that
• And so on.
As the degree of the remainder $r_i(x)$ decreases by at least one each time $i$ is increased by one, the above iteration has to terminate with some $r_{\ell+1}(x) = 0$. That is, we choose $\ell$ to be the index of the last nonzero remainder. Here is a summary of all of the long division steps.
Now we are going to take a closer look at all of the different remainders that we have generated.
\begin{align*}
r_2(x) &= r_0(x) - n_2(x)\,r_1(x) \\
&= Q_1(x) - n_0(x)\,Q_2(x) - n_2(x)\big[A_1(x)\,Q_1(x) + B_1(x)\,Q_2(x)\big] \\
&= A_2(x)\,Q_1(x) + B_2(x)\,Q_2(x)
\end{align*}
• And so on. Continuing in this way, we conclude that the final nonzero remainder $r_\ell(x) = A_\ell(x)\,Q_1(x) + B_\ell(x)\,Q_2(x)$ for some polynomials $A_\ell$ and $B_\ell$.
\[ \tilde P_2(x)\,Q_1(x) + \tilde P_1(x)\,Q_2(x) = P(x) \qquad\text{or}\qquad \frac{\tilde P_1(x)}{Q_1(x)} + \frac{\tilde P_2(x)}{Q_2(x)} = \frac{P(x)}{Q_1(x)\,Q_2(x)} \]
with $\tilde P_2(x) = \frac{P(x)\,A_\ell(x)}{C}$ and $\tilde P_1(x) = \frac{P(x)\,B_\ell(x)}{C}$. We’re not quite done, because there is still the danger that $\deg(\tilde P_1) \ge \deg(Q_1)$ or $\deg(\tilde P_2) \ge \deg(Q_2)$. To deal with that possibility, we long divide $\frac{\tilde P_1(x)}{Q_1(x)}$ and call the remainder $P_1(x)$.
\[ \frac{\tilde P_1(x)}{Q_1(x)} = N(x) + \frac{P_1(x)}{Q_1(x)} \qquad\text{with } \deg(P_1) < \deg(Q_1) \]
\begin{align*}
\frac{P(x)}{Q_1(x)\,Q_2(x)} &= \frac{P_1(x)}{Q_1(x)} + N(x) + \frac{\tilde P_2(x)}{Q_2(x)} \\
&= \frac{P_1(x)}{Q_1(x)} + \frac{\tilde P_2(x) + N(x)\,Q_2(x)}{Q_2(x)}
\end{align*}
Denoting $P_2(x) = \tilde P_2(x) + N(x)\,Q_2(x)$ gives $\frac{P}{Q_1Q_2} = \frac{P_1}{Q_1} + \frac{P_2}{Q_2}$ and since $\deg(P_1) < \deg(Q_1)$, the only thing left to prove is that $\deg(P_2) < \deg(Q_2)$.
We assume that $\deg(P_2) \ge \deg(Q_2)$ and look for a contradiction. We have
which contradicts the hypothesis that $\deg(P) < \deg(Q_1Q_2)$ and the proof is complete.
For the second of the two simpler results, that we’ll shortly use repeatedly to get (A.9.3), we consider $\frac{P(x)}{(x-a)^m}$ and $\frac{P(x)}{(x^2+bx+c)^m}$.
Lemma A.9.9.
PROOFS AND SUPPLEMENTS A.10 AN ERROR BOUND FOR THE MIDPOINT RULE
20 This is assuming that there is at least one linear factor. If not, we factor $D(x) = (x^2+b_1x+c_1)^{n_1}Q_2(x)$ instead.
21 We chose this interval so that we didn’t have lots of subscripts floating around in the algebra.
Let us apply integration by parts to $\int_\alpha^{\alpha+q} f(x)\,dx$, with $u = f(x)$, $dv = dx$ so $du = f'(x)\,dx$, and we will make the slightly non-standard choice of $v = x-\alpha$:
\begin{align*}
\int_\alpha^{\alpha+q} f(x)\,dx &= \Big[(x-\alpha)f(x)\Big]_\alpha^{\alpha+q} - \int_\alpha^{\alpha+q} (x-\alpha)f'(x)\,dx \\
&= q f(\alpha+q) - \int_\alpha^{\alpha+q} (x-\alpha)f'(x)\,dx
\end{align*}
Notice that the first term on the right-hand side is the term we need, and that our non-standard choice of $v$ allowed us to avoid introducing an $f(\alpha)$ term.
Now integrate by parts again using $u = f'(x)$, $dv = (x-\alpha)\,dx$, so $du = f''(x)\,dx$, $v = \frac{(x-\alpha)^2}{2}$:
\begin{align*}
\int_\alpha^{\alpha+q} f(x)\,dx &= q f(\alpha+q) - \int_\alpha^{\alpha+q} (x-\alpha)f'(x)\,dx \\
&= q f(\alpha+q) - \left[\frac{(x-\alpha)^2}{2}f'(x)\right]_\alpha^{\alpha+q} + \int_\alpha^{\alpha+q} \frac{(x-\alpha)^2}{2}f''(x)\,dx \\
&= q f(\alpha+q) - \frac{q^2}{2}f'(\alpha+q) + \int_\alpha^{\alpha+q} \frac{(x-\alpha)^2}{2}f''(x)\,dx
\end{align*}
To obtain a similar expression for the other integral, we repeat the above steps and obtain:
\[ \int_{\beta-q}^{\beta} f(x)\,dx = q f(\beta-q) + \frac{q^2}{2}f'(\beta-q) + \int_{\beta-q}^{\beta} \frac{(x-\beta)^2}{2}f''(x)\,dx \]
Now add together these two expressions
\begin{align*}
\int_\alpha^{\alpha+q} f(x)\,dx + \int_{\beta-q}^{\beta} f(x)\,dx &= q f(\alpha+q) + q f(\beta-q) + \frac{q^2}{2}\big(f'(\beta-q) - f'(\alpha+q)\big) \\
&\quad + \int_\alpha^{\alpha+q} \frac{(x-\alpha)^2}{2}f''(x)\,dx + \int_{\beta-q}^{\beta} \frac{(x-\beta)^2}{2}f''(x)\,dx
\end{align*}
Then since $\alpha+q = \beta-q$ we can combine the integrals on the left-hand side and eliminate some terms from the right-hand side:
\[ \int_\alpha^{\beta} f(x)\,dx = 2q f(\alpha+q) + \int_\alpha^{\alpha+q} \frac{(x-\alpha)^2}{2}f''(x)\,dx + \int_{\beta-q}^{\beta} \frac{(x-\beta)^2}{2}f''(x)\,dx \]
Rearrange this expression a little and take absolute values
\[ \left|\int_\alpha^{\beta} f(x)\,dx - 2q f(\alpha+q)\right| \le \left|\int_\alpha^{\alpha+q} \frac{(x-\alpha)^2}{2}f''(x)\,dx\right| + \left|\int_{\beta-q}^{\beta} \frac{(x-\beta)^2}{2}f''(x)\,dx\right| \]
PROOFS AND SUPPLEMENTS A.11 COMPARISON TESTS PROOF
the interval $\alpha \le x \le \beta$, so
\begin{align*}
\left|\int_\alpha^{\beta} f(x)\,dx - 2q f(\alpha+q)\right| &\le M\int_\alpha^{\alpha+q} \frac{(x-\alpha)^2}{2}\,dx + M\int_{\beta-q}^{\beta} \frac{(x-\beta)^2}{2}\,dx \\
&= \frac{Mq^3}{3} = \frac{M(\beta-\alpha)^3}{24}
\end{align*}
Putting everything together we see that the error using the midpoint rule is bounded by
\begin{align*}
&\left|\int_a^b f(x)\,dx - \big[f(\bar x_1) + f(\bar x_2) + \cdots + f(\bar x_n)\big]\Delta x\right| \\
&\qquad\le \left|\int_{x_0}^{x_1} f(x)\,dx - \Delta x\, f(\bar x_1)\right| + \cdots + \left|\int_{x_{n-1}}^{x_n} f(x)\,dx - \Delta x\, f(\bar x_n)\right| \\
&\qquad\le n \times \frac{M}{24}(\Delta x)^3 = n \times \frac{M}{24}\left(\frac{b-a}{n}\right)^3 = \frac{M(b-a)^3}{24n^2}
\end{align*}
as required.
A very similar analysis shows that, as was stated in Theorem 3.9.12 above,
• the total error introduced by the trapezoidal rule is bounded by $\frac{M}{12}\frac{(b-a)^3}{n^2}$,
• the total error introduced by Simpson’s rule is bounded by $\frac{M}{180}\frac{(b-a)^5}{n^4}$
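The midpoint bound $\frac{M(b-a)^3}{24n^2}$ derived above can be compared with the actual error on a concrete integral. The sketch below is a hedged illustration: the test function $f(x) = e^x$ on $[0,1]$, the choice $M = e$ (a bound on $|f''|$ on that interval), and the function names are assumptions made for this example.

```python
import math

def midpoint_rule(f, a, b, n):
    """Midpoint rule approximation of the integral of f over [a, b]."""
    dx = (b - a) / n
    return dx * sum(f(a + (i + 0.5) * dx) for i in range(n))

def midpoint_error_bound(M, a, b, n):
    """The bound M (b - a)^3 / (24 n^2), where |f''(x)| <= M on [a, b]."""
    return M * (b - a) ** 3 / (24 * n ** 2)

# f(x) = e^x on [0, 1]: f''(x) = e^x <= e, so M = e is assumed.
exact = math.e - 1
rows = []
for n in (4, 16, 64):
    error = abs(exact - midpoint_rule(math.exp, 0.0, 1.0, n))
    rows.append((n, error, midpoint_error_bound(math.e, 0.0, 1.0, n)))

for n, error, bound in rows:
    print(n, error, bound)
```

In this example, multiplying $n$ by 4 divides both the bound and the observed error by roughly 16, in line with the $n^2$ in the denominator.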
Section A.10 of this work was adapted from Section 1.11.5 of CLP 2 – Integral Calculus by Feldman, Rechnitzer, and Yeager under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license.
23 It is one way to state a property of the real number system called “completeness”. The interested reader
should use their favourite search engine to look up “completeness of the real numbers”.
PROOFS AND SUPPLEMENTS A.12 ALTERNATING SERIES
Proof. (a) By hypothesis $\sum_{n=0}^\infty c_n$ converges. So it suffices to prove that $\sum_{n=0}^\infty [Kc_n - a_n]$ converges, because then, by our Arithmetic of series Theorem 5.2.10,
\[ \sum_{n=0}^\infty a_n = \sum_{n=0}^\infty Kc_n - \sum_{n=0}^\infty [Kc_n - a_n] \]
will converge too. But for all $n \ge N_0$, $Kc_n - a_n \ge 0$ so that, for all $N \ge N_0$, the partial sums
\[ S_N = \sum_{n=0}^N [Kc_n - a_n] \]
increase with $N$, but never get bigger than the finite number $\sum_{n=0}^{N_0} [Kc_n - a_n] + K\sum_{n=N_0+1}^{\infty} c_n$. So the partial sums $S_N$ converge as $N \to \infty$.
(b) For all $N > N_0$, the partial sum
\[ S_N = \sum_{n=0}^N a_n \ge \sum_{n=0}^{N_0} a_n + K\sum_{n=N_0+1}^{N} d_n \]
By hypothesis, $\sum_{n=N_0+1}^{N} d_n$, and hence $S_N$, grows without bound as $N \to \infty$. So $S_N \to \infty$ as $N \to \infty$.
Section A.11 of this work was adapted from Section 3.3.10 of CLP 2 – Integral Calculus by Feldman, Rechnitzer, and Yeager under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license.
When the signs of successive terms in a series alternate between $+$ and $-$, like for example in $1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \cdots$, the series is called an alternating series. More generally, the series
\[ A_1 - A_2 + A_3 - A_4 + \cdots = \sum_{n=1}^\infty (-1)^{n-1}A_n \]
is alternating if every $A_n \ge 0$. Often (but not always) the terms in alternating series get successively smaller. That is, $A_1 \ge A_2 \ge A_3 \ge \cdots$. In this case:
• And so on.
So the successive partial sums oscillate, but with ever decreasing amplitude. If, in addition, $A_n$ tends to $0$ as $n$ tends to $\infty$, the amplitude of oscillation tends to zero and the sequence $S_1, S_2, S_3, \cdots$ converges to some limit $S$.
Here is a convergence test for alternating series that exploits this structure, and that is
really easy to apply.
Then
\[ A_1 - A_2 + A_3 - A_4 + \cdots = \sum_{n=1}^\infty (-1)^{n-1}A_n = S \]
converges and, for each natural number $N$, $S - S_N$ is between $0$ and (the first dropped term) $(-1)^N A_{N+1}$. Here $S_N$ is, as previously, the $N$th partial sum $\sum_{n=1}^N (-1)^{n-1}A_n$.
“Proof”. We shall only give part of the proof here. For the rest of the proof see the ap-
pendix section A.11. We shall fix any natural number N and concentrate on the last state-
ment, which gives a bound on the truncation error (which is the error introduced when
This is of course another series. We’re going to study the partial sums
\[ S_{N,\ell} = \sum_{n=N+1}^{\ell} (-1)^{n-1}A_n = (-1)^N \sum_{m=1}^{\ell-N} (-1)^{m-1}A_{N+m} \]
• If $\ell' > N+1$, with $\ell' - N$ even,
\begin{align*}
(-1)^N S_{N,\ell'} &= \overbrace{(A_{N+1} - A_{N+2})}^{\ge 0} + \overbrace{(A_{N+3} - A_{N+4})}^{\ge 0} + \cdots + \overbrace{(A_{\ell'-1} - A_{\ell'})}^{\ge 0} \ge 0 \quad\text{and} \\
(-1)^N S_{N,\ell'+1} &= \overbrace{(-1)^N S_{N,\ell'}}^{\ge 0} + \overbrace{A_{\ell'+1}}^{\ge 0} \ge 0
\end{align*}
This tells us that $(-1)^N S_{N,\ell} \ge 0$ for all $\ell > N+1$, both even and odd.
\begin{align*}
(-1)^N S_{N,\ell'} &= A_{N+1} - \overbrace{(A_{N+2} - A_{N+3})}^{\ge 0} - \overbrace{(A_{N+4} - A_{N+5})}^{\ge 0} - \cdots - \overbrace{(A_{\ell'-1} - A_{\ell'})}^{\ge 0} \le A_{N+1} \\
(-1)^N S_{N,\ell'+1} &= \overbrace{(-1)^N S_{N,\ell'}}^{\le A_{N+1}} - \overbrace{A_{\ell'+1}}^{\ge 0} \le A_{N+1}
\end{align*}
This tells us that $(-1)^N S_{N,\ell} \le A_{N+1}$ for all $\ell > N+1$, both even and odd.
So we now know that $S_{N,\ell}$ lies between its first term, $(-1)^N A_{N+1}$, and $0$ for all $\ell > N+1$. While we are not going to prove it here (see the optional section A.11), this implies that, since $A_{N+1} \to 0$ as $N \to \infty$, the series converges and that
\[ S - S_N = \lim_{\ell\to\infty} S_{N,\ell} \]
Example A.12.2
We have already seen, in Example 5.3.6, that the harmonic series $\sum_{n=1}^\infty \frac{1}{n}$ diverges. On the other hand, the series $\sum_{n=1}^\infty (-1)^{n-1}\frac{1}{n}$ converges by the alternating series test with $A_n = \frac{1}{n}$. Note that
(iii) $\lim_{n\to\infty} A_n = \lim_{n\to\infty} \frac{1}{n} = 0$.
so that all of the hypotheses of the alternating series test, i.e. of Theorem A.12.1, are satisfied. We shall see, in Example 5.5.19, that
\[ \sum_{n=1}^\infty \frac{(-1)^{n-1}}{n} = \ln 2. \]
Example A.12.2
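\[ \frac{1}{e} = e^{-1} = \sum_{n=0}^\infty \frac{(-1)^n}{n!} = 1 - \frac{1}{1!} + \frac{1}{2!} - \frac{1}{3!} + \frac{1}{4!} - \frac{1}{5!} + \cdots \]
is an alternating series and satisfies all of the conditions of the alternating series test, Theorem A.12.1a:
\[ \frac{1}{e} \approx \frac{1}{2!} - \frac{1}{3!} + \frac{1}{4!} - \frac{1}{5!} + \frac{1}{6!} - \frac{1}{7!} + \frac{1}{8!} - \frac{1}{9!} \]
then the error in this approximation lies between $0$ and the next term in the series, which is $\frac{1}{10!}$. That is
\[ \frac{1}{2!} - \frac{1}{3!} + \frac{1}{4!} - \frac{1}{5!} + \frac{1}{6!} - \frac{1}{7!} + \frac{1}{8!} - \frac{1}{9!} \ \le\ \frac{1}{e}\ \le\ \frac{1}{2!} - \frac{1}{3!} + \frac{1}{4!} - \frac{1}{5!} + \frac{1}{6!} - \frac{1}{7!} + \frac{1}{8!} - \frac{1}{9!} + \frac{1}{10!} \]
so that
\begin{align*}
\frac{1}{\frac{1}{2!} - \frac{1}{3!} + \frac{1}{4!} - \frac{1}{5!} + \frac{1}{6!} - \frac{1}{7!} + \frac{1}{8!} - \frac{1}{9!} + \frac{1}{10!}} \ &\le\ e\ \le\ \frac{1}{\frac{1}{2!} - \frac{1}{3!} + \frac{1}{4!} - \frac{1}{5!} + \frac{1}{6!} - \frac{1}{7!} + \frac{1}{8!} - \frac{1}{9!}} \\
2.7182816 \ &\le\ e\ \le\ 2.7182837
\end{align*}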
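The two-sided estimate for $e$ above can be reproduced in a few lines. The sketch below is illustrative: it sums the series for $\frac{1}{e}$ from $n=0$ to $n=9$ (the $n=0$ and $n=1$ terms cancel, so this matches the sum starting at $\frac{1}{2!}$), and the variable names are assumptions made for this example.

```python
import math

# Partial sum of sum_{n>=0} (-1)^n / n! through the 1/9! term
partial_sum = sum((-1) ** n / math.factorial(n) for n in range(0, 10))

# First dropped term of the alternating series
next_term = 1 / math.factorial(10)

# 1/e lies between partial_sum and partial_sum + next_term,
# so e lies between the reciprocals (in the opposite order).
lower_e = 1 / (partial_sum + next_term)
upper_e = 1 / partial_sum

print(lower_e, upper_e)
```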
larger than $\frac{1}{(N+1)!}$. This tends to zero spectacularly quickly as $N$ increases, simply because $(N+1)!$ increases spectacularly quickly as $N$ increases24. For example $20! \approx 2.4 \times 10^{18}$.
Example A.12.3
Example A.12.4
\[ \ln(1+x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{4} + \cdots = \sum_{n=1}^\infty (-1)^{n-1}\frac{x^n}{n} \]
Suppose that we have to compute $\ln\frac{11}{10}$ to within an accuracy of $10^{-12}$. Since $\frac{11}{10} = 1 + \frac{1}{10}$, we can get $\ln\frac{11}{10}$ by evaluating $\ln(1+x)$ at $x = \frac{1}{10}$, so that
\[ \ln\frac{11}{10} = \ln\left(1 + \frac{1}{10}\right) = \frac{1}{10} - \frac{1}{2\times 10^2} + \frac{1}{3\times 10^3} - \frac{1}{4\times 10^4} + \cdots = \sum_{n=1}^\infty (-1)^{n-1}\frac{1}{n\times 10^n} \]
By the alternating series test, this series converges. Also by the alternating series test, approximating $\ln\frac{11}{10}$ by throwing away all but the first $N$ terms
\[ \ln\frac{11}{10} \approx \frac{1}{10} - \frac{1}{2\times 10^2} + \frac{1}{3\times 10^3} - \frac{1}{4\times 10^4} + \cdots + (-1)^{N-1}\frac{1}{N\times 10^N} = \sum_{n=1}^N (-1)^{n-1}\frac{1}{n\times 10^n} \]
introduces an error whose magnitude is no more than the magnitude of the first term that we threw away.
\[ \text{error} \le \frac{1}{(N+1)\times 10^{N+1}} \]
To achieve an error that is no more than $10^{-12}$, we have to choose $N$ so that
\[ \frac{1}{(N+1)\times 10^{N+1}} \le 10^{-12} \]
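A brute-force search is one way to find an $N$ satisfying this inequality. The sketch below is only an illustration: `smallest_N` is a hypothetical helper, the search uses exact integer arithmetic on the equivalent condition $(N+1)\times 10^{N+1} \ge 10^{12}$, and the comparison against `math.log` is limited by ordinary floating point accuracy.

```python
import math

def smallest_N(tolerance_exponent):
    """Smallest N with 1/((N+1) * 10^(N+1)) <= 10^(-tolerance_exponent)."""
    N = 1
    # Equivalent integer condition: (N+1) * 10^(N+1) >= 10^tolerance_exponent
    while (N + 1) * 10 ** (N + 1) < 10 ** tolerance_exponent:
        N += 1
    return N

N = smallest_N(12)

# Partial sum of the series for ln(11/10) using the first N terms
partial_sum = sum((-1) ** (n - 1) / (n * 10 ** n) for n in range(1, N + 1))
actual_error = abs(math.log(1.1) - partial_sum)

print(N, partial_sum, actual_error)
```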
The best way to do so is simply to guess: we are not going to be able to manipulate the inequality $\frac{1}{(N+1)\times 10^{N+1}} \le \frac{1}{10^{12}}$ into the form $N \le \cdots$, and even if we could, it would not be
24 The interested reader may wish to check out “Stirling’s approximation”, which says that $n! \approx \sqrt{2\pi n}\left(\frac{n}{e}\right)^n$.
you really need the smallest $N$ that obeys $\frac{1}{(N+1)\times 10^{N+1}} \le \frac{1}{10^{12}}$, you can next just try $N = 10$,
Then
\[ a_1 - a_2 + a_3 - a_4 + \cdots = \sum_{n=1}^\infty (-1)^{n-1}a_n = S \]
converges and, for each natural number $N$, $S - S_N$ is between $0$ and (the first dropped term) $(-1)^N a_{N+1}$. Here $S_N$ is, as previously, the $N$th partial sum $\sum_{n=1}^N (-1)^{n-1}a_n$.
Proof. Let $2n$ be an even natural number. Then the $2n$th partial sum obeys
\begin{align*}
S_{2n} &= \overbrace{(a_1 - a_2)}^{\ge 0} + \overbrace{(a_3 - a_4)}^{\ge 0} + \cdots + \overbrace{(a_{2n-1} - a_{2n})}^{\ge 0} \\
&\le \overbrace{(a_1 - a_2)}^{\ge 0} + \overbrace{(a_3 - a_4)}^{\ge 0} + \cdots + \overbrace{(a_{2n-1} - a_{2n})}^{\ge 0} + \overbrace{(a_{2n+1} - a_{2n+2})}^{\ge 0} = S_{2(n+1)}
\end{align*}
and
\begin{align*}
S_{2n} &= a_1 - \overbrace{(a_2 - a_3)}^{\ge 0} - \overbrace{(a_4 - a_5)}^{\ge 0} - \cdots - \overbrace{(a_{2n-2} - a_{2n-1})}^{\ge 0} - \overbrace{a_{2n}}^{\ge 0} \\
&\le a_1
\end{align*}
PROOFS AND SUPPLEMENTS A.13 DELICACY OF CONDITIONAL CONVERGENCE
Section A.12 of this work was adapted from Sections 3.3.4 and 3.3.10 of CLP 2 – Integral Calculus by Feldman, Rechnitzer, and Yeager under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license.
1+2+3+4+5+6 = 6+3+5+2+4+1
The same is true for absolutely convergent series. But it is not true for conditionally convergent series. In fact by reordering any conditionally convergent series, you can make it add up to any number you like, including $+\infty$ and $-\infty$. This very strange result is known as Riemann’s rearrangement Theorem, named after Bernhard Riemann (1826–1866). The following example illustrates the phenomenon.
Example A.13.1
is a very good example of conditional convergence. We can show, quite explicitly, how we can rearrange the terms to make it add up to two different numbers. Later, in Example 5.5.19, we’ll show that this series is equal to $\ln 2$. However, by rearranging the terms we can make it sum to $\frac{1}{2}\ln 2$. The usual order is
\[ \frac{1}{1} - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \frac{1}{5} - \frac{1}{6} + \cdots \]
For the moment think of the terms being paired as follows:
\[ \left(\frac{1}{1} - \frac{1}{2}\right) + \left(\frac{1}{3} - \frac{1}{4}\right) + \left(\frac{1}{5} - \frac{1}{6}\right) + \cdots \]
so the denominators go odd-even odd-even. Now rearrange the terms so the denominators are odd-even-even odd-even-even:
\[ \left(1 - \frac{1}{2} - \frac{1}{4}\right) + \left(\frac{1}{3} - \frac{1}{6} - \frac{1}{8}\right) + \left(\frac{1}{5} - \frac{1}{10} - \frac{1}{12}\right) + \cdots \]
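Now notice that the first term in each triple is exactly twice the second term. If we now combine those terms we get
\begin{align*}
&\underbrace{\left(1 - \frac{1}{2}\right)}_{=1/2} - \frac{1}{4} + \underbrace{\left(\frac{1}{3} - \frac{1}{6}\right)}_{=1/6} - \frac{1}{8} + \underbrace{\left(\frac{1}{5} - \frac{1}{10}\right)}_{=1/10} - \frac{1}{12} + \cdots \\
&\qquad= \frac{1}{2} - \frac{1}{4} + \frac{1}{6} - \frac{1}{8} + \frac{1}{10} - \frac{1}{12} + \cdots
\end{align*}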
In fact, we can go even further, and show how we can rearrange the terms of the
alternating harmonic series to add up to any given number25 . For the purposes of the
example we have chosen 1.234, but it could really be any number. The example below can
actually be formalised to give a proof of the rearrangement Theorem.
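The rearrangement procedure that the next example walks through (take unused positive terms while the running total is at or below the target, otherwise take unused negative terms) can be sketched in code. This is a minimal illustration, not a proof: `rearranged_terms` is a hypothetical helper, and floating point sums only approximate the exact partial sums.

```python
def rearranged_terms(target, count):
    """Greedily reorder the alternating harmonic series so that its
    partial sums home in on target. Returns (terms, final partial sum)."""
    terms = []
    total = 0.0
    next_odd = 1    # denominator of the next unused positive term 1/1, 1/3, ...
    next_even = 2   # denominator of the next unused negative term -1/2, -1/4, ...
    for _ in range(count):
        if total <= target:
            term = 1.0 / next_odd
            next_odd += 2
        else:
            term = -1.0 / next_even
            next_even += 2
        terms.append(term)
        total += term
    return terms, total

first_terms, _ = rearranged_terms(1.234, 7)
_, long_run_total = rearranged_terms(1.234, 10000)
print(first_terms, long_run_total)
```

Every term of the original series is eventually used exactly once, because the positive terms alone and the negative terms alone both diverge, so neither branch can be taken forever.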
Example A.13.2
8
We’ll show how to reorder the conditionally convergent series (´1)n´1 n1 so that it
ř
n =1
adds up to exactly 1.234 (but the reader should keep in mind that any fixed number will
work).
• First create two lists of numbers: the first list consisting of the positive terms of the
series, in order, and the second consisting of the negative numbers of the series, in
order.
$$1,\ \frac{1}{3},\ \frac{1}{5},\ \frac{1}{7},\ \cdots \qquad\text{and}\qquad -\frac{1}{2},\ -\frac{1}{4},\ -\frac{1}{6},\ \cdots$$
• Notice that if we add together the numbers in the second list, we get
$$-\frac{1}{2}\left[1 + \frac{1}{2} + \frac{1}{3} + \cdots\right]$$
which is just $-\frac{1}{2}$ times the harmonic series. So the numbers in the second list add up
to $-\infty$.
25 This is reminiscent of the accounting trick of pushing all the company’s debts off to next year so that
this year’s accounts look really good and you can collect your bonus.
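• Also, if we add together the numbers in the first list, we get
$$1 + \frac{1}{3} + \frac{1}{5} + \frac{1}{7} + \cdots \quad\text{which is greater than}\quad \frac{1}{2} + \frac{1}{4} + \frac{1}{6} + \frac{1}{8} + \cdots$$
That is, the sum of the first set of numbers must be bigger than the sum of the second
set of numbers (which is just $-1$ times the second list). So the numbers in the first
list add up to $+\infty$.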
• Now we build up our reordered series. Start by moving just enough numbers from
the beginning of the first list into the reordered series to get a sum bigger than 1.234.
$$1 + \frac{1}{3} = 1.3333$$
We know that we can do this, because the sum of the terms in the first list diverges
to $+\infty$.
• Next move just enough numbers from the beginning of the second list into the
reordered series to get a number less than 1.234.
$$1 + \frac{1}{3} - \frac{1}{2} = 0.8333$$
Again, we know that we can do this because the sum of the numbers in the second
list diverges to $-\infty$.
• Next move just enough numbers from the beginning of the remaining part of the
first list into the reordered series to get a number bigger than 1.234.
$$1 + \frac{1}{3} - \frac{1}{2} + \frac{1}{5} + \frac{1}{7} + \frac{1}{9} = 1.2873$$
Again, this is possible because the sum of the numbers in the first list diverges. Even
though we have already used the first few numbers, the sum of the rest of the list
will still diverge.
• Next move just enough numbers from the beginning of the remaining part of the
second list into the reordered series to get a number less than 1.234.
$$1 + \frac{1}{3} - \frac{1}{2} + \frac{1}{5} + \frac{1}{7} + \frac{1}{9} - \frac{1}{4} = 1.0373$$
• At this point the idea is clear, just keep going like this. At the end of each step,
the difference between the sum and 1.234 is no bigger than the magnitude of the last
number moved into the reordered series. Since the numbers in both lists tend to zero as you go
farther and farther up the list, this procedure will generate a series whose sum is
exactly 1.234. Since in each step we remove at least one number from a list and we
alternate between the two lists, the reordered series will contain all of the terms from
$\sum_{n=1}^{\infty} (-1)^{n-1}\frac{1}{n}$, with each term appearing exactly once.
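The procedure in this example can be sketched in a few lines of Python. This is an illustration only: the function name `rearrange_to_target` is hypothetical, and the code works in floating point, so it shows the partial sums closing in on the target rather than proving convergence.

```python
from itertools import count


def rearrange_to_target(target: float, num_terms: int) -> list[float]:
    """Return the first `num_terms` terms of the reordered series.

    Terms are taken from the list of positive terms while the running sum is
    at or below `target`, and from the list of negative terms while it is above.
    """
    positives = (1 / n for n in count(1, 2))    # 1, 1/3, 1/5, ...
    negatives = (-1 / n for n in count(2, 2))   # -1/2, -1/4, -1/6, ...
    terms: list[float] = []
    total = 0.0
    while len(terms) < num_terms:
        term = next(positives) if total <= target else next(negatives)
        terms.append(term)
        total += term
    return terms


if __name__ == "__main__":
    print(rearrange_to_target(1.234, 7))           # the seven terms found by hand above
    print(sum(rearrange_to_target(1.234, 20_000))) # a partial sum close to 1.234
```

Each list is consumed in order and never revisited, which mirrors the observation that every term of the original series appears exactly once in the reordered one.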
Section A.13 of this work was adapted from Section 3.4.2 of CLP 2 – Integral Calculus
by Feldman, Rechnitzer, and Yeager under a Creative Commons
Attribution-NonCommercial-ShareAlike 4.0 International license.
Appendix B
• (SAS — side angle side) Two sides have lengths in the same ratio and the angle
between them is the same. For example
$$\frac{A}{a} = \frac{C}{c} \quad\text{and angle } \beta \text{ is the same}$$
H IGH SCHOOL MATERIAL B.2 P YTHAGORAS
B.2 Pythagoras
For a right-angled triangle the length of the hypotenuse is related to the lengths of the
other two sides by
$$(\text{adjacent})^2 + (\text{opposite})^2 = (\text{hypotenuse})^2$$
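$$\sin\theta = \frac{\text{opposite}}{\text{hypotenuse}} \qquad \csc\theta = \frac{1}{\sin\theta}$$
$$\cos\theta = \frac{\text{adjacent}}{\text{hypotenuse}} \qquad \sec\theta = \frac{1}{\cos\theta}$$
$$\tan\theta = \frac{\text{opposite}}{\text{adjacent}} \qquad \cot\theta = \frac{1}{\tan\theta}$$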
H IGH SCHOOL MATERIAL B.5 T RIGONOMETRY — G RAPHS
[Figure: three graphs of trigonometric functions, each with horizontal axis marked at $-\pi$, $-\frac{\pi}{2}$, $\frac{\pi}{2}$, $\pi$, $\frac{3\pi}{2}$ and $2\pi$]
• Reflection
$$\sin(-\theta) = -\sin(\theta) \qquad \cos(-\theta) = \cos(\theta)$$
H IGH SCHOOL MATERIAL B.8 T RIGONOMETRY — A DD AND S UBTRACT A NGLES
• Rotation by π
• Pythagoras
$$\sin^2\theta + \cos^2\theta = 1$$
• Cosine
H IGH SCHOOL MATERIAL B.10 A REAS
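and also
$$\sin(\arcsin x) = x \qquad -1 \le x \le 1$$
$$\cos(\arccos x) = x \qquad -1 \le x \le 1$$
$$\tan(\arctan x) = x \qquad \text{any real } x$$
Again
$$\operatorname{arccsc}(\csc\theta) = \theta \qquad -\frac{\pi}{2} \le \theta \le \frac{\pi}{2},\ \theta \ne 0$$
$$\operatorname{arcsec}(\sec\theta) = \theta \qquad 0 \le \theta \le \pi,\ \theta \ne \frac{\pi}{2}$$
$$\operatorname{arccot}(\cot\theta) = \theta \qquad 0 < \theta < \pi$$
and
$$\csc(\operatorname{arccsc} x) = x \qquad |x| \ge 1$$
$$\sec(\operatorname{arcsec} x) = x \qquad |x| \ge 1$$
$$\cot(\operatorname{arccot} x) = x \qquad \text{any real } x$$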
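The ranges attached to these identities matter. As a small illustration, Python's `math.asin` and `math.acos` return values in $[-\frac{\pi}{2}, \frac{\pi}{2}]$ and $[0, \pi]$ respectively, so applying them after `sin` or `cos` only recovers an angle that already lies in that range. The helper names below are hypothetical, and the sketch covers only arcsin and arccos.

```python
import math


def arcsin_of_sin(theta: float) -> float:
    """Compute arcsin(sin(theta)); the result always lies in [-pi/2, pi/2]."""
    return math.asin(math.sin(theta))


def arccos_of_cos(theta: float) -> float:
    """Compute arccos(cos(theta)); the result always lies in [0, pi]."""
    return math.acos(math.cos(theta))


if __name__ == "__main__":
    print(arcsin_of_sin(0.4))   # 0.4 lies in [-pi/2, pi/2], so it is recovered
    print(arcsin_of_sin(2.0))   # 2.0 lies outside that range, so it is not recovered
    print(arccos_of_cos(-1.0))  # -1.0 lies outside [0, pi], so it is not recovered
```

In the other direction there is no such subtlety: `math.sin(math.asin(x))` returns `x`, up to rounding, for every `x` between $-1$ and $1$.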
B.10 Areas
• Area of a rectangle
A = bh
H IGH SCHOOL MATERIAL B.11 V OLUMES
• Area of a triangle
$$A = \frac{1}{2} bh = \frac{1}{2} ab \sin\theta$$
• Area of a circle
$$A = \pi r^2$$
• Area of an ellipse
A = πab
B.11 Volumes
• Volume of a rectangular prism
$$V = lwh$$
• Volume of a cylinder
$$V = \pi r^2 h$$
• Volume of a cone
$$V = \frac{1}{3} \pi r^2 h$$
• Volume of a sphere
$$V = \frac{4}{3} \pi r^3$$
B.12 Powers
In the following, x and y are arbitrary real numbers, and q is an arbitrary constant that is
strictly bigger than zero.
• $q^0 = 1$
H IGH SCHOOL MATERIAL B.13 L OGARITHMS
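• $q^{x+y} = q^x q^y$, $q^{x-y} = \frac{q^x}{q^y}$
• $q^{-x} = \frac{1}{q^x}$
• $\left(q^x\right)^y = q^{xy}$
• $\lim_{x\to\infty} q^x = \infty$, $\lim_{x\to-\infty} q^x = 0$ if $q > 1$
• $\lim_{x\to\infty} q^x = 0$, $\lim_{x\to-\infty} q^x = \infty$ if $0 < q < 1$
[Figure: graph of $y = 2^x$ for $-3 \le x \le 3$]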
B.13 Logarithms
In the following, x and y are arbitrary real numbers that are strictly bigger than 0, and p
and q are arbitrary constants that are strictly bigger than one.
• $q^{\log_q x} = x$, $\log_q\left(q^x\right) = x$
• $\log_q x = \frac{\log_p x}{\log_p q}$
• $\log_q 1 = 0$, $\log_q q = 1$
• $\log_q\left(x^y\right) = y \log_q x$
• The graph of log10 x is given below. The graph of logq x, for any q ą 1, is similar.
H IGH SCHOOL MATERIAL B.14 Y OU SHOULD BE ABLE TO DERIVE
[Figure: graph of $y = \log_{10} x$, with $x$ running from 1 to 15 and $y$ from $-1.0$ to $1.0$]
[Figure: three graphs of trigonometric functions, each with horizontal axis marked from $-\pi$ to $2\pi$]
• More Pythagoras
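$$\sin^2\theta + \cos^2\theta = 1 \ \xrightarrow{\ \text{divide by } \cos^2\theta\ }\ \tan^2\theta + 1 = \sec^2\theta$$
$$\sin^2\theta + \cos^2\theta = 1 \ \xrightarrow{\ \text{divide by } \sin^2\theta\ }\ 1 + \cot^2\theta = \csc^2\theta$$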
H IGH SCHOOL MATERIAL B.15 C ARTESIAN C OORDINATES
[Figure: the point $(x, y)$ plotted in the plane]
Similarly, each point in three dimensions may be labeled by three coordinates ( x, y, z),
as in the two figures below.
[Figure: two views of the point $(x, y, z)$ in three dimensions, with its $x$, $y$ and $z$ coordinates marked]
The set of all points in three dimensions is denoted $\mathbb{R}^3$. The plane that contains, for
example, the x– and y–axes is called the xy–plane.
• The xy–plane is the set of all points ( x, y, z) that obey z = 0.
• The xz–plane is the set of all points ( x, y, z) that obey y = 0.
• The yz–plane is the set of all points ( x, y, z) that obey x = 0.
More generally,
• The set of all points $(x, y, z)$ that obey $z = c$ is a plane that is parallel to the xy–plane
and is a distance $|c|$ from it. If $c > 0$, the plane $z = c$ is above the xy–plane. If
$c < 0$, the plane $z = c$ is below the xy–plane. We say that the plane $z = c$ is a signed
distance $c$ from the xy–plane.
H IGH SCHOOL MATERIAL B.16 R OOTS OF P OLYNOMIALS
• The set of all points ( x, y, z) that obey y = b is a plane that is parallel to the xz–plane
and is a signed distance b from it.
• The set of all points ( x, y, z) that obey x = a is a plane that is parallel to the yz–plane
and is a signed distance a from it.
[Figure: three sets of $x$, $y$, $z$ axes showing the planes $z = c$, $y = b$ and $x = a$]
Observe that
• the distance from the point ( x, y, z) to the xy–plane is |z|
• the distance from the point ( x, y, z) to the xz–plane is |y|
• the distance from the point ( x, y, z) to the yz–plane is |x|
• the distance from the point $(x, y, z)$ to the origin $(0, 0, 0)$ is $\sqrt{x^2 + y^2 + z^2}$
The distance from the point $(x, y, z)$ to the point $(x', y', z')$ is
$$\sqrt{(x - x')^2 + (y - y')^2 + (z - z')^2}$$
so that the equation of the sphere centered on $(1, 2, 3)$ with radius 4, that is, the set of all
points $(x, y, z)$ whose distance from $(1, 2, 3)$ is 4, is
$$(x - 1)^2 + (y - 2)^2 + (z - 3)^2 = 16$$
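The distance formula translates directly into code. The sketch below uses hypothetical helper names, and the tolerance `tol` is an assumption made to absorb floating point rounding; it checks whether a point lies on the sphere of radius 4 centered on $(1, 2, 3)$.

```python
import math


def distance(p, q):
    """Distance between two points given as coordinate tuples of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def on_sphere(point, centre=(1, 2, 3), radius=4, tol=1e-12):
    """True when `point` is at distance `radius` from `centre`, up to `tol`."""
    return abs(distance(point, centre) - radius) <= tol


if __name__ == "__main__":
    print(distance((1, 2, 3), (0, 0, 0)))  # distance from (1, 2, 3) to the origin
    print(on_sphere((5, 2, 3)))            # 4 units from the centre along the x direction
    print(on_sphere((0, 0, 0)))            # the origin is not on this sphere
```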
1 The method for cubics was developed in the 16th century by del Ferro, Cardano and Ferrari (Cardano's
student). Ferrari then went on to discover a formula for the roots of a quartic. His formula requires the
solution of an associated cubic polynomial.
2 This is the famous Abel-Ruffini Theorem.
Despite this there are many tricks³ for finding roots of polynomials that work well in
some situations but not all. Here we describe approaches that will help you find integer
and rational roots of polynomials that will work well on exams, quizzes and homework
assignments.
Consider the quadratic equation $x^2 - 5x + 6 = 0$. We could⁴ solve this using the
quadratic formula
$$x = \frac{5 \pm \sqrt{25 - 4 \times 1 \times 6}}{2} = \frac{5 \pm 1}{2} = 2, 3.$$
Hence $x^2 - 5x + 6$ has roots $x = 2, 3$ and so it factors as $(x - 3)(x - 2)$. Notice⁵ that the
numbers 2 and 3 divide the constant term of the polynomial, 6. This happens in general
and forms the basis of our first trick.
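To see why, suppose the integer $r$ is a root of a polynomial $a_n x^n + \cdots + a_1 x + a_0$ whose
coefficients are all integers, so that
$$a_n \cdot r^n + \cdots + a_1 \cdot r + a_0 = 0$$
Isolating $a_0$ gives
$$a_0 = -\left[a_n \cdot r^n + \cdots + a_1 \cdot r\right]$$
We can see that $r$ divides every term on the right-hand side. This means that the
right-hand side is an integer times $r$. Thus the left-hand side, being $a_0$, is an integer times $r$, as
required. The argument for when $-r$ is a root is almost identical.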
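The trick suggests a simple search: list the divisors of the constant term, attach both signs, and test each candidate. The sketch below is one possible way to code this, with hypothetical function names; it assumes integer coefficients listed from the highest power down and a nonzero constant term.

```python
def evaluate(coeffs: list[int], x: int) -> int:
    """Evaluate the polynomial at x using Horner's rule (coefficients highest power first)."""
    value = 0
    for c in coeffs:
        value = value * x + c
    return value


def divisors(n: int) -> list[int]:
    """Positive divisors of a nonzero integer n."""
    n = abs(n)
    return [d for d in range(1, n + 1) if n % d == 0]


def integer_roots(coeffs: list[int]) -> list[int]:
    """Integer roots, found by testing plus and minus each divisor of the constant term."""
    candidates = [sign * d for d in divisors(coeffs[-1]) for sign in (1, -1)]
    return sorted(r for r in candidates if evaluate(coeffs, r) == 0)


if __name__ == "__main__":
    # x^2 - 5x + 6: the candidates are +-1, +-2, +-3, +-6
    print(integer_roots([1, -5, 6]))
```

For $x^2 - 5x + 6$ the search tests eight candidates and keeps the two roots found above with the quadratic formula.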
• The only divisors of 2 are 1, 2. So the only candidates for integer roots are $\pm 1, \pm 2$.
3 There is actually a large body of mathematics devoted to developing methods for factoring
polynomials. Polynomial factorisation is a fundamental problem for most computer algebra systems. The
interested reader should make use of their favourite search engine to find out more.
4 We probably shouldn't do it this way for such a simple polynomial, but for pedagogical purposes we
do here.
5 Many of you may have been taught this approach in high school.
$$P(1) = 2 \qquad P(-1) = 0$$
$$P(2) = 6 \qquad P(-2) = -10$$
Example B.16.1
Example B.16.2
• The divisors of 6 are 1, 2, 3, 6. So the only candidates for integer roots are $\pm 1, \pm 2, \pm 3, \pm 6$.
$$P(1) = 0 \qquad P(-1) = 4$$
$$P(2) = 40 \qquad P(-2) = 12$$
$$P(3) = 132 \qquad P(-3) = 0$$
$$P(6) = 900 \qquad P(-6) = -336$$
We can generalise this approach in order to find rational roots. Consider the
polynomial $6x^2 - x - 2$. We can find its zeros using the quadratic formula:
$$x = \frac{1 \pm \sqrt{1 + 48}}{12} = \frac{1 \pm 7}{12} = -\frac{1}{2}, \frac{2}{3}.$$
Notice now that the numerators, 1 and 2, both divide the constant term of the polynomial
(being 2). Similarly, the denominators, 2 and 3, both divide the coefficient of the highest
power of $x$ (being 6). This is quite general.
If $b/d$ or $-b/d$ is a rational root in lowest terms (i.e. $b$ and $d$ are integers with
no common factors) of a polynomial $Q(x) = a_n x^n + \cdots + a_1 x + a_0$ with
integer coefficients, then the numerator $b$ is a factor of the constant term $a_0$ and the
denominator $d$ is a factor of $a_n$.
To see why, suppose $b/d$ is a root of $Q(x)$, so that
$$a_n (b/d)^n + \cdots + a_1 (b/d) + a_0 = 0$$
Multiplying through by $d^n$ gives
$$a_n b^n + \cdots + a_1 b d^{n-1} + a_0 d^n = 0$$
Isolating the term $a_0 d^n$ gives
$$a_0 d^n = -\left[a_n b^n + \cdots + a_1 b d^{n-1}\right]$$
Now every term on the right-hand side is some integer times $b$. Thus the left-hand side
must also be an integer times $b$. We know that $d$ does not contain any factors of $b$, hence
$a_0$ must be some integer times $b$ (as required).
Similarly we can isolate the term $a_n b^n$:
$$a_n b^n = -\left[a_{n-1} b^{n-1} d + \cdots + a_1 b d^{n-1} + a_0 d^n\right]$$
Now every term on the right-hand side is some integer times $d$. Thus the left-hand side
must also be an integer times $d$. We know that $b$ does not contain any factors of $d$, hence
$a_n$ must be some integer times $d$ (as required).
The argument when $-b/d$ is a root is nearly identical.
We should put this to work:
Example B.16.3
$$P(x) = 2x^2 - x - 3.$$
Solution.
• The constant term in this polynomial is $3 = 1 \times 3$ and the coefficient of the highest
power of $x$ is $2 = 1 \times 2$.
• Thus the only candidates for integer roots are $\pm 1, \pm 3$.
• By our newest trick, the only candidates for fractional roots are $\pm\frac{1}{2}, \pm\frac{3}{2}$.
$$P(1) = -2 \qquad P(-1) = 0$$
$$P(3) = 12 \qquad P(-3) = 18$$
$$P\left(\tfrac{1}{2}\right) = -3 \qquad P\left(-\tfrac{1}{2}\right) = -2$$
$$P\left(\tfrac{3}{2}\right) = 0 \qquad P\left(-\tfrac{3}{2}\right) = 3$$
6 Again, this is a little tedious, but not difficult. It's actually pretty easy to code up for a computer to do.
Modern polynomial factoring algorithms do more sophisticated things, but these are a pretty good way
to start.
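As the footnote suggests, the candidate search is easy to code. The following is a minimal sketch, not a polished implementation: the function names are hypothetical, coefficients are assumed to be integers listed from the highest power down, and both the leading coefficient and the constant term are assumed to be nonzero. Exact fractions are used so that the test for a root involves no rounding.

```python
from fractions import Fraction


def divisors(n: int) -> list[int]:
    """Positive divisors of a nonzero integer n."""
    n = abs(n)
    return [d for d in range(1, n + 1) if n % d == 0]


def evaluate(coeffs: list[int], x: Fraction) -> Fraction:
    """Evaluate the polynomial at x using Horner's rule (coefficients highest power first)."""
    value = Fraction(0)
    for c in coeffs:
        value = value * x + c
    return value


def rational_roots(coeffs: list[int]) -> list[Fraction]:
    """Test every +-b/d with b dividing the constant term and d dividing the leading coefficient."""
    leading, constant = coeffs[0], coeffs[-1]
    candidates = {
        Fraction(sign * b, d)
        for b in divisors(constant)
        for d in divisors(leading)
        for sign in (1, -1)
    }
    return sorted(r for r in candidates if evaluate(coeffs, r) == 0)


if __name__ == "__main__":
    print(rational_roots([2, -1, -3]))  # 2x^2 - x - 3
    print(rational_roots([6, -1, -2]))  # 6x^2 - x - 2
```

Storing the candidates in a set removes repeats such as $\frac{2}{2} = 1$ before any of them is tested.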
The tricks above help us to find integer and rational roots of polynomials. With a little
extra work we can extend those methods to help us factor polynomials. Say we have a
polynomial $P(x)$ of degree $p$ and have established that $r$ is one of its roots. That is, we
know $P(r) = 0$. Then we can factor $(x - r)$ out from $P(x)$: it is always possible to find a
polynomial $Q(x)$ of degree $p - 1$ so that
$$P(x) = (x - r)\, Q(x)$$
In sufficiently simple cases, you can probably do this factoring by inspection. For
example, $P(x) = x^2 - 4$ has $r = 2$ as a root because $P(2) = 2^2 - 4 = 0$. In this case,
$P(x) = (x - 2)(x + 2)$ so that $Q(x) = (x + 2)$. As another example, $P(x) = x^2 - 2x - 3$
has $r = -1$ as a root because $P(-1) = (-1)^2 - 2(-1) - 3 = 1 + 2 - 3 = 0$. In this case,
$P(x) = (x + 1)(x - 3)$ so that $Q(x) = (x - 3)$.
For higher degree polynomials we need to use something more systematic: long
division.
Trick B.16.3 (Long Division).
Once you have found a root $r$ of a polynomial, even if you cannot factor $(x - r)$
out of the polynomial by inspection, you can find $Q(x)$ by dividing $P(x)$ by $x - r$,
using the long division algorithm you learned⁷ in school, but with 10 replaced
by $x$.
Example B.16.4
Factor $P(x) = x^3 - x^2 + 2$.
Solution.
• We can go hunting for integer roots of the polynomial by looking at the divisors of
the constant term. This tells us to try $x = \pm 1, \pm 2$.
• A quick computation shows that $P(-1) = 0$ while $P(1), P(-2), P(2) \ne 0$. Hence
$x = -1$ is a root of the polynomial and so $x + 1$ must be a factor.
• So we divide $\frac{x^3 - x^2 + 2}{x + 1}$. The first term, $x^2$, in the quotient is chosen so that when you
multiply it by the denominator, $x^2(x + 1) = x^3 + x^2$, the leading term, $x^3$, matches
the leading term in the numerator, $x^3 - x^2 + 2$, exactly.
$$\begin{array}{rll}
 & x^2 & \\
x + 1 \;\big) & x^3 - x^2 + 2 & \\
 & x^3 + x^2 & \qquad x^2(x + 1)
\end{array}$$
7 This is a standard part of most high school mathematics curricula, but perhaps not all. You should revise
this carefully.
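$$\begin{array}{rll}
 & x^2 & \\
x + 1 \;\big) & x^3 - x^2 + 2 & \\
 & x^3 + x^2 & \qquad x^2(x + 1) \\
 & -2x^2 &
\end{array}$$
• The next term, $-2x$, in the quotient is chosen so that when you multiply it by the
denominator, $-2x(x + 1) = -2x^2 - 2x$, the leading term $-2x^2$ matches the leading
term in the remainder exactly.
$$\begin{array}{rll}
 & x^2 - 2x & \\
x + 1 \;\big) & x^3 - x^2 + 2 & \\
 & x^3 + x^2 & \qquad x^2(x + 1) \\
 & -2x^2 & \\
 & -2x^2 - 2x & \qquad -2x(x + 1)
\end{array}$$
And so on.
$$\begin{array}{rll}
 & x^2 - 2x + 2 & \\
x + 1 \;\big) & x^3 - x^2 + 2 & \\
 & x^3 + x^2 & \qquad x^2(x + 1) \\
 & -2x^2 & \\
 & -2x^2 - 2x & \qquad -2x(x + 1) \\
 & 2x + 2 & \\
 & 2x + 2 & \qquad 2(x + 1) \\
 & 0 &
\end{array}$$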
• Note that we finally end up with a remainder 0. A nonzero remainder would have
signalled a computational error, since we know that the denominator $x - (-1)$ must
divide the numerator $x^3 - x^2 + 2$ exactly.
• We conclude that
$$(x + 1)(x^2 - 2x + 2) = x^3 - x^2 + 2$$
To check this, just multiply out the left hand side explicitly.
• Applying the high school quadratic root formula $\frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$ to $x^2 - 2x + 2$ tells us
that it has no real roots and that we cannot factor it further⁸.
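The long division steps in this example follow a fixed pattern, so they can be sketched in code. The function below is a hypothetical helper, not a library routine: polynomials are assumed to be given as coefficient lists from the highest power down, with a nonzero leading coefficient in the denominator, and exact fractions are used so that no rounding occurs.

```python
from fractions import Fraction


def poly_divide(numerator: list[int], denominator: list[int]):
    """Polynomial long division; returns (quotient, remainder) as coefficient lists."""
    remainder = [Fraction(c) for c in numerator]
    quotient = []
    steps = max(len(numerator) - len(denominator) + 1, 0)
    for i in range(steps):
        # Choose the next quotient term so that its product with the
        # denominator matches the current leading term exactly.
        factor = remainder[i] / denominator[0]
        quotient.append(factor)
        # Subtract factor * denominator, lined up under the current leading term.
        for j, d in enumerate(denominator):
            remainder[i + j] -= factor * d
    return quotient, remainder[steps:]


if __name__ == "__main__":
    # (x^3 - x^2 + 0x + 2) divided by (x + 1)
    quotient, remainder = poly_divide([1, -1, 0, 2], [1, 1])
    print(quotient, remainder)
```

Note the explicit `0` for the missing $x$ term of $x^3 - x^2 + 2$. For this input the quotient coefficients are $1, -2, 2$ and the remainder is $0$, matching the hand computation.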
Factor $P(x) = x^3 - x^2 + 2$, again.
Solution. Let us do this again but avoid long division.
• From the previous example, we know that $\frac{x^3 - x^2 + 2}{x + 1}$ must be a polynomial (since $-1$
is a root of the numerator) of degree 2. So write
$$\frac{x^3 - x^2 + 2}{x + 1} = ax^2 + bx + c$$
for some, as yet unknown, coefficients $a$, $b$ and $c$.
• Multiplying both sides by $x + 1$ and expanding gives
$$x^3 - x^2 + 2 = (ax^2 + bx + c)(x + 1)$$
$$= ax^3 + (a + b)x^2 + (b + c)x + c$$
• Now matching coefficients of the various powers of $x$ on the left and right hand
sides
coefficient of $x^3$: $a = 1$
coefficient of $x^2$: $a + b = -1$
coefficient of $x^1$: $b + c = 0$
coefficient of $x^0$: $c = 2$
• This gives us a system of equations that we can solve quite directly. Indeed it tells
us immediately that $a = 1$ and $c = 2$. Subbing $a = 1$ into $a + b = -1$ tells us
that $1 + b = -1$ and hence $b = -2$.
• Thus
$$x^3 - x^2 + 2 = (x + 1)(x^2 - 2x + 2).$$
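The coefficient matching above can also be written as a short routine. This is a sketch under stated assumptions: the function name is hypothetical and it only handles a cubic divided by $x + 1$. It solves the first three equations from the top down and uses the fourth, $c = p_0$, as the consistency check, whereas the text reads off $a$ and $c$ first and leaves $b + c = 0$ as the check. The two routes agree whenever $x + 1$ really is a factor.

```python
def divide_by_matching(p3: int, p2: int, p1: int, p0: int) -> tuple[int, int, int]:
    """Find a, b, c with p3 x^3 + p2 x^2 + p1 x + p0 = (a x^2 + b x + c)(x + 1)."""
    a = p3         # coefficient of x^3:  a = p3
    b = p2 - a     # coefficient of x^2:  a + b = p2
    c = p1 - b     # coefficient of x^1:  b + c = p1
    if c != p0:    # coefficient of x^0:  c = p0 must also hold
        raise ValueError("x + 1 does not divide this cubic exactly")
    return a, b, c


if __name__ == "__main__":
    print(divide_by_matching(1, -1, 0, 2))  # coefficients of x^3 - x^2 + 0x + 2
```

There are four equations but only three unknowns, so a solution exists only when the extra equation is satisfied, which is exactly the condition that the remainder is zero.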
Example B.16.5
Appendix B of this work was taken from Appendix A of CLP 2 – Integral Calculus by
Feldman, Rechnitzer, and Yeager under a Creative Commons
Attribution-NonCommercial-ShareAlike 4.0 International license.