
OPTIMAL, INTEGRAL, LIKELY

OPTIMIZATION, INTEGRAL CALCULUS, AND PROBABILITY

FOR STUDENTS OF COMMERCE AND THE SOCIAL SCIENCES

Prepared by Bruno Belevan, Parham Hamidi, Nisha Malhotra, and Elyse Yeager
Adapted from CLP Calculus by Joel Feldman, Andrew Rechnitzer, and Elyse Yeager

This document was typeset on Thursday 7th January, 2021, and is compatible with the 12 December 2020 version.
§§ Licenses and Attributions
Copyright © 2020 Bruno Belevan, Parham Hamidi, Nisha Malhotra, and Elyse Yeager

This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike


4.0 International License. You can view a copy of the license at
http://creativecommons.org/licenses/by-nc-sa/4.0/.

Source files can be found at https://gitlab.math.ubc.ca/ecyeager/OIL

This textbook contains new material as well as material adapted from open sources.

• Chapters 1 and 2 (and their associated appendix sections) were adapted with minor changes from Chapters 1 and 2 of CLP 3 – Multivariable Calculus by Feldman, Rechnitzer, and Yeager under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license.

• Chapters 3 and 5 (and their associated appendix sections) and Appendix B were adapted with minor changes from Chapters 1 and 3, Section 2.4, and Appendix A of CLP 2 – Integral Calculus by Feldman, Rechnitzer, and Yeager under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license.

• Chapter 4 contains content adapted with significant changes from Sections 1.1, 3.1,
Ch 4 introduction, 4.1, and 4.2 of Introductory Statistics by Ilowsky and Dean under
a Creative Commons Attribution License v4.0.

§§ Acknowledgements
UBC Point Grey campus sits on the traditional, ancestral and unceded territory of the xʷməθkʷəy̓əm (Musqueam). Musqueam and UBC have an ongoing relationship sharing insight, knowledge, and labour. Those interested in learning more about this relationship might start here.
Matt Coles of the University of British Columbia has been an important member of the
project to develop quality open resources for Math 105. Thanks to Andrew Rechnitzer at
UBC Mathematics for help with converting LaTeX to PreTeXt.
The development of this text was supported by an OER Implementation Grant, pro-
vided through the UBC Open Educational Resources Fund.

§§ Contact
To report a mistake, or to let us know you’re using this book in a course you’re teaching,
please email [email protected]

CONTENTS

1 Vectors and Geometry in Two and Three Dimensions 1


1.1 Points . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Vectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2.1 Addition of Vectors and Multiplication of a Vector by a Scalar . . . . 6
1.2.2 The Dot Product . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.3 Equations of Planes in 3d . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.4 Functions of Two Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
1.5 Sketching Surfaces in 3d . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
1.5.1 Quadric Surfaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37

2 Partial Derivatives 40
2.1 Partial Derivatives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
2.2 Higher Order Derivatives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
2.3 Local Maximum and Minimum Values . . . . . . . . . . . . . . . . . . . . . . 52
2.3.1 Critical Points . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
2.3.2 Classifying Critical Points . . . . . . . . . . . . . . . . . . . . . . . . . 63
2.4 Absolute Minima and Maxima . . . . . . . . . . . . . . . . . . . . . . . . . . 73
2.4.1 (Optional) Parametrization . . . . . . . . . . . . . . . . . . . . . . . . 78
2.5 Lagrange Multipliers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
2.5.1 Bounded vs Unbounded Constraints . . . . . . . . . . . . . . . . . . . 93
2.6 (Optional) Utility and Demand Functions . . . . . . . . . . . . . . . . . . . . 95
2.6.1 Constrained Optimization of the Utility Function . . . . . . . . . . . 95
2.6.2 Demand Curves . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98

3 Integration 106
3.1 Definition of the Integral . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
3.1.1 Summation Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
3.1.2 The Definition of the Definite Integral . . . . . . . . . . . . . . . . . . 119
3.1.3 Using Known Areas to Evaluate Integrals . . . . . . . . . . . . . . . . 127
3.1.4 (Optional) Surplus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
3.2 Basic Properties of the Definite Integral . . . . . . . . . . . . . . . . . . . . . 134


3.2.1 More Properties of Integration: Even and Odd Functions . . . . . . . 141


3.2.2 More Properties of Integration: Inequalities for Integrals . . . . . . . 145
3.3 The Fundamental Theorem of Calculus . . . . . . . . . . . . . . . . . . . . . 147
3.3.1 Indefinite Integration . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
3.3.2 (Optional) Marginal Cost and Marginal Revenue . . . . . . . . . . . 163
3.4 Substitution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165
3.4.1 Substitution and Definite Integrals . . . . . . . . . . . . . . . . . . . . 171
3.4.2 More Substitution Examples . . . . . . . . . . . . . . . . . . . . . . . 175
3.5 Integration by Parts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179
3.5.1 Further Techniques using Integration by Parts . . . . . . . . . . . . . 186
3.6 Trigonometric Integrals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 190
3.6.1 Integrating ∫ sinᵐ x cosⁿ x dx . . . . . . . . . . . . . . . . . . . . . . 192
3.6.2 Integrating ∫ tanᵐ x secⁿ x dx . . . . . . . . . . . . . . . . . . . . . . 194


3.7 Trigonometric Substitution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201
3.8 Partial Fractions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 212
3.8.1 Partial Fraction Decomposition Examples . . . . . . . . . . . . . . . . 213
3.8.2 Non-Rational Integrands . . . . . . . . . . . . . . . . . . . . . . . . . 223
3.8.3 Systematic Description . . . . . . . . . . . . . . . . . . . . . . . . . . . 225
3.9 Numerical Integration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 227
3.9.1 The Midpoint Rule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 230
3.9.2 The Trapezoidal Rule . . . . . . . . . . . . . . . . . . . . . . . . . . . . 233
3.9.3 Simpson’s Rule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 236
3.9.4 Three Simple Numerical Integrators – Error Behaviour . . . . . . . . 240
3.10 Improper Integrals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 247
3.10.1 Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 247
3.10.2 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 253
3.10.3 Convergence Tests for Improper Integrals . . . . . . . . . . . . . . . . 260
3.11 Overview of Integration Techniques . . . . . . . . . . . . . . . . . . . . . . . 267
3.12 Differential Equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 271
3.12.1 (Optional) Logistic Growth . . . . . . . . . . . . . . . . . . . . . . . . 281
3.12.2 (Optional) Interest on Investments and Loans . . . . . . . . . . . . . 286

4 Probability 293
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 293
4.1.1 Foundational Vocabulary and Notation . . . . . . . . . . . . . . . . . 293
4.1.2 Discrete vs Continuous . . . . . . . . . . . . . . . . . . . . . . . . . . 297
4.1.3 Combining Events . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 299
4.1.4 Equally Likely Outcomes . . . . . . . . . . . . . . . . . . . . . . . . . 301
4.2 Probability Mass Function (PMF) . . . . . . . . . . . . . . . . . . . . . . . . . 304
4.2.1 Limitations of Probability Mass Function (PMF) . . . . . . . . . . . . 309
4.3 Cumulative Distribution Function (CDF) . . . . . . . . . . . . . . . . . . . . 311
4.3.1 Dot Diagrams . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 318
4.4 Probability Density Function (PDF) . . . . . . . . . . . . . . . . . . . . . . . 320
4.5 Expected Value . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 327
4.5.1 Motivation: Long-Term Average . . . . . . . . . . . . . . . . . . . . . 327
4.5.2 Definition and Examples . . . . . . . . . . . . . . . . . . . . . . . . . . 328
4.5.3 Checking your Expectation Calculation . . . . . . . . . . . . . . . . . 332


4.6 Variance and Standard Deviation . . . . . . . . . . . . . . . . . . . . . . . . . 338


4.6.1 Motivation: Average difference from the average . . . . . . . . . . . 338
4.6.2 Definitions and Computations . . . . . . . . . . . . . . . . . . . . . . 340
4.6.3 Checking your Standard Deviation Calculation . . . . . . . . . . . . 346

5 Sequence and Series 349


5.1 Sequences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 350
5.1.1 (Optional) Musical Scales . . . . . . . . . . . . . . . . . . . . . . . . . 355
5.2 Series . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 360
5.2.1 Geometric Series . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 363
5.2.2 Telescoping Series . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 369
5.2.3 Arithmetic of Series . . . . . . . . . . . . . . . . . . . . . . . . . . . . 371
5.2.4 (Optional) Intergenerational Cost-Benefit Analysis . . . . . . . . . . 373
5.3 Convergence Tests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 375
5.3.1 The Divergence Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . 375
5.3.2 The Integral Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 376
5.3.3 The Comparison Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . 382
5.3.4 The Ratio Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 388
5.3.5 Convergence Test List . . . . . . . . . . . . . . . . . . . . . . . . . . . 390
5.4 Absolute and Conditional Convergence . . . . . . . . . . . . . . . . . . . . . 391
5.5 Power Series . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 393
5.5.1 Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 394
5.5.2 Working With Power Series . . . . . . . . . . . . . . . . . . . . . . . . 402
5.6 Taylor Series . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 409
5.6.1 Extending Taylor Polynomials . . . . . . . . . . . . . . . . . . . . . . 409
5.6.2 Computing with Taylor Series . . . . . . . . . . . . . . . . . . . . . . 417
5.6.3 Evaluating Limits using Taylor Expansions . . . . . . . . . . . . . . . 423

A Proofs and Supplements 426


A.1 Folding the First Octant of R3 . . . . . . . . . . . . . . . . . . . . . . . . . . . 426
A.2 Conic Sections and Quadric Surfaces . . . . . . . . . . . . . . . . . . . . . . . 427
A.3 Mixed Partial Derivatives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 430
A.3.1 Clairaut: The Proof of Theorem 2.2.5 . . . . . . . . . . . . . . . . . . . 430
A.3.2 An Example of ∂²f/∂x∂y (x₀, y₀) ≠ ∂²f/∂y∂x (x₀, y₀) . . . . . . . . . . 433
A.4 The (multivariable) chain rule . . . . . . . . . . . . . . . . . . . . . . . . . 434
A.4.1 Review of the Proof of d/dt f(x(t)) = df/dx(x(t)) dx/dt(t) . . . . . . . 436
A.4.2 Proof of Theorem A.4.1 . . . . . . . . . . . . . . . . . . . . . . . . . . 436
A.5 Lagrange Multipliers: Proof of Theorem 2.5.2 . . . . . . . . . . . . . . . . . . 439
A.6 A More Rigorous Area Computation . . . . . . . . . . . . . . . . . . . . . . . 441
A.7 Careful Definition of the Integral . . . . . . . . . . . . . . . . . . . . . . . . . 443
A.8 Integrating sec x, csc x, sec³ x and csc³ x . . . . . . . . . . . . . . . . . . . 447
A.9 Partial Fraction Decompositions . . . . . . . . . . . . . . . . . . . . . . . . . 452
A.9.1 Decomposition involving Irreducible Quadratic Factors . . . . . . . 452
A.9.2 Proofs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 462
A.10 An Error Bound for the Midpoint Rule . . . . . . . . . . . . . . . . . . . . . . 471
A.11 Comparison Tests Proof . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 473
A.12 Alternating Series . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 474


A.12.1 The Alternating Series Test . . . . . . . . . . . . . . . . . . . . . . . . 474


A.12.2 Alternating Series Test Proof . . . . . . . . . . . . . . . . . . . . . . . 479
A.13 Delicacy of Conditional Convergence . . . . . . . . . . . . . . . . . . . . . . 480

B High school material 484


B.1 Similar Triangles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 484
B.2 Pythagoras . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 485
B.3 Trigonometry — Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 485
B.4 Radians, Arcs and Sectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 485
B.5 Trigonometry — Graphs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 486
B.6 Trigonometry — Special Triangles . . . . . . . . . . . . . . . . . . . . . . . . 486
B.7 Trigonometry — Simple Identities . . . . . . . . . . . . . . . . . . . . . . . . 486
B.8 Trigonometry — Add and Subtract Angles . . . . . . . . . . . . . . . . . . . 487
B.9 Inverse Trigonometric Functions . . . . . . . . . . . . . . . . . . . . . . . . . 487
B.10 Areas . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 488
B.11 Volumes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 489
B.12 Powers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 489
B.13 Logarithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 490
B.14 Highschool Material You Should be Able to Derive . . . . . . . . . . . . . . . 491
B.15 Cartesian Coordinates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 492
B.16 Roots of Polynomials . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 493

Chapter 1

VECTORS AND GEOMETRY IN TWO AND THREE DIMENSIONS

Before we get started doing calculus in two and three dimensions we need to brush up
on some basic geometry that we will use a lot. We are already familiar with the Cartesian
plane1 , but we’ll start from the beginning.

1.1 Points
Each point in two dimensions may be labeled by two coordinates2 (x, y) which specify the position of the point in some units with respect to some axes as in the figure below.

[Figure: a point labelled (x, y) plotted with respect to the x- and y-axes.]

The set of all points in two dimensions is denoted3 R2 . Observe that

1 René Descartes (1596–1650) was a French scientist and philosopher, who lived in the Dutch Republic
for roughly twenty years after serving in the (mercenary) Dutch States Army. He is viewed as the father
of analytic geometry, which uses numbers to study geometry.
2 This is why the xy-plane is called “two dimensional” — the name of each point consists of two real
numbers.
3 Not surprisingly, the 2 in R2 signifies that each point is labelled by two numbers and the R in R2
signifies that the numbers in question are real numbers. There are more advanced applications (for
example in signal analysis and in quantum mechanics) where complex numbers are used. The space of
all pairs (z1 , z2 ), with z1 and z2 complex numbers is denoted C2 .


• the distance from the point (x, y) to the x-axis is |y|

• the distance from the point (x, y) to the y-axis is |x|

• the distance from the point (x, y) to the origin (0, 0) is √(x² + y²)
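For instance (a quick check, with numbers chosen only for illustration), the point (3, −4) is a distance |−4| = 4 from the x-axis, a distance |3| = 3 from the y-axis, and a distance √(3² + (−4)²) = √25 = 5 from the origin.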

Similarly, each point in three dimensions may be labeled by three coordinates (x, y, z), as in the two figures below.

[Figure: two views of a point labelled (x, y, z) plotted with respect to the x-, y- and z-axes.]

The set of all points in three dimensions is denoted R3 . The plane that contains, for exam-
ple, the x- and y-axes is called the xy-plane.

• The xy-plane is the set of all points ( x, y, z) that satisfy z = 0.


• The xz-plane is the set of all points ( x, y, z) that satisfy y = 0.
• The yz-plane is the set of all points ( x, y, z) that satisfy x = 0.

More generally,

• The set of all points (x, y, z) that obey z = c is a plane that is parallel to the xy-plane and is a distance |c| from it. If c > 0, the plane z = c is above the xy-plane. If c < 0, the plane z = c is below the xy-plane. We say that the plane z = c is a signed distance c from the xy-plane.

• The set of all points (x, y, z) that obey y = b is a plane that is parallel to the xz-plane and is a signed distance b from it.

• The set of all points (x, y, z) that obey x = a is a plane that is parallel to the yz-plane and is a signed distance a from it.

[Figure: the planes z = c, y = b and x = a, drawn parallel to the xy-, xz- and yz-planes respectively.]
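As an illustration (using a value not drawn in the figure), the plane z = −2 is parallel to the xy-plane, lies a distance |−2| = 2 below it, and so is a signed distance −2 from the xy-plane.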

2
V ECTORS AND G EOMETRY 1.2 V ECTORS

Observe that our 2d distances extend quite easily to 3d.

• the distance from the point (x, y, z) to the xy-plane is |z|

• the distance from the point (x, y, z) to the xz-plane is |y|

• the distance from the point (x, y, z) to the yz-plane is |x|

• the distance from the point (x, y, z) to the origin (0, 0, 0) is √(x² + y² + z²)

To see that the distance from the point (x, y, z) to the origin (0, 0, 0) is indeed √(x² + y² + z²),

• apply Pythagoras to the right-angled triangle with vertices (0, 0, 0), (x, 0, 0) and (x, y, 0) to see that the distance from (0, 0, 0) to (x, y, 0) is √(x² + y²) and then

• apply Pythagoras to the right-angled triangle with vertices (0, 0, 0), (x, y, 0) and (x, y, z) to see that the distance from (0, 0, 0) to (x, y, z) is √((√(x² + y²))² + z²) = √(x² + y² + z²).

[Figure: the right-angled triangles with vertices (0, 0, 0), (x, 0, 0), (x, y, 0) and (0, 0, 0), (x, y, 0), (x, y, z) used in the two applications of Pythagoras.]

More generally, the distance from the point (x, y, z) to the point (x′, y′, z′) is

√((x − x′)² + (y − y′)² + (z − z′)²)

Notice that this gives us the equation for a sphere quite directly. All the points on a sphere
are equidistant from the centre of the sphere. So, for example, the equation of the sphere
centered on (1, 2, 3) with radius 4, that is, the set of all points ( x, y, z) whose distance from
(1, 2, 3) is 4, is
(x − 1)² + (y − 2)² + (z − 3)² = 16
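As a quick check (with a point chosen only for illustration), the point (5, 2, 3) lies on this sphere: its distance from (1, 2, 3) is √((5 − 1)² + (2 − 2)² + (3 − 3)²) = √16 = 4, and indeed (5 − 1)² + (2 − 2)² + (3 − 3)² = 16.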
If you’re having a hard time picturing the three-dimensional axes, Appendix section A.1
will lead you through folding a model out of a piece of paper.

1.2 Vectors
In many of our applications in 2d and 3d, we will encounter quantities that have both a
magnitude (like a distance) and also a direction. Such quantities are called vectors. That is,
a vector is a quantity which has both a direction and a magnitude, like a velocity. If you are
moving, the magnitude (length) of your velocity vector is your speed (distance travelled


per unit time) and the direction of your velocity vector is your direction of motion. To
specify a vector in three dimensions you have to give three components, just as for a
point. To draw the vector with components a, b, c you can draw an arrow from the point
(0, 0, 0) to the point (a, b, c). Similarly, to specify a vector in two dimensions you have to give two components. To draw the vector with components a and b, you can draw an arrow from the point (0, 0) to the point (a, b).

[Figure: the vector with components a, b drawn as an arrow from (0, 0) to (a, b) in two dimensions, and the vector with components a, b, c drawn as an arrow from (0, 0, 0) to (a, b, c) in three dimensions.]
There are many situations in which it is preferable to draw a vector with its tail at
some point other than the origin. For example, it is natural to draw the velocity vector
of a moving particle with the tail of the velocity vector at the position of the particle,
whether or not the particle is at the origin. The sketch below shows a moving particle and
its velocity vector at two different times.

[Figure: a moving particle with its velocity vector v drawn at two different points along its path.]

As a second example, suppose that you are analyzing the motion of a pendulum. There
are three forces acting on the pendulum bob: gravity g, which is pulling the bob straight
down, tension t in the rod, which is pulling the bob in the direction of the rod, and air
resistance r, which is pulling the bob in a direction opposite to its direction of motion. All
three forces are acting on the bob. So it is natural to draw all three arrows representing the
forces with their tails at the ball.


[Figure: a pendulum bob with the three force vectors g, t and r drawn with their tails at the bob.]

In this text, we will use bold faced letters, like v, t, g, to designate vectors. In hand-
writing, it is clearer to use a small overhead arrow4 , as in ~v, ~t, ~g, instead. Also, when we
want to emphasize that some quantity is a number, rather than a vector, we will call the
number a scalar.
Both points and vectors in 2d are specified by two numbers. Until you get used to this,
it might confuse you sometimes — does a given pair of numbers represent a point or a
vector? To distinguish5 between the components of a vector and the coordinates of the
point at its head, when its tail is at some point other than the origin, we shall use angle
brackets rather than round brackets around the components of a vector. For example, the
figure below shows the two-dimensional vector ⟨2, 1⟩ drawn in three different positions.
In each case, when the tail is at the point (u, v) the head is at (2 + u, 1 + v). We warn you
that, out in the real world6 , no one uses notation that distinguishes between components
of a vector and the coordinates of its head — usually round brackets are used for both. It
is up to you to keep straight which is being referred to.

[Figure: the vector ⟨2, 1⟩ drawn in three positions: with tail at (0, 0) and head at (2, 1), with tail at (4, 2) and head at (6, 3), and with tail at (8, 0) and head at (10, 1).]

By way of summary,

Notation1.2.1.
we use

• bold faced letters, like v, t, g, to designate vectors, and


• angle brackets, like ⟨2, 1⟩, around the components of a vector, but use
• round brackets, like (2, 1), around the coordinates of a point, and use
• “scalar” to emphasise that some quantity is a number, rather than a vector.

4 Some people use an underline, as in v, rather than an arrow.


5 Or, in the Wikipedia jargon, disambiguate.
6 OK. OK. Out in that (admittedly very small) part of the real world that actually knows what a vector is.


1.2.1 §§ Addition of Vectors and Multiplication of a Vector by a Scalar


Just as we have done many times in the texts, when we define a new type of object, we
want to understand how it interacts with the basic operations of addition and multipli-
cation. Vectors are no different, and the following is a natural way to define addition of
vectors. Multiplication will be more subtle, and we start with multiplication of a vector
by a number (rather than with multiplication of a vector by another vector).

Definition1.2.2 (Adding Vectors and Multiplying a Vector by a Number).

These two operations have the obvious definitions

a = ⟨a₁, a₂⟩, b = ⟨b₁, b₂⟩  ⟹  a + b = ⟨a₁ + b₁, a₂ + b₂⟩
a = ⟨a₁, a₂⟩, s a number  ⟹  sa = ⟨sa₁, sa₂⟩

and similarly in three dimensions.

Pictorially, you add the vector b to the vector a by drawing b with its tail at the head
of a and then drawing a vector from the tail of a to the head of b, as in the figure on the
left below. For a number s, we can draw the vector sa, by just

• changing the vector a’s length by the factor |s|, and,


• if s ă 0, reversing the arrow’s direction,

as in the other two figures below.

[Figure: the sum a + b drawn tip-to-tail, and the scalar multiples 2a and −2a of a vector a.]

The special case of multiplication by s = ´1 appears so frequently that (´1)a is given the
shorter notation ´a. That is,
´ h a1 , a2 i = h´a1 , ´a2 i

Of course a + (´a) is 0, the vector all of whose components are zero.


To subtract b from a pictorially, you may add ´b (which is drawn by reversing the
direction of b) to a. Alternatively, if you draw a and b with their tails at a common point,
then a ´ b is the vector from the head of b to the head of a. That is, a ´ b is the vector you
must add to b in order to get a.


[Figure: a − b drawn two ways: as a + (−b), and as the vector from the head of b to the head of a when a and b share a tail.]
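For instance (with components chosen only for illustration), if a = ⟨3, 1⟩ and b = ⟨1, 2⟩, then a − b = ⟨3 − 1, 1 − 2⟩ = ⟨2, −1⟩, and indeed b + (a − b) = ⟨1 + 2, 2 − 1⟩ = ⟨3, 1⟩ = a.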

The operations of addition and multiplication by a scalar that we have just defined are
quite natural and rarely cause any problems, because they inherit from the real numbers
the properties of addition and multiplication that you are used to.

Theorem1.2.3 (Properties of Addition and Scalar Multiplication).

Let a, b and c be vectors and s and t be scalars. Then

(1) a+b = b+a (2) a + (b + c) = (a + b) + c


(3) a+0 = a (4) a + (´a) = 0
(5) s(a + b) = sa + sb (6) (s + t)a = sa + ta
(7) (st)a = s(ta) (8) 1a = a

We have just been introduced to many definitions. Let’s see some of them in action.
Example 1.2.4

For example, if
a = h1, 2, 3i b = h3, 2, 1i c = h1, 0, 1i
then

2a = 2 h1, 2, 3i = h2, 4, 6i
´b = ´ h3, 2, 1i = h´3, ´2, ´1i
3c = 3 h1, 0, 1i = h3, 0, 3i

and

2a ´ b + 3c = h2, 4, 6i + h´3, ´2, ´1i + h3, 0, 3i


= h2 ´ 3 + 3 , 4 ´ 2 + 0 , 6 ´ 1 + 3i
= h2, 2, 8i

Example 1.2.4


There are some vectors that occur sufficiently commonly that they are given special
names. One is the vector 0. Some others are the “standard basis vectors”.
Definition 1.2.5.

(a) The standard basis vectors in two dimensions are ı̂ = ⟨1, 0⟩ and ĵ = ⟨0, 1⟩.

(b) The standard basis vectors in three dimensions are ı̂ = ⟨1, 0, 0⟩, ĵ = ⟨0, 1, 0⟩ and k̂ = ⟨0, 0, 1⟩.

[Figure: the standard basis vectors drawn along the coordinate axes in two and three dimensions.]

We'll explain the little hats in the notation ı̂, ĵ, k̂ shortly. Some people rename ı̂, ĵ and k̂ to e₁, e₂ and e₃ respectively. Using the above properties we have, for all vectors,

⟨a₁, a₂⟩ = a₁ı̂ + a₂ĵ    ⟨a₁, a₂, a₃⟩ = a₁ı̂ + a₂ĵ + a₃k̂

A sum of numbers times vectors, like a₁ı̂ + a₂ĵ, is called a linear combination of the vectors.
Thus all vectors can be expressed as linear combinations of the standard basis vectors.
This makes basis vectors very helpful in computations. The standard basis vectors are unit
vectors, meaning that they are of length one, where the length of a vector a is denoted7 |a|
and is defined by

Definition 1.2.6 (Length of a Vector).

a = ⟨a₁, a₂⟩  ⟹  |a| = √(a₁² + a₂²)
a = ⟨a₁, a₂, a₃⟩  ⟹  |a| = √(a₁² + a₂² + a₃²)

A unit vector is a vector of length one. We'll sometimes use the accent ˆ to emphasise that the vector â is a unit vector. That is, |â| = 1.

Example 1.2.7

Recall that multiplying a vector a by a positive number s, changes the length of the vector by a factor s without changing the direction of the vector. So (assuming that |a| ≠ 0) a/|a| is a unit vector that has the same direction as a. For example, ⟨1, 1, 1⟩/√3 is a unit vector that points in the same direction as ⟨1, 1, 1⟩.
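As a quick check (spelling out the components), ⟨1, 1, 1⟩/√3 = ⟨1/√3, 1/√3, 1/√3⟩ has length √(1/3 + 1/3 + 1/3) = √1 = 1, so it is indeed a unit vector.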

7 The notation }a} is also used for the length of a.


Example 1.2.7

1.2.2 §§ The Dot Product


Let’s get back to the arithmetic operations of addition and multiplication. We will be using
both scalars and vectors. So, for each operation there are three possibilities that we need
to explore:

• “scalar plus scalar”, “scalar plus vector” and “vector plus vector”
• “scalar times scalar”, “scalar times vector” and “vector times vector”

We have been using “scalar plus scalar” and “scalar times scalar” since childhood. “Vector
plus vector” and “scalar times vector” were just defined above. There is no sensible way
to define “scalar plus vector”, so we won’t. This leaves “vector times vector”. There are
actually two widely used such products. The first is the dot product, which is the topic of
this section, and which is used to easily determine the angle θ (or more precisely, cos θ)
between two vectors. (The second widely-used product of two vectors, the cross product,
is not a part of this course.)

Definition 1.2.8 (Dot Product).

The dot product of the vectors a and b is denoted a · b and is defined by

a = ⟨a₁, a₂⟩, b = ⟨b₁, b₂⟩  ⟹  a · b = a₁b₁ + a₂b₂
a = ⟨a₁, a₂, a₃⟩, b = ⟨b₁, b₂, b₃⟩  ⟹  a · b = a₁b₁ + a₂b₂ + a₃b₃

in two and three dimensions respectively.
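For instance (with components chosen only for illustration), ⟨1, 2⟩ · ⟨3, 4⟩ = 1 × 3 + 2 × 4 = 11 in two dimensions, and ⟨1, 2, 3⟩ · ⟨4, 5, 6⟩ = 4 + 10 + 18 = 32 in three dimensions.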

The properties of the dot product are as follows:

Theorem1.2.9 (Properties of the Dot Product).

Let a, b and c be vectors and let s be a scalar. Then

(0) a, b are vectors and a ¨ b is a scalar


(1) a ¨ a = |a|2
(2) a¨b = b¨a
(3) a ¨ (b + c) = a ¨ b + a ¨ c, (a + b) ¨ c = a ¨ c + b ¨ c
(4) (sa) ¨ b = s(a ¨ b)
(5) 0¨a = 0
(6) a ¨ b = |a| |b| cos θ where θ is the angle between a and b
(7) a · b = 0 ⟺ a = 0 or b = 0 or a ⊥ b


Proof. Properties 0 through 5 are almost immediate consequences of the definition. For
example, for property 3 (which is called the distributive law) in dimension 2,

a · (b + c) = ⟨a₁, a₂⟩ · ⟨b₁ + c₁, b₂ + c₂⟩
            = a₁(b₁ + c₁) + a₂(b₂ + c₂) = a₁b₁ + a₁c₁ + a₂b₂ + a₂c₂
a · b + a · c = ⟨a₁, a₂⟩ · ⟨b₁, b₂⟩ + ⟨a₁, a₂⟩ · ⟨c₁, c₂⟩
            = a₁b₁ + a₂b₂ + a₁c₁ + a₂c₂

Property 6 is sufficiently important that it is often used as the definition of dot product.
It is not at all an obvious consequence of the definition. To verify it, we just write |a − b|² in two different ways. The first expresses |a − b|² in terms of a · b. It is

|a − b|² = (a − b) · (a − b)                  (by property 1)
         = a · a − a · b − b · a + b · b      (by property 3)
         = |a|² + |b|² − 2a · b               (by properties 1 and 2)

Here each equals sign is labelled with the property (or properties) of Theorem 1.2.9 that justifies it. The second way we write |a − b|² involves cos θ and follows from the cosine law for triangles. Just in case you don't remember the cosine law, we'll derive it right now! Start by applying Pythagoras to the shaded triangle in the right hand figure below.

[Figure: on the left, the triangle with sides a, b and a − b, with angle θ between a and b; on the right, the same triangle with the shaded right-angled triangle whose legs have lengths |a| sin θ and |b| − |a| cos θ and whose hypotenuse has length |a − b|.]

That triangle is a right triangle whose hypotenuse has length |a − b| and whose other two sides have lengths |b| − |a| cos θ and |a| sin θ. So Pythagoras gives

|a − b|² = (|b| − |a| cos θ)² + (|a| sin θ)²
         = |b|² − 2|a| |b| cos θ + |a|² cos² θ + |a|² sin² θ
         = |b|² − 2|a| |b| cos θ + |a|²

This is precisely the cosine law8. Observe that, when θ = π/2, this reduces to (surprise!) Pythagoras' Theorem.
Setting our two expressions for |a − b|² equal to each other,

|a − b|² = |a|² + |b|² − 2a · b = |b|² − 2|a| |b| cos θ + |a|²

8 You may be used to seeing it written as c² = a² + b² − 2ab cos C, where a, b and c are the lengths of the three sides of the triangle and C is the angle opposite the side of length c


cancelling the |a|² and |b|² common to both sides

−2a · b = −2|a| |b| cos θ

and dividing by −2 gives

a · b = |a| |b| cos θ

which is exactly property 6.
Property 7 follows directly from property 6. First note that the dot product a ¨ b =
|a| |b| cos θ is zero if and only if at least one of the three factors |a|, |b|, cos θ is zero. The
first factor is zero if and only if a = 0. The second factor is zero if and only if b = 0. The
third factor is zero if and only if θ = ±π/2 + 2kπ, for some integer k, which in turn is true if
and only if a and b are mutually perpendicular.
Because of Property 7 of Theorem 1.2.9, the dot product can be used to test whether or
not two vectors are perpendicular to each other. That is, whether or not the angle between
the two vectors is 90°. Another name9 for "perpendicular" is "orthogonal". Testing for
orthogonality is one of the main uses of the dot product.
Example 1.2.10

Consider the three vectors


a = ⟨1, 1, 0⟩    b = ⟨1, 0, 1⟩    c = ⟨−1, 1, 1⟩

Their dot products

a · b = ⟨1, 1, 0⟩ · ⟨1, 0, 1⟩ = 1 × 1 + 1 × 0 + 0 × 1 = 1
a · c = ⟨1, 1, 0⟩ · ⟨−1, 1, 1⟩ = 1 × (−1) + 1 × 1 + 0 × 1 = 0
b · c = ⟨1, 0, 1⟩ · ⟨−1, 1, 1⟩ = 1 × (−1) + 0 × 1 + 1 × 1 = 0

tell us that c is perpendicular to both a and b. Since both |a| = |b| = √(1² + 1² + 0²) = √2, the first dot product tells us that the angle, θ, between a and b obeys

cos θ = (a · b)/(|a| |b|) = 1/2  ⟹  θ = π/3

[Figure: the three vectors ⟨1, 1, 0⟩, ⟨1, 0, 1⟩ and ⟨−1, 1, 1⟩ drawn from the origin.]

Example 1.2.10

9 The concepts of the dot product and perpendicularity have been generalized a lot in mathematics (for
example, from 2d and 3d vectors to functions). The generalization of the dot product is called the “inner
product” and the generalization of perpendicularity is called “orthogonality”.


1.3 Equations of Planes in 3d


Specifying one point (x₀, y₀, z₀) on a plane and a vector d parallel to the plane does not uniquely determine the plane, because it is free to rotate about d. On the other hand, giving one point on the plane and one vector n = ⟨nx, ny, nz⟩ with direction perpendicular to that of the plane does uniquely determine the plane.

[Figure: on the left, a plane free to rotate about the vector d through (x₀, y₀, z₀); on the right, the plane pinned down by the point (x₀, y₀, z₀) and the normal vector n, with a generic point (x, y, z) in the plane.]

If (x, y, z) is any point on the plane, then the vector ⟨x − x₀, y − y₀, z − z₀⟩, whose tail is at (x₀, y₀, z₀) and whose head is at (x, y, z), lies entirely inside the plane and so must be perpendicular to n. That is,

Equation 1.3.1 (The Equation of a Plane).

n · ⟨x − x₀, y − y₀, z − z₀⟩ = 0

Writing out in components

nx(x − x₀) + ny(y − y₀) + nz(z − z₀) = 0    or    nx x + ny y + nz z = d

where d = nx x₀ + ny y₀ + nz z₀.

Again, the coefficients nx, ny, nz of x, y and z in the equation of the plane are the components of a vector ⟨nx, ny, nz⟩ perpendicular to the plane. The vector n is often called a
normal vector for the plane. Any nonzero multiple of n will also be perpendicular to the
plane and is also called a normal vector.
Example 1.3.2 (Equation of a plane from a point and a normal vector)

Give an equation of the plane that passes through the point (5, 7, 13) and has normal vector ⟨8, 4, 2⟩.

Solution. As we saw in Equation 1.3.1, the terms of the normal vector are the coefficients of the variables:

8x + 4y + 2z = d

and

d = ⟨8, 4, 2⟩ · ⟨5, 7, 13⟩ = 8 · 5 + 4 · 7 + 2 · 13 = 94

So, the equation of the plane is

8x + 4y + 2z = 94


Example 1.3.2

The normal vector to a plane determines the orientation of the plane in space.

Definition1.3.3 (Parallel and Orthogonal Planes).

Two planes are orthogonal if their normal vectors are orthogonal. Two planes are parallel if their normal vectors are parallel.
A plane is parallel to itself, but when we ask for parallel planes, it is usually
implied that they are distinct.

Example 1.3.4

We have just seen that if we write the equation of a plane in the standard form

ax + by + cz = d

then it is easy to read off a normal vector for the plane. It is just ⟨a, b, c⟩. So for example the planes

P : x + 2y + 3z = 4        P′ : 3x + 6y + 9z = 7

have normal vectors n = ⟨1, 2, 3⟩ and n′ = ⟨3, 6, 9⟩, respectively. Since n′ = 3n, the two normal vectors n and n′ are parallel to each other. This tells us that the planes P and P′ are parallel to each other.

When the normal vectors of two planes are perpendicular to each other, we say that the planes are perpendicular to each other. For example the planes

P : x + 2y + 3z = 4        P″ : 2x − y = 7

have normal vectors n = ⟨1, 2, 3⟩ and n″ = ⟨2, −1, 0⟩, respectively. Since

n · n″ = 1 × 2 + 2 × (−1) + 3 × 0 = 0

the normal vectors n and n″ are mutually perpendicular, so the corresponding planes P and P″ are perpendicular to each other.
Example 1.3.4

Example 1.3.5

In this example, we’ll sketch the plane

P : 4x + 3y + 2z = 12

A good way to prepare for sketching a plane is to find the intersection points of the plane
with the x-, y- and z-axes, just as you are used to doing when sketching lines in the xy-
plane. For example, any point on the x axis must be of the form ( x, 0, 0). For ( x, 0, 0)


to also be on P we need x = 12/4 = 3. So P intersects the x-axis at (3, 0, 0). Similarly,


P intersects the y-axis at (0, 4, 0) and the z-axis at (0, 0, 6). Now plot the points (3, 0, 0),
(0, 4, 0) and (0, 0, 6). P is the plane containing these three points. Often a visually effective
way to sketch a surface in three dimensions is to

• only sketch the part of the surface in the first octant. That is, the part with x ≥ 0, y ≥ 0 and z ≥ 0.

• To do so, sketch the curve of intersection of the surface with the part of the xy-plane
in the first octant and,

• similarly, sketch the curve of intersection of the surface with the part of the xz-plane
in the first octant and the curve of intersection of the surface with the part of the
yz-plane in the first octant.

That’s what we’ll do. The intersection of the plane P with the xy-plane is the straight line
through the two points (3, 0, 0) and (0, 4, 0). So the part of that intersection in the first
octant is the line segement from (3, 0, 0) to (0, 4, 0). Similarly the part of the intersection
of P with the xz-plane that is in the first octant is the line segment from (3, 0, 0) to (0, 0, 6)
and the part of the intersection of P with the yz-plane that is in the first octant is the line
segment from (0, 4, 0) to (0, 0, 6). So we just have to sketch the three line segments joining
three axis intercepts (3, 0, 0), (0, 4, 0) and (0, 0, 6). That’s it.

[Figure: the first-octant portion of the plane P, bounded by the line segments joining the intercepts (3, 0, 0), (0, 4, 0) and (0, 0, 6).]

Example 1.3.5

Example 1.3.6 (Three points determine a plane)

Find the equation of the plane that contains the three points (1, ´1, 0), (2, 0, 1), and (5, 0, ´1).
Solution. Solution 1
We know that the equation of the plane will have the form ax + by + cz = d, where
⟨a, b, c⟩ is a normal vector to the plane. So, we will start by finding a normal vector.
First, let’s find two vectors in the plane. We do this by choosing two pairs of points (it
doesn’t matter which two) and subtracting their coordinates.


[Figure: the three points (1, −1, 0), (2, 0, 1) and (5, 0, −1), with the vector ⟨1, 1, 1⟩ drawn from (1, −1, 0) to (2, 0, 1) and the vector ⟨4, 1, −1⟩ drawn from (1, −1, 0) to (5, 0, −1).]

The normal vector will be a vector ⟨a, b, c⟩ that is perpendicular (orthogonal) to the two vectors ⟨4, 1, −1⟩ and ⟨1, 1, 1⟩. The usual way of finding such a vector is by using the cross product, but that's a topic for another course. We find it by solving a system of equations. Remember two nonzero vectors are perpendicular if their dot products are zero.

• ⟨a, b, c⟩ · ⟨4, 1, −1⟩ = 0 implies 4a + b − c = 0. So c = 4a + b.

• ⟨a, b, c⟩ · ⟨1, 1, 1⟩ = 0 implies a + b + c = 0. So c = −a − b.

• Combining the last two results, 4a + b = −a − b, so a = −(2/5)b. Then also c = −a − b = (2/5)b − b = −(3/5)b.

• There will be infinitely many normal vectors, all parallel to one another (i.e. scalar multiples of one another). So, it's fine that we have all our coordinates in terms of b. Our normal vectors have the forms ⟨−(2/5)b, b, −(3/5)b⟩. Setting b = 5 gives us integer coordinates:

⟨−2, 5, −3⟩

• Now that we have a normal vector, we know our plane equation will look like

−2x + 5y − 3z = d

for some constant d. Plugging in any of our three points will let us find d. For example, the point (1, −1, 0) tells us −2 − 5 + 0 = d, so d = −7.

• All together, an equation of our plane is

−2x + 5y − 3z = −7
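As a quick check (not part of the original solution), the other two points also satisfy this equation: −2(2) + 5(0) − 3(1) = −7 and −2(5) + 5(0) − 3(−1) = −7.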

Solution 2 We know that the equation of the plane will have the form ax + by + cz = d.
The three points give us a system of linear equations, which we can solve using substitu-
tion.

• (1, ´1, 0) in the plane tells us a ´ b = d, so a = b + d.

• (2, 0, 1) in the plane tells us 2a + c = d, so 2(b + d) + c = d, so c = ´2b ´ d.

• (5, 0, −1) in the plane tells us 5a − c = d, so 5(b + d) − (−2b − d) = d, so b = −(5/7)d.

Now we can get a and c in terms of d, as well.

• Since c = −2b − d, and b = −(5/7)d, then c = (3/7)d.

• Since a = b + d and b = −(5/7)d, then a = (2/7)d.


• All together, the equation of our plane is

((2/7)d)x + (−(5/7)d)y + ((3/7)d)z = d

Any nonzero value of d will give an equation of our plane. To get integer coefficients, we let d = 7.

2x − 5y + 3z = 7

Notice this answer is the negative of the answer from Solution 1. They are equivalent expressions, as is (for example)

(1/7)x − (5/14)y + (3/14)z = 1/2

Example 1.3.6

1.4 Functions of Two Variables


First, a quick review of dependent and independent variables. Independent variables are
the variables we think of as changing somehow on their own; the dependent variables are
the variables whose change we think of as being caused by the independent variables.
For example, if you want to describe the relationship between the age of a cup of cottage
cheese, and the number of bacteria in that cup, we generally choose age (time) to be the
independent variable and population of bacteria to be the dependent variable: we think
of age changing on its own, then that age causing the bacterial population to change.
We could of course go the other way, and write time as a function of bacteria. This
could be useful if we were trying to figure out how old the cheese was by counting its
bacteria. So the difference between an independent variable and a dependent variable
has to do with how we want to interpret a function.
In a single-variable function, by convention we write

y = f (x)

where y is the dependent variable and x is the independent variable. Similarly, in a two-
variable function, we generally write

z = f ( x, y)

We think of the variables x and y as independent, and the variable z as dependent.


If we’re not too concerned with independent vs dependent variables; or if the rela-
tionship between the dependent and independent variables is difficult (or impossible) to
write explicitly in this form; then we can also define multivariable functions implicitly.
For example, in the equation

z³x + z²y + xyz − 1 = 0


we can think of z as an implicitly defined function of x and y. You’ve already seen two
families of implicitly defined functions: planes and spheres.
Example 1.4.1

Which points (1, y, 1) in R3 satisfy the equation

z³x + z²y + xyz − 1 = 0 ?

Solution. If x = z = 1, then the equation becomes

1+y+y´1 = 0

which has solution y = 0. So the only such point is (1, 0, 1).


Example 1.4.1

It’s common to see a multivariable equation like

f ( x, y) = sin( x + y)

or

g(x, y) = e^(x² + y²)

and think that the sine and exponential functions are different from the sine and exponential functions we've seen in two dimensions. They aren't! When x and y are real numbers, then (x + y) and (x² + y²) are real numbers as well. We're taking the sine of a real number
in the first equation, and e to a real power in the second equation, just as we always have.
Functions of two (or more) variables are not so different from functions of one variable
in other ways as well.

Definition1.4.2 (Domain and Range).

Let f ( x, y) be a function that takes pairs of real numbers as inputs, and gives a
real number as its output.
The set of points ( x, y) that can be input to f is the domain of that function. The
set of outputs of f over its entire domain is the range of that function.

Example 1.4.3 (Domain and Range)

Find the domain and range of the function

f(x, y) = √(e^(x² + y²) − 2)

Solution. There are three operations in our function: exponentiation, subtraction, and
taking of a square root. We can subtract anything from anything; and we can raise e to
any power. So the only thing that could “break” our function is if we tried to take the


square root of a negative number. This tells us that, in order for f(x, y) to be defined, we need

e^(x² + y²) − 2 ≥ 0
⟹ e^(x² + y²) ≥ 2
⟹ x² + y² ≥ ln 2

One way of describing the domain of this function is to call it "all points (x, y) with x² + y² ≥ ln 2." A more standard way is to describe the shape this set makes in R²: all points on or outside the circle centred at the origin with radius √(ln 2) ≈ 0.83.
[Figure: the domain, shaded: all points on or outside the circle of radius √(ln 2) centred at the origin.]
To help you visualize what we mean, take a point in the shaded area above. For example, (1, .5). If we plug that into our function, it causes no problems:

f(1, .5) = √(e^(1² + .5²) − 2) = √(e^1.25 − 2) ≈ √1.49 ≈ 1.22

On the other hand, take a point in the white area. For example, (.5, .5). If we try to plug this into our function, we end up with

f(.5, .5) = √(e^(.5² + .5²) − 2) = √(e^0.5 − 2) ≈ √(1.65 − 2) = √(−0.35)
which is not a real number.
[Figure: the domain again, with the point (1, .5) in the shaded region and the point (.5, .5) in the white region inside the circle of radius √(ln 2).]

Now, let's think about range. By choosing larger and larger values of x and y, we can make x² + y² into larger and larger numbers. So within our restricted domain, the range of x² + y² is [ln 2, ∞); so the range of e^(x² + y²) is [e^(ln 2), ∞) = [2, ∞); so the range of e^(x² + y²) − 2 is [0, ∞); so the range of f(x, y) is [0, ∞).
Again, note that the domain of f consists of ordered pairs of real numbers, while its
range consists of real numbers.
Example 1.4.3

Example 1.4.4

Find the domain and range of the function

f(x, y) = sin(x/√y)

Solution. Let’s start with domain. We can take the sine of any number we like, so that
part of the function doesn’t limit the domain. The things limiting the domain are that we
cannot take the square root of a negative number, and we can’t divide by zero.

• Because we can't take the square root of a negative number, we must have y ≥ 0.

• Because we can't divide by 0, we must have √y ≠ 0, i.e. y ≠ 0.

Combining these restrictions, we can only have values of y in the interval (0, ∞); x can be any real number. So, our domain is the upper half of the xy-plane, excluding the x-axis:

[Figure: the open upper half-plane y > 0, shaded.]
In general, the range of sin x is [´1, 1]. So, we certainly can’t get a larger range than
this. We should check that our range is no smaller. When y = 1, our function becomes
f ( x, 1) = sin( x/1) = sin x. Since x can be any real number, indeed the range of our
function is [´1, 1].


Example 1.4.4

Example 1.4.5

Find the domain and range of the function


f ( x, y) = ln(arctan( x + y))

Solution. First, let's think about the arctangent and logarithm function in the context of single-variable functions. The domain of arctangent is all real numbers, and its range is (−π/2, π/2). The domain of the natural logarithm is all positive numbers, and its range is all real numbers.

[Figure: the graphs z = arctan t and z = ln t.]

Since only positive numbers may be input into the natural logarithm, we require arctan( x +
y) ą 0. That requires ( x + y) ą 0. So, our domain is the collection of all points ( x, y) such
that x + y ą 0; put another way, all points above the line y = ´x.
[Figure: the domain: all points above the line y = −x, shaded.]

If our domain is points (x, y) such that x + y > 0, then the range of the function (x + y) is (0, ∞); so the numbers being plugged into the arctangent function are (0, ∞). So, the numbers coming out of the arctangent function are (0, π/2). Then the numbers from (0, π/2) are being input into the natural logarithm function, leading to a range of the entire function of (−∞, ln(π/2)).

[Figure: the graphs of z = arctan t and z = ln t again. If 0 < t, then 0 < arctan t < π/2; if 0 < t < π/2, then −∞ < ln t < ln(π/2).]

Example 1.4.5

We may sometimes restrict the domain of a function more than is mathematically nec-
essary in order for it to make sense in a model. For example, we may have a function
that only makes sense in our model when it gives positive values. In this case, we might
restrict the domain to a model domain, the set of inputs for which the function is not only
defined, but sensible in the context of our model.
Example 1.4.6

A large pharmaceutical company determines its research budget for a new vaccine according to the formula

R(x, y) = ln(xy)

where x is the size of the customer base they expect to have and y is the revenue they expect per dose.

For each variable x, y, and R, negative values don't make sense in the model. So although we could compute R(−1, −1) = ln 1 = 0, and we could compute R(0.5, 0.5) ≈ −1.39, they wouldn't be sensible in the context of our model.

• Since x and y need to be nonnegative, we will only consider points (x, y) in the first quadrant of the Cartesian plane: x ≥ 0 and y ≥ 0.

• Since R needs to be nonnegative, we will further restrict xy ≥ 1. That is, y ≥ 1/x.


The two restrictions above give us the model domain shaded below.

[Figure: the model domain, shaded: the region in the first quadrant on or above the curve y = 1/x.]

Depending on the specifics of how the function is being used, the model domain may
be restricted even further. For example, perhaps the firm has a maximum budget for any
given project; perhaps the amount they can charge is limited by law; etc.
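As a quick numerical illustration (with values not taken from the example), the point (2, 1) satisfies x ≥ 0, y ≥ 0 and xy = 2 ≥ 1, so it lies in the model domain and gives the sensible value R(2, 1) = ln 2 ≈ 0.69; the point (2, 0.25) has xy = 0.5 < 1, so R(2, 0.25) = ln 0.5 ≈ −0.69 is negative and falls outside the model domain.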
Example 1.4.6

1.5 Sketching Surfaces in 3d


In practice, students taking multivariable calculus regularly have great difficulty visualising surfaces in three dimensions, despite the fact that we all live in three dimensions.
We’ll now develop some technique to help us sketch surfaces in three dimensions10 .
We all have a fair bit of experience drawing curves in two dimensions. Typically the
intersection of a surface (in three dimensions) with a plane is a curve lying in the (two
dimensional) plane. Such an intersection is usually called a cross-section. In the special
case that the plane is one of the coordinate planes, or parallel to one of the coordinate
planes, the intersection is sometimes called a trace.

Definition1.5.1.

The trace of a surface is the intersection of that surface with a plane that is parallel
to one of the coordinate planes.

So, one trace (the intersection with a plane parallel to the xy-plane) is found by setting z equal to a constant; another trace (the intersection with a plane parallel to the yz-plane) is found by setting x equal to a constant; and the final trace (the intersection with a plane parallel to the xz-plane) is found by setting y equal to a constant.
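For instance (a surface not treated below, used only to illustrate the definition), the trace of the sphere x² + y² + z² = 4 in the plane z = 1 is the set of points with z = 1 and x² + y² = 3, a circle of radius √3.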

10 Of course you could instead use some fancy graphing software, but part of the point is to build intuition.
Not to mention that you can’t use fancy graphing software on your exam.


One can often get a pretty good idea of what a surface looks like by sketching a bunch
of cross-sections. Here are some examples.


Example 1.5.2 (4x² + y² − z² = 1)

Sketch the surface that satisfies 4x² + y² − z² = 1.

Solution. We'll start by fixing any number z₀ and sketching the part of the surface that lies in the horizontal plane z = z₀.

[Figure: the horizontal plane z = z₀.]
The intersection of our surface with that horizontal plane is a horizontal cross-section. Any point (x, y, z) lying on that horizontal cross-section satisfies both

z = z₀ and 4x² + y² − z² = 1
⟺ z = z₀ and 4x² + y² = 1 + z₀²

Think of z₀ as a constant. Then 4x² + y² = 1 + z₀² is a curve in the xy-plane. As 1 + z₀² is a constant, the curve is an ellipse. To determine its semi-axes11, we observe that when y = 0, we have x = ±(1/2)√(1 + z₀²) and when x = 0, we have y = ±√(1 + z₀²). So the curve is just an ellipse with x semi-axis (1/2)√(1 + z₀²) and y semi-axis √(1 + z₀²). It's easy to sketch.

[Figure: the ellipse 4x² + y² = 1 + z₀² in the xy-plane, with y-intercept (0, √(1 + z₀²)) and x-intercept ((1/2)√(1 + z₀²), 0).]

11 The semi-axes of an ellipse are the line segments from the centre of the ellipse to the farthest point on
the curve and to the nearest point on the curve. For a circle the lengths of both of these line segments
are just the radius.


Remember that this ellipse is the part of our surface that lies in the plane z = z0 . Imagine
that the sketch of the ellipse is on a single sheet of paper. Lift the sheet of paper up, move
it around so that the x- and y-axes point in the directions of the three dimensional x- and
y-axes and place the sheet of paper into the three dimensional sketch at height z0 . This
gives a single horizontal ellipse in 3d, as in the figure below.

[Figure: the single horizontal ellipse placed at height z₀ in the three dimensional sketch.]

We can build up the full surface by stacking many of these horizontal ellipses — one for
each possible height z0 . So we now draw a few of them as in the figure below. To reduce
the amount of clutter in the sketch, we have only drawn the first octant (i.e. the part of
three dimensions that has x ě 0, y ě 0 and z ě 0).

[Figure: the first-octant parts of the horizontal cross-sections at heights z = 1, z = 2 and z = 3.]

Here is why it is OK, in this case, to just sketch the first octant. Replacing x by −x in the equation 4x² + y² − z² = 1 does not change the equation. That means that a point (x, y, z) is on the surface if and only if the point (−x, y, z) is on the surface. So the surface is invariant under reflection in the yz-plane. Similarly, the equation 4x² + y² − z² = 1 does not change when y is replaced by −y or z is replaced by −z. Our surface is also invariant under reflection in the xz- and xy-planes. Once we have the part in the first octant, the remaining octants can be gotten simply by reflecting about the coordinate planes.
We can get a more visually meaningful sketch by adding in some vertical cross-sections.
The x = 0 and y = 0 cross-sections (also called traces — they are the parts of our surface
that are in the yz- and xz-planes, respectively) are

x = 0, y² − z² = 1    and    y = 0, 4x² − z² = 1


These equations describe hyperbolae12. If you don't remember how to sketch them, don't worry. We'll do it now. We'll first sketch them in 2d. Since

y² = 1 + z² ⟹ |y| ≥ 1 and y = ±1 when z = 0 and for large z, y ≈ ±z
4x² = 1 + z² ⟹ |x| ≥ 1/2 and x = ±1/2 when z = 0 and for large z, x ≈ ±z/2

the sketches are

[Figure: the hyperbola y² − z² = 1 in the yz-plane and the hyperbola 4x² − z² = 1 in the xz-plane, each with its asymptotes.]

Now we’ll incorporate them into the 3d sketch. Once again imagine that each is a single
sheet of paper. Pick each up and move it into the 3d sketch, carefully matching up the
axes. The red (blue) parts of the hyperbolas above become the red (blue) parts of the 3d
sketch below (assuming of course that you are looking at this on a colour screen).

[Figure: the 3d sketch combining the horizontal ellipses at z = 1, 2, 3 with the vertical hyperbolic cross-sections in the xz- and yz-planes.]

Now that we have a pretty good idea of what the surface looks like we can clean up and
simplify the sketch. Here are a couple of possibilities.

12 It’s not just a figure of speech!


This type of surface is called a hyperboloid of one sheet.


There are also hyperboloids of two sheets. For example, replacing the +1 on the right hand side of 4x² + y² − z² = 1 with −1 gives 4x² + y² − z² = −1, which is a hyperboloid of two sheets. We'll sketch it quickly in the next example.
Example 1.5.2


Example 1.5.3 (4x² + y² − z² = −1)

Sketch the surface that satisfies 4x² + y² − z² = −1.

Solution. As in the last example, we’ll start by fixing any number z0 and sketching the
part of the surface that lies in the horizontal plane z = z0 . The intersection of our surface
with that horizontal plane is

z = z₀ and 4x² + y² = z₀² − 1

Think of z0 as a constant.

• If |z₀| < 1, then z₀² − 1 < 0 and there are no solutions to 4x² + y² = z₀² − 1.

• If |z0 | = 1 there is exactly one solution, namely x = y = 0.


• If |z₀| > 1 then 4x² + y² = z₀² − 1 is an ellipse with x semi-axis (1/2)√(z₀² − 1) and y semi-axis √(z₀² − 1). These semi-axes are small when |z₀| is close to 1 and grow as |z₀| increases.

The first octant parts of a few of these horizontal cross-sections are drawn in the figure
below.


[Figure: the first-octant parts of the horizontal cross-sections at z = 1.02, z = 2 and z = 3.]

Next we add in the x = 0 and y = 0 cross-sections (i.e. the parts of our surface that are in
the yz- and xz-planes, respectively)

x = 0, z² = 1 + y²    and    y = 0, z² = 1 + 4x²

[Figure: the sketch with the vertical cross-sections in the xz- and yz-planes added to the horizontal ellipses.]

Now that we have a pretty good idea of what the surface looks like we clean up and
simplify the sketch.


This type of surface is called a hyperboloid of two sheets.


Example 1.5.3

Example 1.5.4 (yz = 1)

Sketch the surface yz = 1.

Solution. This surface has a special property that makes it relatively easy to sketch. There
are no x’s in the equation yz = 1. That means that if some y0 and z0 obey y0 z0 = 1, then
the point ( x, y0 , z0 ) lies on the surface yz = 1 for all values of x. As x runs from ´8 to 8,
the point ( x, y0 , z0 ) sweeps out a straight line parallel to the x-axis. So the surface yz = 1
is a union of lines parallel to the x-axis. It is invariant under translations parallel to the
x-axis. To sketch yz = 1, we just need to sketch its intersection with the yz-plane and then
translate the resulting curve parallel to the x-axis to sweep out the surface.
We’ll start with a sketch of the hyperbola yz = 1 in two dimensions.

[Figure: the hyperbola yz = 1 sketched in two dimensions.]

Next we’ll move this 2d sketch into the yz-plane, i.e. the plane x = 0, in 3d, except that
we’ll only draw in the part in the first octant.

Then we'll draw in x = x₀ cross-sections for a couple more values of x₀


and clean up the sketch a bit

Example 1.5.4

Example 1.5.5 (xyz = 4)

Sketch the surface xyz = 4.

Solution. We’ll sketch this surface using much the same procedure as we used in Examples
1.5.2 and 1.5.3. We’ll only sketch the part of the surface in the first octant. The remaining
parts (in the octants with x, y ă 0, z ě 0, with x, z ă 0, y ě 0 and with y, z ă 0, x ě 0) are
just reflections of the first octant part.
As usual, we start by fixing any number z0 and sketching the part of the surface that
lies in the horizontal plane z = z0 . The intersection of our surface with that horizontal
plane is the hyperbola
z = z₀ and xy = 4/z₀
Note that x → ∞ as y → 0 and that y → ∞ as x → 0. So the hyperbola has both the x-axis and the y-axis as asymptotes, when drawn in the xy-plane. The first octant parts of a few of these horizontal cross-sections (namely, z₀ = 4, z₀ = 2 and z₀ = 1/2) are drawn in the
figure below.


[Figure: the first octant parts of the horizontal cross-sections at z = 4, 2 and 1/2.]

Next we add some vertical cross-sections. We can't use x = 0 or y = 0 because any point on xyz = 4 must have all of x, y, z nonzero. So we use

x = 4, yz = 1 and y = 4, xz = 1

instead. They are again hyperbolae.

[Figure: the vertical cross-sections x = 4 and y = 4 added to the sketch.]

Finally, we clean up and simplify the sketch.


Example 1.5.5

Often the reason you are interested in a surface in 3d is that it is the graph z = f ( x, y)
of a function of two variables f ( x, y). Another good way to visualize the behaviour of a
function f ( x, y) is to sketch what are called its level curves.

Definition1.5.6.

A level curve of f(x, y) is a curve whose equation is f(x, y) = C, for some constant C.

A level curve is the set of points in the xy-plane where f takes the value C. Because
it is a curve in 2d, it is usually easier to sketch than the graph of f . Here are a couple of
examples.


Example 1.5.7 f ( x, y) = x2 + 4y2 ´ 2x + 2

Sketch the level curves of f ( x, y) = x2 + 4y2 ´ 2x + 2.

Solution. Fix any real number C. Then, for the specified function f , the level curve
f ( x, y) = C is the set of points ( x, y) that obey

x2 + 4y2 ´ 2x + 2 = C ðñ x2 ´ 2x + 1 + 4y2 + 1 = C
ðñ ( x ´ 1)2 + 4y2 = C ´ 1

Now (x − 1)² + 4y² is the sum of two squares, and so is always at least zero. So if C − 1 < 0, i.e. if C < 1, there is no curve f(x, y) = C. If C − 1 = 0, i.e. if C = 1, then f(x, y) = C if and only if both (x − 1)² = 0 and 4y² = 0, and so the level curve consists of the single point (1, 0). If C > 1, then f(x, y) = C becomes (x − 1)² + 4y² = C − 1 > 0, which describes an ellipse centred on (1, 0). It intersects the x-axis when y = 0 and
(x − 1)² = C − 1  ⟺  x − 1 = ±√(C − 1)  ⟺  x = 1 ± √(C − 1)


and it intersects the line x = 1 (i.e. the vertical line through the centre) when
4y² = C − 1  ⟺  2y = ±√(C − 1)  ⟺  y = ±½√(C − 1)
So, when C > 1, f(x, y) = C is the ellipse centred on (1, 0) with x semi-axis √(C − 1) and y semi-axis ½√(C − 1). Here is a sketch of some representative level curves of f(x, y) = x² + 4y² − 2x + 2.

[Figure: the level curves f = 1, 2, 5, 10, 17 of f(x, y) = x² + 4y² − 2x + 2; for C > 1 they are ellipses centred on (1, 0).]
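Level curves like these are easy to reproduce with a contour plot. Here is a minimal sketch, assuming Python with numpy and matplotlib is available; the grid ranges are our own choices.

# Level curves f = 1, 2, 5, 10, 17 of f(x, y) = x^2 + 4y^2 - 2x + 2
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-4, 6, 400)
y = np.linspace(-3, 3, 400)
X, Y = np.meshgrid(x, y)
F = X ** 2 + 4 * Y ** 2 - 2 * X + 2
cs = plt.contour(X, Y, F, levels=[1, 2, 5, 10, 17])
plt.clabel(cs)                  # label each curve with its value of C
plt.gca().set_aspect("equal")   # so the ellipses are not distorted
plt.xlabel("x"); plt.ylabel("y")
plt.show()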

It is often easier to develop an understanding of the behaviour of a function f ( x, y) by


looking at a sketch of its level curves, than it is by looking at a sketch of its graph. On
the other hand, you can also use a sketch of the level curves of f ( x, y) as the first step in
building a sketch of the graph z = f ( x, y). The next step would be to redraw, for each C,
the level curve f ( x, y) = C, in the plane z = C, as we did in Example 1.5.2.
Example 1.5.7

Example 1.5.8 (e x+y+z = 1)

The function f ( x, y) is given implicitly by the equation e x+y+z = 1. Sketch the level curves
of f .
Solution. This one is not as nasty as it appears. That “ f ( x, y) is given implicitly by the
equation e x+y+z = 1” means that, for each x, y, the solution z of e x+y+z = 1 is f ( x, y). So,
for the specified function f and any fixed real number C, the level curve f ( x, y) = C is the
set of points ( x, y) that obey

e x+y+C = 1 ðñ x + y + C = 0 (by taking the ln of both sides)


ðñ x + y = ´C

This is of course a straight line. It intersects the x-axis when y = 0 and x = ´C and it
intersects the y-axis when x = 0 and y = ´C. Here is a sketch of some level curves.


[Figure: the level curves f = −3, −2, −1, 0, 1, 2, 3; they are the parallel straight lines x + y = −C.]

Example 1.5.8

We have just seen that sketching the level curves of a function f ( x, y) can help us
understand the behaviour of f . We can generalise this to functions F ( x, y, z) of three vari-
ables. A level surface of F ( x, y, z) is a surface whose equation is of the form F ( x, y, z) = C
for some constant C. It is the set of points ( x, y, z) at which F takes the value C.


Example 1.5.9 F ( x, y, z) = x2 + y2 + z2

Let F(x, y, z) = x² + y² + z². If C > 0, then the level surface F(x, y, z) = C is the sphere of radius √C centred on the origin. Here is a sketch of the parts of the level surfaces F = 1 (radius 1), F = 4 (radius 2) and F = 9 (radius 3) that are in the first octant.

[Figure: the first octant parts of the spheres F = 1, F = 4 and F = 9.]


Example 1.5.9


Example 1.5.10 F ( x, y, z) = x2 + z2

Let F(x, y, z) = x² + z² and C > 0. Consider the level surface x² + z² = C. The variable y does not appear in this equation. So for any fixed y₀, the intersection of our surface x² + z² = C with the plane y = y₀ is the circle of radius √C centred on x = z = 0. Here is a sketch of the first quadrant part of one such circle.

[Figure: the first quadrant part of the circle x² + z² = C in the plane y = y₀.]

The full surface is the horizontal stack of all of those circles with y₀ running over ℝ. It is the cylinder of radius √C centred on the y-axis. Here is a sketch of the parts of the level surfaces F = 1 (radius 1), F = 4 (radius 2) and F = 9 (radius 3) that are in the first octant.

[Figure: the first octant parts of the cylinders F = 1, F = 4 and F = 9, each centred on the y-axis.]

Example 1.5.10


Example 1.5.11 ( F ( x, y, z) = e x+y+z )

Let F ( x, y, z) = e x+y+z and C ą 0. Consider the level surface e x+y+z = C, or equivalently,


x + y + z = ln C. It is the plane that contains the intercepts (ln C, 0, 0), (0, ln C, 0) and
(0, 0, ln C ). Here is a sketch of the parts of the level surfaces

• F = e (intercepts (1, 0, 0), (0, 1, 0), (0, 0, 1)),


• F = e2 (intercepts (2, 0, 0), (0, 2, 0), (0, 0, 2)) and
• F = e3 (intercepts (3, 0, 0), (0, 3, 0), (0, 0, 3))

that are in the first octant.

[Figure: the first octant parts of the planes F = e, F = e² and F = e³.]

Example 1.5.11

There are some classes of relatively simple, but commonly occurring, surfaces that are
given their own names. One such class is cylindrical surfaces. You are probably used to
thinking of a cylinder as being something that looks like x2 + y2 = 1.

[Figure: the cylinder x² + y² = 1.]

In Mathematics the word “cylinder” is given a more general meaning.


Definition1.5.12 (Cylinder).

A cylinder is a surface that consists of all points that are on all lines that are

• parallel to a given line and


• pass through a given fixed plane curve (in a plane not parallel to the given
line).

Example 1.5.13

Here are sketches of three cylinders. The familiar cylinder on the left below

[Figure: the cylinders x² + y² = 1 (left) and x² + (y − z)² = 1 (right).]

is called a right circular cylinder, because the given fixed plane curve (x2 + y2 = 1, z = 0)
is a circle and the given line (the z-axis) is perpendicular (i.e. at right angles) to the fixed
plane curve.
The cylinder on the left above can be thought of as a vertical stack of circles. The
cylinder on the right above can also be thought of as a stack of circles, but the centre of the
circle at height z has been shifted rightward to (0, z, z). For that cylinder, the given fixed
plane curve is once again the circle x2 + y2 = 1, z = 0, but the given line is y = z, x = 0.
We have already seen the third cylinder

[Figure: the part of the cylinder yz = 1 with x, y, z > 0.]


in Example 1.5.4. It is called a hyperbolic cylinder. In this example, the given fixed plane
curve is the hyperbola yz = 1, x = 0 and the given line is the x-axis.
Example 1.5.13

1.5.1 §§ Quadric Surfaces


Another named class of relatively simple, but commonly occurring, surfaces is the quadric
surfaces.
Definition1.5.14 (Quadrics).

A quadric surface is a surface that consists of all points that obey Q(x, y, z) = 0, with Q being a polynomial of degree two¹³.

For Q( x, y, z) to be a polynomial of degree two, it must be of the form


Q( x, y, z) = Ax2 + By2 + Cz2 + Dxy + Eyz + Fxz + Gx + Hy + Iz + J
for some constants A, B, ¨ ¨ ¨ , J. Each constant z cross section of a quadric surface has an
equation of the form
Ax2 + Dxy + By2 + gx + hy + j = 0, z = z0
If A = B = D = 0 but g and h are not both zero, this is a straight line. If A, B, and D
are not all zero, then by rotating and translating our coordinate system the equation of the
cross section can be brought into one of the forms14
• αx2 + βy2 = γ with α, β ą 0, which, if γ ą 0, is an ellipse (or a circle),
• αx2 ´ βy2 = γ with α, β ą 0, which, if γ ‰ 0, is a hyperbola, and if γ = 0 is two lines,
• x2 = δy, which, if δ ‰ 0 is a parabola, and if δ = 0 is a straight line.
There are similar statements for the constant x cross sections and the constant y cross sections. Hence quadric surfaces are built by stacking these three types of curves.
We have already seen a number of quadric surfaces in the last couple of sections.
• We saw the quadric surface 4x2 + y2 ´ z2 = 1 in Example 1.5.2.

13 Technically, we should also require that the polynomial can’t be factored into the product of two poly-
nomials of degree one.
14 This statement can be justified using a linear algebra eigenvalue/eigenvector analysis. It is beyond what we can cover here, but is not too difficult for a standard linear algebra course.


Its constant z cross sections are ellipses and its x = 0 and y = 0 cross sections are
hyperbolae. It is called a hyperboloid of one sheet.
• We saw the quadric surface x2 + y2 = 1 in Example 1.5.13.

Its constant z cross sections are circles and its x = 0 and y = 0 cross sections are
straight lines. It is called a right circular cylinder.
• the quadric surface x2 + (y ´ z)2 = 1 in Example 1.5.13, and
• We saw the quadric surface yz = 1 in Example 1.5.4.
Appendix A.2 contains other quadric surfaces.
Example 1.5.15 (Indifference curves)

Suppose a function U ( x, y) gives the happiness15 (or utility) a consumer gains when they
purchase x units of Good X and y units of Good Y. The level curves of the surface
z = U ( x, y) are called indifference curves, because every point along that curve results
in the same benefit to the consumer.
Suppose U(x, y) = x√y. Then purchasing 2 units of Good X and one unit of Good Y produces the same benefit as purchasing 1 unit of Good X and 4 units of Good Y, because both these combinations are on the level curve U(x, y) = 2.
[Figure: the indifference curve x√y = 2, which passes through (1, 4) and (2, 1).]
Let's make a small contour map of our surface U(x, y) = x√y, plotting several indifference curves. (Note x√y = c is equivalent to y = c²/x² in our model domain.)

15 An amusing thought experiment is to propose units for measuring happiness. ”The one-point increase
in GDP was associated with an average increase of 3.7 wrinkly puppy faces of happiness nation-wide.”


[Figure: the indifference curves U = 1, 2, 3, 4, 5 of U(x, y) = x√y.]

Not surprisingly, if we move roughly in the direction of the vector ⟨1, 1⟩ (that is, increasing both x and y), our happiness U(x, y) goes up.
Note that none of the indifference curves touch either of the x or y axes. It is clear
enough from the formula that U (0, y) = U ( x, 0) = 0. This is a common feature of utility
functions: that to maximize utility, a consumer will have at least a little of both products,
rather than consuming only one type.
Example 1.5.15
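As a quick sanity check of the arithmetic above, one can evaluate the utility function at a few bundles. This is only an illustrative sketch in Python; the helper name utility is ours.

import math

def utility(x, y):
    # U(x, y) = x * sqrt(y)
    return x * math.sqrt(y)

print(utility(2, 1))   # 2.0
print(utility(1, 4))   # 2.0  (same indifference curve)
# Points on the indifference curve U = 2 satisfy y = (2/x)^2:
for x in (0.5, 1, 2, 4):
    print(x, (2 / x) ** 2, utility(x, (2 / x) ** 2))   # utility is always 2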

Chapter 1 (excluding Section 1.4) was adapted from Chapter 1 of CLP–3 Multivariable Calculus by Feldman, Rechnitzer, and Yeager under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license.

Chapter 2

PARTIAL DERIVATIVES

In this chapter we are going to generalize the definition of “derivative” to functions of


more than one variable, and then we are going to use those derivatives. We can speed
things up considerably by recycling what we have already learned in the single-variable
case.

2.1 Partial Derivatives


First, recall how we defined the derivative, f′(a), of a function of one variable, f(x). We imagined that we were walking along the x-axis, in the positive direction, measuring, for example, the temperature along the way. We denoted by f(x) the temperature at x. The instantaneous rate of change of temperature that we observed as we passed through x = a was

df/dx (a) = lim_{h→0} [f(a + h) − f(a)]/h = lim_{x→a} [f(x) − f(a)]/(x − a)
Next suppose that we are walking in the xy-plane and that the temperature at ( x, y) is
f ( x, y). We can pass through the point ( x, y) = ( a, b) moving in many different directions,
and we cannot expect the measured rate of change of temperature if we walk parallel to
the x-axis, in the direction of increasing x, to be the same as the measured rate of change
of temperature if we walk parallel to the y-axis in the direction of increasing y. We’ll start
by considering just those two directions; we'll come back to other directions (like walking parallel to the line y = x) later.
Suppose that we are passing through the point ( x, y) = ( a, b) and that we are walking
parallel to the x-axis (in the positive direction). Then our y-coordinate will be constant, al-
ways taking the value y = b. So we can think of the measured temperature as the function
of one variable B( x ) = f ( x, b) and we will observe the rate of change of temperature

dB/dx (a) = lim_{h→0} [B(a + h) − B(a)]/h = lim_{h→0} [f(a + h, b) − f(a, b)]/h
This is called the “partial derivative of f with respect to x at (a, b)” and is denoted (∂f/∂x)_y (a, b).
Here


◦ the symbol ∂, which is read “partial”, indicates that we are dealing with a function of more than one variable, and
◦ the subscript y on (∂f/∂x)_y indicates that y is being held fixed, i.e. being treated as a constant, and
◦ the x in ∂f/∂x indicates that we are differentiating with respect to x.
◦ ∂f/∂x is read “partial dee f dee x”.

Do not write d/dx when ∂/∂x is appropriate. (There exist situations when (d/dx) f and (∂/∂x) f are both defined and have different meanings.)
If, instead, we are passing through the point ( x, y) = ( a, b) and are walking parallel to
the y-axis (in the positive direction), then our x-coordinate will be constant, always taking
the value x = a. So we can think of the measured temperature as the function of one
variable A(y) = f ( a, y) and we will observe the rate of change of temperature

dA/dy (b) = lim_{h→0} [A(b + h) − A(b)]/h = lim_{h→0} [f(a, b + h) − f(a, b)]/h

This is called the “partial derivative of f with respect to y at (a, b)” and is denoted (∂f/∂y)_x (a, b).
Just as was the case for the ordinary derivative df/dx(x), it is common to treat the partial derivatives of f(x, y) as functions of (x, y) simply by evaluating the partial derivatives at (x, y) rather than at (a, b).

Definition2.1.1 (Partial Derivatives).

The x- and y-partial derivatives of the function f(x, y) are

(∂f/∂x)_y (x, y) = lim_{h→0} [f(x + h, y) − f(x, y)]/h
(∂f/∂y)_x (x, y) = lim_{h→0} [f(x, y + h) − f(x, y)]/h

respectively. The partial derivatives of functions of more than two variables are defined analogously.

Partial derivatives are used a lot. And there are many notations for them.


Notation2.1.2.
 
The partial derivative (∂f/∂x)_y of a function f(x, y) is also denoted

∂f/∂x    f_x    D_x f    D_1 f

The subscript 1 on D_1 f indicates that f is being differentiated with respect to its first variable. The partial derivative (∂f/∂x)_y (a, b) is also denoted

∂f/∂x |_(a,b)

with the subscript (a, b) indicating that ∂f/∂x is being evaluated at (x, y) = (a, b).
The abbreviated notation ∂f/∂x for (∂f/∂x)_y is extremely commonly used. But it is dangerous to do so when it is not clear from the context that it is the variable y that is being held fixed.

Remark 2.1.3 (The Geometric Interpretation of Partial Derivatives). We’ll now develop
a geometric interpretation of the partial derivative

 
(∂f/∂x)_y (a, b) = lim_{h→0} [f(a + h, b) − f(a, b)]/h

in terms of the shape of the graph z = f(x, y) of the function f(x, y). That graph appears in the figure below. It looks like the part of a deformed sphere that is in the first octant.
The definition of (∂f/∂x)_y (a, b) concerns only points on the graph that have y = b. In other words, the curve of intersection of the surface z = f(x, y) with the plane y = b. That is the red curve in the figure. The two blue vertical line segments in the figure have heights f(a, b) and f(a + h, b), which are the two numbers in the numerator of [f(a + h, b) − f(a, b)]/h.


[Figure: the graph z = f(x, y), its curve of intersection with the plane y = b, and the heights f(a, b) and f(a + h, b) above the points (a, b, 0) and (a + h, b, 0), which are a distance h apart.]

A side view of the curve (looking from the left side of the y-axis) is sketched in the figure below. Again, the two blue vertical line segments in the figure have heights f(a, b) and f(a + h, b), which are the two numbers in the numerator of [f(a + h, b) − f(a, b)]/h.

[Figure: the side view of the curve z = f(x, b), y = b, showing the heights f(a, b) and f(a + h, b) above the points (a, b, 0) and (a + h, b, 0).]

So the numerator f(a + h, b) − f(a, b) and denominator h are the rise and run, respectively, of the curve z = f(x, b) from x = a to x = a + h. Thus (∂f/∂x)_y (a, b) is exactly the slope of (the tangent to) the curve of intersection of the surface z = f(x, y) and the plane y = b at the point (a, b, f(a, b)). In the same way (∂f/∂y)_x (a, b) is exactly the slope of (the tangent to) the curve of intersection of the surface z = f(x, y) and the plane x = a at the point (a, b, f(a, b)).


§§§ Evaluation of Partial Derivatives

From the above discussion, we see that we can readily compute partial derivatives ∂/∂x by using what we already know about ordinary derivatives d/dx. More precisely,
• To evaluate ∂f/∂x(x, y), treat the y in f(x, y) as a constant and differentiate the resulting function of x with respect to x.
• To evaluate ∂f/∂y(x, y), treat the x in f(x, y) as a constant and differentiate the resulting function of y with respect to y.
• To evaluate ∂f/∂x(a, b), treat the y in f(x, y) as a constant and differentiate the resulting function of x with respect to x. Then evaluate the result at x = a, y = b.
• To evaluate ∂f/∂y(a, b), treat the x in f(x, y) as a constant and differentiate the resulting function of y with respect to y. Then evaluate the result at x = a, y = b.

Now for some examples.


Example 2.1.4

Let
f ( x, y) = x3 + y2 + 4xy2
Then, since ∂/∂x treats y as a constant,

∂f/∂x = (∂f/∂x)_y = ∂/∂x(x³) + ∂/∂x(y²) + ∂/∂x(4xy²)
      = 3x² + 0 + 4y² ∂/∂x(x)
      = 3x² + 4y²

and, since ∂/∂y treats x as a constant,

∂f/∂y = (∂f/∂y)_x = ∂/∂y(x³) + ∂/∂y(y²) + ∂/∂y(4xy²)
      = 0 + 2y + 4x ∂/∂y(y²)
      = 2y + 8xy

In particular, at (x, y) = (1, 0) these partial derivatives take the values
∂f/∂x(1, 0) = 3(1)² + 4(0)² = 3
∂f/∂y(1, 0) = 2(0) + 8(1)(0) = 0

Example 2.1.4
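Computations like this are easy to check with a computer algebra system. Here is a minimal sketch, assuming Python with the sympy library is available.

# Partial derivatives of f(x, y) = x^3 + y^2 + 4xy^2, evaluated at (1, 0)
import sympy as sp

x, y = sp.symbols("x y")
f = x ** 3 + y ** 2 + 4 * x * y ** 2

fx = sp.diff(f, x)   # treat y as a constant
fy = sp.diff(f, y)   # treat x as a constant
print(fx)                      # 3*x**2 + 4*y**2
print(fy)                      # 8*x*y + 2*y
print(fx.subs({x: 1, y: 0}))   # 3
print(fy.subs({x: 1, y: 0}))   # 0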


Example 2.1.5

Let
f ( x, y) = y cos x + xe xy
Then, since ∂/∂x treats y as a constant, ∂/∂x e^{yx} = y e^{yx} and

∂f/∂x(x, y) = −y sin x + e^{xy} + xy e^{xy}
∂f/∂y(x, y) = cos x + x² e^{xy}

Example 2.1.5
Let’s move up to a function of four variables. Things generalize in a quite straight forward
way.
Example 2.1.6

Let
f ( x, y, z, t) = x sin(y + 2z) + t2 e3y ln z
Then
∂f/∂x(x, y, z, t) = sin(y + 2z)
∂f/∂y(x, y, z, t) = x cos(y + 2z) + 3t² e^{3y} ln z
∂f/∂z(x, y, z, t) = 2x cos(y + 2z) + t² e^{3y}/z
∂f/∂t(x, y, z, t) = 2t e^{3y} ln z

Example 2.1.6
Now here is a more complicated example — our function takes a special value at (0, 0).
To compute derivatives there we have to revert to the definition.
Example 2.1.7

Set
f(x, y) = (cos x − cos y)/(x − y) if x ≠ y,   and   f(x, y) = 0 if x = y
If b ≠ a, then for all (x, y) sufficiently close to (a, b), f(x, y) = (cos x − cos y)/(x − y) and we can compute the partial derivatives of f at (a, b) using the familiar rules of differentiation. However that is not the case for (a, b) = (0, 0). To evaluate f_x(0, 0), we need to set y = 0 and find the derivative of
f(x, 0) = (cos x − 1)/x if x ≠ 0,   and   f(x, 0) = 0 if x = 0


with respect to x at x = 0. To do so, we basically have to apply the definition

f_x(0, 0) = lim_{h→0} [f(h, 0) − f(0, 0)]/h
          = lim_{h→0} [(cos h − 1)/h − 0]/h      (Recall that h ≠ 0 in the limit.)
          = lim_{h→0} (cos h − 1)/h²
          = lim_{h→0} (−sin h)/(2h)              (By l'Hôpital's rule.)
          = lim_{h→0} (−cos h)/2                 (By l'Hôpital again.)
          = −1/2

Example 2.1.7

Example 2.1.8

Again set
f(x, y) = (cos x − cos y)/(x − y) if x ≠ y,   and   f(x, y) = 0 if x = y
We’ll now compute f y ( x, y) for all ( x, y).
The case y ≠ x: When y ≠ x,

f_y(x, y) = ∂/∂y [(cos x − cos y)/(x − y)]
          = [(x − y) ∂/∂y(cos x − cos y) − (cos x − cos y) ∂/∂y(x − y)] / (x − y)²     (by the quotient rule)
          = [(x − y) sin y + cos x − cos y] / (x − y)²

The case y = x: When y = x,

f_y(x, y) = lim_{h→0} [f(x, y + h) − f(x, y)]/h = lim_{h→0} [f(x, x + h) − f(x, x)]/h
          = lim_{h→0} { [cos x − cos(x + h)]/[x − (x + h)] − 0 } / h     (Recall that h ≠ 0 in the limit.)
          = lim_{h→0} [cos(x + h) − cos x]/h²

Now we apply L’Hôpital’s rule twice, remembering that, in this limit, x is a constant and


h is the variable — so we differentiate with respect to h.

f_y(x, y) = lim_{h→0} −sin(x + h)/(2h)
          = lim_{h→0} −cos(x + h)/2
          = −(cos x)/2

The conclusion:
f_y(x, y) = [(x − y) sin y + cos x − cos y]/(x − y)² if x ≠ y,   and   f_y(x, y) = −(cos x)/2 if x = y

Example 2.1.8
Our next example uses implicit differentiation.
Example 2.1.9

The equation
z5 + y2 ez + e2x = 0
implicitly determines z as a function of x and y. For example, when x = y = 0, the
equation reduces to
z5 = ´1
which forces¹ z(0, 0) = −1. Let's find the partial derivative ∂z/∂x(0, 0).
We are not going to be able to explicitly solve the equation for z(x, y). All we know is that
z(x, y)⁵ + y² e^{z(x,y)} + e^{2x} = 0
for all x and y. We can turn this into an equation for ∂z/∂x(0, 0) by differentiating² the whole equation with respect to x, giving
5z(x, y)⁴ ∂z/∂x(x, y) + y² e^{z(x,y)} ∂z/∂x(x, y) + 2e^{2x} = 0
and then setting x = y = 0, giving
5z(0, 0)⁴ ∂z/∂x(0, 0) + 2 = 0
As we already know that z(0, 0) = −1,
∂z/∂x(0, 0) = −2/(5z(0, 0)⁴) = −2/5

1 The only real number z which obeys z5 = ´1 is z = ´1. However there are four other complex numbers
which also obey z5 = ´1.
2 You should have already seen this technique, called implicit differentiation, in your first Calculus
course.


Example 2.1.9
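The computation above can be checked symbolically. The sketch below uses the fact, implicit in the computation we just did, that ∂z/∂x = −F_x/F_z whenever F(x, y, z(x, y)) = 0 and F_z ≠ 0; it assumes Python with sympy is available.

# Implicit differentiation check for F(x, y, z) = z^5 + y^2 e^z + e^(2x) = 0
import sympy as sp

x, y, z = sp.symbols("x y z")
F = z ** 5 + y ** 2 * sp.exp(z) + sp.exp(2 * x)

dzdx = -sp.diff(F, x) / sp.diff(F, z)
print(sp.simplify(dzdx.subs({x: 0, y: 0, z: -1})))   # -2/5, as found above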
Next we have a partial derivative disguised as a limit.
Example 2.1.10

In this example we are going to evaluate the limit

lim_{z→0} [(x + y + z)³ − (x + y)³] / [(x + y)z]

The critical observation is that, in taking the limit z → 0, x and y are fixed. They do not change as z is getting smaller and smaller. Furthermore this limit is exactly of the form of the limits in the Definition 2.1.1 of partial derivative, disguised by some obfuscating changes of notation.
Set
f(x, y, z) = (x + y + z)³/(x + y)
Then

lim_{z→0} [(x + y + z)³ − (x + y)³]/[(x + y)z] = lim_{z→0} [f(x, y, z) − f(x, y, 0)]/z = lim_{h→0} [f(x, y, 0 + h) − f(x, y, 0)]/h
    = ∂f/∂z(x, y, 0)
    = ∂/∂z [(x + y + z)³/(x + y)] |_{z=0}

Recalling that ∂/∂z treats x and y as constants, we are evaluating the derivative of a function of the form (const + z)³/const. So

lim_{z→0} [(x + y + z)³ − (x + y)³]/[(x + y)z] = 3(x + y + z)²/(x + y) |_{z=0}
    = 3(x + y)

Example 2.1.10

2.2 Higher Order Derivatives


You have already observed, in your first Calculus course, that if f(x) is a function of x, then its derivative, df/dx(x), is also a function of x, and can be differentiated to give the second order derivative d²f/dx²(x), which can in turn be differentiated yet again to give the third order derivative, f⁽³⁾(x), and so on.
We can do the same for functions of more than one variable. If f(x, y) is a function of x and y, then both of its partial derivatives, ∂f/∂x(x, y) and ∂f/∂y(x, y), are also functions of x and


y. They can both be differentiated with respect to x and they can both be differentiated
with respect to y. So there are four possible second order derivatives. Here they are,
together with various alternate notations.
 
∂/∂x (∂f/∂x)(x, y) = ∂²f/∂x²(x, y) = f_xx(x, y)
∂/∂y (∂f/∂x)(x, y) = ∂²f/∂y∂x(x, y) = f_xy(x, y)
∂/∂x (∂f/∂y)(x, y) = ∂²f/∂x∂y(x, y) = f_yx(x, y)
∂/∂y (∂f/∂y)(x, y) = ∂²f/∂y²(x, y) = f_yy(x, y)

Warning2.2.1.

In ∂²f/∂y∂x = ∂/∂y ∂/∂x f, the derivative closest to f, in this case ∂/∂x, is applied first. So we work through the variables in the bottom right-to-left.
In f_xy, the derivative with respect to the variable closest to f, in this case x, is applied first. So we work through the subscript variables left-to-right.

The difference in “direction” highlighted in the warning seems confusing at first, but it stems from the way the first partial derivative is written. In the fractional notation, if f is being differentiated with respect to x, we write ∂f/∂x or (∂/∂x) f. So the operator ∂/∂x is added to the left of the function. Now suppose we want to differentiate ∂f/∂x with respect to y. By analogy, we would write ∂/∂y [∂f/∂x], or ∂²f/∂y∂x. This leads to the order of variables being right-to-left.
With the subscript notation, if f is being differentiated with respect to x, we write f_x, with the variable on the right of the function. So now if we take the second derivative with respect to y, it makes sense by analogy to add that new variable to the right: (f_x)_y, or f_xy, in left-to-right order.
Example 2.2.2

Let f(x, y) = e^{my} cos(nx). Then

f_x = −n e^{my} sin(nx)        f_y = m e^{my} cos(nx)
f_xx = −n² e^{my} cos(nx)      f_yx = −mn e^{my} sin(nx)
f_xy = −mn e^{my} sin(nx)      f_yy = m² e^{my} cos(nx)

Example 2.2.2

Example 2.2.3


Let f(x, y) = e^{αx+βy}. Then

f_x = α e^{αx+βy}        f_y = β e^{αx+βy}
f_xx = α² e^{αx+βy}      f_yx = βα e^{αx+βy}
f_xy = αβ e^{αx+βy}      f_yy = β² e^{αx+βy}

More generally, for any integers m, n ≥ 0,
∂^{m+n} f / ∂x^m ∂y^n = α^m β^n e^{αx+βy}

Example 2.2.3

Example 2.2.4

If f(x₁, x₂, x₃, x₄) = x₁⁴ x₂³ x₃² x₄, then

∂⁴f/∂x₁∂x₂∂x₃∂x₄ = ∂³/∂x₁∂x₂∂x₃ [x₁⁴ x₂³ x₃²]
                 = ∂²/∂x₁∂x₂ [2 x₁⁴ x₂³ x₃]
                 = ∂/∂x₁ [6 x₁⁴ x₂² x₃]
                 = 24 x₁³ x₂² x₃
and
∂⁴f/∂x₄∂x₃∂x₂∂x₁ = ∂³/∂x₄∂x₃∂x₂ [4 x₁³ x₂³ x₃² x₄]
                 = ∂²/∂x₄∂x₃ [12 x₁³ x₂² x₃² x₄]
                 = ∂/∂x₄ [24 x₁³ x₂² x₃ x₄]
                 = 24 x₁³ x₂² x₃

Example 2.2.4
Notice that in Example 2.2.2,
f_xy = f_yx = −mn e^{my} sin(nx)
and in Example 2.2.3
f_xy = f_yx = αβ e^{αx+βy}
and in Example 2.2.4
∂⁴f/∂x₁∂x₂∂x₃∂x₄ = ∂⁴f/∂x₄∂x₃∂x₂∂x₁ = 24 x₁³ x₂² x₃


In all of these examples, it didn’t matter what order we took the derivatives in. The fol-
lowing theorem3 shows that this was no accident.

Theorem2.2.5 (Clairaut’s Theorem4 or Schwarz’s Theorem5 ).

If the partial derivatives ∂²f/∂x∂y and ∂²f/∂y∂x exist and are continuous at (x₀, y₀), then

∂²f/∂x∂y (x₀, y₀) = ∂²f/∂y∂x (x₀, y₀)

The proof of Theorem 2.2.5 can be found in Appendix A.3.1. An example of a function f(x, y) where ∂²f/∂x∂y (x₀, y₀) ≠ ∂²f/∂y∂x (x₀, y₀) can be found in Appendix A.3.2.
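Here is a quick symbolic check of Clairaut's theorem for the functions of Examples 2.2.2 and 2.2.3 (a sketch only, assuming Python with sympy is available).

# Mixed partials agree for e^(my) cos(nx) and e^(alpha x + beta y)
import sympy as sp

x, y, m, n, a, b = sp.symbols("x y m n alpha beta")

f1 = sp.exp(m * y) * sp.cos(n * x)
f2 = sp.exp(a * x + b * y)
for f in (f1, f2):
    mixed_difference = sp.diff(f, x, y) - sp.diff(f, y, x)
    print(sp.simplify(mixed_difference))   # 0 in both cases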

Example 2.2.6 (Mixed Partial Detective Work)

Suppose a function f ( x, y) has continuous partial derivatives of all orders over all of R2 .
Suppose further
f_xx(x, y) = y e^x
What is f_xyxy(x, y)?
Solution. Since the partial derivatives are continuous, Theorem 2.2.5 applies. So:

f_xyxy(x, y) = [(f_x)_yx]_y (x, y) = [(f_x)_xy]_y (x, y) = f_xxyy(x, y)
f_xxy(x, y) = ∂/∂y [y e^x] = e^x
f_xxyy(x, y) = ∂/∂y [e^x] = 0

Example 2.2.6

Example 2.2.7

Is it possible for a function f(x, y) to have f_x(x, y) = f_y(x, y) = xy?


Solution. It is not. If f x ( x, y) = xy, then f xy ( x, y) = x, which is continuous over all R2 .
Similarly, if f y ( x, y) = xy, then f yx ( x, y) = y, which is continuous over all R2 . But then by
Clairaut’s theorem, since f xy and f yx are continuous at (say) (1, 2), they must be equal at
that point. But f xy (1, 2) = 1 and f yx (1, 2) = 2.
Example 2.2.7

3 The history of this important theorem is pretty convoluted. See “A note on the history of mixed partial
derivatives” by Thomas James Higgins which was published in Scripta Mathematica 7 (1940), 59-62.
4 Alexis Clairaut (1713–1765) was a French mathematician, astronomer, and geophysicist.
5 Hermann Schwarz (1843–1921) was a German mathematician.


2.3 Local Maximum and Minimum Values


One of the core topics in single variable calculus courses is finding the maxima and min-
ima of functions of one variable. We’ll now extend that discussion to functions of more
than one variable6 . To keep things simple, we’ll focus on functions with two variables.
It’s worth noting, though, that many of the techniques we use will generalize to func-
tions with even more. To start, we have the following natural extensions to some familiar
definitions.

Definition2.3.1.

Let the function f ( x, y) be defined for all ( x, y) in some subset R of R2 . Let ( a, b)


be a point in R.

• ( a, b) is a local maximum of f ( x, y) if f ( x, y) ď f ( a, b) for all ( x, y) close to


( a, b). More precisely, ( a, b) is a local maximum of f ( x, y) if there is an r ą 0
such that f ( x, y) ď f ( a, b) for all points ( x, y) within a distance r of ( a, b).

• ( a, b) is a local minimum of f ( x, y) if f ( x, y) ě f ( a, b) for all ( x, y) close to


( a, b).
• Local maximum and minimum values are also called extremal values.

• ( a, b) is an absolute maximum or global maximum of f ( x, y) if f ( x, y) ď f ( a, b)


for all ( x, y) in R.

• ( a, b) is an absolute minimum or global minimum of f ( x, y) if f ( x, y) ě f ( a, b)


for all ( x, y) in R.

Another complication is that more variables lead to more (partial) derivatives. It’s
convenient to group the partial derivatives into a vector.

Definition2.3.2.


The vector ⟨f_x(a, b), f_y(a, b)⟩ is denoted ∇f(a, b) and is called “the gradient of the function f at the point (a, b)”.

2.3.1 §§ Critical Points


One of the first things you did when you were developing the techniques used to find the
maximum and minimum values of f ( x ) was ask yourself7

Suppose that the largest value of f ( x ) is f ( a). What does that tell us about a?

6 Life is not (always) one-dimensional and sometimes we have to embrace it.


7 Or perhaps your instructor asked you.


After a little thought you answered

If the largest value of f ( x ) is f ( a) and f is differentiable at a, then f 1 ( a) = 0.

[Figure: the graph y = f(x) of a function of one variable.]

Let’s recall why that’s true. Suppose that the largest value of f ( x ) is f ( a). Then for all
h ą 0,

f(a + h) ≤ f(a)  ⟹  f(a + h) − f(a) ≤ 0  ⟹  [f(a + h) − f(a)]/h ≤ 0   if h > 0

Taking the limit h → 0 tells us that f′(a) ≤ 0. Similarly, for all h < 0,

f(a + h) ≤ f(a)  ⟹  f(a + h) − f(a) ≤ 0  ⟹  [f(a + h) − f(a)]/h ≥ 0   if h < 0

Taking the limit h → 0 now tells us that f′(a) ≥ 0. So we have both f′(a) ≥ 0 and f′(a) ≤ 0, which forces f′(a) = 0.
You also observed at the time that for this argument to work, you only need f ( x ) ď
f ( a) for all x’s close to a, not necessarily for all x’s in the whole world. (In the above
inequalities, we only used f ( a + h) with h small.) Since we care only about f ( x ) for x near
a, we can refine the above statement.

If f ( a) is a local maximum for f ( x ) and f is differentiable at a, then f 1 ( a) = 0.

Precisely the same reasoning applies to minima.

If f ( a) is a local minimum for f ( x ) and f is differentiable at a, then f 1 ( a) = 0.

Let’s use the ideas of the above discourse to extend the study of local maxima and
local minima to functions of more than one variable. Suppose that the function f ( x, y)
is defined for all ( x, y) in some subset R of R2 , that ( a, b) is point of R that is not on the
boundary of R, and that f has a local maximum at ( a, b). See the figure below.


[Figure: the graph z = f(x, y) of a function defined on a region R of the xy-plane, with a local maximum at the point (a, b, f(a, b)) above the point (a, b) of R.]

Then the function f ( x, y) must decrease in value as ( x, y) moves away from ( a, b) in any
direction. If we change the x-coordinate a little, f ( x, y) must not increase. So for all h ą 0:

f(a + h, b) ≤ f(a, b)  ⟹  f(a + h, b) − f(a, b) ≤ 0  ⟹  [f(a + h, b) − f(a, b)]/h ≤ 0   if h > 0

Taking the limit h → 0 tells us that f_x(a, b) ≤ 0. Similarly, for all h < 0,

f(a + h, b) ≤ f(a, b)  ⟹  f(a + h, b) − f(a, b) ≤ 0  ⟹  [f(a + h, b) − f(a, b)]/h ≥ 0   if h < 0

Taking the limit h → 0 now tells us that f_x(a, b) ≥ 0. So we have both f_x(a, b) ≥ 0 and f_x(a, b) ≤ 0, which forces f_x(a, b) = 0. The same reasoning tells us f_y(a, b) = 0 as well, and
that these partial derivatives are zero for minima as well as maxima. If both f x ( a, b) = 0
and f y ( a, b) = 0, then ∇ f ( a, b) = 0.
This is an important and useful result, so let’s theoremise it.

Theorem2.3.3.

Let the function f ( x, y) be defined for all ( x, y) in some subset R of R2 . Assume


that

˝ ( a, b) is a point of R that is not on the boundary of R and


˝ ( a, b) is a local maximum or local minimum of f and that
˝ the partial derivatives of f exist at ( a, b).

Then
∇ f ( a, b) = 0.


Definition2.3.4.

Let f ( x, y) be a function and let ( a, b) be a point in its domain. Then we call ( a, b)


a critical point (or a stationary point) of the function if either

• ∇ f ( a, b) does not exist, or

• ∇ f ( a, b) exists and is zero.

Warning2.3.5.

Note that some people (and texts) do not include the case “∇f(a, b) does not exist” in the definition of a critical point. Points where the gradient does not exist would (usually) be referred to as singular points of the function. We do not use this terminology.

Warning2.3.6.

Theorem 2.3.3 tells us that every local maximum or minimum (in the interior of
the domain of a differentiable function) is a critical point. Beware that it does not8
tell us that every critical point is either a local maximum or a local minimum.

In fact, as we shall see in Example 2.3.13, there are critical points that are neither local maxima nor local minima. Nonetheless, Theorem 2.3.3 is very useful because often functions have only a small number of critical points. To find local maxima and minima of such functions, we only need to consider their critical points. We'll return later to the question of how to tell if a critical point is a local maximum, local minimum or neither. For now, we'll just practice finding critical points.

Example 2.3.7 f ( x, y) = x2 ´ 2xy + 2y2 + 2x ´ 6y + 12

Find all critical points of f ( x, y) = x2 ´ 2xy + 2y2 + 2x ´ 6y + 12.


Solution. To find the critical points, we need to find the gradient. To find the gradient we
need to find the first order partial derivatives. So, as a preliminary calculation, we find
the two first order partial derivatives of f ( x, y).
f x ( x, y) = 2x ´ 2y + 2
f y ( x, y) = ´2x + 4y ´ 6
These functions are defined everywhere (so the gradient exists at every point in the do-
main). So the critical points are the solutions of the pair of equations
2x ´ 2y + 2 = 0 ´ 2x + 4y ´ 6 = 0

8 A very common error of logic that people make is “Affirming the consequent”. “If P then Q” is true,
does not imply that “If Q then P” is true . The statement “If he is Shakespeare then he is dead” is true.
But concluding from “That sheep is dead” that “He must be Shakespeare” is just silly.


or equivalently (dividing by two and moving the constants to the right hand side)

x ´ y = ´1 (E1)
´x + 2y = 3 (E2)

This is a system of two equations in two unknowns (x and y). One strategy for solving
system like this is to

• First use one of the equations to solve for one of the unknowns in terms of the other
unknown. For example, (E1) tells us that y = x + 1. This expresses y in terms of x.
We say that we have solved for y in terms of x.

• Then substitute the result, y = x + 1 in our case, into the other equation, (E2). In our
case, this gives

´x + 2( x + 1) = 3 ðñ x + 2 = 3 ðñ x = 1

• We have now found that x = 1, y = x + 1 = 2 is the only solution. So the only critical
point is (1, 2). Of course it only takes a moment to verify that ∇f(1, 2) = ⟨0, 0⟩. It is
a good idea to do this as a simple check of our work.

An alternative strategy for solving a system of two equations in two unknowns, like (E1)
and (E2), is to

• add equations (E1) and (E2) together. This gives

( E1) + ( E2) : (1 ´ 1) x + (´1 + 2)y = ´1 + 3 ðñ y = 2

The point here is that adding equations (E1) and (E2) together eliminates the un-
known x, leaving us with one equation in the unknown y, which is easily solved.
For other systems of equations you might have to multiply the equations by some
numbers before adding them together.

• We now know that y = 2. Substituting it into (E1) gives us

x ´ 2 = ´1 ùñ x = 1

• Once again (thankfully) we have found that the only critical point is (1, 2).

Example 2.3.7
This was pretty easy because we only had to solve linear equations, which in turn was a
consequence of the fact that f ( x, y) was a polynomial of degree two. Here is an example
with some slightly more challenging algebra.



Example 2.3.8 f ( x, y) = 2x3 ´ 6xy + y2 + 4y

Find all critical points of f ( x, y) = 2x3 ´ 6xy + y2 + 4y.


Solution. As in the last example, we need to find where the gradient does not exist or is
zero, and to find the gradient we need the first order partial derivatives.
f x = 6x2 ´ 6y f y = ´6x + 2y + 4
These functions are defined everywhere. So the critical points are the solutions of
6x2 ´ 6y = 0 ´ 6x + 2y + 4 = 0
We can rewrite the first equation as y = x2 , which expresses y as a function of x. We can
then substitute y = x2 into the second equation, giving
´6x + 2y + 4 = 0 ðñ ´6x + 2x2 + 4 = 0 ðñ x2 ´ 3x + 2 = 0 ðñ ( x ´ 1)( x ´ 2) = 0
ðñ x = 1 or 2

When x = 1, y = 12 = 1 and when x = 2, y = 22 = 4. So, there are two critical points:


(1, 1), (2, 4).
Alternatively, we could have also used the second equation to write y = 3x ´ 2, and
then substituted that into the first equation to get
6x2 ´ 6(3x ´ 2) = 0 ðñ x2 ´ 3x + 2 = 0
just as above.
Example 2.3.8
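The critical point equations can also be handed to a computer algebra system. A minimal sketch, assuming Python with sympy is available:

# Critical points of f(x, y) = 2x^3 - 6xy + y^2 + 4y
import sympy as sp

x, y = sp.symbols("x y")
f = 2 * x ** 3 - 6 * x * y + y ** 2 + 4 * y

critical_points = sp.solve([sp.diff(f, x), sp.diff(f, y)], [x, y], dict=True)
print(critical_points)   # the two solutions x = 1, y = 1 and x = 2, y = 4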

And here is an example for which the algebra requires a bit more thought.
Example 2.3.9 ( f ( x, y) = xy(5x + y ´ 15))

Find all critical points of f ( x, y) = xy(5x + y ´ 15).


Solution. The first order partial derivatives of f ( x, y) = xy(5x + y ´ 15) are
f x ( x, y) = y(5x + y ´ 15) + xy(5) = y(5x + y ´ 15) + y(5x ) = y(10x + y ´ 15)
f y ( x, y) = x (5x + y ´ 15) + xy(1) = x (5x + y ´ 15) + x (y) = x (5x + 2y ´ 15)
Therefore the gradient of the function exists everywhere in the domain of the function. The
critical points are the solutions of f x ( x, y) = f y ( x, y) = 0. That is, we need to find all x, y
that satisfy the pair of equations
y(10x + y ´ 15) = 0 (E1)
x (5x + 2y ´ 15) = 0 (E2)
The first equation, y(10x + y ´ 15) = 0, is satisfied if at least one of the two factors y,
(10x + y ´ 15) is zero. So the first equation is satisfied if at least one of the two equations
y=0 (E1a)
10x + y = 15 (E1b)


is satisfied. The second equation, x (5x + 2y ´ 15) = 0, is satisfied if at least one of the two
factors x, (5x + 2y ´ 15) is zero. So the second equation is satisfied if at least one of the
two equations

x=0 (E2a)
5x + 2y = 15 (E2b)

is satisfied.
So both critical point equations (E1) and (E2) are satisfied if and only if at least one
of (E1a), (E1b) is satisfied and in addition at least one of (E2a), (E2b) is satisfied. So both
critical point equations (E1) and (E2) are satisfied if and only if at least one of the following
four possibilities hold.
• (E1a) and (E2a) are satisfied if and only if x = y = 0

• (E1a) and (E2b) are satisfied if and only if y = 0, 5x + 2y = 15 ðñ y = 0, 5x = 15

• (E1b) and (E2a) are satisfied if and only if 10x + y = 15, x = 0 ðñ y = 15, x = 0

• (E1b) and (E2b) are satisfied if and only if 10x + y = 15, 5x + 2y = 15. We can use, for example, the second of these equations to solve for x in terms of y: x = (1/5)(15 − 2y). When we substitute this into the first equation we get 2(15 − 2y) + y = 15, which we can solve for y. This gives −3y = 15 − 30, or y = 5, and then x = (1/5)(15 − 2 × 5) = 1.
In conclusion, the critical points are (0, 0), (3, 0), (0, 15) and (1, 5).
A more compact way to write what we have just done is

f_x(x, y) = 0 and f_y(x, y) = 0
⟺ y(10x + y − 15) = 0 and x(5x + 2y − 15) = 0
⟺ {y = 0 or 10x + y = 15} and {x = 0 or 5x + 2y = 15}
⟺ {y = 0, x = 0} or {y = 0, 5x + 2y = 15} or {10x + y = 15, x = 0} or {10x + y = 15, 5x + 2y = 15}
⟺ {x = y = 0} or {y = 0, x = 3} or {x = 0, y = 15} or {x = 1, y = 5}
Example 2.3.9

Let’s try a more practical example — something from the real world. Well, a mathe-
matician’s “real world”. The interested reader should search-engine their way to a dis-
cussion of “idealisation”, “game theory” “Cournot models” and “Bertrand models”. But
don’t spend too long there. A discussion of breweries is about to take place.
Example 2.3.10

In a certain community, there are two breweries in competition9 , so that sales of each neg-
atively affect the profits of the other. If brewery A produces x litres of beer per month and

9 We have both types of music here — country and western.


brewery B produces y litres per month, then the profits of the two breweries are given by
P = 2x − (2x² + y²)/10⁶        Q = 2y − (4y² + x²)/(2 × 10⁶)
respectively. Find the sum of the two profits if each brewery independently sets its own
production level to maximize its own profit and assumes that its competitor does likewise.
Then, assuming cartel behaviour, find the sum of the two profits if the two breweries
cooperate so as to maximize that sum10 .
Solution. If A adjusts x to maximize P (for y held fixed) and B adjusts y to maximize Q
(for x held fixed) then we want to find the ( x, y) using
P_x = 2 − 4x/10⁶
Q_y = 2 − 8y/(2 × 10⁶)
Note that P_x and Q_y exist everywhere. Then x and y are determined by the equations
P_x = 0    (E1)
Q_y = 0    (E2)
Equation (E1) yields x = ½ · 10⁶ and equation (E2) yields y = ½ · 10⁶. Knowing x and y we can determine P, Q and the total profit

P + Q = 2(x + y) − 10⁻⁶ (5/2 x² + 3y²)
      = 10⁶ (1 + 1 − 5/8 − 3/4) = 5/8 · 10⁶

On the other hand, if (A, B) adjust (x, y) to maximize P + Q = 2(x + y) − 10⁻⁶ (5/2 x² + 3y²), then x and y are determined by
(P + Q)_x = 2 − 5x/10⁶ = 0    (E1)
(P + Q)_y = 2 − 6y/10⁶ = 0    (E2)
Equation (E1) yields x = 2/5 · 10⁶ and equation (E2) yields y = 1/3 · 10⁶. Again knowing x and y we can determine the total profit

P + Q = 2(x + y) − 10⁻⁶ (5/2 x² + 3y²)
      = 10⁶ (4/5 + 2/3 − 2/5 − 1/3) = 11/15 · 10⁶
So cooperating really does help their profits. Unfortunately, like a very small tea-pot,
consumers will be a little poorer11 .
Example 2.3.10
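It is worth confirming the two totals numerically. The sketch below simply evaluates the profit functions at the two production plans; the function names are ours, not part of the model.

def P(x, y):
    return 2 * x - (2 * x ** 2 + y ** 2) / 1e6

def Q(x, y):
    return 2 * y - (4 * y ** 2 + x ** 2) / (2 * 1e6)

# Independent optimization: x = y = 10^6 / 2
x1, y1 = 0.5e6, 0.5e6
print(P(x1, y1) + Q(x1, y1))      # 625000.0 = (5/8) * 10^6

# Cooperative optimization: x = (2/5) 10^6, y = (1/3) 10^6
x2, y2 = 0.4e6, 1e6 / 3
print(P(x2, y2) + Q(x2, y2))      # about 733333.3 = (11/15) * 10^6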
Moving swiftly away from the last pun, let’s do something a little more geometric.
Example 2.3.11

Equal angle bends are made at equal distances from the two ends of a 100 metre long fence
so the resulting three segment fence can be placed along an existing wall to make an en-
closure of trapezoidal shape. What is the largest possible area for such an enclosure?

10 This sort of thing is generally illegal.


11 The authors extend their deepest apologies.


Solution. This is a very geometric problem (fenced off from pun opportunities), and as
such we should start by drawing a sketch and introducing some variable names.

[Figure: the fence, bent at equal angles θ at distance x from each end and placed against the wall; the enclosure is a rectangle of width 100 − 2x and height x sin θ, plus two right triangles with legs x sin θ and x cos θ.]

The area enclosed by the fence is the area inside the blue rectangle (in the figure on the
right above) plus the area inside the two blue triangles.
A(x, θ) = (100 − 2x) x sin θ + 2 · ½ · x sin θ · x cos θ
        = (100x − 2x²) sin θ + x² sin θ cos θ
To maximize the area, we need to solve
0 = ∂A/∂x = (100 − 4x) sin θ + 2x sin θ cos θ
0 = ∂A/∂θ = (100x − 2x²) cos θ + x² (cos²θ − sin²θ)
Note that ∂A/∂x and ∂A/∂θ are defined everywhere in their domain (so here the critical points are the points where the gradient is zero). Both terms in the first equation contain the factor
sin θ and all terms in the second equation contain the factor x. If either sin θ or x are zero
the area A( x, θ ) will also be zero, and so will certainly not be maximal. So we may divide
the first equation by sin θ and the second equation by x, giving
(100 − 4x) + 2x cos θ = 0    (E1)
(100 − 2x) cos θ + x (cos²θ − sin²θ) = 0    (E2)

These equations might look a little scary. But there is no need to panic. They are not as
bad as they look because θ enters only through cos θ and sin2 θ, which we can easily write
in terms of cos θ. Furthermore we can eliminate cos θ by observing that the first equation forces cos θ = −(100 − 4x)/(2x), and hence sin²θ = 1 − cos²θ = 1 − (100 − 4x)²/(4x²). Substituting these
into the second equation gives
 
−(100 − 2x) · (100 − 4x)/(2x) + x [ (100 − 4x)²/(2x²) − 1 ] = 0
⟹ −(100 − 2x)(100 − 4x) + (100 − 4x)² − 2x² = 0
⟹ 6x² − 200x = 0
⟹ x = 100/3,   cos θ = −(−100/3)/(200/3) = 1/2,   θ = 60°


and the maximum area enclosed is


A = [100 · (100/3) − 2 · (100/3)²] · (√3/2) + (100/3)² · (√3/2) · (1/2) = 2500/√3

Example 2.3.11
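A crude numerical check, evaluating A(x, θ) on a grid, confirms that the critical point we found really does give the largest area. This is only an illustrative sketch in Python; the grid spacing is our own choice.

import math

def A(x, theta):
    # A(x, theta) = (100x - 2x^2) sin(theta) + x^2 sin(theta) cos(theta)
    return (100 * x - 2 * x ** 2) * math.sin(theta) + x ** 2 * math.sin(theta) * math.cos(theta)

print(A(100 / 3, math.pi / 3))   # 1443.37... = 2500 / sqrt(3)

# Brute-force search over 0 < x < 50 and 0 < theta < pi/2:
best = max((A(0.1 * i, 0.01 * j), 0.1 * i, 0.01 * j)
           for i in range(1, 500) for j in range(1, 158))
print(best)   # the grid maximum occurs near x = 33.3, theta = 1.05 (about 60 degrees)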

Now here is a very useful (even practical!) statistical example — finding the line that
best fits a given collection of points.
Example 2.3.12 (Linear regression)

An experiment yields n data points ( xi , yi ), i = 1, 2, ¨ ¨ ¨ , n. We wish to find the straight


line y = mx + b which “best” fits the data. The definition of “best” is “minimizes the

[Figure: the data points (x₁, y₁), (x₂, y₂), (x₃, y₃), …, (xₙ, yₙ) scattered about the line y = mx + b.]

root mean square error”, i.e. minimizes


E(m, b) = Σ_{i=1}^{n} (m x_i + b − y_i)²

Note that
• term number i in E(m, b) is the square of the difference between y_i, which is the iᵗʰ measured value of y, and [mx + b]_{x = x_i} = m x_i + b, which is the approximation to y_i given by the line y = mx + b.

• All terms in the sum are positive, regardless of whether the points ( xi , yi ) are above
or below the line.
Our problem is to find the m and b that minimizes E(m, b). This technique for drawing a
line through a bunch of data points is called “linear regression”. It is used a lot12 13 . Even

12 Proof by search engine.


13 And has been used for a long time. It was introduced by the French mathematician Adrien-Marie Legendre, 1752–1833, in 1805, and by the German mathematician and physicist Carl Friedrich Gauss, 1777–1855, in 1809.


in the real world — and not just the real world that you find in mathematics problems.
The actual real world that involves jobs.
Solution. We wish to choose m and b so as to minimize E(m, b). So we need to determine
where the gradient of E does not exist or it exists and it is equal to zero.
∂E/∂m = Σ_{i=1}^{n} 2(m x_i + b − y_i) x_i = m [Σ_{i=1}^{n} 2x_i²] + b [Σ_{i=1}^{n} 2x_i] − [Σ_{i=1}^{n} 2x_i y_i]
∂E/∂b = Σ_{i=1}^{n} 2(m x_i + b − y_i)      = m [Σ_{i=1}^{n} 2x_i] + b [Σ_{i=1}^{n} 2] − [Σ_{i=1}^{n} 2y_i]

There are a lot of symbols here. But remember that all of the xi ’s and yi ’s are given con-
stants. They come from, for example, experimental data. The only unknowns are m and
b. To emphasize this, and to save some writing, define the constants
S_x = Σ_{i=1}^{n} x_i        S_y = Σ_{i=1}^{n} y_i        S_{x²} = Σ_{i=1}^{n} x_i²        S_{xy} = Σ_{i=1}^{n} x_i y_i

The partial derivatives of E exist everywhere, so we only need to find where they are equal to zero. The equations which determine the critical points are (after dividing by two)

0 = S_{x²} m + S_x b − S_{xy}  ⟹  S_{x²} m + S_x b = S_{xy}    (E1)
0 = S_x m + n b − S_y          ⟹  S_x m + n b = S_y            (E2)

These are two linear equations on the unknowns m and b. They may be solved in any of
the usual ways. One is to use (E2) to solve for b in terms of m
b = (1/n)(S_y − S_x m)    (E3)
and then substitute this into (E1) to get the equation
S_{x²} m + S_x · (1/n)(S_y − S_x m) = S_{xy}  ⟹  (n S_{x²} − S_x²) m = n S_{xy} − S_x S_y
for m. We can then solve this equation for m and substitute back into (E3) to get b. This gives
m = (n S_{xy} − S_x S_y)/(n S_{x²} − S_x²)        b = −(S_x S_{xy} − S_y S_{x²})/(n S_{x²} − S_x²)
Another way to solve the system of equations is
n(E1) − S_x(E2) :    (n S_{x²} − S_x²) m = n S_{xy} − S_x S_y
−S_x(E1) + S_{x²}(E2) :    (n S_{x²} − S_x²) b = −S_x S_{xy} + S_y S_{x²}

which gives the same solution.


So given a bunch of data points, it only takes a quick bit of arithmetic — no calculus
required — to apply the above formulae and so to find the best fitting line. Of course while


you don’t need any calculus to apply the formulae, you do need calculus to understand
where they came from. The same technique can be extended to other types of curve fitting
problems. For example, polynomial regression.
Example 2.3.12
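Here is what "a quick bit of arithmetic" looks like in practice. The sketch below applies the formulas for m and b to a small made-up data set and, assuming numpy is available, compares the result with numpy's built-in least squares fit.

import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])   # illustrative values only
n = len(x)

Sx, Sy = x.sum(), y.sum()
Sx2, Sxy = (x ** 2).sum(), (x * y).sum()

m = (n * Sxy - Sx * Sy) / (n * Sx2 - Sx ** 2)
b = -(Sx * Sxy - Sy * Sx2) / (n * Sx2 - Sx ** 2)
print(m, b)                  # slope about 1.99, intercept about 1.04
print(np.polyfit(x, y, 1))   # numpy's degree-1 least squares fit agrees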

2.3.2 §§ Classifying Critical Points


Now let’s start thinking about how to tell if a critical point is a local minimum, local maxi-
mum, or neither. We’ll start with an intuitive approach, then introduce the (multivariable)
Second Derivative Test.
You have already encountered single variable functions that have a critical point which
is neither a local max nor a local min. This can also happen for functions of two variables.
We’ll start with the simplest possible such example.

Example 2.3.13 f ( x, y) = x2 ´ y2

The first partial derivatives of f ( x, y) = x2 ´ y2 are f x ( x, y) = 2x and f y ( x, y) = ´2y. So


the only critical point of this function is (0, 0). Is this a local minimum or maximum? Well
let’s start with ( x, y) at (0, 0) and then move ( x, y) away from (0, 0) and see if f ( x, y) gets
bigger or smaller. At the origin f (0, 0) = 0. Of course we can move ( x, y) away from (0, 0)
in many different directions.
• First consider moving ( x, y) along the x-axis. Then ( x, y) = ( x, 0) and f ( x, y) =
f ( x, 0) = x2 . So when we start with x = 0 and then increase x, the value of the
function f increases — which means that (0, 0) cannot be a local maximum for f .
• Next let’s move ( x, y) away from (0, 0) along the y-axis. Then ( x, y) = (0, y) and
f ( x, y) = f (0, y) = ´y2 . So when we start with y = 0 and then increase y, the value
of the function f decreases — which means that (0, 0) cannot be a local minimum
for f .
So moving away from (0, 0) in one direction causes the value of f to increase, while mov-
ing away from (0, 0) in a second direction causes the value of f to decrease. Consequently
(0, 0) is neither a local minimum or maximum for f . It is called a saddle point, because the
graph of f looks like a saddle. (The full definition of “saddle point” is given immediately
after this example.) Here are some figures showing the graph of f .

The figure below shows some level curves of f. Observe from the level curves that
• f increases as you leave (0, 0) walking along the x axis


• f decreases as you leave (0, 0) walking along the y axis

[Figure: the level curves f = 0, ±1, ±4, ±9 of f(x, y) = x² − y²; the positive levels are hyperbolae opening along the x-axis, the negative levels are hyperbolae opening along the y-axis, and f = 0 is the pair of lines y = ±x.]

Example 2.3.13

Approximately speaking, if a critical point ( a, b) is neither a local minimum nor a local


maximum, then it is a saddle point. For ( a, b) to not be a local minimum, f has to take val-
ues bigger than f ( a, b) at some points nearby ( a, b). For ( a, b) to not be a local maximum,
f has to take values smaller than f ( a, b) at some points nearby ( a, b). Writing this more
mathematically we get the following definition.

Definition2.3.14.

The critical point ( a, b) is called a saddle point for the function f ( x, y) if, for each
r ą 0,

• there is at least one point ( x, y), within a distance r of ( a, b), for which
f ( x, y) ą f ( a, b) and

• there is at least one point ( x, y), within a distance r of ( a, b), for which
f ( x, y) ă f ( a, b).

Understanding what the graph of a function looks like is a powerful tool for classifying
critical points, but it can be very time-consuming. The Second Derivative Test (below) is
a more algebraic approach to classification. This test is often faster than graphing, but the
drawback is that it is sometimes inconclusive.


Theorem2.3.15 (Second Derivative Test).

Let r ą 0 and assume that all second order derivatives of the function f ( x, y) are
continuous at all points ( x, y) that are within a distance r of ( a, b). Assume that
f x ( a, b) = f y ( a, b) = 0. Define

D ( x, y) = f xx ( x, y) f yy ( x, y) ´ f xy ( x, y)2

It is called the discriminant of f . Then

• if D ( a, b) ą 0 and f xx ( a, b) ą 0, then f ( x, y) has a local minimum at ( a, b),

• if D ( a, b) ą 0 and f xx ( a, b) ă 0, then f ( x, y) has a local maximum at ( a, b),

• if D ( a, b) ă 0, then f ( x, y) has a saddle point at ( a, b), but

• if D ( a, b) = 0, then we cannot draw any conclusions without more work.

The proof of Theorem 2.3.15 is beyond the scope of Math 105, but there is some intu-
ition supporting it that is more accessible. Extremely informally, we can think of saddle
points as places with inconsistent concavity: in some directions the surface looks concave
up, in other directions it looks concave down. On the other hand, at a local extremum, the
concavity is the same in all directions.
Let’s do thought experiments on a few simple cases to expand those ideas.

Example 2.3.16 (Second Derivative Test Intuition)

Let ( a, b) be a critical point of the function f ( x, y) with ∇ f ( a, b) = 0, and assume all


second-order derivatives of f(x, y) are continuous.

1. Suppose at ( a, b), the surface looks like a minimum if y is held constant, but it looks
like a maximum if x is held constant. (In particular, this means ( a, b) is the location
of a saddle point.)

[Figure: a saddle-shaped surface with the point (a, b, f(a, b)) marked.]

Holding y = b constant, we can think of z = f ( x, b) as a one-variable function, in


which case f xx ( a, b) ě 0 by the single-variable second derivative test. Holding x = a
constant, we can think of z = f ( a, y) as a one-variable function (whose variable is
y). In that case, f yy ( a, b) ď 0 by the single-variable second derivative test.


[Figure: the slice z = f(x, b), which is concave up near x = a, and the slice z = f(a, y), which is concave down near y = b.]

Since f xx ( a, b) and f yy ( a, b) have different signs (or at least one of them is zero):

f_xx(a, b) f_yy(a, b) ≤ 0
f_xx(a, b) f_yy(a, b) − f_xy(a, b)² ≤ −f_xy(a, b)² ≤ 0
D(a, b) ≤ 0

So in this simple saddle-point example, we expect D ( a, b) ď 0. This accords with the


third bullet point in Theorem 2.3.15.

2. Suppose D ( a, b) ą 0.
0 < f_xx(a, b) f_yy(a, b) − f_xy(a, b)²
f_xy(a, b)² < f_xx(a, b) f_yy(a, b)

Since f xy is raised to an even power, it’s nonnegative.


0 ≤ f_xy(a, b)² < f_xx(a, b) f_yy(a, b)
0 < f_xx(a, b) f_yy(a, b)

This tells us that f xx ( a, b) and f yy ( a, b) have the same sign – either they’re both pos-
itive or they’re both negative. So, the function’s concavity is the same whether we
hold the x-value or the y-value constant. The function might have the same concav-
ity in all directions – unlike the saddle point example we saw above. So, it seems
plausible that critical points with positive discriminants are local extrema, rather
than saddle points.

3. Suppose the surface has a local maximum at ( a, b).


Holding y = b constant, we can think of z = f ( x, b) as a one-variable function, in
which case f xx ( a, b) ď 0 by the single-variable second derivative test.

[Figure: a surface z = f(x, y) with a local maximum at (a, b), and the one-variable slice z = f(x, b), which has a local maximum at x = a.]


This doesn’t go so far as to show us that D ( a, b) ě 0, but it does accord with the test
of f xx ( a, b) in the second bullet point of Theorem 2.3.15.
4. Similarly, suppose the surface has a local minimum at ( a, b).
Holding y = b constant, we can think of z = f ( x, b) as a one-variable function, in
which case f xx ( a, b) ě 0 by the single-variable second derivative test.

(Figure: the surface z = f ( x, y) near a local minimum at ( a, b), and the cross-section z = f ( x, b).)

Again, although this doesn’t go so far as to show us that D ( a, b) ě 0, it does accord


with the test of f xx ( a, b) in the first bullet point of Theorem 2.3.15.
Example 2.3.16

You might wonder why, in the local maximum/local minimum cases of Theorem
2.3.15, f xx ( a, b) appears rather than f yy ( a, b). The answer is only that x is before y in the
alphabet14 . You can use f yy ( a, b) just as well as f xx ( a, b). The reason is that if D ( a, b) ą 0
(as in the first two bullets of the theorem), then because D ( a, b) = f xx ( a, b) f yy ( a, b) ´
f xy ( a, b)2 ą 0, we necessarily have f xx ( a, b) f yy ( a, b) ą 0 so that f xx ( a, b) and f yy ( a, b)
must have the same sign — either both are positive or both are negative.
You might also wonder why we cannot draw any conclusions when D ( a, b) = 0 and
what happens then. The second derivative test for functions of two variables was derived
in precisely the same way as the second derivative test for functions of one variable is
derived — you approximate the function by a polynomial that is of degree two in ( x ´ a),
(y ´ b) and then you analyze the behaviour of the quadratic polynomial near ( a, b). For
this to work, the contributions to f ( x, y) from terms that are of degree two in ( x ´ a),
(y ´ b) had better be bigger than the contributions to f ( x, y) from terms that are of degree
three and higher in ( x ´ a), (y ´ b) when ( x ´ a), (y ´ b) are really small. If this is not
the case, for example when the terms in f ( x, y) that are of degree two in ( x ´ a), (y ´ b)
all have coefficients that are exactly zero, the analysis will certainly break down. That’s
exactly what happens when D ( a, b) = 0. Here are some examples. The functions
f 1 ( x, y) = x4 + y4 f 2 ( x, y) = ´x4 ´ y4 f 3 ( x, y) = x3 + y3 f 4 ( x, y) = x4 ´ y4
all have (0, 0) as the only critical point and all have D (0, 0) = 0. The first, f 1 has its
minimum there. The second, f 2 , has its maximum there. The third and fourth have a
saddle point there.
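These four examples are quick to check by hand. If you would like to experiment with other functions, the short computational sketch below does the same check automatically. (This is an optional aside, not part of the course material; it assumes Python with the sympy library is available.)

import sympy as sp

x, y = sp.symbols('x y')
examples = [x**4 + y**4, -x**4 - y**4, x**3 + y**3, x**4 - y**4]

for f in examples:
    # Discriminant D = f_xx * f_yy - f_xy**2, evaluated at the critical point (0, 0)
    D = sp.diff(f, x, 2) * sp.diff(f, y, 2) - sp.diff(f, x, y)**2
    print(f, "  D(0,0) =", D.subs({x: 0, y: 0}))

Every example should print D(0,0) = 0, so the second derivative test is inconclusive, even though the four functions behave very differently at (0, 0).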

14 The shackles of convention are not limited to mathematics. Election ballots often have the candidates
listed in alphabetic order.


Here are sketches of some level curves for each of these four functions (with all renamed to simply f ).
(Figure: level curves of f ( x, y) = x4 + y4 , of f ( x, y) = ´x4 ´ y4 , of f ( x, y) = x3 + y3 , and of f ( x, y) = x4 ´ y4 .)


Example 2.3.17 f ( x, y) = 2x3 ´ 6xy + y2 + 4y

Find and classify all critical points of f ( x, y) = 2x3 ´ 6xy + y2 + 4y.


Solution. Thinking a little way ahead, to find the critical points we will need the gradient
and to apply the second derivative test of Theorem 2.3.15 we will need all second order
partial derivatives. So we need all partial derivatives of order up to two. Here they are.

f = 2x3 ´ 6xy + y2 + 4y
f x = 6x2 ´ 6y f xx = 12x f xy = ´6
f y = ´6x + 2y + 4 f yy = 2 f yx = ´6


(Of course, f xy and f yx have to be the same. It is still useful to compute both, as a way to
catch some mechanical errors.)
We have already found, in Example 2.3.8, that the critical points are (1, 1), (2, 4). The
classification is
critical point    f xx f yy ´ f xy2        f xx    type
(1, 1)            12 ˆ 2 ´ (´6)2 ă 0              saddle point
(2, 4)            24 ˆ 2 ´ (´6)2 ą 0       24      local min

We were able to leave the f xx entry in the top row blank, because
• we knew that f xx (1, 1) f yy (1, 1) ´ f xy (1, 1)2 ă 0, and
• we knew, from Theorem 2.3.15, that f xx (1, 1) f yy (1, 1) ´ f xy (1, 1)2 ă 0, by itself, was enough to ensure that (1, 1) was a saddle point.

Here is a sketch of some level curves of our f ( x, y). They are not needed to answer this question, but can give you some idea as to what the graph of f looks like.
(Figure: level curves f = 0, 0.25, 0.5, 1, 2, 3 of f , with the saddle point (1, 1) (where f = 1) and the local minimum (2, 4) (where f = 0) marked.)
Example 2.3.17
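If you would like to verify the classification table above with software, here is an optional sketch (assuming Python with the sympy library) that finds the critical points and evaluates the discriminant at each one.

import sympy as sp

x, y = sp.symbols('x y')
f = 2*x**3 - 6*x*y + y**2 + 4*y

fx, fy = sp.diff(f, x), sp.diff(f, y)
crit_pts = sp.solve([fx, fy], [x, y], dict=True)   # expected: [{x: 1, y: 1}, {x: 2, y: 4}]

D = sp.diff(f, x, 2)*sp.diff(f, y, 2) - sp.diff(f, x, y)**2
for pt in crit_pts:
    print(pt, " D =", D.subs(pt), " f_xx =", sp.diff(f, x, 2).subs(pt))
# (1, 1): D = -12 < 0, a saddle point;  (2, 4): D = 12 > 0 with f_xx = 24 > 0, a local minimum.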

Example 2.3.18 ( f ( x, y) = xy(5x + y ´ 15))

Find and classify all critical points of f ( x, y) = xy(5x + y ´ 15).


Solution. We have already computed the first order partial derivatives

f x ( x, y) = y(10x + y ´ 15) f y ( x, y) = x (5x + 2y ´ 15)


of f ( x, y) in Example 2.3.9. Again, to classify the critical points we need the second order
partial derivatives. They are

f xx ( x, y) = 10y
f yy ( x, y) = 2x
f xy ( x, y) = (1)(10x + y ´ 15) + y(1)= 10x + 2y ´ 15
f yx ( x, y) = (1)(5x + 2y ´ 15) + x (5)= 10x + 2y ´ 15

(Once again, we have computed both f xy and f yx to guard against mechanical errors.) We
have already found, in Example 2.3.9, that the critical points are (0, 0), (0, 15), (3, 0) and
(1, 5). The classification is

critical point    f xx f yy ´ f xy2         f xx    type
(0, 0)            0 ˆ 0 ´ (´15)2 ă 0               saddle point
(0, 15)           150 ˆ 0 ´ 152 ă 0                saddle point
(3, 0)            0 ˆ 6 ´ 152 ă 0                  saddle point
(1, 5)            50 ˆ 2 ´ 52 ą 0          50      local min

Here is a sketch of some level curves of our f ( x, y). f is negative in the shaded regions and f is positive in the unshaded regions. Again this is not needed to answer this question, but can give you some idea as to what the graph of f looks like.
(Figure: level curves f = ´20, ´10, 0, 20 of f , with the saddle points (0, 0), (0, 15), (3, 0) (where f = 0) and the local minimum (1, 5) (where f = ´25) marked.)
Example 2.3.18


Example 2.3.19

Find and classify all of the critical points of f ( x, y) = x3 + xy2 ´ 3x2 ´ 4y2 + 4.
Solution. We know the drill now. We start by computing all of the partial derivatives of f
up to order 2.

f = x3 + xy2 ´ 3x2 ´ 4y2 + 4


f x = 3x2 + y2 ´ 6x f xx = 6x ´ 6 f xy = 2y
f y = 2xy ´ 8y f yy = 2x ´ 8 f yx = 2y

f x and f y are defined everywhere. So the critical points are then the solutions of f x = 0,
f y = 0. That is

f x = 3x2 + y2 ´ 6x = 0 (E1)
f y = 2y( x ´ 4) = 0 (E2)

The second equation, 2y( x ´ 4) = 0, is satisfied if and only if at least one of the two
equations y = 0 and x = 4 is satisfied.

• When y = 0, equation (E1) forces x to obey

0 = 3x2 + 02 ´ 6x = 3x ( x ´ 2)

so that x = 0 or x = 2.

• When x = 4, equation (E1) forces y to obey

0 = 3 ˆ 42 + y2 ´ 6 ˆ 4 = 24 + y2

which is impossible.

So, there are two critical points: (0, 0), (2, 0). Here is a table that classifies the critical
points.

critical point    f xx f yy ´ f xy2          f xx       type
(0, 0)            (´6) ˆ (´8) ´ 02 ą 0      ´6 ă 0     local max
(2, 0)            6 ˆ (´4) ´ 02 ă 0                    saddle point

Example 2.3.19

Example 2.3.20

A manufacturer wishes to make an open rectangular box of given volume V using the least
possible material. Find the design specifications.
Solution. Denote by x, y and z, the length, width and height, respectively, of the box.


(Figure: an open-topped rectangular box with base dimensions x and y and height z.)

The box has two sides of area xz, two sides of area yz and a bottom of area xy. So the total
surface area of material used is

S = 2xz + 2yz + xy

However the three dimensions x, y and z are not independent. The requirement that the
box have volume V imposes the constraint

xyz = V

We can use this constraint to eliminate one variable. Since z is at the end of the alphabet (poor z), we eliminate z by substituting z = V/( xy). Note that if x (or y) is equal to zero then the volume of the box would equal zero. What is the point of a box with zero volume?! So if we assume the box has non-zero volume then x ‰ 0 and y ‰ 0. So we have to find the values of x and y that minimize the function

S( x, y) = 2V/y + 2V/x + xy
Let’s start by finding the critical points of S. Since
Sx ( x, y) = ´2V/x2 + y
Sy ( x, y) = ´2V/y2 + x
Note that the partial derivatives are not defined for ( x, y) = (0, 0) but we have already
eliminated the case where x or y is equal to zero. So ( x, y) is a critical point if and only if

x2 y = 2V    (E1)
xy2 = 2V    (E2)

Solving (E1) for y gives y = 2V/x2 . Substituting this into (E2) gives

4V2 /x3 = 2V  ùñ  x3 = 2V  ùñ  x = (2V )1/3  and  y = 2V/(2V )2/3 = (2V )1/3
As there is only one critical point, we would expect it to give the minimum15 . But let’s use
the second derivative test to verify that at least the critical point is a local minimum. The

15 Indeed one can use the facts that 0 ă x ă 8, that 0 ă y ă 8, and that S Ñ 8 as x Ñ 0 and as y Ñ 0
and as x Ñ 8 and as y Ñ 8 to prove that the single critical point gives the global minimum.


various second partial derivatives are

Sxx ( x, y) = 4V/x3        Sxx ((2V )1/3 , (2V )1/3 ) = 2
Sxy ( x, y) = 1            Sxy ((2V )1/3 , (2V )1/3 ) = 1
Syy ( x, y) = 4V/y3        Syy ((2V )1/3 , (2V )1/3 ) = 2

So, at the critical point,

Sxx Syy ´ Sxy2 = 2 ˆ 2 ´ 12 = 3 ą 0        Sxx = 2 ą 0

and, by the first bullet point of Theorem 2.3.15, ((2V )1/3 , (2V )1/3 ) is a local minimum and the desired dimensions are

x = y = (2V )1/3        z = (V/4)1/3
Note that our solution has x = y. That’s a good thing — the function S( x, y) is symmetric
in x and y. Because the box has no top, the symmetry does not extend to z.
Example 2.3.20
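As a sanity check on the algebra, we can also minimize S numerically for one specific volume. The sketch below is an optional aside (assuming Python with numpy and scipy available); it uses V = 4, for which the formulas above predict x = y = 2 and z = 1.

import numpy as np
from scipy.optimize import minimize

V = 4.0

def surface(v):
    x, y = v
    return 2*V/y + 2*V/x + x*y   # S(x, y) after eliminating z = V/(x*y)

# Start from an arbitrary interior point; keep x and y away from zero.
res = minimize(surface, x0=[1.0, 1.0], bounds=[(1e-6, None), (1e-6, None)])
x, y = res.x
print(x, y, V/(x*y))   # approximately 2.0, 2.0, 1.0 -- matching x = y = (2V)**(1/3), z = (V/4)**(1/3)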

2.4 Absolute Minima and Maxima


Of course a local maximum or minimum of a function need not be the absolute maximum
or minimum. We’ll now consider how to find the absolute maximum and minimum. Let’s
start by reviewing how one finds the absolute maximum and minimum of a function of
one variable on an interval.
For concreteness, let’s suppose that we want to find the extremal16 values of a function
f ( x ) on the interval 0 ď x ď 1. If an extremal value is attained at some x = a which is in
the interior of the interval, i.e. if 0 ă a ă 1, then a is also a local maximum or minimum
and so has to be a critical point of f . But if an extremal value is attained at a boundary
point a of the interval, i.e. if a = 0 or a = 1, then a need not be a critical point of f . This
happens, for example, when f ( x ) = x. The largest value of f ( x ) on the interval 0 ď x ď 1
is 1 and is attained at x = 1, but f ′ ( x ) = 1 is never zero, so that f has no critical points.

(Figure: the graph of y = f ( x ) = x on the interval 0 ď x ď 1.)

So to find the maximum and minimum of the function f ( x ) on the interval [0, 1], you:

16 Recall that “extremal value” means “either maximum value or minimum value”.


1. build up a list of all candidate points 0 ď a ď 1 at which the maximum or minimum could be attained, by finding all a’s for which either
(a) 0 ă a ă 1 and f ′ ( a) does not exist or
(b) 0 ă a ă 1 and f ′ ( a) = 0 or
(c) a is a boundary point, i.e. a = 0 or a = 1;
2. and then you evaluate f ( a) at each a on the list of candidates. The biggest of these
candidate values of f ( a) is the absolute maximum and the smallest of these candi-
date values is the absolute minimum.
The procedure for finding the maximum and minimum of a function of two variables
f ( x, y) in a set like, for example, the unit disk x2 + y2 ď 1, is similar. You again:
1. build up a list of all candidate points ( a, b) in the set at which the maximum or
minimum could be attained, by finding all ( a, b)’s for which either17
(a) ( a, b) is in the interior of the set and f x ( a, b) or f y ( a, b) does not exist or
(b) ( a, b) is in the interior of the set (for our example, a2 + b2 ă 1) and f x ( a, b) =
f y ( a, b) = 0 or
(c) ( a, b) is a boundary18 point, (for our example, a2 + b2 = 1), and could give the
maximum or minimum on the boundary — more about this shortly —
2. and then you evaluate f ( a, b) at each ( a, b) on the list of candidates. The biggest of
these candidate values of f ( a, b) is the absolute maximum and the smallest of these
candidate values is the absolute minimum.
The boundary of a set in R2 (like x2 + y2 ď 1) is a curve (like x2 + y2 = 1). This curve is a
one dimensional set, meaning that it is like a deformed x-axis. We can find the maximum
and minimum of f ( x, y) on this curve by converting f ( x, y) into a function of one variable
(on the curve) and using the standard function of one variable techniques. This is best
explained by some examples.
Example 2.4.1

Find the maximum and minimum values of f ( x, y) = x3 + xy2 ´ 3x2 ´ 4y2 + 4 on the disk
x2 + y2 ď 1.
Solution. Again, we first find all critical points, and then we analyze the boundary.
Interior: If f takes its maximum or minimum value at a point in the interior, x2 + y2 ă 1,
then that point must be a critical point of f . To find the critical points19 we compute the
first order derivatives.
f x = 3x2 + y2 ´ 6x f y = 2xy ´ 8y

17 This is probably a good time to review the statement of Theorem 2.3.3.


18 It should intuitively obvious from a sketch that the boundary of the disk x2 + y2 ď 1 is the circle
x2 + y2 = 1. But if you really need a formal definition, here it is. A point ( a, b) is on the boundary of a
set S if there is a sequence of points in S that converges to ( a, b) and there is also a sequence of points
in the complement of S that converges to ( a, b).
19 We actually found the critical points in Example 2.3.19. But, for the convenience of the reader, we’ll
repeat that here.


These are polynomials (in two variables) and they are defined everywhere. So the critical
points are the solutions of

f x = 3x2 + y2 ´ 6x = 0 (E1)
f y = 2y( x ´ 4) = 0 (E2)

The second equation, 2y( x ´ 4) = 0, is satisfied if and only if at least one of the two
equations y = 0 and x = 4 is satisfied.

• When y = 0, equation (E1) forces x to obey

0 = 3x2 + 02 ´ 6x = 3x ( x ´ 2)

so that x = 0 or x = 2.

• When x = 4, equation (E1) forces y to obey

0 = 3 ˆ 42 + y2 ´ 6 ˆ 4 = 24 + y2

which is impossible.

So, there are only two critical points: (0, 0), (2, 0).
Boundary: Our boundary is x2 + y2 = 1. We know that ( x, y) satisfies x2 + y2 = 1, and
hence y2 = 1 ´ x2 . Examining the formula for f ( x, y), we see that it contains only even20
powers of y, so we can eliminate y by substituting y2 = 1 ´ x2 into the formula.

f = x3 + x (1 ´ x2 ) ´ 3x2 ´ 4(1 ´ x2 ) + 4 = x + x2

The max and min of x + x2 for ´1 ď x ď 1 must occur either

• when x = ´1 (ñ y = f = 0) or

• when x = +1 (ñ y = 0, f = 2) or

• when 0 = d/dx ( x + x2 ) = 1 + 2x (so x = ´1/2, y = ˘√3/2, f = ´1/4).

Here is a sketch showing all of the points that we have identified.

(Figure: the points (´1, 0), (0, 0), (1, 0), (2, 0), and (´1/2, ˘√3/2) marked in the xy-plane.)

20 If it contained odd powers too, we could consider the cases y ě 0 and y ď 0 separately and substitute y = √(1 ´ x2 ) in the former case and y = ´√(1 ´ x2 ) in the latter case.


Note that the point (2, 0) is outside the allowed region21 . So all together, we have the
following candidates for max and min, with the max and min indicated.
point          (0, 0)    (´1, 0)    (1, 0)    (´1/2, ˘√3/2)
value of f       4          0          2           ´1/4
                max                                 min

Example 2.4.1
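A brute-force numerical check of this answer is easy to do: evaluate f at many points of the disk and record the largest and smallest values. The following sketch is an optional aside (assuming Python with numpy available).

import numpy as np

xs = np.linspace(-1, 1, 801)
X, Y = np.meshgrid(xs, xs)
inside = X**2 + Y**2 <= 1                     # keep only grid points in the disk

F = X**3 + X*Y**2 - 3*X**2 - 4*Y**2 + 4
print(F[inside].max(), F[inside].min())       # approximately 4 and -0.25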

Example 2.4.2

Find the maximum and minimum values of f ( x, y) = xy ´ x3 y2 when ( x, y) runs over the
square 0 ď x ď 1, 0 ď y ď 1.
Solution. As usual, let’s examine the critical points and boundary in turn.
Interior: If f takes its maximum or minimum value at a point in the interior, 0 ă x ă 1,
0 ă y ă 1, then that point must be a critical point of f . To find the critical points we
compute the first order derivatives.
f x ( x, y) = y ´ 3x2 y2 f y ( x, y) = x ´ 2x3 y
Again, these functions are polynomials in two variables and they are smooth everywhere
in their domain, so the gradient exists everywhere in the interior. This means that the
critical points are the solutions of
fx = 0 ðñ y(1 ´ 3x2 y) = 0 ðñ y = 0 or 3x2 y = 1
fy = 0 ðñ x (1 ´ 2x2 y) = 0 ðñ x = 0 or 2x2 y = 1
• If y = 0, we cannot have 2x2 y = 1, so we must have x = 0.
• If 3x2 y = 1, we cannot have x = 0, so we must have 2x2 y = 1. Dividing gives 1 = (3x2 y)/(2x2 y) = 3/2, which is impossible.

So the only critical point in the square is (0, 0). There f = 0. Boundary: The region is a
square, so its boundary consists of its four sides.
• First, we look at the part of the boundary with x = 0. On that entire side f = 0.
• Next, we look at the part of the boundary with y = 0. On that entire side f = 0.
• Next, we look at the part of the boundary with y = 1. There f = f ( x, 1) = x ´ x3 . To
find the maximum and minimum of f ( x, y) on the part of the boundary with y = 1,
we must find the maximum and minimum of x ´ x3 when 0 ď x ď 1.
Recall that, in general, the maximum and minimum of a function h( x ) on the interval a ď x ď b must occur either at x = a or at x = b or at an x for which either h′ ( x ) = 0 or h′ ( x ) does not exist. In this case, d/dx ( x ´ x3 ) = 1 ´ 3x2 , so the max and min of x ´ x3 for 0 ď x ď 1 must occur

21 We found (2, 0) as a solution to the critical point equations (E1), (E2). That’s because, in the course of
solving those equations, we ignored the constraint that x2 + y2 ď 1.


– either at x = 0, where f = 0,
– or at x = 1/√3, where f = 2/(3√3),
– or at x = 1, where f = 0.

• Finally, we look at the part of the boundary with x = 1. There f = f (1, y) = y ´ y2 .


As d/dy (y ´ y2 ) = 1 ´ 2y, the only critical point of y ´ y2 is at y = 1/2. So the max
and min of y ´ y2 for 0 ď y ď 1 must occur

– either at y = 0, where f = 0,
– or at y = 1/2, where f = 1/4,
– or at y = 1, where f = 0.

All together, we have the following candidates for max and min, with the max and min
indicated.

point         (0, 0)   (0, 0ďyď1)   (0ďxď1, 0)   (1, 0)   (1, 1/2)   (1, 1)   (0, 1)   (1/√3, 1)
value of f      0          0             0          0        1/4        0        0     2/(3√3) « 0.385
               min        min           min        min                 min      min         max

(Figure: the unit square with the candidate points (0, 0), (1, 0), (1, 1/2), (1, 1), (1/√3, 1), and (0, 1) marked.)

Example 2.4.2
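The same kind of brute-force check works on the square. This is again an optional aside, assuming Python with numpy available.

import numpy as np

xs = np.linspace(0, 1, 1001)
X, Y = np.meshgrid(xs, xs)
F = X*Y - X**3 * Y**2

print(F.max(), F.min())            # approximately 0.3849 and 0.0
print(2/(3*np.sqrt(3)))            # the exact maximum 2/(3*sqrt(3)) for comparison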

Example 2.4.3

Find the high and low points of the surface z = √( x2 + y2 ) with ( x, y) varying over the square |x| ď 1, |y| ď 1.


Solution. The function f ( x, y) = √( x2 + y2 ) has a particularly simple geometric interpretation — it is the distance from the point ( x, y) to the origin. So

• the minimum of f ( x, y) is achieved at the point in the square that is nearest the
origin — namely the origin itself. So (0, 0, 0) is the lowest point on the surface and
is at height 0.

• The maximum of f ( x, y) is achieved at the points in the square that are farthest from the origin — namely the four corners of the square (˘1, ˘1). At those four points z = √2. So the highest points on the surface are (˘1, ˘1, √2).


Even though we have already answered this question, it will be instructive to see what we would have found if we had followed our usual protocol. The partial derivatives of f ( x, y) = √( x2 + y2 ) are defined for ( x, y) ‰ (0, 0) and are

f x ( x, y) = x/√( x2 + y2 )        f y ( x, y) = y/√( x2 + y2 )

• As we mentioned above, at the point ( x, y) = (0, 0) the gradient is not defined. But
(0, 0) is inside the interior of the domain of our function. Therefore, (0, 0) is a critical
point.

• There are no other critical points because

– f x = 0 only for x = 0, and
– f y = 0 only for y = 0,
– so f x and f y could only vanish simultaneously at (0, 0), where they are not defined.

• The boundary of the square consists of its four sides. One side is

{( x, y) | x = 1, ´1 ď y ď 1}

On this side f = √(1 + y2 ). As √(1 + y2 ) increases with |y|, the smallest value of f on that side is 1 (when y = 0) and the largest value of f is √2 (when y = ˘1). The same thing happens on the other three sides. The maximum value of f is achieved at the four corners. Note that f x and f y are both nonzero at all four corners.

Example 2.4.3

2.4.1 §§ (Optional) Parametrization


To find the extrema of a surface along a boundary, we turn the boundary into a function
of one variable. So far we’ve done this by some combination of (1) solving the boundary
equation for one variable or one carefully-chosen expression, and (2) plugging that into
our surface function to eliminate one variable.
When boundaries are roughly circular (circles, ellipses), there’s another method for
turning them into a function of one variable: parametrization. To parametrize the curve
x2 + y2 = 1 (the unit circle), we define a third variable θ. The points ( x, y) on the curve all
satisfy x = cos θ and y = sin θ. This can sometimes make the work go more smoothly.
Example 2.4.4
Find the maximum and minimum of T ( x, y) = ( x + y) e^(´x2´y2) on the region defined by x2 + y2 ď 1 (i.e. on the unit disk).
Solution. Let’s follow our checklist. First the critical points (where the gradient of our
function does not exist or it exists and is zero), then the boundary.


Interior: If T takes its maximum or minimum value at a point in the interior, x2 + y2 ă 1,


then that point must be a critical point of T. To find the critical points we compute the first
order derivatives.
Tx ( x, y) = (1 ´ 2x2 ´ 2xy) e^(´x2´y2)        Ty ( x, y) = (1 ´ 2xy ´ 2y2 ) e^(´x2´y2)

Tx and Ty exist everywhere, so the gradient is defined at every point in the interior of the region. Moving on, because the exponential e^(´x2´y2) is never zero, the critical points are the solutions of

Tx = 0 ðñ 2x ( x + y) = 1
Ty = 0 ðñ 2y( x + y) = 1

• As both 2x ( x + y) and 2y( x + y) are nonzero, we may divide the two equations, which gives x/y = 1, forcing x = y.

• Substituting this into either equation gives 2x (2x ) = 1 so that x = y = ˘1/2.


So the only critical points are (1/2, 1/2) and (´1/2, ´1/2). Both are in x2 + y2 ă 1.
Boundary: Points on the boundary satisfy x2 + y2 = 1. That is they lie on a circle. We may
use the figure below to express x = cos t and y = sin t, in terms of the angle t. This will
make the formula for T on the boundary quite a bit easier to deal with. On the boundary,
T = (cos t + sin t) e^(´cos2 t ´ sin2 t) = (cos t + sin t) e´1

(Figure: the unit circle, with the point (cos t, sin t) at angle t.)

As all t’s are allowed, this function takes its max and min at zeroes of

dT/dt = (´sin t + cos t) e´1
That is, (cos t + sin t)e´1 takes its max and min
• when sin t = cos t,

• that is, when x = y and x2 + y2 = 1,

• which forces x2 + x2 = 1 and hence x = y = ˘1/√2.

All together, we have the following candidates for max and min, with the max and min
indicated.


point         (1/2, 1/2)       (´1/2, ´1/2)       (1/√2, 1/√2)    (´1/√2, ´1/√2)
value of T    1/√e « 0.61      ´1/√e « ´0.61      √2/e « 0.52     ´√2/e « ´0.52
                 max               min

The following sketch shows all of the critical points. It is a good idea to make such a sketch
so that you don’t accidentally include a critical point that is outside of the allowed region.

(Figure: the unit disk with the points (1/2, 1/2), (´1/2, ´1/2), (1/√2, 1/√2), and (´1/√2, ´1/√2) marked.)

Example 2.4.4
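If you would like to check this numerically, the parametrization makes the boundary easy to scan: just evaluate T (cos t, sin t) for many values of t. This is an optional aside, assuming Python with numpy available.

import numpy as np

def T(x, y):
    return (x + y) * np.exp(-x**2 - y**2)

# Interior critical points found above
print(T(0.5, 0.5), T(-0.5, -0.5))           # approximately +-0.6065 = +-1/sqrt(e)

# Boundary: parametrize the unit circle by x = cos(t), y = sin(t)
t = np.linspace(0, 2*np.pi, 100000)
boundary = T(np.cos(t), np.sin(t))
print(boundary.max(), boundary.min())        # approximately +-0.5203 = +-sqrt(2)/e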

In the last example, we analyzed the behaviour of f on the boundary of the region
of interest by using the parametrization x = cos t, y = sin t of the circle x2 + y2 = 1.
Sometimes using this parametrization is not so clean. And worse, some curves don’t have
such a simple parametrization. For our purposes, we’ll only use parametrization on circles
and ellipses.
Example 2.4.5

The temperature at a point ( x, y) in the disc x2 + y2 ď 4 is given by


T ( x, y) = ( x + y) e^(´x2´y2) .

Find the maximum and minimum temperatures on the disc.


Solution. The specified temperature and its first order derivatives are
T ( x, y) = ( x + y) e^(´x2´y2)
Tx ( x, y) = (1 ´ 2x2 ´ 2xy) e^(´x2´y2)
Ty ( x, y) = (1 ´ 2xy ´ 2y2 ) e^(´x2´y2)

• First, we find the critical points. Tx and Ty are defined at all points in the interior
and therefore the critical points are the solutions of

Tx = 0 ðñ 2x ( x + y) = 1
Ty = 0 ðñ 2y( x + y) = 1

As x + y may not be equal to 0, this forces x = y and then x = y = ˘1/2. So the only critical points are (1/2, 1/2) and (´1/2, ´1/2).


• The boundary x2 + y2 = 4 is a circle of radius 2 centred at the origin. So, on the


boundary, x = 2 cos θ and y = 2 sin θ.
T ( x, y) = ( x + y) e^(´( x2 + y2 ))
T (2 cos θ, 2 sin θ ) = (2 cos θ + 2 sin θ ) e´4 = (2/e4 )(sin θ + cos θ )

This is a periodic function and so takes its max and min at zeroes of

dT/dθ = (2/e4 )(´sin θ + cos θ )

That is, when sin θ = cos θ, which forces sin θ = cos θ = ˘1/√2.

All together, we have the following candidates for max and min.

location      interior           interior            boundary           boundary
point         (1/2, 1/2)         (´1/2, ´1/2)        (√2, √2)           (´√2, ´√2)
value of T    1/√e « 0.61        ´1/√e « ´0.61       2√2/e4 « 0.05      ´2√2/e4 « ´0.05
                 max                 min

The largest and smallest values of T in this table are

min = ´1/√e        max = 1/√e

Example 2.4.5

2.5 Lagrange Multipliers


In the last section we had to solve a number of problems of the form “What is the maxi-
mum value of the function f on the curve C?” In those examples, the curve C was simple
enough that we could reduce the problem to finding the maximum of a function of one
variable. For more complicated problems this reduction might not be possible. In this sec-
tion, we introduce another method for solving such problems. First some nomenclature.

Definition2.5.1.

A problem of the form

“Find the maximum and minimum values of the function f ( x, y) for ( x, y) on


the curve g( x, y) = 0.”

is one type of constrained optimization problem. The function being maximized or


minimized, f ( x, y), is called the objective function. The function, g( x, y), whose
zero set is the curve of interest, is called the constraint function.


Such problems are quite common. As we said above, we have already encountered
them in the last section on absolute maxima and minima, when we were looking for the
extreme values of a function on the boundary of a region. In economics “utility functions”
are used to model the relative “usefulness” or “desirability” or “preference” of various
economic choices. For example, a utility function U (w, κ ) might specify the relative level
of satisfaction a consumer would get from purchasing a quantity w of wine and κ of coffee.
If the consumer wants to spend $100 and wine costs $20 per unit and coffee costs $5 per
unit, then the consumer would like to maximize U (w, κ ) subject to the constraint that
20w + 5κ = 100.

To this point we have always solved such constrained optimization problems either by

• solving g( x, y) = 0 for y as a function of x (or for x as a function of y) or by

• (if you did the optional section) parametrizing the curve


 g( x, y) = 0. This means
writing all points of the curve in the form x (t), y(t) for some functions x (t) and
y(t). For example we used x (t) = cos t, y(t) = sin t as a parametrization of the circle
x2 + y2 = 1 in Example 2.4.4.

However, quite often the function g( x, y) is so complicated that one cannot explicitly solve
g( x, y) = 0 for y as a function of x or for x as a function of y and one also cannot explicitly
parametrize g( x, y) = 0. Or sometimes you can, for example, solve g( x, y) = 0 for y as
a function of x, but the resulting solution is so complicated that it is really hard, or even
virtually impossible, to work with. Direct attacks become even harder in higher dimen-
sions when, for example, we wish to optimize a function f ( x, y, z) subject to a constraint
g( x, y, z) = 0.

There is another procedure called the method of “Lagrange22 multipliers” that comes
to our rescue in these scenarios. Here is the two-dimensional version of the method. There
are obvious analogues in other dimensions.

22 Joseph-Louis Lagrange was actually born Giuseppe Lodovico Lagrangia in Turin, Italy in 1736. He
moved to Berlin in 1766 and then to Paris in 1786. He eventually acquired French citizenship and then
the French claimed he was a French mathematician, while the Italians continued to claim that he was
an Italian mathematician.


Theorem2.5.2 (Lagrange Multipliers).

Let f ( x, y) and g( x, y) have continuous first partial derivatives23 in a region of


R2 that contains the curve S given by the equation g( x, y) = 0. Further assume
that ∇ g( x, y) ‰ 0 on S.
If f , restricted to the curve S, has a local extreme value at the point ( a, b) on S,
then there is a real number λ such that

∇ f ( a, b) = λ∇ g( a, b)

that is

f x ( a, b) = λ gx ( a, b)
f y ( a, b) = λ gy ( a, b)

The number λ is called a Lagrange multiplier.

A proof of this theorem can be found in Appendix A.5.

So to find the maximum and minimum values of f ( x, y) on a curve g( x, y) = 0, assuming that both the objective function f ( x, y) and constraint function g( x, y) have continuous first partial derivatives and that ∇ g( x, y) ‰ 0, you

1. build up a list of candidate points ( x, y) by finding all solutions to the equations

f x ( x, y) = λ gx ( x, y)
f y ( x, y) = λ gy ( x, y)
g( x, y) = 0

Note that there are three equations and three unknowns, namely x, y, and λ.

2. Then you evaluate f ( x, y) at each ( x, y) on the list of candidates. The biggest of these
candidate values is the absolute maximum, if an absolute maximum exists. The
smallest of these candidate values is the absolute minimum, if an absolute minimum
exists.

Theorem 2.5.2 can be extended to functions of more variables in a natural way. Using
higher-dimensional Lagrange isn’t in our learning goals, but for interest, we want you to
see how easily the method generalizes. The calculus is the same – it’s only the algebra that
gets longer.

23 Note that this implies the gradients of these functions are defined in this region


Theorem2.5.3 ((Optional) Lagrange Multipliers for Functions of Three Variables).

Let f ( x, y, z) and g( x, y, z) have continuous first partial derivatives in a region


of R3 that contains the surface S given by the equation g( x, y, z) = 0. Further
assume that ∇ g( x, y, z) ‰ 0 on S.
If f , restricted to the surface S, has a local extreme value at the point ( a, b, c) on
S, then there is a real number λ such that

∇ f ( a, b, c) = λ∇ g( a, b, c)

that is

f x ( a, b, c) = λ gx ( a, b, c)
f y ( a, b, c) = λ gy ( a, b, c)
f z ( a, b, c) = λ gz ( a, b, c)

The number λ is called a Lagrange multiplier.

Now for a bunch of examples.


Example 2.5.4

Find the maximum and minimum of the function x2 ´ 10x ´ y2 on the ellipse whose equa-
tion is x2 + 4y2 = 16.
Solution. For this first example, we’ll do out the algebra in truly gory detail. Once you
get the hang of it, it’ll go much faster.
Our objective function (the one we want to maximize and/or minimize) is f ( x, y) =
x2 ´ 10x ´ y2 and the constraint function is g( x, y) = x2 + 4y2 ´ 16. To apply the method
of Lagrange multipliers we need ∇ f and ∇ g. So we start by computing the first-order
derivatives of these functions.

f x = 2x ´ 10 f y = ´2y gx = 2x gy = 8y

So, according to the method of Lagrange multipliers, we need to find all solutions to the
following system of equations.

f x = λgx 2x ´ 10 = λ(2x ) (E1)


f y = λgy ùñ ´2y = λ(8y) (E2)
g( x, y) = 0 x2 + 4y2 ´ 16 = 0 (E3)

(E1) In equation (E1), if 2x is nonzero, then we can divide both sides of the equation by it,
x´5
to find λ = 2x´10
2x , i.e. λ = . If 2x = 0, then the equation becomes ´10 = 0λ,
x
which is not true for any λ.


(E2) In equation (E2), if 8y is nonzero, then we can divide both sides of the equation by it, to find λ = ´2y/(8y), i.e. λ = ´1/4. If 8y = 0, then we also get a solution y = 0 for any λ.

(E1)+(E2) We need all three equations to be true at the same time (that is, for the same
values of x, y, and λ). We’ve found two ways for both (E1) and (E2) to be true.

• First way: λ = ( x ´ 5)/x and λ = ´1/4
• Second way: λ = ( x ´ 5)/x and y = 0

(E3) Now we’ll see which points make (E1) and (E2) true while also making (E3) true.

• First way: λ = ( x ´ 5)/x and λ = ´1/4

      ( x ´ 5)/x = ´1/4
  ùñ  ´4x + 20 = x
  ùñ  x = 4

In order to satisfy (E3):

0 = 42 + 4y2 ´ 16
0=y

So, the point ( x, y) = (4, 0) satisfies all three equations.


• Second way: λ = ( x ´ 5)/x and y = 0. If y = 0, then from (E3), we see

  0 = x2 + 4(0)2 ´ 16
  16 = x2
  x = ˘4

So the points to consider are ( x, y) = (˘4, 0) .

Now we’ve found the only possible solutions to all three equations: (˘4, 0). (λ has
to exist, but we don’t actually care what it is.) So the method of Lagrange multipliers,
Theorem 2.5.2, gives that the only possible locations of the maximum and minimum of
the function f are (4, 0) and (´4, 0). To complete the problem, we only have to compute f
at those points.

point (4, 0) (´4, 0)


value of f ´24 56
min max


Hence the maximum value of x2 ´ 10x ´ y2 on the ellipse is 56 and the minimum value is
´24.

(Figure: the ellipse x2 + 4y2 = 16 with the candidate points (4, 0) and (´4, 0) marked.)

Example 2.5.4
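A computer algebra system can solve the three Lagrange equations (E1)–(E3) directly, which is a useful way to check your candidate points. The sketch below is an optional aside, assuming Python with the sympy library available.

import sympy as sp

x, y, lam = sp.symbols('x y lambda', real=True)
f = x**2 - 10*x - y**2
g = x**2 + 4*y**2 - 16

eqs = [sp.diff(f, x) - lam*sp.diff(g, x),
       sp.diff(f, y) - lam*sp.diff(g, y),
       g]
sols = sp.solve(eqs, [x, y, lam], dict=True)
for s in sols:
    print(s[x], s[y], f.subs(s))    # expected: the candidates (4, 0) and (-4, 0), with f = -24 and 56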

In the previous example, we had to make a lot of decisions about how to solve for the
solutions to the system of three equations. Actually, we can start our Lagrange system-
solving the same way every time. The first observation we make is that the partial deriva-
tives of g can be 0, or nonzero. If they’re zero, this may or may not lead to a solution; if
they’re nonzero, this tells us something about λ.
In the textbook and problem book, we will consistently use the same method to solve
the system of equations. It’s certainly not the only way, and you are free to use other
methods. Once you get used to the computations, you’ll probably start finding ways to
make them faster based on the specifics of individual problems.
Example 2.5.5 (Solving Lagrange in General)

Suppose you want to find all points ( x, y) for which a solution exists to the system below.

f x = λgx (E1)
f y = λgy (E2)
g( x, y) = 0 (E3)

where λ is some real constant. Our method below will hinge on the observation from the
last example that we get different solutions for zero vs. nonzero partial derivatives of the
constraint.
• If gx ‰ 0 and gy ‰ 0, then from (E1) we see λ = f x /gx , and from (E2) we see λ = f y /gy . So, choosing a pair ( x, y) such that

  f x /gx = f y /gy
means that for some λ, that pair makes (E1) and (E2) true. Simplify the equation
above to find the necessary relationship between x and y, then find which pairs with
that relationship make (E3) true.

• If gx = 0, then from (E1) we see also f x = 0. Then (E1) is true for any λ that we like.
We can check that there exists some λ that makes (E2) true as well. Then, we find
the points ( x, y) that make (E3) true as well as gx = f x = 0.


• If gy = 0, then from (E2) we see also f y = 0. Then (E2) is true for any λ that we like.
We can check that there exists some λ that makes (E1) true as well. Then, we find
the points ( x, y) that make (E3) true as well as gy = f y = 0.

Sometimes, one or more of these cases won’t lead to any solutions. In Example 2.5.4,
we were immediately able to discard the possibility gx = 0, because it didn’t lead to
a solution. Once you’re practiced with these types of problems, you’ll often see quite
quickly which cases you get to discard.
Example 2.5.5

We’ll apply our three-case breakdown in subsequent examples.

Example 2.5.6

Find the minimum and maximum values of the objective function


   
2 2
f ( x, y) = ln x ´ 2x + 5 + ln y ´ 4y + 13

subject to the constraint


x2 ´ 2x + y2 ´ 4y = 20

Solution. Our constraint function is

g( x, y) = x2 ´ 2x + y2 ´ 4y ´ 20 = 0

We start by setting up the first two equations from the method of Lagrange multipliers.

f x = λgx        (2x ´ 2)/( x2 ´ 2x + 5) = λ(2x ´ 2)       (E1)
f y = λgy        (2y ´ 4)/(y2 ´ 4y + 13) = λ(2y ´ 4)       (E2)
g( x, y) = 0     x2 ´ 2x + y2 ´ 4y = 20                    (E3)

Now we consider our three cases.

• gx ‰ 0 and gy ‰ 0. From (E1), this means λ = 1/( x2 ´ 2x + 5). From (E2), λ = 1/(y2 ´ 4y + 13). So

  1/( x2 ´ 2x + 5) = 1/(y2 ´ 4y + 13)
  x2 ´ 2x + 5 = y2 ´ 4y + 13
  x2 ´ 2x = y2 ´ 4y + 8


This gives us the relationship between x and y that must hold for (E1) and (E2) to be
true under the assumption gx ‰ 0 and gy ‰ 0. Now, in order for (E3) to be true as
well:
0 = ( x2 ´ 2x ) + y2 ´ 4y ´ 20
  = (y2 ´ 4y + 8) + y2 ´ 4y ´ 20
  = 2y2 ´ 8y ´ 12
0 = y2 ´ 4y ´ 6
y = (4 ˘ √(16 ´ 4(1)(´6)))/2 = (4 ˘ √40)/2 = 2 ˘ √10

So, 0 = ( x2 ´ 2x ) + y2 ´ 4y ´ 20
      = x2 ´ 2x + (2 ˘ √10)2 ´ 4(2 ˘ √10) ´ 20
      = x2 ´ 2x + (4 ˘ 4√10 + 10) ´ 8 ¯ 4√10 ´ 20        (note ˘4√10 ¯ 4√10 = 0)
      = x2 ´ 2x + 4 + 10 ´ 8 ´ 20
      = x2 ´ 2x ´ 14
x = (2 ˘ √(4 ´ 4(´14)))/2 = (2 ˘ 2√15)/2 = 1 ˘ √15

This gives us four points to consider: (1 + √15, 2 + √10), (1 ´ √15, 2 + √10), (1 + √15, 2 ´ √10), and (1 ´ √15, 2 ´ √10).
• If gx = 0, then x = 1, and (E1) is true for any λ. Then we can choose whatever λ is necessary to make (E2) true. By (E3):

  0 = x2 ´ 2x + y2 ´ 4y ´ 20
    = 1 ´ 2 + y2 ´ 4y ´ 20
    = y2 ´ 4y ´ 21
    = (y ´ 7)(y + 3)
  y = 7, y = ´3
• If gy = 0, then y = 2, and (E2) is true for any λ. Then we can choose whatever λ is necessary to make (E1) true. By (E3):

  0 = x2 ´ 2x + y2 ´ 4y ´ 20
    = x2 ´ 2x + 4 ´ 8 ´ 20
    = x2 ´ 2x ´ 24
    = ( x ´ 6)( x + 4)
  x = 6, x = ´4


So, all together we have eight points that satisfy our three Lagrange equations. It’s left
only to decide which of those points lead to maxima and to minima.

point         (1 + √15, 2 + √10)   (1 ´ √15, 2 + √10)   (1 + √15, 2 ´ √10)   (1 ´ √15, 2 ´ √10)
value of f          ln 361               ln 361               ln 361               ln 361
                     max                  max                  max                  max

point         (´4, 2)   (6, 2)   (1, 7)   (1, ´3)
value of f    ln 261    ln 261   ln 136   ln 136
                                  min       min

Our maximum value is ln 361, and our minimum value is ln 136.


Example 2.5.6

Example 2.5.7

Find the ends of the major and minor axes of the ellipse 3x2 ´ 2xy + 3y2 = 4. They are the
points on the ellipse that are farthest from and nearest to the origin.
Solution. Let ( x, y) be a point on 3x2 ´ 2xy + 3y2 = 4. This point is at the end of a major axis when it maximizes its distance from the centre of the ellipse, (0, 0). It is at the end of a minor axis when it minimizes its distance from (0, 0). So we wish to maximize and minimize the distance √( x2 + y2 ) subject to the constraint

g( x, y) = 3x2 ´ 2xy + 3y2 ´ 4 = 0

Now maximizing/minimizing √( x2 + y2 ) is equivalent24 to maximizing/minimizing its square (√( x2 + y2 ))2 = x2 + y2 . So we are free to choose the objective function

f ( x, y) = x2 + y2

which we will do, because it makes the derivatives cleaner. Again, we use Lagrange
multipliers to solve this problem, so we start by finding the partial derivatives.

f x ( x, y) = 2x f y ( x, y) = 2y gx ( x, y) = 6x ´ 2y gy ( x, y) = ´2x + 6y

We need to find all solutions to

2x = λ(6x ´ 2y) (E1)


2y = λ(´2x + 6y) (E2)
3x2 ´ 2xy + 3y2 ´ 4 = 0 (E3)

24 The function S(z) = z2 is a strictly increasing function for z ě 0. So, for a, b ě 0, the statement “a ă b”
is equivalent to the statement “S( a) ă S(b)”.


• If gx ‰ 0 and gy ‰ 0, then λ = 2x/(6x ´ 2y) = x/(3x ´ y) by (E1), and λ = 2y/(´2x + 6y) = y/(´x + 3y) by (E2).

  x/(3x ´ y) = y/(´x + 3y)
  ´x2 + 3xy = 3xy ´ y2
  x2 = y2
  x = ˘y

So if x = ˘y, then the appropriate λ will make both (E1) and (E2) true. Now let’s see
what makes (E3) true.

4 = 3x2 ´ 2xy + 3y2
4 = 3(˘y)2 ´ 2(˘y)y + 3y2
  = 3y2 ¯ 2y2 + 3y2
  = (6 ¯ 2) y2
4 = (6 + 2) x2  ùñ  x = ˘1/√2   when x = ´y
4 = (6 ´ 2) x2  ùñ  x = ˘1      when x = y

 
This gives us four points to check: the two points ˘(1/√2, ´1/√2) and the two points ˘(1, 1).

• If gx = 0, then 6x ´ 2y = 0, i.e. y = 3x. By (E1), x = 0, so y = 0. Then (E3) doesn’t


hold, so this leads to no solutions.

• If gy = 0, then ´2x + 6y = 0, i.e. x = 3y. By (E2), y = 0, so x = 0. Then (E3) doesn’t


hold, so this leads to no solutions.

The distance from (0, 0) to ˘(1, 1), namely √2, is larger than the distance from (0, 0) to ˘(1/√2, ´1/√2), namely 1. So the ends of the minor axes are ˘(1/√2, ´1/√2) and the ends of the major axes are ˘(1, 1). Those ends are sketched in the figure on the left below. Once we have the ends, it is an easy matter25 to sketch the ellipse as in the figure on the right below.

25 if you tilt your head so that the line through (1, 1) and (´1, ´1) appears horizontal


(Figure: left, the axis ends ˘(1, 1) and ˘(1, ´1)/√2; right, the ellipse 3x2 ´ 2xy + 3y2 = 4 sketched through them.)

Example 2.5.7

In the previous examples, the objective function and the constraint were specified ex-
plicitly. That will not always be the case. In the next example, we have to do a little
geometry to extract them.
Example 2.5.8

Find the rectangle of largest area (with sides parallel to the coordinates axes) that can be
inscribed in the ellipse x2 + 2y2 = 1.
Solution. Since this question is so geometric, it is best to start by drawing a picture.

(Figure: the ellipse x2 + 2y2 = 1 with an inscribed rectangle whose corners are (˘x, ˘y).)

Call the coordinates of the upper right corner of the rectangle ( x, y), as in the figure
above. Note that x ě 0 and y ě 0; and if x = 0 or y = 0, then the area of the rectangle is
0, which is certainly not a maximum. So the global maximum must occur at some point
where x and y are both positive. This will also be a local maximum, so we should be able
to find it using the method of Lagrange multipliers.
The four corners of the rectangle are (˘x, ˘y) so the rectangle has width 2x and height
2y and the objective function is f ( x, y) = 4xy. The constraint function for this problem is
g( x, y) = x2 + 2y2 ´ 1. Again, to use Lagrange multipliers we need the first order partial
derivatives.
f x = 4y f y = 4x gx = 2x gy = 4y


So, according to the method of Lagrange multipliers, we need to find all solutions to

4y = λ(2x ) (E1)
4x = λ(4y) (E2)
x2 + 2y2 ´ 1 = 0 (E3)

• If gx ‰ 0 and gy ‰ 0, then λ = 4y/(2x ) = 2y/x from (E1) and λ = 4x/(4y) = x/y from (E2). So,

  2y/x = x/y
  2y2 = x2
  x = (˘√2) y

From (E3),

  ((˘√2)y)2 + 2y2 ´ 1 = 0
  2y2 + 2y2 = 1
  4y2 = 1
  y = ˘1/2
  x = (˘√2) y = ˘1/√2

So there are four points to consider: (˘1/√2, ˘1/2).
• If gx = 0, i.e. 2x = 0, then x = 0; by (E1) also y = 0; but then (E3) fails. So this


doesn’t give us any more points to consider.

• If gy = 0, i.e. 4y = 0, then y = 0; by (E2) also x = 0; but then (E3) fails. So this


doesn’t give us any more points to consider either.

We now have four possible values of ( x, y), namely (1/√2, 1/2), (´1/√2, ´1/2), (1/√2, ´1/2), and (´1/√2, 1/2). They are the four corners of a single rectangle. We said that we wanted ( x, y) to be the upper right corner, i.e. the corner in the first quadrant. It is (1/√2, 1/2).
How do we interpret the other three points we found? The global min of the function
4xy subject to the constraint x2 + 2y2 = 1 will occur at one of these points, but those
points aren’t in our model domain. When x and y have different signs, 4xy no longer
gives the area of a rectangle, since it’s negative. Over our model domain, we kind of
have “endpoints:” x = 0 and y = 0. Our maximum occurred somewhere between our
endpoints; our model minimum occurs at the endpoints.
Example 2.5.8


2.5.1 §§ Bounded vs Unbounded Constraints


In the last example, we had to think a little extra about whether the solution to the La-
grange equations gave a maximum or minimum. Take a closer look at Theorem 2.5.2: all
local extrema will occur at a solution point. So when do the solution points definitely also
include all absolute extrema?

1. If our constraint function is a closed curve (circle, ellipse, square, etc.) and our ob-
jective function is continuous over it, then there will certainly be an absolute max
and absolute min over the constraint; and these will certainly also be local extrema.
So when our constraint is a closed curve, and our objective function is continuous
over it, we are guaranteed that the absolute max and min exist, and are at points that
satisfy the Lagrange equations.

In Section 2.4 we considered domains that were bounded by a closed curve, so we


only considered boundaries of this type.

2. If our constraint function is not a closed curve (e.g. a line, a line segment, a function
like xy = 1, etc.) then the system is more complicated. Assume that the objective
function is continuous over the constraint curve. Since our constraint curve is one-
dimensional (like a line, but a line that has some orientation in space), we’re in a
similar position as we were in single-variable calculus: extrema can occur at end-
points, or at “critical points.” In our case, “critical points” translate to solutions to
the Lagrange equations; “endpoints” mean pretty much the same thing they always
have.

(a) If the constraint function is bounded, we must consider its endpoints as well
as solutions to the Lagrange system. There will be an absolute maximum and
minimum, and these will definitely occur at solutions to the Lagrange system
or at the endpoints of the constraint.

(b) If the constraint function is unbounded, there may or may not exist absolute
extrema. This is where you’ll most heavily rely on your intuition about function
shape and behaviour. Limits can be useful here.


Example 2.5.9

Find the values of w ě 0 and κ ě 0 that maximize the utility function

U (w, κ ) = 6w2/3 κ 1/3

subject to the constraint 4w + 2κ = 12
Solution. The constraint 4w + 2κ = 12 is simple enough that we can easily use it to
express κ in terms of w, then substitute κ = 6 ´ 2w into U (w, κ ), and then maximize
U (w, 6 ´ 2w) = 6w2/3 (6 ´ 2w)1/3 using the techniques of last semester.
However, for practice purposes, we’ll use Lagrange multipliers with the objective func-
tion U (w, κ ) = 6w2/3 κ 1/3 and the constraint function g(w, κ ) = 4w + 2κ ´ 12. The first order
derivatives of these functions are
Uw = 4w´1/3 κ 1/3        Uκ = 2w2/3 κ ´2/3
gw = 4                   gκ = 2
The boundary values (“endpoints”) w = 0 and κ = 0 give utility 0, which is obviously not
going to be the maximum utility. So it suffices to consider only local maxima. According
to the method of Lagrange multipliers, we need to find all solutions to

4w´1/3 κ 1/3 = 4λ        (E1)
2w2/3 κ ´2/3 = 2λ        (E2)
4w + 2κ ´ 12 = 0         (E3)
Then we see gw ‰ 0 and gκ ‰ 0, so we only have one of our usual three cases.
• equation (E1) gives λ = w´1/3 κ 1/3 .
• Substituting this into (E2) gives w2/3 κ ´2/3 = λ = w´1/3 κ 1/3 and hence w = κ.
• Then substituting w = κ into (E3) gives 6κ = 12.
So w = κ = 2 and the maximum utility is U (2, 2) = 12.
Note in this example we had a bounded (but not closed) curve. It has endpoints (0, 6)
and (3, 0). Since the maximum didn’t occur at the endpoints, then the global maximum
was also a local maximum, and so it showed up as a solution to the system of Lagrange
equations.
Example 2.5.9
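Since the budget constraint lets us write κ = (12 ´ 4w)/2, a quick numerical scan over 0 ď w ď 3 confirms the answer. This is an optional aside, assuming Python with numpy available.

import numpy as np

w = np.linspace(0, 3, 300001)          # the budget 4w + 2k = 12 forces 0 <= w <= 3
k = (12 - 4*w) / 2                     # substitute the budget constraint
U = 6 * w**(2/3) * k**(1/3)

i = np.argmax(U)
print(w[i], k[i], U[i])                # approximately w = 2, k = 2, U = 12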

Chapter 2 was adapted from Chapter 2 of CLP 3 – Multivariable Calculus by Feldman,


Rechnitzer, and Yeager under a Creative Commons Attribution-NonCommercial-ShareAlike
4.0 International license.


2.6 (Optional) Utility and Demand Functions


Economists tend to measure utility — also known as happiness. The utility function gives
the satisfaction26 a consumer gains when consuming a particular amount of goods or
services. Utility functions often give rise to constrained optimization problems: given
finite resources, what amount of consumption will maximize happiness?
The amount consumed should be a non-negative number, so we’ll restrict our domains
accordingly.
Suppose u( x ) is the utility function for quantity x of some good. If “more is better,”
then we expect du/dx ą 0 for all nonnegative x (i.e. for all x in the domain). We can think of du/dx as marginal utility: the gain in happiness of getting just a little more of something. If the good is subject to “diminishing returns,” then we expect d²u/dx² ă 0. That’s because as we get more of the good (i.e. as x increases) each additional unit brings us less happiness than the last. Our marginal utility du/dx is decreasing, meaning d²u/dx² ă 0.
Utility functions can encompass more than one good. A multivariable utility function
u( x, y) might give the happiness associated with consuming quantity x of one good along-
side quantity y of another good. Just like with single-variable utility functions, if “more is better” then ∂u/∂x ą 0 and ∂u/∂y ą 0 everywhere. If there are diminishing returns, then also ∂²u/∂x² ă 0 and ∂²u/∂y² ă 0 everywhere.

2.6.1 §§ Constrained Optimization of the Utility Function

Example 2.6.1

Suppose
u( x, y) = x2 + 2y
gives the utility of x and y units of two goods, respectively, for a particular consumer.
For these goods, “more is better” (because u x ą 0 and uy ą 0 for all non-negative x and y) without diminishing returns. (Indeed, ∂²u/∂x² ą 0, meaning each subsequent unit of the good associated with x brings more happiness than the last.)
Suppose the consumer has 10 dollars, and x and y cost 2 and 3 dollars per unit respec-
tively. Find the optimal consumption of x and y, and the corresponding maximum utility.

Solution. This is a constrained optimization problem. Our objective function (what we


want to maximize) is
u( x, y) = x2 + 2y
Our constraint function comes from our budget:

g( x, y) = 2x + 3y ´ 10 = 0

26 Utilities are meaningful in comparison to one another, but generally don’t have particular units. It’s
hard to say what exactly it means to have “one-and-a-half satisfaction,” but we can say that a utility of
1.5 is better than a utility of 1.25.


(Since “more is better,” there’s no incentive to spend less than our budget of ten dollars.)
We can solve this by substitution. From our constraint, we see y = (10 ´ 2x )/3. That turns our utility function into the following:

u( x, y) = x2 + 2y = x2 + (2/3)(10 ´ 2x ) = x2 ´ (4/3) x + 20/3
This is a parabola pointing up, so its maximum will be at an endpoint of our interval.
Since x and y are quantities, we require x ě 0 and y ě 0.
0 ď y = (10 ´ 2x )/3  ùñ  x ď 5

Our model domain is 0 ď x ď 5. The endpoint x = 5 corresponds to all $10 going to the first good (and y = 0). The endpoint x = 0 corresponds to all $10 going to the second good (with y = 10/3).

u(5, 0) = 52 + 2(0) = 25
u(0, 10/3) = 02 + 2(10/3) = 20/3
Our utility is maximized when we spend all $10 on the first good, purchasing x = 5
and y = 0. That maximum utility is 25.
Example 2.6.1

Two functions that are often used to model utility are natural logarithms and the Cobb-
Douglas function:

u( x, y) = α ln( x ) + β ln(y) u( x, y) = Ax α y β

(where A, α, and β are constants.) The reasons why these equations make good models go
beyond the scope of this class. However, you have all the tools required to solve problems
involving these two equations.
Example 2.6.2

Alejandro has recently found a true passion for baking. He likes making two types of
bread: ciabatta (c) and pita (p). Ciabatta costs 20 dollars per unit to make and pita 10
dollars per unit. Alejandro wants to spend 60 dollars on bread, and his utility function27
is as follows:
u(c, p) = ln(c) + 2 ln( p)
Find the optimal consumption for Alejandro and the corresponding maximum utility.
Solution. The utility function will be the objective function and the constraint will be the

27 We’re not averse to having negative utility values. Again, utility doesn’t have absolute units, but rather
is useful as a relative scale. Higher utility is better, whether the numbers are positive or not.
This particular utility function has the interesting feature that c = 0 or p = 0 will minimize utility. This is
actually a common property of utility functions. It avoids having an optimal solution where one good
is not consumed at all.


budget constraint. The budget constraint is 20c + 10p = 60. We can find the maximum
utility using substitution or the method of Lagrange multipliers.
Solution 1: substitution
Since 20c + 10p = 60, we see p = 6 ´ 2c. Then our utility function is:

u(c, p) = ln(c) + 2 ln( p) = ln(c) + 2 ln (6 ´ 2c)

Using log rules,


 
u(c, p) = ln(c) + ln((6 ´ 2c)2 )
        = ln(c(6 ´ 2c)2 )

Much like the square root function, the natural logarithm is an increasing function. So, the maximum of ln(c(6 ´ 2c)2 ) will occur at the same place as the maximum of c(6 ´ 2c)2 , provided that maximum is positive (and thus in the domain of the logarithmic function).
mic function).

f (c) = c(6 ´ 2c)2

Using the product rule,

f ′(c) = c ¨ 2(6 ´ 2c)(´2) + (6 ´ 2c)2
      = (6 ´ 2c) [´4c + (6 ´ 2c)]
      = 12(3 ´ c)(1 ´ c)
The critical points of f (c) are c = 1 and c = 3.

f (1) = 16
f (3) = 0

We also need to check the endpoints of our interval. Since p ě 0, then:

0 ď p = 6 ´ 2c ùñ c ď 3
The endpoints of our interval are c = 0 (all pita) and c = 3 (all ciabatta). We’ve already
found f (3) = 0.

f (0) = 0

The function c(6 ´ 2c)2 has a maximum of 16 when c = 1, so the function ln c(6 ´ 2c)2
has a maximum of ln 16 when c = 1. Since c = 1 means p = 4, utility is maximized when
Alejandro spends $20 on ciabatta, and $40 on pita.
Solution 2: Lagrange

u c = λgc               1/c = λ ¨ 20
u p = λg p      ùñ      2/p = λ ¨ 10
g(c, p) = 0             20c + 10p ´ 60 = 0


From the first two equations, we see

λ = 1/(20c) = 2/(10p)
p = 4c

From the constraint equation,

0 = 20c + 10p ´ 60 = 20c + 10(4c) ´ 60
c = 1        p = 4
So, the point c = 1 is a point to check. We should also check the endpoints of our
interval, c = 0 and c = 3. Note both these cause the utility to go to negative infinity – so
they are minima. That tells us c = 1, p = 4 gives us our constrained maxima. The utility
of spending $20 on ciabatta and $40 on pita is ln 1 + 2 ln 4 = ln(16).
Example 2.6.2
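Either solution method can be checked with a computer algebra system. The sketch below follows the substitution approach; it is an optional aside, assuming Python with the sympy library available.

import sympy as sp

c = sp.symbols('c', positive=True)
p = 6 - 2*c                               # the budget constraint 20c + 10p = 60, solved for p
U = sp.log(c) + 2*sp.log(p)

crit = sp.solve(sp.diff(U, c), c)         # expected: [1]
print(crit, U.subs(c, crit[0]))           # utility equal to ln(16) at c = 1, p = 4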

2.6.2 §§ Demand Curves


A demand curve gives the relationship between the amount of a good a consumer would
buy and the price of that good. We assume the consumer would buy the amount that
maximizes their utility function, given their budget constraints.
In Examples 2.6.1 and 2.6.2 we found “optimal consumption” when the price and bud-
get were fixed numbers. So secretly, we were finding a point of the demand function. If
we keep price and budget as variables, the optimal consumption (as a function of price
and budget) will give the general demand function. This is sometimes formally referred
to as Marshallian demand.
Definition2.6.3 (Marshallian demand).

Suppose u( x, y) is the utility function for quantities x and y of two goods. Let
these goods have unit prices p x and py , respectively, and let the consumer have a
budget I. Then the function
x m ( p x , py , I )
giving the optimal consumption of x to maximize u( x, y) subject to the budget
constraint p x x + py y = I is called the Marshallian demand function.

Note: the superscript m in the function name x m isn’t a power. Rather than denoting a
variable, m simply stands for “Marshallian.”
Example 2.6.4

Lets go back to Alejandro and his passion for baking. This weekend he would like to make
ciabatta (c) and focaccia (f). Ciabatta costs pc dollars to make and focaccia p f dollars28 . For

28 When he knows we’ll be thinking about price p, Alejandro helpfully bakes a type of bread that does
not start with “p”.


this weekend, Alejandro wants to spend I dollars on bread, and his utility function is as
follows:
u(c, f ) = ln(c) + 2 ln( f )
Find the optimal consumption for Alejandro of each bread type.
Solution. The utility function will be the objective function and the constraint will be the
budget constraint. The budget constraint is pc c + p f f = I ñ b(c, f ) = cpc + f p f ´ I.
As in Example 2.6.2, the endpoints of our interval (c = 0, f = 0) minimize utility, so
the maximum will be at some interior point. We can find it using the method of Lagrange
multipliers.

u c = λ ¨ bc                 1/c = λ ¨ pc
u f = λ ¨ b f      ùñ        2/ f = λ ¨ p f
b(c, f ) = 0                 cpc + f p f ´ I = 0

The first two equations tell us

λ = 1/(c ¨ pc ) = 2/( f ¨ p f )
f = 2c ( pc /p f )

From the budget constraint,

0 = cpc + f p f ´ I
  = cpc + 2c ( pc /p f ) p f ´ I
  = cpc + 2cpc ´ I
I = 3cpc
c = I/(3pc )

Using the relationship between f and c,

f = 2c ( pc /p f ) = 2I/(3p f )

The point c = I/(3pc ), f = 2I/(3p f ) is the only point to consider for a max. Since c ě 0 and f ě 0, it’s within our model domain. So, it gives the optimal consumption of ciabatta and focaccia.
Let’s think of the optimal consumption of each bread type as a function of budget and
prices, and name these functions cm and f m (m for “Marshallian”). Then

cm ( I, pc , p f ) = I/(3pc )        f m ( I, pc , p f ) = 2I/(3p f )


give the Marshallian demand curves for ciabatta and focaccia, respectively.
Example 2.6.4
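These demand functions are easy to spot-check numerically. The Python sketch below uses made-up values for the budget and prices (I = 60, p_c = 5, p_f = 4, chosen only for illustration) and verifies that no other affordable bundle beats the predicted one.

    import math

    # Hypothetical numbers, for illustration only: budget $60, p_c = 5, p_f = 4.
    I, pc, pf = 60.0, 5.0, 4.0

    c_star = I / (3 * pc)        # predicted Marshallian demand for ciabatta: 4.0
    f_star = 2 * I / (3 * pf)    # predicted Marshallian demand for focaccia: 10.0
    u_star = math.log(c_star) + 2 * math.log(f_star)

    # Brute-force check: scan bundles (c, f) along the budget line p_c c + p_f f = I.
    best_u, best_c = float("-inf"), None
    for k in range(1, 12000):
        c = k * 0.001
        f = (I - pc * c) / pf
        if f <= 0:
            break
        u = math.log(c) + 2 * math.log(f)
        if u > best_u:
            best_u, best_c = u, c

    print(c_star, f_star, round(u_star, 4))   # 4.0 10.0 5.9915
    print(round(best_u, 4), best_c)           # the grid maximum agrees, attained near c = 4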
It is possible to use the Marshallian Demand to do some analysis on the goods. A normal
good is defined as a product for which quantity demanded increases as income increases.
An inferior good is defined as a product for which quantity demanded decreases as income
increases.
Definition 2.6.5 (Normal and Inferior Goods).

Let x^m(p_x, p_y, I) be the Marshallian demand function of a good when the price of
that good is p_x, the price of another good is p_y, and the budget constraint is I. If

    ∂x^m(p_x, p_y, I) / ∂I > 0

everywhere then the good is a normal good. If

    ∂x^m(p_x, p_y, I) / ∂I < 0

everywhere then the good is an inferior good.

You can go back to Alejandro’s example 2.6.4 and verify that, in that case, both are
normal goods.

Theorem 2.6.6 (Normal and Inferior Goods).

Let X and Y be two goods with positive unit prices p x and py , respectively, subject
to the budget constraint p x x + py y = I. If X is an inferior good, then Y is a normal
good.

Proof. Let x m and ym be the Marshallian demand functions of X and Y, respectively.


For any quantities x and y, we must satisfy the budget constraint p x x + py y = I. So:

    y = (I − p_x x) / p_y

When x = x m , then y = ym , so:


    y^m = (I − p_x x^m) / p_y

    ⟹  ∂y^m/∂I = ∂/∂I [ (I − p_x x^m) / p_y ]
               = ∂/∂I [ I/p_y − (p_x/p_y) x^m ]
               = 1/p_y − (p_x/p_y) · ∂x^m/∂I


Since X is an inferior good, by Definition 2.6.5, ∂x^m/∂I < 0. Since also p_x and p_y are
positive, the term −(p_x/p_y) · ∂x^m/∂I is positive:

    ∂y^m/∂I ≥ 1/p_y > 0
So, Y is a normal good by Definition 2.6.5.
Using the first partial derivative, we can also analyse how changes in prices affect the
Marshallian Demand.
Definition 2.6.7 (Price Effect).

Let x^m(p_x, p_y, I) be a Marshallian demand function for a good X whose quantity
is given by x and unit price is given by p_x, in relation to another good Y whose
quantity is given by y and whose unit price is given by p_y. Then

    ∂x^m(p_x, p_y, I) / ∂p_x

is the rate of change of x^m (the optimal consumption of X) relative to the price of
X. We call this the price effect of X on x^m. Similarly,

    ∂x^m(p_x, p_y, I) / ∂p_y

is the rate of change of x^m (the optimal consumption of X) relative to the price of Y.
This is the price effect of Y on x^m.

Example 2.6.8 (Price effects)

In Example 2.6.4, if the price of making focaccia increases, how will this affect the amount
of ciabatta Alejandro makes? Assume everything else stays the same – the utility function
stays the same, the price of ciabatta stays the same, the budget I stays the same, and the
assumption remains that Alejandro will maximize utility subject to his budget constraint.
Are the ciabatta and focaccia normal goods, or inferior goods?
Solution. Surprisingly, the price of focaccia doesn’t affect the consumption of ciabatta at
all! The Marshallian demand of ciabatta is
    c^m(I, p_c, p_f) = I / (3 p_c)
Since p f doesn’t even show up, the derivative is easy to take:
    ∂c^m/∂p_f = 0
That means the price effect of focaccia on Alejandro’s ciabatta baking is zero. (If the price
of focaccia goes up, the only impact on his baking habits is that he will make less focaccia.)


To decide whether ciabatta and focaccia are normal or inferior goods, we should take
their partial derivatives with respect to I.

    c^m(I, p_c, p_f) = I / (3 p_c)        f^m(I, p_c, p_f) = 2I / (3 p_f)

    ∂c^m/∂I = 1 / (3 p_c) > 0             ∂f^m/∂I = 2 / (3 p_f) > 0

Since both derivatives are positive everywhere, both breads are normal goods.
Example 2.6.8

Example 2.6.9

Kenechukwu is doing groceries for the week, and as usual he has I dollars to spend on
fruits and berries. If he consumes a kg of apples and s kg of strawberries, then his utility
function is:
    u(a, s) = a^(1/2) s^(1/4)

Apples cost p a dollars per kg, and strawberries cost ps dollars per kg.
Find Kenechukwu’s Marshallian demand function for apples. What is the price effect
of ps on apples? Are apples normal or inferior goods?
Solution. The utility function will be the objective function and the constraint will be the
budget constraint. As in Example 2.6.2, the endpoints a = 0 and s = 0 are minima of the
utility function. (We see this because setting either a = 0 or s = 0 leads to u = 0; and since
u involves even roots, it never returns a negative value.) So, the maximum will happen at
some internal point, which we can find using Lagrange multipliers.
The budget constraint is p_a a + p_s s = I, i.e. b(a, s) = p_a a + p_s s − I = 0.

    u_a = λ · b_a        ⟹    (1/2) a^(−1/2) s^(1/4) = λ · p_a
    u_s = λ · b_s        ⟹    (1/4) a^(1/2) s^(−3/4) = λ · p_s
    b(a, s) = 0          ⟹    p_a a + p_s s − I = 0

From the first two equations, we see

    λ = (1/(2p_a)) a^(−1/2) s^(1/4) = (1/(4p_s)) a^(1/2) s^(−3/4)

    2 p_s a^(−1/2) s^(1/4) = p_a a^(1/2) s^(−3/4)
    2 p_s s^(1/4 + 3/4) = p_a a^(1/2 + 1/2)
    2 p_s s = p_a a
    ⟹  a = (2 p_s / p_a) s


Now, to satisfy the budget constraint,


 
    p_a (2 p_s / p_a) s + p_s s = I
    3 p_s s = I
    ⟹  s = I / (3 p_s),    a = (2 p_s / p_a) s = (2 p_s / p_a) · I / (3 p_s) = 2I / (3 p_a)

So, our Marshallian demand functions are


    a^m(p_a, p_s, I) = 2I / (3 p_a)        s^m(p_a, p_s, I) = I / (3 p_s)

The price effect of p_s on apples is zero, since

    ∂a^m/∂p_s = 0

The goods are both normal, because

    ∂a^m/∂I = 2 / (3 p_a) > 0        ∂s^m/∂I = 1 / (3 p_s) > 0

For example, with a budget of I = 12, p_a = 2, and p_s = 4 (numbers chosen only for
illustration), the optimal consumption is 1 kg of strawberries and 4 kg of apples. This
leads to the maximum utility u(4, 1) = 4^(1/2) · 1^(1/4) = 2.
Example 2.6.9

So far, our paradigm has been to optimize happiness, given a fixed budget. Thinking
about utility another way, we could fix the desired amount of utility, and try to minimize
the cost required to achieve it. In this paradigm, our utility function is our constraint,
while our cost function is the objective function we want to minimize. This gives rise to
Hicksian demand.

Definition 2.6.10 (Hicksian demand).

Suppose u( x, y) is the utility function for quantities x and y of two goods, and
suppose the consumer requires a utility value of at least U, where U is some
positive constant. Let these goods have unit prices p x and py . Then the function

x h ( p x , py , U )

giving the optimal consumption of x to minimize the cost function f ( x, y) =


p x x + py y subject to the constraint u( x, y) ě U is called the Hicksian demand func-
tion.

The definition requires that the utility be at least some fixed constant. In practice, we
can usually assume that the utility is equal to that fixed constant. That’s because if we

103
PARTIAL D ERIVATIVES 2.6 (O PTIONAL ) U TILITY AND D EMAND F UNCTIONS

have a higher utility than necessary, we can usually save some money by bringing our
utility down to its minimum allowable level. This could only fail to be the case if, at some
point, our utility function had a negative partial derivative. A negative partial derivative
indicates that we might increase utility as we decrease consumption.
Example 2.6.11

Let's go back to Alejandro and his passion for baking. This weekend he would like to make
ciabatta (c) and baguettes (b). Ciabatta costs p_c dollars to make and baguettes p_b dollars.
His utility function is as follows:

    u(c, b) = √(cb)
Fixing Alejandro’s utility as the constant u(c, b) = U, find his Hicksian Demand for both
breads.
Solution. The cost will be the objective function, f(c, b) = p_c c + p_b b. The utility function
gives our constraint, U = √(cb). We can find the constrained minimum of f(c, b) using
substitution.

    U = √(cb)
    U² = cb
    c = U²/b
Plugging this into our objective function,
   
    f(c, b) = p_c c + p_b b = p_c (U²/b) + p_b b = U² p_c b^(−1) + p_b b
This is a function of one variable. Let’s find the critical points.
 
    0 = −U² p_c b^(−2) + p_b
    U² p_c b^(−2) = p_b
    U² p_c / p_b = b²
    b = U √(p_c / p_b)

At that point,  c = U²/b = U √(p_b / p_c).
To verify that this critical point gives a global minimum, consider the second derivative
of our one-variable function.
    (d/db) [ −U² p_c b^(−2) + p_b ] = 2 U² p_c b^(−3)
Our model domain only allows for non-negative values of b, so the second derivative is
non-negative everywhere. That means its global minimum is at its sole critical point. That


means that, all together, we found that the quantities c = U √(p_b/p_c) and b = U √(p_c/p_b)
minimize the cost function f(c, b) = p_c c + p_b b subject to the constraint u(c, b) = U. So, our
Hicksian demand functions are:

    c^h(p_c, p_b, U) = U √(p_b / p_c)    and    b^h(p_c, p_b, U) = U √(p_c / p_b)

Example 2.6.11

In Example 2.6.11, note that ∂c^h/∂p_b ≠ 0. This is in contrast to Examples 2.6.8 and 2.6.9, where
the price effects of one good's price on the other good's consumption were both 0. Hicksian
demand is sometimes used to study the substitution effect, where a change in the price of
one good causes a change in consumption of another good.
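Here too a quick numerical check is possible. The Python sketch below uses made-up prices and a made-up utility target (p_c = 2, p_b = 8, U = 10, chosen only for illustration) and confirms that the Hicksian demands above minimize cost along the constraint.

    import math

    # Hypothetical prices and required utility, for illustration only.
    pc, pb, U = 2.0, 8.0, 10.0

    # Predicted Hicksian demands from Example 2.6.11:
    b_star = U * math.sqrt(pc / pb)           # = 5
    c_star = U * math.sqrt(pb / pc)           # = 20
    cost_star = pc * c_star + pb * b_star     # = 80

    # Brute-force check: along the constraint sqrt(c*b) = U we have c = U**2 / b,
    # so the cost is a function of b alone.
    best = min((pc * (U**2 / b) + pb * b, b) for b in (k * 0.01 for k in range(1, 5000)))

    print(c_star, b_star, cost_star)   # 20.0 5.0 80.0
    print(best)                        # minimum cost on the grid is ≈ 80, attained near b = 5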

Chapter 3

INTEGRATION

Calculus is built on two operations — differentiation and integration.

• Differentiation — as we saw last term, differentiation allows us to compute and


study the instantaneous rate of change of quantities. At its most basic it allows
us to compute tangent lines and velocities, but it also led us to quite sophisticated
applications including approximation of functions through Taylor polynomials and
optimisation of quantities by studying critical points.

• Integration — at its most basic, allows us to analyse the area under a curve. Of
course, its application and importance extend far beyond areas and it plays a central
role in solving differential equations.

It is not immediately obvious that these two topics are related to each other. However, as
we shall see, they are indeed intimately linked.

3.1 Definition of the Integral


Arguably the easiest way to introduce integration is by considering the area between the
graph of a given function and the x-axis, between two specific vertical lines — such as the
shaded regions pictured later in this section. We'll follow this route by starting with a
motivating example.


§§ A Motivating Example
Let us find the area under the curve y = e^x (and above the x–axis) for 0 ≤ x ≤ 1. That is,
the area of {(x, y) | 0 ≤ y ≤ e^x, 0 ≤ x ≤ 1}.

This area is equal to the “definite integral”


    Area = ∫_0^1 e^x dx
Do not worry about this notation or terminology just yet. We discuss it at length below.
In different applications this quantity will have different interpretations — not just area.
For example, if x is time and e x is your velocity at time x, then we’ll see later (in Exam-
ple 3.1.12) that the specified area is the net distance travelled between time 0 and time 1.
After we finish with the example, we’ll mimic it to give a general definition of the integral
∫_a^b f(x) dx.

Example 3.1.1

We wish to compute the area of {(x, y) | 0 ≤ y ≤ e^x, 0 ≤ x ≤ 1}. We know, from our
experience with e x in differential calculus, that the curve y = e x is not easily written in
terms of other simpler functions, so it is very unlikely that we would be able to write the
area as a combination of simpler geometric objects such as triangles, rectangles or circles.
So rather than trying to write down the area exactly, our strategy is to approximate the
area and then make our approximation more and more precise1 . We choose2 to approx-
imate the area as a union of a large number of tall thin (vertical) rectangles. As we take
more and more rectangles we get better and better approximations. Taking the limit as
the number of rectangles goes to infinity gives the exact area3 .
As a warm up exercise, we’ll now just use four rectangles. In Example 3.1.2, below,
we’ll consider an arbitrary number of rectangles and then take the limit as the number of
rectangles goes to infinity. So

1 This should remind the reader of the approach taken to compute the slope of a tangent line way way
back at the start of differential calculus.
2 Approximating the area in this way leads to a definition of integration that is called Riemann integra-
tion. This is the most commonly used approach to integration. However we could also approximate the
area by using long thin horizontal strips. This leads to a definition of integration that is called Lebesgue
integration. We will not be covering Lebesgue integration in these notes.
3 If we want to be more careful here, we should construct two approximations, one that is always a little
smaller than the desired area and one that is a little larger. We can then take a limit using the Squeeze
Theorem and arrive at the exact area. More on this later.


• subdivide the interval 0 ď x ď 1 into 4 equal subintervals each of width 1/4, and

• subdivide the area of interest into four corresponding vertical strips, as in the figure
below.
The area we want is exactly the sum of the areas of all four strips.

[Figure: the region under y = e^x for 0 ≤ x ≤ 1, divided into four vertical strips at x = 1/4, 1/2, 3/4.]

Each of these strips is almost, but not quite, a rectangle. While the bottom and sides are
fine (the sides are at right-angles to the base), the top of the strip is not horizontal. This
is where we must start to approximate. We can replace each strip by a rectangle by just
levelling off the top. But now we have to make a choice — at what height do we level off
the top?
Consider, for example, the leftmost strip. On this strip, x runs from 0 to 1/4. As x
runs from 0 to 1/4, the height y runs from e0 to e1/4 . It would be reasonable to choose the
height of the approximating rectangle to be somewhere between e0 and e1/4 . Which height

[Figure: the leftmost strip, on which y runs from e^0 up to e^(1/4) as x runs from 0 to 1/4.]

should we choose? Well, actually it doesn’t matter. When we eventually take the limit of
infinitely many approximating rectangles all of those different choices give exactly the
same final answer. We’ll say more about this later.
In this example we’ll do two sample computations.
• For the first computation we approximate each slice by a rectangle whose height is
the height of the left hand side of the slice.

– On the first slice, x runs from 0 to 1/4, and the height y runs from e0 , on the left
hand side, to e1/4 , on the right hand side.


– So we approximate the first slice by the rectangle of height e0 and width 1/4,
and hence of area 14 e0 = 14 .
– On the second slice, x runs from 1/4 to 1/2, and the height y runs from e1/4 and
e1/2 .
– So we approximate the second slice by the rectangle of height e1/4 and width
1/4, and hence of area 1 e1/4 .
4
– And so on.
– All together, we approximate the area of interest by the sum of the areas of the
four approximating rectangles, which is

    (1/4) [ 1 + e^(1/4) + e^(1/2) + e^(3/4) ] = 1.5124

– This particular approximation is called the “left Riemann sum approximation
to ∫_0^1 e^x dx with 4 subintervals”. We’ll explain this terminology later.
– This particular approximation represents the shaded area in the figure on the
left below. Note that, because e x increases as x increases, this approximation is
definitely smaller than the true area.

[Figures: the left Riemann sum rectangles (left) and the right Riemann sum rectangles (right) for y = e^x with 4 subintervals.]

• For the second computation we approximate each slice by a rectangle whose height
is the height of the right hand side of the slice.

– On the first slice, x runs from 0 to 1/4, and the height y runs from e0 , on the left
hand side, to e1/4 , on the right hand side.
– So we approximate the first slice by the rectangle of height e1/4 and width 1/4,
and hence of area 14 e1/4 .
– On the second slice, x runs from 1/4 to 1/2, and the height y runs from e1/4 and
e1/2 .


– So we approximate the second slice by the rectangle of height e1/2 and width
1/4, and hence of area 1 e1/2 .
4
– And so on.
– All together, we approximate the area of interest by the sum of the areas of the
four approximating rectangles, which is

    (1/4) [ e^(1/4) + e^(1/2) + e^(3/4) + e^1 ] = 1.9420

– This particular approximation is called the “right Riemann sum approximation
to ∫_0^1 e^x dx with 4 subintervals”.
– This particular approximation represents the shaded area in the figure on the
right above. Note that, because e x increases as x increases, this approximation is
definitely larger than the true area.

Example 3.1.1
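Before moving on, note that both four-rectangle sums are easy to reproduce by computer. The short Python sketch below (an illustration only, not part of the derivation) rebuilds the left and right sums just computed.

    import math

    n = 4
    width = 1 / n
    left  = sum(math.exp(i * width) for i in range(0, n)) * width       # heights at left endpoints
    right = sum(math.exp(i * width) for i in range(1, n + 1)) * width   # heights at right endpoints
    print(f"{left:.4f}  {right:.4f}")   # 1.5124  1.9420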

Now for the full computation that gives the exact area.
Example 3.1.2

Recall that we wish to compute the area of {(x, y) | 0 ≤ y ≤ e^x, 0 ≤ x ≤ 1} and that our

strategy is to approximate this area by the area of a union of a large number of very thin
rectangles, and then take the limit as the number of rectangles goes to infinity. In Exam-
ple 3.1.1, we used just four rectangles. Now we’ll consider a general number of rectangles,
that we’ll call n. Then we’ll take the limit n Ñ 8. So

• pick a natural number n and

• subdivide the interval 0 ď x ď 1 into n equal subintervals each of width 1/n, and

• subdivide the area of interest into corresponding thin strips, as in the figure below.

The area we want is exactly the sum of the areas of all of the thin strips.

[Figure: the region under y = e^x for 0 ≤ x ≤ 1, divided into n thin vertical strips of width 1/n.]


Each of these strips is almost, but not quite, a rectangle. As in Example 3.1.1, the only
problem is that the top is not horizontal. So we approximate each strip by a rectangle, just
by levelling off the top. Again, we have to make a choice — at what height do we level off
the top?
Consider, for example, the leftmost strip. On this strip, x runs from 0 to 1/n. As x runs
from 0 to 1/n, the height y runs from e0 to e1/n . It would be reasonable to choose the height
of the approximating rectangle to be somewhere between e0 and e1/n . Which height should
we choose?
Well, as we said in Example 3.1.1, it doesn’t matter. We shall shortly take the limit
n Ñ 8 and, in that limit, all of those different choices give exactly the same final answer.
We won’t justify that statement in this example, but Appendix section A.10 provides the
justification. For this example we just, arbitrarily, choose the height of each rectangle to be
the height of the graph y = e x at the smallest value of x in the corresponding strip4 . The
figure on the left below shows the approximating rectangles when n = 4 and the figure
on the right shows the approximating rectangles when n = 8.

[Figures: the approximating rectangles for y = e^x with n = 4 (left) and n = 8 (right).]

Now we compute the approximating area when there are n strips.


• We approximate the leftmost strip by a rectangle of height e^0. All of the rectangles
  have width 1/n. So the leftmost rectangle has area (1/n) e^0.

• On strip number 2, x runs from 1/n to 2/n. So the smallest value of x on strip number 2
  is 1/n, and we approximate strip number 2 by a rectangle of height e^(1/n) and hence of
  area (1/n) e^(1/n).

• And so on.

• On the last strip, x runs from (n−1)/n to n/n = 1. So the smallest value of x on the last
  strip is (n−1)/n, and we approximate the last strip by a rectangle of height e^((n−1)/n)
  and hence of area (1/n) e^((n−1)/n).

4 Notice that since e x is an increasing function, this choice of heights means that each of our rectangles is
smaller than the strip it came from.


The total area of all of the approximating rectangles is

    Total approximating area = (1/n) e^0 + (1/n) e^(1/n) + (1/n) e^(2/n) + (1/n) e^(3/n) + ··· + (1/n) e^((n−1)/n)
                             = (1/n) [ 1 + e^(1/n) + e^(2/n) + e^(3/n) + ··· + e^((n−1)/n) ]
Now the sum in the brackets might look a little intimidating because of all the exponen-
tials, but it actually has a pretty simple structure that can be easily seen if we rename
e1/n = r. Then

• the first term is 1 = r0 and

• the second term is e1/n = r1 and

• the third term is e2/n = r2 and

• the fourth term is e3/n = r3 and

• and so on and

• the last term is e(n´1)/n = r n´1 .

So
    Total approximating area = (1/n) [ 1 + r + r² + ··· + r^(n−1) ]
The sum in brackets is known as a geometric sum and satisfies a nice simple formula:

Equation 3.1.3 (Geometric sum).

    1 + r + r² + ··· + r^(n−1) = (r^n − 1) / (r − 1)        provided r ≠ 1

The derivation of the above formula is not too difficult. So let’s derive it in a little aside.

§§§ Geometric Sum


Denote the sum as

    S = 1 + r + r² + ··· + r^(n−1)

Notice that if we multiply the whole sum by r we get back almost the same thing:
 
    rS = r (1 + r + r² + ··· + r^(n−1))
       = r + r² + r³ + ··· + r^n

This right hand side differs from the original sum S only in that

• the right hand side is missing the “1+ ” that S starts with and


• the right hand side has an extra “+r n ” at the end that does not appear in S.

That is

    rS = S − 1 + r^n

Moving this around a little gives

    (r − 1) S = r^n − 1
    S = (r^n − 1) / (r − 1)
as required. Notice that the last step in the manipulations only works providing r ‰ 1
(otherwise we are dividing by zero).
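As a quick numerical sanity check of Equation 3.1.3, one can compare the sum computed term by term against the closed form; the values of r and n below are arbitrary choices, not anything special.

    r, n = 1.05, 10                            # any r != 1 and n >= 1 will do
    direct  = sum(r**k for k in range(n))      # 1 + r + r^2 + ... + r^(n-1)
    formula = (r**n - 1) / (r - 1)
    print(direct, formula)                     # the two agree, up to floating-point rounding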

§§§ Back to Approximating Areas


Now we can go back to our area approximation armed with the above result about geo-
metric sums.
    Total approximating area = (1/n) [ 1 + r + r² + ··· + r^(n−1) ]
                             = (1/n) · (r^n − 1)/(r − 1)              remember that r = e^(1/n)
                             = (1/n) · (e^(n/n) − 1)/(e^(1/n) − 1)
                             = (1/n) · (e − 1)/(e^(1/n) − 1)

To get the exact area5 all we need to do is make the approximation better and better
by taking the limit n Ñ 8. The limit will look more familiar if we rename 1/n to X. As n
tends to infinity, X tends to 0, so

    Area = lim_{n→∞} (1/n) · (e − 1)/(e^(1/n) − 1)
         = (e − 1) · lim_{n→∞} (1/n) / (e^(1/n) − 1)
         = (e − 1) · lim_{X→0} X / (e^X − 1)        (with X = 1/n)

Examining this limit we see that both numerator and denominator tend to zero as X Ñ
0, and so we cannot evaluate this limit by computing the limits of the numerator and
denominator separately and then dividing the results. Despite this, the limit is not too
hard to evaluate; here we give two ways:

5 We haven’t proved that this will give us the exact area, but it should be clear that taking this limit will
give us a lower bound on the area. To complete things rigorously we also need an upper bound and
the Squeeze Theorem. We do this in the next optional subsection.


• Perhaps the easiest way to compute the limit is by using l’Hôpital’s rule6 . Since both
numerator and denominator go to zero, this is a 0/0 indeterminate form. Thus
      lim_{X→0} X / (e^X − 1) = lim_{X→0} [ (d/dX) X ] / [ (d/dX)(e^X − 1) ] = lim_{X→0} 1/e^X = 1

• Another way7 to evaluate the same limit is to observe that it can be massaged into
the form of the limit definition of the derivative. First notice that
      lim_{X→0} X / (e^X − 1) = [ lim_{X→0} (e^X − 1)/X ]^(−1)

  provided this second limit exists and is nonzero. This second limit should look a
  little familiar:

      lim_{X→0} (e^X − 1)/X = lim_{X→0} (e^X − e^0)/(X − 0)

  which is just the definition of the derivative of e^x at x = 0. Hence we have

      lim_{X→0} X / (e^X − 1) = [ lim_{X→0} (e^X − e^0)/(X − 0) ]^(−1)
                              = [ (d/dX) e^X |_{X=0} ]^(−1)
                              = [ e^X |_{X=0} ]^(−1)
                              = 1

So, after this short aside into limits, we may now conclude that
    Area = (e − 1) · lim_{X→0} X/(e^X − 1)
         = e − 1

Example 3.1.2

A more rigorous area computation can be found in Appendix A.6
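One can also watch the closed-form expression (1/n)(e − 1)/(e^(1/n) − 1) approach e − 1 numerically. The Python sketch below is only an illustration of that convergence, not part of the proof.

    import math

    for n in (4, 16, 64, 256, 1024):
        # total approximating area with n rectangles, from the closed form derived above
        area_n = (math.e - 1) / (n * (math.exp(1 / n) - 1))
        print(n, round(area_n, 6))

    print("limit:", math.e - 1)   # 1.718281828...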

3.1.1 §§ Summation Notation


As you can see from the above example (and the more careful rigorous computation), our
discussion of integration will involve a fair bit of work with sums of quantities. To this
end, we make a quick aside into summation notation. While one can work through the

6 If you do not recall L’Hôpital’s rule and indeterminate forms then we recommend you skim over your
differential calculus notes on the topic.
7 Say if you don’t recall l’Hôpital’s rule and have not had time to revise it.


material below without this notation, proper summation notation is well worth learning,
so we advise the reader to persevere.
Writing out the summands explicitly can become quite impractical — for example, say
we need the sum of the first 11 squares:

    1 + 2² + 3² + 4² + 5² + 6² + 7² + 8² + 9² + 10² + 11²

This becomes tedious. Where the pattern is clear, we will often skip the middle few terms
and instead write

    1 + 2² + ··· + 11².

A far more precise way to write this is using Σ (capital-sigma) notation. For example, we
can write the above sum as
    Σ_{k=1}^{11} k²

This is read as

    The sum from k equals 1 to 11 of k².

More generally

Notation 3.1.4.

Let m ≤ n be integers and let f(x) be a function defined on the integers. Then we
write

    Σ_{k=m}^{n} f(k)

to mean the sum of f(k) for k from m to n:

    f(m) + f(m+1) + f(m+2) + ··· + f(n−1) + f(n).

Similarly we write

    Σ_{i=m}^{n} a_i

to mean

    a_m + a_{m+1} + a_{m+2} + ··· + a_{n−1} + a_n

for some set of coefficients {a_m, . . . , a_n}.


Consider the example


    Σ_{k=3}^{7} 1/k² = 1/3² + 1/4² + 1/5² + 1/6² + 1/7²

It is important to note that the right hand side of this expression evaluates to a number8 ; it
does not contain “k”. The summation index k is just a “dummy” variable and it does not
have to be called k. For example
    Σ_{k=3}^{7} 1/k² = Σ_{i=3}^{7} 1/i² = Σ_{j=3}^{7} 1/j² = Σ_{ℓ=3}^{7} 1/ℓ²

Also the summation index has no meaning outside the sum. For example
    k Σ_{k=3}^{7} 1/k²

has no mathematical meaning; it is gibberish.


A sum can be represented using summation notation in many different ways. If you
are unsure as to whether or not two summation notations represent the same sum, just
write out the first few terms and the last couple of terms. For example,

    Σ_{m=3}^{15} 1/m² = 1/3² + 1/4² + 1/5² + ··· + 1/14² + 1/15²

    Σ_{m=4}^{16} 1/(m−1)² = 1/3² + 1/4² + 1/5² + ··· + 1/14² + 1/15²

are equal.
Here is a theorem that gives a few rules for manipulating summation notation.

Theorem 3.1.5 (Arithmetic of Summation Notation).

Let n ≥ m be integers. Then for all real numbers c and a_i, b_i, m ≤ i ≤ n:

(a)  Σ_{i=m}^{n} c a_i = c ( Σ_{i=m}^{n} a_i )

(b)  Σ_{i=m}^{n} (a_i + b_i) = ( Σ_{i=m}^{n} a_i ) + ( Σ_{i=m}^{n} b_i )

(c)  Σ_{i=m}^{n} (a_i − b_i) = ( Σ_{i=m}^{n} a_i ) − ( Σ_{i=m}^{n} b_i )

8 Some careful addition shows it is 46181/176400.


Proof. We can prove this theorem by just writing out both sides of each equation, and
observing that they are equal, by the usual laws of arithmetic9 . For example, for the first
equation, the left and right hand sides are
    Σ_{i=m}^{n} c a_i = c a_m + c a_{m+1} + ··· + c a_n    and    c ( Σ_{i=m}^{n} a_i ) = c (a_m + a_{m+1} + ··· + a_n)

They are equal by the usual distributive law. The “distributive law” is the fancy name for
c( a + b) = ca + cb.

Not many sums can be computed exactly10 . Here are some that can. The first few are
used a lot.

Theorem 3.1.6.

(a)  Σ_{i=0}^{n} a r^i = a (1 − r^(n+1))/(1 − r), for all real numbers a and r ≠ 1 and all integers n ≥ 0.

(b)  Σ_{i=1}^{n} 1 = n, for all integers n ≥ 1.

(c)  Σ_{i=1}^{n} i = (1/2) n(n + 1), for all integers n ≥ 1.

(d)  Σ_{i=1}^{n} i² = (1/6) n(n + 1)(2n + 1), for all integers n ≥ 1.

(e)  Σ_{i=1}^{n} i³ = [ (1/2) n(n + 1) ]², for all integers n ≥ 1.

9 Since all the sums are finite, this isn’t too hard. More care must be taken when the sums involve an
infinite number of terms. We will examine this in Chapter 5.
10 Of course, any finite sum can be computed exactly — just sum together the terms. What we mean by
“computed exactly” in this context, is that we can rewrite the sum as a simple, and easily evaluated,
formula involving the terminals of the sum. For example
    Σ_{k=m}^{n} r^k = (r^(n+1) − r^m)/(r − 1)        provided r ≠ 1

No matter what finite integers we choose for m and n, we can quickly compute the sum in just a few
arithmetic operations. On the other hand, the sums,
    Σ_{k=m}^{n} 1/k        Σ_{k=m}^{n} 1/k²

cannot be expressed in such clean formulas (though you can rewrite them quite cleanly using integrals).
To explain more clearly we would need to go into a more detailed and careful discussion that is beyond
the scope of this course.


§§§ Proof of Theorem 3.1.6


Proof. (a) The first sum is
    Σ_{i=0}^{n} a r^i = a r^0 + a r^1 + a r^2 + ··· + a r^n

which is just the left hand side of equation (3.1.3), with n replaced by n + 1 and then
multiplied by a.
(b) The second sum is just n copies of 1 added together, so of course the sum is n.
(c) The sum Σ_{i=1}^{n} i = 1 + 2 + 3 + ··· + n can be visualized as the area of the red stairsteps
below: the first column has area 1, the second column has area 2, and so on.

[Figure: stairsteps of heights 1, 2, 3, . . . , n.]

If we duplicate those stairsteps and spin them around, we make a rectangle with base
n + 1 and height n.

Since the red stairsteps are exactly half the total area of that rectangle,
n
ÿ 1
i= (n)(n + 1)
2
i =1

(d) The last two identities are proved in Question 32 of Section 5.2 of the practice book.


3.1.2 §§ The Definition of the Definite Integral


In this section we give a definition of the definite integral ∫_a^b f(x) dx, generalising the
machinery we used in Example 3.1.1. But first some terminology and a couple of remarks to
better motivate the definition.
Notation 3.1.7.
The symbol ∫_a^b f(x) dx is read “the definite integral of the function f(x) from
a to b”. The function f(x) is called the integrand of ∫_a^b f(x) dx and a and b are
called11 the limits of integration. The interval a ≤ x ≤ b is called the interval of
integration and is also called the domain of integration.

Before we explain more precisely what the definite integral actually is, a few remarks
(actually — a few interpretations) are in order.
• If f(x) ≥ 0 and a ≤ b, one interpretation of the symbol ∫_a^b f(x) dx is “the area of the
  region {(x, y) | a ≤ x ≤ b, 0 ≤ y ≤ f(x)}”.

[Figure: the region under y = f(x) from x = a to x = b.]

In this way we can rewrite the area in Example 3.1.1 as the definite integral ∫_0^1 e^x dx.
• This interpretation breaks down when either a ą b or f ( x ) is not always positive,
but it can be repaired by considering “signed areas”.
• If a ≤ b, but f(x) is not always positive, one interpretation of ∫_a^b f(x) dx is “the signed
  area between y = f(x) and the x–axis for a ≤ x ≤ b”. For “signed area” (which
is also called the “net area”), areas above the x–axis count as positive while areas
below the x–axis count as negative. In the example below, we have the graph of the
function
    f(x) = { −1   if 1 ≤ x ≤ 2
           {  2   if 2 < x ≤ 4
           {  0   otherwise

11 a and b are also called the bounds of integration.


The 2 × 2 shaded square above the x–axis has signed area +2 × 2 = +4. The 1 × 1
shaded square below the x–axis has signed area −1 × 1 = −1. So, for this f(x),

    ∫_0^5 f(x) dx = +4 − 1 = 3

[Figure: the graph of f(x), with the square above the x–axis labelled “signed area = +4” and the square below labelled “signed area = −1”.]

• We’ll come back to the case b ă a later.


We’re now ready to define ∫_a^b f(x) dx. The definition is a little involved, but essentially
mimics what we did in Example 3.1.1 (which is why we did the example before the defini-
tion). The main differences are that we replace the function e x by a generic function f ( x )
and we replace the interval from 0 to 1 by the generic interval12 from a to b.

• We start by selecting any natural number n and subdividing the interval from a to b
  into n equal subintervals. Each subinterval has width (b − a)/n.

• Just as was the case in Example 3.1.1 we will eventually take the limit as n → ∞,
  which squeezes the width of each subinterval down to zero.

• For each integer 0 ≤ i ≤ n, define x_i = a + i · (b − a)/n. Note that this means that x_0 = a
  and x_n = b. It is worth keeping in mind that these numbers x_i do depend on n even
  though our choice of notation hides this dependence.

• Subinterval number i is x_{i−1} ≤ x ≤ x_i. In particular, on the first subinterval, x
  runs from x_0 = a to x_1 = a + (b − a)/n. On the second subinterval, x runs from x_1 to
  x_2 = a + 2(b − a)/n.

12 We’ll eventually allow a and b to be any two real numbers, not even requiring a ă b. But it is easier to
start off assuming a ă b, and that’s what we’ll do.


[Figure: the region under y = f(x), subdivided at the points a = x_0, x_1, x_2, x_3, . . . , x_{n−1}, x_n = b.]

• On each subinterval we now pick x*_{i,n} between x_{i−1} and x_i. We then approximate
  f(x) on the i-th subinterval by the constant function y = f(x*_{i,n}). We include n in the
  subscript to remind ourselves that these numbers depend on n.

  Geometrically, we’re approximating the region

      {(x, y) | x is between x_{i−1} and x_i, and y is between 0 and f(x)}

  by the rectangle

      {(x, y) | x is between x_{i−1} and x_i, and y is between 0 and f(x*_{i,n})}

  In Example 3.1.1 we chose x*_{i,n} = x_{i−1} and so we approximated the function e^x on
  each subinterval by the value it took at the leftmost point in that subinterval.

• So, when there are n subintervals our approximation to the signed area between the
curve y = f ( x ) and the x–axis, with x running from a to b, is
      Σ_{i=1}^{n} f(x*_{i,n}) · (b − a)/n

  We interpret this as the signed area since the summands f(x*_{i,n}) · (b − a)/n need not be
  positive.

• Finally we define the definite integral by taking the limit of this sum as n Ñ 8.


Oof! This is quite an involved process, but we can now write down the definition we
need. (A more mathematically rigorous definition of the definite integral ∫_a^b f(x) dx can
be found in Appendix A.7.)

Definition 3.1.8.

Let a and b be two real numbers and let f ( x ) be a function that is defined for all
x between a and b. Then we define
    ∫_a^b f(x) dx = lim_{n→∞} Σ_{i=1}^{n} f(x*_{i,n}) · (b − a)/n

when the limit exists and takes the same value for all choices of the x*_{i,n}'s. In this
case, we say that f is integrable on the interval from a to b.

Of course, it is not immediately obvious when this limit should exist. Thankfully it is
easier for a function to be “integrable” than it is for it to be “differentiable”.

Theorem 3.1.9.

Let f ( x ) be a function on the interval [ a, b]. If

• f ( x ) is continuous on [ a, b], or

• f ( x ) has a finite number of jump discontinuities on [ a, b] (and is otherwise


continuous)

then f ( x ) is integrable on [ a, b].

We will not justify this theorem. But a slightly weaker statement is proved in (the
optional) Section A.7. Of course this does not tell us how to actually evaluate any definite
integrals — but we will get to that in time.
Some comments:
• Note that, in Definition 3.1.8, we allow a and b to be any two real numbers. We do
  not require that a < b. That is, even when a > b, the symbol ∫_a^b f(x) dx is still defined
  by the formula of Definition 3.1.8. We’ll get an interpretation for ∫_a^b f(x) dx, when
  a > b, later.
• It is important to note that the definite integral ∫_a^b f(x) dx represents a number, not a
  function of x. The integration variable x is another “dummy” variable, just like the
  summation index i in Σ_{i=m}^{n} a_i (see Section 3.1.1). The integration variable does not
  have to be called x. For example

      ∫_a^b f(x) dx = ∫_a^b f(t) dt = ∫_a^b f(u) du


  Just as with summation variables, the integration variable x has no meaning outside
  of ∫_a^b f(x) dx. For example

      x ∫_0^1 e^x dx    and    ∫_0^x e^x dx

are both gibberish.

The sum inside definition 3.1.8 is named after Bernhard Riemann13 who made the first
rigorous definition of the definite integral and so placed integral calculus on rigorous
footings.

13 Bernhard Riemann was a 19th century German mathematician who made extremely important con-
tributions to many different areas of mathematics — far too many to list here. Arguably two of the
most important (after Riemann sums) are now called Riemann surfaces and the Riemann hypothesis
(he didn’t name them after himself).


Definition 3.1.10.

The sum inside definition 3.1.8

    Σ_{i=1}^{n} f(x*_{i,n}) · (b − a)/n

is called a Riemann sum. It is also often written as

    Σ_{i=1}^{n} f(x*_i) ∆x

where ∆x = (b − a)/n.

• If we choose each x*_{i,n} = x_{i−1} = a + (i − 1)(b − a)/n to be the left hand end point
  of the i-th interval, [x_{i−1}, x_i], we get the approximation

      Σ_{i=1}^{n} f( a + (i − 1)(b − a)/n ) · (b − a)/n

  which is called the “left Riemann sum approximation to ∫_a^b f(x) dx with n
  subintervals”. This is the approximation used in Example 3.1.1.

• In the same way, if we choose x*_{i,n} = x_i = a + i(b − a)/n we obtain the approximation

      Σ_{i=1}^{n} f( a + i(b − a)/n ) · (b − a)/n

  which is called the “right Riemann sum approximation to ∫_a^b f(x) dx with n
  subintervals”. The word “right” signifies that, on each subinterval [x_{i−1}, x_i]
  we approximate f by its value at the right–hand end–point, x_i = a + i(b − a)/n,
  of the subinterval.

• A third commonly used approximation is

      Σ_{i=1}^{n} f( a + (i − 1/2)(b − a)/n ) · (b − a)/n

  which is called the “midpoint Riemann sum approximation to ∫_a^b f(x) dx
  with n subintervals”. The word “midpoint” signifies that, on each subinterval
  [x_{i−1}, x_i] we approximate f by its value at the midpoint, (x_{i−1} + x_i)/2 =
  a + (i − 1/2)(b − a)/n, of the subinterval.
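The three approximations above translate directly into a short computer routine. The Python sketch below is one possible implementation (the function name and interface are our own, not anything defined in the text).

    import math

    def riemann_sum(f, a, b, n, rule="midpoint"):
        """Left, right, or midpoint Riemann sum for f on [a, b] with n equal subintervals."""
        dx = (b - a) / n
        if rule == "left":
            points = (a + (i - 1) * dx for i in range(1, n + 1))
        elif rule == "right":
            points = (a + i * dx for i in range(1, n + 1))
        elif rule == "midpoint":
            points = (a + (i - 0.5) * dx for i in range(1, n + 1))
        else:
            raise ValueError("rule must be 'left', 'right' or 'midpoint'")
        return sum(f(x) for x in points) * dx

    # The three approximations to the integral of e^x from 0 to 1, with n = 100:
    for rule in ("left", "right", "midpoint"):
        print(rule, riemann_sum(math.exp, 0, 1, 100, rule))
    # All three are close to the exact value e - 1 = 1.71828...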

In order to compute a definite integral using Riemann sums we need to be able to


compute the limit of the sum as the number of summands goes to infinity. This approach is
not always feasible and we will soon arrive at other means of computing definite integrals
based on antiderivatives. However, Riemann sums also provide us with a good means of
approximating definite integrals — if we take n to be a large, but finite, integer, then the
corresponding Riemann sum can be a good approximation of the definite integral. Under
certain circumstances this can be strengthened to give rigorous bounds on the integral.
Let us revisit Example 3.1.1.
Example 3.1.11
Let’s say we are again interested in the integral ∫_0^1 e^x dx. We can follow the same procedure
as we used previously to construct Riemann sum approximations. However since the in-
tegrand f ( x ) = e x is an increasing function, we can make our approximations into upper
and lower bounds without much extra work.
More precisely, we approximate f ( x ) on each subinterval xi´1 ď x ď xi

• by its smallest value on the subinterval, namely f(x_{i−1}), when we compute the left
Riemann sum approximation and

• by its largest value on the subinterval, namely f ( xi ), when we compute the right
Riemann sum approximation.

This is illustrated in the two figures below. The shaded region in the left hand figure is
the left Riemann sum approximation and the shaded region in the right hand figure is the
right Riemann sum approximation.

[Figures: the left Riemann sum rectangles (left) and the right Riemann sum rectangles (right) for y = e^x with n subintervals.]

We can see that exactly because f ( x ) is increasing, the left Riemann sum describes an area
smaller than the definite integral while the right Riemann sum gives an area larger14 than
the integral.
ş1
When we approximate the integral 0 e x dx using n subintervals, then, on interval num-
ber i,
i´1 i
• x runs from n to n and

14 When a function is decreasing the situation is reversed — the left Riemann sum is always larger than the
integral while the right Riemann sum is smaller than the integral. For more general functions that both
increase and decrease it is perhaps easiest to study each increasing (or decreasing) interval separately.


• y = e^x runs from e^((i−1)/n), when x is at the left hand end point of the interval, to e^(i/n),
  when x is at the right hand end point of the interval.
Consequently, the left Riemann sum approximation to ∫_0^1 e^x dx is Σ_{i=1}^{n} e^((i−1)/n) · (1/n) and
the right Riemann sum approximation is Σ_{i=1}^{n} e^(i/n) · (1/n). So

    Σ_{i=1}^{n} e^((i−1)/n) · (1/n)  ≤  ∫_0^1 e^x dx  ≤  Σ_{i=1}^{n} e^(i/n) · (1/n)

Thus L_n = Σ_{i=1}^{n} e^((i−1)/n) · (1/n), which for any n can be evaluated by computer, is a lower
bound on the exact value of ∫_0^1 e^x dx and R_n = Σ_{i=1}^{n} e^(i/n) · (1/n), which for any n can also be
evaluated by computer, is an upper bound on the exact value of ∫_0^1 e^x dx. For example, when
n = 1000, L_n = 1.7174 and R_n = 1.7191 (both to four decimal places) so that, again to four
decimal places,

    1.7174 ≤ ∫_0^1 e^x dx ≤ 1.7191

Recall that the exact value is e ´ 1 = 1.718281828 . . . .


Example 3.1.11
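The phrase “evaluated by computer” is literal; here is one way to reproduce the n = 1000 bounds in Python (an illustration, not part of the example).

    import math

    n = 1000
    Ln = sum(math.exp((i - 1) / n) for i in range(1, n + 1)) / n   # left sum: a lower bound
    Rn = sum(math.exp(i / n) for i in range(1, n + 1)) / n         # right sum: an upper bound
    print(f"{Ln:.4f} <= integral <= {Rn:.4f}")                     # 1.7174 <= integral <= 1.7191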

So far, we have only a single interpretation15 for definite integrals — namely areas
under graphs. In the following example, we develop a second interpretation.
Example 3.1.12 (Another Interpretation for Definite Integrals)

Suppose that a particle is moving along the x–axis and suppose that at time t its velocity is
v(t) (with v(t) ą 0 indicating rightward motion and v(t) ă 0 indicating leftward motion).
What is the change in its x–coordinate between time a and time b ą a?
We’ll work this out using a procedure similar to our definition of the integral. First
pick a natural number n and divide the time interval from a to b into n equal subintervals,
each of width (b − a)/n. We are working our way towards a Riemann sum (as we have done
several times above) and so we will eventually take the limit n → ∞.

• The first time interval runs from a to a + (b − a)/n. If we think of n as some large number,
  the width of this interval, (b − a)/n, is very small and over this time interval, the velocity
  does not change very much. Hence we can approximate the velocity over the first
  subinterval as being essentially constant at its value at the start of the time interval —
  v(a). Over the subinterval the x-coordinate changes by velocity times time, namely
  v(a) · (b − a)/n.

• Similarly, the second interval runs from time a + (b − a)/n to time a + 2(b − a)/n. Again, we
can assume that the velocity does not change very much and so we can approximate
the velocity as being essentially constant at its value at the start of the subinterval

15 If this were the only interpretation then integrals would be a nice mathematical curiosity and unlikely
to be the core topic of a large first year mathematics course.


 
  — namely v(a + (b − a)/n). So during the second subinterval the particle’s x–coordinate
  changes by approximately v(a + (b − a)/n) · (b − a)/n.

• In general, time subinterval number i runs from a + (i − 1)(b − a)/n to a + i(b − a)/n and
  during this subinterval the particle’s x–coordinate changes, essentially, by

      v(a + (i − 1)(b − a)/n) · (b − a)/n.

So the net change in x–coordinate from time a to time b is approximately

    v(a) · (b − a)/n + v(a + (b − a)/n) · (b − a)/n + ··· + v(a + (i − 1)(b − a)/n) · (b − a)/n + ···
        + v(a + (n − 1)(b − a)/n) · (b − a)/n

    = Σ_{i=1}^{n} v(a + (i − 1)(b − a)/n) · (b − a)/n

This is exactly the left Riemann sum approximation to the integral of v from a to b with
n subintervals. The limit as n → ∞ is exactly the definite integral ∫_a^b v(t) dt. Following
tradition, we have called the (dummy) integration variable t rather than x to remind us
that it is time that is running from a to b.
The conclusion of the above discussion is that if a particle is moving along the x–axis
and its x–coordinate and velocity at time t are x (t) and v(t), respectively, then, for all
b > a,

    x(b) − x(a) = ∫_a^b v(t) dt.

Example 3.1.12
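To see this interpretation in action numerically, the Python sketch below uses a made-up velocity v(t) = 3t² (not taken from the example) and checks that a left Riemann sum recovers the net change in position.

    def v(t):
        return 3 * t**2   # a hypothetical velocity, in arbitrary units

    a, b, n = 0.0, 2.0, 1000
    dt = (b - a) / n
    # Left Riemann sum: (velocity at the start of each subinterval) * (length of subinterval)
    net_change = sum(v(a + (i - 1) * dt) for i in range(1, n + 1)) * dt
    print(net_change)   # approximately 8, the exact change x(2) - x(0) for x(t) = t^3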

3.1.3 §§ Using Known Areas to Evaluate Integrals


One of the main aims of this course is to build up general machinery for computing def-
inite integrals (as well as interpreting and applying them). We shall start on this soon,
but not quite yet. We have already seen one concrete, if laborious, method for computing
definite integrals — taking limits of Riemann sums as we did in Example 3.1.1. A second
method, which will work for some special integrands, works by interpreting the definite
integral as “signed area”. This approach will work nicely when the area under the curve
decomposes into simple geometric shapes like triangles, rectangles and circles. Here are
some examples of this second method.
Example 3.1.13
The integral ∫_a^b 1 dx (which is also written as just ∫_a^b dx) is the area of the shaded rectangle


(of width b − a and height 1) in the figure on the right below. So

    ∫_a^b dx = (b − a) × (1) = b − a

[Figure: a rectangle of height 1 over the interval from x = a to x = b.]

Example 3.1.13

Example 3.1.14
Let b > 0. The integral ∫_0^b x dx is the area of the shaded triangle (of base b and of height b)
in the figure on the right below. So

    ∫_0^b x dx = (1/2) b × b = b²/2

[Figure: the triangle under y = x from x = 0 to x = b.]
The integral ∫_{−b}^0 x dx is the signed area of the shaded triangle (again of base b and of height
b) in the figure on the right below. So

    ∫_{−b}^0 x dx = −b²/2

[Figure: the triangle between y = x and the x–axis from x = −b to x = 0, lying below the axis.]

Example 3.1.14
Notice that it is very easy to extend this example to the integral ∫_0^b cx dx for any real
numbers b, c > 0 and find

    ∫_0^b cx dx = (c/2) b².

Example 3.1.15
In this example, we shall evaluate ∫_{−1}^1 (1 − |x|) dx. Recall that

    |x| = { −x   if x ≤ 0
          {  x   if x ≥ 0


so that
    1 − |x| = { 1 + x   if x ≤ 0
              { 1 − x   if x ≥ 0

To picture the geometric figure whose area the integral represents observe that

• at the left hand end of the domain of integration x = −1 and the integrand 1 − |x| =
  1 − |−1| = 1 − 1 = 0 and

• as x increases from −1 towards 0, the integrand 1 − |x| = 1 + x increases linearly, until

• when x hits 0 the integrand hits 1 − |x| = 1 − |0| = 1 and then

• as x increases from 0, the integrand 1 − |x| = 1 − x decreases linearly, until

• when x hits +1, the right hand end of the domain of integration, the integrand hits
  1 − |x| = 1 − |1| = 0.
So the integral ∫_{−1}^1 (1 − |x|) dx is the area of the shaded triangle (of base 2 and of height 1)
in the figure on the right below and

    ∫_{−1}^1 (1 − |x|) dx = (1/2) × 2 × 1 = 1

[Figure: the triangle under y = 1 − |x| from x = −1 to x = 1.]

Example 3.1.15

Example 3.1.16
The integral ∫_0^1 √(1 − x²) dx has integrand f(x) = √(1 − x²). So it represents the area under
y = √(1 − x²) with x running from 0 to 1. But we may rewrite y = √(1 − x²) as

    x² + y² = 1,  y ≥ 0

But this is the (implicit) equation for a circle — the extra condition that y ≥ 0 makes it
the equation for the semi-circle centred at the origin with radius 1 lying on and above the
x-axis. Thus the integral represents the area of the quarter circle of radius 1, as shown in
the figure on the right below. So

    ∫_0^1 √(1 − x²) dx = (1/4) π (1)² = π/4

[Figure: the quarter circle of radius 1 in the first quadrant.]


Example 3.1.16

This next one is a little trickier and relies on us knowing the symmetries of the sine
function.
Example 3.1.17

The integral ∫_{−π}^{π} sin x dx is the signed area of the shaded region in the figure on the right
below. It naturally splits into two regions, one on either side of the y-axis. We don’t know
the formula for the area of either of these regions (yet), however the two regions are very
nearly the same. In fact, the part of the shaded region below the x–axis is exactly the re-
flection, in the x–axis, of the part of the shaded region above the x–axis. So the signed area
of part of the shaded region below the x–axis is the negative of the signed area of part of
the shaded region above the x–axis and
    ∫_{−π}^{π} sin x dx = 0

[Figure: the graph of y = sin x from x = −π to x = π, with the regions above and below the x–axis shaded.]

Example 3.1.17

3.1.4 §§ (Optional) Surplus


In Section 2.6, we saw demand curves that depended on a consumer’s income, their pref-
erences (utility function), and the prices of goods. Now let’s use a simplified demand
curve: D (q) is the per-unit price at which a consumer will purchase a quantity q of a
good16 . Rather than think about individuals’ varying utility functions and income, the
demand curve imagines a hypothetical “average” consumer.
Similarly, we can make a supply curve S(q) giving the per-unit price at which a sup-
plier is willing to sell q units.
In simple examples, D (q) has a negative slope (since, to be motivated to buy a higher
quantity, the consumer demands a lower price) and S(q) has a positive slope (since, to be
motivated to sell a higher quantity, the supplier demands a higher price). The quantity
and price where the two curves meet are called the equilibrium quantity and equilibrium
price, respectively, and are denoted qe resp. pe . In theory, suppliers would aim to sell qe

16 The more natural way of thinking about this is reversed: given the price, how much quantity will the
consumer purchase. But formulating the relationship where price is a function of quantity (rather than
the other way around) is standard practice in economics texts, so we follow it here.


products at a unit price of pe . (If they make more goods, to sell them all they’d have to
charge less than they are willing to accept. If they make fewer goods, they will not meet
consumer demand.)

[Figure: supply curve S(q) and demand curve D(q) crossing at the equilibrium point (q_e, p_e).]

The consumer would have been happy to buy their first good at the price D(1). We can
say then that the first good has a value of D(1) for the consumer. If they paid a lower price p_e,
then the number D(1) − p_e is a surplus to the consumer: they gained D(1) units of value
by paying only p_e units of value. This surplus can be visualized as the shaded area below.

[Figure: the same supply and demand curves, with the rectangle of height D(1) − p_e over the first unit shaded.]

Similarly, the consumer would have been happy to buy their second good at the unit
price D (2). If they paid a smaller price pe , then their surplus from that second good is


D (2) ´ pe : its value to them, minus what they actually paid. Their combined surplus after
buying two goods can be visualized as the shaded rectangles below.

[Figure: the surplus rectangles for the first two units, with heights D(1) − p_e and D(2) − p_e.]

All together, we expect the consumer to buy qe units. Their total surplus is represented
by the shaded rectangles below.

[Figure: the consumer’s total surplus, shaded between D(q) and the line p = p_e from q = 0 to q = q_e.]

This motivates the definition of consumer surplus. Producer surplus behaves similarly.


Definition 3.1.18.

Consider a supply curve S(q) and a demand curve D (q) with intersection point
(qe , pe ), graphed on the (q, p)-plane. The consumer surplus is the area from q = 0
to q = qe under D (q) and above the line p = pe . The producer surplus is the area
from q = 0 to q = qe over S(q) and under the line p = pe . The total surplus is the
sum of consumer surplus and producer surplus.

[Figure: the consumer surplus (between D(q) and p = p_e) and the producer surplus P (between p = p_e and S(q)), for 0 ≤ q ≤ q_e.]

Given a sale of q_e items at unit price p_e, we think of the consumer surplus as the net
benefit to the consumer, and the producer surplus as the net benefit to the producer. To
calculate these, we need a little geometric intuition. The consumer surplus is the area
∫_0^{q_e} D(q) dq minus the area of the rectangle with width q_e and height p_e. So, the consumer
surplus is

    C = ∫_0^{q_e} D(q) dq − p_e q_e

Similarly, the producer surplus is the area of the rectangle with width q_e and height
p_e, minus the area ∫_0^{q_e} S(q) dq. So, the producer surplus is

    P = p_e q_e − ∫_0^{q_e} S(q) dq

Finally, the total surplus is the value gained by everybody, producers and consumers
combined:
    T = C + P = ∫_0^{q_e} D(q) dq − ∫_0^{q_e} S(q) dq
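For concreteness, here is a small Python sketch with made-up linear demand and supply curves (none of the numbers below come from the text); it approximates the two integrals with midpoint Riemann sums and recovers the familiar triangle areas.

    def D(q):
        return 20 - 2 * q        # hypothetical demand curve

    def S(q):
        return 2 + q             # hypothetical supply curve

    qe, pe = 6.0, 8.0            # equilibrium: D(6) = S(6) = 8

    n = 10000
    dq = qe / n
    int_D = sum(D((i - 0.5) * dq) for i in range(1, n + 1)) * dq
    int_S = sum(S((i - 0.5) * dq) for i in range(1, n + 1)) * dq

    consumer_surplus = int_D - pe * qe   # area under D(q) and above p = pe
    producer_surplus = pe * qe - int_S   # area above S(q) and below p = pe
    print(consumer_surplus, producer_surplus, consumer_surplus + producer_surplus)
    # 36.0 18.0 54.0 -- the triangle areas (1/2)*6*12, (1/2)*6*6, and their sum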


3.2 Basic Properties of the Definite Integral


When we studied limits and derivatives, we developed methods for taking limits or
derivatives of “complicated functions” like f ( x ) = x2 + sin( x ) by understanding how lim-
its and derivatives interact with basic arithmetic operations like addition and subtraction.
This allowed us to reduce the problem to one of computing derivatives of simpler
functions like x2 and sin( x ). Along the way we established simple rules such as

    lim_{x→a} (f(x) + g(x)) = lim_{x→a} f(x) + lim_{x→a} g(x)    and    (d/dx)(f(x) + g(x)) = df/dx + dg/dx
Some of these rules have very natural analogues for integrals and we discuss them below.
Unfortunately the analogous rules for integrals of products of functions or integrals of
compositions of functions are more complicated than those for limits or derivatives. We
discuss those rules at length in subsequent sections. For now let us consider some of the
simpler rules of the arithmetic of integrals.

Theorem 3.2.1 (Arithmetic of Integration).

Let a, b and A, B, C be real numbers. Let the functions f(x) and g(x) be integrable
on an interval that contains a and b. Then

(a)  ∫_a^b (f(x) + g(x)) dx = ∫_a^b f(x) dx + ∫_a^b g(x) dx

(b)  ∫_a^b (f(x) − g(x)) dx = ∫_a^b f(x) dx − ∫_a^b g(x) dx

(c)  ∫_a^b C f(x) dx = C · ∫_a^b f(x) dx

Combining these three rules we have

(d)  ∫_a^b (A f(x) + B g(x)) dx = A ∫_a^b f(x) dx + B ∫_a^b g(x) dx

That is, integrals depend linearly on the integrand.

(e)  ∫_a^b dx = ∫_a^b 1 · dx = b − a

It is not too hard to prove this result from the definition of the definite integral. Addi-
tionally we only really need to prove (d) and (e) since
• (a) follows from (d) by setting A = B = 1,

• (b) follows from (d) by setting A = 1, B = ´1, and

• (c) follows from (d) by setting A = C, B = 0.


Proof. As noted above, it suffices for us to prove (d) and (e). Since (e) is easier, we will
start with that. It is also a good warm-up for (d).
şb
• The definite integral in (e), a 1dx, can be interpreted geometrically as the area of the
rectangle with height 1 running from x = a to x = b; this area is clearly b ´ a. We
can also prove this formula from the definition of the integral (Definition 3.1.8):
żb n
ÿ b´a
dx = lim f ( xi,n
˚
) by definition
a nÑ8 n
i =1
n
ÿ b´a
= lim 1 since f ( x ) = 1
nÑ8 n
i =1
n
ÿ 1
= lim (b ´ a) since a, b are constants
nÑ8 n
i =1
= lim (b ´ a)
nÑ8
= b´a
as required.

• To prove (d) let us start by defining h( x ) = A f ( x ) + Bg( x ) and then we need to


express the integral of h( x ) in terms of those of f ( x ) and g( x ). We use Definition 3.1.8
and some algebraic manipulations17 to arrive at the result.
      ∫_a^b h(x) dx = lim_{n→∞} Σ_{i=1}^{n} h(x*_{i,n}) · (b − a)/n                          by Definition 3.1.8
                    = lim_{n→∞} Σ_{i=1}^{n} [ A f(x*_{i,n}) + B g(x*_{i,n}) ] · (b − a)/n
                    = lim_{n→∞} Σ_{i=1}^{n} [ A f(x*_{i,n}) · (b − a)/n + B g(x*_{i,n}) · (b − a)/n ]
                    = lim_{n→∞} [ ( Σ_{i=1}^{n} A f(x*_{i,n}) · (b − a)/n ) + ( Σ_{i=1}^{n} B g(x*_{i,n}) · (b − a)/n ) ]    by Theorem 3.1.5(b)
                    = lim_{n→∞} [ A ( Σ_{i=1}^{n} f(x*_{i,n}) · (b − a)/n ) + B ( Σ_{i=1}^{n} g(x*_{i,n}) · (b − a)/n ) ]    by Theorem 3.1.5(a)
                    = A ∫_a^b f(x) dx + B ∫_a^b g(x) dx                                      by Definition 3.1.8

as required.

Using this Theorem we can integrate sums, differences and constant multiples of functions
we know how to integrate. For example:

17 Now is a good time to look back at Theorem 3.1.5.


Example 3.2.2
In Example 3.1.2 we saw that ∫_0^1 e^x dx = e − 1. So

    ∫_0^1 (e^x + 7) dx = ∫_0^1 e^x dx + 7 ∫_0^1 1 dx
                                     by Theorem 3.2.1(d) with A = 1, f(x) = e^x, B = 7, g(x) = 1
                       = (e − 1) + 7 × (1 − 0)
                                     by Example 3.1.2 and Theorem 3.2.1(e)
                       = e + 6

Example 3.2.2

When we gave the formal definition of ∫_a^b f(x) dx in Definition 3.1.8 we explained that
the integral could be interpreted as the signed area between the curve y = f(x) and the
x-axis on the interval [a, b]. In order for this interpretation to make sense we required that
a < b, and though we remarked that the integral makes sense when a > b we did not
explain any further. Thankfully there is an easy way to express the integral ∫_a^b f(x) dx in
terms of ∫_b^a f(x) dx — making it always possible to write an integral so the lower limit of
integration is less than the upper limit of integration. Theorem 3.2.3, below, tells us that, for
example, ∫_7^3 e^x dx = −∫_3^7 e^x dx. The same theorem also provides us with two other simple
manipulations of the limits of integration.

Theorem 3.2.3 (Arithmetic for the Domain of Integration).

Let a, b, c be real numbers. Let the function f(x) be integrable on an interval that
contains a, b and c. Then

(a)  ∫_a^a f(x) dx = 0

(b)  ∫_b^a f(x) dx = −∫_a^b f(x) dx

(c)  ∫_a^b f(x) dx = ∫_a^c f(x) dx + ∫_c^b f(x) dx

The proof of this statement is not too difficult.


Proof. Let us prove the statements in order.

• Consider the definition of the definite integral


      ∫_a^b f(x) dx = lim_{n→∞} Σ_{i=1}^{n} f(x*_{i,n}) · (b − a)/n


If we now substitute b = a in this expression we have


      ∫_a^a f(x) dx = lim_{n→∞} Σ_{i=1}^{n} f(x*_{i,n}) · (a − a)/n        (the factor (a − a)/n is 0)
                    = lim_{n→∞} Σ_{i=1}^{n} f(x*_{i,n}) · 0
                    = lim_{n→∞} 0
                    = 0

as required.
şb
• Consider now the definite integral a f ( x )dx. We will sneak up on the proof by first
şa
examining Riemann sum approximations to both this and b f ( x )dx. The midpoint
şb
Riemann sum approximation to a f ( x )dx with 4 subintervals (so that each subinter-
val has width b´a4 ) is
" 
1 b ´ a  3 b ´ a  5 b ´ a  7 b ´ a b ´ a
*
f a+ + f a+ + f a+ + f a+ ¨
2 4 2 4 2 4 2 4 4
"        
7 1 5 3 3 5 1 7 b´a
*
= f a+ b + f a+ b + f a+ b + f a+ b ¨
8 8 8 8 8 8 8 8 4
şa
Now we do the same for b f ( x )dx with 4 subintervals. Note that b is now the lower
limit on the integral and a is now the upper limit on the integral. This is likely to
cause confusion when we write out the Riemann sum, so we’ll temporarily rename
şB
b to A and a to B. The midpoint Riemann sum approximation to A f ( x )dx with 4
subintervals is
" 
1 B ´ A  3 B ´ A  5 B ´ A  7 B ´ A B ´ A
*
f A+ + f A+ + f A+ + f A+ ¨
2 4 2 4 2 4 2 4 4
"        
7 1 5 3 3 5 1 7 B´A
*
= f A+ B + f A+ B + f A+ B + f A+ B ¨
8 8 8 8 8 8 8 8 4

Now recalling that


ş a A = b and B = a, we have that the midpoint Riemann sum
approximation to b f ( x )dx with 4 subintervals is
" 
7 1  5 3  3 5  1 7  a´b
*
f b+ a + f b+ a + f b+ a + f b+ a ¨
8 8 8 8 8 8 8 8 4

Thus we see that the Riemann sums for the two integrals are nearly identical — the
only difference being the factor of b´a a´b
4 versus 4 . Hence the two Riemann sums are
negatives of each other.
The same computation with n subintervals shows that the midpoint Riemann sum
şa şb
approximations to b f ( x )dx and a f ( x )dx with n subintervals are negatives of each
şa şb
other. Taking the limit n Ñ 8 gives b f ( x )dx = ´ a f ( x )dx.


• Finally consider (c) — we will not give a formal proof of this, but instead will inter-
pret it geometrically. Indeed one can also interpret (a) geometrically. In both cases
these become statements about areas:
∫_a^a f(x) dx = 0      and      ∫_a^b f(x) dx = ∫_a^c f(x) dx + ∫_c^b f(x) dx

are

Area{ (x, y) | a ≤ x ≤ a, 0 ≤ y ≤ f(x) } = 0

and

Area{ (x, y) | a ≤ x ≤ b, 0 ≤ y ≤ f(x) } = Area{ (x, y) | a ≤ x ≤ c, 0 ≤ y ≤ f(x) }
                                          + Area{ (x, y) | c ≤ x ≤ b, 0 ≤ y ≤ f(x) }

respectively. Both of these geometric statements are intuitively obvious. See the figures below.

(Figures: the graph of y = f(x), first with the degenerate interval from a to a marked, and then with the interval from a to b split at x = c.)

Note that we have assumed that a ď c ď b and that f ( x ) ě 0. One can remove these
restrictions and also make the proof more formal, but it becomes quite tedious and
less intuitive.

Example 3.2.4
Back in Example 3.1.14 we saw that when b > 0, ∫_0^b x dx = b²/2. We'll now verify that ∫_0^b x dx = b²/2 is still true when b = 0 and also when b < 0.

• First consider b = 0. Then the statement ∫_0^b x dx = b²/2 becomes

∫_0^0 x dx = 0

This is an immediate consequence of Theorem 3.2.3(a).


• Now consider b < 0. Let us write B = −b, so that B > 0. In Example 3.1.14 we saw that

∫_{−B}^0 x dx = −B²/2.

So we have

∫_0^b x dx = ∫_0^{−B} x dx = −∫_{−B}^0 x dx        by Theorem 3.2.3(b)
           = −(−B²/2)                              by Example 3.1.14
           = B²/2 = b²/2

We have now shown that


∫_0^b x dx = b²/2        for all real numbers b

Example 3.2.4

Example 3.2.5

Applying Theorem 3.2.3 yet again, we have, for all real numbers a and b,
∫_a^b x dx = ∫_a^0 x dx + ∫_0^b x dx        by Theorem 3.2.3(c) with c = 0
           = ∫_0^b x dx − ∫_0^a x dx        by Theorem 3.2.3(b)
           = (b² − a²)/2                    by Example 3.2.4, twice
We can also understand this result geometrically.


• (left) When 0 ă a ă b, the integral represents the area in green which is the difference
of two right–angle triangles — the larger with area b2 /2 and the smaller with area
a2 /2.

• (centre) When a ă 0 ă b, the integral represents the signed area of the two displayed
triangles. The one above the axis has area b2 /2 while the one below has area ´a2 /2
(since it is below the axis).

• (right) When a ă b ă 0, the integral represents the signed area in purple of the
difference between the two triangles — the larger with area ´a2 /2 and the smaller
with area ´b2 /2.

Example 3.2.5

Theorem 3.2.3(c) shows us how we can split an integral over a larger interval into one
over two (or more) smaller intervals. This is particularly useful for dealing with piece-
wise functions, like |x|.
Example 3.2.6

Using Theorem 3.2.3, we can readily evaluate integrals involving |x|. First, recall that

|x| = x if x ≥ 0,    and    |x| = −x if x < 0

Now consider (for example) ∫_{−2}^3 |x| dx. Since the integrand changes at x = 0, it makes sense to split the interval of integration at that point:

∫_{−2}^3 |x| dx = ∫_{−2}^0 |x| dx + ∫_0^3 |x| dx        by Theorem 3.2.3
                = ∫_{−2}^0 (−x) dx + ∫_0^3 x dx         by definition of |x|
                = −∫_{−2}^0 x dx + ∫_0^3 x dx           by Theorem 3.2.1(c)
                = −(−2²/2) + (3²/2) = (4 + 9)/2
                = 13/2

We can go further still — given a function f ( x ) we can rewrite the integral of f (|x|) in
terms of the integral of f ( x ) and f (´x ).
∫_{−1}^1 f(|x|) dx = ∫_{−1}^0 f(|x|) dx + ∫_0^1 f(|x|) dx
                   = ∫_{−1}^0 f(−x) dx + ∫_0^1 f(x) dx


Example 3.2.6
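The splitting used above is easy to check numerically. Here is a minimal Python sketch (assuming numpy is available; the helper name is ours) that evaluates the two pieces separately and compares their sum with 13/2:

import numpy as np

def midpoint_riemann(f, a, b, n=10_000):
    # midpoint Riemann sum of f on [a, b] with n equal subintervals
    x = np.linspace(a, b, n + 1)
    mid = 0.5 * (x[:-1] + x[1:])
    return np.sum(f(mid)) * (b - a) / n

left  = midpoint_riemann(np.abs, -2.0, 0.0)   # integral of |x| on [-2, 0]
right = midpoint_riemann(np.abs,  0.0, 3.0)   # integral of |x| on [0, 3]
print(left + right, 13 / 2)                   # both are 6.5, up to a tiny numerical error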

Here is a more concrete example.


Example 3.2.7
Let us compute ∫_{−1}^1 (1 − |x|) dx again. In Example 3.1.15 we evaluated this integral by interpreting it as the area of a triangle. This time we are going to use only the properties given in Theorems 3.2.1 and 3.2.3 and the facts that

∫_a^b dx = b − a      and      ∫_a^b x dx = (b² − a²)/2

That ∫_a^b dx = b − a is part (e) of Theorem 3.2.1. We saw that ∫_a^b x dx = (b² − a²)/2 in Example 3.2.5.
First we are going to get rid of the absolute value signs by splitting the interval over which we integrate. Recalling that |x| = x whenever x ≥ 0 and |x| = −x whenever x ≤ 0, we split the interval by Theorem 3.2.3(c),

∫_{−1}^1 (1 − |x|) dx = ∫_{−1}^0 (1 − |x|) dx + ∫_0^1 (1 − |x|) dx
                      = ∫_{−1}^0 (1 − (−x)) dx + ∫_0^1 (1 − x) dx
                      = ∫_{−1}^0 (1 + x) dx + ∫_0^1 (1 − x) dx

Now we apply parts (a) and (b) of Theorem 3.2.1, and then

∫_{−1}^1 (1 − |x|) dx = ∫_{−1}^0 1 dx + ∫_{−1}^0 x dx + ∫_0^1 1 dx − ∫_0^1 x dx
                      = [0 − (−1)] + (0² − (−1)²)/2 + [1 − 0] − (1² − 0²)/2
                      = 1

Example 3.2.7

3.2.1 §§ More Properties of Integration: Even and Odd Functions


Recall18 the following definition

18 We haven’t done this in this course, but you should have seen it in your differential calculus course or
perhaps even earlier.


Definition3.2.8.

Let f ( x ) be a function. Then,

• we say that f ( x ) is even when f ( x ) = f (´x ) for all x, and

• we say that f ( x ) is odd when f ( x ) = ´ f (´x ) for all x.

Of course most functions are neither even nor odd, but many of the standard functions
you know are.
Example 3.2.9 (Even functions)

• Three examples of even functions are f(x) = |x|, f(x) = cos x and f(x) = x². In fact, if f(x) is any even power of x, then f(x) is an even function.

• The part of the graph y = f(x) with x ≤ 0 may be constructed by drawing the part of the graph with x ≥ 0 (as in the figure on the left below) and then reflecting it in the y-axis (as in the figure on the right below).

(Figures: on the left, the graph of y = f(x) for 0 ≤ x ≤ π only; on the right, the full graph on −π ≤ x ≤ π obtained by reflecting it in the y-axis.)

• In particular, if f(x) is an even function and a > 0, then the two sets

{ (x, y) | 0 ≤ x ≤ a and y is between 0 and f(x) }
{ (x, y) | −a ≤ x ≤ 0 and y is between 0 and f(x) }

are reflections of each other in the y-axis and so have the same signed area. That is

∫_0^a f(x) dx = ∫_{−a}^0 f(x) dx

Example 3.2.9

Example 3.2.10 (Odd functions)


• Three examples of odd functions are f(x) = sin x, f(x) = tan x and f(x) = x³. In fact, if f(x) is any odd power of x, then f(x) is an odd function.

• The part of the graph y = f(x) with x ≤ 0 may be constructed by drawing the part of the graph with x ≥ 0 (like the solid line in the figure on the left below), then reflecting it in the y-axis (like the dashed line in the figure on the left below), and then reflecting the result in the x-axis (i.e. flipping it upside down, like in the figure on the right, below).

(Figures: on the left, the graph for x ≥ 0 together with its reflection in the y-axis; on the right, the final graph obtained by flipping that reflected copy upside down.)

• In particular, if f(x) is an odd function and a > 0, then the signed areas of the two sets

{ (x, y) | 0 ≤ x ≤ a and y is between 0 and f(x) }
{ (x, y) | −a ≤ x ≤ 0 and y is between 0 and f(x) }

are negatives of each other — to get from the first set to the second set, you flip it upside down, in addition to reflecting it in the y-axis. That is

∫_0^a f(x) dx = −∫_{−a}^0 f(x) dx

Example 3.2.10

We can exploit the symmetries noted in the examples above, namely

∫_0^a f(x) dx = ∫_{−a}^0 f(x) dx        for f even
∫_0^a f(x) dx = −∫_{−a}^0 f(x) dx       for f odd

together with Theorem 3.2.3

∫_{−a}^a f(x) dx = ∫_{−a}^0 f(x) dx + ∫_0^a f(x) dx

in order to simplify the integration of even and odd functions over intervals of the form [−a, a].


Theorem3.2.11 (Even and Odd).

Let a > 0.

(a) If f(x) is an even function, then

    ∫_{−a}^a f(x) dx = 2 ∫_0^a f(x) dx

(b) If f(x) is an odd function, then

    ∫_{−a}^a f(x) dx = 0

Proof. For any function

∫_{−a}^a f(x) dx = ∫_0^a f(x) dx + ∫_{−a}^0 f(x) dx

When f is even, the two terms on the right hand side are equal. When f is odd, the two terms on the right hand side are negatives of each other.
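Theorem 3.2.11 is also easy to illustrate numerically. The following Python sketch (assuming numpy is available; the helper name is ours) checks both parts with f(x) = x² (even) and f(x) = x³ (odd) on [−2, 2]:

import numpy as np

def midpoint_riemann(f, a, b, n=10_000):
    # midpoint Riemann sum of f on [a, b] with n equal subintervals
    x = np.linspace(a, b, n + 1)
    mid = 0.5 * (x[:-1] + x[1:])
    return np.sum(f(mid)) * (b - a) / n

a = 2.0
print(midpoint_riemann(lambda x: x**3, -a, a))        # odd integrand: essentially 0
print(midpoint_riemann(lambda x: x**2, -a, a),
      2 * midpoint_riemann(lambda x: x**2, 0.0, a))   # even integrand: both are roughly 16/3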


3.2.2 §§ More Properties of Integration: Inequalities for Integrals


We are still unable to integrate many functions, however with a little work we can infer
bounds on integrals from bounds on their integrands.

Theorem3.2.12 (Inequalities for Integrals).

Let a ≤ b be real numbers and let the functions f(x) and g(x) be integrable on the interval a ≤ x ≤ b.

(a) If f(x) ≥ 0 for all a ≤ x ≤ b, then

    ∫_a^b f(x) dx ≥ 0

(b) If there are constants m and M such that m ≤ f(x) ≤ M for all a ≤ x ≤ b, then

    m(b − a) ≤ ∫_a^b f(x) dx ≤ M(b − a)

(c) If f(x) ≤ g(x) for all a ≤ x ≤ b, then

    ∫_a^b f(x) dx ≤ ∫_a^b g(x) dx

(d) We have

    | ∫_a^b f(x) dx | ≤ ∫_a^b |f(x)| dx

Proof. (a) By interpreting the integral as the signed area, this statement simply says thatˇ if
the curve y = f ( x ) lies above the x–axis and a ď b, then the signed area of ( x, y) ˇ a ď
x ď b, 0 ď y ď f ( x ) is at least zero. This is quite clear. Alternatively, we could argue
(
şb
more algebraically from Definition 3.1.8. We observe that when we define a f ( x )dx
via Riemann sums, every summand, f ( xi,n ˚ ) b´a ě 0. Thus the whole sum is nonnega-
n
tive and consequently, so is the limit, and thus so is the integral.

(b) We can argue this from (a) with a little massaging. Let g( x ) = M ´ f ( x ), then since
f ( x ) ď M, we have g( x ) = M ´ f ( x ) ě 0 so that

żb żb

M ´ f ( x ) dx = g( x )dx ě 0.
a a


but we also have


żb żb żb

M ´ f ( x ) dx = Mdx ´ f ( x )dx
a a a
żb
= M(b ´ a) ´ f ( x )dx
a
Thus
żb
M(b ´ a) ´ f ( x )dx ě 0 rearrange
a
żb
M(b ´ a) ě f ( x )dx
a
şb
as required. The argument showing a f ( x )dx ě m(b ´ a) is similar.
(c) Now let h( x ) = g( x ) ´ f ( x ). Since f ( x ) ď g( x ), we have h( x ) = g( x ) ´ f ( x ) ě 0 so
that
żb żb

g( x ) ´ f ( x ) dx = h( x )dx ě 0
a a
But we also have that
żb żb żb

g( x ) ´ f ( x ) dx = g( x )dx ´ f ( x )dx
a a a
Thus
żb żb
g( x )dx ´ f ( x )dx ě 0 rearrange
a a
żb żb
g( x )dx ě f ( x )dx
a a
as required.
(d) For any x, | f ( x )| is either f ( x ) or ´ f ( x ) (depending on whether f ( x ) is positive or
negative), so we certainly have
f ( x ) ď | f ( x )| and ´ f ( x ) ď | f ( x )|
Applying part (c) to each of those inequalities gives
żb żb żb żb
f ( x )dx ď | f ( x )|dx and ´ f ( x )dx ď | f ( x )|dx
a a a a
şb şb şb
Now | a f ( x )dx| is either equal to a f ( x )dx or ´ a f ( x )dx (depending on whether the
integral is positive or negative). In either case we can apply the above two inequalities
to get the same result, namely
ˇż ˇ ż
ˇ b ˇ b
ˇ f ( x )dxˇ ď | f ( x )|dx.
ˇ ˇ
ˇ a ˇ a


Example 3.2.13 (∫_0^{π/3} √(cos x) dx)

Consider the integral

∫_0^{π/3} √(cos x) dx

This is not so easy to compute exactly19, but we can bound it quite quickly.
For x between 0 and π/3, the function cos x takes values20 between 1 and 1/2. Thus the function √(cos x) takes values between 1 and 1/√2. That is

1/√2 ≤ √(cos x) ≤ 1        for 0 ≤ x ≤ π/3.

Consequently, by Theorem 3.2.12(b) with a = 0, b = π/3, m = 1/√2 and M = 1,

π/(3√2) ≤ ∫_0^{π/3} √(cos x) dx ≤ π/3

Plugging these expressions into a calculator gives us

0.7404804898 ≤ ∫_0^{π/3} √(cos x) dx ≤ 1.047197551

Example 3.2.13
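For comparison, a numerical evaluation of the integral (here done in Python with numpy and scipy, which is an assumption on our part, not something the text relies on) confirms that the true value sits between the two bounds:

import numpy as np
from scipy.integrate import quad

value, error_estimate = quad(lambda x: np.sqrt(np.cos(x)), 0.0, np.pi / 3)
print(np.pi / (3 * np.sqrt(2)), value, np.pi / 3)
# roughly 0.7405 <= 0.9480 <= 1.0472, consistent with Theorem 3.2.12(b)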

3.3IJ The Fundamental Theorem of Calculus


We have spent quite a few pages (and lectures) talking about definite integrals, what
they are (Definition 3.1.8), when they exist (Theorem 3.1.9), how to compute some spe-
cial cases (Section 3.1.3), some ways to manipulate them (Theorem 3.2.1 and 3.2.3) and
how to bound them (Theorem 3.2.12). Conspicuously missing from all of this has been a
discussion of how to compute them in general. It is high time we rectified that.
The single most important tool used to evaluate integrals is called “the Fundamental
Theorem of Calculus”. Its grand name is justified — it links the two branches of calculus
by connecting derivatives to integrals. In so doing it also tells us how to compute integrals.
Very roughly speaking the derivative of an integral is the original function. This fact
allows us to compute integrals using antiderivatives21 . Of course “very rough” is not
enough — let’s be precise.

19 It is not too hard to use Riemann sums and a computer to evaluate it numerically: 0.948025319 . . . .
20 You know the graphs of sine and cosine, so you should be able to work this out without too much
difficulty.
21 You learned these near the end of your differential calculus course. Now is a good time to revise — but
we’ll go over them here since they are so important in what follows.


Theorem3.3.1 (Fundamental Theorem of Calculus).

Let a < b and let f(x) be a function which is defined and continuous on [a, b].

Part 1: Let F(x) = ∫_a^x f(t) dt for any x ∈ [a, b]. Then the function F(x) is differentiable and further

    F′(x) = f(x)

Part 2: Let G(x) be any function which is defined and continuous on [a, b]. Further let G(x) be differentiable with G′(x) = f(x) for all a < x < b. Then

    ∫_a^b f(x) dx = G(b) − G(a)      or equivalently      ∫_a^b G′(x) dx = G(b) − G(a)

Before we prove this theorem and look at a bunch of examples of its application, it
is important that we recall one definition from differential calculus — antiderivatives. If
F1 ( x ) = f ( x ) on some interval, then F ( x ) is called an antiderivative of f ( x ) on that inter-
val. So Part 2 of the Fundamental Theorem of Calculus tells us how to evaluate the
definite integral of f ( x ) in terms of any of its antiderivatives — if G ( x ) is any antideriva-
tive of f ( x ) then
żb
f ( x )dx = G (b) ´ G ( a)
a
şb
The form a G1 ( x ) dx = G (b) ´ G ( a) of the Fundamental Theorem relates the rate of
change of G ( x ) over the interval a ď x ď b to the net change of G between x = a and
x = b. For that reason, it is sometimes called the “net change theorem”.
We’ll start with a simple example. Then we’ll see why the Fundamental Theorem is
true and then we’ll do many more, and more involved, examples.
Example 3.3.2 (A first example)
şb
Consider the integral a xdx which we have explored previously in Example 3.2.5.
• The integrand is f ( x ) = x.
x2
• We can readily verify that G ( x ) = 2 satisfies G1 ( x ) = f ( x ) and so is an antideriva-
tive of the integrand.
• Part 2 of Theorem 3.3.1 then tells us that
żb
f ( x )dx = G (b) ´ G ( a)
a
żb
b2 a2
xdx = ´
a 2 2
which is precisely the result we obtained (with more work) in Example 3.2.5.


Example 3.3.2

We do not give completely rigorous proofs of the two parts of the theorem — that is
not really needed for this course. We just give the main ideas of the proofs so that you can
understand why the theorem is true.
Part 1. We wish to show that if
żx
F(x) = f (t)dt then F1 ( x ) = f ( x )
a

• Assume that F is the above integral and then consider F1 ( x ). By definition

F ( x + h) ´ F ( x )
F1 ( x ) = lim
hÑ0 h

• To understand this limit, we interpret the terms F ( x ), F ( x + h) as signed areas. To


simplify this further, let’s only consider the case that f is always nonnegative and
that h ą 0. These restrictions are not hard to remove, but the proof ideas are a bit
cleaner if we keep them in place. Then we have

F ( x + h) = the area of the region (t, y) ˇ a ď t ď x + h, 0 ď y ď f (t)


ˇ (

F ( x ) = the area of the region (t, y) ˇ a ď t ď x, 0 ď y ď f (t)


ˇ (

• Then the numerator

F ( x + h) ´ F ( x ) = the area of the region (t, y) ˇ x ď t ď x + h, 0 ď y ď f (t)


ˇ (

This is just the more darkly shaded region in the figure

(Figure: the region under y = f(t) from t = a to t = x + h, with the thin strip between t = x and t = x + h shaded more darkly.)

• We will be taking the limit h Ñ 0. So suppose that h is very small. Then, as t runs
from x to x + h, f(t) runs only over a very narrow range of values22, all close to
f ( x ).

• So the darkly shaded region is almost a rectangle of width h and height f ( x ) and so
F ( x +h)´F ( x )
has an area which is very close to f ( x )h. Thus h is very close to f ( x ).

22 Notice that if f were discontinuous, then this might be false.


F ( x +h)´F ( x )
• In the limit h Ñ 0, h becomes exactly f ( x ), which is precisely what we
want.

We can make the above more rigorous using the Mean Value Theorem23 .
şb
Part 2. We want to show that a f (t)dt = G (b) ´ G ( a). To do this we exploit the fact that
the derivative of a constant is zero.
• Let
żx
H (x) = f (t)dt ´ G ( x ) + G ( a)
a

Then the result we wish to prove is that H (b) = 0. We will do this by showing that
H ( x ) = 0 for all x between a and b.
• We first show that H ( x ) is constant by computing its derivative:
d x d d
ż
H (x) =
1
f (t)dt ´ ( G ( x )) + ( G ( a))
dx a dx dx
Since G ( a) is a constant, its derivative is 0 and by assumption the derivative of G ( x )
is just f ( x ), so
d x
ż
= f (t)dt ´ f ( x )
dx a
Now Part 1 of the theorem tells us that this derivative is just f ( x ), so
= f (x) ´ f (x) = 0
Hence H is constant.
• To determine which constant we just compute H ( a):
ża
H ( a) = f (t)dt ´ G ( a) + G ( a)
a
ża
= f (t)dt by Theorem 3.2.3(a)
a
=0
as required.

23 The MVT tells us that there is a number c between x and x + h so that


F ( x + h) ´ F ( x ) F ( x + h) ´ F ( x )
F1 (c) = =
( x + h) ´ x h
But since F1 ( x ) = f ( x ), this tells us that
F ( x + h) ´ F ( x )
= f (c)
h
where c is trapped between x + h and x. Now when we take the limit as h Ñ 0 we have that this number
c is squeezed to x and the result follows.


The simple example we did above (Example 3.3.2), demonstrates the application of
part 2 of the Fundamental Theorem of Calculus. Before we do more examples (and there
will be many more over the coming sections) we should do some examples illustrating
the use of part 1 of the fundamental theorem of calculus. Then we’ll move on to part 2.
 ş 
d x
Example 3.3.3 dx 0 tdt
şx
Consider the integral 0 t dt. We know how to evaluate this — it is just Example 3.3.2 with
a = 0, b = x. So we have two ways to compute the derivative. We can evaluate the in-
tegral and then take the derivative, or we can apply Part 1 of the Fundamental Theorem.
We’ll do both, and check that the two answers are the same.
First, Example 3.3.2 gives
żx
x2
F(x) = t dt =
0 2

So of course F1 ( x ) = x. Second, Part 1 of the Fundamental Theorem of calculus tells us that


the derivative of F ( x ) is just the integrand. That is, Part 1 of the Fundamental Theorem of
Calculus also gives F1 ( x ) = x.
Example 3.3.3

In the previous example we were able to evaluate the integral explicitly, so we did not
need the Fundamental Theorem to determine its derivative. Here is an example that really
does require the use of the Fundamental Theorem.
 ş 
d x ´t2
Example 3.3.4 dx 0 e dt
d x ´t2
We would like to find dx 0e dt. In the previous example, we were able to compute the
ş

corresponding derivative in two ways — we could explicitly compute the integral and
then differentiate the result, or we could apply part 1 of the Fundamental Theorem of cal-
culus. In this example we do not know the integral explicitly. Indeed it is not possible
şx 2
to express24 the integral 0 e´t dt as a finite combination of standard functions such as
polynomials, exponentials, trigonometric functions and so on.
Despite this, we can find its derivative by just applying the first part of the Fundamen-

şx 2
24 The integral 0 e´t dt is closely related to the “error function” which is an extremely important function
in mathematics. While we cannot express this integral (or the error function) as a finite combination of
polynomials, exponentials etc, we can express it as an infinite series
∫_0^x e^{−t²} dt = x − x³/(3·1) + x⁵/(5·2) − x⁷/(7·3!) + x⁹/(9·4!) + ··· + (−1)^k x^{2k+1}/((2k+1)·k!) + ···

But more on this in Chapter 5.


2
tal Theorem of Calculus with f (t) = e´t and a = 0. That gives
żx żx
d ´t2 d
e dt = f (t)dt
dx 0 dx 0
2
= f ( x ) = e´x

Example 3.3.4
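Part 1 of the Fundamental Theorem can be checked numerically even when no explicit antiderivative is available. In the Python sketch below (assuming numpy and scipy are available; the function name F is ours) we approximate F(x) = ∫_0^x e^{−t²} dt with a numerical integrator and compare a difference quotient of F with the integrand e^{−x²}:

import numpy as np
from scipy.integrate import quad

def F(x):
    # F(x) = integral of e^(-t^2) from t = 0 to t = x
    return quad(lambda t: np.exp(-t**2), 0.0, x)[0]

x, h = 1.3, 1e-5
difference_quotient = (F(x + h) - F(x - h)) / (2 * h)
print(difference_quotient, np.exp(-x**2))   # both are roughly 0.1845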

Let us ratchet up the complexity of the previous example — we can make the limits
of the integral more complicated functions. So consider the previous example with the
upper limit x replaced by x2 :
 ş 2 
d x ´t2
Example 3.3.5 dx 0 e dt
ş x2 2
Consider the integral 0 e´t dt. We would like to compute its derivative with respect to x
using part 1 of the fundamental theorem of calculus.
The
ş x Fundamental Theorem tells us how to compute the derivative of functions of the
form a f (t)dt but the integral at hand is not of the specified form because the upper limit
we have is x2 , rather than x, — so more care is required. Thankfully we can deal with this
obstacle with only a little extra work. The trick is to define an auxiliary function by simply
changing the upper limit to x. That is, define
żx
2
E( x ) = e´t dt
0

Then the integral we want to work with is


ż x2
2 2
E( x ) = e´t dt
0

The derivative E1 ( x ) can be found via part 1 of the Fundamental Theorem of calculus (as
2
we did in Example 3.3.4) and is E1 ( x ) = e´x . We can then use this fact with the chain rule
to compute the derivative we need:
ż x2
d 2 d
e´t dt = E( x2 ) use the chain rule
dx 0 dx
= 2xE1 ( x2 )
4
= 2xe´x

Example 3.3.5

What if both limits of integration are functions of x? We can still make this work, but
we have to split the integral using Theorem 3.2.3.


 ş 2 
d x ´t2
Example 3.3.6 dx x e dt

Consider the integral


ż x2
2
e´t dt
x

As was the case in the previous example, we have to do a little pre-processing before we
can apply the Fundamental Theorem.
This time (by design), not only is the upper limit of integration x2 rather than x,ş but the
x
lower limit of integration also depends on x — this is different from the integral a f (t)dt
in the Fundamental Theorem where the lower limit of integration is a constant.
Fortunately we can use the basic properties of integrals (Theorem 3.2.3(b) and (c)) to
ş x2 2
split x e´t dt into pieces whose derivatives we already know.
ż x2 ż0 ż x2
´t2 ´t2 2
e dt = e dt + e´t dt by Theorem 3.2.3(c)
x x 0
żx ż x2
´t2 2
=´ e dt + e´t dt by Theorem 3.2.3(b)
0 0

With this pre-processing, both integrals are of the right form. Using what we have learned
in the the previous two examples,
ż x2 żx ż x2 !
d 2 d 2 2
e´t dt = ´ e´t dt + e´t dt
dx x dx 0 0
żx ż 2
d ´t2 d x ´t2
=´ e dt + e dt
dx 0 dx 0
2 4
= ´e´x + 2xe´x

Example 3.3.6
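The same kind of numerical check works when both limits depend on x. Here is a Python sketch (assuming numpy and scipy are available; the name G is ours) comparing a difference quotient of G(x) = ∫_x^{x²} e^{−t²} dt with the formula just derived:

import numpy as np
from scipy.integrate import quad

def G(x):
    # G(x) = integral of e^(-t^2) from t = x to t = x^2
    return quad(lambda t: np.exp(-t**2), x, x**2)[0]

x, h = 1.1, 1e-5
difference_quotient = (G(x + h) - G(x - h)) / (2 * h)
formula = -np.exp(-x**2) + 2 * x * np.exp(-x**4)
print(difference_quotient, formula)   # both are roughly 0.2107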

3.3.1 §§ Indefinite Integration


Before we start to work with part 2 of the Fundamental Theorem, we need a little termi-
nology and notation. First some terminology — you may have seen this definition in your
differential calculus course.

Definition3.3.7 (Antiderivatives).

Let f ( x ) and F ( x ) be functions. If F1 ( x ) = f ( x ) on an interval, then we say that


F ( x ) is an antiderivative of f ( x ) on that interval.


As we saw above, an antiderivative of f ( x ) = x is F ( x ) = x2 /2 — we can easily verify


this by differentiation. Notice that x2 /2 + 3 is also an antiderivative of x, as is x2 /2 + C
for any constant C. This observation gives us the following simple lemma.

Lemma3.3.8.

Let f ( x ) be a function and let F ( x ) be an antiderivative of f ( x ). Then F ( x ) + C


is also an antiderivative for any constant C. Further, every antiderivative of f ( x )
must be of this form.

Proof. There are two parts to the lemma and we prove each in turn.

• Let F ( x ) be an antiderivative of f ( x ) and let C be some constant. Then

d d d
( F(x) + C) = ( F ( x )) + (C )
dx dx dx
= f (x) + 0

since the derivative of a constant is zero, and by definition the derivative of F ( x ) is


just f ( x ). Thus F ( x ) + C is also an antiderivative of f ( x ).

• Now let F ( x ) and G ( x ) both be antiderivatives of f ( x ) — we will show that G ( x ) =


F ( x ) + C for some constant C. To do this let H ( x ) = G ( x ) ´ F ( x ). Then

d d d d
H (x) = ( G ( x ) ´ F ( x )) = G(x) ´ F(x) = f (x) ´ f (x) = 0
dx dx dx dx

Since the derivative of H ( x ) is zero, H ( x ) must be a constant function25 . Thus


H ( x ) = G ( x ) ´ F ( x ) = C for some constant C and the result follows.

Based on the above lemma we have the following definition.

25 This follows from the Mean Value Theorem. Say H ( x ) were not constant, then there would be two
numbers a ă b so that H ( a) ‰ H (b). Then the MVT tells us that there is a number c between a and b so
that

H (b) ´ H ( a)
H 1 (c) = .
b´a

Since both numerator and denominator are non-zero, we know the derivative at c is nonzero. But
this would contradict the assumption that derivative of H is zero. Hence we cannot have a ă b with
H ( a) ‰ H (b) and so H ( x ) must be constant.


Definition3.3.9.

The “indefinite integral of f ( x )” is denoted by f ( x )dx and should be regarded


ş

as the general antiderivative of f ( x ). In particular, if F ( x ) is an antiderivative of


f ( x ) then
ż
f ( x )dx = F ( x ) + C

where the C is an arbitrary constant. In this context, the constant C is also often
called a “constant of integration”.

Now we just need a tiny bit more notation.

Notation3.3.10.

The symbol
ż ˇb
f ( x )dxˇˇ
ˇ
a

denotes the change in an antiderivative of f ( x ) from x = a to x = b. More


precisely, let F ( x ) be any antiderivative of f ( x ). Then
ż ˇb
f ( x )dxˇˇ = F ( x )|ba = F (b) ´ F ( a)
ˇ
a

Notice that this notation allows us to write part 2 of the Fundamental Theorem as
żb ż ˇb
f ( x )dx = f ( x )dxˇˇ
ˇ
a a
= F ( x )|ba = F (b) ´ F ( a)

Some texts also use an equivalent notation using square brackets:


żb h ib
f ( x )dx = F ( x ) = F (b) ´ F ( a).
a a

You should be familiar with both notations.


We’ll soon develop some strategies for computing more complicated integrals. But for
now, we’ll try a few integrals that are simple enough that we can just guess the answer.
Of course, any antiderivative that we can guess we can also check — simply differentiate
the guess and verify you get back to the original function:

d
ż
f ( x )dx = f ( x ).
dx


We do these examples in some detail to help us become comfortable finding indefinite


integrals.
Example 3.3.11
Compute the definite integral ∫_1^2 x dx.

Solution. We have already seen, in Example 3.2.5, that ∫_1^2 x dx = (2² − 1²)/2 = 3/2. We shall now
rederive that result using the Fundamental Theorem of Calculus.
• The main difficulty in this approach is finding the indefinite integral (an antideriva-
tive) of x. That is, we need to find a function F ( x ) whose derivative is x. So think
back to all the derivatives you computed last term26 and try to remember a function
whose derivative was something like x.
• This shouldn’t be too hard — we recall that the derivatives of polynomials are poly-
nomials. More precisely, we know that
d n
x = nx n´1
dx
So if we want to end up with just x = x1 , we need to take n = 2. However this gives
us
d 2
x = 2x
dx

• This is pretty close to what we want except for the factor of 2. Since this is a constant
we can just divide both sides by 2 to obtain:
1 d 2 1
¨ x = ¨ 2x which becomes
2 dx 2
d x 2
¨ =x
dx 2
which is exactly what we need. It tells us that x2 /2 is an antiderivative of x.
• Once one has an antiderivative, it is easy to compute the indefinite integral
1
ż
xdx = x2 + C
2
as well as the definite integral:

1 2 ˇˇ2
ż2 ˇ
xdx = x ˇ since x2 /2 is the antiderivative of x
1 2 1
1 1 3
= 22 ´ 12 =
2 2 2
26 Of course, this assumes that you did your differential calculus course last term. If you did that course at
a different time then please think back to that point in time. If it is long enough ago that you don’t quite
remember when it was, then you should probably do some revision of derivatives of simple functions
before proceeding further.


Example 3.3.11
While the previous example could be computed using signed areas, the following example
would be very difficult to compute without using the Fundamental Theorem of Calculus.

Example 3.3.12
Compute ∫_0^{π/2} sin x dx.
Solution.
• Once again, the crux of the solution is guessing the antiderivative of sin x — that is
finding a function whose derivative is sin x.
• The standard derivative that comes closest to sin x is
d
cos x = ´ sin x
dx
which is the derivative we want, multiplied by a factor of ´1.
• Just as we did in the previous example, we multiply this equation by a constant to
remove this unwanted factor:
d
(´1) ¨ cos x = (´1) ¨ (´ sin x ) giving us
dx
d 
´ cos x = sin x
dx
This tells us that ´ cos x is an antiderivative of sin x.
• Now it is straightforward to compute the integral:
ż π/2
sin xdx = ´ cos x|0/2 since ´ cos x is the antiderivative of sin x
π

0
π
= ´ cos + cos 0
2
= 0+1 = 1

Example 3.3.12

Example 3.3.13
Find ∫_1^2 (1/x) dx.
Solution.
• Once again, the crux of the solution is guessing a function whose derivative is 1x .
Our standard way to differentiate powers of x, namely
d n
x = nx n´1 ,
dx


doesn’t work in this case — since it would require us to pick n = 0 and this would
give

d 0 d
x = 1 = 0.
dx dx

• Fortunately, we also know27 that

d 1
ln x =
dx x
which is exactly the derivative we want.

• We’re now ready to compute the prescribed integral.


ż2
1
dx = ln x|21 since ln x is an antiderivative of 1/x
1 x
= ln 2 ´ ln 1 since ln 1 = 0
= ln 2

Example 3.3.13

Example 3.3.14
Find ∫_{−2}^{−1} (1/x) dx.
Solution.

• As we saw in the last example,

d 1
ln x =
dx x
and if we naively use this here, then we will obtain

1
ż ´1
dx = ln(´1) ´ ln(´2)
´2 x

which makes no sense since the logarithm is only defined for positive numbers28 .

• We can work around this problem using a slight variation of the logarithm — ln |x|.

27 To align with what you probably saw in high school, we’ll use ln x to denote the natural logarithm.
This is unambiguous – ln x is always the same as loge x.
On the other hand, the precise meaning of log x is not universal. The implied base may be 10 (com-
mon in chemistry and physics), e (common in math and computer languages like Java, C, Python, and
MATLAB), or 2 (common in computer science).

28 This is not entirely true — one can extend the definition of the logarithm to negative numbers, but to
do so one needs to understand complex numbers which is a topic beyond the scope of this course.


– When x ą 0, we know that |x| = x and so we have

ln |x| = ln x differentiating gives us


d d 1
ln |x| = ln x = .
dx dx x
– When x ă 0 we have that |x| = ´x and so

ln |x| = ln(´x ) differentiating with the chain rule gives


d d
ln |x| = ln(´x )
dx dx
1 1
= ¨ (´1) =
(´x ) x

– Indeed, more generally we should write the indefinite integral of 1/x as


1
ż
dx = ln |x| + C
x
which is valid for all positive and negative x. It is, however, undefined at x = 0.

• We’re now ready to compute the prescribed integral.


ˇ´1
1
ż ´1
dx = ln |x|ˇˇ since ln |x| is an antiderivative of 1/x
ˇ
´2 x ´2
= ln | ´ 1| ´ ln | ´ 2| = ln 1 ´ ln 2
= ´ ln 2 = ln 1/2.

Example 3.3.14
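Since the sign issues around ln |x| are easy to get wrong, a quick numerical cross-check can be reassuring. The Python sketch below (assuming numpy and scipy are available) integrates 1/x over [−2, −1] numerically and compares with −ln 2:

import numpy as np
from scipy.integrate import quad

value, error_estimate = quad(lambda x: 1.0 / x, -2.0, -1.0)
print(value, -np.log(2))   # both are roughly -0.6931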

This next example raises a nasty issue that requires a little care. We know that the
function 1/x is not defined at x = 0 — so can we integrate over an interval that contains
x = 0 and still obtain an answer that makes sense? More generally can we integrate a
function over an interval on which that function has discontinuities?
Example 3.3.15
Find ∫_{−1}^1 (1/x²) dx.
Solution. Beware that this is a particularly nasty example, which illustrates a booby trap
hidden in the Fundamental Theorem of Calculus. The booby trap explodes when the
theorem is applied sloppily.
• The sloppy solution starts, as our previous examples have, by finding an antideriva-
tive of the integrand. In this case we know that
d 1 1
=´ 2
dx x x
which means that ´x´1 is an antiderivative of x´2 .


• This suggests (if we proceed naively) that

∫_{−1}^1 x^{−2} dx = [−1/x]_{−1}^1        since −1/x is an antiderivative of 1/x²
                   = −(1/1) − (−(1/(−1)))
                   = −2

Unfortunately,

• At this point we should really start to be concerned. This answer cannot be correct.
Our integrand, being a square, is positive everywhere. So our integral represents the
area of a region above the x–axis and must be positive.

• So what has gone wrong? The flaw in the computation is that the Fundamental
Theorem of calculus, which says that
if F′(x) = f(x)    then    ∫_a^b f(x) dx = F(b) − F(a),

is only applicable when F′(x) exists and equals f(x) for all x between a and b.

• In this case F′(x) = 1/x² does not exist at x = 0. So we cannot apply the Fundamental Theorem of Calculus as we tried to above.

An integral, like ∫_{−1}^1 (1/x²) dx, whose integrand is undefined somewhere in the domain of integration is called improper. We'll give a more thorough treatment of improper integrals later in the text. For now, we'll just say that the correct way to define (and evaluate) improper integrals is as a limit of well-defined approximating integrals. We shall later see that, not only is ∫_{−1}^1 (1/x²) dx not negative, it is infinite.
Example 3.3.15

The above examples have illustrated how we can use the fundamental theorem of
calculus to convert knowledge of derivatives into knowledge of integrals. We are now in
a position to easily build a table of integrals. Here is a short table of the most important
derivatives that we know.

F(x):            1    x^n        sin x    cos x     tan x     e^x    ln |x|    arcsin x        arctan x
f(x) = F′(x):    0    n x^{n−1}  cos x    −sin x    sec² x    e^x    1/x       1/√(1 − x²)     1/(1 + x²)

Of course we know other derivatives, such as those of sec x and cot x, however the ones
listed above are arguably the most important ones. From this table (with a very little
massaging) we can write down a short table of indefinite integrals.


Theorem3.3.16 (Important indefinite integrals).

f(x)                 F(x) = ∫ f(x) dx

1                    x + C

x^n                  x^{n+1}/(n+1) + C        provided that n ≠ −1

1/x                  ln |x| + C

e^x                  e^x + C

sin x                −cos x + C

cos x                sin x + C

sec² x               tan x + C

1/√(1 − x²)          arcsin x + C

1/(1 + x²)           arctan x + C

Example 3.3.17

Find the following integrals

ş7
(i) 2 e x dx

ş2 1
(ii) ´2 1+ x2 dx

ş3 3
(iii) 0 (2x + 7x ´ 2)dx

Solution. We can proceed with each of these as before — find the antiderivative and then
apply the Fundamental Theorem. The third integral is a little more complicated, but we
can split it up into monomials using Theorem 3.2.1 and do each separately.


(i) An antiderivative of e x is just e x , so


ż7 ˇ7
x xˇ
e dx = e ˇ
ˇ
2 2
= e ´ e2 = e2 ( e5 ´ 1).
7

1
(ii) An antiderivative of 1+ x 2
is arctan( x ), so
ż2 ˇ2
1
dx arctan x
ˇ
= ( ) ˇ
´2 1 + x2 ˇ
´2
= arctan(2) ´ arctan(´2)

We can simplify this a little further by noting that arctan( x ) is an odd function, so
arctan(´2) = ´ arctan(2) and thus our integral is

= 2 arctan(2)

(iii) We can proceed by splitting the integral using Theorem 3.2.1(d)


ż3 ż3 ż3 ż3
3 3
(2x + 7x ´ 2)dx = 2x dx + 7xdx ´ 2dx
0 0 0 0
ż3 ż3 ż3
3
= 2 x dx + 7 xdx ´ 2 dx
0 0 0

and because we know that x4 /4, x2 /2, x are antiderivatives of x3 , x, 1 respectively,


this becomes
 4 3  2 3
x 7x
= + ´ [2x ]30
2 0 2 0
81 7 ¨ 9
= + ´6
2 2
81 + 63 ´ 12 132
= = = 66.
2 2
We can also just find the antiderivative of the whole polynomial by finding the an-
tiderivatives of each term of the polynomial and then recombining them. This is
equivalent to what we have done above, but perhaps a little neater:
ż3  3
3 x4 7x2
(2x + 7x ´ 2)dx = + ´ 2x
0 2 2 0
81 7 ¨ 9
= + ´ 6 = 66.
2 2

Example 3.3.17
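If you have access to a computer algebra system, antiderivative-based evaluations like these can be double-checked symbolically. A short sketch using Python's sympy library (an assumption on our part; any computer algebra system would do) reproduces all three answers:

import sympy as sp

x = sp.symbols('x')
print(sp.integrate(sp.exp(x), (x, 2, 7)))           # e^7 - e^2
print(sp.integrate(1 / (1 + x**2), (x, -2, 2)))     # 2*atan(2), i.e. 2 arctan(2)
print(sp.integrate(2*x**3 + 7*x - 2, (x, 0, 3)))    # 66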


3.3.2 §§ (Optional) Marginal Cost and Marginal Revenue


Definition3.3.18.

Total cost, TC(q), is the the cost of producing q of units of a good.

• We call TC(0) the fixed cost, FC.

• The quantity TC(q) ´ TC(0) is the variable cost, VC(q).

Total cost is, therefore, the sum of fixed and variable costs:

TC(q) = FC + VC(q)

Fixed cost encompasses all expenses that do not change with quantity (such as rent on
a factory space, which is the same whether you make 1 or 1000 units). Fixed cost is a
constant, and generally nonzero. We can think of these expenses as being incurred before
the first unit is ever produced, hence the definition of fixed costs as TC(0).
Variable cost consists of expenses that depend on quantity. A typical example of such
an expense is raw materials: producing more units means using more raw materials.
Consider the cost of making “one more unit” of output, after having already made q
units: TC(q + 1) ´ TC(q). Using the definition of the derivative, we can approximate this
quantity by dTC
dq :

dTC TC(q + h) ´ TC(q) TC(q + 1) ´ TC(q)


= lim «
dq hÑ0 h 1

This motivates the definition of marginal cost.

Definition3.3.19.

Given a good with total cost function TC(q), the marginal cost of production of
the good is defined as
d  
MC(q) = TC .
dq

Suppose we know the marginal cost function, MC(q), and we want to find the total
cost function, TC(q). By the Fundamental Theorem of Calculus,
ż
TC(q) = MC(q) dq + C

for some constant C. In order to find C, we use the initial value

TC(0) = FC.


Example 3.3.20 (From marginal cost to total cost)

Suppose a product has fixed cost $100, and marginal cost function MC(q) = e´q + 3. What
is its total cost function?

Solution. Using the Fundamental Theorem of Calculus Part 1, given the definition MC(q) = d/dq [TC], we see:

TC(q) = ∫ MC dq + C = ∫ (e^{−q} + 3) dq + C

Antidifferentiating by inspection,

TC(q) = −e^{−q} + 3q + C

Using FC = TC(0):

100 = TC(0) = −e^{−0} + 3·0 + C = −1 + C
101 = C
All together,
TC(q) = ´e´q + 3q + 101

Example 3.3.20
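The same answer can be reached numerically: total cost is the fixed cost plus the accumulated marginal cost. A Python sketch (assuming numpy and scipy are available; the names are ours) compares that accumulation with the closed form found above:

import numpy as np
from scipy.integrate import quad

FC = 100.0
MC = lambda q: np.exp(-q) + 3        # marginal cost from the example

def TC(q):
    # total cost = fixed cost + accumulated marginal cost on [0, q]
    return FC + quad(MC, 0.0, q)[0]

q = 5.0
print(TC(q), -np.exp(-q) + 3*q + 101)   # both are roughly 115.9933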

In addition to considering total and marginal costs, we can consider total and marginal
revenue.
Definition3.3.21.

Suppose the total revenue collected from q units of output is given by the func-
tion TR(q), with TR(0) = 0 (since selling no products leads to no revenue). We
define the marginal revenue to be

d  
MR(q) = TR(q) .
dq

We define the unit price to be

TR(q)
P( q ) =
q

for q ą 0.

We think of marginal revenue as the extra revenue gained by producing one extra unit
of output.
Example 3.3.22
3
Suppose the marginal revenue function for a product is MR(q) = 10 ´ 1+ q2
. What is the
unit price of the product, if 10 units are sold?


Solution. First, we use the Fundamental Theorem of Calculus Part 1 to find the total
revenue function.
TR = ∫ MR dq + C = ∫ (10 − 3/(1 + q²)) dq + C
Referring to Theorem 3.3.16,
= 10q ´ 3 arctan q + C
Now we use the initial value TR(0) = 0.
0 = TR(0) = 10(0) ´ 3 arctan 0 + C
0=C
All together,
TR(q) = 10q ´ 3 arctan q
If 10 units are sold, the unit price is
P(10) = TR(10)/10 = (10(10) − 3 arctan(10))/10 = 10 − 0.3 arctan(10) ≈ 9.56

Example 3.3.22
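As a quick check on the arithmetic in the last step, here is a short Python computation (assuming numpy is available):

import numpy as np

TR = lambda q: 10*q - 3*np.arctan(q)   # total revenue from the example
print(TR(10.0) / 10.0)                  # unit price P(10), roughly 9.56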

3.4IJ Substitution
In the previous section we explored the Fundamental Theorem of Calculus and the link it
provides between definite integrals and antiderivatives. Indeed, integrals with simple in-
tegrands are usually evaluated via this link. In this section we start to explore methods for
integrating more complicated integrals. We have already seen — via Theorem 3.2.1 — that
integrals interact very nicely with addition, subtraction and multiplication by constants:
żb żb żb
( A f ( x ) + Bg( x )) dx = A f ( x )dx + B g( x )dx
a a a

for A, B constants. By combining this with the list of indefinite integrals in Theorem 3.3.16,
we can compute integrals of linear combinations of simple functions. For example
ż4  ż4 ż4 ż4
2
x
e ´ 2 sin x + 3x dx = x
e dx ´ 2 sin xdx + 3 x2 dx
1 1 1 1
 3  ˇ4
x ˇ
= e x + (´2) ¨ (´ cos x ) + 3 ˇ and so on
3 ˇ
1

Of course there are a great many functions that can be approached in this way, however
there are some very simple examples that cannot.
x
ż ż ż
x
sin(πx )dx xe dx dx
x2 ´ 5x + 6


In each case the integrands are not linear combinations of simpler functions; in order to
compute them we need to understand how integrals (and antiderivatives) interact with
compositions, products and quotients. We reached a very similar point in our differential
calculus course where we understood the linearity of the derivative,
d df dg
( A f ( x ) + Bg( x )) = A +B ,
dx dx dx
but had not yet seen the chain, product and quotient rules29 . While we will develop tools
to find the second and third integrals in later sections, we should really start with how to
integrate compositions of functions.
It is important to state up front, that in general one cannot write down the integral of
the composition of two functions — even if those functions are simple. This is not because
the integral does not exist. Rather it is because the integral cannot be written down as
a finite combination of the standard functions we know. A very good example of this,
which we encountered in Example 3.3.4, is the composition of e x and ´x2 . Even though
we know
1
ż ż
x x
e dx = e + C and ´x2 dx = ´ x3 + C
3
there is no simple function that is equal to the indefinite integral
ż
2
e´x dx.

even though the indefinite integral exists. In this way integration is very different from
differentiation.
With that caveat out of the way, we can introduce the substitution rule. The substitu-
tion rule is obtained by antidifferentiating the chain rule. In some sense it is the chain rule
in reverse. For completeness, let us restate the chain rule:
Theorem3.4.1 (The chain rule).

Let F (u) and u( x ) be differentiable functions and form their composition


F (u( x )). Then

d  
F u ( x ) = F 1 u ( x ) ¨ u1 ( x )
dx
Equivalently, if y( x ) = F (u( x )), then

dy dF du
= ¨ .
dx du dx

Consider a function f (u), which has antiderivative F (u). Then we know that
ż ż
f (u)du = F1 (u)du = F (u) + C

29 If your memory of these rules is a little hazy then you really should go back and revise them before
proceeding. You will definitely need a good grasp of the chain rule for what follows in this section.


Now take the above equation and substitute into it u = u( x ) — i.e. replace the variable u
with any (differentiable) function of x to get
ż ˇ
f (u)duˇˇ = F (u( x )) + C
ˇ
u=u( x )

But now the right-hand side is a function of x, so we can differentiate it with respect to x
to get
d
F (u( x )) = F1 (u( x )) ¨ u1 ( x )
dx
This tells us that F (u( x )) is an antiderivative of the function F1 (u( x )) ¨ u1 ( x ) = f (u( x ))u1 ( x ).
Thus we know
 1 
ż ż ˇ
f u( x ) ¨ u ( x ) dx = F u( x ) + C = f (u) duˇˇ
ˇ
u=u( x )

This is the substitution rule for indefinite integrals.


Theorem3.4.2 (The substitution rule — indefinite integral version).

For any differentiable function u( x ):


ż ż ˇ
f (u( x ))u ( x )dx = f (u)duˇˇ
1
ˇ
u=u( x )

In order to apply the substitution rule successfully we will have to write the integrand
in the form f (u( x )) ¨ u1 ( x ). To do this we need to make a good choice of the function u( x );
after that it is not hard to then find f (u) and u1 ( x ). Unfortunately there is no one strategy
for choosing u( x ). This can make applying the substitution rule more art than science30 .
Here we suggest two possible strategies for picking u( x ):
(1) Factor the integrand and choose one of the factors to be u1 ( x ). For this to work, you
must be able to easily find the antiderivative of the chosen factor. The antiderivative
will be u( x ).
(2) Look for a factor in the integrand that is a function with an argument
 that is more
complicated than just “x”. That factor will play the role of f u( x ) Choose u( x ) to be
the complicated argument.
Here are two examples which illustrate each of those strategies in turn.
Example 3.4.3

Consider the integral


ż
9 sin8 ( x ) cos( x )dx

30 Thankfully this does become easier with experience and we recommend that the reader read some
examples and then practice a LOT.


We want to massage this into the form of the integrand in the substitution rule — namely
f (u( x )) ¨ u1 ( x ). Our integrand can be written as the product of the two factors

9 sin⁸(x) · cos(x)        (first factor: 9 sin⁸(x); second factor: cos(x))

and we start by determining (or guessing) which factor plays the role of u1 ( x ). We can
choose u1 ( x ) = 9 sin8 ( x ) or u1 ( x ) = cos( x ).
• If we choose u1 ( x ) = 9 sin8 ( x ), then antidifferentiating this to find u( x ) is really not
very easy. So it is perhaps better to investigate the other choice before proceeding
further with this one.

• If we choose u1 ( x ) = cos( x ), then we know (Theorem 3.3.16) that u( x ) = sin( x ). This


also works nicely because it makes the other factor simplify quite a bit 9 sin8 ( x ) =
9u8 . This looks like the right way to go.
So we go with the second choice. Set u1 ( x ) = cos( x ), u( x ) = sin( x ), then
ż ż
9 sin ( x ) cos( x )dx = 9u( x )8 ¨ u1 ( x )dx
8

ż ˇ
8
= 9u duˇˇ by the substitution rule
ˇ
u=sin( x )

We are now left with the problem of antidifferentiating a monomial; this we can do with
Theorem 3.3.16.
  ˇˇ
9
= u + C ˇˇ
u=sin( x )
9
= sin ( x ) + C

Note that 9 sin8 ( x ) cos( x ) is a function of x. So our answer, which is the indefinite integral
of 9 sin8 ( x ) cos( x ), must also be a function of x. This is why we have substituted u =
sin( x ) in the last step of our solution — it makes our solution a function of x.
Example 3.4.3

Example 3.4.4

Evaluate the integral


ż
3x2 cos( x3 )dx

Solution. Again we are going to use the substitution rule and helpfully our integrand is a
product of two factors
2
loo3x cos( x3 )
moon ¨ loomoon
first factor second factor



The second factor, cos x3 is a function, namely cos, with a complicated argument, namely
x3 . So we try u( x ) = x3 . Then u1 ( x ) = 3x2 , which is the other factor in the integrand. So
the integral becomes


ż ż
2 3
3x cos( x )dx = u1 ( x ) cos u( x ) dx just swap order of factors

ż
= cos u( x ) u1 ( x )dx by the substitution rule
ż ˇ
= cos(u)duˇˇ
ˇ
u= x3
ˇ
= (sin(u) + C ) ˇˇ using Theorem 3.3.16)
ˇ
u= x3
= sin( x3 ) + C

Example 3.4.4
Now let’s look at a definite integral.
ş 
1 x x
Example 3.4.5 0 e sin(e )dx

Compute
ż1

e x sin e x dx.
0

Solution. Again we use the substitution rule.

• The integrand is again the product of two factors and we can choose u1 ( x ) = e x or
u1 ( x ) = sin(e x ).

• If we choose u1 ( x ) = e x then u( x ) = e x and the other factor becomes sin(u) —


this looks promising. Notice that if we applied the other strategy of looking for a
complicated argument then we would arrive at the same choice.

• So we try u1 ( x ) = e x and u( x ) = e x . This gives (if we ignore the limits of integration


for a moment)

 
ż ż
x x
e sin e dx = sin u( x ) u1 ( x )dx apply the substitution rule
ż ˇ
= sin(u)duˇˇ
ˇ
u=e x
ˇ
= (´ cos(u) + C ) ˇˇ
ˇ
u=e x
x

= ´ cos e + C


• But what happened to the limits of integration? We can incorporate them now. We
have just shown that the indefinite integral is ´ cos(e x ), so by the fundamental the-
orem of calculus
ż1
  1
e x sin e x dx = ´ cos e x 0
0
= ´ cos(e1 ) ´ (´ cos(e0 ))
= ´ cos(e) + cos(1)

Example 3.4.5

The example below introduces a special case where the “inside” function is linear.
Example 3.4.6

Compute the indefinite integrals


∫ √(2x + 1) dx      and      ∫ e^{3x−2} dx

Solution.

• Starting with the first integral, we see that it is not too hard to spot the complicated argument. If we set u(x) = 2x + 1 then the integrand is just √u.

• Hence we substitute 2x + 1 → u and dx → (1/u′(x)) du = (1/2) du:
ż ?
? 1
ż
2x + 1dx = u du
2
1
ż
= u1/2 du
2
 ˇ
2 3/2 1
u ¨ + C ˇˇ
ˇ
=
3 2 u=2x +1
1
= (2x + 1)3/2 + C
3

• We can evaluate the second integral in much the same way. Set u(x) = 3x − 2 and replace dx by (1/u′(x)) du = (1/3) du:

1
ż ż
3x´2
e dx = eu du
3
 ˇ
1 u
e + C ˇˇ
ˇ
=
3 u=3x´2
1 3x´2
= e +C
3


Example 3.4.6
This last example illustrates that substitution can be used to easily deal with arguments of the form ax + b, i.e. arguments that are linear functions of x, and suggests the following theorem.
Theorem3.4.7.

Let F (u) be an antiderivative of f (u) and let a, b be constants. Then

1
ż
f ( ax + b)dx = F ( ax + b) + C
a

Proof. We can show this using the substitution rule. Let u( x ) = ax + b so u1 ( x ) = a, then
1
ż ż
f ( ax + b)dx = f (u) ¨ 1 du
u (x)
1
ż
= f (u)du
a
1
ż
= f (u)du since a is a constant
a
1
ˇ
= F (u)ˇˇ +C since F (u) is an antiderivative of f (u)
ˇ
a u= ax +b
1
= F ( ax + b) + C.
a

3.4.1 §§ Substitution and Definite Integrals


Theorem 3.4.2, the substitution rule for indefinite integrals, tells us  that if F (u) is any
antiderivative for f (u), then F u( x ) is an antiderivative for f u( x ) u1 ( x ). So the Funda-
mental Theorem of Calculus gives us
ˇ x = b
żb
 1
ˇ
f u( x ) u ( x ) dx = F u( x ) ˇˇ
a
 x=a 
= F u(b) ´ F u( a)
ż u(b)
= f (u) du since F (u) is an antiderivative for f (u)
u( a)
and we have just found
Theorem3.4.8 (The substitution rule — definite integral version).

For any differentiable function u( x ):


żb ż u(b)
f (u( x ))u ( x )dx =
1
f (u)du
a u( a)


Notice that to get from the integral on the left hand side to the integral on the right
hand side you
• substitute31 u( x ) Ñ u and u1 ( x )dx Ñ du,
• set the lower limit for the u integral to the value of u (namely u( a)) that corresponds
to the lower limit of the x integral (namely x = a), and
• set the upper limit for the u integral to the value of u (namely u(b)) that corresponds
to the upper limit of the x integral (namely x = b).
şb 
Also note that we now have two ways to evaluate definite integrals of the form a f u( x ) u1 ( x ) dx.

• We can find the indefinite integral f u( x ) u1 ( x ) dx, using Theorem 3.4.2, and then
ş

evaluate the result between x = a and x = b. This is what was done in Example 3.4.5.
• Or we can apply Theorem 3.4.2. This entails finding the indefinite integral f (u) du
ş

and evaluating the result between u = u( a) and u = u(b). This is what we will do
in the following example.
ş 
1
Example 3.4.9 0 x2 sin( x3 + 1)dx

Compute
ż1

x2 sin x3 + 1 dx
0
Solution.

• In this example the integrand is already neatly factored into two pieces. While we
could deploy either of our two strategies, it is perhaps easier in this case to choose
u( x ) by looking for a complicated argument.

• The second factor of the integrand is sin x3 + 1 , which is the function sin evaluated
at x3 + 1. So set u( x ) = x3 + 1, giving u1 ( x ) = 3x2 and f (u) = sin(u)
• The first factor of the integrand is x2 which is not quite u1 ( x ), however we can easily
massage the integrand into the required form by multiplying and dividing by 3:
 1 
x2 sin x3 + 1 = ¨ 3x2 ¨ sin x3 + 1 .
3
• We want this in the form of the substitution rule, so we do a little massaging:
ż1 ż1
2 3
 1 
x sin x + 1 dx = ¨ 3x2 ¨ sin x3 + 1 dx
0 0 3
1 1 
ż
= sin x3 + 1 ¨ 3x2 dx by Theorem 3.2.1(c)
3 0

31 A good way to remember this last step is that we replace (du/dx) dx by just du — which looks like we cancelled out the dx terms: (du/dx) dx = du. While using “cancel the dx” is a good mnemonic (memory aid), you should not think of the derivative du/dx as a fraction — you are not dividing du by dx.


• Now we are ready for the substitution rule:


(1/3) ∫_0^1 sin(x³ + 1) · 3x² dx
    = (1/3) ∫_0^1 f(u(x)) u′(x) dx          with u(x) = x³ + 1 and f(u) = sin(u), so that sin(x³ + 1) = f(u(x)) and 3x² = u′(x)
    = (1/3) ∫_{u(0)}^{u(1)} f(u) du         by the substitution rule
    = (1/3) ∫_1^2 sin(u) du                 since u(0) = 1 and u(1) = 2
    = (1/3) [−cos(u)]_1^2
    = (1/3) (−cos(2) − (−cos(1)))
    = (cos(1) − cos(2))/3.

Example 3.4.9
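Definite integrals computed by substitution are easy to sanity-check numerically. The Python sketch below (assuming numpy and scipy are available) compares a direct numerical evaluation of the original integral with the exact answer just obtained:

import numpy as np
from scipy.integrate import quad

value, error_estimate = quad(lambda x: x**2 * np.sin(x**3 + 1), 0.0, 1.0)
print(value, (np.cos(1) - np.cos(2)) / 3)   # both are roughly 0.3188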

There is another, and perhaps easier, way to view the manipulations in the previous
example. Once you have chosen u( x ) you

• make the substitution u( x ) Ñ u,


1
• replace dx Ñ du.
u1 ( x )
In so doing, we take the integral
żb ż u(b)
1
f (u( x )) ¨ u ( x )dx =
1
f ( u ) ¨ u1 ( x ) ¨ du
a u( a) u1 ( x )
ż u(b)
= f (u)du exactly the substitution rule
u( a)

but we do not have to manipulate the integrand so as to make u1 ( x ) explicit. Let us redo
the previous example by this approach.
Example 3.4.10 (Example 3.4.9 revisited)

Compute the integral


ż1

x2 sin x3 + 1 dx
0

Solution.



• We have already observed that one factor of the integrand is sin x3 + 1 , which is
sin evaluated at x3 + 1. Thus we try setting u( x ) = x3 + 1.

• This makes u′(x) = 3x², and we replace u(x) = x³ + 1 → u and dx → (1/u′(x)) du = (1/(3x²)) du:

∫_0^1 x² sin(x³ + 1) dx = ∫_{u(0)}^{u(1)} x² sin(u) · (1/(3x²)) du        (the factor sin(x³ + 1) is sin(u))
    = ∫_1^2 (x²/(3x²)) sin(u) du
    = ∫_1^2 (1/3) sin(u) du
    = (1/3) ∫_1^2 sin(u) du

which is precisely the integral we found in Example 3.4.9.

Example 3.4.10

We can do the following example using the substitution rule or Theorem 3.4.7:
ş π 
/2
Example 3.4.11 0 cos(3x )dx
ş π/2
Compute 0 cos(3x )dx.

• In this example we should set u = 3x, and substitute dx → (1/u′(x)) du = (1/3) du. When
we do this we also have to convert the limits of the integral: u(0) = 0 and u(π/2) =
3π/2. This gives

ż π/2 ż 3π/2
1
cos(3x )dx = cos(u) du
0 0 3
 3π/2
1
= sin(u)
3 0
sin(3π/2) ´ sin(0)
=
3
´1 ´ 0 1
= =´ .
3 3

• We can also do this example more directly using the above theorem. Since sin( x ) is
sin(3x )
an antiderivative of cos( x ), Theorem 3.4.7 tells us that 3 is an antiderivative of


cos(3x ). Hence
 π
sin(3x ) /2
ż π/2
cos(3x )dx =
0 3 0
sin(3π/2) ´ sin(0)
=
3
1
=´ .
3

Example 3.4.11

3.4.2 §§ More Substitution Examples


The rest of this section is just more examples of the substitution rule. We recommend that, after reading these, you practice many examples by yourself under exam conditions. Practice is integral to the learning process – there is no substitution for it.
ş 
1
Example 3.4.12 0 x2 sin(1 ´ x3 )dx

This integral looks a lot like that of Example 3.4.9. It makes sense to try u( x ) = 1 ´ x3 since
it is the argument of sin(1 ´ x3 ). We
• substitute u = 1 ´ x3 and
1 1
• replace dx with u1 ( x )
du = ´3x2
du,

• when x = 0, we have u = 1 ´ 03 = 1 and

• when x = 1, we have u = 1 ´ 13 = 0.
So
ż1 ż0
2 3
 1
x sin 1 ´ x ¨ dx = x2 sin(u) ¨ du
0 1 ´3x2
ż0
1
= ´ sin(u)du.
1 3
Note that the lower limit of the u–integral, namely 1, is larger than the upper limit, which
is 0. There is absolutely nothing wrong with that. We can simply evaluate the u–integral
in the normal way. Since ´ cos(u) is an antiderivative of sin(u):
 
cos(u) 0
=
3 1
cos(0) ´ cos(1)
=
3
1 ´ cos(1)
= .
3


Example 3.4.12

ş 
1 1
Example 3.4.13 0 (2x +1)3 dx
ş1 1
Compute 0 (2x+ 1)3
dx.
We could do this one using Theorem 3.4.7, but it's not too hard to do without. We can
think of the integrand as the function “one over a cube” with the argument 2x + 1. So it
makes sense to substitute u = 2x + 1. That is
• set u = 2x + 1 and
1
• replace dx Ñ u1 ( x )
du = 21 du.

• When x = 0, we have u = 2 ˆ 0 + 1 = 1 and

• when x = 1, we have u = 2 ˆ 1 + 1 = 3.
So
ż1 ż3
1 1 1
dx = ¨ du
0 (2x + 1)3 1 u3 2
ż3
1
= u´3 du
2 1
3
1 u´2
=
2 ´2 1
 
1 1 1 1 1
= ¨ ´ ¨
2 ´2 9 ´2 1
 
1 1 1 1 8
= ´ = ¨
2 2 18 2 18
2
=
9

Example 3.4.13

ş 
1 x
Example 3.4.14 0 1+ x2 dx
ş1 x
Evaluate 0 1+ x2 dx.
Solution.
• The integrand can be rewritten as x · (1/(1 + x²)). This second factor suggests that we should
try setting u = 1 + x2 — and so we interpret the second factor as the function “one
over” evaluated at argument 1 + x2 .

• With this choice we


– set u = 1 + x2 ,
1
– substitute dx Ñ 2x du, and
– translate the limits of integration: when x = 0, we have u = 1 + 02 = 1 and
when x = 1, we have u = 1 + 12 = 2.

• The integral then becomes


ż1 ż2
x x 1
dx = du
0 1 + x2 1 u 2x
ż2
1
= du
1 2u
1 2
= ln |u| 1
2
ln 2 ´ ln 1 ln 2
= = .
2 2

Example 3.4.14

 
Example 3.4.15 $\left(\int x^3\cos\big(x^4+2\big)\,dx\right)$

Compute the integral $\int x^3\cos\big(x^4+2\big)\,dx$.

Solution.

• The integrand is the product of $\cos$ evaluated at the argument $x^4+2$ times $x^3$, which aside from a factor of 4, is the derivative of the argument $x^4+2$.

• Hence we set $u = x^4+2$ and then substitute $dx \to \frac{1}{u'(x)}\,du = \frac{1}{4x^3}\,du$.

• Before proceeding further, we should note that this is an indefinite integral so we don't have to worry about the limits of integration. However we do need to make sure our answer is a function of $x$ — we cannot leave it as a function of $u$.

• With this choice of $u$, the integral then becomes
\[
\int x^3\cos\big(x^4+2\big)\,dx
= \left.\int x^3\cos(u)\,\frac{1}{4x^3}\,du\;\right|_{u=x^4+2}
= \left.\frac{1}{4}\int\cos(u)\,du\;\right|_{u=x^4+2}
= \left.\left(\frac{1}{4}\sin(u)+C\right)\right|_{u=x^4+2}
= \frac{1}{4}\sin\big(x^4+2\big)+C.
\]


Example 3.4.15
The next two examples are more involved and require more careful thinking.
Example 3.4.16 $\left(\int \sqrt{1+x^2}\,x^3\,dx\right)$

Compute $\int \sqrt{1+x^2}\,x^3\,dx$.

• An obvious choice of $u$ is the argument inside the square root. So substitute $u = 1+x^2$ and $dx \to \frac{1}{2x}\,du$.

• When we do this we obtain
\[
\int \sqrt{1+x^2}\cdot x^3\,dx = \int \sqrt{u}\cdot x^3\cdot\frac{1}{2x}\,du
= \frac{1}{2}\int \sqrt{u}\cdot x^2\,du
\]
Unlike all our previous examples, we have not cancelled out all of the $x$'s from the integrand. However before we do the integral with respect to $u$, the integrand must be expressed solely in terms of $u$ — no $x$'s are allowed. (Look at the integrand on the right hand side of Theorem 3.4.2.)

• But all is not lost. We can rewrite the factor $x^2$ in terms of the variable $u$. We know that $u = 1+x^2$, so this means $x^2 = u-1$. Substituting this into our integral gives
\begin{align*}
\int \sqrt{1+x^2}\cdot x^3\,dx &= \frac{1}{2}\int \sqrt{u}\cdot x^2\,du\\
&= \frac{1}{2}\int \sqrt{u}\,(u-1)\,du\\
&= \frac{1}{2}\int \left(u^{3/2}-u^{1/2}\right)du\\
&= \frac{1}{2}\left.\left(\frac{2}{5}u^{5/2}-\frac{2}{3}u^{3/2}\right)\right|_{u=x^2+1}+C\\
&= \left.\left(\frac{1}{5}u^{5/2}-\frac{1}{3}u^{3/2}\right)\right|_{u=x^2+1}+C\\
&= \frac{1}{5}(x^2+1)^{5/2}-\frac{1}{3}(x^2+1)^{3/2}+C.
\end{align*}
Oof!

• Don't forget that you can always check the answer by differentiating:
\begin{align*}
\frac{d}{dx}\left(\frac{1}{5}(x^2+1)^{5/2}-\frac{1}{3}(x^2+1)^{3/2}+C\right)
&= \frac{1}{5}\cdot 2x\cdot\frac{5}{2}\,(x^2+1)^{3/2}-\frac{1}{3}\cdot 2x\cdot\frac{3}{2}\,(x^2+1)^{1/2}\\
&= x(x^2+1)^{3/2}-x(x^2+1)^{1/2}\\
&= x\big[(x^2+1)-1\big]\cdot\sqrt{x^2+1}\\
&= x^3\sqrt{x^2+1},
\end{align*}
which is the original integrand. ✓

Example 3.4.16
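The differentiation check above is exactly the kind of bookkeeping a computer algebra system is good at. A minimal sketch, assuming SymPy is installed:

```python
import sympy as sp

x = sp.symbols('x')

integrand = sp.sqrt(1 + x**2) * x**3
F = (x**2 + 1)**sp.Rational(5, 2)/5 - (x**2 + 1)**sp.Rational(3, 2)/3

# Differentiate our candidate antiderivative and compare with the integrand.
print(sp.simplify(sp.diff(F, x) - integrand))   # should print 0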

Example 3.4.17 $\left(\int \tan x\,dx\right)$

Evaluate the indefinite integral $\int \tan(x)\,dx$.

Solution.

• At first glance there is nothing to manipulate here and so very little to go on. However we can rewrite $\tan x$ as $\frac{\sin x}{\cos x}$, making the integral $\int \frac{\sin x}{\cos x}\,dx$. This gives us more to work with.

• Now think of the integrand as being the product $\frac{1}{\cos x}\cdot\sin x$. This suggests that we set $u = \cos x$ and that we interpret the first factor as the function “one over” evaluated at $u = \cos x$.

• Substitute $u = \cos x$ and $dx \to \frac{1}{-\sin x}\,du$ to give:
\begin{align*}
\int \frac{\sin x}{\cos x}\,dx &= \left.\int \frac{\sin x}{u}\,\frac{1}{-\sin x}\,du\;\right|_{u=\cos x}\\
&= \left.\int -\frac{1}{u}\,du\;\right|_{u=\cos x}\\
&= -\ln|\cos x| + C \qquad\text{and if we want to go further}\\
&= \ln\left|\frac{1}{\cos x}\right| + C\\
&= \ln|\sec x| + C.
\end{align*}

Example 3.4.17

3.5IJ Integration by Parts


The Fundamental Theorem of Calculus tells us that it is very easy to integrate a derivative.
In particular, we know that

d
ż
( F ( x )) dx = F ( x ) + C
dx

We can exploit this in order to develop another rule for integration — in particular a rule
to help us integrate products of simpler function such as
ż
xe x dx


In so doing we will arrive at a method called “integration by parts”.


To do this we start with the product rule and integrate. Recall that the product rule
says

d
u ( x ) v ( x ) = u1 ( x ) v ( x ) + u ( x ) v1 ( x )
dx
Integrating this gives
 1   
ż
u ( x ) v( x ) + u( x ) v1 ( x ) dx = a function whose derivative is u1 v + uv1 + C

= u( x )v( x ) + C

Now this, by itself, is not terribly useful. In order to apply it we need to have a function
whose integrand is a sum of products that is in exactly this form u1 ( x )v( x ) + u( x )v1 ( x ).
This is far too specialised.
However if we tease this apart a little:
 1 
ż ż ż
u ( x ) v( x ) + u( x ) v ( x ) dx = u ( x ) v( x ) dx + u( x ) v1 ( x ) dx
1 1

Bring one of the integrals to the left-hand side


ż ż
u( x )v( x ) ´ u ( x ) v( x )dx = u( x ) v1 ( x )dx
1

Swap left and right sides


ż ż
u( x ) v ( x )dx = u( x )v( x ) ´
1
u1 ( x ) v( x )dx

In this form we take the integral of one product and express it in terms of the integral of
a different product. If we express it like that, it doesn’t seem too useful. However, if the
second integral is easier, then this process helps us.
Let us do a simple example before explaining this more generally.

Example 3.5.1 ( xe x dx )
ş
ż
Compute the integral xe x dx.

Solution.

• We start by taking the equation above


ż ż
u( x ) v ( x )dx = u( x )v( x ) ´ u1 ( x ) v( x )dx
1

• Now set u( x ) = x and v1 ( x ) = e x . How did we know how to make this choice? We
will explain some strategies later. For now, let us just accept this choice and keep
going.


• In order to use the formula we need to know u1 ( x ) and v( x ). In this case it is quite
straightforward: u1 ( x ) = 1 and v( x ) = e x .

• Plug everything into the formula:


ż ż
xe dx = xe ´ e x dx
x x

So our original more difficult integral has been turned into a question of computing
an easy one.

= xe x ´ e x + C

• We can check our answer by differentiating:
\[
\frac{d}{dx}\left(xe^x - e^x + C\right) = \underbrace{xe^x + 1\cdot e^x}_{\text{by product rule}} -\; e^x + 0 = xe^x
\]
as required.

Example 3.5.1
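If SymPy happens to be available (an assumption, and entirely optional), it will reproduce this answer up to the arbitrary constant, and it can repeat the differentiation check:

```python
import sympy as sp

x = sp.symbols('x')

# Should agree with x*exp(x) - exp(x), possibly displayed as (x - 1)*exp(x).
print(sp.integrate(x*sp.exp(x), x))

# Differentiating our answer returns the original integrand.
F = x*sp.exp(x) - sp.exp(x)
print(sp.simplify(sp.diff(F, x)))   # x*exp(x)
```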
The process we have used in the above example is called “integration by parts”. When
our integrand is a product we try to write it as u( x )v1 ( x ) — we need to choose one factor
to be u( x ) and the other to be v1 ( x ). We then compute u1 ( x ) and v( x ) and then apply the
following theorem:

Theorem3.5.2 (Integration by parts).

Let u( x ) and v( x ) be continuously differentiable. Then


ż ż
u( x ) v ( x )dx = u( x ) v( x ) ´ v( x ) u1 ( x )dx
1

If we write dv for v1 ( x )dx and du for u1 ( x )dx (as the substitution rule suggests),
then the formula becomes
ż ż
udv = u v ´ vdu

The application of this formula is known as integration by parts.


The corresponding statement for definite integrals is
żb żb
u( x ) v ( x )dx = u(b) v(b) ´ u( a) v( a) ´
1
v( x ) u1 ( x )dx
a a

Integration by parts is not as easy to apply as the product rule for derivatives. This is
because it relies on us


(1) judiciously choosing u( x ) and v1 ( x ), then

(2) computing u1 ( x ) and v( x ) — which requires us to antidifferentiate v1 ( x ), and finally

(3) that the integral u1 ( x )v( x )dx is easier than the integral we started with.
ş

Notice that any antiderivative of v1 ( x ) will do. All antiderivatives of v1 ( x ) are of the
form v( x ) + A with A a constant. Putting this into the integration by parts formula gives
ż ż
u( x )v ( x )dx = u( x ) (v( x ) + A) ´ u1 ( x ) (v( x ) + A) dx
1

ż ż
= u( x )v( x ) + Au( x ) ´ u ( x )v( x )dx ´ A u1 ( x )dx
1
loooooomoooooon
= Au( x )+C
ż
= u( x )v( x ) ´ u1 ( x )v( x )dx + C

So that constant A will always cancel out.


In most applications (but not all) our integrand will be a product of two factors so we
have two choices for u( x ) and v1 ( x ). Typically one of these choices will be “good” (in that
it results in a simpler integral) while the other will be “bad” (we cannot antidifferentiate
our choice of v1 ( x ) or the resulting integral is harder). Let us illustrate what we mean by
returning to our previous example.
Example 3.5.3 ( xe x dx — again)
ş

Our integrand is the product of two factors

x and ex

This gives us two obvious choices of u and v1 :

u( x ) = x v1 ( x ) = e x
or
u( x ) = e x v1 ( x ) = x

We should explore both choices:

1. If we take $u(x) = x$ and $v'(x) = e^x$, we then quickly compute

$u'(x) = 1$ and $v(x) = e^x$

which means we will need to integrate (in the right-hand side of the integration by
parts formula)
ż ż
u ( x )v( x )dx = 1 ¨ e x dx
1

which looks straightforward. This is a good indication that this is the right choice of
u( x ) and v1 ( x ).


2. But before we do that, we should also explore the other choice, namely u( x ) = e x
and v1 ( x ) = x. This implies that

1 2
u1 ( x ) = e x and v( x ) = x
2
which means we need to integrate

1 2 x
ż ż
u ( x )v( x )dx =
1
x ¨ e dx.
2

This is at least as hard as the integral we started with. Hence we should try the first
choice.

With our choice made, we integrate by parts to get


ż ż
xe dx = xe ´ e x dx
x x

= xe x ´ e x + C.

The above reasoning is a very typical workflow when using integration by parts.
Example 3.5.3
Integration by parts is often used
d
• to eliminate factors of x from an integrand like xe x by using that dx x = 1 and
d 1
• to eliminate a ln x from an integrand by using that dx ln x = x and

• to eliminate inverse trig functions, like arctan x, from an integrand by using that, for
d
example, dx arctan x = 1+1x2 .

Example 3.5.4 $\left(\int x\sin x\,dx\right)$

Solution.

• Again we have a product of two factors giving us two possible choices.

(1) If we choose u( x ) = x and v1 ( x ) = sin x, then we get

u1 ( x ) = 1 and v( x ) = ´ cos x

which is looking promising.


(2) On the other hand if we choose u( x ) = sin x and v1 ( x ) = x, then we have

1 2
u1 ( x ) = cos x and v( x ) = x
2
ş1 2 cos xdx.
which is looking worse — we’d need to integrate 2x


• So we stick with the first choice. Plugging u( x ) = x, v( x ) = ´ cos x into integration


by parts gives us
ż ż
x sin xdx = ´x cos x ´ 1 ¨ (´ cos x )dx

= ´x cos x + sin x + C

• Again we can check our answer by differentiating:

d
(´x cos x + sin x + C ) = ´ cos x + x sin x + cos x + 0
dx
= x sin xX

Once we have practised this a bit we do not really need to write as much. Let us solve
it again, but showing only what we need to.
Solution.
• We use integration by parts to solve the integral.

• Set u( x ) = x and v1 ( x ) = sin x. Then u1 ( x ) = 1 and v( x ) = ´ cos x, and


ż ż
x sin xdx = ´x cos x + cos xdx

= ´x cos x + sin x + C.

Example 3.5.4
It is pretty standard practice to reduce the notation even further in these problems. As
noted above, many people write the integration by parts formula as
ż ż
udv = uv ´ vdu

where du, dv are shorthand for u1 ( x )dx, v1 ( x )dx. Let us write up the previous example
using this notation.
Example 3.5.5 ( x sin xdx yet again)
ş

Solution. Using integration by parts, we set u = x and dv = sin xdx. This makes du = 1dx
and v = ´ cos x. Consequently
ż ż
x sin xdx = udv
ż
= uv ´ vdu
ż
= ´x cos x + cos xdx
= ´x cos x + sin x + C


You can see that this is a very neat way to write up these problems and we will continue
using this shorthand in the examples that follow below.
Example 3.5.5

We can also use integration by parts to eliminate higher powers of x. We just need to
apply the method more than once.

Example 3.5.6 x2 e x dx
ş

Solution.

• Let u = x2 and dv = e x dx. This then gives du = 2xdx and v = e x , and


ż ż
2 x 2 x
x e dx = x e ´ 2xe x dx

• So we have reduced the problem of computing the original integral to one of inte-
grating 2xe x . We know how to do this — just integrate by parts again:
ż ż
2 x 2 x
x e dx = x e ´ 2xe x dx set u = 2x, dv = e x dx
 ż 
2 x x x
= x e ´ 2xe ´ 2e dx since du = 2dx, v = e x

= x2 e x ´ 2xe x + 2e x + C

• We can, if needed, check our answer by differentiating:

d  2 x   
x e ´ 2xe x + 2e x + C = x2 e x + 2xe x ´ (2xe x + 2e x ) + 2e x + 0
dx
= x2 e x X

A similar iterated application of integration by parts will work for integrals


ż
P( x ) ( Ae ax + B sin(bx ) + C cos(cx )) dx

where P( x ) is a polynomial and A, B, C, a, b, c are constants.


Example 3.5.6

Now let us look at integrands containing logarithms. We don’t know the antiderivative
of ln x, but we can eliminate ln x from an integrand by using integration by parts with
u = ln x.
Example 3.5.7 ( x ln xdx )
ş

Solution.


• We have two choices for u and dv.


(1) Set u = x and dv = ln xdx. This gives du = dx but v is hard to compute — we
haven’t done it yet32 . Before we go further along this path, we should look to
see what happens with the other choice.
(2) Set u = ln x and dv = xdx. This gives du = 1x dx and v = 12 x2 , and we have to
integrate
1 1 2
ż ż
v du = ¨ x dx
x 2
which is easy.
• So we proceed with the second choice.
1 2 1
ż ż
x ln xdx = x ln x ´ xdx
2 2
1 1
= x2 ln x ´ x2 + C
2 4
• We can check our answer quickly:
d  x2 x2  x2 1 x
ln x ´ + C = x ln x + ´ + 0 = x ln x
dx 2 4 2 x 2

Example 3.5.7

3.5.1 §§ Further Techniques using Integration by Parts


It’s tempting to think of integration by parts as a product rule for integration. It can be
used that way, but it also shows up in more surprising contexts. Integration by parts lets
us find the antiderivatives of the natural logarithm and inverse trigonometric functions,
despite these not being the product of two functions in any obvious way. There are even
special cases where using integration by parts actually lets us find an integral without ever
antidifferentiating. These methods are explored in the examples below.
Example 3.5.8 ( ln xdx )
ş

It is not immediately obvious that one should use integration by parts to compute the in-
tegral
ż
ln xdx

since the integrand is not a product. But we should persevere — indeed this is a situation
where our shorter notation helps to clarify how to proceed.
Solution.

32 We will soon.


• In the previous example we saw that we could remove the factor ln x by setting
u = ln x and using integration by parts. Let us try repeating this. When we make
this choice, we are then forced to take dv = dx — that is we choose v1 ( x ) = 1. Once
we have made this sneaky move everything follows quite directly.

• We then have du = 1x dx and v = x, and the integration by parts formula gives us

1
ż ż
ln xdx = x ln x ´ ¨ xdx
x
ż
= x ln x ´ 1dx

= x ln x ´ x + C

• As always, it is a good idea to check our result by verifying that the derivative of the
answer really is the integrand.

d  1
x ln x ´ x + C = ln x + x ´ 1 + 0 = ln x
dx x

Example 3.5.8
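As an optional aside, assuming SymPy is installed, we can confirm both the antiderivative and the check:

```python
import sympy as sp

x = sp.symbols('x', positive=True)

# SymPy agrees with x*ln(x) - x (constant of integration omitted).
print(sp.integrate(sp.log(x), x))

# Verify by differentiating our answer.
F = x*sp.log(x) - x
print(sp.simplify(sp.diff(F, x)))   # log(x)
```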

The same method works almost exactly to compute the antiderivatives of arcsin( x )
and arctan( x ):

Example 3.5.9 ( arctan( x )dx and arcsin( x )dx )


ş ş

Compute the antiderivatives of the inverse sine and inverse tangent functions.
Solution.

• Again neither of these integrands are products, but that is no impediment. In both
cases we set dv = dx (ie v1 ( x ) = 1) and choose v( x ) = x.
1
• For inverse tan we choose u = arctan( x ), so du = 1+ x 2
dx:

1
ż ż
arctan( x )dx = x arctan( x ) ´ x¨ dx now use substitution rule
1 + x2
w (x) 1
ż 1
= x arctan( x ) ´ ¨ dx with w( x ) = 1 + x2 , w1 ( x ) = 2x
2 w
1 1
ż
= x arctan( x ) ´ dw
2 w
1
= x arctan( x ) ´ ln |w| + C
2
1
= x arctan( x ) ´ ln |1 + x2 | + C but 1 + x2 ą 0, so
2
1
= x arctan( x ) ´ ln(1 + x2 ) + C
2


• Similarly for inverse sine we choose u = arcsin( x ) so du = ? 1 dx:


1´x2

x
ż ż
arcsin( x )dx = x arcsin( x ) ´ ? dx now use substitution rule
1 ´ x2
´w1 ( x )
ż
= x arcsin( x ) ´ ¨ w´1/2 dx with w( x ) = 1 ´ x2 , w1 ( x ) = ´2x
2
1
ż
= x arcsin( x ) + w´1/2 dw
2
1
= x arcsin( x ) + ¨ 2w1/2 + C
2
a
= x arcsin( x ) + 1 ´ x2 + C

• Both can be checked quite quickly by differentiating — but we leave that as an exer-
cise for the reader.
Example 3.5.9
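Since the text leaves the differentiation checks as an exercise, here is an optional way to confirm them with SymPy (assuming it is installed); doing them by hand is still the recommended exercise:

```python
import sympy as sp

x = sp.symbols('x')

# Candidate antiderivatives from the example.
F_atan = x*sp.atan(x) - sp.log(1 + x**2)/2
F_asin = x*sp.asin(x) + sp.sqrt(1 - x**2)

# Both differences should simplify to 0.
print(sp.simplify(sp.diff(F_atan, x) - sp.atan(x)))
print(sp.simplify(sp.diff(F_asin, x) - sp.asin(x)))
```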

There are many other examples we could do, but we’ll finish with a tricky one.
Example 3.5.10 ( e x sin xdx )
ş

Solution. Let us attempt this one a little naively and then we’ll come back and do it more
carefully (and successfully).
• We can choose either u = e x , dv = sin xdx or the other way around.
1. Let u = e x , dv = sin xdx. Then du = e x dx and v = ´ cos x. This gives
ż ż
e sin x = ´e cos x + e x cos xdx
x x

So we are left with an integrand that is very similar to the one we started with.
What about the other choice?
2. Let u = sin x, dv = e x dx. Then du = cos xdx and v = e x . This gives
ż ż
e sin x = e sin x ´ e x cos xdx
x x

So we are again left with an integrand that is very similar to the one we started
with.
• şHow do we proceed? — It turns out to be easier if you do both e x sin xdx and
ş

e x cos xdx simultaneously. We do so in the next example.

Example 3.5.10

ş 
b x şb x
Example 3.5.11 ae sin xdx and a e cos xdx


This time we’re going to do the two integrals


żb żb
x
I1 = e sin xdx I2 = e x cos xdx
a a

at more or less the same time.

• First
żb żb
x
I1 = e sin xdx = udv with u = e x , dv = sin xdx
a a
so v = ´ cos x, du = e x dx
h ib żb
x
= ´ e cos x + e x cos xdx
a a

We have not found I1 but we have related it to I2 .


h ib
I1 = ´ e x cos x + I2
a

• Now start over with I2 .


żb żb
x
I2 = e cos xdx = udv with u = e x , dv = cos xdx
a a
so v = sin x, du = e x dx
h ib żb
x
= e sin x ´ e x sin xdx
a a

Once again, we have not found I2 but we have related it back to I1 .


h ib
I2 = e x sin x ´ I1
a

• So summarising, we have
h ib h ib
I1 = ´ e x cos x + I2 I2 = e x sin x ´ I1
a a

• So now, substitute the expression for I2 from the second equation into the first equa-
tion to get
h ib 1h ib
I1 = ´ e x cos x + e x sin x ´ I1 which implies I1 = e x sin x ´ cos x
a 2 a

If we substitute the other way around we get


h ib 1h x ib
I2 = e x sin x + e x cos x ´ I2 which implies I2 = e sin x + cos x
a 2 a

That is,
1h x ib 1h x ib
żb żb
x x
e sin xdx = e sin x ´ cos x e cos xdx = e sin x + cos x
a 2 a a 2 a



• This also says, for example, that 12 e x sin x ´ cos x is an antiderivative of e x sin x so
that
1 
ż
e x sin xdx = e x sin x ´ cos x + C
2

The somewhat magical thing about this example is that we found our antiderivative,
in the end, using algebra. There wasn’t a step where we evaluated an antiderivative in
the usual way – we just generated an equation, and then solved it. To our knowledge,
this technique has no particular name. Because we somehow ended up where we started, with the integral $\int e^x\sin x\,dx$, this author likes to call the technique integrating around in a
circle.
Since there was no clear “antiderivative” step, the results of this example can feel sus-
picious. We can always check whether an antiderivative is correct! This one is correct if
and only if the derivative of the right hand side is e x sin x. Here goes. By the product rule:

d h1 x  i 1h  i
e sin x ´ cos x + C = e x sin x ´ cos x + e x cos x + sin x = e x sin x
dx 2 2
which is the desired derivative.
Example 3.5.11
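For the suspicious reader with SymPy installed (an assumption, and entirely optional), the same check can be automated:

```python
import sympy as sp

x = sp.symbols('x')

# Antiderivative found by "integrating around in a circle".
F = sp.exp(x)*(sp.sin(x) - sp.cos(x))/2

# The difference should simplify to 0.
print(sp.simplify(sp.diff(F, x) - sp.exp(x)*sp.sin(x)))
```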

There is another way to find e x sin xdx and e x cos xdx that, in contrast to the above
ş ş

computations, doesn’t involve any trickery. But it does require the use of complex num-
ix
bers and so is beyond the scope of this course. The secret is to use that sin x = e ´e
´ix
2i and
eix +e´ix
cos x = 2 , where i is the square root of ´1 of the complex number system.

3.6IJ Trigonometric Integrals


Integrals of polynomials of the trigonometric functions sin x, cos x, tan x and so on, are
generally evaluated by using a combination of simple substitutions and trigonometric
identities. There are of course a very large number33 of trigonometric identities, but usu-
ally we use only a handful of them. The most important three are:

Equation 3.6.1.

sin2 x + cos2 x = 1

Equation 3.6.2.

sin(2x ) = 2 sin x cos x

33 The more pedantic reader could construct an infinite list of them.


Equation 3.6.3.

cos(2x ) = cos2 x ´ sin2 x


= 2 cos2 x ´ 1
= 1 ´ 2 sin2 x

Notice that the last two lines of Equation (3.6.3) follow from the first line by replacing
either sin2 x or cos2 x using Equation (3.6.1). It is also useful to rewrite these last two lines:

Equation 3.6.4.

1 ´ cos(2x )
sin2 x =
2

Equation 3.6.5.

1 + cos(2x )
cos2 x =
2

These last two are particularly useful since they allow us to rewrite higher powers of
sine and cosine in terms of lower powers. For example:
 
4 1 ´ cos(2x ) 2
sin ( x ) = by Equation (3.6.4)
2
1 1 1 2
= ´ cos(2x ) + cos (2x ) use Equation (3.6.5)
4 2 4 looomooon
do it again
1 1 1
= ´ cos(2x ) + (1 + cos(4x ))
4 2 8
3 1 1
= ´ cos(2x ) + cos(4x )
8 2 8

So while it was hard to integrate sin4 ( x ) directly, the final expression is quite straightfor-
ward (with a little substitution rule).
There are many such tricks for integrating powers of trigonometric functions. Here we
concentrate on two families
ż ż
m n
sin x cos xdx and tanm x secn xdx

for integer n, m. The details of the technique depend on the parity of n and m — that is,
whether n and m are even or odd numbers.


3.6.1 §§ Integrating sinm x cosn xdx


ş

§§§ One of n and m is Odd


Consider the integral sin2 x cos xdx. We can integrate this by substituting u = sin x and
ş

du = cos xdx. This gives


ż ż
sin x cos xdx = u2 du
2

1 1
= u3 + C = sin3 x + C
3 3
This method can be used whenever n is an odd integer.
• Substitute u = sin x and du = cos xdx.
• This leaves an even power of cosines — convert them using cos2 x = 1 ´ sin2 x =
1 ´ u2 .
Here is an example.

Example 3.6.6 sin2 x cos3 xdx
ş

Start by factoring off one power of cos x to combine with dx to get cos xdx = du.
\begin{align*}
\int \sin^2 x\,\cos^3 x\,dx &= \int \underbrace{\sin^2 x}_{=u^2}\,\underbrace{\cos^2 x}_{=1-u^2}\,\underbrace{\cos x\,dx}_{=du} &&\text{set } u=\sin x\\
&= \int u^2(1-u^2)\,du\\
&= \frac{u^3}{3}-\frac{u^5}{5}+C\\
&= \frac{\sin^3 x}{3}-\frac{\sin^5 x}{5}+C
\end{align*}

Example 3.6.6
Of course if m is an odd integer we can use the same strategy with the roles of sin x
and cos x exchanged. That is, we substitute u = cos x, du = ´ sin xdx and sin2 x =
1 ´ cos2 x = 1 ´ u2 .

§§§ Both n and m are Even


If m and n are both even, the strategy is to use the trig identities (3.6.4) and (3.6.5) to get
back to the m or n odd case. This is typically more laborious than the previous case we
studied. Here are a couple of examples that arise quite commonly in applications.

Example 3.6.7 cos2 xdx
ş

By (3.6.5)
1   1h 1 i
ż ż
2
cos xdx = 1 + cos(2x ) dx = x + sin(2x ) + C
2 2 2


Example 3.6.7


Example 3.6.8 cos4 xdx
ş

First we’ll prepare the integrand cos4 x for easy integration by applying (3.6.5) a couple
times. We have already used (3.6.5) once to get
1 
cos2 x = 1 + cos(2x )
2
Squaring it gives
1 2 1 1 1
cos4 x = 1 + cos(2x ) = + cos(2x ) + cos2 (2x )
4 4 2 4
Now by (3.6.5) a second time
1 1 1 1 + cos(4x )
cos4 x = + cos(2x ) +
4 2 4 2
3 1 1
= + cos(2x ) + cos(4x )
8 2 8
Now it’s easy to integrate
3 1 1
ż ż ż ż
4
cos xdx = dx + cos(2x )dx + cos(4x )dx
8 2 8
3 1 1
= x + sin(2x ) + sin(4x ) + C
8 4 32

Example 3.6.8
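An optional check of this answer with SymPy (assuming it is available); the derivative of the result should return $\cos^4 x$:

```python
import sympy as sp

x = sp.symbols('x')

# Answer from the power-reduction computation above.
F = 3*x/8 + sp.sin(2*x)/4 + sp.sin(4*x)/32

# The difference should simplify to 0.
print(sp.simplify(sp.diff(F, x) - sp.cos(x)**4))
```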


Example 3.6.9 cos2 x sin2 xdx
ş

Here we apply both (3.6.4) and (3.6.5).


1   
ż ż
2 2
cos x sin xdx = 1 + cos(2x ) 1 ´ cos(2x ) dx
4
1  
ż
= 1 ´ cos2 (2x ) dx
4
We can then apply (3.6.5) again
1  1 
ż
= 1 ´ (1 + cos(4x )) dx
4 2
1  
ż
= 1 ´ cos(4x ) dx
8
1 1
= x ´ sin(4x ) + C
8 32


Oof! We could also have done this one using (3.6.2) to write the integrand as sin2 (2x ) and
then used (3.6.4) to write it in terms of cos(4x ).
Example 3.6.9


Example 3.6.10 cos2 xdx and sin2 xdx
şπ şπ
0 0

Of course we can compute the definite integral 0 cos2 xdx by using the antiderivative for
şπ

cos2 x that we found in Example 3.6.7. But here is a trickier way to evaluate that inte-
2
gral, and also the integral 0 sin xdx at the same time, very quickly without needing the
şπ

antiderivative of Example 3.6.7.


Solution.
• Observe that 0 cos2 xdx and 0 sin2 xdx are equal because they represent the same
şπ şπ

area — look at the graphs below — the darkly shaded regions in the two graphs
have the same area and the lightly shaded regions in the two graphs have the same
area.

y y
1
y = cos2 x
1
y = sin2 x

x x
π/2 π π/2 π

• Consequently,
ż π 
1
żπ żπ żπ
2 2 2 2
cos xdx = sin xdx = sin xdx + cos xdx
0 0 2 0 0

1  
żπ
= sin2 x + cos2 x dx
2 0

1
żπ
= dx
2 0
π
=
2

Example 3.6.10

3.6.2 §§ Integrating tanm x secn xdx


ş

The strategy for dealing with these integrals is similar to the strategy that we used to
m
evaluate integrals of the form sin x cosn xdx and again depends on the parity of the
ş


exponents n and m. It uses34


d d
tan x = sec2 x sec x = sec x tan x 1 + tan2 x = sec2 x
dx dx
We split the methods for integrating tanm x secn xdx into 5 cases which we list below.
ş

These will become much more clear after an example (or two).
(1) When m is odd and any n — rewrite the integrand in terms of sin x and cos x:
sin x m 1 n
tanm x secn xdx = dx
cos x cos x
sinm´1 x
= sin xdx
cosn+m x
and then substitute u = cos x, du = ´ sin xdx, sin2 x = 1 ´ cos2 x = 1 ´ u2 . See
Examples 3.6.11 and 3.6.12.
(2) Alternatively, if m is odd and n ě 1 move one factor of sec x tan x to the side so that
you can see sec x tan xdx in the integral, and substitute u = sec x, du = sec x tan x dx
and tan2 x = sec2 x ´ 1 = u2 ´ 1. See Example 3.6.13.
(3) If n is even with n ě 2, move one factor of sec2 x to the side so that you can see sec2 xdx
in the integral, and substitute u = tan x, du = sec2 x dx and sec2 x = 1 + tan2 x =
1 + u2 . See Example 3.6.14.
(4) When m is even and n = 0 — that is the integrand is just an even power of tangent
— we can still use the u = tan x substitution, after using tan2 x = sec2 x ´ 1 (possibly
more than once) to create a sec2 x. See Example 3.6.16.
(5) This leaves the case n odd and m even. There are strategies like those above for treat-
ing this case. But they are more complicated and also involve more tricks (that ba-
sically have to be memorized). Examples using them are provided in the optional
section entitled “Integrating sec x, csc x, sec3 x and csc3 x”, below. A more straight for-
ward strategy uses another technique called “partial fractions”. We shall return to this
strategy after we have learned about partial fractions. See Example 3.8.4 and 3.8.5 in
Section 3.8.

§§§ m is Odd — Odd Power of Tangent


In this case we rewrite the integrand in terms of sine and cosine and then substitute u =
cos x, du = ´ sin xdx.
Example 3.6.11 ( tan xdx )
ş

Solution.
1
• Write the integrand tan x = cos x sin x.

34 You will need to memorise the derivatives of tangent and secant. However there is no need to memorise
1 + tan2 x = sec2 x. To derive it very quickly just divide sin2 x + cos2 x = 1 by cos2 x.


• Now substitute u = cos x, du = ´ sin x dx just as we did in treating integrands of


the form sinm x cosn x with m odd.
1
ż ż
tan x dx = sin x dx substitute u = cos x
cos x
1
ż
= ¨ (´1)du
u
= ´ ln |u| + C
= ´ ln | cos x| + C can also write in terms of secant
= ln | cos x|´1 + C = ln | sec x| + C

Example 3.6.11


Example 3.6.12 tan3 xdx
ş

Solution.

sin2 x
• Write the integrand tan3 x = cos3 x
sin x.

• Again substitute u = cos x, du = ´ sin x dx. We rewrite the remaining even powers
of sin x using sin2 x = 1 ´ cos2 x = 1 ´ u2 .

• Hence

sin2 x
ż ż
3
tan x dx = sin x dx substitute u = cos x
cos3 x
1 ´ u2
ż
= (´1)du
u3
u´2
= + ln |u| + C
2
1
= + ln | cos x| + C can rewrite in terms of secant
2 cos2 x
1
= sec2 x ´ ln | sec x| + C
2

Example 3.6.12

§§§ m is Odd and n ě 1 — Odd Power of Tangent and at Least One Secant
Here we collect a factor of tan x sec x and then substitute u = sec x and du = sec x tan xdx.
We can then rewrite any remaining even powers of tanx in terms of sec x using tan2 x =
sec2 x ´ 1 = u2 ´ 1.



Example 3.6.13 tan3 x sec4 xdx
ş

Solution.

• Start by factoring off one copy of sec x tan x and combine it with dx to form sec x tan xdx,
which will be du.
• Now substitute u = sec x, du = sec x tan xdx and tan2 x = sec2 x ´ 1 = u2 ´ 1.
• This gives
\begin{align*}
\int \tan^3 x\,\sec^4 x\,dx &= \int \underbrace{\tan^2 x}_{u^2-1}\,\underbrace{\sec^3 x}_{u^3}\,\underbrace{\sec x\tan x\,dx}_{du}\\
&= \int \big[u^2-1\big]u^3\,du\\
&= \frac{u^6}{6}-\frac{u^4}{4}+C\\
&= \frac{1}{6}\sec^6 x-\frac{1}{4}\sec^4 x+C
\end{align*}

Example 3.6.13

§§§ n ě 2 is Even — a Positive Even Power of Secant


In the previous case we substituted u = sec x, while in this case we substitute u = tan x.
When we do this we write du = sec2 xdx and then rewrite any remaining even powers of
sec x as powers of tan x using sec2 x = 1 + tan2 x = 1 + u2 .

Example 3.6.14 sec4 xdx
ş

Solution.

• Factor off one copy of sec2 x and combine it with dx to form sec2 xdx, which will be
du.
• Then substitute u = tan x, du = sec2 xdx and rewrite any remaining even powers of
sec x as powers of tan x = u using sec2 x = 1 + tan2 x = 1 + u2 .
• This gives
\begin{align*}
\int \sec^4 x\,dx &= \int \underbrace{\sec^2 x}_{1+u^2}\,\underbrace{\sec^2 x\,dx}_{du}\\
&= \int \big[1+u^2\big]\,du\\
&= u+\frac{u^3}{3}+C\\
&= \tan x+\frac{1}{3}\tan^3 x+C
\end{align*}


Example 3.6.14


Example 3.6.15 tan3 x sec4 xdx — redux
ş

Solution. Let us revisit this example using this slightly different approach.

• Factor off one copy of sec2 x and combine it with dx to form sec2 xdx, which will be
du.

• Then substitute u = tan x, du = sec2 xdx and rewrite any remaining even powers of
sec x as powers of tan x = u using sec2 x = 1 + tan2 x = 1 + u2 .

• This gives
ż ż
3 4 3 2 2
tan x sec xdx = tan sec
omooxn sec
omooxn lo
lo xdx
looomooon
u3 1+ u2 du

ż
= u3 + u5 ]du

u4 u6
= + +C
4 6
1 1
= tan4 x + tan6 x + C
4 6

• This is not quite the same as the answer we got above in Example 3.6.13. However
we can show they are (nearly) equivalent. To do so we substitute v = sec x and
tan2 x = sec2 x ´ 1 = v2 ´ 1:

1 1 1 1
tan6 x + tan4 x = (v2 ´ 1)3 + (v2 ´ 1)2
6 4 6 4
1 6 1
= (v ´ 3v4 + 3v2 ´ 1) + (v4 ´ 2v2 + 1)
6 4
v 6 v 4 v 2 1 v 4 v2 1
= ´ + ´ + ´ +
6 2 2 6 4 2 4
 
v6 v4 1 1
= ´ + 0 ¨ v2 + ´
6 4 4 6
1 1 1
= sec6 x ´ sec4 x + .
6 4 12

So while 61 tan6 x + 14 tan4 x ‰ 16 sec6 x ´ 41 sec4 x, they only differ by a constant. Hence
both are valid antiderivatives of tan3 x sec4 x.

Example 3.6.15


§§§ m is Even and n = 0 — Even Powers of Tangent

We integrate this by setting u = tan x. For this to work we need to pull one factor of sec2 x
to one side to form du = sec2 xdx. To find this factor of sec2 x we (perhaps repeatedly)
apply the identity tan2 x = sec2 x ´ 1.

Example 3.6.16 tan4 xdx
ş

Solution.

• There is no sec2 x term present, so we try to create it from tan4 x by using tan2 x =
sec2 x ´ 1.

tan4 x = tan2 x ¨ tan2 x


 
= tan2 x sec2 x ´ 1
= tan2 x sec2 x ´ lo
tan2
omooxn
sec2 x´1
= tan2 x sec2 x ´ sec x + 1 2

• Now we can substitute u = tan x, du = sec2 xdx.


ż ż ż ż
4 2 2 2
tan xdx = tan
omooxn sec
lo xdx ´
looomooon sec xdx +
looomooon dx
u2 du du
ż ż ż
= u2 du ´ du + dx

u3
= ´u+x+C
3
tan3 x
= ´ tan x + x + C
3

Example 3.6.16


Example 3.6.17 tan8 xdx
ş

Solution. Let us try the same approach.

• First pull out a factor of tan2 x to create a sec2 x factor:

tan8 x = tan6 x ¨ tan2 x


 
= tan6 x ¨ sec2 x ´ 1
= tan6 x sec2 x ´ tan6 x


The first term is now ready to be integrated, but we need to reapply the method to
the second term:
 
= tan6 x sec2 x ´ tan4 x ¨ sec2 x ´ 1
= tan6 x sec2 x ´ tan4 x sec2 x + tan4 x do it again
6 2 4 2 2
 2

= tan x sec x ´ tan x sec x + tan x ¨ sec x ´ 1
= tan6 x sec2 x ´ tan4 x sec2 x + tan2 x sec2 x ´ tan2 x and again
 
= tan6 x sec2 x ´ tan4 x sec2 x + tan2 x sec2 x ´ sec2 x ´ 1

• Hence
ż ż h i
8
tan xdx = tan6 x sec2 x ´ tan4 x sec2 x + tan2 x sec2 x ´ sec2 x + 1 dx
ż h i ż
6 4 2 2
= tan x ´ tan x + tan x ´ 1 sec xdx + dx
ż h i
6 4 2
= u ´ u + u ´ 1 du + x + C

u7 u5 u3
= ´ + ´u+x+C
7 5 3
1 1 1
= tan7 x ´ tan5 x + tan3 x ´ tan x + x + C
7 5 3

Indeed this example suggests that for integer k ě 0:

1 1
ż
tan2k xdx = tan2k´1 ( x ) ´ tan2k´3 x + ¨ ¨ ¨ ´ (´1)k tan x + (´1)k x + C
2k ´ 1 2k ´ 3

Example 3.6.17
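If you want to convince yourself of the $\tan^8 x$ answer without redoing the derivative by hand, here is an optional SymPy check (assuming the library is installed). Note that SymPy writes the derivative of $\tan x$ as $\tan^2 x + 1$, which is exactly the identity used above:

```python
import sympy as sp

x = sp.symbols('x')

F = (sp.tan(x)**7/7 - sp.tan(x)**5/5 + sp.tan(x)**3/3
     - sp.tan(x) + x)

# The difference should simplify to 0.
print(sp.simplify(sp.diff(F, x) - sp.tan(x)**8))
```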
This last example also shows how we might integrate an odd power of tangent:

Example 3.6.18 tan7 x
ş

Solution. We follow the same steps

• Pull out a factor of tan2 x to create a factor of sec2 x:

tan7 x = tan5 x ¨ tan2 x


 
= tan5 x ¨ sec2 x ´ 1
= tan5 x sec2 x ´ tan5 x do it again
 
= tan5 x sec2 x ´ tan3 x ¨ sec2 x ´ 1
= tan5 x sec2 x ´ tan3 x sec2 x + tan3 x and again
 
= tan5 x sec2 x ´ tan3 x sec2 x + tan x sec2 x ´ 1
= tan5 x sec2 x ´ tan3 x sec2 x + tan x sec2 x ´ tan x


• Now we can substitute u = tan x and du = sec2 xdx and also use the result from
Example 3.6.11 to take care of the last term:
 
ż ż ż
7 5 2 3 2 2
tan xdx = tan x sec x ´ tan x sec x + tan x sec x dx ´ tan xdx

Now factor out the common sec2 x term and integrate tan x via Example 3.6.11
 
ż
= tan5 x ´ tan3 x + tan x sec xdx ´ ln | sec x| + C
 5 
ż
= u ´ u3 + u du ´ ln | sec x| + C

u6 u4 u2
= ´ + ´ ln | sec x| + C
6 4 2
1 1 1
= tan6 x ´ tan4 x + tan2 x ´ ln | sec x| + C
6 4 2
This example suggests that for integer k ě 0:
1 1 1
ż
tan2k+1 xdx = tan2k ( x ) ´ tan2k´2 x + ¨ ¨ ¨ ´ (´1)k tan2 x + (´1)k ln | sec x| + C
2k 2k ´ 2 2

Example 3.6.18

Of course we have not considered integrals involving powers of cot x and csc x. But
they can be treated in much the same way as tan x and sec x were.
Integrating tanm x secn xdx when n is odd and m is even uses similar strategies as to
ş

the previous cases. However, the computations are often more involved and more tricks
need to be deployed. For this reason you will not be asked to compute integrals of that
type. (However, you should memorize the antiderivative of the secant function.) Sec-
tion A.8 in the appendix gives some examples, if you’re curious what these computations
look like. In particular, the derivation of sec xdx has quite an interesting trick to it.
ş

3.7IJ Trigonometric Substitution


In this section we discuss substitutions that simplify integrals containing square roots of
the form
\[
\sqrt{a^2-x^2}\,,\qquad \sqrt{a^2+x^2}\,,\qquad \sqrt{x^2-a^2}\,.
\]
When the integrand contains one of these square roots, then we can use trigonometric
substitutions to eliminate them. That is, we substitute
x = a sin u or x = a tan u or x = a sec u
and then use trigonometric identities
sin2 θ + cos2 θ = 1 and 1 + tan2 θ = sec2 θ
to simplify the result. To be more precise, we can


• eliminate $\sqrt{a^2-x^2}$ from an integrand by substituting $x = a\sin u$ to give
\[
\sqrt{a^2-x^2} = \sqrt{a^2-a^2\sin^2 u} = \sqrt{a^2\cos^2 u} = |a\cos u|
\]

• eliminate $\sqrt{a^2+x^2}$ from an integrand by substituting $x = a\tan u$ to give
\[
\sqrt{a^2+x^2} = \sqrt{a^2+a^2\tan^2 u} = \sqrt{a^2\sec^2 u} = |a\sec u|
\]

• eliminate $\sqrt{x^2-a^2}$ from an integrand by substituting $x = a\sec u$ to give
\[
\sqrt{x^2-a^2} = \sqrt{a^2\sec^2 u-a^2} = \sqrt{a^2\tan^2 u} = |a\tan u|
\]
Be very careful with signs and absolute values when using this substitution. See
Example 3.7.6.
When we have used substitutions before, we usually gave the new integration vari-
able, u, as a function of the old integration variable x. Here we are doing the reverse
— we are giving the old integration variable, x, in terms of the new integration variable
u. We may do so, as long as we may invert to get u as a function of x. For example, with
x = a sin u, we may take u = arcsin xa . This is a good time for you to review the definitions
of arcsin θ, arctan θ and arcsec θ.
As a warm-up, consider the area of a quarter of the unit circle.
Example 3.7.1 (Quarter of the unit circle)

Compute the area of the unit circle lying in the first quadrant.
Solution. We know that the answer is π/4, but we can also compute this as an integral —
we saw this way back in Example 3.1.16:
ż1a
area = 1 ´ x2 dx
0
dx
• To simplify the integrand we substitute x = sin u. With this choice du = cos u and
so dx = cos udu.
• We also need to translate the limits of integration and it is perhaps easiest to do this
by writing u as a function of x — namely u( x ) = arcsin x. Hence u(0) = 0 and
u(1) = π/2.
• Hence the integral becomes
ż1a ż π/2 a
2
1 ´ x dx = 1 ´ sin2 u ¨ cos udu
0 0
ż π/2 a
= cos2 u ¨ cos udu
0
ż π/2
= cos2 udu
0
?
Notice that here we have used that the positive square root cos2 u = | cos u| = cos u
because cos(u) ě 0 for 0 ď u ď π/2.


• To go further we use the techniques of Section 3.6.

ż1a ż π/2
1 + cos 2u
2
1 ´ x dx = cos2 udu and since cos2 u =
0 0 2
ż π/2
1
= (1 + cos(2u))du
2 0
 π/2
1 1
= u + sin(2u)
2 2 0
 
1 π sin π sin 0
= ´0+ ´
2 2 2 2
π
= X
4

Example 3.7.1

 
? x2
Example 3.7.2 dx
ş
1´x2

Solution. We proceed much as we did in the previous example.

dx
• To simplify the integrand we substitute x = sin u. With this choice du = cos u and
so dx = cos udu. Also note that u = arcsin x.

• The integral becomes

x2 sin2 u
ż ż
? dx = ¨ cos udu
1 ´ x2
a
1 ´ sin2 u
sin2 u
ż
= ? ¨ cos udu
cos2 u

• To proceed further we need to get rid of the square-root. Since u = arcsin x has
domain ´1 ď x ď 1 and range ´π/2 ď u ď π/2, it follows that cos u ě 0 (since cosine
is non-negative on these inputs). Hence

a
cos2 u = cos u when ´π/2 ď u ď π/2


• So our integral now becomes

x2 sin2 u
ż ż
? dx = ? ¨ cos udu
1 ´ x2 cos2 u
sin2 u
ż
= ¨ cos udu
cos u
ż
= sin2 udu
1
ż
= (1 ´ cos 2u)du by Equation (3.6.4)
2
u 1
= ´ sin 2u + C
2 4
1 1
= arcsin x ´ sin(2 arcsin x ) + C
2 4

• We can simplify this further using a double-angle identity. Recall that u = arcsin x
and that x = sin u. Then

sin 2u = 2 sin u cos u

2 2
We can replace cos
a u using cos u = 1 ´ sin u. Taking a square-root of this formula
gives cos u = ˘ 1 ´ sin2 u. We need the positive branch here since cos u ě 0 when
´π/2 ď u ď π/2 (which is exactly the range of arcsin x). Continuing along:
a
sin 2u = 2 sin u ¨ 1 ´ sin2 u
a
= 2x 1 ´ x2

Thus our solution is

x2 1 1
ż
? dx = arcsin x ´ sin(2 arcsin x ) + C
1 ´ x2 2 4
1 1 a
= arcsin x ´ x 1 ´ x2 + C
2 2

Example 3.7.2
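An optional SymPy verification of this antiderivative, assuming the library is available:

```python
import sympy as sp

x = sp.symbols('x')

F = sp.asin(x)/2 - x*sp.sqrt(1 - x**2)/2

# The difference should simplify to 0 (for -1 < x < 1).
print(sp.simplify(sp.diff(F, x) - x**2/sp.sqrt(1 - x**2)))
```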

The above two examples illustrate the main steps of the approach. The next example is
similar, but with more complicated limits of integration.
ş ? 
r
Example 3.7.3 a r2 ´ x2 dx

Let’s find the area of the shaded region in the sketch below.


y
x2 + y 2 = r 2

a r x

We’ll set
? up the integral using vertical strips. The strip in the figure has width dx and
height r2 ´ x2 . So the area is given by the integral
żra
area = r2 ´ x2 dx
a
Which is very similar to the previous example.
Solution.
• To evaluate the integral we substitute
dx
x = x (u) = r sin u du = r cos u du
dx =
du
It is also helpful to write u as a function of x — namely u = arcsin xr .
• The integral runs from x = a to x = r. These correspond to
r π
u(r ) = arcsin = arcsin 1 =
r 2
a
u( a) = arcsin which does not simplify further
r
• The integral then becomes
żra ż π/2 a
r2 ´ x2 dx = r2 ´ r2 sin2 u ¨ r cos udu
a arcsin( a/r )
ż π/2 a
2
= r 1 ´ sin2 u ¨ cos udu
arcsin( a/r )
ż π/2 a
2
=r cos2 u ¨ cos udu
arcsin( a/r )

To proceed further (as we did in Examples 3.7.1 and 3.7.2) we need to think about
whether cos u is positive or negative.
• Since a (as shown in the diagram) satisfies 0 ď a ď r, we know that u( a) lies between
arcsin(0) = 0 and arcsin(1) = π/2. Hence the variable u lies between 0 and π/2, and
on this range cos u ě 0. This allows us get rid of the square-root:
a
cos2 u = | cos u| = cos u


• Putting this fact into our integral we get


żra ż π/2 a
2 2 2
r ´ x dx = r cos2 u ¨ cos udu
a arcsin( a/r )
ż π/2
= r2 cos2 udu
arcsin( a/r )

1+cos 2u
Recall the identity cos2 u = 2 from Section 3.6
ż π/2
r2
= (1 + cos 2u)du
2 arcsin( a/r )
 π/2
r2 1
= u + sin(2u)
2 2 arcsin( a/r )
2  
r π 1 1
= + sin π ´ arcsin( a/r ) ´ sin(2 arcsin( a/r ))
2 2 2 2
2  
r π 1
= ´ arcsin( a/r ) ´ sin(2 arcsin( a/r ))
2 2 2
Oof! But there is a little further to go before we are done.
• We can again simplify the term sin(2 arcsin( a/r )) using a double angle identity. Set
θ = arcsin( a/r ). Then θ is the angle in the triangle on the right below. By the double
angle formula for sin(2θ ) (Equation (3.6.2))

sin(2θ ) = 2 sin θ cos θ r


a
?
a r 2 ´ a2 θ

=2 .
r r r 2 − a2

• So finally the area is


żra
area = r2 ´ x2 dx
a
 
r2 π 1
= ´ arcsin( a/r ) ´ sin(2 arcsin( a/r ))
2 2 2
πr2 r2 aa 2
= ´ arcsin( a/r ) ´ r ´ a2
4 2 2

• This is a relatively complicated formula, but we can make some “reasonableness”


checks, by looking at special values of a.
– If a = 0 the shaded region, in the figure at the beginning of this example, is
exactly one quarter of a disk of radius r and so has area 14 πr2 . Substituting
a = 0 into our answer does indeed give 14 πr2 .
– At the other extreme, if a = r, the shaded region disappears completely and so
has area 0. Subbing a = r into our answer does indeed give 0, since arcsin 1 =
2.
π


Example 3.7.3

ş ? 
r
Example 3.7.4 a x r2 ´ x2 dx
şr ?
The integral a x r2 ´ x2 dx looks a lot like the integral we just did in the previous 3 exam-
ples. It can also be evaluated using the trigonometric substitution x = r sin u — but that is
unnecessarily complicated. Just because you have now learned how to use trigonometric
substitution35 doesn’t mean that you should forget everything you learned before.
Solution. This integral is much more easily evaluated using the simple substitution u =
r2 ´ x2 .

• Set u = r2 ´ x2 . Then du = ´2xdx, and so


żr a ż0
? du
2 2
x r ´ x dx = u
a r2 ´a2 ´2
 3/2 0
1 u

2 3/2 r2 ´a2
1 3/2
= r 2 ´ a2
3

Example 3.7.4

Enough sines and cosines — let us try a tangent substitution.


ş 
dx
Example 3.7.5 2
?
2x 9+ x

Solution. As per our guidelines at the start of this section, the presence of the square root
?
term 32 + x2 tells us to substitute x = 3 tan u.

• Substitute

x = 3 tan u dx = 3 sec2 u du

This allows us to remove the square root:


a a a a
9 + x2 = 9 + 9 tan2 u = 3 1 + tan2 u = 3 sec2 u = 3| sec u|

• Hence our integral becomes

dx 3 sec2 u
ż ż
? = du
x2 9 + x2 9 tan2 u ¨ 3| sec u|

35 To paraphrase the Law of the Instrument, possibly Mark Twain and definitely some psychologists,
when you have a shiny new hammer, everything looks like a nail.


• To remove the absolute value we must consider the range of values of $u$ in the integral. Since $x = 3\tan u$ we have $u = \arctan(x/3)$. The range36 of arctangent is $-\pi/2 \le \arctan \le \pi/2$ and so $u = \arctan(x/3)$ will always lie between $-\pi/2$ and $+\pi/2$. Hence $\cos u$ will always be positive, which in turn implies that $|\sec u| = \sec u$.
• Using this fact our integral becomes:

dx 3 sec2 u
ż ż
? = du
x2 9 + x2 27 tan2 u| sec u|
1 sec u
ż
= du since sec u ą 0
9 tan2 u

• Rewrite this in terms of sine and cosine


dx 1 sec u
ż ż
? = du (3.7.1)
x2 9 + x2 9 tan2 u
1 1 cos2 u 1 cos u
ż ż
= ¨ du = du (3.7.2)
9 cos u sin2 u 9 sin2 u
Now we can use the substitution rule with y = sin u and dy = cos udu
1 dy
ż
= (3.7.3)
9 y2
1
= ´ +C (3.7.4)
9y
1
=´ +C (3.7.5)
9 sin u

• The original integral was a function of x, so we still have to rewrite sin u in terms of
x. Remember that x = 3 tan u or u = arctan( x/3). So u is the angle shown in the
triangle below and we can read off the triangle that

x 9 + x2
sin u = ? x
9 + x2
? u
dx 9 + x2
ż
ùñ ? =´ +C 3
2
x 9+x 2 9x

Example 3.7.5
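As usual, the answer can be verified by differentiating; the optional sketch below does so with SymPy (assuming it is installed):

```python
import sympy as sp

x = sp.symbols('x', positive=True)

F = -sp.sqrt(9 + x**2)/(9*x)

# The difference should simplify to 0.
print(sp.simplify(sp.diff(F, x) - 1/(x**2*sp.sqrt(9 + x**2))))
```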

 
2
Example 3.7.6 ?x dx
ş
x2 ´1

Solution. This one requires a secant substitution, but otherwise is very similar to those
above.

36 To be pedantic, we mean the range of the “standard” arctangent function or its “principle value”. One
can define other arctangent functions with different ranges.


• Set x = sec u and dx = sec u tan u du. Then


x2 sec2 u
ż ż
? dx = ? sec u tan u du
x2 ´ 1 sec 2u´1
tan u
ż
= sec3 u ¨ ? du since tan2 u = sec2 u ´ 1
tan2 u
tan u
ż
= sec3 u ¨ du
| tan u|

• As before we need to consider the range of u values in order to determine the sign
of tan u. Notice that the integrand is only defined when either x ă ´1 or x ą 1; thus
we should treat the cases x ă ´1 and x ą 1 separately. Let us assume that x ą 1 and
we will come back to the case x ă ´1 at the end of the example.
When x ą 1, our u = arcsec x takes values in (0, π/2). This follows since when
0 ă u ă π/2, we have 0 ă cos u ă 1 and so sec u ą 1. Further, when 0 ă u ă π/2, we
have tan u ą 0. Thus | tan u| = tan u.

• Back to our integral, when x ą 1:

x2 tan u
ż ż
? dx = sec3 u ¨ du
x2 ´ 1 | tan u|
ż
= sec3 udu since tan u ě 0

This is exactly Example A.8.3


1 1
= sec u tan u + ln | sec u + tan u| + C
2 2

• Since we started with a function of x we need to finish with one. We know that
sec u = x and then we can use trig identities
2 2 2
a
tan u = sec u ´ 1 = x ´ 1 so tan u = ˘ x2 ´ 1, but we know tan u ě 0, so
a
tan u = x2 ´ 1

Thus
x2 1 a 1
ż a
? dx = x x2 ´ 1 + ln |x + x2 ´ 1| + C
x2 ´ 1 2 2

• The above holds when x ą 1. We can confirm that it is also true when x ă ´1 by
showing the right-hand side is a valid antiderivative of the integrand. To do so we
must?differentiate our answer. Notice that we do not need to consider the sign of
x + x2 ´ 1 when we differentiate since we have already seen that

d 1
ln |x| =
dx x


when either x ă 0 or x ą 0. So the following computation applies to both x ą 1 and


x ă ´1. The expressions become quite long so we differentiate each term separately:
d h a 2 i a x 2 
x x ´1 = x2 ´ 1 + ?
dx x2 ´ 1
1 h i
? 2 2
= ( x ´ 1) + x
x2 ´ 1
 
d 1 x
ˇ a ˇ
2
ln x + x ´ 1ˇ = ? ¨ 1+ ?
ˇ ˇ
ˇ ˇ
dx ˇ x + x2 ´ 1 x2 ´ 1
?
1 x + x2 ´ 1
= ? ¨ ?
x + x2 ´ 1 x2 ´ 1
1
=?
x2 ´ 1
Putting things together then gives us
  h i
d 1 a 2 1 a
2
1 2 2
x x ´ 1 + ln |x + x ´ 1| + C = ? ( x ´ 1) + x + 1 + 0
dx 2 2 2 x2 ´ 1
x2
=?
x2 ´ 1
This tells us that our answer for x ą 1 is also valid when x ă ´1 and so
x2 1 a 1
ż a
? dx = x x2 ´ 1 + ln |x + x2 ´ 1| + C
x2 ´ 1 2 2
when x ă ´1 and when x ą 1.
In this example, we were lucky. The answer that we derived for x ą 1 happened to
be also valid for x ă ´1. This does not always happen with the x = a sec u substitution.
When it doesn’t, we have to apply separate x ą a and x ă ´a analyses that are very
similar to our x ą 1 analysis above. Of course that doubles the tedium. So in the practice
book, we will not pose questions that require separate x ą a and x ă ´a computations.
Example 3.7.6

The method, as we have demonstrated it above, works when our integrand contains
the square root of very specific families of quadratic polynomials. In fact, the same method
works for more general quadratic polynomials — all we need to do is complete the square37 .

 
ş5 ?x2 ´2x´3
Example 3.7.7 3 x´1 dx

This time we have an integral with a square root in the integrand, but the argument of the

37 If you have not heard of “completing the square” don’t worry. It is not a difficult method and it will
only take you a few moments to learn. It refers to rewriting a quadratic polynomial

P( x ) = ax2 + bx + c as P ( x ) = a ( x + d )2 + e
for new constants d, e.


square root, while a quadratic function of $x$, is not in one of the standard forms $\sqrt{a^2-x^2}$, $\sqrt{a^2+x^2}$, $\sqrt{x^2-a^2}$. The reason that it is not in one of those forms is that the argument, $x^2-2x-3$, contains a term, namely $-2x$, that is of degree one in $x$. So we try to manipulate it into one of the standard forms by completing the square.
Solution.

• We first rewrite the quadratic polynomial x2 ´ 2x ´ 3 in the form ( x ´ a)2 + b for


some constants a, b. The easiest way to do this is to expand both expressions and
compare coefficients of x:

x2 ´ 2x ´ 3 = ( x ´ a)2 + b = ( x2 ´ 2ax + a2 ) + b

So — if we choose ´2a = ´2 (so the coefficients of x1 match) and a2 + b = ´3 (so


the coefficients of x0 match), then both expressions are equal. Hence we set a = 1
and b = ´4. That is

x2 ´ 2x ´ 3 = ( x ´ 1)2 ´ 4

Many of you may have seen this method when learning to sketch parabolas.

• Once this is done we can convert the square root of the integrand into a standard
form by making the simple substitution y = x ´ 1. Here goes
ż5? 2 ż5a
x ´ 2x ´ 3 ( x ´ 1)2 ´ 4
dx = dx
3 x´1 3 x´1
ż4a 2
y ´4
= dy with y = x ´ 1, dy = dx
2 y
ż π/3 ?
4 sec2 u ´ 4
= 2 sec u tan u du with y = 2 sec u
0 2 sec u
and dy = 2 sec u tan u du

Notice that we could also do this in fewer steps by setting ( x ´ 1) = 2 sec u, dx =


2 sec u tan udu.

• To get the limits of integration we used that


2
– the value of u that corresponds to y = 2 obeys 2 = y = 2 sec u = cos u or
cos u = 1, so that u = 0 works and
2
– the value of u that corresponds to y = 4 obeys 4 = y = 2 sec u = cos u or
cos u = 12 , so that u = π/3 works.

• Now returning to the evaluation of the integral, we simplify and continue.


ż5? 2
x ´ 2x ´ 3
ż π/3 a
dx = 2 sec2 u ´ 1 tan u du
3 x´1 0
ż π/3
=2 tan2 u du since sec2 u = 1 + tan2 u
0


In taking the square root of sec2 u ´ 1 = tan2 u we used that tan u ě 0 on the range
0 ď u ď π3 .

 
ż π/3
=2 sec2 u ´ 1 du since sec2 u = 1 + tan2 u, again
0
h iπ/3
= 2 tan u ´ u
? 0
= 2 3 ´ π/3

Example 3.7.7

3.8IJ Partial Fractions


Partial fractions is the name given to a technique of integration that may be used to convert
any rational function38 into a form that is algebraically equivalent but easier to integrate.
For example, at first glance the integral
\[
\int \frac{x^3+x}{x^2-1}\,dx
\]
looks devilish. But if you know the algebraic fact that
\[
\frac{x^3+x}{x^2-1} = x+\frac{1}{x+1}+\frac{1}{x-1}
\]
then the integration becomes nearly trivial:
\[
\int \frac{x^3+x}{x^2-1}\,dx = \int\left(x+\frac{1}{x+1}+\frac{1}{x-1}\right)dx
= \frac{1}{2}x^2+\ln|x+1|+\ln|x-1|+C
\]
We are not (typically) presented with a rational function nicely sectioned into neat little
pieces. Partial fraction decomposition is a strategy for breaking rational functions up into
these small, nicely integrable parts.

Suppose that N ( x ) and D ( x ) are polynomials. The basic strategy behind the method
N (x)
of partial fractions is to write D( x) as a sum of very simple, easy-to-integrate rational
functions, namely:
(1) polynomials — we shall see below that these are needed when the degree39 of N ( x ) is
equal to or strictly bigger than the degree of D ( x );

38 Recall that a rational function is the ratio of two polynomials.


39 The degree of a polynomial is the largest power of x. For example, the degree of 2x3 + 4x2 + 6x + 8 is
three.


A
(2) rational functions of the particularly simple form ( ax +b)n
; and

Ax + B
(3) rational functions of the form40 ( ax2 +bx +c)m
.

To begin to explore this method of decomposition, let us go back to the example we


just saw:

1 1 x ( x + 1)( x ´ 1) + ( x ´ 1) + ( x + 1) x3 + x
x+ + = = 2
x+1 x´1 ( x + 1)( x ´ 1) x ´1

The technique that we will use is based on two observations:

(1) The denominators on the left-hand side of are the factors of the denominator x2 ´ 1 =
( x ´ 1)( x + 1) on the right-hand side.

(2) Use P( x ) to denote the polynomial on the left hand side, and then use N ( x ) and D ( x )
to denote the numerator and denominator of the right hand side. That is

P( x ) = x N ( x ) = x3 + x D ( x ) = x2 ´ 1.

Then the degree of $N(x)$ is the sum of the degrees of $P(x)$ and $D(x)$. This is because the highest degree term in $N(x)$ is $x^3$, which comes from multiplying $P(x)$ by $D(x)$, as we see in
\[
x+\frac{1}{x+1}+\frac{1}{x-1}
= \frac{\overbrace{x}^{P(x)}\,\overbrace{(x+1)(x-1)}^{D(x)}+(x-1)+(x+1)}{(x+1)(x-1)}
= \frac{x^3+x}{x^2-1}
\]

More generally, the presence of a polynomial on the left hand side is signalled on the
right hand side by the fact that the degree of the numerator is at least as large as the
degree of the denominator.

3.8.1 §§ Partial Fraction Decomposition Examples


Rather than writing up the technique — known as the partial fraction decomposition — in
full generality first, we will instead illustrate it through a sequence of examples. A more
formal description of the method is given in Subsection 3.8.3.
Example 3.8.1 $\left(\int \frac{x-3}{x^2-3x+2}\,dx\right)$

In this example, we integrate $\frac{N(x)}{D(x)} = \frac{x-3}{x^2-3x+2}$.

Solution.

40 You might notice these examples conveniently absent in the discussion that follows. In this class, we
will skip decompositions requiring such functions. For the extra-curious, the rest of the story can be
found in Appendix Section A.9.1


• Step 1. We first check to see if a polynomial P( x ) is needed. To do so, we check to


see if the degree of the numerator, N ( x ), is strictly smaller than the degree of the
denominator D ( x ). In this example, the numerator, x ´ 3, has degree one and that
is indeed strictly smaller than the degree of the denominator, x2 ´ 3x + 2, which is
two. In this case41 we do not need to extract a polynomial P( x ) and we move on to
step 2.
• Step 2. The second step is to factor the denominator
x2 ´ 3x + 2 = ( x ´ 1)( x ´ 2)
In this example it is quite easy, but in future examples (and quite possibly in your
homework, quizzes and exam) you will have to work harder to factor the denomi-
nator. In Appendix B.16 we have written up some simple tricks for factoring poly-
nomials. We will illustrate them in Example A.9.5 below.
x´3
• Step 3. The third step is to write x2 ´3x +2
in the form

x´3 A B
= +
x2 ´ 3x + 2 x´1 x´2
for some constants A and B. More generally, if the denominator consists of n differ-
ent linear factors, then we decompose the ratio as
A1 A2 An
rational function = + +¨¨¨+
linear factor 1 linear factor 2 linear factor n

To proceed we need to determine the values of the constants A, B and there are
several different methods to do so. Here are two methods
• Step 3 – Algebra Method. This approach has the benefit of being conceptually clearer
and easier, but the downside is that it is more tedious.
To determine the values of the constants A, B, we put42 the right-hand side back
over the common denominator ( x ´ 1)( x ´ 2).
x´3 A B A ( x ´ 2) + B ( x ´ 1)
= + =
x2 ´ 3x + 2 x´1 x´2 ( x ´ 1)( x ´ 2)
The fraction on the far left is the same as the fraction on the far right if and only if
their numerators are the same.
x ´ 3 = A ( x ´ 2) + B ( x ´ 1)
Write the right hand side as a polynomial in standard form (i.e. collect up all x terms
and all constant terms)
x ´ 3 = ( A + B) x + (´2A ´ B)

41 We will soon get to an example (Example 3.8.2 in fact) in which the numerator degree is at least as
large as the denominator degree — in that situation we have to extract a polynomial P( x ) before we
can move on to step 2.
42 That is, we take the decomposed form and sum it back together.


For these two polynomials to be the same, the coefficient of x on the left hand side
and the coefficient of x on the right hand side must be the same. Similarly the co-
efficients of x0 (i.e. the constant terms) must match. This gives us a system of two
equations.

A+B = 1 ´2A ´ B = ´3

in the two unknowns A, B. We can solve this system by

– using the first equation, namely A + B = 1, to determine A in terms of B:

A = 1´B

– Substituting this into the remaining equation eliminates the A from second
equation, leaving one equation in the one unknown B, which can then be solved
for B:

´2A ´ B = ´3 substitute A = 1 ´ B
´2(1 ´ B) ´ B = ´3 clean up
´2 + B = ´3 so B = ´1

– Once we know B, we can substitute it back into A = 1 ´ B to get A.

A = 1 ´ B = 1 ´ (´1) = 2

Hence
x´3 2 1
= ´
x2 ´ 3x + 2 x´1 x´2

• Step 3 – Sneaky Method. This takes a little more work to understand, but it is more
efficient than the algebra method.
We wish to find A and B for which
x´3 A B
= +
( x ´ 1)( x ´ 2) x´1 x´2

Note that the denominator on the left hand side has been written in factored form.

– To determine A, we multiply both sides of the equation by A’s denominator,


which is x ´ 1,

x´3 ( x ´ 1) B
= A+
x´2 x´2
and then we completely eliminate B from the equation by evaluating at x = 1.
This value of x is chosen to make x ´ 1 = 0.
x ´ 3 ˇˇ ( x ´ 1) B ˇˇ 1´3
ˇ ˇ
= A+ ùñ A = =2
x ´ 2 x =1
ˇ x ´ 2 x =1
ˇ 1´2


– To determine B, we multiply both sides of the equation by B’s denominator,


which is x ´ 2,
x´3 ( x ´ 2) A
= +B
x´1 x´1
and then we completely eliminate A from the equation by evaluating at x = 2.
This value of x is chosen to make x ´ 2 = 0.
x ´ 3 ˇˇ ( x ´ 2) A ˇˇ 2´3
ˇ ˇ
= + B ùñ B = = ´1
x ´ 1 x =2
ˇ x ´ 1 x =2
ˇ 2´1

Hence we have (the thankfully consistent answer)


x´3 2 1
= ´
x2 ´ 3x + 2 x´1 x´2
Notice that no matter which method we use to find the constants we can easily check
our answer by summing the terms back together:
2 1 2( x ´ 2) ´ ( x ´ 1) 2x ´ 4 ´ x + 1 x´3
´ = = 2 = 2 X
x´1 x´2 ( x ´ 2)( x ´ 1) x ´ 3x + 2 x ´ 3x + 2

Step 4. The final step is to integrate.


x´3 2 ´1
ż ż ż
2
dx = dx + dx = 2 ln |x ´ 1| ´ ln |x ´ 2| + C
x ´ 3x + 2 x´1 x´2

Example 3.8.1
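As an optional aside, a computer algebra system can produce the same decomposition and the same antiderivative; the sketch below assumes SymPy is installed. It is a useful way to check your hand computation, not a substitute for learning the method:

```python
import sympy as sp

x = sp.symbols('x')

f = (x - 3)/(x**2 - 3*x + 2)

# Partial fraction decomposition: should print 2/(x - 1) - 1/(x - 2),
# possibly with the two terms in the other order.
print(sp.apart(f, x))

# The antiderivative, matching 2*ln|x-1| - ln|x-2| (SymPy omits the
# absolute values and the constant C).
print(sp.integrate(f, x))
```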
Perhaps the first thing that you notice is that this process takes quite a few steps43 . How-
ever no single step is all that complicated; it only takes practice. With that said, let’s do
another, slightly more complicated, one.
Example 3.8.2 $\left(\int \frac{3x^3-8x^2+4x-1}{x^2-3x+2}\,dx\right)$

In this example, we integrate $\frac{N(x)}{D(x)} = \frac{3x^3-8x^2+4x-1}{x^2-3x+2}$.
Solution.
• Step 1. We first check to see if the degree of the numerator N ( x ) is strictly smaller
than the degree of the denominator D ( x ). In this example, the numerator, 3x3 ´
8x2 + 4x ´ 1, has degree three and the denominator, x2 ´ 3x + 2, has degree two. As
3 ě 2, we have to implement the first step.
The goal of the first step is to write $\frac{N(x)}{D(x)}$ in the form
$$\frac{N(x)}{D(x)} = P(x) + \frac{R(x)}{D(x)}$$

43 Though, in fairness, we did step 3 twice — and that is the most tedious bit. . . Actually — sometimes
factoring the denominator can be quite challenging. We’ll consider this issue in more detail shortly.


with P(x) being a polynomial and R(x) being a polynomial of degree strictly smaller than the degree of D(x). The right hand side is $\frac{P(x)D(x)+R(x)}{D(x)}$, so we have to express the numerator in the form N(x) = P(x)D(x) + R(x), with P(x) and R(x) being polynomials and with the degree of R being strictly smaller than the degree of D.
P(x)D(x) is a sum of expressions of the form $ax^n D(x)$. We want to pull as many expressions of this form as possible out of the numerator N(x), leaving only a low degree remainder R(x).
We do this using long division — the same long division you learned in school, but
with the base 10 replaced by x.
– We start by observing that to get from the highest degree term in the denomina-
tor (x2 ) to the highest degree term in the numerator (3x3 ), we have to multiply
it by 3x. So we write,
3x
x2 − 3x + 2 3x3 − 8x2 + 4x − 1

In the above expression, the denominator is on the left, the numerator is on the
right and 3x is written above the highest order term of the numerator. Always
put lower powers of x to the right of higher powers of x — this mirrors how
you do long division with numbers; lower powers of ten sit to the right of higher powers of ten.
– Now we subtract 3x times the denominator, x2 ´ 3x + 2, which is 3x3 ´ 9x2 + 6x,
from the numerator.
3x
x2 − 3x + 2 3x3 − 8x2 + 4x − 1
3x3 − 9x2 + 6x 3x(x2 − 3x + 2)
x2 − 2x − 1
– This has left a remainder of x2 ´ 2x ´ 1. To get from the highest degree term in
the denominator (x2 ) to the highest degree term in the remainder (x2 ), we have
to multiply by 1. So we write,
3x + 1
x2 − 3x + 2 3x3 − 8x2 + 4x − 1
3x3 − 9x2 + 6x
x2 − 2x − 1
– Now we subtract 1 times the denominator, x2 ´ 3x + 2, which is x2 ´ 3x + 2,
from the remainder.
3x + 1
x2 − 3x + 2 3x3 − 8x2 + 4x − 1
3x3 − 9x2 + 6x 3x(x2 − 3x + 2)
x2 − 2x − 1
x2 − 3x + 2 1 (x2 − 3x + 2)
x− 3

– This leaves a remainder of x ´ 3. Because the remainder has degree 1, which is


smaller than the degree of the denominator (being degree 2), we stop.


– In this example, when we subtracted 3x ( x2 ´ 3x + 2) and 1( x2 ´ 3x + 2) from


3x3 ´ 8x2 + 4x ´ 1 we ended up with x ´ 3. That is,
3x3 ´ 8x2 + 4x ´ 1 ´ 3x ( x2 ´ 3x + 2) ´ 1( x2 ´ 3x + 2) = x ´ 3

or, collecting the two terms proportional to ( x2 ´ 3x + 2)


3x3 ´ 8x2 + 4x ´ 1 ´ (3x + 1)( x2 ´ 3x + 2) = x ´ 3

Moving the (3x + 1)( x2 ´ 3x + 2) to the right hand side and dividing the whole
equation by x2 ´ 3x + 2 gives
$$\frac{3x^3-8x^2+4x-1}{x^2-3x+2} = 3x+1 + \frac{x-3}{x^2-3x+2}$$
And we can easily check this expression just by summing the two terms on the
right-hand side.
We have written the integrand in the form $\frac{N(x)}{D(x)} = P(x) + \frac{R(x)}{D(x)}$, with the degree of
R( x ) strictly smaller than the degree of D ( x ), which is what we wanted. Observe
that R( x ) is the final remainder of the long division procedure and P( x ) is at the top
of the long division computation.

3x + 1 P (x)
2
D(x) x − 3x + 2 3x3 − 8x2 + 4x − 1 N(x)
3x3 − 9x2 + 6x 3x · D(x)
x2 − 2x − 1 N(x) − 3x · D(x)
x2 − 3x + 2 1 · D(x)
x− 3 R(x) = N(x) − (3x + 1)D(x)

This is the end of Step 1. Oof! You should definitely practice this step.
• Step 2. The second step is to factor the denominator
x2 ´ 3x + 2 = ( x ´ 1)( x ´ 2)
We already did this in Example 3.8.1.
• Step 3. The third step is to write $\frac{x-3}{x^2-3x+2}$ in the form
$$\frac{x-3}{x^2-3x+2} = \frac{A}{x-1} + \frac{B}{x-2}$$
for some constants A and B. We already did this in Example 3.8.1. We found A = 2
and B = ´1.
• Step 4. The final step is to integrate.
$$\int \frac{3x^3-8x^2+4x-1}{x^2-3x+2}\,dx = \int (3x+1)\,dx + \int \frac{2}{x-1}\,dx + \int \frac{-1}{x-2}\,dx = \frac{3}{2}x^2 + x + 2\ln|x-1| - \ln|x-2| + C$$


You can see that the integration step is quite quick — almost all the work is in preparing
the integrand.
Example 3.8.2
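The long division in Step 1 is also easy to check numerically. Here is a small sketch using NumPy's polydiv (again an assumption on our part, not part of the text), with polynomial coefficients listed from the highest power down.

```python
# A sketch (NumPy assumed) checking the long division of Example 3.8.2:
# (3x^3 - 8x^2 + 4x - 1) divided by (x^2 - 3x + 2).
import numpy as np

quotient, remainder = np.polydiv([3, -8, 4, -1], [1, -3, 2])
print(quotient)    # [3. 1.]   i.e. P(x) = 3x + 1
print(remainder)   # [ 1. -3.] i.e. R(x) = x - 3
```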

The best thing after working through a few nice medium-length examples is to do a nice long example — it is excellent practice. We recommend that the reader attempt the problem before reading through our solution.
Problems like this are a vehicle for practicing problem solving in general. Take a big
problem, cut it up into smaller chunks, solve those chunks, and put everything back to-
gether into a single solution — without getting lost or giving up. Even if you forget partial
fraction decompositions the minute you walk out of the final exam, the ability to solve a
big problem by cutting it into smaller ones will remain a useful life skill.
Example 3.8.3 $\left(\int \frac{4x^3+23x^2+45x+27}{x^3+5x^2+8x+4}\,dx\right)$

In this example, we integrate $\frac{N(x)}{D(x)} = \frac{4x^3+23x^2+45x+27}{x^3+5x^2+8x+4}$.

• Step 1. The degree of the numerator N(x) is equal to the degree of the denominator D(x), so the first step is to write $\frac{N(x)}{D(x)}$ in the form
$$\frac{N(x)}{D(x)} = P(x) + \frac{R(x)}{D(x)}$$
with P( x ) being a polynomial (which should be of degree 0, i.e. just a constant) and
R( x ) being a polynomial of degree strictly smaller than the degree of D ( x ). By long
division
                      4
x3 + 5x2 + 8x + 4   4x3 + 23x2 + 45x + 27
                    4x3 + 20x2 + 32x + 16
                           3x2 + 13x + 11

so
$$\frac{4x^3+23x^2+45x+27}{x^3+5x^2+8x+4} = 4 + \frac{3x^2+13x+11}{x^3+5x^2+8x+4}$$

• Step 2. The second step is to factorise D ( x ) = x3 + 5x2 + 8x + 4.


– To start, we’ll try and guess an integer root. Any integer root of D ( x ) must
divide the constant term, 4, exactly. Only ˘1, ˘2, ˘4 can be integer roots of
x3 + 5x2 + 8x + 4.

44 At the risk of quoting Nietzsche, “That which does not kill us makes us stronger.” Though this author
always preferred the logically equivalent contrapositive — “That which does not make us stronger will
kill us.” However no one is likely to be injured by practicing partial fractions or looking up quotes
on Wikipedia. Its also a good excuse to remind yourself of what a contrapositive is — though we will
likely look at them again when we get to sequences and series.


– We test to see if ˘1 are roots.

D (1) = (1)3 + 5(1)2 + 8(1) + 4 ‰ 0 ñ x = 1 is not a root


D (´1) = (´1)3 + 5(´1)2 + 8(´1) + 4 = 0 ñ x = ´1 is a root

So ( x + 1) must divide x3 + 5x2 + 8x + 4 exactly.


– By long division
x2+ 4x + 4
x + 1 x3+ 5x2+ 8x+ 4
x3+ x2
4x2+ 8x+ 4
4x2+ 4x
4x+ 4
4x+ 4
0

so

x3 + 5x2 + 8x + 4 = ( x + 1)( x2 + 4x + 4) = ( x + 1)( x + 2)( x + 2)

– Notice that we could have instead checked whether or not ˘2 are roots

D (2) = (2)3 + 5(2)2 + 8(2) + 4 ‰ 0 ñ x = 2 is not a root


D (´2) = (´2)3 + 5(´2)2 + 8(´2) + 4 = 0 ñ x = ´2 is a root

We now know that both ´1 and ´2 are roots of x3 + 5x2 + 8x + 4 and hence both
( x + 1) and ( x + 2) are factors of x3 + 5x2 + 8x + 4. Because x3 + 5x2 + 8x + 4 is
of degree three and the coefficient of x3 is 1, we must have x3 + 5x2 + 8x + 4 =
( x + 1)( x + 2)( x + a) for some constant a. Multiplying out the right hand side
shows that the constant term is 2a. So 2a = 4 and a = 2.
This is the end of step 2. We now know that

$$\frac{4x^3+23x^2+45x+27}{x^3+5x^2+8x+4} = 4 + \frac{3x^2+13x+11}{(x+1)(x+2)^2}$$

• Step 3. The third step is to write $\frac{3x^2+13x+11}{(x+1)(x+2)^2}$ in the form
$$\frac{3x^2+13x+11}{(x+1)(x+2)^2} = \frac{A}{x+1} + \frac{B}{x+2} + \frac{C}{(x+2)^2}$$
for some constants A, B and C.
Note that there are two terms on the right hand side arising from the factor (x+2)². One has denominator (x+2) and one has denominator (x+2)². More generally, for each factor (x+a)ⁿ in the denominator of the rational function on the left hand side, we include
$$\frac{A_1}{x+a} + \frac{A_2}{(x+a)^2} + \cdots + \frac{A_n}{(x+a)^n}$$


in the partial fraction decomposition on the right hand side45 .


To determine the values of the constants A, B, C, we put the right hand side back
over the common denominator ( x + 1)( x + 2)2 .

$$\frac{3x^2+13x+11}{(x+1)(x+2)^2} = \frac{A}{x+1} + \frac{B}{x+2} + \frac{C}{(x+2)^2} = \frac{A(x+2)^2 + B(x+1)(x+2) + C(x+1)}{(x+1)(x+2)^2}$$
The fraction on the far left is the same as the fraction on the far right if and only if
their numerators are the same.

3x2 + 13x + 11 = A( x + 2)2 + B( x + 1)( x + 2) + C ( x + 1)

As in the previous examples, there are a couple of different ways to determine the
values of A, B and C from this equation.
• Step 3 – Algebra Method. The conceptually clearest procedure is to write the right
hand side as a polynomial in standard form (i.e. collect up all x2 terms, all x terms
and all constant terms)

3x2 + 13x + 11 = ( A + B) x2 + (4A + 3B + C ) x + (4A + 2B + C )

For these two polynomials to be the same, the coefficient of x2 on the left hand side
and the coefficient of x2 on the right hand side must be the same. Similarly the
coefficients of x1 and the coefficients of x0 (i.e. the constant terms) must match. This
gives us a system of three equations,

$$A + B = 3 \qquad 4A + 3B + C = 13 \qquad 4A + 2B + C = 11$$

in the three unknowns A, B, C. We can solve this system by


– using the first equation, namely A + B = 3, to determine A in terms of B:
A = 3 ´ B.
– Substituting this into the remaining equations eliminates the A, leaving two
equations in the two unknown B, C.

$$4(3-B) + 3B + C = 13 \qquad 4(3-B) + 2B + C = 11$$

or

$$-B + C = 1 \qquad -2B + C = -1$$

– We can now solve the first of these equations, namely ´B + C = 1, for B in


terms of C, giving B = C ´ 1.
– Substituting this into the last equation, namely ´2B + C = ´1, gives ´2(C ´
1) + C = ´1 which is easily solved to give

45 This is justified in Appendix section A.10


C = 3, and then B = C − 1 = 2 and then A = 3 − B = 1.

Hence

$$\frac{4x^3+23x^2+45x+27}{x^3+5x^2+8x+4} = 4 + \frac{3x^2+13x+11}{(x+1)(x+2)^2} = 4 + \frac{1}{x+1} + \frac{2}{x+2} + \frac{3}{(x+2)^2}$$

• Step 3 – Sneaky Method. The second, sneakier, method for finding A, B and C exploits
the fact that 3x2 + 13x + 11 = A( x + 2)2 + B( x + 1)( x + 2) + C ( x + 1) must be true
for all values of x. In particular, it must be true for x = ´1. When x = ´1, the factor
( x + 1) multiplying B and C is exactly zero. So B and C disappear from the equation,
leaving us with an easy equation to solve for A:
$$\left.\Big(3x^2+13x+11\Big)\right|_{x=-1} = \Big[A(x+2)^2 + B(x+1)(x+2) + C(x+1)\Big]_{x=-1} \implies 1 = A$$

Sub this value of A back in and simplify.

3x2 + 13x + 11 = (1)( x + 2)2 + B( x + 1)( x + 2) + C ( x + 1)


2x2 + 9x + 7 = B( x + 1)( x + 2) + C ( x + 1) = ( xB + 2B + C )( x + 1)

Since ( x + 1) is a factor on the right hand side, it must also be a factor on the left
hand side.

(2x + 7)( x + 1) = ( xB + 2B + C )( x + 1) ñ (2x + 7) = ( xB + 2B + C )

For the coefficients of x to match, B must be 2. For the constant terms to match,
2B + C must be 7, so C must be 3. Hence we again have

$$\frac{4x^3+23x^2+45x+27}{x^3+5x^2+8x+4} = 4 + \frac{3x^2+13x+11}{(x+1)(x+2)^2} = 4 + \frac{1}{x+1} + \frac{2}{x+2} + \frac{3}{(x+2)^2}$$

• Step 4. The final step is to integrate

$$\int \frac{4x^3+23x^2+45x+27}{x^3+5x^2+8x+4}\,dx = \int 4\,dx + \int \frac{1}{x+1}\,dx + \int \frac{2}{x+2}\,dx + \int \frac{3}{(x+2)^2}\,dx$$
$$= 4x + \ln|x+1| + 2\ln|x+2| - \frac{3}{x+2} + C$$

Example 3.8.3
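As before, a computer algebra system will happily reproduce the factoring and the decomposition with the repeated factor. A hedged sketch with SymPy (assumed installed):

```python
# A sketch (SymPy assumed) reproducing the key steps of Example 3.8.3.
import sympy as sp

x = sp.symbols('x')
D = x**3 + 5*x**2 + 8*x + 4
f = (4*x**3 + 23*x**2 + 45*x + 27) / D

print(sp.factor(D))   # (x + 1)*(x + 2)**2
print(sp.apart(f))    # 4 + 1/(x + 1) + 2/(x + 2) + 3/(x + 2)**2, perhaps with terms reordered
```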


3.8.2 §§ Non-Rational Integrands


The method of partial fractions is not just confined to the problem of integrating rational functions. There are other integrals — such as $\int \sec x\,dx$ and $\int \sec^3 x\,dx$ — that can be transformed (via substitutions) into integrals of rational functions. We encountered both of these integrals in Sections 3.6 and 3.7 on trigonometric integrals and substitutions.

Example 3.8.4 $\left(\int \sec x\,dx\right)$

Solution. In this example, we integrate sec x. It is not yet clear what this integral has to do
with partial fractions. To get to a partial fractions computation, we first make one of our
old substitutions.
$$\int \sec x\,dx = \int \frac{1}{\cos x}\,dx \qquad\text{massage the expression a little}$$
$$= \int \frac{\cos x}{\cos^2 x}\,dx \qquad\text{substitute } u = \sin x,\ du = \cos x\,dx$$
$$= -\int \frac{du}{u^2-1} \qquad\text{and use } \cos^2 x = 1-\sin^2 x = 1-u^2$$
So we now have to integrate $\frac{1}{u^2-1}$, which is a rational function of u, and so is perfect for partial fractions.

• Step 1. The degree of the numerator, 1, is zero, which is strictly smaller than the
degree of the denominator, u2 ´ 1, which is two. So the first step is skipped.

• Step 2. The second step is to factor the denominator:

u2 ´ 1 = (u ´ 1)(u + 1)

• Step 3. The third step is to write $\frac{1}{u^2-1}$ in the form
$$\frac{1}{u^2-1} = \frac{1}{(u-1)(u+1)} = \frac{A}{u-1} + \frac{B}{u+1}$$

for some constants A and B.

• Step 3 – Sneaky Method.

– Multiply through by the denominator to get

1 = A ( u + 1) + B ( u ´ 1)

This equation must be true for all u.


– If we now set u = 1 then we eliminate B from the equation leaving us with

1 = 2A so A = 1/2.


– Similarly, if we set u = ´1 then we eliminate A, leaving


1 = ´2B which implies B = ´1/2.

We have now found that A = 1/2, B = ´1/2, so


$$\frac{1}{u^2-1} = \frac{1}{2}\Big[\frac{1}{u-1} - \frac{1}{u+1}\Big].$$
• It is always a good idea to check our work.
$$\frac{1/2}{u-1} + \frac{-1/2}{u+1} = \frac{\tfrac{1}{2}(u+1) - \tfrac{1}{2}(u-1)}{(u-1)(u+1)} = \frac{1}{(u-1)(u+1)}\ \checkmark$$

• Step 4. The final step is to integrate.


$$\int \sec x\,dx = -\int \frac{du}{u^2-1} \qquad\text{after substitution}$$
$$= -\frac{1}{2}\int \frac{du}{u-1} + \frac{1}{2}\int \frac{du}{u+1} \qquad\text{partial fractions}$$
$$= -\frac{1}{2}\ln|u-1| + \frac{1}{2}\ln|u+1| + C$$
$$= -\frac{1}{2}\ln|\sin(x)-1| + \frac{1}{2}\ln|\sin(x)+1| + C \qquad\text{rearrange a little}$$
$$= \frac{1}{2}\ln\left|\frac{1+\sin x}{1-\sin x}\right| + C$$
Notice that since ´1 ď sin x ď 1, we are free to drop the absolute values in the last
line if we wish.
Example 3.8.4
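A quick way to gain confidence in an antiderivative like this one is to differentiate it and simplify. The sketch below (SymPy assumed, not part of the text) does exactly that.

```python
# A sketch (SymPy assumed): the antiderivative from Example 3.8.4 should
# differentiate back to sec(x), so the difference should simplify to 0.
import sympy as sp

x = sp.symbols('x')
F = sp.Rational(1, 2) * sp.log((1 + sp.sin(x)) / (1 - sp.sin(x)))
print(sp.simplify(sp.diff(F, x) - sp.sec(x)))   # expect 0
```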

Another example in the same spirit, though a touch harder. Again, we saw this prob-
lem in Section 3.6 and 3.7.

Example 3.8.5 $\left(\int \sec^3 x\,dx\right)$

Solution.
• We’ll start by converting it into the integral of a rational function using the substitu-
tion u = sin x, du = cos xdx.
$$\int \sec^3 x\,dx = \int \frac{1}{\cos^3 x}\,dx \qquad\text{massage this a little}$$
$$= \int \frac{\cos x}{\cos^4 x}\,dx \qquad\text{replace } \cos^2 x = 1-\sin^2 x = 1-u^2$$
$$= \int \frac{\cos x\,dx}{[1-\sin^2 x]^2}$$
$$= \int \frac{du}{[1-u^2]^2}$$


• We could now find the partial fraction decomposition of the integrand $\frac{1}{[1-u^2]^2}$ by executing the usual four steps. But it is easier to use
$$\frac{1}{u^2-1} = \frac{1}{2}\Big[\frac{1}{u-1} - \frac{1}{u+1}\Big]$$
which we worked out in Example 3.8.4 above.

• Squaring this gives
$$\frac{1}{[1-u^2]^2} = \frac{1}{4}\Big[\frac{1}{u-1} - \frac{1}{u+1}\Big]^2$$
$$= \frac{1}{4}\Big[\frac{1}{(u-1)^2} - \frac{2}{(u-1)(u+1)} + \frac{1}{(u+1)^2}\Big]$$
$$= \frac{1}{4}\Big[\frac{1}{(u-1)^2} - \frac{1}{u-1} + \frac{1}{u+1} + \frac{1}{(u+1)^2}\Big]$$
where we have again used $\frac{1}{u^2-1} = \frac{1}{2}\big[\frac{1}{u-1} - \frac{1}{u+1}\big]$ in the last step.

• It only remains to do the integrals and simplify.
$$\int \sec^3 x\,dx = \frac{1}{4}\int \Big[\frac{1}{(u-1)^2} - \frac{1}{u-1} + \frac{1}{u+1} + \frac{1}{(u+1)^2}\Big]\,du$$
$$= \frac{1}{4}\Big[-\frac{1}{u-1} - \ln|u-1| + \ln|u+1| - \frac{1}{u+1}\Big] + C \qquad\text{group carefully}$$
$$= \frac{-1}{4}\Big[\frac{1}{u-1} + \frac{1}{u+1}\Big] + \frac{1}{4}\Big[\ln|u+1| - \ln|u-1|\Big] + C \qquad\text{sum carefully}$$
$$= -\frac{1}{4}\,\frac{2u}{u^2-1} + \frac{1}{4}\ln\left|\frac{u+1}{u-1}\right| + C \qquad\text{clean up}$$
$$= \frac{1}{2}\,\frac{u}{1-u^2} + \frac{1}{4}\ln\left|\frac{u+1}{u-1}\right| + C \qquad\text{put } u = \sin x$$
$$= \frac{1}{2}\,\frac{\sin x}{\cos^2 x} + \frac{1}{4}\ln\left|\frac{\sin x + 1}{\sin x - 1}\right| + C$$

Example 3.8.5

3.8.3 §§ Systematic Description


In the examples above we used the partial fractions method to decompose rational functions into easily integrated pieces. Each of those examples was quite involved and we had to spend quite a bit of time factoring and doing long division. The key step in each of the computations was Step 3 — in that step we decomposed the rational function $\frac{N(x)}{D(x)}$ (or $\frac{R(x)}{D(x)}$), for which the degree of the numerator is strictly smaller than the degree of the denominator, into a sum of particularly simple rational functions, like $\frac{A}{x-a}$. We did not, however, give a systematic description of those decompositions.


In this subsection we fill that gap by describing the general46 form of partial fraction
decompositions. The justification of these forms is not part of the course, but the interested
reader is invited to read Appendix section A.9.2 where such justification is given. In the
following it is assumed that

• N ( x ) and D ( x ) are polynomials with the degree of N ( x ) strictly smaller than the
degree of D ( x ).

• K is a constant.

• a1 , a2 , ¨ ¨ ¨ , a j are all different numbers.

• m1 , m2 , ¨ ¨ ¨ , m j , and n1 , n2 , ¨ ¨ ¨ , nk are all strictly positive integers.

§§§ Simple Linear Factor Case


If the denominator D ( x ) = K ( x ´ a1 )( x ´ a2 ) ¨ ¨ ¨ ( x ´ a j ) is a product of j different linear
factors, then

Equation 3.8.6.

$$\frac{N(x)}{D(x)} = \frac{A_1}{x-a_1} + \frac{A_2}{x-a_2} + \cdots + \frac{A_j}{x-a_j}$$

We can then integrate each term

$$\int \frac{A}{x-a}\,dx = A\ln|x-a| + C.$$

§§§ General Linear Factor Case


If the denominator D ( x ) = K ( x ´ a1 )m1 ( x ´ a2 )m2 ¨ ¨ ¨ ( x ´ a j )m j then

Equation 3.8.7.

$$\frac{N(x)}{D(x)} = \frac{A_{1,1}}{x-a_1} + \frac{A_{1,2}}{(x-a_1)^2} + \cdots + \frac{A_{1,m_1}}{(x-a_1)^{m_1}}$$
$$\qquad + \frac{A_{2,1}}{x-a_2} + \frac{A_{2,2}}{(x-a_2)^2} + \cdots + \frac{A_{2,m_2}}{(x-a_2)^{m_2}} + \cdots$$
$$\qquad + \frac{A_{j,1}}{x-a_j} + \frac{A_{j,2}}{(x-a_j)^2} + \cdots + \frac{A_{j,m_j}}{(x-a_j)^{m_j}}$$

46 Well — not the completely general form, in the sense that we are not allowing the use of complex
numbers and we are not allowing irreducible quadratic factors.


Notice that we could rewrite each line as
$$\frac{A_1}{x-a} + \frac{A_2}{(x-a)^2} + \cdots + \frac{A_m}{(x-a)^m} = \frac{A_1(x-a)^{m-1} + A_2(x-a)^{m-2} + \cdots + A_m}{(x-a)^m} = \frac{B_1 x^{m-1} + B_2 x^{m-2} + \cdots + B_m}{(x-a)^m}$$
which is a polynomial whose degree, m ´ 1, is strictly smaller than that of the denominator
( x ´ a)m . But the form of Equation (3.8.7) is preferable because it is easier to integrate.
$$\int \frac{A}{x-a}\,dx = A\ln|x-a| + C$$
$$\int \frac{A}{(x-a)^k}\,dx = -\frac{1}{k-1}\cdot\frac{A}{(x-a)^{k-1}} + C \qquad\text{provided } k > 1.$$
A justification for why our algorithm for partial fraction decomposition works is in
Appendix section A.9.2.
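The two antiderivative formulas above are also easy to spot-check with a computer algebra system. A sketch (SymPy assumed), here with the illustrative exponent k = 3 and symbolic A and a:

```python
# A sketch (SymPy assumed) checking the antiderivatives used above.
import sympy as sp

x, A, a = sp.symbols('x A a')
print(sp.integrate(A / (x - a), x))      # A*log(x - a) up to a constant (no absolute value from SymPy)
print(sp.integrate(A / (x - a)**3, x))   # equivalent to -A/(2*(x - a)**2), the k = 3 case of the formula
```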

Every polynomial can be factored into the product of linear and irreducible quadratic47
factors. The general method of partial fractions makes use of this and can indeed give
us the ability to antidifferentiate any rational function. However, for the purposes of this
class, we will restrict our study to decomposing rational functions whose denominators
can be factored into linear terms48 . For the extra-curious, the cases involving irreducible
quadratics can be found in Appendix Section A.9.1.

3.9IJ Numerical Integration


By now the reader will have come to appreciate that integration is generally quite a bit
more difficult than differentiation. There are a great many simple-looking integrals, such as $\int e^{-x^2}\,dx$, that are either very difficult or even impossible to express in terms of standard functions. Such integrals are not merely mathematical curiosities, but arise very
naturally in many contexts. For example, the error function
$$\operatorname{erf}(x) = \frac{2}{\sqrt{\pi}}\int_0^x e^{-t^2}\,dt$$
is extremely important in many areas of mathematics, and also in many practical applica-
tions of statistics.

47 An irreducible quadratic equation is a quadratic equation that can’t be factored (at least if we restrict
ourselves to real numbers). This is the same as a quadratic equation with no (real) roots. The simplest
example is x2 + 1.
48 Actually, if you allow complex numbers, all polynomials can be factored into linear terms, because no
quadratics are irreducible. But again, we’ll contain our excitement and restrict ourselves to performing
partial fraction decompositions on rational functions whose denominators are the product of linear
factors with real terms.
49 We apologise for being a little sloppy here — but we just want to say that it can be very hard or even
impossible to write some integrals as some finite sized expression involving polynomials, exponen-
tials, logarithms and trigonometric functions. We don’t want to get into a discussion of computability,
though that is a very interesting topic.


In such applications we need to be able to evaluate this integral (and many others) at
a given numerical value of x. In this section we turn to the problem of how to find (ap-
proximate) numerical values for integrals, without having to evaluate them algebraically.
To develop these methods we return to Riemann sums and our geometric interpretation
of the definite integral as the signed area.
We start by describing (and applying) three simple algorithms for generating, numerically, approximate values for the definite integral $\int_a^b f(x)\,dx$. In each algorithm, we begin
in much the same way as we approached Riemann sums.

• We first select an integer n ą 0, called the “number of steps”.

• We then divide the interval of integration, a ≤ x ≤ b, into n equal subintervals, each of length ∆x = (b − a)/n. The first subinterval runs from x0 = a to x1 = a + ∆x. The second runs from x1 to x2 = a + 2∆x, and so on. The last runs from xn−1 = b − ∆x to xn = b.

[Figure: the graph of y = f(x) over a ≤ x ≤ b, with the interval divided at the points a = x0, x1, x2, x3, …, xn−1, xn = b.]
This splits the original integral into n pieces:

$$\int_a^b f(x)\,dx = \int_{x_0}^{x_1} f(x)\,dx + \int_{x_1}^{x_2} f(x)\,dx + \cdots + \int_{x_{n-1}}^{x_n} f(x)\,dx$$

Each subintegral $\int_{x_{j-1}}^{x_j} f(x)\,dx$ is approximated by the area of a simple geometric figure.
The three algorithms we consider approximate the area by rectangles, trapezoids and
parabolas (respectively).


We will explain these rules in detail below, but we give a brief overview here:
(1) The midpoint rule approximates each subintegral by the area of a rectangle of height
given by the value of the function at the midpoint of the subinterval
$$\int_{x_{j-1}}^{x_j} f(x)\,dx \approx f\Big(\frac{x_{j-1}+x_j}{2}\Big)\,\Delta x$$
This is illustrated in the leftmost figure above.
(2) The trapezoidal rule approximates each subintegral by the area of a trapezoid with
vertices at ( x j´1 , 0), ( x j´1 , f ( x j´1 )), ( x j , f ( x j )), ( x j , 0):
$$\int_{x_{j-1}}^{x_j} f(x)\,dx \approx \frac{1}{2}\Big(f(x_{j-1}) + f(x_j)\Big)\,\Delta x$$
The trapezoid is illustrated in the middle figure above. We shall derive the formula
for the area shortly.
(3) Simpson’s rule approximates two adjacent subintegrals by the area under a parabola
that passes through the points ( x j´1 , f ( x j´1 )), ( x j , f ( x j )) and ( x j+1 , f ( x j+1 )):
$$\int_{x_{j-1}}^{x_{j+1}} f(x)\,dx \approx \frac{1}{3}\Big(f(x_{j-1}) + 4f(x_j) + f(x_{j+1})\Big)\,\Delta x$$
The parabola is illustrated in the right hand figure above. We shall derive the formula
for the area shortly.

Notation 3.9.1 (Midpoints).

In what follows we need to refer to the midpoint between x_{j−1} and x_j very frequently. To save on writing (and typing) we introduce the notation
$$\bar{x}_j = \frac{1}{2}\big(x_{j-1} + x_j\big).$$


3.9.1 §§ The Midpoint Rule


The integral $\int_{x_{j-1}}^{x_j} f(x)\,dx$ represents the area between the curve y = f(x) and the x–axis
with x running from x j´1 to x j . The width of this region is x j ´ x j´1 = ∆x. The height
varies over the different values that f ( x ) takes as x runs from x j´1 to x j .
The midpoint rule approximates this area by the area of a rectangle of width x j ´ x j´1 =
∆x and height f ( x̄ j ) which is the exact height at the midpoint of the range covered by x.

[Figure: on the left, the region under y = f(x) between x_{j−1} and x_j; on the right, the approximating rectangle of height f((x_{j−1}+x_j)/2) = f(x̄_j).]

The area of the approximating rectangle is f ( x̄ j )∆x, and the midpoint rule approximates
each subintegral by
$$\int_{x_{j-1}}^{x_j} f(x)\,dx \approx f(\bar{x}_j)\,\Delta x.$$
Applying this approximation to each subinterval and summing gives us the following
approximation of the full integral:
$$\int_a^b f(x)\,dx = \int_{x_0}^{x_1} f(x)\,dx + \int_{x_1}^{x_2} f(x)\,dx + \cdots + \int_{x_{n-1}}^{x_n} f(x)\,dx$$
$$\approx f(\bar{x}_1)\Delta x + f(\bar{x}_2)\Delta x + \cdots + f(\bar{x}_n)\Delta x$$

So notice that the approximation is the sum of the function evaluated at the midpoint
of each interval and then multiplied by ∆x. Our other approximations will have similar
forms.
In summary:

Equation 3.9.2 (The midpoint rule).

The midpoint rule approximation is
$$\int_a^b f(x)\,dx \approx \Big[f(\bar{x}_1) + f(\bar{x}_2) + \cdots + f(\bar{x}_n)\Big]\,\Delta x$$
where $\Delta x = \frac{b-a}{n}$ and
$$x_0 = a,\quad x_1 = a+\Delta x,\quad x_2 = a+2\Delta x,\quad \cdots\quad x_{n-1} = b-\Delta x,\quad x_n = b$$
$$\bar{x}_1 = \tfrac{x_0+x_1}{2},\quad \bar{x}_2 = \tfrac{x_1+x_2}{2},\quad \cdots\quad \bar{x}_{n-1} = \tfrac{x_{n-2}+x_{n-1}}{2},\quad \bar{x}_n = \tfrac{x_{n-1}+x_n}{2}$$


Example 3.9.3 $\left(\int_0^1 \frac{4}{1+x^2}\,dx\right)$

We approximate the above integral using the midpoint rule with n = 8 steps.
Solution.
• First we set up all the x-values that we will need. Note that a = 0, b = 1, ∆x = 1/8 and
$$x_0 = 0,\quad x_1 = \tfrac{1}{8},\quad x_2 = \tfrac{2}{8},\quad \cdots\quad x_7 = \tfrac{7}{8},\quad x_8 = \tfrac{8}{8} = 1$$
Consequently
$$\bar{x}_1 = \tfrac{1}{16},\quad \bar{x}_2 = \tfrac{3}{16},\quad \bar{x}_3 = \tfrac{5}{16},\quad \cdots\quad \bar{x}_8 = \tfrac{15}{16}$$

• We now apply Equation (3.9.2) to the integrand $f(x) = \frac{4}{1+x^2}$:
$$\int_0^1 \frac{4}{1+x^2}\,dx \approx \Big[\underbrace{\tfrac{4}{1+\bar{x}_1^2}}_{f(\bar{x}_1)} + \underbrace{\tfrac{4}{1+\bar{x}_2^2}}_{f(\bar{x}_2)} + \cdots + \underbrace{\tfrac{4}{1+\bar{x}_7^2}}_{f(\bar{x}_{n-1})} + \underbrace{\tfrac{4}{1+\bar{x}_8^2}}_{f(\bar{x}_n)}\Big]\,\Delta x$$
$$= \Big[\tfrac{4}{1+\frac{1^2}{16^2}} + \tfrac{4}{1+\frac{3^2}{16^2}} + \tfrac{4}{1+\frac{5^2}{16^2}} + \tfrac{4}{1+\frac{7^2}{16^2}} + \tfrac{4}{1+\frac{9^2}{16^2}} + \tfrac{4}{1+\frac{11^2}{16^2}} + \tfrac{4}{1+\frac{13^2}{16^2}} + \tfrac{4}{1+\frac{15^2}{16^2}}\Big]\,\tfrac{1}{8}$$
$$= \Big[3.98444 + 3.86415 + 3.64413 + 3.35738 + 3.03858 + 2.71618 + 2.40941 + 2.12890\Big]\,\tfrac{1}{8}$$
$$= 3.1429$$

where we have rounded to four decimal places.

• In this case we can compute the integral exactly (which is one of the reasons it was
chosen as a first example):
$$\int_0^1 \frac{4}{1+x^2}\,dx = 4\arctan x\Big|_0^1 = \pi$$

• So the error in the approximation generated by eight steps of the midpoint rule is

|3.1429 ´ π| = 0.0013

• The relative error is then


$$\frac{|\text{approximate} - \text{exact}|}{\text{exact}} = \frac{|3.1429 - \pi|}{\pi} = 0.0004$$
That is, the error is 0.0004 times the actual value of the integral.

• We can write this as a percentage error by multiplying it by 100

$$\text{percentage error} = 100 \times \frac{|\text{approximate} - \text{exact}|}{\text{exact}} = 0.04\%$$
That is, the error is about 0.04% of the exact value.


Example 3.9.3
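The arithmetic above is tedious by hand but trivial to automate. Here is a minimal midpoint-rule sketch in Python (plain standard library; the helper name is ours, not the text's) that reproduces the n = 8 estimate.

```python
def midpoint_rule(f, a, b, n):
    """Midpoint rule: evaluate f at the midpoint of each of n equal subintervals."""
    dx = (b - a) / n
    return sum(f(a + (j + 0.5) * dx) for j in range(n)) * dx

approx = midpoint_rule(lambda x: 4 / (1 + x**2), 0.0, 1.0, 8)
print(round(approx, 4))   # 3.1429, compared with the exact value pi
```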
The midpoint rule gives us quite good estimates of the integral without too much work —
though it is perhaps a little tedious to do by hand50 . Of course, it would be very helpful to
quantify what we mean by “good” in this context and that requires us to discuss errors.

Definition 3.9.4.

Suppose that α is an approximation to A. This approximation has

• absolute error |A − α| and

• relative error $\frac{|A-\alpha|}{A}$ and

• percentage error $100\,\frac{|A-\alpha|}{A}$

We will discuss errors further in Section 3.9.4 below.



Example 3.9.5 $\left(\int_0^\pi \sin x\,dx\right)$

As a second example, we apply the midpoint rule with n = 8 steps to the above integral.

• We again start by setting up all the x-values that we will need. So a = 0, b = π, ∆x = π/8 and
$$x_0 = 0,\quad x_1 = \tfrac{\pi}{8},\quad x_2 = \tfrac{2\pi}{8},\quad \cdots\quad x_7 = \tfrac{7\pi}{8},\quad x_8 = \tfrac{8\pi}{8} = \pi$$
Consequently,
$$\bar{x}_1 = \tfrac{\pi}{16},\quad \bar{x}_2 = \tfrac{3\pi}{16},\quad \cdots\quad \bar{x}_7 = \tfrac{13\pi}{16},\quad \bar{x}_8 = \tfrac{15\pi}{16}$$

• Now apply Equation (3.9.2) to the integrand f(x) = sin x:
$$\int_0^\pi \sin x\,dx \approx \Big[\sin(\bar{x}_1) + \sin(\bar{x}_2) + \cdots + \sin(\bar{x}_8)\Big]\,\Delta x$$
$$= \Big[\sin(\tfrac{\pi}{16}) + \sin(\tfrac{3\pi}{16}) + \sin(\tfrac{5\pi}{16}) + \sin(\tfrac{7\pi}{16}) + \sin(\tfrac{9\pi}{16}) + \sin(\tfrac{11\pi}{16}) + \sin(\tfrac{13\pi}{16}) + \sin(\tfrac{15\pi}{16})\Big]\,\tfrac{\pi}{8}$$
$$= \Big[0.1951 + 0.5556 + 0.8315 + 0.9808 + 0.9808 + 0.8315 + 0.5556 + 0.1951\Big]\times 0.3927$$
$$= 5.1260 \times 0.3927 = 2.013$$

• Again, we have chosen this example so that we can compare it against the exact
value:
$$\int_0^\pi \sin x\,dx = \big[-\cos x\big]_0^\pi = -\cos\pi + \cos 0 = 2.$$

50 Thankfully it is very easy to write a program to apply the midpoint rule.


• So with eight steps of the midpoint rule we achieved


$$\text{absolute error} = |2.013 - 2| = 0.013$$
$$\text{relative error} = \frac{|2.013 - 2|}{2} = 0.0065$$
$$\text{percentage error} = 100 \times \frac{|2.013 - 2|}{2} = 0.65\%$$
With little work we have managed to estimate the integral to within 1% of its true
value.
Example 3.9.5

3.9.2 §§ The Trapezoidal Rule


Consider again the area represented by the integral $\int_{x_{j-1}}^{x_j} f(x)\,dx$. The trapezoidal rule
(unsurprisingly) approximates this area by a trapezoid52 whose vertices lie at
( x j´1 , 0), ( x j´1 , f ( x j´1 )), ( x j , f ( x j )) and ( x j , 0).

[Figure: on the left, the region under y = f(x) between x_{j−1} and x_j; on the right, the approximating trapezoid with parallel vertical sides of heights f(x_{j−1}) and f(x_j).]

The trapezoidal approximation of the integral $\int_{x_{j-1}}^{x_j} f(x)\,dx$ is the shaded region in the
figure on the right above. It has width x j ´ x j´1 = ∆x. Its left hand side has height f ( x j´1 )
and its right hand side has height f ( x j ).
As the figure below shows, the area of a trapezoid is its width times its average height.

[Figure: a trapezoid of width w with parallel sides of heights ℓ and r, split into a rectangle of area ℓw and a triangle of area (r − ℓ)w/2, for a total area of (r + ℓ)w/2.]

51 This method is also called the “trapezoid rule” and “trapezium rule”.
52 A trapezoid is a four sided polygon, like a rectangle. But, unlike a rectangle, the top and bottom of a
trapezoid need not be parallel.


So the trapezoidal rule approximates each subintegral by


$$\int_{x_{j-1}}^{x_j} f(x)\,dx \approx \frac{f(x_{j-1})+f(x_j)}{2}\,\Delta x$$

Applying this approximation to each subinterval and then summing the result gives us
the following approximation of the full integral
$$\int_a^b f(x)\,dx = \int_{x_0}^{x_1} f(x)\,dx + \int_{x_1}^{x_2} f(x)\,dx + \cdots + \int_{x_{n-1}}^{x_n} f(x)\,dx$$
$$\approx \frac{f(x_0)+f(x_1)}{2}\,\Delta x + \frac{f(x_1)+f(x_2)}{2}\,\Delta x + \cdots + \frac{f(x_{n-1})+f(x_n)}{2}\,\Delta x$$
$$= \Big[\tfrac{1}{2}f(x_0) + f(x_1) + f(x_2) + \cdots + f(x_{n-1}) + \tfrac{1}{2}f(x_n)\Big]\,\Delta x$$

So notice that the approximation has a very similar form to the midpoint rule, excepting
that

• we evaluate the function at the x j ’s rather than at the midpoints, and

• we multiply the value of the function at the endpoints x0 , xn by 1/2.

In summary:

Equation 3.9.6 (The trapezoidal rule).

The trapezoidal rule approximation is
$$\int_a^b f(x)\,dx \approx \Big[\tfrac{1}{2}f(x_0) + f(x_1) + f(x_2) + \cdots + f(x_{n-1}) + \tfrac{1}{2}f(x_n)\Big]\,\Delta x$$
where
$$\Delta x = \tfrac{b-a}{n},\quad x_0 = a,\quad x_1 = a+\Delta x,\quad x_2 = a+2\Delta x,\quad \cdots,\quad x_{n-1} = b-\Delta x,\quad x_n = b$$

To compare and contrast we apply the trapezoidal rule to the examples we did above
with the midpoint rule.
Example 3.9.7 $\left(\int_0^1 \frac{4}{1+x^2}\,dx\right)$ — using the trapezoidal rule

Solution. We proceed very similarly to Example 3.9.3 and again use n = 8 steps.

• We again have $f(x) = \frac{4}{1+x^2}$, a = 0, b = 1, ∆x = 1/8 and
$$x_0 = 0,\quad x_1 = \tfrac{1}{8},\quad x_2 = \tfrac{2}{8},\quad \cdots\quad x_7 = \tfrac{7}{8},\quad x_8 = \tfrac{8}{8} = 1$$


• Applying the trapezoidal rule, Equation (3.9.6), gives
$$\int_0^1 \frac{4}{1+x^2}\,dx \approx \Big[\tfrac{1}{2}\underbrace{\tfrac{4}{1+x_0^2}}_{f(x_0)} + \underbrace{\tfrac{4}{1+x_1^2}}_{f(x_1)} + \cdots + \underbrace{\tfrac{4}{1+x_7^2}}_{f(x_{n-1})} + \tfrac{1}{2}\underbrace{\tfrac{4}{1+x_8^2}}_{f(x_n)}\Big]\,\Delta x$$
$$= \Big[\tfrac{1}{2}\,\tfrac{4}{1+0^2} + \tfrac{4}{1+\frac{1^2}{8^2}} + \tfrac{4}{1+\frac{2^2}{8^2}} + \tfrac{4}{1+\frac{3^2}{8^2}} + \tfrac{4}{1+\frac{4^2}{8^2}} + \tfrac{4}{1+\frac{5^2}{8^2}} + \tfrac{4}{1+\frac{6^2}{8^2}} + \tfrac{4}{1+\frac{7^2}{8^2}} + \tfrac{1}{2}\,\tfrac{4}{1+\frac{8^2}{8^2}}\Big]\,\tfrac{1}{8}$$
$$= \Big[\tfrac{1}{2}\times 4 + 3.939 + 3.765 + 3.507 + 3.2 + 2.876 + 2.56 + 2.266 + \tfrac{1}{2}\times 2\Big]\,\tfrac{1}{8}$$
$$= 3.139$$
to three decimal places.
• The exact value of the integral is still π. So the error in the approximation generated by eight steps of the trapezoidal rule is |3.139 − π| = 0.0026, which is $100\,\frac{|3.139-\pi|}{\pi}\% = 0.08\%$ of the exact answer. Notice that this is roughly twice the error that we achieved using the midpoint rule in Example 3.9.3.

Example 3.9.7
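The same computation with the trapezoidal rule, as a short Python sketch (again our own helper, not part of the text):

```python
def trapezoidal_rule(f, a, b, n):
    """Trapezoidal rule: endpoints weighted 1/2, interior points weighted 1."""
    dx = (b - a) / n
    interior = sum(f(a + j * dx) for j in range(1, n))
    return (0.5 * f(a) + interior + 0.5 * f(b)) * dx

print(round(trapezoidal_rule(lambda x: 4 / (1 + x**2), 0.0, 1.0, 8), 3))   # 3.139
```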

Let us also redo Example 3.9.5 using the trapezoidal rule.



Example 3.9.8 $\left(\int_0^\pi \sin x\,dx\right)$ — using the trapezoidal rule

Solution. We proceed very similarly to Example 3.9.5 and again use n = 8 steps.

• We again have a = 0, b = π, ∆x = π/8 and
$$x_0 = 0,\quad x_1 = \tfrac{\pi}{8},\quad x_2 = \tfrac{2\pi}{8},\quad \cdots\quad x_7 = \tfrac{7\pi}{8},\quad x_8 = \tfrac{8\pi}{8} = \pi$$

• Applying the trapezoidal rule, Equation (3.9.6), gives
$$\int_0^\pi \sin x\,dx \approx \Big[\tfrac{1}{2}\sin(x_0) + \sin(x_1) + \cdots + \sin(x_7) + \tfrac{1}{2}\sin(x_8)\Big]\,\Delta x$$
$$= \Big[\tfrac{1}{2}\sin 0 + \sin\tfrac{\pi}{8} + \sin\tfrac{2\pi}{8} + \sin\tfrac{3\pi}{8} + \sin\tfrac{4\pi}{8} + \sin\tfrac{5\pi}{8} + \sin\tfrac{6\pi}{8} + \sin\tfrac{7\pi}{8} + \tfrac{1}{2}\sin\tfrac{8\pi}{8}\Big]\,\tfrac{\pi}{8}$$
$$= \Big[\tfrac{1}{2}\times 0 + 0.3827 + 0.7071 + 0.9239 + 1.0000 + 0.9239 + 0.7071 + 0.3827 + \tfrac{1}{2}\times 0\Big]\times 0.3927$$
$$= 5.0274 \times 0.3927 = 1.974$$
• The exact answer is $\int_0^\pi \sin x\,dx = -\cos x\big|_0^\pi = 2$. So with eight steps of the trapezoidal rule we achieved $100\,\frac{|1.974-2|}{2}\% = 1.3\%$ accuracy. Again this is approximately twice the error we achieved in Example 3.9.5 using the midpoint rule.


Example 3.9.8

These two examples suggest that the midpoint rule is more accurate than the trape-
zoidal rule. Indeed, this observation is borne out by a rigorous analysis of the error — see
Section 3.9.4.

3.9.3 §§ Simpson’s Rule


When we use the trapezoidal rule we approximate the area $\int_{x_{j-1}}^{x_j} f(x)\,dx$ by the area be-
tween the x-axis and a straight line that runs from ( x j´1 , f ( x j´1 )) to ( x j , f ( x j )) — that is,
we approximate the function f ( x ) on this interval by a linear function that agrees with the
function at each endpoint. An obvious way to extend this — just as we did when extend-
ing linear approximations to quadratic approximations in our differential calculus course
— is to approximate the function with a quadratic. This is precisely what Simpson’s53 rule
does.
Simpson’s rule approximates the integral over two neighbouring subintervals by the
area between a parabola and the x-axis. In order to describe this parabola we need 3
distinct points (which is why we approximate two subintegrals at a time). That is, we
approximate
$$\int_{x_0}^{x_1} f(x)\,dx + \int_{x_1}^{x_2} f(x)\,dx = \int_{x_0}^{x_2} f(x)\,dx$$

by the area bounded by the parabola that passes through the three points (x0, f(x0)), (x1, f(x1)) and (x2, f(x2)), the x-axis and the vertical lines x = x0 and x = x2. We repeat

 
[Figure: the parabola through (x0, f(x0)), (x1, f(x1)) and (x2, f(x2)); the area under it from x = x0 to x = x2 approximates the integral over the first two subintervals.]

this on the next pair of subintervals and approximate $\int_{x_2}^{x_4} f(x)\,dx$ by the area between the x–axis and the part of a parabola with x2 ≤ x ≤ x4. This parabola passes through the three points (x2, f(x2)), (x3, f(x3)) and (x4, f(x4)). And so on. Because Simpson's rule does
the approximation two slices at a time, n must be even.
To derive Simpson's rule formula, we first find the equation of the parabola that passes through the three points (x0, f(x0)), (x1, f(x1)) and (x2, f(x2)). Then we find the area

53 Simpson’s rule is named after the 18th century English mathematician Thomas Simpson, despite its
use a century earlier by the German mathematician and astronomer Johannes Kepler. In many German
texts the rule is often called Kepler’s rule.


between the x–axis and the part of that parabola with x0 ď x ď x2 . To simplify this
computation consider a parabola passing through the points (´h, y´1 ), (0, y0 ) and (h, y1 ).
Write the equation of the parabola as

y = Ax2 + Bx + C

Then the area between it and the x-axis with x running from ´h to h is
żh  h
 2
 A 3 B 2
Ax + Bx + C dx = x + x + Cx
´h 3 2 ´h
2A 3
= h + 2Ch it is helpful to write it as
3
h 2

= 2Ah + 6C
3

Now, the three points (−h, y₋₁), (0, y₀) and (h, y₁) lie on this parabola if and only if

Ah2 ´ Bh + C = y´1 at (´h, y´1 )


C = y0 at (0, y0 )
Ah2 + Bh + C = y1 at (h, y1 )

Adding the first and third equations together gives us

2Ah2 + ( B ´ B)h + 2C = y´1 + y1

To this we add four times the middle equation

2Ah2 + 6C = y´1 + 4y0 + y1 .

This means that

$$\text{area} = \int_{-h}^h \big(Ax^2+Bx+C\big)\,dx = \frac{h}{3}\big(2Ah^2 + 6C\big) = \frac{h}{3}\big(y_{-1} + 4y_0 + y_1\big)$$
Note that here

• h is one half of the length of the x–interval under consideration

• y´1 is the height of the parabola at the left hand end of the interval under consider-
ation

• y0 is the height of the parabola at the middle point of the interval under considera-
tion

• y1 is the height of the parabola at the right hand end of the interval under consider-
ation


So Simpson’s rule approximates


$$\int_{x_0}^{x_2} f(x)\,dx \approx \tfrac{1}{3}\Delta x\,\big[f(x_0) + 4f(x_1) + f(x_2)\big]$$
and
$$\int_{x_2}^{x_4} f(x)\,dx \approx \tfrac{1}{3}\Delta x\,\big[f(x_2) + 4f(x_3) + f(x_4)\big]$$
and so on. Summing these all together gives:
$$\int_a^b f(x)\,dx = \int_{x_0}^{x_2} f(x)\,dx + \int_{x_2}^{x_4} f(x)\,dx + \int_{x_4}^{x_6} f(x)\,dx + \cdots + \int_{x_{n-2}}^{x_n} f(x)\,dx$$
$$\approx \tfrac{\Delta x}{3}\big[f(x_0)+4f(x_1)+f(x_2)\big] + \tfrac{\Delta x}{3}\big[f(x_2)+4f(x_3)+f(x_4)\big] + \tfrac{\Delta x}{3}\big[f(x_4)+4f(x_5)+f(x_6)\big] + \cdots + \tfrac{\Delta x}{3}\big[f(x_{n-2})+4f(x_{n-1})+f(x_n)\big]$$
$$= \big[f(x_0)+4f(x_1)+2f(x_2)+4f(x_3)+2f(x_4)+\cdots+2f(x_{n-2})+4f(x_{n-1})+f(x_n)\big]\,\tfrac{\Delta x}{3}$$

In summary

Equation 3.9.9 (Simpson's rule).

The Simpson's rule approximation is
$$\int_a^b f(x)\,dx \approx \big[f(x_0)+4f(x_1)+2f(x_2)+4f(x_3)+2f(x_4)+\cdots+2f(x_{n-2})+4f(x_{n-1})+f(x_n)\big]\,\tfrac{\Delta x}{3}$$
where n is even and
$$\Delta x = \tfrac{b-a}{n},\quad x_0 = a,\quad x_1 = a+\Delta x,\quad x_2 = a+2\Delta x,\quad \cdots,\quad x_{n-1} = b-\Delta x,\quad x_n = b$$

Notice that Simpson’s rule requires essentially no more work than the trapezoidal rule.
In both rules we must evaluate f ( x ) at x = x0 , x1 , . . . , xn , but we add those terms multi-
plied by different constants54 .
Let’s put it to work on our two running examples.
Example 3.9.10 $\left(\int_0^1 \frac{4}{1+x^2}\,dx\right)$ — using Simpson's rule

Solution. We proceed almost identically to Example 3.9.7 and again use n = 8 steps.

54 There is an easy generalisation of Simpson’s rule that uses cubics instead of parabolas. It leads to the
formula
$$\int_a^b f(x)\,dx \approx \tfrac{3\Delta x}{8}\,\big[f(x_0) + 3f(x_1) + 3f(x_2) + 2f(x_3) + 3f(x_4) + 3f(x_5) + 2f(x_6) + \cdots + f(x_n)\big]$$
where n is a multiple of 3. This result is known as Simpson’s second rule and Simpson’s 3/8 rule. While
one can push this approach further (using quartics, quintics etc), it can sometimes lead to larger errors
— the interested reader should look up Runge’s phenomenon.


• We have the same ∆, a, b, x0 , . . . , xn as Example 3.9.7.


• Applying Equation 3.9.9 gives
$$\int_0^1 \frac{4}{1+x^2}\,dx \approx \Big[\tfrac{4}{1+0^2} + 4\,\tfrac{4}{1+\frac{1^2}{8^2}} + 2\,\tfrac{4}{1+\frac{2^2}{8^2}} + 4\,\tfrac{4}{1+\frac{3^2}{8^2}} + 2\,\tfrac{4}{1+\frac{4^2}{8^2}} + 4\,\tfrac{4}{1+\frac{5^2}{8^2}} + 2\,\tfrac{4}{1+\frac{6^2}{8^2}} + 4\,\tfrac{4}{1+\frac{7^2}{8^2}} + \tfrac{4}{1+\frac{8^2}{8^2}}\Big]\,\tfrac{1}{8\times 3}$$
$$= \Big[4 + 4\times 3.938461538 + 2\times 3.764705882 + 4\times 3.506849315 + 2\times 3.2 + 4\times 2.876404494 + 2\times 2.56 + 4\times 2.265486726 + 2\Big]\,\tfrac{1}{8\times 3}$$
$$= 3.14159250$$
to eight decimal places.
• This agrees with π (the exact value of the integral) to six decimal places. So the error in the approximation generated by eight steps of Simpson's rule is $|3.14159250 - \pi| = 1.5\times 10^{-7}$, which is $100\,\frac{|3.14159250-\pi|}{\pi}\% = 5\times 10^{-6}\%$ of the exact answer.
Example 3.9.10
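Simpson's rule is just as easy to automate; the only wrinkle is the alternating pattern of 4's and 2's and the requirement that n be even. A sketch (our own helper, not part of the text):

```python
def simpsons_rule(f, a, b, n):
    """Simpson's rule; n must be even. Interior weights alternate 4, 2, 4, 2, and so on."""
    if n % 2 != 0:
        raise ValueError("Simpson's rule needs an even number of steps")
    dx = (b - a) / n
    total = f(a) + f(b) + sum((4 if j % 2 else 2) * f(a + j * dx) for j in range(1, n))
    return total * dx / 3

print(round(simpsons_rule(lambda x: 4 / (1 + x**2), 0.0, 1.0, 8), 8))   # 3.1415925, six decimals of pi
```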
It is striking that the absolute error approximating with Simpson’s rule is so much smaller
than the error from the midpoint and trapezoidal rules.
midpoint error = 0.0013
trapezoid error = 0.0026
Simpson error = 0.00000015
Buoyed by this success, we will also redo Example 3.9.8 using Simpson’s rule.

Example 3.9.11 $\left(\int_0^\pi \sin x\,dx\right)$ — Simpson's rule

Solution. We proceed almost identically to Example 3.9.8 and again use n = 8 steps.

• We have the same ∆x, a, b, x0, . . . , xn as Example 3.9.8.


• Applying Equation 3.9.9 gives
$$\int_0^\pi \sin x\,dx \approx \Big[\sin(x_0) + 4\sin(x_1) + 2\sin(x_2) + \cdots + 4\sin(x_7) + \sin(x_8)\Big]\,\tfrac{\Delta x}{3}$$
$$= \Big[\sin(0) + 4\sin(\tfrac{\pi}{8}) + 2\sin(\tfrac{2\pi}{8}) + 4\sin(\tfrac{3\pi}{8}) + 2\sin(\tfrac{4\pi}{8}) + 4\sin(\tfrac{5\pi}{8}) + 2\sin(\tfrac{6\pi}{8}) + 4\sin(\tfrac{7\pi}{8}) + \sin(\tfrac{8\pi}{8})\Big]\,\tfrac{\pi}{8\times 3}$$
$$= \Big[0 + 4\times 0.382683 + 2\times 0.707107 + 4\times 0.923880 + 2\times 1.0 + 4\times 0.923880 + 2\times 0.707107 + 4\times 0.382683 + 0\Big]\,\tfrac{\pi}{8\times 3}$$
$$= 15.280932 \times 0.130900$$
$$= 2.00027$$


• With only eight steps of Simpson's rule we achieved $100\,\frac{2.00027-2}{2} = 0.014\%$ accuracy.
Example 3.9.11
Again we contrast the error we achieved with the other two rules:
midpoint error = 0.013
trapezoid error = 0.026
Simpson error = 0.00027
This completes our derivation of the midpoint, trapezoidal and Simpson’s rules for
approximating the values of definite integrals. So far we have not attempted to see how
efficient and how accurate the algorithms are in general. That’s our next task.

3.9.4 §§ Three Simple Numerical Integrators – Error Behaviour


Now we are armed with our three (relatively simple) methods for numerical integration
we should give thought to how practical they might be in the real world55 . Two obvious
considerations when deciding whether or not a given algorithm is of any practical value
are
(a) the amount of computational effort required to execute the algorithm and
(b) the accuracy that this computational effort yields.
For algorithms like our simple integrators, the bulk of the computational effort usually
goes into evaluating the function f ( x ). The number of evaluations of f ( x ) required for n
steps of the midpoint rule is n, while the number required for n steps of the trapezoidal
and Simpson’s rules is n + 1. So all three of our rules require essentially the same amount
of effort – one evaluation of f ( x ) per step.
To get a first impression of the error behaviour of these methods, we apply them to a
problem whose answer we know exactly:
$$\int_0^\pi \sin x\,dx = -\cos x\big|_0^\pi = 2.$$
To be a little more precise, we would like to understand how the errors of the three meth-
ods change as we increase the effort we put in (as measured by the number of steps n). The
following table lists the error in the approximate value for this number generated by our
three rules applied with three different choices of n. It also lists the number of evaluations
of f required to compute the approximation.
        Midpoint                 Trapezoidal              Simpson's
 n      error        # evals    error        # evals     error         # evals
 10     4.1 × 10^-1   10        8.2 × 10^-1   11         5.5 × 10^-3    11
 100    4.1 × 10^-3   100       8.2 × 10^-3   101        5.4 × 10^-7    101
 1000   4.1 × 10^-5   1000      8.2 × 10^-5   1001       5.5 × 10^-11   1001

55 Indeed, even beyond the “real world” of many applications in first year calculus texts, some of the
methods we have described are used by actual people (such as ship builders, engineers and surveyors)
to estimate areas and volumes of actual objects!


Observe that

• Using 101 evaluations of f worth of Simpson’s rule gives an error 80 times smaller
than 1000 evaluations of f worth of the midpoint rule.

• The trapezoidal rule error with n steps is about twice the midpoint rule error with n
steps.

• With the midpoint rule, increasing the number of steps by a factor of 10 appears to reduce the error by about a factor of 100 = 10², consistent with an error that behaves like 1/n².

• With the trapezoidal rule, increasing the number of steps by a factor of 10 appears to reduce the error by about a factor of 10², again consistent with an error like 1/n².

• With Simpson's rule, increasing the number of steps by a factor of 10 appears to reduce the error by about a factor of 10⁴, consistent with an error like 1/n⁴.

So it looks like
$$\text{approx value of } \int_a^b f(x)\,dx \text{ given by } n \text{ midpoint steps} \approx \int_a^b f(x)\,dx + K_M\cdot\frac{1}{n^2}$$
$$\text{approx value of } \int_a^b f(x)\,dx \text{ given by } n \text{ trapezoidal steps} \approx \int_a^b f(x)\,dx + K_T\cdot\frac{1}{n^2}$$
$$\text{approx value of } \int_a^b f(x)\,dx \text{ given by } n \text{ Simpson's steps} \approx \int_a^b f(x)\,dx + K_S\cdot\frac{1}{n^4}$$
with some constants K_M, K_T and K_S. It also seems that K_T ≈ 2K_M.
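These conjectures are easy to test yourself. The self-contained sketch below (our own helpers, not part of the text) prints the absolute error of each rule for n = 10, 100, 1000; you should see the midpoint and trapezoidal errors shrink by roughly 10² and the Simpson's error by roughly 10⁴ each time n grows by a factor of 10.

```python
import math

def midpoint(f, a, b, n):
    dx = (b - a) / n
    return sum(f(a + (j + 0.5) * dx) for j in range(n)) * dx

def trapezoid(f, a, b, n):
    dx = (b - a) / n
    return (0.5 * f(a) + sum(f(a + j * dx) for j in range(1, n)) + 0.5 * f(b)) * dx

def simpson(f, a, b, n):   # n must be even
    dx = (b - a) / n
    total = f(a) + f(b) + sum((4 if j % 2 else 2) * f(a + j * dx) for j in range(1, n))
    return total * dx / 3

for n in (10, 100, 1000):
    errors = [abs(rule(math.sin, 0.0, math.pi, n) - 2.0)
              for rule in (midpoint, trapezoid, simpson)]
    print(n, ["%.1e" % e for e in errors])
```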


Figure 3.9.1.

[Figure: log–log plots of the error eₙ in the n-step approximation to $\int_0^\pi \sin x\,dx$, with x = log₂ n on the horizontal axis and y = log₂ eₙ on the vertical axis. The left plot shows the midpoint rule (best-fit line y = −0.2706 − 2.0011x) and the trapezoidal rule (best-fit line y = 0.7253 − 2.0008x); the right plot shows Simpson's rule (best-fit line y = 0.35 − 4.03x).]

A log-log plot of the error in the n step approximation to $\int_0^\pi \sin x\,dx$.

To test these conjectures for the behaviour of the errors we apply our three rules with
about ten different choices of n of the form n = 2m with m integer. Figure 3.9.1 contains
two graphs of the results. The left-hand plot shows the results for the midpoint and trape-
zoidal rules and the right-hand plot shows the results for Simpson’s rule.
For each rule we are expecting (based on our conjectures above) that the error
en = |exact value ´ approximate value|
with n steps is (roughly) of the form
$$e_n = K\,\frac{1}{n^k}$$
for some constants K and k. We would like to test if this is really the case, by graphing
Y = en against X = n and seeing if the graph “looks right”. But it is not easy to tell


whether or not a given curve really is Y = K/Xᵏ, for some specific k, by just looking at it. However, your eye is pretty good at determining whether or not a graph is a straight line. Fortunately, there is a little trick that turns the curve Y = K/Xᵏ into a straight line — no matter what k is.
Instead of plotting Y against X, we plot log Y against log X. This transformation works because when Y = K/Xᵏ

log Y = log K ´ k log X

So plotting y = log Y against x = log X gives the straight line y = log K ´ kx, which has
slope ´k and y–intercept log K.
The three graphs in Figure 3.9.1 plot y = log₂ eₙ against x = log₂ n for our three rules. Note that we have chosen to use logarithms with this "unusual base" because it makes it very clear how much the error is improved if we double the number of steps used. To be more precise — one unit step along the x-axis represents changing n ↦ 2n. For example, applying Simpson's rule with n = 2⁴ steps results in an error of 0.0000166, so the point (x = log₂ 2⁴ = 4, y = log₂ 0.0000166 = log 0.0000166 / log 2 = −15.8) has been included on the graph. Doubling the effort used — that is, doubling the number of steps to n = 2⁵ — results in an error of 0.00000103. So, the data point (x = log₂ 2⁵ = 5, y = log₂ 0.00000103 = ln 0.00000103 / ln 2 = −19.9) lies on the graph. Note that the x-coordinates of these points differ by 1 unit.
For each of the three sets of data points, a straight line has also been plotted “through”
the data points. A procedure called linear regression59 has been used to decide precisely
which straight line to plot. It provides a formula for the slope and y–intercept of the
straight line which “best fits” any given set of data points. From the three lines, it sure
looks like k = 2 for the midpoint and trapezoidal rules and k = 4 for Simpson’s rule.
It also looks like the ratio between the value of K for the trapezoidal rule, namely K = 2^0.7253, and the value of K for the midpoint rule, namely K = 2^−0.2706, is pretty close to 2: 2^0.7253 / 2^−0.2706 = 2^0.9959.
The intuition, about the error behaviour, that we have just developed is in fact correct
— provided the integrand f ( x ) is reasonably smooth. To be more precise

56 Note in footnote 27 in section 3.3 we mentioned that the notation log x was ambiguous, and may mean
loge x, log10 x, or log2 x in different contexts. In this paragraph, the base actually doesn’t matter. Our
claim that the transformed function will be a line is true for all of these bases.
57 There is a variant of this trick that works even when you don’t know the answer to the integral ahead
of time. Suppose that you suspect that the approximation satisfies

$$M_n = A + K\,\frac{1}{n^k}$$

where A is the exact value of the integral and suppose that you don’t know the values of A, K and k.
Then

$$M_n - M_{2n} = K\,\frac{1}{n^k} - K\,\frac{1}{(2n)^k} = K\Big(1-\frac{1}{2^k}\Big)\frac{1}{n^k}$$
so plotting y = log(Mₙ − M₂ₙ) against x = log n gives the straight line y = log[K(1 − 1/2ᵏ)] − kx.
58 Now is a good time for a quick revision of logarithms .
59 Linear regression is not part of this course as its derivation requires some multivariable calculus. It is a
very standard technique in statistics.


Theorem 3.9.12 (Numerical integration errors).

Assume that |f″(x)| ≤ M for all a ≤ x ≤ b. Then

the total error introduced by the midpoint rule is bounded by $\dfrac{M}{24}\,\dfrac{(b-a)^3}{n^2}$

and

the total error introduced by the trapezoidal rule is bounded by $\dfrac{M}{12}\,\dfrac{(b-a)^3}{n^2}$

when approximating $\int_a^b f(x)\,dx$. Further, if |f⁽⁴⁾(x)| ≤ L for all a ≤ x ≤ b, then

the total error introduced by Simpson's rule is bounded by $\dfrac{L}{180}\,\dfrac{(b-a)^5}{n^4}$.

The first of these error bounds is proven in Appendix section A.10. Here are some examples which illustrate how they are used. First let us check that the above result is consistent with our data in Figure 3.9.1.

Example 3.9.13 (Midpoint rule error approximating $\int_0^\pi \sin x\,dx$)

• The integral $\int_0^\pi \sin x\,dx$ has b − a = π.

• The second derivative of the integrand satisfies


$$\left|\frac{d^2}{dx^2}\sin x\right| = |-\sin x| \leq 1$$

So we take M = 1.

• So the error, en , introduced when n steps are used is bounded by

$$|e_n| \leq \frac{M}{24}\,\frac{(b-a)^3}{n^2} = \frac{\pi^3}{24}\,\frac{1}{n^2} \approx 1.29\,\frac{1}{n^2}$$

• The data in the graph in Figure 3.9.1 gives

$$|e_n| \approx 2^{-0.2706}\,\frac{1}{n^2} = 0.83\,\frac{1}{n^2}$$
which is consistent with the bound $|e_n| \leq \frac{\pi^3}{24}\frac{1}{n^2}$.


Example 3.9.13

In a typical application we would be asked to evaluate a given integral to some specified accuracy. For example, if you are a manufacturer and your machinery can only cut materials to an accuracy of 1/10th of a millimeter, there is no point in making design specifications more accurate than 1/10th of a millimeter.
Example 3.9.14

Suppose, for example, that we wish to use the midpoint rule to evaluate $\int_0^1 e^{-x^2}\,dx$ to within an accuracy of 10⁻⁶.
Solution.
• The integral has a = 0 and b = 1.
• The first two derivatives of the integrand are
$$\frac{d}{dx}e^{-x^2} = -2xe^{-x^2} \qquad\text{and}$$
$$\frac{d^2}{dx^2}e^{-x^2} = \frac{d}{dx}\big({-2xe^{-x^2}}\big) = -2e^{-x^2} + 4x^2e^{-x^2} = 2(2x^2-1)e^{-x^2}$$
• As x runs from 0 to 1, 2x2 ´ 1 increases from ´1 to 1, so that
$$0 \leq x \leq 1 \implies |2x^2-1| \leq 1,\ e^{-x^2} \leq 1 \implies \big|2(2x^2-1)e^{-x^2}\big| \leq 2$$

So we take M = 2.
• The error introduced by the n step midpoint rule is at most
$$e_n \leq \frac{M}{24}\,\frac{(b-a)^3}{n^2} \leq \frac{2}{24}\,\frac{(1-0)^3}{n^2} = \frac{1}{12n^2}$$
• We need this error to be smaller than 10´6 so
$$e_n \leq \frac{1}{12n^2} \leq 10^{-6} \qquad\text{and so}$$
$$12n^2 \geq 10^6 \qquad\text{clean up}$$
$$n^2 \geq \frac{10^6}{12} = 83333.3\ldots \qquad\text{square root both sides}$$
$$n \geq 288.7$$
So 289 steps of the midpoint rule will do the job.

60 This is our favourite running example of an integral that cannot be evaluated algebraically — we need
to use numerical methods.


• In fact n = 289 results in an error of about 3.7 ˆ 10´7 .

Example 3.9.14
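The bound, and the claim about the actual error, can both be checked in a few lines. In the sketch below (plain Python; our own names), math.erf supplies the "exact" value of the integral.

```python
import math

def midpoint_rule(f, a, b, n):
    dx = (b - a) / n
    return sum(f(a + (j + 0.5) * dx) for j in range(n)) * dx

n = math.ceil(math.sqrt(1e6 / 12))               # the bound 1/(12 n^2) <= 10^-6 forces n >= 288.7
exact = math.sqrt(math.pi) / 2 * math.erf(1.0)   # exact value of the integral of e^{-x^2} over [0, 1]
error = abs(midpoint_rule(lambda x: math.exp(-x * x), 0.0, 1.0, n) - exact)
print(n, error)                                  # 289 and an error of roughly 3.7e-07
```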
That seems like far too much work, and the trapezoidal rule will have twice the error. So
we should look at Simpson’s rule.
Example 3.9.15
Suppose now that we wish to evaluate $\int_0^1 e^{-x^2}\,dx$ to within an accuracy of 10⁻⁶ — but now using Simpson's rule. How many steps should we use?
Solution.

• Again we have a = 0, b = 1.

• We then need to bound $\frac{d^4}{dx^4}e^{-x^2}$ on the domain of integration, 0 ≤ x ≤ 1.
$$\frac{d^3}{dx^3}e^{-x^2} = \frac{d}{dx}\Big(2(2x^2-1)e^{-x^2}\Big) = 8xe^{-x^2} - 4x(2x^2-1)e^{-x^2} = 4(-2x^3+3x)e^{-x^2}$$
$$\frac{d^4}{dx^4}e^{-x^2} = \frac{d}{dx}\Big(4(-2x^3+3x)e^{-x^2}\Big) = 4(-6x^2+3)e^{-x^2} - 8x(-2x^3+3x)e^{-x^2} = 4(4x^4-12x^2+3)e^{-x^2}$$

• Now, for any x, $e^{-x^2} \leq 1$. Also, for 0 ≤ x ≤ 1,
$$0 \leq x^2, x^4 \leq 1 \qquad\text{so}$$
$$3 \leq 4x^4 + 3 \leq 7 \qquad\text{and}$$
$$-12 \leq -12x^2 \leq 0 \qquad\text{adding these together gives}$$
$$-9 \leq 4x^4 - 12x^2 + 3 \leq 7$$
Consequently, $|4x^4 - 12x^2 + 3|$ is bounded by 9 and so
$$\left|\frac{d^4}{dx^4}e^{-x^2}\right| \leq 4\times 9 = 36$$

So take L = 36.

• The error introduced by the n step Simpson’s rule is at most

$$e_n \leq \frac{L}{180}\,\frac{(b-a)^5}{n^4} \leq \frac{36}{180}\,\frac{(1-0)^5}{n^4} = \frac{1}{5n^4}$$


• In order for this error to be no more than 10´6 we require n to satisfy

$$e_n \leq \frac{1}{5n^4} \leq 10^{-6} \qquad\text{and so}$$
$$5n^4 \geq 10^6$$
$$n^4 \geq 200000 \qquad\text{take fourth root}$$
$$n \geq 21.15$$

So 22 steps of Simpson’s rule will do the job.

• n = 22 steps actually results in an error of 3.5 × 10⁻⁸. The reason that we get an error so much smaller than we need is that we have overestimated the number of steps required. This, in turn, occurred because we made quite a rough bound of $\left|\frac{d^4}{dx^4}f(x)\right| \leq 36$. If we are more careful then we will get a slightly smaller n. It actually turns out that you only need n = 10 to approximate within 10⁻⁶.

Example 3.9.15
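A sketch confirming the last two claims — that n = 22 comfortably beats the target, and that n = 10 already suffices:

```python
import math

def simpsons_rule(f, a, b, n):   # n must be even
    dx = (b - a) / n
    total = f(a) + f(b) + sum((4 if j % 2 else 2) * f(a + j * dx) for j in range(1, n))
    return total * dx / 3

exact = math.sqrt(math.pi) / 2 * math.erf(1.0)
for n in (10, 22):
    err = abs(simpsons_rule(lambda x: math.exp(-x * x), 0.0, 1.0, n) - exact)
    print(n, err)    # both errors come out below 10^-6; n = 22 gives roughly 3.5e-08
```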

Appendix section A.10 gives some idea of where the error bounds come from.

3.10IJ Improper Integrals


3.10.1 §§ Definitions
To this point we have only considered nicely behaved integrals $\int_a^b f(x)\,dx$. Though the algebra involved in some of our examples was quite difficult, all the integrals had

• finite limits of integration a and b, and

• a bounded integrand f ( x ) (and in fact continuous except possibly for finitely many
jump discontinuities).

Not all integrals we need to study are quite so nice.

Definition3.10.1.

An integral having either an infinite limit of integration or an unbounded inte-


grand is called an improper integral.

Two examples are


$$\int_0^\infty \frac{dx}{1+x^2} \qquad\text{and}\qquad \int_0^1 \frac{dx}{x}$$

61 The authors tested this empirically.


The first has an infinite domain of integration and the integrand of the second tends to 8
as x approaches the left end of the domain of integration. We’ll start with an example that
illustrates the traps that you can fall into if you treat such integrals sloppily. Then we’ll
see how to treat them carefully.
Example 3.10.2 $\left(\int_{-1}^1 \frac{1}{x^2}\,dx\right)$

Consider the integral


$$\int_{-1}^1 \frac{1}{x^2}\,dx$$
If we “do” this integral completely naively then we get
$$\int_{-1}^1 \frac{1}{x^2}\,dx = \frac{x^{-1}}{-1}\bigg|_{-1}^{1} = \frac{1}{-1} - \frac{-1}{-1} = -2$$
which is wrong. In fact, the answer is ridiculous. The integrand $\frac{1}{x^2} > 0$, so the integral has to be positive.
The flaw in the argument is that the Fundamental Theorem of Calculus, which says that
if F′(x) = f(x) then $\int_a^b f(x)\,dx = F(b) - F(a)$
is applicable only when F′(x) exists and equals f(x) for all a ≤ x ≤ b. In this case F′(x) = $\frac{1}{x^2}$ does not exist for x = 0. The given integral is improper. We'll see later that the correct answer is +∞.
Example 3.10.2
Let us put this example to one side for a moment and turn to the integral $\int_a^\infty \frac{dx}{1+x^2}$. In this case, the integrand is bounded but the domain of integration extends to +∞. We can evaluate this integral by sneaking up on it. We compute it on a bounded domain of integration, like $\int_a^R \frac{dx}{1+x^2}$, and then take the limit R → ∞. Let us put this into practice:

[Figure: the region under y = f(x) from x = a out to x = R; as R → ∞ the shaded region extends further and further to the right.]

62 Very wrong. But it is not an example of “not even wrong” — which is a phrase attributed to the physicist
Wolfgang Pauli who was known for his harsh critiques of sloppy arguments. The phrase is typically
used to describe arguments that are so incoherent that not only can one not prove they are true, but
they lack enough coherence to be able to show they are false. The interested reader should do a little search-engining and look at the concept of falsifiability.


Example 3.10.3 $\left(\int_a^\infty \frac{dx}{1+x^2}\right)$

Solution.

• Since the domain extends to +8 we first integrate on a finite domain

$$\int_a^R \frac{dx}{1+x^2} = \arctan x\Big|_a^R = \arctan R - \arctan a$$

• We then take the limit as R Ñ +8:

$$\int_a^\infty \frac{dx}{1+x^2} = \lim_{R\to\infty} \int_a^R \frac{dx}{1+x^2} = \lim_{R\to\infty}\big(\arctan R - \arctan a\big) = \frac{\pi}{2} - \arctan a.$$

Example 3.10.3
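Computer algebra systems handle the limit automatically. A sketch (SymPy assumed, not part of the text), with sp.oo standing for the infinite limit of integration:

```python
import sympy as sp

x, a = sp.symbols('x a', real=True)
print(sp.integrate(1 / (1 + x**2), (x, a, sp.oo)))   # pi/2 - atan(a)
```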

To be more precise, we actually formally define an integral with an infinite domain


as the limit of the integral with a finite domain as we take one or more of the limits of
integration to infinity.


Definition 3.10.4 (Improper integral with infinite domain of integration).

(a) If the integral $\int_a^R f(x)\,dx$ exists for all R > a, then
$$\int_a^\infty f(x)\,dx = \lim_{R\to\infty} \int_a^R f(x)\,dx$$
when the limit exists (and is finite).

(b) If the integral $\int_r^b f(x)\,dx$ exists for all r < b, then
$$\int_{-\infty}^b f(x)\,dx = \lim_{r\to-\infty} \int_r^b f(x)\,dx$$
when the limit exists (and is finite).

(c) If the integral $\int_r^R f(x)\,dx$ exists for all r < R, then
$$\int_{-\infty}^\infty f(x)\,dx = \lim_{r\to-\infty} \int_r^c f(x)\,dx + \lim_{R\to\infty} \int_c^R f(x)\,dx$$
when both limits exist (and are finite). Any c can be used.

When the limit(s) exist, the integral is said to be convergent. Otherwise it is said to be divergent.

We must also be able to treat an integral like $\int_0^1 \frac{dx}{x}$ that has a finite domain of integration but whose integrand is unbounded near one limit of integration. Our approach is similar — we sneak up on the problem. We compute the integral on a smaller domain, such as $\int_t^1 \frac{dx}{x}$, with t > 0, and then take the limit t → 0+.
Example 3.10.5 $\left(\int_0^1 \frac{1}{x}\,dx\right)$

Solution.

• Since the integrand is unbounded near x = 0, we integrate on the smaller domain t ≤ x ≤ 1 with t > 0:
$$\int_t^1 \frac{1}{x}\,dx = \ln|x|\Big|_t^1 = -\ln|t|$$

• We then take the limit as t → 0+ to obtain
$$\int_0^1 \frac{1}{x}\,dx = \lim_{t\to 0+} \int_t^1 \frac{1}{x}\,dx = \lim_{t\to 0+}\big({-\ln|t|}\big) = +\infty$$

63 This will, in turn, allow us to deal with integrals whose integrand is unbounded somewhere inside the
domain of integration.


Thus this integral diverges to +8.

Example 3.10.5

[Figure: the region under y = 1/x between x = t and x = 1; as t → 0+ the region grows without bound.]

Indeed, we define integrals with unbounded integrands via this process:

Definition 3.10.6 (Improper integral with unbounded integrand).

(a) If the integral $\int_t^b f(x)\,dx$ exists for all a < t < b, then
$$\int_a^b f(x)\,dx = \lim_{t\to a+} \int_t^b f(x)\,dx$$
when the limit exists (and is finite).

(b) If the integral $\int_a^T f(x)\,dx$ exists for all a < T < b, then
$$\int_a^b f(x)\,dx = \lim_{T\to b-} \int_a^T f(x)\,dx$$
when the limit exists (and is finite).

(c) Let a < c < b. If the integrals $\int_a^T f(x)\,dx$ and $\int_t^b f(x)\,dx$ exist for all a < T < c and c < t < b, then
$$\int_a^b f(x)\,dx = \lim_{T\to c-} \int_a^T f(x)\,dx + \lim_{t\to c+} \int_t^b f(x)\,dx$$
when both limits exist (and are finite).

When the limit(s) exist, the integral is said to be convergent. Otherwise it is said to be divergent.


Notice that (c) is used when the integrand is unbounded at some point in the middle
of the domain of integration, such as was the case in our original example
$$\int_{-1}^1 \frac{1}{x^2}\,dx$$

A quick computation shows that this integral diverges to +8


ż1 ża ż1
1 1 1
dx = lim dx + lim dx
´1 x2 aÑ0´ ´1 x
2 bÑ0+ b x2
   
1 1
= lim 1 ´ + lim ´1
aÑ0´ a bÑ0+ b
= +8

More generally, if an integral has more than one "source of impropriety" (for example an infinite domain of integration and an unbounded integrand, or multiple infinite discontinuities) then you split it up into a sum of integrals with a single "source of impropriety" in each. For the integral, as a whole, to converge every term in that sum has to converge.
For example

Example 3.10.7 $\left(\int_{-\infty}^\infty \frac{dx}{(x-2)x^2}\right)$

Consider the integral
$$\int_{-\infty}^\infty \frac{dx}{(x-2)x^2}$$

• The domain of integration extends to both $+\infty$ and $-\infty$.

• The integrand is singular (i.e. becomes infinite) at $x = 2$ and at $x = 0$.

• So we would write the integral as
$$\int_{-\infty}^\infty\frac{dx}{(x-2)x^2} = \int_{-\infty}^a\frac{dx}{(x-2)x^2} + \int_a^0\frac{dx}{(x-2)x^2} + \int_0^b\frac{dx}{(x-2)x^2}$$
$$\qquad\qquad\qquad + \int_b^2\frac{dx}{(x-2)x^2} + \int_2^c\frac{dx}{(x-2)x^2} + \int_c^\infty\frac{dx}{(x-2)x^2}$$
where

– $a$ is any number strictly less than 0,

– $b$ is any number strictly between 0 and 2, and

– $c$ is any number strictly bigger than 2.

So, for example, take $a = -1$, $b = 1$, $c = 3$.

• When we examine the right-hand side we see that

– the first integral has domain of integration extending to $-\infty$,

– the second integral has an integrand that becomes unbounded as $x \to 0^-$,

– the third integral has an integrand that becomes unbounded as $x \to 0^+$,

– the fourth integral has an integrand that becomes unbounded as $x \to 2^-$,

– the fifth integral has an integrand that becomes unbounded as $x \to 2^+$, and

– the last integral has domain of integration extending to $+\infty$.

• Each of these integrals can then be expressed as a limit of an integral on a small domain.

Example 3.10.7

3.10.2 §§ Examples
With the more formal definitions out of the way, we are now ready for some (important)
examples.
Example 3.10.8 $\left(\int_1^\infty \frac{dx}{x^p}\text{ with }p>0\right)$

Solution.

• Fix any $p > 0$.

• The domain of the integral $\int_1^\infty\frac{dx}{x^p}$ extends to $+\infty$ and the integrand $\frac{1}{x^p}$ is continuous and bounded on the whole domain.

• So we write this integral as the limit
$$\int_1^\infty\frac{dx}{x^p} = \lim_{R\to\infty}\int_1^R\frac{dx}{x^p}$$

• The antiderivative of $1/x^p$ changes when $p = 1$, so we will split the problem into three cases, $p > 1$, $p = 1$ and $p < 1$.

• When $p > 1$,
$$\int_1^R\frac{dx}{x^p} = \frac{1}{1-p}x^{1-p}\,\Big|_1^R = \frac{R^{1-p}-1}{1-p}$$
Taking the limit as $R \to \infty$ gives
$$\int_1^\infty\frac{dx}{x^p} = \lim_{R\to\infty}\int_1^R\frac{dx}{x^p} = \lim_{R\to\infty}\frac{R^{1-p}-1}{1-p} = \frac{-1}{1-p} = \frac{1}{p-1}$$
since $1-p < 0$.

• Similarly when $p < 1$ we have
$$\int_1^\infty\frac{dx}{x^p} = \lim_{R\to\infty}\int_1^R\frac{dx}{x^p} = \lim_{R\to\infty}\frac{R^{1-p}-1}{1-p} = +\infty$$
because $1-p > 0$ and the term $R^{1-p}$ diverges to $+\infty$.

• Finally when $p = 1$,
$$\int_1^R\frac{dx}{x} = \ln|R| - \ln 1 = \ln R$$
Then taking the limit as $R \to \infty$ gives us
$$\int_1^\infty\frac{dx}{x} = \lim_{R\to\infty}\ln|R| = +\infty.$$

• So summarising, we have
$$\int_1^\infty\frac{dx}{x^p} = \begin{cases}\text{divergent} & \text{if } p \le 1\\[2pt] \frac{1}{p-1} & \text{if } p > 1\end{cases}$$

Example 3.10.8
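You can spot-check this "$p$-test" with a computer algebra system. The sketch below is ours, and it assumes the SymPy library is installed; it asks SymPy to evaluate $\int_1^\infty \frac{dx}{x^p}$ for a few sample values of $p$ straddling the cutoff $p = 1$.

    import sympy as sp

    x = sp.symbols('x', positive=True)

    # Improper integrals of 1/x^p on [1, oo) for sample exponents p.
    for p in [sp.Rational(1, 2), 1, 2, 3]:
        result = sp.integrate(1 / x**p, (x, 1, sp.oo))
        print(f"p = {p}: integral = {result}")

SymPy reports `oo` (divergence) for $p = \tfrac12$ and $p = 1$, and the finite values $1$ and $\tfrac12$ for $p = 2$ and $p = 3$, matching the formula $\frac{1}{p-1}$.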

Example 3.10.9 $\left(\int_0^1\frac{dx}{x^p}\text{ with }p>0\right)$

Solution.

• Again fix any $p > 0$.

• The domain of integration of the integral $\int_0^1\frac{dx}{x^p}$ is finite, but the integrand $\frac{1}{x^p}$ becomes unbounded as $x$ approaches the left end, $0$, of the domain of integration.

• So we write this integral as
$$\int_0^1\frac{dx}{x^p} = \lim_{t\to0^+}\int_t^1\frac{dx}{x^p}$$

• Again, the antiderivative changes at $p = 1$, so we split the problem into three cases.

• When $p > 1$ we have
$$\int_t^1\frac{dx}{x^p} = \frac{1}{1-p}x^{1-p}\,\Big|_t^1 = \frac{1-t^{1-p}}{1-p}$$
Since $1-p < 0$, when we take the limit as $t \to 0^+$ the term $t^{1-p}$ diverges to $+\infty$ and we obtain
$$\int_0^1\frac{dx}{x^p} = \lim_{t\to0^+}\frac{1-t^{1-p}}{1-p} = +\infty$$

• When $p = 1$ we similarly obtain
$$\int_0^1\frac{dx}{x} = \lim_{t\to0^+}\int_t^1\frac{dx}{x} = \lim_{t\to0^+}\big(-\ln|t|\big) = +\infty$$

• Finally, when $p < 1$ we have
$$\int_0^1\frac{dx}{x^p} = \lim_{t\to0^+}\int_t^1\frac{dx}{x^p} = \lim_{t\to0^+}\frac{1-t^{1-p}}{1-p} = \frac{1}{1-p}$$
since $1-p > 0$.

• In summary
$$\int_0^1\frac{dx}{x^p} = \begin{cases}\frac{1}{1-p} & \text{if } p < 1\\[2pt] \text{divergent} & \text{if } p \ge 1\end{cases}$$

Example 3.10.9

Example 3.10.10 $\left(\int_0^\infty\frac{dx}{x^p}\text{ with }p>0\right)$

Solution.

• Yet again fix $p > 0$.

• This time the domain of integration of the integral $\int_0^\infty\frac{dx}{x^p}$ extends to $+\infty$, and in addition the integrand $\frac{1}{x^p}$ becomes unbounded as $x$ approaches the left end, $0$, of the domain of integration.

• So we split the domain in two — given our last two examples, the obvious place to cut is at $x = 1$:
$$\int_0^\infty\frac{dx}{x^p} = \int_0^1\frac{dx}{x^p} + \int_1^\infty\frac{dx}{x^p}$$

• We saw, in Example 3.10.9, that the first integral diverges whenever $p \ge 1$, and we also saw, in Example 3.10.8, that the second integral diverges whenever $p \le 1$.

• So the integral $\int_0^\infty\frac{dx}{x^p}$ diverges for all values of $p$.

Example 3.10.10

Example 3.10.11 $\left(\int_{-1}^1\frac{dx}{x}\right)$

This is a pretty subtle example. A sketch of $y = \frac{1}{x}$ suggests that the signed area to the left of the $y$–axis should exactly cancel the area to the right of the $y$–axis, making the value of the integral $\int_{-1}^1\frac{dx}{x}$ exactly zero.
But both of the integrals
$$\int_0^1\frac{dx}{x} = \lim_{t\to0^+}\int_t^1\frac{dx}{x} = \lim_{t\to0^+}\Big[\ln x\Big]_t^1 = \lim_{t\to0^+}\ln\frac{1}{t} = +\infty$$
$$\int_{-1}^0\frac{dx}{x} = \lim_{T\to0^-}\int_{-1}^T\frac{dx}{x} = \lim_{T\to0^-}\Big[\ln|x|\Big]_{-1}^T = \lim_{T\to0^-}\ln|T| = -\infty$$
diverge, so $\int_{-1}^1\frac{dx}{x}$ diverges. Don't make the mistake of thinking that $\infty - \infty = 0$. It is undefined. And it is undefined for good reason.
For example, we have just seen that the area to the right of the $y$–axis is
$$\lim_{t\to0^+}\int_t^1\frac{dx}{x} = +\infty$$
and that the area to the left of the $y$–axis is (substitute $-7t$ for $T$ above)
$$\lim_{t\to0^+}\int_{-1}^{-7t}\frac{dx}{x} = -\infty$$
If $\infty - \infty = 0$, the following limit should be $0$.
$$\lim_{t\to0^+}\left[\int_t^1\frac{dx}{x} + \int_{-1}^{-7t}\frac{dx}{x}\right] = \lim_{t\to0^+}\Big[\ln\frac{1}{t} + \ln|-7t|\Big] = \lim_{t\to0^+}\Big[\ln\frac{1}{t} + \ln(7t)\Big] = \lim_{t\to0^+}\big[-\ln t + \ln 7 + \ln t\big] = \lim_{t\to0^+}\ln 7 = \ln 7$$
This appears to give $\infty - \infty = \ln 7$. Of course the number 7 was picked arbitrarily. You can make $\infty - \infty$ be any number at all, by making a suitable replacement for 7.

Example 3.10.11

Example 3.10.12 (Example 3.10.2 revisited)

The careful computation of the integral of Example 3.10.2 is
$$\int_{-1}^1\frac{1}{x^2}\,dx = \lim_{T\to0^-}\int_{-1}^T\frac{1}{x^2}\,dx + \lim_{t\to0^+}\int_t^1\frac{1}{x^2}\,dx = \lim_{T\to0^-}\Big[-\frac{1}{x}\Big]_{-1}^T + \lim_{t\to0^+}\Big[-\frac{1}{x}\Big]_t^1 = \infty + \infty$$
Hence the integral diverges to $+\infty$.

Example 3.10.12

Example 3.10.13 $\left(\int_{-\infty}^\infty\frac{dx}{1+x^2}\right)$

Since
$$\lim_{R\to\infty}\int_0^R\frac{dx}{1+x^2} = \lim_{R\to\infty}\Big[\arctan x\Big]_0^R = \lim_{R\to\infty}\arctan R = \frac{\pi}{2}$$
$$\lim_{r\to-\infty}\int_r^0\frac{dx}{1+x^2} = \lim_{r\to-\infty}\Big[\arctan x\Big]_r^0 = \lim_{r\to-\infty}\big(-\arctan r\big) = \frac{\pi}{2}$$
the integral $\int_{-\infty}^\infty\frac{dx}{1+x^2}$ converges and takes the value $\pi$.

Example 3.10.13

Example 3.10.14

For what values of $p$ does $\int_e^\infty\frac{dx}{x(\ln x)^p}$ converge?

Solution.

• For $x \ge e$, the denominator $x(\ln x)^p$ is never zero. So the integrand is bounded on the entire domain of integration and this integral is improper only because the domain of integration extends to $+\infty$, and we proceed as usual.

• We have, using the substitution $u = \ln x$, $du = \frac{dx}{x}$,
$$\int_e^\infty\frac{dx}{x(\ln x)^p} = \lim_{R\to\infty}\int_e^R\frac{dx}{x(\ln x)^p} = \lim_{R\to\infty}\int_1^{\ln R}\frac{du}{u^p}$$
$$= \lim_{R\to\infty}\begin{cases}\frac{1}{1-p}\big[(\ln R)^{1-p}-1\big] & \text{if } p \ne 1\\[2pt] \ln(\ln R) & \text{if } p = 1\end{cases} \;=\; \begin{cases}\text{divergent} & \text{if } p \le 1\\[2pt] \frac{1}{p-1} & \text{if } p > 1\end{cases}$$
In this last step we have used logic similar to that used in Example 3.10.8, but with $R$ replaced by $\ln R$.

Example 3.10.14

Example 3.10.15 (the gamma function)

The gamma function $\Gamma(t)$ is defined by the improper integral
$$\Gamma(t) = \int_0^\infty x^{t-1}e^{-x}\,dx$$
We shall now compute $\Gamma(n)$ for all natural numbers $n$.

• To get started, we'll compute
$$\Gamma(1) = \int_0^\infty e^{-x}\,dx = \lim_{R\to\infty}\int_0^R e^{-x}\,dx = \lim_{R\to\infty}\Big[-e^{-x}\Big]_0^R = 1$$

• Then compute, using integration by parts with $u = x$, $dv = e^{-x}dx$, $v = -e^{-x}$, $du = dx$,
$$\Gamma(2) = \int_0^\infty xe^{-x}\,dx = \lim_{R\to\infty}\int_0^R xe^{-x}\,dx = \lim_{R\to\infty}\left[-xe^{-x}\,\Big|_0^R + \int_0^R e^{-x}\,dx\right] = \lim_{R\to\infty}\Big[-xe^{-x}-e^{-x}\Big]_0^R = 1$$
For the last equality, we used that $\lim_{x\to\infty} xe^{-x} = 0$.

• Now we move on to general $n$, using the same type of computation as we just used to evaluate $\Gamma(2)$. For any natural number $n$, again integrating by parts with $u = x^n$, $dv = e^{-x}dx$, $v = -e^{-x}$, $du = nx^{n-1}dx$,
$$\Gamma(n+1) = \int_0^\infty x^n e^{-x}\,dx = \lim_{R\to\infty}\int_0^R x^n e^{-x}\,dx = \lim_{R\to\infty}\left[-x^n e^{-x}\,\Big|_0^R + \int_0^R nx^{n-1}e^{-x}\,dx\right] = \lim_{R\to\infty} n\int_0^R x^{n-1}e^{-x}\,dx = n\,\Gamma(n)$$
To get to the third expression, we used that $\lim_{x\to\infty} x^n e^{-x} = 0$.

• Now that we know $\Gamma(2) = 1$ and $\Gamma(n+1) = n\Gamma(n)$, for all $n \in \mathbb{N}$, we can compute all of the $\Gamma(n)$'s.
$$\Gamma(2) = 1$$
$$\Gamma(3) = \Gamma(2+1) = 2\Gamma(2) = 2\cdot 1$$
$$\Gamma(4) = \Gamma(3+1) = 3\Gamma(3) = 3\cdot 2\cdot 1$$
$$\Gamma(5) = \Gamma(4+1) = 4\Gamma(4) = 4\cdot 3\cdot 2\cdot 1$$
$$\vdots$$
$$\Gamma(n) = (n-1)\cdot(n-2)\cdots 4\cdot 3\cdot 2\cdot 1 = (n-1)!$$

That is, the factorial is just⁶⁴ the Gamma function shifted by one.

Example 3.10.15
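This relationship is easy to test numerically. The short Python sketch below is an illustration we have added (it only needs the standard math module, whose math.gamma function evaluates the same Gamma function); it compares $\Gamma(n)$ with $(n-1)!$ for the first few natural numbers.

    import math

    # Gamma(n) should equal (n-1)! for every natural number n.
    for n in range(1, 8):
        print(n, math.gamma(n), math.factorial(n - 1))

Each row prints two equal numbers: 1 and 0!, 1 and 1!, 2 and 2!, 6 and 3!, and so on.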

3.10.3 §§ Convergence Tests for Improper Integrals


It is very common to encounter integrals that are too complicated to evaluate explicitly.
Numerical approximation schemes, evaluated by computer, are often used instead (see
Section 3.9). You want to be sure that at least the integral converges before feeding it into
a computer65 . Fortunately it is usually possible to determine whether or not an improper
integral converges even when you cannot evaluate it explicitly.

Remark 3.10.16. For pedagogical purposes, we are going to concentrate on the problem of determining whether or not an integral $\int_a^\infty f(x)\,dx$ converges, when $f(x)$ has no singularities for $x \ge a$. Recall that the first step in analyzing any improper integral is to write it as a sum of integrals each of which has only a single "source of impropriety" — either a domain of integration that extends to $+\infty$, or a domain of integration that extends to $-\infty$, or an integrand which is singular at one end of the domain of integration. So we are now going to consider only the first of these three possibilities. But the techniques that we are about to see have obvious analogues for the other two possibilities.

Now let's start. Imagine that we have an improper integral $\int_a^\infty f(x)\,dx$, that $f(x)$ has no singularities for $x \ge a$ and that $f(x)$ is complicated enough that we cannot evaluate the integral explicitly⁶⁶. The idea is to find another improper integral $\int_a^\infty g(x)\,dx$

• with $g(x)$ simple enough that we can evaluate the integral $\int_a^\infty g(x)\,dx$ explicitly, or at least determine easily whether or not $\int_a^\infty g(x)\,dx$ converges, and

• with $g(x)$ behaving enough like $f(x)$ for large $x$ that the integral $\int_a^\infty f(x)\,dx$ converges if and only if $\int_a^\infty g(x)\,dx$ converges.

So far, this is a pretty vague strategy. Here is a theorem which starts to make it more
precise.

64 The Gamma function is far more important than just a generalisation of the factorial. It appears all over mathematics, physics, statistics and beyond. It has all sorts of interesting properties and its definition can be extended from natural numbers $n$ to all numbers excluding $0, -1, -2, -3, \ldots$. For example, one can show that
$$\Gamma(1-z)\Gamma(z) = \frac{\pi}{\sin \pi z}.$$

65 Applying numerical integration methods to a divergent integral may result in perfectly reasonably looking but very wrong answers.

66 You could, for example, think of something like our running example $\int_a^\infty e^{-t^2}\,dt$.


Theorem 3.10.17 (Comparison).

Let $a$ be a real number. Let $f$ and $g$ be functions that are defined and continuous for all $x \ge a$ and assume that $g(x) \ge 0$ for all $x \ge a$.

(a) If $|f(x)| \le g(x)$ for all $x \ge a$ and if $\int_a^\infty g(x)\,dx$ converges then $\int_a^\infty f(x)\,dx$ also converges.

(b) If $f(x) \ge g(x)$ for all $x \ge a$ and if $\int_a^\infty g(x)\,dx$ diverges then $\int_a^\infty f(x)\,dx$ diverges.

We will not prove this theorem, but, hopefully, the following supporting arguments should at least appear reasonable to you.

• If $\int_a^\infty g(x)\,dx$ converges, then the area of $\{(x,y) \mid x \ge a,\ 0 \le y \le g(x)\}$ is finite. When $|f(x)| \le g(x)$, the region $\{(x,y) \mid x \ge a,\ 0 \le y \le |f(x)|\}$ is contained inside $\{(x,y) \mid x \ge a,\ 0 \le y \le g(x)\}$ and so must also have finite area. Consequently the areas of both the regions $\{(x,y) \mid x \ge a,\ 0 \le y \le f(x)\}$ and $\{(x,y) \mid x \ge a,\ f(x) \le y \le 0\}$ are finite too⁶⁷.

• If $\int_a^\infty g(x)\,dx$ diverges, then the area of $\{(x,y) \mid x \ge a,\ 0 \le y \le g(x)\}$ is infinite. When $f(x) \ge g(x)$, the region $\{(x,y) \mid x \ge a,\ 0 \le y \le f(x)\}$ contains the region $\{(x,y) \mid x \ge a,\ 0 \le y \le g(x)\}$ and so also has infinite area.

67 We have separated the regions in which $f(x)$ is positive and negative, because the integral $\int_a^\infty f(x)\,dx$ represents the signed area of the union of $\{(x,y) \mid x \ge a,\ 0 \le y \le f(x)\}$ and $\{(x,y) \mid x \ge a,\ f(x) \le y \le 0\}$.


Example 3.10.18 $\left(\int_1^\infty e^{-x^2}\,dx\right)$

We cannot evaluate the integral $\int_1^\infty e^{-x^2}\,dx$ explicitly⁶⁸, however we would still like to understand if it is finite or not — does it converge or diverge?

Solution. We will use Theorem 3.10.17 to answer the question.

• So we want to find another integral that we can compute and that we can compare to $\int_1^\infty e^{-x^2}\,dx$. To do so we pick an integrand that looks like $e^{-x^2}$, but whose indefinite integral we know — such as $e^{-x}$.

• When $x \ge 1$, we have $x^2 \ge x$ and hence $e^{-x^2} \le e^{-x}$. Thus we can use Theorem 3.10.17 to compare
$$\int_1^\infty e^{-x^2}\,dx \qquad\text{with}\qquad \int_1^\infty e^{-x}\,dx$$

• The integral
$$\int_1^\infty e^{-x}\,dx = \lim_{R\to\infty}\int_1^R e^{-x}\,dx = \lim_{R\to\infty}\Big[-e^{-x}\Big]_1^R = \lim_{R\to\infty}\big[e^{-1}-e^{-R}\big] = e^{-1}$$
converges.

• So, by Theorem 3.10.17, with $a = 1$, $f(x) = e^{-x^2}$ and $g(x) = e^{-x}$, the integral $\int_1^\infty e^{-x^2}\,dx$ converges too (it is approximately equal to 0.1394).

Example 3.10.18
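Where does the number 0.1394 come from? Once we know the integral converges, a numerical scheme (Section 3.9) can estimate it. The Python sketch below is an illustration we have added (plain Python 3, no extra libraries): it applies the midpoint rule on a long but finite interval, and the comparison $e^{-x^2} \le e^{-x}$ guarantees the discarded tail beyond $x = 20$ is at most $e^{-20}$, far below our rounding.

    import math

    # Midpoint-rule estimate of the integral of e^(-x^2) from 1 to 20.
    # The tail beyond 20 is at most the integral of e^(-x) from 20 to infinity,
    # which equals e^(-20), so it cannot affect the first several decimals.
    def midpoint(f, a, b, n):
        h = (b - a) / n
        return h * sum(f(a + (i + 0.5) * h) for i in range(n))

    approx = midpoint(lambda x: math.exp(-x * x), 1.0, 20.0, 100_000)
    print(approx)   # roughly 0.1394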

Example 3.10.19 $\left(\int_{1/2}^\infty e^{-x^2}\,dx\right)$

Solution.

• The integral $\int_{1/2}^\infty e^{-x^2}\,dx$ is quite similar to the integral $\int_1^\infty e^{-x^2}\,dx$ of Example 3.10.18. But we cannot just repeat the argument of Example 3.10.18 because it is not true that $e^{-x^2} \le e^{-x}$ when $0 < x < 1$.

• In fact, for $0 < x < 1$, $x^2 < x$ so that $e^{-x^2} > e^{-x}$.

• However the difference between the current example and Example 3.10.18 is
$$\int_{1/2}^\infty e^{-x^2}\,dx - \int_1^\infty e^{-x^2}\,dx = \int_{1/2}^1 e^{-x^2}\,dx$$
which is clearly a well defined finite number (it's actually about 0.286). It is important to note that we are being a little sloppy by taking the difference of two integrals like this — we are assuming that both integrals converge. More on this below.

• So we would expect that $\int_{1/2}^\infty e^{-x^2}\,dx$ should be the sum of the proper integral $\int_{1/2}^1 e^{-x^2}\,dx$ and the convergent integral $\int_1^\infty e^{-x^2}\,dx$, and so should be a convergent integral. This is indeed the case. The theorem below provides the justification.

Example 3.10.19

68 It has been the subject of many remarks and footnotes.

Theorem 3.10.20.

Let $a$ and $c$ be real numbers with $a < c$ and let the function $f(x)$ be continuous for all $x \ge a$. Then the improper integral $\int_a^\infty f(x)\,dx$ converges if and only if the improper integral $\int_c^\infty f(x)\,dx$ converges.

Proof. By definition the improper integral $\int_a^\infty f(x)\,dx$ converges if and only if the limit
$$\lim_{R\to\infty}\int_a^R f(x)\,dx = \lim_{R\to\infty}\left[\int_a^c f(x)\,dx + \int_c^R f(x)\,dx\right] = \int_a^c f(x)\,dx + \lim_{R\to\infty}\int_c^R f(x)\,dx$$
exists and is finite. (Remember that, in computing the limit, $\int_a^c f(x)\,dx$ is a finite constant independent of $R$ and so can be pulled out of the limit.) But that is the case if and only if the limit $\lim_{R\to\infty}\int_c^R f(x)\,dx$ exists and is finite, which in turn is the case if and only if the integral $\int_c^\infty f(x)\,dx$ converges.

Example 3.10.21

Does the integral $\int_1^\infty\frac{\sqrt{x}}{x^2+x}\,dx$ converge or diverge?

Solution.

• Our first task is to identify the potential sources of impropriety for this integral.

• The domain of integration extends to $+\infty$, but we must also check to see if the integrand contains any singularities. On the domain of integration $x \ge 1$ so the denominator is never zero and the integrand is continuous. So the only problem is at $+\infty$.

• Our second task is to develop some intuition⁶⁹. As the only problem is that the domain of integration extends to infinity, whether or not the integral converges will be determined by the behavior of the integrand for very large $x$.

• When $x$ is very large, $x^2$ is much much larger than $x$ (which we can write as $x^2 \gg x$) so that the denominator $x^2 + x \approx x^2$ and the integrand
$$\frac{\sqrt{x}}{x^2+x} \approx \frac{\sqrt{x}}{x^2} = \frac{1}{x^{3/2}}$$

• By Example 3.10.8, with $p = 3/2$, the integral $\int_1^\infty\frac{dx}{x^{3/2}}$ converges. So we would expect that $\int_1^\infty\frac{\sqrt{x}}{x^2+x}\,dx$ converges too.

• Our final task is to verify that our intuition is correct. To do so, we want to apply part (a) of Theorem 3.10.17 with $f(x) = \frac{\sqrt{x}}{x^2+x}$ and $g(x)$ being $\frac{1}{x^{3/2}}$, or possibly some constant times $\frac{1}{x^{3/2}}$. That is, we need to show that for all $x \ge 1$ (i.e. on the domain of integration)
$$\frac{\sqrt{x}}{x^2+x} \le \frac{A}{x^{3/2}}$$
for some constant $A$. Let's try this.

• Since $x \ge 1$ we know that
$$x^2 + x > x^2$$
Now take the reciprocal of both sides:
$$\frac{1}{x^2+x} < \frac{1}{x^2}$$
Multiply both sides by $\sqrt{x}$ (which is always positive, so the sign of the inequality does not change)
$$\frac{\sqrt{x}}{x^2+x} < \frac{\sqrt{x}}{x^2} = \frac{1}{x^{3/2}}$$

• So Theorem 3.10.17(a) and Example 3.10.8, with $p = 3/2$, do indeed show that the integral $\int_1^\infty\frac{\sqrt{x}}{x^2+x}\,dx$ converges.

Example 3.10.21

69 This takes practice, practice and more practice. At the risk of alliteration — please perform plenty of practice problems.

Notice that in this last example we managed to show that the integral exists by finding an integrand that behaved the same way for large $x$. Our intuition then had to be bolstered with some careful inequalities to apply the comparison Theorem 3.10.17. It would be nice to avoid this last step and be able to jump from the intuition to the conclusion without messing around with inequalities. Thankfully there is a variant of Theorem 3.10.17 that is often easier to apply and that also fits well with the sort of intuition that we developed to solve Example 3.10.21.


A key phrase in the previous paragraph is "behaves the same way for large $x$". A good way to formalise this expression — "$f(x)$ behaves like $g(x)$ for large $x$" — is to require that the limit
$$\lim_{x\to\infty}\frac{f(x)}{g(x)}\qquad\text{exists and is a finite nonzero number.}$$
Suppose that this is the case and call the limit $L \ne 0$. Then

• the ratio $\frac{f(x)}{g(x)}$ must approach $L$ as $x$ tends to $+\infty$.

• So when $x$ is very large — say $x > B$, for some big number $B$ — we must have that
$$\frac{1}{2}L \le \frac{f(x)}{g(x)} \le 2L\qquad\text{for all } x > B$$
Equivalently, $f(x)$ lies between $\frac{L}{2}g(x)$ and $2Lg(x)$, for all $x \ge B$.

• Consequently, the integral of $f(x)$ converges if and only if the integral of $g(x)$ converges, by Theorems 3.10.17 and 3.10.20.

These considerations lead to the following variant of Theorem 3.10.17.

Theorem 3.10.22 (Limiting comparison).

Let $-\infty < a < \infty$. Let $f$ and $g$ be functions that are defined and continuous for all $x \ge a$ and assume that $g(x) \ge 0$ for all $x \ge a$.

(a) If $\int_a^\infty g(x)\,dx$ converges and the limit
$$\lim_{x\to\infty}\frac{f(x)}{g(x)}$$
exists, then $\int_a^\infty f(x)\,dx$ converges.

(b) If $\int_a^\infty g(x)\,dx$ diverges and the limit
$$\lim_{x\to\infty}\frac{f(x)}{g(x)}$$
exists and is nonzero, then $\int_a^\infty f(x)\,dx$ diverges.

Note that in (b) the limit must exist and be nonzero, while in (a) we only require that the limit exists (it can be zero).
Here is an example of how Theorem 3.10.22 is used.


Example 3.10.23 $\left(\int_1^\infty\frac{x+\sin x}{e^{-x}+x^2}\,dx\right)$

Does the integral $\int_1^\infty\frac{x+\sin x}{e^{-x}+x^2}\,dx$ converge or diverge?

Solution.

• Our first task is to identify the potential sources of impropriety for this integral.

• The domain of integration extends to $+\infty$. On the domain of integration the denominator is never zero so the integrand is continuous. Thus the only problem is at $+\infty$.

• Our second task is to develop some intuition about the behavior of the integrand for very large $x$. A good way to start is to think about the size of each term when $x$ becomes big.

• When $x$ is very large:

– $e^{-x} \ll x^2$, so that the denominator $e^{-x}+x^2 \approx x^2$, and

– $|\sin x| \le 1 \ll x$, so that the numerator $x + \sin x \approx x$, and

– the integrand $\frac{x+\sin x}{e^{-x}+x^2} \approx \frac{x}{x^2} = \frac{1}{x}$.

Notice that we are using $A \ll B$ to mean that "$A$ is much much smaller than $B$". Similarly $A \gg B$ means "$A$ is much much bigger than $B$". We don't really need to be too precise about its meaning beyond this in the present context.

• Now, since $\int_1^\infty\frac{dx}{x}$ diverges, we would expect $\int_1^\infty\frac{x+\sin x}{e^{-x}+x^2}\,dx$ to diverge too.

• Our final task is to verify that our intuition is correct. To do so, we set
$$f(x) = \frac{x+\sin x}{e^{-x}+x^2}\qquad g(x) = \frac{1}{x}$$
and compute
$$\lim_{x\to\infty}\frac{f(x)}{g(x)} = \lim_{x\to\infty}\frac{x+\sin x}{e^{-x}+x^2}\div\frac{1}{x} = \lim_{x\to\infty}\frac{\big(1+\frac{\sin x}{x}\big)x}{\big(\frac{e^{-x}}{x^2}+1\big)x^2}\times x = \lim_{x\to\infty}\frac{1+\frac{\sin x}{x}}{\frac{e^{-x}}{x^2}+1} = 1$$

• Since $\int_1^\infty g(x)\,dx = \int_1^\infty\frac{dx}{x}$ diverges, by Example 3.10.8 with $p = 1$, Theorem 3.10.22(b) now tells us that $\int_1^\infty f(x)\,dx = \int_1^\infty\frac{x+\sin x}{e^{-x}+x^2}\,dx$ diverges too.

Example 3.10.23
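You can see this divergence numerically as well. The Python sketch below is ours (plain Python 3, standard math module only); it estimates $\int_1^R f(x)\,dx$ with a midpoint rule for a few values of $R$ and compares it with $\ln R = \int_1^R\frac{dx}{x}$, the divergent comparison integral. The two columns grow together, which is exactly what the limiting comparison suggests.

    import math

    # The integrand from Example 3.10.23.
    def f(x):
        return (x + math.sin(x)) / (math.exp(-x) + x * x)

    def midpoint(g, a, b, n):
        h = (b - a) / n
        return h * sum(g(a + (i + 0.5) * h) for i in range(n))

    for R in [10, 100, 1000]:
        print(R, midpoint(f, 1, R, 100_000), math.log(R))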


3.11 Overview of Integration Techniques


We have now learned many fancy methods of integration. Below we give a short review
of the methods we’ve learned and a general idea of when you might want to choose each
one. This section has no new mathematical content.
Up till now, you could often guess the method to use on integrals in the practice book
by noticing which section they were in. Section 3.11 in the practice book has lots of inte-
grals to work on, without that contextual hint.

§§ Known Areas (Section 3.1.3)

A definite integral gives the area underneath a curve. If that area makes a simple shape, you might be able to use a formula from geometry. This is particularly convenient for rectangles.
$$\int_a^b 1\,dx = (b-a)\times(1) = b - a$$
A special case where this method is useful is with half and quarter circles. If we wanted to use the Fundamental Theorem of Calculus to evaluate the integral below, we'd need a trigonometric substitution. It's much easier to recognize that the area in question is one quarter of the unit circle.
$$\int_0^1\sqrt{1-x^2}\,dx = \frac{1}{4}\pi(1)^2 = \frac{\pi}{4}$$
You can also take advantage of a function's symmetry. For example,
$$\int_{-\pi}^\pi \sin x\,dx = 0$$
because the positive area on the right exactly cancels out the negative (net) area on the left.


§§ Substitution (Section 3.4)

Substitution is doing the chain rule in reverse. If you see some "inside" function whose derivative shows up multiplied to the rest of the integrand, you might want to try substitution. An obvious example would be something like this:
$$\int (4x+3)\cos(2x^2+3x)\,dx$$
The "inside function" $2x^2+3x$ has derivative $(4x+3)$, and we see precisely that derivative multiplied to the rest of the integrand. So this is a great candidate for the substitution $u = 2x^2+3x$.
Substitution is sometimes a first step to get a function in a better form for a second technique. For example, the function $\frac{e^x+e^{3x}}{e^x(1-e^x)(2-e^x)}$ is a rational function (and a candidate for the method of partial fractions to antidifferentiate) if you consider $e^x$ to be your variable. So a first step in antidifferentiation would be to use $u = e^x$, $du = e^x\,dx$:
$$\int\frac{e^x+e^{3x}}{e^x(1-e^x)(2-e^x)}\,dx = \int\frac{1+e^{2x}}{e^x(1-e^x)(2-e^x)}\cdot e^x\,dx = \int\frac{1+u^2}{u(1-u)(2-u)}\,du$$

§§ Integration by Parts (Section 3.5)

If you see the product of two functions, and you would like to swap one with its derivative and the other with its antiderivative, then integration by parts is the method for you. One standard example is the integral
$$\int xe^x\,dx.$$
We let $u = x$ and $dv = e^x\,dx$. Then $du = dx$ and $v = e^x$. The function $v = e^x$ isn't much of an improvement over $dv = e^x\,dx$, but the function $du = dx$ is an improvement over $u = x$. We get to replace our integrand with a simpler product of functions:
$$\int xe^x\,dx = xe^x - \int e^x\,dx$$
(Contrast this with the substitution rule: both often operate on integrands that are the product of functions.)
There are two special cases you should know where integration by parts is useful but not immediately obvious. One case is with the antiderivatives of logarithms and inverse trig functions. (See Examples 3.5.8 and 3.5.9 for details.) For example, to antidifferentiate the natural logarithm, we use integration by parts with $u = \ln x$ and $dv = dx$.
$$\int\ln x\,dx = x\ln x - \int x\cdot\frac{1}{x}\,dx$$
The other special case is integrating around in a circle. We've seen this with the integral
$$\int e^x\sin x\,dx$$
(Example 3.5.11). As we repeatedly integrate by parts, we get similar integrals ($e^x$ stays the same; $\sin x$ eventually turns into $-\sin x$). This leads to an equation that we can solve for our original integral algebraically, without actually having to antidifferentiate in the usual sense of the term.


§§ Trigonometric Integrals (Section 3.6)

There are fairly straightforward algorithms to handle integrals of the type $\int\sin^n x\cos^m x\,dx$ and $\int\tan^n x\sec^m x\,dx$. You'll decide on a substitution (say, $u = \sin x$) and then use trigonometric identities to transform the integrand into a suitable form.
A typical example is that the integral
$$\int\frac{\sin^3 x}{\cos^2 x}\,dx$$
can be evaluated with the substitution $u = \cos x$, $du = -\sin x\,dx$ if we make the following adjustments:
$$\int\frac{\sin^3 x}{\cos^2 x}\,dx = \int\frac{\sin^2 x}{\cos^2 x}\,\sin x\,dx = \int\frac{1-\cos^2 x}{\cos^2 x}\,\sin x\,dx$$
Identities to remember:

• $\sin^2 x + \cos^2 x = 1$

• $\tan^2 x + 1 = \sec^2 x$

• $\sin^2 x = \frac{1-\cos 2x}{2}$

• $\cos^2 x = \frac{1+\cos 2x}{2}$

Trigonometric integrals often arise as the result of trigonometric substitution.

§§ Trigonometric Substitution (Section 3.7)

Mostly, this comes up when you have a quadratic expression underneath a square root. Using a substitution allows you to take advantage of trig identities to turn the thing under the square root into a perfect square.
A simple example is
$$\int\sqrt{1+x^2}\,dx.$$
We can't cancel out the square root and the $x^2$ in its current form, so we set $x = \tan\theta$, $dx = \sec^2\theta\,d\theta$. Now:
$$\int\sqrt{1+x^2}\,dx = \int\sqrt{1+\tan^2\theta}\,\sec^2\theta\,d\theta$$
Using the identity $\tan^2 x + 1 = \sec^2 x$,
$$= \int\sqrt{\sec^2\theta}\,\sec^2\theta\,d\theta$$
Now we can cancel out the square root and the squared function.
$$= \int\sec\theta\cdot\sec^2\theta\,d\theta$$
As above, trigonometric substitutions often lead to trigonometric integrals. You will have to remember the identities $\sin^2 x + \cos^2 x = 1$ and $\tan^2 x + 1 = \sec^2 x$. Completing the square might be necessary to find an appropriate trigonometric substitution. After you antidifferentiate, recovering the original variable might involve questions like "if $\sin\theta = x$, then what is $\tan\theta$?" Drawing right triangles can help answer them.


§§ Partial Fractions (Section 3.8)

Partial fractions is a method for evaluating the integral of a rational function (a fraction whose numerator and denominator are both polynomials). Partial fractions only works on rational functions (so you might need a substitution to start). It gives you an integral of MORE rational functions, which often leads you down a road to substitution.
If you see a rational function that you can already integrate using substitution, then use substitution. If you see a rational function that you can't integrate using substitution, try partial fractions to turn it into a friendlier form.
The integral
$$\int\frac{3x-4}{x^3-4x}\,dx$$
is a good candidate for partial fractions. By contrast, $\int\frac{3x^2-4}{x^3-4x}\,dx$ lends itself to the substitution $u = x^3-4x$, and so has no need for partial fractions.
Before starting, you need to have a factored denominator that has a higher degree than the numerator. So, polynomial long division and factoring polynomials are common. Solving for coefficients sometimes involves systems of several equations.

§§ Numerical Integration (Section 3.9)

Some integrals, such as
$$\int e^{x^2}\,dx\qquad\text{and}\qquad\int\sin\left(x^2\right)dx$$
cannot be evaluated using the techniques we've learned so far. Their definite integrals, however, can be approximated using the Midpoint Rule, the Trapezoid Rule, or Simpson's Rule. These rules come with error bounds, so we can make sure our error is within a given tolerance.
One special application of numerical integration is finding a decimal approximation for an irrational number. In Question 28 of Section 3.9 in the practice book, we find a decimal approximation of $\ln 2$ by applying Simpson's Rule to the integral
$$\int_1^2\frac{1}{x}\,dx.$$
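If you would like to see that approximation carried out by machine, the Python sketch below is an illustration we have added (plain Python 3); it implements Simpson's Rule for $\int_1^2\frac{1}{x}\,dx$ with ten subintervals and compares the result with the exact value $\ln 2$.

    import math

    # Simpson's Rule with n subintervals (n must be even).
    def simpson(f, a, b, n):
        h = (b - a) / n
        total = f(a) + f(b)
        for i in range(1, n):
            total += (4 if i % 2 == 1 else 2) * f(a + i * h)
        return total * h / 3

    print(simpson(lambda x: 1 / x, 1, 2, 10))   # approximately 0.69315
    print(math.log(2))                          # 0.693147...

Even with only ten subintervals, the approximation agrees with $\ln 2$ to about five decimal places.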

§§ Improper Integrals (Section 3.10)

Improper integrals have infinite discontinuities in their integrands or infinite intervals of integration. The two integrals
$$\int_1^\infty e^{-x}\,dx\qquad\text{and}\qquad\int_{-1}^2\frac{1}{x}\,dx$$
are both improper. The second one is a dangerous type: it's easy to try to apply the Fundamental Theorem of Calculus to evaluate it, without realizing that your computation is nonsense. Both types of improper integrals are evaluated with limits. If the limits don't exist (including limits going to infinity), then we say the integral diverges.


3.12 Differential Equations


A differential equation is an equation for an unknown function that involves the derivative
of the unknown function. Differential equations play a central role in modelling a huge
number of different phenomena. Here is a table giving a bunch of named differential
equations and what they are used for. It is far from complete.

Newton’s Law of Motion describes motion of particles


Maxwell’s equations describes electromagnetic radiation
Navier–Stokes equations describes fluid motion
Heat equation describes heat flow
Wave equation describes wave motion
Schrödinger equation describes atoms, molecules and crystals
Stress-strain equations describes elastic materials
Black–Scholes models used for pricing financial options
Predator–prey equations describes ecosystem populations
Einstein’s equations connects gravity and geometry
Ludwig–Jones–Holling’s equation models spruce budworm/Balsam fir ecosystem
Zeeman’s model models heart beats and nerve impulses
Sherman–Rinzel–Keizer model for electrical activity in Pancreatic β–cells
Hodgkin–Huxley equations models nerve action potentials

We are just going to scratch the surface of the study of differential equations. Most uni-
versities offer half a dozen different undergraduate courses on various aspects of differen-
tial equations. We’ll focus here on one important type of differential equation: separable
differential equations.
We’ve already seen one type of differential equation: finding an antiderivative.
Example 3.12.1

Suppose $y(x)$ is a function satisfying
$$\frac{dy}{dx} = e^x.$$
What is $y$?

Solution. We know the derivative of our function $y$, as a function of $x$, so we just antidifferentiate.
$$y(x) = e^x + C$$
for some constant $C$.
Note the answer to the question is a function.


Example 3.12.1

Before we talk about solving more complicated differential equations, let’s get more
practice working with them. The biggest paradigm shift between solving a differential
equation, and the type of equation-solving you’re used to, is that we’re solving for a func-
tion instead of a variable.
Example 3.12.2

Choose the function(s) listed below that solve this differential equation:
$$\frac{dy}{dx} + x^2 - 1 = y$$

A. $y = x^2 + 2x + 1$

B. $y = x^2 + 1$

Solution. We want to check whether $y$ is a solution, so we replace $y$ and $\frac{dy}{dx}$ in the differential equation, and check whether the equation is true.
In the case $y = x^2+2x+1$, then $\frac{dy}{dx} = 2x+2$. We plug these into our differential equation:
$$\frac{dy}{dx} + x^2 - 1 = y$$
$$2x+2+x^2-1 = x^2+2x+1$$
Simplifying the left-hand side,
$$x^2+2x+1 = x^2+2x+1$$
This is true – the function on the left and the function on the right are the same. So the function $y = x^2+2x+1$ is a solution to the differential equation $\frac{dy}{dx}+x^2-1 = y$.
Now let's think about the other function we were asked to consider, $y = x^2+1$. For this function, $\frac{dy}{dx} = 2x$. We plug these into our differential equation:
$$\frac{dy}{dx}+x^2-1 = y$$
$$2x+x^2-1 = x^2+1$$
Rearranging the left-hand side,
$$x^2+2x-1 = x^2+1$$
This is not true – the function on the left and the function on the right are not the same. So the function $y = x^2+1$ is not a solution to the differential equation $\frac{dy}{dx}+x^2-1 = y$.


You don't have enough tools yet to come up with solutions like $y = x^2+2x+1$ – those will come shortly. For this example, we only want you to understand what it means for a function to be a solution to a differential equation.

Example 3.12.2

Definition 3.12.3.

A separable differential equation is an equation for a function $y(x)$ that can be written in the form
$$g\big(y(x)\big)\,\frac{dy}{dx}(x) = f(x)$$

It may take some rearranging to get a differential equation in this form. The "separable" refers to the mechanics of getting all terms containing $y$ and $y'$ on one side of the equation, and all terms containing $x$ on the other side.
We'll start by developing a recipe for solving separable differential equations. Then we'll look at many examples. Usually one suppresses the argument of $y(x)$ and writes the equation as below:
$$g(y)\,\frac{dy}{dx} = f(x)$$
If the functions on the left and right sides of the equation are the same (and they should be – otherwise that equals sign has no business being there) then their antiderivatives with respect to $x$ should be the same as well, up to the usual additive constant.
$$\int g(y)\,\frac{dy}{dx}\,dx = \int f(x)\,dx$$
The left-hand side of the equation above is in a perfect form for a substitution.
$$\int g(y)\,dy = \int f(x)\,dx \tag{*}$$
In this way, we've turned the problem of finding solutions to our separable differential equation into the problem of finding two antiderivatives.
Note the work above didn't really depend on what, exactly, $f(x)$ and $g(y)$ were. So, to skip to the end, we use the following mnemonic algorithm. It looks strange, but you can simply think of it as shorthand for the work we just did above.
$$g(y)\cdot\frac{dy}{dx} = f(x)$$
$$g(y)\,dy = f(x)\,dx \tag{1}$$
$$\int g(y)\,dy = \int f(x)\,dx \tag{2}$$
In Step (1), we separate all $x$'s and $y$'s, including in $\frac{dy}{dx}$, by "multiplying" both sides of the equation by $dx$. In Step (2), we attach an integral sign to both sides of the equation. This looks illegal, and indeed is illegal — $\frac{dy}{dx}$ is not a fraction. Again, this procedure is simply a mnemonic device to help you remember the result (*).
Example 3.12.4

The differential equation
$$\frac{dy}{dx} = xe^{-y}$$
is separable, and we now find all of its solutions by using our mnemonic device. We start by cross-multiplying so as to move all $y$'s to the left hand side and all $x$'s (including $dx$) to the right hand side.
$$e^y\,dy = x\,dx$$
Then we integrate both sides.
$$\int e^y\,dy = \int x\,dx \iff e^y = \frac{x^2}{2} + C$$
The $C$ on the right hand side contains both the arbitrary constant for the indefinite integral $\int e^y\,dy$ and the arbitrary constant for the indefinite integral $\int x\,dx$. Finally, we solve for $y$, which is really a function of $x$.
$$y(x) = \ln\left(\frac{x^2}{2}+C\right)$$
Note that $C$ is an arbitrary constant. It can take any value. It cannot be determined by the differential equation itself. In applications $C$ is usually determined by a requirement that $y$ take some prescribed value (determined by the application) when $x$ is some prescribed value. (We call these types of problems "initial value" problems. The given constants are "initial conditions.") For example, suppose that we wish to find a function $y(x)$ that obeys both
$$\frac{dy}{dx} = xe^{-y}\qquad\text{and}\qquad y(0) = 1$$
We know that, to have $\frac{dy}{dx} = xe^{-y}$ satisfied, we must have $y(x) = \ln\left(\frac{x^2}{2}+C\right)$, for some constant $C$. To also have the initial condition $y(0) = 1$, we must have
$$1 = y(0) = \ln\left(\frac{x^2}{2}+C\right)\bigg|_{x=0} = \ln C \iff \ln C = 1 \iff C = e$$
So our final solution is $y(x) = \ln\left(\frac{x^2}{2}+e\right)$.

Example 3.12.4
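A computer algebra system makes it easy to confirm that a candidate function really does solve an initial value problem. The Python sketch below is ours and assumes the SymPy library is available; it substitutes the solution from Example 3.12.4 back into the differential equation and checks the initial condition.

    import sympy as sp

    x = sp.symbols('x')
    y = sp.log(x**2 / 2 + sp.E)     # candidate solution from Example 3.12.4

    # The residual dy/dx - x*e^(-y) should simplify to 0,
    # and y(0) should equal the initial value 1.
    print(sp.simplify(sp.diff(y, x) - x * sp.exp(-y)))   # 0
    print(y.subs(x, 0))                                  # 1

This kind of check is a good habit: verifying a solution only requires differentiation, which is much easier than solving the equation in the first place.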

Example 3.12.5

Solve $\frac{dy}{dx} = y^2$.

Solution. When $y \ne 0$, we can use our mnemonic.
$$\frac{dy}{dx} = y^2$$
$$\frac{dy}{y^2} = dx$$
$$\int\frac{dy}{y^2} = \int dx$$
$$\frac{y^{-1}}{-1} = x + C$$
$$y = -\frac{1}{x+C}$$
When $y = 0$, this computation breaks down because $\frac{dy}{y^2}$ contains a division by $0$. We can check if the function $y(x) = 0$ satisfies the differential equation by just subbing it in:
$$y(x) = 0 \implies y'(x) = 0,\ y(x)^2 = 0 \implies y'(x) = y(x)^2$$
So $y(x) = 0$ is a solution and the full solution is:
$$y(x) = 0\qquad\text{or}\qquad y(x) = -\frac{1}{x+C},\ \text{for any constant } C$$

Example 3.12.5

Example 3.12.6 (War Moods)

In the article War Moods: 1⁷⁰, researcher Lewis Richardson models the proportion of a population eager for war using a model previously applied to the spread of infectious diseases. (We note here an important quote from the paper: "To describe a phenomenon is not to praise it." Understanding the social psychology of public support for war may lead to strategies for preventing conflicts.)
A simplified version of Richardson's model for the lead-up to hostilities is as follows. Let $y$ be the proportion of a population that supports going to war, with the rest of the population against going to war. Then the rate of change of $y$ over time is proportional to the product of the proportion of people who are pro-war and the proportion of people who are anti-war. The reasoning is roughly⁷¹ that $y$ changes as pro-war people encounter anti-war people.
That corresponds to the differential equation
$$\frac{dy}{dt} = Cy(1-y)$$
where $1-y$ is the proportion of people who are anti-war.

70 War Moods: 1 by Richardson, PSYCHOMETRIKA–Vol. 13, no. 3 September, 1948. You can access the full text with your UBC CWL at this link.
71 The actual paper has more subtlety, including considering populations of rival nations, and the progression of public sentiment as a war drags on.

Let's solve this differential equation.
$$\frac{dy}{dt} = Cy(1-y)$$
$$\frac{1}{y(1-y)}\,dy = C\,dt$$
$$\int\frac{1}{y(1-y)}\,dy = \int C\,dt$$
Using the method of partial fraction decomposition from Section 3.8,
$$\int\left[\frac{1}{y}+\frac{1}{1-y}\right]dy = \int C\,dt$$
$$\ln y - \ln(1-y) = Ct + D$$
$$\ln\left(\frac{y}{1-y}\right) = Ct + D$$
$$\frac{y}{1-y} = e^{Ct+D}$$
$$y = (1-y)e^{Ct+D} = e^{Ct+D} - ye^{Ct+D}$$
$$y\big(1+e^{Ct+D}\big) = e^{Ct+D}$$
$$y = \frac{e^{Ct+D}}{1+e^{Ct+D}}$$
where $D$ is some constant.
The graph of this function is an S-shaped curve, rising from values near $0$ to values near $1$.
In this model, there is a quick change from low support for war ($y \approx 0$) to high support for war ($y \approx 1$). The paper notes, regarding the first world war: "There is evidence ... that the majority of Britishers changed their opinions about war with Germany during a week in 1914 between July 24 and August 4."

Example 3.12.6

Example 3.12.7 (Fish growth)

Professor Daniel Pauly of the UBC Institute for the Oceans and Fisheries considered the following model of fish growth in the paper A précis of Gill-Oxygen Limitation Theory (GOLT), with some Emphasis on the Eastern Mediterranean⁷².
Let $w(t)$ be the weight of an individual fish over time. The rate at which it is able to synthesize proteins (and other necessary substances) is proportional to $w^d$, while the rate at which its proteins need to be replaced is proportional to $w$. So,
$$\frac{dw}{dt} = Hw^d - kw$$
where $Hw^d$ is the rate at which new proteins are built, and $kw$ is the rate at which they need to be replaced. Because the rate of production is limited by the rate of oxygen intake (which itself is proportional to gill size), the exponent $d$ is less than one.
The paper notes that researchers often neglect oxygen impacts on fish growth — a decision not supported by this model.
Suppose $d = 0.5$ for a particular small species of fish. What is $w(t)$? How large would the fish grow, if it grew indefinitely?

Solution. The differential equation is separable.
$$\frac{dw}{dt} = H\sqrt{w} - kw$$
$$\frac{1}{H\sqrt{w}-kw}\,dw = dt$$
$$\int\frac{1}{H\sqrt{w}-kw}\,dw = \int 1\,dt$$
$$\int\frac{1}{\sqrt{w}\,(H-k\sqrt{w})}\,dw = \int 1\,dt$$
For the left-hand side, we use the substitution $u = H - k\sqrt{w}$, $-\frac{2}{k}\,du = \frac{1}{\sqrt{w}}\,dw$:
$$-\frac{2}{k}\int\frac{1}{u}\,du = \int 1\,dt$$
$$-\frac{2}{k}\ln|u| = t + C$$
$$\ln|u| = -\frac{k}{2}t + C\qquad\text{(remember $C$ is an arbitrary constant)}$$
$$|u| = e^{-\frac{k}{2}t+C}$$
$$\big|H - k\sqrt{w}\big| = e^{-\frac{k}{2}t+C}$$

72 PAULY, D. (2019). A précis of Gill-Oxygen Limitation Theory (GOLT), with some Emphasis on the Eastern Mediterranean. Mediterranean Marine Science, 20(4), 660-668. doi:http://dx.doi.org/10.12681/mms.19285

If we're studying young fish growing to adulthood, then $\frac{dw}{dt} > 0$, because the fish are getting bigger. Under this assumption, $H - k\sqrt{w} > 0$, so we can drop the absolute value signs.
$$H - k\sqrt{w} = e^{-\frac{k}{2}t+C}$$
$$k\sqrt{w} = H - e^{-\frac{k}{2}t+C}$$
$$\sqrt{w} = \frac{1}{k}\left(H - e^{-\frac{k}{2}t+C}\right)$$
$$w = \frac{1}{k^2}\left(H - e^{-\frac{k}{2}t+C}\right)^2$$
The graph of this function rises and levels off, which suggests the existence of a horizontal asymptote. Indeed:
$$\lim_{t\to\infty}\frac{1}{k^2}\left(H - e^{-\frac{k}{2}t+C}\right)^2 = \left(\frac{H}{k}\right)^2$$
So, aging fish who grow according to our model approach the weight $\left(\frac{H}{k}\right)^2$.

Example 3.12.7

Definition 3.12.8.

A differential equation of the form
$$\frac{dy}{dx} = a(y-b)$$
where $a$ and $b$ are constants is called a first-order linear differential equation.

"First-order" means the equation has a first derivative, but no higher-order derivatives (e.g. no second derivatives). The right hand side is a linear expression in the variable $y$.
Example 3.12.9

Let $a$ and $b$ be any two constants. We'll now solve the family of differential equations
$$\frac{dy}{dx} = a(y-b)$$
using our mnemonic device.
$$\frac{dy}{y-b} = a\,dx \implies \int\frac{dy}{y-b} = \int a\,dx \implies \ln|y-b| = ax + c \implies |y-b| = e^{ax+c} = e^c e^{ax} \implies y - b = Ce^{ax}$$
where $C$ is either $+e^c$ or $-e^c$. Note that as $c$ runs over all real numbers, $+e^c$ runs over all strictly positive real numbers and $-e^c$ runs over all strictly negative real numbers. So, so far, $C$ can be any real number except $0$. But we were a bit sloppy here. We implicitly assumed that $y - b$ was nonzero, so that we could divide it across. Nonetheless, the constant function $y = b$, which corresponds to $C = 0$, is a perfectly good solution — when $y$ is the constant function $y = b$, both $\frac{dy}{dx}$ and $a(y-b)$ are zero. So the general solution to $\frac{dy}{dx} = a(y-b)$ is $y(x) = Ce^{ax} + b$, where the constant $C$ can be any real number. Note that when $y(x) = Ce^{ax}+b$ we have $y(0) = C + b$. So $C = y(0) - b$ and the general solution is
$$y(x) = \big(y(0)-b\big)e^{ax} + b$$

Example 3.12.9

This is worth stating as a theorem.

Theorem 3.12.10.

Let $a$ and $b$ be constants. The differentiable function $y(x)$ obeys the differential equation
$$\frac{dy}{dx} = a(y-b)$$
if and only if
$$y(x) = \big(y(0)-b\big)e^{ax} + b$$

One solution to the differential equation
$$\frac{dy}{dt} = a(y-b)$$
is the constant function
$$y = b.$$
We call this the "steady state" solution. Steady, because $y$ is never changing – it's always $b$.
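If you want a symbolic double-check of Theorem 3.12.10, the short SymPy sketch below (ours; it assumes SymPy is installed) substitutes the claimed general solution into the differential equation and confirms the initial value.

    import sympy as sp

    x, a, b, y0 = sp.symbols('x a b y0')
    y = (y0 - b) * sp.exp(a * x) + b      # claimed general solution

    # Residual of dy/dx = a(y - b) should be 0, and y(0) should be y0.
    print(sp.simplify(sp.diff(y, x) - a * (y - b)))   # 0
    print(sp.simplify(y.subs(x, 0)))                  # y0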


Definition 3.12.11.

A constant function $y = b$ that satisfies a differential equation of the form
$$\frac{dy}{dt} = g(y)$$
is a steady state solution. The constant $b$ is a steady state of the differential equation if $\frac{dy}{dt}\Big|_{y=b} = g(b) = 0$.

Example 3.12.12

A glucose solution is administered intravenously into the bloodstream at a constant rate $r$. As the glucose is added, it is converted into other substances at a rate that is proportional to the concentration at that time. The concentration, $C(t)$, of the glucose in the bloodstream at time $t$ obeys the differential equation
$$\frac{dC}{dt} = r - kC$$
where $k$ is a positive constant of proportionality.

(a) Express $C(t)$ in terms of $k$ and $C(0)$.

(b) Find $\lim_{t\to\infty} C(t)$.

(c) Find the steady-state solution to the differential equation.

Solution. (a) Since $r - kC = -k\left(C - \frac{r}{k}\right)$ the given equation is
$$\frac{dC}{dt} = -k\left(C - \frac{r}{k}\right)$$
which is of the form solved in Theorem 3.12.10 with $a = -k$ and $b = \frac{r}{k}$. So the solution is
$$C(t) = \frac{r}{k} + \left(C(0) - \frac{r}{k}\right)e^{-kt}$$
(b) For any $k > 0$, $\lim_{t\to\infty} e^{-kt} = 0$. Consequently, for any $C(0)$ and any $k > 0$, $\lim_{t\to\infty} C(t) = \frac{r}{k}$. We could have predicted this limit without solving for $C(t)$. If we assume that $C(t)$ approaches some equilibrium value $C_e$ as $t$ approaches infinity, then taking the limits of both sides of $\frac{dC}{dt} = r - kC$ as $t \to \infty$ gives
$$0 = r - kC_e \implies C_e = \frac{r}{k}$$
(c) $\frac{dC}{dt} = 0$ when $C = \frac{r}{k}$, so $C = \frac{r}{k}$ is the steady-state solution. That is, if we have a blood concentration of glucose of $\frac{r}{k}$, the concentration is staying the same even as the glucose is injected and metabolized.

Example 3.12.12


3.12.1 §§ (Optional) Logistic Growth


Suppose that we wish to predict the size $P(t)$ of a population as a function of the time $t$. In the most naive model of population growth, each couple produces $\beta$ offspring (for some constant $\beta$) and then dies. Thus over the course of one generation $\beta\frac{P(t)}{2}$ children are produced and $P(t)$ parents die so that the size of the population grows from $P(t)$ to
$$P(t+t_g) = \underbrace{P(t) + \beta\frac{P(t)}{2}}_{\text{parents+offspring}} - \underbrace{P(t)}_{\text{parents die}} = \frac{\beta}{2}P(t)$$
where $t_g$ denotes the lifespan of one generation. The rate of change of the size of the population per unit time is
$$\frac{P(t+t_g)-P(t)}{t_g} = \frac{1}{t_g}\Big[\frac{\beta}{2}P(t) - P(t)\Big] = bP(t)$$
where $b = \frac{\beta-2}{2t_g}$ is the net birthrate per member of the population per unit time. If we approximate
$$\frac{P(t+t_g)-P(t)}{t_g} \approx \frac{dP}{dt}(t)$$
we get the differential equation
$$\frac{dP}{dt} = bP(t) \tag{3.12.1}$$
By Theorem 3.12.10,
$$P(t) = P(0)\cdot e^{bt} \tag{3.12.2}$$
This is called the Malthusian⁷³ growth model. It is, of course, very simplistic. One of its main characteristics is that, since $P(t+T) = P(0)\cdot e^{b(t+T)} = P(t)\cdot e^{bT}$, every time you add $T$ to the time, the population size is multiplied by $e^{bT}$. In particular, the population size doubles every $\frac{\ln 2}{b}$ units of time.
Example 3.12.13

In 1927 the population of the world was about 2 billion. In 1974 it was about 4 billion. Estimate when it reached 6 billion. What will the population of the world be in 2100, assuming the Malthusian growth model?

Solution.

• Let $P(t)$ be the world's population, in billions, $t$ years after 1927. Note that 1974 corresponds to $t = 1974 - 1927 = 47$.

73 This is named after Rev. Thomas Robert Malthus. He described this model in a 1798 paper called "An essay on the principle of population".

• We are assuming that $P(t)$ obeys equation (3.12.1). So, by (3.12.2),
$$P(t) = P(0)\cdot e^{bt}$$
Notice that there are 2 unknowns here — $b$ and $P(0)$ — so we need two pieces of information to find them.

• We are told $P(0) = 2$, so
$$P(t) = 2\cdot e^{bt}$$

• We are also told $P(47) = 4$, which gives
$$4 = 2\cdot e^{47b}\qquad\text{clean up}$$
$$e^{47b} = 2\qquad\text{take the log and clean up}$$
$$b = \frac{\ln 2}{47} = 0.0147\qquad\text{to 3 decimal places}$$

• We now know $P(t)$ completely, so we can easily determine the predicted population⁷⁴ in 2100, i.e. at $t = 2100-1927 = 173$.
$$P(173) = 2e^{173b} = 2e^{173\times0.0147} = 25.4\text{ billion}$$

• Finally, our crude model predicts that the population is 6 billion at the time $t$ that obeys
$$P(t) = 2e^{bt} = 6\qquad\text{clean up}$$
$$e^{bt} = 3\qquad\text{take the log and clean up}$$
$$t = \frac{\ln 3}{b} = 47\,\frac{\ln 3}{\ln 2} = 74.5$$
which corresponds⁷⁵ to the middle of 2001.

Example 3.12.13
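These numbers are easy to reproduce with a few lines of Python. The sketch below is ours (plain Python 3, standard math module); it recovers the growth rate $b$ from the doubling data and then evaluates the model's predictions.

    import math

    # Malthusian model P(t) = 2 e^(b t), in billions, t in years after 1927.
    b = math.log(2) / 47                 # from P(47) = 4

    print(round(b, 4))                   # 0.0147
    print(2 * math.exp(b * 173))         # 2100 prediction: about 25.7 billion
                                         # (about 25.4 if b is first rounded to 0.0147)
    print(1927 + math.log(3) / b)        # year the model hits 6 billion: about 2001.5

Comparing the 2100 prediction with the United Nations figure in the footnote shows just how crude the Malthusian model is over long time horizons.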

The Malthusian growth model can be a reasonably good model only when the population size is very small compared to its environment⁷⁶. A more sophisticated model of population growth takes into account the "carrying capacity of the environment."
Logistic growth adds one more wrinkle to the simple population model. It assumes that the population only has access to limited resources. As the size of the population grows the amount of food available to each member decreases. This in turn causes the net birth rate $b$ to decrease. In the logistic growth model $b = b_0\left(1-\frac{P}{K}\right)$, where $K$ is called the carrying capacity of the environment, so that
$$P'(t) = b_0\left(1 - \frac{P(t)}{K}\right)P(t)$$

74 The 2015 Revision of World Population, a publication of the United Nations, predicts that the world's population in 2100 will be about 11 billion. But "about" covers a pretty large range. They give an 80% confidence interval running from 10 billion to 12.5 billion.
75 The world population really reached 6 billion in about 1999.
76 That is, the population has plenty of food and space to grow.


Figure 3.12.1.

Below is a graph of $P'(t) = b_0\left(1-\frac{P(t)}{K}\right)P(t)$. Pay attention to the axis labels: the independent (horizontal) axis is population, $P$. It is not time. The dependent (vertical) axis is rate of change of population, $\frac{dP}{dt}$.

(Figure: the graph of $\frac{dP}{dt}$ against $P$ is a downward-opening parabola that crosses the $P$-axis at $P = 0$ and $P = K$.)
• When $P = 0$, there are no individuals in the population, so its growth rate is zero. (Extinct animals do not usually reproduce.)

• When $0 < P < K$, the population is less than the carrying capacity of its environment, so the population grows ($\frac{dP}{dt} > 0$).

• When $P > K$, the population has outgrown the capacity of the environment to support it. Then $\frac{dP}{dt} < 0$, as the population experiences a higher death rate than birth rate.

This is a separable differential equation and we can solve it explicitly. We shall do so shortly (in Example 3.12.14, below). But, before doing that, we'll see what we can learn about the behaviour of solutions to differential equations like this without finding formulae for the solutions. It turns out that we can learn a lot just by watching the sign of $P'(t)$. For concreteness, we'll look at solutions of the differential equation
$$\frac{dP}{dt}(t) = \big(6000 - 3P(t)\big)P(t)$$
We'll sketch the graphs of four functions $P(t)$ that obey this equation.

• For the first function, P(0) = 0.


• For the second function, P(0) = 1000.
• For the third function, P(0) = 2000.
• For the fourth function, P(0) = 3000.

The sketches will be based on the observation that $(6000-3P)P = 3(2000-P)P$


• is zero for $P = 0,\ 2000$,

• is strictly positive for $0 < P < 2000$ and

• is strictly negative for $P > 2000$.

Consequently
$$\frac{dP}{dt}(t)\ \begin{cases} = 0 & \text{if } P(t) = 0\\ > 0 & \text{if } 0 < P(t) < 2000\\ = 0 & \text{if } P(t) = 2000\\ < 0 & \text{if } P(t) > 2000\end{cases}$$
Thus if $P(t)$ is some function that obeys $\frac{dP}{dt}(t) = \big(6000-3P(t)\big)P(t)$, then as the graph of $P(t)$ passes through the point $\big(t, P(t)\big)$
$$\text{the graph has}\ \begin{cases}\text{slope zero, i.e. is horizontal,} & \text{if } P(t) = 0\\ \text{positive slope, i.e. is increasing,} & \text{if } 0 < P(t) < 2000\\ \text{slope zero, i.e. is horizontal,} & \text{if } P(t) = 2000\\ \text{negative slope, i.e. is decreasing,} & \text{if } P(t) > 2000\end{cases}$$
as illustrated in the figure

(Figure: short slope segments drawn at the heights $P = 0$, $1000$, $2000$ and $3000$, showing horizontal, increasing, horizontal and decreasing behaviour respectively.)

As a result,

• if $P(0) = 0$, the graph starts out horizontally. In other words, as $t$ starts to increase, $P(t)$ remains at zero, so the slope of the graph remains at zero. The population size remains zero for all time. As a check, observe that the function $P(t) = 0$ obeys $\frac{dP}{dt}(t) = \big(6000-3P(t)\big)P(t)$ for all $t$.

• Similarly, if $P(0) = 2000$, the graph again starts out horizontally. So $P(t)$ remains at $2000$ and the slope remains at zero. The population size remains $2000$ for all time. Again, the function $P(t) = 2000$ obeys $\frac{dP}{dt}(t) = \big(6000-3P(t)\big)P(t)$ for all $t$.

• If $P(0) = 1000$, the graph starts out with positive slope. So $P(t)$ increases with $t$. As $P(t)$ increases towards $2000$, the slope $\big(6000-3P(t)\big)P(t)$, while remaining positive, gets closer and closer to zero. As the graph approaches height $2000$, it becomes more and more horizontal. The graph cannot actually cross from below $2000$ to above $2000$, because to do so it would have to have strictly positive slope for some value of $P$ above $2000$, which is not allowed.

• If $P(0) = 3000$, the graph starts out with negative slope. So $P(t)$ decreases with $t$. As $P(t)$ decreases towards $2000$, the slope $\big(6000-3P(t)\big)P(t)$, while remaining negative, gets closer and closer to zero. As the graph approaches height $2000$, it becomes more and more horizontal. The graph cannot actually cross from above $2000$ to below $2000$, because to do so it would have to have negative slope for some value of $P$ below $2000$, which is not allowed.

These curves are sketched in the figure below. We conclude that for any initial population size $P(0)$, except $P(0) = 0$, the population size approaches $2000$ as $t \to \infty$.

(Figure: the four solution curves, starting at heights $0$, $1000$, $2000$ and $3000$, with the curves starting at $1000$ and $3000$ levelling off toward the horizontal line $P = 2000$.)

Now we’ll do an example in which we explicitly solve the logistic growth equation.
Example 3.12.14

In 1986, the population of the world was 5 billion and was increasing at a rate of 2% per year. Using the logistic growth model with an assumed maximum population of 100 billion, predict the population of the world in the years 2000, 2100 and 2500.

Solution. Let $y(t)$ be the population of the world, in billions of people, at time $1986+t$. The logistic growth model assumes
$$y' = ay(K-y)$$
where $K$ is the carrying capacity and $a = \frac{b_0}{K}$.
First we'll determine the values of the constants $a$ and $K$ from the given data.

• We know that, if at time zero the population is below $K$, then as time increases the population increases, approaching the limit $K$ as $t$ tends to infinity. So in this problem $K$ is the maximum population. That is, $K = 100$.

• We are also told that, at time zero, the percentage rate of change of population, $100\frac{y'}{y}$, is 2, so that, at time zero, $\frac{y'}{y} = 0.02$. But, from the differential equation, $\frac{y'}{y} = a(K-y)$. Hence at time zero, $0.02 = a(100-5)$, so that $a = \frac{2}{9500}$.

We now know $a$ and $K$ and can solve the (separable) differential equation
$$\frac{dy}{dt} = ay(K-y) \implies \int\frac{dy}{y(K-y)} = \int a\,dt \implies \int\frac{1}{K}\Big[\frac{1}{y} - \frac{1}{y-K}\Big]dy = \int a\,dt$$
$$\implies \frac{1}{K}\big[\ln|y| - \ln|y-K|\big] = at + C$$
$$\implies \ln\frac{|y|}{|y-K|} = aKt + CK \implies \Big|\frac{y}{y-K}\Big| = De^{aKt}$$
with $D = e^{CK}$. We know that $y$ remains between $0$ and $K$, so that $\Big|\frac{y}{y-K}\Big| = \frac{y}{K-y}$ and our solution obeys
$$\frac{y}{K-y} = De^{aKt}$$
At this stage, we know the values of the constants $a$ and $K$, but not the value of the constant $D$. We are given that at $t = 0$, $y = 5$. Subbing in this, and the values of $K$ and $a$,
$$\frac{5}{100-5} = De^0 \implies D = \frac{5}{95}$$
So the solution obeys the algebraic equation
$$\frac{y}{100-y} = \frac{5}{95}e^{2t/95}$$
which we can solve to get $y$ as a function of $t$.
$$y = (100-y)\,\frac{5}{95}e^{2t/95} \implies 95y = (500-5y)e^{2t/95} \implies \big(95+5e^{2t/95}\big)y = 500e^{2t/95}$$
$$\implies y = \frac{500e^{2t/95}}{95+5e^{2t/95}} = \frac{100e^{2t/95}}{19+e^{2t/95}} = \frac{100}{1+19e^{-2t/95}}$$
Finally,

• In the year 2000, $t = 14$ and $y = \frac{100}{1+19e^{-28/95}} \approx 6.6$ billion.

• In the year 2100, $t = 114$ and $y = \frac{100}{1+19e^{-228/95}} \approx 36.7$ billion.

• In the year 2500, $t = 514$ and $y = \frac{100}{1+19e^{-1028/95}} \approx 100$ billion.

Example 3.12.14
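The final formula is simple enough to tabulate by machine. The Python sketch below is ours (plain Python 3, standard math module); it evaluates the logistic solution from Example 3.12.14 at the requested years.

    import math

    # Logistic solution from Example 3.12.14: y(t) = 100 / (1 + 19 e^(-2t/95)),
    # in billions, with t measured in years after 1986.
    def y(t):
        return 100 / (1 + 19 * math.exp(-2 * t / 95))

    for year in [2000, 2100, 2500]:
        print(year, round(y(year - 1986), 1))   # 6.6, 36.7, then essentially 100

Notice how much more slowly the logistic prediction approaches its ceiling of 100 billion than the Malthusian model of Example 3.12.13 grows.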

3.12.2 §§ (Optional) Interest on Investments and Loans

Suppose that you deposit $P in a bank account at time $t = 0$. The account pays $r\%$ interest per year compounded $n$ times per year.

• The first interest payment is made at time $t = \frac{1}{n}$. Because the balance in the account during the time interval $0 < t < \frac{1}{n}$ is $P and interest is being paid for $\frac{1}{n}^{\text{th}}$ of a year, that first interest payment is $\frac{1}{n}\times\frac{r}{100}\times P$. After the first interest payment, the balance in the account is $P + \frac{1}{n}\times\frac{r}{100}\times P = \left(1+\frac{r}{100n}\right)P$.

• The second interest payment is made at time $t = \frac{2}{n}$. Because the balance in the account during the time interval $\frac{1}{n} < t < \frac{2}{n}$ is $\left(1+\frac{r}{100n}\right)P$ and interest is being paid for $\frac{1}{n}^{\text{th}}$ of a year, the second interest payment is $\frac{1}{n}\times\frac{r}{100}\times\left(1+\frac{r}{100n}\right)P$. After the second interest payment, the balance in the account is $\left(1+\frac{r}{100n}\right)P + \frac{1}{n}\times\frac{r}{100}\times\left(1+\frac{r}{100n}\right)P = \left(1+\frac{r}{100n}\right)^2 P$.

• And so on.

In general, at time $t = \frac{m}{n}$ (just after the $m^{\text{th}}$ interest payment), the balance in the account is
$$B(t) = \left(1+\frac{r}{100n}\right)^m P = \left(1+\frac{r}{100n}\right)^{nt} P \tag{3.12.3}$$
Three common values of $n$ are 1 (interest is paid once a year), 12 (i.e. interest is paid once a month) and 365 (i.e. interest is paid daily). The limit $n \to \infty$ is called continuous compounding⁷⁷. Under continuous compounding, the balance at time $t$ is
$$B(t) = \lim_{n\to\infty}\left(1+\frac{r}{100n}\right)^{nt} P$$
You may have already seen the limit
$$\lim_{x\to0}(1+x)^{a/x} = e^a \tag{3.12.4}$$
If so, you can evaluate $B(t)$ by applying (3.12.4) with $x = \frac{r}{100n}$ and $a = \frac{rt}{100}$ (so that $\frac{a}{x} = nt$). As $n \to \infty$, $x \to 0$ so that
$$B(t) = \lim_{n\to\infty}\left(1+\frac{r}{100n}\right)^{nt} P = \lim_{x\to0}(1+x)^{a/x} P = e^a P = e^{rt/100} P \tag{3.12.5}$$
If you haven't seen (3.12.4) before, that's OK. In the following example, we rederive (3.12.5) using a differential equation instead of (3.12.4).
Example 3.12.15

Suppose, again, that you deposit $P in a bank account at time $t = 0$, and that the account pays $r\%$ interest per year compounded $n$ times per year, and denote by $B(t)$ the balance at time $t$. Suppose that you have just received an interest payment at time $t$. Then the next interest payment will be made at time $t + \frac{1}{n}$ and will be $\frac{1}{n}\times\frac{r}{100}\times B(t) = \frac{r}{100n}B(t)$. So, calling $\frac{1}{n} = h$,
$$B(t+h) = B(t) + \frac{r}{100}B(t)h\qquad\text{or}\qquad\frac{B(t+h)-B(t)}{h} = \frac{r}{100}B(t)$$
To get continuous compounding we take the limit $n \to \infty$ or, equivalently, $h \to 0$. This gives
$$\lim_{h\to0}\frac{B(t+h)-B(t)}{h} = \frac{r}{100}B(t)\qquad\text{or}\qquad\frac{dB}{dt}(t) = \frac{r}{100}B(t)$$
By Theorem 3.12.10, with $a = \frac{r}{100}$ and $b = 0$,
$$B(t) = e^{rt/100}B(0) = e^{rt/100}P$$
once again.

Example 3.12.15

77 There are banks that advertise continuous compounding. You can find some by googling "interest is compounded continuously and paid".


Example 3.12.16

(a) A bank advertises that it compounds interest continuously and that it will double your
money in ten years. What is the annual interest rate?
(b) A bank advertises that it compounds monthly and that it will double your money in
ten years. What is the annual interest rate?

Solution. (a) Let the interest rate be r% per year. If you start with $P, then after t years,
you have Pe^(rt/100), under continuous compounding. This was (3.12.5). After 10 years you
have Pe^(r/10). This is supposed to be 2P, so

    Pe^(r/10) = 2P  ⟹  e^(r/10) = 2  ⟹  r/10 = ln 2  ⟹  r = 10 ln 2 ≈ 6.93%

(b) Let the interest rate be r% per year. If you start with $P, then after t years, you have
P(1 + r/(100 × 12))^(12t), under monthly compounding. This was (3.12.3). After 10 years
you have P(1 + r/(100 × 12))^120. This is supposed to be 2P, so

    P(1 + r/1200)^120 = 2P  ⟹  (1 + r/1200)^120 = 2  ⟹  1 + r/1200 = 2^(1/120)
                            ⟹  r/1200 = 2^(1/120) − 1  ⟹  r = 1200 (2^(1/120) − 1) ≈ 6.95%

Example 3.12.16
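
If you would like to verify these two answers numerically, a couple of lines of Python will do it; the sketch below simply evaluates the two closed-form expressions found above.

import math

# (a) continuous compounding: doubling in 10 years means e^(10r/100) = 2
r_continuous = 10 * math.log(2)            # interest rate in percent
# (b) monthly compounding: (1 + r/1200)^120 = 2
r_monthly = 1200 * (2 ** (1 / 120) - 1)    # interest rate in percent

print(f"continuous: r = {r_continuous:.2f}%")   # about 6.93%
print(f"monthly:    r = {r_monthly:.2f}%")      # about 6.95%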

Example 3.12.17

A 25 year old graduate of UBC is given $50,000 which is invested at 5% per year com-
pounded continuously. The graduate also intends to deposit money continuously at the
rate of $2000 per year. Let A(t) denote the amount of money in the account t years after
the initial investment.
(a) Find a differential equation that A(t) obeys, assuming that the interest rate remains
5%.
(b) Determine the amount of money in the account when the graduate is 65.
(c) At age 65, the graduate will start withdrawing money continuously at the rate of W
dollars per year. If the money must last until the person is 85, what is the largest
possible value of W?

Solution. (a) Let's consider what happens to A over a very short time interval from time
t to time t + ∆t. At time t the account balance is A(t). During the (really short) specified
time interval the balance remains very close to A(t) and so earns interest of (5/100) × ∆t × A(t).
During the same time interval, the graduate also deposits an additional $2000∆t. So

    A(t + ∆t) ≈ A(t) + 0.05 A(t)∆t + 2000∆t  ⟹  (A(t + ∆t) − A(t))/∆t ≈ 0.05 A(t) + 2000


In the limit ∆t → 0, the approximation becomes exact and we get

    dA/dt = 0.05 A + 2000
(b) The amount of money at time t obeys

    dA/dt = 0.05 A(t) + 2,000 = 0.05 (A(t) + 40,000)

So by Theorem 3.12.10 (with a = 0.05 and b = −40,000),

    A(t) = (A(0) + 40,000) e^(0.05t) − 40,000

At time 0 (when the graduate is 25), A(0) = 50,000, so the amount of money at time t is

    A(t) = 90,000 e^(0.05t) − 40,000

In particular, when the graduate is 65 years old, t = 40 and

    A(40) = 90,000 e^(0.05×40) − 40,000 = $625,015.05

(c) When the graduate stops depositing money and instead starts withdrawing money at
a rate W, the equation for A becomes

    dA/dt = 0.05 A − W = 0.05 (A − 20W)

assuming that the interest rate remains 5%. This time, Theorem 3.12.10 (with a = 0.05 and
b = 20W) gives

    A(t) = (A(0) − 20W) e^(0.05t) + 20W

If we now reset our clock so that t = 0 when the graduate is 65, A(0) = 625,015.05. So the
amount of money at time t is

    A(t) = 20W + e^(0.05t) (625,015.05 − 20W)

We want the account to be depleted when the graduate is 85. So, we want A(20) = 0. This
is the case if

    20W + e^(0.05×20) (625,015.05 − 20W) = 0  ⟹  20W + e(625,015.05 − 20W) = 0
                                              ⟹  20(e − 1)W = 625,015.05 e
                                              ⟹  W = 625,015.05 e / (20(e − 1)) = $49,437.96

Example 3.12.17
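
As a check on the arithmetic, the following Python sketch (illustrative only) evaluates the closed-form formulas found in parts (b) and (c).

import math

# Phase 1 (ages 25 to 65): dA/dt = 0.05 A + 2000 with A(0) = 50,000.
# Closed form from part (b): A(t) = (A(0) + 40,000) e^(0.05 t) - 40,000.
A0 = 50_000.0
A40 = (A0 + 40_000) * math.exp(0.05 * 40) - 40_000
print(f"balance at age 65: ${A40:,.2f}")          # about $625,015.05

# Phase 2 (ages 65 to 85): dA/dt = 0.05 A - W, and A should hit 0 after 20 years.
# Setting A(20) = 0 in A(t) = (A40 - 20W) e^(0.05 t) + 20W gives:
W = A40 * math.e / (20 * (math.e - 1))
print(f"largest sustainable withdrawal: ${W:,.2f} per year")   # about $49,437.96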

Example 3.12.18 (Loan Repayment)

Suppose you borrow $750,000 from the bank under the following conditions:


1. You make payments to the bank once a month.


2. Every month, you pay 0.25% of the remaining portion of the loan as interest.
3. Your last payment will be after 300 months (25 years).

Let's consider different ways to structure your payments.

Option 1: In the most naive option, let's assume you pay off 1/300 of the loan each month.
    Then your payments towards the loan itself are always $2500. However, the interest
    you pay changes month by month.
    Suppose P(t) is the amount of the loan still owed to the bank after t monthly payments.
    Then P(t) = 750,000 − 2500t, since after t months you've repaid $2500t.
    The interest you pay in month t is then
    (0.25/100) P(t − 1) = (1/400)(750,000 − 2500(t − 1)) = (25/4)(301 − t).

    [Figure: Option 1 payments over the 300 months, decreasing linearly. The dotted line
    shows monthly interest payments (starting near $1881.25); the solid line shows total
    monthly payments of $2500 plus interest (starting near $4381.25).]

In this option, your actual monthly payments to the bank vary quite a bit over the 25
years of the loan. If you expect your salary to grow over time, you pay the highest
payments early on, when you make the least amount of money. So, this option is not
ideal.
Option 2: Let’s figure out how to pay off the loan in such a way that your monthly pay-
ments are the same each month, for all 300 months. Again, let P(t) be the amount of
the loan left to repay the bank after you’ve made t monthly payments. Each month,
    you pay back some portion of the loan, plus an interest payment of (0.25/100) P(t).
    The amount of the loan you've paid back in month t is P(t − 1) − P(t). In particular,

        P(t − 1) − P(t) = −(P(t) − P(t − 1))/1 ≈ −lim_{h→0} (P(t) − P(t − h))/h = −P′(t)
Thinking of our monthly payments on the loan as how fast P(t) changes, it makes
sense to approximate them by the rate of change of P. The important detail is that


P(t) is decreasing as you pay positive amounts, which is why we use −P′(t) as the
approximation of the amount you paid.
    All together, the amount you pay each month is about:

        loan payment + interest = −P′(t) + (0.25/100) P(t)

    In Option 2, we want this amount to be constant. Let's call that constant monthly
    payment C. This gives us a linear differential equation, C = −P′(t) + (0.25/100) P(t), or

        P′(t) = (1/400)(P(t) − 400C)

    By Theorem 3.12.10,

        P(t) = (P(0) − 400C) e^(t/400) + 400C
             = (750,000 − 400C) e^(t/400) + 400C

    To find C, recall P(300) = 0.

        0 = (750,000 − 400C) e^(300/400) + 400C
        400C (e^(3/4) − 1) = 750,000 e^(3/4)
        C = 750,000 e^(3/4) / (400 (e^(3/4) − 1)) = 1875 e^(3/4)/(e^(3/4) − 1) ≈ 3553.60

    So, a monthly payment of roughly $3553.60 would be sufficient to pay off the loan
    in 25 years. The amount of that monthly payment that goes to the loan itself is about
    −P′(t) = (C − 1875) e^(t/400) = (1875/(e^(3/4) − 1)) e^(t/400), while the rest is interest.

    [Figure: Option 2 payments over the 300 months. The dotted line shows monthly interest
    payments (starting near $1875.00 and decreasing); the solid line shows total monthly
    payments (principal + interest), constant at about $3553.60.]


Initial payments consist of roughly equal parts interest and principal. Over time,
payments consist of more and more principal, with less and less interest.
We note here that the Government of Canada mortgage calculator gives a monthly payment of
$3,549.34 for a mortgage of $750,000 with annual rate of 3% (0.25% × 12) and amortization
period 25 years. It also mentions that the total interest paid will be $314,802.37.

Aside from monthly payments, we can also look at the total amount of interest paid in
the two scenarios. In Option 1, the amount of interest paid in month t was (25/4)(301 − t).
So, over 300 months, the total interest paid was:

    Σ_{t=1}^{300} (25/4)(301 − t) = (25/4) (301 · 300 − Σ_{t=1}^{300} t)
                                  = (25/4) (301 · 300 − (300 · 301)/2)
                                  = (25/4) · (300 · 301)/2
                                  = 282,187.50
For Option 2, in month t, interest paid was approximately
(1/400) P(t) = (1875/(e^(3/4) − 1)) (e^(3/4) − e^(t/400)).
Total interest is then approximately:

    Σ_{t=1}^{300} (1875/(e^(3/4) − 1)) (e^(3/4) − e^(t/400))
        = (1875/(e^(3/4) − 1)) (300 · e^(3/4) − Σ_{t=1}^{300} e^(t/400))

For lack of a nice formula, we'll interpret the sum as a Riemann sum. It corresponds to the
right-hand Riemann sum for the area under the curve f(t) = e^(t/400) from t = 0 to t = 300,
using 300 intervals.

    ≈ (1875/(e^(3/4) − 1)) (300 · e^(3/4) − ∫_0^300 e^(t/400) dt)
    = (1875/(e^(3/4) − 1)) (300 · e^(3/4) − [400 e^(t/400)]_{t=0}^{300})
    = (1875/(e^(3/4) − 1)) (300 · e^(3/4) − 400 (e^(3/4) − 1))
    ≈ 316,081.01

Option 2 is more expensive than Option 1.


Example 3.12.18
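
The comparison between the two options can also be reproduced with a short computation. The Python sketch below is illustrative: Option 1 is simulated month by month, while Option 2 uses the constant payment C derived above.

import math

PRINCIPAL = 750_000.0
MONTHLY_RATE = 0.0025        # 0.25% of the remaining loan per month
N_MONTHS = 300

# Option 1: repay 1/300 of the principal each month, plus interest on the balance owed.
balance = PRINCIPAL
option1_interest = 0.0
for _ in range(N_MONTHS):
    option1_interest += MONTHLY_RATE * balance
    balance -= PRINCIPAL / N_MONTHS

# Option 2: the constant payment C from the differential-equation model above.
C = 1875 * math.exp(0.75) / (math.exp(0.75) - 1)

print(f"Option 1 total interest:  ${option1_interest:,.2f}")        # about $282,187.50
print(f"Option 2 constant payment: ${C:,.2f}")                      # about $3,553.60
print(f"Option 2 total interest:  ${C * N_MONTHS - PRINCIPAL:,.2f}")  # roughly $316,000

The Option 2 total here is the exact sum of 300 constant payments minus the principal; it is close to, but not identical to, the Riemann-sum estimate in the example.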

Chapter 3 of this work was adapted from Chapter 1 and Section 2.4 of CLP 2 – Integral
Calculus by Feldman, Rechnitzer, and Yeager under a Creative Commons Attribution-
NonCommercial-ShareAlike 4.0 International license.

Chapter 4

P ROBABILITY

4.1 Introduction
Before we start, a note. Most terms in this introductory section (probability, event, value)
accord pretty well with their usage in everyday life. However, later on in the chapter we
will introduce new vocabulary and notation (PDF, CDF, E) whose interpretations are far
less obvious. Keeping track of definitions will be key to understanding what’s going on.
Make flashcards if you have a hard time remembering different terms. If you read a term
whose meaning you’ve forgotten, look it up! If we don’t have the same vocabulary, then
we aren’t speaking the same language – so it will be difficult to explain things.

4.1.1 §§ Foundational Vocabulary and Notation

Definition4.1.1.

A probability is a number between 0 and 1. We interpret it as a likelihood.

If an outcome of an event has probability 1, it will certainly happen. If an outcome
has probability 0, it will certainly not happen. If an outcome has probability 1/2, it has an
equal chance of happening and not happening. If we have an event with an outcome with
probability 1/2 and the event happens a large number of times, we expect the outcome to
occur in roughly half of those trials.
A random variable is a lot like the ordinary variables you’re used to using in functions,
in that it is a kind of placeholder that can take on different values. The “random” part of
the name explains that the values taken on are the result of an event – some experiment
or random process.


Definition4.1.2.

A random variable (or just variable) is a characteristic or measurement that can


be determined for each outcome of some event.
Random variables are usually denoted with capital letters, like X or Y.

Events result in values. The event of rolling a dice might result in the value 1; the
event of flipping a coin might result in the value heads; and the event of choosing a person
might result in the value Parham. We usually use lower-case letters as variables specifying
values.
We’ll mostly use events that result in numerical values, although coin flips are a handy
experiment as well. Unless otherwise specified, you can assume values will be numbers.
(Otherwise our formulas become quite abstract – we won’t ask you to average people or
integrate colours.)
Example 4.1.3

Let X be the random variable corresponding to the event of rolling a standard 6-sided
dice1 . X can result in any of the values 1, 2, 3, 4, 5, or 6.
Suppose we are playing a game and our points are determined by doubling the num-
ber rolled. We might write the following:
If X = x, the number of points earned is 2x.
The equation X = x looks jarring at first, but becomes natural with practice. Remem-
ber that X corresponds to the event (rolling a dice), while x corresponds to the outcome of
that event (e.g. 5).
Example 4.1.3

Notation4.1.4.

We’ll use the shorthand Pr ( A) to mean “the probability that A happens.” For
example:
Pr ( X = x )
denotes “the probability that the event X results in the value x.”

Example 4.1.5

To express “the probability of a dice roll being 5 is 1/6,” we write:


    Pr(X = 5) = 1/6

1 In the interest of clarity, we’ll use “dice” as its own singular (as is common in colloquial English), rather
than ”die” (which is more standard in academic English).


where X is the event (dice roll), 5 is the value, and 1/6 is the probability.
Example 4.1.5

Example 4.1.6

If X is the roll of a fair dice, then

    Pr(X = 1 or X = 2) = 1/3

Example 4.1.6

Example 4.1.7

Let X be the event of flipping a fair coin (that is, a coin that is equally likely to come up
heads or tails), and let x be one of the values “heads” or “tails.” Then:

    Pr(X = x) = 1/2

Example 4.1.7

Definition4.1.8.

The sample space of an event is the set of all possible outcomes. We will use S
to denote the sample space.

Example 4.1.9

If you roll a standard dice, S = t1, 2, 3, 4, 5, 6u.


Example 4.1.9


Warning4.1.10.

This seemingly straightforward definition can cause some confusion, especially


when measured data is involved.
For example: suppose our random variable X is the mileage of a car picked at
random out of a parking lot. If there are, say, 100 cars in that lot, then there are
(at most) 100 values possible for X to take on. Unfortunately, we do not know
those values.
When we use the context of a supposedly measured variable, we’ll pretend that
it could be anything theoretically sensible. In the case of the cars, we do know
that all of those values will be nonnegative numbers. So, in this case, we could
use the sample space [0, ∞). Alternately, we could say that each of those values is
less than some arbitrarily huge number like 10^12 km2 and use the sample space
[0, 10^12].

Example 4.1.11 (A Probabilistic Model in Linguistics)

These introductory concepts are enough to start understanding probabilistic models in


a wide array of fields. Here we’ll consider a paragraph from a short paper about people’s
decisions to change the language they use over time.
The following quote is taken from the article: Abrams, D., Strogatz, S. Modelling the
dynamics of language death. Nature 424, 900 (2003). DOI: https://doi.org/10.
1038/424900a

Consider a system of two competing languages, X and Y, in which the at-


tractiveness of a language increases with both its number of speakers and its
perceived status (a parameter that reflects the social or economic opportunities
afforded to its speakers). Suppose an individual converts from Y to X with a
probability, per unit of time, of Pyx ( x, s), where x is the fraction of the popula-
tion speaking X, and 0 ď s ď 1 is a measure of X’s relative status. A minimal
model for language change is therefore

dx
= yPyx ( x, s) ´ xPxy ( x, s)
dt
where y = 1 ´ x is the complementary fraction of the population speaking Y
at time t.

Let’s parse this quote in terms of vocabulary that is familiar to us.

• x is the fraction of a population speaking language X at a given time. So if everyone
  is speaking X, then x = 1; if half the population is speaking X, then x = 1/2.

2 that’s farther than driving at 100 kph around the clock for one million years, so it’s safe to say no car
has more mileage than this


• y is the fraction of the population speaking language Y at a given time. Under the
simplified assumptions of the model in the paper, everyone speaks either X or Y,
but not both, at a particular time. So, y = 1 ´ x.
• dx/dt is the rate of change of the fraction of speakers of language X over time. So if
  dx/dt is positive, then x is increasing and so people are changing from language Y to
  language X; if dx/dt is negative, then people are changing from language X to language Y.

• The random event in question is a person changing their language. The three values
in its sample space are: person does not change; person changes from X to Y; and
person changes from Y to X.

• The paper uses notation that is different from this textbook. They write Pyx for "the
  probability that a person changes from Y to X," and they write Pxy for "the probability
  that a person changes from X to Y."

• The probabilities come with arguments: Pyx ( x, s) and Pxy ( x, s). The variables inside
the parentheses are function variables. How likely someone is to switch languages is
not a fixed constant, but rather a function depending on how many people speak the
language, and how much status that language is perceived to have. So, Pyx and Pxy
are functions of multiple variables, like the functions we worked with in Chapter 2.
Now that we understand all the notation, we can figure out where the equation in the
quote came from.
• x increases as people switch from speaking Y to speaking X. Pyx is the proportion of
speakers of Y that we expect to change to X. The number of speakers of Y is y. So,
we expect yPyx people to change from Y to X.

• x decreases as people switch from speaking X to speaking Y. Pxy is the proportion


of speakers of X that we expect to change to Y. The number of speakers of X is x.
So, we expect xPxy people to change from X to Y.

• All together, the change in x is (number of people coming to X from Y) minus (num-
ber of people going to Y from X), or

    dx/dt = y Pyx − x Pxy
which is exactly the equation from the article.

Example 4.1.11
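
For readers who want to experiment with this model, the Python sketch below integrates the differential equation with Euler's method. The specific transition probabilities used here (power-law forms with parameters a and c) are our own illustrative assumptions in the spirit of the paper, not formulas prescribed by this textbook.

# Illustrative choices for the transition probabilities; a and c are made-up parameters.
def p_yx(x, s, a=1.3, c=1.0):
    # assumed probability per unit time that a Y-speaker switches to X
    return c * (x ** a) * s

def p_xy(x, s, a=1.3, c=1.0):
    # assumed probability per unit time that an X-speaker switches to Y
    return c * ((1 - x) ** a) * (1 - s)

def simulate(x0, s, dt=0.01, steps=10_000):
    """Euler's method for dx/dt = y*Pyx - x*Pxy, with y = 1 - x."""
    x = x0
    for _ in range(steps):
        y = 1 - x
        x += dt * (y * p_yx(x, s) - x * p_xy(x, s))
    return x

# Starting with 40% X-speakers and a status advantage for X (s = 0.6):
print(simulate(x0=0.4, s=0.6))   # x should drift toward 1, i.e. X takes over

Changing s below 1/2 (so Y has the higher status) sends x toward 0 instead, which is the "language death" behaviour discussed in the paper.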

4.1.2 §§ Discrete vs Continuous


The distinction between discrete and continuous variables plays an important role in the
way we calculate properties of variables. The formal definition of a continuous variable
will have to wait until Definition 4.3.6. For now, we’ll think of continuous simply as the
alternative to discrete.


Definition4.1.12.

If the sample space of a random variable can be written as a list (as opposed to
existing on a continuum), then the sample space and the random variable are
discrete.

“Written as a list” is an informal description of the mathematical term “countable,”


whose definition3 is beyond the scope of this class. We’ll explain what we mean with
examples.
Example 4.1.13

Let X be the random variable corresponding to choosing a whole number in [1, 10].

1 2 3 4 5 6 7 8 9 10

The values in the sample space can be listed: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10. So, X is discrete.
Example 4.1.13

Example 4.1.14

Let Y be the random variable corresponding to choosing any real number in [1, 10].

[number line showing the continuum from 1 to 10]

S = [1, 10]. There are infinitely many possible values, along a continuum, that could
result. Y is not discrete, it is continuous.
Example 4.1.14

Example 4.1.15

For each of the following events, describe the sample space as discrete or continuous,
where we are still using “continuous” informally as the opposite of “discrete.”

1. Roll three standard dice, add the values.


2. Number of pets you have.4
3. Your exact age at noon today.5
4. Volume of a box.

3 a set is countable if there exists an injective (one-to-one) function from that set to the natural numbers.
4 We aren’t monsters–there’s no such thing as half a pet.
5 Imagine you have perfect precision.


Solution.

1. The sample space is discrete, S = t3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18u.

2. This is also discrete. The number of pets you have is a whole number: 0, 1, 2, 3, etc.

3. You can be any age from 0 to, say, 500 years. If we have exact precision, your age is
a real number: you might be 19.015 years old, or 19.016 years old, or somewhere in
between. So with exact precision, we’re taking numbers along a continuum. This is
a continuous sample space.

4. Similar to the above, if we have exact precision, our answer can be any non-negative
real number, so this is a continuous sample space.

Example 4.1.15

When our variables are describing physical processes, the line between discrete and
continuous can be somewhat blurry. For example, suppose we’re measuring the amount
of water in a container. Volumes in general exist on continuums (or continua), like the
volume of a box in Example 4.1.15, so we could think of this as a continuous sample
space. Alternately, we could think of the amount of water as a discrete quantity, because
the number of molecules is an integer.
Remembering back to our definition of the definite integral (Definition 3.1.8), we ap-
proximated a curvy area with lots of small rectangles. In a similar way, continuous sample
spaces can be approximated with discrete sample spaces. The reason we need the distinc-
tion is less important for actual measurements, and more important for deciding how to
perform calculations. You’ll see as we progress through the chapter that some calculations
only make sense in one type of variable, and not the other.

4.1.3 §§ Combining Events


Definition4.1.16.

Two outcomes of an event are disjoint if no value in the sample space can be
described by both outcomes.

For example, consider the event of rolling a dice, corresponding as usual to the discrete
random variable X. The outcomes X = 1 and X = 2 are disjoint, because no dice roll will
result in both of them being true. On the other hand, the outcomes X ą 2 and “X is even”
are not disjoint, because a roll of 4 or 6 makes both of them true.
Example 4.1.17

Let X be a continuous random variable with sample space [0, 10]. For each collection of
outcomes below, decide whether the outcomes are disjoint or not.

1. X ă 5; X ě 5


2. X ě 9; X ě 8

3. 1 ă X ă 2; X even; X odd

Solution.

1. These are disjoint; no number is both less than five, and also greater than or equal to
five.

2. These are not disjoint: X = 9, for example, makes both true. (So does X = 9.5,
X = 10, etc.)

3. These are disjoint. If X is an integer, then it is even or odd but not both, and it is not
in the interval (1, 2). If X is in the interval (1, 2), then it is not an integer, so it is not
even and not odd.

Example 4.1.17

Theorem4.1.18.

Suppose A and B represent disjoint outcomes of the same event. Then

Pr ( A happens OR B happens) = Pr ( A happens) + Pr ( B happens)

Warning4.1.19.

In a mathematical context, “or” has a slightly different meaning from its collo-
quial use. When we say “A or B,” we mean “A, B, or both.”
A common joke is that when a mathematician is sharing cookies, and tells you
that you may choose peanut butter or chocolate, you are free to take two cookies.

Proof. This comes from the interpretation of a probability as the proportion of trials where
an outcome occurs. If A and B never occur at the same trial, then the proportion where
one or the other occurs is simply the sum of the proportions where one occurs.

Example 4.1.20

If X is the random variable corresponding to a dice throw, then Pr ( X ď 3) = Pr ( X =


1 OR X = 2 OR X = 3). Since the events X = 1, X = 2, and X = 3 are disjoint, this
probability is equal to:

    Pr(X = 1) + Pr(X = 2) + Pr(X = 3) = 1/6 + 1/6 + 1/6 = 1/2.


Example 4.1.20

Example 4.1.21

Suppose there is a lottery where you pick five numbers, and you win a prize if at least
three of your five picks accord with the winning five numbers. Suppose you know that
the probability of matching exactly three numbers is 1/100; the probability of matching
exactly four numbers is 1/1000; and the probability of matching exactly five numbers is 1/10000.
Then the probability of winning something is the probability of matching 3, 4, or 5
numbers: 1/100 + 1/1000 + 1/10000.
Example 4.1.21

Example 4.1.22

Suppose for the province of British Columbia, the probability that a randomly chosen
adult resident will apply for employment insurance (EI) benefits in 2021 is 3/100, while the
probability that a randomly chosen adult resident will be laid off from their job in 2021 is
7/100.
True or false: the probability that a randomly chosen adult resident will apply for EI
or be laid off is 1/10.
Solution. Not necessarily true (and almost certainly false). These are not disjoint events,
so Theorem 4.1.18 does not apply.
Example 4.1.22

4.1.4 §§ Equally Likely Outcomes


Equally likely means that each outcome of a discrete-valued experiment occurs with
equal probability. For example, if you roll a fair, six-sided dice, each face (1, 2, 3, 4, 5, or 6)
is as likely to occur as any other face. If you toss a fair coin, a Head ( H ) and a Tail ( T ) are
equally likely to occur.
Example 4.1.23

Suppose you roll one fair six-sided dice, with the numbers {1, 2, 3, 4, 5, 6} on its faces, and
you need to roll at least 5 to win a game.
There are two values that win you the game, 5 and 6. Each is expected to occur 1/6 of
the time. So, letting X be the result of the roll,

    Pr(X ≥ 5) = 1/6 + 1/6 = 1/3

If you were to roll the dice only a few times, you would not be surprised if your observed
results did not match the probability. If you were to roll the dice a very large number of
times, you would expect that, overall, roughly (if not exactly) 1/3 of the rolls would result
in an outcome of "at least five".


Example 4.1.23
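
The "large number of times" interpretation is easy to test by simulation. The following Python sketch (illustrative only) rolls a fair dice 100,000 times and reports the observed proportion of rolls that are at least 5.

import random

random.seed(1)                       # fixed seed so the run is reproducible
n_rolls = 100_000
wins = sum(1 for _ in range(n_rolls) if random.randint(1, 6) >= 5)

print(wins / n_rolls)                # should be close to 1/3 ≈ 0.333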
It is important to realize that in many situations, the outcomes are not equally likely. Look
at the dice in a game you have at home; the spots on each face are usually small holes
carved out and then painted to make the spots visible. Your dice may or may not be
biased; it is possible that the outcomes may be affected by the slight weight differences
due to the different numbers of holes in the faces. Casino dice have flat faces; the holes
are completely filled with paint having the same density as the material that the dice are
made out of so that each face is equally likely to occur.
The continuous analog of “equally likely” is uniformly distributed.

Definition4.1.24.

Intuitively, a continuous random variable is uniformly distributed on an interval


if the variable doesn’t favour one region of the interval over any other region.
More formally:
Let X be a continuous random variable. X is uniformly distributed on the interval
[a, b] if there exists some constant c such that for any interval [a′, b′] in [a, b],
Pr(a′ ≤ X ≤ b′) = c(b′ − a′). That is, the probability that X is in a particular
interval within [a, b] depends only on the length of that interval.

Example 4.1.25

Suppose X is a continuous random variable that is uniformly distributed on the interval


[0, 10].
The intervals [2, 5] and [3, 6] have the same length, so Pr (2 ď X ď 5) = Pr (3 ď X ď 6).
The intervals [2, 3], [3, 4], and [7, 8] have equal length, so Pr (2 ď X ď 3) = Pr (3 ď X ď
4) = Pr (7 ď X ď 8). So, X is twice as likely to be in the interval [2, 4] as it is to be in the
interval [7, 8].
Example 4.1.25

Corollary4.1.26.

Suppose X is a continuous random variable that is uniformly distributed on its
sample space, the interval [a, b]. Then for any interval [a′, b′] with a ≤ a′ ≤ b′ ≤ b,

    Pr(a′ ≤ X ≤ b′) = (b′ − a′)/(b − a)

That is, the probability that X is in the interval [a′, b′] is the ratio of the length of
that interval to the length of the sample space interval.


Proof. Since the sample space of X is [a, b],

    Pr(a ≤ X ≤ b) = 1

Since X is uniformly distributed on [a, b], there exists a constant c such that
Pr(a′ ≤ X ≤ b′) = c(b′ − a′) for any interval [a′, b′] inside the interval [a, b]. So,

    1 = Pr(a ≤ X ≤ b) = c(b − a)  ⟹  c = 1/(b − a)

Then:

    Pr(a′ ≤ X ≤ b′) = c(b′ − a′) = (b′ − a′)/(b − a)

Example 4.1.27

Let X be a continuous random variable that is uniformly distributed on its sample space,
the interval [0, 10]. What is Pr (7 ď X ď 9)?
Solution.
The interval [7, 9] has length 2; the sample space interval [0, 10] has length 10. So,
    Pr(7 ≤ X ≤ 9) = 2/10 = 1/5

Example 4.1.27

Example 4.1.28

Let X be a continuous random variable that is uniformly distributed across its sample
space [´8, 17]. Calculate the probabilities below.

1. Pr (1 ď X ď 2)
2. Pr (´5 ď X )
3. Pr (´10 ď X ď 10)

Solution.
1. By Corollary 4.1.26, Pr(1 ≤ X ≤ 2) = (2 − 1)/(17 − (−8)) = 1/25.

2. Since X only takes on values in its sample space [−8, 17]: Pr(−5 ≤ X) = Pr(−5 ≤ X ≤ 17).
   By Corollary 4.1.26, Pr(−5 ≤ X ≤ 17) = (17 − (−5))/(17 − (−8)) = 22/25.

3. Since X only takes on values in its sample space [−8, 17]: Pr(−10 ≤ X ≤ 10) =
   Pr(−8 ≤ X ≤ 10). Now the interval [−8, 10] is inside our sample space, unlike the
   interval [−10, 10], so we can apply Corollary 4.1.26.
   Pr(−8 ≤ X ≤ 10) = (10 − (−8))/(17 − (−8)) = 18/25.


Example 4.1.28
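
Corollary 4.1.26 is simple enough to wrap in a small helper function. The Python sketch below is illustrative; the helper name uniform_prob is our own, and the final three lines reproduce the answers of Example 4.1.28.

def uniform_prob(a, b, a1, b1):
    """Pr(a1 <= X <= b1) for X uniformly distributed on [a, b] (Corollary 4.1.26).
    The query interval is first clipped to the sample space [a, b]."""
    lo, hi = max(a, a1), min(b, b1)
    if hi <= lo:
        return 0.0
    return (hi - lo) / (b - a)

# The three probabilities from Example 4.1.28, with sample space [-8, 17]:
print(uniform_prob(-8, 17, 1, 2))      # 1/25  = 0.04
print(uniform_prob(-8, 17, -5, 17))    # 22/25 = 0.88
print(uniform_prob(-8, 17, -10, 10))   # 18/25 = 0.72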

Example 4.1.29

Suppose the continuous variable X is the age of a randomly chosen living person, measured
in years with exact precision. Then X is more likely to be near 50 than it is to be near 110.
So, X is not uniformly distributed.

Example 4.1.29

4.2 Probability Mass Function (PMF)

For a discrete random variable, the description of the probabilities of all events in its sam-
ple space is its probability mass function.

Definition4.2.1.

A probability mass function (PMF) for a discrete random variable X is the func-
tion f ( x ) from R to [0, 1], where

f ( x ) = Pr ( X = x )

Often f ( x ) formally takes the form of a piecewise function, e.g.

    f(x) = { 1/6,   x = 1, 2, 3, 4, 5, or 6
           { 0,     else

for a dice roll. In particular, f ( x ) = 0 for every value x not in the sample space of the
random variable.


Notation4.2.2.

Rather than writing a piecewise function every time, we will represent the prob-
ability mass function (PMF) of a random variable X using a table, set up like
this:
    x    Pr(X = x)
    1    Pr(X = 1) = 1/6
    2    Pr(X = 2) = 1/6
    3    Pr(X = 3) = 1/6
    4    Pr(X = 4) = 1/6
    5    Pr(X = 5) = 1/6
    6    Pr(X = 6) = 1/6

where events not in the sample space do not show up in the table.

Theorem4.2.3.

For any probability mass function (PMF) f ( x ):

• f ( x ) is a number in [0, 1] for every real number x, and

• the sum of the probabilities of all values in the sample space is one.

Proof. • f ( x ) = Pr ( X = x ), and probabilities are defined (4.1.1) to be in [0, 1].

• X is guaranteed to be a value in the sample space, so using Theorem 4.1.18, 1 =


Pr ( X is in S), which is the sum of Pr ( X = x ) for every x in S .


Warning4.2.4.

If Pr ( X = x ) = 0, then often f ( x ) is omitted from the probability mass


function (PMF). For example, in Notation 4.2.2, we don’t bother writing that
Pr ( X = 17) = 0, or Pr ( X = 18) = 0, or Pr ( X = 107.4) = 0.
You might also see this omission in a probability mass function (PMF) written as
a piecewise function. Instead of writing this:
    f(x) = { 1/6,   x = 1, 2, 3, 4, 5, or 6
           { 0,     else

you could write this:

    f(x) = 1/6 for all x in {1, 2, 3, 4, 5, 6}.

Notation4.2.5.

The notation

    Σ_x f(x)

means we take the sum of f(x) for every value x in some set. In this context, that
set is understood to be the sample space of a random variable.
We may also omit the bound, writing simply

    Σ f(x)

The sample space may or may not be a range of integers, which is why this notation is
slightly different from the sigma notation we use in the other chapters of this book.
Example 4.2.6

A child psychologist is interested in the number of times per night a newborn baby's crying
wakes its parent. They record this number for 100 different parents.

x number of parents woken x times

0 5
1 5
2 40
3 23
4 13
5 10
6 0
7 3
8 1


Suppose we pick one parent uniformly at random. Let X be the number of times per
night that parent is woken up. X takes on the values 0, 1, 2, 3, 4, 5, 6, 7, 8.

    x    P(X = x)
    0    P(X = 0) = 5/100
    1    P(X = 1) = 5/100
    2    P(X = 2) = 40/100
    3    P(X = 3) = 23/100
    4    P(X = 4) = 13/100
    5    P(X = 5) = 10/100
    6    P(X = 6) = 0/100
    7    P(X = 7) = 3/100
    8    P(X = 8) = 1/100

This is a probability mass function (PMF) because:

• Each probability is in the interval [0, 1].

• The sum of the probabilities is one, that is,

    Σ_x Pr(X = x) = 5/100 + 5/100 + 40/100 + 23/100 + 13/100 + 10/100 + 0/100 + 3/100 + 1/100 = 1

Example 4.2.6
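
For those who like to automate such checks, here is an illustrative Python sketch that stores the table above as a dictionary and verifies the two conditions of Theorem 4.2.3.

# The PMF from Example 4.2.6, stored as a dictionary {value: probability}.
pmf = {0: 5/100, 1: 5/100, 2: 40/100, 3: 23/100, 4: 13/100,
       5: 10/100, 6: 0/100, 7: 3/100, 8: 1/100}

def is_valid_pmf(pmf, tol=1e-9):
    """Check the two conditions of Theorem 4.2.3."""
    all_in_range = all(0 <= p <= 1 for p in pmf.values())
    sums_to_one = abs(sum(pmf.values()) - 1) < tol
    return all_in_range and sums_to_one

print(is_valid_pmf(pmf))                            # True
print(sum(p for x, p in pmf.items() if x >= 5))     # Pr(X >= 5) = 0.14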

Example 4.2.7

A hospital researcher is interested in the number of times an average post-op patient will
ring the nurse during a 12-hour shift. For a random sample of 50 patients, the following
information was obtained.
Let X be the number of times a patient rings the nurse during a 12-hour shift.


    x    P(X = x)
    0    P(X = 0) = 4/50
    1    P(X = 1) = 8/50
    2    P(X = 2) = 16/50
    3    P(X = 3) = 14/50
    4    P(X = 4) = 6/50
    5    P(X = 5) = 2/50

Why is this a probability mass function (PMF)?


Solution. Each probability is a number from the interval [0, 1], and their sum is 1:

    Σ_x Pr(X = x) = 4/50 + 8/50 + 16/50 + 14/50 + 6/50 + 2/50 = 1

Example 4.2.7

Example 4.2.8

Suppose Nancy has classes three days a week. She attends classes three days a week 80%
of the time, two days 15% of the time, one day 4% of the time, and no days 1% of the time.
Suppose one week is randomly selected.
a. What is the random variable in this case? Call it X.
b. What values does X take on?
c. Construct a probability mass table (called a PM table) like the one in Example 4.2.6.
The table should have two columns, labelled x and P( X = x ).
d. What does the P( x ) column sum to?

Solution.
a. X is the number of days Nancy went to class on the randomly selected week.
b. From the description, X has sample space {0, 1, 2, 3}.
c.  x    P(X = x)
    0    P(X = 0) = 0.01
    1    P(X = 1) = 0.04
    2    P(X = 2) = 0.15
    3    P(X = 3) = 0.8


d. Σ_{x=0}^{3} Pr(X = x) = 0.01 + 0.04 + 0.15 + 0.8 = 1, which accords with Definition 4.2.1.

Example 4.2.8

Example 4.2.9

Suppose a person is chosen at random from a group. Let X be the discrete random variable
describing the number of siblings that person has, and suppose the following probabilities
hold for X:
x P( X = x )

0 P( x = 0) = 0.25

1 P( x = 1) = 0.3

2 P( x = 2) = 0.25

3 P( x = 3) = 0.1

4 P( x = 4) = 0.05

If we sum up the right column, we get


    Σ_{x=0}^{4} Pr(X = x) = 0.25 + 0.3 + 0.25 + 0.1 + 0.05 = 0.95 < 1

That tells us this is not a probability mass function (PMF). Since all probabilities are
numbers in the interval [0, 1], it must be the case that we haven’t summed over all values
in the sample space. That is, in our sample of people, there must be some people who
haven’t been described here, e.g. people with more than four siblings. (Indeed, these
folks would make up 5% of the group.)
Example 4.2.9

4.2.1 §§ Limitations of Probability Mass Function (PMF)


Let’s imagine we’re choosing numbers from 1 to 3 uniformly at random. The number
chosen is called X. In the examples below, we’ll investigate the difference between discrete
choices and a continuous choice.
• If we choose an integer from 1 to 3 uniformly at random, then our probability mass
function (PMF) is:
      f(x) = Pr(X = x) = { 1/3,   x is 1, 2, or 3
                         { 0,     otherwise

Graphed, it looks like this:


  [Graph: the PMF has three points of height 1/3, at x = 1, 2, and 3.]

  Furthermore, Pr(X ≤ 2) = 2/3.

• If we choose a number from 1 to 3 uniformly at random, choosing numbers of the
  form n/2 where n is an integer (e.g. we can choose 1 and 1.5 but not 1.15), then there
  are five numbers to choose from. Our probability mass function (PMF) is:

  [Graph: the PMF has five points of height 1/5, at x = 1, 1.5, 2, 2.5, and 3.]

  For example, Pr(X = 2) = 1/5 and Pr(X = 7) = 0. Furthermore, Pr(X ≤ 2) = 3/5.

• If we choose a number from 1 to 3 uniformly at random, choosing numbers of the
  form n/10 where n is an integer (e.g. we can choose 1 and 1.1 but not 1.15), then there
  are 21 numbers to choose from. Our probability mass function (PMF) is:

  [Graph: the PMF has 21 points of height 1/21, at x = 1, 1.1, 1.2, ..., 3.]

  Furthermore, Pr(X ≤ 2) = 11/21.

• So far, all the examples have been discrete systems. What if we want X to be a
continuous variable? We want to be able to choose any real number from 1 to 3. In
this case, there are infinitely many numbers to choose from. So, the probability of
choosing any of them is... zero!


  [Graph: the x-axis from 1 to 3, with no visible points, since every individual value has probability 0.]

This is a problem! We know we’re choosing numbers between 1 and 3, but we have
Pr ( X = 1) = 0 and Pr ( X = 4) = 0. So the probability mass function (PMF) is not
useful for describing continuous random variables. We need a different tool.
  On the other hand, it's easy to imagine that Pr(X ≤ 2) = 1/2. So somehow this
  calculation didn't break when we moved from a discrete system to a continuous
  system.

4.3 Cumulative Distribution Function (CDF)


In the final example above, Pr ( X = x ) = 0 for every number x. Looking at individual
numbers isn’t very enlightening. Instead of looking at individual numbers, then, we can
look at ranges of numbers. These behave more nicely. With that in mind, we make the
following definition.

Definition4.3.1.

Given a random variable X, the cumulative distribution function (CDF) of X,


usually denoted by F ( x ), is
Pr ( X ď x )

This might seem like a weirdly specific definition. Secretly, our main purpose in cre-
ating this function is to use it as a tool to define two other things: a continuous random
variable, and the probability density function. Our motivation for defining the cumulative
distribution function (CDF) may lie with continuous random variables, but the definition
applies to discrete random variables as well.
Example 4.3.2

Suppose a random variable X has cumulative distribution function (CDF) F ( x ), given by


    F(x) = { 0,          x < 0
           { x^2/10^4,   0 ≤ x ≤ 100
           { 1,          x > 100

Evaluate the following probabilities:


1. Pr ( X ď 50)

2. Pr ( X ą 10)


3. Pr ( X ď 0)
4. Pr ( X ě 200)
5. Pr (10 ă X ď 20)

Solution.
1. By Definition 4.3.1: Pr(X ≤ 50) = F(50); using the formula given for F(x), this is
   50^2/10^4 = 1/4.

2. Pr(X > 10) is the probability that X is not less than or equal to 10, so

       Pr(X > 10) = 1 − Pr(X ≤ 10) = 1 − F(10) = 1 − 10^2/10^4 = 1 − 1/100 = 99/100

3. Pr(X ≤ 0) = F(0) = 0. Note this tells us that X never takes negative values.

4. Note Pr(X ≤ 100) = F(100) = 1. That tells us that X always takes values less than
   or equal to 100. Combined with our last note, that means the only values X ever
   takes are in the interval [0, 100]. So, Pr(X ≥ 200) = 0.

5. We can think of the interval (10, 20] as "numbers that are less than or equal to 20,
   except numbers that are less than or equal to 10." We rewrite Pr(10 < X ≤ 20) in a
   manner similar to Problem 2:

       Pr(10 < X ≤ 20) = Pr(X ≤ 20 and X ≰ 10) = Pr(X ≤ 20) − Pr(X ≤ 10)
                       = F(20) − F(10) = 20^2/10^4 − 10^2/10^4 = 3/100

Example 4.3.2
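
The calculations in this example translate directly into code. The Python sketch below (illustrative only) implements F(x) and reproduces three of the probabilities above.

def F(x):
    """The CDF from Example 4.3.2."""
    if x < 0:
        return 0.0
    if x <= 100:
        return x**2 / 10**4
    return 1.0

print(F(50))                 # Pr(X <= 50)      = 0.25
print(1 - F(10))             # Pr(X > 10)       = 0.99
print(F(20) - F(10))         # Pr(10 < X <= 20) = 0.03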

The ideas in the calculations of 2 and 5 above give us the following corollary.
Corollary4.3.3.

Let X be a random variable with cumulative distribution function (CDF) F ( x ).


Then

1. Pr ( X ą a) = 1 ´ F ( a), and

2. Pr ( a ă X ď b) = F (b) ´ F ( a)

Proof. The probability Pr(X > a) is the probability that X is not less than or equal to a, so

    Pr(X > a) = 1 − Pr(X ≤ a) = 1 − F(a)
The probability Pr ( a ă X ď b) is the probability that X is less than or equal to b and X
is not less than or equal to a.
Pr ( a ă X ď b) = Pr ( X ď b) ´ Pr ( X ď a) = F (b) ´ F ( a)


Example 4.3.4

Let X be a discrete random variable with probability mass function (PMF) below.

    x     Pr(X = x)
    10    1/16
    20    3/16
    30    5/16
    40    7/16

Note that the only values taken on by X are the numbers 10, 20, 30, and 40.
Let F ( X ) be the cumulative distribution function (CDF) of X.

• By Definition 4.3.1, F(10) = Pr(X ≤ 10). Looking at the probability mass function
  (PMF), X ≤ 10 only when X = 10, which happens 1/16 of the time. So, in this case:

      F(10) = Pr(X ≤ 10) = Pr(X = 10) = 1/16

• By Definition 4.3.1, F(20) = Pr(X ≤ 20). Looking at the probability mass function
  (PMF), X ≤ 20 only when X = 10 or X = 20, which happens 1/16 + 3/16 of the time.
  So, in this case:

      F(20) = Pr(X ≤ 20) = Pr(X = 10 or X = 20) = 1/16 + 3/16 = 1/4

• Similarly,

      F(30) = Pr(X ≤ 30) = Pr(X = 10 or X = 20 or X = 30) = 1/16 + 3/16 + 5/16 = 9/16

  and

      F(40) = Pr(X ≤ 40) = Pr(X = 10 or X = 20 or X = 30 or X = 40)
            = 1/16 + 3/16 + 5/16 + 7/16 = 1
The cumulative distribution function (CDF) has all real numbers as its domain, so
we aren’t quite finished determining the function F ( x ). However, after doing a few
examples, the rest of the function is easy to figure out.

• F (9) = Pr ( X ď 9) = 0; indeed, F ( x ) = 0 for all x ă 10.

• F(11) = Pr(X ≤ 11) = Pr(X = 10), since 10 is the only number ever taken by X that
  is less than or equal to 11. So, F(11) = F(10) = 1/16. Indeed, F(x) = F(10) for all x in
  the interval [10, 20).


• Following this line of reasoning:


      F(x) = { 0,      x < 10
             { 1/16,   10 ≤ x < 20
             { 1/4,    20 ≤ x < 30
             { 9/16,   30 ≤ x < 40
             { 1,      40 ≤ x

  [Graph: the step function F(x), jumping from 0 to 1/16 at x = 10, to 1/4 at x = 20,
  to 9/16 at x = 30, and to 1 at x = 40.]

Example 4.3.4
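
Building a CDF from a PMF, as we just did by hand, can also be automated. The following Python sketch is illustrative; the helper make_cdf is our own name for the construction.

def make_cdf(pmf):
    """Build the CDF F(x) = Pr(X <= x) of a discrete random variable from its PMF,
    given as a dictionary {value: probability}."""
    def F(x):
        return sum(p for value, p in pmf.items() if value <= x)
    return F

# The PMF from Example 4.3.4:
F = make_cdf({10: 1/16, 20: 3/16, 30: 5/16, 40: 7/16})
print(F(9), F(10), F(11), F(25), F(40))   # 0.0 0.0625 0.0625 0.25 1.0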

Example 4.3.5

Let U be a random variable that is chosen uniformly at random from all real numbers in
the interval [0, 1]. Understanding the cumulative distribution function (CDF) F ( x ) of U
can help us understand what “uniformly” means in this case.
As we saw in section 4.2.1, it’s not useful to note that Pr (U = x ) is the same for every
number in [0, 1], because that probability is 0. We can get at the meaning of “uniformly”
in a more useful way by examining ranges of numbers.
If we were to divide our interval6 in half, then the uniformity of distribution tells us
that half the time, U is in one half, and half the time, U is in the other half. In particular,
   
    Pr(0 ≤ U ≤ 1/2) = Pr(1/2 ≤ U ≤ 1) = 1/2

So, for the cumulative distribution function (CDF),


 
    F(1/2) = 1/2

6 Since Pr(U = 1/2) = 0, it won't matter whether we use the interval [0, 1/2] or [0, 1/2).


If we were to divide our interval into equal tenths, then the uniformity of distribution
tells us that U should fall in each interval one-tenth of the time. For example,
     
    Pr(0 ≤ U ≤ 1/10) = Pr(1/10 ≤ U ≤ 2/10) = Pr(2/10 ≤ U ≤ 3/10) = 1/10

So,

    F(1/10) = 1/10
In general, if x is a number in the interval [0, 1], then x describes the proportion of [0, 1]
taken up by the interval [0, x ], so F ( x ) = x.
    F(x) = { 0,   x < 0
           { x,   0 ≤ x ≤ 1
           { 1,   1 < x

[Graph: F(x) is 0 for x < 0, rises along the line y = x between 0 and 1, and is 1 for x > 1.]

Example 4.3.5

The cumulative distribution function (CDF) will give us our actual definition of a con-
tinuous random variable. Thinking of “continuous” as the opposite of “discrete” is not
sufficiently accurate.

Definition4.3.6.

A random variable X is continuous if its cumulative distribution function (CDF)


is continuous.

Example 4.3.7

The random variables from Examples 4.3.2 and 4.3.5 are continuous random variables.
The random variable from Example 4.3.4 is not a continuous random variable.
Example 4.3.7


Corollary4.3.8.

Let X be a continuous random variable. For any real number a,

Pr ( X = a) = 0

Furthermore,

Pr ( X ă a) = Pr ( X ď a) and Pr ( X ą a) = Pr ( X ě a)

Proof. Let F ( x ) be the cumulative distribution function (CDF) of X.

    lim_{x→a⁻} F(x) = lim_{x→a⁻} Pr(X ≤ x) = Pr(X < a)

By the definition of a continuous function,

    lim_{x→a⁻} F(x) = F(a)

So,

Pr ( X ă a) = Pr ( X ď a)
Pr ( X ď a) ´ Pr ( X ă a) = 0
Pr ( X = a) = 0

The “furthermore” statements follow.

Pr ( X ď a) = Pr ( X ă a) + Pr ( X = a) = Pr ( X ă a)
Pr ( X ě a) = Pr ( X ą a) + Pr ( X = a) = Pr ( X ą a)

Example 4.3.9

V is a number chosen at random from all real numbers in the intervals [´3, ´1] or [1, 3] as
follows:
• First, a fair 6-sided dice is rolled. If the outcome of the roll is 1 or 2, then V is chosen
to be in the interval [´3, ´1]. If the outcome of the roll is 3, 4, 5, or 6, then V is chosen
to be in the interval [1, 3].
• Within the selected interval, V is chosen uniformly at random.
Determine the cumulative distribution function (CDF) of V and decide whether or not
V is continuous.
Solution. From the first step, we see that V is in the interval [´3, ´1] one-third of the time,
and in the interval [1, 3] two-thirds of the time.
    Pr(−3 ≤ V ≤ −1) = 1/3,    Pr(1 ≤ V ≤ 3) = 2/3


Within these intervals, V has a uniform distribution. As in Example 4.3.5, we consider
intervals. For example, V is equally likely to be in the interval [−3, −2] and the interval
[−2, −1]. So,

    Pr(−3 ≤ V ≤ −2) = Pr(−2 ≤ V ≤ −1)

Also,

    Pr(−3 ≤ V ≤ −2) + Pr(−2 ≤ V ≤ −1) = Pr(−3 ≤ V ≤ −1) = 1/3

So,

    Pr(−3 ≤ V ≤ −2) = Pr(−2 ≤ V ≤ −1) = 1/6

Following the reasoning in Example 4.3.5, we see on the interval [−3, −1], the function
F(x) is a straight line from F(−3) = 0 to F(−1) = 1/3.
When −1 < x < 1, then F(x) = Pr(V ≤ x) = Pr(V ≤ −1) = F(−1), since no values of
V are ever less than 1 without also being less than or equal to −1. Then, by Corollary 4.3.8,
also F(1) = Pr(V ≤ 1) = Pr(V < 1) = Pr(V ≤ −1) = F(−1).
On the interval [1, 3], V is uniformly distributed. Following the familiar line of reasoning,
the function F(x) is a straight line from F(1) = 1/3 to F(3) = 1. All together:

[Graph: F(x) is 0 up to x = −3, rises linearly to 1/3 at x = −1, stays constant at 1/3 until
x = 1, then rises linearly to 1 at x = 3.]

Using the graph, we can find F(x) in equation form:

    F(x) = { 0,            x < −3
           { x/6 + 1/2,    −3 ≤ x < −1
           { 1/3,          −1 ≤ x < 1
           { x/3,          1 ≤ x < 3
           { 1,            x ≥ 3

Since F ( x ) is continuous, V is a continuous random variable.


Example 4.3.9


Corollary4.3.10 (Properties of the cumulative distribution functions (CDF)).

If F ( x ) is the cumulative distribution function (CDF) of a continuous random


variable X, then:

1. 0 ď F ( x ) ď 1 for all real x

2. F ( x ) is nondecreasing

3. lim_{x→∞} F(x) = 1

4. lim_{x→−∞} F(x) = 0

Proof. 1. F ( x ) is a probability, and all probabilities are numbers between 0 and 1.

2. Suppose a ă b.

F ( a) = Pr ( X ď a) ď Pr ( X ď a) + Pr ( a ă X ď b) = Pr ( X ď b) = F (b)

That is, if a ă b, then F ( a) ď F (b).

3. Rather than a rigorous proof, we offer the following hand-wavey intuition: if infinity
   were a number, we'd expect F(∞) = Pr(X ≤ ∞) = 1.

4. Rather than a rigorous proof, we offer the following hand-wavey intuition: if negative
   infinity were a number, we'd expect F(−∞) = Pr(X ≤ −∞) = 0.

4.3.1 §§ Dot Diagrams


We’re going to introduce a tool for visualizing random processes that will hopefully help
topics in continuous random variables be more intuitive. That tool is a dot diagram.
Let X be some continuous random variable. If X is a process (like choosing the height,
in feet, of a random student), we can imagine performing that process again and again
and again. Suppose we do just that. Every time we get a new value of X, we put a dot on
a number line. For example:

1. The first randomly-chosen student has height 5.5 feet:

   [number line from 1 to 6 with a dot at 5.5]

2. The second randomly-chosen student has height 6.1 feet:

   [number line from 1 to 6, now with dots at 5.5 and 6.1]

3. The third randomly-chosen student has height 4.9 feet:

   [number line from 1 to 6, now with dots at 4.9, 5.5, and 6.1]

4. The fourth randomly-chosen student has height 5.4 feet:

   [number line from 1 to 6, now with dots at 4.9, 5.4, 5.5, and 6.1]

5. After 20 choices, our results might look like this:

   [number line from 1 to 6 with 20 dots]

6. After 100 choices, our dots would start being so close together, they would be
   indistinguishable, so we might choose to make the dots slightly transparent. Then
   darker dots would represent the same heights (or nearly the same heights) being repeated.

   [number line from 1 to 6 with 100 partially transparent dots]

Example 4.3.11

The continuous random variable V from Example 4.3.9 is chosen as follows:


• First, a fair 6-sided dice is rolled. If the outcome of the roll is 1 or 2, then V is chosen
to be in the interval [´3, ´1]. If the outcome of the roll is 3, 4, 5, or 6, then V is chosen
to be in the interval [1, 3].
• Within the selected interval, V is chosen uniformly at random.
If we were to perform this trial 100 times, and record the number each time, our results
might look like this:

The dots (trial outcomes) are twice as dense on the right interval. Inside the right
interval, and inside the left interval, the dots are fairly evenly distributed.
Example 4.3.11
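
A dot diagram like this one is easy to generate yourself. The Python sketch below (illustrative only) simulates the two-step process that defines V and reports how the simulated values split between the two intervals.

import random

random.seed(0)

def sample_v():
    """One trial of the process in Examples 4.3.9 and 4.3.11."""
    roll = random.randint(1, 6)
    if roll <= 2:                          # rolls 1, 2 -> left interval
        return random.uniform(-3, -1)
    return random.uniform(1, 3)            # rolls 3-6 -> right interval

trials = [sample_v() for _ in range(100_000)]
left = sum(1 for v in trials if v < 0) / len(trials)
print(f"fraction in [-3, -1]: {left:.3f}")       # close to 1/3
print(f"fraction in [1, 3]:   {1 - left:.3f}")   # close to 2/3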

Example 4.3.12

Match the dot diagrams to the variable descriptions so that every description corresponds
to exactly one dot diagram.


A. Pr(X ≤ 0) = Pr(X ≥ 0).

B. X is uniformly distributed.

C. Pr(X ≤ 0) < Pr(X ≥ 0).

D. Pr(X ≤ 0) > Pr(X ≥ 0).

[Dot diagrams 1-4 are shown alongside: four number lines, each with many dots distributed differently.]

Solution. In both 1 and 2, it seems like (roughly) the same number of trials resulted in
positive and negative values of X. So in both cases, A holds. However, in 2, the distribu-
tion is not uniform: trials are more likely to have large absolute values than to be near 0.
So, we match B to 1 and A to 2.
In 3, more trials gave X ď 0 than X ě 0, so we match that to D.
In 4, more trials gave X ě 0 than X ď 0, so we match that to C.
Example 4.3.12

4.4 Probability Density Function (PDF)


As we saw in Corollary 4.3.8, if X is a continuous random variable, then Pr ( X = a) = 0
for any real number a. However, that doesn’t mean that all number ranges are equally
likely. In Example 4.3.5, we saw a continuous random variable U that only existed in the
range [0, 1]; so getting a value near 1/2 is more likely than getting a value near 2.
When looking at dot diagrams, areas with more “hits” show up as having a higher den-
sity of dots. This idea will be central to this section: measuring the density of a continuous
random variable.
A usual definition of density is something like

    (how much stuff) / (how much space)
Population density might be measured in people per square kilometre, liquid density
might be measured in grams per mL, etc. Probability density follows a similar pattern:
we’ll measure how likely a variable is to be in a given interval and divide it by the size (length)
of that interval.


Suppose the dot diagram above represents some continuous random variable X, and
we want to measure the probability density near the indicated point a. We start by defin-
ing a small interval around a. As is tradition, we take the interval between a and a + h,
where h is some small7 real number.
It doesn't make sense to count the dots in this interval, since the actual number will
change as we do different trials, so instead we measure how likely our random variable
is to be in this interval: Pr(a ≤ X ≤ a + h). The length of the interval is h. So, our
probability density around a is:

    Pr(a ≤ X ≤ a + h) / h

If F(x) is our cumulative distribution function (CDF), then we can re-write this using
Corollaries 4.3.3 and 4.3.8.

    = (F(a + h) − F(a)) / h

Since we only consider small values of h, we recognize the definition of a derivative.

    lim_{h→0} (F(a + h) − F(a)) / h = F′(a)

This motivates our definition of the probability density function (PDF) of a continuous
random variable. Probability mass functions (PMFs) and probability density functions
(PDFs) serve similar purposes: describing which values a variable tends to take on.

Definition4.4.1.

The probability density function (PDF) of a continuous random variable, usu-


ally written f ( x ), is the derivative of the cumulative distribution function (CDF),
where it exists.

The observant reader will note that the conventional use of F ( x ) as an antiderivative
of f ( x ) squares nicely with our use of F ( x ) for a cumulative distribution function (CDF)
and f ( x ) for a probability density function (PDF).

Warning4.4.2.

Some textbooks use the term ”probability distribution function” instead of


”probability mass function,” and then use the abbreviation PDF in both a con-
tinuous and a discrete context. This reflects the similar roles probability density
functions (PDFs) and probability mass functions (PMFs) play.

7 By “small,” we mean |h| « 0. In the discussion that follows, we’re considering the case h ą 0; the case
h ă 0 proceeds in the same way.


Example 4.4.3

In Example 4.3.2, we considered a continuous random variable with cumulative distribu-


tion function (CDF) given by

    F(x) = { 0,          x < 0
           { x^2/10^4,   0 ≤ x ≤ 100
           { 1,          x > 100

The probability density function (PDF) of this variable is F1 ( x ), namely

    f(x) = { 0,         x < 0
           { x/5000,    0 ≤ x < 100
           { 0,         x > 100

Translating f ( x ) into a dot diagram, to help build intuition about the behaviour of this
variable, we expect to see

• no dots except in the interval [0, 100], and

• an increasing density of dots from left to right on the interval [0, 100].

Example 4.4.3


Notation4.4.4.

As with probability mass functions (PMFs), it is common to suppress the regions


where a probability density function (PDF) is zero or doesn’t exist. Instead of
writing

    f(x) = { 0,         x < 0
           { x/5000,    0 ≤ x < 100
           { 0,         x > 100

as in Example 4.4.3, we may also write

    f(x) = x/5000,   0 ≤ x < 100

and it is understood that f(x) is zero or doesn't exist when x is not in the interval
[0, 100).
Another time-saving measure is to use the words "else" or "otherwise" in a
piecewise-defined function. In the context of this function:

    f(x) = { x/5000,   0 ≤ x < 100
           { 0,        else
“else” means “for all values of x other than the ones that have already been de-
fined,” i.e. for all values of x outside the interval [0, 100).

Example 4.4.5

In Example 4.3.5, we considered a continuous random variable with cumulative distribu-


tion function (CDF) given by

    F(x) = { 0,   x < 0
           { x,   0 < x < 1
           { 1,   1 < x

The probability density function (PDF) of this variable is F1 ( x ), namely

    f(x) = { 0,   x < 0
           { 1,   0 < x < 1
           { 0,   1 < x

Notice that the density is constant on the interval (0, 1). This is a hallmark of uniformly
distributed variables: in the interval in question, no one region is denser than any other
region.


Note f ( x ) is not defined at x = 0 and x = 1 because F ( x ) is not differentiable at these


points8 .
Example 4.4.5

Example 4.4.6

In Example 4.3.9, we considered a continuous random variable with cumulative distribu-


tion function (CDF) given by the function
    F(x) = { 0,            x < −3
           { x/6 + 1/2,    −3 ≤ x < −1
           { 1/3,          −1 ≤ x < 1
           { x/3,          1 ≤ x < 3
           { 1,            x ≥ 3
The probability density function (PDF) of this variable is F1 ( x ), namely
    f(x) = { 0,     x < −3
           { 1/6,   −3 < x < −1
           { 0,     −1 < x < 1
           { 1/3,   1 < x < 3
           { 0,     x > 3

The places where the probability density function (PDF) is 0 are telling: these are the
regions our variable never reaches (values there never occur).

8 You can see this by comparing the right and left limits of the limit definition of the derivative of F ( x ) at
these points.


Our intuition about f(x) is that higher f(x) means more "hits" near x. In the dot
diagram above, f(2) > f(−2), and indeed the dots are denser near 2 than near −2.
Example 4.4.6

Warning4.4.7.

Let f ( x ) be the probability density function (PDF) of a continuous random vari-


able. If f ( x ) ą f (y), it is not correct to say that x is more likely than y, because it
is still the case that Pr ( X = x ) = Pr ( X = y) = 0.

Corollary4.4.8.

From Definition 4.4.1, given a continuous random variable X with probability


density function (PDF) f ( x ):
1. Pr(a ≤ X ≤ b) = ∫_a^b f(x) dx

2. f(x) ≥ 0 for all real x in the domain of f.

3. ∫_{−∞}^{∞} f(x) dx = 1

Proof. 1. By Corollaries 4.3.3 and 4.3.8, Pr(a ≤ X ≤ b) = F(b) − F(a). By the Fundamental
Theorem of Calculus Part 2, ∫_a^b f(x) dx = F(b) − F(a), since F′(x) = f(x).
For this property, we are glossing over some details in assuming f(x) exists on (a, b).
If it does not, then we partition (a, b) into intervals where it does exist, and apply
the Fundamental Theorem of Calculus to those intervals separately.

2. By Part 2 of Corollary 4.3.10, F(x) is nondecreasing, so its derivative is nonnegative.

3. From the property above, ∫_{−∞}^{∞} f(x) dx = Pr(−∞ ≤ X ≤ ∞) = 1.

The first property of Corollary 4.4.8 is a key piece of intuition for working with prob-
ability density functions (PDFs) : the probability density function (PDF) of a continuous
random variable X is a function f ( x ) with the property that the area under the curve of
f ( x ) from a to b is equal to the probability that X lies between a and b.
[Figure: the graph of y = f(x), with the region between x = a and x = b shaded.
Shaded area: Pr(a ≤ X ≤ b).]


Example 4.4.9

A continuous random variable X has probability density function (PDF)


    f(x) = a/(x^2 + 1)
for some constant a.
1. Find a.
2. Find Pr (0 ď X ď 10).
3. Find the cumulative distribution function (CDF) of X.

Solution.
1. By Corollary 4.4.8,
       1 = ∫_{−∞}^{∞} a/(x^2 + 1) dx = a ∫_{−∞}^{∞} 1/(x^2 + 1) dx
         = a [ lim_{b→−∞} ∫_b^0 1/(x^2 + 1) dx + lim_{c→∞} ∫_0^c 1/(x^2 + 1) dx ]
         = a [ lim_{b→−∞} (arctan 0 − arctan b) + lim_{c→∞} (arctan c − arctan 0) ]
         = a [ 0 − (−π/2) + π/2 + 0 ] = a · π

   So, a = 1/π.

2. By Corollary 4.4.8,
       Pr(0 ≤ X ≤ 10) = ∫_0^10 f(x) dx = ∫_0^10 (1/π)/(x^2 + 1) dx
                      = (1/π) [arctan 10 − arctan 0] = arctan(10)/π ≈ 0.47

   Note: because f(x) has even symmetry, we know Pr(X ≤ 0) = Pr(X ≥ 0) = 1/2.
   Also, Pr(0 ≤ X ≤ 10) ≤ Pr(0 ≤ X), so it stands to reason that our answer would be
   less than one-half.
3. Let F ( x ) be the cumulative distribution function (CDF) of X.
       F(x) = Pr(X ≤ x)    (definition of CDF)
            = Pr(−∞ < X ≤ x)
            = ∫_{−∞}^x f(t) dt = lim_{b→−∞} ∫_b^x (1/π)/(t^2 + 1) dt
            = (1/π) lim_{b→−∞} [arctan x − arctan b] = (1/π) [arctan x + π/2]
            = (1/π) arctan x + 1/2
Note: it’s nice to do a quick sanity check by comparing F ( x ) to the properties of a
cumulative distribution function (CDF) in Corollary 4.3.10. This is a great way to
root out calculation errors, sign errors, and so on.


Example 4.4.9
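
Numerical integration gives a useful sanity check on calculations like these. The Python sketch below is illustrative: it approximates the integrals with a simple midpoint Riemann sum rather than evaluating them exactly.

import math

def f(x, a=1/math.pi):
    """The PDF from Example 4.4.9."""
    return a / (x**2 + 1)

def integrate(g, lo, hi, n=200_000):
    """Simple midpoint Riemann sum; accurate enough for a sanity check."""
    dx = (hi - lo) / n
    return sum(g(lo + (k + 0.5) * dx) for k in range(n)) * dx

# Total probability over a wide (finite) range should be close to 1:
print(integrate(f, -10_000, 10_000))      # approximately 1
# Pr(0 <= X <= 10) should match arctan(10)/pi:
print(integrate(f, 0, 10))                # approximately 0.468
print(math.atan(10) / math.pi)            # exact value, about 0.468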

We can formalize the last part of the previous exercise as a corollary to Corollary 4.4.8.

Corollary4.4.10.

Let X be a continuous random variable with probability density function (PDF)


f ( x ). Then the cumulative distribution function (CDF) of X is
    F(x) = ∫_{−∞}^x f(t) dt

Proof. The CDF is defined as F ( x ) = Pr ( X ď x ), i.e. Pr (´8 ă X ď x ). By Corollary 4.4.8,

    Pr(−∞ < X ≤ x) = ∫_{−∞}^x f(t) dt

4.5 Expected Value

4.5.1 §§ Motivation: Long-Term Average

Suppose I throw a 4-sided dice a large number of times, and record the number that comes
up each time. What will the average (mean) of those numbers be?
To calculate the mean, I’ll add up the results of my rolls and divide by the number of
rolls I took.

    mean = [ (result of first roll) + (result of second roll) + ⋯ + (result of last roll) ] / (total number of rolls)


The numerator will consist of the numbers 1 through 4, since these are the numbers resulting from a 4-sided dice roll. Let's regroup the numerator so we add up all the 1s first, then all the 2s second, etc.
    mean = [ (1 + 1 + ⋯) + (2 + 2 + ⋯) + (3 + 3 + ⋯) + (4 + 4 + ⋯) ] / (total number of rolls)
         = (1 + 1 + ⋯)/(total rolls) + (2 + 2 + ⋯)/(total rolls) + (3 + 3 + ⋯)/(total rolls) + (4 + 4 + ⋯)/(total rolls)
         = 1·(number of times 1 was rolled)/(total rolls) + 2·(number of times 2 was rolled)/(total rolls)
           + 3·(number of times 3 was rolled)/(total rolls) + 4·(number of times 4 was rolled)/(total rolls)
         = 1·(proportion of rolls resulting in 1) + 2·(proportion of rolls resulting in 2)
           + 3·(proportion of rolls resulting in 3) + 4·(proportion of rolls resulting in 4)
If we've rolled the dice a large number of times, we expect the proportion of rolls resulting in 1 to closely approximate Pr(X = 1), and so on. So
    mean ≈ 1·Pr(X = 1) + 2·Pr(X = 2) + 3·Pr(X = 3) + 4·Pr(X = 4) = Σ_{x=1}^{4} x·Pr(X = x)

This calculation, what we expect to have as our average if we perform the dice roll a
large number of times, motivates Definition 4.5.1 below.
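The "long-term average" idea is easy to test by simulation. The following Python sketch (an added illustration; the number of rolls is our own choice) rolls a fair 4-sided die many times and compares the running mean to Σ x·Pr(X = x) = (1 + 2 + 3 + 4)/4 = 2.5.

    import random

    random.seed(1)
    rolls = [random.randint(1, 4) for _ in range(100000)]     # a fair 4-sided die

    print(sum(rolls) / len(rolls))                            # simulated long-run average
    print(sum(x * 0.25 for x in (1, 2, 3, 4)))                # sum of x * Pr(X = x) = 2.5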

4.5.2 §§ Definition and Examples


The expected value or expectation of a random variables is often referred to as the “long-
term” average. This means that over the long term of doing an experiment over and over,
you would expect this average.

Definition 4.5.1.
Given a discrete random variable X, the expected value of X, denoted E(X), is given by
    E(X) = Σ x·Pr(X = x)
where the sum is taken over every possible value of X.
Given a continuous random variable X with probability density function (PDF) f(x), the expected value of X is given by
    E(X) = ∫_{−∞}^{∞} x·f(x) dx


Note the similarities between the continuous and discrete cases. A sum in the discrete
cases turns into an integral in the continuous case; Pr ( X = x ) turns into the probabil-
ity density function (PDF) f ( x ); and “every possible value of X” turns into the range
(´8, 8).
Example 4.5.2

In Example 4.2.6, we saw the following probability mass function (PMF) for the random
variable X:

    x    Pr(X = x)
    0    5/100
    1    5/100
    2    40/100
    3    23/100
    4    13/100
    5    10/100
    6    0/100
    7    3/100
    8    1/100

The expected value of this discrete random variable is
    E(X) = Σ_{x=0}^{8} x·Pr(X = x)
         = 0·Pr(X = 0) + 1·Pr(X = 1) + 2·Pr(X = 2) + 3·Pr(X = 3) + 4·Pr(X = 4)
           + 5·Pr(X = 5) + 6·Pr(X = 6) + 7·Pr(X = 7) + 8·Pr(X = 8)
         = 0·(5/100) + 1·(5/100) + 2·(40/100) + 3·(23/100) + 4·(13/100) + 5·(10/100) + 6·(0/100) + 7·(3/100) + 8·(1/100)
         = 285/100 = 2.85
The most literal interpretation of expected value in this context is this:

Suppose we choose a parent from a list at random many times, and each time
record the number of awakenings, X. After a large number of trials, we expect
the average of these X values to approach 2.85.

We can also interpret the calculation like this:

The average number of times a parent was woken up in our trial was 2.85.


Of course, no parent was woken up exactly 2.85 times in the night. Expected values
refer to averages, and do not necessarily accord well with individual trials.
Example 4.5.2
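The same calculation is mechanical enough to hand to a computer. Here is a small Python sketch, added as an illustration, that recomputes E(X) from the table above:

    # Pr(X = x), written as counts out of 100, copied from the table above
    pmf = {0: 5, 1: 5, 2: 40, 3: 23, 4: 13, 5: 10, 6: 0, 7: 3, 8: 1}

    expected = sum(x * count / 100 for x, count in pmf.items())
    print(expected)    # 2.85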

Probability does not describe the short-term results of an experiment. It gives informa-
tion about what can be expected in the long term. The Law of Large Numbers states that, as
the number of trials in a probability experiment increases, the difference between the the-
oretical probability of an event and the relative frequency approaches zero (the theoretical
probability and the relative frequency get closer and closer together).
Example 4.5.3

Suppose we flip a fair coin a large number of times. We want to record the average number
of times the flip resulted in heads.
Let X be the random variable corresponding to a coin flip, with X = 1 when the flip is
heads and X = 0 when the flip is tails. Using these assignments, if we add up the values
of X from each experiment, that sum tells us how many flips were heads. The expected
value of X is
    E(X) = Σ_x x·Pr(X = x) = 0·(1/2) + 1·(1/2) = 1/2,
where the sum runs over the possible values x = 0 and x = 1.
Consider interpreting the expected value as a long-term average, using the law of large numbers. If we were to flip a fair coin a large number of times, we would expect the average value of X to be 1/2. That is, we would expect roughly 1/2 of the tosses to result in heads.
In 2009, intrepid undergraduate students at Berkeley tossed coins 40,000 times9 . The tosses resulted in 20,217 heads. The fraction of coin tosses resulting in heads, therefore, was
    20,217 / 40,000 = 0.505425
which is indeed fairly close to 1/2.
Example 4.5.3

Example 4.5.4

Let X be a continuous random variable with probability density function (PDF)

f ( x ) = ax2 (10 ´ x ), 0 ď x ď 10

where a is a constant.
Find a and E( X ).

9 A writeup is here: https://www.stat.berkeley.edu/˜aldous/Real-World/coin_tosses.


html. They were actually trying to determine whether the starting orientation of a coin had an impact
on the result of a toss. There’s actually a bit of a cart-before-the-horse problem in using this example
here: if we tossed a coin a large number of times and it didn’t result in very close to half heads and half
tails, the conclusion would be that the probability of tossing heads was not actually 12 .


Solution.
From Corollary 4.4.8 part 3:
    1 = ∫_{−∞}^{∞} f(x) dx = 0 + ∫_0^{10} ax²(10 − x) dx = a ∫_0^{10} (10x² − x³) dx
      = a [ (10/3)x³ − (1/4)x⁴ ]_0^{10} = a ( 10⁴/3 − 10⁴/4 ) = a·10⁴/12
so a = 12/10⁴.
From Definition 4.5.1,
    E(X) = ∫_{−∞}^{∞} x·f(x) dx
Note that where f(x) = 0, we have ∫_a^b x·f(x) dx = ∫_a^b 0 dx = 0. So
    E(X) = 0 + ∫_0^{10} ax³(10 − x) dx = a ∫_0^{10} (10x³ − x⁴) dx
         = a [ (10/4)x⁴ − (1/5)x⁵ ]_0^{10} = a ( 10⁵/4 − 10⁵/5 ) = a·10⁵/20
         = (12/10⁴)·(10⁵/20) = 6

Example 4.5.4
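As a quick numerical sanity check of both a and E(X), here is a Python sketch (an added illustration using a simple midpoint-rule integrator of our own choosing):

    a = 12 / 10**4                      # the normalizing constant found above

    def f(x):
        return a * x**2 * (10 - x) if 0 <= x <= 10 else 0.0

    def integrate(g, lo, hi, n=200000):
        # midpoint rule
        dx = (hi - lo) / n
        return sum(g(lo + (i + 0.5) * dx) for i in range(n)) * dx

    print(integrate(f, 0, 10))                        # close to 1, so f is a valid PDF
    print(integrate(lambda x: x * f(x), 0, 10))       # close to 6, matching E(X)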

Example 4.5.5

Suppose Y is a continuous random variable with probability density function (PDF)
    f(x) = e^x,   x ≤ 0
Find E(Y).
Solution.
From Definition 4.5.1,
    E(Y) = ∫_{−∞}^{∞} x·f(x) dx = 0 + ∫_{−∞}^{0} x·e^x dx = lim_{a→−∞} ∫_a^0 x·e^x dx
We use integration by parts with u = x, dv = e^x dx; du = dx, v = e^x:
    = lim_{a→−∞} [ [x e^x]_a^0 − ∫_a^0 e^x dx ] = lim_{a→−∞} [ −a e^a − [e^x]_a^0 ] = lim_{a→−∞} [ −a e^a − 1 + e^a ]
Note lim_{a→−∞} e^a = 0, so lim_{a→−∞} (−a e^a) has the indeterminate form 0·∞. We use l'Hôpital's rule:
    lim_{a→−∞} (−a e^a) = lim_{a→−∞} (−a)/(e^{−a})     (numerator and denominator both grow without bound)
                        = lim_{a→−∞} (−1)/(−e^{−a}) = lim_{a→−∞} e^a = 0
So,
    E(Y) = 0 − 1 + 0 = −1.


Example 4.5.5

Example 4.5.6
Let Z be a continuous random variable with probability density function (PDF) f(x) = 1/x², x ≥ 1. Find E(Z).
Solution.
From Definition 4.5.1,
    E(Z) = ∫_{−∞}^{∞} x·f(x) dx = 0 + ∫_1^{∞} x·x^{−2} dx = ∫_1^{∞} x^{−1} dx
         = lim_{b→∞} ∫_1^b x^{−1} dx = lim_{b→∞} [ln b] = ∞
It is sometimes the case that the expectation of a continuous random variable is infinite. How should we interpret that?
A random variable Z with the given probability density function (PDF) has sample space [1, ∞). It takes on finite values, but there is no limit to how large those values can be. (It is true that smaller values are more likely, since f(x) = x^{−2} is a decreasing function. However, Z also takes on extremely large values from time to time.) E(Z) = ∞ tells us that if we run our experiment for Z a lot of times, over time the average will increase without bound.
Example 4.5.6
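A simulation makes E(Z) = ∞ feel less abstract. If U is uniform on (0, 1], then Z = 1/U has exactly the density f(x) = 1/x² for x ≥ 1 (this sampling trick, and the sample sizes below, are our own illustrative choices). The running average of such samples does not settle down; it keeps being dragged upward by occasional huge values:

    import random

    random.seed(2)

    running_total = 0.0
    for i in range(1, 10**6 + 1):
        u = 1 - random.random()          # uniform on (0, 1]
        running_total += 1 / u           # Z = 1/U has PDF 1/x^2 on [1, infinity)
        if i in (10**3, 10**4, 10**5, 10**6):
            print(i, running_total / i)  # the averages keep creeping upward (roughly like ln i)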

4.5.3 §§ Checking your Expectation Calculation


The expectation of a random variable has several intuitive properties that can be used to
quickly check that your answer is reasonable.

Theorem 4.5.7.
Let a, b be real numbers or ±∞ with a < b. Suppose a (discrete or continuous) random variable X takes values from the interval [a, b]. Then E(X) will be some number in the interval [a, b].


Proof. First, suppose X is continuous, with probability density function (PDF) f(x). Then
    E(X) = ∫_a^b x f(x) dx ≤ ∫_a^b b·f(x) dx = b ∫_a^b f(x) dx = b
    E(X) = ∫_a^b x f(x) dx ≥ ∫_a^b a·f(x) dx = a ∫_a^b f(x) dx = a
Next, suppose X is discrete. Then
    E(X) = Σ_x x·Pr(X = x) ≤ Σ_x b·Pr(X = x) = b Σ_x Pr(X = x) = b
    E(X) = Σ_x x·Pr(X = x) ≥ Σ_x a·Pr(X = x) = a Σ_x Pr(X = x) = a

Theorem 4.5.8.
Let a and b be real numbers with a < b. Suppose a continuous (resp. discrete) random variable X takes values from the interval [a, b], and its probability density function (PDF) (resp. probability mass function (PMF)) is increasing on the interval [a, b]. Then E(X) > (a + b)/2.
Similarly, suppose a continuous (resp. discrete) random variable X takes values from the interval [a, b], with a < b, and its probability density function (PDF) (resp. probability mass function (PMF)) is decreasing on the interval [a, b]. Then E(X) < (a + b)/2.

Proof. Intuitively, an increasing f(x) means we have more high values than low values, so when we average them together, the average will be high. Similarly, decreasing f(x) means we have more low values than high values, so when we average them together, the average will be low.
More rigorously, in the continuous case:
    ∫_a^b x f(x) dx − (a+b)/2 = ∫_a^b x f(x) dx − (a+b)/2 ∫_a^b f(x) dx
      = ∫_a^b (x − (a+b)/2) f(x) dx
      = ∫_a^{(a+b)/2} (x − (a+b)/2) f(x) dx + ∫_{(a+b)/2}^b (x − (a+b)/2) f(x) dx
      = ∫_{(a+b)/2}^a ((a+b)/2 − x) f(x) dx + ∫_{(a+b)/2}^b (x − (a+b)/2) f(x) dx
Using the substitution y = a + b − x in the first integral and noting that dy = −dx,
      = −∫_{(a+b)/2}^b (y − (a+b)/2) f(a + b − y) dy + ∫_{(a+b)/2}^b (x − (a+b)/2) f(x) dx
Changing y's into x's,
      = −∫_{(a+b)/2}^b (x − (a+b)/2) f(a + b − x) dx + ∫_{(a+b)/2}^b (x − (a+b)/2) f(x) dx
      = ∫_{(a+b)/2}^b (x − (a+b)/2) [ f(x) − f(a + b − x) ] dx
For values of x in the interval [(a+b)/2, b], the factor x − (a+b)/2 is positive except at x = (a+b)/2, where it is zero. So, all together,
    ∫_a^b x f(x) dx − (a+b)/2 = ∫_{(a+b)/2}^b (x − (a+b)/2)·[ f(x) − f(a + b − x) ] dx,
with x − (a+b)/2 ≥ 0 on the interval of integration. Furthermore, for such x we have a + b − x ≤ a + b − (a+b)/2 = (a+b)/2 ≤ x.
If f(x) is increasing, then f(a + b − x) < f(x) for x in ((a+b)/2, b], so the integrand is (positive)·(positive) and
    ∫_a^b x f(x) dx − (a+b)/2 > 0,   i.e.   ∫_a^b x f(x) dx > (a+b)/2.
If f(x) is decreasing, then f(a + b − x) > f(x) for x in ((a+b)/2, b], so the integrand is (positive)·(negative) and
    ∫_a^b x f(x) dx − (a+b)/2 < 0,   i.e.   ∫_a^b x f(x) dx < (a+b)/2.
The discrete case proceeds in a similar fashion.

Example 4.5.9

Suppose X is a continuous random variable with probability density function (PDF)
    f(x) = e^{x−a} for 1 ≤ x ≤ 5, and f(x) = 0 otherwise,
for some appropriate constant a. Using the two theorems in this section, give a range for E(X).

Solution. X only takes values from [1, 5], so by Theorem 4.5.7, 1 ≤ E(X) ≤ 5.
The probability density function (PDF) f(x) = e^{x−a} is an increasing function, so by Theorem 4.5.8, E(X) > (5 + 1)/2 = 3.
So, E(X) is in the interval (3, 5].
Note: There is a unique value of a for which f(x) is a probability density function (PDF). It is the value of a that satisfies the equality
    1 = ∫_1^5 e^{x−a} dx,
i.e. a = ln(e⁵ − e).
Example 4.5.9

Example 4.5.10

You calculate expected values for the various random variables described below. Which
of the values can you immediately, with very little computation, say are wrong? Which
seem reasonable?

1. W is a random variable that takes values from [4, 5], and you calculate E(W ) = 4.75.

2. X is a random variable that takes values from [´1, 0], and you calculate E( X ) = 0.5.

3. Y is a continuous random variable with probability density function (PDF)
       f(x) = 1/x for 1 ≤ x ≤ e, and f(x) = 0 otherwise,
   and you calculate E(Y) = 1.9.

4. Z is a continuous random variable with probability density function (PDF)
       f(x) = 1/x² for 1 ≤ x, and f(x) = 0 for x < 1,
   and you calculate E(Z) = −1.

5. A is a continuous random variable with probability density function (PDF)
       f(x) = 1/x³ for 1/√2 ≤ x, and f(x) = 0 for x < 1/√2,
   and you calculate E(A) = √2.

Solution. From Theorem 4.5.7, E(X) and E(Z) are incorrect: E(X) = 0.5 is not in [−1, 0], and E(Z) = −1 is not in [1, ∞).
The PDF of Y is decreasing on [1, e], so E(Y) < (e + 1)/2 ≈ 1.86 by Theorem 4.5.8. Therefore the result E(Y) = 1.9 is incorrect.
For E(W), we don't have enough information to apply Theorem 4.5.8. However, it passes the test of Theorem 4.5.7. So E(W) is reasonable, though we have no way of knowing whether it is correct.
For E(A), Theorem 4.5.8 doesn't apply, since the values of A do not lie in a finite interval. However, it passes the test of Theorem 4.5.7. So E(A) is reasonable. (Indeed, if you go through the calculation, it is correct.)
Example 4.5.10

Example 4.5.11 (Conspiracy Theories)

The paper On the Viability of Conspiratorial Beliefs10 investigates a probabilistic model11 for
the length of time a conspiracy theory can remain secret. In particular, the author uses the
formula
    L(t) = 1 − e^{−t(1−(1−p)^{N(t)})}
where L(t) is the probability that, after t years, a leak has occurred that would cause the
conspiracy to be exposed; N (t) is the number of people involved in the conspiracy at time
t; and p is the probability that a person involved will cause a leak in any particular year.
(It is implied that L(t) = 0 for negative values of t.)
For this example, we’ll only use a very basic version of the full model. Suppose there
are 100 (immortal) people involved in a conspiracy, no new people are ever brought into
the conspiracy, and each person has a 1% chance of causing a leak in one year.
(a) Using the model above, what is the expected amount of time it will take for a leak to
occur?
(b) Using the model above, what is the probability that the conspiracy can survive with-
out a leak for at least 5 years?

Solution.
(a) L(t) is the probability that, at time t, at least one leak has occurred. Let T be the time that the first leak occurs. Then L(t) = Pr(T ≤ t). So, the function L(t) is the cumulative distribution function of T. In order to find E(T), we'll need the probability density function of T, which will be L′(t) (by Definition 4.4.1).
Let's start by filling in our constants: N(t) = 100 and p = 1/100.
    L(t) = 1 − e^{−t(1−(1−p)^{N(t)})} = 1 − e^{−t(1−0.99^{100})} = 1 − e^{t(0.99^{100}−1)}

10 Grimes DR (2016) On the Viability of Conspiratorial Beliefs. PLoS ONE 11(1): e0147905. https://
doi.org/10.1371/journal.pone.0147905
11 The assumptions made that lead to this model are that every member of the conspiracy is equally likely
to cause a leak (whether by negligence or on purpose); that leak events are independent of one another;
and that the probability of a conspirator causing a leak in any given year is constant. The full derivation
is beyond the scope of the text, but the interested reader may look up “Poisson distribution.”
The paper goes on to approximate p using conspiracy theories that have been exposed. They also use
demographic data to approximate N (t). They apply the model to famous conspiracy theories (e.g. the
moon landing being faked) to discuss whether such a plot could realistically remain secret until present
day.



Note that 0.99^{100} − 1 is a constant. In order to make the work below clearer, we'll replace it with c:
    L(t) = 1 − e^{ct}   where c = 0.99^{100} − 1
We find the PDF of T by differentiating L(t):
    L′(t) = −c e^{ct}
Now we use the definition of expected value, Definition 4.5.1:
    E(T) = ∫_{−∞}^{∞} t·L′(t) dt = ∫_0^{∞} t·(−c e^{ct}) dt = lim_{b→∞} ∫_0^b t·(−c e^{ct}) dt
We use integration by parts with u = t, dv = −c e^{ct} dt; du = dt, v = −e^{ct}:
    = lim_{b→∞} [ [−t e^{ct}]_0^b − ∫_0^b (−e^{ct}) dt ]
    = lim_{b→∞} [ −b e^{bc} + [(1/c) e^{ct}]_0^b ]
    = lim_{b→∞} [ −b e^{bc} + (1/c) e^{bc} − 1/c ]
    = lim_{b→∞} [ (1/c − b) e^{bc} ] − 1/c
Since c < 0, lim_{b→∞} e^{bc} = 0 (*). So (1/c − b) e^{bc} has an indeterminate form of type ∞·0. We rewrite it in order to use l'Hôpital's rule:
    lim_{b→∞} (1/c − b) e^{bc} = lim_{b→∞} (1/c − b)/e^{−bc}     (numerator and denominator both grow without bound)
      = lim_{b→∞} [ d/db (1/c − b) ] / [ d/db e^{−bc} ]
      = lim_{b→∞} (−1)/(−c e^{−bc})
      = lim_{b→∞} (1/c) e^{bc} = 0     using (*)
So
    E(T) = 0 − 1/c = −1/(0.99^{100} − 1) = 1/(1 − 0.99^{100}) ≈ 1.58
So, the expected value of the time it would take for this conspiracy theory to be leaked is about 1.58 years, or roughly 19 months.


(b) The probability that no leak has occurred at time t = 5 is 1 − L(5):
    1 − L(5) = e^{5(0.99^{100} − 1)} ≈ 0.04
So, there's about a 4% chance that the conspiracy would survive at least 5 years without any leaks.

Example 4.5.11
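The numbers in this example are easy to reproduce. Here is a Python sketch (added as an illustration; it simply evaluates the closed-form answers found above):

    import math

    p, N = 0.01, 100
    c = (1 - p)**N - 1                 # c = 0.99^100 - 1, a negative constant

    print(-1 / c)                      # E(T): about 1.58 years, i.e. roughly 19 months
    print(math.exp(5 * c))             # 1 - L(5): about 0.04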

4.6 Variance and Standard Deviation

4.6.1 §§ Motivation: Average difference from the average


In Example 4.5.2, we found that if we chose one of our 100 parents at random, the expected
number of nightly awakenings was 2.85. If we choose a parent at random in this way, how
can we determine whether that parent had a “usual” or “unusual” experience? Let’s
get our head around this problem with some preliminary observations.

• The expected value is not an integer. So no matter who we choose, we are guaran-
teed to not choose a parent with the expected number of awakenings. So, a “usual”
experience is not the same as actually achieving the expected value.

• If we choose a parent with 3 awakenings, that’s as close as we can get to the expec-
tation. It seems reasonable that when X « E( X ), that’s a fairly “usual” trial.

• Parents with two awakenings are the most numerous. So although these parents are
farther from average, we are more likely to choose one of them than we are to choose
any other. So it is not enough to look for value of X that are closest to E( X ).

• Suppose we choose a parent with 4 awakenings. Is this so far above average that it is very unusual (and so possibly a cause for concern), or is it still within a reasonably common range? This question will bring us to the heart of the matter: how far from E(X) is still “usual”?

To quantify the last bullet point, let’s compare each parent’s experience to the expected
value. If your baby woke you up twice during the night, then your experience differs from
the average by 0.85. If your baby woke you up three times during the night, then your
experience differs from the average by 0.15. Let’s give that difference its own variable
name, Y. Larger values of Y mean a larger difference between the individual experience
and the expectation. So parents with a high Y value are “less usual” than parents with a
low Y-value.


    x    proportion of parents woken up x times    Y = |x − 2.85|
    0    5/100                                     2.85
    1    5/100                                     1.85
    2    40/100                                    0.85
    3    23/100                                    0.15
    4    13/100                                    1.15
    5    10/100                                    2.15
    6    0/100                                     3.15
    7    3/100                                     4.15
    8    1/100                                     5.15

The expectation of Y is about 1.15:
    E(Y) = Σ_{x=0}^{8} |x − 2.85|·Pr(X = x)
         = 2.85·(5/100) + 1.85·(5/100) + 0.85·(40/100) + 0.15·(23/100) + 1.15·(13/100)
           + 2.15·(10/100) + 3.15·(0/100) + 4.15·(3/100) + 5.15·(1/100)
         ≈ 1.15
That is, when we choose parents at random, on average their number of awakenings differs from the expected number of awakenings by about 1.15.
With that in mind, we might say a parent who wakes up between 2.85 − 1.15 = 1.70 and 2.85 + 1.15 = 4.00 times wakes up a “usual” number of times, while the other parents have experiences that are “unusual”. A parent whose baby never wakes them up is “unusual,” in that their experience is quite different from the expectation, but a parent whose baby wakes them up four times during the night is still (just barely) within the range of “usual”.
To generalize what we just computed:

• X is a random variable

• Y = |X ´ E( X )| tells us how different X is from its expectation


ˇ ˇ
• So, E(Y ) = E ˇX ´ E( X )ˇ is the expected difference from the expectation. We
ˇ

used this as a measure of how far off from E( X ) our variable X could be and still be
considered “usual”.


4.6.2 §§ Definitions and Computations


Definition 4.6.1.
The variance of a random variable X, denoted Var(X), is:
    Var(X) = E[ (X − E(X))² ].
The standard deviation of X, written σ(X), is the square root of the variance:
    σ(X) = √Var(X)

Recall from Definition 4.5.1 that the definition of E( X ) depends on whether X is con-
tinuous or discrete.
Corollary 4.6.2.
If X is a discrete random variable, then
    Var(X) = Σ (x − E(X))²·Pr(X = x)
where the sum is taken over every possible value of X.
If X is a continuous random variable, then
    Var(X) = ∫_{−∞}^{∞} (x − E(X))²·f(x) dx
where f(x) is the probability density function (PDF) of X.

Note the similarities between Var( X ) and E(Y ) from the end of the last subsection,
4.6.1. Their interpretations are similar: Var( X ) measures the expected squared difference
between X and E( X )12 .
One reason we replace |X − E(X)| with (X − E(X))² is that the function f(x) = |x − E(X)| is not differentiable (at x = E(X)), while f(x) = (x − E(X))² is differentiable everywhere. We want to be able to use calculus tools, so differentiability is desirable.
Example 4.6.3

Consider the random variable X with probability mass function (PMF) given below.
    x     Pr(X = x)
    0     1/2
    10    1/2

12 To explore why we need absolute values or squares, see Question 13 in Section 4.6 of the practice book.


X takes on values from [0, 10], with E(X) = 5. Every value of X differs from E(X) by 5. However,
    Var(X) = Σ_x (x − 5)²·Pr(X = x) = (0 − 5)²·(1/2) + (10 − 5)²·(1/2) = 25
A drawback to replacing |X − E(X)| with (X − E(X))² is that the variance may no longer be in a meaningful range. In this case, 25 is not in the range of numbers we're considering, so it's hard to interpret this as a “usual difference” between X and E(X). That's why we define the standard deviation:
    σ(X) = √25 = 5
We take the square root of Var(X) to somehow atone for our previous transgression of squaring |X − E(X)|. Informally, we think of the standard deviation as the “usual” difference between X and E(X).
Example 4.6.3

Example 4.6.4

One thousand students take a midterm, and we choose one student uniformly at random.
X is the mark the student got on the midterm, out of 100. For this particular group of 1000
students, E( X ) = 65 and σ( X ) = 15.

• Suppose we select Student A, who earned 60 points. Although this is below the class
average, it is within one standard deviation of the expectation. That is,

|X ´ E( X )| = 5 ă 15 = σ ( X ).

So this student is not below average in a really significant way.

• If we select Student B who scored 90, not only are they above the class average, they
are well above the class average. The difference between X and E( X ) is greater than
usual.

• If we select Student C who scored 45, not only are they below the class average, they
are well below the class average. The difference between X and E( X ) is worse than
usual.

• In general, we think of students with grades from 50 to 80 as having a “usual” score.


Those numbers come from E( X ) ´ σ( X ) = 50 and E( X ) + σ( X ) = 80.

Example 4.6.4

The variance of a random variable is often calculated in the manner below:


Corollary 4.6.5.
    Var(X) = E(X²) − [E(X)]²

Proof. In the continuous case, from Corollary 4.6.2:
    Var(X) = ∫_{−∞}^{∞} (x − E(X))²·f(x) dx
           = ∫_{−∞}^{∞} (x² − 2x·E(X) + [E(X)]²)·f(x) dx
           = ∫_{−∞}^{∞} x² f(x) dx − 2E(X) ∫_{−∞}^{∞} x f(x) dx + [E(X)]² ∫_{−∞}^{∞} f(x) dx
By the definition of E(X), Definition 4.5.1:
           = E(X²) − 2·E(X)·E(X) + [E(X)]² ∫_{−∞}^{∞} f(x) dx
By property 3 of Corollary 4.4.8,
           = E(X²) − 2[E(X)]² + [E(X)]²·1
           = E(X²) − [E(X)]²
The discrete case progresses similarly. From Corollary 4.6.2:
    Var(X) = Σ_x (x − E(X))²·Pr(X = x)
           = Σ_x (x² − 2x·E(X) + [E(X)]²)·Pr(X = x)
           = Σ_x x²·Pr(X = x) − 2E(X) Σ_x x·Pr(X = x) + [E(X)]² Σ_x Pr(X = x)
By the definition of E(X), Definition 4.5.1:
           = E(X²) − 2·E(X)·E(X) + [E(X)]² Σ_x Pr(X = x)
By the definition of a probability mass function (PMF), Definition 4.2.1, the remaining sum is 1, so
           = E(X²) − 2[E(X)]² + [E(X)]²·1
           = E(X²) − [E(X)]²

Example 4.6.6

Suppose X is a continuous random variable with probability density function (PDF)
    f(x) = x/50 for 0 ≤ x ≤ 10, and f(x) = 0 otherwise.
We will calculate Var(X) two ways.
• To calculate Var(X), we first need to know E(X):
    E(X) = ∫_{−∞}^{∞} x·f(x) dx = ∫_0^{10} x·(x/50) dx = ∫_0^{10} x²/50 dx = [ x³/150 ]_0^{10} = 10³/150 = 20/3
• Using Corollary 4.6.2,
    Var(X) = ∫_{−∞}^{∞} (x − E(X))²·f(x) dx = ∫_0^{10} (x − 20/3)²·(x/50) dx
           = ∫_0^{10} (x² − (40/3)x + 400/9)·(x/50) dx = (1/50) ∫_0^{10} (x³ − (40/3)x² + (400/9)x) dx
           = (1/50) [ x⁴/4 − (40/9)x³ + (200/9)x² ]_0^{10} = (1/50) ( 10⁴/4 − (40·10³)/9 + (200·10²)/9 )
           = (10⁴/50) ( 1/4 − 4/9 + 2/9 ) = 50/9
• Using Corollary 4.6.5,
    Var(X) = E(X²) − [E(X)]²
           = ∫_{−∞}^{∞} x²·f(x) dx − (20/3)² = ∫_0^{10} x²·(x/50) dx − 400/9
           = [ x⁴/200 ]_0^{10} − 400/9 = 10⁴/200 − 400/9 = 50 − 400/9 = 50/9

Example 4.6.6

Example 4.6.7

Calculate the variance (two ways) and standard deviation of a dice roll.

Solution. (Since we'll be evaluating sums, Theorem 3.1.6 comes in handy.)
Let X be the random variable that takes on the number rolled. By Definition 4.5.1,
    E(X) = Σ_{x=1}^{6} x·Pr(X = x) = (1/6) Σ_{x=1}^{6} x = (1/6)(6·7/2) = 7/2
Using Corollary 4.6.2,
    Var(X) = Σ_x (x − E(X))²·Pr(X = x) = Σ_{x=1}^{6} (x − 7/2)²·(1/6)
           = Σ_{x=1}^{6} (1/6)(x² − 7x + 49/4) = (1/6) Σ_{x=1}^{6} x² − (7/6) Σ_{x=1}^{6} x + 49/4
           = (1/6)(6·7·13/6) − (7/6)(7·6/2) + 49/4
           = 35/12
Using Corollary 4.6.5 to calculate a second way,
    Var(X) = E(X²) − [E(X)]² = Σ_{x=1}^{6} x²·Pr(X = x) − (7/2)²
           = Σ_{x=1}^{6} (1/6)x² − 49/4 = (1/6)(6·7·13/6) − 49/4 = 35/12
(Computing the variance two different ways is not usually necessary, but it can be a good way to double-check your work.)
Using Definition 4.6.1, σ(X) = √Var(X) = √(35/12) ≈ 1.7

Example 4.6.7
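Both formulas for the variance are easy to check by computer, and exact arithmetic keeps the fractions intact. Here is a Python sketch (added as an illustration):

    from fractions import Fraction

    values = range(1, 7)
    p = Fraction(1, 6)                               # fair die: Pr(X = x) = 1/6

    EX = sum(x * p for x in values)                  # 7/2
    EX2 = sum(x * x * p for x in values)             # 91/6

    var_defn = sum((x - EX)**2 * p for x in values)  # Corollary 4.6.2
    var_shortcut = EX2 - EX**2                       # Corollary 4.6.5

    print(var_defn, var_shortcut)                    # both 35/12
    print(float(var_defn) ** 0.5)                    # sigma(X), about 1.708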

Example 4.6.8

A continuous random variable W has cumulative distribution function (CDF)
    F(x) = 0 for x < 0;   F(x) = e^x − 1 for 0 ≤ x ≤ ln 2;   F(x) = 1 for x > ln 2.
Calculate the variance and standard deviation of W. For practice, use both methods discussed in this section for computing variance.
Solution.
We use the variance to calculate the standard deviation; we use the expected value to calculate the variance; we use the probability density function (PDF) to calculate the expected value; and we use the cumulative distribution function (CDF) to define the probability density function (PDF). Working backwards, this gives us a plan for performing the necessary calculations:
    F(x) → (Step 1) → f(x) → (Step 2) → E(W) → (Step 3) → Var(W) → (Step 4) → σ(W)


Step 1: Definition 4.4.1 tells us the probability density function (PDF) is the derivative of the cumulative distribution function (CDF). Differentiating
    F(x) = 0 for x < 0;   F(x) = e^x − 1 for 0 ≤ x ≤ ln 2;   F(x) = 1 for x > ln 2
gives
    f(x) = e^x for 0 < x < ln 2, and f(x) = 0 otherwise.
Step 2: Using Definition 4.5.1,
    E(W) = ∫_{−∞}^{∞} x·f(x) dx = ∫_0^{ln 2} x·e^x dx
We use integration by parts with u = x, dv = e^x dx; du = dx, v = e^x:
    = [ x e^x ]_0^{ln 2} − ∫_0^{ln 2} e^x dx = 2 ln 2 − [2 − 1] = 2 ln 2 − 1 ≈ 0.39
We can do a quick reliability check using Theorems 4.5.7 and 4.5.8. Our variable W spends its entire life between 0 and ln 2 ≈ 0.69, so we expect E(W) to be in that same interval, which is true. Since f(x) is increasing on the relevant interval, W spends more time near the larger numbers, so we also expect E(W) > (ln 2)/2 ≈ 0.35. This accords with our calculation.
Step 3: We'll be using the constant E(W) = 2 ln 2 − 1 a lot in the calculations below, so we'll use logarithm rules to write it more compactly:
    2 ln 2 − 1 = ln(2²) − ln e = ln 4 − ln e = ln(4/e)
The benefit of this equivalent expression is that when we square it, there are no binomials to expand. (It is of course perfectly possible to do the computations with 2 ln 2 − 1.)
Using Corollary 4.6.2,
    Var(W) = ∫_{−∞}^{∞} (x − E(W))²·f(x) dx = ∫_0^{ln 2} (x − ln(4/e))²·e^x dx
           = ∫_0^{ln 2} ( x² − 2x ln(4/e) + ln²(4/e) ) e^x dx
           = ∫_0^{ln 2} x² e^x dx − 2 ln(4/e) ∫_0^{ln 2} x·e^x dx + ln²(4/e) ∫_0^{ln 2} e^x dx
           = ∫_0^{ln 2} x² e^x dx − 2 ln(4/e)·E(W) + ln²(4/e)·(2 − 1)
We'll use integration by parts on the remaining integral: u = x², dv = e^x dx; du = 2x dx, v = e^x:
           = [ x² e^x ]_0^{ln 2} − ∫_0^{ln 2} 2x·e^x dx − 2 ln²(4/e) + ln²(4/e)
           = 2 ln² 2 − 2E(W) − ln²(4/e)
To simplify, we'll revert the logarithms to the form 2 ln 2 − 1, rather than ln(4/e):
           = 2 ln² 2 − 2(2 ln 2 − 1) − (2 ln 2 − 1)²
           = 2 ln² 2 − 4 ln 2 + 2 − (4 ln² 2 − 4 ln 2 + 1)
           = 1 − 2 ln² 2 ≈ 0.039
Using Corollary 4.6.5,
    Var(W) = E(W²) − [E(W)]² = ∫_0^{ln 2} x² e^x dx − (2 ln 2 − 1)²
The integral was already computed in the work above:
           = (2 ln² 2 − 4 ln 2 + 2) − (2 ln 2 − 1)²
           = 1 − 2 ln² 2 ≈ 0.039
Step 4: By Definition 4.6.1,
    σ(W) = √Var(W) = √(1 − 2 ln² 2) ≈ 0.198
Example 4.6.8
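If you would like an independent check of Steps 2–4, a short Python sketch (added as an illustration, using a midpoint-rule integrator of our own choosing) integrates the PDF f(x) = e^x on (0, ln 2) numerically:

    import math

    def integrate(g, lo, hi, n=200000):
        # midpoint rule
        dx = (hi - lo) / n
        return sum(g(lo + (i + 0.5) * dx) for i in range(n)) * dx

    lo, hi = 0.0, math.log(2)
    f = math.exp                                      # the PDF of W on (0, ln 2)

    EW = integrate(lambda x: x * f(x), lo, hi)
    EW2 = integrate(lambda x: x * x * f(x), lo, hi)
    var = EW2 - EW**2

    print(EW, 2 * math.log(2) - 1)                    # both about 0.386
    print(var, 1 - 2 * math.log(2)**2)                # both about 0.039
    print(math.sqrt(var))                             # about 0.198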

4.6.3 §§ Checking your Standard Deviation Calculation


Corollary 4.6.9.
Let a, b be real numbers with a < b and suppose a random variable X takes values from the interval [a, b]. Then
    0 ≤ σ(X) ≤ (b − a)/2


Proof. First, consider what happens when we replace E(X) with (b + a)/2 (the midpoint of the sample space) in the definition of variance (Definition 4.6.1). In the continuous case,
    E[ (X − (b+a)/2)² ] = ∫_a^b (x − (b+a)/2)²·f(x) dx
      = ∫_a^b ( x² − (b + a)x + ((b+a)/2)² )·f(x) dx
      = ∫_a^b x² f(x) dx − (b + a) ∫_a^b x f(x) dx + ((b+a)/2)² ∫_a^b f(x) dx
      = E(X²) − (b + a)E(X) + ((b+a)/2)²
      = E(X²) − [E(X)]² + [E(X)]² − (b + a)E(X) + ((b+a)/2)²
      = E(X²) − [E(X)]² + ( E(X) − (b+a)/2 )²
      ≥ E(X²) − [E(X)]² = Var(X)     (*)
Since X takes values in the interval [a, b]:
    a ≤ X ≤ b
    ⟹  a − (b+a)/2 ≤ X − (b+a)/2 ≤ b − (b+a)/2
    ⟹  −(b−a)/2 ≤ X − (b+a)/2 ≤ (b−a)/2
    ⟹  0 ≤ ( X − (b+a)/2 )² ≤ ( (b−a)/2 )²
By Theorem 4.5.7,
    0 ≤ E[ (X − (b+a)/2)² ] ≤ ( (b−a)/2 )²
So, with our previous result (*),
    Var(X) ≤ E[ (X − (b+a)/2)² ] ≤ ( (b−a)/2 )²
So, σ(X) ≤ (b − a)/2.

Example 4.6.10

If the random variable X takes on values from the interval [1, 5], then 0 ≤ σ(X) ≤ 2. Since σ(X) = √Var(X), this means 0 ≤ Var(X) ≤ 4.


Example 4.6.10

Chapter 4 contains content adapted by Bruno Belevan, Parham Hamidi, and Elyse
Yeager from Sections 1.1, 3.1, Ch 4 introduction, 4.1, and 4.2 of Introductory Statistics by
Ilowsky and Dean under a Creative Commons Attribution License v4.0.

Chapter 5

SEQUENCE AND SERIES

You have probably learned about Taylor polynomials1 and, in particular, that
    e^x = 1 + x + x²/2! + x³/3! + ⋯ + x^n/n! + E_n(x)
where E_n(x) is the error introduced when you approximate e^x by its Taylor polynomial of degree n. You may have even seen a formula for E_n(x). We are now going to ask what happens as n goes to infinity? Does the error go to zero, giving an exact formula for e^x? We shall later see that it does and that
    e^x = 1 + x + x²/2! + x³/3! + ⋯ = Σ_{n=0}^∞ x^n/n!
At this point we haven't defined, or developed any understanding of, this infinite sum. How do we compute the sum of an infinite number of terms? Indeed, when does a sum of an infinite number of terms even make sense? Clearly we need to build up foundations to deal with these ideas. Along the way we shall also see other functions for which the corresponding error obeys lim_{n→∞} E_n(x) = 0 for some values of x and not for other values of x.
To motivate the next section, consider using the above formula with x = 1 to compute the number e:
    e = 1 + 1 + 1/2! + 1/3! + ⋯ = Σ_{n=0}^∞ 1/n!
As we stated above, we don't yet understand what to make of this infinite number of terms, but we might try to sneak up on it by thinking about what happens as we take

1 Now would be an excellent time to quickly read over your notes on the topic.


more and more terms.
    1 term:   1 = 1
    2 terms:  1 + 1 = 2
    3 terms:  1 + 1 + 1/2 = 2.5
    4 terms:  1 + 1 + 1/2 + 1/6 = 2.666666...
    5 terms:  1 + 1 + 1/2 + 1/6 + 1/24 = 2.708333...
    6 terms:  1 + 1 + 1/2 + 1/6 + 1/24 + 1/120 = 2.716666...
By looking at the infinite sum in this way, we naturally obtain a sequence of numbers
    { 1, 2, 2.5, 2.666666..., 2.708333..., 2.716666..., ⋯ }.
The key to understanding the original infinite sum is to understand the behaviour of this sequence of numbers — in particular, what do the numbers do as we go further and further? Does it settle down2 to a given limit?

5.1 Sequences
In the discussion above we used the term “sequence” without giving it a precise mathe-
matical meaning. Let us rectify this now.

Definition 5.1.1.
A sequence is a list of infinitely3 many numbers with a specified order. It is denoted
    a_1, a_2, a_3, ⋯, a_n, ⋯    or    {a_n}    or    {a_n}_{n=1}^∞
We will often specify a sequence by writing it more explicitly, like
    { a_n = f(n) }_{n=1}^∞
where f(n) is some function from the natural numbers to the real numbers.

2 You will notice a great deal of similarity between the results of the next section and “limits at infinity”
which was covered last term.
3 For the more pedantic reader, here we mean a list of countably infinitely many numbers. The interested
(pedantic or otherwise) reader should look up countable and uncountable sets.


Example 5.1.2

Here are three sequences.
    { 1, 1/2, 1/3, ⋯, 1/n, ⋯ }    or    { a_n = 1/n }_{n=1}^∞
    { 1, 2, 3, ⋯, n, ⋯ }    or    { a_n = n }_{n=1}^∞
    { 1, −1, 1, −1, ⋯, (−1)^{n−1}, ⋯ }    or    { a_n = (−1)^{n−1} }_{n=1}^∞
It is not necessary that there be a simple explicit formula for the nth term of a sequence. For example the decimal digits of π form a perfectly good sequence
    { 3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9, 3, 2, 3, 8, 4, 6, 2, 6, 4, 3, 3, 8, ⋯ }
but there is no simple formula4 for the nth digit.


Example 5.1.2
Our primary concern with sequences will be the behaviour of an as n tends to infinity and,
in particular, whether or not an “settles down” to some value as n tends to infinity.

Definition 5.1.3.
A sequence {a_n}_{n=1}^∞ is said to converge to the limit A if a_n approaches A as n tends to infinity. If so, we write
    lim_{n→∞} a_n = A    or    a_n → A as n → ∞
A sequence is said to converge if it converges to some limit. Otherwise it is said to diverge.

The reader should immediately recognise the similarity with limits at infinity:
    lim_{x→∞} f(x) = L if f(x) → L as x → ∞

Example 5.1.4

Three of the four sequences in Example 5.1.2 diverge:
• The sequence {a_n = n}_{n=1}^∞ diverges because a_n grows without bound, rather than approaching some finite value, as n tends to infinity.

4 There is, however, a remarkable result due to Bailey, Borwein and Plouffe that can be used to compute
the nth binary digit of π (i.e. writing π in base 2 rather than base 10) without having to work out the
preceding digits.


• The sequence {a_n = (−1)^{n−1}}_{n=1}^∞ diverges because a_n oscillates between +1 and −1 rather than approaching a single value as n tends to infinity.
• The sequence of the decimal digits of π also diverges, though the proof that this is the case is a bit beyond us right now5 .
The other sequence in Example 5.1.2 has a_n = 1/n. As n tends to infinity, 1/n tends to zero. So
    lim_{n→∞} 1/n = 0

Example 5.1.4

 
Example 5.1.5 ( lim_{n→∞} n/(2n+1) )
Here is a little less trivial example. To study the behaviour of n/(2n+1) as n → ∞, it is a good idea to write it as
    n/(2n + 1) = 1/(2 + 1/n)
As n → ∞, the 1/n in the denominator tends to zero, so that the denominator 2 + 1/n tends to 2 and 1/(2 + 1/n) tends to 1/2. So
    lim_{n→∞} n/(2n + 1) = lim_{n→∞} 1/(2 + 1/n) = 1/2

Example 5.1.5
Notice that in this last example, we are really using techniques that we used before to
study infinite limits like lim f ( x ). This experience can be easily transferred to dealing
xÑ8
with lim an limits by using the following result.
nÑ8

Theorem5.1.6.

If
lim f ( x ) = L
xÑ8

and if an = f (n) for all positive integers n, then

lim an = L
nÑ8

5 If the digits of π were to converge, then π would have to be a rational number. The irrationality of π
(that it cannot be written as a fraction) was first proved by Lambert in 1761. Niven’s 1947 proof is more
accessible and we invite the interested reader to use their favourite search engine to find step–by–step
guides to that proof.


 
Example 5.1.7 ( lim_{n→∞} e^{−n} )
Set f(x) = e^{−x}. Then e^{−n} = f(n) and, since lim_{x→∞} e^{−x} = 0, we know that lim_{n→∞} e^{−n} = 0.

Example 5.1.7
The bulk of the rules for the arithmetic of limits of functions that you already know also
apply to the limits of sequences. That is, the rules you learned to work with limits such as
lim f ( x ) also apply to limits like lim an .
xÑ8 nÑ8

Theorem5.1.8 (Arithmetic of limits).


(8 (8
Let A, B and C be real numbers and let the two sequences an n =1
and bn n =1
converge to A and B respectively. That is, assume that

lim an = A lim bn = B
nÑ8 nÑ8

Then the following limits hold.


 
(a) lim an + bn = A + B
nÑ8
(The limit of the sum is the sum of the limits.)
 
(b) lim an ´ bn = A ´ B
nÑ8
(The limit of the difference is the difference of the limits.)

(c) lim Can = CA.


nÑ8

(d) lim an bn = A B
nÑ8
(The limit of the product is the product of the limits.)
an A
(e) If B ‰ 0 then lim =
nÑ8 bn B
(The limit of the quotient is the quotient of the limits provided the limit of the
denominator is not zero.)

We use these rules to evaluate limits of more complicated sequences in terms of the
limits of simpler sequences — just as we did for limits of functions.

Example 5.1.9


Combining Examples 5.1.5 and 5.1.7,
    lim_{n→∞} [ n/(2n + 1) + 7e^{−n} ] = lim_{n→∞} n/(2n + 1) + lim_{n→∞} 7e^{−n}     (by Theorem 5.1.8.a)
      = lim_{n→∞} n/(2n + 1) + 7 lim_{n→∞} e^{−n}     (by Theorem 5.1.8.c)
      = 1/2 + 7·0     (by Examples 5.1.5 and 5.1.7)
      = 1/2

Example 5.1.9

There is also a Squeeze Theorem for sequences.

Theorem5.1.10 (Squeeze Theorem).

If an ď cn ď bn for all natural numbers n, and if

lim an = lim bn = L
nÑ8 nÑ8

then
lim cn = L
nÑ8

Example 5.1.11

In this example we use the Squeeze Theorem to evaluate
    lim_{n→∞} [ 1 + π_n/n ]
where π_n is the nth decimal digit of π. That is,
    π_1 = 3, π_2 = 1, π_3 = 4, π_4 = 1, π_5 = 5, π_6 = 9, ⋯
We do not have a simple formula for π_n. But we do know that
    0 ≤ π_n ≤ 9   ⟹   0 ≤ π_n/n ≤ 9/n   ⟹   1 ≤ 1 + π_n/n ≤ 1 + 9/n
and we also know that
    lim_{n→∞} 1 = 1    and    lim_{n→∞} [ 1 + 9/n ] = 1
So the Squeeze Theorem with a_n = 1, c_n = 1 + π_n/n, and b_n = 1 + 9/n gives
    lim_{n→∞} [ 1 + π_n/n ] = 1


Example 5.1.11

Finally, recall that we can compute the limit of the composition of two functions using
continuity. In the same way, we have the following result:

Theorem5.1.12 (Continuous functions of limits).

If lim an = L and if the function g( x ) is continuous at L, then


nÑ8

lim g( an ) = g( L)
nÑ8

 
Example 5.1.13 ( lim_{n→∞} sin(πn/(2n+1)) )
Write sin(πn/(2n+1)) = g(n/(2n+1)) with g(x) = sin(πx). We saw, in Example 5.1.5, that
    lim_{n→∞} n/(2n + 1) = 1/2
Since g(x) = sin(πx) is continuous at x = 1/2, which is the limit of n/(2n+1), we have
    lim_{n→∞} sin(πn/(2n+1)) = lim_{n→∞} g(n/(2n+1)) = g(1/2) = sin(π/2) = 1

Example 5.1.13

5.1.1 §§ (Optional) Geometric and harmonic sequences in musical scales


Lists of numbers don’t always get added together6 , so sequences (that are not worked into
series) can be interesting in their own right. We present here an application of sequences
to music theory.
First, some musical preliminaries. Sound is caused by waves, and the frequency of a
sound wave determines its pitch – how high or low it sounds. Higher frequencies lead to
higher pitches, so the sound wave made by the chirp of a sparrow has a higher frequency
than the sound wave made by the growl of a dog. We measure frequency in Hz (Hertz),
which corresponds to periods per second. We often leave out the units, so you might see a
frequency referred to as “100” instead of “100 Hz.” We’ll picture the waves like the graph
of a sine function, although this is not how they would actually appear.
We’ll use the word “note” to mean a specified pitch. For example, the note named A4
usually corresponds to a frequency of 440 Hz. An interval is the “distance” between two
notes, quantified as the ratio of their frequencies. The way we perceive the “distance”

6 or they do but they shouldn’t be, e.g. this amusing sign


between two notes relies on the ratio of their two frequencies, which is why we use a ratio
and not a difference when measuring intervals.

Example 5.1.14

Consider the three pairs of notes below. Which pairs will sound roughly the same distance
from each other, and which will sound different?

1. 110 Hz and 193.25 Hz

2. 440 Hz and 523.25 Hz

3. 587.33 Hz and 698.46 Hz

Solution. To quantify how far apart two notes sound, we take the ratio of their frequencies.
1. 110 Hz and 193.25 Hz have a ratio of 193.25/110 ≈ 1.75682
2. 440 Hz and 523.25 Hz have a ratio of 523.25/440 ≈ 1.18920
3. 587.33 Hz and 698.46 Hz have a ratio of 698.46/587.33 ≈ 1.18921

The last two pairs of notes sound about the same distance away from one another, because
their ratios are nearly identical. The first pair of notes will sound farther apart from one
another than the other pairs.
Incidentally, the interval spanned in 2 and 3 has a name: a minor third. For listeners in
the Western tradition, the sound of two notes of such an interval being played together is
often evocative of a melancholy or enigmatic mood.
Example 5.1.14

A scale is a collection of notes. There are many different scales that are used, and
many more that are theoretically possible. Scales in context usually refer to the collection
of notes that make up most of a single piece of music. So, one song might mainly consist of
notes from a scale named “B Minor,” and another song might mainly consist of notes from
a scale named “G major pentatonic.” Generally speaking7 , standardized scales consist of
notes that people have decided they like hearing played together.

Example 5.1.15

The interval between some frequency a and the frequency 2a is called an octave. Some
popular musical scales divide the octave into twelve intervals. (In the partial piano schematic
below, the key labelled 13 would produce a note with twice the frequency of the key la-
belled 1.)

7 Precision in describing the things that people do is much harder to attain than precision in mathematics.


2 5 7 10 12

1 3 4 6 8 9 11 13

We call a scale “even-tempered” if consecutive notes always sound like they’re the
same distance apart from one another. Since the sound of notes in relation to each other
is determined by the ratio of their frequencies, this means means that the ratio of the
frequencies of two consecutive notes is the same, no matter which two consecutive notes
we’re considering.
Suppose the key labelled 1 makes the note 440Hz, and the key labelled 13 makes the
note 880 Hz (one octave above 440). If the piano is tuned to an even-tempered scale, what
are the frequencies associated with the keys labelled 2 through 12?
Solution.
Let the notes on the piano form the first part8 of a sequence, with key 1 making note a_1, key 2 making note a_2, and so on. We know three pieces of information:
1. a_1 = 440
2. a_13 = 880
3. a_2/a_1 = a_3/a_2 = a_4/a_3 = ⋯ = a_13/a_12
(Condition 3 comes from the description of even-tempering.) Let's give the number a_2/a_1 the name r (because it's a ratio). This gives us a recurrence relation to describe our partial sequence: since a_{n+1}/a_n = r, then a_{n+1} = r·a_n. We can now write out each element of the partial sequence in terms of r:
    a_1 = 440
    a_2 = 440r
    a_3 = (440r)r = 440r²
    a_4 = (440r²)r = 440r³
    ⋮
    a_12 = 440r¹¹
    a_13 = 440r¹²
Since we're given a_13 = 880, we can solve for r:
    880 = 440r¹²    so    r = 2^{1/12}

8 we defined sequences to be infinite, but pianos have only finitely many keys


Now, we can write down each note frequency (in Hz).
    1. 440
    2. 440·2^{1/12} ≈ 466.163
    3. 440·2^{2/12} ≈ 493.883
    4. 440·2^{3/12} ≈ 523.251
    5. 440·2^{4/12} ≈ 554.365
    6. 440·2^{5/12} ≈ 587.330
    7. 440·2^{6/12} ≈ 622.254
    8. 440·2^{7/12} ≈ 659.255
    9. 440·2^{8/12} ≈ 698.456
    10. 440·2^{9/12} ≈ 739.989
    11. 440·2^{10/12} ≈ 783.991
    12. 440·2^{11/12} ≈ 830.609

Example 5.1.15
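The geometric-sequence structure makes these frequencies a one-line computation. Here is a Python sketch (added as an illustration):

    a, r = 440, 2 ** (1 / 12)              # first frequency and common ratio

    for key in range(1, 14):               # keys 1 through 13
        print(key, round(a * r ** (key - 1), 3))
    # key 1 -> 440.0, key 2 -> about 466.16, ..., key 13 -> 880.0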

When we say “the interval between consecutive notes is the same,” we mean “the ratio
between consecutive notes is the same.” Having a common ratio between consecutive
terms is the defining characteristic of a geometric sequence.

Definition 5.1.16.
A geometric sequence is a sequence of the form
    { a, ar, ar², ⋯, ar^n, ⋯ }    or    { a_n = ar^n }_{n=0}^∞
where a and r are any two fixed real numbers with a ≠ 0.
Note a_{n+1}/a_n = r for every whole number n. We call r the common ratio.

(If we were to “add up” the terms of a geometric sequences, we’d get a geometric series
– see Example 5.2.4.)
When a tone is made by a vibrating physical9 object, although we may primarily pick
up on one frequency (the “fundamental”), usually waves of many different frequencies
are being generated. If we make a tone by causing a string to vibrate, as on a violin or
guitar, the waves that make noise have frequencies that are whole-number multiples of
the fundamental frequency. To explain this behaviour, note that the ends of the string are
fixed, so they can’t move up and down. So, the only waves that can occur on the string
are waves that keep these points fixed. The fundamental is the longest wave. The other
waves that are generated are called harmonics. The nth harmonic has frequency n times
the fundamental.

fundamental second harmonic third harmonic

9 For the following “physical” discussion, we’re relying on a very simplified model. However, the results
are indeed relevant to how actual musical instruments sound.


In the figure above, a string10 is fixed between two dots. We imagine it vibrating up
and down in a wave pattern, moving between the positions shown by the dashed and
dotted lines. The wavelength of these waves is inversely proportional to the frequency
they generate – so dividing a wavelength by (say) three causes the frequency to triple.
Example 5.1.17

A string, when played, has a fundamental tone of 100 Hz, with a wavelength of 1 m. Let {f_n} be the sequence of frequencies of the harmonics of the string, organized by increasing pitch (with f_1 = 100). Let {ℓ_n} be the sequence of corresponding wavelengths (so ℓ_1 = 1). What are {ℓ_n} and {f_n}?
Solution. The frequencies of harmonic tones are integer multiples of the fundamental, so
    f_1 = 100, f_2 = 200, f_3 = 300, ... , f_n = 100n
The wavelengths are inversely proportional to the frequencies. So, if frequency f_n is f_1·n, then wavelength ℓ_n is ℓ_1/n:
    ℓ_1 = 1, ℓ_2 = 1/2, ℓ_3 = 1/3, ... , ℓ_n = 1/n
Example 5.1.17

The sequence {1/n}_{n=1}^∞ is called the harmonic sequence. (We'll consider the harmonic
series in Example 5.3.4.) In music textbooks, you might see the sequence of harmonic
notes referred to as the “harmonic series.” This isn’t because the notes are added together,
it’s simply a different use of the word “series.”
Example 5.1.18

Consider an even-tempered musical scale with twelve intervals in each octave, the lowest
note of which is 250 Hz.
Suppose we have a string whose fundamental tone is 250 Hz. Which harmonics of the
string are also notes of the even-tempered scale?
Solution.
The even-tempered musical scale is given by the geometric sequence {e_n = a·r^n}_{n=0}^∞ where a = 250 and r = 2^{1/12}. The harmonic sequence of the string is {h_n = 250n}_{n=1}^∞.
All frequencies in the harmonic sequence are integer multiples of 250, and so are whole numbers. The only whole numbers in the geometric sequence e_n occur when 2 is raised to a whole-number power, i.e. when n is a multiple of 12. So our only candidates for frequencies that appear in both sequences have the form 250·2^k. It's quick to see that these occur in both: 250·2^k = e_n when n = 12k, and 250·2^k = h_n when n = 2^k.
So, the only intervals from the even-tempered scale that perfectly line up with the nat-
ural harmonics of the string are octaves: the fundamental, twice the fundamental, twice
that frequency, etc.

10 Similar wave behaviour occurs in tubes of air, like you might find in a brass instrument or woodwind.
Brass players can emphasize different harmonic notes by changing they way they blow into their in-
strument.


Example 5.1.18

Harmonics are produced naturally, so it's nice if they're “in tune” with the scale notes. The dearth of overlap between harmonic and geometric sequences is one reason that even-tempered scales are sometimes unpopular. However, many harmonic notes are approximated by the even-tempered scale above. For example, 2^{19/12} ≈ 2.9966 ≈ 3, so e_19 is a fair approximation to the third harmonic h_3.
Example 5.1.19

Suppose we were to make a scale that consisted only of harmonics. The frequencies would
make up the sequence {h_n = an}_{n=1}^∞, where a is the fundamental.
How would such a scale sound if we played the notes one after the other? Remember, the way two notes sound depends on the ratio of their frequencies. A bigger ratio sounds like a bigger “step” from one note to the next. So, let's define a sequence {r_n}_{n=2}^∞ to be the ratio of the nth harmonic to the note before it. A value of r_n that is close to 1 means the two notes sound the same. A value of r_n that is far from 1 means the two notes sound different.
    frequency:  a, 2a, 3a, 4a, 5a, 6a, 7a, ⋯, na
    ratio:      2, 3/2, 4/3, 5/4, 6/5, 7/6, ⋯, n/(n−1)

The sequences hn and rn have different limits, each with a musical interpretation.

• lim hn = 8 tells us that the notes of this sequence have no upper bound. We can
nÑ8
find notes as high as we please in this scale.

• lim rn = 1 tells us that notes of the scale sound more and more alike as we go higher.
nÑ8

The picture painted by these two limits is that the scale climbs higher and higher,
but does so in tiny increments, so that many different high-pitched notes are virtually
indistinguishable from one another. (On the other hand, the first step is huge: an entire
octave!)
Example 5.1.19

With this introduction to sequences and some tools to determine their limits, we can
now return to the problem of understanding infinite sums.

5.2 Series
A series is a sum
    a_1 + a_2 + a_3 + ⋯ + a_n + ⋯
of infinitely many terms. In summation notation, it is written
    Σ_{n=1}^∞ a_n

You already have a lot of experience with series, though you might not realise it. When you write a number using its decimal expansion you are really expressing it as a series. Perhaps the simplest example of this is the decimal expansion of 1/3:
    1/3 = 0.3333⋯
Recall that the expansion written in this way actually means
    0.333333⋯ = 3/10 + 3/100 + 3/1000 + 3/10000 + ⋯ = Σ_{n=1}^∞ 3/10^n
The summation index n is of course a dummy index. You can use any symbol you like (within reason) for the summation index:
    Σ_{n=1}^∞ 3/10^n = Σ_{i=1}^∞ 3/10^i = Σ_{j=1}^∞ 3/10^j = Σ_{ℓ=1}^∞ 3/10^ℓ

A series can be expressed using summation notation in many different ways. For example the following expressions all represent the same series:
    Σ_{n=1}^∞ 3/10^n = 3/10 + 3/100 + 3/1000 + ⋯          (the displayed terms are n = 1, n = 2, n = 3)
    Σ_{j=2}^∞ 3/10^{j−1} = 3/10 + 3/100 + 3/1000 + ⋯      (the displayed terms are j = 2, j = 3, j = 4)
    Σ_{ℓ=0}^∞ 3/10^{ℓ+1} = 3/10 + 3/100 + 3/1000 + ⋯      (the displayed terms are ℓ = 0, ℓ = 1, ℓ = 2)
    3/10 + Σ_{n=2}^∞ 3/10^n = 3/10 + 3/100 + 3/1000 + ⋯   (the displayed terms of the sum are n = 2, n = 3)

We can get from the first line to the second line by substituting n = j ´ 1 — don’t forget to
also change the limits of summation (so that n = 1 becomes j ´ 1 = 1 which is rewritten
as j = 2). To get from the first line to the third line, substitute n = ` + 1 everywhere,
including in the limits of summation (so that n = 1 becomes ` + 1 = 1 which is rewritten
as ` = 0).
Whenever you are in doubt as to what series a summation notation expression repre-
sents, it is a good habit to write out the first few terms, just as we did above.


Of course, at this point, it is not clear whether the sum of infinitely many terms adds up
to a finite number or not. In order to make sense of this we will recast the problem in terms
of the convergence of sequences (hence the discussion of the previous section). Before we
proceed more formally let us illustrate the basic idea with a few simple examples.
!
8
ÿ 3
Example 5.2.1
10n
n =1
3
As we have just seen above the series 8 n=1 10n is
ř

=1kj hkknik
hkknik =2kj hkknik
=3kj
8
ÿ 3 3 3 3
= + + +¨¨¨
10n 10 100 1000
n =1

Notice that the nth term in that sum is


n´1 zeroes
hkkikkj
3 ˆ 10 ´n
= 0. 00 ¨ ¨ ¨ 0 3
So the sum of the first 5, 10, 15 and 20 terms in that series are
5 10
ÿ 3 ÿ 3
= 0.33333 = 0.3333333333
10n 10n
n =1 n =1
15 20
ÿ 3 ÿ 3
= 0.333333333333333 = 0.33333333333333333333
10n 10n
n =1 n =1

It sure looks like that, as we add more and more terms, we get closer and closer to 0.3̇ = 31 .
3 1
So it is very reasonable11 to define 8n=1 10n to be 3 .
ř

Example 5.2.1

Example 5.2.2 ( Σ_{n=1}^∞ 1 and Σ_{n=1}^∞ (−1)^n )
Every term in the series Σ_{n=1}^∞ 1 is exactly 1. So the sum of the first N terms is exactly N. As we add more and more terms this grows unboundedly. So it is very reasonable to say that the series Σ_{n=1}^∞ 1 diverges.
The series
    Σ_{n=1}^∞ (−1)^n = (−1) + 1 + (−1) + 1 + (−1) + ⋯
So the sum of the first N terms is 0 if N is even and −1 if N is odd. As we add more and more terms from the series, the sum alternates between 0 and −1 for ever and ever. So the sum of all infinitely many terms does not make any sense and it is again reasonable to say that the series Σ_{n=1}^∞ (−1)^n diverges.

11 Of course we are free to define the series to be whatever we want. The hard part is defining it to be something that makes sense and doesn't lead to contradictions. We'll get to a more systematic definition shortly.
Example 5.2.2

In the above examples we have tried to understand the series by examining the sum of the first few terms and then extrapolating as we add in more and more terms. That is, we tried to sneak up on the infinite sum by looking at the limit of (partial) sums of the first few terms. This approach can be made into a more formal, rigorous definition. More precisely, to define what is meant by the infinite sum Σ_{n=1}^∞ a_n, we approximate it by the sum of its first N terms and then take the limit as N tends to infinity.

Definition 5.2.3.
The Nth partial sum of the series Σ_{n=1}^∞ a_n is the sum of its first N terms
    S_N = Σ_{n=1}^N a_n.
The partial sums form a sequence {S_N}_{N=1}^∞. If this sequence of partial sums converges, S_N → S as N → ∞, then we say that the series Σ_{n=1}^∞ a_n converges to S and we write
    Σ_{n=1}^∞ a_n = S
If the sequence of partial sums diverges, we say that the series diverges.

5.2.1 §§ Geometric Series

Example 5.2.4 (Geometric Series)

Let a and r be any two fixed real numbers with a ≠ 0. The series
    a + ar + ar² + ⋯ + ar^n + ⋯ = Σ_{n=0}^∞ ar^n
is called the geometric series with first term a and ratio r.
Notice that we have chosen to start the summation index at n = 0. That's fine. The first12 term is the n = 0 term, which is ar⁰ = a. The second term is the n = 1 term, which is ar¹ = ar. And so on. We could have also written the series Σ_{n=1}^∞ ar^{n−1}. That's exactly the same series — the first term is ar^{n−1}|_{n=1} = ar^{1−1} = a, the second term is ar^{n−1}|_{n=2} = ar^{2−1} = ar, and so on13. Regardless of how we write the geometric series, a is the first term and r is the ratio between successive terms.
Geometric series have the extremely useful property that there is a very simple formula for their partial sums. Denote the partial sum by
    S_N = Σ_{n=0}^N ar^n = a + ar + ar² + ⋯ + ar^N.
The secret to evaluating this sum is to see what happens when we multiply it by r:
    r·S_N = r( a + ar + ar² + ⋯ + ar^N )
          = ar + ar² + ar³ + ⋯ + ar^{N+1}
Notice that this is almost the same14 as S_N. The only differences are that the first term, a, is missing and one additional term, ar^{N+1}, has been tacked on the end. So
    S_N   = a + ar + ar² + ⋯ + ar^N
    r·S_N =     ar + ar² + ⋯ + ar^N + ar^{N+1}
Hence taking the difference of these expressions cancels almost all the terms:
    (1 − r)S_N = a − ar^{N+1} = a(1 − r^{N+1})
Provided r ≠ 1 we can divide both sides by 1 − r to isolate S_N:
    S_N = a · (1 − r^{N+1})/(1 − r).
On the other hand, if r = 1, then
    S_N = a + a + ⋯ + a   (N + 1 terms)   = a(N + 1)
So in summary:
    S_N = a(1 − r^{N+1})/(1 − r) if r ≠ 1,   and   S_N = a(N + 1) if r = 1.        (5.2.1)

12 It is actually quite common in computer science to think of 0 as the first integer. In that context, the set of natural numbers is defined to contain 0: N = {0, 1, 2, . . . },

while the notation

Z+ = t1, 2, 3, . . . u

is used to denote the (strictly) positive integers. Remember that in this text, as is more standard in
mathematics, we define the set of natural numbers to be the set of (strictly) positive integers.
13 This reminds the authors of the paradox of Hilbert’s hotel. The hotel with an infinite number of rooms
is completely full, but can always accommodate one more guest. The interested reader should use their
favourite search engine to find more information on this.
14 One can find similar properties of other special series, that allow us, with some work, to cancel many
terms in the partial sums. We will shortly see a good example of this. The interested reader should look
up “creative telescoping” to see how this idea might be used more generally, though it is somewhat
beyond this course.

364
S EQUENCE AND S ERIES 5.2 S ERIES

Now that we have this expression we can determine whether or not the series con-
a
verges. If |r| ă 1, then r N +1 tends to zero as N Ñ 8, so that S N converges to 1´r as N Ñ 8
and
8
a
ar n =
ÿ
provided |r| ă 1. (5.2.2)
n =0
1 ´ r

On the other hand if |r| ě 1, S N diverges. To understand this divergence, consider the
following 4 cases:

• If r ą 1, then r N grows to 8 as N Ñ 8.

• If r ă ´1, then the magnitude of r N grows to 8, and the sign of r N oscillates between
+ and ´, as N Ñ 8.

• If r = +1, then N + 1 grows to 8 as N Ñ 8.

• If r = ´1, then r N just oscillates between +1 and ´1 as N Ñ 8.

In each case the sequence of partial sums does not converge and so the series does not
converge.
Example 5.2.4

Equations 5.2.1 and 5.2.2 are worth stating as a theorem.

Theorem5.2.5 (Geometric Series and Partial Sums).

Let a and r be fixed real numbers, and let N be a positive integer. Then

&a ¨ 1´r N +1 if r ‰ 1
$
N
ÿ
n 1´r
ar =
%a( N + 1) if r = 1
n =0

and
8
a
ar n =
ÿ
provided |r| ă 1.
n =0
1 ´ r

If |r| ě 1 and a ‰ 0, then the series 8 n


n=0 ar diverges.
ř

Example 5.2.6 (Bitcoin Supply)

Bitcoin is a virtual currency that mimics traditional currencies in a number of ways. One
of those ways is controlled supply15 . That is, new bitcoins enter circulation over time in a
controlled manner.

15 Source for the specifics in this example: Controlled Supply, Bitcoin Wiki, url https://en.bitcoin.
it/wiki/Controlled_supply accessed 16 Aug 2020

365
S EQUENCE AND S ERIES 5.2 S ERIES

New blocks16 are searched for by computers. When a block is found, it is converted
into a set number of new bitcoins (owned by the finder). This is the reward for finding a
block.
This process is analogous to mining precious metals which then are added to the cur-
rency supply, so the process of finding new blocks is often called mining. Importantly,
the bitcoins given in the reward are new bitcoins that did not exist before the block was
found. So, finding blocks is how bitcoins are created.
The reward for finding a block started at 50 bitcoins, but it halves every 210,000 blocks.
The miners who found block 0, block 1, and block 209,999 each got a reward of 50 bitcoins.
Then, the miners who found block 210,000 through block 419,999 each got a reward of 25
bitcoins, and so on.
For the purposes of this example, we will assume that miners will always be able to
find blocks. (That is, blocks never run out.) We will also assume that rewards for finding
blocks are the only ways bitcoins are ever created, and that bitcoins are never destroyed.

(a) Suppose bitcoins are infinitely divisible. (That is, you can have an arbitrarily small
portion of a bitcoin, such as one-trillionth of a bitcoin, without a limit on how small
that portion can be.) If miners continue finding blocks for an infinite period of time,
what will happen to the total supply of bitcoins?

(b) One Satoshi (or one sat) is equal to 1/100, 000, 000 bitcoin. Suppose when the reward
for a block is scheduled to be less than one satoshi, the block finder actually gets a
reward of 0 bitcoins. That is, there are no more bitcoins created when the reward for
finding a new block dips below one satoshi. If miners continue finding blocks for an
infinite period of time, what will happen to the total supply of bitcoins?

Solution.

(a) Let’s model the number of bitcoins by grouping together collections of 210,000 blocks.

• For the first collection of 210,000 blocks, the number of bitcoins created is 50 each,
for a total of 210, 000 ¨ 50 bitcoins created.
50
• For the second collection of 210,000 blocks, the number of bitcoins created is 2 =
25 each, for a total of 210, 000 ¨ 50
2 bitcoins created.
50
• For the third collection of 210,000 blocks, the number of bitcoins created is 4 =
25 50
2 = 12.5 each, for a total of 210, 000 ¨ 4 bitcoins created.
• In general, for the nth collection of 210,000 blocks, the total number of bitcoins
50
created by those blocks is 210, 000 ¨ 2n´1 bitcoins.
• All together, the number of bitcoins created by an infinite collection of blocks is

8 8  n´1
ÿ 50 ÿ 1
210, 000 ¨ n´1 = 210, 000 ¨ 50
2 2
n =1 n =1

16 For the purposes of this question, the technical details are not important. What you need to know about
blocks is that you find them and they get turned into currency.

366
S EQUENCE AND S ERIES 5.2 S ERIES

This series almost, but not exactly, looks like the series from Theorem 5.2.5. We’ll
expand the series17 in order to see how we might have indexed the terms differently.
!
8  n´1  0  1  2
ÿ 1 1 1 1
210, 000 ¨ 50 = 210, 000 ¨ 50 + + +¨¨¨
2 2 2 2
n =1
8  n
ÿ 1
= 210, 000 ¨ 50
n =0
2

Now we can apply Theorem 5.2.5 with r = 12 .

1
= 210, 000 ¨ 50 ¨ = 210, 000 ¨ 50 ¨ 2 = 21, 000, 000
1 ´ 12

As blocks are mined, the total number of bitcoins will approach 21 million. It will
never exceed 21 million.

(b) For this part we assume that after a certain number of blocks, no more bitcoin are
created. So, we will look at a finite sum, rather than an infinite series. Let’s start by
figuring out when the reward for a block drops below 1 satoshi.
50
The nth batch of 210,000 blocks earns 2n´1 bitcoins, as long as that number is greater
than or equal to one satoshi. That is, we create bitcoins as long as:

50 1 1
n´1
ě = 8
2 100, 000, 000 10

Solving for n:

5 ¨ 109 ě 2n´1
log2 (5 ¨ 109 ) ě n ´ 1
1 + log2 (5 ¨ 109 ) ě n

Note n only makes sense as an integer. Using a calculator, 1 + log2 (5 ¨ 109 ) « 33.2. So
when n = 33, blocks earn rewards, but when n ě 34, they do not.
The means the total supply of bitcoins that could ever be created under this system is:

33  0  1  2  32 !
ÿ 50 1 1 1 1
210, 000 ¨ n´1 = 210, 000 ¨ 50 + + +¨¨¨+
2 2 2 2 2
n =1
32
ÿ 1
= 210, 000 ¨ 50
n =0
2n

17 indexing from 0 (starting with the 0th collection, then the 1st collection in the bullet list) would have
eliminated this upcoming step. We described the creation of the series using the indexing that we
thought would be most intuitive to our readers, rather than the indexing that would lead to the least
amount of algebra.

367
S EQUENCE AND S ERIES 5.2 S ERIES

1
Now we can apply Theorem 5.2.5 with r = 2 and N = 32.

 33
1´ 1  
2 1
= 210, 000 ¨ 50 ¨ = 210, 000 ¨ 100 ¨ 1 ´ 33
1 ´ 12 2

Using a calculator,

« 20, 999, 999.997555278

So the total supply of bitcoins approaches 20,999,999 bitcoins and 99,755,528 satoshi,
but never exceeds this amount.

Example 5.2.6

Now that we know how to handle geometric series let’s return to Example 5.2.1.

Example 5.2.7 (Decimal Expansions)

The decimal expansion

8
3 3 3 3 ÿ 3
0.3333 ¨ ¨ ¨ = + + + +¨¨¨ =
10 100 1000 10000 10n
n =1

3 1
is a geometric series with the first term a = 10 and the ratio r = 10 . So, by Example 5.2.4,

8
ÿ 3 3/10 3/10 1
0.3333 ¨ ¨ ¨ = n
= = =
10 1 ´ 1/10 9/10 3
n =1

just as we would have expected.


We can push this idea further. Consider the repeating decimal expansion:

16 16 16
0.16161616 ¨ ¨ ¨ = + + +¨¨¨
100 10000 1000000

16 1
This is another geometric series with the first term a = 100 and the ratio r = 100 . So, by
Example 5.2.4,

8
ÿ 16 16/100 16/100 16
0.16161616 ¨ ¨ ¨ = = = =
100n 1 ´ 1/100 99/100 99
n =1

again, as expected. In this way any periodic decimal expansion converges to a ratio of two
integers — that is, to a rational number.

368
S EQUENCE AND S ERIES 5.2 S ERIES

Here is another more complicated example.


12 34 34
0.1234343434 ¨ ¨ ¨ = + + +¨¨¨
100 10000 1000000
8
12 ÿ 34
= +
100 n=2 100n
12 34 1 34 1
= + by Example 5.2.4 with a = 2
and r =
100 10000 1 ´ 1/100 100 100
12 34 100
= +
100 10000 99
1222
=
9900
Example 5.2.7

5.2.2 §§ Telescoping Series


Typically, it is quite difficult to write down a neat closed form expression for the partial
sums of a series. Geometric series are very notable exceptions to this. Another family
of series for which we can write down partial sums is called “telescoping series”. These
series have the desirable property that many of the terms in the sum cancel each other out
rendering the partial sums quite simple.
Example 5.2.8 (Telescoping Series)
1
In this example, we are going to study the series n =1 n ( n +1) . This is a rather artificial se-
ř8

ries18 that has been rigged to illustrate a phenomenon call “telescoping”. Notice that the
nth term can be rewritten as
1 1 1
= ´
n ( n + 1) n n+1
and so we have
1
a n = bn ´ bn + 1 where bn = .
n
Because of this we get big cancellations when we add terms together. This allows us to
get a simple formula for the partial sums of this series.
1 1 1 1
SN = + + +¨¨¨+
1¨2 2¨3 3¨4 N ¨ ( N + 1)
1 1 1 1 1 1 1 1 
= ´ + ´ + ´ +¨¨¨+ ´
1 2 2 3 3 4 N N+1

18 Well. . . this sort of series does show up when you start to look at the Maclaurin polynomial of functions
like (1 ´ x ) ln(1 ´ x ). So it is not totally artificial. At any rate, it illustrates the basic idea of telescoping
very nicely, and the idea of “creative telescoping” turns out to be extremely useful in the study of series
— though it is well beyond the scope of this course.

369
S EQUENCE AND S ERIES 5.2 S ERIES

The second term of each bracket exactly cancels the first term of the following bracket. So
the sum “telescopes” leaving just
1
SN = 1 ´
N+1
and we can now easily compute
8
ÿ 1  1 
= lim S N = lim 1 ´ =1
n ( n + 1) NÑ8 NÑ8 N+1
n =1

Example 5.2.8

More generally, if we can write

a n = bn ´ bn + 1

for some other known sequence bn , then the series telescopes and we can compute partial
sums using
N
ÿ N
ÿ
an = ( bn ´ bn + 1 )
n =1 n =1
ÿN N
ÿ
= bn ´ bn + 1
n =1 n =1
= b1 ´ b N +1 .

and hence
8
ÿ
an = b1 ´ lim b N +1
NÑ8
n =1

8
provided this limit exists. Often lim b N +1 = 0 and then an = b1 . But this does not
ř
NÑ8 n =1
always happen. Here is an example.
Example 5.2.9 (A Divergent Telescoping Series)
8 
In this example, we are going to study the series log 1 + n1 . (We don’t specify the base
ř
n =1
— any base greater than one will behave the same way.) Let’s start by just writing out the
first few terms.
n =1
hkkkkkkikkkkkkj n =2
hkkkkkkikkkkkkj n =3
hkkkkkkikkkkkkj n =4
hkkkkkkikkkkkkj
8
ÿ  1  1  1  1  1
log 1 + = log 1 + + log 1 + + log 1 + + log 1 + +¨¨¨
n 1 2 3 4
n =1
3 4 5
= log(2) + log + log + log +¨¨¨
2 3 4

370
S EQUENCE AND S ERIES 5.2 S ERIES

This is pretty suggestive since

3 4 5  3 4 5


log(2) + log + log + log = log 2 ˆ ˆ ˆ = log(5)
2 3 4 2 3 4

So let’s try using this idea to compute the partial sum S N :

N 
ÿ 1
SN = log 1 +
n
n =1
n =1
hkkkkkkikkkkkkj n =2
hkkkkkkikkkkkkj n =3
hkkkkkkikkkkkkj n= N´1
hkkkkkkkkkikkkkkkkkkj n= N
hkkkkkkikkkkkkj
 1   1   1   1   1
= log 1 + + log 1 + + log 1 + + ¨ ¨ ¨ + log 1 + + log 1 +
1 2 3 N´1 N
3 4  N   N + 1
= log(2) + log + log + ¨ ¨ ¨ + log + log
2 3 N´1 N
 3 4 N N + 1
= log 2 ˆ ˆ ˆ ¨ ¨ ¨ ˆ ˆ
2 3 N´1 N
= log( N + 1)

Uh oh!

lim S N = lim log( N + 1) = +8


NÑ8 NÑ8

This telescoping series diverges! There is an important lesson here. Telescoping series can
diverge. They do not always converge to b1 .
Example 5.2.9

5.2.3 §§ Arithmetic of Series

As was the case for limits, differentiation and antidifferentiation, we can compute more
complicated series in terms of simpler ones by understanding how series interact with
the usual operations of arithmetic. It is, perhaps, not so surprising that there are simple
rules for addition and subtraction of series and for multiplication of a series by a constant.
Unfortunately there are no simple general rules for computing products or ratios of series.

371
S EQUENCE AND S ERIES 5.2 S ERIES

Theorem5.2.10 (Arithmetic of series).

Let A, B and C be real numbers and let the two series n =1 a n and n = 1 bn con-
ř8 ř8
verge to S and T respectively. That is, assume that
8
ÿ 8
ÿ
an = S bn = T
n =1 n =1

Then the following hold.


8 8
ÿ   ÿ  
(a) a n + bn = S + T and a n ´ bn = S ´ T
n =1 n =1

8
ÿ
(b) Can = CS.
n =1

Example 5.2.11

As a simple example of how we use the arithmetic of series Theorem 5.2.10, consider
8 h i
ÿ 1 2
+
7n n ( n + 1 )
n =1

We recognize that we know how to compute parts of this sum. We know that
8
ÿ 1 1/7 1
n
= =
7 1 ´ /7
1 6
n =1
1
because it is a geometric series (Example 5.2.4) with first term a = 7 and ratio r = 71 . And
we know that
8
ÿ 1
=1
n ( n + 1)
n =1
by Example 5.2.8. We can now use Theorem 5.2.10 to build the specified “complicated”
series out of these two “simple” pieces.
8 h i 8 8
ÿ 1 2 ÿ 1 ÿ 2
+ = + by Theorem 5.2.10.a
7n n ( n + 1 ) 7n n ( n + 1)
n =1 n =1 n =1
8 8
ÿ 1 ÿ 1
= +2 by Theorem 5.2.10.b
7n n ( n + 1)
n =1 n =1
1 13
= +2¨1 =
6 6

Example 5.2.11

372
S EQUENCE AND S ERIES 5.2 S ERIES

5.2.4 §§ (Optional) Intergenerational Cost-Benefit Analysis


This subsection presents ideas from the article19 Intergenerational cost–benefit analysis and
marine ecosystem restoration by UBC Institute for the Oceans and Fisheries Professor Ussif
Rashid Sumaila.
Generally we value the promise of money in the future less than we value the posses-
sion of money in the present. The discounting rate describes the loss of value that occurs
with time, and is calculated like interest. For example, suppose we have a discounting
rate of 10%. That means D dollars in our possession today has the same value to us as
D (1 + 0.1) = 1.1 ¨ D dollars promised to us in one year. These both have the same value
to us as (1.1)(1.1 ¨ D ) = 1.12 ¨ D dollars in two years, or 1.1t ¨ D dollars in t years:
D present-day dollars = (1.1t ¨ D ) future dollars
Dividing both sides of the equation by 1.1t , we see that the promise of D dollars in t years
D
is worth the same to us as the possession of 1.1 t dollars in the present:

D present-day dollars
D future dollars =
1.1t
In a conventional cost-benefit analysis (CBA), returns that will happen in the future
are subject to precisely this form of discounting. To quantify the value of a project, units
of Present Value (PV) are used. Given a discounting rate20 of δ, possession of D dollars
today has the same value as a gain of (1 + δ)t D dollars t years from now. Rearranged, the
present value of D dollars that will be gained t years in the future is given by
D
PV( D, t) =
(1 + δ ) t
Future discounting is human nature, but it doesn’t always make for good policy. In
particular, “high discount rates favour myopic fisheries policies resulting in global over-
fishing” (p. 334) since the model makes the health of an ecosystem one hundred years
from now worth almost nothing today.
Sumaila proposes an intergenerational model, where discounting still happens within
a generation of people, but different generations are considered together. Quoting the
article:
“The benefits to the current generation from the use of ecosystem resources to-
day would never have appeared in the conventional CBA[Cost-Benefit Analy-
sis] of the generations that were here a hundred years ago. Similarly, the gen-
eration that will be here in a hundred years time, would receive benefits from
restored marine ecosystems that would mean much to them but would not ap-
pear in the current generation’s conventional CBA. Therefore, to capture the
benefits to all generations from ecosystem restoration projects, it is necessary
to use [an intergenerational] CBA approach” (p. 336).

19 Sumaila UR. Intergenerational cost–benefit analysis and marine ecosystem restoration. Fish and
fisheries (Oxford, England). 2004;5(4):329-43. You can access the full text online with your UBC
CWL (campus-wide login) here: https://libkey.io/libraries/498/articles/30981866/
full-text-file?utm_source=api_542.
20 To better understand the rate, note that if δ = 0, then $1 today is worth the same to us as $1 one year
from now, 100 years from now, or at any other time in the future.

373
S EQUENCE AND S ERIES 5.2 S ERIES

The approach proposed by Sumaila is as follows. We divide up the future into distinct
generations, each of which reigns over a (non-overlapping) interval of time. Each gener-
ation has its own Present Value calculation, measured from the start of its reign. So the
Present Value of the promise of D dollars in year t, to a generation that started its reign in
year t0 , is
D
PV( D, t) =
(1 + δ)t´t0
The difference between this calculation and the conventional PV calculation is that “present”
is relative for each generation.
Now that we have these components, we can create an expression for a cost-benefit
analysis (CBA) of a long-term project.
Suppose in year t, the costs incurred by the project are given by Ct , and the benefits
are given by Vt . The net value gained in that year is Vt ´ Ct , before future discounting
is applied. If the generation started its reign in year t = t0 , then the present value of of
(Vt ´ Ct ) to that generation is (1V+t δ´C t
)t´t0
. If the generation reigns from t = t0 to t = t1 , then
we combine the net present value of each of those years to find the net present value to
the generation of the entire project.
To include a collection of generations, we add up each generation’s Net Present Value.
To express this in sigma notation, let NPVk be the Net Present Value for the kth generation.
We’ll index years as follows. The first generation reigns from t = t0 + 1 to t = t1 ; the
second generation reigns from t = t1 + 1 to t = t2 ; and (in general) the kth generation
reigns from t = tk´1 + 1 to t = tk . (Considering the first year to be t = t0 + 1 looks weird,
but makes the indices more consistent with one another.)

t0 t1 t2 t3 t4

generation 1 generation 2 generation 3 generation 4

Then generation k has its personal Net Present Value given by


tk
ÿ Vt ´ Ct
NPVk =
t=tk´1 +1
(1 + δ)t´tn

All together, the intergenerational Net Present Value of a project, from generation 1 to
generation L, is
L
ÿ
NPV = NPVk
k =1
 
L tk
ÿ

ÿ Vt ´ Ct 
=
t=tk´1 +1
(1 + δ)t´tn
k =1

If the NPV is positive, then the project is a good investment: adjusting for discount-
ing, but considering future generations, the benefits will exceed the costs. If the NPV is
negative, then the project is a bad investment.

374
S EQUENCE AND S ERIES 5.3 C ONVERGENCE T ESTS

5.3IJ Convergence Tests


It is very common to encounter series for which it is difficult, or even virtually impossi-
ble, to determine the sum exactly. Often you try to evaluate the sum approximately by
truncating it, i.e. having the index run only up to some finite N, rather than infinity. But
there is no point in doing so if the series diverges. So you like to at least know if the
series converges or diverges. Furthermore you would also like to know what error is in-
řN
troduced when you approximate 8 n=1 an by the “truncated series” n=1 an . That’s called
ř
the truncation error. There are a number of “convergence tests” to help you with this.

5.3.1 §§ The Divergence Test


Our first test is very easy to apply, but it is also rarely useful. It just allows us to quickly
reject some “trivially divergent” series. It is based on the observation that
řN
• by definition, a series 8 n=1 an converges to S when the partial sums S N = n =1 a n
ř
converge to S.

• Then, as N Ñ 8, we have S N Ñ S and, because N ´ 1 Ñ 8 too, we also have


S N´1 Ñ S.

• So a N = S N ´ S N´1 Ñ S ´ S = 0.
This tells us that, if we already know that a given series an is convergent, then the nth
ř
term of the series, an , must converge to 0 as n tends to infinity. In this form, the test is not
so useful. However the contrapositive21 of the statement is a useful test for divergence.

Theorem5.3.1 (Divergence Test).


(8
If the sequence an fails to converge to zero as n Ñ 8, then the series n =1 a n
ř8
n =1
diverges.

Example 5.3.2
n
Let an = n +1 . Then

n 1
lim an = lim = lim =1‰0
nÑ8 nÑ8 n + 1 nÑ8 1 + 1/n

21 Given a statement of the form “If A is true, then B is true” the contrapositive is “If B is not true, then A
is not true”. The two statements in quotation marks are logically equivalent — if one is true, then so is
the other. In the present context we have
If ( an converges) then (an converges to 0).
ř

The contrapositive of this statement is then


If (an does not converge to 0) then ( an does not converge).
ř

375
S EQUENCE AND S ERIES 5.3 C ONVERGENCE T ESTS

n
So the series diverges.
ř8
n =1 n +1
Example 5.3.2

Warning5.3.3.

The divergence test is a “one way ř8 test”. It tells us that if limnÑ8 an is nonzero,
or fails to exist, then the series n=1 an diverges. But it tells us absolutelyř
nothing
when limnÑ8 an = 0. In particular, it is perfectly possible for a series 8 n =1 a n
ř8 1
to diverge even though limnÑ8 an = 0. An example is n=1 n . We’ll show in
Example 5.3.6, below, that it diverges.

1
Now while convergence or divergence of series like 8 n=1 n can be determined using
ř
some clever tricks, it would be much better of have methods that are more systematic and
rely less on being sneaky. Over the next subsections we will discuss several methods for
testing series for convergence.
Note that while these tests will tell us whether or not a series converges, they do not
(except in rare cases) tell us what the series adds up to. For example, the test we will see
in the next subsection tells us quite immediately that the series
8
ÿ 1
n =1
n3

converges. However it does not tell us its value22 .

5.3.2 §§ The Integral Test


In the integral test, we think of a series 8n=1 an , that we cannot evaluate explicitly, as the
ř
area of a union of rectangles, with an representing the area of a rectangle of width one
and height an . Then we compare that area with the area represented by an integral, that
we can evaluate explicitly, much as we did in Theorem 3.10.17, the comparison test for
improper integrals. We’ll start with a simple example, to illustrate the idea. Then we’ll
move on to a formulation of the test in general.
Example 5.3.4
1
Visualise the terms of the harmonic series 8 n=1 n as a bar graph — each term is a rectan-
ř

gle of height n1 and width 1. The limit of the series is then the limiting area of this union
of rectangles. Consider the sketch on the left below.

22 This series converges to Apéry’s constant 1.2020569031 . . . . The constant is named for Roger Apéry
(1916–1994) who proved that this number must be irrational. This number appears in many contexts
including the following cute fact — the reciprocal of Apéry’s constant gives the probability that three
positive integers, chosen at random, do not share a common prime factor.

376
S EQUENCE AND S ERIES 5.3 C ONVERGENCE T ESTS

ř4 1
It shows that the area of the shaded columns, n =1 n , is bigger than the area under the
curve y = 1x with 1 ď x ď 5. That is

4 ż5
ÿ 1 1
ě dx
n 1 x
n =1

If we were to continue drawing the columns all the way out to infinity, then we would
have
8
1 1
ÿ ż8
ě dx
n 1 x
n =1

We are able to compute this improper integral exactly:


ż8
1 h iR
dx = lim ln |x| = +8
1 x RÑ8 1

That is the area under the curve diverges to +8 and so the area represented by the
columns must also diverge to +8.
It should be clear that the above argument can be quite easily generalised. For example
the same argument holds mutatis mutandis23 for the series
8
ÿ 1
n =1
n2

Indeed we see from the sketch on the right above that

N żN
ÿ 1 1
2
ď 2
dx
n =2
n 1 x

23 Latin for “Once the necessary changes are made”. This phrase still gets used a little, but these days
mathematicians tend to write something equivalent in English. Indeed, English is pretty much the
lingua franca for mathematical publishing. Quidquid erit.

377
S EQUENCE AND S ERIES 5.3 C ONVERGENCE T ESTS

and hence
8
1 1
ÿ ż8
2
ď 2
dx
n =2
n 1 x

This last improper integral is easy to evaluate:


 
1 1 R
ż8
dx = lim ´
2 x2 RÑ8 x 2
 
1 1 1
= lim ´ =
RÑ8 2 R 2

Thus we know that


8 8
ÿ 1 ÿ 1 3
2
= 1+ 2
ď .
n n =2
n 2
n =1

and so the series must converge.


Example 5.3.4

The above arguments are formalised in the following theorem.

Theorem5.3.5 (The Integral Test).

Let N0 be any natural number. If f ( x ) is a function which is defined and contin-


uous for all x ě N0 and which obeys

(i) f ( x ) ě 0 for all x ě N0 and


(ii) f ( x ) decreases as x increases and
(iii) f (n) = an for all n ě N0 .

y = f (x)
y a1
a2
a3 a4

1 2 3 4x

Then
8
ÿ ż8
an converges ðñ f ( x ) dx converges
n =1 N0

Furthermore, when the series converges, the truncation error


ˇ 8
ˇÿ N
ÿ
ˇ ż8
a a f ( x ) dx for all N ě N0
ˇ
ˇ n ´ n
ˇď
N
ˇ ˇ
n =1 n =1

378
S EQUENCE AND S ERIES 5.3 C ONVERGENCE T ESTS

Proof. Let I be any fixed integer with I ą N0 . Then

• 8 n=1 an converges if and only if n= I an converges — removing a fixed finite num-


ř ř8
ber of terms from a series cannot impact whether or not it converges.

• Since an ě 0 for all n ě I ą N0 , the sequence of partial sums s` = `n= I an obeys


ř
s`+1 = s` + an+1 ě s` . That is, s` increases as ` increases.

• So s` řmust either converge to some finite number or increase to infinity. That is,
(

either 8 n= I an converges to a finite number or it is +8.

y = f (x)

aI aI+1 aI+2 aI+3


x
I I +1 I +2 I +3

Look at the figure above. The shaded area in the figure is n= I an because
ř8

• the first shaded rectangle has height a I and width 1, and hence area a I and
• the second shaded rectangle has height a I +1 and width 1, and hence area a I +1 , and
so on

This shaded area is smaller than the area under the curve y = f ( x ) for I ´ 1 ď x ă 8. So
8
ÿ ż8
an ď f ( x ) dx
n= I I´1

and, if the integral is finite, the sum 8n= I an is finite too. Furthermore, the desired bound
ř
on the truncation error is just the special case of this inequality with I = N + 1:
8
ÿ N
ÿ 8
ÿ ż8
an ´ an = an ď f ( x ) dx
n =1 n =1 n = N +1 N

y = f (x)

aI aI+1 aI+2 aI+3


x
I I +1 I +2 I +3

For the
ř8“divergence case” look at the figure above. The (new) shaded area in the figure
is again n= I an because

379
S EQUENCE AND S ERIES 5.3 C ONVERGENCE T ESTS

• the first shaded rectangle has height a I and width 1, and hence area a I and
• the second shaded rectangle has height a I +1 and width 1, and hence area a I +1 , and
so on
This time the shaded area is larger than the area under the curve y = f ( x ) for I ď x ă 8.
So
ÿ8 ż8
an ě f ( x ) dx
n= I I

and, if the integral is infinite, the sum n= I an is infinite too.


ř8

Now that we have the integral test, it is straightforward to determine for which values
of p the series24
8
ÿ 1
np
n =1

converges.
 8

1
Example 5.3.6 The p test:
ř
np
n =1
1
Let p ą 0. We’ll now use the integral test to determine whether or not the series
ř8
n =1 n p
(which is sometimes called the p–series) converges.
• To do so, we need a function f ( x ) that obeys f (n) = an = n1p for all n bigger than
some N0 . Certainly f ( x ) = x1p obeys f (n) = n1p for all n ě 1. So let’s pick this f and
try N0 = 1. (We can always increase N0 later if we need to.)

• This function also obeys the other two conditions of Theorem 5.3.5:

(i) f ( x ) ą 0 for all x ě N0 = 1 and


(ii) f ( x ) decreases as x increases because f 1 ( x ) = ´p x p1+1 ă 0 for all x ě N0 = 1.
1
• So the integral test tells us that the series converges if and only if the integral
ř8
ş8 dx n =1 n p
1 x p converges.
dx
ş8
• We have already seen, in Example 3.10.8, that the integral 1 xp converges if and
only if p ą 1.

24 This series, viewed as a function of p, is called the Riemann zeta function, ζ ( p), or the Euler-Riemann
zeta function. It is extremely important because of its connections to prime numbers (among many
other things). Indeed Euler proved that
8
1  ´1
1 ´ P´ p
ÿ ź
ζ ( p) = p =
n
n =1 P prime

Riemann showed the connections between the zeros of this function (over complex numbers p) and
the distribution of prime numbers. Arguably the most famous unsolved problem in mathematics, the
Riemann hypothesis, concerns the locations of zeros of this function.

380
S EQUENCE AND S ERIES 5.3 C ONVERGENCE T ESTS

1
So we conclude that converges if and only if p ą 1. This is sometimes called the
ř8
n =1 n p
p–test.
1
• In particular, the series 8 n=1 n , which is called the harmonic series, has p = 1 and so
ř
diverges. As we add more and more terms of this series together, the terms we add,
namely n1 , get smaller and smaller and tend to zero, but they tend to zero so slowly
that the full sum is still infinite.
1
• On the other hand, the series 8 n=1 n1.000001 has p = 1.000001 ą 1 and so converges.
ř
This time as we add more and more terms of this series together, the terms we add,
1
namely n1.000001 , tend to zero (just) fast enough that the full sum is finite. Mind you,
for this example, the convergence takes place very slowly — you have to take a huge
number of terms to get a decent approximation to the full sum. If we approximate
1 řN 1
n=1 n1.000001 by the truncated series n=1 n1.000001 , we make an error of at most
ř8

ż8
dx
żR
dx 1 h 1 1 i 106
= lim = lim ´ ´ =
N x1.000001 RÑ8 N x1.000001 RÑ8 0.000001 R0.000001 N 0.000001 N 0.000001
This does tend to zero as N Ñ 8, but really slowly.

Example 5.3.6

1
We now know that the dividing line between convergence and divergence of 8
ř
n =1 n p
occurs at p = 1. We can dig a little deeper and ask ourselves how much more quickly than
1 th
n the n term needs to shrink in order for the series to converge. We know that for large
x, the function log x (of any base) is smaller than x a for any positive a — you can convince
yourself of this with a quick application of L’Hôpital’s rule. So it is not unreasonable to
ask whether the series
8
ÿ 1
n =2
n ln n

converges. Notice that we sum from n = 2 because when n = 1, n ln n = 0. And we don’t


need to stop there25 . We can analyse the convergence of this sum with any power of ln n.
8 
1
Example 5.3.7
ř
n(ln n) p
n =2
8
1
Let p ą 0. We’ll now use the integral test to determine whether or not the series
ř
n(ln n) p
n =2
converges.
• As in the last example, we start by choosing a function that obeys f (n) = an =
1
n(ln n) p
for all n bigger than some N0 . Certainly f ( x ) = x(ln1 x) p obeys f (n) = n(ln1n) p
for all n ě 2. So let’s use that f and try N0 = 2.

• Now let’s check the other two conditions of Theorem 5.3.5:

25 We could go even further and see what happens if we include powers of ln(ln(n)) and other more
exotic slow-growing functions.

381
S EQUENCE AND S ERIES 5.3 C ONVERGENCE T ESTS

(i) Both x and ln x are positive for all x ą 1, so f ( x ) ą 0 for all x ě N0 = 2.


(ii) As x increases both x and ln x increase and so x (ln x ) p increases and f ( x ) de-
creases.
8
1
• So the integral test tells us that the series converges if and only if the
ř
n(ln n) p
n = 2
integral 2 x(lndxx) p converges.
ş8

• To test the convergence of the integral, we make the substitution u = ln x, du = dx x .


żR ż ln R
dx du
p = p
2 x (ln x ) ln 2 u
ş R dx
We already know that the integral the integral 1 du
ş8
u p , and hence the integral 2 x (ln x ) p ,
converges if and only if p ą 1.
8
1
So we conclude that converges if and only if p ą 1.
ř
n(ln n) p
n =2
Example 5.3.7

5.3.3 §§ The Comparison Test


Our next convergence test is the comparison test. It is much like the comparison test
for improper integrals (see Theorem 3.10.17) and is true for much the same reasons. The
rough idea is quite simple. A sum of larger terms must be bigger than a sum of smaller
terms. So if we know the big sum converges, then the small sum must converge too. On
the other hand, if we know the small sum diverges, then the big sum must also diverge.
Formalising this idea gives the following theorem.
Theorem5.3.8 (The Comparison Test).

Let N0 be a natural number and let K ą 0.


8 8
(a) If |an | ď Kcn for all n ě N0 and cn converges, then an converges.
ř ř
n =0 n =0
8 8
(b) If an ě Kdn ě 0 for all n ě N0 and dn diverges, then an diverges.
ř ř
n =0 n =0

“Proof”. We will not prove this theorem here. We’ll just observe that it is very reason-
able. That’s why there are quotation marks around “Proof”. For an actual proof see the
appendix section A.11.
8 8
(a) If cn converges to a finite number and if the terms in an are smaller than the
ř ř
n =0 n =0
8 8
terms in cn , then it is no surprise that an converges too.
ř ř
n =0 n =0

382
S EQUENCE AND S ERIES 5.3 C ONVERGENCE T ESTS

8 8
(b) If dn diverges (i.e. adds up to 8) and if the terms in an are larger than the terms
ř ř
n =0 n =0
8 8
in dn , then of course an adds up to 8, and so diverges, too.
ř ř
n =0 n =0

The comparison test for series is also used in much the same way as is the comparison
test for improper
řintegrals. Of course, one needs a good series to compare against, and
often the series n (from Example 5.3.6), for some p ą 0, turns out to be just what is
´p

needed.
ř 
1
Example 5.3.9 8
n=1 n2 +2n+3

1
We could determine whether or not the series 8 n=1 n2 +2n+3 converges by applying the
ř

integral test. But it is not worth the effort26 . Whether or not any series converges is de-
termined by the behaviour of the summand27 for very large n. So the first step in tackling
such a problem is to develop some intuition about the behaviour of an when n is very
large.

• Step 1: Develop intuition. In this case, when n is very large 28 2


1 1 ř8 n 1 " 2n " 3 so that
n2 +2n+3
« n2 . We already know, from Example 5.3.6, that n=1 n p converges if and
1
only if p ą 1. So 8 n=1 n2 , which has p = 2, converges, and we would expect that
ř
1
n=1 n2 +2n+3 converges too.
ř8

• Step 2: Verify intuition. We can use the comparison test to confirm that this is indeed
1
the case. For any n ě 1, n2 + 2n + 3 ą n2 , so that n2 +2n +3
ď n12 . So the compari-
1
son test, Theorem 5.3.8, with an = n2 +2n and cn = n12 , tells us that 8 1
ř
+3 n=1 n2 +2n+3
converges.

1
26 Go back and quickly scan Theorem 5.3.5; to apply it we need to show that n2 +2n +3
is positive and
1
decreasing (it is), and then we need to integrate x2 +2x+3 dx. To do that we reread the notes on partial
ş

fractions, then rewrite x2 + 2x + 3 = ( x + 1)2 + 2 and so


ż8 ż8
1 1
2
dx = 2
dx ¨ ¨ ¨
1 x + 2x + 3 1 ( x + 1) + 2

and then arctangent appears, etc etc. Urgh. Okay — let’s go back to the text now and see how to avoid
this.
27 To understand this consider any series 8 n=1 an . We can always cut such a series into two parts — pick
ř
some huge number like 106 . Then

8 106 8
ÿ ÿ ÿ
an = an + an
n =1 n =1 n=106 +1
ř8
The first sum, though it could
ř8be humongous, is finite. So the left hand side, n=1 an , is a well–defined
finite number if and only if n=106 +1 an , is a well–defined finite number. The convergence or divergence
of the series is determined by the second sum, which only contains an for “large” n.
28 The symbol “"” means “much larger than”. Similarly, the symbol “!” means “much less than”. Good
shorthand symbols can be quite expressive.

383
S EQUENCE AND S ERIES 5.3 C ONVERGENCE T ESTS

Example 5.3.9

Of course the previous example was “rigged” to give an easy application of the com-
parison test. It is often
ř relatively easy, using arguments like those in Example 5.3.9, to find
a “simple” series 8 n=1 bn with bn almost the same as an when n is large. However it is
pretty rare that an ď bn for all n. It is much more common that an ď Kbn for some constant
K. This is enough to allow application of the comparison test. Here is an example.
ř 
8 n+cos n
Example 5.3.10 n=1 n3 ´1/3

As in the previous example, the first step is to develop some intuition about the behaviour
of an when n is very large.
• Step 1: Develop intuition. When n is very large,
˝ n " | cos n| so that the numerator n + cos n « n and
˝ n3 " 1/3 so that the denominator n3 ´ 1/3 « n3 .
So when n is very large
n + cos n n 1
an = « =
n3 ´ 1/3 n3 n2
1
We already know from Example 5.3.6, with p = 2, that converges, so we
ř8
n =1 n2
n+cos n
would expect that 8n=1 n3 ´1/3 converges too.
ř

• Step 2: Verify intuition. We can use the comparison test to confirm that this is indeed
cos n| n+cos n
the case. To do so we need to find a constant K such that |an | = |nn+
3 ´1/3 = n3 ´1/3 is

smaller than nK2 for all n. A good way29 to do that is to factor the dominant term (in
this case n) out of the numerator and also factor the dominant term (in this case n3 )
out of the denominator.
n + cos n n 1 + cosn n 1 1 + cosn n
an = = =
n3 ´ 1/3 n3 1 ´ 1 3 n2 1 ´ 1 3
3n 3n
1+(cos n)/n
So now we need to find a constant K such that 1´1/3n3
is smaller than K for all n ě 1.

˝ First consider the numerator 1 + (cos n) n1 . For all n ě 1


1
* n ď 1 and
* | cos n| ď 1
So the numerator 1 + (cos n) n1 is always smaller than 1 + (1) 11 = 2.
˝ Next consider the denominator 1 ´ 1/3n3 .
1 1
* When n ě 1, 3n3 lies between 3 and 0 so that
1 2
* 1 ´ 3n3 is between 3 and 1 and consequently

29 This is very similar to how we computed limits at infinity way way back near the beginning of first-
semester calculus.

384
S EQUENCE AND S ERIES 5.3 C ONVERGENCE T ESTS

1 3
* 1´1/3n3
is between 2 and 1.
˝ As the numerator 1 + (cos n) n1 is always smaller than 2 and 1
1´1/3n3
is always
smaller than 32 , the fraction
1+ cos n 3
n
1
ď2 =3
1 ´ 3n3 2

We now know that


1 1 + 2/n 3
ď
|an | =
n2 1 ´ 1/3n3 n2
n+cos n
and, since we know n=1 n´2 converges, the comparison test tells us that 8
ř8 ř
n =1 n3 ´1/3
converges.

Example 5.3.10

The last example was actually a relatively simple application of the Comparison Theo-
rem — finding a suitable constant K can be really tedious30 . Fortunately, there is a variant
of the comparison test that completely eliminates the need to explicitly find K.
The idea behind this isn’t too complicated. We have already seen that the convergence
or divergence of a series depends not on its first few terms, but just on what happens
when n is really large. Consequently, if we can work out how the series terms behave for
really big n then we can work out if the series converges. So instead of comparing the
terms of our series for all n, just compare them when n is big.

Theorem5.3.11 (Limit Comparison Theorem).

Let n =1 a n and n = 1 bn be two series with bn ą 0 for all n. Assume that


ř8 ř8

an
lim =L
nÑ8 bn

exists.

(a) If 8 n=1 bn converges, then n=1 an converges too.


ř ř8

(b) If L ‰ 0 and 8 n=1 bn diverges, then n=1 an diverges too.


ř ř8

In particular, if L ‰ 0, then 8 n=1 an converges if and only if n=1 bn converges.


ř ř8

an
Proof. (a) Because we are told that limnÑ8 = L, we know that,
bn
ˇ ˇ
• when n is large, bn is very close to L, so that ˇ bann ˇ is very close to |L|.
an ˇ ˇ

ˇ ˇ
• In particular, there is some natural number N0 so that ˇ bann ˇ ď |L| + 1, for all n ě N0 ,
ˇ ˇ

and hence

30 Really, really tedious. And you thought some of those partial fractions computations were bad . . .

385
S EQUENCE AND S ERIES 5.3 C ONVERGENCE T ESTS

• |an | ď Kbn with K = |L| + 1, for all n ě N0 .

• The Comparison Theorem 5.3.8 now implies that n =1 a n converges.


ř8

(b) Let’s suppose that L ą 0. (If L ă 0, just replace an with ´an .) Because we are told that
limnÑ8 bann = L, we know that,
an
• when n is large, bn is very close to L.
an
• In particular, there is some natural number N so that bn ě L2 , and hence
L
• an ě Kbn with K = 2 ą 0, for all n ě N.

• The Comparison Theorem 5.3.8 now implies that n =1 a n diverges.


ř8

The next two examples illustrate how much of an improvement the above theorem is
over the straight comparison test (though of course, we needed the comparison test to
develop the limit comparison test).
ř ? 
n +1
Example 5.3.12 8
n=1 n2 ´2n+3
?
n +1
Set an = n2 ´2n +3
. We first try to develop some intuition about the behaviour of an for large
n and then we confirm that our intuition was correct.
? ?
• Step 1: Develop intuition. When n " 1, ? the numerator n + 1 « n, and the denom-
inator n2 ´ 2n + 3 « n2 so that an « n2n = n3/2 1
and it looks like our series should
3
converge by Example 5.3.6 with p = 2 .
1
• Step 2: Verify intuition. To confirm our intuition we set bn = n3/2 and compute the
limit ?
n +1 ?
an n2 ´2n+3 n3/2 n + 1
lim = lim 1
= lim 2
nÑ8 bn nÑ8 nÑ8 n ´ 2n + 3
3/2 n
Again it is a good idea to factor the dominant term out of the numerator and the
dominant term out of the denominator.
? ?
an n2 1 + 1/n 1 + 1/n
lim = lim 2  = lim =1
nÑ8 bn nÑ8 n 1 ´ 2/n + 3/n2 nÑ8 1 ´ 2/n + 3/n2

1
We already know that the series 8 n = 1 bn = n=1 n3/2 converges by Example 5.3.6
ř ř8

with p = 32 . So our series converges by the limit comparison test, Theorem 5.3.11.

Example 5.3.12

ř ? 
n +1
Example 5.3.13 n=1 n2 ´2n+3 ,
8
again

We can also try to deal with the series of Example 5.3.12, using the comparison test directly.

386
S EQUENCE AND S ERIES 5.3 C ONVERGENCE T ESTS

But that requires us to find K so that


?
n+1 K
ď 3/2
n2 ´ 2n + 3 n
We might do this by examining the numerator and denominator separately:

• The numerator isn’t too bad since for all n ě 1:

n + 1 ď 2n and so
? ?
n + 1 ď 2n

• The denominator is quite a bit more tricky, since we need a lower bound, rather than
an upper bound, and we cannot just write |n2 ´ 2n + 3| ě n2 , which is false. Instead
we have to make a more careful argument. In particular, we’d like to find N0 and K1
1
so that n2 ´ 2n + 3 ě K1 n2 , i.e. n2 ´2n +3
ď K11n2 for all n ě N0 . For n ě 4, we have
2n = 12 4n ď 12 n ¨ n = 12 n2 . So for n ě 4,

1 1
n2 ´ 2n + 3 ě n2 ´ n2 + 3 ě n2
2 2

Putting the numerator and denominator back together we have


? ?
n+1 2n ? 1
ď = 2 2 3/2 for all n ě 4
n2 ´ 2n + 3 n2 /2 n
and the comparison test then tells us that our series converges. It is pretty clear that the
approach of Example 5.3.12 was much more straightforward.
Example 5.3.13

Example 5.3.14 (Alternating Harmonic Series)


8
1
We’ve seen by the integral test that the harmonic series, n, diverges. Now we’ll con-
ř
n =1
sider the alternating harmonic series,

(´1)n
8
ÿ
n
n =1

Since we have negative31 terms, we can’t immediately use a comparison test.


We’d like to re-write our series. The fine print is that there are only certain circum-
stances where re-writing a series of this type preserves its convergence. (See Section 5.4

31 here’s a really convenient test for convergence of series that alternate signs every term, the aptly-named
Alternating Series Test. You can find more information in Appendix A.12.1. The Alternating Series Test,
however, is not on our syllabus.

387
S EQUENCE AND S ERIES 5.3 C ONVERGENCE T ESTS

and Appendix A.13.) You can take our word for it that the rearrangement below does not
impact the convergence of this particular series.

(´1)n
8
ÿ 1 1 1 1 1
= ´1 + ´ + ´ + ´ ¨ ¨ ¨
n 2 3 4 5 6
n =1
     
1 1 1 1 1
= ´1 + + ´ + + ´ + +¨¨¨
2 3 4 5 6

We’ll get a common denominator for each bracketed pair.


     
2 1 4 3 6 5
= ´ + + ´ + + ´ + +¨¨¨
1¨2 1¨2 3¨4 3¨4 5¨6 5¨6
     
1 1 1
= ´ + ´ + ´ ´¨¨¨
1¨2 3¨4 5¨6
8
ÿ ´1
=
2n(2n ´ 1)
n =1

1
We can compare with using the Limit Comparison Test:
ř ´1 ř
2n(2n´1) n2

´1 1
an = bn =
2n(2n ´ 1) n2
an
L = lim
nÑ8 bn
´1
2n(2n´1)
= lim 1
nÑ8
n2
´n2
= lim
nÑ8 2n(2n ´ 1)
1

4
1
Since 8 n=1 bn converges, and L = ´ 4 exists, n=1 an converges as well. That is, the
ř ř8
alternating harmonic series converges by the limit comparison test (and by trust in your
authors that the rearrangement we started with is, indeed, allowed).
Example 5.3.14

5.3.4 §§ The Ratio Test


The idea behind the ratio test comes from a reexamination of the geometric series. Recall
that the geometric series
8 8
ar n
ÿ ÿ
an =
n =0 n =0

388
S EQUENCE AND S ERIES 5.3 C ONVERGENCE T ESTS

converges when |r| ă 1 and diverges otherwise. So the convergence of this series is com-
pletely determined by the number r. This number is just the ratio of successive terms —
that is r = an+1 /an .
a
In general the ratio of successive terms of a series, na+n 1 , is not constant, but depends on
n. However, as we have noted above, the convergence of a series an is determined by
ř
the behaviour of its terms when n is large. In this way, the behaviour of this ratio when
n is small tells us nothing about the convergence of the series, but the limit of the ratio as
n Ñ 8 does. This is the basis of the ratio test.
Theorem5.3.15 (Ratio Test).

Let N be any positive integer and assume that an ‰ 0 for all n ě N.


ˇ ˇ 8
ˇa ˇ
(a) If lim ˇ na+n 1 ˇ = L ă 1, then an converges.
ř
nÑ8 n =1
ˇ ˇ ˇ ˇ 8
ˇ a n +1 ˇ ˇ a n +1 ˇ
(b) If lim ˇ an ˇ = L ą 1, or lim ˇ an ˇ = +8, then an diverges.
ř
nÑ8 nÑ8 n =1

Warning5.3.16.

Beware that the ratio test provides absolutely no conclusion about the conver-
8 ˇ ˇ
ˇa ˇ
gence or divergence of the series an if lim ˇ na+n 1 ˇ = 1. See Example ??, below.
ř
n =1 nÑ8

ˇ ˇ
ˇ a n +1 ˇ
Proof. (a) Pick any number R obeying L ă R ă 1. We are assuming that ˇ an ˇ approaches
ˇ ˇ
ˇ a n +1 ˇ
L as n Ñ 8. In particular there must be some natural number M so that ˇ an ˇ ď R for all
n ě M. So |an+1 | ď R|an | for all n ě M. In particular
|a M+1 | ď R |a M |
|a M+2 | ď R |a M+1 | ď R2 |a M |
|a M+3 | ď R |a M+2 | ď R3 |a M |
..
.
|a M+` | ď R` |a M |
for all ` ě 0. The series 8`=0 R |a M | is a geometric series with ratio R smaller than one in
`
ř
magnitude and so converges. Consequently, by the comparison test with an replaced by
8 8
A` = an+` and cn replaced by C` = R` |a M |, the series a M+` = an converges. So
ř ř
`=1 n = M +1
8
the series an converges too.
ř
n =1 ˇ ˇ
ˇ a n +1 ˇ
(b) We are assuming that ˇ an ˇ approaches L ą 1 as n Ñ 8. In particular there must be
ˇ ˇ
ˇa ˇ
some natural number M ą N so that ˇ na+n 1 ˇ ě 1 for all n ě M. So |an+1 | ě |an | for all

389
S EQUENCE AND S ERIES 5.3 C ONVERGENCE T ESTS

n ě M. That is, |an | increases as n increases as long as n ě M. So |an | ě |a M | for all n ě M


and an cannot converge to zero as n Ñ 8. So the series diverges by the divergence test.


Example 5.3.17 anx n´1
ř8
n =0

Fix any two nonzero real numbers a and x. We have already


ř8 seen in Example 5.2.4 — we
n
have just renamed r to x — that the geometric series n=0 ax converges when |x| ă 1
and diverges when |x| ě 1. We are now going tořconsider a new series, constructed by
differentiating32 each term in the geometric series 8 n
n=0 ax . This new series is

8
an = a n x n´1
ÿ
an with
n =0

Let’s apply the ratio test.


 1
ˇ n +1 ˇ ˇ a ( n + 1 ) x n ˇ n+1
ˇa ˇ ˇ ˇ
ˇ=ˇ = |x| = 1 + |x| Ñ L = |x| as n Ñ 8
an a n x n´1 n n
ˇ ˇ

The ratio test now tells us that the series 8 n´1 converges if |x| ă 1 and diverges if
n =0 a n x
ř
|x| ą 1. It says nothing about the cases x = ˘1. But in both of those cases an = a n (˘1)n
does not converge to zero as n Ñ 8 and the series diverges by the divergence test.
Example 5.3.17

Notice that in the above example, we had to apply another convergence test in addition
to the ratio test. This will be commonplace when we reach power series and Taylor series
— the ratio test will tell us something like

The series converges for |x| ă R and diverges for |x| ą R.

We generally won’t bother with the cases x = + R, ´R.

5.3.5 §§ Convergence Test List


We now have half a dozen convergence tests:

• Divergence Test

– works well when the nth term in the series fails to converge to zero as n tends to
infinity

• Integral Test

32 We shall řsee later, in Theorem 5.5.12, that the function 8 n=0 anx
n´1 is indeed the derivative of the
ř
8 n
function n=0 ax . Of course, such a statement only makes sense where these series converge — how
can you differentiate a divergent series? (This is not an allusion to a popular series of dystopian novels.)
Actually, there is quite a bit of interesting and useful mathematics involving divergent series, but it is
well beyond the scope of this course.

390
S EQUENCE AND S ERIES 5.4 A BSOLUTE AND C ONDITIONAL C ONVERGENCE

– works well when, if you substitute x for n in the nth term you get a function,
f ( x ), that you can integrate
– don’t forget to check that f ( x ) ě 0 and that f ( x ) decreases as x increases
• Ratio Test
a n +1 ˇa ˇ
– works well when an simplifies enough that you can easily compute lim ˇ na+n 1 ˇ =
nÑ8
L
– this often happens when an contains powers, like 7n , or factorials, like n!
– don’t forget that L = 1 tells you nothing about the convergence/divergence of
the series
• Comparison Test and Limit Comparison Test
– works well when, for very large n, the nth term an is approximately the same as
a simpler
ř8 term bn (see Example 5.3.10) and it is easy to determine whether or
not n=1 bn converges
– don’t forget to check that bn ě 0
– usually the Limit Comparison Test is easier to apply than the Comparison Test

5.4IJ Absolute and Conditional Convergence


We have now seen examples of series that converge and of series that diverge. But we
haven’t really discussed how robust the convergence of series is — that is, can we tweak
the coefficients in some way while leaving the convergence unchanged. A good example
of this is the series
8  n
ÿ 1
3
n =1

This is a simple geometric series and we know it converges. We have also seen, as ex-
amples 5.3.17 and ?? showed us, that we can multiply or divide the nth term by n and it
will still converge. We can even multiply the nth term by (´1)n , and it will still converge.
Pretty robust.
On the other hand, we have explored the Harmonic series and its relatives quite a lot
and we know it is much more delicate. While
8
ÿ 1
n
n =1

diverges, we also know33 the following two series converge:


8 8
1 1
(´1)n .
ÿ ÿ

n =1
n1.00000001 n =1
n

33 The first is a p-series with p ą 1; the second is the alternating harmonic series, which we found to
converge in Example 5.3.14.

391
S EQUENCE AND S ERIES 5.4 A BSOLUTE AND C ONDITIONAL C ONVERGENCE

This suggests that the divergence of the Harmonic series is much more delicate. In this
section, we discuss one way to characterize this sort of delicate convergence — especially
in the presence of changes of sign.

Definition5.4.1 (Absolute and conditional convergence).

8 8
(a) A series an is said to converge absolutely if the series |an | converges.
ř ř
n =1 n =1
8 8 8
(b) If an converges but |an | diverges we say that an is conditionally
ř ř ř
n =1 n =1 n =1
convergent.

If you consider these definitions for a moment, it should be clear that absolute ř con-
vergence is a stronger condition than just simple convergence. ř All the terms in n |an |
are forced to be positive (by the absolute value signs), so that n |an | must be bigger than
n an — making it easier for n |an | to diverge. This is formalised by the following the-
ř ř
orem, which is an immediate consequence of the comparison test, Theorem 5.3.8.a, with
cn = |an |.

Theorem5.4.2 (Absolute convergence implies convergence).


8 8
If the series |an | converges then the series an also converges. That is, abso-
ř ř
n =1 n =1
lute convergence implies convergence.

Recall that some of our convergence tests (for example, the integral test) may only be
applied to series with positive terms. Theorem 5.4.2 opens up the possibility of applying
“positive only” convergence tests to series whose terms are not all positive, by checking
for “absolute convergence” rather than for plain “convergence”.
ř 
Example 5.4.3 8
( ´1 ) n´1 1
n =1 n2
8
ˇ(´1)n´1 1ˇ 1
Because the series of Example 5.3.6 converges (by the integral
ř8 ˇ ˇ ř
n =1 n2
= n2
n =1
8
test), the series (´1)n´1 n12 converges absolutely, and hence converges.
ř
n =1
Example 5.4.3

Example 5.4.4 (random signs)

Imagine flipping a coin infinitely many times. Set σn = +1 if the nth flip comes up heads

392
S EQUENCE AND S ERIES 5.5 P OWER S ERIES

and σn = ´1 if the nth flip comes up tails. We know that the series 8 σn 1 ˇ
ˇ ˇ
n=1 (´1) n2 =
ř
ˇ
8
1 σn 1
converges. So 8n=1 (´1) n2 converges absolutely, and hence converges.
ř ř
n2
n =1
Example 5.4.4

With series that converge conditionally, arithmetic can get a little tricky. For some
interesting examples of this trickiness, see Appendix A.13.

5.5IJ Power Series


Let’s return to the simple geometric series

8
xn
ÿ

n =0

where x is some real number. As we have seen (back in Example 5.2.4), for |x| ă 1 this
series converges to a limit, that varies with x, while for |x| ě 1 the series diverges. Conse-
quently we can consider this series to be a function of x

8
xn
ÿ
f (x) = on the domain |x| ă 1.
n =0

Furthermore (also from Example 5.2.4) we know what the function is.

8
1
xn =
ÿ
f (x) = .
n =0
1´x

n 1
Hence we can consider the series 8 n=0 x as a new way of representing the function 1´x
ř
when |x| ă 1. This series is an example of a power series.
1
Of course, representing a function as simple as 1´x by a series doesn’t seem like it is
going to make life easier. However the idea of representing a function by a series turns
out to be extremely helpful. Power series turn out to be very robust mathematical ob-
jects and interact very nicely with not only standard arithmetic operations, but also with
differentiation and integration (see Theorem 5.5.12). This means, for example, that

8
d 1 d ÿ n
" *
= x provided |x| ă 1
dx 1´x dx n=0
8
ÿ d n
= x just differentiate term by term
n =0
dx
8
nx n´1
ÿ
=
n =0

393
S EQUENCE AND S ERIES 5.5 P OWER S ERIES

and in a very similar way


8
1
ż ż ÿ
dx = x n dx provided |x| ă 1
1´x n =0
8
ÿż
= x n dx just integrate term by term
n =0
8
1
x n +1
ÿ
=C+
n =0
n + 1

We are hiding some mathematics under the word “just” in the above, but you can see that
once we have a power series representation of a function, differentiation and integration
become very straightforward.
So we should set as our goal for this section, the development of machinery to define
and understand power series. This will allow us to answer questions34 like

xn
8
x
ÿ
Is e = ?
n =0
n!

Our starting point (now that we have equipped ourselves with basic ideas about series),
is the definition of power series.

5.5.1 §§ Definitions
Definition5.5.1.

A series of the form


8
2 3
An ( x ´ c)n
ÿ
A0 + A1 ( x ´ c ) + A2 ( x ´ c ) + A3 ( x ´ c ) + ¨ ¨ ¨ =
n =0

is called a power series in ( x ´ c) or a power series centered on c. The numbers An


are called the coefficients of the power series.
One often considers power series centered on c = 0 and then the series reduces
to
8
A0 + A1 x + A2 x 2 + A3 x 3 + ¨ ¨ ¨ = An x n
ÿ

n =0

xn 1
For example 8 n=0 n! is the power series with c = 0 and An = n! . Typically, as in
ř
that case, the coefficients An are given fixed numbers, but the “x” is to be thought of as a
variable. Thus each power series is really a whole family of series — a different series for
each value of x.

34 Recall that n! = 1 ˆ 2 ˆ 3 ˆ ¨ ¨ ¨ ˆ n is called “n factorial”. By convention 0! = 1.

394
S EQUENCE AND S ERIES 5.5 P OWER S ERIES

One possible value of x is x = c and then the series reduces35 to


8 ˇ 8

An (c ´ c)n
ÿ ÿ
An ( x ´ c) ˇ =
x =c
n =0 n =0
= looA
mo 0 on + loomo
0on + loomo 0 on + loomo
0 on + ¨ ¨ ¨
n =0 n =1 n =2 n =3

and so simply converges to A0 .


We now know that a power series converges when x = c. We can now use our
convergence tests to determine for what other values of x the series ř converges. Per-
th
haps most straightforward is the ratio test. The n term in the series 8 n
n =0 A n ( x ´ c )
is an = An ( x ´ c)n . To apply the ratio test we need to compute the limit
ˇ a n +1 ˇ ˇ A n +1 ( x ´ c ) n +1 ˇ
ˇ ˇ ˇ ˇ
lim ˇ ˇ = lim ˇˇ ˇ
nÑ8 ˇ an ˇ nÑ8 An ( x ´ c)n ˇ
ˇ A n +1 ˇ
ˇ ˇ
= lim ˇˇ ˇ ¨ |x ´ c|
nÑ8 An ˇ
ˇ A n +1 ˇ
ˇ ˇ
= |x ´ c| ¨ lim ˇˇ ˇ.
nÑ8 An ˇ

When we do so there are several possible outcomes.


• If the limit of ratios exists and is non-zero
ˇA ˇ
ˇ n +1 ˇ
lim ˇ ˇ = A ‰ 0,
nÑ8 An

then the ratio test says that the series 8 n


n =0 A n ( x ´ c )
ř

– converges when A ¨ |x ´ c| ă 1, i.e. when |x ´ c| ă 1/A, and


– diverges when A ¨ |x ´ c| ą 1, i.e. when |x ´ c| ą 1/A.
Because of this, when the limit exists, the quantity

Equation 5.5.2.

    R = 1/A = [ lim_{n→∞} |A_{n+1}/A_n| ]^{−1}

is called the radius of convergence of the series36.

35 By convention, when the term (x − c)^0 appears in a power series, it has value 1 for all values of x, even
x = c.
36 The use of the word “radius” might seem a little odd here, since we are really describing the interval
in the real line where the series converges. However, when one starts to consider power series over
complex numbers, the radius of convergence does describe a circle inside the complex plane and so
“radius” is a more natural descriptor.


• If the limit of ratios exists and is zero

      lim_{n→∞} |A_{n+1}/A_n| = 0

  then lim_{n→∞} |A_{n+1}/A_n| |x − c| = 0 for every x and the ratio test tells us that the series
  Σ_{n=0}^∞ A_n (x − c)^n converges for every number x. In this case we say that the series
  has an infinite radius of convergence.
• If the limit of ratios diverges to +∞

      lim_{n→∞} |A_{n+1}/A_n| = +∞

  then lim_{n→∞} |A_{n+1}/A_n| |x − c| = +∞ for every x ≠ c. The ratio test then tells us that the
  series Σ_{n=0}^∞ A_n (x − c)^n diverges for every number x ≠ c. As we have seen above,
  when x = c, the series reduces to A_0 + 0 + 0 + 0 + 0 + ⋯, which of course converges.
  In this case we say that the series has radius of convergence zero.
• If |A_{n+1}/A_n| does not approach a limit as n → ∞, then we learn nothing from the ratio
  test and we must use other tools to understand the convergence of the series.
All of these possibilities do happen. We give an example of each below. But first, the
concept of “radius of convergence” is important enough to warrant a formal definition.

Definition 5.5.3.

(a) Let 0 < R < ∞. If Σ_{n=0}^∞ A_n (x − c)^n converges for |x − c| < R, and diverges
    for |x − c| > R, then we say that the series has radius of convergence R.

(b) If Σ_{n=0}^∞ A_n (x − c)^n converges for every number x, we say that the series has
    an infinite radius of convergence.

(c) If Σ_{n=0}^∞ A_n (x − c)^n diverges for every x ≠ c, we say that the series has radius
    of convergence zero.

Example 5.5.4 (Finite nonzero radius of convergence)


We already know that, if a ≠ 0, the geometric series Σ_{n=0}^∞ a x^n converges when |x| < 1 and
diverges when |x| ≥ 1. So, in the terminology of Definition 5.5.3, the geometric series has
radius of convergence R = 1. As a consistency check, we can also compute R using (5.5.2).
The series Σ_{n=0}^∞ a x^n has A_n = a. So

    R = [ lim_{n→∞} |A_{n+1}/A_n| ]^{−1} = [ lim_{n→∞} 1 ]^{−1} = 1


as expected.
Example 5.5.4

Example 5.5.5 (Radius of convergence = +∞)

The series Σ_{n=0}^∞ x^n/n! has A_n = 1/n!. So

    lim_{n→∞} |A_{n+1}/A_n| = lim_{n→∞} (1/(n+1)!)/(1/n!) = lim_{n→∞} n!/(n+1)!
                            = lim_{n→∞} (1 × 2 × 3 × ⋯ × n)/(1 × 2 × 3 × ⋯ × n × (n+1))
                            = lim_{n→∞} 1/(n+1)
                            = 0

and Σ_{n=0}^∞ x^n/n! has radius of convergence ∞. It converges for every x.
Example 5.5.5

Example 5.5.6 (Radius of convergence = 0)


The series Σ_{n=0}^∞ n! x^n has A_n = n!. So

    lim_{n→∞} |A_{n+1}/A_n| = lim_{n→∞} (n+1)!/n! = lim_{n→∞} (1 × 2 × 3 × 4 × ⋯ × n × (n+1))/(1 × 2 × 3 × 4 × ⋯ × n)
                            = lim_{n→∞} (n+1)
                            = +∞

and Σ_{n=0}^∞ n! x^n has radius of convergence zero37. It converges only for x = 0, where it takes
the value 0! = 1.
Example 5.5.6
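
As an informal illustration of the three behaviours just seen, here is a small Python sketch (ours, not part of the formal text): it prints the ratio |A_{n+1}/A_n| for the coefficient choices A_n = a (constant), A_n = 1/n! and A_n = n!, so you can watch the ratios approach 1, 0 and +∞ respectively as n grows.

from math import factorial

for n in (5, 10, 20, 40):
    ratio_constant = 1.0                                                   # A_n = a: the ratio is always 1
    ratio_inverse_factorial = (1 / factorial(n + 1)) / (1 / factorial(n))  # A_n = 1/n!
    ratio_factorial = factorial(n + 1) / factorial(n)                      # A_n = n!
    print(n, ratio_constant, ratio_inverse_factorial, ratio_factorial)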

Example 5.5.7

Comparing the series


    1 + 2x + x^2 + 2x^3 + x^4 + 2x^5 + ⋯

to

    Σ_{n=0}^∞ A_n x^n = A_0 + A_1 x + A_2 x^2 + A_3 x^3 + A_4 x^4 + A_5 x^5 + ⋯

37 Because of this, it might seem that such a series is fairly pointless. However there are all sorts of
mathematical games that can be played with them without worrying about their convergence. Such
“formal” power series can still impart useful information and the interested reader is invited to look up
“generating functions” with their preferred search engine.


we see that

    A_0 = 1   A_1 = 2   A_2 = 1   A_3 = 2   A_4 = 1   A_5 = 2   ⋯

so that

    A_1/A_0 = 2   A_2/A_1 = 1/2   A_3/A_2 = 2   A_4/A_3 = 1/2   A_5/A_4 = 2   ⋯

and A_{n+1}/A_n does not converge as n → ∞. Since the limit of the ratios does not exist, we
cannot tell anything from the ratio test. Nonetheless, we can still figure out for which x's
our power series converges.

• Because every coefficient A_n is either 1 or 2, the n-th term in our series obeys

      |A_n x^n| ≤ 2|x|^n

  and so is smaller than the n-th term in the geometric series Σ_{n=0}^∞ 2|x|^n. This geometric
  series converges if |x| < 1. So, by the comparison test, our series converges for
  |x| < 1 too.

• Since every A_n is at least one, the n-th term in our series obeys

      |A_n x^n| ≥ |x|^n

  If |x| ≥ 1, this a_n = A_n x^n cannot converge to zero as n → ∞, and our series diverges
  by the divergence test.

In conclusion, our series converges if and only if |x| < 1, and so has radius of conver-
gence 1.
Example 5.5.7

Example 5.5.8

Let's construct a series from the digits of π. Now to avoid dividing by zero, let us set

    A_n = 1 + the n-th digit of π

Since π = 3.1415926…

    A_0 = 4   A_1 = 2   A_2 = 5   A_3 = 2   A_4 = 6   A_5 = 10   A_6 = 3   ⋯

Consequently every A_n is an integer between 1 and 10 and gives us the series

    Σ_{n=0}^∞ A_n x^n = 4 + 2x + 5x^2 + 2x^3 + 6x^4 + 10x^5 + ⋯

The number π is irrational and consequently the ratio A_{n+1}/A_n cannot have a limit as n → ∞.
If you do not understand why this is the case then don't worry too much about it38. As

38 This is a little beyond the scope of the course. Roughly speaking, think about what would happen if
the limit of the ratios did exist. If the limit were smaller than 1, then it would tell you that the terms of
our series must be getting smaller and smaller and smaller — which is impossible because they are all
integers between 1 and 10. Similarly if the limit existed and were bigger than 1 then the terms of the
series would have to get bigger and bigger and bigger — also impossible. Hence if the ratio exists then
it must be equal to 1 — but in that case because the terms are integers, they would have to be all equal
when n became big enough. But that means that the expansion of π would be eventually periodic —
something that only rational numbers do.


in the last example, the limit of the ratios does not exist and we cannot tell anything from
the ratio test. But we can still figure out for which x's it converges.

• Because every coefficient A_n is no bigger (in magnitude) than 10, the n-th term in our
  series obeys

      |A_n x^n| ≤ 10|x|^n

  and so is smaller than the n-th term in the geometric series Σ_{n=0}^∞ 10|x|^n. This geomet-
  ric series converges if |x| < 1. So, by the comparison test, our series converges for
  |x| < 1 too.

• Since every A_n is at least one, the n-th term in our series obeys

      |A_n x^n| ≥ |x|^n

  If |x| ≥ 1, this a_n = A_n x^n cannot converge to zero as n → ∞, and our series diverges
  by the divergence test.

In conclusion, our series converges if and only if |x| < 1, and so has radius of conver-
gence 1.
Example 5.5.8

Though we won't prove it, it is true that every power series has a radius of conver-
gence, whether or not the limit lim_{n→∞} |A_{n+1}/A_n| exists.

Theorem 5.5.9.

Let Σ_{n=0}^∞ A_n (x − c)^n be a power series. Then one of the following alternatives must
hold.

(a) The power series converges for every number x. In this case we say that the
    radius of convergence is ∞.

(b) There is a number 0 < R < ∞ such that the series converges for |x − c| < R
    and diverges for |x − c| > R. Then R is called the radius of convergence.

(c) The series converges for x = c and diverges for all x ≠ c. In this case, we say
    that the radius of convergence is 0.


Definition 5.5.10.

Consider the power series

    Σ_{n=0}^∞ A_n (x − c)^n.

The set of real x-values for which it converges is called the interval of conver-
gence of the series.

Suppose that the power series Σ_{n=0}^∞ A_n (x − c)^n has radius of convergence R. Then from
Theorem 5.5.9, we have that

• if R = ∞, then its interval of convergence is −∞ < x < ∞, which is also denoted
  (−∞, ∞), and

• if R = 0, then its interval of convergence is just the point x = c, and

• if 0 < R < ∞, then we know that the series converges for any x which obeys

      |x − c| < R   or equivalently   −R < x − c < R   or equivalently   c − R < x < c + R

  But we do not (yet) know whether or not the series converges at the two end points
  of that interval. We do know, however, that its interval of convergence must be one
  of

  ◦ c − R < x < c + R, which is also denoted (c − R, c + R), or
  ◦ c − R ≤ x < c + R, which is also denoted [c − R, c + R), or
  ◦ c − R < x ≤ c + R, which is also denoted (c − R, c + R], or
  ◦ c − R ≤ x ≤ c + R, which is also denoted [c − R, c + R].

To reiterate — while the radius of convergence, R with 0 < R < ∞, tells us that the series
converges for |x − c| < R and diverges for |x − c| > R, it does not (by itself) tell us whether
or not the series converges when |x − c| = R, i.e. when x = c ± R. We will not generally
concern ourselves with these final details. (Determining the endpoints of the interval of
convergence often goes smoothest with the Alternating Series Test, which is available for
your interest in Appendix A.12 but is not a part of our syllabus.)
Example 5.5.11

We are told that a certain power series with centre c = 3, converges at x = 4 and diverges
at x = 1. What else can we say about the convergence or divergence of the series for other
values of x?


We are told that the series is centred at 3, so its terms are all powers of (x − 3) and it is
of the form

    Σ_{n≥0} A_n (x − 3)^n.

A good way to summarise the convergence data we are given is with a figure like the one
below. Green dots mark the values of x where the series is known to converge. (Recall
that every power series converges at its centre.) The red dot marks the value of x where
the series is known to diverge. The bull’s eye marks the centre.

[Figure: a number line marking divergence at x = 1 (red), the centre at x = 3 (bull's eye), and convergence at x = 4 (green)]

Can we say more about the convergence and/or divergence of the series for other values
of x? Yes!
Let us think about the radius of convergence, R, of the series. We know that it must
exist and the information we have been given allows us to bound R. Recall that

• the series converges at x provided that |x − 3| < R and

• the series diverges at x if |x − 3| > R.

We have been told that

• the series converges when x = 4, which tells us that

  ◦ x = 4 cannot obey |x − 3| > R so
  ◦ x = 4 must obey |x − 3| ≤ R, i.e. |4 − 3| ≤ R, i.e. R ≥ 1

• the series diverges when x = 1 so we also know that

  ◦ x = 1 cannot obey |x − 3| < R so
  ◦ x = 1 must obey |x − 3| ≥ R, i.e. |1 − 3| ≥ R, i.e. R ≤ 2

We still don't know R exactly. But we do know that 1 ≤ R ≤ 2. Consequently,

• since 1 is the smallest that R could be, the series certainly converges at x if |x − 3| < 1,
  i.e. if 2 < x < 4 and

• since 2 is the largest that R could be, the series certainly diverges at x if |x − 3| > 2,
  i.e. if x > 5 or if x < 1.

The following figure provides a summary of all of this convergence data — there is conver-
gence at green x's and divergence at red x's.

[Figure: a number line with convergence marked on 2 < x < 4 and at x = 4, and divergence marked at x = 1, for x < 1 and for x > 5]


Notice that from the data given we cannot say anything about the convergence or diver-
gence of the series on the intervals (1, 2] and (4, 5].

One lesson that we can derive from this example is that,

• if a series has centre c and converges at a,

• then it also converges at all points between c and a, as well as at all points of distance
  strictly less than |a − c| from c on the other side of c from a.

Example 5.5.11

5.5.2 §§ Working With Power Series

Just as we have done previously with limits, differentiation and integration, we can con-
struct power series representations of more complicated functions by using those of sim-
pler functions. Here is a theorem that helps us to do so.


Theorem 5.5.12 (Operations on Power Series).

Assume that the functions f(x) and g(x) are given by the power series

    f(x) = Σ_{n=0}^∞ A_n (x − c)^n        g(x) = Σ_{n=0}^∞ B_n (x − c)^n

for all x obeying |x − c| < R. In particular, we are assuming that both power
series have radius of convergence at least R. Also let K be a constant. Then

    f(x) + g(x)     = Σ_{n=0}^∞ [A_n + B_n] (x − c)^n
    K f(x)          = Σ_{n=0}^∞ K A_n (x − c)^n
    (x − c)^N f(x)  = Σ_{n=0}^∞ A_n (x − c)^{n+N}        for any integer N ≥ 1
                    = Σ_{k=N}^∞ A_{k−N} (x − c)^k        where k = n + N
    f′(x)           = Σ_{n=0}^∞ A_n n (x − c)^{n−1} = Σ_{n=1}^∞ A_n n (x − c)^{n−1}
    ∫_c^x f(t) dt   = Σ_{n=0}^∞ A_n (x − c)^{n+1}/(n + 1)
    ∫ f(x) dx       = [ Σ_{n=0}^∞ A_n (x − c)^{n+1}/(n + 1) ] + C    with C an arbitrary constant

for all x obeying |x − c| < R.

In particular the radius of convergence of each of the six power series on the right
hand sides is at least R. In fact, if R is the radius of convergence of Σ_{n=0}^∞ A_n (x − c)^n,
then R is also the radius of convergence of all of the above right hand sides, with
the possible exceptions of Σ_{n=0}^∞ [A_n + B_n] (x − c)^n and Σ_{n=0}^∞ K A_n (x − c)^n when K = 0.

Example 5.5.13

The last statement of Theorem 5.5.12 might seem a little odd, but consider the following
two power series centred at 0:
    Σ_{n=0}^∞ 2^n x^n   and   Σ_{n=0}^∞ (1 − 2^n) x^n.


The ratio test tells us that they both have radius of convergence R = 1/2. However their
sum is

    Σ_{n=0}^∞ 2^n x^n + Σ_{n=0}^∞ (1 − 2^n) x^n = Σ_{n=0}^∞ x^n

which has the larger radius of convergence 1.

A more extreme example of the same phenomenon is supplied by the two series

    Σ_{n=0}^∞ 2^n x^n   and   Σ_{n=0}^∞ (−2^n) x^n.

They are both geometric series with radius of convergence R = 1/2. But their sum is

    Σ_{n=0}^∞ 2^n x^n + Σ_{n=0}^∞ (−2^n) x^n = Σ_{n=0}^∞ (0) x^n

which has radius of convergence +∞.


Example 5.5.13

We’ll now use this theorem to build power series representations for a bunch of func-
tions out of the one simple power series representation that we know — the geometric
series
    1/(1−x) = Σ_{n=0}^∞ x^n        for all |x| < 1

 
Example 5.5.14 (1/(1−x^2))

Find a power series representation for 1/(1−x^2).
Solution. The secret to finding power series representations for a good many functions
is to manipulate them into a form in which 1/(1−y) appears and use the geometric series
representation 1/(1−y) = Σ_{n=0}^∞ y^n. We have deliberately renamed the variable to y here — it
does not have to be x. We can use that strategy to find a power series expansion for 1/(1−x^2)
— we just have to recognize that 1/(1−x^2) is the same as 1/(1−y) if we set y to x^2.

    1/(1−x^2) = [ 1/(1−y) ]_{y=x^2} = [ Σ_{n=0}^∞ y^n ]_{y=x^2}      if |y| < 1, i.e. |x| < 1
              = Σ_{n=0}^∞ (x^2)^n = Σ_{n=0}^∞ x^{2n}
              = 1 + x^2 + x^4 + x^6 + ⋯

This is a perfectly good power series. There is nothing wrong with the power of x being 2n.
(This just means that the coefficients of all odd powers of x are zero.) In fact, you should
try to always write power series in forms that are as easy to understand as possible. The
geometric series that we used at the end of the first line converges for

    |y| < 1  ⟺  |x^2| < 1  ⟺  |x| < 1

So our power series has radius of convergence 1 and interval of convergence −1 < x < 1.

Example 5.5.14
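
If you would like to see this expansion numerically, here is a small Python check (our own sketch; the sample point x = 0.3 and the 20-term cutoff are arbitrary): it sums the first terms of Σ x^{2n} and compares with 1/(1 − x^2).

x = 0.3
exact = 1.0 / (1.0 - x ** 2)
partial = sum(x ** (2 * n) for n in range(20))    # 1 + x^2 + x^4 + ...
print(partial, exact)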

 
Example 5.5.15 (x/(2+x^2))

Find a power series representation for x/(2+x^2).
Solution. This example is just a more algebraically involved variant of the last one. Again,
the strategy is to manipulate x/(2+x^2) into a form in which 1/(1−y) appears.

    x/(2+x^2) = (x/2) · 1/(1 + x^2/2) = (x/2) · 1/(1 − (−x^2/2))                      set −x^2/2 = y
              = (x/2) [ 1/(1−y) ]_{y=−x^2/2} = (x/2) [ Σ_{n=0}^∞ y^n ]_{y=−x^2/2}     if |y| < 1
              = (x/2) Σ_{n=0}^∞ (−x^2/2)^n = (x/2) Σ_{n=0}^∞ ((−1)^n/2^n) x^{2n}
              = Σ_{n=0}^∞ ((−1)^n/2^{n+1}) x^{2n+1}                                   by Theorem 5.5.12, twice
              = x/2 − x^3/4 + x^5/8 − x^7/16 + ⋯

The geometric series that we used in the second line converges when

    |y| < 1  ⟺  |−x^2/2| < 1  ⟺  |x|^2 < 2  ⟺  |x| < √2

So the given power series has radius of convergence √2 and interval of convergence
−√2 < x < √2.
Example 5.5.15

Example 5.5.16 (Nonzero centre)


Find a power series representation for 1/(5−x) with centre 3.
Solution. The new wrinkle in this example is the requirement that the centre be 3. That
the centre is to be 3 means that we need a power series in powers of x − c, with c = 3. So
we are looking for a power series of the form Σ_{n=0}^∞ A_n (x − 3)^n. The easy way to find such
a series is to force an x − 3 to appear by adding and subtracting a 3.

    1/(5−x) = 1/(5 − (x − 3) − 3) = 1/(2 − (x − 3))


Now we continue, as in the last example, by manipulating 1/(2 − (x−3)) into a form in which
1/(1−y) appears.

    1/(5−x) = 1/(2 − (x−3)) = (1/2) · 1/(1 − (x−3)/2)                                 set (x−3)/2 = y
            = (1/2) [ 1/(1−y) ]_{y=(x−3)/2} = (1/2) [ Σ_{n=0}^∞ y^n ]_{y=(x−3)/2}     if |y| < 1
            = (1/2) Σ_{n=0}^∞ ((x−3)/2)^n = Σ_{n=0}^∞ (x−3)^n/2^{n+1}
            = 1/2 + (x−3)/4 + (x−3)^2/8 + (x−3)^3/16 + ⋯

The geometric series that we used in the second line converges when

    |y| < 1  ⟺  |x−3|/2 < 1  ⟺  |x−3| < 2  ⟺  −2 < x−3 < 2  ⟺  1 < x < 5

So the power series has radius of convergence 2 and interval of convergence 1 < x < 5.
Example 5.5.16
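
A quick numerical sanity check of this centred expansion (a Python sketch of our own; the sample point x = 4.5 lies inside the interval of convergence 1 < x < 5, and the 60-term cutoff is arbitrary):

x = 4.5
exact = 1.0 / (5.0 - x)
partial = sum((x - 3.0) ** n / 2 ** (n + 1) for n in range(60))   # sum of (x-3)^n / 2^(n+1)
print(partial, exact)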

In the previous two examples, to construct a new series from an existing series, we
replaced x by a simple function. The following theorem gives us some more (but certainly
not all) commonly used substitutions.

Theorem 5.5.17 (Substituting in a Power Series).

Assume that the function f(x) is given by the power series

    f(x) = Σ_{n=0}^∞ A_n x^n

for all x in the interval I. Also let K and k be real constants. Then

    f(K x^k) = Σ_{n=0}^∞ A_n K^n x^{kn}

whenever K x^k is in I. In particular, if Σ_{n=0}^∞ A_n x^n has radius of convergence R, K
is nonzero and k is a natural number, then Σ_{n=0}^∞ A_n K^n x^{kn} has radius of conver-
gence (R/|K|)^{1/k}.

 
Example 5.5.18 (1/(1−x)^2)

Find a power series representation for 1/(1−x)^2.


Solution. Once again the trick is to express 1/(1−x)^2 in terms of 1/(1−x). Notice that

    1/(1−x)^2 = d/dx { 1/(1−x) }
              = d/dx { Σ_{n=0}^∞ x^n }
              = Σ_{n=1}^∞ n x^{n−1}              by Theorem 5.5.12

Note that the n = 0 term has disappeared because, for n = 0,

    d/dx x^n = d/dx x^0 = d/dx 1 = 0

Also note that the radius of convergence of this series is one. We can see this via Theo-
rem 5.5.12. That theorem tells us that the radius of convergence of a power series is not
changed by differentiation — and since Σ_{n=0}^∞ x^n has radius of convergence one, so too
does its derivative.
Without much more work we can determine the interval of convergence by testing at
x = ±1. When x = ±1 the terms of the series do not go to zero as n → ∞ and so, by the
divergence test, the series does not converge there. Hence the interval of convergence for
the series is −1 < x < 1.
Example 5.5.18
Notice that, in this last example, we differentiated a known series to get to our answer. As
per Theorem 5.5.12, the radius of convergence didn’t change. In addition, in this par-
ticular example, the interval of convergence didn’t change. This is not always the case.
Differentiation of some series causes the interval of convergence to shrink. In particular
the differentiated series may no longer be convergent at the end points of the interval39 .
Similarly, when we integrate a power series the radius of convergence is unchanged, but
the interval of convergence may expand to include one or both ends, as illustrated by the
next example.

Example 5.5.19 (ln(1 + x ))

Find a power series representation for ln(1 + x ).

39 Consider the power series Σ_{n=1}^∞ x^n/n. We know that its interval of convergence is −1 ≤ x < 1. (Indeed
see the next example.) When we differentiate the series we get the geometric series Σ_{n=0}^∞ x^n which has
interval of convergence −1 < x < 1.


Solution. Recall that d/dx ln(1 + x) = 1/(1+x) so that ln(1 + t) is an antiderivative of 1/(1+t) and

    ln(1 + x) = ∫_0^x dt/(1+t) = ∫_0^x [ Σ_{n=0}^∞ (−t)^n ] dt
              = Σ_{n=0}^∞ ∫_0^x (−t)^n dt                      by Theorem 5.5.12
              = Σ_{n=0}^∞ (−1)^n x^{n+1}/(n+1)
              = x − x^2/2 + x^3/3 − x^4/4 + ⋯

Theorem 5.5.12 guarantees that the radius of convergence is exactly one (the radius of
convergence of the geometric series Σ_{n=0}^∞ (−t)^n) and that

    ln(1 + x) = Σ_{n=0}^∞ (−1)^n x^{n+1}/(n+1)        for all −1 < x < 1

In general, we won't worry about the endpoints of the interval of convergence. So, in
general, we wouldn't bother testing x = 1 and x = −1. However, in this instance, both
examples are pretty accessible. We include them below for interest.
When x = −1 our series reduces to Σ_{n=0}^∞ (−1)/(n+1), which is (minus) the harmonic series and
so diverges. That's no surprise: ln(1 + (−1)) = ln 0 is undefined, with lim_{x→0+} ln x = −∞.
When x = 1, we get the alternating harmonic series, which converges. (It is possible to
prove by continuity, though we won't do so here, that the sum is ln 2.)
So the interval of convergence is −1 < x ≤ 1.
Example 5.5.19
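
Here is a short Python sketch (ours, purely for illustration) comparing a partial sum of this series for ln(1 + x) with Python's built-in logarithm at a sample point inside the interval of convergence:

from math import log

x = 0.5
partial = sum((-1) ** n * x ** (n + 1) / (n + 1) for n in range(200))
print(partial, log(1 + x))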

Example 5.5.20 (arctan x )

Find a power series representation for arctan x.


Solution. Recall that d/dx arctan x = 1/(1+x^2) so that arctan t is an antiderivative of 1/(1+t^2) and

    arctan x = ∫_0^x dt/(1+t^2) = ∫_0^x [ Σ_{n=0}^∞ (−t^2)^n ] dt = Σ_{n=0}^∞ ∫_0^x (−1)^n t^{2n} dt
             = Σ_{n=0}^∞ (−1)^n x^{2n+1}/(2n+1)
             = x − x^3/3 + x^5/5 − ⋯

Theorem 5.5.12 guarantees that the radius of convergence is exactly one (the radius of
convergence of the geometric series Σ_{n=0}^∞ (−t^2)^n) and that

    arctan x = Σ_{n=0}^∞ (−1)^n x^{2n+1}/(2n+1)        for all −1 < x < 1


Since we're not generally concerned with the endpoints of the interval of convergence,
we'll leave as a mystery whether the series converges at x = 1 and x = −1.
Example 5.5.20

The operations on power series dealt with in Theorem 5.5.12 are fairly easy to apply.
Unfortunately taking the product, ratio or composition of two power series is more in-
volved and is beyond the scope of this course40 . Unfortunately Theorem 5.5.12 alone will
not get us power series representations of many of our standard functions (like e x and
sin x). Fortunately we can find such representations by extending Taylor polynomials41 to
Taylor series.

5.6 Taylor Series


5.6.1 §§ Extending Taylor Polynomials
Recall42 that Taylor polynomials provide a hierarchy of approximations to a given func-
tion f ( x ) near a given point a. Typically, the quality of these approximations improves as
we move up the hierarchy.

• The crudest approximation is the constant approximation f(x) ≈ f(a).

• Then comes the linear, or tangent line, approximation f(x) ≈ f(a) + f′(a)(x − a).

• Then comes the quadratic approximation

      f(x) ≈ f(a) + f′(a)(x − a) + (1/2) f″(a)(x − a)^2

• In general, the Taylor polynomial of degree n, for the function f(x), about the ex-
  pansion point a, is the polynomial, T_n(x), determined by the requirements that
  f^{(k)}(a) = T_n^{(k)}(a) for all 0 ≤ k ≤ n. That is, f and T_n have the same derivatives
  at a, up to order n. Explicitly,

      f(x) ≈ T_n(x) = f(a) + f′(a)(x − a) + (1/2) f″(a)(x − a)^2 + ⋯ + (1/n!) f^{(n)}(a)(x − a)^n
                    = Σ_{k=0}^n (1/k!) f^{(k)}(a)(x − a)^k

These are, of course, approximations — often very good approximations near x = a — but
still just approximations. One might hope that if we let the degree, n, of the approximation
go to infinity then the error in the approximation might go to zero. If that is the case then

40 As always, a quick visit to your favourite search engine will direct the interested reader to more infor-
mation.
41 Now is a good time to review your notes from last term, though we’ll give you a whirlwind review
over the next page or two.
42 Please review your notes from last term if this material is feeling a little unfamiliar.


the “infinite” Taylor polynomial would be an exact representation of the function. Let’s
see how this might work.
Fix a real number a and suppose that all derivatives of the function f ( x ) exist. Then,
for any natural number n,

Equation 5.6.1.

    f(x) = T_n(x) + E_n(x)

where T_n(x) is the Taylor polynomial of degree n for the function f(x) expanded about a,
and E_n(x) = f(x) − T_n(x) is the error in our approximation. The Taylor polynomial43 is
given by the formula

Equation 5.6.1-a

    T_n(x) = f(a) + f′(a)(x − a) + ⋯ + (1/n!) f^{(n)}(a)(x − a)^n

while the error satisfies

Equation 5.6.1-b

    E_n(x) = (1/(n+1)!) f^{(n+1)}(c)(x − a)^{n+1}

for some c strictly between a and x. Note that we typically do not know the value of c in
the formula for the error. Instead we use the bounds on c to find bounds on f^{(n+1)}(c) and
so bound the error44.
In order for our Taylor polynomial to be an exact representation of the function f(x)
we need the error E_n(x) to be zero. This will not happen when n is finite unless f(x) is a
polynomial. However it can happen in the limit as n → ∞, and in that case we can write
f(x) as the limit

    f(x) = lim_{n→∞} T_n(x) = lim_{n→∞} Σ_{k=0}^n (1/k!) f^{(k)}(a)(x − a)^k

This is really a limit of partial sums, and so we can write

    f(x) = Σ_{k=0}^∞ (1/k!) f^{(k)}(a)(x − a)^k

which is a power series representation of the function. Let us formalise this in a definition.

43 Did you take a quick look at your notes?


44 The discussion here is only supposed to jog your memory. If it is feeling insufficiently jogged, then
please look at your notes from last term.


Definition 5.6.2 (Taylor series).

The Taylor series for the function f(x) expanded around a is the power series

    f(x) = Σ_{n=0}^∞ (1/n!) f^{(n)}(a)(x − a)^n

provided the series converges. When a = 0 it is also called the Maclaurin series
of f(x).

This definition hides the discussion of whether or not E_n(x) → 0 as n → ∞ within the
caveat "provided the series converges". Demonstrating that for a given function can be
difficult, but for many of the standard functions you are used to dealing with, it turns out
to be pretty easy. Let's compute a few Taylor series and see how we do it.
Example 5.6.3 (Exponential Series)

Find the Maclaurin series for f(x) = e^x.

Solution. Just as was the case for computing Taylor polynomials, we need to compute the
derivatives of the function at the particular choice of a. Since we are asked for a Maclaurin
series, a = 0. So now we just need to find f^{(k)}(0) for all integers k ≥ 0.
We know that d/dx e^x = e^x and so

    e^x = f(x) = f′(x) = f″(x) = ⋯ = f^{(k)}(x) = ⋯    which gives
    1 = f(0) = f′(0) = f″(0) = ⋯ = f^{(k)}(0) = ⋯.

Equations (5.6.1) and (5.6.1-a) then give us

    e^x = f(x) = 1 + x + x^2/2! + ⋯ + x^n/n! + E_n(x)

We shall see, in the optional Example 5.6.6 below, that, for any fixed x, lim_{n→∞} E_n(x) = 0.
Consequently, for all x,

    e^x = lim_{n→∞} [ 1 + x + (1/2)x^2 + (1/3!)x^3 + ⋯ + (1/n!)x^n ] = Σ_{n=0}^∞ x^n/n!

Example 5.6.3
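
As a numerical aside (our own sketch, not part of the text), the partial sums of Σ x^n/n! converge to e^x remarkably quickly, even for negative or moderately large x; the sample points and the 30-term cutoff below are arbitrary choices.

from math import exp, factorial

for x in (1.0, -2.0, 5.0):
    partial = sum(x ** n / factorial(n) for n in range(30))
    print(x, partial, exp(x))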

We have now seen power series representations for the functions

    1/(1−x)      1/(1−x)^2      ln(1 + x)      arctan(x)      e^x.


We do not think that you, the reader, will be terribly surprised to see that we develop
series for sine and cosine next.
Example 5.6.4 (Sine and Cosine Series)

The trigonometric functions sin x and cos x also have widely used Maclaurin series ex-
pansions (i.e. Taylor series expansions about a = 0). To find them, we first compute all
derivatives at general x.

    f(x) = sin x   f′(x) = cos x    f″(x) = −sin x   f^{(3)}(x) = −cos x   f^{(4)}(x) = sin x   ⋯
    g(x) = cos x   g′(x) = −sin x   g″(x) = −cos x   g^{(3)}(x) = sin x    g^{(4)}(x) = cos x   ⋯

Now set x = a = 0.

    f(x) = sin x   f(0) = 0   f′(0) = 1   f″(0) = 0    f^{(3)}(0) = −1   f^{(4)}(0) = 0   ⋯
    g(x) = cos x   g(0) = 1   g′(0) = 0   g″(0) = −1   g^{(3)}(0) = 0    g^{(4)}(0) = 1   ⋯

For sin x, all even numbered derivatives (at x = 0) are zero, while the odd numbered
derivatives alternate between 1 and −1. Very similarly, for cos x, all odd numbered deriva-
tives (at x = 0) are zero, while the even numbered derivatives alternate between 1 and −1.
So, the Taylor polynomials that best approximate sin x and cos x near x = a = 0 are

    sin x ≈ x − (1/3!)x^3 + (1/5!)x^5 − ⋯
    cos x ≈ 1 − (1/2!)x^2 + (1/4!)x^4 − ⋯

We shall see, in the optional Example 5.6.8 below, that, for both sin x and cos x, we have
lim_{n→∞} E_n(x) = 0 so that

    f(x) = lim_{n→∞} [ f(0) + f′(0) x + ⋯ + (1/n!) f^{(n)}(0) x^n ]
    g(x) = lim_{n→∞} [ g(0) + g′(0) x + ⋯ + (1/n!) g^{(n)}(0) x^n ]

Reviewing the patterns we found in the derivatives, we conclude that, for all x,

    sin x = x − (1/3!)x^3 + (1/5!)x^5 − ⋯ = Σ_{n=0}^∞ (−1)^n x^{2n+1}/(2n+1)!
    cos x = 1 − (1/2!)x^2 + (1/4!)x^4 − ⋯ = Σ_{n=0}^∞ (−1)^n x^{2n}/(2n)!

and, in particular, both of the series on the right hand sides converge for all x.
We could also test for convergence of the series using the ratio test. Computing the
ratios of successive terms in these two series gives us

    |A_{n+1}/A_n| = (|x|^{2n+3}/(2n+3)!) / (|x|^{2n+1}/(2n+1)!) = |x|^2/((2n+3)(2n+2))
    |A_{n+1}/A_n| = (|x|^{2n+2}/(2n+2)!) / (|x|^{2n}/(2n)!) = |x|^2/((2n+2)(2n+1))


for sine and cosine respectively. Hence as n → ∞ these ratios go to zero and consequently
both series are convergent for all x. (This is very similar to what was observed in Exam-
ple 5.5.5.)
Example 5.6.4

We have developed power series representations for a number of important func-


tions45 . Here is a theorem that summarizes them.

Theorem 5.6.5.

    e^x      = Σ_{n=0}^∞ x^n/n! = 1 + x + x^2/2! + x^3/3! + ⋯                      for all −∞ < x < ∞
    sin(x)   = Σ_{n=0}^∞ (−1)^n x^{2n+1}/(2n+1)! = x − x^3/3! + x^5/5! − ⋯         for all −∞ < x < ∞
    cos(x)   = Σ_{n=0}^∞ (−1)^n x^{2n}/(2n)! = 1 − x^2/2! + x^4/4! − ⋯             for all −∞ < x < ∞
    1/(1−x)  = Σ_{n=0}^∞ x^n = 1 + x + x^2 + x^3 + ⋯                               for all −1 < x < 1
    ln(1+x)  = Σ_{n=0}^∞ (−1)^n x^{n+1}/(n+1) = x − x^2/2 + x^3/3 − x^4/4 + ⋯      for all −1 < x ≤ 1
    arctan x = Σ_{n=0}^∞ (−1)^n x^{2n+1}/(2n+1) = x − x^3/3 + x^5/5 − ⋯            for all −1 ≤ x ≤ 1

Notice that the series for sine and cosine sum to something that looks very similar to

45 The reader might ask whether or not we will give the series for other trigonometric functions or their
inverses. While the tangent function has a perfectly well defined series, its coefficients are not as simple
as those of the series we have seen — they form a sequence of numbers known (perhaps unsurprisingly)
as the “tangent numbers”. They, and the related Bernoulli numbers, have many interesting properties,
links to which the interested reader can find with their favourite search engine. The Maclaurin series
for inverse sine is

    arcsin(x) = Σ_{n=0}^∞ ( 4^{−n} (2n)! / ((2n+1)(n!)^2) ) x^{2n+1}

which is quite tidy, but proving it is beyond the scope of the course.


the series for e^x:

    sin(x) + cos(x) = ( x − x^3/3! + x^5/5! − ⋯ ) + ( 1 − x^2/2! + x^4/4! − ⋯ )
                    = 1 + x − x^2/2! − x^3/3! + x^4/4! + x^5/5! − ⋯
    e^x             = 1 + x + x^2/2! + x^3/3! + x^4/4! + x^5/5! + ⋯

So both series have coefficients with the same absolute value (namely 1/n!), but there are
differences in sign46; a deeper study of complex numbers would show how these series are linked through √−1.
Example 5.6.6 (Optional — Why Σ_{n=0}^∞ (1/n!) x^n is e^x)

We have already seen, in Example 5.6.3, that

    e^x = 1 + x + x^2/2! + ⋯ + x^n/n! + E_n(x)

By (5.6.1-b)

    E_n(x) = (1/(n+1)!) e^c x^{n+1}

for some (unknown) c between 0 and x. Fix any real number x. We'll now show that E_n(x)
converges to zero as n → ∞.
To do this we need to bound the size of e^c, and to do this, consider what happens if x
is positive or negative.

• If x < 0 then x ≤ c ≤ 0 and hence e^x ≤ e^c ≤ e^0 = 1.

• On the other hand, if x ≥ 0 then 0 ≤ c ≤ x and so 1 = e^0 ≤ e^c ≤ e^x.

In either case we have that 0 ≤ e^c ≤ 1 + e^x. Because of this the error term

    |E_n(x)| = |e^c x^{n+1}/(n+1)!| ≤ [e^x + 1] |x|^{n+1}/(n+1)!

We claim that this upper bound, and hence the error E_n(x), quickly shrinks to zero as
n → ∞.
Call the upper bound (except for the factor e^x + 1, which is independent of n) e_n(x) =
|x|^{n+1}/(n+1)!. To show that this shrinks to zero as n → ∞, let's write it as follows.

    e_n(x) = |x|^{n+1}/(n+1)! = (|x|/1) · (|x|/2) · (|x|/3) ⋯ (|x|/n) · (|x|/(n+1))      (n + 1 factors)

46 Warning: antique sign–sine pun. No doubt the reader first saw it many years syne.


Now let k be an integer bigger than |x|. We can split the product

    e_n(x) = [ (|x|/1) · (|x|/2) · (|x|/3) ⋯ (|x|/k) ] · [ (|x|/(k+1)) ⋯ (|x|/(n+1)) ]      (the first bracket has k factors)
           ≤ [ (|x|/1) · (|x|/2) · (|x|/3) ⋯ (|x|/k) ] · ( |x|/(k+1) )^{n+1−k}
           = Q(x) · ( |x|/(k+1) )^{n+1−k}

where Q(x) denotes the first bracket. Since k does not depend on n (though it does depend on x), the function Q(x) does not
change as we increase n. Additionally, we know that |x| < k + 1 and so |x|/(k+1) < 1. Hence as
we let n → ∞ the above bound must go to zero.
Alternatively, compare e_n(x) and e_{n+1}(x).

    e_{n+1}(x)/e_n(x) = (|x|^{n+2}/(n+2)!) / (|x|^{n+1}/(n+1)!) = |x|/(n+2)

When n is bigger than, for example 2|x|, we have e_{n+1}(x)/e_n(x) < 1/2. That is, increasing the index
n on e_n(x) by one decreases the size of e_n(x) by a factor of at least two. As a result e_n(x)
must tend to zero as n → ∞.
Consequently, for all x, lim_{n→∞} E_n(x) = 0, as claimed, and we really have

    e^x = lim_{n→∞} [ 1 + x + (1/2)x^2 + (1/3!)x^3 + ⋯ + (1/n!)x^n ] = Σ_{n=0}^∞ x^n/n!

Example 5.6.6

There is another way to prove that the series Σ_{n=0}^∞ x^n/n! converges to the function e^x.
Rather than looking at how the error term E_n(x) behaves as n → ∞, we can show that
the series satisfies the same simple differential equation47 and the same initial condition
as the function.
 
Example 5.6.7 (Optional — Another approach to showing that Σ_{n=0}^∞ (1/n!) x^n is e^x)

We already know from Example 5.5.5, that the series Σ_{n=0}^∞ (1/n!) x^n converges to some func-
tion f(x) for all values of x. All that remains to do is to show that f(x) is really e^x. We will
do this by showing that f(x) and e^x satisfy the same differential equation with the same

47 Recall, you studied that differential equation in the section on separable differential equations (Theo-
rem 3.12.10 in Section 3.12) as well as wayyyy back in the section on exponential growth and decay in
differential calculus.


initial conditions48. We know that y = e^x satisfies

    dy/dx = y    and    y(0) = 1

and by Theorem 3.12.10 (with a = 1, b = 0 and y(0) = 1), this is the only solution. So it
suffices to show that f(x) = Σ_{n=0}^∞ x^n/n! satisfies

    df/dx = f(x)    and    f(0) = 1.

• By Theorem 5.5.12,

      df/dx = d/dx { Σ_{n=0}^∞ x^n/n! } = Σ_{n=1}^∞ (n/n!) x^{n−1} = Σ_{n=1}^∞ x^{n−1}/(n−1)!
            = 1 + x + x^2/2! + x^3/3! + ⋯        (these are the n = 1, 2, 3, 4, … terms)
            = f(x)

• When we substitute x = 0 into the series we get (see the discussion after Defini-
  tion 5.5.1)

      f(0) = 1 + 0/1! + 0/2! + ⋯ = 1.

Hence f(x) solves the same initial value problem and we must have f(x) = e^x.
Example 5.6.7

We can show that the error terms in Maclaurin polynomials for sine and cosine go to
zero as n → ∞ using very much the same approach as in Example 5.6.6.
 
Example 5.6.8 (Optional — Why Σ_{n=0}^∞ ((−1)^n/(2n+1)!) x^{2n+1} = sin x and Σ_{n=0}^∞ ((−1)^n/(2n)!) x^{2n} = cos x)

Let f(x) be either sin x or cos x. We know that every derivative of f(x) will be one
of ±sin(x) or ±cos(x). Consequently, when we compute the error term using equa-
tion (5.6.1-b) we always have |f^{(n+1)}(c)| ≤ 1 and hence

    |E_n(x)| ≤ |x|^{n+1}/(n+1)!.

48 Recall that when we solve a separable differential equation our general solution will have an arbitrary
constant in it. That constant cannot be determined from the differential equation alone and we need
some extra data to find it. This extra information is often information about the system at its beginning
(for example when position or time is zero) — hence “initial conditions”. Of course the reader is already
familiar with this because it was covered back in Section 3.12.


In Example 5.6.6, we showed that |x|^{n+1}/(n+1)! → 0 as n → ∞ — so all the hard work is already
done. Since the error term shrinks to zero for both f(x) = sin x and f(x) = cos x, we have

    f(x) = lim_{n→∞} [ f(0) + f′(0) x + ⋯ + (1/n!) f^{(n)}(0) x^n ]

as required.
Example 5.6.8

5.6.2 §§ Computing with Taylor Series


Taylor series have a great many applications. (Hence their place in this course.) One of
the most immediate of these is that they give us an alternate way of computing many
functions. For example, the first definition we see for the sine and cosine functions is
in terms of triangles. Those definitions, however, do not lend themselves to computing
sine and cosine except at very special angles. Armed with power series representations,
however, we can compute them to very high precision at any angle. To illustrate this,
consider the computation of π — a problem that dates back to the Babylonians.

Example 5.6.9 (Computing the number π )

There are numerous methods for computing π to any desired degree of accuracy49 . Many
of them use the Maclaurin expansion

    arctan x = Σ_{n=0}^∞ (−1)^n x^{2n+1}/(2n+1)

of Theorem 5.6.5. Since arctan(1) = π/4, the series gives us a very pretty formula for π:

    π/4 = arctan 1 = Σ_{n=0}^∞ (−1)^n/(2n+1)

    π = 4 ( 1 − 1/3 + 1/5 − 1/7 + ⋯ )

Unfortunately, this series is not very useful for computing π because it converges so
slowly. If we approximate the series by its N th partial sum, then the alternating series
test (Theorem A.12.1 in the appendix) tells us that the error is bounded by the first term
we drop. To guarantee that we have 2 decimal digits of π correct, we need to sum about
the first 200 terms!
A much better way to compute π using this series is to take advantage of the fact that

49 The computation of π has a very, very long history and your favourite search engine will turn up many
sites that explore the topic. For a more comprehensive history one can turn to books such as “A history
of Pi” by Petr Beckmann and “The joy of π” by David Blatner.


tan(π/6) = 1/√3:

    π = 6 arctan(1/√3) = 6 Σ_{n=0}^∞ (−1)^n (1/(2n+1)) (1/(√3)^{2n+1})
      = 2√3 Σ_{n=0}^∞ (−1)^n (1/(2n+1)) (1/3^n)
      = 2√3 ( 1 − 1/(3×3) + 1/(5×9) − 1/(7×27) + 1/(9×81) − 1/(11×243) + ⋯ )

Again, this is an alternating series and so (via Theorem A.12.1 in the appendix) the error
we introduce by truncating it is bounded by the first term dropped. For example, if we
keep ten terms, stopping at n = 9, we get π = 3.141591 (to 6 decimal places) with an error
between zero and

    2√3/(21 × 3^{10}) < 3 × 10^{−6}
In 1699, the English astronomer/mathematician Abraham Sharp (1653–1742) used 150
terms of this series to compute 72 digits of π — by hand!
This is just one of very many ways to compute π. Another one, which still uses the
Maclaurin expansion of arctan x, but is much more efficient, is

1 1
π = 16 arctan ´ 4 arctan
5 239
This formula was used by John Machin in 1706 to compute π to 100 decimal digits —
again, by hand.
(You won’t be asked to compute errors using Theorem A.12.1, but we include them
here for interest.)
Example 5.6.9
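
The ten-term computation described above is easy to reproduce on a computer. The following Python sketch (our own illustration of the 2√3 series) sums the terms for n = 0, …, 9 and compares with Python's stored value of π:

from math import sqrt, pi

total = sum((-1) ** n / ((2 * n + 1) * 3 ** n) for n in range(10))   # ten terms, n = 0..9
approx = 2 * sqrt(3) * total
print(approx, pi, abs(approx - pi))    # the error is a few times 10^-6, as predicted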

Power series also give us access to new functions which might not be easily expressed
in terms of the functions we have been introduced to so far. The following is a good
example of this.
Example 5.6.10 (Error function)

The error function żx


2 2
erf( x ) = ? e´t dt
π 0
is used in computing “bell curve” probabilities. The indefinite integral of the integrand
2
e´t cannot be expressed in terms of standard functions. But we can still evaluate the
integral to within any desired degree of accuracy by using the Taylor expansion of the
exponential. Start with the Maclaurin series for e x :
8
1 n
ex =
ÿ
x
n =0
n!

418
S EQUENCE AND S ERIES 5.6 TAYLOR S ERIES

and then substitute x = ´t2 into this:

(´1)n 2n
8
´t2
ÿ
e = t
n =0
n!

We can then apply Theorem 5.5.12 to integrate term-by-term:


ż x "ÿ n
#
(´t2 )
8
2
erf( x ) = ? dt
π 0 n=0 n!
x2n+1
8
2 ÿ n
= ? (´1)
π n =0 (2n + 1)n!

For example, for the bell curve, the probability of being within one standard deviation of
the mean50 , is
8 ? 2n+1 8
?  2 ÿ n ( / 2)
1 2 ÿ 1
erf 1/ 2 = ? (´1) = ? (´1)n
π n =0 (2n + 1)n! 2π n=0 (2n + 1)2n n!
c  
2 1 1 1 1
= 1´ + ´ + ´ ¨ ¨ ¨
π 3 ˆ 2 5 ˆ 22 ˆ 2 7 ˆ 23 ˆ 3! 9 ˆ 24 ˆ 4!
This is yet another alternating series. If we keep five terms, stopping at n = 4, we get
0.68271 (to 5 decimal places) with, by Theorem A.12.1 in the appendix again, an error
between zero and the first dropped term, which is minus
c
2 1
5
ă 2 ˆ 10´5
π 11 ˆ 2 ˆ 5!
(You won’t be asked to compute such an error, but we include it for interest.)
Example 5.6.10
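
The five-term estimate of erf(1/√2) above can be reproduced with a few lines of Python (our own sketch; math.erf is used only as an independent check):

from math import erf, factorial, pi, sqrt

x = 1 / sqrt(2)
partial = (2 / sqrt(pi)) * sum(
    (-1) ** n * x ** (2 * n + 1) / ((2 * n + 1) * factorial(n)) for n in range(5)
)
print(partial, erf(x))    # roughly 0.6827, the one-standard-deviation probability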

Example 5.6.11

Evaluate

    Σ_{n=1}^∞ (−1)^{n−1}/(n 3^n)    and    Σ_{n=1}^∞ 1/(n 3^n)

Solution. There are not very many series that can be easily evaluated exactly. But occa-
sionally one encounters a series that can be evaluated simply by realizing that it is exactly
one of the series in Theorem 5.6.5, just with a specific value of x. The left hand series given
above is

    Σ_{n=1}^∞ (−1)^{n−1}/(n 3^n) = 1/3 − (1/2)(1/3^2) + (1/3)(1/3^3) − (1/4)(1/3^4) + ⋯

50 If you don’t know what this means (forgive the pun) don’t worry, because it is not part of the course.
Standard deviation is a way of quantifying variation within a population.


The series in Theorem 5.6.5 that this most closely resembles is

    ln(1 + x) = x − x^2/2 + x^3/3 − x^4/4 + ⋯

Indeed

    Σ_{n=1}^∞ (−1)^{n−1}/(n 3^n) = 1/3 − (1/2)(1/3^2) + (1/3)(1/3^3) − (1/4)(1/3^4) + ⋯
                                 = [ x − x^2/2 + x^3/3 − x^4/4 + ⋯ ]_{x=1/3}
                                 = [ ln(1 + x) ]_{x=1/3}
                                 = ln(4/3)

The right hand series above differs from the left hand series above only in that the signs of
the left hand series alternate while those of the right hand series do not. We can flip every
second sign in a power series just by using a negative x.

    [ ln(1 + x) ]_{x=−1/3} = [ x − x^2/2 + x^3/3 − x^4/4 + ⋯ ]_{x=−1/3}
                           = −1/3 − (1/2)(1/3^2) − (1/3)(1/3^3) − (1/4)(1/3^4) − ⋯

which is exactly minus the desired right hand series. So

    Σ_{n=1}^∞ 1/(n 3^n) = −[ ln(1 + x) ]_{x=−1/3} = −ln(2/3) = ln(3/2)

Example 5.6.11
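
Both evaluations are easy to confirm numerically. Here is a Python sketch (ours; the 60-term cutoff is arbitrary but more than enough, since the terms shrink like 1/3^n):

from math import log

alternating = sum((-1) ** (n - 1) / (n * 3 ** n) for n in range(1, 61))
positive = sum(1 / (n * 3 ** n) for n in range(1, 61))
print(alternating, log(4 / 3))
print(positive, log(3 / 2))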

Example 5.6.12

Let f(x) = sin(2x^3). Find f^{(15)}(0), the fifteenth derivative of f at x = 0.


Solution. This is a bit of a trick question. We could of course use the product and chain
rules to directly apply fifteen derivatives and then set x = 0, but that would be extremely
tedious51 . There is a much more efficient approach that exploits two pieces of knowledge
that we have.
• From equation (5.6.1-a), we see that the coefficient of (x − a)^n in the Taylor series of
  f(x) with expansion point a is exactly (1/n!) f^{(n)}(a). So f^{(n)}(a) is exactly n! times the
  coefficient of (x − a)^n in the Taylor series of f(x) with expansion point a.

51 We could get a computer algebra system to do it for us without much difficulty — but we wouldn’t
learn much in the process. The point of this example is to illustrate that one can do more than just
represent a function with Taylor series. More on this in the next section.


• We know, or at least can easily find, the Taylor series for sin(2x^3).

Let's apply that strategy.

• First, we know that, for all y,

      sin y = y − (1/3!) y^3 + (1/5!) y^5 − ⋯

• Just substituting y = 2x^3, we have

      sin(2x^3) = 2x^3 − (1/3!)(2x^3)^3 + (1/5!)(2x^3)^5 − ⋯
                = 2x^3 − (8/3!) x^9 + (2^5/5!) x^{15} − ⋯

• So the coefficient of x^{15} in the Taylor series of f(x) = sin(2x^3) with expansion point
  a = 0 is 2^5/5!

and we have

    f^{(15)}(0) = 15! × (2^5/5!) = 348,713,164,800
Example 5.6.12
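
The final arithmetic can be checked with exact integer arithmetic in Python (our own sketch): the coefficient of x^{15} is 2^5/5!, so f^{(15)}(0) = 15! × 2^5/5!.

from math import factorial

# 15! * 2^5 is divisible by 5!, so integer division below is exact.
print(factorial(15) * 2 ** 5 // factorial(5))   # prints 348713164800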

Example 5.6.13 (Optional — Computing the number e)

Back in Example 5.6.6, we saw that

    e^x = 1 + x + x^2/2! + ⋯ + x^n/n! + (1/(n+1)!) e^c x^{n+1}

for some (unknown) c between 0 and x. This can be used to approximate the number e,
with any desired degree of accuracy. Setting x = 1 in this equation gives

    e = 1 + 1 + 1/2! + ⋯ + 1/n! + (1/(n+1)!) e^c

for some c between 0 and 1. Even though we don't know c exactly, we can bound that
term quite readily. We do know that e^c is an increasing function52 of c, and so 1 = e^0 ≤
e^c ≤ e^1 = e. Thus we know that

    1/(n+1)! ≤ e − ( 1 + 1 + 1/2! + ⋯ + 1/n! ) ≤ e/(n+1)!

So we have a lower bound on the error, but our upper bound involves the e — precisely
the quantity we are trying to get a handle on.

52 Check the derivative!


But all is not lost. Let's look a little more closely at the right-hand inequality when
n = 1:

    e − (1 + 1) ≤ e/2        move the e's to one side
    e/2 ≤ 2                  and clean it up
    e ≤ 4.

Now this is a pretty crude bound53 but it isn't hard to improve. Try this again with n = 2:

    e − (1 + 1 + 1/2) ≤ e/6        move e's to one side
    5e/6 ≤ 5/2
    e ≤ 3.

Better. Now we can rewrite our bound:

    1/(n+1)! ≤ e − ( 1 + 1 + 1/2! + ⋯ + 1/n! ) ≤ e/(n+1)! ≤ 3/(n+1)!

If we set n = 4 in this we get

    1/120 = 1/5! ≤ e − ( 1 + 1 + 1/2 + 1/6 + 1/24 ) ≤ 3/120

So the error is between 1/120 and 3/120 = 1/40 — this approximation isn't guaranteed to give us
the first 2 decimal places. If we ramp n up to 9 however, we get

    1/10! ≤ e − ( 1 + 1 + 1/2 + ⋯ + 1/9! ) ≤ 3/10!

Since 10! = 3628800, the upper bound on the error is 3/3628800 < 3/3000000 = 10^{−6}, and we can
approximate e by

    1 + 1 + 1/2! + 1/3! + 1/4! + 1/5! + 1/6! + 1/7! + 1/8! + 1/9!
    = 1 + 1 + 0.5 + 0.16̇ + 0.0416̇ + 0.0083̇ + 0.00138̇ + 0.0001984 + 0.0000248 + 0.0000028
    = 2.718282

and it is correct to six decimal places.

Example 5.6.13

53 The authors hope that by now we all “know” that e is between 2 and 3, but maybe we don’t know how
to prove it.
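
The ten-term approximation of e described above takes only a couple of lines of Python (our own sketch; math.e is used purely as an independent check of the error bound):

from math import e, factorial

approx = sum(1 / factorial(n) for n in range(10))   # 1 + 1 + 1/2! + ... + 1/9!
print(approx, e, abs(e - approx))                    # the error is below 10^-6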


5.6.3 §§ Evaluating Limits using Taylor Expansions


Taylor polynomials provide a good way to understand the behaviour of a function near a
specified point and so are useful for evaluating complicated limits. Here are some exam-
ples.
Example 5.6.14

In this example, we’ll start with a relatively simple limit, namely


    lim_{x→0} sin x / x

The first thing to notice about this limit is that, as x tends to zero, both the numerator, sin x,
and the denominator, x, tend to 0. So we may not evaluate the limit of the ratio by simply
dividing the limits of the numerator and denominator. To find the limit, or show that it
does not exist, we are going to have to exhibit a cancellation between the numerator and
the denominator. Let’s start by taking a closer look at the numerator. By Example 5.6.4,
    sin x = x − (1/3!)x^3 + (1/5!)x^5 − ⋯

Consequently54

    sin x / x = 1 − (1/3!)x^2 + (1/5!)x^4 − ⋯

Every term in this series, except for the very first term, is proportional to a strictly positive
power of x. Consequently, as x tends to zero, all terms in this series, except for the very
first term, tend to zero. In fact the sum of all terms, starting with the second term, also
tends to zero. That is,

    lim_{x→0} [ −(1/3!)x^2 + (1/5!)x^4 − ⋯ ] = 0

We won't justify that statement here, but it will be justified in the following (optional)
subsection. So

    lim_{x→0} sin x / x = lim_{x→0} [ 1 − (1/3!)x^2 + (1/5!)x^4 − ⋯ ]
                        = 1 + lim_{x→0} [ −(1/3!)x^2 + (1/5!)x^4 − ⋯ ]
                        = 1

54 We are hiding some mathematics behind this "consequently". What we are really doing is using our knowledge
of Taylor polynomials to write

    f(x) = sin(x) = x − (1/3!)x^3 + (1/5!)x^5 + E_5(x)

where E_5(x) = (f^{(6)}(c)/6!) x^6 and c is between 0 and x. We are effectively hiding "E_5(x)" inside the "⋯".
Now we can divide both sides by x (assuming x ≠ 0):

    sin(x)/x = 1 − (1/3!)x^2 + (1/5!)x^4 + E_5(x)/x.

and everything is fine provided the term E_5(x)/x stays well behaved.


Example 5.6.14

The limit in the previous example can also be evaluated relatively easily using l'Hôpital's
rule55. While the following limit can also, in principle, be evaluated using l'Hôpital's rule,
it is much more efficient to use Taylor series56.
Example 5.6.15

In this example we evaluate


    lim_{x→0} (arctan x − x)/(sin x − x)
Once again, the first thing to notice about this limit is that, as x tends to zero, the numera-
tor tends to arctan 0 − 0, which is 0, and the denominator tends to sin 0 − 0, which is also
0. So we may not evaluate the limit of the ratio by simply dividing the limits of the nu-
merator and denominator. Again, to find the limit, or show that it does not exist, we are
going to have to exhibit a cancellation between the numerator and the denominator. To
get a more detailed understanding of the behaviour of the numerator and denominator
near x = 0, we find their Taylor expansions. By Example 5.5.20,

    arctan x = x − x^3/3 + x^5/5 − ⋯

so the numerator

    arctan x − x = −x^3/3 + x^5/5 − ⋯

By Example 5.6.4,

    sin x = x − (1/3!)x^3 + (1/5!)x^5 − ⋯

so the denominator

    sin x − x = −(1/3!)x^3 + (1/5!)x^5 − ⋯

and the ratio

    (arctan x − x)/(sin x − x) = ( −x^3/3 + x^5/5 − ⋯ ) / ( −(1/3!)x^3 + (1/5!)x^5 − ⋯ )

Notice that every term in both the numerator and the denominator contains a common
factor of x^3, which we can cancel out.

    (arctan x − x)/(sin x − x) = ( −1/3 + x^2/5 − ⋯ ) / ( −1/3! + (1/5!)x^2 − ⋯ )

As x tends to zero,

55 Many of you learned about l’Hôpital’s rule in school and all of you should have seen it last term in your
differential calculus course.
56 It takes 3 applications of l’Hôpital’s rule and some careful cleaning up of the intermediate expressions.
Oof!


• the numerator tends to −1/3, which is not 0, and

• the denominator tends to −1/3! = −1/6, which is also not 0.

so we may now legitimately evaluate the limit of the ratio by simply dividing the limits
of the numerator and denominator.

    lim_{x→0} (arctan x − x)/(sin x − x) = lim_{x→0} ( −1/3 + x^2/5 − ⋯ ) / ( −1/3! + (1/5!)x^2 − ⋯ )
                                         = [ lim_{x→0} ( −1/3 + x^2/5 − ⋯ ) ] / [ lim_{x→0} ( −1/3! + (1/5!)x^2 − ⋯ ) ]
                                         = (−1/3)/(−1/3!)
                                         = 2

Example 5.6.15
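
The value of this limit is easy to corroborate numerically. The following Python sketch (ours; the sample points are arbitrary small values of x) evaluates the ratio directly and watches it approach 2:

from math import atan, sin

for x in (0.1, 0.01, 0.001):
    print(x, (atan(x) - x) / (sin(x) - x))   # the printed values approach the limit 2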

Chapter 5 of this work was adapted from Chapter 3 of CLP 2 – Integral Calculus by
Feldman, Rechnitzer, and Yeager under a Creative Commons Attribution-NonCommercial-
ShareAlike 4.0 International license.

Appendix A

PROOFS AND SUPPLEMENTS

A.1 Folding the First Octant of R³


This text, whether you’re reading it on a computer screen or a printed page, exists in
two dimensions. So, anything we draw in three dimensions is going to require a little bit
of imagination. If you’re struggling to understand the figures with three coordinates, it
might help to make your own model of these axes.
In the Cartesian plane, the first quadrant is the part of the plane where both x and y
are positive. R³ divides three-dimensional space into eight regions, called octants. The
first octant is the region where all of x, y, and z are positive.
Following the instructions below, you can fold a piece of paper into an octant.
1. Fold your paper in half “hamburger style” (so that the fold goes along the shorter
dimension of the paper). Position it so that it opens like a book1 .

2. Bring the corner of your folded paper up to the side.

3. Your paper now has a triangle sitting on top of a rectangle. Where the triangle ends,
make a crease in the underlying rectangle shapes.

1 in a language written left-to-right


[Figure: folding step 3 — the crease to make in the underlying rectangle]

4. Your paper has four layers, with the triangle shapes on top. Open the paper so that
three layers are on top, and one is on the bottom. The result should look like the
inside corner of a box.

[Figure: folding step 4 — the paper opened into the inside corner of a box]

Your octant is created! The vertical crease is the z axis, the crease to the left is the x axis,
and the crease to the right is the y axis. In the picture below, the blue sphere indicates that
the octant is open towards you: if you were to put a marble inside the paper structure, it
would sit as shown.
[Figure: the completed octant, with the z axis along the vertical crease and the x and y axes along the left and right creases; a marble placed inside would rest in the corner]

To practice with your octant, label the following points directly on the paper:

• (1, 1, 0)

• (0, 1, 1)

• (1, 0, 1)

The next collection of points will exist out in space, not on any of the paper sides. Point to
their positions relative to your octant:

• (1, 1, 1)

• (1, 2, 3)

• (1, −1, 1)

• (1, 1, −1)

A.2 Conic Sections and Quadric Surfaces


A conic section is the curve of intersection of a cone and a plane that does not pass through
the vertex of the cone. This is illustrated in the figures below. An equivalent2 (and often


[Figure: four cones cut by planes, giving a circle, an ellipse, a parabola, and a hyperbola]

used) definition is that a conic section is the set of all points in the xy–plane that obey
Q(x, y) = 0 with

    Q(x, y) = Ax^2 + By^2 + Cxy + Dx + Ey + F = 0

being a polynomial of degree two3. By rotating and translating our coordinate system the
equation of the conic section can be brought into one of the forms4

• αx^2 + βy^2 = γ with α, β, γ > 0, which is an ellipse (or a circle),
• αx^2 − βy^2 = γ with α, β > 0, γ ≠ 0, which is a hyperbola,
• x^2 = δy, with δ ≠ 0, which is a parabola.
The three dimensional analogs of conic sections, surfaces in three dimensions given by
quadratic equations, are called quadrics. An example is the sphere x^2 + y^2 + z^2 = 1. Here
are some tables giving all of the quadric surfaces.
                             elliptic cylinder       parabolic cylinder   hyperbolic cylinder     sphere
equation in standard form    x^2/a^2 + y^2/b^2 = 1   y = a x^2            x^2/a^2 − y^2/b^2 = 1   x^2 + y^2 + z^2 = r^2
x = constant cross–section   two lines               one line             two lines               circle
y = constant cross–section   two lines               two lines            two lines               circle
z = constant cross–section   ellipse                 parabola             hyperbola               circle

[sketches omitted]

2 It is outside our scope to prove this equivalence.


3 Technically, we should also require that the constants A, B, C, D, E, F, are real numbers, that A, B, C are
not all zero, that Q( x, y) = 0 has more than one real solution, and that the polynomial can’t be factored
into the product of two polynomials of degree one.
4 This statement can be justified using a linear algebra eigenvalue/eigenvector analysis. It is beyond
what we can cover here, but is not too difficult for a standard linear algebra course.


                             ellipsoid                          elliptic paraboloid        elliptic cone
equation in standard form    x^2/a^2 + y^2/b^2 + z^2/c^2 = 1    x^2/a^2 + y^2/b^2 = z/c    x^2/a^2 + y^2/b^2 = z^2/c^2
x = constant cross–section   ellipse                            parabola                   two lines if x = 0, hyperbola if x ≠ 0
y = constant cross–section   ellipse                            parabola                   two lines if y = 0, hyperbola if y ≠ 0
z = constant cross–section   ellipse                            ellipse                    ellipse

[sketches omitted]
sketch


                             hyperboloid of one sheet           hyperboloid of two sheets           hyperbolic paraboloid
equation in standard form    x^2/a^2 + y^2/b^2 − z^2/c^2 = 1    x^2/a^2 + y^2/b^2 − z^2/c^2 = −1    y^2/b^2 − x^2/a^2 = z/c
x = constant cross–section   hyperbola                          hyperbola                           parabola
y = constant cross–section   hyperbola                          hyperbola                           parabola
z = constant cross–section   ellipse                            ellipse                             two lines if z = 0, hyperbola if z ≠ 0

[sketches omitted]

Section A.2 of this work was adapted from Appendix G of CLP 3 – Multivariable Calcu-
lus by Feldman, Rechnitzer, and Yeager under a Creative Commons Attribution-NonCommercial-
ShareAlike 4.0 International license.

A.3 Mixed Partial Derivatives


A.3.1 §§ Clairaut: The Proof of Theorem 2.2.5
§§§ Outline

Here is an outline of the proof of Theorem 2.2.5. The (numbered) details are in the subsec-
tion below.
Fix real numbers x0 and y0 and define

    F(h, k) = (1/(hk)) [ f(x_0 + h, y_0 + k) − f(x_0, y_0 + k) − f(x_0 + h, y_0) + f(x_0, y_0) ].

We define F(h, k) in this way because both partial derivatives ∂²f/∂x∂y (x_0, y_0) and ∂²f/∂y∂x (x_0, y_0)
are limits of F(h, k) as h, k → 0. We show in item (1) in the details below that

    ∂/∂y ( ∂f/∂x ) (x_0, y_0) = lim_{k→0} lim_{h→0} F(h, k)
    ∂/∂x ( ∂f/∂y ) (x_0, y_0) = lim_{h→0} lim_{k→0} F(h, k)

and therefore the partial derivatives ∂²f/∂x∂y (x_0, y_0) and ∂²f/∂y∂x (x_0, y_0) are identical except for
the order in which the limits are taken.


Now, by applying the Mean Value Theorem multiple times (see items (2) to (5) for more
details) we get

    F(h, k) = (1/h) [ ∂f/∂y (x_0 + h, y_0 + θ_1 k) − ∂f/∂y (x_0, y_0 + θ_1 k) ]      (by (2))
            = ∂/∂x ( ∂f/∂y ) (x_0 + θ_2 h, y_0 + θ_1 k)                              (by (3))
    F(h, k) = (1/k) [ ∂f/∂x (x_0 + θ_3 h, y_0 + k) − ∂f/∂x (x_0 + θ_3 h, y_0) ]      (by (4))
            = ∂/∂y ( ∂f/∂x ) (x_0 + θ_3 h, y_0 + θ_4 k)                              (by (5))

for some numbers 0 < θ_1, θ_2, θ_3, θ_4 < 1. All of the numbers θ_1, θ_2, θ_3, θ_4 depend on x_0, y_0, h, k.
Hence

    ∂/∂x ( ∂f/∂y ) (x_0 + θ_2 h, y_0 + θ_1 k) = ∂/∂y ( ∂f/∂x ) (x_0 + θ_3 h, y_0 + θ_4 k)

for all h and k. Taking the limit (h, k) → (0, 0) and using the assumed continuity of both
partial derivatives at (x_0, y_0) gives

    lim_{(h,k)→(0,0)} F(h, k) = ∂/∂x ( ∂f/∂y ) (x_0, y_0) = ∂/∂y ( ∂f/∂x ) (x_0, y_0)

as desired. To complete the proof we just have to justify the details (1), (2), (3), (4) and (5).

§§§ The Details


(1) By definition,

        ∂/∂y ( ∂f/∂x ) (x_0, y_0)
            = lim_{k→0} (1/k) [ ∂f/∂x (x_0, y_0 + k) − ∂f/∂x (x_0, y_0) ]
            = lim_{k→0} (1/k) [ lim_{h→0} (f(x_0+h, y_0+k) − f(x_0, y_0+k))/h − lim_{h→0} (f(x_0+h, y_0) − f(x_0, y_0))/h ]
            = lim_{k→0} lim_{h→0} ( f(x_0+h, y_0+k) − f(x_0, y_0+k) − f(x_0+h, y_0) + f(x_0, y_0) )/(hk)
            = lim_{k→0} lim_{h→0} F(h, k)

    Similarly,

        ∂/∂x ( ∂f/∂y ) (x_0, y_0)
            = lim_{h→0} (1/h) [ ∂f/∂y (x_0 + h, y_0) − ∂f/∂y (x_0, y_0) ]
            = lim_{h→0} (1/h) [ lim_{k→0} (f(x_0+h, y_0+k) − f(x_0+h, y_0))/k − lim_{k→0} (f(x_0, y_0+k) − f(x_0, y_0))/k ]
            = lim_{h→0} lim_{k→0} ( f(x_0+h, y_0+k) − f(x_0+h, y_0) − f(x_0, y_0+k) + f(x_0, y_0) )/(hk)
            = lim_{h→0} lim_{k→0} F(h, k)


(2) The Mean Value Theorem (probably covered in your last calculus class) says that, for
any differentiable function ϕ( x ),
 
• the slope of the line joining the points x0 , ϕ( x0 ) and x0 + k, ϕ( x0 + k) on the
graph of ϕ

is the same as

• the slope of the tangent to the graph at some point between x0 and x0 + k.

That is, there is some 0 ă θ1 ă 1 such that

    [ φ(x0 + k) − φ(x0) ] / k = dφ/dx (x0 + θ1 k)

[Figure: the graph y = φ(x), with the points x0, x0 + θ1 k and x0 + k marked on the x-axis.]

Applying this with x replaced by y and ϕ replaced by G (y) = f ( x0 + h, y) ´ f ( x0 , y)


gives

G ( y0 + k ) ´ G ( y0 ) dG
= ( y0 + θ1 k ) for some 0 ă θ1 ă 1
k dy
Bf Bf
= ( x0 + h, y0 + θ1 k) ´ ( x0 , y0 + θ1 k)
By By

Hence, for some 0 ă θ1 ă 1,


   
1 G ( y0 + k ) ´ G ( y0 ) 1 Bf Bf
F (h, k) = = ( x0 + h, y0 + θ1 k) ´ ( x0 , y0 + θ1 k)
h k h By By

Bf
(3) Define H ( x ) = By ( x, y0 + θ1 k). By the Mean Value Theorem,

1
F (h, k ) = [ H ( x0 + h) ´ H ( x0 )]
h
dH
= ( x0 + θ2 h ) for some 0 ă θ2 ă 1
dx
B Bf
= ( x0 + θ2 h, y0 + θ1 k)
Bx By


(4) Define A( x ) = f ( x, y0 + k ) ´ f ( x, y0 ). By the Mean Value Theorem,


 
1 A ( x0 + h ) ´ A ( x0 )
F (h, k ) =
k h
1 dA
= ( x0 + θ3 h ) for some 0 ă θ3 ă 1
k dx 
1 Bf Bf
= ( x0 + θ3 h, y0 + k) ´ ( x0 + θ3 h, y0 )
k Bx Bx

Bf
(5) Define B(y) = Bx ( x0 + θ3 h, y). By the Mean Value Theorem

1
F (h, k) = [ B(y0 + k) ´ B(y0 )]
k
dB
= ( y0 + θ4 k ) for some 0 ă θ4 ă 1
dy
B Bf
= ( x0 + θ3 h, y0 + θ4 k)
By Bx

This completes the proof of Theorem 2.2.5.


Section A.3.1 of this work was adapted from Section 2.3.1 of CLP 3 – Multivariable Calculus
by Feldman, Rechnitzer, and Yeager under a Creative Commons Attribution-NonCommercial-
ShareAlike 4.0 International license.

A.3.2 §§ An Example of ∂²f/∂x∂y (x0, y0) ≠ ∂²f/∂y∂x (x0, y0)

In Theorem 2.2.5, we showed that ∂²f/∂x∂y (x0, y0) = ∂²f/∂y∂x (x0, y0) if the partial derivatives
∂²f/∂x∂y and ∂²f/∂y∂x exist and are continuous at (x0, y0). Here is an example which shows that if
the partial derivatives ∂²f/∂x∂y and ∂²f/∂y∂x are not continuous at (x0, y0), then it is possible that
∂²f/∂x∂y (x0, y0) ≠ ∂²f/∂y∂x (x0, y0).
Define

    f(x, y) = { xy (x² − y²)/(x² + y²)   if (x, y) ≠ (0, 0)
              { 0                        if (x, y) = (0, 0)

This function is continuous everywhere. Note that f ( x, 0) = 0 for all x and f (0, y) = 0 for
all y. We now compute the first order partial derivatives. For ( x, y) ‰ (0, 0),

    ∂f/∂x (x, y) = y (x² − y²)/(x² + y²) + xy · 2x/(x² + y²) − xy · 2x(x² − y²)/(x² + y²)²
                 = y (x² − y²)/(x² + y²) + xy · 4xy²/(x² + y²)²
    ∂f/∂y (x, y) = x (x² − y²)/(x² + y²) − xy · 2y/(x² + y²) − xy · 2y(x² − y²)/(x² + y²)²
                 = x (x² − y²)/(x² + y²) − xy · 4yx²/(x² + y²)²


For (x, y) = (0, 0),

    ∂f/∂x (0, 0) = [ d/dx f(x, 0) ]_{x=0} = [ d/dx 0 ]_{x=0} = 0
    ∂f/∂y (0, 0) = [ d/dy f(0, y) ]_{y=0} = [ d/dy 0 ]_{y=0} = 0

By way of summary, the two first order partial derivatives are


    f_x(x, y) = { y (x² − y²)/(x² + y²) + 4x²y³/(x² + y²)²   if (x, y) ≠ (0, 0)
               { 0                                           if (x, y) = (0, 0)

    f_y(x, y) = { x (x² − y²)/(x² + y²) − 4x³y²/(x² + y²)²   if (x, y) ≠ (0, 0)
               { 0                                           if (x, y) = (0, 0)

Both ∂f/∂x (x, y) and ∂f/∂y (x, y) are continuous. Finally, we compute

    ∂²f/∂x∂y (0, 0) = [ d/dx f_y(x, 0) ]_{x=0} = lim_{h→0} (1/h) [ f_y(h, 0) − f_y(0, 0) ]
                    = lim_{h→0} (1/h) [ h (h² − 0²)/(h² + 0²) − 0 ] = 1

    ∂²f/∂y∂x (0, 0) = [ d/dy f_x(0, y) ]_{y=0} = lim_{k→0} (1/k) [ f_x(0, k) − f_x(0, 0) ]
                    = lim_{k→0} (1/k) [ k (0² − k²)/(0² + k²) − 0 ] = −1

Section A.3.2 of this work was adapted from Section 2.3.2 of CLP 3 – Multivariable Calculus
by Feldman, Rechnitzer, and Yeager under a Creative Commons Attribution-NonCommercial-
ShareAlike 4.0 International license.

A.4IJ The (multivariable) chain rule


You already routinely use the one dimensional chain rule

    d/dt [ f(x(t)) ] = df/dx (x(t)) · dx/dt (t)

in doing computations like

    d/dt sin(t²) = cos(t²) · 2t

In this example, f(x) = sin(x) and x(t) = t².
We now generalize the chain rule to functions of more than one variable. For con-
creteness, we concentrate on the case in which all functions are functions of two variables.
That is, we find the partial derivatives ∂F/∂s and ∂F/∂t of a function F(s, t) that is defined as a
composition

    F(s, t) = f( x(s, t), y(s, t) )

We are using the name F for the new function F(s, t) as a reminder that it is closely related
to, though not the same as, the function f(x, y). The partial derivative ∂F/∂s is the rate of


change of F when s is varied with t held constant. When  s is varied, both the x-argument,
x (s, t), and the y-argument, y(s, t), in f x (s, t) , y(s, t) vary. Consequently, the chain rule

for f x (s, t) , y(s, t) is a sum of two terms — one resulting from the variation of the x-
argument and the other resulting from the variation of the y-argument.
Theorem A.4.1 (The Chain Rule).

Assume that all first order partial derivatives of f(x, y), x(s, t) and y(s, t) exist and are
continuous. Then the same is true for F(s, t) = f( x(s, t), y(s, t) ) and

    ∂F/∂s (s, t) = ∂f/∂x ( x(s, t), y(s, t) ) · ∂x/∂s (s, t) + ∂f/∂y ( x(s, t), y(s, t) ) · ∂y/∂s (s, t)
    ∂F/∂t (s, t) = ∂f/∂x ( x(s, t), y(s, t) ) · ∂x/∂t (s, t) + ∂f/∂y ( x(s, t), y(s, t) ) · ∂y/∂t (s, t)

We will give the proof of this theorem in §A.4.2, below. It is common to state this chain
rule as
    ∂F/∂s = ∂f/∂x · ∂x/∂s + ∂f/∂y · ∂y/∂s
    ∂F/∂t = ∂f/∂x · ∂x/∂t + ∂f/∂y · ∂y/∂t
That is, it is common to suppress the function arguments. But you should make sure that
you understand what the arguments are before doing so.
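As a quick sanity check of Theorem A.4.1 (an addition of ours, not part of the CLP source), the
following SymPy sketch uses sample functions f, x and y of our own choosing and confirms symbolically
that the two sides of the formula for ∂F/∂s agree.

    import sympy as sp

    s, t = sp.symbols('s t', real=True)
    X, Y = sp.symbols('X Y', real=True)        # stand-ins for the arguments of f

    f = X**2 * sp.sin(Y)                       # a sample f(x, y)
    x = s*t                                    # a sample x(s, t)
    y = s + t**2                               # a sample y(s, t)

    F = f.subs({X: x, Y: y})                   # F(s, t) = f(x(s, t), y(s, t))

    lhs = sp.diff(F, s)
    rhs = (sp.diff(f, X).subs({X: x, Y: y}) * sp.diff(x, s)
           + sp.diff(f, Y).subs({X: x, Y: y}) * sp.diff(y, s))

    print(sp.simplify(lhs - rhs))              # prints 0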
Theorem A.4.1 is given for the case that F is the composition of a function of two
variables, f ( x, y), with two functions, x (s, t) and y(s, t), of two variables each. There is
nothing magical about the number two. There are obvious variants for any numbers of
variables. For example,
Equation A.4.2.

if F(t) = f( x(t), y(t), z(t) ), then

    dF/dt (t) = ∂f/∂x ( x(t), y(t), z(t) ) · dx/dt (t) + ∂f/∂y ( x(t), y(t), z(t) ) · dy/dt (t)
              + ∂f/∂z ( x(t), y(t), z(t) ) · dz/dt (t)

and
Equation A.4.3.

if F(s, t) = f( x(s, t) ), then

    ∂F/∂t (s, t) = df/dx ( x(s, t) ) · ∂x/∂t (s, t)

To give you an idea of how the proof of Theorem A.4.1 will go, we first review the
proof of the familiar one dimensional chain rule.


A.4.1 §§ Review of the Proof of d/dt f(x(t)) = df/dx (x(t)) · dx/dt (t)
As a warm up, let’s review the proof of the one dimensional chain rule

d  df  dx
f x (t) = x (t) (t)
dt dx dt

We wish to find the derivative of F (t) = f x (t) . By definition

F (t + h) ´ F (t)
F1 (t) = lim
hÑ0 h
 
f x (t + h) ´ f x (t)
= lim
hÑ0 h

Notice that the numerator is the difference of f ( x ) evaluated at two nearby values of
x, namely x1 = x (t + h) and x0 = x (t). The Mean Value Theorem is a good tool for
studying the difference in the values of f ( x ) at two nearby points. Recall that the Mean
Value Theorem says that, for any given x0 and x1 , there exists an (in general unknown) c
between them so that
f ( x1 ) ´ f ( x0 ) = f 1 ( c ) ( x1 ´ x0 )
For this proof, we choose x0 = x(t) and x1 = x(t + h). Then the Mean Value Theorem tells
us that there exists a ch so that
   
f x ( t + h ) ´ f x ( t ) = f ( x1 ) ´ f ( x0 ) = f 1 ( c h ) x ( t + h ) ´ x ( t )

We have put the subscript h on ch to emphasise that ch , which is between x0 = x (t) and
x1 = x (t + h), may depend on h. Now since ch is trapped between x (t) and x (t + h) and
since x (t + h) Ñ x (t) as h Ñ 0, we have that ch must also tend to x (t) as h Ñ 0. Plugging
this into the definition of F′(t),
 
f x (t + h) ´ f x (t)
F (t) =
1
lim
hÑ0 h
 
f (ch ) x (t + h) ´ x (t)
1
= lim
hÑ0 h
x (t + h) ´ x (t)
= lim f 1 (ch ) lim
hÑ0 hÑ0 h
 1
= f x (t) x (t)
1

as desired.

A.4.2 §§ Proof of Theorem A.4.1



We’ll now prove the formula for ∂/∂s f( x(s, t), y(s, t) ) that is given in Theorem A.4.1. The
proof uses the same ideas as the proof of the one variable chain rule, that we have just
reviewed.



We wish to find the partial derivative with respect to s of F (s, t) = f x (s, t) , y(s, t) .
By definition

BF F (s + h, t) ´ F (s, t)
(s, t) = lim
Bs hÑ0 h
 
f x (s + h, t) , y(s + h, t) ´ f x (s, t) , y(s, t)
= lim
hÑ0 h

The numerator is the difference of  f ( x, y) evaluated at two nearby  values of ( x, y), namely
( x1 , y1 ) = x (s + h, t) , y(s + h, t) and ( x0 , y0 ) = x (s, t) , y(s, t) . In going from ( x0 , y0 ) to
( x1 , y1 ), both the x and y-coordinates change. By adding and subtracting we can separate
the change in the x-coordinate from the change in the y-coordinate.

f ( x1 , y1 ) ´ f ( x0 , y0 ) = f ( x1 , y1 ) ´ f ( x0 , y1 ) + f ( x0 , y1 ) ´ f ( x0 , y0 )
( (

The first half, f ( x1 , y1 ) ´ f ( x0 , y1 ) , has the same y argument in both terms and so is the
(

difference of the function of one variable g( x ) = f ( x, y1 ) (viewing y1 just as a constant)


evaluated at the two nearby values, x0 , x1 , of x. Consequently, we can make use of the
Mean Value Theorem as we did in §A.4.1 above. There is a c x,h between x0 = x (s, t) and
x1 = x (s + h, t) such that

Bf
( c , y1 ) [ x1 ´ x0 ]
f ( x1 , y1 ) ´ f ( x0 , y1 ) = g( x1 ) ´ g( x0 ) = g1 (c x,h )[ x1 ´ x0 ] =
Bx x,h
Bf  
= c x,h , y(s + h, t) x (s + h, t) ´ x (s, t)
Bx

We have introduced the two subscripts in c x,h to remind ourselves that it may depend on
h and that it lies between the two x-values x0 and x1 .
Similarly, the second half, f ( x0 , y1 ) ´ f ( x0 , y0 ) , is the difference of the function of
(

one variable h(y) = f ( x0 , y) (viewing x0 just as a constant) evaluated at the two nearby
values, y0 , y1 , of y. So, by the mean value theorem,

Bf
f ( x0 , y1 ) ´ f ( x0 , y0 ) = h(y1 ) ´ h(y0 ) = h1 (cy,h )[y1 ´ y0 ] =
( x0 , cy,h ) [y1 ´ y0 ]
By
Bf  
= x (s, t) , cy,h y(s + h, t) ´ y(s, t)
By

for some (unknown) cy,h between y0 = y(s, t) and y1 = y(s + h, t). Again, the two sub-
scripts in cy,h remind ourselves that it may depend on h and that it lies between the two
y-values y0 and y1 . So, noting that, as h tends to zero, c x,h , which is trapped between
x (s, t) and x (s + h, t), must tend to x (s, t), and cy,h , which is trapped between y(s, t) and


y(s + h, t), must tend to y(s, t),


 
BF f x (s + h, t) , y(s + h, t) ´ f x (s, t) , y(s, t)
(s, t)) = lim
Bs hÑ0 h
Bf  
c x,h , y ( s + h, t ) x ( s + h, t ) ´ x ( s, t )
= lim Bx
hÑ0 h
Bf  
By x ( s, t ) , c y,h y ( s + h, t ) ´ y ( s, t )
+ lim
hÑ0 h
Bf  x (s + h, t) ´ x (s, t)
= lim c x,h , y(s + h, t) lim
hÑ0 Bx hÑ0 h
Bf  y(s + h, t) ´ y(s, t)
+ lim x (s, t) , cy,h lim
hÑ0 By hÑ0 h
Bf  Bx Bf  By
= x (s, t) , y(s, t) (s, t) + x (s, t) , y(s, t) (s, t)
Bx Bs By Bs

We can of course follow the same procedure to evaluate the partial derivative with respect
to t. This concludes the proof of Theorem A.4.1.

Example A.4.4 (Implicit Differentiation on Level Curves)

Level curves of the surface z = f ( x, y) are points ( x, y) such that f ( x, y) = z0 , where z0 is


some fixed constant. We can think of level curves as existing in an xy-plane by ignoring
the z coordinate (which, remember, is constant). Consider a point ( a, b, z0 ) on a level curve
f ( x, y) = z0 . In the two-dimensional view, near ( a, b) we can think of y as a function of x:
if x moves a little bit, then y changes as well to “compensate” and maintain the constant
z-value.[picture] So, we can write y( x ) to remember that y depends on x in this situation.

z0 = f ( x, y( x ))

Now we can think about the single-variable function g( x ) = f ( x, y( x )). Since this
function is equal to the constant value z0 , its derivative is zero. Then, using the chain rule:

    0 = g′(x) = d/dx [ f(x, y(x)) ] = ∂f/∂x · dx/dx + ∂f/∂y · dy/dx
              = f_x · 1 + f_y · dy/dx

If f y ‰ 0, then

    dy/dx = − f_x / f_y

What we’ve proved is the following.


TheoremA.4.5.

The derivative of the curve in the xy plane that is implicitly defined by the equa-
tion
z0 = f ( x, y)
for some constant z0 and some differentiable function f ( x, y) is

    dy/dx = − f_x / f_y

as long as f y ‰ 0.

Example A.4.4
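Here is a small SymPy check of Theorem A.4.5 (added for illustration; the level curve is our own
example). For f(x, y) = x² + y² the level curve f = z0 is a circle with upper branch
y(x) = √(z0 − x²), and the slope −f_x/f_y agrees with differentiating y(x) directly.

    import sympy as sp

    x, z0 = sp.symbols('x z0', positive=True)
    y = sp.sqrt(z0 - x**2)            # upper branch of the circle x**2 + y**2 = z0

    fx = 2*x                          # f_x and f_y for f(x, y) = x**2 + y**2,
    fy = 2*y                          # evaluated along the curve

    slope_from_theorem = -fx/fy
    slope_direct = sp.diff(y, x)

    print(sp.simplify(slope_from_theorem - slope_direct))   # prints 0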

CorollaryA.4.6.

Let f ( x, y) be a function whose partial derivatives exist. ∇ f ( a, b) is perpendicu-


lar to the level curve f ( x, y) = f ( a, b) at ( a, b) as long as ∇ f ( a, b) ‰ 0.

Proof. First, suppose f_y(a, b) ≠ 0. Using Theorem A.4.5, the line tangent to the level curve
has slope (in the xy-plane) −f_x/f_y. So, one vector in the direction tangent to the level curve is
⟨−f_y, f_x⟩. Then

    ⟨−f_y, f_x⟩ · ⟨f_x, f_y⟩ = 0

so ⟨−f_y, f_x⟩ and ∇f = ⟨f_x, f_y⟩ are perpendicular.
Second, consider the case f y ( a, b) = 0. In this case, at ( a, b) the level curve has a
vertical tangent line. If ∇ f ( a, b) ‰ 0, then f x ( a, b) ‰ 0, so the gradient ∇ f ( a, b) = h f x , 0i
is horizontal.

Section A.4 of this work was adapted from Section 2.4 of CLP 3 – Multivariable Calculus
by Feldman, Rechnitzer, and Yeager under a Creative Commons Attribution-NonCommercial-
ShareAlike 4.0 International license.

A.5IJ Lagrange Multipliers: Proof of Theorem 2.5.2


First, some intuition. When we talk about derivatives on a surface, we need to think
about the derivatives in a particular direction.5 Consider in particular the surface formed
by all points ( x, y) such that f ( x, y) = z, for some function f ( x, y). The directions giving
zero rate of increase are those that keep you on a level curve. By Corollary A.4.6, those
directions are perpendicular to ∇ f ( a, b).

5 If you’re walking along hilly terrain, changing direction can cause you to change from going uphill to
downhill. Direction definitely matters!


The corresponding statement in three dimensions is that ∇ F ( a, b, c) is perpendicular to


the level surface F ( x, y, z) = F ( a, b, c) at ( a, b, c). Hence a good way to find a vector normal
to the surface F ( x, y, z) = 0 at the point ( a, b, c) is to compute the gradient ∇ F ( a, b, c).

TheoremA.5.1 (Lagrange Multipliers).

Let f ( x, y, z) and g( x, y, z) have continuous first partial derivatives in a region


of R3 that contains the surface S given by the equation g( x, y, z) = 0. Further
assume that ∇g(x, y, z) ≠ 0 on S.
If f , restricted to the surface S, has a local extreme value at the point ( a, b, c) on
S, then there is a real number λ such that

∇ f ( a, b, c) = λ∇ g( a, b, c)

that is

f x ( a, b, c) = λ gx ( a, b, c)
f y ( a, b, c) = λ gy ( a, b, c)
f z ( a, b, c) = λ gz ( a, b, c)

The number λ is called a Lagrange multiplier.

Proof. Suppose that ( a, b, c) is a point of S and that f ( x, y, z) ě f ( a, b, c) for all points


( x, y, z) on S that are close to ( a, b, c). That is ( a, b, c) is a local minimum for f on S. Of
course the argument for a local maximum is virtually identical.
Imagine that we go for a walk on S, with the time t running, say, from t = ´1 to t = +1
and that at time  t = 0 we happen to be exactly at ( a, b, c). Let’s say that our position is
x (t), y(t), z(t) at time t. Write

F ( t ) = f x ( t ), y ( t ), z ( t )

So F (t) is the value of f that we see on our walk at time t. Then for all t close to 0,
x (t), y(t), z(t) is close to x (0), y(0), z(0) = ( a, b, c) so that
 
F (0) = f x (0), y(0), z(0) = f ( a, b, c) ď f x (t), y(t), z(t) = F (t)

for all t close to zero. So F (t) has a local minimum at t = 0 and consequently F1 (0) = 0.
By the chain rule, Theorem A.4.1,
    F′(0) = d/dt f( x(t), y(t), z(t) ) |_{t=0}
          = f_x(a, b, c) x′(0) + f_y(a, b, c) y′(0) + f_z(a, b, c) z′(0) = 0        (∗)

We may rewrite this as a dot product:




0 = F1 (0) = ∇ f ( a, b, c) ¨ x1 (0) , y1 (0) , z1 (0)


ùñ ∇ f ( a, b, c) K x1 (0) , y1 (0) , z1 (0)


This is true for all paths on S that pass through ( a, b, c) at time 0. In particular it is true for
all vectors h x1 (0) , y1 (0) , z1 (0)i that are tangent to S at ( a, b, c). So ∇ f ( a, b, c) is perpendic-
ular to S at ( a, b, c).
But we already know, by the three-dimensional analogue to Corollary A.4.6, that ∇ g( a, b, c)
is also perpendicular to S at ( a, b, c). So ∇ f ( a, b, c) and ∇ g( a, b, c) have to be parallel vec-
tors. That is,
∇ f ( a, b, c) = λ∇ g( a, b, c)
for some number λ. That’s the Lagrange multiplier rule of our theorem.
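To see the theorem in action computationally, the following SymPy sketch (an illustration we have
added; the particular f and g are our own choice) solves the Lagrange system ∇f = λ∇g together with
g = 0 for f = xyz on the sphere x² + y² + z² = 3.

    import sympy as sp

    x, y, z, lam = sp.symbols('x y z lam', real=True)
    f = x*y*z
    g = x**2 + y**2 + z**2 - 3

    # grad f = lam * grad g, together with the constraint g = 0
    eqs = [sp.diff(f, v) - lam*sp.diff(g, v) for v in (x, y, z)] + [g]
    sols = sp.solve(eqs, [x, y, z, lam], dict=True)

    print(sols)   # the list includes x = y = z = 1 with lam = 1/2, where f = 1 is maximal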

Section A.5 of this work was adapted from Section 2.10 of CLP 3 – Multivariable Calculus
by Feldman, Rechnitzer, and Yeager under a Creative Commons Attribution-NonCommercial-
ShareAlike 4.0 International license.

A.6IJ A More Rigorous Area Computation


In Example 3.1.1 above we considered the area of the region {(x, y) | 0 ≤ y ≤ e^x, 0 ≤ x ≤ 1}.
We approximated that area by the area of a union of n thin rectangles. We then
claimed that upon taking the number of rectangles to infinity, the approximation of the
area became the exact area. However we did not justify the claim. The purpose of this
optional section is to make that calculation rigorous.
The broad set-up is the same. We divide the region up into n vertical strips, each of
width 1/n and we then approximate those strips by rectangles. However rather than an
uncontrolled approximation, we construct two sets of rectangles — one set always smaller
than the original area and one always larger. This then gives us lower and upper bounds
on the area of the region. Finally we make use of the squeeze theorem6 to establish the
result.

• To find our upper and lower bounds we make use of the fact that e^x is an increasing
  function. We know this because the derivative d/dx e^x = e^x is always positive. Conse-
  quently, the smallest and largest values of e^x on the interval a ≤ x ≤ b are e^a and e^b,
  respectively.

• In particular, for 0 ď x ď 1/n, e x takes values only between e0 and e1/n . As a result,
the first strip

      {(x, y) | 0 ≤ x ≤ 1/n, 0 ≤ y ≤ e^x}

– contains the rectangle of 0 ď x ď 1/n, 0 ď y ď e0 (the lighter rectangle in the


figure on the left below) and
– is contained in the rectangle 0 ď x ď 1/n, 0 ď y ď e1/n (the largest rectangle in
the figure on the left below).

6 Recall that if we have 3 functions f ( x ), g( x ), h( x ) that satisfy f ( x ) ď g( x ) ď h( x ) and we know that


  lim_{x→a} f(x) = lim_{x→a} h(x) = L exists and is finite, then the Squeeze Theorem tells us that
  lim_{x→a} g(x) = L.


Hence
      (1/n) e^0 ≤ Area{(x, y) | 0 ≤ x ≤ 1/n, 0 ≤ y ≤ e^x} ≤ (1/n) e^{1/n}

[Two figures: the graph y = e^x over the first strip, sandwiched between rectangles of heights e^0 and e^{1/n} (left); the analogous rectangles over all n strips, with widths 1/n (right).]

• Similarly, for the second, third, . . . , last strips, as in the figure on the right above,
      (1/n) e^{1/n} ≤ Area{(x, y) | 1/n ≤ x ≤ 2/n, 0 ≤ y ≤ e^x} ≤ (1/n) e^{2/n}
      (1/n) e^{2/n} ≤ Area{(x, y) | 2/n ≤ x ≤ 3/n, 0 ≤ y ≤ e^x} ≤ (1/n) e^{3/n}
            ...                      ...                              ...
      (1/n) e^{(n−1)/n} ≤ Area{(x, y) | (n−1)/n ≤ x ≤ n/n, 0 ≤ y ≤ e^x} ≤ (1/n) e^{n/n}
• Adding these n inequalities together gives
      (1/n) [ 1 + e^{1/n} + ··· + e^{(n−1)/n} ]
            ≤ Area{(x, y) | 0 ≤ x ≤ 1, 0 ≤ y ≤ e^x}
            ≤ (1/n) [ e^{1/n} + e^{2/n} + ··· + e^{n/n} ]
n
• We can then recycle equation (3.1.3) with r = e^{1/n}, so that r^n = (e^{1/n})^n = e. Thus we
  have

      (1/n) (e − 1)/(e^{1/n} − 1) ≤ Area{(x, y) | 0 ≤ x ≤ 1, 0 ≤ y ≤ e^x} ≤ (1/n) e^{1/n} (e − 1)/(e^{1/n} − 1)

  where we have used the fact that the upper bound is a simple multiple of the lower
  bound:

      e^{1/n} + e^{2/n} + ··· + e^{n/n} = e^{1/n} [ 1 + e^{1/n} + ··· + e^{(n−1)/n} ].

• We now apply the Squeeze Theorem to the above inequalities. In particular, the
limits of the lower and upper bounds are
      lim_{n→∞} (1/n) (e − 1)/(e^{1/n} − 1) = (e − 1) lim_{X=1/n→0} X/(e^X − 1) = e − 1


(by l’Hôpital’s rule) and

      lim_{n→∞} (1/n) e^{1/n} (e − 1)/(e^{1/n} − 1) = (e − 1) lim_{X=1/n→0} e^X · X/(e^X − 1)
                                                    = (e − 1) · lim_{X→0} e^X · lim_{X→0} X/(e^X − 1)
                                                    = (e − 1) · 1 · 1

Thus, since the exact area is trapped between the lower and upper bounds, the
squeeze theorem then implies that

Exact area = e − 1.
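The lower and upper rectangle sums above are easy to evaluate numerically. This short Python sketch
(added here for illustration, not part of the CLP source) shows them pinching e − 1 as n grows.

    import math

    def bounds(n):
        # lower sum uses e^{i/n} for i = 0,...,n-1; upper sum uses i = 1,...,n
        lower = sum(math.exp(i / n) for i in range(0, n)) / n
        upper = sum(math.exp(i / n) for i in range(1, n + 1)) / n
        return lower, upper

    for n in (10, 100, 1000):
        lo, hi = bounds(n)
        print(n, lo, hi)          # both columns approach e - 1 = 1.71828...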

Section A.6 of this work was adapted from Section 1.1.1 of CLP 2 – Integral Calculus by
Feldman, Rechnitzer, and Yeager under a Creative Commons Attribution-NonCommercial-
ShareAlike 4.0 International license.

A.7IJ Careful Definition of the Integral


In this optional section we give a more mathematically rigorous definition of the definite
integral ∫_a^b f(x) dx. Some textbooks use a sneakier, but equivalent, definition. The integral
will be defined as the limit of a family of approximations to the area between the graph of
y = f ( x ) and the x–axis, with x running from a to b. We will then show conditions under
which this limit is guaranteed to exist. We should state up front that these conditions are
more restrictive than is strictly necessary — this is done so as to keep the proof accessible.
The family of approximations needed is slightly more general than that used to define
Riemann sums in the previous sections, though it is quite similar. The main difference is
that we do not require that all the subintervals have the same size.
• We start by selecting a positive integer n. As was the case previously, this will be the
number of subintervals used in the approximation and eventually we will take the
limit as n Ñ 8.

• Now subdivide the interval from a to b into n subintervals by selecting n + 1 values


of x that obey

a = x0 ă x1 ă x2 ă ¨ ¨ ¨ ă xn´1 ă xn = b.

The subinterval number i runs from xi´1 to xi . This formulation does not require
the subintervals to have the same size. However we will eventually require that the
widths of the subintervals shrink towards zero as n Ñ 8.

• Then for each subinterval we select a value of x in that interval. That is, for i =
1, 2, . . . , n, choose xi˚ satisfying xi´1 ď xi˚ ď xi . We will use these values of x to help
approximate f ( x ) on each subinterval.

• The area between the graph of y = f ( x ) and the x–axis, with x running from xi´1


[Figure: the graph of y = f(x) over a partition a = x_0 < x_1 < x_2 < ··· < x_{n−1} < x_n = b of the interval.]

to x_i, i.e. the contribution, ∫_{x_{i−1}}^{x_i} f(x) dx, from interval number i to the integral, is
approximated by the area of a rectangle. The rectangle has width x_i − x_{i−1} and height
f(x_i*).

• Thus the approximation to the integral, using all n subintervals, is


    ∫_a^b f(x) dx ≈ f(x_1*)[x_1 − x_0] + f(x_2*)[x_2 − x_1] + ··· + f(x_n*)[x_n − x_{n−1}]

• Of course every different choice of n and x1 , x2 , . . . , xn´1 and x1˚ , x2˚ , . . . , xn˚ gives a
different approximation. So to simplify the discussion that follows, let us denote a
particular choice of all these numbers by P:
P = (n, x1 , x2 , ¨ ¨ ¨ , xn´1 , x1˚ , x2˚ , ¨ ¨ ¨ , xn˚ ) .
Similarly let us denote the resulting approximation by I(P):
I(P) = f ( x1˚ )[ x1 ´ x0 ] + f ( x2˚ )[ x2 ´ x1 ] + ¨ ¨ ¨ + f ( xn˚ )[ xn ´ xn´1 ]

• We claim that, for any reasonable7 function f(x), if you take any reasonable8 se-
  quence of these approximations you always get exactly the same limiting value.
  We define ∫_a^b f(x) dx to be this limiting value.

7 We’ll be more precise about what “reasonable” means shortly.


8 Again, we’ll explain this “reasonable” shortly


• Let’s be more precise. We can take the limit of these approximations in two equiv-
alent ways. Above we did this by taking the number of subintervals n to infinity.
When we did this, the width of all the subintervals went to zero. With the formu-
lation we are now using, simply taking the number of subintervals to be very large
does not imply that they will all shrink in size. We could have one very large subin-
terval and a large number of tiny ones. Thus we take the limit we need by taking the
width of the subintervals to zero. So for any choice P, we define
      M(P) = max{ x_1 − x_0 , x_2 − x_1 , ··· , x_n − x_{n−1} }

that is the maximum width of the subintervals used in the approximation deter-
mined by P. By forcing the maximum width to go to zero, the widths of all the
subintervals go to zero.
• We then define the definite integral as the limit
      ∫_a^b f(x) dx = lim_{M(P)→0} I(P).

Of course, one is now left with the question of determining when the above limit exists. A
proof of the very general conditions which guarantee existence of this limit is beyond the
scope of this course, so we instead give a weaker result (with stronger conditions) which
is far easier to prove.
For the rest of this section, assume
• that f ( x ) is continuous for a ď x ď b,
• that f ( x ) is differentiable for a ă x ă b, and
• that f 1 ( x ) is bounded — ie | f 1 ( x )| ď F for some constant F.
We will now show that, under these hypotheses, as M(P) approaches zero, I(P) always
approaches the area, A, between the graph of y = f ( x ) and the x–axis, with x running
from a to b.
These assumptions are chosen to make the argument particularly transparent. With a
little more work one can weaken the hypotheses considerably. We are cheating a little by
implicitly assuming that the area A exists. In fact, one can adjust the argument below to
remove this implicit assumption.
• Consider A j , the part of the area coming from x j´1 ď x ď x j .


We have approximated this area by f(x_j*)[x_j − x_{j−1}] (see figure left).

• Let f(\overline{x}_j) and f(\underline{x}_j) be the largest and smallest values9 of f(x) for x_{j−1} ≤ x ≤ x_j.
  Then the true area is bounded by

      f(\underline{x}_j)[x_j − x_{j−1}] ≤ A_j ≤ f(\overline{x}_j)[x_j − x_{j−1}].

  (see figure right).

• Now since f(\underline{x}_j) ≤ f(x_j*) ≤ f(\overline{x}_j), we also know that

      f(\underline{x}_j)[x_j − x_{j−1}] ≤ f(x_j*)[x_j − x_{j−1}] ≤ f(\overline{x}_j)[x_j − x_{j−1}].

• So both the true area, A_j, and our approximation of that area f(x_j*)[x_j − x_{j−1}] have
  to lie between f(\underline{x}_j)[x_j − x_{j−1}] and f(\overline{x}_j)[x_j − x_{j−1}]. Combining these bounds we
  have that the difference between the true area and our approximation of that area is
  bounded by

      | A_j − f(x_j*)[x_j − x_{j−1}] | ≤ [ f(\overline{x}_j) − f(\underline{x}_j) ] · [x_j − x_{j−1}].

  (To see this think about the smallest the true area can be and the largest our approx-
  imation can be and vice versa.)

• Now since our function, f(x) is differentiable we can apply one of the main theo-
  rems we learned in first-semester calculus — the Mean Value Theorem10. The MVT
  implies that there exists a c between \underline{x}_j and \overline{x}_j such that

      f(\overline{x}_j) − f(\underline{x}_j) = f′(c) · [ \overline{x}_j − \underline{x}_j ]

• By the assumption that |f′(x)| ≤ F for all x and the fact that \underline{x}_j and \overline{x}_j must both be
  between x_{j−1} and x_j

      | f(\overline{x}_j) − f(\underline{x}_j) | ≤ F · | \overline{x}_j − \underline{x}_j | ≤ F · [x_j − x_{j−1}]

  Hence the error in this part of our approximation obeys

      | A_j − f(x_j*)[x_j − x_{j−1}] | ≤ F · [x_j − x_{j−1}]².

9  Here we are using the Extreme Value Theorem — its proof is beyond the scope of this course. The
   theorem says that any continuous function on a closed interval must attain a minimum and maximum
   at least once. In this situation this implies that for any continuous function f(x), there are
   x_{j−1} ≤ \underline{x}_j, \overline{x}_j ≤ x_j such that f(\underline{x}_j) ≤ f(x) ≤ f(\overline{x}_j) for all x_{j−1} ≤ x ≤ x_j.
10 Recall that the Mean Value Theorem states that for a function continuous on [a, b] and differentiable on
   (a, b), there exists a number c between a and b so that

       f′(c) = [ f(b) − f(a) ] / (b − a).


• That was just the error in approximating A j . Now we bound the total error by com-
bining the errors from approximating on all the subintervals. This gives
ˇ ˇ
ˇ n n
ˇÿ ÿ ˇ
|A ´ I(P)| = ˇ A j ´ f ( x j )[ x j ´ x j´1 ]ˇˇ
˚
ˇ
ˇ
ˇ j =1 j =1 ˇ
ˇ ˇ
ˇ n  ˇˇ
ˇÿ
= ˇˇ A j ´ f ( x j )[ x j ´ x j´1 ] ˇˇ
˚
triangle inequality
ˇ j =1 ˇ
ÿ n ˇ ˇ
ď ˇ A j ´ f ( x˚j )[ x j ´ x j´1 ]ˇ
ˇ ˇ
j =1
n
F ¨ [ x j ´ x j´1 ]2
ÿ
ď from above
j =1

Now do something a little sneaky. Replace one of these factors of [ x j ´ x j´1 ] (which
is just the width of the jth subinterval) by the maximum width of the subintervals:
n
ÿ
ď F ¨ M (P) ¨ [ x j ´ x j´1 ] F and M (P) are constant
j =1
n
ÿ
ď F ¨ M (P) ¨ [ x j ´ x j´1 ] sum is total width
j =1
= F ¨ M (P) ¨ ( b ´ a ).

• Since a, b and F are fixed, this tends to zero as the maximum rectangle width M(P)
tends to zero.
Thus, we have proven
TheoremA.7.1.

Assume that f ( x ) is continuous for a ď x ď b, and is differentiable for all a ă x ă


b with | f 1 ( x )| ď F, for some constant F. Then, as the maximum rectangle width
M (P) tends to zero, I(P) always converges to A, the area between the graph of
y = f ( x ) and the x–axis, with x running from a to b.
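Theorem A.7.1 can be illustrated numerically. The Python sketch below (our own example; the
integrand e^x on [0, 1] is simply a convenient choice) builds a random, non-uniform partition, picks
arbitrary sample points x_i*, and shows that I(P) is close to the true area once the mesh M(P) is small.

    import math
    import random

    def riemann_sum(f, a, b, n):
        cuts = sorted(random.uniform(a, b) for _ in range(n - 1))
        xs = [a] + cuts + [b]                      # a = x_0 < x_1 < ... < x_n = b
        total, mesh = 0.0, 0.0
        for left, right in zip(xs, xs[1:]):
            x_star = random.uniform(left, right)   # any sample point in the subinterval
            total += f(x_star) * (right - left)
            mesh = max(mesh, right - left)
        return total, mesh

    random.seed(1)
    I_P, M_P = riemann_sum(math.exp, 0.0, 1.0, 100000)
    print(I_P, math.e - 1, M_P)    # I(P) is close to e - 1 and the mesh M(P) is tiny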

Section A.7 of this work was adapted from Section 1.1.6 of CLP 2 – Integral Calculus by
Feldman, Rechnitzer, and Yeager under a Creative Commons Attribution-NonCommercial-
ShareAlike 4.0 International license.

A.8IJ Integrating sec x, csc x, sec3 x and csc3 x


Rather than trying to construct a coherent “method” for the integration of ∫ tan^m x sec^n x dx
when n is odd and m is even, we instead give some examples to give the idea of what to
expect.


Example A.8.1 (∫ sec x dx — by trickery)

Solution. There is a very sneaky trick to compute this integral.


• The standard trick for this integral is to multiply the integrand by 1 = (sec x + tan x)/(sec x + tan x):

      sec x = sec x · (sec x + tan x)/(sec x + tan x) = (sec²x + sec x tan x)/(sec x + tan x)

• Notice now that the numerator of this expression is exactly the derivative its denom-
inator. Hence we can substitute u = sec x + tan x and du = (sec x tan x + sec2 x ) dx.

• Hence
      ∫ sec x dx = ∫ sec x · (sec x + tan x)/(sec x + tan x) dx = ∫ (sec²x + sec x tan x)/(sec x + tan x) dx
                 = ∫ (1/u) du
                 = ln |u| + C
                 = ln |sec x + tan x| + C

• The above trick appears both totally unguessable and very hard to remember. For-
tunately, there is a simple way11 to recover the trick. Here it is.

– The goal is to guess a function whose derivative is sec x.


– So get out a table of derivatives and look for functions whose derivatives at
least contain sec x. There are two:
d
tan x = sec2 x
dx
d
sec x = tan x sec x
dx
– Notice that if we add these together we get
          d/dx (sec x + tan x) = (sec x + tan x) sec x    ⟹    [ d/dx (sec x + tan x) ] / (sec x + tan x) = sec x
– We’ve done it! The right hand side is sec x and the left hand side is the deriva-
tive of ln | sec x + tan x|.

Example A.8.1
There is a second method for integrating ∫ sec x dx, that is more tedious, but more straight-
forward. In particular, it does not involve a memorized trick. The integral ∫ sec x dx is
converted into the integral ∫ du/(1 − u²) by using the substitution u = sin x, du = cos x dx. The
integral ∫ du/(1 − u²) is then integrated by the method of partial fractions, which we shall learn

11 We thank Serban Raianu for bringing this to our attention.


about in Section 3.8 “Partial Fractions”. The details are in Example 3.8.4 in those notes.
This second method gives the answer
      ∫ sec x dx = (1/2) ln [ (1 + sin x)/(1 − sin x) ] + C

which appears to be different than the answer in Example A.8.1. But they really are the
same since

      (1 + sin x)/(1 − sin x) = (1 + sin x)²/(1 − sin²x) = (1 + sin x)²/cos²x

  ⟹  (1/2) ln [ (1 + sin x)/(1 − sin x) ] = (1/2) ln [ (1 + sin x)²/cos²x ] = ln |(sin x + 1)/cos x| = ln |tan x + sec x|
Oof!
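Both antiderivatives can also be double-checked by machine. The SymPy sketch below (an added aside,
assuming SymPy is available) differentiates ln(sec x + tan x) and compares the result with sec x.

    import sympy as sp

    x = sp.symbols('x')
    antideriv = sp.log(sp.sec(x) + sp.tan(x))     # valid where sec x + tan x > 0

    difference = sp.diff(antideriv, x) - sp.sec(x)
    print(sp.simplify(difference))                # expected to reduce to 0
    print(difference.subs(x, 0.7).evalf())        # numeric spot check, essentially 0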

Example A.8.2 (∫ csc x dx — by the u = tan(x/2) substitution)

Solution. The integral ∫ csc x dx may also be evaluated by both the methods above. That
is either

• by multiplying the integrand by a cleverly chosen 1 = (cot x − csc x)/(cot x − csc x) and then substitut-
  ing u = cot x − csc x, du = (− csc²x + csc x cot x) dx, or

• by substituting u = cos x, du = − sin x dx to give ∫ csc x dx = − ∫ du/(1 − u²) and then
  using the method of partial fractions.


These two methods give the answers
      ∫ csc x dx = ln |cot x − csc x| + C = −(1/2) ln [ (1 + cos x)/(1 − cos x) ] + C        (A.8.1)
In this example, we shall evaluate ∫ csc x dx by yet a third method, which can be used to
integrate rational functions12 of sin x and cos x.

• This method uses the substitution


x 2
x = 2 arctan u i.e. u = tan and dx = du
2 1 + u2
— a half-angle substitution.

• To express sin x and cos x in terms of u, we first use the double angle trig identities
  (Equations 3.6.2 and 3.6.3 with x ↦ x/2) to express sin x and cos x in terms of sin(x/2)
  and cos(x/2):

      sin x = 2 sin(x/2) cos(x/2)
      cos x = cos²(x/2) − sin²(x/2)

12 A rational function of sin x and cos x is a ratio with both the numerator and denominator being finite
sums of terms of the form a sinm x cosn x, where a is a constant and m and n are positive integers.


• We then use the right triangle with angle x/2, adjacent side 1, opposite side u, and
  hypotenuse √(1 + u²) to express sin(x/2) and cos(x/2) in terms of u. The bottom and right hand
  sides of the triangle have been chosen so that tan(x/2) = u. This tells us that

      sin(x/2) = u/√(1 + u²)        cos(x/2) = 1/√(1 + u²)

• This in turn implies that:

      sin x = 2 sin(x/2) cos(x/2) = 2 · u/√(1 + u²) · 1/√(1 + u²) = 2u/(1 + u²)
      cos x = cos²(x/2) − sin²(x/2) = 1/(1 + u²) − u²/(1 + u²) = (1 − u²)/(1 + u²)
Oof!

• Let’s use this substitution to evaluate ∫ csc x dx.

      ∫ csc x dx = ∫ 1/sin x dx = ∫ (1 + u²)/(2u) · 2/(1 + u²) du = ∫ (1/u) du = ln |u| + C
                 = ln |tan(x/2)| + C
To see that this answer is really the same as that in (A.8.1), note that

cos x ´ 1 ´2 sin2 ( x/2) x


cot x ´ csc x = = = ´ tan
sin x 2 sin( x/2) cos( x/2) 2

Example A.8.2
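The half-angle identities used in this example are easy to confirm with SymPy; the added sketch below
checks them symbolically.

    import sympy as sp

    x = sp.symbols('x')
    u = sp.tan(x/2)

    print(sp.simplify(sp.sin(x) - 2*u/(1 + u**2)))           # expected 0
    print(sp.simplify(sp.cos(x) - (1 - u**2)/(1 + u**2)))    # expected 0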


Example A.8.3 (∫ sec³x dx — by trickery)

Solution. The standard trick used to evaluate ∫ sec³x dx is integration by parts.

• Set u = sec x, dv = sec2 xdx. Hence du = sec x tan xdx, v = tan x and
      ∫ sec³x dx = ∫ sec x · sec²x dx                          (u · dv)
                 = sec x tan x − ∫ tan x · sec x tan x dx      (u·v − ∫ v du)


• Since tan2 x + 1 = sec2 x, we have tan2 x = sec2 x ´ 1 and


      ∫ sec³x dx = sec x tan x − ∫ [ sec³x − sec x ] dx

                 = sec x tan x + ln |sec x + tan x| + C − ∫ sec³x dx

  where we used ∫ sec x dx = ln |sec x + tan x| + C, which we saw in Example A.8.1.

• Now moving the ∫ sec³x dx from the right hand side to the left hand side

      2 ∫ sec³x dx = sec x tan x + ln |sec x + tan x| + C      and so
        ∫ sec³x dx = (1/2) sec x tan x + (1/2) ln |sec x + tan x| + C
for a new arbitrary constant C (which is just one half the old one).
Example A.8.3

The integral ∫ sec³x dx can also be evaluated by two other methods.

• Substitute u = sin x, du = cos x dx to convert ∫ sec³x dx into ∫ du/[1 − u²]² and evaluate
  the latter using the method of partial fractions. This is done in Example 3.8.5 in
  Section 3.8.

• Use the u = tan(x/2) substitution. We use this method to evaluate ∫ csc³x dx in Example
  A.8.4, below.

Example A.8.4 (∫ csc³x dx — by the u = tan(x/2) substitution)

Solution. Let us use the half-angle substitution that we introduced in Example A.8.2.

• In this method we set


      u = tan(x/2)      dx = 2/(1 + u²) du      sin x = 2u/(1 + u²)      cos x = (1 − u²)/(1 + u²)
• The integral then becomes
1
ż ż
3
csc xdx = dx
sin3 x
ż  3
1 + u2  2
= du
2u 1 + u2
1 1 + 2u2 + u4
ż
= du
4 u3
1 ! u´2 u2 )
= + 2 ln |u| + +C
4 ´2 2
1! 2 x
ˇ x ˇˇ 2 x
)
= ´ cot + 4 ln ˇ tan ˇ + tan +C
ˇ
8 2 2 2
Oof!


• This is a perfectly acceptable answer. But if you don’t like the 2x ’s, they may be
eliminated by using
x
2 2 x sin2 2x cos2 2x
tan ´ cot = ´
2 2 cos2 2x sin2 2x
sin4 x
2 ´ cos 2
4 x
=
sin2 2x cos2 2x
 
sin2 2x ´ cos2 2x sin2 2x + cos2 x
2
=
sin2 2x cos2 2x
sin2 2x ´ cos2 2x x x
= since sin2 + cos2 = 1
sin2 2x cos2 2x 2 2
´ cos x
= 1 2
by (3.6.2) and (3.6.3)
4 sin x
and
x sin 2x sin2 2x
tan = =
2 cos 2x sin 2x cos 2x
1
2 [1 ´ cos x ]
= 1
by (3.6.2) and (3.6.3)
2 sin x
So we may also write
      ∫ csc³x dx = −(1/2) cot x csc x + (1/2) ln |csc x − cot x| + C

Example A.8.4

Section A.8 of this work was adapted from Section 1.8.3 of CLP 2 – Integral Calculus by
Feldman, Rechnitzer, and Yeager under a Creative Commons Attribution-NonCommercial-
ShareAlike 4.0 International license.

A.9IJ Partial Fraction Decompositions


A.9.1 §§ Decomposition involving Irreducible Quadratic Factors
In the following it is assumed that
• N ( x ) and D ( x ) are polynomials with the degree of N ( x ) strictly smaller than the
degree of D ( x ).
• K is a constant.
• a1 , a2 , ¨ ¨ ¨ , a j are all different numbers.
• m1 , m2 , ¨ ¨ ¨ , m j , and n1 , n2 , ¨ ¨ ¨ , nk are all strictly positive integers.

• x2 + b1 x + c1 , x2 + b2 x + c2 , ¨ ¨ ¨ , x2 + bk x + ck are all different.


§§§ Simple Linear and Quadratic Factor Case


If D ( x ) = K ( x ´ a1 ) ¨ ¨ ¨ ( x ´ a j )( x2 + b1 x + c1 ) ¨ ¨ ¨ ( x2 + bk x + ck ) then

Equation A.9.1.

      N(x)/D(x) = A1/(x − a1) + ··· + Aj/(x − aj)
                + (B1 x + C1)/(x² + b1 x + c1) + ··· + (Bk x + Ck)/(x² + bk x + ck)

Note that the numerator of each term on the right hand side has degree one smaller
than the degree of the denominator.
The quadratic terms (Bx + C)/(x² + bx + c) are integrated in a two-step process that is best illustrated
with a simple example (see also Example A.9.5 below).
Example A.9.2 (∫ (2x + 7)/(x² + 4x + 13) dx)

Solution.

• Start by completing the square in the denominator:


      x² + 4x + 13 = (x + 2)² + 9    and thus

      (2x + 7)/(x² + 4x + 13) = (2x + 7)/[ (x + 2)² + 3² ]

• Now set y = (x + 2)/3, dy = (1/3) dx, or equivalently x = 3y − 2, dx = 3 dy:


      ∫ (2x + 7)/(x² + 4x + 13) dx = ∫ (2x + 7)/[ (x + 2)² + 3² ] dx
                                   = ∫ (6y − 4 + 7)/(3²y² + 3²) · 3 dy
                                   = ∫ (6y + 3)/[ 3(y² + 1) ] dy
                                   = ∫ (2y + 1)/(y² + 1) dy
Notice that we chose 3 in y = ( x + 2)/3 precisely to transform the denominator into
the form y2 + 1.
• Now almost always the numerator will be a linear polynomial of y and we decom-
pose as follows
      ∫ (2x + 7)/(x² + 4x + 13) dx = ∫ (2y + 1)/(y² + 1) dy
                                   = ∫ 2y/(y² + 1) dy + ∫ 1/(y² + 1) dy
                                   = ln |y² + 1| + arctan y + C
                                   = ln [ ((x + 2)/3)² + 1 ] + arctan( (x + 2)/3 ) + C


Example A.9.2
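If SymPy is available, the computation in Example A.9.2 can be cross-checked. The added sketch below
integrates the same rational function and verifies the antiderivative by differentiating it back.

    import sympy as sp

    x = sp.symbols('x')
    integrand = (2*x + 7)/(x**2 + 4*x + 13)

    antideriv = sp.integrate(integrand, x)
    print(antideriv)                                         # a log term plus an arctan term
    print(sp.simplify(sp.diff(antideriv, x) - integrand))    # prints 0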

§§§ General Linear and Quadratic Factor Case

If D ( x ) = K ( x ´ a1 )m1 ¨ ¨ ¨ ( x ´ a j )m j ( x2 + b1 x + c1 )n1 ¨ ¨ ¨ ( x2 + bk x + ck )nk then

Equation A.9.3.

 N(x)/D(x) = A_{1,1}/(x − a1) + A_{1,2}/(x − a1)² + ··· + A_{1,m1}/(x − a1)^{m1} + ···
           + A_{j,1}/(x − aj) + A_{j,2}/(x − aj)² + ··· + A_{j,mj}/(x − aj)^{mj}
           + (B_{1,1} x + C_{1,1})/(x² + b1 x + c1) + (B_{1,2} x + C_{1,2})/(x² + b1 x + c1)² + ··· + (B_{1,n1} x + C_{1,n1})/(x² + b1 x + c1)^{n1} + ···
           + (B_{k,1} x + C_{k,1})/(x² + bk x + ck) + (B_{k,2} x + C_{k,2})/(x² + bk x + ck)² + ··· + (B_{k,nk} x + C_{k,nk})/(x² + bk x + ck)^{nk}
x + bk x + c k ( x + bk x + c k ) 2 ( x 2 + bk x + c k ) n k

We have already seen how to integrate the simple and general linear terms, and the
simple quadratic terms. Integrating general quadratic terms is not so straightforward.
Example A.9.4 (∫ dx/(x² + 1)ⁿ)

This example is not so easy, so it should definitely be considered optional.


Solution. In what follows write
dx
ż
In = .
( x2 + 1) n

• When n = 1 we know that


dx
ż
= arctan x + C
x2+1

• Now assume that n ą 1, then

1 ( x2 + 1 ´ x2 )
ż ż
2
dx = dx sneaky
( x + 1) n ( x 2 + 1) n
1 x2
ż ż
= dx ´ dx
( x2 + 1)n´1 ( x 2 + 1) n
x2
ż
= In´1 ´ dx
( x 2 + 1) n

So we can write In in terms of In´1 and this second integral.


• We can use integration by parts to compute the second integral:

x2 x 2x
ż ż
dx = ¨ 2 dx sneaky
( x 2 + 1) n 2 ( x + 1) n

We set u = x/2 and dv = ( x22x


+1) n
1
dx, which gives du = 21 dx and v = ´ n´1 ¨ ( x2 +11)n´1 .
You can check v by differentiating. Integration by parts gives

x 2x x dx
ż ż
¨ 2 n
dx = ´ 2
+
2 ( x + 1) 2(n ´ 1)( x + 1) n´1 2(n ´ 1)( x2 + 1)n´1
x 1
=´ 2 n´1
+ ¨ In´1
2(n ´ 1)( x + 1) 2( n ´ 1)

• Now put everything together:

1
ż
In = 2
dx
( x + 1) n
x 1
= In´1 + 2 n´1
´ ¨ In´1
2(n ´ 1)( x + 1) 2( n ´ 1)
2n ´ 3 x
= In´1 +
2( n ´ 1) 2(n ´ 1)( x2 + 1)n´1

• We can then use this recurrence to write down In for the first few n:

1 x
I2 = I1 + 2
+C
2 2( x + 1)
1 x
= arctan x +
2 2( x 2 + 1)
3 x
I3 = I2 +
4 4( x 2 + 1)2
3 3x x
= arctan x + + +C
8 8( x 2 + 1) 4( x 2 + 1)2
5 x
I4 = I3 +
6 6( x + 1)3
2

5 5x 5x x
= arctan x + 2
+ 2 2
+ +C
16 16( x + 1) 24( x + 1) 6( x + 1)3
2

and so forth. You can see why partial fraction questions involving denominators
with repeated quadratic factors do not often appear on exams.

Example A.9.4
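As a spot check of the reduction formula (an addition of ours), the SymPy snippet below differentiates
the expression obtained for I2 and recovers the integrand 1/(x² + 1)².

    import sympy as sp

    x = sp.symbols('x')
    I2 = sp.atan(x)/2 + x/(2*(x**2 + 1))      # the n = 2 case computed above

    print(sp.simplify(sp.diff(I2, x) - 1/(x**2 + 1)**2))     # prints 0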

Below are some examples using the general method.


Here is a very solid example. It is quite long and the steps are involved. However
please persist. No single step is too difficult.


Example A.9.5 (∫ (x⁴ + 5x³ + 16x² + 26x + 22)/(x³ + 3x² + 7x + 5) dx)

In this example, we integrate N(x)/D(x) = (x⁴ + 5x³ + 16x² + 26x + 22)/(x³ + 3x² + 7x + 5).

Solution.

• Step 1. Again, we start by comparing the degrees of the numerator and denominator.
In this example, the numerator, x4 + 5x3 + 16x2 + 26x + 22, has degree four and the
denominator, x3 + 3x2 + 7x + 5, has degree three. As 4 ě 3, we must execute the
N (x)
first step, which is to write D( x) in the form

N (x) R( x )
= P( x ) +
D(x) D(x)

with P( x ) being a polynomial and R( x ) being a polynomial of degree strictly smaller


than the degree of D(x). This step is accomplished by long division. We’ll go through the
whole process in detail again.
Actually — before you read on ahead, please have a go at the long division. It is
good practice.

– We start by observing that to get from the highest degree term in the denomi-
nator (x3 ) to the highest degree term in the numerator (x4 ), we have to multiply
by x. So we write,
x
x3 + 3x2 + 7x + 5 x4+ 5x3+16x2+26x+ 22

– Now we subtract x times the denominator x3 + 3x2 + 7x + 5, which is x4 +


3x3 + 7x2 + 5x, from the numerator.
x
x3 + 3x2 + 7x + 5 x4+ 5x3+16x2+26x+ 22
x4+ 3x3+ 7x2+ 5x x(x3 + 3x2 + 7x + 5)
2x3+ 9x2+21x+ 22

– The remainder was 2x3 + 9x2 + 21x + 22. To get from the highest degree term
in the denominator (x3 ) to the highest degree term in the remainder (2x3 ), we
have to multiply by 2. So we write,

x+ 2
3 2
x + 3x + 7x + 5 x4+ 5x3+16x2+26x+ 22
x4+ 3x3+ 7x2+ 5x
2x3+ 9x2+21x+ 22

– Now we subtract 2 times the denominator x3 + 3x2 + 7x + 5, which is 2x3 +


6x2 + 14x + 10, from the remainder.


x+ 2
x3 + 3x2 + 7x + 5 x4+ 5x3+16x2+26x+ 22
x4+ 3x3+ 7x2+ 5x x(x3 + 3x2 + 7x + 5)
2x3+ 9x2+21x+ 22
2x3+ 6x2+14x+ 10 2(x3 + 3x2 + 7x + 5)
3x2+ 7x+ 12
– This leaves a remainder of 3x2 + 7x + 12. Because the remainder has degree 2,
which is smaller than the degree of the denominator, which is 3, we stop.
– In this example, when we subtracted x ( x3 + 3x2 + 7x + 5) and 2( x3 + 3x2 +
7x + 5) from x4 + 5x3 + 16x2 + 26x + 22 we ended up with 3x2 + 7x + 12. That
is,

x4 + 5x3 + 16x2 + 26x + 22 ´ x ( x3 + 3x2 + 7x + 5) ´ 2( x3 + 3x2 + 7x + 5)


= 3x2 + 7x + 12

or, collecting the two terms proportional to ( x3 + 3x2 + 7x + 5)

x4 + 5x3 + 16x2 + 26x + 22 ´ ( x + 2)( x3 + 3x2 + 7x + 5) = 3x2 + 7x + 12

Moving the ( x + 2)( x3 + 3x2 + 7x + 5) to the right hand side and dividing the
whole equation by x3 + 3x2 + 7x + 5 gives

x4 + 5x3 + 16x2 + 26x + 22 3x2 + 7x + 12


= x + 2 +
x3 + 3x2 + 7x + 5 x3 + 3x2 + 7x + 5
N (x) R( x )
This is of the form D( x) = P( x ) + D( x) , with the degree of R( x ) strictly smaller than
the degree of D ( x ), which is what we wanted. Observe, once again, that R( x ) is
the final remainder of the long division procedure and P( x ) is at the top of the long
division computation.

x+ 2 P (x)
3 2
x + 3x + 7x + 5 x4+ 5x3+16x2+26x+ 22
x4+ 3x3+ 7x2+ 5x
2x3+ 9x2+21x+ 22
2x3+ 6x2+14x+ 10
3x2+ 7x+ 12 R(x)

• Step 2. The second step is to factor the denominator D ( x ) = x3 + 3x2 + 7x + 5. In


the “real world” factorisation of polynomials is often very hard. Fortunately13 , this
is not the “real world” and there is a trick available to help us find this factorisation.
The reader should take some time to look at Appendix B.16 before proceeding.

– The trick exploits the fact that most polynomials that appear in homework as-
signments and on tests have integer coefficients and some integer roots. Any

13 One does not typically think of mathematics assignments or exams as nice kind places. . . The polyno-
mials that appear in the “real world” are not so forgiving. Nature, red in tooth and claw — to quote
Tennyson inappropriately (especially when this author doesn’t know any other words from the poem).


integer root of a polynomial that has integer coefficients, like D ( x ) = x3 + 3x2 +


7x + 5, must divide the constant term of the polynomial exactly. Why this is true
is explained14 in Appendix B.16.
– So any integer root of x3 + 3x2 + 7x + 5 must divide 5 exactly. Thus the only
integers which can be roots of D ( x ) are ˘1 and ˘5. Of course, not all of these
give roots of the polynomial — in fact there is no guarantee that any of them
will be. We have to test each one.
– To test if +1 is a root, we sub x = 1 into D ( x ):

D (1) = 13 + 3(1)2 + 7(1) + 5 = 16

As D (1) ‰ 0, 1 is not a root of D ( x ).


– To test if ´1 is a root, we sub it into D ( x ):

D (´1) = (´1)3 + 3(´1)2 + 7(´1) + 5 = ´1 + 3 ´ 7 + 5 = 0



As D (´1) = 0, ´1 is a root of D ( x ). As ´1 is a root of D ( x ), x ´ (´1) =
( x + 1) must factor D ( x ) exactly. We can factor the ( x + 1) out of D ( x ) =
x3 + 3x2 + 7x + 5 by long division once again.
– Dividing D ( x ) by ( x + 1) gives:

x2+ 2x + 5
x + 1 x3+ 3x2+ 7x+ 5
x3+ x2 x2 (x + 1)
2x2+ 7x+ 5
2x2+ 2x 2x(x + 1)
5x+ 5
5x+ 5 5(x + 1)
0

This time, when we subtracted x2 ( x + 1) and 2x ( x + 1) and 5( x + 1) from x3 +


3x2 + 7x + 5 we ended up with 0 — as we knew would happen, because we
knew that x + 1 divides x3 + 3x2 + 7x + 5 exactly. Hence

x3 + 3x2 + 7x + 5 ´ x2 ( x + 1) ´ 2x ( x + 1) ´ 5( x + 1) = 0

or

x3 + 3x2 + 7x + 5 = x2 ( x + 1) + 2x ( x + 1) + 5( x + 1)

or

x3 + 3x2 + 7x + 5 = ( x2 + 2x + 5)( x + 1)

14 Appendix B.16 contains several simple tricks for factoring polynomials. We recommend that you have
a look at them.


– It isn’t quite time to stop yet; we should attempt to factor the quadratic factor,
x2 + 2x + 5. We can use the quadratic formula15 to find the roots of x2 + 2x + 5:
? ? ?
´b ˘ b2 ´ 4ac ´2 ˘ 4 ´ 20 ´2 ˘ ´16
= =
2a 2 2
Since this expression contains the square root of a negative number the equation
x2 + 2x + 5 = 0 has no real solutions; without the use of complex numbers,
x2 + 2x + 5 cannot be factored.

We have reached the end of step 2. At this point we have


x4 + 5x3 + 16x2 + 26x + 22 3x2 + 7x + 12
= x+2+
x3 + 3x2 + 7x + 5 ( x + 1)( x2 + 2x + 5)
3x2 +7x +12
• Step 3. The third step is to write ( x +1)( x2 +2x +5)
in the form

3x2 + 7x + 12 A Bx + C
2
= + 2
( x + 1)( x + 2x + 5) x + 1 x + 2x + 5
for some constants A, B and C.
Note that the numerator, Bx + C of the second term on the right hand side is not just
a constant. It is of degree one, which is exactly one smaller than the degree of the
denominator, x2 + 2x + 5. More generally, if the denominator consists of n different
linear factors and m different quadratic factors, then we decompose the ratio as
A1 A2 An
rational function = + +¨¨¨+
linear factor 1 linear factor 2 linear factor n
B1 x + C1 B2 x + C2 Bm x + Cm
+ + +¨¨¨+
quadratic factor 1 quadratic factor 2 quadratic factor m

To determine the values of the constants A, B, C, we put the right hand side back
over the common denominator ( x + 1)( x2 + 2x + 5).
3x2 + 7x + 12 A Bx + C A( x2 + 2x + 5) + ( Bx + C )( x + 1)
= + =
( x + 1)( x2 + 2x + 5) x + 1 x2 + 2x + 5 ( x + 1)( x2 + 2x + 5)
The fraction on the far left is the same as the fraction on the far right if and only if
their numerators are the same.
3x2 + 7x + 12 = A( x2 + 2x + 5) + ( Bx + C )( x + 1)
Again, as in Example 3.8.1, there are a couple of different ways to determine the
values of A, B and C from this equation.

15 To be precise, the quadratic equation ax2 + bx + c = 0 has solutions


?
´b ˘ b2 ´ 4ac
x= .
2a
The term b2 ´ 4ac is called the discriminant and it tells us about the number of solutions. If the discrim-
inant is positive then there are two real solutions. When it is zero, there is a single solution. And if it is
negative, there is no real solutions (you need complex numbers to say more than this).


• Step 3 – Algebra Method. The conceptually clearest procedure is to write the right
hand side as a polynomial in standard form (i.e. collect up all x2 terms, all x terms
and all constant terms)
3x2 + 7x + 12 = ( A + B) x2 + (2A + B + C ) x + (5A + C )
For these two polynomials to be the same, the coefficient of x2 on the left hand side
and the coefficient of x2 on the right hand side must be the same. Similarly the
coefficients of x1 must match and the coefficients of x0 must match.
This gives us a system of three equations
A+B = 3 2A + B + C = 7 5A + C = 12
in the three unknowns A, B, C. We can solve this system by
– using the first equation, namely A + B = 3, to determine A in terms of B:
A = 3 ´ B.
– Substituting this into the remaining two equations eliminates the A’s from these
two equations, leaving two equations in the two unknowns B and C.
A = 3´B 2A + B + C = 7 5A + C = 12
ñ 2(3 ´ B ) + B + C = 7 5(3 ´ B) + C = 12
ñ ´B + C = 1 ´5B + C = ´3

– Now we can use the equation ´B + C = 1, to determine B in terms of C: B =


C ´ 1.
– Substituting this into the remaining equation eliminates the B’s leaving an equa-
tion in the one unknown C, which is easy to solve.
B = C´1 ´5B + C = ´3
ñ ´5(C ´ 1) + C = ´3
ñ ´4C = ´8

– So C = 2, and then B = C ´ 1 = 1, and then A = 3 ´ B = 2. Hence


3x2 + 7x + 12 2 x+2
2
= + 2
( x + 1)( x + 2x + 5) x + 1 x + 2x + 5
• Step 3 – Sneaky Method. While the above method is transparent, it is rather tedious. It
is arguably better to use the second, sneakier and more efficient, procedure. In order
for
3x2 + 7x + 12 = A( x2 + 2x + 5) + ( Bx + C )( x + 1)
the equation must hold for all values of x.
– In particular, it must be true for x = ´1. When x = ´1, the factor ( x + 1)
multiplying Bx + C is exactly zero. So B and C disappear from the equation,
leaving us with an easy equation to solve for A:
ˇ h i
3x2 + 7x + 12ˇ = A( x2 + 2x + 5) + ( Bx + C )( x + 1) ùñ 8 = 4A ùñ A = 2
ˇ
x =´1 x =´1


– Sub this value of A back in and simplify.

3x2 + 7x + 12 = 2( x2 + 2x + 5) + ( Bx + C )( x + 1)
x2 + 3x + 2 = ( Bx + C )( x + 1)

Since ( x + 1) is a factor on the right hand side, it must also be a factor on the
left hand side.

( x + 2)( x + 1) = ( Bx + C )( x + 1) ñ ( x + 2) = ( Bx + C ) ñ B = 1, C = 2

So again we find that

3x2 + 7x + 12 2 x+2
= + X
( x + 1)( x2 + 2x + 5) x + 1 x2 + 2x + 5

Thus our integrand can be written as

x4 + 5x3 + 16x2 + 26x + 22 2 x+2


3 2
= x+2+ + 2 .
x + 3x + 7x + 5 x + 1 x + 2x + 5

• Step 4. Now we can finally integrate! The first two pieces are easy.

      ∫ (x + 2) dx = (1/2)x² + 2x                ∫ 2/(x + 1) dx = 2 ln |x + 1|
(We’re leaving the arbitrary constant to the end of the computation.)
The final piece is a little harder. The idea is to complete the square16 in the denomi-
nator
x+2 x+2
=
x2 + 2x + 5 ( x + 1)2 + 4
ay+b
and then make a change of variables to make the fraction look like y2 +1
. In this case

x+2 1 x+2
2
= 1 2
( x + 1) + 4 4 ( x+
2 ) +1

16 This same idea arose in Section 3.7. Given a quadratic written as

Q( x ) = ax2 + bx + c

rewrite it as

Q( x ) = a( x + d)2 + e.

We can determine d and e by expanding and comparing coefficients of x:

ax2 + bx + c = a( x2 + 2dx + d2 ) + e = ax2 + 2dax + (e + ad2 )

Hence d = b/2a and e = c ´ ad2 .


1
so we make the change of variables y = x+ dx
2 , dy = 2 , x = 2y ´ 1, dx = 2 dy

x+2 1 x+2
ż ż
2
dx = dx
( x + 1) + 4 4 ( x +1 )2 + 1
2

1 (2y ´ 1) + 2 1 2y + 1
ż ż
= 2 dy = dy
4 y2 + 1 2 y2 + 1
y 1 1
ż ż
= dy + dy
y2 + 1 2 y2 + 1

Both integrals are easily evaluated, using the substitution u = y2 + 1, du = 2y dy for


the first.
y 1 du 1 1 1 h x + 1 2 i
ż ż
2
dy = = ln |u| = ln ( y + 1 ) = ln + 1
y2 + 1 u 2 2 2 2 2
1
ż
1 1 1 x + 1
dy = arctan y = arctan
2 y2 + 1 2 2 2
That’s finally it. Putting all of the pieces together

    ∫ (x⁴ + 5x³ + 16x² + 26x + 22)/(x³ + 3x² + 7x + 5) dx
        = (1/2)x² + 2x + 2 ln |x + 1| + (1/2) ln [ ((x + 1)/2)² + 1 ] + (1/2) arctan( (x + 1)/2 ) + C

Example A.9.5
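For readers with SymPy at hand, the whole decomposition of Example A.9.5 can be reproduced in one
line with sympy.apart; the added sketch below also recombines the result to confirm nothing was lost.

    import sympy as sp

    x = sp.symbols('x')
    ratio = (x**4 + 5*x**3 + 16*x**2 + 26*x + 22)/(x**3 + 3*x**2 + 7*x + 5)

    decomp = sp.apart(ratio)
    print(decomp)                       # matches x + 2 + 2/(x + 1) + (x + 2)/(x**2 + 2*x + 5)
    print(sp.simplify(decomp - ratio))  # prints 0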

A.9.2 §§ Proofs
We will now see the justification for the form of the partial fraction decompositions from
Section 3.8.3. We start by considering the case in which the denominator has only linear
factors. Then we’ll consider the case in which quadratic factors are allowed too17 .

§§§ The Simple Linear Factor Case


In the most common partial fraction decomposition, we split up
N (x)
( x ´ a1 ) ˆ ¨ ¨ ¨ ˆ ( x ´ a d )
into a sum of the form
A1 Ad
+¨¨¨+
x ´ a1 x ´ ad

17 In fact, quadratic factors are completely avoidable because, if we use complex numbers, then every
polynomial can be written as a product of linear factors. This is the Fundamental Theorem of Algebra.


We now show that this decomposition can always be achieved, under the assumptions
that the ai ’s are all different and N ( x ) is a polynomial of degree at most d ´ 1. To do so,
we shall repeatedly apply the following Lemma.

LemmaA.9.6.

Let N ( x ) and D ( x ) be polynomials of degree n and d respectively, with n ď d.


Suppose that a is NOT a zero of D ( x ). Then there is a polynomial P( x ) of degree
p ă d and a number A such that

N (x) P( x ) A
= +
D ( x ) ( x ´ a) D(x) x ´ a

Proof. • To save writing, let z = x ´ a. We then write Ñ (z) = N (z + a) and D̃ (z) =


D (z + a), which are again polynomials of degree n and d respectively. We also know
that D̃ (0) = D ( a) ‰ 0.
• In order to complete the proof we need to find a polynomial P̃(z) of degree p ă d
and a number A such that
Ñ (z) P̃(z) A P̃(z)z + A D̃ (z)
= + =
D̃ (z) z D̃ (z) z D̃ (z) z
or equivalently, such that

P̃(z)z + A D̃ (z) = Ñ (z).

• Now look at the polynomial on the left hand side. Every term in P̃(z)z, has at least
one power of z. So the constant term on the left hand side is exactly the constant
term in A D̃ (z), which is equal to A D̃ (0). The constant term on the right hand side
is equal to Ñ (0). So the constant terms on the left and right hand sides are the same
Ñ (0)
if we choose A = D̃ (0)
. Recall that D̃ (0) cannot be zero, so A is well defined.

• Now move A D̃ (z) to the right hand side.

P̃(z)z = Ñ (z) ´ A D̃ (z)

The constant terms in Ñ (z) and A D̃ (z) are the same, so the right hand side contains
no constant term and the right hand side is of the form Ñ1 (z)z for some polynomial
Ñ1 (z).
• Since Ñ (z) is of degree at most d and A D̃ (z) is of degree exactly d, Ñ1 is a polynomial
of degree d ´ 1. It now suffices to choose P̃(z) = Ñ1 (z).

Now back to
N (x)
( x ´ a1 ) ˆ ¨ ¨ ¨ ˆ ( x ´ a d )


Apply Lemma A.9.6, with D ( x ) = ( x ´ a2 ) ˆ ¨ ¨ ¨ ˆ ( x ´ ad ) and a = a1 . It says


N (x) A1 P( x )
= +
( x ´ a1 ) ˆ ¨ ¨ ¨ ˆ ( x ´ a d ) x ´ a1 ( x ´ a2 ) ˆ ¨ ¨ ¨ ˆ ( x ´ a d )
for some polynomial P of degree at most d ´ 2 and some number A1 .
Apply Lemma A.9.6 a second time, with D ( x ) = ( x ´ a3 ) ˆ ¨ ¨ ¨ ˆ ( x ´ ad ), N ( x ) = P( x )
and a = a2 . It says
P( x ) A2 Q( x )
= +
( x ´ a2 ) ˆ ¨ ¨ ¨ ˆ ( x ´ a d ) x ´ a2 ( x ´ a3 ) ˆ ¨ ¨ ¨ ˆ ( x ´ a d )
for some polynomial Q of degree at most d ´ 3 and some number A2 .
At this stage, we know that
N (x) A1 A2 Q( x )
= + +
( x ´ a1 ) ˆ ¨ ¨ ¨ ˆ ( x ´ a d ) x ´ a1 x ´ a2 ( x ´ a3 ) ˆ ¨ ¨ ¨ ˆ ( x ´ a d )
If we just keep going, repeatedly applying Lemma A.9.6, we eventually end up with
N (x) A1 Ad
= +¨¨¨+
( x ´ a1 ) ˆ ¨ ¨ ¨ ˆ ( x ´ a d ) x ´ a1 x ´ ad
as required.

§§§ The General Linear Factor Case


Now consider splitting
N (x)
( x ´ a1 ) n1 ˆ ¨ ¨ ¨ ˆ ( x ´ a d )nd
into a sum of the form18
h A A1,n1 i h A Ad,nd i
1,1 d,1
+¨¨¨+ + ¨ ¨ ¨ + + ¨ ¨ ¨ +
x ´ a1 ( x ´ a 1 ) n1 x ´ ad ( x ´ a d )nd
We now show that this decomposition can always be achieved, under the assumptions
that the ai ’s are all different and N ( x ) is a polynomial of degree at most n1 + ¨ ¨ ¨ + nd ´ 1.
To do so, we shall repeatedly apply the following Lemma.

LemmaA.9.7.

Let N ( x ) and D ( x ) be polynomials of degree n and d respectively, with n ă d + m.


Suppose that a is NOT a zero of D ( x ). Then there is a polynomial P( x ) of degree
p ă d and numbers A1 , ¨ ¨ ¨ , Am such that

\[
\frac{N(x)}{D(x)\,(x-a)^m}
= \frac{P(x)}{D(x)} + \frac{A_1}{x-a} + \frac{A_2}{(x-a)^2} + \cdots + \frac{A_m}{(x-a)^m}
\]

18 If we allow ourselves to use complex numbers as roots, this is the general case. We don’t need to
consider quadratic (or higher) factors since all polynomials can be written as products of linear factors
with complex coefficients.


Proof. • As we did in the proof of the previous lemma, we write z = x ´ a. Then


Ñ (z) = N (z + a) and D̃ (z) = D (z + a) are polynomials of degree n and d respec-
tively, D̃ (0) = D ( a) ‰ 0.

• In order to complete the proof we have to find a polynomial P̃(z) of degree p ă d


and numbers A1 , ¨ ¨ ¨ , Am such that
\[
\frac{\tilde N(z)}{\tilde D(z)\,z^m}
= \frac{\tilde P(z)}{\tilde D(z)} + \frac{A_1}{z} + \frac{A_2}{z^2} + \cdots + \frac{A_m}{z^m}
= \frac{\tilde P(z)z^m + A_1 z^{m-1}\tilde D(z) + A_2 z^{m-2}\tilde D(z) + \cdots + A_m\tilde D(z)}{\tilde D(z)\,z^m}
\]
or equivalently, such that

P̃(z)zm + A1 zm´1 D̃ (z) + A2 zm´2 D̃ (z) + ¨ ¨ ¨ + Am´1 z D̃ (z) + Am D̃ (z) = Ñ (z)

• Now look at the polynomial on the left hand side. Every single term on the left
hand side, except for the very last one, Am D̃ (z), has at least one power of z. So the
constant term on the left hand side is exactly the constant term in Am D̃ (z), which is
equal to Am D̃ (0). The constant term on the right hand side is equal to Ñ (0). So the
Ñ (0)
constant terms on the left and right hand sides are the same if we choose Am = D̃ (0)
.
Recall that D̃ (0) ‰ 0 so Am is well defined.

• Now move Am D̃ (z) to the right hand side.

P̃(z)zm + A1 zm´1 D̃ (z) + A2 zm´2 D̃ (z) + ¨ ¨ ¨ + Am´1 z D̃ (z) = Ñ (z) ´ Am D̃ (z)

The constant terms in Ñ (z) and Am D̃ (z) are the same, so the right hand side contains
no constant term and the right hand side is of the form Ñ1 (z)z with Ñ1 a polynomial
of degree at most d + m ´ 2. (Recall that Ñ is of degree at most d + m ´ 1 and D̃ is of
degree at most d.) Divide the whole equation by z to get

P̃(z)zm´1 + A1 zm´2 D̃ (z) + A2 zm´3 D̃ (z) + ¨ ¨ ¨ + Am´1 D̃ (z) = Ñ1 (z).

• Now, we can repeat the previous argument. The constant term on the left hand side,
which is exactly equal to Am´1 D̃ (0) matches the constant term on the right hand
Ñ1 (0)
side, which is equal to Ñ1 (0) if we choose Am´1 = D̃ (0)
. With this choice of Am´1

P̃(z)zm´1 + A1 zm´2 D̃ (z) + A2 zm´3 D̃ (z) + ¨ ¨ ¨ + Am´2 z D̃ (z)


= Ñ1 (z) ´ Am´1 D̃ (z) = Ñ2 (z)z

with Ñ2 a polynomial of degree at most d + m ´ 3. Divide by z and continue.

• After m steps like this, we end up with

P̃(z)z = Ñm´1 (z) ´ A1 D̃ (z)


Ñm´1 (0)
after having chosen A1 = D̃ (0)
.


• There is no constant term on the right side so that Ñm´1 (z) ´ A1 D̃ (z) is of the form
Ñm (z)z with Ñm a polynomial of degree d ´ 1. Choosing P̃(z) = Ñm (z) completes
the proof.

Now back to
N (x)
( x ´ a1 ) n1 ˆ ¨ ¨ ¨ ˆ ( x ´ a d )nd
Apply Lemma A.9.7, with D ( x ) = ( x ´ a2 )n2 ˆ ¨ ¨ ¨ ˆ ( x ´ ad )nd , m = n1 and a = a1 . It says

\[
\frac{N(x)}{(x-a_1)^{n_1}\times\cdots\times(x-a_d)^{n_d}}
= \frac{A_{1,1}}{x-a_1} + \frac{A_{1,2}}{(x-a_1)^2} + \cdots + \frac{A_{1,n_1}}{(x-a_1)^{n_1}}
+ \frac{P(x)}{(x-a_2)^{n_2}\times\cdots\times(x-a_d)^{n_d}}
\]
Apply Lemma A.9.7 a second time, with D ( x ) = ( x ´ a3 )n3 ˆ ¨ ¨ ¨ ˆ ( x ´ ad )nd , N ( x ) = P( x ),
m = n2 and a = a2 . And so on. Eventually, we end up with
\[
\Big[\frac{A_{1,1}}{x-a_1}+\cdots+\frac{A_{1,n_1}}{(x-a_1)^{n_1}}\Big]
+\cdots+
\Big[\frac{A_{d,1}}{x-a_d}+\cdots+\frac{A_{d,n_d}}{(x-a_d)^{n_d}}\Big]
\]
which is exactly what we were trying to show.
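The same check works in the repeated-factor setting (again a sketch relying on SymPy, which the text itself does not use): with $D(x) = (x-1)^2(x+2)$ the decomposition should contain terms of the form $A_{1,1}/(x-1)$, $A_{1,2}/(x-1)^2$ and $A_{2,1}/(x+2)$, exactly the shape just derived.

import sympy as sp

x = sp.symbols('x')
N = x**2 + 1
D = (x - 1)**2 * (x + 2)

print(sp.apart(N/D, x))
# Expect the terms 4/(9*(x - 1)), 2/(3*(x - 1)**2), 5/(9*(x + 2)),
# possibly printed in a different order.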

§§§ Really Optional — The Fully General Case


We are now going to see that, in general, if N ( x ) and D ( x ) are polynomials with the degree
of N being strictly smaller than the degree of D (which we’ll denote deg( N ) ă deg( D ))
and if

D ( x ) = K ( x ´ a1 )m1 ¨ ¨ ¨ ( x ´ a j )m j ( x2 + b1 x + c1 )n1 ¨ ¨ ¨ ( x2 + bk x + ck )nk (E1)

(with $b_\ell^2 - 4c_\ell < 0$ for all $1 \le \ell \le k$ so that no quadratic factor can be written as a product
of linear factors with real coefficients) then there are real numbers Ai,j , Bi,j , Ci,j such that

\begin{align*}
\frac{N(x)}{D(x)}
&= \frac{A_{1,1}}{x-a_1} + \frac{A_{1,2}}{(x-a_1)^2} + \cdots + \frac{A_{1,m_1}}{(x-a_1)^{m_1}} + \cdots \\
&\quad + \frac{A_{j,1}}{x-a_j} + \frac{A_{j,2}}{(x-a_j)^2} + \cdots + \frac{A_{j,m_j}}{(x-a_j)^{m_j}} \\
&\quad + \frac{B_{1,1}x+C_{1,1}}{x^2+b_1x+c_1} + \frac{B_{1,2}x+C_{1,2}}{(x^2+b_1x+c_1)^2} + \cdots + \frac{B_{1,n_1}x+C_{1,n_1}}{(x^2+b_1x+c_1)^{n_1}} + \cdots \\
&\quad + \frac{B_{k,1}x+C_{k,1}}{x^2+b_kx+c_k} + \frac{B_{k,2}x+C_{k,2}}{(x^2+b_kx+c_k)^2} + \cdots + \frac{B_{k,n_k}x+C_{k,n_k}}{(x^2+b_kx+c_k)^{n_k}}
\end{align*}
This was (A.9.3).
We start with two simpler results, that we’ll use repeatedly to get (A.9.3). In the first
P( x )
simpler result, we consider the fraction Q ( x) Q ( x) with P( x ), Q1 ( x ) and Q2 ( x ) being poly-
1 2
nomials with real coefficients and we are going to assume that when P( x ), Q1 ( x ) and


Q2 ( x ) are factored as in (E1), no two of them have a common linear or quadratic factor.
As an example, no two of
P( x ) = 2( x ´ 3)( x ´ 4)( x2 + 3x + 3)
Q1 ( x ) = 2( x ´ 1)( x2 + 2x + 2)
Q2 ( x ) = 2( x ´ 2)( x2 + 2x + 3)
have such a common factor. But, for
P( x ) = 2( x ´ 3)( x ´ 4)( x2 + x + 1)
Q1 ( x ) = 2( x ´ 1)( x2 + 2x + 2)
Q2 ( x ) = 2( x ´ 2)( x2 + x + 1)

P( x ) and Q2 ( x ) have the common factor x2 + x + 1.


LemmaA.9.8.

Let P( x ), Q1 ( x ) and Q2 ( x ) be polynomials with real coefficients and with


deg( P) ă deg( Q1 Q2 ). Assume that no two of P( x ), Q1 ( x ) and Q2 ( x ) have
a common linear or quadratic factor. Then there are polynomials P1 , P2 with
deg( P1 ) ă deg( Q1 ), deg( P2 ) ă deg( Q2 ), and

\[
\frac{P(x)}{Q_1(x)\,Q_2(x)} = \frac{P_1(x)}{Q_1(x)} + \frac{P_2(x)}{Q_2(x)}
\]

Proof. We are to find polynomials P1 and P2 that obey


P( x ) = P1 ( x ) Q2 ( x ) + P2 ( x ) Q1 ( x )
Actually, we are going to find polynomials p1 and p2 that obey
\[
p_1(x)\,Q_1(x) + p_2(x)\,Q_2(x) = C \tag{E2}
\]
for some nonzero constant C, and then just multiply (E2) by $\frac{P(x)}{C}$. To find $p_1$, $p_2$ and C
we are going to use something called the Euclidean algorithm. It is an algorithm19 that
is used to efficiently find the greatest common divisors of two numbers. Because Q1 ( x )
and Q2 ( x ) have no common factors of degree 1 or 2, their “greatest common divisor” has
degree 0, i.e. is a constant.
• The first step is to apply long division to $\frac{Q_1(x)}{Q_2(x)}$ to find polynomials $n_0(x)$ and $r_0(x)$ such that
\[
\frac{Q_1(x)}{Q_2(x)} = n_0(x) + \frac{r_0(x)}{Q_2(x)} \qquad\text{with } \deg(r_0) < \deg(Q_2)
\]
Q2 ( x ) Q2 ( x )
or, equivalently,
Q1 ( x ) = n0 ( x ) Q2 ( x ) + r0 ( x ) with deg(r0 ) ă deg( Q2 )

19 It appears in Euclid’s Elements, which was written about 300 BC, and it was probably known even
before that.


• The second step is to apply long division to $\frac{Q_2(x)}{r_0(x)}$ to find polynomials $n_1(x)$ and $r_1(x)$ such that

Q2 ( x ) = n1 ( x ) r0 ( x ) + r1 ( x ) with deg(r1 ) ă deg(r0 ) or r1 ( x ) = 0

• The third step (assuming that $r_1(x)$ was not zero) is to apply long division to $\frac{r_0(x)}{r_1(x)}$ to find polynomials $n_2(x)$ and $r_2(x)$ such that

r0 ( x ) = n2 ( x ) r1 ( x ) + r2 ( x ) with deg(r2 ) ă deg(r1 ) or r2 ( x ) = 0

• And so on.

As the degree of the remainder ri ( x ) decreases by at least one each time i is increased by
one, the above iteration has to terminate with some r`+1 ( x ) = 0. That is, we choose ` to be
index of the last nonzero remainder. Here is a summary of all of the long division steps.

Q1 ( x ) = n0 ( x ) Q2 ( x ) + r0 ( x ) with deg(r0 ) ă deg( Q2 )


Q2 ( x ) = n1 ( x ) r0 ( x ) + r1 ( x ) with deg(r1 ) ă deg(r0 )
r0 ( x ) = n2 ( x ) r1 ( x ) + r2 ( x ) with deg(r2 ) ă deg(r1 )
r1 ( x ) = n3 ( x ) r2 ( x ) + r3 ( x ) with deg(r3 ) ă deg(r2 )
..
.
r`´2 ( x ) = n` ( x ) r`´1 ( x ) + r` ( x ) with deg(r` ) ă deg(r`´1 )
r`´1 ( x ) = n`+1 ( x ) r` ( x ) + r`+1 ( x ) with r`+1 = 0

Now we are going to take a closer look at all of the different remainders that we have
generated.

• From first long division step, namely Q1 ( x ) = n0 ( x ) Q2 ( x ) + r0 ( x ) we have that the


remainder
r0 ( x ) = Q1 ( x ) ´ n0 ( x ) Q2 ( x )
• From the second long division step, namely Q2 ( x ) = n1 ( x ) r0 ( x ) + r1 ( x ) we have
that the remainder
 
r1 ( x ) = Q2 ( x ) ´ n1 ( x ) r0 ( x ) = Q2 ( x ) ´ n1 ( x ) Q1 ( x ) ´ n0 ( x ) Q2 ( x )
= A1 ( x ) Q1 ( x ) + B1 ( x ) Q2 ( x )

with A1 ( x ) = ´n1 ( x ) and B1 ( x ) = 1 + n0 ( x ) n1 ( x ).


• From the third long division step (assuming that r1 ( x ) was not zero), namely r0 ( x ) =
n2 ( x ) r1 ( x ) + r2 ( x ), we have that the remainder

r2 ( x ) = r0 ( x ) ´ n2 ( x ) r1 ( x )
   
= Q1 ( x ) ´ n0 ( x ) Q2 ( x ) ´ n2 ( x ) A1 ( x ) Q1 ( x ) + B1 ( x ) Q2 ( x )
= A2 ( x ) Q1 ( x ) + B2 ( x ) Q2 ( x )

with A2 ( x ) = 1 ´ n2 ( x ) A1 ( x ) and B2 ( x ) = ´n0 ( x ) ´ n2 ( x ) B1 ( x ).


• And so on. Continuing in this way, we conclude that the final nonzero remainder
r` ( x ) = A` ( x ) Q1 ( x ) + B` ( x ) Q2 ( x ) for some polynomials A` and B` .

Now the last nonzero remainder r` ( x ) has to be a nonzero constant C because

˝ it is nonzero by the definition of r` ( x ) and


˝ if r` ( x ) were a polynomial of degree at least one, then
– r` ( x ) would be a factor of r`´1 ( x ) because r`´1 ( x ) = n`+1 ( x ) r` ( x ) and
– r` ( x ) would be a factor of r`´2 ( x ) because r`´2 ( x ) = n` ( x ) r`´1 ( x ) + r` ( x ) and
– r` ( x ) would be a factor of r`´3 ( x ) because r`´3 ( x ) = n`´1 ( x ) r`´2 ( x ) + r`´1 ( x ) and
– ¨ ¨ ¨ and ¨ ¨ ¨
– r` ( x ) would be a factor of r1 ( x ) because r1 ( x ) = n3 ( x ) r2 ( x ) + r3 ( x ) and
– r` ( x ) would be a factor of r0 ( x ) because r0 ( x ) = n2 ( x ) r1 ( x ) + r2 ( x ) and
– r` ( x ) would be a factor of Q2 ( x ) because Q2 ( x ) = n1 ( x ) r0 ( x ) + r1 ( x ) and
– r` ( x ) would be a factor of Q1 ( x ) because Q1 ( x ) = n0 ( x ) Q2 ( x ) + r0 ( x )
˝ so that r` ( x ) would be a common factor for Q1 ( x ) and Q2 ( x ), in contradiction to the
hypothesis that no two of P( x ), Q1 ( x ) and Q2 ( x ) have a common linear or quadratic
factor.
We now have that $A_\ell(x)\,Q_1(x) + B_\ell(x)\,Q_2(x) = r_\ell(x) = C$. Multiplying by $\frac{P(x)}{C}$ gives
\[
\tilde P_2(x)\,Q_1(x) + \tilde P_1(x)\,Q_2(x) = P(x)
\qquad\text{or}\qquad
\frac{\tilde P_1(x)}{Q_1(x)} + \frac{\tilde P_2(x)}{Q_2(x)} = \frac{P(x)}{Q_1(x)\,Q_2(x)}
\]
with $\tilde P_2(x) = \frac{P(x)\,A_\ell(x)}{C}$ and $\tilde P_1(x) = \frac{P(x)\,B_\ell(x)}{C}$. We're not quite done, because there is still
the danger that $\deg(\tilde P_1) \ge \deg(Q_1)$ or $\deg(\tilde P_2) \ge \deg(Q_2)$. To deal with that possibility,
we long divide $\frac{\tilde P_1(x)}{Q_1(x)}$ and call the remainder $P_1(x)$.
\[
\frac{\tilde P_1(x)}{Q_1(x)} = N(x) + \frac{P_1(x)}{Q_1(x)} \qquad\text{with } \deg(P_1) < \deg(Q_1)
\]

Therefore we have that

\[
\frac{P(x)}{Q_1(x)\,Q_2(x)}
= \frac{P_1(x)}{Q_1(x)} + N(x) + \frac{\tilde P_2(x)}{Q_2(x)}
= \frac{P_1(x)}{Q_1(x)} + \frac{\tilde P_2(x) + N(x)\,Q_2(x)}{Q_2(x)}
\]

Denoting $P_2(x) = \tilde P_2(x) + N(x)\,Q_2(x)$ gives $\frac{P}{Q_1Q_2} = \frac{P_1}{Q_1} + \frac{P_2}{Q_2}$ and since $\deg(P_1) < \deg(Q_1)$,
the only thing left to prove is that deg( P2 ) ă deg( Q2 ).
We assume that deg( P2 ) ě deg( Q2 ) and look for a contradiction. We have

deg( P2 Q1 ) ě deg( Q1 Q2 ) ą deg( P1 Q2 )


ùñ deg( P) = deg( P1 Q2 + P2 Q1 ) = deg( P2 Q1 ) ě deg( Q1 Q2 )

which contradicts the hypothesis that deg( P) ă deg( Q1 Q2 ) and the proof is complete.
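The Euclidean algorithm used in this proof is easy to run by machine. The sketch below (using SymPy's polynomial division, an outside tool not relied on by the text) carries out the chain of long divisions on the sample $Q_1$, $Q_2$ given earlier and stops at the last nonzero remainder, which — since $Q_1$ and $Q_2$ share no factor — should be a nonzero constant.

import sympy as sp

x = sp.symbols('x')
Q1 = sp.expand(2*(x - 1)*(x**2 + 2*x + 2))
Q2 = sp.expand(2*(x - 2)*(x**2 + 2*x + 3))

a, b = Q1, Q2
while b != 0:
    q, r = sp.div(a, b, x)   # a = q*b + r with deg(r) < deg(b)
    a, b = b, sp.expand(r)

print(a)   # the last nonzero remainder: a nonzero constant, as the proof predicts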


For the second of the two simpler results, that we'll shortly use repeatedly to get
(A.9.3), we consider $\frac{P(x)}{(x-a)^m}$ and $\frac{P(x)}{(x^2+bx+c)^m}$.

LemmaA.9.9.

Let m ě 2 be an integer, and let Q( x ) be either x ´ a or x2 + bx + c, with a, b and c


being real numbers. Let P( x ) be a polynomial with real coefficients, which does
not contain Q( x ) as a factor, and with deg( P) ă deg( Qm ) = m deg( Q). Then, for
each 1 ď i ď m, there is a polynomial Pi with deg( Pi ) ă deg( Q) or Pi = 0, such
that
\[
\frac{P(x)}{Q(x)^m} = \frac{P_1(x)}{Q(x)} + \frac{P_2(x)}{Q(x)^2} + \frac{P_3(x)}{Q(x)^3} + \cdots + \frac{P_{m-1}(x)}{Q(x)^{m-1}} + \frac{P_m(x)}{Q(x)^m}.
\]
In particular, if Q( x ) = x ´ a, then each Pi ( x ) is just a constant Ai , and if Q( x ) =
x2 + bx + c, then each Pi ( x ) is a polynomial Bi x + Ci of degree at most one.

Proof. We simply repeatedly use long division to get
\begin{align*}
\frac{P(x)}{Q(x)^m}
&= \frac{P(x)}{Q(x)}\cdot\frac{1}{Q(x)^{m-1}}
= \Big\{ n_1(x) + \frac{r_1(x)}{Q(x)} \Big\}\frac{1}{Q(x)^{m-1}} \\
&= \frac{r_1(x)}{Q(x)^m} + \frac{n_1(x)}{Q(x)^{m-1}}
= \frac{r_1(x)}{Q(x)^m} + \Big\{ n_2(x) + \frac{r_2(x)}{Q(x)} \Big\}\frac{1}{Q(x)^{m-2}} \\
&= \frac{r_1(x)}{Q(x)^m} + \frac{r_2(x)}{Q(x)^{m-1}} + \frac{n_2(x)}{Q(x)^{m-2}} \\
&\ \ \vdots \\
&= \frac{r_1(x)}{Q(x)^m} + \frac{r_2(x)}{Q(x)^{m-1}} + \cdots + \frac{r_{m-2}(x)}{Q(x)^3} + \frac{n_{m-2}(x)}{Q(x)^2} \\
&= \frac{r_1(x)}{Q(x)^m} + \frac{r_2(x)}{Q(x)^{m-1}} + \cdots + \frac{r_{m-2}(x)}{Q(x)^3}
  + \Big\{ n_{m-1}(x) + \frac{r_{m-1}(x)}{Q(x)} \Big\}\frac{1}{Q(x)} \\
&= \frac{r_1(x)}{Q(x)^m} + \frac{r_2(x)}{Q(x)^{m-1}} + \cdots + \frac{r_{m-2}(x)}{Q(x)^3} + \frac{r_{m-1}(x)}{Q(x)^2} + \frac{n_{m-1}(x)}{Q(x)}
\end{align*}
By the rules of long division every deg(ri ) ă deg( Q). It is also true that the final numer-
ator, nm´1 , has deg(nm´1 ) ă deg( Q) — that is, we kept dividing by Q until the degree of
the quotient was less than the degree of Q. To see this, note that deg( P) ă m deg( Q) and
deg(n1 ) = deg( P) ´ deg( Q)
deg(n2 ) = deg(n1 ) ´ deg( Q) = deg( P) ´ 2 deg( Q)
..
.
deg(nm´1 ) = deg(nm´2 ) ´ deg( Q) = deg( P) ´ (m ´ 1) deg( Q)
ă m deg( Q) ´ (m ´ 1) deg( Q)
= deg( Q)
So, if deg( Q) = 1, then r1 , r2 , . . . , rm´1 , nm´1 are all real numbers, and if deg( Q) = 2, then
r1 , r2 , . . . , rm´1 , nm´1 all have degree at most one.


We are now in a position to get (A.9.3). We use (E1) to factor20 D ( x ) = ( x ´ a1 )m1 Q2 ( x )


and use Lemma A.9.8 to get
\[
\frac{N(x)}{D(x)} = \frac{N(x)}{(x-a_1)^{m_1}\,Q_2(x)}
= \frac{P_1(x)}{(x-a_1)^{m_1}} + \frac{P_2(x)}{Q_2(x)}
\]
where $\deg(P_1) < m_1$, and $\deg(P_2) < \deg(Q_2)$. Then we use Lemma A.9.9 to get
\[
\frac{N(x)}{D(x)} = \frac{P_1(x)}{(x-a_1)^{m_1}} + \frac{P_2(x)}{Q_2(x)}
= \frac{A_{1,1}}{x-a_1} + \frac{A_{1,2}}{(x-a_1)^2} + \cdots + \frac{A_{1,m_1}}{(x-a_1)^{m_1}} + \frac{P_2(x)}{Q_2(x)}
\]
We continue working on $\frac{P_2(x)}{Q_2(x)}$ in this way, pulling off of the denominator one $(x-a_i)^{m_i}$ or
one ( x2 + bi x + ci )ni at a time, until we exhaust all of the factors in the denominator D ( x ).
Section A.9 of this work was adapted from Section 1.10 of CLP 2 – Integral Calculus by
Feldman, Rechnitzer, and Yeager under a Creative Commons Attribution-NonCommercial-
ShareAlike 4.0 International license.

A.10IJ An Error Bound for the Midpoint Rule


We now try to develop some understanding as to why we got the experimental results of
section 3.9.4. We start with the error generated by a single step of the midpoint rule. That
is, the error introduced by the approximation
\[
\int_{x_0}^{x_1} f(x)\,dx \approx f(\bar x_1)\,\Delta x
\qquad\text{where } \Delta x = x_1 - x_0,\ \ \bar x_1 = \tfrac{x_0+x_1}{2}
\]
To do this we are going to need to apply integration by parts in a sneaky way. Let us start
by considering21 a subinterval α ď x ď β and let’s call the width of the subinterval 2q so
that β = α + 2q. If we were to now apply the midpoint rule to this subinterval, then we
would write
żβ
f ( x )dx « 2q ¨ f (α + q) = q f (α + q) + q f ( β ´ q)
α
since the interval has width 2q and the midpoint is α + q = β ´ q.
The sneaky trick we will employ is to write
żβ ż α+q żβ
f ( x )dx = f ( x )dx + f ( x )dx
α α β´q
and then examine each of the integrals on the right-hand side (using integration by parts)
and show that they are each of the form
ż α+q
f ( x )dx « q f (α + q) + small error term
α
żβ
f ( x )dx « q f ( β ´ q) + small error term
β´q

20 This is assuming that there is at least one linear factor. If not, we factor D ( x ) = ( x2 + b1 x + c1 )n1 Q2 ( x )
instead.
21 We chose this interval so that we didn’t have lots of subscripts floating around in the algebra.


Let us apply integration by parts to $\int_\alpha^{\alpha+q} f(x)\,dx$ — with $u = f(x)$, $dv = dx$ so $du = f'(x)\,dx$ and we will make the slightly non-standard choice of $v = x-\alpha$:
\begin{align*}
\int_\alpha^{\alpha+q} f(x)\,dx
&= \Big[(x-\alpha)f(x)\Big]_\alpha^{\alpha+q} - \int_\alpha^{\alpha+q} (x-\alpha)\,f'(x)\,dx \\
&= q\,f(\alpha+q) - \int_\alpha^{\alpha+q} (x-\alpha)\,f'(x)\,dx
\end{align*}

Notice that the first term on the right-hand side is the term we need, and that our non-
standard choice of v allowed us to avoid introducing an f (α) term.
Now integrate by parts again using $u = f'(x)$, $dv = (x-\alpha)\,dx$, so $du = f''(x)\,dx$, $v = \frac{(x-\alpha)^2}{2}$:
\begin{align*}
\int_\alpha^{\alpha+q} f(x)\,dx
&= q\,f(\alpha+q) - \int_\alpha^{\alpha+q} (x-\alpha)\,f'(x)\,dx \\
&= q\,f(\alpha+q) - \Big[\frac{(x-\alpha)^2}{2}\,f'(x)\Big]_\alpha^{\alpha+q} + \int_\alpha^{\alpha+q} \frac{(x-\alpha)^2}{2}\,f''(x)\,dx \\
&= q\,f(\alpha+q) - \frac{q^2}{2}\,f'(\alpha+q) + \int_\alpha^{\alpha+q} \frac{(x-\alpha)^2}{2}\,f''(x)\,dx
\end{align*}
To obtain a similar expression for the other integral, we repeat the above steps and obtain:

\[
\int_{\beta-q}^{\beta} f(x)\,dx = q\,f(\beta-q) + \frac{q^2}{2}\,f'(\beta-q) + \int_{\beta-q}^{\beta} \frac{(x-\beta)^2}{2}\,f''(x)\,dx
\]
Now add together these two expressions
\begin{align*}
\int_\alpha^{\alpha+q} f(x)\,dx + \int_{\beta-q}^{\beta} f(x)\,dx
&= q\,f(\alpha+q) + q\,f(\beta-q) + \frac{q^2}{2}\big(f'(\beta-q) - f'(\alpha+q)\big) \\
&\quad + \int_\alpha^{\alpha+q} \frac{(x-\alpha)^2}{2}\,f''(x)\,dx + \int_{\beta-q}^{\beta} \frac{(x-\beta)^2}{2}\,f''(x)\,dx
\end{align*}
Then since α + q = β ´ q we can combine the integrals on the left-hand side and eliminate
some terms from the right-hand side:
\[
\int_\alpha^{\beta} f(x)\,dx = 2q\,f(\alpha+q) + \int_\alpha^{\alpha+q} \frac{(x-\alpha)^2}{2}\,f''(x)\,dx + \int_{\beta-q}^{\beta} \frac{(x-\beta)^2}{2}\,f''(x)\,dx
\]
Rearrange this expression a little and take absolute values
\[
\Big|\int_\alpha^{\beta} f(x)\,dx - 2q\,f(\alpha+q)\Big|
\le \Big|\int_\alpha^{\alpha+q} \frac{(x-\alpha)^2}{2}\,f''(x)\,dx\Big| + \Big|\int_{\beta-q}^{\beta} \frac{(x-\beta)^2}{2}\,f''(x)\,dx\Big|
\]

where we have also made use of the triangle inequality22 . By assumption | f 2 ( x )| ď M on

22 The triangle inequality says that for any real numbers x, y


|x + y| ď |x| + |y|.


the interval α ď x ď β, so
\begin{align*}
\Big|\int_\alpha^{\beta} f(x)\,dx - 2q\,f(\alpha+q)\Big|
&\le M\int_\alpha^{\alpha+q} \frac{(x-\alpha)^2}{2}\,dx + M\int_{\beta-q}^{\beta} \frac{(x-\beta)^2}{2}\,dx \\
&= \frac{Mq^3}{3} = \frac{M(\beta-\alpha)^3}{24}
\end{align*}

where we have used $q = \frac{\beta-\alpha}{2}$ in the last step.

Thus on any interval xi ď x ď xi+1 = xi + ∆x


\[
\Big|\int_{x_i}^{x_{i+1}} f(x)\,dx - \Delta x\, f\Big(\frac{x_i + x_{i+1}}{2}\Big)\Big| \le \frac{M}{24}(\Delta x)^3
\]

Putting everything together we see that the error using the midpoint rule is bounded
by
\begin{align*}
\Big|\int_a^b f(x)\,dx - \big[f(\bar x_1) + f(\bar x_2) + \cdots + f(\bar x_n)\big]\Delta x\Big|
&\le \Big|\int_{x_0}^{x_1} f(x)\,dx - \Delta x\, f(\bar x_1)\Big| + \cdots + \Big|\int_{x_{n-1}}^{x_n} f(x)\,dx - \Delta x\, f(\bar x_n)\Big| \\
&\le n\times\frac{M}{24}(\Delta x)^3 = n\times\frac{M}{24}\Big(\frac{b-a}{n}\Big)^3 = \frac{M(b-a)^3}{24n^2}
\end{align*}

as required.
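A small numerical experiment (not part of the derivation, and only a sketch) makes the bound concrete: for $f(x)=\sin x$ on $[0,2]$ we may take $M=1$, since $|f''(x)| = |\sin x| \le 1$, and the observed midpoint-rule error should always sit below $M(b-a)^3/(24n^2)$.

import math

def midpoint_rule(f, a, b, n):
    dx = (b - a) / n
    return dx * sum(f(a + (i + 0.5) * dx) for i in range(n))

f = math.sin
a, b, M = 0.0, 2.0, 1.0
exact = math.cos(a) - math.cos(b)          # integral of sin(x) on [a, b]

for n in (10, 100, 1000):
    err = abs(midpoint_rule(f, a, b, n) - exact)
    bound = M * (b - a) ** 3 / (24 * n ** 2)
    print(n, err, bound, err <= bound)     # the bound should never be violated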
A very similar analysis shows that, as was stated in Theorem 3.9.12 above,

• the total error introduced by the trapezoidal rule is bounded by $\dfrac{M(b-a)^3}{12\,n^2}$,

• the total error introduced by Simpson's rule is bounded by $\dfrac{M(b-a)^5}{180\,n^4}$

Section A.10 of this work was adapted from Section 1.11.5 of CLP 2 – Integral Calculus
by Feldman, Rechnitzer, and Yeager under a Creative Commons Attribution-NonCommercial-
ShareAlike 4.0 International license.

A.11IJ Comparison Tests Proof


In this and the next optional section we provide proofs of two convergence tests. We shall
repeatedly use the fact that any sequence a1 , a2 , a3 , ¨ ¨ ¨ , of real numbers which is increasing
(i.e. an+1 ě an for all n) and bounded (i.e. there is a constant M such that an ď M for all n)
converges. We shall not prove this fact23 .
We start with the comparison test, and then move on to the alternating series test.

23 It is one way to state a property of the real number system called “completeness”. The interested reader
should use their favourite search engine to look up “completeness of the real numbers”.


Theorem A.11.1 (The Comparison Test).

Let N0 be a natural number and let K ą 0.


(a) If $|a_n| \le K c_n$ for all $n \ge N_0$ and $\sum_{n=0}^\infty c_n$ converges, then $\sum_{n=0}^\infty a_n$ converges.

(b) If $a_n \ge K d_n \ge 0$ for all $n \ge N_0$ and $\sum_{n=0}^\infty d_n$ diverges, then $\sum_{n=0}^\infty a_n$ diverges.


Proof. (a) By hypothesis $\sum_{n=0}^\infty c_n$ converges. So it suffices to prove that $\sum_{n=0}^\infty [Kc_n - a_n]$
converges, because then, by our Arithmetic of series Theorem 5.2.10,
\[
\sum_{n=0}^\infty a_n = \sum_{n=0}^\infty Kc_n - \sum_{n=0}^\infty [Kc_n - a_n]
\]

will converge too. But for all $n \ge N_0$, $Kc_n - a_n \ge 0$ so that, for all $N \ge N_0$, the partial sums
\[
S_N = \sum_{n=0}^N [Kc_n - a_n]
\]
increase with N, but never get bigger than the finite number $\sum_{n=0}^{N_0} [Kc_n - a_n] + K\sum_{n=N_0+1}^{\infty} c_n$. So
the partial sums $S_N$ converge as $N \to \infty$.
(b) For all N ą N0 , the partial sum

\[
S_N = \sum_{n=0}^N a_n \ge \sum_{n=0}^{N_0} a_n + K \sum_{n=N_0+1}^{N} d_n
\]
By hypothesis, $\sum_{n=N_0+1}^{N} d_n$, and hence $S_N$, grows without bound as $N \to \infty$. So $S_N \to \infty$
as N Ñ 8.
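A numerical illustration of part (a), purely as a sketch and not a substitute for the proof: with $a_n = \frac{1}{n^2+n}$, $c_n = \frac{1}{n^2}$ and $K = 1$, the partial sums of $\sum a_n$ are increasing and bounded by those of $\sum c_n$, so they converge — in fact to 1, since that series telescopes.

partial_a = partial_c = 0.0
for n in range(1, 100001):
    partial_a += 1.0 / (n * n + n)   # partial sum of the smaller series
    partial_c += 1.0 / (n * n)       # partial sum of the comparison series
print(partial_a, partial_c)          # roughly 0.99999 and 1.64492 (pi^2/6)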

Section A.11 of this work was adapted from Section 3.3.10 of CLP 2 – Integral Calculus
by Feldman, Rechnitzer, and Yeager under a Creative Commons Attribution-NonCommercial-
ShareAlike 4.0 International license.

A.12IJ Alternating Series


A.12.1 §§ The Alternating Series Test
A common convergence test for series that is not included in our learning goals is the
Alternating Series Test. First, we need to know what an alternating series is.


When the signs of successive terms in a series alternate between + and −, like for
example in $1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \cdots$, the series is called an alternating series. More generally, the
series
\[
A_1 - A_2 + A_3 - A_4 + \cdots = \sum_{n=1}^\infty (-1)^{n-1} A_n
\]

is alternating if every An ě 0. Often (but not always) the terms in alternating series get
successively smaller. That is, A1 ě A2 ě A3 ě ¨ ¨ ¨. In this case:

• The first partial sum is S1 = A1 .

• The second partial sum, S2 = A1 ´ A2 , is smaller than S1 by A2 .

• The third partial sum, S3 = S2 + A3 , is bigger than S2 by A3 , but because A3 ď A2 ,


S3 remains smaller than S1 . See the figure below.

• The fourth partial sum, S4 = S3 ´ A4 , is smaller than S3 by A4 , but because A4 ď A3 ,


S4 remains bigger than S2 . Again, see the figure below.

• And so on.

So the successive partial sums oscillate, but with ever decreasing amplitude. If, in ad-
dition, An tends to 0 as n tends to 8, the amplitude of oscillation tends to zero and the
sequence S1 , S2 , S3 , ¨ ¨ ¨ converges to some limit S.
Here is a convergence test for alternating series that exploits this structure, and that is
really easy to apply.

Theorem A.12.1 (Alternating Series Test).

Let $\{A_n\}_{n=1}^\infty$ be a sequence of real numbers that obeys

(i) An ě 0 for all n ě 1 and


(ii) An+1 ď An for all n ě 1 (i.e. the sequence is monotone decreasing) and
(iii) limnÑ8 An = 0.

Then
\[
A_1 - A_2 + A_3 - A_4 + \cdots = \sum_{n=1}^\infty (-1)^{n-1} A_n = S
\]
converges and, for each natural number N, $S - S_N$ is between 0 and (the first
dropped term) $(-1)^N A_{N+1}$. Here $S_N$ is, as previously, the $N^{\rm th}$ partial sum
$\sum_{n=1}^N (-1)^{n-1} A_n$.

“Proof”. We shall only give part of the proof here. For the rest of the proof see the ap-
pendix section A.11. We shall fix any natural number N and concentrate on the last state-
ment, which gives a bound on the truncation error (which is the error introduced when


you approximate the full series by the partial sum S N )


\[
E_N = S - S_N = \sum_{n=N+1}^\infty (-1)^{n-1} A_n
= (-1)^N\big[ A_{N+1} - A_{N+2} + A_{N+3} - A_{N+4} + \cdots \big]
\]

This is of course another series. We’re going to study the partial sums
\[
S_{N,\ell} = \sum_{n=N+1}^{\ell} (-1)^{n-1} A_n = (-1)^N \sum_{m=1}^{\ell-N} (-1)^{m-1} A_{N+m}
\]

for that series.

• If $\ell' > N+1$, with $\ell' - N$ even,
\begin{align*}
(-1)^N S_{N,\ell'} &= (A_{N+1} - A_{N+2}) + (A_{N+3} - A_{N+4}) + \cdots + (A_{\ell'-1} - A_{\ell'}) \ge 0
\qquad\text{and} \\
(-1)^N S_{N,\ell'+1} &= (-1)^N S_{N,\ell'} + A_{\ell'+1} \ge 0
\end{align*}
since each bracketed difference, and $A_{\ell'+1}$, is at least zero.
This tells us that $(-1)^N S_{N,\ell} \ge 0$ for all $\ell > N+1$, both even and odd.

• Similarly, if $\ell' > N+1$, with $\ell' - N$ odd,
\begin{align*}
(-1)^N S_{N,\ell'} &= A_{N+1} - (A_{N+2} - A_{N+3}) - (A_{N+4} - A_{N+5}) - \cdots - (A_{\ell'-1} - A_{\ell'}) \le A_{N+1} \\
(-1)^N S_{N,\ell'+1} &= (-1)^N S_{N,\ell'} - A_{\ell'+1} \le A_{N+1}
\end{align*}
with each bracketed difference again being at least zero.
This tells us that $(-1)^N S_{N,\ell} \le A_{N+1}$ for all $\ell > N+1$, both even and odd.

So we now know that S N,` lies between its first term, (´1) N A N +1 , and 0 for all ` ą N + 1.
While we are not going to prove it here (see the optional section A.11), this implies that,
since A N +1 Ñ 0 as N Ñ 8, the series converges and that

S ´ S N = lim S N,`
`Ñ8

lies between (´1) N A N +1 and 0.

Example A.12.2
We have already seen, in Example 5.3.6, that the harmonic series $\sum_{n=1}^\infty \frac{1}{n}$ diverges. On the
other hand, the series $\sum_{n=1}^\infty (-1)^{n-1}\frac{1}{n}$ converges by the alternating series test with $A_n = \frac{1}{n}$.
Note that

(i) $A_n = \frac{1}{n} \ge 0$ for all $n \ge 1$, so that $\sum_{n=1}^\infty (-1)^{n-1}\frac{1}{n}$ really is an alternating series, and

(ii) $A_n = \frac{1}{n}$ decreases as n increases, and


(iii) $\lim_{n\to\infty} A_n = \lim_{n\to\infty} \frac{1}{n} = 0$.

so that all of the hypotheses of the alternating series test, i.e. of Theorem A.12.1, are
satisfied. We shall see, in Example 5.5.19, that

\[
\sum_{n=1}^\infty \frac{(-1)^{n-1}}{n} = \ln 2.
\]

Example A.12.2

Example A.12.3 (e)


You may already know that $e^x = \sum_{n=0}^\infty \frac{x^n}{n!}$. In any event, we shall prove this in Example 5.6.3, below. In particular
\[
\frac{1}{e} = e^{-1} = \sum_{n=0}^\infty \frac{(-1)^n}{n!}
= 1 - \frac{1}{1!} + \frac{1}{2!} - \frac{1}{3!} + \frac{1}{4!} - \frac{1}{5!} + \cdots
\]

is an alternating series and satisfies all of the conditions of the alternating series test, The-
orem A.12.1a:

(i) The terms in the series alternate in sign.


(ii) The magnitude of the nth term in the series decreases monotonically as n increases.
(iii) The nth term in the series converges to zero as n Ñ 8.

So the alternating series test guarantees that, if we approximate, for example,

\[
\frac{1}{e} \approx \frac{1}{2!} - \frac{1}{3!} + \frac{1}{4!} - \frac{1}{5!} + \frac{1}{6!} - \frac{1}{7!} + \frac{1}{8!} - \frac{1}{9!}
\]
then the error in this approximation lies between 0 and the next term in the series, which
is $\frac{1}{10!}$. That is
\[
\frac{1}{2!} - \frac{1}{3!} + \frac{1}{4!} - \frac{1}{5!} + \frac{1}{6!} - \frac{1}{7!} + \frac{1}{8!} - \frac{1}{9!}
\ \le\ \frac{1}{e}\ \le\
\frac{1}{2!} - \frac{1}{3!} + \frac{1}{4!} - \frac{1}{5!} + \frac{1}{6!} - \frac{1}{7!} + \frac{1}{8!} - \frac{1}{9!} + \frac{1}{10!}
\]
so that
\[
\frac{1}{\frac{1}{2!} - \frac{1}{3!} + \frac{1}{4!} - \frac{1}{5!} + \frac{1}{6!} - \frac{1}{7!} + \frac{1}{8!} - \frac{1}{9!} + \frac{1}{10!}}
\ \le\ e\ \le\
\frac{1}{\frac{1}{2!} - \frac{1}{3!} + \frac{1}{4!} - \frac{1}{5!} + \frac{1}{6!} - \frac{1}{7!} + \frac{1}{8!} - \frac{1}{9!}}
\]

which, to seven decimal places says

2.7182816 ď e ď2.7182837

(To seven decimal places e = 2.7182818.)


The alternating series test tells us that, for any natural number N, the error that we
make when we approximate $\frac{1}{e}$ by the partial sum $S_N = \sum_{n=0}^{N} \frac{(-1)^n}{n!}$ has magnitude no


larger than $\frac{1}{(N+1)!}$. This tends to zero spectacularly quickly as N increases, simply because
( N + 1)! increases spectacularly quickly as N increases24 . For example 20! « 2.4 ˆ 1027 .
Example A.12.3
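The error statement in this example is easy to test numerically. The following sketch (plain Python, an addition to the text rather than part of it) compares each partial sum of $\sum_{n=0}^\infty \frac{(-1)^n}{n!}$ with $\frac{1}{e}$ and checks that the error never exceeds, in magnitude, the first dropped term.

import math

target = 1.0 / math.e
partial = 0.0
for n in range(12):
    partial += (-1) ** n / math.factorial(n)            # add (-1)^n / n!
    first_dropped = (-1) ** (n + 1) / math.factorial(n + 1)
    error = target - partial
    print(n, error, first_dropped, abs(error) <= abs(first_dropped))
    # the last column should always print True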

Example A.12.4

We will shortly see, in Example 5.5.19, that if ´1 ă x ď 1, then

\[
\ln(1+x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{4} + \cdots = \sum_{n=1}^\infty (-1)^{n-1}\frac{x^n}{n}
\]

Suppose that we have to compute $\ln\frac{11}{10}$ to within an accuracy of $10^{-12}$. Since $\frac{11}{10} = 1 + \frac{1}{10}$,
we can get $\ln\frac{11}{10}$ by evaluating $\ln(1+x)$ at $x = \frac{1}{10}$, so that
\[
\ln\frac{11}{10} = \ln\Big(1 + \frac{1}{10}\Big)
= \frac{1}{10} - \frac{1}{2\times 10^2} + \frac{1}{3\times 10^3} - \frac{1}{4\times 10^4} + \cdots
= \sum_{n=1}^\infty (-1)^{n-1}\frac{1}{n\times 10^n}
\]

By the alternating series test, this series converges. Also by the alternating series test,
approximating $\ln\frac{11}{10}$ by throwing away all but the first N terms
\[
\ln\frac{11}{10} \approx \frac{1}{10} - \frac{1}{2\times 10^2} + \frac{1}{3\times 10^3} - \frac{1}{4\times 10^4} + \cdots + (-1)^{N-1}\frac{1}{N\times 10^N}
= \sum_{n=1}^N (-1)^{n-1}\frac{1}{n\times 10^n}
\]
introduces an error whose magnitude is no more than the magnitude of the first term that
we threw away.
\[
\text{error} \le \frac{1}{(N+1)\times 10^{N+1}}
\]
To achieve an error that is no more than 10´12 , we have to choose N so that
\[
\frac{1}{(N+1)\times 10^{N+1}} \le 10^{-12}
\]
The best way to do so is simply to guess — we are not going to be able to manipulate the
inequality $\frac{1}{(N+1)\times 10^{N+1}} \le \frac{1}{10^{12}}$ into the form $N \le \cdots$, and even if we could, it would not be

worth the effort. We need to choose N so that the denominator ( N + 1) ˆ 10 N +1 is at least


1012 . That is easy, because the denominator contains the factor 10 N +1 which is at least 1012
whenever N + 1 ě 12, i.e. whenever N ě 11. So we will achieve an error of less than
10´12 if we choose N = 11.
\[
\frac{1}{(N+1)\times 10^{N+1}}\bigg|_{N=11} = \frac{1}{12\times 10^{12}} < \frac{1}{10^{12}}
\]
This is not the smallest possible choice of N, but in practice that just doesn’t matter — your
computer is not going to care whether or not you ask it to compute a few extra terms. If

24 The interested reader may wish to check out "Stirling's approximation", which says that $n! \approx \sqrt{2\pi n}\,\big(\frac{n}{e}\big)^n$.


you really need the smallest N that obeys $\frac{1}{(N+1)\times 10^{N+1}} \le \frac{1}{10^{12}}$, you can next just try $N = 10$,
then $N = 9$, and so on.
\begin{align*}
\frac{1}{(N+1)\times 10^{N+1}}\bigg|_{N=11} &= \frac{1}{12\times 10^{12}} < \frac{1}{10^{12}} \\
\frac{1}{(N+1)\times 10^{N+1}}\bigg|_{N=10} &= \frac{1}{11\times 10^{11}} < \frac{1}{10\times 10^{11}} = \frac{1}{10^{12}} \\
\frac{1}{(N+1)\times 10^{N+1}}\bigg|_{N=9} &= \frac{1}{10\times 10^{10}} = \frac{1}{10^{11}} > \frac{1}{10^{12}}
\end{align*}
So in this problem, the smallest acceptable N = 10.
Example A.12.4
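The "just guess" search in this example can also be automated. The sketch below (plain Python, not part of the text) finds the smallest N satisfying the inequality and then compares the resulting partial sum against Python's built-in value of $\ln 1.1$.

import math

N = 1
while 1.0 / ((N + 1) * 10.0 ** (N + 1)) > 1e-12:
    N += 1
print(N)   # 10, the smallest acceptable N found in the example

approx = sum((-1) ** (n - 1) / (n * 10.0 ** n) for n in range(1, N + 1))
print(abs(approx - math.log(1.1)))   # below 1e-12, as the error bound promises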

A.12.2 §§ Alternating Series Test Proof


Theorem A.12.5 (Alternating Series Test).

Let $\{a_n\}_{n=1}^\infty$ be a sequence of real numbers that obeys

(i) an ě 0 for all n ě 1 and


(ii) an+1 ď an for all n ě 1 (i.e. the sequence is monotone decreasing) and
(iii) limnÑ8 an = 0.

Then
\[
a_1 - a_2 + a_3 - a_4 + \cdots = \sum_{n=1}^\infty (-1)^{n-1} a_n = S
\]
converges and, for each natural number N, $S - S_N$ is between 0 and (the first
dropped term) $(-1)^N a_{N+1}$. Here $S_N$ is, as previously, the $N^{\rm th}$ partial sum
$\sum_{n=1}^N (-1)^{n-1} a_n$.

Proof. Let 2n be an even natural number. Then the 2nth partial sum obeys
\begin{align*}
S_{2n} &= (a_1 - a_2) + (a_3 - a_4) + \cdots + (a_{2n-1} - a_{2n}) \\
&\le (a_1 - a_2) + (a_3 - a_4) + \cdots + (a_{2n-1} - a_{2n}) + (a_{2n+1} - a_{2n+2}) = S_{2(n+1)}
\end{align*}
and
\[
S_{2n} = a_1 - (a_2 - a_3) - (a_4 - a_5) - \cdots - (a_{2n-2} - a_{2n-1}) - a_{2n} \le a_1
\]
since each bracketed difference, and $a_{2n}$, is at least zero.


So the sequence S2 , S4 , S6 , ¨ ¨ ¨ of even partial sums is a bounded, increasing sequence and


hence converges to some real number S. Since $S_{2n+1} = S_{2n} + a_{2n+1}$ and $a_{2n+1}$ converges to
zero as $n \to \infty$, the odd partial sums $S_{2n+1}$ also converge to S. That $S - S_N$ is between 0
and (the first dropped term) (´1) N a N +1 was already proved in §A.12.1.

Section A.12 of this work was adapted from Sections 3.3.4 and 3.3.10 of CLP 2 – Inte-
gral Calculus by Feldman, Rechnitzer, and Yeager under a Creative Commons Attribution-
NonCommercial-ShareAlike 4.0 International license.

A.13IJ Delicacy of Conditional Convergence


Conditionally convergent series have to be treated with great care. For example, switching
the order of the terms in a finite sum does not change its value.

1+2+3+4+5+6 = 6+3+5+2+4+1

The same is true for absolutely convergent series. But it is not true for conditionally con-
vergent series. In fact by reordering any conditionally convergent series, you can make it
add up to any number you like, including +8 and ´8. This very strange result is known
as Riemann’s rearrangement Theorem, named after Bernhard Riemann (1826–1866). The
following example illustrates the phenomenon.
Example A.13.1

The alternating Harmonic series


\[
\sum_{n=1}^\infty (-1)^{n-1}\frac{1}{n}
\]

is a very good example of conditional convergence. We can show, quite explicitly, how
we can rearrange the terms to make it add up to two different numbers. Later, in Exam-
ple 5.5.19, we’ll show that this series is equal to ln 2. However, by rearranging the terms
we can make it sum to $\frac{1}{2}\ln 2$. The usual order is
\[
\frac{1}{1} - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \frac{1}{5} - \frac{1}{6} + \cdots
\]
For the moment think of the terms being paired as follows:
\[
\Big(\frac{1}{1} - \frac{1}{2}\Big) + \Big(\frac{1}{3} - \frac{1}{4}\Big) + \Big(\frac{1}{5} - \frac{1}{6}\Big) + \cdots
\]
so the denominators go odd-even odd-even. Now rearrange the terms so the denominators are odd-even-even odd-even-even:
\[
\Big(1 - \frac{1}{2} - \frac{1}{4}\Big) + \Big(\frac{1}{3} - \frac{1}{6} - \frac{1}{8}\Big) + \Big(\frac{1}{5} - \frac{1}{10} - \frac{1}{12}\Big) + \cdots
\]


Now notice that the first term in each triple is exactly twice the second term. If we now
combine those terms we get
\begin{align*}
&\Big(\underbrace{1 - \tfrac{1}{2}}_{=1/2} - \tfrac{1}{4}\Big) + \Big(\underbrace{\tfrac{1}{3} - \tfrac{1}{6}}_{=1/6} - \tfrac{1}{8}\Big) + \Big(\underbrace{\tfrac{1}{5} - \tfrac{1}{10}}_{=1/10} - \tfrac{1}{12}\Big) + \cdots \\
&= \Big(\frac{1}{2} - \frac{1}{4}\Big) + \Big(\frac{1}{6} - \frac{1}{8}\Big) + \Big(\frac{1}{10} - \frac{1}{12}\Big) + \cdots
\end{align*}
We can now extract a factor of $\frac{1}{2}$ from each term, so
\begin{align*}
&= \frac{1}{2}\Big(\frac{1}{1} - \frac{1}{2}\Big) + \frac{1}{2}\Big(\frac{1}{3} - \frac{1}{4}\Big) + \frac{1}{2}\Big(\frac{1}{5} - \frac{1}{6}\Big) + \cdots \\
&= \frac{1}{2}\bigg[\Big(\frac{1}{1} - \frac{1}{2}\Big) + \Big(\frac{1}{3} - \frac{1}{4}\Big) + \Big(\frac{1}{5} - \frac{1}{6}\Big) + \cdots\bigg]
\end{align*}
So by rearranging the terms, the sum of the series is now exactly half the original sum!
Example A.13.1

In fact, we can go even further, and show how we can rearrange the terms of the
alternating harmonic series to add up to any given number25 . For the purposes of the
example we have chosen 1.234, but it could really be any number. The example below can
actually be formalised to give a proof of the rearrangement Theorem.
Example A.13.2

We'll show how to reorder the conditionally convergent series $\sum_{n=1}^\infty (-1)^{n-1}\frac{1}{n}$ so that it
adds up to exactly 1.234 (but the reader should keep in mind that any fixed number will
work).
• First create two lists of numbers — the first list consisting of the positive terms of the
series, in order, and the second consisting of the negative numbers of the series, in
order.
\[
1,\ \frac{1}{3},\ \frac{1}{5},\ \frac{1}{7},\ \cdots
\qquad\text{and}\qquad
-\frac{1}{2},\ -\frac{1}{4},\ -\frac{1}{6},\ \cdots
\]
• Notice that if we add together the numbers in the second list, we get
\[
-\frac{1}{2}\Big[1 + \frac{1}{2} + \frac{1}{3} + \cdots\Big]
\]
which is just $-\frac{1}{2}$ times the harmonic series. So the numbers in the second list add up
to $-\infty$.

25 This is reminiscent of the accounting trick of pushing all the company’s debts off to next year so that
this year’s accounts look really good and you can collect your bonus.


Also, if we add together the numbers in the first list, we get

\[
1 + \frac{1}{3} + \frac{1}{5} + \frac{1}{7} + \cdots
\qquad\text{which is greater than}\qquad
\frac{1}{2} + \frac{1}{4} + \frac{1}{6} + \frac{1}{8} + \cdots
\]
That is, the sum of the first set of numbers must be bigger than the sum of the second
set of numbers (which is just ´1 times the second list). So the numbers in the first
list add up to +8.

• Now we build up our reordered series. Start by moving just enough numbers from
the beginning of the first list into the reordered series to get a sum bigger than 1.234.

\[
1 + \frac{1}{3} = 1.3333
\]
We know that we can do this, because the sum of the terms in the first list diverges
to +8.

• Next move just enough numbers from the beginning of the second list into the re-
ordered series to get a number less than 1.234.

\[
1 + \frac{1}{3} - \frac{1}{2} = 0.8333
\]
Again, we know that we can do this because the sum of the numbers in the second
list diverges to ´8.

• Next move just enough numbers from the beginning of the remaining part of the
first list into the reordered series to get a number bigger than 1.234.

\[
1 + \frac{1}{3} - \frac{1}{2} + \frac{1}{5} + \frac{1}{7} + \frac{1}{9} = 1.2873
\]
Again, this is possible because the sum of the numbers in the first list diverges. Even
though we have already used the first few numbers, the sum of the rest of the list
will still diverge.

• Next move just enough numbers from the beginning of the remaining part of the
second list into the reordered series to get a number less than 1.234.

\[
1 + \frac{1}{3} - \frac{1}{2} + \frac{1}{5} + \frac{1}{7} + \frac{1}{9} - \frac{1}{4} = 1.0373
\]

• At this point the idea is clear, just keep going like this. At the end of each step,
the difference between the sum and 1.234 is smaller than the magnitude of the first
unused number in the lists. Since the numbers in both lists tend to zero as you go
farther and farther up the list, this procedure will generate a series whose sum is
exactly 1.234. Since in each step we remove at least one number from a list and we
alternate between the two lists, the reordered series will contain all of the terms from
$\sum_{n=1}^\infty (-1)^{n-1}\frac{1}{n}$, with each term appearing exactly once.


Example A.13.2
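The procedure in this example is easy to mechanize. The sketch below (plain Python, offered only as an illustration) greedily takes positive terms $1, \frac{1}{3}, \frac{1}{5}, \dots$ while the running total is at or below the target 1.234, and negative terms $-\frac{1}{2}, -\frac{1}{4}, \dots$ otherwise; its first few running totals reproduce the numbers 1.3333, 0.8333, 1.2873 and 1.0373 computed above.

target = 1.234
total = 0.0
pos, neg = 1, 2          # next unused odd and even denominators

for _ in range(10000):
    if total <= target:
        term = 1.0 / pos     # take the next positive term
        pos += 2
    else:
        term = -1.0 / neg    # take the next negative term
        neg += 2
    total += term

print(total)   # approaches 1.234 as more and more terms are taken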

Section A.13 of this work was adapted from Section 3.4.2 of CLP 2 – Integral Calculus
by Feldman, Rechnitzer, and Yeager under a Creative Commons Attribution-NonCommercial-
ShareAlike 4.0 International license.

Appendix B

H IGH SCHOOL MATERIAL

This chapter is really split into three parts.


• Sections B.1 to B.11 contain results that we expect you to understand and know.
• Then Section B.14 contains results that we don’t expect you to memorise, but that
we think you should be able to quickly derive from other results you know.
• The remaining sections contain some material (that may be new to you) that is re-
lated to topics covered in the main body of these notes.

B.1IJ Similar Triangles

Two triangles T1 , T2 are similar when


• (AAA — angle angle angle) The angles of T1 are the same as the angles of T2 .
• (SSS — side side side) The ratios of the side lengths are the same. That is
\[
\frac{A}{a} = \frac{B}{b} = \frac{C}{c}
\]

• (SAS — side angle side) Two sides have lengths in the same ratio and the angle
between them is the same. For example
\[
\frac{A}{a} = \frac{C}{c} \qquad\text{and the angle } \beta \text{ is the same}
\]


B.2IJ Pythagoras
For a right-angled triangle the length of the hypotenuse is related to the lengths of the
other two sides by

(adjacent)2 + (opposite)2 = (hypotenuse)2

B.3IJ Trigonometry — Definitions

\[
\sin\theta = \frac{\text{opposite}}{\text{hypotenuse}} \qquad\qquad \csc\theta = \frac{1}{\sin\theta}
\]
\[
\cos\theta = \frac{\text{adjacent}}{\text{hypotenuse}} \qquad\qquad \sec\theta = \frac{1}{\cos\theta}
\]
\[
\tan\theta = \frac{\text{opposite}}{\text{adjacent}} \qquad\qquad \cot\theta = \frac{1}{\tan\theta}
\]

B.4IJ Radians, Arcs and Sectors

For a circle of radius r and angle of θ radians:

• Arc length L(θ ) = rθ.

• Area of sector $A(\theta) = \frac{\theta}{2} r^2$.


B.5IJ Trigonometry — Graphs

Graphs of $\sin\theta$, $\cos\theta$ and $\tan\theta$ for $-\pi \le \theta \le 2\pi$ (figures omitted).

B.6IJ Trigonometry — Special Triangles

From the standard pair of special triangles (the 45°–45°–90° and 30°–60°–90° right triangles; figures omitted) we have
\[
\sin\frac{\pi}{4} = \frac{1}{\sqrt{2}} \qquad \sin\frac{\pi}{6} = \frac{1}{2} \qquad \sin\frac{\pi}{3} = \frac{\sqrt{3}}{2}
\]
\[
\cos\frac{\pi}{4} = \frac{1}{\sqrt{2}} \qquad \cos\frac{\pi}{6} = \frac{\sqrt{3}}{2} \qquad \cos\frac{\pi}{3} = \frac{1}{2}
\]
\[
\tan\frac{\pi}{4} = 1 \qquad \tan\frac{\pi}{6} = \frac{1}{\sqrt{3}} \qquad \tan\frac{\pi}{3} = \sqrt{3}
\]

B.7IJ Trigonometry — Simple Identities


• Periodicity
sin(θ + 2π ) = sin(θ ) cos(θ + 2π ) = cos(θ )

• Reflection
sin(´θ ) = ´ sin(θ ) cos(´θ ) = cos(θ )

• Reflection around π/4


 
\[
\sin\Big(\frac{\pi}{2} - \theta\Big) = \cos\theta \qquad\qquad \cos\Big(\frac{\pi}{2} - \theta\Big) = \sin\theta
\]


• Reflection around π/2

sin (π ´ θ ) = sin θ cos (π ´ θ ) = ´ cos θ

• Rotation by π

sin (θ + π ) = ´ sin θ cos (θ + π ) = ´ cos θ

• Pythagoras

sin2 θ + cos2 θ = 1

B.8IJ Trigonometry — Add and Subtract Angles


• Sine

sin(α ˘ β) = sin(α) cos( β) ˘ cos(α) sin( β)

• Cosine

cos(α ˘ β) = cos(α) cos( β) ¯ sin(α) sin( β)

B.9IJ Inverse Trigonometric Functions


Some of you may not have studied inverse trigonometric functions in highschool, how-
ever we still expect you to know them by the end of the course.

arcsin x: Domain $-1 \le x \le 1$; Range $-\frac{\pi}{2} \le \arcsin x \le \frac{\pi}{2}$.

arccos x: Domain $-1 \le x \le 1$; Range $0 \le \arccos x \le \pi$.

arctan x: Domain all real numbers; Range $-\frac{\pi}{2} < \arctan x < \frac{\pi}{2}$.

(Graphs omitted.)

Since these functions are inverses of each other we have


\begin{align*}
\arcsin(\sin\theta) &= \theta & &-\tfrac{\pi}{2} \le \theta \le \tfrac{\pi}{2} \\
\arccos(\cos\theta) &= \theta & &0 \le \theta \le \pi \\
\arctan(\tan\theta) &= \theta & &-\tfrac{\pi}{2} \le \theta \le \tfrac{\pi}{2}
\end{align*}


and also

sin(arcsin x ) = x ´1 ď x ď 1
cos(arccos x ) = x ´1 ď x ď 1
tan(arctan x ) = x any real x

arccsc x: Domain $|x| \ge 1$; Range $-\frac{\pi}{2} \le \operatorname{arccsc} x \le \frac{\pi}{2}$, $\operatorname{arccsc} x \ne 0$.

arcsec x: Domain $|x| \ge 1$; Range $0 \le \operatorname{arcsec} x \le \pi$, $\operatorname{arcsec} x \ne \frac{\pi}{2}$.

arccot x: Domain all real numbers; Range $0 < \operatorname{arccot} x < \pi$.

(Graphs omitted.)

Again
\begin{align*}
\operatorname{arccsc}(\csc\theta) &= \theta & &-\tfrac{\pi}{2} \le \theta \le \tfrac{\pi}{2},\ \theta \ne 0 \\
\operatorname{arcsec}(\sec\theta) &= \theta & &0 \le \theta \le \pi,\ \theta \ne \tfrac{\pi}{2} \\
\operatorname{arccot}(\cot\theta) &= \theta & &0 < \theta < \pi
\end{align*}

and

csc(arccsc x ) = x |x| ě 1
sec(arcsec x ) = x |x| ě 1
cot(arccot x ) = x any real x

B.10IJ Areas

• Area of a rectangle

A = bh


• Area of a triangle

\[
A = \frac{1}{2}bh = \frac{1}{2}ab\sin\theta
\]

• Area of a circle

A = πr2

• Area of an ellipse

A = πab

B.11IJ Volumes

• Volume of a rectangular prism

V = lwh

• Volume of a cylinder

V = πr2 h

• Volume of a cone
\[
V = \frac{1}{3}\pi r^2 h
\]

• Volume of a sphere

\[
V = \frac{4}{3}\pi r^3
\]

B.12IJ Powers
In the following, x and y are arbitrary real numbers, and q is an arbitrary constant that is
strictly bigger than zero.

• q0 = 1


• $q^{x+y} = q^x q^y$, \ $q^{x-y} = \dfrac{q^x}{q^y}$

• $q^{-x} = \dfrac{1}{q^x}$

• $\big(q^x\big)^y = q^{xy}$

• $\lim_{x\to\infty} q^x = \infty$, \ $\lim_{x\to-\infty} q^x = 0$ if $q > 1$

• $\lim_{x\to\infty} q^x = 0$, \ $\lim_{x\to-\infty} q^x = \infty$ if $0 < q < 1$

• The graph of $2^x$ is an increasing curve passing through $(0,1)$ and $(1,2)$ (figure omitted). The graph of $q^x$, for any $q > 1$, is similar.

B.13IJ Logarithms
In the following, x and y are arbitrary real numbers that are strictly bigger than 0, and p
and q are arbitrary constants that are strictly bigger than one.

• $q^{\log_q x} = x$, \ $\log_q\big(q^x\big) = x$

• $\log_q x = \dfrac{\log_p x}{\log_p q}$

• $\log_q 1 = 0$, \ $\log_q q = 1$

• $\log_q(xy) = \log_q x + \log_q y$

• $\log_q\big(\frac{x}{y}\big) = \log_q x - \log_q y$

• $\log_q\big(\frac{1}{y}\big) = -\log_q y$

• $\log_q\big(x^y\big) = y\log_q x$

• $\lim_{x\to\infty}\log_q x = \infty$, \ $\lim_{x\to 0}\log_q x = -\infty$

• The graph of $\log_{10}x$ is an increasing curve passing through $(1,0)$ and $(10,1)$ (figure omitted). The graph of $\log_q x$, for any $q > 1$, is similar.



B.14IJ Highschool Material You Should be Able to Derive


• Graphs of csc θ, sec θ and cot θ (figures omitted).

• More Pythagoras: dividing $\sin^2\theta + \cos^2\theta = 1$ by $\cos^2\theta$ gives $\tan^2\theta + 1 = \sec^2\theta$, and dividing it by $\sin^2\theta$ gives $1 + \cot^2\theta = \csc^2\theta$.

• Sine — double angle (set β = α in sine angle addition formula)


sin(2α) = 2 sin(α) cos(α)

• Cosine — double angle (set β = α in cosine angle addition formula)


cos(2α) = cos2 (α) ´ sin2 (α)
= 2 cos2 (α) ´ 1 (use sin2 (α) = 1 ´ cos2 (α))
= 1 ´ 2 sin2 (α) (use cos2 (α) = 1 ´ sin2 (α))

• Composition of trigonometric and inverse trigonometric functions:


\[
\cos(\arcsin x) = \sqrt{1 - x^2} \qquad\qquad \sec(\arctan x) = \sqrt{1 + x^2}
\]
and similar expressions.


B.15IJ Cartesian Coordinates


Each point in two dimensions may be labeled by two coordinates ( x, y) which specify the
position of the point in some units with respect to some axes as in the figure below.


The set of all points in two dimensions is denoted R2 . Observe that


• the distance from the point ( x, y) to the x–axis is |y|
• the distance from the point ( x, y) to the y–axis is |x|
• the distance from the point $(x, y)$ to the origin $(0, 0)$ is $\sqrt{x^2 + y^2}$

Similarly, each point in three dimensions may be labeled by three coordinates ( x, y, z),
as in the two figures below.


The set of all points in three dimensions is denoted R3 . The plane that contains, for exam-
ple, the x– and y–axes is called the xy–plane.
• The xy–plane is the set of all points ( x, y, z) that obey z = 0.
• The xz–plane is the set of all points ( x, y, z) that obey y = 0.
• The yz–plane is the set of all points ( x, y, z) that obey x = 0.
More generally,
• The set of all points ( x, y, z) that obey z = c is a plane that is parallel to the xy–plane
and is a distance |c| from it. If c ą 0, the plane z = c is above the xy–plane. If
c ă 0, the plane z = c is below the xy–plane. We say that the plane z = c is a signed
distance c from the xy–plane.


• The set of all points ( x, y, z) that obey y = b is a plane that is parallel to the xz–plane
and is a signed distance b from it.
• The set of all points ( x, y, z) that obey x = a is a plane that is parallel to the yz–plane
and is a signed distance a from it.


Observe that
• the distance from the point ( x, y, z) to the xy–plane is |z|
• the distance from the point ( x, y, z) to the xz–plane is |y|
• the distance from the point ( x, y, z) to the yz–plane is |x|
• the distance from the point $(x, y, z)$ to the origin $(0, 0, 0)$ is $\sqrt{x^2 + y^2 + z^2}$

The distance from the point $(x, y, z)$ to the point $(x', y', z')$ is
\[
\sqrt{(x - x')^2 + (y - y')^2 + (z - z')^2}
\]

so that the equation of the sphere centered on (1, 2, 3) with radius 4, that is, the set of all
points ( x, y, z) whose distance from (1, 2, 3) is 4, is

( x ´ 1)2 + (y ´ 2)2 + (z ´ 3)2 = 16

B.16IJ Roots of Polynomials


Being able to factor polynomials is a very important part of many of the computations in
this course. Related to this is the process of finding roots (or zeros) of polynomials. That
is, given a polynomial P( x ), find all numbers r so that P(r ) = 0.
In the case of a quadratic P( x ) = ax2 + bx + c, we can use the formula
\[
x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}
\]
The corresponding formulas for cubics and quartics1 are extremely cumbersome, and no
such formula exists for polynomials of degree 5 and higher2 .

1 The method for cubics was developed in the 15th century by del Ferro, Cardano and Ferrari (Cardano’s
student). Ferrari then went on to discover a formula for the roots of a quartic. His formula requires the
solution of an associated cubic polynomial.
2 This is the famous Abel-Ruffini Theorem.


Despite this there are many tricks3 for finding roots of polynomials that work well in
some situations but not all. Here we describe approaches that will help you find integer
and rational roots of polynomials that will work well on exams, quizzes and homework
assignments.
Consider the quadratic equation x2 ´ 5x + 6 = 0. We could4 solve this using the
quadratic formula
\[
x = \frac{5 \pm \sqrt{25 - 4\times 1\times 6}}{2} = \frac{5 \pm 1}{2} = 2,\ 3.
\]

Hence x2 ´ 5x + 6 has roots x = 2, 3 and so it factors as ( x ´ 3)( x ´ 2). Notice5 that the
numbers 2 and 3 divide the constant term of the polynomial, 6. This happens in general
and forms the basis of our first trick.

TrickB.16.1 (A very useful trick).

If r or ´r is an integer root of a polynomial P( x ) = an x n + ¨ ¨ ¨ + a1 x + a0 with


integer coefficients, then r is a factor of the constant term a0 .

Proof. If r is a root of the polynomial we know that P(r ) = 0. Hence

a n ¨ r n + ¨ ¨ ¨ + a1 ¨ r + a0 = 0

If we isolate a0 in this expression we get


 
a0 = ´ a n r n + ¨ ¨ ¨ + a1 r

We can see that r divides every term on the right-hand side. This means that the right-
hand side is an integer times r. Thus the left-hand side, being a0 , is an integer times r, as
required. The argument for when ´r is a root is almost identical.

Let us put this observation to work.


Example B.16.1

Find the integer roots of P( x ) = x3 ´ x2 + 2.


Solution.

• The constant term in this polynomial is 2.

• The only divisors of 2 are 1, 2. So the only candidates for integer roots are ˘1, ˘2.

3 There is actually a large body of mathematics devoted to developing methods for factoring polyno-
mials. Polynomial factorisation is a fundamental problem for most computer algebra systems. The
interested reader should make use of their favourite search engine to find out more.
4 We probably shouldn’t do it this way for such a simple polynomial, but for pedagogical purposes we
do here.
5 Many of you may have been taught this approach in highschool.


• Trying each in turn

P (1) = 2 P(´1) = 0
P (2) = 6 P(´2) = ´10

• Thus the only integer root is ´1.

Example B.16.1

Example B.16.2

Find the integer roots of P( x ) = 3x3 + 8x2 ´ 5x ´ 6.


Solution.

• The constant term is ´6.

• The divisors of 6 are 1, 2, 3, 6. So the only candidates for integer roots are ˘1, ˘2, ˘3, ˘6.

• We try each in turn (it is tedious but not difficult):

P (1) =0 P(´1) =4
P (2) = 40 P(´2) = 12
P (3) = 132 P(´3) =0
P (6) = 900 P(´6) = ´336

• Thus the only integer roots are 1 and ´3.

Example B.16.2
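Trick B.16.1 translates directly into a short search. The sketch below (plain Python, evaluating the polynomial by Horner's rule; it is an illustration, not something the text requires) tries $\pm r$ for every divisor $r$ of the constant term and reproduces the answers of Examples B.16.1 and B.16.2.

def integer_roots(coeffs):
    """coeffs = [a_n, ..., a_1, a_0] with integer entries and a_0 != 0."""
    def P(x):
        value = 0
        for c in coeffs:
            value = value * x + c          # Horner's rule
        return value
    a0 = abs(coeffs[-1])
    divisors = [r for r in range(1, a0 + 1) if a0 % r == 0]
    return [r for d in divisors for r in (d, -d) if P(r) == 0]

print(integer_roots([1, -1, 0, 2]))      # x^3 - x^2 + 2        ->  [-1]
print(integer_roots([3, 8, -5, -6]))     # 3x^3 + 8x^2 - 5x - 6 ->  [1, -3]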

We can generalise this approach in order to find rational roots. Consider the polyno-
mial 6x2 ´ x ´ 2. We can find its zeros using the quadratic formula:
\[
x = \frac{1 \pm \sqrt{1 + 48}}{12} = \frac{1 \pm 7}{12} = -\frac{1}{2},\ \frac{2}{3}.
\]
Notice now that the numerators, 1 and 2, both divide the constant term of the polynomial
(being 2). Similarly, the denominators, 2 and 3, both divide the coefficient of the highest
power of x (being 6). This is quite general.

TrickB.16.2 (Another nice trick).

If b/d or ´b/d is a rational root in lowest terms (i.e. b and d are integers with
no common factors) of a polynomial Q( x ) = an x n + ¨ ¨ ¨ + a1 x + a0 with inte-
ger coefficients, then the numerator b is a factor of the constant term a0 and the
denominator d is a factor of an .


Proof. Since b/d is a root of P( x ) we know that

an (b/d)n + ¨ ¨ ¨ + a1 (b/d) + a0 = 0

Multiply this equation through by dn to get

an bn + ¨ ¨ ¨ + a1 bdn´1 + a0 dn = 0

Move terms around to isolate a0 dn :


 
a0 dn = ´ an bn + ¨ ¨ ¨ + a1 bdn´1

Now every term on the right-hand side is some integer times b. Thus the left-hand side
must also be an integer times b. We know that d does not contain any factors of b, hence
a0 must be some integer times b (as required).
Similarly we can isolate the term an bn :
 
an bn = ´ an´1 bn´1 d + ¨ ¨ ¨ + a1 bdn´1 + a0 dn

Now every term on the right-hand side is some integer times d. Thus the left-hand side
must also be an integer times d. We know that b does not contain any factors of d, hence
an must be some integer times d (as required).
The argument when ´b/d is a root is nearly identical.
We should put this to work:
Example B.16.3

P( x ) = 2x2 ´ x ´ 3.
Solution.
• The constant term in this polynomial is 3 = 1 ˆ 3 and the coefficient of the highest
power of x is 2 = 1 ˆ 2.

• Thus the only candidates for integer roots are ˘1, ˘3.

• By our newest trick, the only candidates for fractional roots are $\pm\frac{1}{2}$, $\pm\frac{3}{2}$.

• We try each in turn6

\begin{align*}
P(1) &= -2 & P(-1) &= 0 \\
P(3) &= 12 & P(-3) &= 18 \\
P\big(\tfrac{1}{2}\big) &= -3 & P\big(-\tfrac{1}{2}\big) &= -2 \\
P\big(\tfrac{3}{2}\big) &= 0 & P\big(-\tfrac{3}{2}\big) &= 3
\end{align*}
so the roots are $-1$ and $\frac{3}{2}$.

6 Again, this is a little tedious, but not difficult. Its actually pretty easy to code up for a computer to do.
Modern polynomial factoring algorithms do more sophisticated things, but these are a pretty good way
to start.


Example B.16.3
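Trick B.16.2 can be automated the same way. The sketch below (again plain Python, purely illustrative) uses exact fractions so that the test $P(b/d)=0$ is not spoiled by rounding, and recovers the roots found in this example.

from fractions import Fraction

def rational_roots(coeffs):
    """coeffs = [a_n, ..., a_1, a_0], integers with a_n != 0 and a_0 != 0."""
    def P(x):
        value = Fraction(0)
        for c in coeffs:
            value = value * x + c          # Horner's rule, exact arithmetic
        return value
    def divisors(m):
        m = abs(m)
        return [k for k in range(1, m + 1) if m % k == 0]
    roots = set()
    for b in divisors(coeffs[-1]):         # numerators divide a_0
        for d in divisors(coeffs[0]):      # denominators divide a_n
            for cand in (Fraction(b, d), Fraction(-b, d)):
                if P(cand) == 0:
                    roots.add(cand)
    return sorted(roots)

print(rational_roots([2, -1, -3]))   # [Fraction(-1, 1), Fraction(3, 2)], i.e. -1 and 3/2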

The tricks above help us to find integer and rational roots of polynomials. With a little
extra work we can extend those methods to help us factor polynomials. Say we have a
polynomial P( x ) of degree p and have established that r is one of its roots. That is, we
know P(r ) = 0. Then we can factor ( x ´ r ) out from P( x ) — it is always possible to find a
polynomial Q( x ) of degree p ´ 1 so that

P( x ) = ( x ´ r ) Q( x )

In sufficiently simple cases, you can probably do this factoring by inspection. For
example, P( x ) = x2 ´ 4 has r = 2 as a root because P(2) = 22 ´ 4 = 0. In this case,
P( x ) = ( x ´ 2)( x + 2) so that Q( x ) = ( x + 2). As another example, P( x ) = x2 ´ 2x ´ 3
has r = ´1 as a root because P(´1) = (´1)2 ´ 2(´1) ´ 3 = 1 + 2 ´ 3 = 0. In this case,
P( x ) = ( x + 1)( x ´ 3) so that Q( x ) = ( x ´ 3).
For higher degree polynomials we need to use something more systematic — long
division.
TrickB.16.3 (Long Division).

Once you have found a root r of a polynomial, even if you cannot factor ( x ´ r )
out of the polynomial by inspection, you can find Q( x ) by dividing P( x ) by x ´ r,
using the long division algorithm you learned7 in school, but with 10 replaced
by x.

Example B.16.4

Factor P( x ) = x3 ´ x2 + 2.
Solution.
• We can go hunting for integer roots of the polynomial by looking at the divisors of
the constant term. This tells us to try x = ˘1, ˘2.
• A quick computation shows that P(´1) = 0 while P(1), P(´2), P(2) ‰ 0. Hence
x = ´1 is a root of the polynomial and so x + 1 must be a factor.
• So we divide $\frac{x^3 - x^2 + 2}{x + 1}$. The first term, $x^2$, in the quotient is chosen so that when you
multiply it by the denominator, x2 ( x + 1) = x3 + x2 , the leading term, x3 , matches
the leading term in the numerator, x3 ´ x2 + 2, exactly.

x2
x + 1 x3 − x2 + 2
x3 + x2 x2 (x + 1)

7 This is a standard part of most highschool mathematics curricula, but perhaps not all. You should revise
this carefully.


• When you subtract x2 ( x + 1) = x3 + x2 from the numerator x3 ´ x2 + 2 you get the


remainder ´2x2 + 2. Just like in public school, the 2 is not normally “brought down”
until it is actually needed.

x2
x + 1 x3 − x2 + 2
x3 + x2 x2 (x + 1)
−2x2

• The next term, ´2x, in the quotient is chosen so that when you multiply it by the
denominator, ´2x ( x + 1) = ´2x2 ´ 2x, the leading term ´2x2 matches the leading
term in the remainder exactly.

x2 − 2x
x + 1 x3 − x2 + 2
x3 + x2 x2 (x + 1)
−2x2
−2x2 − 2x −2x(x + 1)

And so on.

x2 − 2x + 2
x + 1 x3 − x2 + 2
x3 + x2 x2 (x + 1)
−2x2
−2x2 − 2x −2x(x + 1)
2x + 2
2x + 2 2(x + 1)
0

• Note that we finally end up with a remainder 0. A nonzero remainder would have
signalled a computational error, since we know that the denominator x ´ (´1) must
divide the numerator x3 ´ x2 + 2 exactly.

• We conclude that

( x + 1)( x2 ´ 2x + 2) = x3 ´ x2 + 2

To check this, just multiply out the left hand side explicitly.
• Applying the high school quadratic root formula $x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$ to $x^2 - 2x + 2$ tells us
that it has no real roots and that we cannot factor it further8 .

Example B.16.4

8 Because we are not permitted to use complex numbers.
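If you have a computer algebra system handy, the long division above is a one-liner. The sketch below uses SymPy's polynomial division (an outside tool, not assumed by this appendix) to confirm the quotient and the zero remainder.

import sympy as sp

x = sp.symbols('x')
quotient, remainder = sp.div(x**3 - x**2 + 2, x + 1, x)
print(quotient, remainder)   # x**2 - 2*x + 2  and  0, as found by hand above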


We finish by describing an alternative to long division. The approach is roughly equiv-


alent, but is perhaps more straightforward at the expense of requiring more algebra.
Example B.16.5

Factor P( x ) = x3 ´ x2 + 2, again.
Solution. Let us do this again but avoid long division.
• From the previous example, we know that $\frac{x^3 - x^2 + 2}{x + 1}$ must be a polynomial (since $-1$
is a root of the numerator) of degree 2. So write
\[
\frac{x^3 - x^2 + 2}{x + 1} = ax^2 + bx + c
\]
for some, as yet unknown, coefficients a, b and c.

• Cross multiplying and simplifying gives us

x3 ´ x2 + 2 = ( ax2 + bx + c)( x + 1)
= ax3 + ( a + b) x2 + (b + c) x + c

• Now matching coefficients of the various powers of x on the left and right hand
sides

coefficient of x3 : a=1
coefficient of x2 : a + b = ´1
coefficient of x1 : b+c = 0
coefficient of x0 : c=2

• This gives us a system of equations that we can solve quite directly. Indeed it tells
us immediately that a = 1 and c = 2. Subbing a = 1 into a + b = ´1 tells us
that 1 + b = ´1 and hence b = ´2.

• Thus

x3 ´ x2 + 2 = ( x + 1)( x2 ´ 2x + 2).

Example B.16.5

Appendix B of this work was taken from Appendix A of CLP 2 – Integral Calculus by
Feldman, Rechnitzer, and Yeager under a Creative Commons Attribution-NonCommercial-
ShareAlike 4.0 International license.
