18.01 Calculus Jason Starr
Fall 2005

Math 18.01 Lecture Summaries

Homework. These are the problems from the assigned Problem Set which can be completed using
the material from that date’s lecture.
Practice Problems. Practice problems are not to be written up or turned in. These are assigned
only for practice, and are entirely voluntary. Problems listed as “1B1”, for example, are taken
from Section E of the 18.01 course reader.
Lecture 1. Sept. 8 Velocity and derivatives

Lecture 2. Sept. 9 Limits

Lecture 3. Sept. 13 Rules of differentiation

Lecture 4. Sept. 15 The chain rule and implicit differentiation

Lecture 5. Sept. 16 The derivatives of exponential and logarithm functions

Lecture 6. Sept. 20 The derivatives of trigonometric functions

Lecture 7. Sept. 22 Review for Exam 1

Lecture 8. Sept. 27 Linear and quadratic approximations

Lecture 9. Sept. 29 Sketching curves

Lecture 10. Sept. 30 Applied maximum/minimum problems

Lecture 11. Oct. 4 Related rates problems

Lecture 12. Oct. 6 Newton’s method

Lecture 13. Oct. 13 Antidifferentiation

Lecture 14. Oct. 14 Riemann integrals

Lecture 15. Oct. 18 The Fundamental Theorem of Calculus

Lecture 16. Oct. 20 Properties of the Riemann integral

Lecture 17. Oct. 21 Separable ordinary differential equations

Lecture 18. Oct. 25 Numerical integration

Lecture 19. Oct. 28 Applications of integration to volumes

Lecture 20. Nov. 1 Averages and volumes by shells

Lecture 21. Nov. 3 Parametric equation curves and arc length

Lecture 22. Nov. 4 Area of a surface of revolution and polar coordinate curves

Lecture 23. Nov. 8 Tangent lines, arc length and areas for polar curves

Lecture 24. Nov. 15 Inverse trigonometric functions and hyperbolic functions

Lecture 25. Nov. 17 Inverse hyperbolic functions and inverse substitution

Lecture 26. Nov. 18 Partial fraction decomposition

Lecture 27. Nov. 22 Integration by parts


Lecture 28. Dec. 1 L’Hospital’s rule

Lecture 29. Dec. 2 Improper integrals

Lecture 30. Dec. 6 Sequences and series

Lecture 31. Dec. 8 Power series and Taylor series

Lecture 32. Dec. 8 More Taylor series and review

Lecture 1. September 8, 2005

Homework. Problem Set 1 Part I: (a)–(e); Part II: Problems 1 and 2.

Practice Problems. Course Reader: 1B1, 1B2

Textbook: p. 68, Problems 1–7 and 15.

1. Velocity. Displacement is $s(t)$. The increment from $t_0$ to $t_0 + \Delta t$ is
$$\Delta s = s(t_0 + \Delta t) - s(t_0).$$
The average velocity from $t_0$ to $t_0 + \Delta t$ is
$$v_{\text{ave}} = \frac{\Delta s}{\Delta t} = \frac{s(t_0 + \Delta t) - s(t_0)}{\Delta t}.$$
The velocity, or instantaneous velocity, at $t_0$ is
$$v(t_0) = \lim_{\Delta t \to 0} v_{\text{ave}} = \lim_{\Delta t \to 0} \frac{s(t_0 + \Delta t) - s(t_0)}{\Delta t}.$$
This is a derivative: $v(t)$ equals $s'(t) = ds/dt$. The derivative of velocity is acceleration,
$$a(t_0) = v'(t_0) = \lim_{\Delta t \to 0} \frac{v(t_0 + \Delta t) - v(t_0)}{\Delta t}.$$

Example. For $s(t) = -5t^2 + 20t$, the velocity at $t = 1$ was computed first:
$$v(1) = \lim_{\Delta t \to 0} (10 - 5\Delta t) = 10.$$
Then the velocity at $t = t_0$ was computed:
$$v(t_0) = \lim_{\Delta t \to 0} (-10t_0 + 20 - 5\Delta t) = -10t_0 + 20.$$
Finally, the acceleration at $t = t_0$ was computed:
$$a(t_0) = \lim_{\Delta t \to 0} (-10) = -10.$$
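As a numerical sanity check of the example, the sketch below (Python; the helper names `s` and `avg_velocity` are illustrative, not from the notes) tabulates the average velocity of $s(t) = -5t^2 + 20t$ over $[1, 1 + \Delta t]$. Algebraically this quotient equals $10 - 5\Delta t$, so the printed values should approach $v(1) = 10$.

```python
def s(t):
    """Displacement from the example: s(t) = -5t^2 + 20t."""
    return -5 * t**2 + 20 * t


def avg_velocity(t0, dt):
    """Average velocity (s(t0 + dt) - s(t0)) / dt over [t0, t0 + dt]."""
    return (s(t0 + dt) - s(t0)) / dt


# At t0 = 1 the quotient simplifies to 10 - 5*dt, so it approaches 10.
for dt in [1.0, 0.1, 0.01, 0.001]:
    print(f"dt = {dt:<6} average velocity = {avg_velocity(1.0, dt):.6f}")
```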

2. Derivative. Let $y = f(x)$ be a dependent variable depending on an independent variable $x$, which varies freely. The increment of $y$ from $x_0$ to $x_0 + \Delta x$ is
$$\Delta y = f(x_0 + \Delta x) - f(x_0).$$
The difference quotient, or average rate of change, of $y$ from $x_0$ to $x_0 + \Delta x$ is
$$\frac{\Delta y}{\Delta x} = \frac{f(x_0 + \Delta x) - f(x_0)}{\Delta x}.$$
The derivative of $y$ (or $f(x)$) with respect to $x$ at $x_0$ is
$$\lim_{\Delta x \to 0} \frac{\Delta y}{\Delta x} = \lim_{\Delta x \to 0} \frac{f(x_0 + \Delta x) - f(x_0)}{\Delta x}.$$

3. Examples in science and math.

(i) Economics. Marginal cost is the derivative of cost with respect to some other variable, for
instance, the quantity purchased.

(ii) Thermodynamics. The ideal gas law relating the pressure $p$, volume $V$, and temperature $T$ of a gas is
$$pV = nRT.$$
Under isothermal conditions, $T$ is a constant $T_0$, so that
$$p(V) = \frac{nRT_0}{V}.$$
Under adiabatic conditions (i.e., no transfer of heat), $pV^\gamma$ is a constant $K$. Using this to eliminate $p$ gives
$$T(V) = \frac{K}{nR} \cdot \frac{1}{V^{\gamma - 1}}.$$
As this illustrates, the independent variable, dependent variable and constants in an equation
very much depend on the problem to be solved.

(iii) Biology. Exponential population growth models the population $N(t)$ after $t$ years as
$$N(t) = N_0 e^{rt},$$
where $e^x$ is the exponential function, $N_0$ is the initial population, and $r$ is the growth rate. As we will see later, $N'(t) = rN(t)$, i.e., the population grows at a rate proportional to the size of the population.

(iv) Geometry. The volume of a right circular cone is
$$V = \frac{1}{3} A \times h,$$
where $A$ is the base area of the cone and $h$ is the height of the cone. The radius $r$ of the base is proportional to the height,
$$r(h) = ch,$$
for some constant $c$. Since $A = \pi r^2$, this gives
$$V(h) = \frac{\pi}{3} c^2 h^3.$$
The derivative is
$$\frac{dV}{dh} = \pi c^2 h^2 = \pi r^2 = A.$$
This is very reasonable: increasing the height by $\Delta h$ (with the apex fixed) adds a thin slab at the base whose volume is approximately $A \, \Delta h$. In some sense, this explains the classical formula for the volume of a cone.

Lecture 2. September 9, 2005

Homework. Problem Set 1 Part I: (f)–(h); Part II: Problem 3.

Practice Problems. Course Reader: 1C2, 1C3, 1C4, 1D3, 1D5.

1. Tangent lines to graphs. For $y = f(x)$, the equation of the secant line through $(x_0, f(x_0))$ and $(x_0 + \Delta x, f(x_0 + \Delta x))$ is
$$y = \frac{f(x_0 + \Delta x) - f(x_0)}{\Delta x}(x - x_0) + f(x_0).$$
In the limit, the equation of the tangent line through $(x_0, f(x_0))$ is
$$y = f'(x_0)(x - x_0) + f(x_0).$$

Example. For the parabola $y = x^2$, the derivative is
$$y'(x_0) = 2x_0.$$
The equation of the tangent line is
$$y = 2x_0(x - x_0) + x_0^2 = 2x_0 x - x_0^2.$$
For instance, the equation of the tangent line through $(2, 4)$ is
$$y = 4x - 4.$$
Given a point $(x, y)$, what are all points $(x_0, x_0^2)$ on the parabola whose tangent line contains $(x, y)$? To solve, treat $x$ and $y$ as constants and solve for $x_0$. For instance, if $(x, y) = (1, -3)$, this gives
$$-3 = 2x_0(1) - x_0^2,$$
or,
$$x_0^2 - 2x_0 - 3 = 0.$$

Factoring as $(x_0 - 3)(x_0 + 1)$, the solutions are $x_0 = -1$ and $x_0 = 3$. The corresponding tangent lines are
$$y = -2x - 1 \quad \text{and} \quad y = 6x - 9.$$
For general $(x, y)$, the solutions are
$$x_0 = x \pm \sqrt{x^2 - y}.$$
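The general solution can be packaged as a small function. This is a minimal Python sketch (the name `tangent_points` is hypothetical): it solves $x_0^2 - 2x x_0 + y = 0$ and returns no points when $x^2 - y < 0$, i.e. when $(x, y)$ lies strictly above the parabola, where no tangent line passes through it.

```python
import math


def tangent_points(x, y):
    """Return the x0 values such that the tangent line to y = X^2 at (x0, x0^2),
    namely Y = 2*x0*X - x0^2, passes through (x, y).
    Solves x0^2 - 2*x*x0 + y = 0, i.e. x0 = x +/- sqrt(x^2 - y)."""
    disc = x * x - y
    if disc < 0:
        return []                    # (x, y) is above the parabola: no tangents
    r = math.sqrt(disc)
    return sorted({x - r, x + r})    # the set merges the double root when y == x^2


for x0 in tangent_points(1, -3):
    # The tangent line at x0 has slope 2*x0 and y-intercept -x0^2.
    print(f"x0 = {x0:g}: y = {2 * x0:g}x - {x0 ** 2:g}")
```

For the point $(1, -3)$ this recovers $x_0 = -1$ and $x_0 = 3$, matching the two tangent lines above.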

2. Limits. The precise definition is on p. 791 of Appendix A.2. Intuitive definition: $\lim_{x \to x_0} f(x)$ equals $L$ if and only if all values of $f(x)$ can be made arbitrarily close to $L$ by choosing $x$ sufficiently close to (but not equal to) $x_0$. One interpretation is the “microscope/laser illuminator” analogy: an observer focuses a microscope's field of view on a thin strip parallel to the $x$-axis, centered on $y = L$. The goal of the illuminator is to focus a laser beam centered on $x_0$ and parallel to the $y$-axis (but with the line $x = x_0$ deleted) so that only the portion of the graph in the field of view is illuminated. If, for every magnification of the microscope, the illuminator can succeed, then the limit is defined and equals $L$.
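To make the microscope/illuminator game concrete, take $f(x) = x^2$, $x_0 = 2$, $L = 4$ (an illustrative choice, not from the notes). Since $|x^2 - 4| = |x - 2|\,|x + 2|$ and $|x + 2| < 5$ whenever $|x - 2| < 1$, a laser of half-width $\delta = \min(1, \varepsilon/5)$ always wins against a microscope strip of half-width $\varepsilon$. The Python sketch below only spot-checks that choice on sample points; it illustrates the definition and is not a proof.

```python
def f(x):
    return x * x


def delta_for(eps):
    """Half-width of the 'laser' that beats a 'microscope' of half-width eps.
    Works because |x^2 - 4| = |x - 2| * |x + 2| < delta * 5 <= eps when delta <= 1."""
    return min(1.0, eps / 5.0)


def check(eps, samples=1000):
    """Spot-check: every sampled x with 0 < |x - 2| < delta has |f(x) - 4| < eps."""
    d = delta_for(eps)
    for i in range(1, samples):
        offset = d * i / samples         # 0 < offset < d
        for x in (2 - offset, 2 + offset):
            if abs(f(x) - 4) >= eps:
                return False
    return True


print(all(check(eps) for eps in [1.0, 0.1, 1e-3, 1e-6]))
```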

There is a beautiful Java applet on the webpage of Daniel J. Heath of Pacific Lutheran University,
http://www.plu.edu/~heathdj/java/calc1/Epsilon.html
If you use this, try a = −1.

For left-hand limits, use a laser that illuminates only to the left of $x_0$. For right-hand limits, use a laser that illuminates only to the right of $x_0$.

3. Continuity. A function $f(x)$ is continuous at $x_0$ if $f(x_0)$ is defined, $\lim_{x \to x_0} f(x)$ is defined, and $\lim_{x \to x_0} f(x)$ equals $f(x_0)$. Also, $f(x)$ is continuous on an interval if it is continuous at every point of the interval. The types of discontinuity are: removable discontinuity, jump discontinuity, infinite discontinuity and essential discontinuity.

Lecture 3. September 13, 2005

Homework. Problem Set 1 Part I: (i) and (j).

Practice Problems. Course Reader: 1E1, 1E3, 1E5.

1. Another derivative. Using the 3-step method, the derivative of $f(x) = 1/\sqrt{3x + 1}$ is
$$f'(x) = -3(3x + 1)^{-3/2}/2.$$
Upshot: Computing derivatives by the definition is too much work to be practical. We need general methods to simplify computations.
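The notes state the result of the 3-step computation without showing the steps. A quick Python sanity check (function names are illustrative) compares the closed form with a forward difference quotient, which is exactly the quantity whose limit the 3-step method evaluates.

```python
import math


def f(x):
    return 1 / math.sqrt(3 * x + 1)


def f_prime(x):
    """Closed form from the notes: f'(x) = -3 (3x + 1)^(-3/2) / 2."""
    return -3 * (3 * x + 1) ** (-1.5) / 2


def difference_quotient(g, x, h):
    """Forward difference quotient (g(x + h) - g(x)) / h."""
    return (g(x + h) - g(x)) / h


for x in [0.0, 1.0, 5.0]:
    print(x, f_prime(x), difference_quotient(f, x, 1e-6))
```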


2. The binomial theorem. For a positive integer $n$, the factorial,
$$n! = n \times (n - 1) \times (n - 2) \times \cdots \times 3 \times 2 \times 1,$$
is the number of ways of arranging $n$ distinct objects in a line. For two positive integers $n$ and $k$ with $k \le n$, the binomial coefficient,
$$\binom{n}{k} = \frac{n!}{k!(n - k)!} = \frac{n(n - 1) \cdots (n - k + 2)(n - k + 1)}{k(k - 1) \cdots 3 \cdot 2 \cdot 1},$$
is the number of ways to choose a subset of $k$ elements from a collection of $n$ elements. A fundamental fact about binomial coefficients is the following,
$$\binom{n}{k - 1} + \binom{n}{k} = \binom{n + 1}{k}.$$
This is known as Pascal’s formula. MathWorld, part of Wolfram Research, has a webpage on it.
The Binomial Theorem says that for every positive integer $n$ and every pair of numbers $a$ and $b$, $(a + b)^n$ equals
$$a^n + na^{n-1}b + \cdots + \binom{n}{k} a^{n-k} b^k + \cdots + nab^{n-1} + b^n.$$
This is proved by mathematical induction. First, the result is very easy when $n = 1$; it just says that $(a + b)^1$ equals $a^1 + b^1$. Next, make the induction hypothesis that the theorem is true for the integer $n$. The goal is to deduce the theorem for $n + 1$,
$$(a + b)^{n+1} = a^{n+1} + (n + 1)a^n b + \cdots + \binom{n+1}{k} a^{n+1-k} b^k + \cdots + (n + 1)ab^n + b^{n+1}.$$
By the definition of the $(n + 1)$st power of a number,
$$(a + b)^{n+1} = (a + b) \times (a + b)^n.$$
By the induction hypothesis, the second factor can be replaced,
$$(a + b)(a + b)^n = (a + b)\left(a^n + \cdots + \binom{n}{k} a^{n-k} b^k + \cdots + b^n\right).$$
Multiplying each term in the second factor first by $a$ and then by $b$ gives the two rows
$$\begin{aligned}
&a^{n+1} + na^n b + \cdots + \binom{n}{k} a^{n+1-k} b^k + \binom{n}{k+1} a^{n-k} b^{k+1} + \cdots + ab^n \\
{}+{} &a^n b + \cdots + \binom{n}{k-1} a^{n+1-k} b^k + \binom{n}{k} a^{n-k} b^{k+1} + \cdots + nab^n + b^{n+1}.
\end{aligned}$$
Summing in columns gives
$$a^{n+1} + (n + 1)a^n b + \cdots + \left(\binom{n}{k} + \binom{n}{k-1}\right) a^{n+1-k} b^k + \left(\binom{n}{k+1} + \binom{n}{k}\right) a^{n-k} b^{k+1} + \cdots + (1 + n)ab^n + b^{n+1}.$$
Using Pascal’s formula, this simplifies to
$$a^{n+1} + (n + 1)a^n b + \cdots + \binom{n+1}{k} a^{n+1-k} b^k + \binom{n+1}{k+1} a^{n-k} b^{k+1} + \cdots + (n + 1)ab^n + b^{n+1}.$$
This proves the theorem for $n + 1$, assuming the theorem for $n$.
Since we proved the theorem for $n = 1$, and since we also proved that for each integer $n$ the theorem for $n$ implies the theorem for $n + 1$, the theorem holds for every positive integer $n$.
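Pascal’s formula and the Binomial Theorem can be spot-checked with exact integer arithmetic. The Python sketch below uses the standard library's `math.comb` (available since Python 3.8); the helper names are illustrative. A finite check like this is no substitute for the induction proof, but it catches transcription errors in the formulas.

```python
from math import comb


def pascal_holds(n, k):
    """Pascal's formula: C(n, k-1) + C(n, k) == C(n+1, k)."""
    return comb(n, k - 1) + comb(n, k) == comb(n + 1, k)


def binomial_expand(a, b, n):
    """Right-hand side of the Binomial Theorem: sum over k of C(n, k) a^(n-k) b^k."""
    return sum(comb(n, k) * a ** (n - k) * b ** k for k in range(n + 1))


# Integers only, so exact equality comparisons are safe.
print(all(pascal_holds(n, k) for n in range(1, 30) for k in range(1, n + 1)))
print(all(binomial_expand(a, b, n) == (a + b) ** n
          for a in range(-5, 6) for b in range(-5, 6) for n in range(1, 12)))
```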
3. The derivative of $x^n$. Let $f(x) = x^n$ where $n$ is a positive integer. For every $a$ and every $h$, the binomial theorem gives
$$f(a + h) = (a + h)^n = a^n + na^{n-1}h + \cdots + \binom{n}{k} a^{n-k} h^k + \cdots + h^n.$$
Thus $f(a + h) - f(a)$ equals
$$(a + h)^n - a^n = na^{n-1}h + \cdots + \binom{n}{k} a^{n-k} h^k + \cdots + h^n.$$
Thus the difference quotient is
$$\frac{f(a + h) - f(a)}{h} = na^{n-1} + \binom{n}{2} a^{n-2} h + \cdots + \binom{n}{k} a^{n-k} h^{k-1} + \cdots + h^{n-1}.$$
Every summand except the first is divisible by $h$. The limit of such a term as $h \to 0$ is 0. Thus,
$$\lim_{h \to 0} \frac{f(a + h) - f(a)}{h} = na^{n-1} + 0 + \cdots + 0 = na^{n-1}.$$
So $f'(x)$ equals $nx^{n-1}$.
3. Linearity. For differentiable functions $f(x)$ and $g(x)$ and for constants $b$ and $c$, $bf(x) + cg(x)$ is differentiable and
$$(bf(x) + cg(x))' = bf'(x) + cg'(x).$$
This is often called linearity of the derivative.
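4. The Leibniz rule/Product rule. For differentiable functions $f(x)$ and $g(x)$, the product $f(x)g(x)$ is differentiable and
$$(f(x)g(x))' = f'(x)g(x) + f(x)g'(x).$$
The crucial observation in proving this is rewriting the increment of $f(x)g(x)$ from $a$ to $a + h$ as
$$\begin{aligned}
f(a+h)g(a+h) - f(a)g(a) &= f(a+h)[g(a+h) - g(a)] + f(a+h)g(a) - f(a)g(a) \\
&= f(a+h)[g(a+h) - g(a)] + [f(a+h) - f(a)]g(a).
\end{aligned}$$
Dividing by $h$ and letting $h \to 0$ (using that $f(a + h) \to f(a)$, since a differentiable function is continuous) gives $f(a)g'(a) + f'(a)g(a)$.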

5. The quotient rule. Let $f(x)$ and $g(x)$ be differentiable functions. If $g(a)$ is nonzero, the quotient function $f(x)/g(x)$ is defined and differentiable at $a$, and
$$(f(x)/g(x))' = [f'(x)g(x) - f(x)g'(x)]/g(x)^2.$$
One way to deduce this formula is to set $q(x) = f(x)/g(x)$, so that $f(x) = q(x)g(x)$, and then apply the Leibniz formula to get
$$f'(x) = q'(x)g(x) + q(x)g'(x) = q'(x)g(x) + f(x)g'(x)/g(x).$$
Solving for $q'(x)$ gives
$$q'(x) = [f'(x) - f(x)g'(x)/g(x)]/g(x) = [f'(x)g(x) - f(x)g'(x)]/g(x)^2.$$
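A numerical spot check of the quotient rule, as a minimal Python sketch. The choice $f(x) = x^3 + 1$, $g(x) = x^2 + 1$ is an arbitrary illustration (not from the notes), picked so that $g$ never vanishes; the symmetric difference quotient stands in for the exact derivative.

```python
def f(x):
    return x**3 + 1


def f_prime(x):
    return 3 * x**2


def g(x):
    return x**2 + 1        # never zero, so f/g is defined everywhere


def g_prime(x):
    return 2 * x


def quotient_rule(x):
    """(f/g)'(x) = [f'(x) g(x) - f(x) g'(x)] / g(x)^2."""
    return (f_prime(x) * g(x) - f(x) * g_prime(x)) / g(x) ** 2


def central_difference(func, x, h=1e-5):
    """Symmetric difference quotient, a numerical stand-in for func'(x)."""
    return (func(x + h) - func(x - h)) / (2 * h)


def q(x):
    return f(x) / g(x)


for x in [-1.0, 0.0, 0.5, 2.0]:
    print(x, quotient_rule(x), central_difference(q, x))
```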

6. Another proof that $d(x^n)/dx$ equals $nx^{n-1}$. This was mentioned only very briefly. The product rule also gives another induction proof that for every positive integer $n$, $d(x^n)/dx$ equals $nx^{n-1}$. For $n = 1$, we proved this by hand. Let $n$ be some specific positive integer, and make the induction hypothesis that $d(x^n)/dx$ equals $nx^{n-1}$. The goal is to deduce the formula for $n + 1$,
$$\frac{d(x^{n+1})}{dx} = (n + 1)x^n.$$
By the Leibniz rule,
$$\frac{d(x^{n+1})}{dx} = \frac{d(x \times x^n)}{dx} = \frac{d(x)}{dx}x^n + x\frac{d(x^n)}{dx} = (1)x^n + x\frac{d(x^n)}{dx}.$$
By the induction hypothesis, the second term can be replaced,
$$\frac{d(x^{n+1})}{dx} = x^n + x(nx^{n-1}) = x^n + nx^n = (n + 1)x^n.$$
Thus the formula for $n$ implies the formula for $n + 1$. Therefore, by mathematical induction, the formula holds for every positive integer $n$.

Lecture 4. September 15, 2005

Homework. No new problems.

Practice Problems. Course Reader: 1F1, 1F6, 1F7, 1F8.

1. Product rule example. For $u = \sqrt{3x + 1}$, what is $u'(x)$? Since $u \cdot u = 3x + 1$, $(u \cdot u)' = (3x + 1)' = 3$. By the product rule, $(u \cdot u)' = u' \cdot u + u \cdot u' = 2uu'$. Thus, solving,
$$u'(x) = 3/(2u) = 3(3x + 1)^{-1/2}/2.$$
2. The derivative of $u^n$. From above, $(u^2)'$ equals $2uu'$. By a similar computation, $(u^3)'$ equals $3u^2u'$. This suggests a pattern,
$$\frac{d(u^n)}{dx} = nu^{n-1}\frac{du}{dx}.$$

This can be proved by induction on $n$. For $n = 1$, 2 and 3, it was checked. Let $n$ be a particular integer (for instance, 70119209472933054321). For that integer, suppose the result is known,
$$\frac{d(u^n)}{dx} = nu^{n-1}\frac{du}{dx}.$$
The goal is to prove the result for $n + 1$, that is,
$$\frac{d(u^{n+1})}{dx} = (n + 1)u^n\frac{du}{dx}.$$
Let $v = u^n$. Then $u^{n+1}$ equals $uv$. So, by the product rule,
$$\frac{d(u^{n+1})}{dx} = \frac{d(uv)}{dx} = \frac{du}{dx}v + u\frac{dv}{dx}.$$
Plugging in $v = u^n$, this is
$$\frac{d(u^{n+1})}{dx} = \frac{du}{dx} \cdot (u^n) + u\frac{d(u^n)}{dx}.$$
By the induction hypothesis, $d(u^n)/dx$ equals $nu^{n-1}(du/dx)$. Plugging in,
$$\frac{d(u^{n+1})}{dx} = \frac{du}{dx} \cdot (u^n) + u\left(nu^{n-1}\frac{du}{dx}\right).$$
This simplifies to
$$\frac{d(u^{n+1})}{dx} = u^n\frac{du}{dx} + nu^n\frac{du}{dx} = (n + 1)u^n\frac{du}{dx}.$$
Thus, the result for $n + 1$ follows from the result for $n$. By induction, the result holds for every positive integer $n$.
3. The derivative of $x^a$, $a$ a fraction. Let $a$ be a fraction $m/n$ and let $u(x)$ be $x^a$. Then $u^n$ equals $x^m$. Thus,
$$\frac{d(u^n)}{dx} = \frac{d(x^m)}{dx},$$
which equals $mx^{m-1}$. By the above, $d(u^n)/dx$ equals $nu^{n-1}(du/dx)$. Thus,
$$nu^{n-1}\frac{du}{dx} = mx^{m-1}.$$
Solving for $du/dx$,
$$\frac{du}{dx} = \frac{mx^{m-1}}{nu^{n-1}} = \frac{mx^{m-1}}{n(x^{m/n})^{n-1}}.$$
One of the basic rules of exponents is that $(a^b)^c$ equals $a^{bc}$. Thus the denominator $n(x^{m/n})^{n-1}$ equals $nx^{(m/n)(n-1)}$, which equals $nx^{m - m/n}$. Thus,
$$\frac{du}{dx} = \frac{mx^{m-1}}{nx^{m - m/n}} = \frac{m}{n}x^{m-1} \cdot x^{m/n - m}.$$
Another basic rule of exponents is that $a^b \cdot a^c$ equals $a^{b+c}$. Thus,
$$\frac{du}{dx} = \frac{m}{n}x^{(m-1) + (m/n - m)} = \frac{m}{n}x^{m/n - 1}.$$
Remembering that $m/n$ is just $a$, and $u(x)$ is $x^a$, this finally gives
$$\frac{d(x^a)}{dx} = ax^{a-1}.$$
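A numerical spot check of $d(x^a)/dx = ax^{a-1}$ for a few fractional exponents, as a minimal Python sketch. It assumes $x > 0$ so that $x^{m/n}$ is a real number, and the sample fractions are arbitrary choices; the helper names are illustrative.

```python
def power_rule(a, x):
    """d(x^a)/dx = a * x^(a - 1), evaluated here only for x > 0."""
    return a * x ** (a - 1)


def central_difference(func, x, h=1e-6):
    """Symmetric difference quotient, a numerical stand-in for func'(x)."""
    return (func(x + h) - func(x - h)) / (2 * h)


def max_power_rule_error(fractions, x=2.0):
    """Largest gap between the formula and a numerical derivative of x^(m/n)."""
    worst = 0.0
    for m, n in fractions:
        a = m / n
        numeric = central_difference(lambda t: t ** a, x)
        worst = max(worst, abs(numeric - power_rule(a, x)))
    return worst


print(max_power_rule_error([(1, 2), (2, 3), (5, 3), (7, 4)]))
```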

4. The chain rule. Let $y$ be a function of $x$, $y = f(x)$, and let $u$ be a function of $y$, $u = g(y)$. Then $u$ is a function of $x$, $u = g(f(x))$. This function is a composite function, and is denoted by
$$(g \circ f)(x) = g(f(x)).$$
What is the derivative of a composite function? The claim is that
$$(g \circ f)'(x) = g'(f(x)) \cdot f'(x).$$
This is often easier to remember in the form
$$\frac{du}{dx} = \frac{du}{dy} \cdot \frac{dy}{dx}.$$
This also suggests the proof,
$$(g \circ f)'(x_0) = \lim_{\Delta x \to 0} \frac{\Delta u}{\Delta x} = \lim_{\Delta x \to 0} \frac{\Delta u}{\Delta y} \cdot \frac{\Delta y}{\Delta x},$$
where $y_0$ equals $f(x_0)$, $u_0$ equals $g(y_0) = g(f(x_0))$, $\Delta y$ equals $f(x_0 + \Delta x) - f(x_0) = f(x_0 + \Delta x) - y_0$, and $\Delta u$ equals $g(y_0 + \Delta y) - g(y_0) = g(f(x_0 + \Delta x)) - g(f(x_0))$. So long as $\Delta y$ is nonzero, the fraction in the limit is defined. (The case where $\Delta y = 0$ for arbitrarily small $\Delta x$ needs a separate argument.) And, as $\Delta x$ approaches 0, also $\Delta y$ approaches 0. Thus the limit breaks up as
$$(g \circ f)'(x_0) = \lim_{\Delta y \to 0} \frac{\Delta u}{\Delta y} \cdot \lim_{\Delta x \to 0} \frac{\Delta y}{\Delta x} = g'(y_0) \cdot f'(x_0).$$
Thus $(g \circ f)'(x_0)$ equals $g'(f(x_0))f'(x_0)$.

Example. Let $y(x)$ equal $1 + x^2$, and let $u(y)$ equal $1/y = y^{-1}$. Then $y'(x) = 0 + 2x = 2x$ and $u'(y) = -y^{-2}$. Thus, by the chain rule,
$$\frac{d}{dx}\left(\frac{1}{1 + x^2}\right) = \frac{-1}{y^2}(2x) = \frac{-2x}{(1 + x^2)^2}.$$
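The chain-rule answer for $1/(1 + x^2)$ can be compared against a symmetric difference quotient of the composite function. This Python sketch is only a numerical check; the function names are illustrative.

```python
def u_of_x(x):
    """The composite u(y(x)) with y(x) = 1 + x^2 and u(y) = 1/y."""
    return 1 / (1 + x * x)


def chain_rule_derivative(x):
    """u'(y(x)) * y'(x) = (-1 / y^2) * (2x) with y = 1 + x^2."""
    y = 1 + x * x
    return (-1 / y**2) * (2 * x)


def central_difference(func, x, h=1e-5):
    """Symmetric difference quotient, a numerical stand-in for func'(x)."""
    return (func(x + h) - func(x - h)) / (2 * h)


for x in [-2.0, 0.0, 0.5, 3.0]:
    print(x, chain_rule_derivative(x), central_difference(u_of_x, x))
```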

5. Implicit differentiation. This method has already been used many times. Given a function $y(x)$ satisfying some equation involving both $x$ and $y$, formally differentiate each side of the equation with respect to $x$ and then try to solve for $y'$.

Lecture 5. September 16, 2005

Homework. Problem Set 2 Part I: (a)–(e); Part II: Problem 2.

Practice Problems. Course Reader: 1I1, 1I4, 1I5

1. Example of implicit differentiation. Let $y = f(x)$ be the unique function satisfying the equation
$$\frac{1}{x} + \frac{1}{y} = 2.$$
What is the slope of the tangent line to the graph of $y = f(x)$ at the point $(x, y) = (1, 1)$?

Implicitly differentiate each side of the equation to get
$$\frac{d}{dx}\left(\frac{1}{x}\right) + \frac{d}{dx}\left(\frac{1}{y}\right) = \frac{d(2)}{dx} = 0.$$
Of course $(1/x)' = (x^{-1})' = -x^{-2}$. And by the rule $d(u^n)/dx = nu^{n-1}(du/dx)$, the derivative of $1/y$ is $-y^{-2}(dy/dx)$. Thus,
$$-x^{-2} - y^{-2}\frac{dy}{dx} = 0.$$
Plugging in $x = 1$ and $y = 1$ gives
$$-1 - 1 \cdot y'(1) = 0,$$
whose solution is
$$y'(1) = -1.$$
In fact, using that $1/y$ equals $2 - 1/x$, this can be solved for every $x$ (with $x \neq 0, 1/2$):
$$\frac{dy}{dx} = -\frac{x^{-2}}{y^{-2}} = -\frac{1}{x^2} \cdot \frac{1}{(2 - 1/x)^2} = -\frac{1}{(2x - 1)^2}.$$
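As a cross-check, the equation $1/x + 1/y = 2$ can also be solved explicitly: $1/y = (2x - 1)/x$, so $y = x/(2x - 1)$. The Python sketch below (names are illustrative) compares the slope formula $dy/dx = -1/(2x - 1)^2$, including its minus sign, which agrees with $y'(1) = -1$, against a numerical derivative of the explicit solution.

```python
def y_explicit(x):
    """Solving 1/x + 1/y = 2 for y gives y = x / (2x - 1), for x not 0 or 1/2."""
    return x / (2 * x - 1)


def slope(x):
    """Implicit-differentiation result dy/dx = -1 / (2x - 1)^2."""
    return -1 / (2 * x - 1) ** 2


def central_difference(func, x, h=1e-6):
    """Symmetric difference quotient, a numerical stand-in for func'(x)."""
    return (func(x + h) - func(x - h)) / (2 * h)


print(slope(1.0))  # slope of the tangent line at (1, 1)
for x in [1.0, 2.0, -3.0]:
    print(x, slope(x), central_difference(y_explicit, x))
```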

2. Rules for exponentials and logarithms. Let $a$ be a positive real number. The basic rules of exponentials are as follows.
Rule 1. If $a^b$ equals $B$ and $a^c$ equals $C$, then $a^{b+c}$ equals $B \cdot C$, i.e.,
$$a^{b+c} = a^b \cdot a^c.$$
Rule 2. If $a^b$ equals $B$ and $B^d$ equals $D$, then $a^{bd}$ equals $D$, i.e.,
$$(a^b)^d = a^{bd}.$$
If $a^b$ equals $B$ (and $a \neq 1$), the logarithm with base $a$ of $B$ is defined to be $b$. This is written $\log_a(B) = b$. The function $B \mapsto \log_a(B)$ is defined for all positive real numbers $B$. Using this definition, the rules of exponentiation become rules of logarithms.

Rule 1. If $\log_a(B)$ equals $b$ and $\log_a(C)$ equals $c$, then $\log_a(B \cdot C)$ equals $b + c$, i.e.,
$$\log_a(B \cdot C) = \log_a(B) + \log_a(C).$$
Rule 2. If $\log_a(B)$ equals $b$ and $B^d$ equals $D$, then $\log_a(D)$ equals $d\log_a(B)$, i.e.,
$$\log_a(B^d) = d\log_a(B).$$
Rule 3. Since $\log_B(D)$ equals $d$, an equivalent formulation is that $\log_a(D)$ equals $\log_a(B)\log_B(D)$, i.e.,
$$\log_a(D) = \log_a(B)\log_B(D).$$

3. The derivative of $a^x$. Let $a$ be a positive real number. What is the derivative of $a^x$? Denote the derivative of $a^x$ at $x = 0$ by $L(a)$. It equals the value of the limit
$$L(a) = \lim_{h \to 0} \frac{a^h - 1}{h}.$$
Then for every $x_0$, the derivative of $a^x$ at $x_0$ equals
$$\lim_{h \to 0} \frac{a^{x_0 + h} - a^{x_0}}{h}.$$
By Rule 1, $a^{x_0 + h}$ equals $a^{x_0}a^h$. Thus the limit factors as
$$\lim_{h \to 0} \frac{a^{x_0}a^h - a^{x_0}}{h} = a^{x_0}\lim_{h \to 0} \frac{a^h - 1}{h}.$$
Therefore, for every $x$, the derivative of $a^x$ is
$$\frac{d(a^x)}{dx} = L(a)a^x.$$

What is $L(a)$? To figure this out, consider how $L(a)$ changes as $a$ changes. First of all,
$$L(a^b) = \lim_{h \to 0} \frac{(a^b)^h - 1}{h}.$$
By Rule 2, $(a^b)^h$ equals $a^{bh}$. So, for $b \neq 0$, the limit is
$$L(a^b) = \lim_{h \to 0} \frac{a^{bh} - 1}{h} = b\lim_{h \to 0} \frac{a^{bh} - 1}{bh}.$$
Now, inside the limit, make the substitution $k = bh$. As $h$ approaches 0, also $k$ approaches 0. So the limit is
$$L(a^b) = b\lim_{k \to 0} \frac{a^k - 1}{k} = bL(a).$$
This is very similar to Rule 2 for logarithms.

Choose a number $a_0$ bigger than 1, say $a_0 = 2$. Then for every positive real number $a$, $a = a_0^b$ where $b = \log_{a_0}(a)$. Thus,
$$L(a) = L(a_0^b) = bL(a_0) = L(a_0)\log_{a_0}(a).$$
So, with $a_0$ fixed and $a$ allowed to vary, $L(a)$ is just the logarithm function $\log_{a_0}(a)$ scaled by $L(a_0)$. Looking at the graph of $a_0^x$, it is geometrically clear that $L(a_0)$ is positive (though we have not proved that $L(a_0)$ is even defined). Thus the graph of $L(a)$ looks qualitatively like the graph of $\log_{a_0}(a)$. In particular, for $a$ less than 1, $L(a)$ is negative. The value $L(1)$ equals 0. And $L(a)$ approaches $+\infty$ as $a$ increases. Therefore, there must be a number where $L$ takes the value 1. By long tradition, this number is called $e$:
$$L(e) = \lim_{h \to 0} \frac{e^h - 1}{h} = 1.$$
This is the definition of $e$. It sheds very little light on the decimal value of $e$.
Because $e$ is so important, the logarithm with base $e$ is given a special name: the natural logarithm. It is denoted by
$$\ln(a) = \log_e(a).$$
So, finally, $L(a)$ equals
$$L(a) = \log_e(a)L(e) = \ln(a)(1) = \ln(a).$$
This leads to the formula for the derivative of $a^x$,
$$\frac{d(a^x)}{dx} = \ln(a)a^x.$$
In particular,
$$\frac{d(e^x)}{dx} = e^x.$$
In fact, $e^x$ is characterized by the property above and the property that $e^0$ equals 1.
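The definition of $e$ as the number where $L$ equals 1 can be turned into a rough computation. The Python sketch below is an illustration under stated assumptions, not the notes' method: it estimates $L(a)$ by the one-sided difference quotient with a small fixed $h$ (so the estimate carries an error of roughly $h\ln(a)^2/2$), compares it with the library's natural logarithm, and then bisects on $[2, 3]$ for the root of $L(a) = 1$, relying on $L$ being increasing as argued above. The names `L` and `find_e` are hypothetical.

```python
import math


def L(a, h=1e-7):
    """One-sided difference-quotient estimate of L(a) = lim (a^h - 1)/h as h -> 0."""
    return (a ** h - 1) / h


def find_e(lo=2.0, hi=3.0, steps=50):
    """Bisection for the number where L equals 1 (the definition of e above).
    Relies on L(a) increasing in a, since L(a) = L(2) * log_2(a)."""
    for _ in range(steps):
        mid = (lo + hi) / 2
        if L(mid) < 1:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


for a in [0.5, 2.0, 10.0]:
    print(a, L(a), math.log(a))   # the estimate of L(a) should be close to ln(a)
print(find_e(), math.e)
```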
4. The derivative of $\log_a(x)$ and the value of $e$. By the chain rule,
$$\frac{d(a^u)}{dx} = \ln(a)a^u\frac{du}{dx}.$$
For $u = \log_a(x)$, $a^u$ equals $x$. Thus,
$$\frac{d(a^u)}{dx} = \frac{d(x)}{dx} = 1.$$
Thus,
$$\ln(a)a^u\frac{du}{dx} = 1.$$
Solving gives
$$\frac{d\log_a(x)}{dx} = \frac{1}{\ln(a)}\frac{1}{a^u} = \frac{1}{\ln(a)x}.$$
In particular, for $a = e$, this gives
$$\frac{d\ln(x)}{dx} = \frac{1}{x}.$$
dx
What is the derivative of $\ln(x)$ at $x = 1$? On the one hand, since the derivative of $\ln(x)$ equals $1/x$, the derivative at $x = 1$ is $1/1 = 1$. On the other hand, the definition of the derivative gives
$$\lim_{h \to 0} \frac{\ln(1 + h) - \ln(1)}{h}.$$
Of course, $\ln(1)$ equals 0, so this simplifies to
$$\lim_{h \to 0} \frac{1}{h}\ln(1 + h).$$
Using Rule 2 for logarithms, this gives
$$\lim_{h \to 0} \ln\left((1 + h)^{1/h}\right).$$
Since $\ln(y)$ is continuous, the limit equals
$$\ln\left[\lim_{h \to 0}(1 + h)^{1/h}\right].$$
So the natural logarithm of the inner limit equals 1. But $e$ is the unique number whose natural logarithm equals 1. This leads to the formula
$$e = \lim_{h \to 0}(1 + h)^{1/h}.$$
Making the substitution $n = 1/h$ leads to the more familiar form
$$\lim_{n \to +\infty}(1 + 1/n)^n = e.$$
This can be used to compute $e$ to arbitrary accuracy. The first few digits of $e$ are 2.718281828459045...
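The sketch below (Python; `approx_e` is an illustrative name) evaluates $(1 + 1/n)^n$ for growing $n$ and prints the gap to the library constant `math.e`. Convergence is slow: the error behaves roughly like $e/(2n)$, so each extra correct digit costs about a factor of 10 in $n$, and in floating point the rounding of $1 + 1/n$ eventually limits the accuracy for very large $n$.

```python
import math


def approx_e(n):
    """The n-th term (1 + 1/n)^n of the limit that defines e."""
    return (1 + 1 / n) ** n


for n in [10, 1_000, 100_000, 10_000_000]:
    print(f"n = {n:>10}: {approx_e(n):.12f}  error = {math.e - approx_e(n):.2e}")
```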
5. Logarithmic differentiation. There is a method of computing derivatives of products of functions that is often useful. If $y$ is a product of $n$ factors, say $f_1(x) \cdot f_2(x) \cdots f_n(x)$, the derivative of $y$ can be computed by the product rule. However, multiplication seems to be more error-prone than addition. Thus introduce (assuming each factor is positive, so that the logarithms are defined)
$$u = \ln(y) = \ln(f_1(x)) + \ln(f_2(x)) + \cdots + \ln(f_n(x)).$$
The derivative of $u$ is
$$\frac{du}{dx} = \frac{d}{dx}(\ln(f_1(x))) + \cdots + \frac{d}{dx}(\ln(f_n(x))).$$
Using the chain rule, this is
$$\frac{du}{dx} = \frac{f_1'(x)}{f_1(x)} + \cdots + \frac{f_n'(x)}{f_n(x)}.$$
Thus, far fewer multiplications are needed to compute $u'$. This is good, because also
$$\frac{du}{dx} = \frac{d\ln(y)}{dx} = \frac{1}{y}\frac{dy}{dx}.$$
Therefore the derivative of $y$ can be computed as
$$y' = yu' = (f_1(x) \cdots f_n(x))\left(\frac{f_1'(x)}{f_1(x)} + \cdots + \frac{f_n'(x)}{f_n(x)}\right).$$

Example. Let $y$ be
$$y = \frac{(1 + x^3)(1 + \sqrt{x})}{x^{3/7}}.$$
Then,
$$u = \ln(y) = \ln(1 + x^3) + \ln(1 + \sqrt{x}) - \frac{3}{7}\ln(x).$$
By the chain rule, $(\ln(1 + x^3))' = 3x^2/(1 + x^3)$ and $(\ln(1 + \sqrt{x}))' = (\sqrt{x})'/(1 + \sqrt{x}) = (\tfrac{1}{2}x^{-1/2})/(1 + \sqrt{x})$. Thus, $u'$ equals
$$u' = \frac{3x^2}{1 + x^3} + \frac{1}{2\sqrt{x}(1 + \sqrt{x})} - \frac{3}{7x}.$$
So, finally,
$$y' = yu' = \frac{(1 + x^3)(1 + \sqrt{x})}{x^{3/7}}\left(\frac{3x^2}{1 + x^3} + \frac{1}{2\sqrt{x}(1 + \sqrt{x})} - \frac{3}{7x}\right).$$
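The result of the logarithmic-differentiation example can be checked numerically for $x > 0$. In this Python sketch (function names are illustrative), `y_prime` implements $y' = y\,u'$ exactly as derived, and a symmetric difference quotient of $y$ serves as an independent reference; the comparison uses a relative tolerance because $y'$ varies in size.

```python
import math


def y(x):
    return (1 + x**3) * (1 + math.sqrt(x)) / x ** (3 / 7)


def y_prime(x):
    """y' = y * u', where u = ln(y) = ln(1 + x^3) + ln(1 + sqrt(x)) - (3/7) ln(x)."""
    u_prime = (3 * x**2 / (1 + x**3)
               + 1 / (2 * math.sqrt(x) * (1 + math.sqrt(x)))
               - 3 / (7 * x))
    return y(x) * u_prime


def central_difference(func, x, h=1e-5):
    """Symmetric difference quotient, a numerical stand-in for func'(x)."""
    return (func(x + h) - func(x - h)) / (2 * h)


for x in [0.25, 1.0, 3.0]:
    print(x, y_prime(x), central_difference(y, x))
```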

Lecture 6. September 20, 2005

Homework. Problem Set 2 Part I: (f)–(j); Part II: Problems 1, 3 and 4.

Practice Problems. Course Reader: 1J1, 1J2, 1J3, 1J4

1. Trigonometric functions. What is angle? For a sector of a unit circle (a circle of radius 1), the angle of the sector equals both the length of the arc of the sector and twice the area of the sector. Although we do not yet have general definitions of either arc length or area, this can be used to give a rigorous definition of angle. We can divide any sector into two equal pieces: simply bisect the chord of the sector. We also know how to add two angles, by laying the sectors in adjacent positions. Denoting the area of a unit circle by the symbol $\pi$ (which happens to be the familiar $\pi$), these 2 operations produce every angle of the form $m\pi/2^n$, with $m$ and $n$ integers. Every angle can be approximated arbitrarily well by such angles. Thus, for every continuous function of an angle, every value of the function can be computed.
The basic functions are sin(θ), cos(θ), tan(θ), sec(θ), csc(θ) and cot(θ). Full descriptions of these
are in §9.1 of the textbook by Simmons. The same information is contained in the webpage on
Trigonometry produced by MathWorld, part of Wolfram Research.
2. Trigonometric identities. For today, the most important identities are the angle addition
formulas,
sin(α + β) = sin(α) cos(β) + cos(α) sin(β),

cos(α + β) = cos(α) cos(β) − sin(α) sin(β).

Other important identities are,

(i) cos(−θ) equals cos(θ), i.e., cos(θ) is an even function,

(ii) sin(−θ) equals − sin(θ), i.e., sin(θ) is an odd function,

(iii) sin(θ + π/2) equals cos(θ),

(iv) cos(θ + π/2) equals − sin(θ), and

(v) $\sin^2(\theta) + \cos^2(\theta)$ equals 1 for every $\theta$.

3. Some trigonometric limits. In computing trigonometric limits, the following limit is crucial,
$$\lim_{\theta \to 0} \frac{\sin(\theta)}{\theta} = 1.$$
As explained in class, this is essentially the statement that as $\theta \to 0$, the quotient of the arc length by the chord length tends to 1. This was not proved in lecture, nor is it proved in your textbook in §2.1 (despite the author’s claim). However, it is geometrically reasonable. And, of course, it can be proved.
This limit implies another limit,
$$\lim_{\theta \to 0} \frac{\cos(\theta) - 1}{\theta} = 0.$$
To see this, rewrite the term as
$$\frac{\cos(\theta) - 1}{\theta} \cdot \frac{\cos(\theta) + 1}{\cos(\theta) + 1} = \frac{\cos^2(\theta) - 1}{\theta \cdot (\cos(\theta) + 1)}.$$
By Identity (v), $\cos^2(\theta) - 1$ equals $-\sin^2(\theta)$, so the term equals
$$\frac{-\sin^2(\theta)}{\theta \cdot (\cos(\theta) + 1)} = -\frac{\sin(\theta)}{\theta} \cdot \frac{1}{\cos(\theta) + 1} \cdot \sin(\theta).$$
As $\theta \to 0$, this tends to
$$-(1) \times (1/2) \times 0 = 0.$$
By a similar computation,
$$\lim_{\theta \to 0} \frac{\cos(\theta) - 1}{\theta^2} = -\frac{1}{2}.$$
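A small numerical table of the three limits, as a minimal Python sketch (the helper name `trig_ratios` is illustrative). The values approach 1, 0 and $-1/2$; for much smaller $\theta$ than shown, floating-point cancellation in $\cos(\theta) - 1$ would start to spoil the last ratio, so this is a sanity check rather than evidence about the limit itself.

```python
import math


def trig_ratios(theta):
    """Return sin(t)/t, (cos(t) - 1)/t and (cos(t) - 1)/t^2 for t = theta."""
    return (math.sin(theta) / theta,
            (math.cos(theta) - 1) / theta,
            (math.cos(theta) - 1) / theta**2)


# Limits as theta -> 0 are 1, 0 and -1/2 respectively.
for theta in [0.1, 0.01, 0.001]:
    print(theta, *(f"{r:.8f}" for r in trig_ratios(theta)))
```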
4. Derivatives of $\sin(x)$ and $\cos(x)$. To compute the derivative of $y = \sin(x)$ at $x = a$, use the angle addition formulas to write
$$\sin(a + h) = \sin(a)\cos(h) + \cos(a)\sin(h).$$
This gives
$$\sin(a + h) - \sin(a) = \sin(a)(\cos(h) - 1) + \cos(a)\sin(h).$$
Thus the difference quotient equals
$$\frac{\sin(a + h) - \sin(a)}{h} = \sin(a)\frac{\cos(h) - 1}{h} + \cos(a)\frac{\sin(h)}{h}.$$
Taking the limit gives
$$\lim_{h \to 0} \frac{\sin(a + h) - \sin(a)}{h} = \sin(a)\lim_{h \to 0} \frac{\cos(h) - 1}{h} + \cos(a)\lim_{h \to 0} \frac{\sin(h)}{h}.$$
Using the limits from above, this gives
$$\sin'(a) = \sin(a) \times 0 + \cos(a) \times 1 = \cos(a).$$
Thus the derivative of $\sin(x)$ equals
$$\frac{d\sin(x)}{dx} = \cos(x).$$
An entirely similar computation gives
$$\frac{\cos(a + h) - \cos(a)}{h} = \cos(a)\frac{\cos(h) - 1}{h} - \sin(a)\frac{\sin(h)}{h},$$
which leads to
$$\cos'(a) = \cos(a)\lim_{h \to 0} \frac{\cos(h) - 1}{h} - \sin(a)\lim_{h \to 0} \frac{\sin(h)}{h} = \cos(a) \times 0 - \sin(a) \times 1 = -\sin(a).$$
Thus the derivative of $\cos(x)$ equals
$$\frac{d\cos(x)}{dx} = -\sin(x).$$

5. Derivatives of other trigonometric functions. Using the quotient rule,
$$\frac{d\tan(x)}{dx} = \frac{1}{\cos^2(x)}\left(\cos(x) \times \cos(x) - \sin(x)(-\sin(x))\right) = \frac{\cos^2(x) + \sin^2(x)}{\cos^2(x)} = \frac{1}{\cos^2(x)}.$$
Therefore, the derivative of $\tan(x)$ equals
$$\frac{d\tan(x)}{dx} = \sec^2(x).$$
In a similar manner,
$$\frac{d\cot(x)}{dx} = -\csc^2(x), \qquad \frac{d\sec(x)}{dx} = \sec(x)\tan(x),$$
and
$$\frac{d\csc(x)}{dx} = -\csc(x)\cot(x).$$
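All six derivative formulas can be spot-checked at once. This Python sketch pairs each function with the formula stated above and compares against a symmetric difference quotient at a few sample points in $(0, \pi/2)$, where every function involved is defined; the helper names and sample points are illustrative choices.

```python
import math


def sec(x):
    return 1 / math.cos(x)


def csc(x):
    return 1 / math.sin(x)


def cot(x):
    return math.cos(x) / math.sin(x)


def central_difference(func, x, h=1e-5):
    """Symmetric difference quotient, a numerical stand-in for func'(x)."""
    return (func(x + h) - func(x - h)) / (2 * h)


# Each entry pairs a function with the derivative formula from the notes.
RULES = {
    "sin": (math.sin, math.cos),
    "cos": (math.cos, lambda x: -math.sin(x)),
    "tan": (math.tan, lambda x: sec(x) ** 2),
    "cot": (cot, lambda x: -csc(x) ** 2),
    "sec": (sec, lambda x: sec(x) * math.tan(x)),
    "csc": (csc, lambda x: -csc(x) * cot(x)),
}

for name, (func, deriv) in RULES.items():
    worst = max(abs(central_difference(func, x) - deriv(x)) for x in (0.3, 0.7, 1.1))
    print(f"{name}: max discrepancy {worst:.1e}")
```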

Lecture 7. September 22, 2005

Review for Exam 1. No new material was presented. There were no practice problems from the course reader.

Lecture 8. September 27, 2005

Homework. Problem Set 2 all of Part I and Part II.

Practice Problems. Course Reader: 2A1, 2A4, 2A9, 2A11, 2A12.

1. Linear approximations. For a differentiable function $f(x)$, the linear approximation or linearization of $f(x)$ at $x = a$ is the linear function
$$f(a) + f'(a)(x - a).$$
In a precise sense, this is the best approximation of $f(x)$ by a linear function near $x = a$. For $x$ close to $a$, the value of $f(x)$ is close to the value of the linearization. The notation for this is
$$f(x) \approx f(a) + f'(a)(x - a) \quad \text{for } x \approx a.$$

Example. The linearization of
$$f(x) = e^{-3x}\sin(2\pi x) + 5e^{-3x}\cos(2\pi x)$$
near $x = 0$ (using $f(0) = 5$ and $f'(0) = 2\pi - 15$) is
$$f(x) \approx 5 - (15 - 2\pi)x \quad \text{for } x \approx 0.$$
In particular, for $x = 0.02$, this gives the approximate answer
$$f(0.02) \approx 5 - (15 - 2\pi)(0.02) \approx 4.83.$$
The actual value is approximately 4.79.
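To compare the linearization with the function itself, this Python sketch (names are illustrative) evaluates both at $x = 0.02$. By direct evaluation the linearization gives about 4.83 and the function about 4.79; the difference comes from the neglected quadratic and higher terms.

```python
import math


def f(x):
    decay = math.exp(-3 * x)
    return decay * math.sin(2 * math.pi * x) + 5 * decay * math.cos(2 * math.pi * x)


def linearization(x):
    """f(0) + f'(0) x, using f(0) = 5 and f'(0) = 2*pi - 15."""
    return 5 - (15 - 2 * math.pi) * x


x = 0.02
print(f"linearization: {linearization(x):.4f}, actual: {f(x):.4f}")
```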

2. Basic approximations. Some linear approximations occur so often that they should be committed to memory. Each of the following is the linear approximation for $x \approx 0$, together with the terms in the quadratic and higher approximations.
$$\begin{aligned}
\frac{1}{1 - x} &\approx 1 + x + x^2 + x^3 + \cdots, \\
(1 + x)^r &\approx 1 + rx + \binom{r}{2}x^2 + \binom{r}{3}x^3 + \cdots, \\
\sin(x) &\approx x - x^3/3! + x^5/5! - \cdots, \\
\cos(x) &\approx 1 - x^2/2! + x^4/4! - \cdots, \\
e^x &\approx 1 + x + x^2/2! + x^3/3! + \cdots, \\
\ln(1 + x) &\approx x - x^2/2 + x^3/3 - \cdots,
\end{aligned}$$
where, for any real number $r$, $\binom{r}{k} = r(r - 1)\cdots(r - k + 1)/k!$.
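A quick numerical check of the basic approximations at a small sample point, as a Python sketch. The values $x = 0.1$ and $r = 0.5$ are arbitrary illustrative choices, each series is truncated to the terms written out above, and the generalized binomial coefficients are expanded by hand. Note that the $\ln(1 + x)$ series starts with $x$ (its constant term is 0, since $\ln 1 = 0$).

```python
import math

x = 0.1
r = 0.5  # exponent for (1 + x)^r; an arbitrary illustrative choice

# name -> (exact value, truncated series); C(r, k) = r(r-1)...(r-k+1)/k!
approximations = {
    "1/(1-x)": (1 / (1 - x), 1 + x + x**2 + x**3),
    "(1+x)^r": ((1 + x) ** r,
                1 + r * x + r * (r - 1) / 2 * x**2 + r * (r - 1) * (r - 2) / 6 * x**3),
    "sin(x)": (math.sin(x), x - x**3 / 6 + x**5 / 120),
    "cos(x)": (math.cos(x), 1 - x**2 / 2 + x**4 / 24),
    "e^x": (math.exp(x), 1 + x + x**2 / 2 + x**3 / 6),
    "ln(1+x)": (math.log(1 + x), x - x**2 / 2 + x**3 / 3),
}

for name, (exact, series) in approximations.items():
    print(f"{name:8s} exact={exact:.10f} series={series:.10f} error={abs(exact - series):.1e}")
```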

3. Combining basic approximations. The basic approximations can be combined to get new linear approximations.
(i) The linear approximation of $f(x)$ for $x \approx a$ can be converted to a linear approximation at 0 by setting $g(u) = f(a + u)$. In symbols, with $u = x - a$,
$$f(a) + f'(a)(x - a) = g(0) + g'(0)u.$$
This is equivalent to the formula
$$\frac{d}{dx}\big(f(x - a)\big) = \frac{df}{dx}(x - a).$$
(ii) The linear approximation of $f(cx)$ for $x \approx a$ is obtained from the linear approximation of $f(u)$ for $u \approx ca$ by substituting $u = cx$,
$$f(cx) \approx f(ca) + f'(ca)(cx - ca).$$
This is equivalent to the formula
$$\frac{d}{dx}\big(f(cx)\big) = c\frac{df}{dx}(cx).$$

(iii) The linear approximation of $cf(x)$ for $x \approx a$ is $c$ times the linear approximation of $f(x)$ for $x \approx a$,
$$cf(x) \approx cf(a) + cf'(a)(x - a).$$
This is different from the previous rule. Also, the linear approximation of $f(x) + g(x)$ for $x \approx a$ is the sum of the linear approximations of $f(x)$ and $g(x)$,
$$(f + g)(x) \approx f(a) + g(a) + (f'(a) + g'(a))(x - a).$$
Together, these two rules are equivalent to the formulas
$$\frac{d}{dx}(cf(x)) = c\frac{df}{dx}(x), \qquad \frac{d}{dx}(f(x) + g(x)) = \frac{df}{dx}(x) + \frac{dg}{dx}(x).$$
(iv) The linear approximation of $f(x)g(x)$ for $x \approx a$ is the product of the linear approximations, disregarding all quadratic terms,
$$f(x)g(x) \approx (f(a) + f'(a)(x - a))(g(a) + g'(a)(x - a)),$$
which simplifies to
$$f(x)g(x) \approx f(a)g(a) + (f'(a)g(a) + f(a)g'(a))(x - a).$$
This is equivalent to Leibniz’s rule,
$$\frac{d}{dx}(f(x)g(x)) = \frac{df}{dx}(x)g(x) + f(x)\frac{dg}{dx}(x).$$

(v) The linear approximation of $f(x)/g(x)$ for $x \approx a$ (with $g(a) \neq 0$) is the quotient of the linear approximations, using the linear approximation $1/(1 - x) \approx 1 + x$,
$$\begin{aligned}
\frac{f(x)}{g(x)} &\approx \frac{f(a) + f'(a)(x - a)}{g(a) + g'(a)(x - a)} = (f(a) + f'(a)(x - a))\frac{1}{g(a)} \cdot \frac{1}{1 - (-g'(a)(x - a)/g(a))} \\
&\approx \frac{1}{g(a)}(f(a) + f'(a)(x - a))\left(1 - g'(a)(x - a)/g(a)\right) = \frac{1}{g(a)^2}(f(a) + f'(a)(x - a))(g(a) - g'(a)(x - a)).
\end{aligned}$$
This simplifies to
$$f(x)/g(x) \approx f(a)/g(a) + (1/g(a)^2)(f'(a)g(a) - f(a)g'(a))(x - a).$$
This is equivalent to the quotient rule,
$$\frac{d}{dx}(f(x)/g(x)) = \frac{1}{g(x)^2}\left(\frac{df}{dx}(x)g(x) - f(x)\frac{dg}{dx}(x)\right).$$

(vi) The linear approximation of $g(f(x))$ for $x \approx a$ is obtained from the linear approximation of $g(u)$ for $u \approx f(a)$ by substituting in for $u$ the linear approximation of $f(x)$ for $x \approx a$ and ignoring quadratic terms,
$$\begin{aligned}
u = f(x) &\approx f(a) + f'(a)(x - a), \\
g(f(x)) = g(u) &\approx g(f(a)) + g'(f(a))(u - f(a)) \approx g(f(a)) + g'(f(a))\big((f(a) + f'(a)(x - a)) - f(a)\big).
\end{aligned}$$
This simplifies to
$$g(f(x)) \approx g(f(a)) + g'(f(a))f'(a)(x - a).$$
This is equivalent to the chain rule,
$$\frac{d}{dx}(g(f(x))) = \frac{dg}{du}(f(x))\frac{df}{dx}(x).$$
Together, these 6 rules account for all the general rules we have regarding differentiation. So every rule of differentiation has an equivalent formulation in terms of linear approximations.
Example. Using the rules, the linear approximation for,

f (x) = e−3x sin(2πx) + 5e−3x cos(2πx),

for x ≈ 0 is given by,

(1 + (−3x))(2πx) + 5(1 + (−3x))(1) = 2πx + 5 − 15x,

which simpliﬁes to,

f (x) ≈ 5 − (15 − 2π)x.
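As a quick numerical sanity check (a sketch, not part of the lecture), one can compare f(x) with the linear approximation 5 − (15 − 2π)x for small x. Since only quadratic and higher terms were dropped, dividing x by 10 should divide the error by roughly 100.

```python
import math

def f(x):
    # The function from the example: e^(-3x) sin(2*pi*x) + 5 e^(-3x) cos(2*pi*x).
    return math.exp(-3 * x) * (math.sin(2 * math.pi * x) + 5 * math.cos(2 * math.pi * x))

def linear_approx(x):
    # Linear approximation near x = 0 obtained by combining the basic approximations.
    return 5 - (15 - 2 * math.pi) * x

for x in (0.1, 0.01, 0.001):
    error = abs(f(x) - linear_approx(x))
    # The error behaves roughly like a constant times x^2.
    print(f"x = {x:<6} f(x) = {f(x):.6f}  approximation = {linear_approx(x):.6f}  error = {error:.2e}")
```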

4. Quadratic approximations. Sometimes the linear approximation is not good enough. One example is the linear approximation cos(x) ≈ 1 for x ≈ 0. The linear approximation gives no idea whether cos(x) is greater than 1 or less than 1, concave up or concave down, etc. This is remedied by the quadratic approximation,

$$f(x) \approx f(a) + f'(a)(x - a) + \tfrac{1}{2}f''(a)(x - a)^2 \quad \text{for } x \approx a.$$

Each of the basic approximations has an analogous quadratic approximation. Each of the rules for combining linear approximations has an analogous rule for quadratic approximations.
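A short sketch comparing the two approximations of cos(x) near 0. The quadratic approximation 1 − x²/2 (from f(0) = 1, f'(0) = 0, f''(0) = −1) captures that cos(x) dips below 1, which the linear approximation cannot.

```python
import math

def linear_cos(x):
    # Linear approximation of cos(x) near 0: f(0) + f'(0) x = 1 + 0 * x.
    return 1.0

def quadratic_cos(x):
    # Quadratic approximation with f(0) = 1, f'(0) = 0, f''(0) = -1.
    return 1.0 - 0.5 * x * x

for x in (0.5, 0.1, 0.01):
    lin_err = abs(math.cos(x) - linear_cos(x))
    quad_err = abs(math.cos(x) - quadratic_cos(x))
    # cos(x) < 1 at these points, as the quadratic approximation predicts.
    print(f"x = {x:<5} cos(x) = {math.cos(x):.10f}  linear error = {lin_err:.1e}  quadratic error = {quad_err:.1e}")
```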
5. The mean value theorem. This was discussed only very briefly. If a function f(x) is differentiable on the interval having a and b as endpoints, then there is a point c strictly between a and b so that the slope of the tangent line to y = f(x) at x = c equals the slope of the secant line to y = f(x) containing (a, f(a)) and (b, f(b)),

$$f'(c) = \frac{f(b) - f(a)}{b - a}.$$


This is sometimes useful for bounding f(b) − f(a) when a bound on the derivative of f(x) is known: if |f'(x)| ≤ M for all x between a and b, then |f(b) − f(a)| = |f'(c)||b − a| ≤ M|b − a|.

Lecture 9. September 29, 2005

Homework. Problem Set 2 all of Part I and Part II.

Practice Problems. Course Reader: 2B1, 2B2, 2B4, 2B5.

1. Application of the Mean Value Theorem. A real-world application of the Mean Value Theorem is error analysis. A device accepts an input signal x and returns an output signal y. If the input signal is always in the range −1/2 ≤ x ≤ 1/2 and if the output signal is,

$$y = f(x) = \frac{1}{1 + x + x^2 + x^3},$$

what precision of the input signal x is required to get a precision of $\pm 10^{-3}$ for the output signal?
If the ideal input signal is x = a, and if the precision is ±h, then the actual input signal is in the range a − h ≤ x ≤ a + h. The error in the output signal is |f(x) − f(a)|. By the Mean Value Theorem,

$$\frac{f(x) - f(a)}{x - a} = f'(c),$$

for some c between a and x. The derivative f'(x) is,

$$f'(x) = \frac{-(3x^2 + 2x + 1)}{(1 + x + x^2 + x^3)^2}.$$

For −1/2 ≤ x ≤ 1/2, the numerator is at most its value at x = 1/2, and the denominator is at least its value at x = −1/2 (because $1 + x + x^2 + x^3$ is increasing). So the derivative is bounded by,

$$|f'(x)| \le \frac{3(1/2)^2 + 2(1/2) + 1}{\big[1 + (-1/2) + (-1/2)^2 + (-1/2)^3\big]^2} = \frac{2.75}{0.390625} = 7.04.$$

Thus the Mean Value Theorem gives,

$$|f(x) - f(a)| = |f'(c)|\,|x - a| \le 7.04\,|x - a| \le 7.04\,h.$$

Therefore a precision for the input signal of,

$$h = 10^{-3}/7.04 \approx 1.4 \times 10^{-4}$$

guarantees a precision of $10^{-3}$ for the output signal; in particular, the rounder precision $h = 10^{-4}$ also suffices.
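The bound can be checked numerically. The sketch below recomputes the bound 7.04 and the resulting h, and, for comparison only, samples |f'(x)| on a fine grid of [−1/2, 1/2]. The sampled maximum, about 1.92 at x = −1/2, suggests that 7.04 is a valid but conservative bound: it pairs the largest numerator with the smallest denominator, which occur at opposite ends of the interval. Sampling is evidence, not a proof.

```python
def f_prime(x):
    # f(x) = 1/(1 + x + x^2 + x^3), so f'(x) = -(3x^2 + 2x + 1)/(1 + x + x^2 + x^3)^2.
    return -(3 * x**2 + 2 * x + 1) / (1 + x + x**2 + x**3) ** 2

# The bound from the text: largest numerator (at x = 1/2) over smallest denominator (at x = -1/2).
bound = (3 * 0.5**2 + 2 * 0.5 + 1) / (1 - 0.5 + 0.5**2 - 0.5**3) ** 2

# For comparison only: sample |f'(x)| on a fine grid of [-1/2, 1/2].
grid = [-0.5 + k / 10000 for k in range(10001)]
sampled_max = max(abs(f_prime(x)) for x in grid)

tolerance = 1e-3          # required output precision
h = tolerance / bound     # input precision that the Mean Value Theorem guarantees suffices
print(f"bound = {bound:.4f}, sampled max = {sampled_max:.4f}, h = {h:.3e}")
```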

2. First derivative test. A function f(x) is increasing, respectively decreasing, if f(a) is less than f(b), resp. greater than f(b), whenever a is less than b. In symbols, f is increasing, respectively decreasing, if

$$f(a) < f(b) \text{ whenever } a < b, \qquad \text{resp.} \quad f(a) > f(b) \text{ whenever } a < b.$$


If f(a) is less than or equal to f(b), resp. greater than or equal to f(b), whenever a is less than b, then f(x) is nondecreasing, resp. nonincreasing. If f(x) is increasing, the graph rises to the right. If f(x) is decreasing, the graph rises to the left.
If f'(a) is positive (and f' is continuous near a), the First Derivative Test guarantees that f(x) is increasing for all x sufficiently close to a. If f'(a) is negative (and f' is continuous near a), the First Derivative Test guarantees that f(x) is decreasing for all x sufficiently close to a.
Example. For the function $y = x^3 + x^2 - x - 1$, determine where y is increasing and where y is decreasing.
The derivative is,

$$y' = 3x^2 + 2x - 1 = (3x - 1)(x + 1).$$

Thus the derivative of y changes sign only at the points x = −1 and x = 1/3. By testing one sample point in each interval, y' is positive for x > 1/3, negative for −1 < x < 1/3, and positive for x < −1. Therefore, by the First Derivative Test, y is increasing for x < −1, y is decreasing for −1 < x < 1/3, and y is increasing for x > 1/3.
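The sign test can be sketched in a few lines: evaluate y' at one sample point inside each interval cut out by the roots −1 and 1/3. The sample points −2, 0 and 1 are arbitrary choices.

```python
def y_prime(x):
    # y = x^3 + x^2 - x - 1, so y' = 3x^2 + 2x - 1 = (3x - 1)(x + 1).
    return 3 * x**2 + 2 * x - 1

# The roots -1 and 1/3 cut the line into three intervals; y' has constant sign on each,
# so testing one sample point per interval determines that sign.
sample_points = {"x < -1": -2.0, "-1 < x < 1/3": 0.0, "x > 1/3": 1.0}
for interval, x in sample_points.items():
    trend = "increasing" if y_prime(x) > 0 else "decreasing"
    print(f"{interval:>13}: y'({x}) = {y_prime(x):+.1f}, so y is {trend}")
```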
3. Extremal points. If f(x) ≤ f(a) for all x near a, then a is a local maximum. If f(x) ≥ f(a) for all x near a, then a is a local minimum. Because of the First Derivative Test, if f'(a) > 0 and f is defined to the right of a, the graph of f rises to the right of a. Thus a is not a local maximum. Similarly, if f'(a) < 0 and f is defined to the left of a, the graph of f rises to the left of a. Thus a is not a local maximum. In particular, if f is defined to both the right and left of a, if f'(a) is defined, and if a is a local maximum, then f'(a) equals 0. Similarly, if f is defined to both the right and left of a, if f'(a) is defined, and if a is a local minimum, then f'(a) equals 0.
A point a where f'(a) is defined and equals 0 is a critical point. By the last paragraph, if x = a is a local maximum of f, respectively a local minimum of f, then one of the following holds.
(i) The function f(x) is discontinuous at a.

(ii) The function f(x) is continuous at a, but f'(a) is not defined.

(iii) The point a is a left endpoint of the interval where f is defined, and f'(a) ≤ 0, resp. f'(a) ≥ 0.

(iv) The point a is a right endpoint of the interval where f is defined, and f'(a) ≥ 0, resp. f'(a) ≤ 0.

(v) The function f is defined to the left and right of a, and f'(a) equals 0. In other words, a is a critical point of f.

Example. For the function $y = x^3 + x^2 - x - 1$, the critical points are x = −1 and x = 1/3. By examining where y is increasing and decreasing, x = −1 is a local maximum and x = 1/3 is a local minimum.
The plurals of “maximum” and “minimum” are “maxima” and “minima”. Together, local maxima and local minima are called extremal points, or extrema. These are points where f takes on an extreme value, either largest or smallest compared with nearby values. A point where f achieves its maximum value among all points where f is defined is a global maximum or absolute maximum. A point where f achieves its minimum value among all points where f is defined is a global minimum or absolute minimum.
4. Concavity and the Second Derivative Test. For a differentiable function f, every “interior” extremal point is a critical point of f. But not every critical point of f is an extremal point.
Example. The function $f(x) = x^3$ has a critical point at x = 0. But f(x) is everywhere increasing, thus x = 0 is not an extremal point of f.
When is a critical point an extremal point? When is it a local maximum? When is it a local minimum? This is closely related to the concavity of f. A function f(x) is concave up, respectively concave down, if no secant line segment to f(x) crosses below the graph of f, resp. above the graph of f. In symbols, f is concave up, resp. concave down, if

$$\frac{f(c) - f(a)}{c - a} \le \frac{f(b) - f(a)}{b - a} \text{ whenever } a < c < b, \qquad \text{resp.} \quad \frac{f(c) - f(a)}{c - a} \ge \frac{f(b) - f(a)}{b - a} \text{ whenever } a < c < b.$$

For a differentiable function f, these inequalities are closely related to,

$$f'(c) \le f'(b) \text{ whenever } a < c < b, \qquad \text{resp.} \quad f'(c) \ge f'(b) \text{ whenever } a < c < b.$$

This says precisely that f' is nondecreasing, resp. f' is nonincreasing. If f' is nondecreasing, resp. nonincreasing, then f is concave up, resp. concave down. Applying the First Derivative Test to determine when f' is increasing, resp. decreasing, gives the Second Derivative Test: if f''(a) > 0, then f is concave up near x = a; if f''(a) < 0, then f is concave down near x = a.
If f is concave up near a critical point, the critical point is a local minimum. If f is concave down near a critical point, the critical point is a local maximum. Combined with the Second Derivative Test, this gives a test for when a critical point is a local maximum or local minimum: if f'(a) equals 0 and f''(a) < 0, then x = a is a local maximum. If f'(a) equals 0 and f''(a) > 0, then x = a is a local minimum.
Example. For $y = x^3 + x^2 - x - 1$, the second derivative is $y'' = 6x + 2$. Since $y''(-1) = -4$ is negative, the critical point x = −1 is a local maximum. Since $y''(1/3) = 4$ is positive, x = 1/3 is a local minimum.
5. Inflection points. If f is differentiable, but for every neighborhood of a, f is neither concave up nor concave down on the entire neighborhood, then a is an inflection point. If f''(a) is defined, the Second Derivative Test says that f''(a) must equal 0. Except in pathological cases, an inflection point is a point where f is concave up on one side of a and concave down on the other side of a.
Example. For $y = x^3 + x^2 - x - 1$, the second derivative $y'' = 6x + 2$ is negative for x < −1/3 and positive for x > −1/3. By the Second Derivative Test, y is concave down for x < −1/3 and y is concave up for x > −1/3. Therefore x = −1/3 is an inflection point for y.
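A sketch applying the Second Derivative Test to the critical points of this example and locating the sign change of y''. It uses exact rational arithmetic (Python's `fractions` module) so that values such as y''(1/3) = 4 come out exactly.

```python
from fractions import Fraction

def y_second(x):
    # y = x^3 + x^2 - x - 1, so y'' = 6x + 2.
    return 6 * x + 2

# Second Derivative Test at the critical points x = -1 and x = 1/3.
for c in (Fraction(-1), Fraction(1, 3)):
    value = y_second(c)
    if value < 0:
        verdict = "local maximum"
    elif value > 0:
        verdict = "local minimum"
    else:
        verdict = "test is inconclusive"
    print(f"x = {c}: y'' = {value}, {verdict}")

# y'' = 6x + 2 vanishes at x = -1/3 and changes sign there.
inflection = Fraction(-2, 6)
print(f"inflection point at x = {inflection}")
```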


Lecture 10. September 30, 2005

Homework. Problem Set 3 Part I: (a)–(f). Part II: Problems 1, 2 and 3.

Practice Problems. Course Reader: 2C5, 2C10, 2C12, 2D3, 2D4.

1. Asymptotes. An asymptote describes the behavior of the graph of y = f(x) as it becomes unbounded, in some sense. There are two main examples. The function f has a vertical asymptote x = a if at least one of the following holds,

$$\lim_{x\to a^-} f(x) = +\infty, \quad \lim_{x\to a^-} f(x) = -\infty, \quad \lim_{x\to a^+} f(x) = +\infty, \quad \lim_{x\to a^+} f(x) = -\infty.$$

In each case, the graph of y = f(x) becomes unbounded, and becomes arbitrarily close to the line x = a. If x = a is a vertical asymptote, then f(x) has an infinite discontinuity at x = a.
The function f has a horizontal asymptote y = b if at least one of the following holds,

$$\lim_{x\to+\infty} f(x) = b, \qquad \lim_{x\to-\infty} f(x) = b.$$

In other words, the graph of y = f(x) becomes arbitrarily close to the line y = b as x approaches either +∞ or −∞.
Example. For the function $y = (x^3 + x)/(x^2 - 1) = x(x^2 + 1)/(x^2 - 1)$, the lines x = 1 and x = −1 are vertical asymptotes. There is no horizontal asymptote. However, the graph of y is asymptotic to the line y = x. This was not discussed in lecture. A pair of functions f and g are asymptotic to each other if the line y = 0 is a horizontal asymptote of f − g.
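Polynomial long division makes the asymptote y = x explicit (a short derivation added for illustration, using only the definitions above):

$$\frac{x^3 + x}{x^2 - 1} = x + \frac{2x}{x^2 - 1}, \qquad \text{so} \qquad \lim_{x\to\pm\infty}\left(\frac{x^3 + x}{x^2 - 1} - x\right) = \lim_{x\to\pm\infty}\frac{2x}{x^2 - 1} = 0.$$

Thus y = 0 is a horizontal asymptote of $(x^3 + x)/(x^2 - 1) - x$, which is exactly what it means for the graph of y to be asymptotic to the line y = x.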
2. Applied maximum/minimum problems. Using the First Derivative Test, the maximum and minimum of many functions can be computed. This is very important in applications.
Example. Two long walls meet at right angles, making a corner. Using 10 meters of fence to form the other two sides of a rectangle, what is the largest area that can be enclosed in this corner?
Step 1. Identify parameters. A parameter is a constant or variable. The constant in this problem is 10 meters. Two variables are the length l of one side of the rectangle, and the width w of the remaining side of the rectangle.
Step 2. Draw a diagram. This was done in lecture.
Step 3. Find the quantity to be maximized or minimized. The quantity to be maximized is the area A of the rectangle. Since the area is the product of the length and width, A equals lw.
Step 4. Use the constraints to eliminate variables. The constraint is that the total length of fence is 10 meters. Thus l + w equals 10. This is used to eliminate w,

$$w = 10 - l.$$

Making this substitution, A is now a function of l alone,

$$A(l) = l\,w(l) = l(10 - l) = -l^2 + 10l.$$


Step 4½. Sketch a graph of the quantity to be maximized or minimized. This is not absolutely necessary. Sometimes it is impossible. When you can make a rough sketch, this will typically give a very good idea where the maximum or minimum lies. In the example above, A(l) is a quadratic function. Because both l and w must be nonnegative, A(l) is only defined on the interval 0 ≤ l ≤ 10. Thus the graph of A(l) is a segment of a parabola opening down. The vertex of the parabola is contained in the segment. Thus the vertex is the maximum.
Step 5. Compute the derivative. In this case,

$$A'(l) = -2l + 10.$$

Step 6. Find all critical points, endpoints, discontinuity points, etc. In most cases, it suffices to find all critical points and endpoints. Occasionally it is also necessary to find all points where the derivative is not defined. Rarely it is necessary to also consider discontinuity points (although this is usually so obvious that it does not require a separate step). In this case, the endpoints are l = 0 and l = 10. The one critical point is l = 5.
Step 7. Determine the global maximum or minimum. By checking all critical points, endpoints, etc., determine the global maximum or the global minimum. In this case, A(0) equals 0, A(10) equals 0 and A(5) equals 25. Thus l = 5 is the global maximum.
Step 8. Back-substitute. Plug in the value of the single remaining independent variable to determine the values of the eliminated variables. In this case, w equals 10 − l, which is 10 − 5 = 5 for l = 5. Thus, the largest area, 25 square meters, is enclosed by a square of side length 5 meters.
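Steps 6 through 8 amount to evaluating the quantity at a short list of candidates and keeping the best one. A minimal sketch for the fence example:

```python
def area(l):
    # A(l) = l * w with w = 10 - l, since the 10 m of fence covers both open sides.
    return l * (10 - l)

# Step 6: the endpoints of 0 <= l <= 10, plus the root of A'(l) = -2l + 10.
candidates = [0.0, 10.0, 5.0]

# Step 7: the global maximum is the candidate with the largest area.
best_l = max(candidates, key=area)

# Step 8: back-substitute to recover the eliminated variable.
best_w = 10 - best_l
print(f"l = {best_l}, w = {best_w}, area = {area(best_l)}")
```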
Example. A swimmer is in the water at a distance $b_1$ meters from shore. She wants to reach a point on land $b_2$ meters from the water, which lies a distance a, measured parallel to the shore, from the point on shore closest to her. If the swimmer swims $v_1$ meters per second and runs $v_2$ meters per second, at what distance x from the closest point on shore should she aim to minimize her time to the target? Mathematically, the shore is the x-axis, the swimmer is at the point $(0, b_1)$, and she wants to reach the point $(a, -b_2)$. At what point (x, 0) should she aim?
The constants are a, $b_1$, $b_2$, $v_1$ and $v_2$. The variable is x. It is also convenient to introduce a variable $d_1$ for the distance from $(0, b_1)$ to (x, 0), and a variable $d_2$ for the distance from (x, 0) to $(a, -b_2)$. Although not obvious, it is also very convenient to introduce a variable $\theta_1$ for the acute angle formed by the vertical line through (x, 0) (the normal to the shore) and the line segment joining $(0, b_1)$ to (x, 0). Also introduce $\theta_2$ for the acute angle formed by the same vertical line and the line segment joining (x, 0) to $(a, -b_2)$.
The time $T_1$ to swim to the point (x, 0) is,

$$T_1 = \frac{d_1}{v_1} = \frac{1}{v_1}\left(x^2 + b_1^2\right)^{1/2}.$$

The time $T_2$ to run from (x, 0) to the point $(a, -b_2)$ is,

$$T_2 = \frac{d_2}{v_2} = \frac{1}{v_2}\left((a - x)^2 + b_2^2\right)^{1/2}.$$


Thus the total time to reach the target is,

$$T = T_1 + T_2 = \frac{1}{v_1}\left(x^2 + b_1^2\right)^{1/2} + \frac{1}{v_2}\left((a - x)^2 + b_2^2\right)^{1/2}.$$

The derivative of T with respect to x is,

$$\frac{dT}{dx} = \frac{1}{v_1}\left(\frac{1}{2}\left(x^2 + b_1^2\right)^{-1/2}(2x)\right) + \frac{1}{v_2}\left(\frac{1}{2}\left((a - x)^2 + b_2^2\right)^{-1/2}\big(-2(a - x)\big)\right).$$

This simplifies to,

$$\frac{dT}{dx} = \frac{x}{v_1 d_1} - \frac{a - x}{v_2 d_2}.$$

Observe that $x/d_1$ equals $\sin(\theta_1)$ and $(a - x)/d_2$ equals $\sin(\theta_2)$. Thus,

$$\frac{dT}{dx} = \frac{\sin(\theta_1)}{v_1} - \frac{\sin(\theta_2)}{v_2}.$$

Technically, there are no endpoints. However, it is clear that the minimum must occur for 0 ≤ x ≤ a, so x = 0 and x = a may be taken to be endpoints. The critical point occurs when,

$$\frac{\sin(\theta_1)}{v_1} = \frac{\sin(\theta_2)}{v_2}.$$

This is Snell's Law for refraction of light upon crossing from one medium to another. For refraction, a particle of light (perhaps fictitious) replaces the swimmer, a translucent medium of one type replaces the water, and a translucent medium of a second type replaces the land. If light travels with velocity $v_1$ in the first medium and with velocity $v_2$ in the second medium, light rays will refract upon crossing the boundary between media. Snell's Law describes the angles of this refraction.
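To see the critical-point condition numerically, the sketch below picks hypothetical values for a, $b_1$, $b_2$, $v_1$ and $v_2$ (not from the lecture), locates the zero of dT/dx by bisection, and compares $\sin(\theta_1)/v_1$ with $\sin(\theta_2)/v_2$ there. Bisection is valid here because T is a sum of convex distance terms, so dT/dx is increasing; it is negative at x = 0 and positive at x = a.

```python
import math

# Hypothetical sample values (not from the lecture): distances in meters, speeds in m/s.
a, b1, b2 = 30.0, 10.0, 20.0
v1, v2 = 1.5, 5.0

def total_time(x):
    # T(x) = d1/v1 + d2/v2 when aiming at the shore point (x, 0).
    return math.hypot(x, b1) / v1 + math.hypot(a - x, b2) / v2

def dT_dx(x):
    # dT/dx = x/(v1 d1) - (a - x)/(v2 d2) = sin(theta1)/v1 - sin(theta2)/v2.
    return x / (v1 * math.hypot(x, b1)) - (a - x) / (v2 * math.hypot(a - x, b2))

# dT/dx < 0 at x = 0, > 0 at x = a, and is increasing (T is convex), so bisect.
lo, hi = 0.0, a
for _ in range(100):
    mid = (lo + hi) / 2
    if dT_dx(mid) < 0:
        lo = mid
    else:
        hi = mid
x_star = (lo + hi) / 2

sin1 = x_star / math.hypot(x_star, b1)
sin2 = (a - x_star) / math.hypot(a - x_star, b2)
print(f"aim at x = {x_star:.4f} m, total time {total_time(x_star):.4f} s")
print(f"sin(theta1)/v1 = {sin1 / v1:.6f}, sin(theta2)/v2 = {sin2 / v2:.6f}")
```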
Lecture 11. October 4, 2005
Homework. Problem Set 3 Part I: (g) and (h).
Practice Problems. Course Reader: 2E4, 2E8, 2E9.
1. Related rates. A situation that arises often in practice is that two quantities, say x and y, depend on a third independent variable, say t. The quantities x and y are related through some constraint. Using the constraint, if the rate of change dx/dt is known, the rate of change dy/dt can be inferred.
Example. For a spring displaced x units from equilibrium, Hooke's law implies the potential energy of the spring is,

$$P = \frac{1}{2}kx^2,$$

where k is a constant with units kg/s². At some moment t = T, a spring is displaced 5 cm from equilibrium and has velocity 5 cm/s. In terms of the spring constant k, describe the rate of change of the potential energy at t = T.


Implicitly differentiating the equation with respect to t gives, using the chain rule,

$$\frac{dP}{dt} = \frac{1}{2}k(2x)\frac{dx}{dt} = kx\frac{dx}{dt}.$$

So, at time t = T,

$$\frac{dP}{dt}(T) = kx(T)\frac{dx}{dt}(T) = k(5)(5)\ \text{cm}^2/\text{s} = 25k\ \text{cm}^2/\text{s}.$$

2. Method for solving related-rates problems. Many of these steps apply to any word problem in mathematics.

(i) Identify the independent variable. In the example, this is t.

(ii) Label all constants. In the example, k is a constant.

(iii) Label all dependent variables. In the example, x and P are dependent variables.

(iv) Draw a diagram and carefully label it.

(v) Write the given rate of change and the unknown rate of change. In the example, dx/dt(T) is given as 5 cm/s, and dP/dt(T) is unknown.

(vi) Using the diagram and any other information, find constraints among the dependent variables. In the example, this is the equation $P = kx^2/2$.

(vii) Implicitly differentiate the constraint equations with respect to the independent variable. In the example, this gives dP/dt = kx dx/dt.

(viii) Substitute in all known quantities and solve for the unknown rate of change. In the example, dP/dt(T) equals 25k cm²/s.

Example. A state trooper waits a distance a from a highway for passing speeders. The speed limit is 60 mph. The trooper aims her radar gun at an angle of π/4 to the road. The radar registers a passing car moving away from the trooper at a speed of 50 mph. Should the trooper ticket the driver?
The independent variable is time t. The constants are the distance a and the angle θ = π/4. Label a coordinate system with the trooper at the origin and the highway equal to the line y = a. Label the position of the car along the highway as x, moving in the positive direction. Denote by r the distance of the car from the trooper. Then x and r are dependent variables. The rate of change dr/dt(T) is given as 50 mph. The unknown rate of change is dx/dt(T). The constraint is the Pythagorean theorem, with y = a constant along the highway,

$$r^2 = x^2 + a^2.$$


Implicit differentiation with respect to t yields,

$$2r\frac{dr}{dt} = 2x\frac{dx}{dt} + 0 = 2x\frac{dx}{dt}.$$

At time t = T, x(T) equals a, because the angle θ is π/4. Thus r(T) equals $\sqrt{2}\,a$. Substituting in gives,

$$2(\sqrt{2}\,a)(50) = 2(a)\frac{dx}{dt}(T).$$

Solving gives,

$$\frac{dx}{dt}(T) = 50\sqrt{2} \approx 71 \text{ mph}.$$

This exceeds the 60 mph limit, so the trooper should ticket the driver.
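A quick numerical check of the trooper calculation, under the coordinate setup above with a set to 1 mile (an assumption; any positive value gives the same speed): solve for dx/dt, then simulate a car moving at that speed and confirm by a central difference that the radar would read dr/dt ≈ 50 mph.

```python
import math

# Assumed setup: trooper at the origin, highway along y = a, car at (x, a).
a = 1.0        # distance to the highway in miles (any positive value gives the same speed)
dr_dt = 50.0   # radar reading in mph

# At the moment of the reading the angle is pi/4, so x = a and r = sqrt(2) a.
x, r = a, math.sqrt(2) * a
dx_dt = r * dr_dt / x   # from 2 r dr/dt = 2 x dx/dt
print(f"car speed: {dx_dt:.2f} mph")

def r_of_t(t):
    # Distance from the trooper for a car passing x = a at t = 0 with speed dx_dt.
    return math.hypot(a + dx_dt * t, a)

h = 1e-6  # hours
simulated = (r_of_t(h) - r_of_t(-h)) / (2 * h)   # central-difference estimate of dr/dt
print(f"simulated radar reading: {simulated:.4f} mph")
```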
Example. A point on the x-axis moves away from the origin. There is an angle θ subtended by the unit circle, with equation $x^2 + y^2 = 1$, as seen from the point. In other words, standing at the point (x, 0) and staring at the circle, θ is the angle of your field of vision occupied by the circle. At a moment t = T, the point is at the position (2, 0) and moving with velocity v. What is the rate of change of θ at t = T?
The independent variable is time t. There is no constant. The dependent variables are the x-coordinate of the point, x(t), and the angle θ(t). The rate of change dx/dt(T) is given to be v. The rate of change dθ/dt(T) is unknown.
The constraint is somewhat tricky. There are two tangent lines to the circle containing (x, 0). These are the tangent lines at the points (a, +b) and (a, −b) on the circle. Because the tangent line to the circle at (a, b) is perpendicular to the radius through (a, b), the triangle with vertices (0, 0), (a, b) and (x, 0) is a right triangle. The angle of the triangle at (x, 0) is θ/2. Since the side opposite this angle is a radius, of length 1, and the hypotenuse has length x, the constraint is,

$$\sin(\theta/2) = \frac{1}{x}.$$

Implicit differentiation with respect to t gives,

$$\frac{d\sin(\theta/2)}{d\theta}\,\frac{d\theta}{dt} = \frac{d(x^{-1})}{dx}\,\frac{dx}{dt},$$

or,

$$\frac{1}{2}\cos(\theta/2)\,\frac{d\theta}{dt} = \frac{-1}{x^2}\,\frac{dx}{dt}.$$

Since x(T) equals 2, $\sin(\theta(T)/2) = 1/2$, and thus $\cos(\theta(T)/2)$ equals $\sqrt{3}/2$. Plugging in gives,

$$\frac{1}{2}\cdot\frac{\sqrt{3}}{2}\,\frac{d\theta}{dt}(T) = \frac{-1}{(2)^2}\,v = \frac{-v}{4}.$$

Solving gives,

$$\frac{d\theta}{dt}(T) = -v/\sqrt{3}.$$
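As an independent check of this rate, the sketch below computes θ directly from the tangent points $(1/x, \pm\sqrt{1 - 1/x^2})$, rather than from a constraint equation, and estimates dθ/dx at x = 2 by a central difference; multiplying by dx/dt = v gives dθ/dt. The closed form $-2/(x\sqrt{x^2 - 1})$ used for comparison is the derivative of $2\arcsin(1/x)$.

```python
import math

def theta_geometric(x):
    # Tangent points from (x, 0) to the unit circle are (1/x, +b) and (1/x, -b).
    px = 1 / x
    b = math.sqrt(1 - px * px)
    # Each tangent line makes the angle atan(b / (x - px)) with the x-axis,
    # and the field of vision is twice that angle.
    return 2 * math.atan2(b, x - px)

x0, h = 2.0, 1e-6
rate = (theta_geometric(x0 + h) - theta_geometric(x0 - h)) / (2 * h)  # dθ/dx
closed_form = -2 / (x0 * math.sqrt(x0 * x0 - 1))  # derivative of 2*arcsin(1/x)
print(f"theta(2) = {math.degrees(theta_geometric(x0)):.4f} degrees")
print(f"dθ/dx ≈ {rate:.8f}, closed form {closed_form:.8f}")
```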
3. Another applied max/min problem. As review for Exam 2, here is another applied max/min problem. A trapezoid is inscribed inside the upper unit semicircle, $x^2 + y^2 = 1$, y ≥ 0. The base of the trapezoid is the diameter of the semicircle lying on the x-axis. The top of the trapezoid is parallel to the x-axis, joining (−x, y) to (x, y) for a point (x, y) on the unit circle in the first quadrant. What is the maximal area enclosed by such a trapezoid?
The parameters are x and y. The height of the trapezoid is y. The area of a trapezoid is the product of the height with the average of the lengths of the parallel sides. Thus,

$$A = y\,\frac{2 + 2x}{2} = (x + 1)y.$$

This is the quantity to be maximized. There is a constraint among the parameters,

$$x^2 + y^2 = 1.$$

Also, since (x, y) is in the first quadrant, 0 ≤ x ≤ 1 and 0 ≤ y ≤ 1.

There are at least three ways to proceed. The most direct is to solve for y in terms of x,

$$y = \sqrt{1 - x^2}.$$

Substituting this into the equation for A gives,

$$A(x) = (x + 1)\sqrt{1 - x^2}.$$

Differentiating gives,

$$\frac{dA}{dx} = \sqrt{1 - x^2} + (x + 1)\frac{-2x}{2\sqrt{1 - x^2}} = \frac{1}{\sqrt{1 - x^2}}\Big((1 - x^2) - (x^2 + x)\Big) = \frac{-(2x^2 + x - 1)}{\sqrt{1 - x^2}}.$$

Because the quadratic polynomial $2x^2 + x - 1$ factors as,

$$2x^2 + x - 1 = (2x - 1)(x + 1),$$

the critical points of A are x = −1 and x = 1/2. But x = −1 does not give a point in the first quadrant. Thus A is maximized either at one of the endpoints x = 0, x = 1 or at the critical point x = 1/2. Plugging in gives,

$$A(0) = 1, \qquad A(1/2) = 3\sqrt{3}/4, \qquad A(1) = 0.$$

This gives the answer: A achieves its maximum $3\sqrt{3}/4$ at the point $(x, y) = (1/2, \sqrt{3}/2)$.

Two other methods were given in lecture. The fastest among the three is to instead maximize $A^2$, which has the same maximizer as A because A ≥ 0,

$$A^2 = (x + 1)^2 y^2.$$

Using the constraint, $y^2 = 1 - x^2$, thus,

$$(A^2)(x) = (x + 1)^2(1 - x^2).$$

The derivative of this polynomial is very fast to compute,

$$\frac{d(A^2)}{dx} = 2(x + 1)(1 - x^2) - 2x(x + 1)^2 = -2(x + 1)^2(2x - 1),$$

and gives the same answer as above.
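A numerical sketch comparing the three candidate values and confirming, with a coarse grid search over 0 ≤ x ≤ 1, that the maximum sits at x = 1/2. The grid search maximizes $A^2$ as in the faster method; since A ≥ 0 on this interval, A and $A^2$ have the same maximizer. The grid search is a sanity check, not a proof.

```python
import math

def area(x):
    # A(x) = (x + 1) * sqrt(1 - x^2) for the corner (x, y) on the upper unit semicircle.
    return (x + 1) * math.sqrt(1 - x * x)

def area_squared(x):
    # A^2 = (x + 1)^2 (1 - x^2): a polynomial with the same maximizer on [0, 1], since A >= 0.
    return (x + 1) ** 2 * (1 - x * x)

# Endpoints and the critical point x = 1/2.
for x in (0.0, 0.5, 1.0):
    print(f"A({x}) = {area(x):.6f}")

# Coarse grid search on 0 <= x <= 1.
grid = [k / 1000 for k in range(1001)]
best = max(grid, key=area_squared)
print(f"grid maximizer x = {best}, A = {area(best):.6f}, 3*sqrt(3)/4 = {3 * math.sqrt(3) / 4:.6f}")
```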

Lecture 12. October 6, 2005

Homework. Problem Set 3 Part I: (i) and (j).

This was a guest lecture by Sabri Kilic. Notes from the lecture will not be posted. As always, please do the required reading in the course textbook.

Lecture 13. October 13, 2005

Homework. Problem Set 4 Part I: (a) and (b); Part II: Problem 3.

Practice Problems. Course Reader: 3A1, 3A2, 3A3.

1. Differentials. An alternative notation for derivatives is differential notation. The differential notation,

$$dF(x) = f(x)\,dx,$$

is shorthand for the sentence “The derivative of F(x) with respect to x equals f(x).” Formally, this is related to the Leibniz notation for the derivative,

$$\frac{dF}{dx}(x) = f(x),$$

which means the same thing as the differential notation. It may look like the first and second equations are obtained from each other by dividing and multiplying by the quantity dx. It is crucial to remember that dF/dx is not a fraction, although the notation suggests otherwise.
In differential notation, some derivative rules have a very simple form, and are thus easier to remember. Here are a few derivative rules in differential notation.

$$dF(x) = F'(x)\,dx$$

$$d\big(F(x) + G(x)\big) = dF(x) + dG(x)$$

$$d\big(cF(x)\big) = c\,dF(x)$$

$$d\big(F(x)G(x)\big) = G(x)\,dF(x) + F(x)\,dG(x)$$

$$d\big(F(x)/G(x)\big) = \frac{1}{G(x)^2}\big(G(x)\,dF(x) - F(x)\,dG(x)\big)$$

The chain rule has a particularly simple form,

$$d\big(F(u)\big) = \frac{dF}{du}\,du = \frac{dF}{du}\,\frac{du}{dx}\,dx.$$


Example. Using differential notation, the derivative of $\sin(\sqrt{x^2 + 1})$ is,

$$d\sin\big((x^2 + 1)^{1/2}\big) = \cos\big((x^2 + 1)^{1/2}\big)\,d(x^2 + 1)^{1/2} = \cos\big((x^2 + 1)^{1/2}\big)\Big(\tfrac{1}{2}(x^2 + 1)^{-1/2}\Big)\,d(x^2 + 1)$$

$$= \cos\big((x^2 + 1)^{1/2}\big)\,\tfrac{1}{2}(x^2 + 1)^{-1/2}(2x\,dx) = x(x^2 + 1)^{-1/2}\cos\big((x^2 + 1)^{1/2}\big)\,dx.$$
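The differential computation can be spot-checked with a symmetric difference quotient. This is only a numerical sanity check at a few arbitrary points, not a proof:

```python
import math

def F(x):
    return math.sin(math.sqrt(x * x + 1))

def dF_dx(x):
    # From the differential computation: x (x^2 + 1)^(-1/2) cos((x^2 + 1)^(1/2)).
    s = math.sqrt(x * x + 1)
    return x / s * math.cos(s)

def central_difference(g, x, h=1e-6):
    # Symmetric difference quotient; its truncation error is of order h^2 for smooth g.
    return (g(x + h) - g(x - h)) / (2 * h)

for x in (-2.0, 0.0, 0.7, 3.0):
    print(f"x = {x:+.1f}: difference quotient {central_difference(F, x):.8f}, formula {dF_dx(x):.8f}")
```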

2. Antidifferentiation. Recall, the basic problem of differentiation is the following.

Problem (Differentiation). Given a function F(x), find the function f(x) satisfying $\frac{dF}{dx} = f(x)$.
The basic problem of antidifferentiation is the inverse problem.

Problem (Antidifferentiation). Given a function f(x), find a function F(x) satisfying $\frac{dF}{dx} = f(x)$.
A function F(x) solving the problem is called an antiderivative of f(x), or sometimes an indefinite integral of f(x). The notation for this is,

$$F(x) = \int f(x)\,dx.$$

The expression f(x) is called the integrand. It is important to note that if F(x) is one antiderivative of f(x), then for each constant C, F(x) + C is also an antiderivative of f(x). The constant C is called a constant of integration.
In a sense that can be made precise, the problem of differentiation has a complete solution whenever F(x) is a “simple expression”, i.e., a function built from the differentiable functions we have seen so far. Unfortunately, for very many simple functions f(x), no antiderivative of f(x) has a simple expression. In large part, this is what makes antidifferentiation difficult. Luckily, many of the most important simple functions f(x) do have an antiderivative with a simple expression. One goal of this unit is to learn how to recognize when a simple antiderivative exists, and some tools to compute the antiderivative.
3. Antidifferentiation. Guess-and-check. The main technique for antidifferentiation is educated guessing.
Example. Find an antiderivative of $f(x) = x^2 + 2x + 1$. Since the derivative of $x^n$ is $nx^{n-1}$, it is reasonable to guess there is an antiderivative of the form $F(x) = Ax^3 + Bx^2 + Cx$. Differentiation gives,

$$\frac{dF}{dx} = 3Ax^2 + 2Bx + C.$$

Thus, F(x) is an antiderivative of f(x) if and only if,

$$3A = 1, \qquad 2B = 2, \qquad \text{and} \qquad C = 1.$$

This gives an antiderivative,

$$\int (x^2 + 2x + 1)\,dx = \tfrac{1}{3}x^3 + x^2 + x + E,$$


where E is any constant.
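The guess-and-check step generalizes to any polynomial: guessing a polynomial F of one higher degree and matching coefficients forces the coefficient of $x^{n+1}$ in F to be the coefficient of $x^n$ in f divided by n + 1. A minimal sketch with exact fractions; the function names are illustrative, and it handles polynomials only:

```python
from fractions import Fraction

def antiderivative(coeffs):
    # coeffs[n] is the coefficient of x^n in f. Matching (n + 1) * A_(n+1) = coeffs[n]
    # term by term gives the coefficients of F; the constant term is set to 0, and any
    # constant E could be added.
    return [Fraction(0)] + [Fraction(c) / (n + 1) for n, c in enumerate(coeffs)]

def derivative(coeffs):
    # Power rule term by term: d/dx x^n = n x^(n - 1).
    return [n * c for n, c in enumerate(coeffs)][1:]

f = [1, 2, 1]            # f(x) = 1 + 2x + x^2
F = antiderivative(f)    # coefficients 0, 1, 1, 1/3, i.e. F(x) = x + x^2 + x^3/3
print(F)
print(derivative(F) == f)  # the "check" half of guess-and-check
```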

Guess-and-check is a game we can lose, as well as win. However, there are a few rules that better the odds in this guessing game. In fact, they are basically the same rules for derivatives in differential notation, simply written backwards.

$$\int \big(f(x) + g(x)\big)\,dx = \int f(x)\,dx + \int g(x)\,dx$$

$$\int cf(x)\,dx = c\int f(x)\,dx$$

$$\int f(u(x))\,u'(x)\,dx = \int f(u)\,du$$

4. Antidifferentiation. Integration by substitution. The last rule above is very important, and is called integration by substitution.
Example. Find an antiderivative of $x\sin(x^2)$. This time guess-and-check is much less effective. By roughly the same logic as in the last example, we might guess an antiderivative has the form $Ax^3\sin(x^2)$. The derivative is $3Ax^2\sin(x^2) + 2Ax^4\cos(x^2)$. The first term is good, but the second term is bad. We can try to correct our guess by adding a term, obtaining $Ax^3\sin(x^2) - \tfrac{2}{5}Ax^5\cos(x^2)$, whose derivative is now $3Ax^2\sin(x^2) + \tfrac{4}{5}Ax^6\sin(x^2)$. This still doesn't work, and is leading in the wrong direction.
A better solution is to use integration by substitution. Observe that part of f(x) can be written as a function of $u(x) = x^2$. Also, the derivative $u'(x) = 2x$ occurs in f(x) through $x = \tfrac{1}{2}(2x) = u'(x)/2$. Thus,

$$x\sin(x^2) = \sin(u(x))\,u'(x)/2, \qquad u(x) = x^2.$$

Applying integration by substitution,

$$\int x\sin(x^2)\,dx = \int \sin(u(x))\,\tfrac{1}{2}u'(x)\,dx = \tfrac{1}{2}\int \sin(u)\,du = -\tfrac{1}{2}\cos(u) + C = -\tfrac{1}{2}\cos(x^2) + C.$$

Here is a checklist for applying integration by substitution to find the antiderivative of f(x).

(i) Find an expression u(x) so that most of the integrand f(x) can be expressed as a simpler function of u(x).

(ii) Compute the differential du(x) = u'(x)dx.

(iii) Inside the differential f(x)dx, try to find du = u'(x)dx as a factor.

(iv) Try to write f(x)dx as g(u)du. If you cannot do this, the method does not apply with the given choice of u.

(v) Find an antiderivative $G(u) = \int g(u)\,du$ for the simpler integrand g(u) (if this is possible).

(vi) Back-substitute u = u(x) to get an antiderivative F(x) = G(u(x)) for f(x).


Example. Compute the antiderivative,

$$\int \sin(x)^3\cos(x)\,dx.$$

Most of the integrand is a function of sin(x). So substitute u(x) = sin(x). The differential of u is du = cos(x)dx. The differential $\sin(x)^3\cos(x)\,dx$ contains du = cos(x)dx as a factor. The remainder of the integrand is $\sin(x)^3 = u^3$. So, according to integration by substitution,

$$\int \sin(x)^3\cos(x)\,dx = \int u^3\,du = \tfrac{1}{4}u^4 + C.$$

Finally, back-substitute u = sin(x) to get,

$$\int \sin(x)^3\cos(x)\,dx = (\sin(x))^4/4 + C.$$
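Both substitution answers can be spot-checked numerically: if F is an antiderivative of f, then the difference quotient (F(x + h) − F(x − h))/(2h) should be close to f(x). A sketch with arbitrary test points:

```python
import math

def max_discrepancy(f, F, points=(-1.3, 0.2, 0.9, 2.4), h=1e-6):
    # If F is an antiderivative of f, the central difference quotient of F
    # should agree with f at every test point (up to a small numerical error).
    return max(abs((F(x + h) - F(x - h)) / (2 * h) - f(x)) for x in points)

examples = {
    "x sin(x^2)": (lambda x: x * math.sin(x * x), lambda x: -0.5 * math.cos(x * x)),
    "sin(x)^3 cos(x)": (lambda x: math.sin(x) ** 3 * math.cos(x), lambda x: math.sin(x) ** 4 / 4),
}

for name, (f, F) in examples.items():
    print(f"{name}: largest discrepancy {max_discrepancy(f, F):.1e}")
```

A wrong sign, for example $+\tfrac{1}{2}\cos(x^2)$, produces a discrepancy of order 1 and is caught immediately.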

Lecture 14. October 14, 2005

Homework. Problem Set 4 Part II: Problem 2.
Practice Problems. Course Reader: 3B1, 3B3, 3B4, 3B5.
1. The problem of areas. The ancient Greeks computed the areas of triangles, quadrilaterals,
and many other polygons. Their basic method was dissection: dissecting a polygonal region exactly
into smaller regions, usually triangles, having known areas. The area of the large region is the sum
of the areas of the small regions. But the ancient Greeks also knew the area of a circle, which
cannot be dissected exactly into finitely many polygonal regions. Their method was exhaustion:
finding polygonal regions approximately equal to the original region, and computing the limit of
the areas of the polygons as the approximation improves.
Example. A regular N-sided polygon inscribed in a circle of radius r has apothem length a = r cos(π/N) and chord length b = 2r sin(π/N). Thus the area of the polygon is,

$$A = N\,\frac{ab}{2} = Nr^2\sin(\pi/N)\cos(\pi/N) = \frac{N}{2}r^2\sin(2\pi/N) = \pi r^2\,\frac{\sin(2\pi/N)}{2\pi/N}.$$

As N increases, 2π/N decreases to 0. Because $\lim_{t\to 0}\sin(t)/t$ equals 1, as N approaches infinity, the area of the polygon approaches,

$$\lim_{N\to\infty}\pi r^2\,\frac{\sin(2\pi/N)}{2\pi/N} = \pi r^2.$$
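The method of exhaustion can be watched numerically: the sketch below evaluates the polygon area $(N/2)r^2\sin(2\pi/N)$ for increasing N with r = 1, and the values increase toward π.

```python
import math

def polygon_area(n, r=1.0):
    # Regular n-gon inscribed in a circle of radius r: n triangles of area (1/2) r^2 sin(2*pi/n).
    return 0.5 * n * r * r * math.sin(2 * math.pi / n)

for n in (6, 12, 96, 1000):
    print(f"N = {n:>4}: area = {polygon_area(n):.6f}   (pi = {math.pi:.6f})")
```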

A more sophisticated version of the method of exhaustion gives the Riemann integral. Here is the
basic problem.
Problem (Area). Find the signed area between the graph of y = f(x) and the x-axis over the interval a ≤ x ≤ b.


For a region above the x-axis, the signed area is simply the area. For a region below the x-axis, the signed area is the negative of the area. For a region partly above the x-axis and partly below the x-axis, the signed area is the sum of the signed area of the region above the x-axis and the signed area of the region below the x-axis.
2. Partitions. A partition of an interval [a, b] is a finite decomposition of the interval as a union of nonoverlapping subintervals,

$$[a, b] = [x_0, x_1] \cup [x_1, x_2] \cup \cdots \cup [x_{n-2}, x_{n-1}] \cup [x_{n-1}, x_n].$$

Since an interval is determined by its left and right endpoints, to specify a partition of [a, b], it is equivalent to give an increasing sequence of numbers,

$$a = x_0 < x_1 < x_2 < \cdots < x_{n-2} < x_{n-1} < x_n = b.$$

The kth subinterval of the partition is the interval $[x_{k-1}, x_k]$, having length,

$$\Delta x_k = x_k - x_{k-1}.$$

A partition is fine if the subintervals are small, and coarse if the subintervals are large. It may seem that the number n of subintervals is a good measure of fineness: since the subintervals of a fine partition are small, the number n of subintervals must be large. However, a partition into many subintervals may include a few subintervals that are quite large. For instance, the partition

$$[0, 1] = \left[0, \tfrac{1}{2n}\right] \cup \left[\tfrac{1}{2n}, \tfrac{2}{2n}\right] \cup \left[\tfrac{2}{2n}, \tfrac{3}{2n}\right] \cup \cdots \cup \left[\tfrac{n-2}{2n}, \tfrac{n-1}{2n}\right] \cup \left[\tfrac{n-1}{2n}, \tfrac{n}{2n}\right] \cup \left[\tfrac{1}{2}, 1\right],$$

has n very small intervals of length 1/(2n), but has one interval, [1/2, 1], of length 1/2. The number 1/2 may not seem large, but as n increases, it is quite large compared to 1/(2n).
Because of such examples, a better measure of fineness is mesh size: the mesh size of a partition is the maximal length of any subinterval in the partition,

$$\text{mesh} = \max\{\Delta x_k \mid k = 1, \ldots, n\}.$$
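A small sketch computing the mesh size directly from the list of partition points, applied to the lopsided partition above: the number of subintervals grows with n, but the mesh stays 1/2.

```python
from fractions import Fraction

def mesh(points):
    # points is the increasing list a = x_0 < x_1 < ... < x_n = b; mesh = largest Δx_k.
    return max(right - left for left, right in zip(points, points[1:]))

def lopsided_partition(n):
    # The example partition of [0, 1]: n pieces of length 1/(2n), then the piece [1/2, 1].
    return [Fraction(k, 2 * n) for k in range(n + 1)] + [Fraction(1)]

for n in (1, 10, 1000):
    p = lopsided_partition(n)
    print(f"n = {n}: {len(p) - 1} subintervals, mesh = {mesh(p)}")
```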

3. Riemann sums. Let f(x) be a function defined on an interval a ≤ x ≤ b. Given a partition $a = x_0 < \cdots < x_n = b$ of [a, b], and given a choice, for every k = 1, …, n, of an element $x_k^*$ in the kth subinterval, $x_{k-1} \le x_k^* \le x_k$, the curvilinear region bounded by y = f(x) and the x-axis is approximated by a union of n vertical strips. The kth vertical strip lies above or below the interval $x_{k-1} \le x \le x_k$ on the x-axis, and has height $y_k^* = f(x_k^*)$. The width of the vertical strip is $\Delta x_k$, thus its signed area is,

$$\Delta A_k = y_k^*\,\Delta x_k.$$

The total area of the union of vertical strips is simply the sum of the areas of the individual vertical strips,

$$A = \sum_{k=1}^{n} y_k^*\,\Delta x_k.$$


The sum above is a Riemann sum. It is an approximation of the signed area of the curvilinear region.
There are many choices of partition, and for each partition there are many choices for the numbers $x_k^*$. However, there are some special choices. On the $k$th interval, the smallest value $f(x)$ takes on is denoted by
$$y_{k,\min} = \min\{f(x) \mid x_{k-1} \le x \le x_k\}.$$
Similarly, the largest value $f(x)$ takes on is denoted by
$$y_{k,\max} = \max\{f(x) \mid x_{k-1} \le x \le x_k\}.$$
For every choice of $x_k^*$ in the $k$th interval, $y_k^*$ is trapped between these two values,
$$y_{k,\min} \le y_k^* \le y_{k,\max}.$$
Denoting
$$\Delta A_{k,\min} = y_{k,\min}\,\Delta x_k, \qquad \Delta A_{k,\max} = y_{k,\max}\,\Delta x_k,$$
the area $\Delta A_k$ is trapped between these two values,
$$\Delta A_{k,\min} \le \Delta A_k \le \Delta A_{k,\max}.$$
Denoting the sums of the areas by
$$A_{\min} = \sum_{k=1}^{n}\Delta A_{k,\min} = \sum_{k=1}^{n} y_{k,\min}\,\Delta x_k, \qquad A_{\max} = \sum_{k=1}^{n}\Delta A_{k,\max} = \sum_{k=1}^{n} y_{k,\max}\,\Delta x_k,$$
the Riemann sum $A$ is trapped between the two values,
$$A_{\min} \le A \le A_{\max}.$$
Thus, if $A_{\min}$ and $A_{\max}$ are close to each other, the value of $A$ does not depend very much on the choices of the numbers $x_k^*$.
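The squeeze $A_{\min} \le A \le A_{\max}$ can be checked numerically. The Python sketch below is illustrative only. `riemann_sum` evaluates $\sum f(x_k^*)\,\Delta x_k$ for arbitrary sample points. The helper `lower_upper_monotone` is hypothetical and only valid under a loudly stated assumption: $f$ must be monotone, so that each subinterval's minimum and maximum sit at its endpoints. A general $f$ would need a true minimum and maximum on each subinterval. The example uses $f = e^x$ on $[0, 1]$ with random sample points.

```python
import math
import random


def riemann_sum(f, points, samples):
    """Sum of f(x_k^*) * Delta x_k, one sample point x_k^* per subinterval."""
    return sum(f(s) * (right - left)
               for left, right, s in zip(points, points[1:], samples))


def lower_upper_monotone(f, points):
    """A_min and A_max, ASSUMING f is monotone on [a, b] so that its min and
    max on each subinterval are attained at the endpoints."""
    a_min = a_max = 0.0
    for left, right in zip(points, points[1:]):
        f_left, f_right = f(left), f(right)
        a_min += min(f_left, f_right) * (right - left)
        a_max += max(f_left, f_right) * (right - left)
    return a_min, a_max


random.seed(0)
n = 50
points = [k / n for k in range(n + 1)]   # uniform partition of [0, 1]
samples = [random.uniform(left, right) for left, right in zip(points, points[1:])]
A = riemann_sum(math.exp, points, samples)
A_min, A_max = lower_upper_monotone(math.exp, points)
print(A_min, A, A_max)
```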
4. The Riemann integral. The method of the Riemann integral is to compute both $A_{\min}$ and $A_{\max}$ for a sequence of partitions whose mesh sizes approach 0. The mesh size measures the fineness of the partition, thus also the fit of the union of vertical strips to the curvilinear region. If the two limits
$$\lim_{\text{mesh}\to 0} A_{\min}, \qquad \lim_{\text{mesh}\to 0} A_{\max}$$
are defined and equal, the Riemann integral is said to exist, and the common limit is called the Riemann integral,
$$\int_a^b f(x)\,dx = \lim_{\text{mesh}\to 0} A_{\min} = \lim_{\text{mesh}\to 0} A_{\max}.$$
Also, $f(x)$ is said to be Riemann integrable on the interval $[a, b]$. Another name for the Riemann integral is the definite integral.


Example. Consider the function $f(x) = x$ on the interval $0 \le x \le L$, for some positive number $L$. Form the partition with $n$ subintervals of equal length,
$$x_0 = 0 = 0L/n,\quad x_1 = 1L/n,\quad x_2 = 2L/n,\quad \ldots,\quad x_k = kL/n,\quad \ldots,\quad x_n = nL/n = L.$$
Every interval has length $\Delta x_k = L/n$, so the mesh size is $L/n$. Because $f(x) = x$ is increasing, the minimum value of $f(x)$ on the interval $x_{k-1} \le x \le x_k$ is $y_{k,\min} = x_{k-1} = (k-1)L/n$, and the maximum value is $y_{k,\max} = x_k = kL/n$. Thus,
$$A_{\min} = \sum_{k=1}^{n} y_{k,\min}\,\Delta x_k = \sum_{k=1}^{n}\frac{(k-1)L}{n}\cdot\frac{L}{n} = \frac{L^2}{n^2}\sum_{k=1}^{n}(k-1),$$
and
$$A_{\max} = \sum_{k=1}^{n} y_{k,\max}\,\Delta x_k = \sum_{k=1}^{n}\frac{kL}{n}\cdot\frac{L}{n} = \frac{L^2}{n^2}\sum_{k=1}^{n}k.$$
To evaluate these sums, use the well-known formula
$$\sum_{k=1}^{n} k = \frac{n(n+1)}{2}.$$
Making the substitution $l = k - 1$, this also gives
$$\sum_{k=1}^{n}(k-1) = \sum_{l=0}^{n-1} l = \sum_{l=1}^{n-1} l = \frac{(n-1)n}{2}.$$
Substituting these formulas gives
$$A_{\min} = \frac{L^2}{n^2}\cdot\frac{n(n-1)}{2} = \frac{L^2}{2}\left(1 - \frac{1}{n}\right),$$
and
$$A_{\max} = \frac{L^2}{n^2}\cdot\frac{n(n+1)}{2} = \frac{L^2}{2}\left(1 + \frac{1}{n}\right).$$
Therefore,
$$\lim_{n\to\infty} A_{\min} = \lim_{n\to\infty}\frac{L^2}{2}\left(1 - \frac{1}{n}\right) = \frac{L^2}{2}(1 - 0) = \frac{L^2}{2}.$$
Similarly,
$$\lim_{n\to\infty} A_{\max} = \lim_{n\to\infty}\frac{L^2}{2}\left(1 + \frac{1}{n}\right) = \frac{L^2}{2}(1 + 0) = \frac{L^2}{2}.$$
Since the two limits are equal, $f(x) = x$ is Riemann integrable on the interval $[0, L]$, and
$$\int_0^L x\,dx = \frac{L^2}{2}.$$
This agrees with the familiar result from high-school geometry: the area of a triangle equals one half of the base times the height, and both the base and the height of this triangle are $L$.
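As a sanity check on the algebra, the following Python sketch (illustrative, not from the notes) adds up the lower and upper sums term by term with exact rational arithmetic. It compares them with the closed forms $\frac{L^2}{2}(1 \mp 1/n)$. The function name `lower_upper_identity` and the sample value $L = 3$ are arbitrary choices for the illustration.

```python
from fractions import Fraction


def lower_upper_identity(L, n):
    """A_min and A_max for f(x) = x on [0, L] with n equal subintervals.
    Since f is increasing, the min/max on [x_{k-1}, x_k] are x_{k-1} and x_k."""
    dx = Fraction(L) / n
    a_min = sum((k - 1) * dx * dx for k in range(1, n + 1))
    a_max = sum(k * dx * dx for k in range(1, n + 1))
    return a_min, a_max


L = 3
for n in (1, 10, 1000):
    a_min, a_max = lower_upper_identity(L, n)
    # Compare with the closed forms (L^2/2)(1 - 1/n) and (L^2/2)(1 + 1/n).
    assert a_min == Fraction(L * L, 2) * (1 - Fraction(1, n))
    assert a_max == Fraction(L * L, 2) * (1 + Fraction(1, n))
    print(n, float(a_min), float(a_max))
```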


5. Rules for Riemann integrals. There are several rules for Riemann integrals, summarized below.
$$\int_a^b \big(f(x) + g(x)\big)\,dx = \int_a^b f(x)\,dx + \int_a^b g(x)\,dx,$$
$$\int_a^b \big(r \cdot f(x)\big)\,dx = r \cdot \int_a^b f(x)\,dx,$$
$$\int_a^b f(x)\,dx + \int_b^c f(x)\,dx = \int_a^c f(x)\,dx.$$

Lecture 15. October 18, 2005

Homework. Problem Set 4 Part I: (d) and (e); Part II: Problem 2.
Practice Problems. Course Reader: 3B6, 3C2, 3C3, 3C4, 3C6.
1. The Riemann sum for the exponential function. The problem is to compute the Riemann integral
$$\int_0^b e^x\,dx$$
using Riemann sums. Choose the partition of $[0, b]$ into a sequence of $n$ equally spaced subintervals of length $b/n$. So the partition numbers are $x_k = kb/n$, and the length of each subinterval is $\Delta x_k = b/n$. Because $e^x$ is increasing, the minimum value of $e^x$ on the interval $[x_{k-1}, x_k]$ occurs at the left endpoint,
$$y_{k,\min} = e^{x_{k-1}} = e^{(k-1)b/n}.$$
Similarly, the maximum value occurs at the right endpoint,
$$y_{k,\max} = e^{x_k} = e^{kb/n}.$$
Thus the lower sum is
$$A_{\min} = \sum_{k=1}^{n} y_{k,\min}\,\Delta x_k = \sum_{k=1}^{n} e^{(k-1)b/n}\,\frac{b}{n},$$
and the upper sum is
$$A_{\max} = \sum_{k=1}^{n} y_{k,\max}\,\Delta x_k = \sum_{k=1}^{n} e^{kb/n}\,\frac{b}{n}.$$
To evaluate each of the sums, make the substitution $c = e^{b/n}$. Then the lower sum is
$$A_{\min} = \frac{b}{n}\sum_{k=1}^{n} c^{k-1} = \frac{b}{n}\sum_{l=0}^{n-1} c^{l}.$$
The sum is a geometric sum,
$$1 + c + c^2 + \cdots + c^{n-2} + c^{n-1} = \frac{c^n - 1}{c - 1}.$$
Plugging this in gives
$$A_{\min} = \frac{b}{n}\cdot\frac{c^n - 1}{c - 1} = \frac{b}{n}\cdot\frac{e^{bn/n} - 1}{e^{b/n} - 1}.$$
This simplifies to
$$A_{\min} = (e^b - 1)\,\frac{b/n}{e^{b/n} - 1}.$$
A similar computation gives
$$A_{\max} = (e^b - 1)\,e^{b/n}\,\frac{b/n}{e^{b/n} - 1}.$$
Now make the substitution $h = b/n$. This gives
$$A_{\min} = (e^b - 1)\,\frac{h}{e^h - 1}, \qquad A_{\max} = (e^b - 1)\,e^h\,\frac{h}{e^h - 1}.$$
Taking the limit of $A_{\min}$, respectively $A_{\max}$, as $n$ tends to infinity is the same as taking the limit as $h$ tends to 0.
Now observe that
$$\lim_{h\to 0}\frac{e^h - 1}{h}$$
is the difference quotient limit giving the derivative of $e^x$ at $x = 0$. Since $de^x/dx$ equals $e^x$, and since $e^0$ equals 1, this gives
$$\lim_{h\to 0}\frac{e^h - 1}{h} = 1.$$
Inverting gives
$$\lim_{h\to 0}\frac{h}{e^h - 1} = \left(\lim_{h\to 0}\frac{e^h - 1}{h}\right)^{-1} = (1)^{-1} = 1.$$
Also, because $e^x$ is continuous,
$$\lim_{h\to 0} e^h = e^0 = 1.$$
Putting this together gives
$$\lim_{n\to\infty} A_{\min} = (e^b - 1)\lim_{h\to 0}\frac{h}{e^h - 1} = (e^b - 1)(1) = e^b - 1.$$
Similarly,
$$\lim_{n\to\infty} A_{\max} = (e^b - 1)\left(\lim_{h\to 0} e^h\right)\left(\lim_{h\to 0}\frac{h}{e^h - 1}\right) = (e^b - 1)(1)(1) = e^b - 1.$$
Since the limit of $A_{\min}$ and the limit of $A_{\max}$ exist and are equal, the Riemann integral exists and equals
$$\int_0^b e^x\,dx = e^b - 1.$$
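The closed forms in terms of $h = b/n$ can be compared against the sums computed directly from the definition. The Python sketch below is an illustration, not the notes' method. The function names are hypothetical, and the sample value $b = 2$ is an arbitrary choice. `math.expm1(h)` computes $e^h - 1$ accurately for small $h$, which matters here because $h/(e^h - 1)$ is a near-cancellation.

```python
import math


def exp_lower_upper(b, n):
    """Closed forms for A_min and A_max of e^x on [0, b] with n equal pieces,
    written with h = b/n as in the derivation above."""
    h = b / n
    factor = h / math.expm1(h)          # h / (e^h - 1); expm1 avoids cancellation
    a_min = math.expm1(b) * factor
    a_max = math.expm1(b) * math.exp(h) * factor
    return a_min, a_max


def exp_lower_upper_direct(b, n):
    """The same sums computed term by term from the definition."""
    h = b / n
    a_min = sum(math.exp((k - 1) * h) * h for k in range(1, n + 1))
    a_max = sum(math.exp(k * h) * h for k in range(1, n + 1))
    return a_min, a_max


b = 2.0
for n in (10, 100, 10_000):
    print(n, exp_lower_upper(b, n), exp_lower_upper_direct(b, n), math.expm1(b))
```

As $n$ grows, both bounds close in on $e^b - 1$. Their gap is exactly $(e^b - 1)h$.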


2. The Riemann sum for $x^r$. Let $r > 0$ be a positive real number, and let $b > 1$. The problem is to compute the Riemann integral
$$\int_1^b x^r\,dx$$
using Riemann sums. For this particular integral, a different partition than usual is more efficient. Let $n$ be a positive integer, and let $q$ be the real number
$$q = b^{1/n}.$$
Choose the partition of $[1, b]$ into $n$ subintervals with partition numbers
$$x_k = q^k.$$
Observe that
$$1 = x_0 < x_1 < \cdots < x_{n-1} < x_n = (b^{1/n})^n = b.$$
The length of the $k$th subinterval is
$$\Delta x_k = x_k - x_{k-1} = q^k - q^{k-1} = q^{k-1}(q - 1).$$
Observe that this increases as $k$ increases, so this is not the partition of $[1, b]$ into $n$ equal subintervals. The mesh size is
$$\text{mesh} = \max(\Delta x_1, \ldots, \Delta x_n) = \Delta x_n = (q - 1)\,b^{(n-1)/n} \le (q - 1)\,b.$$
As $n$ tends to infinity, $q = b^{1/n}$ tends to $b^0 = 1$, so the mesh size tends to
$$\lim_{n\to\infty}\text{mesh} \le b\lim_{n\to\infty}\left(b^{1/n} - 1\right) = 0.$$
Thus, even though this isn't the most obvious choice of partition, it can be used to compute the Riemann integral.
Because $x^r$ is increasing, the minimum value of $x^r$ on the interval $[x_{k-1}, x_k]$ occurs at the left endpoint,
$$y_{k,\min} = x_{k-1}^r = q^{(k-1)r}.$$
Similarly, the maximum value occurs at the right endpoint,
$$y_{k,\max} = x_k^r = q^{kr}.$$
Thus the lower sum is
$$A_{\min} = \sum_{k=1}^{n} y_{k,\min}\,\Delta x_k = \sum_{k=1}^{n} q^{(k-1)r}\cdot q^{k-1}(q - 1).$$
This simplifies to
$$A_{\min} = (q - 1)\sum_{k=1}^{n} q^{(k-1)(r+1)}.$$
And the upper sum is
$$A_{\max} = \sum_{k=1}^{n} y_{k,\max}\,\Delta x_k = \sum_{k=1}^{n} q^{kr}\,q^{k-1}(q - 1).$$
This simplifies to
$$A_{\max} = (q - 1)\,q^r\sum_{k=1}^{n} q^{(k-1)(r+1)}.$$
To evaluate the sum, make the substitution $c = q^{r+1}$. Then the sum is
$$\sum_{k=1}^{n} c^{k-1} = 1 + c + c^2 + \cdots + c^{n-2} + c^{n-1}.$$
This geometric sum equals
$$\frac{c^n - 1}{c - 1} = \frac{q^{n(r+1)} - 1}{q^{r+1} - 1}.$$
Thus the upper and lower sums simplify to
$$A_{\min} = \frac{(q - 1)\left(q^{n(r+1)} - 1\right)}{q^{r+1} - 1}, \qquad A_{\max} = \frac{q^r(q - 1)\left(q^{n(r+1)} - 1\right)}{q^{r+1} - 1}.$$
Now back-substitute $q = b^{1/n}$ to get $q^{n(r+1)} = b^{r+1}$. Simplifying gives
$$A_{\min} = (b^{r+1} - 1)\,\frac{1}{(q^{r+1} - 1)/(q - 1)}, \qquad A_{\max} = (b^{r+1} - 1)\,q^r\,\frac{1}{(q^{r+1} - 1)/(q - 1)}.$$
As $n$ tends to infinity, the quantity $q = b^{1/n}$ tends to 1. The fraction
$$\frac{q^{r+1} - 1}{q - 1}$$
is the difference quotient for $y = x^{r+1}$ at $x = 1$. As $q$ tends to 1, the limit of the difference quotient is the derivative of $y = x^{r+1}$ at $x = 1$,
$$\lim_{q\to 1}\frac{q^{r+1} - 1}{q - 1} = \left.\frac{d(x^{r+1})}{dx}\right|_{x=1} = \left.(r+1)x^r\right|_{x=1} = r + 1.$$
Also, since $x^r$ is continuous,
$$\lim_{q\to 1} q^r = 1^r = 1.$$
Substituting this in gives
$$\lim_{n\to\infty} A_{\min} = (b^{r+1} - 1)\left(\lim_{q\to 1}\frac{q^{r+1} - 1}{q - 1}\right)^{-1} = \frac{b^{r+1} - 1}{r + 1},$$
$$\lim_{n\to\infty} A_{\max} = (b^{r+1} - 1)\left(\lim_{q\to 1} q^r\right)\left(\lim_{q\to 1}\frac{q^{r+1} - 1}{q - 1}\right)^{-1} = \frac{b^{r+1} - 1}{r + 1}.$$
Since the limit of $A_{\min}$ and the limit of $A_{\max}$ exist and are equal, the Riemann integral exists and equals
$$\int_1^b x^r\,dx = \frac{b^{r+1} - 1}{r + 1}.$$
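The geometric partition is easy to try numerically. The Python sketch below is an illustration with arbitrarily chosen values $r = 2.5$ and $b = 3$. It builds the points $x_k = q^k$ and sums the lower and upper sums directly. It also reports the mesh, which should sit below $(q - 1)b$. The function name `power_lower_upper` is hypothetical. Pinning the last point to exactly $b$ only guards against floating-point rounding in $q^n$.

```python
import math


def power_lower_upper(r, b, n):
    """A_min, A_max for x^r on [1, b] (r > 0, b > 1) using the geometric
    partition x_k = q^k with q = b**(1/n), summed term by term."""
    q = b ** (1.0 / n)
    pts = [q ** k for k in range(n + 1)]
    pts[-1] = b                      # guard against rounding in q**n
    a_min = sum(x0 ** r * (x1 - x0) for x0, x1 in zip(pts, pts[1:]))
    a_max = sum(x1 ** r * (x1 - x0) for x0, x1 in zip(pts, pts[1:]))
    mesh = max(x1 - x0 for x0, x1 in zip(pts, pts[1:]))
    return a_min, a_max, mesh


r, b = 2.5, 3.0
exact = (b ** (r + 1) - 1) / (r + 1)
for n in (10, 100, 10_000):
    a_min, a_max, mesh = power_lower_upper(r, b, n)
    print(n, a_min, a_max, exact, mesh)
```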

3. The Fundamental Theorem of Calculus. There is a single theorem that is at the heart of almost all applications involving Riemann integrals. The theorem answers two questions simultaneously: Which functions are Riemann integrable? What is the Riemann integral of a function?
The answer to the first question is: every function you are likely to encounter is Riemann integrable. Precisely, every continuous function, and every piecewise continuous function, is Riemann integrable.
The answer to the second question is more interesting. Assume $f(x)$ is a continuous function. Let $x = a$ be a fixed point where $f(x)$ is defined. Form the function
$$F(x) = \int_a^x f(t)\,dt.$$
The function $F(x)$ is defined whenever $f(t)$ is defined on all of $[a, x]$. If $f(x)$ is continuous, the Fundamental Theorem of Calculus asserts that $F(x)$ is differentiable and
$$\frac{dF}{dx}(x) = \frac{d}{dx}\int_a^x f(t)\,dt = f(x).$$

The proof of the second part is very easy. Consider the increment in $F$ from $x$ to $x + \Delta x$,
$$F(x + \Delta x) - F(x) = \int_a^{x+\Delta x} f(t)\,dt - \int_a^x f(t)\,dt = \int_x^{x+\Delta x} f(t)\,dt.$$
Let $y_{\min}$ be the minimum value of $f(t)$ on the interval $[x, x + \Delta x]$, and let $y_{\max}$ be the maximum value of $f(t)$ on the interval $[x, x + \Delta x]$. Then for every choice of partition $t_0 < t_1 < \cdots < t_n$ of $[x, x + \Delta x]$, and every choice of values $y_k^*$ on the subintervals,
$$y_{\min} \le y_k^* \le y_{\max}$$
for every $k$. Thus the Riemann sum is squeezed between
$$\sum_{k=1}^{n} y_{\min}\,\Delta t_k \le \sum_{k=1}^{n} y_k^*\,\Delta t_k \le \sum_{k=1}^{n} y_{\max}\,\Delta t_k.$$
Of course the lower bound is
$$\sum_{k=1}^{n} y_{\min}\,\Delta t_k = y_{\min}\sum_{k=1}^{n}\Delta t_k = y_{\min}\,\Delta x,$$
because the total length of the interval $[x, x + \Delta x]$ is $\Delta x$. Similarly, the upper bound is
$$\sum_{k=1}^{n} y_{\max}\,\Delta t_k = y_{\max}\,\Delta x.$$
Thus the Riemann sum is squeezed between
$$y_{\min}\,\Delta x \le \sum_{k=1}^{n} y_k^*\,\Delta t_k \le y_{\max}\,\Delta x.$$
Because the Riemann integral is a limit of Riemann sums, it is also squeezed,
$$y_{\min}\,\Delta x \le \int_x^{x+\Delta x} f(t)\,dt \le y_{\max}\,\Delta x.$$
Substituting in $F(x + \Delta x) - F(x)$ and dividing each term by $\Delta x$ gives
$$y_{\min} \le \frac{F(x + \Delta x) - F(x)}{\Delta x} \le y_{\max}.$$
The middle term is the difference quotient. Consider what happens as $\Delta x$ tends to 0. Because $f(t)$ is continuous, both the maximum and minimum values of $f(t)$ on $[x, x + \Delta x]$ simply limit to the value $f(x)$. Thus
$$\lim_{\Delta x\to 0} y_{\min} = \lim_{\Delta x\to 0} y_{\max} = f(x).$$
By the Squeezing Lemma for limits, since these two limits exist and are equal, the middle limit also exists and equals $f(x)$,
$$\lim_{\Delta x\to 0}\frac{F(x + \Delta x) - F(x)}{\Delta x} = f(x).$$
This is precisely what the Fundamental Theorem of Calculus asserts,
$$\frac{d}{dx}\int_a^x f(t)\,dt = f(x).$$


4. Algorithm for computing Riemann integrals. The Fundamental Theorem of Calculus has many important applications. The most obvious is that it gives a simpler method for computing Riemann integrals, provided we can compute an antiderivative. If $f(x)$ is a continuous function and $G(x)$ is a known antiderivative of $f(x)$, then
$$\int_a^b f(t)\,dt = G(b) - G(a).$$
To see this, observe that
$$F(x) = \int_a^x f(t)\,dt$$
is also an antiderivative of $f$, by the Fundamental Theorem of Calculus. Thus, since the general antiderivative is $G(x) + C$, there is a constant $C$ such that $F(x) = G(x) + C$. But also
$$F(a) = \int_a^a f(t)\,dt = 0,$$
so $C = -G(a)$. Thus $F(x) = G(x) - G(a)$. Now plug in $x = b$ to get
$$\int_a^b f(t)\,dt = F(b) = G(b) - G(a).$$
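A quick numerical illustration of $\int_a^b f(t)\,dt = G(b) - G(a)$: the Python sketch below compares a fine Riemann sum (sampled at midpoints) with the antiderivative formula. The test function $f = \cos$ with $G = \sin$, the interval $[0, 1.2]$ and the helper name `midpoint_sum` are assumptions made for this example.

```python
import math


def midpoint_sum(f, a, b, n):
    """Riemann sum with n equal subintervals, sampling each at its midpoint."""
    dx = (b - a) / n
    return sum(f(a + (k - 0.5) * dx) for k in range(1, n + 1)) * dx


# Illustrative choice: f = cos, with the known antiderivative G = sin.
a, b = 0.0, 1.2
ftc_value = math.sin(b) - math.sin(a)          # G(b) - G(a)
for n in (10, 100, 1000):
    print(n, midpoint_sum(math.cos, a, b, n), ftc_value)
```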

Lecture 16. October 20, 2005

Practice Problems. Course Reader: 3D1, 3D3, 3D7, 3E3, 3E4.
1. Dummy variables. Given a Riemann integrable function $f(x)$ defined on an interval $[a, b]$, the notation
$$\int_a^b f(x)\,dx$$
is shorthand for the Riemann integral of $f(x)$ over this interval. In particular, using the partition into $n$ equal subintervals and the right endpoint of each, this equals the limit
$$\lim_{n\to\infty}\sum_{k=1}^{n} f\!\left(a + \frac{(b-a)k}{n}\right)\frac{b-a}{n}.$$
Observe that the variable $x$ does not appear in this limit. It is very convenient to include the variable $x$ in the notation for the Riemann integral; how else could we express the function being integrated? But, since the definition of the Riemann integral does not involve $x$, $x$ is really a dummy variable. Any variable name may be substituted for $x$, with the same meaning:
$$\int_a^b f(x)\,dx = \int_a^b f(u)\,du = \int_a^b f(v)\,dv = \int_a^b f(t)\,dt = \cdots$$
This freedom is very useful, particularly when one or both of the limits of integration depend on some parameter. In this case, by convention, the dummy variable is chosen to be different from the parameter:
$$\int_a^x f(x)\,dx \quad \text{INCORRECT}, \qquad \int_a^x f(t)\,dt \quad \text{CORRECT}.$$


This convention reduces the likelihood of an error.
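The limit formula above translates directly into code, and the code makes the "dummy variable" point visible: the integrand is passed as a function, so the name of its parameter plays no role. The Python sketch below is illustrative only. `right_endpoint_sum` is a hypothetical name, and $f(x) = x^2$ on $[0, 1]$ is an arbitrary test case whose integral is $1/3$.

```python
def right_endpoint_sum(f, a, b, n):
    """sum_{k=1}^{n} f(a + (b - a) k / n) * (b - a) / n.
    The integration variable never appears: f is passed as a function."""
    return sum(f(a + (b - a) * k / n) for k in range(1, n + 1)) * (b - a) / n


# The name of the bound variable is irrelevant, just as with dx, du, dt:
by_x = right_endpoint_sum(lambda x: x * x, 0.0, 1.0, 1000)
by_t = right_endpoint_sum(lambda t: t * t, 0.0, 1.0, 1000)
print(by_x == by_t, by_x)   # identical values, close to 1/3
```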

2. Variable limits of integration. The Riemann integral is often used to define functions, particularly antiderivatives having no simpler expression.
Example. For every angle $0 \le \theta < \pi/2$, define $f(\theta)$ to be the area above the x-axis, inside the unit circle $x^2 + y^2 = 1$, and bounded by the vertical lines $x = -\cos(\theta)$ and $x = \cos(\theta)$. This is an integral,
$$f(\theta) = \int_{-\cos(\theta)}^{\cos(\theta)}\sqrt{1 - x^2}\,dx.$$
The problem is to describe the rate of change of $f$, $df/d\theta$.
The integral $f(\theta)$ is beyond our current techniques of integration (though soon we will have techniques to solve it). The simplest solution is indirect. Here, first, is the direct solution. The region whose area is $f(\theta)$ consists of 2 triangles and a circular sector. By high-school geometry, the area is
$$f(\theta) = \frac{\pi - 2\theta}{2} + 2\left(\frac{1}{2}\sin(\theta)\cos(\theta)\right) = \frac{\pi}{2} - \theta + \frac{1}{2}\sin(2\theta).$$
Using standard rules of differentiation, the derivative is
$$\frac{df}{d\theta} = -1 + \cos(2\theta).$$
Notice that, by the double-angle formula for cosine, this equals
$$-1 + \cos(2\theta) = -2\sin^2(\theta).$$
The hardest step (hidden here) was the geometric computation of $f(\theta)$. However, this is completely unnecessary. Introduce a function
$$G(t) = \int_0^t\sqrt{1 - x^2}\,dx.$$
Using symmetry through the y-axis, $f(\theta)$ equals
$$f(\theta) = 2G(\cos(\theta)).$$
By the chain rule, with $t = \cos(\theta)$,
$$\frac{df}{d\theta} = 2\frac{dG}{d\theta} = 2\frac{dG}{dt}\frac{dt}{d\theta} = 2\frac{dG}{dt}\frac{d(\cos(\theta))}{d\theta}.$$
By the Fundamental Theorem of Calculus,
$$\frac{dG}{dt} = \sqrt{1 - t^2}.$$
This gives
$$\frac{df}{d\theta} = 2\sqrt{1 - \cos^2(\theta)}\,\big(-\sin(\theta)\big) = -2\sin^2(\theta),$$
since $\sin(\theta) \ge 0$ for $0 \le \theta < \pi/2$.
The second method is indirect: the function $G(t)$ has no simple expression. Nonetheless, this method is faster, and in many cases it is the only method that works.
The argument above using the chain rule and the Fundamental Theorem of Calculus is quite general. It gives the general equation
$$\frac{d}{dx}\int_{u(x)}^{v(x)} f(t)\,dt = f(v(x))\,v'(x) - f(u(x))\,u'(x).$$
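Both methods can be cross-checked numerically. The Python sketch below is an illustration, not part of the notes. It approximates $f(\theta)$ by a midpoint Riemann sum and compares it with the geometric closed form. It then compares a central-difference derivative of the closed form with $-2\sin^2(\theta)$. The sample angle $\theta = 0.7$, the step $h = 10^{-5}$ and the function names are arbitrary assumptions.

```python
import math


def f_closed(theta):
    """Area from the geometric argument: pi/2 - theta + sin(2 theta)/2."""
    return math.pi / 2 - theta + 0.5 * math.sin(2 * theta)


def f_integral(theta, n=20_000):
    """Midpoint-rule approximation of the integral of sqrt(1 - x^2)
    from -cos(theta) to cos(theta)."""
    a, b = -math.cos(theta), math.cos(theta)
    dx = (b - a) / n
    return sum(math.sqrt(1 - (a + (k + 0.5) * dx) ** 2) for k in range(n)) * dx


theta, h = 0.7, 1e-5
slope = (f_closed(theta + h) - f_closed(theta - h)) / (2 * h)  # central difference
print(f_integral(theta), f_closed(theta))
print(slope, -2 * math.sin(theta) ** 2)
```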

3. Geometric area and algebraic area. The Riemann integral is the algebraic area,
$$\int_a^b f(x)\,dx = (\text{area above the x-axis}) - (\text{area below the x-axis}).$$
The geometric area is the total area, both above and below the x-axis. Although geometric area does not equal algebraic area, it has a simple expression using the Riemann integral,
$$\text{Geometric area} = \int_a^b |f(x)|\,dx.$$
Example. Find both the algebraic area and the geometric area bounded by the x-axis and the graph of $y = \sin(x)$ over the interval $-\pi < x < \pi$.
Because $\sin(x)$ is an odd function, the area below the x-axis for $-\pi < x < 0$ equals the area above the x-axis for $0 < x < \pi$. In the expression for the algebraic area, these areas cancel to give 0. This is borne out by computation,
$$\int_{-\pi}^{\pi}\sin(x)\,dx = \big(-\cos(x)\big)\Big|_{-\pi}^{\pi} = -\cos(\pi) + \cos(-\pi) = -(-1) + (-1) = 0.$$
On the other hand, the absolute value $|\sin(x)|$ equals
$$|\sin(x)| = \begin{cases} -\sin(x), & -\pi < x \le 0, \\ \sin(x), & 0 < x < \pi. \end{cases}$$
Thus the geometric area equals
$$\int_{-\pi}^{0} -\sin(x)\,dx + \int_0^{\pi}\sin(x)\,dx = \big(\cos(x)\big)\Big|_{-\pi}^{0} + \big(-\cos(x)\big)\Big|_0^{\pi} = \big(1 - (-1)\big) + \big(-(-1) + 1\big) = 4.$$
Thus the geometric area does not equal the algebraic area. But computation of the geometric area reduces to a straightforward Riemann integral.
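The same two numbers come out of Riemann sums for $\sin(x)$ and $|\sin(x)|$. This is a minimal Python sketch, assuming midpoint sampling and $n = 10{,}000$ subintervals (both arbitrary choices). `midpoint_sum` is a hypothetical helper name.

```python
import math


def midpoint_sum(f, a, b, n):
    """Midpoint Riemann sum of f over [a, b] with n equal subintervals."""
    dx = (b - a) / n
    return sum(f(a + (k + 0.5) * dx) for k in range(n)) * dx


n = 10_000
algebraic = midpoint_sum(math.sin, -math.pi, math.pi, n)
geometric = midpoint_sum(lambda x: abs(math.sin(x)), -math.pi, math.pi, n)
print(algebraic, geometric)   # close to 0 and to 4
```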


4. Estimates. For every pair of Riemann integrable functions $f(x)$, $g(x)$ on $[a, b]$ satisfying the inequality $f(x) \le g(x)$ for every choice of $x$, the following inequality holds,
$$\int_a^b f(x)\,dx \le \int_a^b g(x)\,dx.$$
This is very useful for estimating integrals.

Example. Determine the following Riemann integral to within $\pm 10^{-4}$,
$$\int_0^{0.1}\left(1 + \sqrt{\sin(x)}\right)dx.$$
The expression $\sqrt{\sin(x)}$ has no simple antiderivative. The value of the Riemann integral could be approximated well by a Riemann sum. An alternative approach is to use the estimates
$$\left(1 - \frac{x^2}{6}\right)\sqrt{x} \le \sqrt{\sin(x)} \le \sqrt{x},$$
valid for small nonnegative values of $x$. This gives
$$\int_0^{0.1}\left(1 + x^{1/2} - \frac{1}{6}x^{5/2}\right)dx \le \int_0^{0.1}\left(1 + \sqrt{\sin(x)}\right)dx \le \int_0^{0.1}\left(1 + x^{1/2}\right)dx.$$
The first and third Riemann integrals follow from the Fundamental Theorem of Calculus,
$$\int_0^{0.1}\left(1 + x^{1/2} - \frac{1}{6}x^{5/2}\right)dx = \left.\left(x + \frac{2}{3}x^{3/2} - \frac{1}{21}x^{7/2}\right)\right|_0^{0.1} = 0.1 + \frac{2}{3\sqrt{1000}} - \frac{1}{21\sqrt{10000000}} = 0.1210667926 \pm 10^{-10}.$$
Similarly,
$$\int_0^{0.1}\left(1 + x^{1/2}\right)dx = \left.\left(x + \frac{2}{3}x^{3/2}\right)\right|_0^{0.1} = 0.1 + \frac{2}{3\sqrt{1000}} = 0.1210818511 \pm 10^{-10}.$$
Since these two integrals agree to within $\pm 10^{-4}$, this gives the original integral,
$$\int_0^{0.1}\left(1 + \sqrt{\sin(x)}\right)dx = 0.1210 \pm 10^{-4}.$$
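The two bounds can be re-evaluated numerically, and a brute-force Riemann sum should land between them. The Python sketch below is illustrative only. It assumes the integrand is $1 + \sqrt{\sin(x)}$, as reconstructed above, and uses a midpoint sum with $n = 100{,}000$ subintervals as an independent check. The function names are hypothetical.

```python
import math


def lower_bound():
    x = 0.1
    return x + (2 / 3) * x ** 1.5 - (1 / 21) * x ** 3.5


def upper_bound():
    x = 0.1
    return x + (2 / 3) * x ** 1.5


def midpoint_estimate(n=100_000):
    """Midpoint Riemann sum of 1 + sqrt(sin x) on [0, 0.1]."""
    dx = 0.1 / n
    return sum(1 + math.sqrt(math.sin((k + 0.5) * dx)) for k in range(n)) * dx


lo, hi, mid = lower_bound(), upper_bound(), midpoint_estimate()
print(f"{lo:.10f} {mid:.10f} {hi:.10f}")
```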

5. Change of variables. After the Fundamental Theorem of Calculus, the most useful integral rule is the change of variables rule. The rule for Riemann integrals is nearly the same as the rule for antiderivatives. The additional feature for Riemann integrals is the change of the limits of integration:
$$\int_{x=a}^{x=b} f(u(x))\,u'(x)\,dx = \int_{u=u(a)}^{u=u(b)} f(u)\,du.$$


Example. Find the Riemann integral
$$\int_{\pi/4}^{\pi/3}\tan(x)\,dx.$$
Since $\tan(x)$ is not visibly the derivative of another function, we rewrite the integral and hope for the best:
$$\int_{\pi/4}^{\pi/3}\tan(x)\,dx = \int_{\pi/4}^{\pi/3}\frac{\sin(x)}{\cos(x)}\,dx.$$
In this form, the substitution $u = \cos(x)$ is natural:
$$u = \cos(x), \quad du = -\sin(x)\,dx, \quad u(\pi/3) = \cos(\pi/3) = 1/2, \quad u(\pi/4) = \cos(\pi/4) = 1/\sqrt{2},$$
$$\int_{x=\pi/4}^{x=\pi/3}\frac{\sin(x)}{\cos(x)}\,dx = \int_{u=1/\sqrt{2}}^{u=1/2}\frac{1}{u}\,(-du).$$
The new integral can be computed by the Fundamental Theorem of Calculus, since $1/u$ is the derivative of $\ln(|u|)$:
$$\int_{u=1/\sqrt{2}}^{u=1/2}\frac{-1}{u}\,du = \big(-\ln(|u|)\big)\Big|_{1/\sqrt{2}}^{1/2} = -\ln(1/2) + \ln(1/\sqrt{2}) = \ln(2) - \ln(\sqrt{2}).$$
This simplifies to give
$$\int_{\pi/4}^{\pi/3}\tan(x)\,dx = \ln(2)/2.$$
It is only fair to note there is a second method. Make the same substitution to simplify the antiderivative of $\tan(x)$ to $-\ln(|u|) + C$, and then back-substitute to get
$$\int\tan(x)\,dx = -\ln(|\cos(x)|) + C.$$
Now use the Fundamental Theorem of Calculus with the original limits of integration. Both methods are correct. Usually the first method is faster and less error-prone, since it requires no back-substitution.
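To confirm that both methods agree with a direct numerical value, here is a minimal Python sketch (illustrative, not the notes' method). It evaluates the $u$-form with the new limits, the back-substituted $x$-form with the original limits, and a midpoint Riemann sum. All three should match $\ln(2)/2 \approx 0.3466$. The helper name `midpoint_sum` and the choice $n = 10{,}000$ are assumptions.

```python
import math


def midpoint_sum(f, a, b, n):
    """Midpoint Riemann sum of f over [a, b] with n equal subintervals."""
    dx = (b - a) / n
    return sum(f(a + (k + 0.5) * dx) for k in range(n)) * dx


a, b = math.pi / 4, math.pi / 3
numeric = midpoint_sum(math.tan, a, b, 10_000)
via_u = -math.log(1 / 2) + math.log(1 / math.sqrt(2))              # new limits in u
via_x = -math.log(abs(math.cos(b))) + math.log(abs(math.cos(a)))   # back-substituted
print(numeric, via_u, via_x, math.log(2) / 2)
```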
6. Integrating backwards. This comes so naturally to most calculus students that it barely warrants mention. Technically, the Riemann integral
$$\int_a^b f(x)\,dx$$
is only defined if $a \le b$. What if $a$ is larger than $b$? The only possible answer consistent with the Fundamental Theorem of Calculus is the following,
$$\int_a^b f(x)\,dx = -\int_b^a f(x)\,dx, \quad \text{if } a > b.$$
Because of the central role of the Fundamental Theorem of Calculus, the above equation is adopted as a convention. With this convention, the Fundamental Theorem of Calculus holds whenever $a$ is less than $b$, equal to $b$, or greater than $b$.
Lecture 17. October 21, 2005

Homework. Problem Set 5 Part I: (a) and (b); Part II: Problem 1.

Practice Problems. Course Reader: 3F1, 3F2, 3F4, 3F8.

1. Ordinary differential equations. An ordinary differential equation is an equation involving a single independent variable $x$ together with a dependent variable $y$ and its derivatives $d^ky/dx^k$,
$$G\left(x, y, \frac{dy}{dx}, \frac{d^2y}{dx^2}, \ldots, \frac{d^ky}{dx^k}\right) = 0.$$
The largest $k$ for which $d^ky/dx^k$ occurs in the equation is called the order of the differential equation.
Examples. Here are examples of ordinary differential equations.
(i) The ordinary differential equation
$$y - \sin(x^2) = 0$$
has order 0, because no derivatives of $y$ actually occur in the equation. It has a unique (and rather trivial) solution,
$$y = \sin(x^2).$$
Because the solution is unique, it depends on 0 parameters (and the order is 0).
(ii) The ordinary differential equation
$$\frac{dy}{dx} - \frac{1}{x + 1} = 0$$
has order 1, because $dy/dx$ occurs and no higher derivatives occur. Every solution is an antiderivative of $1/(x + 1)$,
$$y = \int\frac{1}{x + 1}\,dx = \ln(|x + 1|) + C.$$
Notice the solution depends on 1 parameter, $C$. And the order is 1.
(iii) The ordinary differential equation
$$\frac{d^2y}{dx^2} + \omega^2 y = 0$$


has order 2. The general solution was found in Problem Set 2, Problem 4,
$$y = A\cos(\omega x) + B\sin(\omega x).$$
The solution depends on 2 parameters, $A$ and $B$. And the order is 2.

(iv) The previous equation was one particular linear ordinary differential equation. A $k$th order linear ordinary differential equation has the form
$$a_k(x)\frac{d^ky}{dx^k} + a_{k-1}(x)\frac{d^{k-1}y}{dx^{k-1}} + \cdots + a_1(x)\frac{dy}{dx} + a_0(x)\,y = b(x),$$
for functions $a_k(x), \ldots, a_0(x), b(x)$. If $b(x)$ is zero, the equation is homogeneous; otherwise it is inhomogeneous. Very important is the case when all the coefficient functions $a_k(x), \ldots, a_0(x)$ are constant. Then the differential equation is called constant coefficient. The solution of constant coefficient linear ordinary differential equations is a main focus of Math 18.03.
2. Separable differential equations. Many differential equations arising in applications are examples of separable differential equations. A separable ordinary differential equation is a first-order differential equation
$$\frac{dy}{dx} = F(x, y)$$
for which $F(x, y)$ factors as
$$F(x, y) = g(x)/h(y).$$
Example. Find the equation $y = f(x)$ of every curve with the following property: for every point $(x, y)$ on the curve, the tangent line to the curve is perpendicular to the line joining $(x, y)$ to the origin $(0, 0)$.
The slope of the tangent line to the curve at $(x, y)$ is $dy/dx$. The slope of the line joining $(0, 0)$ and $(x, y)$ is $y/x$. Since the tangent line is perpendicular to the line joining $(0, 0)$ and $(x, y)$, the product of the two slopes is $-1$, so
$$\frac{dy}{dx} = -x/y.$$
Thus, the equation $y = f(x)$ is a solution to this separable differential equation.

The algorithm for solving a separable differential equation is the following.

(i). Factor $F(x, y)$ as $g(x)/h(y)$. This is often the most difficult step. In the example, it is quite easy: simply take $g(x) = -x$ and $h(y) = y$.

(ii). Rewrite the differential equation as an equality of differentials. In other words, rewrite the equation as
$$\frac{dy}{dx} = \frac{g(x)}{h(y)} \;\Rightarrow\; h(y)\,dy = g(x)\,dx.$$
In the example, this gives
$$\frac{dy}{dx} = \frac{-x}{y} \;\Rightarrow\; y\,dy = -x\,dx.$$

(iii). Antidifferentiate both sides of the equation. In the example, the antiderivatives
$$\int y\,dy = \int -x\,dx$$
give
$$\frac{1}{2}y^2 = -\frac{1}{2}x^2 + C.$$
(iv). If there is an initial value, use it to find the constant of integration. An initial value problem is an ordinary differential equation together with some information for an initial value $x_0$ of the independent variable. It is often written
$$\begin{cases} dy/dx = F(x, y), \\ y(x_0) = y_0. \end{cases}$$
The example was not an initial value problem. However, it can easily be made an initial value problem by specifying
$$y(1) = 1,$$
for instance. With this condition, the constant $C$ satisfies the equation
$$\frac{1}{2}(1)^2 = -\frac{1}{2}(1)^2 + C.$$
The solution is
$$C = 1.$$

(v). Simplify the answer. Often it is best to solve for $y = f(x)$; often this is unnecessary. It depends on the problem. In the example problem, the simplest answer is the implicit answer,
$$x^2 + y^2 = 2C.$$
So the solution of the initial value problem is
$$x^2 + y^2 = 2.$$
Thus every curve satisfying the geometric property is (an arc of) a circle centered at the origin.
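As an independent check that solutions of $dy/dx = -x/y$ stay on circles, the Python sketch below integrates the initial value problem $y(1) = 1$ numerically and watches $x^2 + y^2$. The numerical method (classical fourth-order Runge-Kutta) is a swapped-in technique that the notes do not use; the step size $0.001$ and the stopping point $x = 1.3$ are arbitrary choices. The run stops well before $y$ reaches 0, where the right-hand side blows up.

```python
def rk4_step(F, x, y, h):
    """One classical fourth-order Runge-Kutta step for dy/dx = F(x, y)."""
    k1 = F(x, y)
    k2 = F(x + h / 2, y + h * k1 / 2)
    k3 = F(x + h / 2, y + h * k2 / 2)
    k4 = F(x + h, y + h * k3)
    return y + h * (k1 + 2 * k2 + 2 * k3 + k4) / 6


def slope_field(x, y):
    """Right-hand side of the separable equation dy/dx = -x / y."""
    return -x / y


x, y, h = 1.0, 1.0, 0.001          # initial value y(1) = 1
for _ in range(300):               # integrate to x = 1.3, well before y reaches 0
    y = rk4_step(slope_field, x, y, h)
    x += h
print(x, y, x * x + y * y)         # the last value should stay very close to 2
```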
Example. Here is a somewhat different example. There is a single separable ordinary differential equation satisfied by every function
$$y = (x - a)^3,$$
where $a$ is an arbitrary constant. Find this differential equation, and find all its solutions.
The derivative of $y$ is
$$\frac{dy}{dx} = 3(x - a)^2.$$
The constant $a$ can be eliminated by writing this as
$$\frac{dy}{dx} = 3\left[(x - a)^3\right]^{2/3} = 3y^{2/3}.$$
This is a separable differential equation,
$$dy/dx = 3y^{2/3}.$$
The algorithm gives
$$y^{-2/3}\,dy = 3\,dx, \qquad \int y^{-2/3}\,dy = \int 3\,dx, \qquad 3y^{1/3} = 3x + C.$$
Calling the constant $-3a$ gives the answer
$$y = (x - a)^3.$$
However, there are other solutions. For instance, $y = 0$ is a solution. The general solution of the differential equation depends on 2 parameters, $a < b$:
$$y = \begin{cases} (x - a)^3, & x \le a, \\ 0, & a < x \le b, \\ (x - b)^3, & x > b. \end{cases}$$
The problem lies in the step giving $y^{-2/3}\,dy = 3\,dx$. If $y$ equals 0, this step involves division by zero. Division by zero is not allowed, so the method breaks down.
Important fact. This fact will not be used in this class. However, it is often crucial in real-world applications to know that the solution to an initial value problem is unique. The fact is that
$$\begin{cases} dy/dx = F(x, y), \\ y(x_0) = y_0 \end{cases}$$
has a unique solution for $x$ close to $x_0$ if $F(x, y)$ and $\partial F/\partial y$ are both continuous near $(x_0, y_0)$. In the previous example, $F(x, y) = 3y^{2/3}$ is continuous at $y_0 = 0$, but it is not differentiable with respect to $y$ at $y_0 = 0$. Ultimately, this is the reason for the extra solutions of the differential equation.
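The extra solutions can be checked numerically. The Python sketch below is an illustration with the arbitrary choice $a = -1$, $b = 2$. It evaluates $|y' - 3y^{2/3}|$ for the piecewise function at a few points away from the joins, using a central difference for $y'$. It then exhibits two different solutions through the same initial value $y(2) = 0$. The helper names are hypothetical. `cbrt` takes a real cube root so that $y^{2/3}$ makes sense for negative $y$.

```python
import math


def cbrt(v):
    """Real cube root, valid for negative v as well."""
    return math.copysign(abs(v) ** (1 / 3), v)


def piecewise(x, a, b):
    """The two-parameter solution: (x-a)^3 for x <= a, 0 on (a, b], (x-b)^3 after."""
    if x <= a:
        return (x - a) ** 3
    if x <= b:
        return 0.0
    return (x - b) ** 3


def residual(y, x, h=1e-6):
    """|y'(x) - 3 y(x)^(2/3)| with y' from a central difference."""
    slope = (y(x + h) - y(x - h)) / (2 * h)
    return abs(slope - 3 * cbrt(y(x)) ** 2)


a, b = -1.0, 2.0
worst = max(residual(lambda x: piecewise(x, a, b), x)
            for x in (-2.5, -1.5, -0.5, 0.0, 1.0, 2.5, 3.0))
print(worst)  # tiny: the piecewise function solves dy/dx = 3 y^(2/3)

# Two different solutions through the same initial value y(2) = 0:
sols = [lambda x: 0.0, lambda x: piecewise(x, a, b)]
print([s(2.0) for s in sols], [s(3.0) for s in sols])
```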
3. Applications. Separable differential equations come up often in applications. The most common separable differential equation is the equation for exponential growth,
$$\frac{dy}{dt} = ky,$$
where $k$ is a constant.
The solution behaves differently depending on whether $k$ is positive or negative. For $k$ positive, this equation arises in population growth and interest on savings, among others. For $k$ negative, this equation arises in radioactive decay, a discharging capacitor in an RC circuit, and Newton's law of cooling.
Population growth. The simplest model of population growth is that a population $N(t)$ (modeled as continuous for simplicity) grows at a rate proportional to the size of the population. This gives
$$\frac{dN}{dt} = kN.$$
Following the method gives
$$\frac{dN}{N} = k\,dt, \qquad \int\frac{1}{N}\,dN = \int k\,dt, \qquad \ln(|N|) = kt + C.$$
Exponentiating both sides gives
$$N(t) = N_0 e^{kt},$$
where $N_0 = N(0)$.
Observe that $N(t)$ increases without bound as $t$ increases. When $N$ is very large, the ecosystem cannot support such a population. Thus the model is only valid if $N(t)$ is not too large.
A slightly more realistic model hypothesizes a constant equilibrium population $N_{\text{equi}}$, sustainable indefinitely. The model is that the population grows at a rate proportional both to the population $N$ and to the difference $N_{\text{equi}} - N$,
$$\frac{dN}{dt} = kN(N_{\text{equi}} - N).$$
This is again a separable differential equation. It gives the solution
$$N(t) = \frac{N_0 N_{\text{equi}}}{N_0 + (N_{\text{equi}} - N_0)e^{-kN_{\text{equi}}t}}.$$
The most important feature is that $N(t)$ approaches $N_{\text{equi}}$ as $t$ increases. This is called the steady-state solution. In general, to find the steady-state solutions of a separable ordinary differential equation, assume the solution is constant, $y = y_1$, so that $dy/dt$ is 0. In the original model of population growth, the only steady-state solution is $N = 0$. In the new model, there are 2 steady-state solutions, $N = 0$ and $N = N_{\text{equi}}$. In Math 18.03, stability is defined, and a method is given to show that the only stable steady-state solution is $N = N_{\text{equi}}$.
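The closed-form logistic solution can be checked against the differential equation itself. The Python sketch below is illustrative only: the parameter values $N_0 = 10$, $N_{\text{equi}} = 500$, $k = 0.0004$ are invented for the example. It compares a central-difference derivative of $N(t)$ with $kN(N_{\text{equi}} - N)$ at a few times. It also confirms that $N(0) = N_0$ and that $N(t)$ settles at $N_{\text{equi}}$.

```python
import math


def logistic(t, N0, N_equi, k):
    """Closed-form solution quoted above for dN/dt = k N (N_equi - N)."""
    return N0 * N_equi / (N0 + (N_equi - N0) * math.exp(-k * N_equi * t))


N0, N_equi, k, h = 10.0, 500.0, 0.0004, 1e-4
worst = 0.0
for t in (0.0, 5.0, 20.0, 60.0):
    slope = (logistic(t + h, N0, N_equi, k) - logistic(t - h, N0, N_equi, k)) / (2 * h)
    N = logistic(t, N0, N_equi, k)
    worst = max(worst, abs(slope - k * N * (N_equi - N)))
print(worst, logistic(0.0, N0, N_equi, k), logistic(200.0, N0, N_equi, k))
```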
Radioactive decay. A radioactive isotope decays to a more stable isotope at a rate proportional to the amount of radioactive isotope remaining. Thus the mass $m(t)$ satisfies a differential equation
$$\frac{dm}{dt} = -km.$$
Using the method, the solution is
$$m(t) = m_0 e^{-kt}.$$


An important feature in decay problems is the half-life. The half-life is the length of time necessary for the mass of radioactive isotope to decrease to one-half the initial mass,
$$m(T_{\text{half}}) = m_0/2.$$
Solving in the formula gives
$$T_{\text{half}} = \ln(2)/k.$$
Example. The half-life of a certain radioactive isotope is 25 years. How long is required for the mass to decrease to 1% of the initial mass? Using the formula above, $k = \ln(2)/25$. Therefore the equation for the mass is
$$m(t) = m_0 e^{-\ln(2)t/25}.$$
Thus the time $t_f$ when the mass equals $0.01m_0$ satisfies
$$m_0 e^{-\ln(2)t_f/25} = m_0/100,$$
or
$$\ln(2)t_f/25 = \ln(100) = 2\ln(10).$$
Solving gives
$$t_f = 50\ln(10)/\ln(2) \approx 166 \text{ years}.$$
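The arithmetic of the half-life example can be reproduced in a few lines. This is a minimal Python sketch, assuming the model $m(t) = m_0 e^{-kt}$ with $k = \ln(2)/T_{\text{half}}$. `decay_time` is a hypothetical helper name.

```python
import math


def decay_time(half_life, fraction):
    """Time for m0 * exp(-k t) to fall to fraction * m0, with k = ln 2 / half_life."""
    k = math.log(2) / half_life
    return math.log(1 / fraction) / k


t_f = decay_time(25.0, 0.01)
print(t_f, 50 * math.log(10) / math.log(2))   # both about 166.1 years
```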

Newton's Law of Cooling. Isaac Newton proposed a law for the rate of change of the temperature $T$ of an object placed in a large, effectively infinite, environment at a fixed ambient temperature $T_{\text{amb}}$. The law is that the rate of change of $T$ is negatively proportional to the temperature difference $T - T_{\text{amb}}$,
$$\frac{dT}{dt} = -k(T - T_{\text{amb}}).$$
The method gives the solution
$$T(t) = T_{\text{amb}} + (T_0 - T_{\text{amb}})e^{-kt},$$
where $T_0 = T(0)$ is the initial temperature.

As $t$ increases, the temperature $T$ approaches the steady-state temperature $T_{\text{amb}}$.

Lecture 18. October 25, 2005

Homework. Problem Set 5 Part I: (c).

Practice Problems. Course Reader: 3G1, 3G2, 3G4, 3G5.

1. Approximating Riemann integrals. Often, there is no simpler expression for the antiderivative than the expression given by the Fundamental Theorem of Calculus. In such cases, the simplest method to compute a Riemann integral is to use the definition. However, this is not necessarily the most efficient method. Often trapezoids or regions under parabolas give a better approximation to the Riemann integral than do vertical strips.


2. The trapezoid rule. The problem is to find an approximation of the Riemann integral
$$I = \int_a^b y\,dx$$
for a function $y = f(x)$ defined on the interval $[a, b]$. Choose a partition of the interval $[a, b]$ into $n$ equal subintervals. The points of this partition are
$$x_k = a + \frac{(b - a)k}{n}, \qquad \Delta x_k = \frac{b - a}{n}.$$
The values of the function at these points are
$$y_k = f(x_k).$$
The Riemann sum using always the left endpoint is
$$I_l = \sum_{k=1}^{n} y_{k-1}\,\Delta x_k.$$
The Riemann sum using always the right endpoint is
$$I_r = \sum_{k=1}^{n} y_k\,\Delta x_k.$$
The average of the two is
$$I_{\text{trap}} = \sum_{k=1}^{n}\frac{y_{k-1} + y_k}{2}\,\Delta x_k.$$
This is usually a better approximation than either of the two approximations individually. Part of the reason is that the term $(y_{k-1} + y_k)\Delta x_k/2$ is the area of the trapezoid with vertices $(x_{k-1}, 0)$, $(x_{k-1}, y_{k-1})$, $(x_k, 0)$ and $(x_k, y_k)$. In particular, if the graph of $y = f(x)$ is a line, this trapezoid is precisely the region between the graph and the x-axis over the interval $[x_{k-1}, x_k]$. Thus the approximation above gives the exact integral for linear integrands.
Writing out the sum gives
$$I_{\text{trap}} = \frac{b - a}{2n}\big((y_0 + y_1) + (y_1 + y_2) + (y_2 + y_3) + \cdots + (y_{n-2} + y_{n-1}) + (y_{n-1} + y_n)\big).$$
Gathering like terms, this reduces to
$$I_{\text{trap}} = \frac{b - a}{2n}\,(y_0 + 2y_1 + 2y_2 + \cdots + 2y_{n-1} + y_n).$$
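The gathered formula translates directly into a short function. The Python sketch below is an illustration, and the name `trapezoid` is a hypothetical choice. It checks the claim that linear integrands are integrated exactly, up to floating-point rounding, using the arbitrary example $\int_0^2 (3x + 1)\,dx = 8$.

```python
def trapezoid(f, a, b, n):
    """I_trap = (b - a)/(2n) * (y_0 + 2 y_1 + ... + 2 y_{n-1} + y_n)."""
    ys = [f(a + (b - a) * k / n) for k in range(n + 1)]
    return (b - a) / (2 * n) * (ys[0] + 2 * sum(ys[1:-1]) + ys[-1])


# Exact (up to rounding) for linear integrands: the integral of 3x + 1 over [0, 2] is 8.
print(trapezoid(lambda x: 3 * x + 1, 0.0, 2.0, 5))
```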

3. Simpson's rule. Again partition the interval $[a, b]$ into $n$ equal subintervals. For reasons that will become apparent, $n$ must be even, so let $n = 2m$ where $m$ is a positive integer. Again define
$$x_k = a + \frac{(b - a)k}{n} = a + \frac{(b - a)k}{2m}, \qquad \Delta x_k = \frac{b - a}{n} = \frac{b - a}{2m}.$$


Pair off the intervals as $([x_0, x_1], [x_1, x_2])$, $([x_2, x_3], [x_3, x_4])$, etc. Thus the $l$th pair of intervals is
$$([x_{2l-2}, x_{2l-1}],\ [x_{2l-1}, x_{2l}]).$$
The idea is to approximate the area under the graph over the pair of intervals by the area under the unique parabola containing the 3 points $(x_{2l-2}, y_{2l-2})$, $(x_{2l-1}, y_{2l-1})$, $(x_{2l}, y_{2l})$. For notation's sake, denote $2l - 1$ by $k$. Thus the 3 points are $(x_{k-1}, y_{k-1})$, $(x_k, y_k)$, and $(x_{k+1}, y_{k+1})$ (this is slightly more symmetric).
The first problem is to find the equation of this parabola. Since the parabola contains the point $(x_k, y_k)$, it has an equation of the form
$$y = A(x - x_k)^2 + B(x - x_k) + y_k.$$
Plugging in $x = x_{k-1}$ and $x = x_{k+1}$, and using that $x_{k+1} - x_k = x_k - x_{k-1}$ equals $\Delta x$,
$$y_{k+1} = A(\Delta x)^2 + B(\Delta x) + y_k,$$
$$y_{k-1} = A(\Delta x)^2 - B(\Delta x) + y_k.$$
Summing the two equations gives
$$y_{k+1} + y_{k-1} = 2A(\Delta x)^2 + 2y_k.$$
Solving for $A$ gives
$$A = \frac{1}{2(\Delta x)^2}\,(y_{k-1} - 2y_k + y_{k+1}).$$
Similarly, taking the difference of the two equations gives
$$y_{k+1} - y_{k-1} = 2B(\Delta x).$$
Solving for $B$ gives
$$B = \frac{1}{2(\Delta x)}\,(y_{k+1} - y_{k-1}).$$
Thus, the equation of the parabola passing through $(x_{k-1}, y_{k-1})$, $(x_k, y_k)$ and $(x_{k+1}, y_{k+1})$ is
$$y = A(x - x_k)^2 + B(x - x_k) + y_k, \qquad A = \frac{y_{k-1} - 2y_k + y_{k+1}}{2(\Delta x)^2}, \qquad B = \frac{y_{k+1} - y_{k-1}}{2(\Delta x)}.$$
The next problem is to compute the area under the parabola from $x = x_{k-1}$ to $x = x_{k+1}$. This is a straightforward application of the Fundamental Theorem of Calculus,
$$\int_{x_{k-1}}^{x_{k+1}}\left(A(x - x_k)^2 + B(x - x_k) + y_k\right)dx = \left.\left(\frac{A}{3}(x - x_k)^3 + \frac{B}{2}(x - x_k)^2 + y_k(x - x_k)\right)\right|_{x_{k-1}}^{x_{k+1}}.$$


Plugging in and using that $x_{k+1} - x_k = x_k - x_{k-1}$ equals $\Delta x$, this is
$$\frac{2A}{3}(\Delta x)^3 + 2y_k(\Delta x).$$
Substituting in the formula for $A$ and simplifying, this is
$$\frac{\Delta x}{3}(y_{k-1} - 2y_k + y_{k+1}) + \frac{\Delta x}{3}(6y_k) = \frac{\Delta x}{3}(y_{k-1} + 4y_k + y_{k+1}).$$
Back-substituting $2l - 1$ for $k$ and $(b - a)/2m$ for $\Delta x$, the approximate area for the pair of intervals $[x_{2l-2}, x_{2l-1}]$ and $[x_{2l-1}, x_{2l}]$ is
$$\Delta I_l = \frac{b - a}{6m}(y_{2l-2} + 4y_{2l-1} + y_{2l}).$$
Finally, summing this contribution over each choice of $l$ gives the Simpson's rule approximation,
$$I_{\text{Simpson}} = \frac{b - a}{6m}\sum_{l=1}^{m}(y_{2l-2} + 4y_{2l-1} + y_{2l}).$$
Writing out the sum gives
$$I_{\text{Simpson}} = \frac{b - a}{6m}\big((y_0 + 4y_1 + y_2) + (y_2 + 4y_3 + y_4) + (y_4 + 4y_5 + y_6) + \cdots + (y_{2m-4} + 4y_{2m-3} + y_{2m-2}) + (y_{2m-2} + 4y_{2m-1} + y_{2m})\big).$$
Gathering like terms, $I_{\text{Simpson}}$ reduces to
$$I_{\text{Simpson}} = \frac{b - a}{6m}\,(y_0 + 4y_1 + 2y_2 + 4y_3 + 2y_4 + 4y_5 + 2y_6 + \cdots + 4y_{2m-3} + 2y_{2m-2} + 4y_{2m-1} + y_{2m}).$$
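The weight pattern $1, 4, 2, 4, \ldots, 2, 4, 1$ can be coded directly. The Python sketch below is illustrative, not the notes' method. `simpson` is a hypothetical name, and it takes the number of pairs $m$ so that $n = 2m$ is automatically even. Because each parabola through three points of a quadratic is that quadratic itself, quadratics are integrated exactly, up to rounding. The arbitrary check is $\int_0^3 x^2\,dx = 9$.

```python
def simpson(f, a, b, m):
    """Simpson's rule with n = 2m equal subintervals:
    (b - a)/(6m) * (y_0 + 4y_1 + 2y_2 + ... + 4y_{2m-1} + y_{2m})."""
    n = 2 * m
    ys = [f(a + (b - a) * k / n) for k in range(n + 1)]
    weighted = ys[0] + ys[-1] + 4 * sum(ys[1:-1:2]) + 2 * sum(ys[2:-1:2])
    return (b - a) / (6 * m) * weighted


# The parabola through three points of a quadratic is the quadratic itself,
# so quadratics are integrated exactly (up to rounding): the integral of x^2 over [0, 3] is 9.
print(simpson(lambda x: x * x, 0.0, 3.0, 1))
```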

Example. Approximate ln(2) using a partition into 4 equal subintervals with the Trapezoid Rule and with Simpson’s Rule.
The value ln(2) equals the Riemann integral,
$$\int_1^2 \frac{1}{x}\,dx.$$
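The points of the partition are $x_0 = 4/4$, $x_1 = 5/4$, $x_2 = 6/4$, $x_3 = 7/4$ and $x_4 = 8/4$. The corresponding values are $y_0 = 4/4$, $y_1 = 4/5$, $y_2 = 4/6$, $y_3 = 4/7$, $y_4 = 4/8$. Thus the Trapezoid Rule gives,
$$I_{\text{trap}} = \frac{b - a}{2n}(y_0 + 2y_1 + 2y_2 + 2y_3 + y_4) = \frac{1}{8}\left(\frac{4}{4} + 2\cdot\frac{4}{5} + 2\cdot\frac{4}{6} + 2\cdot\frac{4}{7} + \frac{4}{8}\right) = \frac{1171}{1680} \approx 0.6970.$$
For Simpson’s Rule, because $n$ equals 4, $m$ equals 2. Thus,
$$I_{\text{Simpson}} = \frac{b - a}{6m}(y_0 + 4y_1 + 2y_2 + 4y_3 + y_4) = \frac{1}{12}\left(\frac{4}{4} + 4\cdot\frac{4}{5} + 2\cdot\frac{4}{6} + 4\cdot\frac{4}{7} + \frac{4}{8}\right) = \frac{1747}{2520} \approx 0.6933.$$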

According to a calculator, the true value is,

$$\ln(2) = 0.6931 \pm 10^{-4}.$$

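Note that the trapezoids overestimate the area, because $1/x$ is concave up. The approximating parabolas cross the graph of $y = 1/x$ at the partition points: each parabola lies slightly above the graph to the left of $(x_k, y_k)$ and slightly below it to the right. Thus the overestimation to the left of $(x_k, y_k)$ partially cancels the underestimation to the right of $(x_k, y_k)$, explaining the better approximation.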
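The two hand computations can be reproduced exactly with rational arithmetic. This is a minimal sketch using Python’s standard `fractions.Fraction`; the helper names `trapezoid_exact` and `simpson_exact` are illustrative, and the values $y_k$ are the ones listed above.

```python
from fractions import Fraction
import math

def trapezoid_exact(ys, a, b):
    """Trapezoid Rule on n = len(ys) - 1 equal subintervals, exact for Fraction inputs."""
    n = len(ys) - 1
    return Fraction(b - a, 2 * n) * (ys[0] + 2 * sum(ys[1:n]) + ys[n])

def simpson_exact(ys, a, b):
    """Simpson's Rule on n = 2m equal subintervals (n must be even)."""
    n = len(ys) - 1
    m = n // 2
    return Fraction(b - a, 6 * m) * (ys[0] + 4 * sum(ys[1:n:2]) + 2 * sum(ys[2:n:2]) + ys[n])

# x_k = (4 + k)/4 on [1, 2], so y_k = 1/x_k = 4/(4 + k).
ys = [Fraction(4, 4 + k) for k in range(5)]
t, s = trapezoid_exact(ys, 1, 2), simpson_exact(ys, 1, 2)
print(t, s)  # 1171/1680 1747/2520
print(float(t) - math.log(2), float(s) - math.log(2))  # both positive; Simpson's error is far smaller
```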
4. One review problem. This is a related rates review problem for Exam 3. A particle moves with constant speed 3 on the parabola $y = x^2$. The particle is moving away from the origin. What is the rate of change of the distance from the origin to the particle when the distance equals $2\sqrt{5}$?
The independent variable is time, $t$. The dependent variables are the $x$-coordinate of the particle, $x(t)$, the $y$-coordinate of the particle, $y(t)$, and the distance $L(t)$ from the particle to $(0, 0)$. The constant is the speed $s = 3$ of the particle. The constraints are that the point moves on the parabola,
$$y = x^2,$$
and the Pythagorean theorem,
$$L^2 = x^2 + y^2.$$
Also, since the speed is constant,
$$s^2 = \left(\frac{dx}{dt}\right)^2 + \left(\frac{dy}{dt}\right)^2.$$
This plays the role of the “known rate of change” in a typical related rates problem.

It is simplest to relate the dependent variables $y$ and $L$ to $x$. The first step is to determine $x$ at the moment when $L$ equals $2\sqrt{5}$. Plugging $y = x^2$ into the equation for $L^2$ gives,

$$L^2 = x^2 + y^2 = x^2 + (x^2)^2 = x^2 + x^4.$$

At the instant when $L$ equals $2\sqrt{5}$, $L^2$ equals 20. Thus, at that moment,

$$x^4 + x^2 = 20.$$

This factors as,

$$(x^2 - 4)(x^2 + 5) = 0.$$
Since $x^2$ is nonnegative, the solution is $x^2 = 4$. Assuming the particle is in the first quadrant (this is not specified in the problem), $x$ is positive. The other choice leads to a symmetric problem and the same final answer. So, at the moment when $L$ equals $2\sqrt{5}$, $x$ equals 2.
The next step is to determine the “known rate of change”, $dx/dt$, at the moment when $L$ equals $2\sqrt{5}$. Implicitly differentiating the equation $y = x^2$ gives,
$$\frac{dy}{dt} = 2x\frac{dx}{dt}.$$

Substituting this into the equation for $s^2$ gives,

$$s^2 = \left(\frac{dx}{dt}\right)^2 + \left(2x\frac{dx}{dt}\right)^2 = (1 + 4x^2)\left(\frac{dx}{dt}\right)^2.$$

Since $s$ is known to be 3, and $x$ is known to be 2, this equation can be solved for $dx/dt$,
$$\left(\frac{dx}{dt}\right)^2 = \frac{3^2}{1 + 4(2)^2} = \frac{9}{17}.$$

Since the particle is in the first quadrant and moving away from the origin, $dx/dt$ is positive. So, at the moment when $L$ equals $2\sqrt{5}$, $dx/dt$ equals $3/\sqrt{17}$.
The final step is to compute $dL/dt$ at the moment when $L$ equals $2\sqrt{5}$. Implicitly differentiating the equation,
$$L^2 = x^2 + x^4,$$
gives,
$$2L\frac{dL}{dt} = (2x + 4x^3)\frac{dx}{dt}.$$
Plugging in for $L$, $x$ and $dx/dt$ gives,
$$2(2\sqrt{5})\frac{dL}{dt} = \left(2(2) + 4(2)^3\right)\frac{3}{\sqrt{17}}.$$
Solving gives,
$$\frac{dL}{dt} = \frac{27}{\sqrt{85}}$$
at the moment when $L$ equals $2\sqrt{5}$.
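As a numerical sanity check, the answer can be recomputed by the chain rule $dL/dt = (dL/dx)(dx/dt)$, estimating $dL/dx$ with a central difference instead of differentiating by hand. This is a minimal sketch under the same first-quadrant assumption as above; the step size `h` is an arbitrary illustrative choice.

```python
import math

s = 3.0   # constant speed of the particle
x = 2.0   # x-coordinate at the moment when L = 2*sqrt(5)

def L(x):
    """Distance from the origin to the point (x, x**2) on the parabola."""
    return math.sqrt(x**2 + x**4)

dx_dt = s / math.sqrt(1 + 4 * x**2)      # from s^2 = (1 + 4x^2)(dx/dt)^2, taking dx/dt > 0
h = 1e-6
dL_dx = (L(x + h) - L(x - h)) / (2 * h)  # central-difference estimate of dL/dx
dL_dt = dL_dx * dx_dt                     # chain rule: dL/dt = (dL/dx)(dx/dt)

print(dL_dt, 27 / math.sqrt(85))          # the two values agree to many decimal places
```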
Lecture 19. October 28, 2005
Homework. Problem Set 5 Part I: (d) and (e); Part II: Problems 2 and 3.
Practice Problems. Course Reader: 4A1, 4A3, 4B1, 4B3, 4B6.
1. Differentials revisited. In a typical applied integration problem, the main difficulty is finding the integrand and the limits of integration. An unknown quantity, for instance area $A$, depends on some other quantity, for instance the $x$-coordinate. The method is to let the independent variable $x$ vary “infinitesimally” from $x$ to $x + dx$ and then use geometry or science to deduce the corresponding variation $dA$ of the unknown quantity. The deduction is usually intuitive rather than rigorous. What is important is whether the deduction leads to the correct solution. If so, the method of Riemann sums usually gives a rigorous proof of the intuitive answer. But if the solution is incorrect, no argument will prove it correct.
2. Areas between curves. Given an interval $a \le x \le b$ and two functions $f(x) \ge g(x)$ defined on the interval, what is the area of the region bounded by the lines $x = a$, $x = b$ and the curves
$y = f(x)$, $y = g(x)$? This problem can be solved directly: the area is the difference of the area between $y = f(x)$ and the $x$-axis and the area between $y = g(x)$ and the $x$-axis. Each of these is a Riemann integral.
The differential method asks, what is the infinitesimal change in the area $A$ as $x$ varies from $x$ to $x + dx$? The infinitesimal region is a rectangle of base $dx$ and height $f(x) - g(x)$. Thus the infinitesimal change in $A$ is,

$$dA = \text{height} \times \text{base} = (f(x) - g(x))\,dx.$$

Integrating gives,
$$A = \int dA = \int_{x=a}^{x=b} (f(x) - g(x))\,dx.$$
Of course this is the same answer as in the last paragraph. But for other applied integration problems, the differential method may be the only method that works.
Example. Find the area bounded by the curve $y = x(x^2 - 3)$ and a horizontal tangent line.
The horizontal tangent lines are the tangent lines to the curve at critical points. Setting the derivative equal to 0 gives,
$$\frac{dy}{dx} = 3x^2 - 3 = 3(x - 1)(x + 1) = 0.$$
Thus the critical points are $x = \pm 1$. The function is odd, so by symmetry the area is the same regardless of the choice of critical point. Thus, choose the critical point $x = -1$. The corresponding value of the function is,
$$y = (-1)\left((-1)^2 - 3\right) = (-1)(-2) = 2.$$
Thus the horizontal tangent line is $y = 2$. Each intersection point $(b, f(b))$ of the tangent line with $y = x(x^2 - 3)$ occurs at a solution $x = b$ of,

$$x(x^2 - 3) = 2.$$

By hypothesis, $x = -1$ is a solution. Thus the polynomial factors as,

$$x^3 - 3x - 2 = (x + 1)(x^2 - x - 2) = (x + 1)^2(x - 2).$$

The remaining intersection point is $(2, 2)$. So the area bounded by the curve $y = x(x^2 - 3)$ and the tangent line $y = 2$ is,
$$\int_{x=-1}^{x=2} \left(2 - x(x^2 - 3)\right)dx = \int_{-1}^{2} (-x^3 + 3x + 2)\,dx.$$

Using the Fundamental Theorem of Calculus, this equals,

$$\left.\left(-\frac{x^4}{4} + \frac{3x^2}{2} + 2x\right)\right|_{-1}^{2} = \frac{27}{4}.$$
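The intersection points and the final value can be confirmed with exact rational arithmetic. This is a minimal sketch using Python’s `fractions.Fraction`; the helper names `cubic` and `F` are illustrative.

```python
from fractions import Fraction

def cubic(x):
    """x(x^2 - 3) - 2: its roots are the x-values where the curve meets y = 2."""
    return x**3 - 3 * x - 2

def F(x):
    """An antiderivative of 2 - x(x^2 - 3) = -x^3 + 3x + 2, evaluated exactly."""
    x = Fraction(x)
    return -x**4 / 4 + 3 * x**2 / 2 + 2 * x

assert cubic(-1) == 0 and cubic(2) == 0   # intersection points x = -1 and x = 2
area = F(2) - F(-1)
print(area)  # 27/4
```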

3. Volumes of solids of revolution: the disk method. If the region in the $xy$-plane bounded by $x = a$, $x = b$, $y = f(x)$ and the $x$-axis is revolved through $xyz$-space about the $x$-axis, what is the volume of the resulting solid? The solid is called a solid of revolution. The disk method applies the method of differentials to solve this problem. As $x$ increases from $x$ to $x + dx$, the corresponding infinitesimal region of the solid is essentially a disk. The width of the disk is $dx$. The area of the disk is $\pi[f(x)]^2$. Thus the infinitesimal volume of the disk is,
$$dV = \text{Area} \times \text{width} = \pi[f(x)]^2\,dx.$$
Thus the volume is,
$$V = \int dV = \int_{x=a}^{x=b} \pi[f(x)]^2\,dx.$$

Example. Find the volume of a right circular cone whose base has radius $R$ and whose vertex has height $H$ above the base.
The cone is the solid of revolution for the plane region bounded by $x = 0$, the $x$-axis, and the line containing $(0, R)$ and $(H, 0)$. The equation of this line is,
$$y = \frac{(H - x)R}{H}.$$
Thus the volume of an infinitesimal disk is,
$$dV = \text{Area} \times \text{width} = \pi\frac{(H - x)^2R^2}{H^2}\,dx,$$
and the volume is,
$$V = \int dV = \int_{x=0}^{x=H} \pi\frac{(H - x)^2R^2}{H^2}\,dx.$$
Making the substitution $u = H - x$, $du = -dx$ gives,
$$V = \int_{u=H}^{u=0} \pi\frac{R^2}{H^2}u^2(-du) = \pi\frac{R^2}{H^2}\left.\left(-\frac{u^3}{3}\right)\right|_{H}^{0}.$$
Evaluating gives the volume,
$$V = \pi R^2H/3.$$
In particular, the antiderivative of $u^2$ is responsible for the denominator 3 in the formula for the volume.

Example. Find the volume of a sphere of radius $R$.

The sphere of radius $R$ is the solid of revolution for the plane region bounded by the $x$-axis and the upper semicircle $y = \sqrt{R^2 - x^2}$. Thus the volume is,

$$V = \int_{x=-R}^{x=R} \pi\left[\sqrt{R^2 - x^2}\right]^2dx = \int_{-R}^{R} \pi(R^2 - x^2)\,dx = \pi\left.\left(R^2x - \frac{x^3}{3}\right)\right|_{-R}^{R}.$$

Evaluating gives the volume,

$$V = 4\pi R^3/3.$$
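The disk method is just a Riemann sum of disk volumes, so both examples can be checked numerically. This is a minimal sketch that uses a midpoint Riemann sum; the function name `disk_volume`, the number of disks `n`, and the sample values of $R$ and $H$ are illustrative assumptions.

```python
import math

def disk_volume(f, a, b, n=100_000):
    """Midpoint Riemann sum for V = integral from a to b of pi * f(x)^2 dx."""
    dx = (b - a) / n
    return sum(math.pi * f(a + (k + 0.5) * dx) ** 2 for k in range(n)) * dx

R, H = 2.0, 3.0  # sample radius and height (illustrative values)
cone = disk_volume(lambda x: (H - x) * R / H, 0.0, H)
sphere = disk_volume(lambda x: math.sqrt(R**2 - x**2), -R, R)
print(cone, math.pi * R**2 * H / 3)
print(sphere, 4 * math.pi * R**3 / 3)
```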

4. The slice method. A generalization of the disk method is the slice method. The problem is to find the volume of a region bounded by the planes $x = a$ and $x = b$ given the knowledge of the area $A(x)$ of the slice of the solid by the plane containing $(x, 0, 0)$ parallel to the $yz$-plane. As $x$ increases from $x$ to $x + dx$, the corresponding infinitesimal region of the solid is essentially a cylinder. The width of the cylinder is $dx$, and the area of its base is the area $A(x)$ of the slice. Thus the infinitesimal volume of the cylinder is,

$$dV = \text{Area} \times \text{width} = A(x)\,dx.$$

Thus the volume is,
$$V = \int dV = \int_{x=a}^{x=b} A(x)\,dx.$$

Example. Find the volume of the “corner” region bounded by the $xy$-plane, the $xz$-plane, the $yz$-plane, and the plane containing $(L, 0, 0)$, $(0, L, 0)$ and $(0, 0, L)$.
This region is bounded by the planes $x = 0$ and $x = L$. The $x$-slice of the region is a right isosceles triangle. The base and altitude of the triangle both equal $f(x)$, where $y = f(x)$ is the equation of the line passing through $(0, L)$ and $(L, 0)$. This equation is,

$$f(x) = L - x.$$

Thus the slice area is

$$A(x) = \frac{1}{2}\,\text{base} \times \text{altitude} = \frac{1}{2}(L - x)^2.$$
Thus the infinitesimal volume is,
$$dV = A(x)\,dx = \frac{1}{2}(L - x)^2\,dx,$$
giving the total volume,
$$V = \int dV = \int_{x=0}^{x=L} \frac{1}{2}(L - x)^2\,dx.$$
Make the substitution $u = L - x$, $du = -dx$ to get,
$$V = \int_{u=L}^{u=0} \frac{1}{2}u^2(-du) = \frac{1}{2}\left.\frac{u^3}{3}\right|_{0}^{L}.$$

Evaluating gives,
$$V = L^3/6.$$
Thus the “corner” takes up one sixth of the corresponding cube of edge length L.
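The slice method can be checked the same way: add up slice area times thickness. This is a minimal sketch with a midpoint Riemann sum; the name `slice_volume` and the sample edge length are illustrative assumptions.

```python
def slice_volume(area, a, b, n=10_000):
    """Midpoint Riemann sum for V = integral from a to b of A(x) dx."""
    dx = (b - a) / n
    return sum(area(a + (k + 0.5) * dx) for k in range(n)) * dx

L = 1.5  # sample edge length (illustrative)
corner = slice_volume(lambda x: 0.5 * (L - x) ** 2, 0.0, L)
print(corner / L**3)  # close to 1/6
```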

5. Volumes of solids of revolution: the washer method. A variation on the disk method is the washer method. A washer is the solid obtained by removing from a larger disk a concentric smaller disk of the same width. If the outer radius of the washer is $r_o$ and the inner radius is $r_i$, then the net area of the washer is,

$$A = \pi r_o^2 - \pi r_i^2 = \pi(r_o^2 - r_i^2).$$

Thus the volume of the washer is,

$$dV = \text{Area} \times \text{width} = \pi(r_o^2 - r_i^2)\,dx,$$

giving a total volume,

$$V = \int dV = \int_{x=a}^{x=b} \pi(r_o^2 - r_i^2)\,dx.$$

Example. The main part of a dog dish is a solid of revolution whose radial cross-section is a triangle of height $H$ whose base has inner radius $R_i$ and outer radius $R_o$. Find the volume of material used to make the dog dish.
Note. This integral was only set up in lecture. The derivation will be completed in recitation.
Here is the complete derivation. Denote by $x$ the height along the altitude of the triangle. Thus $x$ varies from $x = 0$ to $x = H$. When $x = H$, the inner radius and outer radius are each equal to the average $(R_i + R_o)/2$ of the two radii. Both radii depend linearly on $x$.
The inner radius increases linearly from $r_i = R_i$ at $x = 0$ to $r_i = (R_i + R_o)/2$ at $x = H$. Thus,
$$r_i(x) = R_i\frac{H - x}{H} + \frac{R_i + R_o}{2}\cdot\frac{x}{H}.$$
Similarly, the outer radius decreases linearly from $r_o = R_o$ at $x = 0$ to $r_o = (R_i + R_o)/2$ at $x = H$. Thus,
$$r_o(x) = R_o\frac{H - x}{H} + \frac{R_i + R_o}{2}\cdot\frac{x}{H}.$$

Since $r_o^2 - r_i^2$ is a difference of squares, it equals,

$$r_o^2 - r_i^2 = (r_o + r_i)(r_o - r_i).$$

This is interesting in its own right. Using this factorization, the net area of the region between the two circles, called an annulus, equals
$$\pi(r_o^2 - r_i^2) = 2\pi\left(\frac{r_o + r_i}{2}\right)(r_o - r_i).$$
Note the first factor is the circumference of the center circle of the annulus, the circle of radius $(r_o + r_i)/2$. And the second factor is the radial width of the annulus. Thus the area of an annulus is the circumference of the center circle times the radial width.

Back to the problem, the center of each washer is the same,

$$\frac{r_i + r_o}{2} = \frac{R_i + R_o}{2}.$$
And the radial width of each washer is,
$$r_o - r_i = (R_o - R_i)\frac{H - x}{H}.$$
Both of these make sense: the centers are constant because the radial cross-section is an isosceles triangle. And the width decreases linearly from $R_o - R_i$ at $x = 0$ to 0 at $x = H$. Thus the cross-section area at height $x$ is,
$$A(x) = 2\pi\frac{R_i + R_o}{2}(R_o - R_i)\frac{H - x}{H}.$$
Thus the infinitesimal volume is,
$$dV = \text{Area} \times \text{width} = \pi(R_o^2 - R_i^2)\frac{H - x}{H}\,dx,$$
giving a total volume,
$$V = \int dV = \int_{x=0}^{x=H} \pi(R_o^2 - R_i^2)\frac{H - x}{H}\,dx.$$
Substituting $u = H - x$, $du = -dx$ gives,
$$V = \int_{u=0}^{u=H} \pi(R_o^2 - R_i^2)\frac{u}{H}\,du = \left.\frac{\pi(R_o^2 - R_i^2)}{2H}u^2\right|_{0}^{H}.$$
Thus the total volume of material to produce the dog dish is,

$$V = \pi(R_o^2 - R_i^2)H/2.$$

One reality check is that this is the same volume as a cylindrical shell with the same center radius $(R_i + R_o)/2$ and height $H$ as the dish, and whose (constant) radial width equals the average radial width of the dish, $(R_o - R_i)/2$.
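The washer computation and the reality check can both be compared numerically. This is a minimal sketch: `dish_volume_washers` is an illustrative name, the radii $r_i(x)$ and $r_o(x)$ are the linear formulas derived above, and the dimensions are arbitrary sample values.

```python
import math

def dish_volume_washers(Ri, Ro, H, n=10_000):
    """Washer method: midpoint Riemann sum of pi*(ro(x)^2 - ri(x)^2) over 0 <= x <= H."""
    mid = (Ri + Ro) / 2
    dx = H / n
    total = 0.0
    for k in range(n):
        x = (k + 0.5) * dx
        ri = Ri * (H - x) / H + mid * x / H   # inner radius: Ri at x = 0, mid at x = H
        ro = Ro * (H - x) / H + mid * x / H   # outer radius: Ro at x = 0, mid at x = H
        total += math.pi * (ro**2 - ri**2) * dx
    return total

Ri, Ro, H = 3.0, 5.0, 2.0  # sample dimensions (illustrative)
V = dish_volume_washers(Ri, Ro, H)
print(V, math.pi * (Ro**2 - Ri**2) * H / 2)                  # washer formula
print(2 * math.pi * ((Ri + Ro) / 2) * ((Ro - Ri) / 2) * H)   # cylindrical-shell reality check
```

Because $r_o^2 - r_i^2$ turns out to be linear in $x$, the midpoint sum is exact here up to rounding.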
Lecture 20. November 1, 2005
Practice Problems. Course Reader: 4C2, 4C6, 4D1, 4D4, 4D8.
1. Average values. Given a function $f(x)$ defined on some interval $[a, b]$, what is the average value of $f(x)$? A reasonable first approximation is to choose a finite collection of points from $[a, b]$ and compute the average value over those points. Break $[a, b]$ into a union of $n$ subintervals of length $\Delta x = (b - a)/n$. From each interval, choose a point; say $x_k^*$ in the $k$th interval. For the finitely many values $y_k^* = f(x_k^*)$, the average value is,
$$\text{Average} \approx \frac{1}{n}\sum_{k=1}^{n} y_k^*.$$

Multiplying and dividing by $\Delta x$ gives,

$$\text{Average} \approx \frac{1}{n\Delta x}\sum_{k=1}^{n} y_k^*\Delta x.$$

Now $n\Delta x$ equals $n(b - a)/n$, which is $b - a$. So the average value is,
$$\text{Average} \approx \frac{1}{b - a}\sum_{k=1}^{n} y_k^*\Delta x.$$

The sum is a Riemann sum. To get better approximations to the true average, increase the number of points $n$ at which $f(x)$ is “sampled”. In the limit, this gives the true average,
$$\text{Average} = \lim_{n\to\infty}\frac{1}{b - a}\sum_{k=1}^{n} y_k^*\Delta x = \frac{1}{b - a}\int_a^b f(x)\,dx.$$

Example. Under ideal conditions, a wire-producing machine produces wire of uniform radius $r_0$. Because of small vibrations in the machine, the actual radius of the wire varies as a function of the length,
$$r(x) = r_0 + A\cos(\omega x).$$
The quantity $A$ is much smaller than $r_0$. What is the average radius of the wire?
Because the variation is periodic, the average value over any whole number of periods equals the average value over one period. In other words, compute the average for the interval $0 \le x \le 2\pi/\omega$. The length of this interval is $2\pi/\omega$. Thus the average value is,
$$\text{Average} = \frac{1}{2\pi/\omega}\int_0^{2\pi/\omega} \left(r_0 + A\cos(\omega x)\right)dx.$$
Using the Fundamental Theorem of Calculus, this equals,
$$\frac{1}{2\pi/\omega}\left.\left(r_0x + (A/\omega)\sin(\omega x)\right)\right|_0^{2\pi/\omega}.$$
This evaluates to,
$$\frac{1}{2\pi/\omega}\,r_0(2\pi/\omega) = r_0.$$
Thus, although the radius varies and does not usually equal its ideal value $r_0$, the average value is indeed,
$$\text{Average} = r_0.$$

2. Average values: nonuniform distribution. It often happens that the average value of $f(x)$ is desired in a situation where the values of $x$ are not all equally likely. Typically, the probability that $x$ has value in the range from $x_0$ to $x_0 + \Delta x$ is approximately,
$$\text{Prob}(x_0 \le x \le x_0 + \Delta x) \approx \rho(x_0)\Delta x,$$

for some nonnegative continuous function $\rho(x)$. The function $\rho(x)$ is called a probability distribution. Assuming this approximation becomes arbitrarily good as the length $\Delta x$ approaches zero, the exact probability that $x$ has value in the range $x_0$ to $x_1$ is,
$$\text{Prob}(x_0 \le x \le x_1) = \int_{x_0}^{x_1} \rho(x)\,dx.$$

In particular, because $x$ must take value somewhere in the interval $[a, b]$, the total probability is 1. In other words,
$$\int_a^b \rho(x)\,dx = 1.$$
This is called the normalization condition.
The average value is computed as before. But this time, each value $y_k^* = f(x_k^*)$ is weighted by the approximate probability that $x$ takes value in the $k$th interval, $\rho(x_k^*)\Delta x$. This gives,
$$\text{Average} \approx \sum_{k=1}^{n} f(x_k^*)\rho(x_k^*)\Delta x.$$

In the limit as $n$ goes to $\infty$, this gives the exact average,

$$\text{Average} = \int_a^b f(x)\rho(x)\,dx.$$

It must be noted that the probability distribution $\rho(x)$ often does not satisfy the normalization condition. In this case, the formula above is wrong. But it is easily corrected,
$$\text{Average} = \left(\int_a^b f(x)\rho(x)\,dx\right)\Big/\left(\int_a^b \rho(x)\,dx\right).$$

Example. A particle is fired through a slit and strikes a screen on the other side. Measuring the position on the screen so that the origin is the closest point on the screen to the slit, the probability distribution is empirically observed to be,
$$\rho(x) = Ce^{-x^2/2\sigma^2},$$

where $\sigma$ is a constant determining the “width” of the probability distribution, and $C$ is an undetermined normalization constant. What is the average distance of the particle from the center of the screen? Assume the particle lies in an interval $[-R, R]$, where $R$ is very large.

Remark. This differs from the formula given in lecture, which was $Ce^{-x^2/2\sigma}$ for a particular choice of $\sigma$. The formula given here is more “standard”. I apologize for any confusion.

The distance function is,
$$f(x) = |x| = \begin{cases} -x, & x < 0, \\ x, & x \ge 0. \end{cases}$$

According to the formula, the average value is,

$$\left(\int_{-R}^{R} f(x)\rho(x)\,dx\right)\Big/\left(\int_{-R}^{R} \rho(x)\,dx\right).$$

The numerator is,
$$\int_{-R}^{R} |x|Ce^{-x^2/2\sigma^2}\,dx.$$

It is easiest to compute this by breaking it into a sum of two integrals,

$$\int_{-R}^{0} (-x)Ce^{-x^2/2\sigma^2}\,dx + \int_{0}^{R} (+x)Ce^{-x^2/2\sigma^2}\,dx.$$

Make the substitution $u = -x^2/2\sigma^2$, $du = (-x/\sigma^2)\,dx$ to reduce this to,
$$\int_{-R^2/2\sigma^2}^{0} Ce^{u}(\sigma^2\,du) + \int_{0}^{-R^2/2\sigma^2} Ce^{u}(-\sigma^2\,du) = 2\int_{-R^2/2\sigma^2}^{0} C\sigma^2e^{u}\,du.$$

Using the Fundamental Theorem of Calculus, this equals,

$$2C\sigma^2\left.e^{u}\right|_{-R^2/2\sigma^2}^{0} = 2C\sigma^2\left(1 - e^{-R^2/2\sigma^2}\right).$$
As $R$ becomes large, the quantity $e^{-R^2/2\sigma^2}$ becomes vanishingly small. Thus, in the limit as $R$ tends to $\infty$, the numerator equals,
$$\lim_{R\to\infty}\int_{-R}^{R} |x|Ce^{-x^2/2\sigma^2}\,dx = 2C\sigma^2.$$

Unfortunately, this is not an answer, because the normalization constant $C$ is unknown. The normalization condition is that,
$$C\lim_{R\to\infty}\int_{-R}^{R} e^{-x^2/2\sigma^2}\,dx = 1.$$

Simplify this by making the substitution $u = x/\sigma$, $du = dx/\sigma$, and $Q = R/\sigma$ to get,
$$C\lim_{R\to\infty}\int_{-R/\sigma}^{R/\sigma} e^{-u^2/2}\sigma\,du = C\sigma\lim_{Q\to\infty}\int_{-Q}^{Q} e^{-u^2/2}\,du.$$

Notice the limit,
$$\lim_{Q\to\infty}\int_{-Q}^{Q} e^{-u^2/2}\,du,$$
does not depend on $\sigma$. It is simply some number. Denoting this number by $1/C_1$, the normalization condition is,
$$C\sigma/C_1 = 1.$$
The solution is that $C = C_1/\sigma$. With this choice of $C$ the denominator of the average equals 1, so the average distance is the numerator $2C\sigma^2$, that is,
$$\text{Average distance} = 2C_1\sigma,$$
where,
$$1/C_1 = \lim_{Q\to\infty}\int_{-Q}^{Q} e^{-u^2/2}\,du.$$
There is a beautiful argument that,
$$C_1 = 1/\sqrt{2\pi}.$$
Unfortunately, we cannot yet prove this. Taking it as true gives the final answer,
$$\text{Average distance} = 2\sigma/\sqrt{2\pi}.$$
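Because the weighted average is a ratio, the unknown constant $C$ cancels, and the answer can be checked numerically without knowing it. This is a minimal sketch that uses midpoint sums on $[-R, R]$; the names `midpoint` and `rho`, and the values of $\sigma$ and $R$, are illustrative assumptions. The second line also tests the claimed value $1/C_1 = \sqrt{2\pi}$.

```python
import math

def midpoint(f, a, b, n=200_000):
    """Midpoint Riemann sum approximating the integral of f over [a, b]."""
    dx = (b - a) / n
    return sum(f(a + (k + 0.5) * dx) for k in range(n)) * dx

sigma = 0.7          # illustrative width
R = 12 * sigma       # "very large": the neglected tail is of size about e^(-72)
rho = lambda x: math.exp(-x**2 / (2 * sigma**2))   # take C = 1; C cancels in the ratio
numerator = midpoint(lambda x: abs(x) * rho(x), -R, R)
denominator = midpoint(rho, -R, R)
print(numerator / denominator, 2 * sigma / math.sqrt(2 * math.pi))
print(denominator / sigma, math.sqrt(2 * math.pi))   # approximates 1/C1
```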

3. Volumes of solids of revolution: the shell method. An alternative to the disk and washer method is the shell method. A shell is the region between two coaxial cylinders of the same height. If the average radius of the cylinders is $r$, if the width of the region is $w$ and if the height of the cylinders is $h$, then the approximate volume of the shell is,
$$\text{Volume} \approx \text{Circumference} \times \text{height} \times \text{width} = 2\pi rhw.$$

Take the plane region bounded by $x = a$, $x = b$, the $x$-axis and the curve $y = f(x)$. Revolve this region about the $y$-axis. (Please note: in the disk and washer method, the region was revolved about the $x$-axis.) To compute the corresponding volume, approximate the region obtained from $x$ to $x + dx$ as a shell. The radius of the shell is $x$. The height of the shell is $y = f(x)$. The width of the shell is $dx$. Therefore the differential element of volume is,
$$dV = (2\pi x)(f(x))\,dx.$$
Integrating gives the volume,
$$V = \int_{x=a}^{x=b} 2\pi xf(x)\,dx.$$

Example. The dog dish revisited. The main part of a dog dish is a solid of revolution whose radial cross-section is a triangle of height $H$ whose base has inner radius $R_i$ and outer radius $R_o$. Find the volume of material used to make the dog dish.
The volume was computed using the washer method. This time it will be computed using the shell method. The triangular region is the union of two regions. The first region is bounded by $x = R_i$, $x = (R_i + R_o)/2$, the $x$-axis, and the line segment,
$$y = \frac{2H}{R_o - R_i}(x - R_i).$$

The second region is bounded by $x = (R_i + R_o)/2$, $x = R_o$, the $x$-axis, and the line segment,
$$y = \frac{2H}{R_o - R_i}(R_o - x).$$

By the shell method, the volume of the solid of revolution obtained from the first region is,
$$V_1 = \int_{x=R_i}^{x=(R_i+R_o)/2} (2\pi x)\frac{2H}{R_o - R_i}(x - R_i)\,dx = \frac{4\pi H}{R_o - R_i}\int_{x=R_i}^{x=(R_i+R_o)/2} (x^2 - R_ix)\,dx.$$

This becomes simpler to deal with after the substitution $u = -x + (R_i + R_o)/2$, $du = -dx$. The new integral is,
$$V_1 = \frac{4\pi H}{R_o - R_i}\int_{u=(R_o-R_i)/2}^{u=0} \left(-u + \frac{R_o + R_i}{2}\right)\left(-u + \frac{R_o - R_i}{2}\right)(-du) = \frac{4\pi H}{R_o - R_i}\int_{u=0}^{u=(R_o-R_i)/2} \left(-u + \frac{R_o + R_i}{2}\right)\left(-u + \frac{R_o - R_i}{2}\right)du.$$

By the shell method, the volume of the solid of revolution obtained from the second region is,
$$V_2 = \int_{x=(R_o+R_i)/2}^{x=R_o} (2\pi x)\frac{2H}{R_o - R_i}(R_o - x)\,dx = \frac{4\pi H}{R_o - R_i}\int_{x=(R_o+R_i)/2}^{x=R_o} x(R_o - x)\,dx.$$

Believe it or not, this will be simpler to deal with after the substitution $u = x - (R_o + R_i)/2$, $du = dx$. The new integral is
$$V_2 = \frac{4\pi H}{R_o - R_i}\int_{u=0}^{u=(R_o-R_i)/2} \left(u + \frac{R_o + R_i}{2}\right)\left(-u + \frac{R_o - R_i}{2}\right)du.$$

Notice how similar the integrals for $V_1$ and $V_2$ are. They have the same fraction in front of the integral, and they have the same limits of integration. Thus, the sum of the two volumes is,
$$V = V_1 + V_2 = \frac{4\pi H}{R_o - R_i}\int_{u=0}^{u=(R_o-R_i)/2} \left[\left(-u + \frac{R_o + R_i}{2}\right)\left(-u + \frac{R_o - R_i}{2}\right) + \left(u + \frac{R_o + R_i}{2}\right)\left(-u + \frac{R_o - R_i}{2}\right)\right]du.$$
Since both terms in the integrand have the factor $(-u + (R_o - R_i)/2)$, this can be factored to give,
$$V = \frac{4\pi H}{R_o - R_i}\int_{u=0}^{u=(R_o-R_i)/2} \left[\left(-u + \frac{R_o + R_i}{2}\right) + \left(u + \frac{R_o + R_i}{2}\right)\right]\left(-u + \frac{R_o - R_i}{2}\right)du.$$
Of course the term in square brackets is simply $R_o + R_i$. So the total volume is,
$$V = \frac{4\pi H}{R_o - R_i}\int_{u=0}^{u=(R_o-R_i)/2} (R_o + R_i)\left(-u + \frac{R_o - R_i}{2}\right)du.$$

By the Fundamental Theorem of Calculus, this equals,

$$\frac{4\pi H}{R_o - R_i}(R_o + R_i)\left.\left(-\frac{u^2}{2} + \frac{(R_o - R_i)u}{2}\right)\right|_0^{(R_o-R_i)/2}.$$

This evaluates to,

$$\frac{4\pi H}{R_o - R_i}(R_o + R_i)\frac{(R_o - R_i)^2}{8}.$$
This simplifies to give,

$$V = \pi H(R_o - R_i)(R_o + R_i)/2 = \pi(R_o^2 - R_i^2)H/2.$$

This is precisely the same answer as computed using the washer method. Observe, though, how much more effort was required for the shell method. The lesson is, if you have a choice between the disk method and the shell method, consider carefully which method requires less effort before committing to one or the other.
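Numerically, the shell method is no harder than the washer method: sum $2\pi x\,h(x)\,dx$ across the triangular profile. This is a minimal sketch; `dish_volume_shells` is an illustrative name, `height` implements the two line segments bounding the two regions above, and the dimensions are arbitrary sample values.

```python
import math

def dish_volume_shells(Ri, Ro, H, n=10_000):
    """Shell method: midpoint Riemann sum of 2*pi*x*height(x) over Ri <= x <= Ro."""
    mid = (Ri + Ro) / 2
    slope = 2 * H / (Ro - Ri)

    def height(x):
        # First region: rising edge on [Ri, mid]; second region: falling edge on [mid, Ro].
        return slope * (x - Ri) if x <= mid else slope * (Ro - x)

    dx = (Ro - Ri) / n
    return sum(2 * math.pi * x * height(x) for x in (Ri + (k + 0.5) * dx for k in range(n))) * dx

Ri, Ro, H = 3.0, 5.0, 2.0  # illustrative dimensions
print(dish_volume_shells(Ri, Ro, H), math.pi * (Ro**2 - Ri**2) * H / 2)
```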
Lecture 21. November 3, 2005
Homework. Problem Set 6 Part I: (a)–(e); Part II: Problem 1.
Practice Problems. Course Reader: 4E2, 4E5, 4E7, 4F1, 4F6.
1. Parametric equations. To this point in the course, plane curves were specified in one of two ways. The explicit form, or graph form, of a curve in Cartesian coordinates is the common form,
$$y = f(x), \quad a \le x \le b.$$
The implicit form of a curve in Cartesian coordinates is as the set of all solutions of an equation,
$$F(x, y) = 0.$$
Often a subset of this curve is specified by imposing extra conditions, e.g., the upper unit semicircle is the set of solutions of $x^2 + y^2 = 1$ satisfying the extra condition $y > 0$.
There is a third important way to specify a curve: using parametric equations. Given a parameter $t$ varying in an interval $a \le t \le b$ and given functions $f(t)$ and $g(t)$ on this interval, the associated parametric curve,
$$\begin{cases} x = f(t), \\ y = g(t) \end{cases}$$
is simply the set of all pairs $(x, y) = (f(t), g(t))$ as $t$ varies over the interval $a \le t \le b$. We consider only the case where $f(t)$ and $g(t)$ are piecewise differentiable functions (more advanced courses discuss some pitfalls if $f(t)$ and $g(t)$ are merely continuous functions).
Examples. A. One specification of the points on the circle of radius $r$ centered at $(0, 0)$ is using the angle $\theta$. This gives rise to a parametric equation with parameter $\theta$,
$$\begin{cases} x = r\cos(\theta), \\ y = r\sin(\theta) \end{cases} \quad 0 \le \theta < 2\pi.$$

B. An ellipse centered at $(0, 0)$ whose axes equal the coordinate axes has a parametric equation,
$$\begin{cases} x = a\cos(\theta), \\ y = b\sin(\theta) \end{cases} \quad 0 \le \theta < 2\pi.$$

C. A projectile is launched from an initial position of $(x_0, y_0)$ with an initial velocity vector of magnitude $v_0$ at an angle $\alpha$ to the horizontal, and under the influence of constant gravitational acceleration $-g$. According to Newton’s laws of mechanics, the position of the projectile after time $t$ is,
$$\begin{cases} x = v_0\cos(\alpha)t + x_0, \\ y = -(g/2)t^2 + v_0\sin(\alpha)t + y_0 \end{cases} \quad 0 \le t.$$
This is a parametric equation where time $t$ is the parameter. Even when some other quantity is the parameter, it is often useful to think of the parameter as time. Thus the curve is the trail left by a point, or perhaps better, the tip of a pen, as it moves in the plane.
2. Implicitization. Under reasonable hypotheses, it is possible to turn a portion of an implicit
curve into an explicit curve. Similarly, it should be possible to turn a portion of a parametric curve
into an explicit curve. It is often simpler to find an implicit equation satisfied by a parametric
curve. The process of finding an implicit equation is called implicitization.
Examples. A. For the parametric curve in Example A above, by the Pythagorean Theorem,
$$x(\theta)^2 + y(\theta)^2 = r^2\cos^2(\theta) + r^2\sin^2(\theta) = r^2(\cos^2(\theta) + \sin^2(\theta)) = r^2.$$
Thus the parametric equation satisfies the implicit equation,

$$x^2 + y^2 = r^2.$$

B. For the parametric curve in Example B,

$$(x(\theta)/a)^2 + (y(\theta)/b)^2 = \cos^2(\theta) + \sin^2(\theta) = 1.$$
Thus the parametric equation satisfies the implicit equation,

$$x^2/a^2 + y^2/b^2 = 1.$$

C. For the parametric curve in Example C, assuming $v_0\cos(\alpha)$ is nonzero, the equation for $x$ can be solved for $t$,
$$x = v_0\cos(\alpha)t + x_0 \iff t = \frac{x - x_0}{v_0\cos(\alpha)}.$$
This can then be substituted into the equation for $y$ to get an explicit equation for the curve,

$$y = -g(x - x_0)^2/(2v_0^2\cos^2(\alpha)) + \tan(\alpha)(x - x_0) + y_0.$$
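A quick way to test an implicitization is to generate points from the parametric form and check that they satisfy the eliminated equation. This is a minimal sketch for Example C; the function names and the numerical values of $x_0$, $y_0$, $v_0$, $\alpha$ and $g$ are illustrative assumptions.

```python
import math

def position(t, x0, y0, v0, alpha, g):
    """Parametric position of the projectile at time t."""
    x = v0 * math.cos(alpha) * t + x0
    y = -(g / 2) * t**2 + v0 * math.sin(alpha) * t + y0
    return x, y

def y_of_x(x, x0, y0, v0, alpha, g):
    """Explicit equation obtained by eliminating t (requires cos(alpha) != 0)."""
    return (-g * (x - x0) ** 2 / (2 * v0**2 * math.cos(alpha) ** 2)
            + math.tan(alpha) * (x - x0) + y0)

params = dict(x0=1.0, y0=2.0, v0=10.0, alpha=0.6, g=9.8)   # illustrative values
for t in [0.0, 0.5, 1.0, 1.5]:
    x, y = position(t, **params)
    print(t, y, y_of_x(x, **params))   # the last two columns agree up to rounding
```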

In going from a parametric equation to an implicit equation, there are 2 important warnings to
keep in mind:

• A parametric equation may traverse only part of the implicit curve. The most usual reason is that the parameter $t$ is restricted to a certain range. A closely related reason is that the functions of $t$ are themselves somehow limited, as in the parametric curve lying in the line $y = x$,
$$\begin{cases} x = \cos(t), \\ y = \cos(t) \end{cases}$$
which traces only the segment of that line with $-1 \le x \le 1$. A more interesting reason is that the implicit curve may have more than one connected piece, as in the parametric curve,
$$\begin{cases} x = 2t/(1 - t^2), \\ y = (1 + t^2)/(1 - t^2) \end{cases} \quad -1 < t < 1.$$

As $t$ varies, this parametric curve sweeps out the top branch of the hyperbola $y^2 - x^2 = 1$.

• A parametric equation may sweep out all or a portion of the implicit curve multiple times. This is clear in Examples A and B: as $\theta$ is allowed to vary over the interval $0 \le \theta < 2n\pi$, the parametric curve completes $n$ revolutions of the implicit curve.
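The hyperbola claim in the first warning can be checked exactly at rational parameter values. This is a minimal sketch using `fractions.Fraction`; `hyperbola_point` is an illustrative name. Since $y = (1 + t^2)/(1 - t^2) \ge 1$ for $-1 < t < 1$, every sampled point lands on the top branch, and the bottom branch $y \le -1$ is never reached.

```python
from fractions import Fraction

def hyperbola_point(t):
    """Point (x, y) of the parametric curve x = 2t/(1 - t^2), y = (1 + t^2)/(1 - t^2)."""
    return 2 * t / (1 - t**2), (1 + t**2) / (1 - t**2)

for k in range(-9, 10):
    t = Fraction(k, 10)           # sample parameters strictly inside -1 < t < 1
    x, y = hyperbola_point(t)
    assert y**2 - x**2 == 1       # lies on the hyperbola, exactly
    assert y >= 1                 # and on its top branch
print(hyperbola_point(Fraction(1, 2)))  # (Fraction(4, 3), Fraction(5, 3))
```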

3. Arc length. Given a segment of curve, what is the length of the curve? Imagining the curve made of some flexible extensible material like wire, what is the length when the wire is pulled taut? The answer is called the arc length, $s$.
The method for expressing arc length as an integral is by now familiar. Break the interval $a \le t \le b$ into a large number $n$ of subintervals with endpoints,

$$a = t_0 < t_1 < \cdots < t_{n-1} < t_n = b.$$

Approximate the curve on each subinterval $t_{k-1} \le t \le t_k$ by a line segment. The line segment runs from the point,
$$(x_{k-1}, y_{k-1}) = (x(t_{k-1}), y(t_{k-1})),$$
to the point,
$$(x_k, y_k) = (x(t_k), y(t_k)).$$
The run and rise of the line segment are,

$$\Delta x_k = x_k - x_{k-1} \approx x'(t_k)\Delta t_k,$$

$$\Delta y_k = y_k - y_{k-1} \approx y'(t_k)\Delta t_k.$$

By the Pythagorean theorem, the length of the line segment is,
$$\Delta s_k = \sqrt{(\Delta x_k)^2 + (\Delta y_k)^2} \approx \sqrt{(x'(t_k))^2 + (y'(t_k))^2}\,\Delta t_k.$$

The arc length of the curve is approximately the sum of the lengths of the approximating line segments,
$$s \approx \sum_{k=1}^{n} \sqrt{(x'(t_k))^2 + (y'(t_k))^2}\,\Delta t_k.$$

This is a Riemann sum. As the mesh of the partition tends to 0, the Riemann sums tend to a Riemann integral. This Riemann integral is the arc length,
$$\text{Arc length} = \int_{t=a}^{t=b} \sqrt{\left(\frac{dx}{dt}\right)^2 + \left(\frac{dy}{dt}\right)^2}\,dt.$$
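The derivation can be mirrored in code: sum the lengths of the approximating line segments, and separately evaluate a Riemann sum of the arc length integrand. This is a minimal sketch for a quarter circle; the function names, the radius, and the number of pieces are illustrative assumptions.

```python
import math

def polygon_length(x, y, a, b, n):
    """Sum of the lengths of the n line segments joining (x(t_k), y(t_k)) for equally spaced t_k."""
    pts = [(x(a + (b - a) * k / n), y(a + (b - a) * k / n)) for k in range(n + 1)]
    return sum(math.hypot(x2 - x1, y2 - y1) for (x1, y1), (x2, y2) in zip(pts, pts[1:]))

def arc_length(dxdt, dydt, a, b, n):
    """Midpoint Riemann sum for the integral of sqrt(x'(t)^2 + y'(t)^2) dt."""
    h = (b - a) / n
    return sum(math.hypot(dxdt(a + (k + 0.5) * h), dydt(a + (k + 0.5) * h)) for k in range(n)) * h

r = 2.0  # quarter circle of radius r, parametrized as in Example A with 0 <= theta <= pi/2
chords = polygon_length(lambda t: r * math.cos(t), lambda t: r * math.sin(t), 0.0, math.pi / 2, 1000)
integral = arc_length(lambda t: -r * math.sin(t), lambda t: r * math.cos(t), 0.0, math.pi / 2, 1000)
print(chords, integral, r * math.pi / 2)   # chords fall slightly short; both tend to r*pi/2
```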

Example 1. For the parametric curve in Example A above,

$$\frac{dx}{d\theta} = -r\sin(\theta), \qquad \frac{dy}{d\theta} = r\cos(\theta).$$
Therefore, the expression,
$$\left(\frac{ds}{d\theta}\right)^2 = \left(\frac{dx}{d\theta}\right)^2 + \left(\frac{dy}{d\theta}\right)^2,$$
equals,

$$(-r\sin(\theta))^2 + (r\cos(\theta))^2 = r^2\sin^2(\theta) + r^2\cos^2(\theta) = r^2(\sin^2(\theta) + \cos^2(\theta)) = r^2.$$

Taking square roots gives the equation,

$$\frac{ds}{d\theta} = r \iff ds = r\,d\theta.$$
Thus the arc length of the arc of the circle from $\theta = a$ to $\theta = b$ is,
$$s = \int ds = \int_{\theta=a}^{\theta=b} r\,d\theta = r(b - a).$$

This is, in fact, our definition of the angle: the angle $\theta$ subtended by an arc of a circle equals the ratio of the arc length to the radius of the circle. If this logic sounds circular, it is perhaps that nobody ever told you before how to define the length of the arc of a circle! It is also an argument in favor of the more natural definition of the angle as the ratio of the area of the sector of the circle to $r^2/2$.
Example 2. This is not a single example, but a class of examples. A curve in explicit form, $y = f(x)$, $a \le x \le b$, can always be put in parametric form,
$$\begin{cases} x = t, \\ y = f(t) \end{cases} \quad a \le t \le b.$$

Then,
$$\frac{dx}{dt} = 1, \qquad \frac{dy}{dt} = f'(t).$$
Using this,
$$ds = \sqrt{1 + (f'(t))^2}\,dt.$$
Thus the arc length is,
$$\text{Arc length} = \int_{t=a}^{t=b} \sqrt{1 + (f'(t))^2}\,dt.$$
Since the parameter $t$ in the Riemann integral is only a dummy variable anyway, it is allowed to replace it by the variable $x$ (so long as $x$ plays no other role in the integral, which it does not). This gives the formula for the arc length of an explicit curve,
$$\text{Arc length} = \int_a^b \sqrt{1 + (dy/dx)^2}\,dx.$$

Example 3. Consider the explicit curve,

$$y = \frac{x^2}{4} - \frac{1}{2}\ln(x), \quad a \le x \le b,$$
where $a$ is a positive real number. The derivative is,
$$\frac{dy}{dx} = \frac{1}{4}(2x) - \frac{1}{2}\cdot\frac{1}{x} = \frac{1}{2}(x - x^{-1}).$$
Thus the square of the derivative is,
$$\left(\frac{dy}{dx}\right)^2 = \frac{1}{4}(x^2 - 2 + x^{-2}).$$

Clearing denominators, $1 + (dy/dx)^2$ equals,

$$\frac{4}{4} + \frac{x^2 - 2 + x^{-2}}{4} = \frac{x^2 + 2 + x^{-2}}{4}.$$
It is easy to check this equals the square,
$$\left(\frac{x + x^{-1}}{2}\right)^2.$$

This gives the formula,
$$\sqrt{1 + \left(\frac{dy}{dx}\right)^2} = \frac{1}{2}(x + x^{-1}).$$

Therefore,
$$ds = \frac{1}{2}(x + x^{-1})\,dx.$$
Integrating gives the arc length,
$$s = \int ds = \int_{x=a}^{x=b} \frac{1}{2}(x + x^{-1})\,dx = \frac{1}{2}\left.\left(\frac{x^2}{2} + \ln(x)\right)\right|_a^b.$$

Evaluating gives the arc length,

$$\text{Arc length} = (b^2 - a^2)/4 + (1/2)\ln(b/a).$$
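The closed form can be compared against a direct numerical evaluation of $\int_a^b \sqrt{1 + (dy/dx)^2}\,dx$, without using the perfect-square simplification. This is a minimal sketch with a midpoint sum; `graph_arc_length` is an illustrative name and the endpoints $a = 1$, $b = 3$ are sample values.

```python
import math

def graph_arc_length(fprime, a, b, n=100_000):
    """Midpoint Riemann sum for the integral of sqrt(1 + f'(x)^2) dx over [a, b]."""
    h = (b - a) / n
    return sum(math.sqrt(1 + fprime(a + (k + 0.5) * h) ** 2) for k in range(n)) * h

fprime = lambda x: 0.5 * (x - 1 / x)   # derivative of y = x^2/4 - ln(x)/2
a, b = 1.0, 3.0                         # illustrative endpoints with a > 0
numeric = graph_arc_length(fprime, a, b)
closed_form = (b**2 - a**2) / 4 + 0.5 * math.log(b / a)
print(numeric, closed_form)
```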

Lecture 22. November 4, 2005

Homework. Problem Set 6 Part I: (f)–(h); Part II: Problem 2 (a) and (c).
Practice Problems. Course Reader: 4G1, 4G4, 4G6, 4H1, 4H3.
1. Surface area of a right circular cone. Before attacking the general problem of the surface area of a surface of revolution, consider the simplest case of the area of a right circular cone of base radius $R$ and height $H$. The slant height of the cone is the length of any line segment from the vertex to a point on the base circle. By the Pythagorean theorem, the slant height $S$ is,
$$S = \sqrt{R^2 + H^2}.$$

Imagine the cone is made of paper. Make an incision along a line segment from the vertex to the base circle. The resulting piece of paper may be unfolded to form a sector of a circle. The radius of the sector is the slant height $S$. The circumference of the sector is the circumference of the original base circle, $2\pi R$. The formulas for the area and circumference of a sector of a circle give the identity,
$$\text{Area of sector} = \frac{1}{2}(\text{Radius of sector}) \times (\text{Circumference of sector}).$$
Thus, the area of the cone equals,
$$A = \frac{1}{2}(S)(2\pi R) = \pi RS.$$
In particular, the height $H$ is involved only indirectly (as $S$ depends on $H$).
Next, consider a conical band obtained from a right circular cone of base radius $R_1$ and slant height $S_1$ by removing the top part of the cone of base radius $R_2$ and slant height $S_2$. In particular, the slant height of the band is the difference,

$$s = S_1 - S_2,$$

and the average radius of the band is the average of $R_1$ and $R_2$,

$$r = \frac{1}{2}(R_1 + R_2).$$

By similar triangles,
$$\frac{S_1}{R_1} = \frac{S_2}{R_2}.$$
Rearranging gives,
$$R_2S_1 = R_1S_2.$$
Using the formula above, the area of the large cone is,

$$A_1 = \pi R_1S_1,$$

and the area of the small cone is,

$$A_2 = \pi R_2S_2.$$
The area $A$ of the band is the difference,

$$A = A_1 - A_2 = \pi(R_1S_1 - R_2S_2).$$

Since $R_2S_1$ equals $R_1S_2$, the formula is unchanged by adding $\pi R_2S_1$ and subtracting $\pi R_1S_2$ to get,

$$A = \pi(R_1S_1 - R_2S_2) + \pi(R_2S_1 - R_1S_2) = \pi\left((R_1 + R_2)S_1 - (R_1 + R_2)S_2\right).$$

Simplifying and substituting $R_1 + R_2 = 2r$ and $S_1 - S_2 = s$ gives,

$$A = 2\pi rs.$$
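The band identity $A_1 - A_2 = 2\pi rs$ can be checked exactly for similar cones. This is a minimal sketch that divides every area by $\pi$ so the arithmetic stays rational with `fractions.Fraction`; the function names, the 3-4-5 cone, and the scale factors are illustrative assumptions.

```python
from fractions import Fraction

def band_area_over_pi(R1, S1, R2, S2):
    """Area of the band divided by pi, as A1 - A2 with A = pi * R * S for each cone."""
    return R1 * S1 - R2 * S2

def band_formula_over_pi(R1, S1, R2, S2):
    """Area of the band divided by pi, from A = 2*pi*r*s."""
    r = (R1 + R2) / 2   # average radius of the band
    s = S1 - S2         # slant height of the band
    return 2 * r * s

R1, S1 = Fraction(3), Fraction(5)           # large cone: R = 3, H = 4, S = 5 (illustrative)
for lam in (Fraction(1, 5), Fraction(2, 5), Fraction(3, 4)):
    R2, S2 = lam * R1, lam * S1             # similar smaller cone, so R2*S1 == R1*S2
    print(band_area_over_pi(R1, S1, R2, S2), band_formula_over_pi(R1, S1, R2, S2))
```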

2. Surface area of a surface of revolution. Given a segment of a parametric curve,

$$\begin{cases} x = x(t), \\ y = y(t) \end{cases} \quad a \le t \le b,$$

the surface of revolution is the surface obtained by revolving the segment through $xyz$-space about the $y$-axis. What is the area of this surface? The answer is called the surface area.
The method for computing the surface area is so close to the method for computing the arc length of the curve that the details will be skipped. What is relevant is the differential element of surface area. Given a small interval from $t$ to $t + dt$, approximate the segment of the parametric curve as a line segment. The surface obtained by revolving a line segment is precisely a band of a cone. The average radius $r$ of the band is $x(t)$. The slant height of the band is $ds$. Thus the area of the band is,
$$dA = 2\pi r\,ds = 2\pi x(t)\sqrt{\left(\frac{dx}{dt}\right)^2 + \left(\frac{dy}{dt}\right)^2}\,dt.$$


Integrating gives the formula for the surface area of the surface of revolution,
$$A = \int dA = \int_{t=a}^{t=b} 2\pi x(t)\sqrt{\left(\frac{dx}{dt}\right)^2 + \left(\frac{dy}{dt}\right)^2}\,dt.$$
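The integral can also be approximated numerically. The sketch below is an illustration only: the function name `surface_area_about_y_axis` and the use of the composite Simpson rule are assumptions, not part of the notes. It takes $x(t)$, $x'(t)$, $y'(t)$ and the interval $[a, b]$, and is tried here on the semicircle that revolves into a sphere of radius $R$.

```python
import math

def surface_area_about_y_axis(x, dx, dy, a, b, n=2000):
    # Composite Simpson rule for A = integral_a^b 2*pi*x(t)*sqrt(x'(t)^2 + y'(t)^2) dt.
    # Assumes x(t) >= 0 on [a, b] (x is the radius of revolution) and n even.
    if n % 2:
        raise ValueError("n must be even for Simpson's rule")
    h = (b - a) / n

    def integrand(t):
        return 2 * math.pi * x(t) * math.hypot(dx(t), dy(t))

    total = integrand(a) + integrand(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * integrand(a + k * h)
    return total * h / 3

# Semicircle x = R cos(t), y = R sin(t), -pi/2 <= t <= pi/2 (revolves to a sphere).
R = 2.0
A = surface_area_about_y_axis(
    x=lambda t: R * math.cos(t),
    dx=lambda t: -R * math.sin(t),
    dy=lambda t: R * math.cos(t),
    a=-math.pi / 2, b=math.pi / 2)
print(A, 4 * math.pi * R ** 2)
```

The same helper applied to the line segment from $(0, H)$ to $(R, 0)$, parametrized by $x = t$, reproduces the cone area $\pi R S$.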

Examples. A. Consider the line segment connecting the point $(0, H)$ to the point $(R, 0)$. This has equation,
$$y = \frac{H}{R}(R - x), \quad 0 \le x \le R.$$
The length of the line segment (the slant height of the cone it sweeps out) is,
$$S = \sqrt{R^2 + H^2},$$
and the differential arc length of the line segment is,
$$ds = \frac{S}{R}\,dx.$$
Thus the differential element of surface area is,
$$dA = 2\pi r\,ds = 2\pi x \frac{S}{R}\,dx.$$
Integrating gives,
$$A = \int dA = \int_{x=0}^{x=R} \frac{2\pi S}{R} x\,dx = \frac{2\pi S}{R} \left.\frac{x^2}{2}\right|_0^R = \pi R S.$$
This is the same formula obtained above by more elementary means.
B. Consider the parametrized semicircle of radius $R$ in the first and fourth quadrants,
$$x = R\cos(\theta), \quad y = R\sin(\theta), \qquad -\frac{\pi}{2} \le \theta \le \frac{\pi}{2}.$$
Revolving about the $y$-axis gives the sphere of radius $R$. Thus the surface area of the surface of revolution is the surface area of the sphere of radius $R$.
As computed in the previous lecture, the differential element of arc length is,
$$ds = R\,d\theta.$$
Thus the differential element of surface area is,
$$dA = 2\pi r\,ds = 2\pi x(\theta)(R\,d\theta) = 2\pi(R\cos(\theta))(R\,d\theta) = 2\pi R^2\cos(\theta)\,d\theta.$$
Integrating gives,
$$A = \int dA = \int_{\theta=-\pi/2}^{\theta=\pi/2} 2\pi R^2\cos(\theta)\,d\theta = 2\pi R^2\big(\sin(\theta)\big)\Big|_{-\pi/2}^{\pi/2}.$$


This evaluates to,
$$A = 2\pi R^2(2) = 4\pi R^2.$$
The fastest way to remember this is to observe that the surface area $A$ and the volume $V$ of a sphere of radius $R$ are related by,
$$A = 4\pi R^2 = \frac{dV}{dR} = \frac{d}{dR}\left(\frac{4\pi R^3}{3}\right).$$
C. An astroid is a curve,
$$x^{2/3} + y^{2/3} = a^{2/3}.$$
The part of the astroid in the first quadrant has parametric equation,
$$x = a\cos^3(t), \quad y = a\sin^3(t), \qquad 0 \le t \le \frac{\pi}{2}.$$
The derivatives are,
$$\frac{dx}{dt} = -3a\cos^2(t)\sin(t), \quad \frac{dy}{dt} = 3a\sin^2(t)\cos(t).$$
Thus,
$$\left(\frac{dx}{dt}\right)^2 + \left(\frac{dy}{dt}\right)^2 = (-3a\cos^2(t)\sin(t))^2 + (3a\sin^2(t)\cos(t))^2 = 9a^2\sin^2(t)\cos^2(t)(\cos^2(t) + \sin^2(t)).$$

The square root is (using $a > 0$ and $\sin(t), \cos(t) \ge 0$ for $0 \le t \le \pi/2$),
$$\sqrt{\left(\frac{dx}{dt}\right)^2 + \left(\frac{dy}{dt}\right)^2} = \sqrt{9a^2\sin^2(t)\cos^2(t)} = 3a\sin(t)\cos(t).$$
So the differential element of arc length is,
$$ds = 3a\sin(t)\cos(t)\,dt.$$
Thus the differential element of surface area of the surface of revolution is,
$$dA = 2\pi r\,ds = 2\pi x(t)\,ds = 2\pi(a\cos^3(t))(3a\sin(t)\cos(t))\,dt = 6\pi a^2\cos^4(t)\sin(t)\,dt.$$
Integrating, the surface area is,
$$A = \int dA = \int_{t=0}^{t=\pi/2} 6\pi a^2\cos^4(t)\sin(t)\,dt.$$

Substitute $u = \cos(t)$, $du = -\sin(t)\,dt$, $u(0) = 1$, $u(\pi/2) = 0$ to get,
$$A = \int_{u=1}^{u=0} 6\pi a^2 u^4(-du) = \int_{u=0}^{u=1} 6\pi a^2 u^4\,du.$$


Thus the surface area of the surface of revolution is,
$$A = 6\pi a^2\left.\frac{u^5}{5}\right|_0^1 = \frac{6\pi a^2}{5}.$$
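As a hedged numerical cross-check (assuming $a > 0$; the Simpson-rule helper `astroid_surface_area` is illustrative, not from the notes), the following Python integrates $2\pi x(t)\sqrt{x'(t)^2 + y'(t)^2}$ directly from the unsimplified parametrization and compares the result with $6\pi a^2/5$.

```python
import math

def astroid_surface_area(a, n=2000):
    # Simpson's rule for A = integral_0^{pi/2} 2*pi*x(t)*sqrt(x'(t)^2 + y'(t)^2) dt
    # with x = a cos^3 t, y = a sin^3 t, revolved about the y-axis.
    def integrand(t):
        x = a * math.cos(t) ** 3
        dx = -3 * a * math.cos(t) ** 2 * math.sin(t)
        dy = 3 * a * math.sin(t) ** 2 * math.cos(t)
        return 2 * math.pi * x * math.hypot(dx, dy)

    lo, hi = 0.0, math.pi / 2
    h = (hi - lo) / n
    total = integrand(lo) + integrand(hi)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * integrand(lo + k * h)
    return total * h / 3

a = 1.5
print(astroid_surface_area(a), 6 * math.pi * a ** 2 / 5)
```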

3. Polar coordinate curves. After the explicit, Cartesian form of a curve as a graph, $y = f(x)$, the next most common representation is using polar coordinates. Given a function $r = r(\theta)$ and an interval $a \le \theta \le b$, the associated polar coordinate curve is the parametric curve,
$$x = r(\theta)\cos(\theta), \quad y = r(\theta)\sin(\theta), \qquad a \le \theta \le b.$$

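For each point on the curve, the distance of the point from the origin is,
$$\text{Distance from origin} = \sqrt{x^2 + y^2} = \sqrt{r^2} = |r| = \begin{cases} +r, & r \ge 0, \\ -r, & r < 0. \end{cases}$$
Also, assuming the point does not equal the origin, the ray from the origin to the point has slope $y/x = \tan(\theta)$; more precisely, the angle of this ray is,
$$\text{Angle} = \begin{cases} \theta, & r > 0, \\ \theta + \pi, & r < 0, \end{cases}$$
up to adding multiples of $2\pi$.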
This is one of the most confusing aspects of polar curves. The symbols $r(\theta)$ and $\theta$ are ingrained in mathematical thinking as the distance and angle of a point in polar coordinates. But for a polar coordinate curve these are simply parameters. They are very closely related to, but often different from, the actual distance and angle. It helps to imagine that, when $r$ changes sign, the point passes through the origin along the line at angle $\theta$ and continues onto the opposite ray, the ray at angle $\theta + \pi$. In other words, the point “goes negative”.
Given a polar curve, it is often possible to find an implicit Cartesian curve containing the polar curve.
Examples. A. Let $a$ be a positive constant and consider the polar curve,
$$r(\theta) = a.$$
This gives,
$$r^2 = a^2 \iff x^2 + y^2 = a^2.$$
Thus the polar curve is contained in the circle of radius $a$.
B. Consider the polar curve,
$$r(\theta) = \frac{a}{\sin(\theta)}.$$
Multiplying both sides by $\sin(\theta)$ gives,
$$r\sin(\theta) = a \iff y = a.$$
Thus the polar curve is contained in the horizontal line passing through $(0, a)$.


C. Consider the polar curve,
$$r = 2a\cos(\theta).$$
Multiplying both sides by $r$ gives,
$$r^2 = 2ar\cos(\theta) \iff x^2 + y^2 = 2ax.$$
Simplifying this gives the equation,
$$(x - a)^2 + y^2 = a^2.$$
This is the equation of the circle of radius $a$ centered at $(a, 0)$.
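The containment in the circle $(x - a)^2 + y^2 = a^2$ holds even where $r = 2a\cos(\theta)$ “goes negative” ($\pi/2 < \theta < 3\pi/2$). The small Python sketch below is illustrative only (the value $a = 2$ and the sampling grid are arbitrary choices): it samples the curve, counts how many samples have $r < 0$, and records the largest deviation from the circle equation.

```python
import math

def polar_to_cartesian(r, theta):
    # A polar coordinate curve is the parametric curve x = r cos(theta), y = r sin(theta);
    # r is allowed to be negative.
    return r * math.cos(theta), r * math.sin(theta)

a = 2.0
max_error = 0.0
negative_samples = 0
for k in range(360):
    theta = 2 * math.pi * k / 360
    r = 2 * a * math.cos(theta)
    if r < 0:
        negative_samples += 1  # the point "goes negative"
    x, y = polar_to_cartesian(r, theta)
    max_error = max(max_error, abs((x - a) ** 2 + y ** 2 - a ** 2))

print(max_error, negative_samples)
```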

4. Sketching polar curves. Given a polar curve, how are we to sketch it? For definiteness, consider the polar curve,
$$r(\theta) = \cos(2\theta), \quad -\pi/4 \le \theta \le 7\pi/4.$$
This curve is called the four-leaved rose. A similar curve occurs in Part II, Problem 2 of Problem Set 6.
1. Find the range of $\theta$. In almost every case, this will be given. In this case, the range is given as $-\pi/4 \le \theta \le 7\pi/4$. In some cases, the range must be determined. For instance, to sketch only the “leaf” of the rose containing $(1, 0)$, first the endpoints of this leaf must be found.
2. Determine when $r$ is positive, zero or negative. This is easiest to keep track of with a table.
$$\begin{array}{c|c}
\theta & r \\ \hline
-\pi/4 & 0 \\
-\pi/4 < \theta < \pi/4 & r > 0 \\
\pi/4 & 0 \\
\pi/4 < \theta < 3\pi/4 & r < 0 \\
3\pi/4 & 0 \\
3\pi/4 < \theta < 5\pi/4 & r > 0 \\
5\pi/4 & 0 \\
5\pi/4 < \theta < 7\pi/4 & r < 0
\end{array}$$

The curve crosses the origin when θ = −π/4, π/4, 3π/4, and 5π/4. The curve “goes negative” when
π/4 < θ < 3π/4 and when 5π/4 < θ < 7π/4.
3. Find the extremal values of $|r|$. A local maximum of $|r|$ is either a point where $r$ is positive and a local maximum or a point where $r$ is negative and a local minimum. Similarly for local minima of $|r|$. Typically, local maxima of $|r|$ occur either at endpoints of the interval or at points where $r'(\theta)$ is zero (occasionally at discontinuity points, or non-differentiable points). Local minima occur at such points, but also occur every time the curve crosses the origin (so that $|r|$ equals 0).


In our example, the local minima are all points where $r = 0$, enumerated above. The derivative of $r$ is,
$$r'(\theta) = -2\sin(2\theta).$$
The critical points are $\theta = 0$, $\pi/2$, $\pi$ and $3\pi/2$. For $\theta = 0$ and $\theta = \pi$, $r$ is positive and maximum. For $\theta = \pi/2$ and $\theta = 3\pi/2$, $r$ is negative and minimum. Thus each critical point is a local maximum of $|r|$. The value of $|r|$ at each critical point is 1.
4. Find the asymptotes. This is a bit difficult with a polar curve. What is easier is to find a line parallel to an asymptote. Whenever,
$$\lim_{\theta \to a} r(\theta) = +\infty,$$
(or the same with a right-hand limit or left-hand limit), there is an asymptote parallel to the ray $\theta = a$. Whenever,
$$\lim_{\theta \to a} r(\theta) = -\infty,$$
there is an asymptote parallel to the ray $\theta = a + \pi$.
In our example, $r(\theta)$ never limits to $\pm\infty$. Thus there are no asymptotes. But in Example B., $r = a/\sin(\theta)$, $r$ tends to $+\infty$ as $\theta$ tends to 0 from above and as $\theta$ tends to $\pi$ from below. Thus there is an asymptote parallel to the $x$-axis. Since the explicit equation is $y = a$, which is a line parallel to the $x$-axis, this is correct.
5. Find the tangent direction at important points. This will be discussed further next time. The most important tangent directions are when the curve crosses the origin, and critical points of $r$. If $r(\theta) = 0$ and $r'(\theta) \ne 0$, the tangent line of the curve has angle $\theta$. If $r'(\theta) = 0$ and $r(\theta) \ne 0$, the tangent line has angle $\theta \pm \pi/2$, i.e., the tangent line is orthogonal to the radius through the point. Both of these are consequences of a more general formula. The angle $\psi$ between the tangent line and the radius satisfies,
$$\tan(\psi) = \frac{r(\theta)}{r'(\theta)}.$$
In the example, $r'$ is nonzero whenever $r$ is zero. Thus the tangent direction of the curve as it crosses the origin is just the direction of the limiting radius.
This is now ample information to sketch the four-leaved rose. Up to a rotation of $\pi/4$, the sketch is the same as in Figure 16.11 on p. 566 of the textbook (the sketch was also given in lecture).
Lecture 23. November 8, 2005
Homework. Problem Set 6 Part I: (i) and (j); Part II: Problem 2.
Practice Problems. Course Reader: 4I1, 4I4, 4I6.
1. Tangent lines to parametric curves. This short section was not explicitly discussed for
general parametric curves. It was discussed for polar curves, which are a special collection of
parametric curves.


Given a parametric curve,
$$x = f(t), \quad y = g(t),$$
what is the slope of the tangent line at $(f(a), g(a))$? The relevant differentials are,
$$dx = f'(t)\,dt, \quad dy = g'(t)\,dt.$$
If $f'(a)$ is nonzero, then the slope of the tangent line is,
$$\frac{dy}{dx}(a) = \left.\frac{g'(t)\,dt}{f'(t)\,dt}\right|_{t=a} = \frac{g'(a)}{f'(a)}.$$

In particular, for a function $r = r(\theta)$, the associated polar curve is,
$$x = r(\theta)\cos(\theta), \quad y = r(\theta)\sin(\theta).$$
Thus the differentials are,
$$dx = [r'(\theta)\cos(\theta) - r(\theta)\sin(\theta)]\,d\theta,$$
$$dy = [r'(\theta)\sin(\theta) + r(\theta)\cos(\theta)]\,d\theta.$$
Therefore the slope of the tangent line is,
$$\frac{dy}{dx} = \frac{r'(\theta)\sin(\theta) + r(\theta)\cos(\theta)}{r'(\theta)\cos(\theta) - r(\theta)\sin(\theta)}.$$
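To see the slope formula in action, here is a small Python sketch (illustrative; `polar_slope` and `finite_difference_slope` are hypothetical helper names, and $\theta = 0.3$ is an arbitrary test point). It evaluates the formula for the four-leaved rose $r = \cos(2\theta)$ and compares it with a central-difference estimate of $dy/dx$ along the curve.

```python
import math

def polar_slope(r, dr, theta):
    # dy/dx = (r' sin + r cos) / (r' cos - r sin) for x = r cos(theta), y = r sin(theta).
    num = dr(theta) * math.sin(theta) + r(theta) * math.cos(theta)
    den = dr(theta) * math.cos(theta) - r(theta) * math.sin(theta)
    return num / den

def finite_difference_slope(r, theta, h=1e-6):
    # Ratio of central differences of y(theta) and x(theta) approximates dy/dx.
    x = lambda t: r(t) * math.cos(t)
    y = lambda t: r(t) * math.sin(t)
    return (y(theta + h) - y(theta - h)) / (x(theta + h) - x(theta - h))

r = lambda t: math.cos(2 * t)
dr = lambda t: -2 * math.sin(2 * t)
slope_formula = polar_slope(r, dr, 0.3)
slope_numeric = finite_difference_slope(r, 0.3)
print(slope_formula, slope_numeric)
```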

2. Tangent lines for polar curves. Although the formula above is perfectly correct, it is a bit long to remember. There is a slightly different packaging that is much easier to remember. Define $\alpha$ to be the angle between the horizontal ray emanating from $(x(\theta), y(\theta))$ in the positive $x$-direction and the tangent line. To be precise, there are two such angles, differing by $\pi$. The defining equation for $\alpha$ is,
$$\tan(\alpha) = \frac{dy}{dx}.$$
And, of course,
$$\tan(\theta) = \frac{y}{x}.$$
Define $\psi$ to be the difference between $\alpha$ and $\theta$,
$$\psi = \alpha - \theta.$$
The angle addition and subtraction formulas for the tangent function are,
$$\tan(\phi_1 + \phi_2) = \frac{\tan(\phi_1) + \tan(\phi_2)}{1 - \tan(\phi_1)\tan(\phi_2)}, \qquad \tan(\phi_1 - \phi_2) = \frac{\tan(\phi_1) - \tan(\phi_2)}{1 + \tan(\phi_1)\tan(\phi_2)}.$$


Therefore,
$$\tan(\psi) = \tan(\alpha - \theta) = \frac{\tan(\alpha) - \tan(\theta)}{1 + \tan(\alpha)\tan(\theta)}.$$
Substituting in the equations for $\tan(\theta)$ and $\tan(\alpha)$ from above gives,
$$\tan(\psi) = \frac{(dy/dx) - (y/x)}{1 + (y/x)(dy/dx)}.$$
To simplify this, imagine multiplying both numerator and denominator by $x\,dx$ and manipulating formally,
$$\tan(\psi) = \frac{x\,dy - y\,dx}{x\,dx + y\,dy}.$$
The actual justification of this is a little more involved, but the formal manipulation leads to the correct equation.
To compute the denominator in the expression, differentiate both sides of,
$$r^2 = x^2 + y^2,$$
to get,
$$2r\,dr = 2x\,dx + 2y\,dy,$$
or equivalently,
$$x\,dx + y\,dy = r(\theta)r'(\theta)\,d\theta.$$
To compute the numerator in the expression, differentiate both sides of,
$$\tan(\theta) = \frac{y}{x},$$
to get,
$$\sec^2(\theta)\,d\theta = \frac{dy}{x} - \frac{y\,dx}{x^2} = \frac{1}{x^2}(x\,dy - y\,dx).$$
Now substitute $x = r\cos(\theta)$ in the denominator to get,
$$\sec^2(\theta)\,d\theta = \frac{1}{r^2\cos^2(\theta)}(x\,dy - y\,dx) = \frac{\sec^2(\theta)}{r^2}(x\,dy - y\,dx).$$
Cancelling $\sec^2(\theta)$ and multiplying both sides by $r^2$ gives,
$$x\,dy - y\,dx = r^2\,d\theta.$$
Thus the fraction for $\tan(\psi)$ is,
$$\tan(\psi) = \frac{x\,dy - y\,dx}{x\,dx + y\,dy} = \frac{r^2\,d\theta}{rr'\,d\theta}.$$


Simplifying gives,
$$\tan(\psi) = \frac{r(\theta)}{r'(\theta)}.$$
Example. Consider the cardioid, discussed in recitation,
$$r(\theta) = a(1 + \cos(\theta)).$$
The formula for $\psi$ is,
$$\tan(\psi) = \frac{r}{r'} = \frac{a(1 + \cos(\theta))}{-a\sin(\theta)} = \frac{1 + \cos(\theta)}{-\sin(\theta)}.$$
To simplify this, write $\theta = 2(\theta/2)$ and use the double-angle formulas to get,
$$\frac{1 + \cos(2(\theta/2))}{-\sin(2(\theta/2))} = \frac{1 + (\cos^2(\theta/2) - \sin^2(\theta/2))}{-2\sin(\theta/2)\cos(\theta/2)}.$$
Replacing $1 - \sin^2(\theta/2)$ in the numerator by $\cos^2(\theta/2)$, this simplifies to,
$$\frac{2\cos^2(\theta/2)}{-2\sin(\theta/2)\cos(\theta/2)} = -\cot(\theta/2).$$
Of course there is an identity,
$$-\cot(u) = \tan(u - \pi/2).$$
Altogether, this gives,
$$\tan(\psi) = -\cot(\theta/2) = \tan(\theta/2 - \pi/2).$$
Therefore, up to a multiple of $\pi$,
$$\psi = (\theta - \pi)/2.$$
Since $\alpha$ equals $\theta + \psi$, this gives,
$$\alpha = (3\theta - \pi)/2.$$
In particular, the angle of the tangent line to the cardioid at $\theta = \pi/2$ is $\alpha = \pi/4$.
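A quick numerical sanity check, sketched in Python under the assumption $a = 1$ (the helper `slope` and the sample angles are illustrative choices), compares the slope $dy/dx$ from the parametric formula for the cardioid with $\tan(\alpha)$ for $\alpha = (3\theta - \pi)/2$. Because $\tan$ has period $\pi$, the ambiguity in $\alpha$ by multiples of $\pi$ does not affect the comparison.

```python
import math

a = 1.0
r = lambda t: a * (1 + math.cos(t))
dr = lambda t: -a * math.sin(t)

def slope(theta):
    # dy/dx = (r' sin + r cos) / (r' cos - r sin) for the polar curve r(theta).
    num = dr(theta) * math.sin(theta) + r(theta) * math.cos(theta)
    den = dr(theta) * math.cos(theta) - r(theta) * math.sin(theta)
    return num / den

pairs = []
for theta in (0.4, math.pi / 2, 2.0):
    alpha = (3 * theta - math.pi) / 2
    pairs.append((slope(theta), math.tan(alpha)))
    print(theta, pairs[-1])
```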
3. Arc length in polar coordinates. As discussed previously, the formula for arc length of a parametric curve is,
$$ds = \sqrt{(dx/dt)^2 + (dy/dt)^2}\,dt.$$
In the case of a polar curve, this becomes a bit simpler. The differentials are,
$$dx = (r'(\theta)\cos(\theta) - r(\theta)\sin(\theta))\,d\theta,$$
$$dy = (r'(\theta)\sin(\theta) + r(\theta)\cos(\theta))\,d\theta.$$
Squaring gives,
$$(dx)^2 = ((r')^2\cos^2(\theta) - 2rr'\sin(\theta)\cos(\theta) + r^2\sin^2(\theta))(d\theta)^2,$$
$$(dy)^2 = ((r')^2\sin^2(\theta) + 2rr'\sin(\theta)\cos(\theta) + r^2\cos^2(\theta))(d\theta)^2.$$


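Adding these two equations, the cross terms cancel and $\cos^2(\theta) + \sin^2(\theta) = 1$, which gives,
$$(dx)^2 + (dy)^2 = [(r')^2 + r^2](d\theta)^2.$$
Taking square roots gives the differential element of arc length for a polar curve,
$$ds = \sqrt{[r'(\theta)]^2 + [r(\theta)]^2}\,d\theta.$$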

Example. For the cardioid,
$$r(\theta) = a(1 + \cos(\theta)),$$
the derivative is,
$$r'(\theta) = -a\sin(\theta).$$
Thus,
$$(r')^2 + r^2 = (-a\sin(\theta))^2 + a^2(1 + \cos(\theta))^2 = a^2\sin^2(\theta) + a^2(1 + 2\cos(\theta) + \cos^2(\theta)).$$
This simplifies to,
$$2a^2(1 + \cos(\theta)).$$
To simplify this further, write $\theta = 2(\theta/2)$ and use the double-angle formula to get,
$$2a^2(1 + \cos(2(\theta/2))) = 2a^2(1 + \cos^2(\theta/2) - \sin^2(\theta/2)) = 2a^2(2\cos^2(\theta/2)) = 4a^2\cos^2(\theta/2).$$
Taking square roots gives,
$$ds = 2a\cos(\theta/2)\,d\theta.$$
Note, this answer is only correct for $-\pi \le \theta \le \pi$, where $\cos(\theta/2) \ge 0$. Outside this range, we might have to take the other square root to get a positive number. In particular, the total arc length of the cardioid is,
$$s = \int ds = \int_{\theta=-\pi}^{\theta=\pi} 2a\cos(\theta/2)\,d\theta = 2a\big(2\sin(\theta/2)\big)\Big|_{-\pi}^{\pi} = 2a\big((2) - (-2)\big).$$
Simplifying, the total arc length of the cardioid is,
$$s = 8a.$$
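The polar arc-length element can be checked numerically. The Python below is a sketch (the helper name `polar_arc_length`, the Simpson rule, and $a = 1$ are assumptions for illustration): it integrates $\sqrt{r'(\theta)^2 + r(\theta)^2}$ for the cardioid over $-\pi \le \theta \le \pi$ and should land very close to $8a$.

```python
import math

def polar_arc_length(r, dr, lo, hi, n=2000):
    # Composite Simpson rule for s = integral_lo^hi sqrt(r'(theta)^2 + r(theta)^2) d theta.
    f = lambda t: math.hypot(dr(t), r(t))
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(lo + k * h)
    return total * h / 3

a = 1.0
s = polar_arc_length(lambda t: a * (1 + math.cos(t)),
                     lambda t: -a * math.sin(t),
                     -math.pi, math.pi)
print(s, 8 * a)
```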

Surface areas of surfaces of revolution can be computed in a similar way. This was only briefly discussed in lecture. Here is a continuation of the previous problem.
Example. The top half of the cardioid,
$$r(\theta) = a(1 + \cos(\theta)), \quad 0 \le \theta \le \pi,$$
is revolved about the $x$-axis to give a fairly good approximation of the surface of an apple. What is the surface area of this apple?


Since we are revolving about the $x$-axis, the radius of each slice is $y$. Therefore the differential element of surface area is,
$$dA = 2\pi y\,ds.$$
Substituting in $y = r(\theta)\sin(\theta) = a(1 + \cos(\theta))\sin(\theta)$, and substituting in for $ds$ gives,
$$dA = 2\pi[a(1 + \cos(\theta))\sin(\theta)](2a\cos(\theta/2)\,d\theta).$$
To simplify this, substitute both,
$$1 + \cos(\theta) = 2\cos^2(\theta/2),$$
and,
$$\sin(\theta) = 2\sin(\theta/2)\cos(\theta/2),$$
to get,
$$dA = 4\pi a^2(2\cos^2(\theta/2))(2\sin(\theta/2)\cos(\theta/2))\cos(\theta/2)\,d\theta = 16\pi a^2\cos^4(\theta/2)\sin(\theta/2)\,d\theta.$$
Thus the total surface area is,
$$A = \int dA = \int_{\theta=0}^{\pi} 16\pi a^2\cos^4(\theta/2)\sin(\theta/2)\,d\theta.$$
To evaluate this integral, substitute,
$$u = \cos(\theta/2), \quad du = -\tfrac{1}{2}\sin(\theta/2)\,d\theta, \quad u(0) = 1, \quad u(\pi) = 0.$$
The new integral is,
$$A = \int_{u=1}^{u=0} 16\pi a^2 u^4(-2\,du) = \int_{u=0}^{u=1} 32\pi a^2 u^4\,du = 32\pi a^2\left.\frac{u^5}{5}\right|_0^1.$$
This evaluates to give the total surface area of the apple,
$$A = 32\pi a^2/5.$$
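As a hedged check of the apple computation (assuming $a = 1$; the Simpson-rule code is illustrative, not from the notes), the following Python integrates $dA = 2\pi y\,ds$ with $y = r(\theta)\sin(\theta)$ and $ds = \sqrt{r'^2 + r^2}\,d\theta$ over $0 \le \theta \le \pi$, without using the half-angle simplifications.

```python
import math

a = 1.0
r = lambda t: a * (1 + math.cos(t))
dr = lambda t: -a * math.sin(t)

def integrand(t):
    # dA / d theta = 2*pi*y*sqrt(r'^2 + r^2) with y = r(theta) sin(theta).
    y = r(t) * math.sin(t)
    return 2 * math.pi * y * math.hypot(dr(t), r(t))

n = 2000                      # even number of Simpson subintervals
h = math.pi / n
A = integrand(0.0) + integrand(math.pi)
for k in range(1, n):
    A += (4 if k % 2 else 2) * integrand(k * h)
A *= h / 3
print(A, 32 * math.pi * a ** 2 / 5)
```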

5. Area of a region enclosed by a polar curve. What is the area of the planar region enclosed by a cardioid? By the same sort of reasoning as for volumes and arc lengths, the differential element of area of the thin, nearly triangular region bounded by the rays $\theta$, $\theta + d\theta$ and the curve $r(\theta)$ (approximately a circular sector of radius $|r(\theta)|$ and angle $d\theta$) is,
$$dA = \frac{r(\theta)^2}{2}\,d\theta.$$
Thus the area enclosed by a polar curve is,
$$A = \int dA = \int_{\theta=a}^{\theta=b} \frac{r(\theta)^2}{2}\,d\theta.$$


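In particular, the area enclosed by the cardioid is,
$$A = \int_0^{2\pi} \frac{a^2(1 + \cos(\theta))^2}{2}\,d\theta.$$
This expands to give,
$$\frac{a^2}{2}\int_0^{2\pi} \big(1 + 2\cos(\theta) + \cos^2(\theta)\big)\,d\theta.$$
To simplify the last part of the integrand, substitute,
$$\cos^2(\theta) = \frac{1 + \cos(2\theta)}{2},$$
to get,
$$\frac{a^2}{2}\int_0^{2\pi} \left(1 + 2\cos(\theta) + \frac{1 + \cos(2\theta)}{2}\right)d\theta = \frac{a^2}{4}\int_0^{2\pi} \big(3 + 4\cos(\theta) + \cos(2\theta)\big)\,d\theta.$$
Using the Fundamental Theorem of Calculus, this equals,
$$\frac{a^2}{4}\left(3\theta + 4\sin(\theta) + \frac{1}{2}\sin(2\theta)\right)\Big|_0^{2\pi}.$$
Evaluating gives,
$$A = 3\pi a^2/2.$$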
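The area formula can also be sanity-checked numerically. The sketch below (assuming $a = 1$; the helper `polar_area` is a hypothetical name) applies the trapezoid rule to $\frac{1}{2}r(\theta)^2$ over one full period. For a smooth periodic integrand over a full period, the trapezoid rule is already extremely accurate, which is why it is chosen here instead of Simpson's rule.

```python
import math

def polar_area(r, lo, hi, n=1000):
    # Trapezoid rule for A = integral_lo^hi (1/2) r(theta)^2 d theta.
    h = (hi - lo) / n
    f = lambda t: 0.5 * r(t) ** 2
    interior = sum(f(lo + k * h) for k in range(1, n))
    return h * (0.5 * f(lo) + interior + 0.5 * f(hi))

a = 1.0
A = polar_area(lambda t: a * (1 + math.cos(t)), 0.0, 2 * math.pi)
print(A, 3 * math.pi * a ** 2 / 2)
```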

Lecture 24. November 15, 2005

Practice Problems. Course Reader: 5A1, 5A2, 5A3, 5A5, 5A6.
1. Inverse functions. Let $a$, $b$, $s$ and $t$ be constants. Let $y = f(x)$ be a function defined on the interval,
$$a \le x \le b,$$
and whose values are in the interval,
$$s \le y \le t.$$
Does there exist a function $x = g(y)$ defined on the interval,
$$s \le y \le t,$$
whose values are in the interval,
$$a \le x \le b,$$
satisfying the two conditions,
$$g(f(x)) = x, \quad f(g(y)) = y\ ?$$
If such a function $g$ exists, it is called an inverse function of $f$, and it is denoted by $f^{-1}(y)$. Also, the original function $f(x)$ is called invertible. There is some chance of confusion with the other use of “invertible”, namely that $1/f(x)$ is always defined. We will be careful to specify the meaning of “invertible”.
There are 2 necessary conditions for $f$ to have an inverse function. Assume $f$ has an inverse function $g$. Let $x_1$, $x_2$ be a pair of numbers in $[a, b]$. If $f(x_1)$ equals $f(x_2)$, then also,
$$x_1 = g(f(x_1)) = g(f(x_2)) = x_2,$$
i.e., $x_1$ equals $x_2$. In other words, two distinct inputs $x_1$ and $x_2$ give two distinct outputs $f(x_1)$ and $f(x_2)$. A function satisfying this condition is called one-to-one, because to every output, there is at most one input. This is the first necessary condition: every invertible function is one-to-one.
Next, for every number $y$ in $[s, t]$, there is a number $x$ in $[a, b]$ such that $y = f(x)$. In fact, just take $x$ to be $g(y)$; then $f(x)$ equals $f(g(y))$, which equals $y$. A function satisfying this condition is called onto. This is the second necessary condition: every invertible function is onto.
Together, this says that an invertible function is one-to-one and onto. In fact, the converse is also true: every one-to-one and onto function is invertible. This is easy to check, but we will not prove it in this class.
Remark: In checking that $f$ is one-to-one and onto, the choice of intervals $[a, b]$ and $[s, t]$ is vital. A simple example comes from $f(x) = \sin(x)$. For the intervals $[-\pi/2, \pi/2]$ and $[-1, 1]$, the function $f(x)$ is one-to-one and onto. But for many other choices of these intervals, the function is neither one-to-one nor onto.
2. The graph of an inverse function. How should we think of an inverse function? One way is graphically. The graph of the function $y = f^{-1}(x)$ is the same as the graph of $f(y) = x$. This is simply the usual graph of $y = f(x)$ with the roles of $x$ and $y$ reversed. What this translates to is, the graph of $f^{-1}$ is the same as the graph of $f$ with the roles of the $x$-axis and $y$-axis reversed. The simplest way to get the graph of $f^{-1}(x)$ is simply to reflect the graph of $f(x)$ through the 45° line $y = x$.
3. The inverse trigonometric functions. The function $\sin(x)$ is one-to-one and onto on $[-\pi/2, \pi/2]$, taking values in $[-1, 1]$. Thus there is an inverse function $\sin^{-1}(x)$ defined on the interval $[-1, 1]$, taking values in $[-\pi/2, \pi/2]$. The graph of $\sin^{-1}(x)$ is increasing, with lower left endpoint $(-1, -\pi/2)$ and upper right endpoint $(1, \pi/2)$.
The function $\cos(x)$ is one-to-one and onto on $[0, \pi]$, taking values in $[-1, 1]$. Thus there is an inverse function $\cos^{-1}(x)$ defined on the interval $[-1, 1]$, taking values in $[0, \pi]$. The graph of $\cos^{-1}(x)$ is decreasing, with upper left endpoint $(-1, \pi)$ and lower right endpoint $(1, 0)$.
The function $\tan(x)$ is one-to-one and onto on $(-\pi/2, \pi/2)$, taking values in the whole real line. Thus there is an inverse function $\tan^{-1}(x)$ defined on the whole real line, taking values in $(-\pi/2, \pi/2)$. The graph is increasing and is asymptotic to the line $y = -\pi/2$ as $x \to -\infty$, and asymptotic to the line $y = +\pi/2$ as $x \to +\infty$.
4. Derivatives of inverse functions. A particularly simple formulation of the chain rule is the differential formulation,
$$df(u) = f'(u)\,du.$$


If $f$ has an inverse function $g(x)$, let $u$ be $g(x)$. Then this gives,
$$df(g(x)) = f'(g(x))\,dg(x).$$
On the other hand, $f(g(x))$ equals $x$. This gives the formula,
$$dx = f'(g(x))\,dg(x).$$
Solving for $dg/dx$ gives,
$$\frac{d}{dx}(g(x)) = \frac{1}{f'(g(x))}.$$
This is the formula for the derivative of an inverse function.
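The formula can be tested without knowing $g$ in closed form. In the Python sketch below, $f(x) = x^3 + x$ is a hypothetical example chosen because it is strictly increasing (hence one-to-one), and `g` is a numerical inverse computed by bisection. A central difference of `g` is then compared with $1/f'(g(x))$.

```python
def f(x):
    # Hypothetical example: strictly increasing, hence one-to-one on [-10, 10].
    return x ** 3 + x

def f_prime(x):
    return 3 * x ** 2 + 1

def g(y, lo=-10.0, hi=10.0, tol=1e-14):
    # Numerical inverse of f by bisection; assumes f(lo) <= y <= f(hi).
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < y:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

x, h = 2.5, 1e-5
numeric = (g(x + h) - g(x - h)) / (2 * h)  # central difference of g at x
formula = 1 / f_prime(g(x))                # 1 / f'(g(x))
print(numeric, formula)
```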

In fact, we have seen this formula before. It is how we computed the derivative of $\ln(x)$, the inverse function of $e^x$:
$$\frac{d}{dx}(\ln(x)) = \frac{1}{e^{\ln(x)}} = \frac{1}{x}.$$
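5. Derivatives of the inverse trigonometric functions. Because the derivative of $\sin(x)$ is $\cos(x)$, the formula above gives,
$$\frac{d}{dx}(\sin^{-1}(x)) = \frac{1}{\cos(\sin^{-1}(x))}.$$
This isn't very useful as it stands. A simple argument makes it much more useful. Denote $\sin^{-1}(x)$ by $\theta$, so $\sin(\theta) = x$. In terms of $\theta$, the formula for the derivative is a bit simpler,
$$\frac{d}{dx}(\sin^{-1}(x)) = \frac{1}{\cos(\theta)}.$$
By the Pythagorean theorem,
$$\sin^2(\theta) + \cos^2(\theta) = 1.$$
Since $\theta$ lies in $[-\pi/2, \pi/2]$, $\cos(\theta) \ge 0$, so the positive square root is the correct one:
$$\cos(\theta) = \sqrt{1 - \sin^2(\theta)} = \sqrt{1 - x^2}.$$
This gives a very useful formula for the derivative of $\sin^{-1}(x)$,
$$\frac{d}{dx}(\sin^{-1}(x)) = \frac{1}{\sqrt{1 - x^2}}.$$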
dx

A very similar derivation shows that,
$$\frac{d}{dx}(\cos^{-1}(x)) = \frac{-1}{\sqrt{1 - x^2}}.$$


This looks remarkably similar to the previous formula. In particular, this gives,
$$\frac{d}{dx}(\sin^{-1}(x) + \cos^{-1}(x)) = \frac{1}{\sqrt{1 - x^2}} + \frac{-1}{\sqrt{1 - x^2}} = 0.$$
Therefore the sum is a constant function. Checking at $x = 0$, where $\sin^{-1}(0) = 0$ and $\cos^{-1}(0) = \pi/2$, gives the value of this constant function,
$$\sin^{-1}(x) + \cos^{-1}(x) = \pi/2.$$
Finally, because the derivative of $\tan(x)$ is $\sec^2(x)$, the formula gives,
$$\frac{d}{dx}(\tan^{-1}(x)) = \frac{1}{\sec^2(\tan^{-1}(x))}.$$
Again introduce $\theta = \tan^{-1}(x)$. Then the formula for the derivative is,
$$\frac{d}{dx}(\tan^{-1}(x)) = \frac{1}{\sec^2(\theta)}.$$
But the Pythagorean theorem implies,
$$\sec^2(\theta) = 1 + \tan^2(\theta) = 1 + x^2.$$
This finally gives a very useful formula for the derivative of $\tan^{-1}(x)$,
$$\frac{d}{dx}(\tan^{-1}(x)) = \frac{1}{1 + x^2}.$$
Notice, in particular, that the denominator is never zero. This is closely related to the fact that $\tan^{-1}(x)$ is defined on the entire real line.
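The three derivative formulas and the identity $\sin^{-1}(x) + \cos^{-1}(x) = \pi/2$ can be spot-checked with Python's standard `math.asin`, `math.acos` and `math.atan`. This is a sketch: the helper `central_diff` and the sample points are illustrative choices.

```python
import math

def central_diff(func, x, h=1e-6):
    # Symmetric difference quotient approximating func'(x).
    return (func(x + h) - func(x - h)) / (2 * h)

rows = []
for x in (-0.7, 0.0, 0.3, 0.9):
    rows.append((
        central_diff(math.asin, x), 1 / math.sqrt(1 - x * x),
        central_diff(math.acos, x), -1 / math.sqrt(1 - x * x),
        math.asin(x) + math.acos(x),  # should equal pi/2
    ))
    print(x, rows[-1])

atan_pair = (central_diff(math.atan, 5.0), 1 / (1 + 5.0 ** 2))
print(atan_pair)
```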
6. Hyperbolic trigonometric functions. The trigonometric functions are very useful for discussing points on the unit circle $x^2 + y^2 = 1$, because the circle is the parametric curve,
$$x = \cos(\theta), \quad y = \sin(\theta).$$
Are there analogous continuous functions for the points on the hyperbola $x^2 - y^2 = 1$?
At first blush, the answer is no. The problem is that the hyperbola has two parts: one part is in the half-plane where $x > 0$, and the other part is in the half-plane where $x < 0$. Because of the intermediate value theorem, a continuous function $x = f(t)$ cannot jump from $x > 0$ to $x < 0$ or vice versa without crossing $x = 0$, and the hyperbola has no points with $x = 0$. Thus, refine the question: Are there continuous functions for the part of the hyperbola in the half-plane where $x > 0$?
The answer to this question is yes. The corresponding functions are called hyperbolic trigonometric functions or, more often, simply hyperbolic functions. They are defined as follows,
$$\cosh(t) = \frac{1}{2}(e^t + e^{-t}),$$


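$$\sinh(t) = \frac{1}{2}(e^t - e^{-t}),$$
$$\tanh(t) = \frac{\sinh(t)}{\cosh(t)} = \frac{e^t - e^{-t}}{e^t + e^{-t}},$$
$$\operatorname{sech}(t) = \frac{1}{\cosh(t)} = \frac{2}{e^t + e^{-t}},$$
$$\operatorname{csch}(t) = \frac{1}{\sinh(t)} = \frac{2}{e^t - e^{-t}},$$
and,
$$\coth(t) = \frac{1}{\tanh(t)} = \frac{\cosh(t)}{\sinh(t)} = \frac{e^t + e^{-t}}{e^t - e^{-t}}.$$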
The first observation is that,
$$\cosh^2(t) = \frac{1}{4}(e^t + e^{-t})^2 = \frac{1}{4}(e^{2t} + 2 + e^{-2t}),$$
$$\sinh^2(t) = \frac{1}{4}(e^t - e^{-t})^2 = \frac{1}{4}(e^{2t} - 2 + e^{-2t}).$$
Taking the difference of these, most of the terms cancel,
$$\cosh^2(t) - \sinh^2(t) = \frac{1}{4}((2) - (-2)) = \frac{4}{4} = 1.$$
This proves that the parametric curve,
$$x = \cosh(t), \quad y = \sinh(t)$$
is contained in the right half of the hyperbola $x^2 - y^2 = 1$ (since $\cosh(t) > 0$ for every $t$). We will see next time that there is an inverse function of $\sinh(t)$, from which it follows that every point in the right half of the hyperbola occurs for exactly one value of $t$. Thus the parametric curve exactly traces out the right half of the hyperbola.
7. The derivatives of the hyperbolic functions. The derivatives of the hyperbolic functions are straightforward. The formulas are very similar to the formulas in the trigonometric case, but slightly different. Try not to confuse them.
$$\frac{d}{dx}(\sinh(x)) = \cosh(x).$$
$$\frac{d}{dx}(\cosh(x)) = \sinh(x).$$
$$\frac{d}{dx}(\tanh(x)) = \frac{1}{\cosh^2(x)}(\cosh(x)\cdot\cosh(x) - \sinh(x)\cdot\sinh(x)) = \frac{1}{\cosh^2(x)} = \operatorname{sech}^2(x).$$


Lecture 25. November 17, 2005

Homework. Problem Set 7 Part I: (a)–(e)

Practice Problems. Course Reader: 5D2, 5D6, 5D7, 5D10, 5D14

1. Inverse hyperbolic functions. There are a few other useful formulas for hyperbolic functions; for instance, the analogues of the angle-addition formulas,
$$\sinh(s + t) = \sinh(s)\cosh(t) + \cosh(s)\sinh(t),$$
$$\cosh(s + t) = \cosh(s)\cosh(t) + \sinh(s)\sinh(t).$$
These imply the double-angle formulas,
$$\sinh(2t) = 2\sinh(t)\cosh(t),$$
$$\cosh(2t) = \cosh^2(t) + \sinh^2(t) = 2\cosh^2(t) - 1 = 2\sinh^2(t) + 1.$$
From these follow the analogues of the half-angle formulas,
$$\sinh^2(t/2) = \frac{1}{2}(\cosh(t) - 1),$$
$$\cosh^2(t/2) = \frac{1}{2}(\cosh(t) + 1).$$
A beautiful feature of hyperbolic functions is that their inverse functions can be expressed in terms of simpler functions. The inverse function $\sinh^{-1}(x)$ of $\sinh(x)$ is defined on the whole real line. By definition,
$$\sinh^{-1}(x) = y \text{ if and only if } \sinh(y) = x.$$
This second equation can be written out as,
$$\frac{1}{2}(e^y - e^{-y}) = x.$$
Substituting $z = e^y$ gives,
$$\frac{1}{2}(z - z^{-1}) = x.$$
Multiplying both sides by $2z$ gives,
$$z^2 - 1 = 2xz \iff z^2 - 2xz - 1 = 0.$$
Completing the square gives,
$$(z - x)^2 = x^2 + 1.$$
Taking square roots gives,
$$z = x \pm \sqrt{x^2 + 1}.$$

Since $z$ equals $e^y$, $z$ is positive. Because $\sqrt{x^2 + 1} > |x|$, the root $x - \sqrt{x^2 + 1}$ is negative. Thus, the correct square root is,
$$z = x + \sqrt{x^2 + 1}.$$
Finally this gives,
$$y = \ln(z) = \ln(x + \sqrt{x^2 + 1}).$$
Therefore, the formula for the inverse hyperbolic sine is,
$$\sinh^{-1}(x) = \ln(x + \sqrt{x^2 + 1}).$$
The same type of argument also gives,
$$\cosh^{-1}(x) = \ln(x + \sqrt{x^2 - 1}), \quad x \ge 1,$$
and
$$\tanh^{-1}(x) = \frac{1}{2}\ln\left(\frac{1 + x}{1 - x}\right), \quad -1 < x < 1.$$
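The three logarithm formulas can be compared against Python's built-in `math.asinh`, `math.acosh` and `math.atanh`. The sketch below is illustrative (the function names ending in `_log` are hypothetical), and the sample points respect the stated domains.

```python
import math

def asinh_log(x):
    return math.log(x + math.sqrt(x * x + 1))

def acosh_log(x):
    # Valid for x >= 1.
    return math.log(x + math.sqrt(x * x - 1))

def atanh_log(x):
    # Valid for -1 < x < 1.
    return 0.5 * math.log((1 + x) / (1 - x))

for x in (0.5, 2.0, 10.0):
    print(asinh_log(x), math.asinh(x), acosh_log(1 + x), math.acosh(1 + x))
print(atanh_log(0.25), math.atanh(0.25))
```

One design caveat: for large negative $x$, the sum $x + \sqrt{x^2 + 1}$ suffers cancellation in floating point, so the library routines are preferable in practice; the formulas above are exact as mathematics.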

2. Derivatives of the inverse hyperbolic functions. By the same methods used to compute the derivatives of inverse trigonometric functions, the derivatives of the inverse hyperbolic functions are,
$$d\sinh^{-1}(u) = \frac{du}{\sqrt{1 + u^2}},$$
$$d\cosh^{-1}(u) = \frac{du}{\sqrt{u^2 - 1}}, \quad u > 1,$$
$$d\tanh^{-1}(u) = \frac{du}{1 - u^2}, \quad -1 < u < 1.$$
These can also be computed using the formulas for the inverse functions.
3. Inverse substitution. The derivatives of inverse trigonometric and inverse hyperbolic functions allow us to compute more antiderivatives than before, e.g., $\int dx/\sqrt{x^2 - 1}$ equals $\cosh^{-1}(x) + C$ (for $x > 1$).
Essentially this comes down to making a direct substitution of an inverse function, e.g., $u = \cosh^{-1}(x)$. However, this is logically equivalent to making an inverse substitution, $x = \cosh(u)$. When the integrand is more complicated, inverse substitution is usually simpler and faster than direct substitution of an inverse function.
Example. Compute the following antiderivative,
$$\int \sqrt{a^2 - x^2}\,dx.$$
This is not quite the derivative of an inverse function above. However, it is clear that inverse substituting $x = a\sin(\theta)$ will simplify the integrand, because
$$a^2 - x^2 = a^2 - (a\sin(\theta))^2 = a^2(1 - \sin^2(\theta)) = a^2\cos^2(\theta).$$


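Thus we have (taking $a > 0$ and $-\pi/2 \le \theta \le \pi/2$, so that $\cos(\theta) \ge 0$),
$$\int \sqrt{a^2 - x^2}\,dx, \quad \begin{cases} x = a\sin(\theta), \\ dx = a\cos(\theta)\,d\theta \end{cases} \implies \int \sqrt{a^2\cos^2(\theta)}\,(a\cos(\theta)\,d\theta) = a^2\int \cos^2(\theta)\,d\theta.$$
Using the half-angle formula, this becomes,
$$a^2\int \left(\frac{1}{2} + \frac{1}{2}\cos(2\theta)\right)d\theta = a^2\left(\frac{\theta}{2} + \frac{1}{4}\sin(2\theta)\right) + C.$$
Using the double-angle formula $\sin(2\theta) = 2\sin(\theta)\cos(\theta) = 2(x/a)\sqrt{a^2 - x^2}/a$ and back-substituting gives,
$$\int \sqrt{a^2 - x^2}\,dx = \frac{1}{2}\left(a^2\sin^{-1}(x/a) + x\sqrt{a^2 - x^2}\right) + C.$$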
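A hedged numerical check of this antiderivative (assuming $a > 0$; the value $a = 3$ and test point are arbitrary): the definite integral from $0$ to $a$ should be the area $\pi a^2/4$ of a quarter disk, and a central difference of $F$ should reproduce $\sqrt{a^2 - x^2}$.

```python
import math

def F(x, a):
    # Antiderivative of sqrt(a^2 - x^2) for -a <= x <= a (constant C dropped).
    return 0.5 * (a * a * math.asin(x / a) + x * math.sqrt(a * a - x * x))

a = 3.0
quarter_disk = F(a, a) - F(0.0, a)  # integral of sqrt(a^2 - x^2) from 0 to a
print(quarter_disk, math.pi * a * a / 4)

x, h = 1.2, 1e-6
derivative = (F(x + h, a) - F(x - h, a)) / (2 * h)
print(derivative, math.sqrt(a * a - x * x))
```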

4. Three different kinds of integrals, three kinds of inverse substitution. The type of antiderivative where inverse substitution is most successful has the form,
$$\int \frac{F(x, \sqrt{Ax^2 + Bx + C})}{G(x, \sqrt{Ax^2 + Bx + C})}\,dx,$$
where $A$, $B$ and $C$ are constants, and $F(x, y)$ and $G(x, y)$ are polynomial functions in the two arguments. Inverse substitution together with partial fractions solves all such antiderivative problems.
The first step is to complete the square of the expression $Ax^2 + Bx + C$. This gives,
$$Ax^2 + Bx + C = A\left(x + \frac{B}{2A}\right)^2 - \frac{B^2 - 4AC}{4A}.$$
In particular, making the substitution,
$$u = x + \frac{B}{2A}, \quad du = dx,$$
transforms the quadratic into one of 3 possible types,
$$\beta^2u^2 + \alpha^2, \quad \beta^2u^2 - \alpha^2, \quad -\beta^2u^2 + \alpha^2,$$
where,
$$\beta = \sqrt{|A|}, \quad \alpha = \sqrt{\frac{|B^2 - 4AC|}{|4A|}}.$$
Defining $a = \alpha/\beta$, finally the integral is transformed to one of 3 possible types,
$$\text{Type I:} \quad \int \frac{F_I(u, \sqrt{a^2 - u^2})}{G_I(u, \sqrt{a^2 - u^2})}\,du,$$


$$\text{Type II:} \quad \int \frac{F_{II}(u, \sqrt{u^2 - a^2})}{G_{II}(u, \sqrt{u^2 - a^2})}\,du,$$
and
$$\text{Type III:} \quad \int \frac{F_{III}(u, \sqrt{a^2 + u^2})}{G_{III}(u, \sqrt{a^2 + u^2})}\,du.$$
For each of these types, there are 3 possible inverse substitutions: trigonometric, hyperbolic and rational. A flow chart of the 9 possible outcomes will be posted on the course webpage. Here are a couple of examples. In each example, the inverse rational substitution is given, although it was only briefly discussed in lecture.
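The completing-the-square step can be turned into a small classifier. The Python sketch below is illustrative (the function name `classify_radicand` and its return convention are assumptions): with $D = B^2 - 4AC$ and $u = x + B/(2A)$, it reports which of the three types arises, together with the shift $B/(2A)$, $\beta = \sqrt{|A|}$ and $a = \sqrt{|D|}/(2|A|)$, so that $Ax^2 + Bx + C = \beta^2 \times (\text{the type's radicand})$. It rejects the degenerate cases $D = 0$ (a perfect square) and $A < 0$, $D < 0$ (the quadratic is negative everywhere, so the square root is never real).

```python
import math

def classify_radicand(A, B, C):
    # A x^2 + B x + C = A (u^2 - D / (4 A^2)) with u = x + B / (2A), D = B^2 - 4AC.
    # Returns (type, shift, beta, a) so that the quadratic equals beta^2 times
    # a^2 - u^2 (Type I), u^2 - a^2 (Type II) or u^2 + a^2 (Type III).
    if A == 0:
        raise ValueError("not a quadratic")
    D = B * B - 4 * A * C
    if D == 0:
        raise ValueError("perfect square: the radical is just |linear term|")
    shift = B / (2 * A)  # u = x + shift
    beta = math.sqrt(abs(A))
    a = math.sqrt(abs(D)) / (2 * abs(A))
    if A > 0:
        kind = "II: u^2 - a^2" if D > 0 else "III: u^2 + a^2"
    elif D > 0:
        kind = "I: a^2 - u^2"
    else:
        raise ValueError("quadratic is negative for every x")
    return kind, shift, beta, a

print(classify_radicand(1.0, -2.0, 2.0))  # x^2 - 2x + 2 = (x - 1)^2 + 1
```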
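Example. Compute the following antiderivative,
$$\int \frac{x^2}{\sqrt{a^2 - x^2}}\,dx.$$
The trigonometric inverse substitution is,
$$x = a\sin(\theta), \quad dx = a\cos(\theta)\,d\theta.$$
The new antiderivative is,
$$\int \frac{a^2\sin^2(\theta)}{\sqrt{a^2 - a^2\sin^2(\theta)}}(a\cos(\theta)\,d\theta).$$
Simplifying gives,
$$\int a^2\sin^2(\theta)\,d\theta.$$
This can be simplified using the half-angle formula,
$$a^2\int \left(\frac{1}{2} - \frac{1}{2}\cos(2\theta)\right)d\theta.$$
This is easily seen to be,
$$a^2\left(\frac{\theta}{2} - \frac{1}{4}\sin(2\theta)\right) + C.$$
Using the double-angle formula and back-substituting,
$$\int \frac{x^2}{\sqrt{a^2 - x^2}}\,dx = \frac{1}{2}\left(a^2\sin^{-1}(x/a) - x\sqrt{a^2 - x^2}\right) + C.$$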
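A minimal numerical check (a sketch, assuming $a = 2$ and a few sample points in $(-a, a)$): differentiating $\frac{1}{2}\left(a^2\sin^{-1}(x/a) - x\sqrt{a^2 - x^2}\right)$ by central differences should reproduce the integrand $x^2/\sqrt{a^2 - x^2}$.

```python
import math

def F(x, a):
    # Candidate antiderivative of x^2 / sqrt(a^2 - x^2), valid for -a < x < a.
    return 0.5 * (a * a * math.asin(x / a) - x * math.sqrt(a * a - x * x))

a, h = 2.0, 1e-6
checks = []
for x in (-1.5, 0.0, 0.8, 1.9):
    derivative = (F(x + h, a) - F(x - h, a)) / (2 * h)
    integrand = x * x / math.sqrt(a * a - x * x)
    checks.append((derivative, integrand))
    print(x, derivative, integrand)
```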

Alternatively, the hyperbolic inverse substitution is,
$$x = a\tanh(t), \quad dx = a\operatorname{sech}^2(t)\,dt.$$


The new antiderivative is,
$$\int \frac{a^2\tanh^2(t)}{\sqrt{a^2\operatorname{sech}^2(t)}}\,a\operatorname{sech}^2(t)\,dt.$$
Simplifying gives,
$$a^2\int \tanh^2(t)\operatorname{sech}(t)\,dt = a^2\int \frac{\sinh^2(t)}{\cosh^3(t)}\,dt.$$
This can be simplified a bit by multiplying numerator and denominator by $\cosh(t)$ and then expressing in terms of $\sinh(t)$ as much as possible,
$$a^2\int \frac{\sinh^2(t)}{\cosh^4(t)}\cosh(t)\,dt = a^2\int \frac{\sinh^2(t)}{(1 + \sinh^2(t))^2}\cosh(t)\,dt.$$
Make the substitution $u = \sinh(t)$, $du = \cosh(t)\,dt$ to get,
$$a^2\int \frac{u^2}{(1 + u^2)^2}\,du.$$
This can be rewritten as,
$$a^2\int \frac{1}{1 + u^2}\,du - a^2\int \frac{1}{(1 + u^2)^2}\,du.$$
The first of these terms is just $a^2\tan^{-1}(u)$. However, the second term requires another inverse substitution. All in all, this is not a very efficient approach.

Finally, the rational inverse substitution is,
$$x = a\frac{2t}{1 + t^2}, \quad dx = a\frac{2(1 - t^2)}{(1 + t^2)^2}\,dt.$$
The point is that,
$$a^2 - x^2 = a^2\frac{(1 - t^2)^2}{(1 + t^2)^2}.$$
Thus the new antiderivative is (for $-1 < t < 1$, so that $1 - t^2 > 0$),
$$\int \frac{4a^2t^2}{(1 + t^2)^2}\cdot\frac{1 + t^2}{a(1 - t^2)}\cdot\frac{2a(1 - t^2)}{(1 + t^2)^2}\,dt.$$
This simplifies to,
$$8a^2\int \frac{t^2}{(1 + t^2)^3}\,dt = 8a^2\int \frac{1}{(1 + t^2)^2}\,dt - 8a^2\int \frac{1}{(1 + t^2)^3}\,dt.$$
Notice, these two integrals are the same type that occurred with inverse hyperbolic substitution. But they came up more quickly: rational inverse substitution is more efficient than inverse hyperbolic substitution for this problem. However, both require a further inverse trigonometric substitution. So inverse trigonometric substitution is the most efficient for this problem.


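Example. Compute the following antiderivative,
$$\int \frac{x^2}{\sqrt{x^2 - a^2}}\,dx.$$
The trigonometric inverse substitution is,
$$x = a\sec(\theta), \quad dx = a\sec(\theta)\tan(\theta)\,d\theta.$$
The new antiderivative is,
$$\int \frac{a^2\sec^2(\theta)}{\sqrt{a^2\sec^2(\theta) - a^2}}\,a\sec(\theta)\tan(\theta)\,d\theta.$$
Because $\sec^2(\theta) - 1$ equals $\tan^2(\theta)$, simplifying gives (for $0 \le \theta < \pi/2$, where $\tan(\theta) \ge 0$),
$$a^2\int \sec^3(\theta)\,d\theta = a^2\int \frac{1}{\cos^3(\theta)}\,d\theta.$$
This can be simplified by multiplying numerator and denominator by $\cos(\theta)$ and then expressing in terms of $\sin(\theta)$ as much as possible,
$$a^2\int \frac{1}{\cos^4(\theta)}\cos(\theta)\,d\theta = a^2\int \frac{1}{(1 - \sin^2(\theta))^2}\cos(\theta)\,d\theta.$$
Make the substitution $u = \sin(\theta)$, $du = \cos(\theta)\,d\theta$ to get,
$$a^2\int \frac{1}{(1 - u^2)^2}\,du.$$
This can be computed using partial fractions (not yet discussed).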
Alternatively, the hyperbolic inverse substitution is,
$$x = a\cosh(t), \quad dx = a\sinh(t)\,dt.$$
The new antiderivative is,
$$\int \frac{a^2\cosh^2(t)}{\sqrt{a^2\cosh^2(t) - a^2}}\,a\sinh(t)\,dt.$$
Since $\cosh^2(t) - 1$ equals $\sinh^2(t)$, simplifying gives,
$$a^2\int \cosh^2(t)\,dt.$$
This can be simplified using the analogue of the half-angle formula,
$$a^2\int \left(\frac{1}{2} + \frac{1}{2}\cosh(2t)\right)dt.$$


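This is easily seen to be,
$$a^2\left(\frac{t}{2} + \frac{1}{4}\sinh(2t)\right) + C.$$
Using the double-angle formula $\sinh(2t) = 2\sinh(t)\cosh(t) = 2x\sqrt{x^2 - a^2}/a^2$ and back-substituting,
$$\int \frac{x^2}{\sqrt{x^2 - a^2}}\,dx = \frac{1}{2}\left(a^2\cosh^{-1}(x/a) + x\sqrt{x^2 - a^2}\right) + C.$$
Using the formula for $\cosh^{-1}(x/a)$, and absorbing the constant $-\frac{a^2}{2}\ln(a)$ into $C$, this becomes,
$$\int \frac{x^2}{\sqrt{x^2 - a^2}}\,dx = \frac{1}{2}\left(a^2\ln(x + \sqrt{x^2 - a^2}) + x\sqrt{x^2 - a^2}\right) + C.$$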
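As a hedged numerical check (assuming $a = 1.5$ and sample points $x > a$), differentiating $\frac{1}{2}\left(a^2\ln(x + \sqrt{x^2 - a^2}) + x\sqrt{x^2 - a^2}\right)$ by central differences should recover the integrand $x^2/\sqrt{x^2 - a^2}$. The plus sign between the two terms is what makes this work; with a minus sign the derivative would instead be $(a^2 - x^2)/\sqrt{x^2 - a^2}$.

```python
import math

def F(x, a):
    # Antiderivative of x^2 / sqrt(x^2 - a^2) for x > a (constant C dropped).
    s = math.sqrt(x * x - a * a)
    return 0.5 * (a * a * math.log(x + s) + x * s)

a, h = 1.5, 1e-6
checks = []
for x in (1.7, 3.0, 10.0):
    derivative = (F(x + h, a) - F(x - h, a)) / (2 * h)
    integrand = x * x / math.sqrt(x * x - a * a)
    checks.append((derivative, integrand))
    print(x, derivative, integrand)
```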

Finally, the rational substitution is,
$$x = a\,\frac{1 + t^2}{2t}, \qquad dx = a\,\frac{-(1 - t^2)}{2t^2}\,dt.$$
The point is that,
$$x^2 - a^2 = a^2\,\frac{(1 - t^2)^2}{(2t)^2},$$
so that (taking $0 < t < 1$) $\sqrt{x^2 - a^2} = a(1 - t^2)/(2t)$. Thus the new antiderivative is,
$$\int \frac{a^2(1 + t^2)^2/(4t^2)}{a(1 - t^2)/(2t)}\cdot\frac{-a(1 - t^2)}{2t^2}\,dt.$$
This simplifies to,
$$-\frac{a^2}{4}\int \frac{(1 + t^2)^2}{t^3}\,dt = -\frac{a^2}{4}\int \left(\frac{1}{t^3} + \frac{2}{t} + t\right)dt.$$
This evaluates to,
$$-\frac{a^2}{4}\left(\frac{-1}{2t^2} + 2\ln(t) + \frac{t^2}{2}\right) + C.$$
This is clearly the easiest of the 3 methods for computing the antiderivative, for this problem. However, there still remains the formidable problem of solving for $t = t(x)$, back-substituting, and simplifying the resulting expression. All in all, inverse hyperbolic substitution is the most efficient for this problem.
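Since differentiation is easier than antidifferentiation, the hyperbolic-substitution answer can be checked numerically. The Python sketch below is our own illustration, not part of the lecture: it assumes the sample value $a = 2$ and a few test points $x > a$, and compares a central difference quotient of the answer with the integrand. The helper names (`integrand`, `antiderivative`, `central_difference`, `max_error`) are hypothetical.

```python
import math


def integrand(x: float, a: float) -> float:
    """x^2 / sqrt(x^2 - a^2), defined for x > a > 0."""
    return x * x / math.sqrt(x * x - a * a)


def antiderivative(x: float, a: float) -> float:
    """(1/2)(a^2 ln(x + sqrt(x^2 - a^2)) + x sqrt(x^2 - a^2)), taking C = 0."""
    r = math.sqrt(x * x - a * a)
    return 0.5 * (a * a * math.log(x + r) + x * r)


def central_difference(F, x: float, h: float = 1e-5) -> float:
    """Symmetric difference quotient, an approximation of F'(x)."""
    return (F(x + h) - F(x - h)) / (2 * h)


def max_error(a: float, xs) -> float:
    """Largest gap between the numerical derivative of the answer and the integrand."""
    return max(
        abs(central_difference(lambda s: antiderivative(s, a), x) - integrand(x, a))
        for x in xs
    )


if __name__ == "__main__":
    print(max_error(2.0, [2.5, 3.0, 5.0, 10.0]))  # should be tiny
```

Replacing `+ x * r` by `- x * r` in `antiderivative` makes the reported error large, which is a quick way to catch a sign slip in the back-substitution.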
Lecture 26. November 18, 2005
Homework. Problem Set 7 Part I: (f)–(g); Part II: Problem 1 and Problem 2 (a), (b).
Practice Problems. Course Reader: 5E8, 5E10, and please read through Part II of Problem
Set 7.


1. Review of inverse substitution and another example. Recall the general strategy for finding an antiderivative of the form,
$$\int \frac{F\left(x, \sqrt{Ax^2 + Bx + C}\right)}{G\left(x, \sqrt{Ax^2 + Bx + C}\right)}\,dx.$$
For definiteness, consider the example,
$$\int \frac{x^2}{\sqrt{x^2 - 2ax + 2a^2}}\,dx,$$
where $a$ is a constant.
Step 1. Complete the square. Complete the square of the expression $Ax^2 + Bx + C$ inside the radical. In the example,
$$x^2 - 2ax + 2a^2 = (x - a)^2 + a^2.$$
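Step 1 is mechanical enough to automate. A minimal Python sketch, assuming $A \neq 0$ and using exact rational arithmetic from the standard `fractions` module; the function name `complete_square` and the normal form $A(x - h)^2 + k$ are our own choices, not notation from the lecture.

```python
from fractions import Fraction


def complete_square(A, B, C):
    """Return (h, k) with A*x^2 + B*x + C == A*(x - h)^2 + k, assuming A != 0.

    Expanding A*(x - h)^2 + k = A*x^2 - 2*A*h*x + (A*h^2 + k) and matching
    coefficients gives h = -B/(2A) and k = C - A*h^2.
    """
    A, B, C = Fraction(A), Fraction(B), Fraction(C)
    if A == 0:
        raise ValueError("leading coefficient A must be nonzero")
    h = -B / (2 * A)
    k = C - A * h * h
    return h, k


if __name__ == "__main__":
    a = 3  # hypothetical sample value of the constant a
    # x^2 - 2ax + 2a^2 should become (x - a)^2 + a^2, i.e. h = 3, k = 9
    print(complete_square(1, -2 * a, 2 * a * a))
```

The shift $h$ is exactly what the linear change of coordinates $u = x - h$ in Step 2 removes.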

Step 2. Make a linear change of coordinates. Make a linear change of coordinates to simplify the quadratic term to one of the 3 types: $a^2 - x^2$, $x^2 - a^2$, or $x^2 + a^2$. In the example, this means making the linear change of variables,
$$u = x - a, \qquad du = dx.$$
The new quadratic term is $u^2 + a^2$, the third type. The new antiderivative is,
$$\int \frac{(u + a)^2}{\sqrt{u^2 + a^2}}\,du = \int \frac{u^2 + 2au + a^2}{\sqrt{u^2 + a^2}}\,du.$$

Step 3. Use inverse substitution to eliminate the radicals. There is a choice of inverse substitution: trigonometric, hyperbolic or rational. When starting out, it is a good idea to experiment with all 3. On an exam, usually one choice will be suggested (or even demanded). When no other guidance is given, trigonometric substitution is a good starting point (because you are already very familiar with trigonometric functions).
In the example, to eliminate the radical, the correct inverse trigonometric substitution is,
$$u = a\tan(\theta), \qquad du = a\sec^2(\theta)\,d\theta.$$
This is because the quadratic term becomes,
$$u^2 + a^2 = a^2\tan^2(\theta) + a^2 = a^2\sec^2(\theta).$$
With this substitution, the new antiderivative is,
$$\int \frac{a^2\tan^2(\theta) + 2a^2\tan(\theta) + a^2}{\sqrt{a^2\sec^2(\theta)}}\, a\sec^2(\theta)\,d\theta.$$
This simplifies to,
$$a^2\int \left(\tan^2(\theta) + 2\tan(\theta) + 1\right)\sec(\theta)\,d\theta.$$
This can be written as a sum of 3 terms,
$$a^2\int \tan^2(\theta)\sec(\theta)\,d\theta + 2a^2\int \sec(\theta)\tan(\theta)\,d\theta + a^2\int \sec(\theta)\,d\theta.$$

Step 4. Compute the new antiderivative. If this were only as simple as it sounds, how much easier calculus would be! This step is often difficult in itself. Often it requires at least one more direct substitution. Sometimes, it also requires a partial fractions decomposition. We will return to this step below.
Step 5. Back-substitute. This step is always needed when using direct substitution or inverse substitution. This step frequently introduces terms like $\cos(\tan^{-1}(x))$. Time permitting (or when specifically instructed to do so), these terms should be simplified using the right-triangle method from lecture,
$$\theta = \tan^{-1}(x), \quad \frac{x}{1} = \tan(\theta) = \frac{\text{Opposite}}{\text{Adjacent}}, \quad \text{Hypotenuse} = \sqrt{1 + x^2},$$
$$\cos(\theta) = \frac{\text{Adjacent}}{\text{Hypotenuse}} = \frac{1}{\sqrt{1 + x^2}}.$$

Step 6. Check your answer. When feasible, check your answer. Since differentiation is so much faster than antidifferentiation, it is usually quite easy to check that an antiderivative is correct.
Example. The tricky part is, of course, Step 4. In the example, the integral broke into 3 terms,
$$a^2\int \tan^2(\theta)\sec(\theta)\,d\theta + 2a^2\int \sec(\theta)\tan(\theta)\,d\theta + a^2\int \sec(\theta)\,d\theta.$$

The last antiderivative was actually Problem 3(b) from Part II of Problem Set 4. It turns out to be,
$$a^2\int \sec(\theta)\,d\theta = a^2\ln\left(u + \sqrt{u^2 + a^2}\right) + C = a^2\ln\left(x - a + \sqrt{x^2 - 2ax + 2a^2}\right) + C.$$
The middle integrand is simply the derivative of $\sec(\theta) = \sqrt{1 + \tan^2(\theta)}$. So the middle term is,
$$2a^2\int \sec(\theta)\tan(\theta)\,d\theta = 2a^2\sec(\theta) + C = 2a\sqrt{a^2 + u^2} + C = 2a\sqrt{x^2 - 2ax + 2a^2} + C.$$
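These two results can be checked the same way as any antiderivative: differentiate and compare. In terms of $x$, the last term should have derivative $a^2/\sqrt{x^2 - 2ax + 2a^2}$ and the middle term $2a(x - a)/\sqrt{x^2 - 2ax + 2a^2}$. A minimal Python sketch using central difference quotients; the sample value $a = 1.5$ and the test points are arbitrary assumptions, and the helper names are hypothetical.

```python
import math

a = 1.5  # hypothetical sample value of the constant a (any a > 0 works)


def radical(x: float) -> float:
    """sqrt(x^2 - 2ax + 2a^2) = sqrt((x - a)^2 + a^2)."""
    return math.sqrt(x * x - 2 * a * x + 2 * a * a)


def last_term(x: float) -> float:
    # a^2 ln(x - a + sqrt(x^2 - 2ax + 2a^2)); the log argument is always positive
    return a * a * math.log(x - a + radical(x))


def middle_term(x: float) -> float:
    # 2a sqrt(x^2 - 2ax + 2a^2)
    return 2 * a * radical(x)


def derivative(F, x: float, h: float = 1e-5) -> float:
    """Central difference approximation of F'(x)."""
    return (F(x + h) - F(x - h)) / (2 * h)


if __name__ == "__main__":
    for x in (-2.0, 0.0, 1.0, 4.0):
        err_last = derivative(last_term, x) - a * a / radical(x)
        err_mid = derivative(middle_term, x) - 2 * a * (x - a) / radical(x)
        print(x, err_last, err_mid)  # both errors should be tiny
```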

But the remaining (first) term does not simplify in an obvious way. In such cases, it is best to express everything in terms of $\sin(\theta)$ and $\cos(\theta)$ to get a fresh perspective,
$$a^2\int \tan^2(\theta)\sec(\theta)\,d\theta = a^2\int \frac{\sin^2(\theta)}{\cos^3(\theta)}\,d\theta.$$


Multiplying numerator and denominator by $\cos(\theta)$ and expressing in terms of $\sin(\theta)$ gives,
$$a^2\int \frac{\sin^2(\theta)}{(\cos^2(\theta))^2}\cos(\theta)\,d\theta = a^2\int \frac{\sin^2(\theta)}{(1 - \sin^2(\theta))^2}\cos(\theta)\,d\theta.$$
Now substitute for $\sin(\theta)$,
$$z = \sin(\theta), \qquad dz = \cos(\theta)\,d\theta.$$
The new antiderivative is,
$$a^2\int \frac{z^2}{(1 - z^2)^2}\,dz.$$
How do we compute this antiderivative? That is the topic of partial fractions.
Remark: In lecture the solution was done a bit differently. This led to a slightly different antiderivative,
$$\int \frac{1}{(1 - z^2)^2}\,dz.$$
Notice the difference of these 2 antiderivatives is,
$$\int \frac{1}{(1 - z^2)^2}\,dz - \int \frac{z^2}{(1 - z^2)^2}\,dz = \int \frac{1 - z^2}{(1 - z^2)^2}\,dz = \int \frac{1}{1 - z^2}\,dz.$$
This was computed in Problem 3(a), Part II of Problem Set 4. Thus, computing either of the 2 antiderivatives gives both of them.
2. Antidifferentiating simple rational expressions. A rational expression is a fraction of polynomials, $F(x)/G(x)$. These frequently arise in Step 4 of the algorithm above. From the point of view of antidifferentiation, the simplest rational expressions are either polynomials,
$$q(x) = a_n x^n + a_{n-1}x^{n-1} + \cdots + a_1 x + a_0,$$
or else partial fractions,
$$\frac{A}{(x - a)^m}.$$
There are 2 other kinds of partial fractions which were not emphasized in lecture,
$$\frac{B(x - a)}{((x - a)^2 + b^2)^m} \quad\text{and}\quad \frac{C}{((x - a)^2 + b^2)^m}.$$
These 2 kinds come up less often than the first kind. But they do come up, for instance, when studying Laplace transforms in 18.03. Both polynomials and partial fractions are (relatively) easy to antidifferentiate. The antiderivative of a polynomial is,
$$\int q(x)\,dx = \frac{a_n}{n+1}x^{n+1} + \frac{a_{n-1}}{n}x^n + \cdots + \frac{a_1}{2}x^2 + a_0 x + C.$$
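The polynomial formula is easy to apply coefficient by coefficient. A minimal Python sketch using exact fractions; unlike the formula above, the (hypothetical) helpers store coefficients lowest degree first, so `coeffs[k]` is the coefficient of $x^k$, and the antiderivative is taken with $C = 0$.

```python
from fractions import Fraction


def antiderivative_coeffs(coeffs):
    """Antiderivative of a polynomial, with constant of integration C = 0.

    coeffs[k] is the coefficient of x^k (lowest degree first), i.e.
    q(x) = coeffs[0] + coeffs[1] x + ... + coeffs[n] x^n.
    Each term a_k x^k becomes a_k x^(k+1) / (k + 1); the new constant term is 0.
    """
    return [Fraction(0)] + [Fraction(c) / (k + 1) for k, c in enumerate(coeffs)]


def evaluate(coeffs, x):
    """Evaluate a polynomial given lowest-degree-first coefficients (Horner's rule)."""
    result = Fraction(0)
    for c in reversed(coeffs):
        result = result * x + c
    return result


if __name__ == "__main__":
    # q(x) = 1 + 2x + 3x^2 integrates to x + x^2 + x^3 (+ C)
    print(antiderivative_coeffs([1, 2, 3]))
```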

The antiderivative of the first kind of partial fraction is,
$$\int A(x - a)^{-m}\,dx = \begin{cases} \left(-A/(m - 1)\right)(x - a)^{-(m-1)} + C, & m \ge 2,\\ A\ln(|x - a|) + C, & m = 1.\end{cases}$$

The second kind of partial fraction can be computed with a direct substitution $v = (x - a)^2 + b^2$, $dv = 2(x - a)\,dx$,
$$\int \frac{B(x - a)}{((x - a)^2 + b^2)^m}\,dx = \frac{B}{2}\int \frac{dv}{v^m} = \begin{cases} \left(-B/(2m - 2)\right)\left((x - a)^2 + b^2\right)^{-(m-1)} + C, & m \ge 2,\\ (B/2)\ln\left((x - a)^2 + b^2\right) + C, & m = 1.\end{cases}$$

The third kind of partial fraction can be computed with an inverse substitution $x = b\tan(\theta) + a$, $dx = b\sec^2(\theta)\,d\theta$,
$$\int \frac{C}{((x - a)^2 + b^2)^m}\,dx = \left(C/b^{2m-1}\right)\int \cos^{2m-2}(\theta)\,d\theta.$$
Integration by parts gives a reduction formula for such integrals; see Problems (i) and (j), Part I of Problem Set 7.
3. Simplifying rational expressions: division and factoring. Many rational expressions that come up are not of the simple kinds above. The goal is to express an arbitrary rational expression as the sum of a polynomial and partial fractions. The first step is polynomial division. Given a fraction $F(x)/G(x)$, apply polynomial division to get a factorization with remainder,
$$F(x) = q(x)G(x) + r(x),$$
where $q(x)$ is a polynomial and $r(x)$ is a polynomial of degree less than $\deg(G(x))$. This leads to the reduced form of a rational expression,
$$\frac{F(x)}{G(x)} = q(x) + \frac{r(x)}{G(x)}.$$

Example. I forgot the example from lecture. Here is a similar example. Find the reduced form of $(x^3 + 1)/(x^2 + 3x + 2)$. The polynomial division algorithm gives,
$$x^3 + 1 = (x^2 + 3x + 2)(x - 3) + (7x + 7).$$
Thus $q(x)$ is $x - 3$ and $r(x)$ is $7x + 7$. So the reduced form is,
$$\frac{x^3 + 1}{x^2 + 3x + 2} = x - 3 + \frac{7x + 7}{x^2 + 3x + 2}.$$
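The division step can be carried out exactly by long division. A minimal Python sketch, assuming coefficient lists written highest degree first (as in the formulas above) and a nonzero leading coefficient of $G(x)$; `poly_divmod` is a hypothetical name.

```python
from fractions import Fraction


def poly_divmod(F, G):
    """Polynomial long division F = q*G + r with deg r < deg G.

    Coefficient lists are highest degree first, e.g. x^3 + 1 -> [1, 0, 0, 1].
    Assumes G[0] (the leading coefficient of G) is nonzero.
    """
    F = [Fraction(c) for c in F]
    G = [Fraction(c) for c in G]
    if len(F) < len(G):
        return [Fraction(0)], F
    q = []
    r = F[:]
    for _ in range(len(F) - len(G) + 1):
        coeff = r[0] / G[0]  # next coefficient of the quotient
        q.append(coeff)
        # subtract coeff * x^k * G(x), then drop the (now zero) leading term
        for i, g in enumerate(G):
            r[i] -= coeff * g
        r.pop(0)
    return q, r


if __name__ == "__main__":
    q, r = poly_divmod([1, 0, 0, 1], [1, 3, 2])
    print(q, r)  # q represents x - 3, r represents 7x + 7
```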

The next step is to factor the denominator into a product of linear and irreducible quadratic factors,
$$G(x) = A(x - a_1)^{m_1}(x - a_2)^{m_2}\cdots(x - a_k)^{m_k}\cdot\left((x - \alpha_1)^2 + b_1^2\right)^{n_1}\cdots\left((x - \alpha_l)^2 + b_l^2\right)^{n_l}.$$
Here $k$ and $l$ are nonnegative integers and $m_1, \ldots, m_k, n_1, \ldots, n_l$ are positive integers. Also, $A$, $a_1, \ldots, a_k$, $\alpha_1, \ldots, \alpha_l$, and $b_1, \ldots, b_l$ are real numbers (with each $b_j \neq 0$). The last $l$ factors were only discussed at the end of lecture. Although they are important, they do not often come up in this course.
The Fundamental Theorem of Algebra asserts that every polynomial with real coefficients has a
factorization as above. However, finding the factorization can be very difficult. In all exercises and
exam problems, either the factorization is easy, or the factorization will be given to you. Whenever
possible, cancel common factors from the numerator and denominator.
Example. In the example, the quadratic formula gives the factorization,
$$x^2 + 3x + 2 = (x + 2)(x + 1).$$
The numerator $r(x)$ is $7(x + 1)$. Thus the numerator and denominator have a common factor. This leads to a better reduced form,
$$\frac{x^3 + 1}{x^2 + 3x + 2} = x - 3 + \frac{7}{x + 2}.$$
This can now be integrated to give,
$$\int \frac{x^3 + 1}{x^2 + 3x + 2}\,dx = \frac{x^2}{2} - 3x + 7\ln(|x + 2|) + C.$$

4. Simplifying rational expressions: partial fraction decomposition. Using the last part, every rational expression can be written in the form,
$$\frac{F(x)}{G(x)} = q(x) + \frac{r(x)}{H(x)} = q(x) + \frac{r(x)}{(x - a_1)^{m_1}\cdots(x - a_k)^{m_k}\cdot\left((x - \alpha_1)^2 + b_1^2\right)^{n_1}\cdots\left((x - \alpha_l)^2 + b_l^2\right)^{n_l}},$$
where $q(x)$ is a polynomial, the degree of $r(x)$ is less than the degree of $H(x)$, and $r(x)$ has no common factor with $H(x)$ (the leading constant $A$ of $G(x)$ has been absorbed into $r(x)$). This can be further simplified using partial fraction decomposition. It is a fact that every rational expression $r(x)/H(x)$ can be written in the form,
$$\left(\frac{C_{1,1}}{x - a_1} + \frac{C_{1,2}}{(x - a_1)^2} + \cdots + \frac{C_{1,m_1}}{(x - a_1)^{m_1}}\right) + \cdots + \left(\frac{C_{k,1}}{x - a_k} + \cdots + \frac{C_{k,m_k}}{(x - a_k)^{m_k}}\right)$$
$$+ \left(\frac{D_{1,1}(x - \alpha_1)}{(x - \alpha_1)^2 + b_1^2} + \frac{E_{1,1}}{(x - \alpha_1)^2 + b_1^2} + \cdots + \frac{D_{1,n_1}(x - \alpha_1)}{((x - \alpha_1)^2 + b_1^2)^{n_1}} + \frac{E_{1,n_1}}{((x - \alpha_1)^2 + b_1^2)^{n_1}}\right) + \cdots$$
$$+ \left(\frac{D_{l,1}(x - \alpha_l)}{(x - \alpha_l)^2 + b_l^2} + \frac{E_{l,1}}{(x - \alpha_l)^2 + b_l^2} + \cdots + \frac{D_{l,n_l}(x - \alpha_l)}{((x - \alpha_l)^2 + b_l^2)^{n_l}} + \frac{E_{l,n_l}}{((x - \alpha_l)^2 + b_l^2)^{n_l}}\right).$$
Here all the terms $C_{i,j}$, $D_{i,j}$ and $E_{i,j}$ are real constants. This sum of partial fractions is called the partial fraction decomposition of $r(x)/H(x)$. The difficulty is precisely to find the constants $C_{i,j}$, $D_{i,j}$, and $E_{i,j}$.

One approach, which always works but is quite inefficient, is simply to multiply all terms by the denominator $H(x)$, and then gather coefficients of powers of $x$. This will give a collection of linear equations in the unknowns $C_{i,j}$, $D_{i,j}$ and $E_{i,j}$. There is a unique solution of this set of linear equations. Methods of linear algebra, e.g., Gauss-Jordan elimination, give an algorithm for finding the solution.
Example. Find the partial fraction decomposition of,
$$\frac{1}{1 - x^2}.$$
In fact this was Problem 3(a), Part II of Problem Set 4. The partial fraction decomposition will have the form,
$$\frac{1}{1 - x^2} = \frac{A}{x + 1} + \frac{B}{x - 1}.$$
Multiplying both sides of the equation by $x^2 - 1 = (x + 1)(x - 1)$ gives,
$$-1 = A(x - 1) + B(x + 1) = (A + B)x + (B - A).$$
This gives the system of 2 linear equations in 2 unknowns,
$$\begin{cases} A + B = 0,\\ -A + B = -1.\end{cases}$$
Solving the first equation for $B = -A$ and plugging this into the second equation gives,
$$-A + (-A) = -1 \iff 2A = 1 \iff A = \frac{1}{2}.$$
Thus $B = -A = -1/2$. So the partial fraction decomposition is,
$$\frac{1}{1 - x^2} = \frac{1/2}{x + 1} + \frac{-1/2}{x - 1}.$$
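For larger decompositions, solving the linear system by hand becomes tedious, and Gauss-Jordan elimination can do it mechanically. A minimal Python sketch with exact fractions, assuming the system is square with a unique solution (as it is for a partial fraction decomposition); `gauss_jordan` is a hypothetical helper, applied here to the system from the example.

```python
from fractions import Fraction


def gauss_jordan(matrix, rhs):
    """Solve a square linear system exactly by Gauss-Jordan elimination.

    matrix is a list of rows and rhs the right-hand side; assumes the
    system has a unique solution (otherwise no nonzero pivot is found).
    """
    n = len(matrix)
    # augmented matrix [matrix | rhs] with exact rational entries
    aug = [[Fraction(x) for x in row] + [Fraction(b)] for row, b in zip(matrix, rhs)]
    for col in range(n):
        # find a row at or below `col` with a nonzero pivot and swap it up
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # scale the pivot row so the pivot entry is 1
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        # eliminate this column from every other row
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]


if __name__ == "__main__":
    # A + B = 0, -A + B = -1  (from 1/(1 - x^2) = A/(x+1) + B/(x-1))
    A, B = gauss_jordan([[1, 1], [-1, 1]], [0, -1])
    print(A, B)  # 1/2 -1/2
```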

5. The Heaviside cover-up method. The Heaviside cover-up method is a method for determining many of the coefficients $C_{i,j}$. For each highest power of a linear factor occurring in $H(x)$, say $(x - a_i)^{m_i}$, cover up that term, and substitute $x = a_i$ in the remaining expression. Then $C_{i,m_i}$ equals the value,
$$C_{i,m_i} = \left.\frac{r(x)}{H(x)/(x - a_i)^{m_i}}\right|_{x = a_i}.$$
The proof is quite simple. Multiply every term in the partial fraction decomposition by $(x - a_i)^{m_i}$. The left side becomes $r(x)/\left(H(x)/(x - a_i)^{m_i}\right)$. One term on the right is $(x - a_i)^{m_i}\left(C_{i,m_i}/(x - a_i)^{m_i}\right) = C_{i,m_i}$. Every other term has a factor $(x - a_i)$ that is not cancelled by the denominator. Thus plugging in $x = a_i$, every other term is 0. And the only remaining term is $C_{i,m_i}$.
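When $H(x)$ has only linear factors, the cover-up formula amounts to evaluating $r(a_i)$ and dividing by the remaining factors of $H(x)$ evaluated at $x = a_i$. A minimal Python sketch under those assumptions ($H(x)$ monic, factored as $\prod_i (x - a_i)^{m_i}$ with distinct $a_i$, numerator coefficients highest degree first); `cover_up` and `poly_eval` are hypothetical names.

```python
from fractions import Fraction


def poly_eval(coeffs, x):
    """Evaluate a polynomial (highest degree first) at x by Horner's rule."""
    value = Fraction(0)
    for c in coeffs:
        value = value * x + c
    return value


def cover_up(r, roots):
    """Heaviside cover-up for H(x) = prod (x - a_i)^{m_i} with only linear factors.

    r: numerator coefficients, highest degree first.
    roots: dict {a_i: m_i}; assumes H is monic and the a_i are distinct.
    Returns {a_i: C_{i, m_i}}, the coefficient of 1/(x - a_i)^{m_i}.
    """
    result = {}
    for a, m in roots.items():
        a = Fraction(a)
        # "cover up" (x - a)^m: evaluate the other factors of H at x = a
        others = Fraction(1)
        for b, k in roots.items():
            if Fraction(b) != a:
                others *= (a - b) ** k
        result[a] = poly_eval(r, a) / others
    return result


if __name__ == "__main__":
    # z^2 / ((z + 1)^2 (z - 1)^2): numerator z^2 -> [1, 0, 0]
    print(cover_up([1, 0, 0], {-1: 2, 1: 2}))  # both coefficients equal 1/4
```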


Example. Find many of the terms in the partial fraction decomposition,
$$\frac{z^2}{(z^2 - 1)^2} = \frac{z^2}{(z + 1)^2(z - 1)^2}.$$
The partial fraction decomposition will be,
$$\frac{z^2}{(z + 1)^2(z - 1)^2} = \frac{C_{1,2}}{(z + 1)^2} + \frac{C_{2,2}}{(z - 1)^2} + \frac{C_{1,1}}{z + 1} + \frac{C_{2,1}}{z - 1}.$$
Using the Heaviside cover-up method,
$$C_{1,2} = \left.\frac{z^2}{(z - 1)^2}\right|_{z = -1} = \frac{(-1)^2}{(-2)^2} = \frac{1}{4}.$$
Also,
$$C_{2,2} = \left.\frac{z^2}{(z + 1)^2}\right|_{z = +1} = \frac{(+1)^2}{(+2)^2} = \frac{1}{4}.$$
Thus the partial fraction decomposition is,
$$\frac{z^2}{(1 - z^2)^2} = \frac{1}{4}\,\frac{1}{(z + 1)^2} + \frac{1}{4}\,\frac{1}{(z - 1)^2} + \frac{C_{1,1}}{z + 1} + \frac{C_{2,1}}{z - 1}.$$

As this example illustrates, the Heaviside cover-up method does not always determine all coefficients. However, it reduces the number of unknown coefficients. To find the remaining coefficients, either clear denominators, or else substitute for $x$ some useful numbers (where $H(x)$ is nonzero), and solve the resulting linear equations.
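Example. Find the full partial fraction decomposition of,
$$\frac{z^2}{(1 - z^2)^2}.$$
The rational expression is unchanged by the substitution $z \leftrightarrow -z$. Thus the same is true for the partial fraction decomposition. Since the decomposition is unique, comparing the coefficients of $1/(z - 1)$ shows that $C_{2,1}$ equals $-C_{1,1}$. This gives,
$$\frac{z^2}{(1 - z^2)^2} = \frac{1}{4}\,\frac{1}{(z + 1)^2} + \frac{1}{4}\,\frac{1}{(z - 1)^2} + \frac{C_{1,1}}{z + 1} + \frac{-C_{1,1}}{z - 1}.$$
Finally, plug in $z = 0$ to get,
$$0 = \frac{1}{4}\,\frac{1}{(+1)^2} + \frac{1}{4}\,\frac{1}{(-1)^2} + \frac{C_{1,1}}{+1} + \frac{-C_{1,1}}{-1} = \frac{1}{2} + 2C_{1,1}.$$
Solving gives $C_{1,1} = -1/4$. Finally this gives the full partial fraction decomposition,
$$\frac{z^2}{(1 - z^2)^2} = \frac{1}{4}\left(\frac{1}{(z + 1)^2} + \frac{1}{(z - 1)^2} - \frac{1}{z + 1} + \frac{1}{z - 1}\right).$$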


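Using the partial fraction decomposition, the antiderivative is (for $|z| < 1$, which holds here since $z = \sin(\theta)$),
$$\int \frac{z^2}{(1 - z^2)^2}\,dz = \frac{1}{4}\left(\frac{-1}{z + 1} + \frac{-1}{z - 1} + \ln\left(\frac{|z - 1|}{|z + 1|}\right)\right) + C = \frac{1}{2}\left(\frac{z}{1 - z^2} + \ln\left(\sqrt{1 - z^2}\right) - \ln(1 + z)\right) + C.$$
This allows us to finish the computation of the antiderivative from the beginning of the lecture. This is left as an exercise.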
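Both the decomposition and the antiderivative can be spot-checked numerically on $|z| < 1$. A minimal Python sketch; the sample points are arbitrary and the helper names are hypothetical.

```python
import math


def integrand(z: float) -> float:
    return z * z / (1 - z * z) ** 2


def decomposition(z: float) -> float:
    # (1/4)(1/(z+1)^2 + 1/(z-1)^2 - 1/(z+1) + 1/(z-1))
    return 0.25 * (1 / (z + 1) ** 2 + 1 / (z - 1) ** 2 - 1 / (z + 1) + 1 / (z - 1))


def antiderivative(z: float) -> float:
    # (1/2)(z/(1 - z^2) + ln sqrt(1 - z^2) - ln(1 + z)), valid for |z| < 1, C = 0
    return 0.5 * (z / (1 - z * z) + math.log(math.sqrt(1 - z * z)) - math.log(1 + z))


def derivative(F, z: float, h: float = 1e-6) -> float:
    """Central difference approximation of F'(z)."""
    return (F(z + h) - F(z - h)) / (2 * h)


if __name__ == "__main__":
    for z in (-0.8, -0.3, 0.0, 0.4, 0.9):
        # both differences should be tiny
        print(z, integrand(z) - decomposition(z), derivative(antiderivative, z) - integrand(z))
```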

Lecture 27. November 22, 2005

Homework. Problem Set 7 Part II: Problem 2.

Practice Problems. Course Reader: 5F2, 5F3, 5F4, 5F5.

1. Integration by parts. The differential form of the product rule is,
$$d(uv) = u\,dv + v\,du.$$
An equivalent form is,
$$u\,dv = d(uv) - v\,du.$$
This gives a very useful antidifferentiation formula,
$$\int u\,dv = uv - \int v\,du.$$
This formula is integration by parts.

Example. Compute the antiderivative of,
$$\int x\cos(x)\,dx.$$
Set $u$ to be $x$ and $dv$ to be $\cos(x)\,dx$. Then $u$, $v$, $du$ and $dv$ are,
$$\begin{array}{ll} u = x, & dv = \cos(x)\,dx,\\ du = dx, & v = \sin(x).\end{array}$$
Using integration by parts,
$$\int u\,dv = uv - \int v\,du,$$
$$\int x\cos(x)\,dx = x\sin(x) - \int \sin(x)\,dx.$$
The new integral is easy to evaluate. Altogether this gives,
$$\int x\cos(x)\,dx = x\sin(x) + \cos(x) + C.$$

Because it is much easier to differentiate than to antidifferentiate, it is a good idea to check your answer.
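Besides differentiating, one way to check a definite version of the answer is to compare $F(2) - F(0)$, with $F(x) = x\sin(x) + \cos(x)$, against a numerical integral. A minimal Python sketch using the composite Simpson's rule (a standard quadrature method, not something covered in this lecture); the interval $[0, 2]$ and the helper names are arbitrary choices.

```python
import math


def simpson(f, a: float, b: float, n: int = 1000) -> float:
    """Composite Simpson's rule on [a, b] with n subintervals (n must be even)."""
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def F(x: float) -> float:
    """Candidate antiderivative x sin(x) + cos(x) of x cos(x)."""
    return x * math.sin(x) + math.cos(x)


if __name__ == "__main__":
    numeric = simpson(lambda x: x * math.cos(x), 0.0, 2.0)
    exact = F(2.0) - F(0.0)
    print(numeric, exact)  # the two values should agree to many digits
```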
2. How to use integration by parts. The goal of integration by parts is to replace a complicated integral, $\int u\,dv$, by a simpler integral $\int v\,du$. What this usually means is that $du$ should be simpler than $u$, and $v$ should be no more complicated than $dv$. This was the case in the last example. However, occasionally this is not the case.
Example. Use integration by parts to compute the antiderivative,
$$\int \ln(x)\,dx.$$
There is very little choice here, if we are to use only integration by parts. Set $u$ to be $\ln(x)$ and set $dv$ to be $dx$. Then $u$, $v$, $du$ and $dv$ are,
$$\begin{array}{ll} u = \ln(x), & dv = dx,\\ du = dx/x, & v = x.\end{array}$$
Using integration by parts,
$$\int u\,dv = uv - \int v\,du,$$
$$\int \ln(x)\,dx = x\ln(x) - \int dx.$$
The new integral is easy to evaluate. Altogether this gives,
$$\int \ln(x)\,dx = x\ln(x) - x + C.$$

Notice this example does not follow the general rule. The function $v = x$ is strictly more complicated than $dv = dx$. However, $du = dx/x$ is much simpler than $u = \ln(x)$. So $v\,du = dx$ is simpler than $u\,dv = \ln(x)\,dx$. The lesson is to be flexible when antidifferentiating. Try different things, and see which one works. For example, another approach to this problem, which ultimately comes down to integration by parts again, is to make an inverse substitution,

$$x = e^t, \qquad dx = e^t\,dt.$$
The new integral is,
$$\int \ln(x)\,dx = \int t e^t\,dt.$$
Set $u = t$ and $dv = e^t\,dt$. Then $u$, $v$, $du$ and $dv$ are,
$$\begin{array}{ll} u = t, & dv = e^t\,dt,\\ du = dt, & v = e^t.\end{array}$$


Using integration by parts,
$$\int u\,dv = uv - \int v\,du,$$
$$\int t e^t\,dt = t e^t - \int e^t\,dt.$$
The new integral is easy to evaluate. Altogether this gives,
$$\int t e^t\,dt = t e^t - e^t + C.$$
Back-substituting for $x$ gives,
$$\int \ln(x)\,dx = x\ln(x) - x + C.$$
This agrees with the earlier answer.

3. Reduction formulas. It often happens that an integral can be computed only by repeated application of integration by parts. It sometimes happens that integration by parts gives the induction step to solve infinitely many integrals. In this case, the formula given by integration by parts is called a reduction formula.
Example. Use integration by parts to give a reduction formula for,
$$\int [\ln(x)]^n\,dx.$$
Now there is much more choice for $u$ and $dv$. The simplest choice is to set $u = [\ln(x)]^n$ and $dv = dx$. Then $u$, $v$, $du$ and $dv$ are,
$$\begin{array}{ll} u = [\ln(x)]^n, & dv = dx,\\ du = n[\ln(x)]^{n-1}\,dx/x, & v = x.\end{array}$$
Using integration by parts,
$$\int u\,dv = uv - \int v\,du,$$
$$\int [\ln(x)]^n\,dx = x[\ln(x)]^n - n\int [\ln(x)]^{n-1}\,dx.$$
The new integral is simpler than the original integral. And repeated application of the formula eventually leads to a formula for the integral. Thus this is a reduction formula. For instance, this gives,
$$\int [\ln(x)]^2\,dx = x[\ln(x)]^2 - 2\int \ln(x)\,dx.$$
The new integral was already computed. Altogether this gives,
$$\int [\ln(x)]^2\,dx = x[\ln(x)]^2 - 2x\ln(x) + 2x + C.$$
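The reduction formula translates directly into a recursion. A minimal Python sketch, taking $C = 0$ and $x > 0$; `log_power_antiderivative` is a hypothetical name, and the Simpson's rule comparison on $[1, e]$ is our own independent check rather than part of the lecture.

```python
import math


def log_power_antiderivative(n: int, x: float) -> float:
    """F_n(x) with F_n' = (ln x)^n, built from the reduction formula
    F_n(x) = x (ln x)^n - n F_{n-1}(x), starting from F_0(x) = x (C = 0)."""
    if n == 0:
        return x
    return x * math.log(x) ** n - n * log_power_antiderivative(n - 1, x)


def simpson(f, a: float, b: float, steps: int = 2000) -> float:
    """Composite Simpson's rule (steps must be even)."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


if __name__ == "__main__":
    x = 3.0
    # n = 2 should reproduce x ln(x)^2 - 2x ln(x) + 2x
    print(log_power_antiderivative(2, x), x * math.log(x) ** 2 - 2 * x * math.log(x) + 2 * x)
    # definite integral of (ln x)^3 over [1, e], via the formula and numerically
    exact = log_power_antiderivative(3, math.e) - log_power_antiderivative(3, 1.0)
    print(exact, simpson(lambda t: math.log(t) ** 3, 1.0, math.e))
```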


Example. Use integration by parts to find a reduction formula for,
$$\int t^n e^t\,dt.$$
The simplest choice is to set $u = t^n$ and $dv = e^t\,dt$. Then $u$, $v$, $du$ and $dv$ are,
$$\begin{array}{ll} u = t^n, & dv = e^t\,dt,\\ du = n t^{n-1}\,dt, & v = e^t.\end{array}$$
Using integration by parts,
$$\int u\,dv = uv - \int v\,du,$$
$$\int t^n e^t\,dt = t^n e^t - n\int t^{n-1}e^t\,dt.$$
Notice how similar this answer is to the answer of the previous example. The connection comes from the inverse substitution,
$$x = e^t, \qquad dx = e^t\,dt,$$
so that,
$$\int [\ln(x)]^n\,dx = \int t^n e^t\,dt.$$
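Repeated use of this reduction formula shows that $\int t^n e^t\,dt = P_n(t)e^t + C$ for a polynomial $P_n$ satisfying $P_n(t) = t^n - nP_{n-1}(t)$ and $P_0(t) = 1$. A minimal Python sketch computing the integer coefficients of $P_n$ (lowest degree first); `exp_poly_coeffs` is a hypothetical name.

```python
def exp_poly_coeffs(n: int) -> list:
    """Coefficients (lowest degree first) of P_n, where the integral of
    t^n e^t dt equals P_n(t) e^t + C.

    Uses the reduction formula P_n(t) = t^n - n P_{n-1}(t), with P_0(t) = 1.
    """
    coeffs = [1]  # P_0(t) = 1
    for k in range(1, n + 1):
        coeffs = [-k * c for c in coeffs] + [1]  # -k P_{k-1}(t) + t^k
    return coeffs


if __name__ == "__main__":
    print(exp_poly_coeffs(1))  # [-1, 1]     i.e. t - 1
    print(exp_poly_coeffs(2))  # [2, -2, 1]  i.e. t^2 - 2t + 2
```

Via $x = e^t$, the function $x\,P_n(\ln(x))$ is then an antiderivative of $[\ln(x)]^n$; for $n = 2$ this gives $x[\ln(x)]^2 - 2x\ln(x) + 2x$, matching the previous example.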

4. Advanced reduction formulas. Sometimes a reduction formula can only be obtained by repeatedly applying integration by parts or by using some other identity.
Example. Use integration by parts to find a reduction formula for,
$$\int [\sin(x)]^n\,dx, \qquad n \ge 1.$$
One choice is to set $u = [\sin(x)]^{n-1}$ and to set $dv = \sin(x)\,dx$. Then $u$, $v$, $du$ and $dv$ are,
$$\begin{array}{ll} u = [\sin(x)]^{n-1}, & dv = \sin(x)\,dx,\\ du = (n - 1)[\sin(x)]^{n-2}\cos(x)\,dx, & v = -\cos(x).\end{array}$$
Using integration by parts,
$$\int u\,dv = uv - \int v\,du,$$
$$\int [\sin(x)]^n\,dx = -[\sin(x)]^{n-1}\cos(x) + (n - 1)\int [\sin(x)]^{n-2}\cos^2(x)\,dx.$$
At first blush, this is more complicated than the original integral since it involves both $\sin(x)$ and $\cos(x)$. But $\cos^2(x)$ equals $1 - \sin^2(x)$. This substitution gives,
$$\int [\sin(x)]^n\,dx = -[\sin(x)]^{n-1}\cos(x) + (n - 1)\int [\sin(x)]^{n-2}\,dx - (n - 1)\int [\sin(x)]^n\,dx.$$


This certainly seems circular: the new formula for the integral involves the integral we were looking for. However, bringing like terms to one side of the equation gives,
$$\int [\sin(x)]^n\,dx + (n - 1)\int [\sin(x)]^n\,dx = -[\sin(x)]^{n-1}\cos(x) + (n - 1)\int [\sin(x)]^{n-2}\,dx.$$
The left side equals $n\int [\sin(x)]^n\,dx$. Dividing by $n$ gives the reduction formula,
$$\int [\sin(x)]^n\,dx = -\frac{[\sin(x)]^{n-1}\cos(x)}{n} + \frac{n - 1}{n}\int [\sin(x)]^{n-2}\,dx.$$

Lecture 28. December 1, 2005

Homework. Problem Set 8 Part I: (a) and (b).

Practice Problems. Course Reader: 6A1, 6A2.

1. Indeterminate forms. Expressions of the form $0/0$, $\infty/\infty$, $0 \times \infty$, $\infty - \infty$, $0^0$, $1^\infty$ and $\infty^0$ are called indeterminate forms. To be precise, none of these expressions is defined in mathematics. However, if a naive limit computation $\lim_{x \to a} F(x)$ leads to an indeterminate form, it often happens that a more careful computation using calculus eliminates the indeterminate form.
Example. Let $b$ be any real number. Compute the limit as $x$ approaches 0 of $F(x) = (b + 1/x) - 1/x$, $x \neq 0$. If we evaluate this limit in a naive manner, we get,
$$\lim_{x \to 0} F(x) = \lim_{x \to 0}\left[\left(b + \frac{1}{x}\right) - \frac{1}{x}\right] \;\text{“=”}\; \lim_{x \to 0}\left(b + \frac{1}{x}\right) - \lim_{x \to 0}\frac{1}{x} = \infty - \infty.$$
This is an indeterminate form. In other words, the computation of the limit failed to give any useful information. The reason is that the general formula,
$$\lim_{x \to a}[g(x) - h(x)] = \lim_{x \to a} g(x) - \lim_{x \to a} h(x),$$
is only valid when the limits on the right are defined (as real numbers), which they are not in our case.
Of course $F(x)$ is simply the constant function with value $b$. Therefore,
$$\lim_{x \to 0} F(x) = \lim_{x \to 0} b = b.$$
Thus, a more careful computation proves the limit exists and gives its value.
2. The Mean Value Theorem revisited. Recall the Mean Value Theorem: If $f(x)$ is continuous on $[a, b]$ and differentiable on $(a, b)$, then for some $c$ strictly between $a$ and $b$,
$$f'(c) = \frac{f(b) - f(a)}{b - a}.$$


Thus, given two such functions $f(x)$ and $g(x)$ such that $g(b) - g(a)$ is nonzero, there exist two values $c_1$ and $c_2$ strictly between $a$ and $b$ such that,
$$\frac{f'(c_1)}{g'(c_2)} = \frac{(f(b) - f(a))/(b - a)}{(g(b) - g(a))/(b - a)} = \frac{f(b) - f(a)}{g(b) - g(a)}.$$
Is there a single value $c = c_1 = c_2$ where this equality holds?
The answer is yes. Form the function
$$F(x) = (f(b) - f(a))(g(x) - g(a)) - (g(b) - g(a))(f(x) - f(a)).$$
Since $f(x)$ and $g(x)$ are continuous on $[a, b]$, also $F(x)$ is continuous on $[a, b]$. Since $f(x)$ and $g(x)$ are differentiable on $(a, b)$, also $F(x)$ is differentiable on $(a, b)$. Moreover,
$$F(a) = F(b) = 0.$$
Thus, by the Mean Value Theorem, there exists a value $c$ strictly between $a$ and $b$ such that $F'(c) = 0$. By a straightforward computation,
$$0 = F'(c) = (f(b) - f(a))g'(c) - (g(b) - g(a))f'(c).$$
Rearranging, $(f(b) - f(a))g'(c) = (g(b) - g(a))f'(c)$, so $f'(c)/g'(c) = (f(b) - f(a))/(g(b) - g(a))$ whenever $g'(c) \neq 0$. This proves the Generalized Mean Value Theorem. The main consequence of the Generalized Mean Value Theorem is the following result.
Proposition. Let $f(x)$ and $g(x)$ be continuous functions on $[a, b]$ that are differentiable on $(a, b)$. If $g'(x)$ is nonzero on $(a, b)$, then $g(x) - g(a)$ is nonzero for all $a < x < b$ (by the Mean Value Theorem) so that the expression,
$$\frac{f(x) - f(a)}{g(x) - g(a)}$$
is defined. If the right-handed limit,
$$\lim_{x \to a^+}\frac{f'(x)}{g'(x)},$$
exists, then the right-handed limit,
$$\lim_{x \to a^+}\frac{f(x) - f(a)}{g(x) - g(a)},$$
also exists, and the two limits are equal,
$$\lim_{x \to a^+}\frac{f(x) - f(a)}{g(x) - g(a)} = \lim_{x \to a^+}\frac{f'(x)}{g'(x)}.$$
(The converse can fail: for $f(x) = x^2\sin(1/x)$ with $f(0) = 0$, $g(x) = x$ and $a = 0$, the first limit is 0, but $f'(x)/g'(x) = 2x\sin(1/x) - \cos(1/x)$ has no limit.) A similar result holds for left-handed limits. The proof follows by applying the Generalized Mean Value Theorem to the interval $[a, x]$ to replace $(f(x) - f(a))/(g(x) - g(a))$ by $f'(c)/g'(c)$ for some $c$ with $a < c < x$. Then $c$ approaches $a$ as $x$ approaches $a$ from the right.


3. L'Hospital's rule. The most important case of the proposition is L'Hospital's rule. This is exactly the case when $f(a) = g(a) = 0$. In this case, a naive computation would give,
$$\lim_{x \to a^+}\frac{f(x)}{g(x)} \;\text{“=”}\; \frac{f(a)}{g(a)} = \frac{0}{0},$$
which is an indeterminate form. Again, the problem is that the general formula,
$$\lim_{x \to a^+}\frac{f(x)}{g(x)} = \frac{\lim_{x \to a^+} f(x)}{\lim_{x \to a^+} g(x)},$$
only holds if all three limits are defined, and the limit $\lim_{x \to a^+} g(x)$ is nonzero. Since the limit is zero, the formula does not hold.
However, if $f'(x)$ and $g'(x)$ exist, and if $g'(x)$ is nonzero, then (because $f(x)/g(x) = (f(x) - f(a))/(g(x) - g(a))$ in this case) the proposition has the following consequence, known as L'Hospital's rule,
$$\lim_{x \to a^+} f(x)/g(x) = \lim_{x \to a^+} f'(x)/g'(x),$$
provided the limit on the right exists.

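Examples.
$$\lim_{x \to 0}\frac{\sinh(x)}{\sin(x)} = \lim_{x \to 0}\frac{\cosh(x)}{\cos(x)} = \frac{1}{1} = 1.$$
$$\lim_{x \to 2}\frac{4x^3 - 32}{x^2 - x - 2} = \lim_{x \to 2}\frac{12x^2}{2x - 1} = \frac{12 \cdot 4}{2 \cdot 2 - 1} = \frac{48}{3} = 16.$$
$$\lim_{x \to 0}\frac{1 - \cos(x)}{x^2} = \lim_{x \to 0}\frac{\sin(x)}{2x} = \lim_{x \to 0}\frac{\cos(x)}{2} = 1/2.$$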
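A numerical sanity check (not a proof) of the three examples: evaluate each ratio at points approaching the limit point from the right and compare with the value found by L'Hospital's rule. A minimal Python sketch; the step sizes and the names `approach` and `examples` are arbitrary choices of ours.

```python
import math


def approach(ratio, a: float, steps=(1e-2, 1e-3, 1e-4)):
    """Evaluate ratio(a + h) for shrinking h > 0 (a right-hand approach)."""
    return [ratio(a + h) for h in steps]


# name -> (ratio, limit point, value found by L'Hospital's rule)
examples = {
    "sinh(x)/sin(x) at 0": (lambda x: math.sinh(x) / math.sin(x), 0.0, 1.0),
    "(4x^3-32)/(x^2-x-2) at 2": (lambda x: (4 * x**3 - 32) / (x * x - x - 2), 2.0, 16.0),
    "(1-cos x)/x^2 at 0": (lambda x: (1 - math.cos(x)) / x**2, 0.0, 0.5),
}

if __name__ == "__main__":
    for name, (ratio, a, expected) in examples.items():
        print(name, approach(ratio, a), "L'Hospital value:", expected)
```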

4. L'Hospital's rule for other indeterminate forms. L'Hospital's rule can be used to compute limits that naively lead to indeterminate forms other than $0/0$. For instance, if
$$\lim_{x \to a^+} f(x) = \lim_{x \to a^+} g(x) = \infty,$$
then the naive computation gives,
$$\lim_{x \to a^+}\frac{f(x)}{g(x)} \;\text{“=”}\; \frac{\infty}{\infty}.$$
Now observe that,
$$\lim_{x \to a^+}(1/f(x)) = \lim_{x \to a^+}(1/g(x)) = 0.$$
Therefore, if both $g(x)$ and $g'(x)$ are nonzero on $(a, b)$, then L'Hospital's rule gives,
$$\lim_{x \to a^+}\frac{1/f(x)}{1/g(x)} = \lim_{x \to a^+}\frac{(1/f(x))'}{(1/g(x))'} = \lim_{x \to a^+}\frac{-f'(x)/f(x)^2}{-g'(x)/g(x)^2}.$$


Assuming that the limits,
$$\lim_{x \to a^+}\frac{f(x)}{g(x)} \quad\text{and}\quad \lim_{x \to a^+}\frac{f'(x)}{g'(x)},$$
are defined and nonzero, the formula above can be rewritten as,
$$\left(\lim_{x \to a^+}\frac{f(x)}{g(x)}\right)^{-1} = \lim_{x \to a^+}\frac{f'(x)}{g'(x)}\cdot\left(\lim_{x \to a^+}\frac{f(x)}{g(x)}\right)^{-2}.$$
Solving gives,
$$\lim_{x \to a^+} f(x)/g(x) = \lim_{x \to a^+} f'(x)/g'(x),$$
if both limits are defined and nonzero. In fact, a better result is true (with a more subtle proof): if the second limit is defined, then the first limit is defined and the 2 are equal (whether or not they are zero).
Example.
$$\lim_{x \to \pi/2^+}\frac{\ln(x - \pi/2)}{\sec(x)} = \lim_{x \to \pi/2^+}\frac{1/(x - \pi/2)}{\sec(x)\tan(x)} = \cdots = 0.$$
By similar arguments, other indeterminate forms can also be reduced to L'Hospital's rule. Also, limits of the form,
$$\lim_{x \to \infty} F(x)$$
giving indeterminate forms can often be reduced to L'Hospital's rule. The moral is that the formula,
$$\lim_{x \to a}\frac{f(x)}{g(x)} = \lim_{x \to a}\frac{f'(x)}{g'(x)},$$
is true when $f(a)/g(a)$ is an indeterminate form of type $0/0$ or $\infty/\infty$ and the limit on the right exists. But a certain amount of care should be used: the hypotheses must be checked, since otherwise the formula can fail.

Lecture 29. December 2, 2005

Homework. Problem Set 8 Part I: (c), (d) and (e); Part II: Problems 1 and 2.

Practice Problems. Course Reader: 6B7.

1. A problem with Riemann integrals. Riemann integrals are defined in very many cases. The result we use most often is that for a piecewise continuous function $f(x)$ on a bounded interval $[a, b]$, the Riemann integral,
$$\int_a^b f(x)\,dx,$$
exists (and equals a finite number). What if the interval is unbounded, e.g., $[a, \infty)$? Quite simply, the Riemann integral is not defined. This isn't a problem with our methods for computing integrals. It is a problem with the very definition of the Riemann integral. In fact, this is only the first of many problems with the definition of the Riemann integral. Eventually these problems led mathematicians to develop a better definition, the Lebesgue integral, which is studied in course 18.103. Luckily, the particular problem of defining the integral on unbounded intervals can be easily overcome using limits (with no need to use the Lebesgue integral).
2. Improper integrals of the first kind. Let $f(x)$ be defined on the interval $[a, \infty)$. If for every number $t > a$ the function $f(x)$ is Riemann integrable on $[a, t]$, and if the limit,
$$\lim_{t \to \infty}\int_a^t f(x)\,dx,$$
exists, then we say the improper integral,
$$\int_a^\infty f(x)\,dx,$$
is defined and its value is,
$$\int_a^\infty f(x)\,dx = \lim_{t \to \infty}\int_a^t f(x)\,dx.$$
Please note, this is a new definition. It is not a theorem about Riemann integrals.
Example. Let $p > 1$ be a real number. Then for every $t > 1$, the integral,
$$\int_1^t \frac{1}{x^p}\,dx,$$
exists and equals,
$$\left.\frac{-1}{(p - 1)x^{p-1}}\right|_1^t = \frac{1}{p - 1} - \frac{1}{(p - 1)t^{p-1}}.$$
Since $p$ is greater than 1, the limit,
$$\lim_{t \to \infty}\frac{1}{t^{p-1}},$$
exists and equals 0. Therefore,
$$\lim_{t \to \infty}\int_1^t \frac{1}{x^p}\,dx,$$
exists and equals,
$$\frac{1}{p - 1}.$$
Therefore the improper integral exists and equals,
$$\int_1^\infty \frac{1}{x^p}\,dx = 1/(p - 1).$$
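The closed form for $\int_1^t x^{-p}\,dx$ makes the contrast between $p > 1$ and $p = 1$ easy to see numerically. A minimal Python sketch; the sample values of $p$ and $t$ are arbitrary, the Simpson's rule comparison at $t = 10$ is our own independent check of the closed form, and the names are hypothetical.

```python
import math


def partial_integral(p: float, t: float) -> float:
    """Closed form of the integral of x^(-p) over [1, t], for t > 1."""
    if p == 1:
        return math.log(t)
    return 1 / (p - 1) - 1 / ((p - 1) * t ** (p - 1))


def simpson(f, a: float, b: float, steps: int = 2000) -> float:
    """Composite Simpson's rule (steps must be even)."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


if __name__ == "__main__":
    print(partial_integral(2.0, 10.0), simpson(lambda x: x ** -2.0, 1.0, 10.0))
    for t in (1e2, 1e4, 1e6):
        # p = 2 approaches 1/(p - 1) = 1, while p = 1 grows without bound like ln(t)
        print(t, partial_integral(2.0, t), partial_integral(1.0, t))
```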

On the other hand, when $p$ equals 1, then,
$$\int_1^t \frac{1}{x}\,dx = \ln(t).$$
Since the limit $\lim_{t \to \infty}\ln(t)$ is not defined (or more precisely, equals $+\infty$), the improper integral,
$$\int_1^\infty \frac{1}{x}\,dx,$$
is not defined (or more precisely, equals $+\infty$).

Example. For $t > 0$, the integral,
$$\int_0^t \cos(x)\,dx,$$
exists and equals $\sin(t)$. Even though all values $\sin(t)$ are defined and bounded, the limit,
$$\lim_{t \to \infty}\sin(t),$$
is not defined (essentially because it never settles down). Therefore the improper integral,
$$\int_0^\infty \cos(x)\,dx,$$
is not defined.
3. Improper integrals of the second kind. Here is a second problem with the Riemann integral. Let $[a, b]$ be a bounded interval. Let $f(x)$ be a function that is bounded on $[t, b]$ for every $a < t < b$, but which is unbounded on $[a, b]$. According to the definition of the Riemann integral,
$$\int_a^b f(x)\,dx,$$
is not defined. However, it may happen that for every $a < t < b$, the integral,
$$\int_t^b f(x)\,dx,$$
is defined and the limit,
$$\lim_{t \to a^+}\int_t^b f(x)\,dx,$$
is defined. In this case, we say the improper integral,
$$\int_{a^+}^b f(x)\,dx,$$
is defined and its value is,
$$\int_{a^+}^b f(x)\,dx = \lim_{t \to a^+}\int_t^b f(x)\,dx.$$


Similarly, if $f(x)$ is Riemann integrable on every interval $[a, t]$ for $a < t < b$, and if
$$\lim_{t \to b^-}\int_a^t f(x)\,dx,$$
exists, we say the improper integral,
$$\int_a^{b^-} f(x)\,dx,$$
exists and its value is,
$$\int_a^{b^-} f(x)\,dx = \lim_{t \to b^-}\int_a^t f(x)\,dx.$$

Example. Let $p$ be a real number in the range $0 < p < 1$. Because the function $1/x^p$ is unbounded on $(0, 1]$, the Riemann integral,
$$\int_0^1 \frac{1}{x^p}\,dx,$$
is not defined. However, for every $0 < t < 1$, the Riemann integral,
$$\int_t^1 \frac{1}{x^p}\,dx,$$
is defined and equals,
$$\frac{1 - t^{1-p}}{1 - p}.$$
Since $0 < p < 1$, the limit,
$$\lim_{t \to 0^+} t^{1-p},$$
exists and equals 0. Therefore,
$$\lim_{t \to 0^+}\int_t^1 \frac{1}{x^p}\,dx,$$
exists and equals $1/(1 - p)$. Therefore the improper integral,
$$\int_{0^+}^1 \frac{1}{x^p}\,dx,$$
exists and its value is,
$$\int_{0^+}^1 \frac{1}{x^p}\,dx = 1/(1 - p).$$

4. The Comparison Test. When is an improper integral defined? This is equivalent to asking when a limit is defined. Therefore, every rule for convergence of a limit gives a rule for convergence of an improper integral. There are 2 basic rules for convergence of a limit.

The squeezing lemma. If $F(x) \le G(x) \le H(x)$ on an interval, if $\lim_{x \to a} F(x)$ and $\lim_{x \to a} H(x)$ exist, and if $\lim_{x \to a} F(x)$ equals $\lim_{x \to a} H(x)$, then $\lim_{x \to a} G(x)$ exists and equals the other 2 limits.
Monotone bounded limits. If $F(x)$ is monotone increasing and bounded above on $[a, b)$, then $\lim_{x \to b^-} F(x)$ exists. Similarly, if $F(x)$ is monotone decreasing and bounded below on $[a, b)$, then $\lim_{x \to b^-} F(x)$ exists; if $F(x)$ is monotone increasing and bounded below on $(a, b]$, then $\lim_{x \to a^+} F(x)$ exists; and if $F(x)$ is monotone decreasing and bounded above on $(a, b]$, then $\lim_{x \to a^+} F(x)$ exists.
These give the following tests for convergence of an improper integral.

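Squeezing lemma. If $f(x) \le g(x) \le h(x)$ on the interval $[a, \infty)$, if $g(x)$ is Riemann integrable on every $[a, t]$, and if the improper integrals,
$$\int_a^\infty f(x)\,dx \quad\text{and}\quad \int_a^\infty h(x)\,dx,$$
exist and are equal, then the improper integral,
$$\int_a^\infty g(x)\,dx,$$
exists and equals the other 2.

The comparison theorem. If $0 \le f(x) \le g(x)$ on $[a, \infty)$, if $f(x)$ is Riemann integrable on every $[a, t]$, and if,
$$\int_a^\infty g(x)\,dx,$$
converges, then
$$\int_a^\infty f(x)\,dx,$$
converges (the partial integrals $\int_a^t f(x)\,dx$ increase with $t$ and are bounded above by $\int_a^\infty g(x)\,dx$). Contrapositively, if $\int_a^\infty f(x)\,dx$ diverges, then $\int_a^\infty g(x)\,dx$ diverges.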
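As an illustration of the comparison theorem (our own example, not from the lecture): on $[1, \infty)$ we have $0 \le e^{-x^2} \le e^{-x}$, and $\int_1^\infty e^{-x}\,dx = e^{-1}$ converges, so $\int_1^\infty e^{-x^2}\,dx$ converges. A minimal Python sketch showing the partial integrals increasing while staying below $e^{-1}$; the integration uses Simpson's rule, and the final comparison relies on the standard identity $\int_1^\infty e^{-x^2}\,dx = \frac{\sqrt{\pi}}{2}\operatorname{erfc}(1)$ via the standard library's `math.erfc`.

```python
import math


def simpson(f, a: float, b: float, steps: int = 2000) -> float:
    """Composite Simpson's rule (steps must be even)."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


BOUND = math.exp(-1)  # integral of e^(-x) over [1, infinity)


def partial(t: float) -> float:
    """Integral of e^(-x^2) over [1, t], computed numerically."""
    return simpson(lambda x: math.exp(-x * x), 1.0, t)


if __name__ == "__main__":
    values = [partial(t) for t in (2.0, 4.0, 8.0)]
    print(values, "all bounded by", BOUND)
    print("known limit:", math.sqrt(math.pi) / 2 * math.erfc(1.0))
```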
Lecture 30. December 6, 2005
Practice Problems. Course Reader: 6C2.
1. Sequences. By definition, a sequence of real numbers is a rule assigning to each counting number $n$ an associated real number $a_n$. The integer $n$ is called the index of the sequence. Usually the index begins with $n = 1$, but occasionally it begins with another integer (sometimes 0). Sequences are often specified by giving the first few values, and letting the reader infer the rule, e.g.,
$$a_1 = \frac{1}{1}, \quad a_2 = \frac{1}{2}, \quad a_3 = \frac{1}{3}, \quad \ldots$$
It is always better to give a precise definition of each sequence, e.g.,
$$a_n = \frac{1}{n}, \qquad n = 1, 2, \ldots$$
The most common notation for a sequence is $(a_n)_{n \ge 1}$.


A sequence $(a_n)_{n \ge 1}$ converges to a limit $L$ if the sequence becomes arbitrarily close to $L$, and stays arbitrarily close to $L$. More precisely, the sequence converges to $L$ if for every positive number $\epsilon$, there exists an integer $N$ (depending on the sequence and $\epsilon$) such that for every integer $n \ge N$,
$$|a_n - L| < \epsilon.$$
In other words, the terms of the tail of the sequence, $a_N, a_{N+1}, a_{N+2}, \ldots$, all lie in the interval $(L - \epsilon, L + \epsilon)$. A sequence cannot have more than 1 limit: given 2 potential limits $L_1 \neq L_2$, simply take $\epsilon = |L_1 - L_2|/2$ in the definition above (no number is within $\epsilon$ of both). A sequence which has a limit is said to converge, and the limit is denoted by,
$$L = \lim_{n \to \infty} a_n.$$
A sequence which does not have a limit is said to diverge.

Examples.

(i) Let $L$ be a fixed real number. The sequence $a_n = L$, $n = 1, 2, \ldots$ converges to $L$.

(ii) The sequence $a_n = n$ diverges. In a precise sense, this sequence “diverges to $\infty$”.

(iii) The sequence $a_n = (-1)^n$ diverges, even though it is bounded (it never gets bigger than 1 or smaller than $-1$).

(iv) Let $r$ be a real number. The sequence $a_n = r^n$, $n = 0, 1, 2, \ldots$ converges to 0 if $|r| < 1$ and diverges if $|r| > 1$. There are 2 remaining cases. If $r = -1$, then $a_n = (-1)^n$ diverges. If $r = 1$, then $a_n = 1$ converges to 1.
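For example (iv) with $|r| < 1$, an explicit $N$ for a given $\epsilon$ can be found by searching: because $|r|^n$ is non-increasing in $n$, once $|r|^N < \epsilon$ every later term also satisfies $|r^n - 0| < \epsilon$. A minimal Python sketch; `first_index_within` is a hypothetical name and the search cap `max_n` is an arbitrary safeguard.

```python
def first_index_within(r: float, eps: float, max_n: int = 10_000) -> int:
    """Smallest N >= 0 with |r|^N < eps, for a real r with |r| < 1 and eps > 0.

    Since |r|^n is non-increasing in n when |r| < 1, every n >= N then also
    satisfies |r^n - 0| < eps, which is exactly the epsilon-N condition for
    the limit 0.
    """
    if not abs(r) < 1 or eps <= 0:
        raise ValueError("requires |r| < 1 and eps > 0")
    value = 1.0  # |r|^0
    for n in range(max_n + 1):
        if value < eps:
            return n
        value *= abs(r)
    raise RuntimeError("no N found below max_n")


if __name__ == "__main__":
    for eps in (0.1, 0.001):
        print(eps, first_index_within(0.5, eps), first_index_within(-0.9, eps))
```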

2. Tests for convergence/divergence. One useful test for convergence is the Squeezing Lemma.
The squeezing lemma. Let $(a_n)_{n \ge 1}$, $(b_n)_{n \ge 1}$ and $(c_n)_{n \ge 1}$ be sequences such that for every index $n$,
$$a_n \le b_n \le c_n.$$
In other words, the sequence $(b_n)$ is “squeezed” between the sequences $(a_n)$ and $(c_n)$. If $(a_n)$ and $(c_n)$ converge, and if,
$$\lim_{n \to \infty} a_n = \lim_{n \to \infty} c_n,$$
then also $(b_n)$ converges and its limit equals the limit of the other 2 sequences.
Another test for convergence is the Monotone Convergence Test. A sequence $(a_n)_{n \ge 1}$ is called non-decreasing if for every index $n$, $a_{n+1} \ge a_n$. Similarly, a sequence $(a_n)$ is non-increasing if for every index $n$, $a_{n+1} \le a_n$. A sequence which is either non-decreasing or non-increasing is called monotone. A sequence $(a_n)$ is bounded above if there exists a real number $u$ such that for every index $n$, $a_n \le u$. The number $u$ is an upper bound for the sequence. A sequence $(a_n)$ is bounded below if there exists a real number $l$ such that for every index $n$, $a_n \ge l$. The number $l$ is a lower bound for the sequence.

Monotone Convergence Test. A nondecreasing sequence converges if and only if it is bounded above. In this case, the limit of the sequence is the least upper bound for the sequence. Similarly, a nonincreasing sequence converges if and only if it is bounded below, and in this case the limit is the greatest lower bound for the sequence.
3. Series. Given a sequence $(a_n)_{n\ge1}$, there are 2 important related sequences. The first is the sequence of partial sums, $(b_n)_{n\ge1}$, defined by
$$b_n = a_1 + a_2 + \cdots + a_n = \sum_{k=1}^{n} a_k.$$
The second is the sequence of partial absolute sums, $(B_n)_{n\ge1}$, defined by
$$B_n = |a_1| + |a_2| + \cdots + |a_n| = \sum_{k=1}^{n} |a_k|.$$
If the sequence of partial sums $(b_n)_{n\ge1}$ converges, the limit is called the series of $(a_n)_{n\ge1}$, and is denoted by
$$\sum_{k=1}^{\infty} a_k := \lim_{n\to\infty} b_n = \lim_{n\to\infty} \sum_{k=1}^{n} a_k.$$
In this case, the series $\sum_k a_k$ is said to converge. If the sequence of partial absolute sums $(B_n)_{n\ge1}$ converges, the series $\sum_k a_k$ is said to converge absolutely. Although it is not obvious, if the series converges absolutely, then the series converges (this is a basic theorem from course 18.100). If a series converges but does not converge absolutely, the series is sometimes said to converge conditionally.
Examples. The harmonic sequence is the sequence $a_n = 1/n$. As will be shown soon, the harmonic series $\sum_n 1/n$ diverges to ∞. The alternating harmonic sequence is
$$a_n = \frac{(-1)^n}{n}.$$
The alternating harmonic series,
$$\sum_{n=1}^{\infty} \frac{(-1)^n}{n},$$
does converge. This will also be shown soon. Since the sequence of partial absolute sums for the alternating harmonic sequence equals the sequence of partial sums for the harmonic sequence, the alternating harmonic series does not converge absolutely. It only converges conditionally.
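A quick numerical sketch (Python, not part of the original notes) contrasting the two series: the harmonic partial sums keep growing, while the alternating harmonic partial sums settle down. The comparison value $-\ln 2$ is a standard fact that these notes do not prove; the helper names are illustrative.

```python
import math

def partial_sum(term, n):
    """Return the n-th partial sum term(1) + ... + term(n)."""
    return sum(term(k) for k in range(1, n + 1))

def harmonic(k):
    return 1.0 / k

def alternating_harmonic(k):
    return (-1) ** k / k

for n in (10, 1000, 100000):
    print(n, partial_sum(harmonic, n), partial_sum(alternating_harmonic, n))

# Standard fact (assumed here, not proved in the notes): the limit is -ln(2).
print(-math.log(2))
```

The harmonic column grows roughly like $\ln(n)$, while the alternating column stabilizes near $-0.6931$.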
As counterintuitive as this might sound, the terms in the alternating harmonic series can be rearranged so that the sum converges to any real number you like! This sounds ridiculous: finite sums are independent of the order in which the summands are added, so how could this fail for infinite sums? The answer is quite simple. Because the harmonic series $\sum_n 1/n$ diverges, the same is true for $\sum_n 1/(2n)$. Thus, first add up a very large number of only the (positive) even terms $1/2, 1/4, 1/6, \ldots$ of the alternating harmonic series, enough to make the partial sum bigger than, say, $10^6$. Now add only the first odd term, $-1$. This has a negligible effect. Now add a large number of the remaining even terms to make the partial sum bigger than $10^7$. Now add one more odd term, $-1/3$. Continuing in this way, eventually every term in the sequence contributes to one of the partial sums. But because we add positive terms with a much higher frequency than negative terms, the sequence of partial sums diverges to $+\infty$. Similarly, we could add negative terms with a very high frequency and make the partial sums diverge to $-\infty$. Now it is not so surprising that by adding the terms in a careful order, we can make the partial sums converge to any value we like.
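One standard way to make "a careful order" concrete is the greedy rule sketched below in Python. The rule is the usual textbook proof of Riemann's rearrangement theorem, not something spelled out in these notes, and the function name and the target $\pi$ are illustrative choices. While the partial sum is at most the target, add the next unused positive term; otherwise add the next unused negative term. Since the terms tend to 0, the partial sums end up oscillating around the target with shrinking amplitude.

```python
import math

def greedy_rearrangement(target, num_terms):
    """Greedily rearrange the alternating harmonic series (-1)^n / n toward target.

    Positive terms are 1/2, 1/4, 1/6, ...; negative terms are -1, -1/3, -1/5, ...
    Each index n is used at most once. Returns the final partial sum and the
    list of indices n in the order they were added.
    """
    next_even, next_odd = 2, 1  # next unused positive / negative index
    total, order = 0.0, []
    for _ in range(num_terms):
        if total <= target:
            n, next_even = next_even, next_even + 2
        else:
            n, next_odd = next_odd, next_odd + 2
        total += (-1) ** n / n
        order.append(n)
    return total, order

s, order = greedy_rearrangement(math.pi, 200000)
print(s)           # close to pi
print(order[:10])  # the first indices used are all even (positive terms)
```

Convergence is slow: the gap to the target is bounded by the size of the most recently added negative term, which shrinks only gradually.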
The pathology of the preceding paragraph occurs with any conditionally convergent series. It is a very important fact that every absolutely convergent series has only a single limit, independent of the order in which terms are added. For this reason, absolutely convergent series are much more useful than conditionally convergent series.
4. Test for convergence/divergence of series. If a series $\sum_n a_n$ converges, then the sequence $(a_n)$ converges to 0. To see this, denote by L the limit of the sequence of partial sums $(b_n)$. For every positive real number $\varepsilon$, using $\varepsilon/2$ in the definition of convergence of $(b_n)$, there exists an integer N such that for every $n \ge N$, $|b_n - L| < \varepsilon/2$. But then for $n \ge N + 1$,
$$|a_n| = |b_n - b_{n-1}| = |(b_n - L) - (b_{n-1} - L)| \le |b_n - L| + |b_{n-1} - L| < \varepsilon/2 + \varepsilon/2 = \varepsilon.$$
Thus the sequence $(a_n)$ converges to 0. Contrapositively, if the sequence $(a_n)$ does not converge to 0, then the series $\sum_n a_n$ diverges. This is the most basic test for divergence of a series. For example, it immediately follows that the series $\sum_{n=1}^{\infty} (-1)^n$ diverges (arguing the opposite is a favorite pastime of "mathematical cranks").
The most basic test for absolute convergence of a series follows from the monotone convergence test. The sequence of partial absolute sums,
$$B_n = \sum_{k=1}^{n} |a_k|,$$
is a nondecreasing sequence. Therefore, by the monotone convergence test, it converges if and only if it is bounded above. The most common technique for proving that the sequence of partial absolute sums is bounded above is to compare it to a larger series that is known to converge. This gives the following.
Comparison Test. Let $(a_n)_{n\ge1}$ and $(b_n)_{n\ge1}$ be sequences such that for every index n, $|a_n| \le |b_n|$. If the series $\sum_{n=1}^{\infty} b_n$ converges absolutely, then the series $\sum_{n=1}^{\infty} a_n$ converges absolutely.

A number of common convergence tests in calculus textbooks come to nothing more than combining the comparison test with an analysis of the geometric series. Let r be a real number and let $(a_n)_{n\ge0}$ be the geometric sequence,
$$a_n = r^n, \quad n \ge 0$$
(by convention, if $r = 0$, the first term $a_0$ is defined to be 1). By high school algebra, if $r \ne 1$, the partial sums are
$$b_n = 1 + r + \cdots + r^n = \frac{1 - r^{n+1}}{1 - r} = \frac{1}{1-r} - \frac{1}{1-r}\, r^{n+1}.$$
Observe this sequence depends on n only in the last term $r^{n+1}$, which is essentially the geometric sequence. Assuming $r \ne 1$, the geometric sequence $r^{n+1}$ converges if and only if $|r| < 1$ (in which case it converges to 0). In this case, the sequence of partial absolute sums,
$$B_n = 1 + |r| + |r|^2 + \cdots + |r|^n = \frac{1}{1-|r|} - \frac{1}{1-|r|}\,|r|^{n+1},$$
also converges. Thus, the geometric series $\sum_{n=0}^{\infty} r^n$ converges absolutely, with sum $1/(1-r)$, if $|r| < 1$, and diverges if $|r| > 1$ or $r = -1$. The only remaining case is when $r = 1$. Then the partial sums are $b_n = n + 1$, which diverge to ∞. Altogether, the series $\sum_{n=0}^{\infty} r^n$ converges to $1/(1-r)$ if $|r| < 1$, and diverges otherwise.
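The closed form for $b_n$ is easy to check numerically. The Python sketch below (the function names are ours, not the notes') compares the term-by-term partial sum with $(1 - r^{n+1})/(1 - r)$, and with the limit $1/(1-r)$ when $|r| < 1$.

```python
def geometric_partial_sum(r, n):
    """b_n = 1 + r + ... + r^n, summed term by term."""
    return sum(r ** k for k in range(n + 1))

def geometric_closed_form(r, n):
    """(1 - r^(n+1)) / (1 - r), valid for r != 1."""
    return (1 - r ** (n + 1)) / (1 - r)

for r in (0.5, -0.9, 2.0):
    b = geometric_partial_sum(r, 50)
    print(r, b, geometric_closed_form(r, 50), 1 / (1 - r))
```

For r = 0.5 and r = −0.9 the partial sums approach $1/(1-r)$; for r = 2 they grow without bound, and the number $1/(1-r) = -1$ has nothing to do with the sum.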
The ratio test. There are two tests that allow us to compare a given sequence $(a_n)_{n\ge1}$ to a geometric sequence $(r^n)_{n\ge1}$. If the limit
$$\lim_{n\to\infty} \left| \frac{a_{n+1}}{a_n} \right|$$
exists, call it r. Then the sequence $(a_n)_{n\ge1}$ can be compared to a sequence $(C\rho^n)_{n\ge1}$, where $\rho$ is any number slightly larger than r and C is a suitable constant. This leads to the ratio test: The series $\sum_{n=1}^{\infty} a_n$ converges absolutely if the sequence $|a_{n+1}/a_n|$ converges to a real number $r < 1$, and diverges if the sequence $|a_{n+1}/a_n|$ converges to a real number $r > 1$ (in which case the sequence $(a_n)_{n\ge1}$ does not converge to 0). There is no information if the sequence converges to 1 or diverges.
Similarly, if the limit
$$\lim_{n\to\infty} \sqrt[n]{|a_n|}$$
exists, call it r. Then the sequence $(a_n)_{n\ge1}$ can again be compared to a sequence $(C\rho^n)_{n\ge1}$. This leads to the root test: The series $\sum_{n=1}^{\infty} a_n$ converges absolutely if the sequence $\sqrt[n]{|a_n|}$ converges to a real number $r < 1$, and diverges if the sequence $\sqrt[n]{|a_n|}$ converges to a real number $r > 1$. There is no information if the sequence converges to 1 or diverges.
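Both limits can be estimated numerically before any proof is attempted. The Python sketch below does this for the illustrative sequence $a_n = n/2^n$ (our choice, not from the notes). Both estimates approach $1/2 < 1$, so either test predicts absolute convergence. A finite computation only suggests the value of the limit; it does not prove it.

```python
def ratio_estimate(a, n):
    """|a(n+1) / a(n)| at a single (large) index n."""
    return abs(a(n + 1) / a(n))

def root_estimate(a, n):
    """|a(n)|^(1/n) at a single (large) index n."""
    return abs(a(n)) ** (1.0 / n)

def a(n):
    return n / 2 ** n  # exact integer powers; Python divides big ints correctly

for n in (10, 100, 1000):
    print(n, ratio_estimate(a, n), root_estimate(a, n))
```

The ratio estimate approaches 1/2 faster than the root estimate, because $n^{1/n}$ tends to 1 only slowly.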

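Comparison to an improper integral. The final test uses improper integrals to get useful information about a series. Let $(a_n)_{n\ge1}$ be a sequence. Let $f(x) \ge 0$ be a function on $[1, \infty)$ such that for every integer n, $f(x) \ge |a_n|$ for all $n \le x \le n + 1$. If the improper integral
$$\int_1^{\infty} f(x)\,dx$$
converges, then the series $\sum_{n=1}^{\infty} a_n$ converges absolutely. On the other hand, let $g(x) \ge 0$ be a function on $[1, \infty)$ such that for every integer n, $g(x) \le |a_n|$ for all $n \le x \le n + 1$. If the improper integral
$$\int_1^{\infty} g(x)\,dx$$
diverges, then the series $\sum_{n=1}^{\infty} a_n$ does not converge absolutely. For both directions, define the sequence $(c_n)$ by
$$c_n = \int_n^{n+1} f(x)\,dx, \quad\text{or}\quad c_n = \int_n^{n+1} g(x)\,dx.$$
The partial sums of $(c_n)$ are simply
$$\sum_{k=1}^{n-1} c_k = \int_1^{n} f(x)\,dx, \quad\text{or}\quad \sum_{k=1}^{n-1} c_k = \int_1^{n} g(x)\,dx.$$
Since each interval $[k, k+1]$ has length 1, $c_k \ge |a_k|$ in the first case and $c_k \le |a_k|$ in the second. The result follows: in the first case the partial absolute sums are bounded above by $\int_1^{\infty} f(x)\,dx$, and in the second case they are bounded below by partial integrals that tend to ∞.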

Examples. 1. The harmonic series. Let $(a_n)_{n\ge1}$ be the harmonic sequence,
$$a_n = \frac{1}{n}.$$
Let g(x) be the function $g(x) = 1/x$ on the interval $[1, \infty)$. Then for every integer n, $g(x) \le a_n = 1/n$ on the interval $[n, n+1]$. By the Fundamental Theorem of Calculus, the partial sums of the sequence $(c_n)$ are
$$\sum_{k=1}^{n-1} c_k = \int_1^{n} \frac{1}{x}\,dx = \ln(n).$$
As n tends to ∞, the natural logarithms ln(n) also tend to ∞ (although very slowly: ln(n) does not get bigger than a fixed real number R until n gets bigger than the much larger number $e^R$). Therefore the partial sums of $(c_n)$ diverge. Since $a_k \ge c_k$ for every k, the partial sums of the harmonic series are at least as large, so the harmonic series also diverges (very slowly).
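A small numerical check of the inequality behind this argument (Python, illustrative only): the partial sum $1 + 1/2 + \cdots + 1/(n-1)$ should always be at least $\ln(n)$, and both grow without bound, but slowly.

```python
import math

def harmonic_partial_sum(m):
    """1 + 1/2 + ... + 1/m."""
    return sum(1.0 / k for k in range(1, m + 1))

for n in (10, 1000, 10**6):
    lower = math.log(n)                # integral of 1/x from 1 to n
    h = harmonic_partial_sum(n - 1)    # a_1 + ... + a_(n-1)
    print(n, h, lower, h >= lower)
```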
Example. 2. The Riemann zeta function. Let $s > 1$ be a real number. Define the sequence $(a_n)_{n\ge1}$ by
$$a_n = \frac{1}{n^s}.$$
The series $\sum_{n=1}^{\infty} 1/n^s$ equals $1 + \sum_{n=2}^{\infty} 1/n^s$, which is the same as
$$1 + \sum_{n=1}^{\infty} \frac{1}{(n+1)^s}.$$
Let f(x) be the function $f(x) = 1/x^s$. Then for each integer n, $f(x) \ge 1/(n+1)^s$ for every x in $[n, n+1]$. The partial sums of $(c_n)$ are
$$\sum_{k=1}^{n-1} c_k = \int_1^{n} \frac{1}{x^s}\,dx = \left.\frac{1}{1-s}\,x^{1-s}\right|_1^{n} = \frac{1}{s-1} - \frac{1}{(s-1)\,n^{s-1}}.$$
Because s is bigger than 1, as n tends to ∞, also $n^{s-1}$ tends to ∞. Therefore the partial sums tend to $1/(s-1)$. Therefore, by the comparison test, the series $\sum_{n=1}^{\infty} 1/(n+1)^s$ converges absolutely to a value bounded by $1/(s-1)$, and hence the series
$$\sum_{n=1}^{\infty} \frac{1}{n^s}$$
converges absolutely to a value bounded by $1 + 1/(s-1) = s/(s-1)$. The value of this limit is called the Riemann zeta function at s, denoted
$$\zeta(s) := \sum_{n=1}^{\infty} \frac{1}{n^s}.$$
This function is of fundamental importance in number theory. It also pops up in Fourier series and statistical mechanics. The values of $\zeta(s)$ when s is a positive even integer are known. The first couple are $\zeta(2) = \pi^2/6$ and $\zeta(4) = \pi^4/90$. There are very fundamental open problems about the Riemann zeta function. For one of these problems in particular, the Clay Mathematics Institute has offered a 1 million dollar prize for an accepted, refereed solution.
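A numerical sanity check of the bound and of the values quoted above (Python sketch; the cutoffs $10^5$ and $10^4$ are arbitrary). For s = 2 the partial sums increase toward $\pi^2/6 \approx 1.6449$ and stay below the bound $s/(s-1) = 2$ derived above.

```python
import math

def zeta_partial_sum(s, n):
    """1/1^s + 1/2^s + ... + 1/n^s."""
    return sum(1.0 / k ** s for k in range(1, n + 1))

s = 2
approx = zeta_partial_sum(s, 10**5)
print(approx, math.pi ** 2 / 6, s / (s - 1))
print(zeta_partial_sum(4, 10**4), math.pi ** 4 / 90)
```

The remaining tail after n terms is roughly $1/((s-1)n^{s-1})$, which matches the size of the differences printed here.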
Lecture 31. December 8, 2005
Practice Problems. Course Reader: 7B4, 7B6, 7C1, 7C5, 7D1, 7D2.
1. Power series. Given a real number a and a sequence of real numbers $(c_n)_{n\ge0}$, there is an associated expression, called a power series about x = a,
$$\sum_{n=0}^{\infty} c_n (x-a)^n = c_0 + c_1(x-a) + c_2(x-a)^2 + \cdots$$
For every choice of a real number x, the power series gives a usual series. In particular, for the choice x = a, the series has at most 1 nonzero term, and thus converges to $c_0$.
Question. Given a power series, for which real numbers x does the corresponding series absolutely
converge?
Examples. 1. Consider the power series
$$1^1x^1 + 2^2x^2 + 3^3x^3 + \cdots = \sum_{n=1}^{\infty} n^n x^n.$$
Of course this converges to 0 for x = 0. But for any x other than 0, the sequence of terms $n^n x^n = (nx)^n$ is unbounded (since $|nx| > 1$ once n is large), so it does not converge to 0. Therefore the series does not converge. In other words, the series converges only for x = 0.
2. Consider the power series
$$1 + x + x^2 + \cdots = \sum_{n=0}^{\infty} x^n.$$
This is a geometric series. From the last lecture, the series converges absolutely for |x| < 1 and diverges if |x| ≥ 1.
3. Consider the power series
$$1 + x + x^2/2 + x^3/3! + \cdots = \sum_{n=0}^{\infty} \frac{1}{n!} x^n.$$
The ratio of the (n + 1)st term to the nth term in the series is
$$\frac{x^{n+1}/(n+1)!}{x^n/n!} = \frac{x}{n+1}.$$
For fixed x, as n grows, the absolute value of this ratio converges to 0, which is less than 1. Therefore, by the ratio test, for every choice of x the series converges absolutely.
These 3 examples illustrate the whole range of possibilities.
Theorem. Let $\sum_{n=0}^{\infty} c_n (x-a)^n$ be a power series about x = a. Exactly one of the following holds.

(i) For every x different from a, the series does not converge absolutely.
(ii) There exists a positive real number R such that the series converges absolutely if |x − a| < R and does not converge absolutely if |x − a| > R.
(iii) For every real number x, the series converges absolutely.

The real number R occurring in Case (ii) is called the radius of convergence. By convention, in Case (i) the radius of convergence is defined to be R = 0. By convention, in Case (iii) the radius of convergence is defined to be R = ∞. This allows us to replace the original question by a more precise question.
Question. Given a power series, what is the radius of convergence?
Although there is no single answer to this question, in many interesting cases the ratio or root test
gives an answer.
2. Analytic functions. If the radius of convergence R of a power series $\sum_n c_n (x-a)^n$ is positive, then the power series defines a function on the interval $(a - R, a + R)$,
$$f(x) = \sum_{n=0}^{\infty} c_n (x-a)^n.$$
A function defined in this manner is called an analytic function. This is the real significance of power series: they give important examples of functions that cannot be described in a more direct manner. Analytic functions have nice analytic properties (whence the name). For instance, it is a theorem (proved in 18.100) that an analytic function f(x) is differentiable and the derivative has a power series converging absolutely with the same radius R,
$$f'(x) = \sum_{n=0}^{\infty} c_n\, n (x-a)^{n-1} = \sum_{m=0}^{\infty} (m+1) c_{m+1} (x-a)^m.$$

We can iterate the theorem, i.e., f'(x) is differentiable and f''(x) has a power series converging absolutely with radius R. Iterating k times, the function f(x) is k times differentiable and its kth derivative has a power series,
$$f^{(k)}(x) = \sum_{n=0}^{\infty} \frac{(n+k)!}{n!}\, c_{n+k} (x-a)^n.$$
In particular, every derivative of f(x) is defined. A function with this property is called infinitely differentiable or smooth. Thus, every analytic function is infinitely differentiable.
This is only 1 of many useful properties of analytic functions. Which functions f (x) are analytic
functions? By the last paragraph, if f (x) is analytic, then it is infinitely differentiable. Are there
other restrictions? Can more than 1 power series about x = a give rise to the same analytic
function?
To answer both of these questions, consider the analytic function defined by a power series,
$$f(x) = \sum_{n=0}^{\infty} c_n (x-a)^n.$$
Plugging in x = a gives the equation
$$f(a) = c_0 + c_1(a-a) + c_2(a-a)^2 + \cdots = c_0 + 0 + 0 + \cdots = c_0.$$
Thus the first coefficient of the power series is simply
$$c_0 = f(a).$$
Moreover, from the power series for the kth derivative,
$$f^{(k)}(a) = k!\,c_k + \frac{(k+1)!}{1!} c_{k+1}(a-a) + \frac{(k+2)!}{2!} c_{k+2}(a-a)^2 + \cdots = k!\,c_k + 0 + 0 + \cdots = k!\,c_k.$$
Solving for $c_k$, the kth coefficient of the power series is
$$c_k = f^{(k)}(a)/k!.$$
Therefore, the power series defining f(x) is
$$f(x) = \sum_{n=0}^{\infty} \frac{f^{(n)}(a)}{n!} (x-a)^n.$$
In particular, this series is unique. This answers the second question. Two absolutely convergent power series about x = a give the same analytic function if and only if the power series are themselves equal (i.e., the corresponding coefficients of the 2 series are equal).
Moreover, this gives us a lot of information about the first question. For an infinitely differentiable function f(x) defined at a point x = a, there is a very important power series, the Taylor series expansion of f(x) about x = a,
$$\sum_{n=0}^{\infty} \frac{f^{(n)}(a)}{n!} (x-a)^n.$$
If f(x) is analytic near x = a, then the Taylor series converges absolutely to f(x) on an interval around x = a. This reduces the original question to 2 new questions. Does the Taylor series have a positive radius of convergence? If so, does the analytic function defined in this way equal the original function f(x)?
The radius of convergence question is precisely the radius of convergence question posed earlier. As there, the answer can often be found by using the ratio or root tests. The answer to the second question is yes in every practical case. There are examples of infinitely differentiable functions where the Taylor series has a positive radius of convergence, but does not converge to the original function. However, every such example is somewhat contrived; they rarely come up "in nature". Just for completeness, here is an example of one of these pathological functions,
$$f(x) = \begin{cases} e^{-1/x^2}, & x \ne 0, \\ 0, & x = 0. \end{cases}$$
Every derivative of this function at x = 0 equals 0, so its Taylor series about x = 0 is identically 0 (with radius of convergence R = ∞), yet f(x) > 0 for every x ≠ 0.

3. Algorithm for computing Taylor series. The method for finding the Taylor series of a function is always the same. For definiteness, consider the Taylor series expansion of $f(x) = (1-x)^{-1}$ about the point x = 0.
Step 1. Compute all derivatives of f(x). If this sounds like a lot of work, it is! In most examples, this really comes down to finding an inductive formula for the derivatives of f(x). In the example, the "zeroth derivative" is
$$f(x) = (1-x)^{-1}.$$
The first derivative is
$$f'(x) = -(1-x)^{-2}(-1) = (1-x)^{-2}.$$
The second derivative is
$$f''(x) = (-2)(1-x)^{-3}(-1) = 2(1-x)^{-3}.$$

This begins to suggest a pattern: The kth derivative of f(x) will be
$$f^{(k)}(x) = b_k (1-x)^{-k-1},$$
for some real number $b_k$. Having made this guess, it is easy to verify by induction. By computation, the result is true for k = 0, 1 and 2 with the corresponding real numbers $b_0 = 1$, $b_1 = 1$ and $b_2 = 2$. By way of induction, assume the result is true for k = n, i.e.,
$$f^{(n)}(x) = b_n (1-x)^{-n-1}.$$
Then the (n + 1)st derivative is
$$f^{(n+1)}(x) = \left(f^{(n)}(x)\right)' = \left(b_n (1-x)^{-n-1}\right)' = b_n(-n-1)(1-x)^{-n-2}(-1) = (n+1)\,b_n (1-x)^{-n-2}.$$
Thus the result is also true for k = n + 1, where $b_{n+1}$ satisfies the equation
$$b_{n+1} = (n+1)\,b_n.$$
Thus the result is proved by induction on k.

In fact, more has been accomplished, since now there is an inductive formula for the numbers $b_n$,
$$b_n = n b_{n-1} = n(n-1) b_{n-2} = n(n-1)(n-2) b_{n-3} = \cdots = n(n-1)(n-2)\cdots 3\, b_2 = n(n-1)(n-2)\cdots 3\cdot 2\cdot 1.$$
This number has come up before in this class. It is the nth factorial number,
$$b_n = n!.$$
This gives the final formula for the nth derivative of f(x),
$$f^{(n)}(x) = n!\,(1-x)^{-n-1}.$$

Step 2. Substitute x = a into the derivatives. Compared to the work of finding the derivatives, this is very simple. In the example, plugging in x = 0 gives
$$f^{(n)}(0) = n!.$$
Step 3. Compute the coefficients of the Taylor series. By definition, the nth coefficient of the Taylor series is
$$c_n = \frac{f^{(n)}(a)}{n!}.$$
In the example, this gives the coefficient
$$c_n = \frac{n!}{n!} = 1,$$
for every integer n ≥ 0.
Step 4. Write the Taylor series. This is really getting into the "mind-numbing details". In the example, the Taylor series expansion for $(1-x)^{-1}$ about x = 0 is
$$(1-x)^{-1} = \sum_{n=0}^{\infty} x^n.$$

Step 5. If possible, find the radius of convergence. In the example, the Taylor series is simply the geometric series. By the previous lecture, the geometric series converges absolutely with radius R = 1. Moreover, it converges absolutely to $(1-x)^{-1}$. Notice, this gives another explanation for the radius R = 1. Since $(1-x)^{-1}$ has a vertical asymptote at x = 1, the Taylor series cannot converge on any interval that contains x = 1. The largest interval centered at x = 0 not containing x = 1 is the interval (−1, 1). This interval has radius R = 1.
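The five steps can be mirrored numerically. The Python sketch below (the helper names are hypothetical, chosen for illustration) builds $b_n$ from the recursion $b_{n+1} = (n+1)b_n$, checks it against $n!$, forms the coefficients $c_n = f^{(n)}(0)/n!$, and compares a truncated Taylor series with $(1-x)^{-1}$ at a point inside the radius of convergence.

```python
import math

def derivative_constants(n_max):
    """b_0, ..., b_{n_max} from b_0 = 1 and b_{n+1} = (n + 1) * b_n (Step 1)."""
    b = [1]
    for n in range(n_max):
        b.append((n + 1) * b[n])
    return b

def taylor_coefficients_at_0(n_max):
    """c_n = f^(n)(0) / n!, using f^(n)(0) = b_n (Steps 2 and 3)."""
    b = derivative_constants(n_max)
    return [b[n] / math.factorial(n) for n in range(n_max + 1)]

def taylor_partial_sum(coeffs, x):
    """c_0 + c_1 x + ... + c_N x^N (Step 4, truncated)."""
    return sum(c * x ** n for n, c in enumerate(coeffs))

coeffs = taylor_coefficients_at_0(60)
print(coeffs[:5])                                   # [1.0, 1.0, 1.0, 1.0, 1.0]
print(taylor_partial_sum(coeffs, 0.5), 1 / (1 - 0.5))
```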
4. More examples. What is the Taylor series expansion for $(1-x)^{-1}$ about a point x = a different from x = 1? The fortunate fact is that Step 1 allows us to compute the derivatives $f^{(n)}(a)$ for any a ≠ 1, not just at x = 0. This is the typical case, and it is one justification for doing the work necessary in Step 1. In this case, the answer is
$$f^{(n)}(a) = n!\,(1-a)^{-n-1}.$$
Therefore, according to Step 3, the nth coefficient in the Taylor series expansion is
$$c_n = \frac{n!\,(1-a)^{-n-1}}{n!} = (1-a)^{-n-1}.$$
Thus, according to Step 4, the Taylor series expansion for $(1-x)^{-1}$ about x = a is
$$(1-x)^{-1} = \sum_{n=0}^{\infty} (1-a)^{-n-1} (x-a)^n.$$

What is the radius of convergence? The ratio of the (n + 1)st and nth terms of the series is
$$\frac{(1-a)^{-n-2}(x-a)^{n+1}}{(1-a)^{-n-1}(x-a)^n} = (1-a)^{-1}(x-a).$$
This is independent of n. Thus, this constant sequence converges to its constant value $(1-a)^{-1}(x-a)$. By the ratio test, the series is absolutely convergent if this limit has absolute value less than 1,
$$|(1-a)^{-1}(x-a)| < 1,$$
and diverges if it has absolute value greater than 1. (When the absolute value equals 1, every term has absolute value $|1-a|^{-1}$, so the terms do not tend to 0 and the series diverges.) Rearranging, the series converges if and only if
$$|x-a| < |1-a|.$$
Thus the radius of convergence is
$$R = |1-a|.$$
This is perfectly reasonable. The function $(1-x)^{-1}$ has a vertical asymptote at x = 1. Therefore, the power series cannot converge on any interval containing x = 1. The largest interval centered at x = a not containing x = 1 has radius equal to the distance from x = a to x = 1, namely R = |1 − a|.
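As a quick numerical illustration (Python; the center a = 3 and the test points are arbitrary choices, not from the notes), the partial sums of $\sum_n (1-a)^{-n-1}(x-a)^n$ approach $1/(1-x)$ when $|x - a| < |1 - a|$ and blow up when $|x - a| > |1 - a|$.

```python
def shifted_partial_sum(a, x, n_terms):
    """Sum of (1 - a)^(-n-1) * (x - a)^n for n = 0, ..., n_terms - 1."""
    return sum((1 - a) ** (-n - 1) * (x - a) ** n for n in range(n_terms))

a = 3.0                       # radius of convergence R = |1 - a| = 2
for x in (2.0, 4.5):          # |x - a| = 1 and 1.5, both less than R
    print(x, shifted_partial_sum(a, x, 200), 1 / (1 - x))
print(shifted_partial_sum(a, 6.0, 50))  # |x - a| = 3 > R: partial sums blow up
```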
Example 2. For the next example, consider the Taylor series expansion for $f(x) = e^x$ near x = 0. In this case, Step 1 is simple. Every derivative of f(x) is simply
$$f^{(n)}(x) = e^x.$$
Therefore, the nth coefficient of the Taylor series expansion is
$$c_n = \frac{f^{(n)}(0)}{n!} = \frac{e^0}{n!} = \frac{1}{n!}.$$
Therefore the Taylor series expansion is
$$\sum_{n=0}^{\infty} \frac{1}{n!} x^n.$$
Observe this is the power series considered earlier in the lecture, whose radius of convergence is R = ∞. Therefore, for every x, the power series converges absolutely to $e^x$,
$$e^x = \sum_{n=0}^{\infty} x^n/n!.$$
This equation is sometimes taken as the definition of $e^x$. It has certain advantages over our original definition of $e^x$. Importantly, it is easy for a computer to determine $e^x$ to very high precision using this formula.
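To illustrate the last remark, here is a minimal Python sketch that sums $x^n/n!$ until the terms become negligible and compares the result with the standard-library value `math.exp`. The tolerance and stopping rule are our own illustrative choices. Real math libraries typically use more refined methods than this plain sum, which also loses accuracy for large negative x because of cancellation.

```python
import math

def exp_series(x, tol=1e-17):
    """Approximate e^x by summing x^n / n! until the next term is negligible."""
    term, total, n = 1.0, 1.0, 0      # term holds x^n / n!, starting at n = 0
    while abs(term) > tol * abs(total):
        n += 1
        term *= x / n                 # x^n/n! = (x^(n-1)/(n-1)!) * (x/n)
        total += term
    return total

for x in (0.5, 1.0, 5.0):
    print(x, exp_series(x), math.exp(x))
```

Updating `term` from the previous term avoids computing large powers and factorials separately.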
Example 3. Having computed the Taylor series expansion for $e^x$ about x = 0, the next question is to compute the Taylor series expansion for $e^x$ about x = a. According to the formula,
$$f^{(n)}(a) = e^a,$$
and thus the coefficient is
$$c_n = e^a/n!.$$
This gives the Taylor series expansion for $e^x$ about x = a,
$$\sum_{n=0}^{\infty} \frac{e^a}{n!} (x-a)^n.$$
As above, the radius of convergence is R = ∞. Thus, for every real number x, the power series converges absolutely to $e^x$,
$$e^x = \sum_{n=0}^{\infty} e^a (x-a)^n/n!.$$
On the other hand, we didn't need to do any extra work to see this. We could have used the formula
$$e^x = e^{a + (x-a)} = e^a e^{x-a}.$$
Plugging in x − a for x in the power series expansion for $e^x$ gives the power series expansion
$$e^{x-a} = \sum_{n=0}^{\infty} \frac{1}{n!} (x-a)^n.$$
This gives the same Taylor series expansion as above,
$$e^x = e^a \sum_{n=0}^{\infty} \frac{1}{n!} (x-a)^n = \sum_{n=0}^{\infty} (e^a/n!)(x-a)^n.$$
Example 4. Consider the function f(x) = sin(x). The derivatives of f(x) are
$$f(x) = \sin(x), \quad f'(x) = \cos(x), \quad f''(x) = -\sin(x), \quad f^{(3)}(x) = -\cos(x), \quad f^{(n+4)}(x) = f^{(n)}(x).$$
Together, these give all the derivatives of f(x). Write n = 4l, 4l + 1, 4l + 2 or 4l + 3 for some nonnegative integer l. Then the rules above give
$$f^{(n)}(x) = \begin{cases} \sin(x), & n = 4l, \\ \cos(x), & n = 4l+1, \\ -\sin(x), & n = 4l+2, \\ -\cos(x), & n = 4l+3. \end{cases}$$
In particular, plugging in x = 0 gives
$$f^{(n)}(0) = \begin{cases} 0, & n = 4l, \\ 1, & n = 4l+1, \\ 0, & n = 4l+2, \\ -1, & n = 4l+3. \end{cases}$$
Thus, all the even coefficients of the Taylor series are 0. For an odd coefficient, say n = 2m + 1, the derivative is
$$f^{(2m+1)}(0) = (-1)^m.$$
Therefore, the coefficient is
$$c_{2m+1} = \frac{(-1)^m}{(2m+1)!}.$$
Plugging this in gives the Taylor series expansion for sin(x) about x = 0,
$$\sum_{m=0}^{\infty} \frac{(-1)^m}{(2m+1)!} x^{2m+1}.$$
The ratio of consecutive terms in the series is
$$\frac{(-1)^{m+1}x^{2m+3}/(2m+3)!}{(-1)^m x^{2m+1}/(2m+1)!} = \frac{-x^2}{(2m+2)(2m+3)} = \frac{-x^2}{4m^2 + 10m + 6}.$$
This sequence converges to 0. Therefore, by the ratio test, the power series converges absolutely to sin(x) for every choice of x,
$$\sin(x) = \sum_{m=0}^{\infty} \frac{(-1)^m}{(2m+1)!} x^{2m+1}.$$
There is a similar formula for g(x) = cos(x),
$$g^{(n)}(x) = \begin{cases} \cos(x), & n = 4l, \\ -\sin(x), & n = 4l+1, \\ -\cos(x), & n = 4l+2, \\ \sin(x), & n = 4l+3. \end{cases}$$
This gives the values
$$g^{(n)}(0) = \begin{cases} 1, & n = 4l, \\ 0, & n = 4l+1, \\ -1, & n = 4l+2, \\ 0, & n = 4l+3. \end{cases}$$
Therefore the Taylor series is
$$\cos(x) = \sum_{m=0}^{\infty} \frac{(-1)^m}{(2m)!} x^{2m}.$$
Notice, we didn't really need to do this work. Since cos(x) is the derivative of sin(x), the Taylor series for cos(x) is simply the term-by-term derivative of the Taylor series for sin(x). This gives the same formula as above.
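A short Python sketch comparing truncated versions of these two series with the standard-library functions `math.sin` and `math.cos`. The cutoff of 20 terms is an arbitrary choice that is ample for moderate |x|.

```python
import math

def sin_series(x, terms=20):
    """Sum of (-1)^m x^(2m+1) / (2m+1)! for m = 0, ..., terms - 1."""
    return sum((-1) ** m * x ** (2 * m + 1) / math.factorial(2 * m + 1)
               for m in range(terms))

def cos_series(x, terms=20):
    """Sum of (-1)^m x^(2m) / (2m)! for m = 0, ..., terms - 1."""
    return sum((-1) ** m * x ** (2 * m) / math.factorial(2 * m)
               for m in range(terms))

for x in (0.1, 1.0, 3.0):
    print(x, sin_series(x), math.sin(x), cos_series(x), math.cos(x))
```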
To compute the Taylor series expansions of sin(x) and cos(x) about a point x = a, we can follow the procedure above. However, it is much faster to use the angle addition formulas,
$$\sin(x) = \sin(a + (x-a)) = \cos(a)\sin(x-a) + \sin(a)\cos(x-a),$$
$$\cos(x) = \cos(a + (x-a)) = \cos(a)\cos(x-a) - \sin(a)\sin(x-a).$$
This gives the Taylor series expansions
$$\sin(x) = \sum_{m=0}^{\infty} \frac{(-1)^m \cos(a)}{(2m+1)!}(x-a)^{2m+1} + \sum_{m=0}^{\infty} \frac{(-1)^m \sin(a)}{(2m)!}(x-a)^{2m},$$
$$\cos(x) = \sum_{m=0}^{\infty} \frac{(-1)^m \cos(a)}{(2m)!}(x-a)^{2m} + \sum_{m=0}^{\infty} \frac{(-1)^{m+1} \sin(a)}{(2m+1)!}(x-a)^{2m+1}.$$
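These expansions about x = a can be checked numerically in the same way. In the Python sketch below, the centers and evaluation points are arbitrary illustrative values, and each sum is truncated after 20 terms.

```python
import math

def sin_about(a, x, terms=20):
    """Truncated Taylor series of sin about x = a, via the angle addition formula."""
    h = x - a
    odd = sum((-1) ** m * math.cos(a) * h ** (2 * m + 1) / math.factorial(2 * m + 1)
              for m in range(terms))
    even = sum((-1) ** m * math.sin(a) * h ** (2 * m) / math.factorial(2 * m)
               for m in range(terms))
    return odd + even

def cos_about(a, x, terms=20):
    """Truncated Taylor series of cos about x = a, via the angle addition formula."""
    h = x - a
    even = sum((-1) ** m * math.cos(a) * h ** (2 * m) / math.factorial(2 * m)
               for m in range(terms))
    odd = sum((-1) ** (m + 1) * math.sin(a) * h ** (2 * m + 1) / math.factorial(2 * m + 1)
              for m in range(terms))
    return even + odd

print(sin_about(2.0, 2.7), math.sin(2.7))
print(cos_about(2.0, 2.7), math.cos(2.7))
```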

Lecture 32. December 9, 2005

Practice Problems. Course Reader: RI.
1. Using power series to solve calculus problems. Power series are useful because they allow us to describe functions that have no direct description. For instance, consider the function
$$f(x) = \int_0^x e^{-t^2}\,dt,$$
for x ≥ 0. By the Fundamental Theorem of Calculus, this function exists and is differentiable with derivative $f'(x) = e^{-x^2}$. Unfortunately, there is no simple expression for f(x) involving only polynomials, trigonometric functions, exponential functions and logarithms (the proof of this is far beyond the scope of this class). However, it is quite easy to write down a power series expansion for f(x). First of all, the Taylor series for $e^{-t^2}$ about t = 0 is obtained by substituting $x = -t^2$ in the Taylor series for $e^x$ about x = 0,
$$e^{-t^2} = \sum_{n=0}^{\infty} (-1)^n t^{2n}/n!.$$
Because this series converges absolutely, the integral of the series is the series of the term-by-term integrals,
$$f(x) = \int_0^x e^{-t^2}\,dt = \int_0^x \sum_{n=0}^{\infty} \frac{(-1)^n}{n!} t^{2n}\,dt = \sum_{n=0}^{\infty} \int_0^x \frac{(-1)^n}{n!} t^{2n}\,dt.$$
Each of these integrals can be computed quite easily. This gives
$$f(x) = \sum_{n=0}^{\infty} \frac{(-1)^n}{(2n+1)\cdot n!}\, x^{2n+1}.$$
This is the Taylor series expansion for f(x) about x = 0. For instance, using this series, it is easy to estimate
$$\int_0^1 e^{-t^2}\,dt \approx 0.747 \pm 10^{-3}.$$
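Here is a minimal Python sketch of that estimate. It sums the first few terms of the series at x = 1 and compares the result with $(\sqrt{\pi}/2)\,\mathrm{erf}(1)$, computed with the standard-library function `math.erf`. The identity $\int_0^x e^{-t^2}dt = (\sqrt{\pi}/2)\,\mathrm{erf}(x)$ is the usual definition of the error function, which the notes do not mention. At x = 1 the series alternates with decreasing terms, so the error after stopping is at most the first omitted term.

```python
import math

def gaussian_integral_series(x, n_terms):
    """Partial sum of sum_{n>=0} (-1)^n x^(2n+1) / ((2n+1) n!)."""
    return sum((-1) ** n * x ** (2 * n + 1) / ((2 * n + 1) * math.factorial(n))
               for n in range(n_terms))

exact = math.sqrt(math.pi) / 2 * math.erf(1.0)   # integral of e^(-t^2) from 0 to 1
for n_terms in (3, 5, 10):
    approx = gaussian_integral_series(1.0, n_terms)
    print(n_terms, approx, abs(approx - exact))
```

With 5 terms the error is already below $10^{-3}$, consistent with the estimate $0.747 \pm 10^{-3}$.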

2. Taylor series with remainder term. As demonstrated by the computation just done, in reality only finitely many terms in a Taylor series are used. What can be said in this case? In other words, how quickly does the series converge? How large is the remainder after n terms? To make all this precise, introduce the function $R_{N,a}(x)$ defined to be
$$R_{N,a}(x) = f(x) - \sum_{n=0}^{N} \frac{f^{(n)}(a)}{n!} (x-a)^n.$$
This is precisely the remainder term so that we have
$$f(x) = \sum_{n=0}^{N} \frac{f^{(n)}(a)}{n!} (x-a)^n + R_{N,a}(x).$$
The precise version of the questions above is: what bounds exist for $R_{N,a}(x)$?
To understand the answer, consider the simplest case where N = 0. Then the remainder term is simply
$$R_{0,a}(x) = f(x) - f(a).$$
By the Mean Value Theorem, for every x there exists a real number c (depending on x) between a and x such that
$$R_{0,a}(x) = f'(c)(x-a).$$
Iterating the Mean Value Theorem, for every integer N, for every x, there exists a real number c (depending on both N and x) between a and x such that
$$R_{N,a}(x) = f^{(N+1)}(c)\,(x-a)^{N+1}/(N+1)!.$$
In particular, if we can bound the (N + 1)st derivative of f(x) on the interval between a and x, then we can bound $R_{N,a}(x)$.
Example. Bound the remainder in the Taylor series expansion for $e^x$ about x = a. The (N + 1)st derivative is simply $e^x$. Therefore, a bound for $f^{(N+1)}(c)$ for c between a and x is simply
$$M = e^{\max(a,x)}.$$
This is independent of N. The bound on the remainder term is then
$$|R_{N,a}(x)| \le M\,|x-a|^{N+1}/(N+1)!.$$
By choosing N suitably large, we can make this remainder term as small as we like. For instance, if we want to compute $e^x$ for x in the interval (−1, 1) using the expansion about a = 0, then M is at most e. To make the remainder term less than $10^{-10}$, it suffices to take N = 13, since $e/14! \approx 3.1 \times 10^{-11}$ (whereas $e/13! \approx 4.4 \times 10^{-10}$ is too large).
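The choice of N can be found by a direct search. The Python sketch below (the function name is ours) returns the smallest N for which the bound $M\,r^{N+1}/(N+1)!$ drops below a tolerance, where r bounds |x − a|. With M = e, r = 1 and tolerance $10^{-10}$ it reports N = 13.

```python
import math

def smallest_n_for_tolerance(m_bound, r, tol):
    """Smallest N with m_bound * r^(N+1) / (N+1)! < tol (Lagrange remainder bound)."""
    n = 0
    while m_bound * r ** (n + 1) / math.factorial(n + 1) >= tol:
        n += 1
    return n

N = smallest_n_for_tolerance(math.e, 1.0, 1e-10)
print(N)                                                # 13
approx = sum(1.0 / math.factorial(k) for k in range(N + 1))
print(abs(approx - math.e))                             # actual error at x = 1
```

The actual error at x = 1 is smaller than the bound, since the bound uses the worst case M = e over the whole interval.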
3. Review problems. Each of the following problems was discussed in lecture. Here are the
problems and answers, without the discussion.
Problem 1. Let a and b be positive real numbers. There are 2 tangent lines to the ellipse with equation
$$\frac{x^2}{a^2} + \frac{y^2}{b^2} = 1,$$
containing the point (a, b). Find the equations of each of these tangent lines.
The 2 tangent lines are the line tangent to the ellipse at (x, y) = (0, b) and the line tangent to the ellipse at (x, y) = (a, 0). The equations of these lines are
$$y = b,$$
and
$$x = a.$$

Problem 2. A grain silo is designed by attaching a cylinder of fixed radius r and height a directly above a right circular cone of base radius r and height b. The silo has no top, and there is no bottom between the bottom of the cylinder and the top of the cone. For a fixed volume V, what choice of b minimizes the surface area of the grain silo?
The choice of b minimizing the surface area is
$$b = 2\sqrt{5}\,r/5.$$
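The answer can be sanity-checked numerically. Under the setup above (our own derivation, not spelled out in the notes), the volume constraint $V = \pi r^2 a + \pi r^2 b/3$ gives $a = V/(\pi r^2) - b/3$. The surface area is the cylinder's lateral area $2\pi r a$ plus the cone's lateral area $\pi r\sqrt{r^2 + b^2}$. The Python sketch below uses r = 1 and V = 20, arbitrary values chosen so that a stays positive, scans b on a fine grid, and compares the minimizer with $2\sqrt{5}\,r/5$.

```python
import math

def silo_surface_area(b, r, volume):
    """Lateral area of cylinder plus cone, with the cylinder height a fixed by the volume."""
    a = volume / (math.pi * r ** 2) - b / 3          # from V = pi r^2 a + pi r^2 b / 3
    return 2 * math.pi * r * a + math.pi * r * math.sqrt(r ** 2 + b ** 2)

r, volume = 1.0, 20.0
grid = [k / 10000 for k in range(1, 30001)]           # b from 0.0001 to 3.0
best_b = min(grid, key=lambda b: silo_surface_area(b, r, volume))
print(best_b, 2 * math.sqrt(5) * r / 5)
```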

Problem 3. Compute the volume of the solid obtained by revolving about the x-axis the region in the first quadrant bounded by the curve $y = x^2$ and the curve $x = y^2$.
The volume of this solid is
$$\text{Volume} = 3\pi/10.$$

Problem 4. Using a trigonometric substitution and a trigonometric identity, compute the antiderivative
$$\int \frac{\sqrt{1-x^2}}{x^2}\,dx.$$
The antiderivative equals
$$\int \frac{\sqrt{1-x^2}}{x^2}\,dx = -\frac{\sqrt{1-x^2}}{x} - \sin^{-1}(x) + C.$$
x

Problem 5. Using integration by parts, compute the following antiderivative,
$$\int x \sin^{-1}(x)\,dx.$$
The antiderivative equals
$$\int x \sin^{-1}(x)\,dx = -\tfrac{1}{4}(1 - 2x^2)\sin^{-1}(x) + \tfrac{1}{4}x\sqrt{1-x^2} + C.$$
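A quick way to check antiderivatives like those in Problems 4 and 5 is to differentiate them numerically. The Python sketch below uses a central difference quotient (the step size is an ad hoc choice) at a few points in (0, 1) and compares it with the integrand. The names `F4` and `F5` are illustrative.

```python
import math

def central_difference(F, x, h=1e-6):
    """Approximate F'(x) by (F(x + h) - F(x - h)) / (2h)."""
    return (F(x + h) - F(x - h)) / (2 * h)

def F4(x):  # antiderivative from Problem 4, with C = 0
    return -math.sqrt(1 - x ** 2) / x - math.asin(x)

def F5(x):  # antiderivative from Problem 5, with C = 0
    return -0.25 * (1 - 2 * x ** 2) * math.asin(x) + 0.25 * x * math.sqrt(1 - x ** 2)

for x in (0.2, 0.5, 0.8):
    print(x,
          central_difference(F4, x), math.sqrt(1 - x ** 2) / x ** 2,
          central_difference(F5, x), x * math.asin(x))
```

In each row, the numerical derivative should match the corresponding integrand to about six decimal places.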

No ratings yet
Calculus-I-Guided-Notes Download
133 pages
Calculus I Notes
No ratings yet
Calculus I Notes
95 pages
Calculus I, Notes: January 2007
No ratings yet
Calculus I, Notes: January 2007
95 pages
18.01 Single Variable Calculus: Mit Opencourseware
No ratings yet
18.01 Single Variable Calculus: Mit Opencourseware
41 pages
Wood, Bailey: Elementary Calculus
100% (3)
Wood, Bailey: Elementary Calculus
326 pages
Quick Calculus Theory - Emery
No ratings yet
Quick Calculus Theory - Emery
127 pages
Calculus I Notes
No ratings yet
Calculus I Notes
116 pages
Differential Calculus
No ratings yet
Differential Calculus
7 pages
(강의노트) 1장 - Limits and Differentiation Rules
No ratings yet
(강의노트) 1장 - Limits and Differentiation Rules
18 pages
Ap Mock
No ratings yet
Ap Mock
30 pages
Multivariable Calculus CAPTER 3
No ratings yet
Multivariable Calculus CAPTER 3
18 pages
Lectures on Integral Equations
From Everand
Lectures on Integral Equations
Harold Widom
4.5/5 (2)
MAsample
No ratings yet
MAsample
27 pages
BTech-ME Curriculum-Syllabi 2018
No ratings yet
BTech-ME Curriculum-Syllabi 2018
113 pages
Differentiation Worksheet HSC Questions W/ Solutions
No ratings yet
Differentiation Worksheet HSC Questions W/ Solutions
10 pages
Ap Calculus Full Length Practice Test and Reflection Paper Guidelines
No ratings yet
Ap Calculus Full Length Practice Test and Reflection Paper Guidelines
5 pages
Mathematics For Microeconomics: Maximization of A Function of One Variable
No ratings yet
Mathematics For Microeconomics: Maximization of A Function of One Variable
35 pages
NM 2 2011 2012 02 Notes PDF
No ratings yet
NM 2 2011 2012 02 Notes PDF
17 pages
Sample Paper Syllabus 2020-21: Class
No ratings yet
Sample Paper Syllabus 2020-21: Class
2 pages
Calculus: Chapter 6 Transcendental Functions
No ratings yet
Calculus: Chapter 6 Transcendental Functions
82 pages
7 For Math For Econ Fall2021
No ratings yet
7 For Math For Econ Fall2021
16 pages
Partial Differential Equations Thesis
100% (3)
Partial Differential Equations Thesis
7 pages
Simulating Coupled Heat Transfer and Fluid Flow in Comsol: AC 274 Tutorial (Fall 2017)
No ratings yet
Simulating Coupled Heat Transfer and Fluid Flow in Comsol: AC 274 Tutorial (Fall 2017)
14 pages
Sympy Tutorial
No ratings yet
Sympy Tutorial
12 pages
Differentiation RMyNNyN8B8ND7X2c
No ratings yet
Differentiation RMyNNyN8B8ND7X2c
79 pages
Differentialequations, Dynamicalsystems, and Anintroduction Tochaos
100% (2)
Differentialequations, Dynamicalsystems, and Anintroduction Tochaos
416 pages
Differential Equations
No ratings yet
Differential Equations
17 pages
BSARCHI Courseoutline 20182019
No ratings yet
BSARCHI Courseoutline 20182019
39 pages
The Calculus AB Bible
No ratings yet
The Calculus AB Bible
14 pages
Week 1-5 - Calculus 1 Module
No ratings yet
Week 1-5 - Calculus 1 Module
30 pages
Combined Mathematics: Structure of The Question Paper
No ratings yet
Combined Mathematics: Structure of The Question Paper
18 pages
Immediate Download Bird S Higher Engineering Mathematics 9th Edition John Bird Ebooks 2024
No ratings yet
Immediate Download Bird S Higher Engineering Mathematics 9th Edition John Bird Ebooks 2024
49 pages
(BS Fisheries) Math 4 - Calculus
No ratings yet
(BS Fisheries) Math 4 - Calculus
8 pages
Unit 3 Assignments
No ratings yet
Unit 3 Assignments
1 page
Chen 801-T2
No ratings yet
Chen 801-T2
27 pages
Mathematics For Economics: Euncheol Shin
No ratings yet
Mathematics For Economics: Euncheol Shin
14 pages
Polytechnic First Year Syllabus: Semester I
No ratings yet
Polytechnic First Year Syllabus: Semester I
25 pages
Report Calculus 2
No ratings yet
Report Calculus 2
5 pages
Syllabus Cbse 12 2023-24
No ratings yet
Syllabus Cbse 12 2023-24
14 pages
Electrical Engineering
No ratings yet
Electrical Engineering
80 pages
Electrical Engg
No ratings yet
Electrical Engg
53 pages

Single Variable Notes

Uploaded by

Single Variable Notes

Uploaded by

18.01 Calculus Jason Starr

Math 18.01 Lecture Summaries

Lecture 2. Sept. 9 Limits

Lecture 3. Sept. 13 Rules of differentiation

Lecture 4. Sept. 15 The chain rule and implicit differentiation

Lecture 5. Sept. 16 The derivatives of exponential and logarithm functions

Lecture 6. Sept. 20 The derivatives of trigonometric functions

Lecture 7. Sept. 22 Review for Exam 1

Lecture 8. Sept. 27 Linear and quadratic approximations

Lecture 9. Sept. 29 Sketching curves

Lecture 10. Sept. 30 Applied maximum/minimum problems

Lecture 11. Oct. 4 Related rates problems

Lecture 12. Oct. 6 Newton’s method

Lecture 13. Oct. 13 Antidifferentiation

Lecture 14. Oct. 14 Riemann integrals

Lecture 15. Oct. 18 The Fundamental Theorem of Calculus

Lecture 16. Oct. 20 Properties of the Riemann integral

Lecture 17. Oct. 21 Separable ordinary differential equations

Lecture 18. Oct. 25 Numerical integration

Lecture 19. Oct. 28 Applications of integration to volumes

Lecture 20. Nov. 1 Averages and volumes by shells

Lecture 21. Nov. 3 Parametric equation curves and arc length

Lecture 24. Nov. 15 Inverse trigonometric functions and hyperbolic functions

Lecture 25. Nov. 17 Inverse hyperbolic functions and inverse substitution

Lecture 26. Nov. 18 Partial fraction decomposition

Lecture 27. Nov. 22 Integration by parts

Lecture 28. Dec. 1 L’Hospital’s rule

Lecture 1. September 8, 2005

Homework. Problem Set 1 Part I: (a)–(e); Part II: Problems 1 and 2.

Practice Problems. Course Reader: 1B-1, 1B-2.

Textbook: p. 68, Problems 1–7 and 15.

1. Velocity. Displacement is $s(t)$. The increment from $t_0$ to $t_0 + \Delta t$ is

$$\Delta s = s(t_0 + \Delta t) - s(t_0).$$

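The average velocity from $t_0$ to $t_0 + \Delta t$ is

$$\frac{\Delta s}{\Delta t} = \frac{s(t_0 + \Delta t) - s(t_0)}{\Delta t}.$$

The velocity at $t_0$ is the limit of the average velocity as $\Delta t \to 0$,

$$v(t_0) = \lim_{\Delta t \to 0} \frac{\Delta s}{\Delta t}.$$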

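Example. For $s(t) = -5t^2 + 20t$, first computed the velocity at $t = 1$,

$$v(1) = \lim_{\Delta t \to 0} \frac{s(1 + \Delta t) - s(1)}{\Delta t} = \lim_{\Delta t \to 0} (10 - 5\Delta t) = 10.$$

Then computed the velocity at $t = t_0$,

$$v(t_0) = \lim_{\Delta t \to 0} (-10t_0 + 20 - 5\Delta t) = -10t_0 + 20.$$

Finally, computed the acceleration at $t = t_0$,

$$a(t_0) = \lim_{\Delta t \to 0} \frac{v(t_0 + \Delta t) - v(t_0)}{\Delta t} = \lim_{\Delta t \to 0} (-10) = -10.$$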
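As a quick check of the example, the sketch below evaluates the average velocity of $s(t) = -5t^2 + 20t$ with exact rational arithmetic (Python's `fractions.Fraction`) and confirms that it equals $10 - 5\Delta t$ at $t = 1$, so it approaches $v(1) = 10$. The function names `s`, `average_velocity` and `velocity` are illustrative choices, not part of the notes.

```python
from fractions import Fraction


def s(t):
    """Displacement s(t) = -5t^2 + 20t from the example."""
    return -5 * t**2 + 20 * t


def average_velocity(t0, dt):
    """Average velocity (s(t0 + dt) - s(t0)) / dt over [t0, t0 + dt]."""
    return (s(t0 + dt) - s(t0)) / dt


def velocity(t0):
    """The limit computed in the notes: v(t0) = -10*t0 + 20."""
    return -10 * t0 + 20


# With exact fractions the quotient at t0 = 1 is exactly 10 - 5*dt,
# so it approaches v(1) = 10 as dt shrinks.
for k in range(1, 5):
    dt = Fraction(1, 10**k)
    q = average_velocity(Fraction(1), dt)
    assert q == 10 - 5 * dt
    print(f"dt = {dt}: average velocity = {q}")

# The difference quotient of v itself is constant, matching a(t0) = -10.
t0, dt = Fraction(3, 2), Fraction(1, 100)
print((velocity(t0 + dt) - velocity(t0)) / dt)
```

Exact fractions are used so that the identity with $10 - 5\Delta t$ can be checked with `==` rather than a tolerance.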

2. Derivative. Let $y = f(x)$ be a dependent variable depending on an independent variable $x$. The increment of $y$ from $x_0$ to $x_0 + \Delta x$ is

$$\Delta y = f(x_0 + \Delta x) - f(x_0).$$

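The difference quotient, or average rate of change, of $y$ from $x_0$ to $x_0 + \Delta x$ is

$$\frac{\Delta y}{\Delta x} = \frac{f(x_0 + \Delta x) - f(x_0)}{\Delta x}.$$

The derivative of $y$ at $x_0$ is the limit of the difference quotient as $\Delta x \to 0$,

$$f'(x_0) = \lim_{\Delta x \to 0} \frac{\Delta y}{\Delta x} = \lim_{\Delta x \to 0} \frac{f(x_0 + \Delta x) - f(x_0)}{\Delta x}.$$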
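The definition translates directly into a numerical approximation: evaluate the difference quotient for a sequence of shrinking $\Delta x$ and watch the values settle. This is a floating-point sketch, so $\Delta x$ cannot be taken arbitrarily small; somewhere below about $10^{-8}$, rounding error in $f(x_0 + \Delta x) - f(x_0)$ starts to dominate. The helper names and the test function $f(x) = x^3$ are hypothetical choices for illustration.

```python
def difference_quotient(f, x0, dx):
    """Average rate of change (f(x0 + dx) - f(x0)) / dx."""
    return (f(x0 + dx) - f(x0)) / dx


def difference_quotients(f, x0, steps=6):
    """Difference quotients for dx = 10**-1, 10**-2, ..., 10**-steps."""
    return [difference_quotient(f, x0, 10.0**-k) for k in range(1, steps + 1)]


# Example: f(x) = x**3 at x0 = 2, whose derivative is 3 * 2**2 = 12.
# Algebraically the quotient here is 12 + 6*dx + dx**2.
for q in difference_quotients(lambda x: x**3, 2.0):
    print(q)
```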

3. Examples in science and math.

Under isothermal conditions, $T$ is a constant $T_0$, so that,

Under adiabatic conditions (i.e., no transfer of heat), $pV^{\gamma}$ is a constant $K$. Using this to

(iv) Geometry. The volume of a right circular cone is,

for some constant $c$. Since $A = \pi r^2$, this gives,

Lecture 2. September 9, 2005

Homework. Problem Set 1 Part I: (f)–(h); Part II: Problem 3.

Practice Problems. Course Reader: 1C-2, 1C-3, 1C-4, 1D-3, 1D-5.

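Example. For the parabola $y = x^2$, the derivative at $x_0$ is

$$\lim_{\Delta x \to 0} \frac{(x_0 + \Delta x)^2 - x_0^2}{\Delta x} = \lim_{\Delta x \to 0} (2x_0 + \Delta x) = 2x_0.$$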

The equation of the tangent line at $(x_0, x_0^2)$ is

$$y = x_0^2 + 2x_0(x - x_0) = 2x_0 x - x_0^2.$$
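The tangent-line formula can be checked with integer arithmetic. The sketch below (hypothetical helpers `tangent_line` and `gap`) returns the slope $2x_0$ and intercept $-x_0^2$, and verifies two facts that follow from the formula: the line meets the parabola at $x = x_0$, and the vertical gap between them is $x^2 - (2x_0 x - x_0^2) = (x - x_0)^2$, which is never negative, so the line never rises above the parabola.

```python
def tangent_line(x0):
    """Slope and intercept of the tangent to y = x**2 at x = x0:
    y = 2*x0*x - x0**2."""
    return 2 * x0, -(x0**2)


def gap(x0, x):
    """Height of the parabola above its tangent line at x0, evaluated at x."""
    m, b = tangent_line(x0)
    return x**2 - (m * x + b)


for x0 in range(-3, 4):
    assert gap(x0, x0) == 0  # the line touches the parabola at x0
    for x in range(-5, 6):
        # The gap is the perfect square (x - x0)**2, hence never negative.
        assert gap(x0, x) == (x - x0) ** 2
print(tangent_line(3))  # (6, -9)
```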

3. Continuity. A function $f(x)$ is continuous at $x_0$ if $f(x_0)$ is defined, $\lim_{x \to x_0} f(x)$ exists, and $\lim_{x \to x_0} f(x) = f(x_0)$.

Lecture 3. September 13, 2005

Homework. Problem Set 1 Part I: (i) and (j).

Practice Problems. Course Reader: 1E-1, 1E-3, 1E-5.

2. The binomial theorem. For a positive integer $n$, the factorial is

$$n! = n \cdot (n - 1) \cdots 2 \cdot 1.$$

By the definition of the $(n + 1)$st power of a number,

By the induction hypothesis, the second factor can be replaced,

Summing in columns gives,

Using Pascal’s formula, this simplifies to,
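Pascal's formula $\binom{n+1}{k} = \binom{n}{k-1} + \binom{n}{k}$ is what makes the column-summing step collapse. The sketch below builds Pascal's triangle row by row using only that formula, then checks the rows against the factorial expression $n!/(k!\,(n-k)!)$ and against the expansion of $(a+b)^n$. It assumes the standard statement of the binomial theorem, $(a+b)^n = \sum_{k=0}^{n} \binom{n}{k} a^{n-k} b^k$; the symbols $a, b$ and the helper names are illustrative.

```python
from math import factorial


def pascal_rows(n_max):
    """Rows 0..n_max of Pascal's triangle. Each new row comes from the
    previous one via Pascal's formula C(n+1, k) = C(n, k-1) + C(n, k)."""
    rows = [[1]]
    for n in range(n_max):
        padded = [0] + rows[-1] + [0]  # treat C(n, -1) and C(n, n+1) as 0
        rows.append([padded[k] + padded[k + 1] for k in range(n + 2)])
    return rows


def binomial_from_factorials(n, k):
    """C(n, k) = n! / (k! (n-k)!)."""
    return factorial(n) // (factorial(k) * factorial(n - k))


def expand(a, b, n):
    """Right-hand side of the binomial theorem: sum of C(n, k) a^(n-k) b^k."""
    row = pascal_rows(n)[n]
    return sum(c * a ** (n - k) * b**k for k, c in enumerate(row))


rows = pascal_rows(10)
for n, row in enumerate(rows):
    assert row == [binomial_from_factorials(n, k) for k in range(n + 1)]
assert expand(3, 5, 7) == (3 + 5) ** 7
print(rows[5])  # [1, 5, 10, 10, 5, 1]
```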

Let $q(x) = f(x)/g(x)$, so that $f(x) = q(x)g(x)$. By the Leibniz rule,

$$f'(x) = q'(x)g(x) + q(x)g'(x) = q'(x)g(x) + f(x)g'(x)/g(x).$$

Solving for $q'(x)$ gives

$$q'(x) = \frac{f'(x) - f(x)g'(x)/g(x)}{g(x)} = \frac{f'(x)g(x) - f(x)g'(x)}{g(x)^2}.$$
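A numerical spot check of the quotient rule compares the formula with a central-difference estimate of the derivative of $q = f/g$. The choice $f(x) = \sin x$, $g(x) = x^2 + 1$ and the step size are arbitrary assumptions for illustration; $g$ is picked so that $g(x) \ne 0$ everywhere.

```python
import math


def quotient_rule(f, fp, g, gp, x):
    """q'(x) = (f'(x) g(x) - f(x) g'(x)) / g(x)**2, valid where g(x) != 0."""
    return (fp(x) * g(x) - f(x) * gp(x)) / g(x) ** 2


def central_difference(h, x, dx=1e-5):
    """Symmetric difference quotient (h(x + dx) - h(x - dx)) / (2 dx)."""
    return (h(x + dx) - h(x - dx)) / (2 * dx)


def g(x):
    return x**2 + 1


def gp(x):
    return 2 * x


def q(x):
    return math.sin(x) / g(x)


for x in (-1.0, 0.0, 0.7, 2.5):
    exact = quotient_rule(math.sin, math.cos, g, gp, x)
    approx = central_difference(q, x)
    print(f"x = {x}: formula {exact:.8f}, numerical {approx:.8f}")
```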

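By the Leibniz rule and the induction hypothesis $d(x^n)/dx = nx^{n-1}$,

$$\frac{d(x^{n+1})}{dx} = \frac{d(x \cdot x^n)}{dx} = \frac{d(x)}{dx}\,x^n + x\,\frac{d(x^n)}{dx} = x^n + x \cdot nx^{n-1} = (n+1)x^n.$$

By induction, the formula $d(x^n)/dx = nx^{n-1}$ holds for every positive integer $n$.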
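The induction can be mirrored in code: represent polynomials as dictionaries mapping exponents to coefficients, and compute $d(x^n)/dx$ using nothing but $d(x)/dx = 1$, the Leibniz rule, and the result for $x^{n-1}$. This is a toy illustration of the inductive step under that assumed representation, not a general differentiation routine; all names are hypothetical.

```python
def poly_mul(p, q):
    """Multiply polynomials stored as {exponent: coefficient}."""
    out = {}
    for i, a in p.items():
        for j, b in q.items():
            out[i + j] = out.get(i + j, 0) + a * b
    return out


def poly_add(p, q):
    """Add polynomials stored as {exponent: coefficient}, dropping zeros."""
    out = dict(p)
    for e, c in q.items():
        out[e] = out.get(e, 0) + c
    return {e: c for e, c in out.items() if c != 0}


X = {1: 1}   # the polynomial x
DX = {0: 1}  # base case: d(x)/dx = 1


def derivative_of_power(n):
    """d(x^n)/dx for n >= 1, built only from the base case and the
    Leibniz rule d(x * x^(n-1)) = d(x) * x^(n-1) + x * d(x^(n-1))."""
    if n == 1:
        return dict(DX)
    x_prev = {n - 1: 1}                  # x^(n-1)
    d_prev = derivative_of_power(n - 1)  # induction hypothesis
    return poly_add(poly_mul(DX, x_prev), poly_mul(X, d_prev))


print(derivative_of_power(5))  # {4: 5}, i.e. 5x^4
```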

Lecture 4. September 15, 2005

Homework. No new problems.

Practice Problems. Course Reader: 1F-1, 1F-6, 1F-7, 1F-8.

Differentiating $u^2 = 3x + 1$ implicitly gives $2u\,u'(x) = 3$, so

$$u'(x) = \frac{3}{2u} = \frac{3}{2}(3x + 1)^{-1/2}.$$
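To confirm the implicit-differentiation result numerically, the sketch below compares $3/(2u)$, the equivalent form $\tfrac{3}{2}(3x+1)^{-1/2}$, and a central-difference estimate. It assumes $u(x) = \sqrt{3x + 1}$ (the positive root, defined for $x > -1/3$), as the form of the answer indicates; the function names are illustrative.

```python
import math


def u(x):
    """Positive root of u**2 = 3x + 1, defined for x > -1/3."""
    return math.sqrt(3 * x + 1)


def u_prime_implicit(x):
    """From 2 u u' = 3: u' = 3 / (2u)."""
    return 3 / (2 * u(x))


def u_prime_explicit(x):
    """The same derivative written as (3/2)(3x + 1)**(-1/2)."""
    return 1.5 * (3 * x + 1) ** -0.5


def central_difference(h, x, dx=1e-6):
    """Symmetric difference quotient (h(x + dx) - h(x - dx)) / (2 dx)."""
    return (h(x + dx) - h(x - dx)) / (2 * dx)


for x in (0.0, 1.0, 5.0):
    print(x, u_prime_implicit(x), u_prime_explicit(x), central_difference(u, x))
```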


Practice Problems. Course Reader: 1I-1, 1I-4, 1I-5.

Practice Problems. Course Reader: 1J-1, 1J-2, 1J-3, 1J-4.

Practice Problems. Course Reader: 2A-1, 2A-4, 2A-9, 2A-11, 2A-12.