Single Variable Notes
Homework. These are the problems from the assigned Problem Set which can be completed using
the material from that date’s lecture.
Practice Problems. Practice problems are not to be written up or turned in. These are assigned
only for practice, and are entirely voluntary. Problems listed as “1B1”, for example, are taken
from Section 1B of the 18.01 course reader.
Lecture 1. Sept. 8 Velocity and derivatives
Lecture 22. Nov. 4 Area of a surface of revolution and polar coordinate curves
Lecture 23. Nov. 8 Tangent lines, arc length and areas for polar curves
18.01 Calculus Jason Starr
Fall 2005
(i) Economics. Marginal cost is the derivative of cost with respect to some other variable, for
instance, the quantity purchased.
(ii) Thermodynamics. The ideal gas law relating pressure $p$, volume $V$, and temperature $T$ of a
gas is
$$pV = nRT.$$
At a fixed temperature $T_0$, the pressure is a function of the volume,
$$p(V) = \frac{nRT_0}{V}.$$
(iii) Biology. Exponential population growth models the population $N(t)$ after $t$ years as
$$N(t) = N_0 e^{rt},$$
where $e^x$ is the exponential function, $N_0$ is the initial population, and $r$ is a growth factor. Later
we will see that $N'(t) = rN(t)$, i.e., the population grows at a rate proportional to the size of the
population.
$$V = \frac{1}{3} A h,$$
where $A$ is the base area of the cone and $h$ is the height of the cone. The radius $r$ of the base
is proportional to the height,
$$r(h) = ch,$$
a cone.
1. Tangent lines to graphs. For $y = f(x)$, the equation of the secant line through
$(x_0, f(x_0))$ and $(x_0 + \Delta x, f(x_0 + \Delta x))$ is
$$y = \frac{f(x_0 + \Delta x) - f(x_0)}{\Delta x}(x - x_0) + f(x_0).$$
In the limit $\Delta x \to 0$, the equation of the tangent line through $(x_0, f(x_0))$ is
$$y = f'(x_0)(x - x_0) + f(x_0).$$
For the parabola $y = x^2$, the derivative is
$$y'(x_0) = 2x_0.$$
For instance, the equation of the tangent line through $(2, 4)$ is
$$y = 4x - 4.$$
Given a point $(x, y)$, what are all points $(x_0, x_0^2)$ on the parabola whose tangent line contains
$(x, y)$? The tangent line at $(x_0, x_0^2)$ is $y = 2x_0 x - x_0^2$. To solve, consider $x$ and $y$ as constants and solve for $x_0$. For instance, if $(x, y) = (1, -3)$, this gives
$$-3 = 2x_0(1) - x_0^2,$$
or
$$x_0^2 - 2x_0 - 3 = 0.$$
Factoring as $(x_0 - 3)(x_0 + 1)$, the solutions are $x_0 = -1$ and $x_0 = 3$. The corresponding
tangent lines are
$$y = -2x - 1$$
and
$$y = 6x - 9.$$
For general $(x, y)$, the solutions are
$$x_0 = x \pm \sqrt{x^2 - y}$$
(real solutions exist only when $y \le x^2$).
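The computation above can be sketched in Python. The helper names `tangent_points` and `tangent_line` are illustrative, not from the notes; the sketch returns the real solutions $x_0 = x \pm \sqrt{x^2 - y}$, and an empty list when $(x, y)$ lies above the parabola.

```python
import math

def tangent_points(x, y):
    """x0 values where the tangent line to y = x**2 passes through (x, y).

    The tangent line at (x0, x0**2) is y = 2*x0*x - x0**2, so x0 must solve
    x0**2 - 2*x*x0 + y = 0, i.e. x0 = x +/- sqrt(x**2 - y).
    """
    disc = x * x - y
    if disc < 0:
        return []  # (x, y) lies above the parabola: no tangent line through it
    root = math.sqrt(disc)
    return sorted({x - root, x + root})

def tangent_line(x0):
    """(slope, intercept) of the tangent line to y = x**2 at x0."""
    return 2 * x0, -x0 * x0

print(tangent_points(1, -3))                               # [-1.0, 3.0]
print([tangent_line(x0) for x0 in tangent_points(1, -3)])  # [(-2.0, -1.0), (6.0, -9.0)]
```

The two lines printed are $y = -2x - 1$ and $y = 6x - 9$, matching the example.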
2. Limits. The precise definition is on p. 791 of Appendix A.2. Intuitive definition: $\lim_{x \to x_0} f(x)$
equals $L$ if and only if all values of $f(x)$ can be made arbitrarily close to $L$ by choosing $x \ne x_0$
sufficiently close to $x_0$. One interpretation is the “microscope/laser illuminator” analogy: an
observer focuses a microscope's field of view on a thin strip parallel to the x-axis centered
on $y = L$. The goal of the illuminator is to focus a laser beam centered on $x_0$ parallel to
the y-axis (but with the line $x = x_0$ deleted) so that only the portion of the graph in the
field of view is illuminated. If for every magnification of the microscope the illuminator can
succeed, then the limit is defined and equals $L$.
There is a beautiful Java applet on the webpage of Daniel J. Heath of Pacific Lutheran
University,
http://www.plu.edu/~heathdj/java/calc1/Epsilon.html
If you use this, try $a = -1$.
For left-hand limits, use a laser that illuminates only to the left of $x_0$. For right-hand limits,
use a laser that illuminates only to the right of $x_0$.
1. Another derivative. Using the 3-step method, the derivative of $f(x) = 1/\sqrt{3x + 1}$
is
$$f'(x) = -\frac{3}{2}(3x + 1)^{-3/2}.$$
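The 3-step method is symbolic, but a numerical check can catch algebra slips. This sketch (helper names are illustrative) compares the secant slope $(f(x + \Delta x) - f(x))/\Delta x$ at $x = 1$ with the formula above, which gives $f'(1) = -3/16$; the secant slope should approach it as $\Delta x$ shrinks.

```python
import math

def f(x):
    return 1 / math.sqrt(3 * x + 1)

def f_prime(x):
    # Formula from the notes: f'(x) = -(3/2)(3x + 1)^(-3/2)
    return -1.5 * (3 * x + 1) ** -1.5

def secant_slope(func, x, dx):
    # Step 1 of the 3-step method: the difference quotient
    return (func(x + dx) - func(x)) / dx

for dx in (1e-1, 1e-3, 1e-5):
    print(dx, secant_slope(f, 1.0, dx), f_prime(1.0))
```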
Upshot: Computing derivatives by the definition is too much work to be practical. We need general
methods to simplify computations.
The factorial,
$$n! = n \times (n - 1) \times (n - 2) \times \cdots \times 3 \times 2 \times 1,$$
is the number of ways of arranging $n$ distinct objects in a line. For integers $0 \le k \le n$,
the binomial coefficient,
$$\binom{n}{k} = \frac{n!}{k!\,(n - k)!} = \frac{n(n - 1)\cdots(n - k + 2)(n - k + 1)}{k(k - 1)\cdots 3 \cdot 2 \cdot 1},$$
is the number of ways to choose a subset of $k$ elements from a collection of $n$ elements. A fundamental
fact about binomial coefficients is the following,
$$\binom{n}{k - 1} + \binom{n}{k} = \binom{n + 1}{k}.$$
This is known as Pascal’s formula. It is described on a webpage produced by MathWorld, part of
Wolfram Research.
The Binomial Theorem says that for every positive integer $n$ and every pair of numbers $a$ and $b$,
$$(a + b)^n = a^n + na^{n-1}b + \cdots + \binom{n}{k}a^{n-k}b^k + \cdots + nab^{n-1} + b^n.$$
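This is proved by mathematical induction. First, the result is very easy when $n = 1$; it just says
that $(a + b)^1$ equals $a^1 + b^1$. Next, make the induction hypothesis that the theorem is true for the
integer $n$. The goal is to deduce the theorem for $n + 1$,
$$(a + b)^{n+1} = a^{n+1} + (n + 1)a^n b + \cdots + \binom{n + 1}{k}a^{n+1-k}b^k + \cdots + (n + 1)ab^n + b^{n+1}.$$
To do this, write
$$(a + b)^{n+1} = (a + b) \times (a + b)^n.$$
Expanding $(a + b)^n$ by the induction hypothesis and multiplying out, the coefficient of each monomial is a sum of two adjacent binomial coefficients,
$$(a + b)^{n+1} = a^{n+1} + (n + 1)a^n b + \cdots + \left(\binom{n}{k} + \binom{n}{k - 1}\right)a^{n+1-k}b^k + \left(\binom{n}{k + 1} + \binom{n}{k}\right)a^{n-k}b^{k+1} + \cdots + (1 + n)ab^n + b^{n+1}.$$
By Pascal’s formula, $\binom{n}{k} + \binom{n}{k - 1} = \binom{n + 1}{k}$, which is exactly the required coefficient. This proves the theorem for $n + 1$.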
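Pascal’s formula alone is enough to generate every binomial coefficient, which is the engine of the induction step. A minimal sketch (the names `pascal_row` and `binomial_expand` are hypothetical) builds each row of Pascal’s triangle from the previous one, compares it with Python’s `math.comb`, and checks the Binomial Theorem on integer inputs, where the arithmetic is exact.

```python
from math import comb

def pascal_row(n):
    """Row n of Pascal's triangle, built only from Pascal's formula
    C(n+1, k) = C(n, k-1) + C(n, k), starting from row 0 = [1]."""
    row = [1]
    for _ in range(n):
        # Pad with zeros, which play the role of C(n, -1) and C(n, n+1)
        row = [left + right for left, right in zip([0] + row, row + [0])]
    return row

def binomial_expand(a, b, n):
    # Right-hand side of the Binomial Theorem: sum of C(n, k) a^(n-k) b^k
    return sum(c * a ** (n - k) * b ** k for k, c in enumerate(pascal_row(n)))

print(pascal_row(4))                                    # [1, 4, 6, 4, 1]
print(pascal_row(6) == [comb(6, k) for k in range(7)])  # True
print(binomial_expand(2, 3, 5), (2 + 3) ** 5)           # 3125 3125
```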
5. The quotient rule. Let $f(x)$ and $g(x)$ be differentiable functions. If $g(a)$ is nonzero, the
quotient function $f(x)/g(x)$ is defined and differentiable at $a$, and
$$\left(\frac{f(x)}{g(x)}\right)' = \frac{f'(x)g(x) - f(x)g'(x)}{g(x)^2}.$$
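One way to deduce this formula is to set $q(x) = f(x)/g(x)$ so that $f(x) = q(x)g(x)$, and then apply
the Leibniz formula to get
$$f'(x) = q'(x)g(x) + q(x)g'(x).$$
Solving for $q'(x)$ and substituting $q(x) = f(x)/g(x)$ gives
$$q'(x) = \frac{f'(x) - f(x)g'(x)/g(x)}{g(x)} = \frac{f'(x)g(x) - f(x)g'(x)}{g(x)^2}.$$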
6. Another proof that $d(x^n)/dx$ equals $nx^{n-1}$. This was mentioned only very briefly. The
product rule also gives another induction proof that for every positive integer $n$, $d(x^n)/dx$ equals
$nx^{n-1}$. For $n = 1$, we proved this by hand. Let $n$ be some specific positive integer, and make the
induction hypothesis that $d(x^n)/dx$ equals $nx^{n-1}$. The goal is to deduce the formula for $n + 1$,
$$\frac{d(x^{n+1})}{dx} = (n + 1)x^n.$$
Writing $x^{n+1} = x^n \cdot x$, the product rule and the induction hypothesis give
$$\frac{d(x^{n+1})}{dx} = x^n + x(nx^{n-1}) = x^n + nx^n = (n + 1)x^n.$$
Thus the formula for $n$ implies the formula for $n + 1$. Therefore, by mathematical induction, the
formula holds for every positive integer $n$.
1. Product rule example. For $u = \sqrt{3x + 1}$, what is $u'(x)$? Since $u \cdot u = 3x + 1$, $(u \cdot u)' =
(3x + 1)' = 3$. By the product rule, $(u \cdot u)' = u' \cdot u + u \cdot u' = 2uu'$. Thus solving,
$$u'(x) = \frac{3}{2u} = \frac{3}{2\sqrt{3x + 1}}.$$
2. The derivative of $u^n$. From above, $(u^2)'$ equals $2uu'$. By a similar computation, $(u^3)'$ equals
$3u^2u'$. This suggests a pattern,
$$\frac{d(u^n)}{dx} = nu^{n-1}\frac{du}{dx}.$$
This can be proved by induction on $n$. For $n = 1$, $2$ and $3$, it was checked. Let $n$ be a particular
integer (for instance, 70119209472933054321). For that integer, suppose the result is known,
$$\frac{d(u^n)}{dx} = nu^{n-1}\frac{du}{dx}.$$
The goal is to prove the result for $n + 1$, that is,
$$\frac{d(u^{n+1})}{dx} = (n + 1)u^n\frac{du}{dx}.$$
Let $v = u^n$. Then $u^{n+1}$ equals $uv$. So, by the product rule,
$$\frac{d(u^{n+1})}{dx} = \frac{d(uv)}{dx} = \frac{du}{dx}v + u\frac{dv}{dx}.$$
Plugging in $v = u^n$, this is
$$\frac{d(u^{n+1})}{dx} = \frac{du}{dx} \cdot u^n + u\frac{d(u^n)}{dx}.$$
By the induction hypothesis, $d(u^n)/dx$ equals $nu^{n-1}(du/dx)$. Plugging in,
$$\frac{d(u^{n+1})}{dx} = \frac{du}{dx} \cdot u^n + u\left(nu^{n-1}\frac{du}{dx}\right).$$
This simplifies to
$$\frac{d(u^{n+1})}{dx} = u^n\frac{du}{dx} + nu^n\frac{du}{dx} = (n + 1)u^n\frac{du}{dx}.$$
Thus, the result for $n + 1$ follows from the result for $n$. By induction, the result holds for every positive integer $n$.
3. The derivative of $x^a$, $a$ a fraction. Let $a$ be a fraction $m/n$ and let $u(x)$ be $x^a$. Then $u^n$
equals $x^m$. Thus,
$$\frac{d(u^n)}{dx} = \frac{d(x^m)}{dx},$$
which equals $mx^{m-1}$. By the above, $d(u^n)/dx$ equals $nu^{n-1}(du/dx)$. Thus,
$$nu^{n-1}\frac{du}{dx} = mx^{m-1}.$$
Solving for $du/dx$,
$$\frac{du}{dx} = \frac{mx^{m-1}}{nu^{n-1}} = \frac{mx^{m-1}}{n(x^{m/n})^{n-1}}.$$
One of the basic rules of exponents is that $(b^c)^d$ equals $b^{cd}$. Thus the denominator $n(x^{m/n})^{n-1}$
equals $nx^{m(n-1)/n}$, which equals $nx^{m - m/n}$. Thus,
$$\frac{du}{dx} = \frac{m}{n}x^{(m-1) - (m - m/n)} = \frac{m}{n}x^{m/n - 1},$$
that is,
$$\frac{d(x^a)}{dx} = ax^{a-1}.$$
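A numerical spot check of the fractional power rule, as a sketch: for a few sample fractions $m/n$ (chosen arbitrarily here), it compares a symmetric difference quotient of $x^{m/n}$ at $x = 2$ with $ax^{a-1}$, and also evaluates both sides of the intermediate identity $nu^{n-1}\,du/dx = mx^{m-1}$ used in the derivation.

```python
def power_rule(a, x):
    # d(x^a)/dx = a x^(a - 1)
    return a * x ** (a - 1)

def central_difference(func, x, h=1e-6):
    # Symmetric difference quotient approximating func'(x)
    return (func(x + h) - func(x - h)) / (2 * h)

x = 2.0
for m, n in ((1, 2), (2, 3), (-3, 7)):
    a = m / n
    u = x ** a
    # Both sides of the identity n u^(n-1) du/dx = m x^(m-1)
    lhs = n * u ** (n - 1) * power_rule(a, x)
    rhs = m * x ** (m - 1)
    print(m, n, central_difference(lambda t: t ** a, x), power_rule(a, x), lhs, rhs)
```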
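4. The chain rule. Let $y$ be a function of $x$, $y = f(x)$, and let $u$ be a function of $y$, $u = g(y)$.
Then $u$ is a function of $x$, $u = g(f(x))$. This function is a composite function, and is denoted
by
$$(g \circ f)(x) = g(f(x)).$$
What is the derivative of a composite function? The claim is that $(g \circ f)'(x_0) = g'(f(x_0)) \cdot f'(x_0)$. By definition,
$$(g \circ f)'(x_0) = \lim_{\Delta x \to 0}\frac{\Delta u}{\Delta x} = \lim_{\Delta x \to 0}\frac{\Delta u}{\Delta y} \cdot \frac{\Delta y}{\Delta x},$$
where $y_0$ equals $f(x_0)$, $u_0$ equals $g(y_0) = g(f(x_0))$, $\Delta y$ equals $f(x_0 + \Delta x) - f(x_0) = f(x_0 + \Delta x) - y_0$,
and $\Delta u$ equals $g(y_0 + \Delta y) - g(y_0) = g(f(x_0 + \Delta x)) - g(f(x_0))$. So long as $\Delta y$ is nonzero, the fraction
in the limit is defined. And, as $\Delta x$ approaches 0, also $\Delta y$ approaches 0. Thus the limit breaks up
as
$$(g \circ f)'(x_0) = \lim_{\Delta y \to 0}\frac{\Delta u}{\Delta y} \cdot \lim_{\Delta x \to 0}\frac{\Delta y}{\Delta x} = g'(y_0) \cdot f'(x_0).$$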
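A quick numerical illustration of the chain rule, as a sketch: the functions $f(x) = x^2 + 1$ and $g(y) = \sin(y)$ and the point $x_0 = 0.7$ are arbitrary choices, not from the notes. A symmetric difference quotient of $g(f(x))$ at $x_0$ should agree with $g'(y_0) \cdot f'(x_0) = \cos(y_0) \cdot 2x_0$.

```python
import math

def derivative(func, x, h=1e-6):
    # Symmetric difference quotient approximating func'(x)
    return (func(x + h) - func(x - h)) / (2 * h)

def f(x):
    return x ** 2 + 1   # inner function, y = f(x)

def g(y):
    return math.sin(y)  # outer function, u = g(y)

x0 = 0.7
y0 = f(x0)
lhs = derivative(lambda x: g(f(x)), x0)  # (g o f)'(x0), estimated numerically
rhs = math.cos(y0) * (2 * x0)            # g'(y0) * f'(x0)
print(lhs, rhs)
```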
5. Implicit differentiation. This method has already been used many times. Given a function
$y(x)$ satisfying some equation involving both $x$ and $y$, formally differentiate each side of the equation
with respect to $x$ and then try to solve for $y'$.
1. Example of implicit differentiation. Let $y = f(x)$ be the unique function satisfying the
equation
$$\frac{1}{x} + \frac{1}{y} = 2.$$
What is the slope of the tangent line to the graph of $y = f(x)$ at the point $(x, y) = (1, 1)$? Differentiating both sides with respect to $x$,
$$\frac{d}{dx}\left(\frac{1}{x}\right) + \frac{d}{dx}\left(\frac{1}{y}\right) = \frac{d(2)}{dx} = 0.$$
Of course $(1/x)' = (x^{-1})' = -x^{-2}$. And by the rule $d(u^n)/dx = nu^{n-1}(du/dx)$, the derivative of
$1/y$ is $-y^{-2}(dy/dx)$. Thus,
$$-x^{-2} - y^{-2}\frac{dy}{dx} = 0.$$
At $(x, y) = (1, 1)$, this gives
$$-1 - 1 \cdot y'(1) = 0, \qquad y'(1) = -1.$$
In fact, using that $1/y$ equals $2 - 1/x$, this can be solved for every $x$,
$$\frac{dy}{dx} = -\frac{x^{-2}}{y^{-2}} = -\frac{1}{x^2} \cdot \frac{1}{(2 - 1/x)^2} = -\frac{1}{(2x - 1)^2}.$$
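Here the equation can also be solved explicitly: $1/y = 2 - 1/x$ gives $y = x/(2x - 1)$ for $x \ne 0, 1/2$. The sketch below uses that explicit form to check the implicit result numerically at a few sample points; the helper names are illustrative.

```python
def y(x):
    # Solving 1/x + 1/y = 2 for y gives y = x / (2x - 1), for x != 0 and x != 1/2
    return x / (2 * x - 1)

def dy_dx(x):
    # Result of implicit differentiation
    return -1 / (2 * x - 1) ** 2

def derivative(func, x, h=1e-6):
    # Symmetric difference quotient approximating func'(x)
    return (func(x + h) - func(x - h)) / (2 * h)

for x in (1.0, 2.0, -3.0):
    # Columns: x, value of 1/x + 1/y (should be 2), numeric slope, formula slope
    print(x, 1 / x + 1 / y(x), derivative(y, x), dy_dx(x))
```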
2. Rules for exponentials and logarithms. Let $a$ be a positive real number. The basic rules
of exponentials are as follows.
Rule 1. If $a^b$ equals $B$ and $a^c$ equals $C$, then $a^{b+c}$ equals $B \cdot C$, i.e.,
$$a^{b+c} = a^b \cdot a^c.$$
Rule 2. If $a^b$ equals $B$ and $B^d$ equals $D$, then $a^{bd}$ equals $D$, i.e.,
$$(a^b)^d = a^{bd}.$$
If $a \ne 1$ and $a^b$ equals $B$, the logarithm with base $a$ of $B$ is defined to be $b$. This is written $\log_a(B) = b$. The
function $B \mapsto \log_a(B)$ is defined for all positive real numbers $B$. Using this definition, the rules of
exponentiation become rules of logarithms.
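Rule 1. If $\log_a(B)$ equals $b$ and $\log_a(C)$ equals $c$, then $\log_a(B \cdot C)$ equals $b + c$, i.e.,
$$\log_a(B \cdot C) = \log_a(B) + \log_a(C).$$
Rule 2. If $\log_a(B)$ equals $b$ and $B^d$ equals $D$, then $\log_a(D)$ equals $d\log_a(B)$, i.e.,
$$\log_a(B^d) = d\log_a(B).$$
Rule 3. Since $\log_B(D)$ equals $d$, an equivalent formulation is that $\log_a(D)$ equals $\log_a(B)\log_B(D)$, i.e.,
$$\log_a(D) = \log_a(B)\log_B(D).$$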
3. The derivative of $a^x$. Let $a$ be a positive real number. What is the derivative of $a^x$? Denote
the derivative of $a^x$ at $x = 0$ by $L(a)$. It equals the value of the limit
$$L(a) = \lim_{h \to 0}\frac{a^h - 1}{h}.$$
Then for every $x_0$, the derivative of $a^x$ at $x_0$ equals
$$\lim_{h \to 0}\frac{a^{x_0 + h} - a^{x_0}}{h}.$$
By Rule 1, $a^{x_0 + h}$ equals $a^{x_0}a^h$. Thus the limit factors as
$$\lim_{h \to 0}\frac{a^{x_0}a^h - a^{x_0}}{h} = a^{x_0}\lim_{h \to 0}\frac{a^h - 1}{h} = a^{x_0}L(a).$$
Therefore,
$$\frac{d(a^x)}{dx} = L(a)a^x.$$
What is $L(a)$? To figure this out, consider how $L(a)$ changes as $a$ changes. First of all,
$$L(a^b) = \lim_{h \to 0}\frac{(a^b)^h - 1}{h}.$$
By Rule 2, $(a^b)^h$ equals $a^{bh}$. So, for $b \ne 0$, the limit is
$$L(a^b) = \lim_{h \to 0}\frac{a^{bh} - 1}{h} = b\lim_{h \to 0}\frac{a^{bh} - 1}{bh}.$$
Now, inside the limit, make the substitution $k = bh$. As $h$ approaches 0, also $k$ approaches
0. So the limit is
$$L(a^b) = b\lim_{k \to 0}\frac{a^k - 1}{k} = bL(a).$$
Choose a number $a_0$ bigger than 1, say $a_0 = 2$. Then for every positive real number $a$, $a = a_0^b$
where $b = \log_{a_0}(a)$. Thus,
$$L(a) = L(a_0^b) = bL(a_0) = \log_{a_0}(a)L(a_0).$$
So, with $a_0$ fixed and $a$ allowed to vary, $L(a)$ is just the logarithm function $\log_{a_0}(a)$ scaled by $L(a_0)$.
Looking at the graph of $a_0^x$, it is geometrically clear that $L(a_0)$ is positive (though we have not
proved that $L(a_0)$ is even defined). Thus the graph of $L(a)$ looks qualitatively like the graph of
$\log_{a_0}(a)$. In particular, for $a$ less than 1, $L(a)$ is negative. The value $L(1)$ equals 0. And $L(a)$
approaches $+\infty$ as $a$ increases. Therefore, there must be a number where $L$ takes the value 1.
By long tradition, this number is called $e$;
$$L(e) = \lim_{h \to 0}\frac{e^h - 1}{h} = 1.$$
This is the definition of $e$. It sheds very little light on the decimal value of $e$.
Because $e$ is so important, the logarithm with base $e$ is given a special name: the natural logarithm.
It is denoted by
$$\ln(a) = \log_e(a).$$
So, finally, taking $a_0 = e$, $L(a)$ equals
$$L(a) = \log_e(a)L(e) = \ln(a)(1) = \ln(a).$$
This leads to the formula for the derivative of $a^x$,
$$\frac{d(a^x)}{dx} = \ln(a)a^x.$$
In particular,
$$\frac{d(e^x)}{dx} = e^x.$$
In fact, $e^x$ is characterized by the property above and the property that $e^0$ equals 1.
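To see $L(a) = \ln(a)$ numerically, the sketch below evaluates the difference quotient $(a^h - 1)/h$ for a small fixed $h$ (an approximation, not the limit itself) and compares it with `math.log`, Python's natural logarithm. It also checks the scaling law $L(a^b) = bL(a)$ with $a = 2$, $b = 3$.

```python
import math

def L(a, h=1e-7):
    # Difference quotient (a^h - 1)/h approximating the derivative of a^x at x = 0
    return (a ** h - 1) / h

for a in (0.5, 1.0, 2.0, math.e, 10.0):
    print(a, L(a), math.log(a))

# The scaling law L(a^b) = b L(a), with a = 2 and b = 3
print(L(8.0), 3 * L(2.0))
```

Note that $L(a) < 0$ for $a < 1$, $L(1) = 0$, and $L(e) \approx 1$, as described above.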
4. The derivative of $\log_a(x)$ and the value of $e$. By the chain rule,
$$\frac{d(a^u)}{dx} = \ln(a)a^u\frac{du}{dx}.$$
For $u = \log_a(x)$, $a^u$ equals $x$. Thus,
$$\frac{d(a^u)}{dx} = \frac{d(x)}{dx} = 1.$$
Thus,
$$\ln(a)a^u\frac{du}{dx} = 1.$$
Solving gives
$$\frac{d\log_a(x)}{dx} = \frac{1}{\ln(a)} \cdot \frac{1}{a^u} = \frac{1}{\ln(a)x}.$$
In particular, for $a = e$, this gives
$$\frac{d\ln(x)}{dx} = \frac{1}{x}.$$
What is the derivative of $\ln(x)$ at $x = 1$? On the one hand, since the derivative of $\ln(x)$ equals $1/x$,
the derivative at $x = 1$ is $1/1 = 1$. On the other hand, the definition of the derivative gives
$$\lim_{h \to 0}\frac{\ln(1 + h) - \ln(1)}{h}.$$
Of course, $\ln(1)$ equals 0, so this simplifies to
$$\lim_{h \to 0}\frac{1}{h}\ln(1 + h).$$
By Rule 2 for logarithms, $(1/h)\ln(1 + h) = \ln\left((1 + h)^{1/h}\right)$, and since $\ln$ is continuous, the limit equals
$$\ln\left[\lim_{h \to 0}(1 + h)^{1/h}\right].$$
So the natural logarithm of the inner limit equals 1. But $e$ is the unique number whose natural
logarithm equals 1. This leads to the formula
$$e = \lim_{h \to 0}(1 + h)^{1/h}.$$
Taking $h = 1/n$ with $n$ a positive integer,
$$\lim_{n \to +\infty}\left(1 + \frac{1}{n}\right)^n = e.$$
This can be used to compute $e$ to arbitrary accuracy. The first few digits of $e$ are
2.718281828459045...
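A short sketch of the limit $(1 + 1/n)^n \to e$, compared against Python's constant `math.e`. The helper name `e_approx` is illustrative.

```python
import math

def e_approx(n):
    # (1 + 1/n)^n, which tends to e as n -> +infinity
    return (1 + 1 / n) ** n

for n in (1, 10, 1000, 10 ** 6):
    print(n, e_approx(n), math.e - e_approx(n))
```

The error shrinks roughly like $e/(2n)$, so although this limit does determine $e$ to arbitrary accuracy, it gains digits slowly; a million terms give only about six correct digits.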
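5. Logarithmic differentiation. There is a method of computing derivatives of products of
functions that is often useful. If $y$ is a product of $n$ positive factors, say $f_1(x) \cdot f_2(x) \cdots f_n(x)$, the derivative
of $y$ can be computed by the product rule. However, it seems to be a fact that multiplication is
more error-prone than addition. Thus introduce
$$u = \ln(y) = \ln(f_1(x)) + \ln(f_2(x)) + \cdots + \ln(f_n(x)).$$
By the chain rule,
$$\frac{du}{dx} = \frac{d\ln(y)}{dx} = \frac{1}{y}\frac{dy}{dx}.$$
Therefore the derivative of $y$ can be computed as
$$\frac{dy}{dx} = y\frac{du}{dx}.$$
Example. For $y = (1 + x^3)(1 + \sqrt{x})/x^{3/7}$ with $x > 0$, $u = \ln(1 + x^3) + \ln(1 + \sqrt{x}) - \frac{3}{7}\ln(x)$. Thus, $u'$ equals
$$u' = \frac{3x^2}{1 + x^3} + \frac{1}{2\sqrt{x}(1 + \sqrt{x})} - \frac{3}{7x}.$$
So, finally,
$$y' = yu' = \frac{(1 + x^3)(1 + \sqrt{x})}{x^{3/7}}\left(\frac{3x^2}{1 + x^3} + \frac{1}{2\sqrt{x}(1 + \sqrt{x})} - \frac{3}{7x}\right).$$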
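A numerical sanity check of the logarithmic-differentiation result, as a sketch: it compares $y \cdot u'$ with a symmetric difference quotient of $y$ at the sample points $x = 2$ and $x = 0.5$ (arbitrary choices with $x > 0$).

```python
import math

def y(x):
    return (1 + x ** 3) * (1 + math.sqrt(x)) / x ** (3 / 7)

def u_prime(x):
    # u = ln y = ln(1 + x^3) + ln(1 + sqrt(x)) - (3/7) ln x
    return 3 * x ** 2 / (1 + x ** 3) + 1 / (2 * math.sqrt(x) * (1 + math.sqrt(x))) - 3 / (7 * x)

def y_prime(x):
    # Logarithmic differentiation: y' = y * u'
    return y(x) * u_prime(x)

def derivative(func, x, h=1e-6):
    # Symmetric difference quotient approximating func'(x)
    return (func(x + h) - func(x - h)) / (2 * h)

for x in (2.0, 0.5):
    print(x, derivative(y, x), y_prime(x))
```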
1. Trigonometric functions. What is angle? For a sector of a unit circle (a circle of radius
1), the angle of the sector equals both the length of the arc of the sector and twice the area of the
sector. Although we have as yet no general definitions of either arc length or area, this can be used
to give a rigorous definition of angle. We can divide any sector into two equal pieces: simply bisect
the chord of the sector. We also know how to add two angles, by laying the sectors in adjacent
positions. Denoting the area of a unit circle by the symbol $\pi$ (which happens to be the familiar $\pi$),
these 2 operations produce every angle of the form $m\pi/2^n$, with $m$ and $n$ integers. Every angle can
be approximated arbitrarily well by such angles. Thus, for every continuous function of an angle,
every value of the function can be computed.
The basic functions are sin(θ), cos(θ), tan(θ), sec(θ), csc(θ) and cot(θ). Full descriptions of these
are in §9.1 of the textbook by Simmons. The same information is contained in the webpage on
Trigonometry produced by MathWorld, part of Wolfram Research.
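2. Trigonometric identities. For today, the most important identities are the angle addition
formulas,
$$\sin(\alpha + \beta) = \sin(\alpha)\cos(\beta) + \cos(\alpha)\sin(\beta),$$
$$\cos(\alpha + \beta) = \cos(\alpha)\cos(\beta) - \sin(\alpha)\sin(\beta).$$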
3. Some trigonometric limits. In computing trigonometric limits, the following limit is crucial,
$$\lim_{\theta \to 0}\frac{\sin(\theta)}{\theta} = 1.$$
As explained in class, this is essentially the statement that as θ → 0, the quotient of the arc length
by the chord length tends to 1. This was not proved in lecture, nor is it proved in your textbook
in §2.1 (despite the author’s claim). However, it is geometrically reasonable. And, of course, it can
be proved.
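This limit implies another limit,
$$\lim_{\theta \to 0}\frac{\cos(\theta) - 1}{\theta} = 0.$$
To see this, rewrite the term as
$$\frac{\cos(\theta) - 1}{\theta} = \frac{\cos^2(\theta) - 1}{\theta(\cos(\theta) + 1)}.$$
By Identity (v), $\cos^2(\theta) - 1$ equals $-\sin^2(\theta)$, so the term equals
$$-\frac{\sin(\theta)}{\theta} \cdot \frac{\sin(\theta)}{\cos(\theta) + 1},$$
which tends to $-1 \cdot 0/2 = 0$ as $\theta \to 0$.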
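To differentiate $\sin(x)$ at $x = a$, apply the angle addition formula, $\sin(a + h) = \sin(a)\cos(h) + \cos(a)\sin(h)$. This gives
$$\sin(a + h) - \sin(a) = \sin(a)(\cos(h) - 1) + \cos(a)\sin(h).$$
Thus the difference quotient equals
$$\frac{\sin(a + h) - \sin(a)}{h} = \sin(a)\frac{\cos(h) - 1}{h} + \cos(a)\frac{\sin(h)}{h},$$
and by the two limits above,
$$\lim_{h \to 0}\frac{\sin(a + h) - \sin(a)}{h} = \sin(a)\lim_{h \to 0}\frac{\cos(h) - 1}{h} + \cos(a)\lim_{h \to 0}\frac{\sin(h)}{h} = \sin(a) \cdot 0 + \cos(a) \cdot 1.$$
Therefore
$$\frac{d\sin(x)}{dx} = \cos(x).$$
A similar computation, using the addition formula for $\cos(a + h)$, shows that the derivative of $\cos(x)$ equals
$$\frac{d\cos(x)}{dx} = -\sin(x).$$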
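By the quotient rule applied to $\tan(x) = \sin(x)/\cos(x)$,
$$\frac{d\tan(x)}{dx} = \sec^2(x).$$
In a similar manner,
$$\frac{d\cot(x)}{dx} = -\csc^2(x),$$
$$\frac{d\sec(x)}{dx} = \sec(x)\tan(x),$$
and
$$\frac{d\csc(x)}{dx} = -\csc(x)\cot(x).$$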
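All six derivative formulas can be spot-checked numerically. This sketch defines $\sec$, $\csc$ and $\cot$ from Python's `math.sin`, `math.cos` and `math.tan` (the standard library has no separate functions for them), picks the sample point $x_0 = 0.8$ where all six are defined, and compares a symmetric difference quotient with each formula. The last line illustrates $\sin(\theta)/\theta \to 1$.

```python
import math

def derivative(func, x, h=1e-6):
    # Symmetric difference quotient approximating func'(x)
    return (func(x + h) - func(x - h)) / (2 * h)

def sec(x): return 1 / math.cos(x)
def csc(x): return 1 / math.sin(x)
def cot(x): return math.cos(x) / math.sin(x)

# name: (function, derivative formula from the notes)
RULES = {
    "sin": (math.sin, math.cos),
    "cos": (math.cos, lambda x: -math.sin(x)),
    "tan": (math.tan, lambda x: sec(x) ** 2),
    "cot": (cot, lambda x: -csc(x) ** 2),
    "sec": (sec, lambda x: sec(x) * math.tan(x)),
    "csc": (csc, lambda x: -csc(x) * cot(x)),
}

x0 = 0.8  # a point where all six functions are defined
for name, (func, formula) in RULES.items():
    print(name, derivative(func, x0), formula(x0))

print(math.sin(1e-4) / 1e-4)  # close to 1
```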
Lecture 7. September 22, 2005
Review for Exam 1. No new material was presented. There were no practice problems from the
course reader.
In a precise sense, this is the best approximation of $f(x)$ by a linear function near $x = a$. For $x$
close to $a$, the value of $f(x)$ is close to the value of the linearization. The notation for this is
$$f(x) \approx f(a) + f'(a)(x - a) \quad \text{for } x \approx a.$$
near $x = 0$ is,
$$f(x) \approx 5 - (15 - 2\pi)x \quad \text{for } x \approx 0.$$
$$e^x \approx 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots,$$
$$\ln(1 + x) \approx x - \frac{x^2}{2} + \frac{x^3}{3} - \cdots$$
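A sketch comparing partial sums of the series for $e^x$ and $\ln(1 + x)$ with Python's library values `math.exp` and `math.log1p` at the sample point $x = 0.1$. Keeping only the first nonconstant term recovers the linear approximations $e^x \approx 1 + x$ and $\ln(1 + x) \approx x$; each additional term shrinks the error.

```python
import math

def exp_partial(x, terms):
    # 1 + x + x^2/2! + ... + x^(terms-1)/(terms-1)!
    return sum(x ** k / math.factorial(k) for k in range(terms))

def log1p_partial(x, terms):
    # x - x^2/2 + x^3/3 - ...  (converges for -1 < x <= 1)
    return sum((-1) ** (k + 1) * x ** k / k for k in range(1, terms + 1))

x = 0.1
for terms in (1, 2, 3, 4):
    print(terms, exp_partial(x, terms), log1p_partial(x, terms))
print(math.exp(x), math.log1p(x))
```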
3. Combining basic approximations. The basic approximations can be combined to get new
linear approximations.
(i) The linear approximation of $f(x)$ for $x \approx a$ can be converted to a linear approximation at 0 by
setting $g(u) = f(a + u)$. In symbols,
$$g(u) = f(a + u) \approx f(a) + f'(a)u \quad \text{for } u \approx 0.$$
(ii) The linear approximation of $f(cx)$ for $x \approx a$ is obtained from the linear approximation of $f(u)$
for $u \approx ca$ by substituting $u = cx$,
$$f(cx) \approx f(ca) + f'(ca)(cx - ca) = f(ca) + cf'(ca)(x - a).$$
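(iii) The linear approximation of $cf(x)$ for $x \approx a$ is $c$ times the linear approximation of $f(x)$ for
$x \approx a$,
$$cf(x) \approx cf(a) + cf'(a)(x - a).$$
This is different from the previous rule. Also, the linear approximation of $f(x) + g(x)$ for $x \approx a$ is
the sum of the linear approximations of $f(x)$ and $g(x)$,
$$f(x) + g(x) \approx (f(a) + g(a)) + (f'(a) + g'(a))(x - a).$$
(iv) The linear approximation of $f(x)g(x)$ for $x \approx a$ is the product of the linear approximations,
disregarding all quadratic terms,
$$f(x)g(x) \approx (f(a) + f'(a)(x - a))(g(a) + g'(a)(x - a)) \approx f(a)g(a) + (f'(a)g(a) + f(a)g'(a))(x - a).$$
(v) The linear approximation of $f(x)/g(x)$ for $x \approx a$ is the quotient of the linear approximations,
using the linear approximation $1/(1 - x) \approx 1 + x$,
$$\frac{f(x)}{g(x)} \approx \frac{f(a) + f'(a)(x - a)}{g(a)\left(1 + \frac{g'(a)}{g(a)}(x - a)\right)} \approx \frac{f(a)}{g(a)} + \frac{f'(a)g(a) - f(a)g'(a)}{g(a)^2}(x - a).$$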
(vi) The linear approximation of $g(f(x))$ for $x \approx a$ is obtained from the linear approximation of
$g(u)$ for $u \approx f(a)$ by substituting in for $u$ the linear approximation of $f(x)$ for $x \approx a$ and ignoring
quadratic terms,
$$u = f(x) \approx f(a) + f'(a)(x - a),$$
$$g(f(x)) = g(u) \approx g(f(a)) + g'(f(a))(u - f(a)) \approx g(f(a)) + g'(f(a))\big((f(a) + f'(a)(x - a)) - f(a)\big).$$
This simplifies to
$$g(f(x)) \approx g(f(a)) + g'(f(a))f'(a)(x - a).$$
This is equivalent to the chain rule,
$$\frac{d}{dx}\big(g(f(x))\big) = g'(f(x))f'(x).$$
Together, these 6 rules account for all the general rules we have regarding differentiation. So every
rule of differentiation has an equivalent formulation in terms of linear approximations.
Example. Using the rules, the linear approximation for,
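4. Quadratic approximations. Sometimes the linear approximation is not good enough. One
example is the linear approximation of $\cos(x)$ as 1 for $x \approx 0$. The linear approximation gives no
idea whether $\cos(x)$ is greater than 1, less than 1, concave up, concave down, etc. This is remedied
by the quadratic approximation,
$$f(x) \approx f(a) + f'(a)(x - a) + \frac{f''(a)}{2}(x - a)^2 \quad \text{for } x \approx a.$$
For example, $\cos(x) \approx 1 - x^2/2$ for $x \approx 0$, which shows that $\cos(x)$ is less than 1 and concave down near 0.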
Each of the basic approximations has an analogous quadratic approximation. Each of the rules for
combining linear approximations has an analogous rule for quadratic approximations.
5. The mean value theorem. This was discussed only very briefly. If a function $f(x)$ is
differentiable on the interval having $a$ and $b$ as endpoints, then there is a point $c$ strictly between $a$
and $b$ so that the slope of the tangent line to $y = f(x)$ at $x = c$ equals the slope of the secant line
to $y = f(x)$ containing $(a, f(a))$ and $(b, f(b))$,
$$f'(c) = \frac{f(b) - f(a)}{b - a}.$$
This is sometimes useful for bounding f (b) − f (a), if a bound on the derivative of f (x) is known.
1. Application of the Mean Value Theorem. A real-world application of the Mean Value
Theorem is error analysis. A device accepts an input signal $x$ and returns an output signal $y$. If
the input signal is always in the range $-1/2 \le x \le 1/2$ and if the output signal is
$$y = f(x) = \frac{1}{1 + x + x^2 + x^3},$$
what precision of the input signal $x$ is required to get a precision of $\pm 10^{-3}$ for the output signal?
If the ideal input signal is $x = a$, and if the precision is $\pm h$, then the actual input signal is in the
range $a - h \le x \le a + h$. The error of the output signal is $|f(x) - f(a)|$. By the Mean Value
Theorem,
$$\frac{f(x) - f(a)}{x - a} = f'(c)$$
for some $c$ between $a$ and $x$. The derivative $f'(x)$ is
$$f'(x) = \frac{-(3x^2 + 2x + 1)}{(1 + x + x^2 + x^3)^2}.$$
For $-1/2 \le x \le 1/2$, the numerator $3x^2 + 2x + 1$ is largest at $x = 1/2$, and the denominator is smallest at $x = -1/2$ (since $1 + x + x^2 + x^3$ is increasing). Thus
$$|f'(x)| \le \frac{3(1/2)^2 + 2(1/2) + 1}{[1 + (-1/2) + (-1/2)^2 + (-1/2)^3]^2} = 7.04.$$
So $|f(x) - f(a)| \le 7.04h$, and an input precision of $h \le 10^{-3}/7.04 \approx 1.4 \times 10^{-4}$ suffices.
A function $f(x)$ is increasing, resp. decreasing, if $f(a) < f(b)$ whenever $a < b$, resp. $f(a) > f(b)$ whenever $a < b$.
If $f(a)$ is less than or equal to $f(b)$, resp. greater than or equal to $f(b)$, whenever $a$ is less than
$b$, then $f(x)$ is nondecreasing, resp. nonincreasing. If $f(x)$ is increasing, the graph rises to the
right. If $f(x)$ is decreasing, the graph rises to the left.
If $f'$ is continuous near $a$ and $f'(a)$ is positive, the First Derivative Test guarantees that $f(x)$ is increasing for all $x$ sufficiently
close to $a$. If $f'$ is continuous near $a$ and $f'(a)$ is negative, the First Derivative Test guarantees that $f(x)$ is decreasing for all
$x$ sufficiently close to $a$.
Example. For the function $y = x^3 + x^2 - x - 1$, determine where $y$ is increasing and where $y$ is
decreasing.
The derivative is
$$y' = 3x^2 + 2x - 1 = (3x - 1)(x + 1).$$
Thus the derivative of $y$ changes sign only at the points $x = -1$ and $x = 1/3$. By testing sample
points, $y'$ is positive for $x > 1/3$, it is negative for $-1 < x < 1/3$, and it is positive for $x < -1$.
Therefore, by the First Derivative Test, $y$ is increasing for $x < -1$, $y$ is decreasing for $-1 < x < 1/3$,
and $y$ is increasing for $x > 1/3$.
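The sign test can be sketched in Python. Because $y'$ is a polynomial (hence continuous), its sign can change only at its roots $x = -1$ and $x = 1/3$, so one sample point per interval suffices; the particular sample points below are arbitrary choices.

```python
def y_prime(x):
    # Derivative of y = x^3 + x^2 - x - 1, which factors as (3x - 1)(x + 1)
    return 3 * x ** 2 + 2 * x - 1

# One sample point in each interval cut out by the roots x = -1 and x = 1/3
sample_points = {"x < -1": -2.0, "-1 < x < 1/3": 0.0, "x > 1/3": 1.0}
for interval, x in sample_points.items():
    trend = "increasing" if y_prime(x) > 0 else "decreasing"
    print(interval, y_prime(x), trend)
```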
3. Extremal points. If $f(x) \le f(a)$ for all $x$ near $a$, then $a$ is a local maximum. If $f(x) \ge f(a)$
for all $x$ near $a$, then $a$ is a local minimum. Because of the First Derivative Test, if $f'(a) > 0$ and
$f$ is defined to the right of $a$, the graph of $f$ rises to the right of $a$. Thus $a$ is not a local maximum.
Similarly, if $f'(a) < 0$ and $f$ is defined to the left of $a$, the graph of $f$ rises to the left of $a$. Thus
$a$ is not a local maximum. In particular, if $f$ is defined to both the right and left of $a$, if $f'(a)$ is
defined, and if $a$ is a local maximum, then $f'(a)$ equals 0. Similarly, if $f$ is defined to both the right
and left of $a$, if $f'(a)$ is defined, and if $a$ is a local minimum, then $f'(a)$ equals 0.
A point $a$ where $f'(a)$ is defined and equals 0 is a critical point. By the last paragraph, if $x = a$ is
a local maximum of $f$, respectively a local minimum of $f$, then one of the following holds.
(i) The function $f(x)$ is discontinuous at $a$.
(ii) The derivative $f'(a)$ is not defined.
(iii) The point $a$ is a left endpoint of the interval where $f$ is defined, and $f'(a) \le 0$, resp. $f'(a) \ge 0$.
(iv) The point $a$ is a right endpoint of the interval where $f$ is defined, and $f'(a) \ge 0$, resp.
$f'(a) \le 0$.
(v) The function $f$ is defined to the left and right of $a$, and $f'(a)$ equals 0. In other words, $a$ is a
critical point of $f$.
Example. For the function $y = x^3 + x^2 - x - 1$, the critical points are $x = -1$ and $x = 1/3$. By
examining where $y$ is increasing and decreasing, $x = -1$ is a local maximum and $x = 1/3$ is a local
minimum.
The plurals of “maximum” and “minimum” are “maxima” and “minima”. Together, local maxima
and local minima are called extremal points, or extrema. These are points where $f$ takes on an
extreme value, either the largest or the smallest among nearby points. A point where $f$ achieves its maximum value among all
points where $f$ is defined is a global maximum or absolute maximum. A point where $f$ achieves its
minimum value among all points where $f$ is defined is a global minimum or absolute minimum.
4. Concavity and the Second Derivative Test. For a differentiable function $f$, every “interior”
extremal point is a critical point of $f$. But not every critical point of $f$ is an extremal point.
Example. The function $f(x) = x^3$ has a critical point at $x = 0$. But $f(x)$ is everywhere increasing,
thus $x = 0$ is not an extremal point of $f$.
When is a critical point an extremal point? When is it a local maximum? When is it a local
minimum? This is closely related to the concavity of $f$. A function $f(x)$ is concave up, respectively
concave down, if no secant line segment to $f(x)$ crosses below the graph of $f$, resp. above the graph
of $f$. In symbols, $f$ is concave up, resp. concave down, if
$$\frac{f(c) - f(a)}{c - a} \le \frac{f(b) - f(a)}{b - a}, \quad \text{resp. } \ge, \quad \text{whenever } a < c < b.$$
In each case, the graph of y = f (x) becomes unbounded, and becomes arbitrarily close to the line
x = a. If x = a is a vertical asymptote, then f (x) has an infinite discontinuity at x = a.
The function $f$ has a horizontal asymptote $y = b$ if at least 1 of the following holds,
$$\lim_{x \to +\infty} f(x) = b, \qquad \lim_{x \to -\infty} f(x) = b.$$
In other words, the graph of $y = f(x)$ becomes arbitrarily close to the line $y = b$ as $x$ approaches
either $+\infty$ or $-\infty$.
Example. For the function $y = (x^3 + x)/(x^2 - 1) = x(x^2 + 1)/(x^2 - 1)$, the lines $x = 1$ and
$x = -1$ are vertical asymptotes. There is no horizontal asymptote. However, the graph of $y$ is
asymptotic to the line $y = x$, since $y - x = 2x/(x^2 - 1)$ tends to 0 as $x \to \pm\infty$. This was not discussed in lecture. A pair of functions $f$ and $g$ are
asymptotic to each other if the line $y = 0$ is a horizontal asymptote of $f - g$.
2. Applied maximum/minimum problems. Using the First Derivative Test, the maximum
and minimum of many functions can be computed. This is very important in applications.
Example. Two long walls meet at right angles making a corner. Using a length of 10 meters of
fence to form the other 2 sides of a rectangle, what is the largest area that can be enclosed in this
corner?
Step 1. Identify parameters. A parameter is a constant or variable. The constant in this
problem is 10 meters. Two variables are the length l of one side of the rectangle, and the width w
of the remaining side of the rectangle.
Step 2. Draw a diagram. This was done in lecture.
Step 3. Find the quantity to be maximized or minimized. The quantity to be maximized
is the area A of the rectangle. Since the area is the product of the length and width, A equals lw.
Step 4. Use the constraints to eliminate variables. The constraint is that the total length
of fence is 10 meters. Thus $l + w$ equals 10. This is used to eliminate $w$,
$$w = 10 - l.$$
Making this substitution, $A$ is now a function of $l$ alone,
$$A(l) = lw(l) = l(10 - l) = -l^2 + 10l.$$
Step 6. Find all critical points, endpoints, discontinuity points, etc. In most cases, it
suffices to find all critical points and endpoints. Occasionally it is also necessary to find all points
where $f'$ is not defined. Rarely it is necessary to also consider discontinuity points (although this
is usually so obvious that it does not require a separate step). In this case, the endpoints are $l = 0$
and $l = 10$. Since $A'(l) = 10 - 2l$, the one critical point is $l = 5$.
Step 7. Determine the global maximum or minimum. Checking all critical points, endpoints,
etc., determine the global maximum or the global minimum. In this case, $A(0)$ equals 0,
$A(10)$ equals 0 and $A(5)$ equals 25. Thus $l = 5$ is the global maximum.
Step 8. Back-substitute. Plug in the value of the single remaining independent variable to
determine the values of the eliminated variables. In this case, $w$ equals $10 - l$, which
is $10 - 5 = 5$ for $l = 5$. Thus, the largest area, 25 square meters, is enclosed by a square of side length 5 meters.
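Steps 6 through 8 amount to evaluating the objective at a short list of candidates. A minimal sketch for this problem (the name `area` is illustrative), with a grid check that no other length in $[0, 10]$ does better:

```python
def area(l):
    # A(l) = l * w with w = 10 - l
    return l * (10 - l)

# Candidates: endpoints l = 0, l = 10 and the critical point l = 5 (where A'(l) = 10 - 2l = 0)
candidates = [0, 10, 5]
best = max(candidates, key=area)
print(best, 10 - best, area(best))  # 5 5 25

# Sanity check on a grid of lengths between 0 and 10
print(all(area(i / 100) <= area(best) for i in range(1001)))  # True
```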
Example. A swimmer is in the water at a distance $b_1$ meters from shore. She wants to reach a
point on land $b_2$ meters from the water. The point is $a$ meters away, measured parallel to the shore. If the swimmer
swims $v_1$ meters per second and runs $v_2$ meters per second, at what distance $x$ from the closest
point on shore should she aim to minimize her time to the target? Mathematically, the swimmer
is at point $(0, b_1)$ and wants to reach point $(a, -b_2)$, where the shore is the x-axis. At what point
$(x, 0)$ should she aim?
The constants are $a$, $b_1$, $b_2$, $v_1$ and $v_2$. The variable is $x$. It is also convenient to introduce a
variable $d_1$ for the distance from $(0, b_1)$ to $(x, 0)$, and a variable $d_2$ for the distance from $(x, 0)$ to
$(a, -b_2)$. Although not obvious, it is also very convenient to introduce a variable $\theta_1$ for the acute
angle formed by the normal to the shore (a vertical line) and the line segment joining $(0, b_1)$ to $(x, 0)$, so that $\sin(\theta_1) = x/d_1$. Also introduce $\theta_2$ for the
acute angle formed by the normal to the shore and the line segment joining $(x, 0)$ to $(a, -b_2)$, so that $\sin(\theta_2) = (a - x)/d_2$.
The time $T_1$ to swim to point $(x, 0)$ is
$$T_1 = \frac{d_1}{v_1} = \frac{1}{v_1}(x^2 + b_1^2)^{1/2}.$$
The time $T_2$ to run from $(x, 0)$ to point $(a, -b_2)$ is
$$T_2 = \frac{d_2}{v_2} = \frac{1}{v_2}\left((a - x)^2 + b_2^2\right)^{1/2}.$$
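Technically, there are no endpoints. However, it is obvious that the minimum must occur for
$0 \le x \le a$. Thus these may be taken to be endpoints. Setting the derivative of $T = T_1 + T_2$,
$$\frac{dT}{dx} = \frac{x}{v_1 d_1} - \frac{a - x}{v_2 d_2},$$
equal to 0, the critical value occurs when
$$\frac{\sin(\theta_1)}{v_1} = \frac{\sin(\theta_2)}{v_2}.$$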
This is Snell’s Law for refraction of light upon crossing from one medium to another. For refraction,
a particle of light (perhaps fictitious) replaces the swimmer, a translucent medium of one type
replaces the water, and a translucent medium of a second type replaces the land. If light travels
with velocity v1 in the first medium and with velocity v2 in the second medium, light rays will refract
upon crossing the boundary between media. Snell’s Law describes the angles of this refraction.
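A numerical sketch of the swimmer problem, under hypothetical numbers ($a = 50$, $b_1 = 30$, $b_2 = 20$, $v_1 = 1.5$, $v_2 = 5$; none are from the notes). Since $T(x)$ is convex on $[0, a]$, a ternary search finds the minimizer; the sketch then checks Snell's condition with $\theta_1$, $\theta_2$ measured from the normal to the shore, so that $\sin(\theta_1) = x/d_1$ and $\sin(\theta_2) = (a - x)/d_2$.

```python
import math

def travel_time(x, a, b1, b2, v1, v2):
    # Swim from (0, b1) to (x, 0), then run from (x, 0) to (a, -b2)
    return math.hypot(x, b1) / v1 + math.hypot(a - x, b2) / v2

def best_aim(a, b1, b2, v1, v2, iterations=200):
    # T(x) is convex on [0, a], so ternary search converges to the minimizer
    lo, hi = 0.0, a
    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if travel_time(m1, a, b1, b2, v1, v2) < travel_time(m2, a, b1, b2, v1, v2):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2

# Hypothetical numbers: 30 m out, target 20 m inland and 50 m along the shore
a, b1, b2, v1, v2 = 50.0, 30.0, 20.0, 1.5, 5.0
x = best_aim(a, b1, b2, v1, v2)
sin1 = x / math.hypot(x, b1)            # theta_1 measured from the normal to the shore
sin2 = (a - x) / math.hypot(a - x, b2)  # theta_2 measured from the normal to the shore
print(x, sin1 / v1, sin2 / v2)
```

The two ratios printed at the end should agree to many digits, which is Snell's Law for this choice of speeds.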
Lecture 11. October 4, 2005
Homework. Problem Set 3 Part I: (g) and (h).
Practice Problems. Course Reader: 2E4, 2E8, 2E9.
1. Related rates. A situation that arises often in practice is that two quantities, say $x$ and $y$,
depend on a third independent variable, say $t$. The quantities $x$ and $y$ are related through some
constraint. Using the constraint, if the rate of change $dx/dt$ is known, the rate of change $dy/dt$ can
be inferred.
Example. For a spring displaced $x$ units from equilibrium, Hooke’s law implies the potential
energy of the spring is
$$P = \frac{1}{2}kx^2,$$
where $k$ is a constant with units $\mathrm{kg/s^2}$. At some moment $t = T$, a spring is displaced 5 cm from
equilibrium and has velocity 5 cm/s. In terms of the spring constant $k$, describe the rate of change
of the potential energy at $t = T$.
Implicitly differentiating the equation with respect to $t$ gives, using the chain rule,
$$\frac{dP}{dt} = \frac{1}{2}k(2x)\frac{dx}{dt} = kx\frac{dx}{dt}.$$
So, at time $t = T$,
$$\frac{dP}{dt}(T) = kx(T)\frac{dx}{dt}(T) = k(5)(5)\,\mathrm{cm^2/s} = 25k\,\mathrm{cm^2/s}.$$
2. Method for solving related-rates problems. Many of these steps apply to any word
problem in mathematics.
(iii) Label all dependent variables. In the example, $x$ and $P$ are dependent variables.
(v) Write the given rate of change and the unknown rate of change. In the example, $dx/dt(T)$ is
given as 5 cm/s, and $dP/dt$ is unknown.
(vi) Using the diagram and any other information, find constraints among the dependent variables.
In the example, this is the equation $P = kx^2/2$.
(vii) Implicitly differentiate the constraint equations with respect to the independent variable. In
the example, this gives $dP/dt = kx\,dx/dt$.
(viii) Substitute in all known quantities and solve for the unknown rate of change. In the example,
$dP/dt(T)$ equals $25k\,\mathrm{cm^2/s}$.
Example. A state trooper waits a distance $a$ from a highway for passing speeders. The speed
limit is 60 mph. The trooper aims her radar gun at an angle of $\pi/4$ to the road. The radar registers
a passing car moving away from the trooper at a speed of 50 mph. Should the trooper ticket the
driver?
The independent variable is time $t$. The constants are the distance $a$ and the angle $\theta = \pi/4$.
Label a coordinate system with the trooper at the origin and the highway equal to the line $y = a$.
Label the position of the car along the highway as $x$, moving in the positive direction. Denote by
$r$ the distance of the car from the trooper. Then $x$ and $r$ are dependent variables. The rate of
change $dr/dt(T)$ is given as 50 mph. The unknown rate of change is $dx/dt(T)$. The constraint is
the Pythagorean theorem,
$$r^2 = x^2 + y^2 = x^2 + a^2.$$
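A sketch of how the computation can be finished, assuming the angle $\pi/4$ is measured between the radar's line of sight and the road, so that $x/r = \cos(\pi/4)$ at $t = T$. Differentiating $r^2 = x^2 + a^2$ with $a$ constant gives $r\,dr/dt = x\,dx/dt$, so $dx/dt = (r/x)\,dr/dt = 50\sqrt{2}$ mph.

```python
import math

# Assumption: the radar's line of sight makes angle pi/4 with the road, so x / r = cos(pi/4).
# Differentiating r^2 = x^2 + a^2 (a constant) gives r dr/dt = x dx/dt.
theta = math.pi / 4
dr_dt = 50.0                          # mph, the radar reading
dx_dt = dr_dt / math.cos(theta)       # dx/dt = (r / x) dr/dt
print(round(dx_dt, 1), dx_dt > 60.0)  # 70.7 True
```

Under this reading of the angle, the car's speed along the highway is about 70.7 mph, above the 60 mph limit.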
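$$\frac{d\sin(\theta)}{d\theta}\frac{d\theta}{dt} = \frac{d(x^{-1})}{dx}\frac{dx}{dt},$$
or
$$\cos(\theta)\frac{d\theta}{dt} = \frac{-1}{x^2}\frac{dx}{dt}.$$
Since $x(T)$ equals 2, $\sin(\theta(T)) = 1/2$, and thus $\cos(\theta(T))$ equals $\sqrt{3}/2$. Plugging in gives
$$\frac{\sqrt{3}}{2}\frac{d\theta}{dt}(T) = \frac{-1}{(2)^2}v = \frac{-v}{4}.$$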
Solving gives
$$\frac{d\theta}{dt}(T) = -\frac{v}{2\sqrt{3}}.$$
3. Another applied max/min problem. As review for Exam 2, this is another applied max/min
problem. A trapezoid is inscribed inside the upper unit semicircle, x2 + y 2 = 1, y ≥ 0. The base
of the trapezoid is the diameter of the semicircle lying on the xaxis. The top of the trapezoid
is parallel to the xaxis joining (−x, y) to (x, y) for a point (x, y) on the unit circle in the first
quadrant. What is the maximal area enclosed by such a trapezoid?
The parameters are x and y. The height of the trapezoid is y. The area of a trapezoid is the product of the height with the average of the parallel sides. Thus,
$$A = y\,\frac{2 + 2x}{2} = (x + 1)y.$$
This is the quantity to be maximized. There is a constraint among the parameters,
$$x^2 + y^2 = 1.$$
Substituting $y = \sqrt{1 - x^2}$ and differentiating gives,
$$\frac{dA}{dx} = \sqrt{1 - x^2} + (x + 1)\frac{-2x}{2\sqrt{1 - x^2}} = \frac{1}{\sqrt{1 - x^2}}\left((1 - x^2) - (x^2 + x)\right) = \frac{-(2x^2 + x - 1)}{\sqrt{1 - x^2}}.$$
Because the quadratic polynomial $2x^2 + x - 1$ factors as,
$$2x^2 + x - 1 = (2x - 1)(x + 1),$$
the critical points of A are x = −1 and x = 1/2. But x = −1 does not give a point in the first quadrant. Thus A is maximized either at one of the endpoints x = 0, x = 1 or at the critical point x = 1/2. Plugging in gives,
$$A(0) = 1, \qquad A(1/2) = 3\sqrt{3}/4, \qquad A(1) = 0.$$
So the maximal area is $3\sqrt{3}/4 \approx 1.299$, attained at x = 1/2.
Two other methods were given in lecture. The fastest among the three is to instead maximize $A^2$, which has the same maximizer because A ≥ 0:
$$A^2 = (x + 1)^2y^2 = (x + 1)^2(1 - x^2).$$
The derivative of this polynomial is very fast to compute, $-2(x + 1)^2(2x - 1)$, and gives the same answer as above.
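A brute-force Python check of both approaches, sampling x on a grid over [0, 1] (the grid spacing is an arbitrary choice). Since A ≥ 0, maximizing A and maximizing $A^2$ pick out the same x.

```python
import math


def area(x):
    # Trapezoid with base [-1, 1] and top from (-x, y) to (x, y), y = sqrt(1 - x^2).
    return (x + 1) * math.sqrt(1 - x * x)


def area_squared(x):
    # A^2 = (x + 1)^2 (1 - x^2) is a polynomial; its derivative is -2(x + 1)^2 (2x - 1).
    return (x + 1) ** 2 * (1 - x * x)


xs = [i / 10000 for i in range(10001)]  # grid on [0, 1], includes x = 0.5 exactly
best_x = max(xs, key=area)
print(best_x, area(best_x), 3 * math.sqrt(3) / 4)
```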
This was a guest lecture by Sabri Kilic. Notes from the lecture will not be posted. As always,
Homework. Problem Set 4 Part I: (a) and (b); Part II: Problem 3.
Example. Using differential notation, the derivative of $\sin(\sqrt{x^2 + 1})$ is,
$$d\sin((x^2+1)^{1/2}) = \cos((x^2+1)^{1/2})\,d(x^2+1)^{1/2} = \cos((x^2+1)^{1/2})\left(\tfrac{1}{2}(x^2+1)^{-1/2}\right)d(x^2+1)$$
$$= \cos((x^2+1)^{1/2})\,\tfrac{1}{2}(x^2+1)^{-1/2}(2x\,dx) = x(x^2+1)^{-1/2}\cos((x^2+1)^{1/2})\,dx.$$
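A quick numerical cross-check of this differential computation in Python: it compares the closed-form derivative with a central difference quotient at a few sample points (the step h and the points are arbitrary choices).

```python
import math


def g(x):
    return math.sin(math.sqrt(x * x + 1))


def g_prime(x):
    # Result of the differential computation: x (x^2+1)^(-1/2) cos((x^2+1)^(1/2)).
    s = math.sqrt(x * x + 1)
    return x / s * math.cos(s)


def central_difference(f, x, h=1e-6):
    return (f(x + h) - f(x - h)) / (2 * h)


for x in (-2.0, 0.0, 0.7, 3.0):
    print(x, g_prime(x), central_difference(g, x))
```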
Problem (Antidifferentiation). Given a function f(x), find a function F(x) satisfying $\frac{dF}{dx} = f(x)$.
A function F(x) solving the problem is called an antiderivative of f(x), or sometimes an indefinite integral of f(x). The notation for this is,
$$F(x) = \int f(x)\,dx.$$
The expression f (x) is called the integrand. It is important to note, if F (x) is one antiderivative
of f (x), then for each constant C, F (x) + C is also an antiderivative of f (x). The constant C is
called a constant of integration.
In a sense that can be made precise, the problem of differentiation has a complete solution whenever
F (x) is a “simple expression”, i.e., a function built from the differentiable functions we have seen
so far. Unfortunately, for very many simple functions f (x), no antiderivative of f (x) has a simple
expression. In large part, this is what makes antidifferentiation difficult. Luckily, many of the most
important simple functions f (x) do have an antiderivative with a simple expression. One goal of
this unit is to learn how to recognize when a simple antiderivative exists, and some tools to compute
the antiderivative.
3. Antidifferentiation. Guess-and-check. The main technique for antidifferentiation is educated guessing.
Example. Find an antiderivative of $f(x) = x^2 + 2x + 1$. Since the derivative of $x^n$ is $nx^{n-1}$, it is reasonable to guess there is an antiderivative of the form $F(x) = Ax^3 + Bx^2 + Cx$. Differentiation gives,
$$\frac{dF}{dx} = 3Ax^2 + 2Bx + C.$$
Thus, F(x) is an antiderivative of f(x) if and only if,
$$3A = 1, \qquad 2B = 2, \qquad \text{and} \qquad C = 1.$$
So A = 1/3, B = 1, C = 1, and $F(x) = x^3/3 + x^2 + x$ is an antiderivative.
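For polynomials, guess-and-check always succeeds: guessing F with one higher power than f and matching coefficients gives a simple system, one equation per coefficient. A Python sketch using exact fractions (the function names are illustrative, not from the notes):

```python
from fractions import Fraction


def antiderivative_coeffs(coeffs):
    """Guess-and-check for polynomials.

    coeffs[j] is the coefficient of x^j in f. The guess F = sum B_j x^(j+1)
    has dF/dx = sum (j+1) B_j x^j, so matching coefficients forces
    (j+1) B_j = coeffs[j]. Returns the coefficients of F (constant term 0).
    """
    return [Fraction(0)] + [Fraction(c) / (j + 1) for j, c in enumerate(coeffs)]


def derivative_coeffs(coeffs):
    # d/dx sum c_j x^j = sum j c_j x^(j-1)
    return [j * c for j, c in enumerate(coeffs)][1:]


f = [1, 2, 1]                 # f(x) = 1 + 2x + x^2
F = antiderivative_coeffs(f)  # F(x) = x + x^2 + x^3/3
print(F)
print(derivative_coeffs(F) == f)  # the "check" step
```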
Here is a checklist for applying integration by substitution to find the antiderivative of f(x).
(i) Find an expression u(x) so that most of the integrand f(x) can be expressed as a simpler function of u(x).
(iv) Try to write f(x)dx as g(u)du. If you cannot do this, the method does not apply with the given choice of u.
(v) Find an antiderivative $G(u) = \int g(u)\,du$ for the simpler integrand g(u) (if this is possible).
Most of the integrand is a function of sin(x). So substitute u(x) = sin(x). The differential of u is du = cos(x)dx. The differential $\sin(x)^3\cos(x)\,dx$ contains du = cos(x)dx as a factor. The remainder of the integrand is $\sin(x)^3 = u^3$. So, according to integration by substitution,
$$\int \sin(x)^3\cos(x)\,dx = \int u^3\,du = \frac{1}{4}u^4 + C.$$
Finally, back-substitute u = sin(x) to get,
$$\int \sin(x)^3\cos(x)\,dx = (\sin(x))^4/4 + C.$$
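The antiderivative can be checked numerically. By the Fundamental Theorem of Calculus (discussed later in these notes), its increment over [0, π/2] should match a Riemann sum for $\sin(x)^3\cos(x)$. A Python sketch using a midpoint sum with an arbitrarily chosen number of strips:

```python
import math


def midpoint_sum(f, a, b, n=10000):
    dx = (b - a) / n
    return sum(f(a + (k + 0.5) * dx) for k in range(n)) * dx


def F(x):
    # Antiderivative found by the substitution u = sin(x).
    return math.sin(x) ** 4 / 4


f = lambda x: math.sin(x) ** 3 * math.cos(x)
a, b = 0.0, math.pi / 2
print(midpoint_sum(f, a, b), F(b) - F(a))  # both close to 0.25
```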
A more sophisticated version of the method of exhaustion gives the Riemann integral. Here is the
basic problem.
Problem (Area). Find the signed area between the graph of y = f(x) and the x-axis over the interval a ≤ x ≤ b.
For a region above the x-axis, the signed area is simply the area. For a region below the x-axis, the signed area is the negative of the area. For a region partly above the x-axis and partly below the x-axis, the signed area is the sum of the signed area of the region above the x-axis and the signed area of the region below the x-axis.
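2. Partitions. A partition of an interval [a, b] is a finite decomposition of the interval as a union of nonoverlapping subintervals,
$$[a, b] = [x_0, x_1] \cup [x_1, x_2] \cup \cdots \cup [x_{n-1}, x_n].$$
Since an interval is determined by its right and left endpoints, to specify a partition of [a, b], it is equivalent to give an ordered sequence of increasing numbers,
$$a = x_0 < x_1 < \cdots < x_{n-1} < x_n = b.$$
The length of the kth subinterval is,
$$\Delta x_k = x_k - x_{k-1}.$$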
A partition is fine if the subintervals are small, and coarse if the subintervals are large. It may seem the number of intervals n is a good measure of fineness: since the subintervals of a fine partition are small, the number n of subintervals must be large. However, a partition into many subintervals may include a few subintervals that are quite large. For instance, the partition
$$[0, 1] = \left[0, \tfrac{1}{2n}\right] \cup \left[\tfrac{1}{2n}, \tfrac{2}{2n}\right] \cup \left[\tfrac{2}{2n}, \tfrac{3}{2n}\right] \cup \cdots \cup \left[\tfrac{n-1}{2n}, \tfrac{n}{2n}\right] \cup \left[\tfrac{1}{2}, 1\right]$$
has n very small intervals of length 1/(2n), but has one interval, [1/2, 1], of length 1/2. The number 1/2 may not seem large, but as n increases, it is quite large compared to 1/(2n).
Because of such examples, a better measure of fineness is mesh size: the mesh size of a partition is the maximal length of any subinterval in the partition,
$$\text{mesh} = \max(\Delta x_1, \Delta x_2, \ldots, \Delta x_n).$$
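A short Python sketch computing mesh sizes exactly with fractions. It builds the lopsided partition from the example (n pieces of length 1/(2n), followed by [1/2, 1]) and shows the mesh stays 1/2 no matter how large n is. The function names are illustrative.

```python
from fractions import Fraction


def mesh(points):
    """Mesh size of the partition a = points[0] < ... < points[-1] = b."""
    return max(right - left for left, right in zip(points, points[1:]))


def lopsided_partition(n):
    # n pieces of length 1/(2n) covering [0, 1/2], then the single piece [1/2, 1].
    return [Fraction(k, 2 * n) for k in range(n + 1)] + [Fraction(1)]


for n in (2, 10, 1000):
    p = lopsided_partition(n)
    print(n, len(p) - 1, mesh(p))  # n + 1 subintervals, mesh 1/2 every time
```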
The sum above is a Riemann sum. It is an approximation of the signed area of the curvilinear
region.
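There are many choices of partition. And for each partition, there are many choices for the numbers $x_k^*$. However, there are some special choices. On the kth interval, the smallest value f(x) takes on is denoted by,
$$y_{k,\min} = \min\{f(x) \mid x_{k-1} \le x \le x_k\}.$$
Similarly, the largest value f(x) takes on is denoted by,
$$y_{k,\max} = \max\{f(x) \mid x_{k-1} \le x \le x_k\}.$$
For every choice of $x_k^*$ in the kth interval, $y_k^* = f(x_k^*)$ is trapped between these two values,
$$y_{k,\min} \le y_k^* \le y_{k,\max}.$$
Denoting,
$$\Delta A_{k,\min} = y_{k,\min}\,\Delta x_k, \qquad \Delta A_{k,\max} = y_{k,\max}\,\Delta x_k,$$
the area $\Delta A_k = y_k^*\,\Delta x_k$ is trapped between these two values, $\Delta A_{k,\min} \le \Delta A_k \le \Delta A_{k,\max}$. Summing over k, with $A_{\min} = \sum_k \Delta A_{k,\min}$ and $A_{\max} = \sum_k \Delta A_{k,\max}$, the Riemann sum A satisfies,
$$A_{\min} \le A \le A_{\max}.$$
Thus, if $A_{\min}$ and $A_{\max}$ are close to each other, the value of A does not depend very much on the choices of the numbers $x_k^*$.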
4. The Riemann integral. The method of the Riemann integral is to compute both $A_{\min}$ and $A_{\max}$ for a sequence of partitions whose mesh sizes approach 0. The mesh size measures the fineness of the partition, thus also the fit of the union of vertical strips to the curvilinear region. If the two limits,
$$\lim_{\text{mesh}\to 0} A_{\min}, \qquad \lim_{\text{mesh}\to 0} A_{\max},$$
are defined and equal, it is said the Riemann integral exists, and the common limit is called the Riemann integral,
$$\int_a^b f(x)\,dx = \lim_{\text{mesh}\to 0} A_{\min} = \lim_{\text{mesh}\to 0} A_{\max}.$$
Also, f (x) is said to be Riemann integrable on the interval [a, b]. Another name for the Riemann
integral is the definite integral.
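Example. Consider the function f(x) = x on the interval 0 ≤ x ≤ L, for some positive number L. Form the partition with n subintervals of equal length,
$$x_k = kL/n, \qquad k = 0, 1, \ldots, n.$$
Every interval has length $\Delta x_k = L/n$. So the mesh size is L/n. The minimum value of f(x) on the interval $x_{k-1} \le x \le x_k$ is $y_{k,\min} = x_{k-1} = (k - 1)L/n$. The maximum value is $y_{k,\max} = x_k = kL/n$. Thus,
$$A_{\min} = \sum_{k=1}^n y_{k,\min}\,\Delta x_k = \sum_{k=1}^n \frac{(k - 1)L}{n}\,\frac{L}{n} = \frac{L^2}{n^2}\sum_{k=1}^n (k - 1),$$
and,
$$A_{\max} = \sum_{k=1}^n y_{k,\max}\,\Delta x_k = \sum_{k=1}^n \frac{kL}{n}\,\frac{L}{n} = \frac{L^2}{n^2}\sum_{k=1}^n k.$$
To evaluate these sums, use the well-known formula,
$$\sum_{k=1}^n k = \frac{n(n + 1)}{2}.$$
Since $\sum_{k=1}^n (k - 1) = \sum_{k=1}^{n-1} k = (n - 1)n/2$, this gives,
$$A_{\min} = \frac{L^2}{2}\left(1 - \frac{1}{n}\right), \qquad A_{\max} = \frac{L^2}{2}\left(1 + \frac{1}{n}\right).$$
As the mesh size L/n tends to 0, n tends to infinity, and both limits equal $L^2/2$. So $\int_0^L x\,dx = L^2/2$. This agrees with the familiar result from high-school geometry: the area of a triangle equals one half of the base times the height, since both the base and height of this triangle are L.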
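These sums can be checked with exact rational arithmetic. The Python sketch below computes $A_{\min}$ and $A_{\max}$ for f(x) = x on [0, L] directly from left and right endpoints and confirms the closed forms $(L^2/2)(1 - 1/n)$ and $(L^2/2)(1 + 1/n)$; L = 3 is an arbitrary test value.

```python
from fractions import Fraction


def lower_upper_sums_identity(L, n):
    """A_min and A_max for f(x) = x on [0, L] with n equal subintervals."""
    dx = Fraction(L) / n
    a_min = sum((k - 1) * dx * dx for k in range(1, n + 1))  # left endpoints
    a_max = sum(k * dx * dx for k in range(1, n + 1))        # right endpoints
    return a_min, a_max


L = 3
for n in (1, 10, 100):
    a_min, a_max = lower_upper_sums_identity(L, n)
    # Closed forms from sum k = n(n+1)/2.
    assert a_min == Fraction(L * L, 2) * (1 - Fraction(1, n))
    assert a_max == Fraction(L * L, 2) * (1 + Fraction(1, n))
    print(n, float(a_min), float(a_max))  # both approach L^2/2 = 4.5
```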
5. Rules for Riemann integrals. There are several rules for Riemann integrals, summarized below.
$$\int_a^b (f(x) + g(x))\,dx = \int_a^b f(x)\,dx + \int_a^b g(x)\,dx,$$
$$\int_a^b (r \cdot f(x))\,dx = r \cdot \int_a^b f(x)\,dx,$$
$$\int_a^b f(x)\,dx + \int_b^c f(x)\,dx = \int_a^c f(x)\,dx.$$
To evaluate each of the sums, make the substitution $c = e^{b/n}$. Then the lower sum is,
$$A_{\min} = \frac{b}{n}\sum_{k=1}^n c^{k-1} = \frac{b}{n}\sum_{l=0}^{n-1} c^l.$$
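Writing h = b/n (so that $c = e^h$) and summing the geometric series gives,
$$A_{\min} = (e^b - 1)\,\frac{h}{e^h - 1},$$
$$A_{\max} = (e^b - 1)\,\frac{h\,e^h}{e^h - 1}.$$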
Taking the limit of Amin , respectively Amax , as n tends to infinity is the same as taking the limit
as h tends to 0.
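Now observe that,
$$\lim_{h\to 0}\frac{e^h - 1}{h}$$
is the difference quotient limit giving the derivative of $e^x$ at x = 0. Since $de^x/dx$ equals $e^x$, and since $e^0$ equals 1, this gives,
$$\lim_{h\to 0}\frac{e^h - 1}{h} = 1.$$
Inverting gives,
$$\lim_{h\to 0}\frac{h}{e^h - 1} = \left(\lim_{h\to 0}\frac{e^h - 1}{h}\right)^{-1} = (1)^{-1} = 1.$$
Also, because $e^x$ is continuous,
$$\lim_{h\to 0} e^h = e^0 = 1.$$
Therefore both $A_{\min}$ and $A_{\max}$ tend to $e^b - 1$, so $\int_0^b e^x\,dx = e^b - 1$.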
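A numerical illustration in Python, assuming (as the sums above indicate) that the integrand is $e^x$ on [0, b] with n equal subintervals and h = b/n. The lower sum matches its closed form, and both sums squeeze $e^b - 1$ as n grows; b = 1.5 is an arbitrary test value.

```python
import math


def exp_sums(b, n):
    """Lower and upper Riemann sums for e^x on [0, b], n equal pieces (b > 0)."""
    h = b / n
    a_min = h * sum(math.exp((k - 1) * h) for k in range(1, n + 1))
    a_max = h * sum(math.exp(k * h) for k in range(1, n + 1))
    return a_min, a_max


b = 1.5
for n in (10, 100, 10000):
    a_min, a_max = exp_sums(b, n)
    h = b / n
    closed_min = (math.exp(b) - 1) * h / (math.exp(h) - 1)
    print(n, a_min, closed_min, a_max)
print("e^b - 1 =", math.exp(b) - 1)
```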
2. The Riemann sum for $x^r$. Let r > 0 be a positive real number. The problem is to compute the Riemann integral,
$$\int_1^b x^r\,dx,$$
using Riemann sums (here b > 1). For this particular integral, a different partition than usual is more efficient. Let n be a positive integer, and let q be the real number,
$$q = b^{1/n}.$$
For $k = 0, 1, \ldots, n$, define the partition points,
$$x_k = q^k.$$
Observe that,
$$1 = x_0 < x_1 < \cdots < x_{n-1} < x_n = (b^{1/n})^n = b.$$
The length of the kth subinterval is,
$$\Delta x_k = q^k - q^{k-1} = q^{k-1}(q - 1).$$
Observe this increases as k increases. So this is not the partition of [1, b] into n equal subintervals. The mesh size is,
$$\Delta x_n = q^{n-1}(q - 1) = b(q - 1)/q,$$
which tends to 0 as n tends to infinity, because $q = b^{1/n}$ tends to 1. Thus, even though this isn't the most obvious choice of partition, it can be used to compute the Riemann integral.
Because $x^r$ is increasing, the minimum value of $x^r$ on the interval $[x_{k-1}, x_k]$ occurs at the left endpoint,
$$y_{k,\min} = x_{k-1}^r = q^{(k-1)r}.$$
Similarly, the maximum value occurs at the right endpoint,
$$y_{k,\max} = x_k^r = q^{kr}.$$
To evaluate the sum, make the substitution $c = q^{r+1}$. Then the sum is,
$$\sum_{k=1}^n c^{k-1} = 1 + c + c^2 + \cdots + c^{n-2} + c^{n-1}.$$
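A Python sketch of the lower sum on this geometric partition. Each term $y_{k,\min}\Delta x_k$ equals $q^{(k-1)r}\,q^{k-1}(q - 1) = (q - 1)c^{k-1}$, so the lower sum is $(q - 1)$ times the geometric series above. The comparison value $(b^{r+1} - 1)/(r + 1)$ is the standard value of the integral, assumed here rather than derived; r = 2 and b = 2 are arbitrary test values.

```python
def geometric_lower_sum(r, b, n):
    """Lower Riemann sum for x^r on [1, b] with points x_k = q^k, q = b^(1/n)."""
    q = b ** (1.0 / n)
    c = q ** (r + 1)
    # y_{k,min} * dx_k = q^((k-1) r) * q^(k-1) (q - 1) = (q - 1) c^(k-1)
    return (q - 1) * sum(c ** (k - 1) for k in range(1, n + 1))


def geometric_lower_sum_closed(r, b, n):
    # Geometric series: 1 + c + ... + c^(n-1) = (c^n - 1)/(c - 1), with c^n = b^(r+1).
    q = b ** (1.0 / n)
    c = q ** (r + 1)
    return (q - 1) * (b ** (r + 1) - 1) / (c - 1)


r, b = 2.0, 2.0
expected = (b ** (r + 1) - 1) / (r + 1)  # 7/3 for r = b = 2
for n in (10, 100, 10000):
    print(n, geometric_lower_sum(r, b, n), geometric_lower_sum_closed(r, b, n), expected)
```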
3. The Fundamental Theorem of Calculus. There is a single theorem that is at the heart of almost all applications involving Riemann integrals. The theorem answers two questions simultaneously: Which functions are Riemann integrable? What is the Riemann integral of a function?
The answer to the first question is: every function you are likely to encounter is Riemann integrable. Precisely, every continuous function, and every piecewise continuous function, on a closed interval [a, b] is Riemann integrable.
The answer to the second question is more interesting. Assume f(x) is a continuous function. Let x = a be a fixed point where f(x) is defined. Form the function,
$$F(x) = \int_a^x f(t)\,dt.$$
The function F(x) is defined whenever f(t) is defined on all of [a, x]. If f(x) is continuous, the Fundamental Theorem of Calculus asserts F(x) is differentiable and,
$$\frac{dF}{dx}(x) = \frac{d}{dx}\int_a^x f(t)\,dt = f(x).$$
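The proof of the second part is very easy. Consider the increment in F from x to x + Δx,
$$F(x + \Delta x) - F(x) = \int_a^{x+\Delta x} f(t)\,dt - \int_a^x f(t)\,dt = \int_x^{x+\Delta x} f(t)\,dt.$$
Let $y_{\min}$ be the minimum value of f(t) on the interval [x, x + Δx]. Let $y_{\max}$ be the maximum value of f(t) on the interval [x, x + Δx]. Then for every choice of partition $t_0 < t_1 < \cdots < t_n$ of [x, x + Δx], and every choice of values $y_k^*$ on the subintervals,
$$\sum_{k=1}^n y_{\min}\,\Delta t_k \le \sum_{k=1}^n y_k^*\,\Delta t_k \le \sum_{k=1}^n y_{\max}\,\Delta t_k.$$
The lower bound simplifies to,
$$\sum_{k=1}^n y_{\min}\,\Delta t_k = y_{\min}\,\Delta x,$$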
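because the total length of the interval [x, x + Δx] is Δx. Similarly, the upper bound is,
$$\sum_{k=1}^n y_{\max}\,\Delta t_k = y_{\max}\,\Delta x.$$
Since the Riemann integral $\int_x^{x+\Delta x} f(t)\,dt$ is a limit of such Riemann sums, it is also trapped between $y_{\min}\Delta x$ and $y_{\max}\Delta x$. Dividing by Δx > 0 gives,
$$y_{\min} \le \frac{F(x + \Delta x) - F(x)}{\Delta x} \le y_{\max}.$$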
The middle term is the difference quotient. Consider what happens as Δx tends to 0. Because f(t) is continuous, both the maximum and minimum values of f(t) on [x, x + Δx] simply limit to the value f(x). Thus,
$$\lim_{\Delta x\to 0} y_{\min} = \lim_{\Delta x\to 0} y_{\max} = f(x).$$
By the Squeezing Lemma for limits, since these two limits exist and are equal, the middle limit also exists and equals f(x),
$$\lim_{\Delta x\to 0}\frac{F(x + \Delta x) - F(x)}{\Delta x} = f(x).$$
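A numerical illustration of the squeeze argument in Python. F is approximated by a fine midpoint Riemann sum (an arbitrary choice of rule and step count), with f(t) = cos(t) as an arbitrary continuous example. The difference quotient stays between the minimum and maximum of f on [x, x + Δx] and tends to f(x).

```python
import math


def F(x, a=0.0, n=20000):
    """Approximate F(x) = integral of cos(t) dt from a to x by a midpoint sum."""
    dt = (x - a) / n
    return sum(math.cos(a + (k + 0.5) * dt) for k in range(n)) * dt


x = 1.0
for dx in (0.1, 0.01, 0.001):
    quotient = (F(x + dx) - F(x)) / dx
    # cos is decreasing on [1, 1.1], so y_min = cos(x + dx) and y_max = cos(x).
    print(dx, math.cos(x + dx), quotient, math.cos(x))
```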
4. Algorithm for computing Riemann integrals. The Fundamental Theorem of Calculus has many important applications. The most obvious is to give us a simpler method for computing Riemann integrals, under the hypothesis that we can compute the antiderivative. If f(x) is a continuous function and G(x) is a known antiderivative of f(x), then,
$$\int_a^b f(t)\,dt = G(b) - G(a).$$
This freedom is very useful, particularly when one or both of the limits of integration depend on some parameter. In this case, by convention, the dummy variable is chosen to be a letter different from that parameter.
$$\int_a^x f(x)\,dx \quad \text{INCORRECT}, \qquad \int_a^x f(t)\,dt \quad \text{CORRECT}.$$
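$$f(\theta) = \frac{\pi - 2\theta}{2} + 2\left(\frac{1}{2}\sin(\theta)\cos(\theta)\right) = \pi/2 - \theta + \frac{1}{2}\sin(2\theta).$$
Using standard rules of differentiation, the derivative is,
$$\frac{df}{d\theta} = -1 + \cos(2\theta).$$
Notice, by the double-angle formula for cosine, $\cos(2\theta) = 1 - 2\sin^2(\theta)$, this equals,
$$\frac{df}{d\theta} = -2\sin^2(\theta).$$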
The hardest step (hidden here) was the geometric computation of f(θ). However, this is completely unnecessary. Introduce a function,
$$G(t) = \int_0^t \sqrt{1 - x^2}\,dx.$$
Then,
$$f(\theta) = 2G(\cos(\theta)).$$
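By the chain rule and the Fundamental Theorem of Calculus, this gives,
$$\frac{df}{d\theta} = 2\sqrt{1 - \cos^2(\theta)}\,(-\sin(\theta)) = -2\sin^2(\theta),$$
using $\sqrt{1 - \cos^2(\theta)} = \sin(\theta)$, since sin(θ) ≥ 0 for the angles considered.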
The second method is indirect. The function G(t) has no simple expression. Nonetheless, this
method is faster. In many cases this is the only method that works.
The argument above using the chain rule and the Fundamental Theorem of Calculus is quite general. It gives the general equation,
$$\frac{d}{dx}\int_{u(x)}^{v(x)} f(t)\,dt = f(v(x))\,v'(x) - f(u(x))\,u'(x).$$
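A numerical check of this general equation in Python. The functions $f(t) = e^{-t^2}$, u(x) = sin(x) and v(x) = x² are hypothetical choices made only for the test; the integral is approximated by a midpoint Riemann sum and d/dx by a central difference.

```python
import math


def integral(f, lo, hi, n=20000):
    """Midpoint Riemann sum for the integral of f from lo to hi."""
    dt = (hi - lo) / n
    return sum(f(lo + (k + 0.5) * dt) for k in range(n)) * dt


# Hypothetical example choices for f, u, v and their derivatives.
f = lambda t: math.exp(-t * t)
u, du = math.sin, math.cos
v, dv = (lambda x: x * x), (lambda x: 2 * x)


def H(x):
    return integral(f, u(x), v(x))


x, h = 1.2, 1e-3
numeric = (H(x + h) - H(x - h)) / (2 * h)
formula = f(v(x)) * dv(x) - f(u(x)) * du(x)
print(numeric, formula)
```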
3. Geometric area and algebraic area. The Riemann integral is the algebraic area,
$$\int_a^b f(x)\,dx = (\text{Area above the } x\text{-axis}) - (\text{Area below the } x\text{-axis}).$$
The geometric area is the total area, both above and below the x-axis. Although geometric area does not in general equal algebraic area, it has a simple expression using the Riemann integral,
$$\text{Geometric area} = \int_a^b |f(x)|\,dx.$$
Example. Find both the algebraic area and the geometric area bounded by the x-axis and the graph of y = sin(x) over the interval −π < x < π.
Because sin(x) is an odd function, the area below the x-axis for −π < x < 0 equals the area above the x-axis for 0 < x < π. In the expression for the algebraic area, these areas cancel to give 0. This is borne out by computation,
$$\int_{-\pi}^{\pi}\sin(x)\,dx = (-\cos(x))\Big|_{-\pi}^{\pi} = -\cos(\pi) + \cos(-\pi) = -(-1) + (-1) = 0.$$
Thus the geometric area does not equal the algebraic area. But computation of the geometric area reduces to a straightforward Riemann integral,
$$\int_{-\pi}^{\pi}|\sin(x)|\,dx = 2\int_0^{\pi}\sin(x)\,dx = 2(-\cos(x))\Big|_0^{\pi} = 2(1 + 1) = 4.$$
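The two areas can also be approximated numerically. This Python sketch uses midpoint Riemann sums (an arbitrary choice of rule and strip count) for sin(x) and |sin(x)| on [−π, π].

```python
import math


def midpoint_sum(f, a, b, n=20000):
    dx = (b - a) / n
    return sum(f(a + (k + 0.5) * dx) for k in range(n)) * dx


algebraic = midpoint_sum(math.sin, -math.pi, math.pi)
geometric = midpoint_sum(lambda x: abs(math.sin(x)), -math.pi, math.pi)
print(algebraic, geometric)  # approximately 0 and 4
```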
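4. Estimates. For every pair of Riemann integrable functions f(x), g(x) on [a, b] satisfying the inequality f(x) ≤ g(x) for every choice of x, the following inequality holds,
$$\int_a^b f(x)\,dx \le \int_a^b g(x)\,dx.$$
Example. Estimate the Riemann integral,
$$\int_0^{0.1}\left(1 + \sqrt{\sin(x)}\right)dx.$$
The expression $\sqrt{\sin(x)}$ has no simple antiderivative. The value of the Riemann integral could be approximated well by a Riemann sum. An alternative approach is to use the estimates,
$$(1 - x^2/6)\sqrt{x} \le \sqrt{\sin(x)} \le \sqrt{x},$$
valid for 0 ≤ x ≤ 0.1. Adding 1 and integrating over [0, 0.1] traps the original integral between two Riemann integrals,
$$\int_0^{0.1}\left(1 + x^{1/2} - \tfrac{1}{6}x^{5/2}\right)dx \le \int_0^{0.1}\left(1 + \sqrt{\sin(x)}\right)dx \le \int_0^{0.1}\left(1 + x^{1/2}\right)dx.$$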
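The first and third Riemann integrals follow from the Fundamental Theorem of Calculus,
$$\int_0^{0.1}\left(1 + x^{1/2} - \tfrac{1}{6}x^{5/2}\right)dx = \left(x + \tfrac{2}{3}x^{3/2} - \tfrac{1}{21}x^{7/2}\right)\Big|_0^{0.1} = 0.1 + \frac{2}{3\sqrt{1000}} - \frac{1}{21\sqrt{10000000}} = 0.1210667926 \pm 10^{-10}.$$
Similarly,
$$\int_0^{0.1}\left(1 + x^{1/2}\right)dx = \left(x + \tfrac{2}{3}x^{3/2}\right)\Big|_0^{0.1} = 0.1 + \frac{2}{3\sqrt{1000}} = 0.1210818511 \pm 10^{-10}.$$
Since these two integrals agree to within $10^{-4}$, this gives the original integral,
$$\int_0^{0.1}\left(1 + \sqrt{\sin(x)}\right)dx = 0.1211 \pm 10^{-4}.$$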
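A numerical cross-check in Python: a midpoint Riemann sum for $1 + \sqrt{\sin(x)}$ on [0, 0.1] (strip count chosen arbitrarily) should land between the lower and upper bounds computed from the two comparison integrals.

```python
import math


def midpoint_sum(f, a, b, n=20000):
    dx = (b - a) / n
    return sum(f(a + (k + 0.5) * dx) for k in range(n)) * dx


lower = 0.1 + 2 / (3 * math.sqrt(1000)) - 1 / (21 * math.sqrt(10_000_000))
upper = 0.1 + 2 / (3 * math.sqrt(1000))
approx = midpoint_sum(lambda x: 1 + math.sqrt(math.sin(x)), 0.0, 0.1)
print(f"{lower:.10f} <= {approx:.10f} <= {upper:.10f}")
```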
5. Change of variables. After the Fundamental Theorem of Calculus, the most useful integral rule is the change of variables rule. The rule for Riemann integrals is nearly the same as the rule for antiderivatives. The additional feature for Riemann integrals is the change of the limits of integration.
$$\int_{x=a}^{x=b} f(u(x))\,u'(x)\,dx = \int_{u=u(a)}^{u=u(b)} f(u)\,du.$$
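Since tan(x) is not visibly the derivative of another function, we rewrite the integral and hope for the best.
$$\int_{\pi/4}^{\pi/3}\tan(x)\,dx = \int_{\pi/4}^{\pi/3}\frac{\sin(x)}{\cos(x)}\,dx.$$
Substitute u = cos(x), so du = −sin(x)dx; the limits x = π/4 and x = π/3 become $u = 1/\sqrt{2}$ and u = 1/2. The new integral can be computed by the Fundamental Theorem of Calculus, since 1/u is the derivative of ln(u) (for u > 0).
$$\int_{u=1/\sqrt{2}}^{u=1/2}\frac{-1}{u}\,du = (-\ln(|u|))\Big|_{1/\sqrt{2}}^{1/2} = -\ln(1/2) + \ln(1/\sqrt{2}) = \ln(2) - \ln(\sqrt{2}) = \tfrac{1}{2}\ln(2).$$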
It is only fair to note there is a second method. Make the same substitution to simplify the antiderivative of tan(x) to −ln(|u|) + C, and then back-substitute to get,
$$\int \tan(x)\,dx = -\ln(|\cos(x)|) + C.$$
Now use the Fundamental Theorem of Calculus with the original limits of integration. Both methods are correct. Usually the first method is faster and less error-prone; it requires no back-substitution.
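Both methods can be compared against a direct Riemann-sum approximation in Python (midpoint rule, arbitrary strip count). All three values should agree with ln(2)/2.

```python
import math


def midpoint_sum(f, a, b, n=20000):
    dx = (b - a) / n
    return sum(f(a + (k + 0.5) * dx) for k in range(n)) * dx


a, b = math.pi / 4, math.pi / 3
numeric = midpoint_sum(math.tan, a, b)
# First method: new limits u = 1/sqrt(2) and u = 1/2.
by_substitution = math.log(2) - math.log(math.sqrt(2))
# Second method: antiderivative -ln|cos(x)| with the original limits.
by_back_substitution = -math.log(abs(math.cos(b))) + math.log(abs(math.cos(a)))
print(numeric, by_substitution, by_back_substitution, math.log(2) / 2)
```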
6. Integrating backwards. This comes so naturally for most calculus students, it barely warrants mention. Technically, the Riemann integral,
$$\int_a^b f(x)\,dx,$$
is only defined if a ≤ b. What if a is larger than b? The only possible answer consistent with the Fundamental Theorem of Calculus is the following,
$$\int_a^b f(x)\,dx = -\int_b^a f(x)\,dx, \quad \text{if } a > b.$$
Because of the central role of the Fundamental Theorem of Calculus, the above equation is true by
convention. With this convention, the Fundamental Theorem of Calculus holds whenever a is less
than b, equal to b, or greater than b.
Lecture 17. October 21, 2005
Homework. Problem Set 5 Part I: (a) and (b); Part II: Problem 1.
(i) The ordinary differential equation,
$$y - \sin(x^2) = 0,$$
has order 0, because no derivatives of y actually occur in the equation. It has a unique (and rather trivial) solution,
$$y = \sin(x^2).$$
Because the solution is unique, it depends on 0 parameters (and the order is 0).
(ii) The ordinary differential equation,
$$\frac{dy}{dx} - \frac{1}{x + 1} = 0,$$
has order 1 because dy/dx occurs and no higher derivatives occur. Every solution is an antiderivative of 1/(x + 1),
$$y = \int \frac{1}{x + 1}\,dx = \ln(|x + 1|) + C.$$
Notice the solution depends on 1 parameter, C. And the order is 1.
(iii) The ordinary differential equation,
$$\frac{d^2y}{dx^2} + \omega^2y = 0,$$
has order 2. The general solution was found in Problem Set 2, Problem 4,
y = A cos(ωx) + B sin(ωx).
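A quick numerical check in Python that this general solution satisfies the equation. The second derivative is replaced by a central second difference, and A, B, ω are hypothetical sample values.

```python
import math


def y(x, A=0.7, B=-1.3, omega=2.5):
    # Hypothetical constants A, B, omega chosen for the check.
    return A * math.cos(omega * x) + B * math.sin(omega * x)


def second_difference(f, x, h=1e-4):
    return (f(x + h) - 2 * f(x) + f(x - h)) / (h * h)


omega = 2.5
for x in (0.0, 0.4, 1.7):
    residual = second_difference(y, x) + omega ** 2 * y(x)
    print(x, residual)  # close to 0, up to discretization and rounding error
```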
A linear ordinary differential equation of order k has the form,
$$a_k(x)\frac{d^ky}{dx^k} + a_{k-1}(x)\frac{d^{k-1}y}{dx^{k-1}} + \cdots + a_1(x)\frac{dy}{dx} + a_0(x)y = b(x),$$
for functions $a_k(x), \ldots, a_0(x), b(x)$. If b(x) is zero, the equation is homogeneous. Otherwise it is inhomogeneous. Very important is the case when all the functions $a_k(x), \ldots, a_0(x), b(x)$ are constant. Then the differential equation is called constant-coefficient. The solution of constant-coefficient linear ordinary differential equations is a main focus of Math 18.03.
2. Separable differential equations. Many differential equations arising in applications are examples of separable differential equations. A separable ordinary differential equation is a first-order differential equation,
$$\frac{dy}{dx} = F(x, y),$$
for which F(x, y) factors as,
$$F(x, y) = g(x)/h(y).$$
Example. Find the equation y = f(x) of every curve with the following property: For every point (x, y) on the curve, the tangent line to the curve is perpendicular to the line joining (x, y) to the origin (0, 0).
The slope of the tangent line to the curve at (x, y) is dy/dx. The slope of the line joining (0, 0) and (x, y) is y/x. Since the tangent line is perpendicular to the line joining (0, 0) and (x, y), the product of the two slopes is −1, so
$$\frac{dy}{dx} = -x/y.$$
Thus, the function y = f(x) is a solution to this separable differential equation.
(i). Factor F(x, y) as g(x)/h(y). This is often the most difficult step. In the example, it is quite easy: g(x) = −x and h(y) = y.
(ii). Separate the variables,
$$\frac{dy}{dx} = \frac{g(x)}{h(y)} \implies h(y)\,dy = g(x)\,dx.$$
(iii). Antidifferentiate both sides of the equation. In the example, the antiderivatives
$$\int y\,dy = \int -x\,dx,$$
give,
$$\frac{1}{2}y^2 = \frac{-1}{2}x^2 + C.$$
(iv). If there is an initial value, use it to find the constant of integration. An initial value problem is an ordinary differential equation together with some information for an initial value $x_0$ of the independent variable. It is often written,
$$\begin{cases} dy/dx = F(x, y), \\ y(x_0) = y_0. \end{cases}$$
The example was not an initial value problem. However, it can easily be made an initial value problem by specifying,
$$y(1) = 1,$$
for instance. With this condition, the constant C satisfies the equation,
$$\frac{1}{2}(1)^2 = \frac{-1}{2}(1)^2 + C.$$
The solution is,
$$C = 1.$$
(v). Simplify the answer. Sometimes it is best to solve for y = f(x). Sometimes this is unnecessary. It depends on the problem. In the example problem, the simplest answer is the implicit answer,
$$x^2 + y^2 = 2C.$$
For the initial value problem, C = 1, so
$$x^2 + y^2 = 2.$$
Thus every curve satisfying the geometric property lies on a circle centered at the origin.
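The implicit answer can be cross-checked numerically. The Python sketch below swaps in a technique not used in these notes, the classical fourth-order Runge-Kutta method, to integrate dy/dx = −x/y from the initial value y(1) = 1 back to x = 0, where x² + y² = 2 predicts $y = \sqrt{2}$. The step count is an arbitrary choice.

```python
import math


def rk4(F, x0, y0, x1, steps=1000):
    """Classical Runge-Kutta for dy/dx = F(x, y), marching from x0 to x1."""
    h = (x1 - x0) / steps
    x, y = x0, y0
    for _ in range(steps):
        k1 = F(x, y)
        k2 = F(x + h / 2, y + h * k1 / 2)
        k3 = F(x + h / 2, y + h * k2 / 2)
        k4 = F(x + h, y + h * k3)
        y += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        x += h
    return y


F = lambda x, y: -x / y
numeric = rk4(F, 1.0, 1.0, 0.0)  # march from x = 1 back to x = 0
print(numeric, math.sqrt(2))     # x^2 + y^2 = 2 gives y(0) = sqrt(2)
```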
Example. Here is a somewhat different example. There is a single separable ordinary differential
equation satisfied by every function,
y = (x − a)3 ,
where a is an arbitrary constant. Find this differential equation, and find all its solutions.
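Differentiating $y = (x - a)^3$ gives $dy/dx = 3(x - a)^2 = 3y^{2/3}$, so the differential equation is,
$$dy/dx = 3y^{2/3}.$$
Solving this separable equation by the method above gives the solutions,
$$y = (x - a)^3.$$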
However, there are other solutions. For instance, y = 0 is a solution. The general solution of the differential equation depends on 2 parameters, a < b,
$$y = \begin{cases} (x - a)^3, & x \le a, \\ 0, & a < x \le b, \\ (x - b)^3, & x > b. \end{cases}$$
The problem is in the step giving $dy/(3y^{2/3}) = dx$: if y equals 0, this equation involves division by zero. Division by zero is not allowed, so the method breaks down.
Important fact. This fact will not be used in this class. However, it is often crucial in real-world applications to know the solution to an initial value problem is unique. The fact is, the initial value problem
$$\begin{cases} dy/dx = F(x, y), \\ y(x_0) = y_0, \end{cases}$$
has a unique solution for x close to $x_0$ if F(x, y) and its partial derivative ∂F/∂y are continuous near $(x_0, y_0)$. In the previous example, $F(x, y) = 3y^{2/3}$ is continuous at $y_0 = 0$. But it is not differentiable with respect to y at $y_0 = 0$: the partial derivative $2y^{-1/3}$ is undefined there. Ultimately, this is the reason for the extra solutions of the differential equation.
3. Applications. Separable differential equations come up often in applications. The most common separable differential equation is the equation for exponential growth,
$$\frac{dy}{dt} = ky,$$
where k is a constant.
The solution behaves differently if k is positive or negative. For k positive, this equation arises in population growth and interest on savings, among others. For k negative, this equation arises in radioactive decay, a discharging capacitor in an RC circuit, and Newton’s law of cooling.
Population growth. The simplest model of population growth is that a population N(t) (modeled as continuous for simplicity) grows at a rate proportional to the size of the population. This gives,
$$\frac{dN}{dt} = kN.$$
Following the method gives,
$$dN/N = k\,dt, \qquad \int \frac{1}{N}\,dN = \int k\,dt, \qquad \ln(|N|) = kt + C.$$
Exponentiating both sides gives,
$$N(t) = N_0e^{kt},$$
where $N_0 = N(0)$.
Observe that N (t) increases without bound as t increases. When N is very large, the ecosystem
cannot support such a population. Thus the model is only valid if N (t) is not too large.
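A slightly more realistic model hypothesizes a constant, equilibrium population $N_{\text{equi}}$ sustainable indefinitely. The model is that the population grows at a rate proportional both to the population N and the difference $N_{\text{equi}} - N$,
$$\frac{dN}{dt} = kN(N_{\text{equi}} - N).$$
This is again a separable differential equation. Using the partial fractions decomposition $\frac{1}{N(N_{\text{equi}} - N)} = \frac{1}{N_{\text{equi}}}\left(\frac{1}{N} + \frac{1}{N_{\text{equi}} - N}\right)$, it gives the solution,
$$N(t) = \frac{N_{\text{equi}}}{1 + \frac{N_{\text{equi}} - N_0}{N_0}\,e^{-kN_{\text{equi}}t}},$$
where $N_0 = N(0) > 0$.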
The most important feature is that N(t) approaches $N_{\text{equi}}$ as t increases. The constant solution $N = N_{\text{equi}}$ is called a steady-state solution. In general, to find the steady-state solutions of a separable ordinary differential equation, assume the solution is constant, $y = y_1$, so that dy/dt is 0. In the original model of population growth, the only steady-state solution is N = 0. In the new model, there are 2 steady-state solutions, N = 0 and $N = N_{\text{equi}}$. In Math 18.03, stability is defined, and a method is given to show the only stable steady-state solution is $N = N_{\text{equi}}$.
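A Python sketch comparing a closed form with a numerical solution of $dN/dt = kN(N_{\text{equi}} - N)$. The closed form used here, $N(t) = N_{\text{equi}}/\left(1 + \frac{N_{\text{equi}} - N_0}{N_0}e^{-kN_{\text{equi}}t}\right)$ with $N_0 = N(0) > 0$, is the standard result of separating variables with partial fractions; the numerical side uses fourth-order Runge-Kutta, a technique not from these notes. The values of k, $N_{\text{equi}}$ and $N_0$ are hypothetical.

```python
import math


def logistic(t, k, n_equi, n0):
    """Closed-form solution of dN/dt = k N (n_equi - N) with N(0) = n0 > 0."""
    return n_equi / (1 + (n_equi - n0) / n0 * math.exp(-k * n_equi * t))


def rk4_autonomous(g, y0, t_end, steps=10000):
    """Classical Runge-Kutta for dy/dt = g(y) on [0, t_end]."""
    h = t_end / steps
    y = y0
    for _ in range(steps):
        k1 = g(y)
        k2 = g(y + h * k1 / 2)
        k3 = g(y + h * k2 / 2)
        k4 = g(y + h * k3)
        y += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    return y


k, n_equi, n0 = 0.002, 500.0, 20.0  # hypothetical parameters
rate = lambda n: k * n * (n_equi - n)
for t in (1.0, 5.0, 20.0):
    print(t, rk4_autonomous(rate, n0, t), logistic(t, k, n_equi, n0))
# Both columns approach the steady state n_equi = 500 as t grows.
```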
Radioactive decay. A radioactive isotope decays to a more stable isotope at a rate proportional to the remaining radioactive isotope. Thus the mass m(t) satisfies a differential equation,
$$\frac{dm}{dt} = -km.$$
Using the method, the solution is,
$$m(t) = m_0e^{-kt}.$$
An important feature in decay problems is the half-life. The half-life is the length of time necessary for the mass of radioactive isotope to decrease to one-half the initial mass,
$$m(T_{\text{half}}) = m_0/2.$$
Since $m_0e^{-kT_{\text{half}}} = m_0/2$, this gives $k = \ln(2)/T_{\text{half}}$.
Example. The half-life of a certain radioactive isotope is 25 years. How long is required for the mass to decrease to 1% of the initial mass? Using the formula above, k = ln(2)/25. Therefore the equation for the mass is,
$$m(t) = m_0e^{-\ln(2)t/25}.$$
Thus the time $t_f$ when the mass equals $0.01m_0$ satisfies,
$$e^{-\ln(2)t_f/25} = 0.01,$$
or,
$$\ln(2)t_f/25 = \ln(100) = 2\ln(10).$$
Solving gives,
$$t_f = 50\ln(10)/\ln(2) \approx 166 \text{ years}.$$
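The arithmetic can be reproduced in a few lines of Python. This sketch uses a half-life of 25 years, the value that matches k = ln(2)/25 in the computation of $t_f$.

```python
import math

half_life = 25.0                  # years
k = math.log(2) / half_life       # from e^(-k * half_life) = 1/2
t_f = math.log(100) / k           # solves e^(-k * t_f) = 0.01
print(f"k = {k:.5f} per year, t_f = {t_f:.1f} years")  # t_f is about 166.1
```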
Newton’s Law of Cooling. Isaac Newton proposed a law for the rate-of-change of the temperature T of an object placed in a large, effectively infinite, environment at a fixed ambient temperature $T_{\text{amb}}$. The law is that the rate-of-change of T is negatively proportional to the temperature difference $T - T_{\text{amb}}$,
$$\frac{dT}{dt} = -k(T - T_{\text{amb}}).$$
The method gives the solution,
$$T(t) = T_{\text{amb}} + (T_0 - T_{\text{amb}})e^{-kt},$$
where $T_0 = T(0)$.
1. Approximating Riemann integrals. Often, there is no simpler expression for the antiderivative than the expression given by the Fundamental Theorem of Calculus. In such cases, the simplest method to compute a Riemann integral is to use the definition. However, this is not necessarily the most efficient method. Often trapezoids, or regions under arcs of parabolas, give a better approximation to the Riemann integral than do vertical strips.
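2. The trapezoid rule. The problem is to find an approximation of the Riemann integral,
$$I = \int_a^b y\,dx$$
for a function y(x) defined on the interval [a, b]. Choose a partition of the interval [a, b] into n equal subintervals. The points of this partition are,
$$x_k = a + \frac{(b - a)k}{n}, \qquad \Delta x_k = \frac{b - a}{n}.$$
The values of the function at these points are,
$$y_k = y(x_k).$$
The Riemann sum using always the left endpoint is,
$$I_l = \sum_{k=1}^n y_{k-1}\,\Delta x_k.$$
Similarly, the Riemann sum using always the right endpoint is $I_r = \sum_{k=1}^n y_k\,\Delta x_k$. The trapezoid rule approximation is their average, which is the total area of the trapezoids over the subintervals,
$$I_{\text{trap}} = \frac{I_l + I_r}{2} = \frac{b - a}{2n}\left(y_0 + 2y_1 + 2y_2 + \cdots + 2y_{n-1} + y_n\right).$$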
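A Python sketch of the trapezoid rule, computed as the average of the left-endpoint and right-endpoint Riemann sums. With exact fractions it reproduces the ln(2) example later in this lecture: y = 1/x on [1, 2] with n = 4.

```python
from fractions import Fraction


def trapezoid(y, a, b, n):
    """Trapezoid rule: average of left- and right-endpoint Riemann sums."""
    dx = (b - a) / n
    ys = [y(a + k * dx) for k in range(n + 1)]
    left = sum(ys[:-1]) * dx
    right = sum(ys[1:]) * dx
    return (left + right) / 2


# Exact arithmetic for y = 1/x on [1, 2] with n = 4.
print(trapezoid(lambda x: 1 / x, Fraction(1), Fraction(2), 4))  # 1171/1680
```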
3. Simpson’s rule. Again partition the interval [a, b] into n equal subintervals. For reasons that will become apparent, n must be even. So let n = 2m where m is a positive integer. Again define,
$$x_k = a + \frac{(b - a)k}{n}, \qquad \Delta x = \frac{b - a}{n}, \qquad y_k = y(x_k).$$
Pair off the intervals as $([x_0, x_1], [x_1, x_2])$, $([x_2, x_3], [x_3, x_4])$, etc. Thus the lth pair of intervals is,
$$[x_{2l-2}, x_{2l-1}], \qquad [x_{2l-1}, x_{2l}].$$
The idea is to approximate the area under the graph over the pair of intervals by the area under the unique parabola containing the 3 points $(x_{2l-2}, y_{2l-2})$, $(x_{2l-1}, y_{2l-1})$, $(x_{2l}, y_{2l})$. For notation’s sake, denote 2l − 1 by k. Thus the 3 points are $(x_{k-1}, y_{k-1})$, $(x_k, y_k)$, and $(x_{k+1}, y_{k+1})$ (this is slightly more symmetric).
The first problem is to find the equation of this parabola. Since the parabola contains the point $(x_k, y_k)$, it has the equation,
$$y = A(x - x_k)^2 + B(x - x_k) + y_k,$$
for some constants A and B. Plugging in $x = x_{k-1}$ and $x = x_{k+1}$, and using that $x_{k+1} - x_k = x_k - x_{k-1}$ equals Δx, gives,
$$y_{k-1} = A(\Delta x)^2 - B\,\Delta x + y_k, \qquad y_{k+1} = A(\Delta x)^2 + B\,\Delta x + y_k.$$
Solving gives,
$$A = \frac{y_{k-1} - 2y_k + y_{k+1}}{2(\Delta x)^2}, \qquad B = \frac{y_{k+1} - y_{k-1}}{2\,\Delta x}.$$
The next problem is to compute the area under the parabola from $x = x_{k-1}$ to $x = x_{k+1}$. This is a straightforward application of the Fundamental Theorem of Calculus,
$$\int_{x_{k-1}}^{x_{k+1}}\left(A(x - x_k)^2 + B(x - x_k) + y_k\right)dx = \left(\frac{A}{3}(x - x_k)^3 + \frac{B}{2}(x - x_k)^2 + y_k(x - x_k)\right)\Big|_{x_{k-1}}^{x_{k+1}}.$$
Plugging in and using that $x_{k+1} - x_k = x_k - x_{k-1}$ equals Δx, this is,
$$\frac{2A}{3}(\Delta x)^3 + 2y_k\,\Delta x.$$
Substituting in the formula for A and simplifying, this is,
$$\frac{\Delta x}{3}(y_{k-1} - 2y_k + y_{k+1}) + \frac{\Delta x}{3}(6y_k) = \frac{\Delta x}{3}(y_{k-1} + 4y_k + y_{k+1}).$$
Back-substituting 2l − 1 for k and (b − a)/2m for Δx, the approximate area for the pair of intervals $[x_{2l-2}, x_{2l-1}]$ and $[x_{2l-1}, x_{2l}]$ is,
$$\Delta I_l = \frac{b - a}{6m}(y_{2l-2} + 4y_{2l-1} + y_{2l}).$$
Finally, summing this contribution over each choice of l gives the Simpson’s rule approximation,
$$I_{\text{Simpson}} = \frac{b - a}{6m}\sum_{l=1}^m (y_{2l-2} + 4y_{2l-1} + y_{2l}).$$
Written out term by term, this is,
$$I_{\text{Simpson}} = \frac{b - a}{6m}\left(y_0 + 4y_1 + 2y_2 + 4y_3 + 2y_4 + 4y_5 + 2y_6 + \cdots + 4y_{2m-3} + 2y_{2m-2} + 4y_{2m-1} + y_{2m}\right).$$
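A Python sketch of Simpson’s rule, following the pairing of subintervals described above. With exact fractions it reproduces the ln(2) example that follows, and it is exact for cubics (here x³ on [0, 1], a test case chosen for illustration).

```python
from fractions import Fraction


def simpson(y, a, b, m):
    """Simpson's rule on [a, b] with n = 2m equal subintervals."""
    n = 2 * m
    dx = (b - a) / n
    ys = [y(a + k * dx) for k in range(n + 1)]
    # Pair l covers [x_{2l-2}, x_{2l}] with weights 1, 4, 1.
    total = sum(ys[2 * l - 2] + 4 * ys[2 * l - 1] + ys[2 * l] for l in range(1, m + 1))
    return (b - a) / (6 * m) * total


print(simpson(lambda x: 1 / x, Fraction(1), Fraction(2), 2))   # 1747/2520
print(simpson(lambda x: x ** 3, Fraction(0), Fraction(1), 1))  # 1/4
```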
Example. Approximate ln(2) using a partition into 4 equal subintervals with the Trapezoid Rule
and with Simpson’s Rule.
The value ln(2) equals the Riemann integral,
$$\int_1^2 \frac{1}{x}\,dx.$$
The points of the partition are $x_0 = 4/4$, $x_1 = 5/4$, $x_2 = 6/4$, $x_3 = 7/4$ and $x_4 = 8/4$. The corresponding values are $y_0 = 4/4$, $y_1 = 4/5$, $y_2 = 4/6$, $y_3 = 4/7$, $y_4 = 4/8$. Thus the Trapezoid Rule gives,
$$I_{\text{trap}} = \frac{b - a}{2n}(y_0 + 2y_1 + 2y_2 + 2y_3 + y_4) = \frac{1}{8}\left(\frac{4}{4} + 2\cdot\frac{4}{5} + 2\cdot\frac{4}{6} + 2\cdot\frac{4}{7} + \frac{4}{8}\right) = \frac{1171}{1680} \approx 0.6970.$$
For Simpson’s Rule, because n equals 4, m equals 2. Thus,
$$I_{\text{Simpson}} = \frac{b - a}{6m}(y_0 + 4y_1 + 2y_2 + 4y_3 + y_4) = \frac{1}{12}\left(\frac{4}{4} + 4\cdot\frac{4}{5} + 2\cdot\frac{4}{6} + 4\cdot\frac{4}{7} + \frac{4}{8}\right) = \frac{1747}{2520} \approx 0.6933.$$
Note that trapezoids overestimate the area, because 1/x is concave up; for comparison, ln(2) ≈ 0.6931. The approximating parabolas cross the graph of y = 1/x, thus the underestimation to the left of $(x_k, y_k)$ somewhat cancels the overestimation to the right of $(x_k, y_k)$, explaining the better approximation.
4. One review problem. This is a related rates review problem for Exam 3. A particle moves with constant speed 3 on the parabola $y = x^2$. The particle is moving away from the origin. What is the rate-of-change of the distance from the origin to the particle when the distance equals $2\sqrt{5}$?
The independent variable is time, t. The dependent variables are the x-coordinate of the particle, x(t), the y-coordinate of the particle, y(t), and the distance L(t) from the particle to (0, 0). The constant is the speed s = 3 of the particle. The constraints are that the point moves on the parabola,
$$y = x^2,$$
and the Pythagorean theorem,
$$L^2 = x^2 + y^2.$$
Also, since the speed is constant,
$$s^2 = \left(\frac{dx}{dt}\right)^2 + \left(\frac{dy}{dt}\right)^2.$$
This plays the role of the “known rate-of-change” in a typical related rates problem.
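the moment when L equals $2\sqrt{5}$. Plugging $y = x^2$ into the equation for $L^2$ gives,
$$L^2 = x^2 + y^2 = x^2 + (x^2)^2 = x^2 + x^4.$$
At the instant when L equals $2\sqrt{5}$, $L^2$ equals 20. Thus, at that moment,
$$x^4 + x^2 = 20.$$
Factoring gives $(x^2 + 5)(x^2 - 4) = 0$, so $x^2 = 4$, and x = 2 (taking the particle on the branch x > 0). Also, differentiating $y = x^2$ gives $dy/dt = 2x\,dx/dt$, so the speed equation becomes,
$$s^2 = (1 + 4x^2)\left(\frac{dx}{dt}\right)^2.$$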
Since s is known to be 3, and x is known to be 2, this equation can be solved for dx/dt,
$$\left(\frac{dx}{dt}\right)^2 = \frac{3^2}{1 + 4(2)^2} = \frac{9}{17}.$$
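A hedged Python sketch of finishing the computation, assuming x = 2 with dx/dt > 0 (the particle moves away from the origin along the branch x > 0). Differentiating $L^2 = x^2 + x^4$ gives $2L\,dL/dt = (2x + 4x^3)\,dx/dt$. As an independent check, dL/dx is also approximated by a central difference and combined with dx/dt by the chain rule.

```python
import math

s, x = 3.0, 2.0                       # speed and x-coordinate at the instant
dx_dt = s / math.sqrt(1 + 4 * x * x)  # positive root: x increases, moving away from (0, 0)


def dist(x):
    """Distance from the origin to (x, x^2) on the parabola."""
    return math.sqrt(x * x + x ** 4)


L = dist(x)                           # 2*sqrt(5)
dL_dt = (x + 2 * x ** 3) * dx_dt / L  # from 2 L dL/dt = (2x + 4x^3) dx/dt

h = 1e-6
dL_dt_check = (dist(x + h) - dist(x - h)) / (2 * h) * dx_dt  # chain rule with numeric dL/dx
print(dx_dt, dL_dt, dL_dt_check, 27 / math.sqrt(85))
```

Under these assumptions $dL/dt = 27/\sqrt{85} \approx 2.93$, slightly less than the speed 3, as expected since the particle does not move straight away from the origin.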
y = f(x), y = g(x)? This problem can be solved directly: the area is the difference of the area between y = f(x) and the x-axis and the area between y = g(x) and the x-axis. Each of these is a Riemann integral.
The differential method asks, what is the infinitesimal change in the area A as x varies from x to x + dx? The infinitesimal region is a rectangle of base dx and height f(x) − g(x). Thus the infinitesimal change in A is,
$$dA = (f(x) - g(x))\,dx.$$
Integrating gives,
$$A = \int dA = \int_{x=a}^{x=b}(f(x) - g(x))\,dx.$$
Of course this is the same answer as in the last paragraph. But for other applied integral problems, the differential method may be the only method that works.
Example. Find the area bounded by the curve $y = x(x^2 - 3)$ and a horizontal tangent line.
The horizontal tangent lines are the tangent lines to the curve at critical points. Setting the derivative equal to 0 gives,
$$\frac{dy}{dx} = 3x^2 - 3 = 3(x - 1)(x + 1) = 0.$$
Thus the critical points are x = ±1. The function is odd, so by symmetry the area is the same regardless of the choice of critical point. Thus, choose the critical point x = −1. The corresponding value of the function is,
$$y = (-1)((-1)^2 - 3) = (-1)(-2) = 2.$$
So the horizontal tangent line is y = 2. Each intersection point (b, f(b)) of the tangent line with $y = x(x^2 - 3)$ occurs at a solution x = b of,
$$x(x^2 - 3) = 2, \quad \text{i.e.,} \quad x^3 - 3x - 2 = (x + 1)^2(x - 2) = 0.$$
The remaining intersection point is (2, 2). So the area bounded by the curve $y = x(x^2 - 3)$ and the tangent line y = 2 is,
$$\int_{x=-1}^{x=2}\left(2 - x(x^2 - 3)\right)dx = \int_{-1}^{2}(-x^3 + 3x + 2)\,dx = \left(-\frac{x^4}{4} + \frac{3x^2}{2} + 2x\right)\Big|_{-1}^{2} = 6 - \left(-\frac{3}{4}\right) = \frac{27}{4}.$$
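A numerical check of the bounded area in Python: a midpoint Riemann sum for $2 - x(x^2 - 3)$ on [−1, 2] (strip count arbitrary), compared with the antiderivative $-x^4/4 + 3x^2/2 + 2x$ evaluated with exact fractions.

```python
from fractions import Fraction


def midpoint_sum(f, a, b, n=20000):
    dx = (b - a) / n
    return sum(f(a + (k + 0.5) * dx) for k in range(n)) * dx


def G(x):
    # An antiderivative of -x^3 + 3x + 2.
    x = Fraction(x)
    return -x ** 4 / 4 + 3 * x ** 2 / 2 + 2 * x


exact = G(2) - G(-1)
approx = midpoint_sum(lambda x: 2 - x * (x * x - 3), -1.0, 2.0)
print(exact, approx)  # 27/4 and a value close to 6.75
```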
3. Volumes of solids of revolution: the disk method. If the region in the xy-plane bounded by x = a, x = b, y = f(x) and the x-axis is revolved through xyz-space about the x-axis, what is the volume of the resulting solid? The solid is called a solid of revolution. The disk method applies the method of differentials to solve this problem. As x increases from x to x + dx, the corresponding infinitesimal region of the solid is essentially a disk. The width of the disk is dx. The area of the disk is $\pi[f(x)]^2$. Thus the infinitesimal volume of the disk is,
$$dV = \text{Area} \times \text{width} = \pi[f(x)]^2\,dx.$$
Thus the volume is,
$$V = \int dV = \int_{x=a}^{x=b}\pi[f(x)]^2\,dx.$$
Example. Find the volume of a right circular cone whose base has radius R and whose vertex has height H above the base.
The cone is the solid of revolution for the plane region bounded by x = 0, the x-axis, and the line containing (0, R) and (H, 0). The equation of this line is,
$$y = \frac{(H - x)R}{H}.$$
Thus the volume of an infinitesimal disk is,
$$dV = \text{Area} \times \text{width} = \pi\frac{(H - x)^2R^2}{H^2}\,dx,$$
and the volume is,
$$V = \int dV = \int_{x=0}^{x=H}\pi\frac{(H - x)^2R^2}{H^2}\,dx.$$
Making the substitution u = H − x, du = −dx gives,
$$V = \int_{u=H}^{u=0}\pi\frac{R^2}{H^2}u^2(-du) = \pi\frac{R^2}{H^2}\left(-\frac{u^3}{3}\right)\Big|_H^0.$$
Evaluating gives the volume,
$$V = \pi R^2H/3.$$
In particular, the antiderivative of $u^2$ is responsible for the denominator 3 in the formula for the volume.
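A numerical check of the disk method in Python: the sum of $\pi f(x)^2\,dx$ over thin disks (midpoint rule, arbitrary slice count) for the cone's profile line, compared with $\pi R^2H/3$. R and H are hypothetical sample values.

```python
import math


def disk_volume(f, a, b, n=20000):
    """Disk method: sum of pi f(x)^2 dx over midpoints of n equal slices."""
    dx = (b - a) / n
    return sum(math.pi * f(a + (k + 0.5) * dx) ** 2 for k in range(n)) * dx


R, H = 2.0, 5.0  # hypothetical radius and height
radius = lambda x: (H - x) * R / H
print(disk_volume(radius, 0.0, H), math.pi * R ** 2 * H / 3)
```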
The sphere of radius R is the solid of revolution for the plane region bounded by the x-axis and the upper semicircle $y = \sqrt{R^2 - x^2}$.
4. The slice method. A generalization of the disk method is the slice method. The problem is to find the volume of a region bounded by the planes x = a and x = b given the knowledge of the area A(x) of the slice of the solid by the plane containing (x, 0, 0) parallel to the yz-plane. As x increases from x to x + dx, the corresponding infinitesimal region of the solid is essentially a cylinder. The width of the cylinder is dx, and its cross-sectional area is the area A(x) of the slice. Thus the infinitesimal volume of the cylinder is,
$$dV = A(x)\,dx, \qquad\text{so}\qquad V = \int_{x=a}^{x=b}A(x)\,dx.$$
Example. Find the volume of the “corner” region bounded by the xy-plane, the xz-plane, the yz-plane, and the plane containing (L, 0, 0), (0, L, 0) and (0, 0, L).
This region is bounded by the planes x = 0 and x = L. The x-slice of the region is a right isosceles triangle. The base and altitude of the triangle both equal f(x), where y = f(x) is the equation of the line passing through (0, L) and (L, 0). This equation is,
$$f(x) = L - x.$$
So the area of the slice is $A(x) = f(x)^2/2 = (L - x)^2/2$, and the volume is,
$$V = \int_{x=0}^{x=L}\frac{(L - x)^2}{2}\,dx.$$
Evaluating gives,
$$V = L^3/6.$$
Thus the “corner” takes up one sixth of the corresponding cube of edge length L.
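A minimal numerical sketch of the slice method for the corner region: it sums thin slabs of area $A(x) = (L - x)^2/2$ (the right isosceles triangle with legs $L - x$) and compares with $L^3/6$. The edge length L = 3 and the helper name `slice_volume` are illustrative assumptions.

```python
def slice_volume(area, a, b, n=10_000):
    """Approximate V = integral_a^b A(x) dx by summing thin slabs A(x_k*) dx (midpoint rule)."""
    dx = (b - a) / n
    return sum(area(a + (k + 0.5) * dx) * dx for k in range(n))

L = 3.0  # sample edge length (illustrative)

def corner_area(x):
    # x-slice: right isosceles triangle whose two legs both have length L - x
    return (L - x) ** 2 / 2

approx = slice_volume(corner_area, 0.0, L)
print(approx, L**3 / 6)
```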
5. Volumes of solids of revolution: the washer method. A variation on the disk method
is the washer method. A washer is the solid obtained by removing from a larger disk a concentric
smaller disk of the same width. If the outer radius of the washer is $r_o$ and the inner radius is $r_i$, then the net area of the washer is,
$$\pi r_o^2 - \pi r_i^2 = \pi(r_o^2 - r_i^2).$$
Example. The main part of a dog dish is a solid of revolution whose radial cross-section is a triangle of height H whose base has inner radius $R_i$ and outer radius $R_o$. Find the volume of material used to make the dog dish.
Note. This integral was only set up in lecture. The derivation will be completed in recitation.
Here is the complete derivation. Denote by x the height along the altitude of the triangle. Thus x varies from x = 0 to x = H. When x = H, the inner radius and outer radius are each equal to the average $(R_i + R_o)/2$ of the two radii. Both radii depend linearly on x.
The inner radius increases linearly from $r_i = R_i$ at x = 0 to $r_i = (R_i + R_o)/2$ at x = H. Thus,
$$r_i(x) = R_i\frac{H - x}{H} + \frac{R_i + R_o}{2}\cdot\frac{x}{H}.$$
Similarly, the outer radius decreases linearly from $r_o = R_o$ at x = 0 to $r_o = (R_i + R_o)/2$ at x = H. Thus,
$$r_o(x) = R_o\frac{H - x}{H} + \frac{R_i + R_o}{2}\cdot\frac{x}{H}.$$
The factorization $r_o^2 - r_i^2 = (r_o + r_i)(r_o - r_i)$ is interesting in its own right. Using this factorization, the net area of the region between the 2 circles, called an annulus, equals
$$\pi(r_o^2 - r_i^2) = 2\pi\left(\frac{r_o + r_i}{2}\right)(r_o - r_i).$$
Note the first factor is the circumference of the center circle of the annulus, the circle of radius $(r_o + r_i)/2$. And the second factor is the radial width of the annulus. Thus the area of an annulus is the circumference of the center circle times the radial width.
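For the dog dish, $r_o + r_i = R_i + R_o$ for every x, and $r_o - r_i = (R_o - R_i)(H - x)/H$. So the volume is,
$$V = \int_{x=0}^{x=H}2\pi\frac{R_i + R_o}{2}(R_o - R_i)\frac{H - x}{H}\,dx = \pi(R_o^2 - R_i^2)\frac{H}{2}.$$
One reality check is that this is the same volume as a cylindrical shell with the same center radius $(R_i + R_o)/2$ and height H as the dish, and whose (constant) radial width equals the average radial width of the dish, $(R_o - R_i)/2$.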
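As a sketch of a numerical check, the washer integral can be summed directly from the formulas for $r_i(x)$ and $r_o(x)$ and compared with the volume of the reality-check shell, $2\pi\cdot\frac{R_i + R_o}{2}\cdot H\cdot\frac{R_o - R_i}{2}$. The dish dimensions and the helper name `washer_volume` are hypothetical.

```python
import math

def washer_volume(r_outer, r_inner, a, b, n=10_000):
    """Approximate V = integral_a^b pi (r_o(x)^2 - r_i(x)^2) dx with a midpoint Riemann sum."""
    dx = (b - a) / n
    total = 0.0
    for k in range(n):
        x = a + (k + 0.5) * dx
        total += math.pi * (r_outer(x) ** 2 - r_inner(x) ** 2) * dx
    return total

Ri, Ro, H = 4.0, 6.0, 1.5  # sample dish dimensions (illustrative)
mid = (Ri + Ro) / 2

def r_i(x):
    return Ri * (H - x) / H + mid * x / H  # inner radius: Ri at x = 0, mid at x = H

def r_o(x):
    return Ro * (H - x) / H + mid * x / H  # outer radius: Ro at x = 0, mid at x = H

approx = washer_volume(r_o, r_i, 0.0, H)
shell = 2 * math.pi * mid * H * (Ro - Ri) / 2  # circumference * height * average radial width
print(approx, shell)
```

Here $r_o^2 - r_i^2$ turns out to be linear in x, so the midpoint sum is exact up to rounding.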
Lecture 20. November 1, 2005
Practice Problems. Course Reader: 4C2, 4C6, 4D1, 4D4, 4D8.
1. Average values. Given a function f (x) defined on some interval [a, b], what is the average
value of f (x)? A reasonable first approximation is to choose a finite collection of points from [a, b]
and compute the average value over those points. Break [a, b] into a union of n subintervals of
length Δx = (b − a)/n. From each interval, choose a point; say $x_k^*$ in the k-th interval. For the finitely many values $y_k^* = f(x_k^*)$, the average value is,
$$\text{Average} \approx \frac{1}{n}\sum_{k=1}^{n}y_k^*.$$
Now $n\Delta x$ equals $n(b - a)/n$, which is $b - a$. So $1/n = \Delta x/(b - a)$, and the average value is,
$$\text{Average} \approx \frac{1}{b - a}\sum_{k=1}^{n}y_k^*\,\Delta x.$$
The sum is a Riemann sum. To get better approximations to the true average, increase the number of points n at which f(x) is “sampled”. In the limit, this gives the true average,
$$\text{Average} = \lim_{n\to\infty}\frac{1}{b - a}\sum_{k=1}^{n}y_k^*\,\Delta x = \frac{1}{b - a}\int_a^b f(x)\,dx.$$
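A minimal sketch of this limit: take $x_k^*$ to be the midpoint of each subinterval and watch the sampled average approach the integral average as n grows. The function $f(x) = x^2$ on [0, 3] (exact average 3) and the helper name `riemann_average` are illustrative choices.

```python
def riemann_average(f, a, b, n):
    """(1/(b - a)) * sum f(x_k*) dx, with x_k* the midpoint of the k-th of n equal subintervals."""
    dx = (b - a) / n
    total = sum(f(a + (k + 0.5) * dx) for k in range(n)) * dx
    return total / (b - a)

def square(x):
    return x * x  # illustrative f; its exact average on [0, 3] is 9 / 3 = 3

for n in (3, 30, 300):
    print(n, riemann_average(square, 0.0, 3.0, n))
```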
Example. Under ideal conditions, a wire-producing machine produces wire of uniform radius $r_0$. Because of small vibrations in the machine, the actual radius of the wire varies as a function of the length,
$$r(x) = r_0 + A\cos(\omega x).$$
The quantity A is much smaller than $r_0$. What is the average radius of the wire?
Because the variation is periodic, the average value over any number of periods equals the average value over one period. In other words, compute the average for the interval $0 \le x \le 2\pi/\omega$. The length of this interval is $2\pi/\omega$. Thus the average value is,
$$\text{Average} = \frac{1}{2\pi/\omega}\int_0^{2\pi/\omega}\big(r_0 + A\cos(\omega x)\big)\,dx.$$
Using the Fundamental Theorem of Calculus, this equals,
$$\frac{1}{2\pi/\omega}\Big(r_0x + (A/\omega)\sin(\omega x)\Big)\Big|_0^{2\pi/\omega}.$$
This evaluates to,
$$\frac{1}{2\pi/\omega}\,r_0(2\pi/\omega) = r_0.$$
Thus, although the radius varies and does not usually equal its ideal value $r_0$, the average value is indeed,
$$\text{Average} = r_0.$$
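A numerical sketch of the wire example: averaging $r_0 + A\cos(\omega x)$ over one period and over three periods both return $r_0$. The values $r_0 = 1$, A = 0.01 and ω = 7 are illustrative, not from the notes.

```python
import math

def average(f, a, b, n=10_000):
    """Midpoint Riemann-sum approximation of (1/(b - a)) * integral_a^b f(x) dx."""
    dx = (b - a) / n
    return sum(f(a + (k + 0.5) * dx) for k in range(n)) * dx / (b - a)

r0, A, omega = 1.0, 0.01, 7.0  # sample values with A much smaller than r0 (illustrative)

def radius(x):
    return r0 + A * math.cos(omega * x)

period = 2 * math.pi / omega
one_period = average(radius, 0.0, period)
three_periods = average(radius, 0.0, 3 * period)
print(one_period, three_periods)
```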
2. Average values: nonuniform distribution. It often happens that the average value of f (x) is
desired in a situation where the values f (x) are not all uniformly likely. Typically, the probability
that x has value in the range from x0 to x0 + Δx is approximately,
Prob(x0 ≤ x ≤ x0 + Δx) ≈ ρ(x0 )Δx,
for some nonnegative continuous function ρ(x). The function ρ(x) is called a probability distribution.
Assuming this approximation becomes arbitrarily good as the length Δx approaches zero, the exact
probability that x has value in the range $x_0$ to $x_1$ is,
$$\text{Prob}(x_0 \le x \le x_1) = \int_{x_0}^{x_1}\rho(x)\,dx.$$
In particular, because x must take value somewhere in the interval [a, b], the total probability is 1. In other words,
$$\int_a^b\rho(x)\,dx = 1.$$
This is called the normalization condition.
The average value is computed as before. But this time, each value $y_k^* = f(x_k^*)$ is weighted by the approximate probability that x takes value in the k-th interval, $\rho(x_k^*)\Delta x$. This gives,
$$\text{Average} \approx \sum_{k=1}^{n}f(x_k^*)\rho(x_k^*)\,\Delta x.$$
It must be noted that the probability distribution ρ(x) often does not satisfy the normalization condition. In this case, the formula above is wrong. But it is easily corrected,
$$\text{Average} = \frac{\int_a^b f(x)\rho(x)\,dx}{\int_a^b\rho(x)\,dx}.$$
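A short sketch of the corrected formula: both integrals are approximated by the same midpoint sum, and dividing by $\int\rho$ means an unnormalized weight gives the same answer as its normalized version. The choice f(x) = x on [0, 1] with ρ(x) = 5x (exact answer 2/3) is illustrative.

```python
def weighted_average(f, rho, a, b, n=100_000):
    """(integral of f*rho) / (integral of rho), each approximated by a midpoint Riemann sum.

    Dividing by the integral of rho means rho need not satisfy the normalization condition."""
    dx = (b - a) / n
    xs = [a + (k + 0.5) * dx for k in range(n)]
    num = sum(f(x) * rho(x) for x in xs) * dx
    den = sum(rho(x) for x in xs) * dx
    return num / den

# Unnormalized weight 5x versus the normalized weight 2x on [0, 1]
print(weighted_average(lambda x: x, lambda x: 5 * x, 0.0, 1.0))
print(weighted_average(lambda x: x, lambda x: 2 * x, 0.0, 1.0))
```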
Example. A particle is fired through a slit and strikes a screen on the other side. Measuring the
position on the screen so that the origin is the closest point on the screen to the slit, the probability
distribution is empirically observed to be,
$$\rho(x) = Ce^{-x^2/2\sigma^2},$$
where σ is a constant determining the “width” of the probability distribution, and C is an undetermined normalization constant. What is the average distance of the particle from the center of the screen? Assume the particle lies in an interval [−R, R], where R is very large.
Remark. This differs from the formula given in lecture, which was $Ce^{-x^2/2\sigma}$ for a particular choice of σ. The formula given here is more “standard”. I apologize for any confusion.
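The distance of the particle from the center is |x|, so the average distance is,
$$\int_{-R}^{R}|x|\,Ce^{-x^2/2\sigma^2}\,dx = \int_{-R}^{0}(-x)Ce^{-x^2/2\sigma^2}\,dx + \int_{0}^{R}xCe^{-x^2/2\sigma^2}\,dx.$$
Make the substitution $u = -x^2/2\sigma^2$, $du = (-x/\sigma^2)\,dx$ to reduce this to,
$$\int_{-R^2/2\sigma^2}^{0}Ce^u(\sigma^2\,du) + \int_{0}^{-R^2/2\sigma^2}Ce^u(-\sigma^2\,du) = 2\int_{-R^2/2\sigma^2}^{0}C\sigma^2e^u\,du = 2C\sigma^2\big(1 - e^{-R^2/2\sigma^2}\big).$$
As R → ∞, this tends to $2C\sigma^2$. Unfortunately, this is not an answer, because the normalization constant C is unknown. The normalization condition is that,
$$C\lim_{R\to\infty}\int_{-R}^{R}e^{-x^2/2\sigma^2}\,dx = 1.$$
Simplify this by making the substitution $u = x/\sigma$, $du = dx/\sigma$, and $Q = R/\sigma$ to get,
$$C\lim_{R\to\infty}\int_{-R/\sigma}^{R/\sigma}e^{-u^2/2}\,\sigma\,du = C\sigma\lim_{Q\to\infty}\int_{-Q}^{Q}e^{-u^2/2}\,du.$$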
The limit
$$\lim_{Q\to\infty}\int_{-Q}^{Q}e^{-u^2/2}\,du$$
does not depend on σ. It is simply some number. Denoting this number by $1/C_1$, the normalization condition is,
$$C\sigma/C_1 = 1.$$
The solution is that $C = C_1/\sigma$. Plugging this into the formula above, the average distance is,
$$\text{Average distance} = 2C_1\sigma,$$
where
$$1/C_1 = \lim_{Q\to\infty}\int_{-Q}^{Q}e^{-u^2/2}\,du.$$
There is a beautiful argument that
$$C_1 = 1/\sqrt{2\pi}.$$
Unfortunately, we cannot yet prove this. Taking it as true gives the final answer,
$$\text{Average distance} = 2\sigma/\sqrt{2\pi}.$$
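The claimed value of $C_1$ can at least be checked numerically. The sketch below approximates the limits with a large finite cutoff (Q = 12, an assumption standing in for Q → ∞; the neglected tail is below $e^{-72}$), computes $1/C_1$ and compares it with $\sqrt{2\pi}$, then computes the average distance directly with $C = C_1/\sigma$. The width σ = 0.7 is illustrative.

```python
import math

def midpoint_integral(func, lo, hi, n=100_000):
    """Midpoint Riemann sum for the integral of func over [lo, hi]."""
    h = (hi - lo) / n
    return sum(func(lo + (k + 0.5) * h) for k in range(n)) * h

sigma = 0.7        # illustrative width
Q = 12.0           # finite stand-in for Q -> infinity
R = Q * sigma      # matching cutoff on the screen, R = Q * sigma

inv_C1 = midpoint_integral(lambda u: math.exp(-u * u / 2), -Q, Q)  # approximates 1/C1
C = (1 / inv_C1) / sigma                                           # C = C1 / sigma

avg_distance = midpoint_integral(
    lambda x: abs(x) * C * math.exp(-x * x / (2 * sigma**2)), -R, R)
print(inv_C1, math.sqrt(2 * math.pi))
print(avg_distance, 2 * sigma / math.sqrt(2 * math.pi))
```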
3. Volumes of solids of revolution: the shell method. An alternative to the disk and washer methods is the shell method. A shell is the region between 2 cylinders of the same height. If the average radius of the cylinders is r, if the width of the region is w and if the height of the cylinders is h, then the approximate volume of the shell is,
$$\text{Volume} \approx \text{Circumference} \times \text{height} \times \text{width} = 2\pi rhw.$$
Take the plane region bounded by x = a, x = b, the x-axis and the curve y = f(x). Revolve this region about the y-axis. (Please note: in the disk and washer methods, the region was revolved about the x-axis.) To compute the corresponding volume, approximate the region obtained from x to x + dx as a shell. The radius of the shell is x. The height of the shell is y = f(x). The width of the shell is dx. Therefore the differential element of volume is,
$$dV = (2\pi x)(f(x))\,dx.$$
Integrating gives the volume,
$$V = \int_{x=a}^{x=b}2\pi xf(x)\,dx.$$
Example. The dog dish revisited. The main part of a dog dish is a solid of revolution whose radial cross-section is a triangle of height H whose base has inner radius $R_i$ and outer radius $R_o$. Find the volume of material used to make the dog dish.
The volume was computed using the washer method. This time it will be computed using the shell method. The triangular region is the union of two regions. The first region is bounded by $x = R_i$, $x = (R_i + R_o)/2$, the x-axis, and the line segment,
$$y = \frac{2H}{R_o - R_i}(x - R_i).$$
The second region is bounded by $x = (R_i + R_o)/2$, $x = R_o$, the x-axis, and the line segment,
$$y = \frac{2H}{R_o - R_i}(R_o - x).$$
By the shell method, the volume of the solid of revolution obtained from the first region is,
$$V_1 = \int_{x=R_i}^{x=(R_i+R_o)/2}(2\pi x)\frac{2H}{R_o - R_i}(x - R_i)\,dx = \frac{4\pi H}{R_o - R_i}\int_{x=R_i}^{x=(R_i+R_o)/2}(x^2 - R_ix)\,dx.$$
This becomes simpler to deal with after the substitution $u = -x + (R_i + R_o)/2$, $du = -dx$. The new integral is,
$$V_1 = \frac{4\pi H}{R_o - R_i}\int_{u=(R_o-R_i)/2}^{u=0}\Big(-u + \frac{R_o + R_i}{2}\Big)\Big(-u + \frac{R_o - R_i}{2}\Big)(-du) = \frac{4\pi H}{R_o - R_i}\int_{u=0}^{u=(R_o-R_i)/2}\Big(-u + \frac{R_o + R_i}{2}\Big)\Big(-u + \frac{R_o - R_i}{2}\Big)\,du.$$
By the shell method, the volume of the solid of revolution obtained from the second region is,
$$V_2 = \int_{x=(R_o+R_i)/2}^{x=R_o}(2\pi x)\frac{2H}{R_o - R_i}(R_o - x)\,dx = \frac{4\pi H}{R_o - R_i}\int_{x=(R_o+R_i)/2}^{x=R_o}x(R_o - x)\,dx.$$
Believe it or not, this will be simpler to deal with after the substitution $u = x - (R_o + R_i)/2$, $du = dx$. The new integral is,
$$V_2 = \frac{4\pi H}{R_o - R_i}\int_{u=0}^{u=(R_o-R_i)/2}\Big(u + \frac{R_o + R_i}{2}\Big)\Big(-u + \frac{R_o - R_i}{2}\Big)\,du.$$
Notice how similar the integrals for $V_1$ and $V_2$ are. They have the same fraction in front of the integral, and they have the same limits of integration. Thus, the sum of the 2 volumes is,
$$V = V_1 + V_2 = \frac{4\pi H}{R_o - R_i}\int_{u=0}^{u=(R_o-R_i)/2}\Big[\Big(-u + \frac{R_o + R_i}{2}\Big)\Big(-u + \frac{R_o - R_i}{2}\Big) + \Big(u + \frac{R_o + R_i}{2}\Big)\Big(-u + \frac{R_o - R_i}{2}\Big)\Big]\,du.$$
Since both terms in the integrand have the factor $(-u + (R_o - R_i)/2)$, this can be factored to give,
$$V = \frac{4\pi H}{R_o - R_i}\int_{u=0}^{u=(R_o-R_i)/2}\Big[\Big(-u + \frac{R_o + R_i}{2}\Big) + \Big(u + \frac{R_o + R_i}{2}\Big)\Big]\Big(-u + \frac{R_o - R_i}{2}\Big)\,du.$$
Of course the term in square brackets is simply $R_o + R_i$. So the total volume is,
$$V = \frac{4\pi H}{R_o - R_i}\int_{u=0}^{u=(R_o-R_i)/2}(R_o + R_i)\Big(-u + \frac{R_o - R_i}{2}\Big)\,du.$$
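Using the Fundamental Theorem of Calculus, this equals,
$$V = \frac{4\pi H}{R_o - R_i}(R_o + R_i)\left(-\frac{u^2}{2} + \frac{(R_o - R_i)u}{2}\right)\Big|_0^{(R_o-R_i)/2} = \frac{4\pi H(R_o + R_i)}{R_o - R_i}\cdot\frac{(R_o - R_i)^2}{8} = \pi(R_o^2 - R_i^2)\frac{H}{2}.$$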
This is precisely the same answer as computed using the washer method. Observe, though, how much more effort was required for the shell method. The lesson is, if you have an alternative
between the disk method and the shell method, consider carefully which method requires less effort
before committing to one or the other.
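To see the two methods agree numerically, here is a sketch of the shell integral $\int 2\pi x\,h(x)\,dx$, where h(x) is the height of the triangular cross-section given by the two line segments above. It is compared with $\pi(R_o^2 - R_i^2)H/2$, which is what the final bracket evaluates to at $u = (R_o - R_i)/2$. The dimensions and the helper names are hypothetical.

```python
import math

def shell_volume(height, a, b, n=10_000):
    """Approximate V = integral_a^b 2 pi x h(x) dx (shell method) with a midpoint sum."""
    dx = (b - a) / n
    total = 0.0
    for k in range(n):
        x = a + (k + 0.5) * dx
        total += 2 * math.pi * x * height(x) * dx
    return total

Ri, Ro, H = 4.0, 6.0, 1.5  # illustrative dish dimensions
mid = (Ri + Ro) / 2

def triangle_height(x):
    # The two line segments bounding the triangular cross-section
    if x <= mid:
        return 2 * H / (Ro - Ri) * (x - Ri)
    return 2 * H / (Ro - Ri) * (Ro - x)

approx = shell_volume(triangle_height, Ri, Ro)
print(approx, math.pi * H * (Ro**2 - Ri**2) / 2)
```

The code is short either way; the extra effort of the shell method in the notes is in the algebra, not in the numerics.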
Lecture 21. November 3, 2005
Homework. Problem Set 6 Part I: (a)-(e); Part II: Problem 1.
Practice Problems. Course Reader: 4E2, 4E5, 4E7, 4F1, 4F6.
1. Parametric equations. To this point in the course, plane curves were specified in 1 of 2 ways.
The explicit form, or graph form of a curve in Cartesian coordinates is the common form,
y = f (x), a ≤ x ≤ b.
The implicit form of a curve in Cartesian coordinates is as the set of all solutions of an equation,
F (x, y) = 0.
Often a subset of this curve is specified by imposing extra conditions, e.g., the upper unit semicircle
is the set of solutions of $x^2 + y^2 = 1$ satisfying the extra condition y > 0.
There is a third important way to specify a curve: using parametric equations. Given a parameter t varying in an interval a ≤ t ≤ b and given functions f(t) and g(t) on this interval, the associated parametric curve,
$$\begin{cases} x = f(t), \\ y = g(t), \end{cases}$$
is simply the set of all pairs (x, y) = (f(t), g(t)) as t varies over the interval a ≤ t ≤ b. We consider only the case where f(t) and g(t) are piecewise differentiable functions (more advanced courses discuss some pitfalls if f(t) and g(t) are merely continuous functions).
Examples. A. One specification of the points on the circle of radius r centered at (0, 0) is using the angle θ. This gives rise to a parametric equation with parameter θ,
$$\begin{cases} x = r\cos(\theta), \\ y = r\sin(\theta), \end{cases} \qquad 0 \le \theta < 2\pi.$$
B. An ellipse centered at (0, 0) whose axes lie along the coordinate axes has a parametric equation,
$$\begin{cases} x = a\cos(\theta), \\ y = b\sin(\theta), \end{cases} \qquad 0 \le \theta < 2\pi.$$
C. A projectile is launched from an initial position of $(x_0, y_0)$ with an initial velocity vector of magnitude $v_0$ at an angle α to the horizontal, and under the influence of constant gravitational acceleration −g. According to Newton’s laws of mechanics, the position of the projectile after time t is,
$$\begin{cases} x = v_0\cos(\alpha)t + x_0, \\ y = -(g/2)t^2 + v_0\sin(\alpha)t + y_0, \end{cases} \qquad 0 \le t.$$
This is a parametric equation where time t is the parameter. Even when some other quantity is
the parameter, it is often useful to think of the parameter as time. Thus the curve is the trail left
by a point, or perhaps better, the tip of a pen, as it moves in the plane.
2. Implicitization. Under reasonable hypotheses, it is possible to turn a portion of an implicit
curve into an explicit curve. Similarly, it should be possible to turn a portion of a parametric curve
into an explicit curve. It is often simpler to find an implicit equation satisfied by a parametric
curve. The process of finding an implicit equation is called implicitization.
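Examples. A. For the parametric curve in Example A above, by the Pythagorean Theorem,
$$x(\theta)^2 + y(\theta)^2 = r^2\cos^2(\theta) + r^2\sin^2(\theta) = r^2(\cos^2(\theta) + \sin^2(\theta)) = r^2.$$
Thus the parametric equation satisfies the implicit equation,
$$x^2 + y^2 = r^2.$$
B. Similarly, for the parametric curve in Example B, $(x/a)^2 + (y/b)^2 = \cos^2(\theta) + \sin^2(\theta) = 1$, so the parametric equation satisfies the implicit equation,
$$x^2/a^2 + y^2/b^2 = 1.$$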
C. For the parametric curve in Example C, assuming $v_0\cos(\alpha)$ is nonzero, the equation for x can be solved for t,
$$x = v_0\cos(\alpha)t + x_0 \iff t = \frac{x - x_0}{v_0\cos(\alpha)}.$$
This can then be substituted into the equation for y to get an explicit equation for the curve,
$$y = -\frac{g}{2v_0^2\cos^2(\alpha)}(x - x_0)^2 + \tan(\alpha)(x - x_0) + y_0.$$
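Substituting $t = (x - x_0)/(v_0\cos(\alpha))$ gives the explicit form $y = -\frac{g}{2v_0^2\cos^2(\alpha)}(x - x_0)^2 + \tan(\alpha)(x - x_0) + y_0$. The sketch below checks that points produced by the parametric equations satisfy this explicit equation. The launch data are made-up sample values, not from the notes.

```python
import math

# Illustrative launch data (not from the notes)
x0, y0, v0, alpha, g = 1.0, 2.0, 20.0, math.radians(35), 9.8

def position(t):
    """Parametric form (x(t), y(t)) of the projectile."""
    return (v0 * math.cos(alpha) * t + x0,
            -(g / 2) * t**2 + v0 * math.sin(alpha) * t + y0)

def explicit_y(x):
    """Explicit form y(x) obtained by eliminating t = (x - x0) / (v0 cos(alpha))."""
    return (-g / (2 * v0**2 * math.cos(alpha) ** 2) * (x - x0) ** 2
            + math.tan(alpha) * (x - x0) + y0)

for t in (0.0, 0.5, 1.3, 2.0):
    x, y = position(t)
    print(t, y, explicit_y(x))
```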
In going from a parametric equation to an implicit equation, there are 2 important warnings to
keep in mind:
• A parametric equation may traverse only part of the implicit curve. The most usual reason is that the parameter t is restricted to a certain range. A closely related reason is that the functions of t are themselves somehow limited, as in the parametric curve
$$\begin{cases} x = \cos(t), \\ y = \cos(t), \end{cases}$$
which traces only the segment −1 ≤ x ≤ 1 of the line y = x. A more interesting reason is that the implicit curve may have more than one connected piece, as in the parametric curve,
$$\begin{cases} x = 2t/(1 - t^2), \\ y = (1 + t^2)/(1 - t^2), \end{cases} \qquad -1 < t < 1.$$
As t varies, this parametric curve sweeps out the top branch of the hyperbola $y^2 - x^2 = 1$.
• A parametric equation may sweep out all or a portion of the implicit curve multiple times.
This is clear in Examples A and B: as θ is allowed to vary over the interval 0 ≤ θ < 2nπ, the
parametric curve completes n revolutions of the implicit curve.
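A quick numerical sketch of the hyperbola example: sampling −1 < t < 1 (here from −0.99 to 0.99, an illustrative choice) gives points that satisfy $y^2 - x^2 = 1$ and all have y ≥ 1, so only the top branch is traced, while x runs over larger and larger values as t approaches ±1.

```python
ts = [-0.99 + 1.98 * k / 200 for k in range(201)]  # sample parameters in -1 < t < 1
points = [(2 * t / (1 - t * t), (1 + t * t) / (1 - t * t)) for t in ts]

# Relative residual of y^2 - x^2 = 1 (relative, since y^2 grows large as t approaches +1 or -1)
worst = max(abs(y * y - x * x - 1) / (y * y) for x, y in points)
print("largest relative residual:", worst)
print("smallest y:", min(y for _, y in points))  # top branch: y >= 1
print("x range:", min(x for x, _ in points), max(x for x, _ in points))
```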
3. Arc length. Given a segment of curve, what is the length of the curve? Imagining the curve
made of some flexible extensible material like wire, what is the length when the wire is pulled taut?
The answer is called the arc length, s.
The method for expressing arc length as an integral is by now familiar. Break the interval a ≤ t ≤ b into a large number n of subintervals with endpoints,
$$a = t_0 < t_1 < \cdots < t_n = b.$$
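Approximate the curve on each subinterval $t_{k-1} \le t \le t_k$ by a line segment. The line segment runs from the point,
$$(x_{k-1}, y_{k-1}) = (x(t_{k-1}), y(t_{k-1})),$$
to the point,
$$(x_k, y_k) = (x(t_k), y(t_k)).$$
Writing $\Delta t_k = t_k - t_{k-1}$, the rise and run of the line segment are,
$$\Delta x_k = x_k - x_{k-1} \approx x'(t_k)\Delta t_k, \qquad \Delta y_k = y_k - y_{k-1} \approx y'(t_k)\Delta t_k,$$
so the length of the segment is $\sqrt{(\Delta x_k)^2 + (\Delta y_k)^2} \approx \sqrt{(x'(t_k))^2 + (y'(t_k))^2}\,\Delta t_k$.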
The arc length of the curve is approximately the sum of the lengths of the approximating line segments,
$$s \approx \sum_{k=1}^{n}\sqrt{(x'(t_k))^2 + (y'(t_k))^2}\,\Delta t_k.$$
This is a Riemann sum. As the mesh of the partition tends to 0, the Riemann sums tend to a Riemann integral. This Riemann integral is the arc length,
$$\text{Arc length} = \int_{t=a}^{t=b}\sqrt{\left(\frac{dx}{dt}\right)^2 + \left(\frac{dy}{dt}\right)^2}\,dt.$$
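Example 1. For the circle of radius r in Example A, $dx/d\theta = -r\sin(\theta)$ and $dy/d\theta = r\cos(\theta)$, so
$$(-r\sin(\theta))^2 + (r\cos(\theta))^2 = r^2\sin^2(\theta) + r^2\cos^2(\theta) = r^2(\sin^2(\theta) + \cos^2(\theta)) = r^2.$$
Thus $ds = r\,d\theta$, and the arc from θ = 0 to $\theta = \theta_1$ has length $r\theta_1$. This is, in fact, our definition of the angle: the angle θ subtended by an arc of a circle equals the ratio of the arc length to the radius of the circle. If this logic sounds circular, it is perhaps that nobody ever told you before how to define the length of the arc of a circle! It is also an argument in favor of the more natural definition of the angle as the ratio of the area of the sector of the circle to $r^2/2$.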
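The polygonal approximation behind the arc length integral can be computed directly. This sketch sums the lengths of the line segments joining $(x(t_k), y(t_k))$ for an evenly spaced partition of an arc of the circle and shows the total approaching $r\theta_1$ as n grows. The radius, the angle and the helper name `polygon_length` are illustrative.

```python
import math

def polygon_length(x, y, a, b, n):
    """Total length of the n line segments joining the points (x(t_k), y(t_k)),
    for the evenly spaced partition t_k = a + k (b - a) / n."""
    ts = [a + (b - a) * k / n for k in range(n + 1)]
    return sum(math.hypot(x(ts[k]) - x(ts[k - 1]), y(ts[k]) - y(ts[k - 1]))
               for k in range(1, n + 1))

r, theta1 = 2.0, 1.2  # illustrative radius and angle
lengths = [polygon_length(lambda t: r * math.cos(t), lambda t: r * math.sin(t), 0.0, theta1, n)
           for n in (4, 40, 400)]
print(lengths, r * theta1)
```

Each chord is shorter than its arc, so the polygonal lengths increase toward $r\theta_1$ from below.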
Example 2. This is not a single example, but a class of examples. A curve in explicit form,
y = f(x), a ≤ x ≤ b, can always be put in parametric form,
$$\begin{cases} x = t, \\ y = f(t), \end{cases} \qquad a \le t \le b.$$
Then,
$$\frac{dx}{dt} = 1, \qquad \frac{dy}{dt} = f'(t).$$
Using this,
$$ds = \sqrt{1 + (f'(t))^2}\,dt.$$
Thus the arc length is,
$$\text{Arc length} = \int_{t=a}^{t=b}\sqrt{1 + (f'(t))^2}\,dt.$$
Since the parameter t in the Riemann integral is only a dummy variable anyway, it is allowed to replace it by the variable x (so long as x plays no other role in the integral, which it does not). This gives the formula for the arc length of an explicit curve,
$$\text{Arc length} = \int_a^b\sqrt{1 + (dy/dx)^2}\,dx.$$
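Example 3. Find the arc length of the curve,
$$y = \frac{x^2}{4} - \frac{1}{2}\ln(x), \qquad a \le x \le b,$$
where a is a positive real number. The derivative is,
$$\frac{dy}{dx} = \frac{1}{4}(2x) - \frac{1}{2}\cdot\frac{1}{x} = \frac{1}{2}(x - x^{-1}).$$
Thus the square of the derivative is,
$$\left(\frac{dy}{dx}\right)^2 = \frac{1}{4}(x^2 - 2 + x^{-2}).$$
Therefore,
$$1 + \left(\frac{dy}{dx}\right)^2 = \frac{4}{4} + \frac{x^2 - 2 + x^{-2}}{4} = \frac{x^2 + 2 + x^{-2}}{4}.$$
It is easy to check this equals the square,
$$\left(\frac{x + x^{-1}}{2}\right)^2.$$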
Therefore,
$$ds = \frac{1}{2}(x + x^{-1})\,dx.$$
Integrating gives the arc length,
$$s = \int_{x=a}^{x=b}ds = \int_a^b\frac{1}{2}(x + x^{-1})\,dx = \frac{1}{2}\left(\frac{x^2}{2} + \ln(x)\right)\Big|_a^b.$$
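Evaluating the bracket gives $s = (b^2 - a^2)/4 + \frac{1}{2}\ln(b/a)$. As a sketch, the arc length integral $\int_a^b\sqrt{1 + (dy/dx)^2}\,dx$ can be approximated directly by a midpoint sum and compared with this closed form. The endpoints a = 1, b = 3 and the helper name are illustrative.

```python
import math

def arc_length_explicit(dydx, a, b, n=100_000):
    """Approximate integral_a^b sqrt(1 + (dy/dx)^2) dx with a midpoint Riemann sum."""
    h = (b - a) / n
    return sum(math.sqrt(1 + dydx(a + (k + 0.5) * h) ** 2) for k in range(n)) * h

def dydx(x):
    return (x - 1 / x) / 2  # derivative of y = x^2/4 - ln(x)/2

a, b = 1.0, 3.0  # illustrative endpoints
numeric = arc_length_explicit(dydx, a, b)
closed_form = (b**2 - a**2) / 4 + math.log(b / a) / 2
print(numeric, closed_form)
```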
Imagine the cone is made of paper. Make an incision along a line segment from the vertex to the base circle. The resulting piece of paper may be unfolded to form a sector of a circle. The radius of the sector is the slant height S. The circumference of the sector is the circumference of the original base circle, 2πR. The formulas for the area and circumference of a sector of a circle give the identity,
$$\text{Area of sector} = \frac{1}{2}(\text{Radius of sector}) \times (\text{Circumference of sector}).$$
Thus, the area of the cone equals,
$$A = \frac{1}{2}(S)(2\pi R) = \pi RS.$$
In particular, the height H is involved only indirectly (as S depends on H).
Next, consider a conical band obtained from a right circular cone of base radius $R_1$ and slant height $S_1$ by removing the top part of the cone, of base radius $R_2$ and slant height $S_2$. In particular, the slant height of the band is the difference,
$$s = S_1 - S_2,$$
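By similar triangles,
$$\frac{S_1}{R_1} = \frac{S_2}{R_2}.$$
Rearranging gives,
$$R_2S_1 = R_1S_2.$$
Using the formula above, the area of the large cone is $A_1 = \pi R_1S_1$, and the area of the small cone is $A_2 = \pi R_2S_2$. So the area of the band is,
$$A = A_1 - A_2 = \pi(R_1S_1 - R_2S_2).$$
Since $R_2S_1$ equals $R_1S_2$, the formula is unchanged by adding $\pi R_2S_1$ and subtracting $\pi R_1S_2$ to get,
$$A = \pi(R_1S_1 + R_2S_1 - R_1S_2 - R_2S_2) = \pi(R_1 + R_2)(S_1 - S_2).$$
With the average radius $r = (R_1 + R_2)/2$ of the band and its slant height $s = S_1 - S_2$, this is,
$$A = 2\pi rs.$$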
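A small arithmetic sketch of the band formula: for a hypothetical cone, cut off a similar top cone, then compare $\pi(R_1S_1 - R_2S_2)$ with $2\pi rs$. The dimensions are made up for illustration.

```python
import math

# Illustrative cone: base radius R1, height H1; remove a similar top cone of height H2
R1, H1, H2 = 3.0, 4.0, 1.5
S1 = math.hypot(R1, H1)   # slant height of the large cone
R2 = R1 * H2 / H1         # similar triangles: R2 / R1 = H2 / H1
S2 = math.hypot(R2, H2)   # slant height of the small cone

band_by_difference = math.pi * R1 * S1 - math.pi * R2 * S2
r = (R1 + R2) / 2          # average radius of the band
s = S1 - S2                # slant height of the band
print(band_by_difference, 2 * math.pi * r * s)
```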
Given a segment of a parametric curve x = x(t), y = y(t), a ≤ t ≤ b, with x(t) ≥ 0, the surface of revolution is the surface obtained by revolving the segment through xyz-space about the y-axis. What is the area of this surface? The answer is called the surface area.
The method for computing the surface area is so close to the method for computing the arc length of the curve that the details will be skipped. What is relevant is the differential element of surface area. Given a small interval from t to t + dt, approximate the segment of the parametric curve as a line segment. The surface obtained by revolving a line segment is precisely a band of a cone. The average radius r of the band is x(t). The slant height of the band is ds. Thus the area of the band is,
$$dA = 2\pi r\,ds = 2\pi x(t)\sqrt{\left(\frac{dx}{dt}\right)^2 + \left(\frac{dy}{dt}\right)^2}\,dt.$$
Integrating gives the formula for the surface area of the surface of revolution,
$$A = \int dA = \int_{t=a}^{t=b}2\pi x(t)\sqrt{\left(\frac{dx}{dt}\right)^2 + \left(\frac{dy}{dt}\right)^2}\,dt.$$
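Examples. A. Consider the line segment connecting the point (0, H) to the point (R, 0). This has equation,
$$y = \frac{H}{R}(R - x), \qquad 0 \le x \le R.$$
The slant height of the line segment is,
$$S = \sqrt{R^2 + H^2}.$$
Taking x as the parameter, $ds = \sqrt{1 + (H/R)^2}\,dx = (S/R)\,dx$, so $A = \int_0^R 2\pi x(S/R)\,dx = \pi RS$, in agreement with the area of the cone found above.
B. Consider the right half of the circle of radius R, with parametric equation,
$$\begin{cases} x = R\cos(\theta), \\ y = R\sin(\theta), \end{cases} \qquad -\frac{\pi}{2} \le \theta \le \frac{\pi}{2}.$$
Revolving about the y-axis gives the sphere of radius R. Thus the surface area of the surface of revolution is the surface area of the sphere of radius R.
As computed in the previous lecture, the differential element of arc length is,
$$ds = R\,d\theta.$$
Integrating gives,
$$A = \int dA = \int_{\theta=-\pi/2}^{\theta=\pi/2}2\pi R^2\cos(\theta)\,d\theta = 2\pi R^2\sin(\theta)\Big|_{-\pi/2}^{\pi/2} = 4\pi R^2.$$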
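C. Consider the arc of the astroid $x = a\cos^3(t)$, $y = a\sin^3(t)$, $0 \le t \le \pi/2$, revolved about the y-axis. Then $dx/dt = -3a\cos^2(t)\sin(t)$ and $dy/dt = 3a\sin^2(t)\cos(t)$, so
$$ds = 3a\sin(t)\cos(t)\,dt.$$
Thus the differential element of surface area of the surface of revolution is,
$$dA = 2\pi r\,ds = 2\pi x(t)\,ds = 2\pi(a\cos^3(t))(3a\sin(t)\cos(t))\,dt = 6\pi a^2\cos^4(t)\sin(t)\,dt.$$
Substituting $u = \cos(t)$, $du = -\sin(t)\,dt$ gives,
$$A = \int_{t=0}^{t=\pi/2}6\pi a^2\cos^4(t)\sin(t)\,dt = \int_{u=0}^{u=1}6\pi a^2u^4\,du = 6\pi a^2\frac{u^5}{5}\Big|_0^1 = 6\pi a^2/5.$$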
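The surface area formula can be sketched numerically for the sphere and for the astroid example. This assumes the curve in Example C is the arc $x = a\cos^3(t)$, $y = a\sin^3(t)$, $0 \le t \le \pi/2$, which is consistent with $x(t) = a\cos^3(t)$ and $ds = 3a\sin(t)\cos(t)\,dt$ in the notes. The helper name and the values R = 1.5, a = 2 are illustrative.

```python
import math

def surface_area_y_axis(x, dx, dy, a, b, n=100_000):
    """Approximate integral_a^b 2 pi x(t) sqrt(x'(t)^2 + y'(t)^2) dt (revolution about the y-axis)."""
    h = (b - a) / n
    total = 0.0
    for k in range(n):
        t = a + (k + 0.5) * h
        total += 2 * math.pi * x(t) * math.hypot(dx(t), dy(t)) * h
    return total

R = 1.5  # sphere: right half of the circle of radius R
sphere = surface_area_y_axis(lambda t: R * math.cos(t), lambda t: -R * math.sin(t),
                             lambda t: R * math.cos(t), -math.pi / 2, math.pi / 2)

a = 2.0  # astroid arc x = a cos^3 t, y = a sin^3 t, 0 <= t <= pi/2 (assumed form)
astroid = surface_area_y_axis(lambda t: a * math.cos(t) ** 3,
                              lambda t: -3 * a * math.cos(t) ** 2 * math.sin(t),
                              lambda t: 3 * a * math.sin(t) ** 2 * math.cos(t),
                              0.0, math.pi / 2)
print(sphere, 4 * math.pi * R**2)
print(astroid, 6 * math.pi * a**2 / 5)
```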
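3. Polar coordinate curves. After the explicit, Cartesian form of a curve as a graph, y = f(x), the next most common representation is using polar coordinates. Given a function r = r(θ) and an interval of values of θ, the associated polar coordinate curve is the parametric curve $x = r(\theta)\cos(\theta)$, $y = r(\theta)\sin(\theta)$.
For each point on the curve, the distance of the point from the origin is,
$$\text{Distance from origin} = \sqrt{x^2 + y^2} = \sqrt{r^2} = |r| = \begin{cases} +r, & r \ge 0, \\ -r, & r < 0. \end{cases}$$
Also, assuming the point does not equal the origin, the angle of the ray from the origin to the point satisfies $\tan(\text{Angle}) = y/x = \tan(\theta)$; more precisely, up to a multiple of 2π,
$$\text{Angle} = \begin{cases} \theta, & r > 0, \\ \theta + \pi, & r < 0. \end{cases}$$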
This is one of the most confusing aspects of polar curves. The symbols r(θ) and θ are ingrained in
mathematical thinking as the distance and angle of a point in polar coordinates. But for a polar
coordinate curve these are simply parameters. They are very closely related to, but often different
from, the actual distance and angle. This is easiest to think about by imagining the point swerving
through the origin along the radius line to the opposite ray of the ray given by θ. In other words,
the point “goes negative”.
Given a polar curve, it is often possible to find an implicit Cartesian curve containing the polar curve.
Examples. A. Let a be a positive constant and consider the polar curve,
r(θ) = a.
This gives,
r2 = a2 ⇔ x2 + y 2 = a2 .
Thus the polar curve is contained in the circle of radius a.
B. Consider the polar curve,
$$r(\theta) = \frac{a}{\sin(\theta)}.$$
Multiplying both sides by sin(θ) gives,
r sin(θ) = a ⇔ y = a.
Thus the polar curve is contained in the horizontal line passing through (0, a).
(x − a)2 + y 2 = a2 .
θ                      r
−π/4                   0
−π/4 < θ < π/4         r > 0
π/4                    0
π/4 < θ < 3π/4         r < 0
3π/4                   0
3π/4 < θ < 5π/4        r > 0
5π/4                   0
5π/4 < θ < 7π/4        r < 0
The curve crosses the origin when θ = −π/4, π/4, 3π/4, and 5π/4. The curve “goes negative” when
π/4 < θ < 3π/4 and when 5π/4 < θ < 7π/4.
3. Find the extremal values of |r|. A local maximum of |r| is either a point where r is positive and a local maximum or a point where r is negative and a local minimum. Similarly for local minima of |r|. Typically, local maxima of |r| occur either at endpoints of the interval or at points where r′(θ) is zero (occasionally at discontinuity points, or nondifferentiable points). Local minima occur at such points, but also occur every time the curve crosses the origin (so that |r| equals 0).
In our example, the local minima of |r| are all points where r = 0, enumerated above. The derivative of r is,
$$r'(\theta) = -2\sin(2\theta).$$
The critical points are θ = 0, π/2, π and 3π/2. For θ = 0 and θ = π, r is positive and maximum. For θ = π/2 and θ = 3π/2, r is negative and minimum. Thus each critical point is a local maximum of |r|. The value of |r| at each critical point is 1.
4. Find the asymptotes. This is a bit difficult with a polar curve. What is easier is to find a line parallel to an asymptote. Whenever,
$$\lim_{\theta\to a}r(\theta) = +\infty$$
(or the same with a right-hand limit or left-hand limit), there is an asymptote parallel to the ray θ = a. Whenever,
$$\lim_{\theta\to a}r(\theta) = -\infty,$$
there is an asymptote parallel to the opposite ray θ = a + π.
In the example, r′ is nonzero whenever r is zero. Thus the tangent direction of the curve as it crosses the origin is just the direction of the limiting radius.
This is now ample information to sketch the four-leaved rose. Up to a rotation of π/4, the sketch is the same as in Figure 16.11 on p. 566 of the textbook (the sketch was also given in lecture).
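The sign table and the extremal values can be double-checked numerically. This sketch assumes the rose is r(θ) = cos(2θ), which is consistent with r′(θ) = −2 sin(2θ), the zeros listed above, and |r| = 1 at the critical points; the notes do not restate the function in this part.

```python
import math

def r(theta):
    # Assumed form of the four-leaved rose: r = cos(2 theta)
    return math.cos(2 * theta)

zeros = [-math.pi / 4, math.pi / 4, 3 * math.pi / 4, 5 * math.pi / 4, 7 * math.pi / 4]
signs = []
for lo, hi in zip(zeros, zeros[1:]):
    sign = "> 0" if r((lo + hi) / 2) > 0 else "< 0"  # sample r at the middle of each interval
    signs.append(sign)
    print(f"{lo:+.4f} < θ < {hi:+.4f}: r {sign}")

critical = [0.0, math.pi / 2, math.pi, 3 * math.pi / 2]
extremes = [abs(r(t)) for t in critical]  # |r| at the critical points of r
print(extremes)
```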
Lecture 23. November 8, 2005
Homework. Problem Set 6 Part I: (i) and (j); Part II: Problem 2.
Practice Problems. Course Reader: 4I1, 4I4, 4I6.
1. Tangent lines to parametric curves. This short section was not explicitly discussed for
general parametric curves. It was discussed for polar curves, which are a special collection of
parametric curves.
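Given a parametric curve,
$$\begin{cases} x = f(t), \\ y = g(t), \end{cases}$$
what is the slope of the tangent line at (f(a), g(a))? The relevant differentials are,
$$dx = f'(t)\,dt, \qquad dy = g'(t)\,dt.$$
Thus, provided $f'(a) \neq 0$,
$$\frac{dy}{dx}(a) = \frac{g'(t)\,dt}{f'(t)\,dt}\Big|_{t=a} = \frac{g'(a)}{f'(a)}.$$
In particular, a polar curve is the parametric curve,
$$\begin{cases} x = r(\theta)\cos(\theta), \\ y = r(\theta)\sin(\theta). \end{cases}$$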
2. Tangent lines for polar curves. Although the formula above is perfectly correct, it is a bit long to remember. There is a slightly different packaging that is much easier to remember. Define α to be the angle between the horizontal ray emanating from (x(θ), y(θ)) in the positive x-direction and the tangent line. To be precise, there are two such angles, differing by π. The defining equation for α is,
$$\tan(\alpha) = \frac{dy}{dx}.$$
And, of course,
$$\tan(\theta) = \frac{y}{x}.$$
Define ψ to be the difference between α and θ,
$$\psi = \alpha - \theta.$$
Therefore,
$$\tan(\psi) = \tan(\alpha - \theta) = \frac{\tan(\alpha) - \tan(\theta)}{1 + \tan(\alpha)\tan(\theta)}.$$
Substituting in the equations for tan(θ) and tan(α) from above gives,
$$\tan(\psi) = \frac{(dy/dx) - (y/x)}{1 + (y/x)(dy/dx)}.$$
To simplify this, imagine multiplying both numerator and denominator by x dx and manipulating formally,
$$\tan(\psi) = \frac{x\,dy - y\,dx}{x\,dx + y\,dy}.$$
The actual justification of this is a little more involved, but the formal manipulation leads to the correct equation.
To compute the denominator in the expression, differentiate both sides of,
$$r^2 = x^2 + y^2,$$
to get,
$$2r\,dr = 2x\,dx + 2y\,dy,$$
or equivalently,
$$x\,dx + y\,dy = r(\theta)r'(\theta)\,d\theta.$$
To compute the numerator in the expression, differentiate both sides of,
$$\tan(\theta) = \frac{y}{x},$$
to get,
$$\sec^2(\theta)\,d\theta = \frac{dy}{x} - \frac{y\,dx}{x^2} = \frac{1}{x^2}(x\,dy - y\,dx).$$
Now substitute x = r cos(θ) in the denominator to get,
$$\sec^2(\theta)\,d\theta = \frac{1}{r^2\cos^2(\theta)}(x\,dy - y\,dx) = \frac{\sec^2(\theta)}{r^2}(x\,dy - y\,dx).$$
Thus $x\,dy - y\,dx = r^2\,d\theta$, and,
$$\tan(\psi) = \frac{x\,dy - y\,dx}{x\,dx + y\,dy} = \frac{r^2\,d\theta}{rr'\,d\theta}.$$
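Simplifying gives,
$$\tan(\psi) = r(\theta)/r'(\theta).$$
Example. For the cardioid $r(\theta) = a(1 + \cos(\theta))$, $r'(\theta) = -a\sin(\theta)$, so
$$\tan(\psi) = \frac{1 + \cos(\theta)}{-\sin(\theta)} = \frac{1 + \cos^2(\theta/2) - \sin^2(\theta/2)}{-2\sin(\theta/2)\cos(\theta/2)}.$$
Replacing $1 - \sin^2(\theta/2)$ in the numerator by $\cos^2(\theta/2)$, this simplifies to,
$$\frac{2\cos^2(\theta/2)}{-2\sin(\theta/2)\cos(\theta/2)} = -\cot(\theta/2).$$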
Of course there is an identity,
− cot(u) = tan(u − π/2).
Altogether, this gives,
tan(ψ) = − cot(θ/2) = tan(θ/2 − π/2).
Therefore,
ψ = (θ − π)/2.
Since α equals θ + ψ, this gives,
α = (3θ − π)/2.
In particular, the angle of the tangent line to the cardioid at θ = π/2 is α = π/4.
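A numerical sketch of this result for the cardioid r(θ) = a(1 + cos(θ)): compute dy/dx from the general parametric formula (dy/dθ divided by dx/dθ) and compare with tan(α) for α = (3θ − π)/2. The sample angles and a = 1 are illustrative; since α is only determined up to π, comparing slopes (tangents) is the natural check.

```python
import math

A_CONST = 1.0  # the cardioid constant a > 0 (illustrative; the slope does not depend on it)

def cardioid_slope(theta):
    """dy/dx on x = r cos(theta), y = r sin(theta) with r = a(1 + cos(theta)), as (dy/dtheta) / (dx/dtheta)."""
    r = A_CONST * (1 + math.cos(theta))
    dr = -A_CONST * math.sin(theta)
    dx = dr * math.cos(theta) - r * math.sin(theta)
    dy = dr * math.sin(theta) + r * math.cos(theta)
    return dy / dx

for theta in (math.pi / 6, math.pi / 2, 2.0):
    alpha = (3 * theta - math.pi) / 2  # tangent angle predicted by alpha = (3 theta - pi)/2
    print(theta, cardioid_slope(theta), math.tan(alpha))
```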
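3. Arc length in polar coordinates. As discussed previously, the formula for arc length of a parametric curve is,
$$ds = \sqrt{(dx/dt)^2 + (dy/dt)^2}\,dt.$$
In the case of a polar curve, this becomes a bit simpler. The differentials are,
$$dx = \big(r'(\theta)\cos(\theta) - r(\theta)\sin(\theta)\big)\,d\theta, \qquad dy = \big(r'(\theta)\sin(\theta) + r(\theta)\cos(\theta)\big)\,d\theta.$$
Squaring gives, since the cross terms cancel and $\cos^2(\theta) + \sin^2(\theta) = 1$,
$$(dx)^2 + (dy)^2 = \big([r'(\theta)]^2 + [r(\theta)]^2\big)(d\theta)^2.$$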
Taking square roots gives the differential element of arc length for a polar curve,
$$ds = \sqrt{[r'(\theta)]^2 + [r(\theta)]^2}\,d\theta.$$
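Example. For the cardioid $r(\theta) = a(1 + \cos(\theta))$, $[r'(\theta)]^2 + [r(\theta)]^2 = a^2\sin^2(\theta) + a^2(1 + \cos(\theta))^2 = 2a^2(1 + \cos(\theta))$, and
$$2a^2(1 + \cos(2(\theta/2))) = 2a^2(1 + \cos^2(\theta/2) - \sin^2(\theta/2)) = 2a^2(2\cos^2(\theta/2)) = 4a^2\cos^2(\theta/2).$$
So $ds = 2a|\cos(\theta/2)|\,d\theta$, and integrating over $0 \le \theta \le 2\pi$ gives the length of the cardioid,
$$s = 8a.$$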
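A numerical sketch of the polar arc length formula for the cardioid r(θ) = a(1 + cos(θ)) over 0 ≤ θ ≤ 2π, which should give 8a. The value a = 1.3 and the helper name are illustrative.

```python
import math

def polar_arc_length(r, dr, lo, hi, n=100_000):
    """Midpoint-rule approximation of the integral of sqrt(r'(theta)^2 + r(theta)^2) over [lo, hi]."""
    h = (hi - lo) / n
    total = 0.0
    for k in range(n):
        t = lo + (k + 0.5) * h
        total += math.hypot(dr(t), r(t))
    return total * h

a = 1.3  # illustrative cardioid constant
length = polar_arc_length(lambda t: a * (1 + math.cos(t)), lambda t: -a * math.sin(t),
                          0.0, 2 * math.pi)
print(length, 8 * a)
```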
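Surface areas of surfaces of revolution can be computed in a similar way. This was discussed only briefly in lecture.
Example. The upper half of the cardioid $r(\theta) = a(1 + \cos(\theta))$, $0 \le \theta \le \pi$, is revolved about the x-axis to give a fairly good approximation of the surface of an apple. What is the surface area of this apple?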
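Since we are revolving about the x-axis, the radius of each slice is y. Therefore the differential element of surface area is,
$$dA = 2\pi y\,ds.$$
Substituting in $y = r(\theta)\sin(\theta) = a(1 + \cos(\theta))\sin(\theta)$, and substituting in $ds = 2a\cos(\theta/2)\,d\theta$ (note $\cos(\theta/2) \ge 0$ for $0 \le \theta \le \pi$) gives,
$$dA = 2\pi a(1 + \cos(\theta))\sin(\theta)\cdot 2a\cos(\theta/2)\,d\theta.$$
Now use the identities,
$$1 + \cos(\theta) = 2\cos^2(\theta/2),$$
and,
$$\sin(\theta) = 2\sin(\theta/2)\cos(\theta/2),$$
to get,
$$dA = 4\pi a^2(2\cos^2(\theta/2))(2\sin(\theta/2)\cos(\theta/2))\cos(\theta/2)\,d\theta = 16\pi a^2\cos^4(\theta/2)\sin(\theta/2)\,d\theta.$$
Substituting $u = \cos(\theta/2)$, $du = -\frac{1}{2}\sin(\theta/2)\,d\theta$ gives,
$$A = \int_{\theta=0}^{\theta=\pi}16\pi a^2\cos^4(\theta/2)\sin(\theta/2)\,d\theta = \int_{u=1}^{u=0}16\pi a^2u^4(-2\,du) = 32\pi a^2\frac{u^5}{5}\Big|_0^1.$$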
This evaluates to give the total surface area of the apple,
$$A = 32\pi a^2/5.$$
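A numerical sketch of the apple computation, assuming (as in the example) the upper half of the cardioid, 0 ≤ θ ≤ π, revolved about the x-axis: it sums $dA = 2\pi y\,ds$ with $ds = \sqrt{r'^2 + r^2}\,d\theta$ and compares with $32\pi a^2/5$. The value a = 1.3 is illustrative.

```python
import math

a = 1.3            # illustrative cardioid constant
n = 100_000
h = math.pi / n    # upper half of the cardioid: 0 <= theta <= pi
apple_area = 0.0
for k in range(n):
    t = (k + 0.5) * h
    r, dr = a * (1 + math.cos(t)), -a * math.sin(t)
    y = r * math.sin(t)                                    # distance from the x-axis
    apple_area += 2 * math.pi * y * math.hypot(dr, r) * h  # dA = 2 pi y ds
print(apple_area, 32 * math.pi * a**2 / 5)
```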
5. Area of a region enclosed by a polar curve. What is the area of the planar region enclosed
by a cardioid? By the same sort of reasoning as for volumes and arc lengths, the differential element
of area of the triangular region bounded by the rays θ, θ + dθ and the curve r(θ) is,
$$dA = \frac{r(\theta)^2}{2}\,d\theta.$$
Thus the area enclosed by a polar curve is,
$$A = \int dA = \int_{\theta=a}^{\theta=b}\frac{r(\theta)^2}{2}\,d\theta.$$
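For the cardioid $r(\theta) = a(1 + \cos(\theta))$, this gives,
$$A = \int_0^{2\pi}\frac{a^2(1 + \cos(\theta))^2}{2}\,d\theta.$$
This expands to give,
$$\frac{a^2}{2}\int_0^{2\pi}\big(1 + 2\cos(\theta) + \cos(\theta)^2\big)\,d\theta.$$
To simplify the last part of the integrand, substitute,
$$\cos(\theta)^2 = \frac{1 + \cos(2\theta)}{2},$$
to get,
$$\frac{a^2}{2}\int_0^{2\pi}\left(1 + 2\cos(\theta) + \frac{1 + \cos(2\theta)}{2}\right)d\theta = \frac{a^2}{4}\int_0^{2\pi}\big(3 + 4\cos(\theta) + \cos(2\theta)\big)\,d\theta.$$
Using the Fundamental Theorem of Calculus, this equals,
$$\frac{a^2}{4}\left(3\theta + 4\sin(\theta) + \frac{1}{2}\sin(2\theta)\right)\Big|_0^{2\pi}.$$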
Evaluating gives,
$$A = 3\pi a^2/2.$$
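As a sketch, the enclosed area can be approximated by summing $r(\theta)^2/2\,d\theta$ over 0 ≤ θ ≤ 2π for the cardioid and compared with $3\pi a^2/2$. The value a = 1.3 is illustrative.

```python
import math

a = 1.3  # illustrative cardioid constant
n = 100_000
h = 2 * math.pi / n
# Midpoint Riemann sum of dA = r(theta)^2 / 2 dtheta with r(theta) = a(1 + cos(theta)), 0 <= theta <= 2 pi
enclosed_area = sum((a * (1 + math.cos((k + 0.5) * h))) ** 2 / 2 for k in range(n)) * h
print(enclosed_area, 3 * math.pi * a**2 / 2)
```

Over a full period the midpoint sum integrates 1, cos(θ) and cos(2θ) essentially exactly, so the agreement is limited only by rounding.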
s ≤ y ≤ t,
of “invertible”, namely that 1/f (x) is always defined. We will be careful to specify the meaning of
“invertible”.
There are 2 necessary conditions for f to have an inverse function. Assume f has an inverse function
g. Let x1 , x2 be a pair of numbers in [a, b]. If f (x1 ) equals f (x2 ), then also,
x1 = g(f (x1 )) = g(f (x2 )) = x2 ,
i.e., $x_1$ equals $x_2$. In other words, two distinct inputs $x_1$ and $x_2$ give two distinct outputs $f(x_1)$ and $f(x_2)$. A function satisfying this condition is called one-to-one, because to every output, there is at most one input. This is the first necessary condition: every invertible function is one-to-one.
Next, for every number y in [s, t], there is a number x in [a, b] such that y = f(x). In fact, just take x to be g(y); then f(x) equals f(g(y)), which equals y. A function satisfying this condition is called onto. This is the second necessary condition: every invertible function is onto.
Together, this says that an invertible function is one-to-one and onto. In fact, the converse is also true: every one-to-one and onto function is invertible. This is easy to check, but we will not prove it in this class.
Remark: In checking that f is one-to-one and onto, the choice of the intervals [a, b] and [s, t] is vital. A simple example comes from f(x) = sin(x). For the intervals [−π/2, π/2] and [−1, 1], the function f(x) is one-to-one and onto. But for many other choices of these intervals, the function is neither one-to-one nor onto.
2. The graph of an inverse function. How should we think of an inverse function? One way is graphically. The graph of the function $y = f^{-1}(x)$ is the same as the graph of $f(y) = x$. This is simply the usual graph of y = f(x) with the roles of x and y reversed. What this translates to is, the graph of $f^{-1}$ is the same as the graph of f with the roles of the x-axis and y-axis reversed. The simplest way to get the graph of $f^{-1}(x)$ is simply to reflect the graph of f(x) through the 45° line y = x.
3. The inverse trigonometric functions. The function sin(x) is one-to-one and onto on [−π/2, π/2], taking values in [−1, 1]. Thus there is an inverse function $\sin^{-1}(x)$ defined on the interval [−1, 1], taking values in [−π/2, π/2]. The graph of $\sin^{-1}(x)$ is an increasing function whose lower left endpoint is (−1, −π/2) and whose upper right endpoint is (1, π/2).
The function cos(x) is one-to-one and onto on [0, π], taking values in [−1, 1]. Thus there is an inverse function $\cos^{-1}(x)$ defined on the interval [−1, 1], taking values in [0, π]. The graph of $\cos^{-1}(x)$ is a decreasing function whose upper left endpoint is (−1, π) and whose lower right endpoint is (1, 0).
The function tan(x) is one-to-one and onto on (−π/2, π/2), taking values in the whole real line. Thus there is an inverse function $\tan^{-1}(x)$ defined on the whole real line, taking values in (−π/2, π/2). The graph is an increasing function that is asymptotic to the line y = −π/2 as x → −∞, and asymptotic to the line y = +π/2 as x → +∞.
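4. Derivatives of inverse functions. A particularly simple formulation of the chain rule is the differential formulation,
$$df(u) = f'(u)\,du.$$
Taking $u = g(x)$ gives
$$df(g(x)) = f'(g(x))\,dg(x).$$
If $g = f^{-1}$ is the inverse function of $f$, then $f(g(x)) = x$, so the left side is simply $dx$:
$$dx = f'(g(x))\,dg(x), \qquad \text{so} \qquad \frac{d}{dx}\left(f^{-1}(x)\right) = \frac{1}{f'(f^{-1}(x))}.$$
In fact, we have seen this formula before. It is how we computed the derivative of $\ln(x)$, the inverse function of $e^x$:
$$\frac{d}{dx}\left(\ln(x)\right) = \frac{1}{e^{\ln(x)}} = \frac{1}{x}.$$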
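5. Derivatives of the inverse trigonometric functions. Because the derivative of $\sin(x)$ is $\cos(x)$, the formula above gives,
$$\frac{d}{dx}\left(\sin^{-1}(x)\right) = \frac{1}{\cos(\sin^{-1}(x))}.$$
This isn't very useful as it stands. A simple argument makes it much more useful. Denote $\sin^{-1}(x)$ by $\theta$. Thus $\sin(\theta) = x$, and the formula for the derivative is a bit simpler,
$$\frac{d}{dx}\left(\sin^{-1}(x)\right) = \frac{1}{\cos(\theta)}.$$
Because $\theta$ lies in $[-\pi/2, \pi/2]$, $\cos(\theta) \geq 0$, so $\cos(\theta) = \sqrt{1 - \sin^2(\theta)} = \sqrt{1 - x^2}$. Therefore,
$$\frac{d}{dx}\left(\sin^{-1}(x)\right) = \frac{1}{\sqrt{1 - x^2}}.$$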
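The same argument applied to $\cos^{-1}(x)$, with $\theta = \cos^{-1}(x)$ in $[0, \pi]$ so that $\sin(\theta) = \sqrt{1 - x^2} \geq 0$, gives
$$\frac{d}{dx}\left(\cos^{-1}(x)\right) = \frac{-1}{\sin(\theta)} = \frac{-1}{\sqrt{1 - x^2}}.$$
This looks remarkably similar to the formula for $\sin^{-1}(x)$. In particular, this gives,
$$\frac{d}{dx}\left(\sin^{-1}(x) + \cos^{-1}(x)\right) = \frac{1}{\sqrt{1 - x^2}} + \frac{-1}{\sqrt{1 - x^2}} = 0.$$
Therefore the sum is a constant function. Checking at $x = 0$, where $\sin^{-1}(0) = 0$ and $\cos^{-1}(0) = \pi/2$, gives the value of this constant function,
$$\sin^{-1}(x) + \cos^{-1}(x) = \pi/2.$$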
Finally, because the derivative of $\tan(x)$ is $\sec^2(x)$, the formula gives,
$$\frac{d}{dx}\left(\tan^{-1}(x)\right) = \frac{1}{\sec^2(\tan^{-1}(x))}.$$
Again introduce $\theta = \tan^{-1}(x)$. Then the formula for the derivative is,
$$\frac{d}{dx}\left(\tan^{-1}(x)\right) = \frac{1}{\sec^2(\theta)}.$$
But the Pythagorean theorem implies,
$$\sec^2(\theta) = 1 + \tan^2(\theta) = 1 + x^2.$$
This finally gives a very useful formula for the derivative of $\tan^{-1}(x)$,
$$\frac{d}{dx}\left(\tan^{-1}(x)\right) = \frac{1}{1 + x^2}.$$
Notice, in particular, that the denominator is never zero. This is closely related to the fact that $\tan^{-1}(x)$ is defined on the entire real line.
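As a sanity check (not part of the original notes), the three derivative formulas and the identity $\sin^{-1}(x) + \cos^{-1}(x) = \pi/2$ can be compared numerically against Python's `math.asin`, `math.acos` and `math.atan`, using a symmetric difference quotient. This is a minimal sketch; the helper names, the step size `h` and the sample points are our own arbitrary choices.

```python
import math

def central_diff(f, x, h=1e-6):
    """Approximate f'(x) by the symmetric difference quotient."""
    return (f(x + h) - f(x - h)) / (2 * h)

# Derivative formulas derived in the text.
def d_asin(x):
    return 1 / math.sqrt(1 - x * x)

def d_acos(x):
    return -1 / math.sqrt(1 - x * x)

def d_atan(x):
    return 1 / (1 + x * x)

def max_error(f, df, points):
    """Largest gap between the numerical and the closed-form derivative."""
    return max(abs(central_diff(f, x) - df(x)) for x in points)

inner = [-0.9, -0.5, 0.0, 0.3, 0.8]          # points strictly inside (-1, 1)
everywhere = [-20.0, -1.0, 0.0, 2.0, 50.0]   # tan^{-1} is defined on all of R

print("asin:", max_error(math.asin, d_asin, inner))
print("acos:", max_error(math.acos, d_acos, inner))
print("atan:", max_error(math.atan, d_atan, everywhere))
# sin^{-1}(x) + cos^{-1}(x) - pi/2 should vanish up to rounding.
print("sum: ", max(abs(math.asin(x) + math.acos(x) - math.pi / 2) for x in inner))
```

The differences are limited only by the finite-difference approximation and floating-point rounding.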
6. Hyperbolic trigonometric functions. The trigonometric functions are very useful for discussing points on the unit circle $x^2 + y^2 = 1$, because the circle is the parametric curve,
$$\begin{cases} x = \cos(\theta), \\ y = \sin(\theta). \end{cases}$$
Are there analogous continuous functions for the points on the hyperbola $x^2 - y^2 = 1$?
At first blush, the answer is no. The problem is that the hyperbola has two parts: one part is in the half-plane where $x > 0$, and the other part is in the half-plane where $x < 0$. Because of the intermediate value theorem, a continuous function $x = f(t)$ cannot jump from $x > 0$ to $x < 0$ or vice versa without crossing $x = 0$, and the hyperbola has no points with $x = 0$. Thus, refine the question: Are there continuous functions for the part of the hyperbola in the half-plane where $x > 0$?
The answer to this question is yes. The corresponding functions are called hyperbolic trigonometric
functions or, more often, simply hyperbolic functions. They are defined as follows,
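$$\cosh(t) = \frac{1}{2}\left(e^t + e^{-t}\right),$$
$$\sinh(t) = \frac{1}{2}\left(e^t - e^{-t}\right),$$
$$\tanh(t) = \frac{\sinh(t)}{\cosh(t)} = \frac{e^t - e^{-t}}{e^t + e^{-t}},$$
$$\operatorname{sech}(t) = \frac{1}{\cosh(t)} = \frac{2}{e^t + e^{-t}},$$
$$\operatorname{csch}(t) = \frac{1}{\sinh(t)} = \frac{2}{e^t - e^{-t}},$$
and,
$$\coth(t) = \frac{1}{\tanh(t)} = \frac{\cosh(t)}{\sinh(t)} = \frac{e^t + e^{-t}}{e^t - e^{-t}}.$$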
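The first observation is that,
$$\cosh^2(t) = \frac{1}{4}\left(e^t + e^{-t}\right)^2 = \frac{1}{4}\left(e^{2t} + 2 + e^{-2t}\right),$$
$$\sinh^2(t) = \frac{1}{4}\left(e^t - e^{-t}\right)^2 = \frac{1}{4}\left(e^{2t} - 2 + e^{-2t}\right).$$
Taking the difference of these, most of the terms cancel,
$$\cosh^2(t) - \sinh^2(t) = \frac{1}{4}\left((2) - (-2)\right) = \frac{4}{4} = 1.$$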
This proves that the parametric curve,
$$\begin{cases} x = \cosh(t), \\ y = \sinh(t), \end{cases}$$
is contained in the right half of the hyperbola $x^2 - y^2 = 1$ (the right half, because $\cosh(t) > 0$ for every $t$). We will see next time that there is an inverse function of $\sinh(t)$, from which it follows that every point in the right half of the hyperbola occurs for exactly one value of $t$. Thus the parametric curve exactly traces out the right half of the hyperbola.
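The definitions and the identity $\cosh^2(t) - \sinh^2(t) = 1$ can be checked numerically. This sketch (an illustration; the names `my_cosh`, `my_sinh`, `my_tanh` are our own) builds the functions directly from `math.exp`, mirroring the definitions above, so they can be compared with the library's `math.cosh`, `math.sinh` and `math.tanh`, and it prints sample points $(\cosh(t), \sinh(t))$ on the right half of the hyperbola.

```python
import math

def my_cosh(t):
    return (math.exp(t) + math.exp(-t)) / 2

def my_sinh(t):
    return (math.exp(t) - math.exp(-t)) / 2

def my_tanh(t):
    return my_sinh(t) / my_cosh(t)

for t in [-3.0, -0.5, 0.0, 1.0, 2.5]:
    x, y = my_cosh(t), my_sinh(t)
    # x > 0 (right half) and x^2 - y^2 = 1 up to rounding error.
    print(f"t={t:5.2f}  x={x:9.5f}  y={y:9.5f}  x^2 - y^2 = {x * x - y * y:.12f}")
```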
7. The derivatives of the hyperbolic functions. The derivatives of the hyperbolic functions are straightforward. The formulas are very similar to the formulas in the trigonometric case, but slightly different. Try not to confuse them.
$$\frac{d}{dx}\left(\sinh(x)\right) = \cosh(x).$$
$$\frac{d}{dx}\left(\cosh(x)\right) = \sinh(x).$$
$$\frac{d}{dx}\left(\tanh(x)\right) = \frac{1}{\cosh^2(x)}.$$
1. Inverse hyperbolic functions. There are a few other useful formulas for hyperbolic functions; for instance, the analogues of the angle-addition formulas,
$$z^2 - 1 = 2xz \iff z^2 - 2xz - 1 = 0.$$
and
$$\tanh^{-1}(x) = \frac{1}{2}\ln\left(\frac{1 + x}{1 - x}\right), \qquad -1 < x < 1.$$
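With $x = \sinh(y)$ and $z = e^y$, the quadratic $z^2 - 2xz - 1 = 0$ above has the positive root $z = x + \sqrt{x^2 + 1}$, which leads to $\sinh^{-1}(x) = \ln\left(x + \sqrt{x^2 + 1}\right)$. The analogous $\cosh^{-1}(x) = \ln\left(x + \sqrt{x^2 - 1}\right)$ for $x \geq 1$ is the standard formula; it is stated here as an assumption because its derivation is not reproduced in these notes. A short sketch (helper names are our own) compares these and the $\tanh^{-1}$ formula with `math.asinh`, `math.acosh` and `math.atanh`.

```python
import math

def asinh_log(x):
    # Positive root z = x + sqrt(x^2 + 1) of z^2 - 2xz - 1 = 0, then y = ln(z).
    return math.log(x + math.sqrt(x * x + 1))

def acosh_log(x):
    # Defined for x >= 1; returns the non-negative value of y.
    return math.log(x + math.sqrt(x * x - 1))

def atanh_log(x):
    # Defined for -1 < x < 1.
    return 0.5 * math.log((1 + x) / (1 - x))

for x in [-4.0, 0.0, 0.5, 2.0]:
    print("asinh", x, asinh_log(x) - math.asinh(x))
for x in [1.0, 1.5, 10.0]:
    print("acosh", x, acosh_log(x) - math.acosh(x))
for x in [-0.9, 0.0, 0.7]:
    print("atanh", x, atanh_log(x) - math.atanh(x))
```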
2. Derivatives of the inverse hyperbolic functions. By the same methods used to compute the derivatives of inverse trigonometric functions, the derivatives of the inverse hyperbolic functions are,
$$d\sinh^{-1}(u) = \frac{du}{\sqrt{1 + u^2}},$$
$$d\cosh^{-1}(u) = \frac{du}{\sqrt{u^2 - 1}}, \qquad u > 1,$$
$$d\tanh^{-1}(u) = \frac{du}{1 - u^2}, \qquad -1 < u < 1.$$
These can also be computed using the formulas for the inverse functions.
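As with the inverse trigonometric functions, these formulas can be spot-checked numerically. The sketch below (our own illustration; `central_diff` and `worst_error` are hypothetical helper names) compares a symmetric difference quotient of `math.asinh`, `math.acosh` and `math.atanh` with the right-hand sides above, at sample points inside each domain.

```python
import math

def central_diff(f, u, h=1e-6):
    """Symmetric difference quotient approximating f'(u)."""
    return (f(u + h) - f(u - h)) / (2 * h)

# (name, function, claimed derivative, sample points inside the domain)
CHECKS = [
    ("asinh", math.asinh, lambda u: 1 / math.sqrt(1 + u * u), [-3.0, 0.0, 2.0]),
    ("acosh", math.acosh, lambda u: 1 / math.sqrt(u * u - 1), [1.5, 3.0, 10.0]),
    ("atanh", math.atanh, lambda u: 1 / (1 - u * u), [-0.8, 0.0, 0.5]),
]

def worst_error(f, df, points):
    return max(abs(central_diff(f, u) - df(u)) for u in points)

for name, f, df, points in CHECKS:
    print(name, worst_error(f, df, points))
```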
3. Inverse substitution. The derivatives of inverse trigonometric and inverse hyperbolic functions allow us to compute more antiderivatives than before, e.g., $\int dx/\sqrt{x^2 - 1}$ equals $\cosh^{-1}(x) + C$. Essentially this comes down to making a direct substitution of an inverse function, e.g., $u = \cosh^{-1}(x)$. However, this is logically equivalent to making an inverse substitution, $x = \cosh(u)$. When the integrand is more complicated, inverse substitution is usually simpler and faster than direct substitution of an inverse function.
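Example. Compute the following antiderivative,
$$\int \sqrt{a^2 - x^2}\,dx.$$
This is not quite the derivative of an inverse function above. However, it is clear that inverse substituting $x = a\sin(\theta)$ will simplify the integrand, because
$$a^2 - x^2 = a^2 - a^2\sin^2(\theta) = a^2\cos^2(\theta).$$
Thus we have (taking $a > 0$ and $\theta$ in $[-\pi/2, \pi/2]$, so that $\sqrt{a^2\cos^2(\theta)} = a\cos(\theta)$),
$$\int \sqrt{a^2 - x^2}\,dx, \quad \begin{cases} x = a\sin(\theta), \\ dx = a\cos(\theta)\,d\theta \end{cases} \quad \Longrightarrow \quad \int \sqrt{a^2\cos^2(\theta)}\,\bigl(a\cos(\theta)\,d\theta\bigr) = a^2\int \cos^2(\theta)\,d\theta.$$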
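The notes stop at $a^2\int\cos^2(\theta)\,d\theta$. Finishing with $\cos^2(\theta) = (1 + \cos(2\theta))/2$ and back-substituting $\sin(\theta) = x/a$, $\cos(\theta) = \sqrt{a^2 - x^2}/a$ gives the candidate antiderivative $F(x) = \frac{1}{2}\left(a^2\sin^{-1}(x/a) + x\sqrt{a^2 - x^2}\right)$. This last step is our own completion, not taken from the notes. The sketch below checks it by comparing $F(x_1) - F(x_0)$ with a Simpson's-rule approximation of the definite integral, assuming $a > 0$; the `simpson` helper is our own.

```python
import math

def F(x, a):
    """Candidate antiderivative of sqrt(a^2 - x^2) on [-a, a], for a > 0."""
    return 0.5 * (a * a * math.asin(x / a) + x * math.sqrt(a * a - x * x))

def simpson(f, lo, hi, n=1000):
    """Composite Simpson's rule with an even number n of subintervals."""
    if n % 2:
        n += 1
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3

a = 2.0
numeric = simpson(lambda x: math.sqrt(a * a - x * x), -0.5, 1.2)
exact = F(1.2, a) - F(-0.5, a)
print(numeric, exact)
# Over the whole interval [-a, a] the area is half a disk: pi * a^2 / 2.
print(F(a, a) - F(-a, a), math.pi * a * a / 2)
```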
4. Three different kinds of integrals, three kinds of inverse substitution. The type of antiderivative where inverse substitution is most successful has the form,
$$\int \frac{F\left(x, \sqrt{Ax^2 + Bx + C}\right)}{G\left(x, \sqrt{Ax^2 + Bx + C}\right)}\,dx,$$
where $A$, $B$ and $C$ are constants, and $F(x, y)$ and $G(x, y)$ are polynomial functions in the two arguments. Inverse substitution together with partial fractions solves all such antiderivative problems.
The first step is to complete the square of the expression $Ax^2 + Bx + C$. This gives,
$$Ax^2 + Bx + C = A\left(x + \frac{B}{2A}\right)^2 - \frac{B^2 - 4AC}{4A}.$$
After the linear change of variables $u = x + B/(2A)$, the expression becomes one of
$$\beta^2 u^2 + \alpha^2, \qquad \beta^2 u^2 - \alpha^2, \qquad -\beta^2 u^2 + \alpha^2,$$
where,
$$\beta = \sqrt{|A|}, \qquad \alpha = \sqrt{\frac{|B^2 - 4AC|}{|4A|}}.$$
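Defining $a = \alpha/\beta$, finally the integral is transformed to one of 3 possible types,
$$\text{Type I:} \quad \int \frac{F_I\left(u, \sqrt{a^2 - u^2}\right)}{G_I\left(u, \sqrt{a^2 - u^2}\right)}\,du,$$
$$\text{Type II:} \quad \int \frac{F_{II}\left(u, \sqrt{u^2 - a^2}\right)}{G_{II}\left(u, \sqrt{u^2 - a^2}\right)}\,du,$$
and
$$\text{Type III:} \quad \int \frac{F_{III}\left(u, \sqrt{a^2 + u^2}\right)}{G_{III}\left(u, \sqrt{a^2 + u^2}\right)}\,du.$$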
For each of these types, there are 3 possible inverse substitutions: trigonometric, hyperbolic and
rational. A flow chart of the 9 possible outcomes will be posted on the course webpage. Here are
a couple of examples. In each example, the inverse rational substitution is given, although it was
only briefly discussed in lecture.
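Example. Compute the following antiderivative,
$$\int \frac{x^2}{\sqrt{a^2 - x^2}}\,dx.$$
The trigonometric inverse substitution is,
$$x = a\sin(\theta), \qquad dx = a\cos(\theta)\,d\theta.$$
After computing the new antiderivative and back-substituting, the result is
$$\int \frac{x^2}{\sqrt{a^2 - x^2}}\,dx = \frac{1}{2}\left(a^2\sin^{-1}(x/a) - x\sqrt{a^2 - x^2}\right) + C.$$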
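Following the advice to check answers by differentiating, here is a small numerical check (our own illustration) that the derivative of $\frac{1}{2}\left(a^2\sin^{-1}(x/a) - x\sqrt{a^2 - x^2}\right)$ matches the integrand $x^2/\sqrt{a^2 - x^2}$ for $-a < x < a$. The value $a = 3$, the sample points and the helper names are arbitrary choices.

```python
import math

def antiderivative(x, a):
    return 0.5 * (a * a * math.asin(x / a) - x * math.sqrt(a * a - x * x))

def integrand(x, a):
    return x * x / math.sqrt(a * a - x * x)

def central_diff(f, x, h=1e-6):
    """Symmetric difference quotient approximating f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

a = 3.0
for x in [-2.5, -1.0, 0.0, 1.5, 2.5]:
    slope = central_diff(lambda s: antiderivative(s, a), x)
    print(f"x={x:5.2f}  F'(x)={slope:.8f}  integrand={integrand(x, a):.8f}")
```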
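The rational inverse substitution is,
$$x = a\,\frac{2t}{1 + t^2}, \qquad dx = a\,\frac{2(1 - t^2)}{(1 + t^2)^2}\,dt.$$
The point is that,
$$a^2 - x^2 = a^2\,\frac{(1 - t^2)^2}{(1 + t^2)^2}.$$
Thus the new antiderivative is,
$$\int \frac{4a^2t^2}{(1 + t^2)^2} \cdot \frac{1 + t^2}{a(1 - t^2)} \cdot \frac{2a(1 - t^2)}{(1 + t^2)^2}\,dt.$$
This simplifies to,
$$8a^2\int \frac{t^2}{(1 + t^2)^3}\,dt = 8a^2\int \frac{1}{(1 + t^2)^2}\,dt - 8a^2\int \frac{1}{(1 + t^2)^3}\,dt.$$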
Notice, these two integrals are the same type that occurred with inverse hyperbolic substitution. But they came up more quickly: rational inverse substitution is more efficient than inverse hyperbolic substitution for this problem. However, both require a further inverse trigonometric substitution. So inverse trigonometric substitution is the most efficient for this problem.
x = a cosh(t), dx = a sinh(t)dt.
$$\int \frac{x^2}{\sqrt{x^2 - a^2}}\,dx = \frac{1}{2}\left(a^2\ln\left(x + \sqrt{x^2 - a^2}\right) + x\sqrt{x^2 - a^2}\right) + C.$$
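The sign in front of $x\sqrt{x^2 - a^2}$ is easy to get wrong. A quick numerical check in the spirit of "check your answer" compares the derivative of both sign choices with the integrand $x^2/\sqrt{x^2 - a^2}$ for $x > a > 0$; only the $+$ version matches. The helper names and sample values are our own.

```python
import math

def candidate(x, a, sign):
    """(1/2)(a^2 ln(x + sqrt(x^2 - a^2)) + sign * x sqrt(x^2 - a^2)), for x > a > 0."""
    root = math.sqrt(x * x - a * a)
    return 0.5 * (a * a * math.log(x + root) + sign * x * root)

def integrand(x, a):
    return x * x / math.sqrt(x * x - a * a)

def central_diff(f, x, h=1e-6):
    """Symmetric difference quotient approximating f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

def max_mismatch(sign, a, points):
    """Largest gap between d/dx candidate(x, a, sign) and the integrand."""
    return max(abs(central_diff(lambda s: candidate(s, a, sign), x) - integrand(x, a))
               for x in points)

a = 2.0
points = [2.5, 3.0, 5.0, 12.0]
print("+ sign:", max_mismatch(+1, a, points))
print("- sign:", max_mismatch(-1, a, points))
```

With the $-$ sign the derivative works out to $-\sqrt{x^2 - a^2}$ instead of the integrand, which is why the mismatch is large.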
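The rational inverse substitution for this type is,
$$x = a\,\frac{1 + t^2}{2t}, \qquad dx = a\,\frac{-(1 - t^2)}{2t^2}\,dt.$$
The point is that,
$$x^2 - a^2 = a^2\,\frac{(1 - t^2)^2}{(2t)^2}.$$
Thus the new antiderivative is,
$$\int \frac{a^2(1 + t^2)^2}{4t^2} \cdot \frac{2t}{a(1 - t^2)} \cdot \frac{-a(1 - t^2)}{2t^2}\,dt.$$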
1. Review of inverse substitution and another example. Recall the general strategy for finding an antiderivative of the form,
$$\int \frac{F\left(x, \sqrt{Ax^2 + Bx + C}\right)}{G\left(x, \sqrt{Ax^2 + Bx + C}\right)}\,dx.$$
For definiteness, consider the example,
$$\int \frac{x^2}{\sqrt{x^2 - 2ax + 2a^2}}\,dx,$$
where $a$ is a constant.
Step 1. Complete the square. Complete the square of the expression $Ax^2 + Bx + C$ inside the radical. In the example,
$$x^2 - 2ax + 2a^2 = (x - a)^2 + a^2.$$
Step 2. Make a linear change of coordinates. Make a linear change of coordinates to simplify the quadratic term to one of the 3 types: $a^2 - x^2$, $x^2 - a^2$, or $x^2 + a^2$. In the example, this means making the linear change of variables,
$$u = x - a, \qquad du = dx.$$
The new quadratic term is $u^2 + a^2$, the third type. The new antiderivative is,
$$\int \frac{(u + a)^2}{\sqrt{u^2 + a^2}}\,du = \int \frac{u^2 + 2au + a^2}{\sqrt{u^2 + a^2}}\,du.$$
Step 3. Use inverse substitution to eliminate the radicals. There is a choice of inverse substitution: trigonometric, hyperbolic or rational. When starting out, it is a good idea to experiment with all 3. On an exam, usually one choice will be suggested (or even demanded). When no other guidance is given, trigonometric substitution is a good starting point (because you are already very familiar with trigonometric functions).
In the example, to eliminate the radical, the correct inverse trigonometric substitution is,
$$u = a\tan(\theta), \qquad du = a\sec^2(\theta)\,d\theta, \qquad \sqrt{u^2 + a^2} = a\sec(\theta).$$
This transforms the antiderivative into
$$\int \left(a^2\tan^2(\theta) + 2a^2\tan(\theta) + a^2\right)\sec(\theta)\,d\theta.$$
Step 4. Compute the new antiderivative. If this were only as simple as it sounds, how much
easier calculus would be! This step is often difficult in itself. Often it requires at least one more
direct substitution. Sometimes, it also requires a partial fractions decomposition. We will return
to this step below.
Step 5. Back-substitute. This is always a step for a method using direct substitution or inverse substitution. This step frequently introduces terms like $\cos(\tan^{-1}(x))$. Time permitting (or when specifically instructed to do so), these terms should be simplified using the right-triangle method from lecture,
$$\theta = \tan^{-1}(x), \quad x/1 = \tan(\theta) = \text{Opposite}/\text{Adjacent}, \quad \text{Hypotenuse} = \sqrt{1 + x^2},$$
$$\cos(\theta) = \text{Adjacent}/\text{Hypotenuse} = 1/\sqrt{1 + x^2}.$$
Step 6. Check your answer. When feasible, check your answer. Since differentiation is so much faster than antidifferentiation, it is usually quite easy to check that an antiderivative is correct.
Example. The tricky part is, of course, Step 4. In the example, the integral broke into 3 terms,
$$\int a^2\tan^2(\theta)\sec(\theta)\,d\theta + \int 2a^2\sec(\theta)\tan(\theta)\,d\theta + \int a^2\sec(\theta)\,d\theta.$$
The last antiderivative was actually Problem 3(b) from Part II of Problem Set 4. It turns out to be,
$$\int a^2\sec(\theta)\,d\theta = a^2\ln\left(u + \sqrt{u^2 + a^2}\right) + C = a^2\ln\left(x - a + \sqrt{x^2 - 2ax + 2a^2}\right) + C.$$
The middle integrand is simply the derivative of $2a^2\sec(\theta)$, and $\sec(\theta) = \sqrt{1 + \tan^2(\theta)} = \sqrt{a^2 + u^2}/a$. So the middle term is,
$$\int 2a^2\sec(\theta)\tan(\theta)\,d\theta = 2a^2\sec(\theta) + C = 2a\sqrt{a^2 + u^2} + C = 2a\sqrt{x^2 - 2ax + 2a^2} + C.$$
But the remaining (first) term does not simplify in an obvious way. In such cases, it is best to express everything in terms of $\sin(\theta)$ and $\cos(\theta)$ to get a fresh perspective,
$$\int a^2\tan^2(\theta)\sec(\theta)\,d\theta = a^2\int \frac{\sin^2(\theta)}{\cos^3(\theta)}\,d\theta.$$
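Multiplying numerator and denominator by $\cos(\theta)$ and expressing everything in terms of $z = \sin(\theta)$, $dz = \cos(\theta)\,d\theta$, gives,
$$\int \frac{\sin^2(\theta)}{\cos^3(\theta)}\,d\theta = \int \frac{z^2}{(1 - z^2)^2}\,dz = \int \frac{1}{(1 - z^2)^2}\,dz - \int \frac{1}{1 - z^2}\,dz.$$
The last antiderivative was computed in Problem 3(a), Part II of Problem Set 4. Thus, computing either of the 2 remaining antiderivatives gives both of them.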
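Carrying out the remaining steps (our own completion: it uses the antiderivative of $z^2/(1 - z^2)^2$ computed later in these notes and back-substitutes $z = \sin(\theta) = u/\sqrt{u^2 + a^2}$) suggests the final answer
$$\int \frac{x^2}{\sqrt{x^2 - 2ax + 2a^2}}\,dx = \left(\frac{u}{2} + 2a\right)\sqrt{u^2 + a^2} + \frac{a^2}{2}\ln\left(u + \sqrt{u^2 + a^2}\right) + C, \qquad u = x - a.$$
Following Step 6, the sketch below checks it numerically by differentiation, for the arbitrary choice $a = 1.5$ (assuming $a > 0$); the helper names are hypothetical.

```python
import math

def candidate(x, a):
    """Assembled antiderivative of x^2 / sqrt(x^2 - 2ax + 2a^2), written with u = x - a."""
    u = x - a
    root = math.sqrt(u * u + a * a)
    return (u / 2 + 2 * a) * root + (a * a / 2) * math.log(u + root)

def integrand(x, a):
    return x * x / math.sqrt(x * x - 2 * a * x + 2 * a * a)

def central_diff(f, x, h=1e-6):
    """Symmetric difference quotient approximating f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

a = 1.5
points = [-4.0, -1.0, 0.0, 1.5, 3.0, 8.0]
for x in points:
    slope = central_diff(lambda s: candidate(s, a), x)
    print(f"x={x:5.1f}  F'(x)={slope:.8f}  integrand={integrand(x, a):.8f}")
```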
2. Antidifferentiating simple rational expressions. A rational expression is a fraction of polynomials, $F(x)/G(x)$. These frequently arise in Step 4 of the algorithm above. From the point of view of antidifferentiation, the simplest rational expressions are either polynomials, or partial fractions of one of three kinds,
$$\frac{A}{(x - a)^m}, \qquad \frac{B(x - a)}{((x - a)^2 + b^2)^m}, \qquad \text{and} \qquad \frac{C}{((x - a)^2 + b^2)^m}.$$
The last 2 kinds come up less often than the first kind. But they do come up, for instance, when studying Laplace transforms in 18.03. Both polynomials and partial fractions are (relatively) easy
to antidifferentiate. The antiderivative of a polynomial $q(x) = a_n x^n + a_{n-1}x^{n-1} + \cdots + a_1 x + a_0$ is,
$$\int q(x)\,dx = \frac{a_n}{n+1}x^{n+1} + \frac{a_{n-1}}{n}x^n + \cdots + \frac{a_1}{2}x^2 + a_0 x + C.$$
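The polynomial formula is easy to mechanize. In this sketch (an illustration; the convention `coeffs[k]` $= a_k$ and the function names are our own), exact fractions from Python's `fractions` module avoid rounding, and `evaluate` uses Horner's rule so a definite integral can be read off as $Q(x_1) - Q(x_0)$.

```python
from fractions import Fraction

def antiderivative_coeffs(coeffs):
    """Given coeffs[k] = a_k of q(x) = sum a_k x^k, return the coefficients of
    the antiderivative with C = 0: the new coefficient of x^(k+1) is a_k / (k+1)."""
    return [Fraction(0)] + [Fraction(c) / (k + 1) for k, c in enumerate(coeffs)]

def evaluate(coeffs, x):
    """Evaluate sum coeffs[k] * x^k using Horner's rule."""
    result = Fraction(0)
    for c in reversed(coeffs):
        result = result * x + c
    return result

# q(x) = 1 + 2x + 3x^2 has antiderivative x + x^2 + x^3 (taking C = 0).
Q = antiderivative_coeffs([1, 2, 3])
print(Q)
print(evaluate(Q, 2) - evaluate(Q, 0))   # integral of q over [0, 2]
```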
The second kind of partial fraction can be computed with a direct substitution $v = (x - a)^2 + b^2$, $dv = 2(x - a)\,dx$,
$$\int \frac{B(x - a)}{((x - a)^2 + b^2)^m}\,dx = \frac{B}{2}\int \frac{dv}{v^m} = \begin{cases} \dfrac{-B}{2m - 2}\left((x - a)^2 + b^2\right)^{-(m-1)} + C, & m \geq 2, \\[2ex] \dfrac{B}{2}\ln\left((x - a)^2 + b^2\right) + C, & m = 1. \end{cases}$$
The third kind of partial fraction can be computed with an inverse substitution $x = b\tan(\theta) + a$, $dx = b\sec^2(\theta)\,d\theta$,
$$\int \frac{C}{((x - a)^2 + b^2)^m}\,dx = \frac{C}{b^{2m-1}}\int \cos^{2m-2}(\theta)\,d\theta.$$
Integration by parts gives a reduction formula for such integrals; see Problems (i) and (j), Part I of Problem Set 7.
3. Simplifying rational expressions: division and factoring. Many rational expressions that come up are not of the simple kinds above. The goal is to express an arbitrary rational expression as the sum of a polynomial and partial fractions. The first step is polynomial division. Given a fraction $F(x)/G(x)$, apply polynomial division to get a factorization with remainder,
$$F(x) = q(x)G(x) + r(x),$$
where $q(x)$ is a polynomial and $r(x)$ is a polynomial of degree less than $\deg(G(x))$. This leads to the reduced form of a rational expression,
$$\frac{F(x)}{G(x)} = q(x) + \frac{r(x)}{G(x)}.$$
Example. I forgot the example from lecture. Here is a similar example. Find the reduced form of $(x^3 + 1)/(x^2 + 3x + 2)$. The polynomial division algorithm gives,
$$\frac{x^3 + 1}{x^2 + 3x + 2} = x - 3 + \frac{7x + 7}{x^2 + 3x + 2}.$$
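Polynomial division is easy to carry out by machine. The sketch below (illustrative, not from the notes; `poly_divmod` is a hypothetical name) stores a polynomial as a list of coefficients from the highest degree down, uses exact `Fraction` arithmetic, and reproduces the quotient $x - 3$ and remainder $7x + 7$ of the example.

```python
from fractions import Fraction

def poly_divmod(num, den):
    """Divide num by den (coefficient lists, highest degree first).

    Returns (quotient, remainder) with num = quotient * den + remainder and
    deg(remainder) < deg(den). The leading coefficient of den must be nonzero.
    """
    if not den or den[0] == 0:
        raise ValueError("denominator must have a nonzero leading coefficient")
    rem = [Fraction(c) for c in num]
    den = [Fraction(c) for c in den]
    quot = []
    while len(rem) >= len(den):
        factor = rem[0] / den[0]          # next coefficient of the quotient
        quot.append(factor)
        for i, d in enumerate(den):       # subtract factor * den, aligned at the top
            rem[i] -= factor * d
        rem.pop(0)                        # the leading term is now zero
    return quot, rem

q, r = poly_divmod([1, 0, 0, 1], [1, 3, 2])   # (x^3 + 1) / (x^2 + 3x + 2)
print("quotient:", q, "remainder:", r)
```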
The next step is to factor the denominator into a product of linear and irreducible quadratic factors,
$$G(x) = A(x - a_1)^{m_1}(x - a_2)^{m_2}\cdots(x - a_k)^{m_k}\left((x - \alpha_1)^2 + b_1^2\right)^{n_1}\cdots\left((x - \alpha_l)^2 + b_l^2\right)^{n_l}.$$
Here $k$ and $l$ are nonnegative integers and $m_1, \ldots, m_k, n_1, \ldots, n_l$ are positive integers. Also, $a_1, \ldots, a_k$, $\alpha_1, \ldots, \alpha_l$, and $b_1, \ldots, b_l$ are real numbers. The last $l$ factors were discussed only at the end of lecture. Although they are important, they do not often come up in this course.
The Fundamental Theorem of Algebra asserts that every polynomial with real coefficients has a
factorization as above. However, finding the factorization can be very difficult. In all exercises and
exam problems, either the factorization is easy, or the factorization will be given to you. Whenever
possible, cancel common factors from the numerator and denominator.
Example. In the example, the quadratic formula gives the factorization,
$$x^2 + 3x + 2 = (x + 2)(x + 1).$$
The numerator $r(x)$ is $7(x + 1)$. Thus the numerator and denominator have a common factor. This leads to a better reduced form,
$$\frac{x^3 + 1}{x^2 + 3x + 2} = x - 3 + \frac{7}{x + 2}.$$
This can now be integrated to give,
$$\int \frac{x^3 + 1}{x^2 + 3x + 2}\,dx = \frac{x^2}{2} - 3x + 7\ln(|x + 2|) + C.$$
4. Simplifying rational expressions: partial fraction decomposition. Using the last part, every rational expression can be written in the form,
$$\frac{F(x)}{G(x)} = q(x) + \frac{r(x)}{H(x)} = q(x) + \frac{r(x)}{(x - a_1)^{m_1}\cdots(x - a_k)^{m_k}\left((x - \alpha_1)^2 + b_1^2\right)^{n_1}\cdots\left((x - \alpha_l)^2 + b_l^2\right)^{n_l}},$$
where $q(x)$ is a polynomial, the degree of $r(x)$ is less than the degree of $H(x)$, and $r(x)$ has no common factor with $H(x)$. This can be further simplified using partial fraction decomposition. It is a fact that every rational expression $r(x)/H(x)$ can be written in the form,
$$\left(\frac{C_{1,1}}{x - a_1} + \frac{C_{1,2}}{(x - a_1)^2} + \cdots + \frac{C_{1,m_1}}{(x - a_1)^{m_1}}\right) + \cdots + \left(\frac{C_{k,1}}{x - a_k} + \cdots + \frac{C_{k,m_k}}{(x - a_k)^{m_k}}\right)$$
$$+ \left(\frac{D_{1,1}(x - \alpha_1)}{(x - \alpha_1)^2 + b_1^2} + \frac{E_{1,1}}{(x - \alpha_1)^2 + b_1^2} + \cdots + \frac{D_{1,n_1}(x - \alpha_1)}{((x - \alpha_1)^2 + b_1^2)^{n_1}} + \frac{E_{1,n_1}}{((x - \alpha_1)^2 + b_1^2)^{n_1}}\right) + \cdots$$
$$+ \left(\frac{D_{l,1}(x - \alpha_l)}{(x - \alpha_l)^2 + b_l^2} + \frac{E_{l,1}}{(x - \alpha_l)^2 + b_l^2} + \cdots + \frac{D_{l,n_l}(x - \alpha_l)}{((x - \alpha_l)^2 + b_l^2)^{n_l}} + \frac{E_{l,n_l}}{((x - \alpha_l)^2 + b_l^2)^{n_l}}\right).$$
Here all the terms $C_{i,j}$, $D_{i,j}$ and $E_{i,j}$ are real constants. This sum of partial fractions is called the partial fraction decomposition of $r(x)/H(x)$. The difficulty is precisely to find the constants $C_{i,j}$, $D_{i,j}$, and $E_{i,j}$.
One approach, which always works but is quite inefficient, is simply to multiply all terms by the denominator $H(x)$, and then gather coefficients of powers of $x$. This will give a collection of linear equations in the unknowns $C_{i,j}$, $D_{i,j}$ and $E_{i,j}$. There is a unique solution of this set of linear equations. Methods of linear algebra, e.g., Gauss-Jordan elimination, give an algorithm for finding the solution.
Example. Find the partial fraction decomposition of,
$$\frac{1}{1 - x^2}.$$
In fact this was Problem 3(a), Part II of Problem Set 4. The partial fraction decomposition will have the form,
$$\frac{1}{1 - x^2} = \frac{A}{x + 1} + \frac{B}{x - 1}.$$
Multiplying both sides of the equation by $x^2 - 1 = (x + 1)(x - 1)$ gives $-1 = A(x - 1) + B(x + 1) = (A + B)x + (-A + B)$. Comparing the coefficients of $x$ and the constant terms gives,
$$A + B = 0, \qquad -A + B = -1.$$
Solving the first equation for $B = -A$ and plugging this into the second equation gives,
$$-A + (-A) = -1 \iff 2A = 1 \iff A = \frac{1}{2}.$$
Thus $B = -A = -1/2$. So the partial fraction decomposition is,
$$\frac{1}{1 - x^2} = \frac{1/2}{x + 1} + \frac{-1/2}{x - 1}.$$
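The "clear denominators and compare coefficients" approach amounts to solving a small linear system. Here is a minimal sketch for this example; the 2×2 solver via Cramer's rule is our own choice for illustration (larger systems would call for Gauss-Jordan elimination), followed by an exact check of the resulting decomposition at a few rational points.

```python
from fractions import Fraction

def solve_2x2(a11, a12, b1, a21, a22, b2):
    """Solve a11*A + a12*B = b1, a21*A + a22*B = b2 by Cramer's rule."""
    det = a11 * a22 - a12 * a21
    if det == 0:
        raise ValueError("system is singular")
    A = Fraction(b1 * a22 - a12 * b2, det)
    B = Fraction(a11 * b2 - b1 * a21, det)
    return A, B

# Coefficient equations from the text: A + B = 0 and -A + B = -1.
A, B = solve_2x2(1, 1, 0, -1, 1, -1)
print(A, B)

# Check 1/(1 - x^2) == A/(x + 1) + B/(x - 1) exactly, away from x = 1 and x = -1.
for x in [Fraction(0), Fraction(3), Fraction(-5, 2), Fraction(1, 7)]:
    assert 1 / (1 - x * x) == A / (x + 1) + B / (x - 1)
```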
5. The Heaviside cover-up method. The Heaviside cover-up method is a method for determining many of the coefficients $C_{i,j}$. For each highest power of a linear factor occurring in $H(x)$, say $(x - a_i)^{m_i}$, cover up that term, and substitute $x = a_i$ in the remaining expression. Then $C_{i,m_i}$ equals the value,
$$C_{i,m_i} = \left.\frac{r(x)}{H(x)/(x - a_i)^{m_i}}\right|_{x = a_i}.$$
The proof is quite simple. Multiply every term in the partial fraction decomposition by $(x - a_i)^{m_i}$. One term is $(x - a_i)^{m_i}\left(C_{i,m_i}/(x - a_i)^{m_i}\right) = C_{i,m_i}$. Every other term has a factor $(x - a_i)$ that is not cancelled by the denominator. Thus plugging in $x = a_i$, every other term is 0, and the only remaining term is $C_{i,m_i}$.
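The cover-up rule translates directly into code when the denominator is a product of powers of distinct linear factors. In the sketch below (our own illustration; `cover_up` is a hypothetical name), `r` is the numerator as a Python function and `factors` lists pairs $(a_i, m_i)$. The denominator is assumed to be exactly $\prod_i (x - a_i)^{m_i}$, with leading coefficient 1, and the function returns each top coefficient $C_{i,m_i}$.

```python
from fractions import Fraction

def cover_up(r, factors):
    """Heaviside cover-up for r(x) / prod_i (x - a_i)^(m_i).

    r: numerator, a function that accepts a Fraction.
    factors: list of (a_i, m_i) with distinct roots a_i (monic denominator).
    Returns {a_i: C_{i, m_i}}, the coefficient of 1/(x - a_i)^(m_i).
    """
    roots = [(Fraction(a), m) for a, m in factors]
    coeffs = {}
    for a, _ in roots:
        # Cover up (x - a)^m and evaluate the rest of the expression at x = a.
        rest = Fraction(1)
        for b, k in roots:
            if b != a:
                rest *= (a - b) ** k
        coeffs[a] = r(a) / rest
    return coeffs

# 1/(1 - x^2) = -1 / ((x + 1)(x - 1)): expect 1/2 at x = -1 and -1/2 at x = 1.
print(cover_up(lambda x: -1, [(-1, 1), (1, 1)]))
```

Only the top coefficient for each linear factor is produced; lower powers, as the notes explain, need another method.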
As this example illustrates, the Heaviside cover-up method does not always determine all coefficients. However, it reduces the number of unknown coefficients. To find the remaining coefficients, either clear denominators, or else substitute for $x$ some useful numbers (where $H(x)$ is nonzero), and solve the resulting linear equations.
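Example. Find the full partial fraction decomposition of,
$$\frac{z^2}{(1 - z^2)^2}.$$
Here $(1 - z^2)^2 = (z + 1)^2(z - 1)^2$, and the Heaviside cover-up method gives the coefficients of $1/(z + 1)^2$ and $1/(z - 1)^2$: both equal $1/4$. The rational expression is unchanged by the substitution $z \leftrightarrow -z$. Thus the same is true for the partial fraction decomposition. Therefore $C_{2,1}$, the coefficient of $1/(z - 1)$, equals $-C_{1,1}$, where $C_{1,1}$ is the coefficient of $1/(z + 1)$. This gives,
$$\frac{z^2}{(1 - z^2)^2} = \frac{1}{4}\,\frac{1}{(z + 1)^2} + \frac{1}{4}\,\frac{1}{(z - 1)^2} + \frac{C_{1,1}}{z + 1} + \frac{-C_{1,1}}{z - 1}.$$
Finally, plug in $z = 0$ to get,
$$0 = \frac{1}{4}\,\frac{1}{(+1)^2} + \frac{1}{4}\,\frac{1}{(-1)^2} + \frac{C_{1,1}}{+1} + \frac{-C_{1,1}}{-1} = \frac{1}{2} + 2C_{1,1}.$$
Solving gives $C_{1,1} = -1/4$. Finally this gives the full partial fraction decomposition,
$$\frac{z^2}{(1 - z^2)^2} = \frac{1}{4}\left(\frac{1}{(z + 1)^2} + \frac{1}{(z - 1)^2} - \frac{1}{z + 1} + \frac{1}{z - 1}\right).$$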
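Since the decomposition is an identity of rational functions, it can be verified exactly at many rational points with Python's `Fraction` type. A minimal sketch (the names `lhs`, `rhs` and the sample points are our own):

```python
from fractions import Fraction

def lhs(z):
    return z * z / (1 - z * z) ** 2

def rhs(z):
    quarter = Fraction(1, 4)
    return quarter * (1 / (z + 1) ** 2 + 1 / (z - 1) ** 2 - 1 / (z + 1) + 1 / (z - 1))

# Sample points n/7 for -20 <= n <= 20, skipping the poles z = 1 and z = -1.
points = [Fraction(n, 7) for n in range(-20, 21) if abs(n) != 7]
print(all(lhs(z) == rhs(z) for z in points))
```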
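Antidifferentiating term by term gives,
$$\int \frac{z^2}{(1 - z^2)^2}\,dz = \frac{1}{4}\left(\frac{-1}{z + 1} + \frac{-1}{z - 1} + \ln\left(\frac{|z - 1|}{|z + 1|}\right)\right) + C = \frac{1}{2}\left(\frac{z}{1 - z^2} + \ln\left(\sqrt{1 - z^2}\right) - \ln(1 + z)\right) + C,$$
where the second form is valid for $-1 < z < 1$. This allows us to finish the computation of the antiderivative from the beginning of the lecture.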
u = x dv = cos(x)dx,
du = dx v = sin(x)
Because it is much easier to differentiate than to antidifferentiate, it is a good idea to check your answer.
2. How to use integration by parts. The goal of integration by parts is to replace a complicated integral, $\int u\,dv$, by a simpler integral, $\int v\,du$. What this usually means is that $du$ should be simpler than $u$, and $v$ should be no more complicated than $dv$. This was the case in the last example. However, occasionally this is not the case.
Example. Use integration by parts to compute the antiderivative,
$$\int \ln(x)\,dx.$$
There is very little choice here, if we are to use only integration by parts. Set $u$ to be $\ln(x)$ and set $dv$ to be $dx$. Then $u$, $v$, $du$ and $dv$ are,
$$u = \ln(x), \quad dv = dx, \qquad du = dx/x, \quad v = x.$$
Notice this example does not follow the general rule. The integral $v = x$ is strictly more complicated than $dv = dx$. However, $du = dx/x$ is much simpler than $u = \ln(x)$. So $v\,du = dx$ is simpler than $u\,dv = \ln(x)\,dx$. The lesson is to be flexible when antidifferentiating. Try different things, and see which one works. For example, another approach to this problem, which ultimately comes down to integration by parts again, is to make an inverse substitution,
$$x = e^t, \qquad dx = e^t\,dt,$$
which transforms the antiderivative into $\int t e^t\,dt$. For this integral, set
$$u = t, \quad dv = e^t\,dt, \qquad du = dt, \quad v = e^t.$$
Now there is much more choice for $u$ and $dv$. The simplest choice is to set $u = [\ln(x)]^n$ and $dv = dx$. Then $u$, $v$, $du$ and $dv$ are,
$$u = [\ln(x)]^n, \quad dv = dx, \qquad du = n[\ln(x)]^{n-1}\,dx/x, \quad v = x.$$
Using integration by parts,
$$\int u\,dv = uv - \int v\,du,$$
$$\int [\ln(x)]^n\,dx = x[\ln(x)]^n - n\int [\ln(x)]^{n-1}\,dx.$$
The new integral is simpler than the original integral. And repeated application of the formula eventually leads to a formula for the integral. Thus this is a reduction formula. For instance, this gives,
$$\int [\ln(x)]^2\,dx = x[\ln(x)]^2 - 2\int \ln(x)\,dx.$$
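Applying the reduction formula repeatedly, starting from $\int [\ln(x)]^0\,dx = x$, produces $\int [\ln(x)]^n\,dx = x\sum_k c_k[\ln(x)]^k + C$ for integers $c_k$. This sketch (our own illustration; the function names are hypothetical) generates the $c_k$ recursively and checks the result numerically by differentiation.

```python
import math

def log_power_coeffs(n):
    """Coefficients c[0..n] with  integral of (ln x)^n dx = x * sum_k c[k] (ln x)^k + C,
    obtained from I_n = x (ln x)^n - n I_{n-1} and I_0 = x."""
    if n == 0:
        return [1]
    coeffs = [-n * c for c in log_power_coeffs(n - 1)] + [0]
    coeffs[n] += 1          # the x (ln x)^n term
    return coeffs

def antiderivative(n, x):
    L = math.log(x)
    return x * sum(c * L ** k for k, c in enumerate(log_power_coeffs(n)))

def central_diff(f, x, h=1e-6):
    """Symmetric difference quotient approximating f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

print(log_power_coeffs(2))   # [2, -2, 1], i.e. x((ln x)^2 - 2 ln x + 2)
x = 2.5
for n in range(5):
    print(n, central_diff(lambda s: antiderivative(n, s), x), math.log(x) ** n)
```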
Notice how similar this answer was to the answer of the previous example. The connection comes from the inverse substitution,
$$x = e^t, \qquad dx = e^t\,dt,$$
so that,
$$\int [\ln(x)]^n\,dx = \int t^n e^t\,dt.$$
One choice is to set $u = [\sin(x)]^{n-1}$ and to set $dv = \sin(x)\,dx$. Then $u$, $v$, $du$ and $dv$ are,
$$u = [\sin(x)]^{n-1}, \quad dv = \sin(x)\,dx, \qquad du = (n - 1)[\sin(x)]^{n-2}\cos(x)\,dx, \quad v = -\cos(x).$$
Using integration by parts,
$$\int u\,dv = uv - \int v\,du,$$
$$\int [\sin(x)]^n\,dx = -[\sin(x)]^{n-1}\cos(x) + (n - 1)\int [\sin(x)]^{n-2}\cos^2(x)\,dx.$$
At first blush, this is more complicated than the original integral since it involves both $\sin(x)$ and $\cos(x)$. But $\cos^2(x)$ equals $1 - \sin^2(x)$. This substitution gives,
$$\int [\sin(x)]^n\,dx = -[\sin(x)]^{n-1}\cos(x) + (n - 1)\int [\sin(x)]^{n-2}\,dx - (n - 1)\int [\sin(x)]^n\,dx.$$
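This certainly seems circular: the new formula for the integral involves the integral we were looking for. However, bringing like terms to one side of the equation gives,
$$\int [\sin(x)]^n\,dx + (n - 1)\int [\sin(x)]^n\,dx = -[\sin(x)]^{n-1}\cos(x) + (n - 1)\int [\sin(x)]^{n-2}\,dx.$$
Dividing by $n$ gives the reduction formula,
$$\int [\sin(x)]^n\,dx = -\frac{1}{n}[\sin(x)]^{n-1}\cos(x) + \frac{n - 1}{n}\int [\sin(x)]^{n-2}\,dx.$$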
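On the interval $[0, \pi/2]$ the boundary term $-\frac{1}{n}[\sin(x)]^{n-1}\cos(x)$ vanishes at both ends when $n \geq 2$, so the reduction formula gives $\int_0^{\pi/2}[\sin(x)]^n\,dx = \frac{n-1}{n}\int_0^{\pi/2}[\sin(x)]^{n-2}\,dx$. This sketch (an illustration, not from the notes; the names are our own) evaluates that recursion and compares it with Simpson's rule.

```python
import math

def sin_power_integral(n):
    """Integral of sin(x)^n over [0, pi/2] via the reduction formula."""
    if n == 0:
        return math.pi / 2
    if n == 1:
        return 1.0
    return (n - 1) / n * sin_power_integral(n - 2)

def simpson(f, lo, hi, n=2000):
    """Composite Simpson's rule with an even number n of subintervals."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3

for n in range(7):
    numeric = simpson(lambda x: math.sin(x) ** n, 0.0, math.pi / 2)
    print(n, sin_power_integral(n), numeric)
```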
This is an indeterminate form. In other words, the computation of the limit failed to give any useful information. The reason is that the general formula,
$$\lim_{x \to a} \frac{f(x)}{g(x)} = \frac{\lim_{x \to a} f(x)}{\lim_{x \to a} g(x)},$$
only holds if all three limits are defined, which they are not in our case.
Of course F (x) is simply the constant function with value b. Therefore,
Thus, a more careful computation proves the limit exists and gives its value.
2. The Mean Value Theorem revisited. Recall the Mean Value Theorem: If $f(x)$ is continuous on $[a, b]$ and differentiable on $(a, b)$, then for some $c$ strictly between $a$ and $b$,
$$f'(c) = \frac{f(b) - f(a)}{b - a}.$$
Thus, given two such functions $f(x)$ and $g(x)$ such that $g(b) - g(a)$ is nonzero, there exist two values $c_1$ and $c_2$ strictly between $a$ and $b$ such that,
$$\frac{f'(c_1)}{g'(c_2)} = \frac{(f(b) - f(a))/(b - a)}{(g(b) - g(a))/(b - a)} = \frac{f(b) - f(a)}{g(b) - g(a)}.$$
Is there a single value $c = c_1 = c_2$ where this equality holds?
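The answer is yes. Form the function
$$F(x) = \bigl(f(x) - f(a)\bigr)\bigl(g(b) - g(a)\bigr) - \bigl(g(x) - g(a)\bigr)\bigl(f(b) - f(a)\bigr).$$
Since $f(x)$ and $g(x)$ are continuous on $[a, b]$, also $F(x)$ is continuous on $[a, b]$. Since $f(x)$ and $g(x)$ are differentiable on $(a, b)$, also $F(x)$ is differentiable on $(a, b)$. Moreover,
$$F(a) = F(b) = 0.$$
Thus, by the Mean Value Theorem, there exists a value $c$ strictly between $a$ and $b$ such that $F'(c) = 0$. By a straightforward computation,
$$0 = F'(c) = f'(c)\bigl(g(b) - g(a)\bigr) - g'(c)\bigl(f(b) - f(a)\bigr), \qquad \text{so} \qquad \frac{f'(c)}{g'(c)} = \frac{f(b) - f(a)}{g(b) - g(a)}$$
whenever $g'(c) \neq 0$. This proves the Generalized Mean Value Theorem. The main consequence of the Generalized Mean Value Theorem is the following result.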
Proposition. Let $f(x)$ and $g(x)$ be continuous functions on $[a, b]$ that are differentiable on $(a, b)$. If $g'(x)$ is nonzero on $(a, b)$, then $g(x) - g(a)$ is nonzero for all $a < x < b$, so that the expression,
$$\frac{f(x) - f(a)}{g(x) - g(a)},$$
is defined. The right-handed limit,
$$\lim_{x \to a^+} \frac{f(x) - f(a)}{g(x) - g(a)},$$
exists whenever the right-handed limit,
$$\lim_{x \to a^+} \frac{f'(x)}{g'(x)},$$
exists. In that case, the two limits are equal,
$$\lim_{x \to a^+} \frac{f(x) - f(a)}{g(x) - g(a)} = \lim_{x \to a^+} \frac{f'(x)}{g'(x)}.$$
A similar result holds for left-handed limits. The proof follows by applying the Generalized Mean Value Theorem to the interval $[a, x]$ to replace $(f(x) - f(a))/(g(x) - g(a))$ by $f'(c)/g'(c)$ for some $c$ strictly between $a$ and $x$. Then $c$ approaches $a$ as $x$ approaches $a$.
3. L’Hospital’s rule. The most important case of the proposition is L’Hospital’s rule. This is exactly the case when $f(a) = g(a) = 0$. In this case, a naive computation would give,
$$\lim_{x \to a^+} \frac{f(x)}{g(x)} \;\text{“=”}\; \frac{f(a)}{g(a)} = \frac{0}{0},$$
which is an indeterminate form. Again, the problem is that the general formula,
$$\lim_{x \to a^+} \frac{f(x)}{g(x)} = \frac{\lim_{x \to a^+} f(x)}{\lim_{x \to a^+} g(x)},$$
only holds if all three limits are defined, and the limit $\lim_{x \to a^+} g(x)$ is nonzero. Since the limit is zero, the formula does not hold.
However, if $f'(x)$ and $g'(x)$ exist, and if $g'(x)$ is nonzero, then the proposition has the following consequence, known as L’Hospital’s rule,
$$\lim_{x \to a^+} \frac{f(x)}{g(x)} = \lim_{x \to a^+} \frac{f'(x)}{g'(x)}, \qquad \text{whenever the limit on the right exists.}$$
Examples.
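$$\lim_{x \to 0} \frac{\sinh(x)}{\sin(x)} = \lim_{x \to 0} \frac{\cosh(x)}{\cos(x)} = \frac{1}{1} = 1.$$
$$\lim_{x \to 2} \frac{4x^3 - 32}{x^2 - x - 2} = \lim_{x \to 2} \frac{12x^2}{2x - 1} = \frac{12 \cdot 4}{2 \cdot 2 - 1} = \frac{48}{3} = 16.$$
$$\lim_{x \to 0} \frac{1 - \cos(x)}{x^2} = \lim_{x \to 0} \frac{\sin(x)}{2x} = \lim_{x \to 0} \frac{\cos(x)}{2} = \frac{1}{2}.$$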
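A quick numerical look (illustrative only; the list name `EXAMPLES` is our own) at the three examples: evaluating each quotient at points approaching the limit point from both sides shows the values settling near $1$, $16$ and $1/2$.

```python
import math

EXAMPLES = [
    ("sinh(x)/sin(x), x -> 0", lambda x: math.sinh(x) / math.sin(x), 0.0, 1.0),
    ("(4x^3 - 32)/(x^2 - x - 2), x -> 2",
     lambda x: (4 * x**3 - 32) / (x**2 - x - 2), 2.0, 16.0),
    ("(1 - cos x)/x^2, x -> 0", lambda x: (1 - math.cos(x)) / x**2, 0.0, 0.5),
]

for name, f, a, expected in EXAMPLES:
    for h in (1e-2, 1e-3, 1e-4):
        # Approach from the right and from the left.
        print(f"{name:35s} h={h:.0e}  {f(a + h):.8f}  {f(a - h):.8f}  (limit {expected})")
```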
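4. L’Hospital’s rule for other indeterminate forms. L’Hospital’s rule can be used to compute limits that naively lead to indeterminate forms other than $0/0$. For instance, suppose $f(x)$ and $g(x)$ both tend to $\pm\infty$ as $x \to a^+$, so that the naive computation gives $\infty/\infty$. Applying the rule above to $1/g(x)$ and $1/f(x)$, which both tend to 0, shows that if the limits
$$\lim_{x \to a^+} \frac{f(x)}{g(x)} \qquad \text{and} \qquad \lim_{x \to a^+} \frac{f'(x)}{g'(x)}$$
are defined and nonzero, the formula above can be rewritten as,
$$\lim_{x \to a^+} \frac{f(x)}{g(x)} = \left(\lim_{x \to a^+} \frac{f'(x)}{g'(x)}\right)^{-1} \cdot \left(\lim_{x \to a^+} \frac{f(x)}{g(x)}\right)^{2}.$$
Solving gives,
$$\lim_{x \to a^+} \frac{f(x)}{g(x)} = \lim_{x \to a^+} \frac{f'(x)}{g'(x)},$$
if both limits are defined and nonzero. In fact, a better result is true (with a more subtle proof): if the second limit is defined, then the first limit is defined and the 2 are equal (whether or not they are zero).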
Example.
$$\lim_{x \to \pi/2^+} \frac{\ln(x - \pi/2)}{\sec(x)} = \lim_{x \to \pi/2^+} \frac{1/(x - \pi/2)}{\sec(x)\tan(x)} = \cdots = 0.$$
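Numerically (a sketch, our own illustration), the quotient is small for $x$ slightly larger than $\pi/2$. Writing $x = \pi/2 + h$ and using $\cos(\pi/2 + h) = -\sin(h)$, it equals $-\sin(h)\ln(h)$, which behaves like $-h\ln(h) \to 0$. The helper `quotient` multiplies by $\cos(x)$ rather than dividing by $\sec(x)$, which is the same quantity.

```python
import math

def quotient(x):
    # ln(x - pi/2) / sec(x), written as ln(x - pi/2) * cos(x).
    return math.log(x - math.pi / 2) * math.cos(x)

for h in (1e-1, 1e-2, 1e-4, 1e-6, 1e-8):
    x = math.pi / 2 + h
    print(f"h={h:.0e}  quotient={quotient(x):.3e}  -h*ln(h)={-h * math.log(h):.3e}")
```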
By similar arguments, other indeterminate forms can also be reduced to L’Hospital’s rule. Also, limits of the form,
$$\lim_{x \to \infty} F(x),$$
giving indeterminate forms can often be reduced to L’Hospital’s rule. The moral is that the formula,
$$\lim_{x \to a} \frac{f(x)}{g(x)} = \lim_{x \to a} \frac{f'(x)}{g'(x)},$$
is almost always true if $f(a)/g(a)$ is an indeterminate form. But a certain amount of care should be taken.
Homework. Problem Set 8 Part I: (c), (d) and (e); Part II: Problems 1 and 2.
1. A problem with Riemann integrals. Riemann integrals are defined in very many cases. The result we use most often is that for a piecewise continuous function $f(x)$ on a bounded interval $[a, b]$, the Riemann integral,
$$\int_a^b f(x)\,dx,$$
exists (and equals a finite number). What if the interval is unbounded, e.g., $[a, \infty)$? Quite simply, the Riemann integral is not defined. This isn't a problem with our methods for computing integrals. It is a problem with the very definition of the Riemann integral. In fact, this is only the first of many problems with the definition of the Riemann integral. Eventually these problems led mathematicians to develop a better definition, the Lebesgue integral, which is studied in course 18.103. Luckily, the particular problem of defining the integral on unbounded intervals can be easily overcome using limits (with no need to use the Lebesgue integral).
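2. Improper integrals of the first kind. Let $f(x)$ be defined on the interval $[a, \infty)$. If for every number $t > a$ the function $f(x)$ is Riemann integrable on $[a, t]$, and if the limit,
$$\lim_{t \to \infty} \int_a^t f(x)\,dx,$$
exists, then the improper integral $\int_a^\infty f(x)\,dx$ is defined to be this limit.
Example. Let $p > 1$. For every $t > 1$,
$$\int_1^t \frac{1}{x^p}\,dx = \frac{1}{p - 1} - \frac{1}{(p - 1)t^{p-1}}.$$
Since $p$ is greater than 1, the limit,
$$\lim_{t \to \infty} \frac{1}{t^{p-1}},$$
exists and equals 0. Therefore,
$$\lim_{t \to \infty} \int_1^t \frac{1}{x^p}\,dx,$$
exists and equals $1/(p - 1)$. Therefore the improper integral exists and equals,
$$\int_1^\infty \frac{1}{x^p}\,dx = \frac{1}{p - 1}.$$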
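To see the limit taking shape, this sketch (illustrative; the values of $p$, the cutoffs $t$ and the helper names are arbitrary choices) evaluates the closed form $\int_1^t x^{-p}\,dx = \frac{1}{p-1} - \frac{1}{(p-1)t^{p-1}}$ for growing $t$, and cross-checks one value against Simpson's rule.

```python
def integral_1_to_t(p, t):
    """Closed form of the integral of x^(-p) over [1, t], valid for p != 1."""
    return 1 / (p - 1) - 1 / ((p - 1) * t ** (p - 1))

def simpson(f, lo, hi, n=2000):
    """Composite Simpson's rule with an even number n of subintervals."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3

for p in (1.5, 2.0, 3.0):
    values = [integral_1_to_t(p, t) for t in (10.0, 1e3, 1e6)]
    print(p, values, "limit 1/(p-1) =", 1 / (p - 1))

print(simpson(lambda x: x ** -2.0, 1.0, 10.0), integral_1_to_t(2.0, 10.0))
```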
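Since $\int_1^t dx/x = \ln(t)$ and the limit $\lim_{t \to \infty} \ln(t)$ is not defined (or more precisely, equals $+\infty$), the improper integral,
$$\int_1^\infty \frac{1}{x}\,dx,$$
is not defined.
Similarly, for every $t > 0$ the integral $\int_0^t \cos(x)\,dx$ exists and equals $\sin(t)$. Even though all values $\sin(t)$ are defined and bounded, the limit,
$$\lim_{t \to \infty} \sin(t),$$
is not defined (essentially because it never settles down). Therefore the improper integral,
$$\int_0^\infty \cos(x)\,dx,$$
is not defined.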
3. Improper integrals of the second kind. Here is a second problem with the Riemann integral. Let $[a, b]$ be a bounded interval. Let $f(x)$ be a function that is bounded on $[t, b]$ for every $a < t < b$, but which is unbounded on $[a, b]$. According to the definition of the Riemann integral,
$$\int_a^b f(x)\,dx,$$
is not defined. However, it may happen that for every $a < t < b$, the integral,
$$\int_t^b f(x)\,dx,$$
is defined, and that the limit $\lim_{t \to a^+} \int_t^b f(x)\,dx$ exists. In this case, the improper integral $\int_a^b f(x)\,dx$ is defined to be this limit.
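Similarly, if $f(x)$ is Riemann integrable on every interval $[a, t]$ for $a < t < b$, and if
$$\lim_{t \to b^-} \int_a^t f(x)\,dx$$
exists, then the improper integral $\int_a^b f(x)\,dx$ is defined to be this limit.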
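Example. Let $p$ be a real number in the range $0 < p < 1$. Because the function $1/x^p$ is unbounded on $(0, 1]$, the Riemann integral,
$$\int_0^1 \frac{1}{x^p}\,dx,$$
is not defined. However, for every $0 < t < 1$, the Riemann integral,
$$\int_t^1 \frac{1}{x^p}\,dx,$$
is defined and equals,
$$\frac{1 - t^{1-p}}{1 - p}.$$
Since $0 < p < 1$, the limit,
$$\lim_{t \to 0^+} t^{1-p},$$
exists and equals 0. Therefore the improper integral $\int_0^1 x^{-p}\,dx$ exists and equals $1/(1 - p)$.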
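The same closed form shows the contrast between $p < 1$ and $p \geq 1$ near $0$. In this sketch (our illustration; the $p = 1$ case uses $\int_t^1 dx/x = -\ln(t)$, a standard fact not derived in this passage), the integrals over $[t, 1]$ settle down as $t \to 0^+$ only when $p < 1$.

```python
import math

def integral_t_to_1(p, t):
    """Integral of x^(-p) over [t, 1] for 0 < t < 1."""
    if p == 1:
        return -math.log(t)
    return (1 - t ** (1 - p)) / (1 - p)

for p in (0.5, 0.9, 1.0, 2.0):
    row = [integral_t_to_1(p, t) for t in (1e-2, 1e-4, 1e-8)]
    print(p, ["%.4f" % v for v in row])
```

For $p = 0.5$ the values approach $1/(1 - p) = 2$, while for $p = 1$ and $p = 2$ they grow without bound.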
4. The Comparison Test. When is an improper integral defined? This is equivalent to asking
when a limit is defined. Therefore, every rule for convergence of a limit gives a rule for convergence
of an improper integral. There are 2 basic rules for convergence of a limit.
The squeezing lemma. If $F(x) \leq G(x) \leq H(x)$ on an interval, if $\lim_{x \to a} F(x)$ and $\lim_{x \to a} H(x)$ exist, and if $\lim_{x \to a} F(x)$ equals $\lim_{x \to a} H(x)$, then $\lim_{x \to a} G(x)$ exists and equals the other 2 limits.
Monotone bounded limits. If $F(x)$ is monotone increasing and bounded above on $[a, b)$, then $\lim_{x \to b^-} F(x)$ exists. Similarly, if $F(x)$ is monotone decreasing and bounded below, then $\lim_{x \to b^-} F(x)$ exists; if $F(x)$ is monotone increasing and bounded below, then $\lim_{x \to a^+} F(x)$ exists; and if $F(x)$ is monotone decreasing and bounded above, then $\lim_{x \to a^+} F(x)$ exists.
These give the following tests for convergence of an improper integral.
Squeezing lemma. If $f(x) \leq g(x) \leq h(x)$ on the interval $[a, \infty)$, and if the improper integrals,
$$\int_a^\infty f(x)\,dx \quad \text{and} \quad \int_a^\infty h(x)\,dx,$$
$$\int_a^\infty g(x)\,dx,$$
converges, then
$$\int_a^\infty f(x)\,dx,$$
converges. Contrapositively, if $\int_a^\infty f(x)\,dx$ diverges, then $\int_a^\infty g(x)\,dx$ diverges.
Lecture 30. December 6, 2005
Practice Problems. Course Reader: 6C2.
1. Sequences. By definition, a sequence of real numbers is a rule assigning to each counting number $n$ an associated real number $a_n$. The integer $n$ is called the index of the sequence. Usually the index begins with $n = 1$, but occasionally it begins with another integer (sometimes 0). Sequences are often specified by giving the first few values, and letting the reader infer the rule, e.g.,
$$a_1 = \frac{1}{1}, \quad a_2 = \frac{1}{2}, \quad a_3 = \frac{1}{3}, \quad \ldots$$
It is always better to give a precise definition of each sequence, e.g.,
$$a_n = \frac{1}{n}, \qquad n = 1, 2, \ldots$$
The most common notation for a sequence is $(a_n)_{n \geq 1}$.
A sequence $(a_n)_{n \geq 1}$ converges to a limit $L$ if the sequence becomes arbitrarily close to $L$, and stays arbitrarily close to $L$. More precisely, the sequence converges to $L$ if for every positive number $\epsilon$, there exists an integer $N$ (depending on the sequence and $\epsilon$) such that for every integer $n \geq N$,
$$|a_n - L| < \epsilon.$$
In other words, the terms $a_N, a_{N+1}, a_{N+2}, \ldots$ in the tail of the sequence all lie in the interval $(L - \epsilon, L + \epsilon)$. A sequence cannot have more than 1 limit: given 2 distinct potential limits $L_1$ and $L_2$, simply take $\epsilon = |L_1 - L_2|/2$ in the definition above; no term can be within $\epsilon$ of both. A sequence which has a limit is said to converge, and the limit is denoted by,
$$L = \lim_{n \to \infty} a_n.$$
(ii) The sequence $a_n = n$ diverges. In a precise sense, this sequence “diverges to $\infty$”.
(iii) The sequence $a_n = (-1)^n$ diverges, even though it is bounded (it never gets bigger than 1 or smaller than $-1$).
(iv) Let $r$ be a real number. The sequence $a_n = r^n$, $n = 0, 1, 2, \ldots$ converges to 0 if $|r| < 1$ and diverges if $|r| > 1$. There are 2 remaining cases. If $r = -1$, then $a_n = (-1)^n$ diverges. If $r = 1$, then $a_n = 1$ converges to 1.
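For a convergent sequence, the definition asks for an $N$ for each $\epsilon$. This sketch (an illustration with hypothetical helper names) finds the smallest such $N$ for $a_n = 1/n$ and for $a_n = r^n$ with $|r| < 1$, both converging to $L = 0$. It uses exact fractions so that comparisons with $\epsilon$ are not affected by rounding. In both cases $|a_n - 0|$ is nonincreasing in $n$, so once one term is within $\epsilon$, every later term is too.

```python
import math
from fractions import Fraction

def smallest_N_reciprocal(eps):
    """Smallest N with 1/N < eps; then |1/n - 0| < eps for all n >= N."""
    return math.floor(1 / eps) + 1

def smallest_N_geometric(r, eps):
    """Smallest N >= 0 with |r|^N < eps, for |r| < 1 (r and eps given as Fractions)."""
    if abs(r) >= 1:
        raise ValueError("need |r| < 1")
    N, term = 0, Fraction(1)
    while term >= eps:
        term *= abs(r)
        N += 1
    return N

eps = Fraction(1, 100)
print(smallest_N_reciprocal(eps), smallest_N_geometric(Fraction(-1, 2), eps))
```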
2. Tests for convergence/divergence. One useful test for convergence is the Squeezing Lemma.
The squeezing lemma. Let $(a_n)_{n \geq 1}$, $(b_n)_{n \geq 1}$ and $(c_n)_{n \geq 1}$ be sequences such that for every index $n$,
$$a_n \leq b_n \leq c_n.$$
In other words, the sequence $(b_n)$ is “squeezed” between the sequences $(a_n)$ and $(c_n)$. If $(a_n)$ and $(c_n)$ converge, and if,
$$\lim_{n \to \infty} a_n = \lim_{n \to \infty} c_n,$$
then also $(b_n)$ converges and its limit equals the limit of the other 2 sequences.
Another test for convergence is the Monotone Convergence Test. A sequence $(a_n)_{n \geq 1}$ is called nondecreasing if for every index $n$, $a_{n+1} \geq a_n$. Similarly, a sequence $(a_n)$ is nonincreasing if for every index $n$, $a_{n+1} \leq a_n$. A sequence which is either nondecreasing or nonincreasing is called monotone. A sequence $(a_n)$ is bounded above if there exists a real number $u$ such that for every index $n$, $a_n \leq u$. The number $u$ is an upper bound for the sequence. A sequence $(a_n)$ is bounded below if there exists a real number $l$ such that for every index $n$, $a_n \geq l$. The number $l$ is a lower bound for the sequence.
The second is the sequence of partial absolute sums, $(B_n)_{n\ge 1}$, defined by
$$B_n = |a_1| + |a_2| + \cdots + |a_n| = \sum_{k=1}^{n} |a_k|.$$
If the sequence of partial sums $(b_n)_{n\ge 1}$ converges, the limit is called the series of $(a_n)_{n\ge 1}$, and is denoted by
$$\sum_{k=1}^{\infty} a_k := \lim_{n\to\infty} b_n = \lim_{n\to\infty} \sum_{k=1}^{n} a_k.$$
In this case, the series $\sum_k a_k$ is said to converge. If the sequence of partial absolute sums $(B_n)_{n\ge 1}$ converges, the series $\sum_k a_k$ is said to converge absolutely. Although it is not obvious, if the series converges absolutely, then the series converges (this is a basic theorem from course 18.100). If a series converges but does not converge absolutely, the series is sometimes said to converge conditionally.
Examples.
The harmonic sequence is the sequence $a_n = 1/n$. As will be shown soon, the harmonic series $\sum_n 1/n$ diverges to $\infty$. The alternating harmonic sequence is
$$a_n = \frac{(-1)^n}{n}.$$
The alternating harmonic series,
$$\sum_{n=1}^{\infty} \frac{(-1)^n}{n},$$
does converge. This will also be shown soon. Since the sequence of partial absolute sums for the alternating sequence equals the sequence of partial sums for the harmonic sequence, the alternating harmonic series does not converge absolutely. It only converges conditionally.
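A short numerical check, in Python, of the claims above: the partial sums $b_n$ of the alternating harmonic series settle down, while the partial absolute sums $B_n$ (the harmonic partial sums) keep growing. The comparison value $-\ln 2$ is a standard fact about this series that the lecture does not prove; the function name is hypothetical.

```python
import math


def alternating_harmonic_sums(n_terms: int) -> tuple[float, float]:
    """Return (b_n, B_n) for a_k = (-1)**k / k, k = 1, ..., n_terms."""
    b = 0.0  # partial sum of the alternating harmonic series
    B = 0.0  # partial absolute sum, i.e. the harmonic partial sum
    for k in range(1, n_terms + 1):
        term = (-1) ** k / k
        b += term
        B += abs(term)
    return b, B


for n in (10, 1_000, 100_000):
    b, B = alternating_harmonic_sums(n)
    print(f"n = {n:>7}   b_n = {b:+.6f}   B_n = {B:.3f}   -ln 2 = {-math.log(2):+.6f}")
```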
As counterintuitive as this might sound, the terms in the alternating harmonic series can be rearranged so that the sum converges to any real number you like! This sounds ridiculous: finite sums are independent of the order in which the summands are added, so how could this fail for infinite sums?
The answer is quite simple. Because the harmonic series $\sum_n 1/n$ diverges, the same is true for $\sum_n 1/(2n)$. Thus, add up a very large number of only the (positive) even terms in the alternating harmonic series to make the partial sum bigger than, say, $10^6$. Now add only the first odd term, $-1$. This has a negligible effect. Now add a large number of the remaining even terms to make the partial sum bigger than $10^7$. Now add one more odd term, $-1/3$. Continuing in this way, eventually every term in the sequence contributes to one of the partial sums. But because we add positive terms with a much higher frequency than negative terms, the sequence of partial sums diverges to $+\infty$. Similarly, we could add negative terms with a very high frequency and make the partial sums diverge to $-\infty$. Now it is not so surprising that by adding the terms in a careful order, we can make the partial sums converge to any value we like.
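The argument above is the idea behind what is usually called the Riemann rearrangement theorem. The Python sketch below uses a greedy ordering of our own choosing (the function name is hypothetical): it adds the next unused positive term while the running sum is at most a target $T$, and the next unused negative term otherwise. Since the positive and the negative parts each diverge, every term is eventually used, and since the terms tend to $0$, the overshoots shrink, so the partial sums approach $T$.

```python
import math


def rearranged_partial_sum(target: float, n_terms: int) -> float:
    """Add n_terms terms of the alternating harmonic series sum (-1)^n / n, in a greedy order:
    the next unused positive term 1/2, 1/4, 1/6, ... while the running sum is <= target,
    otherwise the next unused negative term -1, -1/3, -1/5, ..."""
    next_even, next_odd = 2, 1
    s = 0.0
    for _ in range(n_terms):
        if s <= target:
            s += 1.0 / next_even
            next_even += 2
        else:
            s -= 1.0 / next_odd
            next_odd += 2
    return s


for target in (math.pi, 0.0, -1.0):
    print(target, rearranged_partial_sum(target, 200_000))
```

Convergence toward a large target is slow, because very few negative terms get used and the last one subtracted is still fairly large.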
The pathology of the preceding paragraph occurs with any conditionally convergent series. It is a very important fact that every absolutely convergent series has the same sum no matter the order in which its terms are added. For this reason, absolutely convergent series are much more useful than conditionally convergent series.
4. Test for convergence/divergence of series. If a series $\sum_n a_n$ converges, then the sequence $(a_n)$ converges to $0$. To see this, denote by $L$ the limit of the sequence of partial sums $(b_n)$. For every positive real number $\varepsilon$, using $\varepsilon/2$ in the definition of convergence of $(b_n)$, there exists an integer $N$ such that for every $n \ge N$, $|b_n - L| < \varepsilon/2$. But then for $n \ge N + 1$,
$$|a_n| = |b_n - b_{n-1}| = |(b_n - L) - (b_{n-1} - L)| \le |b_n - L| + |b_{n-1} - L| < \varepsilon/2 + \varepsilon/2 = \varepsilon.$$
Thus the sequence $(a_n)$ converges to $0$. Contrapositively, if the sequence $(a_n)$ does not converge to $0$, then the series $\sum_n a_n$ diverges. This is the most basic test for divergence of a series. For example, it immediately follows that the series $\sum_{n=1}^{\infty} (-1)^n$ diverges (arguing the opposite is a favorite pastime of “mathematical cranks”).
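The most basic test for absolute convergence of a series follows from the monotone convergence test. The sequence of partial absolute sums,
$$B_n = \sum_{k=1}^{n} |a_k|,$$
is nondecreasing, since each added term $|a_k|$ is nonnegative, so it converges if and only if it is bounded above.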
A number of common convergence tests in calculus textbooks amount to nothing more than combining the comparison test with an analysis of the geometric series. Let $r$ be a real number and let $(a_n)_{n\ge 0}$ be the geometric sequence,
$$a_n = r^n, \qquad n \ge 0,$$
(by convention, if $r = 0$, the first term $a_0$ is defined to be $1$). By high school algebra, if $r \neq 1$, the partial sums are
$$b_n = 1 + r + \cdots + r^n = \frac{1 - r^{n+1}}{1 - r} = \frac{1}{1-r} - \frac{1}{1-r}\, r^{n+1}.$$
Observe this sequence depends on $n$ only in the last term $r^{n+1}$, which is essentially the geometric sequence. Assuming $r \neq 1$, the geometric sequence $r^{n+1}$ converges if and only if $|r| < 1$. In this case, the sequence of partial absolute sums,
$$B_n = 1 + |r| + |r|^2 + \cdots + |r|^n = \frac{1}{1 - |r|} - \frac{1}{1 - |r|}\, |r|^{n+1},$$
also converges. Thus, the geometric series $\sum_{n=0}^{\infty} r^n$ converges absolutely, with sum $1/(1 - r)$, if $|r| < 1$, and diverges if $|r| > 1$ or $r = -1$. The only remaining case is when $r = 1$. Then the partial sums are $b_n = n + 1$, which diverge to $\infty$. Altogether, the series $\sum_{n=0}^{\infty} r^n$ converges to $1/(1 - r)$ if $|r| < 1$, and diverges otherwise.
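The closed form for $b_n$ can be checked against direct summation in a few lines of Python (helper names are our own). For $|r| < 1$ both approach $1/(1 - r)$; for $r = 1.1$ both grow without bound.

```python
def geometric_partial_sum(r: float, n: int) -> float:
    """b_n = 1 + r + r**2 + ... + r**n, added term by term (a_0 = 1, even when r = 0)."""
    total, power = 0.0, 1.0
    for _ in range(n + 1):
        total += power
        power *= r
    return total


def geometric_closed_form(r: float, n: int) -> float:
    """b_n = (1 - r**(n + 1)) / (1 - r), valid only for r != 1."""
    return (1 - r ** (n + 1)) / (1 - r)


for r in (0.5, -0.9, 1.1):
    print(r, geometric_partial_sum(r, 60), geometric_closed_form(r, 60), 1 / (1 - r))
```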
The ratio test. There are two tests that allow us to compare a given sequence $(a_n)_{n\ge 1}$ to a geometric sequence $(r^n)_{n\ge 1}$. If the following limit,
$$\lim_{n\to\infty} \left| \frac{a_{n+1}}{a_n} \right|,$$
exists, call it $r$. Then the sequence $(a_n)_{n\ge 1}$ can be compared to a sequence $(C r^n)_{n\ge 1}$ for some choice of $C$. This leads to the ratio test: The series $\sum_{n=1}^{\infty} a_n$ converges absolutely if the sequence $|a_{n+1}/a_n|$ converges to a real number $r < 1$ and diverges if the sequence $|a_{n+1}/a_n|$ converges to a real number $r > 1$ (in which case, the sequence $(a_n)_{n\ge 1}$ does not converge to $0$). There is no information if the sequence converges to $1$ or diverges.
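Tabulating the ratios numerically only suggests the value of the limit; it does not prove it. With that caveat, here is a small Python sketch (the function name is our own) applied to the example $a_n = n/2^n$, which is not from the lecture. Exactly, $|a_{n+1}/a_n| = (n+1)/(2n) \to 1/2 < 1$, so the ratio test gives absolute convergence.

```python
def ratio_sequence(a, n_values):
    """Return |a(n + 1) / a(n)| for each n in n_values, where a(n) computes a_n != 0."""
    return [abs(a(n + 1) / a(n)) for n in n_values]


# a_n = n / 2**n: the ratios (n + 1) / (2n) approach 1/2 from above.
print(ratio_sequence(lambda n: n / 2 ** n, [10, 100, 1000]))
```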
Similarly, if the following limit,
$$\lim_{n\to\infty} \sqrt[n]{|a_n|},$$
exists, call it $r$. Then the sequence $(a_n)_{n\ge 1}$ can be compared to a sequence $(C r^n)_{n\ge 1}$. This leads to the root test: The series $\sum_{n=1}^{\infty} a_n$ converges absolutely if the sequence $\sqrt[n]{|a_n|}$ converges to a real number $r < 1$ and diverges if the sequence $\sqrt[n]{|a_n|}$ converges to a real number $r > 1$. There is no information if the sequence converges to $1$.
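The same kind of numerical look works for the root test. In this Python sketch (function name our own, example not from the lecture), $a_n = \left(n/(2n+1)\right)^n$, so exactly $\sqrt[n]{|a_n|} = n/(2n+1) \to 1/2 < 1$ and the series converges absolutely. Again, the numbers only hint at the limit.

```python
def nth_root_sequence(a, n_values):
    """Return |a(n)| ** (1 / n) for each n in n_values."""
    return [abs(a(n)) ** (1.0 / n) for n in n_values]


# a_n = (n / (2n + 1))**n: the n-th roots n / (2n + 1) approach 1/2 from below.
print(nth_root_sequence(lambda n: (n / (2 * n + 1)) ** n, [10, 100, 500]))
```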
diverges, then the series $\sum_{n=1}^{\infty} a_n$ does not converge absolutely. For both directions, define the sequence $(c_n)$ by
$$c_n = \int_n^{n+1} f(x)\,dx, \quad \text{or} \quad c_n = \int_n^{n+1} g(x)\,dx.$$
The partial sum of the series $\sum_{k} c_k$ is simply
$$\sum_{k=1}^{n} c_k = \int_1^{n+1} f(x)\,dx, \quad \text{or} \quad \int_1^{n+1} g(x)\,dx.$$
As $n$ tends to $\infty$, the natural logarithm $\ln(n)$ also tends to $\infty$ (although very slowly: $\ln(n)$ does not get bigger than a fixed real number $R$ until $n$ gets bigger than the much larger number $e^R$). Therefore the partial sums diverge. By the comparison test, the harmonic series also diverges (very slowly).
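A quick numerical look, in Python, at how slowly the harmonic partial sums $H_n = \sum_{k=1}^{n} 1/k$ grow. The integral comparison gives $H_n \ge \ln(n+1)$; the matching upper bound $H_n \le 1 + \ln n$, printed alongside, comes from the same kind of comparison with $\int_1^n dx/x$ (a standard fact, not stated in the lecture). The function name is our own.

```python
import math


def harmonic(n: int) -> float:
    """H_n = 1 + 1/2 + ... + 1/n."""
    return sum(1.0 / k for k in range(1, n + 1))


for n in (10, 10**3, 10**6):
    # lower bound from the integral comparison, H_n itself, and the upper bound 1 + ln(n)
    print(n, math.log(n + 1), harmonic(n), 1 + math.log(n))
```

Even after a million terms the partial sum is only about 14.4.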
Example. 2. The Riemann zeta function. Let $s > 1$ be a real number. Define the sequence $(a_n)_{n\ge 1}$ by
$$a_n = \frac{1}{n^s}.$$
The series $\sum_{n=1}^{\infty} 1/n^s$ equals $1 + \sum_{n=2}^{\infty} 1/n^s$, which is the same as
$$1 + \sum_{n=1}^{\infty} \frac{1}{(n+1)^s}.$$
Let $f(x)$ be the function $f(x) = 1/x^s$, and let $c_n = \int_n^{n+1} f(x)\,dx$. Then for each integer $n$, $f(x) \ge 1/(n+1)^s$ for every $x$ in $[n, n+1]$, so $c_n \ge 1/(n+1)^s$. The partial sums of $(c_n)$ are
$$\sum_{k=1}^{n-1} c_k = \int_1^n \frac{1}{x^s}\,dx = \left[\frac{1}{1-s}\,\frac{1}{x^{s-1}}\right]_1^n = \frac{1}{s-1} - \frac{1}{s-1}\,\frac{1}{n^{s-1}}.$$
Because $s$ is bigger than $1$, as $n$ tends to $\infty$, also $n^{s-1}$ tends to $\infty$. Therefore the partial sums tend to $1/(s-1)$. Therefore, by the comparison test, $\sum_{n=1}^{\infty} 1/(n+1)^s$ converges to a value at most $1/(s-1)$, and so the series
$$\sum_{n=1}^{\infty} \frac{1}{n^s}$$
converges absolutely to a value at most $1 + 1/(s-1) = s/(s-1)$. The value of this limit is called the Riemann zeta function at $s$, denoted
$$\zeta(s) := \sum_{n=1}^{\infty} \frac{1}{n^s}.$$
This function is of fundamental importance in number theory. It also pops up in Fourier series and statistical mechanics. The values of $\zeta(s)$ when $s$ is a positive even integer are known. The first couple are $\zeta(2) = \pi^2/6$ and $\zeta(4) = \pi^4/90$. There are very fundamental open problems about the Riemann
zeta function. For one of these problems in particular, the Clay Mathematics Institute has offered
a $1 million prize for an accepted, refereed solution.
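A Python sketch (function name our own) that sums the first terms of the series for $\zeta(s)$ and prints them next to the values $\zeta(2) = \pi^2/6$ and $\zeta(4) = \pi^4/90$ quoted above, and next to the bound $1 + 1/(s-1) = s/(s-1)$ that the integral comparison gives (the $1$ accounts for the first term $1/1^s$). With $100{,}000$ terms the neglected tail of $\zeta(2)$ is roughly $10^{-5}$.

```python
import math


def zeta_partial(s: float, n_terms: int) -> float:
    """Partial sum 1 + 1/2**s + ... + 1/n_terms**s of the series for zeta(s), s > 1."""
    return sum(1.0 / n ** s for n in range(1, n_terms + 1))


for s, exact in ((2.0, math.pi ** 2 / 6), (4.0, math.pi ** 4 / 90)):
    approx = zeta_partial(s, 100_000)
    print(s, approx, exact, s / (s - 1))   # partial sum, known value, upper bound
```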
Lecture 31. December 8, 2005
Practice Problems. Course Reader: 7B4, 7B6, 7C1, 7C5, 7D1, 7D2.
1. Power series. Given a real number $a$ and a sequence of real numbers $(c_n)_{n\ge 0}$, there is an associated expression, called a power series about $x = a$,
$$\sum_{n=0}^{\infty} c_n (x - a)^n = c_0 + c_1 (x - a) + c_2 (x - a)^2 + \cdots$$
For every choice of a real number $x$, the power series gives an ordinary series of real numbers. In particular, for the choice $x = a$, every term except possibly the first is $0$, so the series converges to $c_0$.
Question. Given a power series, for which real numbers x does the corresponding series absolutely
converge?
Examples. 1. Consider the power series
$$0 + 1^1 x^1 + 2^2 x^2 + 3^3 x^3 + \cdots = \sum_{n=1}^{\infty} n^n x^n.$$
Of course this converges to $0$ for $x = 0$. But for any $x$ other than $0$, the sequence $n^n x^n = (nx)^n$ diverges: once $n > 1/|x|$, we have $|nx| > 1$, so the terms do not converge to $0$. Therefore, by the test for divergence, the series does not converge. In other words, the series converges only for $x = 0$.
2. Consider the power series
$$1 + x + x^2 + \cdots = \sum_{n=0}^{\infty} x^n.$$
This is a geometric series. From the last lecture, the series converges absolutely for $|x| < 1$ and diverges if $|x| \ge 1$.
3. Consider the power series
$$1 + x + x^2/2 + x^3/3! + \cdots = \sum_{n=0}^{\infty} \frac{1}{n!} x^n.$$
The ratio of the $(n+1)$st term to the $n$th term in the series is
$$\frac{x^{n+1}/(n+1)!}{x^n/n!} = \frac{x}{n+1}.$$
For fixed $x \neq 0$, as $n$ grows, the absolute value $|x|/(n+1)$ of this ratio converges to $0$, which is less than $1$. Therefore, by the ratio test, for every choice of $x$ the series converges absolutely (for $x = 0$ this is immediate).
These 3 examples illustrate the whole range of possibilities.
Theorem. Let $\sum_{n=0}^{\infty} c_n (x - a)^n$ be a power series about $x = a$. Exactly one of the following holds.
(i) For every $x$ different from $a$, the series does not converge absolutely.
(ii) There exists a positive real number $R$ such that the series converges absolutely if $|x - a| < R$ and does not converge absolutely if $|x - a| > R$.
(iii) For every real number $x$, the series converges absolutely.
The real number $R$ occurring in Case (ii) is called the radius of convergence. By convention, in Case (i) the radius of convergence is defined to be $R = 0$. By convention, in Case (iii) the radius of convergence is defined to be $R = \infty$. This allows us to replace the original question by a more precise question.
Question. Given a power series, what is the radius of convergence?
Although there is no single answer to this question, in many interesting cases the ratio or root test
gives an answer.
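For a power series with nonzero coefficients, applying the ratio test to the terms $c_n (x-a)^n$ shows that if $|c_{n+1}/c_n|$ converges to $L$, then $R = 1/L$ (with $R = \infty$ when $L = 0$, and $R = 0$ when the ratios grow without bound). A rough Python sketch of this, applied to the three examples above; inspecting a single index is only a heuristic, and the name `radius_estimate` is our own.

```python
import math


def radius_estimate(c, n: int) -> float:
    """Heuristic estimate |c_n / c_{n+1}| of the radius of convergence at one large index n.

    If |c_{n+1} / c_n| -> L, the ratio test gives R = 1 / L (R = infinity when L = 0)."""
    return abs(c(n) / c(n + 1))


print(radius_estimate(lambda n: float(n) ** n, 50))            # Example 1: small, and -> 0 as n grows
print(radius_estimate(lambda n: 1.0, 50))                      # Example 2: exactly 1
print(radius_estimate(lambda n: 1 / math.factorial(n), 50))    # Example 3: about n + 1 = 51, -> infinity
```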
2. Analytic functions. If the radius of convergence $R$ of a power series $\sum c_n (x - a)^n$ is positive, then the power series defines a function on the interval $(a - R, a + R)$,
$$f(x) = \sum_{n=0}^{\infty} c_n (x - a)^n.$$
A function defined in this manner is called an analytic function. This is the real significance of
power series: they give important examples of functions that cannot be described in a more direct
manner. Analytic functions have nice analytic properties (whence the name). For instance, it is a
theorem (proved in 18.100) that an analytic function f (x) is differentiable and the derivative has a
power series converging absolutely with the same radius R,
$$f'(x) = \sum_{n=0}^{\infty} c_n n (x - a)^{n-1} = \sum_{m=0}^{\infty} (m + 1) c_{m+1} (x - a)^m.$$
We can iterate the theorem, i.e., $f'(x)$ is differentiable and $f''(x)$ has a power series converging absolutely with radius $R$. Iterating $k$ times, the function $f(x)$ is $k$ times differentiable and its $k$th derivative has a power series,
$$f^{(k)}(x) = \sum_{n=0}^{\infty} \frac{(n + k)!}{n!} c_{n+k} (x - a)^n.$$
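The two coefficient formulas can be compared on a truncated list of coefficients, as in this Python sketch. Truncation is an assumption made for illustration (a genuine power series has infinitely many coefficients), and the helper names are hypothetical. Differentiating three times with the rule $(m+1)c_{m+1}$ agrees with the closed form $\frac{(n+k)!}{n!}c_{n+k}$ for $k = 3$.

```python
from math import factorial


def derivative_coeffs(c: list[float]) -> list[float]:
    """Coefficients of f' from those of f: the m-th one is (m + 1) * c[m + 1]."""
    return [(m + 1) * c[m + 1] for m in range(len(c) - 1)]


def kth_derivative_coeffs(c: list[float], k: int) -> list[float]:
    """Closed form: the n-th coefficient of f^(k) is (n + k)! / n! * c[n + k]."""
    return [factorial(n + k) // factorial(n) * c[n + k] for n in range(len(c) - k)]


c = [1.0] * 10            # truncated coefficients of 1 + x + x^2 + ... about a = 0
d = c
for _ in range(3):        # differentiate term by term three times
    d = derivative_coeffs(d)
print(d[:4], kth_derivative_coeffs(c, 3)[:4])   # both [6.0, 24.0, 60.0, 120.0]
```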
In particular, every derivative of f (x) is defined. A function with this property is called infinitely
differentiable or smooth. Thus, every analytic function is infinitely differentiable.
This is only 1 of many useful properties of analytic functions. Which functions f (x) are analytic
functions? By the last paragraph, if f (x) is analytic, then it is infinitely differentiable. Are there
other restrictions? Can more than 1 power series about x = a give rise to the same analytic
function?
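To answer both of these questions, consider the analytic function defined by a power series,
$$f(x) = \sum_{n=0}^{\infty} c_n (x - a)^n.$$
Evaluating at $x = a$,
$$f(a) = c_0 + c_1 (a - a) + c_2 (a - a)^2 + \cdots = c_0 + 0 + 0 + \cdots = c_0,$$
so $c_0 = f(a)$. Similarly, evaluating the power series for $f^{(k)}(x)$ at $x = a$, every term except the $n = 0$ term vanishes, so $f^{(k)}(a) = k!\, c_k$, i.e.,
$$c_k = f^{(k)}(a)/k!.$$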
In particular, this series is unique. This answers the second question. Two absolutely convergent
power series about x = a give the same analytic function if and only if the power series are
themselves equal (i.e., the corresponding coefficients of the 2 series are equal).
Moreover, this gives us a lot of information about the first question. For an infinitely differentiable function $f(x)$ defined at a point $x = a$, there is a very important power series, the Taylor series expansion of $f(x)$ about $x = a$,
$$\sum_{n=0}^{\infty} \frac{f^{(n)}(a)}{n!} (x - a)^n.$$
If f (x) is analytic, then the Taylor series converges absolutely to f (x). This reduces the original
question to 2 new questions. Does the Taylor series have a positive radius of convergence? If so,
does the analytic function defined in this way equal the original function f (x)?
The radius of convergence question is precisely the radius of convergence question posed earlier. As there, the answer can often be found by using the ratio or root tests. The answer to the second question is yes in every practical case. There are examples of infinitely differentiable functions where the Taylor series has a positive radius of convergence, but does not converge to the original function. However, every example is somewhat contrived; they rarely come up “in nature”. Just for completeness, here is an example of one of these pathological functions,
$$f(x) = \begin{cases} e^{-1/x^2}, & x \neq 0, \\ 0, & x = 0. \end{cases}$$
Every derivative of this function at $x = 0$ equals $0$, so its Taylor series about $x = 0$ is the zero series: it converges for every $x$, but not to $f(x)$ when $x \neq 0$.
3. Algorithm for computing Taylor series. The method for finding the Taylor series of a function is always the same. For definiteness, consider the Taylor series expansion of $f(x) = (1 - x)^{-1}$ about the point $x = 0$.
Step 1. Compute all derivatives of $f(x)$. If this sounds like a lot of work, it is! In most examples, this really comes down to finding an inductive formula for the derivatives of $f(x)$. In the example, the “zeroth derivative” is
$$f(x) = (1 - x)^{-1}.$$
Computing the next two derivatives, $f'(x) = (1 - x)^{-2}$ and $f''(x) = 2(1 - x)^{-3}$, suggests the guess
$$f^{(k)}(x) = b_k (1 - x)^{-k-1}$$
for some real number $b_k$. Having made this guess, it is easy to verify by induction. By computation, the result is true for $k = 0, 1$ and $2$ with the corresponding real numbers $b_0 = 1$, $b_1 = 1$ and $b_2 = 2$. By way of induction, assume the result is true for $k = n$, i.e., $f^{(n)}(x) = b_n (1 - x)^{-n-1}$. Then
$$f^{(n+1)}(x) = \left(f^{(n)}(x)\right)' = \left(b_n (1 - x)^{-n-1}\right)' = b_n (-n - 1)(1 - x)^{-n-2}(-1) = (n + 1) b_n (1 - x)^{-n-2}.$$
Thus the result is also true for $k = n + 1$, where $b_{n+1}$ satisfies the equation
$$b_{n+1} = (n + 1) b_n.$$
The solution of this recursion, starting from the value $1$ at $n = 0$, has come up before in this class. It is the $n$th factorial number, $n!$. This gives the final formula for the $n$th derivative of $f(x)$,
$$f^{(n)}(x) = n!\,(1 - x)^{-n-1}.$$
Step 2. Substitute $x = a$ into the derivatives. Compared to the work of finding the derivatives, this is very simple. In the example, plugging in $x = 0$ gives
$$f^{(n)}(0) = n!.$$
Step 3. Compute the coefficients of the Taylor series. By definition, the $n$th coefficient of the Taylor series is
$$c_n = \frac{f^{(n)}(a)}{n!}.$$
In the example, this gives the coefficient
$$c_n = \frac{n!}{n!} = 1,$$
for every integer $n \ge 0$.
Step 4. Write the Taylor series. This is really getting into the “mind-numbing details”. In the example, the Taylor series expansion for $(1 - x)^{-1}$ about $x = 0$ is
$$(1 - x)^{-1} = \sum_{n=0}^{\infty} x^n.$$
Step 5. If possible, find the radius of convergence. In the example, the Taylor series is simply the geometric series. By the previous lecture, the geometric series converges absolutely with radius $R = 1$. Moreover, it converges absolutely to $(1 - x)^{-1}$. Notice, this gives another explanation for the radius $R = 1$. Since $(1 - x)^{-1}$ has a vertical asymptote at $x = 1$, the Taylor series cannot converge on any interval that contains $x = 1$. The largest interval centered at $x = 0$ not containing $x = 1$ is the interval $(-1, 1)$. This interval has radius $R = 1$.
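The steps above can be mirrored in a short Python sketch using exact rational arithmetic (`fractions.Fraction`). It hard-codes the derivative formula found in Step 1, $f^{(n)}(x) = n!\,(1-x)^{-n-1}$, so it applies only to this example; the helper names are hypothetical.

```python
from fractions import Fraction
from math import factorial


def nth_derivative_at(n: int, a: Fraction) -> Fraction:
    """Steps 1-2: f^(n)(x) = n! (1 - x)^(-n-1) for f(x) = 1/(1 - x), evaluated at x = a != 1."""
    return factorial(n) / (1 - a) ** (n + 1)


def taylor_coeff(n: int, a: Fraction) -> Fraction:
    """Step 3: c_n = f^(n)(a) / n!."""
    return nth_derivative_at(n, a) / factorial(n)


def taylor_partial_sum(x: Fraction, a: Fraction, N: int) -> Fraction:
    """Step 4, truncated after degree N: sum of c_n (x - a)^n for n = 0, ..., N."""
    return sum((taylor_coeff(n, a) * (x - a) ** n for n in range(N + 1)), Fraction(0))


a, x = Fraction(0), Fraction(1, 2)
print([taylor_coeff(n, a) for n in range(5)])                   # every coefficient equals 1
print(float(taylor_partial_sum(x, a, 30)), float(1 / (1 - x)))  # close to 2.0, and exactly 2.0
```

Exact fractions avoid rounding, so the truncated sum at $x = 1/2$ is exactly $2 - 2^{-30}$.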
4. More examples. What is the Taylor series expansion for $(1 - x)^{-1}$ about a point $x = a$ different from $x = 1$? The fortunate fact is that Step 1 allows us to compute the derivatives $f^{(n)}(a)$ for any $a \neq 1$, not just $x = 0$. This is the typical case, and it is one justification for doing the work necessary in Step 1. In this case, the answer is
$$f^{(n)}(a) = n!\,(1 - a)^{-n-1}.$$
Therefore, according to Step 3, the $n$th coefficient in the Taylor series expansion is
$$c_n = \frac{n!\,(1 - a)^{-n-1}}{n!} = (1 - a)^{-n-1}.$$
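Thus, according to Step 4, the Taylor series expansion for $(1 - x)^{-1}$ about $x = a$ is
$$(1 - x)^{-1} = \sum_{n=0}^{\infty} (1 - a)^{-n-1} (x - a)^n.$$
What is the radius of convergence? The ratio of the $(n+1)$st and $n$th terms of the series is
$$\frac{(1 - a)^{-n-2} (x - a)^{n+1}}{(1 - a)^{-n-1} (x - a)^n} = (1 - a)^{-1} (x - a).$$
This is independent of $n$. Thus, this constant sequence converges to its constant value $(1 - a)^{-1}(x - a)$. By the ratio test, the series is absolutely convergent if this limit has absolute value less than $1$ and diverges if it is greater than $1$. When it equals $1$, every term has absolute value $|1 - a|^{-1}$, so the terms do not converge to $0$ and the series diverges. Hence the series converges absolutely if and only if
$$|(1 - a)^{-1}(x - a)| < 1.$$
Rearranging, the series converges if and only if
$$|x - a| < |1 - a|.$$
So the radius of convergence is $R = |1 - a|$, the distance from $a$ to the vertical asymptote at $x = 1$.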
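A Python sketch checking the radius $R = |1 - a|$ for an expansion about $a = 3$, a sample value of our choosing that gives $R = 2$ (helper names hypothetical). The ratio of consecutive terms is the constant $(x-a)/(1-a)$; the partial sums approach $1/(1-x)$ when $|x - 3| < 2$ and blow up when $|x - 3| > 2$.

```python
def term(n: int, x: float, a: float) -> float:
    """n-th term (1 - a)^(-n-1) (x - a)^n of the Taylor series of 1/(1 - x) about a."""
    return (x - a) ** n / (1 - a) ** (n + 1)


def partial_sum(x: float, a: float, N: int) -> float:
    """Sum of the terms n = 0, ..., N."""
    return sum(term(n, x, a) for n in range(N + 1))


a = 3.0                        # predicted radius |1 - a| = 2
for x in (2.0, 4.5, 0.5):      # |x - a| = 1, 1.5 (inside) and 2.5 (outside)
    ratio = term(11, x, a) / term(10, x, a)   # equals (x - a)/(1 - a) for every n
    print(x, ratio, partial_sum(x, a, 60), 1 / (1 - x))
```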
Example 2. Consider the function $f(x) = e^x$ about $x = 0$. Every derivative is
$$f^{(n)}(x) = e^x,$$
so the Taylor coefficients are
$$c_n = \frac{f^{(n)}(0)}{n!} = \frac{e^0}{n!} = 1/n!.$$
Observe this is the power series considered earlier in the lecture, whose radius of convergence is $R = \infty$. Therefore, for every $x$, the power series converges absolutely to $e^x$,
$$e^x = \sum_{n=0}^{\infty} x^n/n!.$$
This equation is sometimes taken as the definition of $e^x$. It has certain advantages over our original definition of $e^x$. Importantly, it is easy for a computer to determine $e^x$ to very high precision using this formula.
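A minimal Python sketch of this computation; the function name and the stopping rule are our own choices. Each term $x^n/n!$ is obtained from the previous one by multiplying by $x/n$, and summation stops once the terms are negligible relative to the running total. For large negative $x$ the alternating terms cancel and lose precision; computing $1/e^{-x}$ instead is a common workaround.

```python
import math


def exp_series(x: float, tol: float = 1e-17) -> float:
    """Sum x**n / n! until the terms become negligible (a sketch, not a production exp)."""
    total, term, n = 0.0, 1.0, 0
    while abs(term) > tol * max(1.0, abs(total)):
        total += term
        n += 1
        term *= x / n       # x**n / n! from x**(n-1) / (n-1)!
    return total


for x in (1.0, 5.0, -2.0):
    print(x, exp_series(x), math.exp(x))
```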
Example 3. Having computed the Taylor series expansion for $e^x$ about $x = 0$, the next question is to compute the Taylor series expansion for $e^x$ about $x = a$. According to the formula,
$$f^{(n)}(a) = e^a,$$
so the $n$th coefficient is $c_n = e^a/n!$. As above, the radius of convergence is $R = \infty$. Thus, for every real number $x$, the power series converges absolutely to $e^x$,
$$e^x = \sum_{n=0}^{\infty} e^a (x - a)^n/n!.$$
On the other hand, we didn’t need to do any extra work to see this. We could have used the formula
$$e^x = e^{a + (x - a)} = e^a e^{x - a}.$$
Plugging in $x - a$ for $x$ in the power series expansion for $e^x$ gives the power series expansion
$$e^{x - a} = \sum_{n=0}^{\infty} \frac{1}{n!} (x - a)^n.$$
Multiplying by $e^a$ gives the same series as above.
Example 4. Consider the function $f(x) = \sin(x)$. The derivatives of $f(x)$ are
$$f(x) = \sin(x), \qquad f'(x) = \cos(x), \qquad f''(x) = -\sin(x), \qquad f^{(3)}(x) = -\cos(x), \qquad f^{(n+4)}(x) = f^{(n)}(x).$$
Together, these give all the derivatives of $f(x)$. Write $n = 4l$, $4l + 1$, $4l + 2$ or $4l + 3$ for some nonnegative integer $l$. Then the rules above give
$$f^{(n)}(x) = \begin{cases} \sin(x), & n = 4l, \\ \cos(x), & n = 4l + 1, \\ -\sin(x), & n = 4l + 2, \\ -\cos(x), & n = 4l + 3, \end{cases}$$
and, evaluating at $x = 0$,
$$f^{(n)}(0) = \begin{cases} 0, & n = 4l, \\ 1, & n = 4l + 1, \\ 0, & n = 4l + 2, \\ -1, & n = 4l + 3. \end{cases}$$
Thus, all the even coefficients of the Taylor series are $0$. For an odd coefficient, say $n = 2m + 1$, the derivative is
$$f^{(2m+1)}(0) = (-1)^m.$$
Therefore, the coefficient is
$$c_{2m+1} = \frac{(-1)^m}{(2m + 1)!}.$$
Plugging this in gives the Taylor series expansion for $\sin(x)$ about $x = 0$,
$$\sum_{m=0}^{\infty} \frac{(-1)^m}{(2m + 1)!} x^{2m+1}.$$
The ratio of consecutive nonzero terms is
$$\frac{(-1)^{m+1} x^{2m+3}/(2m + 3)!}{(-1)^m x^{2m+1}/(2m + 1)!} = \frac{-x^2}{(2m + 2)(2m + 3)} = \frac{-x^2}{4m^2 + 10m + 6}.$$
This sequence converges to $0$. Therefore, by the ratio test, the power series converges absolutely to $\sin(x)$ for every choice of $x$,
$$\sin(x) = \sum_{m=0}^{\infty} \frac{(-1)^m}{(2m + 1)!} x^{2m+1}.$$
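Similarly, for $g(x) = \cos(x)$,
$$g^{(n)}(x) = \begin{cases} \cos(x), & n = 4l, \\ -\sin(x), & n = 4l + 1, \\ -\cos(x), & n = 4l + 2, \\ \sin(x), & n = 4l + 3, \end{cases} \qquad g^{(n)}(0) = \begin{cases} 1, & n = 4l, \\ 0, & n = 4l + 1, \\ -1, & n = 4l + 2, \\ 0, & n = 4l + 3. \end{cases}$$
Thus the odd coefficients vanish, $g^{(2m)}(0) = (-1)^m$, and
$$\cos(x) = \sum_{m=0}^{\infty} \frac{(-1)^m}{(2m)!} x^{2m}.$$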
Notice, we didn’t really need to do this work. Since $\cos(x)$ is the derivative of $\sin(x)$, the Taylor series for $\cos(x)$ is simply the term-by-term derivative of the Taylor series for $\sin(x)$. This gives the same formula as above.
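Partial sums of the two series can be compared against the library functions `math.sin` and `math.cos` in a short Python sketch; the helper names and the fixed number of terms are assumptions for illustration. Note that `cos_series` is exactly the term-by-term derivative of `sin_series`.

```python
import math


def sin_series(x: float, terms: int = 20) -> float:
    """Partial sum of sum_{m>=0} (-1)^m x^(2m+1) / (2m+1)!."""
    return sum((-1) ** m * x ** (2 * m + 1) / math.factorial(2 * m + 1) for m in range(terms))


def cos_series(x: float, terms: int = 20) -> float:
    """Partial sum of sum_{m>=0} (-1)^m x^(2m) / (2m)!, the term-by-term derivative of sin_series."""
    return sum((-1) ** m * x ** (2 * m) / math.factorial(2 * m) for m in range(terms))


for x in (0.5, 2.0, 6.0):
    print(x, sin_series(x), math.sin(x), cos_series(x), math.cos(x))
```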
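To compute the Taylor series expansions of $\sin(x)$ and $\cos(x)$ about a point $x = a$, we can follow the procedure above. However, it is much faster to use the angle addition formulas,
$$\sin(x) = \sin(a)\cos(x - a) + \cos(a)\sin(x - a), \qquad \cos(x) = \cos(a)\cos(x - a) - \sin(a)\sin(x - a),$$
and substitute $x - a$ for $x$ in the series above.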
polynomials, trigonometric functions, exponential functions and logarithms (the proof of this is far beyond the scope of this class). However, it is quite easy to write down a power series expansion for $f(x)$. First of all, the Taylor series for $e^{-t^2}$ about $t = 0$ is obtained by substituting $x = -t^2$ in the Taylor series for $e^x$ about $x = 0$,
$$e^{-t^2} = \sum_{n=0}^{\infty} (-1)^n t^{2n}/n!.$$
Because this series converges absolutely, the integral of the series is the series of the term-by-term integrals,
$$f(x) = \int_0^x e^{-t^2}\,dt = \int_0^x \sum_{n=0}^{\infty} \frac{(-1)^n}{n!} t^{2n}\,dt = \sum_{n=0}^{\infty} \frac{(-1)^n}{n!} \int_0^x t^{2n}\,dt.$$
Each of these integrals can be computed quite easily: $\int_0^x t^{2n}\,dt = x^{2n+1}/(2n+1)$. This gives
$$f(x) = \sum_{n=0}^{\infty} \frac{(-1)^n}{(2n + 1) \cdot n!} x^{2n+1}.$$
This is the Taylor series expansion for $f(x)$ about $x = 0$. For instance, using this series, it is easy to estimate
$$\int_0^1 e^{-t^2}\,dt \approx 0.747 \pm 10^{-3}.$$
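The estimate above can be reproduced with a short Python sketch. For comparison it uses the standard identity $\int_0^x e^{-t^2}\,dt = \frac{\sqrt{\pi}}{2}\operatorname{erf}(x)$ and Python's `math.erf`; the function name is our own. Since at $x = 1$ the series alternates with decreasing terms, stopping after the terms $n = 0, \ldots, 4$ leaves an error below the next term, $1/(11 \cdot 5!) \approx 7.6 \times 10^{-4}$.

```python
import math


def gauss_integral_series(x: float, terms: int = 30) -> float:
    """Partial sum of sum_{n>=0} (-1)^n x^(2n+1) / ((2n + 1) n!), the series for the integral of exp(-t^2) over [0, x]."""
    return sum((-1) ** n * x ** (2 * n + 1) / ((2 * n + 1) * math.factorial(n)) for n in range(terms))


reference = math.sqrt(math.pi) / 2 * math.erf(1.0)   # same integral via the error function
print(gauss_integral_series(1.0), gauss_integral_series(1.0, terms=5), reference)
```

With `terms=5` the sketch returns about 0.7475, already within $10^{-3}$ of the reference value.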
2. Taylor series with remainder term. As demonstrated by the computation just done, in reality only finitely many terms in a Taylor series are used. What can be said in this case? In other words, how quickly does the series converge? How large is the remainder after finitely many terms? To make all this precise, introduce the function $R_{N,a}(x)$ defined to be
$$R_{N,a}(x) = f(x) - \sum_{n=0}^{N} \frac{f^{(n)}(a)}{n!} (x - a)^n.$$
The precise version of the questions above is: what bounds exist for $R_{N,a}(x)$?
To understand the answer, consider the simplest case where $N = 0$. Then the remainder term is simply
$$R_{0,a}(x) = f(x) - f(a).$$
By the Mean Value Theorem, for every $x$ there exists a real number $c$ (depending on $x$) between $a$ and $x$ such that
$$R_{0,a}(x) = f'(c)(x - a).$$
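Iterating the Mean Value Theorem, for every integer $N$, for every $x$, there exists a real number $c$ (depending on both $N$ and $x$) between $a$ and $x$ such that
$$R_{N,a}(x) = \frac{f^{(N+1)}(c)}{(N + 1)!} (x - a)^{N+1}.$$
In particular, if we can bound the $(N+1)$st derivative of $f(x)$ on the interval between $a$ and $x$, say $|f^{(N+1)}(c)| \le M$, then we can bound $R_{N,a}(x)$:
$$|R_{N,a}(x)| \le \frac{M}{(N + 1)!} |x - a|^{N+1}.$$
Example. Bound the remainder in the Taylor series expansion for $e^x$ about $x = a$. The $(N+1)$st derivative is simply $e^x$. Therefore, a bound for $f^{(N+1)}(c)$ for $c$ between $a$ and $x$ is simply
$$M = e^m, \qquad m = \max(a, x).$$
By choosing $N$ suitably large, we can make this remainder term as small as we like. For instance, if we want to compute $e^x$ for $x$ in the interval $(-1, 1)$ using the expansion about $a = 0$, then $M$ is at most $e$ and $|x - a| < 1$. To make the remainder term less than $10^{-10}$, it suffices to take $N = 13$, since $e/14! \approx 3.1 \times 10^{-11}$ (whereas $e/13! \approx 4.4 \times 10^{-10}$ is still too large).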
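A small Python sketch, with a hypothetical helper name, that searches for the least $N$ for which the Lagrange bound $M|x-a|^{N+1}/(N+1)!$ falls below a tolerance. With $M = e$ and $|x - a| \le 1$ it reports $N = 13$: the bound at $N = 12$ is $e/13! \approx 4.4 \times 10^{-10}$, still above $10^{-10}$, while at $N = 13$ it is $e/14! \approx 3.1 \times 10^{-11}$.

```python
import math


def lagrange_terms_needed(M: float, h: float, tol: float) -> int:
    """Least N with M * h**(N + 1) / (N + 1)! < tol.

    This bounds |R_{N,a}(x)| whenever |f^(N+1)| <= M between a and x and |x - a| <= h."""
    N = 0
    while M * h ** (N + 1) / math.factorial(N + 1) >= tol:
        N += 1
    return N


N = lagrange_terms_needed(math.e, 1.0, 1e-10)   # e^x about a = 0 with |x| < 1
print(N, math.e / math.factorial(N + 1))
```

The bound is a worst case over the interval; at a particular $x$ the actual error is usually smaller.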
3. Review problems. Each of the following problems was discussed in lecture. Here are the
problems and answers, without the discussion.
Problem 1. Let $a$ and $b$ be positive real numbers. There are 2 tangent lines to the ellipse with equation
$$\frac{x^2}{a^2} + \frac{y^2}{b^2} = 1,$$
containing the point $(a, b)$. Find the equations of each of these tangent lines.
The 2 tangent lines are the line tangent to the ellipse at $(x, y) = (0, b)$ and the line tangent to the ellipse at $(x, y) = (a, 0)$. The equations of these lines are
$$y = b,$$
and
$$x = a.$$
Problem 2. A grain silo is designed by attaching a cylinder of fixed radius $r$ and height $a$ directly above a right circular cone of base radius $r$ and height $b$. The silo has no top, and there is no partition between the bottom of the cylinder and the top of the cone. For a fixed volume $V$, what choice of $b$ minimizes the surface area of the grain silo?
The choice of $b$ minimizing the surface area is
$$b = 2\sqrt{5}\,r/5.$$
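A numerical sanity check of the stated answer, sketched in Python. It assumes the surface consists of the lateral area of the cylinder, $2\pi r a$, plus the lateral area of the cone, $\pi r\sqrt{r^2 + b^2}$, uses the volume constraint $V = \pi r^2 a + \pi r^2 b/3$ to eliminate $a$, and minimizes over $b$ by ternary search (the area is a convex function of $b$). The values $r = 1$, $V = 10$ are hypothetical.

```python
import math


def silo_area(b: float, r: float, V: float) -> float:
    """Cylinder side 2*pi*r*a plus cone side pi*r*sqrt(r^2 + b^2), with a fixed by the volume V."""
    a = (V - math.pi * r**2 * b / 3) / (math.pi * r**2)   # from V = pi r^2 a + pi r^2 b / 3
    return 2 * math.pi * r * a + math.pi * r * math.sqrt(r**2 + b**2)


def ternary_search_min(g, lo: float, hi: float, iters: int = 200) -> float:
    """Approximate minimizer of a convex function g on [lo, hi]."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if g(m1) < g(m2):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2


r, V = 1.0, 10.0                    # hypothetical sample values
b_max = 3 * V / (math.pi * r**2)    # largest b that keeps the cylinder height a >= 0
b_star = ternary_search_min(lambda b: silo_area(b, r, V), 0.0, b_max)
print(b_star, 2 * math.sqrt(5) * r / 5)
```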
Problem 3. Compute the volume of the solid obtained by revolving about the $x$-axis the region in the first quadrant bounded by the curve $y = x^2$ and the curve $x = y^2$.
The volume of this solid is
$$\text{Volume} = 3\pi/10.$$
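By the washer method, $\text{Volume} = \pi\int_0^1 \left((\sqrt{x})^2 - (x^2)^2\right)dx = \pi\int_0^1 (x - x^4)\,dx = \pi(1/2 - 1/5)$. A Python sketch that checks this numerically with composite Simpson's rule, a quadrature rule of our choosing that is not part of the lecture:

```python
import math


def simpson(g, lo: float, hi: float, n: int = 1000) -> float:
    """Composite Simpson's rule for the integral of g over [lo, hi] with n (even) subintervals."""
    h = (hi - lo) / n
    total = g(lo) + g(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(lo + i * h)
    return total * h / 3


# Washers for 0 <= x <= 1: outer radius sqrt(x) (from x = y^2), inner radius x^2.
volume = simpson(lambda x: math.pi * (x - x**4), 0.0, 1.0)
print(volume, 3 * math.pi / 10)
```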
Problem 4. Using a trigonometric substitution and a trigonometric identity, compute the antiderivative
$$\int \frac{\sqrt{1 - x^2}}{x^2}\,dx.$$
The antiderivative equals
$$\int \frac{\sqrt{1 - x^2}}{x^2}\,dx = -\sqrt{1 - x^2}/x - \sin^{-1}(x) + C.$$
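One way to check the answer, sketched in Python: a central difference of the claimed antiderivative (taking $C = 0$) should reproduce the integrand $\sqrt{1-x^2}/x^2$ at sample points in $(0, 1)$. This is a numerical spot check, not a proof; the step size $h$ and the sample points are our own choices.

```python
import math


def F(x: float) -> float:
    """Claimed antiderivative of sqrt(1 - x^2) / x^2 on 0 < x < 1, with C = 0."""
    return -math.sqrt(1 - x**2) / x - math.asin(x)


def integrand(x: float) -> float:
    return math.sqrt(1 - x**2) / x**2


h = 1e-5
for x in (0.2, 0.5, 0.9):
    central_difference = (F(x + h) - F(x - h)) / (2 * h)   # approximates F'(x)
    print(x, central_difference, integrand(x))
```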