
Math132 Notes

This document provides an overview of Calculus concepts including derivatives and integrals. It discusses: 1) Calculus deals with change and variation, allowing analysis of dynamic problems involving motion, rates of change, and optimization. It was developed by Newton and Leibniz. 2) The main concepts are derivatives, which measure instantaneous rates of change, and integrals, which measure cumulative effects or areas. These concepts have physical, geometric, numerical, and algebraic interpretations. 3) Derivatives allow computation of velocity, slope of tangent lines, and linear approximations. Integrals allow computation of total change, areas, and serve as the inverse of derivatives via the Fundamental Theorem of Calculus.

Uploaded by

Seth Killian
Copyright
© All Rights Reserved

Math 132 Overview Stewart

This section is a bird's-eye view of the course. Read it over now, then come back
to it as you learn the topics, to see how they fit into the whole theory.

Calculus is the mathematics of change and variation. With ordinary algebra,
we can translate static real-world problems into equations and solve them; with
calculus, we can solve dynamic problems involving motion, rates of change, optimum
values, irregular shapes, and the cumulative effect of a changing influence. It
was discovered by Newton and Leibniz, and developed further notably by Euler.

The main concepts of calculus are derivatives and integrals applied to functions.
Like most mathematical concepts, these have four levels of meaning: physical
(real-world), geometric (pictures), numerical (spreadsheets), and algebraic (formulas).
Given a problem originating on one level (usually physical or geometric), we translate
to a different level (numerical or algebraic) where the problem can be solved,
then we translate the solution back to the original level.

Functions. Officially, a function $f : X \to Y$ is any rule that takes elements of an
input set (or domain) $X$ to elements of an output set $Y$. In problems, this concept
is represented on the following levels.

1. Physical: A function defines how an input quantity (the independent variable
or argument) determines an output quantity (the dependent variable or
value). For example, consider a stone dropped from a bridge: the elapsed
time $t$ (in sec) determines the observed distance $s$ (in feet) that the stone
has fallen, $s = f(t)$. If the stone falls into the water 400 ft below after 5 sec,
the domain is naturally $0 \le t \le 5$, namely the interval $X = [0, 5]$.

2. Geometric: A function is a graph in the plane, the curve of points $(x, y)$
such that $y = f(x)$. In our example, we use coordinates $(t, s)$, and the graph
$s = f(t)$ curves upward from $(0, 0)$ to $(5, 400)$. As the stone speeds up with
increasing $t$, the graph gets steeper: in fact, it is a segment of a parabola.

3. Numerical: A function is a table of values. In our example, we might get a
partial such table by measuring the distance at sample times:

t (sec)          0    1    2    3     4     5
s = f(t) (ft)    0    16   64   144   256   400

Of course, f (t) has a value at every t, not just the samples. We can imagine
the full function as an infinite table with an entry for every t in the domain.

4. Algebraic: A function is a formula to compute the output in terms of the
input. A model of our physical example is the formula $f(t) = 16t^2$, where
$t$ is in seconds and $s$ in feet. Like all models of the real world, this is valid
only with some error (due mainly to air resistance).


Each section of these Notes corresponds to a section of James Stewart's Calculus, 7th ed.

Derivatives. Now we preview the main concepts of this course. Given a function
$f$, the derivative function $f'$ has the following meanings.

1. Physical: The derivative of a function $y = f(x)$ is the rate of change of $y$
with respect to the change in $x$. In our example of a falling stone, $s = f(t)$,
the derivative tells how fast the distance is changing per unit time, i.e. how
fast the stone is moving. This is the velocity $v$ in feet per second, at time $t$;
and the derivative is the velocity function $v = f'(t)$.

2. Geometric: For a graph $y = f(x)$, the derivative $f'(a)$ at $x = a$ is the slope
of the graph at $(a, f(a))$: that is, the slope of the tangent line at that point,
$y = f(a) + m(x - a)$, where $m = f'(a)$.

3. Numerical: We can approximate the derivative $f'(a)$ by considering an input
$x = a + h$ close to $a$, and dividing the rise in $f(x)$ by the run in $x$:

\[ f'(a) \approx \frac{\Delta f}{\Delta x} = \frac{f(x) - f(a)}{x - a} = \frac{f(a+h) - f(a)}{h}. \]

In our example $f(t) = 16t^2$, we can compute the approximate velocity at
time $t = 3$ sec by considering the nearby time $t = 3.1$, and computing the
change in distance (i.e. distance traveled), divided by the time elapsed:

\[ v = f'(3) \approx \frac{f(3.1) - f(3)}{3.1 - 3} = \frac{153.76 - 144}{0.1} = 97.6. \]

That is, after falling for 3 sec, the stone is travelling about 97.6 ft/sec.
Once we know $f'(a)$, we can use it to approximate $f(x)$ by a linear function:
$f(x) \approx f(a) + f'(a)(x - a)$, for $x$ near $a$.

4. Algebraic: We will give methods to compute the derivative of any formula.
The foundation is the precise definition: the derivative is the limiting value
of approximations over an interval $x \in [a, a+h]$ of width $\Delta x = h$, where $h$
gets smaller and smaller toward zero:

\[ f'(a) = \lim_{h \to 0} \frac{f(a+h) - f(a)}{h}. \]

We will first determine some Basic Derivatives, such as $(x^n)' = n x^{n-1}$ and
$\sin'(x) = \cos(x)$, and combine them using the Sum, Difference, Constant
Multiple, Product, Quotient and Chain Rules.
For our example $f(t) = 16t^2$, we get $f'(t) = 32t$: the velocity is steadily
increasing, proportional to time. This gives the exact value $f'(3) = 96$.

Integrals. Given a function $g$, the integral from $x = a$ to $x = b$ is a number
denoted $\int_a^b g(x)\,dx$, and has the following meanings.

1. Physical. Suppose a quantity $z = f(x)$ changes as its controlling variable
goes from $x = a$ to $x = b$; and each incremental change $\Delta x$ leads to a small
change $\Delta z \approx g(x)\,\Delta x$, depending on $x$. Then the cumulative total change in $z$
from $x = a$ to $x = b$ is given by the integral of $g(x)$: $f(b) - f(a) = \int_a^b g(x)\,dx$.
In our example, suppose we know $v = g(t) = 32t$, the velocity of the stone
at time $t$, and we wish to deduce the fallen distance $s = f(t)$ for $t = 3$. Over
a time increment $\Delta t$, the stone moves by about $\Delta s \approx v\,\Delta t = 32t\,\Delta t$; so we
can express the cumulative change as: $f(3) = f(3) - f(0) = \int_0^3 32t\,dt$.

2. Geometric. For the graph $y = g(x) \ge 0$, the integral $\int_a^b g(x)\,dx$ is the area
under the graph and above the interval $a \le x \le b$ on the x-axis. This is
because the area $A$ is the cumulative total of thin slices $\Delta A \approx g(x)\,\Delta x$ with
height $y = g(x)$ and width $\Delta x$.
In our example, we can get the integral $\int_0^3 32t\,dt$ as the triangular area
under the graph $v = g(t) = 32t$ and above the interval $t \in [0, 3]$: a triangle
with base 3 and height 96.

3. Numerical. We approximate the cumulative change in $z$ from $x = a$ to
$x = b$ by splitting up the interval $a \le x \le b$ into a large number $n$ of small
increments of width $\Delta x = \frac{b-a}{n}$. We take sample points $x_1, \ldots, x_n$, one in
each increment, and compute the sum of $\Delta z \approx g(x_i)\,\Delta x$:

\[ \int_a^b g(x)\,dx \approx g(x_1)\,\Delta x + g(x_2)\,\Delta x + \cdots + g(x_n)\,\Delta x. \]

This is the origin of the notation $\int_a^b g(x)\,dx$, where $\int$ is an elongated S
standing for sum, and $g(x)\,dx$ represents all the small changes $g(x_i)\,\Delta x$.
In our example, given the velocity function $v = g(t) = 32t$, we can take
$n = 3$, $\Delta t = 1$ sec, and sample points $t_1 = 1$, $t_2 = 2$, $t_3 = 3$. We approximate the
cumulative distance traveled $\int_0^3 32t\,dt$ as the sum over the 3 time increments
of (velocity at $t_i$) $\times$ (time elapsed) $= 32\,t_i\,\Delta t$:

\[ \int_0^3 32t\,dt \approx 32(1)(1) + 32(2)(1) + 32(3)(1) = 192. \]

This is an overestimate because we sample the velocity at the end of each time
increment, when the stone is fastest. Taking more increments (larger $n$)
would give better and better approximations.

4. Algebraic. Since integrals go from a rate of change to a total change, they
are reverse derivatives (antiderivatives), and we can use our known derivative
rules backwards to find formulas for many (but not all) integrals. That is, if
$g(x) = f'(x)$ for a known formula $f(x)$, then $\int_a^b g(x)\,dx = f(b) - f(a)$. This
is known as the Fundamental Theorem of Calculus.
In our example, we know $f(t) = 16t^2$ has $f'(t) = 32t$, so we get the exact
value $\int_0^3 32t\,dt = 16(3^2) - 16(0^2) = 144$.
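
The numerical and algebraic views of the integral can be compared with a short Python sketch. It is only an illustration: the helper name `right_riemann_sum` is made up for this example, and it samples at the right endpoint of each increment, as in the n = 3 computation of the stone's distance.

```python
def right_riemann_sum(g, a, b, n):
    """Approximate the integral of g over [a, b] with n right-endpoint samples."""
    dx = (b - a) / n
    return sum(g(a + i * dx) * dx for i in range(1, n + 1))


def velocity(t):
    """Velocity of the falling stone in ft/sec (the model g(t) = 32t)."""
    return 32 * t


# n = 3 reproduces the hand computation; larger n moves toward the exact 144.
for n in (3, 30, 300, 3000):
    print(n, right_riemann_sum(velocity, 0, 3, n))
```

For this particular velocity function the right-endpoint sum works out to 144 + 144/n, so every estimate is an overestimate and the excess shrinks in proportion to 1/n.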
Math 132 Tangent and Velocity Stewart 1.4

Instantaneous velocity. We start our study of the derivative with the
velocity problem: If a particle moves along a coordinate line so that at time
$t$, it is at position $f(t)$, then compute its velocity or speed at a given instant.
Velocity means distance traveled divided by time elapsed (i.e. feet per
second). If the velocity changes during the time interval, then this quotient
is the average of the changing velocities. From time $t = a$ to $t = b$, the
distance traveled is the change in position $f(b) - f(a)$, and the time elapsed
is $b - a$, so the average velocity is:

\[ v_{\text{avg}} = \frac{f(b) - f(a)}{b - a}. \]

What do we mean by the instantaneous velocity at time $t = a$? We
cannot compute this directly, since the particle does not move at all in an
instant. Rather, we find the average velocity from $t = a$ to $t = a+h$, where
$h$ is a small time increment, and take the instantaneous velocity $v$ to be the
limiting value approached by the average velocities:

\[ v = \lim_{h \to 0} \frac{f(a+h) - f(a)}{h}, \]

where $\lim_{h \to 0}$ means the limit as $h$ approaches 0 of the quantity on the right.
Another way to say this is that velocity is the rate of change of position
with respect to time: how fast the position $f(t)$ is changing per unit change
in time $t$. Thus, $v_{\text{avg}}$ is the average rate of change over an interval $t \in [a, b]$,
while $v$ is the instantaneous rate of change at a particular $t = a$.

Falling stone example. A stone dropped off a bridge has position approximately
$f(t) = 16t^2$ feet below the bridge after falling for $t$ seconds. The
average velocity between $t = 3$ and $t = 4$ is:

\[ v_{\text{avg}} = \frac{f(4) - f(3)}{4 - 3} = \frac{16(4^2) - 16(3^2)}{1} = 112. \]

That is, the stone has an average velocity of 112 ft/sec, although it starts
slower than this at $t = 3$ and speeds up steadily throughout the interval.
Now, what is the instantaneous velocity at $t = 3$? We compute the
average velocity over a short time interval from $t = 3$ to $t = 3 + h$, for
example $h = 0.1$:

\[ v_{\text{avg}} = \frac{f(3.1) - f(3)}{3.1 - 3} = \frac{16(3.1^2) - 16(3^2)}{0.1} = 97.6. \]

This is a pretty good estimate of the velocity, but to be more precise we
take shorter intervals:

h        1     0.1    0.01    0.001   0.0001   0.00001
v_avg    112   97.6   96.16   96.02   96.002   96.0002

Velocity can be positive or negative, depending on the direction of motion. Speed is
the absolute value of velocity.

It is pretty clear that as the interval gets shorter and shorter, the average
velocity approaches the limiting value $v = 96$, and we define this to be the
instantaneous velocity.
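
The table of average velocities can be regenerated with a few lines of Python. This is a sketch under the model f(t) = 16t^2 from the notes; the function names `position` and `average_velocity` are illustrative choices, not part of the original text.

```python
def position(t):
    """Distance fallen in feet after t seconds (the model f(t) = 16 t^2)."""
    return 16 * t**2


def average_velocity(f, a, h):
    """Average velocity of f over the time interval from a to a + h."""
    return (f(a + h) - f(a)) / h


# Shrink the time increment h and watch the average velocity settle down.
for h in (1, 0.1, 0.01, 0.001, 0.0001, 0.00001):
    print(f"h = {h:<8} v_avg = {average_velocity(position, 3, h):.5f}")
```

The printed values carry small floating-point rounding errors, so they should be read as approximations to the limiting value, in the same spirit as the table.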
Let us prove this algebraically: instead of trying sample values of the
time increment $h$, we let $h$ be a variable:

\[ v_{\text{avg}} = \frac{f(3+h) - f(3)}{(3+h) - 3} = \frac{16(3+h)^2 - 16(3^2)}{h} = 16\,\frac{(3+h)^2 - 3^2}{h} \]

\[ = 16\,\frac{(3^2 + 2(3h) + h^2) - 3^2}{h} = 16\,\frac{6h + h^2}{h} = 16(6 + h) = 96 + 16h. \]

As we take $h$ smaller and smaller, the error term $16h$ approaches zero, and
the average velocity approaches the limiting value 96, which by definition is
the instantaneous velocity:

\[ v = \lim_{h \to 0} \frac{f(3+h) - f(3)}{h} = 96. \]

Tangent Slope. We have described velocity on three conceptual levels: as a
physical quantity, a numerical approximation, and an algebraic computation.
Velocity also has a geometric meaning in terms of the graph $y = f(t)$.
Consider a secant line which cuts the graph at points $(a, f(a))$ and $(b, f(b))$.

The slope $m_{\text{sec}}$ of the secant line is the rise in the graph per unit of
horizontal run, which means distance traversed divided by time elapsed,
which is the average velocity:

\[ m_{\text{sec}} = \frac{f(b) - f(a)}{b - a} = v_{\text{avg}}. \]

The reason for this coincidence is that slope is the rate of vertical rise with
respect to horizontal run, just as velocity is the rate of change of position
(drawn on the vertical axis) with respect to time (on the horizontal axis).
As we move the point $(b, f(b))$ to $(a+h, f(a+h))$, closer and closer to $(a, f(a))$,
the secant lines approach the tangent line which touches the curve at the
single point $(a, f(a))$.

The tangent slope $m$ is the limit of the secant slopes, so it is equal to
the instantaneous velocity:

\[ m = \lim_{h \to 0} \frac{f(a+h) - f(a)}{h} = v. \]

For any graph $y = f(x)$, not only the graph of position with respect to
time, the tangent problem is to find the tangent line passing through
$(a, f(a))$. The slope $m$ is given by the above formula. The point-slope
equation of the tangent line is thus: $y = f(a) + m(x - a)$. For example, the
tangent line of our graph $y = 16x^2$ at the point $(3, 144)$ is: $y = 144 + 96(x - 3)$.
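
As a rough numerical illustration of the tangent problem, the sketch below estimates the slope at a = 3 from a secant with a very small h and builds the point-slope line from it. The helper names (`secant_slope`, `tangent_at`) and the default step h = 1e-6 are assumptions made for this example only.

```python
def f(x):
    """The example curve y = 16 x^2."""
    return 16 * x**2


def secant_slope(f, a, h):
    """Slope of the secant line through (a, f(a)) and (a + h, f(a + h))."""
    return (f(a + h) - f(a)) / h


def tangent_at(f, a, h=1e-6):
    """Return an estimated tangent slope m and the line y = f(a) + m (x - a)."""
    m = secant_slope(f, a, h)

    def line(x):
        return f(a) + m * (x - a)

    return m, line


m, line = tangent_at(f, 3)
print("estimated slope:", m)
for x in (2.9, 3.0, 3.1):
    print(f"x = {x}: curve = {f(x):.4f}, tangent line = {line(x):.4f}")
```

Near x = 3 the line and the curve agree closely, and the gap grows as x moves away from the point of tangency.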
Math 132 Limits Stewart 1.5

Definition of limits. The key technical tool in the previous section was
the idea of a limiting value approached by approximations. We need limits
for all the definitions of calculus, so we must understand them clearly.

Preliminary definition: Consider a function $f(x)$ and numbers $L$, $a$.
Then the limit of $f(x)$ equals $L$ as $x$ approaches $a$, in symbols
$\lim_{x \to a} f(x) = L$, means that $f(x)$ can be forced arbitrarily close
to $L$ by making $x$ sufficiently close to (but unequal to) $a$.

That is, $f(x)$ approximates $L$ to within any desired error tolerance, for all
values of $x$ within some small distance from $a$ (but $x \ne a$). One more way
to say it: if we make a table of $f(x)$ for any sample values of $x$ getting
closer and closer to $a$ (such as $x = a + 0.1$, $a + 0.01$, etc.), then the values
of $f(x)$ will get as close as we like to $L$ (though they might never reach $L$).
Graphically:

Evaluating limits. Some limits are easy because we can plug in $x = a$
to get the limiting value $\lim_{x \to a} f(x) = f(a)$, in which case we say $f(x)$ is
continuous at $x = a$. Graphically, as in the above picture, this means the
curve has no jump or hole at $(a, f(a))$. For example,

\[ \lim_{x \to 5} x^2 = 5^2 = 25, \]

as we could see from the graph of $y = x^2$. Algebraically, if $x$ is close enough
to 5, say $x = 5 + h$ for some small $h$, then

\[ x^2 = (5+h)^2 = 5^2 + 2(5h) + h^2 = 25 + 10h + h^2, \]

which is forced as close as we like to $L = 25$ if $h$ is small enough (positive
or negative).
Sometimes $f(x)$ does not approach any limiting value at $x = a$, in which
case we say the limit does not exist, and the symbol $\lim_{x \to a} f(x)$ has no
meaning. For example, define the signum function $\operatorname{sgn}(x)$ as:

\[ \operatorname{sgn}(x) = \frac{|x|}{x} = \begin{cases} +1 & \text{for } x > 0 \\ -1 & \text{for } x < 0 \\ \text{undefined} & \text{for } x = 0, \end{cases} \]

having the graph:

Near $x = 0$, the function cannot be forced close to any single output value.
That is, $\lim_{x \to 0} \operatorname{sgn}(x) \ne 1$, since no matter how close $x$ gets to 0, there
are some $x$ (namely negative) for which $\operatorname{sgn}(x)$ is far from 1; and similarly
$\lim_{x \to 0} \operatorname{sgn}(x)$ is not $-1$, nor 0, nor any other value. In particular, it is false
that $\lim_{x \to 0} \operatorname{sgn}(x) = \operatorname{sgn}(0)$, and the function is not continuous at $x = 0$.
An important feature of $\lim_{x \to a} f(x)$ is that it does not depend on $f(a)$,
even if $f(a)$ is undefined: the limit only notices values of $f(x)$ for $x \ne a$.
For example, define $g(x) = 0$ for $x \ne 1$, and $g(1) = \frac{1}{2}$, having the graph:

Then $\lim_{x \to 1} g(x) = 0$, since if $x$ is close enough to (but unequal to) 1, then
$g(x)$ is arbitrarily close to $L = 0$ (in fact $g(x) = L$). Again, $\lim_{x \to 1} g(x) \ne g(1) = \frac{1}{2}$,
and $g(x)$ is not continuous at $x = 1$.
The important limits in calculus, such as instantaneous velocity, are
cases where the function is not defined at $x = a$. For example, consider
$\lim_{x \to 1} \frac{x^2 - 1}{x - 1}$. Plugging in $x = 1$ gives the meaningless expression $\frac{0}{0}$, so this
function is not continuous, but the limit still exists. Indeed, plotting points
gives the graph:

It seems the limit is $L = 2$: the graph approaches $(1, 2)$, so if $x$ is sufficiently
close to (but not equal to) 1, then $f(x)$ is forced as close as desired to 2. We
can prove this algebraically:

\[ \lim_{x \to 1} \frac{x^2 - 1}{x - 1} = \lim_{x \to 1} \frac{(x-1)(x+1)}{x - 1} = \lim_{x \to 1} (x+1) = 1 + 1 = 2, \]

since $x+1$ is continuous.
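
The "table of values" reading of this limit can be tried out in Python. The sketch below is only an illustration: it evaluates (x^2 - 1)/(x - 1) at inputs approaching 1 from both sides, and shows that evaluating at x = 1 itself fails.

```python
def f(x):
    """The quotient (x^2 - 1)/(x - 1), which is undefined at x = 1."""
    return (x**2 - 1) / (x - 1)


# Sample inputs x = 1 + h on both sides of 1, never equal to 1.
for h in (0.1, 0.01, 0.001, -0.001, -0.01, -0.1):
    x = 1 + h
    print(f"x = {x:<6} f(x) = {f(x):.6f}")

# Plugging in x = 1 gives 0/0, which Python reports as an error.
try:
    f(1)
except ZeroDivisionError:
    print("f(1) is undefined (division by zero)")
```

A table like this suggests the value 2 but does not prove it; the factoring argument is what establishes the limit.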

One-sided and infinite limits. We define another type of limit. One-sided
limits (from the right or left) notice only values of $x$ on one side of
$a$. That is, the limit of $f(x)$ equals $L$ as $x$ approaches $a$ from the right,
denoted $\lim_{x \to a^+} f(x) = L$, whenever $f(x)$ can be forced arbitrarily close to
$L$ by making $x$ sufficiently close to (but greater than) $a$. The limit from the
left, denoted $\lim_{x \to a^-} f(x) = L$, is the same, except with $x$ less than $a$.
If we have the ordinary limit $\lim_{x \to a} f(x) = L$, then clearly the left
and right limits have the same value $L$. Thus, in the above examples, we
have $\lim_{x \to 5^+} x^2 = \lim_{x \to 5^-} x^2 = 5^2$, and $\lim_{x \to 1^+} g(x) = \lim_{x \to 1^-} g(x) = 0$,
and $\lim_{x \to 1^+} \frac{x^2-1}{x-1} = \lim_{x \to 1^-} \frac{x^2-1}{x-1} = 2$. However, $\lim_{x \to 0^+} \operatorname{sgn}(x) = 1$ and
$\lim_{x \to 0^-} \operatorname{sgn}(x) = -1$, even though $\lim_{x \to 0} \operatorname{sgn}(x)$ does not exist.
Finally, we define infinite limits: $\lim_{x \to a} f(x) = \infty$ means that $f(x)$ can
be forced larger than any bound (for instance $f(x) > 1000$) by making $x$
sufficiently close to (but not equal to) $a$. The symbol $\infty$ has no meaning
by itself: this is just a way of saying that $f(x)$ becomes as large a number
as we like. For example, we have $\lim_{x \to 0} \frac{1}{|x|} = \infty$, since the graph $y = \frac{1}{|x|}$
shoots upward toward a vertical asymptote at $x = 0$.

However, for the function $\frac{1}{x}$, we have $\lim_{x \to 0} \frac{1}{x} \ne \infty$, since no matter how
close $x$ is to 0, we cannot force $\frac{1}{x}$ above a given positive bound: rather, $\frac{1}{x}$
is a large negative number when $x$ is a tiny negative number. In fact, the
graph shoots upward to the right of the vertical asymptote, and downward
to the left of the asymptote, so we have one-sided infinite limits:

\[ \lim_{x \to 0^-} \frac{1}{x} = -\infty, \qquad \lim_{x \to 0^+} \frac{1}{x} = \infty. \]
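
One-sided behavior can be explored numerically by sampling on one side of the point only. The Python sketch below is an illustration with made-up variable names; it samples sgn(x) and 1/x at x = ±0.1, ±0.01, and so on.

```python
def sgn(x):
    """Signum as defined in the notes: |x|/x, undefined at x = 0."""
    return abs(x) / x


steps = [10.0 ** -k for k in range(1, 6)]  # 0.1, 0.01, ..., 0.00001

right_sgn = [sgn(h) for h in steps]      # x approaching 0 from the right
left_sgn = [sgn(-h) for h in steps]      # x approaching 0 from the left
right_recip = [1 / h for h in steps]     # 1/x just right of 0
left_recip = [1 / -h for h in steps]     # 1/x just left of 0

print("sgn from the right:", right_sgn)
print("sgn from the left: ", left_sgn)
print("1/x from the right:", right_recip)
print("1/x from the left: ", left_recip)
```

The two sgn lists stay at different constant values, matching the failure of the two-sided limit, while the 1/x samples grow without bound in opposite directions.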
Math 132 Limit Laws Stewart 1.6

Operations on limits. Some general combination rules make most limit
computations routine. Suppose we know that $\lim_{x \to a} f(x)$ and $\lim_{x \to a} g(x)$
exist. Then we have the Limit Laws:

Sum: $\lim_{x \to a} (f(x) + g(x)) = \lim_{x \to a} f(x) + \lim_{x \to a} g(x)$.

Difference: $\lim_{x \to a} (f(x) - g(x)) = \lim_{x \to a} f(x) - \lim_{x \to a} g(x)$.

Constant Multiple: $\lim_{x \to a} (c\,f(x)) = c \lim_{x \to a} f(x)$, for a constant $c$.

Product: $\lim_{x \to a} f(x)\,g(x) = \lim_{x \to a} f(x) \cdot \lim_{x \to a} g(x)$.

Quotient: $\lim_{x \to a} \frac{f(x)}{g(x)} = \frac{\lim_{x \to a} f(x)}{\lim_{x \to a} g(x)}$, provided $\lim_{x \to a} g(x) \ne 0$.

Power: $\lim_{x \to a} f(x)^n = \left(\lim_{x \to a} f(x)\right)^n$, for a whole number $n$.

Root: $\lim_{x \to a} \sqrt[n]{f(x)} = \sqrt[n]{\lim_{x \to a} f(x)}$, for a whole number $n$.

These all have the form: The limit of an operation equals the operation
applied to the limits. These Laws are also valid for one-sided limits.

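Limits by plugging in. Assuming the Limit Laws and the Basic Limits
$\lim_{x \to a} x = a$ and $\lim_{x \to a} c = c$, we can prove that most functions are
continuous, meaning the limit $\lim_{x \to a} f(x)$ is obtained by substituting $x = a$ to
get $f(a)$. For example, we can formally compute the limit:

\[
\begin{aligned}
\lim_{x \to 2} \frac{1 - \sqrt{x}}{1 + x}
&= \frac{\lim_{x \to 2}\,(1 - \sqrt{x})}{\lim_{x \to 2}\,(1 + x)} && \text{by the Quotient Law} \\
&= \frac{\lim_{x \to 2} 1 - \lim_{x \to 2} \sqrt{x}}{\lim_{x \to 2} 1 + \lim_{x \to 2} x} && \text{by the Sum and Difference Laws} \\
&= \frac{\lim_{x \to 2} 1 - \sqrt{\lim_{x \to 2} x}}{\lim_{x \to 2} 1 + \lim_{x \to 2} x} && \text{by the Root Law} \\
&= \frac{1 - \sqrt{2}}{1 + 2} = \frac{1 - \sqrt{2}}{3} && \text{by the Basic Limits.}
\end{aligned}
\]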

In the Root Law, if $n$ is even, we assume $\lim_{x \to a} f(x) > 0$.

The Quotient Law requires that the denominator have a non-zero limit. We tentatively
proceed with the computation and find the denominator to be 3, which retrospectively
justifies the quotient step.

That is, the correct limit would be obtained just by substituting $x = 2$. In
general, substituting $x = a$ gives the correct limit unless it leads to a meaningless
expression like $\frac{0}{0}$ or $\sqrt{-1}$ (we do not consider imaginary numbers
in this course). In Notes 2.4, we will show that trigonometric functions
like $\sin(x)$ and $\tan(x)$ are also continuous, so this principle also works for
formulas involving these functions.
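
A quick numerical sanity check of "limits by plugging in" is sketched below in Python. It assumes the worked example is the function (1 - sqrt(x))/(1 + x) with x approaching 2, and compares values at nearby inputs with the substituted value; the variable name `substituted` is an illustrative choice.

```python
import math


def f(x):
    """The example quotient (1 - sqrt(x)) / (1 + x)."""
    return (1 - math.sqrt(x)) / (1 + x)


# Value obtained by substituting x = 2 directly.
substituted = (1 - math.sqrt(2)) / (1 + 2)

# Values of f at inputs close to 2 on either side.
for h in (0.1, 0.001, 0.00001, -0.00001, -0.001, -0.1):
    print(f"x = {2 + h:<8} f(x) = {f(2 + h):.8f}")
print("substituting x = 2:", round(substituted, 8))
```

Agreement of nearby values with the substituted value is what continuity predicts; the Limit Laws are what justify it.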

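Limits by cases. Consider the function $\operatorname{sgn}(x) = \frac{|x|}{x}$ for $x \ne 0$, which
equals 1 for $x > 0$ and $-1$ for $x < 0$. Compute the limit:

\[ \lim_{x \to 0} \sqrt{1 + \operatorname{sgn}(x)^2}. \]

We cannot just plug in $x = 0$ because $\operatorname{sgn}(x)$ is not defined there, jumping
from $-1$ to 1. Instead, we apply the Laws to the right limit only:

\[ \lim_{x \to 0^+} \sqrt{1 + \operatorname{sgn}(x)^2} = \sqrt{1 + \left(\lim_{x \to 0^+} \operatorname{sgn}(x)\right)^2} = \sqrt{1 + 1^2} = \sqrt{2}, \]

since we know $\lim_{x \to 0^+} \operatorname{sgn}(x) = 1$. Similarly, the left limit is:

\[ \lim_{x \to 0^-} \sqrt{1 + \operatorname{sgn}(x)^2} = \sqrt{1 + (-1)^2} = \sqrt{2}. \]

The one-sided limits coincide, so we have the two-sided limit:

\[ \lim_{x \to 0} \sqrt{1 + \operatorname{sgn}(x)^2} = \sqrt{2}. \]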

Limits by canceling zeroes. As we have seen, the most important limits
are those for which substitution gives the meaningless expression $\frac{0}{0}$. To
compute these, we must cancel vanishing factors from the top and bottom,
until we get an expression which can be evaluated by the Laws. This often
requires factoring, for example:

\[ \lim_{x \to 2} \frac{x^2 - 4x + 4}{x^2 - x - 2} = \lim_{x \to 2} \frac{(x-2)^2}{(x-2)(x+1)} = \lim_{x \to 2} \frac{x-2}{x+1}, \]

which can be evaluated by substituting $x = 2$, giving $\frac{0}{3} = 0$. Another trick to avoid $\frac{0}{0}$ is
to multiply by a conjugate radical:

\[ \lim_{x \to 9} \frac{x-9}{\sqrt{x}-3} = \lim_{x \to 9} \frac{x-9}{\sqrt{x}-3} \cdot \frac{\sqrt{x}+3}{\sqrt{x}+3} = \lim_{x \to 9} \frac{(x-9)(\sqrt{x}+3)}{(\sqrt{x})^2 - 3^2} \]

\[ = \lim_{x \to 9} \frac{(x-9)(\sqrt{x}+3)}{x-9} = \lim_{x \to 9} \left(\sqrt{x} + 3\right) = 6. \]
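
Both 0/0 examples can be spot-checked numerically. In the Python sketch below, the function names `p` and `q` are illustrative labels for the two quotients before any canceling; the values near x = 2 and x = 9 should sit close to the limits found algebraically.

```python
import math


def p(x):
    """(x^2 - 4x + 4) / (x^2 - x - 2), which is 0/0 at x = 2."""
    return (x**2 - 4 * x + 4) / (x**2 - x - 2)


def q(x):
    """(x - 9) / (sqrt(x) - 3), which is 0/0 at x = 9."""
    return (x - 9) / (math.sqrt(x) - 3)


for h in (0.1, 0.001, 0.0001, -0.0001, -0.001, -0.1):
    print(f"h = {h:<8} p(2 + h) = {p(2 + h):.6f}   q(9 + h) = {q(9 + h):.6f}")
```

Very small h would eventually run into floating-point cancellation in the uncanceled formulas, which is one practical reason the algebraic simplification is preferable to raw evaluation.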

Oscillating functions. Some limits are difficult to evaluate because the
functions oscillate wildly. Consider:

\[ \lim_{x \to 0} \sin\left(\frac{\pi}{x}\right). \]

If we try this at the sample points $x = \frac{1}{2}, \frac{1}{3}, \frac{1}{4}, \ldots$, we get $\sin\left(\frac{\pi}{1/n}\right) = \sin(n\pi) = 0$,
so it seems the limit is 0. But this is deceptive, since the
graph looks like this:

All the oscillations of $\sin(x)$ on the real line are here crammed into the interval
$-1 \le x \le 1$, and our function $\sin\left(\frac{\pi}{x}\right)$ cannot be forced close to any
given value, no matter how close $x$ is to 0. That is, this limit does not exist.
However, consider the limit:

\[ \lim_{x \to 0} x \sin\left(\frac{\pi}{x}\right). \]

The Product Law would give $\lim_{x \to 0} x \cdot \lim_{x \to 0} \sin\left(\frac{\pi}{x}\right)$, but the second limit
does not exist, so the Law does not apply. The graph looks like:

The function is bounded between the graphs $y = |x|$ and $y = -|x|$, so its
limit is squeezed to zero. This reasoning is formalized by the:

Squeeze Theorem: Suppose $f(x) \le g(x) \le h(x)$ for all $x$ near $a$
(except possibly $x = a$), and $\lim_{x \to a} f(x) = \lim_{x \to a} h(x) = L$.
Then $\lim_{x \to a} g(x) = L$.

To evaluate our limit, we note that $-1 \le \sin\left(\frac{\pi}{x}\right) \le 1$, so $-x \le x \sin\left(\frac{\pi}{x}\right) \le x$
for $x > 0$, and similarly for $x < 0$. Hence:

\[ -|x| \le x \sin\left(\frac{\pi}{x}\right) \le |x| \quad \text{for all } x \ne 0, \]

and we know by considering cases that $\lim_{x \to 0} |x| = \lim_{x \to 0} -|x| = 0$, so the
Theorem applies to give:

\[ \lim_{x \to 0} x \sin\left(\frac{\pi}{x}\right) = 0. \]
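
The contrast between the two oscillating examples can be seen numerically. The Python sketch below takes the functions to be sin(pi/x) and x sin(pi/x); the names `osc` and `damped` and the particular sample points are choices made for this illustration.

```python
import math


def osc(x):
    """sin(pi / x): oscillates faster and faster as x approaches 0."""
    return math.sin(math.pi / x)


def damped(x):
    """x * sin(pi / x): the same oscillation, squeezed between -|x| and |x|."""
    return x * osc(x)


# Two families of sample points, both shrinking toward 0.
zeros = [osc(1 / n) for n in range(2, 6)]            # values near 0
peaks = [osc(2 / (4 * n + 1)) for n in range(1, 5)]  # values near 1
print("osc at x = 1/n:      ", zeros)
print("osc at x = 2/(4n+1): ", peaks)

# The squeeze bounds hold at every sampled point on both sides of 0.
for k in range(1, 7):
    for x in (10.0 ** -k, -(10.0 ** -k), 0.3 * 10.0 ** -k):
        assert -abs(x) <= damped(x) <= abs(x)
print("squeeze bounds hold at all sampled points")
```

Because different sample sequences give different apparent values for sin(pi/x), a single table cannot be trusted here, whereas the bound by |x| pins down the second limit.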
Math 132 Limit Definition Stewart 1.7

Why do we need limits? Because we cannot directly evaluate important
quantities like instantaneous velocity or tangent slope, but we can approximate
them with arbitrary accuracy. A limit pinpoints the exact value within
this cloud of approximations. In this section, we get to the logical core of
this concept.
For example, consider the tangent line of $y = x^2$ at $x = 1$, approximated
by the secant through $(1, 1)$ and a nearby point $(x, x^2)$, giving the slope:
$f(x) = \frac{x^2 - 1}{x - 1}$. There is no defined value for $f(1)$, but as $x$ gets very close
to 1, we expect the approximations $f(x)$ to have the exact tangent slope as
their limiting value, $\lim_{x \to 1} f(x) = L$. This means a candidate value $L$
is the correct value if we can force $f(x)$ as close as desired to $L$ (within an
error $\varepsilon = \frac{1}{10}$, or $\frac{1}{100}$, or $\frac{1}{1,000,000}$, or any possible $\varepsilon > 0$), provided we restrict
$x$ close enough to 1.
Thus, proving a limit is an error-control problem of a type we see in the
real world. For example, how accurately must you set the angle of your
tennis racket to land the ball within one foot of a given spot (or within one
inch)? In the general situation, an input setting $x$ produces an output $f(x)$:
how accurate must the input be to ensure a tolerable output error? That is,
what allowed difference $\delta$ of $x$ from $a$ will force an error less than $\varepsilon$ of $f(x)$
from $L$?

In the graph $y = f(x)$, we take the small red piece between the vertical lines
$a - \delta < x < a + \delta$ (not including $x = a$). By setting $\delta$ small enough, we try
to force this piece between the fixed horizontal lines $L - \varepsilon < y < L + \varepsilon$, for
the specified output error $\varepsilon$.
Rewriting $a - \delta < x < a + \delta$ as $|x - a| < \delta$, and $L - \varepsilon < f(x) < L + \varepsilon$ as
$|f(x) - L| < \varepsilon$, we get the formal definition of a limit:

Here $\delta$ (delta) is a Greek letter d, standing for difference, and $\varepsilon$ (epsilon) is Greek e,
standing for error.

Definition: $\lim_{x \to a} f(x) = L$ means that for any output error
tolerance $\varepsilon > 0$, there is an input accuracy $\delta > 0$ such that
$0 < |x - a| < \delta$ forces $|f(x) - L| < \varepsilon$.

We can define one-sided and infinite limits similarly.

Proof of Individual Limits. The precise definition allows us to rigorously
prove facts about limits: specific limit computations, as well as the general
Limit Laws, which can then be applied instead of case-by-case proofs.

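example: We prove that $\lim_{x \to 5} (3x - 2) = 3(5) - 2 = 13$. We treat the
desired error tolerance $\varepsilon$ as a variable, and we want to guarantee the output
error $|f(x) - L| < \varepsilon$, or equivalently $-\varepsilon < f(x) - L < \varepsilon$. We write this out
and solve the inequalities for $x$:

\[ -\varepsilon < (3x - 2) - 13 < \varepsilon \iff 15 - \varepsilon < 3x < 15 + \varepsilon \]

\[ \iff \tfrac{1}{3}(15 - \varepsilon) < x < \tfrac{1}{3}(15 + \varepsilon) \iff 5 - \tfrac{1}{3}\varepsilon < x < 5 + \tfrac{1}{3}\varepsilon. \]

(Here $\iff$ means "is logically equivalent to".) Finally, we put this in terms
of the input accuracy $x - a = x - 5$:

\[ -\tfrac{1}{3}\varepsilon < x - 5 < \tfrac{1}{3}\varepsilon. \]

To force this, we are allowed to set any input accuracy $|x - a| < \delta$, or
$-\delta < x - a < \delta$. Evidently, $\delta = \tfrac{1}{3}\varepsilon$ will work.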
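
The error-control reading of this example can be spot-checked in Python. The sketch below is not a proof: it only samples finitely many inputs within delta = epsilon/3 of 5 and confirms that each output is within epsilon of 13. The helper names `delta_for` and `check` are made up for this illustration.

```python
def f(x):
    """The example function f(x) = 3x - 2, with limit 13 at x = 5."""
    return 3 * x - 2


def delta_for(eps):
    """Input accuracy for the linear example: delta = eps / 3."""
    return eps / 3


def check(eps, samples=1000):
    """Test |f(x) - 13| < eps at sampled x with 0 < |x - 5| < delta."""
    d = delta_for(eps)
    for i in range(1, samples):
        for sign in (1, -1):
            x = 5 + sign * d * i / samples  # strictly inside the delta window
            if not abs(f(x) - 13) < eps:
                return False
    return True


for eps in (0.1, 0.001, 0.000001):
    print(f"eps = {eps}: delta = {delta_for(eps)}, all samples ok = {check(eps)}")
```

Extremely small tolerances would be dominated by floating-point rounding, so the check is meaningful only for moderate values of epsilon.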

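example: A harder error-control problem: $\lim_{x \to 3} \sqrt{x} = \sqrt{3}$. We translate
the output accuracy requirement $-\varepsilon < f(x) - L < \varepsilon$ into inequalities
bounding the input accuracy $x - a$. (Here $x$ is a positive value close to 3,
and we take any small error tolerance $1 > \varepsilon > 0$.)

\[ -\varepsilon < \sqrt{x} - \sqrt{3} < \varepsilon \iff \sqrt{3} - \varepsilon < \sqrt{x} < \sqrt{3} + \varepsilon \]

\[ \iff 3 - 2\sqrt{3}\,\varepsilon + \varepsilon^2 < x < 3 + 2\sqrt{3}\,\varepsilon + \varepsilon^2 \]

\[ \iff -2\sqrt{3}\,\varepsilon + \varepsilon^2 < x - 3 < 2\sqrt{3}\,\varepsilon + \varepsilon^2. \]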

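We need an input accuracy $\delta$ which guarantees the last inequalities above.
In general, to guarantee a desired inequality of the form $-d_1 < x - a < d_2$, we
choose $\delta$ to be the smaller of $d_1$ and $d_2$. Thus we take $\delta = 2\sqrt{3}\,\varepsilon - \varepsilon^2$. Then
$-\delta < x - 3$ is equivalent to the desired lower bound $-2\sqrt{3}\,\varepsilon + \varepsilon^2 < x - 3$; and
also $x - 3 < \delta$ implies the desired upper bound, since:

\[ \delta = 2\sqrt{3}\,\varepsilon - \varepsilon^2 < 2\sqrt{3}\,\varepsilon + \varepsilon^2. \]

$\lim_{x \to a^+} f(x) = L$ means that for any $\varepsilon > 0$, there is some $\delta > 0$ such that $0 < x - a < \delta$
implies $|f(x) - L| < \varepsilon$; and $\lim_{x \to a} f(x) = \infty$ means that for any bound $B$, there is some
$\delta > 0$ such that $0 < |x - a| < \delta$ implies $f(x) > B$.

For the formal proof of the first example, we must reverse this logic, and show that the
given input accuracy guarantees the desired output accuracy. Given any desired $\varepsilon > 0$,
we define $\delta = \tfrac{1}{3}\varepsilon$, and assume $|x - 5| < \delta$. Then we have:

\[ 3|x - 5| < 3\delta = \varepsilon \implies |3x - 15| < \varepsilon \implies |(3x - 2) - 13| < \varepsilon, \]

which is our desired conclusion $|f(x) - L| < \varepsilon$. (Here $\implies$ means "logically implies".)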
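
For the square-root example, the choice delta = 2 sqrt(3) epsilon - epsilon^2 can be spot-checked by sampling, as in the Python sketch below. It assumes the limit in question is sqrt(x) approaching sqrt(3) as x approaches 3, with 0 < epsilon < 1; the helper names are illustrative, and sampling is evidence rather than proof.

```python
import math

ROOT3 = math.sqrt(3)


def delta_for(eps):
    """Candidate input accuracy: delta = 2*sqrt(3)*eps - eps^2, for 0 < eps < 1."""
    return 2 * ROOT3 * eps - eps**2


def check(eps, samples=1000):
    """Test |sqrt(x) - sqrt(3)| < eps at sampled x with 0 < |x - 3| < delta."""
    d = delta_for(eps)
    for i in range(1, samples):
        for sign in (1, -1):
            x = 3 + sign * d * i / samples  # strictly inside the delta window
            if not abs(math.sqrt(x) - ROOT3) < eps:
                return False
    return True


for eps in (0.5, 0.1, 0.001):
    print(f"eps = {eps}: delta = {delta_for(eps):.6f}, all samples ok = {check(eps)}")
```

The lower bound is the tight one: taking the larger distance 2 sqrt(3) epsilon + epsilon^2 as delta would let x fall below (sqrt(3) - epsilon)^2 and break the output tolerance.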

Note: In evaluating limits, we almost always rely on the Limit Laws and
other general theorems, without a specific error analysis. The general results
guarantee that the error approaches zero, and this is all we need.

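Proof of Limit Theorems. All the general Limit Laws of §1.6 can be
rigorously proved by error-control analysis. We prove the simplest one:

Sum Law: If $\lim_{x \to a} f(x) = L$ and $\lim_{x \to a} g(x) = M$, then:

\[ \lim_{x \to a} (f(x) + g(x)) = L + M. \]

Proof. Consider any $\varepsilon > 0$. Since we assume $\lim_{x \to a} f(x) = L$ and $\lim_{x \to a} g(x) = M$,
we can require the error tolerance $\tfrac{1}{2}\varepsilon$ for these limits, getting $\delta > 0$ small
enough that $0 < |x - a| < \delta$ forces:

\[ -\tfrac{1}{2}\varepsilon < f(x) - L < \tfrac{1}{2}\varepsilon \quad \text{and} \quad -\tfrac{1}{2}\varepsilon < g(x) - M < \tfrac{1}{2}\varepsilon. \]

Adding these inequalities, we find that $0 < |x - a| < \delta$ also forces:

\[ -\tfrac{1}{2}\varepsilon - \tfrac{1}{2}\varepsilon < (f(x) - L) + (g(x) - M) < \tfrac{1}{2}\varepsilon + \tfrac{1}{2}\varepsilon. \]

Rewriting, this is just $-\varepsilon < (f(x) + g(x)) - (L + M) < \varepsilon$, which is the
desired output error bound.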

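Squeeze Theorem: If $f(x) < g(x) < h(x)$ for all values of $x$ near
$a$ (except perhaps $x = a$), and $\lim_{x \to a} f(x) = \lim_{x \to a} h(x) = L$,
then $\lim_{x \to a} g(x) = L$.

Proof. Consider any $\varepsilon > 0$. Since we assume $\lim_{x \to a} f(x) = L$ and $\lim_{x \to a} h(x) = L$,
we can find a $\delta > 0$ such that $0 < |x - a| < \delta$ forces $-\varepsilon < f(x) - L < \varepsilon$ and
$-\varepsilon < h(x) - L < \varepsilon$. We also know $f(x) < g(x) < h(x)$ provided $0 < |x - a| < \delta$
restricts $x$ close enough to $a$, so:

\[ f(x) - L < g(x) - L < h(x) - L. \]

Then $0 < |x - a| < \delta$ also forces:

\[ -\varepsilon < f(x) - L < g(x) - L \quad \text{and} \quad g(x) - L < h(x) - L < \varepsilon, \]

which gives the desired output accuracy $-\varepsilon < g(x) - L < \varepsilon$ for $g(x)$.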


Substitution Theorem: If $\lim_{x\to b} f(x) = L$ and $\lim_{x\to a} g(x) = b$, and
$g(x) \ne b$ for all $x$ close enough to (but unequal to) $a$, then $\lim_{x\to a} f(g(x)) = L$.
Proof. For any $\varepsilon > 0$, we must find a number $\delta > 0$ such that $0 < |x-a| < \delta$
forces $|f(g(x)) - L| < \varepsilon$.
Take any $\varepsilon > 0$. Since $\lim_{x\to b} f(x) = L$, there is $\delta_1 > 0$ such that
$0 < |y-b| < \delta_1$ forces $|f(y)-L| < \varepsilon$. Also, since $\lim_{x\to a} g(x) = b$, there
exists $\delta_2 > 0$ such that $0 < |x-a| < \delta_2$ forces $|g(x)-b| < \delta_1$. Now
take $\delta < \delta_2$, and small enough that $0 < |x-a| < \delta$ forces $g(x) \ne b$.
Then we know $0 < |x-a| < \delta$ forces $0 < |g(x)-b| < \delta_1$, which in turn forces
$|f(g(x)) - L| < \varepsilon$, as required.
Math 132 Continuity Stewart 1.8

One of the most basic features of a function is whether it is continuous. Roughly, this means
that a small change in x always leads to a fairly small change in f (x), without instantaneous
jumps. In real-world terms, the position of a particle moving in space is continuous, but the
position displayed in a video could have a gap, making the function jump discontinuously.
This can be made precise by saying that near x = a, the limit of f (x) is f (a):
Definition: A function $f(x)$ is continuous at $x = a$ whenever $\lim_{x\to a} f(x) = f(a)$.
Graphically, a function is continuous whenever the graph y = f (x) proceeds through the
point (a, f (a)) without jumps or holes.

Types of discontinuity. Continuity can fail in several ways:


i. Removable discontinuity: $f(a)$ is undefined, but $\lim_{x\to a} f(x)$ exists.

ii. Removable discontinuity: $f(a)$ is the wrong value, not $\lim_{x\to a} f(x)$.

iii. Jump discontinuity: the left and right limits are unequal, $\lim_{x\to a^+} f(x) \ne \lim_{x\to a^-} f(x)$.

iv. Vertical asymptote: $\lim_{x\to a^+} f(x)$ and $\lim_{x\to a^-} f(x)$ are $\pm\infty$.

v. Oscillation discontinuity: $\lim_{x\to a^+} f(x)$ and $\lim_{x\to a^-} f(x)$ do not exist.

We say $f(x)$ is continuous on an interval whenever it is continuous at each point of the
interval. For the endpoints of a closed interval $x \in [a, b]$, we cannot take two-sided limits
within the interval, so we only require $\lim_{x\to a^+} f(x) = f(a)$ and $\lim_{x\to b^-} f(x) = f(b)$.

Here $[a, b]$ means the set or collection of all numbers $x$ between $a$ and $b$, including the endpoints. The
notation $x \in [a, b]$ means $x$ is an element of the set $[a, b]$, meaning it is one of the numbers between $a$ and
$b$, which means $a \le x \le b$.
Proving continuity. As an example, we prove that the absolute value function f (x) = |x|
is continuous at every point x = a. Intuitively, if we move x by a small amount, then |x|
only changes slightly. This is also clear from the graph y = |x|, which has none of the above
discontinuities: the corner at (0, 0) is a continuous point, since the graph has no break.
Algebraically, we must check that $\lim_{x\to a} f(x) = f(a)$ for all $a$. Now, $f(x) = x$ for
$x \ge 0$ and $f(x) = -x$ for $x \le 0$, so if $a > 0$, then $f(x) = x$ for all $x$ sufficiently close to $a$,
and $\lim_{x\to a} f(x) = \lim_{x\to a} x = a = f(a)$. Similarly for $a < 0$. Finally, for $a = 0$, we have
the one-sided limits $\lim_{x\to 0^+} f(x) = \lim_{x\to 0^+} x = 0$, and $\lim_{x\to 0^-} f(x) = \lim_{x\to 0^-} (-x) = 0$,
which together show the two-sided limit $\lim_{x\to 0} f(x) = 0 = f(0)$.

Domain of continuity. Almost all functions defined by formulas are continuous, except
at points where they are undefined. This follows from our methods for computing limits.
example: Find the points where the following function is continuous:

$$g(x) = \frac{(x^2-3x+1)\sqrt{x+1}}{x-3}.$$

First, we consider the factors outside the square root, repeatedly applying the Limit Laws
from §1.6:

$$\lim_{x\to a}\frac{x^2-3x+1}{x-3} = \frac{(\lim_{x\to a} x)^2 - 3(\lim_{x\to a} x) + 1}{(\lim_{x\to a} x) - 3} = \frac{a^2-3a+1}{a-3},$$

provided the denominator $a-3$ is non-zero; that is, $a \ne 3$. The Limit Laws also give
$\lim_{x\to a}\sqrt{x+1} = \sqrt{a+1}$ provided $a+1 > 0$ to avoid the square root of a negative number;
that is, for $a > -1$. Combining these, we have:

$$\lim_{x\to a}\frac{(x^2-3x+1)\sqrt{x+1}}{x-3} = \lim_{x\to a}\frac{x^2-3x+1}{x-3}\cdot\lim_{x\to a}\sqrt{x+1} = \frac{(a^2-3a+1)\sqrt{a+1}}{a-3},$$

provided both factor limits exist, that is if $a \ne 3$ and $a > -1$. That is, $g(x)$ is continuous
for all these values of $a$. The remaining values are:

$a < -1$, where $g(x)$ is undefined, hence discontinuous;

$a = -1$, where $g(x)$ is continuous, since at the left endpoint of the domain of definition,
we only require the one-sided limit $\lim_{x\to a^+} g(x) = g(a)$;

$a = 3$, where the function clearly has a vertical asymptote, discontinuity of type (iv).

In summary, our $g(x)$ is continuous at every point where it is defined, that is, in the
intervals $[-1, 3) \cup (3, \infty)$. The graph looks like:

The half-open interval notation $[a, b)$ means the set of all numbers $x$ between $a$ and $b$, including the left
endpoint $x = a$ but excluding the right endpoint $x = b$; that is, $a \le x < b$. The infinite interval $(a, \infty)$
means all $x > a$, with $\infty$ indicating no upper bound on the right.
Composing continuous functions. Another way to combine functions f (x) and g(x)
is to compose or chain them, taking the output of g as the input of f to obtain the new
function f (g(x)). Composition also preserves continuity: if g(x) is continuous at x = a,
and f (x) is continuous at x = g(a), then f (g(x)) is continuous at x = a. This follows from
the following theorem:
Composition Law: We have:

$$\lim_{x\to a} f(g(x)) = f\bigl(\lim_{x\to a} g(x)\bigr),$$

provided $f(x)$ is continuous at $x = b$ and $\lim_{x\to a} g(x) = b$.


Proof. (Similar to the Limit Substitution Theorem at the end of Notes 1.7.) Write $L = f(b)$. For any $\varepsilon > 0$,
we must find a number $\delta > 0$ such that $0 < |x-a| < \delta$ forces $|f(g(x)) - L| < \varepsilon$.
Take any $\varepsilon > 0$. Since $f(y)$ is continuous at $y = b$, there is $\delta_1 > 0$ such that $|y-b| < \delta_1$
forces $|f(y)-L| < \varepsilon$. Also, since $\lim_{x\to a} g(x) = b$, there exists $\delta > 0$ such that $0 < |x-a| < \delta$
forces $|g(x)-b| < \delta_1$. Therefore $0 < |x-a| < \delta$ forces $|g(x)-b| < \delta_1$, which in turn forces
$|f(g(x)) - L| < \varepsilon$, as required.

Intermediate Value Theorem:

If $f(x)$ is continuous for $x$ in the interval $[a, b]$, and $r$ is between $f(a)$ and $f(b)$,
either $f(a) \le r \le f(b)$ or $f(a) \ge r \ge f(b)$, then there is a value $c \in [a, b]$ such
that $f(c) = r$.

This says that as the function value f (x) goes continuously from f (a) to f (b), perhaps
rising and falling many times, it must pass through every value r between f (a) and f (b).
Note that this is not necessarily true for a discontinuous function like $g(x)$ above: we
have $g(2) \approx 1.7$, $g(4) \approx 11.2$, and $g(2) < 7 < g(4)$, but there is a vertical asymptote
discontinuity in the interval $[2, 4]$, and there is no $c \in [2, 4]$ with $g(c) = 7$.
However, $g(x)$ is continuous over the interval $[0, 1]$, with $g(0) \approx -0.33$, $g(1) \approx 0.71$, and
$g(0) < 0 < g(1)$, so the Theorem says there must be some $c \in [0, 1]$ with $g(c) = 0$. This is
just the x-intercept visible in the graph.
example: Show that there exists a solution $x = c$ to the equation $\cos(x) = x$. We have no
easy way of solving this equation, but writing $f(x) = \cos(x) - x$, we know that $f(0) = 1$,
$f(\pi) = -1-\pi$, and $f(0) > 0 > f(\pi)$, so by the Theorem there is some $c \in [0, \pi]$ with
$f(c) = 0$, meaning $\cos(c) = c$.
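
The Theorem only asserts that such a $c$ exists, but the same sign-change idea can be used to approximate it. Below is a minimal sketch of the bisection method (a technique not discussed in these notes), assuming $f$ is continuous on the interval and changes sign between the endpoints; the function name `bisect_root` and the tolerance are illustrative choices.

```python
import math

def bisect_root(f, lo, hi, tol=1e-10):
    """Locate a root of a continuous f on [lo, hi], assuming f(lo) and f(hi) have opposite signs."""
    f_lo = f(lo)
    if f_lo * f(hi) > 0:
        raise ValueError("f(lo) and f(hi) must have opposite signs")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        f_mid = f(mid)
        if f_mid == 0:
            return mid
        # Keep the half-interval on which the sign change survives;
        # the Intermediate Value Theorem guarantees a root inside it.
        if f_lo * f_mid < 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return (lo + hi) / 2

c = bisect_root(lambda x: math.cos(x) - x, 0.0, math.pi)
print(c, math.cos(c))
```

Each step halves an interval on which $f$ changes sign, so the printed values of $c$ and $\cos(c)$ agree to many decimal places.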
Math 132 Derivatives Stewart 2.1

Definition of derivative. In Notes 1.4, we saw that instantaneous velocity can be


obtained as a limit of average velocities over shorter and shorter time increments. Also, the
tangent slope of a graph at a given point can be obtained as a limit of secant slopes getting
closer and closer to the point. Both these definitions compute a rate of change: velocity
is the rate of change of position with respect to time, and slope is the rate of vertical rise
with respect to horizontal run.
For any function f (x), we can compute its instantaneous rate of change with respect to
x in analogy with the above examples.
Definition. The derivative of a function $f(x)$ at $x = a$, denoted $f'(a)$, is given by:

$$f'(a) = \lim_{h\to 0}\frac{f(a+h)-f(a)}{h}.$$

Here $f(a+h) - f(a)$ is the change in $f(x)$ from $x = a$ to $x = a+h$, and $h$ is the increment
of $x$. The average rate of change over the interval $[a, a+h]$ is the difference quotient
$\frac{f(a+h)-f(a)}{h}$, and the instantaneous rate of change at $x = a$ is the limit over smaller and
smaller increments, $h \to 0$.
Another way to write this is to substitute $x$ for the endpoint of the interval, $a + h = x$,
approaching $a$ with increment $h = x - a$:

$$f'(a) = \lim_{x\to a}\frac{f(x)-f(a)}{x-a}.$$
In graphical terms, the derivative $f'(a)$ is the slope of the tangent line which touches
the graph $y = f(x)$ at the point $(a, f(a))$, and the equation of the tangent line is
$y = f'(a)(x-a) + f(a)$.
When the limit $f'(a)$ exists, we say $f(x)$ is differentiable at $x = a$. When the limit does
not exist, $f'(a)$ is undefined, and we say $f(x)$ is non-differentiable or singular at $x = a$.
In this case, the function $f(x)$ does not have a well-defined rate of change at $x = a$, and
the graph $y = f(x)$ does not have a single tangent line at $(a, f(a))$. (See Left and Right
Derivatives below.)

Derivatives of standard functions. A derivative is the limit of a small change in $f(x)$
divided by a small change in $x$, so it will always be a difficult limit of the form $\frac00$, with
no defined value if we plug in $h = 0$. To evaluate it, we must find some trick to cancel
vanishing factors in the numerator and denominator, as we have seen in Notes 1.4 and
1.6 (Limits by canceling).
example: Find $f'(2)$ for $f(x) = \frac{1}{x+1}$. We compute the derivative by combining fractions
over a common denominator, then canceling the vanishing factors $\frac hh$:

$$\begin{aligned}
f'(2) &= \lim_{h\to0}\frac{f(2+h)-f(2)}{h} = \lim_{h\to0}\frac{\frac{1}{(2+h)+1}-\frac{1}{2+1}}{h} = \lim_{h\to0}\frac1h\left(\frac{1}{h+3}-\frac13\right)\\
&= \lim_{h\to0}\frac1h\left(\frac{3-(h+3)}{3(h+3)}\right) = \lim_{h\to0}\frac1h\left(\frac{-h}{3(h+3)}\right) = \lim_{h\to0}\frac{-1}{3(h+3)} = \frac{-1}{3(0+3)} = -\frac19.
\end{aligned}$$

Note that h can also be a small negative value, in which case this is the rate of change over [a+h, a].
Let us compare the same calculation with the alternative variable $x = a + h$:

$$\begin{aligned}
f'(2) &= \lim_{x\to2}\frac{f(x)-f(2)}{x-2} = \lim_{x\to2}\frac{\frac{1}{x+1}-\frac{1}{2+1}}{x-2} = \lim_{x\to2}\frac{\;\frac{3-(x+1)}{3(x+1)}\;}{x-2}\\
&= \lim_{x\to2}\frac{2-x}{3(x+1)(x-2)} = \lim_{x\to2}\frac{-1}{3(x+1)} = \frac{-1}{3(2+1)} = -\frac19,
\end{aligned}$$

where we cancel the vanishing factors $\frac{2-x}{x-2} = \frac{-(x-2)}{x-2} = -1$. Graphically, this looks like:

The curve is $y = f(x) = \frac{1}{x+1}$, and the tangent line at $(a, f(a)) = (2, \frac13)$ has slope $f'(2) = -\frac19$,
so its equation is: $y = -\frac19(x-2) + \frac13$.

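example: Find $f'(2)$ for $f(x) = \sqrt{x}$. Our trick is to multiply by a conjugate radical to
liberate $2+h$ from under the $\sqrt{\ }$, then cancel $\frac hh$:

$$\begin{aligned}
f'(2) &= \lim_{h\to0}\frac{\sqrt{2+h}-\sqrt2}{h} = \lim_{h\to0}\frac{\sqrt{2+h}-\sqrt2}{h}\cdot\frac{\sqrt{2+h}+\sqrt2}{\sqrt{2+h}+\sqrt2}\\
&= \lim_{h\to0}\frac{\sqrt{2+h}^{\,2}-\sqrt2^{\,2}}{h(\sqrt{2+h}+\sqrt2)} = \lim_{h\to0}\frac{h}{h(\sqrt{2+h}+\sqrt2)} = \frac{1}{\sqrt{2+0}+\sqrt2} = \frac{1}{2\sqrt2}.
\end{aligned}$$

Here we used the identity $(a-b)(a+b) = a^2 - b^2$ with $a = \sqrt{2+h}$ and $b = \sqrt2$.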
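
As a sanity check on the two algebraic answers, the difference quotient can be evaluated for small increments. This is a minimal sketch; the helper names `difference_quotient` and `reciprocal` are illustrative and not part of the notes.

```python
import math

def difference_quotient(f, a, h):
    """Average rate of change of f over the interval from a to a + h."""
    return (f(a + h) - f(a)) / h

def reciprocal(x):
    # The first example above: f(x) = 1 / (x + 1).
    return 1 / (x + 1)

# Small positive and negative increments; compare with -1/9 and 1/(2*sqrt(2)).
for h in (0.1, 0.01, 0.001, -0.001):
    print(h, difference_quotient(reciprocal, 2, h), difference_quotient(math.sqrt, 2, h))

print("exact:", -1 / 9, 1 / (2 * math.sqrt(2)))
```

The quotients should drift toward the exact values as $h$ shrinks, from either side of zero.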

Left and right derivatives. Let us find $f'(1)$ for the function defined by:

$$f(x) = \begin{cases} 2x-1 & \text{for } x \le 1\\ x^2 & \text{for } x \ge 1.\end{cases}$$

Since the function is defined differently on the two sides of $x = 1$, we must compute
one-sided derivative limits, to see if the two-sided limit exists.

$$\lim_{h\to0^+}\frac{f(1+h)-f(1)}{h} = \lim_{h\to0^+}\frac{(1+h)^2-1^2}{h} = \lim_{h\to0^+}\frac{(1+2h+h^2)-1}{h} = \lim_{h\to0^+}\frac{h(2+h)}{h} = 2.$$

$$\lim_{h\to0^-}\frac{f(1+h)-f(1)}{h} = \lim_{h\to0^-}\frac{(2(1+h)-1)-1}{h} = \lim_{h\to0^-}\frac{2+2h-2}{h} = 2.$$

Since these one-sided limits agree, we have $f'(1) = \lim_{h\to0}\frac{f(1+h)-f(1)}{h} = 2$. The graph is:

The graph clearly has a transition at $x = 1$, but it is continuous and has well-defined slope.
On the other hand, if we take:

$$g(x) = \begin{cases} 1 & \text{for } x \le 1\\ x^2 & \text{for } x \ge 1\end{cases}$$

then we can compute that $\lim_{h\to0^+}\frac{g(1+h)-g(1)}{h} = 2$, but $\lim_{h\to0^-}\frac{g(1+h)-g(1)}{h} = 0$, which
means the graph has two different tangent slopes to the left and right of this point, namely
it has a corner:

That is, the derivative $g'(1)$ does not exist, and $g(x)$ is non-differentiable at $x = 1$. This
function could model the distance fallen by an object held still, then thrown down with
speed 2 at time t = 1. Before dropping, the speed is 0; an instant after, the speed is 2;
and there is no well-defined speed at the moment of throwing. (A more detailed analysis
would take into account the gradual acceleration during the throw, which would round off
the corner of the graph.)
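
The two glued functions above can be compared numerically by computing left and right difference quotients with a small step. This is a rough sketch only; the helper `one_sided_slopes` and the step size are illustrative assumptions.

```python
def f(x):
    # First example: 2x - 1 to the left of 1, x^2 to the right of 1.
    return 2 * x - 1 if x <= 1 else x ** 2

def g(x):
    # Second example: constant 1 to the left of 1, x^2 to the right of 1.
    return 1 if x <= 1 else x ** 2

def one_sided_slopes(func, a, h=1e-6):
    """Return (left, right) difference quotients of func at a for a small step h > 0."""
    left = (func(a - h) - func(a)) / (-h)
    right = (func(a + h) - func(a)) / h
    return left, right

print("f:", one_sided_slopes(f, 1))   # both close to 2, so f'(1) = 2
print("g:", one_sided_slopes(g, 1))   # close to 0 and 2, so g'(1) does not exist
```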
Real-world derivatives. In Notes 1.4, we computed instantaneous velocity as the deriva-
tive of the position function f (t) with respect to time t. For any function which models the
dependence between two real-world variables, the derivative gives the rate of change of the
dependent variable with respect to the independent variable.
example: A rough model of atmospheric pressure $P$ at height $s$ is given by the function:
$P = f(s) = 15c^s$, where $P$ is in pounds per square inch (psi), $s$ is feet above sea level, and
the constant $c = 0.99996$. How quickly does the pressure drop at sea level and at 10,000
feet up?
At sea level $s = 0$ ft, the pressure is $f(0) = 15$ psi (about half the pressure of a car tire),
and the rate of change (psi of pressure change per foot of height upward) is the derivative:

$$f'(0) = \lim_{h\to0}\frac{15c^{0+h}-15c^0}{h} = \lim_{h\to0} 15\,\frac{c^h-1}{h}.$$
In this case, we have no algebraic trick to cancel vanishing factors, so we must be content
with a numerical approximation of the difference quotient. (Since P = f (s) is only an
approximate model anyway, we lose nothing from this further approximation.)

$$\begin{array}{l|cccc}
h & 100 & 10 & 1 & 0.1\\ \hline
15(c^h-1)/h & -0.00059 & -0.00059 & -0.00060 & -0.00060
\end{array}$$

Thus, $f'(0) \approx -0.0006$ psi/ft. This is a negative rate of change because a rise in height gives
a drop in pressure. Thus, for each foot upward, the pressure decreases by approximately
0.0006 psi, so a 1000 ft rise would drop the pressure by about 0.6 psi.
Now at $s = 10{,}000$ ft, the pressure is about $f(10{,}000) \approx 10.1$ psi. Let us write
$a = 10{,}000$, and compute the rate of change as:

$$f'(a) = \lim_{h\to0}\frac{15c^{a+h}-15c^a}{h} = \lim_{h\to0}\frac{15c^ac^h-15c^a}{h} = \lim_{h\to0}15c^a\,\frac{c^h-1}{h} = c^a\lim_{h\to0}15\,\frac{c^h-1}{h}.$$

Now, $c^a = (0.99996)^{10{,}000} \approx 0.67$ and the second factor is the limit we approximated before.
Thus, $f'(a) \approx (0.67)(-0.0006) \approx -0.0004$ psi/ft. That is, at an altitude of 10,000 ft, every
1000 ft rise decreases the pressure by about 0.4 psi.
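
The numerical approximation in this example is easy to reproduce. The sketch below assumes only the model $P = 15c^s$ with $c = 0.99996$ from the text; the function names are illustrative, and the printed digits may differ in the last place from the rounded table entries.

```python
C = 0.99996  # constant c from the model P = 15 * c**s

def pressure(s):
    """Modelled pressure in psi at height s feet."""
    return 15 * C ** s

def pressure_rate(s, h):
    """Difference quotient of the pressure model at height s with increment h (psi per ft)."""
    return (pressure(s + h) - pressure(s)) / h

# Difference quotients at sea level and at 10,000 ft for shrinking increments.
for h in (100, 10, 1, 0.1):
    print(h, pressure_rate(0, h), pressure_rate(10_000, h))
```

The second column should settle near $-0.0006$ psi/ft and the third near $-0.0004$ psi/ft, matching the factor $c^a \approx 0.67$ between the two rates.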
Math 132 Derivative Function Stewart 2.2

In Notes 2.1, we defined the derivative of a function $f(x)$ at $x = a$, namely the number
$f'(a)$. Since this gives an output $f'(a)$ for any input $a$, the derivative defines a function.

Definition: For a function $f(x)$, we define the derivative function $f'(x)$ by:

$$f'(x) = \lim_{h\to0}\frac{f(x+h)-f(x)}{h} = \lim_{z\to x}\frac{f(z)-f(x)}{z-x}.$$

If the limit $f'(a)$ exists for a given $x = a$, we say $f(x)$ is differentiable at $a$;
otherwise $f'(a)$ is undefined, and $f(x)$ is non-differentiable or singular at $a$.

This just repeats the definitions in Notes 2.1, except that we think of the derivative as a
function of the variable $x$, rather than as a numerical value at a particular point $x = a$.
The choice of letters is meant to suggest different kinds of variables, but they do not have
any strict logical meaning: for example, $f(x) = x^2$, $f(a) = a^2$, and $f(t) = t^2$ all define the
same function, and $\lim_{x\to a} f(x) = \lim_{t\to a} f(t) = \lim_{z\to a} f(z)$ are all the same limit.

Differentiation. Another name for derivative is differential. When we compute $f'(x)$, we
differentiate $f(x)$. The process of finding derivatives is differentiation.
As usual for mathematical objects, we can think of derivatives on four levels of meaning.
The real-world meaning of $f'(x)$ is the rate of change of $f(x)$ per unit change in $x$; for
example velocity is the derivative of the position function at time $t$. At the end of Notes 2.1,
we also saw how to compute a numerical approximation of a derivative as the difference
quotient for a small value of $h$ (see also §2.9). In this section, we explore the geometric
meaning as the slopes of the graph $y = f(x)$, and algebraic methods for computing the
limit $f'(x)$.

example: Let $f(x) = x(x-2)$, with graph $y = f(x)$ in blue:

We can sketch the derivative graph $y = f'(x)$ in red, purely from the original graph $y = f(x)$,
without any computation. The slope of the original graph above a given x-value is the height
of the derivative graph above that x-value.
At the minimum $x = 1$, the original graph $y = f(x)$ is horizontal and its slope is zero,
so $f'(1) = 0$, and we plot the point $(1, 0)$ on the derivative graph $y = f'(x)$. To the right
of this point, $y = f(x)$ has positive slope, getting steeper and steeper; so $y = f'(x) > 0$ is
above the x-axis, getting higher and higher. Above $x = 2$, the tangent of $y = f(x)$ has slope
approximately 2 (considering the relative x and y scales), so we plot $(2, 2)$ on $y = f'(x)$.
As we move left from $x = 1$, the graph $y = f(x)$ has negative slope, getting steeper and
steeper, so $y = f'(x) < 0$ is below the x-axis, getting lower and lower. Above $x = 0$, we
estimate $y = f(x)$ to have slope $-2$, and we plot $(0, -2)$ on $y = f'(x)$. Thus, $y = f'(x)$
looks like the red line in the above picture.
Next we differentiate algebraically. For any value of $x$:

$$\begin{aligned}
f'(x) &= \lim_{h\to0}\frac{f(x+h)-f(x)}{h} = \lim_{h\to0}\frac{(x+h)(x+h-2)-x(x-2)}{h}\\
&= \lim_{h\to0}\frac{x^2+2xh+h^2-2x-2h-x^2+2x}{h} = \lim_{h\to0}\frac{2xh-2h+h^2}{h} = \lim_{h\to0}\,(2x-2+h) = 2x-2.
\end{aligned}$$

That is, $f'(x) = 2x - 2$, which agrees with our sketch of the derivative graph.

example: Let $f(x) = x^3 - x$, with graph in blue:

The original graph $y = f(x)$ has a valley with horizontal tangent at $x \approx 0.6$, so the derivative
$f'(0.6) \approx 0$, and we plot the approximate point $(0.6, 0)$ on the derivative graph $y = f'(x)$;
and similarly the hill on $y = f(x)$ corresponds to the point $(-0.6, 0)$ on $y = f'(x)$. Between
these x-values, the slope of $y = f(x)$ is negative, with the slope at $x = 0$ being about $-1$,
so $y = f'(x) < 0$ is below the x-axis, bottoming out at $(0, -1)$.
Algebraically:

$$\begin{aligned}
f'(x) &= \lim_{h\to0}\frac{((x+h)^3-(x+h))-(x^3-x)}{h} = \lim_{h\to0}\frac{(x^3+3x^2h+3xh^2+h^3-x-h)-x^3+x}{h}\\
&= \lim_{h\to0}\frac{3x^2h+3xh^2+h^3-h}{h} = \lim_{h\to0}\,(3x^2+3xh+h^2-1) = 3x^2-1.
\end{aligned}$$

example: Let $f(x) = \sqrt[3]{x}$, the cube root function, with graph in blue:

The slopes of the original graph $y = f(x)$ are all positive, with the same slope above a given
$x$ and its reflection $-x$. Thus the derivative graph $y = f'(x) > 0$ lies above the x-axis, and
it is symmetric across the y-axis (an even function). The slope of $y = f(x)$ gets smaller for
large positive or negative $x$, and it gets steeper and steeper near the origin, with a vertical
tangent at $x = 0$. Thus $y = f'(x)$ approaches the x-axis for large $x$, and shoots up the
y-axis on both sides of $x = 0$, with $f'(0)$ undefined.

Algebraically, we have: $f'(x) = \lim_{h\to0}\frac{\sqrt[3]{x+h}-\sqrt[3]{x}}{h}$. We must liberate $x+h$ from under
the $\sqrt[3]{\ }$, so as to be able to cancel $\frac hh$. In Notes 2.1, we multiplied top and bottom by the
conjugate radical, exploiting the identity $(a-b)(a+b) = a^2-b^2$. Here we have cube roots,
so we use the identity: $(a-b)(a^2+ab+b^2) = a^3-b^3$, taking $a = \sqrt[3]{x+h}$ and $b = \sqrt[3]{x}$:

$$\begin{aligned}
f'(x) &= \lim_{h\to0}\frac{\sqrt[3]{x+h}-\sqrt[3]{x}}{h}\cdot\frac{\sqrt[3]{x+h}^{\,2}+\sqrt[3]{x+h}\,\sqrt[3]{x}+\sqrt[3]{x}^{\,2}}{\sqrt[3]{x+h}^{\,2}+\sqrt[3]{x+h}\,\sqrt[3]{x}+\sqrt[3]{x}^{\,2}}\\
&= \lim_{h\to0}\frac{\sqrt[3]{x+h}^{\,3}-\sqrt[3]{x}^{\,3}}{h\bigl(\sqrt[3]{x+h}^{\,2}+\sqrt[3]{x+h}\,\sqrt[3]{x}+\sqrt[3]{x}^{\,2}\bigr)} = \lim_{h\to0}\frac{x+h-x}{h\bigl(\sqrt[3]{x+h}^{\,2}+\sqrt[3]{x+h}\,\sqrt[3]{x}+\sqrt[3]{x}^{\,2}\bigr)}\\
&= \lim_{h\to0}\frac{1}{\sqrt[3]{x+h}^{\,2}+\sqrt[3]{x+h}\,\sqrt[3]{x}+\sqrt[3]{x}^{\,2}} = \frac{1}{\sqrt[3]{x+0}^{\,2}+\sqrt[3]{x+0}\,\sqrt[3]{x}+\sqrt[3]{x}^{\,2}} = \frac{1}{3\sqrt[3]{x}^{\,2}}.
\end{aligned}$$

In Notes 2.3, we will develop standard rules for computing derivatives, which let us
avoid such complicated limit calculations.
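
The three derivative formulas found in this section can be compared against numerical slopes. The sketch below uses a symmetric difference quotient, $\frac{f(x+h)-f(x-h)}{2h}$, which is a common numerical variant rather than the one-sided quotient in the definition; the helper names are illustrative.

```python
import math

def derivative_estimate(f, x, h=1e-6):
    """Symmetric difference quotient, a numerical stand-in for the limit defining f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

def cube_root(x):
    # Real cube root that also works for negative x.
    return math.copysign(abs(x) ** (1 / 3), x)

# Pairs (function, derivative formula found algebraically in this section).
examples = [
    (lambda x: x * (x - 2), lambda x: 2 * x - 2),
    (lambda x: x ** 3 - x,  lambda x: 3 * x ** 2 - 1),
    (cube_root,             lambda x: 1 / (3 * cube_root(x) ** 2)),
]

for f, f_prime in examples:
    for x in (-2.0, 0.5, 3.0):
        print(x, derivative_estimate(f, x), f_prime(x))
```

The sample points avoid $x = 0$, where the cube root has a vertical tangent and no derivative.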

Continuity Theorem. Here is a basic fact relating derivatives and continuity:

Theorem: If f (x) is differentiable at x = a, then f (x) is also continuous at x = a.


Turning this around, we have the equivalent negative statement (the contrapositive): If $f(x)$
is not continuous at $x = a$, then it is not differentiable at $x = a$. That is, a discontinuity is
also a non-differentiable point (a singularity).
Proof of Theorem: Assume $f(x)$ is differentiable at $x = a$, meaning $f'(a) = \lim_{x\to a}\frac{f(x)-f(a)}{x-a}$
is defined. The Limit Law for Products gives:

$$\lim_{h\to0}\,(f(a+h)-f(a)) = \lim_{h\to0}\frac{f(a+h)-f(a)}{h}\cdot h = \lim_{h\to0}\frac{f(a+h)-f(a)}{h}\cdot\lim_{h\to0}h = f'(a)\cdot 0 = 0.$$

Thus $0 = \lim_{h\to0}[f(a+h)-f(a)] = [\lim_{h\to0}f(a+h)] - f(a)$, and $\lim_{h\to0}f(a+h) = f(a)$, showing that $f(x)$
is continuous at $x = a$.
Math 132 Differentiation Formulas Stewart 2.3

So far, we have seen how various real-world problems (rate of change) and geometric prob-
lems (tangent lines) lead to derivatives. In this section, we will see how to solve such
problems by computing derivatives (differentiating) algebraically.

Notations. We have seen the Newton notation $f'(x)$ for the derivative of $f(x)$. The
alternative Leibniz notation for the derivative is $\frac{df}{dx}$, meant to remind us of the definition of
$f'(x)$ as the limit of difference quotients:

$$f'(x) = \frac{df}{dx} = \lim_{\Delta x\to0}\frac{\Delta f}{\Delta x}.$$

Here $\Delta f = f(x+h) - f(x)$, the difference in $f(x)$ corresponding to the difference
$\Delta x = (x+h) - x = h$. Also, $df$ and $dx$ are meant to suggest very small $\Delta f$ and $\Delta x$, but $\frac{df}{dx}$ is not
literally the quotient of two small quantities, just a complicated symbol meaning the limit
of such quotients.
To illustrate: for $f(x) = x^2$, the formula $f'(x) = (x^2)' = 2x$ can be written in Leibniz
notation as:

$$\frac{df}{dx} = \frac{d}{dx}(x^2) = 2x.$$

The symbol $\frac{df}{dx}$ means the function $f'(x)$; for a particular value of a derivative at $x = a$, we
write $f'(a) = \left.\frac{df}{dx}\right|_{x=a}$. The notation $f' = Df$ is also used, and $f'(x) = Df(x)$.

Basic Derivatives. To compute derivatives without a limit analysis each time, we use
the same strategy as for limits in Notes 1.6: we establish the derivatives of some basic
functions, then we show how to compute the derivatives of sums, products, and quotients
of known functions.
Theorem: (i) For a constant function $f(x) = c$, we have $\frac{d}{dx}(c) = (c)' = 0$.
(ii) For $f(x) = x$, we have $\frac{d}{dx}(x) = (x)' = 1$.
(iii) For $f(x) = x^n$ with $n$ a positive integer (a whole number), we have:
$\frac{d}{dx}(x^n) = (x^n)' = nx^{n-1}$.

Proof: (i) and (ii) follow easily from the definition of $f'(x)$. To prove (iii), we use the
identity: $a^n - b^n = (a-b)(a^{n-1} + a^{n-2}b + a^{n-3}b^2 + \cdots + b^{n-1})$, with $a = x+h$ and $b = x$:

$$\begin{aligned}
(x^n)' &= \lim_{h\to0}\frac{(x+h)^n-x^n}{h} = \lim_{h\to0}\frac{((x+h)-x)\,((x+h)^{n-1}+(x+h)^{n-2}x+\cdots+x^{n-1})}{h}\\
&= \lim_{h\to0}\,\bigl((x+h)^{n-1}+(x+h)^{n-2}x+\cdots+x^{n-1}\bigr) = (x+0)^{n-1}+(x+0)^{n-2}x+\cdots+x^{n-1} = nx^{n-1}.
\end{aligned}$$

This completes the proof.


We never have to repeat the above analysis: we get formulas like $(x^2)' = 2x^1 = 2x$
just from quoting the Theorem. In fact, the basic derivative rule (iii) applies for a power
function $x^p$ with any exponent $p$, not just a whole number:
Theorem: (iv) For $f(x) = x^p$ with $p$ any real number, we have: $\frac{d}{dx}(x^p) = (x^p)' = px^{p-1}$.

$\Delta$ is capital letter delta, the Greek D, standing for difference. The small letter delta is $\delta$.
We will prove this for rational number exponents in Notes 2.5, but assume it for now. The
case of $p = \frac13$ repeats our computation in Notes 2.2:

$$\left(\sqrt[3]{x}\right)' = (x^{1/3})' = \tfrac13x^{-2/3} = \frac{1}{3\sqrt[3]{x^2}}.$$

For $p = -1$, the Theorem gives:

$$\left(\frac1x\right)' = (x^{-1})' = (-1)x^{-1-1} = -\frac{1}{x^2}.$$
Derivative Rules. Suppose the functions $f(x)$, $g(x)$ are differentiable at $x$, so that $f'(x)$
and $g'(x)$ exist. Then we get the following derivatives:
Sum: $(f(x) + g(x))' = f'(x) + g'(x)$.

Difference: $(f(x) - g(x))' = f'(x) - g'(x)$.

Constant Multiple: $(c \cdot f(x))' = c \cdot f'(x)$ for any constant $c$.

Product: $(f(x) \cdot g(x))' = f'(x) \cdot g(x) + f(x) \cdot g'(x)$.

Quotient: $\left(\dfrac{f(x)}{g(x)}\right)' = \dfrac{f'(x)\,g(x) - f(x)\,g'(x)}{g(x)^2}$.

The first three of these Rules, which express the linearity of the derivative operation, are
intuitive and easy to prove. For example the Sum Rule:

$$\begin{aligned}
(f(x)+g(x))' &= \lim_{h\to0}\frac{(f(x+h)+g(x+h))-(f(x)+g(x))}{h} = \lim_{h\to0}\left(\frac{f(x+h)-f(x)}{h}+\frac{g(x+h)-g(x)}{h}\right)\\
&= \lim_{h\to0}\frac{f(x+h)-f(x)}{h}+\lim_{h\to0}\frac{g(x+h)-g(x)}{h} = f'(x)+g'(x).
\end{aligned}$$

Here the third equality follows from the Sum Law for limits in Notes 1.6.
The Product Rule is more complicated.

Warning: The derivative of a product is NOT the product of derivatives.

We obtain the correct formula from a geometric model: consider a rectangle with changing
sides of lengths f (x) and g(x) depending on some variable x, the upper left rectangle below:

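The product $f(x) \cdot g(x)$ is the area, and the derivative $(f(x) \cdot g(x))'$ is the rate of change of
area with respect to a change in $x$. A small increment $\Delta x = h$ leads to some increments
$\Delta f = f(x+h) - f(x)$ and $\Delta g = g(x+h) - g(x)$ in the sides, and the increment of area,
$\Delta(f \cdot g) = f(x+h)g(x+h) - f(x)g(x)$, is equal to the area of the three edge rectangles:

$$\Delta(f\cdot g) = (\Delta f)\,g(x) + f(x)\,(\Delta g) + (\Delta f)(\Delta g).$$

To get the derivative, we divide by $\Delta x$ to get the difference quotient, and send $\Delta x = h \to 0$:

$$\begin{aligned}
(f(x)\cdot g(x))' &= \lim_{\Delta x\to0}\frac{\Delta(f\cdot g)}{\Delta x} = \lim_{\Delta x\to0}\left(\frac{(\Delta f)\,g(x)}{\Delta x}+\frac{f(x)\,(\Delta g)}{\Delta x}+\frac{(\Delta f)(\Delta g)}{\Delta x}\right)\\
&= \left(\lim_{\Delta x\to0}\frac{\Delta f}{\Delta x}\right)g(x) + f(x)\left(\lim_{\Delta x\to0}\frac{\Delta g}{\Delta x}\right) + \left(\lim_{\Delta x\to0}\frac{\Delta f}{\Delta x}\right)\left(\lim_{\Delta x\to0}\Delta g\right)\\
&= f'(x)g(x) + f(x)g'(x) + f'(x)\cdot(0) = f'(x)g(x) + f(x)g'(x).
\end{aligned}$$

Note that the third term, which goes to zero, corresponds to the tiny bottom right rectangle.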
Lastly, we prove the Quotient Rule:

$$\begin{aligned}
\left(\frac{f(x)}{g(x)}\right)' &= \lim_{h\to0}\frac{\frac{f(x+h)}{g(x+h)}-\frac{f(x)}{g(x)}}{h} = \lim_{h\to0}\frac{f(x+h)g(x)-f(x)g(x+h)}{h\cdot g(x+h)\cdot g(x)}\\
&= \lim_{h\to0}\frac{f(x+h)g(x)-f(x)g(x)+f(x)g(x)-f(x)g(x+h)}{h\cdot g(x+h)\cdot g(x)}.
\end{aligned}$$

Here, after putting the expression over a common denominator, we have added and
subtracted the quantity $f(x)g(x)$ in the numerator, leaving the limit unchanged. Our aim is
to factor the first pair and last pair of terms:

$$\begin{aligned}
\left(\frac{f(x)}{g(x)}\right)' &= \lim_{h\to0}\frac{(f(x+h)-f(x))\,g(x)+f(x)\,(g(x)-g(x+h))}{h\cdot g(x+h)\cdot g(x)}\\
&= \lim_{h\to0}\frac{1}{g(x+h)\,g(x)}\left(\frac{f(x+h)-f(x)}{h}\,g(x)-f(x)\,\frac{g(x+h)-g(x)}{h}\right)\\
&= \frac{1}{g(x+0)\,g(x)}\bigl(f'(x)g(x)-f(x)g'(x)\bigr) = \frac{f'(x)g(x)-f(x)g'(x)}{g(x)^2}.
\end{aligned}$$

We have again used several Limit Laws from Notes 1.6. We could give another proof of
the Product Rule in a very similar way.
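
The Product and Quotient Rules can be sanity-checked numerically. In this sketch the functions $f(x) = x^2+1$ and $g(x) = x^3+2$ and the point $x = 1.5$ are illustrative choices, not taken from the notes, and the symmetric difference quotient is used as a numerical stand-in for the limit.

```python
def derivative_estimate(func, x, h=1e-6):
    """Symmetric difference quotient approximating func'(x)."""
    return (func(x + h) - func(x - h)) / (2 * h)

# Illustrative choices (assumptions): f(x) = x^2 + 1 and g(x) = x^3 + 2.
f  = lambda x: x ** 2 + 1
df = lambda x: 2 * x
g  = lambda x: x ** 3 + 2
dg = lambda x: 3 * x ** 2

x = 1.5
product_rule  = df(x) * g(x) + f(x) * dg(x)
quotient_rule = (df(x) * g(x) - f(x) * dg(x)) / g(x) ** 2
wrong_product = df(x) * dg(x)   # the tempting but incorrect "product of derivatives"

product_numeric  = derivative_estimate(lambda t: f(t) * g(t), x)
quotient_numeric = derivative_estimate(lambda t: f(t) / g(t), x)

print(product_numeric, product_rule, wrong_product)
print(quotient_numeric, quotient_rule)
```

The numerical slope of the product matches the Product Rule value and is far from the product of the derivatives, in line with the Warning above.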

Derivative computations. By repeatedly using these Rules, we can quickly compute the
derivatives of most functions.
example: Find $(\sqrt{x})' = \frac{d}{dx}(\sqrt{x})$. Solution: $(\sqrt{x})' = (x^{1/2})' = \tfrac12x^{(1/2)-1} = \tfrac12x^{-1/2} = \frac{1}{2\sqrt{x}}$,
where we used the Basic Derivative $(x^b)' = bx^{b-1}$ with $b = \frac12$.

example: $(\sqrt{10})' = 0$ since the derivative of any constant, even a complicated one, is zero.
example: For $f(x) = (5x^2 + 1)(\sqrt{x} - 3)$, find the derivative $f'(x) = \frac{df}{dx}$:

$$\begin{aligned}
\bigl((5x^2+1)(\sqrt{x}-3)\bigr)' &= (5x^2+1)'\,(\sqrt{x}-3) + (5x^2+1)\,(\sqrt{x}-3)' && \text{by Product Rule}\\
&= \bigl(5(x^2)'+(1)'\bigr)(\sqrt{x}-3) + (5x^2+1)\bigl((\sqrt{x})'-(3)'\bigr) && \text{by Sum \& Const Mult Rules}\\
&= \bigl(5(2x^1)+(0)\bigr)(\sqrt{x}-3) + (5x^2+1)\bigl(\tfrac12x^{-1/2}-(0)\bigr) && \text{by Basic Derivatives}\\
&= 10x(\sqrt{x}-3) + (5x^2+1)\,\frac{1}{2\sqrt{x}} && \text{Tidying up}
\end{aligned}$$

Note how we used the derivative from the previous example, $(\sqrt{x})' = \tfrac12x^{-1/2}$.
Another way to find the same derivative would be to multiply out first:

$$f(x) = (5x^2+1)(\sqrt{x}-3) = 5x^2\sqrt{x} - 15x^2 + \sqrt{x} - 3 = 5x^{5/2} - 15x^2 + x^{1/2} - 3.$$

Then we get the derivative:

$$f'(x) = 5\bigl(\tfrac52x^{(5/2)-1}\bigr) - 15(2x^1) + \tfrac12x^{(1/2)-1} - 0 = \tfrac{25}{2}x\sqrt{x} - 30x + \frac{1}{2\sqrt{x}}.$$

This agrees with our previous answer, multiplied out.
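example: Differentiate $g(t) = \dfrac{t^5+1}{t\sqrt{t}}$. Solution by the Quotient Rule:

$$g'(t) = \frac{dg}{dt} = \left(\frac{t^5+1}{t\sqrt{t}}\right)' = \frac{(t^5+1)'\,(t\sqrt{t}) - (t^5+1)\,(t\sqrt{t})'}{(t\sqrt{t})^2} = \frac{(5t^4)(t\sqrt{t}) - (t^5+1)\bigl(\tfrac32t^{1/2}\bigr)}{t^3},$$

where we use $t\sqrt{t} = t^{3/2}$.
Solution by multiplying out: $\frac{1}{t\sqrt{t}} = t^{-3/2}$, so:

$$g(t) = (t^5+1)\,t^{-3/2} = t^{7/2} + t^{-3/2} \quad\text{and}\quad g'(t) = \tfrac72t^{5/2} - \tfrac32t^{-5/2}.$$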
Example: A block of ice has length 10 cm, width 5 cm, and height 20 cm. Its length and
width are melting at a rate of 1 cm per hour, but its height is melting at 2 cm per hour
(because the ground is warmer than the air). How fast is the volume decreasing?
Solution: The volume is $V = \ell wh$ cm$^3$, where $V, \ell, w, h$ are all functions of time $t$. To
get the rate of change, we compute the derivative using the Product Rule twice, considering
$\ell wh = (\ell)(wh)$:

$$\frac{dV}{dt} = V' = (\ell wh)' = (\ell)'(wh) + (\ell)(wh)' = \ell'wh + \ell(w'h + wh') = \ell'wh + \ell w'h + \ell wh'.$$

We want the melt rate at the current time $t = 0$, and we are given: $\ell(0) = 10$ cm with
$\ell'(0) = -1$ cm/hr; $w(0) = 5$ cm with $w'(0) = -1$ cm/hr; and $h(0) = 20$ cm with $h'(0) = -2$
cm/hr. Thus:

$$\begin{aligned}
V'(0) &= \ell'(0)w(0)h(0) + \ell(0)w'(0)h(0) + \ell(0)w(0)h'(0)\\
&= (-1)(5)(20) + (10)(-1)(20) + (10)(5)(-2) = -400 \text{ cm}^3/\text{hr}.
\end{aligned}$$
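
The melt-rate computation can be cross-checked with a short script. This sketch assumes, purely for the numerical check, that each side keeps shrinking at its stated constant rate for a short time after $t = 0$; the function names are illustrative.

```python
def volume_rate(l, w, h, dl, dw, dh):
    """Rate of change of V = l*w*h by the three-factor Product Rule."""
    return dl * w * h + l * dw * h + l * w * dh

rate = volume_rate(l=10, w=5, h=20, dl=-1, dw=-1, dh=-2)

def volume(t):
    # Assumption for the check: sides shrink linearly at the given rates (cm, hours).
    return (10 - t) * (5 - t) * (20 - 2 * t)

step = 1e-6
average_rate = (volume(step) - volume(0)) / step   # average rate over a very short time

print(rate, average_rate)
```

Both numbers should be close to $-400$ cm$^3$/hr, the first exactly and the second approximately.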
Higher derivatives. Since the derivative operation turns a function $f(x)$ into another
function $f'(x)$, we can do it again to $f'(x)$, obtaining yet another function denoted
$f''(x) = (f'(x))'$ or $\frac{d^2f}{dx^2} = \frac{d}{dx}\left(\frac{df}{dx}\right)$, called the second derivative of $f(x)$.
In real-world terms, if $f'(x)$ is the rate of change of $f(x)$, then $f''(x)$ is the rate of
change of $f'(x)$, namely how much the rate $f'(x)$ is speeding up or slowing down.
Example: A stone falls $f(t) = 16t^2$ ft in $t$ seconds. Compute the repeated derivatives of
this function, and interpret their physical meaning.
The first derivative is $f'(t) = (16t^2)' = 16(2t^1) = 32t$ ft/sec. This is the velocity
$v(t) = f'(t) = 32t$ ft/sec, increasing proportional to time.
The second derivative is $f''(t) = (32t)' = 32$, with units ft/sec per sec = ft/sec$^2$. It
means the rate of change of velocity, how many ft/sec of speed the stone gains each
second. This is the acceleration of the stone, $a(t) = f''(t) = 32$ ft/sec$^2$, the constant
acceleration due to gravity.
The third derivative $f'''(t) = (32)' = 0$, meaning the rate of change of a constant
acceleration is zero. The physics term for this quantity is the jerk, and we see that
gravity does not jerk: it pulls smoothly. All higher derivatives are also zero; these do
not have names.
Math 132 Trigonometric Derivatives Stewart 2.4

Derivative of sine and cosine. The sine and cosine are important functions describing
periodic motion. From the graph $y = \sin(x)$ (in blue), let us examine the slope at each
point to sketch the graph of the derivative $y = \sin'(x)$ (in red), as in Notes 2.3:

The graph $y = \sin(x)$ has hills and valleys at $x = \pm\frac12\pi, \pm\frac32\pi, \pm\frac52\pi, \ldots$, so $\sin'(x) = 0$
at these points. For the interval $-\frac12\pi < x < \frac12\pi$, the slope of $y = \sin(x)$ is positive, so
$y = \sin'(x) > 0$ swells above the x-axis, and similarly on the other intervals. The graph we
have drawn seems to be roughly the cosine function, so we may guess that:

$$\sin'(x) \overset{??}{=} \cos(x).$$

To prove this, we need two lemmas (minor theorems which help to prove a major one):
lemma: (a) $\lim_{\theta\to0}\dfrac{\sin(\theta)}{\theta} = 1$  (b) $\lim_{\theta\to0}\dfrac{\cos(\theta)-1}{\theta} = 0$.
Proof of (a): This is a difficult limit of the form $\frac00$.
Consider a sector $OPQ$ of $\theta$ radians:
this means a pie-slice of radius $r = 1$, whose circular outer rim has length $\theta$. (For example,
$\theta = 2\pi$ would mean a full circle.)

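The area of the sector is proportional to the angle, increasing from $A = 0$ for $\theta = 0$, to
$A = \pi r^2 = \pi$ for $\theta = 2\pi$, so it is $A = \frac12\theta$ for arbitrary $\theta$. From basic trigonometry, we
know that the height of the triangle $\triangle OQP$ (inside the sector) is $\sin(\theta)$, so its area is
$\frac12(\text{base})(\text{height}) = \frac12(1)\sin(\theta) = \frac12\sin(\theta)$. Also, the height of the triangle $\triangle OQT$ (outside
the sector) is $\tan(\theta)$, so its area is $\frac12\tan(\theta)$. We have:

$$\begin{aligned}
\text{area}(\triangle OQP) &\le \text{area(sector)} \le \text{area}(\triangle OQT)\\
\tfrac12\sin(\theta) &\le \tfrac12\theta \le \tfrac12\tan(\theta)\\
\sin(\theta) &\le \theta \le \tan(\theta) = \frac{\sin(\theta)}{\cos(\theta)}.
\end{aligned}$$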

The tangent function gets its name because it is the length of the segment QT tangent to the circle.
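Multiplying the left inequality by $\frac1\theta$ gives: $\frac{\sin(\theta)}{\theta} \le 1$. Multiplying the right inequality by
$\frac{\cos(\theta)}{\theta}$ gives: $\cos(\theta) \le \frac{\sin(\theta)}{\theta}$. Thus:

$$\cos(\theta) \le \frac{\sin(\theta)}{\theta} \le 1.$$

That is, the graph $y = \frac{\sin(\theta)}{\theta}$ is trapped between $y = \cos(\theta)$ and $y = 1$. But
$\lim_{\theta\to0}\cos(\theta) = \cos(0) = 1$, and $\lim_{\theta\to0}1 = 1$, so the Squeeze Theorem (Notes 1.6) implies
$\lim_{\theta\to0}\frac{\sin(\theta)}{\theta} = 1$, as desired.
Proof of (b): We compute:

$$\frac{\cos(\theta)-1}{\theta} = \frac{\cos(\theta)-1}{\theta}\cdot\frac{\cos(\theta)+1}{\cos(\theta)+1} = \frac{\cos^2(\theta)-1^2}{\theta\,(\cos(\theta)+1)} = \frac{-\sin^2(\theta)}{\theta\,(\cos(\theta)+1)} = -\frac{\sin(\theta)}{\theta}\cdot\frac{\sin(\theta)}{\cos(\theta)+1}.$$

Hence by the Product Limit Law:

$$\lim_{\theta\to0}\frac{\cos(\theta)-1}{\theta} = -\lim_{\theta\to0}\frac{\sin(\theta)}{\theta}\cdot\lim_{\theta\to0}\frac{\sin(\theta)}{\cos(\theta)+1} = -1\cdot\frac{0}{1+1} = 0,$$

where we used the result established in (a).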
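
Both lemmas, and the squeeze inequality used to prove (a), can be observed numerically for small angles. This is an illustrative sketch with hypothetical helper names; the angles are in radians.

```python
import math

def sinc_ratio(theta):
    """sin(theta) / theta for theta != 0 (theta in radians)."""
    return math.sin(theta) / theta

def cos_ratio(theta):
    """(cos(theta) - 1) / theta for theta != 0 (theta in radians)."""
    return (math.cos(theta) - 1) / theta

for theta in (1.0, 0.1, 0.01, 0.001):
    # Columns: theta, cos(theta), sin(theta)/theta, (cos(theta) - 1)/theta.
    # The middle ratio should sit between cos(theta) and 1 on every row.
    print(theta, math.cos(theta), sinc_ratio(theta), cos_ratio(theta))
```

As $\theta$ shrinks, the third column approaches 1 and the fourth approaches 0, as Lemmas (a) and (b) assert.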


theorem: $\sin'(x) = \cos(x)$ and $\cos'(x) = -\sin(x)$.
Proof: Starting with the definition of the derivative, and simplifying by the Angle Addition
Formula for sine, we have:

$$\begin{aligned}
\sin'(x) &= \lim_{h\to0}\frac{\sin(x+h)-\sin(x)}{h} = \lim_{h\to0}\frac{\sin(x)\cos(h)+\cos(x)\sin(h)-\sin(x)}{h}\\
&= \lim_{h\to0}\frac{\sin(x)(\cos(h)-1)+\cos(x)\sin(h)}{h} = \sin(x)\lim_{h\to0}\frac{\cos(h)-1}{h}+\cos(x)\lim_{h\to0}\frac{\sin(h)}{h}\\
&= \sin(x)\cdot(0)+\cos(x)\cdot(1) = \cos(x).
\end{aligned}$$

We used Lemmas (a) and (b) to get the last line. The proof of $\cos'(x) = -\sin(x)$ is similar.

General trigonometric derivatives. From these basic derivatives, we can compute the
derivative of any trig function or combination of trig functions.
example: Compute the derivative of $\tan(x)$. The Quotient Rule for derivatives (Notes
2.3) gives:

$$\begin{aligned}
\tan'(x) &= \left(\frac{\sin(x)}{\cos(x)}\right)' = \frac{\sin'(x)\cos(x)-\sin(x)\cos'(x)}{\cos^2(x)}\\
&= \frac{\cos(x)\cos(x)-\sin(x)(-\sin(x))}{\cos^2(x)} = \frac{1}{\cos^2(x)} = \sec^2(x),
\end{aligned}$$

since $\cos^2(x) + \sin^2(x) = 1$. In fact, we get the following derivatives:

$$\begin{array}{l|cccccc}
f(x) & \sin(x) & \cos(x) & \tan(x) & \sec(x) & \csc(x) & \cot(x)\\ \hline
f'(x) & \cos(x) & -\sin(x) & \sec^2(x) & \tan(x)\sec(x) & -\cot(x)\csc(x) & -\csc^2(x)
\end{array}$$

Warning: These formulas are only valid if the angle $x$ is in radians, not degrees.
Limits of quotients. We can also compute trigonometric limits of the form $\frac00$. The trick
is to manipulate the numerators and denominators to get factors of the form $\frac{\sin(g(x))}{g(x)}$, where
$g(x)$ is any quantity which goes to zero.
example: Compute $\lim_{x\to0}\frac{\sin(3x)}{x}$. We have:

$$\lim_{x\to0}\frac{\sin(3x)}{x} = \lim_{x\to0}\frac{\sin(3x)}{3x}\cdot\frac{3x}{x} = \lim_{x\to0}\frac{\sin(3x)}{3x}\cdot\lim_{x\to0}\frac{3x}{x} = \lim_{h\to0}\frac{\sin(h)}{h}\cdot\lim_{x\to0}3 = 1\cdot3 = 3.$$

Here we use $\lim_{x\to0}\frac{\sin(3x)}{3x} = \lim_{h\to0}\frac{\sin(h)}{h} = 1$, where we substitute $h = g(x) = 3x$, so that
$x \to 0$ forces $h \to 0$.
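example: Compute $\lim_{x\to 0^+}\frac{\tan(x)}{\sin(\sqrt{x})}$. (We take $x\to 0^+$ so that $\sqrt{x}$ is defined.) Starting with $\tan(x) = \frac{\sin(x)}{\cos(x)}$, we get:
\[
\begin{aligned}
\lim_{x\to 0^+}\frac{\tan(x)}{\sin(\sqrt{x})}
&= \lim_{x\to 0^+}\frac{1}{\cos(x)}\cdot\sin(x)\cdot\frac{1}{\sin(\sqrt{x})} \\
&= \lim_{x\to 0^+}\frac{1}{\cos(x)}\cdot\frac{\sin(x)}{x}\cdot x\cdot\frac{1}{\;\frac{\sin(\sqrt{x})}{\sqrt{x}}\;}\cdot\frac{1}{\sqrt{x}} \\
&= \lim_{x\to 0^+}\frac{\sqrt{x}}{\cos(x)}\cdot\frac{\sin(x)}{x}\cdot\frac{1}{\;\frac{\sin(\sqrt{x})}{\sqrt{x}}\;}
= \frac{0}{\cos(0)}\cdot 1\cdot\frac{1}{1} = 0,
\end{aligned}
\]
where $\lim_{x\to 0^+}\frac{\sin(\sqrt{x})}{\sqrt{x}} = 1$ by the substitution $h = g(x) = \sqrt{x}$.

Substitutions of this kind are justified by the Limit Substitution Theorem at the end of Notes 1.7.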
Math 132 The Chain Rule Stewart 2.5

Chain of functions. On a Ferris wheel, your height $H$ (in feet) depends on the angle $\theta$ of the wheel (in radians): $H = 100 + 100\sin(\theta)$. The wheel is turning at one revolution per minute, meaning the angle at $t$ minutes is $\theta = 2\pi t$ radians. At $t = \frac{1}{12}$, we have $\theta = \frac{\pi}{6}$ and:
\[
H = 100 + 100\sin(2\pi t) = 100 + 100\sin(\tfrac{\pi}{6}) = 150 \text{ ft}.
\]
At this moment, how fast are you rising (in ft/min)?


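The answer is given by the Chain Rule, which computes the derivative for a chain of functional dependencies: one variable $H$ depends on a second variable $\theta$, which depends on a third variable $t$. The Rule states:
\[
\frac{dH}{dt} = \frac{dH}{d\theta}\cdot\frac{d\theta}{dt},
\qquad\text{with units}\qquad
\frac{\text{ft}}{\text{min}} = \frac{\text{ft}}{\text{rad}}\cdot\frac{\text{rad}}{\text{min}}.
\]
The rate of change of height with respect to angle is:
\[
\frac{dH}{d\theta} = \frac{d}{d\theta}\big(100 + 100\sin(\theta)\big) = 0 + 100\sin'(\theta)
= 100\cos(\theta) = 100\cos(\tfrac{\pi}{6}) \approx 86.6\ \frac{\text{ft}}{\text{rad}}.
\]
The rate of change of angle with respect to time is:
\[
\frac{d\theta}{dt} = \frac{d}{dt}(2\pi t) = 2\pi \approx 6.28\ \frac{\text{rad}}{\text{min}}.
\]
Thus, the Chain Rule says the rate of change of height with respect to time is the product:
\[
\frac{dH}{dt} \approx 86.6\ \frac{\text{ft}}{\text{rad}}\cdot 6.28\ \frac{\text{rad}}{\text{min}} \approx 544\ \frac{\text{ft}}{\text{min}}.
\]
Your rate of rise is about 544 feet per minute, at time $t = \frac{1}{12}$.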
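The Ferris wheel computation can be replayed numerically. The sketch below multiplies the two rates as the Chain Rule prescribes, then compares the product with a direct difference quotient of the composite function. The function names `height` and `angle` and the step size are our own choices.

```python
import math

def height(theta):
    """Height in feet as a function of the wheel angle in radians."""
    return 100 + 100 * math.sin(theta)

def angle(t):
    """Wheel angle in radians after t minutes (one revolution per minute)."""
    return 2 * math.pi * t

t0 = 1 / 12

# Chain Rule: dH/dt = (dH/dtheta) * (dtheta/dt)
dH_dtheta = 100 * math.cos(angle(t0))   # ft per radian
dtheta_dt = 2 * math.pi                 # radians per minute
dH_dt = dH_dtheta * dtheta_dt           # ft per minute

# Independent check: central difference quotient of H(theta(t)) at t0.
h = 1e-6
numeric = (height(angle(t0 + h)) - height(angle(t0 - h))) / (2 * h)

print(dH_dtheta, dtheta_dt, dH_dt, numeric)
```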

Chain Rule: Let $y, u, x$ be variables related by $y = f(u)$ and $u = g(x)$, so that $y = f(g(x))$. Then, in Leibnitz notation:
\[
\frac{dy}{dx} = \frac{dy}{du}\cdot\frac{du}{dx},
\]
or in Newton notation:
\[
f(g(x))' = f'(g(x))\cdot g'(x).
\]
This holds at any value of $x$ where $g'(x)$ and $f'(g(x))$ are both defined.
The function $f(g(x))$ is called the composition of $f$ following $g$, sometimes denoted $f\circ g$, so that we may write $f(g(x))'$ as $(f\circ g)'(x)$.
Proof of the Chain Rule. We will prove the Rule with the extra assumption that $g(x)$ is a one-to-one function near a given $x = a$: that is, for $x$ close enough (but unequal) to $a$, we have $g(x) \neq g(a)$. (For a general proof without this assumption, see the Stewart text 2.5, p. 153.) Then we compute, using the alternative definition of derivative:
\[
\begin{aligned}
(f\circ g)'(a) = \frac{d}{dx}f(g(x))\Big|_{x=a}
&= \lim_{x\to a}\frac{f(g(x))-f(g(a))}{x-a} \\
&= \lim_{x\to a}\frac{f(g(x))-f(g(a))}{g(x)-g(a)}\cdot\lim_{x\to a}\frac{g(x)-g(a)}{x-a} \\
&= \lim_{u\to g(a)}\frac{f(u)-f(g(a))}{u-g(a)}\cdot\lim_{x\to a}\frac{g(x)-g(a)}{x-a} \\
&= f'(g(a))\cdot g'(a).
\end{aligned}
\]
Here we used the Limit Substitution Theorem from Notes 1.7, substituting $u$ for $g(x)$ so that $x\to a$ forces $u\to g(a)$. (Since $g(x)$ is differentiable at $x = a$, it is also continuous.)

Differentiation Rules. Along with our previous Derivative Rules from Notes 2.3, and
the Basic Derivatives from Notes 2.3 and 2.4, the Chain Rule is the last fact needed to
compute the derivative of any function defined by a formula.
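example: Find the derivative of $(x + x^{-1})^{10}$. First, we use Leibnitz notation: let $y = u^{10}$ and $u = x + x^{-1}$, so that $y = (x + x^{-1})^{10}$. Then:
\[
\begin{aligned}
\frac{dy}{dx} = \frac{dy}{du}\cdot\frac{du}{dx}
&= \frac{d}{du}(u^{10})\cdot\frac{d}{dx}\big(x + x^{-1}\big)
= 10u^9\cdot\frac{d}{dx}\big(x + x^{-1}\big) \\
&= 10\big(x + x^{-1}\big)^9\big(1 + (-1)x^{-2}\big)
= 10\Big(x + \frac{1}{x}\Big)^9\Big(1 - \frac{1}{x^2}\Big).
\end{aligned}
\]
Next, we redo this in Newton notation, without introducing new letters $y, u$. Let $f(x) = x^{10}$ with $f'(x) = 10x^9$, and $g(x) = x + x^{-1} = x + \frac{1}{x}$ with $g'(x) = 1 - x^{-2} = 1 - \frac{1}{x^2}$, so that:
\[
f(g(x))' = f'(g(x))\cdot g'(x) = 10\Big(x + \frac{1}{x}\Big)^9\Big(1 - \frac{1}{x^2}\Big).
\]
A third way (the quickest in practice) is to think of the composite function as an outside function $\text{out} = (\ )^{10}$ wrapped around an inside function $\text{in} = x + x^{-1}$, so the Chain Rule becomes:
\[
\text{out}(\text{in})' = \text{out}'(\text{in})\cdot\text{in}'.
\]
Here $\text{out}' = 10(\ )^9$, so:
\[
\text{out}(\text{in})' = 10\big(x + x^{-1}\big)^9\cdot\big(x + x^{-1}\big)' = 10\Big(x + \frac{1}{x}\Big)^9\Big(1 - \frac{1}{x^2}\Big).
\]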
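A quick way to catch algebra slips in a Chain Rule computation is to compare the formula with a difference quotient at a sample point. This is a sketch only: the helper `central_difference`, the step size, and the sample point x = 2 are our own assumptions.

```python
def f(x):
    """The composite function (x + 1/x)^10."""
    return (x + 1 / x) ** 10

def f_prime(x):
    """Chain Rule result: 10 (x + 1/x)^9 (1 - 1/x^2)."""
    return 10 * (x + 1 / x) ** 9 * (1 - 1 / x ** 2)

def central_difference(func, x, h=1e-6):
    """Symmetric difference quotient approximating func'(x)."""
    return (func(x + h) - func(x - h)) / (2 * h)

x0 = 2.0
exact = f_prime(x0)
approx = central_difference(f, x0)
# A relative error near zero suggests the formula is right at this point.
print(exact, approx, abs(approx - exact) / abs(exact))
```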


example: For any function $u = g(x)$, and any number $n$, we have:
\[
\frac{d}{dx}(u^n) = n\,u^{n-1}\,\frac{du}{dx}
\qquad\text{and}\qquad
\big(g(x)^n\big)' = n\,g(x)^{n-1}\cdot g'(x).
\]

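example: Find the derivative of $\frac{1}{\sqrt{x\cos(x)}}$. Here the outer function is $\text{out} = \frac{1}{\sqrt{(\ )}} = (\ )^{-1/2}$ with $\text{out}' = -\frac{1}{2}(\ )^{-3/2}$. Thus:
\[
\begin{aligned}
\left(\frac{1}{\sqrt{x\cos(x)}}\right)'
&= -\tfrac{1}{2}\big(x\cos(x)\big)^{-3/2}\cdot\big(x\cos(x)\big)' \\
&= -\tfrac{1}{2}\big(x\cos(x)\big)^{-3/2}\big((x)'\cos(x) + x\cos'(x)\big)
= -\tfrac{1}{2}\big(x\cos(x)\big)^{-3/2}\big(\cos(x) - x\sin(x)\big).
\end{aligned}
\]
Here we used the Chain Rule, then the Product Rule.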
example: Compare the derivatives of $\sin(x^2)$ and $\sin^2(x)$. Note that if $f(x) = \sin(x)$ and $g(x) = x^2$, we have $\sin(x^2) = f(g(x))$, but $\sin^2(x) = g(f(x))$. Thus:
\[
\begin{aligned}
\big(\sin(x^2)\big)' &= \sin'(x^2)\cdot(x^2)' = \cos(x^2)\cdot 2x = 2x\cos(x^2), \\
\big(\sin^2(x)\big)' &= \big((\sin(x))^2\big)' = 2(\sin(x))\cdot\sin'(x) = 2\sin(x)\cos(x).
\end{aligned}
\]
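example: Find the derivative of $\sin\!\big(\tan\!\big(\frac{x}{x+1}\big)\big)$, a composition of three functions. We start by applying the Chain Rule to the outermost function $\sin(\ )$, with inner function $\tan\!\big(\frac{x}{x+1}\big)$; then we use the Chain Rule again on this.
\[
\begin{aligned}
\Big(\sin\!\Big(\tan\!\Big(\frac{x}{x+1}\Big)\Big)\Big)'
&= \sin'\!\Big(\tan\!\Big(\frac{x}{x+1}\Big)\Big)\cdot\Big(\tan\!\Big(\frac{x}{x+1}\Big)\Big)' \\
&= \sin'\!\Big(\tan\!\Big(\frac{x}{x+1}\Big)\Big)\cdot\tan'\!\Big(\frac{x}{x+1}\Big)\cdot\Big(\frac{x}{x+1}\Big)' \\
&= \sin'\!\Big(\tan\!\Big(\frac{x}{x+1}\Big)\Big)\cdot\tan'\!\Big(\frac{x}{x+1}\Big)\cdot\frac{(x)'(x+1) - x(x+1)'}{(x+1)^2} \\
&= \cos\!\Big(\tan\!\Big(\frac{x}{x+1}\Big)\Big)\cdot\sec^2\!\Big(\frac{x}{x+1}\Big)\cdot\frac{(x+1) - x}{(x+1)^2}.
\end{aligned}
\]
The last factor uses the Quotient Rule.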


example: What if we apply the Chain Rule to a complicated constant like $\pi^3$, where we consider $x^3$ as the outside function and the constant function $p(x) = \pi$ as the inside? Then:
\[
\big(\pi^3\big)' = 3\pi^2\cdot(\pi)' = 3\pi^2\cdot 0 = 0,
\]
since $(\pi)' = c' = 0$. Any expression with no variable in it is constant, with derivative zero.

Derivative of a Power Function. In Notes 2.3, we proved the Basic Derivative of a power function, $(x^p)' = p\,x^{p-1}$, when the exponent is $p = n$, a positive integer (a whole number); but we stated that this formula is valid for any real number $p$. Assuming the Chain Rule and Quotient Rule (which we have proved), we now prove the Basic Derivative when the exponent is $p = \frac{n}{m}$, a rational number (a fraction):
\[
\big(x^{n/m}\big)' = \tfrac{n}{m}\,x^{(n/m)-1}, \quad\text{where } n, m \text{ are integers with } m \neq 0.
\]
(If $m$ is even, this only makes sense on the domain $x > 0$.)

Proof. First, we prove the formula for exponent $p = -n$, a negative integer power, using the Quotient Rule:
\[
\big(x^{-n}\big)' = \Big(\frac{1}{x^n}\Big)' = \frac{(1)'\,x^n - (1)(x^n)'}{(x^n)^2} = \frac{-n\,x^{n-1}}{x^{2n}} = (-n)\,x^{-n-1}.
\]
Thus, we have proved the formula $(x^p)' = p\,x^{p-1}$ for $p$ a positive or negative integer.

Next, consider the equation $(x^{n/m})^m = x^n$ where $n, m$ are positive or negative integers. We take derivatives of both sides, and expand the left side by the Chain Rule, and both sides by the Basic Derivative for integer exponents:
\[
\begin{aligned}
\big((x^{n/m})^m\big)' &= (x^n)' \\
m\,(x^{n/m})^{m-1}\cdot\big(x^{n/m}\big)' &= n\,x^{n-1}.
\end{aligned}
\]
Solving for $(x^{n/m})'$:
\[
\big(x^{n/m}\big)' = \frac{n\,x^{n-1}}{m\,(x^{n/m})^{m-1}} = \tfrac{n}{m}\,x^{\,n-1-(n/m)(m-1)} = \tfrac{n}{m}\,x^{(n-m)/m} = \tfrac{n}{m}\,x^{(n/m)-1}.
\]
This proves $(x^p)' = p\,x^{p-1}$ for all fractional powers $p = \frac{n}{m}$.

It takes a more sophisticated theory to prove it for irrational real number powers such as $x^{\sqrt{2}}$. This is the kind of subject addressed in Real Analysis, Math 320.

Degrees versus radians. In higher mathematics, we always use radian measure (full circle = $2\pi$ radians), so that $\sin(x)$ always means sine of $x$ radians. This is essential to get the formula $\sin'(x) = \cos(x)$.
The sine with input $x$ in degrees (full circle = 360 deg) is actually a different function, which we can denote as $\sin_{\deg}(x)$. Remember that a function is a rule which converts input numbers to output numbers: it does not know that we interpret some numbers as angles, or what their units should be. Since $\sin(x)$ and $\sin_{\deg}(x)$ produce different outputs from a given number $x$, they are different functions. In fact, we have:
\[
\sin_{\deg}(x) = \sin\!\big(\tfrac{2\pi}{360}\,x\big).
\]
The inside operation converts $x$ from degrees to radians, then feeds this into the ordinary (radian) sine function.
This makes a crucial difference in the derivative:
\[
\sin_{\deg}'(x) = \Big(\sin\!\big(\tfrac{2\pi}{360}\,x\big)\Big)'
= \cos\!\big(\tfrac{2\pi}{360}\,x\big)\cdot\big(\tfrac{2\pi}{360}\,x\big)'
= \cos_{\deg}(x)\cdot\tfrac{2\pi}{360}.
\]
This is why we stay away from degree measure in calculus!
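The extra factor in the degree-based derivative is easy to see numerically. The sketch below defines degree versions of sine and cosine (the names `sin_deg` and `cos_deg` mirror the notation in the text, while `central_difference` and the sample angle of 30 degrees are our own choices) and compares a difference quotient with both candidate formulas.

```python
import math

DEG_TO_RAD = 2 * math.pi / 360  # radians per degree

def sin_deg(x):
    """Sine of an angle given in degrees."""
    return math.sin(DEG_TO_RAD * x)

def cos_deg(x):
    """Cosine of an angle given in degrees."""
    return math.cos(DEG_TO_RAD * x)

def central_difference(f, x, h=1e-5):
    """Symmetric difference quotient approximating f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

x = 30.0  # degrees
numeric = central_difference(sin_deg, x)
with_factor = cos_deg(x) * DEG_TO_RAD   # Chain Rule formula
without_factor = cos_deg(x)             # what the radian formula would wrongly suggest

print(numeric, with_factor, without_factor)
```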


The geometric definition of radian measure is that an arc of length $x$ on a unit circle makes an angle of $x$ radians. The full circle, whose arc length is the circumference $2\pi$, measures as $2\pi$ radians.
Math 132 Implicit Differentiation Stewart 2.6

Explicit versus implicit functions. Given the circle defined by the equation $x^2 + y^2 = 25$, suppose we wish to find the tangent line at the point $(x, y) = (3, 4)$. Calculus finds a tangent slope of a function graph $y = f(x)$ as a derivative $f'(a) = \frac{df}{dx}\big|_{x=a}$; but there is no function specified in our problem.
Rather, we must interpret $x$ as an independent variable, which implicitly makes $y$ a function of $x$: to make this explicit, we solve the equation for $y$, giving $y = \pm\sqrt{25 - x^2}$. That is, the circle is the union of two function graphs, $y = \sqrt{25 - x^2}$ and $y = -\sqrt{25 - x^2}$, each over the domain $x \in [-5, 5]$.

The given point $(3, 4)$ is on the first of these, and we differentiate this explicit function:
\[
\frac{dy}{dx} = \frac{d}{dx}\sqrt{25 - x^2} = \frac{d}{dx}(25 - x^2)^{\frac{1}{2}}
= \tfrac{1}{2}(25 - x^2)^{-\frac{1}{2}}\cdot\frac{d}{dx}(25 - x^2) = \frac{-x}{\sqrt{25 - x^2}}.
\]
Here we used the Chain Rule with outside function $(\ )^{1/2}$. At our point, we have the tangent slope $\frac{dy}{dx}\big|_{x=3} = \frac{-3}{\sqrt{25 - 3^2}} = -\frac{3}{4}$, and our tangent line has equation $y = -\frac{3}{4}(x - 3) + 4$.
Implicit differentiation is a smoother way to do this problem. Instead of solving the equation for $y$, we assume $y = y(x)$ for some unknown function $y(x)$ which satisfies the equation $x^2 + y(x)^2 = 25$. Then we differentiate both sides using the Rules:
\[
\begin{aligned}
\big(x^2 + y(x)^2\big)' &= (25)' \\
\big(x^2\big)' + \big(y(x)^2\big)' &= 0 \\
2x + 2y(x)\,y'(x) &= 0.
\end{aligned}
\]
Note that $(x^2)' = 2x$ is a Basic Derivative, but for $(y^2)'$, we need the Chain Rule with outside function $(\ )^2$ and inside function $y = y(x)$. The derivative $y'(x)$ is the unknown we are trying to find, and now we can solve for it: $y'(x) = -\frac{x}{y(x)}$, which was easier than solving for the original $y(x)$. Since we are considering the point $(x, y) = (3, 4)$, we must have $y(3) = 4$, so that $\frac{dy}{dx}\big|_{x=3} = y'(3) = -\frac{3}{y(3)} = -\frac{3}{4}$, as before.
Note that the formula $y'(x) = -\frac{x}{y(x)}$, or in Leibnitz notation $\frac{dy}{dx} = -\frac{x}{y}$, is valid for both of the functions defining the upper and lower half-circles. Since both functions obey the original equation, they both obey the derivative equation. For example, at $(x, y) = (3, -4)$, the slope is $y'(3) = -\frac{3}{y(3)} = -\frac{3}{-4} = \frac{3}{4}$.
We could even take this one step further to find the second derivative implicitly:
\[
y''(x) = \big(y'(x)\big)' = \Big(-\frac{x}{y}\Big)' = -\frac{(x)'\,y - x\,y'}{y^2}
= -\frac{y - x\big(-\frac{x}{y}\big)}{y^2} = -\frac{y^2 + x^2}{y^3} = -\frac{25}{y^3}.
\]
We used the Quotient Rule, the previous $y' = -\frac{x}{y}$, and the original equation $x^2 + y^2 = 25$.
Folium of Descartes. This is a curious curve discovered by the famous mathematician who gave us Cartesian $xy$-coordinates. It is defined by the equation $x^3 + y^3 = 9xy$.

[Figure: graph of the folium $x^3 + y^3 = 9xy$]

We want to find the tangent line at the point $(x, y) = (2, 4)$, which is on the curve because $2^3 + 4^3 = 9(2)(4)$. In this case, there is no easy way to solve for $y$ to get an explicit function $y(x)$; indeed, over $x \in [0, \frac{9}{2}]$, the curve is the union of three function graphs.
Nevertheless, implicit differentiation works without a hitch: we assume $y = y(x)$ is some unknown function which satisfies the equation, and differentiate both sides (this time in Leibnitz notation):
\[
\begin{aligned}
\frac{d}{dx}\big(x^3 + y^3\big) &= \frac{d}{dx}(9xy) \\
\frac{d}{dx}\big(x^3\big) + \frac{d}{dx}\big(y^3\big) &= 9\Big(\frac{d}{dx}(x)\cdot y + x\cdot\frac{d}{dx}(y)\Big) \\
3x^2 + 3y^2\,\frac{dy}{dx} &= 9y + 9x\,\frac{dy}{dx}.
\end{aligned}
\]
Here we used the Sum and Product Rules, then the Chain Rule. Solving for $\frac{dy}{dx}$:
\[
3y^2\,\frac{dy}{dx} - 9x\,\frac{dy}{dx} = 9y - 3x^2,
\qquad
\frac{dy}{dx} = \frac{9y - 3x^2}{3y^2 - 9x}.
\]
We do not know $y(x)$ explicitly, but our given point $(x, y) = (2, 4)$ means that $y(2) = 4$, so:
\[
\frac{dy}{dx}\Big|_{x=2} = \frac{9y - 3x^2}{3y^2 - 9x} = \frac{9(4) - 3(2^2)}{3(4^2) - 9(2)} = \frac{4}{5}.
\]
Thus, the tangent line through the point $(2, 4)$ is: $y = \frac{4}{5}(x - 2) + 4$.
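The implicit slope can be cross-checked against the parametrization $x = \frac{9t}{1+t^3}$, $y = \frac{9t^2}{1+t^3}$ of the folium, for which $t = 2$ gives the point $(2, 4)$. The sketch below is an illustration under our own naming (`folium`, `implicit_slope`, `point`) and an arbitrarily chosen step size.

```python
def folium(x, y):
    """Left side minus right side of x^3 + y^3 = 9xy; zero on the curve."""
    return x ** 3 + y ** 3 - 9 * x * y

def implicit_slope(x, y):
    """dy/dx from implicit differentiation: (9y - 3x^2) / (3y^2 - 9x)."""
    return (9 * y - 3 * x ** 2) / (3 * y ** 2 - 9 * x)

def point(t):
    """Point on the folium from the parametrization y = t x."""
    return 9 * t / (1 + t ** 3), 9 * t ** 2 / (1 + t ** 3)

x0, y0 = point(2.0)                 # the point (2, 4)
slope = implicit_slope(x0, y0)      # 4/5

# Independent check: slope along the curve as (change in y) / (change in x).
dt = 1e-6
xa, ya = point(2.0 - dt)
xb, yb = point(2.0 + dt)
parametric_slope = (yb - ya) / (xb - xa)

print((x0, y0), folium(x0, y0), slope, parametric_slope)
```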

Method for implicit differentiation. Given an equation involving variables $x$ and $y$, we assume $x$ is an independent variable and $y = y(x)$ is a dependent variable. To find the derivative $\frac{dy}{dx}$:
1. Take the derivative of both sides of the equation, using the Chain Rule for expressions involving $y = y(x)$ as the inside function.
2. Solve the derivative equation for the unknown $\frac{dy}{dx}$, in terms of $x$ and $y$.
3. To get a specific value $y'(a) = \frac{dy}{dx}\big|_{x=a}$, plug in the known values $x = a$ and $y = y(a)$.


To find points satisfying the folium equation $x^3 + y^3 = 9xy$, substitute $y = tx$ for a new variable $t$, and solve for $x$, giving $x = \frac{9t}{1+t^3}$ and $y = \frac{9t^2}{1+t^3}$. Then each value of $t$ gives a point $(x, y)$ on the curve: this is called a parametrization.
Math 132 Rates of Change Stewart 2.7

Conceptual levels. Mathematics solves problems partly with technical tools like the
differentiation rules, but its most powerful method is to translate between different levels of
meaning, transforming the problems to make them accessible to our tools. Problems often
originate at the physical or geometric levels, and we translate to the numerical or algebraic
levels to solve them, then we translate the answer back to the original level.
Our key concept so far has been the derivative, with the following meanings:

Physical: For a function $y = f(x)$, the derivative $\frac{dy}{dx} = f'(x)$ is the rate of change of $y$ with respect to $x$, near a particular value of $x$. For a particular input $a$, $f'(a)$ means how fast $f(x)$ changes from $f(a)$ per unit change in $x$ away from $a$. This is the main importance of derivatives.

Geometric: For a graph $y = f(x)$, the derivative $f'(a)$ is the slope of the tangent line at the point $(a, f(a))$.

Numerical: We approximate the derivative by the difference quotient:
\[
f'(a) \approx \frac{\Delta f}{\Delta x} = \frac{f(a+h) - f(a)}{h}.
\]
The right side is the average rate of change of $f(x)$ from $x = a$ to $x = a + h$. As $\Delta x = h \to 0$, the difference quotient approaches the instantaneous rate of change, the derivative $f'(a)$.

Algebraic: We can easily compute the derivative of almost any function defined by a formula. Basic Derivatives like $(x^p)' = p\,x^{p-1}$, $\sin'(x) = \cos(x)$, and $\cos'(x) = -\sin(x)$ are combined using the Sum, Product, Quotient, and Chain Rules for Derivatives. Occasionally, we must go back to the definition $f'(a) = \lim_{h\to 0}\frac{f(a+h) - f(a)}{h}$.

Functions of motion. We consider the basic physical quantities describing motion. These
are all functions of time t. (See end of 2.3.)

Position or displacement $s$, the distance of an object past a reference point, in feet, at time $t$ seconds.

Velocity $v = \frac{ds}{dt}$, how fast the position is increasing per second (ft/sec); this is negative if position is decreasing. The speed is the magnitude $|v|$.

Acceleration $a = \frac{dv}{dt} = \frac{d^2 s}{dt^2}$, how fast the velocity is increasing, the number of ft/sec gained each second (ft/sec$^2$). Equivalently, this is how fast the object is speeding up (positive) or slowing down (negative).

Jerk $j = \frac{da}{dt} = \frac{d^3 s}{dt^3}$, the rate of change of acceleration (ft/sec$^3$).

Driving. An insurance company downloads the following data from a car's speedometer, allowing them to construct the following graph of the car's velocity $v(t)$. What physical story does this graph tell?
The dip of negative velocity at the beginning is probably the car slowly backing down a
driveway. It goes forward a few blocks at a moderate, almost constant speed (positive
velocity), stops at an intersection (zero velocity), then continues at higher speed.
From the velocity data, we can reconstruct the odometer data, the graph of the distance
function s(t): the level of the velocity graph is the slope of the distance graph.

The distance starts at some positive odometer reading $s = s_0$ (which we cannot know from the velocity data alone), decreases a bit (negative slope) because of the negative velocity, increases with constant slope during constant positive velocity, stays at a constant level during zero velocity, then increases with greater slope after the velocity goes up. (This assumes the odometer runs both ways, like old mechanical odometers used to.)
Nothing remarkable so far. What about the acceleration, the derivative $a(t) = \frac{dv}{dt}$? The slope of the velocity graph is the level of the acceleration graph:
Here $a(t)$ is roughly proportional to the depression of the gas or brake pedal, and it is zero except when the car is speeding up or slowing down. The most prominent feature is the spike after $t = 120$: just how strong an acceleration is this? The tangent line marked on the velocity graph shows a change from 0 to 40 mph in about 3 sec, meaning a slope $a(122) \approx \frac{40}{3} \approx 13.3$ mph/sec. Now,
\[
1\ \frac{\text{mi}}{\text{hr}} = \frac{5280\ \text{ft}}{3600\ \text{sec}} \approx 1.47\ \frac{\text{ft}}{\text{sec}},
\]
so we convert 13.3 mph per second $\approx (13.3)(1.47) \approx 20$ ft/sec per second $= 20$ ft/sec$^2$. Compare this to the standard acceleration due to gravity: one gee is about 32 ft/sec$^2$, so this driver feels about 2/3 the force of gravity pushing him into the seat-back. It seems he (I'm pretty sure it's not a she) is flooring the accelerator, roaring ahead from a standstill with tires squealing, then easing up past 40 mph or so. Not responsible driving!
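The unit conversion behind the gee estimate can be written out step by step. This is a small sketch using the figures quoted in the text (0 to 40 mph in about 3 sec, one gee taken as 32 ft/sec$^2$); the variable names are our own.

```python
FEET_PER_MILE = 5280
SECONDS_PER_HOUR = 3600
GEE = 32.0  # ft/sec^2, the rounded value used in the text

# 1 mph expressed in ft/sec (about 1.47).
mph_to_fps = FEET_PER_MILE / SECONDS_PER_HOUR

# Slope of the velocity graph: 40 mph gained in about 3 seconds.
accel_mph_per_sec = 40 / 3

# The same acceleration in ft/sec^2, and as a fraction of one gee.
accel_fps2 = accel_mph_per_sec * mph_to_fps
gees = accel_fps2 / GEE

print(mph_to_fps, accel_mph_per_sec, accel_fps2, gees)
```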
Finally, note the jump in $a$ at $t = 90$, where the car goes from braking deceleration to a standstill. The change in $a$ is not so large, but it happens so fast that it looks instantaneous, and the $a(t)$ graph seems to rise vertically (infinite slope). This means the derivative of acceleration, the jerk $j = \frac{da}{dt}$, is huge at this moment, and the car experiences a lurching stop, another sign of sloppy driving. This driver's insurance rates are going up!
Note that in this analysis, we have translated from the graphical (geometric) to the physical level; and also (for the gee calculation) from the graphical to the numerical to the physical.

Ballistic equation. This is the formula giving the height $s(t)$ for an object launched from initial height $s_0$, straight upward with initial velocity $v_0$, under the influence of a constant gravitational acceleration $g$:
\[
s(t) = s_0 + v_0 t - \tfrac{1}{2} g t^2.
\]
To justify this equation, note that the initial height is indeed $s(0) = s_0 + v_0(0) - \frac{1}{2}g(0^2) = s_0$. Also, $s_0, v_0, g$ are constants, so:
\[
v(t) = s'(t) = (s_0)' + (v_0 t)' - \big(\tfrac{1}{2} g t^2\big)' = v_0 - g t,
\]
and indeed the initial velocity $v(0) = v_0$. The acceleration is $a(t) = v'(t) = -g$, which is the desired constant in the correct (downward) direction. Finally, the jerk is $j(t) = 0$, which is correct because gravity pulls steadily and never jerks.
example: Given standard gravity of 32 ft/sec$^2$ and initial height $s_0 = 5$ ft, how fast must we throw a ball upward so that it stays airborne for 5 sec? The equation becomes $s(t) = 5 + v_0 t - 16t^2$, with $v_0$ an unknown constant. Landing at 5 sec means $s(5) = 0$, that is $5 + v_0(5) - 16(5^2) = 0$, and solving, $v_0 = 79$ ft/sec. (This is $79/1.47 \approx 54$ mph!)
How high will the ball go from such a throw? At the instant $t = t_1$ when the ball reaches the top of its arc, its velocity is zero. That is: $v(t_1) = 79 - 32t_1 = 0$, and $t_1 = \frac{79}{32} \approx 2.47$ sec. (This is not quite half the 5 sec interval, because the ball started out at $s_0 = 5$ ft.) The height at this instant is $s(t_1) = 5 + \frac{79^2}{64} = 102\frac{33}{64} \approx 102.5$ ft. It would take a baseball pitcher to throw a ball that high.
Note that the graph $s = 5 + 79t - 16t^2$ is a downward-curving parabola, but this is not the trajectory of the ball, which is going straight up and down. For $t < t_1$, the height $s(t)$ is increasing, and the velocity $v(t) = 79 - 32t$ is positive; for $t > t_1$, $s(t)$ is decreasing, and $v(t)$ is negative.
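The ballistic example can be recomputed directly from the equation $s(t) = s_0 + v_0 t - \frac{1}{2}gt^2$. The sketch below solves $s(5) = 0$ for the launch speed, then finds the time and height at the top of the arc. The constant and function names are our own.

```python
G = 32.0        # gravitational acceleration, ft/sec^2
S0 = 5.0        # initial height, ft
T_LAND = 5.0    # required time in the air, sec

# Solve s(T_LAND) = S0 + v0*T_LAND - (1/2)*G*T_LAND^2 = 0 for v0.
v0 = (0.5 * G * T_LAND ** 2 - S0) / T_LAND

def s(t):
    """Height in feet at time t seconds."""
    return S0 + v0 * t - 0.5 * G * t ** 2

def v(t):
    """Velocity s'(t) in ft/sec."""
    return v0 - G * t

# The top of the arc is where the velocity vanishes.
t_top = v0 / G
peak = s(t_top)

print(v0, t_top, peak)
```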
Math 132 Related Rates Stewart 2.8

Pulley example. Consider a weight hanging from a rope which stretches up to a pulley
10 ft above the floor, then to your hand, which is 3 ft above the floor and 15 ft horizontally
from the pulley. If you walk away from the pulley at 2 ft/sec, how fast will the weight rise?
We want to find an unknown rate of change from a known rate which is related to
it geometrically. To start any such problem, we draw a picture and label constant parts
with their values: the lengths 3 and 7 below, which will not change as your hand moves
horizontally. We label variable parts with letter names: the variable h = h(t) is the
horizontal distance from weight to hand, and r = r(t) is the length of rope from pulley to
hand, both functions of time t.
The problem specifies the current values of some variables, usually meaning at time $t = 0$: $h(0) = 15$. Finally, for each variable we draw an arrow marked with its current rate of change: we know $h'(0) = 2$, and $r'(0)$ is the target rate which we want to compute, since the weight goes upward at the same rate as $r$ increases.

Next, we write equations implied by the geometry of the picture: the Pythagorean Theorem implies $r^2 = h^2 + 7^2$. To determine $r'(0)$, we compute $r(t)$ explicitly, and differentiate:
\[
\begin{aligned}
r(t) &= \sqrt{h(t)^2 + 49} \\
r'(t) &= \tfrac{1}{2}\big(h(t)^2 + 49\big)^{-1/2}\cdot\big(h(t)^2 + 49\big)' \\
&= \tfrac{1}{2}\big(h(t)^2 + 49\big)^{-1/2}\cdot 2h(t)\,h'(t) \\
&= \frac{h(t)\,h'(t)}{\sqrt{h(t)^2 + 49}}.
\end{aligned}
\]
Plugging in the current values at $t = 0$:
\[
r'(0) = \frac{h(0)\,h'(0)}{\sqrt{h(0)^2 + 49}} = \frac{(15)(2)}{\sqrt{15^2 + 49}} = \frac{30}{\sqrt{274}} \approx 1.8\ \text{ft/sec}.
\]
We could do this a bit more simply by implicitly differentiating both sides of the equation $r^2 = h^2 + 7^2$, then solving for $r'(t)$:
\[
\begin{aligned}
\big(r(t)^2\big)' &= \big(h(t)^2 + 49\big)' \\
2r(t)\,r'(t) &= 2h(t)\,h'(t) \\
r'(t) &= \frac{h(t)\,h'(t)}{r(t)}.
\end{aligned}
\]
Now, $r(0) = \sqrt{h(0)^2 + 49} = \sqrt{274}$, so plugging in current values: $r'(0) = \frac{(15)(2)}{\sqrt{274}}$ as before.
Warning: It is essential to plug in the current values only in the last step: if we substituted before differentiating, we would get $(r(0))' = \big(\sqrt{h(0)^2 + 49}\big)' = 0$, since the derivative of any constant (even a complicated constant) is zero.
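The pulley computation can be checked by simulating the walk. In the sketch below, we assume the hand moves as $h(t) = 15 + 2t$ (constant walking speed, which matches the given values $h(0) = 15$ and $h'(0) = 2$) and compare the related-rates formula with a difference quotient of $r(t)$. The function names are our own.

```python
import math

DROP = 7.0  # vertical distance from pulley (10 ft) down to hand (3 ft)

def rope_rate(h, h_rate):
    """Related-rates formula r' = h h' / r, with r = sqrt(h^2 + 49)."""
    r = math.hypot(h, DROP)
    return h * h_rate / r

def rope_length(t):
    """Rope length from pulley to hand, assuming h(t) = 15 + 2t."""
    return math.hypot(15.0 + 2.0 * t, DROP)

formula = rope_rate(15.0, 2.0)   # 30 / sqrt(274)

# Independent check: central difference quotient of r(t) at t = 0.
dt = 1e-6
numeric = (rope_length(dt) - rope_length(-dt)) / (2 * dt)

print(formula, numeric)
```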

Method for related rates problems

1. Draw a picture labeled with:
   - numerical constant values
   - letter variables and their known current values
   - arrows showing known current rates of change (derivatives)
   - an arrow for the unknown rate of change which is desired (the target rate)

2. Write an equation relating the variables according to the geometry of the picture.
3. Assuming each variable is a function of time $t$, take the derivative $\frac{d}{dt}$ of both sides of the equation, with the Chain Rule producing derivatives of the variables. If necessary, solve the derivative equation for the derivative which is desired.
4. Plug in the current values of the variables and rates to compute the target rate.

Ice block example. We saw a related rates problem in Notes 2.3, last page.

Spill radius example. A stream of water is spreading a circular puddle on the floor. If the puddle is 1 meter across, and the stream increases the area at a rate of 2 sq m/min, then how quickly is the puddle widening?

The variable quantities are the radius $r$ and the area $A$. We know the current value $r(0) = \frac{1}{2}$ and the current rate $A'(0) = \frac{dA}{dt}\big|_{t=0} = 2$. The unknown rate which we must find is $r'(0)$. The area is related to the radius by the equation $A = \pi r^2$. Differentiating the equation:
\[
A'(t) = \big(\pi\,r(t)^2\big)' = 2\pi\,r(t)\,r'(t).
\]
Solving for the target rate: $r'(t) = \frac{A'(t)}{2\pi\,r(t)}$, and
\[
r'(0) = \frac{A'(0)}{2\pi\,r(0)} = \frac{2}{2\pi(\frac{1}{2})} = \frac{2}{\pi} \approx 0.64\ \text{m/min}.
\]
It is important to check a real-world result for plausibility. The puddle's radius is growing (positive derivative) at a rate of a bit more than half a meter per minute, which is reasonable.
Searchlight example: A searchlight is shining along a wall 20 meters away. If the position of the light is $30^\circ$ away from looking directly at the wall, and the light is turning at $5^\circ$ per second, then what is the speed of the spotlight image moving along the wall?

The distance from the wall is the constant 20; the variable quantities are $\theta$ and $s$. The angle $\theta(t)$ has current value $\theta(0) = 30^\circ$ and current rate $\theta'(0) = 5^\circ$/sec, and we seek to compute the unknown rate $s'(0) = \frac{ds}{dt}\big|_{t=0}$. From the definition of tangent, we have the equation $\tan(\theta) = \frac{s}{20}$, so we can easily solve for $s = 20\tan(\theta)$. Differentiating (in Leibnitz notation this time):
\[
\frac{ds}{dt} = \frac{d}{dt}\big(20\tan(\theta)\big) = 20\sec^2(\theta)\,\frac{d\theta}{dt},
\]
since $\frac{d}{dx}\tan(x) = \sec^2(x)$ from the table in Notes 2.4. We do not need to solve for $\frac{ds}{dt}$, since we already solved for $s$ before differentiating.
Finally, to plug in the current values of the angles, we must convert them to radians, because the trig differentiation formulas are only valid for radian measure (see last page of Notes 2.5). Thus:
\[
\theta(0) = 30^\circ = 30\big(\tfrac{2\pi}{360}\big) = \tfrac{\pi}{6}\ \text{rad},
\qquad
\theta'(0) = \frac{d\theta}{dt}\Big|_{t=0} = 5^\circ/\text{sec} = 5\big(\tfrac{2\pi}{360}\big) = \tfrac{\pi}{36}\ \text{rad/sec},
\]
so the current speed is:
\[
s'(0) = \frac{ds}{dt}\Big|_{t=0} = 20\sec^2\!\big(\tfrac{\pi}{6}\big)\cdot\tfrac{\pi}{36} = \tfrac{20\pi}{27} \approx 2.3\ \text{m/sec}.
\]
Note that plugging in $\frac{d\theta}{dt}\big|_{t=0} = 5$ deg/sec instead of $\frac{\pi}{36} \approx 0.09$ rad/sec would give a wildly incorrect answer: the conversion to radians is essential.
One last point: the problem specifies only the speed of $\theta$, not the velocity toward or away from the wall, so we only know $\theta'(0) = \pm\frac{\pi}{36}$, either plus or minus, though in the picture we assumed it was plus. Thus we can only compute $s'(0) = \pm\frac{20\pi}{27}$, but in any case the speed is $|s'(0)| = \frac{20\pi}{27}$.
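The searchlight answer, and the danger of skipping the radian conversion, can both be seen in a few lines. This is a sketch: the function `spot_speed` and its parameter names are hypothetical, and `math.radians` does the degree-to-radian conversion.

```python
import math

def spot_speed(theta_deg, rate_deg_per_sec, wall_distance=20.0):
    """Speed of the light spot: ds/dt = D * sec^2(theta) * dtheta/dt,
    with the angle and its rate converted from degrees to radians."""
    theta = math.radians(theta_deg)
    rate = math.radians(rate_deg_per_sec)
    return wall_distance * rate / math.cos(theta) ** 2

correct = spot_speed(30.0, 5.0)   # 20*pi/27 m/sec

# Forgetting to convert the rate (5 instead of pi/36) inflates the answer
# by a factor of 360/(2*pi), about 57.
wrong = 20.0 * 5.0 / math.cos(math.radians(30.0)) ** 2

print(correct, wrong)
```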
Math 132 Linear Approximation Stewart 2.9

Tangent linear function. The geometric meaning of the derivative $f'(a)$ is the slope of the tangent to the curve $y = f(x)$ at the point $(a, f(a))$. The tangent line is itself the graph of a linear function $y = L(x)$, where:
\[
L(x) = f(a) + f'(a)(x - a).
\]
This is correct because the line $y = f(a) + f'(a)(x - a)$ has slope $m = f'(a)$, and $L(a) = f(a) + f'(a)(a - a) = f(a)$, so the line passes through the point $(a, L(a)) = (a, f(a))$.
The value $f'(a)$ is not just the slope of the tangent line: it is also the slope of the graph itself, because as we zoom in toward $(a, f(a))$, the graph and the tangent line become indistinguishable.

This suggests a further numerical meaning of the derivative: any function $f(x)$ is very close to being a linear function near a differentiable point $x = a$, so that $L(x)$ is an excellent approximation for $f(x)$ when $x$ is close to $a$:
\[
f(x) \approx L(x) = f(a) + f'(a)(x - a) \quad\text{for } x \text{ close to } a.
\]


example: Find a quick approximation for $\sqrt{1.1}$ without a calculator. Clearly, this is close to $\sqrt{1} = 1$, but we want better. Take $f(x) = \sqrt{x}$, so $f'(x) = \frac{1}{2}x^{-1/2}$ and $f'(1) = \frac{1}{2}$. For $x$ near $a = 1$, we have the linear function:
\[
L(x) = f(1) + f'(1)(x - 1) = 1 + \tfrac{1}{2}(x - 1),
\]
and the linear approximation:
\[
\sqrt{1.1} = f(1.1) \approx L(1.1) = 1 + \tfrac{1}{2}(0.1) = 1.05.
\]
A calculator gives $\sqrt{1.1} \approx 1.049$, so our answer is correct to 2 decimal places with very little work. Furthermore, we get approximations for all other square roots near 1 for free, for example $\sqrt{0.96} \approx 1 + \frac{1}{2}(0.96 - 1) = 1 - 0.02 = 0.98$.
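Note on zooming: if we zoom in toward a non-differentiable point, such as $(0, 0)$ for the graph $y = |x|$, the graph does not look more and more linear, but rather keeps its angular appearance.

example: Approximate $\sin(42^\circ)$ without a scientific calculator. This is clearly close to $\sin(45^\circ) = \frac{\sqrt{2}}{2} \approx 0.71$, so let us take $a = 45^\circ$. Now, to use calculus with trig functions, we must always convert to radians: $a = 45\big(\frac{2\pi}{360}\big) = \frac{\pi}{4}$ rad. Thus $f'(a) = \sin'(\frac{\pi}{4}) = \cos(\frac{\pi}{4}) = \frac{\sqrt{2}}{2}$, and we have the linear function:
\[
L(x) = \tfrac{\sqrt{2}}{2} + \tfrac{\sqrt{2}}{2}\big(x - \tfrac{\pi}{4}\big).
\]
The linear approximation is:
\[
\sin(42^\circ) = \sin\!\big(42(\tfrac{2\pi}{360})\big) \approx L\big(42(\tfrac{2\pi}{360})\big)
= \tfrac{\sqrt{2}}{2} + \tfrac{\sqrt{2}}{2}\big(42(\tfrac{2\pi}{360}) - \tfrac{\pi}{4}\big)
= \tfrac{\sqrt{2}}{2} - \tfrac{\sqrt{2}}{2}\cdot\tfrac{\pi}{60} \approx 0.67.
\]
A scientific calculator gives $\sin(42^\circ) \approx 0.669$, so again the linear approximation is accurate to two decimal places.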
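Both worked examples follow the same recipe, $L(x) = f(a) + f'(a)(x - a)$, so they can share one helper. The following is a sketch: the function `linearize` is hypothetical and takes the derivative as a separate argument rather than computing it.

```python
import math

def linearize(f, f_prime, a):
    """Return the tangent linear function L(x) = f(a) + f'(a)(x - a)."""
    fa, slope = f(a), f_prime(a)
    return lambda x: fa + slope * (x - a)

# Square root near a = 1, with f'(x) = (1/2) x^(-1/2).
L_sqrt = linearize(math.sqrt, lambda x: 0.5 / math.sqrt(x), 1.0)

# Sine near a = pi/4 (45 degrees), with sin' = cos; inputs in radians.
L_sin = linearize(math.sin, math.cos, math.pi / 4)

x42 = math.radians(42)
print(L_sqrt(1.1), math.sqrt(1.1))
print(L_sqrt(0.96), math.sqrt(0.96))
print(L_sin(x42), math.sin(x42))
```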

Error sensitivity. We can rewrite the linear approximation $f(x) \approx f(a) + f'(a)(x - a)$ as:
\[
\Delta f = f(x) - f(a) \approx f'(a)(x - a) = f'(a)\,\Delta x.
\]
That is, we can approximate the change in $f(x)$ away from $f(a)$ in terms of the change in $x$ away from $a$. In Leibnitz notation, with $y = f(x)$, we write this as:
\[
\Delta y \approx \frac{dy}{dx}\,\Delta x.
\]
Here we mean $\frac{dy}{dx} = \frac{dy}{dx}\big|_{x=a} = f'(a)$. If we think of $\Delta x$ as an error from an intended input value $x = a$, then $\Delta f \approx f'(a)\,\Delta x$ approximates the error from the intended output $f(a)$.
example: A disk of radius $r = 5$ cm is to be cut from a metal sheet weighing 3 g/cm$^2$. If the radius is measured to within an error of $\Delta r = \pm 0.2$ cm, what is the approximate range of error in the weight? This is the kind of error-control problem from our limit analyses in Notes 1.7, only now we have the powerful tools of calculus to give a simple answer.
The weight is given by the function:
\[
W = W(r) = 3\pi r^2 \quad\text{with}\quad W(5) = 75\pi \approx 235.6,
\]
and we aim to find the error $\Delta W$ away from this intended value. Since:
\[
\frac{dW}{dr} = 3\pi(2r) = 6\pi r \quad\text{and}\quad \frac{dW}{dr}\Big|_{r=5} = 30\pi,
\]
we have the approximate error:
\[
\Delta W \approx \frac{dW}{dr}\,\Delta r = 30\pi\,\Delta r.
\]
Thus, for $\Delta r = \pm 0.2$, we have $\Delta W \approx \pm 30\pi(0.2) \approx \pm 18.8$. That is:
\[
r = 5 \pm 0.2\ \text{cm} \implies W \approx 235.6 \pm 18.8\ \text{g}.
\]
The point here is not just the specific error estimate, but the formula which gives, for any small input error $\Delta r$, the resulting output error $\Delta W \approx 30\pi\,\Delta r \approx 94\,\Delta r$. The coefficient $30\pi$ measures the sensitivity of the output $W$ to an error in the input $r$.
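It is instructive to compare the linear error estimate with the exact change in weight. The sketch below uses our own function names (`weight`, `sensitivity`) and the values from the disk example. Because $W(r)$ curves upward, the estimate falls between the exact changes for $r - 0.2$ and $r + 0.2$.

```python
import math

DENSITY = 3.0  # g per square cm

def weight(r):
    """Weight in grams of a disk of radius r cm: W = 3 pi r^2."""
    return DENSITY * math.pi * r ** 2

def sensitivity(r):
    """dW/dr = 6 pi r, the output error per unit of input error."""
    return 2 * DENSITY * math.pi * r

r0, dr = 5.0, 0.2

estimate = sensitivity(r0) * dr            # linear estimate, 30*pi*0.2
actual_up = weight(r0 + dr) - weight(r0)   # exact change for r = 5.2
actual_down = weight(r0) - weight(r0 - dr) # exact change for r = 4.8

print(weight(r0), estimate, actual_up, actual_down)
```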

Differential notation. For $y = f(x)$, we rewrite a small $\Delta x$ as $dx$, and we define:
\[
dy = \frac{dy}{dx}\,dx \quad\text{and}\quad df = f'(x)\,dx.
\]
The dependent variable $dy$ is called a differential: we can think of it as the linear approximation to $\Delta y$, as pictured below:

example: We can rewrite the approximation in the previous example as:
\[
\Delta W \approx dW = \frac{dW}{dr}\,dr = \frac{d}{dr}\big(3\pi r^2\big)\,dr = 6\pi r\,dr.
\]
Here $dr$ is just another notation for $\Delta r$, and the approximation $\Delta W \approx 6\pi r\,\Delta r$ is valid near any particular value of $r$, such as $r = 5$ in the example.

Linear Approximation Theorem. How close is the approximation $\Delta y \approx dy$, or equivalently $f(x) \approx L(x) = f(a) + f'(a)(x - a)$? In fact, the difference between $f(x)$ and $L(x)$ is not only small compared to $\Delta x = x - a$, but comparable to $(\Delta x)^2 = (x - a)^2$, which becomes tiny as $\Delta x \to 0$.
Also, the slower the derivative $f'(x)$ changes near $x = a$, the closer $y = f(x)$ is to being linear, and this is measured by the rate of change of $f'(x)$, namely the second derivative $f''(x)$. The following theorem gives an upper bound on the error in the linear approximation, $\varepsilon(x) = f(x) - L(x)$.
Theorem: Suppose $f(x)$ is a function such that $|f''(x)| < B$ on the interval $x \in [a - \delta, a + \delta]$. Then, for all $x \in [a - \delta, a + \delta]$, we have:
\[
f(x) = f(a) + f'(a)(x - a) + \varepsilon(x), \quad\text{where } |\varepsilon(x)| < \tfrac{B}{2}\,|x - a|^2.
\]
example: For $f(x) = \sqrt{x}$ near $x = 1$, we have $f'(x) = \frac{1}{2}x^{-1/2}$ and $f'(1) = \frac{1}{2}$. Also $f''(x) = -\frac{1}{4}x^{-3/2}$, and on the interval $x \in [0.9, 1.1]$, we have:
\[
|f''(x)| \le |f''(0.9)| = \tfrac{1}{4}(0.9)^{-3/2} \approx 0.29 < 1.
\]
Thus we may take $B = 1$, and find that:
\[
\sqrt{x} = 1 + \tfrac{1}{2}(x - 1) + \varepsilon(x), \quad\text{where } |\varepsilon(x)| < \tfrac{1}{2}\,|x - 1|^2.
\]
For example, the error at $x = 1.1$ is $|\varepsilon(1.1)| < \frac{1}{2}(0.1)^2 = 0.005$, so:
\[
\sqrt{1.1} = 1 + \tfrac{1}{2}(0.1) \pm 0.005 = 1.05 \pm 0.005.
\]
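The error bound from the theorem can be tested at a few sample points in $[0.9, 1.1]$. This sketch (function names are ours, $B = 1$ as in the example) prints the actual error $|\sqrt{x} - L(x)|$ next to the bound $\frac{B}{2}|x - 1|^2$. Passing at sample points illustrates the theorem but is no substitute for its proof.

```python
import math

def linear(x):
    """Tangent linear function for sqrt at a = 1: L(x) = 1 + (x - 1)/2."""
    return 1 + 0.5 * (x - 1)

def bound(x, B=1.0):
    """Error bound (B/2)|x - 1|^2 from the Linear Approximation Theorem."""
    return 0.5 * B * (x - 1) ** 2

samples = (0.9, 0.95, 1.05, 1.1)
errors = [abs(math.sqrt(x) - linear(x)) for x in samples]

for x, err in zip(samples, errors):
    # The actual error should sit below the bound at every sample point.
    print(x, err, bound(x))
```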
Math 132 Extreme Values Stewart 3.1

Absolute maxima and minima. In many practical problems, we must find the largest or smallest possible value of a function over a given interval.

Definition: For a function $f(x)$ defined on an interval $x \in [a, b]$, an absolute maximum (or global maximum) is a point $c \in [a, b]$ such that $f(c) \ge f(x)$ for all $x \in [a, b]$. That is, $f(c)$ is the largest output value of the function at any input point in its domain. We say $x = c$ is a maximum point and $f(c)$ is the maximum value.

We define an absolute minimum similarly, and both maximums and minimums are extremums or extreme points. (The Latin plurals of maximum, minimum, extremum are maxima, minima, extrema.) Note that the maximum value $M$ (the largest possible output) is unique, but $f(x)$ could touch this value at several input points $c_1, c_2, \ldots \in [a, b]$, all having $f(c_1) = f(c_2) = \cdots = M$.
example: At left below, the function $y = f(x)$ on the interval $[a, b]$ has one absolute maximum point, the left endpoint $x = a$ with $f(a) = M$, so that $(a, f(a))$ is the highest point on the graph; and it has two absolute minimum points $x = c_1, c_2$ with $f(c_1) = f(c_2) = N$, so that $(c_1, f(c_1))$ and $(c_2, f(c_2))$ are the lowest points on the graph.

Extremal Value Theorem: If $f(x)$ is continuous on the closed, finite interval $x \in [a, b]$, then $f(x)$ possesses at least one maximum point and one minimum point.

The proof would require sophisticated Real Analysis concepts such as those studied in Math 320. To see that the theorem is not obvious, consider the function $y = g(x)$ graphed at right above. It is not a continuous function because the graph has a break, so the Theorem does not guarantee an absolute maximum; and indeed there is no absolute maximum. Instead, the function approaches $y = 3$ as $x \to 1^-$ (i.e. $x = 1 - \varepsilon$ for small $\varepsilon > 0$), but it never actually reaches $y = 3$ because it suddenly drops to $g(1) = 2$. Thus, for any given output $g(c)$, we can find some slightly larger output $g(1 - \varepsilon) > g(c)$ for a very tiny $\varepsilon > 0$, so no $g(c)$ is largest. The function does, however, have the absolute min point $x = 2$.

Local maxima and minima. A broader, but still useful, concept is that of a local extremum: this is a point where the graph has a hill or valley, but not necessarily the highest or lowest one.

Definition: For a function $f(x)$ defined on an interval $x \in [a, b]$, a local maximum (or relative maximum) is a point $c \in [a, b]$ such that $f(c)$ is the largest output value for any input point near $x = c$. Formally, there is a small $\delta > 0$ such that $f(c) \ge f(x)$ for all $x \in [c - \delta, c + \delta]$; or $x \in [a, a + \delta]$ if $c = a$; or $x \in [b - \delta, b]$ if $c = b$.

Clearly, an absolute maximum must also be a local maximum. To illustrate, the function $f(x)$ in the figure at left above has four local maximum points, the two endpoints and the two hill tops; and it has three local minimum points, all valley bottoms.

Vanishing derivatives. Calculus makes finding extrema surprisingly easy.
We have already seen a real-world example at the end of Notes 2.7, where we
asked for the maximum height of a ball whose height at time $t$ is given by the
ballistic function $s(t) = 5 + 79t - 16t^2$. The ball is highest at the moment when
it passes from rising to falling and its velocity is zero: $t = c$ with $v(c) = 0$,
where $v(t) = s'(t) = 79 - 32t$; and we solve to get $c = \frac{79}{32}$. That is, if $t = c$ is
the maximum point of $s(t)$, then $s'(c) = 0$.
This also makes sense graphically. If $x = c$ is a local maximum of a function
$f(x)$, then $(c, f(c))$ is most likely a hill-top of the graph $y = f(x)$, and the
tangent line is horizontal at this point, having zero slope. But the tangent
slope is the derivative $f'(c)$, so if $x = c$ is a local maximum, then $f'(c) = 0$.
The same goes for a local minimum or valley-bottom.

First Derivative Theorem: If $f(x)$ has a local maximum or minimum at $x = c$,
which is not an endpoint of the interval of definition, and $f(x)$ is
differentiable at this point, then $f'(c) = 0$.

This could be proven using the Linear Approximation Theorem at the end of
Notes 2.9.
example: We wish to find the maxima and minima, both local and absolute,
of $f(x) = x^3 - x + 1$ on the interval $x \in [-1, \frac32]$. Since $f(x)$ is continuous (by
the Limit Laws), the Extremal Value Theorem guarantees there is at least one
of each type of point.
Exactly where are the hill-top and the valley-bottom points? Since $f(x)$ is
differentiable at every point, the First Derivative Theorem means that all local
maximum and minimum points inside the interval must be solutions of $f'(x) = 0$, namely
$3x^2 - 1 = 0$, or $x = \pm\frac{1}{\sqrt3} \approx \pm 0.58$. The graph shows that the local maxima
are the hill-top $x = -\frac{1}{\sqrt3}$ and the right endpoint $x = \frac32$, and the one with the
larger output is the absolute maximum: $f(-\frac{1}{\sqrt3}) \approx 1.4 < f(\frac32) \approx 2.9$, so the
endpoint $x = \frac32$ is the absolute maximum point. Similarly, the local minima
are $x = -1$ and $x = \frac{1}{\sqrt3}$ with $f(-1) = 1 > f(\frac{1}{\sqrt3}) \approx 0.62$, so $x = \frac{1}{\sqrt3}$ has the
smaller output and is the absolute minimum point.
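The comparison in this example can be checked numerically. The following is a minimal sketch, not part of the original notes: it evaluates $f(x) = x^3 - x + 1$ at the endpoints and at the interior solutions of $f'(x) = 0$, then picks the largest and smallest outputs. The names `candidates` and `values` are illustrative choices.

```python
import math

def f(x):
    return x**3 - x + 1

a, b = -1.0, 1.5

# Interior critical points: solutions of f'(x) = 3x^2 - 1 = 0.
critical = [-1 / math.sqrt(3), 1 / math.sqrt(3)]

# Candidates for absolute extrema: endpoints plus critical points inside [a, b].
candidates = [a, b] + [c for c in critical if a <= c <= b]
values = {x: f(x) for x in candidates}

x_max = max(values, key=values.get)
x_min = min(values, key=values.get)

for x in sorted(values):
    print(f"x = {x:+.4f}   f(x) = {values[x]:.4f}")
print("absolute max at x =", x_max)
print("absolute min at x =", x_min)
```

This only compares finitely many candidates, which is justified here because $f(x)$ is continuous and differentiable on the whole interval.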

Critical points. The above example illustrates the method for identifying
all relevant candidates for the absolute maximum and minimum: the endpoints
and the points where the derivative vanishes, and also possibly where
the derivative is not defined because the graph has a corner or a discontinuity.

Definition: For a function $f(x)$, a critical point (or critical number)
is a point $x = c$ where the derivative is either zero or the function
is not differentiable: $f'(c) = 0$ or undefined.

Method for absolute maxima and minima problems.

1. Given $f(x)$ on an interval $x \in [a, b]$, determine the critical points (critical
numbers) $x = c$ such that $f'(c) = 0$ or undefined. Be sure to consider
only those $c \in [a, b]$, discarding any critical points outside the relevant
interval.

2. If $f(x)$ is continuous, evaluate $f(x)$ at all critical points $x = c$ and at the
endpoints $x = a, b$. Those points with the largest output are the absolute
maximum points, and those with smallest values are the absolute minima.

3. If $f(x)$ is not continuous, for each discontinuity $x = c$ you must examine
$f(x)$ for $x$ in a small interval around $x = c$ to see if these outputs become
larger or smaller than the outputs in Step 2 as $x \to c^+$ or $x \to c^-$.

Most functions are continuous and differentiable as in the previous example,
and it is enough to perform Step 1 with $f'(c) = 0$, then Step 2. Below we
illustrate some more complicated situations.
example: Not every critical point must be a local maximum or minimum. For
$f(x) = x^3$, solving $f'(x) = 3x^2 = 0$ gives $x = 0$ as the unique critical point.
The graph (above left) has a horizontal slope which is neither a hill-top nor a
valley-bottom, but rather a stationary point, where the function pauses in its
rise. This does not derail the Method, since it only gives an extra candidate
for the absolute max/min, which will be discarded because its output value is
neither largest nor smallest over any given interval.
example: Let $f(x) = |2x^2 + 2x - 1|$, with graph at center above. Recall that
$\frac{d}{dx}|x| = \mathrm{sgn}(x) = \frac{|x|}{x}$, which is undefined when $x = 0$. By the Chain Rule:

$$f'(x) = \mathrm{sgn}(2x^2+2x-1)\cdot(2x^2+2x-1)' = \mathrm{sgn}(2x^2+2x-1)\cdot(4x+2).$$

Since $\mathrm{sgn}(\cdot)$ is never zero, we have $f'(x) = 0$ when the second factor vanishes:
$4x + 2 = 0$, or $x = -\frac12$.
But this is not the only critical point, since we must also consider when $f'(x)$
is undefined. This happens when the first factor $\mathrm{sgn}(2x^2 + 2x - 1)$ is undefined,
namely when $2x^2 + 2x - 1 = 0$, or $x = \frac14(-2 \pm \sqrt{12})$ by the Quadratic Formula.
These are the corners of the graph sitting on the x-axis: we must not skip
them, since they are actually the absolute minimum points.
example: Let $f(x) = x^2 + \frac{1}{(x-1)^2}$ on the interval $x \in [-2, 2]$, graph above
right, with:

$$f'(x) = \left(x^2 + (x-1)^{-2}\right)' = 2x + (-2)(x-1)^{-3}(x-1)' = \frac{2(x^4 - 3x^3 + 3x^2 - x - 1)}{(x-1)^3}.$$

We have $f'(x) = 0$ when the numerator vanishes, $x^4 - 3x^3 + 3x^2 - x - 1 = 0$, and
graphing this degree 4 polynomial gives approximate solutions $x = c_1 \approx -0.38$
and $c_2 \approx 1.82$ with $f(c_1) \approx 0.67$ and $f(c_2) \approx 4.80$. The endpoints $x = \pm 2$ give
$f(-2) = \frac{37}{9} \approx 4.11$ and $f(2) = 5$. We might be tempted to take the largest of
these outputs as the absolute maximum, but clearly none of these is the highest
point of the graph.
We neglected to consider when $f'(x)$ is undefined: this is when the
denominator $(x-1)^3 = 0$, or $x = 1$. This point is a discontinuity of $f(x)$,
so by Step 3 we must consider not only $f(1)$, which is undefined, but also
a small interval around $x = 1$. In fact, we have a vertical asymptote, and
$\lim_{x\to 1^-} f(x) = \lim_{x\to 1^+} f(x) = \infty$.
That is, $f(x)$ can get as large as desired for $x$ close enough to 1. There is no
absolute maximum. However, the rising asymptotes do not affect the absolute
minimum, which is still the smallest of the outputs at the other critical points,
namely $f(c_1) \approx 0.67$. Note that since $f(x)$ is not continuous, the Extremal
Value Theorem does not guarantee an absolute max or min; and in fact the
max does not exist, but the min does.
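As a rough numerical companion to this example, the sketch below locates the two roots of the numerator of $f'(x)$ by bisection and then samples $f(x)$ near the discontinuity $x = 1$. Bisection is an assumption made here for illustration (the notes find the roots by graphing), and the helper name `bisect_root` is hypothetical.

```python
def f(x):
    return x**2 + 1.0 / (x - 1.0)**2

def numerator(x):
    # Numerator of f'(x); for x != 1, f'(x) = 0 exactly where this vanishes.
    return x**4 - 3*x**3 + 3*x**2 - x - 1

def bisect_root(g, lo, hi, tol=1e-10):
    """Root of g in [lo, hi], assuming g(lo) and g(hi) have opposite signs."""
    g_lo = g(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        g_mid = g(mid)
        if (g_mid > 0) == (g_lo > 0):
            lo, g_lo = mid, g_mid   # root lies in the right half
        else:
            hi = mid                # root lies in the left half
    return 0.5 * (lo + hi)

c1 = bisect_root(numerator, -1.0, 0.0)   # numerator changes sign on [-1, 0]
c2 = bisect_root(numerator, 1.5, 2.0)    # and again on [1.5, 2]
print("c1 =", round(c1, 2), " f(c1) =", round(f(c1), 2))
print("c2 =", round(c2, 2), " f(c2) =", round(f(c2), 2))
print("f(-2) =", round(f(-2.0), 2), " f(2) =", f(2.0))

# Step 3: outputs near the discontinuity x = 1 grow without bound.
for eps in (1e-1, 1e-2, 1e-3):
    print(eps, f(1 - eps), f(1 + eps))
```

The growing outputs near $x = 1$ illustrate why no candidate from Steps 1 and 2 can be the absolute maximum.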
Math 132 Mean Value Theorem Stewart 3.2

Vanishing derivatives. We will prove some basic theorems which relate the
derivative of a function with the values of the function, culminating in the
Uniqueness Theorem at the end. The first result is:
Rolle's Theorem: If $f(x)$ is continuous on a closed interval $x \in [a, b]$
and differentiable on the open interval $x \in (a, b)$, and $f(a) = f(b)$,
then there is some point $c \in (a, b)$ with $f'(c) = 0$.
Here $x \in [a, b]$ means $a \le x \le b$, and $x \in (a, b)$ means $a < x < b$. See the graph
at left for an example: no matter how the curve wiggles, it must be horizontal
somewhere.

Physically, suppose $f(t)$ represents the height of a moving object at time $t$,
starting and finishing at the same position over the time interval $t \in [a, b]$. The
theorem says there must be a pause in the motion where $f'(t) = 0$: this is when
the object doubles back toward its start.
Proof of Theorem. Assume $f(x)$ satisfies the hypotheses of the Theorem. The
Extremal Value Theorem (3.1) guarantees that the continuous function $f(x)$
has at least one absolute maximum point $x = c_1 \in [a, b]$.
If $c_1 \ne a, b$, then $c_1 \in (a, b)$, and the First Derivative Theorem (3.1) says
that $f'(c_1) = 0$.

On the other hand, if $c_1 = a$ or $b$, then $f(c_1) = f(a) = f(b)$. Still, $f(x)$
also has an absolute minimum point $x = c_2$. If $c_2 \in (a, b)$, then $f'(c_2) = 0$
as before.

The only case left is if $c_1 = a$ or $b$, and also $c_2 = a$ or $b$, so that
$f(c_1) = f(c_2) = f(a) = f(b)$. Since the maximum and minimum values are the
same, $f(x)$ cannot move above or below $f(a)$. Thus, $f(x)$ can only be a
constant function, and $f'(c) = 0$ for all $c \in (a, b)$.
In every case, the conclusion holds, Q.E.D.

In formal mathematics, hypothesis (plural hypotheses) means the "if" part of a theorem,
the setup which is given or assumed. In our theorem, the three hypotheses are: $f(x)$ is
continuous on $[a, b]$, $f(x)$ is differentiable on $(a, b)$, and $f(a) = f(b)$.

Conclusion means the "then" part of a theorem, the payoff which is to be deduced from
the hypothesis: in our theorem, that $f'(c) = 0$.

Initials for Latin quod erat demonstrandum, meaning "which was to be shown", the
traditional end of a proof.
For Rolle's Theorem, as for most well-stated theorems, all the hypotheses
are necessary to be sure of the conclusion. In the graph at right above, $y = g(x)$
has a corner and $g'(1)$ does not exist, so just one hypothesis fails at just one
point. But already the conclusion is false: $g'(c) = -1$ for $c < 1$ and $g'(c) = 1$
for $c > 1$, but nowhere is $g'(c) = 0$. In physical terms, the velocity jumps
instantaneously from $-1$ to $1$ like an idealized ping-pong ball, and there is no
well-defined velocity at the moment of impact.

Derivatives versus difference quotients. Throughout our theory, the
derivative $f'(a)$ has been shadowed by the difference quotient, which across
an interval $[a, b]$ is $\frac{\Delta f}{\Delta x} = \frac{f(b)-f(a)}{b-a}$. Numerically, the difference quotient is an
approximation to the derivative: $\frac{df}{dx} \approx \frac{\Delta f}{\Delta x}$. In physical terms, the difference
quotient is the average rate of change of $f(x)$ over the interval $x \in [a, b]$.
Geometrically in terms of the graph $y = f(x)$, the difference quotient is the slope
of the secant line cutting through the points $(a, f(a))$ and $(b, f(b))$.
Now we come to the most powerful result of this section, which says that
the derivative is sometimes exactly equal to the difference quotient.
Mean Value Theorem (MVT): If $f(x)$ is continuous on a closed
interval $x \in [a, b]$ and differentiable on the open interval $x \in (a, b)$,
then there is some point $c \in (a, b)$ with $f'(c) = \frac{f(b)-f(a)}{b-a}$.
See the picture below for an example: as the graph rises from (a, f (a)) to
(b, f (b)), at some points the tangent line must be parallel to the secant line.

Note that Rolle's Theorem is the special case of MVT in which the secant line
is horizontal. In fact, we will prove MVT for a general $f(x)$ by cooking up a
new function $g(x)$ for which Rolle's Theorem applies, then translating Rolle's
conclusion back in terms of $f(x)$.
Proof of MVT. Suppose $f(x)$ satisfies the hypotheses. Then define a new function
$g(x)$, shown in the picture, which measures the height from the graph
$y = f(x)$ down to the secant line $y = f(a) + \frac{f(b)-f(a)}{b-a}(x-a)$:

$$g(x) = f(x) - f(a) - \frac{f(b)-f(a)}{b-a}(x-a).$$

Then $g(x)$ is continuous on $[a, b]$ by the Limit Laws (1.6), and differentiable
on $(a, b)$ by the Derivative Rules (2.3). In fact,

$$g'(x) = f'(x) - 0 - \frac{f(b)-f(a)}{b-a}(1-0) = f'(x) - \frac{f(b)-f(a)}{b-a},$$

since $f(a)$ and $\frac{f(b)-f(a)}{b-a}$ are constants (having no $x$ in them).
Also, we can easily compute that $g(a) = g(b) = 0$, so all the hypotheses of
Rolle's Theorem hold for $g(x)$. Thus the conclusion of Rolle's Theorem also
holds: there is some $c \in (a, b)$ with $g'(c) = 0$. That is,

$$g'(c) = f'(c) - \frac{f(b)-f(a)}{b-a} = 0,$$

which means $f'(c) = \frac{f(b)-f(a)}{b-a}$, Q.E.D.
The Mean Value Theorem does not give any way to find the particular $c \in (a, b)$
in the conclusion, so if we want this value in a particular case, we must solve
for $x$ in the equation $f'(x) = \frac{f(b)-f(a)}{b-a}$; however the Theorem will guarantee
that there is some solution.

example: Let $f(x) = 5\sqrt{x} - x\sqrt{x}$ over the interval $[a, b] = [0, 4]$.
To check the hypotheses of MVT, note that $\sqrt{x}$ is continuous for all $x \ge 0$, and
thus over $[0, 4]$. As for differentiability:

$$f'(x) = \left(5x^{1/2} - x^{3/2}\right)' = \tfrac52 x^{-1/2} - \tfrac32 x^{1/2}$$

is defined for $x > 0$, and hence over $x \in (0, 4)$: the hypothesis allows
$f'(a) = f'(0)$ to be undefined. Thus we conclude there must be some $c \in (0, 4)$ with
$f'(c) = \frac{f(b)-f(a)}{b-a} = \frac{2-0}{4-0} = \frac12$. That is, we must solve:

$$f'(x) = \tfrac52 x^{-1/2} - \tfrac32 x^{1/2} = \tfrac12,$$

which is equivalent to $3x + \sqrt{x} - 5 = 0$. Substituting the variable $u = \sqrt{x}$ gives
$3u^2 + u - 5 = 0$, so the Quadratic Formula gives:

$$u = \sqrt{x} = \frac{-1 \pm \sqrt{1^2 - 4(3)(-5)}}{2(3)} = \frac{-1 \pm \sqrt{61}}{6}.$$

The negative solution is impossible, and the positive one gives $x = c = \left(\frac{\sqrt{61}-1}{6}\right)^2 \approx 1.29$,
which agrees with the picture.
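A quick numerical check of this example may be reassuring. The sketch below, which is an illustration rather than part of the original notes, computes the secant slope over $[0, 4]$, builds $c$ from the positive root $u = \frac{-1+\sqrt{61}}{6}$, and confirms that the tangent slope at $c$ matches the secant slope.

```python
import math

def f(x):
    return 5 * math.sqrt(x) - x * math.sqrt(x)

def f_prime(x):
    # f'(x) = (5/2) x^(-1/2) - (3/2) x^(1/2), defined for x > 0
    return 2.5 / math.sqrt(x) - 1.5 * math.sqrt(x)

a, b = 0.0, 4.0
secant_slope = (f(b) - f(a)) / (b - a)

# Positive root of 3u^2 + u - 5 = 0, where u = sqrt(x).
u = (-1 + math.sqrt(61)) / 6
c = u**2

print("secant slope =", secant_slope)
print("c =", c)
print("f'(c) =", f_prime(c))
```

Both slopes should agree up to rounding, with $c$ lying strictly inside $(0, 4)$ as the theorem promises.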
Mathematical and physical uniqueness. We come to the most important
result of this section:
Uniqueness Theorem:
(a) If $f(x)$ is differentiable and $f'(x) = 0$ for all $x \in (a, b)$, then $f(x) = C$ is
a constant function.
(b) If $f(x), g(x)$ are differentiable and $f'(x) = g'(x)$ for all $x \in (a, b)$, then
$f(x) = g(x) + C$ for some constant $C$.
(c) If $f(x), g(x)$ are differentiable and $f'(x) = g'(x)$ for all $x \in (a, b)$, and
also $f(c) = g(c)$ for some $c \in (a, b)$, then $f(x) = g(x)$.

Proof. (a) Assume the hypothesis $f'(x) = 0$ for all $x \in (a, b)$, and imagine,
contrary to the conclusion, that $f(x)$ were not a constant function. Then we
would have two unequal values $f(a_1) \ne f(b_1)$ for some $a_1 < b_1$ in $(a, b)$, and we
could apply the Mean Value Theorem to the smaller interval $[a_1, b_1]$ to get some $c \in (a_1, b_1)$ with
$f'(c) = \frac{f(b_1)-f(a_1)}{b_1-a_1} \ne 0$, since $f(b_1) - f(a_1) \ne 0$. But this would be impossible,
since we assumed $f'(c) = 0$ for all $c \in (a, b)$. Hence, $f(x)$ cannot be
non-constant, and must be constant.
(b) Assume the hypothesis $f'(x) = g'(x)$ for all $x \in (a, b)$. Now the function
$h(x) = f(x) - g(x)$ has $h'(x) = f'(x) - g'(x) = 0$, so we can apply part (a) to
conclude that $h(x)$ is a constant function, $h(x) = f(x) - g(x) = C$, meaning
$f(x) = g(x) + C$ for some constant $C$.
(c) In the situation of (b), we also assume $f(c) = g(c)$. By (b), we know
$f(x) = g(x) + C$ and $C = f(x) - g(x)$ for all $x$. In particular for $x = c$, we have
$C = f(c) - g(c) = 0$, so $f(x) = g(x) + C = g(x)$, Q.E.D.
To see the significance of this theorem, recall from §2.7 the Ballistic
Equation $s(t) = s_0 + v_0 t - \frac12 g t^2$, which gives the height $s(t)$ of an object thrown
straight up from initial height $s_0$ at initial velocity $v_0$, under the influence
of constant gravitational acceleration $g$. We verified that the derivative
$s'(t) = v_0 - gt$ gives the expected velocity: decreasing from $v_0$ at a constant
rate of $g$.
But does this guarantee we have the correct function $s(t)$? What if there
were some other function $\tilde{s}(t)$ with the same derivative $\tilde{s}'(t) = s'(t)$ and the
same initial value $\tilde{s}(0) = s(0)$? Then $\tilde{s}(t)$ would be just as good a candidate to
give the height of the object, and our mathematical theory would not produce
a clear physical prediction. However, the Uniqueness Theorem (c) shows that
$\tilde{s}(t) = s(t)$, so the other solution could only be the same as the original solution.
Experiment shows that objects launched in exactly the same way always fly
the same way, not according to $s(t)$ in some experiments and a different $\tilde{s}(t)$ in
other experiments. This is what we mean by physical law. Our Theorem shows
that the mathematical solution has the same uniqueness as the experimental
result.


The theory of quantum mechanics, however, which explains atomic-scale phenomena,
goes beyond the framework of deterministic laws, incorporating randomness in an essential
way. It requires a yet higher mathematical theory, in which we apply calculus not to specific
positions, but to probability distributions on all possible positions.
Math 132 Derivatives and Graphs Stewart 3.3

Increasing and decreasing functions. We will see how to determine the important
features of a graph $y = f(x)$ from the derivatives $f'(x)$ and $f''(x)$,
summarizing our Method on the last page. First, we consider where the graph is rising
↗ and falling ↘. Formally:

Definition: A function $f(x)$ is increasing on the interval $[a, b]$ whenever
$f(x_1) < f(x_2)$ for every pair of inputs $x_1 < x_2$ in $[a, b]$; and $f(x)$ is
decreasing on $[a, b]$ whenever $f(x_1) > f(x_2)$ for every $x_1 < x_2$.

We can determine this with derivatives: the graph rises where its slope is positive.

Increasing/Decreasing Theorem: Let $f(x)$ be continuous on $[a, b]$.
If $f'(x) > 0$ for all $x \in (a, b)$, then $f(x)$ is increasing on $[a, b]$.
If $f'(x) < 0$ for all $x \in (a, b)$, then $f(x)$ is decreasing on $[a, b]$.

Proof. Assume the hypothesis that $f(x)$ is continuous on $[a, b]$ and $f'(x) > 0$, but
imagine, contrary to the conclusion, that $f(x)$ failed to be increasing. Then,
negating the definition, there would be some $x_1 < x_2$ with $f(x_1) \ge f(x_2)$. Applying
the Mean Value Theorem to the interval $[x_1, x_2]$, we would get some $x_3 \in (x_1, x_2)$
with

$$f'(x_3) = \frac{f(x_2) - f(x_1)}{x_2 - x_1} \le 0,$$

since $f(x_2) - f(x_1) \le 0$. But this would be impossible, since we assumed $f'(x) > 0$
for all $x \in (a, b)$, and hence for all $x$ in the smaller interval $(x_1, x_2)$. Thus,
$f(x_1) \ge f(x_2)$ is impossible, and we must have $f(x_1) < f(x_2)$ for all $x_1 < x_2$. The
second statement of the Theorem is proved similarly. Q.E.D.
example: For $f(x) = x^5 - 15x^3$, let us determine the rough shape of the graph
by examining the derivative:

$$f'(x) = 5x^4 - 45x^2 = 5x^2(x^2 - 9) = 5x^2(x-3)(x+3).$$

Since $f'(x)$ is defined everywhere, the critical points (or critical numbers) are the
solutions of $f'(x) = 0$, namely $x = -3, 0, 3$.

x      |  x < -3  |  -3  | -3 < x < 0 |  0  | 0 < x < 3 |   3   |  x > 3
f'(x)  |    +     |   0  |     -      |  0  |     -     |   0   |    +
f(x)   |    ↗     |  162 |     ↘      |  0  |     ↘     |  -162 |    ↗

Since $f'(x)$ is zero only at the critical points, it is all positive or all negative in
each interval between. For example, in the leftmost interval $(-\infty, -3)$, a sample
value is $f'(-4) = 560 > 0$, so $f'(x)$ is positive in the whole interval, and we put +
in the first column next to $f'(x)$. The rest of the $f'(x)$ row is similar.
What does this mean for the graph $y = f(x)$? From §3.1, we know the critical
points are candidates for local max/mins: hill tops or valley bottoms. Which is
which? To the left of $x = -3$, we have $f'(x) > 0$ so $f(x)$ is increasing ↗; to the
right, we have $f'(x) < 0$ so $f(x)$ is decreasing ↘. Evidently, $x = -3$ is a local max,
and $(-3, 162)$ is a hill top point of the graph. Similarly, $(3, -162)$ is a valley.
On the other hand, to the left and right of $x = 0$, we have $f'(x) < 0$, so
$f(x)$ is decreasing on both sides: this means $x = 0$ is a stationary point where the
graph levels out before continuing to descend. We get a good picture of the graph:

The reasoning in our example holds for any function:


First Derivative Test: Let $f(x)$ be a function differentiable in a small
interval around $x = c$, with $f'(c) = 0$.
If $f'(x) > 0$ for $x < c$ and $f'(x) < 0$ for $x > c$, then $x = c$ is a local
maximum of $f(x)$.
If $f'(x) < 0$ for $x < c$ and $f'(x) > 0$ for $x > c$, then $x = c$ is a local
minimum of $f(x)$.
If $f'(x)$ has the same sign on both sides of $x = c$, then $x = c$ is a
stationary point of $f(x)$, not an extremal point.
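The First Derivative Test can be mimicked numerically by sampling $f'(x)$ just left and right of each critical point. The sketch below applies this to $f(x) = x^5 - 15x^3$; the function name `classify` and the step size `h` are illustrative assumptions, and the sampling is only trustworthy when `h` is small enough that no other critical point lies within `h` of `c`.

```python
def f(x):
    return x**5 - 15 * x**3

def f_prime(x):
    return 5 * x**4 - 45 * x**2

def classify(c, h=1e-3):
    """First Derivative Test at a critical point c, sampling f' at c - h and c + h."""
    left, right = f_prime(c - h), f_prime(c + h)
    if left > 0 and right < 0:
        return "local max"
    if left < 0 and right > 0:
        return "local min"
    return "stationary"

for c in (-3, 0, 3):
    print(c, f(c), classify(c))
```

The output should reproduce the sign chart: a hill top at $x = -3$, a stationary point at $x = 0$, and a valley at $x = 3$.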

Concavity. A more subtle feature of a graph is where it curves upward or
downward. We say a graph is concave up near a point if it is part of a smiling curve
⌣; and concave down if it is part of a frowning curve ⌢. An inflection point is a
special point where the graph wiggles, changing its concavity: a transition point
between smiling and frowning. Some examples:

In terms of the slope, concave up means that as $x$ increases, the slope becomes
less negative or more positive. For concave down, the slope becomes less positive
or more negative.
Definition: Suppose the derivative $f'(x)$ is defined for $x$ near $c$.
$f(x)$ is concave up at $x = c$ if $f'(x)$ is increasing near $x = c$.
$f(x)$ is concave down at $x = c$ if $f'(x)$ is decreasing near $x = c$.
$f(x)$ has an inflection point at $x = c$ if $f'(x)$ has a local max or
local min at $x = c$.

Also note that $f(x) = x^5 - 15x^3$ has only odd powers of $x$, so $f(-x) = -f(x)$. This means
the graph has a 180° rotation symmetry, like a propeller. Such an $f(x)$ is called an odd function.
We can test for concavity using the second derivative f 00 (x):

Concavity Theorem: Let $f(x)$ be a function.
If $f''(x) > 0$ for all $x \in (a, b)$, then $f(x)$ is concave up over $(a, b)$.
If $f''(x) < 0$ for all $x \in (a, b)$, then $f(x)$ is concave down over $(a, b)$.
If $f''(c) = 0$ and $f''(x)$ changes its sign at $x = c$, then $f(x)$ has an
inflection point at $x = c$.

Proof. Applying the Increasing/Decreasing Theorem to the function $g(x) = f'(x)$,
we get: if $g'(x) > 0$, then $g(x)$ is increasing. But $g'(x) = (f'(x))' = f''(x)$, so
this means: if $f''(x) > 0$, then $f'(x)$ is increasing, and $f(x)$ is concave up. The
proof of the second part is similar. The third part comes from applying the First
Derivative Test to $g(x)$. Q.E.D.
example: Continuing the above $f(x) = x^5 - 15x^3$, $f'(x) = 5x^4 - 45x^2$, we have:

$$f''(x) = 20x^3 - 90x = 10x(2x^2 - 9).$$

The candidate inflection points are where $f''(x) = 0$, i.e. $x = 0$ and $x = \pm\frac32\sqrt2 \approx \pm 2.12$,
and the sign chart confirms the transitions in concavity at these points.

x       |       | -(3/2)√2   |       |  0  |       |  (3/2)√2    |
f''(x)  |  − −  |     0      |  + +  |  0  |  − −  |      0      |  + +
f(x)    |   ⌢   | (567/8)√2  |   ⌣   |  0  |   ⌢   | -(567/8)√2  |   ⌣

(I wrote double − − and + + just to make frowny and smiley faces: this is a good
way to remember which is which.) This agrees with the features of our graph
above, and it allows us to precisely determine the inflection points marked by
small diamonds in the picture: $(-\frac32\sqrt2, \frac{567}{8}\sqrt2)$, $(\frac32\sqrt2, -\frac{567}{8}\sqrt2)$, and $(0, 0)$, which
is both a stationary critical point and an inflection point.

Critical Points and Concavity. There is one more use we can make of the
second derivative. At a local max $x = c$, the slope changes from positive to
negative, so the graph is concave down and $f''(c) < 0$; while at a local min it is
concave up and $f''(c) > 0$. Thus, we can distinguish extremal points just from
the sign of $f''(c)$.

Second Derivative Test: Let $f(x)$ be a function with $f''(x)$ continuous
near $x = c$. Suppose $f'(c) = 0$.
If $f''(c) < 0$, then $x = c$ is a local maximum of $f(x)$.
If $f''(c) > 0$, then $x = c$ is a local minimum of $f(x)$.
If $f''(c) = 0$, then this test fails, and $x = c$ might be a local max,
a local min, or a stationary point.

Indeed, in our example, we have $f''(-3) = -270 < 0$ at the local max; $f''(0) = 0$
at the stationary point; and $f''(3) = 270 > 0$ at the local min.
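The second-derivative computations for $f(x) = x^5 - 15x^3$ can be reproduced in a few lines. This is a small sketch under the assumption that $f''(x) = 20x^3 - 90x$ as derived above; the names `f2` and `second_derivative_test` are illustrative. It also checks that $f''(x)$ changes sign at each candidate inflection point.

```python
import math

def f2(x):
    # f''(x) for f(x) = x^5 - 15x^3
    return 20 * x**3 - 90 * x

def second_derivative_test(c):
    """Classify a critical point c (where f'(c) = 0) by the sign of f''(c)."""
    value = f2(c)
    if value < 0:
        return "local max"
    if value > 0:
        return "local min"
    return "inconclusive"   # the test fails when f''(c) = 0

for c in (-3, 0, 3):
    print(c, f2(c), second_derivative_test(c))

# Inflection candidates: f''(x) = 0 at x = 0 and x = +/- (3/2) sqrt(2).
r = 1.5 * math.sqrt(2)
h = 1e-3
for x0 in (-r, 0.0, r):
    changes_sign = f2(x0 - h) * f2(x0 + h) < 0
    print(round(x0, 2), "sign change:", changes_sign)
```

At $x = 0$ the test is inconclusive, which matches the text: the First Derivative Test was needed to see that this point is stationary.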
Example. We will graph $f(x) = \frac{x^{2/3}}{(x-1)^2}$, going through the Method steps on
the last page.
1. Using the Quotient and Chain Rules, and much simplification, we get:

$$f'(x) = \frac{\frac23 x^{-1/3}(1-x)^2 - x^{2/3}\cdot 2(x-1)^1 (x-1)'}{(1-x)^4} = \frac{-\frac23(2x+1)}{(x-1)^3\, x^{1/3}}.$$

$$f''(x) = \frac{-\frac23(2)(x-1)^3 x^{1/3} + \frac23(2x+1)\left[3(x-1)^2 x^{1/3} + (x-1)^3\,\frac13 x^{-2/3}\right]}{(x-1)^6\, x^{2/3}} = \frac{\frac29(14x^2+14x-1)}{(x-1)^4\, x^{4/3}}.$$
2. The two types of critical points are solutions of:

$f'(x) = 0$, when the numerator is zero: $-\frac23(2x+1) = 0$, i.e. $x = -\frac12$;
$f'(x)$ = undefined, when the denominator is zero, i.e. $x = 1$ and $x = 0$.

3. The sign chart looks like:

x      |     |   -1/2    |     |   0    |     |   1    |
f'(x)  |  +  |     0     |  −  | undef  |  +  | undef  |  −
f(x)   |  ↗  |  (2/9)∛2  |  ↘  |   0    |  ↗  |   ∞    |  ↘

$(-\frac12, \frac29\sqrt[3]{2}) \approx (-0.5, 0.28)$ is a local maximum (hill top dot in the picture
below). We could also see this by taking $f''(-\frac12) = -\frac{32}{81}\sqrt[3]{2} < 0$.

$(0, 0)$ is a local minimum, but instead of a flat valley bottom it is a sharp
ravine (a cusp): instead of a horizontal tangent, the slope becomes
infinite and the tangent line is vertical.

$x = 1$ is a vertical asymptote (dashed line in picture): since the
denominator of $f(x)$ is zero, $f(1)$ is undefined and the function blows up to
$\infty$. Specifically, $f(x)$ is increasing to the left of $x = 1$, so the graph
shoots up to $\lim_{x\to1^-} f(x) = \infty$; and $f(x)$ is decreasing to the right of
$x = 1$, so the graph shoots down from $\lim_{x\to1^+} f(x) = \infty$.

4. The inflection points are solutions of $f''(x) = 0$, when the numerator is zero:

$$14x^2 + 14x - 1 = 0 \iff x = \frac{-7 \pm 3\sqrt7}{14} \approx -1.07,\ 0.07.$$

These are the small diamond points in the picture.
Here the solutions of $f''(x)$ = undefined are just the vertical asymptote $x = 1$
of $f(x)$, and also the vertical asymptote $x = 0$ of $f'(x)$. Because $f'(a)$ is
not defined at these points, they are not considered inflection points, though the concavity
does change.

5. The x and y-intercepts are both at (0, 0).

6. When $x$ is a very large positive or negative number, $x$ is almost the same as
$x-1$ (compare 1000 and 999). We can approximate $f(x)$ by replacing $x-1$
with $x$:

$$f(x) = \frac{x^{2/3}}{(x-1)^2} \approx \frac{x^{2/3}}{x^2} = x^{-4/3} = \frac{1}{x\sqrt[3]{x}} \quad \text{for large } |x|.$$

This simplified function is easy to graph (dotted curve in the picture), and
the true graph $y = f(x)$ approaches this curve like an asymptote at the left
and right ends of the x-axis.

7. This function does not have any symmetry.

8. Finally, the graph is:

Method for Graphing

1. Determine the derivatives $f'(x)$ and $f''(x)$.
2. Solve $f'(x) = 0$ and $f'(x)$ = undef to find the critical points.
3. Sign table: $f'(x) > 0$ means $f(x)$ is ↗; $f'(x) < 0$ means $f(x)$ is ↘. Classify
critical points as: local max, local min, stationary, or vertical asymptote.
4. Solve $f''(x) = 0$ or undef for inflection pts. (Sign table usually not needed.)
5. Find the x-intercepts by solving $f(x) = 0$, and the y-intercept $(0, f(0))$.
6. Find the behavior as $x \to \pm\infty$ by taking the highest-order terms in $f(x)$.
7. Check for symmetry: 180° rotation symmetry if $f(-x) = -f(x)$; or side-to-side
reflection symmetry if $f(-x) = f(x)$.
8. Draw all the above features on the graph.
We will discuss Step 6 in §3.4. A very detailed Method chart is at the end of §3.5.
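Steps 1 to 3 of the Method can be sanity-checked by machine for the example $f(x) = \frac{x^{2/3}}{(x-1)^2}$. The sketch below compares the closed-form $f'(x)$ from Step 1 against a central difference quotient and reads off the sign of $f'(x)$ at one sample point in each interval of the sign chart. The helpers `cbrt` and `central_diff`, the step `h`, and the sample points are assumptions chosen for illustration.

```python
import math

def cbrt(x):
    # Real cube root, valid for negative x as well.
    return math.copysign(abs(x) ** (1.0 / 3.0), x)

def f(x):
    # x^(2/3) computed as (cube root of x)^2 so it stays real for x < 0.
    return cbrt(x) ** 2 / (x - 1.0) ** 2

def f_prime(x):
    # Closed form from Step 1 of the example.
    return -(2.0 / 3.0) * (2 * x + 1) / ((x - 1.0) ** 3 * cbrt(x))

def central_diff(g, x, h=1e-6):
    # Symmetric difference quotient approximating g'(x).
    return (g(x + h) - g(x - h)) / (2 * h)

# One sample point in each interval cut out by the critical points -1/2, 0, 1.
samples = [-2.0, -0.25, 0.5, 3.0]
for x in samples:
    print(x, f_prime(x), central_diff(f, x))

signs = ["+" if f_prime(x) > 0 else "-" for x in samples]
print(signs)
```

Agreement between the two columns supports the simplification in Step 1, and the sign pattern matches the $f'(x)$ row of the sign chart.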
Math 132 Limits at Infinity Stewart 3.4

Vertical asymptotes. We say a curve has a line as an asymptote if, as the curve
runs outward to infinity, it gets closer and closer to the line. "Closer and closer"
reminds us of limits, and indeed we have seen that $x = a$ is a vertical asymptote
of $y = f(x)$ whenever one of the following holds:

$$\lim_{x\to a^-} f(x) = \infty, \quad \lim_{x\to a^+} f(x) = \infty, \quad \lim_{x\to a^-} f(x) = -\infty, \quad \lim_{x\to a^+} f(x) = -\infty.$$

As we saw in §1.5, $\infty$ has no meaning by itself; rather, the whole equation means
that, as $x$ gets closer to (but unequal to) $a$, the output $f(x)$ eventually becomes
higher than any given bound $B$, such as $B = 100$ or 1000 or 1 billion. Similarly,
a limit equals $-\infty$ when $f(x)$ becomes lower than $-B$ for any large $B$.
At the end of §3.3, we saw how a sign chart for $f'(x)$ can classify vertical
asymptotes. We could do this with a sign chart for $f(x)$ itself, with no derivatives.
example: Let:

$$f(x) = \frac{x^2 - 6x + 9}{x^3 - 6x^2 + 11x - 6} = \frac{(x-3)^2}{(x-1)(x-2)(x-3)} = \frac{x-3}{(x-1)(x-2)}.$$

(To determine vertical asymptotes and intercepts, we always want $f(x)$ in factored
form.) In the original form, the denominator vanishes at $x = 3$, but we work with
the cancelled form at right.
The function can only change its sign at points where $f(x) = 0$ (numerator =
0) or $f(x)$ is not defined (denominator = 0), that is, $x = 1, 2, 3$. In the interval
$x \in (-\infty, 1)$, the sign is given by a sample point like $f(0) = \frac{-3}{(-1)(-2)} = -\frac32 < 0$,
so $f(x)$ is negative; and similarly for the other intervals.

x     |     |  1   |     |  2   |     |  3  |
f(x)  |  −  |  ±∞  |  +  |  ±∞  |  −  |  0  |  +

Each time $x$ passes one of the sign-change candidates $x = a$, a factor $(x-a)$ changes
from negative to positive, and $f(x)$ does indeed change sign.

To factor the bottom, we try linear factors $x - \frac{m}{n}$, where $m$ is an integer
factor of the constant coefficient $-6$, and $n$ is an integer factor of the highest coefficient
1, so $m = \pm1, \pm2, \pm3, \pm6$ and $n = 1$. Trying $\frac{m}{n} = 1$, we find $x-1$ is a
factor, since polynomial long division gives $x^3 - 6x^2 + 11x - 6 = (x-1)(x^2 - 5x + 6)$, and the
quadratic is easy to factor. For a review of polynomial long division, see Khan Academy:
www.khanacademy.org/math/algebra2/polynomial_and_rational/dividing_polynomials/v/polynomial-division.
Here $f(x) = \pm\infty$ just means the denominator vanishes and there is a vertical
asymptote. The signs on each side of the asymptote show whether the graph
shoots upward or downward: we have $\lim_{x\to1^-} f(x) = -\infty$, $\lim_{x\to1^+} f(x) = \infty$,
$\lim_{x\to2^-} f(x) = \infty$, $\lim_{x\to2^+} f(x) = -\infty$.
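The sign chart for $f(x) = \frac{x-3}{(x-1)(x-2)}$ is easy to confirm by sampling. The following sketch (sample points and the offset `eps` are arbitrary illustrative choices) evaluates one point per interval and then probes each side of the two vertical asymptotes.

```python
def f(x):
    # Cancelled form of the example function.
    return (x - 3) / ((x - 1) * (x - 2))

# One sample point in each interval cut out by x = 1, 2, 3.
samples = [0, 1.5, 2.5, 4]
signs = ["+" if f(x) > 0 else "-" for x in samples]
print(signs)

# One-sided behavior at the vertical asymptotes x = 1 and x = 2.
eps = 1e-6
for a in (1, 2):
    print(a, "left:", f(a - eps), "right:", f(a + eps))
```

Large negative or positive outputs on each side indicate whether the graph shoots down or up there, matching the one-sided limits stated above.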

Horizontal asymptotes. To understand the behavior of the graph over the left
and right ends of the x-axis, we will need a new kind of limit in which $x$ becomes
larger and larger.

Definition:

$\lim_{x\to\infty} f(x) = L$ means that $f(x)$ can be forced arbitrarily close
to $L$, closer than any given $\epsilon > 0$, by making $x > B$ for some $B$.
$\lim_{x\to-\infty} f(x) = L$ means that $f(x)$ can be forced arbitrarily close
to $L$, closer than any given $\epsilon > 0$, by making $x < B$ for some $B$.

Graphically, $\lim_{x\to\infty} f(x) = L$ means that toward the right of the x-axis, the
graph $y = f(x)$ approaches the horizontal asymptote $y = L$; and similarly for
$\lim_{x\to-\infty} f(x) = L$ toward the left. We can even have $\lim_{x\to\infty} f(x) = \infty$, which
means that the graph goes off toward the upper right of the xy-plane in an
unspecified way.
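The most basic $x \to \infty$ limits are the power functions: for a positive real
number power $p > 0$, we have:

$$\lim_{x\to\infty} x^p = \infty, \qquad \lim_{x\to\infty} \frac{1}{x^p} = 0.$$

For $x \to -\infty$, consider the rational power $p = \frac{m}{n}$ where $m, n$ are positive integers
with $n$ odd (perhaps $n = 1$); then:

$$\lim_{x\to-\infty} x^{m/n} = \begin{cases} \infty & \text{for } m \text{ even} \\ -\infty & \text{for } m \text{ odd,} \end{cases} \qquad \lim_{x\to-\infty} \frac{1}{x^{m/n}} = 0.$$

Proof: For any large bound $C$, we can force $x^p > C$ if we take $x$ so large that $x > C^{1/p}$. For
any small error tolerance $\epsilon > 0$, we can force $|\frac{1}{x^p} - 0| < \epsilon$ if we take $x$ so large that $x > (\frac{1}{\epsilon})^{1/p}$.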
For example:

Based on these, we can deduce the horizontal asymptotes for any rational function
(quotient of polynomials).
example: Continuing $f(x) = \frac{x^2-6x+9}{x^3-6x^2+11x-6}$, does $y = f(x)$ have a horizontal
asymptote? Informally, we can reason as follows. For large $x$ (positive or
negative), the value of $x^2 - 6x + 9$ is relatively close to $x^2$: say for $x = 1000$,
compare $x^2 - 6x + 9 = 994{,}009$ and $x^2 = 1{,}000{,}000$. Thus we can approximate
$x^2 - 6x + 9 \approx x^2$, which we call the highest term of the polynomial. Also doing
this for the denominator:

$$f(x) = \frac{x^2-6x+9}{x^3-6x^2+11x-6} \approx \frac{x^2}{x^3} \quad \text{for large } x.$$

Thus, $\lim_{x\to\pm\infty} f(x) = \lim_{x\to\pm\infty} \frac{x^2}{x^3} = \lim_{x\to\pm\infty} \frac1x = 0$, and $y = f(x)$ has the horizontal
asymptote $y = 0$ for $x \to \infty$ and $x \to -\infty$. In the graph we drew previously, the
left and right ends do indeed approach the x-axis.
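Formally, we can show this from the Limit Laws by dividing numerator and
denominator by the highest term in the denominator:

$$\lim_{x\to\pm\infty} f(x) = \lim_{x\to\pm\infty} \frac{x^2-6x+9}{x^3-6x^2+11x-6} = \lim_{x\to\pm\infty} \frac{x^2-6x+9}{x^3-6x^2+11x-6}\cdot\frac{1/x^3}{1/x^3}$$

$$= \lim_{x\to\pm\infty} \frac{\frac1x - \frac{6}{x^2} + \frac{9}{x^3}}{1 - \frac6x + \frac{11}{x^2} - \frac{6}{x^3}} = \frac{0 - 6(0) + 9(0)}{1 - 6(0) + 11(0) - 6(0)} = 0.$$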
Warning: The informal argument is the easiest way to understand these limits,
but the formal argument (dividing by the highest term) might be required for full
credit on a quiz or test.
example: For $f(x) = \frac{3x^2 - x + 9}{5x^2 + 2x - 6}$, we take highest terms to get:

$$\lim_{x\to\infty} f(x) = \lim_{x\to\infty} \frac{3x^2 - x + 9}{5x^2 + 2x - 6} = \lim_{x\to\infty} \frac{3x^2}{5x^2} = \frac35.$$

Thus, $y = f(x)$ has horizontal asymptote $y = \frac35$ toward the right. We similarly
deduce $\lim_{x\to-\infty} f(x) = \frac35$, which means the same horizontal asymptote toward
the left.
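The highest-term rule for rational functions can be written as a short routine. This is a sketch of the informal argument only (it is not the formal Limit Law proof), and the function name `limit_at_infinity` and the coefficient-list convention are assumptions made for illustration.

```python
from fractions import Fraction

def limit_at_infinity(num, den):
    """Limit as x -> +infinity of a rational function, by comparing highest terms.

    num, den: integer coefficient lists, highest power first, with a nonzero
    leading coefficient. Returns a Fraction, or float('inf') / float('-inf').
    """
    deg_num, deg_den = len(num) - 1, len(den) - 1
    if deg_num < deg_den:
        return Fraction(0)                    # denominator dominates
    if deg_num == deg_den:
        return Fraction(num[0], den[0])       # ratio of leading coefficients
    # Numerator dominates: the sign is set by the leading coefficients.
    return float("inf") if num[0] * den[0] > 0 else float("-inf")

# (x^2 - 6x + 9) / (x^3 - 6x^2 + 11x - 6)
print(limit_at_infinity([1, -6, 9], [1, -6, 11, -6]))
# (3x^2 - x + 9) / (5x^2 + 2x - 6)
print(limit_at_infinity([3, -1, 9], [5, 2, -6]))

# Numerical sanity check at a large input.
x = 1e6
print((3 * x**2 - x + 9) / (5 * x**2 + 2 * x - 6))
```

The routine handles only $x \to +\infty$; for $x \to -\infty$ the infinite case would also need the parity of the degree difference.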
example: For
x2 + 3x7/2 x5
f (x) = ,
9x x + 4x2 x
the terms in the denominator are 9xx1/2 = 9x3/2 and 4x2 x1/2 = 4x5/2 , so the
second is the highest term. Thus:

x2 + 3x7/2 x5 3x7/2
lim f (x) = lim = lim
x x 9x x + 4x2 x x 4x5/2

3 7/25/2 3
= lim x = lim x = ,
x 4 x 4

which means $y = f(x)$ has no horizontal asymptote. However, the approximation
$f(x) \approx \frac34 x$ implies that the right end of the graph looks like a line with slope $\frac34$.
(See slant asymptotes in §3.5.) This function is not defined for $x < 0$, so there is
no left end.
Math 132 Curve Sketching Stewart 3.5

Man vs machine. In this section, we learn methods of drawing graphs by hand.
The computer can do this much better simply by plotting many points, so why
bother with our piddly sketches? One reason is that calculus tells us the critical
areas of the graph to look at: the computer might default to showing us some
uninteresting region which misses the main features.
This is part of a very great danger for anyone who uses mathematics. If you let
the computer do the thinking, not just the calculating, you are ready to accept any
bizarre wrong answer without any way to check it. Then one typo will escalate
until your scientific paper has to be retracted, your company's expenses are
ten times what you predicted, your bridge collapses, your rocket crashes. Don't let
it happen! Before accepting the computer's answer, you must check the expected
answer qualitatively against a story or sketch, and quantitatively by plotting sample
points.

Slant asymptote. This means a diagonal line $y = mx + b$ which is approached
by a graph $y = f(x)$. For example, consider the function:

$$f(x) = \frac{x^3 - 6x^2 + 11x - 6}{2x^2 - 8x}.$$

Recall from §3.3 that to find the large-scale behavior of $f(x)$ as $x \to \pm\infty$, we can
approximate by the highest term in numerator and denominator: $f(x) \approx \frac{x^3}{2x^2} = \frac12 x$.
Thus, the right and left ends of the graph look like lines with slope $\frac12$.
However, the graph does not actually approach the line $y = \frac12 x$: there is
a vertical shift, $y = \frac12 x + b$. To approximate better, and find the exact slant
asymptote of $y = f(x)$, we perform polynomial long division:
                 (1/2)x - 1              rem 3x - 6
  2x^2 - 8x  )   x^3 - 6x^2 + 11x - 6
               -(x^3 - 4x^2)
                      -2x^2 + 11x - 6
                    -(-2x^2 +  8x)
                                3x - 6

This means:

$$x^3 - 6x^2 + 11x - 6 = (\tfrac12 x - 1)(2x^2 - 8x) + (3x - 6),$$

so that:

$$f(x) = \frac{(\tfrac12 x - 1)(2x^2 - 8x) + (3x - 6)}{2x^2 - 8x} = \tfrac12 x - 1 + \frac{3x - 6}{2x^2 - 8x}.$$

That is, we have the approximation $f(x) \approx \frac12 x - 1$ with error term $\frac{3x-6}{2x^2-8x}$; but
this term gets vanishingly small:

$$\lim_{x\to\pm\infty} \frac{3x - 6}{2x^2 - 8x} = \lim_{x\to\pm\infty} \frac{3x}{2x^2} = 0.$$

That is, the difference between them vanishes as $x$ gets large: $\lim_{x\to\pm\infty} [f(x) - (mx+b)] = 0$.
As $x$ gets larger and larger, the error term gets smaller and smaller, and
the graph $y = f(x)$ gets closer and closer to the line $y = \frac12 x - 1$. This is what we
mean by a slant asymptote.
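The long division above can be checked mechanically. The following is a minimal sketch, not part of the original notes: the helper name `poly_divmod` and the coefficient-list convention (highest power first) are assumptions made for illustration. It uses exact fractions so the quotient $\frac12 x - 1$ and remainder $3x - 6$ come out without rounding.

```python
from fractions import Fraction

def poly_divmod(num, den):
    """Polynomial long division on coefficient lists (highest power first).

    Hypothetical helper: returns (quotient, remainder) with exact Fractions.
    Assumes the leading coefficient of den is nonzero.
    """
    rem = [Fraction(c) for c in num]
    den = [Fraction(c) for c in den]
    quotient = []
    # Keep dividing while the remainder's degree is at least the divisor's.
    while len(rem) >= len(den):
        coeff = rem[0] / den[0]          # next quotient coefficient
        quotient.append(coeff)
        for i, d in enumerate(den):      # subtract coeff * den, aligned at the top term
            rem[i] -= coeff * d
        rem.pop(0)                       # the leading term has cancelled to zero
    return quotient, rem

# g(x) = x^3 - 6x^2 + 11x - 6 divided by h(x) = 2x^2 - 8x
q, r = poly_divmod([1, -6, 11, -6], [2, -8, 0])
print("quotient coefficients:", q)   # coefficients of (1/2)x - 1
print("remainder coefficients:", r)  # coefficients of 3x - 6
```

The quotient gives the slant asymptote directly, and the remainder over the divisor is the error term that vanishes as $x \to \pm\infty$.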
For a general rational function $f(x) = \frac{g(x)}{h(x)}$, a quotient of polynomials $g(x), h(x)$,
we use polynomial long division to get $g(x) = q(x)h(x) + r(x)$ for a quotient polynomial
$q(x)$ and a remainder polynomial $r(x)$ having lower powers of $x$ than $h(x)$.
Thus:

$$f(x) = \frac{g(x)}{h(x)} = \frac{q(x)h(x) + r(x)}{h(x)} = q(x) + \frac{r(x)}{h(x)}.$$

Since the numerator $r(x)$ has lower degree than the denominator $h(x)$, we have
$\lim_{x\to\pm\infty} \frac{r(x)}{h(x)} = 0$, and $y = f(x)$ gets closer and closer to the curve $y = q(x)$. If
$q(x) = mx + b$, then $y = mx + b$ is a slant asymptote; otherwise, $y = q(x)$ is an
asymptotic curve of $y = f(x)$.

Rational function example. Referring to the Method for Graphing at the end
of this section, we apply the steps to the above function:

$$f(x) = \frac{x^3 - 6x^2 + 11x - 6}{2x^2 - 8x} = \frac{(x-1)(x-2)(x-3)}{2x(x-4)}.$$

1. We have:

$$f'(x) = \frac{x^4 - 8x^3 + 13x^2 + 12x - 24}{2x^2(x-4)^2}, \qquad f''(x) = \frac{3(x^3 - 6x^2 + 24x - 32)}{x^3(x-4)^3}.$$

The domain of $f(x)$ is all real numbers $x \neq 0, 4$ (the roots of the denominator), namely:

$$x \in (-\infty, 0) \cup (0, 4) \cup (4, \infty).$$

2. There is no neat way to solve $f'(x) = 0$. If we computer-plot the numerator
$x^4 - 8x^3 + 13x^2 + 12x - 24$, we see 4 roots, which we can name $x = a_1, \dots, a_4$,
approximately at:

$$a_1 \approx -1.26, \quad a_2 \approx 1.39, \quad a_3 \approx 2.61, \quad a_4 \approx 5.26.$$

In §3.8, we will learn Newton's Method to zero in on such approximate
solutions when algebraic ones are not available.
The other critical points are solutions of $f'(x) = \text{undefined}$, namely the roots
of the denominator $x = 0$ and $4$.

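3. The sign chart is:

$$\begin{array}{c|ccccccccccccc}
 & & a_1 & & & & a_2 & & a_3 & & & & a_4 & \\
x & & -1.26 & & 0 & & 1.39 & & 2.61 & & 4 & & 5.26 & \\ \hline
f'(x) & + & 0 & - & \text{undef} & - & 0 & + & 0 & - & \text{undef} & - & 0 & + \\
f(x) & \nearrow & -2.37 & \searrow & & \searrow & -0.05 & \nearrow & 0.05 & \searrow & & \searrow & 2.37 & \nearrow \\
 & & \text{max} & & \text{asym} & & \text{min} & & \text{max} & & \text{asym} & & \text{min} &
\end{array}$$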

4. To solve $f''(x) = 0$, a computer-plot of the numerator $x^3 - 6x^2 + 24x - 32$
appears to show a single root at $x = 2$. To check this, we do polynomial long
division $(x^3 - 6x^2 + 24x - 32) \div (x - 2)$ to show the numerator is $(x-2)(x^2 - 4x + 16)$,
where the quadratic factor has no real number roots. Thus $x = 2$ is the only
inflection point. (Solving $f''(x) = \text{undef}$ just gives the vertical asymptotes.)
We do not need a sign chart for $f''(x)$, since the concavity seen in the picture
below is forced by the known critical and inflection points: anything else
would lead to more wiggles.

5. Solving $f(x) = 0$ gives the $x$-intercepts $x = 1, 2, 3$. There is no $y$-intercept,
since the $y$-axis is a vertical asymptote.

6. The slant asymptote is $y = \frac12 x - 1$, from the beginning above.

7. This function does not have any of the standard symmetries in the Method.
However, the graph reveals a $180^\circ$ rotation symmetry around the point $(2, 0)$.
This is equivalent to the equation $f(4-x) = -f(x)$, which can be shown from
the factored form.

8. The graph is:
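As a numerical cross-check of steps 2, 3 and 7 of this example, here is a minimal sketch (an illustration added to the notes, not the method of §3.8): it locates the four roots of the numerator of $f'(x)$ by simple bisection, evaluates $f$ there, and tests the rotation symmetry $f(4-x) = -f(x)$ at one sample point. The function names and the bracketing intervals are assumptions chosen by inspecting sign changes of the numerator.

```python
def f(x):
    # Factored form of the rational function from the example
    return (x - 1) * (x - 2) * (x - 3) / (2 * x * (x - 4))

def fprime_numerator(x):
    # Numerator of f'(x); its roots are the critical points with f'(x) = 0
    return x**4 - 8 * x**3 + 13 * x**2 + 12 * x - 24

def bisect(g, lo, hi, tol=1e-10):
    """Root of g in [lo, hi], assuming g(lo) and g(hi) have opposite signs."""
    g_lo = g(lo)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        g_mid = g(mid)
        if (g_lo < 0) == (g_mid < 0):   # same sign: root lies in the right half
            lo, g_lo = mid, g_mid
        else:                           # sign change: root lies in the left half
            hi = mid
    return (lo + hi) / 2

# Intervals on which the numerator changes sign (found by trial evaluation)
brackets = [(-2, -1), (1, 2), (2, 3), (5, 6)]
critical = [bisect(fprime_numerator, lo, hi) for lo, hi in brackets]
values = [f(a) for a in critical]
symmetry_gap = f(4 - 0.7) + f(0.7)      # should be essentially zero

print([round(a, 2) for a in critical])
print([round(v, 2) for v in values])
print(symmetry_gap)
```

The rounded roots and values reproduce the entries of the sign chart, and the near-zero symmetry gap is consistent with the rotation symmetry about $(2, 0)$.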


Trigonometric example. Apply the Method at the end to: $s(x) = x - 2\sin(x)$.

1. We have: $s'(x) = 1 - 2\cos(x)$ and $s''(x) = 2\sin(x)$. (See §2.4.)
The domain is all real numbers: $x \in (-\infty, \infty)$.

2. The critical points are solutions of $s'(x) = 1 - 2\cos(x) = 0$, so $\cos(x) = \frac12$,
and $x = 60^\circ = \frac13\pi$, or $x = 2\pi - \frac13\pi = \frac53\pi$, or any shift of these by a multiple
of $2\pi$:

$$x = \tfrac13\pi \pm 2n\pi \quad\text{and}\quad \tfrac53\pi \pm 2n\pi \quad\text{for } n = 0, 1, 2, \dots$$

You can see this on the graph of $\cos(x)$.
There are no points with $s'(x) = \text{undefined}$.

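3. The sign chart for $s'(x)$ is periodic (repeating):

$$\begin{array}{c|ccccccccccccc}
x & & -\tfrac53\pi & & -\tfrac13\pi & & \tfrac13\pi & & \tfrac53\pi & & \tfrac73\pi & & \tfrac{11}{3}\pi & \\ \hline
s'(x) & - & 0 & + & 0 & - & 0 & + & 0 & - & 0 & + & 0 & - \\
s(x) & \searrow & -\tfrac53\pi - \sqrt3 & \nearrow & -\tfrac13\pi + \sqrt3 & \searrow & \tfrac13\pi - \sqrt3 & \nearrow & \tfrac53\pi + \sqrt3 & \searrow & \tfrac73\pi - \sqrt3 & \nearrow & \tfrac{11}{3}\pi + \sqrt3 & \searrow \\
 & & \text{min} & & \text{max} & & \text{min} & & \text{max} & & \text{min} & & \text{max} &
\end{array}$$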

4. The inflection points are solutions of $s''(x) = 2\sin(x) = 0$, or $x = n\pi$ for any
integer $n$. Every multiple of $\pi$ is an inflection point of $y = s(x)$.

5. The point $(0, s(0)) = (0, 0)$ is an $x$- and $y$-intercept. From the graph, we can
see that there are two more $x$-intercepts, but we have no way to find them
exactly. (We can approximate by Newton's Method, §3.8.)

6. The large-scale behavior can be approximated by taking the highest or
largest term: $s(x) \approx x$. However, the line $y = x$ is not a slant asymptote,
because $s(x)$ oscillates above and below this line, without getting closer and
closer.

7. This is an odd function, since:

$$s(-x) = (-x) - 2\sin(-x) = -x + 2\sin(x) = -s(x).$$

Thus, the graph has $180^\circ$ rotation symmetry.
This function is not periodic, since $s(x+2\pi) \neq s(x)$, so the graph does
not have the shift-sideways translation symmetry. However, we do have
$s(x+2\pi) = s(x) + 2\pi$, so the graph can be moved to itself by shifting sideways
and up!
8. The graph is:
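The claims in steps 2, 3 and 7 of this example can be spot-checked numerically. The sketch below is an added illustration using only Python's standard `math` module; the sample point `x0 = 0.4` is an arbitrary choice.

```python
import math

def s(x):
    return x - 2 * math.sin(x)

def s_prime(x):
    return 1 - 2 * math.cos(x)

# Critical points in one period, from cos(x) = 1/2
c_min = math.pi / 3        # local min, value pi/3 - sqrt(3)
c_max = 5 * math.pi / 3    # local max, value 5*pi/3 + sqrt(3)

x0 = 0.4                                               # arbitrary sample point
odd_gap = s(-x0) + s(x0)                               # zero for an odd function
shift_gap = s(x0 + 2 * math.pi) - (s(x0) + 2 * math.pi)  # zero if s(x+2pi) = s(x)+2pi

print(s_prime(c_min), s_prime(c_max))
print(s(c_min), s(c_max))
print(odd_gap, shift_gap)
```

Both derivative values print as essentially zero, and the two gaps vanish up to floating-point rounding, in line with the oddness and shift-sideways-and-up symmetry described in step 7.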
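Method for Graphing (detailed)
1. Determine the derivatives $f'(x)$ and $f''(x)$ with Derivative Rules.
Determine the domain of $f(x)$: for what $x$ the formula makes sense.
2. Solve $f'(x) = 0$ and $f'(x) = \text{undef}$ to find the critical points.
3. Make a sign table for $f'(x)$ to classify each critical point $x = a$:

$$\begin{array}{l|c|ccc}
 & & x<a & x=a & x>a \\ \hline
\text{local max } \frown & f'(x) & + & 0 & - \\
 & f(x) & \nearrow & f(a) & \searrow \\ \hline
\text{local min } \smile & f'(x) & - & 0 & + \\
 & f(x) & \searrow & f(a) & \nearrow \\ \hline
\text{local max } \wedge & f'(x) & + & \text{undef} & - \\
 & f(x) & \nearrow & f(a) & \searrow \\ \hline
\text{local min } \vee & f'(x) & - & \text{undef} & + \\
 & f(x) & \searrow & f(a) & \nearrow \\ \hline
\text{vert asymp } \nearrow\!|\!\nwarrow & f'(x) & + & \tfrac10 & - \\
 & f(x) & +\infty & \tfrac10 & +\infty \\ \hline
\text{vert asymp } \nearrow\!|\!\swarrow & f'(x) & + & \tfrac10 & + \\
 & f(x) & +\infty & \tfrac10 & -\infty \\ \hline
\text{vert asymp } \searrow\!|\!\nwarrow & f'(x) & - & \tfrac10 & - \\
 & f(x) & -\infty & \tfrac10 & +\infty \\ \hline
\text{vert asymp } \searrow\!|\!\swarrow & f'(x) & - & \tfrac10 & + \\
 & f(x) & -\infty & \tfrac10 & -\infty
\end{array}$$

Here $f(a)$ means the output value is defined; and $\frac10$ means a zero denominator
at $x = a$ produces values $\pm\infty$. There are other possibilities if $x = a$ is a
discontinuity (see §1.8).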
4. Solve $f''(x) = 0$ or undef to find inflection points $x = a$; we also require that
$f'(a)$ exists and is a local max/min of $f'(x)$. Make a sign table for $f''(x)$
if concavity is needed: $f''(x) > 0$ means concave up (smiling), $f''(x) < 0$
means concave down (frowning).
5. Solve $f(x) = 0$ to find the $x$-intercepts; and compute the $y$-intercept $(0, f(0))$.
6. Find the behavior as $x \to \pm\infty$.
   Approximate by highest terms on top and bottom to get $f(x) \approx cx^p$.
   For a better approximation of a rational function $f(x) = \frac{g(x)}{h(x)}$, use polynomial
   long division to get $f(x) = q(x) + \frac{r(x)}{h(x)}$.
   If $f(x) = mx + b + \frac{r(x)}{h(x)}$, then $y = mx + b$ is a slant asymptote.
   In general, $y = f(x)$ asymptotically approaches $y = q(x)$ as $x \to \pm\infty$.
7. Check for symmetries: ways to move the graph onto itself.
   Side-to-side reflection symmetry for even function $f(-x) = f(x)$.
   examples: $x^2 + 3$, $x^4$, $\cos(x)$
   $180^\circ$ rotation symmetry for odd function $f(-x) = -f(x)$.
   examples: $2x$, $x^3$, $\sin(x)$
   Shift-sideways translation symmetry for periodic $f(x+c) = f(x)$.
   examples: $\cos(x+2\pi) = \cos(x)$, $\tan(x+\pi) = \tan(x)$.
8. Draw all the above features on the graph.
Math 132 Optimization Stewart 3.7

Rectangle example. Suppose we have 40 meters of fence to make a rectangular
corral. What length and width will fence off the largest area? The range of
possibilities is illustrated below:

It appears that the square with length and width $\ell = w = 10$ gives the maximum
area $A = \ell w = 100\ \text{m}^2$. To prove this algebraically, we note that the perimeter is
constant, $P = 2\ell + 2w = 40$; so the length determines the width and also the area:

$$w = \tfrac12(40 - 2\ell) = 20 - \ell, \qquad A = \ell w = \ell(20 - \ell) = 20\ell - \ell^2.$$

That is, the quantity to be maximized, $A$, is a function of the variable $\ell$, which is
allowed to vary between $\ell = 0$ and $\ell = 20$ (which corresponds to $w = 0$). This is
a familiar problem: find the absolute maximum of

$$A(\ell) = 20\ell - \ell^2 \quad\text{over the interval } \ell \in [0, 20].$$

The critical points are given by $\frac{dA}{d\ell} = 20 - 2\ell = 0$, i.e. $\ell = 10$ with output
$A(10) = 100$, and the endpoint outputs are $A(0) = 0$, $A(20) = 0$. The largest of
these is the absolute maximum: $\ell = 10$ with $A(\ell) = 100$, as expected.
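The comparison of candidate points can be written out as a tiny computation. This is a minimal sketch added for illustration; the function name `area` and the candidate list are assumptions that simply restate the numbers of the rectangle example.

```python
def area(length, perimeter=40):
    # Width is forced by the constant perimeter: w = P/2 - length
    width = perimeter / 2 - length
    return length * width

# Candidates: the two endpoints of [0, 20] and the critical point from dA/dl = 0
candidates = [0, 10, 20]
best_length = max(candidates, key=area)

print(best_length, area(best_length))
```

Taking the largest output among the endpoints and critical points is exactly the closed-interval comparison used in the text.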

Method for optimization. We wish to find the maximum or minimum possible


value of a quantity within the constraints of some (usually geometric) situation.

1. Draw a picture labeled with numerical constant values and with letters for
varying quantities, including the target variable which is to be maximized
or minimized.
2. Write equations relating variables according to the geometry of the picture.
3. Choose one of the non-target variables as the controlling variable, and write
all other variables in terms of it by solving the above equations. Also de-
termine the relevant domain of the controlling variable, which is usually
restricted by requiring all lengths to be positive.
4. Find the absolute maximum or minimum of the target variable over the relevant
interval, say $T = T(x)$ over $x \in [a, b]$. That is, solve $T'(x) = 0$ or undef,
to find the critical points $x = c_1, c_2, \dots$, as well as the endpoints $x = a, b$.
Take the output values $T(x)$ at these candidate points: the largest/smallest
value is the desired maximum/minimum.
5. If needed, find values of the other variables at the optimum. Make sure the
answer is physically plausible to check for mistakes.
Bucket example. Consider a 10-quart bucket with cylindrical sides and a circular
bottom. What radius and height will minimize the surface area of the sides and
bottom?

1. The target variable to be minimized is surface area $S$ (in square inches). The
other variables are radius $r$ (inches) and height $h$ (inches). The constant
volume is $V = 10$ quarts; to make this comparable to the other variables,
we must convert to $V = 577.5$ cubic inches.

2. Equations. The volume $V$ is the base area $\pi r^2$ times the height $h$. For the
surface $S$: the sides, if unrolled, form a rectangle with the same height $h$ as
the cylinder, and width equal to the perimeter of the bottom, $2\pi r$; and we
also add the bottom area $\pi r^2$. Thus:

$$V = \pi r^2 h = 577.5, \qquad S = \pi r^2 + 2\pi r h = \min.$$

3. We can choose either $r$ or $h$ as the controlling variable, but it is easier to
write the variables in terms of $r$, so we choose it. The other variables are:

$$h = \frac{577.5}{\pi r^2}, \qquad S = \pi r^2 + 2\pi r \cdot \frac{577.5}{\pi r^2} = \pi r^2 + \frac{1155}{r}.$$

The only restriction on $r$ is $r > 0$. (The radius can be very large if the height
is correspondingly tiny, which is clearly not optimal, but still possible.) Thus,
the domain is the open interval $r \in (0, \infty)$.
4. We must find the absolute minimum of $S(r) = \pi r^2 + \frac{1155}{r}$ over $r \in (0, \infty)$.
To find the critical points:

$$\frac{dS}{dr} = 2\pi r - \frac{1155}{r^2} = 0 \implies 2\pi r = \frac{1155}{r^2} \implies r = \sqrt[3]{\frac{1155}{2\pi}} \approx 5.69.$$

This is the only critical point, with output value $S(r) \approx 304.7$.
Since the endpoint values $S(0)$ and $S(\infty)$ are not defined, we must consider
the limiting values approaching these points: $\lim_{r\to 0^+} S(r) = \infty$ and
$\lim_{r\to\infty} S(r) = \infty$. This means $S(r)$ has no absolute maximum, but can get
as large as desired if we make $r$ large or small enough.
The remaining candidate must be the absolute minimum point, $r = \sqrt[3]{\frac{1155}{2\pi}}$.
5. At the minimum point, the other variable is:

$$h = \frac{577.5}{\pi \left(\sqrt[3]{\frac{1155}{2\pi}}\right)^2} \approx 5.69.$$

In fact, we can simplify to show $h = r$, meaning the optimal bucket is twice
as wide as it is high.
This is plausible. However, an actual 10-quart bucket has dimensions about
as wide as it is high, about $r = \frac12 h = 4.5$ in, which uses more plastic than
necessary. Try to explain what other factors might influence the design.
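The bucket numbers can be reproduced in a few lines. The sketch below is an added illustration with assumed names (`VOLUME`, `surface`, `r_opt`, `h_opt`); it evaluates the critical radius from step 4, confirms $h = r$, and compares against the roughly real-world proportions $r = 4.5$ quoted above.

```python
import math

VOLUME = 577.5  # cubic inches, i.e. 10 quarts as converted in step 1

def surface(r):
    # S(r) = pi r^2 + 2 pi r h with h = V / (pi r^2), i.e. pi r^2 + 2V / r
    return math.pi * r**2 + 2 * VOLUME / r

# Critical point from dS/dr = 2 pi r - 1155 / r^2 = 0
r_opt = (2 * VOLUME / (2 * math.pi)) ** (1 / 3)
h_opt = VOLUME / (math.pi * r_opt**2)

print(round(r_opt, 3), round(h_opt, 3), round(surface(r_opt), 1))
print(round(surface(4.5), 1))  # the narrower, taller bucket uses more material
```

Evaluating `surface` a little to either side of `r_opt` gives larger values, which matches the conclusion that the single critical point is the absolute minimum.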

Ants example. A line of ants marches across a 10 cm × 10 cm square of carpet
from the lower left to the upper right corner. Part of their path is along the edge
next to the carpet, where their speed is 1 cm/sec, and part diagonally across the
carpet, where their speed is $\frac12$ cm/sec. What path should they take along the edge
before entering the carpet, so as to minimize (a) the total distance; and (b) the
total travel time?

1. Some variables are $e$, the distance traveled along the edge, and $c$, the distance
traveled across the carpet. The target variable to be minimized in each part
is: (a) the total distance $D$ in cm; and (b) the total time $T$ in sec.

2. Equations:

$$c^2 = 10^2 + (10-e)^2, \qquad D = e + c.$$

Also, we know speed × time = distance, so time = distance/speed. The
travel time along the edge is $e/1 = e$, along the carpet $c/\frac12 = 2c$, with total:

$$T = e + 2c.$$

3. The obvious controlling variable is $e$, since we can easily write the other
variables in terms of it, including the target variables:

$$c = \sqrt{10^2 + (10-e)^2} = \sqrt{200 - 20e + e^2},$$

$$D = e + \sqrt{200 - 20e + e^2}, \qquad T = e + 2\sqrt{200 - 20e + e^2}.$$

The relevant domain is $e \in [0, 10]$.
4. For part (a), the critical points are given by:

$$\frac{dD}{de} = 1 + \tfrac12 (200 - 20e + e^2)^{-1/2} (200 - 20e + e^2)' = 1 - \frac{10 - e}{\sqrt{200 - 20e + e^2}} = 0,$$

which reduces to $\sqrt{200 - 20e + e^2} = 10 - e$, then to $200 - 20e + e^2 = (10-e)^2$,
which cancels out to give the impossible equation $200 = 100$. Thus, there are
no critical points, and the absolute minimum must be one of the endpoints.
Since $D(0) = 10\sqrt2 \approx 14.1 < D(10) = 20$, the minimum is at $e = 0$.
For part (b), the critical points are given by:

$$\frac{dT}{de} = 1 - \frac{2(10-e)}{\sqrt{200 - 20e + e^2}} = 0 \implies \sqrt{200 - 20e + e^2} = 20 - 2e$$

$$\implies 200 - 20e + e^2 = (20 - 2e)^2 \implies 3e^2 - 60e + 200 = 0.$$

The Quadratic Formula then gives:

$$e = \frac{60 \pm \sqrt{60^2 - 4(3)(200)}}{2(3)} = 10 \pm \frac{10\sqrt3}{3} \approx 4.2,\ 15.8.$$

Since the second solution is outside the relevant domain $e \in [0, 10]$, we
discard it, and conclude that the only critical point is $e = 10 - \frac{10\sqrt3}{3} \approx 4.2$,
with value $T(e) = 10 + 10\sqrt3 \approx 27.3$. Comparing this to the endpoints
$T(0) = 20\sqrt2 \approx 28.3$ and $T(10) = 30$, we find the absolute minimum is at
$e = 10 - \frac{10\sqrt3}{3} \approx 4.2$ with $T(e) = 10 + 10\sqrt3 \approx 27.3$.

5. For part (a), the minimum distance at $e = 0$ is obvious, at least in retrospect:
this just says the straight diagonal is the shortest path between the two
corners.
For part (b), the minimum time is about $T(4.2) \approx 27.3$ sec: that is, at a
speed between 0.5 and 1 cm/sec, the ants can cross the 10 cm × 10 cm square
in about 27 sec, which is reasonable. This is a slight saving over the straight
diagonal path, which takes about 28 sec. (This assumes that they move at
carpet speed along the vertical edge of the square; if they moved at floor
speed, they would do much better to go around the obstacle, at 20 sec.)
A line of ants will usually find the minimum distance path over a landscape
by gradually tightening their curves; what do you think they would do in
this case?

Note that $\frac{dD}{de}$ is always defined over the relevant interval $e \in [0, 10]$, since
$200 - 20e + e^2 = 10^2 + (10-e)^2 > 0$.
Math 132 Newton's Method Stewart 3.8

Roots of equations. We frequently need to solve equations for which there is
no neat algebraic solution, such as:

$$f(x) = x^3 + x - 1 = 0.$$

In this case, the best we can ask is an approximate solution, accurate to a specified
number of decimal places, and this is all we need for any practical purpose.
We can start with a computer graph of $y = f(x)$, which is just a display of
many plotted points $(x, f(x))$:

A solution of $f(x) = 0$ is an $x$-intercept of the graph, and we see one, call it
$x = a$, close to $x = 0.5$; that is, our first estimate is $a \approx 0.5$. Computing:

$$f(0.5) = -0.375 < 0, \qquad f(0.6) = -0.184 < 0, \qquad f(0.7) = 0.043 > 0,$$

the Intermediate Value Theorem (§1.8) guarantees a solution $0.6 < a < 0.7$;
thus we can improve our estimate to $a \approx 0.6$. We could add a decimal place by
checking $f(0.61), f(0.62), \dots, f(0.69)$ to see where the values change from negative
to positive, but this is clearly very tedious and inefficient.
Newton's Method is an amazingly efficient way to refine an approximate solution
to get more and more accurate ones, until the required accuracy is reached.
Let us call our first estimate $x_1 = 0.5$. We are seeking the true solution $x = a$, the
$x$-intercept of $y = f(x)$. As in §2.9, let us approximate $y = f(x)$ by its tangent
line at our initial point $(x_1, f(x_1))$, namely $y = f(x_1) + f'(x_1)(x - x_1)$:

How do we know there is no other solution $x = b$? If there were, Rolle's Theorem (§3.2) says
that there would be some $x = c \in (a, b)$ with $f'(c) = 0$, namely a hill or valley of $y = f(x)$. But
$f'(x) = 3x^2 + 1 = 0$ clearly has no solutions, so $y = f(x)$ has no hills or valleys, and there cannot
exist another solution $x = b$.
You can see how the tangent line (in red) is very close to the graph near $x = x_1$, and
fairly close even near the true solution $x = a$. We cannot solve for the $x$-intercept
of $y = f(x)$, but we can find the $x$-intercept of the line, denoted $x = x_2$:

$$f(x_1) + f'(x_1)(x - x_1) = 0 \implies x = x_2 = x_1 - \frac{f(x_1)}{f'(x_1)}.$$

This solution $x_2$ is not exactly $a$, but it is closer than the initial estimate $x_1$.
Now we can iterate (green line), repeating the same computation starting with
$x_2$ instead of $x_1$. The result is:

$$x_3 = x_2 - \frac{f(x_2)}{f'(x_2)},$$

which is much closer to $a$; then $x_4 = x_3 - \frac{f(x_3)}{f'(x_3)}$; and repeating the same way we
get the following spreadsheet, computing to 3 decimal places:

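$$\begin{array}{c|cccc}
n & x_n & f(x_n) & f'(x_n) & x_{n+1} \\ \hline
1 & 0.500 & -0.375 & 1.750 & 0.714 \\
2 & 0.714 & 0.079 & 2.531 & 0.683 \\
3 & 0.683 & 0.002 & 2.400 & 0.682 \\
4 & 0.682 & 0.000 & 2.397 & 0.682 \\
5 & 0.682 & 0.000 & 2.397 & 0.682
\end{array}$$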

The $x_n$'s will continue as real numbers to converge closer and closer to $a$, but
since we do not see any difference in our 3 decimal places after $x_4$, there is no
point in continuing. We already have our answer within the specified accuracy:

$$a \approx 0.682 \quad\text{accurate to 3 decimal places.}$$

In the table, $f(x_4) \approx 0.000$ confirms that $x_4$ is indeed an approximate solution of $f(x) = 0$.

Newton's Method. We wish to solve an equation $f(x) = 0$, with the true
solution $x = a$ fairly close to an initial estimate $a \approx x_1$, and the final approximation
$a \approx x_n$ accurate to a specified number of decimal places.
1. Using a calculator, spreadsheet, or computer algebra system, compute $x_2, x_3, \dots$
according to the formula:

$$x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)},$$

computing with at least the specified accuracy (number of decimal places).

2. Stop once $x_n \approx x_{n+1}$ are the same up to the given accuracy. The final
approximation is $a \approx x_n$.
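The two steps of the Method translate almost directly into a loop. The following is a minimal sketch under stated assumptions, not a robust solver: the function name `newton`, the `places` and `max_steps` parameters, and the stopping test by rounding are all illustrative choices, and the sketch assumes $f'(x_n) \neq 0$ along the way.

```python
def newton(f, fprime, x, places=3, max_steps=50):
    """Iterate x -> x - f(x)/f'(x) until successive estimates agree
    when rounded to the requested number of decimal places."""
    for _ in range(max_steps):
        x_next = x - f(x) / fprime(x)
        if round(x_next, places) == round(x, places):
            return x_next
        x = x_next
    raise RuntimeError("no agreement within max_steps; try a closer first estimate")

# The worked example: f(x) = x^3 + x - 1 with first estimate x1 = 0.5
root = newton(lambda x: x**3 + x - 1, lambda x: 3 * x**2 + 1, 0.5)
print(round(root, 3))
```

Started from $x_1 = 0.5$, the loop passes through the same estimates as the spreadsheet above before the rounded values stop changing.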

Trigonometric equation: Let us solve, to 3 decimal places, the equation:

$$\cos(x) = x.$$

(As always in Calculus, we assume $x$ is in radians: see §2.5 end.)

Looking at the graph, we see that there is a unique solution somewhere around
$x_1 = 1$. This seems different from the previous case, since we seek the intersection
of two graphs rather than the $x$-intercept of a single graph; but we can simply
rewrite the equation as $f(x) = x - \cos(x) = 0$. Newton's Method gives:

$$x_{n+1} = x_n - \frac{x_n - \cos(x_n)}{1 + \sin(x_n)},$$

$$\begin{array}{cccc}
x_1 & x_2 & x_3 & x_4 \\ \hline
1.000 & 0.750 & 0.739 & 0.739
\end{array}$$

That is, the solution is $a \approx 0.739$ to 3 places.

Numerical roots. The number $\sqrt2$ is a known value: a calculator can immediately
tell us that $\sqrt2 = 1.41421356\ldots$. But just how does the calculator know
this? Newton's Method, that's how!
By definition, $\sqrt2$ is the solution of $x^2 = 2$, or $f(x) = x^2 - 2 = 0$. Starting
with $x_1 = 1$, the Method gives $x_{n+1} = x_n - \frac{x_n^2 - 2}{2x_n}$ and:

$$\begin{array}{c|c}
x_1 & 1.00000000 \\
x_2 & 1.50000000 \\
x_3 & 1.41666667 \\
x_4 & 1.41421569 \\
x_5 & 1.41421356 \\
x_6 & 1.41421356
\end{array}$$

Here we see the power of the Method: with just a couple of dozen $+, -, \times, \div$
calculator operations, it converged from 0 places to 8 places of accuracy.
We could also do the Method with fractions rather than decimals to get very
accurate fractional approximations of $\sqrt2$:

$$\begin{array}{c|c}
x_1 & 1 \\
x_2 & 3/2 \\
x_3 & 17/12 \\
x_4 & 577/408
\end{array}$$

Already $x_3 = \frac{17}{12}$ is a very good approximation, since $(\frac{17}{12})^2 = \frac{289}{144} = 2 + \frac{1}{144}$, very
close to 2. However, no fraction or finite decimal can give $\sqrt2$ exactly: it is known
to be an irrational number.
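The fractional version of the iteration is easy to reproduce with exact rational arithmetic. This is a minimal sketch added for illustration, using Python's standard `fractions` module; the helper name `newton_sqrt2_step` is an assumption.

```python
from fractions import Fraction

def newton_sqrt2_step(x):
    # One Newton step for f(x) = x^2 - 2, kept exact by using Fractions
    return x - (x * x - 2) / (2 * x)

x = Fraction(1)
approximations = [x]
for _ in range(4):
    x = newton_sqrt2_step(x)
    approximations.append(x)

for n, value in enumerate(approximations, start=1):
    # Show each fraction next to its decimal value
    print(f"x{n} = {value} = {float(value):.8f}")
```

Each square differs from 2 by a fraction with numerator 1, for instance $(\frac{17}{12})^2 - 2 = \frac{1}{144}$, which makes the rapid convergence visible without any rounding.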
Math 132 Antiderivatives Stewart 3.9

Reversing differentiation. In many problems, especially physical ones, we are
interested in some function $F(x)$, but we only know its derivative $F'(x) = f(x)$.
We need to reverse the differentiation process to find the original $F(x)$, the
antiderivative of $f(x)$.
For example, suppose $F(t)$ represents the height of an object at time $t$, but we
only know the velocity:

$$F'(t) = f(t) = t^2 + 2.$$

What was the original $F(t)$? Recalling our Basic Derivative $(t^n)' = nt^{n-1}$, we
realize that $(\frac13 t^3)' = \frac13(3t^2) = t^2$, so $F(t) = \frac13 t^3 + 2t$ works. But this is not the
only possible answer because any constant term $C$ disappears in the derivative, so
the general answer is:

$$F(t) = \tfrac13 t^3 + 2t + C.$$

This family of functions is called the general antiderivative.

The non-uniqueness of $F(t)$ means that the velocity alone does not determine
the height. But if we know the height at just one time, for example the initial
height $F(0) = 5$, then we can adjust the constant $C$ in a unique way to satisfy
this requirement:

$$F(0) = \tfrac13(0^3) + 2(0) + C = 5 \implies C = 5.$$

That is, $F(t) = \frac13 t^3 + 2t + 5$ is the unique function with $F'(t) = t^2 + 2$ and $F(0) = 5$.
We have solved an initial value problem. (See the Ballistic Equation, end of §2.7.)
Generalizing we have:

Definition. $F(x)$ is an antiderivative of $f(x)$ means $F'(x) = f(x)$.

Antiderivative Theorem. Assume $F(x)$ is some particular antiderivative
of $f(x)$ for $x \in (a, b)$. Then:
(a) The general antiderivative of $f(x)$ is $F(x) + C$ for any constant $C$.
There are no other antiderivatives of $f(x)$.
(b) For any $c \in (a, b)$ and any $A$, there is a unique antiderivative $\tilde F(x)$
satisfying the initial value problem $\tilde F'(x) = f(x)$ and $\tilde F(c) = A$.
Proof. (a) This just rephrases §3.2 Uniqueness Theorem (b). That is, if $\tilde F(x)$ is any
antiderivative, i.e. any function with $\tilde F'(x) = F'(x) = f(x)$, then the Uniqueness
Theorem guarantees $\tilde F(x) = F(x) + C$ for some constant $C$.
(b) We are given a specific antiderivative $F(x)$ by hypothesis, so by part (a),
a general antiderivative is $\tilde F(x) = F(x) + C$. If we require $\tilde F(c) = F(c) + C = A$,
this determines $C$ uniquely as $C = A - F(c)$, so we can only have $\tilde F(x) =
F(x) + A - F(c)$. (See also §3.2 Uniqueness Theorem (c).) Q.E.D.

Antidifferentiation. This means the process of finding antiderivatives by reversing
the rules for derivatives from §2.3–2.4. For every Basic Derivative of the form
$F'(x) = f(x)$, we have a reverse Basic Antiderivative:

$$\begin{array}{c|c}
F(x) & F'(x) = f(x) \\ \hline
x^n & nx^{n-1} \\
\sin(x) & \cos(x) \\
\cos(x) & -\sin(x) \\
\tan(x) & \sec^2(x) \\
\sec(x) & \tan(x)\sec(x)
\end{array}
\quad\Longrightarrow\quad
\begin{array}{c|c}
f(x) & F(x) \\ \hline
x^n & \frac{1}{n+1}x^{n+1} + C \\
\cos(x) & \sin(x) + C \\
\sin(x) & -\cos(x) + C \\
\sec^2(x) & \tan(x) + C \\
\tan(x)\sec(x) & \sec(x) + C
\end{array}$$

Each general antiderivative has an arbitrary constant term $C$. Notice we do not
know an antiderivative for $f(x) = \frac1x = x^{-1}$, since the formula $\frac{1}{-1+1}x^0$ does not
make sense.
We can also reverse the Derivative Rules. Since the derivative of a sum is the
sum of derivatives, the same is true for antiderivatives, and similarly for differences
and constant multiples:

$$f(x) = 7x^3 - x\sqrt{x} + \frac{3}{x^2} - \frac{4}{\cos^2(x)} = 7x^3 - x^{3/2} + 3x^{-2} - 4\sec^2(x),$$

$$F(x) = 7(\tfrac14 x^4) - \tfrac{1}{5/2}x^{5/2} + 3(\tfrac{1}{-1}x^{-1}) - 4\tan(x) + C = \tfrac74 x^4 - \tfrac25 x^2\sqrt{x} - \frac{3}{x} - 4\tan(x) + C.$$

To verify this, just differentiate $F(x)$ to recover $f(x)$.
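Besides differentiating by hand, the verification can be approximated numerically with a difference quotient. The sketch below is an added illustration: the helper `central_difference`, the step size `h`, and the sample points are assumptions, and the check is only meaningful where $x > 0$ and $\cos(x) \neq 0$.

```python
import math

def f(x):
    return 7 * x**3 - x * math.sqrt(x) + 3 / x**2 - 4 / math.cos(x)**2

def F(x, C=0.0):
    # Candidate antiderivative from the text; the constant C does not affect F'
    return 7 / 4 * x**4 - 2 / 5 * x**2 * math.sqrt(x) - 3 / x - 4 * math.tan(x) + C

def central_difference(g, x, h=1e-5):
    # Symmetric difference quotient approximating g'(x)
    return (g(x + h) - g(x - h)) / (2 * h)

samples = (0.5, 1.0, 1.3)
errors = [abs(central_difference(F, x) - f(x)) for x in samples]
print(errors)
```

Small errors at several sample points do not prove the formula, but a sign mistake or a wrong coefficient in $F(x)$ would show up immediately as a large discrepancy.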
We can also reverse the Chain Rule: we know $(\sin(3x))' = \cos(3x) \cdot (3x)' =
3\cos(3x)$, so what $F(x)$ will have $F'(x) = \cos(3x)$?

$$f(x) = \cos(3x) \implies F(x) = \tfrac13 \sin(3x) + C.$$

On the other hand, the derivative of a product is NOT the product of derivatives
(§2.3), so the antiderivative of a product is NOT the product of antiderivatives.
(Similarly for quotients.) We will learn how to handle these later, but for
now, we can sometimes antidifferentiate products or quotients if we can expand
them into sums of Basic Antiderivatives.

$$f(x) = \frac{x+4}{\sqrt{x}} = \frac{x}{\sqrt{x}} + \frac{4}{\sqrt{x}} = x^{1/2} + 4x^{-1/2},$$

$$F(x) = \tfrac{1}{3/2}x^{3/2} + 4(\tfrac{1}{1/2}x^{1/2}) + C = \tfrac23 x\sqrt{x} + 8\sqrt{x} + C.$$
Example. Find the antiderivative of $f(x) = \sin^2(x)$. This is a product of $\sin(x)$
with itself, and we need to expand it somehow in terms of Basic Antiderivatives.
A clever idea: in the identity

$$\cos(2x) = \cos^2(x) - \sin^2(x) = (1 - \sin^2(x)) - \sin^2(x) = 1 - 2\sin^2(x),$$

we can solve for $\sin^2(x)$, so that:

$$f(x) = \sin^2(x) = \tfrac12 - \tfrac12\cos(2x),$$

$$F(x) = \tfrac12 x - \tfrac12(\tfrac12\sin(2x)) + C = \tfrac12 x - \tfrac14\sin(2x) + C.$$

Second derivative initial value problem. Suppose an object moves along a
line with position $s(t)$ at time $t$. If we know the acceleration $a(t) = t + 1$, and the
position and velocity at time $t = 1$, namely $s(1) = 10$ and $v(1) = 3$, then find
the position and velocity functions $s(t)$ and $v(t)$.
Rephrasing, we know velocity is the rate of change of position, $v(t) = s'(t)$,
and acceleration is the rate of change of velocity, $a(t) = v'(t) = s''(t)$. Thus, we
must solve the initial value problem:

$$s''(t) = t + 1, \qquad s(1) = 10, \qquad s'(1) = 3.$$

First, we antidifferentiate $t + 1$ to get $v(t) = \frac12 t^2 + t + C$. Thus, we have:

$$v(1) = \tfrac12(1^2) + 1 + C = 3,$$

which we can solve to get $C = \frac32$, so that:

$$v(t) = s'(t) = \tfrac12 t^2 + t + \tfrac32.$$

Next we antidifferentiate $v(t)$ to get:

$$s(t) = \tfrac12(\tfrac13 t^3) + \tfrac12 t^2 + \tfrac32 t + D = \tfrac16 t^3 + \tfrac12 t^2 + \tfrac32 t + D,$$

where $D$ is another constant (different from the previous $C$). Again, we can solve:

$$s(1) = \tfrac16(1^3) + \tfrac12(1^2) + \tfrac32(1) + D = 10 \implies D = \tfrac{47}{6}.$$

The final answer is:

$$s(t) = \tfrac16 t^3 + \tfrac12 t^2 + \tfrac32 t + \tfrac{47}{6}.$$

This is the unique solution, with no arbitrary constants.
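The two constants can be recomputed with exact fractions, following the same order as the solution: first $C$ from $v(1) = 3$, then $D$ from $s(1) = 10$. This is a minimal added sketch; the variable and function names are assumptions.

```python
from fractions import Fraction

t0, v0, s0 = 1, 3, 10   # initial data: v(1) = 3 and s(1) = 10

# v(t) = t^2/2 + t + C, so C = v0 - (t0^2/2 + t0)
C = v0 - (Fraction(1, 2) * t0**2 + t0)

# s(t) = t^3/6 + t^2/2 + C*t + D, so D = s0 - (t0^3/6 + t0^2/2 + C*t0)
D = s0 - (Fraction(1, 6) * t0**3 + Fraction(1, 2) * t0**2 + C * t0)

def v(t):
    return Fraction(1, 2) * t**2 + t + C

def s(t):
    return Fraction(1, 6) * t**3 + Fraction(1, 2) * t**2 + C * t + D

print(C, D, v(t0), s(t0))
```

Changing `t0`, `v0` or `s0` shows how each choice of initial data picks out a different member of the family of antiderivatives.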


Math 132 Area and Distance Stewart 4.1/I

Review. The derivative of $y = f(x)$ has four levels of meaning:

Physical: If $y$ is a quantity depending on $x$, the derivative $\frac{dy}{dx}\big|_{x=a}$ is the rate
of change of $y$ per tiny change in $x$ away from $a$.

Geometric: $f'(a)$ is the slope of the graph $y = f(x)$ near the point $(a, f(a))$,
or the slope of the tangent line at that point, $y = f(a) + f'(a)(x-a)$.

Numerical: Approximate by the difference quotient, $f'(a) \approx \frac{\Delta f}{\Delta x} = \frac{f(x) - f(a)}{x - a}$
for $x$ near $a$. This gives the linear approximation $f(x) \approx f(a) + f'(a)(x-a)$.

Algebraic: Defining $f'(a) = \lim_{x\to a} \frac{f(x) - f(a)}{x - a} = \lim_{h\to 0} \frac{f(a+h) - f(a)}{h}$, we prove Basic
Derivatives and Derivative Rules to find $f'(x)$ for any formula $f(x)$.

Problems usually originate on the physical or geometric levels, then we translate
them to the numerical or algebraic levels to solve them. For example, to find
the hill tops of a given curve $y = f(x)$, a geometric problem, we consider that
they must have horizontal tangents, so we take the derivative $f'(x)$ and solve for
the critical points $f'(x) = 0$ algebraically, or numerically with Newton's Method.
In the previous section §3.9, we introduced the reverse of the derivative, the
antiderivative. In this chapter, we will see that it has all the above levels of
meaning, and connecting them will allow us to solve many new problems.

Distance problem. In §3.9, we reversed the algebraic derivative operation: that
is, we could often recognize a given function $f(x)$ as the derivative of some known
function $F(x)$, obtaining an algebraic antiderivative. But this does not always
work: there are many functions which are not the derivative of any formula we
know, for example $f(x) = \frac1x$ or $\sqrt{x^3+1}$ or $\sin(x^2)$. (We will eventually learn that
$\frac1x$ is the derivative of the logarithm $\log(x)$, but the others really are not the
derivative of any formula.)
Let us consider this on the physical level: if we take $f(t) = v(t)$ to be a
velocity function, then the antiderivative should be the corresponding position
function $F(t) = s(t)$, since velocity is the rate of change of position, $s'(t) = v(t)$.
Imagine a toy car on a track which starts out at time $t = 0$ at the starting line
$s(0) = 0$, and adjusts its velocity according to $v(t)$. Even if we have no algebraic
formula for $s(t)$, nevertheless the car does have a position, so there must exist an
antiderivative, a new function we have not imagined before.
To compute this new position function $s(t)$, we work numerically, taking a
limit of approximations as we did in computing the derivative $\frac{ds}{dt} = \lim_{\Delta t\to 0} \frac{\Delta s}{\Delta t}$.
As an elementary example, we take the velocity function $v(t) = t^2$, and compute
$s(2)$, the distance traveled from $t = 0$ to $t = 2$:

$$s'(t) = v(t) = t^2, \qquad s(0) = 0, \qquad s(2) = \ ??$$

Since distance = velocity × time, we can say very roughly:

$$s(2) \approx v(2) \cdot \Delta t = 2^2 \cdot (2 - 0) = 8.$$

This would be exact if the velocity held constant at $v(2) = 4$ for the whole time,
but in fact the car starts from a standstill $v(0) = 0$, so this is a gross overestimate.
For a good approximation, we split the time interval $[0, 2]$ into $n = 20$ increments
of size $\Delta t = 0.1$, with dividing points:

$$0.0 < 0.1 < 0.2 < \cdots < 1.8 < 1.9 < 2.0.$$

We approximate a distance increment during each time increment, and add these
up to get the total distance traveled:

$$\begin{aligned}
s(2) &\approx v(0.1)\,\Delta t + v(0.2)\,\Delta t + \cdots + v(1.9)\,\Delta t + v(2.0)\,\Delta t \\
&= (0.1)^2(0.1) + (0.2)^2(0.1) + \cdots + (1.9)^2(0.1) + (2.0)^2(0.1) \\
&\approx 2.9.
\end{aligned}$$

Here we sample the velocity $v(t)$ at the end of each increment: for example, the
first sample point is $t = 0.1$, the right endpoint of $t \in [0.0, 0.1]$. This is still an
overestimate, since the velocity is slightly less at the beginning of each increment
than at the end.
To get an underestimate, we should sample velocity at the beginning of each
increment, where it is smallest:

$$\begin{aligned}
s(2) &\approx v(0.0)\,\Delta t + v(0.1)\,\Delta t + \cdots + v(1.8)\,\Delta t + v(1.9)\,\Delta t \\
&= (0.0)^2(0.1) + (0.1)^2(0.1) + \cdots + (1.8)^2(0.1) + (1.9)^2(0.1) \\
&\approx 2.5.
\end{aligned}$$

As we take more and more increments of smaller and smaller size, all estimates
converge on a limiting value, which is the exact position $s(2)$.
For this simple function $v(t) = t^2$, we can compare the numerical answers
with our known algebraic solution: $s(t) = \frac13 t^3$ is the unique antiderivative with
$s(0) = 0$, and we have:

$$s(2) = \tfrac13(2^3) = \tfrac83 \approx 2.67,$$

which is indeed between the lower and upper estimates above. In fact, the average
of the two estimates is $\frac{2.9 + 2.5}{2} = 2.7$, which is the correct answer rounded to 1
decimal place.
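The upper and lower sums above are quick to reproduce. The following is a minimal sketch added for illustration; the function name `riemann_sum` and its `sample` parameter are assumptions, and it handles only left-endpoint and right-endpoint sampling on equal increments.

```python
def riemann_sum(f, a, b, n, sample="right"):
    """Sum of f(sample point) * dx over n equal increments of [a, b]."""
    dx = (b - a) / n
    offset = 1 if sample == "right" else 0   # right or left endpoint of each increment
    return sum(f(a + (i + offset) * dx) for i in range(n)) * dx

def velocity(t):
    return t**2

upper = riemann_sum(velocity, 0, 2, 20, "right")   # samples at 0.1, 0.2, ..., 2.0
lower = riemann_sum(velocity, 0, 2, 20, "left")    # samples at 0.0, 0.1, ..., 1.9
finer = riemann_sum(velocity, 0, 2, 2000, "right")

print(round(upper, 2), round(lower, 2), round(finer, 3))
```

With 20 increments the two sums bracket $\frac83$, and increasing `n` squeezes both toward it, which is the limiting process described in the text.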

The integral. Applied generally to any velocity $v(t)$ over any interval $t \in [0, b]$,
this method specifies the value of the position $s(b)$ as a limit. We introduce a new
notation for this limit, the integral of $v(t)$ from $t = 0$ to $b$:

$$s(b) = \int_0^b v(t)\,dt = \lim_{\Delta t\to 0} \big[ v(t_1)\,\Delta t + v(t_2)\,\Delta t + \cdots + v(t_n)\,\Delta t \big].$$

The integral symbol $\int$ is an elongated S standing for the sum of $n$ terms; $v(t)$
stands for all the sample values $v(t_1), \dots, v(t_n)$; and $dt$ suggests a very small $\Delta t$
as $n \to \infty$, getting larger and larger. (Increment: a small increase, a part added.)
Sometimes $s(t)$ turns out to equal a known formula, sometimes it can only be
computed approximately to any desired accuracy. In our example, we computed
$s(2) = \int_0^2 t^2\,dt = \frac83 \approx 2.67$.
Generalizing further, suppose we are given any function $f(x)$ which we consider
as the rate of change of an unknown function $F(x)$ for $x \in [a, b]$. Then we may
compute the cumulative total change $F(b) - F(a)$ by the above method: that is,
we compute the integral of $f(x)$ from $x = a$ to $b$:

$$F(b) - F(a) = \int_a^b f(x)\,dx = \lim_{\Delta x\to 0} \big[ f(x_1)\,\Delta x + f(x_2)\,\Delta x + \cdots + f(x_n)\,\Delta x \big].$$

Here we split the interval $[a, b]$ into $n$ increments of size $\Delta x = \frac{b-a}{n}$, and choose a
sample point in each increment: $x_1, x_2, \dots, x_n$ can be the left or right endpoints,
or anywhere between. Each term approximates the incremental change in $F(x)$ as
the rate of change $f(x_i)$ times the length of the increment $\Delta x$. Finally, we take
the limit as $n \to \infty$ and $\Delta x \to 0$.

Area problem. Now we come to one of the most surprising results in mathematics:
the geometric interpretation of the integral. Suppose we have a function
with $f(x) \geq 0$ for $x \in [a, b]$, and we wish to determine the area under the graph
$y = f(x)$ and above the interval $[a, b]$ on the $x$-axis. For example, let us again take
$f(x) = x^2$ over the interval $[0, 2]$.

To approximate the area $A$, we cover it by 20 thin rectangles of width $\Delta x = 0.1$
(below at left):

The dividing points are again $0.0 < 0.1 < \cdots < 1.9 < 2.0$, and each rectangle
reaches up to the graph at the right endpoint of an increment, giving heights
$f(0.1), f(0.2), \dots, f(2.0)$. The area $A$ under the curve is close to the total area of
the rectangles; adding up (height) × (width) for each rectangle gives:

$$A \approx f(0.1)\,\Delta x + f(0.2)\,\Delta x + \cdots + f(2.0)\,\Delta x \approx 2.9.$$

This is an overestimate, since the rectangles slightly overshoot the curve. To get
an underestimate, we take heights at the left endpoint of each increment, fitting
the rectangles under the graph (above at right):

$$A \approx f(0.0)\,\Delta x + f(0.1)\,\Delta x + \cdots + f(1.9)\,\Delta x \approx 2.5.$$

Clearly, this is the same computation as we did before, so it has the same answer.
That is, taking the limit of thinner and thinner rectangles gives:

$$A = \int_0^2 x^2\,dx.$$

That is, the area under $y = f(x)$ for $x \in [0, 2]$ is the same as the distance traveled
with velocity $v(t) = t^2$ during $t \in [0, 2]$. Are you not amazed?
Why is this? Let us fix $a$, take $b = x$ to be a variable, and consider the area
above $[a, x]$ as a function $A(x)$. Then the rate of change of the area function is
the height of the graph: $A'(x) = f(x)$, since the greater the height, the taller
the rightmost incremental rectangle, and the faster $A(x)$ increases. Thus, we can
consider the height as a rate of change, and the area as a cumulative total change,
which is just what the integral computes.
We have seen the distance problem before in §2.7, when we used speedometer
data to reconstruct odometer data, using the graph of velocity $v(t)$ to draw the
graph of distance $s(t)$. We can now compute $s(t)$ for $t = x$ as the area under the
$v(t)$ graph and above a horizontal interval $t \in [0, x]$. (If the velocity is negative, the area
counts as negative and $s(t)$ decreases.) For consistency, we must look at the ft/sec
(not mph) scale for $v(t)$, since $t$ is in sec, and we can estimate the total net area
to be about 3500 ft. That is, during the 150 sec shown, the car traveled forward
by $s(150) \approx 3500$ ft.

Approximating an integral. We compute the area $A$ under one arch of
$f(x) = \sin^2(x)$ to an accuracy of one decimal place.

We will compute:

$$A = \int_0^{\pi} \sin^2(x)\,dx \approx \sin^2(x_1)\,\Delta x + \sin^2(x_2)\,\Delta x + \cdots + \sin^2(x_n)\,\Delta x$$

for suitably large $n$, correspondingly small increment $\Delta x = \frac{b-a}{n} = \frac{\pi}{n}$, and
appropriate sample points $x_1, \ldots, x_n$.
To make sure of the required accuracy, we will compute an overestimate and an
underestimate. For an overestimate, we take $x_1, \ldots, x_n$ so that $\sin^2(x)$ is largest
within each increment. These are not always the right endpoints, because the
function is decreasing on the second half of the interval. Rather, for the increments
within $[0, \frac{\pi}{2}]$, we take the right endpoints, and for the increments within $[\frac{\pi}{2}, \pi]$ we
take the left endpoints. To get an underestimate, we take sample points where
$\sin^2(x)$ is smallest within each increment, reversing the previous choices.
With a spreadsheet or computer algebra program, it is not difficult to take
$n = 100$, so that $\Delta x = 0.01\pi$. The upper estimate is:

$$\sin^2(0.01\pi)(0.01\pi) + \sin^2(0.02\pi)(0.01\pi) + \cdots + \sin^2(0.50\pi)(0.01\pi)$$
$$+\ \sin^2(0.50\pi)(0.01\pi) + \sin^2(0.51\pi)(0.01\pi) + \cdots + \sin^2(0.99\pi)(0.01\pi) \approx 1.60 .$$

The lower estimate is:

$$\sin^2(0.00\pi)(0.01\pi) + \sin^2(0.01\pi)(0.01\pi) + \cdots + \sin^2(0.49\pi)(0.01\pi)$$
$$+\ \sin^2(0.51\pi)(0.01\pi) + \sin^2(0.52\pi)(0.01\pi) + \cdots + \sin^2(1.00\pi)(0.01\pi) \approx 1.54 .$$

Thus we get upper and lower bounds for our area $A$, and taking their average
gives our best estimate:

$$1.54 < A < 1.60 \implies A = 1.57 \pm 0.03 .$$
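The upper and lower estimates above can be reproduced with a few lines of code in place of a spreadsheet. The following is only a sketch: the function name `upper_lower_sin2` is made up for illustration, and it assumes an even number of increments so that $\sin^2(x)$ is monotonic on each one and its extremes sit at the endpoints.

```python
import math

def upper_lower_sin2(n=100):
    """Lower and upper Riemann sums for sin^2(x) on [0, pi], n equal increments.

    Assumes n is even, so pi/2 is a dividing point and sin^2 is monotonic
    on every increment (largest and smallest values occur at endpoints).
    """
    dx = math.pi / n
    f = lambda x: math.sin(x) ** 2
    lower = upper = 0.0
    for i in range(1, n + 1):
        left, right = (i - 1) * dx, i * dx
        upper += max(f(left), f(right)) * dx   # tallest rectangle on this increment
        lower += min(f(left), f(right)) * dx   # shortest rectangle on this increment
    return lower, upper

lower, upper = upper_lower_sin2(100)
print(lower, upper)            # bounds on the area A
print((lower + upper) / 2)     # average of the two bounds
```

Increasing `n` narrows the gap between the two bounds, which is the sense in which the estimate can be made as accurate as desired.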


As we have seen, the integral $F(b) = \int_a^b f(x)\,dx$ defines an antiderivative function
numerically, whether or not we can find an algebraic antiderivative. But
in §3.9 we were (just barely) able to find an algebraic antiderivative:
$F(x) = \frac{1}{2}x - \frac{1}{4}\sin(2x)$, satisfying $F(0) = 0$. Since there can be only one such
antiderivative, we find that:

$$\int_0^{\pi} \sin^2(x)\,dx = F(\pi) = \frac{\pi}{2} \approx 1.571 ,$$

so our numerical approximation is actually accurate to 2 decimal places.


Math 132 Area and Distance Stewart 4.1

Review. The derivative of $y = f(x)$ has four levels of meaning:

Physical: if $y$ is a quantity depending on $x$, the derivative $\frac{dy}{dx}\big|_{x=a}$ is the rate
of change of $y$ per tiny change in $x$ away from $a$.
Geometric: $f'(a)$ is the slope of the graph $y = f(x)$ near the point $(a, f(a))$,
or the slope of the tangent line at that point, $y = f(a) + f'(a)(x-a)$.
Numerical: approximate by the difference quotient, $f'(a) \approx \frac{\Delta f}{\Delta x} = \frac{f(x)-f(a)}{x-a}$
for $x$ near $a$. This gives the linear approximation $f(x) \approx f(a) + f'(a)(x-a)$.
Algebraic: defining $f'(a) = \lim_{x\to a} \frac{f(x)-f(a)}{x-a} = \lim_{h\to 0} \frac{f(a+h)-f(a)}{h}$, we prove Basic
Derivatives and Derivative Rules to find $f'(x)$ for any formula $f(x)$.
Problems usually originate on the physical or geometric levels, then we translate
them to the numerical or algebraic levels to solve them. For example, to find
the hill tops of a curve $y = f(x)$ (a geometric problem), we consider that they
must have horizontal tangents, so we take the derivative $f'(x)$ and solve for the
critical points $f'(x) = 0$ algebraically (or numerically with Newton's Method).
In the previous chapter (§3.9), we introduced the reverse of the derivative, the
antiderivative. In this chapter, we will see it has all the above levels of meaning,
and connecting them will allow us to solve many new problems.

Distance problem. In §3.9, we reversed the algebraic derivative operation: that
is, we could often recognize a given function $f(x)$ as the derivative of some known
function $F(x)$, obtaining an algebraic antiderivative. But this does not always
work: there are many functions which are not the derivative of any formula we
know, for example $f(x) = \frac{1}{x}$ or $\sqrt{x^3+1}$ or $\sin(x^2)$.
Let us consider this on the physical level: if we take $f(t) = v(t)$ as a velocity
function, then $F(t) = s(t)$ should be the corresponding position function, with
$s'(t) = v(t)$. Consider an object (say, a toy car on a track) which starts out at
time $t = 0$ at $s(0) = 0$, and adjusts its velocity according to $v(t)$. Even if we have
no algebraic formula for $s(t)$, nevertheless the car does have a position, so there
must exist an antiderivative, a new function we have not imagined before.
To compute this new position function $s(t)$, we will more generally compute
the distance traversed over any time interval $t \in [a,b]$, namely $s(b) - s(a)$. Once
we know this, we can take $a = 0$ and compute $s(b) - s(0) = s(b)$ for any $t = b > 0$,
since we assumed $s(0) = 0$.
We work numerically, taking a limit of approximations as we did in computing
the derivative $\frac{ds}{dt} = \lim_{\Delta t\to 0} \frac{\Delta s}{\Delta t}$. For concreteness, we take a velocity function
$v(t) = t^2$ which we can antidifferentiate algebraically, to check against our numerical
method. Also, we focus on the time interval from $t = 1$ to $t = 3$, and ask,
what distance does the car move during this time?

$$s'(t) = v(t) = t^2 , \qquad s(3) - s(1) = \ ??$$

We will eventually learn that $\frac{1}{x}$ is the derivative of the logarithm $\log(x)$, but the others
really are not the derivative of any formula.
Estimates of distance. Since distance = velocity $\times$ time, we can say very roughly:

$$s(3) - s(1) \approx v(1)\cdot\Delta t = 1^2\cdot(3-1) = 2.$$

This would be exact if the velocity held constant at $v(1) = 1$ for the whole time,
but in fact the car is speeding up to $v(3) = 9$, so this is a gross underestimate.
For a good approximation, we split the time interval $[1,3]$ into 20 increments
of size $\Delta t = 0.1$, with dividing points:

$$1.0 < 1.1 < 1.2 < \cdots < 2.8 < 2.9 < 3.0.$$

We add up the 20 approximate increments of distance:

$$s(3) - s(1) \approx v(1.0)\,\Delta t + v(1.1)\,\Delta t + \cdots + v(2.8)\,\Delta t + v(2.9)\,\Delta t$$
$$= (1.0)^2(0.1) + (1.1)^2(0.1) + \cdots + (2.8)^2(0.1) + (2.9)^2(0.1) \approx 8.3$$

Here we sample velocity at the beginning of each increment: for example, the last
sample point is 2.9, the beginning of $[2.9, 3.0]$. This is still an underestimate, since
the velocity does increase slightly from the beginning to the end of each increment.
To get an overestimate, we should sample velocity at the end of each increment,
where it is largest:

$$s(3) - s(1) \approx v(1.1)\,\Delta t + v(1.2)\,\Delta t + \cdots + v(2.9)\,\Delta t + v(3.0)\,\Delta t$$
$$= (1.1)^2(0.1) + (1.2)^2(0.1) + \cdots + (2.9)^2(0.1) + (3.0)^2(0.1) \approx 9.1$$

In this case, we can compare with our known algebraic solution: $s(t) = \frac{1}{3}t^3$ is
the unique antiderivative of $v(t) = t^2$ with $s(0) = 0$, and we have:

$$s(3) - s(1) = \tfrac{1}{3}(3^3) - \tfrac{1}{3}(1^3) = 8\tfrac{2}{3} \approx 8.67 ,$$

which is indeed between the lower and upper estimates above. Note that the
average of the two estimates is 8.7, which is correct to 1 decimal place.
As we take more and more increments of smaller and smaller size, all estimates
converge on a limiting value, which is the exact answer. Applied generally to any
$v(t)$, this method specifies the position $s(b) = s(b) - s(0)$ as a limit, which defines
an antiderivative function. Sometimes this turns out to equal a known formula;
sometimes it can only be computed approximately, to any desired accuracy.
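The lower and upper estimates of distance can be checked with a short program. This is a minimal sketch, assuming a helper named `riemann_sum` (a name chosen here for illustration) that samples at the left or right endpoint of each increment.

```python
def riemann_sum(f, a, b, n, sample="left"):
    """Sum of f(sample point) * dt over n equal increments of [a, b]."""
    dt = (b - a) / n
    offset = {"left": 0.0, "right": 1.0}[sample]   # position of the sample point in each increment
    return sum(f(a + (i + offset) * dt) for i in range(n)) * dt

v = lambda t: t ** 2                        # velocity function from the text

low = riemann_sum(v, 1, 3, 20, "left")      # underestimate of s(3) - s(1)
high = riemann_sum(v, 1, 3, 20, "right")    # overestimate of s(3) - s(1)
print(low, high, (low + high) / 2)

# With many more increments, both estimates close in on the exact value 26/3.
print(riemann_sum(v, 1, 3, 2000, "left"), riemann_sum(v, 1, 3, 2000, "right"))
```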

Increment: an increase, a part added.
Definition of integral. Generalizing from velocity and distance, suppose we are
given an arbitrary $f(x)$ which we consider as the rate of change of an unknown
antiderivative $F(x)$, with a given initial value $F(a) = 0$. We wish to compute the
total change $F(b) = F(b) - F(a)$ by the above method, and we introduce a new
symbol for the answer: $\int_a^b f(x)\,dx$.
Definition: The definite integral of $f(x)$ from $x = a$ to $x = b$ means:

$$\int_a^b f(x)\,dx = \lim_{\Delta x\to 0} \big[ f(x_1)\,\Delta x + f(x_2)\,\Delta x + \cdots + f(x_n)\,\Delta x \big].$$

Here we split the interval $[a,b]$ into $n$ increments of size $\Delta x = \frac{b-a}{n}$,
and choose a sample point in each increment: $x_1, x_2, \ldots, x_n$ can be the
left or right endpoints, or anywhere between. Then we take the limit
as $n \to \infty$ and $\Delta x \to 0$.
The integral symbol $\int$ is an elongated S standing for sum; the $f(x)$ stands
for the sample values $f(x_1), \ldots, f(x_n)$; and $dx$ is meant to suggest a very small
$\Delta x$. It can be proven that, for a continuous $f(x)$, the above summation always
approaches a limit no matter how we choose the sample points $x_1, \ldots, x_n$. Note
that for $b = a$, an interval of length zero, we get $\int_a^a f(x)\,dx = 0$, since $\Delta x = \frac{a-a}{n} = 0$.
First Fundamental Theorem of Calculus. Given a continuous function
$f(x)$, fix some $x = a$, and define $F(x)$ for any $x = b \ge a$ as:

$$F(b) = \int_a^b f(x)\,dx .$$

Then $F'(x) = f(x)$, and $F(x)$ is an antiderivative of $f(x)$ with initial
value $F(a) = 0$.
We will discuss this Theorem in detail later, but it should be clear that we defined
the integral just so as to make it true.
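The Theorem can be tested numerically even for a function with no elementary antiderivative. The sketch below is an illustration only: it assumes midpoint sample points, the example $f(t) = \sin(t^2)$, and hypothetical helper names `integral` and `F`. It builds $F(x)$ from Riemann sums and compares a difference quotient of $F$ against $f(x)$.

```python
import math

def integral(f, a, b, n=10_000):
    """Riemann sum for f over [a, b] with n increments and midpoint samples."""
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx

f = lambda t: math.sin(t ** 2)     # no elementary antiderivative is available
F = lambda x: integral(f, 0.0, x)  # numerical antiderivative with F(0) = 0

x, h = 1.2, 1e-3
slope = (F(x + h) - F(x - h)) / (2 * h)   # difference quotient for F'(x)
print(slope, f(x))                        # the two numbers should nearly agree
```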

Area problem. Now we come to one of the most surprising results in mathematics:
the geometric interpretation of the integral. Suppose we have a function
with $f(x) \ge 0$ for $x \in [a,b]$, and we wish to determine the area under the graph
$y = f(x)$ and above the interval $[a,b]$ on the $x$-axis.
There are known answers if this shape is a triangle, a circle, or a few others, but
we have no formulas for a general $f(x)$. For example, let us again take $f(x) = x^2$
over the interval $[1,3]$. To approximate the area $A$, we fill it with thin rectangles
of width $\Delta x = 0.1$ (below at left):
The dividing points are again $1.0 < 1.1 < \cdots < 2.9 < 3.0$, and each rectangle
reaches up to the graph at the left endpoint of an increment, giving heights
$f(1.0), f(1.1), \ldots, f(2.9)$. The area $A$ under the curve is close to the total area of
the rectangles; adding up height $\times$ width gives:

$$A \approx f(1.0)\,\Delta x + f(1.1)\,\Delta x + \cdots + f(2.9)\,\Delta x \approx 8.3 .$$

This is an underestimate, since the rectangles do not quite fill the area. To get an
overestimate, we take heights at the right endpoint of each increment, making the
rectangles taller than the graph (above at right):

$$A \approx f(1.1)\,\Delta x + f(1.2)\,\Delta x + \cdots + f(3.0)\,\Delta x \approx 9.1 .$$

Clearly, taking the limit of thinner and thinner rectangles gives:

$$A = \int_1^3 x^2\,dx.$$

Are you not amazed?

Why is this? Let us fix $a$, take $b = x$ to be a variable, and consider the
area above $[a,x]$ as a function $A(x)$. Then the rate of change of this function is
the height of the graph: $A'(x) = f(x)$, since the greater the height, the taller
the rightmost incremental rectangle, and the faster $A(x)$ increases. Thus, we can
consider the height as a rate of change, and the area as a cumulative change.
example: Compute the area under one arch of the function $f(x) = \sin(x^2)$,
to an accuracy of one decimal place.

Area problem for v(t) = 16t: same formula, upper & lower sums
Why same? Because they both compute the cumulative effect of a changing
influence
Math 132 Sigma Notation Stewart 4.1, Part 2
Notation for sums. In Notes §4.1, we define the integral $\int_a^b f(x)\,dx$ as a limit of
approximations. That is, we split the interval $x \in [a,b]$ into $n$ increments of size
$\Delta x = \frac{b-a}{n}$, we choose sample points $x_1, x_2, \ldots, x_n$, and we take:

$$\int_a^b f(x)\,dx = \lim_{\Delta x\to 0} \big[ f(x_1)\,\Delta x + f(x_2)\,\Delta x + \cdots + f(x_n)\,\Delta x \big].$$

The sum which appears on the right is called a Riemann sum. Similar sums appear
frequently in mathematics, and we define a special notation to handle them.
In the most general situation, we have a sequence of numbers $q_0, q_1, q_2, q_3, \ldots$
so that for any $i = 0, 1, 2, \ldots$ we have a number $q_i$. We consider an interval of
integers $i = m, m+1, m+2, \ldots, n$, and we introduce a notation for the sum of all
the $q_i$ for $i = m$ to $n$:

$$\sum_{i=m}^{n} q_i = q_m + q_{m+1} + q_{m+2} + \cdots + q_n .$$

The summation symbol $\Sigma$ is capital sigma, the Greek letter S, standing for sum.
The variable $i$ is called the index of summation.
Note: In the WebWork problems, a sequence is denoted $f(i)$ instead of $q_i$.
This is because we can consider the sequence of $q_i$'s as a function with input $i$ (an
integer) and output $q_i$ (a specified number).

Examples

Letting $q_i = \sqrt{i}$, we have $q_0 = \sqrt{0} = 0$, $q_1 = \sqrt{1} = 1$, $q_2 = \sqrt{2}$, $q_3 = \sqrt{3}$, etc., and taking
the interval of integers $i = 2, 3, 4, 5$, we have:

$$\sum_{i=2}^{5} \sqrt{i} = \sqrt{2} + \sqrt{3} + \sqrt{4} + \sqrt{5} \approx 7.38 .$$

Letting $q_i = 1$, we have: $\sum_{i=1}^{10} 1 = \underbrace{1 + 1 + \cdots + 1}_{10\ \text{terms}} = 10$.

Given the sum of the first ten square numbers $1 + 4 + 9 + 16 + \cdots + 100$, we
wish to write this compactly in sigma notation. Considering the terms as a
sequence $q_i = i^2$, we get:

$$1 + 4 + 9 + \cdots + 100 = 1^2 + 2^2 + 3^2 + \cdots + 10^2 = \sum_{i=1}^{10} i^2 .$$

Given the sum of the first five odd numbers $1 + 3 + 5 + 7 + 9$, we can write
this in sigma notation by considering the terms as $q_i = 2i-1$:

$$1 + 3 + 5 + 7 + 9 = (2(1)-1) + (2(2)-1) + \cdots + (2(5)-1) = \sum_{i=1}^{5} (2i-1) .$$

Another way would be to consider the terms as $q_i = 2i+1$:

$$1 + 3 + 5 + 7 + 9 = (2(0)+1) + (2(1)+1) + \cdots + (2(4)+1) = \sum_{i=0}^{4} (2i+1) .$$

The sum of the first $n$ odd numbers, where $n$ is an unspecified whole number,
can be written as:

$$1 + 3 + 5 + \cdots + (2n-1) = \sum_{i=1}^{n} (2i-1).$$

We can write a Riemann sum as:

$$f(x_1)\,\Delta x + f(x_2)\,\Delta x + \cdots + f(x_n)\,\Delta x = \sum_{i=1}^{n} f(x_i)\,\Delta x .$$
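Sigma notation translates almost word for word into a program loop, which can help in reading it. As a sketch, assume a hypothetical helper `sigma(q, m, n)` that adds up $q_i$ for $i = m$ to $n$; note that Python's `range` excludes its upper end, so the helper uses `n + 1`.

```python
import math

def sigma(q, m, n):
    """Sum of q(i) for i = m, m+1, ..., n (both ends included)."""
    return sum(q(i) for i in range(m, n + 1))

print(sigma(math.sqrt, 2, 5))              # square roots of 2, 3, 4, 5 (about 7.38)
print(sigma(lambda i: 1, 1, 10))           # ten terms, each equal to 1
print(sigma(lambda i: i ** 2, 1, 10))      # first ten square numbers
print(sigma(lambda i: 2 * i - 1, 1, 5))    # first five odd numbers, indexed from 1
print(sigma(lambda i: 2 * i + 1, 0, 4))    # the same sum, indexed from 0
```

The last two lines give the same total, showing that the choice of index range is a matter of convenience.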

Summation Rules. As for limits and derivatives, we can sometimes compute
summations by starting with known Basic Summations, and combining them by
Summation Rules.

Sum: $\sum_{i=m}^{n} (q_i + p_i) = \sum_{i=m}^{n} q_i + \sum_{i=m}^{n} p_i$.

Difference: $\sum_{i=m}^{n} (q_i - p_i) = \sum_{i=m}^{n} q_i - \sum_{i=m}^{n} p_i$.

Constant Multiple: $\sum_{i=m}^{n} C\,q_i = C \sum_{i=m}^{n} q_i$, where $C$ does not depend on $i$.

Like all facts about summations, these formulas can be understood by writing out
the terms in dot-dot-dot (ellipsis) notation:

$$\sum_{i=m}^{n} (q_i + p_i) = (q_m + p_m) + (q_{m+1} + p_{m+1}) + \cdots + (q_n + p_n)$$
$$= (q_m + q_{m+1} + \cdots + q_n) + (p_m + p_{m+1} + \cdots + p_n)$$
$$= \sum_{i=m}^{n} q_i + \sum_{i=m}^{n} p_i .$$

Similarly for the other two rules.


Note that $n$ is a constant not depending on $i$, so we may factor it out of a
summation: $\sum_{i=1}^{n} n\,i^2 = n \sum_{i=1}^{n} i^2$. This gives a separate formula for each $n$: for
$n = 3$ it means $3(1^2)+3(2^2)+3(3^2) = 3(1^2+2^2+3^2)$. However, the variable $i$ has no
meaning outside the summation, and cannot be factored out: $\sum_{i=1}^{3} i\,2^i \overset{??}{=} i \sum_{i=1}^{3} 2^i$
is nonsense, because the left side means $1(2^1) + 2(2^2) + 3(2^3)$, but the right side
would mean some constant $i$ times $2^1+2^2+2^3$, and $i$ is not a constant.

Warning: the summation $\sum q_i p_i$ of a product is NOT equal to the product of
summations $(\sum q_i)(\sum p_i)$. For example: $1\cdot 1 + 2\cdot 2 + 3\cdot 3 \ne (1+2+3)(1+2+3)$.
Basic Summations. We can get some surprisingly neat formulas for certain
summations:

(a) $\sum_{i=1}^{n} 1 = n$.

(b) $\sum_{i=1}^{n} i = \frac{1}{2} n(n+1)$.

(c) $\sum_{i=1}^{n} i^2 = \frac{1}{6} n(n+1)(2n+1)$.

Proof. (a) $\sum_{i=1}^{n} 1 = 1 + \cdots + 1$ with $n$ terms, which indeed equals $n$.
(b) Taking two copies of $\sum_{i=1}^{n} i$, we can pair each term with its complement:

$$\begin{array}{rcl}
2\sum_{i=1}^{n} i &=& 1 + 2 + \cdots + (n-1) + n \\
 && +\ n + (n-1) + \cdots + 2 + 1 \\
 &=& (n+1) + (n+1) + \cdots + (n+1) + (n+1) \ =\ n(n+1).
\end{array}$$

The equation $2\sum_{i=1}^{n} i = n(n+1)$, divided by 2, gives the desired formula.

(c) Consider that $(i+1)^3 = i^3 + 3i^2 + 3i + 1$, so that:

$$\sum_{i=1}^{n} \big[(i+1)^3 - i^3\big] = \sum_{i=1}^{n} (3i^2 + 3i + 1)$$
$$= 3\sum_{i=1}^{n} i^2 + 3\sum_{i=1}^{n} i + \sum_{i=1}^{n} 1$$
$$= 3\sum_{i=1}^{n} i^2 + \tfrac{3}{2} n(n+1) + n.$$

On the other hand, we have a collapsing sum:

$$\sum_{i=1}^{n} \big[(i+1)^3 - i^3\big] = \big[(n+1)^3 - n^3\big] + \big[n^3 - (n-1)^3\big] + \cdots + \big[3^3 - 2^3\big] + \big[2^3 - 1^3\big]$$
$$= (n+1)^3 - 1^3 .$$

Solving the equation:

$$3\sum_{i=1}^{n} i^2 + \tfrac{3}{2} n(n+1) + n = (n+1)^3 - 1$$

gives, as desired:

$$\sum_{i=1}^{n} i^2 = \tfrac{1}{3}\Big((n+1)^3 - \tfrac{3}{2} n(n+1) - (n+1)\Big) = \tfrac{1}{6} n(n+1)(2n+1) .$$

A similar computation will produce a formula for $\sum_{i=1}^{n} i^3$, etc.
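A quick brute-force check of the Basic Summations is reassuring before relying on them. The sketch below uses exact integer arithmetic and a hypothetical function `check`; it also tests the standard formula $\sum_{i=1}^{n} i^3 = \big(\tfrac{1}{2}n(n+1)\big)^2$, which is a well-known result not derived in these notes.

```python
def check(n):
    """Compare each closed formula with a direct term-by-term sum."""
    terms = range(1, n + 1)
    return (
        sum(1 for _ in terms) == n
        and sum(terms) == n * (n + 1) // 2
        and sum(i ** 2 for i in terms) == n * (n + 1) * (2 * n + 1) // 6
        # Standard cube-sum formula, included as an assumption from outside the notes.
        and sum(i ** 3 for i in terms) == (n * (n + 1) // 2) ** 2
    )

all_ok = all(check(n) for n in range(1, 51))
print(all_ok)
```

A check like this is evidence rather than proof: it covers only the values of $n$ tried, whereas the algebraic argument covers every $n$.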
Direct Evaluation of Integrals. We can use the above rules to simplify Riemann
sums and find integrals exactly. For example, consider:

$$\int_1^3 5x\,dx = \lim_{\Delta x\to 0} \sum_{i=1}^{n} 5x_i\,\Delta x.$$

On the right side, we divide the interval $[1,3]$ into $n$ increments of length
$\Delta x = \frac{3-1}{n} = \frac{2}{n}$, with dividing points:

$$1 < 1+\Delta x < 1+2\Delta x < 1+3\Delta x < \cdots < 1+n\Delta x = 3.$$

In the $i$th increment, we arbitrarily choose the sample point $x_i$ to be the right
endpoint, that is $x_i = 1 + i\,\Delta x = 1 + \frac{2}{n} i$. Thus:

$$\sum_{i=1}^{n} 5x_i\,\Delta x = \sum_{i=1}^{n} 5\Big(1 + \frac{2}{n} i\Big)\frac{2}{n}$$
$$= \frac{10}{n}\sum_{i=1}^{n} 1 + \frac{20}{n^2}\sum_{i=1}^{n} i$$
$$= \frac{10}{n}\,n + \frac{20}{n^2}\cdot\frac{1}{2}n(n+1)$$
$$= 20 + \frac{10}{n} .$$

(Here $n$ is a fixed number not depending on $i$, such as $n = 100$ or $n = 1000$, and
we can factor it out of the $\sum$.) Finally, we let $\Delta x \to 0$ or equivalently $n \to \infty$:

$$\int_1^3 5x\,dx = \lim_{\Delta x\to 0} \sum_{i=1}^{n} 5x_i\,\Delta x = \lim_{n\to\infty} \Big(20 + \frac{10}{n}\Big) = 20 .$$
We computed this to show in principle that Riemann sums can be evaluated
directly, but this is far from the easiest way to compute an integral. Geometrically,
the integral equals the trapezoid area below the graph $y = 5x$ and above the
interval $[1,3]$ on the $x$-axis. Since (trapezoid area) = (width)$\times$(average height),
we get that the integral is $A = (3-1)\big(\frac{5(1)+5(3)}{2}\big) = 20$.
Physically, if $v(t) = 5t$ is a velocity, then the integral $\int_1^3 v(t)\,dt$ is the distance
traveled from $t = 1$ to $t = 3$. Since the position $s(t)$ is an antiderivative, we must
have $s(t) = \frac{5}{2}t^2 + C$, so the distance traveled is $s(3) - s(1) = \frac{5}{2}(3^2) - \frac{5}{2}(1^2) = 20$.
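The closed form $20 + \frac{10}{n}$ for the right-endpoint Riemann sum can be compared against a direct term-by-term computation. This is a small sketch; the function name `right_sum_5x` is chosen here for illustration.

```python
def right_sum_5x(n):
    """Right-endpoint Riemann sum for f(x) = 5x on [1, 3] with n increments."""
    dx = 2 / n
    return sum(5 * (1 + i * dx) * dx for i in range(1, n + 1))

for n in (10, 100, 1000):
    # Direct sum next to the simplified formula 20 + 10/n.
    print(n, right_sum_5x(n), 20 + 10 / n)
```

Both columns shrink toward 20 as $n$ grows, matching the trapezoid area and the antiderivative computation.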
Math 132 The Definite Integral Stewart 4.2
Precise definition. We have defined the integral $\int_a^b f(x)\,dx$ as a number approximated
by Riemann sums. The integral is useful because, given a velocity function,
it computes distance traveled; given a graph, it computes an area between the
graph and the $x$-axis. More generally, given a varying rate of change, the integral
computes the cumulative total change.
The parts of the notation $\int_a^b f(x)\,dx$ have their own names: $\int$ is the integral
sign; $a$ is the lower limit of integration; $b$ is the upper limit of integration; $f(x)$
is the integrand; and $x$ is the variable of integration. Note that the variable of
integration is named only for convenience, and changing it does not change the
value of the integral: $\int_a^b f(x)\,dx = \int_a^b f(t)\,dt = \int_a^b f(r)\,dr$.
Now we give the formal definition of integrals on the numerical level, as we did
for limits in §1.7 and derivatives in §2.1.

Definition: Given a function $f(x)$ and numbers $a \le b$.

For each positive integer $n$, we divide the interval $x \in [a,b]$ into
$n$ increments of width $\Delta x = \frac{b-a}{n}$, with division points:

$$a < a+\Delta x < a+2\Delta x < \cdots < a+n\Delta x = b,$$

and we choose sample points $x_1, \ldots, x_n$ with $x_i$ anywhere in the
$i$th increment: $a+(i-1)\Delta x \le x_i \le a+i\Delta x$. Then we let:

$$\int_a^b f(x)\,dx = \lim_{n\to\infty} \sum_{i=1}^{n} f(x_i)\,\Delta x = \lim_{n\to\infty} \big[ f(x_1)\,\Delta x + \cdots + f(x_n)\,\Delta x \big].$$

The function $f(x)$ is integrable over $[a,b]$ whenever the above limit
exists and is the same for every possible choice of sample points $x_i$.

Integrable functions. Most functions are integrable unless they have a vertical
asymptote. To be precise:

Theorem: Assume $f(x)$ is continuous for all $x \in [a,b]$, except possibly
at a finite list of removable or jump discontinuities (see §1.8).
Then $f(x)$ is integrable, meaning its Riemann sums converge to a well-defined
limit $L = \int_a^b f(x)\,dx$ for any choice of sample points.

This is proved in courses on Real Analysis.


In the phrase "limit of integration", we use limit to mean a boundary, not a value approached by approximations.

Even more formally, $\int_a^b f(x)\,dx = L$ means that for any error tolerance $\varepsilon > 0$, there is some
lower bound $N$ such that any Riemann sum with more than $N$ terms is forced close to $L$ within
an error of $\varepsilon$: that is, $n > N$ forces $\big|\sum_{i=1}^{n} f(x_i)\,\Delta x - L\big| < \varepsilon$.

To understand integrability better, let us examine the non-integrable function
$f(x) = \frac{1}{x^2}$ on $[a,b] = [0,1]$. (We arbitrarily set $f(0) = 0$ to make $f(x)$ defined for
all $x \in [0,1]$.) The function has a vertical asymptote discontinuity at $x = 0$, so
the Theorem does not apply. If we attempt to compute $\int_0^1 \frac{1}{x^2}\,dx$ by a Riemann
sum with $\Delta x = \frac{1-0}{n} = \frac{1}{n}$ and sample points $x_i = a + i\,\Delta x = \frac{i}{n}$, we get:

$$\sum_{i=1}^{n} \frac{1}{x_i^2}\,\Delta x = \sum_{i=1}^{n} \frac{n^2}{i^2}\cdot\frac{1}{n} = \sum_{i=1}^{n} n\,\frac{1}{i^2} = n \sum_{i=1}^{n} \frac{1}{i^2} > n.$$

(Here $n$ is any fixed number such as $n = 100$, the same in each term of the $\sum$,
so we can factor it out by the Constant Multiple Rule.) Thus, as $n \to \infty$, the
Riemann sum also gets larger and larger, and does not approach a finite limit.
Geometrically, this means there is an infinite area under the curve and above the
interval $[0,1]$ on the $x$-axis.
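The runaway growth of these Riemann sums is easy to observe numerically. The following sketch (with the illustrative name `right_sum_inv_sq`) uses the same right-endpoint sample points $x_i = \frac{i}{n}$ as in the text.

```python
def right_sum_inv_sq(n):
    """Right-endpoint Riemann sum for f(x) = 1/x^2 on [0, 1] with n increments."""
    dx = 1 / n
    return sum(1 / (i * dx) ** 2 * dx for i in range(1, n + 1))

for n in (10, 100, 1000, 10_000):
    # Each sum exceeds n, so the sums cannot settle toward a finite limit.
    print(n, right_sum_inv_sq(n))
```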
Negative integrand. So far, we have considered $\int_a^b f(x)\,dx$ with positive integrand
$f(x) \ge 0$, in which case the integral is a positive number (in fact, an area).
Now suppose $f(x) \le 0$: our definition of the integral still makes sense, but it gives
a negative number. For example, for the constant function $f(x) = -1$, we have:

$$\int_1^3 (-1)\,dx = \lim_{n\to\infty} \sum_{i=1}^{n} f(x_i)\,\Delta x = \lim_{n\to\infty} \sum_{i=1}^{n} (-1)\Big(\frac{3-1}{n}\Big)$$
$$= \lim_{n\to\infty} \Big(-\frac{2}{n}\Big) \sum_{i=1}^{n} 1 = \lim_{n\to\infty} \Big(-\frac{2}{n}\Big)\, n = -2 .$$

Geometrically, we think of each term $f(x_i)\,\Delta x$ as (height)$\times$(width) with a
negative height $f(x_i) \le 0$, and we count this as a negative area. For a general
graph $y = f(x)$ passing above and below the $x$-axis, $\int_a^b f(x)\,dx$ computes the
signed area between the graph and the interval $[a,b]$ on the $x$-axis: regions
above the $x$-axis count as positive area; regions below the $x$-axis count as negative
area.
example: We could evaluate the integral $\int_0^3 (2-x)\,dx$ with Riemann sums as
above, but it is easier geometrically. The function $f(x) = 2-x$ has $x$-intercept
$x = 2$; it is positive for $x \in [0,2]$ and negative for $x \in [2,3]$. Thus the integral is
the area of the triangle above $[0,2]$, minus the area of the triangle below $[2,3]$:

But (triangle area) = $\frac{1}{2}$(base)$\times$(height), so $\int_0^3 (2-x)\,dx = \frac{1}{2}(2)(2) - \frac{1}{2}(1)(1) = \frac{3}{2}$.
Reversing limits of integration. If we take the limits of integration to be the
same, $a = b$, then $\Delta x = \frac{b-a}{n} = 0$, so every Riemann sum is zero, and we get:

$$\int_a^a f(x)\,dx = 0 .$$

This is clear geometrically, since the area above a one-point interval $[a,a]$ is zero.
Next, we give a meaning to switching the two limits of integration, by defining:

$$\int_b^a f(x)\,dx = -\int_a^b f(x)\,dx .$$

Geometrically, in $\int_b^a f(x)\,dx$ we imagine $x$ running backward from $b$ to $a$ with a
negative increment $\Delta x = \frac{a-b}{n} < 0$. Since each term $f(x_i)\,\Delta x$ has negative width
$\Delta x$, the integral becomes the negative of $\int_a^b f(x)\,dx$. If $f(x_i)$ is also negative,
then both width and height are negative and the integral is positive: for example,
$\int_3^1 (-1)\,dx = -\int_1^3 (-1)\,dx = -(-2) = 2$.
We bother with this definition only so as to improve the Splitting Rule below.
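Signed area and reversed limits both fall out of the Riemann sum automatically, because the increment $\Delta x = \frac{b-a}{n}$ simply becomes negative when $b < a$. Here is a minimal sketch, assuming midpoint sample points and a hypothetical helper named `integral`:

```python
def integral(f, a, b, n=1000):
    """Midpoint Riemann sum from a to b; dx is negative when b < a."""
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx

f = lambda x: 2 - x
forward = integral(f, 0, 3)                # triangle above [0,2] minus triangle below [2,3]
backward = integral(f, 3, 0)               # same integrand, limits switched
neg_one = integral(lambda x: -1.0, 1, 3)   # constant negative height

print(forward, backward, neg_one)
```

The midpoint rule happens to be exact for straight-line graphs (up to rounding), so this example reproduces the geometric answers directly; for curved graphs it would only approximate them.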

Integral Rules. At the end of §4.1 Part 2 we gave a direct computation of
an integral as a limit of Riemann sums. In later sections, we will find much
better algebraic methods to compute integrals using the Fundamental Theorems
of Calculus. For now, we will rely on a few Basic Integrals, and the following Rules
to combine them. Let $a, b, c$ be any points on the $x$-axis; $f(x), g(x)$ any integrable
functions; and $A, B, C$ any constants. Then we have:

Sum: $\int_a^b \big(f(x)+g(x)\big)\,dx = \int_a^b f(x)\,dx + \int_a^b g(x)\,dx$.

Difference: $\int_a^b \big(f(x)-g(x)\big)\,dx = \int_a^b f(x)\,dx - \int_a^b g(x)\,dx$.

Constant Multiple: $\int_a^b C\,f(x)\,dx = C \int_a^b f(x)\,dx$.

Domination: If $f(x) \le g(x)$, then: $\int_a^b f(x)\,dx \le \int_a^b g(x)\,dx$.

Bounds: If $A \le f(x) \le B$, then: $(b-a)A \le \int_a^b f(x)\,dx \le (b-a)B$.

Splitting: $\int_a^c f(x)\,dx = \int_a^b f(x)\,dx + \int_b^c f(x)\,dx$.

Proof. The Sum, Difference, and Constant Multiple Rules follow directly from the
corresponding rules for summations in §4.1 Part 2, applied to Riemann sums.
The Bounds Rule makes sense geometrically because $0 \le A \le f(x) \le B$
means the graph $y = f(x)$ is above the line $y = A$ and below $y = B$. Thus
the area $\int_a^b f(x)\,dx$ below $y = f(x)$ and above $[a,b]$ contains a rectangle with
(width)$\times$(height) = $(b-a)A$, and the area is contained inside a rectangle with
(width)$\times$(height) = $(b-a)B$.

Computing formally, $A \le f(x_i) \le B$ implies $\sum_{i=1}^{n} A\,\Delta x \le \sum_{i=1}^{n} f(x_i)\,\Delta x \le \sum_{i=1}^{n} B\,\Delta x$. Here:

$$\sum_{i=1}^{n} A\,\Delta x = \sum_{i=1}^{n} A\,\frac{b-a}{n} = \frac{(b-a)A}{n} \sum_{i=1}^{n} 1 = \frac{(b-a)A}{n}\, n = (b-a)A,$$

and similarly for the upper bound. Taking limits as $n \to \infty$ gives the desired
inequalities. The Domination Rule is similar.
The Splitting Rule is intuitive when $a \le b \le c$. The interval $[a,c]$ splits as the
union of two sub-intervals, $[a,b] \cup [b,c]$, so the area above $[a,c]$ is the sum of the
areas above $[a,b]$ and $[b,c]$, i.e. $\int_a^c f(x)\,dx = \int_a^b f(x)\,dx + \int_b^c f(x)\,dx$.
Furthermore, because of our extended definition of integrals, the Splitting Rule
is valid no matter what the relative positions of $a, b, c$. For example, if $a \le c \le b$,
then $[a,b] = [a,c] \cup [c,b]$ and clearly $\int_a^c f(x)\,dx + \int_c^b f(x)\,dx = \int_a^b f(x)\,dx$. Moving
$\int_c^b$ to the other side, we get:

$$\int_a^c f(x)\,dx = \int_a^b f(x)\,dx - \int_c^b f(x)\,dx = \int_a^b f(x)\,dx + \int_b^c f(x)\,dx ,$$

so the very same Splitting Rule applies: $\int_a^c f(x)\,dx = \int_a^b f(x)\,dx + \int_b^c f(x)\,dx$.
Another example: if $a = c$, the Splitting Rule says:

$$\int_a^a f(x)\,dx = \int_a^b f(x)\,dx + \int_b^a f(x)\,dx,$$

which is true since both sides are zero.

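Basic Integrals:

$$\int_a^b 1\,dx = b - a, \qquad \int_a^b x\,dx = \tfrac{1}{2}b^2 - \tfrac{1}{2}a^2, \qquad \int_a^b x^2\,dx = \tfrac{1}{3}b^3 - \tfrac{1}{3}a^3 .$$

Later, we will easily evaluate these integrals by the Fundamental Theorems. For
now, we can prove them directly from the Basic Summations in §4.1 Part 2. For
the third and hardest formula, we take increment $\Delta x = \frac{b-a}{n}$, sample points
$x_i = a + i\,\Delta x$, and $f(x_i) = (a+i\,\Delta x)^2 = a^2 + 2ai\,\Delta x + i^2(\Delta x)^2$, giving Riemann sum:

$$\sum_{i=1}^{n} f(x_i)\,\Delta x = \sum_{i=1}^{n} \big( a^2\,\Delta x + 2ai\,(\Delta x)^2 + i^2(\Delta x)^3 \big)$$
$$= a^2\,\Delta x \sum_{i=1}^{n} 1 + 2a(\Delta x)^2 \sum_{i=1}^{n} i + (\Delta x)^3 \sum_{i=1}^{n} i^2$$
$$= a^2\,\frac{(b-a)}{n}\, n + 2a\,\frac{(b-a)^2}{n^2}\cdot\tfrac{1}{2}n(n+1) + \frac{(b-a)^3}{n^3}\cdot\tfrac{1}{6}n(n+1)(2n+1)$$
$$= a^2(b-a) + 2a(b-a)^2\Big(\frac{1}{2} + \frac{1}{2n}\Big) + (b-a)^3\Big(\frac{1}{3} + \frac{1}{2n} + \frac{1}{6n^2}\Big) .$$

Taking $\lim_{n\to\infty}$, the terms with $n$ in the denominator disappear, and we get:

$$\int_a^b x^2\,dx = a^2(b-a) + 2a(b-a)^2\big(\tfrac{1}{2}\big) + (b-a)^3\big(\tfrac{1}{3}\big) = \tfrac{1}{3}b^3 - \tfrac{1}{3}a^3 .$$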

Examples.

Evaluate the integral:

$$\int_{-2}^{7} (3t-5)^2\,dt.$$

We use the Integral Rules to reduce the problem to Basic Integrals. Since
$\int f(x)^2\,dx$ is NOT equal to $\big(\int f(x)\,dx\big)^2$, we must expand the integrand and
apply the Sum, Difference, and Constant Multiple Rules:

$$\int_{-2}^{7} (3t-5)^2\,dt = \int_{-2}^{7} (9t^2 - 30t + 25)\,dt$$
$$= 9\int_{-2}^{7} t^2\,dt - 30\int_{-2}^{7} t\,dt + 25\int_{-2}^{7} 1\,dt$$
$$= 9\big(\tfrac{1}{3}7^3 - \tfrac{1}{3}(-2)^3\big) - 30\big(\tfrac{1}{2}7^2 - \tfrac{1}{2}(-2)^2\big) + 25\big(7-(-2)\big)$$
$$= 1053 - 675 + 225 = 603 .$$

Note that the variable of integration ($t$ or $x$) is irrelevant.

Find an upper bound for:

$$\int_{-2}^{7} (3x-5)^2\,(1+\cos(x^2))\,dx.$$

That is, we do not ask for an exact value, only an overestimate. We know
that $1 + \cos(x^2) \le 2$ and $(3x-5)^2 \ge 0$, so the Domination Rule gives:

$$\int_{-2}^{7} (3x-5)^2\,(1+\cos(x^2))\,dx \le \int_{-2}^{7} (3x-5)^2\,(2)\,dx = 2(603) = 1206 .$$
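Both examples can be sanity-checked with Riemann sums. The sketch below assumes the lower limit of integration is $-2$, as the evaluation $7-(-2)$ indicates, and uses a hypothetical midpoint-sum helper `integral`.

```python
import math

def integral(f, a, b, n=9000):
    """Midpoint Riemann sum for f over [a, b] with n increments."""
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx

# First example: should land very close to the exact value 603.
exact_part = integral(lambda t: (3 * t - 5) ** 2, -2, 7)

# Second example: no exact value is claimed, only that it stays below 1206.
wiggly = integral(lambda x: (3 * x - 5) ** 2 * (1 + math.cos(x ** 2)), -2, 7)

print(exact_part, wiggly, wiggly <= 1206)
```

The second number shows how much room the Domination Rule leaves: an upper bound is guaranteed, but it need not be close to the true value.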
Math 132 Fundamental Theorem of Calculus Stewart 4.3

Integral as antiderivative. In §4.1, we were given a velocity function $v(t)$, and we wanted
to determine the corresponding position function $s(t)$. First, we computed the distance
traveled by the object over a given time $t = a$ to $t = b$ by adding up (velocity)$\times$(time) over
many small time increments of length $\Delta t$:

$$\text{distance traveled} = s(b) - s(a) = \lim_{\Delta t\to 0} \sum_{i=1}^{n} v(t_i)\,\Delta t = \int_a^b v(t)\,dt.$$

Assuming an initial value $s(a) = 0$, meaning the object is at zero-position at time $t = a$,
we find that the position at time $t = b$ is the same as distance traveled from $t = a$ to $t = b$:

$$s(b) = s(b) - s(a) = \int_a^b v(t)\,dt.$$

The choice of letters for quantities is only suggestive, and does not affect the computations.
Instead of a fixed interval $[a,b]$, let us change to $[a,x]$ to suggest that the right endpoint
$t = x$ is variable, while the left endpoint $t = a$ remains fixed. We get the position function:

$$s(x) = \int_a^x v(t)\,dt.$$

This always computes an antiderivative function for $v(t)$, even if it is impossible to get an
antiderivative algebraically by reversing differentiation formulas.

First Fundamental Theorem. We generalize the above to produce an antiderivative for
any continuous function $f(x)$, namely the integral function.
Theorem: Let $f(x)$ be continuous for all $x \in [a,b]$ and define the function:

$$F(x) = \int_a^x f(t)\,dt.$$

Then $F'(x) = f(x)$ for $x \in (a,b)$, and $F(x)$ is the unique antiderivative of $f(x)$
with $F(a) = 0$.
Before a formal proof, let us see how the Theorem relates to our velocity-to-position
argument above. If we assume there is some antiderivative $F(x)$ with $F'(x) = f(x)$ and
$F(a) = 0$, then we could approximate $F(x) = F(x) - F(a)$ as the sum of increments of
(rate)$\times$(time) = $f(t_i)\,\Delta t$, and the exact value of $F(x)$ as $\Delta t \to 0$ would be the integral.
However, this does not prove that there really does exist such an antiderivative $F(x)$, only
that if it exists, it must be given by the integral function.
Proof of Theorem. We do not yet know a derivative formula for the new function
$F(x) = \int_a^x f(t)\,dt$, so we must compute from the definition: $F'(x) = \lim_{h\to 0} \frac{F(x+h)-F(x)}{h}$. We have:

$$\frac{F(x+h) - F(x)}{h} = \frac{1}{h}\left( \int_a^{x+h} f(t)\,dt - \int_a^x f(t)\,dt \right) = \frac{1}{h} \int_x^{x+h} f(t)\,dt ,$$

since $\int_a^{x+h} = \int_a^x + \int_x^{x+h}$ for all $h$ (even $h < 0$).

We use the new variable $x$ for the upper limit to avoid writing $s(t) \overset{??}{=} \int_a^t v(t)\,dt$, which would imply nonsense like $s(2) \overset{??}{=} \int_a^2 v(2)\,d2$.

Again, we must use different letters for the limit of integration $x$ and the variable of integration $t$.

Geometrically, we see that if $h$ is small enough, the region above $[x, x+h]$ is approximately
a rectangle with height $f(x)$ and width $h$, so $\int_x^{x+h} f(t)\,dt \approx f(x)\,h$, and:

$$F'(x) \approx \frac{F(x+h) - F(x)}{h} = \frac{1}{h} \int_x^{x+h} f(t)\,dt \approx \frac{1}{h}\,\big(f(x)\,h\big) = f(x),$$

with approximations turning into equalities as $h \to 0$, as claimed by the Theorem.


However, geometric inspection is not enough for a general proof, because any picture
only shows a particular case, and is not numerically precise. To control errors, we take the
absolute minimum value $N$ and the absolute maximum value $M$ of the continuous function
$f(x)$ on $[x, x+h]$, using the Extremal Value Theorem (§3.1). (To indicate that these depend
on $h$, we write $N_h$, $M_h$.) Now, $N_h \le f(t) \le M_h$ for $t \in [x, x+h]$, so by the Bounds Rule for
integrals (§4.2) we have:

$$((x+h)-x)\,N_h \le \int_x^{x+h} f(t)\,dt \le ((x+h)-x)\,M_h \implies N_h \le \frac{1}{h}\int_x^{x+h} f(t)\,dt \le M_h .$$

As $h$ gets very small, the interval $[x, x+h]$ gets closer and closer to the single point $x$,
and the absolute minimum and maximum over this tiny interval must approach $f(x)$ by
continuity: that is, $\lim_{h\to 0} N_h = \lim_{h\to 0} M_h = f(x)$. Also, by the above we have:

$$N_h \le \frac{F(x+h) - F(x)}{h} = \frac{1}{h}\int_x^{x+h} f(t)\,dt \le M_h .$$

Applying the Squeeze Theorem for limits (§1.6), we find what we wanted:

$$F'(x) = \lim_{h\to 0} \frac{F(x+h) - F(x)}{h} = \lim_{h\to 0} N_h = \lim_{h\to 0} M_h = f(x) .$$

As for the last part of the conclusion, it is clear that $F(a) = \int_a^a f(t)\,dt = 0$, and there is
a unique antiderivative with this initial value by the Antiderivative Theorem (§3.9), which
is a version of the Uniqueness Theorem (§3.2). Note how we have used almost all of our
previous theory in proving this culminating Theorem.

Derivative of integral functions. The above Theorem can be stated as a Basic Derivative
formula for $F(x) = \int_a^x f(t)\,dt$:

$$F'(x) = \frac{d}{dx}\left( \int_a^x f(t)\,dt \right) = f(x) .$$

Here $a$ is any constant, $x$ is the input variable, and $t$ is a dummy variable which only has
meaning inside the integral.
For another function $g(x)$, we can take its composition with $F(x)$. Then the above
Basic Derivative together with the Chain Rule (§2.5) implies:

$$F(g(x))' = \frac{d}{dx}\left( \int_a^{g(x)} f(t)\,dt \right) = F'(g(x))\,g'(x) = f(g(x))\,g'(x) .$$

example: Find the derivative of $F(x) = \int_{2x}^{x^3} \sin(t)\,dt$. Splitting the integral at $0$, we have:

$$F'(x) = \frac{d}{dx}\left( \int_{2x}^{x^3} \sin(t)\,dt \right) = \frac{d}{dx}\left( \int_0^{x^3} \sin(t)\,dt - \int_0^{2x} \sin(t)\,dt \right)$$
$$= \sin(x^3)\,(x^3)' - \sin(2x)\,(2x)' = 3x^2 \sin(x^3) - 2\sin(2x) .$$

example: Find the derivative of $F(x) = \int_{2a}^{b^3} \sin(t)\,dt$. Here $a, b$ are constants, and hence
so are $2a$, $b^3$. In fact, the right hand side does not depend on the variable $x$, and is a
constant function with derivative $F'(x) = 0$! This also follows from the Chain Rule, since
$\sin(2a)\cdot 2(a)' = 0$ and $\sin(b^3)\,(b^3)' = \sin(b^3)\cdot 3b^2\,(b)' = 0$.
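The Chain Rule formula for an integral with variable limits can be checked against a difference quotient. This sketch takes the integrand to be $\sin(t)$ with limits $2x$ and $x^3$; the helper names `integral`, `G`, and `G_prime_formula` are made up for illustration, and the midpoint sums handle reversed limits through a negative increment.

```python
import math

def integral(f, a, b, n=2000):
    """Midpoint Riemann sum from a to b; dx is negative when b < a."""
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx

def G(x):
    """Integral of sin(t) from t = 2x to t = x^3, computed numerically."""
    return integral(math.sin, 2 * x, x ** 3)

def G_prime_formula(x):
    """Derivative predicted by the Basic Derivative formula plus the Chain Rule."""
    return 3 * x ** 2 * math.sin(x ** 3) - 2 * math.sin(2 * x)

x, h = 1.1, 1e-3
numeric = (G(x + h) - G(x - h)) / (2 * h)   # difference quotient for G'(x)
print(numeric, G_prime_formula(x))          # the two should nearly agree
```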
Rx
Sketching integral functions. Since an antiderivative F (x) = a f (t) dt might be a com-
pletely new function for which no elementary formula is possible, it might seem mysterious.
However, we can find its values with sufficient accuracy by computing Riemann sums on a
spreadsheet, and plot these to get a good idea of the graph.
A cleverer strategy is to use the derivative F 0 (x) = f (x) for sketching y = F (x), as in
3.3 and 3.5. That is, the slope of y = F (x) is given by the height of y = f (x).
Rx
example: Graph the function F (x) = 0 sin(t2 ) dt. The critical points of F (x) are solutions
of F 0 (x) = 0 or undefined, i.e. f (x) = sin(x2 ) = 0 (never undefined). This happens when
2
x = 2k for any integer k, so the critical points are x = 0, , 2, . . . . The sign chart
is:
x 2 0 2
F 0 (x) + 0 0 + 0 + 0 0 +
F (x) % 0.43 & 0.89 % 0 % 0.89 & 0.43 %
For inflection points, we solve F 00 (x) = 0, i.e. f 0 (x) = 2x cos(x2 ) = 0, so x = 0, 2 ,
p
q
3 2 , . . . . Thus, the general shape of the graph is clear, and we can get specific points
Rb
(b, F (b)) from computing a Riemann sum for 0 sin(t2 ) dt.
From the 180° rotational symmetry of the graph, it looks like F(x) is an odd function, $F(-x) = -F(x)$. This is because $f(x) = \sin(x^2)$ is an even function, $f(-x) = f(x)$, so:
$$\begin{aligned}
F(-b) = \int_0^{-b} \sin(t^2)\,dt &= -\int_{-b}^{0} \sin(t^2)\,dt \\
&= -(\text{area under } y = \sin(x^2) \text{ above } x \in [-b, 0]) \\
&= -(\text{area under } y = \sin(x^2) \text{ above } x \in [0, b]) \\
&= -\int_0^{b} \sin(t^2)\,dt = -F(b).
\end{aligned}$$
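The Riemann-sum values used in the sketch can be reproduced with a short script instead of a spreadsheet. The following is a minimal sketch, assuming midpoint sample points and an arbitrary choice of n = 20000 increments; the helper name `midpoint_sum` is illustrative and not from the notes.

```python
import math

def midpoint_sum(f, a, b, n=20000):
    """Midpoint Riemann sum of f over [a, b] with n equal increments.

    If b < a the increment is negative, which matches the convention
    that reversing the limits flips the sign of the integral.
    """
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx

def F(x):
    """Approximate F(x) = integral of sin(t^2) for t from 0 to x."""
    return midpoint_sum(lambda t: math.sin(t * t), 0.0, x)

# Tabulate F at the critical points listed in the sign chart.
critical_points = [
    -math.sqrt(2 * math.pi),
    -math.sqrt(math.pi),
    0.0,
    math.sqrt(math.pi),
    math.sqrt(2 * math.pi),
]
for x in critical_points:
    print(f"x = {x:+.4f}   F(x) = {F(x):+.4f}")
```

Printing F(x) and F(-x) side by side for a few more values of x is a quick way to see the odd symmetry numerically.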

Second Fundamental Theorem. This is a trick to easily evaluate many integrals, which we already used to find some exact values in §4.1.

Theorem: Suppose F(x) is some known antiderivative with $F'(x) = f(x)$. Then:
$$\int_a^b f(x)\,dx = F(b) - F(a).$$
That is, if f(x) is the rate of change of F(x), then the integral $\int_a^b f(x)\,dx$ is the total change of F(x) from x = a to b.

Proof. Since F(x) is a particular antiderivative of f(x), the Uniqueness Theorem (§3.9, §3.2) says that the general antiderivative is $F(x) + C$ for any constant C. But the First Fundamental Theorem says the integral function $I(x) = \int_a^x f(t)\,dt$ is also an antiderivative of f(x), so we must have $I(x) = F(x) + C$. Since we know the initial condition $I(a) = \int_a^a f(t)\,dt = 0$, we get $I(a) = F(a) + C = 0$, and $C = -F(a)$. Therefore $I(x) = F(x) - F(a)$ and $\int_a^b f(x)\,dx = I(b) = F(b) - F(a)$ as desired.
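example: Evaluate the integral $\int_{-\sqrt5}^{\sqrt5} 5+4x^2-x^4\,dx$. Reversing our Derivative Rules as we did in §3.9, we see that $F(x) = 5x + \tfrac43 x^3 - \tfrac15 x^5$ is an antiderivative. Since $F(\sqrt5) = 5\sqrt5 + \tfrac{20}{3}\sqrt5 - 5\sqrt5 = \tfrac{20}{3}\sqrt5$, the Theorem gives:
$$\int_{-\sqrt5}^{\sqrt5} 5+4x^2-x^4\,dx = F(\sqrt5) - F(-\sqrt5) = \tfrac{20}{3}\sqrt5 - \left(-\tfrac{20}{3}\sqrt5\right) = \tfrac{40}{3}\sqrt5 \approx 29.81.$$

example: Determine the area under the curve $y = 5+4x^2-x^4$ and above the x-axis.

We must determine the limits of integration, which are the x-intercepts of the graph. Substituting $u = x^2$, the equation becomes $5 + 4u - u^2 = 0$, which we can solve by the Quadratic Formula as $u = -1$ or $5$. Only $u = 5$ gives real solutions, so $x = \pm\sqrt{u} = \pm\sqrt5$. Thus the area is $\int_{-\sqrt5}^{\sqrt5} 5+4x^2-x^4\,dx = \tfrac{40}{3}\sqrt5$ as above.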
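As a sanity check on this kind of evaluation, the total change $F(b) - F(a)$ can be compared against a direct Riemann sum of the integrand. This is a minimal sketch, assuming midpoint samples and n = 100000 increments; the names `midpoint_sum`, `exact` and `approx` are illustrative only.

```python
import math

def f(x):
    """Integrand of the example: 5 + 4x^2 - x^4."""
    return 5 + 4 * x**2 - x**4

def F(x):
    """Antiderivative found by reversing the power rule term by term."""
    return 5 * x + (4 / 3) * x**3 - (1 / 5) * x**5

def midpoint_sum(g, a, b, n=100000):
    """Midpoint Riemann sum of g over [a, b] with n equal increments."""
    dx = (b - a) / n
    return sum(g(a + (i + 0.5) * dx) for i in range(n)) * dx

r = math.sqrt(5)
exact = F(r) - F(-r)              # Second Fundamental Theorem
approx = midpoint_sum(f, -r, r)   # numerical definition of the integral
print("F(b) - F(a) =", exact)
print("Riemann sum =", approx)
```

If the two printed numbers disagree beyond the expected discretization error, the antiderivative or the arithmetic in the evaluation is the first thing to recheck.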


The variable of integration, x or t, is irrelevant, provided it doesn't conflict with the limits of integration.
Math 132 More Uses for Integrals 4.4, 5.5
Review. The integral $\int_a^b f(x)\,dx$ has four levels of meaning.

• Physical: Suppose y, z are physical variables determined as continuous functions of an independent variable x, so that $y = f(x)$ and $z = F(x)$. If y is the rate of change of z, i.e. $y = \frac{dz}{dx}$, then the integral of y is the cumulative total change of z between x = a and x = b. In Leibniz notation:
$$\int_a^b y\,dx = z\,\Big|_{x=a}^{x=b}.$$
In Newton notation:
$$\int_a^b f(x)\,dx = F(b) - F(a).$$
This is the Second Fundamental Theorem of Calculus.
If we know an initial value F(a), we have $F(x) = F(a) + \int_a^x f(t)\,dt$, and:
$$F'(x) = \frac{d}{dx}\int_a^x f(t)\,dt = f(x).$$
This is the First Fundamental Theorem of Calculus.

• Geometric: The integral is the area between the graph $y = f(x)$ and the interval $x \in [a, b]$, counting area above the x-axis as positive, area below the x-axis as negative.

• Numerical: To compute the integral, we divide $x \in [a, b]$ into n increments of width $\Delta x = \frac{b-a}{n}$, and choose sample points $x_1, \ldots, x_n$, one in each increment. Then the integral is approximated by a Riemann sum:
$$\int_a^b f(x)\,dx \approx \sum_{i=1}^n f(x_i)\,\Delta x = f(x_1)\,\Delta x + \cdots + f(x_n)\,\Delta x.$$
The exact integral is the limit of these approximations as $n \to \infty$, $\Delta x \to 0$.

• Algebraic: If, by reversing derivative formulas, we can find a formula for an antiderivative F(x) with $F'(x) = f(x)$, then we can compute $\int_a^b f(x)\,dx = F(b) - F(a)$ by the Second Fundamental Theorem. Later, we will develop techniques for finding F(x), such as the Substitution Method which reverses the Chain Rule.


In Leibniz notation, a function is denoted by its output variable, such as $z = F(x)$. A particular output value of the function is denoted: $z|_{x=a} = F(a)$; and the change in the value over an interval $x \in [a, b]$ is denoted: $z|_{x=a}^{x=b} = F(b) - F(a)$.
Indefinite integral notation. Since antiderivatives are so closely related to integrals by the Fundamental Theorems, we adopt the integral sign as a notation for the most general antiderivative of a function:
$$\int f(x)\,dx = F(x) + C \quad \text{for all } C.$$
Here F(x) is a particular antiderivative: $F'(x) = f(x)$; and $F(x) + C$ means the family of all antiderivatives, one for every constant C (§3.9). This family is called the indefinite integral, with no specific limits of integration next to the $\int$ sign.
example: Since $\frac{d}{dx}(x^3) = 3x^2$ and $\frac{d}{dx}(\tfrac13 x^3) = x^2$, we have the indefinite integral:
$$\int x^2\,dx = \tfrac13 x^3 + C.$$

example: Suppose a car with position function s(t) lurches forward with velocity $v(t) = 10t + 10\sin(\pi t)$ m/sec. How far does it travel from t = 0 to t = 3 sec? Making use of the antiderivative table in §3.9, we first find the indefinite integral:
$$\int 10t + 10\sin(\pi t)\,dt = 5t^2 - \tfrac{10}{\pi}\cos(\pi t) + C.$$
Since velocity is the rate of change of position, the total change in position is the definite integral:
$$s(3) - s(0) = \int_0^3 10t + 10\sin(\pi t)\,dt = \Big[\,5t^2 - \tfrac{10}{\pi}\cos(\pi t)\Big]_{t=0}^{t=3}$$
$$= \left(5(3^2) - \tfrac{10}{\pi}\cos(3\pi)\right) - \left(5(0^2) - \tfrac{10}{\pi}\cos(0)\right) = 45 + \tfrac{20}{\pi} \approx 51.4 \text{ meters}.$$
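The distance computation can be cross-checked numerically. The sketch below assumes the velocity is $v(t) = 10t + 10\sin(\pi t)$, which is the reading consistent with the stated answer of about 51.4 meters; the function names and the choice of 10000 midpoint samples are illustrative assumptions.

```python
import math

def v(t):
    """Velocity in m/sec (assumed form: 10t + 10 sin(pi t))."""
    return 10 * t + 10 * math.sin(math.pi * t)

def position_change(t0, t1):
    """Total change in position via an antiderivative of v."""
    def antiderivative(t):
        return 5 * t**2 - (10 / math.pi) * math.cos(math.pi * t)
    return antiderivative(t1) - antiderivative(t0)

def midpoint_sum(f, a, b, n=10000):
    """Midpoint Riemann sum of f over [a, b] with n equal increments."""
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx

distance_exact = position_change(0.0, 3.0)
distance_numeric = midpoint_sum(v, 0.0, 3.0)
print("antiderivative:", round(distance_exact, 3), "meters")
print("Riemann sum:   ", round(distance_numeric, 3), "meters")
```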

Average of a function. In the numerical definition of integral above, we can rewrite the Riemann sum as:
$$\sum_{i=1}^n f(x_i)\,\Delta x = \sum_{i=1}^n f(x_i)\,\frac{b-a}{n} = (b-a)\,\frac{f(x_1) + \cdots + f(x_n)}{n}.$$
This is just the interval length $(b-a)$ times the average of the sample values $f(x_1), \ldots, f(x_n)$. The integral is the limit of this as $n \to \infty$, which becomes $(b-a)$ times the average of a more and more dense set of sample values:
$$\int_a^b f(x)\,dx = (b-a)\lim_{n\to\infty} \frac{f(x_1) + \cdots + f(x_n)}{n}.$$
We define the average of f(x) over all $x \in [a, b]$ to be the above limit. Hence:
$$\text{Average of } f(x) \text{ over } [a, b] = f_{\text{ave}} = \frac{1}{b-a}\int_a^b f(x)\,dx.$$

example: Find the average value of $f(x) = \sqrt{x}$ over $x \in [0, 4]$. The indefinite integral is: $\int \sqrt{x}\,dx = \int x^{1/2}\,dx = \tfrac23 x^{3/2} + C$. The average is thus:
$$\frac{1}{4-0}\int_0^4 \sqrt{x}\,dx = \frac14\cdot\frac23\, x^{3/2}\,\Big|_{x=0}^{x=4} = \frac43 \approx 1.3.$$
That is, the function varies between $0 \le \sqrt{x} \le 2$ over the interval, but its average value is higher than the halfway point 1.0, because the graph bulges above the straight line from (0, 0) to (4, 2).
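The definition of the average as a limit of sample averages can be watched directly. This is a minimal sketch, assuming midpoint sample points; `sample_average` is an illustrative helper name, not something defined in the notes.

```python
import math

def sample_average(f, a, b, n):
    """Average of f at the midpoints of n equal increments of [a, b]."""
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) / n

# The sample averages of sqrt(x) on [0, 4] should settle near 4/3.
for n in (10, 100, 1000, 10000):
    print(n, sample_average(math.sqrt, 0.0, 4.0, n))
```

Note that no integral appears in the code: it only averages sample values, yet the numbers approach the value obtained from the antiderivative.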

A geometric way to picture the average of a positive f(x) over [a, b] is to think of the area under the curve as a fluid. If we remove the curve and contain the fluid between the walls x = a and x = b, then the level of the fluid is the average of the function. In the picture, the fluid under the straight line would fill the container between x = 0 and x = 4 to the midpoint level y = 1; but with extra fluid under $y = \sqrt{x}$ and above the line, the average of $f(x) = \sqrt{x}$ is higher: $y = \tfrac43$.

Mean Value Theorem for Integrals

If f(x) is continuous for $x \in [a, b]$, then there is some $c \in (a, b)$ where f(c) equals the average of f(x) over the interval: $f(c) = f_{\text{ave}} = \frac{1}{b-a}\int_a^b f(x)\,dx$.

Proof: Take $F(x) = \int_a^x f(t)\,dt$. Then the Mean Value Theorem for Derivatives (§3.2) says there is a value $c \in (a, b)$ where the tangent line for F(x) is parallel to the secant line over the interval: $F'(c) = \frac{F(b) - F(a)}{b-a}$. By the First Fundamental Theorem, the left side is $F'(c) = f(c)$; and since $F(a) = 0$, the right side is $\frac{F(b)}{b-a} = \frac{1}{b-a}\int_a^b f(x)\,dx$, as desired.

In our example $f(x) = \sqrt{x}$, we easily find $c \in (0, 4)$ with $f(c) = \sqrt{c} = f_{\text{ave}} = \tfrac43$: it is $c = \tfrac{16}{9}$.
Math 132 Substitution Method Stewart 4.5

Reversing the Chain Rule. As we have seen from the Second Fundamental Theorem (§4.3), the easiest way to evaluate an integral $\int_a^b f(x)\,dx$ is to find an antiderivative, the indefinite integral $\int f(x)\,dx = F(x) + C$, so that $\int_a^b f(x)\,dx = F(b) - F(a)$. Building on §3.9, we will learn several methods to find antiderivatives which reverse our methods of differentiation, in this case the Chain Rule.
For example, let us find the antiderivative:
$$\int x\cos(x^2)\,dx.$$
That is, for what function will the Derivative Rules produce $x\cos(x^2)$? We notice an inside function $g(x) = x^2$, and a factor x which is very close to the derivative $g'(x) = 2x$. In fact, we can get the exact derivative of the inside function if we multiply the factors by $\tfrac12$ and 2:
$$x\cos(x^2) = \tfrac12\cos(x^2)\cdot(2x).$$
This is just the kind of derivative function produced by the Chain Rule:
$$f(g(x))' = f'(g(x))\cdot g'(x) = f'(x^2)\cdot(2x) \stackrel{?}{=} \tfrac12\cos(x^2)\cdot(2x).$$
We still need to find the outside function f. To remind us of the original inside function, we write f(u), where the new variable u represents $u = g(x) = x^2$. We must get $f'(u) = \tfrac12\cos(u)$, an easy antiderivative:
$$\int \tfrac12\cos(u)\,du = f(u) + C = \tfrac12\sin(u) + C.$$
Now we restore the original inside function to get our final answer:
$$\int \tfrac12\cos(u)\,du = \tfrac12\sin(u) + C = \tfrac12\sin(x^2) + C.$$
The Chain Rule in Leibniz notation (§2.5) reverses and checks the above computation. Writing $y = \tfrac12\sin(u)$ and $u = x^2$:
$$\frac{dy}{dx} = \frac{dy}{du}\cdot\frac{du}{dx} = \frac{d}{du}\left(\tfrac12\sin(u)\right)\cdot\frac{d}{dx}\left(x^2\right) = \tfrac12\cos(u)\cdot(2x) = \tfrac12\cos(x^2)\cdot(2x) = x\cos(x^2).$$

Substitution Method

1. Given an antiderivative $\int h(x)\,dx$, try to find an inside function g(x) such that $g'(x)$ is a factor of the integrand:
$$h(x) = f(g(x))\cdot g'(x).$$
This will often involve multiplying and dividing by a constant to get the exact derivative $g'(x)$. After factoring out $g'(x)$, sometimes the remaining factor needs to be manipulated to write it as a function of $u = g(x)$.
2. Using the symbolic notation $u = g(x)$, $du = \frac{du}{dx}\,dx = g'(x)\,dx$, write:
$$\int h(x)\,dx = \int f(g(x))\cdot g'(x)\,dx = \int f(u)\,du,$$
and find the antiderivative $\int f(u)\,du = F(u) + C$ by whatever method.
3. Restore the original inside function:
$$\int h(x)\,dx = \int f(u)\,du = F(u) + C = F(g(x)) + C.$$
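Any answer produced by the three steps above can be checked by differentiating it, since $F(g(x))$ must have derivative $h(x)$. The sketch below does this numerically for the worked case $h(x) = x\cos(x^2)$ with candidate antiderivative $\tfrac12\sin(x^2)$; the central-difference helper `derivative` and its step size are illustrative assumptions.

```python
import math

def h(x):
    """Integrand: x cos(x^2)."""
    return x * math.cos(x**2)

def H(x):
    """Candidate antiderivative from the substitution u = x^2."""
    return 0.5 * math.sin(x**2)

def derivative(G, x, eps=1e-5):
    """Central-difference approximation to G'(x)."""
    return (G(x + eps) - G(x - eps)) / (2 * eps)

sample_points = (-2.0, -0.5, 0.0, 1.0, 2.5)
for x in sample_points:
    print(f"x = {x:+.1f}   H'(x) = {derivative(H, x):+.6f}   h(x) = {h(x):+.6f}")
```

Agreement at a handful of points is not a proof, but a mismatch at even one point shows that a constant factor or a sign was lost along the way.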

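Examples

• $\int (3x+4)\sqrt{3x+4}\,dx$. The inside function is clearly $u = 3x+4$, $du = 3\,dx$, so:
$$\int (3x+4)\sqrt{3x+4}\,dx = \int \tfrac13(3x+4)\sqrt{3x+4}\cdot 3\,dx = \int \tfrac13 u\sqrt{u}\,du = \tfrac13\int u^{3/2}\,du = \tfrac13\cdot\tfrac25 u^{5/2} + C = \tfrac{2}{15}(3x+4)^{5/2} + C.$$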


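• $\int x\sqrt{3x+4}\,dx$. Again $u = 3x+4$, so $\sqrt{3x+4}$ becomes $\sqrt{u}$, but we must still express the remaining factor x in terms of u. We solve $u = 3x+4$ to obtain $x = \tfrac13 u - \tfrac43$: that is, $x = \tfrac13(3x+4) - \tfrac43$:
$$\int x\sqrt{3x+4}\,dx = \int \tfrac13\left(\tfrac13(3x+4) - \tfrac43\right)\sqrt{3x+4}\cdot 3\,dx = \int \tfrac13\left(\tfrac13 u - \tfrac43\right)\sqrt{u}\,du$$
$$= \int \tfrac19 u^{3/2} - \tfrac49 u^{1/2}\,du = \tfrac19\cdot\tfrac25 u^{5/2} - \tfrac49\cdot\tfrac23 u^{3/2} + C = \tfrac{2}{45}(3x+4)^{5/2} - \tfrac{8}{27}(3x+4)^{3/2} + C.$$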

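• $\int \frac{\sec^2(\sqrt{x})}{\sqrt{x}}\,dx$. We take $u = \sqrt{x} = x^{1/2}$, $du = \tfrac12 x^{-1/2}\,dx = \frac{1}{2\sqrt{x}}\,dx$:
$$\int \frac{\sec^2(\sqrt{x})}{\sqrt{x}}\,dx = \int 2\sec^2(\sqrt{x})\cdot\frac{1}{2\sqrt{x}}\,dx = \int 2\sec^2(u)\,du = 2\tan(u) + C = 2\tan(\sqrt{x}) + C.$$
Here we use the trig integrals from §3.9.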


• $\int \frac{\sin(x)}{(1+\cos(x))^2}\,dx$. We cannot take the inside function $u = \sin(x)$, because its derivative $\cos(x)$ is not a factor of the integrand. We could take $u = \cos(x)$, but the best choice is $u = 1 + \cos(x)$, $du = -\sin(x)\,dx$:
$$\int \frac{\sin(x)}{(1+\cos(x))^2}\,dx = \int \frac{-1}{(1+\cos(x))^2}\,(-\sin(x))\,dx = \int \frac{-1}{u^2}\,du = \frac{1}{u} + C = \frac{1}{1+\cos(x)} + C.$$


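• $\int \frac{1-\sqrt{x}}{\sqrt{1+\sqrt{x}}}\,dx$. Take $u = 1+\sqrt{x}$, $du = \frac{1}{2\sqrt{x}}\,dx$, so $\sqrt{x} = u-1$, $1-\sqrt{x} = 2-u$.
$$\int \frac{1-\sqrt{x}}{\sqrt{1+\sqrt{x}}}\,dx = \int \frac{1-\sqrt{x}}{\sqrt{1+\sqrt{x}}}\,(2\sqrt{x})\,\frac{1}{2\sqrt{x}}\,dx = \int \frac{2-u}{\sqrt{u}}\cdot 2(u-1)\,du = 2\int \frac{-u^2+3u-2}{\sqrt{u}}\,du$$
$$= 2\int -u^{3/2} + 3u^{1/2} - 2u^{-1/2}\,du = -\tfrac45 u^{5/2} + 4u^{3/2} - 8u^{1/2} + C$$
$$= -\tfrac45(1+\sqrt{x})^{5/2} + 4(1+\sqrt{x})^{3/2} - 8(1+\sqrt{x})^{1/2} + C.$$
Whew! Here we did not have the derivative factor $\frac{du}{dx} = \frac{1}{2\sqrt{x}}$ already present: we had to multiply and divide by it to get du, then express the remaining factors in terms of u. By luck, the resulting $\int \cdots\,du$ was do-able.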
• $\int \sec^2(x)\tan(x)\,dx$. Here we could take $u = \tan(x)$, $du = \sec^2(x)\,dx$:
$$\int \sec^2(x)\tan(x)\,dx = \int \tan(x)\cdot\sec^2(x)\,dx = \int u\,du = \tfrac12 u^2 + C = \tfrac12\tan^2(x) + C.$$
Alternatively, use the inside function $z = \sec(x)$, $dz = \tan(x)\sec(x)\,dx$:
$$\int \sec^2(x)\tan(x)\,dx = \int \sec(x)\cdot\tan(x)\sec(x)\,dx = \int z\,dz = \tfrac12 z^2 + C = \tfrac12\sec^2(x) + C.$$
Thus $\tfrac12\tan^2(x)$ and $\tfrac12\sec^2(x)$ are two different antiderivatives, so what about the Antiderivative Uniqueness Theorem (§3.9)? In fact, the identity $\tan^2(x) + 1 = \sec^2(x)$ implies:
$$\tfrac12\tan^2(x) + \tfrac12 = \tfrac12\sec^2(x).$$
These give the same antiderivative family: $\tfrac12\tan^2(x) + C = \tfrac12\sec^2(x) + C'$, with $C = C' + \tfrac12$.
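The claim that two different-looking answers belong to the same antiderivative family can be tested numerically: both should differentiate to the integrand, and their difference should be constant. This is a minimal sketch with illustrative names (`A`, `B`, `derivative`), sampling only points where $\cos(x) \ne 0$.

```python
import math

def integrand(x):
    """sec^2(x) tan(x), written with cos and tan."""
    return math.tan(x) / math.cos(x)**2

def A(x):
    """Answer from u = tan(x): (1/2) tan^2(x)."""
    return 0.5 * math.tan(x)**2

def B(x):
    """Answer from z = sec(x): (1/2) sec^2(x)."""
    return 0.5 / math.cos(x)**2

def derivative(G, x, eps=1e-5):
    """Central-difference approximation to G'(x)."""
    return (G(x + eps) - G(x - eps)) / (2 * eps)

sample_points = (-1.0, -0.3, 0.2, 0.9, 1.2)
for x in sample_points:
    print(f"x = {x:+.1f}  A' = {derivative(A, x):.5f}  "
          f"B' = {derivative(B, x):.5f}  integrand = {integrand(x):.5f}  "
          f"B - A = {B(x) - A(x):.5f}")
```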
Substitution for definite integrals. We have, for $u = g(x)$:
$$\int_a^b f(g(x))\cdot g'(x)\,dx = \int_{g(a)}^{g(b)} f(u)\,du.$$
example: $\int_3^4 x(1+x^2)^5\,dx$. Taking $u = 1+x^2$, $du = 2x\,dx$:
$$\int_3^4 x(1+x^2)^5\,dx = \int_3^4 \tfrac12(1+x^2)^5\cdot 2x\,dx = \int_{1+3^2}^{1+4^2} \tfrac12 u^5\,du = \tfrac{1}{12}u^6\,\Big|_{u=10}^{u=17} = \tfrac{1}{12}\,17^6 - \tfrac{1}{12}\,10^6.$$
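The change of limits is the step most easily forgotten, so it is worth confirming that the x-integral over [3, 4] and the u-integral over [10, 17] really agree. The following is a minimal sketch using midpoint Riemann sums; the names `left`, `right` and `closed` are illustrative.

```python
def midpoint_sum(f, a, b, n=100000):
    """Midpoint Riemann sum of f over [a, b] with n equal increments."""
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx

# Original integral in x, with x running from 3 to 4.
left = midpoint_sum(lambda x: x * (1 + x**2)**5, 3.0, 4.0)

# Substituted integral in u = 1 + x^2, with u running from 10 to 17.
right = midpoint_sum(lambda u: 0.5 * u**5, 10.0, 17.0)

# Closed form from the antiderivative u^6 / 12.
closed = (17**6 - 10**6) / 12

print(left, right, closed)
```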

Integral Symmetry Theorem: If f(x) is an odd function, meaning $f(-x) = -f(x)$, then $\int_{-a}^{a} f(x)\,dx = 0$.
Proof. By the Integral Splitting Rule (§4.2), we have:
$$\int_{-a}^{a} f(x)\,dx = \int_{-a}^{0} f(x)\,dx + \int_0^{a} f(x)\,dx.$$
Substituting $u = -x$, $du = (-1)\,dx$ in the first term:
$$\int_{-a}^{0} f(x)\,dx = \int_{-a}^{0} -f(x)\,(-1)\,dx = \int_{-a}^{0} f(-x)\,(-1)\,dx = \int_{a}^{0} f(u)\,du = -\int_0^{a} f(u)\,du = -\int_0^{a} f(x)\,dx.$$
The last equality holds because the variable of integration is merely suggestive, and can be changed arbitrarily. Therefore $\int_{-a}^{0} f(x)\,dx + \int_0^{a} f(x)\,dx = -\int_0^{a} f(x)\,dx + \int_0^{a} f(x)\,dx = 0$, as desired.
example: Evaluate the definite integral $\int_{-\pi}^{\pi} x\cos(x)\,dx$. Here substitution will not work, and it is difficult to find an antiderivative. But since $(-x)\cos(-x) = -(x\cos(x))$, the Theorem tells us the integral must be zero.
Geometrically, the integral is the signed area between the graph and the x-axis:

Since the function $f(x) = x\cos(x)$ is odd, the graph has rotational symmetry around the origin, and each negative area below the x-axis cancels a positive area above the x-axis.
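The cancellation can be observed numerically by comparing the integral over a symmetric interval with the integral over its right half alone. This is a minimal sketch with midpoint Riemann sums; the interval endpoints tried here are arbitrary illustrative choices.

```python
import math

def midpoint_sum(f, a, b, n=100000):
    """Midpoint Riemann sum of f over [a, b] with n equal increments."""
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx

def f(x):
    """An odd function: f(-x) = -f(x)."""
    return x * math.cos(x)

for a in (1.0, math.pi, 10.0):
    symmetric = midpoint_sum(f, -a, a)   # signed area over [-a, a]
    half = midpoint_sum(f, 0.0, a)       # signed area over [0, a] only
    print(f"a = {a:.4f}   [-a, a]: {symmetric:+.2e}   [0, a]: {half:+.6f}")
```

The half-interval values are generally nonzero, so the zero over the symmetric interval comes from cancellation between the two halves rather than from the function being small.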
Math 132 Area Between Curves Stewart 5.1

Region between two parabolas. We have seen that geometrically, the integral $\int_a^b f(x)\,dx$ computes the area between a curve $y = f(x)$ and an interval $x \in [a, b]$ on the x-axis (with area below the axis counted negatively). In Calculus II, we will show the versatility of the integral to compute all kinds of areas, lengths, volumes: almost any measure of size for a geometric object.
In this section, we compute more general areas: those between two given curves $y = f(x)$ and $y = g(x)$, usually with no boundary on the x-axis.
example: Consider the region with top boundary $y = f(x) = x^2+x+1$, bottom boundary $y = g(x) = 2x^2-1$, left boundary the y-axis x = 0, right boundary x = 1:
$$R = \{\,(x, y) \text{ with } g(x) \le y \le f(x) \text{ and } x \in [0, 1]\,\} = \{\,(x, y) \text{ with } 2x^2-1 \le y \le x^2+x+1 \text{ and } 0 \le x \le 1\,\}.$$
Here $y = g(x) = 2x^2-1$ is a standard parabola shifted downward, with minimum point x = 0. The curve $y = f(x) = x^2+x+1$ is roughly like its leading term $y = x^2$, a parabola opening upward; its minimum point satisfies $(x^2+x+1)' = 2x+1 = 0$, i.e. $x = -\tfrac12$.
To compute the area of R, we use the same geometric-numerical strategy as for the region under a single curve: split R into n thin vertical slices of width $\Delta x = \tfrac1n$, each approximately a rectangle; then add up the rectangle areas and take the limit as n becomes larger and larger. In the interval $x \in [0, 1]$, we take sample points $x_1, \ldots, x_n$, one in each $\Delta x$ increment. The slice at position $x_i$ has height equal to the ceiling minus the floor, $f(x_i) - g(x_i)$, so:
$$\text{area of slice} \approx (\text{height}) \times (\text{width}) = (f(x_i) - g(x_i))\,\Delta x,$$
and the total area is:
$$A_{[0,1]} = \lim_{n\to\infty} \sum_{i=1}^n (f(x_i) - g(x_i))\,\Delta x = \int_0^1 (f(x) - g(x))\,dx$$
$$= \int_0^1 (x^2+x+1) - (2x^2-1)\,dx = \Big[\,2x + \tfrac12 x^2 - \tfrac13 x^3\Big]_{x=0}^{x=1} = \tfrac{13}{6}.$$

We specify the region as the set of all points (x, y) which satisfy the given conditions.
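The slicing strategy translates almost word for word into a loop: each slice contributes (ceiling minus floor) times width. This is a minimal sketch, assuming midpoint sample points; `slice_area_sum` is an illustrative name.

```python
def f(x):
    """Top boundary (ceiling): x^2 + x + 1."""
    return x**2 + x + 1

def g(x):
    """Bottom boundary (floor): 2x^2 - 1."""
    return 2 * x**2 - 1

def slice_area_sum(a, b, n):
    """Sum of n thin vertical slice areas between g and f over [a, b]."""
    dx = (b - a) / n
    total = 0.0
    for i in range(n):
        xi = a + (i + 0.5) * dx           # sample point in the i-th increment
        total += (f(xi) - g(xi)) * dx     # (height) x (width)
    return total

# More slices should bring the sum closer to the integral 13/6.
for n in (4, 40, 400, 4000):
    print(n, slice_area_sum(0.0, 1.0, n))
```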

example: Next, consider the region between the same curves $y = f(x) = x^2+x+1$ and $y = g(x) = 2x^2-1$, but above the interval $x \in [1, 3]$. To picture the region without a calculator, we determine the intersection points where the curves cross:
$$f(x) = g(x) \iff x^2+x+1 = 2x^2-1 \iff x^2-x-2 = 0 \iff x = \frac{1 \pm \sqrt{1^2 - 4(1)(-2)}}{2(1)} = \frac{1 \pm 3}{2} = -1 \text{ or } 2$$
by the Quadratic Formula. Only x = 2 is relevant for our region above $x \in [1, 3]$. At x = 1 we have $g(1) < f(1)$, so to the left of x = 2, our region is defined by $g(x) \le y \le f(x)$. At x = 3, we have $f(3) < g(3)$, so to the right of x = 2, it is $f(x) \le y \le g(x)$:

Repeating our previous area formula for the two parts of our region gives:
$$A_{[1,3]} = A_{[1,2]} + A_{[2,3]} = \int_1^2 (f(x) - g(x))\,dx + \int_2^3 (g(x) - f(x))\,dx$$
$$= \int_1^2 (2+x-x^2)\,dx + \int_2^3 (-2-x+x^2)\,dx = \tfrac76 + \tfrac{11}{6} = 3.$$
example: Finally, we consider the same curves $y = f(x) = x^2+x+1$ and $y = g(x) = 2x^2-1$, but we take the entire finite region between them:
$$R' = \{\,(x, y) \text{ with } g(x) \le y \le f(x)\,\}.$$
Here the top boundary is $y = x^2+x+1$ and the bottom boundary is $y = 2x^2-1$, but we have not specified an x interval. However, we have already computed the intersection points $x = -1$ and $x = 2$, and the curves do not enclose any finite regions beyond these points. Thus:
$$A_{[-1,2]} = \int_{-1}^{2} (f(x) - g(x))\,dx = \tfrac92.$$

We can generalize the above examples in:

Theorem: The area of the region enclosed between f(x) and g(x) for $x \in [a, b]$ is:
$$A = \int_a^b |f(x) - g(x)|\,dx.$$
The absolute value signs ensure we take the integral of top minus bottom, regardless of which is which. In practice, we must find the intersection points where $f(x) = g(x)$, which split the integral into intervals where $g(x) \le f(x)$ versus $f(x) \le g(x)$.
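Numerically, the absolute value in the Theorem removes the need to locate the intersection points first. The sketch below applies a midpoint Riemann sum to $|f(x) - g(x)|$ for the two parabolas used in this section; the helper name `area_between` and the default n are illustrative assumptions.

```python
def f(x):
    return x**2 + x + 1

def g(x):
    return 2 * x**2 - 1

def area_between(f, g, a, b, n=200000):
    """Midpoint Riemann sum of |f(x) - g(x)| over [a, b]."""
    dx = (b - a) / n
    total = 0.0
    for i in range(n):
        x = a + (i + 0.5) * dx
        total += abs(f(x) - g(x)) * dx   # top minus bottom, whichever is which
    return total

print("area above [1, 3]: ", area_between(f, g, 1.0, 3.0))
print("area above [-1, 2]:", area_between(f, g, -1.0, 2.0))
```

For an exact answer by antiderivatives, the splitting at the crossing point x = 2 is still required, because $|f(x) - g(x)|$ has a different formula on each side.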

Integrating with respect to y. Consider the region:
$$R = \{\,(x, y) \text{ with } y^2 \le x \le y+1\,\}.$$
Here the boundary curves are naturally graphs in which y is the independent variable: the right boundary is the line $x = f(y) = y+1$; and the left boundary is $x = g(y) = y^2$, a parabola opening to the right.
Understand: it is merely by habit that we consider y as a function of x. We can make x a function of y instead if it is more convenient, and the same formulas will work if we switch the roles of x and y. Thus, we find the intersection points: $y+1 = y^2$ when $y = \frac{1 \pm \sqrt5}{2}$ by the Quadratic Formula. The area is:
$$A = \int_{\frac{1-\sqrt5}{2}}^{\frac{1+\sqrt5}{2}} (y+1) - (y^2)\,dy = \Big[\,\tfrac12 y^2 + y - \tfrac13 y^3\Big]_{y=\frac{1-\sqrt5}{2}}^{y=\frac{1+\sqrt5}{2}} = \tfrac56\sqrt5.$$
Here $((y+1) - (y^2))\,dy$ represents the area of the horizontal slice of the region at height y, with thickness dy.
To check this, we re-do it from our usual perspective, using x as the independent variable. This makes it more complicated, since we must consider the region as having three boundary graphs: upper boundary $y = \sqrt{x}$, lower right boundary $y = x-1$, and lower left boundary $y = -\sqrt{x}$. The intersection points are:

Between $y = \sqrt{x}$ and $y = x-1$: $x = \frac{3+\sqrt5}{2}$ (upper right corner)

Between $y = -\sqrt{x}$ and $y = x-1$: $x = \frac{3-\sqrt5}{2}$ (lower middle corner)

Between $y = \sqrt{x}$ and $y = -\sqrt{x}$: $x = 0$ (left end)
These split the region into left and right parts:

The area is:
$$A = \int_0^{\frac{3-\sqrt5}{2}} (\sqrt{x}) - (-\sqrt{x})\,dx + \int_{\frac{3-\sqrt5}{2}}^{\frac{3+\sqrt5}{2}} (\sqrt{x}) - (x-1)\,dx,$$
which after much algebra gives the same answer as before: $\tfrac56\sqrt5$.
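The "much algebra" can be bypassed with a numerical comparison of the two set-ups: horizontal slices (integrating in y) and vertical slices (integrating in x, in two pieces). This is a minimal sketch with midpoint Riemann sums; the variable names are illustrative.

```python
import math

def midpoint_sum(f, a, b, n=100000):
    """Midpoint Riemann sum of f over [a, b] with n equal increments."""
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx

s5 = math.sqrt(5)

# Horizontal slices: right boundary x = y + 1 minus left boundary x = y^2.
y_lo, y_hi = (1 - s5) / 2, (1 + s5) / 2
area_dy = midpoint_sum(lambda y: (y + 1) - y**2, y_lo, y_hi)

# Vertical slices: the left part lies between -sqrt(x) and sqrt(x),
# the right part lies between the line y = x - 1 and sqrt(x).
x_mid, x_hi = (3 - s5) / 2, (3 + s5) / 2
area_dx = (midpoint_sum(lambda x: 2 * math.sqrt(x), 0.0, x_mid)
           + midpoint_sum(lambda x: math.sqrt(x) - (x - 1), x_mid, x_hi))

print("integrating in y:", area_dy)
print("integrating in x:", area_dx)
print("(5/6) sqrt(5):   ", 5 * s5 / 6)
```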
