Math132 Notes
This section is a bird's-eye view of the course. Read it over now, then come back
to it as you learn the topics, to see how they fit into the whole theory.
Calculus is the mathematics of change and variation. With ordinary algebra,
we can translate static real-world problems into equations and solve them; with
calculus, we can solve dynamic problems involving motion, rates of change, optimum
values, irregular shapes, and the cumulative effect of a changing influence. It
was discovered by Newton and Leibnitz, and developed further notably by Euler.
The main concepts of calculus are derivatives and integrals applied to functions.
Like most mathematical concepts, these have four levels of meaning: physical (real-
world), geometric (pictures), numerical (spreadsheets), and algebraic (formulas).
Given a problem originating on one level (usually physical or geometric), we translate
to a different level (numerical or algebraic) where the problem can be solved,
then we translate the solution back to the original level.
t          0    1    2    3    4    5
s = f(t)   0   16   64  144  256  400
Of course, f (t) has a value at every t, not just the samples. We can imagine
the full function as an infinite table with an entry for every t in the domain.
Each section of these Notes corresponds to a section of James Stewart's Calculus, 7th ed.
Derivatives. Now we preview the main concepts of this course. Given a function
$f$, the derivative function $f'$ has the following meanings.
$$f'(a) = \lim_{h\to 0} \frac{f(a+h) - f(a)}{h}.$$
We will first determine some Basic Derivatives, such as $(x^n)' = nx^{n-1}$ and
$\sin'(x) = \cos(x)$, and combine them using the Sum, Difference, Constant
Multiple, Product, Quotient and Chain Rules.
For our example $f(t) = 16t^2$, we get $f'(t) = 32t$: the velocity is steadily
increasing, proportional to time. This gives the exact value $f'(3) = 96$.
Integrals. Given a function $g$, the integral from $x = a$ to $x = b$ is a number
denoted $\int_a^b g(x)\,dx$, and has the following meanings.
1. Physical. Suppose a quantity $z = f(x)$ changes as its controlling variable
goes from $x = a$ to $x = b$; and each incremental change $\Delta x$ leads to a small
change $\Delta z \approx g(x)\,\Delta x$, depending on $x$. Then the cumulative total change in $z$
from $x = a$ to $x = b$ is given by the integral of $g(x)$: $f(b) - f(a) = \int_a^b g(x)\,dx$.
In our example, suppose we know $v = g(t) = 32t$, the velocity of the stone
at time $t$, and we wish to deduce the fallen distance $s = f(t)$ for $t = 3$. Over
a time increment $\Delta t$, the stone moves by about $\Delta s \approx v\,\Delta t = 32t\,\Delta t$; so we
can express the cumulative change as: $f(3) = f(3) - f(0) = \int_0^3 32t\,dt$.
2. Geometric. For the graph $y = g(x) \ge 0$, the integral $\int_a^b g(x)\,dx$ is the area
under the graph and above the interval $a \le x \le b$ on the x-axis. This is
because the area $A$ is the cumulative total of thin slices $\Delta A \approx g(x)\,\Delta x$ with
height $y = g(x)$ and width $\Delta x$.
In our example, we can get the integral $\int_0^3 32t\,dt$ as the triangular area
under the graph $v = g(t) = 32t$ and above the interval $t \in [0, 3]$: half of
base 3 times height 96, which is 144.
3. Numerical. We approximate the cumulative change in $z$ from $x = a$ to
$x = b$ by splitting up the interval $a \le x \le b$ into a large number $n$ of small
increments of width $\Delta x = \frac{b-a}{n}$. We take sample points $x_1, \ldots, x_n$, one in
each increment, and compute the sum of $\Delta z \approx g(x_i)\,\Delta x$:
$$\int_a^b g(x)\,dx \approx g(x_1)\,\Delta x + g(x_2)\,\Delta x + \cdots + g(x_n)\,\Delta x.$$
This is the origin of the notation $\int_a^b g(x)\,dx$, where $\int$ is an elongated S
standing for sum, and $g(x)\,dx$ represents all the small changes $g(x_i)\,\Delta x$.
In our example, given the velocity function $v = g(t) = 32t$, we can take
$n = 3$, $\Delta t = 1$ sec, and sample points $t_1 = 1$, $t_2 = 2$, $t_3 = 3$. We approximate the
cumulative distance traveled $\int_0^3 32t\,dt$ as the sum over the 3 time increments
of (velocity at $t_i$) $\times$ (time elapsed) $= 32t_i\,\Delta t$:
$$\int_0^3 32t\,dt \approx 32(1)(1) + 32(2)(1) + 32(3)(1) = 192.$$
This is an overestimate because we sample the velocity at the end of each time
increment, when the stone is fastest. Taking more increments (larger $n$)
would give better and better approximations.
4. Algebraic. Since integrals go from a rate of change to a total change, they
are reverse derivatives (antiderivatives), and we can use our known derivative
rules backwards to find formulas for many (but not all) integrals. That is, if
$g(x) = f'(x)$ for a known formula $f(x)$, then $\int_a^b g(x)\,dx = f(b) - f(a)$. This
is known as the Fundamental Theorem of Calculus.
In our example, we know $f(t) = 16t^2$ has $f'(t) = 32t$, so we get the exact
value $\int_0^3 32t\,dt = 16(3^2) - 16(0^2) = 144$.
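The numerical and algebraic meanings can be compared with a short computation. The sketch below is only an illustration: the function names `right_riemann_sum` and `velocity` are our own, and the sum samples the velocity at the right end of each increment, as in the n = 3 example above.

```python
def right_riemann_sum(g, a, b, n):
    """Sum g(x_i) * dx over n equal increments, sampling at right endpoints."""
    dx = (b - a) / n
    return sum(g(a + i * dx) * dx for i in range(1, n + 1))


def velocity(t):
    # Velocity of the falling stone in ft/sec, from the running example.
    return 32 * t


# More increments bring the sum closer to the exact value 144.
for n in (3, 30, 300, 3000):
    print(n, right_riemann_sum(velocity, 0, 3, n))
```

With n = 3 this reproduces the overestimate 192; larger n should move the sum down toward the exact value 16(3^2) = 144 given by the Fundamental Theorem.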
Math 132 Tangent and Velocity Stewart 1.4
Falling stone example. A stone dropped off a bridge has position approximately
$f(t) = 16t^2$ feet below the bridge after falling for $t$ seconds. The
average velocity between $t = 3$ and $t = 4$ is:
$$v_{avg} = \frac{f(4) - f(3)}{4 - 3} = \frac{16(4^2) - 16(3^2)}{1} = 112.$$
That is, the stone has an average velocity of 112 ft/sec, although it starts
slower than this at t = 3 and speeds up steadily throughout the interval.
Now, what is the instantaneous velocity at t = 3? We compute the
average velocity over a short time interval from t = 3 to t = 3 + h, for
example h = 0.1:
$$v_{avg} = \frac{f(3.1) - f(3)}{3.1 - 3} = \frac{16(3.1^2) - 16(3^2)}{0.1} = 97.6.$$
This is a pretty good estimate of the velocity, but to be more precise we
take shorter intervals:
h        1     0.1    0.01    0.001   0.0001   0.00001
v_avg    112   97.6   96.16   96.02   96.002   96.0002
Velocity can be positive or negative, depending on the direction of motion. Speed is
the absolute value of velocity.
It is pretty clear that as the interval gets shorter and shorter, the average
velocity approaches the limiting value v = 96, and we define this to be the
instantaneous velocity.
Let us prove this algebraically: instead of trying sample values of the
time increment $h$, we let $h$ be a variable:
$$\frac{f(3+h) - f(3)}{h} = 16\,\frac{(3^2 + 2(3h) + h^2) - 3^2}{h} = 16\,\frac{6h + h^2}{h} = 16(6 + h) = 96 + 16h.$$
As we take h smaller and smaller, the error term 16h approaches zero, and
the average velocity approaches the limiting value 96, which by definition is
the instantaneous velocity:
$$v = \lim_{h\to 0} \frac{f(3+h) - f(3)}{h} = 96.$$
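The table of average velocities can be regenerated directly from the difference quotient. This is a minimal sketch with helper names of our own choosing (`f`, `average_velocity`); the printed values are unrounded, so they may differ from the table in the last digits.

```python
def f(t):
    # Position of the stone in feet below the bridge after t seconds.
    return 16 * t**2


def average_velocity(a, h):
    """Average velocity over the time interval from a to a + h."""
    return (f(a + h) - f(a)) / h


# Shorter and shorter intervals starting at t = 3.
for h in (1, 0.1, 0.01, 0.001, 0.0001, 0.00001):
    print(h, average_velocity(3, h))
```

Each value should differ from 96 by about 16h, matching the algebraic error term above.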
Tangent Slope. We have described velocity on three conceptual levels: as a
physical quantity, a numerical approximation, and an algebraic computation.
Velocity also has a geometric meaning in terms of the graph y = f (t).
Consider a secant line which cuts the graph at points (a, f (a)) and (b, f (b)).
The slope $m_{sec}$ of the secant line is the rise in the graph per unit of
horizontal run, which means distance traversed divided by time elapsed,
which is the average velocity:
$$m_{sec} = \frac{f(b) - f(a)}{b - a} = v_{avg}.$$
The reason for this coincidence is that slope is the rate of vertical rise with
respect to horizontal run, just as velocity is the rate of change of position
(drawn on the vertical axis) with respect to time (on the horizontal axis).
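As we move the point $(b, f(b))$ to $(a+h, f(a+h))$, closer and closer to $a$,
the secant lines approach the tangent line which touches the curve at the
single point $(a, f(a))$. The tangent slope is the limit of the secant slopes,
so it is equal to the instantaneous velocity:
$$m = \lim_{h\to 0} \frac{f(a+h) - f(a)}{h} = v.$$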
For any graph $y = f(x)$, not only the graph of position with respect to
time, the tangent problem is to find the tangent line passing through
$(a, f(a))$. The slope $m$ is given by the above formula. The point-slope
equation of the tangent line is thus: $y = f(a) + m(x - a)$. For example, the
tangent line of our graph $y = 16x^2$ at the point $(3, 144)$ is: $y = 144 + 96(x - 3)$.
Math 132 Limits Stewart 1.5
Definition of limits. The key technical tool in the previous section was
the idea of a limiting value approached by approximations. We need limits
for all the definitions of calculus, so we must understand them clearly.
That is, $f(x)$ approximates $L$ to within any desired error tolerance, for all
values of $x$ within some small distance from $a$ (but $x \ne a$). One more way
to say it: if we make a table of $f(x)$ for any sample values of $x$ getting
closer and closer to $a$ (such as $x = a + 0.1$, $a + 0.01$, etc.), then the values
of $f(x)$ will get as close as we like to $L$ (though they might never reach $L$).
Graphically:
$$\lim_{x\to 5} x^2 = 5^2 = 25,$$
Near $x = 0$, the function cannot be forced close to any single output value.
That is, $\lim_{x\to 0} \mathrm{sgn}(x) \ne 1$, since no matter how close $x$ gets to 0, there
are some $x$ (namely negative) for which $\mathrm{sgn}(x)$ is far from 1; and similarly
$\lim_{x\to 0} \mathrm{sgn}(x)$ is not $-1$, nor 0, nor any other value. In particular, it is false
that $\lim_{x\to 0} \mathrm{sgn}(x) = \mathrm{sgn}(0)$, and the function is not continuous at $x = 0$.
An important feature of $\lim_{x\to a} f(x)$ is that it does not depend on $f(a)$,
even if $f(a)$ is undefined: the limit only notices values of $f(x)$ for $x \ne a$.
For example, define $g(x) = 0$ for $x \ne 1$, and $g(1) = \frac{1}{2}$, having the graph:
Then $\lim_{x\to 1} g(x) = 0$, since if $x$ is close enough to (but unequal to) 1, then
$g(x)$ is arbitrarily close to $L = 0$ (in fact $g(x) = L$). Again, $\lim_{x\to 1} g(x) \ne
g(1) = \frac{1}{2}$, and $g(x)$ is not continuous at $x = 1$.
The important limits in calculus, such as instantaneous velocity, are
cases where the function is not defined at $x = a$. For example, consider
$\lim_{x\to 1} \frac{x^2 - 1}{x - 1}$. Plugging in $x = 1$ gives the meaningless expression $\frac{0}{0}$, so this
function is not continuous, but the limit still exists. Indeed, plotting points
gives the graph:
It seems the limit is $L = 2$: the graph approaches $(1, 2)$, so if $x$ is sufficiently
close to (but not equal to) 1, then $f(x)$ is forced as close as desired to 2. We
can prove this algebraically:
$$\lim_{x\to 1} \frac{x^2 - 1}{x - 1} = \lim_{x\to 1} \frac{(x-1)(x+1)}{x - 1} = \lim_{x\to 1} x + 1 = 1 + 1 = 2,$$
1 1
lim = lim =
x0 x x0+ x
Math 132 Limit Laws Stewart 1.6
These all have the form: The limit of an operation equals the operation
applied to the limits. These Laws are also valid for one-sided limits.
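Limits by plugging in. Assuming the Limit Laws and the Basic Limits
$\lim_{x\to a} x = a$ and $\lim_{x\to a} c = c$, we can prove that most functions are
continuous, meaning the $\lim_{x\to a} f(x)$ is obtained by substituting $x = a$ to
get $f(a)$. For example, we can formally compute the limit:
$$\begin{aligned}
\lim_{x\to 2} \frac{1 - \sqrt{x}}{1 + x}
&= \frac{\lim_{x\to 2} (1 - \sqrt{x})}{\lim_{x\to 2} (1 + x)} && \text{by the Quotient Law} \\
&= \frac{\lim_{x\to 2} 1 - \lim_{x\to 2} \sqrt{x}}{\lim_{x\to 2} 1 + \lim_{x\to 2} x} && \text{by the Sum and Difference Laws} \\
&= \frac{\lim_{x\to 2} 1 - \sqrt{\lim_{x\to 2} x}}{\lim_{x\to 2} 1 + \lim_{x\to 2} x} && \text{by the Root Law} \\
&= \frac{1 - \sqrt{2}}{1 + 2} = \frac{1 - \sqrt{2}}{3} && \text{by the Basic Limits.}
\end{aligned}$$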
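$$\lim_{x\to 2} \frac{x^2 - 4x + 4}{x^2 - x - 2} = \lim_{x\to 2} \frac{(x-2)^2}{(x-2)(x+1)} = \lim_{x\to 2} \frac{x-2}{x+1},$$
which can be evaluated by substituting $x = 2$. Another trick to avoid $\frac{0}{0}$ is
to multiply by a conjugate radical:
$$\begin{aligned}
\lim_{x\to 9} \frac{x-9}{\sqrt{x}-3}
&= \lim_{x\to 9} \frac{x-9}{\sqrt{x}-3} \cdot \frac{\sqrt{x}+3}{\sqrt{x}+3}
 = \lim_{x\to 9} \frac{(x-9)(\sqrt{x}+3)}{(\sqrt{x})^2 - 3^2} \\
&= \lim_{x\to 9} \frac{(x-9)(\sqrt{x}+3)}{x-9} = \lim_{x\to 9} \sqrt{x} + 3 = 6.
\end{aligned}$$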
All the oscillations of $\sin(x)$ on the real line are here crammed into the
interval $-1 \le x \le 1$, and our function $\sin(\frac{\pi}{x})$ cannot be forced close to any
given value, no matter how close $x$ is to 0. That is, this limit does not exist.
However, consider the limit:
$$\lim_{x\to 0} x \sin\left(\frac{\pi}{x}\right).$$
The Product Law would give $\lim_{x\to 0} x \cdot \lim_{x\to 0} \sin(\frac{\pi}{x})$, but the second limit
does not exist, so the Law does not apply. The graph looks like:
The function is bounded between the graphs $y = -|x|$ and $y = |x|$, so its
limit is squeezed to zero. This reasoning is formalized by the:
Squeeze Theorem: Suppose $f(x) \le g(x) \le h(x)$ for all $x$ near $a$
(except possibly $x = a$), and $\lim_{x\to a} f(x) = \lim_{x\to a} h(x) = L$.
Then $\lim_{x\to a} g(x) = L$.
To evaluate our limit, we note that $-1 \le \sin(\frac{\pi}{x}) \le 1$, so $-x \le x \sin(\frac{\pi}{x}) \le x$
for $x > 0$, and similarly for $x < 0$. Hence:
$$-|x| \le x \sin\left(\frac{\pi}{x}\right) \le |x| \quad \text{for all } x \ne 0,$$
and we know by considering cases that $\lim_{x\to 0} -|x| = \lim_{x\to 0} |x| = 0$, so the
Theorem applies to give:
$$\lim_{x\to 0} x \sin\left(\frac{\pi}{x}\right) = 0.$$
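The squeeze can also be observed numerically. The sketch below assumes the function in question is x sin(pi/x); the argument of the sine is partly illegible in this copy of the Notes, so treat that choice, and the name `g`, as assumptions. The bounds hold for any argument of the sine, so the check is not sensitive to it.

```python
import math


def g(x):
    # Assumed form of the squeezed function: x * sin(pi / x), for x != 0.
    return x * math.sin(math.pi / x)


# Sample points approaching 0 from the right; the lower and upper
# bounds -|x| and |x| both shrink to 0 and trap g(x) between them.
for k in range(1, 7):
    x = 0.3 ** k
    print(x, -abs(x), g(x), abs(x))
```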
Math 132 Limit Definition Stewart 1.7
In the graph $y = f(x)$, we take the small red piece between the vertical lines
$a - \delta < x < a + \delta$ (not including $x = a$). By setting $\delta$ small enough, we try
to force this piece between the fixed horizontal lines $L - \varepsilon < y < L + \varepsilon$, for
the specified output error $\varepsilon$.
Rewriting $a - \delta < x < a + \delta$ as $|x - a| < \delta$, and $L - \varepsilon < f(x) < L + \varepsilon$ as
$|f(x) - L| < \varepsilon$, we get the formal definition of a limit:
Definition: $\lim_{x\to a} f(x) = L$ means that for any output error
tolerance $\varepsilon > 0$, there is an input accuracy $\delta > 0$ such that
$0 < |x - a| < \delta$ forces $|f(x) - L| < \varepsilon$.
Here $\delta$ (delta) is a Greek letter d, standing for difference, and $\varepsilon$ (epsilon) is Greek e,
standing for error.
13 < x 5 < 13 .
which is our desired conclusion $|f(x) - L| < \varepsilon$. (Here $\Longrightarrow$ means logically implies.)
choose to be the smaller of d1 and d2 . Thus we take = 2 3 2 . Then
< x3 is equivalent to the desired lower bound 2 3 + 2 < x3; and
also x3 < implies the desired upper bound, since:
= 2 3 2 < 2 3 + 2 .
Note: In evaluating limits, we almost always rely on the Limit Laws and
other general theorems, without a specific error analysis. The general results
guarantee that the error approaches zero, and this is all we need.
Proof of Limit Theorems. All the general Limit Laws of 1.6 can be
rigorously proved by error-control analysis. We prove the simplest one:
Proof. Consider any $\varepsilon > 0$. Since we assume $\lim_{x\to a} f(x) = L$ and $\lim_{x\to a} g(x) =
M$, we can require the error tolerance $\frac{1}{2}\varepsilon$ for these limits, getting $\delta > 0$ small
enough that $0 < |x - a| < \delta$ forces:
Squeeze Theorem: If $f(x) < g(x) < h(x)$ for all values of $x$ near
$a$ (except perhaps $x = a$), and $\lim_{x\to a} f(x) = \lim_{x\to a} h(x) = L$,
then $\lim_{x\to a} g(x) = L$.
Proof. Consider any $\varepsilon > 0$. Since we assume $\lim_{x\to a} f(x) = L$ and $\lim_{x\to a} h(x) =
L$, we can find a $\delta > 0$ such that $0 < |x - a| < \delta$ forces $-\varepsilon < f(x) - L < \varepsilon$ and
$-\varepsilon < h(x) - L < \varepsilon$. We also know $f(x) < g(x) < h(x)$ provided $|x - a| < \delta$
restricts $x$ close enough to $a$, so:
One of the most basic features of a function is whether it is continuous. Roughly, this means
that a small change in x always leads to a fairly small change in f (x), without instantaneous
jumps. In real-world terms, the position of a particle moving in space is continuous, but the
position displayed in a video could have a gap, making the function jump discontinuously.
This can be made precise by saying that near x = a, the limit of f (x) is f (a):
Definition: A function $f(x)$ is continuous at $x = a$ whenever $\lim_{x\to a} f(x) = f(a)$.
Graphically, a function is continuous whenever the graph y = f (x) proceeds through the
point (a, f (a)) without jumps or holes.
ii. Removable discontinuity: $f(a)$ is the wrong value, not $\lim_{x\to a} f(x)$.
iii. Jump discontinuity: the left and right limits are unequal, $\lim_{x\to a^+} f(x) \ne \lim_{x\to a^-} f(x)$.
Domain of continuity. Almost all functions defined by formulas are continuous, except
at points where they are undefined. This follows from our methods for computing limits.
example: Find the points where the following function is continuous:
$$g(x) = \frac{(x^2 - 3x + 1)\sqrt{x+1}}{x - 3}.$$
First, we consider the factors outside the square root, repeatedly applying the Limit Laws
from 1.6:
$a = -1$, where $g(x)$ is continuous, since at the left endpoint of the domain of definition,
we only require the one-sided limit $\lim_{x\to a^+} g(x) = g(a)$;
$a = 3$, where the function clearly has a vertical asymptote, discontinuity of type (iv).
In summary, our $g(x)$ is continuous at every point where it is defined, that is, in the
intervals $[-1, 3) \cup (3, \infty)$. The graph looks like:
The half-open interval notation $[a, b)$ means the set of all numbers $x$ between $a$ and $b$, including the left
endpoint $x = a$ but excluding the right endpoint $x = b$; that is, $a \le x < b$. The infinite interval $(a, \infty)$
means all $x > a$, with $\infty$ indicating no upper bound on the right.
Composing continuous functions. Another way to combine functions f (x) and g(x)
is to compose or chain them, taking the output of g as the input of f to obtain the new
function f (g(x)). Composition also preserves continuity: if g(x) is continuous at x = a,
and f (x) is continuous at x = g(a), then f (g(x)) is continuous at x = a. This follows from
the following theorem:
Composition Law: We have:
If $f(x)$ is continuous for $x$ in the interval $[a, b]$, and $r$ is between $f(a)$ and $f(b)$,
either $f(a) \le r \le f(b)$ or $f(a) \ge r \ge f(b)$, then there is a value $c \in [a, b]$ such
that $f(c) = r$.
This says that as the function value f (x) goes continuously from f (a) to f (b), perhaps
rising and falling many times, it must pass through every value r between f (a) and f (b).
Note that this is not necessarily true for a discontinuous function like $g(x)$ above: we
have $g(2) \approx 1.7$, $g(4) \approx 11.2$, and $g(2) < 7 < g(4)$, but there is a vertical asymptote
discontinuity in the interval $[2, 4]$, and there is no $c \in [2, 4]$ with $g(c) = 7$.
However, $g(x)$ is continuous over the interval $[0, 1]$, with $g(0) \approx -0.33$, $g(1) \approx 0.71$, and
$g(0) < 0 < g(1)$, so the Theorem says there must be some $c \in [0, 1]$ with $g(c) = 0$. This is
just the x-intercept visible in the graph.
example: Show that there exists a solution $x = c$ to the equation $\cos(x) = x$. We have no
easy way of solving this equation, but writing $f(x) = \cos(x) - x$, we know that $f(0) = 1$,
$f(\pi) = -1 - \pi$, and $f(0) > 0 > f(\pi)$, so by the Theorem there is some $c \in [0, \pi]$ with
$f(c) = 0$, meaning $\cos(c) = c$.
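The Theorem only asserts that such a c exists, but the same sign-change idea can be used to locate it approximately by repeated halving of the interval (the bisection method, which is not part of these Notes). The sketch below assumes the interval [0, pi] for the example; `bisect_root` is a name of our own.

```python
import math


def bisect_root(f, a, b, tol=1e-10):
    """Locate a root of a continuous f on [a, b], given a sign change."""
    fa, fb = f(a), f(b)
    if fa * fb > 0:
        raise ValueError("f(a) and f(b) must have opposite signs")
    while b - a > tol:
        c = (a + b) / 2
        fc = f(c)
        if fc == 0:
            return c
        # Keep the half-interval on which the sign change persists.
        if fa * fc < 0:
            b, fb = c, fc
        else:
            a, fa = c, fc
    return (a + b) / 2


c = bisect_root(lambda x: math.cos(x) - x, 0.0, math.pi)
print(c, math.cos(c))
```

Each halving keeps an interval on which f changes sign, so the Theorem guarantees a root inside it at every step.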
Math 132 Derivatives Stewart 2.1
Here $f(a+h) - f(a)$ is the change in $f(x)$ from $x = a$ to $x = a+h$, and $h$ is the increment
of $x$. The average rate of change over the interval $[a, a+h]$ is the difference quotient
$\frac{f(a+h) - f(a)}{h}$, and the instantaneous rate of change at $x = a$ is the limit over smaller and
smaller increments, $h \to 0$.
Another way to write this is to substitute $x$ for the endpoint of the interval, $a + h = x$,
approaching $a$ with increment $h = x - a$:
$$f'(a) = \lim_{x\to a} \frac{f(x) - f(a)}{x - a}.$$
In graphical terms, the derivative $f'(a)$ is the slope of the tangent line which touches
the graph $y = f(x)$ at the point $(a, f(a))$, and the equation of the tangent line is $y =
f'(a)(x - a) + f(a)$.
When the limit $f'(a)$ exists, we say $f(x)$ is differentiable at $x = a$. When the limit does
not exist, $f'(a)$ is undefined, and we say $f(x)$ is non-differentiable or singular at $x = a$.
In this case, the function $f(x)$ does not have a well-defined rate of change at $x = a$, and
the graph $y = f(x)$ does not have a single tangent line at $(a, f(a))$. (See Left and Right
Derivatives below.)
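$$\begin{aligned}
f'(2) &= \lim_{x\to 2} \frac{f(x) - f(2)}{x - 2}
 = \lim_{x\to 2} \frac{\frac{1}{x+1} - \frac{1}{2+1}}{x - 2}
 = \lim_{x\to 2} \frac{\frac{1}{x+1} - \frac{1}{2+1}}{x - 2} \cdot \frac{3(x+1)}{3(x+1)} \\
&= \lim_{x\to 2} \frac{2 - x}{3(x+1)(x-2)} = \lim_{x\to 2} \frac{-1}{3(x+1)} = \frac{-1}{3(2+1)} = -\frac{1}{9},
\end{aligned}$$
where we cancel the vanishing factors $\frac{2-x}{x-2} = \frac{-(x-2)}{x-2} = -1$. Graphically, this looks like: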
Left and right derivatives. Let us find $f'(1)$ for the function defined by:
$$f(x) = \begin{cases} 2x - 1 & \text{for } x \le 1 \\ x^2 & \text{for } x \ge 1. \end{cases}$$
Since the function is defined differently on the two sides of $x = 1$, we must compute
one-sided derivative limits, to see if the two-sided limit exists.
The graph clearly has a transition at x = 1, but it is continuous and has well-defined slope.
On the other hand, if we take:
$$g(x) = \begin{cases} 1 & \text{for } x \le 1 \\ x^2 & \text{for } x \ge 1 \end{cases}$$
That is, the derivative $g'(1)$ does not exist, and $g(x)$ is non-differentiable at $x = 1$. This
function could model the distance fallen by an object held still, then thrown down with
speed 2 at time t = 1. Before dropping, the speed is 0; an instant after, the speed is 2;
and there is no well-defined speed at the moment of throwing. (A more detailed analysis
would take into account the gradual acceleration during the throw, which would round off
the corner of the graph.)
Real-world derivatives. In Notes 1.4, we computed instantaneous velocity as the derivative
of the position function $f(t)$ with respect to time $t$. For any function which models the
dependence between two real-world variables, the derivative gives the rate of change of the
dependent variable with respect to the independent variable.
example: A rough model of atmospheric pressure $P$ at height $s$ is given by the function:
$P = f(s) = 15c^s$, where $P$ is in pounds per square inch (psi), $s$ is feet above sea level, and
the constant $c = 0.99996$. How quickly does the pressure drop at sea-level and at 10,000
feet up?
At sea level $s = 0$ ft, the pressure is $f(0) = 15$ psi (about half the pressure of a car tire),
and the rate of change (psi of pressure change per foot of height upward) is the derivative:
$$f'(0) = \lim_{h\to 0} \frac{15c^{0+h} - 15c^0}{h} = \lim_{h\to 0} 15\,\frac{c^h - 1}{h}.$$
In this case, we have no algebraic trick to cancel vanishing factors, so we must be content
with a numerical approximation of the difference quotient. (Since P = f (s) is only an
approximate model anyway, we lose nothing from this further approximation.)
h                 100        10         1          0.1
15(c^h - 1)/h     -0.00059   -0.00059   -0.00060   -0.00060
Thus, $f'(0) \approx -0.0006$ psi/ft. This is a negative rate of change because a rise in height gives
a drop in pressure. Thus, for each foot upward, the pressure decreases by approximately
0.0006 psi, so a 1000 ft rise would drop the pressure by about 0.6 psi.
Now at $s = 10{,}000$ ft, the pressure is about $f(10{,}000) \approx 10.1$ psi. Let us write $a =
10{,}000$, and compute the rate of change as:
$$f'(a) = \lim_{h\to 0} \frac{15c^{a+h} - 15c^a}{h} = c^a \lim_{h\to 0} 15\,\frac{c^h - 1}{h}.$$
Now, $c^a = (0.99996)^{10{,}000} \approx 0.67$ and the second factor is the limit we approximated before.
Thus, $f'(a) \approx (0.67)(-0.0006) \approx -0.0004$ psi/ft. That is, at an altitude of 10,000 ft, every
1000 ft rise decreases the pressure by about 0.4 psi.
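The difference quotients for the pressure model are easy to tabulate by machine. The following is a rough sketch using the model and constant from the example; the names `C`, `pressure` and `difference_quotient` are ours, and the printed values are unrounded, so they need not match the rounded table digit for digit.

```python
C = 0.99996  # constant c from the pressure model in the text


def pressure(s):
    """Modeled pressure in psi at s feet above sea level."""
    return 15 * C**s


def difference_quotient(f, a, h):
    """Average rate of change of f over the interval [a, a + h]."""
    return (f(a + h) - f(a)) / h


# Rates of change (psi per foot) at sea level and at 10,000 ft.
for a in (0, 10_000):
    for h in (100, 10, 1, 0.1):
        print(a, h, difference_quotient(pressure, a, h))
```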
Math 132 Derivative Function Stewart 2.2
In Notes 2.1, we defined the derivative of a function $f(x)$ at $x = a$, namely the number
$f'(a)$. Since this gives an output $f'(a)$ for any input $a$, the derivative defines a function.
Definition: For a function $f(x)$, we define the derivative function $f'(x)$ by:
$$f'(x) = \lim_{h\to 0} \frac{f(x+h) - f(x)}{h}.$$
This just repeats the definitions in Notes 2.1, except that we think of the derivative as a
function of the variable $x$, rather than as a numerical value at a particular point $x = a$.
The choice of letters is meant to suggest different kinds of variables, but they do not have
any strict logical meaning: for example, $f(x) = x^2$, $f(a) = a^2$, and $f(t) = t^2$ all define the
same function, and $\lim_{x\to a} f(x) = \lim_{t\to a} f(t) = \lim_{z\to a} f(z)$ are all the same limit.
We can sketch the derivative graph $y = f'(x)$ in red, purely from the original graph $y = f(x)$,
without any computation. The slope of the original graph above a given x-value is the height
of the derivative graph above that x-value.
At the minimum $x = 1$, the original graph $y = f(x)$ is horizontal and its slope is zero,
so $f'(1) = 0$, and we plot the point $(1, 0)$ on the derivative graph $y = f'(x)$. To the right
of this point, $y = f(x)$ has positive slope, getting steeper and steeper; so $y = f'(x) > 0$ is
above the x-axis, getting higher and higher. Above $x = 2$, the tangent of $y = f(x)$ has slope
approximately 2 (considering the relative x and y scales), so we plot $(2, 2)$ on $y = f'(x)$.
As we move left from $x = 1$, the graph $y = f(x)$ has negative slope, getting steeper and
steeper, so $y = f'(x) < 0$ is below the x-axis, getting lower and lower. Above $x = 0$, we
estimate $y = f(x)$ to have slope $-2$, and we plot $(0, -2)$ on $y = f'(x)$. Thus, $y = f'(x)$
looks like the red line in the above picture.
Next we differentiate algebraically. For any value of x:
That is, $f'(x) = 2x - 2$, which agrees with our sketch of the derivative graph.
$$= \lim_{h\to 0} \frac{3x^2h + 3xh^2 + h^3 - h}{h} = \lim_{h\to 0} 3x^2 + 3xh + h^2 - 1 = 3x^2 - 1.$$
example: Let $f(x) = \sqrt[3]{x}$, the cube root function, with graph in blue:
The slopes of the original graph $y = f(x)$ are all positive, with the same slope above a given
$x$ and its reflection $-x$. Thus the derivative graph $y = f'(x) > 0$ lies above the x-axis, and
it is symmetric across the y-axis (an even function). The slope of $y = f(x)$ gets smaller for
large positive or negative $x$, and it gets steeper and steeper near the origin, with a vertical
tangent at $x = 0$. Thus $y = f'(x)$ approaches the x-axis for large $x$, and shoots up the
y-axis on both sides of $x = 0$, with $f'(0)$ undefined.
Algebraically, we have: $f'(x) = \lim_{h\to 0} \frac{\sqrt[3]{x+h} - \sqrt[3]{x}}{h}$. We must liberate $x+h$ from under
the $\sqrt[3]{\ }$, so as to be able to cancel $\frac{h}{h}$. In Notes 2.1, we multiplied top and bottom by the
conjugate radical, exploiting the identity $(a - b)(a + b) = a^2 - b^2$. Here we have cube roots,
so we use the identity: $(a - b)(a^2 + ab + b^2) = a^3 - b^3$, taking $a = \sqrt[3]{x+h}$ and $b = \sqrt[3]{x}$:
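$$\begin{aligned}
f'(x) &= \lim_{h\to 0} \frac{\sqrt[3]{x+h} - \sqrt[3]{x}}{h} \cdot
 \frac{(\sqrt[3]{x+h})^2 + \sqrt[3]{x+h}\,\sqrt[3]{x} + (\sqrt[3]{x})^2}{(\sqrt[3]{x+h})^2 + \sqrt[3]{x+h}\,\sqrt[3]{x} + (\sqrt[3]{x})^2} \\
&= \lim_{h\to 0} \frac{(\sqrt[3]{x+h})^3 - (\sqrt[3]{x})^3}{h\left((\sqrt[3]{x+h})^2 + \sqrt[3]{x+h}\,\sqrt[3]{x} + (\sqrt[3]{x})^2\right)}
 = \lim_{h\to 0} \frac{x + h - x}{h\left((\sqrt[3]{x+h})^2 + \sqrt[3]{x+h}\,\sqrt[3]{x} + (\sqrt[3]{x})^2\right)} \\
&= \lim_{h\to 0} \frac{1}{(\sqrt[3]{x+h})^2 + \sqrt[3]{x+h}\,\sqrt[3]{x} + (\sqrt[3]{x})^2}
 = \frac{1}{(\sqrt[3]{x+0})^2 + \sqrt[3]{x+0}\,\sqrt[3]{x} + (\sqrt[3]{x})^2}
 = \frac{1}{3(\sqrt[3]{x})^2}.
\end{aligned}$$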
In Notes 2.3, we will develop standard rules for computing derivatives, which let us
avoid such complicated limit calculations.
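Before trusting a formula obtained from a long limit calculation, it can be reassuring to compare it with a difference quotient at a few points. The sketch below does this for the cube root; the helper names and the step size h = 1e-6 are our own choices, not part of the Notes.

```python
def cbrt(x):
    """Real cube root, valid for negative x as well."""
    return x ** (1 / 3) if x >= 0 else -((-x) ** (1 / 3))


def derivative_formula(x):
    # The formula derived above: 1 / (3 * (cube root of x) squared).
    return 1 / (3 * cbrt(x) ** 2)


def difference_quotient(x, h=1e-6):
    return (cbrt(x + h) - cbrt(x)) / h


# The two columns should agree closely away from x = 0.
for x in (-8, -1, 0.5, 1, 8):
    print(x, difference_quotient(x), derivative_formula(x))
```

The sample points avoid x = 0, where the derivative is undefined and the formula divides by zero.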
Thus $0 = \lim_{h\to 0} [f(a+h) - f(a)] = [\lim_{h\to 0} f(a+h)] - f(a)$, and $\lim_{h\to 0} f(a+h) = f(a)$, showing that $f(x)$
is continuous at $x = a$.
Math 132 Differentiation Formulas Stewart 2.3
So far, we have seen how various real-world problems (rate of change) and geometric problems
(tangent lines) lead to derivatives. In this section, we will see how to solve such
problems by computing derivatives (differentiating) algebraically.
Notations. We have seen the Newton notation $f'(x)$ for the derivative of $f(x)$. The
alternative Leibnitz notation for the derivative is $\frac{df}{dx}$, meant to remind us of the definition of
$f'(x)$ as the limit of difference quotients:
$$f'(x) = \frac{df}{dx} = \lim_{\Delta x\to 0} \frac{\Delta f}{\Delta x}.$$
Here $\Delta f = f(x+h) - f(x)$, the difference in $f(x)$ corresponding to the difference $\Delta x =
(x+h) - x = h$. Also, $df$ and $dx$ are meant to suggest very small $\Delta f$ and $\Delta x$, but $\frac{df}{dx}$ is not
literally the quotient of two small quantities, just a complicated symbol meaning the limit
of such quotients.
To illustrate: for $f(x) = x^2$, the formula $f'(x) = (x^2)' = 2x$ can be written in Leibnitz
notation as:
$$\frac{df}{dx} = \frac{d}{dx}(x^2) = 2x.$$
The symbol $\frac{df}{dx}$ means the function $f'(x)$; for a particular value of a derivative at $x = a$, we
write $f'(a) = \left.\frac{df}{dx}\right|_{x=a}$. The notation $f' = Df$ is also used, and $f'(x) = Df(x)$.
Basic Derivatives. To compute derivatives without a limit analysis each time, we use
the same strategy as for limits in Notes 1.6: we establish the derivatives of some basic
functions, then we show how to compute the derivatives of sums, products, and quotients
of known functions.
Theorem: (i) For a constant function $f(x) = c$, we have $\frac{d}{dx}(c) = (c)' = 0$.
(ii) For $f(x) = x$, we have $\frac{d}{dx}(x) = (x)' = 1$.
(iii) For $f(x) = x^n$ with $n$ a positive integer (a whole number), we have:
$\frac{d}{dx}(x^n) = (x^n)' = nx^{n-1}$.
Proof: (i) and (ii) follow easily from the definition of $f'(x)$. To prove (iii), we use the
identity: $a^n - b^n = (a-b)(a^{n-1} + a^{n-2}b + a^{n-3}b^2 + \cdots + b^{n-1})$, with $a = x+h$ and $b = x$:
We obtain the correct formula from a geometric model: consider a rectangle with changing
sides of lengths f (x) and g(x) depending on some variable x, the upper left rectangle below:
The product $f(x)g(x)$ is the area, and the derivative $(f(x)g(x))'$ is the rate of change of
area with respect to a change in $x$. A small increment $\Delta x = h$ leads to some increments
$\Delta f = f(x+h) - f(x)$ and $\Delta g = g(x+h) - g(x)$ in the sides, and the increment of area,
$\Delta(fg) = f(x+h)g(x+h) - f(x)g(x)$, is equal to the area of the three edge rectangles:
Note that the third term, which goes to zero, corresponds to the tiny bottom right rectangle.
Lastly, we prove the Quotient Rule:
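$$\begin{aligned}
\left(\frac{f(x)}{g(x)}\right)' &= \lim_{h\to 0} \frac{\frac{f(x+h)}{g(x+h)} - \frac{f(x)}{g(x)}}{h}
 = \lim_{h\to 0} \frac{f(x+h)g(x) - f(x)g(x+h)}{h\,g(x+h)\,g(x)} \\
&= \lim_{h\to 0} \frac{f(x+h)g(x) - f(x)g(x) + f(x)g(x) - f(x)g(x+h)}{h\,g(x+h)\,g(x)}
\end{aligned}$$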
Here, after putting the expression over a common denominator, we have added and sub-
tracted the quantity f (x)g(x) in the numerator, leaving the limit unchanged. Our aim is
to factor the first pair and last pair of terms:
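$$\begin{aligned}
\left(\frac{f(x)}{g(x)}\right)' &= \lim_{h\to 0} \frac{(f(x+h) - f(x))\,g(x) + f(x)\,(g(x) - g(x+h))}{h\,g(x+h)\,g(x)} \\
&= \lim_{h\to 0} \frac{1}{g(x+h)\,g(x)} \left( \frac{f(x+h) - f(x)}{h}\,g(x) - f(x)\,\frac{g(x+h) - g(x)}{h} \right) \\
&= \frac{1}{g(x+0)\,g(x)} \left( f'(x)g(x) - f(x)g'(x) \right) = \frac{f'(x)g(x) - f(x)g'(x)}{g(x)^2}.
\end{aligned}$$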
We have again used several Limit Laws from Notes 1.6. We could give another proof of
the Product Rule in a very similar way.
Derivative computations. By repeatedly using these Rules, we can quickly compute the
derivatives of most functions.
example: Find $(\sqrt{x})' = \frac{d}{dx}(\sqrt{x})$. Solution: $(\sqrt{x})' = (x^{1/2})' = \frac{1}{2}x^{(1/2)-1} = \frac{1}{2}x^{-1/2} = \frac{1}{2\sqrt{x}}$,
where we used the Basic Derivative $(x^b)' = bx^{b-1}$ with $b = \frac{1}{2}$.
example: $(\sqrt{10})' = 0$ since the derivative of any constant, even a complicated one, is zero.
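example: For $f(x) = (5x^2 + 1)(\sqrt{x} - 3)$, find the derivative $f'(x) = \frac{df}{dx}$:
$$\begin{aligned}
\left((5x^2+1)(\sqrt{x}-3)\right)' &= (5x^2+1)'(\sqrt{x}-3) + (5x^2+1)(\sqrt{x}-3)' && \text{by Product Rule} \\
&= \left(5(x^2)' + (1)'\right)(\sqrt{x}-3) + (5x^2+1)\left((\sqrt{x})' - (3)'\right) && \text{by Sum \& Const Mult Rules} \\
&= \left(5(2x^1) + (0)\right)(\sqrt{x}-3) + (5x^2+1)\left(\tfrac{1}{2}x^{-1/2} - (0)\right) && \text{by Basic Derivatives} \\
&= 10x(\sqrt{x}-3) + (5x^2+1)\frac{1}{2\sqrt{x}} && \text{Tidying up}
\end{aligned}$$
Note how we used the derivative from the previous example, $(\sqrt{x})' = \frac{1}{2}x^{-1/2}$.
Another way to find the same derivative would be to multiply out first:
$$f(x) = (5x^2+1)(\sqrt{x}-3) = 5x^2\sqrt{x} - 15x^2 + \sqrt{x} - 3 = 5x^{5/2} - 15x^2 + x^{1/2} - 3.$$
Then we get the derivative:
$$f'(x) = 5\left(\tfrac{5}{2}x^{(5/2)-1}\right) - 15(2x^1) + \tfrac{1}{2}x^{(1/2)-1} - 0 = \tfrac{25}{2}x\sqrt{x} - 30x + \frac{1}{2\sqrt{x}}.$$
This agrees with our previous answer, multiplied out.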
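The agreement of the two answers can also be checked numerically at a sample point. This is a small sketch with function names of our own; it compares both formulas with a centered difference quotient, which is only an approximation of the derivative.

```python
import math


def f(x):
    return (5 * x**2 + 1) * (math.sqrt(x) - 3)


def f_prime_product_rule(x):
    # Answer obtained from the Product Rule.
    return 10 * x * (math.sqrt(x) - 3) + (5 * x**2 + 1) / (2 * math.sqrt(x))


def f_prime_expanded(x):
    # Answer obtained by multiplying out first.
    return 12.5 * x * math.sqrt(x) - 30 * x + 1 / (2 * math.sqrt(x))


def central_difference(func, x, h=1e-6):
    return (func(x + h) - func(x - h)) / (2 * h)


x0 = 4.0
print(f_prime_product_rule(x0), f_prime_expanded(x0), central_difference(f, x0))
```

At x = 4 both formulas evaluate to -19.75 by hand: 40(2 - 3) + 81/4 and 100 - 120 + 1/4.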
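example: Differentiate $g(t) = \frac{t^5 + 1}{t\sqrt{t}}$. Solution by the Quotient Rule:
$$g'(t) = \frac{dg}{dt} = \left(\frac{t^5+1}{t\sqrt{t}}\right)' = \frac{(t^5+1)'(t\sqrt{t}) - (t^5+1)(t\sqrt{t})'}{(t\sqrt{t})^2} = \frac{(5t^4)(t\sqrt{t}) - (t^5+1)\left(\frac{3}{2}t^{1/2}\right)}{t^3},$$
where we use $t\sqrt{t} = t^{3/2}$.
Solution by multiplying out: $\frac{1}{t\sqrt{t}} = t^{-3/2}$, so: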
Derivative of sine and cosine. The sine and cosine are important functions describing
periodic motion. From the graph y = sin(x) (in blue), let us examine the slope at each
point to sketch the graph of the derivative $y = \sin'(x)$ (in red), as in Notes 2.3:
To prove this, we need two lemmas (minor theorems which help to prove a major one):
lemma: (a) $\lim_{\theta\to 0} \frac{\sin(\theta)}{\theta} = 1$ (b) $\lim_{\theta\to 0} \frac{\cos(\theta) - 1}{\theta} = 0$.
Proof of (a): This is a difficult limit of the form $\frac{0}{0}$.
Consider a sector $OPQ$ of $\theta$ radians:
this means a pie-slice of radius $r = 1$, whose circular outer rim has length $\theta$. (For example,
$\theta = 2\pi$ would mean a full circle.)
The area of the sector is proportional to the angle, increasing from $A = 0$ for $\theta = 0$, to
$A = \pi r^2 = \pi$ for $\theta = 2\pi$, so it is $A = \frac{1}{2}\theta$ for arbitrary $\theta$. From basic trigonometry, we
know that the height of the triangle $\triangle OQP$ (inside the sector) is $\sin(\theta)$, so its area is
$\frac{1}{2}(\text{base})(\text{height}) = \frac{1}{2}(1)\sin(\theta) = \frac{1}{2}\sin(\theta)$. Also, the height of the triangle $\triangle OQT$ (outside
the sector) is $\tan(\theta)$, so its area is $\frac{1}{2}\tan(\theta)$. We have:
sin()
cos() 1.
sin2 ()
= (cos()+1) = sin()
sin()
cos()+1 .
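Hence by the Product Limit Law:
\[
\lim_{\theta\to 0}\frac{\cos(\theta)-1}{\theta}
= -\lim_{\theta\to 0}\frac{\sin(\theta)}{\theta}\cdot\lim_{\theta\to 0}\frac{\sin(\theta)}{\cos(\theta)+1}
= -1\cdot\frac{0}{1+1} = 0.
\]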
We used Lemmas (a) and (b) to get the last line. The proof of $\cos'(x) = -\sin(x)$ is similar.
General trigonometric derivatives. From these basic derivatives, we can compute the
derivative of any trig function or combination of trig functions.
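example: Compute the derivative of $\tan(x)$. The Quotient Rule for derivatives (Notes
2.3) gives:
\[
\tan'(x) = \left(\frac{\sin(x)}{\cos(x)}\right)'
= \frac{\sin'(x)\cos(x) - \sin(x)\cos'(x)}{\cos^2(x)}
= \frac{\cos^2(x)+\sin^2(x)}{\cos^2(x)}
= \frac{1}{\cos^2(x)} = \sec^2(x),
\]
since $\cos^2(x) + \sin^2(x) = 1$. In fact, we get the following derivatives: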
Warning: These formulas are only valid if the angle x is in radians, not degrees.
Limits of quotients. We can also compute trigonometric limits of the form $\frac{0}{0}$. The trick
is to manipulate the numerators and denominators to get factors of the form $\frac{\sin(g(x))}{g(x)}$, where
$g(x)$ is any quantity which goes to zero.
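example: Compute $\lim_{x\to 0}\frac{\sin(3x)}{x}$. We have:
\[
\lim_{x\to 0}\frac{\sin(3x)}{x}
= \lim_{x\to 0}\frac{\sin(3x)}{3x}\cdot\frac{3x}{x}
= \lim_{x\to 0}\frac{\sin(3x)}{3x}\cdot\lim_{x\to 0}\frac{3x}{x}
= \lim_{h\to 0}\frac{\sin(h)}{h}\cdot\lim_{x\to 0} 3 = 1\cdot 3 = 3.
\]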
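The value of this limit can also be seen on the numerical level by tabulating the quotient for inputs shrinking toward zero. This is a minimal sketch; the helper name `ratio` and the sample inputs are choices made here for illustration, and the angle is in radians.

```python
import math

def ratio(x):
    # The quotient sin(3x)/x, with x in radians and x != 0
    return math.sin(3 * x) / x

# Tabulate the quotient as x shrinks toward 0
for x in (0.1, 0.01, 0.001, 0.0001):
    print(f"x = {x:<7} sin(3x)/x = {ratio(x):.8f}")
```

The printed values should creep up toward 3, in line with the algebraic computation.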
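\[
\begin{aligned}
\lim_{x\to 0}\frac{\tan(x)}{\sin(\sqrt{x})}
&= \lim_{x\to 0}\frac{\sin(x)}{\cos(x)}\cdot\frac{1}{\sin(\sqrt{x})} \\
&= \lim_{x\to 0}\frac{1}{\cos(x)}\cdot\frac{\sin(x)}{x}\cdot x\cdot\frac{1}{\;\frac{\sin(\sqrt{x})}{\sqrt{x}}\;}\cdot\frac{1}{\sqrt{x}} \\
&= \lim_{x\to 0}\frac{\sqrt{x}}{\cos(x)}\cdot\frac{\sin(x)}{x}\cdot\frac{1}{\;\frac{\sin(\sqrt{x})}{\sqrt{x}}\;}
= \frac{0}{\cos(0)}\cdot 1\cdot\frac{1}{1} = 0,
\end{aligned}
\]
where $\lim_{x\to 0}\frac{\sin(\sqrt{x})}{\sqrt{x}} = 1$ by the substitution $h = g(x) = \sqrt{x}$, using the Limit
Substitution Theorem at the end of Notes 1.7.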
Math 132 The Chain Rule Stewart 2.5
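Chain of functions. On a Ferris wheel, your height $H$ (in feet) depends on the angle $\theta$
of the wheel (in radians): $H = 100 + 100\sin(\theta)$. The wheel is turning at one revolution per
minute, meaning the angle at $t$ minutes is $\theta = 2\pi t$ radians. At $t = \frac{1}{12}$, we have $\theta = \frac{\pi}{6}$, and the
units of the rates involved are related by:
\[ \frac{\text{ft}}{\text{min}} = \frac{\text{ft}}{\text{rad}}\cdot\frac{\text{rad}}{\text{min}}. \]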
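The rate of change of height with respect to angle is:
\[
\frac{dH}{d\theta} = \frac{d}{d\theta}\left(100 + 100\sin(\theta)\right) = 0 + 100\sin'(\theta)
= 100\cos(\theta) = 100\cos\left(\tfrac{\pi}{6}\right) \approx 86.6\ \frac{\text{ft}}{\text{rad}}.
\]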
The rate of change of angle with respect to time is:
\[ \frac{d\theta}{dt} = \frac{d}{dt}(2\pi t) = 2\pi \approx 6.28\ \frac{\text{rad}}{\text{min}}. \]
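Thus, the Chain Rule says the rate of change of height with respect to time is the product:
\[ \frac{dH}{dt} = 86.6\ \frac{\text{ft}}{\text{rad}}\times 6.28\ \frac{\text{rad}}{\text{min}} \approx 544\ \frac{\text{ft}}{\text{min}}. \]
Your rate of rise is about 544 feet per minute, at time $t = \frac{1}{12}$.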
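The product of rates can be checked against a direct numerical derivative of the height as a function of time. The following is a sketch under the example's own assumptions ($H = 100 + 100\sin\theta$, $\theta = 2\pi t$); the variable names and the step size are illustrative choices.

```python
import math

def height(t):
    # Height in feet at time t (minutes): H = 100 + 100 sin(theta), theta = 2*pi*t
    theta = 2 * math.pi * t
    return 100 + 100 * math.sin(theta)

t0 = 1 / 12
theta0 = 2 * math.pi * t0

dH_dtheta = 100 * math.cos(theta0)   # ft per radian
dtheta_dt = 2 * math.pi              # radians per minute
dH_dt = dH_dtheta * dtheta_dt        # Chain Rule: ft per minute

# Independent check: symmetric difference quotient of H(t) at t0
h = 1e-6
numeric = (height(t0 + h) - height(t0 - h)) / (2 * h)

print(f"dH/dtheta = {dH_dtheta:.1f} ft/rad")
print(f"dtheta/dt = {dtheta_dt:.2f} rad/min")
print(f"dH/dt     = {dH_dt:.1f} ft/min (chain rule), {numeric:.1f} ft/min (numeric)")
```

Both routes should give roughly 544 ft/min.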
Chain Rule: Let $y, u, x$ be variables related by $y = f(u)$ and $u = g(x)$, so that $y = f(g(x))$.
Then, in Leibnitz notation:
\[ \frac{dy}{dx} = \frac{dy}{du}\cdot\frac{du}{dx} \]
or in Newton notation:
\[ \left(f(g(x))\right)' = f'(g(x))\cdot g'(x). \]
This holds at any value of $x$ where $g'(x)$ and $f'(g(x))$ are both defined.
The function $f(g(x))$ is called the composition of $f$ following $g$, sometimes denoted $f\circ g$,
so that we may write $\left(f(g(x))\right)'$ as $(f\circ g)'(x)$.
Proof of the Chain Rule. We will prove the Rule with the extra assumption that $g(x)$ is a
one-to-one function near a given $x = a$: that is, for $x$ close enough (but unequal) to $a$, we
have $g(x) \ne g(a)$. (For a general proof without this assumption, see the Stewart text 2.5, p. 153.)
Then we compute, using the alternative definition of derivative:
\[
\begin{aligned}
(f\circ g)'(a) = \frac{d}{dx} f(g(x))\Big|_{x=a}
&= \lim_{x\to a}\frac{f(g(x)) - f(g(a))}{x-a} \\
&= \lim_{x\to a}\frac{f(g(x)) - f(g(a))}{g(x)-g(a)}\cdot\lim_{x\to a}\frac{g(x)-g(a)}{x-a} \\
&= \lim_{u\to g(a)}\frac{f(u) - f(g(a))}{u-g(a)}\cdot g'(a) \\
&= f'(g(a))\cdot g'(a).
\end{aligned}
\]
Here we used the Limit Substitution Theorem from Notes 1.7, substituting $u$ for $g(x)$ so
that $x \to a$ forces $u \to g(a)$. (Since $g(x)$ is differentiable at $x = a$, it is also continuous.)
Differentiation Rules. Along with our previous Derivative Rules from Notes 2.3, and
the Basic Derivatives from Notes 2.3 and 2.4, the Chain Rule is the last fact needed to
compute the derivative of any function defined by a formula.
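example: Find the derivative of $\left(x+\frac1x\right)^{10}$. First, we use Leibnitz notation: let $y = u^{10}$ and
$u = x + x^{-1}$, so that $y = \left(x+\frac1x\right)^{10}$. Then:
\[
\begin{aligned}
\frac{dy}{dx} = \frac{dy}{du}\cdot\frac{du}{dx}
&= \frac{d}{du}(u^{10})\cdot\frac{d}{dx}\left(x+x^{-1}\right) = 10u^9\cdot\frac{d}{dx}\left(x+x^{-1}\right) \\
&= 10\left(x+\tfrac1x\right)^9\left(1+(-1)x^{-2}\right) = 10\left(x+\tfrac1x\right)^9\left(1-\tfrac{1}{x^2}\right).
\end{aligned}
\]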
Next, we redo this in Newton notation, without introducing new letters $y, u$. Let $f(x) =
x^{10}$ with $f'(x) = 10x^9$, and $g(x) = x + \frac1x = x + x^{-1}$ with $g'(x) = 1 - x^{-2} = 1 - \frac{1}{x^2}$, so that:
\[ \left(f(g(x))\right)' = f'(g(x))\cdot g'(x) = 10\left(x+\tfrac1x\right)^9\left(1-\tfrac{1}{x^2}\right). \]
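A third way (the quickest in practice) is to think of the composite function as an outside
function $\mathrm{out} = (\ )^{10}$ wrapped around an inside function $\mathrm{in} = x + \frac1x$, so the Chain Rule
becomes:
\[ \left(\mathrm{out}(\mathrm{in})\right)' = \mathrm{out}'(\mathrm{in})\cdot\mathrm{in}'. \]
Here $\mathrm{out}' = 10(\ )^9$, so:
\[ \left(\mathrm{out}(\mathrm{in})\right)' = 10\left(x+\tfrac1x\right)^9\cdot\left(x+\tfrac1x\right)' = 10\left(x+\tfrac1x\right)^9\left(1-\tfrac{1}{x^2}\right). \]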
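\[
\begin{aligned}
\left(\sin\left(\tan\left(\tfrac{x}{x+1}\right)\right)\right)'
&= \sin'\left(\tan\left(\tfrac{x}{x+1}\right)\right)\cdot\tan'\left(\tfrac{x}{x+1}\right)\cdot\left(\tfrac{x}{x+1}\right)' \\
&= \sin'\left(\tan\left(\tfrac{x}{x+1}\right)\right)\cdot\tan'\left(\tfrac{x}{x+1}\right)\cdot\frac{(x)'(x+1) - x(x+1)'}{(x+1)^2} \\
&= \cos\left(\tan\left(\tfrac{x}{x+1}\right)\right)\cdot\sec^2\left(\tfrac{x}{x+1}\right)\cdot\frac{(x+1)-x}{(x+1)^2}.
\end{aligned}
\]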
since ()0 = c0 = 0. Any expression with no variable in it is constant, with derivative zero.
Proof. First, we prove the formula for exponent $p = -n$, a negative integer power, using
the Quotient Rule:
\[
(x^{-n})' = \left(\frac{1}{x^n}\right)' = \frac{(1)'x^n - (1)(x^n)'}{(x^n)^2} = \frac{-nx^{n-1}}{x^{2n}} = (-n)x^{-n-1}.
\]
Thus, we have proved the formula $(x^p)' = px^{p-1}$ for $p$ a positive or negative integer.
Next, consider the equation $(x^{n/m})^m = x^n$ where $n, m$ are positive or negative integers.
(If $m$ is even, this only makes sense on the domain $x > 0$.)
We take derivatives of both sides, and expand the left side by the Chain Rule, and both
sides by the Basic Derivative for integer exponents:
\[
\begin{aligned}
\left((x^{n/m})^m\right)' &= (x^n)' \\
m(x^{n/m})^{m-1}\cdot\left(x^{n/m}\right)' &= nx^{n-1} \\
\left(x^{n/m}\right)' &= \frac{nx^{n-1}}{m(x^{n/m})^{m-1}}
= \tfrac{n}{m}x^{\,n-1-(n/m)(m-1)} = \tfrac{n}{m}x^{(n-m)/m} = \tfrac{n}{m}x^{(n/m)-1}.
\end{aligned}
\]
Degrees versus radians. In higher mathematics, we always use radian measure (full
circle = $2\pi$ radians), so that $\sin(x)$ always means sine of $x$ radians. This is essential to get
the formula $\sin'(x) = \cos(x)$.
The sine with input $x$ in degrees (full circle = 360 deg) is actually a different function,
which we can denote as $\sin_{\deg}(x)$. Remember that a function is a rule which converts input
numbers to output numbers: it does not know that we interpret some numbers as angles,
or what their units should be. Since $\sin(x)$ and $\sin_{\deg}(x)$ produce different outputs from a
given number $x$, they are different functions. In fact, we have:
\[ \sin_{\deg}(x) = \sin\left(\tfrac{2\pi}{360}x\right). \]
The inside operation converts $x$ from degrees to radians, then feeds this into the ordinary
(radian) sine function.
This makes a crucial difference in the derivative:
\[
\sin_{\deg}'(x) = \left(\sin\left(\tfrac{2\pi}{360}x\right)\right)'
= \cos\left(\tfrac{2\pi}{360}x\right)\cdot\left(\tfrac{2\pi}{360}x\right)'
= \cos_{\deg}(x)\cdot\tfrac{2\pi}{360}.
\]
The geometric definition of radian measure is that an arc of length $x$ on a unit circle makes an angle of
$x$ radians. The full circle, whose arc length is the circumference $2\pi$, measures as $2\pi$ radians.
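The extra factor $\frac{2\pi}{360}$ is easy to observe numerically. The sketch below defines degree-based sine and cosine with Python's `math.radians` and compares a difference quotient with the formula above; the function names, the sample angle of 60 degrees, and the step size are illustrative choices.

```python
import math

def sin_deg(x):
    # Sine of an angle given in degrees
    return math.sin(math.radians(x))

def cos_deg(x):
    # Cosine of an angle given in degrees
    return math.cos(math.radians(x))

def central_difference(func, a, h=1e-5):
    # Symmetric difference quotient approximating func'(a)
    return (func(a + h) - func(a - h)) / (2 * h)

x = 60.0
numeric = central_difference(sin_deg, x)
predicted = cos_deg(x) * 2 * math.pi / 360

print(f"numeric slope of sin_deg at 60:  {numeric:.7f}")
print(f"cos_deg(60) * 2*pi/360:          {predicted:.7f}")
print(f"cos_deg(60) without the factor:  {cos_deg(x):.7f}")
```

The first two numbers should match, while the bare $\cos_{\deg}(60)$ is larger by a factor of about 57.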
Math 132 Implicit Differentiation Stewart 2.6
Explicit versus implicit functions. Given the circle defined by the equation $x^2 + y^2 = 25$,
suppose we wish to find the tangent line at the point $(x, y) = (3, 4)$. Calculus finds a tangent
slope of a function graph $y = f(x)$ as a derivative $f'(a) = \frac{df}{dx}\big|_{x=a}$; but there is no function
specified in our problem.
Rather, we must interpret $x$ as an independent variable, which implicitly makes $y$ a
function of $x$: to make this explicit, we solve the equation for $y$, giving $y = \pm\sqrt{25 - x^2}$.
That is, the circle is the union of two function graphs, $y = \sqrt{25 - x^2}$ and $y = -\sqrt{25 - x^2}$,
each over the domain $x \in [-5, 5]$.
The given point $(3, 4)$ is on the first of these, and we differentiate this explicit function:
\[
\frac{dy}{dx} = \frac{d}{dx}\sqrt{25-x^2} = \frac{d}{dx}(25-x^2)^{\frac12}
= \tfrac12(25-x^2)^{-\frac12}\cdot\frac{d}{dx}(25-x^2) = \frac{-x}{\sqrt{25-x^2}}.
\]
Here we used the Chain Rule with outside function $(\ )^{1/2}$. At our point, we have the
tangent slope $\frac{dy}{dx}\big|_{x=3} = \frac{-3}{\sqrt{25-3^2}} = -\frac34$, and our tangent line has equation $y = -\frac34(x-3) + 4$.
Implicit differentiation is a smoother way to do this problem. Instead of solving the
equation for $y$, we assume $y = y(x)$ for some unknown function $y(x)$ which satisfies the
equation $x^2 + y(x)^2 = 25$. Then we differentiate both sides using the Rules:
\[
\begin{aligned}
\left(x^2 + y(x)^2\right)' &= (25)' \\
\left(x^2\right)' + \left(y(x)^2\right)' &= 0 \\
2x + 2y(x)\,y'(x) &= 0.
\end{aligned}
\]
Note that $(x^2)' = 2x$ is a Basic Derivative, but for $(y^2)'$, we need the Chain Rule with
outside function $(\ )^2$ and inside function $y = y(x)$. The derivative $y'(x)$ is the unknown
we are trying to find, and now we can solve for it: $y'(x) = -\frac{x}{y(x)}$, which was easier than
solving for the original $y(x)$. Since we are considering the point $(x, y) = (3, 4)$, we must have
$y(3) = 4$, so that $\frac{dy}{dx}\big|_{x=3} = y'(3) = -\frac{3}{y(3)} = -\frac34$, as before.
Note that the formula $y'(x) = -\frac{x}{y(x)}$, or in Leibnitz notation $\frac{dy}{dx} = -\frac{x}{y}$, is valid for both
of the functions defining the upper and lower half-circles. Since both functions obey the
original equation, they both obey the derivative equation. For example, at $(x, y) = (3, -4)$,
the slope is $y'(3) = -\frac{3}{y(3)} = -\frac{3}{-4} = \frac34$.
We could even take this one step further to find the second derivative implicitly:
\[
y''(x) = (y'(x))' = \left(-\frac{x}{y}\right)' = -\frac{(x)'y - x\,y'}{y^2}
= -\frac{y - x\left(-\frac{x}{y}\right)}{y^2} = -\frac{y^2+x^2}{y^3} = -\frac{25}{y^3}.
\]
We used the Quotient Rule, the previous $y' = -\frac{x}{y}$, and the original equation $x^2 + y^2 = 25$.
Folium of Descartes. This is a curious curve discovered by the famous mathematician
who gave us Cartesian $xy$-coordinates. It is defined by the equation: $x^3 + y^3 = 9xy$, with
graph:
We want to find the tangent line at the point $(x, y) = (2, 4)$, which is on the curve because
$2^3 + 4^3 = 9(2)(4)$. In this case, there is no easy way to solve for $y$ to get an explicit function
$y(x)$; indeed, over $x \in [0, \frac92]$, the curve is the union of three function graphs.
Nevertheless, implicit differentiation works without a hitch: we assume $y = y(x)$ is some
unknown function which satisfies the equation, and differentiate both sides (this time in
Leibnitz notation):
\[
\begin{aligned}
\frac{d}{dx}\left(x^3 + y^3\right) &= \frac{d}{dx}(9xy) \\
\frac{d}{dx}x^3 + \frac{d}{dx}y^3 &= 9\left(\frac{d}{dx}(x)\cdot y + x\cdot\frac{d}{dx}(y)\right) \\
3x^2 + 3y^2\frac{dy}{dx} &= 9y + 9x\frac{dy}{dx}.
\end{aligned}
\]
Here we used the Sum and Product Rules, then the Chain Rule. Solving for $\frac{dy}{dx}$:
\[ 3y^2\frac{dy}{dx} - 9x\frac{dy}{dx} = 9y - 3x^2, \qquad \frac{dy}{dx} = \frac{9y-3x^2}{3y^2-9x}. \]
We do not know $y(x)$ explicitly, but our given point $(x, y) = (2, 4)$ means that $y(2) = 4$, so:
\[ \frac{dy}{dx}\Big|_{x=2} = \frac{9y-3x^2}{3y^2-9x} = \frac{9(4) - 3(2^2)}{3(4^2) - 9(2)} = \frac45. \]
Thus, the tangent line through the point $(2, 4)$ is: $y = \frac45(x-2) + 4$.
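Even though $y(x)$ has no convenient formula here, the slope can be double-checked numerically: for each $x$ near 2 we can solve $x^3 + y^3 = 9xy$ for the branch of $y$ near 4, then take a difference quotient. This is a sketch only; the helper names, the use of Newton's method, and the iteration count are choices made for illustration, not part of the Notes.

```python
def F(x, y):
    # The curve is the set of points where F(x, y) = 0
    return x**3 + y**3 - 9 * x * y

def implicit_slope(x, y):
    # dy/dx from implicit differentiation: (9y - 3x^2) / (3y^2 - 9x)
    return (9 * y - 3 * x**2) / (3 * y**2 - 9 * x)

def solve_y_near(x, y_guess, steps=50):
    # Newton's method in y (x held fixed) to stay on the branch near y_guess
    y = y_guess
    for _ in range(steps):
        y -= F(x, y) / (3 * y**2 - 9 * x)
    return y

x0, y0 = 2.0, 4.0
h = 1e-5
numeric = (solve_y_near(x0 + h, y0) - solve_y_near(x0 - h, y0)) / (2 * h)

print("on curve:", F(x0, y0) == 0)
print("implicit slope:", implicit_slope(x0, y0))
print("numeric slope: ", numeric)
```

Both slopes should come out close to $\frac45 = 0.8$.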
To find points satisfying this equation, substitute $y = tx$ for a new variable $t$, and solve for $x$, giving:
$x = \frac{9t}{1+t^3}$ and $y = \frac{9t^2}{1+t^3}$. Then each value of $t$ gives a point $(x, y)$ on the curve: this is called a parametrization.
Math 132 Rates of Change Stewart 2.7
Conceptual levels. Mathematics solves problems partly with technical tools like the
differentiation rules, but its most powerful method is to translate between different levels of
meaning, transforming the problems to make them accessible to our tools. Problems often
originate at the physical or geometric levels, and we translate to the numerical or algebraic
levels to solve them, then we translate the answer back to the original level.
Our key concept so far has been the derivative, with the following meanings:
Physical: For a function $y = f(x)$, the derivative $\frac{dy}{dx} = f'(x)$ is the rate of change of $y$
with respect to $x$, near a particular value of $x$. For a particular input $a$, $f'(a)$ means
how fast $f(x)$ changes from $f(a)$ per unit change in $x$ away from $a$. This is the main
importance of derivatives.
Geometric: For a graph y = f (x), the derivative f 0 (a) is the slope of the tangent line
at the point (a, f (a)).
Numerical: We have the approximation
\[ f'(a) \approx \frac{\Delta f}{\Delta x} = \frac{f(a+h)-f(a)}{h}. \]
The right side is the average rate of change of $f(x)$ from $x = a$ to $x = a+h$. As
$\Delta x = h \to 0$, the difference quotient approaches the instantaneous rate of change, the
derivative $f'(a)$.
Algebraic: We can easily compute the derivative of almost any function defined by a
formula. Basic Derivatives like $(x^p)' = px^{p-1}$, $\sin'(x) = \cos(x)$, and $\cos'(x) = -\sin(x)$
are combined using the Sum, Product, Quotient, and Chain Rules for Derivatives.
Occasionally, we must go back to the definition $f'(a) = \lim_{h\to 0}\frac{f(a+h)-f(a)}{h}$.
Functions of motion. We consider the basic physical quantities describing motion. These
are all functions of time t. (See end of 2.3.)
Velocity $v = \frac{ds}{dt}$, how fast the position is increasing per second (ft/sec); this is negative
if position is decreasing. The speed is the magnitude $|v|$.
Acceleration $a = \frac{dv}{dt} = \frac{d^2s}{dt^2}$, how fast the velocity is increasing, the number of ft/sec
gained each second (ft/sec$^2$). Equivalently, this is how fast the object is speeding up
(positive) or slowing down (negative).
Jerk $j = \frac{da}{dt} = \frac{d^3s}{dt^3}$, the rate of change of acceleration (ft/sec$^3$).
Driving. An insurance company downloads the following data from a car's speedometer,
allowing them to construct the following graph of the car's velocity $v(t)$. What physical
story does this graph tell?
The dip of negative velocity at the beginning is probably the car slowly backing down a
driveway. It goes forward a few blocks at a moderate, almost constant speed (positive
velocity), stops at an intersection (zero velocity), then continues at higher speed.
From the velocity data, we can reconstruct the odometer data, the graph of the distance
function s(t): the level of the velocity graph is the slope of the distance graph.
The distance starts at some positive odometer reading $s = s_0$ (which we cannot know from
the velocity data alone), decreases a bit (negative slope) because of the negative velocity,
increases with constant slope during constant positive velocity, stays at a constant level
during zero velocity, then increases with greater slope after the velocity goes up.
(This assumes the odometer runs both ways, like old mechanical odometers used to.)
Nothing remarkable so far. What about the acceleration, the derivative $a(t) = \frac{dv}{dt}$? The
slope of the velocity graph is the level of the acceleration graph:
Here $a(t)$ is roughly proportional to the depression of the gas or brake pedal, and it is
zero except when the car is speeding up or slowing down. The most prominent feature is
the spike after $t = 120$: just how strong an acceleration is this? The tangent line marked
on the velocity graph shows a change from 0 to 40 mph in about 3 sec, meaning a slope
$a(122) \approx \frac{40}{3} \approx 13.3$ mph/sec. Now,
\[ 1\ \frac{\text{mi}}{\text{hr}} = \frac{5280\ \text{ft}}{3600\ \text{sec}} \approx 1.47\ \frac{\text{ft}}{\text{sec}}, \]
so we convert 13.3 mph per second $\approx (13.3)(1.47) \approx 20$ ft/sec per second $= 20$ ft/sec$^2$.
Compare this to the standard acceleration due to gravity: one gee is about 32 ft/sec$^2$, so
this driver feels about 2/3 the force of gravity pushing him into the seat-back. It seems he
(I'm pretty sure it's not a she) is flooring the accelerator, roaring ahead from a standstill
with tires squealing, then easing up past 40 mph or so. Not responsible driving!
Finally, note the jump in $a$ at $t = 90$, where the car goes from braking deceleration to a
standstill. The change in $a$ is not so large, but it happens so fast that it looks instantaneous,
and the $a(t)$ graph seems to rise vertically (infinite slope). This means the derivative of
acceleration, the jerk $j = \frac{da}{dt}$, is huge at this moment, and the car experiences a lurching
stop, another sign of sloppy driving. This driver's insurance rates are going up!
Note that in this analysis, we have translated from the graphical (geometric) to the
physical level; and also (for the gee calculation) from the graphical to the numerical to the
physical.
Ballistic equation. This is the formula giving the height $s(t)$ for an object launched from
initial height $s_0$, straight upward with initial velocity $v_0$, under the influence of a constant
gravitational acceleration $g$:
\[ s(t) = s_0 + v_0 t - \tfrac12 gt^2. \]
To justify this equation, note that the initial height is indeed $s(0) = s_0 + v_0(0) - \tfrac12 g(0^2) = s_0$.
Also, $s_0, v_0, g$ are constants, so:
\[ v(t) = s'(t) = v_0 - gt, \]
and indeed the initial velocity $v(0) = v_0$. The acceleration is $a(t) = v'(t) = -g$, which is the
desired constant in the correct (downward) direction. Finally, the jerk is $j(t) = 0$, which is
correct because gravity pulls steadily and never jerks.
example: Given standard gravity of 32 ft/sec$^2$ and initial height $s_0 = 5$ ft, how fast to
throw a ball upward so that it stays airborne for 5 sec? The equation becomes $s(t) =
5 + v_0 t - 16t^2$, with $v_0$ an unknown constant. Landing at 5 sec means $s(5) = 0$, that is
$5 + v_0(5) - 16(5^2) = 0$, and solving, $v_0 = 79$ ft/sec. (This is $79/1.47 \approx 54$ mph!)
How high will the ball go from such a throw? At the instant $t = t_1$ when the ball reaches
the top of its arc, its velocity is zero. That is: $v(t_1) = 79 - 32t_1 = 0$, and $t_1 = \frac{79}{32} \approx 2.47$
sec. (This is not quite half the 5 sec interval, because the ball started out at $s_0 = 5$ ft.)
The height at this instant is $s(t_1) = 5 + 79\left(\frac{79}{32}\right) - 16\left(\frac{79}{32}\right)^2 = 102\frac{33}{64} \approx 102.5$ ft. It would take a
baseball pitcher to throw a ball that high.
Note that the graph $s = 5 + 79t - 16t^2$ is a downward-curving parabola, but this is not
the trajectory of the ball, which is going straight up and down. For $t < t_1$, the height $s(t)$
is increasing, and the velocity $v(t) = 79 - 32t$ is positive; for $t > t_1$, $s(t)$ is decreasing, and
$v(t)$ is negative.
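The arithmetic in this example can be reproduced exactly with rational numbers. The sketch below assumes the Ballistic Equation with a minus sign on the gravity term, $s(t) = s_0 + v_0 t - \frac12 g t^2$, and the example's data; the constant and function names are illustrative choices.

```python
from fractions import Fraction

G = 32            # gravitational acceleration, ft/sec^2
S0 = 5            # initial height, ft
FLIGHT_TIME = 5   # seconds until the ball lands

# Landing condition s(T) = S0 + v0*T - (G/2)*T^2 = 0, solved for v0
v0 = (Fraction(G, 2) * FLIGHT_TIME**2 - S0) / FLIGHT_TIME

def height(t):
    # Ballistic Equation s(t) = S0 + v0*t - (G/2)*t^2
    return S0 + v0 * t - Fraction(G, 2) * t**2

def velocity(t):
    # v(t) = s'(t) = v0 - G*t
    return v0 - G * t

t_peak = v0 / G            # time where v(t) = 0
peak = height(t_peak)      # maximum height

print("v0     =", v0, "ft/sec")
print("t_peak =", t_peak, "sec, about", float(t_peak))
print("peak   =", peak, "ft, about", float(peak))
```

With these inputs the launch speed is 79 ft/sec and the peak height is $\frac{6561}{64}$ ft, a little over 100 ft.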
Math 132 Related Rates Stewart 2.8
Pulley example. Consider a weight hanging from a rope which stretches up to a pulley
10 ft above the floor, then to your hand, which is 3 ft above the floor and 15 ft horizontally
from the pulley. If you walk away from the pulley at 2 ft/sec, how fast will the weight rise?
We want to find an unknown rate of change from a known rate which is related to
it geometrically. To start any such problem, we draw a picture and label constant parts
with their values: the lengths 3 and 7 below, which will not change as your hand moves
horizontally. We label variable parts with letter names: the variable h = h(t) is the
horizontal distance from weight to hand, and r = r(t) is the length of rope from pulley to
hand, both functions of time t.
The problem specifies the current values of some variables, usually meaning at time
$t = 0$: $h(0) = 15$. Finally, for each variable we draw an arrow marked with its current rate
of change: we know $h'(0) = 2$, and $r'(0)$ is the target rate which we want to compute, since
the weight goes upward at the same rate as $r$ increases.
Next, we write equations implied by the geometry of the picture: the Pythagorean Theorem
implies $r^2 = h^2 + 7^2$. To determine $r'(0)$, we compute $r(t)$ explicitly, and differentiate:
\[
\begin{aligned}
r(t) &= \sqrt{h(t)^2+49} \\
r'(t) &= \tfrac12\left(h(t)^2+49\right)^{-1/2}\cdot\left(h(t)^2+49\right)'
= \frac{h(t)\,h'(t)}{\sqrt{h(t)^2+49}}.
\end{aligned}
\]
Plugging in the current values at $t = 0$:
\[
r'(0) = \frac{h(0)\,h'(0)}{\sqrt{h(0)^2+49}} = \frac{(15)(2)}{\sqrt{15^2+49}} = \frac{30}{\sqrt{274}} \approx 1.8\ \text{ft/sec}.
\]
We could do this a bit more simply by implicitly differentiating both sides of the equation
$r^2 = h^2 + 7^2$, then solving for $r'(t)$:
\[
\begin{aligned}
\left(r(t)^2\right)' &= \left(h(t)^2 + 49\right)' \\
2r(t)\,r'(t) &= 2h(t)\,h'(t) \\
r'(t) &= \frac{h(t)\,h'(t)}{r(t)}.
\end{aligned}
\]
Now, $r(0) = \sqrt{h(0)^2 + 49} = \sqrt{274}$, so plugging in current values: $r'(0) = \frac{(15)(2)}{\sqrt{274}}$
as before.
Warning: It is essential to plug in the current values only in the last step: if we substituted
before differentiating, we would get: $(r(0))' = \left(\sqrt{h(0)^2 + 49}\right)' = 0$ since the derivative
of any constant (even a complicated constant) is zero.
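The pulley computation can be mirrored in a few lines, which also illustrates the Warning: the rate is obtained from the formula for $r'(t)$ first, and the numbers go in last. This is a sketch under the example's assumptions (pulley 7 ft above the hand, $h(0) = 15$, $h'(0) = 2$); the names and the assumed motion $h(t) = 15 + 2t$ near $t = 0$ are choices made for the numerical check.

```python
import math

def rope_length(h):
    # Pythagorean relation r^2 = h^2 + 7^2
    return math.sqrt(h**2 + 49)

h0 = 15.0      # current horizontal distance, ft
dh_dt = 2.0    # walking speed, ft/sec

# Related-rates formula r' = h h' / r, with values plugged in last
r0 = rope_length(h0)
dr_dt = h0 * dh_dt / r0

def r_of_t(t):
    # Rope length over time, assuming h(t) = 15 + 2t near t = 0
    return rope_length(h0 + dh_dt * t)

eps = 1e-6
numeric = (r_of_t(eps) - r_of_t(-eps)) / (2 * eps)

print(f"r'(0) from the formula: {dr_dt:.4f} ft/sec")
print(f"r'(0) numerically:      {numeric:.4f} ft/sec")
```

Both values should be about 1.8 ft/sec.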
2. Write an equation relating the variables according to the geometry of the picture.
3. Assuming each variable is a function of time $t$, take the derivative $\frac{d}{dt}$ of both sides of
the equation, with the Chain Rule producing derivatives of the variables. If necessary,
solve the derivative equation for the derivative which is desired.
4. Plug in the current values of the variables and rates to compute the target rate.
Ice block example. We saw a related rates problem in Notes 2.3, last page.
Spill radius example. A stream of water is spreading a circular puddle on the floor. If
the puddle is 1 meter across, and the stream increases the area at a rate of 2 sq m/min,
then how quickly is the puddle widening?
The variable quantities are the radius $r$ and the area $A$. We know the current value $r(0) = \frac12$
and the current rate $A'(0) = \frac{dA}{dt}\big|_{t=0} = 2$. The unknown rate which we must find is $r'(0)$.
The area is related to the radius by the equation: $A = \pi r^2$. Differentiating the equation:
\[ A'(t) = \left(\pi r(t)^2\right)' = 2\pi\, r(t)\, r'(t). \]
Solving for the target rate: $r'(t) = \frac{A'(t)}{2\pi\, r(t)}$, and
\[ r'(0) = \frac{A'(0)}{2\pi\, r(0)} = \frac{2}{2\pi\left(\frac12\right)} = \frac{2}{\pi} \approx 0.64\ \text{m/min}. \]
It is important to check a real-world result for plausibility. The puddle's radius is
growing (positive derivative) at a rate of about half a meter per minute, which is reasonable.
Searchlight example: A searchlight is shining along a wall 20 meters away. If the position
of the light is $30^\circ$ away from looking directly at the wall, and the light is turning at $5^\circ$ per
second, then what is the speed of the spotlight image moving along the wall?
The distance from the wall is the constant 20; the variable quantities are $\theta$ and $s$. The angle
$\theta(t)$ has current value $\theta(0) = 30^\circ$ and current rate $\theta'(0) = 5^\circ$/sec, and we seek to compute
the unknown rate $s'(0) = \frac{ds}{dt}\big|_{t=0}$. From the definition of tangent, we have the equation:
$\tan(\theta) = \frac{s}{20}$, so we can easily solve for $s = 20\tan(\theta)$. Differentiating (in Leibnitz notation
this time):
\[ \frac{ds}{dt} = \frac{d}{dt}\left(20\tan(\theta)\right) = 20\sec^2(\theta)\,\frac{d\theta}{dt}, \]
since $\frac{d}{dx}\tan(x) = \sec^2(x)$ from the table in Notes 2.4. We do not need to solve for $\frac{ds}{dt}$,
since we already solved for $s$ before differentiating.
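Finally, to plug in the current values of the angles, we must convert them to radians,
because the trig differentiation formulas are only valid for radian measure (see last page of
Notes 2.5). Thus:
\[
\theta(0) = 30^\circ = 30\left(\tfrac{2\pi}{360}\right) = \tfrac{\pi}{6}\ \text{rad}, \qquad
\theta'(0) = \tfrac{d\theta}{dt}\Big|_{t=0} = 5^\circ/\text{sec} = 5\left(\tfrac{2\pi}{360}\right) = \tfrac{\pi}{36}\ \text{rad/sec},
\]
so the current speed is:
\[ s'(0) = \frac{ds}{dt}\Big|_{t=0} = 20\sec^2\left(\tfrac{\pi}{6}\right)\cdot\tfrac{\pi}{36} = \tfrac{20\pi}{27} \approx 2.3\ \text{m/sec}. \]
Note that plugging in $\frac{d\theta}{dt}\big|_{t=0} = 5$ deg/sec instead of $\frac{\pi}{36} \approx 0.09$ rad/sec would give a wildly
incorrect answer: the conversion to radians is essential.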
One last point: the problem specifies only the speed of $\theta$, not the velocity toward or
away from the wall, so we only know $\theta'(0) = \pm\frac{\pi}{36}$, either plus or minus, though in the
picture we assumed it was plus. Thus we can only compute $s'(0) = \pm\frac{20\pi}{27}$, but in any case
the speed is $|s'(0)| = \frac{20\pi}{27}$.
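To see how much the radian conversion matters, the computation can be run both ways. This is a small sketch using the example's numbers (wall 20 m away, angle 30 degrees, turning at 5 degrees per second); the variable names are illustrative.

```python
import math

WALL_DISTANCE = 20.0             # meters
theta0 = math.radians(30)        # current angle, converted to radians
dtheta_dt = math.radians(5)      # turning rate, converted to rad/sec

# ds/dt = 20 sec^2(theta) dtheta/dt, with sec = 1/cos
ds_dt = WALL_DISTANCE / math.cos(theta0) ** 2 * dtheta_dt

# Same formula with the rate wrongly left in degrees per second
wrong = WALL_DISTANCE / math.cos(theta0) ** 2 * 5

print(f"speed with radians:  {ds_dt:.2f} m/sec")
print(f"speed with degrees:  {wrong:.1f} m/sec (incorrect)")
```

The correct value is about 2.3 m/sec, while leaving the rate in degrees inflates it by the factor $\frac{360}{2\pi} \approx 57$.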
Math 132 Linear Approximation Stewart 2.9
Tangent linear function. The geometric meaning of the derivative $f'(a)$ is the slope of
the tangent to the curve $y = f(x)$ at the point $(a, f(a))$. The tangent line is itself the graph
of a linear function $y = L(x)$, where:
\[ L(x) = f(a) + f'(a)(x-a). \]
This is correct because the line $y = f(a) + f'(a)(x-a)$ has slope $m = f'(a)$, and $L(a) =
f(a) + f'(a)(a-a) = f(a)$, so the line passes through the point $(a, L(a)) = (a, f(a))$.
The value f 0 (a) is not just the slope of the tangent line: it is also the slope of the graph
itself, because as we zoom in toward (a, f (a)), the graph and the tangent line become
indistinguishable :
This suggests a further numerical meaning of the derivative: any function $f(x)$ is very
close to being a linear function near a differentiable point $x = a$, so that $L(x)$ is an excellent
approximation for $f(x)$ when $x$ is close to $a$:
\[ f(x) \approx f(a) + f'(a)(x-a). \]
A scientific calculator gives $\sin(42^\circ) \approx 0.669$, so again the linear approximation is accurate
to two decimal places.
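Error sensitivity. We can rewrite the linear approximation $f(x) \approx f(a) + f'(a)(x-a)$ as:
\[ \Delta f \approx f'(a)\,\Delta x, \qquad\text{where } \Delta f = f(x) - f(a) \text{ and } \Delta x = x - a. \]
That is, we can approximate the change in $f(x)$ away from $f(a)$ in terms of the change in
$x$ away from $a$. In Leibnitz notation, with $y = f(x)$, we write this as:
\[ \Delta y \approx \frac{dy}{dx}\,\Delta x. \]
Here we mean $\frac{dy}{dx} = \frac{dy}{dx}\big|_{x=a} = f'(a)$. If we think of $\Delta x$ as an error from an intended input
value $x = a$, then $\Delta f \approx f'(a)\,\Delta x$ approximates the error from the intended output $f(a)$.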
example: A disk of radius $r = 5$ cm is to be cut from a metal sheet weighing 3 g/cm$^2$. If
the radius is measured to within an error of $\Delta r = \pm 0.2$ cm, what is the approximate range
of error in the weight? This is the kind of error-control problem from our limit analyses in
Notes 1.7, only now we have the powerful tools of calculus to give a simple answer.
The weight is given by the function:
\[ W = 3\pi r^2, \qquad W(5) = 75\pi \approx 236\ \text{g}, \]
and we aim to find the error $\Delta W$ away from this intended value. Since:
\[ \frac{dW}{dr} = 3\pi(2r) = 6\pi r \quad\text{and}\quad \frac{dW}{dr}\Big|_{r=5} = 30\pi, \]
we get $\Delta W \approx 30\pi\,\Delta r = 30\pi(\pm 0.2) \approx \pm 19$ g.
The point here is not just the specific error estimate, but the formula which gives, for
any small input error $\Delta r$, the resulting output error $\Delta W \approx 30\pi\,\Delta r \approx 94\,\Delta r$. The coefficient
$30\pi$ measures the sensitivity of the output $W$ to an error in the input $r$.
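The quality of this estimate can be gauged by comparing it with the exact change in weight. The sketch below assumes the weight function $W = 3\pi r^2$ (area times 3 g/cm$^2$) and an input error of 0.2 cm in either direction; the names are illustrative.

```python
import math

def weight(r):
    # Weight in grams of a disk of radius r cm at 3 g/cm^2
    return 3 * math.pi * r**2

r0 = 5.0
sensitivity = 6 * math.pi * r0   # dW/dr at r = 5, equal to 30*pi

for dr in (0.2, -0.2):
    exact = weight(r0 + dr) - weight(r0)   # true change in weight
    approx = sensitivity * dr              # linear approximation
    print(f"dr = {dr:+.1f}: exact dW = {exact:+.2f} g, approx dW = {approx:+.2f} g")
```

The linear estimate of about 19 g differs from the exact change by less than half a gram in either direction.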
\[ dy = \frac{dy}{dx}\,dx \quad\text{and}\quad df = f'(x)\,dx. \]
The dependent variable $dy$ is called a differential: we can think of it as the linear approximation
to $\Delta y$, as pictured below:
We define an absolute minimum similarly, and both maximums and minimums
are extremums or extreme points. Note that the maximum value $M$ (the
largest possible output) is unique, but $f(x)$ could touch this value at several
input points $c_1, c_2, \ldots \in [a, b]$, all having $f(c_1) = f(c_2) = \cdots = M$.
example: At left below, the function y = f (x) on the interval [a, b] has one
absolute maximum point, the left endpoint x = a with f (a) = M , so that
(a, f (a)) is the highest point on the graph; and it has two absolute minimum
points x = c1 , c2 with f (c1 ) = f (c2 ) = N , so that (c1 , f (c1 )) and (c2 , f (c2 )) are
the lowest points on the graph.
The proof would require sophisticated Real Analysis concepts such as those
studied in Math 320. To see that the theorem is not obvious, consider the
function $y = g(x)$ graphed at right above. It is not a continuous function
because the graph has a break, so the Theorem does not guarantee an absolute
maximum; and indeed there is no absolute maximum. Instead, the function
approaches $y = 3$ as $x \to 1^-$ (i.e. $x = 1-\epsilon$ for small $\epsilon > 0$), but it never
actually reaches $y = 3$ because it suddenly drops to $g(1) = 2$. Thus, for any
given output $g(c)$, we can find some slightly larger output $g(1-\epsilon) > g(c)$ for
a very tiny $\epsilon > 0$, so no $g(c)$ is largest. The function does, however, have the
absolute min point $x = 2$.
(The Latin plurals of maximum, minimum, extremum are maxima, minima, extrema.)
Local maxima and minima. A broader, but still useful, concept is that of
a local extremum: this is a point where the graph has a hill or valley, but not
necessarily the highest or lowest one.
This could be proven using the Linear Approximation Theorem at the end of
Notes 2.9.
example: We wish to find the maxima and minima, both local and absolute,
of $f(x) = x^3 - x + 1$ on the interval $x \in [-1, \frac32]$. Since $f(x)$ is continuous (by
the Limit Laws), the Extremal Value Theorem guarantees there is at least one
of each type of point.
Exactly where are the hill-top and the valley-bottom points? Since $f(x)$ is
differentiable at every point, the First Derivative Theorem means that all local
maximum and minimum points must be solutions of $f'(x) = 0$, namely
$3x^2 - 1 = 0$, or $x = \pm\frac{1}{\sqrt3} \approx \pm 0.58$. The graph shows that the local maxima
are the hill-top $x = -\frac{1}{\sqrt3}$ and the right endpoint $x = \frac32$, and the one with the
larger output is the absolute maximum: $f(-\frac{1}{\sqrt3}) \approx 1.4 < f(\frac32) \approx 2.9$, so the
endpoint $x = \frac32$ is the absolute maximum point. Similarly, the local minima
are $x = -1$ and $x = \frac{1}{\sqrt3}$ with $f(-1) = 1 > f(\frac{1}{\sqrt3}) \approx 0.61$, so $x = \frac{1}{\sqrt3}$ has the
smaller output and is the absolute minimum point.
Critical points. The above example illustrates the method for identifying
all relevant candidates for the absolute maximum and minimum: the end-
points and the points where the derivative vanishes, and also possibly where
the derivative is not defined because the graph has a corner or a discontinuity.
1. Given $f(x)$ on an interval $x \in [a, b]$, determine the critical points (critical
numbers) $x = c$ such that $f'(c) = 0$ or undefined. Be sure to consider
only those $c \in [a, b]$, discarding any critical points outside the relevant
interval.
2. If f (x) is continuous, find f (x) for all critical points x = c and for the
endpoints x = a, b. Those points with the largest output are the absolute
maximum points, and those with smallest values are the absolute minima.
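The two steps above can be sketched in code for the earlier example $f(x) = x^3 - x + 1$ on $[-1, \frac32]$. This is an illustration only: the critical points are entered by hand from solving $3x^2 - 1 = 0$, and the variable names are choices made here.

```python
import math

def f(x):
    return x**3 - x + 1

a, b = -1.0, 1.5

# Step 1: critical points where f'(x) = 3x^2 - 1 = 0, kept only if inside [a, b]
critical = [c for c in (-1 / math.sqrt(3), 1 / math.sqrt(3)) if a <= c <= b]

# Step 2: compare outputs at the endpoints and the critical points
candidates = [a, b] + critical
abs_max = max(candidates, key=f)
abs_min = min(candidates, key=f)

for x in sorted(candidates):
    print(f"x = {x:+.4f}   f(x) = {f(x):.4f}")
print("absolute maximum point:", abs_max)
print("absolute minimum point:", abs_min)
```

The output should pick the right endpoint $x = \frac32$ as the absolute maximum point and $x = \frac{1}{\sqrt3}$ as the absolute minimum point.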
Since $\operatorname{sgn}(\ )$ is never zero, we have $f'(x) = 0$ when the second factor vanishes:
$4x + 2 = 0$, or $x = -\frac12$.
But this is not the only critical point, since we must also consider when $f'(x)$
is undefined. This happens when the first factor $\operatorname{sgn}(2x^2 + 2x - 1)$ is undefined,
namely when $2x^2 + 2x - 1 = 0$, or $x = \frac14(-2 \pm \sqrt{12})$ by the Quadratic Formula.
These are the corners of the graph sitting on the $x$-axis: we must not skip
them, since they are actually the absolute minimum points.
example: Let $f(x) = x^2 + \frac{1}{(x-1)^2}$ on the interval $x \in [-2, 2]$, graph above
right, with:
\[
f'(x) = \left(x^2 + (x-1)^{-2}\right)' = 2x + (-2)(x-1)^{-3}(x-1)' = \frac{2(x^4 - 3x^3 + 3x^2 - x - 1)}{(x-1)^3}.
\]
Vanishing derivatives. We will prove some basic theorems which relate the
derivative of a function with the values of the function, culminating in the
Uniqueness Theorem at the end. The first result is:
Rolle's Theorem: If $f(x)$ is continuous on a closed interval $x \in [a, b]$
and differentiable on the open interval $x \in (a, b)$, and $f(a) = f(b)$,
then there is some point $c \in (a, b)$ with $f'(c) = 0$.
Here $x \in [a, b]$ means $a \le x \le b$, and $x \in (a, b)$ means $a < x < b$. See the graph
at left for an example: no matter how the curve wiggles, it must be horizontal
somewhere.
Note that Rolle's Theorem is the special case of MVT in which the secant line
is horizontal. In fact, we will prove MVT for a general $f(x)$ by cooking up a
new function $g(x)$ for which Rolle's Theorem applies, then translating Rolle's
conclusion back in terms of $f(x)$.
Proof of MVT. Suppose $f(x)$ satisfies the hypotheses. Then define a new function
$g(x)$, shown in the picture, which measures the height from the graph
$y = f(x)$ down to the secant line $y = f(a) + \frac{f(b)-f(a)}{b-a}(x-a)$:
\[ g(x) = f(x) - f(a) - \frac{f(b)-f(a)}{b-a}(x-a). \]
Then $g(x)$ is continuous on $[a, b]$ by the Limit Laws (1.6), and differentiable
on $(a, b)$ by the Derivative Rules (2.3). In fact,
\[ g'(x) = f'(x) - 0 - \frac{f(b)-f(a)}{b-a}(1-0) = f'(x) - \frac{f(b)-f(a)}{b-a}. \]
Also $g(a) = 0 = g(b)$, so Rolle's Theorem gives some $c \in (a, b)$ with $g'(c) = 0$,
which means $f'(c) = \frac{f(b)-f(a)}{b-a}$, Q.E.D.
The Mean Value Theorem does not give any way to find the particular $c \in (a, b)$
in the conclusion, so if we want this value in a particular case, we must solve
for $x$ in the equation $f'(x) = \frac{f(b)-f(a)}{b-a}$; however the Theorem will guarantee
that there is some solution.
example: Let $f(x) = 5\sqrt{x} - x\sqrt{x}$ over the interval $[a, b] = [0, 4]$.
To check the hypotheses of MVT, note that $\sqrt{x}$ is continuous for all $x \ge 0$, and
thus over $[0, 4]$. As for differentiability:
\[ f'(x) = \left(5x^{1/2} - x^{3/2}\right)' = \tfrac52 x^{-1/2} - \tfrac32 x^{1/2} \]
is defined for $x > 0$, and hence over $x \in (0, 4)$: the hypothesis allows $f'(a) =
f'(0)$ to be undefined. Thus we conclude there must be some $c \in (0, 4)$ with
$f'(c) = \frac{f(b)-f(a)}{b-a} = \frac{2-0}{4-0} = \frac12$. That is, we must solve:
\[ \tfrac52 c^{-1/2} - \tfrac32 c^{1/2} = \tfrac12. \]
Proof. (a) Assume the hypothesis $f'(x) = 0$ for all $x \in (a, b)$, and imagine,
contrary to the conclusion, that $f(x)$ were not a constant function. Then we
would have two unequal values $f(a_1) \ne f(b_1)$ for some $a_1, b_1 \in [a, b]$, and we
could apply the Mean Value Theorem to the smaller interval $[a_1, b_1]$ to get
$f'(c) = \frac{f(b_1)-f(a_1)}{b_1-a_1} \ne 0$, since $f(b_1) - f(a_1) \ne 0$. But this would be impossible,
since we assumed $f'(c) = 0$ for all $c \in (a, b)$. Hence, $f(x)$ cannot be non-constant,
and must be constant.
(b) Assume the hypothesis $f'(x) = g'(x)$ for all $x \in (a, b)$. Now the function
$h(x) = f(x) - g(x)$ has $h'(x) = f'(x) - g'(x) = 0$, so we can apply part (a) to
conclude that $h(x)$ is a constant function, $h(x) = f(x) - g(x) = C$, meaning
$f(x) = g(x) + C$ for some constant $C$.
(c) In the situation of (b), we also assume $f(c) = g(c)$. By (b), we know
$f(x) = g(x) + C$ and $C = f(x) - g(x)$ for all $x$. In particular for $x = c$, we have
$C = f(c) - g(c) = 0$, so $f(x) = g(x) + C = g(x)$, Q.E.D.
To see the significance of this theorem, recall from 2.7 the Ballistic Equation
$s(t) = s_0 + v_0 t - \frac12 gt^2$, which gives the height $s(t)$ of an object thrown
straight up from initial height $s_0$ at initial velocity $v_0$, under the influence
of constant gravitational acceleration $g$. We verified that the derivative
$s'(t) = v_0 - gt$ gives the expected velocity: decreasing from $v_0$ at a constant
rate of $g$.
But does this guarantee we have the correct function $s(t)$? What if there
were some other function $\tilde{s}(t)$ with the same derivative $\tilde{s}'(t) = s'(t)$ and the
same initial value $\tilde{s}(0) = s(0)$? Then $\tilde{s}(t)$ would be just as good a candidate to
give the height of the object, and our mathematical theory would not produce
a clear physical prediction. However, the Uniqueness Theorem (c) shows that
$\tilde{s}(t) = s(t)$, so the other solution could only be the same as the original solution.
Experiment shows that objects launched in exactly the same way always fly
the same way, not according to $s(t)$ in some experiments and a different $\tilde{s}(t)$ in
other experiments. This is what we mean by physical law. Our Theorem shows
that the mathematical solution has the same uniqueness as the experimental
result.
The theory of quantum mechanics, however, which explains atomic-scale phenomena,
goes beyond the framework of deterministic laws, incorporating randomness in an essential
way. It requires a yet higher mathematical theory, in which we apply calculus not to specific
positions, but to probability distributions on all possible positions.
Math 132 Derivatives and Graphs Stewart 3.3
Increasing and decreasing functions. We will see how to determine the important
features of a graph $y = f(x)$ from the derivatives $f'(x)$ and $f''(x)$, summarizing
our Method on the last page. First, we consider where the graph is rising
↗ and falling ↘. Formally:
We can determine this with derivatives: the graph rises where its slope is positive.
Proof. Assume the hypothesis that $f(x)$ is continuous on $[a, b]$ and $f'(x) > 0$, but
imagine, contrary to the conclusion, that $f(x)$ failed to be increasing. Then, negating
the definition, there would be some $x_1 < x_2$ with $f(x_1) \geq f(x_2)$. Applying
the Mean Value Theorem to the interval $[x_1, x_2]$, we would get some $x_3 \in (x_1, x_2)$
with
$$f'(x_3) = \frac{f(x_2) - f(x_1)}{x_2 - x_1} \leq 0,$$
since $f(x_2) - f(x_1) \leq 0$. But this would be impossible, since we assumed $f'(x) > 0$
for all $x \in (a, b)$, and hence for all $x$ in the smaller interval $(x_1, x_2)$. Thus,
$f(x_1) \geq f(x_2)$ is impossible, and we must have $f(x_1) < f(x_2)$ for all $x_1 < x_2$. The
second statement of the Theorem is proved similarly. Q.E.D.
example: For $f(x) = x^5 - 15x^3$, let us determine the rough shape of the graph
by examining the derivative:
$$f'(x) = 5x^4 - 45x^2 = 5x^2(x+3)(x-3).$$
Since $f'(x)$ is defined everywhere, the critical points (or critical numbers) are the
solutions of $f'(x) = 0$, namely $x = -3, 0, 3$.
x              -3           0           3
f'(x)    +      0     -     0     -     0      +
f(x)     ↗     162    ↘     0     ↘   -162     ↗
Since $f'(x)$ is zero only at the critical points, it is all positive or all negative in
each interval between. For example, in the leftmost interval $(-\infty, -3)$, a sample
value is $f'(-4) = 560 > 0$, so $f'(x)$ is positive in the whole interval, and we put +
in the first column next to $f'(x)$. The rest of the $f'(x)$ row is similar.
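The sign-chart step above can be sketched numerically. The following is a minimal illustration, not part of the Method itself: the sample points -4, -1, 1, 4 are our own arbitrary choices (any point inside each interval would do), and the helper name `sign` is hypothetical.

```python
def f_prime(x):
    # Derivative of f(x) = x^5 - 15x^3.
    return 5 * x**4 - 45 * x**2

def sign(value):
    # Report the sign of a number as a chart symbol.
    if value > 0:
        return "+"
    if value < 0:
        return "-"
    return "0"

# One sample point inside each interval cut out by the critical points -3, 0, 3.
# These particular samples are an arbitrary choice made for illustration.
sample_points = [-4, -1, 1, 4]
signs = [sign(f_prime(x)) for x in sample_points]

for x, s in zip(sample_points, signs):
    print(f"f'({x}) = {f_prime(x)}, sign {s}")
```

The four signs reproduce the $f'(x)$ row of the chart: positive, negative, negative, positive.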
What does this mean for the graph $y = f(x)$? From §3.1, we know the critical
points are candidates for local max/mins: hill tops or valley bottoms. Which is
which? To the left of $x = -3$, we have $f'(x) > 0$ so $f(x)$ is increasing ↗; to the
right, we have $f'(x) < 0$ so $f(x)$ is decreasing ↘. Evidently, $x = -3$ is a local max,
and $(-3, 162)$ is a hill top point of the graph. Similarly, $(3, -162)$ is a valley.
On the other hand, to the left and right of $x = 0$, we have $f'(x) < 0$, so
$f(x)$ is decreasing on both sides: this means $x = 0$ is a stationary point where the
graph levels out before continuing to descend. We get a good picture of the graph:
In terms of the slope, concave up means that as x increases, the slope becomes
less negative or more positive. For concave down, the slope becomes less positive
or more negative.
Definition: Suppose the derivative $f'(x)$ is defined for $x$ near $c$.
$f(x)$ is concave up at $x = c$ if $f'(x)$ is increasing near $x = c$.
$f(x)$ is concave down at $x = c$ if $f'(x)$ is decreasing near $x = c$.
$f(x)$ has an inflection point at $x = c$ if $f'(x)$ has a local max or
local min at $x = c$.
Also note that $f(x) = x^5 - 15x^3$ has only odd powers of $x$, so $f(-x) = -f(x)$. This means
the graph has a 180° rotation symmetry, like a propeller. Such an $f(x)$ is called an odd function.
We can test for concavity using the second derivative f 00 (x):
(I wrote double $- -$ and $+ +$ just to make frowny and smiley faces: this is a good
way to remember which is which.) This agrees with the features of our graph
above, and it allows us to precisely determine the inflection points marked by
small diamonds in the picture: $(-\tfrac32\sqrt2,\ \tfrac{567}{8}\sqrt2)$, $(\tfrac32\sqrt2,\ -\tfrac{567}{8}\sqrt2)$, and $(0, 0)$, which
is both a stationary critical point and an inflection point.
Critical Points and Concavity. There is one more use we can make of the
second derivative. At a local max $x = c$, the slope changes from positive to
negative, so the graph is concave down and $f''(c) < 0$; while at a local min it is
concave up and $f''(c) > 0$. Thus, we can distinguish extremal points just from
the sign of $f''(c)$.
Indeed, in our example, we have $f''(-3) = -270 < 0$ at the local max; $f''(0) = 0$
at the stationary point; and $f''(3) = 270 > 0$ at the local min.
Example. We will graph $f(x) = \dfrac{x^{2/3}}{(x-1)^2}$, going through the Method steps on
the last page.
1. Using the Quotient and Chain Rules, and much simplification, we get:
$$f'(x) = \frac{\tfrac23 x^{-1/3}(x-1)^2 - x^{2/3}\cdot 2(x-1)^1 (x-1)'}{(x-1)^4} = \frac{-\tfrac23(2x+1)}{(x-1)^3\, x^{1/3}},$$
$$f''(x) = \frac{\tfrac29(14x^2 + 14x - 1)}{(x-1)^4\, x^{4/3}}.$$
2. The two types of critical points are solutions of:
4. The inflection points are solutions of $f''(x) = 0$, when the numerator is zero:
$$14x^2 + 14x - 1 = 0 \iff x = \frac{-7 \pm 3\sqrt7}{14} \approx -1.07,\ 0.07.$$
Vertical asymptotes. We say a curve has a line as an asymptote if, as the curve
runs outward to infinity, it gets closer and closer to the line. Closer and closer
reminds us of limits, and indeed we have seen that x = a is a vertical asymptote
of y = f (x) whenever one of the following holds:
As we saw in §1.5, $\infty$ has no meaning by itself; rather, the whole equation means
that, as $x$ gets closer to (but unequal to) $a$, the output $f(x)$ eventually becomes
higher than any given bound $B$, such as $B = 100$ or 1000 or 1 billion. Similarly,
a limit equals $-\infty$ when $f(x)$ becomes lower than $-B$ for any large $B$.
At the end of §3.3, we saw how a sign chart for $f'(x)$ can classify vertical
asymptotes. We could do this with a sign chart for $f(x)$ itself, with no derivatives.
example: Let:
$$f(x) = \frac{x^2 - 6x + 9}{x^3 - 6x^2 + 11x - 6} = \frac{(x-3)^2}{(x-1)(x-2)(x-3)} = \frac{x-3}{(x-1)(x-2)}.$$
(To determine vertical asymptotes and intercepts, we always want $f(x)$ in factored
form.) In the original form, the denominator vanishes at $x = 3$, but we work with
the cancelled form at right.
The function can only change its sign at points where $f(x) = 0$ (numerator = 0)
or $f(x)$ is not defined (denominator = 0), that is, $x = 1, 2, 3$. In the interval
$x \in (-\infty, 1)$, the sign is given by a sample point like $f(0) = \frac{-3}{(-1)(-2)} = -\tfrac32 < 0$,
so $f(x)$ is negative; and similarly for the other intervals.
x              1           2           3
f(x)     -     ∞     +     ∞     -     0     +
Each time $x$ passes one of the sign-change candidates $x = a$, a factor $(x-a)$ changes
from negative to positive, and $f(x)$ does indeed change sign.
To factor the bottom, we try linear factors $x - \frac{m}{n}$, where $m$ is an integer factor
of the constant coefficient $-6$, and $n$ is an integer factor of the highest coefficient
1, so $m = \pm1, \pm2, \pm3, \pm6$ and $n = \pm1$. Trying $\frac{m}{n} = 1$, we find $x-1$ is a factor,
since polynomial long division gives $x^3 - 6x^2 + 11x - 6 = (x-1)(x^2 - 5x + 6)$, and the
quadratic is easy to factor. For a review of polynomial long division, see Khan Academy:
www.khanacademy.org/math/algebra2/polynomial and rational/dividing polynomials/v/polynomial-division.
Here $f(x) = \infty$ just means the denominator vanishes and there is a vertical
asymptote. The signs on each side of the asymptote show whether the graph
shoots upward or downward: we have $\lim_{x\to1^-} f(x) = -\infty$, $\lim_{x\to1^+} f(x) = \infty$,
$\lim_{x\to2^-} f(x) = \infty$, $\lim_{x\to2^+} f(x) = -\infty$.
Horizontal asymptotes. To understand the behavior of the graph over the left
and right ends of the x-axis, we will need a new kind of limit in which $x$ becomes
larger and larger.
Definition:
Graphically, $\lim_{x\to\infty} f(x) = L$ means that toward the right of the x-axis, the
graph $y = f(x)$ approaches the horizontal asymptote $y = L$; and similarly for
$\lim_{x\to-\infty} f(x) = L$ toward the left. We can even have $\lim_{x\to\infty} f(x) = \infty$, which
means that the graph goes off toward the upper right of the xy-plane in an
unspecified way.
The most basic $x \to \infty$ limits are the power functions: for a positive real
number power $p > 0$, we have:
$$\lim_{x\to\infty} x^p = \infty, \qquad \lim_{x\to\infty} \frac{1}{x^p} = 0.$$
For $x \to -\infty$, consider the rational power $p = \frac{m}{n}$ where $m, n$ are positive integers
with $n$ odd (perhaps $n = 1$); then:
$$\lim_{x\to-\infty} x^{m/n} = \begin{cases} \infty & \text{for } m \text{ even} \\ -\infty & \text{for } m \text{ odd,} \end{cases} \qquad \lim_{x\to-\infty} \frac{1}{x^{m/n}} = 0.$$
Proof: For any large bound $C$, we can force $x^p > C$ if we take $x$ so large that $x > C^{1/p}$. For
any small error tolerance $\epsilon > 0$, we can force $|\frac{1}{x^p} - 0| < \epsilon$ if we take $x$ so large that $x > (\frac{1}{\epsilon})^{1/p}$.
For example:
Based on these, we can deduce the horizontal asymptotes for any rational function
(quotient of polynomials).
example: Continuing $f(x) = \frac{x^2 - 6x + 9}{x^3 - 6x^2 + 11x - 6}$, does $y = f(x)$ have a horizontal
asymptote? Informally, we can reason as follows. For large $x$ (positive or negative),
the value of $x^2 - 6x + 9$ is relatively close to $x^2$: say for $x = 1000$, compare
$x^2 - 6x + 9 = 994{,}009$ and $x^2 = 1{,}000{,}000$. Thus we can approximate
$x^2 - 6x + 9 \approx x^2$, which we call the highest term of the polynomial. Also doing
this for the denominator:
$$f(x) = \frac{x^2 - 6x + 9}{x^3 - 6x^2 + 11x - 6} \approx \frac{x^2}{x^3} \quad \text{for large } x.$$
Thus, $\lim_{x\to\pm\infty} f(x) = \lim_{x\to\pm\infty} \frac{x^2}{x^3} = \lim_{x\to\pm\infty} \frac{1}{x} = 0$, and $y = f(x)$ has the horizontal
asymptote $y = 0$ for $x \to \infty$ and $x \to -\infty$. In the graph we drew previously, the
left and right ends do indeed approach the x-axis.
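Formally, we can show this from the Limit Laws by dividing numerator and
denominator by the highest term in the denominator:
$$\lim_{x\to\pm\infty} f(x) = \lim_{x\to\pm\infty} \frac{x^2 - 6x + 9}{x^3 - 6x^2 + 11x - 6} = \lim_{x\to\pm\infty} \frac{(x^2 - 6x + 9)\cdot\frac{1}{x^3}}{(x^3 - 6x^2 + 11x - 6)\cdot\frac{1}{x^3}}$$
$$= \lim_{x\to\pm\infty} \frac{\frac{1}{x} - \frac{6}{x^2} + \frac{9}{x^3}}{1 - \frac{6}{x} + \frac{11}{x^2} - \frac{6}{x^3}} = \frac{0 - 6(0) + 9(0)}{1 - 6(0) + 11(0) - 6(0)} = 0.$$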
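As a numerical sanity check of this limit (an illustration only, not a substitute for the formal argument), we can tabulate $f(x)$ next to the informal approximation $\frac{1}{x}$ for a few large inputs. The particular inputs below are arbitrary choices.

```python
def f(x):
    # The rational function from the example.
    return (x**2 - 6*x + 9) / (x**3 - 6*x**2 + 11*x - 6)

# Large positive and negative inputs, chosen arbitrarily for illustration.
inputs = [10.0, 1e3, 1e6, -1e3, -1e6]
for x in inputs:
    # f(x) should track the highest-term approximation x^2 / x^3 = 1/x,
    # and both should shrink toward 0.
    print(f"x = {x:>10}: f(x) = {f(x): .3e}, 1/x = {1/x: .3e}")
```

The two columns agree more and more closely as $|x|$ grows, and both tend to 0.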
Warning: The informal argument is the easiest way to understand these limits,
but the formal argument (dividing by the highest term) might be required for full
credit on a quiz or test.
example: For $f(x) = \frac{3x^2 - x + 9}{5x^2 + 2x - 6}$, we take highest terms to get:
$$\lim_{x\to\infty} f(x) = \lim_{x\to\infty} \frac{3x^2 - x + 9}{5x^2 + 2x - 6} = \lim_{x\to\infty} \frac{3x^2}{5x^2} = \frac{3}{5}.$$
Thus, $y = f(x)$ has horizontal asymptote $y = \frac35$ toward the right. We similarly
deduce $\lim_{x\to-\infty} f(x) = \frac35$, which means the same horizontal asymptote toward
the left.
example: For
x2 + 3x7/2 x5
f (x) = ,
9x x + 4x2 x
the terms in the denominator are $9x \cdot x^{1/2} = 9x^{3/2}$ and $4x^2 \cdot x^{1/2} = 4x^{5/2}$, so the
second is the highest term. Thus:
x2 + 3x7/2 x5 3x7/2
lim f (x) = lim = lim
x x 9x x + 4x2 x x 4x5/2
$$= \lim_{x\to\infty} \tfrac34\, x^{7/2 - 5/2} = \lim_{x\to\infty} \tfrac34\, x = \infty,$$
$$f(x) = \frac{x^3 - 6x^2 + 11x - 6}{2x^2 - 8x}.$$
Recall from §3.3 that to find the large-scale behavior of $f(x)$ as $x \to \pm\infty$, we can
approximate by the highest term in numerator and denominator: $f(x) \approx \frac{x^3}{2x^2} = \frac12 x$.
Thus, the right and left ends of the graph look like lines with slope $\frac12$.
However, the graph does not actually approach the line $y = \frac12 x$: there is
a vertical shift, $y = \frac12 x + b$. To approximate better, and find the exact slant
asymptote of $y = f(x)$, we perform polynomial long division:
                (1/2)x - 1        rem 3x - 6
2x^2 - 8x  )  x^3 - 6x^2 + 11x - 6
            -(x^3 - 4x^2)
                   -2x^2 + 11x - 6
                 -(-2x^2 +  8x)
                            3x - 6
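This means:
$$x^3 - 6x^2 + 11x - 6 = (\tfrac12 x - 1)(2x^2 - 8x) + (3x - 6),$$
so that:
$$f(x) = \frac{(\tfrac12 x - 1)(2x^2 - 8x) + (3x - 6)}{2x^2 - 8x} = \tfrac12 x - 1 + \frac{3x - 6}{2x^2 - 8x}.$$
That is, we have the approximation $f(x) \approx \tfrac12 x - 1$ with error term $\frac{3x-6}{2x^2-8x}$; but
this term gets vanishingly small:
$$\lim_{x\to\pm\infty} \frac{3x - 6}{2x^2 - 8x} = \lim_{x\to\pm\infty} \frac{3x}{2x^2} = 0.$$
(In general, a line $y = mx + b$ is a slant asymptote when the difference between the graph
and the line vanishes as $x$ gets large: $\lim_{x\to\pm\infty} [f(x) - (mx+b)] = 0$.)
That is, as $x$ gets larger and larger, the error term gets smaller and smaller, and
the graph $y = f(x)$ gets closer and closer to the line $y = \tfrac12 x - 1$. This is what we
mean by a slant asymptote.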
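The long division above can be mechanized. Here is a minimal sketch, assuming polynomials are stored as coefficient lists with the highest power first; the function name `poly_divmod` and this representation are our own choices for illustration, not anything from Stewart or these Notes.

```python
from fractions import Fraction

def poly_divmod(num, den):
    """Divide polynomials given as coefficient lists, highest power first.

    Returns (quotient, remainder) as lists of exact fractions.
    """
    num = [Fraction(c) for c in num]
    den = [Fraction(c) for c in den]
    quotient = []
    while len(num) >= len(den):
        # Match the leading terms, as in one step of long division.
        factor = num[0] / den[0]
        quotient.append(factor)
        # Subtract factor * den, aligned at the leading term.
        for i, c in enumerate(den):
            num[i] -= factor * c
        num.pop(0)  # the leading coefficient is now zero
    return quotient, num

# x^3 - 6x^2 + 11x - 6 divided by 2x^2 - 8x
q, r = poly_divmod([1, -6, 11, -6], [2, -8, 0])
print("quotient coefficients:", q)
print("remainder coefficients:", r)
```

The quotient coefficients $\tfrac12, -1$ give the slant asymptote $y = \tfrac12 x - 1$, and the remainder coefficients $3, -6$ give the numerator of the error term.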
For a general rational function $f(x) = \frac{g(x)}{h(x)}$, a quotient of polynomials $g(x), h(x)$,
we use polynomial long division to get $g(x) = q(x)h(x) + r(x)$ for a quotient polynomial
$q(x)$ and a remainder polynomial $r(x)$ having lower powers of $x$ than $h(x)$.
Thus:
$$f(x) = \frac{g(x)}{h(x)} = \frac{q(x)h(x) + r(x)}{h(x)} = q(x) + \frac{r(x)}{h(x)}.$$
Since the numerator $r(x)$ has lower degree than the denominator $h(x)$, we have
$\lim_{x\to\pm\infty} \frac{r(x)}{h(x)} = 0$, and $y = f(x)$ gets closer and closer to the curve $y = q(x)$. If
$q(x) = mx + b$, then $y = mx + b$ is a slant asymptote; otherwise, $y = q(x)$ is an
asymptotic curve of $y = f(x)$.
Rational function example. Referring to the Method for Graphing at the end
of this section, we apply the steps to the above function:
1. We have:
$$f'(x) = \frac{x^4 - 8x^3 + 13x^2 + 12x - 24}{2x^2 (x-4)^2}, \qquad f''(x) = \frac{3(x^3 - 6x^2 + 24x - 32)}{x^3 (x-4)^3}.$$
x (, 1) (1, 3) (3, ).
               a1                   a2             a3                  a4
x            -1.26        0        1.39           2.61        4       5.26
f'(x)    +     0     -        -      0      +      0     -        -     0     +
f(x)     ↗   -2.37   ↘        ↘   -0.05     ↗    0.05    ↘        ↘   2.37    ↗
              max        asym       min           max        asym      min
7. This function does not have any of the standard symmetries in the Method.
However, the graph reveals a 180° rotation symmetry around the point $(2, 0)$.
This is equivalent to the equation $f(4-x) = -f(x)$, which can be shown from
the factored form.
x            -5π/3          -π/3          π/3          5π/3          7π/3          11π/3
s'(x)    -     0      +      0      -      0     +      0      -      0      +       0      -
s(x)     ↘  -5π/3-√3  ↗   -π/3+√3   ↘   π/3-√3   ↗   5π/3+√3   ↘   7π/3-√3   ↗   11π/3+√3   ↘
              min            max          min           max           min            max
4. The inflection points are solutions of $s''(x) = 2\sin(x) = 0$, or $x = n\pi$ for any
integer $n$. Every multiple of $\pi$ is an inflection point of $y = s(x)$.
5. The point $(0, s(0)) = (0, 0)$ is an x- and y-intercept. From the graph, we can
see that there are two more x-intercepts, but we have no way to find them
exactly. (We can approximate by Newton's Method, §3.8.)
It appears that the square with length and width $\ell = w = 10$ gives the maximum
area $A = \ell w = 100$ m$^2$. To prove this algebraically, we note that the perimeter is
constant, $P = 2\ell + 2w = 40$; so the length determines the width and also the area:
1. Draw a picture labeled with numerical constant values and with letters for
varying quantities, including the target variable which is to be maximized
or minimized.
2. Write equations relating variables according to the geometry of the picture.
3. Choose one of the non-target variables as the controlling variable, and write
all other variables in terms of it by solving the above equations. Also de-
termine the relevant domain of the controlling variable, which is usually
restricted by requiring all lengths to be positive.
4. Find the absolute maximum or minimum of the target variable over the relevant
interval, say $T = T(x)$ over $x \in [a, b]$. That is, solve $T'(x) = 0$ or undefined,
to find the critical points $x = c_1, c_2, \ldots$, as well as the endpoints $x = a, b$.
Take the output values $T(x)$ at these candidate points: the largest/smallest
value is the desired maximum/minimum.
5. If needed, find values of the other variables at the optimum. Make sure the
answer is physically plausible to check for mistakes.
Bucket example. Consider a 10-quart bucket with cylindrical sides and a circular
bottom. What radius and height will minimize the surface area of the sides and
bottom?
1. The target variable to be minimized is surface area $S$ (in square inches). The
other variables are radius $r$ (inches) and height $h$ (inches). The constant
volume is $V = 10$ quarts; to make this comparable to the other variables,
we must convert to $V = 577.5$ cubic inches.
2. Equations. The volume $V$ is the base area $\pi r^2$ times the height $h$. For the
surface $S$: the sides, if unrolled, form a rectangle with the same height $h$ as
the cylinder, and width equal to the perimeter of the bottom, $2\pi r$; and we
also add the bottom area $\pi r^2$. Thus:
$$V = \pi r^2 h, \qquad S = 2\pi r h + \pi r^2.$$
Ants example. A line of ants marches across a 10cm × 10cm square of carpet
from the lower left to the upper right corner. Part of their path is along the edge
next to the carpet, where their speed is 1 cm/sec, and part diagonally across the
carpet, where their speed is $\frac12$ cm/sec. What path should they take along the edge
before entering the carpet, so as to minimize (a) the total distance; and (b) the
total travel time?
1. Some variables are $e$, the distance traveled along the edge, and $c$, the distance
traveled across the carpet. The target variable to be minimized in each part
is: (a) the total distance $D$ in cm; and (b) the total time $T$ in sec.
2. Equations:
$$c^2 = 10^2 + (10-e)^2, \qquad D = e + c.$$
Also, we know speed × time = distance, so time = distance/speed. The
travel time along the edge is $e/1 = e$, along the carpet $c/\frac12 = 2c$, with total:
$$T = e + 2c.$$
3. The obvious controlling variable is $e$, since we can easily write the other
variables in terms of it, including the target variables:
$$c = \sqrt{10^2 + (10-e)^2} = \sqrt{200 - 20e + e^2},$$
$$D = e + \sqrt{200 - 20e + e^2}, \qquad T = e + 2\sqrt{200 - 20e + e^2}.$$
The relevant domain is $e \in [0, 10]$.
4. For part (a), the critical points are given by:
$$\frac{dD}{de} = 1 + \tfrac12 (200 - 20e + e^2)^{-1/2} (200 - 20e + e^2)' = 1 - \frac{10 - e}{\sqrt{200 - 20e + e^2}} = 0,$$
which reduces to $\sqrt{200 - 20e + e^2} = 10 - e$, then to $200 - 20e + e^2 = (10-e)^2$,
which cancels out to give the impossible equation $200 = 100$. Thus, there are
no critical points, and the absolute minimum must be one of the endpoints.
Since $D(0) = 10\sqrt2 \approx 14.1 < D(10) = 20$, the minimum is at $e = 0$.
For part (b), the critical points are given by:
$$\frac{dT}{de} = 1 - \frac{2(10-e)}{\sqrt{200 - 20e + e^2}} = 0 \implies \sqrt{200 - 20e + e^2} = 20 - 2e$$
$$\implies 200 - 20e + e^2 = (20 - 2e)^2 \implies 3e^2 - 60e + 200 = 0.$$
The Quadratic Formula then gives:
$$e = \frac{60 \pm \sqrt{60^2 - 4(3)(200)}}{2(3)} = 10 \pm \tfrac{10}{3}\sqrt3 \approx 4.2,\ 15.8.$$
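To finish step 4 of the Method for part (b), we compare the travel time at the candidate points inside the relevant domain $[0, 10]$: the two endpoints and the one critical point $e = 10 - \tfrac{10}{3}\sqrt3$ (the other root, about 15.8, lies outside the domain). The following is a small numerical sketch of that comparison; the function name `travel_time` is our own.

```python
import math

def travel_time(e):
    # Time along the edge at 1 cm/sec plus time across the carpet at 1/2 cm/sec.
    carpet = math.sqrt(10**2 + (10 - e)**2)
    return e / 1 + carpet / 0.5

# Candidate points: both endpoints and the critical point inside [0, 10].
critical = 10 - 10 * math.sqrt(3) / 3
candidates = [0.0, critical, 10.0]

for e in candidates:
    print(f"e = {e:.3f} cm: T = {travel_time(e):.3f} sec")

best = min(candidates, key=travel_time)
print(f"minimum travel time at e = {best:.3f} cm")
```

Among these candidates the critical point gives the smallest time, so in this sketch the ants should walk about 4.2 cm along the edge before entering the carpet.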
Note that $\frac{dD}{de}$ is always defined over the relevant interval $e \in [0, 10]$, since
$200 - 20e + e^2 = 10^2 + (10-e)^2 > 0$.
Math 132 Newton's Method Stewart 3.8
$$f(x) = x^3 + x - 1 = 0.$$
In this case, the best we can ask is an approximate solution, accurate to a specified
number of decimal places, and this is all we need for any practical purpose.
We can start with a computer graph of y = f (x), which is just a display of
many plotted points (x, f (x)):
the Intermediate Value Theorem (§1.8) guarantees a solution $0.6 < a < 0.7$;
thus we can improve our estimate to $a \approx 0.6$. We could add a decimal place by
checking $f(0.61), f(0.62), \ldots, f(0.69)$ to see where the values change from negative
to positive, but this is clearly very tedious and inefficient.
Newton's Method is an amazingly efficient way to refine an approximate solution
to get more and more accurate ones, until the required accuracy is reached.
Let us call our first estimate $x_1 = 0.5$. We are seeking the true solution $x = a$, the
x-intercept of $y = f(x)$. As in §2.9, let us approximate $y = f(x)$ by its tangent
line at our initial point $(x_1, f(x_1))$, namely $y = f(x_1) + f'(x_1)(x - x_1)$:
How do we know there is no other solution $x = b$? If there were, Rolle's Theorem (§3.2) says
that there would be some $x = c \in (a, b)$ with $f'(c) = 0$, namely a hill or valley of $y = f(x)$. But
$f'(x) = 3x^2 + 1 = 0$ clearly has no solutions, so $y = f(x)$ has no hills or valleys, and there cannot
exist another solution $x = b$.
You can see how the tangent line (in red) is very close to the graph near $x = x_1$, and
fairly close even near the true solution $x = a$. We cannot solve for the x-intercept
of $y = f(x)$, but we can find the x-intercept of the line, denoted $x = x_2$:
$$f(x_1) + f'(x_1)(x - x_1) = 0 \implies x = x_2 = x_1 - \frac{f(x_1)}{f'(x_1)}.$$
This solution $x_2$ is not exactly $a$, but it is closer than the initial estimate $x_1$.
Now we can iterate (green line), repeating the same computation starting with
$x_2$ instead of $x_1$. The result is:
$$x_3 = x_2 - \frac{f(x_2)}{f'(x_2)},$$
The $x_n$'s will continue as real numbers to converge closer and closer to $a$, but
since we do not see any difference in our 3 decimal places after $x_4$, there is no
point in continuing. We already have our answer within the specified accuracy:
$$x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)},$$
2. Stop once $x_n \approx x_{n+1}$ are the same up to the given accuracy. The final
approximation is $a \approx x_n$.
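The two steps above translate directly into a short loop. This is a minimal sketch under our own assumptions: the function name `newton`, the stopping test (two successive estimates agreeing when rounded to the given number of places), and the safety cap `max_steps` are illustrative choices, not part of the Method as stated.

```python
def newton(f, f_prime, x, places=3, max_steps=50):
    """Iterate x_{n+1} = x_n - f(x_n)/f'(x_n), returning all estimates."""
    history = [x]
    for _ in range(max_steps):
        x_next = x - f(x) / f_prime(x)
        history.append(x_next)
        # Stop once two successive estimates agree to the requested places.
        if round(x_next, places) == round(x, places):
            break
        x = x_next
    return history

# Our running example f(x) = x^3 + x - 1 with first estimate x_1 = 0.5.
steps = newton(lambda x: x**3 + x - 1, lambda x: 3 * x**2 + 1, 0.5)
for n, x_n in enumerate(steps, start=1):
    print(f"x_{n} = {x_n:.3f}")
```

The same function can be reused for any equation $f(x) = 0$ by passing in a different $f$ and $f'$; note that it will fail with a division by zero if some estimate lands where $f'(x_n) = 0$.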
$$\cos(x) = x.$$
Looking at the graph, we see that there is a unique solution somewhere around
$x_1 = 1$. This seems different from the previous case, since we seek the intersection
of two graphs rather than the x-intercept of a single graph; but we can simply
rewrite the equation as $f(x) = x - \cos(x) = 0$. Newton's Method gives:
$$x_{n+1} = x_n - \frac{x_n - \cos(x_n)}{1 + \sin(x_n)},$$
x1      x2      x3      x4
1.000   0.750   0.739   0.739
That is, the solution is $a \approx 0.739$ to 3 places.
Numerical roots. The number $\sqrt2$ is a known value: a calculator can immediately
tell us that $\sqrt2 = 1.41421356\ldots$. But just how does the calculator know
this? Newton's Method, that's how!
By definition, $\sqrt2$ is the solution of $x^2 = 2$, or $f(x) = x^2 - 2 = 0$. Starting
with $x_1 = 1$, the Method gives $x_{n+1} = x_n - \frac{x_n^2 - 2}{2x_n}$ and:
x1 1.00000000
x2 1.50000000
x3 1.41666667
x4 1.41421569
x5 1.41421356
x6 1.41421356
Here we see the power of the Method: with just a couple of dozen $+, -, \times, \div$
calculator operations, it converged from 0 places to 8 places of accuracy.
We could also do the Method with fractions rather than decimals to get very
accurate fractional approximations of $\sqrt2$:
x1 1
x2 3/2
x3 17/12
x4 577/408
Already $x_3 = \frac{17}{12}$ is a very good approximation, since $(\frac{17}{12})^2 = \frac{289}{144} = 2 + \frac{1}{144}$, very
close to 2. However, no fraction or finite decimal can give $\sqrt2$ exactly: it is known
to be an irrational number.
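The fractional version of the computation can be reproduced with exact rational arithmetic. The sketch below uses Python's standard `fractions` module as a stand-in for hand computation; the variable names are our own.

```python
from fractions import Fraction

# Newton's Method for f(x) = x^2 - 2, carried out in exact fractions.
x = Fraction(1)
approximations = [x]
for _ in range(3):
    x = x - (x * x - 2) / (2 * x)
    approximations.append(x)

for n, x_n in enumerate(approximations, start=1):
    # Also show how far x_n^2 is from 2, as an exact fraction.
    print(f"x_{n} = {x_n}, x_{n}^2 - 2 = {x_n * x_n - 2}")
```

Each square overshoots 2 by a rapidly shrinking fraction, but never by exactly zero, consistent with $\sqrt2$ being irrational.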
Math 132 Antiderivatives Stewart 3.9
The non-uniqueness of $F(t)$ means that the velocity alone does not determine
the height. But if we know the height at just one time, for example the initial
height $F(0) = 5$, then we can adjust the constant $C$ in a unique way to satisfy
this requirement:
$$F(0) = \tfrac13(0^3) + 2(0) + C = 5 \implies C = 5.$$
That is, $F(t) = \tfrac13 t^3 + 2t + 5$ is the unique function with $F'(t) = t^2 + 2$ and $F(0) = 5$.
We have solved an initial value problem. (See the Ballistic Equation, end of §2.7.)
Generalizing we have:
$$F(x) = 7(\tfrac14 x^4) - \tfrac{1}{5/2}\, x^{5/2} + 3(\tfrac{1}{-1}\, x^{-1}) - 4\tan(x) + C$$
$$= \tfrac74 x^4 - \tfrac25 x^2 \sqrt{x} - \tfrac{3}{x} - 4\tan(x) + C.$$
To verify this, just differentiate $F(x)$ to recover $f(x)$.
We can also reverse the Chain Rule: we know $(\sin(3x))' = \cos(3x) \cdot (3x)' =
3\cos(3x)$, so what $F(x)$ will have $F'(x) = \cos(3x)$?
$$f(x) = \cos(3x) \implies F(x) = \tfrac13 \sin(3x) + C.$$
On the other hand, the derivative of a product is NOT the product of derivatives
(§2.3), so the antiderivative of a product is NOT the product of antiderivatives.
(Similarly for quotients.) We will learn how to handle these later, but for
now, we can sometimes antidifferentiate products or quotients if we can expand
them into sums of Basic Antiderivatives.
$$f(x) = \frac{x + 4}{\sqrt{x}} = \frac{x}{\sqrt{x}} + \frac{4}{\sqrt{x}} = x^{1/2} + 4x^{-1/2}$$
$$F(x) = \tfrac{1}{3/2}\, x^{3/2} + 4(\tfrac{1}{1/2}\, x^{1/2}) + C = \tfrac23 x\sqrt{x} + 8\sqrt{x} + C.$$
example. Find the antiderivative of $f(x) = \sin^2(x)$. This is a product of $\sin(x)$
with itself, and we need to expand it somehow in terms of Basic Antiderivatives.
A clever idea: in the identity
$$\cos(2x) = \cos^2(x) - \sin^2(x) = (1 - \sin^2(x)) - \sin^2(x) = 1 - 2\sin^2(x),$$
$$v(1) = \tfrac12(1^2) + 1 + C = 3,$$
$$v(t) = s'(t) = \tfrac12 t^2 + t + \tfrac32.$$
where $D$ is another constant (different from the previous $C$). Again, we can solve:
Geometric: $f'(a)$ is the slope of the graph $y = f(x)$ near the point $(a, f(a))$,
or the slope of the tangent line at that point, $y = f(a) + f'(a)(x - a)$.
Numerical: Approximate by the difference quotient, $f'(a) \approx \frac{\Delta f}{\Delta x} = \frac{f(x) - f(a)}{x - a}$
for $x$ near $a$. This gives the linear approximation $f(x) \approx f(a) + f'(a)(x - a)$.
Algebraic: Defining $f'(a) = \lim_{x\to a} \frac{f(x) - f(a)}{x - a} = \lim_{h\to0} \frac{f(a+h) - f(a)}{h}$, we prove Basic
Derivatives and Derivative Rules to find $f'(x)$ for any formula $f(x)$.
Problems usually originate on the physical or geometric levels, then we translate
them to the numerical or algebraic levels to solve them. For example, to find
the hill tops of a given curve $y = f(x)$, a geometric problem, we consider that
they must have horizontal tangents, so we take the derivative $f'(x)$ and solve for
the critical points $f'(x) = 0$ algebraically, or numerically with Newton's Method.
In the previous chapter §3.9, we introduced the reverse of the derivative, the
antiderivative. In this chapter, we will see that it has all the above levels of
meaning, and connecting them will allow us to solve many new problems.
$$0.0 < 0.1 < 0.2 < \cdots < 1.8 < 1.9 < 2.0.$$
We approximate a distance increment during each time increment, and add these
up to get the total distance traveled:
$$s(2) \approx v(0.1)\Delta t + v(0.2)\Delta t + \cdots + v(1.9)\Delta t + v(2.0)\Delta t$$
$$= (0.1)^2(0.1) + (0.2)^2(0.1) + \cdots + (1.9)^2(0.1) + (2.0)^2(0.1) \approx 2.9.$$
Here we sample the velocity $v(t)$ at the end of each increment: for example, the
first sample point is $t = 0.1$, the right endpoint of $t \in [0.0, 0.1]$. This is still an
overestimate, since the velocity is slightly less at the beginning of each increment
than at the end.
To get an underestimate, we should sample velocity at the beginning of each
increment, where it is smallest:
$$s(2) \approx v(0.0)\Delta t + v(0.1)\Delta t + \cdots + v(1.8)\Delta t + v(1.9)\Delta t$$
$$= (0.0)^2(0.1) + (0.1)^2(0.1) + \cdots + (1.8)^2(0.1) + (1.9)^2(0.1) \approx 2.5.$$
As we take more and more increments of smaller and smaller size, all estimates
converge on a limiting value, which is the exact position s(2).
For this simple function $v(t) = t^2$, we can compare the numerical answers
with our known algebraic solution: $s(t) = \tfrac13 t^3$ is the unique antiderivative with
$s(0) = 0$, and we have:
$$s(2) = \tfrac13(2^3) = \tfrac83 \approx 2.67,$$
which is indeed between the lower and upper estimates above. In fact, the average
of the two estimates is $\frac{2.9 + 2.5}{2} = 2.7$, which is the correct answer rounded to 1
decimal place.
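The upper and lower estimates above are easy to reproduce by machine. Here is a minimal sketch; the function name `riemann_sum` and its `side` parameter are our own conventions for illustration.

```python
def riemann_sum(v, a, b, n, side):
    """Sum v(t) * dt over n equal increments of [a, b].

    side = "right" samples at the end of each increment,
    side = "left" samples at the beginning.
    """
    dt = (b - a) / n
    offset = 1 if side == "right" else 0
    return sum(v(a + (i + offset) * dt) * dt for i in range(n))

def velocity(t):
    return t**2

upper = riemann_sum(velocity, 0.0, 2.0, 20, "right")
lower = riemann_sum(velocity, 0.0, 2.0, 20, "left")
print(f"upper estimate: {upper:.2f}")
print(f"lower estimate: {lower:.2f}")
print(f"average:        {(upper + lower) / 2:.2f}")
```

Increasing `n` from 20 to, say, 2000 squeezes both estimates toward the exact value $\tfrac83$.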
The integral. Applied generally to any velocity $v(t)$ over any interval $t \in [0, b]$,
this method specifies the value of the position $s(b)$ as a limit. We introduce a new
notation for this limit, the integral of $v(t)$ from $t = 0$ to $b$:
$$s(b) = \int_0^b v(t)\, dt = \lim_{\Delta t \to 0} v(t_1)\Delta t + v(t_2)\Delta t + \cdots + v(t_n)\Delta t.$$
The integral symbol $\int$ is an elongated S standing for the sum of $n$ terms; $v(t)$
stands for all the sample values $v(t_1), \ldots, v(t_n)$; and $dt$ suggests a very small $\Delta t$
as $n$ gets larger and larger ($n \to \infty$).
Increment: a small increase, a part added.
Sometimes $s(t)$ turns out to equal a known formula; sometimes it can only be
computed approximately, to any desired accuracy. In our example, we computed
$s(2) = \int_0^2 t^2\, dt = \tfrac83 \approx 2.67$.
Generalizing further, suppose we are given any function $f(x)$ which we consider
as the rate of change of an unknown function $F(x)$ for $x \in [a, b]$. Then we may
compute the cumulative total change $F(b) - F(a)$ by the above method: that is,
we compute the integral of $f(x)$ from $x = a$ to $b$:
$$F(b) - F(a) = \int_a^b f(x)\, dx = \lim_{\Delta x \to 0} f(x_1)\Delta x + f(x_2)\Delta x + \cdots + f(x_n)\Delta x.$$
Here we split the interval $[a, b]$ into $n$ increments of size $\Delta x = \frac{b-a}{n}$, and choose a
sample point in each increment: $x_1, x_2, \ldots, x_n$ can be the left or right endpoints,
or anywhere between. Each term approximates the incremental change in $F(x)$ as
the rate of change $f(x_i)$ times the length of the increment $\Delta x$. Finally, we take
the limit as $n \to \infty$ and $\Delta x \to 0$.
Area problem. Now we come to one of the most surprising results in mathematics:
the geometric interpretation of the integral. Suppose we have a function
with $f(x) \geq 0$ for $x \in [a, b]$, and we wish to determine the area under the graph
$y = f(x)$ and above the interval $[a, b]$ on the x-axis. For example, let us again take
$f(x) = x^2$ over the interval $[0, 2]$.
The dividing points are again $0.0 < 0.1 < \cdots < 1.9 < 2.0$, and each rectangle
reaches up to the graph at the right endpoint of an increment, giving heights
$f(0.1), f(0.2), \ldots, f(2.0)$. The area $A$ under the curve is close to the total area of
the rectangles; adding up (height) × (width) for each rectangle gives:
This is an overestimate, since the rectangles slightly overshoot the curve. To get
an underestimate, we take heights at the left endpoint of each increment, fitting
the rectangles under the graph (above at right):
Clearly, this is the same computation as we did before, so it has the same answer.
That is, taking the limit of thinner and thinner rectangles gives:
$$A = \int_0^2 x^2\, dx.$$
That is, the area under $y = f(x)$ for $x \in [0, 2]$ is the same as the distance traveled
with velocity $v(t) = t^2$ during $t \in [0, 2]$. Are you not amazed?
Why is this? Let us fix $a$, take $b = x$ to be a variable, and consider the area
above $[a, x]$ as a function $A(x)$. Then the rate of change of the area function is
the height of the graph: $A'(x) = f(x)$, since the greater the height, the taller
the rightmost incremental rectangle, and the faster $A(x)$ increases. Thus, we can
consider the height as a rate of change, and the area as a cumulative total change,
which is just what the integral computes.
We have seen the distance problem before in §2.7, when we used speedometer
data to reconstruct odometer data, using the graph of velocity $v(t)$ to draw the
graph of distance $s(t)$. We can now compute $s(t)$ for $t = x$ as the area between the
$v(t)$ graph and a horizontal interval $t \in [0, x]$. (If the velocity is negative, the area
counts as negative and $s(t)$ decreases.) For consistency, we must look at the ft/sec
(not mph) scale for $v(t)$, since $t$ is in sec, and we can estimate the total net area
to be about 3500 ft. That is, during the 150 sec shown, the car traveled forward
by $s(150) \approx 3500$ ft.
We will compute:
$$A = \int_0^\pi \sin^2(x)\, dx \approx \sin^2(x_1)\Delta x + \sin^2(x_2)\Delta x + \cdots + \sin^2(x_n)\Delta x$$
for suitably large $n$, correspondingly small increment $\Delta x = \frac{b-a}{n} = \frac{\pi}{n}$, and appropriate
sample points $x_1, \ldots, x_n$.
To make sure of the required accuracy, we will compute an overestimate and an
underestimate. For an overestimate, we take $x_1, \ldots, x_n$ so that $\sin^2(x)$ is largest
within each increment. These are not always the right endpoints, because the
function is decreasing on the second half of the interval. Rather, for the increments
within $[0, \frac{\pi}{2}]$, we take the right endpoints, and for the increments within $[\frac{\pi}{2}, \pi]$ we
take the left endpoints. To get an underestimate, we take sample points where
$\sin^2(x)$ is smallest within each increment, reversing the previous choices.
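The sampling rule just described can be sketched in a few lines. We assume $n$ is even, so that $\frac{\pi}{2}$ is a dividing point and $\sin^2(x)$ is monotone on every increment; under that assumption, taking the larger (or smaller) of the two endpoint values on each increment is the same as the right/left endpoint choices above. The function name `upper_and_lower` is our own.

```python
import math

def sin2(x):
    return math.sin(x) ** 2

def upper_and_lower(n):
    """Over- and underestimates of the area under sin^2 on [0, pi].

    Assumes n is even, so sin^2 is monotone on each increment and its
    largest and smallest values occur at the increment's endpoints.
    """
    dx = math.pi / n
    upper = 0.0
    lower = 0.0
    for i in range(n):
        left_value = sin2(i * dx)
        right_value = sin2((i + 1) * dx)
        upper += max(left_value, right_value) * dx
        lower += min(left_value, right_value) * dx
    return upper, lower

upper, lower = upper_and_lower(100)
print(f"upper estimate: {upper:.4f}")
print(f"lower estimate: {lower:.4f}")
```

The true area must lie between the two printed values, and the gap between them shrinks in proportion to $\frac{1}{n}$.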
With a spreadsheet or computer algebra program, it is not difficult to take
n = 100. The upper estimate is:
This would be exact if the velocity held constant at v(1) = 1 for the whole time,
but in fact the car is speeding up to v(3) = 9, so this is a gross underestimate.
For a good approximation, we split the time interval $[1, 3]$ into 20 increments
of size $\Delta t = 0.1$, with dividing points:
$$1.0 < 1.1 < 1.2 < \cdots < 2.8 < 2.9 < 3.0.$$
Here we sample velocity at the beginning of each increment: for example, the last
sample point is 2.9, the beginning of [2.9, 3.0]. This is still an underestimate, since
the velocity does increase slightly from the beginning to the end of each increment.
To get an overestimate, we should sample velocity at the end of each increment,
where it is largest:
In this case, we can compare with our known algebraic solution: $s(t) = \tfrac13 t^3$ is
the unique antiderivative of $v(t) = t^2$ with $s(0) = 0$, and we have:
$$s(3) - s(1) = \tfrac13(3^3) - \tfrac13(1^3) = 8\tfrac23 \approx 8.67,$$
which is indeed between the lower and upper estimates above. Note that the
average of the two estimates is 8.7, which is correct to 1 decimal place.
As we take more and more increments of smaller and smaller size, all estimates
converge on a limiting value, which is the exact answer. Applied generally to any
$v(t)$, this method specifies the position $s(b) = s(b) - s(0)$ as a limit, which defines
an antiderivative function. Sometimes this turns out to equal a known formula;
sometimes it can only be computed approximately, to any desired accuracy.
Increment: an increase, a part added.
Definition of integral. Generalizing from velocity and distance, suppose we are
given an arbitrary $f(x)$ which we consider as the rate of change of an unknown
antiderivative $F(x)$, with a given initial value $F(a) = 0$. We wish to compute the
total change $F(b) = F(b) - F(a)$ by the above method, and we introduce a new
symbol for the answer: $\int_a^b f(x)\, dx$.
Definition: The definite integral of $f(x)$ from $x = a$ to $x = b$ means:
$$\int_a^b f(x)\, dx = \lim_{\Delta x \to 0} f(x_1)\Delta x + f(x_2)\Delta x + \cdots + f(x_n)\Delta x.$$
Area problem. Now we come to one of the most surprising results in mathematics:
the geometric interpretation of the integral. Suppose we have a function
with $f(x) \geq 0$ for $x \in [a, b]$, and we wish to determine the area under the graph
$y = f(x)$ and above the interval $[a, b]$ on the x-axis.
There are known answers if this shape is a triangle, a circle, or a few others, but
we have no formulas for a general $f(x)$. For example, let us again take $f(x) = x^2$
over the interval $[1, 3]$. To approximate the area $A$, we fill it with thin rectangles
of width $\Delta x = 0.1$ (below at left):
The dividing points are again $1.0 < 1.1 < \cdots < 2.9 < 3.0$, and each rectangle
reaches up to the graph at the left endpoint of an increment, giving heights
$f(1.0), f(1.1), \ldots, f(2.9)$. The area $A$ under the curve is close to the total area of
the rectangles; adding up height × width gives:
This is an underestimate, since the rectangles do not quite fill the area. To get an
overestimate, we take heights at the right endpoint of each increment, making the
rectangles taller than the graph (above at right):
Area problem for v(t) = 16t: same formula, upper & lower sums
Why same? Because they both compute the cumulative effect of a changing
influence
Math 132 Sigma Notation Stewart 4.1, Part 2
Notation for sums. In Notes 4.1, we define the integral $\int_a^b f(x)\,dx$ as a limit of
approximations. That is, we split the interval $x \in [a, b]$ into $n$ increments of size
$\Delta x = \frac{b-a}{n}$, we choose sample points $x_1, x_2, \ldots, x_n$, and we take:
$$\int_a^b f(x)\,dx = \lim_{\Delta x \to 0} \bigl( f(x_1)\Delta x + f(x_2)\Delta x + \cdots + f(x_n)\Delta x \bigr).$$
The sum which appears on the right is called a Riemann sum. Similar sums appear
frequently in mathematics, and we define a special notation to handle them.
In the most general situation, we have a sequence of numbers $q_0, q_1, q_2, q_3, \ldots$
so that for any $i = 0, 1, 2, \ldots$ we have a number $q_i$. We consider an interval of
integers $i = m, m+1, m+2, \ldots, n$, and we introduce a notation for the sum of all
the $q_i$ for $i = m$ to $n$:
$$\sum_{i=m}^{n} q_i = q_m + q_{m+1} + q_{m+2} + \cdots + q_n .$$
The summation symbol $\Sigma$ is capital sigma, the Greek letter S, standing for sum.
The variable $i$ is called the index of summation.
Note: In the WebWork problems, a sequence is denoted $f(i)$ instead of $q_i$.
This is because we can consider the sequence of $q_i$'s as a function with input $i$ (an
integer) and output $q_i$ (a specified number).
Examples
Letting $q_i = \sqrt{i}$, we have $q_0 = 0$, $q_1 = 1$, $q_2 = \sqrt{2}$, $q_3 = \sqrt{3}$, etc., and taking
the interval of integers $i = 2, 3, 4, 5$, we have:
$$\sum_{i=2}^{5} \sqrt{i} = \sqrt{2} + \sqrt{3} + \sqrt{4} + \sqrt{5} \approx 7.38 .$$
Letting $q_i = 1$, we have: $\sum_{i=1}^{10} 1 = \underbrace{1 + 1 + \cdots + 1}_{10 \text{ terms}} = 10$.
Given the sum of the first five odd numbers $1 + 3 + 5 + 7 + 9$, we can write
this in sigma notation by considering the terms as $q_i = 2i-1$:
$$1 + 3 + 5 + 7 + 9 = (2(1)-1) + (2(2)-1) + \cdots + (2(5)-1) = \sum_{i=1}^{5} (2i-1) .$$
Another way would be to consider the terms as $q_i = 2i+1$:
$$1 + 3 + 5 + 7 + 9 = (2(0)+1) + (2(1)+1) + \cdots + (2(4)+1) = \sum_{i=0}^{4} (2i+1) .$$
The sum of the first $n$ odd numbers, where $n$ is an unspecified whole number,
can be written as:
$$1 + 3 + 5 + \cdots + (2n-1) = \sum_{i=1}^{n} (2i-1).$$
Like all facts about summations, these formulas can be understood by writing out
the terms in dot-dot-dot (ellipsis) notation:
$$\begin{aligned}
\sum_{i=m}^{n} (q_i + p_i) &= (q_m + p_m) + (q_{m+1} + p_{m+1}) + \cdots + (q_n + p_n) \\
&= (q_m + q_{m+1} + \cdots + q_n) + (p_m + p_{m+1} + \cdots + p_n) \\
&= \sum_{i=m}^{n} q_i + \sum_{i=m}^{n} p_i .
\end{aligned}$$
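$$\begin{aligned}
2 \sum_{i=1}^{n} i \;=\; & \phantom{{}+{}} 1 + 2 + \cdots + (n-1) + n \\
& {}+ n + (n-1) + \cdots + 2 + 1
\end{aligned}$$
$$= 3 \sum_{i=1}^{n} i^2 + \tfrac{3}{2}\, n(n+1) + n.$$
$$= (n+1)^3 - 1^3 .$$
gives, as desired:
$$\sum_{i=1}^{n} i^2 = \tfrac{1}{3} \bigl( (n+1)^3 - \tfrac{3}{2}\, n(n+1) - (n+1) \bigr) = \tfrac{1}{6}\, n(n+1)(2n+1) .$$
A similar computation will produce a formula for $\sum_{i=1}^{n} i^3$, etc.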
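As a sanity check on the closed formulas $\sum_{i=1}^n i = \frac12 n(n+1)$ and $\sum_{i=1}^n i^2 = \frac16 n(n+1)(2n+1)$, the sums can be added up term by term and compared. This is only a sketch; the function names `direct_sums` and `closed_form_sums` are made up for the illustration.

```python
def direct_sums(n):
    """Add up i and i^2 for i = 1..n term by term."""
    return sum(range(1, n + 1)), sum(i * i for i in range(1, n + 1))


def closed_form_sums(n):
    """Closed formulas n(n+1)/2 and n(n+1)(2n+1)/6 (both are whole numbers)."""
    return n * (n + 1) // 2, n * (n + 1) * (2 * n + 1) // 6


for n in (1, 2, 10, 100):
    print(n, direct_sums(n), closed_form_sums(n))
```

Agreement for a range of $n$ does not replace the proof, but it catches algebra slips quickly.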
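Direct Evaluation of Integrals. We can use the above rules to simplify Riemann
sums and find integrals exactly. For example, consider:
$$\int_1^3 5x \, dx = \lim_{\Delta x \to 0} \sum_{i=1}^{n} 5 x_i \, \Delta x.$$
On the right side, we divide the interval $[1, 3]$ into $n$ increments of length
$\Delta x = \frac{3-1}{n} = \frac{2}{n}$, with dividing points:
$$1 < 1 + \tfrac{2}{n} < 1 + \tfrac{4}{n} < \cdots < 1 + \tfrac{2n}{n} = 3.$$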
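In the $i$th increment, we arbitrarily choose the sample point $x_i$ to be the right
endpoint, that is $x_i = 1 + i\,\Delta x = 1 + \frac{2}{n} i$. Thus:
$$\begin{aligned}
\sum_{i=1}^{n} 5 x_i \, \Delta x &= \sum_{i=1}^{n} 5 \Bigl( 1 + \frac{2}{n}\, i \Bigr) \frac{2}{n} \\
&= \frac{10}{n} \sum_{i=1}^{n} 1 + \frac{20}{n^2} \sum_{i=1}^{n} i \\
&= \frac{10}{n}\, n + \frac{20}{n^2} \cdot \frac{1}{2}\, n(n+1) \\
&= 20 + \frac{10}{n} .
\end{aligned}$$
(Here $n$ is a fixed number not depending on $i$, such as $n = 100$ or $n = 1000$, and
we can factor it out of the $\sum$.) Finally, we let $\Delta x \to 0$ or equivalently $n \to \infty$:
$$\int_1^3 5x \, dx = \lim_{\Delta x \to 0} \sum_{i=1}^{n} 5 x_i \, \Delta x = \lim_{n \to \infty} \Bigl( 20 + \frac{10}{n} \Bigr) = 20 .$$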
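The simplification of the Riemann sum to $20 + \frac{10}{n}$ can be checked by adding up the terms directly for a few values of $n$. A minimal sketch, with the hypothetical helper name `riemann_sum_5x`:

```python
def riemann_sum_5x(n):
    """Right-endpoint Riemann sum for f(x) = 5x on [1, 3] with n increments."""
    dx = 2 / n
    return sum(5 * (1 + i * dx) * dx for i in range(1, n + 1))


for n in (10, 100, 1000):
    # The last column is the simplified formula 20 + 10/n.
    print(n, riemann_sum_5x(n), 20 + 10 / n)
```

Both columns agree up to rounding, and both approach 20 as $n$ grows.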
The function $f(x)$ is integrable over $[a, b]$ whenever the above limit
exists for every possible choice of sample points $x_i$.
Integrable functions. Most functions are integrable unless they have a vertical
asymptote. To be precise:
Theorem: Assume $f(x)$ is continuous for all $x \in [a, b]$, except possibly
at a finite list of removable or jump discontinuities (see 1.8).
Then $f(x)$ is integrable, meaning its Riemann sums converge to a well-defined
limit $L = \int_a^b f(x)\,dx$ for any choice of sample points.
But (triangle area) $= \frac12$(base)(height), so $\int_0^3 (2-x)\,dx = \frac12 (2)(2) - \frac12 (1)(1) = \frac32$.
Reversing limits of integration. If we take the limits of integration to be the
same, $a = b$, then $\Delta x = \frac{b-a}{n} = 0$, so every Riemann sum is zero, and we get:
$$\int_a^a f(x)\,dx = 0 .$$
This is clear geometrically, since the area above a one-point interval [a, a] is zero.
Next, we give a meaning to switching the two limits of integration, by defining:
$$\int_b^a f(x)\,dx = -\int_a^b f(x)\,dx .$$
Geometrically, in $\int_b^a f(x)\,dx$ we imagine $x$ running backward from $b$ to $a$ with a
negative increment $\Delta x = \frac{a-b}{n} < 0$. Since each term $f(x_i)\Delta x$ has negative width
$\Delta x$, the integral becomes the negative of $\int_a^b f(x)\,dx$. If $f(x_i)$ is also negative,
then both width and height are negative and the integral is positive: for example,
$\int_3^1 (-1)\,dx = -\int_1^3 (-1)\,dx = -(-2) = 2$.
We bother with this definition only so as to improve the Splitting Rule below.
Proof. The Sum, Difference, and Constant Multiple Rules follow directly from the
corresponding rules for summations in 4.1 Part 2, applied to Riemann sums.
The Bounds Rule makes sense geometrically because $0 \le A \le f(x) \le B$
means the graph $y = f(x)$ is above the line $y = A$ and below $y = B$. Thus
the area $\int_a^b f(x)\,dx$ below $y = f(x)$ and above $[a, b]$ contains a rectangle with
(width) $\times$ (height) $= (b-a)A$, and the area is contained inside a rectangle with
(width) $\times$ (height) $= (b-a)B$.
Computing formally, $A \le f(x_i) \le B$ implies
$\sum_{i=1}^{n} A\,\Delta x \le \sum_{i=1}^{n} f(x_i)\Delta x \le \sum_{i=1}^{n} B\,\Delta x$. Here:
$$\sum_{i=1}^{n} A\,\Delta x = \sum_{i=1}^{n} A\,\frac{b-a}{n} = \frac{(b-a)A}{n} \sum_{i=1}^{n} 1 = \frac{(b-a)A}{n}\, n = (b-a)A,$$
and similarly for the upper bound. Taking limits as $n \to \infty$ gives the desired
inequalities. The Domination Rule is similar.
The Splitting Rule is intuitive when $a \le b \le c$. The interval $[a, c]$ splits as the
union of two sub-intervals, $[a, b] \cup [b, c]$, so the area above $[a, c]$ is the sum of the
areas above $[a, b]$ and $[b, c]$, i.e. $\int_a^c f(x)\,dx = \int_a^b f(x)\,dx + \int_b^c f(x)\,dx$.
Furthermore, because of our extended definition of integrals, the Splitting Rule
is valid no matter what the relative positions of $a, b, c$. For example, if $a \le c \le b$,
then $[a, b] = [a, c] \cup [c, b]$ and clearly $\int_a^c f(x)\,dx + \int_c^b f(x)\,dx = \int_a^b f(x)\,dx$. Moving
$\int_c^b f(x)\,dx$ to the other side, we get:
$$\int_a^c f(x)\,dx = \int_a^b f(x)\,dx - \int_c^b f(x)\,dx = \int_a^b f(x)\,dx + \int_b^c f(x)\,dx ,$$
so the very same Splitting Rule applies: $\int_a^c f(x)\,dx = \int_a^b f(x)\,dx + \int_b^c f(x)\,dx$.
Another example: if $a = c$, the Splitting Rule says:
$$\int_a^a f(x)\,dx = \int_a^b f(x)\,dx + \int_b^a f(x)\,dx,$$
Basic Integrals:
$$\int_a^b 1 \, dx = b - a, \qquad \int_a^b x \, dx = \tfrac12 b^2 - \tfrac12 a^2 , \qquad \int_a^b x^2 \, dx = \tfrac13 b^3 - \tfrac13 a^3 .$$
Later, we will easily evaluate these integrals by the Fundamental Theorems. For
now, we can prove them directly from the Basic Summations in 4.1 Part 2. For
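the third and hardest formula, we take increment $\Delta x = \frac{b-a}{n}$, sample points
$x_i = a + i\,\Delta x$, and $f(x_i) = (a + i\,\Delta x)^2 = a^2 + 2ai\,\Delta x + i^2 (\Delta x)^2$, giving Riemann sum:
$$\begin{aligned}
\sum_{i=1}^{n} f(x_i)\Delta x &= \sum_{i=1}^{n} \bigl( a^2\,\Delta x + 2ai\,(\Delta x)^2 + i^2 (\Delta x)^3 \bigr) \\
&= a^2\,\Delta x \sum_{i=1}^{n} 1 + 2a(\Delta x)^2 \sum_{i=1}^{n} i + (\Delta x)^3 \sum_{i=1}^{n} i^2 \\
&= a^2\, \frac{(b-a)}{n}\, n + 2a\, \frac{(b-a)^2}{n^2} \cdot \tfrac12 n(n+1) + \frac{(b-a)^3}{n^3} \cdot \tfrac16 n(n+1)(2n+1) \\
&= a^2 (b-a) + 2a(b-a)^2 \bigl( \tfrac12 + \tfrac{1}{2n} \bigr) + (b-a)^3 \bigl( \tfrac13 + \tfrac{1}{2n} + \tfrac{1}{6n^2} \bigr) .
\end{aligned}$$
Taking the $\lim_{n \to \infty}$, the terms with $n$ in the denominator disappear, and we get:
$$\int_a^b x^2 \, dx = a^2 (b-a) + 2a(b-a)^2 \bigl( \tfrac12 \bigr) + (b-a)^3 \bigl( \tfrac13 \bigr) = \tfrac13 b^3 - \tfrac13 a^3 .$$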
Examples.
= 603 .
That is, we do not ask for an exact value, only an overestimate. We know
that $1 + \cos(x^2) \le 2$, so the Domination Rule gives:
$$\int_{-2}^{7} (3x-5)^2 \, (1 + \cos(x^2)) \, dx \;\le\; \int_{-2}^{7} (3x-5)^2 \, (2) \, dx = 2(603) = 1206 .$$
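The overestimate can be cross-checked numerically. The sketch below assumes the interval of integration is $[-2, 7]$, which is the reading consistent with the quoted value 603; the midpoint-rule helper `midpoint_sum` is an illustrative stand-in for a Riemann sum and is not taken from the Notes.

```python
import math


def midpoint_sum(f, a, b, n):
    """Riemann sum with sample points at the midpoints of n increments."""
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) * dx for i in range(n))


def antiderivative(x):
    """(3x-5)^3 / 9, whose derivative is (3x-5)^2."""
    return (3 * x - 5) ** 3 / 9


def integrand(x):
    return (3 * x - 5) ** 2 * (1 + math.cos(x * x))


exact_simple = antiderivative(7) - antiderivative(-2)   # integral of (3x-5)^2
estimate = midpoint_sum(integrand, -2, 7, 20000)        # the hard integral, approximated
print(exact_simple, estimate, 2 * exact_simple)
```

The approximate value of the hard integral should land below the bound $2 \cdot 603 = 1206$, as the Domination Rule predicts.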
Math 132 Fundamental Theorem of Calculus Stewart 4.3
Integral as antiderivative. In 4.1, we were given a velocity function $v(t)$, and we wanted
to determine the corresponding position function $s(t)$. First, we computed the distance
traveled by the object over a given time $t = a$ to $t = b$ by adding up (velocity) $\times$ (time) over
many small time increments of length $\Delta t$:
$$\text{distance traveled} = s(b) - s(a) = \lim_{\Delta t \to 0} \sum_{i=1}^{n} v(t_i)\, \Delta t = \int_a^b v(t)\, dt.$$
The choice of letters for quantities is only suggestive, and does not affect the computations.
Instead of a fixed interval $[a, b]$, let us change to $[a, x]$ to suggest that the right endpoint
$t = x$ is variable, while the left endpoint $t = a$ remains fixed. We get the position function:
$$s(x) = \int_a^x v(t)\, dt.$$
This always computes an antiderivative function for v(t), even if it is impossible to get an
antiderivative algebraically by reversing differentiation formulas.
Then $F'(x) = f(x)$ for $x \in (a, b)$, and $F(x)$ is the unique antiderivative of $f(x)$
with $F(a) = 0$.
Before a formal proof, let us see how the Theorem relates to our velocity-to-position
argument above. If we assume there is some antiderivative $F(x)$ with $F'(x) = f(x)$ and
$F(a) = 0$, then we could approximate $F(x) = F(x) - F(a)$ as the sum of increments of
(rate) $\times$ (time) $= f(t_i)\Delta t$, and the exact value of $F(x)$ as $\Delta t \to 0$ would be the integral.
However, this does not prove that there really does exist such an antiderivative F (x), only
that if it exists, it must be given by the integral function.
Proof of Theorem. We do not yet know a derivative formula for the new function
$F(x) = \int_a^x f(t)\,dt$, so we must compute from the definition: $F'(x) = \lim_{h \to 0} \frac{F(x+h) - F(x)}{h}$. We have:
$$\frac{F(x+h) - F(x)}{h} = \frac{1}{h} \left( \int_a^{x+h} f(t)\,dt - \int_a^{x} f(t)\,dt \right) = \frac{1}{h} \int_x^{x+h} f(t)\,dt ,$$
since $\int_a^{x+h} = \int_a^{x} + \int_x^{x+h}$ for all $h$ (even $h < 0$).
Note: We use the new variable $x$ to avoid $s(t) = \int_a^t v(t)\,dt$, which would imply nonsense like $s(2) = \int_a^2 v(2)\,d2$.
Note: Again, we must use different letters for the limit of integration $x$ and the variable of integration $t$.
Geometrically, we see that if $h$ is small enough, the region above $[x, x+h]$ is approximately
a rectangle with height $f(x)$ and width $h$, so $\int_x^{x+h} f(t)\,dt \approx f(x)\,h$, and:
$$F'(x) \approx \frac{F(x+h) - F(x)}{h} = \frac{1}{h} \int_x^{x+h} f(t)\,dt \approx \frac{1}{h} \bigl( f(x)\,h \bigr) = f(x),$$
As $h$ gets very small, the interval $[x, x+h]$ gets closer and closer to the single point $x$,
and the absolute minimum and maximum over this tiny interval must approach $f(x)$ by
continuity: that is, $\lim_{h \to 0} N_h = \lim_{h \to 0} M_h = f(x)$. Also, by the above we have:
$$N_h \;\le\; \frac{F(x+h) - F(x)}{h} = \frac{1}{h} \int_x^{x+h} f(t)\,dt \;\le\; M_h .$$
Applying the Squeeze Theorem for limits (1.6), we find what we wanted:
$$F'(x) = \lim_{h \to 0} \frac{F(x+h) - F(x)}{h} = \lim_{h \to 0} N_h = \lim_{h \to 0} M_h = f(x) .$$
As for the last part of the conclusion, it is clear that $F(a) = \int_a^a f(t)\,dt = 0$, and there is
a unique antiderivative with this initial value by the Antiderivative Theorem (3.9), which
is a version of the Uniqueness Theorem (3.2). Note how we have used almost all of our
previous theory in proving this culminating Theorem.
Here a is any constant, x is the input variable, and t is a dummy variable which only has
meaning inside the integral.
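For another function $g(x)$, we can take its composition with $F(x)$. Then the above
Basic Derivative together with the Chain Rule (2.5) implies:
$$F(g(x))' = \frac{d}{dx} \left( \int_a^{g(x)} f(t)\,dt \right) = F'(g(x)) \cdot g'(x) = f(g(x)) \cdot g'(x) .$$
example: Find the derivative of $F(x) = \int_{2x}^{x^3} \sin(t)\,dt$. We have:
$$\begin{aligned}
F'(x) = \frac{d}{dx} \left( \int_{2x}^{x^3} \sin(t)\,dt \right) &= \frac{d}{dx} \left( \int_0^{x^3} \sin(t)\,dt - \int_0^{2x} \sin(t)\,dt \right) \\
&= \sin(x^3) \cdot 3x^2 - \sin(2x) \cdot 2 .
\end{aligned}$$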
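The Chain Rule formula above predicts $F'(x) = \sin(x^3)\cdot 3x^2 - \sin(2x)\cdot 2$ for this example. A quick numerical check is sketched below; it uses the antiderivative $-\cos(t)$ to write $F$ in closed form, and the helper `central_difference` is an illustrative approximation of the derivative, not something defined in the Notes.

```python
import math


def F(x):
    """Integral of sin(t) from t = 2x to t = x^3, via the antiderivative -cos(t)."""
    return math.cos(2 * x) - math.cos(x ** 3)


def F_prime_formula(x):
    """Chain Rule prediction: sin(x^3) * 3x^2 - sin(2x) * 2."""
    return math.sin(x ** 3) * 3 * x ** 2 - math.sin(2 * x) * 2


def central_difference(func, x, h=1e-6):
    """Symmetric difference quotient approximating func'(x)."""
    return (func(x + h) - func(x - h)) / (2 * h)


x0 = 0.7
print(central_difference(F, x0), F_prime_formula(x0))
```

The two printed numbers should agree to several decimal places.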
Second Fundamental Theorem. This is a trick to easily evaluate many integrals, which
we already used to find some exact values in 4.1.
Theorem: Suppose $F(x)$ is some known antiderivative with $F'(x) = f(x)$. Then:
$$\int_a^b f(x)\,dx = F(b) - F(a) .$$
That is, if $f(x)$ is the rate of change of $F(x)$, then the integral $\int_a^b f(x)\,dx$ is the
total change of $F(x)$ from $x = a$ to $b$.
Proof. Since $F(x)$ is a particular antiderivative of $f(x)$, the Uniqueness Theorem (3.9,
3.2) says that the general antiderivative is $F(x) + C$ for any constant $C$. But the First
Fundamental Theorem says the integral function $I(x) = \int_a^x f(t)\,dt$ is also an antiderivative
of $f(x)$, so we must have $I(x) = F(x) + C$. Since we know the initial condition
$I(a) = \int_a^a f(t)\,dt = 0$, we get $I(a) = F(a) + C = 0$, and $C = -F(a)$. Therefore $I(x) = F(x) - F(a)$
and $\int_a^b f(x)\,dx = I(b) = F(b) - F(a)$ as desired.
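example: Evaluate the integral: $\int_{-\sqrt5}^{\sqrt5} 5 + 4x^2 - x^4 \, dx$. Reversing our Derivative Rules as
we did in 3.9, we see that $F(x) = 5x + \frac43 x^3 - \frac15 x^5$ is an antiderivative. By the Theorem:
$$\int_{-\sqrt5}^{\sqrt5} 5 + 4x^2 - x^4 \, dx = F(\sqrt5) - F(-\sqrt5) = \tfrac{20}{3}\sqrt5 - \bigl( -\tfrac{20}{3}\sqrt5 \bigr) = \tfrac{40}{3}\sqrt5 \approx 29.81 .$$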
example: Determine the area under the curve $y = 5 + 4x^2 - x^4$ and above the x-axis.
We must determine the limits of integration, which are the x-intercepts of the graph.
Substituting $u = x^2$, the equation becomes $5 + 4u - u^2 = 0$, which we can solve by the Quadratic
Formula as $u = -1$ or $5$, so $x = \pm\sqrt{u} = \pm\sqrt5$. Thus the area is
$\int_{-\sqrt5}^{\sqrt5} 5 + 4x^2 - x^4 \, dx = \frac{40}{3}\sqrt5$, as above.
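Because the arithmetic with $\sqrt5$ is easy to get wrong, it is worth cross-checking the area two ways: once from the antiderivative and once from a fine Riemann sum. This is a sketch only; `midpoint_sum` is an illustrative helper.

```python
import math


def f(x):
    return 5 + 4 * x ** 2 - x ** 4


def F(x):
    """Antiderivative of f obtained by reversing the power rule."""
    return 5 * x + 4 * x ** 3 / 3 - x ** 5 / 5


def midpoint_sum(func, a, b, n):
    """Riemann sum with sample points at the midpoints of n increments."""
    dx = (b - a) / n
    return sum(func(a + (i + 0.5) * dx) * dx for i in range(n))


r = math.sqrt(5)                      # the x-intercepts are at -r and r
by_theorem = F(r) - F(-r)             # Second Fundamental Theorem
by_riemann = midpoint_sum(f, -r, r, 10000)
print(by_theorem, by_riemann, 40 * r / 3)
```

All three printed values should agree closely, near 29.81.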
The variable of integration, x or t, is irrelevant, provided it doesn't conflict with the limits of integration.
Math 132 More Uses for Integrals 4.4, 5.5
Review. The integral $\int_a^b f(x)\,dx$ has four levels of meaning.
In Newton notation:
$$\int_a^b f(x)\,dx = F(b) - F(a) .$$
This is the Second Fundamental Theorem of Calculus.
If we know an initial value $F(a)$, we have $F(x) = F(a) + \int_a^x f(t)\,dt$, and:
$$F'(x) = \frac{d}{dx} \int_a^x f(t)\,dt = f(x) .$$
Geometric: The integral is the area between the graph $y = f(x)$ and the interval
$x \in [a, b]$, counting area above the x-axis as positive, area below the x-axis as
negative.
In Leibnitz notation, a function is denoted by its output variable, such as $z = F(x)$. A particular
output value of the function is denoted: $z|_{x=a} = F(a)$; and the change in the value over an interval
$x \in [a, b]$ is denoted: $z|_{x=a}^{x=b} = F(b) - F(a)$.
Indefinite integral notation. Since antiderivatives are so closely related to integrals
by the Fundamental Theorems, we adopt the integral sign as a notation for the
most general antiderivative of a function:
$$\int f(x)\,dx = F(x) + C \quad \text{for all } C.$$
Here $F(x)$ is a particular antiderivative: $F'(x) = f(x)$; and $F(x) + C$ means the
family of all antiderivatives, one for every constant $C$ (3.9). This family is called
the indefinite integral, with no specific limits of integration next to the $\int$ sign.
example: Since $\frac{d}{dx}(x^3) = 3x^2$ and $\frac{d}{dx}(\frac13 x^3) = x^2$, we have the indefinite integral:
$$\int x^2 \, dx = \tfrac13 x^3 + C .$$
example: Suppose a car with position function $s(t)$ lurches forward with velocity
$v(t) = 10t + 10\sin(\pi t)$ m/sec. How far does it travel from $t = 0$ to $t = 3$ sec? Making
use of the antiderivative table in 3.9, we first find the indefinite integral:
$$\int 10t + 10\sin(\pi t) \, dt = 5t^2 - \tfrac{10}{\pi}\cos(\pi t) + C$$
Since velocity is the rate of change of position, the total change in position is the
definite integral:
$$\begin{aligned}
s(3) - s(0) &= \int_0^3 10t + 10\sin(\pi t) \, dt = 5t^2 - \tfrac{10}{\pi}\cos(\pi t) \,\Big|_{t=0}^{t=3} \\
&= \bigl( 5(3^2) - \tfrac{10}{\pi}\cos(3\pi) \bigr) - \bigl( 5(0^2) - \tfrac{10}{\pi}\cos(0) \bigr) = 45 + \tfrac{20}{\pi} \approx 51.4 \text{ meters.}
\end{aligned}$$
This is just the interval length $(b-a)$ times the average of the sample values $f(x_1), \ldots, f(x_n)$.
The integral is the limit of this as $n \to \infty$, which becomes $(b-a)$ times the average
of a more and more dense set of sample values:
$$\int_a^b f(x)\,dx = (b-a) \lim_{n \to \infty} \frac{f(x_1) + \cdots + f(x_n)}{n} .$$
We define the average of $f(x)$ over all $x \in [a, b]$ to be the above limit. Hence:
$$\text{Average of } f(x) \text{ over } [a, b] = f_{ave} = \frac{1}{b-a} \int_a^b f(x)\,dx .$$
example: Find the average value of $f(x) = \sqrt{x}$ over $x \in [0, 4]$. The indefinite
integral is: $\int \sqrt{x}\,dx = \int x^{1/2}\,dx = \frac23 x^{3/2} + C$. The average is thus:
$$\frac{1}{4-0} \int_0^4 \sqrt{x}\,dx = \frac14 \cdot \frac23 x^{3/2} \,\Big|_{x=0}^{x=4} = \frac43 \approx 1.3 .$$
That is, the function varies between $0 \le \sqrt{x} \le 2$ over the interval, but its average
value is higher than the halfway point 1.0, because the graph bulges above the
straight line from $(0, 0)$ to $(4, 2)$.
A geometric way to picture the average of a positive f (x) over [a, b] is to think
of the area under the curve as a fluid. If we remove the curve and contain the fluid
between the walls x = a and x = b, then the level of the fluid is the average of the
function. In the picture, the fluid under the straight line would fill the container
between x = 0 and x = 4 to the midpoint level y = 1; but with extra fluid under
$y = \sqrt{x}$ and above the line, the average of $f(x) = \sqrt{x}$ is higher: $y = \frac43$.
If $f(x)$ is continuous for $x \in [a, b]$, then there is some $c \in (a, b)$ where $f(c)$
equals the average of $f(x)$ over the interval: $f(c) = f_{ave} = \frac{1}{b-a} \int_a^b f(x)\,dx$.
Proof: Take $F(x) = \int_a^x f(t)\,dt$. Then the Mean Value Theorem for Derivatives (3.2)
says there is a value $c \in (a, b)$ where the tangent line for $F(x)$ is parallel to the secant
line over the interval: $F'(c) = \frac{F(b) - F(a)}{b-a}$. By the First Fundamental Theorem, the
left side is $F'(c) = f(c)$; and since $F(a) = 0$, the right side is $\frac{F(b)}{b-a} = \frac{1}{b-a} \int_a^b f(x)\,dx$,
as desired.
In our example $f(x) = \sqrt{x}$, we easily find $c \in (0, 4)$ with $f(c) = \sqrt{c} = f_{ave} = \frac43$: it is
$c = \frac{16}{9}$.
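The average value and the point $c$ from the theorem can both be estimated numerically for $f(x) = \sqrt{x}$ on $[0, 4]$. A minimal sketch, assuming a midpoint Riemann sum (the helper `midpoint_sum` is illustrative) as the approximation of the integral:

```python
import math


def midpoint_sum(func, a, b, n):
    """Riemann sum with sample points at the midpoints of n increments."""
    dx = (b - a) / n
    return sum(func(a + (i + 0.5) * dx) * dx for i in range(n))


a, b = 0.0, 4.0
f_ave = midpoint_sum(math.sqrt, a, b, 100000) / (b - a)   # approximately 4/3
c = f_ave ** 2                                            # solves sqrt(c) = f_ave
print(f_ave, c)
```

The computed $c$ lies strictly inside the interval, as the theorem requires.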
Math 132 Substitution Method Stewart 4.5
Reversing the Chain Rule. As we have seen from the Second Fundamental
Theorem (4.3), the easiest way to evaluate an integral $\int_a^b f(x)\,dx$ is to find an
antiderivative, the indefinite integral $\int f(x)\,dx = F(x) + C$, so that $\int_a^b f(x)\,dx =
F(b) - F(a)$. Building on 3.9, we will learn several methods to find antiderivatives
which reverse our methods of differentiation, in this case the Chain Rule.
For example, let us find the antiderivative:
$$\int x \cos(x^2) \, dx .$$
That is, for what function will the Derivative Rules produce $x \cos(x^2)$? We notice
an inside function $g(x) = x^2$, and a factor $x$ which is very close to the derivative
$g'(x) = 2x$. In fact, we can get the exact derivative of the inside function if we
multiply the factors by $\frac12$ and $2$:
$$x \cos(x^2) = \tfrac12 \cos(x^2) \cdot (2x) .$$
This is just the kind of derivative function produced by the Chain Rule:
$$f(g(x))' = f'(g(x)) \cdot g'(x) = f'(x^2) \cdot (2x) \overset{??}{=} \tfrac12 \cos(x^2) \cdot (2x) .$$
We still need to find the outside function $f$. To remind us of the original inside
function, we write $f(u)$, where the new variable $u$ represents $u = g(x) = x^2$. We
must get $f'(u) = \frac12 \cos(u)$, an easy antiderivative:
$$\int \tfrac12 \cos(u) \, du = f(u) + C = \tfrac12 \sin(u) + C .$$
Now we restore the original inside function to get our final answer:
$$\int \tfrac12 \cos(u) \, du = \tfrac12 \sin(u) + C = \tfrac12 \sin(x^2) + C .$$
The Chain Rule in Leibnitz notation (2.5) reverses and checks the above
computation. Writing $y = \frac12 \sin(u)$ and $u = x^2$:
$$\begin{aligned}
\frac{dy}{dx} = \frac{dy}{du} \cdot \frac{du}{dx} &= \frac{d}{du}\bigl( \tfrac12 \sin(u) \bigr) \cdot \frac{d}{dx}\bigl( x^2 \bigr) \\
&= \tfrac12 \cos(u) \cdot (2x) = \tfrac12 \cos(x^2) \cdot (2x) = x \cos(x^2) .
\end{aligned}$$
Substitution Method
1. Given an antiderivative $\int h(x)\,dx$, try to find an inside function $g(x)$ such
that $g'(x)$ is a factor of the integrand:
$$h(x) = f(g(x)) \cdot g'(x).$$
This will often involve multiplying and dividing by a constant to get the
exact derivative $g'(x)$. After factoring out $g'(x)$, sometimes the remaining
factor needs to be manipulated to write it as a function of $u = g(x)$.
2. Using the symbolic notation $u = g(x)$, $du = \frac{du}{dx}\,dx = g'(x)\,dx$, write:
$$\int h(x)\,dx = \int f(g(x)) \cdot g'(x)\,dx = \int f(u)\,du ,$$
and find the antiderivative $\int f(u)\,du = F(u) + C$ by whatever method.
3. Restore the original inside function:
$$\int h(x)\,dx = \int f(u)\,du = F(u) + C = F(g(x)) + C .$$
Examples
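$\int (3x+4)\sqrt{3x+4}\,dx$. The inside function is clearly $u = 3x+4$, $du = 3\,dx$, so:
$$\begin{aligned}
\int (3x+4)\sqrt{3x+4}\,dx &= \tfrac13 \int (3x+4)\sqrt{3x+4} \cdot 3\,dx \\
&= \tfrac13 \int u\sqrt{u}\,du = \tfrac13 \int u^{3/2}\,du = \tfrac13 \cdot \tfrac25 u^{5/2} + C = \tfrac{2}{15}(3x+4)^{5/2} + C.
\end{aligned}$$
$\int x\sqrt{3x+4}\,dx$. Again $u = 3x+4$, so $\sqrt{3x+4}$ becomes $\sqrt{u}$, but we must still
express the remaining factor $x$ in terms of $u$. We solve $u = 3x+4$ to obtain
$x = \frac13 u - \frac43$: that is, $x = \frac13(3x+4) - \frac43$:
$$\begin{aligned}
\int x\sqrt{3x+4}\,dx &= \tfrac13 \int \bigl( \tfrac13(3x+4) - \tfrac43 \bigr) \sqrt{3x+4} \cdot 3\,dx = \tfrac13 \int \bigl( \tfrac13 u - \tfrac43 \bigr) \sqrt{u}\,du \\
&= \int \tfrac19 u^{3/2} - \tfrac49 u^{1/2}\,du = \tfrac19 \cdot \tfrac25 u^{5/2} - \tfrac49 \cdot \tfrac23 u^{3/2} + C \\
&= \tfrac{2}{45}(3x+4)^{5/2} - \tfrac{8}{27}(3x+4)^{3/2} + C.
\end{aligned}$$
$\int \frac{\sec^2(\sqrt{x})}{\sqrt{x}}\,dx$. We take $u = \sqrt{x} = x^{1/2}$, $du = \frac12 x^{-1/2}\,dx = \frac{1}{2\sqrt{x}}\,dx$:
$$\begin{aligned}
\int \frac{\sec^2(\sqrt{x})}{\sqrt{x}}\,dx &= 2 \int \sec^2(\sqrt{x}) \cdot \frac{1}{2\sqrt{x}}\,dx \\
&= 2 \int \sec^2(u)\,du = 2\tan(u) + C = 2\tan(\sqrt{x}) + C.
\end{aligned}$$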
u2 3u+2
Z Z
2u
= 2(u1) du = 2 du
u u
Z
= 2 u3/2 3u1/2 + 2u1/2 du = 4 5/2
5u 43 u3/2 + 8u1/2 + C
= 4
5 (1+ x)5/2 43 (1+ x)3/2 + 8(1+ x)1/2 + C.
Whew! Here we did not have the derivative factor $\frac{du}{dx} = \frac{1}{2\sqrt{x}}$ already present:
we had to multiply and divide by it to get $du$, then express the remaining
factors in terms of $u$. By luck, the resulting $\int \cdots\,du$ was do-able.
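$\int \sec^2(x)\tan(x)\,dx$. Here we could take $u = \tan(x)$, $du = \sec^2(x)\,dx$:
$$\int \sec^2(x)\tan(x)\,dx = \int \tan(x) \cdot \sec^2(x)\,dx = \int u\,du = \tfrac12 u^2 + C = \tfrac12 \tan^2(x) + C.$$
Alternatively, taking $u = \sec(x)$, $du = \sec(x)\tan(x)\,dx$ gives $\int u\,du = \frac12 \sec^2(x) + C$.
Thus $\frac12 \tan^2(x)$ and $\frac12 \sec^2(x)$ are two different antiderivatives, so what about
the Antiderivative Uniqueness Theorem (3.9)? In fact, the identity $\tan^2(x) + 1 = \sec^2(x)$
implies:
$$\tfrac12 \tan^2(x) + \tfrac12 = \tfrac12 \sec^2(x) .$$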
Integral Symmetry Theorem: If $f(x)$ is an odd function, meaning $f(-x) = -f(x)$,
then $\int_{-a}^{a} f(x)\,dx = 0$.
Proof. By the Integral Splitting Rule (4.2), we have:
$$\int_{-a}^{a} f(x)\,dx = \int_{-a}^{0} f(x)\,dx + \int_{0}^{a} f(x)\,dx .$$
Substituting $u = -x$, $du = (-1)\,dx$ in the first term:
$$\begin{aligned}
\int_{-a}^{0} f(x)\,dx &= \int_{-a}^{0} -f(x)\,(-1)\,dx = \int_{-a}^{0} f(-x)\,(-1)\,dx \\
&= \int_{a}^{0} f(u)\,du = -\int_{0}^{a} f(u)\,du = -\int_{0}^{a} f(x)\,dx .
\end{aligned}$$
The last equality holds because the variable of integration is merely suggestive, and
can be changed arbitrarily. Therefore $\int_{-a}^{0} f(x)\,dx + \int_{0}^{a} f(x)\,dx = -\int_{0}^{a} f(x)\,dx +
\int_{0}^{a} f(x)\,dx = 0$, as desired.
example: Evaluate the definite integral $\int_{-a}^{a} x\cos(x)\,dx$ over a symmetric interval
$[-a, a]$. Here substitution will not work, and it is difficult to find an antiderivative.
But since $(-x)\cos(-x) = -(x\cos(x))$, the Theorem tells us the integral must be zero.
Geometrically, the integral is the signed area between the graph and the x-axis:
Since the function f (x) = x cos(x) is odd, the graph has rotational symmetry
around the origin, and each negative area below the x-axis cancels a positive area
above the x-axis.
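The cancellation of negative and positive areas can be seen numerically by integrating the two halves of a symmetric interval separately. In this sketch the choice $a = \pi$ is only an illustrative assumption, and `midpoint_sum` is a hypothetical helper for a Riemann sum.

```python
import math


def midpoint_sum(func, a, b, n):
    """Riemann sum with sample points at the midpoints of n increments."""
    dx = (b - a) / n
    return sum(func(a + (i + 0.5) * dx) * dx for i in range(n))


def f(x):
    return x * math.cos(x)


a = math.pi   # illustrative choice of symmetric interval [-a, a]
left_half = midpoint_sum(f, -a, 0, 5000)
right_half = midpoint_sum(f, 0, a, 5000)
print(left_half, right_half, left_half + right_half)
```

The two halves are equal in size and opposite in sign, so their total is zero up to rounding.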
Math 132 Area Between Curves Stewart 5.1
Region between two parabolas. We have seen that geometrically, the integral
$\int_a^b f(x)\,dx$ computes the area between a curve $y = f(x)$ and an interval $x \in [a, b]$
on the x-axis (with area below the axis counted negatively). In Calculus II, we will
show the versatility of the integral to compute all kinds of areas, lengths, volumes:
almost any measure of size for a geometric object.
In this section, we compute more general areas: those between two given curves
y = f (x) and y = g(x), usually with no boundary on the x-axis.
example: Consider the region with top boundary $y = f(x) = x^2 + x + 1$, bottom
boundary $y = g(x) = 2x^2 - 1$, left boundary the y-axis $x = 0$, right boundary $x = 1$:
example: Next, consider the region between the same curves $y = f(x) = x^2 + x + 1$
and $y = g(x) = 2x^2 - 1$, but above the interval $x \in [1, 3]$. To picture the region
without a calculator, we determine the intersection points where the curves cross:
$$x^2 + x + 1 = 2x^2 - 1 \iff x^2 - x - 2 = 0 \iff x = -1 \text{ or } x = 2,$$
by the Quadratic Formula. Only $x = 2$ is relevant for our region above $x \in [1, 3]$.
At $x = 1$ we have $g(1) < f(1)$, so to the left of $x = 2$, our region is defined by
$g(x) \le y \le f(x)$. At $x = 3$, we have $f(3) < g(3)$, so to the right of $x = 2$, it is
$f(x) \le y \le g(x)$:
Repeating our previous area formula for the two parts of our region gives:
$$\begin{aligned}
A_{[1,3]} = A_{[1,2]} + A_{[2,3]} &= \int_1^2 (f(x) - g(x))\,dx + \int_2^3 (g(x) - f(x))\,dx \\
&= \int_1^2 (2 + x - x^2)\,dx + \int_2^3 (-2 - x + x^2)\,dx = \tfrac76 + \tfrac{11}{6} = 3.
\end{aligned}$$
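The two-part area computation can be checked numerically by integrating top minus bottom on each side of the crossing point $x = 2$. A minimal sketch; the helper names `gap` and `midpoint_sum` are illustrative.

```python
def f(x):
    return x ** 2 + x + 1


def g(x):
    return 2 * x ** 2 - 1


def midpoint_sum(func, a, b, n):
    """Riemann sum with sample points at the midpoints of n increments."""
    dx = (b - a) / n
    return sum(func(a + (i + 0.5) * dx) * dx for i in range(n))


def gap(x):
    """Top minus bottom, whichever curve is on top at x."""
    return abs(f(x) - g(x))


left_part = midpoint_sum(gap, 1, 2, 10000)    # expected near 7/6
right_part = midpoint_sum(gap, 2, 3, 10000)   # expected near 11/6
print(left_part, right_part, left_part + right_part)
```

Using the absolute value of the difference means the code does not need to know in advance which curve is on top.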
example: Finally, we consider the same curves $y = f(x) = x^2 + x + 1$ and
$y = g(x) = 2x^2 - 1$, but we take the entire finite region between them:
Theorem: The area of the region enclosed between $f(x)$ and $g(x)$ for
$x \in [a, b]$ is: $A = \int_a^b |f(x) - g(x)|\,dx$.
The absolute value signs ensure we take the integral of top minus bottom, regardless
of which is which. In practice, we must find the intersection points where
$f(x) = g(x)$, which split the integral into intervals where $g(x) \le f(x)$ versus
$f(x) \le g(x)$.
Here the boundary curves are naturally graphs in which $y$ is the independent variable:
the right boundary is the line $x = f(y) = y + 1$; and the left boundary is
$x = g(y) = y^2$, a parabola opening to the right.
Understand: it is merely by habit that we consider $y$ as a function of $x$. We can
make $x$ a function of $y$ instead if it is more convenient, and the same formulas
will work if we switch the roles of $x$ and $y$. Thus, we find the intersection points:
$y + 1 = y^2$ when $y = \frac{1 \pm \sqrt5}{2}$ by the Quadratic Formula. The area is:
$$A = \int_{\frac{1-\sqrt5}{2}}^{\frac{1+\sqrt5}{2}} (y+1) - (y^2)\,dy = \tfrac12 y^2 + y - \tfrac13 y^3 \,\Big|_{y = \frac{1-\sqrt5}{2}}^{y = \frac{1+\sqrt5}{2}} = \tfrac56 \sqrt5 .$$
Here $((y+1) - (y^2))\,dy$ represents the area of the horizontal slice of the region at
height $y$, with thickness $dy$.
To check this, we re-do it from our usual perspective, using $x$ as the independent
variable. This makes it more complicated, since we must consider the region
as having three boundary graphs: upper boundary $y = \sqrt{x}$, lower right boundary
$y = x - 1$, and lower left boundary $y = -\sqrt{x}$. The intersection points are:
Between $y = \sqrt{x}$ and $y = x - 1$: $x = \frac{3 + \sqrt5}{2}$ (upper right corner)
Between $y = -\sqrt{x}$ and $y = x - 1$: $x = \frac{3 - \sqrt5}{2}$ (lower middle corner)
Between $y = \sqrt{x}$ and $y = -\sqrt{x}$: $x = 0$ (left end)
These split the region into left and right parts:
which after much algebra gives the same answer as before: $\frac56 \sqrt5$.
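Rather than grinding through the algebra, the agreement between horizontal and vertical slicing can be checked numerically. The sketch below approximates both versions with midpoint Riemann sums; `midpoint_sum` is an illustrative helper, and the split of the vertical slices at the lower corner follows the three boundary graphs described above.

```python
import math


def midpoint_sum(func, a, b, n):
    """Riemann sum with sample points at the midpoints of n increments."""
    dx = (b - a) / n
    return sum(func(a + (i + 0.5) * dx) * dx for i in range(n))


root5 = math.sqrt(5)
y_low, y_high = (1 - root5) / 2, (1 + root5) / 2

# Horizontal slices: (right boundary) - (left boundary) = (y + 1) - y^2.
by_y = midpoint_sum(lambda y: (y + 1) - y ** 2, y_low, y_high, 20000)

# Vertical slices: the top is sqrt(x); the bottom is -sqrt(x) up to the
# lower corner x = (3 - sqrt(5))/2, and the line y = x - 1 after it.
x_mid, x_right = (3 - root5) / 2, (3 + root5) / 2
by_x = (midpoint_sum(lambda x: 2 * math.sqrt(x), 0, x_mid, 20000)
        + midpoint_sum(lambda x: math.sqrt(x) - (x - 1), x_mid, x_right, 20000))

print(by_y, by_x, 5 * root5 / 6)
```

All three values should agree closely, which supports the point that either variable may serve as the independent one.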