
Math117 Course Notes

This document contains lecture notes for a Calculus I course for engineering students. It begins with an overview of functions, including the definition of a function, domain and range, and composition of functions. It then discusses inverse functions, noting that not all functions have inverses and describing how to find the inverse of a function by solving for the input variable in terms of the output variable. The document cautions that notation involving inverses and reciprocals can be ambiguous and confusing.


Math 117

Calculus I for Engineering

Lecture Notes

David Harmsworth

Department of Applied Mathematics

University of Waterloo

2014, 2018
Part I

Functions
Since calculus is primarily concerned with the study of functions, we begin this course with

a review of some of the basic concepts. Since most of these ideas should already be familiar

to you, we’ll move quite quickly, with a focus on addressing some common misconceptions.

However, we will also introduce some concepts which are usually not discussed until later on

in the calculus sequence, so there should be something new for everyone each week.

1 Review of the Basics

A function is simply a rule which assigns a single output value to each input value. You are

probably familiar with the “vertical line test”; for a true function there can be only a single

output for each input, and this corresponds to the fact that its graph cannot pass through

any vertical line more than once. Since we customarily use the name “x” for the independent

(input) variable and the name “y” for the dependent (output) variable, we can describe the

action of a function f by writing y = f (x). You may occasionally also see the notation

f : x → f (x), which doesn’t require us to assign a name to the output.

Note that there doesn’t have to be a formula for a function. For instance, the temperature

at a given location can be regarded as a function of time, but there’s no explicit formula for

it! Of course, to use calculus, we might try to invent a formula which approximates the real

function.

The domain of a function is the set of allowable values for the independent variable, while

the range is the set of possible values for the dependent variable. For example, for the function

$f(x) = \sqrt{x - 1}$, the domain (unless otherwise specified) is the set of values of x such that x ≥ 1,

and the range is the set of values of y such that y ≥ 0.

Comment: We’ll often describe such sets in interval notation: an interval such as

{x | 1 < x < 2} can be expressed simply as the interval (1, 2). If we wish to include the

endpoints we use square brackets: the interval {x | 1 ≤ x ≤ 2} can be written as [1, 2]. These

two types of intervals are referred to as “open” and “closed”, respectively. The two types of

parentheses can be combined as needed, so for example the interval [1, 2) is closed on the left

and open on the right (that is, the number 1 is included, but the number 2 is not). We use the

symbol ∞ for unbounded intervals; it will always be accompanied by a round bracket (because

∞ is not a real number, so it can’t be included in an interval). With this notation we could

write the domain and range for $\sqrt{x - 1}$ as [1, ∞) and [0, ∞), respectively.

Comment #2: Of course, we could also impose a restriction on the domain for a given

function. For example, we could define a function as $g(x) = \sqrt{x - 1}$ with domain x ∈ [1, 5),

and then the range would be just [0, 2).

Comment #3: You’ll probably notice that textbooks alternate between two different

notations for introducing the functions they want you to work on in their exercises. Is there

a difference in meaning between writing, for example, $f(x) = x^2$, and writing $y = x^2$? Well,

obviously there isn’t much difference in the amount of information given; the difference is

really one of emphasis. In the former notation the emphasis is on the rule; we are given a name

for the function, and told that it is the one which squares the input. In the latter notation we

are given a name for the output, and the emphasis is on the relationship between variables.

In a sense it is a contraction of two statements: $y = f(x)$, where $f(x) = x^2$.

Comment #4: Sometimes other relationships between variables may also be of interest.

For example, the equation $x^2 + y^2 = a^2$ should be familiar as the equation of a circle of radius

a, but its graph clearly fails the vertical line test! So, why do we make such a fuss about which

relationships are functions and which aren’t? Well, it does make a difference for the theory of

calculus, so for example to perform certain calculations we might have to break the equation
of our circle into the two functions $y = \sqrt{a^2 - x^2}$ and $y = -\sqrt{a^2 - x^2}$.

2 Composition of Functions

If y = f (x) and x = g(t), then we can view y as a function of t: y = f (g(t)). There is a second

notation for this, too; we may write it as y = f ◦ g(t). This notation is convenient if we wish

to omit the independent variable; we can discuss the functions f , g, f ◦ g, g ◦ f , etc.

It’s easy to show that composition is not generally commutative. That is, f ◦ g is not

usually the same function as g ◦ f .

Example: If $f(x) = 1 - \sin x$ and $g(x) = x^2$,

then $f \circ g(x) = f(g(x)) = f(x^2) = 1 - \sin(x^2)$,

while $g \circ f(x) = g(f(x)) = g(1 - \sin x) = (1 - \sin x)^2$.
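
The example can be checked numerically. The following is a minimal sketch; the helper `compose` and the names `f_after_g` and `g_after_f` are introduced here purely for illustration.

```python
import math

def f(x):
    return 1 - math.sin(x)

def g(x):
    return x ** 2

def compose(outer, inner):
    """Return the function t -> outer(inner(t))."""
    return lambda t: outer(inner(t))

f_after_g = compose(f, g)   # x -> 1 - sin(x^2)
g_after_f = compose(g, f)   # x -> (1 - sin x)^2

x = 1.0
print(f_after_g(x), g_after_f(x))   # two different values at the same input
```

Evaluating both compositions at a single point where they differ is enough to show that the two functions are not the same.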

Comment on notation: Unfortunately, with all of the concepts we encounter in

mathematics, there will still occasionally be some ambiguity in our notation. For example, it

may have occurred to you that the interval notation we introduced above could cause some

confusion, since the expression (1, 2) could be interpreted either as an interval or as a point

in the xy-plane. The intent is usually clear from the context, though, and if it isn’t we can

make it clear by writing either x ∈ (1, 2) or (x, y) = (1, 2).

You’ll notice a similar problem if you consider the expression $h(2 - y^2)$. This could be

interpreted either as a product of a variable h and the difference $2 - y^2$, or as a function

named h evaluated at $2 - y^2$ (a composition). To help avoid confusion we traditionally use the

letters a, b, c, and d to represent parameters (quantities which are constant for the purposes

of the calculation, but can be altered), s, t, x, y, and z to represent variables, and the letters

f and g as names for functions¹. However, this is not a firm rule (and h is often used in both

roles), so you’ll need to pay attention to the context.²


¹ There are also traditional roles for the letters of the Greek alphabet: π is a famous constant, α and β are used to represent parameters, ε is used to denote a small parameter, φ and ψ are used as functions, ξ, η, and ζ are used as variables, and so on. Some of the letters are often used as counterparts to their English equivalents, so learning the Greek alphabet will be helpful.

² To make matters worse, in applications we’ll often use the same name for a function and a variable. For instance, if x represents the distance of a moving object from its starting point, and it can be calculated using a function of time f(t), then we ought to write x = f(t), using x for the output variable and f for the rule we use to calculate it. However, we may be dealing with a lot of variables and several functions, so it might actually be less confusing to write x = x(t). This blurring of the lines is even reflected in our speech; if y = f(x) we’ll say that “y is a function of x”... but y is a variable, not a function! Technically we should say “the value of y is given by a function of x”, but no one ever speaks that precisely - not even mathematicians.

3 Inverse Functions

We say that a function g is an inverse of a function f if g(f (x)) = x, for any x in the domain

of f . Actually, inverses are unique, so we may say that g is the inverse of f . In words we

might say that the inverse “undoes” the action of the original function.

Suppose y = f (x), and that g is the inverse of f . Then, by our definition, we know that

g(f (x)) = x. This means that

g (y) = x.

Applying f to both sides of this equation we get

f (g (y)) = f (x) ,

and so

f (g (y)) = y

(using the fact that f (x) = y again).

Therefore if g is the inverse of f , then f is also the inverse of g.

Notation: Unfortunately, the problems with notation do not end with the comments of

the previous section. The standard notation for the inverse of f(x) is $f^{-1}(x)$, which is arguably

the worst piece of notation in all of calculus. The reason should be clear; the inverse of f is not

the same thing as the reciprocal of f! That is, we may write the reciprocal, $\frac{1}{f(x)}$, as $[f(x)]^{-1}$,

but we must not confuse this with $f^{-1}(x)$, which means something completely different. We’ll

see this problem again a little bit later with the so-called³ inverse trigonometric functions:

$\sin^2 x$ is understood to mean the same thing as $[\sin x]^2$,

but $\sin^{-1} x$ is not the same thing as $[\sin x]^{-1}$!

(Just for clarity: $[\sin x]^{-1}$ means $\frac{1}{\sin x}$, which is also written as csc x, while $\sin^{-1} x$ is the

inverse, which is also called arcsin(x).)

Many authors prefer to use the name arcsin x for the inverse, but even if you decide to

use this consistently you must never write $\sin^{-1} x$ to represent the reciprocal, because you

will be misunderstood!
³ You’ll see the reason for the adjective “so-called” when we get to that topic.

Finding Inverses: In simple cases we can find the inverse function simply by solving the

equation y = f (x) for x.

Example: If $f(x) = \tfrac{1}{2}(x - 1)$, find the inverse of f.

Solution: It will help to give a name to the output, so let’s write $y = \tfrac{1}{2}(x - 1)$. Then we

have

2y = x − 1,

and then

x = 2y + 1.

Therefore the inverse of f is given by $f^{-1}(y) = 2y + 1$.

We can confirm that this is correct: $f^{-1}(f(x)) = 2\left[\tfrac{1}{2}(x - 1)\right] + 1 = (x - 1) + 1 = x$.
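
The same confirmation can be sketched in code. The function names below are chosen here for illustration, and the sample inputs are arbitrary.

```python
def f(x):
    return 0.5 * (x - 1)

def f_inverse(y):
    # obtained by solving y = (x - 1)/2 for x
    return 2 * y + 1

for x in (-3.0, 0.0, 1.0, 2.5, 10.0):
    assert abs(f_inverse(f(x)) - x) < 1e-12   # the inverse undoes f
    assert abs(f(f_inverse(x)) - x) < 1e-12   # and f undoes the inverse
```

Note that the name of the argument (`x` or `y`) plays no role in what either function computes.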


 

Note: You’ve been taught to switch the variables, so that in this example you would obtain

the expression y = 2x + 1, but this is not at all necessary. Writing $f^{-1}(y) = 2y + 1$ gives exactly

the same information as writing $f^{-1}(x) = 2x + 1$. As we discussed at the very beginning, the

function is just the rule: multiply by 2 and then add 1. It doesn’t matter what name we give

to the independent variable; the function remains the same. It is traditional to interchange

the variables simply because it is traditional to use x as the independent variable!

In fact, in applications, interchanging the variables is a terrible idea, since the variables

will usually be associated with specific quantities. For example, suppose that the distance

travelled by a moving object can be calculated as $x = f(t) = \sqrt{2t - 1}$, where t represents

time. Then the inverse function gives the time required for the object to move a given distance:

$t = f^{-1}(x) = \tfrac{1}{2}(x^2 + 1)$. Interchanging the variables here would be crazy.

Tradition also dictates that when graphing a function y = f(x), we should use the horizontal

axis for the independent variable. If we do this (and also interchange the names of the

variables so that the horizontal axis corresponds to the variable x), then it follows that the

graph of the inverse will be the reflection of the original graph across the line y = x. Why?

It’s precisely because we’re interchanging the roles of x and y!⁴


⁴ This is an important point: the change in the graph is a result of the swap of the axes, and really has nothing to do with the concept of the inverse. In fact, if we don’t swap the axes, then the graphs of f and $f^{-1}$ will be the same! Graphs are a tool for visualizing relationships between variables, and the two functions f and $f^{-1}$ correspond to the same relationship.

Invertibility: Not every function possesses an inverse. The problem, of course, is that for

the inverse to be a true function, it must have a single output for every input. This requires

that the original function must have a single input for every output, in which case we say that

it is one-to-one. We can often spot this from the graph; if f is one-to-one it will pass the

“horizontal line test”. This ties in with our discussion above; if f is invertible then its graph

will pass the vertical line test after we interchange the axes!

Note that even if f is not one-to-one, and therefore not invertible, we may be able to

restrict its domain to some interval on which it is one-to-one, and then we can define an

inverse for the restriction of f to that domain.

Example: The function $f(x) = x^2$ has no inverse. However, the restriction of f to the

interval [0, ∞) does have an inverse: let’s call it $g_+(x) = \sqrt{x}$. Alternatively, we could consider

the restriction of f to the interval (−∞, 0], which has the inverse $g_-(x) = -\sqrt{x}$.
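
A short sketch makes the role of the restriction concrete. The names `g_plus` and `g_minus` mirror the two inverses of the restricted squaring function; the test values are arbitrary.

```python
import math

def f(x):
    return x ** 2

def g_plus(y):
    # inverse of f restricted to [0, infinity)
    return math.sqrt(y)

def g_minus(y):
    # inverse of f restricted to (-infinity, 0]
    return -math.sqrt(y)

print(g_plus(f(3.0)))    # recovers 3.0, since 3.0 lies in [0, infinity)
print(g_minus(f(-3.0)))  # recovers -3.0, since -3.0 lies in (-infinity, 0]
print(g_plus(f(-3.0)))   # gives 3.0, not -3.0: g_plus does not undo f here
```

Each candidate inverse only undoes f on the interval to which f was restricted.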

4 Symmetry

Definition: We say a function f (x) is even if f (−x) = f (x).

Of course, the graph of an even function is symmetric about the y-axis. A famous example is

the cosine function:

[Figure 1: graph of the cosine function]

The prototypical examples are even powers: $x^2$, $x^4$, $x^6$, etc. (these are probably the reason

for the use of the word “even”). Reciprocal even powers are also even: $x^{-2}$, $x^{-4}$, etc., and so

is the absolute value function, |x|. Note that we can obtain an even function whose graph lies

in between $x^2$ and $x^4$ (for x ≠ 0) by writing $|x^3|$.

Definition: We say a function f (x) is odd if f (−x) = −f (x).

The graph of an odd function is said to be symmetric “about the origin”, or “antisymmetric”.

By this we mean that it is unchanged by a 180° rotation, or equivalently, unchanged by a pair

of successive reflections, across both axes. The sine function is odd, of course:

[Figure 2: graph of the sine function]

So are odd powers and reciprocal powers: $x$, $x^3$, $x^5$, etc., $x^{-1}$, $x^{-3}$, etc. Here’s a little

challenge: can you think of a way to construct an odd function whose graph lies in between $x$

and $x^3$? (Answer at end of section).

Facts about symmetric functions:

• It can be shown⁵ that

    - (EVEN) × (EVEN) = EVEN

    - (ODD) × (ODD) = EVEN

    - (EVEN) × (ODD) = ODD

⁵ To prove the first one, let f(x) and g(x) be even functions, and let h(x) = f(x) g(x). Then h(−x) = f(−x) g(−x) = f(x) g(x) = h(x), and so h is even. To remember the rules, just think of the simplest examples: $x^2 \cdot x^4 = x^6$, $x \cdot x^3 = x^4$, and $x \cdot x^2 = x^3$. From these you can see why the rules for products of even and odd functions correspond to the rules for sums of even and odd integers!

• Most functions are neither even nor odd. However, any function whose domain is

symmetric about x = 0 (so that f(−x) is defined whenever f(x) is defined) can be

expressed as the sum of an even component and an odd component. Here’s how:

$$
\begin{aligned}
f(x) &= f(x) + \tfrac{1}{2} f(-x) - \tfrac{1}{2} f(-x) \\
     &= \tfrac{1}{2} f(x) + \tfrac{1}{2} f(x) + \tfrac{1}{2} f(-x) - \tfrac{1}{2} f(-x) \\
     &= \tfrac{1}{2}\left[f(x) + f(-x)\right] + \tfrac{1}{2}\left[f(x) - f(-x)\right].
\end{aligned}
$$
Now let $g(x) = \tfrac{1}{2}\left[f(x) + f(-x)\right]$ and let $h(x) = \tfrac{1}{2}\left[f(x) - f(-x)\right]$. Observe that

f(x) = g(x) + h(x), and when we examine these two functions we find that

$$g(-x) = \tfrac{1}{2}\left[f(-x) + f(x)\right] = g(x),$$

so g is even, and

$$h(-x) = \tfrac{1}{2}\left[f(-x) - f(x)\right] = -\tfrac{1}{2}\left[f(x) - f(-x)\right] = -h(x),$$

so h is odd. Thus we have obtained a formula for obtaining the even and odd

components of f . We don’t often use this formula, but it is the basis for two useful

functions; if we break the exponential function up into its even and odd components we

obtain
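$$e^x = \tfrac{1}{2}\left(e^x + e^{-x}\right) + \tfrac{1}{2}\left(e^x - e^{-x}\right),$$

and from this we define the functions:

$$\cosh(x) := \tfrac{1}{2}\left(e^x + e^{-x}\right), \qquad \sinh(x) := \tfrac{1}{2}\left(e^x - e^{-x}\right).$$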

These are the even and odd components of the natural exponential function! They are

known as the hyperbolic cosine and hyperbolic sine functions, respectively, and we’ll

discuss the reasons for those names shortly.
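
The decomposition can be sketched as a pair of small helpers. The names `even_part` and `odd_part` are introduced here for illustration; applied to the exponential function they should reproduce the standard library's `math.cosh` and `math.sinh` up to rounding.

```python
import math

def even_part(f):
    """Return the even component x -> [f(x) + f(-x)] / 2."""
    return lambda x: 0.5 * (f(x) + f(-x))

def odd_part(f):
    """Return the odd component x -> [f(x) - f(-x)] / 2."""
    return lambda x: 0.5 * (f(x) - f(-x))

g = even_part(math.exp)   # should agree with cosh
h = odd_part(math.exp)    # should agree with sinh

for x in (-2.0, -0.5, 0.0, 1.0, 3.0):
    assert math.isclose(g(x) + h(x), math.exp(x))        # the components sum to f
    assert math.isclose(g(x), math.cosh(x))
    assert math.isclose(h(x), math.sinh(x), abs_tol=1e-12)
```

The helpers work for any function defined on a domain symmetric about 0, not just the exponential.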

As you learn more and more calculus you’ll discover several reasons why symmetry can be

important. For the time being, just observe that noticing symmetry can make sketching

easier:

Example: Sketch the graph of the function $f(x) = x^2 - |x|$.

Solution: First note that f is even, so we can start by sketching the part of the graph

for x ≥ 0, and just reflect it. For x ≥ 0, we have

$$f(x) = x^2 - x = \left(x - \tfrac{1}{2}\right)^2 - \tfrac{1}{4}.$$

This tells us that this part of the graph is part of an upward-opening parabola with its vertex

at $\left(\tfrac{1}{2}, -\tfrac{1}{4}\right)$. Finding the intercepts might be helpful:

$$x^2 - x = 0 \implies x(x - 1) = 0 \implies x = 0 \text{ or } x = 1.$$

Now we just sketch this, and reflect it!

[Figure 3: graph of $f(x) = x^2 - |x|$]

Answer to “little challenge” involving odd functions: To use just powers of x and

the absolute value function, we can let $f(x) = \frac{x^3}{|x|}$ (or $f(x) = \frac{|x^3|}{x}$, which is exactly the same

function).

5 Piecewise-Defined Functions

We will sometimes encounter functions which are defined by different formulas on different

parts of their domains. For example, we might have

 5





−x, if x < 0 4


f (x) = x2 , if 0 ≤ x < 2 3





 2

4,
 if x ≥ 2
1

-5 -4 -3 -2 -1 0 1 2 3 4 5

-1

Such functions might seem artificial, but there are many physical phenomena which need

to be described “piecewise”. For example, consider the density of water/ice as a function of

temperature at standard sea-level pressure; it undergoes an abrupt change at 0 °C!

There are several simple piecewise-defined functions which are used quite commonly. One

of them should already be familiar to you; the absolute value function is of great importance

in calculus.

5.1 The Absolute Value Function

The absolute value of a number x is defined as




$$
|x| = \begin{cases}
x, & \text{if } x \ge 0 \\
-x, & \text{if } x < 0.
\end{cases}
$$

Alternatively, we could define it as $|x| = \sqrt{x^2}$ (this works because the symbol “√” always

denotes the positive square root⁶).

It will be useful to think of |x| as the distance between x and zero on the number line.

Similarly, we can think of |x − a| as being the distance between x and a specific number a.

Example: The inequality |x − 5| < 2 is satisfied by the set of points within 2 units of 5:
⁶ Also, if we’re working with real numbers, then the notation $x^{1/2}$ means exactly the same thing as $\sqrt{x}$, so we could also write $|x| = (x^2)^{1/2}$. This means that $(x^2)^{1/2}$ is not always equal to x!

[Figure 4: number line marking the points within 2 units of 5]

Why is this true? Consider the definition:

If x ≥ 5, then |x − 5| < 2 means x − 5 < 2, so x < 7.

If x < 5 instead, then |x − 5| < 2 means − (x − 5) < 2. That is, x − 5 > −2, so x > 3

(recall that if we multiply an inequality by a negative number, then the inequality reverses

direction). Hence we may have x ∈ [5, 7) or x ∈ (3, 5). Combining these, we know that

x ∈ (3, 7).

Although we will often be able to find shortcuts, it is essential that you be able to use

the definition of |x|. It allows us to break difficult problems down into cases, so that we can

analyze them one piece at a time.

Example: Solve the inequality |x + 3| ≤ |2x + 1|.

Solution: The expressions |x + 3| and |2x + 1| each have two possible meanings, so it

seems as though we should have four cases to consider. However, if you try to identify the

four cases you’ll realize that there are only three, since it is impossible to have x + 3 < 0 and

2x + 1 ≥ 0 at the same time (this would mean that x < −3 and x ≥ −1/2). The easiest way

to see this is to realize that the meaning of our inequality changes at two values of x : −3 and

−1/2. That means that there are just three intervals to consider!

Case I: Suppose x < −3. Then both x + 3 and 2x + 1 are negative, so the inequality

reads

−x − 3 ≤ −2x − 1

=⇒ x ≤ 2.

Think about what that tells us for a moment: if we assume that x < −3, then the inequality

requires that x ≤ 2, which is guaranteed anyway! So, the inequality is solved by any number

less than −3.

Case II: Suppose $x \in \left[-3, -\tfrac{1}{2}\right)$. Then x + 3 is nonnegative, but 2x + 1 is still negative, so

the inequality reads

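$$x + 3 \le -2x - 1 \implies 3x \le -4 \implies x \le -\tfrac{4}{3}.$$

That is, of the values of x in the interval $\left[-3, -\tfrac{1}{2}\right)$, only the values less than or equal to $-\tfrac{4}{3}$ satisfy the

inequality (so the values in $\left(-\tfrac{4}{3}, -\tfrac{1}{2}\right)$ are excluded from our solution set).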


Case III: Suppose $x \ge -\tfrac{1}{2}$. Then both x + 3 and 2x + 1 are nonnegative, so the inequality is

x + 3 ≤ 2x + 1

=⇒ x ≥ 2.

This excludes the values between $-\tfrac{1}{2}$ and 2.

Combining the results from the three cases, we can conclude that the inequality |x + 3| ≤

|2x + 1| is satisfied by

$$x \in \left(-\infty, -\tfrac{4}{3}\right] \cup [2, \infty).$$
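
A brute-force check is a useful sanity test for case analyses like this one. The sketch below samples rational points (the grid spacing of 1/12 is an arbitrary choice that happens to include the endpoints −4/3 and 2) and compares the inequality itself with the claimed solution set; exact fractions are used so that the boundary points are not affected by floating-point rounding.

```python
from fractions import Fraction

def satisfies_inequality(x):
    return abs(x + 3) <= abs(2 * x + 1)

def in_claimed_solution(x):
    return x <= Fraction(-4, 3) or x >= 2

# Sample points from -10 to 10 in steps of 1/12
samples = [Fraction(k, 12) for k in range(-120, 121)]
mismatches = [x for x in samples
              if satisfies_inequality(x) != in_claimed_solution(x)]
print(mismatches)   # an empty list means the two descriptions agree on every sample
```

Agreement on a finite sample is not a proof, of course, but a single mismatch would immediately reveal an error in one of the cases.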

5.2 The Signum Function

You’re unlikely to see this one often, but we’ll mention it briefly. The signum function simply

assigns a value to x based on its sign:







$$
\operatorname{sgn}(x) = \begin{cases}
-1, & \text{if } x < 0 \\
0, & \text{if } x = 0 \\
1, & \text{if } x > 0.
\end{cases}
$$

Notice that x sgn (x) = |x|.

5.3 Ramp Functions

This is another simple one which is useful in signal processing:

 4


0,
 if t < 0 3.2

r (t) = 2.4

ct,
 if t ≥ 0 1.6

0.8

(where c is a constant)
-1 -0.5 0 0.5 1 1.5 2 2.5 3

-0.8

5.4 The Floor and Ceiling Functions

The floor function has infinitely many pieces, but it can be defined simply in words:

$$\lfloor x \rfloor = \text{greatest integer} \le x$$

(to put it another way, $\lfloor\ \rfloor$ rounds the input down!).

Example 1: $\lfloor 4.17 \rfloor = 4$, $\lfloor 7 \rfloor = 7$, and $\lfloor -2.32 \rfloor = -3$ (not −2; we just said the function

rounds down!).

Example 2: Describe the action of $f(x) = \left\lfloor x + \tfrac{1}{2} \right\rfloor$.

Answer: Observe that f(2.1) = 2, f(2.5) = 3, f(2.7) = 3, f(3) = 3, etc. This function

describes our standard rounding procedure!

Example 3: What does the function $g(x) = \tfrac{1}{10} \lfloor 10x \rfloor$ do?

Answer: Notice that we have g(2.1) = 2.1, g(2.13) = 2.1, g(2.1313...) = 2.1, and so on.

This function truncates the input after one decimal place.

The counterpart to the floor function is the ceiling function:

$$\lceil x \rceil = \text{least integer} \ge x$$

(this rounds up instead of down). We don’t really need both of these, since $\lceil x \rceil = -\lfloor -x \rfloor$.
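
These examples are easy to try out. The following sketch uses Python's `math.floor` together with exact `Fraction` inputs (an assumption made here only to avoid binary floating-point rounding in the decimal examples); the function names are illustrative.

```python
import math
from fractions import Fraction

def round_half_up(x):
    # f(x) = floor(x + 1/2): the "standard rounding" of Example 2
    return math.floor(x + Fraction(1, 2))

def truncate_one_decimal(x):
    # g(x) = floor(10 x) / 10 from Example 3
    return Fraction(math.floor(10 * x), 10)

def ceiling_via_floor(x):
    # the identity ceil(x) = -floor(-x)
    return -math.floor(-x)

print(math.floor(Fraction("-2.32")))                 # rounds down, away from zero here
print(round_half_up(Fraction("2.5")))
print(truncate_one_decimal(Fraction("2.13")))
print(ceiling_via_floor(Fraction("4.17")) == math.ceil(Fraction("4.17")))
```

For negative inputs `truncate_one_decimal` rounds down rather than chopping digits, which is consistent with the definition of the floor function.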

5.5 The Fractional-Part Function

Here’s another simple idea, which can most easily be defined this way:

$$\mathrm{FRACPT}(x) = x - \lfloor x \rfloor$$

Examples: FRACPT(2.71828) = 0.71828, FRACPT(−4.23) = 0.77.

Note that FRACPT(x) is periodic! This gives it some interesting applications. For

example, if we are given an angle θ ∈ (−∞, ∞) in radians, we can obtain the corresponding

angle in the interval [0, 2π) by using the function $f(\theta) = 2\pi \, \mathrm{FRACPT}\!\left(\frac{\theta}{2\pi}\right)$. Try it!
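
Taking up the invitation to try it, here is a minimal sketch. The names `fracpt` and `wrap_angle` are chosen here for illustration, and the results are floating-point approximations.

```python
import math

def fracpt(x):
    return x - math.floor(x)

def wrap_angle(theta):
    """Map an angle in radians to the corresponding angle in [0, 2*pi)."""
    return 2 * math.pi * fracpt(theta / (2 * math.pi))

print(fracpt(2.71828))             # approximately 0.71828
print(fracpt(-4.23))               # approximately 0.77
print(wrap_angle(5 * math.pi))     # approximately pi
print(wrap_angle(-math.pi / 2))    # approximately 3*pi/2
```

Because of rounding, an input that is a tiny negative number can produce a result indistinguishable from 2π itself, so code that relies on the strict upper bound should treat that edge case separately.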

5.6 The Heaviside Function (a.k.a. the Unit Step Function)

This is perhaps the simplest piecewise-defined function we can imagine, and for that reason it’s

also one of the most important (along with the absolute value function). This is all it is:


$$
H(t) = \begin{cases}
0, & \text{if } t < 0 \\
1, & \text{if } t \ge 0
\end{cases}
$$

[Figure: graph of H(t)]

This can be used to write any piecewise-defined function in single-line form. In fact, you’ll

need to be able to do this in your third calculus course, so we’ll try to get you used to the idea

now.

First, notice that

$$
f(t)\,H(t) = \begin{cases}
0, & \text{if } t < 0 \\
f(t), & \text{if } t \ge 0,
\end{cases}
$$

so if we think of moving from left to right (increasing time), then multiplication by H (t) can

be thought of as a switch, to activate the signal f (t) at time t = 0.

Example: $e^{-t} H(t)$ has this graph:

[Figure 5: graph of $e^{-t} H(t)$]

Also, we can shift this effect, since

$$
H(t - a) = \begin{cases}
0, & \text{if } t < a \\
1, & \text{if } t \ge a.
\end{cases}
$$



0,
 if t < 1
Example: Consider f (t) = t2 H (t − 1). We can express this as f (t) = so
t2 ,

 if t ≥ 1,
the graph looks like this:

f (t)
3

!1 1 2 t
Figure 6:

That’s really all we need to know; with this tool we can produce an infinite variety of

functions with finite numbers of discontinuities.

Example: Sketch $f(t) = t + (2 - t)\,H(t + 2) + (t^2 - 1)\,H(t) + (2 - t - t^2)\,H(t - 1)$.


 

Solution: First write the function in piecewise-defined form (just apply the definition of

H (t)):
If t ∈ (−∞, −2), then $f(t) = t$.

If t ∈ [−2, 0), then $f(t) = t + (2 - t) = 2$.

If t ∈ [0, 1), then $f(t) = t + (2 - t) + (t^2 - 1) = t^2 + 1$.

If t ∈ [1, ∞), then $f(t) = t + (2 - t) + (t^2 - 1) + (2 - t - t^2) = 3 - t$.

Now we can sketch it:

[Figure 7: graph of f(t)]

Of course, for this idea to be useful we must be able to do these problems in reverse. That

is, given a function defined in piecewise form, we need to be able to express it in terms of

the Heaviside function. There are many ways to do this, but there’s one simple strategy that

turns out to be the most useful. The idea is to begin with whatever we have on the left-most

portion of the graph, and use the Heaviside function to impose changes at each point where

the definition changes. We need the Heaviside function only in the form H (t − a); all we need

to do is set the values of a and work out what H (t − a) needs to be multiplied by.

Example: Consider the function






$$
f(t) = \begin{cases}
-t, & \text{for } t < 0 \\
t^2, & \text{for } 0 \le t < 2 \\
4, & \text{for } t \ge 2.
\end{cases}
$$

We want to write this in the form

f (t) = −t + ____H (t) + ____H (t − 2) .

What should go in the spaces? Well, at time t = 0, we need to replace −t with $t^2$, so we

simply add $t + t^2$. At time t = 2, we need to replace $t^2$ with 4, so that’s exactly what we do;

we add $4 - t^2$!

$$f(t) = -t + (t + t^2)\,H(t) + (4 - t^2)\,H(t - 2).$$
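
It is worth confirming that the single-line form really does reproduce the piecewise definition. A minimal sketch, with `H` implemented according to the definition used in these notes (value 1 at t = 0) and sample points chosen arbitrarily:

```python
from fractions import Fraction

def H(t):
    """Heaviside (unit step) function, with H(0) = 1."""
    return 1 if t >= 0 else 0

def f_piecewise(t):
    if t < 0:
        return -t
    if t < 2:
        return t ** 2
    return 4

def f_heaviside(t):
    return -t + (t + t ** 2) * H(t) + (4 - t ** 2) * H(t - 2)

# Compare the two forms on a grid that includes the switching points 0 and 2
samples = [Fraction(k, 4) for k in range(-20, 21)]
assert all(f_piecewise(t) == f_heaviside(t) for t in samples)
```

Exact fractions are used so that the comparison is an equality test rather than an approximate one.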
 

The “Shortcuts”

When the given functions are more complicated it can be harder to do these calculations in

our heads, so some people prefer to use a couple of tricks to write down an expression quickly,

and then simplify it afterwards. This actually ends up requiring more work, but it might help

us to avoid mistakes.

Consider that

$$
1 - H(t - a) = \begin{cases}
1, & \text{if } t < a \\
0, & \text{if } t \ge a,
\end{cases}
$$

so its graph looks like this:

[Figure: graph of 1 − H(t − a)]

This can be used as a mathematical “off” switch.

Also consider the combination






$$
H(t - a) - H(t - b) = \begin{cases}
0, & \text{for } t < a \\
1, & \text{for } a \le t < b \\
0, & \text{for } t \ge b,
\end{cases}
$$

for which the graph looks like this:

[Figure: graph of H(t − a) − H(t − b)]

We can use this as a single “on / off” switch.

With these, we can proceed formulaically. That is, given a function with, say, four “pieces”

$$
f(t) = \begin{cases}
g_1(t), & \text{for } t < a \\
g_2(t), & \text{for } a \le t < b \\
g_3(t), & \text{for } b \le t < c \\
g_4(t), & \text{for } t \ge c,
\end{cases}
$$

we can immediately write

$$
\begin{aligned}
f(t) = {} & g_1(t)\,[1 - H(t - a)] \\
          & + g_2(t)\,[H(t - a) - H(t - b)] \\
          & + g_3(t)\,[H(t - b) - H(t - c)] \\
          & + g_4(t)\,H(t - c).
\end{aligned}
$$

We can then simplify this by “collecting” Heaviside functions:

$$
\begin{aligned}
f(t) = {} & g_1(t) + [g_2(t) - g_1(t)]\,H(t - a) \\
          & + [g_3(t) - g_2(t)]\,H(t - b) + [g_4(t) - g_3(t)]\,H(t - c).
\end{aligned}
$$
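
Both forms translate directly into code. The sketch below builds each one from a list of pieces and a list of breakpoints and checks that they agree; the function names and the four sample pieces are hypothetical, chosen only for illustration.

```python
def H(t):
    """Heaviside (unit step) function, with H(0) = 1."""
    return 1 if t >= 0 else 0

def switch_form(pieces, breakpoints):
    """'Off' switch for the first piece, 'on/off' switches in between, 'on' for the last."""
    def f(t):
        total = pieces[0](t) * (1 - H(t - breakpoints[0]))
        for g, a, b in zip(pieces[1:-1], breakpoints, breakpoints[1:]):
            total += g(t) * (H(t - a) - H(t - b))
        total += pieces[-1](t) * H(t - breakpoints[-1])
        return total
    return f

def collected_form(pieces, breakpoints):
    """Start with the left-most piece and add the change at each breakpoint."""
    def f(t):
        total = pieces[0](t)
        for previous, current, a in zip(pieces, pieces[1:], breakpoints):
            total += (current(t) - previous(t)) * H(t - a)
        return total
    return f

# Hypothetical pieces g1..g4 with breakpoints a = 0, b = 1, c = 2
pieces = [lambda t: t, lambda t: 1, lambda t: t * t, lambda t: 0]
breakpoints = [0, 1, 2]

f_switch = switch_form(pieces, breakpoints)
f_collected = collected_form(pieces, breakpoints)

# Quarter-integer samples are exactly representable as floats
samples = [k / 4 for k in range(-12, 17)]
assert all(f_switch(t) == f_collected(t) for t in samples)
```

The collected form needs one Heaviside evaluation per breakpoint, which is why it is usually the more convenient of the two to write down.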

Remark: You might be wondering how we deal with functions such as


$$
f(t) = \begin{cases}
t, & \text{if } t \le 1 \\
-t, & \text{if } t > 1
\end{cases}
$$

[Figure: graph of f(t)]

in which there appears a “≤” instead of a “≥” (so our intervals are closed on the right instead of

on the left). It is possible to deal with these; for this example we’d need to use the expression

H (1 − t). However, in applications we will not usually worry about this detail. For example,

if you imagine using mathematics to model what happens when you turn a lamp on, it doesn’t

really matter whether we consider the light to be on or off at the precise moment when we

flick the switch!

In fact, various textbooks define the Heaviside function differently; you may see it defined

as

$$
H(t) = \begin{cases}
0, & \text{if } t \le 0 \\
1, & \text{if } t > 0
\end{cases}
$$

or even as

$$
H(t) = \begin{cases}
0, & \text{if } t < 0 \\
1, & \text{if } t > 0.
\end{cases}
$$

We simply won’t need to worry about the distinction once we start using the function in

applications.

6 Periodicity

Definition: A function f(t) is periodic if there is a number T > 0 such that f(t + nT) = f(t)

for every t and every integer n. The number T (conventionally the smallest such value) is called the period.

The graph of a periodic function consists of infinitely many segments which are replicas of

each other:

[Figure 8: graph of a periodic function]

There are a few terms used to describe periodic functions that you should know:

• Frequency = $\frac{1}{\text{Period}}$. If the period is measured in seconds, then the frequency is in

“Hertz”: $1\ \text{Hz} = 1\ \text{s}^{-1}$. This is the number of complete cycles per second. In engineering,

it’s common to use the letter f for frequency, while physicists tend to use the Greek

letter ν (“nu”) instead.

• Angular Frequency = $\frac{2\pi}{\text{Period}} = 2\pi f$. This has units of radians per second (we’ll review

what radians are shortly). We will actually refer to the angular frequency much more
often than to the frequency; so much so that we’ll often get tired of saying “angular”!

When we say “frequency”, we often mean “angular frequency” - watch out for this. Our

notation will help; it is customary to denote the angular frequency by the Greek letter

ω (or a, b, m, n, or λ... but never f ).

Periodic Extensions

Occasionally, in applications to analog electronics, we’ll be given a voltage or a current over

a certain time interval, and we’ll need to create a function which matches the given one over

the given interval, but is defined over the entire real line, and is periodic. The same task may

be required in other applications as well. For example, we might be given this as our original

function:
$$
f(x) = \begin{cases}
2x, & \text{if } x \in \left[0, \tfrac{1}{2}\right) \\
2 - 2x, & \text{if } x \in \left[\tfrac{1}{2}, 1\right] \\
0, & \text{elsewhere}
\end{cases}
$$

[Figure: graph of f(x)]

We could extend this as an even periodic function (of period 1), like this:

[Figure 9: the even periodic extension of f(x), with period 1]

Alternatively, we could extend it as an odd function of period 2, like this:

[Figure 10: the odd periodic extension of f(x), with period 2]

Notice that both of these reproduce f (x) over the original interval (0, 1). You may see some

more examples of these in the assignments.
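
One possible way to build the two extensions in code is sketched below. It assumes the triangular f(x) given above (rising as 2x up to x = 1/2, then falling as 2 − 2x), and the reduction formulas and function names are choices made here for illustration rather than anything prescribed by the notes.

```python
import math

def f(x):
    """The original function: a triangle on [0, 1], zero elsewhere."""
    if 0 <= x <= 0.5:
        return 2 * x
    if 0.5 < x <= 1:
        return 2 - 2 * x
    return 0.0

def even_extension(x):
    """Period-1 extension: reduce x to [0, 1) and reuse f."""
    return f(x - math.floor(x))

def odd_extension(x):
    """Period-2 odd extension: reduce x to [-1, 1), then reflect through the origin."""
    r = x - 2 * math.floor((x + 1) / 2)
    return f(r) if r >= 0 else -f(-r)

# Multiples of 1/8 are exactly representable as floats
samples = [k / 8 for k in range(-24, 25)]
assert all(even_extension(x) == even_extension(-x) for x in samples)
assert all(odd_extension(x) == -odd_extension(-x) for x in samples)
assert all(even_extension(x + 1) == even_extension(x) for x in samples)
assert all(odd_extension(x + 2) == odd_extension(x) for x in samples)
```

The period-1 extension comes out even only because this particular f is symmetric about x = 1/2; for a general f an even extension would need period 2.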

7 Rational Functions & Partial Fraction Decomposition


A rational function is a ratio of polynomials, such as $\frac{x^2 - 4x + 1}{x^3 + 5x^2 - 3}$. We call a rational function

proper if the degree of the numerator is less than the degree of the denominator, and improper

otherwise⁷.

You should be quite comfortable with the idea of combining rational functions into one

through finding a common denominator, but in calculus we will often find it necessary to

reverse this procedure. Fortunately, this can always be done... provided that we can manage

to factor the denominator.

Fact: Any proper rational function can be expressed as the sum of simpler rational functions,

whose denominators are either linear or irreducibly quadratic.

Example:

$$\frac{5x^2 - 5x + 4}{x^3 - x^2 - x - 2} = \frac{3x - 1}{x^2 + x + 1} + \frac{2}{x - 2}$$

How do we do this? The Method of Partial Fractions essentially consists of guessing the

form of the decomposition on the right by taking advantage of the experience we have in

working in the other direction. If we think about all of the various things that can happen

when we combine rational functions together, we arrive at the following procedure.

We begin by factoring the denominator as far as possible, into linear and irreducibly

quadratic factors (the Fundamental Theorem of Algebra guarantees that this is always possible

in theory, although it can be difficult in practice). We then predict the form of the partial

fraction decomposition using three rules (which are admittedly difficult to explain clearly, but

which should become clearer through the examples):

Footnote 7: Some authors use the term marginally proper if the numerator and denominator are of the same degree, in which case our term proper needs to be replaced by strictly proper.

1. For any linear factor $(c_1 x + c_0)$ in the denominator, the decomposition will contain a term of the form $\frac{A}{c_1 x + c_0}$, for some constant A.

2. For any irreducible quadratic factor $(c_2 x^2 + c_1 x + c_0)$ in the denominator, the decomposition will contain a term of the form $\frac{Ax + B}{c_2 x^2 + c_1 x + c_0}$, for some constants A and B.

3. For any factor which is repeated n times, we need n terms of the forms given by Rules 1 & 2, with the factor in the denominator raised to the exponents 1 through n.

If you’re not sure why these rules work the way they do, just try doing some of the following

examples in reverse (that is, go through the procedure of putting the results over the common

denominator), and you should begin to see the logic behind them.

Examples:

1. Consider $\frac{x+2}{x^2+5x+4}$. Factoring the denominator gives $\frac{x+2}{(x+4)(x+1)}$, so we only need Rule 1: $\frac{x+2}{x^2+5x+4} = \frac{A}{x+4} + \frac{B}{x+1}$. Now, to determine the values of A and B, the idea is to put these expressions over a common denominator again and match up the coefficients:

$$\frac{x+2}{x^2+5x+4} = \frac{A}{x+4} + \frac{B}{x+1} = \frac{A(x+1) + B(x+4)}{x^2+5x+4}$$

We can cancel the denominators, and this leaves us with $x + 2 = A(x+1) + B(x+4)$ (*) $= (A+B)x + (A+4B)$. The only way these two polynomials can be equal is if the coefficients are equal, so we have the pair of equations $1 = A + B$ and $2 = A + 4B$. Solving these, we find A = 2/3 and B = 1/3, and so we have our result:

$$\frac{x+2}{x^2+5x+4} = \frac{1}{3}\left(\frac{2}{x+4} + \frac{1}{x+1}\right).$$

Note: Once we get to the point marked *, we could find A & B more quickly by substituting values for x. This “cover-up” trick doesn’t work so well when we have quadratic factors, so you’ll still need the concept of matching coefficients, but it does give us a useful shortcut when the factors are linear (see footnote 8):

set x = −1 to get 1 = 3B (so B = 1/3)
set x = −4 to get −2 = −3A (so A = 2/3).
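
The cover-up shortcut for distinct linear factors can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper name `cover_up` is ours, the denominator is assumed to be monic with distinct real roots, and exact fractions are used so that no rounding creeps in.

```python
from fractions import Fraction

def cover_up(numerator, roots):
    """Coefficients A_i in sum(A_i / (x - r_i)) for numerator(x) / prod(x - r_i).

    Assumes the roots are distinct and the denominator is monic.
    """
    coeffs = []
    for i, r in enumerate(roots):
        # "Cover up" the factor (x - r) and evaluate what is left at x = r.
        other = Fraction(1)
        for j, s in enumerate(roots):
            if j != i:
                other *= (r - s)
        coeffs.append(Fraction(numerator(r)) / other)
    return coeffs

# Example 1: (x + 2) / ((x + 4)(x + 1)), with roots -4 and -1.
A, B = cover_up(lambda x: x + 2, [Fraction(-4), Fraction(-1)])
print(A, B)
```

The two values agree with the ones found by matching coefficients.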
8
It might occur to you that these are precisely the values of x at which our original function is undefined,

2. Now consider $\frac{x}{(x+1)(x^2+x+1)}$. We need Rules 1 and 2 here:

$$\frac{x}{(x+1)(x^2+x+1)} = \frac{A}{x+1} + \frac{Bx+C}{x^2+x+1}$$

$$\Longrightarrow\ x = A(x^2+x+1) + (Bx+C)(x+1).$$

Since we have one linear factor, the cover-up method can be used to get one of the constants quickly: by setting x = −1 we immediately find that A = −1. To get the remaining two values, though, it’s quickest to match the coefficients: comparing $x^2$ terms, we see that we must have 0 = A + B, while comparing constant terms, we find that we must have 0 = A + C (see footnote 9).

Therefore B = 1 and C = 1, so $\frac{x}{(x+1)(x^2+x+1)} = \frac{x+1}{x^2+x+1} - \frac{1}{x+1}$.

3. Consider $\frac{2x^5 - x^3}{(x+2)(x^2+1)^3}$. We need all three rules here:

$$\frac{2x^5 - x^3}{(x+2)(x^2+1)^3} = \frac{A}{x+2} + \frac{Bx+C}{x^2+1} + \frac{Dx+E}{(x^2+1)^2} + \frac{Fx+G}{(x^2+1)^3}$$

These problems get very tedious, and software packages can handle them for us, so we

won’t finish this one. The expansion turns out to be

$$\frac{-56/125}{x+2} + \frac{(56x+138)/125}{x^2+1} - \frac{(44x+37)/25}{(x^2+1)^2} + \frac{(6x+3)/5}{(x^2+1)^3}.$$
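
If you would like some reassurance that an expansion like this one is right without redoing the algebra, a quick spot check with exact rational arithmetic is enough. The sketch below is our own illustration (the function names `lhs` and `rhs` are not part of the notes); it simply evaluates both sides at several rational points.

```python
from fractions import Fraction as F

def lhs(x):
    # Original rational function from Example 3.
    return (2 * x**5 - x**3) / ((x + 2) * (x**2 + 1) ** 3)

def rhs(x):
    # The stated partial fraction expansion.
    q = x**2 + 1
    return (F(-56, 125) / (x + 2)
            + (56 * x + 138) / (125 * q)
            - (44 * x + 37) / (25 * q**2)
            + (6 * x + 3) / (5 * q**3))

# Two proper rational functions with this denominator (degree 7) that
# agree at 8 or more points must be identical, so this check is conclusive.
samples = [F(n) for n in (-5, -1, 0, 1, 2, 3, 7, 10)]
all_match = all(lhs(x) == rhs(x) for x in samples)
print(all_match)
```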

Fact: If f (x) is an improper rational function, then it can always be written as the sum of

a polynomial and a proper rational function.

How? One option is long division (you may have seen synthetic division, but this only works for linear denominators).

Footnote 8 (continued): so we shouldn’t be allowed to do this! However, we could get exactly the same results by taking limits as x approaches −1 and −4, so the problem isn’t really a problem after all (to use some terminology we’ll introduce properly later on, these points are removable discontinuities).

Footnote 9: To see why this is more efficient than relying exclusively on the cover-up method, consider trying to complete this example that way. Setting x = 0 does at least look helpful; it gives us 0 = A + C, but this is entirely equivalent to comparing the constant terms. After that, though, there are really no more useful values of x; the best we can do is pick a nice round number like x = 1. This gives us the equation 1 = 3A + 2(B + C). This is certainly sufficient for us to determine the values of all three constants, but this last equation is definitely more complicated than the equation we obtained from comparing the coefficients of $x^2$. The most efficient way to proceed is to use a sensible combination of the two techniques.

Example: Rewrite $f(x) = \frac{x^3 - 1}{x^2 + 2x + 1}$.

Solution: The long division calculation should look something like this:
x−2

x2 + 2x + 1 x3 − 1

x3 + 2x2 + x

−2x2 − x − 1

−2x2 − 4x − 2

3x + 1
This tells us that $\frac{x^3 - 1}{x^2 + 2x + 1} = x - 2 + \frac{3x + 1}{x^2 + 2x + 1}$, and now we are in a position to factor the denominator and proceed with the partial fraction decomposition.
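
The long division procedure is mechanical enough to write down as a short routine. The following is a minimal sketch, assuming polynomials are stored as coefficient lists with the highest power first; the name `poly_divmod` is ours.

```python
from fractions import Fraction

def poly_divmod(num, den):
    """Divide polynomials given as coefficient lists, highest power first.

    Returns (quotient, remainder) as lists of Fractions.
    """
    num = [Fraction(c) for c in num]
    den = [Fraction(c) for c in den]
    quotient = []
    while len(num) >= len(den):
        # Ratio of leading coefficients gives the next quotient term.
        factor = num[0] / den[0]
        quotient.append(factor)
        # Subtract factor * den, aligned with the leading term of num.
        for i, c in enumerate(den):
            num[i] -= factor * c
        num.pop(0)  # the leading coefficient is now zero
    return quotient, num

# (x^3 - 1) divided by (x^2 + 2x + 1)
q, r = poly_divmod([1, 0, 0, -1], [1, 2, 1])
print(q, r)
```

Here `q` holds the coefficients of x − 2 and `r` holds those of 3x + 1, matching the hand calculation.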

Another option is to include the polynomial terms in our partial fraction decomposition

procedure directly:

Since deg (numerator) − deg (denominator) = 1, and x2 + 2x + 1 = (x + 1)2 , we know that

$$\frac{x^3 - 1}{x^2 + 2x + 1} = Ax + B + \frac{C}{x+1} + \frac{D}{(x+1)^2}$$

$$\Longrightarrow\ \frac{x^3 - 1}{(x+1)^2} = \frac{Ax(x+1)^2 + B(x+1)^2 + C(x+1) + D}{(x+1)^2}$$

$$\Longrightarrow\ x^3 - 1 = A\left(x^3 + 2x^2 + x\right) + B\left(x^2 + 2x + 1\right) + C(x+1) + D.$$

Setting x = −1 gives −2 = D.

Setting x = 0 gives −1 = B + C + D, i.e. B + C = 1.

Comparing x3 terms tells us that 1 = A,

while the x2 terms tell us that 0 = 2A + B, so B = −2 and then C = 3.

Thus we find that


$$\frac{x^3 - 1}{x^2 + 2x + 1} = x - 2 + \frac{3}{x+1} - \frac{2}{(x+1)^2}.$$

Note that we might have realized at the beginning that A = 1, by doing the first step of long

division in our heads.

Application to Curve Sketching (if time permits)

Suppose we wish to investigate the graph of the function in the example above. We’ve dis-

covered that for large values of x (positive or negative), f (x) ≈ x − 2. This means that the

line y = x − 2 is an oblique asymptote (also called a slant asymptote). We also have a vertical

asymptote at x = −1. Furthermore, we can see that

• the curve passes through (1, 0) and through (0, −1)

• the graph of f lies above the line y = x − 2 when x ≫ −1 (because $f(x) \approx x - 2 + \frac{3}{x+1}$)

• the graph of f lies below the line y = x − 2 when x ≪ −1 (for the same reason)

• as x → −1, f → −∞, since the $\frac{-2}{(x+1)^2}$ term dominates.
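
These observations are easy to spot-check numerically. The sketch below is our own illustration (the names `f` and `asymptote` are just labels for the function in the example and its oblique asymptote); it compares f with the line far from the origin and samples f near the vertical asymptote.

```python
def f(x):
    return (x**3 - 1) / (x**2 + 2 * x + 1)

def asymptote(x):
    return x - 2

# Far to the right, f should sit slightly above the line; far to the left, below it.
gap_far_right = f(100.0) - asymptote(100.0)
gap_far_left = f(-100.0) - asymptote(-100.0)

# On either side of x = -1 the values should be large and negative.
near_pole = [f(-1 + h) for h in (0.1, -0.1, 0.01, -0.01)]

print(gap_far_right, gap_far_left)
print(near_pole)
```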

This is all we need to determine that the graph looks like this:

Figure 11: (graph of f, with the vertical asymptote x = −1 and the oblique asymptote y = x − 2)

Example: Sketch the graph of $f(x) = \frac{x^3 - x^2 - x + 3}{x - 1}$.

Solution: Use long division:

x2 −1

x−1 x3 − x2 − x + 3

x3 − x2

−x + 3

−x + 1

$$\Longrightarrow\ f(x) = x^2 - 1 + \frac{2}{x-1}.$$

From this we can see that f has a vertical asymptote at x = 1, and the graph of f approaches

the parabola y = x2 − 1 “asymptotically” as x → ±∞. The y-intercept is (0, −3). The other

part of the graph doesn’t have any intercepts, so to anchor the graph let’s just find one point:

notice that f (2) = 5.

Here’s our graph:

Figure 12: (graph of f, with the vertical asymptote x = 1 and the parabola y = x² − 1)

8 The Trigonometric Functions (a.k.a. the Circular Functions)

Radian Measure

Definition: One radian is the angle for which the length of the arc of a circle matches the

radius of that circle.

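(Diagram: a sector of a circle of radius r whose arc has length s = r, so that the angle is θ = 1 radian.)

The radian measure of an angle is the ratio of arc length to radius (in any circle drawn with the vertex of the angle at its center): $\theta = \frac{s}{r}$.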

A few observations should be made here:

1. Our definition of radian measure immediately gives us a formula for the length of an arc:

in a circle of radius r, the arc length corresponding to an angle θ (in radians) is given

by s = rθ.

2. At this point you’re probably more familiar with degrees than radians, so we should

know how they are related. Well, we know that the circumference of a circle (that is,

the arc length for a full circle) is s = 2πr. Matching this to the formula s = rθ, we

discover that in a full circle, θ = 2π radians. Hence 2π radians = 360°, which allows us to conclude that $1\ \text{rad} = \frac{180^\circ}{\pi}$, and $1^\circ = \frac{\pi}{180}$ rad. To make you a bit more comfortable

with radian measure, it may help to note that 1 radian ≈ 57.3◦ . An easy way to make

sense of this is to compare the diagram above to an equilateral triangle; each angle in

an equilateral triangle is 60◦ , but if you imagine bending one side into an arc of a circle,

then the angle opposite must be reduced by a small amount!

3. There is one way in which radians (and degrees) differ from other kinds of units. Since

they are defined by a ratio of lengths, they are dimensionless. That is,

a second is a unit of time

a meter is a unit of length

a gram is a unit of mass

a radian is ... a pure number!

In calculus we will use radians only. You’ll see why when we discuss derivatives, but

to put it simply it’s because they work better! Degrees are useful because the number

360 is divisible by 2, 3, 4, 5, and 6, but in calculus we won’t always be working with

integers. Radians can be said to be a more natural measure, because they are defined

only in terms of properties of circles themselves.
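
For readers who want to experiment, the conversions above can be reproduced with Python's standard `math` module. This is just a small sketch; the helper `arc_length` is our own name for the formula s = rθ.

```python
import math

def arc_length(radius, theta):
    """Arc length s = r * theta, with theta measured in radians."""
    return radius * theta

one_radian_in_degrees = math.degrees(1.0)     # 180 / pi
sixty_degrees = math.radians(60)              # pi / 3
full_circle = arc_length(1.0, 2 * math.pi)    # circumference of the unit circle

print(one_radian_in_degrees, sixty_degrees, full_circle)
```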

The Trigonometric Functions

You’ve probably seen the sine and cosine functions defined as ratios of lengths of sides of

triangles, but we’ll use a slightly different definition. Consider the unit circle, x2 + y 2 = 1,

and an angle θ made between a ray from its center and the x-axis. We define the cosine and

sine of θ as the coordinates of the point of intersection of that ray and the circle itself:

Figure 13: (the unit circle, with the point P(cos θ, sin θ) where the ray at angle θ meets the circle)

That is, we have


x = cos θ

y = sin θ

These are “parametric equations” for the unit circle; if we imagine θ running through the

values from 0 to 2π, the point (x, y) traces out the circle. Many of the properties of the sine

and cosine functions should now appear to be immediate consequences of our definition:

• $\sin(0) = 0$, $\cos(0) = 1$, $\sin\left(\frac{\pi}{2}\right) = 1$, $\cos\left(\frac{\pi}{2}\right) = 0$, $\sin\left(\frac{3\pi}{2}\right) = -1$, etc.

• |sin θ| ≤ 1 and |cos θ| ≤ 1 for all values of θ

• sin (θ + 2πk) = sin (θ), cos (θ + 2πk) = cos (θ), for all integers k (so the functions are

periodic, with period 2π).

• Imagine moving clockwise from the positive x-axis instead; compare the values we obtain with the values we obtain from moving counterclockwise:

Figure 14: (the unit circle, with the points (cos θ, sin θ) and (cos(−θ), sin(−θ)) at the angles θ and −θ)

From this it is clear that cos (−θ) = cos (θ), and sin (−θ) = − sin (θ), so the cosine function

is even and the sine function is odd!

• The sine of θ is positive when P is above the x-axis (in the 1st & 2nd quadrants), while

the cosine is positive when P is to the right of the y-axis (in the 1st & 4th quadrants).

If you know the “right” definition of the trigonometric functions, there’s no need for the

“CAST” rule!

• Where does the tangent function come from? We draw a line which is tangent to the

circle at the point (1, 0), and observe where it intersects our ray. The y-coordinate of

that intersection point is defined to be the tangent of θ, tan (θ). By similar triangles, we can see that $\frac{\tan\theta}{1} = \frac{\sin\theta}{\cos\theta}$, which gives us a more practical definition.

Figure 15: (the unit circle and the tangent line at (1, 0); the ray through (cos θ, sin θ) meets the tangent line at (1, tan θ))

• There are another three functions in fairly common usage. They can be defined simply

as reciprocals of the three we have already named, but they all have geometric origins.

The secant function, for example, is the length of the part of our original ray which lies

between the origin and the above-mentioned tangent line (a secant line is a line which

cuts through a circle).

Figure 16: (the same diagram, with the segment of the ray from the origin to (1, tan θ) labelled sec(θ))

There is also a cosecant and a cotangent, abbreviated csc θ and cot θ. For simplicity, it’s enough to remember our original definition of sine and cosine, and to remember that

$$\tan\theta = \frac{\sin\theta}{\cos\theta}, \quad \sec\theta = \frac{1}{\cos\theta}, \quad \csc\theta = \frac{1}{\sin\theta}, \quad \cot\theta = \frac{1}{\tan\theta}.$$

We will rarely use the cosecant and cotangent, and even the secant function is of limited

importance, except that it appears in some useful identities.

Trigonometric Identities

Most textbooks have long lists of identities, but there are really only a few that you really

have to know. We’ve placed them in boxes in the following discussion. The other identities

are either less commonly needed, or can be derived quickly from this short list. We emphasize,

though, that you MUST KNOW the important ones! The trigonometric identities are our tools

for performing algebra with trigonometric functions, so this is as important as knowing the

rules for manipulating exponentials and logarithms, for example, or knowing how to multiply

and divide powers of x.

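• The Pythagorean Identity is obvious: since x = cos θ and y = sin θ, and $x^2 + y^2 = 1$, we have $\cos^2\theta + \sin^2\theta = 1$.

• Dividing this result by $\cos^2\theta$ gives $1 + \tan^2\theta = \sec^2\theta$.

• Consider the angles θ and $\frac{\pi}{2} - \theta$:

Figure 17: (a right triangle with acute angles θ and π/2 − θ, in which the side sin θ is also cos(π/2 − θ))

We can see that $\cos\left(\frac{\pi}{2} - \theta\right) = \sin\theta$. Similarly, $\sin\left(\frac{\pi}{2} - \theta\right) = \cos\theta$. Furthermore, since cosine is even and sine is odd, we can express these as

$$\cos\left(\theta - \frac{\pi}{2}\right) = \sin\theta$$

and

$$\sin\left(\theta - \frac{\pi}{2}\right) = -\cos\theta.$$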
2

• The “sum-of-angle” identities are more difficult to establish, and we’ll state them here

without proof:

cos (α + β) = cos α cos β − sin α sin β

sin (α + β) = sin α cos β + cos α sin β .

If you know these, and you know that cosine is even and sine is odd, then you can

immediately determine that

cos (α − β) = cos α cos β + sin α sin β

and sin (α − β) = sin α cos β − cos α sin β.

Also, if we set α = β = θ, we obtain the “double-angle formulas”:

$$\cos 2\theta = \cos^2\theta - \sin^2\theta$$

$$\sin 2\theta = 2\sin\theta\cos\theta.$$

Furthermore, the first of these can be combined with the Pythagorean identity to give

$$\cos 2\theta = 2\cos^2\theta - 1$$

or

$$\cos 2\theta = 1 - 2\sin^2\theta,$$

and these can then be re-arranged to give

$$\cos^2\theta = \frac{1}{2}(1 + \cos 2\theta)$$

and

$$\sin^2\theta = \frac{1}{2}(1 - \cos 2\theta).$$

These are known as the “half-angle formulas” (see footnote 10).

Footnote 10: This is because they can be expressed in the form $\cos^2\frac{\theta}{2} = \frac{1}{2}(1 + \cos\theta)$ and $\sin^2\frac{\theta}{2} = \frac{1}{2}(1 - \cos\theta)$.

Example: Solve for θ, if sin 2θ = cos θ, and θ ∈ [0, 2π].

Solution: One option is to rewrite the equation as 2 sin θ cos θ = cos θ. This allows us to see that either 2 sin θ = 1 or cos θ = 0 (don’t overlook the second possibility - we can only cancel the cosines if cos θ ≠ 0!!!).

Case 1: If 2 sin θ = 1, then $\sin\theta = \frac{1}{2}$, so $\theta = \frac{\pi}{6}$ or $\theta = \frac{5\pi}{6}$. (To find the second angle, just remember that sin θ corresponds to the y-coordinate, and observe that there are two angles which give the same value. If one is θ, then the other is π − θ; see the figure below.)

(Diagram: the unit circle, with the angles θ and π − θ marking two points at the same height.)

Case 2: If cos θ = 0, then $\theta = \frac{\pi}{2}$ or $\theta = \frac{3\pi}{2}$.

Thus we’ve found four solutions: $\theta = \frac{\pi}{6}, \frac{\pi}{2}, \frac{5\pi}{6}$, or $\frac{3\pi}{2}$.
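
It is always worth substituting the answers back into the original equation. A minimal numerical check, written as our own sketch with the four angles π/6, π/2, 5π/6 and 3π/2 typed in by hand, looks like this:

```python
import math

# Candidate solutions of sin(2*theta) = cos(theta) on [0, 2*pi].
solutions = [math.pi / 6, math.pi / 2, 5 * math.pi / 6, 3 * math.pi / 2]

# Each residual should vanish up to floating-point rounding.
residuals = [math.sin(2 * t) - math.cos(t) for t in solutions]
print(residuals)
```

Note that a check like this confirms that each angle is a solution; it does not by itself show that no solutions were missed, which is what the two-case argument is for.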

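Example: Rewrite $\cos^4\theta$ in terms of cos 2θ and cos 4θ (this is a skill we’ll need later, when we discuss integration).

Solution: All we need is one of the half-angle formulas, used twice:

$$\cos^4\theta = \left(\cos^2\theta\right)^2 = \left(\frac{1 + \cos 2\theta}{2}\right)^2$$

$$= \frac{1}{4}\left(1 + 2\cos 2\theta + \cos^2 2\theta\right)$$

$$= \frac{1}{4}\left(1 + 2\cos 2\theta + \frac{1}{2}(1 + \cos 4\theta)\right)$$

$$= \frac{1}{8}\left(3 + 4\cos 2\theta + \cos 4\theta\right).$$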
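
An identity like this can be sanity-checked by comparing both sides at a spread of angles. The sketch below is our own illustration (the function names are labels we chose); agreement to rounding error at many points is good evidence that no factor was dropped along the way.

```python
import math

def cos4_direct(t):
    return math.cos(t) ** 4

def cos4_reduced(t):
    # Right-hand side obtained from two uses of the half-angle formula.
    return (3 + 4 * math.cos(2 * t) + math.cos(4 * t)) / 8

# Largest disagreement over angles from -5 to 5 radians in steps of 0.1.
max_err = max(abs(cos4_direct(0.1 * k) - cos4_reduced(0.1 * k))
              for k in range(-50, 51))
print(max_err)
```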

The Hyperbolic Functions

We’re now in a position to see some of the reasons for the names of the hyperbolic functions, $\cosh x = \frac{1}{2}\left(e^x + e^{-x}\right)$ and $\sinh x = \frac{1}{2}\left(e^x - e^{-x}\right)$. You can easily show from their definitions that

$$\cosh^2 x - \sinh^2 x = 1$$

(try it). Therefore, if we set x = cosh θ and y = sinh θ, then we obtain parametric equations for the curve $x^2 - y^2 = 1$, which is the unit hyperbola!

This analogy does have one peculiar twist; the variable θ here is NOT the angle! Never-

theless, there is a connection. It turns out that θ is twice the area enclosed by the x-axis, the

ray, and the curve... for both hyperbolic and circular functions!

9 The “Inverse” Trigonometric Functions

The first thing to know about the inverse trigonometric functions is that ... there aren’t any!

In fact, no periodic function can have an inverse (because periodic functions can’t be one-to-

one). The functions which are commonly referred to as the inverse trigonometric functions

are inverses only of versions of the trigonometric functions which have restrictions imposed on

their domains. We’ll start with the so-called inverse sine function.

9.1 The Inverse Sine Function

We need to identify an interval on which the sine function covers its full range [−1, 1] and is one-to-one. There are infinitely many choices, but we tend to like staying close to the origin when given a choice, so we use the interval $\left[-\frac{\pi}{2}, \frac{\pi}{2}\right]$.

Figure 18: (graph of sin x)

We can say that the function f (x) = sin x, with domain $\left[-\frac{\pi}{2}, \frac{\pi}{2}\right]$, is invertible. This allows us to define a new function as follows:

Definition: $y = \sin^{-1} x$ means that x = sin y, where $y \in \left[-\frac{\pi}{2}, \frac{\pi}{2}\right]$.

Note that the domain of $\sin^{-1} x$ is [−1, 1] (and this would have been the case no matter which domain restriction we had chosen). Also remember that y is the angle here. It may help to put the definition into words:

$y = \sin^{-1} x$ means “y is the angle between $-\frac{\pi}{2}$ and $\frac{\pi}{2}$ whose sine is x”.

The graph is below.

Figure 19: (graph of $y = \sin^{-1} x$)

Since $\sin^{-1} x$ is NOT really the inverse of sin x, some confusion is natural, and some care is required when using it. Specifically, note that

• the statement $\sin\left(\sin^{-1} x\right) = x$ is true for all x ∈ [−1, 1] (and makes no sense elsewhere),

• but the statement $\sin^{-1}(\sin x) = x$ is true only if $x \in \left[-\frac{\pi}{2}, \frac{\pi}{2}\right]$, even though this function is defined for all values of x.

Examples:

• $\sin^{-1}\left(\sin\frac{\pi}{3}\right) = \frac{\pi}{3}$

• $\sin^{-1}\left(\sin\frac{5\pi}{4}\right) = -\frac{\pi}{4}$

(How did we get the second result? We use the same idea as for Example 1 of the previous section; we need the angle in the 1st or 4th quadrant which gives the same sine value as 5π/4, which just requires a reflection across the y-axis: θ = 5π/4 is reflected to θ = −π/4.)

Since the name “inverse sine” is a misnomer, many people prefer the name arcsine for our

new function. This name is a reminder of the connection of the trigonometric functions to the

unit circle; if y is the angle (in radians), then it’s also the length of the associated arc!

Figure 20: (the unit circle, radius 1, with the angle y and the associated arc of length y)

A Tougher Question: What does the graph of $y = \sin^{-1}(\sin x)$ look like?

Well, we know that if $x \in \left[-\frac{\pi}{2}, \frac{\pi}{2}\right]$, then y = x:

Figure 21: (the line segment y = x for −π/2 ≤ x ≤ π/2)

We also know that the function is periodic, with period 2π:

Figure 22: (the segment from Figure 21, repeated with period 2π)

You might be able to guess at the rest if you observe that f (−π) = 0 and f (π) = 0. To confirm that guess, we need two more properties of trigonometric functions: identities, and symmetry!

We know that $\sin x = \cos\left(x - \frac{\pi}{2}\right)$ (the sine function IS the cosine function, shifted to the right!). Therefore we can write our function as $y = \sin^{-1}\left(\cos\left(x - \frac{\pi}{2}\right)\right)$. Finally, since the cosine function is even, we realize that our own function is essentially an even function shifted to the right. In other words, the graph must be symmetric about the line x = π/2:

Figure 23: (the complete graph of $y = \sin^{-1}(\sin x)$, symmetric about the line x = π/2)
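
The reasoning above (y = x on [−π/2, π/2], period 2π, symmetry about x = π/2) describes a triangle wave, and we can test that description against a direct evaluation. The following is a sketch of our own; `triangle_wave` is a hypothetical helper built only from those three properties.

```python
import math

def arcsin_of_sin(x):
    return math.asin(math.sin(x))

def triangle_wave(x):
    # Use the period 2*pi to move x into the window [-pi/2, 3*pi/2).
    t = (x + math.pi / 2) % (2 * math.pi) - math.pi / 2
    # On [-pi/2, pi/2] the function is y = x; beyond that, reflect about pi/2.
    return t if t <= math.pi / 2 else math.pi - t

xs = [0.37 * k for k in range(-30, 31)]
max_gap = max(abs(arcsin_of_sin(x) - triangle_wave(x)) for x in xs)
print(max_gap)
print(triangle_wave(5 * math.pi / 4))
```

The last line evaluates the wave at 5π/4, the angle used in the earlier example.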

9.2 The Inverse Cosine Function

We define the inverse cosine function in a similar way, but we have to make one adjustment: the cosine function is not one-to-one on the interval $\left[-\frac{\pi}{2}, \frac{\pi}{2}\right]$, so we have to use a different interval for our original domain restriction. We use [0, π] instead:

Figure 24: (graph of cos x)

This allows us to define the “arccosine” this way:

Definition: $y = \cos^{-1} x$ means that x = cos y, where y ∈ [0, π].

Here’s its graph:

Figure 25: (graph of $y = \cos^{-1} x$)

Since the sine and cosine functions are so closely related, we can often avoid using the

arccosine function if we wish.

9.3 The Inverse Tangent Function

What interval should be used for the arctangent? That’s easy to answer:

Figure 26: (graph of tan x)

The tangent function is one-to-one on the interval $\left(-\frac{\pi}{2}, \frac{\pi}{2}\right)$. This is almost the same as for the sine function, except that the tangent function is undefined at the endpoints of the interval, so we have to exclude those. Here’s our definition:

Definition: $y = \tan^{-1} x$ means that x = tan y, where $y \in \left(-\frac{\pi}{2}, \frac{\pi}{2}\right)$.

One other difference to note is that the domain of the arctangent is the entire real line

(x ∈ R). For related reasons, it ends up being the most important member of this family of

functions; it shows up in applications quite frequently.

Figure 27: (graph of $y = \tan^{-1} x$)

9.4 The Others

Of course, there are still three more to be defined, but we will rarely use them. There is one

that we’ll need to use - at least once - later on in the course:

Consider the secant function, y = sec x (it’s the solid curve in the figure below; the dashed

curve is the graph of cos x, to illustrate the relationship between the two).

Figure 28: (graphs of sec x, solid, and cos x, dashed)

Picking an interval on which sec x is one-to-one is awkward; there’s no way to cover the full range of the function without crossing a discontinuity. The most natural choice might be $\left[0, \frac{\pi}{2}\right) \cup \left(\frac{\pi}{2}, \pi\right]$, and some authors do use this interval to define the inverse. However, it turns out that this results in two different formulas for the derivative, depending on which part of the domain x lies in! For this reason, other authors prefer to use this definition:

Definition: $y = \sec^{-1} x$ means that x = sec y, where $y \in \left[-\pi, -\frac{\pi}{2}\right) \cup \left[0, \frac{\pi}{2}\right)$.

Also note that no matter what we do, the domain will not be contiguous; it must be (−∞, −1] ∪ [1, ∞). With the definition above, the graph looks like this:

Figure 29: (graph of $y = \sec^{-1} x$)

The domain and range restrictions make this a pretty strange function. Fortunately, we

can usually avoid using it, if we wish! For example, suppose x is in the 1st quadrant. If

y = sec x, and we wish to solve for x, then we can write either

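$$y = \sec x \ \Longrightarrow\ x = \sec^{-1} y$$

OR

$$y = \sec x \ \Longrightarrow\ y = \frac{1}{\cos x} \ \Longrightarrow\ \cos x = \frac{1}{y} \ \Longrightarrow\ x = \cos^{-1}\left(\frac{1}{y}\right).$$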

If the angle x is not in the first quadrant, then we’ll have to think a bit harder. There will

be different “rules” for avoiding the inverse secant, cosecant, and cotangent functions in the

various quadrants, but they will all involve just the addition or subtraction of some angle.
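
This workaround is exactly what we need in practice, because Python's standard `math` module provides `acos` but no inverse secant. Here is a minimal sketch for the first-quadrant case only; the function name `arcsec_first_quadrant` is our own, and the restriction y ≥ 1 is an assumption that matches x ∈ [0, π/2).

```python
import math

def arcsec_first_quadrant(y):
    """Solve y = sec(x) for x in [0, pi/2), assuming y >= 1."""
    if y < 1:
        raise ValueError("for a first-quadrant angle we need y >= 1")
    return math.acos(1 / y)

x = arcsec_first_quadrant(2.0)   # sec(pi/3) = 2
print(x, 1 / math.cos(x))
```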

10 Working with Sines and Cosines

The Pythagorean identity and the double-angle formulas are of critical importance in calculus,

for reasons we’ll explore later (they are essential tools for the evaluation of some common types

of integrals). The sum-of-angle formulas are also extremely important, but for a different

reason, which we will discuss now.

In some applications of calculus to physics and engineering, we’ll have input information

of the form

f (t) = B sin ωt

(here B is the amplitude and ω is the angular frequency), and we’ll find that our mathematical

analysis will result in output of the form

g(t) = a sin ωt + b cos ωt.

This is, in fact, still a sine wave. It has the same angular frequency as the input, but a different

amplitude and phase. That is, it can be re-written in the form11

g(t) = A sin(ωt + α).

How do we accomplish this? The key is the sum-of-angle formula, $\sin(\theta_1 + \theta_2) = \sin\theta_1\cos\theta_2 + \sin\theta_2\cos\theta_1$. With this, we have

A sin(ωt + α) = A sin ωt cos α + A cos ωt sin α,

and we can work backwards from this to determine what the values of A and α must be.

Example: Express y = 5 cos 2t − 3 sin 2t in the form y = A sin(2t + α).

Solution: Since A sin(2t + α) = A sin 2t cos α + A cos 2t sin α = (A sin α) cos 2t + (A cos α) sin 2t, and we want this to be equal to 5 cos 2t − 3 sin 2t, we match up the coefficients. That is, looking at the coefficients of cos 2t in the two expressions, we conclude that

A sin α = 5. (1)

Similarly, looking at the coefficients of sin 2t, we conclude that

A cos α = −3. (2)

Footnote 11: Note: if we graph g(t) versus t, we find that, in comparison to an unshifted sine wave such as f (t), g(t) is shifted to the left by the quantity α/ω, since $A\sin(\omega t + \alpha) = A\sin\left(\omega\left(t + \frac{\alpha}{\omega}\right)\right)$. However, it is also possible to graph g(t) versus the quantity ωt, in which case the shift is simply α, and this practice is quite common. For this reason when we speak of the phase (or the phase shift), we are referring to α, rather than α/ω.

So, we’ve arrived at a system of two equations in two unknowns, which we hope to be able to

solve for the unknowns A and α.

Now, if we square both sides of both equations and add the results, we discover that $A^2\sin^2\alpha + A^2\cos^2\alpha = 25 + 9$, so $A^2 = 34$ (we’ve managed to eliminate α from the system of equations). Should we take the positive root or the negative one? This is entirely our choice; each will simply require a different value of α to accompany it. However, since we want to be able to refer to A as the amplitude of the combined sine wave, we’ll choose the positive one: $A = \sqrt{34}$.

Next, we can easily eliminate A from the system of equations by dividing one by the other.

This tells us that tan α = −5/3. Reaching for a calculator, we find that $\tan^{-1}(-5/3) \approx -1.030$

radians. We have to be careful, here, though; this is not necessarily the correct value for α!

In fact, in this particular example it isn’t; since we’ve selected A to be positive, equations 1

and 2 tell us that sin α is positive, and cos α is negative, so α must be in the 2nd quadrant,

not the 4th (see Figure 30).

Figure 30: (the angle α in the 2nd quadrant, and the angle arctan(−5/3) in the 4th quadrant)

The angle in that quadrant whose tangent is −5/3 is $\alpha = \tan^{-1}(-5/3) + \pi \approx 2.111$ radians (although we could also add any multiple of 2π to this, so for example we might choose to set $\alpha = \tan^{-1}(-5/3) - \pi \approx -4.172$ radians).

We now have that

$$5\cos 2t - 3\sin 2t = \sqrt{34}\,\sin(2t + 2.111).$$

In general we can conclude that $a\sin\omega t + b\cos\omega t = \sqrt{a^2 + b^2}\,\sin(\omega t + \alpha)$, where $\tan\alpha = \frac{b}{a}$. It’s safe to remember the formula $A = \sqrt{a^2 + b^2}$ for the amplitude, but determining the phase shift takes more care; we need to use the signs of a and b to determine which quadrant α lies in before we can decide which formula applies. If α is in the 1st or 4th quadrants then $\alpha = \tan^{-1}\left(\frac{b}{a}\right)$, but if α is in the 2nd or 3rd quadrants then $\alpha = \tan^{-1}\left(\frac{b}{a}\right) \pm \pi$.

Note: If you prefer, you could calculate α more directly from equation 1, using $\sin\alpha = 5/\sqrt{34}$, or from equation 2, using $\cos\alpha = -3/\sqrt{34}$. However, the corrections for angles in the “wrong” quadrants take a bit more thought if we use the arcsine or arccosine functions.
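
In software, the quadrant bookkeeping is usually delegated to the two-argument arctangent, which takes the signs of both A sin α and A cos α into account. The sketch below is our own illustration of that idea (the name `amplitude_phase` is hypothetical); it is a swapped-in technique rather than the by-hand method described above, but it produces the same A and α.

```python
import math

def amplitude_phase(a, b):
    """Rewrite a*sin(wt) + b*cos(wt) as A*sin(wt + alpha), with A >= 0."""
    A = math.hypot(a, b)        # sqrt(a^2 + b^2)
    alpha = math.atan2(b, a)    # chosen so that A*sin(alpha) = b and A*cos(alpha) = a
    return A, alpha

# The worked example: 5 cos 2t - 3 sin 2t, so a = -3 and b = 5.
A, alpha = amplitude_phase(-3, 5)
naive = math.atan(5 / -3)       # the calculator value, in the wrong quadrant

# Largest difference between the two forms over a range of t values.
worst = max(abs(-3 * math.sin(2 * t) + 5 * math.cos(2 * t)
                - A * math.sin(2 * t + alpha))
            for t in [0.1 * k for k in range(100)])
print(A, alpha, naive, worst)
```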

Part II

Limits

11 Sequences

Introduction

A sequence is simply an infinitely long list of numbers, which we usually label as $a_1, a_2, a_3, a_4, \ldots$. More precisely, it is a function whose domain is N (the set of natural numbers 1, 2, 3, etc.), or possibly N plus the number 0. To denote the entire sequence, we write $\{a_n\}_{n=1}^{\infty}$, or just $\{a_n\}$.

A sequence may have a formula (for example, $\left\{\frac{(-1)^n (n+1)}{3^n}\right\}_{n=0}^{\infty}$ is the sequence $\left\{1, -\frac{2}{3}, \frac{3}{9}, -\frac{4}{27}, \frac{5}{81}, \ldots\right\}$), or it might not (for example, the sequence $\{a_n\}$, where $a_n$ is the population of the world on January 1st of year n).
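
Viewing a sequence as a function of n translates directly into code. As a small sketch (the function name `a` is simply our label for the general term), here are the first five terms of the formula above, computed with exact fractions:

```python
from fractions import Fraction

def a(n):
    """General term (-1)^n (n + 1) / 3^n, for n = 0, 1, 2, ..."""
    return Fraction((-1) ** n * (n + 1), 3 ** n)

terms = [a(n) for n in range(5)]
print(terms)
```

Note that `Fraction` reduces automatically, so the third term is displayed as 1/3 rather than 3/9.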

Most of calculus deals with continuous functions of real variables, but we do encounter

sequences quite frequently, and they arise quite naturally in applications.

For example, consider a taut string of length L:

(Diagram: the string, fixed at both ends.)

If the string is plucked, it will vibrate, as a sine wave. Since the ends are fixed, though,

these waves can only take on certain forms:

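(Diagrams: the first few wave shapes the string can take with its ends fixed, etc.)

At each moment, the waves have the form of some combination of curves $C_1\sin\left(\frac{\pi x}{L}\right)$, $C_2\sin\left(\frac{2\pi x}{L}\right)$, $C_3\sin\left(\frac{3\pi x}{L}\right)$, etc. That is, the shapes the string can take on are determined by the sequence of functions $\left\{\sin\left(\frac{n\pi x}{L}\right)\right\}$.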
L
Note: This is the same principle that governs quantum mechanics. At the subatomic level,

energy can only be held in specific, discrete (sequential) amounts. It is “quantized”!

We may even encounter erratic-looking sequences without formulas, and the problems we’re

studying don’t have to be complicated for this to happen. For example, consider the digits in

the number π (which is just the ratio of circumference to diameter in any circle).

Of course, if there is a pattern, then ideally we would like to work out what the formula

is. You’ve probably seen problems like this on IQ tests!

Examples:

1. {2, 7, 12, 17, . . .} = ?

2. $\left\{-\frac{1}{4}, \frac{2}{9}, -\frac{3}{16}, \frac{4}{25}, \ldots\right\}$ = ?

(Answers at end of section)

Limits of Sequences

The most interesting question about a list of numbers is whether it approaches a limit. Consider the sequence $\{a_n\} = \left\{\frac{n}{n+1}\right\}_{n=1}^{\infty}$, which is $\left\{\frac{1}{2}, \frac{2}{3}, \frac{3}{4}, \frac{4}{5}, \ldots\right\}$. We can see that the terms get closer and closer to 1 as n increases, and so we say that $\left\{\frac{n}{n+1}\right\}$ converges to 1 (and we say that 1 is the limit of the sequence). We have two common notations for this; we may write

$$\lim_{n\to\infty} a_n = 1$$

or

$$a_n \to 1 \ \text{ as } \ n \to \infty.$$

Notice that the terms in our sequence never actually reach 1. This is probably the most dangerous misconception about limits; when we say that $\lim_{n\to\infty} a_n = L$, all we are saying is that the difference between $a_n$ and L becomes arbitrarily small as n increases. Since n can never be equal to ∞, there’s no guarantee that $a_n$ will ever be equal to L.

Although the concept of convergence is a fairly simple one, stating a precise mathematical

definition is surprisingly difficult. Saying that “an → L as n → ∞ means that the numbers an

get closer and closer to L as n approaches infinity” is not really sufficient. What does “close”

mean? What does it mean for n to “approach” infinity? The real definition of convergence of

a sequence is this:

Definition: A sequence $\{a_n\}$ converges to the limit L if for any positive number ε, there exists an integer N such that

$$n > N \ \Longrightarrow\ |a_n - L| < \varepsilon.$$

This probably requires some explanation. The idea is that if the limit exists, then we should

be able to make an as close as we wish to L by making n large enough, and conversely, if we

can do that (for any distance ε, no matter how small), then the limit must exist.

Throughout Math 117 and Math 119 we’ll spend very little time on proofs. However, since

what we are dealing with here is the definition of the most fundamental concept in calculus,

we want you to understand it, and so we are going to give you some practice in proving the

existence of limits.

Example: Use the definition to prove that $\frac{1}{\sqrt{n}} \to 0$ as n → ∞.

Solution: We need to show that for any ε > 0, we can find an N such that

$$n > N \ \Longrightarrow\ \left|\frac{1}{\sqrt{n}} - 0\right| < \varepsilon.$$

Well, if we simplify the expression on the right, we see that what we need to end up with is $\frac{1}{\sqrt{n}} < \varepsilon$. We can work backwards from this (solving for n) to see that what we need is $n > \frac{1}{\varepsilon^2}$.

It might help to think of this as a little game. I give you a value for ε; let’s say ε = 0.1. Your challenge is to tell me how large n has to be to make $\frac{1}{\sqrt{n}} < 0.1$. So, you tell me that n just needs to be larger than 100 (this is the N in the definition). I might then say, “what if

ε = 0.01?”, and you would tell me that in that case n needs to be larger than 10000. If you

can always win this game, no matter what value of ε I pick, then the sequence does indeed

have the claimed limit!

A well-written, concise proof of the limit would look like this:

Let ε be any positive number. If $n > \frac{1}{\varepsilon^2}$, then $\frac{1}{\sqrt{n}} < \varepsilon$. Therefore $\frac{1}{\sqrt{n}} \to 0$ as n → ∞.
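
The ε-N “game” for this sequence can be played by machine. The sketch below is our own illustration: `threshold_N` is a hypothetical helper that returns an integer N at least as large as 1/ε², and `wins_game` spot-checks the inequality for a finite stretch of n beyond N. A finite check like this is only a plausibility test, of course; the proof is what covers every n.

```python
import math

def threshold_N(eps):
    """Return an integer N >= 1/eps^2, so that n > N forces 1/sqrt(n) < eps."""
    return math.ceil(1 / eps**2)

def wins_game(eps, trials=1000):
    """Spot-check the inequality for the first `trials` integers beyond N."""
    N = threshold_N(eps)
    return all(1 / math.sqrt(n) < eps for n in range(N + 1, N + 1 + trials))

results = {eps: (threshold_N(eps), wins_game(eps)) for eps in (0.1, 0.01, 0.001)}
print(results)
```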

n
Example: Prove that → 1 as n → ∞.
n+1

Solution: Plugging an and L into our definition, we must show that for any ε > 0, we

can find an N such that


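$$n > N \implies \left| \frac{n}{n+1} - 1 \right| < \varepsilon.$$

As we did in the first example, we’ll try to start with the right hand side and work backwards.

$$\left| \frac{n}{n+1} - 1 \right| < \varepsilon \quad \text{will be true if}$$

$$\begin{aligned}
& \left| \frac{n - (n+1)}{n+1} \right| < \varepsilon \\
\Longleftarrow \quad & \left| \frac{-1}{n+1} \right| < \varepsilon \\
\Longleftarrow \quad & \frac{1}{n+1} < \varepsilon \\
\Longleftarrow \quad & n + 1 > \frac{1}{\varepsilon} \\
\Longleftarrow \quad & n > \frac{1}{\varepsilon} - 1.
\end{aligned}$$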

This is our “N ”. The essence of the proof, then, is this:


Let ε be any positive number. If $n > \frac{1}{\varepsilon} - 1$, then $\left| \frac{n}{n+1} - 1 \right| < \varepsilon$. Therefore $\lim_{n \to \infty} \frac{n}{n+1} = 1$.
(Normally we’d want to include the details of this “if - then” statement in the proof, but

since we already have them on the page, directly above, we’ll omit them from the proof itself.)

You’ll see some more examples in the assignments.

Calculation of Limits

You’ll probably agree that our definition of convergence is not all that easy to use. For one

thing, we have to be able to guess at the limit before we can use the definition to prove that

it is correct! So, how do we go about finding limits, in practice? We use theorems instead.

Below are the tools we need. All of these results can be proved using the definition.

1 Some basics:

• $\lim_{n \to \infty} C = C$, for any constant C.

• $\lim_{n \to \infty} n = \infty$.

• $\lim_{n \to \infty} \frac{1}{n} = 0$.

2 The limit of a sum is equal to the sum of the limits, provided that both limits exist.

That is,

if an → a and bn → b as n → ∞,

then an ± bn → a ± b as n → ∞.

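Example: Consider $\left\{ \frac{8 - 2n}{5n} \right\}_{n=1}^{\infty}$.
Since $\frac{8 - 2n}{5n} = \frac{8}{5n} - \frac{2}{5}$, and since $\frac{8}{5n} \to 0$ and $\frac{2}{5} \to \frac{2}{5}$ as $n \to \infty$, we know that $\frac{8 - 2n}{5n} \to -\frac{2}{5}$ as $n \to \infty$.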

3 The limit of a product is equal to the product of the limits, provided that both limits

exist. That is,

if an → a and bn → b as n → ∞,

then an bn → ab as n → ∞.

Similarly, the limit of a quotient is equal to the quotient of the limits, provided that the

limits exist and the limit of the sequence in the denominator is not zero.

Note that as a special case, we also have $\lim_{n \to \infty} C a_n = Ca$, for any constant C.

Example: Consider $\left\{ \frac{3 + n}{2n + 1} \right\}_{n=1}^{\infty}$.
Our theorems above only apply when all of the limits involved exist, so we have to be a

little bit careful in how we deal with this. The usual practice for rational expressions is

to divide numerator and denominator by the highest power of n:

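$$\frac{3 + n}{2n + 1} = \frac{\frac{3}{n} + 1}{2 + \frac{1}{n}}.$$

Now, $\frac{3}{n} + 1 \to 1$ and $2 + \frac{1}{n} \to 2$ as $n \to \infty$, so we can conclude that $\frac{3 + n}{2n + 1} \to \frac{1}{2}$ as $n \to \infty$.

4 $\lim_{n \to \infty} \frac{1}{n^p} = 0$, for any p > 0.

5 If |r| < 1, then $r^n \to 0$ as $n \to \infty$.

Note also that if r = 1, then $r^n \to 1$, while for other values of r the limit doesn’t exist.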

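6 If f is continuous¹² at a, and $a_n \to a$ as $n \to \infty$, then $f(a_n) \to f(a)$ as $n \to \infty$.

Example: Consider $\left\{ \sin\left( \frac{\pi n}{2n + 1} \right) \right\}_{n=1}^{\infty}$.
Since $\frac{\pi n}{2n + 1} = \frac{\pi}{2 + \frac{1}{n}} \to \frac{\pi}{2}$ as $n \to \infty$, and since sin x is a continuous function (everywhere), it follows that $\sin\left( \frac{\pi n}{2n + 1} \right) \to \sin\left( \frac{\pi}{2} \right) = 1$ as $n \to \infty$.

Comment: In the “lim” notation we have

$$\lim_{n \to \infty} f(a_n) = f\left( \lim_{n \to \infty} a_n \right) \quad \text{if } f \text{ is continuous and } \lim_{n \to \infty} a_n \text{ exists.}$$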

This looks like a bit of magic; we can interchange the order of the two operations (if the

stated conditions are satisfied).

7 The Squeeze Theorem:

If an → L and cn → L as n → ∞, and if an ≤ bn ≤ cn ,

then bn → L as n → ∞ as well.
¹² We’re cheating here - we haven’t defined continuity yet! Unfortunately, because the “real” definitions of limits and continuity are so difficult to work with, the discussion gets a bit long and complicated if we do everything in a completely logical order.

The reason why this is true is easiest to demonstrate using continuous functions (the

Squeeze Theorem applies to them too). See below; if all we know about the function

h (x) is that its values always lie in between the values of f (x) and g (x), but we know

that f and g share the same limit, then h must have that limit as well.

Figure 31: (graph of f(x), h(x), and g(x), with h(x) lying between the other two)

Example: Consider $\left\{ \frac{\sin n}{n} \right\}_{n=1}^{\infty}$.
Note that $\lim_{n \to \infty} \sin n$ does not exist, and there’s no useful way to rewrite this sequence. However, we can observe that

$$-1 \le \sin n \le 1,$$

which means that

$$-\frac{1}{n} \le \frac{\sin n}{n} \le \frac{1}{n}.$$

Finally, since $-\frac{1}{n} \to 0$ and $\frac{1}{n} \to 0$ as $n \to \infty$, we can conclude that $\frac{\sin n}{n} \to 0$ as $n \to \infty$
as well.
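
The squeeze in this example can be watched numerically. The following sketch (the function name `squeeze_bounds` is an assumption of this illustration) prints the lower bound, the term sin(n)/n, and the upper bound for a few values of n; both bounds shrink toward 0 and trap the middle value between them.

```python
import math

def squeeze_bounds(n):
    """Return (-1/n, sin(n)/n, 1/n) for a positive integer n."""
    return -1.0 / n, math.sin(n) / n, 1.0 / n

for n in (1, 10, 100, 1000, 10**6):
    lower, middle, upper = squeeze_bounds(n)
    assert lower <= middle <= upper  # the inequality used in the Squeeze Theorem
    print(f"n = {n:>7}: {lower:+.6f} <= {middle:+.6f} <= {upper:+.6f}")
```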

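8 Lastly, we list a few results about familiar functions:

• $\lim_{n \to \infty} e^n = \infty$.

• $\lim_{n \to \infty} \ln(n) = \infty$.

• $\lim_{n \to \infty} \sin(n)$ does not exist.

• $\lim_{n \to \infty} \cos(n)$ does not exist.

• $\lim_{n \to \infty} \tan(n)$ does not exist.

• $\lim_{n \to \infty} \sinh(n) = \infty$.

• $\lim_{n \to \infty} \cosh(n) = \infty$.

• $\lim_{n \to \infty} \tan^{-1}(n) = \frac{\pi}{2}$.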

More Examples

Many limits will be obvious, and we may state them “by inspection”. The only real difficulties arise when inspection leads to an “indeterminate form”. For example, if we look at the expression $\frac{\ln(n^2 + 1)}{n + 5}$, and try simply “plugging in” n = ∞, we get the result $\frac{\infty}{\infty}$, which is meaningless. Similarly, we might end up with $\frac{0}{0}$, or ∞ − ∞, which are also indeterminate¹³. In these cases we can attempt to rewrite the sequences in forms which are not indeterminate.

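1. Find $\lim_{n \to \infty} \frac{3n^2 - n + 2}{2n^2 + 4}$.

Solution: Divide the numerator and denominator by the highest power:

$$\frac{3n^2 - n + 2}{2n^2 + 4} = \frac{3 - \frac{1}{n} + \frac{2}{n^2}}{2 + \frac{4}{n^2}} \longrightarrow \frac{3}{2} \quad \text{as } n \to \infty.$$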

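2. Evaluate $\lim_{n \to \infty} \frac{6^n + 3^n}{6^{n-1} - 4}$.

Solution: We try a variation on the same theme; divide through by the exponential with the largest base:

$$\frac{6^n + 3^n}{6^{n-1} - 4} = \frac{\frac{6^n}{6^n} + \frac{3^n}{6^n}}{\frac{6^{n-1}}{6^n} - \frac{4}{6^n}} = \frac{1 + \left( \frac{1}{2} \right)^n}{\frac{1}{6} - 4 \left( \frac{1}{6} \right)^n} \longrightarrow 6 \quad \text{as } n \to \infty.$$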


¹³ There are also a handful of exponential forms which are indeterminate. A particularly famous example is the sequence $\left\{ \left( 1 + \frac{1}{n} \right)^n \right\}$. Trying to “evaluate” this “at infinity” gives us the form $1^\infty$, which you might think should be 1. However, the base is never actually equal to 1! What we have in the base is a sequence of numbers which are all larger than one (even though they are shrinking)! This limit is, in fact, the definition of the number e, as we’ll discuss in more detail later on.

3. Evaluate $\lim_{n \to \infty} \frac{\sqrt{3n^2 + 2}}{n + 4}$.

Solution: Again, we can adapt the idea of dividing by the highest power. The numerator isn’t a polynomial, but we might say that the highest “effective” power there is n (we have $n^2$ inside a square root). Since this matches the highest power in the denominator, we can write

$$\frac{\sqrt{3n^2 + 2}}{n + 4} = \frac{\frac{1}{n} \sqrt{3n^2 + 2}}{\frac{1}{n} (n + 4)} = \frac{\frac{1}{\sqrt{n^2}} \sqrt{3n^2 + 2}}{1 + \frac{4}{n}} = \frac{\sqrt{3 + \frac{2}{n^2}}}{1 + \frac{4}{n}} \longrightarrow \sqrt{3} \quad \text{as } n \to \infty.$$

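4. Evaluate $\lim_{n \to \infty} \left( \sqrt{n^2 + 2n} - n \right)$.

Solution: This has the form ∞ − ∞. We can use our standard trick again, but to do so we must first convert our sequence into a ratio (we “rationalize” it):

$$\sqrt{n^2 + 2n} - n = \left( \sqrt{n^2 + 2n} - n \right) \left( \frac{\sqrt{n^2 + 2n} + n}{\sqrt{n^2 + 2n} + n} \right) = \frac{2n}{\sqrt{n^2 + 2n} + n} = \frac{2}{\sqrt{1 + \frac{2}{n}} + 1} \longrightarrow \frac{2}{2} = 1 \quad \text{as } n \to \infty.$$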

Sequences Without Limits

If a sequence does not have a limit, we say that it diverges. This can happen in a few different

ways.

• The terms may grow without bound. For example, the terms in the sequence $\{n^2\}$ continue to get larger and larger. We have a special notation for this; we write

$$\lim_{n \to \infty} a_n = \infty$$

or

$$a_n \to \infty \text{ as } n \to \infty.$$

There is a precise definition of what this notation means, similar to the “ε-N” definition introduced in this section. You may see it discussed in the assignments. There is a similar definition for the case $\lim_{n \to \infty} a_n = -\infty$, as well.

• There may be no pattern at all to the terms (for example, consider the sequence of digits in the number π: {3, 1, 4, 1, 5, 9, 2, 6, . . .}). We still say that this sequence diverges, since it has no limit.

• Some sequences oscillate between two numbers. For example, the sequence $\{(-1)^n\}$ bounces back and forth between 1 and −1. In this case as well, we say that the sequence diverges (we use the word divergent to mean that the limit does not exist, no matter how this happens).

• It is possible for a sequence to contain subsequences which converge to different limits. Such a sequence is still said to diverge (our definition doesn’t allow a sequence to have more than one limit). For example, consider $\left\{ \sin\left( \frac{2\pi n}{3} \right) + \frac{1}{10n + 1} \right\}$ (we’ll let you investigate this one on your own; there are three convergent subsequences).
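
One way to begin that investigation is to tabulate terms grouped by the remainder of n on division by 3. The sketch below is only a starting point under that assumption (the function name `a` and the choice of starting index 3000 are illustrative); it prints a few late terms from each group so that the three candidate limits can be read off.

```python
import math

def a(n):
    """n-th term of the sequence sin(2*pi*n/3) + 1/(10n + 1)."""
    return math.sin(2 * math.pi * n / 3) + 1 / (10 * n + 1)

# Group late terms by the remainder r of n on division by 3.
for r in range(3):
    terms = [a(n) for n in range(3000 + r, 3030 + r, 3)]
    print(f"n = 3k + {r}: " + ", ".join(f"{t:+.4f}" for t in terms[:4]))
```

Within each group the sine factor takes a single value, so only the 1/(10n + 1) part is still changing, and it shrinks to zero.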

Answers to Examples:

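1. $\{a_n\} = \{2 + 5n\}_{n=0}^{\infty}$, or $\{5n - 3\}_{n=1}^{\infty}$, etc.

2. $\{a_n\} = \left\{ \frac{(-1)^n n}{(n + 1)^2} \right\}_{n=1}^{\infty}$ or $\left\{ \frac{(-1)^{n+1} (n + 1)}{(n + 2)^2} \right\}_{n=0}^{\infty}$, etc.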

12 Limits of Functions of Real Variables

The ε-δ Definition

The definitions and theorems for limits of sequences can easily be extended to limits of func-

tions as x → ∞; we can simply replace the integer variable n with the real variable x, and

everything that we have discussed so far works exactly the same way. We can even adapt it

all for limits as x → −∞; nothing else changes. However, given a function f of a real variable

x, we can also consider a second type of limit: given a constant a, what happens to f (x) as

x → a? This requires a new definition:

Definition:

The statement “f(x) → L as x → a” (or “$\lim_{x \to a} f(x) = L$”) means that for any positive number ε, there exists a positive number δ such that

0 < |x − a| < δ =⇒ |f (x) − L| < ε.

Whereas for sequences we needed to make n large, we’re now thinking of making the distance

between x and a small ; you should think of δ as being a small positive number. If you found

the idea of the game from the previous section helpful, then the modification is this: if I

suggest a small value for ε, your challenge is now to tell me how close x must be to a in order

for |f (x) − L| to be less than ε. The number you provide is what we’ve labelled as δ. If you

can always win this game, then the limit is indeed L.

There’s one other small modification: we’ve stipulated that 0 < |x − a| because we’re not

interested in what happens when x is exactly equal to a.

The definition could be paraphrased as “$\lim_{x \to a} f(x) = L$ means that f (x) can be made arbitrarily close to L by making x sufficiently close (but not equal) to a”.

Example: Prove that 3x − 2 → 4 as x → 2.

Solution: We must show that for any ε > 0, there exists a δ > 0 such that |x − 2| < δ =⇒ |(3x − 2) − 4| < ε.

Well, working backwards again,

|(3x − 2) − 4| < ε

⇐= |3x − 6| < ε

⇐= 3 |x − 2| < ε

⇐= |x − 2| < ε/3.

We can now see that δ = ε/3 is the value required for the proof:

(Condensed) Proof:

Let ε be any positive number. If |x − 2| < ε/3, then |(3x − 2) − 4| < ε. Therefore $\lim_{x \to 2} (3x - 2) = 4$.
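
As with sequences, the ε-δ game can be spot-checked numerically. In this sketch (the names `f` and `delta_for`, the fixed random seed, and the sample size are all assumptions of the illustration) we take δ = ε/3 and test randomly chosen points with |x − 2| < δ. Passing the check is evidence, not a proof; the proof is the algebra above.

```python
import random

def f(x):
    return 3 * x - 2

def delta_for(eps):
    """The delta = eps/3 found by working backwards from |f(x) - 4| < eps."""
    return eps / 3

random.seed(0)  # fixed seed so the spot-check is repeatable
for eps in (1.0, 0.1, 0.001):
    delta = delta_for(eps)
    # Sample points strictly inside the interval (2 - delta, 2 + delta).
    xs = [2 + delta * random.uniform(-0.999, 0.999) for _ in range(1000)]
    ok = all(abs(f(x) - 4) < eps for x in xs)
    print(f"eps = {eps}: delta = {delta:.6f}, all sampled |f(x) - 4| < eps: {ok}")
```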

Note that limits of functions can also fail to exist in the same ways as limits of sequences

can fail to exist. Also note that we need a different definition for every one of the following

statements:

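• $\lim_{x \to a} f(x) = L$

• $\lim_{x \to a} f(x) = \infty$

• $\lim_{x \to a} f(x) = -\infty$

• $\lim_{x \to \infty} f(x) = L$

• $\lim_{x \to \infty} f(x) = \infty$

• $\lim_{x \to \infty} f(x) = -\infty$

• $\lim_{x \to -\infty} f(x) = L$

• $\lim_{x \to -\infty} f(x) = \infty$

• $\lim_{x \to -\infty} f(x) = -\infty$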

You may get some practice with some of these in the assignments. Actually, there are six

more; we also need definitions of “one-sided” limits,


 
 



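$$\lim_{x \to a^+} f(x) = \left\{ \begin{array}{c} L \\ \infty \\ -\infty \end{array} \right\} \quad \text{and} \quad \lim_{x \to a^-} f(x) = \left\{ \begin{array}{c} L \\ \infty \\ -\infty \end{array} \right\}.$$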

Calculation of Limits (of Functions of Real Variables)

Fortunately, the definition we’ve just introduced leads to the same set of theorems as govern

limits of sequences as n → ∞ (and functions as x → ±∞). We also get one more extremely

important one:

f is continuous at a value a if and only if f (x) −→ f (a) as x −→ a. (3)

If you prefer the other notation, this says that f is continuous at a if and only if $\lim_{x \to a} f(x) = f(a)$.

This theorem makes the calculation of most limits trivial in practice. For instance, consider

the example above. It seems silly to have to prove that 3x − 2 approaches 4 as x approaches 2,

because we all know that the function 3x − 2 is continuous (not just at 2, but everywhere), so

all we have to do is evaluate it at x = 2! In fact, we know that all polynomials are continuous,

and all of our other familiar functions are continuous on their domains as well. Therefore

evaluating limits only requires real effort when we encounter discontinuities.¹⁴

So, what kinds of discontinuities might we encounter? The most familiar is the division-

by-zero kind. If both numerator and denominator approach zero as x → a, then we have an

indeterminate form, and as we saw with sequences the most useful strategy is to try to rewrite

the function in a form which is not indeterminate. In some cases, though, it isn’t possible

to rewrite the function in any useful way, so we might ask ourselves if the Squeeze Theorem

could be of any use.

We may also encounter discontinuities when working with piecewise-defined functions. For

these we need one more (hopefully transparent) theorem:

f (x) → L as x → a if and only if f (x) → L as x → a+ and f (x) → L as x → a− .

Here the notations x → a+ and x → a− refer to the limits of f as we approach a from the

right and left sides, respectively (and again, all of our other limit theorems extend to these

types of limits as well). If we find different limits from the two directions of approach, then f

simply does not have a limit as x approaches a. So, for piecewise-defined functions, we simply

need to check whether the two limits match!

Examples:

a) $\lim_{x \to 2} \frac{2x^2 + 1}{x^2 + 6x - 4}$

This function is continuous at x = 2, so we can conclude immediately that

$$\lim_{x \to 2} \frac{2x^2 + 1}{x^2 + 6x - 4} = \left. \frac{2x^2 + 1}{x^2 + 6x - 4} \right|_{x=2} = \frac{9}{12} = \frac{3}{4}.$$

¹⁴ Technically, we have a serious logical problem here. We have not yet defined continuity! What we should be doing is proceeding to a discussion of continuity (which we’ll discuss in the next lecture), verifying this claim that all of our elementary functions are continuous on their domains, and then returning to the present discussion of limits. However, we all have a basic understanding of what continuity means, and this departure from the logical procession allows us to proceed to some examples of actual calculations. The difficulty is this: an understanding of limits is required for a rigorous discussion of continuity, but in practice we rely on an understanding of continuity to evaluate limits! We’ve dealt with this in a different way than most textbooks, but the textbooks still have a logical problem. They use theorem (3) as the definition of continuity... but then they assume that their functions are continuous almost everywhere in order to determine if f (x) → f (a)!

b) $\lim_{x \to 2} \frac{x + 5}{x - 2}$

This time we have a discontinuity in the denominator, but we do not have an indeterminate form, since the numerator is not zero. A little bit of thought is all that is required here: if we approach 2 from the right we obtain large positive values (so $\lim_{x \to 2^+} \frac{x + 5}{x - 2} = \infty$), while if we approach from the left we obtain large negative values (so $\lim_{x \to 2^-} \frac{x + 5}{x - 2} = -\infty$). Therefore the limit simply doesn’t exist.

c) $\lim_{x \to 2} \frac{x^2 + x - 6}{x - 2}$

Here we do have an indeterminate form; both numerator and denominator approach zero as x → 2, so we try rewriting the function: $\frac{x^2 + x - 6}{x - 2} = \frac{(x + 3)(x - 2)}{x - 2} = x + 3$ (for x ≠ 2). Clearly x + 3 → 5 as x → 2, so the limit is 5.

d) $\lim_{x \to 0} \frac{(2 + x)^3 - 8}{x}$

Again, a little bit of algebra is all that’s required:

$$\text{for } x \ne 0, \quad \frac{(2 + x)^3 - 8}{x} = \frac{x^3 + 6x^2 + 12x}{x} = x^2 + 6x + 12,$$

and this approaches 12 as x → 0.


 
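e) $\lim_{x \to 0} x \cos\left( \frac{1}{x} \right)$

This is a tricky one; $\cos\left( \frac{1}{x} \right)$ behaves very strangely near x = 0 (we may discuss this in class). However, it is bounded:

$$-1 \le \cos\left( \frac{1}{x} \right) \le 1.$$

Since this gives $\left| x \cos\left( \frac{1}{x} \right) \right| \le |x|$, we can state that

$$-|x| \le x \cos\left( \frac{1}{x} \right) \le |x|$$

(the absolute values are needed because x may be negative), and since both −|x| and |x| approach 0, we can conclude from the Squeeze Theorem that $x \cos\left( \frac{1}{x} \right) \to 0$ as x → 0.
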
f) $\lim_{x \to 1.5} \frac{2x^2 - 3x}{|2x - 3|}$

This is a piecewise-defined function, with the change in the definition of the function occurring precisely at the discontinuity, so we need to check the two one-sided limits:

For x > 1.5, $\frac{2x^2 - 3x}{|2x - 3|} = \frac{x(2x - 3)}{2x - 3} = x \to 1.5$ as $x \to 1.5^+$.

For x < 1.5, $\frac{2x^2 - 3x}{|2x - 3|} = \frac{x(2x - 3)}{-(2x - 3)} = -x \to -1.5$ as $x \to 1.5^-$.

Since the left- and right-sided limits don’t match, the limit does not exist.
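
The mismatch between the two one-sided limits is easy to see numerically. This short sketch (the function name `g` is an assumption of the illustration) evaluates the expression from example f) a small distance h to either side of 1.5.

```python
def g(x):
    """(2x^2 - 3x) / |2x - 3|, which is undefined at x = 1.5."""
    return (2 * x**2 - 3 * x) / abs(2 * x - 3)

for h in (0.1, 0.01, 0.001):
    right = g(1.5 + h)  # approaching from the right
    left = g(1.5 - h)   # approaching from the left
    print(f"h = {h}: g(1.5 + h) = {right:+.4f}, g(1.5 - h) = {left:+.4f}")
```

The right-hand values settle near 1.5 and the left-hand values near −1.5, matching the algebra above.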
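g) $\lim_{x \to \infty} \left( \sqrt{9x^2 + x} - 3x \right)$

Limits as x → ∞ work exactly as limits of sequences; we’ve seen problems like this before!

$$\sqrt{9x^2 + x} - 3x = \left( \frac{\sqrt{9x^2 + x} + 3x}{\sqrt{9x^2 + x} + 3x} \right) \left( \sqrt{9x^2 + x} - 3x \right) = \frac{x}{\sqrt{9x^2 + x} + 3x} = \frac{1}{\sqrt{9 + \frac{1}{x}} + 3} \longrightarrow \frac{1}{6} \quad \text{as } x \to \infty.$$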

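h) $\lim_{x \to -\infty} \frac{x}{\sqrt{4x^2 + 3}}$

Limits as x → −∞ also work the same way... except that occasionally the fact that we are dealing with negative values of x is important.

$$\frac{x}{\sqrt{4x^2 + 3}} = \frac{1}{\frac{1}{x} \sqrt{4x^2 + 3}} = \frac{1}{\frac{-1}{\sqrt{x^2}} \sqrt{4x^2 + 3}} \quad \text{since } x = -\sqrt{x^2} \text{ when } x < 0 \text{ (!!!)}$$

$$= \frac{-1}{\sqrt{4 + \frac{3}{x^2}}}, \quad \text{which} \to -\frac{1}{2} \text{ as } x \to -\infty.$$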

A Special Limit: Once in a while we do encounter limits which we cannot evaluate with

these simple algebraic manipulations. Later on we’ll introduce a couple of other techniques

which will help (Taylor Series approximations, and L’Hôpital’s Rule). For now, we’ll introduce

just one special limit which is of particular importance:

$$\lim_{\theta \to 0} \frac{\sin \theta}{\theta} = 1.$$

The proof is somewhat long, so we’ll omit it, but we can make a quick intuitive argument as

to why this should be true. Recall that θ is not just the angle, but also the length of the arc

subtended by the angle θ in a unit circle. If we consider an extremely small angle, we can see

that this arc becomes virtually indistinguishable from the vertical distance sin θ (see Figure

32).

Figure 32: (unit circle showing the arc length θ and the vertical distance sin θ)

You may also encounter the limit $\lim_{\theta \to \infty} \frac{\sin \theta}{\theta}$. You should be able to see that this is different; it’s zero, by the Squeeze Theorem. As a final note, we point out that the function

$$f(x) = \begin{cases} \dfrac{\sin x}{x} & \text{if } x \ne 0 \\ 1 & \text{if } x = 0 \end{cases}$$

occurs so often in digital signal processing that it gets a special name: it is called the “sinc” function (short for “sinus cardinalis”, or cardinal sine), and it is written as sinc (x). You can see the two limits we’ve just mentioned in its graph:

Figure 33: (graph of sinc (x) for x between −15π and 15π)

13 Continuity

The Continuity Criterion

There is a precise definition of continuity which is often omitted from textbooks. You’ll notice

that it looks like another limit definition, except that it involves values of the function at two

points instead of one:

Definition (the Continuity Criterion):

A function f (x) is continuous on an interval if for any x and y in that interval, and for any

positive number ε, there exists a number δ such that

|x − y| < δ =⇒ |f (x) − f (y)| < ε.

To paraphrase this, when we say that a function is continuous, we mean that we can always

force f (x) to be close to f (y) by making sure that x is close to y. Unfortunately, it turns out

that it is hard to use this definition successfully, so we’ll stick with one example:

Example: Show that sin x is continuous everywhere.

Solution: We’ll rely on a geometric argument. What we need to show is that for any

positive number ε, we can make |sin x − sin y| < ε, by making |x − y| small enough. Well,

consider the following diagram:

Figure 34: (unit circle diagram showing the angles x and y, the arc length x − y between them, and the vertical distance sin(x) − sin(y))

We can see that the vertical distance sin (x) − sin (y) is smaller than the arc length x − y.

If we consider other quadrants of the unit circle, the signs may change, but we’ll always have

|sin (x) − sin (y)| ≤ |x − y|. Therefore, in order to ensure that |sin (x) − sin (y)| < ε, all we

have to do is make sure that |x − y| < ε. This implies that sin (x) is a continuous function,

for all values of x.

As usual with definitions, we won’t want to use it very often. Instead, we’ll accept that all

of our familiar functions:

$$\sin x, \ \cos x, \ x, \ \sqrt{x}, \ \frac{1}{x}, \ e^x, \ \ln x, \ \text{etc.}$$

have already been proved (by someone) to be continuous on their domains.

Furthermore, the Continuity Criterion can be used to prove the Continuity Theorems:

Theorem:

Suppose that f (x) and g (x) are continuous on an interval I. Then

• (f ◦ g) (x) is continuous on I

• (f ± g) (x) are continuous on I

• (f g) (x) is continuous on I

• $\frac{1}{f(x)}$ is continuous at any point in I where f (x) ≠ 0.

Determining Continuity in Practice

Putting all of this together, we can summarize everything there is to know about continuity

in two simple statements:

1 Most of the functions we’ll encounter will obviously be continuous on their domains.

This means that for most functions, we can usually tell where we have continuity just

by identifying any gaps in the domain!

Examples:

• The function $\sin(e^x + \cos x)$ is continuous for all values of x.

• The function $\frac{x^2 + 4}{\sqrt{2x - 1}}$ is continuous for all values of $x > \frac{1}{2}$.

• The function $\frac{1}{x}$ is continuous for all x ≠ 0.¹⁵

2 Continuity will usually only be in question when we are dealing with piecewise-defined

functions!

For these, the theorem of the previous section is the tool we need:

f is continuous at a value a if and only if f (x) −→ f (a) as x −→ a. (4)

In fact most textbooks present this as the definition of continuity! (Why didn’t we

do this? Simply because we felt like being honest. This “definition” only applies to
individual points, so it would be extremely hard to prove that a function like sin x is continuous everywhere!)

¹⁵ It probably makes sense to you to say that $\frac{1}{x}$ has a discontinuity at x = 0. Note, though, that we can also correctly say that $\frac{1}{x}$ is continuous on its domain!

Examples:

• Consider the function $f(x) = \begin{cases} x^2, & \text{for } x \le 2 \\ 6 - x, & \text{for } x > 2. \end{cases}$ We know that this is continuous for x < 2 and x > 2, but is it continuous at x = 2? Well, we can see that f (2) = 4, so the question is whether $\lim_{x \to 2} f(x) = 4$. That’s easy to answer:

$$\lim_{x \to 2^-} f(x) = \lim_{x \to 2^-} x^2 = 4,$$

$$\text{and} \quad \lim_{x \to 2^+} f(x) = \lim_{x \to 2^+} (6 - x) = 4,$$

and so $\lim_{x \to 2} f(x) = 4$. Hence f (x) is continuous everywhere.

• Consider the sinc function (introduced in the previous section):

$$\operatorname{sinc}(x) = \begin{cases} \dfrac{\sin x}{x} & \text{for } x \ne 0 \\ 1 & \text{for } x = 0. \end{cases}$$

We know that $\frac{\sin x}{x}$ is continuous for all x ≠ 0, so the only question is at that special point x = 0. However, we’ve established that $\lim_{x \to 0} \frac{\sin x}{x} = 1$, so the sinc function is indeed continuous everywhere.

Types of Discontinuities

We have three informal names for the various ways in which a function can fail to be continuous

at a single point:

• The discontinuity in $\frac{\sin x}{x}$ is called a “removable discontinuity”, because we can obtain a continuous function (the sinc function) simply by defining f (0) = 1.

• The discontinuities we see in the functions $f(x) = \frac{1}{x - 1}$ and $g(x) = \begin{cases} \dfrac{1}{x - 1} & \text{for } x \ne 1 \\ 0 & \text{for } x = 1 \end{cases}$ are called “infinite discontinuities”, for obvious reasons.

• The discontinuity in the function $f(x) = \begin{cases} x & \text{for } x \in [0, 10) \\ 8 & \text{for } x \in [10, \infty) \end{cases}$ is called a “jump discontinuity”. Again, you should be able to see why if you make a rough sketch of the graph.

The Intermediate and Extreme Value Theorems

There are many theorems which apply only to continuous functions. Here are two of the most

important:

The Intermediate Value Theorem (IVT)

Theorem:

Suppose f (x) is continuous on a closed interval [a, b]. If c is a number between f (a) and

f (b) (that is, if f (a) < c < f (b) or f (b) < c < f (a)), then there exists at least one number

x ∈ (a, b) such that f (x) = c.

In other words, if c is a value between f(a) and f(b), then it is impossible to draw a continuous path from (a, f(a)) to (b, f(b)) without crossing the line y = c at least once.

[Figure: the points (a, f(a)) and (b, f(b)), lying on opposite sides of the horizontal line y = c]

It might seem odd to state this as a theorem, since it seems so obvious, but it will be

convenient to have a name for the argument. Instead of saying “we know this because f is

continuous and c is between f (a) and f (b)” we can now just say “by the IVT”! Also, it is

surprisingly difficult to prove the IVT; we won’t be able to do so here.

Application: Root Finding Suppose we wish to solve the equation $e^x - 2 = \cos x$. A quick sketch tells us that there must be at least one solution, but we have no way of calculating it exactly. However, with repeated use of the IVT we are able to approximate it as closely as we like; that is, we can find an approximate solution to whatever level of accuracy we desire. The idea is simple: we rewrite the equation as $e^x - 2 - \cos x = 0$, let $f(x) = e^x - 2 - \cos x$, and observe that f (x) is continuous. We then start checking values of f (x):

Since f (0) = −2 < 0 and f (1) ≈ 0.18 > 0, we can conclude that the root must lie between

x = 0 and x = 1 (let’s refer to the root as x0 ).

Next, check the midpoint of this interval: f (0.5) ≈ −1.23 < 0, so x0 must lie between x = 0.5 and x = 1.

Repeat using the midpoint of this new interval: f (0.75) ≈ −0.61 < 0, so x0 ∈ (0.75, 1).

Continue. Let’s see if we can find the root to two decimal places:

f (0.875) ≈ −0.24 < 0 =⇒ x0 ∈ (0.875, 1).

f (0.9375) ≈ −0.04 < 0 =⇒ x0 ∈ (0.9375, 1).

f (0.96875) ≈ 0.07 > 0 =⇒ x0 ∈ (0.9375, 0.96875).

f (0.953125) ≈ 0.01 > 0 =⇒ x0 ∈ (0.9375, 0.953125).

f (0.9453125) ≈ −0.01 < 0 =⇒ x0 ∈ (0.9453125, 0.953125).

We can now say that to two decimal places, the root is 0.95. This is referred to as the

Bisection Method for root finding. It’s slow, but if we’re working by hand we can speed it up

a bit by using some common sense; since f (0) = −2 and f (1) ≈ 0.18 we might have guessed

right away that the root would be closer to 1 than to 0, and jumped straight to 0.9 instead of

using the midpoint. We’ll see some faster methods in Math 119.
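
The repeated halving is tedious by hand but takes only a few lines of code. The following is a minimal sketch of the Bisection Method, not a library routine: the function name `bisect`, its signature, and the default tolerance are assumptions of this example. It relies on exactly the hypotheses of the IVT (f continuous on [a, b], with f(a) and f(b) of opposite sign).

```python
import math

def f(x):
    return math.exp(x) - 2 - math.cos(x)

def bisect(f, a, b, tol=1e-6):
    """Bisection: assumes f is continuous on [a, b] and f(a), f(b) have opposite signs."""
    fa, fb = f(a), f(b)
    if fa * fb > 0:
        raise ValueError("f(a) and f(b) must have opposite signs")
    while b - a > tol:
        m = (a + b) / 2
        fm = f(m)
        if fm == 0:
            return m
        if fa * fm < 0:   # sign change on [a, m]: the IVT puts a root there
            b, fb = m, fm
        else:             # otherwise the sign change is on [m, b]
            a, fa = m, fm
    return (a + b) / 2

root = bisect(f, 0.0, 1.0)
print(f"root is approximately {root:.6f}, with f(root) = {f(root):.2e}")
```

Each pass through the loop halves the interval, so reaching a tolerance of 10⁻⁶ from an interval of length 1 takes about 20 steps.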

Application: Curve Sketching Understanding the IVT can help us determine what a

function’s graph looks like.

Example: Determine the intervals on which $f(x) = x^3 - 4x$ is positive.

Solution: Since $x^3 - 4x = x(x^2 - 4) = x(x + 2)(x - 2)$, the zeroes of f (x) are at 0, 2, and −2. Since f is continuous, it cannot change sign within the intervals (−∞, −2), (−2, 0), (0, 2), or (2, ∞) (otherwise the IVT would guarantee the existence of another zero). So, all we

need to do is check the sign of one point within each interval:

f (−3) = −15

f (−1) = 3

f (1) = −3

f (3) = 15

Therefore f is positive on (−2, 0) and on (2, ∞), and negative on (−∞, −2) and (0, 2).
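
The same sign check is easy to automate. In this sketch the zeros and the test points are the ones chosen above; the variable names are assumptions of the illustration.

```python
def f(x):
    return x**3 - 4 * x

zeros = [-2, 0, 2]
assert all(f(z) == 0 for z in zeros)  # confirm the factorization's zeros

# One test point inside each interval cut out by the zeros.
test_points = [-3, -1, 1, 3]
signs = {x: ("positive" if f(x) > 0 else "negative") for x in test_points}
for x, sign in signs.items():
    print(f"f({x}) = {f(x)}, so f is {sign} on the interval containing {x}")
```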

Of course, this isn’t an awful lot of information, but it can be a useful idea if we’re having

difficulty producing a sketch and we want to check our work.

The Extreme Value Theorem

Theorem:

If f is continuous on a closed interval [a, b], then f attains a maximum value and a minimum

value on that interval.

Perhaps the best way to appreciate this is to consider how a function can fail to have

extreme values if it is not continuous, or if the interval is not closed:

If f (x) is not continuous, then its graph could do something like this:

[Figure: graph of a discontinuous function]

In this case f (x) isn’t even bounded on the given interval.

If we have a removable discontinuity, we could have a situation like this instead:

[Figure: graph with an open circle on the line y = N]

Here f (x) is bounded, but does not have a maximum value. We can say that N is a least upper bound, but N is not a value of f , and if you name a value of f I can always find a larger one!

If f (x) is continuous, but the interval under consideration isn’t closed, then we might have this situation:

[Figure: graph with open circles at both ends of the curve]

This is quite similar to the previous case; f (x) is bounded, but lacks extreme values.

So, that explains what can happen with continuous functions on open intervals, or functions

with infinite or removable discontinuities on closed intervals. As a quick exercise, sketch an

example of the remaining case; demonstrate that a function with a jump discontinuity can

lack extreme values, even if the interval under consideration is closed.

Part III

Differential Calculus
Calculus is usually thought of as having two “halves”: differential calculus and integral calculus

(although the concept of limit underlies both of them). You should already be familiar with

most of the formulas we’re about to review, so we’ll move quickly here. First, though, we

should remind you of the basic concept behind the derivative.

14 The Derivative

The average rate of change of a function f (x) over an interval [x0 , x1 ] is

$$\frac{f(x_1) - f(x_0)}{x_1 - x_0}.$$

This expression is often referred to as the Newton difference quotient. Its meaning should be

clear. As an example, suppose your car’s odometer reads 31,040 km at 10am, and 31,373 km

at 1pm. Then the average rate of change of the distance you have travelled along your path

(your average speed) has been

$$\frac{f(t_1) - f(t_0)}{t_1 - t_0} = \frac{31373 \text{ km} - 31040 \text{ km}}{1\text{pm} - 10\text{am}} = \frac{333 \text{ km}}{3 \text{ hrs}} = 111 \text{ km/hr}.$$

The derivative of a function f (x) at a point is simply the limit of the difference quotient

as the length of the interval approaches zero:

Definition:

The derivative of a function f (x) at a point x0 is

$$\lim_{x_1 \to x_0} \frac{f(x_1) - f(x_0)}{x_1 - x_0},$$

assuming that this limit exists. If the limit does not exist, we say that f is not differentiable

at x0 .

There are several notations in common usage, all inherited from the great minds of the

17th and 18th centuries:

• Lagrange’s notation: $f'(x_0)$. This will be our preferred notation for simple statements or calculations, in which the possibility of confusion is low. We read this as “f prime of x0”.

• Leibniz’s notation: $\frac{df}{dx}(x_0)$. This will be our preferred notation for more sophisticated calculations. It is useful because its form is so similar to the Newton quotient; $\frac{df}{dx}$ stands for $\lim_{x_1 \to x_0} \frac{\Delta f}{\Delta x}$ (where the Greek letter ∆ is used to denote a change in a quantity; ∆f = f1 − f0 and ∆x = x1 − x0). Since the derivative is the limit of a quotient, there are circumstances in which it is permissible to treat it as if it were an actual quotient itself, and the Leibniz notation enables us to do so efficiently (that is, we may occasionally treat the expression $\frac{df}{dx}$ as a fraction). It also allows us to treat the derivative as an operator (a function which acts on functions instead of numbers); we can write $\frac{d}{dx}[f(x)]$ to denote the action of the differentiation process upon the function f (more on this in a moment). Also, note that if y = f (x), we’ll frequently write $\frac{dy}{dx}$ instead of $\frac{df}{dx}$.

• Newton’s notation: $\dot{f}(x_0)$. This is rarely used in purely mathematical texts, but is the preferred notation in physics.

• Euler’s notation: Df . This is used when we are treating the derivative exclusively as an operator; it’s a more concise way of writing $\frac{d}{dx}[f(x)]$, which also allows us to discuss the derivative function without specifying a name for the independent variable. You will likely not see this used in Math 117 or Math 119.

We can think of the derivative as an instantaneous rate of change of f at x0 . The action

of calculating a derivative is called differentiation (I’ve heard many students refer to it as

“deriving”, but please don’t do that - it makes my head hurt!).

Geometric Interpretation The average rate of change of a function f (x) over an interval

[x0 , x1 ] can be interpreted as the slope of the secant line joining the points (x0 , f (x0 )) and

(x1 , f (x1 )). If we let x1 approach x0 this secant line spans a shorter and shorter segment of

the curve, and in the limit we obtain a line which touches the curve at only one point (at

least, only one in the immediate vicinity). This, of course, is how we define a tangent line to

a generic curve. See below:

Figure 35: (secant lines through (x0, f(x0)) approaching the tangent line as x1 → x0)

To obtain the equation of the tangent line, recall the “point-slope” form of the equation of

a line: a line with slope m passing through a point (a, b) has equation

(y − b) = m (x − a) .

Using this we can say that the tangent line to f (x) at x0 has equation

$$y - f(x_0) = f'(x_0)(x - x_0).$$

That is,

$$y = f(x_0) + f'(x_0)(x - x_0).$$

The Derivative as a Function

The definition of the derivative can be written in a second way. If we write the difference

x1 − x0 as ∆x, then we can also write x1 as x1 − x0 + x0 = ∆x + x0 . With those changes,

$$f'(x_0) = \lim_{x_1 \to x_0} \frac{f(x_1) - f(x_0)}{x_1 - x_0}$$

can be rewritten as

$$f'(x_0) = \lim_{\Delta x \to 0} \frac{f(x_0 + \Delta x) - f(x_0)}{\Delta x}.$$

The advantage of this is that we’ve eliminated x1 from the definition, which allows us to let x0 vary (we’ll just relabel it as x) and hence define the derivative as a function:

$$f'(x) = \lim_{\Delta x \to 0} \frac{f(x + \Delta x) - f(x)}{\Delta x}.$$
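
To see this limit at work, we can evaluate the difference quotient for smaller and smaller ∆x. The sketch below is only an illustration (the helper name `difference_quotient` and the sample function x² at x = 3 are assumptions of this example). For f(x) = x², the quotient simplifies algebraically to 2x + ∆x, so the printed values should approach 6.

```python
def difference_quotient(f, x, dx):
    """(f(x + dx) - f(x)) / dx, whose limit as dx -> 0 is f'(x) when it exists."""
    return (f(x + dx) - f(x)) / dx

def square(x):
    return x * x

for dx in (0.1, 0.01, 0.001, 1e-6):
    q = difference_quotient(square, 3.0, dx)
    print(f"dx = {dx}: difference quotient = {q:.6f}")
```

Making ∆x extremely small eventually makes floating-point round-off dominate, which is one practical reason to prefer the exact differentiation rules over this numerical shortcut.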

Higher-order Derivatives

Since the derivative of a function f is itself a new function, it can be differentiated as well.

We’ll call the result the second derivative of f (and we can then calculate a third derivative,

and so on).

In Lagrange’s notation, we add more strokes: the second and third derivatives are $f''(x)$

and $f'''(x)$. For the fourth derivative and beyond, it’s common to write superscript numbers

in brackets instead: $f^{(4)}(x)$, etc.


Similarly, in Newton’s notation, we add more dots: $\ddot{f}$, $\dddot{f}$, etc. It quickly becomes difficult

to tell how many dots there are, which may be one reason why this notation has fallen out of

favour with mathematicians. It is still popular with physicists, who rarely need to discuss

derivatives of very high orders.


 
In Leibniz’s notation, the derivative of the derivative is $\frac{d}{dx}\left(\frac{dy}{dx}\right)$, and we abbreviate this
as $\frac{d^2y}{dx^2}$. This can easily be generalized: the $n$th derivative is $\frac{d^ny}{dx^n}$.

Physical Interpretation

The meaning of a derivative will depend on the context. Remembering the difference quotient

will help!

Example: If $s(t)$ describes the displacement of an object as a function of time, then the
derivative $s'(t)$ has units of $\frac{\text{length } (\Delta s)}{\text{time } (\Delta t)}$, so it must be velocity! Since velocity is interesting
in its own right, let’s give it a new label: let $v(t) = s'(t)$. Then $v'(t) = \lim_{\Delta t \to 0} \frac{\Delta v}{\Delta t}$ has units
of $\frac{\text{length/time } (\Delta s/\Delta t)}{\text{time } (\Delta t)}$, or $\frac{\text{length}}{\text{time}^2}$; it is acceleration ($a(t) = v'(t) = s''(t)$).

Question: If $m(x)$ gives the mass of the part of a metal rod that lies between its left end

and a point x meters to the right, what does $m'(x)$ represent?

[Figure 36: a rod lying along the x-axis with its left end at 0; the section from 0 to x has mass m(x).]

Answer: Since $m'(x) = \lim_{\Delta x \to 0} \frac{\Delta m}{\Delta x}$, it has units of $\frac{\text{mass}}{\text{length}}$. We may refer to this as the
“linear density” of the rod.

15 Differentiation Formulas

We want you to know and understand the definition of the derivative so that you can properly

interpret derivatives, but we’ll rarely have to actually use it. Instead, we have a set of theorems

we can rely on.

1 $\frac{d}{dx}(k) = 0$, for any constant $k$.

2 $\frac{d}{dx}(x^n) = nx^{n-1}$, for any constant $n$ (although of course if $n < 1$ then we must exclude
the point $x = 0$).

3 $\frac{d}{dx}[kf(x)] = k\,\frac{df}{dx}$, for any constant $k$ (assuming that $\frac{df}{dx}$ exists).

4 $\frac{d}{dx}[f(x) + g(x)] = \frac{df}{dx} + \frac{dg}{dx}$ (assuming that these both exist).

Note that rules 3 and 4 are the defining properties of linearity; we say that the derivative

is a linear operator.

Also, rules 1 through 4 are all we need in order to differentiate any polynomial.

Example: You should have no problem calculating the derivative of $f(x) = 3x^3 + 5x^2 - 2x + 7$.

Answer: $f'(x) = 9x^2 + 10x - 2$.
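Since rules 1 through 4 act term by term, differentiating a polynomial is mechanical enough to sketch in a few lines. The helper below is a hypothetical illustration (its name and the coefficient-list convention are assumptions, not part of the notes): it stores a polynomial as a list of coefficients, highest power first, and applies the power rule to each term.

```python
def differentiate_polynomial(coefficients):
    """Differentiate a polynomial given as [a_n, ..., a_1, a_0] (highest power first)."""
    degree = len(coefficients) - 1
    # Power rule on each term a_k x^k -> k a_k x^(k-1); the constant term drops out.
    return [a * (degree - i) for i, a in enumerate(coefficients[:-1])]

# 3x^3 + 5x^2 - 2x + 7 from the example above.
derivative = differentiate_polynomial([3, 5, -2, 7])
print(derivative)
```

The returned list [9, 10, -2] corresponds to 9x² + 10x − 2.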

For other functions (non-polynomial), we’ll need some specific rules:

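5 $\frac{d}{dx}(\sin x) = \cos x$

It may be worth seeing the proof of this particular rule:

\begin{align*}
\frac{d}{dx}(\sin x) &= \lim_{\Delta x \to 0} \frac{\sin(x + \Delta x) - \sin x}{\Delta x} && \text{(by definition)} \\
&= \lim_{\Delta x \to 0} \frac{\sin x \cos \Delta x + \cos x \sin \Delta x - \sin x}{\Delta x} && \text{(using the sum-of-angle identity)} \\
&= \lim_{\Delta x \to 0} \left[\sin x \left(\frac{\cos \Delta x - 1}{\Delta x}\right)\right] + \lim_{\Delta x \to 0} \left[\cos x \left(\frac{\sin \Delta x}{\Delta x}\right)\right] && \text{(assuming that both limits exist)} \\
&= \sin x \lim_{\Delta x \to 0} \left(\frac{\cos \Delta x - 1}{\Delta x}\right) + \cos x \lim_{\Delta x \to 0} \left(\frac{\sin \Delta x}{\Delta x}\right)
\end{align*}

Now, if we are working in radians, then $\frac{\sin \Delta x}{\Delta x} \to 1$ as $\Delta x \to 0$, while

\begin{align*}
\frac{\cos \Delta x - 1}{\Delta x} &= \left(\frac{\cos \Delta x - 1}{\Delta x}\right)\left(\frac{\cos \Delta x + 1}{\cos \Delta x + 1}\right) \\
&= \frac{\cos^2 \Delta x - 1}{\Delta x\,(\cos \Delta x + 1)} \\
&= -\frac{\sin^2 \Delta x}{\Delta x\,(\cos \Delta x + 1)} \\
&= -\left(\frac{\sin \Delta x}{\Delta x}\right)\left(\frac{\sin \Delta x}{\cos \Delta x + 1}\right) \\
&\longrightarrow -1 \cdot 0 = 0 \quad \text{as } \Delta x \longrightarrow 0.
\end{align*}

Hence $\frac{d}{dx}(\sin x) = \cos x$, provided that x is in radians.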

6 $\frac{d}{dx}(\cos x) = -\sin x$, by a similar calculation.

7 $\frac{d}{dx}(e^x) = e^x$

This rule requires some explanation. If we apply the definition of the derivative, this is

what we find:

\begin{align*}
\frac{d}{dx}(e^x) &= \lim_{\Delta x \to 0} \frac{e^{x + \Delta x} - e^x}{\Delta x} \\
&= \lim_{\Delta x \to 0} e^x \left(\frac{e^{\Delta x} - 1}{\Delta x}\right) \\
&= e^x \lim_{\Delta x \to 0} \frac{e^{\Delta x} - 1}{\Delta x}.
\end{align*}

This is a very special limit. For simplicity let’s write it as $\lim_{h \to 0} \frac{e^h - 1}{h}$. There’s no obvious
way to simplify the expression, and at the moment we have no techniques for evaluating
it. However, you can see that in order to get the result we want, the limit needs to be

equal to 1. In fact, the number e is defined to be the number which makes this particular
limit 1, so that $\frac{d}{dx}(e^x) = e^x$. That’s an implicit definition; to find an explicit one observe

that

\begin{align*}
\text{if}\quad & \frac{e^h - 1}{h} \approx 1 \\
\text{then}\quad & e^h - 1 \approx h \\
\text{so}\quad & e^h \approx 1 + h \\
\text{and so}\quad & e \approx (1 + h)^{1/h}.
\end{align*}

This is the definition you’ll find in most textbooks: $e = \lim_{h \to 0} (1 + h)^{1/h}$
or, equivalently, $e = \lim_{n \to \infty} \left(1 + \frac{1}{n}\right)^n$. Try experimenting with your calculator to see
how these expressions approach e.
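The calculator experiment suggested here can also be sketched in a few lines of Python. The function name and the sample values of n are arbitrary choices for illustration.

```python
import math

def compound_estimate(n):
    """Return (1 + 1/n)^n, which approaches e as n grows."""
    return (1.0 + 1.0 / n) ** n

estimates = {n: compound_estimate(n) for n in (1, 10, 100, 1000, 10**6)}
for n, value in estimates.items():
    print(f"n = {n:>7}: {value:.8f}  (error {math.e - value:.2e})")
```

The estimates increase toward e, but slowly: each additional factor of 10 in n buys roughly one more correct digit.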

Next we have the big three rules for differentiating combinations of functions:

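8 The Product Rule: $\frac{d}{dx}[f(x)\,g(x)] = \frac{df}{dx} \cdot g + f \cdot \frac{dg}{dx}$

9 The Quotient Rule: $\frac{d}{dx}\left[\frac{f(x)}{g(x)}\right] = \frac{1}{[g(x)]^2}\left(\frac{df}{dx} \cdot g - f \cdot \frac{dg}{dx}\right)$

10 The Chain Rule: $[f(g(x))]' = f'(g(x)) \cdot g'(x)$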

This, of course, is what the rule looks like in Lagrange’s notation, which should be

sufficient for our needs for now. As we proceed, though, you will need to become equally

comfortable with using the Chain Rule in Leibniz’s notation:

If y = f (u) and u = g (x) (so that y = f (g (x))), then

\[ \frac{dy}{dx} = \frac{dy}{du}\,\frac{du}{dx}. \]

Note that $\frac{dy}{du}$ is $f'(u)$, which is $f'(g(x))$, while $\frac{du}{dx}$ is $g'(x)$, so this is indeed the same

rule!

You can see that in the Leibniz notation it looks as though if we work backwards the

differential du cancels out, as if our derivatives were fractions. This is, in fact, exactly

what’s happening, except that the cancellation occurs within a limit! That apparent

cancellation is a reflection of an actual cancellation used in the proof of the chain rule:

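Suppose $y = f(u)$ and $u = g(x)$. We may write

\begin{align*}
\frac{dy}{dx} &= \lim_{\Delta x \to 0} \frac{\Delta y}{\Delta x} \\
&= \lim_{\Delta x \to 0} \frac{\Delta y}{\Delta u}\,\frac{\Delta u}{\Delta x} \quad \text{(as long as } \Delta u \neq 0\text{)}.
\end{align*}

Now, if $\frac{du}{dx}$ exists (that is, if $\lim_{\Delta x \to 0} \frac{\Delta u}{\Delta x}$ exists), then this can be expressed as

\[ \left(\lim_{\Delta x \to 0} \frac{\Delta y}{\Delta u}\right)\left(\lim_{\Delta x \to 0} \frac{\Delta u}{\Delta x}\right) \]

and $\Delta u$ must approach 0 as $\Delta x \to 0$ (again, assuming that $\frac{du}{dx}$ exists). Therefore this is

\[ \left(\lim_{\Delta u \to 0} \frac{\Delta y}{\Delta u}\right)\left(\lim_{\Delta x \to 0} \frac{\Delta u}{\Delta x}\right) = \frac{dy}{du}\,\frac{du}{dx}, \]

as expected.¹⁶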

A second advantage of the Leibniz notation is that it allows us to write the rules for

multiply-compound functions more concisely. For example, suppose that y = f1 (x),

x = f2 (s), s = f3 (t), and t = f4 (z). We can then say that

\[ \frac{dy}{dz} = \frac{dy}{dx}\,\frac{dx}{ds}\,\frac{ds}{dt}\,\frac{dt}{dz}. \]

How would you write this rule in prime notation?
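One possible answer, offered as a sketch: since y = f1(x), x = f2(s), s = f3(t) and t = f4(z), the composite is y = f1(f2(f3(f4(z)))), and each derivative must be evaluated at the matching inner function.

```latex
\[
\frac{dy}{dz}
  = f_1'\bigl(f_2(f_3(f_4(z)))\bigr)\cdot
    f_2'\bigl(f_3(f_4(z))\bigr)\cdot
    f_3'\bigl(f_4(z)\bigr)\cdot
    f_4'(z)
\]
```

Comparing the two forms shows how much bookkeeping the Leibniz notation hides.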

With these ten rules, we can calculate derivatives for almost any other functions we need.

11 $\frac{d}{dx}(\tan x) = \frac{d}{dx}\left(\frac{\sin x}{\cos x}\right) = \frac{\cos^2 x + \sin^2 x}{\cos^2 x} = \frac{1}{\cos^2 x} = \sec^2 x$, so $\frac{d}{dx}(\tan x) = \sec^2 x$.

12 Similarly, $\frac{d}{dx}(\cot x) = -\csc^2 x$.
¹⁶ This “proof” is not quite complete. The assumption “as long as $\Delta u \neq 0$” needs to be explored.
 
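13 $\frac{d}{dx}(\sec x) = \frac{d}{dx}\left(\frac{1}{\cos x}\right) = \frac{d}{dx}\left[(\cos x)^{-1}\right]$

$= -(\cos x)^{-2}(-\sin x) = \frac{\sin x}{\cos^2 x} = \left(\frac{1}{\cos x}\right)\left(\frac{\sin x}{\cos x}\right) = \sec x \tan x$,

so $\frac{d}{dx}(\sec x) = \sec x \tan x$.

14 Similarly, $\frac{d}{dx}(\csc x) = -\csc x \cot x$.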

15 $\frac{d}{dx}(\cosh x) = \frac{d}{dx}\left(\frac{e^x + e^{-x}}{2}\right) = \frac{1}{2}\left(e^x - e^{-x}\right) = \sinh x$, so $\frac{d}{dx}(\cosh x) = \sinh x$.

16 Similarly, $\frac{d}{dx}(\sinh x) = \cosh x$.

17 A calculation similar to that for the tangent function shows that $\frac{d}{dx}(\tanh x) = \operatorname{sech}^2 x$.

etc.

16 Implicit and Logarithmic Differentiation

16.1 Implicit Differentiation

For functions which are defined implicitly, such as the functions defined by the equation

$x^2 + y^2 = 4$, we might try solving for y (or x) explicitly before using the formulas above.

There is another way to proceed, though (which is fortunate, since it isn’t always possible to

solve for either variable in expressions of this type).

All we need to do is “view” y as a function of x, and differentiate both sides of the equation

with respect to x, applying the Chain Rule whenever necessary:

Example:

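\begin{align*}
& x^2 + [y(x)]^2 = 4 \\
\implies\ & \frac{d}{dx}\left(x^2 + [y(x)]^2\right) = \frac{d}{dx}(4) \\
\implies\ & 2x + 2y(x)\,\frac{dy}{dx} = 0 \\
\implies\ & \frac{dy}{dx} = -\frac{x}{y} \quad (\text{for } y \neq 0)
\end{align*}

Of course, we could now replace y with $\pm\sqrt{4 - x^2}$, if we wish.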

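Example: Suppose $e^{xy} = x + y$ (notice that it is impossible to solve for y or x here).

Differentiating with respect to x gives us

\begin{align*}
& e^{xy}\left(y + x\,\frac{dy}{dx}\right) = 1 + \frac{dy}{dx} \\
\implies\ & y e^{xy} + x e^{xy}\,\frac{dy}{dx} = 1 + \frac{dy}{dx} \\
\implies\ & \frac{dy}{dx} = \frac{1 - y e^{xy}}{x e^{xy} - 1}.
\end{align*}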

Unfortunately we can’t do anything about the fact that our expression involves both x and y.

Nevertheless, if we can identify a point on the curve, then we should be able to find the slope

at that point. For example, we can see that the point (0, 1) is on the curve, and the slope

there is zero.
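The claim about the point (0, 1) is easy to check numerically. The sketch below uses hypothetical helper names: it first confirms that the point satisfies the equation of the curve, then evaluates the slope formula obtained above.

```python
import math

def curve_residual(x, y):
    """Return e^(xy) - (x + y); this is zero exactly when (x, y) lies on the curve."""
    return math.exp(x * y) - (x + y)

def implicit_slope(x, y):
    """Return dy/dx = (1 - y e^(xy)) / (x e^(xy) - 1) from the implicit differentiation."""
    exy = math.exp(x * y)
    return (1.0 - y * exy) / (x * exy - 1.0)

on_curve = curve_residual(0.0, 1.0)
slope = implicit_slope(0.0, 1.0)
print(on_curve, slope)
```

Checking the residual first matters: the slope formula returns a number for any (x, y), but it is only meaningful at points that actually lie on the curve.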

16.2 Derivatives of Inverses

Suppose we know the derivative of an invertible function f (x). What does that tell us about

the derivative of its inverse? Well, let’s start by writing $y = f^{-1}(x)$. We wish to determine
$\frac{dy}{dx}$. We can use the implicit differentiation technique we’ve just discussed to find it! We know
that

\begin{align*}
& x = f(y), \\
\text{so}\quad & \frac{d}{dx}(x) = \frac{d}{dx}\bigl(f(y)\bigr) \\
\implies\ & 1 = f'(y)\,\frac{dy}{dx} \\
\implies\ & \frac{dy}{dx} = \frac{1}{f'(y)} \\
& \phantom{\frac{dy}{dx}} = \frac{1}{f'(f^{-1}(x))}.
\end{align*}

This formula looks a bit cumbersome, but consider what happens if we write it in Leibniz
notation. Our second-last line, $\frac{dy}{dx} = \frac{1}{f'(y)}$, can be rewritten as

\[ \frac{dy}{dx} = \frac{1}{dx/dy}, \]

which is a marvelously simple result!

By mimicking this procedure, we can prove a few more formulas to add to our list:

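16 Suppose $y = \ln x$.

\begin{align*}
\text{Then}\quad & x = e^y \\
\text{so}\quad & 1 = e^y\,\frac{dy}{dx} \\
\text{and so}\quad & \frac{dy}{dx} = \frac{1}{e^y} = \frac{1}{x}.
\end{align*}

That is, $\frac{d}{dx} \ln x = \frac{1}{x}$.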

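17 Suppose $y = \sin^{-1} x$. Then $x = \sin y$, and $y \in \left[-\frac{\pi}{2}, \frac{\pi}{2}\right]$.

Therefore $1 = \cos y\,\frac{dy}{dx}$,

\begin{align*}
\text{so}\quad \frac{dy}{dx} &= \frac{1}{\cos y} \\
&= \frac{1}{\sqrt{1 - \sin^2 y}} \\
&= \frac{1}{\sqrt{1 - x^2}}
\end{align*}

(we know that we need the positive root, since $\cos y \geq 0$ when $y \in \left[-\frac{\pi}{2}, \frac{\pi}{2}\right]$).

Hence $\frac{d}{dx} \sin^{-1} x = \frac{1}{\sqrt{1 - x^2}}$.

18 Similarly, $\frac{d}{dx} \cos^{-1} x = -\frac{1}{\sqrt{1 - x^2}}$ (try proving this to see where the negative sign
comes from).

19 Using the same strategy with the arctangent function yields an extremely useful rule:
$\frac{d}{dx} \tan^{-1} x = \frac{1}{1 + x^2}$.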
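For reference, the arctangent calculation can be written out as follows. This is a sketch that mirrors the arcsine argument; the only extra ingredients are rule 11 and the identity sec² y = 1 + tan² y.

```latex
\begin{align*}
y = \tan^{-1} x \quad &\Longrightarrow \quad x = \tan y \\
&\Longrightarrow \quad 1 = \sec^2 y \,\frac{dy}{dx} \\
&\Longrightarrow \quad \frac{dy}{dx} = \frac{1}{\sec^2 y}
   = \frac{1}{1 + \tan^2 y}
   = \frac{1}{1 + x^2}.
\end{align*}
```

No choice of sign is needed here, because the identity expresses sec² y directly in terms of tan y.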

Caution: Our formula for the derivative of an inverse, $\frac{dy}{dx} = \frac{1}{dx/dy}$, suggests that we can
treat derivatives as fractions. This is usually true, but only for first-order derivatives. That is,

\[ \frac{d^2y}{dx^2} \quad \text{is NOT equal to} \quad \frac{1}{d^2x/dy^2}. \]

It is not hard to find a counterexample to prove this claim. Consider:

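If $y = e^x$ then $x = \ln y$,

so $\frac{dy}{dx} = e^x$ while $\frac{dx}{dy} = \frac{1}{y} = \frac{1}{e^x}$,

but $\frac{d^2y}{dx^2} = e^x$ whereas $\frac{d^2x}{dy^2} = -\frac{1}{y^2} = -\frac{1}{e^{2x}}$.

So, if $\frac{d^2y}{dx^2} \neq \frac{1}{d^2x/dy^2}$, what should the formula be? It’s unlikely that you’ll ever have to
use it, but finding it is a useful exercise in using the chain rule:

\begin{align*}
\frac{d^2y}{dx^2} &= \frac{d}{dx}\left(\frac{dy}{dx}\right) \\
&= \frac{d}{dx}\left(\frac{1}{dx/dy}\right).
\end{align*}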

This is now asking for the derivative with respect to x of a function of y (or rather, y (x)),

which is exactly what the Chain Rule is for. We differentiate with respect to y, and multiply

by dy/dx:

   
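\begin{align*}
\frac{d}{dx}\left(\frac{1}{dx/dy}\right) &= \frac{d}{dy}\left(\frac{1}{dx/dy}\right) \cdot \frac{dy}{dx} \\
&= -\frac{1}{(dx/dy)^2} \cdot \frac{d^2x}{dy^2} \cdot \frac{dy}{dx} \quad \text{(Chain Rule again!)} \\
&= -\frac{1}{(dx/dy)^3}\,\frac{d^2x}{dy^2}.
\end{align*}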

You do NOT need to know this formula, but the steps are quite similar to certain calculations

you’ll need to do in one of your second-year courses.
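As a quick sanity check (a sketch, reusing the counterexample y = e^x from above), we can substitute dx/dy = 1/y and d²x/dy² = −1/y² into the final expression for the second derivative:

```latex
\[
-\frac{1}{(dx/dy)^3}\,\frac{d^2x}{dy^2}
  = -\frac{1}{(1/y)^3}\left(-\frac{1}{y^2}\right)
  = y^3 \cdot \frac{1}{y^2}
  = y
  = e^x ,
\]
```

which agrees with the second derivative of e^x computed directly.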

16.3 Logarithmic Differentiation

How might we find the derivative of a function such as $(\cos x)^x$? First of all, we need to realize

that the formulas we’ve discussed so far do not allow us to calculate it straightforwardly
(the formula $\frac{d}{dx}\left(x^k\right) = kx^{k-1}$ requires that k be a constant, so we cannot simply write
$y' = x(\cos x)^{x-1}$).

The trick is to apply logarithms, in order to displace the exponent:

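\begin{align*}
& y = (\cos x)^x \\
\implies\ & \ln y = \ln\left[(\cos x)^x\right] = x \ln(\cos x) \\
\implies\ & \frac{1}{y}\,\frac{dy}{dx} = \ln(\cos x) + x\left(\frac{1}{\cos x}\right)(-\sin x) \\
\implies\ & \frac{dy}{dx} = (\cos x)^x\left[\ln(\cos x) - x \tan x\right].
\end{align*}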
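A result like this is easy to get wrong, so a numerical spot-check can be reassuring. The sketch below compares the formula with a symmetric difference quotient at one arbitrarily chosen point; the helper names, the test point x0 = 0.5 and the step size are all assumptions made for illustration.

```python
import math

def y(x):
    """The function y = (cos x)^x, defined where cos x > 0."""
    return math.cos(x) ** x

def dy_dx(x):
    """Derivative obtained above by logarithmic differentiation."""
    return y(x) * (math.log(math.cos(x)) - x * math.tan(x))

def central_difference(f, x, h=1e-6):
    """Symmetric difference quotient, used here only as an independent numerical check."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

x0 = 0.5  # an arbitrary test point with cos(x0) > 0
formula_value = dy_dx(x0)
numeric_value = central_difference(y, x0)
print(formula_value, numeric_value)
```

Agreement at one point is not a proof, but a sign error or a missing factor in the formula would show up immediately.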

We could, in fact, use this technique to derive a formula for the derivative of a function of the

form $f(x)^{g(x)}$, but it turns out to be a bit cumbersome!

Of course, in practice you won’t often encounter functions of this form. This technique,

though, can be a useful shortcut for differentiating complicated expressions.



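Example: Suppose $y = \frac{x^2\,\sqrt[3]{7x - 14}}{(1 + x^2)^4}$. Then

\begin{align*}
\ln y &= \ln\left[\frac{x^2\,\sqrt[3]{7x - 14}}{(1 + x^2)^4}\right] \\
&= \ln x^2 + \ln\,[7(x - 2)]^{1/3} - \ln\left(1 + x^2\right)^4 \\
&= 2 \ln x + \frac{1}{3} \ln 7 + \frac{1}{3} \ln(x - 2) - 4 \ln\left(1 + x^2\right)
\end{align*}

so $\frac{1}{y}\,\frac{dy}{dx} = \frac{2}{x} + \frac{1}{3(x - 2)} - \frac{8x}{1 + x^2}$

and hence $\frac{dy}{dx} = \frac{x^2\,\sqrt[3]{7x - 14}}{(1 + x^2)^4}\left[\frac{2}{x} + \frac{1}{3(x - 2)} - \frac{8x}{1 + x^2}\right]$.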

Comment: Technically these steps are only valid for x > 2, since we may only apply

the logarithm to positive numbers. However, the result is also valid for x < 2, and this is not

just a “fluke”. One extra step would make this clear: we’d just need to apply absolute values

to both sides of the equation before applying the logarithm.

Logarithmic differentiation also allows us to derive one more pair of differentiation rules:

20 Consider $f(x) = a^x$, where a can be any positive number. To find its derivative, let’s
write

\[ y = a^x, \]

and apply logarithms:

\begin{align*}
& \ln y = x \ln a \\
\implies\ & \frac{1}{y}\,\frac{dy}{dx} = \ln a \\
\implies\ & \frac{dy}{dx} = y \ln a = (\ln a)\,a^x.
\end{align*}

Therefore $\frac{d}{dx}(a^x) = (\ln a)\,a^x$.

21 Using the previously-discussed method for dealing with inverse functions, we can show
that $\frac{d}{dx}(\log_a x) = \frac{1}{(\ln a)\,x}$.
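The calculation behind rule 21 can be sketched as follows, assuming a > 0 and a ≠ 1 so that the logarithm is defined and ln a ≠ 0. It uses rule 20 together with the implicit-differentiation method for inverses.

```latex
\begin{align*}
y = \log_a x \quad &\Longrightarrow \quad x = a^y \\
&\Longrightarrow \quad 1 = (\ln a)\, a^y \,\frac{dy}{dx} \\
&\Longrightarrow \quad \frac{dy}{dx} = \frac{1}{(\ln a)\, a^y} = \frac{1}{(\ln a)\, x}.
\end{align*}
```

Setting a = e recovers the earlier rule for ln x, since ln e = 1.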

17 Theorems

17.1 Differentiability Implies Continuity

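Theorem: If f (x) is differentiable at a, then f (x) is continuous at a.

Proof: Observe that, for $x \neq a$, $f(x) - f(a) = (x - a)\left[\frac{f(x) - f(a)}{x - a}\right]$.
Take limits of both sides:

\begin{align*}
\lim_{x \to a}\,[f(x) - f(a)] &= \lim_{x \to a}(x - a) \cdot \lim_{x \to a} \frac{f(x) - f(a)}{x - a} && \text{(true if both limits exist)} \\
&= 0 \cdot f'(a) && \text{(since we’ve assumed that } f'(a) \text{ does exist)} \\
&= 0.
\end{align*}

Hence $\lim_{x \to a} f(x) = f(a)$, which is exactly what it means for f to be continuous at a.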

17.2 The Mean Value Theorem

Theorem: If f (x) is continuous on [a, b] and differentiable on (a, b), then there exists a

number c ∈ (a, b) such that


\[ f'(c) = \frac{f(b) - f(a)}{b - a}. \]

Note that there may be more than one such number. The graphical interpretation is that f

attains its average slope at least once in (a, b), which we hope seems intuitive. This particular

theorem can be very useful for proving other theorems in calculus. For example, we can prove

the following:

Corollary: If $f'(x) = 0$ for all $x \in (a, b)$, then f (x) is constant on (a, b).

Proof: Let x1 and x2 be any two distinct points in (a, b), labelled so that x1 < x2. By the Mean Value Theorem

(applied on the interval [x1, x2]), there exists a number $c \in (x_1, x_2)$ such that

\[ f'(c) = \frac{f(x_1) - f(x_2)}{x_1 - x_2}. \]

But we’ve assumed that $f'(c) = 0$, so $f(x_1) = f(x_2)$. Since x1 and x2 are arbitrary, f has

the same value everywhere in (a, b).

17.3 L’Hôpital’s Rule


Theorem: Consider a limit of the form $\lim_{x \to a} \frac{f(x)}{g(x)}$, where f (x) and g (x) are differentiable on
some open interval containing a (except possibly at a itself). If $\lim_{x \to a} f(x) = 0$ and $\lim_{x \to a} g(x) = 0$
(so that the limit is of indeterminate form $\frac{0}{0}$), then $\lim_{x \to a} \frac{f(x)}{g(x)} = \lim_{x \to a} \frac{f'(x)}{g'(x)}$, if this limit exists.

The same result holds for the indeterminate form $\frac{\infty}{\infty}$, and it also holds if we replace “$\lim_{x \to a}$”
with any other limit (that is, it holds for one-sided limits, and it holds as $x \to \pm\infty$). We’ll

give you some practice with this in the homework and assignments.
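Here is one short illustration, chosen for this sketch rather than taken from the notes. The limit below has the form 0/0, and so does the limit obtained after one application of the rule, so the rule is applied twice.

```latex
\[
\lim_{x \to 0} \frac{1 - \cos x}{x^2}
  = \lim_{x \to 0} \frac{\sin x}{2x}
  = \lim_{x \to 0} \frac{\cos x}{2}
  = \frac{1}{2}.
\]
```

Each equality is justified only because the limit on its right exists, so the chain is really read from right to left.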

Note: you will also see the name spelled “L’Hospital’s Rule”. This is how Guillaume de

l’Hospital, after whom the result is named, spelled his own name, but in modern French

spelling the silent “s” is replaced by the circumflex over the preceding vowel. The “H” is also

silent, so whichever spelling you use, the pronunciation is “lopital’s rule”.

18 Related Rates

Here’s a very direct application of the chain rule: if we know that two quantities are related

(let’s say that y = f (x)), and each of those two quantities is changing in time, then the rates

of change of the two quantities are also related:

\begin{align*}
& y(t) = f(x(t)) \\
\implies\ & \frac{d}{dt}\,[y(t)] = \frac{d}{dt}\,[f(x(t))] \\
\implies\ & \frac{dy}{dt} = \frac{df}{dx}\,\frac{dx}{dt}.
\end{align*}

Note that $\frac{df}{dx}$ will not usually be constant, so to find $\frac{dy}{dt}$, we will usually need to know both
$\frac{dx}{dt}$ and t (from which we can determine x, and hence determine $\frac{df}{dx}$).

Example: Suppose a spherical balloon is being inflated at a rate of 1 litre per minute. How

quickly is the diameter of the balloon increasing at the moment when the diameter is 10 cm?

Solution: Uh-oh - we have a word problem. Don’t panic; we just need to translate it

into mathematics! We are given information about volume and diameter, so let’s call those

V and d. We need to consider the relationship between them; we know that for a sphere,
$V = \frac{4}{3}\pi r^3$, where r is the radius, and of course $r = \frac{d}{2}$, so $V = \frac{\pi d^3}{6}$. Differentiating both sides
with respect to time, t, we find that

\[ \frac{dV}{dt} = \frac{\pi d^2}{2}\,\frac{dd}{dt}. \]

We are given both $\frac{dV}{dt}$ and d, so all we have to do is plug them in and solve for $\frac{dd}{dt}$. There’s

just one catch: we have to take some care that we use compatible units. Since 1 litre is a cubic

decimeter, let’s express the diameter as 1 dm. We then have

\[ 1 = \frac{\pi}{2}\,\frac{dd}{dt}, \]

so $\frac{dd}{dt} = \frac{2}{\pi}$ dm/min, or $\frac{20}{\pi}$ cm/min if we’d rather not use decimeters for the final result.

Comment: We can also see that this rate of change will be slowing rapidly, since with
$\frac{dV}{dt}$ fixed at 1 L/min, we have $\frac{dd}{dt} = \frac{2}{\pi d^2}$.

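The balloon calculation can be packaged as a small function, which also makes the unit bookkeeping explicit. This is a sketch with a hypothetical function name; it assumes lengths in decimetres and time in minutes, so that litres per minute can be used directly.

```python
import math

def diameter_growth_rate(volume_rate_dm3_per_min, diameter_dm):
    """Solve dV/dt = (pi d^2 / 2) dd/dt for dd/dt; lengths in decimetres, time in minutes."""
    return 2.0 * volume_rate_dm3_per_min / (math.pi * diameter_dm ** 2)

# 1 litre = 1 cubic decimetre, and 10 cm = 1 dm, so both inputs are 1.0 here.
rate_dm_per_min = diameter_growth_rate(1.0, 1.0)
rate_cm_per_min = 10.0 * rate_dm_per_min
print(rate_dm_per_min, rate_cm_per_min)
```

Calling the function again with a larger diameter shows the slowdown described in the comment: doubling the diameter cuts the growth rate to a quarter.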
The essence of every one of these problems is the same; we’re just differentiating everything

with respect to time. The challenge will typically lie in the “translation to mathematics”, but

there is less to that than you might fear:

• Read the question carefully (what information are you given, and what information are

you being asked for?). Give names to the quantities, and be aware that some will be

variable, and some will be constant.

• Determine the relationship between the variables. This may come from a standard

formula, as in the example above, or it might be based on the specific geometry of the

problem. Drawing a diagram may be an essential step.

Once you have that relationship, all you have to do is differentiate both sides of the equation

with respect to time, applying the chain rule. The last step is just plugging numbers in, with

attention paid to the units involved.

Example: Suppose an aircraft is cruising at an altitude of 10 km, at a groundspeed of 900

km/hr. An observer on the ground sees it pass directly overhead, and watches it travel away

from him. When the angle of elevation reaches 45°, how quickly is it decreasing?

Solution: We are given an altitude of h = 10 km (constant). We are also told that the

horizontal component of the distance from the observer (let’s call this x) is changing at the
rate $\frac{dx}{dt} = 900$ km/hr. Finally, we are given that the angle of elevation ($\theta$) is $\frac{\pi}{4}$ radians (since

we’re never going to use degrees in a calculus problem), and we are asked to find $\frac{d\theta}{dt}$. So...
how are x and θ related? A diagram might help here.

We can see that $\tan\theta = \frac{h}{x}$, so differentiating with respect to time tells us that

\[ \sec^2\theta\,\frac{d\theta}{dt} = -\frac{h}{x^2}\,\frac{dx}{dt}. \]

When $\theta = \frac{\pi}{4}$, we have $x = \frac{h}{\tan\theta} = 10$ km, and so

\begin{align*}
\frac{d\theta}{dt} &= -\frac{h \cos^2\theta}{x^2}\,\frac{dx}{dt} \\
&= -\frac{(10\ \text{km})\left(\frac{1}{2}\right)}{100\ \text{km}^2}\,(900\ \text{km/hr}) \\
&= -45\ \text{rad/hr}.
\end{align*}

That is, the angle of elevation is decreasing at $\frac{1}{80}$ rad/sec, or about 0.72° per second.
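The unit conversion at the end is the step most likely to go astray, so it is worth redoing numerically. The sketch below (the function name is a hypothetical choice) evaluates the related-rates formula in rad/hr and then converts to rad/sec and degrees per second, using 3600 seconds per hour.

```python
import math

def elevation_rate(altitude_km, ground_speed_km_per_hr, theta_rad):
    """Return d(theta)/dt in rad/hr from sec^2(theta) d(theta)/dt = -(h / x^2) dx/dt."""
    x_km = altitude_km / math.tan(theta_rad)  # horizontal distance, from tan(theta) = h / x
    return -altitude_km * math.cos(theta_rad) ** 2 / x_km ** 2 * ground_speed_km_per_hr

rate_rad_per_hr = elevation_rate(10.0, 900.0, math.pi / 4)
rate_rad_per_sec = rate_rad_per_hr / 3600.0
rate_deg_per_sec = math.degrees(rate_rad_per_sec)
print(rate_rad_per_hr, rate_rad_per_sec, rate_deg_per_sec)
```

Dividing 45 rad/hr by 3600 gives 0.0125 rad/sec, which is a little under three quarters of a degree per second.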

19 Differentials

As we mentioned earlier in the course, the expressions ∆x and ∆y are used to denote small

increments in the values of x and y. We also use the expressions dx and dy to represent

these quantities in limits as they approach zero. For example, when we introduce the definite

integral (coming soon), we’ll write

\[ \lim_{\Delta x \to 0} \sum_{i=1}^{n} f(x_i^*)\,\Delta x = \int_a^b f(x)\,dx, \]

and of course in Leibniz’s notation for the derivative we write

\[ \lim_{\Delta x \to 0} \frac{\Delta y}{\Delta x} = \frac{dy}{dx}. \]

Let’s explore this definition of the derivative a bit further. Assuming that the increment
$\Delta x$ is small, we know that $\frac{\Delta y}{\Delta x} \approx \frac{dy}{dx}$. That is (if we switch to Lagrange’s notation),

\[ \frac{\Delta y}{\Delta x} \approx f'(x). \]

Well, we can write this as $\Delta y \approx f'(x)\,\Delta x$ (since $\Delta y$ and $\Delta x$ actually are separate finite

quantities). The error in this approximation approaches zero as $\Delta x \to 0$, and to incorporate

this realization into our expression, we write

\[ dy = f'(x)\,dx. \]

The expression $f'(x)\,dx$ is called the differential of f (and we may also write it as df instead
of dy). Essentially what we’ve done is provide some justification for treating the expression $\frac{dy}{dx}$
as a fraction. The differentials dy and dx are still meaningless in isolation; they are defined by

their relationship to each other. Nevertheless, we can manipulate them as separate quantities

within that relationship.¹⁷

As a rule of thumb, given expressions involving differentials (dx, dy, dz, ds, etc.), we can

treat them as separate quantities as long as we put everything back into sensible expressions
when we’re finished. For example, consider the chain rule. Given the derivative $\frac{dy}{dx}$, we can
imagine dy and dx to be distinct quantities, and we can multiply and divide by a third quantity

du:
\[ \frac{dy}{dx} = \frac{dy}{dx}\,\frac{du}{du} = \frac{dy}{du}\,\frac{du}{dx}, \]

and this gives a valid result as long as we’ve defined the variable u appropriately and as long
as the derivatives $\frac{dy}{du}$ and $\frac{du}{dx}$ both exist. We’ll manipulate differentials in a similar way when
we study integration; the differential dx in the expression $\int_a^b f(x)\,dx$ can be treated as a finite
quantity, as long as when we’re finished “playing with it” we end up with a sensible integral

expression.

¹⁷ As a simple example, we can see an immediate connection to our discussion of related rates: if we divide
the expression $dy = f'(x)\,dx$ by a differential dt, we get $\frac{dy}{dt} = f'(x)\,\frac{dx}{dt}$.

The Use (or Abuse) of Differentials for Tangent Line Approximations

The notation of differentials is sometimes used in a different way. Consider a function f (x),

on an interval [x0 , x0 + ∆x]. When x increases by the amount ∆x, the value of f will increase

by an amount ∆f , and we know that we can use the tangent line to f at x0 to find an

approximation for this change. The straightforward way is to find the equation of the tangent

line to f at x0 , and evaluate it at x0 + ∆x, but we can be more efficient than that.
Since $\frac{\Delta f}{\Delta x} \approx f'(x_0)$, we know that $\Delta f \approx f'(x_0)\,\Delta x$. This allows us to conclude that

\begin{align*}
f(x_0 + \Delta x) &= f(x_0) + \Delta f \\
&\approx f(x_0) + f'(x_0)\,\Delta x.
\end{align*}

In practice, it is helpful to have a separate notation for the approximate change, $f'(x_0)\,\Delta x$,

and we traditionally use the differentials for this purpose¹⁸. That is, while we’re using the

increments ∆x and ∆f for the horizontal and vertical components of motion along the actual

curve y = f (x), we’ll use the differentials dx and df to represent the corresponding quantities

for the tangent line. Of course, the change in x is the same for both, so we’re setting dx = ∆x.

On the other hand, df will be our approximation for ∆f . See below:


[Figure 37: the curve y = f(x) and its tangent line at x0, showing the horizontal change Δx = dx, the actual change Δf = f(x0 + Δx) − f(x0), and the tangent-line change df = f'(x0) dx.]

¹⁸ This is a bit odd, since the differentials are normally supposed to be infinitesimally small, and not independently
meaningful! However, we need a second notation for the change along the tangent line, and this
happens to work well - and this abuse of differentials has been standard for a long time.

The idea is that if we’re interested in finding the change $\Delta f$, we can find a quick approximation

by calculating $df = f'(x_0)\,dx$.

Example: Consider a square of side length x. Its area is $A(x) = x^2$. Suppose we increase x

by a small amount $\Delta x$. Since the differential is $dA = 2x\,dx$, we can conclude that the change

in area, $\Delta A$, will be approximately $2x\,\Delta x$.

Of course, we could calculate the change exactly:

\begin{align*}
\Delta A &= A(x + \Delta x) - A(x) \\
&= (x + \Delta x)^2 - x^2 \\
&= x^2 + 2x\,\Delta x + (\Delta x)^2 - x^2 \\
&= 2x\,\Delta x + (\Delta x)^2.
\end{align*}

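You can see that the error in our differential approximation is $(\Delta x)^2$, which will be very small

if $\Delta x$ is small. In fact, in this particular example we can see this geometrically:

[Figure 38: a square of side x + Δx divided into a square of area x², two rectangles of area xΔx each, and a small corner square of area (Δx)².]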

This is, admittedly, not quite as useful as it was 50 years ago! If we have a calculator handy,

and we have numbers, then we won’t need the approximation. For example, if x = 20 cm and

Δx = 0.1 cm, then we can immediately calculate the new area as (20.1 cm)² = 404.01 cm² (and

finding the approximate value of 404 cm² actually takes a few seconds longer!). However, as

you’ll see in the assignments, the idea is still useful when we are not dealing with specific

numbers.

Here are a couple more examples, to get you used to the notation.


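Example: Use differentials to approximate $\sqrt{78}$.

Solution: Realizing that $\sqrt{81} = 9$, we identify $f(x) = \sqrt{x}$, $x_0 = 81$, and $\Delta x = dx = -3$.

We then calculate df :

\begin{align*}
df &= f'(x)\,dx \\
&= \frac{1}{2\sqrt{x}}\,dx \\
&= \frac{1}{2\sqrt{81}} \cdot (-3) \\
&= -\frac{1}{6}.
\end{align*}

That is, when x decreases by 3, f (x) decreases by approximately 1/6. Thus,

\begin{align*}
\sqrt{78} &= \sqrt{81} + \Delta f \\
&\approx \sqrt{81} + df \\
&= 9 - \frac{1}{6} \\
&= 8\tfrac{5}{6} \approx 8.833.
\end{align*}

(Compare this to the calculator value of $\sqrt{78} \approx 8.83176\ldots$)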
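The same estimate can be reproduced with a small, general-purpose helper. This is a sketch only: the function names are illustrative, and the derivative has to be supplied by hand.

```python
import math

def linear_approximation(f, f_prime, x0, dx):
    """Return f(x0) + f'(x0) dx, the tangent-line estimate of f(x0 + dx)."""
    return f(x0) + f_prime(x0) * dx

def sqrt_prime(x):
    """Derivative of sqrt(x), valid for x > 0."""
    return 1.0 / (2.0 * math.sqrt(x))

estimate = linear_approximation(math.sqrt, sqrt_prime, 81.0, -3.0)
actual = math.sqrt(78.0)
print(estimate, actual, estimate - actual)
```

The estimate comes out slightly too large, which is what we should expect: the graph of the square root is concave down, so it lies below its tangent lines.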

Example: A metal sphere of radius 10 cm is to be coated with a layer of silver, 0.02 cm

thick. What volume of silver will be required?

Solution: We essentially want to know the change in the volume of a sphere when the

radius is increased from 10 to 10.02. Well,

\[ V(r) = \frac{4}{3}\pi r^3, \]

and so we have

\begin{align*}
dV &= 4\pi r^2\,dr \\
&= 4\pi r^2\,\Delta r \\
&= 4\pi (10)^2 (0.02) \\
&= 8\pi.
\end{align*}

Therefore the required volume is approximately 8π cm³ ≈ 25.133 cm³.

For comparison, the exact value should be $\frac{4}{3}\pi r_1^3 - \frac{4}{3}\pi r_0^3 = \frac{4}{3}\pi\left(10.02^3 - 10^3\right) \approx 25.183$ cm³.


To appreciate the advantage of the method of differentials, consider that we can state more

generally that for any initial radius r, and any desired thickness of silver coating dr, the

volume of silver required will be approximately $dV = 4\pi r^2\,dr$ (as long as the coating is thin

in relation to the base sphere). Realizing that this quantity is directly proportional to the

thickness of the coating, but proportional to the square of the radius of the sphere, may be

useful if we want to consider the costs of different-sized spheres.

20 Graphical Implications of the Derivative (Application to Curve-Sketching)

Monotonicity and Extrema

You probably already have some intuitive understanding of the terms “increasing”, “decreasing”,

“maximum”, and “minimum”, as they apply to the graphs of functions. However, it will be

useful to define them precisely. We’ll start with monotonicity:

Definition:

• A function f is increasing on an interval I if f (x2 ) > f (x1 ) whenever x2 > x1 , where x1

and x2 are any numbers in I.

• Similarly, we say f is decreasing if f (x2 ) < f (x1 ) whenever x2 > x1 .

• If f is either increasing or decreasing on I (one or the other, but we don’t care or know

which), then we may say that f is monotonic on I.

Now, consider the definition of the derivative of f with this new definition in mind. Since we
can write $f'(x_1) = \lim_{x_2 \to x_1} \frac{f(x_2) - f(x_1)}{x_2 - x_1}$, it follows that if $f'(x) > 0$ for all x in the interval
I, then f (x) is increasing on I. Similarly, if $f'(x) < 0$ for all x in I, then f is decreasing on I.

Note that the converse is not quite true; if f is increasing we may still find that $f'(x) = 0$

at some isolated points.

Example: Consider the function $f(x) = x - \sin x$. Differentiating, we find that $f'(x) = 1 - \cos x$.

Clearly $f'(x) \geq 0$ for all x, but equality holds only at the discrete points x = 0,

±2π, ±4π, etc. This does imply that f (x) is increasing for all x. The graph is plotted below.

[Figure 39: graph of f(x) = x − sin x for 0 ≤ x ≤ 4.4π.]

Now, let’s turn to maxima and minima:

Definition:

• A function f has an absolute (or global) maximum at x0 if f (x0 ) ≥ f (x) for all x in the

domain of f .

• Similarly, if f (x0 ) ≤ f (x) for all x in its domain, then we say that f has an absolute (or

global) minimum at x0 .

For determining the shape of a graph, the following terms are more helpful:

• A function f has a local (or relative) maximum at x0 if there is a number h > 0 such that

f (x0 ) ≥ f (x) for all x ∈ (x0 − h, x0 + h).

• A function f has a local (or relative) minimum at x0 if there is a number h > 0 such that

f (x0 ) ≤ f (x) for all x ∈ (x0 − h, x0 + h).

Comment on Terminology: The plural of “maximum” is “maxima”, and the plural of

“minimum” is “minima”. We also refer to them collectively as “extrema” (the singular of which

is “extremum”).

Now, a little thought should reveal that if the function under consideration is continuous,

then it can only possess a local maximum at x0 if it is increasing to the left of x0 and decreasing

to the right of x0 (↗ ↘). What does this tell us about the derivative of f at x0 ? Well,

if $f'$ changes from positive to negative as x increases through x0 , then there are only two

possibilities: either $f'(x_0) = 0$, or else $f'(x_0)$ simply doesn’t exist.

Of course, exactly the same observations can be made about local minima, which suggests

the following definition and procedure for identifying local extrema:

Definition: A number x0 in the domain of f is called a critical value of f if either $f'(x_0) = 0$, or $f'(x_0)$ does

not exist. The point (x0 , f (x0 )) is called a critical point of f .

Algorithm for identifying extrema:

If we wish to find all of the local extrema of a continuous function f , we must first find all

of the critical values. To determine whether each corresponds to a maximum, a minimum, or

neither, we may apply the First Derivative Test: if $f'$ changes from positive to negative as x

increases through x0 , then f must have a local maximum at x0 (↗ ↘), while if $f'$ changes

from negative to positive, then f must have a local minimum instead (↘ ↗).

Examples:

• Consider the function $f(x) = 2x^3 - 3x^2 - 12x + 5$. Differentiating, we find that

$f'(x) = 6x^2 - 6x - 12 = 6(x^2 - x - 2) = 6(x - 2)(x + 1)$. Therefore f has critical points where

x = −1 and x = 2. Furthermore, $f' > 0$ on (−∞, −1), $f' < 0$ on (−1, 2), and $f' > 0$

on (2, ∞). Since f (−1) = 12 and f (2) = −15, we can conclude that the point (−1, 12) is a local maximum, while the

point (2, −15) is a local minimum.

[Figure 40: graph of f(x) = 2x³ − 3x² − 12x + 5 for −5 ≤ x ≤ 5.]

• Consider the function $f(x) = x^{1/3}$. Here differentiating yields $f'(x) = \frac{1}{3}x^{-2/3}$, so $f'(0)$
does not exist. This makes (0, 0) a critical point, but it is in fact not an extremum, since

$f'(x) > 0$ for all $x \neq 0$.

[Figure 41: graph of f(x) = x^(1/3) for −2.4 ≤ x ≤ 2.4.]

• Consider the function $f(x) = (3x + 1)^{2/3}$. The derivative is

\[ f'(x) = \frac{2}{3}(3x + 1)^{-1/3}(3) = \frac{2}{\sqrt[3]{3x + 1}}. \]

We can see that $f'(x)$ is undefined when x = −1/3. Furthermore, $f'(x) < 0$ when

x < −1/3, while $f'(x) > 0$ when x > −1/3, so f has a local minimum at (−1/3, 0).

[Figure 42: graph of f(x) = (3x + 1)^(2/3) for −1.5 ≤ x ≤ 2.]

• Consider the function $f(x) = \frac{1}{x^2} = x^{-2}$. We have $f'(x) = -2x^{-3} = -\frac{2}{x^3}$, which is
positive when x is negative, and negative when x is positive. However, there is no

critical point here, since x = 0 is not in the domain of f ! In fact, even if we were to
re-define f as, say,

\[ f(x) = \begin{cases} \dfrac{1}{x^2}, & \text{if } x \neq 0 \\[1ex] a, & \text{if } x = 0 \end{cases} \]

then the point (0, a) would still not be a maximum. In fact it would be a local minimum!

The point here is that all of our discussions regarding derivatives and their implications

for graphs hold only for continuous functions.

[Figure 43: graph of f(x) = 1/x² for −4 ≤ x ≤ 4.]

Locating Absolute Extrema

Recall the Extreme Value Theorem: if f (x) is continuous on a closed interval [a, b], then f

attains an absolute maximum value and a minimum value on that interval. A little thought

reveals that these absolute extrema must occur either at critical points of f on [a, b] or at the

endpoints x = a, x = b. So, all we need to do is find the critical points and compare their

values, both against each other and against f (a) and f (b) (it is not necessary to classify them

as local maxima or minima). This is commonly referred to as the Closed Interval Method for

locating absolute extrema.

Example: Find the absolute maximum and minimum values of $f(x) = x^3 - 3x + 1$ on the

interval [0, 2].

Solution: Differentiating, we find $f'(x) = 3x^2 - 3 = 3(x + 1)(x - 1)$, so the only critical

value within the interval [0, 2] is x = 1. At this point we find that f (1) = −1. Meanwhile,

at the endpoints we have f (0) = 1 and f (2) = 3. Comparing these, we conclude that the

absolute maximum of f on [0, 2] is 3, and the absolute minimum is −1.
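The Closed Interval Method is essentially a comparison of a short list of candidate values, which the following sketch makes explicit. The function name is hypothetical, and the critical values are assumed to have been found separately (here, by factoring the derivative by hand).

```python
def closed_interval_extrema(f, critical_values, a, b):
    """Compare f at the endpoints and at the supplied critical values lying in (a, b).

    The critical values must be found separately (for example by solving f'(x) = 0 by hand).
    """
    candidates = [a, b] + [c for c in critical_values if a < c < b]
    values = {x: f(x) for x in candidates}
    x_max = max(values, key=values.get)
    x_min = min(values, key=values.get)
    return (x_max, values[x_max]), (x_min, values[x_min])

def f(x):
    return x ** 3 - 3 * x + 1

# f'(x) = 3(x + 1)(x - 1), so the critical values are -1 and 1; only 1 lies in (0, 2).
absolute_max, absolute_min = closed_interval_extrema(f, [-1, 1], 0, 2)
print(absolute_max, absolute_min)
```

Each result is a pair (x, f(x)), so the output gives both the extreme value and where it occurs. Note that the critical value −1 is discarded because it lies outside the interval.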

Comment: If the interval of interest is not closed, then the Extreme Value Theorem doesn’t

apply, and there may not be any absolute extrema. However, there may be, and if there are

then we can usually modify the Closed Interval Method by considering limits as we approach
x
each end of the interval. For example, consider the function f (x) = , on the interval
1 + x2
1−x 2
(0, ∞). This has derivative f 0 (x) = , so the only critical point within the interval
(1 + x2 )2
is at x = 1. We aren’t allowed to evaluate f at zero (or, obviously “at” infinity), but if we

consider that lim f (x) = 0, and lim f (x) = 0, while f (1) = 1/2, it becomes clear that the
x→0+ x→∞
maximum value of f on the given interval is 1/2. On the other hand, f does not have a

minimum value (although we could state that 0 is a lower bound on the values of f ).

Figure 44:

Graphical Implications of the Second Derivative

A few more definitions:

• A function f is concave up on an interval I if $f'(x)$ is increasing on I.

• A function f is concave down on an interval I if $f'(x)$ is decreasing on I.

• If the concavity of f changes at a point $(x_0, f(x_0))$ (either from up to down or from down to up) then we call this point an inflection point.

Considering our earlier discussion of the terms “increasing” and “decreasing”, we can see an easy test for concavity: if $f''(x) > 0$ on I then f must be concave up, while if $f''(x) < 0$ on I then f must be concave down. Also, just as critical points occur where $f'(x)$ is either zero or undefined, we see that inflection points must occur either where $f''(x) = 0$ or where $f''(x)$ does not exist.

Comment on Terminology: Note that while a critical point of f is a point at which $f'$ is either zero or undefined (and these are important because they are the possible locations of maxima or minima), we do not have a corresponding term for points at which $f''$ is zero or undefined. These are simply “possible inflection points”, or, to be more precise, they are critical points of $f'$.

Examples:

• Consider $f(x) = e^x + x$. We see that $f'(x) = e^x + 1$, and $f''(x) = e^x$, so in fact f is both increasing and concave up on the entire real line.

• Consider $f(x) = x^{1/3}$. Here $f'(x) = \frac{1}{3} x^{-2/3}$, and $f''(x) = -\frac{2}{9} x^{-5/3}$, so we see that even though f is continuous everywhere, none of its derivatives are defined at $x = 0$. Therefore (0, 0) is a critical point. Is it an extremum? Well, applying the First Derivative Test, we see that $f'$ is positive for $x < 0$ and for $x > 0$, so this is in fact not an extremum. However, the second derivative $f''$ does change sign at 0, from positive to negative, so (0, 0) is indeed an inflection point.

In general, if we want to produce a reasonably accurate graph by hand, we can use information from $f$, $f'$, and $f''$. From $f$ itself we can plot a point or two to “anchor” the graph (usually we look for points at which the curve crosses the x or y axes), and it’s often useful to consider limits of $f$ as x approaches any discontinuities or $\pm\infty$. From $f'$ we can tell where $f$ is monotonic and locate any extrema, and from $f''$ we can determine concavity and locate any inflection points.

Example: Sketch the graph of $f(x) = x^{5/3} - x^{2/3}$.

Solution:

Information from $f$: since $f$ is continuous everywhere, and can be expressed as $f(x) = x^{2/3}(x - 1)$, we can see that $f = 0$ at $x = 0$ and $x = 1$ (these are the only “x-intercepts”), and the only limits of interest are $\lim_{x \to \infty} f = \infty$ and $\lim_{x \to -\infty} f = -\infty$ (both of which you should be able to see upon inspection).

Information from $f'$: Differentiating gives

$$f'(x) = \frac{5}{3} x^{2/3} - \frac{2}{3} x^{-1/3} = \frac{1}{3x^{1/3}} (5x - 2),$$

from which we can see that the only critical values are $x = 0$ (where $f'$ is undefined) and $x = 2/5$. To apply the first derivative test and summarize the behaviour of the function on the intervals between the critical points, some people find it helpful to display the signs of each factor of $f'$ in a chart, in this way:

                    (−∞, 0)     (0, 2/5)     (2/5, ∞)
1/(3x^{1/3})           −           +            +
(5x − 2)               −           −            +
f'                     +           −            +
f                      ↗           ↘            ↗

Here the last line is intended to represent the deduction that $f$ must be increasing on $(-\infty, 0)$ and $(\frac{2}{5}, \infty)$, and decreasing in between. From this we can conclude from the first derivative test that the point (0, 0) must be a local maximum, while the point $(\frac{2}{5}, -0.326...)$ must be a local minimum.

Information from $f''$: Differentiating again, we find

$$f''(x) = \frac{10}{9} x^{-1/3} + \frac{2}{9} x^{-4/3} = \frac{2}{9x^{4/3}} (5x + 1).$$

From this we can see that the only possible locations of inflection points are at $x = -\frac{1}{5}$ and $x = 0$. Again, we can use a chart to display the signs of $f''$ on each interval of interest, for each factor:

                    (−∞, −1/5)    (−1/5, 0)    (0, ∞)
2/(9x^{4/3})            +             +           +
(5x + 1)                −             +           +
f''                     −             +           +
f                  concave down   concave up   concave up

We’ve deduced that $(-\frac{1}{5}, -0.410...)$ is an inflection point, but (0, 0) is not.
Putting all of this information together, we arrive at the graph shown below.

Figure 45:

Comment: Some graphing software has difficulty in producing the graph of this function (the entire left side of the graph may be omitted). The problem can usually be corrected by rewriting the function in the form $f(x) = (x^5)^{1/3} - (x^2)^{1/3}$.
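The same difficulty shows up in general-purpose languages. As a hedged illustration in Python (the helper `real_cbrt` is our own name, not part of these notes or of any library), raising a negative float to a fractional power produces a complex number, so we take a real cube root of $x^5$ and $x^2$ instead:

```python
import math


def real_cbrt(x):
    """Real cube root that also works for negative x (hypothetical helper)."""
    return math.copysign(abs(x) ** (1.0 / 3.0), x)


def f(x):
    """f(x) = (x^5)^(1/3) - (x^2)^(1/3), using real cube roots."""
    return real_cbrt(x**5) - real_cbrt(x**2)


# A naive fractional power of a negative number is complex in Python 3.
naive = (-0.2) ** (2.0 / 3.0)
print(type(naive))

# The rewritten form gives real values on both sides of the origin.
for x in (-0.2, 0.4):
    print(x, round(f(x), 3))
```

The two printed values should agree with the inflection point and the local minimum found in the sketch above (approximately -0.410 and -0.326).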

The Second Derivative Test

If we are going to calculate $f''$ anyway, or if it is easy to calculate, then we do have an alternative to the First Derivative Test available to us. If $x = x_0$ is a critical value of $f$, and $f''(x_0)$ exists, then we have the following:

1. If $f''(x_0) > 0$, then $x_0$ corresponds to a relative minimum of $f$.

2. If $f''(x_0) < 0$, then $x_0$ corresponds to a relative maximum of $f$.

Of course, if $f''(x_0) = 0$, or if $f''(x_0)$ does not exist, then the test is of no help, so we must rely on the First Derivative Test. For example, consider the functions $f(x) = \pm x^4$. You can easily verify that $x = 0$ is a critical point for both functions, and that $f''(0) = 0$ for both functions. Obviously, though, $x^4 > 0$ for all $x \neq 0$, while $-x^4 < 0$ for all $x \neq 0$, so the origin is a minimum in one case and a maximum in the other.

As another example, consider that for the sketch above (Figure 45), we have only one way of classifying the extremum at the origin, because $f''(0)$ is undefined.

In fact, the First Derivative Test will often be the easier of the two to apply, because the calculation of the second derivative can be more trouble than it’s worth. The Second Derivative Test will be a useful shortcut when we’re working with polynomials and other simple functions.
Part IV

Integral Calculus

21 The Definite Integral

Introduction

Integral calculus is actually much older than differential calculus. The essential idea is simply

that of taking a difficult problem, breaking it down into smaller, more manageable pieces, and

then putting the results together (“re-integrating” them!). There is one application which is

particularly easy to visualize, and so we’ll use that as our motivation.

Consider a continuous, non-negative function f (x). How might we find the area between

its graph and the x-axis, over an interval [a, b]?

Figure 46: the graph of y = f(x) between x = a and x = b.

You should be able to see numerous ways in which we could find an approximation; we

could split the region up into rectangles and triangles, calculate the area of each one, and add

the results. That’s integration!

Rather than doing this haphazardly, though, we have a standard algorithm, which will

allow us to develop theorems and formulas (and to calculate such an area much more quickly).

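First, we divide the interval [a, b] into n subintervals of equal length $\Delta x$. We’ll label the endpoints of each interval as $x_i$:

Figure 47: the interval [a, b] marked with the points $x_0 = a,\ x_1,\ x_2,\ \dots,\ x_n = b$.

In each interval $[x_{i-1}, x_i]$, we pick some value $x_i^*$ at which to evaluate $f$, and use this value to define the height of a rectangle occupying that interval:

Figure 48: the graph of y = f(x) between x = a and x = b, with a rectangle of height $f(x_i^*)$ over the interval from $x = x_{i-1}$ to $x = x_i$.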

We calculate the area of this rectangle. We then repeat the procedure for each of the other subintervals, and thus obtain our approximation:

Figure 49: the region covered by rectangles with areas $A_1, A_2, \dots, A_6$.

$$\text{Area} \approx A_1 + A_2 + \dots + A_n$$

where $A_i = f(x_i^*)\,\Delta x$ (height × width).

That is,

$$\text{Area} \approx \sum_{i=1}^{n} f(x_i^*)\,\Delta x \qquad \text{(this is called a Riemann Sum).}$$

Example: Let’s find an approximate value for the area below the curve $y = x^2$, between $x = 0$ and $x = 3$. For simplicity, we’ll use just three rectangles (of width 1), and base their heights on the midpoints of each interval.

[Figure: the curve $y = x^2$ with three rectangles of width 1, whose heights are taken at $x_1^* = 0.5$, $x_2^* = 1.5$ and $x_3^* = 2.5$.]

This gives

$$\begin{aligned}
\text{Area} \approx A_1 + A_2 + A_3 &= f(x_1^*)\,\Delta x + f(x_2^*)\,\Delta x + f(x_3^*)\,\Delta x \\
&= f(0.5) \cdot 1 + f(1.5) \cdot 1 + f(2.5) \cdot 1 \\
&= \left(\frac{1}{2}\right)^2 + \left(\frac{3}{2}\right)^2 + \left(\frac{5}{2}\right)^2 \\
&= \frac{35}{4} = 8\tfrac{3}{4}.
\end{aligned}$$
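The three-rectangle computation can be reproduced with a short Python sketch. The function name `riemann_sum` and its `rule` parameter are illustrative choices of ours; the code simply evaluates $\sum f(x_i^*)\,\Delta x$ for a chosen sample point in each subinterval.

```python
def riemann_sum(f, a, b, n, rule="midpoint"):
    """Riemann sum of f on [a, b] with n equal subintervals.

    rule selects the sample point x_i*: "left", "right" or "midpoint".
    """
    dx = (b - a) / n
    total = 0.0
    for i in range(1, n + 1):
        left = a + (i - 1) * dx          # x_{i-1}
        if rule == "left":
            x_star = left
        elif rule == "right":
            x_star = left + dx           # x_i
        else:
            x_star = left + dx / 2       # midpoint of [x_{i-1}, x_i]
        total += f(x_star) * dx          # area of one rectangle
    return total


approx = riemann_sum(lambda x: x**2, 0.0, 3.0, 3)
print(approx)  # 8.75
```

Changing `rule` to `"left"` or `"right"` shows how much the choice of sample point matters when only a few rectangles are used.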

Now, you might be thinking “couldn’t we have used more rectangles?”, and of course you’re right; using more rectangles should give us a better approximation. In fact, we can take it one step further! The exact area should be given by the expression [19]

$$\lim_{n \to \infty} \sum_{i=1}^{n} f(x_i^*)\,\Delta x.$$

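Example: Let’s revisit our previous example, and try to find the exact area. Here’s what we do:

• Partition the interval [0, 3] into n intervals (of width $\Delta x = 3/n$):

Figure 50: the partition points $x_0 = 0,\ x_1 = \frac{3}{n},\ x_2 = \frac{6}{n},\ \dots,\ x_i = \frac{3i}{n},\ \dots,\ x_n = 3$.

• Choose $x_i^*$ to be $x_i$, say (that is, evaluate $f$ at the right endpoint of each interval).

• Let $n \to \infty$.

This gives

$$\begin{aligned}
\text{Area} &= \lim_{n \to \infty} \sum_{i=1}^{n} f(x_i^*)\,\Delta x \\
&= \lim_{n \to \infty} \sum_{i=1}^{n} \left(\frac{3i}{n}\right)^2 \left(\frac{3}{n}\right) \\
&= \lim_{n \to \infty} \sum_{i=1}^{n} \frac{27 i^2}{n^3} \\
&= \lim_{n \to \infty} \frac{27}{n^3} \sum_{i=1}^{n} i^2 \\
&= \lim_{n \to \infty} \frac{27}{n^3} \left[\frac{n(n + 1)(2n + 1)}{6}\right] \qquad \text{(this is a known formula for } \textstyle\sum i^2\text{)} \\
&= \lim_{n \to \infty} \frac{9}{2} \left(1 + \frac{1}{n}\right)\left(2 + \frac{1}{n}\right) \\
&= 9.
\end{aligned}$$

[19] In fact, we can view this as the definition of the area of an irregularly shaped region.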

You can see that even with just three rectangles, we didn’t do too badly. This is partly because

we chose to use the midpoint of each rectangle; as a rule of thumb this tends to work best

(this is the so-called “Midpoint Rule”).
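To see the limit at work numerically, here is a small Python sketch (the function names are ours). It computes the right-endpoint sums for $y = x^2$ on [0, 3] directly, and also through the closed form obtained from the formula for $\sum i^2$; both should approach 9 from above as n grows.

```python
def right_endpoint_sum(n):
    """Right-endpoint Riemann sum for y = x^2 on [0, 3] with n rectangles."""
    dx = 3.0 / n
    return sum((i * dx) ** 2 * dx for i in range(1, n + 1))


def closed_form(n):
    """The same sum after applying the formula for 1^2 + 2^2 + ... + n^2."""
    return 27.0 / n**3 * (n * (n + 1) * (2 * n + 1) / 6.0)


for n in (3, 10, 100, 1000):
    print(n, right_endpoint_sum(n), closed_form(n))
```

With n = 3 the right-endpoint sum is 14, noticeably worse than the midpoint value 8.75 found earlier, which is consistent with the rule of thumb above.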

Definition of the Definite Integral

Generalizing the idea used above leads to this:

Definition:

Let $f$ be continuous on the interval [a, b]. Partition [a, b] into n subintervals of equal length $\Delta x = \dfrac{b - a}{n}$. Label the endpoints of the subintervals $x_i$, for $i = 0, \dots, n$ (so that the $i$th interval is $[x_{i-1}, x_i]$, $i = 1, \dots, n$), and in each interval, select a point $x_i^*$. The definite integral of $f$ from a to b is

$$\int_a^b f(x)\,dx = \lim_{n \to \infty} \sum_{i=1}^{n} f(x_i^*)\,\Delta x.$$

A few comments need to be made here:

1 The symbol “$\int$” is an elongated “s” for “sum”. This should serve as an eternal reminder of the definition; it may be helpful to think of the integral as a kind of sum, of infinitely many infinitesimal quantities.

2 There are standard terms for the various elements within the integral. The “dx” is called the differential (as we’ve already discussed), the function $f(x)$ is called the integrand, and the numbers a and b are called the limits of integration (although this usage of the word limit is not consistent with the rest of mathematics; boundaries might be a more appropriate word).

3 Our definition says nothing about areas! Instead, the integral can represent anything that the product $f(x)\,\Delta x$ can represent. If you’re ever struggling to interpret an integral, try looking at the units, and remember that the differential is part of the integral!

E.g. If t is time, and $f(t)$ is the velocity of an object travelling in a straight line, then $\int_a^b f(t)\,dt$ has units of (m/s)·s = m. It’s the distance travelled! (More precisely, it is the net displacement, which equals the distance travelled as long as the velocity does not change sign.)

4 Because $\int_a^b f(x)\,dx$ is a number, the x is often called a “dummy” variable. It is only used for the calculation process, and so any other variable can be substituted, whenever we wish it:

$$\int_a^b f(x)\,dx = \int_a^b f(t)\,dt = \int_a^b f(\theta)\,d\theta = \int_a^b f(\gamma)\,d\gamma = \dots$$

There is no difference in the meanings of these expressions, whatsoever!

Definite integrals have a number of useful properties. We won’t prove these here, but if

you think of the application to area, most of them should make sense:

If $f$ and $g$ are both integrable on [a, b], and k is a constant, then

1 $\displaystyle\int_a^b k\,dx = k(b - a)$

Figure 51: the rectangle below the line y = k, between x = a and x = b.

2 $\displaystyle\int_a^a f(x)\,dx = 0$ (The region has no width.)

3 $\displaystyle\int_a^b [f(x) \pm g(x)]\,dx = \int_a^b f(x)\,dx \pm \int_a^b g(x)\,dx$

4 $\displaystyle\int_a^b k f(x)\,dx = k \int_a^b f(x)\,dx$

Note: properties 3 and 4 make integration a linear process, like differentiation.

5 $\displaystyle\int_b^a f(x)\,dx = -\int_a^b f(x)\,dx$ $\left(\text{because } \Delta x = \dfrac{a - b}{n} = -\dfrac{b - a}{n}\right)$

This property is hard to make sense of in terms of areas, but it will occasionally be helpful; we can reverse the limits of integration at any time we wish, and just multiply by −1 to compensate.

6 $\displaystyle\int_a^b f(x)\,dx = \int_a^c f(x)\,dx + \int_c^b f(x)\,dx$

When we’re dealing with areas, this statement just says that we can split areas into two:

Figure 52: the region between x = a and x = b, split at x = c.

However, we can easily use property 5 to show that this rule works even if the number c is outside of the interval [a, b]!

7 It is permissible to integrate inequalities. That is,

$$\text{if } f(x) \ge g(x) \text{ for } x \in [a, b], \text{ then } \int_a^b f(x)\,dx \ge \int_a^b g(x)\,dx.$$

Figure 53: the graphs of y = f(x) and y = g(x) between x = a and x = b.
The Connection between Integral and Differential Calculus

Consider a function f (x), continuous and positive on the interval [a, b]. Let A (x) be the area

below the curve between x = a and an arbitrary point x in the interval [a, b]. Notice that the

rate of change of A (x) must be related to the magnitude of f (x)!

To be precise, if we add a thin strip of area to the right, of width h, then the area of the

strip must be A (x + h) − A (x).

Figure 54: the area A(x) below y = f(x) from a to x, with a thin strip added between x and x + h.

It must also be approximately equal to $h f(x)$ (since it is nearly rectangular). For small values of h, then, we have

$$A(x + h) - A(x) \approx h f(x),$$

and so

$$\frac{A(x + h) - A(x)}{h} \approx f(x).$$

We can make the approximation as accurate as we wish by letting h approach 0, and so

$$\lim_{h \to 0} \frac{A(x + h) - A(x)}{h} = f(x),$$

that is, $A'(x) = f(x)$.

The rate of change of A (x) isn’t just related to f (x); it’s equal to it!
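A quick numerical experiment makes this plausible. In the Python sketch below, the area function is approximated by a midpoint Riemann sum, and the difference quotient of that area is compared with the height of the curve. The function $f(x) = x^2 + 1$, the interval, and the step sizes are arbitrary choices made only for illustration.

```python
def area(f, a, x, n=2000):
    """Approximate A(x), the area under f from a to x, by a midpoint sum."""
    dx = (x - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx


def f(x):
    return x**2 + 1.0   # continuous and positive, as the argument requires


a, x, h = 0.0, 1.0, 1e-3
quotient = (area(f, a, x + h) - area(f, a, x)) / h
print(quotient, f(x))   # the two numbers should be close
```

Shrinking h brings the quotient closer to f(x), until rounding error in the subtraction starts to dominate.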

Example: As a very simple demonstration, suppose $f(x) = x$ (and take $a = 0$). The region below the line $y = x$ is always a triangle, with area $A(x) = \frac{1}{2} x^2$. Sure enough, $\dfrac{dA}{dx} = x$.

Figure 55: the line y = x and the triangular region below it, with area $A = \frac{1}{2} x^2$.

22 The Fundamental Theorem of Calculus

22.1 The FTC Part I

The relationship found above is the single most important discovery in calculus. To generalize it (so that we’re not limited to discussing areas), note that our area function A(x) could be expressed as $\int_a^x f(x)\,dx$. Actually, this reveals that we’ve used x for two different purposes (it’s both the name of the axis and the name of an arbitrary point on the axis). To avoid this confusion, recall our comments about the “dummy” variable; we can rewrite this as $\int_a^x f(t)\,dt$. We then obtain this:

The Fundamental Theorem of Calculus (Part I)

If $f(x)$ is continuous on [a, b], then the function $g(x)$ defined by

$$g(x) = \int_a^x f(t)\,dt, \qquad \text{for } x \in [a, b]$$

is differentiable on (a, b), and its derivative is $g'(x) = f(x)$.

This could be stated more concisely as a new differentiation rule:

$$\frac{d}{dx} \int_a^x f(t)\,dt = f(x).$$

Comments:

1 We’ve dropped the restriction that f (x) ≥ 0, and made no reference to areas.

2 The FTC can be paraphrased as “differentiation is the inverse of integration”.

$$f \;\xrightarrow{\ \int_a^x\ }\; \int_a^x f(t)\,dt \;\xrightarrow{\ \frac{d}{dx}\ }\; f$$

The reverse of this is not quite true, as we’ll discuss shortly.

3 We’ve just said that the FTC can be thought of as a differentiation rule, but it may be even more helpful to think of it in reverse; the function we’ve called $g(x)$ is an antiderivative of $f(x)$. We will often be able to find simpler forms of antiderivatives; for example, we’ve already established that $\int_0^x t\,dt = \frac{1}{2} x^2$. However, it is important to realize that simple forms of antiderivatives do not always exist. In some cases we will have no alternative but to define the functions we need as integrals.

Examples:

a) The error function is defined as

$$\operatorname{erf}(x) = \frac{2}{\sqrt{\pi}} \int_0^x e^{-t^2}\,dt.$$

You can see by applying the FTC that $\dfrac{d}{dx}(\operatorname{erf}(x)) = \dfrac{2}{\sqrt{\pi}} e^{-x^2}$. We’ll discuss numerous techniques for evaluating integrals, but none of them will ever help us to simplify the expression $\int_0^x e^{-t^2}\,dt$; it cannot be expressed in terms of elementary functions! All we can do is add the error function to our list of functions; we’ll have to accept that it’s defined as an integral, and we’ll only ever be able to evaluate it approximately, using numerical methods.

This function appears often in statistics and the study of fluids. The factor $\frac{2}{\sqrt{\pi}}$ is included because $\lim_{x \to \infty} \int_0^x e^{-t^2}\,dt = \frac{\sqrt{\pi}}{2}$, so this way we have $\lim_{x \to \infty} \operatorname{erf}(x) = 1$.

b) Similarly, there is no way to simplify the integral $\int_0^x \sin(t^2)\,dt$. The best we’ll be able to do is express it in terms of the Fresnel Sine Function, which is defined as

$$S(x) = \int_0^x \sin\left(\frac{\pi t^2}{2}\right) dt.$$

This arises in the study of optics, and in other applications.
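As an illustration of what “evaluating approximately, using numerical methods” can look like, the Python sketch below approximates the defining integral of the error function with a midpoint Riemann sum and compares it with `math.erf` from the standard library. The function name `erf_numeric` and the number of rectangles are our own choices.

```python
import math


def erf_numeric(x, n=1000):
    """Approximate erf(x) = (2/sqrt(pi)) * integral of exp(-t^2) from 0 to x."""
    dx = x / n
    # Midpoint rule: sample exp(-t^2) at the centre of each subinterval.
    total = sum(math.exp(-((i + 0.5) * dx) ** 2) for i in range(n)) * dx
    return 2.0 / math.sqrt(math.pi) * total


for x in (0.5, 1.0, 2.0):
    print(x, erf_numeric(x), math.erf(x))
```

The same approach would work for the Fresnel Sine Function by replacing the integrand with $\sin(\pi t^2 / 2)$ and dropping the leading factor.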

22.2 The FTC Part II

We commented above that $g(x) = \int_a^x f(t)\,dt$ is an antiderivative of $f(x)$. However it’s actually infinitely many of them... a different one for each value of a. This shouldn’t be surprising; we know in general that $\dfrac{d}{dx}(f(x) + C) = f'(x)$ for any value of C, so if we work backwards there should be infinitely many antiderivatives.

Of course, we can see that the antiderivatives must be related: if $g_1(x)$ and $g_2(x)$ are both antiderivatives of $f(x)$, then $g_1$ and $g_2$ can differ only by a constant.

(Why? Because $(g_1 - g_2)' = g_1' - g_2' = f - f = 0$, so $g_1 - g_2 = \text{constant}$.)

It may help to state this as a theorem:

If $F(x)$ is an antiderivative of $f(x)$, then every antiderivative of $f(x)$ can be expressed as $F(x) + C$, for some constant C.

This implies the following:

Suppose we can find a specific antiderivative for $f(x)$; call it $F(x)$ (for example, given $f(x) = x^2$ we might be able to guess that $F(x) = \frac{1}{3} x^3$). No matter what the value of a is, we know that if $g(x) = \int_a^x f(t)\,dt$, then $g(x) = F(x) + C$, for some value of C.

That is,

$$\int_a^x f(t)\,dt = F(x) + C.$$

If we set $x = a$ we find that $0 = F(a) + C$, so $C = -F(a)$:

$$\int_a^x f(t)\,dt = F(x) - F(a).$$

Now, if we evaluate this at an arbitrary number b, we find this:

The Fundamental Theorem of Calculus, Part II

If $F(x)$ is any antiderivative of $f(x)$, then

$$\int_a^b f(t)\,dt = F(b) - F(a).$$

This is a remarkable result. It says, essentially, that to sum up all of the values of $f(x)$ between a and b, all we need to do is evaluate an antiderivative of $f(x)$ at just two points!

To use this, of course, we’ll need to be able to find antiderivatives, so that will be our focus for the next several lectures.
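To see how much work the theorem saves, the following Python sketch compares a Riemann sum with ten thousand rectangles against the two-point evaluation $F(b) - F(a)$. The choice of $f(x) = \cos x$ on $[0, \pi/2]$, with antiderivative $F(x) = \sin x$, is ours and is made only for illustration.

```python
import math


def midpoint_sum(f, a, b, n=10000):
    """Midpoint Riemann sum of f on [a, b] with n rectangles."""
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx


a, b = 0.0, math.pi / 2
riemann = midpoint_sum(math.cos, a, b)   # ten thousand function evaluations
ftc = math.sin(b) - math.sin(a)          # two evaluations of an antiderivative
print(riemann, ftc)
```

Both numbers should be very close to 1, but only the second is exact (up to floating-point rounding).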

Comment on Notation: Since we may encounter complicated functions, we will often write the difference $F(b) - F(a)$ as $F(x)\,\big|_a^b$, to save ourselves having to write F out twice.

23 Indefinite Integrals

Thanks to the Fundamental Theorem of Calculus, the first step towards evaluating an integral is finding an antiderivative (unless we’re going to use numerical methods instead of the FTC). Because of this, we often use the word “integral” as a synonym for “antiderivative”, even though they are, in principle, completely different concepts! This has become part of universally-accepted terminology, and we even use the integral sign (without limits) to represent antiderivatives:

Definition:

The indefinite integral of $f(x)$ is the collection of all possible antiderivatives. We denote it by $\int f(x)\,dx$.

If we happen to know one antiderivative $F(x)$, then of course this allows us to write

$$\int f(x)\,dx = F(x) + C.$$

Now, how might we go about finding antiderivatives? Well, for a handful of expressions

we can simply reverse our differentiation rules, so let’s start with that:

1 Since $\dfrac{d}{dx}(x^n) = n x^{n-1}$, we know that

$$\int x^n\,dx = \frac{1}{n + 1} x^{n+1} + C$$

(the differentiation rule is “multiply by the original exponent, then reduce the exponent by one”, so the reverse is “increase the exponent by 1, then divide by the new exponent”).

2 Rule 1 breaks down if $n = -1$, so we need a special case. Recall that $\dfrac{d}{dx}(\ln x) = \dfrac{1}{x}$, so we may claim that

$$\int \frac{1}{x}\,dx = \ln x + C.$$

This is not quite complete, though, because it only makes sense if x is positive. What if x is negative? Well, for $x < 0$ the function $\ln x$ is undefined, but $\ln(-x)$ is ok. In fact,

$$\frac{d}{dx}(\ln(-x)) = \frac{1}{-x} \cdot (-1) = \frac{1}{x},$$

so we have

$$\int \frac{1}{x}\,dx = \begin{cases} \ln(x) + C, & \text{if } x > 0 \\ \ln(-x) + C, & \text{if } x < 0. \end{cases}$$

We can state these two rules more concisely this way:

$$\int \frac{1}{x}\,dx = \ln|x| + C.$$

3 Since $\dfrac{d}{dx}(e^x) = e^x$, we know that

$$\int e^x\,dx = e^x + C.$$

More generally, since $\dfrac{d}{dx}(a^x) = (\ln a)\,a^x$, we know that

$$\int a^x\,dx = \frac{a^x}{\ln a} + C.$$

4 Considering our list of rules for differentiating the trigonometric functions, we can state the following:

$$\int \cos x\,dx = \sin x + C$$

$$\int \sin x\,dx = -\cos x + C$$

$$\int \sec^2 x\,dx = \tan x + C$$

$$\int \csc^2 x\,dx = -\cot x + C$$

$$\int \sec x \tan x\,dx = \sec x + C$$

$$\int \csc x \cot x\,dx = -\csc x + C$$

5 Similarly,

$$\int \cosh x\,dx = \sinh x + C$$

$$\int \sinh x\,dx = \cosh x + C$$

etc.

6
$$\int \frac{1}{1 + x^2}\,dx = \tan^{-1} x + C$$

$$\int \frac{1}{\sqrt{1 - x^2}}\,dx = \sin^{-1} x + C$$

We might also state that $\displaystyle\int -\frac{1}{\sqrt{1 - x^2}}\,dx = \cos^{-1} x + C$, but this isn’t really adding anything for us, except that it reveals something about the inverse trigonometric functions. [20]

Unfortunately, those are about all of the actual rules available to us! Over the next few lectures we’ll introduce several “techniques of integration”, but all any of these can do is convert integrals into new forms... and the goal will be to obtain one of the integrals we already know. For this reason the list above will be essential; you MUST know every formula we’ve written above.

[20] If $\displaystyle\int \frac{1}{\sqrt{1 - x^2}}\,dx = \sin^{-1} x + C_1 = -\cos^{-1} x + C_2$, then $\sin^{-1} x$ and $(-\cos^{-1} x)$ must only differ by a constant:
$$\sin^{-1} x = -\cos^{-1} x + K, \qquad \text{or} \qquad \sin^{-1} x + \cos^{-1} x = K.$$
Setting $x = 0$ tells us that $K = \pi/2$, so we’ve discovered an identity:
$$\sin^{-1} x + \cos^{-1} x = \frac{\pi}{2}.$$
Also, keep in mind that some integration problems are actually impossible! As we’ve already mentioned, $e^{-x^2}$ and $\sin(x^2)$ do not possess “nice” antiderivatives (every continuous function possesses an antiderivative, but it may not be possible to express that antiderivative in terms of the functions we’re accustomed to using).

Examples

With a bit of thought, we can apply the rules above to functions which don’t quite match the

formulas exactly.
1. Consider $\int (2 + x^2)^2\,dx$. All we need to do is expand the integrand; we have

$$\int (4 + 4x^2 + x^4)\,dx = 4x + \frac{4}{3} x^3 + \frac{1}{5} x^5 + C.$$

2. Consider $\int \cos(2x)\,dx$. We might guess that this should involve $\sin(2x)$, but checking this gives $\dfrac{d}{dx} \sin(2x) = 2\cos(2x)$. All we need to do here is adjust for the 2:

$$\int \cos(2x)\,dx = \frac{1}{2} \sin(2x) + C.$$

This is sometimes referred to as the “guess and fix-up method”. If you’re going to guess, though, make sure you check your answer! Here we can indeed see that

$$\frac{d}{dx}\left(\frac{1}{2} \sin(2x) + C\right) = \cos(2x).$$

3. Consider $\int \cos^2 x\,dx$. Your first thought might be that this should involve $\sin^2 x$... but if you check that you’ll see that it isn’t even close! In fact there’s a standard trick for this; we use the double-angle identity:

$$\begin{aligned}
\int \cos^2 x\,dx &= \int \frac{1 + \cos 2x}{2}\,dx \\
&= \frac{1}{2} \int dx + \frac{1}{2} \int \cos(2x)\,dx \\
&= \frac{1}{2} x + \frac{1}{4} \sin(2x) + C.
\end{aligned}$$

4. Consider $\displaystyle\int \frac{2 + e^{-x}}{e^x}\,dx$. We can expand this:

$$\begin{aligned}
\int \frac{2 + e^{-x}}{e^x}\,dx &= \int \left(2e^{-x} + e^{-2x}\right) dx = 2 \int e^{-x}\,dx + \int e^{-2x}\,dx \\
&= 2\left(-e^{-x} + C_1\right) + \left(-\frac{1}{2} e^{-2x} + C_2\right) \qquad \text{(guessing and fixing)} \\
&= -2e^{-x} - \frac{1}{2} e^{-2x} + C.
\end{aligned}$$
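Since checking a guessed antiderivative only requires differentiation, the check can also be done numerically. The Python sketch below (with a helper `derivative` of our own, based on a central difference) differentiates the answers to examples 2, 3 and 4 at a few sample points and compares the results with the original integrands. It is a sanity check, not a proof.

```python
import math


def derivative(F, x, h=1e-5):
    """Central-difference approximation to F'(x)."""
    return (F(x + h) - F(x - h)) / (2 * h)


# Each pair is (proposed antiderivative F, integrand f).
pairs = [
    (lambda x: 0.5 * math.sin(2 * x),
     lambda x: math.cos(2 * x)),
    (lambda x: 0.5 * x + 0.25 * math.sin(2 * x),
     lambda x: math.cos(x) ** 2),
    (lambda x: -2 * math.exp(-x) - 0.5 * math.exp(-2 * x),
     lambda x: (2 + math.exp(-x)) / math.exp(x)),
]

max_error = max(
    abs(derivative(F, x) - f(x))
    for F, f in pairs
    for x in (-1.0, 0.3, 2.0)
)
print(max_error)  # should be tiny if every F' matches its f
```

Replacing $\frac{1}{2}\sin(2x)$ with the unadjusted guess $\sin(2x)$ in the first pair makes the error jump to order 1, which is exactly the mistake the "fix-up" step is meant to catch.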
2

24 The Method of Substitution (a.k.a. the Change-of-Variable Technique)

Inverting the rules for specific functions was easy enough, but can we invert the Product,

Quotient, or Chain Rules?

We won’t worry about the Quotient Rule, since it is never required (we can always write $\dfrac{f(x)}{g(x)}$ as $f(x)\,[g(x)]^{-1}$ and apply the Product and Chain Rules). Of the other two, the easier one to deal with is the Chain Rule, so let’s start there:

The original rule is this:

$$\frac{d}{dx}\big(f(g(x))\big) = f'(g(x))\,g'(x).$$

This means that

$$\int f'(g(x))\,g'(x)\,dx = f(g(x)) + C.$$

This structure will usually be hard to recognize, but we may be able to break it down into two steps. Notice that the integral contains a function ($g(x)$) “inside” another function, and the derivative of this “inner” function also appears in the integral. If we happen to notice this, we can try introducing a different variable; we make the substitution $u = g(x)$. If we differentiate this substitution, we find that

$$\frac{du}{dx} = g'(x),$$

which we could write in differential form as

$$du = g'(x)\,dx.$$

This allows us to rewrite the entire integral in terms of our new variable:

$$\int f'(g(x))\,g'(x)\,dx = \int f'(u)\,du.$$

If we’ve been lucky enough to encounter a problem which is well suited to this idea, and clever enough to spot the substitution, then we might now recognize the function $f'(u)$ as being one of the functions from our list of known antidifferentiation formulas, allowing us to conclude that

$$\int f'(g(x))\,g'(x)\,dx = \int f'(u)\,du = f(u) + C = f(g(x)) + C.$$

A few examples:

1 Consider the function $f(x) = \ln(3x^2 + 4)$. To differentiate this, we would identify $u = 3x^2 + 4$, and differentiate it to obtain $\dfrac{du}{dx} = 6x$ (even though we would probably not write this out explicitly). This would give us $f'(x) = \dfrac{6x}{3x^2 + 4}$.

So, what would we do if we encountered the integral $\displaystyle\int \frac{6x}{3x^2 + 4}\,dx$? We would again identify $u = 3x^2 + 4$, and differentiate it to obtain $\dfrac{du}{dx} = 6x$. In differential form this is $du = 6x\,dx$, and this would allow us to write the integral as

$$\int \frac{6x}{3x^2 + 4}\,dx = \int \frac{du}{u}.$$

This is from our list, and so we can complete the task.

$$\begin{aligned}
\int \frac{6x}{3x^2 + 4}\,dx &= \int \frac{du}{u} \\
&= \ln|u| + C \\
&= \ln(3x^2 + 4) + C
\end{aligned}$$

(we can drop the absolute values here because we know that $3x^2 + 4$ is always positive).
2 Consider $\displaystyle\int \frac{\sin(3 \ln x)}{x}\,dx$. There’s an obvious choice here; $3 \ln x$ is sitting “inside” another function, and its derivative is also in the integral... or at least a multiple of its derivative is there.

We let $u = 3 \ln x$, which means that $du = \dfrac{3}{x}\,dx$, which we could rearrange as $\dfrac{dx}{x} = \dfrac{du}{3}$. Now

$$\begin{aligned}
\int \frac{\sin(3 \ln x)}{x}\,dx &= \int \sin(3 \ln x) \left(\frac{dx}{x}\right) \\
&= \int \sin(u) \left(\frac{du}{3}\right) \\
&= \frac{1}{3} \int \sin u\,du \\
&= -\frac{1}{3} \cos u + C \\
&= -\frac{1}{3} \cos(3 \ln x) + C.
\end{aligned}$$

3 Here’s a simple one for which we don’t yet have a formula: what’s $\int \tan x\,dx$? We need to rewrite this first:

$$\int \tan x\,dx = \int \frac{\sin x}{\cos x}\,dx.$$

Now we let $u = \cos x$, because this choice gives us $du = -\sin x\,dx$, which matches the numerator (except for a negative sign). Now,

$$\begin{aligned}
\int \tan x\,dx = \int \frac{\sin x}{\cos x}\,dx &= -\int \frac{1}{u}\,du \\
&= -\ln|u| + C \\
&= -\ln|\cos x| + C.
\end{aligned}$$

4 How might we tackle $\int e^x \sqrt{1 + e^x}\,dx$? We let $u = 1 + e^x$, which gives $du = e^x\,dx$. Then

$$\begin{aligned}
\int e^x \sqrt{1 + e^x}\,dx &= \int \sqrt{u}\,du \\
&= \frac{2}{3} u^{3/2} + C \\
&= \frac{2}{3} (1 + e^x)^{3/2} + C.
\end{aligned}$$

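5 Sometimes the method can be made to work even when it doesn’t initially look hopeful. For example, consider $\int x^3 \sqrt{1 + x^2}\,dx$. We see that we have an “inner” function $1 + x^2$, but its derivative is $2x$, which does not match the $x^3$ constituting the rest of the integrand. We can still try the substitution, though:

$$u = 1 + x^2 \implies du = 2x\,dx$$

We have an “$x\,dx$”, so let’s separate that within our integral:

$$\int x^3 \sqrt{1 + x^2}\,dx = \int x^2 \sqrt{1 + x^2}\,(x\,dx)$$

A glance back at our substitution tells us that we can replace $x^2$ with $u - 1$, and so

$$\begin{aligned}
\int x^3 \sqrt{1 + x^2}\,dx &= \int x^2 \sqrt{1 + x^2}\,(x\,dx) \\
&= \int (u - 1) \sqrt{u}\,\frac{du}{2} \\
&= \frac{1}{2} \int \left(u^{3/2} - u^{1/2}\right) du \\
&= \frac{1}{2} \left(\frac{2}{5} u^{5/2} - \frac{2}{3} u^{3/2}\right) + C \\
&= \frac{1}{5} \left(1 + x^2\right)^{5/2} - \frac{1}{3} \left(1 + x^2\right)^{3/2} + C.
\end{aligned}$$

If we’re unsure of ourselves, we always have the option of checking our results by differentiation:

$$\begin{aligned}
\text{If } y &= \frac{1}{5} \left(1 + x^2\right)^{5/2} - \frac{1}{3} \left(1 + x^2\right)^{3/2} + C \\
\text{then } \frac{dy}{dx} &= \frac{1}{2} \left(1 + x^2\right)^{3/2} \cdot 2x - \frac{1}{2} \left(1 + x^2\right)^{1/2} \cdot 2x \\
&= x \left(1 + x^2\right)^{3/2} - x \left(1 + x^2\right)^{1/2} \\
&= x \sqrt{1 + x^2} \left[\left(1 + x^2\right) - 1\right] \\
&= x^3 \sqrt{1 + x^2}, \text{ as it should be!}
\end{aligned}$$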
6 Sometimes we will have to rework the integral before a useful substitution becomes apparent. This may require some creativity. For example, consider $\displaystyle\int \frac{dx}{1 + e^x}$. Your first thought might be to try letting $u = 1 + e^x$, but this fails (because differentiation yields $e^x\,dx$, not just $dx$).

There are several tricks that might work here. Here’s one: multiply the numerator and denominator by $e^{-x}$:

$$\int \frac{dx}{1 + e^x} = \int \frac{e^{-x}}{e^{-x} + 1}\,dx.$$

Now if we let $u = e^{-x} + 1$, so $du = -e^{-x}\,dx$, we meet with success!

$$\begin{aligned}
\int \frac{dx}{1 + e^x} &= \int \frac{e^{-x}}{e^{-x} + 1}\,dx \\
&= -\int \frac{du}{u} \\
&= -\ln|u| + C \\
&= -\ln\left(e^{-x} + 1\right) + C.
\end{aligned}$$

Note that the result can be written in several different ways: $-\ln\left(e^{-x} + 1\right) = \ln\left(\dfrac{1}{e^{-x} + 1}\right) = \ln\left(\dfrac{e^x}{1 + e^x}\right) = x - \ln(1 + e^x)$. This is something to keep in mind if you’re comparing answers to practice problems with your classmates, or with the answers in the back of the textbook, or with answers given by software; if your answers don’t match exactly, it’s possible that they’re equal, but in a different form. They may even differ by a constant!
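When two answers to the same substitution problem look different, a quick numerical comparison can tell you whether they really are the same function. The Python sketch below evaluates three of the equivalent forms of $\int \frac{dx}{1 + e^x}$ at a few points, and also checks by a central difference (our own helper `derivative`) that the derivative matches the integrand. The function names are purely illustrative.

```python
import math


def form_a(x):
    return -math.log(math.exp(-x) + 1.0)


def form_b(x):
    return math.log(1.0 / (math.exp(-x) + 1.0))


def form_c(x):
    return x - math.log(1.0 + math.exp(x))


def derivative(F, x, h=1e-5):
    """Central-difference approximation to F'(x)."""
    return (F(x + h) - F(x - h)) / (2 * h)


for x in (-2.0, 0.0, 1.5):
    integrand = 1.0 / (1.0 + math.exp(x))
    print(x, form_a(x), form_b(x), form_c(x), derivative(form_a, x), integrand)
```

If two candidate answers differ by the same nonzero amount at every sample point, they are both valid antiderivatives that differ only by a constant.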
25 Integration by Parts (IBP)

We have just one more differentiation rule to reverse. Recall the Product Rule:

$$\frac{d}{dx}\big(u(x)\,v(x)\big) = \frac{du}{dx} \cdot v + u \cdot \frac{dv}{dx}$$

(we’re using u and v instead of f and g here simply because we’re working towards a formula, in which those letters have become the standard).

Inverting the rule gives

$$\int \left(v\,\frac{du}{dx} + u\,\frac{dv}{dx}\right) dx = uv + C$$

(omitting the explicit dependence on x for simplicity).

If we separate the integral into two and simplify the differentials, this takes on the form

$$\int v\,du + \int u\,dv = uv + C.$$

Now, it’s going to be hard to recognize this pattern in a pair of integrals; we’re normally going to be looking at one integral at a time. So, we isolate one of the integrals. This gives the integration by parts formula:

$$\int u\,dv = uv - \int v\,du.$$

Note that we’ve omitted the “+C”. This is because we have an indefinite integral on each side of the equation - each one of them already contains an arbitrary constant!

This formula probably looks a bit strange, but remember that both u and v are supposed to be functions of x. The left-hand side is really $\displaystyle\int u(x)\,\frac{dv}{dx}\,dx = \int u(x)\,v'(x)\,dx$, and the right-hand side contains $u(x)$, $v(x)$, and $\displaystyle\int v(x)\,u'(x)\,dx$. Therefore, to use the formula, we need to break our integrand up into a function $u(x)$ and a function $v'(x)$. If we can integrate $v'(x)$ to find $v(x)$, then the integration by parts formula will enable us to replace the original integration problem with a new one. If we’re lucky, the new one will be a problem we can solve!
Examples:

1 Consider $\int x e^x\,dx$. First observe that there really aren’t any useful substitutions available here, and we really can’t simplify the integrand in any way. However, we do have a product of functions, so IBP is worth trying. So, we want to identify u and dv such that $\int x e^x\,dx$ will become $\int u\,dv$. There seems to be more than one way to proceed, but we’ll let

$$u = x \quad \text{and} \quad dv = e^x\,dx.$$

This means

$$du = dx \quad \text{and} \quad v = e^x + C.$$

The IBP formula then gives us

$$\begin{aligned}
\int x e^x\,dx = \int u\,dv &= uv - \int v\,du \\
&= x\left(e^x + C\right) - \int \left(e^x + C\right) dx \\
&= x e^x + Cx - \left(e^x + Cx + K\right) \\
&= x e^x - e^x + K
\end{aligned}$$

(in the last line we have relabelled $-K$ as $K$, since it is arbitrary).

Notice that the arbitrary constant from v (the “C”) cancelled out. This will always happen, so henceforth when we integrate dv to find v we will neglect the constant.

It isn’t always going to be clear how we should pick u and dv, or that we should be using IBP at all; it will take practice and experience. Two principles may help:

• We must be able to integrate dv.

• The goal is to obtain a simpler integral than we start with, so it may help to pick a u which gets simpler when differentiated.

In the above example, we could have picked

$$u = e^x \quad \text{and} \quad dv = x\,dx$$

$$\implies du = e^x\,dx, \quad v = \frac{1}{2} x^2.$$

This gives

$$\int x e^x\,dx = \frac{1}{2} x^2 e^x - \int \frac{1}{2} x^2 e^x\,dx.$$

This is entirely correct... but we’ve arrived at a more complicated integral than we started with! When there are two obvious ways of proceeding, we’ll often find that one of them leads to a simpler integral, while the other leads in the other direction.
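2 Many problems will require repeated applications of IBP. For example, consider $\int x^2 e^x\,dx$. If we let

$$u = x^2 \quad \text{and} \quad dv = e^x\,dx$$

$$\text{so } du = 2x\,dx, \quad v = e^x$$

then

$$\begin{aligned}
\int x^2 e^x\,dx &= uv - \int v\,du \\
&= x^2 e^x - 2 \int x e^x\,dx \\
&= x^2 e^x - 2x e^x + 2e^x + C \qquad \text{(using the result of example 1)}
\end{aligned}$$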

Z
3 Here’s a famous example with an odd twist: what do we do with ex sin x dx? There

are two obvious choices for u again, but in this case it turns out that both work equally

well. Let’s try this:

u = ex , dv = sin x dx

du = ex dx, v = − cos x dx.

This gives
Z Z
x
uv − vdu = −e cos x + ex cos x dx,

and now we have a second integral, similar to the original one. We try the same strategy

again:

u = ex , dv = cos x dx

121
du = ex dx, v = sin x

and this gives


Z Z
x x
e cos x dx = e sin x − ex sin x dx.

It looks as though we’re back to where we started. However, if we carefully write down

what we’ve got so far, we discover that we’re not actually stuck at all; we have

$$\int e^x \sin x\,dx = -e^x \cos x + e^x \sin x - \int e^x \sin x\,dx$$

and we can rewrite this as

$$2\int e^x \sin x\,dx = e^x(\sin x - \cos x) + C_1$$

(adding “$+C_1$” because we now have an indefinite integral on only one side of the equation), so

$$\int e^x \sin x\,dx = \tfrac{1}{2}e^x(\sin x - \cos x) + C_2$$

(where $C_2 = \tfrac{1}{2}C_1$; relabelling the constant is a good habit).
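One way to gain confidence in a result like this is to test it on a definite integral. The sketch below (an illustration, with the interval $[0, \pi]$ and the helper `simpson` chosen as assumptions) compares a numerical estimate of $\int_0^\pi e^x \sin x\,dx$ with the value predicted by $\tfrac{1}{2}e^x(\sin x - \cos x)$, namely $\tfrac{1}{2}(e^\pi + 1)$.

```python
import math

def simpson(f, a, b, n=1000):
    """Composite Simpson's rule on [a, b] with n subintervals (n even)."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def F(x):
    # Antiderivative obtained by applying integration by parts twice.
    return 0.5 * math.exp(x) * (math.sin(x) - math.cos(x))

numeric = simpson(lambda x: math.exp(x) * math.sin(x), 0.0, math.pi)
predicted = F(math.pi) - F(0.0)  # (1/2) e^pi + 1/2
print(numeric, predicted)
```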

4 Normally, we’ll consider using IBP when our integrand is a product of two functions, but there is a handful of exceptions. For example, consider $\int \ln x\,dx$. If we let

$$u = \ln x, \quad dv = dx$$

$$du = \frac{1}{x}\,dx, \quad v = x$$

then we find that

$$\begin{aligned}
\int \ln x\,dx &= uv - \int v\,du \\
&= x\ln x - \int dx \\
&= x\ln x - x + C.
\end{aligned}$$

The same trick works on the inverse trigonometric functions $\sin^{-1}x$ and $\tan^{-1}x$, with a bit more effort.
5 It will occasionally be useful to use both the Change-of-Variable technique and IBP in the same problem. Consider $\int \sinh\sqrt{x}\,dx$. At first glance neither method seems hopeful. However, we can always try a substitution, and it will transform the integral into a new form, so let’s see what happens if we let

$$t = \sqrt{x}.$$

Then

$$dt = \frac{1}{2\sqrt{x}}\,dx,$$

which we can rearrange as

$$dx = 2\sqrt{x}\,dt = 2t\,dt.$$

This converts $\int \sinh\sqrt{x}\,dx$ into $\int 2t\sinh t\,dt$, and now we can try integration by parts!

Let

$$u = 2t, \quad dv = \sinh t\,dt$$

$$du = 2\,dt, \quad v = \cosh t.$$

Then

$$\begin{aligned}
\int \sinh\sqrt{x}\,dx &= \int 2t\sinh t\,dt \\
&= 2t\cosh t - 2\int \cosh t\,dt \\
&= 2t\cosh t - 2\sinh t + C \\
&= 2\sqrt{x}\cosh\sqrt{x} - 2\sinh\sqrt{x} + C.
\end{aligned}$$

Substitution and IBP in Definite Integrals

If we’re working with a definite integral, and we decide that we need either to make a substi-

tution or to use integration by parts, we could work on the indefinite integral in the margin,

and then apply the Fundamental Theorem of Calculus directly. However, we can be slightly

more efficient.

Example: Consider $\int_2^3 x e^{x^2}\,dx$. If we let

$$u = x^2, \quad\text{then}\quad du = 2x\,dx,$$

and we can express the limits of integration in terms of the new variable as well. If $x = 2$, then $u = 4$, while if $x = 3$ then $u = 9$. Therefore we may write

$$\int_2^3 x e^{x^2}\,dx = \int_4^9 \tfrac{1}{2}e^u\,du = \tfrac{1}{2}e^u\Big|_4^9 = \tfrac{1}{2}\left(e^9 - e^4\right).$$

Thus, if we make a substitution in a definite integral, we never have to return to the original

variables! We will be expecting you to take advantage of this in your own work; it might

not be absolutely necessary, but there is usually no good reason not to change the limits of

integration in problems like this.
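The effect of changing the limits can be illustrated numerically. In this sketch (the helper `simpson` is an assumption made for illustration), the original integral in $x$ over $[2, 3]$ and the transformed integral in $u$ over $[4, 9]$ are estimated separately and compared with $\tfrac{1}{2}(e^9 - e^4)$.

```python
import math

def simpson(f, a, b, n=1000):
    """Composite Simpson's rule on [a, b] with n subintervals (n even)."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

# Original integral, in the variable x with limits 2 and 3.
original = simpson(lambda x: x * math.exp(x * x), 2.0, 3.0)

# Transformed integral, in the variable u = x^2 with limits 4 and 9.
transformed = simpson(lambda u: 0.5 * math.exp(u), 4.0, 9.0)

exact = 0.5 * (math.exp(9) - math.exp(4))
print(original, transformed, exact)
```

All three values should agree closely; at no point do we need to convert back to $x$.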

Example: Consider $\int_2^3 x^2 \ln x\,dx$. We’ll use integration by parts here, with

$$u = \ln x, \quad dv = x^2\,dx$$

$$du = \frac{1}{x}\,dx, \quad v = \frac{1}{3}x^3.$$

Since the integration by parts formula involves two new variables, we’ll always revert to the original one. We can simply carry the limits through the integration process, like this:

$$\begin{aligned}
\int_2^3 x^2 \ln x\,dx &= \frac{1}{3}x^3 \ln x\,\Big|_2^3 - \int_2^3 \frac{1}{3}x^2\,dx \\
&= 9\ln 3 - \frac{8}{3}\ln 2 - \frac{1}{9}x^3\,\Big|_2^3 \\
&= 9\ln 3 - \frac{8}{3}\ln 2 - \frac{19}{9}.
\end{aligned}$$
26 Other Simple Applications of Integration

26.1 Areas Between Curves

We introduced the concept of the definite integral with a view to finding the area between a

curve and the x-axis, but we can make one generalization very easily. Suppose we have two

curves over an interval; can we find the area between them?

[Figure 56: the curves y = f(x) and y = g(x) between x = a and x = b, with a sample rectangle of width ∆x.]

Of course! Considering Figure (56), we proceed exactly as before: we partition the interval $[a, b]$ into small segments of width $\Delta x$, and pick a point $x_i^*$ in each subinterval. We evaluate $f$ and $g$ at this point to create a rectangle of width $\Delta x$ and height $f(x_i^*) - g(x_i^*)$. Adding all such rectangles, we find that the area is approximately

$$\sum_{i=1}^{n} \left[f(x_i^*) - g(x_i^*)\right]\Delta x.$$

This is just the Riemann sum from the definition of the definite integral, with the function $f(x)$ replaced with the function $f(x) - g(x)$, and so if we let $n \to \infty$, we find that

$$\text{Area} = \int_a^b \left[f(x) - g(x)\right]dx.$$

Example: Find the area enclosed by the curves $y = x^2$ and $y = x^3$.

Solution: First, a quick sketch reveals that there is exactly one region enclosed by these curves, and it spans the interval $[0, 1]$.

[Figure 57: the curves y = x² and y = x³, with a sample rectangle of width ∆x.]

Therefore the area is

$$A = \int_0^1 \left(x^2 - x^3\right)dx = \left[\frac{x^3}{3} - \frac{x^4}{4}\right]_0^1 = \frac{1}{3} - \frac{1}{4} = \frac{1}{12}.$$
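The rectangle construction described above can be carried out directly. The following sketch (with the hypothetical helper `riemann_area`, using midpoints as the sample points $x_i^*$) sums the rectangles between $y = x^2$ and $y = x^3$ for increasing $n$ and compares the result with $1/12$.

```python
def riemann_area(f, g, a, b, n):
    """Sum n rectangles of width dx and height f(x*) - g(x*), x* = midpoint."""
    dx = (b - a) / n
    total = 0.0
    for i in range(n):
        x_star = a + (i + 0.5) * dx
        total += (f(x_star) - g(x_star)) * dx
    return total

upper = lambda x: x ** 2
lower = lambda x: x ** 3

# Errors relative to the exact area 1/12 for finer and finer partitions.
errors = [abs(riemann_area(upper, lower, 0.0, 1.0, n) - 1 / 12) for n in (10, 100, 1000)]
print(errors)
```

The errors shrink as $n$ grows, consistent with the limit $n \to \infty$ giving the integral.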

Occasionally a problem will be made easier if we divide the region into horizontal rectangles

instead. We then use y as the variable of integration. This will be the obvious way to proceed

if our region looks like this:

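[Figure 58: a region bounded on the left by x = g(y) and on the right by x = f(y), between y = c and y = d, with a sample rectangle of height ∆y.]

This way,

$$\text{Area} \approx \sum_{i=1}^{n} \left[x_{\text{right}} - x_{\text{left}}\right]\Delta y = \sum_{i=1}^{n} \left[f(y_i^*) - g(y_i^*)\right]\Delta y,$$

so

$$\text{Area} = \int_c^d \left[f(y) - g(y)\right]dy.$$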
We refer to regions bounded by two functions of x, such as in Figure (56) as “Type I” regions,

while regions bounded by two functions of y, as in Figure (58), are called “Type II”. If a region

can be described in either way (which is the case in Figure (57)), then we call it “Type III”.

Example: Find the area of the shaded region:

[Figure 59: the shaded region below the curve y = ln(x), with sample rectangles of width ∆x and height ∆y.]

This is a Type III region, so we can express the area as a single integral, using either x or

y as our variable of integration:

1. Integrating on x (that is, using vertical rectangles):

$$\begin{aligned}
A &= \int_1^3 \ln x\,dx \\
&= (x\ln x - x)\Big|_1^3 \quad\text{(we found this antiderivative earlier, using IBP)} \\
&= (3\ln 3 - 3) - (0 - 1) \\
&= 3\ln 3 - 2.
\end{aligned}$$

2. Integrating on y (that is, using horizontal rectangles):

$$\begin{aligned}
A &= \int_0^{\ln 3} (3 - e^y)\,dy \\
&= (3y - e^y)\Big|_0^{\ln 3} \\
&= (3\ln 3 - 3) - (0 - 1) \\
&= 3\ln 3 - 2.
\end{aligned}$$

We may choose to integrate on y either because the integral is easier that way (in the above

example using x requires integration by parts, while using y does not), or because of the shape

of the region.

Example: Find the area of the shaded region:

[Figure 60: the shaded region, with boundary curves y = x and y = (2/x) − 1.]

Solution: We always have two options, but since this is a type II region, one of them

looks more appealing:

1. Integrating on x requires two separate integrals:

$$A = \int_0^1 x\,dx + \int_1^2 \left(\frac{2}{x} - 1\right)dx = \ldots = 2\ln 2 - \frac{1}{2}.$$

2. Integrating on y needs only one:

$$A = \int_0^1 \left(\frac{2}{y+1} - y\right)dy = \ldots = 2\ln 2 - \frac{1}{2}.$$

26.2 Mean Values of Functions

Here’s a completely different application. Think back to our development of the definite

integral. At one stage we have $n$ rectangles, of heights $f(x_i^*)$. The average height of these rectangles must be

$$\frac{\sum_{i=1}^{n} f(x_i^*)}{n},$$

and it seems reasonable to claim that this can be used as an approximation for the average value of the function $f$ over the interval $[a, b]$.

Recall also that the width of those rectangles is $\Delta x = \frac{b-a}{n}$. This allows us to rewrite our expression this way:

$$f_{\text{avg}} \approx \frac{\sum_{i=1}^{n} f(x_i^*)}{n} = \frac{\sum_{i=1}^{n} f(x_i^*)}{(b-a)/\Delta x} = \frac{1}{b-a}\sum_{i=1}^{n} f(x_i^*)\,\Delta x.$$

Letting $n \to \infty$, we can define the mean value of a continuous function $f$ on an interval $[a, b]$ as

$$\text{m.v.}(f) = \frac{1}{b-a}\int_a^b f(x)\,dx.$$

Example: The mean value of $f(x) = \sin x$ over the interval $[0, \pi]$ is

$$\frac{1}{\pi}\int_0^{\pi} \sin x\,dx = \frac{1}{\pi}(-\cos x)\Big|_0^{\pi} = \frac{1}{\pi}(1 + 1) = \frac{2}{\pi}$$

(which is approximately 0.637). We could consider this to be the mean value of the absolute value of $f$ over the entire real line (its average distance from the x-axis). See below:

[Figure 61: plot over the range −1.5π to 1.5π.]

Comment: In some situations it is convenient to use a different kind of average: the square root of the mean of the square of $f$ (the “root mean square”),

$$\text{r.m.s.}(f) = \sqrt{\frac{1}{b-a}\int_a^b [f(x)]^2\,dx}.$$

This will always give a value at least as large as the mean of the absolute value of $f$; it exaggerates the significance of deviations from the average.

For comparison, consider $\sin x$ on $[0, 2\pi]$:

$$\text{m.v.}(\sin x) = 0$$

$$\text{m.v.}(|\sin x|) = \frac{2}{\pi} \approx 0.637$$

$$\text{r.m.s.}(\sin x) = \sqrt{\frac{1}{2\pi}\int_0^{2\pi} \sin^2 x\,dx} = \ldots = \frac{1}{\sqrt{2}} \approx 0.707.$$
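These three averages are easy to estimate numerically. The sketch below (helper names `simpson`, `mean_value` and `rms` are illustrative assumptions) computes them for $\sin x$ on $[0, 2\pi]$ directly from the definitions.

```python
import math

def simpson(f, a, b, n=1000):
    """Composite Simpson's rule on [a, b] with n subintervals (n even)."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def mean_value(f, a, b):
    # m.v.(f) = (1/(b-a)) * integral of f over [a, b]
    return simpson(f, a, b) / (b - a)

def rms(f, a, b):
    # r.m.s.(f) = sqrt of the mean value of f squared
    return math.sqrt(mean_value(lambda x: f(x) ** 2, a, b))

a, b = 0.0, 2 * math.pi
mv = mean_value(math.sin, a, b)
mv_abs = mean_value(lambda x: abs(math.sin(x)), a, b)
root_mean_square = rms(math.sin, a, b)
print(mv, mv_abs, root_mean_square)
```

The root mean square comes out larger than the mean of the absolute value, as claimed.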

27 Trigonometric Substitutions

Now we introduce a variation on the change-of-variable technique, useful for a fairly specific set of problems. If we encounter integrals containing terms such as $\sqrt{a^2 - x^2}$ or $\sqrt{x^2 \pm a^2}$, where $a$ is a constant, there are three specific substitutions which may be helpful. The idea behind these substitutions is to make use of the identities

$$\sin^2\theta + \cos^2\theta = 1 \quad\text{and}\quad \tan^2\theta + 1 = \sec^2\theta.$$

For example, if we encounter the expression $\sqrt{x^2 + 1}$, replacing $x$ with $\tan\theta$ will enable us to eliminate the square root:

$$\sqrt{x^2 + 1} = \sqrt{\tan^2\theta + 1} = \sqrt{\sec^2\theta} = \sec\theta$$

(this last step is only valid if $\sec\theta > 0$, but for reasons we’ll explain in a moment, this will always be the case).

Expression              Substitution            Relevant Identity

$\sqrt{x^2 + a^2}$      $x = a\tan\theta$       $\tan^2\theta + 1 = \sec^2\theta$

$\sqrt{x^2 - a^2}$      $x = a\sec\theta$       $\sec^2\theta - 1 = \tan^2\theta$

$\sqrt{a^2 - x^2}$      $x = a\sin\theta$       $1 - \sin^2\theta = \cos^2\theta$

Notice that we’ve written these substitutions in inverse form; what we are really doing is introducing new variables $\theta = \tan^{-1}\left(\frac{x}{a}\right)$, etc. For these particular substitutions, the calculations tend to be easier if we use the inverse form instead.

As with any substitution, the effect is simply to transform the integral into a new form,

and there’s no guarantee that the new integral will be one that we can evaluate... but it may

be worth trying!

Finally, if we are indeed able to evaluate the integral in $\theta$, we’ll want to rewrite the result in terms of $x$. This will typically involve a problem of the following type: given that $\tan\theta = \frac{x}{a}$, what is $\sin\theta$? To answer this, we could either work with identities, or draw a sample right triangle in which an angle $\theta$ has tangent $\frac{x}{a}$.

[Figure 62: a right triangle with angle θ, opposite side x, adjacent side a, and hypotenuse √(x² + a²).]

From this we can read off the values of the other trigonometric functions. In this example,

$$\sin\theta = \frac{x}{\sqrt{x^2 + a^2}}, \quad\text{and}\quad \cos\theta = \frac{a}{\sqrt{x^2 + a^2}}.$$

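Examples:

a) Consider $\int \frac{dx}{\sqrt{4 + x^2}}$. Following the rules above, we let $x = 2\tan\theta$. As in every substitution, we differentiate this:

$$dx = 2\sec^2\theta\,d\theta.$$

Plugging everything in, we obtain

$$\begin{aligned}
\int \frac{dx}{\sqrt{4 + x^2}} &= \int \frac{2\sec^2\theta}{\sqrt{4 + 4\tan^2\theta}}\,d\theta \\
&= \int \frac{\sec^2\theta}{\sqrt{1 + \tan^2\theta}}\,d\theta \\
&= \int \frac{\sec^2\theta}{|\sec\theta|}\,d\theta.
\end{aligned}$$

Now, we’ve defined $\theta$ as $\theta = \tan^{-1}\left(\frac{x}{2}\right)$. That means that $\theta \in \left(-\frac{\pi}{2}, \frac{\pi}{2}\right)$, and so $\sec\theta$ is always positive. Therefore we have

$$\int \sec\theta\,d\theta.$$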

Now, that’s a problem. This is not on our basic list, and it turns out that with the

techniques available to us in this course, it’s quite difficult to evaluate it. However, the

formula is known, and this particular integral appears often enough that we’ll add it to

our basic list, along with a second, related one:

$$\int \sec x\,dx = \ln|\sec x + \tan x| + C$$

$$\int \csc x\,dx = -\ln|\csc x + \cot x| + C$$

Exercise: verify these (by differentiation).

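Now we can finish the task at hand:

$$\int \frac{dx}{\sqrt{4 + x^2}} = \int \sec\theta\,d\theta = \ln|\sec\theta + \tan\theta| + C.$$

We still need to return to $x$. We know that $\tan\theta = x/2$, but what’s $\sec\theta$? We can either draw a sample triangle, as discussed above, or observe that

$$\sec\theta = \sqrt{1 + \tan^2\theta} = \sqrt{1 + \left(\frac{x}{2}\right)^2} = \frac{1}{2}\sqrt{4 + x^2}.$$

Continuing,

$$\begin{aligned}
\int \frac{dx}{\sqrt{4 + x^2}} &= \ln\left|\frac{\sqrt{4 + x^2}}{2} + \frac{x}{2}\right| + C_1 \\
&= \ln\left|x + \sqrt{4 + x^2}\right| + C_2
\end{aligned}$$

where we’ve used the fact that $\ln\left(\frac{A}{B}\right) = \ln A - \ln B$, and set $C_2 = C_1 - \ln 2$.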

Comment: This integral could be approached in a completely different manner. We might have written

$$\int \frac{dx}{\sqrt{4 + x^2}} = \frac{1}{2}\int \frac{dx}{\sqrt{1 + \left(\frac{x}{2}\right)^2}},$$

and let $u = x/2$, so $du = dx/2$, giving

$$\int \frac{du}{\sqrt{1 + u^2}} = \sinh^{-1}u + C = \sinh^{-1}\frac{x}{2} + C,$$

and this is, believe it or not, equivalent to our first result! (Since the hyperbolic functions are defined in terms of exponentials, their inverses can be expressed in terms of logarithms.)
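The equivalence of the two answers can be illustrated numerically: two antiderivatives of the same function may differ only by a constant. This sketch (sample points chosen arbitrarily for illustration) evaluates $\ln\left(x + \sqrt{4 + x^2}\right) - \sinh^{-1}(x/2)$ at several values of $x$ and checks that the difference is always $\ln 2$.

```python
import math

sample_points = [-3.0, -0.5, 0.0, 1.0, 10.0]

differences = []
for x in sample_points:
    log_form = math.log(x + math.sqrt(4 + x * x))  # result of the trig substitution
    asinh_form = math.asinh(x / 2)                 # result of the substitution u = x/2
    differences.append(log_form - asinh_form)

# Every entry should equal ln 2, which is absorbed into the arbitrary constant.
print(differences, math.log(2))
```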
b) Now consider the integral $\int \frac{1}{x\sqrt{x^2 - 1}}\,dx$ (you might recognize the integrand, if you’ve been looking at tables of derivatives, but let’s assume that you don’t). Looking at the denominator, it looks as though we might be able to get somewhere by letting

$$x = \sec\theta,$$

which will mean that

$$dx = \sec\theta\tan\theta\,d\theta.$$

Rewriting the integral in terms of $\theta$, we find

$$\begin{aligned}
\int \frac{1}{x\sqrt{x^2 - 1}}\,dx &= \int \frac{1}{\sec\theta}\cdot\frac{1}{\sqrt{\sec^2\theta - 1}}\,\sec\theta\tan\theta\,d\theta \\
&= \int \frac{\tan\theta}{\sqrt{\tan^2\theta}}\,d\theta.
\end{aligned}$$

Now, since $\theta = \sec^{-1}(x)$, we have $\theta \in \left(-\pi, -\frac{\pi}{2}\right) \cup \left(0, \frac{\pi}{2}\right)$ (see section 8 of these notes, and observe that we have to exclude the points $\theta = -\pi$, $\theta = 0$ to avoid division by zero), and on either part of this domain the tangent function is positive, so

$$\begin{aligned}
\int \frac{1}{x\sqrt{x^2 - 1}}\,dx &= \int \frac{\tan\theta}{\tan\theta}\,d\theta \\
&= \int d\theta \\
&= \theta + C \\
&= \sec^{-1}x + C,
\end{aligned}$$

and this is the rule that you might have recognized:

$$\frac{d}{dx}\sec^{-1}x = \frac{1}{x\sqrt{x^2 - 1}}.$$

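c) Consider $\int \frac{x^3}{\sqrt{4 - x^2}}\,dx$. This time we let

$$x = 2\sin\theta \implies dx = 2\cos\theta\,d\theta.$$

Plug everything in:

$$\begin{aligned}
\int \frac{x^3}{\sqrt{4 - x^2}}\,dx &= \int \frac{8\sin^3\theta\,(2\cos\theta\,d\theta)}{\sqrt{4 - 4\sin^2\theta}} \\
&= 8\int \frac{\sin^3\theta\cos\theta}{\sqrt{1 - \sin^2\theta}}\,d\theta \\
&= 8\int \frac{\sin^3\theta\cos\theta}{\cos\theta}\,d\theta \quad\text{(question: why not } \pm\cos\theta\text{?)} \\
&= 8\int \sin^3\theta\,d\theta \\
&= 8\int \left(1 - \cos^2\theta\right)\sin\theta\,d\theta.
\end{aligned}$$

Now let $u = \cos\theta$, $du = -\sin\theta\,d\theta$ (an ordinary substitution):

$$\begin{aligned}
&= -8\int \left(1 - u^2\right)du \\
&= 8\int \left(u^2 - 1\right)du \\
&= 8\left(\frac{u^3}{3} - u\right) + C \\
&= 8\left(\frac{1}{3}\cos^3\theta - \cos\theta\right) + C.
\end{aligned}$$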
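Use a triangle:

$$\begin{aligned}
&= 8\left(\frac{\left(4 - x^2\right)^{3/2}}{24} - \frac{\sqrt{4 - x^2}}{2}\right) + C \\
&= \frac{1}{3}\left(4 - x^2\right)^{3/2} - 4\sqrt{4 - x^2} + C.
\end{aligned}$$

Check? If this is $f(x)$, then

$$\begin{aligned}
f'(x) &= \frac{1}{2}\left(4 - x^2\right)^{1/2}(-2x) - 2\left(4 - x^2\right)^{-1/2}(-2x) \\
&= -x\sqrt{4 - x^2} + \frac{4x}{\sqrt{4 - x^2}} \\
&= \frac{-x\left(4 - x^2\right) + 4x}{\sqrt{4 - x^2}} \\
&= \frac{x^3}{\sqrt{4 - x^2}} \quad\text{as it should be!}
\end{aligned}$$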

The same technique can be useful for other powers of $a^2 - x^2$ or $x^2 \pm a^2$ (not just square roots). You’ll see an example of this shortly.

28 Integration of Rational Functions

Every rational function has an antiderivative which can be expressed in terms of some com-

bination of rational functions, the natural logarithm, and the arctangent. We’ll discuss the

required strategies through a number of examples, starting with some easy ones.
1 $\int \frac{1}{x - 4}\,dx = \ln|x - 4| + C$

(If you have trouble seeing this right away, you could make the substitution $u = x - 4$).

2 $\int \frac{1}{4 - x}\,dx = -\ln|4 - x| + C$

(Again, a substitution could be made here, or you could realize that $\frac{1}{4 - x} = -\frac{1}{x - 4}$, and $\ln|x - 4| = \ln|4 - x|$).

3 $\int \frac{1}{(x - 4)^2}\,dx = -\frac{1}{x - 4} + C$

(Once again, try $u = x - 4$ if you need to. Once you’ve seen a few examples like this, you’ll probably find that you no longer need to go through the substitution procedure.)
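4 What do we do with $\int \frac{1}{x^2 - 4}\,dx$? Use partial fractions!

$$\frac{1}{x^2 - 4} = \frac{A}{x + 2} + \frac{B}{x - 2}$$

$$\implies 1 = A(x - 2) + B(x + 2)$$

$$x = 2 \implies 1 = 4B$$

$$x = -2 \implies 1 = -4A$$

$$\implies A = -\tfrac{1}{4}, \quad B = \tfrac{1}{4}$$

so

$$\begin{aligned}
\int \frac{1}{x^2 - 4}\,dx &= \int \frac{-\frac{1}{4}}{x + 2}\,dx + \int \frac{\frac{1}{4}}{x - 2}\,dx \\
&= -\frac{1}{4}\ln|x + 2| + \frac{1}{4}\ln|x - 2| + C \\
&= \frac{1}{4}\ln\left|\frac{x - 2}{x + 2}\right| + C.
\end{aligned}$$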
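5 Next, what about $\int \frac{1}{x^2 + 4}\,dx$?

Recall from our basic list of integrals that $\int \frac{1}{x^2 + 1}\,dx = \tan^{-1}x + C$. So... we just need to figure out how to deal with the “4”:

$$\int \frac{1}{x^2 + 4}\,dx = \frac{1}{4}\int \frac{1}{\left(\frac{x}{2}\right)^2 + 1}\,dx.$$

Now we let $u = \frac{x}{2}$, and differentiate: $du = \frac{1}{2}\,dx$

$$\begin{aligned}
&= \frac{1}{4}\int \frac{2\,du}{u^2 + 1} = \frac{1}{2}\int \frac{du}{u^2 + 1} \\
&= \frac{1}{2}\tan^{-1}u + C \\
&= \frac{1}{2}\tan^{-1}\left(\frac{x}{2}\right) + C.
\end{aligned}$$

We could easily make this more general:

$$\int \frac{1}{x^2 + a^2}\,dx = \frac{1}{a}\tan^{-1}\left(\frac{x}{a}\right) + C.$$

This is simple enough that you might want to remember it as a formula.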


6 Next, consider $\int \frac{x}{x^2 + 4}\,dx$. This looks quite similar to the previous example, but the presence of the $x$ in the numerator changes things entirely! All we have to do here is make the substitution

$$u = x^2 + 4 \implies du = 2x\,dx.$$

Then

$$\begin{aligned}
\int \frac{x}{x^2 + 4}\,dx &= \int \frac{\frac{1}{2}\,du}{u} \\
&= \frac{1}{2}\ln|u| + C \\
&= \frac{1}{2}\ln\left(x^2 + 4\right) + C.
\end{aligned}$$

7 Let’s take that one step further: what do we do with $\int \frac{x^2}{x^2 + 4}\,dx$? Well, the integrand is an improper rational function, and if we encounter these it will usually be a good idea to split them up. We can use long division if necessary, but in this case you might be able to see a clever shortcut:

$$\frac{x^2}{x^2 + 4} = \frac{x^2 + 4}{x^2 + 4} - \frac{4}{x^2 + 4} = 1 - \frac{4}{x^2 + 4}.$$

The rest is easy:

$$\begin{aligned}
\int \frac{x^2}{x^2 + 4}\,dx &= \int \left(1 - \frac{4}{x^2 + 4}\right)dx \\
&= x - 4\int \frac{dx}{x^2 + 4} \\
&= x - 2\tan^{-1}\left(\frac{x}{2}\right) + C
\end{aligned}$$

(using the result of example 5).

8 Continuing on to more complicated-looking integrands, how might we deal with the integral $\int \frac{dx}{x^2 - 3x + 2}$?

Well, this is actually no more complicated than example 4; we can use partial fractions again.

$$\frac{1}{x^2 - 3x + 2} = \frac{A}{x - 2} + \frac{B}{x - 1}$$

$$\implies 1 = A(x - 1) + B(x - 2)$$

$$\implies B = -1, \quad A = 1.$$

Now we can write

$$\begin{aligned}
\int \frac{dx}{x^2 - 3x + 2} &= \int \frac{1}{x - 2}\,dx - \int \frac{1}{x - 1}\,dx \\
&= \ln|x - 2| - \ln|x - 1| + C \\
&= \ln\left|\frac{x - 2}{x - 1}\right| + C.
\end{aligned}$$
9 Next, consider $\int \frac{dx}{x^2 - 2x + 5}$. This looks very much like the previous example... but the denominator is irreducible! In this circumstance the trick is to complete the square:

$$\int \frac{dx}{x^2 - 2x + 5} = \int \frac{dx}{(x - 1)^2 + 4} = \frac{1}{2}\tan^{-1}\left(\frac{x - 1}{2}\right) + C.$$

10 As we discussed earlier in the course, every polynomial can be factored into linear and

quadratic factors, so we can break any rational function down into a sum of polynomials

and proper rational functions in which the denominator is of degree no greater than

two. So, after performing long division and a partial fraction decomposition, we really shouldn’t have to deal with anything worse than this: $\int \frac{2x + 5}{x^2 + 5x + 10}\,dx$.

Actually, this one isn’t so bad - we can just make a substitution!

$$\text{Let } u = x^2 + 5x + 10 \implies du = (2x + 5)\,dx$$

and then

$$\begin{aligned}
\int \frac{2x + 5}{x^2 + 5x + 10}\,dx &= \int \frac{du}{u} \\
&= \ln|u| + C \\
&= \ln\left(x^2 + 5x + 10\right) + C
\end{aligned}$$

(we’ve dropped the absolute values because this quadratic is always positive).

11 Ok, we were lucky there. What do we do if we’re NOT so lucky?


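Consider $\int \frac{x + 2}{x^2 + x + 1}\,dx$. This time if we try letting

$$u = x^2 + x + 1$$

we get

$$du = (2x + 1)\,dx,$$

which does not match the numerator. However, it does tell us something. Notice that it’s really the “+2” in the numerator of our integrand that’s causing the problem. In other words, the substitution would have worked if the problem had been to evaluate $\int \frac{x + \frac{1}{2}}{x^2 + x + 1}\,dx$ instead $\left(\text{because } \frac{du}{2} = \left(x + \frac{1}{2}\right)dx\right)$. Therefore we can try splitting the integral up; the remaining part will have only a constant in the numerator, and we’ll be able to deal with that!

$$\begin{aligned}
\int \frac{x + 2}{x^2 + x + 1}\,dx &= \int \frac{x + \frac{1}{2}}{x^2 + x + 1}\,dx + \int \frac{\frac{3}{2}}{x^2 + x + 1}\,dx \\
&= \int \frac{\frac{1}{2}\,du}{u} + \frac{3}{2}\int \frac{dx}{\left(x + \frac{1}{2}\right)^2 + \frac{3}{4}} \\
&= \frac{1}{2}\ln|u| + \frac{3}{2}\cdot\frac{2}{\sqrt{3}}\tan^{-1}\left(\frac{2\left(x + \frac{1}{2}\right)}{\sqrt{3}}\right) \quad\text{(using our formula from example 5 again)} \\
&= \frac{1}{2}\ln\left(x^2 + x + 1\right) + \sqrt{3}\tan^{-1}\left(\frac{2x + 1}{\sqrt{3}}\right) + C.
\end{aligned}$$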
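With this many steps, it is worth checking the answer by differentiation. The sketch below does so numerically, using a central-difference approximation to the derivative (the helper `central_difference`, the step size, and the sample points are illustrative assumptions).

```python
import math

def F(x):
    # Candidate antiderivative from the example above (constant omitted).
    return (0.5 * math.log(x * x + x + 1)
            + math.sqrt(3) * math.atan((2 * x + 1) / math.sqrt(3)))

def integrand(x):
    return (x + 2) / (x * x + x + 1)

def central_difference(f, x, h=1e-5):
    # Symmetric difference quotient approximating f'(x).
    return (f(x + h) - f(x - h)) / (2 * h)

sample_points = [-4.0, -0.5, 0.0, 1.0, 3.0]
worst = max(abs(central_difference(F, x) - integrand(x)) for x in sample_points)
print(worst)
```

A very small value of `worst` indicates that $F'(x)$ matches the integrand at the sampled points.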

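12 The previous example is just about as hard as it gets. The only way it could possibly be worse is if our denominator contains a repeated quadratic factor. In these cases we may need to make a trigonometric substitution. For example, given

$$\int \frac{1}{\left(x^2 + 1\right)^2}\,dx,$$

we would let

$$x = \tan\theta \implies dx = \sec^2\theta\,d\theta.$$

Then,

$$\begin{aligned}
\int \frac{1}{\left(x^2 + 1\right)^2}\,dx &= \int \frac{\sec^2\theta}{\left(\tan^2\theta + 1\right)^2}\,d\theta \\
&= \int \frac{\sec^2\theta}{\sec^4\theta}\,d\theta \\
&= \int \cos^2\theta\,d\theta \\
&= \int \frac{1 + \cos 2\theta}{2}\,d\theta \\
&= \frac{1}{2}\left(\theta + \frac{\sin(2\theta)}{2}\right) + C.
\end{aligned}$$

To return to $x$, we first need to apply one of the double angle formulas, to obtain

$$\frac{1}{2}\left(\theta + \sin\theta\cos\theta\right) + C.$$

Now use a sample triangle:

[Figure 63: a right triangle with angle θ, opposite side x, adjacent side 1, and hypotenuse √(x² + 1).]

and we have our result:

$$\int \frac{1}{\left(x^2 + 1\right)^2}\,dx = \frac{1}{2}\left(\tan^{-1}x + \frac{x}{x^2 + 1}\right) + C.$$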
29 Summary of Integration Rules and Techniques

So far, we have compiled the following list of basic integration rules:


• $\int x^n\,dx = \frac{1}{n+1}x^{n+1} + C$, for any number $n \neq -1$

• $\int \frac{1}{x}\,dx = \ln|x| + C$

• $\int e^x\,dx = e^x + C$

• $\int a^x\,dx = \frac{1}{\ln a}a^x + C$

• $\int \cos x\,dx = \sin x + C$

• $\int \sin x\,dx = -\cos x + C$

• $\int \sec^2 x\,dx = \tan x + C$

• $\int \csc^2 x\,dx = -\cot x + C$

• $\int \sec x\tan x\,dx = \sec x + C$

• $\int \csc x\cot x\,dx = -\csc x + C$

• $\int \cosh x\,dx = \sinh x + C$

• $\int \sinh x\,dx = \cosh x + C$

• $\int \text{sech}^2 x\,dx = \tanh x + C$

• $\int \text{sech}\,x\tanh x\,dx = -\text{sech}\,x + C$

• $\int \frac{1}{1 + x^2}\,dx = \tan^{-1}x + C$

• $\int \frac{1}{a^2 + x^2}\,dx = \frac{1}{a}\tan^{-1}\left(\frac{x}{a}\right) + C$

• $\int \frac{1}{\sqrt{1 - x^2}}\,dx = \sin^{-1}x + C$

• $\int \frac{1}{x\sqrt{x^2 - 1}}\,dx = \sec^{-1}x + C$ (assuming that the arcsecant function is defined as in these notes)

• $\int \sec x\,dx = \ln|\sec x + \tan x| + C$

• $\int \csc x\,dx = -\ln|\csc x + \cot x| + C$

As for integration techniques, in a sense there are only three:

• Substitution (including trigonometric substitution)

• Integration by Parts

• Rewriting the Integrand (by simple algebra, trigonometric identities, partial fractions,

or completing squares)

All of these techniques have the same goal: to replace the given integral with an integral which

is on our list! For this reason some memorization is essential; you MUST know these formulas.

30 More Applications of Integration

Now that we’ve studied all of the methods of integration available to us, we turn to another

couple of applications.

30.1 Finding Lengths of Curves (the “Arc Length Formula”)

Given a segment of a curve $y = f(x)$, spanning an interval $[a, b]$, we’ve discussed how to find the area below it, and how to find its average distance from the x-axis. We might also be

interested in its length. Of course, we could measure this, by laying a string along the curve

(carefully), then stretching out the string and measuring that, but how might we calculate it?

The strategy will be the same as in every other application of integration; we’ll break the

problem into small pieces. The first step, as usual, is to partition an axis (let’s say the x-axis)

into subintervals of equal length (∆x). Now, the section of the curve y = f (x) which spans

each one of these subintervals will be short (and will get shorter when we let ∆x approach

zero), and so we can reasonably approximate it by a straight line. . ., and we do know how to

calculate lengths of straight line segments!

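[Figure 64: a straight-line approximation to the curve over the subinterval from x_{i-1} to x_i, with horizontal leg ∆x, vertical leg ∆y_i, and length ∆L_i.]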

From the figure above, you can see that the length of each straight line segment can be expressed as

$$\Delta L_i = \sqrt{(\Delta x)^2 + (\Delta y_i)^2}. \qquad (5)$$

What we need to do next is let $\Delta x$ approach zero, but it might not be immediately clear how we would go about calculating $\lim_{\Delta x \to 0}\sum_{i=1}^{n}\sqrt{(\Delta x)^2 + (\Delta y_i)^2}$; we need one more step first. The trick is to factor a copy of $\Delta x$ out of the square root; this way we have a differential at the end of our expression, and we can interpret our sum as a Riemann sum. That is, from equation (5) we write

$$\Delta L_i = \sqrt{1 + \left(\frac{\Delta y_i}{\Delta x}\right)^2}\,\Delta x,$$

so that

$$L \approx \sum_{i=1}^{n}\sqrt{1 + \left(\frac{\Delta y_i}{\Delta x}\right)^2}\,\Delta x,$$

and now we may state that the exact length of the curve $y = f(x)$ over the interval $[a, b]$ is

$$L = \int_a^b \sqrt{1 + \left(\frac{dy}{dx}\right)^2}\,dx.$$

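Example: It can be shown that a cable suspended between two supports of equal height assumes the form of a “catenary” (the graph of a hyperbolic cosine):

$$y = a\cosh\left(\frac{x}{a}\right) + K,$$

where $a$ and $K$ are constants.

[Figure 65: a catenary hanging between supports at x = −D/2 and x = D/2, with its lowest point y = a + K at x = 0.]

If we can determine the value of $a$, then we can calculate the length of the cable (the constant $K$ is merely a vertical displacement):

$$\begin{aligned}
L &= \int_{-D/2}^{D/2}\sqrt{1 + \left(\frac{dy}{dx}\right)^2}\,dx \\
&= \int_{-D/2}^{D/2}\sqrt{1 + \sinh^2\left(\frac{x}{a}\right)}\,dx \\
&= 2\int_0^{D/2}\sqrt{1 + \sinh^2\left(\frac{x}{a}\right)}\,dx \quad\text{(since the integrand is even)} \\
&= 2\int_0^{D/2}\sqrt{\cosh^2\left(\frac{x}{a}\right)}\,dx \quad\text{(since } \cosh^2 x - \sinh^2 x = 1\text{)} \\
&= 2\int_0^{D/2}\cosh\left(\frac{x}{a}\right)dx \quad\text{(since } \cosh x > 0 \text{ for all } x\text{)} \\
&= 2a\sinh\left(\frac{D}{2a}\right).
\end{aligned}$$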
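To see the formula in action, the sketch below evaluates the arc length integral numerically for one hypothetical cable (the values $a = 2$ and $D = 3$ are made up for illustration, as is the helper `simpson`) and compares it with $2a\sinh\left(\frac{D}{2a}\right)$.

```python
import math

def simpson(f, a, b, n=1000):
    """Composite Simpson's rule on [a, b] with n subintervals (n even)."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

a_param = 2.0  # hypothetical catenary constant a
D = 3.0        # hypothetical distance between the supports

def arc_length_integrand(x):
    # dy/dx = sinh(x/a) for y = a*cosh(x/a) + K, so the integrand is sqrt(1 + sinh^2(x/a)).
    return math.sqrt(1 + math.sinh(x / a_param) ** 2)

numeric_length = simpson(arc_length_integrand, -D / 2, D / 2)
closed_form = 2 * a_param * math.sinh(D / (2 * a_param))
print(numeric_length, closed_form)
```

Note that the cable is longer than the span $D$, as it must be.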

Unfortunately, most of the integrals generated by the arc length formula will have to be

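approximated by numerical methods. For example, even for a simple curve like $y = x^3$, the expression giving the length of the curve over an interval $[a, b]$ is

$$L = \int_a^b \sqrt{1 + 9x^4}\,dx,$$

and the integrand has no antiderivative in terms of elementary functions, so there is no way to evaluate this exactly with the techniques of this course.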
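A numerical approximation is straightforward, however. As a sketch (the interval $[0, 1]$ and the helper `simpson` are assumptions chosen for illustration), Simpson's rule with two different partition sizes gives estimates of the length of $y = x^3$ that agree to several decimal places.

```python
import math

def simpson(f, a, b, n):
    """Composite Simpson's rule on [a, b] with n subintervals (n even)."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def integrand(x):
    # For y = x^3, dy/dx = 3x^2, so the arc length integrand is sqrt(1 + 9x^4).
    return math.sqrt(1 + 9 * x ** 4)

coarse = simpson(integrand, 0.0, 1.0, 100)
fine = simpson(integrand, 0.0, 1.0, 1000)
print(coarse, fine)
```

The estimate must exceed $\sqrt{2}$, the straight-line distance from $(0, 0)$ to $(1, 1)$, which gives a simple plausibility check.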

Comment: We can easily adapt the formula for curves which are given in the form $x = g(y)$. We simply subdivide the y-axis instead, and factor out $\Delta y$ before taking the limit as $\Delta y \to 0$, to obtain

$$L = \int_c^d \sqrt{1 + \left(\frac{dx}{dy}\right)^2}\,dy.$$

30.2 Volumes of Solids of Revolution

Consider the graph of an equation $y = f(x)$ over an interval $[a, b]$, with $f(x) > 0$ on this interval. Imagine revolving this curve segment about the x-axis, in three dimensions; the result will be a three-dimensional surface. We can go one step further; consider the region bounded by the curve segment, the x-axis, and the lines $x = a$ and $x = b$, and imagine revolving that entire region about the x-axis; the result will be a three-dimensional solid. For illustration, let’s use $y = e^x$, over the interval $[1, 2]$:

[Figure 66: the curve y = e^x.]

[Figure 67: the corresponding solid of revolution about the x-axis.]

Analysis of three-dimensional objects usually requires multivariable calculus (Math 119),

but because of the symmetry of this particular object it can be fully described in terms of the

single variable x, and we already have the tools to answer questions such as this: what is the

volume of the solid we have just generated?

We are about to develop a formula for this, but before we do so, here’s a word of caution:

we could easily consider a different axis of revolution, and each choice will require a different

formula. For example, we could consider revolving the same curve segment about the line

y = −5; this will lead to a larger solid with a rather different geometry:

[Figure 68: the solid generated by revolving the same curve segment about the line y = −5.]

We could even consider a vertical axis of revolution; using the y- axis produces a circular

bowl with a cylindrical hole in the middle:

[Figure 69: the solid generated by revolving the region about the y-axis.]

Because of this variability in the nature of the examples you might encounter, you are

strongly advised not to rely on memorization of the formulas we are about to develop. Instead,

pay attention to the logic behind them.

How are we going to develop these formulas? By now the strategy should be familiar: we’re

going to partition the x- axis (or possibly the y- axis) into intervals of length ∆x (or ∆y), make

an approximation of the volume for each corresponding section of the solid, add them up, and

take a limit as the width of the intervals goes to zero. Since we’ve done this sort of thing more

than once now, for different applications, let’s dispense with some of the rigour. In a very

rough sense, the essential idea of integration is to break a problem down into a sum of infinitely

many infinitesimal parts. We’ll start by dividing the region into infinitesimally thin rectangles

(of thickness dx or dy), and calculate the volume of the corresponding (infinitesimally thin)

section of the solid. We’ll label this volume $dV$ (often called a “volume element”). Summing up all infinitely many of them, the volume of the entire solid will be $V = \int dV$.

There are two very different circumstances to consider, each with a few possible variations.

30.2.1 Vertical Rectangles Revolved about a Horizontal Axis (or Vice Versa)

Consider the first problem proposed above (Figure 67). If we were asked to find the area of the

generating region, it would be sensible to begin with a partition of the x- axis, producing thin

vertical rectangles. So, let’s start the same way, but ask a different question: what happens to

infinitesimally thin vertical rectangles as the region they cover is revolved about the x- axis?

We hope you can see that each one of these rectangles will generate a disk, of radius $f(x)$ and thickness $dx$. That disk must therefore have volume

$$dV = (\text{area of face})(\text{thickness}) = \pi[f(x)]^2\,dx.$$

The volume of the entire solid of revolution, then, must be

$$V = \int dV = \int_a^b \pi[f(x)]^2\,dx.$$

Setting $f(x) = e^x$ and the interval as $[1, 2]$, we find that the volume of the object in Figure 67 is $V = \int_1^2 \pi e^{2x}\,dx = \frac{\pi}{2}\left(e^4 - e^2\right)$ units$^3$.
Now consider the second of our proposed problems (Figure 68). What happens when we revolve our vertical rectangles about an axis $y = k$, $k < 0$? We obtain disks with holes in the middle (we call these washers). This just requires a minor adjustment to our formula. Since $k < 0$, the outer radius is $f(x) - k$ and the inner radius is $-k$, so each infinitesimally-thin washer will have volume

$$\begin{aligned}
dV &= (\text{area of face})(\text{thickness}) \\
&= \left(\pi r_{\text{outer}}^2 - \pi r_{\text{inner}}^2\right)dx \\
&= \pi\left[(f(x) - k)^2 - k^2\right]dx,
\end{aligned}$$

and so the volume of the entire solid must be

$$V = \int_a^b \pi\left[(f(x))^2 - 2kf(x)\right]dx.$$

So (setting $f(x) = e^x$, $k = -5$, and the interval as $[1, 2]$ again) we find that the volume of the object in Figure 68 is $V = \int_1^2 \pi\left[(e^x + 5)^2 - 5^2\right]dx = \pi\int_1^2 \left(e^{2x} + 10e^x\right)dx = \pi\left(\frac{1}{2}e^4 + \frac{19}{2}e^2 - 10e\right)$ units$^3$.

30.2.2 Vertical Rectangles about a Vertical Axis (or Horizontal about Horizontal)

Next, consider the third of our proposed problems (Figure 69). If we stick with the choice

of vertical rectangles (which does make sense, given the shape of the region), then we find

ourselves with a very different situation: revolving these rectangles about a vertical axis won’t

produce disks or washers at all. Instead, we obtain infinitesimally thin cylinders (these are

typically referred to as cylindrical shells).

Very well, then... what’s the volume of a cylindrical shell? We could calculate it as the difference between the volumes of two cylinders of (very) slightly different radii. For the situation shown in Figure 69, the smaller cylinder would have radius $x$, and the larger one would have radius $x + dx$. They both have height $f(x)$, and so the volume of a typical cylindrical shell must be

$$\begin{aligned}
dV &= \pi r_{\text{outer}}^2 h - \pi r_{\text{inner}}^2 h \\
&= \pi\left(r_{\text{outer}}^2 - r_{\text{inner}}^2\right)h \\
&= \pi\left[(x + dx)^2 - x^2\right]f(x) \\
&= \pi\left[2x\,dx + (dx)^2\right]f(x).
\end{aligned}$$

Since $dx$ is infinitesimally small, the $(dx)^2$ term is infinitesimal even when compared to $dx$, so it can be ignored; we can use $dV = 2\pi x f(x)\,dx$ as our volume element, and the volume of the solid of revolution must be

$$V = \int dV = \int_a^b 2\pi x f(x)\,dx.$$

Thus we find that the volume of the object in Figure (69) is $V = \int_1^2 2\pi x e^x\,dx = 2\pi\left[xe^x - e^x\right]_1^2 = 2\pi e^2$ units$^3$.

Note that we’ll also generate cylindrical shells if we revolve horizontal rectangles about

a horizontal axis, and so we’ll require a similar integral in y. And of course, just as our

disk/washer formula might need to be modified on a case-by-case basis, the same is true of

our cylindrical shell formula. If the original region is bounded between two curves $y = f(x)$ and $y = g(x)$ with $f(x) > g(x)$ on $[a, b]$, and the axis of revolution is, say, the line $x = k$ (where $k < a$), then we’ll need to calculate $V$ as $\int_a^b 2\pi(x - k)\left[f(x) - g(x)\right]dx$.
a

Comments:

1. There is a simpler way to think of the cylindrical shell formula: since the cylindrical shell

is infinitesimally thin, we could imagine cutting it along a vertical line, and flattening

it out like a sheet of paper. That sheet would have thickness dx, height h = f (x), and

length given by what was the circumference of the cylindrical shell: 2πr = 2πx. Thinking

of it as a thin rectangular box, then, its volume is dV = (length) (width) (thickness) =

2πrh dx = 2πxf (x) dx.

2. Most textbooks devote separate sections to a “disk / washer method” and a “cylindrical

shell” method, but you shouldn’t be making decisions based upon which of the “methods”

you want to use. Rather, the decision to be made is whether to partition the original

region into vertical rectangles or horizontal rectangles - the appropriate formula will

follow from that decision. How do we make that decision? The same way we make that

decision in other applications, like finding areas (we’d rather end up with one integral

than more than one, and some functions might be easier to work with than their inverses).

3. Something to think about: what if the axis of revolution is at an angle? It’s unlikely

that you’ll see any problems like this, but ask yourself: if the same generating region

we’ve used in our discussion so far were to be revolved about the line y = x − 2, say,

what would happen to vertical rectangles?

Example: Consider the region bounded by the curve $y = \tan^{-1}x$ and the lines $x = 0$ and $y = \pi/4$. What is the volume of the object generated by revolving this region about the line $x = -1$?

Solution: This region can be conveniently broken down into either vertical or horizontal

rectangles (it’s of “Type III”), so let’s look at both options.

(I) Using Vertical Rectangles: Revolving vertical rectangles about the given (vertical) axis generates cylindrical shells, of volume

$$\begin{aligned}
dV &= (\text{circumference})(\text{height})(\text{thickness}) \\
&= (2\pi r)\left(y_{\text{upper}} - y_{\text{lower}}\right)(dx) \\
&= 2\pi(x + 1)\left(\frac{\pi}{4} - \tan^{-1}x\right)dx.
\end{aligned}$$

Therefore our object has volume $V = \int_0^1 2\pi(x + 1)\left(\frac{\pi}{4} - \tan^{-1}x\right)dx$.

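(II) Using Horizontal Rectangles: Revolving horizontal rectangles about the given (vertical) axis generates washers, of volume

$$\begin{aligned}
dV &= (\text{area})(\text{thickness}) \\
&= \left(\pi r_2^2 - \pi r_1^2\right)dy \\
&= \pi\left[(x_{\text{right}} + 1)^2 - (x_{\text{left}} + 1)^2\right]dy \\
&= \pi\left[(\tan y + 1)^2 - 1^2\right]dy.
\end{aligned}$$

So, if we don’t like the look of the integral in (I), we could try this instead:

$$\begin{aligned}
V &= \int_0^{\pi/4} \pi\left(\tan^2 y + 2\tan y\right)dy \\
&= \pi\int_0^{\pi/4} \left(\sec^2 y - 1 + 2\tan y\right)dy \\
&= \pi\left[\tan y - y - 2\ln|\cos y|\right]_0^{\pi/4} \\
&= \pi\left(1 - \frac{\pi}{4} + \ln 2\right).
\end{aligned}$$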

Exercise: Verify that the integral in (I) gives the same result. Hint: resist the urge to multiply everything out! Just proceed directly with integration by parts, with $u = \frac{\pi}{4} - \tan^{-1}x$ and $dv = (x + 1)\,dx$.
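Before doing the exercise by hand, it may be reassuring to confirm numerically that the two setups agree. This sketch (the helper `simpson` is an illustrative assumption) evaluates the shell integral from (I) and the washer integral from (II) and compares both with $\pi\left(1 - \frac{\pi}{4} + \ln 2\right)$.

```python
import math

def simpson(f, a, b, n=1000):
    """Composite Simpson's rule on [a, b] with n subintervals (n even)."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

# (I) Cylindrical shells: radius x + 1, height pi/4 - arctan(x), for 0 <= x <= 1.
shells = simpson(lambda x: 2 * math.pi * (x + 1) * (math.pi / 4 - math.atan(x)), 0.0, 1.0)

# (II) Washers: outer radius tan(y) + 1, inner radius 1, for 0 <= y <= pi/4.
washers = simpson(lambda y: math.pi * ((math.tan(y) + 1) ** 2 - 1), 0.0, math.pi / 4)

exact = math.pi * (1 - math.pi / 4 + math.log(2))
print(shells, washers, exact)
```

This does not replace the exercise, but it shows what answer the integration by parts should produce.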

30.3 Surface Areas of Solids of Revolution

That’s right - we’re not done with solids of revolution yet; there’s another question we can

answer! Let’s revisit the original idea from the previous section: take a segment of the curve

y = f (x) over an interval [a, b] and revolve it about the x-axis. To find the surface area of this

solid of revolution, we begin by partitioning the given interval [a, b] into subintervals of length

dx. On each of these intervals, the curve y = f (x) can be approximated by a straight line

segment - and from our earlier discussion of arc length we know the length of this segment to be $ds = \sqrt{1 + [f'(x)]^2}\,dx$. When revolved about the x-axis, each one of these line segments

generates a frustum of a cone (a section bounded by two parallel planes). So, what’s the

surface area for each frustum?

The change in the radius of the cone over the interval is negligible, so we can approximate the surface area of each frustum as the surface area of a cylinder of radius $f(x)$ and length $ds$. That is, for each interval of length $dx$, we get a frustum of surface area $dA = 2\pi r l = 2\pi f(x)\sqrt{1 + [f'(x)]^2}\,dx$, and so the surface area for the entire solid of revolution (not counting the circular left and right faces) will be

\[ A = \int_a^b 2\pi f(x)\sqrt{1 + [f'(x)]^2}\,dx. \]

What if we are to revolve the curve segment about a vertical axis instead? Then the radius of each frustum isn’t $f(x)$! If, for example, the axis of revolution is the y-axis (and $a > 0$), then the radius of each frustum is $x$, so

\[ A = \int_a^b 2\pi x\sqrt{1 + [f'(x)]^2}\,dx. \]

Naturally, if the axis of revolution is shifted ($x = k$, or $y = k$, where $k \neq 0$), or if we wish to integrate on y instead of x, then we’ll have to make the appropriate adjustments (adjust the radius, and recall that the arc length element can be expressed as $ds = \sqrt{1 + \left(\frac{dx}{dy}\right)^2}\,dy$). The presence of the square root will often make these difficult to evaluate exactly (as we saw with the arc length formula).

Examples:

1. For the solid we began with, generated by the curve $y = e^x$ over the interval [1, 2], revolved about the x-axis (Figure 67) we obtain

\[ A = \int_1^2 2\pi e^x\sqrt{1 + e^{2x}}\,dx. \]

This can be tackled by hand (with methods we’ve discussed), but it’s a challenge. The exact value is

\[ \pi\left[e^2\sqrt{1 + e^4} - e\sqrt{1 + e^2} + \ln\left(e^2 + \sqrt{1 + e^4}\right) - \ln\left(e + \sqrt{1 + e^2}\right)\right] \]

square units (that’s about 151.4).

2. For our second solid (Figure 68), the area of the outer surface (just the surface generated by the curve $y = e^x$, ignoring the left, right, and interior surfaces of the solid) must be

\[ A = \int_1^2 2\pi\left(e^x + 5\right)\sqrt{1 + e^{2x}}\,dx. \]

The only change we’ve had to make is to increase the radii of the frusta, but it makes the integral even harder. A calculator gives an approximate value of 301.7 square units.

3. For our third solid (Figure 69), the radius of each frustum is $x$, so the surface generated by the curve $y = e^x$ has area

\[ A = \int_1^2 2\pi x\sqrt{1 + e^{2x}}\,dx. \]

That integral doesn’t look very friendly either, so maybe we could consider integrating on y? To do so, we need to express the curve segment as $x = \ln y$, with $y \in [e, e^2]$. This gives us

\[ A = \int_e^{e^2} 2\pi \ln y\,\sqrt{1 + \frac{1}{y^2}}\,dy. \]

Ouch. It’s safe to say that we won’t be asking you to evaluate that by hand on your final exam, either! The really important thing (for your exam, and for your future studies) is that you be able to set up the integral correctly. After all, that’s the part that requires human understanding; the evaluation can always be left to a machine. The two integrals above give the same result (of course): it’s approximately 47.45 square units.
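Since the evaluation "can always be left to a machine", here is one way that might look. This is a sketch only: `simpson` is a hypothetical helper implementing composite Simpson's rule, and the integrands are copied from Examples 1 to 3 above.

```python
import math

def simpson(f, a, b, n=2000):
    """Composite Simpson's rule on [a, b] using n (even) subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def ds_dx(x):
    """Arc length factor sqrt(1 + [f'(x)]^2) for f(x) = e^x."""
    return math.sqrt(1 + math.exp(2 * x))

# Example 1: revolve about the x-axis, radius e^x
area1 = simpson(lambda x: 2 * math.pi * math.exp(x) * ds_dx(x), 1, 2)

# Example 2: radius increased by 5
area2 = simpson(lambda x: 2 * math.pi * (math.exp(x) + 5) * ds_dx(x), 1, 2)

# Example 3: revolve about the y-axis, integrating on x and then on y
area3_x = simpson(lambda x: 2 * math.pi * x * ds_dx(x), 1, 2)
area3_y = simpson(
    lambda y: 2 * math.pi * math.log(y) * math.sqrt(1 + 1 / y ** 2),
    math.e, math.e ** 2,
)

print(area1, area2, area3_x, area3_y)
```

The last two numbers come from different integrals but should match, which is a useful sanity check on the set-up.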

31 Improper Integrals

Our definition of the definite integral applies only to continuous functions, on closed intervals. However, it is possible to relax these conditions. For example, consider the function $f(x) = \frac{1}{x^2}$. The area below the graph of $f(x)$ from $x = 1$ to an arbitrary point $x = t$ is

\[ A(t) = \int_1^t \frac{1}{x^2}\,dx = \left[-\frac{1}{x}\right]_1^t = -\frac{1}{t} + 1. \]

[Figure 70: graph of $y = 1/x^2$, with the region between $x = 1$ and $x = t$ highlighted]

This gives $A(2) = \frac{1}{2}$, $A(3) = \frac{2}{3}$, $A(4) = \frac{3}{4}$, etc, and we immediately notice something: the area will always be less than 1, and in fact we can state that

\[ \lim_{t\to\infty} A(t) = \lim_{t\to\infty}\left(1 - \frac{1}{t}\right) = 1. \]

More generally, then, we can extend our definition of the definite integral to infinite intervals

in this way:

Definition:

If $f(x)$ is continuous on $[a, \infty)$, then the improper integral $\int_a^\infty f(x)\,dx$ is defined as

\[ \int_a^\infty f(x)\,dx = \lim_{t\to\infty}\int_a^t f(x)\,dx. \]

If this limit exists, then we say that the integral converges; otherwise it diverges.

Similarly, we define

\[ \int_{-\infty}^a f(x)\,dx = \lim_{t\to-\infty}\int_t^a f(x)\,dx. \]

Example 1: With this new notation, we can state that

\[ \int_1^\infty \frac{1}{x^2}\,dx = \lim_{t\to\infty}\int_1^t \frac{1}{x^2}\,dx = \dots = 1. \]

A comment should be made here; we appear to be looking at a region which has infinite

length, but finite area! If this seems paradoxical, it’s because we (humans) tend to misinterpret

statements involving the concept of “infinity”. Infinity is not a “thing”, and there’s no such

thing as an infinitely long curve [21]. The statement

\[ \int_1^\infty \frac{1}{x^2}\,dx = 1 \]

is exactly the same (by definition) as the statement

\[ \lim_{t\to\infty}\int_1^t \frac{1}{x^2}\,dx = 1; \]

it simply means that as we continue to enlarge the area highlighted in Figure 70, the area increases more and more slowly, and never reaches 1. This is analogous to the idea of a convergent infinite sequence, which you’ll study in Math 119; as an example consider that we could take the number 0.9, add 0.09, then add 0.009, etc, and never reach 1 no matter how long we continue the process.

[21] Actually, it is possible to define infinity as an object, but care is required. In the “extended reals”, ∞ and −∞ are treated as numbers, while in the usual treatment of complex numbers they merge into a single point at infinity (if you’re interested, look up “Riemann Sphere”).

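Example 2: Now consider the improper integral $\int_1^\infty \frac{1}{x}\,dx$. Applying the definition, we find that

\[
\begin{aligned}
\int_1^\infty \frac{1}{x}\,dx &= \lim_{t\to\infty}\int_1^t \frac{1}{x}\,dx \\
&= \lim_{t\to\infty}\Big[\ln|x|\Big]_1^t \\
&= \lim_{t\to\infty}\ln t \\
&= \infty.
\end{aligned}
\]

That is, this integral diverges. In fact, it is not hard to show that $\int_1^\infty \frac{1}{x^p}\,dx$ converges if and only if $p > 1$.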
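To get a feel for the claim about $p$, we can tabulate the "partial" integrals from 1 to $t$ for a few values of $p$ and watch what happens as $t$ grows. The sketch below uses the antiderivative directly; `partial_integral` is a hypothetical helper name, and the sample values of $p$ and $t$ are arbitrary choices.

```python
import math

def partial_integral(p, t):
    """Exact value of the integral of 1/x**p from 1 to t, for t > 1."""
    if p == 1:
        return math.log(t)
    return (t ** (1 - p) - 1) / (1 - p)

for p in (0.5, 1, 2):
    # evaluate at t = 10, 10^3, 10^6 to see the trend as t grows
    values = [partial_integral(p, 10.0 ** k) for k in (1, 3, 6)]
    print(p, values)
```

For $p = 2$ the values settle down near 1, while for $p = 1$ and $p = 0.5$ they keep growing (slowly in the first case, quickly in the second).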

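Example 3: Consider $\int_2^\infty \frac{1}{x(\ln x)^2}\,dx$. To find an antiderivative, we would let $u = \ln x$, giving $du = \frac{1}{x}\,dx$. Hence

\[
\begin{aligned}
\int_2^\infty \frac{1}{x(\ln x)^2}\,dx &= \lim_{t\to\infty}\int_2^t \frac{1}{x(\ln x)^2}\,dx \\
&= \lim_{t\to\infty}\int_{\ln 2}^{\ln t} \frac{1}{u^2}\,du \\
&= \lim_{t\to\infty}\left[-\frac{1}{u}\right]_{\ln 2}^{\ln t} \\
&= \lim_{t\to\infty}\left(-\frac{1}{\ln t} + \frac{1}{\ln 2}\right) \\
&= \frac{1}{\ln 2}.
\end{aligned}
\]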

For functions with discontinuities (or for functions being considered on finite but open

intervals), we need a second set of definitions:

Definition:

If $f(x)$ is continuous at every point in the interval $[a, b]$ except at $x = a$, then the improper integral $\int_a^b f(x)\,dx$ is defined as

\[ \int_a^b f(x)\,dx = \lim_{t\to a^+}\int_t^b f(x)\,dx. \]

As for the first type of improper integral, if this limit exists, then we say that the integral converges; otherwise it diverges.

Similarly, if the discontinuity is at $b$ instead, then

\[ \int_a^b f(x)\,dx := \lim_{t\to b^-}\int_a^t f(x)\,dx. \]

If there is a discontinuity at a point $c \in (a, b)$, then

\[ \int_a^b f(x)\,dx := \int_a^c f(x)\,dx + \int_c^b f(x)\,dx, \]

where these two integrals are as we’ve just defined.

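Example 4:

\[
\begin{aligned}
\int_0^1 \frac{1}{\sqrt{x}}\,dx &= \lim_{t\to 0^+}\int_t^1 \frac{1}{\sqrt{x}}\,dx \\
&= \lim_{t\to 0^+}\Big[2\sqrt{x}\Big]_t^1 \\
&= \lim_{t\to 0^+}\left(2 - 2\sqrt{t}\right) \\
&= 2
\end{aligned}
\]

More generally, it can be shown that $\int_0^1 \frac{1}{x^p}\,dx$ converges if and only if $p < 1$ (compare this to Example 2).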

Example 5: Consider $\int_1^4 \frac{dt}{(t-2)^2}$. If you fail to notice the discontinuity, you might write

\[ \int_1^4 \frac{dt}{(t-2)^2} = \left[-\frac{1}{t-2}\right]_1^4 = -\frac{1}{2} - 1 = -\frac{3}{2}, \]

and it should be obvious that this can’t be right (the integrand is always positive, so the integral can’t possibly be negative).

Indeed, there is a discontinuity at $t = 2$, and we must write

\[
\begin{aligned}
\int_1^4 \frac{dt}{(t-2)^2} &= \int_1^2 \frac{dt}{(t-2)^2} + \int_2^4 \frac{dt}{(t-2)^2} \\
&= \lim_{x\to 2^-}\int_1^x \frac{dt}{(t-2)^2} + \lim_{x\to 2^+}\int_x^4 \frac{dt}{(t-2)^2} \\
&= \lim_{x\to 2^-}\left[-\frac{1}{t-2}\right]_1^x + \lim_{x\to 2^+}\left[-\frac{1}{t-2}\right]_x^4 \\
&= \lim_{x\to 2^-}\left(-\frac{1}{x-2} - 1\right) + \lim_{x\to 2^+}\left(-\frac{1}{2} + \frac{1}{x-2}\right).
\end{aligned}
\]

Now, the first limit is infinite (it is $+\infty$, since $x - 2 < 0$ there), and the second one is also $+\infty$. Hence the integral diverges.
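The contrast between the naive calculation and the correct one can be made concrete. In this sketch, `naive` and `truncated` are hypothetical helper names: the first plugs the endpoints straight into the antiderivative, while the second stops a small distance `eps` short of the discontinuity on each side, as the definition requires.

```python
def antiderivative(t):
    """An antiderivative of 1/(t - 2)**2, valid on either side of t = 2."""
    return -1.0 / (t - 2.0)

def naive():
    # ignores the discontinuity at t = 2 (this is the mistake)
    return antiderivative(4.0) - antiderivative(1.0)

def truncated(eps):
    # integrate over [1, 2 - eps] and [2 + eps, 4] separately
    left = antiderivative(2.0 - eps) - antiderivative(1.0)
    right = antiderivative(4.0) - antiderivative(2.0 + eps)
    return left + right

print(naive())
for eps in (1e-1, 1e-3, 1e-6):
    print(eps, truncated(eps))
```

The naive value is negative, while the truncated values are positive and grow without bound as `eps` shrinks, which is what divergence looks like numerically.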

Example 6: Consider $\int_{-\infty}^{\infty} x\,dx$. You might be tempted to say that this is zero, since $\int_{-a}^{a} x\,dx = 0$ for any value of $a$. However, in doing so you’d be assuming that the upper and lower limits of the integral go to ±∞ at the same rate! To use our definitions, we must write [22]

\[
\begin{aligned}
\int_{-\infty}^{\infty} x\,dx &= \int_{-\infty}^{0} x\,dx + \int_0^{\infty} x\,dx \\
&= \underbrace{\lim_{t\to-\infty}\left[\frac{x^2}{2}\right]_t^0}_{\to\,-\infty} + \underbrace{\lim_{t\to\infty}\left[\frac{x^2}{2}\right]_0^t}_{\to\,\infty}.
\end{aligned}
\]

Since both integrals diverge, we must conclude that $\int_{-\infty}^{\infty} x\,dx$ diverges.

[22] The temptation is to write $\int_{-\infty}^{\infty} f(x)\,dx = \lim_{t\to\infty}\int_{-t}^{t} f(x)\,dx$. In fact, this definition is used in certain contexts, because it does give useful results in applications. However, from a strictly logical perspective, it is flawed; why couldn’t $\int_{-\infty}^{\infty} f\,dx$ be defined instead as $\lim_{t\to\infty}\int_{-t}^{2t} f\,dx$, or perhaps $\lim_{t\to\infty}\int_{-t}^{t+1} f\,dx$?

If you’re wondering why we need these definitions, here are a couple of examples to show

you that improper integrals can arise quite naturally:

Example 7: Suppose we try to apply our arc length formula to a circle of radius $r$ to find its circumference (yes, we already know what the answer should be!). Let’s proceed by finding the length of the portion of the circle in the first quadrant, and multiplying it by 4. The upper half of the circle has equation $y = \sqrt{r^2 - x^2}$, so $y' = -\frac{x}{\sqrt{r^2 - x^2}}$. Therefore the circumference of the circle must be

\[
\begin{aligned}
C &= 4\int_0^r \sqrt{1 + \left(\frac{dy}{dx}\right)^2}\,dx \\
&= 4\int_0^r \sqrt{1 + \frac{x^2}{r^2 - x^2}}\,dx \\
&= 4\int_0^r \sqrt{\frac{r^2}{r^2 - x^2}}\,dx \\
&= 4\int_0^r \frac{r}{\sqrt{r^2 - x^2}}\,dx.
\end{aligned}
\]

This is an improper integral, since the integrand is undefined at the upper limit $x = r$. So,

\[
\begin{aligned}
C &= 4\lim_{t\to r^-}\int_0^t \frac{1}{\sqrt{1 - \left(\frac{x}{r}\right)^2}}\,dx \\
&= 4\lim_{t\to r^-}\left[r\sin^{-1}\left(\frac{x}{r}\right)\right]_0^t \\
&= 4r\lim_{t\to r^-}\left[\sin^{-1}\left(\frac{t}{r}\right) - \sin^{-1}(0)\right] \\
&= 4r\sin^{-1}(1) \\
&= 2\pi r,
\end{aligned}
\]

as expected.

Note: In this particular example (and many similar ones), you would still get the same

result if you failed to notice that the integral was improper. However, understanding the

concept may still be helpful to you. For instance, if you were to try to evaluate this integral

using a calculator (after choosing a specific value for r), it would fail! Realizing that it is

improper might help you modify your approach to fix the problem.
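As an illustration of what "modifying your approach" could mean, here is a sketch for the circumference integral from Example 7. A rule that samples the endpoint $x = r$ would divide by zero; the midpoint rule never touches the endpoints, so it runs, but it converges slowly because of the singularity. The function name `quarter_arc_midpoint` and the choice $r = 1$ are assumptions made for this example.

```python
import math

def quarter_arc_midpoint(r, n):
    """Midpoint-rule estimate of the integral of r / sqrt(r^2 - x^2) over [0, r].

    The sample points are the midpoints of n equal subintervals, so the
    singular endpoint x = r is never evaluated.
    """
    h = r / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        total += r / math.sqrt(r * r - x * x)
    return total * h

r = 1.0
for n in (100, 10_000, 100_000):
    estimate = 4 * quarter_arc_midpoint(r, n)
    print(n, estimate, abs(estimate - 2 * math.pi * r))
```

The error shrinks only slowly as `n` grows. A cleaner fix is the substitution $x = r\sin u$, which turns the integrand into the constant $r$ on $[0, \pi/2]$ and removes the singularity altogether.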

Example 8: We can use an improper integral to calculate the escape velocity of a projectile fired from a planet’s surface (we’ll ignore air resistance here). The key realization is that the kinetic energy of the projectile at launch, $\frac{1}{2}mv^2$, must be greater than the total work to be done against gravity as the projectile rises. Now, for motion in a straight line we know that work = force × distance, but if the force varies with distance then this must be calculated as

\[ W = \int_a^b F(x)\,dx \]

(we split the journey into small distances of length $dx$, imagine the gravitational force to be constant over each small interval, and sum up the amounts of work required to traverse each one).

In our case,

\[ F = -\frac{GMm}{x^2} \]

(where $x$ is the distance from the planet’s center, $M$ is the mass of the planet, and $m$ is the mass of the projectile), and we want to move the projectile from $x = R$ (the surface of the planet) to $x = \infty$ (an indefinitely large distance away!). The work to be done against this force is

\[
\begin{aligned}
\text{Work} &= \int_R^\infty \frac{GMm}{x^2}\,dx \\
&= \lim_{t\to\infty}\int_R^t \frac{GMm}{x^2}\,dx \\
&= GMm\lim_{t\to\infty}\left[-\frac{1}{x}\right]_R^t \\
&= GMm\lim_{t\to\infty}\left(-\frac{1}{t} + \frac{1}{R}\right) \\
&= \frac{GMm}{R}.
\end{aligned}
\]

Hence, to escape the planet’s gravitational field, we require that

\[ \frac{1}{2}mv^2 > \frac{GMm}{R}, \quad\text{i.e.}\quad v > \sqrt{\frac{2GM}{R}}. \]

If the planet is Earth, this gives an astonishing speed of 11.2 km/s... and we haven’t even considered the effect of air resistance!
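The figure of 11.2 km/s is easy to reproduce. The notes do not state the constants used, so the values of $G$, and of the Earth's mass and mean radius, in this sketch are assumed approximate reference values; `escape_velocity` is a hypothetical helper name.

```python
import math

G = 6.674e-11        # gravitational constant, m^3 kg^-1 s^-2 (approximate)
M_EARTH = 5.972e24   # mass of the Earth, kg (approximate)
R_EARTH = 6.371e6    # mean radius of the Earth, m (approximate)

def escape_velocity(mass, radius):
    """Threshold speed sqrt(2GM/R), in m/s, from the inequality above."""
    return math.sqrt(2 * G * mass / radius)

v = escape_velocity(M_EARTH, R_EARTH)
print(f"{v / 1000:.1f} km/s")
```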

Part V

Polar Coordinates

32 Introduction

So far, all of our discussions have taken place in the Cartesian coordinate system, in which

the coordinates are the distances from a vertical and a horizontal axis, respectively.

[Figure 71: a point P = (x, y) in the Cartesian coordinate system]

Alternatively, we could describe the location of a point by giving its distance from the

origin (we’ll call this ρ) and the angle between the horizontal axis and a line segment drawn

from the origin to the point (we’ll call this φ). [23]

[Figure 72: a point P = (ρ, φ), with the “pole” and the “polar axis” labelled]

Notice that ρ, as a distance, must always be non-negative: ρ ∈ [0, ∞). Meanwhile, φ can take on any value, although we only need the values φ ∈ [0, 2π]. This means that polar representations are not unique; the point (1, 0) can also be described as (1, 2π), (1, 4π), etc. There is one exception: the “pole” ρ = 0 doesn’t need an angle at all.

[23] There is a lack of agreement on standard notation for polar coordinates and their three dimensional counterparts (cylindrical and spherical coordinates, which you will see in Math 119). Most mathematical texts use r and θ for the polar system, but we have chosen to use ρ and φ here to match the notation you will probably see in your physics courses. These are consistent with the standards established by the International Organization for Standardization (ISO). Previous editions of these notes used r and θ, and you’ll see those names used in old exams (and possibly even in your current assignments, depending on how careful we are!).

Note also that we lack distinct notations for the two systems; if we see the expression (a, b)

we need context to tell us whether it is a point in Cartesian coordinates or a point in polar

coordinates (or whether it is an interval on the real line).

If we lay our two diagrams over top of each other, we can convert between the two systems.

The formulas are

x = ρ cos φ

y = ρ sin φ.

[Figure 73: the point P = (ρ, φ) with its Cartesian coordinates x and y marked, together with the “pole” and the “polar axis”]

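Going the other way, we have

\[ \rho = \sqrt{x^2 + y^2}, \qquad \varphi = \tan^{-1}\left(\frac{y}{x}\right) + 2k\pi, \quad k \in \mathbb{Z} \]

(φ is multivalued, but if we want the value in $\left(-\frac{\pi}{2}, \frac{3\pi}{2}\right)$ we can say that $\varphi = \tan^{-1}\left(\frac{y}{x}\right)$ for points in the first or fourth quadrants, while $\varphi = \tan^{-1}\left(\frac{y}{x}\right) + \pi$ for points in the second or third quadrants).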

Examples: For points on the axes, no calculations should be necessary; just think about

how we’ve defined the coordinates:

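$(x, y) = (1, 0)$ is $(\rho, \varphi) = (1, 0)$

$(x, y) = (0, 1)$ is $(\rho, \varphi) = \left(1, \frac{\pi}{2}\right)$

$(x, y) = (-1, 0)$ is $(\rho, \varphi) = (1, \pi)$

$(x, y) = (0, -1)$ is $(\rho, \varphi) = \left(1, \frac{3\pi}{2}\right)$

Points on the lines y = ±x should also be easy:

$(x, y) = (1, 1)$ is $(\rho, \varphi) = \left(\sqrt{2}, \frac{\pi}{4}\right)$

$(x, y) = (-1, 1)$ is $(\rho, \varphi) = \left(\sqrt{2}, \frac{3\pi}{4}\right)$

etc.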
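For points that are not so conveniently placed, the quadrant bookkeeping can be delegated to a two-argument arctangent. The sketch below uses Python's `math.atan2` and `math.hypot`; the function names `to_polar` and `to_cartesian`, and the choice to report φ in $[0, 2\pi)$, are assumptions made for this example.

```python
import math

def to_polar(x, y):
    """Return (rho, phi) with rho >= 0 and phi in [0, 2*pi)."""
    rho = math.hypot(x, y)
    # atan2 picks the correct quadrant; the modulus shifts (-pi, 0) up by 2*pi
    phi = math.atan2(y, x) % (2 * math.pi)
    return rho, phi

def to_cartesian(rho, phi):
    """Return (x, y) using x = rho*cos(phi), y = rho*sin(phi)."""
    return rho * math.cos(phi), rho * math.sin(phi)

for point in [(1, 0), (0, 1), (-1, 0), (0, -1), (1, 1), (-1, 1)]:
    rho, phi = to_polar(*point)
    print(point, "->", (rho, phi / math.pi), "(phi shown as a multiple of pi)")
```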

Of course, we can also use relationships between ρ and φ to describe curves.

Examples:

• The circle $x^2 + y^2 = 4$ can be described simply as $\rho = 2$.

• Consider the equation ρ = sin φ, with the restriction φ ∈ [0, π]. What does the graph

look like? If we try drawing it roughly, by hand, just by thinking about what the

coordinates represent, we get something like this:

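[Figure 74: rough sketch of the curve ρ = sin φ for φ ∈ [0, π]]

Is this really a circle? Yes it is! We can confirm this by using the conversion formulas:

\[
\begin{aligned}
\rho = \sin\varphi &\implies \sqrt{x^2 + y^2} = \frac{y}{\sqrt{x^2 + y^2}} \\
&\implies x^2 + y^2 = y \\
&\implies x^2 + y^2 - y = 0 \\
&\implies x^2 + \left(y - \frac{1}{2}\right)^2 = \frac{1}{4},
\end{aligned}
\]

and we recognize this as being the equation of a circle of radius 1/2, centered at $\left(0, \frac{1}{2}\right)$.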
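The same conclusion can be spot-checked by sampling points on the curve. In this sketch, `circle_residual` is a hypothetical helper that converts a point of $\rho = \sin\varphi$ to Cartesian coordinates and reports how far it is from satisfying the circle equation derived above.

```python
import math

def circle_residual(phi):
    """Return x^2 + (y - 1/2)^2 - 1/4 for the point of rho = sin(phi) at angle phi."""
    rho = math.sin(phi)
    x = rho * math.cos(phi)
    y = rho * math.sin(phi)
    return x ** 2 + (y - 0.5) ** 2 - 0.25

# sample the curve at nine angles between 0 and pi
residuals = [circle_residual(k * math.pi / 8) for k in range(9)]
print(max(abs(r) for r in residuals))
```

The largest residual should be zero up to rounding error.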


Question: What happens for φ ∈ (π, 2π), where we obtain negative values for ρ?

There are two conventions in use. Some authors simply ignore this range, since ρ is

supposed to be a distance (so there are no points corresponding to these values of φ). Others,

though, interpret a negative distance as a distance in the opposite direction. That is, they use

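the rule $(-\rho, \varphi) = (\rho, \varphi + \pi)$. Thus, for example, the point $(\rho, \varphi) = \left(-2, \frac{\pi}{4}\right)$ is $(\rho, \varphi) = \left(2, \frac{5\pi}{4}\right)$:

[Figure 75: the line φ = π/4, with the point (−2, π/4) marked in the third quadrant]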

More Examples:

• For ρ = sin φ, the section for φ ∈ (π, 2π) reproduces the same circle!

[Figure 76]

• What about the equation ρ = cos φ? This is also a circle!

[Figure 77: graph of ρ = cos φ, with the points ρ(0) = 1 and ρ(π/2) = 0 marked]

• Simple equations in polar coordinates can have some interesting graphs. Consider ρ =

2 sin φ + 1. Conversion into Cartesian coordinates is not very helpful here, so it’s easiest

to trace the curve out using what we know about the sine function (it might help to

graph the equation y = 2 sin x + 1 first, to see how the output values change as the input

variable increases). Note in particular that ρ = 0 when $\sin\varphi = -\frac{1}{2}$, i.e. at $\varphi = -\frac{\pi}{6}$ and $\varphi = \frac{7\pi}{6}$, so the curve passes through the origin tangentially to those lines, and for φ between $\frac{7\pi}{6}$ and $\frac{11\pi}{6}$, where ρ is negative, we’ll use the rule described above. We should get something like this:

[Figure 78: graph of ρ = 2 sin φ + 1, with ρ(0) = 1, ρ(π/2) = 3, and ρ(π) = 1 marked; ρ increases from 1 to 3 as φ goes from 0 to π/2]

• You might try also the equations ρ = cos 2φ and ρ = 2 − sin φ, as exercises.

33 Calculating Area and Arc Length in Polar Coordinates

Since circles (and spheres) are such natural shapes, we encounter them frequently in applica-

tions. And since they can be described so much more simply in polar coordinates, it can be

very useful to be able to do calculus in polar coordinates. You’ll get a sense of how important

this is in Math 119, but we are already in a position to at least ask some simple questions:

given a curve (or curve segment) with polar equation ρ = f (φ), how might we find its length?

If it’s a closed curve, can we find the area it encloses?

33.1 Areas

We know how to find the area between the x-axis and a curve y = f (x) for x ∈ [a, b], and it

isn’t too difficult to modify that idea. The first step is to realize what it is we will actually be

calculating! Polar equations describe distance from the origin instead of from an axis, and so

given a curve segment ρ = f (φ), φ ∈ [α, β], the quantity we’ll most easily be able to calculate

is the area between the origin and the curve over that range of angles, i.e. the area bounded

by the curve and the two lines φ = α and φ = β.

[Figure 79: the region bounded by the curve ρ = f(φ) and the lines φ = α and φ = β]

From here, we follow a familiar argument, modifying it for the context. We partition the

interval [α, β] into tiny angular increments Δφ. In each small interval $[\varphi_{i-1}, \varphi_i]$, $i = 1, \dots, n$, we pick a value $\varphi_i^*$ at which to evaluate $f$. We use that value to define the radius of a sector of a circle (instead of the height of a rectangle). Recalling that the area of a sector of a circle of radius $r$ and angle $\theta$ is $A = \frac{1}{2}r^2\theta$, we know that this sector has area $dA = \frac{1}{2}\rho_i^2\,d\varphi = \frac{1}{2}\left[f(\varphi_i^*)\right]^2 d\varphi$. Thus, we can approximate the area “inside” the curve as the sum of the areas of a multitude of sectors of different radii, and letting $d\varphi \to 0$ gives us the exact value:

\[ A = \int_\alpha^\beta \frac{1}{2}\left[f(\varphi)\right]^2 d\varphi. \]

If what we’re looking for is the area between two curves $\rho = f(\varphi)$ and $\rho = g(\varphi)$, with $f > g$ on [α, β], then we just have to subtract the smaller area from the larger one:

\[ A = \int_\alpha^\beta \frac{1}{2}\left(\left[f(\varphi)\right]^2 - \left[g(\varphi)\right]^2\right) d\varphi. \]

Example: Find the area of the portion of the circle $x^2 + y^2 = 4$ which lies above the line $y = 1$.

Solution: If you were to encounter this question on an exam, your instincts might tell you to just proceed with a calculation in Cartesian coordinates; this area must be $A = \int_{-\sqrt{3}}^{\sqrt{3}} \left(\sqrt{4 - x^2} - 1\right) dx$. That integral can indeed be evaluated, but polar coordinates provide an easier option. We need to rewrite the curves in polar form, of course: the circle has polar equation ρ = 2, and for the line we have ρ sin φ = 1, so ρ = csc φ. We also need to know the range of values of φ; for this we can just observe that the curves intersect when csc φ = 2, i.e. when sin φ = 1/2. Therefore φ runs from π/6 to 5π/6. The desired area must therefore be

\[
\begin{aligned}
A &= \int_{\varphi_1}^{\varphi_2} \frac{1}{2}\left(\rho_{\text{outer}}^2 - \rho_{\text{inner}}^2\right) d\varphi \\
&= \int_{\pi/6}^{5\pi/6} \frac{1}{2}\left(2^2 - \csc^2\varphi\right) d\varphi = \int_{\pi/6}^{5\pi/6} \left(2 - \frac{1}{2}\csc^2\varphi\right) d\varphi \\
&= \left[2\varphi + \frac{1}{2}\cot\varphi\right]_{\pi/6}^{5\pi/6} \\
&= \left(\frac{5\pi}{3} - \frac{\sqrt{3}}{2}\right) - \left(\frac{\pi}{3} + \frac{\sqrt{3}}{2}\right) \\
&= \frac{4\pi}{3} - \sqrt{3}.
\end{aligned}
\]

Note: there are a couple of ways we could have made this problem even easier, by taking

advantage of symmetry: consider that the area bounded above the line y = 1 must be the

same as the area bounded to the right of the line x = 1; that gives an integral involving $\sec^2\varphi$ instead of $\csc^2\varphi$. We could also cut the region in half along the y-axis, and double the result.
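The polar and Cartesian set-ups for this example can be compared numerically. As before, this is only a sketch, and `simpson` is a hypothetical helper implementing composite Simpson's rule; the Cartesian integrand is the height of the region above the line y = 1.

```python
import math

def simpson(f, a, b, n=2000):
    """Composite Simpson's rule on [a, b] using n (even) subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

# polar: (1/2)(rho_outer^2 - rho_inner^2) with rho_outer = 2, rho_inner = csc(phi)
polar = simpson(
    lambda p: 0.5 * (2.0 ** 2 - 1.0 / math.sin(p) ** 2),
    math.pi / 6, 5 * math.pi / 6,
)

# Cartesian: height of the region above the line y = 1
root3 = math.sqrt(3)
cartesian = simpson(lambda x: math.sqrt(4 - x * x) - 1, -root3, root3)

exact = 4 * math.pi / 3 - root3
print(polar, cartesian, exact)
```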

33.2 Arc Length

To find the length of a curve with polar equation ρ = f (φ), with φ ∈ [α, β], recall the work

we’ve already done on arc length. The idea was to partition the curve into small segments of

length Δs, calculate the length of each one, add them up, and take a limit. Roughly speaking then, we determined that

\[ L = \int ds, \]

where the length element $ds$ has the structure

\[ ds = \sqrt{(dx)^2 + (dy)^2}. \]

To use x as the variable of integration (with x ∈ [a, b]), we simply factored out a $dx$ within the integral (or equivalently, multiplied and divided by $dx$):

\[ L = \int_a^b \sqrt{1 + \left(\frac{dy}{dx}\right)^2}\,dx. \]

To use y as the variable of integration (with y ∈ [c, d]), we simply factored out a $dy$:

\[ L = \int_c^d \sqrt{\left(\frac{dx}{dy}\right)^2 + 1}\,dy. \]

If we wish to use φ as the variable of integration, we just have to introduce a differential $d\varphi$, in exactly the same way:

\[ L = \int_\alpha^\beta \sqrt{\left(\frac{dx}{d\varphi}\right)^2 + \left(\frac{dy}{d\varphi}\right)^2}\,d\varphi. \]

Now, to use this formula, we do need a bit of work. You can see that we will need to

express both x and y in terms of φ. That isn’t hard to do; we know that

x = ρ cos φ and y = ρ sin φ.

Since, on our curve, we have ρ = f (φ), then, we can write

x = f (φ) cos φ and y = f (φ) sin φ.

This describes the curve using two functions of the single variable φ; we call this a parameter-

ization of the curve, and we’ll discuss the concept further in Math 119. For the moment, all

we have to do is differentiate and simplify our formula for L:

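\[ \frac{dx}{d\varphi} = f'(\varphi)\cos\varphi - f(\varphi)\sin\varphi, \qquad \frac{dy}{d\varphi} = f'(\varphi)\sin\varphi + f(\varphi)\cos\varphi \]

\[ \implies \left(\frac{dx}{d\varphi}\right)^2 = \left[f'(\varphi)\right]^2\cos^2\varphi - 2f(\varphi)f'(\varphi)\sin\varphi\cos\varphi + \left[f(\varphi)\right]^2\sin^2\varphi \]

and

\[ \left(\frac{dy}{d\varphi}\right)^2 = \left[f'(\varphi)\right]^2\sin^2\varphi + 2f(\varphi)f'(\varphi)\sin\varphi\cos\varphi + \left[f(\varphi)\right]^2\cos^2\varphi, \]

so

\[ \left(\frac{dx}{d\varphi}\right)^2 + \left(\frac{dy}{d\varphi}\right)^2 = \left[f'(\varphi)\right]^2 + \left[f(\varphi)\right]^2. \]

Therefore, the arc length formula has polar form

\[ L = \int_\alpha^\beta \sqrt{\left[f'(\varphi)\right]^2 + \left[f(\varphi)\right]^2}\,d\varphi. \]

If you prefer, you can use the fact that $\rho = f(\varphi)$ and express this as

\[ L = \int_\alpha^\beta \sqrt{\left(\frac{d\rho}{d\varphi}\right)^2 + \rho^2}\,d\varphi. \]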

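Example: Consider the graph of the equation ρ = 2 sin φ + 1, which we discussed in the previous section (see Figure 78). The outer loop has length

\[
\begin{aligned}
L &= \int_{-\pi/6}^{7\pi/6} \sqrt{(2\cos\varphi)^2 + (2\sin\varphi + 1)^2}\,d\varphi \\
&= \int_{-\pi/6}^{7\pi/6} \sqrt{4\cos^2\varphi + 4\sin^2\varphi + 4\sin\varphi + 1}\,d\varphi \\
&= \int_{-\pi/6}^{7\pi/6} \sqrt{5 + 4\sin\varphi}\,d\varphi.
\end{aligned}
\]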

Unfortunately, using polar coordinates doesn’t get us away from the fact that arc length

integrals contain square roots, and we’ve got another one which is hard to evaluate exactly. A

calculator gives an approximation easily enough: the length is about 10.68 units.
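In place of a calculator, the polar arc length formula can be evaluated with a few lines of code. This is a sketch under the same assumptions as before: `simpson` and `polar_arc_length` are hypothetical helper names, and the caller supplies both $f$ and its derivative.

```python
import math

def simpson(f, a, b, n=2000):
    """Composite Simpson's rule on [a, b] using n (even) subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def polar_arc_length(f, df, alpha, beta):
    """Length of rho = f(phi) for phi in [alpha, beta], given the derivative df."""
    return simpson(lambda p: math.sqrt(df(p) ** 2 + f(p) ** 2), alpha, beta)

# outer loop of rho = 2 sin(phi) + 1
outer_loop = polar_arc_length(
    lambda p: 2 * math.sin(p) + 1,
    lambda p: 2 * math.cos(p),
    -math.pi / 6, 7 * math.pi / 6,
)
print(outer_loop)
```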

34 Complex Numbers

The basic idea behind complex numbers is a simple one, but it took centuries to be taken seriously. Consider the algebraic equation $x^2 + 1 = 0$. We know that this has no real solutions (just consider the fact that the graph of $y = x^2 + 1$ does not cross the x-axis). However, if we “blindly” apply the rules we usually use for solving such equations, we find that $x^2 = -1$, which leads to the expression

\[ x = \pm\sqrt{-1}. \]

This is clearly nonsense, but it turns out to be a remarkably useful sort of nonsense!

Defining an imaginary number or a complex number needs some care; we will avoid using the expression “$\sqrt{-1}$” in our definitions altogether because it can lead to errors like this [24]:

\[ 1 = \sqrt{1} = \sqrt{(-1)(-1)} = \sqrt{(-1)}\sqrt{(-1)} = \left(\sqrt{-1}\right)^2 = -1. \]

Instead, we define a complex number $z$ to be an ordered pair $(x, y)$, written in the special notation

\[ z = x + iy \]

and accompanied by rules of algebra which ensure that $i^2 = -1$ (we’ll state those rules in a moment). This allows us to avoid the “$\sqrt{\ }$” symbol, but still introduce $i$ as an “imaginary” number such that $i$ and $-i$ are the two square roots of −1. If y = 0 then z is a real number, while if x = 0 then z is (purely) imaginary.

The number x is referred to as the real part of z, and we may use the notation Re (z) = x.

The number y is referred to as the imaginary part of z, and we may use the notation

Im (z) = y. There is something odd about this particular choice of terminology; the imaginary

part of z is a real number ! The quantity iy is imaginary, but y itself is real, and this is the

number we refer to as Im (z).

There is also one other peculiarity of notation you’ll need to get used to. In the engineering world the letter i has another role; it stands for electrical current (often, I is used for a constant value, and i (t) is used for a current which changes as a function of time). For that reason, engineers will usually use j for the imaginary unit instead of i. We might as well make that switch right away; from now on we’ll write our complex numbers as z = x + jy.

[24] What’s wrong with this calculation? The problem is that when we work with real numbers, the symbol “$\sqrt{\ }$” represents the positive square root, but complex numbers cannot be described as positive or negative! In fact, when working with complex numbers, we will have to treat roots (and all non-integer exponents) as multivalued functions, so “$\sqrt{-1}$” will have to be understood as representing both i and −i.

Remarkably, the introduction of complex numbers can make math simpler ! For example,

consider the problem of factoring polynomials (which is what motivated the idea in the first

place). When we work with real numbers, it is known that every polynomial can be factored

into a collection of linear and irreducible quadratic factors. If we allow ourselves to use

complex numbers, then we can shorten that statement: every polynomial can be factored into

a collection of linear factors (this is the Fundamental Theorem of Algebra).

Example: Consider the equation $z^2 + z + 1 = 0$. Applying the quadratic formula gives

\[ z = \frac{-1 \pm \sqrt{1 - 4}}{2} = -\frac{1}{2} \pm \frac{\sqrt{-3}}{2}. \]

We can see then that there are no real solutions, but in the world of complex numbers we can identify two solutions:

\[ z_1 = \frac{-1 + \sqrt{3}\,j}{2} \quad\text{and}\quad z_2 = \frac{-1 - \sqrt{3}\,j}{2}. \]

This allows us to factor $z^2 + z + 1$ as $(z - z_1)(z - z_2)$.

You’ll notice that the two roots differ only in the sign of their imaginary parts, and you should be able to see that this will happen with every quadratic equation with real coefficients and a negative discriminant, as a result of the “±” in the quadratic formula. We say that the solutions are complex conjugates of each other, and we have a special notation for this as well:

If $z = x + jy$, then the complex conjugate of $z$ is the number $z^* = x - jy$.

Before we really get our hands dirty, we should make one more comment about notation:

we do not distinguish between x + jy and x + yj; either form is acceptable. Common practice

is to write numbers in front of j, but variables behind it, so for example we would usually

write x + jy but 1 + 5j.

Complex Arithmetic

Let two complex numbers be z1 = x1 + jy1 and z2 = x2 + jy2 . We define equality, addition,

multiplication, and division as follows:

1 $z_1 = z_2$ means that $x_1 = x_2$ and $y_1 = y_2$.

2 $z_1 + z_2 := (x_1 + x_2) + j\,(y_1 + y_2)$

3 $z_1 z_2 := (x_1 x_2 - y_1 y_2) + j\,(x_1 y_2 + x_2 y_1)$

4 $\dfrac{z_1}{z_2} := \left(\dfrac{x_1 x_2 + y_1 y_2}{(x_2)^2 + (y_2)^2}\right) + j\left(\dfrac{x_2 y_1 - x_1 y_2}{(x_2)^2 + (y_2)^2}\right)$

The multiplication and division rules look complicated, but they are designed such that we can simply use our familiar rules for real variables, combined with the rule $j^2 = -1$.

Examples: Suppose $z_1 = 3 + 2j$ and $z_2 = 4 + 5j$. Then

a) $z_1 + z_2 = (3 + 2j) + (4 + 5j) = 7 + 7j$.

b) $z_1 - z_2 = (3 + 2j) - (4 + 5j) = -1 - 3j$.

c) $z_1 z_2 = (3 + 2j)(4 + 5j) = 12 + 8j + 15j + 10j^2 = 2 + 23j$.

d) $\dfrac{z_1}{z_2} = \dfrac{3 + 2j}{4 + 5j} = \left(\dfrac{3 + 2j}{4 + 5j}\right)\left(\dfrac{4 - 5j}{4 - 5j}\right) = \dfrac{(12 + 10) + j\,(8 - 15)}{16 - 25j^2} = \dfrac{22 - 7j}{41}$.
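Python happens to use the engineering convention: a complex literal is written with a trailing `j`, so these four examples can be checked directly. The function `divide` below is a hypothetical helper that applies rule 4 term by term, for comparison with the built-in division.

```python
z1 = 3 + 2j
z2 = 4 + 5j

print(z1 + z2)   # sum
print(z1 - z2)   # difference
print(z1 * z2)   # product
print(z1 / z2)   # quotient

def divide(z1, z2):
    """Quotient z1/z2 computed from rule 4 (real and imaginary parts separately)."""
    x1, y1 = z1.real, z1.imag
    x2, y2 = z2.real, z2.imag
    denom = x2 ** 2 + y2 ** 2
    return complex((x1 * x2 + y1 * y2) / denom, (x2 * y1 - x1 * y2) / denom)

print(divide(z1, z2))
print(z1.conjugate())  # the complex conjugate of z1
```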
There are also some useful rules for working with the complex conjugate:

5 $(z^*)^* = z$

6 $z + z^* = 2\,\mathrm{Re}(z)$, so $\mathrm{Re}(z) = \frac{1}{2}(z + z^*)$

7 $z - z^* = 2j\,\mathrm{Im}(z)$, so $\mathrm{Im}(z) = \frac{1}{2j}(z - z^*)$

8 $z z^* = (x + jy)(x - jy) = x^2 + y^2$

Graphical Representation

We can represent z = x + jy graphically by plotting it as the ordered pair (x, y) in the complex

plane:

[Figure 80: the number 3 + 2j plotted in the complex plane, with horizontal axis Re(z) and vertical axis Im(z)]

This turns out to be a very useful idea. For one thing, you can now see why the “imaginary

part” is defined to be the real number y; in this graphical interpretation j is nothing more

than a placeholder. Also, we can see that addition and subtraction of complex numbers is

analogous to addition and subtraction of vectors. Even more importantly, it opens the door

to an alternative representation; we can express complex numbers in polar coordinates!

Polar and Exponential Forms

Recall that we can move from Cartesian coordinates to polar coordinates by letting $x = r\cos\theta$ and $y = r\sin\theta$. Therefore, we may write [25]

\[
\begin{aligned}
z &= x + jy \\
&= r\cos\theta + jr\sin\theta \\
&= r\,(\cos\theta + j\sin\theta).
\end{aligned}
\]

[Figure 81: the number 3 + 2j = r(cos θ + j sin θ) in the complex plane, with r = √13 and θ = arctan(2/3)]

The number $r$ is called the modulus of $z$, and we’ll also write it as $|z|$.

The number $\theta$ is called the argument of $z$, and we may write it as $\arg(z)$.

You should be able to see that

\[ r = |z| = \sqrt{x^2 + y^2} \]

and

\[ \tan\theta = \frac{y}{x}. \]

[25] We have pointed out that the ISO standard for polar coordinates uses (ρ, φ) instead of (r, θ), but r and θ are still much more common in the context of complex numbers. Strictly speaking, we should be using |z| and arg z, but who wants to write those out every time?

To solve for θ, of course, we need to know which quadrant z is in. If Re (z) > 0 then $\theta = \tan^{-1}(y/x)$, while if Re (z) < 0 we have $\theta = \tan^{-1}(y/x) + \pi$. Of course, we could use $\tan^{-1}(y/x) - \pi$ instead; every complex number has infinitely many polar representations since we can always add or subtract multiples of 2π to the argument.

Now, if we try multiplying two complex numbers expressed in polar form, something sur-

prising happens. Let z1 = r1 (cos θ1 + j sin θ1 ) and let z2 = r2 (cos θ2 + j sin θ2 ). Then

\[
\begin{aligned}
z_1 z_2 &= r_1(\cos\theta_1 + j\sin\theta_1)\,r_2(\cos\theta_2 + j\sin\theta_2) \\
&= r_1 r_2\left[(\cos\theta_1\cos\theta_2 - \sin\theta_1\sin\theta_2) + j\,(\sin\theta_1\cos\theta_2 + \sin\theta_2\cos\theta_1)\right] \\
&= r_1 r_2\left[\cos(\theta_1 + \theta_2) + j\sin(\theta_1 + \theta_2)\right].
\end{aligned}
\]

That is, when we multiply complex numbers, the moduli get multiplied and the arguments get added! This might look familiar; compare it to what happens when we multiply two real-valued expressions of the form $ae^x$:

\[ (a_1 e^{x_1})(a_2 e^{x_2}) = a_1 a_2 e^{x_1 + x_2}. \]

This suggests a way to define the exponential function for imaginary numbers:

Euler’s Formula:

\[ e^{j\theta} = \cos\theta + j\sin\theta \]

This is one of the most important formulas in all of mathematics. The special case where θ = π is particularly elegant: we find that $e^{j\pi} = -1$, so $e^{j\pi} + 1 = 0$. This one simple equation ties together the five most important numbers in mathematics: 0, 1, j, π, and e! Also, Euler’s Formula gives us a third way to express complex numbers, which is equivalent to the polar form but much more concise and easier to use:

\[
\begin{aligned}
z &= x + jy \\
&= r\,(\cos\theta + j\sin\theta) \\
&= re^{j\theta}.
\end{aligned}
\]

The standard form x + jy will still be the easier form to use when we need to add or

subtract complex numbers, but the exponential form can make multiplication and division

much easier.

Example:

\[ \frac{1 + j}{1 - j} = \frac{\sqrt{2}\,e^{j\pi/4}}{\sqrt{2}\,e^{-j\pi/4}} = e^{j\pi/2} = j \]

A comment seems necessary here: you shouldn’t need the conversion formulas to get through this example; just think about the location of the points in the complex plane. For example, $e^{j\pi/2}$ has $r = 1$ and $\theta = \frac{\pi}{2}$, so it lies one unit up the imaginary axis; that’s the number j. Similarly, you should immediately recognize that $e^{j\pi} = -1$, and $e^{j3\pi/2} = -j$. Of course, the numbers don’t always work out so nicely:

Example:

\[
\begin{aligned}
\frac{2 + j}{-3 + 2j}\cdot(3 - j) &= \frac{\sqrt{5}\,e^{j\tan^{-1}(1/2)}}{\sqrt{13}\,e^{j\left(\tan^{-1}(-2/3) + \pi\right)}}\cdot\sqrt{10}\,e^{j\tan^{-1}(-1/3)} \\
&= \sqrt{\frac{50}{13}}\;e^{j\left[\tan^{-1}(1/2) + \tan^{-1}(2/3) - \pi - \tan^{-1}(1/3)\right]} \\
&\approx 1.961\,e^{j(-2.412)} \\
&\approx -1.46 - 1.31j
\end{aligned}
\]
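Both examples can be reproduced with Python's standard `cmath` module, where `cmath.polar` returns the pair (modulus, argument), `cmath.phase` returns the argument in $(-\pi, \pi]$, and `cmath.rect` converts back to Cartesian form. The variable names below are assumptions made for this sketch.

```python
import cmath

# first example: (1 + j) / (1 - j)
print((1 + 1j) / (1 - 1j))

# second example, computed directly in Cartesian form
z = (2 + 1j) / (-3 + 2j) * (3 - 1j)
r, theta = cmath.polar(z)
print(z, r, theta)

# the same result assembled from moduli and arguments:
# moduli multiply/divide, arguments add/subtract
r_polar = abs(2 + 1j) / abs(-3 + 2j) * abs(3 - 1j)
theta_polar = cmath.phase(2 + 1j) - cmath.phase(-3 + 2j) + cmath.phase(3 - 1j)
print(cmath.rect(r_polar, theta_polar))
```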

Euler’s Formula can be used to extend the definitions of some familiar functions to the

world of complex numbers.

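• We define $e^z = e^{x + jy} = e^x e^{jy}$. So, for example, $e^{1 + j\pi/2} = e\,e^{j\pi/2} = ej$.

• Since $e^{j\theta} = \cos\theta + j\sin\theta$, replacing θ with −θ gives $e^{-j\theta} = \cos\theta - j\sin\theta$. Adding the two expressions together and dividing by 2 gives the cosine, while subtracting the second from the first and dividing by 2j gives the sine:

\[ \cos\theta = \frac{1}{2}\left(e^{j\theta} + e^{-j\theta}\right), \quad\text{and}\quad \sin\theta = \frac{1}{2j}\left(e^{j\theta} - e^{-j\theta}\right). \]

• The structures of the two formulas above might look familiar. In fact, if we set θ = jx, we discover that

\[
\begin{aligned}
\cos(jx) &= \frac{1}{2}\left(e^{-x} + e^{x}\right) \\
&= \frac{1}{2}\left(e^{x} + e^{-x}\right) \\
&= \cosh x,
\end{aligned}
\]

and

\[
\begin{aligned}
\sin(jx) &= \frac{1}{2j}\left(e^{-x} - e^{x}\right) \\
&= -\frac{1}{2j}\left(e^{x} - e^{-x}\right) \\
&= \frac{-1}{j}\sinh x \\
&= j\sinh x
\end{aligned}
\]

(using the fact that $\frac{-1}{j} = \frac{-j}{j^2} = j$). We use these as our definitions of the cosines and sines of imaginary numbers, and combining them with the sum-of-angle identities allows us to define cosines and sines of complex numbers.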

Example:

cos (2 + 3j) = cos (2) cos (3j) − sin (2) sin (3j)

= cos 2 cosh 3 − j sin 2 sinh 3

≈ −4.19 − 9.11j
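This value can be compared against a library implementation. In the sketch below, `cos_complex` is a hypothetical helper that applies the identity just used, $\cos(x + jy) = \cos x\cosh y - j\sin x\sinh y$, and `cmath.cos` is the standard-library complex cosine.

```python
import cmath
import math

def cos_complex(x, y):
    """cos(x + jy) built from real functions: cos x cosh y - j sin x sinh y."""
    return complex(math.cos(x) * math.cosh(y), -math.sin(x) * math.sinh(y))

by_formula = cos_complex(2, 3)
by_library = cmath.cos(2 + 3j)
print(by_formula, by_library)
```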

De Moivre’s Theorem:

Euler’s Formula gives us a quick and easy way to compute powers $z^n$:

\[
z^n = \left(re^{j\theta}\right)^n = r^n e^{jn\theta} = r^n\left[\cos(n\theta) + j\sin(n\theta)\right].
\]


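Example: Compute $\left(\frac{1}{2} + \frac{1}{2}j\right)^{10}$.

Solution: We have $z = \frac{1}{2}(1+j)$, for which $r = \frac{1}{2}\sqrt{2} = \frac{1}{\sqrt{2}}$ and $\theta = \tan^{-1}(1) = \pi/4$. So, in exponential form, $z = \frac{1}{\sqrt{2}}e^{j\pi/4}$, and so

\[
\begin{aligned}
\left(\frac{1}{2} + \frac{1}{2}j\right)^{10} &= \left(\frac{1}{\sqrt{2}}e^{j\pi/4}\right)^{10} \\
&= \frac{1}{2^5}e^{j5\pi/2} \\
&= \frac{1}{32}\left[\cos\left(\frac{5\pi}{2}\right) + j\sin\left(\frac{5\pi}{2}\right)\right] \\
&= \frac{1}{32}j.
\end{aligned}
\]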

Complex Roots

De Moivre’s Theorem can be modified for fractional powers. Before we explore this, though, we

need another comment about notation. For real numbers, the expression $\sqrt{x}$ is understood to

represent the positive square root of x, and it is undefined when x is negative. More generally,

there may be 0, 1, or 2 nth roots of a real number, depending on whether n is even or odd, and

whether the real number is positive or negative. Whenever there are two roots, the notation

$x^{1/n}$ is always understood to represent the positive one. However, when we discuss complex

numbers, the words “positive” and “negative” have no meaning (all we can do is describe the

real and imaginary parts as being either positive or negative). So, in the context of complex
numbers, we consider the expressions $\sqrt{z}$ and $z^{1/n}$ to be multivalued; $\sqrt{z}$ has two values, and $z^{1/n}$ has even more (as we’re about to discuss).

Now, if we try to apply De Moivre’s Theorem using a fractional power, we end up with

this:

\[
z^{1/n} = r^{1/n}e^{j\theta/n} = r^{1/n}\left[\cos(\theta/n) + j\sin(\theta/n)\right].
\]

However, this only gives one root, even in cases where we already know that there should be two (e.g. for $1^{1/2}$ we get 1, but not $-1$). To find the rest, we need to remember that the polar and exponential forms of $z$ are not unique; if $z = re^{j\theta}$, then $z = re^{j(\theta+2k\pi)}$, for every integer $k$. Using this expression for $z$, we discover that there are $n$ distinct values of $z^{1/n}$, given by

\[
\begin{aligned}
w_k = z^{1/n} &= \left(re^{j(\theta+2k\pi)}\right)^{1/n} \\
&= r^{1/n}e^{j(\theta+2k\pi)/n} \\
&= r^{1/n}\left[\cos\left(\frac{\theta}{n} + \frac{2k\pi}{n}\right) + j\sin\left(\frac{\theta}{n} + \frac{2k\pi}{n}\right)\right],
\end{aligned}
\]

where $k = 0, 1, 2, \ldots, n-1$ (when $k$ reaches $n$ we have $r^{1/n}e^{j(\theta/n+2\pi)} = r^{1/n}e^{j\theta/n}$, so we begin to cycle through the same roots again).

Notice that all of the roots have the same modulus, and the arguments differ by multiples of $2\pi/n$. That is, the roots $w_k$ are all located on the circle $|w| = |z|^{1/n}$, and are evenly spaced around it!

Example: Find the six sixth roots of −8.

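Solution: In exponential form, $-8 = 8e^{j\pi}$ (that is, $r = 8$ and $\theta = \pi$). However, we can also write it as $8e^{j(\pi+2k\pi)}$, for any integer $k$.

Therefore $(-8)^{1/6} = 8^{1/6}e^{j(\pi+2k\pi)/6} = \sqrt{2}\,e^{j\left(\frac{\pi}{6}+\frac{k\pi}{3}\right)}$. Listing them individually, we have

\[
\begin{aligned}
w_0 &= \sqrt{2}\,e^{j\pi/6} = \sqrt{2}\left(\cos\tfrac{\pi}{6} + j\sin\tfrac{\pi}{6}\right) = \sqrt{2}\left(\tfrac{\sqrt{3}}{2} + \tfrac{1}{2}j\right) \\
w_1 &= \sqrt{2}\,e^{j\left(\frac{\pi}{6}+\frac{\pi}{3}\right)} = \sqrt{2}\,e^{j\pi/2} = \sqrt{2}\,j \\
w_2 &= \sqrt{2}\,e^{j\left(\frac{\pi}{6}+\frac{2\pi}{3}\right)} = \sqrt{2}\left(\cos\tfrac{5\pi}{6} + j\sin\tfrac{5\pi}{6}\right) = \sqrt{2}\left(-\tfrac{\sqrt{3}}{2} + \tfrac{1}{2}j\right) = \tfrac{1}{\sqrt{2}}\left(-\sqrt{3} + j\right) \\
w_3 &= \sqrt{2}\,e^{j\left(\frac{\pi}{6}+\frac{3\pi}{3}\right)} = \sqrt{2}\left(\cos\tfrac{7\pi}{6} + j\sin\tfrac{7\pi}{6}\right) = \sqrt{2}\left(-\tfrac{\sqrt{3}}{2} - \tfrac{1}{2}j\right) = \tfrac{1}{\sqrt{2}}\left(-\sqrt{3} - j\right) \\
w_4 &= \sqrt{2}\,e^{j\left(\frac{\pi}{6}+\frac{4\pi}{3}\right)} = \sqrt{2}\,e^{j3\pi/2} = -\sqrt{2}\,j \\
w_5 &= \sqrt{2}\,e^{j\left(\frac{\pi}{6}+\frac{5\pi}{3}\right)} = \sqrt{2}\left(\cos\tfrac{11\pi}{6} + j\sin\tfrac{11\pi}{6}\right) = \sqrt{2}\left(\tfrac{\sqrt{3}}{2} - \tfrac{1}{2}j\right) = \tfrac{1}{\sqrt{2}}\left(\sqrt{3} - j\right)
\end{aligned}
\]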

Figure 82: [Plot of the six roots $w_0, \ldots, w_5$ in the complex plane; horizontal axis Re(z), vertical axis Im(z).]

Example: Let’s just make sure that our knowledge of real numbers fits into this new way

of looking at roots. What happens if we look for the square roots of 4?

Solution: In exponential form, 4 is just 4... but it can also be written as $4e^{j2k\pi}$. Therefore $\sqrt{4} = (4)^{1/2} = 2e^{j(2k\pi)/2} = 2e^{jk\pi}$. Letting $k = 0$ and $k = 1$, we obtain $w_0 = 2$ and $w_1 = 2e^{j\pi} = -2$, as expected! And indeed, the two points are equally spaced on the circle of radius 2.
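The root formula $w_k = r^{1/n}e^{j(\theta+2k\pi)/n}$ is easy to try out numerically. The following is a minimal sketch (the function name `nth_roots` is illustrative, not part of the notes) that lists all $n$ roots and reproduces both examples above.

```python
import cmath
import math

def nth_roots(z, n):
    """Return the n values of z^(1/n): w_k = r^(1/n) e^{j(theta + 2 k pi)/n}."""
    r, theta = cmath.polar(z)
    return [cmath.rect(r ** (1.0 / n), (theta + 2 * math.pi * k) / n)
            for k in range(n)]

roots_of_minus_8 = nth_roots(-8, 6)  # the six sixth roots of -8
roots_of_4 = nth_roots(4, 2)         # the two square roots of 4

for k, w in enumerate(roots_of_minus_8):
    # Every root has modulus sqrt(2); arguments step by 2*pi/6
    print(f"w{k} = {w:.4f}, |w| = {abs(w):.4f}")
```

Raising each returned root to the sixth power should give $-8$ again, up to rounding error, which is a convenient way to check a hand calculation.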

Some Simple Applications

Euler’s Formula and the resulting formulas for the sine and cosine functions can be useful for

those of us who dislike working with trigonometric functions; sometimes we can avoid them

and use complex exponentials instead.

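Example: Find an identity involving $\cos^3\theta$.

Solution: We know that $\cos\theta$ can be expressed as $\frac{1}{2}\left(e^{j\theta} + e^{-j\theta}\right)$. Therefore

\[
\begin{aligned}
\cos^3\theta &= \frac{1}{8}\left(e^{j\theta} + e^{-j\theta}\right)^3 \\
&= \frac{1}{8}\left(e^{3j\theta} + 3e^{j\theta} + 3e^{-j\theta} + e^{-3j\theta}\right) \quad\text{(using the binomial expansion)} \\
&= \frac{1}{8}\left(e^{3j\theta} + e^{-3j\theta}\right) + \frac{3}{8}\left(e^{j\theta} + e^{-j\theta}\right) \\
&= \frac{1}{4}\left(\frac{e^{3j\theta} + e^{-3j\theta}}{2}\right) + \frac{3}{4}\left(\frac{e^{j\theta} + e^{-j\theta}}{2}\right) \\
&= \frac{1}{4}\cos 3\theta + \frac{3}{4}\cos\theta.
\end{aligned}
\]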
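An identity like this is easy to spot-check numerically before trusting it. The sketch below evaluates $\cos^3\theta$ through complex exponentials and compares it with $\frac{1}{4}\cos 3\theta + \frac{3}{4}\cos\theta$ at a range of sample angles; the function names and the choice of sample angles are illustrative assumptions.

```python
import cmath
import math

def cos_cubed_via_exponentials(theta):
    """cos^3(theta), computed from cos(theta) = (e^{j theta} + e^{-j theta}) / 2."""
    cos_theta = (cmath.exp(1j * theta) + cmath.exp(-1j * theta)) / 2
    return (cos_theta ** 3).real

def cos_cubed_via_identity(theta):
    """Right-hand side of the identity: (1/4) cos(3 theta) + (3/4) cos(theta)."""
    return 0.25 * math.cos(3 * theta) + 0.75 * math.cos(theta)

sample_angles = [0.1 * k for k in range(-50, 51)]
max_gap = max(abs(cos_cubed_via_exponentials(t) - cos_cubed_via_identity(t))
              for t in sample_angles)

print("largest discrepancy:", max_gap)  # should be at rounding-error level
```

A numerical check of this kind is not a proof, but it catches sign and coefficient slips in the algebra quickly.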
Example: Evaluate $\int e^x \cos 2x\,dx$.

Solution: Using the traditional methods, we would need to do two iterations of integration by parts here. However, if we recognize $\cos 2x$ as being the real part of $e^{2jx}$, we can proceed this way:

\[
\int e^x \cos 2x\,dx = \int e^x\,\mathrm{Re}\left(e^{2jx}\right)dx = \mathrm{Re}\left(\int e^x e^{2jx}\,dx\right)
\]

Now,

\[
\begin{aligned}
\int e^x e^{2jx}\,dx &= \int e^{(1+2j)x}\,dx \\
&= \frac{e^{(1+2j)x}}{1+2j} + C \quad\text{(where $C$ is a complex constant)} \\
&= \frac{(1-2j)}{5}e^{(1+2j)x} + C \\
&= \frac{1}{5}\left[\left(e^x\cos 2x + je^x\sin 2x\right)(1-2j)\right] + C.
\end{aligned}
\]

All that is left is to write down the real part of this; we can conclude that

\[
\int e^x \cos 2x\,dx = \frac{1}{5}\left(e^x\cos 2x + 2e^x\sin 2x\right) + C.
\]

You might not be convinced that this is easier, but it does demonstrate the surprising fact

that complex numbers can be used to solve problems involving real-valued variables! We’ll

look at a more specific application in the next section.
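The antiderivative found for $\int e^x\cos 2x\,dx$ can be checked in two ways: it should equal the real part of $e^{(1+2j)x}/(1+2j)$, and its derivative should return the integrand. The sketch below does both numerically; the central-difference helper and the sample points are illustrative assumptions, not part of the notes.

```python
import cmath
import math

def antiderivative(x):
    """The result from the example (taking C = 0)."""
    return (math.exp(x) * math.cos(2 * x) + 2 * math.exp(x) * math.sin(2 * x)) / 5

def complex_route(x):
    """Real part of the complex antiderivative e^{(1+2j)x} / (1+2j)."""
    return (cmath.exp((1 + 2j) * x) / (1 + 2j)).real

def integrand(x):
    return math.exp(x) * math.cos(2 * x)

def central_difference(f, x, h=1e-5):
    """Numerical estimate of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

sample_points = [-1.0, 0.0, 0.5, 2.0]
for x in sample_points:
    print(x, antiderivative(x), complex_route(x),
          central_difference(antiderivative, x), integrand(x))
```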

35 An Application of Complex Numbers: Impedance

It is known from experiment that when a current of i (t) = I sin (ωt) passes through a resistor

of resistance R the voltage drop across the resistor is vR (t) = IR sin (ωt). This can be stated

more concisely as

v = iR,

which is the famous “Ohm’s Law”. However, the relationship between current and voltage is

slightly more complicated when capacitors or inductors are involved. When the same current

$i(t) = I\sin(\omega t)$ passes through a capacitor of capacitance $C$ the voltage drop across the capacitor is $v_C(t) = \frac{I}{\omega C}\sin\left(\omega t - \frac{\pi}{2}\right)$, and when it passes through an inductor of inductance $L$ we find that $v_L(t) = \omega L I\sin\left(\omega t + \frac{\pi}{2}\right)$. Even though these formulas involve phase shifts, it is possible to express them in a form similar to Ohm’s Law, if we take advantage of complex numbers!

Recall that

\[
e^{j\theta} = \cos\theta + j\sin\theta
\]

(Euler’s Formula). This means that

\[
e^{j\omega t} = \cos(\omega t) + j\sin(\omega t)
\]

and so the current $i(t) = I\sin(\omega t)$ can be expressed as

\[
i(t) = I\cdot\mathrm{Im}\left(e^{j\omega t}\right).
\]

In fact, as a mathematical trick, we will pretend that the current is a complex quantity; we will write $\tilde{i}(t) = Ie^{j\omega t}$.

What does this mean?

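• Each value of $t$ gives a point in the complex plane, on the circle of radius $I$.

Figure 83: [The circle of radius $I$ traced by $\tilde{i}(t) = Ie^{j\omega t}$ in the complex plane, with labelled points $\tilde{i}(0) = \tilde{i}(2\pi/\omega) = \tilde{i}(4\pi/\omega) = I$, $\tilde{i}(\pi/2\omega) = Ie^{j\omega(\pi/2\omega)} = Ie^{j\pi/2} = Ij$, $\tilde{i} = -I$, and $\tilde{i} = -Ij$.]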

• We can picture ĩ (t) as a point travelling around the circle.

• The actual current is given by the imaginary part of the complex one; we just consider

what the y-coordinate does!

Now consider our three rules again; how should they be expressed for the complex current $\tilde{i}(t) = Ie^{j\omega t}$? We just replace $\sin(\theta)$ with $e^{j\theta}$, everywhere:

Resistors: $v(t) = IR\sin(\omega t)$ becomes $\tilde{v}(t) = IRe^{j\omega t}$. So, Ohm’s Law is unchanged:

\[
\tilde{v} = R\,\tilde{i}.
\]

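Capacitors: $v(t) = \frac{I}{\omega C}\sin\left(\omega t - \frac{\pi}{2}\right)$ becomes

\[
\begin{aligned}
\tilde{v}(t) &= \frac{I}{\omega C}e^{j\left(\omega t - \frac{\pi}{2}\right)} \\
&= \frac{I}{\omega C}e^{-j\pi/2}e^{j\omega t} \\
&= \frac{-jI}{\omega C}e^{j\omega t} \\
&= \frac{I}{j\omega C}e^{j\omega t} \quad\left(\text{using } \frac{1}{j} = -j\right)
\end{aligned}
\]

But now this is just a multiple of the current! We have arrived at a version of Ohm’s Law for capacitors:

\[
\tilde{v} = \frac{1}{j\omega C}\,\tilde{i}
\]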

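Inductors: $v(t) = \omega L I\sin\left(\omega t + \frac{\pi}{2}\right)$ becomes

\[
\begin{aligned}
\tilde{v}(t) &= \omega L I e^{j\left(\omega t + \frac{\pi}{2}\right)} \\
&= \omega L I e^{j\pi/2}e^{j\omega t} \\
&= j\omega L I e^{j\omega t}
\end{aligned}
\]

and we now have a version of Ohm’s Law for inductors as well:

\[
\tilde{v} = j\omega L\,\tilde{i}
\]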

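We can summarize the various versions of Ohm’s Law this way:

\[
\tilde{v} = Z\,\tilde{i}, \quad\text{where}\quad Z =
\begin{cases}
R & \text{for resistors} \\[4pt]
\dfrac{1}{j\omega C} & \text{for capacitors} \\[8pt]
j\omega L & \text{for inductors.}
\end{cases}
\]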

All three component types represent opposition to the flow of electricity, so we call this new

quantity Z the impedance of the circuit (or complex impedance).
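The three impedance rules are simple enough to express as small functions. The sketch below is one possible illustration (all function names and the numerical values are hypothetical choices, not from the notes); it also confirms that taking the imaginary part of $Z\tilde{i}$ reproduces the phase-shifted sine formulas for capacitors and inductors.

```python
import cmath
import math

def impedance_resistor(R):
    """Z = R for a resistor (independent of frequency)."""
    return complex(R, 0.0)

def impedance_capacitor(C, omega):
    """Z = 1 / (j omega C) for a capacitor."""
    return 1 / (1j * omega * C)

def impedance_inductor(L, omega):
    """Z = j omega L for an inductor."""
    return 1j * omega * L

def actual_voltage(Z, I, omega, t):
    """Imaginary part of the complex voltage Z * I e^{j omega t}."""
    return (Z * I * cmath.exp(1j * omega * t)).imag

# Arbitrary illustrative values
I, omega, C, L, t = 2.0, 3.0, 0.1, 0.5, 0.7

v_cap_complex = actual_voltage(impedance_capacitor(C, omega), I, omega, t)
v_cap_direct = I / (omega * C) * math.sin(omega * t - math.pi / 2)

v_ind_complex = actual_voltage(impedance_inductor(L, omega), I, omega, t)
v_ind_direct = omega * L * I * math.sin(omega * t + math.pi / 2)

print(v_cap_complex, v_cap_direct)
print(v_ind_complex, v_ind_direct)
```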

Components Combined in Series or in Parallel

You may be familiar with the fact that if a current passes through several resistors connected in

series, then we can treat that section of the circuit as a single resistor, and the total resistance

is simply the sum of the individual ones. On the other hand, if the resistors are connected in

parallel, then it is the reciprocals of the resistances which combine.

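Figure 84: [Resistors $R_1$, $R_2$, $R_3$ between terminals A and B. Left: connected in series, $R = R_1 + R_2 + R_3$. Right: connected in parallel, $\frac{1}{R} = \frac{1}{R_1} + \frac{1}{R_2} + \frac{1}{R_3}$.]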

Fortunately, capacitances and inductances combine in the same way (at least, if we view

1/C as the quantity of interest for capacitors instead of C). Furthermore, now that we have

generalized Ohm’s Law, we can treat resistors, capacitors, and inductors as if they were all

the same thing; impedances combine in the same way as resistances!

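\[
\begin{array}{ll}
\text{Components in Series:} & \text{Components in Parallel:} \\[6pt]
R = R_1 + R_2 + R_3 + \ldots & \dfrac{1}{R} = \dfrac{1}{R_1} + \dfrac{1}{R_2} + \dfrac{1}{R_3} + \ldots \\[10pt]
\dfrac{1}{C} = \dfrac{1}{C_1} + \dfrac{1}{C_2} + \dfrac{1}{C_3} + \ldots & C = C_1 + C_2 + C_3 + \ldots \\[10pt]
L = L_1 + L_2 + L_3 + \ldots & \dfrac{1}{L} = \dfrac{1}{L_1} + \dfrac{1}{L_2} + \dfrac{1}{L_3} + \ldots \\[10pt]
Z = Z_1 + Z_2 + Z_3 + \ldots & \dfrac{1}{Z} = \dfrac{1}{Z_1} + \dfrac{1}{Z_2} + \dfrac{1}{Z_3} + \ldots
\end{array}
\]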
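The last row of the table is the only rule that needs to be coded; the rules for $R$, $C$, and $L$ then follow from it. The sketch below (with hypothetical helper names `series` and `parallel`, and arbitrary capacitor values) checks that combining capacitor impedances reproduces the capacitance rules in the table.

```python
def series(*impedances):
    """Series combination: Z = Z1 + Z2 + Z3 + ..."""
    return sum(impedances)

def parallel(*impedances):
    """Parallel combination: 1/Z = 1/Z1 + 1/Z2 + 1/Z3 + ..."""
    return 1 / sum(1 / z for z in impedances)

def capacitor(C, omega):
    """Impedance of a capacitor, Z = 1 / (j omega C)."""
    return 1 / (1j * omega * C)

# Arbitrary illustrative values
omega, C1, C2 = 2.0, 0.1, 0.4

# Capacitors in series: the table says 1/C = 1/C1 + 1/C2
z_series = series(capacitor(C1, omega), capacitor(C2, omega))
z_series_from_table = capacitor(1 / (1 / C1 + 1 / C2), omega)

# Capacitors in parallel: the table says C = C1 + C2
z_parallel = parallel(capacitor(C1, omega), capacitor(C2, omega))
z_parallel_from_table = capacitor(C1 + C2, omega)

print(z_series, z_series_from_table)
print(z_parallel, z_parallel_from_table)
```

Because Python's complex type handles the arithmetic, the same two helpers work for any mixture of resistors, capacitors, and inductors.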

Examples:

1. Suppose a current i (t) = I sin t passes through the circuit illustrated:

Figure 85: [A resistor ($R = 5\ \Omega$) and a capacitor ($C = 0.1$ F) connected in series between terminals A and B.]

Find v (t): the difference in electric potential (the voltage drop) between terminals A and B.

Note: 0.1 farads is a rather large value for a capacitor, but we won’t concern ourselves

with realism too much here; the goal is to illustrate the method.

Solution: The complex current must be $\tilde{i}(t) = Ie^{jt}$ (so that the imaginary part is $i(t)$).

The impedance of the resistor is $Z_R = R = 5$.

The impedance of the capacitor is $Z_C = \dfrac{1}{j\omega C} = \dfrac{1}{0.1j} = -10j$.

The total impedance is therefore $Z = Z_R + Z_C = 5 - 10j$, and so the complex voltage is

\[
\begin{aligned}
\tilde{v}(t) = Z\,\tilde{i} &= (5 - 10j)\,Ie^{jt} \\
&= I(5 - 10j)(\cos t + j\sin t) \\
&= I\left[(5\cos t + 10\sin t) + j(5\sin t - 10\cos t)\right].
\end{aligned}
\]

The actual voltage drop is the imaginary part of this:

\[
v(t) = \mathrm{Im}\left(\tilde{v}(t)\right) = I(5\sin t - 10\cos t)
\]

(which we could rewrite as $v(t) = 5\sqrt{5}\,I\sin(t - 1.107)$).

2. Suppose a current i (t) = I sin 4t passes through the circuit illustrated:

Figure 86: [A resistor ($R = 2\ \Omega$) and an inductor ($L = 0.5$ H) connected in parallel between terminals A and B.]

Find the voltage drop between A and B.

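Solution: This time our complex current is $\tilde{i}(t) = Ie^{4jt}$.

The impedance of the resistor is $Z_R = R = 2$, and the impedance of the inductor is $Z_L = j\omega L = j(4)(0.5) = 2j$.

Since they are connected in parallel, the total impedance can be calculated this way:

\[
\begin{aligned}
\frac{1}{Z} &= \frac{1}{Z_1} + \frac{1}{Z_2} \\
&= \frac{1}{2} + \frac{1}{2j} \\
&= \frac{1}{2}(1 - j) \\
\Longrightarrow\quad Z &= \frac{1}{\frac{1}{2}(1-j)} = \left(\frac{2}{1-j}\right)\left(\frac{1+j}{1+j}\right) = 1 + j.
\end{aligned}
\]

Hence the complex voltage is

\[
\begin{aligned}
\tilde{v}(t) = Z\,\tilde{i} &= (1+j)\,Ie^{4jt} \\
&= I(1+j)(\cos 4t + j\sin 4t) \\
&= I\left[(\cos 4t - \sin 4t) + j(\cos 4t + \sin 4t)\right]
\end{aligned}
\]

and so the actual voltage is

\[
\begin{aligned}
v(t) = \mathrm{Im}\left(\tilde{v}(t)\right) &= I(\cos 4t + \sin 4t) \\
&= \sqrt{2}\,I\sin\left(4t + \frac{\pi}{4}\right).
\end{aligned}
\]

Comment: Since the number $1 - j$ is recognizable as $\sqrt{2}\,e^{-j\pi/4}$, these calculations are much easier in exponential form:

\[
\begin{aligned}
Z &= \frac{1}{\frac{1}{2}(1-j)} = \frac{2}{\sqrt{2}\,e^{-j\pi/4}} = \sqrt{2}\,e^{j\pi/4} = 1 + j \\
\Longrightarrow\quad \tilde{v}(t) &= \sqrt{2}\,e^{j\pi/4}e^{4jt}I \\
&= \sqrt{2}\,Ie^{j(4t+\pi/4)} \\
\Longrightarrow\quad v(t) &= \sqrt{2}\,I\sin\left(4t + \frac{\pi}{4}\right).
\end{aligned}
\]

Note in particular that this puts the solution directly into the desired amplitude/phase form!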

3. Here’s a slightly more complicated circuit:

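Figure 87: [Between terminals A and B: a resistor ($R = 5\ \Omega$) and a capacitor ($C = 0.1$ F) connected in parallel, followed in series by an inductor ($L = 0.5$ H).]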

Again, the problem is to find the voltage drop between terminals A and B, given an

alternating current. Let’s say i (t) = I sin 2t.

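Solution: The impedances of the resistor, capacitor, and inductor are $Z_R = 5$, $Z_C = \dfrac{1}{j\omega C} = \dfrac{1}{0.2j} = \dfrac{5}{j} = -5j$, and $Z_L = j\omega L = j$.

To determine how they combine, we first consider the part of the circuit which is constructed in parallel. We can calculate an impedance for this section; let’s call it $Z_{RC}$:

\[
\begin{aligned}
\frac{1}{Z_{RC}} &= \frac{1}{Z_R} + \frac{1}{Z_C} \\
&= \frac{1}{5} + \frac{j}{5} = \frac{1}{5}(1 + j) = \frac{\sqrt{2}}{5}e^{j\pi/4} \\
\Longrightarrow\quad Z_{RC} &= \frac{5}{\sqrt{2}}e^{-j\pi/4} = \frac{5}{2}(1 - j)
\end{aligned}
\]

Now we simply have two components connected in series, and the net impedance is

\[
\begin{aligned}
Z &= Z_{RC} + Z_L \\
&= \frac{5}{2}(1 - j) + j \\
&= \frac{5}{2} - \frac{3}{2}j.
\end{aligned}
\]

The complex voltage is therefore

\[
\begin{aligned}
\tilde{v}(t) = Z\,\tilde{i}(t) &= \left(\frac{5 - 3j}{2}\right)Ie^{j2t} \\
&= \frac{\sqrt{34}\,I}{2}e^{-j\tan^{-1}(3/5)}e^{j2t} \\
&= \frac{\sqrt{34}\,I}{2}e^{j\left(2t - \tan^{-1}(3/5)\right)}
\end{aligned}
\]

and the actual voltage is the imaginary part:

\[
\begin{aligned}
v(t) &= \frac{\sqrt{34}}{2}\,I\sin\left(2t - \tan^{-1}(3/5)\right) \\
&\approx 2.91\,I\sin(2t - 0.54)
\end{aligned}
\]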
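The three worked examples above can be reproduced in a few lines. The sketch below assumes the circuit layouts described in the solutions (series in example 1, parallel in example 2, a parallel pair in series with an inductor in example 3); the helper name `parallel` and the variable names are illustrative. The modulus and argument of each total impedance give the amplitude factor and phase shift of the voltage relative to the current.

```python
import cmath

def parallel(z1, z2):
    """Parallel combination of two impedances: 1/Z = 1/z1 + 1/z2."""
    return 1 / (1 / z1 + 1 / z2)

# Example 1: omega = 1; R = 5 in series with C = 0.1
z_example1 = 5 + 1 / (1j * 1 * 0.1)

# Example 2: omega = 4; R = 2 in parallel with L = 0.5
z_example2 = parallel(2, 1j * 4 * 0.5)

# Example 3: omega = 2; (R = 5 parallel with C = 0.1) in series with L = 0.5
z_example3 = parallel(5, 1 / (1j * 2 * 0.1)) + 1j * 2 * 0.5

for name, z in [("1", z_example1), ("2", z_example2), ("3", z_example3)]:
    amplitude, phase = cmath.polar(z)  # v(t) = amplitude * I * sin(omega t + phase)
    print(f"Example {name}: Z = {z:.4f}, |Z| = {amplitude:.4f}, arg Z = {phase:.4f}")
```

Working from `cmath.polar(Z)` gives the amplitude/phase form of the answer directly, in the same spirit as the comment after example 2.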

A final comment: in one of your second-year courses, you will see another method for solving problems like these, which is based on calculus instead of complex algebra. It is based on the observation that the voltage drop formulas for capacitors and inductors can be expressed as antiderivatives or derivatives of the current. Given

\[
i(t) = I\sin(\omega t),
\]

\[
v_C(t) = \frac{I}{\omega C}\sin\left(\omega t - \frac{\pi}{2}\right),
\]

and

\[
v_L(t) = \omega L I\sin\left(\omega t + \frac{\pi}{2}\right),
\]

we can see that

\[
v_L(t) = \omega L I\cos(\omega t) = LI\frac{d}{dt}(\sin\omega t) = L\frac{di}{dt}
\]

and

\[
v_C(t) = \frac{-I}{\omega C}\cos(\omega t) = \frac{I}{C}\int_a^t \sin(\omega\tau)\,d\tau = \frac{1}{C}\int_a^t i(\tau)\,d\tau
\]

(where the lower limit $a$ is chosen so that $\cos(\omega a) = 0$). These formulas turn out to be valid even when the current is not simply alternating, which makes the resulting method (using differential equations) much more powerful.
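The derivative and integral relationships can be checked numerically for the sinusoidal current. The following is a rough sketch under stated assumptions: the component values are arbitrary, the lower limit $a = \pi/(2\omega)$ is one choice that makes $\cos(\omega a) = 0$, and the finite-difference and midpoint-rule helpers are simple illustrative approximations rather than anything prescribed by the notes.

```python
import math

# Arbitrary illustrative values
I, omega, L, C = 1.5, 3.0, 0.5, 0.1

def current(t):
    return I * math.sin(omega * t)

def v_inductor(t):
    return omega * L * I * math.sin(omega * t + math.pi / 2)

def v_capacitor(t):
    return I / (omega * C) * math.sin(omega * t - math.pi / 2)

def derivative(f, t, h=1e-5):
    """Central-difference estimate of f'(t)."""
    return (f(t + h) - f(t - h)) / (2 * h)

def integral(f, a, b, steps=20000):
    """Midpoint-rule estimate of the integral of f from a to b."""
    h = (b - a) / steps
    return h * sum(f(a + (k + 0.5) * h) for k in range(steps))

a = math.pi / (2 * omega)  # lower limit chosen so that cos(omega * a) = 0

sample_times = [0.0, 0.3, 1.0, 2.5]
for t in sample_times:
    print(t,
          L * derivative(current, t), v_inductor(t),
          integral(current, a, t) / C, v_capacitor(t))
```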
