
Preface

I began writing this text as I taught a brand-new course combining linear algebra and a rigorous
approach to multivariable calculus. Some of the students had already taken a proof-oriented single-
variable calculus course (using Spivak’s beautiful book, Calculus), but many had not: There were
sophomores who wanted a more challenging entrée to higher-level mathematics, as well as freshmen
who’d scored a 5 on the Advanced Placement Calculus BC exam. My goal was to include all
the standard computational material found in the usual linear algebra and multivariable calculus
courses and more, interweaving the material as effectively as possible, and include complete proofs.
Although there have been a number of books that include both the linear algebra and the
calculus material, they have tended to segregate the material. Advanced calculus books treat the
rigorous multivariable calculus, but presume the students have already mastered linear algebra. I
wanted to integrate the material so as to emphasize the recurring theme of implicit versus explicit
that persists in linear algebra and analysis. In every linear algebra course we should learn how to
go back and forth between a system of equations and a parametrization of its solution set. But the
same problem occurs, in principle, in calculus: To solve constrained maximum/minimum problems
we must either parametrize the constraint set or use Lagrange multipliers; to integrate over a curve
or surface, we need a parametric representation. Of course, in the linear case one can globally
go back and forth; it’s not so easy in the nonlinear case, but, as we’ll learn, it should at least be
possible in principle locally.
The prerequisites for this book are a solid background in single-variable calculus and, if not some
experience writing proofs, a strong interest in grappling with them. In presenting the material, I
have included plenty of examples, clear proofs, and significant motivation for the crucial concepts.
We all know that to learn (and enjoy?) mathematics one must work lots of problems, from the
routine to the more challenging. To this end, I have provided numerous exercises of varying levels
of difficulty, both computational and more proof-oriented. Some of the proof exercises require the
student “merely” to understand and modify a proof in the text; others may require a good deal
of ingenuity. I also ask students for lots of examples and counterexamples. Generally speaking,
exercises are arranged in order of increasing difficulty. To offer a bit more guidance, I have marked
with an asterisk (*) those problems to which short answers, hints, or—in some cases—complete
solutions are given at the back of the book. As a guide to the new teacher, I have marked with
a sharp (♯ ) some important exercises to which reference is made later. An Instructors’ Solutions
Manual is available from the publisher.


Comments on Contents

The linear algebraic material with which we begin the course in Chapter 1 is concrete, es-
tablishes the link with geometry, and is a good self-contained setting for working on proofs. We
introduce vectors, dot products, subspaces, and linear transformations and matrix computations.
At this early stage we emphasize the two interpretations of multiplying a matrix A by a vector
x: the linear equations viewpoint (considering the dot products of the rows of A with x) and the
linear combinations viewpoint (taking the linear combination of the columns of A weighted by the
coordinates of x). We end the chapter with a discussion of 2 × 2 and 3 × 3 determinants, area,
volume, and the cross product.
In Chapter 2 we begin to make the transition to calculus, introducing scalar functions of a
vector variable—their graphs and their level sets—and vector-valued functions. We introduce the
requisite language of open and closed sets, sequences, and limits and continuity, including the proofs
of the usual limit theorems. (Generally, however, I give these short shrift in lecture, as I don’t have
the time to emphasize δ-ε arguments.)
We come to the concepts of differential calculus in Chapter 3. We quickly introduce partial and
directional derivatives as immediate to calculate, and then come to the definition of differentiability,
the characterization of differentiable functions, and the standard differentiation rules. We give the
gradient vector its own brief section, in which we emphasize its geometric meaning. Then comes
a section on curves, in which we mention Kepler’s laws (the second is proved in the text and the
other two are left as an exercise), arclength, and curvature of a space curve.
In the first four sections of Chapter 4 we give an accelerated treatment of Gaussian elimination
(including a proof of uniqueness of reduced echelon form) and the theory of linear systems, the
standard material on linear independence and dimension (including a brief mention of abstract
vector spaces), and the four fundamental subspaces associated to a matrix. In the last section, we
begin our assault on the nonlinear case, introducing (with no proofs) the implicit function theorem
and the notion of a manifold.
Chapter 5 is a blend of topology, calculus, and linear algebra—quadratic forms and projections.
We start with the topological notion of compactness and prove the maximum value theorem in
higher dimensions. We then turn to the calculus of applied maximum/minimum problems and then
to the analysis of the second-derivative test and the Hessian. Then comes one of the most important
topics in applications, Lagrange multipliers (with a rigorous proof). In the last section, we return
to linear algebra, to discuss projections (from both the explicit and the implicit approaches), least-
squares solutions of inconsistent systems, the Gram-Schmidt process, and a brief discussion of
abstract inner product spaces (including a nice proof of Lagrange interpolation).
Chapter 6 is a brief, but sophisticated, introduction to the inverse and implicit function theo-
rems. We present our favorite proof using the contraction mapping principle (which is both more
elegant and works just fine in the infinite-dimensional setting). In the last section we prove that all
three definitions of a manifold are (locally) equivalent: the implicit representation, the parametric
representation, and the representation as a graph. (In the year-long course that I teach, I find I
have time to treat this chapter only lightly.)
In Chapter 7 we study the multidimensional (Riemann) integral. In the first two sections we
deal predominantly with the theory of the multiple integral and, then, Fubini’s Theorem and the
computation of iterated integrals. Then we introduce (as is customary in a typical multivariable
calculus course) polar, cylindrical, and spherical coordinates and various physical applications. We
conclude the chapter with a careful treatment of determinants (which will play a crucial role in
Chapters 8 and 9) and a proof of the Change of Variables Theorem.
In single-variable calculus, one of the truly impressive results is the Fundamental Theorem
of Calculus. In Chapter 8 we start by laying the groundwork for the analogous multidimensional
result, introducing differential forms in a very explicit fashion. We then parallel a traditional vector
calculus course, introducing line integrals and Green’s Theorem, surface integrals and flux, and,
then, finally stating and proving the general Stokes’ Theorem for compact oriented manifolds. We
do not skimp on concrete and nontrivial examples throughout. In Section 8.6 we introduce the
standard terminology of divergence and curl and give the “classical” versions of Stokes’ and the
Divergence Theorems, along with some applications to physics. In Section 8.7 we begin to illustrate
the power of Stokes’ Theorem by proving the Fundamental Theorem of Algebra, a special case of
the argument principle, and the “hairy ball theorem” from topology.
In Chapter 9 we complete our study of linear algebra, including standard material on change
of basis (with a geometric slant), eigenvalues, eigenvectors and discussion of diagonalizability. The
remainder of the chapter is devoted to applications: difference and differential equations, and a
brief discussion of flows and their relation to the Divergence Theorem of Chapter 8. We close with
the Spectral Theorem. (With the exception of Section 3.3, which relies on Chapter 8, and the
proof of the Spectral Theorem, which relies on Section 4 of Chapter 5, topics in this chapter can
be covered at any time after completing Chapter 4.)
We have included a glossary of notations and a quick compilation of relevant results from
trigonometry and single-variable calculus (including a short table of integrals), along with a much-
requested list of the Greek alphabet.
There are over 800 exercises in the text, many with multiple parts. Here are a few particularly
interesting (and somewhat unusual) exercises included in this text:

• Exercises 1.2.22–26 and Exercises 1.5.19 and 1.5.20 on the geometry of triangles, and Ex-
ercise 1.5.17, a nice glimpse of affine geometry;
• Exercise 2.1.12, a parametrization of a hyperboloid of one sheet in which the parameter
curves are the two families of rulings
• Exercises 2.3.15–17, 3.1.10, and 3.2.18–19, exploring the infamous sorts of discontinuous
and non-differentiable functions
• Example 3.4.3 introducing the reflectivity property of the ellipse via the gradient, with
follow-ups in Exercises 3.4.8, 3.4.9, and 3.4.13, and then Kepler’s first and third laws in
Exercise 3.5.15.
• Exercise 3.5.14, the famous fact (due to Huygens) that the evolute of a cycloid is a congruent
cycloid
• Exercise 4.5.13, in which we discover that the lines passing through three pairwise-skew
lines generate a saddle surface
• Exercises 5.1.5, 5.1.7, 9.4.11, exploring the (operator) norm of a matrix
• Exercise 5.2.15, introducing the Fermat/Steiner point of a triangle


• Exercises 5.3.2 and 5.3.4, pointing out that a local minimum along every line need not be a local
minimum (an issue that is mishandled in surprisingly many multivariable calculus texts)
and that a lone critical point that is a local minimum may not be a global minimum
• Exercises 5.4.32, 5.4.34, and 9.4.21, giving the interpretation of the Lagrange multiplier,
introducing the bordered Hessian, and giving a proof that the bordered Hessian gives a
sufficient test for constrained critical points
• Exercises 6.1.8 and 6.1.10, giving Kantorovich's Theorem (first in one dimension and then in
higher), a sufficient condition for Newton’s method to converge (a beautiful result I learned
from Hubbard and Hubbard)
• Exercise 6.2.13, introducing the envelope of a family of curves
• Exercise 7.3.24, my favorite triple integral challenge problem
• Exercises 7.4.27 and 7.4.28
• Exercises 7.5.25–27, some nice applications of the determinant
• Exercises 8.3.23, 8.3.25, and 8.3.26, some interesting applications of line integration and
Green’s Theorem
• Exercise 8.5.22, giving a calibrations proof that the minimal surface equation gives surfaces
of least area
• The discussion in Chapter 8, §7, of counting roots (reminiscent of the treatment of winding
numbers and Gauss’s Law in earlier sections) and Exercises 8.7.9 and 9.4.22, in which we
prove that the roots of a complex polynomial depend continuously on its coefficients, and
then derive Sylvester’s Law of Inertia as a corollary
• Exercises 9.1.12 and 9.1.13, some interesting applications of the change-of-basis framework
• Exercises 9.2.19, 9.2.20, 9.2.23, 9.2.24, some more standard but more challenging linear
algebra exercises

Possible Ways to Use this Book

I have been using the text for a number of years in a course for highly motivated freshmen and
sophomores. Since this is the first “serious” course in mathematics for many of them, and because of
time limitations, I must give somewhat short shrift to many of the complicated analytic proofs.
For example, I only have time to talk about the Inverse and Implicit Function Theorems and to
sketch the proof of the Change of Variables Theorem, and don’t include all the technical aspects
of the proof of Stokes’ Theorem. On the other hand, I cover most of the linear algebra material
thoroughly. I do plenty of examples and assign a broad range of homework problems, from the
computational to the more challenging proofs.
It would also be quite appropriate to use the text in courses in advanced calculus or multivariable
analysis. Depending on the students’ background, I might bypass the linear algebra material or
assign some of it as review reading and highlight a few crucial results. I would spend more time
on the analytic material (especially in Chapters 3, 6, and 7) and treat Stokes’ Theorem from the
differential form viewpoint very carefully, including the applications in Section 8.7. The approach
of the text will give the students a very hands-on understanding of rather abstract material. In such
courses, I would spend more time in class on proofs and assign a greater proportion of theoretical
homework problems.
Acknowledgments
I would like to thank my students of the past years for enduring preliminary versions of this text
and for all their helpful comments and suggestions. I would like to acknowledge helpful conversations
with my colleagues Malcolm Adams and Jason Cantarella. I would also like to thank the following
reviewers, along with several anonymous referees, who offered many helpful comments:
Quo-Shin Chi, Washington University
Philip B. Yasskin, Texas A&M University
Mohamed Elhamdadi, University of South Florida
I am very grateful to my editor, Laurie Rosatone, for her enthusiastic support, encouragement, and
guidance.
I welcome any comments and suggestions. Please address any e-mail correspondence to
[email protected]
and please keep an eye on
http://www.math.uga.edu/~shifrin/Multivariable.html
or
http://www.wiley.com/college/shifrin
for the latest in typos and corrections.
CHAPTER 1
Vectors and Matrices
Linear algebra provides a beautiful example of the interplay between two branches of mathe-
matics, geometry and algebra. Moreover, it provides the foundations for all of our upcoming work
with calculus, which is based on the idea of approximating the general function locally by a linear
one. In this chapter, we introduce the basic language of vectors, linear functions, and matrices.
We emphasize throughout the symbiotic relation between geometric and algebraic calculations and
interpretations. This is true also of the last section, where we discuss the determinant in two and
three dimensions and define the cross product.

1. Vectors in Rn

A point in Rn is an
 ordered n-tuple of real numbers, written (x1 , . . . , xn ). To it we may associate
x1
 x2 
 
the vector x =  .. , which we visualize geometrically as the arrow pointing from the origin to the
 . 
xn
point. We shall (purposely) use the boldface letter x to denote both the point and the corresponding
vector, as illustrated in Figure 1.1. We denote by 0 the vector all of coordinates are 0, called the
zero vector .

Figure 1.1

More generally, any two points A and B in space determine the arrow pointing from A to B, as shown in Figure 1.2, again specifying a vector that we denote $\overrightarrow{AB}$. We often refer to A as the "tail" of the vector $\overrightarrow{AB}$ and B as its "head." If A = (a1, . . . , an) and B = (b1, . . . , bn), then $\overrightarrow{AB}$ is equal to the vector
$$v = \begin{bmatrix} b_1 - a_1 \\ \vdots \\ b_n - a_n \end{bmatrix},$$
whose tail is at the origin, as indicated in Figure 1.2.

Figure 1.2
The Pythagorean Theorem tells us that when n = 2 the length of the vector x is √(x1² + x2²). A repeated application of the Pythagorean Theorem, as indicated in Figure 1.3, leads to the following

Figure 1.3

Definition. We define the length of the vector
$$x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} \in R^n \quad\text{to be}\quad \|x\| = \sqrt{x_1^2 + x_2^2 + \cdots + x_n^2}.$$
We say x is a unit vector if it has length 1, i.e., if ‖x‖ = 1.

There are two crucial algebraic operations one can perform on vectors, both of which have clear geometric interpretations.

Scalar multiplication: If c is a real number and x = (x1, . . . , xn) is a vector, then we define cx to be the vector
$$cx = \begin{bmatrix} cx_1 \\ cx_2 \\ \vdots \\ cx_n \end{bmatrix}.$$
Note that cx points in either the same direction as x or the opposite direction, depending on whether c > 0 or c < 0, respectively. Thus, multiplication by the real number c simply stretches (or shrinks) the vector by a factor of |c| and reverses its direction when c is negative. Since this is a geometric "change of scale," we refer to the real number c as a scalar
and the multiplication cx as scalar multiplication.
Note that whenever x ≠ 0 we can find a unit vector with the same direction by taking
$$\frac{x}{\|x\|} = \frac{1}{\|x\|}\, x,$$
as shown in Figure 1.4.

Figure 1.4. The unit circle in R^2 and the unit sphere in R^3.
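The length and normalization formulas are easy to try out numerically. The following is a small illustrative sketch in Python (the helper names are our own, not part of the text):

```python
import math

def length(x):
    """Euclidean length ||x|| = sqrt(x1^2 + ... + xn^2)."""
    return math.sqrt(sum(xi * xi for xi in x))

def scale(c, x):
    """Scalar multiple cx."""
    return [c * xi for xi in x]

def normalize(x):
    """Unit vector x / ||x||; assumes x is not the zero vector."""
    return scale(1.0 / length(x), x)

x = [3.0, 4.0]
print(length(x))             # 5.0
print(length(scale(-2, x)))  # 10.0 = |-2| * ||x||
print(length(normalize(x)))  # 1.0
```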

Given a nonzero vector x, any scalar multiple cx lies on the line through the origin and passing
through the head of the vector x. For this reason, we make the following

Definition. We say two vectors x and y are parallel if one is a scalar multiple of the other,
i.e., if there is a scalar c so that y = cx or x = cy. We say x and y are nonparallel if they are not
parallel.
     
Vector addition: If x = (x1, . . . , xn) and y = (y1, . . . , yn), then we define
$$x + y = \begin{bmatrix} x_1 + y_1 \\ \vdots \\ x_n + y_n \end{bmatrix}.$$
To understand this geometrically, we move the vector y so that its tail is at the head of x, and draw the arrow from the origin to its head. This is the so-called parallelogram law for vector addition,

Figure 1.5

for, as we see in Figure 1.5, x + y is the “long” diagonal of the parallelogram spanned by x and y.

Notice that the picture makes it clear that vector addition is commutative; i.e.,

x + y = y + x.

This also follows immediately from the algebraic definition because addition of real numbers is
commutative. (See Exercise 12 for an exhaustive list of the properties of vector addition and scalar
multiplication.)

Remark. We emphasize here that the notions of vector addition and scalar multiplication make sense geometrically for vectors in the form $\overrightarrow{AB}$ which do not necessarily have their tails at the origin. If we wish to add $\overrightarrow{AB}$ to $\overrightarrow{CD}$, we simply recall that $\overrightarrow{CD}$ is equal to any vector with the same length and direction, so we just translate $\overrightarrow{CD}$ so that C and B coincide; then the arrow from A to the point D in its new position is the sum $\overrightarrow{AB} + \overrightarrow{CD}$.

Subtraction of one vector from another is easy to define algebraically. If x and y are as above,
then we set
 
$$x - y = \begin{bmatrix} x_1 - y_1 \\ \vdots \\ x_n - y_n \end{bmatrix}.$$
As is the case with real numbers, we have the following interpretation of the difference x − y: It is

Figure 1.6

the vector we add to y in order to obtain x, i.e.,

(x − y) + y = x.

Pictorially, we see that x − y is drawn, as shown in Figure 1.6, by putting its tail at y and its head
at x, thereby resulting in the other diagonal of the parallelogram determined by x and y. Note
that if A and B are points in space and we set x = $\overrightarrow{OA}$ and y = $\overrightarrow{OB}$, then y − x = $\overrightarrow{AB}$. Moreover,
as Figure 1.6 also suggests, we have x − y = x + (−y).

Example 1. Let A and B be points in R^n. The midpoint M of the line segment joining them is the point halfway from A to B; that is, $\overrightarrow{AM} = \tfrac{1}{2}\overrightarrow{AB}$. Using the notation as above, we set x = $\overrightarrow{OA}$ and y = $\overrightarrow{OB}$, and we have
$$(*)\qquad \overrightarrow{OM} = x + \overrightarrow{AM} = x + \tfrac{1}{2}(y - x) = \tfrac{1}{2}(x + y).$$
In particular, the vector from the origin to the midpoint of AB is the average of the vectors x and
y. See Exercise 8 for a generalization to three vectors and Section 4 of Chapter 7 for more.
From this formula follows one of the classic results from high school geometry: The diagonals

Figure 1.7

of a parallelogram bisect one another. We’ve seen that the midpoint M of AB is, by virtue of the
formula (∗), also the midpoint of diagonal OC. (See Figure 1.7.) ▽

It should now be evident that vector methods provide a great tool for translating theorems
from Euclidean geometry into simple algebraic statements. Here is another example. Recall that a
median of a triangle is a line segment from a vertex to the midpoint of the opposite side.

Proposition 1.1. The medians of a triangle intersect at a point that is two-thirds of the way
from each vertex to the opposite side.

Proof. We may put one of the vertices of the triangle at the origin, so that the picture is as in Figure 1.8(a). Let x = $\overrightarrow{OA}$, y = $\overrightarrow{OB}$, and let L, M, and N be the midpoints of OA, AB, and

Figure 1.8

OB, respectively. The battle plan is the following: We let P denote the point 2/3 of the way from
B to L, Q the point 2/3 of the way from O to M , and R the point 2/3 of the way from A to N .
Although we’ve indicated P , Q, and R as distinct points in Figure 1.8(b), our goal is to prove that
P = Q = R; we do this by expressing all the vectors $\overrightarrow{OP}$, $\overrightarrow{OQ}$, and $\overrightarrow{OR}$ in terms of x and y.
$$\overrightarrow{OP} = \overrightarrow{OB} + \overrightarrow{BP} = \overrightarrow{OB} + \tfrac{2}{3}\overrightarrow{BL} = y + \tfrac{2}{3}\bigl(\tfrac{1}{2}x - y\bigr) = \tfrac{1}{3}x + \tfrac{1}{3}y;$$
$$\overrightarrow{OQ} = \tfrac{2}{3}\overrightarrow{OM} = \tfrac{2}{3}\cdot\tfrac{1}{2}(x + y) = \tfrac{1}{3}(x + y); \ \text{and}$$
$$\overrightarrow{OR} = \overrightarrow{OA} + \overrightarrow{AR} = \overrightarrow{OA} + \tfrac{2}{3}\overrightarrow{AN} = x + \tfrac{2}{3}\bigl(\tfrac{1}{2}y - x\bigr) = \tfrac{1}{3}x + \tfrac{1}{3}y.$$
We conclude that, as desired, $\overrightarrow{OP} = \overrightarrow{OQ} = \overrightarrow{OR}$, and so P = Q = R. That is, if we go 2/3 of the way down any of the medians, we end up at the same point; this is, of course, the point of intersection of the three medians. □

The astute reader might notice that we could have been more economical in the last proof. Suppose
we merely check that the points 2/3 of the way down two of the medians (say P and Q) agree. It
would then follow (say, by relabeling the triangle slightly) that the same is true of a different pair
of medians (say P and R). But since any two pairs must have a point in common, we may now
conclude that all three points are equal.
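As a concrete check on the computation in the proof, one can pick numerical vectors x and y and verify that the three points P, Q, and R all land at (x + y)/3. A short sketch of our own (not from the text):

```python
def add(u, v):
    return [a + b for a, b in zip(u, v)]

def scale(c, u):
    return [c * a for a in u]

x = [3.0, 1.0]   # x = OA
y = [1.0, 5.0]   # y = OB

L = scale(0.5, x)           # midpoint of OA
M = scale(0.5, add(x, y))   # midpoint of AB
N = scale(0.5, y)           # midpoint of OB

P = add(y, scale(2/3, add(L, scale(-1, y))))   # 2/3 of the way from B to L
Q = scale(2/3, M)                              # 2/3 of the way from O to M
R = add(x, scale(2/3, add(N, scale(-1, x))))   # 2/3 of the way from A to N

print(P, Q, R)   # all equal (x + y)/3 = [1.333..., 2.0]
```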

EXERCISES 1.1

   
1. Given x = (2, 3) and y = (−1, 1), calculate the following both algebraically and geometrically.
a. x + y
b. x − y
c. x + 2y
d. ½x + ½y
e. y − x
f. 2x − y
g. ‖x‖
h. x/‖x‖
     
*2. Three vertices of a parallelogram are (1, 2, 1), (2, 4, 3), and (3, 1, 5). What are all the possible positions
of the fourth vertex? Give your reasoning.

3. The origin is at the center of a regular polygon.


a. What is the sum of the vectors to each of the vertices of the polygon? Give your reasoning.
(Hint: What are the symmetries of the polygon?)
b. What is the sum of the vectors from one fixed vertex to each of the remaining vertices?
Give your reasoning.

4. Given △ABC, let M and N be the midpoints of AB and AC, respectively. Prove that $\overrightarrow{MN} = \tfrac{1}{2}\overrightarrow{BC}$.

5. Let ABCD be an arbitrary quadrilateral. Let P , Q, R, and S be the midpoints of AB, BC,
CD, and DA, respectively. Use vector methods to prove that P QRS is a parallelogram. (Hint:
Use Exercise 4.)
*6. In △ABC pictured in Figure 1.9, $\|\overrightarrow{AD}\| = \tfrac{2}{3}\|\overrightarrow{AB}\|$ and $\|\overrightarrow{CE}\| = \tfrac{2}{5}\|\overrightarrow{CB}\|$. Let Q denote the midpoint of CD; show that $\overrightarrow{AQ} = c\,\overrightarrow{AE}$ for some scalar c and determine the ratio c = $\|\overrightarrow{AQ}\|/\|\overrightarrow{AE}\|$. In what ratio does CD divide AE?

Figure 1.9
7. Consider parallelogram ABCD. Suppose $\overrightarrow{AE} = \tfrac{1}{3}\overrightarrow{AB}$ and $\overrightarrow{DP} = \tfrac{3}{4}\overrightarrow{DE}$. Show that P lies on the diagonal AC. (See Figure 1.10.)

Figure 1.10

8. Let A, B, and C be vertices of a triangle in R^3. Let x = $\overrightarrow{OA}$, y = $\overrightarrow{OB}$, and z = $\overrightarrow{OC}$. Show that the head of the vector v = ⅓(x + y + z) lies on each median of △ABC (and thus is the point of intersection of the three medians). It follows (see Section 4 of Chapter 7) that when we put equal masses at A, B, and C, the center of mass of that system is given by the intersection of the medians of the triangle.

9. a. Let u, v ∈ R^2. Describe the vectors x = su + tv, where s + t = 1. Pay particular attention to the location of x when s ≥ 0 and when t ≥ 0.
b. Let u, v, w ∈ R3 . Describe the vectors x = ru + sv + tw, where r + s + t = 1. Pay
particular attention to the location of x when each of r, s, and t is positive.

10. Suppose x, y ∈ Rn are nonparallel vectors. (Recall the definition on p. 3.)


a. Prove that if sx + ty = 0, then s = t = 0. (Hint: Show that neither s ≠ 0 nor t ≠ 0 is
possible.)

b. Prove that if ax + by = cx + dy, then a = c and b = d.

11. "Discover" the fraction 2/3 that appears in Proposition 1.1 by finding the intersection of two medians. (Hint: A point on the line $\overleftrightarrow{OM}$ can be written in the form t(x + y) for some scalar t, and a point on the line $\overleftrightarrow{AN}$ can be written in the form x + s(½y − x) for some scalar s. You will need to use the result of Exercise 10.)

12. Verify both algebraically and geometrically that the following properties of vector arithmetic
hold. (Do so for n = 2 if the general case is too intimidating.)
a. For all x, y ∈ Rn , x + y = y + x.
b. For all x, y, z ∈ Rn , (x + y) + z = x + (y + z).
c. 0 + x = x for all x ∈ Rn .
d. For each x ∈ Rn , there is a vector −x so that x + (−x) = 0.
e. For all c, d ∈ R and x ∈ Rn , c(dx) = (cd)x.
f. For all c ∈ R and x, y ∈ Rn , c(x + y) = cx + cy.
g. For all c, d ∈ R and x ∈ Rn , (c + d)x = cx + dx.
h. For all x ∈ Rn , 1x = x.
♯ 13. a. Using only the properties listed in Exercise 12, prove that for any x ∈ Rn , we have 0x = 0.
(It often surprises students that this is a consequence of the properties in Exercise 12.)
b. Using the result of part a, prove that (−1)x = −x. (Be sure that you didn’t use this fact
in your proof of part a!)

2. Dot Product

We discuss next one of the crucial constructions in linear algebra, the dot product x · y of two vectors x, y ∈ R^n. By way of motivation, let's recall some basic results from plane geometry. Let P = (x1, x2) and Q = (y1, y2) be points in the plane, as pictured in Figure 2.1. Then we observe

Figure 2.1
that when ∠P OQ is a right angle, △OAP is similar to △OBQ, and so x2 /x1 = −y1 /y2 , whence
x1 y1 + x2 y2 = 0. This leads us to make the following

Definition. Given vectors x, y ∈ R^2, define their dot product

x · y = x1 y1 + x2 y2.

More generally, given vectors x, y ∈ R^n, define their dot product

x · y = x1 y1 + x2 y2 + · · · + xn yn.

We know that when the vectors x and y ∈ R2 are perpendicular, their dot product is 0. By
starting with the algebraic properties of the dot product, we are able to get a great deal of geometry
out of it.

Proposition 2.1. The dot product has the following properties:


(1) x · y = y · x for all x, y ∈ R^n (dot product is commutative);
(2) x · x = ‖x‖² ≥ 0, and x · x = 0 ⟺ x = 0;
(3) (cx) · y = c(x · y) for all x, y ∈ R^n and c ∈ R;
(4) x · (y + z) = x · y + x · z for all x, y, z ∈ R^n (the distributive property).

Proof. In order to simplify the notation, we give the proof with n = 2. Since multiplication of
real numbers is commutative, we have

x · y = x1 y1 + x2 y2 = y1 x1 + y2 x2 = y · x.

The square of a real number is nonnegative and the sum of nonnegative numbers is nonnegative,
so x · x = x1² + x2² ≥ 0 and is equal to 0 only when x1 = x2 = 0. The next property follows from
the associative and distributive properties of real numbers:

(cx) · y = (cx1 )y1 + (cx2 )y2 = c(x1 y1 ) + c(x2 y2 ) = c(x1 y1 + x2 y2 ) = c(x · y).

The last result follows from the commutative, associative, and distributive properties of real num-
bers:

x · (y + z) = x1 (y1 + z1 ) + x2 (y2 + z2 ) = x1 y1 + x1 z1 + x2 y2 + x2 z2
= (x1 y1 + x2 y2) + (x1 z1 + x2 z2) = x · y + x · z. □

Corollary 2.2. ‖x + y‖² = ‖x‖² + 2 x · y + ‖y‖².

Proof. Using the properties repeatedly, we have:

‖x + y‖² = (x + y) · (x + y) = x · x + x · y + y · x + y · y
= ‖x‖² + 2 x · y + ‖y‖²,

as desired. □

The geometric meaning of this result comes from the Pythagorean Theorem: When x and y are
perpendicular vectors in R^2, then we have ‖x + y‖² = ‖x‖² + ‖y‖², and so, by Corollary 2.2, it must
be the case that x · y = 0. (And the converse follows, too, from the converse of the Pythagorean
Theorem.) That is, two vectors in R2 are perpendicular if and only if their dot product is 0.
Motivated by this, we use the algebraic definition of dot product of vectors in Rn to bring in
the geometry. In keeping with current use of the terminology and falling prey to the penchant to
have several names for the same thing, we make the following

Definition. We say vectors x and y are orthogonal if x · y = 0.
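A quick numerical illustration of the dot product and the orthogonality criterion; this is a sketch of our own in Python, not notation from the text:

```python
def dot(x, y):
    """x . y = x1*y1 + ... + xn*yn"""
    return sum(a * b for a, b in zip(x, y))

print(dot([2, 5], [-5, 2]))       # 0, so these two plane vectors are orthogonal
print(dot([1, 2, 3], [4, 5, 6]))  # 32
```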



Figure 2.2

Armed with this definition, we proceed to a construction that will be important in much of our future work. Starting with two vectors x, y ∈ R^n, where y ≠ 0, Figure 2.2 suggests that we should be able to write x as the sum of a vector, x∥, that is parallel to y and a vector, x⊥, that is orthogonal to y. Let's suppose we have such an equation:

x = x∥ + x⊥, where x∥ is a scalar multiple of y and x⊥ is orthogonal to y.

To say that x∥ is a scalar multiple of y means that we can write x∥ = cy for some scalar c. Now, assuming such an expression exists, we can determine c by taking the dot product of both sides of the equation with y:

x · y = (x∥ + x⊥) · y = (x∥ · y) + (x⊥ · y) = x∥ · y = (cy) · y = c‖y‖².

This means that
$$c = \frac{x \cdot y}{\|y\|^2}, \quad\text{and so}\quad x^{\parallel} = \frac{x \cdot y}{\|y\|^2}\, y.$$
The vector x∥ is called the projection of x onto y, written proj_y x.


The fastidious reader may be puzzled by the logic here. We have apparently assumed that we can write x = x∥ + x⊥ in order to prove that we can do so. Of course, as it stands, this is no fair. Here's how we fix it. We now define
$$x^{\parallel} = \frac{x \cdot y}{\|y\|^2}\, y \qquad\text{and}\qquad x^{\perp} = x - \frac{x \cdot y}{\|y\|^2}\, y.$$
Obviously, x∥ + x⊥ = x and x∥ is a scalar multiple of y. All we need to check is that x⊥ is in fact orthogonal to y. Well,
$$x^{\perp} \cdot y = \Bigl(x - \frac{x \cdot y}{\|y\|^2}\, y\Bigr) \cdot y = x \cdot y - \frac{x \cdot y}{\|y\|^2}\, y \cdot y = x \cdot y - \frac{x \cdot y}{\|y\|^2}\, \|y\|^2 = x \cdot y - x \cdot y = 0,$$
as required. Note, moreover, that x∥ is the unique multiple of y that satisfies the equation (x − x∥) · y = 0.
   
Example 1. Let x = (2, 3, 1) and y = (−1, 1, 1). Then
$$x^{\parallel} = \frac{x \cdot y}{\|y\|^2}\, y = \frac{2}{3}\begin{bmatrix} -1 \\ 1 \\ 1 \end{bmatrix} = \begin{bmatrix} -2/3 \\ 2/3 \\ 2/3 \end{bmatrix} \qquad\text{and}\qquad x^{\perp} = \begin{bmatrix} 2 \\ 3 \\ 1 \end{bmatrix} - \frac{2}{3}\begin{bmatrix} -1 \\ 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 8/3 \\ 7/3 \\ 1/3 \end{bmatrix}.$$
To double-check, we compute x⊥ · y = (8/3, 7/3, 1/3) · (−1, 1, 1) = 0, as it should be. ▽
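The decomposition x = x∥ + x⊥ translates directly into a few lines of code. The sketch below (our own helper names, plain Python lists) reproduces the numbers of Example 1:

```python
def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def project(x, y):
    """proj_y x = ((x . y) / ||y||^2) y, assuming y != 0."""
    c = dot(x, y) / dot(y, y)
    return [c * yi for yi in y]

x = [2, 3, 1]
y = [-1, 1, 1]
x_par = project(x, y)                        # [-2/3, 2/3, 2/3]
x_perp = [a - b for a, b in zip(x, x_par)]   # [8/3, 7/3, 1/3]
print(x_par, x_perp, dot(x_perp, y))         # last value is 0 (up to rounding)
```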

Suppose x, y ∈ R2 . We shall see next that the formula for the projection of x onto y enables
us to calculate the angle between the vectors x and y. Consider the right triangle in Figure 2.3;
let θ denote the angle between the vectors x and y. Remembering that the cosine of an angle is

Figure 2.3

the ratio of the signed length of the adjacent side to the length of the hypotenuse, we see that
$$\cos\theta = \frac{\text{signed length of } x^{\parallel}}{\text{length of } x} = \frac{c\|y\|}{\|x\|} = \frac{\dfrac{x \cdot y}{\|y\|^2}\,\|y\|}{\|x\|} = \frac{x \cdot y}{\|x\|\,\|y\|}.$$
This, then, is the geometric interpretation of the dot product:
$$x \cdot y = \|x\|\,\|y\|\cos\theta.$$
Will this formula still make sense even when x, y ∈ Rn ? Geometrically, we simply restrict our
attention to the plane spanned by x and y and measure the angle θ in that plane, and so we
blithely make the

Definition. Let x and y be nonzero vectors in R^n. We define the angle between them to be the unique θ satisfying 0 ≤ θ ≤ π so that
$$\cos\theta = \frac{x \cdot y}{\|x\|\,\|y\|}.$$
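In code the definition reads as follows; a sketch of our own, using math.acos to recover θ from the cosine:

```python
import math

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def norm(x):
    return math.sqrt(dot(x, x))

def angle(x, y):
    """The unique theta in [0, pi] with cos(theta) = x.y / (||x|| ||y||)."""
    return math.acos(dot(x, y) / (norm(x) * norm(y)))

print(angle([1, 0], [0, 1]))        # pi/2 (about 1.5708)
print(angle([1, 1, 1], [1, 0, 0]))  # arccos(1/sqrt(3)), about 0.9553
```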

Since our geometric intuition may be misleading in Rn , we should check algebraically that this
definition makes sense. Since |cos θ| ≤ 1, the following result gives us what is needed.

Proposition 2.3 (Cauchy-Schwarz Inequality). If x, y ∈ Rn , then

|x · y| ≤ ‖x‖ ‖y‖.

Moreover, equality holds if and only if one of the vectors is a scalar multiple of the other.

Proof. If y = 0, then there's nothing to prove. If y ≠ 0, then we observe that the quadratic function of t given by
$$g(t) = \|x + ty\|^2 = \|x\|^2 + 2t\, x \cdot y + t^2\|y\|^2$$
takes its minimum at $t_0 = -\dfrac{x \cdot y}{\|y\|^2}$. The minimum value
$$g(t_0) = \|x\|^2 - 2\,\frac{(x \cdot y)^2}{\|y\|^2} + \frac{(x \cdot y)^2}{\|y\|^2} = \|x\|^2 - \frac{(x \cdot y)^2}{\|y\|^2}$$
is necessarily nonnegative, so

(x · y)² ≤ ‖x‖² ‖y‖²,

and, since square root preserves inequality,

|x · y| ≤ ‖x‖ ‖y‖,

as desired. Equality holds if and only if x + ty = 0 for some scalar t. (See Exercise 9 for a discussion of how this proof relates to our formula for proj_y x above.) □

One of the most useful applications of this result is the famed triangle inequality, which tells us
that the sum of the lengths of two sides of a triangle cannot be less than the length of the third.

Corollary 2.4 (Triangle Inequality). For any vectors x, y ∈ R^n, we have ‖x + y‖ ≤ ‖x‖ + ‖y‖.

Proof. By Corollary 2.2 and Proposition 2.3 we have

‖x + y‖² = ‖x‖² + 2 x · y + ‖y‖² ≤ ‖x‖² + 2‖x‖‖y‖ + ‖y‖² = (‖x‖ + ‖y‖)².

Since square root preserves inequality, we conclude that ‖x + y‖ ≤ ‖x‖ + ‖y‖, as desired. □
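Both inequalities are easy to spot-check numerically. A brief sketch (our own code, random test vectors) follows:

```python
import math
import random

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def norm(x):
    return math.sqrt(dot(x, x))

random.seed(0)
for _ in range(5):
    x = [random.uniform(-1, 1) for _ in range(4)]
    y = [random.uniform(-1, 1) for _ in range(4)]
    assert abs(dot(x, y)) <= norm(x) * norm(y) + 1e-12   # Cauchy-Schwarz
    s = [a + b for a, b in zip(x, y)]
    assert norm(s) <= norm(x) + norm(y) + 1e-12          # triangle inequality
print("both inequalities hold on the sampled vectors")
```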

Remark . The dot product also arises in situations removed from geometry. The economist
introduces the commodity vector x, whose entries are the quantities of various commodities that
happen to be of interest and the price vector p. For example, we might consider
   
$$x = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{bmatrix} \qquad\text{and}\qquad p = \begin{bmatrix} p_1 \\ p_2 \\ p_3 \\ p_4 \\ p_5 \end{bmatrix} \in R^5,$$
where x1 represents the number of pounds of flour, x2 the number of dozens of eggs, x3 the number
of pounds of chocolate chips, x4 the number of pounds of walnuts, and x5 the number of pounds of
butter needed to produce a certain massive quantity of chocolate chip cookies, and pi is the price
(in dollars) of a unit of the ith commodity (e.g., p2 is the price of a dozen eggs). Then it is easy to
see that

p · x = p1 x1 + p2 x2 + p3 x3 + p4 x4 + p5 x5

is the total cost of producing the massive quantity of cookies. (To be realistic, we might also want
to include x6 as the number of hours of labor, with corresponding hourly wage p6 .) We will return
to this interpretation in Section 4.
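The cost computation in this remark is nothing but a dot product. A tiny sketch with made-up (hypothetical) quantities and prices:

```python
def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

# Hypothetical data: pounds of flour, dozens of eggs, pounds of chocolate chips,
# pounds of walnuts, pounds of butter, and the corresponding unit prices in dollars.
x = [20, 10, 8, 4, 6]
p = [0.80, 2.50, 4.00, 6.00, 3.50]
print(dot(p, x))   # total cost: 16 + 25 + 32 + 24 + 21 = 118.0 dollars
```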

EXERCISES 1.2

1. For each of the following pairs of vectors x and y, calculate x · y and the angle θ between the vectors.
a. x = (2, 5), y = (−5, 2)
b. x = (2, 1), y = (−1, 1)
*c. x = (1, 8), y = (7, −4)
d. x = (1, 4, −3), y = (5, 1, 3)
e. x = (1, −1, 6), y = (5, 3, 2)
*f. x = (3, −4, 5), y = (−1, 0, 1)
g. x = (1, 1, 1, 1), y = (1, −3, −1, 5)

*2. For each pair of vectors in Exercise 1, calculate proj_y x and proj_x y.

*3. Find the angle between the long diagonal of a cube and a face diagonal.

4. Find the angle that the long diagonal of a 3 × 4 × 5 rectangular box makes with the longest
edge.

5. Suppose x, y ∈ R^n, ‖x‖ = 2, ‖y‖ = 1, and the angle θ between x and y is θ = arccos(1/4). Prove that the vectors x − 3y and x + y are orthogonal.

6. Suppose x, y, z ∈ R2 are unit vectors satisfying x + y + z = 0. What can you say about the
angles between each pair?
     
7. Let e1 = (1, 0, 0), e2 = (0, 1, 0), and e3 = (0, 0, 1) be the so-called standard basis for R^3. Let x ∈ R^3 be a nonzero vector. For i = 1, 2, 3, let θi denote the angle between x and ei. Compute cos²θ1 + cos²θ2 + cos²θ3.
   
*8. Let x = (1, 1, 1, . . . , 1) and y = (1, 2, 3, . . . , n) ∈ R^n. Let θn be the angle between x and y in R^n. Find lim_{n→∞} θn. (Hint: You may need to recall the formulas for 1 + 2 + · · · + n and 1² + 2² + · · · + n² from your beginning calculus course.)

9. With regard to the proof of Proposition 2.3, how is t0 y related to x∥? What does this say about proj_y x?

10. Use vector methods to prove that a parallelogram is a rectangle if and only if its diagonals have
the same length.

11. Use the fundamental properties of the dot product to prove that

‖x + y‖² + ‖x − y‖² = 2(‖x‖² + ‖y‖²).

Interpret the result geometrically.

*12. Use the dot product to prove the law of cosines: As shown in Figure 2.4,

c² = a² + b² − 2ab cos θ.

Figure 2.4

13. Use vector methods to prove that the diagonals of a parallelogram are orthogonal if and only
if the parallelogram is a rhombus (i.e., has all sides of equal length).
♯ 14. Use vector methods to prove that a triangle inscribed in a circle and having a diameter as one of its sides must be a right triangle. (Hint: See Figure 2.5.)

Figure 2.5


Geometric challenge: More generally, given two points A and B in the plane, what is the locus
of points X so that ∠AXB has a fixed measure?

15. a. Let y ∈ Rn . If x · y = 0 for all x ∈ Rn , then prove that y = 0.


b. Suppose y, z ∈ Rn and x · y = x · z for all x ∈ Rn . What can you conclude?
   
16. If x = (x1, x2) ∈ R^2, set ρ(x) = (−x2, x1).
a. Check that ρ(x) is orthogonal to x; indeed, ρ(x) is obtained by rotating x an angle π/2
counterclockwise.
b. Given x, y ∈ R2 , prove that x · ρ(y) = −ρ(x) · y. Interpret this statement geometrically.

♯ 17. Prove that for any vectors x, y ∈ R^n, we have ‖x‖ − ‖y‖ ≤ ‖x − y‖. Deduce that | ‖x‖ − ‖y‖ | ≤ ‖x − y‖. (Hint: Apply the result of Corollary 2.4 directly.)

18. Use the Cauchy-Schwarz inequality to solve the following max/min problem: If the (long)
diagonal of a rectangular box has length c, what is the greatest the sum of the length, width,
and height of the box can be? For what shape box does the maximum occur?

19. Give an alternative proof of the Cauchy-Schwarz inequality, as follows. Let a = ‖x‖, b = ‖y‖,
and deduce from ‖bx − ay‖² ≥ 0 that x · y ≤ ab. Now how do you show that |x · y| ≤ ab?
When does equality hold?
♯ 20. a. Let x and y be vectors with ‖x‖ = ‖y‖. Prove that the vector x + y bisects the angle
between x and y.
b. More generally, if x and y are arbitrary nonzero vectors, let a = ‖x‖ and b = ‖y‖. Prove
that the vector bx + ay bisects the angle between x and y.

21. Use vector methods to prove that the diagonals of a parallelogram bisect the vertex angles if
and only if the parallelogram is a rhombus.

22. Given △ABC with D on BC as shown in Figure 2.6, prove that if AD bisects ∠BAC, then $\|\overrightarrow{BD}\|/\|\overrightarrow{CD}\| = \|\overrightarrow{AB}\|/\|\overrightarrow{AC}\|$. (Hint: Use Exercise 20b. Let x = $\overrightarrow{AB}$ and y = $\overrightarrow{AC}$; give two expressions for $\overrightarrow{AD}$ in terms of x and y and use Exercise 1.1.10.)

Figure 2.6

23. Use vector methods to prove that the angle bisectors of a triangle have a common point. (Hint: Given △OAB, let x = $\overrightarrow{OA}$, y = $\overrightarrow{OB}$, a = $\|\overrightarrow{OA}\|$, b = $\|\overrightarrow{OB}\|$, and c = $\|\overrightarrow{AB}\|$. If we define the point P by $\overrightarrow{OP} = \frac{1}{a+b+c}(bx + ay)$, use Exercise 20b to show that P lies on all three angle bisectors.)

24. Use vector methods to prove that the altitudes of a triangle have a common point. Recall that
altitudes of a triangle are the lines passing through a vertex and perpendicular to the opposite
side. (Hint: See Figure 2.7. Let C be the point of intersection of the altitude from B to $\overleftrightarrow{OA}$ and the altitude from A to $\overleftrightarrow{OB}$. Prove that $\overrightarrow{OC}$ is orthogonal to $\overrightarrow{AB}$.)

Figure 2.7

25. Use vector methods to prove that the perpendicular bisectors of the sides of a triangle intersect in a point, as follows. Assume the triangle OAB has one vertex at the origin, and let x = $\overrightarrow{OA}$ and y = $\overrightarrow{OB}$.
a. Let z be the point of intersection of the perpendicular bisectors of OA and OB. Prove that (using the notation of Exercise 16)
$$z = \tfrac{1}{2}x + c\,\rho(x), \quad\text{where}\quad c = \frac{\|y\|^2 - x \cdot y}{2\,\rho(x) \cdot y}.$$
b. Show that z lies on the perpendicular bisector of AB. (Hint: What is the dot product of z − ½(x + y) with y − x?)

26. Let P be the intersection of the medians of △OAB (see Proposition 1.1), Q the intersection of
its altitudes (see Exercise 24), and R the intersection of the perpendicular bisectors of its sides
(see Exercise 25). Show that P , Q, and R are collinear and that P is two-thirds of the way
from Q to R. Does the intersection of the angle bisectors (see Exercise 23) lie on this line as
well?

3. Subspaces of Rn

As we proceed in our study of "linear objects," it is fundamental to concentrate on subsets of R^n that are generalizations of lines and planes through the origin.

Definition. A set V ⊂ R^n (a subset of R^n) is called a subspace of R^n if it satisfies the following properties:
(1) 0 ∈ V (the zero vector belongs to V );
(2) whenever v ∈ V and c ∈ R, we have cv ∈ V (V is closed under scalar multiplication);
(3) whenever v, w ∈ V , we have v + w ∈ V (V is closed under addition).

Examples 1. Let’s begin with some familiar examples.


(a) The trivial subspace consisting of just the zero vector 0 ∈ Rn is a subspace, since c0 = 0
for any scalar c and 0 + 0 = 0.
(b) Rn itself is likewise a subspace of Rn .
(c) Fix a nonzero vector u ∈ Rn , and consider
ℓ = {x ∈ Rn : x = tu for some t ∈ R}.
We check that the three criteria hold:
(1) Setting t = 0, we see that 0 ∈ ℓ.
(2) If v ∈ ℓ and c ∈ R, then v = tu for some t ∈ R, and so cv = c(tu) = (ct)u, which is
again a scalar multiple of u and hence an element of ℓ.
(3) If v, w ∈ ℓ, this means that v = su and w = tu for some scalars s and t. Then
v + w = su + tu = (s + t)u, so v + w ∈ ℓ, as needed.
ℓ is called a line through the origin.
(d) Fix two nonparallel vectors u and v ∈ Rn . Set
P = {x ∈ Rn : x = su + tv for some s, t ∈ R},
as shown in Figure 3.1. P is called a plane through the origin. To see that P is a subspace,
we do the obligatory checks:

Figure 3.1

(1) Setting s and t = 0, we see that 0 = 0u + 0v, so 0 ∈ P.


(2) Suppose x ∈ P and c ∈ R. Then x = su + tv for some scalars s and t, and
cx = c(su + tv) = (cs)u + (ct)v, so cx ∈ P as well.
(3) Suppose x, y ∈ P. This means that x = su + tv for some scalars s and t, and
y = s′ u + t′ v for some scalars s′ and t′ . Then
x + y = (su + tv) + (s′ u + t′ v) = (s + s′ )u + (t + t′ )v,
so x + y ∈ P, as required.
(e) Fix a nonzero vector A ∈ Rn , and consider
V = {x ∈ Rn : A · x = 0}.
V consists of all vectors orthogonal to the given vector A, as pictured in Figure 3.2. We
check once again that the three criteria hold:

Figure 3.2. The hyperplane A · x = 0.

(1) Since A · 0 = 0, we know that 0 ∈ V .


(2) Suppose v ∈ V and c ∈ R. Then A · (cv) = c(A · v) = 0, so cv ∈ V .
(3) Suppose v, w ∈ V . Then A · (v + w) = (A · v) + (A · w) = 0 + 0 = 0, so v + w ∈ V ,
as required.
Thus, V is a subspace of Rn . We call V a hyperplane in Rn , having normal vector A.
More generally, given any collection of vectors A1 , . . . , Am ∈ Rn , the set of solutions of the
homogeneous system of linear equations

A1 · x = 0, A2 · x = 0, ..., Am · x = 0

forms a subspace of Rn . ▽
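The closure properties in Example 1(e) can be probed numerically: solutions of A · x = 0 remain solutions under addition and scalar multiplication. A small sketch of our own:

```python
def dot(a, x):
    return sum(ai * xi for ai, xi in zip(a, x))

A = [1, -2, 3]                 # normal vector of a hyperplane in R^3
v = [2, 1, 0]                  # A . v = 0
w = [3, 0, -1]                 # A . w = 0
v_plus_w = [a + b for a, b in zip(v, w)]
print(dot(A, v), dot(A, w), dot(A, v_plus_w), dot(A, [5 * vi for vi in v]))  # all 0
```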

Examples 2. Let’s consider next a few subsets of R2 , as pictured in Figure 3.3, that are not
subspaces.
  
(a) S = {x ∈ R^2 : x2 = 2x1 + 1} is not a subspace. All three criteria fail, but it suffices to point out that 0 ∉ S.
(b) S = {x ∈ R^2 : x1 x2 = 0} is not a subspace. Each of the vectors v = (1, 0) and w = (0, 1) lies in S, and yet their sum v + w = (1, 1) does not.
(c) S = {x ∈ R^2 : x2 ≥ 0} is not a subspace. The vector v = (0, 1) lies in S, and yet any negative scalar multiple of it, e.g., (−2)v = (0, −2), does not. ▽

Given a collection of vectors in R^n, it is natural to try to "build" a subspace from them. We begin with some crucial definitions.

Definition. Let v1 , . . . , vk ∈ Rn . If c1 , . . . , ck ∈ R, the vector

v = c1 v1 + c2 v2 + · · · + ck vk

(as illustrated in Figure 3.4) is called a linear combination of v1 , . . . , vk . The set of all linear
combinations of v1 , . . . , vk is called their span, denoted Span(v1 , . . . , vk ).

Figure 3.3. Subsets of R^2 that are not subspaces.

Figure 3.4

Every vector in R^n can be written as a linear combination of the vectors
$$e_1 = \begin{bmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \quad e_2 = \begin{bmatrix} 0 \\ 1 \\ \vdots \\ 0 \end{bmatrix}, \quad \dots, \quad e_n = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 1 \end{bmatrix}.$$
The vectors e1, . . . , en are often called the standard basis vectors for R^n. Obviously, given the vector x = (x1, . . . , xn), we have x = x1 e1 + x2 e2 + · · · + xn en.

Proposition 3.1. Let v1 , . . . , vk ∈ Rn . Then V = Span(v1 , . . . , vk ) is a subspace of Rn .

Proof. We check that all three criteria hold.


(1) To see that 0 ∈ V , we merely take c1 = c2 = · · · = ck = 0. Then (by Exercise 1.1.13)
c1 v1 + c2 v2 + · · · + ck vk = 0v1 + · · · + 0vk = 0 + · · · + 0 = 0.
(2) Suppose v ∈ V and c ∈ R. By definition, there are scalars c1 , . . . , ck so that v = c1 v1 +
c2 v2 + · · · + ck vk . Thus,

cv = c(c1 v1 + c2 v2 + · · · + ck vk ) = (cc1 )v1 + (cc2 )v2 + · · · + (cck )vk ,

which is again a linear combination of v1 , . . . , vk , so cv ∈ V , as desired.



(3) Suppose v, w ∈ V . This means there are scalars c1 , . . . , ck and d1 , . . . , dk so that

v = c1 v1 + · · · + ck vk and
w = d1 v1 + · · · + dk vk ;

adding, we obtain

v + w = (c1 v1 + · · · + ck vk ) + (d1 v1 + · · · + dk vk )
= (c1 + d1 )v1 + · · · + (ck + dk )vk ,

which is again a linear combination of v1 , . . . , vk , hence an element of V .


This completes the verification that V is a subspace of R^n. □

Remark. Let V ⊂ Rn be a subspace and let v1 , . . . , vk ∈ V . We say that v1 , . . . , vk span V


if Span(v1 , . . . , vk ) = V . (The point here is that every vector in V must be a linear combination
of the vectors v1 , . . . , vk .) As we shall see in Chapter 4, it takes at least n vectors to span Rn ;
the smallest number of vectors required to span a given subspace will be a measure of its “size” or
“dimension.”

Example 3. The plane
$$P_1 = \Bigl\{ s\begin{bmatrix} 1 \\ -1 \\ 2 \end{bmatrix} + t\begin{bmatrix} 2 \\ 0 \\ 1 \end{bmatrix} : s, t \in R \Bigr\}$$
is the span of the vectors
$$v_1 = \begin{bmatrix} 1 \\ -1 \\ 2 \end{bmatrix} \qquad\text{and}\qquad v_2 = \begin{bmatrix} 2 \\ 0 \\ 1 \end{bmatrix}$$
and is therefore a subspace of R^3. On the other hand, the plane
$$P_2 = \Bigl\{ \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} + s\begin{bmatrix} 1 \\ -1 \\ 2 \end{bmatrix} + t\begin{bmatrix} 2 \\ 0 \\ 1 \end{bmatrix} : s, t \in R \Bigr\}$$
is not a subspace. This is most easily verified by checking that 0 ∉ P2. For 0 ∈ P2 precisely when we can find values of s and t so that
$$\begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} + s\begin{bmatrix} 1 \\ -1 \\ 2 \end{bmatrix} + t\begin{bmatrix} 2 \\ 0 \\ 1 \end{bmatrix}.$$
This amounts to the system of equations

s + 2t = −1
−s = 0
2s + t = 0,

which we easily see has no solution.



A word of warning here: We might have expressed P1 in the form
$$\Bigl\{ \begin{bmatrix} 1 \\ 1 \\ -1 \end{bmatrix} + s\begin{bmatrix} 1 \\ -1 \\ 2 \end{bmatrix} + t\begin{bmatrix} 2 \\ 0 \\ 1 \end{bmatrix} : s, t \in R \Bigr\},$$
so that, despite the presence of the "shifting" term, the plane may still pass through the origin.

There are really two different ways that subspaces of Rn arise: as being the span of a collection
of vectors (the “parametric” approach), or as being the set of solutions of a (homogeneous) system
of linear equations (the “implicit” approach). We shall study the connections between the two in
detail in Chapter 4.
 
Example 4. As the reader can verify, the vector A = (−1, 3, 2) is orthogonal to both the vectors that span the plane P1 given in Example 3 above. Thus, every vector in P1 is orthogonal to A and we suspect that
P1 = {x ∈ R^3 : A · x = 0} = {x ∈ R^3 : −x1 + 3x2 + 2x3 = 0}.
Strictly speaking, we only know that every vector in P1 is a solution of this equation. But note that if x is a solution, then
$$x = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 3x_2 + 2x_3 \\ x_2 \\ x_3 \end{bmatrix} = x_2\begin{bmatrix} 3 \\ 1 \\ 0 \end{bmatrix} + x_3\begin{bmatrix} 2 \\ 0 \\ 1 \end{bmatrix} = (-x_2)\begin{bmatrix} 1 \\ -1 \\ 2 \end{bmatrix} + (2x_2 + x_3)\begin{bmatrix} 2 \\ 0 \\ 1 \end{bmatrix},$$
so x ∈ P1 and the two sets are equal.¹ Thus, the discussion of Example 1(e) gives another justification that P1 is a subspace of R^3.
On the other hand, one can check, analogously, that
P2 = {x ∈ R^3 : −x1 + 3x2 + 2x3 = −1},
and so clearly 0 ∉ P2 and P2 is not a subspace. It is an affine plane parallel to P1. ▽
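The two descriptions of P1 in Examples 3 and 4, the parametric one (the span of v1 and v2) and the implicit one (−x1 + 3x2 + 2x3 = 0), can be checked against each other numerically. A sketch of our own:

```python
def combo(s, t, v1, v2):
    """The linear combination s*v1 + t*v2."""
    return [s * a + t * b for a, b in zip(v1, v2)]

v1, v2 = [1, -1, 2], [2, 0, 1]
A = [-1, 3, 2]   # normal vector of the implicit description

for s, t in [(1, 0), (0, 1), (2, -3), (-1, 5)]:
    x = combo(s, t, v1, v2)
    print(sum(a * xi for a, xi in zip(A, x)))   # 0 every time
```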

Definition. Let V and W be subspaces of Rn . We say they are orthogonal subspaces if every
element of V is orthogonal to every element of W , i.e., if
v·w =0 for every v ∈ V and every w ∈ W .
As indicated in Figure 3.5, given a subspace V ⊂ Rn , define
V ⊥ = {x ∈ Rn : x · v = 0 for every v ∈ V }.
V⊥ (read "V perp") is called the orthogonal complement of V.²

Proposition 3.2. V ⊥ is also a subspace of Rn .

Proof. We leave this to the reader in Exercise 4. □


¹ Ordinarily, the easiest way to establish that two sets are equal is to show that each is a subset of the other.
² In fact, both this definition and Proposition 3.2 work just fine for any subset V ⊂ R^n.


Figure 3.5
 
Example 5. Let V = Span((1, 2, 1)). Then V⊥ is the plane W = {x ∈ R^3 : x1 + 2x2 + x3 = 0}. Now what is the orthogonal complement of W? We suspect it is just the line V, but we will have to wait until Chapter 4 to have the appropriate tools. ▽

If V and W are orthogonal subspaces of R^n, then certainly W ⊂ V⊥ (why?). Of course, W need not be equal to V⊥: consider, for example, the x1-axis and the x2-axis in R^3.

EXERCISES 1.3

*1. Which of the following are subspaces? Justify your answer in each case.
a. {x ∈ R^2 : x1 + x2 = 1}
b. {x ∈ R^3 : x = (a, b, a + b) for some a, b ∈ R}
c. {x ∈ R^3 : x1 + 2x2 < 0}
d. {x ∈ R^3 : x1² + x2² + x3² = 1}
e. {x ∈ R^3 : x1² + x2² + x3² = 0}
f. {x ∈ R^3 : x1² + x2² + x3² = −1}
g. {x ∈ R^3 : x = s(2, 1, 1) + t(1, 2, 1) for some s, t ∈ R}
h. {x ∈ R^3 : x = (3, 0, 1) + s(2, 1, 1) + t(1, 2, 1) for some s, t ∈ R}
i. {x ∈ R^3 : x = (2, 4, −1) + s(2, 1, 1) + t(1, 2, −1) for some s, t ∈ R}

*2. Criticize the following argument: By Exercise 1.1.13, for any vector v, we have 0v = 0. So the
first criterion for subspaces is, in fact, a consequence of the second criterion and could therefore
be omitted.
♯ 3. Suppose x, v1 , . . . , vk ∈ Rn and x is orthogonal to each of the vectors v1 , . . . , vk . Prove that x
is orthogonal to any linear combination c1 v1 + c2 v2 + · · · + ck vk .

4. Prove Proposition 3.2.

5. Given vectors v1, . . . , vk ∈ R^n, prove that V = Span(v1, . . . , vk) is the smallest subspace containing them all. That is, prove that if W ⊂ R^n is a subspace and v1, . . . , vk ∈ W, then
V ⊂ W.
♯ 6. a. Let U and V be subspaces of Rn . Define

U ∩ V = {x ∈ Rn : x ∈ U and x ∈ V }.

Prove that U ∩ V is a subspace of Rn . Give two examples.


b. Is U ∪ V = {x ∈ Rn : x ∈ U or x ∈ V } a subspace of Rn ? Give a proof or counterexample.
c. Let U and V be subspaces of Rn . Define

U + V = {x ∈ Rn : x = u + v for some u ∈ U and v ∈ V }.

Prove that U + V is a subspace of Rn . Give two examples.

7. Let v1 , . . . , vk ∈ Rn and let v ∈ Rn . Prove that

Span(v1 , . . . , vk ) = Span(v1 , . . . , vk , v) ⇐⇒ v ∈ Span(v1 , . . . , vk ).

♯ *8. Let V ⊂ Rn be a subspace. Prove that V ∩ V ⊥ = {0}.


♯ 9. Suppose U, V ⊂ Rn are subspaces and U ⊂ V . Prove that V ⊥ ⊂ U ⊥ .
♯ 10. Let V ⊂ Rn be a subspace. Prove that V ⊂ (V ⊥ )⊥ . Do you think more is true?
♯ 11. Suppose V = Span(v1 , . . . , vk ) ⊂ Rn . Show that there are vectors w1 , . . . , wk ∈ V that are
mutually orthogonal (i.e., wi · wj = 0 whenever i ≠ j) that also span V . (Hint: Let w1 = v1 .
Using techniques of Section 2, define w2 so that Span(w1 , w2 ) = Span(v1 , v2 ) and w1 · w2 = 0.
Continue.)

12. Suppose U and V are subspaces of Rn . Prove that (U + V )⊥ = U ⊥ ∩ V ⊥ . (See the footnote on
p. 21.)

4. Linear Transformations and Matrix Algebra

We are heading towards calculus and the study of functions. As we learned in the case of one
variable, differential calculus is based on the idea of the best (affine) linear approximation of a
function. Thus, our first brush with functions is with those that are linear.
First we introduce a bit of notation. If X and Y are sets, a function f : X → Y is a rule that
assigns to each element x ∈ X a single element y ∈ Y ; we write y = f (x). We call X the domain of f
and Y the range. The image of f is the set of all its values, i.e., {y ∈ Y : y = f (x) for some x ∈ X}.

Definition. A function T : R^n → R^m is called a linear transformation or linear map if it satisfies
(i) T(u + v) = T(u) + T(v) for all u, v ∈ R^n;
(ii) T(cv) = cT(v) for all v ∈ R^n and scalars c.

If we think visually of T as mapping Rn to Rm , then we have a diagram like Figure 4.1.

Figure 4.1

The main point of the linearity properties is that the values of T on the standard basis vectors e1, . . . , en completely determine the function T: For suppose x = x1 e1 + · · · + xn en ∈ R^n; then

(∗) T(x) = T(x1 e1 + · · · + xn en) = T(x1 e1) + · · · + T(xn en) = x1 T(e1) + · · · + xn T(en).

In particular, let
$$T(e_j) = \begin{bmatrix} a_{1j} \\ a_{2j} \\ \vdots \\ a_{mj} \end{bmatrix} \in R^m;$$
then to T we can naturally associate the m × n array
$$A = \begin{bmatrix} a_{11} & \cdots & a_{1n} \\ a_{21} & \cdots & a_{2n} \\ \vdots & \ddots & \vdots \\ a_{m1} & \cdots & a_{mn} \end{bmatrix},$$
which we call the standard matrix for T. (We will often denote this by [T].) To emphasize: the j-th column of A is the vector in R^m obtained by applying T to the j-th standard basis vector, ej.

Example 1. The most basic example of a linear map is the following. Fix a ∈ Rn , and define
T : Rn → R by T (x) = a · x. By Proposition 2.1, we have

T (u + v) = a · (u + v) = (a · u) + (a · v) = T (u) + T (v), and


T (cv) = a · (cv) = c(a · v) = cT (v),

as required. Moreover, it is easy to see that
$$\text{if } a = \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{bmatrix}, \text{ then } [T] = \begin{bmatrix} a_1 & a_2 & \cdots & a_n \end{bmatrix}. \quad\triangledown$$

Examples 2. (a) Consider the function T : R^2 → R^2 defined by rotating vectors in the plane counterclockwise by 90°. Then it is easy to see from the geometry in Figure 4.2 that
$$T\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{bmatrix} -x_2 \\ x_1 \end{bmatrix}.$$

Figure 4.2

Now the linearity properties can be checked algebraically: If
$$x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \qquad\text{and}\qquad y = \begin{bmatrix} y_1 \\ y_2 \end{bmatrix}$$
are vectors, then
$$T(x + y) = T\begin{pmatrix} x_1 + y_1 \\ x_2 + y_2 \end{pmatrix} = \begin{bmatrix} -(x_2 + y_2) \\ x_1 + y_1 \end{bmatrix} = \begin{bmatrix} -x_2 \\ x_1 \end{bmatrix} + \begin{bmatrix} -y_2 \\ y_1 \end{bmatrix} = T(x) + T(y),$$
and, even easier,
$$T(cx) = T\Bigl(c\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}\Bigr) = T\begin{pmatrix} cx_1 \\ cx_2 \end{pmatrix} = \begin{bmatrix} -cx_2 \\ cx_1 \end{bmatrix} = c\begin{bmatrix} -x_2 \\ x_1 \end{bmatrix} = cT(x),$$
as required. The standard matrix for T is
$$[T] = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}.$$
Better yet, since rotation carries lines through the origin to lines through the origin
and triangles to congruent triangles, it is clear on geometric grounds that T must satisfy
properties (i) and (ii).
(b) Consider the function T : R2 → R2 defined by reflecting vectors across the line x1 = x2 , as
shown in Figure 4.3. (Visualize this as looking at vectors through a mirror along that line.)
Once again, we see from the geometry that
" #! " #
x1 x2
T = ,
x2 x1
and linearity is obvious algebraically. But it should also be clear on geometric grounds that
stretching a vector and then looking at it in the mirror is the same as stretching its mirror
image, and likewise for addition of vectors. The standard matrix for T is
" #
0 1
[T ] = .
1 0

Figure 4.3

(c) Consider the linear transformation T : R^2 → R^2 whose standard matrix is
$$A = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}.$$
The effect of T is pictured in Figure 4.4. One might slide a deck of cards in this fashion, and such a motion is called a shear.

Figure 4.4

(d) Consider the function T : R^3 → R^3 defined by reflecting across the plane x3 = 0. Then T(e1) = e1, T(e2) = e2, and T(e3) = −e3, so the standard matrix for T is
$$\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \end{bmatrix}. \quad\triangledown$$

(e) Generalizing (a), we consider rotation of R2 through the angle θ (given in radians). By
the same geometric argument we suggested earlier (see Figure 4.5), this is a linear trans-

Figure 4.5

formation of R2 . Now, as we can see from Figure 4.6, the standard matrix has as its first
column
$$T(e_1) = \begin{bmatrix} \cos\theta \\ \sin\theta \end{bmatrix}$$
(by the usual definition of cos θ and sin θ, in fact) and as its second
$$T(e_2) = \begin{bmatrix} -\sin\theta \\ \cos\theta \end{bmatrix}$$

(since e2 is obtained by rotating e1 through π/2, so then is T (e2 ) obtained by rotating


T (e1 ) through π/2). Thus, the standard matrix for T is
$$A_\theta = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}.$$

Figure 4.6
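A quick numerical sanity check of the rotation matrix (a sketch only, assuming NumPy):

```python
import numpy as np

def rotation_matrix(theta):
    # the standard matrix A_theta derived above
    return np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])

A = rotation_matrix(np.pi / 6)
print(A @ np.array([1.0, 0.0]))   # [cos(pi/6), sin(pi/6)]
print(A @ np.array([0.0, 1.0]))   # [-sin(pi/6), cos(pi/6)]
# Composing two rotations adds the angles (cf. Exercise 6):
assert np.allclose(rotation_matrix(0.3) @ rotation_matrix(0.5), rotation_matrix(0.8))
```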
 
(f) If ℓ ⊂ R2 is the line spanned by $\begin{bmatrix} 1 \\ 2 \end{bmatrix}$, then we can consider the linear maps S, T : R2 → R2
given respectively by projection onto, and reflection across, the line ℓ. Their standard matrices are
$$A = \begin{bmatrix} \tfrac15 & \tfrac25 \\[2pt] \tfrac25 & \tfrac45 \end{bmatrix} \qquad\text{and}\qquad B = \begin{bmatrix} -\tfrac35 & \tfrac45 \\[2pt] \tfrac45 & \tfrac35 \end{bmatrix}.$$

If we consider larger matrices, e.g.,
$$C = \begin{bmatrix} \tfrac16 & \tfrac13 + \tfrac{\sqrt6}{6} & \tfrac16 - \tfrac{\sqrt6}{3} \\[3pt] \tfrac13 - \tfrac{\sqrt6}{6} & \tfrac23 & \tfrac13 + \tfrac{\sqrt6}{6} \\[3pt] \tfrac16 + \tfrac{\sqrt6}{3} & \tfrac13 - \tfrac{\sqrt6}{6} & \tfrac16 \end{bmatrix},$$

then it seems impossible to discern the geometric nature of the linear map represented by
such a matrix.3 In these examples, the standard “coordinate system” built into matrices
just masks the geometry, and, as we shall see, the solution is to change our coordinate
system. This we do in Chapter 9. ▽

3
For the curious among you, multiplication by C gives a rotation of R3 through an angle of π/2 about the line spanned by $\begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix}$. See Exercise 9.2.21.

Let T : Rn → Rm be a linear map, and let A be its standard matrix. We want to define the
product of the m × n matrix A with the vector x ∈ Rn in such a way that the vector T (x) ∈ Rm is
equal to Ax. (We will occasionally denote the linear map defined in this way by µA .) In accordance
with the formula (∗) on p. 24, we have
$$Ax = T(x) = \sum_{i=1}^{n} x_i T(e_i) = \sum_{i=1}^{n} x_i a_i,$$

where
    
$$a_1 = \begin{bmatrix} a_{11} \\ \vdots \\ a_{m1} \end{bmatrix}, \quad a_2 = \begin{bmatrix} a_{12} \\ \vdots \\ a_{m2} \end{bmatrix}, \quad \dots, \quad a_n = \begin{bmatrix} a_{1n} \\ \vdots \\ a_{mn} \end{bmatrix} \in \mathbb{R}^m$$

are the column vectors of the matrix A. That is, Ax is the linear combination of the vectors
a1 , . . . , an , weighted according to the coordinates of the vector x.
There is, however, an alternative interpretation. Let
     
$$A_1 = \begin{bmatrix} a_{11} \\ \vdots \\ a_{1n} \end{bmatrix}, \quad A_2 = \begin{bmatrix} a_{21} \\ \vdots \\ a_{2n} \end{bmatrix}, \quad \dots, \quad A_m = \begin{bmatrix} a_{m1} \\ \vdots \\ a_{mn} \end{bmatrix} \in \mathbb{R}^n$$

be the row vectors of the matrix A. Then


   
$$Ax = \begin{bmatrix} a_{11}x_1 + \cdots + a_{1n}x_n \\ a_{21}x_1 + \cdots + a_{2n}x_n \\ \vdots \\ a_{m1}x_1 + \cdots + a_{mn}x_n \end{bmatrix} = \begin{bmatrix} A_1 \cdot x \\ A_2 \cdot x \\ \vdots \\ A_m \cdot x \end{bmatrix}.$$

As we shall study in great detail in Chapter 4, this allows us to interpret the equation Ax = y as
a system of m linear equations in the variables x1 , . . . , xn .
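The two interpretations of Ax can be checked side by side in a few lines of NumPy (an illustrative sketch, not part of the development):

```python
import numpy as np

A = np.array([[1., 2., 3.],
              [4., 5., 6.]])          # a 2 x 3 matrix
x = np.array([1., -1., 2.])

# Ax as a combination of the columns of A, weighted by the coordinates of x:
by_columns = sum(x[j] * A[:, j] for j in range(A.shape[1]))
# Ax as the vector of dot products with the rows of A:
by_rows = np.array([A[i, :] @ x for i in range(A.shape[0])])

assert np.allclose(by_columns, A @ x)
assert np.allclose(by_rows, A @ x)
```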

4.1. Algebra of Linear Functions. Denote by Mm×n the set of all m × n matrices. In an
obvious way this set can be identified with Rmn (how?). Indeed, we begin by observing that we
can add m × n matrices and multiply them by scalars just as we did vectors.
For future reference, we call a matrix square if m = n (i.e., it has equal numbers of rows and
columns). We refer to the entries aii , i = 1, . . . , n, as diagonal entries. We call the (square) matrix
a diagonal matrix if aij = 0 whenever i ≠ j, i.e., if every non-diagonal entry is 0. A square matrix
all of whose entries below the diagonal are 0 is called upper triangular ; one all of whose entries
above the diagonal are 0 is called lower triangular .
If S, T : Rn → Rm are linear maps and c ∈ R, then we can obviously form the linear maps
cT : Rn → Rm and S + T : Rn → Rm , defined, respectively, by

(cT )(x) = c(T (x))


(S + T )(x) = S(x) + T (x).

The corresponding algebraic manipulations with matrices are clear: If A = [aij ], i = 1, . . . , m,


j = 1, . . . , n, then cA is the matrix whose entries are caij :
   
$$cA = c\begin{bmatrix} a_{11} & \cdots & a_{1n} \\ a_{21} & \cdots & a_{2n} \\ \vdots & & \vdots \\ a_{m1} & \cdots & a_{mn} \end{bmatrix} = \begin{bmatrix} ca_{11} & \cdots & ca_{1n} \\ ca_{21} & \cdots & ca_{2n} \\ \vdots & & \vdots \\ ca_{m1} & \cdots & ca_{mn} \end{bmatrix}.$$
Given two matrices A and B ∈ Mm×n , we define their sum entry by entry. In symbols, when
   
$$A = \begin{bmatrix} a_{11} & \cdots & a_{1n} \\ a_{21} & \cdots & a_{2n} \\ \vdots & & \vdots \\ a_{m1} & \cdots & a_{mn} \end{bmatrix} \qquad\text{and}\qquad B = \begin{bmatrix} b_{11} & \cdots & b_{1n} \\ b_{21} & \cdots & b_{2n} \\ \vdots & & \vdots \\ b_{m1} & \cdots & b_{mn} \end{bmatrix},$$
we define
$$A + B = \begin{bmatrix} a_{11} + b_{11} & \cdots & a_{1n} + b_{1n} \\ a_{21} + b_{21} & \cdots & a_{2n} + b_{2n} \\ \vdots & & \vdots \\ a_{m1} + b_{m1} & \cdots & a_{mn} + b_{mn} \end{bmatrix}.$$
Example 3. Let c = −2 and
   
$$A = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 1 & -2 \\ 4 & -1 & 3 \end{bmatrix}, \quad B = \begin{bmatrix} 6 & 4 & -1 \\ -3 & 1 & 1 \\ 0 & 0 & 0 \end{bmatrix}, \quad C = \begin{bmatrix} 1 & 2 & 1 \\ 2 & 1 & 1 \end{bmatrix}.$$
Then
$$cA = \begin{bmatrix} -2 & -4 & -6 \\ -4 & -2 & 4 \\ -8 & 2 & -6 \end{bmatrix}, \qquad A + B = \begin{bmatrix} 7 & 6 & 2 \\ -1 & 2 & -1 \\ 4 & -1 & 3 \end{bmatrix},$$
and neither sum A + C nor B + C makes sense, since C has a different shape from A and B. (One
should not expect to be able to add functions with different domains or ranges.) ▽

Denote by O the zero matrix , the m × n matrix all of whose entries are 0. As the reader can
easily check, scalar multiplication of matrices and matrix addition satisfy the same properties as
scalar multiplication of vectors and vector addition (see Exercise 1.1.12). We list them here for
reference.

Proposition 4.1. Let A, B, C ∈ Mm×n and let c, d ∈ R.


(1) A + B = B + A.
(2) (A + B) + C = A + (B + C).
(3) O + A = A.
(4) There is a matrix −A so that A + (−A) = O.
(5) c(dA) = (cd)A.
(6) c(A + B) = cA + cB.
(7) (c + d)A = cA + dA.

(8) 1A = A.

Of all the operations one performs on functions, probably the most powerful is composition.
Recall that when g(x) is in the domain of f , we define (f ◦ g)(x) = f (g(x)). So, suppose we have
linear maps S : Rp → Rn and T : Rn → Rm . Then we define T ◦ S : Rp → Rm by (T ◦ S)(x) =
T (S(x)). It is well known that composition of functions is not commutative4 but is associative,
inasmuch as
 
(f ◦ g)◦ h (x) = f (g(h(x))) = f ◦ (g ◦ h) (x).

We want to define matrix multiplication so that it corresponds to the composition of linear maps.
Let A be the m × n matrix representing T and let B be the n × p matrix representing S. We expect
that the m × p matrix C representing T ◦ S can be expressed in terms of A and B. The j th column
of C is the vector (T ◦ S)(ej ) ∈ Rm . Now,
 
$$T(S(e_j)) = T\begin{bmatrix} b_{1j} \\ b_{2j} \\ \vdots \\ b_{nj} \end{bmatrix} = b_{1j}a_1 + b_{2j}a_2 + \cdots + b_{nj}a_n,$$

where a1 , . . . , an are the column vectors of A. That is, the j th column of C is the product of the
matrix A with the vector bj . So we now make the definition:

Definition. Let A be an m × n matrix and B an n × p matrix. Their product AB is the m × p


matrix whose j th column is the product of A with the j th column of B. That is, its ij-entry is

(AB)ij = ai1 b1j + ai2 b2j + · · · + ain bnj ,

i.e., the dot product of the ith row vector of A and the j th column vector of B, both of which are
vectors in Rn . Graphically, we have
   
$$\begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ \vdots & & & \vdots \\ a_{i1} & a_{i2} & \cdots & a_{in} \\ \vdots & & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix} \begin{bmatrix} b_{11} & \cdots & b_{1j} & \cdots & b_{1p} \\ b_{21} & \cdots & b_{2j} & \cdots & b_{2p} \\ \vdots & & \vdots & & \vdots \\ b_{n1} & \cdots & b_{nj} & \cdots & b_{np} \end{bmatrix} = \begin{bmatrix} \cdots & \cdots & \cdots & \cdots & \cdots \\ \vdots & & \vdots & & \vdots \\ \cdots & \cdots & (AB)_{ij} & \cdots & \cdots \\ \vdots & & \vdots & & \vdots \\ \cdots & \cdots & \cdots & \cdots & \cdots \end{bmatrix}.$$

We reiterate that in order for the product AB to be defined, the number of columns of A must
equal the number of rows of B.
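As a sketch (assuming NumPy), the definition can be implemented entry by entry and compared with the built-in product:

```python
import numpy as np

def matmul(A, B):
    """The ij-entry of AB is the dot product of the i-th row of A with the j-th column of B."""
    m, n = A.shape
    n2, p = B.shape
    assert n == n2, "the number of columns of A must equal the number of rows of B"
    C = np.zeros((m, p))
    for i in range(m):
        for j in range(p):
            C[i, j] = A[i, :] @ B[:, j]
    return C

A = np.array([[1., 3.], [2., -1.], [1., 1.]])
B = np.array([[4., 1., 0., -2.], [-1., 1., 5., 1.]])
assert np.allclose(matmul(A, B), A @ B)   # agrees with Example 4 below
```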

Example 4. If
 
$$A = \begin{bmatrix} 1 & 3 \\ 2 & -1 \\ 1 & 1 \end{bmatrix} \qquad\text{and}\qquad B = \begin{bmatrix} 4 & 1 & 0 & -2 \\ -1 & 1 & 5 & 1 \end{bmatrix},$$

4
e.g., sin(x²) ≠ sin² x

then
   
$$AB = \begin{bmatrix} 1 & 3 \\ 2 & -1 \\ 1 & 1 \end{bmatrix}\begin{bmatrix} 4 & 1 & 0 & -2 \\ -1 & 1 & 5 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 4 & 15 & 1 \\ 9 & 1 & -5 & -5 \\ 3 & 2 & 5 & -1 \end{bmatrix}.$$
Notice also that the product BA does not make sense: B is a 2 × 4 matrix and A is 3 × 2, and
4 ≠ 3. ▽

The preceding example brings out an important point about the nature of matrix multiplication:
it can happen that the matrix product AB is defined and the product BA is not. Now if A is an
m × n matrix and B is an n × m matrix, then both products AB and BA make sense: AB is m × m
and BA is n × n. Notice that these are both square matrices, but of different sizes. But even if we
start with both A and B as n × n matrices, the products AB and BA need not be equal.

Example 5. Let
$$A = \begin{bmatrix} 1 & 2 \\ -3 & 1 \end{bmatrix} \qquad\text{and}\qquad B = \begin{bmatrix} -1 & 0 \\ 1 & 0 \end{bmatrix}.$$
Then
$$AB = \begin{bmatrix} 1 & 0 \\ 4 & 0 \end{bmatrix}, \qquad\text{whereas}\qquad BA = \begin{bmatrix} -1 & -2 \\ 1 & 2 \end{bmatrix}. \ ▽$$

When—and only when—A is a square matrix (i.e., m = n), we can multiply A by itself,
obtaining A2 = AA, A3 = A2 A = AA2 , etc. If we think of Ax as resulting from x by performing
some geometric procedure, then (A2 )x should result from performing that procedure twice, (A3 )x
thrice, and so on.

Example 6. Let
$$A = \begin{bmatrix} \tfrac15 & \tfrac25 \\[2pt] \tfrac25 & \tfrac45 \end{bmatrix}.$$

Then it is easy to check that A2 = A, so An = A for all positive integers n (why?). What is the
geometric explanation? Note that
$$A\begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} \tfrac15 \\[2pt] \tfrac25 \end{bmatrix} = \tfrac15\begin{bmatrix} 1 \\ 2 \end{bmatrix} \qquad\text{and}\qquad A\begin{bmatrix} 0 \\ 1 \end{bmatrix} = \begin{bmatrix} \tfrac25 \\[2pt] \tfrac45 \end{bmatrix} = \tfrac25\begin{bmatrix} 1 \\ 2 \end{bmatrix},$$
so that for every x ∈ R², we see that Ax lies on the line spanned by $\begin{bmatrix} 1 \\ 2 \end{bmatrix}$. Indeed, we can tell more:
$$A\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} \tfrac15(x_1 + 2x_2) \\[2pt] \tfrac25(x_1 + 2x_2) \end{bmatrix} = \frac{x_1 + 2x_2}{5}\begin{bmatrix} 1 \\ 2 \end{bmatrix} = \frac{x \cdot \begin{bmatrix} 1 \\ 2 \end{bmatrix}}{\left\|\begin{bmatrix} 1 \\ 2 \end{bmatrix}\right\|^2}\begin{bmatrix} 1 \\ 2 \end{bmatrix}$$
is the projection of x onto the line spanned by $\begin{bmatrix} 1 \\ 2 \end{bmatrix}$. This explains why A²x = Ax for every x ∈ R²:
A2 x = A(Ax), and once we’ve projected the vector x onto the line, it stays exactly the same. ▽
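A short numerical check of this example (a sketch using NumPy):

```python
import numpy as np

# The matrix of Example 6 is the projection onto the line spanned by (1, 2), so A @ A equals A.
A = np.array([[1/5, 2/5],
              [2/5, 4/5]])
assert np.allclose(A @ A, A)

u = np.array([1., 2.])
x = np.array([3., -1.])
proj = (x @ u) / (u @ u) * u       # projection of x onto span{u}
assert np.allclose(A @ x, proj)
```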

Example 7. There is an interesting way to interpret matrix powers in terms of directed graphs.
Starting with the matrix  
$$A = \begin{bmatrix} 0 & 2 & 1 \\ 1 & 1 & 1 \\ 1 & 0 & 1 \end{bmatrix},$$
we draw a graph with 3 nodes (vertices) and aij directed edges (paths) from node i to node j, as
shown in Figure 4.7. For example, there are 2 edges from node 1 to node 2 and none from node 3

Figure 4.7

to node 2.
We calculate
    
$$A^2 = \begin{bmatrix} 0 & 2 & 1 \\ 1 & 1 & 1 \\ 1 & 0 & 1 \end{bmatrix}\begin{bmatrix} 0 & 2 & 1 \\ 1 & 1 & 1 \\ 1 & 0 & 1 \end{bmatrix} = \begin{bmatrix} 3 & 2 & 3 \\ 2 & 3 & 3 \\ 1 & 2 & 2 \end{bmatrix},$$
$$A^3 = \begin{bmatrix} 5 & 8 & 8 \\ 6 & 7 & 8 \\ 4 & 4 & 5 \end{bmatrix}, \quad\text{and}\ \dots\quad A^7 = \begin{bmatrix} 272 & 338 & 377 \\ 273 & 337 & 377 \\ 169 & 208 & 233 \end{bmatrix}.$$
For example, the 13-entry of A2 is

(A2 )13 = a11 a13 + a12 a23 + a13 a33 = (0)(1) + (2)(1) + (1)(1) = 3.

With a bit of thought, the reader will convince herself that the ij-entry of A2 is the number of
“two-step” directed paths from node i to node j. Similarly, the ij-entry of An is the number of
n-step directed paths from node i to node j. ▽
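One can confirm the path-counting interpretation numerically; the following sketch (assuming NumPy) reproduces the powers computed above:

```python
import numpy as np

# Entry (i, j) of A^n counts the n-step directed paths from node i to node j
# in the graph of Figure 4.7.
A = np.array([[0, 2, 1],
              [1, 1, 1],
              [1, 0, 1]])
print(np.linalg.matrix_power(A, 2))   # [[3 2 3], [2 3 3], [1 2 2]]
print(np.linalg.matrix_power(A, 7))   # matches the matrix A^7 displayed above
```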

We have seen that, in general, matrix multiplication is not commutative. However, it does have
the following crucial properties. Let In denote the n × n matrix with 1’s on the diagonal and 0’s
elsewhere.

Proposition 4.2. Let A and A′ be m × n matrices; let B and B ′ be n × p matrices; let C be


a p × q matrix, and let c be a scalar. Then

(1) AIn = A = Im A. For this reason, In is called the n × n identity matrix.


(2) (A + A′ )B = AB + A′ B and A(B + B ′ ) = AB + AB ′ . This is the distributive property of
matrix multiplication over matrix addition.
(3) (cA)B = c(AB) = A(cB).
(4) (AB)C = A(BC). This is the associative property of matrix multiplication.

Proof. These are all immediate from the linear map viewpoint. 

One of the important concepts is that of the inverse of a function.

Definition. Let A be an n × n matrix. We say A is invertible if there is an n × n matrix B so


that
AB = BA = In .
We call B the inverse of the matrix A and denote this by B = A−1 .

If A is the matrix representing the linear transformation T : Rn → Rn , then A−1 represents the
inverse function T −1 , which must then also be a linear transformation.

Example 8. Let
$$A = \begin{bmatrix} 2 & 5 \\ 1 & 3 \end{bmatrix} \qquad\text{and}\qquad B = \begin{bmatrix} 3 & -5 \\ -1 & 2 \end{bmatrix}.$$
Then AB = I2 and BA = I2 , so B is the inverse matrix of A. ▽

Example 9. It will be convenient for our future work to have the inverse of a 2 × 2 matrix
$$A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}.$$
Provided ad − bc ≠ 0, if we set
$$A^{-1} = \frac{1}{ad - bc}\begin{bmatrix} d & -b \\ -c & a \end{bmatrix},$$
then an easy calculation shows that AA−1 = A−1 A = I2 , as needed. ▽
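The 2 × 2 inverse formula is easy to code; here is a small sketch (assuming NumPy) that recovers the inverse of Example 8:

```python
import numpy as np

def inverse_2x2(A):
    """Inverse of a 2 x 2 matrix via the formula of Example 9."""
    a, b = A[0]
    c, d = A[1]
    det = a * d - b * c
    if det == 0:
        raise ValueError("ad - bc = 0: the matrix is not invertible")
    return np.array([[d, -b], [-c, a]]) / det

A = np.array([[2., 5.], [1., 3.]])
print(inverse_2x2(A))                        # [[ 3. -5.], [-1.  2.]], as in Example 8
assert np.allclose(A @ inverse_2x2(A), np.eye(2))
```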

Example 10. It follows immediately from Example 9 that for our rotation matrix
$$A_\theta = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}, \qquad\text{we have}\qquad A_\theta^{-1} = \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix}.$$
Since cos(−θ) = cos θ and sin(−θ) = − sin θ, we see that this is the matrix A(−θ) . If we think
about the corresponding linear maps, this result becomes obvious: to invert (or “undo”) a rotation
through angle θ, we must rotate through angle −θ. ▽

Example 11. As an application of Example 9, we can now show that any two nonparallel vectors u, v ∈ R² must span R². It is easy to check that if $u = \begin{bmatrix} u_1 \\ u_2 \end{bmatrix}$ and $v = \begin{bmatrix} v_1 \\ v_2 \end{bmatrix}$ are nonparallel, then u₁v₂ − u₂v₁ ≠ 0, so the matrix
$$A = \begin{bmatrix} u_1 & v_1 \\ u_2 & v_2 \end{bmatrix}$$
is invertible. Given x ∈ R², define $c = \begin{bmatrix} c_1 \\ c_2 \end{bmatrix}$ by c = A⁻¹x. Then we have

x = A(A−1 x) = Ac = c1 u + c2 v,

thereby establishing that an arbitrary x ∈ R2 is a linear combination of u and v. Indeed, more is


true: That linear combination is unique, since x = Ac if and only if c = A−1 x. We shall study the
generalization of this result to higher dimensions in great detail in Chapter 4. ▽

We shall learn in Chapter 4 how to calculate the inverse of a matrix in a straightforward fashion.
We end the present discussion of inverses with a very important observation.

Proposition 4.3. Suppose A and B are invertible n × n matrices. Then their product AB is
invertible, and
(AB)−1 = B −1 A−1 .

Remark. Some people refer to this result rather endearingly as the “shoe-sock theorem,” for
to undo (invert) the process of putting on one’s socks and then one’s shoes, one must first remove
the shoes and then remove the socks.

Proof. To prove the matrix AB is invertible, we need only check that the candidate for the
inverse works. That is, we need to check that

(AB)(B −1 A−1 ) = In and (B −1 A−1 )(AB) = In .

But these follow immediately from associativity:

(AB)(B −1 A−1 ) = A(BB −1 )A−1 = AIn A−1 = AA−1 = In , and


(B −1 A−1 )(AB) = B −1 (A−1 A)B = B −1 In B = B −1 B = In . 
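A quick numerical spot-check of Proposition 4.3 (a sketch, assuming NumPy; the random matrices below are invertible with probability one):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))
lhs = np.linalg.inv(A @ B)
rhs = np.linalg.inv(B) @ np.linalg.inv(A)   # socks off after shoes off
assert np.allclose(lhs, rhs)
```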

4.2. The Transpose. The final matrix operation we will discuss in this chapter is the trans-
pose. When A is an m × n matrix with entries aij , the matrix AT (read “A transpose”) is the n × m
matrix whose ij-entry is aji ; i.e., the ith row of AT is the ith column of A. We say a square matrix
A is symmetric if AT = A and skew-symmetric if AT = −A.

Example 12. Suppose
$$A = \begin{bmatrix} 1 & 2 & 1 \\ 3 & -1 & 0 \end{bmatrix}, \quad B = \begin{bmatrix} 1 & 3 \\ 2 & -1 \\ 1 & 0 \end{bmatrix}, \quad C = \begin{bmatrix} 1 \\ 2 \\ -3 \end{bmatrix}, \quad\text{and}\quad D = \begin{bmatrix} 1 & 2 & -3 \end{bmatrix}.$$

Then AT = B, B T = A, C T = D, and D T = C. Note, in particular, that the transpose of a column


vector, i.e., a n × 1 matrix, is a row vector, i.e., a 1 × n matrix. An example of a symmetric matrix
is
$$S = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 0 & -1 \\ 3 & -1 & 7 \end{bmatrix}, \qquad\text{since}\qquad S^T = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 0 & -1 \\ 3 & -1 & 7 \end{bmatrix} = S. \ ▽$$

The basic properties of the transpose operation are as follows:



Proposition 4.4. Let A and A′ be m × n matrices, let B be an n × p matrix, and let c be a


scalar. Then
(1) (AT )T = A
(2) (cA)T = cAT
(3) (A + A′ )T = AT + A′T
(4) (AB)T = B T AT

Proof. The first is obvious, since we swap rows and columns and then swap again, returning
to our original matrix. The second and third are immediate to check. The last result is more
interesting and we will use it to derive a crucial result in a moment. Note, first, that AB is an
m × p matrix, so (AB)T will be a p × m matrix; B T AT is the product of a p × n matrix and an
n × m matrix and hence will be p × m as well, so the shapes agree. Now, the ji-entry of AB is the
dot product of the j th row vector of A and the ith column vector of B, i.e., the ij-entry of (AB)T
is

(AB)T ij = (AB)ji = Aj · bi .
On the other hand, the ij-entry of B T AT is the dot product of the ith row vector of B T and the
j th column vector of AT ; but this is, by definition, the dot product of the ith column vector of B
and the j th row vector of A. That is,
(B T AT )ij = bi · Aj ,
and, since dot product is commutative, the two formulas agree. 

The transpose matrix will be so important to us because of the interplay between dot product
and transpose. If x and y are vectors in Rn , then by virtue of our very definition of matrix
multiplication,
x · y = xT y,
provided we agree to think of a 1 × 1 matrix as a scalar. Now we have the highly useful

Proposition 4.5. Let A be an m × n matrix, x ∈ Rn , and y ∈ Rm . Then


Ax · y = x · AT y.
(On the left, we take the dot product of vectors in Rm ; on the right, of vectors in Rn .)

Remark. You might remember this: to move the matrix “across the dot product,” you must
transpose it.

Proof. We just calculate, using the formula for the transpose of a product and, as usual,
associativity:
Ax · y = (Ax)T y = (xT AT )y = xT (AT y) = x · AT y. 
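Proposition 4.5 is easy to test numerically; the following sketch (assuming NumPy) checks it on random data:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 5))    # m = 3, n = 5
x = rng.standard_normal(5)
y = rng.standard_normal(3)
# Moving A across the dot product requires transposing it:
assert np.isclose((A @ x) @ y, x @ (A.T @ y))
```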

Example 13. We return to the economic interpretation of dot product given in the remark on p. 12. Suppose that m different ingredients are required to manufacture n different products. To manufacture the product vector $x = \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix}$ requires the ingredient vector $y = \begin{bmatrix} y_1 \\ \vdots \\ y_m \end{bmatrix}$, and we

suppose x and y are related by the equation y = Ax for some m × n matrix A. If each unit of
ingredient j costs a price pj , then the cost of producing x is
$$\sum_{j=1}^{m} p_j y_j = y \cdot p = Ax \cdot p = x \cdot A^T p = \sum_{i=1}^{n} q_i x_i,$$

where q = AT p. Notice then that qi is the amount it costs to produce a unit of the ith product.
Our fundamental formula, Proposition 4.5, tells us that the total cost of the ingredients should
equal the total worth of the products we manufacture. ▽

EXERCISES 1.4

1. Let $A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}$, $B = \begin{bmatrix} 2 & 1 \\ 4 & 3 \end{bmatrix}$, $C = \begin{bmatrix} 1 & 2 & 1 \\ 0 & 1 & 2 \end{bmatrix}$, and $D = \begin{bmatrix} 0 & 1 \\ 1 & 0 \\ 2 & 3 \end{bmatrix}$. Calculate
each of the following expressions or explain why it is not defined.
a. A+B e. AB i. BD
*b. 2A − B *f. BA j. DB
c. A−C *g. AC *k. CD
d. C +D *h. CA *l. DC
2. a. If A is an m × n matrix and Ax = 0 for all x ∈ Rn , prove that A = O.
b. If A and B are m × n matrices and Ax = Bx for all x ∈ Rn , prove that A = B.
♯ 3. Let A be an m × n matrix. Show that V = {x ∈ Rn : Ax = 0} is a subspace of Rn .
♯ 4. Let A be an m × n matrix.
  
x n
a. Show that V = :x∈R ⊂ Rm+n is a subspace of Rm+n .
Ax
b. When m = 1, show that V ⊂ Rn+1 is a hyperplane (see Example 1(e) in Section 3) by
finding a vector b ∈ Rn+1 so that V = {z ∈ Rn+1 : b · z = 0}.

5. Give 2 × 2 matrices A so that for any x ∈ R2 we have, respectively:


a. Ax is the vector whose components are, respectively, the sum and difference of the com-
ponents of x;
*b. Ax is the vector obtained by projecting x onto the line x1 = x2 in R2 ;
c. Ax is the vector obtained by first reflecting x across the line x1 = 0 and then reflecting
the resulting vector across the line x2 = 0;
d. Ax is the vector obtained by projecting x onto the line 2x1 − x2 = 0;
*e. Ax is the vector obtained by first projecting x onto the line 2x1 − x2 = 0 and then rotating
the resulting vector π/2 counterclockwise;
f. Ax is the vector obtained by first rotating x an angle of π/2 counterclockwise and then
projecting the resulting vector onto the line 2x1 − x2 = 0.

6. a. Calculate Aθ Aφ and Aφ Aθ . (Recall the definition of the rotation matrix on p. 27.)



b. Use your answer to part a to derive the addition formulas for cos and sin.

7. Let Aθ be the rotation matrix defined on p. 27, 0 ≤ θ ≤ π. Prove that


a. kAθ xk = kxk for all x ∈ R2 ;
b. the angle between x and Aθ x is θ.
These properties should characterize a rotation of the plane through angle θ.

8. Prove or give a counterexample. Assume all relevant matrices are square and of the same size.
a. If AB = CB and B 6= O, then A = C.
b. If A2 = A, then A = O or A = I.
c. (A + B)(A − B) = A2 − B 2 .
d. If AB = BC and B is invertible, then A = C.
 
a b
9. Find all 2 × 2 matrices A = satisfying
c d
a. A2 = I2
*b. A2 = O
c. A2 = −I2
 
cos θ
10. a. Show that the matrix giving reflection across the line spanned by is
" # sin θ
cos 2θ sin 2θ
R= .
sin 2θ − cos 2θ
b. Letting Aθ be the rotation matrix defined on p. 27, check that
" # " #
1 0 1 0
A2θ = R = Aθ A(−θ) .
0 −1 0 −1

11. For each of the following matrices A, find a formula for An . (If you know how to do proof by
induction, please
 do.)
1 1
a. A =
0 1
 
d1
 d2 
 
b. A =  ..  (all nondiagonal entries are 0)
 . 
dm

12. Suppose A and A′ are m × m matrices, B and B ′ are m × n matrices, C and C ′ are n × m
matrices, and D and D ′ are n × n matrices. Check the following formula for the product of
“block” matrices:
" #" # " #
A B A′ B ′ AA′ + BC ′ AB ′ + BD ′
= .
C D C ′ D′ CA′ + DC ′ CB ′ + DD ′

*13. Let T : R2 → R2 be the linear transformation defined by rotating the plane π/2 counterclock-
wise; let S : R2 → R2 be the linear transformation defined by reflecting the plane across the
line x1 + x2 = 0.

a. Give the standard matrices representing S and T .


b. Give the standard matrix representing T ◦ S.
c. Give the standard matrix representing S ◦ T .

14. Calculate the standard matrix for each of the following linear transformations T :
a. T : R2 → R2 given by rotating −π/4 about the origin and then reflecting across the line
x1 − x2 = 0.
b. T : R3 → R3 given by rotating π/2 about the x1 -axis (as viewed from the positive side)
and then reflecting across the plane x2 = 0.
c. T : R3 → R3 given by rotating −π/2 about the x1 -axis (as viewed from the positive side)
and then rotating π/2 about the x3 -axis.
 
±1
15. Consider the cube with vertices  ±1 , pictured in Figure 4.8. (Note that the coordinate axes
±1
pass through the centers of the various faces.) Give the standard matrices for each of the
following symmetries of the cube.

Figure 4.8

a. 90◦ rotation about the x3 -axis (viewed


 from
 highabove)

−1 1
b. 180◦ rotation about the line joining  0  and  0 
1 −1
   
−1 1
c. 120◦ rotation about the line joining  −1  and  1  (viewed from high above)
−1 1
       
1 −1 1 −1
16. Consider the tetrahedron with vertices  1 ,  −1 ,  −1 , and  1 , pictured in Figure
1 1 −1 −1
4.9. Give the standard matrices for each of the following symmetries of the tetrahedron.

Figure 4.9
 
0
a. 120◦ rotation counterclockwise (as viewed from high above) about the line joining  0 
0
 
1
and the vertex  1 
1
   
0 0
b. 180◦ rotation about the line joining  0  and  0 
1 −1
c. reflection across the plane containing one edge and the midpoint of the opposite edge
(Hint: Note where the coordinate axes intersect the tetrahedron.)

*17. Suppose A is an n × n matrix and B is an invertible n × n matrix. Calculate the following.


a. (BAB −1 )2
b. (BAB −1 )n (n a positive integer)
c. (BAB −1 )−1 (what additional assumption is required here?)

18. Find matrices A so that


a. A 6= O, but A2 = O
b. A2 6= O, but A3 = O
Can you make a conjecture about matrices satisfying An−1 6= O but An = O?

*19. Suppose A is an invertible n × n matrix and x ∈ Rn satisfies Ax = 7x. Calculate A−1 x.

20. Suppose A is a square matrix satisfying the equation A3 − 3A + 2I = 0. Show that A is


invertible. (Hint: Can you give an explicit formula for A−1 ?)

21. Suppose A is an n × n matrix satisfying A10 = O. Prove that the matrix In − A is invertible.
(Hint: As a warm-up, try assuming A2 = O.)

22. Define the trace of an n × n matrix A (denoted trA) to be the sum of its diagonal entries:
$$\operatorname{tr} A = \sum_{i=1}^{n} a_{ii}.$$

a. Prove that trA = tr(AT ).


b. Prove that tr(A + B) = trA + trB and tr(cA) = c trA for any scalar c.

n P
P n n P
P n
c. Prove that tr(AB) = tr(BA). (Hint: ckℓ = ckℓ .)
k=1 ℓ=1 ℓ=1 k=1

23. Let
$$A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}, \quad B = \begin{bmatrix} 2 & 1 \\ 4 & 3 \end{bmatrix}, \quad C = \begin{bmatrix} 1 & 2 & 1 \\ 0 & 1 & 2 \end{bmatrix}, \quad\text{and}\quad D = \begin{bmatrix} 0 & 1 \\ 1 & 0 \\ 2 & 3 \end{bmatrix}.$$
Calculate each of the following expressions or explain why it is not defined.
a. AT *e. AT C i. DT B
*b. 2A − B T f. AC T *j. CC T
c. CT *g. C T AT *k. C TC
d. CT + D h. BD T l. C T DT
*24. Suppose A and B are symmetric. Prove that AB is symmetric if and only if AB = BA.

25. Let A be an arbitrary m × n matrix. Prove that AT A is symmetric.

26. Suppose A is invertible. Check that (A−1 )T AT = I and AT (A−1 )T = I, and deduce that AT is
likewise invertible.

*27. Let Aθ be the rotation matrix defined on p. 27. Explain why $A_\theta^{-1} = A_\theta^T$.

28. An n × n matrix is called a permutation matrix if it has a single 1 in each row and column and
all its remaining entries are 0.
a. Write down all the 2 × 2 permutation matrices. How many are there?
b. Write down all the 3 × 3 permutation matrices. How many are there?
c. Prove that the product of two permutation matrices is again a permutation matrix. Do
they commute?
d. Prove that every permutation matrix is invertible and P −1 = P T .
e. If A is an n × n matrix and P is an n × n permutation matrix, describe the matrices P A
and AP .
♯ 29. Let A be an m × n matrix and let x, y ∈ Rn . Prove that if Ax = 0 and y = AT b for some
b ∈ Rm , then x · y = 0.
♯ 30. Suppose A is a symmetric n × n matrix. Let V ⊂ Rn be a subspace with the property that
Ax ∈ V for every x ∈ V . Prove that Ay ∈ V ⊥ for all y ∈ V ⊥ .

*31. Given the matrix


   
1 2 1 4 −3 1
   
A = 1 3 1 and its inverse matrix A−1 =  −1 1 0,
0 1 −1 −1 1 −1
find (with no computation) the inverse of
     
1 1 0 1 2 1 1 2 1
     
a.  2 3 1 b.  0 1 −1  c.  1 3 1
1 1 −1 1 3 1 0 2 −2

♯ *32. Suppose A is an m × n matrix and x ∈ Rn satisfies (AT A)x = 0. Prove that Ax = 0. (Hint:
What is kAxk?)

33. Suppose A is a symmetric matrix satisfying A2 = O. Prove that A = O. Give an example to


show that the hypothesis of symmetry is required.
♯ 34. We say an n × n matrix A is orthogonal if AT A = In .
a. Prove that the column vectors a1 , . . . , an of an orthogonal matrix A are unit vectors that
are orthogonal to one another, i.e.,

1, i = j
ai · aj = .
0, i 6= j

b. Fill in the missing columns in the following matrices to make them orthogonal:
   
" √ # 1 0 ? 1
? 2
3
2 ?    32 3
2 
,  0 −1 ?  ,  3 ? − 3 
− 12 ? 2 1
0 0 ? 3 ? 3

c. Prove that any 2 × 2 orthogonal matrix A must be of the form


" # " #
cos θ − sin θ cos θ sin θ
or
sin θ cos θ sin θ − cos θ

for some real number θ. (Hint: Use part a, rather than the original definition.)
*d. Prove that if A is an orthogonal 2 × 2 matrix, then µA : R2 → R2 is either a rotation or
the composition of a rotation and a reflection.
e. Assume for now that AT = A−1 when A is orthogonal (this is a consequence of Corollary
2.2 of Chapter 4). Prove that the row vectors A1 , . . . , An of an orthogonal matrix A are
unit vectors that are orthogonal to one another.

35. (Recall the definition of orthogonal matrices from Exercise 34.)


a. Prove that if A and B are orthogonal n × n matrices, then so is AB.
*b. Prove that if A is an orthogonal matrix, then so is A−1 .
♯ 36. a. Prove that the only matrix that is both symmetric and skew-symmetric is O.
b. Given any square matrix A, prove that S = ½(A + Aᵀ) is symmetric and K = ½(A − Aᵀ)
is skew-symmetric.
c. Prove that any square matrix A can be written in the form A = S + K, where S is
symmetric and K is skew-symmetric.
d. Prove that the expression in part c is unique: if A = S + K and A = S ′ + K ′ (where S and
S ′ are symmetric and K and K ′ are skew-symmetric), then S = S ′ and K = K ′ . (Hint:
Use part a.)

37. Suppose A is an n × n matrix that commutes with all n × n matrices; i.e., AB = BA for all
B ∈ Mn×n . What can you say about A?

5. Introduction to Determinants and the Cross Product

Let x and y be vectors in R2 and consider the parallelogram P they span. The area of P is
nonzero so long as x and y are not collinear. We want to express the area of P in terms of the
coordinates of x and y. First notice that the area of the parallelogram pictured in Figure 5.1 is the

Figure 5.1

same as the area of the rectangle obtained by moving the shaded triangle from the right side to
the left. This rectangle has area A = bh, where b = kxk is the base and h = kyk sin θ is the height.
We could calculate sin θ from the formula
$$\cos\theta = \frac{x \cdot y}{\|x\|\,\|y\|},$$
but instead we note (see Figure 5.2) that
$$\|x\|\,\|y\|\sin\theta = \|x\|\,\|y\|\cos\left(\tfrac{\pi}{2} - \theta\right) = \rho(x) \cdot y,$$

where ρ(x) is the vector obtained by rotating x an angle π/2 counterclockwise (see Exercise 1.2.16).

Figure 5.2
   
If $x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$ and $y = \begin{bmatrix} y_1 \\ y_2 \end{bmatrix}$, then we have
$$\operatorname{area}(\mathcal P) = \rho(x) \cdot y = \begin{bmatrix} -x_2 \\ x_1 \end{bmatrix} \cdot \begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = x_1 y_2 - x_2 y_1.$$
   
Example 1. If $x = \begin{bmatrix} 3 \\ 1 \end{bmatrix}$ and $y = \begin{bmatrix} 4 \\ 3 \end{bmatrix}$, then the area of the parallelogram spanned by x and y is x₁y₂ − x₂y₁ = 3 · 3 − 1 · 4 = 5. On the other hand, if we interchange the two, letting $x = \begin{bmatrix} 4 \\ 3 \end{bmatrix}$ and $y = \begin{bmatrix} 3 \\ 1 \end{bmatrix}$, then we get x₁y₂ − x₂y₁ = 4 · 1 − 3 · 3 = −5. Certainly the parallelogram hasn’t changed, nor does it make sense to have negative area. What is the explanation? In deriving
our formula for the area above, we assumed 0 < θ < π; but if we must turn clockwise to get from
x to y, this means that θ is negative, resulting in a sign discrepancy in the area calculation. ▽
So we should amend our earlier result. We define the signed area of the parallelogram P to be

Figure 5.3

the area of P when one turns counterclockwise from x to y and to be negative the area of P when
one turns clockwise from x to y, as illustrated in Figure 5.3. Then we have:
signed area(P) = x1 y2 − x2 y1 .
Because of its geometric significance, we consider the function5
(∗) D(x, y) = x1 y2 − x2 y1 ;
this is the function that associates to each ordered pair of vectors x, y ∈ R2 the signed area of the
parallelogram they span.
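The signed area function is one line of code; here is a sketch (assuming NumPy) that also checks it against the built-in 2 × 2 determinant:

```python
import numpy as np

def D(x, y):
    """Signed area of the parallelogram spanned by x and y in R^2."""
    return x[0] * y[1] - x[1] * y[0]

x = np.array([3., 1.])
y = np.array([4., 3.])
print(D(x, y), D(y, x))        # 5.0 -5.0, as in Example 1: swapping the vectors flips the sign
assert np.isclose(D(x, y), np.linalg.det(np.column_stack([x, y])))
```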
Next, let’s explore the properties of the signed area function D on R2 × R2 .6
Property 1. If x, y ∈ R2 , then D(y, x) = −D(x, y).
Algebraically, we have
D(y, x) = y1 x2 − y2 x1 = −(x1 y2 − x2 y1 ) = −D(x, y).
Geometrically, this was the point of our introducing the notion of signed area.
Property 2. If x, y ∈ R2 and c ∈ R, then
D(cx, y) = cD(x, y) = D(x, cy).
This follows immediately from the formula (∗):
D(cx, y) = (cx1 )y2 − (cx2 )y1 = c(x1 y2 − x2 y1 ) = cD(x, y).
Geometrically, if we stretch one of the edges of the parallelogram by a factor of c > 0, then the
area is multiplied by a factor of c. And if c < 0, the area is multiplied by a factor of |c| and the
signed area changes sign (why?).
5
Here, since x and y are themselves vectors, we use the customary notation for functions.
6
Recall that, given two sets X and Y , their product X × Y consists of all ordered pairs (x, y), where x ∈ X and
y ∈Y.

Property 3. If x, y, z ∈ R2 , then

D(x + y, z) = D(x, z) + D(y, z) and D(x, y + z) = D(x, y) + D(x, z).

We can check this explicitly in coordinates (but the clever


 reader
 should
  try to use properties

x1 y1 z1
of the dot product to give a better algebraic proof): if x = ,y= , and z = , then
x2 y2 z2

D(x + y, z) = (x1 + y1 )z2 − (x2 + y2 )z1 = (x1 z2 − x2 z1 ) + (y1 z2 − y2 z1 ) = D(x, z) + D(y, z),

as required. (The formula for D(x, y + z) can now be deduced using Property 1.) Geometrically,
we can deduce the result from Figure 5.4: the area of parallelogram OBCD (D(x + y, z)) is equal

Figure 5.4

to the sum of the areas of parallelograms OAED (D(x, z)) and ABCE (D(y, z)). The proof of
this, in turn, follows from the fact that △OAB is congruent to △DEC.

Property 4. For the standard basis vectors e1 , e2 , we have D(e1 , e2 ) = 1.




The expression D(x, y) is a 2 × 2 determinant, often written $\begin{vmatrix} x & y \end{vmatrix}$. Indeed, given a 2 × 2 matrix A with column vectors a₁, a₂ ∈ R², we define
$$\det A = D(a_1, a_2) = \begin{vmatrix} a_1 & a_2 \end{vmatrix}.$$

As we ask the reader to check in Exercise 4, one can deduce from the four properties above and
the geometry of linear maps the fact that the determinant represents the signed area of the paral-
lelogram.
We next turn to the case of 3 × 3 determinants. The general case will wait until Chapter 7.
Given three vectors,
     
$$x = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}, \quad y = \begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix}, \quad\text{and}\quad z = \begin{bmatrix} z_1 \\ z_2 \\ z_3 \end{bmatrix} \in \mathbb{R}^3,$$

we define
$$D(x, y, z) = \begin{vmatrix} | & | & | \\ x & y & z \\ | & | & | \end{vmatrix} = x_1\begin{vmatrix} y_2 & z_2 \\ y_3 & z_3 \end{vmatrix} - x_2\begin{vmatrix} y_1 & z_1 \\ y_3 & z_3 \end{vmatrix} + x_3\begin{vmatrix} y_1 & z_1 \\ y_2 & z_2 \end{vmatrix}.$$
Multiplying this out, we get three positive terms and three negative terms; a handy mnemonic
device for this formula is depicted in Figure 5.5.

Figure 5.5 (the first two columns of the array are repeated to its right; the three “downward” diagonals are taken with plus signs, the three “upward” diagonals with minus signs)

This function D of three vectors in R3 has properties quite analogous to those in the two-
dimensional case. In particular, it follows immediately from the latter that if x, y, z, and w are
vectors in R3 and c is a scalar, then
D(x, z, y) = −D(x, y, z)
D(x, cy, z) = cD(x, y, z) = D(x, y, cz)
D(x, y + w, z) = D(x, y, z) + D(x, w, z) and D(x, y, z + w) = D(x, y, z) + D(x, y, w).
It is also immediately obvious from the definition that if x, y, z, and w are vectors in R3 and c is
a scalar, then
D(cx, y, z) = cD(x, y, z)
D(x + w, y, z) = D(x, y, z) + D(w, y, z).
Least elegant is the verification that D(y, x, z) = −D(x, y, z):

x z x z x z

D(y, x, z) = y1 2 2 − y2 1 1 + y3 1 1
x3 z3 x3 z3 x2 z2
= y1 (x2 z3 − x3 z2 ) + y2 (x3 z1 − x1 z3 ) + y3 (x1 z2 − x2 z1 )
= −x1 (y2 z3 − y3 z2 ) + x2 (y1 z3 − y3 z1 ) − x3 (y1 z2 − y2 z1 )

y z y z y z
2 2 1 1 1 1
= −x1 + x2 − x3
y3 z3 y3 z3 y2 z2
= −D(x, y, z).
Summarizing, we have:

Property 1. If x, y, z ∈ R3 , then
D(y, x, z) = D(x, z, y) = D(z, y, x) = −D(x, y, z).
Note that, as a consequence, whenever two of x, y, and z are the same, we have D(x, y, z) = 0.

Property 2. If x, y, z ∈ R3 and c ∈ R, then


D(cx, y, z) = D(x, cy, z) = D(x, y, cz) = cD(x, y, z).
Property 3. If x, y, z ∈ R3 , then
D(x + w, y, z) = D(x, y, z) + D(w, y, z), D(x, y + w, z) = D(x, y, z) + D(x, w, z),
and D(x, y, z + w) = D(x, y, z) + D(x, y, w).
Property 4. For the standard basis vectors e1 , e2 , e3 , we have D(e1 , e2 , e3 ) = 1.

If we let y′ = y − projx y and z′ = z − projx z − projy′ z, then it follows from the properties of
D that D(x, y, z) = D(x, y′ , z′ ). Moreover, we shall see when we study determinants in Chapter
7 that the results of Exercise 4 hold in three dimensions as well, so that the latter value is not

Figure 5.6

changed by rotating R3 to make x = αe1 , y′ = βe2 , and z′ = γe3 . Since rotation doesn’t change
signed volume, we deduce that D(x, y, z) equals the signed volume of the parallelepiped spanned
by x, y, and z, as suggested in Figure 5.6. For an alternative argument, see Exercise 18.
Given two vectors x, y ∈ R3 , define a vector, called their cross product, by
x × y = (x2 y3 − x3 y2 )e1 + (x3 y1 − x1 y3 )e2 + (x1 y2 − x2 y1 )e3

$$\phantom{x \times y} = \begin{vmatrix} e_1 & x_1 & y_1 \\ e_2 & x_2 & y_2 \\ e_3 & x_3 & y_3 \end{vmatrix},$$
where the latter is to be interpreted “formally.” The geometric interpretation of the cross product,
as indicated in Figure 5.7, is the content of the following

Proposition 5.1. The cross product x × y of two vectors x, y ∈ R3 is orthogonal to both x


and y and kx × yk is the area of the parallelogram P spanned by x and y. Moreover, when x and
y are nonparallel, the vectors x, y, x × y determine a parallelepiped of positive signed volume.

Remark. More colloquially, if you curl the fingers of your right hand from x towards y, your
thumb points in the direction of x × y.

Proof. The orthogonality is an immediate consequence of the properties once we realize that
the formula for the cross product guarantees that

z · (x × y) = D(z, x, y).

In particular, x · (x × y) = D(x, x, y) = 0.

Figure 5.7

Now, D(x, y, x × y) is the signed volume of the parallelepiped spanned by x, y, and x × y.


Since x × y is orthogonal to the plane spanned by x and y, that volume is the product of the area
of P and kx × yk. On the other hand,

D(x, y, x × y) = D(x × y, x, y) = (x × y) · (x × y) = kx × yk2 .


Setting the two expressions equal, we infer that

kx × yk = area(P).

When x and y are nonparallel, we have D(x, y, x × y) = kx × yk2 > 0, so the vectors span a
parallelepiped of positive signed volume, as desired. 
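The conclusions of Proposition 5.1 can be spot-checked numerically; the following sketch (assuming NumPy) uses the vectors of Example 2 below:

```python
import numpy as np

u = np.array([1., 1., -1.])
v = np.array([1., 2., 1.])
w = np.cross(u, v)
print(w)                                      # [ 3. -2.  1.]
assert np.isclose(w @ u, 0) and np.isclose(w @ v, 0)   # orthogonal to both factors
area_sq = (u @ u) * (v @ v) - (u @ v) ** 2    # |u|^2 |v|^2 - (u.v)^2, cf. Exercise 18
assert np.isclose(w @ w, area_sq)             # |u x v| is the area of the parallelogram
```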

Example 2.We   product to find the equation of the subspace P spanned by


can use thecross
1 1
the vectors u =  1  and v =  2 . For the normal vector to P is
−1 1
 
e 1 1 3
1
 
A = u × v = e2 1 2 =  −2  ,

e3 −1 1 1
and so
P = {x ∈ R3 : A · x = 0} = {x ∈ R3 : 3x1 − 2x2 + x3 = 0}.
Moreover, as depicted schematically in Figure 5.8, the affine plane P₁ parallel to P and passing through the point $x_0 = \begin{bmatrix} 2 \\ 0 \\ 1 \end{bmatrix}$ is given by

P1 = {x ∈ R3 : A · (x − x0 ) = 0} = {x ∈ R3 : A · x = A · x0 }
= {x ∈ R3 : 3x1 − 2x2 + x3 = 7}. ▽

Figure 5.8

EXERCISES 1.5

1. Give a geometric proof that D(x, y + cx) = D(x, y) for any scalar c.

2. Show that if a function D : R2 × R2 → R satisfies Properties 1–4, then D(x, y) = x1 y2 − x2 y1 .


     
x1 x2 xn
3. Suppose a polygon in the plane has vertices , , . . ., . Give a formula for its
y1 y2 yn
area.

4. a. Check that when A and B are 2 × 2 matrices, we have det(AB) = det A det B.
b. Let A = Aθ be a rotation matrix. Check that det(Aθ B) = det B for any 2 × 2 matrix B.
c. Use the result of part b and the properties of determinants to give an alternative proof
that D(x, y) is the signed area of the parallelogram spanned by x and y.

5. Calculatethe cross
 product
  of the given vectors x and y.    
1 1 1 7
*a. x =  0 , y =  2  b. x =  −2 , y =  1 
−1 1 1 −5
6. Find the area
 of the 
triangle
 with thegiven vertices.      
0 1 1 0 1 7
*a.  
A= 0 ,B=  
0 , C = 2
 c.   
A = 0 , B = −2 , C =  1
0 −1 1 0 1 −5
           
1 2 2 1 2 8
b. A = −1 , B = −1 , C = 1 
     d. A =  1 , B =  −1 , C =  2 
1 0 2 1 2 −4

7. Find the equation


  of the (affine)
 plane
  containing the three points
     
0 1 1 0 1 7
*a. A =  0 , B =  0 , C =  2  c. A =  0 , B =  −2 , C =  1 
0 −1 1 0 1 −5
           
1 2 2 1 2 8
b.    
A = −1 , B = −1 , C = 1   *d. A =  1 , B =  −1 , C =  2 
1 0 2 1 2 −4
   
1 0
*8. Find the equation of the (affine) plane containing the points  −1  and  1  and parallel to
 
1 2 3
the vector 0 .

1
9. Find the intersection of the two planes x1 + x2 − 2x3 = 0 and 2x1 + x2 + x3 = 0.

10. Given the nonzero vector a ∈ R3 , a · x = b ∈ R, and a × x = c ∈ R3 , can you determine the
vector x ∈ R3 ? If so, give a geometric construction for x.

*11. Find the distance between the given skew lines in R3 :


       
2 0 1 1
ℓ :  1  + t 1  and m :  1  + s  1 .
1 −1 0 1

*12. Find the volume of the parallelepiped spanned by


     
1 2 −1
x = 2,
 y = 3,
 and z=  0 .
1 1 3

13. Let P be a parallelogram in R3 . Let P1 be its projection on the x2 x3 -plane, P2 be its projection
on the x1 x3 -plane, and P3 be its projection on the x1 x2 -plane. Prove that
$$\bigl(\operatorname{area}(\mathcal P)\bigr)^2 = \bigl(\operatorname{area}(\mathcal P_1)\bigr)^2 + \bigl(\operatorname{area}(\mathcal P_2)\bigr)^2 + \bigl(\operatorname{area}(\mathcal P_3)\bigr)^2.$$
(How’s that for a generalization of the Pythagorean Theorem?)

14. Let x, y, z ∈ R3 .
a. Show that x × y = −y × x and x × (y + z) = x × y + x × z.
b. Show that cross product is not associative; i.e., give specific vectors so that (x × y) × z 6=
x × (y × z).
 
a
*15. Given a =  b  ∈ R3 , define T : R3 → R3 by T (x) = a × x. Prove that T is a linear
c
transformation and give its standard matrix. Explain in the context of Proposition 4.5 why [T ]
is skew-symmetric.

16. Let x, y, z, w ∈ R³. Show that $(x \times y) \cdot (z \times w) = \begin{vmatrix} x \cdot z & x \cdot w \\ y \cdot z & y \cdot w \end{vmatrix}$.

17. Suppose u, v, w ∈ R2 are noncollinear, and let x ∈ R2 .


a. Show that we can write x uniquely in the form x = ru + sv + tw, where r + s + t = 1.
(Hint: The vectors v − u and w − u must be nonparallel. Now apply the result of Example
11 of Section 4.)
b. Show that r is the ratio of the signed area of the triangle with vertices x, v, and w to the
signed area of the triangle with vertices u, v, and w. Give corresponding formulas for s
and t.

c. Suppose x is the intersection of the medians of the triangle with vertices u, v, and w.
Compare the areas of the three triangles formed by joining x with any pair of the vertices.
(Cf. Exercise 1.1.8.)
d. Let r = D(v, w), s = D(w, u), and t = D(u, v). Show that ru + sv + tw = 0. Give a
physical interpretation of this result.

18. In this exercise, we give a self-contained derivation of the geometric interpretation of the 3 × 3
determinant as signed volume.
a. By direct algebraic calculation, show that kx × yk2 = kxk2 kyk2 − (x · y)2 . Deduce that
kx × yk is the area of the parallelogram spanned by x and y.
b. Show that z · (x × y) is the signed volume of the parallelepiped spanned by x, y, and z.
c. Conclude that D(x, y, z) equals the signed volume of that parallelepiped.
19. (Heron’s formula) Given △OAB, let $\overrightarrow{OA} = x$ and $\overrightarrow{OB} = y$, and set ‖x‖ = a, ‖y‖ = b, and ‖x − y‖ = c. Let s = ½(a + b + c) be the semiperimeter of the triangle. Use the formulas
‖x × y‖² = ‖x‖²‖y‖² − (x · y)²  (see Exercise 18)
‖x − y‖² = ‖x‖² + ‖y‖² − 2x · y
to prove that the area A of △OAB satisfies
$$A^2 = \frac14\left(a^2 b^2 - \frac14\bigl(c^2 - a^2 - b^2\bigr)^2\right) = s(s - a)(s - b)(s - c).$$

20. Let △ABC have sides a, b, and c. Let s = ½(a + b + c) be its semiperimeter. Prove that the inradius of the triangle (i.e., the radius of its inscribed circle) is $r = \sqrt{(s - a)(s - b)(s - c)/s}$.
CHAPTER 2
Functions, Limits, and Continuity
In this brief chapter we introduce examples of non-linear functions, their graphs, and their level
sets. As usual in calculus, the notion of limit is a cornerstone on which calculus is built. To discuss
“nearness,” we need the concepts of open and closed sets and of convergent sequences. We then give
the usual theorems on limits of functions and several equivalent ways of thinking about continuity.
All of this will be the foundation for our work on differential calculus, which comes next.

1. Scalar- and Vector-Valued Functions

In first-year calculus we studied real-valued functions defined on intervals in R (or perhaps on


all of R). In Chapter 1 we began our study of linear functions from Rn to Rm . There are three steps
we might imagine to understand more complicated vector-valued functions of a vector variable.

1.1. Parametrized Curves. First, we might study a vector-valued function of a single vari-
able. If we think of the independent variable, t, as time, then we can visualize f : (a, b) → Rn as
a parametrized curve—we can imagine a particle moving in Rn as time varies, and f (t) gives its
position at time t. At this point, we just give an assortment of examples. The careful analysis,
including the associated differential calculus and physical interpretations, will come in the next
chapter.

Example 1. The easiest examples, perhaps, are linear. Imagine a particle starting at position
x0 and moving with constant velocity v. Then its position at time t is evidently f (t) = x0 + tv and
its trajectory is a line passing through x0 and having direction vector v, as shown in Figure 1.1.
We refer to the vector-valued function f as a parametrization of the line. Here t is free to vary over

Figure 1.1

all of R. When we wish to parametrize the line passing through two points A and B, it is natural

Figure 1.2

to use one of those points, say A, as x0 and the vector $\overrightarrow{AB}$ as the direction vector v, as indicated
in Figure 1.2. ▽

Example 2. The next curve with which every mathematics student is familiar is the circle.
Essentially by the very definition of the trigonometric functions cos and sin, we obtain a very
natural parametrization of a circle of radius a, as pictured in Figure 1.3(a):
$$f(t) = a\begin{bmatrix} \cos t \\ \sin t \end{bmatrix} = \begin{bmatrix} a\cos t \\ a\sin t \end{bmatrix}, \qquad 0 \le t \le 2\pi.$$

Now, if a, b > 0 and we apply the linear map
$$T : \mathbb{R}^2 \to \mathbb{R}^2, \qquad T\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} ax \\ by \end{bmatrix},$$
we see that the unit circle x² + y² = 1 maps to the ellipse $\frac{x^2}{a^2} + \frac{y^2}{b^2} = 1$. Since $T\begin{bmatrix} \cos t \\ \sin t \end{bmatrix} = \begin{bmatrix} a\cos t \\ b\sin t \end{bmatrix}$,

Figure 1.3

the latter gives a natural parametrization of the ellipse, as shown in Figure 1.3(b). Be warned,
however; here t is not the angle between the position vector and the positive x-axis, as Figure 1.3(c)
indicates. ▽

Now we come to some more interesting examples.



Figure 1.4

Example 3. Consider the two cubic curves in R2 illustrated in Figure 1.4. On the left is the
cuspidal cubic y 2 = x3 , and on the right is the nodal cubic y 2 = x3 +x2 . These can be parametrized,
respectively, by the functions
$$f(t) = \begin{bmatrix} t^2 \\ t^3 \end{bmatrix} \qquad\text{and}\qquad f(t) = \begin{bmatrix} t^2 - 1 \\ t(t^2 - 1) \end{bmatrix},$$

as the reader can verify.1 Now consider the twisted cubic in R3 , illustrated in Figure 1.5, given by

Figure 1.5

1
To see where the latter came from, as suggested by Figure 1.4(b), we substitute y = tx in the equation and
solve for x.
 
$$f(t) = \begin{bmatrix} t \\ t^2 \\ t^3 \end{bmatrix}, \qquad t \in \mathbb{R}.$$

Its projections in the xy-, xz-, and yz-coordinate planes are, respectively, y = x2 , z = x3 , and
z 2 = y 3 (the cuspidal cubic). ▽

Example 4. Our last example is a classic called the cycloid: It is the trajectory of a dot on
a rolling wheel (circle). Consider the illustration in Figure 1.6. Assuming the wheel rolls without

Figure 1.6

slipping, the distance it travels along the ground is equal to the length of the circular arc subtended
by the angle through which it has turned. That is, if the radius of the circle is a and it has turned
through angle t, then the point of contact with the x-axis, Q, is at units to the right. The vector

Figure 1.7

from the origin to the point P can be expressed as the sum of the three vectors $\overrightarrow{OQ}$, $\overrightarrow{QC}$, and $\overrightarrow{CP}$ (see Figure 1.7):
$$\overrightarrow{OP} = \overrightarrow{OQ} + \overrightarrow{QC} + \overrightarrow{CP} = \begin{bmatrix} at \\ 0 \end{bmatrix} + \begin{bmatrix} 0 \\ a \end{bmatrix} + \begin{bmatrix} -a\sin t \\ -a\cos t \end{bmatrix},$$
and hence the function
$$f(t) = \begin{bmatrix} at - a\sin t \\ a - a\cos t \end{bmatrix} = a\begin{bmatrix} t - \sin t \\ 1 - \cos t \end{bmatrix}, \qquad t \in \mathbb{R}$$

gives a parametrization of the cycloid. ▽
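As a small illustration (assuming NumPy), one can sample the cycloid parametrization directly:

```python
import numpy as np

def cycloid(t, a=1.0):
    """The point f(t) = a(t - sin t, 1 - cos t) on the cycloid traced by a wheel of radius a."""
    return np.array([a * (t - np.sin(t)), a * (1 - np.cos(t))])

for t in np.linspace(0, 2 * np.pi, 5):
    print(np.round(cycloid(t), 3))
# The tracked point returns to the ground (second coordinate 0) after one full turn, t = 2*pi.
```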



1.2. Scalar Functions of Several Variables. Next, we might study a scalar-valued function
of several variables. For example, we might study elevation of the earth as a function of position
on the surface of the earth, temperature at noon as a function of position in space, or, indeed,
temperature as a function of both position and time. If we have a function of n variables, to avoid
cumbersome notation, we will typically write
$$f\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} \qquad\text{rather than}\qquad f\left(\begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix}\right).$$

It would be typographically more pleasant and economical to suppress the vector notation and
write merely f (x1 , . . . , xn ), as do most mathematicians. We hope our choice will make it easier for
the reader to keep vectors in columns and not confuse rows and columns of matrices.
When n = 1 or n = 2, such functions are often best visualized by their graphs
$$\operatorname{graph}(f) = \left\{ \begin{bmatrix} x \\ f(x) \end{bmatrix} : x \in \mathbb{R}^n \right\} \subset \mathbb{R}^{n+1},$$

as pictured, for example, in Figure 1.8. There are two ways to try to visualize functions and their

Figure 1.8

graphs, as we shall see in further detail in Chapter 3. One is to fix all of the coordinates of x but
one, and see how f varies with each of x1 , . . . , xn individually. This corresponds to taking slices
of the graph, as shown in Figure 1.9. The other is to think of a topographical map, in which we
see curves representing points at the same elevation. One then can lift each of these up to the
appropriate height and imagine the surface interpolating among them, as illustrated in Figure 1.10.
These curves are called level curves or contour curves of the function.

Example 5. Suppose we see families of concentric circles as the level curves, as shown in Figure
1.11. We see that in (a) the circles are evenly spaced, whereas in (b) they grow closer together as
we move outwards. This tells us that in (a) the value of f grows linearly with the distance from
the origin and in (b) it grows more quickly. Indeed, it is not surprising to see the corresponding
graphs in Figure 1.12: the respective functions are f (x) = kxk and f (x) = kxk2 . ▽

Figure 1.9

Figure 1.10

1.3. Vector Functions of Several Variables. Last, we think of vector-valued functions of


several variables. Of course the linear versions arise in the study of linear maps, as we’ve already
seen, and a good deal more in the solution of systems of linear equations. Sometimes it is easiest
to think of a vector-valued function f : Rn → Rm as merely a collection of m scalar functions of n
variables:
$$f\begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} = \begin{bmatrix} f_1(x) \\ \vdots \\ f_m(x) \end{bmatrix}.$$
But in other instances, we really want to think of the values as geometrically defined vectors;
fundamental examples are parametrized surfaces and vector fields (both of which we shall study a
good deal in Chapter 8). Note that we will indicate a vector-valued function by boldface.

Example 6. Consider the mapping

f : (0, ∞) × [0, 2π) → R2 − {0}




Figure 1.11

Figure 1.12

Figure 1.13

   
$$f\begin{bmatrix} r \\ \theta \end{bmatrix} = \begin{bmatrix} r\cos\theta \\ r\sin\theta \end{bmatrix},$$
 
as illustrated in Figure 1.13. This is a one-to-one mapping onto R² − {0}. The coordinates $\begin{bmatrix} r \\ \theta \end{bmatrix}$ are often called the polar coordinates of the point $\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} r\cos\theta \\ r\sin\theta \end{bmatrix}$. ▽

Example 7. Consider the mapping


 
$$f\begin{bmatrix} u \\ v \end{bmatrix} = \begin{bmatrix} u\cos v \\ u\sin v \\ u \end{bmatrix}, \qquad u > 0, \ 0 \le v \le 2\pi.$$
When we fix u = u0 , the image is a circle of radius u0 at height u0 ; when we fix v = v0 , the image

Figure 1.14

is a ray making an angle of π/4 with the z-axis and whose projection into the xy-plane makes an
angle of v0 with the positive x-axis. Thus, the image of f is a cone, as pictured in Figure 1.14. ▽

EXERCISES 2.1

1. Find parametrizations of each of the following lines:


a. 3x1 + 4x2 = 6  
−1
*b. the line with slope 1/3 that passes through
2
   
1 2
c. the line through A =  2  and B =  1 
1 0
   
−2 3
d. the line through A = perpendicular to
1 5
   
1 2+t
 1  1 − 2t 
*e. the line through   
 0  parallel to g(t) =  3t 

−1 4−t

2. a. Give parametric equations for the circle x2 + y 2 = 1 in terms of the length t pictured in
Figure 1.15. (Hint: Use similar triangles and algebra.)

Figure 1.15

b. Use your answer to part a to produce infinitely many positive integer solutions2 of X 2 +
Y 2 = Z 2 with distinct ratios Y /X.

3. A string is unwound from a circular reel of radius a, being pulled taut at each instant. Give
parametric equations for the tip of the string P in terms of the angle θ, as pictured in Figure
1.16.

Figure 1.16

4. A wheel of radius a (perhaps belonging to a train) rolls along the x-axis. If a point P (on the
wheel) is located a distance b from the center of the wheel, what are the parametric equations
of its locus as the wheel rolls? (Note that when b = a we obtain a cycloid.) See Figure 1.17.

5. *a. A circle of radius b rolls without slipping outside a circle of radius a > b. Give the
parametric equations of a point P on the circumference of the rolling circle (in terms of
the angle θ of the line joining the centers of the two circles). (See Figure 1.18(a).)
b. Now it rolls inside. Do the same as for part a.
These curves are called, respectively, an epicycloid and a hypocycloid.
2
These are called Pythagorean triples. Fermat asked whether there were any nonzero integer solutions of the
corresponding equations X n + Y n = Z n for n ≥ 3. In 1995, Andrew Wiles proved in a tour de force of algebraic
number theory that there can be none.

Figure 1.17 (the cases b < a and b > a)

Figure 1.18

6. A coin of radius 1′′ is rolled (without slipping) around the outside of a coin of radius 2′′ . How
many complete revolutions does its “head” make? Now explain the correct answer! (There is
a famous story that the Educational Testing Service screwed this one up and was challenged
by a precocious high school student who knew that he had done the problem correctly.)
 
0
*7. A dog buries a bone at . He is at the end of a 1-unit long leash, and his master walks
1
down the positive x-axis, dragging the dog along. Since the dog wants to get back to the bone,
he pulls the leash taut. (It was pointed out to me by some students a few years ago that the

Figure 1.19

realism of this model leaves something to be desired.) The curve the dog travels is called a

tractrix (why?). Give parametric equations of the curve in terms of the parameters
a. θ
b. t
as pictured in Figure 1.19. (Hint: The fact that the leash is pulled taut means that the leash
is tangent to the curve. Show that θ ′ (t) = sin θ(t).)

8. Prove that the twisted cubic (given in Example 3) has the property that any three distinct
points on it determine a plane; i.e., no three distinct points are collinear.

9. Sketchfamilies
 of level curves and the graphs of the following functions f .
x
a. f =1−y
y
 
x
b. f = x2 − y
y
 
x
c. f = x2 − y 2
y
 
x
d. f = xy
y
10. Consider the surfaces
     
 x   x 
X =  y  : x2 + y 2 − z 2 = 1 and Y =  y  : x2 + y 2 − z 2 = −1 .
   
z z

a. Sketch the surfaces.


b. Give a rigorous argument (not merely based on your pictures) that every pair of points of
X can be joined by a curve in X, but that the same is not true of Y .

11. Consider the function


 
  (2 + cos t) cos s
s
g =  (2 + cos t) sin s  , 0 ≤ s, t ≤ 2π.
t
sin t

a. Sketch the image, X, of g.


*b. Find an algebraic equation satisfied by all the points of X.

12. Consider the function (defined wherever st 6= 1)


 st+1 
  st−1
s
g = s−t
st−1
.
t s+t
st−1

2 2 2
a. Show that every point inthe  glies on the hyperboloid x + y − z = 1.
 image of
s s
b. Show that the curves g 0 and g (for s0 and t0 constants) are (subsets of) lines.
t t0
(See Figure 1.20.)
c. (more challenging) What is the image of g?

Figure 1.20

2. A Bit of Topology in Rn

Having introduced functions, we must next decide what it means for a function to be continuous.
In one-variable calculus we study functions defined on intervals and come to appreciate the difference
between open and closed intervals. For example, the notion of limit is couched in terms of open
intervals, whereas the maximum value theorem for continuous functions depends crucially on closed
intervals. Matters are somewhat more subtle in higher dimensions, and we begin our assault on
the analogous notions in Rn .

Definition. Let a ∈ Rn and let δ > 0. The ball of radius δ centered at a is


B(a, δ) = {x ∈ Rn : kx − ak < δ}.
This is often called a neighborhood of a.
Note that if $|x_i - a_i| < \dfrac{\delta}{\sqrt n}$ for all i = 1, . . . , n, then
$$\|x - a\| = \sqrt{\sum_{i=1}^{n} (x_i - a_i)^2} < \sqrt{n\left(\frac{\delta}{\sqrt n}\right)^2} = \delta,$$

so x ∈ B(a, δ). And if x ∈ B(a, δ), then |xi − ai | ≤ kx − ak < δ for all i = 1, . . . , n. Figure 2.1
illustrates these relationships.

Figure 2.1

If ai < bi for i = 1, . . . , n, we can consider the rectangle


R = [a1 , b1 ] × [a2 , b2 ] × · · · × [an , bn ] = {x ∈ Rn : ai ≤ xi ≤ bi , i = 1, . . . , n}.

(Strictly speaking, we should call this a rectangular parallelepiped, but that’s too much of a mouth-
ful.) For reasons that will be obvious in a moment, when we construct the rectangle from open
intervals, viz.,
S = (a1 , b1 ) × (a2 , b2 ) × · · · × (an , bn ) = {x ∈ Rn : ai < xi < bi , i = 1, . . . , n},
we call it an open rectangle.

Definition. We say a subset U ⊂ Rn is open if for every a ∈ U , there is some ball centered at
a that is completely contained in U ; that is, there is δ > 0 so that B(a, δ) ⊂ U .

Examples 1.
(a) First of all, an open interval (a, b) ⊂ R is an open subset. Given any c ∈ (a, b), choose

Figure 2.2

δ < min(c − a, b − c). Then B(c, δ) ⊂ (a, b). However, suppose we view this interval as a subset of R², namely, $S = \left\{ \begin{bmatrix} x \\ 0 \end{bmatrix} : a < x < b \right\}$. Then it is no longer an open subset, because no ball in R² centered at $\begin{bmatrix} c \\ 0 \end{bmatrix}$ is contained in S, as Figure 2.2 plainly indicates.
(b) An open rectangle is an open set. As indicated in Figure 2.3, suppose c ∈ S = (a1 , b1 ) ×
(a2 , b2 )×· · ·×(an , bn ). Let δi = min(ci −ai , bi −ci ), i = 1, . . . , n, and set δ = min(δ1 , . . . , δn ).

Figure 2.3

Then we claim that B(c, δ) ⊂ S. For if x ∈ B(c, δ), then |xi − ci | ≤ kx − ck < δ ≤ δi , so
ai < xi < bi , as
required.
 
(c) Consider $S = \left\{ \begin{bmatrix} x \\ y \end{bmatrix} : 0 < xy < 1 \right\}$. We want to show that S is open, so we choose $c = \begin{bmatrix} a \\ b \end{bmatrix} \in S$. Without loss of generality, we may assume that 0 < b ≤ a, as shown in Figure 2.4. We claim that the ball of radius
$$\delta = \frac{1}{\frac12\left(a + \frac1b\right)} - b = b\,\frac{1 - ab}{1 + ab}$$

Figure 2.4

centered at c is wholly contained in the region S. We consider the open rectangle centered
at c with base 1/b − a and height 2δ; by construction, this rectangle is contained in S. Since
b ≤ a and ab < 1, it is easy to check that the height is smaller than the length, and so the
ball of radius δ centered at c is contained in the rectangle, hence in S. ▽

As we shall see in the next section, the concept of open sets is integral to the notion of continuity
of a function.
We turn next to a discussion of sequences. The connections to open sets will become clear.

Definition . A sequence of vectors (or points) in Rn is a function from the set of natural
numbers, N, to Rn , i.e., an assignment of a vector xk ∈ Rn to each natural number k ∈ N. We refer
to xk as the kth term of the sequence. We often abuse notation and write {xk } for such a sequence,
even though we are thinking of the actual function and not the set of its values.
We say the sequence {x_k} converges to a (denoted x_k → a or lim_{k→∞} x_k = a) if for all ε > 0, there is K ∈ N such that
‖x_k − a‖ < ε whenever k > K.
(That is, given any neighborhood of a, “eventually”—past some K—all the elements xk of the
sequence lie inside.) We say the sequence {xk } is convergent if it converges to some a.

Examples 2. Here are a few examples of sequences, both convergent and non-convergent.
(a) Let x_k = k/(k + 1). We suspect that x_k → 1. To prove this, note that, given any ε > 0,
|x_k − 1| = | k/(k + 1) − 1 | = 1/(k + 1) < ε

whenever k + 1 > 1/ε. If we let K = [1/ε] (the greatest integer less than or equal to 1/ε),
then it is easy to see that k > K =⇒ k + 1 > 1/ε, as required.
(b) The sequence {xk = (1+ k1 )k } of real numbers is a famous one (think of compound interest),
and converges to e, as the reader can check by taking logs and applying Proposition 3.6.
(c) The sequence 1, −1, 1, −1, 1, . . ., i.e., {xk = (−1)k+1 }, is not convergent. Since its consecu-
tive terms are 2 units apart, no matter what a ∈ R and K ∈ N we pick, whenever ε < 1,
we cannot have |xk − a| < ε whenever k > K. For if we did, we would have (by the triangle
inequality)

2 = |xk+1 − xk | ≤ |xk+1 − a + a − xk | ≤ |xk+1 − a| + |xk − a| < 2ε < 2

whenever k > K, which is, of course, impossible.


(d) Let x_0 ∈ Rⁿ be a fixed vector. Define a sequence (recursively) by x_k = (1/2) x_{k−1}, k ≥ 1. This means, of course, that x_k = (1/2)^k x_0, and so we suspect that x_k → 0. If x_0 = 0, there is nothing to prove. Suppose x_0 ≠ 0 and ε > 0. Then we will have
‖x_k − 0‖ = ‖x_k‖ = (1/2)^k ‖x_0‖ < ε
whenever k > log₂(‖x_0‖/ε) = log(‖x_0‖/ε)/log 2. So, if we take K = [log(‖x_0‖/ε)/log 2] + 1, then it follows that whenever k > K, we have ‖x_k‖ < ε, as required.
(e) Let A = [ 2 0 ; 0 1 ] and x_0 = (1, 1). Define a sequence of vectors in R² recursively by
x_k = A x_{k−1} / ‖A x_{k−1}‖,   k ≥ 1.
As the reader can easily prove by induction, we have x_k = (1/√(2^{2k} + 1)) (2^k, 1), and it follows that lim_{k→∞} x_k = (1, 0). ▽
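Readers who like to experiment can watch the recursion in part (e) converge numerically. The following short sketch (in Python with NumPy, offered purely as an illustration; the number of iterations is an arbitrary choice) compares the iterates with the closed form.

    import numpy as np

    # Iterate x_k = A x_{k-1} / ||A x_{k-1}|| starting from x_0 = (1, 1).
    A = np.array([[2.0, 0.0],
                  [0.0, 1.0]])
    x = np.array([1.0, 1.0])
    for k in range(1, 31):
        x = A @ x
        x = x / np.linalg.norm(x)
    print(x)                     # approximately [1, 0]

    # Compare with the closed form x_k = (2**k, 1) / sqrt(2**(2k) + 1) at k = 30.
    k = 30
    print(np.array([2.0**k, 1.0]) / np.sqrt(2.0**(2*k) + 1))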

Example 3. Suppose xk , yk ∈ Rn , xk → a and yk → b. Then it seems quite plausible


that xk + yk → a + b. Given ε > 0, we are to find K ∈ N so that whenever k > K we have
k(xk + yk ) − (a + b)k < ε. Rewriting, we observe that (by the triangle inequality)

k(xk + yk ) − (a + b)k = k(xk − a) + (yk − b)k ≤ kxk − ak + kyk − bk,

and so we can make k(xk + yk ) − (a + b)k < ε by making kxk − ak < ε/2 and kyk − bk < ε/2. To
this end, we use the definition of convergence of the sequences {xk } and {yk } as follows: there are
K1 , K2 ∈ N so that

whenever k > K1 , we have kxk − ak < ε/2

and

whenever k > K2 , we have kyk − bk < ε/2.

Thus, if we take K = max(K1 , K2 ), whenever k > K, we will have k > K1 and k > K2 , and so
ε ε
k(xk + yk ) − (a + b)k ≤ kxk − ak + kyk − bk < + = ε,
2 2
as was required. ▽

A crucial topological property of R is the least upper bound property:


A subset S ⊂ R is bounded above if there is some b ∈ R so that a ≤ b for all a ∈ S.
Such a real number b is called an upper bound of S. Then every nonempty set S
that is bounded above has a least upper bound, denoted sup S. That is, a ≤ sup S
for all a ∈ S and sup S ≤ b for every upper bound b of S.

Examples 4. (a) Let S = [0, 1]. Then S is bounded above (e.g., by 2) and sup S = 1.

(b) Let S = {x ∈ Q : x² < 2}. Then S is bounded above (e.g., by 2), and sup S = √2. (Note
that √2 ∉ Q. The point is that the irrational numbers fill in all the “holes” amongst the
rationals.)
(c) Suppose {xk } is a sequence of real numbers that is both bounded above and nondecreasing
(i.e., xk ≤ xk+1 for all k ∈ N). Then the sequence must converge. Since the sequence is
bounded above, there is a least upper bound, α, for the set of its values. Now we claim
that xk → α. Given ε > 0, there is K ∈ N so that α − xK < ε (for otherwise α would not
be the least upper bound). But then the fact that the sequence is nondecreasing tells us
that whenever k > K we have 0 ≤ α − xk ≤ α − xK < ε, as required. ▽

Definition. Suppose S ⊂ Rn . If S has the property that every convergent sequence of points
in S converges to a point in S, then we say S is closed. That is, S is closed if the following is true:
Whenever a convergent sequence xk → a has the property that xk ∈ S for all k ∈ N, then a ∈ S as
well.

Example 5. S = R − {0} = {x ∈ R : x 6= 0} ⊂ R is not closed, for if we take the sequence


xk = 1/k ∈ S, clearly xk → 0, and 0 ∈
/ S. ▽

This definition seems a bit strange, but it is exactly what we will need for many applications
to come. In the meantime, if we need to decide whether or not a set is closed, it is easiest to use
the following

Proposition 2.1. The subset S ⊂ Rⁿ is closed if and only if its complement, Rⁿ − S = {x ∈ Rⁿ : x ∉ S}, is open.

Proof. Suppose Rn − S is open and {xk } is a convergent sequence with xk ∈ S and limit
a. Suppose that a ∈ / S. Then there is a neighborhood B(a, ε) of a wholly contained in Rn − S,
which means no element of the sequence {xk } lies in that neighborhood, contradicting the fact that
xk → a. Therefore, a ∈ S, as desired.
Suppose S is closed and b ∈ / S. We claim that there is a neighborhood of b lying entirely in
n
R − S. Suppose not. Then for every k ∈ N, the ball B(b, 1/k) intersects S; that is, we can find a
point xk ∈ S with kxk − bk < 1/k. Then {xk } is a sequence of points in S converging to the point
b∈ / S, contradicting the hypothesis that S is closed. 

Example 6. It now follows easily that the closed interval [a, b] = {x ∈ R : a ≤ x ≤ b} is a


closed subset of R, inasmuch as its complement is the union of two open intervals. Similarly, the
closed ball B(a, r) = {x ∈ Rn : kx − ak ≤ r} is a closed subset of Rn , as we ask the reader to check
in Exercise 5. In summary, our choice of terminology is felicitous indeed. ▽

Note that most sets are neither open nor closed. For example, the interval S = (0, 1] ⊂ R is not
open because there is no neighborhood of the point 1 contained in S, and it is not closed because
of the reasoning in Example 5. Be careful not to make a common mistake here: just because a set
isn’t open, it need not be closed, and vice versa.
For future use, we make the following

Definition. Suppose S ⊂ Rn . We define the closure of S to be the smallest closed set containing
S. It is denoted by S̄.

We should think of S̄ as containing all the points of S and all points that can be obtained as
limits of convergent sequences of points of S. A slightly different formulation of this notion is given
in Exercise 8.

EXERCISES 2.2

*1. Which of the following subsets of Rⁿ is open? closed? neither? Prove your answer.
    a. {x : 0 < x ≤ 2} ⊂ R
    b. {x : x = 2^(−k) for some k ∈ N, or x = 0} ⊂ R
    c. {(x, y) : y > 0} ⊂ R²
    d. {(x, y) : y ≥ 0} ⊂ R²
    e. {(x, y) : y > x} ⊂ R²
    f. {(x, y) : xy ≠ 0} ⊂ R²
    g. {(x, y) : y = x} ⊂ R²
    h. {x : 0 < ‖x‖ < 1} ⊂ Rⁿ
    i. {x : ‖x‖ > 1} ⊂ Rⁿ
    j. {x : ‖x‖ ≤ 1} ⊂ Rⁿ
    k. the set of rational numbers, Q ⊂ R
    l. {x : ‖x‖ < 1 or ‖x − (1, 0)‖ < 1} ⊂ R²
    m. ∅ (the empty set)

♯ 2. Let {xk } be a sequence of points in Rn . For i = 1, . . . , n, let xk,i denote the ith coordinate of
the vector xk . Prove that xk → a if and only if xk,i → ai for all i = 1, . . . , n.

3. Suppose {xk } is a sequence of points (vectors) in Rn converging to a.


a. Prove that kxk k → kak. (Hint: See Exercise 1.2.17.)
b. Prove that if b ∈ Rn is any vector, then b · xk → b · a.
♯ 4. Prove that a rectangle R = [a1 , b1 ] × · · · × [an , bn ] ⊂ Rn is closed.

*5. Prove that the closed ball B(a, r) = {x ∈ Rn : kx − ak ≤ r} ⊂ Rn is closed.


♯ 6. Given a sequence {xk } of points in Rn , a subsequence is formed by taking xk1 , xk2 , . . . , xkj , . . .,
where k1 < k2 < k3 < · · ·.
a. Prove that if the sequence {xk } converges to a, then any subsequence {xkj } converges to
a as well.
b. Is the converse valid? Give a proof or counterexample.

♯ 7. a. Suppose U and V are open subsets of Rn . Prove that U ∪ V and U ∩ V are open as well.
(Recall that U ∪ V = {x ∈ Rn : x ∈ U or x ∈ V } and U ∩ V = {x ∈ Rn : x ∈ U and
x ∈ V }.)
b. Suppose C and D are closed subsets of Rn . Prove that C ∪ D and C ∩ D are closed as
well.
♯ 8. Let S ⊂ Rn . We say a ∈ S is an interior point of S if some neighborhood of a is contained in
S. We say a ∈ Rn is a frontier point of S if every neighborhood of a contains both points in S
and points not in S.
a. Show that every point of S is either an interior point or a frontier point, but give examples
to show that a frontier point of S may or may not belong to S.
b. Give an example of a set S every point of which is a frontier point.
c. Prove that the set of frontier points of S is always a closed set.
d. Let S ′ be the union of S and the set of frontier points of S. Prove that S ′ is closed.
e. Suppose C is a closed set containing S. Prove that S ′ ⊂ C. Thus, S ′ is the smallest
closed set containing S, which we have earlier called S̄, the closure of S. (Hint: Show that
Rn − C ⊂ Rn − S ′ .)

9. Continuing Exercise 8:
a. Is it true that all the interior points of S are points of S? Is this true if S is open? (Give
proofs or counterexamples.)
b. Let S ⊂ Rn and let F be the set of the frontier points of S. Is it true that the set of
frontier points of F is F itself? (Proof or counterexample.)
♯ *10. a. Suppose I0 = [a, b] is a closed interval, and for each k ∈ N, Ik is a closed interval with the
property that Ik ⊂ Ik−1 . Prove that there is a point x ∈ R so that x ∈ Ik for all k ∈ N.
b. Give an example to show the result of part a is false if the intervals are not closed.

11. Prove that the only subsets of R that are both open and closed are the empty set and R
itself. (Hint: Suppose S is such a nonempty subset that is not equal to R. Then there
are some points a ∈ S and b ∈ / S. Without loss of generality (how?), assume a < b. Let
α = sup{x ∈ R : [a, x] ⊂ S}. Show that neither α ∈ S nor α ∉ S is possible.)

12. A sequence {xk } of points in Rn is called a Cauchy sequence if for all ε > 0 there is K ∈ N so
that whenever k, ℓ > K, we have kxk − xℓ k < ε.
a. Prove that any convergent sequence is Cauchy.
b. Prove that if a subsequence of a Cauchy sequence converges, then the sequence itself must
converge. (Hint: Suppose ε > 0. If xkj → a, then there is J ∈ N so that whenever j > J,
we have kxkj − ak < ε/2. There is also K ∈ N so that whenever k, ℓ > K, we have
kxk − xℓ k < ε/2. Choose j > J so that kj > K.)

*13. Prove that if {xk } is a Cauchy sequence, then all the points lie in some ball centered at the
origin.

14. a. Suppose {xk } is a sequence of points in R satisfying a ≤ xk ≤ b for all k ∈ N. Prove that

{xk } has a convergent subsequence (see Exercise 6). (Hint: If there are only finitely many
distinct terms in the sequence, this should be easy. If there are infinitely many distinct
terms in the sequence, then there must be infinitely many either in the left half-interval
[a, (a + b)/2] or in the right half-interval [(a + b)/2, b]. Let [a_1, b_1] be such a half-interval. Continue
the process, and apply Exercise 10.)
b. Use the results of Exercises 12 and 13 to prove that any Cauchy sequence in R is convergent.
c. Now prove that any Cauchy sequence in Rn is convergent. (Hint: Use Exercise 2.)
♯ 15. Suppose S ⊂ Rn is a closed set that is a subset of the rectangle [a1 , b1 ] × · · · × [an , bn ]. Prove
that any sequence of points in S has a convergent subsequence. (Hint: Use repeatedly the idea
of Exercise 14a.)

3. Limits and Continuity

The concept on which all of calculus is founded is that of the limit. Limits are rather more subtle
when we consider functions of more than one variable. We begin with the obligatory definition and
some standard properties of limits.

Definition . Let U ⊂ Rn be an open subset containing a neighborhood of a ∈ Rn , except


perhaps for the point a itself. Suppose f : U → Rm . We say that

lim f (x) = ℓ
x→a

(“f (x) approaches ℓ ∈ Rm as x approaches a”) if for every ε > 0 there is δ > 0 so that

kf (x) − ℓ k < ε whenever 0 < kx − ak < δ.

(Note that even if f (a) is defined, we say nothing whatsoever about its relation to ℓ .)

We begin by observing that for a vector-valued function, calculating a limit may be done
component by component. As is customary by now, we denote the components of f by f1 , . . . , fm .

Proposition 3.1. lim f (x) = ℓ if and only if lim fj (x) = ℓj for all j = 1, . . . , m.
x→a x→a

Proof. The proof is based on Figure 2.1. Suppose lim f (x) = ℓ . We must show that for any
x→a
j = 1, . . . , m, we have lim fj (x) = ℓj . Given ε > 0, there is δ > 0 so that whenever 0 < kx−ak < δ,
x→a
we have kf (x) − ℓ k < ε. But since we have

|fj (x) − ℓj | ≤ kf (x) − ℓ k,

we see that whenever 0 < kx − ak < δ, we have |fj (x) − ℓj | < ε, as required.
Now, suppose that lim fj (x) = ℓj for j = 1, . . . , m. Given ε > 0, there are δ1 , . . . , δm > 0 so
x→a
that
|f_j(x) − ℓ_j| < ε/√m whenever 0 < ‖x − a‖ < δ_j.

Let δ = min(δ1 , . . . , δm ). Then whenever 0 < kx − ak < δ, we have


‖f(x) − ℓ‖ = ( Σ_{j=1}^{m} (f_j(x) − ℓ_j)² )^{1/2} < ( m (ε/√m)² )^{1/2} = ε,

as required. 

Example 1. Fix a nonzero vector b ∈ Rn . Let f : Rn → R be defined by f (x) = b · x. We


claim that lim f (x) = b · a. For
x→a
|f (x) − b · a| = |b · x − b · a| = |b · (x − a)| ≤ kbkkx − ak,
by the Cauchy–Schwarz Inequality, Proposition 2.3 of Chapter 1. Thus, given ε > 0, if we take δ = ε/‖b‖, then whenever 0 < ‖x − a‖ < δ, we have |f(x) − b·a| < ‖b‖ · (ε/‖b‖) = ε, as needed.
Note, moreover, that as a consequence of Proposition 3.1, for any linear map T : Rn → Rm it
is the case that lim T (x) = T (a). ▽
x→a

Example 2. Let f : Rn → R be defined by f (x) = kxk2 . Then we claim that lim f (x) = kak2 .
x→a
(1) Suppose first that a = 0. Since r 2 ≤ r whenever 0 ≤ r ≤ 1, we know that when 0 < ε ≤ 1,
we can choose δ = ε and then
0 < kxk < δ = ε =⇒ |f (x)| = kxk2 < ε2 ≤ ε,
as required. But what if some (admittedly, silly) person hands us an ε > 1? The trick to
take care of this is to let δ = min(1, ε). Should ε be bigger than 1, then δ = 1, and so when
0 < ‖x‖ < δ, we know that ‖x‖ < 1 and, once again, |f(x)| < 1 < ε, as required.
(2) Now suppose a ≠ 0. Given ε > 0, let δ = min( ‖a‖, ε/(3‖a‖) ). Now suppose 0 < ‖x − a‖ < δ.
Then, in particular, we have kxk < kak + δ ≤ 2kak, so that kx + ak ≤ kxk + kak < 3kak.
Then
|f(x) − ‖a‖²| = |x · x − a · a| = |(x + a) · (x − a)| ≤ ‖x + a‖ ‖x − a‖ < 3‖a‖ · ε/(3‖a‖) = ε,
as required.
Such sleight of hand (and more) is often required when the function is nonlinear. ▽
 
Example 3. Define f : R² − {0} → R by f(x, y) = x²y/(x² + y²). Does lim_{x→0} f(x) exist? Since |x| ≤ √(x² + y²) and |y| ≤ √(x² + y²), we have (writing x = (x, y))
|f(x)| ≤ ‖x‖³/‖x‖² = ‖x‖,
and so f(x) → 0 as x → 0. (In particular, taking δ = ε will work.) An alternative approach, which will be useful later, is this:
|f(x)| = |y| · x²/(x² + y²) ≤ |y|,
since 0 ≤ x²/(x² + y²) ≤ 1. Once again, |y| ≤ ‖x‖, and hence approaches 0 as x → 0. Thus, so does |f(x)|. (See Figure 3.1(a).) ▽

Figure 3.1
 
x
Example 4. Let's modify the previous example slightly. Define f : R² − {0} → R by f(x, y) = x²/(x² + y²). We ask again whether lim_{x→0} f(x) exists. Note that
lim_{h→0} f(h, 0) = lim_{h→0} h²/h² = 1,   whereas
lim_{k→0} f(0, k) = lim_{k→0} 0/k² = 0.
Thus, lim_{x→0} f(x) cannot exist (there is no number ℓ so that both 1 and 0 are less than ε away from ℓ when 0 < ε < 1/2). (See Figure 3.1(b).) Now, what about f(x, y) = xy/(x² + y²)? In this case we have
lim_{h→0} f(h, 0) = lim_{k→0} f(0, k) = 0,
so we might surmise that the limit exists and equals 0. But consider what happens if x approaches 0 along the line y = x:
lim_{h→0} f(h, h) = lim_{h→0} h²/(2h²) = 1/2.
Once again, the limit does not exist. ▽
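A quick numerical experiment makes the path-dependence visible. The sketch below (Python, offered only as an illustration; the step 10⁻⁶ is an arbitrary choice) approaches the origin along the line y = mx and shows that xy/(x² + y²) settles at m/(1 + m²), which depends on m.

    # Approach the origin along y = m*x and evaluate f(x, y) = x*y / (x**2 + y**2).
    def f(x, y):
        return x * y / (x**2 + y**2)

    h = 1e-6                         # a point very close to the origin
    for m in [0.0, 1.0, 2.0]:
        print(m, f(h, m * h))        # prints m/(1 + m**2): 0, 0.5, 0.4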

The fundamental properties of limits with which every calculus student is familiar generalize
in an obvious way to the multivariable setting.

Theorem 3.2. Suppose f and g map a neighborhood of a ∈ Rn (with the possible exception
of the point a itself) to Rm and k maps the same neighborhood to R. Suppose
lim_{x→a} f(x) = ℓ,   lim_{x→a} g(x) = m,   and   lim_{x→a} k(x) = c.
Then
lim_{x→a} ( f(x) + g(x) ) = ℓ + m,
lim_{x→a} f(x) · g(x) = ℓ · m,
lim_{x→a} k(x) f(x) = c ℓ.

Proof. Given ε > 0, there are δ_1, δ_2 > 0 so that
‖f(x) − ℓ‖ < ε/2 whenever 0 < ‖x − a‖ < δ_1
and
‖g(x) − m‖ < ε/2 whenever 0 < ‖x − a‖ < δ_2.
Let δ = min(δ_1, δ_2). Whenever 0 < ‖x − a‖ < δ, we have
‖(f(x) + g(x)) − (ℓ + m)‖ ≤ ‖f(x) − ℓ‖ + ‖g(x) − m‖ < ε/2 + ε/2 = ε,
as required.
Given ε > 0, there are (different) δ_1, δ_2 > 0 so that
‖f(x) − ℓ‖ < min( ε/(2(‖m‖ + 1)), 1 ) whenever 0 < ‖x − a‖ < δ_1
and
‖g(x) − m‖ < ε/(2(‖ℓ‖ + 1)) whenever 0 < ‖x − a‖ < δ_2.
Note that when 0 < ‖x − a‖ < δ_1, we have (by the triangle inequality) ‖f(x)‖ < ‖ℓ‖ + 1. Now, let δ = min(δ_1, δ_2). Whenever 0 < ‖x − a‖ < δ, we have
|f(x) · g(x) − ℓ · m| = |f(x) · (g(x) − m) + (f(x) − ℓ) · m| ≤ ‖f(x)‖ ‖g(x) − m‖ + ‖f(x) − ℓ‖ ‖m‖
   < (‖ℓ‖ + 1) ‖g(x) − m‖ + ‖f(x) − ℓ‖ ‖m‖
   < (‖ℓ‖ + 1) · ε/(2(‖ℓ‖ + 1)) + ‖m‖ · ε/(2(‖m‖ + 1)) < ε/2 + ε/2 = ε,
as required.
The proof of the last equality is left to the reader in Exercise 4. 

Once we have the concept of limit, the definition of continuity is quite straightforward.

Definition . Let U ⊂ Rn be an open subset containing a neighborhood of a ∈ Rn , and let


f : U → Rm . We say f is continuous at a if

lim f (x) = f (a).


x→a

That is, f is continuous at a if, given any ε > 0, there is δ > 0 so that

kf (x) − f (a)k < ε whenever kx − ak < δ.

We say f is continuous if it is continuous at every point of its domain.

As an immediate consequence of Theorem 3.2 and this definition we have:

Corollary 3.3. Suppose f and g map a neighborhood of a ∈ Rn to Rm and k maps the same
neighborhood to R. If each function is continuous at a, then so are f + g, f · g, and kf .

It is perhaps a bit more interesting to relate the definition of continuity to our notions of
open and closed sets from the previous section. Let’s first introduce a bit of standard notation: if
f : X → Y is a function and Z ⊂ Y , we write f −1 (Z) = {x ∈ X : f (x) ∈ Z}, as illustrated in
Figure 3.2. This is called the preimage of Z under the mapping f ; be careful to remember that f
may not be one-to-one and hence may well have no inverse function.

Figure 3.2

Proposition 3.4. Let U ⊂ Rn be an open set. The function f : U → Rm is continuous if and


only if for every open subset V ⊂ Rm , f −1 (V ) is an open set (i.e., the preimage of every open set
is open).

Proof. ⇐=: Suppose a ∈ U and we wish to prove f is continuous at a. Given ε > 0, we


must find δ > 0 so that whenever kx − ak < δ (and x ∈ U ), we have kf (x) − f (a)k < ε. Take
V = B(f (a), ε). Since f −1 (V ) is open and a ∈ f −1 (V ), there is δ > 0 so that B(a, δ) ⊂ f −1 (V ).
Shrinking δ if necessary so that B(a, δ) ⊂ U , we then know that whenever kx − ak < δ, we have
f (x) ∈ V = B(f (a), ε), and so we’re done. (See Figure 3.3.)

Figure 3.3

=⇒: Suppose now that f is continuous and V ⊂ Rm is open. Let a ∈ f −1 (V ) be arbitrary.


Since f (a) ∈ V and V is open, there is ε > 0 so that B(f (a), ε) ⊂ V . Since f is continuous at a,
there is δ > 0 so that whenever kx − ak < δ, we have kf (x) − f (a)k < ε. So, whenever x ∈ B(a, δ),
we have f (x) ∈ B(f (a), ε) ⊂ V . This means that B(a, δ) ⊂ f −1 (V ), and so f −1 (V ) is open. 

Proposition 3.5. Suppose U ⊂ Rn and W ⊂ Rp are open, f : U → Rm , g : W → Rn , and


the composition of functions f ◦ g is defined (i.e., g(x) ∈ U for all x ∈ W ). Then if f and g are
continuous, so is f ◦ g.

Proof. Let V ⊂ Rm be open. We need to see that (f ◦ g)−1 (V ) is an open subset of Rp . By


the definition of composition, (f ◦ g)(x) = f (g(x)) ∈ V if and only if g(x) ∈ f −1 (V ) if and only if

x ∈ g−1 f −1 (V ) . By the continuity of f , we know that f −1 (V ) ⊂ Rn is open, and then by the

continuity of g, we deduce that g−1 f −1 (V ) ⊂ Rp is open. That is, (f ◦ g)−1 (V ) ⊂ Rp is open, as
required. 

Example 5. Consider the function


 
f(x, y) = x²y/(x⁴ + y²) for (x, y) ≠ (0, 0), and f(0) = 0,
whose graph is shown in Figure 3.4. We ask whether f is continuous. Since the denominator vanishes only at the origin, it follows from Corollary 3.3 that f is continuous away from the origin. Now, since f(h, 0) = f(0, k) = 0 for all h and k, we are encouraged. What's more, the restriction of f to any line y = mx through the origin is continuous, since
f(x, mx) = mx³/(x⁴ + m²x²) = mx/(m² + x²) for all x.
On the other hand, if we consider the restriction of f to the parabola y = x², we find that

Figure 3.4

f(x, x²) = x⁴/(2x⁴) = 1/2 for x ≠ 0, and f(0, 0) = 0,
which is definitely not a continuous function. Thus, f cannot be continuous. (If it were, according to Proposition 3.5, letting g(x) = (x, x²), f ∘ g would have to be continuous.) ▽
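The same phenomenon can be checked numerically. The short Python sketch below (an illustration only, not part of the argument; the sample slope 3 is arbitrary) evaluates f along the line y = 3x, where the values tend to 0, and along the parabola y = x², where they stay at 1/2.

    # f(x, y) = x**2 * y / (x**4 + y**2), evaluated away from the origin.
    def f(x, y):
        return x**2 * y / (x**4 + y**2)

    for h in [1e-1, 1e-3, 1e-5]:
        print(f(h, 3 * h), f(h, h**2))   # along y = 3x -> 0, along y = x**2 -> 0.5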

Next we come to the relation between continuity and convergent sequences.

Proposition 3.6. Suppose U ⊂ Rn is open and f : U → Rm . Then f is continuous at a if and


only if for every sequence {xk } of points in U converging to a the sequence {f (xk )} converges to
f (a).

Proof. Suppose f is continuous at a. Given ε > 0, there is δ > 0 so that whenever kx − ak < δ,
we have kf (x) − f (a)k < ε. Suppose xk → a. There is K ∈ N so that whenever k > K, we have
kxk − ak < δ, and hence kf (xk ) − f (a)k < ε. Thus, f (xk ) → f (a), as required.
The converse is a bit trickier. We proceed by proving the contrapositive. Suppose f is not
continuous at a. This means that for some ε0 > 0, it is the case that for every δ > 0 there is
some x with kx − ak < δ and kf (x) − f (a)k ≥ ε0 . So, for each k ∈ N, there is a point xk so that
kxk − ak < 1/k and kf (xk ) − f (a)k ≥ ε0 . But this means that the sequence {xk } converges to a
and yet clearly the sequence {f (xk )} cannot converge to f (a). 

Corollary 3.7. Suppose f : Rn → Rm is continuous. Then for any c ∈ Rm , the level set
f −1 ({c}) = {x ∈ Rn : f (x) = c} is a closed set.

Proof. Suppose {xk } is a convergent sequence of points in f −1 ({c}), and let a be its limit. By
Proposition 3.6, f (xk ) → f (a). Since f (xk ) = c for all k, it follows that f (a) = c as well, and so
a ∈ f −1 ({c}), as we needed to show. 

Example 6. By Example 2, the function f : Rn → R, f (x) = kxk2 , is continuous. The level


sets of f are spheres centered at the origin. It follows that these spheres are closed sets. ▽

EXERCISES 2.3

1. Prove that if lim f (x) exists, it must be unique. (Hint: if ℓ and m are two putative limits,
x→a
choose ε = kℓℓ − m k/2.)
♯ 2. Prove that f : Rn → R, f (x) = kxk is continuous. (Hint: Write x = a + (x − a).)
♯ 3. (Squeeze Principle) Suppose f , g, and h are real-valued functions on a neighborhood of a
(perhaps not including the point a itself). Suppose f (x) ≤ g(x) ≤ h(x) for all x and lim f (x) =
x→a
ℓ = lim h(x). Prove that lim g(x) = ℓ. (Hint: Given ε > 0, show that there is δ > 0 so that
x→a x→a
whenever 0 < kx − ak < δ, we have −ε < f (x) − ℓ ≤ g(x) − ℓ ≤ h(x) − ℓ < ε.)

4. Suppose lim f (x) = ℓ and lim k(x) = c. Prove that lim k(x)f (x) = cℓℓ.
x→a x→a x→a
♯ 5. Suppose U ⊂ Rn is open and f : U → R is continuous. If a ∈ U and f (a) > 0, prove that there
is δ > 0 so that f (x) > 0 for all x ∈ B(a, δ). (That is, a continuous function that is positive at
a point must be positive on a neighborhood of that point.) Can you state a somewhat stronger
result?

6. Let U ⊂ Rn be open. Suppose g : U → R is continuous and g(a) 6= 0. Prove that 1/g is


continuous on some neighborhood of a. (Hint: Apply Proposition 3.5.)
♯ 7. Suppose T : Rn → Rm is a linear map.
a. Prove that T is continuous. (See Example 1.)

b. Deduce the result of part a an alternative way by showing that for any m × n matrix A,
we have
‖Ax‖ ≤ ( Σ_{i,j} a_{ij}² )^{1/2} ‖x‖.

*8. Using Theorem 3.2 whenever possible (and standard facts from one-variable calculus), decide
in each case whether lim_{x→0} f(x) exists. Provide appropriate justification.
    a. f(x, y) = xy/(x + y + 1)
    b. f(x, y) = sin(x² + y²)/(x² + y²)
    c. f(x, y) = (x² − y²)/(x − y), x ≠ y, f(x, x) = 0
    d. f(x, y) = e^(x² + y²)
    e. f(x, y) = e^(−1/(x² + y²))
    f. f(x, y) = (x² + y²)/y, y ≠ 0, f(x, 0) = 0
    g. f(x, y) = x³/(x² + y²)
    h. f(x, y) = x sin²y/(x² + y²)
    i. f(x, y) = xy/(x³ − y³), x ≠ y, f(x, x) = 0
    j. f(x, y) = (x² + y²)/(x + y), x ≠ −y, f(x, −x) = 0

9. Suppose f : Rn → Rn is continuous and x0 is arbitrary. Define a sequence by xk = f (xk−1 ),


k = 1, 2, . . .. Prove that if xk → a, then f (a) = a. We say a is a fixed point of f .

10. Use Exercise 9 to find the limit of each of the following sequences of points in R, presuming it exists.
    *a. x_0 = 1, x_k = √(2 x_{k−1})
    b. x_0 = 5, x_k = x_{k−1}/2 + 2/x_{k−1}
    c. x_0 = 1, x_k = 1 + 1/x_{k−1}
    *d. x_0 = 1, x_k = 1 + 1/(1 + x_{k−1})
11. Give an example of a discontinuous function f : R → R having the property that for every
c ∈ R the level set f −1 ({c}) is closed.

12. If f : X → Y is a function and U ⊂ X, recall that the image of U is the set f (U ) = {y ∈ Y :


y = f (x) for some x ∈ U }. Prove or give a counterexample: if f is continuous, then the image
of every open set is open. (Cf. Proposition 3.4.)

13. Prove that if f is continuous, then the preimage of every closed set is closed.

14. Identify Mm×n , the set of m × n matrices, with Rmn in the obvious way.
a. Prove that when n = 2 or 3, the set of n × n matrices with nonzero determinant is an open
subset of Mn×n .
b. Prove that the set of n × n matrices A satisfying AT A = In is a closed subset of Mn×n .

15. a. Let f(x, y) = 0 if |y| > x² or y = 0, and f(x, y) = 1 otherwise.

Show that f is continuous at 0 on every line through the origin but is not continuous at 0.

b. Give a function that is continuous at 0 along every line and every parabola y = kx2
through the origin but is not continuous at 0.

16. Give a function f : R2 → R that is


a. continuous at 0 along every line through the origin but unbounded in every neighborhood
of 0.
b. continuous at 0 along every line through the origin, unbounded in every neighborhood of
0, and discontinuous only at the origin.

17. Generalizing Example 5, for what positive values of α, β, γ, and δ is the analogous function
    f(x, y) = |x|^α |y|^β / (|x|^γ + |y|^δ),   x ≠ 0,   f(0) = 0
    continuous at 0?

18. a. Suppose A is an invertible n × n matrix. Show that the solution of Ax = b varies continuously with b ∈ Rⁿ.
    b. Show that the solution of Ax = b varies continuously as a function of the pair (A, b), as A varies over all invertible matrices and b over Rⁿ. (You should be able to get the cases n = 1 and n = 2. What do you need for n > 2?)
CHAPTER 3
The Derivative
In this chapter we start in earnest on calculus. The immediate goal is to define the tangent plane
at a point to the graph of a function, which should be the suitable generalization of the tangent
lines in single-variable calculus. The fundamental computational tool is the partial derivative, a
direct application of single-variable calculus tools. But the actual definition of a differentiable
function immediately involves linear algebra. We establish various differentiation rules and then
introduce the gradient, which, as common parlance has come to suggest, tells us in which direction
a scalar function increases the fastest; thus, it is highly important for physical and mathematical
applications. We conclude the chapter with a discussion of Kepler’s laws, the geometry of curves,
and higher-order derivatives.

1. Partial Derivatives and Directional Derivatives

Whenever possible it is desirable to reduce problems in multivariable calculus to those in single-


variable calculus. It is reasonable to think that we should understand a function by knowing how
it varies with each variable, fixing all the others. (A physical analogy is this: to find the change
in energy of a gas as we change its volume and temperature, we imagine that we can first fix the
volume and change the temperature, and then, fixing the temperature, change the volume.)
We begin by considering a real-valued function f of two variables, x and y.

Figure 1.1


Definition. We define the partial derivatives ∂f/∂x and ∂f/∂y as follows:
∂f/∂x (a, b) = lim_{h→0} [ f(a + h, b) − f(a, b) ] / h
∂f/∂y (a, b) = lim_{k→0} [ f(a, b + k) − f(a, b) ] / k.
Very simply, if we fix b, then ∂f/∂x (a, b) is the derivative at a (or slope) of the function F(x) = f(x, b), as indicated in Figure 1.1. There is an analogous interpretation of ∂f/∂y (a, b).
More generally, if U ⊂ Rn is open and a ∈ U , we define the j th partial derivative of f : U → Rm
at a to be
∂f/∂x_j (a) = lim_{t→0} [ f(a + t e_j) − f(a) ] / t,   j = 1, …, n
(provided this limit exists). Many authors use the alternative notation Dj f (a) to represent the j th
partial derivative of f at a.
 
x
Example 1. Let f(x, y) = x³y⁵ + e^{xy} sin(2x + 3y). Then
∂f/∂x (x, y) = 3x²y⁵ + e^{xy} ( y sin(2x + 3y) + 2 cos(2x + 3y) )   and
∂f/∂y (x, y) = 5x³y⁴ + e^{xy} ( x sin(2x + 3y) + 3 cos(2x + 3y) ). ▽
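Partial derivative formulas like these are easy to sanity-check with centered difference quotients. The Python sketch below (an illustration only; the sample point and step size are arbitrary choices) compares the formulas above with numerical approximations.

    import numpy as np

    def f(x, y):
        return x**3 * y**5 + np.exp(x*y) * np.sin(2*x + 3*y)

    def fx(x, y):   # the partial derivative with respect to x computed above
        return 3*x**2 * y**5 + np.exp(x*y) * (y*np.sin(2*x + 3*y) + 2*np.cos(2*x + 3*y))

    def fy(x, y):   # the partial derivative with respect to y computed above
        return 5*x**3 * y**4 + np.exp(x*y) * (x*np.sin(2*x + 3*y) + 3*np.cos(2*x + 3*y))

    a, b, h = 0.7, -0.4, 1e-6
    print((f(a + h, b) - f(a - h, b)) / (2*h), fx(a, b))   # agree to several digits
    print((f(a, b + h) - f(a, b - h)) / (2*h), fy(a, b))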

The partial derivatives of f measure the rate of change of f in the directions of the coordinate
axes, i.e., in the directions of the standard basis vectors e1 , . . . , en . Given any nonzero vector v, it
is natural to consider the rate of change of f in the direction of v.

Definition. Let U ⊂ Rn be open and a ∈ U . We define the directional derivative of f : U → Rm


at a in the direction v to be
D_v f(a) = lim_{t→0} [ f(a + t v) − f(a) ] / t,
provided this limit exists.

Note that the j th partial derivative of f at a is just Dej f (a). When n = 2 and m = 1, as we see
from Figure 1.2, if kvk = 1, the directional derivative Dv f (a) is just the slope at a of the graph we
obtain by restricting to the line through a with direction v.

Remark. Our terminology might be a bit misleading. Note that since


 
D_{cv} f(a) = lim_{t→0} [ f(a + t(cv)) − f(a) ] / t = lim_{t→0} [ f(a + (ct)v) − f(a) ] / t
   = c · lim_{t→0} [ f(a + (ct)v) − f(a) ] / (ct) = c · lim_{s→0} [ f(a + s v) − f(a) ] / s
   = c D_v f(a),

Figure 1.2

the directional derivative depends not only on the direction of v, but also on its magnitude. It is
for this reason that many calculus books require that one specify a unit vector v. It makes more
sense to think of Dv f (a) as the rate of change of f as experienced by an observer moving with
instantaneous velocity v. We shall return to this interpretation in Section 3.

Example 2. Let f : R2 → R be defined by


 
f(x, y) = |x| y / √(x² + y²),   x ≠ 0,   f(0) = 0,
whose graph is shown in Figure 1.3. Then the directional derivative of f at 0 in the direction of the unit vector v = (v_1, v_2) is
D_v f(0) = lim_{t→0} [ f(tv) − f(0) ] / t = lim_{t→0} ( |t v_1| (t v_2)/|t| ) / t = |v_1| v_2.
Note that both partial derivatives of f at 0 are 0, and yet the remaining directional derivatives

Figure 1.3

are nonzero. ▽

Example 3. Let f : Rn → R be defined by f (x) = kxk. Let a 6= 0 be arbitrary. Let v = a/kak


be a unit vector pointing radially outwards at a. Then
D_v f(a) = lim_{t→0} [ f(a + tv) − f(a) ] / t = lim_{t→0} [ ‖a + t a/‖a‖‖ − ‖a‖ ] / t = lim_{t→0} [ (‖a‖ + t) − ‖a‖ ] / t = 1.
On the other hand, if v · a = 0, then
D_v f(a) = lim_{t→0} [ ‖a + tv‖ − ‖a‖ ] / t = 0,
inasmuch as t = 0 is a global minimum of the function g(t) = ka + tvk (why?). ▽
     
Example 4. Let f(x, y, z) = x²y + e^{3x+y−z}, let a = (1, −1, 2) and v = (2, 3, −1). What is the directional derivative D_v f(a)?
We define ϕ : R → R by ϕ(t) = f(a + tv). Note that
ϕ′(0) = lim_{t→0} [ ϕ(t) − ϕ(0) ] / t = lim_{t→0} [ f(a + tv) − f(a) ] / t = D_v f(a).
So we just calculate ϕ and compute its derivative at 0:
ϕ(t) = f(1 + 2t, −1 + 3t, 2 − t) = (1 + 2t)²(−1 + 3t) + e^{3(1+2t)+(−1+3t)−(2−t)} = (1 + 2t)²(−1 + 3t) + e^{10t},   so
ϕ′(t) = 4(1 + 2t)(−1 + 3t) + 3(1 + 2t)² + 10 e^{10t},
from which we conclude that D_v f(a) = ϕ′(0) = 9.


This approach is usually more convenient than calculating the limit directly when we are given
commonplace functions. ▽
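The value 9 is also easy to confirm numerically, since D_v f(a) = ϕ′(0). The Python sketch below (an illustration; the step size is an arbitrary small number) approximates ϕ′(0) by a centered difference along the line a + tv.

    import numpy as np

    def f(x, y, z):
        return x**2 * y + np.exp(3*x + y - z)

    a = np.array([1.0, -1.0, 2.0])
    v = np.array([2.0, 3.0, -1.0])

    t = 1e-6
    # centered difference for phi'(0), where phi(t) = f(a + t v)
    print((f(*(a + t*v)) - f(*(a - t*v))) / (2*t))   # approximately 9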

EXERCISES 3.1

1. Calculate the partial derivatives of the following functions:
    *a. f(x, y) = x³ + 3xy² − 2y + 7
    b. f(x, y) = √(x² + y²)
    *c. f(x, y) = arctan(y/x)
    d. f(x, y) = e^(−(x² + y²))
    e. f(x, y) = (x + y²) log x
    f. f(x, y, z) = e^{xy} z² − xy sin(πyz)

2. Calculate the directional derivative of the given function f at the given point a in the direction of the given vector v.
    *a. f(x, y) = x² + xy,  a = (2, 1),  v = (1, −1)
    *b. f(x, y) = x² + xy,  a = (2, 1),  v = (1/√2)(1, −1)
    c. f(x, y) = y e^{−x},  a = (0, 1),  v = (3, 4)
    d. f(x, y) = y e^{−x},  a = (0, 1),  v = (1/5)(3, 4)
3. For each of the following functions f and points a, find the unit vector v with the property that D_v f(a) is as large as possible.
    *a. f(x, y) = x² + xy,  a = (2, 1)
    b. f(x, y) = y e^{−x},  a = (0, 1)
    c. f(x, y, z) = 1/x + 1/y + 1/z,  a = (1, −1, 1)

4. Suppose Dv f (a) exists. Prove that D−v f (a) exists and calculate it in terms of the former.

5. a. Show that there can be no function f : Rn → R so that for some point a ∈ Rn we have
Dv f (a) > 0 for all nonzero vectors v ∈ Rn .
b. Show that there can, however, be a function f : Rn → R so that for some vector v ∈ Rn
we have Dv f (a) > 0 for all points a ∈ Rn .

6. Consider the ideal gas law pV = nRT . (Here p is pressure, V is volume, n is the number of
moles of gas present, R is the universal gas constant, and T is temperature.) Assume n is fixed.
Solve for each of p, V, and T as functions of the others, viz.,
    p = f(V, T),   V = g(p, T),   and   T = h(p, V).
Compute the partial derivatives of f, g, and h. What is
    (∂f/∂V)·(∂g/∂T)·(∂h/∂p),   or, more colloquially,   (∂p/∂V)·(∂V/∂T)·(∂T/∂p)?
   
7. Suppose f : R → R is differentiable, and let g(x, y) = f(x/y). Show that
    x ∂g/∂x + y ∂g/∂y = 0.
 
8. Suppose f : R → R is differentiable, and let g(x, y) = f(√(x² + y²)) for x ≠ 0. Show that
    y ∂g/∂x = x ∂g/∂y.

9. Let f : R2 → R be defined by:


 
f(x, y) = xy/(x² + y²),   (x, y) ≠ (0, 0),   f(0) = 0.

Show that the partial derivatives of f exist at 0 and yet f is not continuous at 0. Do other
directional derivatives of f exist at 0?

10. a. Let f : R2 → R be the function defined in Example 5 of Chapter 2, Section 3. Calculate


Dv f (0) for any v ∈ R2 .
b. Give an example of a function f : R2 → R all of whose directional derivatives at 0 are 0
and that is, nevertheless, discontinuous at 0.

*11. Suppose T : Rn → Rm is a linear map. Show that the directional derivative Dv T (a) exists for
all a ∈ Rn and all v ∈ Rn and calculate it.
2
12. Identify the set Mn×n of n × n matrices with Rn .
a. Define f : Mn×n → Mn×n by f (A) = AT . For any A, B ∈ Mn×n , prove that DB f (A) =
B T.
b. Define f : Mn×n → R by f (A) = trA. For any A, B ∈ Mn×n , prove that DB f (A) = trB.
(For the definition of trace, see Exercise 1.4.22.)
2
13. Identify the set Mn×n of n × n matrices with Rn .
a. Define f : Mn×n → Mn×n by f (A) = A2 . For any A, B ∈ Mn×n , prove that DB f (A) =
AB + BA.
b. Define f : Mn×n → Mn×n by f (A) = AT A. Calculate DB f (A).

2. Differentiability

For a function f : R → R one of the fundamental consequences of being differentiable (at a) is


that the function must be continuous (at a). We have already seen that for a function f : Rn → R,
having partial derivatives (or, indeed, all directional derivatives) at a need not guarantee continuity
at a. We now seek the appropriate definition.
Recall that the derivative is defined to be
f′(a) = lim_{h→0} [ f(a + h) − f(a) ] / h ;
alternatively, if it exists, it is the unique number m with the property that
lim_{h→0} [ f(a + h) − f(a) − mh ] / h = 0.
That is, the tangent line—the line passing through (a, f(a)) with slope m = f′(a)—is the best
(affine) linear approximation to the graph of f at a, in the sense that the error goes to 0 faster
than h as h → 0. (See Figure 2.1.) Generalizing the latter notion, we make the

Figure 2.1

Definition. Let U ⊂ Rn be open, and let a ∈ U . A function f : U → Rm is differentiable at a


if there is a linear map Df (a) : Rn → Rm so that
lim_{h→0} [ f(a + h) − f(a) − Df(a)h ] / ‖h‖ = 0.

This says that Df (a) is the best linear approximation to the function f − f (a) at a, in the
sense that the difference f (a + h) − f (a) − Df (a)h is small compared to h. See Figure 2.2 and

Figure 2.2

compare Figure 2.1. Equivalently, writing x = a + h, the function g(x) = f (a) + Df (a)(x − a) is
the best affine linear approximation to f near a. Indeed, the graph of g is called the tangent plane
of the graph at a. The tangent plane is obtained by translating the graph of Df(a), a subspace of Rⁿ × Rᵐ, so that it passes through (a, f(a)).

Remark . The derivative Df (a), if it exists, must be unique. If there were two linear maps
T, T ′ : Rn → Rm satisfying
lim_{h→0} [ f(a + h) − f(a) − T(h) ] / ‖h‖ = 0   and   lim_{h→0} [ f(a + h) − f(a) − T′(h) ] / ‖h‖ = 0,
then we would have
lim_{h→0} (T − T′)(h) / ‖h‖ = 0.

In particular, letting h = tei for any i = 1, . . . , n, we see that


lim_{t→0⁺} (T − T′)(t e_i) / t = (T − T′)(e_i) = 0   for i = 1, …, n,
and so T = T ′ .

It is worth observing that a vector-valued function f is differentiable at a if and only if each of


its coordinate functions fi is differentiable at a. (See Exercise 6 for a proof.)
∂fi
Proposition 2.1. If f : Rn → Rm is differentiable at a, then the partial derivatives (a)
∂xj
exist and
[Df(a)] = [ ∂f_i/∂x_j (a) ].
The latter matrix is often called the Jacobian matrix of f at a.

Proof. Since we assume f is differentiable at a, we know there is a linear map Df (a) with the
property that
lim_{h→0} [ f(a + h) − f(a) − Df(a)h ] / ‖h‖ = 0.
As we did in the remark above, for any j = 1, …, n, we consider h = t e_j, and let t → 0. Then we have
0 = lim_{t→0} [ f(a + t e_j) − f(a) − Df(a)(t e_j) ] / |t|.
Considering separately the cases t > 0 and t < 0, we find that
0 = lim_{t→0⁺} [ f(a + t e_j) − f(a) − Df(a)(t e_j) ] / t = lim_{t→0⁺} [ f(a + t e_j) − f(a) ] / t − Df(a)(e_j)
0 = lim_{t→0⁻} [ f(a + t e_j) − f(a) − Df(a)(t e_j) ] / (−t) = −( lim_{t→0⁻} [ f(a + t e_j) − f(a) ] / t − Df(a)(e_j) ),
and so Df(a)(e_j) = lim_{t→0} [ f(a + t e_j) − f(a) ] / t = ∂f/∂x_j (a), as required. □
Example 1. When n = 1, we have parametric equations of a curve in Rm . We see that if f is
differentiable at a, then
Df(a) = ( f_1′(a), f_2′(a), …, f_m′(a) )   (written as a column),
and we can think of Df (a) = Df (a)(1) as the velocity vector of the parametrized curve at the point
f (a), which we will usually denote by the (more) familiar f ′ (a). See Section 5 for further discussion
of this topic. ▽
   
Example 2. Let f(x, y) = xy. To prove that f is differentiable at a = (a, b), we must exhibit a linear map Df(a) with the requisite property. By Proposition 2.1, we know the only candidate is
Df(a, b) = [ b   a ],

so now we just prove that the appropriate limit is really 0:


     
lim_{(h,k)→0} [ f(a + h, b + k) − f(a, b) − [b a](h, k) ] / √(h² + k²)
   = lim_{(h,k)→0} [ (ab + bh + ak + hk) − ab − (bh + ak) ] / √(h² + k²)
   = lim_{(h,k)→0} hk / √(h² + k²) = lim_{(h,k)→0} ( h/√(h² + k²) ) k = 0,
because |h|/√(h² + k²) ≤ 1 and k → 0 as (h, k) → 0. ▽
   
Example 3. The tangent plane of the graph z = f(x, y) = xy at a = (2, 1) is
z = f(2, 1) + Df(2, 1)·(x − 2, y − 1)
   = 2 + [1 2]·(x − 2, y − 1) = 2 + (x − 2) + 2(y − 1) = x + 2y − 2. ▽
   
Example 4. Let f(x, y) = x/y. First, we claim that f is differentiable at a = (a, b), provided b ≠ 0. The putative derivative is
Df(a, b) = [ 1/b   −a/b² ],
and we check that
lim_{h→0} [ f(a + h) − f(a) − Df(a)h ] / ‖h‖ = 0.
Well,
f(a + h, b + k) − f(a, b) − [ 1/b   −a/b² ](h, k) = (a + h)/(b + k) − a/b − h/b + ak/b²
   = [ (a + h)(−bk) + ak(b + k) ] / ( b²(b + k) ) = k(ak − bh) / ( b²(b + k) ),
and so
lim_{(h,k)→0} [ f(a + h, b + k) − f(a, b) − [ 1/b  −a/b² ](h, k) ] / √(h² + k²) = lim_{(h,k)→0} [ k(ak − bh)/(b²(b + k)) ] / √(h² + k²) = 0,
since |k|/√(h² + k²) ≤ 1 and (ak − bh)/(b²(b + k)) → 0 as (h, k) → 0.
Now, as a (not totally facetious) application, consider the problem of calculating one’s gas
mileage, having used y gallons of gas to travel x miles. For example, without having a calculator
on hand, we can use linear approximation afforded us by the derivative to estimate our gas mileage
if we’ve used 10.8 gallons to drive 344 miles. Using a = 350 and b = 10, we have
      
f(344, 10.8) ≈ f(a, b) + Df(a, b)·(344 − a, 10.8 − b)
   = 35 + [ 0.1   −3.5 ]·(−6, 0.8) = 35 − 0.6 − 2.8 = 31.6.

(The actual value, to two decimal places, is 31.85.) ▽
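The arithmetic is easy to replay. The following Python sketch (purely an illustration of the linear approximation just carried out) computes both the tangent-plane estimate and the exact quotient.

    # Linear approximation of f(x, y) = x/y near (a, b) = (350, 10).
    a, b = 350.0, 10.0
    x, y = 344.0, 10.8

    estimate = a/b + (1.0/b)*(x - a) + (-a/b**2)*(y - b)
    print(estimate)      # 31.6
    print(x / y)         # 31.85..., the exact value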

Example 5. As we said earlier, a function f : R2 → R2 is differentiable if and only if both


its component functions fi : R2 → R are differentiable. It follows from Examples 2 and 4 that the
function f : R2 − {y = 0} → R2 given by
   
f(x, y) = ( xy, x/y )
is differentiable, and the Jacobian matrix of f at a = (a, b) is
Df(a, b) = [ b   a ; 1/b   −a/b² ]   (rows separated by semicolons). ▽
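Jacobian matrices can likewise be checked numerically, one column at a time. The Python sketch below (an illustration, using a crude centered-difference scheme at an arbitrarily chosen point with b ≠ 0) recovers the matrix above.

    import numpy as np

    def f(x, y):
        return np.array([x*y, x/y])

    def jacobian_fd(f, a, h=1e-6):
        # finite-difference Jacobian: one centered difference per column
        a = np.asarray(a, dtype=float)
        cols = []
        for j in range(len(a)):
            e = np.zeros_like(a)
            e[j] = 1.0
            cols.append((f(*(a + h*e)) - f(*(a - h*e))) / (2*h))
        return np.column_stack(cols)

    a, b = 2.0, 0.5
    print(jacobian_fd(f, (a, b)))
    print(np.array([[b, a], [1/b, -a/b**2]]))   # the Jacobian computed in the text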

One indication that we have the correct definition is the following

Proposition 2.2. If f : Rn → Rm is differentiable at a, then f is continuous at a.

Proof. Suppose f is differentiable at a; we must show that lim_{x→a} f(x) = f(a), or, equivalently, that lim_{h→0} f(a + h) = f(a). We have a linear map Df(a) : Rⁿ → Rᵐ so that
lim_{h→0} [ f(a + h) − f(a) − Df(a)h ] / ‖h‖ = 0.
This means that
lim_{h→0} ( f(a + h) − f(a) − Df(a)h ) = lim_{h→0} ( [ f(a + h) − f(a) − Df(a)h ] / ‖h‖ ) · ‖h‖
   = ( lim_{h→0} [ f(a + h) − f(a) − Df(a)h ] / ‖h‖ ) · ( lim_{h→0} ‖h‖ ) = 0.
By Exercise 2.3.7, lim_{h→0} Df(a)h = 0, and so
lim_{h→0} ( f(a + h) − f(a) ) = lim_{h→0} ( f(a + h) − f(a) − Df(a)h ) + lim_{h→0} Df(a)h = 0,

as required. 

Let’s now study a few examples to see just how subtle the issue of differentiability is.

Example 6. Define f : R2 → R by
 
f(x, y) = xy/(x² + y²),   (x, y) ≠ (0, 0),   f(0) = 0.
Since f(x, 0) = 0 for all x and f(0, y) = 0 for all y, certainly
∂f/∂x (0) = ∂f/∂y (0) = 0.

However, we have already seen in Exercise 3.1.9 that f is discontinuous, so it cannot be differ-
entiable. For practice, we check directly: if Df (0) existed, by Proposition 2.1 we would have
Df (0) = 0. Now let’s consider
lim_{h→0} [ f(h) − f(0) − Df(0)h ] / ‖h‖ = lim_{h→0} f(h)/‖h‖ = lim_{(h,k)→0} hk / (h² + k²)^{3/2}.
Like many of the limits we considered in Chapter 2, this one obviously does not exist; indeed, as h → 0 along the line h = k, this fraction becomes
h² / (2h²)^{3/2} = 1/(2√2 |h|),
which is clearly unbounded as h → 0. What’s more, as the reader can check, f has directional
derivatives at 0 only in the directions of the axes.

Example 7. Define f : R2 → R by
 
f(x, y) = x²y/(x² + y²),   (x, y) ≠ (0, 0),   f(0) = 0.
As in Example 6, both partial derivatives of this function at 0 are 0. This function, as we saw in Example 3 of Chapter 2, Section 3, is continuous, so differentiability is a bit more unclear. But we just try to calculate:
lim_{h→0} [ f(h) − f(0) − Df(0)h ] / ‖h‖ = lim_{h→0} f(h)/‖h‖ = lim_{(h,k)→0} h²k / (h² + k²)^{3/2}.
When h → 0 along either coordinate axis, the limit is obviously 0; however, when h → 0 along the line h = k, the limit does not exist (the function is equal to +1/(2√2) when h > 0 and −1/(2√2) when h < 0). Thus, f is not differentiable at 0.

Proposition 2.3. When f is differentiable at a, for any v ∈ Rn , the directional derivative of f


at a in the direction v is given by
Dv f (a) = Df (a)v.

Proof. Since f is differentiable at a, we know that its derivative, Df (a), has the property that
lim_{h→0} [ f(a + h) − f(a) − Df(a)h ] / ‖h‖ = 0.
Substituting h = tv and letting t → 0, we have
lim_{t→0} [ f(a + tv) − f(a) − Df(a)(tv) ] / |t| = 0.
Since Df(a) is a linear map, Df(a)(tv) = t Df(a)v. Proceeding as in the proof of Proposition 2.1, letting t approach 0 through positive values, we have
lim_{t→0⁺} [ f(a + tv) − f(a) − t Df(a)v ] / t = 0, and so
lim_{t→0⁺} [ f(a + tv) − f(a) ] / t = Df(a)v.

Similarly, when t approaches 0 through negative values, we have |t| = −t and
lim_{t→0⁻} [ f(a + tv) − f(a) − t Df(a)v ] / (−t) = 0, so
lim_{t→0⁻} [ f(a + tv) − f(a) − t Df(a)v ] / t = 0, and, as before,
lim_{t→0⁻} [ f(a + tv) − f(a) ] / t = Df(a)v.
In sum,
D_v f(a) = lim_{t→0} [ f(a + tv) − f(a) ] / t = Df(a)v,
as required. 

Remark. Let’s consider the case of a function f : R2 → R, as we pictured in Figures 1.1 and
1.2. As a consequence of Proposition 2.3, the tangent plane of the graph of f at a contains the
tangent lines at a of the slices by all vertical planes. The function f given in Example 2 of Section
1 cannot be differentiable at 0, as it is clear from Figure 1.3 that the tangent lines to the various
vertical slices at the origin do not lie in a plane.

Since it is so tedious to determine from the definition whether a function is differentiable, the
following Proposition is useful indeed.

Proposition 2.4. If f : U → Rm is continuous and has continuous partial derivatives, then f


is differentiable.

A continuous function with continuous partial derivatives is said to be C1 or continuously


differentiable. The reason for this notation will become clear when we study partial derivatives of
higher order.
Proof. By Exercise 6, it suffices to treat the case m = 1. For clarity, we give the proof in the

case n = 2, although the general case is not conceptually any harder. As usual, we write a = (a, b) and h = (h, k).
As usual, if f is to be differentiable, we know that Df (a) must be given by the Jacobian matrix
of f at a. To prove that f is differentiable at a ∈ U , we need to estimate

Figure 2.3
 
f(a + h) − f(a) − Df(a)h = f(a + h) − f(a) − ( ∂f/∂x (a) h + ∂f/∂y (a) k ).

Now, here is the new twist: as Figure 2.3 indicates, we calculate f (a + h) − f (a) by taking a
two-step route.
   
f(a + h) − f(a) = f(a + h, b + k) − f(a, b)
   = [ f(a + h, b) − f(a, b) ] + [ f(a + h, b + k) − f(a + h, b) ],
and so, regrouping in a clever fashion and using the Mean Value Theorem twice, we obtain
f(a + h) − f(a) − Df(a)h
   = [ f(a + h, b) − f(a, b) − ∂f/∂x (a) h ] + [ f(a + h, b + k) − f(a + h, b) − ∂f/∂y (a) k ]
   = [ ∂f/∂x (a + ξ, b) h − ∂f/∂x (a) h ] + [ ∂f/∂y (a + h, b + η) k − ∂f/∂y (a) k ]
         for some ξ between 0 and h and some η between 0 and k
   = [ ∂f/∂x (a + ξ, b) − ∂f/∂x (a) ] h + [ ∂f/∂y (a + h, b + η) − ∂f/∂y (a) ] k.
Now, observe that |h|/‖h‖ ≤ 1 and |k|/‖h‖ ≤ 1; as h → 0, continuity of the partial derivatives guarantees that
lim_{h→0} [ ∂f/∂x (a + ξ, b) − ∂f/∂x (a) ] = lim_{h→0} [ ∂f/∂y (a + h, b + η) − ∂f/∂y (a) ] = 0,
since ξ → 0 and η → 0 as h → 0. Thus,
|f(a + h) − f(a) − Df(a)h| / ‖h‖ ≤ | ∂f/∂x (a + ξ, b) − ∂f/∂x (a) | · |h|/‖h‖ + | ∂f/∂y (a + h, b + η) − ∂f/∂y (a) | · |k|/‖h‖,
and therefore indeed approaches 0 as h → 0. □

Example 8. We know that the function f given in Example 7 is not differentiable. It follows
from Proposition 2.4 that f cannot be C1 at 0. Let’s verify this directly.
It is obvious that ∂f/∂x (0) = ∂f/∂y (0) = 0, and for x ≠ 0, we have
∂f/∂x (x, y) = 2xy³/(x² + y²)²   and   ∂f/∂y (x, y) = x²(x² − y²)/(x² + y²)².
So we see that when x ≠ 0, ∂f/∂x (x, x) = 1/2 and ∂f/∂y (x, 0) = 1, neither of which approaches 0 as x → 0.
Thus, f is not C1 at 0.

Example 9. To see that the sufficient condition for differentiability given by Proposition 2.4
is not necessary, we consider the classic example of the function f : R → R defined by

f(x) = x² sin(1/x) for x ≠ 0,   f(0) = 0.
Then it is easy to check that f′(0) = 0, and yet f′(x) = 2x sin(1/x) − cos(1/x) has no limit as x → 0.
Thus, f is differentiable on all of R, but is not C1 .
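A few numerical values of f′ near 0 illustrate the oscillation; the Python sketch below is offered only as an illustration (the sample points are arbitrary).

    import numpy as np

    # f'(x) = 2x*sin(1/x) - cos(1/x) for x != 0: the cos(1/x) term keeps oscillating.
    def fprime(x):
        return 2*x*np.sin(1/x) - np.cos(1/x)

    for x in [1e-2, 1e-3, 1e-4, 1e-5]:
        print(x, fprime(x))   # no tendency toward a single limiting value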

EXERCISES 3.2

1. Find the equation of the tangent plane of the graph z = f(x) at the indicated point.
    *a. f(x, y) = e^{xy},  a = (−1, 2)
    b. f(x, y) = x² + y²,  a = (−1, 2)
    c. f(x, y) = √(x² + y²),  a = (3, 4)
    d. f(x, y) = √(4 − x² − y²),  a = (1, 1)
    e. f(x, y, z) = xyz,  a = (1, 2, 3)
    *f. f(x, y, z) = sin(xy) z² + e^{xz+1},  a = (1, 0, −1)
2. Calculate the directional derivative of f at a in the given direction v:
    *a. f(x, y) = eˣ cos y,  a = (0, π/4),  v = (1, 1)
    b. f(x, y) = eˣ cos y,  a = (0, π/4),  v = (1, −1)
    c. f(x, y) = xy²,  a = (3, 1),  v = (1, 2)
    d. f(x, y) = x² + y²,  a = (2, 1),  v = (1/√5)(2, 1)
    *e. f(x, y) = √(x² + y²),  a = (2, 1),  v = (1/√5)(2, 1)
    f. f(x, y, z) = e^{xyz},  a = (1, −1, −1),  v = (2, 2, 1)

*3. Give the derivative matrix of each of the following vector-valued functions:
    a. f : R² → R²,  f(x, y) = ( xy, x² + y² )
    b. f : R → R³,  f(t) = ( cos t, sin t, eᵗ )
    c. f : R² → R²,  f(s, t) = ( s cos t, s sin t )
    d. f : R³ → R²,  f(x, y, z) = ( xyz, x + y + z² )
    e. f : R² → R³,  f(x, y) = ( x cos y, x sin y, y )

*4. Use the technique of Example 4 to estimate your gas mileage if you used 6.5 gallons to drive
224 miles.

5. Two sides of a triangle are x = 3 and y = 4, and the included angle is θ = π/3. To a small
change in which of these three variables is the area of the triangle most sensitive? Why?

6. Let U ⊂ Rn be an open set, and let a ∈ U . Suppose m > 1. Prove that the function f : U → Rm
is differentiable at a if and only if each component function fi , i = 1, . . . , m, is differentiable at
a. (Hint: Review the proof of Proposition 3.1 of Chapter 2.)

7. Show that any linear map is differentiable and is its own derivative (at an arbitrary point).
 
a
8. Show that the tangent plane of the cone z² = x² + y² at any point (a, b, c) ≠ 0 intersects the cone in a line.
9. Show that the tangent plane of the saddle surface z = xy at any point intersects the surface in
a pair of lines.
   2 
x x − y2
10. Find the derivative of the map f(x, y) = ( x² − y², 2xy ) at the point a. Show that whenever a ≠ 0,
the linear map Df (a) is a scalar multiple of a rotation matrix.

11. Prove from the definition that the following functions are differentiable:
    a. f(x, y) = x² + y²
    b. f(x, y) = xy²
    c. f : Rⁿ → R, f(x) = ‖x‖²

12. Let
    f(x, y) = x²y/(x⁴ + y²) for (x, y) ≠ (0, 0), and f(0) = 0.

Show directly that f fails to be C1 at the origin. (Of course, this follows from Example 5 of
Section 3 of Chapter 2 and Propositions 2.2 and 2.4.)

13. Use the results of Exercise 3.1.13 to show that f (A) = A2 and f (A) = AT A are differentiable
functions mapping Mn×n to Mn×n .
♯ 14. Let A be an n × n matrix. Define f : Rn → R by f (x) = Ax · x = xT Ax.
a. Show that f is differentiable and Df (a)h = Aa · h + Ah · a.
b. Deduce that when A is symmetric, Df (a)h = 2Aa · h.

15. Let a ∈ Rn , δ > 0, and suppose f : B(a, δ) → R is differentiable at a. Suppose f (a) ≥ f (x) for
all x ∈ B(a, δ). Prove that Df (a) = O.

16. Let a ∈ R2 , δ > 0, and suppose f : B(a, δ) → R is differentiable and Df (x) = O for all
x ∈ B(a, δ). Prove that f (x) = f (a) for all x ∈ B(a, δ). (Hint: Start with the proof of
Proposition 2.4.)

17. Let f(x, y) = y if x ≠ 0, and f(x, y) = 0 if x = 0.

a. Prove that f is continuous at 0.


b. Determine whether f is differentiable at 0. Give a careful proof.

18. Let
    f(x, y) = xy⁶/(x⁴ + y⁸) for (x, y) ≠ (0, 0), and f(0) = 0.
a. Find all the directional derivatives of f at 0.
b. Is f continuous at 0?
c. Is f differentiable at 0?

19. a. Let f : R2 → R be the function defined in Example 5 of Chapter 2, Section 3. Show that
f has directional derivatives at 0 in every direction but is not differentiable at 0.
b. Find a function all of whose directional derivatives at 0 are 0 but, nevertheless, that is not
differentiable at 0.
c. Find a function all of whose directional derivatives at 0 are 0 but that is unbounded in any
neighborhood of 0.
d. Find a function all of whose directional derivatives at 0 are 0, all of whose directional
derivatives exist at every point, and that is unbounded in any neighborhood of 0.

3. Differentiation Rules

In practice, most of the time Proposition 2.4 is sufficient for us to calculate explicit derivatives.
However, it is reassuring to know that the sum, product, and quotient rules from elementary
calculus go over to the multivariable case. We shall come to the chain rule shortly.
For the next proofs, we need the notion of the norm of a linear map T : Rn → Rm . We set

‖T‖ = max_{‖x‖=1} ‖T(x)‖.

(In Section 1 of Chapter 5 we will prove the maximum value theorem, which states that a continuous
function on a closed and bounded subset of Rn achieves its maximum value. Since the unit sphere
 x 
in Rn is closed and bounded, this maximum exists.) When x 6= 0, we have
T kxk ≤ kT k, and
so, by linearity, the following formula follows immediately:

kT (x)k ≤ kT kkxk.

Proposition 3.1. Suppose U ⊂ Rn is open and f : U → Rm , g : U → Rm , and k : U → R.


Suppose a ∈ U and f , g, and k are differentiable at a. Then
(1) f + g : U → Rm is differentiable at a and D(f + g)(a) = Df (a) + Dg(a).

(2) kf : U → Rm is differentiable at a and D(kf )(a)v = Dk(a)v f (a) + k(a)Df (a)v for any
v ∈ Rn .
 
(3) f · g : U → R is differentiable at a and D(f · g)(a)v = Df (a)v · g(a) + f (a) · Dg(a)v for
any v ∈ Rn .

Proof. These are much like the proofs of the corresponding results in single-variable calculus.
Here, however, we insert the candidate for the derivative in the definition and check that the limit
is indeed 0. 
(f + g)(a + h) − (f + g)(a) − Df (a) + Dg(a) h
(1): lim
h→0 khk
   
f (a + h) − f (a) − Df (a)h + g(a + h) − g(a) − Dg(a)h
= lim
h→0 khk
f (a + h) − f (a) − Df (a)h g(a + h) − g(a) − Dg(a)h
= lim + lim = 0 + 0 = 0.
h→0 khk h→0 khk
(2): We proceed much asin the proof of the limit of the product in Theorem 3.2 of Chapter 2.

(kf )(a + h) − (kf )(a) − Dk(a)h f (a) + k(a)Df (a)h
lim
h→0 khk
  
k(a + h)f (a + h) − k(a)f (a) − Dk(a)h f (a) + k(a)Df (a)h
= lim
h→0 khk
     
k(a + h) − k(a) f (a + h) + k(a) f (a + h) − f (a) − Dk(a)h f (a) + k(a)Df (a)h
= lim
h→0 khk
  
k(a + h) − k(a) f (a + h) − Dk(a)h f (a) k(a) f (a + h) − f (a) − k(a)Df (a)h
= lim + lim
h→0 khk h→0 khk
 
k(a + h) − k(a) f (a + h) − Dk(a)h f (a) f (a + h) − f (a) − Df (a)h
= lim + k(a) lim
h→0 khk h→0 khk
Now, the second term clearly approaches 0. To handle the first term, we have to use continuity in
a rather subtle way, remembering that if f is differentiable at a, then it is necessarily continuous
at a (Proposition 2.2).
 
k(a + h) − k(a) f (a + h) − Dk(a)h f (a)
khk
  
k(a + h) − k(a) − Dk(a)h f (a + h) + Dk(a)h f (a + h) − f (a)
=
khk
 
k(a + h) − k(a) − Dk(a)h Dk(a)h f (a + h) − f (a)
= f (a + h) + .
khk khk
Now here the first term clearly approaches 0, but the second term is a bit touchy. The length of
the second term is
|Dk(a)h|kf (a + h) − f (a)k
≤ kDk(a)kkf (a + h) − f (a)k,
khk
which in turn goes to 0 as h → 0 by continuity of f at a. This concludes the proof of (2).
The proof of (3) is virtually identical to that of (2) and is left to the reader in Exercise 9. 

Theorem 3.2 (The Chain Rule). Suppose g : Rn → Rm and f : Rm → Rℓ , g is differentiable


at a, and f is differentiable at g(a). Then f ◦ g is differentiable at a and

D(f ◦ g)(a) = Df (g(a))◦ Dg(a).

Proof. We must show that
  lim_{h→0} [ f(g(a + h)) − f(g(a)) − Df(g(a)) Dg(a)h ] / ‖h‖ = 0.
Letting b = g(a), we know that
  lim_{h→0} [ g(a + h) − g(a) − Dg(a)h ] / ‖h‖ = 0   and   lim_{k→0} [ f(b + k) − f(b) − Df(b)k ] / ‖k‖ = 0.
Given ε > 0, this means that there are δ1 > 0 and η > 0 so that
  (∗)  0 < ‖h‖ < δ1 ⟹ ‖g(a + h) − g(a) − Dg(a)h‖ < ε‖h‖   and
  (∗∗)  ‖k‖ < η ⟹ ‖f(b + k) − f(b) − Df(b)k‖ ≤ ε‖k‖.
Setting k = g(a + h) − g(a) and rewriting (∗), we conclude that whenever 0 < ‖h‖ < δ1, we have
  ‖k − Dg(a)h‖ < ε‖h‖,
and so
  ‖k‖ < ‖Dg(a)h‖ + ε‖h‖ ≤ (‖Dg(a)‖ + ε)‖h‖.
Let δ2 = η/(‖Dg(a)‖ + ε) and set δ = min(δ1, δ2).
Finally, we start with the numerator of the fraction whose limit we seek:
  f(b + k) − f(b) − Df(b)Dg(a)h = [f(b + k) − f(b) − Df(b)k] + [Df(b)k − Df(b)Dg(a)h]
    = [f(b + k) − f(b) − Df(b)k] + Df(b)(k − Dg(a)h).
Therefore, whenever 0 < ‖h‖ < δ, we have ‖k‖ < η, and so
  ‖f(b + k) − f(b) − Df(b)Dg(a)h‖ ≤ ‖f(b + k) − f(b) − Df(b)k‖ + ‖Df(b)‖ ‖k − Dg(a)h‖
    < ε‖k‖ + ‖Df(b)‖ ε‖h‖ < ε‖h‖( ‖Dg(a)‖ + ε + ‖Df(b)‖ ).
Thus, whenever 0 < ‖h‖ < δ, we have
  ‖f(g(a + h)) − f(g(a)) − Df(g(a))Dg(a)h‖ / ‖h‖ < ε( ‖Dg(a)‖ + ε + ‖Df(b)‖ ).
Since ε > 0 is arbitrary, this shows that
  lim_{h→0} ‖f(g(a + h)) − f(g(a)) − Df(g(a))Dg(a)h‖ / ‖h‖ = 0,
as required.  □

Remark. Those who wish to end with a perfect ε may replace the ε in (∗) with ε/(2(‖Df(b)‖ + 1)) and that in (∗∗) with ε/(2(‖Dg(a)‖ + ε)).
 
Example 1. Suppose the temperature in space is given by f(x, y, z) = xyz² + e^{3xy−2z} and the position of a bumblebee is given as a function of time t by g : R → R³. If at time t = 0 the bumblebee is at a = (1, 2, 3) and her velocity vector is v = (−1, 1, 2), as indicated in Figure 3.1, then we might ask at what rate she perceives the temperature to be changing at that instant. The temperature she measures at time t is (f ∘ g)(t), and so she wants to calculate (f ∘ g)′(0) = D(f ∘ g)(0).

Figure 3.1

We have
  Df(x, y, z) = [ yz² + 3y e^{3xy−2z}   xz² + 3x e^{3xy−2z}   2xyz − 2e^{3xy−2z} ],
so Df(a) = [ 24  12  10 ]. Then
  (f ∘ g)′(0) = Df(g(0)) g′(0) = Df(a)v = [ 24  12  10 ] (−1, 1, 2) = 8.
Note that in order to apply the chain rule, we need to know only her position and velocity vector
at that instant, not even what her path near a might be. ▽
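Readers who want to confirm this rate numerically can difference f ∘ g directly. The path below is a hypothetical straight line, chosen only because it has the right position and velocity at t = 0; Python with NumPy is assumed, and the sketch is not part of the text.

    import numpy as np

    def f(p):
        x, y, z = p
        return x * y * z**2 + np.exp(3 * x * y - 2 * z)

    a = np.array([1.0, 2.0, 3.0])
    v = np.array([-1.0, 1.0, 2.0])
    g = lambda t: a + t * v              # any path with g(0) = a and g'(0) = v will do

    h = 1e-6
    print((f(g(h)) - f(g(-h))) / (2 * h))   # central difference for (f o g)'(0); prints ~ 8.0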

Remark. Suppose f : Rn → Rm is differentiable at a and we wish to evaluate Dv f (a) for some


v ∈ Rn . Define g : R → Rn by g(t) = a + tv, and consider ϕ(t) = (f ◦ g)(t). By definition, we have
Dv f (a) = ϕ′ (0). Then, by the chain rule, we have
Dv f (a) = ϕ′ (0) = (f ◦ g)′ (0) = Df (a)g′ (0) = Df (a)v.
This is an alternative derivation of the result of Proposition 2.3. (Cf. Example 4 of Section 1.)
Indeed, if g is any differentiable function with g(0) = a and g′ (0) = v, we see that Dv f (a) =
(f ◦ g)′ (0), so this shows, as we suggested in the remark on p. 79, that we should think of the
directional derivative as the rate of change perceived by an observer at a moving with instantaneous
velocity v.

Example 2. Let
  f(x, y) = (x² − y², 2xy)   and   g(u, v) = (u cos v, u sin v).
Since
  Df(x, y) = [ 2x  −2y ; 2y  2x ]   and   Dg(u, v) = [ cos v  −u sin v ; sin v  u cos v ],
we have
  D(f ∘ g)(u, v) = Df(g(u, v)) Dg(u, v)
    = [ 2u cos v  −2u sin v ; 2u sin v  2u cos v ] [ cos v  −u sin v ; sin v  u cos v ]
    = 2 [ u cos 2v  −u² sin 2v ; u sin 2v  u² cos 2v ].
On the other hand, as the reader can verify, (f ∘ g)(u, v) = (u² cos 2v, u² sin 2v), and so we can double-check
the calculation of the derivative directly. ▽
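As a supplement (again assuming NumPy; the sample point (u, v) = (1.3, 0.7) is an arbitrary choice), one can also compare the matrix product above with a finite-difference Jacobian of f ∘ g:

    import numpy as np

    f = lambda p: np.array([p[0]**2 - p[1]**2, 2 * p[0] * p[1]])
    g = lambda q: np.array([q[0] * np.cos(q[1]), q[0] * np.sin(q[1])])

    def jacobian(F, q, h=1e-6):
        # approximate DF(q) one column at a time with central differences
        cols = [(F(q + h * e) - F(q - h * e)) / (2 * h) for e in np.eye(len(q))]
        return np.column_stack(cols)

    u, v = 1.3, 0.7
    numeric = jacobian(lambda q: f(g(q)), np.array([u, v]))
    formula = 2 * np.array([[u * np.cos(2 * v), -u**2 * np.sin(2 * v)],
                            [u * np.sin(2 * v),  u**2 * np.cos(2 * v)]])
    print(np.max(np.abs(numeric - formula)))    # ~ 1e-9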

EXERCISES 3.3

*1. Suppose f : R3 → R is differentiable and


   
cos t + sin t 1
   
g(t) =  t+1  and Df  1 = 2 1 −1 .
2
t + 4t − 1 −1

Find (f ◦ g)′ (0).

*2. Suppose
   
  2y − sin x x  
x 3x + y − z
f =  ex+3y  and g y  = .
y x + yz + 1
xy + y 3 z

Calculate D(f ◦ g)(0) and D(g◦ f )(0).


   
cos t x
3. Suppose g(t) =  sin t  and f y  = x2 + y 2 + z 2 + 2x. Use the chain rule to calculate
2 sin(t/2) z
(f ◦ g)′ (t). What do you conclude?
 
3 cos t
4. An ant moves along a helical path with trajectory g(t) =  3 sin t .
5t
a. At what rate is his distance from the origin changing at t = 2π? 
x
b. The temperature in space is given by the function f : R3 → R, f y  = xy + z 2 . At what
z
rate does the ant detect the temperature to be changing at t = 3π/4?

*5. An airplane is flying near a radar tower. At the instant it is exactly 3 miles due west of the
tower, it is 4 miles high and flying with a ground speed of 450 mph and climbing at a rate of 5
mph. If at that instant it is flying
a. due east,
b. northeast,
at what rate is it approaching the radar tower at that instant?

*6. An ideal gas obeys the law pV = nRT , where p is pressure, V is volume, n is the number of
moles, R is the universal gas constant, and T is temperature. Suppose for a certain quantity
of ideal gas, nR = 1 ℓ-atm/◦ K. At a given instant, the volume is 10 ℓ and is increasing at the
rate of 1 ℓ/min; the temperature is 300◦ K and is increasing at the rate of 5◦ K/min. At what
rate is the pressure increasing at that instant?

7. Ohm’s law tells us that V = IR, where V is the voltage in an electric circuit, I is the current flow
(in amps), and R is the resistance (in ohms). Suppose that as time passes, the voltage decreases
as the battery wears out and the resistance increases as the resistor heats up. Assuming V and
R vary as differentiable functions of t, at what rate is the current flow changing at the instant
t0 if R(t0 ) = 100 ohm, R′ (t0 ) = 0.5 ohm/sec, I(t0 ) = 0.1 amp, and V ′ (t0 ) = −0.1 volt/sec?

8. Let U ⊂ Rⁿ be open. Suppose g : U → R is differentiable at a ∈ U and g(a) ≠ 0. Prove that 1/g is differentiable at a and D(1/g)(a) = −(1/g(a)²) Dg(a).

9. Prove (3) in Proposition 3.1. (One approach is to mimic the proof given of (2). Another is to
apply (1) and (2) appropriately.)
♯ 10. Suppose U ⊂ Rⁿ is open and a ∈ U. Let f, g : U → R³ be differentiable at a. Prove that f × g is differentiable at a and D(f × g)(a)v = (Df(a)v) × g(a) + f(a) × (Dg(a)v) for any v ∈ Rⁿ.
(Hint: Follow the proof of part (2) of Proposition 3.1, and use Exercise 1.5.14.)

11. (Euler’s Theorem on homogeneous functions) We say f : Rn − {0} → R is homogeneous of


degree k if f (tx) = tk f (x) for all t > 0. Prove that a differentiable function f is homogeneous
of degree k if and only if Df (x)x = kf (x) for all nonzero x ∈ Rn . (Hint: Fix x and consider
h(t) = t−k f (tx).)
♯ 12.Suppose U ⊂ Rn is open and convex (i.e., given any points a, b ∈ U , the line segment joining
them lies in U as well). If f : U → Rm is differentiable and Df (x) = O for all x ∈ U , prove
that f is constant. Can you prove this when U is open and connected (i.e., any pair of points
can be joined by a piecewise-C1 path)?
 
13. Suppose f : R → R is differentiable and let h(x, y) = f(√(x² + y²)) for (x, y) ≠ (0, 0). Letting r = √(x² + y²), show that
  x ∂h/∂x + y ∂h/∂y = r f′(r).

*14. Suppose h : R → R is continuous and u, v : (a, b) → R are differentiable. Prove that the function
F : (a, b) → R given by
Z v(t)
F (t) = h(s)ds
u(t)
is differentiable and calculate F ′.
(Hint: Recall that the Fundamental Theorem of Calculus
Rx
tells you how to differentiate functions such as H(x) = a h(s)ds.)
   
2 u u+v
15. If f : R → R is differentiable and F =f , show that
v u−v
∂F ∂F  ∂f 2  ∂f 2
= − ,
∂u ∂v ∂x ∂y
 
u+v
where the functions on the right-hand side are evaluated at .
u−v
   
r r cos θ
*16. Suppose f : R2 → R is differentiable, let F =f . Calculate
θ r sin θ
 2  2
∂F 1 ∂F
+
∂r r2 ∂θ
in terms of the partial derivatives of f .
∂f ∂f
17. Suppose f : R2 → R is differentiable and =c for some nonzero constant c. Prove that
  ∂t ∂x   
x u x
f = h(x + ct) for some function h. (Hint: Let = .)
t v x + ct

4. The Gradient

To develop physical intuition, it is important to recast Proposition 2.3 in more geometric terms
when f is a scalar-valued function.

Definition. Let f : Rn → R be differentiable at a. We define the gradient of f at a to be the


vector
  ∇f(a) = Df(a)ᵀ = ( ∂f/∂x1(a), ∂f/∂x2(a), …, ∂f/∂xn(a) )ᵀ,
the column vector of partial derivatives of f at a.

Now we can interpret the directional derivative of a differentiable function as a dot product:
(∗) Dv f (a) = Df (a)v = ∇f (a) · v.

If we consider the directional derivative in the direction of various unit vectors v, we infer from the
Cauchy-Schwarz inequality, Proposition 2.3 of Chapter 1, that

Dv f (a) ≤ k∇f (a)k,


with equality holding if and only if ∇f (a) is a positive scalar multiple of v.

As a consequence, we have

Proposition 4.1. Suppose f is differentiable at a. Then ∇f (a) points in the direction in


which f increases at the greatest rate, and ‖∇f(a)‖ is that greatest possible rate of change; i.e.,
  ‖∇f(a)‖ = max_{‖v‖=1} Dv f(a).

Example 1. Let f : Rn → R be defined by f (x) = kxk. It is simple enough to calculate partial


derivatives of f, but we'd rather use the geometric meaning of the gradient to figure out ∇f(a) for any a ≠ 0. Clearly, if we are at a, the direction in which distance from the origin increases most rapidly is in the direction of a itself (i.e., to move away from the origin as fast as possible, we should move radially outwards). Moreover, we saw in Example 3 of Section 1 that the directional derivative Dv f(a) = 1 when v = a/‖a‖. Therefore, we infer from Proposition 4.1 that ∇f(a) is a vector pointing radially outwards and having length 1. That is,
  ∇f(a) = a/‖a‖.
As corroboration, we observe that if we move orthogonal to a, then instantaneously our distance
from the origin is not changing, so Dv f (a) = ∇f (a) · v = 0 when v · a = 0, as it should. ▽
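A quick numerical sketch (NumPy assumed; the nonzero point a below is an arbitrary choice) confirming that the gradient of the norm is the unit radial vector:

    import numpy as np

    f = lambda x: np.linalg.norm(x)
    a = np.array([1.0, 2.0, 2.0])

    h = 1e-6
    grad = np.array([(f(a + h * e) - f(a - h * e)) / (2 * h) for e in np.eye(3)])
    print(grad)                          # ~ (1/3, 2/3, 2/3)
    print(a / np.linalg.norm(a))         # exactly (1/3, 2/3, 2/3)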

An equally important interpretation, which will emerge in a significant rôle in Section 5 of


Chapter 4 and then in Chapter 5, is this: Suppose f : R2 → R is a C1 function, c is a constant, and
C = {x ∈ R2 : f (x) = c} is a level curve of f . We shall prove later that for any a ∈ C, provided
∇f(a) ≠ 0, C has a tangent line at a and ∇f(a) is orthogonal to that tangent line. Intuitively, this is quite plausible: if v is tangent to C at a, then since f does not change as we move along C, it therefore does not change instantaneously as we move in the direction of v, and so Dv f(a) = 0. Therefore, by (∗), ∇f(a) is orthogonal to v. (See also Exercise 6.) More generally, if f : Rⁿ → R is differentiable, and ∇f(a) ≠ 0, then ∇f(a) is orthogonal to the level set {x ∈ Rⁿ : f(x) = c} of f passing through a.
 
Example 2. Consider the surface M defined by f(x, y, z) = e^{x+2y} cos z − xz + y = 2. Note that the point a = (−2, 1, 0) lies on M. We want to find the equation of the tangent plane to M at a. We know that ∇f(a) gives the normal to the plane, so we calculate:
  ∇f(x, y, z) = ( −z + e^{x+2y} cos z,  1 + 2e^{x+2y} cos z,  −x − e^{x+2y} sin z ),
and therefore ∇f(−2, 1, 0) = (1, 3, 2).
Thus, the equation of the tangent plane of M at a is 1(x + 2) + 3(y − 1) + 2(z − 0) = 0 or
x + 3y + 2z = 1. ▽
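Here is a rough numerical check of this example (Python with NumPy, not part of the text; the step size is arbitrary): estimate ∇f(a) by central differences, confirm that a lies on the level surface f = 2, and confirm the constant term of the tangent plane.

    import numpy as np

    f = lambda p: np.exp(p[0] + 2 * p[1]) * np.cos(p[2]) - p[0] * p[2] + p[1]
    a = np.array([-2.0, 1.0, 0.0])

    h = 1e-6
    grad = np.array([(f(a + h * e) - f(a - h * e)) / (2 * h) for e in np.eye(3)])
    print(grad)        # ~ (1, 3, 2), the normal vector used above
    print(f(a))        # 2.0, so a does lie on M
    print(grad @ a)    # ~ 1, the right-hand side of x + 3y + 2z = 1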

Remark. Be sure to distinguish between a level surface of f and the graph of f (which, in this
case, would reside in R4 ).

Example 3. As a beautiful application of this principle, we use the results of Example 1 to


derive a fundamental physical property of the ellipse. Given two points F1 and F2 in the plane, an

ellipse (as pictured in Figure 4.1) is the locus of points P so that
  ‖P − F1‖ + ‖P − F2‖ = 2a
for some constant a.

Figure 4.1

Write fi(x) = ‖x − Fi‖ for the distance from Fi to x, i = 1, 2, and set f(x) = f1(x) + f2(x). Then, by the results of Example 1, we have
  ∇f(P) = ∇f1(P) + ∇f2(P) = (P − F1)/‖P − F1‖ + (P − F2)/‖P − F2‖ = v1 + v2.
Both v1 and v2 are unit vectors pointing radially away from F1 and F2, respectively, and therefore ∇f(P) bisects the angle between them (see Exercise 1.2.20). Thus, α = β (the angles marked in Figure 4.1), and so the tangent line to the ellipse at P makes equal angles with the lines F1P and F2P. Thus, a light ray emanating from one focus reflects off the ellipse back to the other focus. ▽

EXERCISES 3.4

1. Give the equation of the


 tangent
 line of the given level curve at the prescribed point a:
1
*a. x3 + y 3 = 9, a =
2
 
2 xy 0
b. 3xy + e − sin(πy) = 1, a =
1
 
1
c. x3 + xy 2 − y 4 = 1, a =
−1
2. Give the equation of the tangent
  plane of the given level surface at the prescribed point a:
1
a. x2 + y 2 + z 2 = 5, a =  0 
2
 
0
*b. yz + 2e z = 4, a = 2 
2 xy 5 
1
 
−1
c. x3 + xz 2 + y 2 z + y 3 = 0, a =  1 
0
 
−1
d. e2x+z cos(3y) − xy + z = 3, a =  0 
2
3. Given the topographical map in Figure 4.2, sketch on the map an approximate route of steepest
ascent from P to Q, the top of the mountain. What about from R?

Figure 4.2

*4. Suppose a hillside is given by z = f(x), x ∈ U ⊂ R², and suppose f(a) = c and Df(a) = [ 3  −4 ].
  a. Find a vector tangent to the curve of steepest ascent on the hill at (a, c).
  b. Find the angle that a stream makes with the horizontal at (a, c) if it flows in the e2 direction at that point.

5. As shown in Figure 4.3, at a certain moment, a ladybug is at position x0 and moving with
velocity vector v. At that moment, the angle ∠ax0 b = π/2, her velocity bisects that angle,

Figure 4.3

and her speed is 5 units/sec. At what rate is the sum of her distances from a and b decreasing
at that moment? Give your reasoning clearly.

6. Suppose that, in a neighborhood of the point a, the level curve C = {x ∈ R2 : f (x) = c} can
be parametrized by a differentiable function g : (−ε, ε) → R2 , with g(0) = a. Use the chain
rule to prove that ∇f (a) is orthogonal to the tangent vector to C at a.

7. Check that the definition of an ellipse given in Example 3 gives the usual Cartesian equation
of the form
x2 y 2
+ 2 =1
  a2 b
±c
when the foci are at . (Hint: You should find that a2 = b2 + c2 .)
0
8. By analogy with Example 3, prove that light emanating from the focus of a parabola reflects
off the parabola in the direction of the axis of the parabola. This is why automobile headlights
use parabolic reflectors. (A convenient definition of a parabola is this: it is the locus of points equidistant from a point (the focus) and a line (the directrix), as pictured in Figure 4.4.)

Figure 4.4

9. Using Figure 4.5 as a guide, complete Dandelin's proof (dating from 1822) that the appropriate conic section is an ellipse. Find spheres that are inscribed in the cone and tangent to the plane of the ellipse. Letting F1 and F2 be the points of tangency and P a point of the ellipse, let Q1 and Q2 be points where the generator of the cone through P intersects the respective spheres. Show that ‖QiP‖ = ‖FiP‖, i = 1, 2, and deduce that ‖F1P‖ + ‖F2P‖ = const. (What happens when we tilt the plane to obtain a parabola or hyperbola?)

Figure 4.5

10. Suppose f : R2 → R is a differentiable function whose gradient is nowhere 0 and that satisfies

∂f ∂f
=2
∂x ∂y

everywhere.
a. Find (with proof) the level curves of f .  
x
b. Show that there is a differentiable function F : R → R so that f = F (2x + y).
y
11. Suppose f : R2 − {0} → R is a differentiable function whose gradient is nowhere 0 and that
satisfies
∂f ∂f
−y +x =0
∂x ∂y

everywhere.
a. Find (with proof) the level curves of f .
b. Show that there is a differentiable function F defined on the set of positive real numbers
so that f (x) = F (kxk).

*12. Find all constants c for which the surfaces

x2 + y 2 + z 2 = 1 and z = x2 + y 2 + c

a. intersect tangentially at each point


b. intersect orthogonally at each point

13. Prove the so-called pedal property of the ellipse: If n is the unit normal to the ellipse at P , then
−−→ −−→
(F1 P · n)(F2 P · n) = constant.

14. The height of land in the vicinity of a hill is given in terms of horizontal coordinates x and y by h(x, y) = 40/(4 + x² + 3y²). A stream passes through the point (1, 1, 5) and follows a path of "steepest descent." Find the equation of the path of the stream on a map of the region.

15. A drop of water falls onto a football and rolls down following the path of steepest descent; that is, it moves in the direction tangent to the football most nearly vertically downward. Find the path the water drop follows if the surface of the football is ellipsoidal and given by the equation
  4x² + y² + 4z² = 9
and the drop starts at the point (1, 1, 1).

5. Curves

In this section, we return to the study of (parametrized) curves with which we began Chapter
2. Now we bring in the appropriate differential calculus to discuss velocity, acceleration, some basic
principles from physics, and the notion of curvature.
If g : (a, b) → Rn is a twice-differentiable vector-valued function, we can visualize g(t) as denot-
ing the position of a particle at time t, and hence the image of g represents its trajectory as time
passes. Then g′ (t) is the velocity vector of the particle at time t and g′′ (t) is its acceleration vector
at time t. The length of the velocity vector, kg′ (t)k, is called the speed of the particle. In physics,
a particle of mass m is said to have kinetic energy

K.E. = ½ m (speed)²,

and acceleration looms large because of Newton’s second law of motion, which says that a force
acting on an object imparts an acceleration according to the equation
force = (mass)(acceleration)   or, in other words,   F = ma.

As a quick application of some vector calculus, let’s discuss a few properties of motion in a central
force field. We call a force field F : U → R3 on an open subset U ⊂ R3 central if F(x) = ψ(x)x
for some continuous function ψ : U → R; that is, F is everywhere a scalar multiple of the position
vector.
Newton discovered that the gravitational field of a point mass M is an inverse square force
directed toward the point mass. If we assume the point mass is at the origin, then the force exerted
on a unit test mass at position x is
  F(x) = −(GM/‖x‖²) (x/‖x‖) = −(GM/‖x‖³) x,
where G is the universal gravitational constant. Newton published his laws of motion in 1687 in his
Philosophiae Naturalis Principia Mathematica. Interestingly, Kepler had published his empirical observations decades earlier, in 1609 and 1619:1
Kepler’s first law: Planets move in ellipses with the sun at one focus.
Kepler’s second law: The position vector from the sun to the planet sweeps out area at a
constant rate.
Kepler’s third law: The square of the period of a planet is proportional to the cube of the
semimajor axis of its elliptical orbit.
For the first and third laws we refer the reader to Exercise 15, but here we prove a generalization
of the second.

Proposition 5.1. Let F be a central force field on R3 . Then the trajectory of any particle lies
in a plane; assuming the trajectory is not a line, the position vector sweeps out area at a constant
rate.

1
Somewhat earlier he had surmised that the positions of the six known planets were linked to the famous five
regular polyhedra.

Proof. Let the trajectory of the particle be given by g(t), and let its mass be m. Consider the vector function A(t) = g(t) × g′(t). By Exercise 3.3.10 and by Newton's second law of motion, we have
  A′(t) = g′(t) × g′(t) + g(t) × g″(t) = g(t) × (1/m)ψ(g(t))g(t) = 0,
since the cross product of any vector with a scalar multiple of itself is 0. Thus, A(t) = A0 is a constant. If A0 = 0, the particle moves on a line (why?). If A0 ≠ 0, then note that g lies on the plane
  A0 · x = 0,
as A0 · g(t) = A(t) · g(t) = 0 for all t.

Figure 5.1

Assume now the trajectory is not linear. Let S(t) denote the area swept out by the position vector g(t) from time t0 to time t. Since S(t + h) − S(t) equals the area subtended by the position vectors g(t) and g(t + h) (see Figure 5.1), for h small, this is approximately the area of the triangle determined by the pair of vectors, or, equivalently, by the vectors g(t) and g(t + h) − g(t). By Proposition 5.1 of Chapter 1, this area is ½‖g(t) × (g(t + h) − g(t))‖, so that
  S′(t) = lim_{h→0⁺} ( S(t + h) − S(t) ) / h
        = lim_{h→0⁺} ½ ‖g(t) × (g(t + h) − g(t))‖ / h
        = lim_{h→0⁺} ½ ‖ g(t) × (g(t + h) − g(t))/h ‖
        = ½‖g(t) × g′(t)‖ = ½‖A0‖.

That is, the position vector sweeps out area at a constant rate. 
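Readers with a computer handy can watch this conservation law numerically. The sketch below (my own choices of field, initial data, step size, and integrator; Python with NumPy assumed) integrates motion in the inverse-square field F(x) = −x/‖x‖³ and prints g(t) × g′(t) from time to time; it stays essentially constant.

    import numpy as np

    def accel(x):
        return -x / np.linalg.norm(x)**3          # GM = 1 and unit mass, an arbitrary normalization

    x = np.array([1.0, 0.0, 0.0])                 # initial position
    v = np.array([0.0, 0.9, 0.3])                 # initial velocity
    dt = 2e-4

    for step in range(60_000):
        # a simple "kick-drift-kick" update; crude, but good enough to see the conservation law
        v += 0.5 * dt * accel(x)
        x += dt * v
        v += 0.5 * dt * accel(x)
        if step % 15_000 == 0:
            print(np.cross(x, v))                 # essentially (0, -0.3, 0.9) every time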

One of the most useful (yet intuitively quite apparent) results about curves is the following.

Proposition 5.2. Suppose g : (a, b) → Rn is a differentiable parametrized curve with the


property that g has constant length (i.e., the curve lies on a sphere centered at the origin). Then
g(t) · g′ (t) = 0 for all t; i.e., the velocity vector is everywhere orthogonal to the position vector.

Proof. By (3) of Proposition 3.1, we differentiate the equation

g(t) · g(t) = const



to obtain

g′ (t) · g(t) + g(t) · g′ (t) = 2g(t) · g′ (t) = 0,

as required. 

Physically, one should think of it this way: if the velocity vector had a nonzero projection on the
position vector, that would mean that the particle’s distance from the center of the sphere would
be changing. Analogously, as we ask the reader to show in Exercise 2, if a particle moves with
constant speed, then its acceleration must be orthogonal to its velocity.
Now we leave physics behind for a while and move on to discuss some geometry. We begin with
a generalization of the triangle inequality, Corollary 2.4 of Chapter 1.

Lemma 5.3. Suppose g : [a, b] → Rn is continuous (except perhaps at finitely many points).
Then, defining the integral of g component by component, i.e.,
  ∫_a^b g(t) dt = ( ∫_a^b g1(t) dt, …, ∫_a^b gn(t) dt ),
we have
  ‖ ∫_a^b g(t) dt ‖ ≤ ∫_a^b ‖g(t)‖ dt.

Proof. Let v = ∫_a^b g(t) dt. If v = 0, there is nothing to prove. By the Cauchy–Schwarz inequality, Proposition 2.3 of Chapter 1, |v · g(t)| ≤ ‖v‖ ‖g(t)‖, so
  ‖v‖² = v · ∫_a^b g(t) dt = ∫_a^b v · g(t) dt ≤ ∫_a^b ‖v‖ ‖g(t)‖ dt = ‖v‖ ∫_a^b ‖g(t)‖ dt.
Assuming v ≠ 0, we now infer that ‖v‖ ≤ ∫_a^b ‖g(t)‖ dt, as required.  □
a

Definition . Let g : [a, b] → Rn be a (continuous) parametrized curve. Given a partition


P = {a = t0 < t1 < · · · < tk = b} of the interval [a, b], let
k
X
ℓ(g, P) = kg(ti ) − g(ti−1 )k.
i=1

That is, ℓ(g, P) is the length of the inscribed polygon with vertices at g(ti ), i = 0, . . . , k, as indicated
in Figure 5.2. We define the arclength of g to be

ℓ(g) = sup{ℓ(g, P) : P partition of [a, b]},

provided the set of polygonal lengths is bounded above.

The following result is not in the least surprising: The distance a particle travels is the integral
of its speed.

Figure 5.2. Given this partition P of [a, b], the length of this polygonal path is ℓ(g, P).

Proposition 5.4. Let g : [a, b] → Rn be a piecewise-C1 parametrized curve. Then


Z b
ℓ(g) = kg′ (t)kdt.
a

Proof. For any partition P of [a, b], we have, by Lemma 5.3,



k Z ti
k Z ti Z b
k
X X X

ℓ(g, P) = kg(ti ) − g(ti−1 )k = g (t)dt ≤ kg′ (t)kdt = kg′ (t)kdt,
ti−1 ti−1 a
i=1 i=1 i=1
Z b
so ℓ(g) ≤ kg′ (t)kdt. The same holds on any interval.
a
Now, for a ≤ t ≤ b, define s(t) to be the arclength of the curve g on the interval [a, t]. Then
for h > 0 we have
Z
kg(t + h) − g(t)k s(t + h) − s(t) 1 t+h ′
≤ ≤ kg (u)kdu,
h h h t
since s(t + h) − s(t) is the arclength of the curve g on the interval [t, t + h]. Now
Z
kg(t + h) − g(t)k ′ 1 t+h ′
lim = kg (t)k = lim kg (u)kdu.
h→0+ h h→0+ h t

Therefore, by the squeeze principle (see Exercise 2.3.3),


s(t + h) − s(t)
lim = kg′ (t)k.
h→0+ h
A similar argument works for h < 0, and we conclude that s′ (t) = kg′ (t)k. Therefore,
Z t
s(t) = kg′ (u)kdu, a ≤ t ≤ b,
a
Z b
and, in particular, s(b) = ℓ(g) = kg′ (t)kdt, as desired. 
a

Example 1. Consider the helix
  g(t) = (a cos t, a sin t, bt),   t ∈ R,
as pictured in Figure 5.3.

Figure 5.3

Note that it twists around the cylinder of radius a, heading "uphill" at a constant pitch. If we take one "coil" of the helix, letting t run from 0 to 2π, then the arclength of that portion is
  ℓ(g) = ∫_0^{2π} ‖g′(t)‖ dt = ∫_0^{2π} ‖(−a sin t, a cos t, b)‖ dt = ∫_0^{2π} √(a² + b²) dt = 2π√(a² + b²). ▽
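Since the arclength was defined as a supremum of inscribed polygon lengths, it is easy to test this formula numerically. The Python sketch below (NumPy assumed; a = 2 and b = 0.5 are arbitrary choices) sums the lengths ‖g(t_i) − g(t_{i−1})‖ over a fine partition of [0, 2π]:

    import numpy as np

    a, b = 2.0, 0.5
    t = np.linspace(0.0, 2 * np.pi, 10_001)       # a fine partition of [0, 2π]
    pts = np.column_stack([a * np.cos(t), a * np.sin(t), b * t])

    polygon_length = np.sum(np.linalg.norm(np.diff(pts, axis=0), axis=1))
    print(polygon_length)                         # ~ 12.953
    print(2 * np.pi * np.hypot(a, b))             # 2π sqrt(a² + b²) ~ 12.953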

We say the parametrized curve is arclength-parametrized if kg′ (t)k = 1 for all t, so s(t) = t+c for
some constant c. Typically, when the curve is arclength-parametrized, we use s as the parameter.

Examples 2. The following curves are arclength-parametrized:
(a) Consider the following parametrization of a circle of radius a:
  g(s) = (a cos(s/a), a sin(s/a)),   0 ≤ s ≤ 2πa.
Then note that
  g′(s) = (−sin(s/a), cos(s/a)),   and   ‖g′(s)‖ = 1.
(b) Consider the curve
  g(s) = ( (1/3)(1 + s)^{3/2}, (1/3)(1 − s)^{3/2}, s/√2 ),   −1 < s < 1.
Then
  g′(s) = ( (1/2)√(1 + s), −(1/2)√(1 − s), 1/√2 ),   and   ‖g′(s)‖ = 1. ▽

If g is arclength-parametrized, then the velocity vector g′ (s) is the unit tangent vector at each
point, which we denote by T(s). Let’s assume now that g is twice differentiable. Since kT(s)k = 1
for all s, it follows from Proposition 5.2 that T(s) · T′ (s) = 0. Define the curvature of the curve to
be κ(s) = ‖T′(s)‖; assuming T′(s) ≠ 0, define the principal normal vector N(s) = T′(s)/‖T′(s)‖.
(See Figure 5.4.)

T(s)

g(s)

N(s)

Figure 5.4

Example 3. If g is a line, then T is constant and κ = 0 (and conversely). If we start with a


circle of radius a, then from Example 2(a) we have
  T(s) = (−sin(s/a), cos(s/a)),
from which we compute that
  T′(s) = (1/a)(−cos(s/a), −sin(s/a)).
In particular, we see that N(s) is centripetal (pointing towards the center of the circle) and κ(s) = 1/a for all s. ▽

Remark. If the arclength-parametrized curve g : [0, L] → R3 is closed (meaning that g(0) =


g(L)), then it is interesting to consider its total curvature, ∫_0^L κ(s) ds. For a circle, or indeed, for any convex plane curve, this integral is 2π. Not surprisingly, at least that much total curvature is required for the curve to "close up." A famous theorem in differential geometry, called the Fáry–Milnor Theorem, states that total curvature at least 4π is required to make a knot, a closed curve that cannot be continuously deformed into a circle without crossing itself. A trefoil knot is pictured in Figure 5.5. See, e.g., do Carmo, Differential Geometry of Curves and Surfaces, §5.7.

Figure 5.5

Note that so long as g′ (t) never vanishes, the arclength s is a differentiable function of t with
everywhere positive derivative; thus, it has a differentiable inverse function, which we write t(s).
We can “reparametrize by arclength” by considering the composition h(s) = g(t(s)), and then, of
course, g(t) = h(s(t)). Writing2 υ(t) = s′ (t) = kg′ (t)k for the speed, we have by the chain rule

g′ (t) = h′ (s(t))s′ (t) = υ(t)T(s(t)) and


(†) g′′ (t) = υ ′ (t)T(s(t)) + υ(t)2 T′ (s(t)) = υ ′ (t)T(s(t)) + κ(s(t))υ(t)2 N(s(t)).

Example 4. Consider the parametrized curve
  g(t) = (cos³t, sin³t),   0 < t < π/2.
Then we have
  g′(t) = 3 cos t sin t (−cos t, sin t),   so υ(t) = 3 cos t sin t and T(s(t)) = (−cos t, sin t).
Then, by the chain rule, we have
  (sin t, cos t) = (T ∘ s)′(t) = T′(s(t)) s′(t) = κ(s(t)) υ(t) N(s(t)),
from which we conclude that
  κ(s(t)) υ(t) = ‖(sin t, cos t)‖ = 1,   and   κ(s(t)) = 1/υ(t) = 1/(3 cos t sin t). ▽
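For a machine check of this curvature, here is a Python sketch (my own; NumPy assumed, and the step size and sample points are arbitrary). It uses the formula of Exercise 9, κ = ‖g′ × g″‖/υ³, in its planar form |x′y″ − y′x″|/υ³, approximating g′ and g″ with central differences.

    import numpy as np

    g = lambda t: np.array([np.cos(t)**3, np.sin(t)**3])

    def curvature(t, h=1e-5):
        gp  = (g(t + h) - g(t - h)) / (2 * h)             # approximate g'(t)
        gpp = (g(t + h) - 2 * g(t) + g(t - h)) / h**2     # approximate g''(t)
        speed = np.linalg.norm(gp)
        return abs(gp[0] * gpp[1] - gp[1] * gpp[0]) / speed**3

    for t in (0.4, np.pi / 4, 1.1):
        print(curvature(t), 1 / (3 * np.cos(t) * np.sin(t)))   # the two columns agree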

EXERCISES 3.5

1. Suppose g : (a, b) → Rn is a differentiable parametrized curve with the property that at each t,
the position and velocity vectors are orthogonal. Prove that g lies on a sphere centered at the
origin.

2. Suppose g : (a, b) → Rn is a twice-differentiable parametrized curve. Prove that g has constant


speed if and only if the velocity and acceleration vectors are orthogonal at every t.
2
For those who might not know, υ is the Greek letter upsilon, not to be confused with ν, the Greek letter nu.

3. Suppose f , g : (a, b) → Rn are differentiable and f · g = const. Prove that f ′ · g = −g′ · f .


Interpret the result geometrically in the event that f and g are always unit vectors.

4. Suppose a particle moves in a central force field in R3 with constant speed. What can you say
about its trajectory? (Proof?)

5. Suppose g : (a, b) → Rn is nowhere zero and g′ (t) = λ(t)g(t) for some scalar function λ. Prove
(rigorously) that g/kgk is constant. (Hint: Set h = g/kgk, write g = kgkh, and differentiate.)

6. Suppose g : (a, b) → Rn is a differentiable parametrized curve and that for some point p ∈ Rn
we have kg(t0 ) − pk ≤ kg(t) − pk for all t ∈ (a, b). Prove that g′ (t0 ) · (g(t0 ) − p) = 0. Give a
geometric explanation.

7. Find the arclength


 t of the following parametrized curves:
e cos t
*a. g(t) =  et sin t , a ≤ t ≤ b
et
 1 t −t

2 (e + e )
b. g(t) =  12 (et − e−t ) , −1 ≤ t ≤ 1
t
 
t
*c. g(t) =  3t2 , 0 ≤ t ≤ 1
6t3
 
a(t − sin t)
d. g(t) = , 0 ≤ t ≤ 2π
a(1 − cos t)
8. Calculate the unit tangent vector and curvature of the following curves:
 √1 √1

3
cos t + 2 sin t
*a. g(t) =  √1 cos t
3

1 1
√ cos t − √ sin t
 3  2
e−t
*b. g(t) =  √et 
2t
 
t
c. g(t) =  t2 
t3

9. Prove that for a parametrized curve g : (a, b) → R3 , we have κ = kg′ × g′′ k/υ 3 .

10. Using the formula (†) for acceleration, explain how engineers might decide at what angle to
bank a road that is a circle of radius 1/4 mile and around which cars wish to drive safely at 40
mph.

11. (Frenet Formulas) Let g : [0, L] → R3 be a three-times differentiable arclength-parametrized


curve with κ > 0, and let T and N be defined as above. Define the binormal B = T × N.
a. Show that kBk = 1. Assuming the result of Exercise 1.4.34e, show that every vector in R3
can be expressed as a linear combination of T(s), N(s), and B(s). (Hint: See Example 11
of Chapter 1, Section 4.)

b. Show that B′ · T = B′ · B = 0, and deduce that B′ (s) is a scalar multiple of N(s) for every
s. (Hint: See Exercise 3.)
c. Define the torsion τ of the curve by B′ = −τ N. Show that g is a planar curve if and only
if τ (s) = 0 for all s.
d. Show that N′ = −κT + τ B.
The equations
T′ = κN, N′ = −κT + τ B, B′ = −τ N
are called the Frenet formulas for the arclength-parametrized curve g.

*12. (See Exercise 11 for the definition of torsion.) Calculate the curvature and torsion of the helix
presented in Example 1. Explain the meaning of the sign of the torsion.

13. (See Exercise 11 for the definition of torsion.) Calculate the curvature and torsion of the curve
 
et cos t
 
g(t) =  et sin t  .
et

14. A pendulum is made, as pictured in Figure 5.6, by hanging from the cusp where two arches of a
cycloid meet a string of length equal to the length of one of the arches. As it swings, the string

Figure 5.6

wraps around the cycloid and extends tangentially to the bob at the end. Given the equation
" #
t + sin t
f (t) = , 0 ≤ t ≤ 2π,
1 − cos t

for the cycloid, find the parametric equation of the bob, P , of the pendulum.3

15. Assuming that the force field is inverse square, prove Kepler’s first and third laws, as follows.
Without loss of generality, we may assume that the planet has mass 1 and moves in the xy-
plane. (You will need to use polar coordinates, as introduced in Example 6 of Chapter 2,
Section 1.)
3
This phenomenon was originally discovered by the Dutch mathematician Huygens, in an effort to design a
pendulum whose period would not depend on the amplitude of its motion, hence one ideal for an accurate clock.

a. Suppose a, b > 0 and a2 = b2 + c2 . Show that the polar coordinates equation of the ellipse
(x − c)2 y 2 c b2
+ = 1 is r(1 − cos θ) = .
a2 b2 a a
This is an ellipse with semimajor axis a and semiminor axis b, with one focus at the origin.
(Hint: Expand the left-hand side in polar coordinates and express the result as a difference
of squares.)

eθ(t)
er(t)
g(t)
r(t)
θ(t)

Figure 5.7

b. Let r(t) and θ(t) be the polar coordinates of g(t), and let
   
cos θ(t) − sin θ(t)
er (t) = and eθ (t) = ,
sin θ(t) cos θ(t)
as pictured in Figure 5.7. Show that
g′ (t) = r ′ (t)er (t) + r(t)θ ′ (t)eθ (t)
 
g′′ (t) = r ′′ (t) − r(t)θ ′ (t)2 er (t) + 2r ′ (t)θ ′ (t) + r(t)θ ′′ (t) eθ (t).
c. Let A0 be as in the proof of Proposition 5.1. Show that g′′ (t) × A0 = GM θ ′ (t)eθ (t) =
GM e′r (t), and deduce that g′ (t) × A0 = GM (er (t) + c) for some constant vector c.
d. Dot the previous equation with g(t) and use the fact that g(t) × g′ (t) = A0 to deduce
that GM r(t)(1 − kck cos θ(t)) = kA0 k2 if we assume c is a negative scalar multiple of e1 .
Deduce that when kck ≥ 1 the path of the planet is unbounded and that when kck < 1
the orbit of the planet is an ellipse with one focus at the origin.
e. As we shall see in Chapter 7, the area of an ellipse with semimajor axis a and semiminor
4π 2 3
axis b is πab; show that the period T = 2πab/kA0 k. Now prove that T 2 = a .
GM
16. (pilfered from Which Way did the Bicycle Go . . . and other intriguing mathematical mysteries,
published by the M.A.A.)
“This track, as you perceive, was made by a rider who was going from the
direction of the school.”
“Or towards it?”
“No, no, my dear Watson . . . It was undoubtedly heading away from the
school.”
So spoke Sherlock Holmes.4 Imagine a 20-foot wide mud patch through which a bicycle has
just passed, with its front and rear tires leaving tracks as illustrated in Figure 5.8. (We have
4
The Return of Sherlock Holmes,“The Adventure of the Priory School”

Figure 5.8

taken the liberty of helping you in your capacity as sleuth by dashing the path of one of the
wheels.) In which direction was the bicyclist traveling? Explain your answer.

6. Higher-Order Partial Derivatives

Suppose U ⊂ Rn is open and f : U → Rm is a vector-valued function on U . Recall that we said


f is C¹ (on U) if the partial derivatives ∂f/∂xi exist and are continuous on U. Suppose this is the case. Then we can ask whether they in turn have partial derivatives, i.e., whether the functions
  ∂²f/∂xj∂xi := ∂/∂xj ( ∂f/∂xi )
are defined. These functions are, for obvious reasons, called second-order partial derivatives of f. We say f is C² (on U) if all its first- and second-order partial derivatives exist and are continuous (on U). More generally, we say f is Cᵏ (on U) if all its first-, second-, …, and kth-order partial derivatives
  ∂ᵏf / ∂x_{i_k} ∂x_{i_{k−1}} ··· ∂x_{i_2} ∂x_{i_1},   1 ≤ i1, i2, …, i_k ≤ n,
exist and are continuous (on U ). We say f is C∞ (or smooth) if all its partial derivatives of all
orders exist.
 
Example 1. Let f(x, y, z) = e^{xy} sin z + xy³z⁴. Then f is smooth and
  ∂f/∂x = y e^{xy} sin z + y³z⁴,
  ∂²f/∂z∂x = y e^{xy} cos z + 4y³z³,
  ∂³f/∂z²∂x = −y e^{xy} sin z + 12y³z², and
  ∂³f/∂y∂z∂x = e^{xy}(xy + 1) cos z + 12y²z³. ▽
It is a hassle to keep track of the order in which we calculate higher-order partial derivatives.
Luckily, the following result tells us that for smooth functions, the order in which we calculate the

partial derivatives does not matter. This is an intuitively obvious result, but the proof is quite
subtle.

Theorem 6.1. Let U ⊂ Rⁿ be open, and suppose f : U → Rᵐ is a C² function. Then for any i and j we have
  ∂²f/∂xi∂xj = ∂²f/∂xj∂xi.

Proof. It suffices to prove the result when m = 1. For ease of notation, we take n = 2, i = 1, and j = 2, and write the point in question as (a, b). Introduce the function
  Δ(h, k) = f(a + h, b + k) − f(a + h, b) − f(a, b + k) + f(a, b),
as indicated schematically in Figure 6.1.

Figure 6.1

Letting q(s) = f(s, b + k) − f(s, b) and applying the Mean Value Theorem, we have
  Δ(h, k) = q(a + h) − q(a) = h q′(ξ)   for some ξ between a and a + h
          = h( ∂f/∂x(ξ, b + k) − ∂f/∂x(ξ, b) )
          = hk ∂²f/∂y∂x(ξ, η)   for some η between b and b + k.
On the other hand, letting r(t) = f(a + h, t) − f(a, t), we have
  Δ(h, k) = r(b + k) − r(b) = k r′(τ)   for some τ between b and b + k
          = k( ∂f/∂y(a + h, τ) − ∂f/∂y(a, τ) )
          = hk ∂²f/∂x∂y(σ, τ)   for some σ between a and a + h.
Therefore, we have
  (1/hk) Δ(h, k) = ∂²f/∂y∂x(ξ, η) = ∂²f/∂x∂y(σ, τ).
Now ξ, σ → a and η, τ → b as h, k → 0, and since the functions ∂²f/∂x∂y and ∂²f/∂y∂x are continuous, we have
  ∂²f/∂x∂y(a, b) = ∂²f/∂y∂x(a, b),
as required.  □

To see why the C2 hypothesis is necessary, see Exercise 1.
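The following small Python experiment (my own; the sample points and step sizes are arbitrary) illustrates both the theorem and the role of the C² hypothesis. It evaluates the divided difference Δ(h, k)/(hk) from the proof for the smooth function of Example 1 (with z frozen at 1) and for the function of Exercise 1; taking h much smaller than k amounts to differentiating in x first and then in y, and taking k much smaller than h reverses the order.

    from math import exp, sin

    def smooth(x, y):                    # Example 1's function with z frozen at 1
        return exp(x * y) * sin(1.0) + x * y**3

    def spiky(x, y):                     # the function of Exercise 1 below
        return 0.0 if x == 0.0 and y == 0.0 else x * y * (x**2 - y**2) / (x**2 + y**2)

    def quotient(f, h, k):
        # the divided difference Δ(h, k)/(hk) from the proof, based at the origin
        return (f(h, k) - f(h, 0.0) - f(0.0, k) + f(0.0, 0.0)) / (h * k)

    for f in (smooth, spiky):
        # h << k: x-derivative first, then y;   k << h: the other order
        print(f.__name__, quotient(f, 1e-8, 1e-4), quotient(f, 1e-4, 1e-8))
    # smooth: both values are ~ sin 1 ~ 0.84;   spiky: ~ -1 and ~ +1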


Second-order derivatives appear in the study of the local behavior of functions near critical
points and, more importantly in differential equations and physics—as we’ve seen, Newton’s second
law of motion tells us that forces induce acceleration. At this juncture, we give a few examples of
higher-order partial derivatives and partial differential equations that arise in the further study of
mathematics and physics.

Example 2. (Harmonic functions) If f is a C2 function on (an open subset of) Rn , the expres-
sion
  ∇²f = ∂²f/∂x1² + ∂²f/∂x2² + ··· + ∂²f/∂xn²
is called the Laplacian of f . A solution of the equation ∇2 f = 0 is called a harmonic function. As
we shall see in Chapter 8, the Laplacian and harmonic functions play an important rôle in physical
applications. For example, the gravitational (resp., electrostatic) potential is a harmonic function
in mass-free (resp., charge-free) space. ▽
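Such claims are easy to spot-check by machine. The Python sketch below (standard library only; the sample point and step size are my choices) applies a five-point finite-difference Laplacian to f(x, y) = log(x² + y²), which Exercise 2 asks you to verify is harmonic away from the origin:

    from math import log

    def f(x, y):
        return log(x**2 + y**2)

    def laplacian(F, x, y, h=1e-4):
        # five-point approximation to ∂²F/∂x² + ∂²F/∂y²
        return (F(x + h, y) + F(x - h, y) + F(x, y + h) + F(x, y - h) - 4 * F(x, y)) / h**2

    print(laplacian(f, 0.8, -1.3))       # ~ 0 (roughly 1e-7 with this step size)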

Example 3. (Wave Equation) The equation
  (∗)  ∂²f/∂t² = c² ∂²f/∂x²
models the displacement of a one-dimensional vibrating string (with "wave velocity" c) from its equilibrium position. By a clever use of the chain rule, we can find an explicit formula for its general solution, assuming f is C². Let
  (x, t) = g(u, v) = ( (u + v)/2, (u − v)/(2c) )
(so that u = x + ct and v = x − ct), and set F(u, v) = f(g(u, v)). Then by the chain rule, we have
  DF(u, v) = Df(g(u, v)) Dg(u, v) = [ ∂f/∂x(g(u, v))  ∂f/∂t(g(u, v)) ] [ 1/2  1/2 ; 1/(2c)  −1/(2c) ],
so
  ∂F/∂v = (1/2) ∂f/∂x(g(u, v)) − (1/(2c)) ∂f/∂t(g(u, v)).
Now, differentiating with respect to u, we have to apply the chain rule to each of the functions (∂f/∂x) ∘ g and (∂f/∂t) ∘ g:
  ∂²F/∂u∂v = (1/2)[ (1/2) ∂²f/∂x² + (1/(2c)) ∂²f/∂t∂x ] − (1/(2c))[ (1/2) ∂²f/∂x∂t + (1/(2c)) ∂²f/∂t² ]
           = (1/4)[ ∂²f/∂x² + (1/c) ∂²f/∂t∂x − (1/c) ∂²f/∂x∂t − (1/c²) ∂²f/∂t² ]
           = (1/4)[ ∂²f/∂x² − (1/c²) ∂²f/∂t² ] = 0,
where all the partial derivatives of f are evaluated at g(u, v), the mixed partials cancel by Theorem 6.1, and the last equality holds because f satisfies (∗). Now what can we say about the general solution of the equation ∂²F/∂u∂v = 0? On any rectangle in the uv-plane, we can infer that
  F(u, v) = φ(u) + ψ(v)
for some differentiable functions φ and ψ. (For ∂/∂u( ∂F/∂v ) = 0 tells us that ∂F/∂v is independent of u, hence a function of v only, whose antiderivative we call ψ(v). But the constant of integration can be an arbitrary function of u. To examine this argument a bit more carefully, we recommend that the reader consider Exercise 11.)
In conclusion, on a suitable domain, the general solution of the wave equation (∗) can be written in the form
  f(x, t) = φ(x + ct) + ψ(x − ct)
for arbitrary C2 functions φ and ψ. The physical interpretation is this: the general solution is the
superposition of two traveling waves, one moving to the right along the string with speed c, the
other moving to the left with speed c. ▽
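A quick numerical spot-check of this conclusion (my own choices of φ, ψ, c, and sample point; Python standard library only): build f(x, t) = φ(x + ct) + ψ(x − ct) and compare ∂²f/∂t² with c² ∂²f/∂x² by second differences.

    from math import exp, sin

    c = 2.0
    phi = lambda u: exp(-u**2)
    psi = lambda v: sin(3 * v)
    f = lambda x, t: phi(x + c * t) + psi(x - c * t)

    def second(F, s, h=1e-4):
        return (F(s + h) - 2 * F(s) + F(s - h)) / h**2

    x0, t0 = 0.4, 0.25
    f_tt = second(lambda t: f(x0, t), t0)
    f_xx = second(lambda x: f(x, t0), x0)
    print(f_tt, c**2 * f_xx)             # the two numbers agree to several digits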

Example 4. (Minimal Surfaces) When you dip a piece of wire shaped in the form of a closed
curve C into soap film, the resulting surface you see is called a minimal surface, so called because
in principle surface tension dictates that the surface should have least area among all those surfaces
 
having that curve C as boundary. If the minimal surface is in the form of a graph z = f(x, y), then it is shown in a differential geometry course that f must be a solution of the minimal surface equation
  ( 1 + (∂f/∂y)² ) ∂²f/∂x² − 2 (∂f/∂x)(∂f/∂y) ∂²f/∂x∂y + ( 1 + (∂f/∂x)² ) ∂²f/∂y² = 0.
(See also Exercise 8.5.22.) Examples of minimal surfaces are:
(a) a plane;
(b) a helicoid—the spiral surface obtained by joining points of a helix "horizontally" to its vertical axis, as pictured in Figure 6.2(a);
(c) a catenoid—the surface of revolution obtained by rotating a catenary y = (1/(2c))(e^{cx} + e^{−cx}) (for any c > 0) about the x-axis, as pictured in Figure 6.2(b).
(See Exercise 10.) ▽

Figure 6.2: (a) a helicoid; (b) a catenoid

EXERCISES 3.6

1. Define f : R² → R by
  f(x, y) = xy (x² − y²)/(x² + y²) for (x, y) ≠ (0, 0),   and f(0) = 0.
  a. Show that ∂f/∂x(0, y) = −y for all y and ∂f/∂y(x, 0) = x for all x.
  b. Deduce that ∂²f/∂x∂y(0) = 1 but ∂²f/∂y∂x(0) = −1.
  c. Conclude that f is not C² at 0.

2. Check 
that the following are harmonic functions.
x
a. f = 3x2 − 5xy − 3y 2
y
 
x
b. f = log(x2 + y 2 )
y
 
x
c. f y  = x2 + xy + 2y 2 − 3z 2 + xyz
z
 
x
d. f y  = (x2 + y 2 + z 2 )−1/2
z
3. Check that the following functions are solutions of the one-dimensional wave equation given in
Example
 3.
x
a. f = cos(x + ct)
t
 
x
b. f = sin 5x cos 5ct
t
 
x 2 /4kt
4. Let f = t−1/2 e−x . Show that f is a solution of the one-dimensional heat equation
t
∂f ∂2f
= k 2.
∂t ∂x

*5. Suppose
  we are given a solution f of the  one-dimensional
 wave equation, with initial position
x ∂f x
f = h(x) and initial velocity = k(x). Express the functions φ and ψ in the
0 ∂t 0
solution of Example 3 in terms of h and k.
      
2 2 2 2 u u u
6. Suppose f : R → R and g : R → R are C , and let F =f g . Writing g1 =
v v v
     
u u u
x and g2 =y , show that
v v v
 
∂2F ∂f ∂ 2 x ∂f ∂ 2 y ∂ 2 f ∂x ∂x ∂2f ∂x ∂y ∂x ∂y ∂ 2 f ∂y ∂y
= + + 2 + + + 2 ,
∂u∂v ∂x ∂u∂v ∂y ∂u∂v ∂x ∂u ∂v ∂x∂y ∂u ∂v ∂v ∂u ∂y ∂u ∂v
 
u
where the partial derivatives of f are evaluated at g .
v
   
r r cos θ
7. Suppose f : R2 → R is C2 . Let F =f . Show that
θ r sin θ
∂2F 1 ∂F 1 ∂2F ∂2f ∂2f
+ + = + ,
∂r 2 r ∂r r 2 ∂θ 2 ∂x2 ∂y 2
   
r r cos θ
where the left-hand side is evaluated at and the right-hand side is evaluated at .
θ r sin θ
(This is the formula for the Laplacian in polar coordinates.)
 
r
8. Use the result of Exercise 7 to show that for any integer n, the functions F = r n cos nθ
θ
 
r
and F = r n sin nθ are harmonic.
θ

*9. Use the result of Exercise


  7 to find all radially symmetric harmonic functions on the plane.
r
(This means that F is independent of θ, so we can call it h(r).)
θ

10. Check that the following functions f : R2 → R are indeed solutions of the minimal surface
equation
 given in Example 4.
x
a. f =c
y 
x
b. f = arctan(y/x)
y
  q
x 1 x
2
c. f = 2 (e + e−x ) − y 2 (For this one a computer algebra system is recommended.)
y

  0,
u u < 0 or v < 0 ∂2F
11. Define F = . Show that F is C2 and = 0, and yet F cannot
v u3 ,
u ≥ 0 and v > 0 ∂u∂v
be written in the form prescribed by the discussion of Example 3. Resolve this paradox.
CHAPTER 4
Implicit and Explicit Solutions of Linear Systems
We have seen that we can view the unit circle {x ∈ R² : ‖x‖ = 1} either as the set of solutions of an equation or in terms of a parametric representation {(cos t, sin t) : t ∈ [0, 2π)}. These
are, respectively, the implicit and explicit representations of this subset of R2 . Similarly, any
subspace V ⊂ Rn can be represented in two ways:
(i) V = Span(v1 , . . . , vk ) for appropriate vectors v1 , . . . , vk ∈ Rn —this is the explicit or
parametric representation;
(ii) V = {x ∈ Rn : Ax = 0} for an appropriate m × n matrix A—this is the implicit
representation, viewing V as the intersection of the hyperplanes defined by Ai · x = 0.
In this chapter we will see how to go back and forth between these two approaches. The central
tool is Gaussian elimination, with which we deal in depth in the first two sections. We then come to
the central notion of dimension and some useful applications. In the last section, we will begin to
investigate to what extent we can relate implicit and explicit descriptions in the nonlinear setting.

1. Gaussian Elimination and the Theory of Linear Systems

In this section we give an explicit algorithm for solving a system of m linear equations in n
variables:
a11 x1 + a12 x2 + . . . + a1n xn = b1
a21 x1 + a22 x2 + . . . + a2n xn = b2
.. ..
. .
am1 x1 + am2 x2 + . . . + amn xn = bm .

Of course, we can write this in the form Ax = b, where
  A = [aij] is the m × n matrix of coefficients,   x = (x1, x2, …, xn),   and   b = (b1, b2, …, bm).

Geometrically, a solution of the system Ax = b is a vector x having the requisite dot products with
the row vectors Ai of the matrix A:

Ai · x = bi for all i = 1, 2, . . . , m.

That is, the system of equations describes the intersection of the m hyperplanes with normal vectors
Ai and at (signed) distance bi /kAi k from the origin.

To solve a system of linear equations, we want to give an explicit parametric description of the
general solution. Some systems are relatively simple to solve. For example, taking the system

x1 − x3 = 1
x2 + 2x3 = 2 ,

we see that these equations allow us to determine x1 and x2 in terms of x3; in particular, we can write x1 = 1 + x3 and x2 = 2 − 2x3, where x3 is free to take on any real value. Thus, any solution of this system is of the form x = (1 + t, 2 − 2t, t) for some t ∈ R. (It is easily checked that every vector of this form is in fact a solution, as (1 + t) − t = 1 and (2 − 2t) + 2t = 2 for every t ∈ R.) Thus, we see that the intersection of the two given planes is the line in R³ passing through (1, 2, 0) with direction vector (1, −2, 1).
More complicated systems of equations require some algebraic manipulations before we can
easily read off the general solution in parametric form. There are three basic operations we can
perform on systems of equations that will not affect the solution set. They are the following
elementary operations:
(i) interchange any pair of equations;
(ii) multiply any equation by a nonzero real number;
(iii) replace any equation by its sum with a multiple of any other equation.

Example 1. Consider the system of linear equations

3x1 − 2x2 + 9x4 = 4


2x1 + 2x2 − 4x4 = 6 .

We can use operation (i) to replace this system with

2x1 + 2x2 − 4x4 = 6


3x1 − 2x2 + 9x4 = 4 ;

then we use operation (ii), multiplying the first equation by 1/2, to get

x1 + x2 − 2x4 = 3
3x1 − 2x2 + 9x4 = 4 ;

now we use operation (iii), adding −3 times the first equation to the second:

x1 + x2 − 2x4 = 3
− 5x2 + 15x4 = −5 .

Next we use operation (ii) again, multiplying the second equation by −1/5, to obtain

x1 + x2 − 2x4 = 3
x2 − 3x4 = 1 ;

finally, we use operation (iii), adding −1 times the second equation to the first:

x1 + x4 = 2
x2 − 3x4 = 1 .

From this we see that x1 and x2 are determined by x4 , while x3 and x4 are free to take on any
values. Thus, we read off the general solution of the system of equations:

x1 = 2 − x4
x2 = 1 + 3x4
x3 = x3
x4 = x4

Thus, the general solution is
  x = (x1, x2, x3, x4) = (2, 1, 0, 0) + x3 (0, 0, 1, 0) + x4 (−1, 3, 0, 1),

which is a parametric representation of a plane in R4 . ▽

We now describe a systematic technique, using the three allowable elementary operations, for
solving systems of m equations in n variables. Before going any further, we should make the official
observation that performing elementary operations on a system of equations does not change its
solutions.

Proposition 1.1. If a system of equations Ax = b is changed into the new system Cx = d by


elementary operations, then the systems have the same set of solutions.

Proof. Left to the reader in Exercise 1. 

We introduce one further piece of shorthand notation, the augmented matrix


 
a11 . . . a1n b1
 
 a21 . . . a2n b2 
[A | b] = 
 .. .. .. .. 
.
 . . . . 
am1 . . . amn bm
Notice that the augmented matrix contains all of the information of the original system of equations,
since we can recover the latter by filling in the xi ’s, +’s, and =’s as needed.
The elementary operations on a system of equations become operations on the rows of the aug-
mented matrix; in this setting, we refer to them as elementary row operations of the corresponding
three types:
(i) interchange any pair of rows;
(ii) multiply all the entries of any row by a nonzero real number;
(iii) replace any row by its sum with a multiple of any other row.

Since we have established that elementary operations do not affect the solution set of a system of
equations, we can freely perform elementary row operations on the augmented matrix of a system
of equations with the goal of finding an “equivalent” augmented matrix from which we can easily
read off the general solution.

Example 2. We revisit Example 1 in the notation of augmented matrices. To solve


3x1 − 2x2 + 9x4 = 4
2x1 + 2x2 − 4x4 = 6 ,

we begin by forming the appropriate augmented matrix


" #
3 −2 0 9 4
.
2 2 0 −4 6
We denote the process of performing row operations by the symbol and (in this example) we
indicate above it the type of operation we are performing:
" # " # " #
3 −2 0 9 4 (i) 2 2 0 −4 6 (ii) 1 1 0 −2 3
2 2 0 −4 6 3 −2 0 9 4 3 −2 0 9 4
" # " # " #
(iii) 1 1 0 −2 3 (ii) 1 1 0 −2 3 (iii) 1 0 0 1 2
.
0 −5 0 15 −5 0 1 0 −3 1 0 1 0 −3 1
From the final augmented matrix we are able to recover the simpler form of the equations,
x1 + x4 = 2
x2 − 3x4 = 1 ,

and read off the general solution just as before. ▽

Definition. We call the first nonzero entry of a row (reading left to right) its leading entry. A
matrix is in echelon1 form if
(1) the leading entries move to the right in successive rows;
(2) the entries of the column below each leading entry are all 0;2
(3) all rows of zeroes are at the bottom of the matrix.
A matrix is in reduced echelon form if it is in echelon form and, in addition,
(4) every leading entry is 1;
(5) all the entries of the column above each leading entry are 0 as well.
We call the leading entry of a certain row of a matrix a pivot if there is no leading entry above
it in the same column. When a matrix is in echelon form, we refer to the columns in which a pivot
appears as pivot columns and to the corresponding variables (in the original system of equations)
as pivot variables. The remaining variables are called free variables.
1 The word echelon derives from the French "échelle," ladder. Although we don't usually draw the rungs of the ladder, they are there: [ 1 2 3 4 ; 0 0 1 2 ; 0 0 0 3 ].
2
Condition (2) is actually a consequence of (1), but we state it anyway for clarity.

The augmented matrices


" # " # " #
1 2 0 −1 1 1 2 1 1 3 1 2 0 3
, ,
0 0 1 2 2 0 0 1 2 2 1 0 −1 2
are, respectively, in reduced echelon form, in echelon form, and in neither. The key point is this:
When the matrix is in reduced echelon form, we are able to determine the general solution by
expressing each of the pivot variables in terms of the free variables.

Example 3. The augmented matrix


 
1 2 0 0 4 1
 
0 0 1 0 −2 2
0 0 0 1 1 1
is in reduced echelon form. The corresponding system of equations is
x1 + 2x2 + 4x5 = 1
x3 − 2x5 = 2
x4 + x5 = 1 .

Notice that the pivot variables, x1 , x3 , and x4 , are completely determined by the free variables x2
and x5 . As usual, we can write the general solution in terms of the free variables only:
         
  x = (x1, x2, x3, x4, x5) = (1 − 2x2 − 4x5, x2, 2 + 2x5, 1 − x5, x5)
    = (1, 0, 2, 1, 0) + x2 (−2, 1, 0, 0, 0) + x5 (−4, 0, 2, −1, 1). ▽

In this last example, we see that the general solution is the sum of a particular solution—
obtained by setting all the free variables equal to 0—and a linear combination of vectors, one
for each free variable—obtained by setting that free variable equal to 1 and the remaining free
variables equal to 0 and ignoring the particular solution. In other words, if xk is a free variable,
the corresponding vector in the general solution has kth coordinate equal to 1 and j th coordinate
equal to 0 for all the other free variables xj. Concentrate on the second and fifth entries (the ones corresponding to the free variables x2 and x5) of the vectors from Example 3:
  x2 (−2, 1, 0, 0, 0) + x5 (−4, 0, 2, −1, 1);
in the vector multiplying x2 these entries are 1 and 0, while in the vector multiplying x5 they are 0 and 1.
We refer to this as the standard form of the general solution. The general solution of any system
in reduced echelon form can be presented in this manner.
Our strategy now is to transform the augmented matrix of any system of linear equations into
echelon form by performing a sequence of elementary row operations. The algorithm goes by the
name of Gaussian elimination.

The first step is to identify the first column (starting at the left) that does not consist only
of 0’s; usually this is the first column, but it may not be. Pick a row whose entry in this column
is nonzero—usually the uppermost such row, but you may choose another if it helps with the
arithmetic—and interchange this with the first row; now the first entry of the first nonzero column
is nonzero. This will be our first pivot. Next, we add the appropriate multiple of the top row to all
the remaining rows to make all the entries below the pivot equal to 0. To consider two examples,
if we begin with the matrices
   
3 −1 2 7 0 2 4 3
   
A = 2 1 3 3 and B = 0 1 2 −1  ,
2 2 4 2 0 2 3 3

then we begin by switching the first and third rows of A and the first and second rows of B (to
avoid fractions). After clearing out the first pivot column we have
   
2n 2 4 2 0 1n 2 −1
′   ′  
A =  0 −1 −1 1 and B = 0 0 0 5.
0 −4 −4 4 0 0 −1 5

We have circled the pivots for emphasis. (If we are headed for the reduced echelon form, we might
 
replace the first row of A′ by 1 1 2 1 .)
The next step is to find the first column (again, starting at the left) in the new matrix having
a nonzero entry below the first row . Pick a row below the first that has a nonzero entry in this
column, and, if necessary, interchange it with the second row. Now the second entry of this column
is nonzero; this is our second pivot. (Once again, if we’re calculating the reduced echelon form, we
multiply by the reciprocal of this entry to make the pivot 1.) We then add appropriate multiples
of the second row to the rows beneath it to make all the entries beneath the pivot equal to 0.
Continuing with our examples, we obtain
   
2n 2 4 2 0 1n 2 −1
   n 5
A′′ =  0 −1 n −1 1 and B ′′ =  0 0 −1 .
0 0 0 0 0 0 0 5n

At this point, both A′′ and B ′′ are in echelon form; note that the zero row of A′′ is at the bottom,
and that the pivots move toward the right and down.
The process continues until we can find no more pivots—either because we have a pivot in each
row or because we’re left with nothing but rows of zeroes. At this stage, if we are interested in
finding the reduced echelon form, we clear out the entries in the pivot columns above the pivots
and then make all the pivots equal to 1. (Two words of advice here: if we start at the right and
work our way up and to the left, we in general minimize the amount of arithmetic that must be
done. Also, we always do our best to avoid fractions.) Continuing with our examples, we find the
reduced echelon forms of A and B respectively:
     
2n 2 4 2 1n 1 2 1 1n 0 1 2
     
A′′ =  0 −1 n −1 1 0 1n 1 −1  0 1n 1 −1  = RA
0 0 0 0 0 0 0 0 0 0 0 0
     
0 1n 2 −1 0 1n 2 −1 0 1n 2 0
′′  n 5     
B = 0 0 −1 0 0 1n −5  0 0 1n 0 
0 0 0 5n 0 0 0 1n 0 0 0 1n
 
0 1n 0 0
 
0 0 1n 0  = RB .
0 0 0 1n
We must be careful from now on to distinguish between the symbols “=” and “ ”; when we convert
one matrix to another by performing one or more row operations, we do not have equal matrices.
Here is one last example:

Example 4. Give the general solution of the following system of linear equations:
x1 + x2 + 3x3 − x4 = 0
−x1 + x2 + x3 + x4 + 2x5 = −4
x2 + 2x3 + 2x4 − x5 = 0
2x1 − x2 + x4 − 6x5 = 9 .

We begin with the augmented matrix of coefficients and put it in reduced echelon form:
   
1 1 3 −1 0 0 1 1 3 −1 0 0
 −1 1 1 1 2 −4  0 2 4 0 2 −4 
   
   
 0 1 2 2 −1 0 0 1 2 2 −1 0
2 −1 0 1 −6 9 0 −3 −6 3 −6 9
   
1 1 3 −1 0 0 1 1 3 −1 0 0
0 1 2 0 1 −2   −2 
 0 1 2 0 1 
   
0 0 0 2 −2 2 0 0 0 1 −1 1
0 0 0 3 −3 3 0 0 0 0 0 0
 
1 0 1 0 −2 3
0 1 2 0 1 −2 
 
 
0 0 0 1 −1 1
0 0 0 0 0 0
Thus, the system of equations is given in reduced echelon form by
x1 + x3 − 2x5 = 3
x2 + 2x3 + x5 = −2
x4 − x5 = 1 ,

from which we read off

x1 = 3 − x3 + 2x5
x2 = −2 − 2x3 − x5
x3 = x3
x4 = 1 + x5
x5 = x5 ,

and so the general solution is
  x = (x1, x2, x3, x4, x5) = (3, −2, 0, 1, 0) + x3 (−1, −2, 1, 0, 0) + x5 (2, −1, 0, 1, 1). ▽
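The elimination algorithm described above is easy to automate. The following Python sketch (NumPy assumed; a bare-bones illustration with none of the care about roundoff that serious numerical software takes) reduces the augmented matrix of this example to reduced echelon form using exactly the three elementary row operations:

    import numpy as np

    def rref(M, tol=1e-12):
        """Reduce M to reduced echelon form by elementary row operations (a rough sketch)."""
        A = M.astype(float).copy()
        rows, cols = A.shape
        pivot_row = 0
        for j in range(cols):
            nonzero = np.where(np.abs(A[pivot_row:, j]) > tol)[0]
            if nonzero.size == 0:
                continue                                  # no pivot in this column
            i = pivot_row + nonzero[0]
            A[[pivot_row, i]] = A[[i, pivot_row]]         # (i)   interchange rows
            A[pivot_row] /= A[pivot_row, j]               # (ii)  make the pivot 1
            for r in range(rows):                         # (iii) clear the rest of the column
                if r != pivot_row:
                    A[r] -= A[r, j] * A[pivot_row]
            pivot_row += 1
            if pivot_row == rows:
                break
        return A

    aug = np.array([[ 1,  1, 3, -1,  0,  0],
                    [-1,  1, 1,  1,  2, -4],
                    [ 0,  1, 2,  2, -1,  0],
                    [ 2, -1, 0,  1, -6,  9]])
    print(rref(aug))      # reproduces the reduced echelon form displayed above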

When we reduce a matrix to echelon form, we must make a number of choices along the way,
and the echelon form may well depend on the choices. But we shall now prove (using an inductive
argument) that any two echelon forms of the same matrix must have pivots in the same columns,
and from this it will follow that the reduced echelon form must be unique.

Theorem 1.2. Suppose A and B are echelon forms of the same nonzero matrix M. Then all their pivots appear in the same positions. As a consequence, if they are in reduced echelon
form, then they are equal.

Proof. We begin by noting that we can transform M to both A and B by sequences of ele-
mentary row operations. It follows that we can proceed from A to B by a sequence of elementary
row operations: The inverse of an elementary row operation is itself an elementary row operation,
so we can first transform A to M and then transform M to B.
Suppose the ith column of A is its first pivot column; this column vector is the standard basis
vector e1 ∈ Rm and all previous columns are zero. If we perform any elementary row operation on
A, the first i − 1 columns remain zero and the ith column remains nonzero. Thus, the ith column
is the first nonzero column of B, i.e., it is B’s first pivot column.
Next we prove that all the pivots must be in the same locations. We do this by induction on
m, the number of rows. We’ve already established that this must be the case for m = 1. Now
assume that the statement is true for m = k and consider (k + 1) × n matrices A and B satisfying
the hypotheses. By what we’ve already said, A and B have the same first pivot column; by using
an elementary row operation of type (ii) appropriately, we may assume those respective first pivot
entries in the first row are equal. Now, the k × n matrices A′ and B ′ obtained from A and B
by deleting their first rows are also in echelon form. Furthermore, any sequence of elementary
row operations that transforms A to B cannot involve the first row in a nontrivial way (if we add
a multiple of the first row to any other row, we must later subtract it again). Thus, A′ can be
transformed to B ′ by a sequence of elementary row operations. By the induction hypothesis we
can now conclude that A′ and B ′ have pivots in the same locations and, thus, so do A and B.
Last, we prove that if A and B are in reduced echelon form, then they are equal. Again we
proceed by induction on m. The case m = 1 is trivial. Assume that the statement is true for m = k
and consider the case m = k + 1. If the matrix A has a row of zeroes, then so must the matrix
B; we delete these rows and apply the induction hypothesis to conclude that A = B. Now, if the
last row of A is nonzero, it must contain the last pivot of A (say, in the j th column). Then we
know that the last pivot of B must be in the j th column as well. Since the matrices are in reduced
echelon form, their j th columns must be the last standard basis vector em ∈ Rm . Because of this,
the sequence of elementary row operations that transforms A to B cannot involve the last row in

a nontrivial way. Thus, if we let A′ and B ′ be the matrices obtained from A and B by deleting
the last row, we see that A′ can be transformed to B ′ by a sequence of elementary row operations
and that A′ and B ′ are both in reduced echelon form. The induction hypothesis applies to A′ and
B ′ , so we conclude that A′ = B ′ . Finally, we need to argue that the bottom rows of A and B are
identical. But any elementary row operation that would alter the last row would also have to make
some change in the first j entries. Since the last rows of A and B are known to agree in the first j
entries we conclude that they must agree everywhere. 
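To see Theorem 1.2 in action on a machine, one can build a second matrix from a given one by an arbitrary sequence of row operations and compare the two reduced echelon forms; the sketch below (an illustration only, assuming SymPy is available, with matrices of our own choosing) does exactly that.

    from sympy import Matrix, Rational

    M = Matrix([[2, 2, 4, 2],
                [1, 0, 1, 2],
                [3, 2, 5, 4]])

    # Build a second matrix from M by some arbitrary row operations.
    N = M.copy()
    N.row_swap(0, 1)                    # interchange rows 1 and 2
    N[2, :] = N[2, :] - 3 * N[0, :]     # add -3 times row 1 to row 3
    N[1, :] = Rational(1, 2) * N[1, :]  # multiply row 2 by 1/2

    # The echelon forms reached along the way may differ, but the
    # reduced echelon form is the same, as Theorem 1.2 guarantees.
    assert M.rref()[0] == N.rref()[0]
    print(M.rref()[0])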

1.1. Consistency. We recall from Chapter 1 that the product Ax can be expressed as
       
$$(\ast)\qquad A\mathbf{x} = \begin{bmatrix} a_{11}x_1 + \cdots + a_{1n}x_n \\ a_{21}x_1 + \cdots + a_{2n}x_n \\ \vdots \\ a_{m1}x_1 + \cdots + a_{mn}x_n \end{bmatrix} = x_1\begin{bmatrix} a_{11} \\ a_{21} \\ \vdots \\ a_{m1} \end{bmatrix} + x_2\begin{bmatrix} a_{12} \\ a_{22} \\ \vdots \\ a_{m2} \end{bmatrix} + \cdots + x_n\begin{bmatrix} a_{1n} \\ a_{2n} \\ \vdots \\ a_{mn} \end{bmatrix} = x_1 a_1 + x_2 a_2 + \cdots + x_n a_n,$$

where a1 , . . . , an ∈ Rm are the column vectors of the matrix A. Thus, a solution c = (c1 , . . . , cn ) of the
linear system Ax = b provides scalars c1 , . . . , cn so that

b = c1 a1 + · · · + cn an ;

i.e., a solution gives a representation of the vector b as a linear combination, c1 a1 + · · · + cn an , of


the column vectors of A.

Example 5. Consider the four vectors


       
$$b = \begin{bmatrix} 4 \\ 3 \\ 1 \\ 2 \end{bmatrix}, \quad v_1 = \begin{bmatrix} 1 \\ 0 \\ 1 \\ 2 \end{bmatrix}, \quad v_2 = \begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \end{bmatrix}, \quad\text{and}\quad v_3 = \begin{bmatrix} 2 \\ 1 \\ 1 \\ 2 \end{bmatrix}.$$

Suppose we want to express the vector b as a linear combination of the vectors v1 , v2 , and v3 .
Writing out the expression
       
$$x_1 v_1 + x_2 v_2 + x_3 v_3 = x_1\begin{bmatrix} 1 \\ 0 \\ 1 \\ 2 \end{bmatrix} + x_2\begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \end{bmatrix} + x_3\begin{bmatrix} 2 \\ 1 \\ 1 \\ 2 \end{bmatrix} = \begin{bmatrix} 4 \\ 3 \\ 1 \\ 2 \end{bmatrix},$$

we obtain the system of equations

x1 + x2 + 2x3 = 4
x2 + x3 = 3
x1 + x2 + x3 = 1
2x1 + x2 + 2x3 = 2 .

In matrix notation, we must solve Ax = b, where


 
$$A = \begin{bmatrix} 1 & 1 & 2 \\ 0 & 1 & 1 \\ 1 & 1 & 1 \\ 2 & 1 & 2 \end{bmatrix}.$$
So we take the augmented matrix to reduced echelon form:
   
$$[A \mid b] = \left[\begin{array}{ccc|c} 1 & 1 & 2 & 4 \\ 0 & 1 & 1 & 3 \\ 1 & 1 & 1 & 1 \\ 2 & 1 & 2 & 2 \end{array}\right] \rightsquigarrow \left[\begin{array}{ccc|c} 1 & 1 & 2 & 4 \\ 0 & 1 & 1 & 3 \\ 0 & 0 & -1 & -3 \\ 0 & -1 & -2 & -6 \end{array}\right] \rightsquigarrow \left[\begin{array}{ccc|c} 1 & 1 & 2 & 4 \\ 0 & 1 & 1 & 3 \\ 0 & 0 & 1 & 3 \\ 0 & 0 & 0 & 0 \end{array}\right] \rightsquigarrow \left[\begin{array}{ccc|c} 1 & 0 & 0 & -2 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 3 \\ 0 & 0 & 0 & 0 \end{array}\right].$$
This tells us that the solution is
 
$$\mathbf{x} = \begin{bmatrix} -2 \\ 0 \\ 3 \end{bmatrix}, \qquad\text{so}\qquad b = -2v_1 + 0v_2 + 3v_3,$$
which, as the reader can check, works. ▽

Now we modify the preceding example slightly.

Example 6. We would like to express the vector


 
$$b = \begin{bmatrix} 1 \\ 1 \\ 0 \\ 1 \end{bmatrix}$$
as a linear combination of the same vectors v1 , v2 , and v3 . This then leads analogously to the
system of equations
x1 + x2 + 2x3 = 1
x2 + x3 = 1
x1 + x2 + x3 = 0
2x1 + x2 + 2x3 = 1
and to the augmented matrix
 
$$\left[\begin{array}{ccc|c} 1 & 1 & 2 & 1 \\ 0 & 1 & 1 & 1 \\ 1 & 1 & 1 & 0 \\ 2 & 1 & 2 & 1 \end{array}\right],$$

whose echelon form is


 
$$\left[\begin{array}{ccc|c} 1 & 1 & 2 & 1 \\ 0 & 1 & 1 & 1 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 1 \end{array}\right].$$

The last row of the augmented matrix corresponds to the equation

0x1 + 0x2 + 0x3 = 1,

which obviously has no solution. Thus, the original system of equations has no solution: the vector
b in this example cannot be written as a linear combination of v1 , v2 , and v3 . ▽
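Examples 5 and 6 can be checked mechanically by row reducing the two augmented matrices; the following sketch (an illustration only, assuming SymPy is available) does so and makes the offending row of Example 6 visible.

    from sympy import Matrix

    A = Matrix([[1, 1, 2],
                [0, 1, 1],
                [1, 1, 1],
                [2, 1, 2]])

    b5 = Matrix([4, 3, 1, 2])   # Example 5: consistent
    b6 = Matrix([1, 1, 0, 1])   # Example 6: inconsistent

    for b in (b5, b6):
        R, _ = A.row_join(b).rref()
        print(R)
        # A row of the form [0 0 0 | 1] signals an inconsistent system.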

These examples lead us to make the following

Definition. If the system of equations Ax = b has no solutions, the system is said to be
inconsistent; if it has at least one solution, then it is said to be consistent.

A system of equations is consistent precisely when a solution exists. We see that the system of
equations in Example 6 is inconsistent and the system of equations in Example 5 is consistent. It
is easy to recognize an inconsistent system of equations from the echelon form of its augmented
matrix: the system is inconsistent precisely when there is an equation which reads

0x1 + 0x2 + · · · + 0xn = c

for some nonzero scalar c, i.e., when there is a row in the echelon form of the augmented matrix
whose entries are all 0 except for a nonzero rightmost entry.
Turning this around a bit, let [U | c] denote the echelon form of the augmented matrix [A | b].
The system Ax = b is consistent if and only if any zero row in U corresponds to a zero entry in
the vector c.
There are two geometric interpretations of consistency. From the standpoint of row vectors,
the system Ax = b is consistent precisely when the intersection of the hyperplanes

A1 · x = b1 , ..., Am · x = bm

is nonempty. From the point of view of column vectors, the system Ax = b is consistent precisely
when the vector b can be written as a linear combination of the column vectors a1 , . . . , an of A.
In the next example, we characterize those vectors b ∈ R4 that can be expressed as a linear
combination of the three vectors v1 , v2 , and v3 from Examples 5 and 6.

Example 7. For what vectors



$$b = \begin{bmatrix} b_1 \\ b_2 \\ b_3 \\ b_4 \end{bmatrix}$$

will the system of equations


x1 + x2 + 2x3 = b1
x2 + x3 = b2
x1 + x2 + x3 = b3
2x1 + x2 + 2x3 = b4

have a solution? We form the augmented matrix [A | b] and determine its echelon form:
     
$$\left[\begin{array}{ccc|c} 1 & 1 & 2 & b_1 \\ 0 & 1 & 1 & b_2 \\ 1 & 1 & 1 & b_3 \\ 2 & 1 & 2 & b_4 \end{array}\right] \rightsquigarrow \left[\begin{array}{ccc|c} 1 & 1 & 2 & b_1 \\ 0 & 1 & 1 & b_2 \\ 0 & 0 & -1 & b_3 - b_1 \\ 0 & -1 & -2 & b_4 - 2b_1 \end{array}\right] \rightsquigarrow \left[\begin{array}{ccc|c} 1 & 1 & 2 & b_1 \\ 0 & 1 & 1 & b_2 \\ 0 & 0 & 1 & b_1 - b_3 \\ 0 & 0 & 0 & -b_1 + b_2 - b_3 + b_4 \end{array}\right].$$

We infer from the last row of the latter matrix that the original system of equations will have a
solution if and only if

(†) −b1 + b2 − b3 + b4 = 0.

That is, the vector b can be written as a linear combination of v1 , v2 , and v3 precisely when b
satisfies the constraint equation (†). ▽
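The constraint equation (†) can also be found symbolically on a computer by repeating the row operations of Example 7 on an augmented matrix whose last column carries the symbols b1 , . . . , b4 ; the sketch below is only an illustration (it assumes SymPy is available, and the step-by-step operations mirror the text).

    from sympy import Matrix, symbols

    b1, b2, b3, b4 = symbols('b1 b2 b3 b4')

    A = Matrix([[1, 1, 2],
                [0, 1, 1],
                [1, 1, 1],
                [2, 1, 2]])
    M = A.row_join(Matrix([b1, b2, b3, b4]))   # symbolic augmented matrix

    # Repeat the row operations used in the text.
    M[2, :] = M[2, :] - M[0, :]        # R3 <- R3 - R1
    M[3, :] = M[3, :] - 2 * M[0, :]    # R4 <- R4 - 2 R1
    M[2, :] = -M[2, :]                 # R3 <- -R3
    M[3, :] = M[3, :] + M[1, :]        # R4 <- R4 + R2
    M[3, :] = M[3, :] + M[2, :]        # R4 <- R4 + R3
    print(M.row(3))   # [0, 0, 0, -b1 + b2 - b3 + b4]: the constraint (†)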

Example 8. Given
 
$$A = \begin{bmatrix} 1 & -1 & 1 \\ 3 & 2 & -1 \\ 1 & 4 & -3 \\ 3 & -3 & 3 \end{bmatrix},$$

we wish to find all vectors b ∈ R4 so that Ax = b is consistent, i.e., all vectors b that can be
expressed as a linear combination of the columns of A.
We consider the augmented matrix [A | b] and determine its echelon form [U | c]. In order for
the system to be consistent, every entry of c corresponding to a row of zeroes in U must be 0 as
well:
   
$$[A \mid b] = \left[\begin{array}{ccc|c} 1 & -1 & 1 & b_1 \\ 3 & 2 & -1 & b_2 \\ 1 & 4 & -3 & b_3 \\ 3 & -3 & 3 & b_4 \end{array}\right] \rightsquigarrow \left[\begin{array}{ccc|c} 1 & -1 & 1 & b_1 \\ 0 & 5 & -4 & b_2 - 3b_1 \\ 0 & 5 & -4 & b_3 - b_1 \\ 0 & 0 & 0 & b_4 - 3b_1 \end{array}\right] \rightsquigarrow \left[\begin{array}{ccc|c} 1 & -1 & 1 & b_1 \\ 0 & 5 & -4 & b_2 - 3b_1 \\ 0 & 0 & 0 & b_3 - b_2 + 2b_1 \\ 0 & 0 & 0 & b_4 - 3b_1 \end{array}\right].$$

Thus, we conclude that Ax = b is consistent if and only if b satisfies the constraint equations

2b1 − b2 + b3 = 0 and − 3b1 + b4 = 0.



These equations describe the intersection of two hyperplanes through the origin in R4 with respective normal vectors

$$\begin{bmatrix} 2 \\ -1 \\ 1 \\ 0 \end{bmatrix} \qquad\text{and}\qquad \begin{bmatrix} -3 \\ 0 \\ 0 \\ 1 \end{bmatrix}. \quad ▽$$
Notice that here we have reversed the process at the beginning of this section. There we
expressed the general solution of a system of linear equations as a linear combination of certain
vectors. Here, starting with the column vectors of the matrix A, we have found the constraint
equations a vector b must satisfy in order to be a linear combination of them (that is, to be in
the plane they span). This is the process of determining Cartesian equations for a space defined
parametrically.

1.2. Existence and Uniqueness of Solutions. In general, given an m × n matrix A, we might
wonder how many conditions a vector b ∈ Rm must satisfy in order to be a linear combination of
the columns of A. From the procedure we’ve just followed, the answer is quite clear: Each row of
zeroes in the echelon form of A contributes one constraint. This leads us to our next
Definition. The rank of a matrix is the number of nonzero rows (i.e., the number of pivots)
in its echelon form. It is usually denoted by r.
Then the number of rows of zeroes in the echelon form is m − r, and b must satisfy m − r constraint
equations. We recall that even though a matrix may have lots of different echelon forms, it follows
from Theorem 1.2 that they all must have the same number of nonzero rows.
Given a system of m linear equations in n variables, let A denote its coefficient matrix and r
the rank of A. Let’s now summarize the state of our knowledge:
Proposition 1.3. The linear system Ax = b is consistent if and only if the rank of the
augmented matrix [A | b] equals the rank of A. In particular, if r = m, then the system Ax = b
will be consistent for all vectors b ∈ Rm .
Proof. Ax = b is consistent if and only if the rank of the augmented matrix [A | b], which is
the number of nonzero rows in the augmented matrix [U | c], equals the number of nonzero rows
in U , i.e., the rank of A. When r = m, there is no row of zeroes in U , hence no possibility of
inconsistency. 
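Proposition 1.3 suggests a one-line numerical test for consistency; the sketch below is only an illustration (it assumes NumPy is available, reuses the matrix of Example 8, and the two test vectors are of our own choosing).

    import numpy as np

    A = np.array([[1., -1.,  1.],
                  [3.,  2., -1.],
                  [1.,  4., -3.],
                  [3., -3.,  3.]])

    def is_consistent(A, b):
        """Proposition 1.3: Ax = b is consistent iff rank[A | b] = rank A."""
        aug = np.column_stack([A, b])
        return np.linalg.matrix_rank(aug) == np.linalg.matrix_rank(A)

    b_good = np.array([1., 2., 0., 3.])   # satisfies 2b1 - b2 + b3 = 0 and -3b1 + b4 = 0
    b_bad  = np.array([1., 0., 0., 0.])   # violates both constraint equations
    print(is_consistent(A, b_good), is_consistent(A, b_bad))   # True False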
We now turn our attention to the question of how many solutions a given consistent system of
equations has. Our experience with solving systems of equations suggests that the solutions of a
consistent linear system Ax = b are intimately related to the solutions of the system Ax = 0.
Definition. A system Ax = b of linear equations is called inhomogeneous when b ≠ 0; the
corresponding equation Ax = 0 is called the associated homogeneous system.
The solutions of the inhomogeneous system Ax = b and those of the associated homogeneous
system Ax = 0 are related by the following
Proposition 1.4. Assume the system Ax = b is consistent, and let u1 be a “particular
solution.” Then all the solutions are of the form
u = u1 + v

for some solution v of the associated homogeneous system Ax = 0.

Proof. First we observe that any such vector u is a solution of Ax = b. By linearity, we have

Au = A(u1 + v) = Au1 + Av = b + 0 = b.

Conversely, every solution of Ax = b can be written in this form, for if u is an arbitrary solution
of Ax = b, then, by linearity again,

A(u − u1 ) = Au − Au1 = b − b = 0,

so v = u−u1 is a solution of the associated homogeneous system; now we just solve for u, obtaining
u = u1 + v, as required. 
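Proposition 1.4 is easy to verify numerically for the system of Example 4: the particular solution and the two homogeneous solutions were read off above, and every translate of the homogeneous solutions by the particular one solves Ax = b. The sketch below is only an illustration, assuming NumPy is available.

    import numpy as np

    # The coefficient matrix and right-hand side from Example 4.
    A = np.array([[ 1,  1, 3, -1,  0],
                  [-1,  1, 1,  1,  2],
                  [ 0,  1, 2,  2, -1],
                  [ 2, -1, 0,  1, -6]], dtype=float)
    b = np.array([0., -4., 0., 9.])

    u1 = np.array([3., -2., 0., 1., 0.])    # particular solution (x3 = x5 = 0)
    v1 = np.array([-1., -2., 1., 0., 0.])   # solutions of Ax = 0 read off
    v2 = np.array([ 2., -1., 0., 1., 1.])   #   from the general solution

    # Every vector u1 + c1 v1 + c2 v2 solves Ax = b (Proposition 1.4).
    for c1, c2 in [(0, 0), (1, -2), (3.5, 7)]:
        u = u1 + c1 * v1 + c2 * v2
        assert np.allclose(A @ u, b)
    print("all checks passed")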

Remark. As Figure 1.1 suggests, when the inhomogeneous system Ax = b is consistent, its
solutions are obtained by translating the set of solutions of the associated homogeneous system by
a particular solution u1.

[Figure 1.1: the set of solutions of Ax = b is the translate by u1 of the set of solutions of Ax = 0.]

Of course, a homogeneous system is always consistent, since the trivial solution, x = 0, is always
a solution of Ax = 0. Now, if the rank of A is r, then there will be r pivot variables and n − r free
variables in the general solution of Ax = 0. In particular, if r = n, then x = 0 is the only solution
of Ax = 0.

Definition. If the system of equations Ax = b has precisely one solution, then we say that
the system has a unique solution.

Thus, a homogeneous system Ax = 0 has a unique solution when r = n and infinitely many
solutions when r < n. Note that it is impossible to have r > n, since there cannot be more pivots
than columns. Similarly, there cannot be more pivots than rows in the matrix, so it follows that
whenever n > m (i.e., there are more variables than equations), the homogeneous system Ax = 0
must have infinitely many solutions.
From Proposition 1.4 we know that if the inhomogeneous system Ax = b is consistent, then its
solutions are obtained by translating the solutions of the associated homogeneous system Ax = 0
by a particular solution. So we have the

Proposition 1.5. Suppose the system Ax = b is consistent. Then it has a unique solution if
and only if the associated homogeneous system Ax = 0 has only the trivial solution. This happens
exactly when r = n.

We conclude this discussion with an important special case. It is natural to ask when the
inhomogeneous system Ax = b has a unique solution for every b ∈ Rm . From Proposition 1.3 we
infer that for the system always to be consistent, we must have r = m; from Proposition 1.5 we
infer that for solutions to be unique, we must have r = n. And so we see that we can only have
both conditions when r = m = n.

Definition. An n × n matrix of rank r = n is called nonsingular. An n × n matrix of rank


r < n is called singular.

It is easy—but important—to observe that an n × n matrix is nonsingular if and only if there is a


pivot in each row, hence in each column, of its echelon form.

Proposition 1.6. Let A be an n × n matrix. The following are equivalent:

(1) A is nonsingular.
(2) Ax = 0 has only the trivial solution.
(3) For every b ∈ Rn , the equation Ax = b has a unique solution.

EXERCISES 4.1

1. Prove Proposition 1.1.

*2. Decide which of the following matrices are in echelon form, which are in reduced echelon form,
and which are neither. Justify your answers.
" #  
0 1 1 1 0
a.  
2 3 e.  0 0 0
" #
2 1 3 0 0 1
b.  
0 1 −1 1 1 0 −1
" #  
1 0 2 f.  0 2 1 0
c. 0 0 0 1
0 1 −1  
" # 1 0 −2 0 1
1 1 0  
d. g.  0 1 1 0 1
0 0 2
0 0 0 1 4
3. For each of the following matrices A, determine its reduced echelon form and give the general
solution of Ax = 0 in standard
 form.  
1 0 −1 1 2 −1
  
a. A =  −2 3 −1   1 3 1
c. A =  
3 −3 0  2 4 3
 
2 −2 4 −1 1 6
  " #
*b. A =  −1 1 −2  1 −2 1 0
d. A =
3 −3 6 2 −4 3 −1
   
1 1 1 1 1 −1 1 1 0
1 2 1 2  1 0 2 1 1
   
*e. A =   g. A =  
1 3 2 4  0 2 2 2 0
1 2 2 3 −1 1 −1 0 −1
   
1 2 0 −1 −1 1 1 0 5 0 −1
 −1 −3 1 2 3  0 1 1 3 −2 0
   
*f. A =   h. A =  
 1 −1 3 1 1  −1 2 3 4 1 −6 
2 −3 7 3 4 0 4 4 12 −1 −7
4. Give the general
 solution of
 the equation
 Ax
 = b in standard form.
2 1 −1 3
   
*a. A =  1 2 1 , b =  0 
−1 1 2 −3
" # " #
1 1 1 1 6
b. A = , b=
3 3 2 0 17
   
1 1 1 −1 0 −2
2 0 4 1 −1   10 
   
c. A =  , b =  
1 2 0 −2 2   −3 
0 1 −1 2 4 7    
1 0
*5. Find all the unit vectors x ∈ R3 that make an angle of π/3 with the vectors  0  and  1 .
−1 1
4
6. Findthenormal
  vector  inR spanned
  to the hyperplane   by
1 1 1 1 2 1
1 2 3 1 2 3
*a.      
 1 ,  1 ,  2  b.      
 1 ,  1 ,  2 .
1 2 4 1 2 3
     
2 −1 −4
*7. A circle C passes through the points , , and . Find the center and radius of
6 7 −2
C. (Hint: The equation of a circle can be written in the form x2 + y 2 + ax + by + c = 0. Why?)

*8. By solving a system of equations, find the linear combination of the vectors

     
1 0 2
v1 =  0  , v2 =  1  , v3 =  1 
−1 2 1
 
3
that gives b =  0 .
−2

*9. For each of the following vectors b ∈ R4 , decide whether b is a linear combination of

     
1 0 1
 0  −1   −2 
v1 = 
 1,
 v2 = 
 0,
 and v3 = 
 1.

−2 1 0
     
1 1 1
1  −1   1
a. b=
1
 b. b=
 1
 c. b=
 0

1 −1 −2
3
*10. Decide
 whether
  each of the following collections of vectors  R .    
 spans
1 1 1 1 3 2
a.  1 ,  2  c.  0 ,  −1 ,  5 ,  3 
1 2 1 1 3 2
           
1 1 1 1 2 0
b.  1 ,  2 ,  3  d.  0 ,  1 ,  1 
1 2 3 −1 1 5
11. Find the constraint
 equations
 that b must satisfy in order for Ax = b to be consistent.
3 −1
 
a. A =  6 −2 
−9 3
 
1 1 1
 
*b. A =  −1 1 2
1 3 4
 
1 2 1
 0 1 1
 
c. A =  
 −1 3 4
−2 −1 1
12. Find the constraint
 equations
   that
b must satisfy in order to be
an 
element
  of
 
1 0 1 1 0 2
 0   1   1   0   1   −1 
a. V = Span      
 1  ,  1  ,  1  b. V = Span       
 1  ,  1  ,  1 
1 2 0 1 2 0

13. Find a matrix A with the given


 property
 or explain why none can exist.    
1 1 2
*a. one of the rows of A is  0  and for some b ∈ R2 both the vectors  0  and  1  are
1 1 1
solutions of the equation Ax = b;    
0 0
1 0
b. the rows of A are linear
 combinations
  of    
 0  and  1  and for some nonzero b ∈ R
2

1 4
2 1 1 1
both the vectors 1
 and   are solutions of the equation Ax = b;
0
2 3
 
1
0
  2
c. the 
  orthogonal to  1  and for some nonzero vector b ∈ R both the vectors
rows ofA are
1 1
0 1 0
  and   are solutions of the equation Ax = b;
1 1
0 1
Explicit   of Linear Systems
1 2
d. for some nonzero vectors b1 , b2 ∈ R2 both the vectors  0  and  1  are solutions of
1 1
   
1 1
the equation Ax = b1 and both the vectors  0  and  1  are solutions of the equation
Ax = b2 . 0 1
" #
1 α
*14. Let A= .
α 3α
a. For which numbers α will A be singular?
b. For all numbers α not on your list in part a, we can solve Ax = b for every vector b ∈ R2 .
For each of the numbers α on your list, give the vectors b for which we can solve Ax = b.
 
1 1 α
 
15. Let A =  α 2 α .
α α 1
a. For which numbers α will A be singular?
b. For all numbers α not on your list in part a, we can solve Ax = b for every vector b ∈ R3 .
For each of the numbers α on your list, give the vectors b for which we can solve Ax = b.

16. Prove or give a counterexample:


a. If Ax = 0 has only the trivial solution x = 0, then Ax = b always has a unique solution.
b. If Ax = 0 and Bx = 0 have the same solutions, then the set of vectors b so that Ax = b
is consistent is the same as the set of the vectors b so that Bx = b is consistent.
♯ 17. a. Suppose A and B are nonsingular n × n matrices. Prove that AB is nonsingular. (Hint:
Solve (AB)x = 0.)
b. Suppose A and B are n × n matrices. Prove that if either A or B is singular, then AB is
singular.

18. In each case, give positive integers m and n and an example of an m × n matrix A with the
stated property, or explain why none can exist.
*a. Ax = b is inconsistent for every b ∈ Rm .
*b. Ax = b has one solution for every b ∈ Rm .
c. Ax = b has either zero or one solution for every b ∈ Rm .
d. Ax = b has infinitely many solutions for every b ∈ Rm .
*e. Ax = b has infinitely many solutions whenever it is consistent.
f. There are vectors b1 , b2 , b3 so that Ax = b1 has no solution, Ax = b2 has exactly one
solution, and Ax = b3 has infinitely many solutions.

19. ♯ a. Suppose A ∈ Mm×n , B ∈ Mn×m , and BA = In . Prove that if for some b ∈ Rm the
equation Ax = b has a solution, then that solution is unique.
b. Suppose A ∈ Mm×n , C ∈ Mn×m , and AC = Im . Prove that the system Ax = b is
consistent for every b ∈ Rm .
♯ c. Suppose A ∈ M
m×n and B, C ∈ Mn×m are matrices that satisfy BA = In and AC = Im .
Prove that B = C.

20. Let A be an m × n matrix with row vectors A1 , . . . , Am ∈ Rn .


a. Suppose A1 + · · · + Am = 0. Prove that rank(A) < m. (Hint: Why must there be a row
of zeroes in the echelon form of A?)
b. More generally, suppose there is some nontrivial linear combination c1 A1 +· · ·+cm Am = 0.
Prove rank(A) < m.

21. Let A be an m × n matrix with column vectors a1 , . . . , an ∈ Rm .


a. Suppose a1 + · · · + an = 0. Prove that rank(A) < n. (Hint: Consider solutions of Ax = 0.)
b. More generally, suppose there is some nontrivial linear combination c1 a1 + · · · + cn an = 0.
Prove rank(A) < n.
 
xi
22. Let Pi = ∈ R2 , i = 1, 2, 3. Assume x1 , x2 , and x3 are distinct.
yi
a. Show that the matrix
 
1 x1 x21
 
1 x2 x22 
1 x3 x23

is nonsingular.
b. Show that the system of equations

    
x21 x1 1 a y1
 2    
 x2 x2 1   b  =  y2 
x23 x3 1 c y3

always has a unique solution. Deduce that if P1 , P2 , and P3 are not collinear, then they
lie on a unique parabola y = ax2 + bx + c.
 
xi
23. Let Pi = ∈ R2 , i = 1, 2, 3. Let
yi

 
x1 y1 1
 
A =  x2 y2 1.
x3 y3 1

a. Prove that the three points P1 , P2 , and P3 are collinear if and only if the equation Ax = 0
has a nontrivial solution. (Hint: A general line in R2 is of the form ax + by + c = 0, where
a and b are not both 0.)
b. Prove that if the three given points are not collinear, then there is a unique circle passing
through them. (Hint: If you set up a system of linear equations as suggested by the hint
for Exercise 7, you should use part a to deduce that the appropriate coefficient matrix is
nonsingular.)

2. Elementary Matrices and Calculating Inverse Matrices

So far we have focused on the interpretation of matrix multiplication in terms of columns,


namely, the fact that the j th column of AB is the product of A with the j th column vector of B.
But equally à propos is the observation that
the ith row of AB is the product of the ith row vector of A with B.
Just as multiplying the matrix A by a column vector x on the right,
 
$$\begin{bmatrix} | & | & & | \\ a_1 & a_2 & \cdots & a_n \\ | & | & & | \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix},$$

gives us the linear combination x1 a1 + x2 a2 + · · · + xn an of the columns of A, the reader can easily
check that multiplying A on the left by the row vector [x1 x2 · · · xm ],
 
$$\begin{bmatrix} x_1 & x_2 & \cdots & x_m \end{bmatrix}\begin{bmatrix} \;A_1\; \\ \;A_2\; \\ \vdots \\ \;A_m\; \end{bmatrix},$$

yields the linear combination x1 A1 + x2 A2 + · · · + xm Am of the rows of A.


It should come as no surprise, then, that we can perform row operations on a matrix A by
multiplying on the left by appropriately chosen matrices. For example, if
 
$$A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{bmatrix}, \qquad E_1 = \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad E_2 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 4 \end{bmatrix}, \quad\text{and}\quad E_3 = \begin{bmatrix} 1 & 0 & 0 \\ -2 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix},$$

then
     
$$E_1 A = \begin{bmatrix} 3 & 4 \\ 1 & 2 \\ 5 & 6 \end{bmatrix}, \qquad E_2 A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 20 & 24 \end{bmatrix}, \qquad\text{and}\qquad E_3 A = \begin{bmatrix} 1 & 2 \\ 1 & 0 \\ 5 & 6 \end{bmatrix}.$$

Such matrices that give corresponding elementary row operations are called elementary matrices.
Note that each elementary matrix differs from the identity matrix only in a small way. (N.B. Here
we establish the custom that blank spaces in a matrix represent 0’s.)
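As a computational aside (an illustration only, assuming NumPy is available), the three elementary matrices E1, E2, E3 above can be checked directly:

    import numpy as np

    A  = np.array([[1, 2],
                   [3, 4],
                   [5, 6]])

    E1 = np.array([[0, 1, 0],   # interchange rows 1 and 2
                   [1, 0, 0],
                   [0, 0, 1]])
    E2 = np.array([[1, 0, 0],   # multiply row 3 by 4
                   [0, 1, 0],
                   [0, 0, 4]])
    E3 = np.array([[ 1, 0, 0],  # add -2 times row 1 to row 2
                   [-2, 1, 0],
                   [ 0, 0, 1]])

    print(E1 @ A)   # [[3 4] [1 2] [5 6]]
    print(E2 @ A)   # [[1 2] [3 4] [20 24]]
    print(E3 @ A)   # [[1 2] [1 0] [5 6]]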
(i) To interchange rows i and j, we should multiply by an elementary matrix of the form

$$\begin{bmatrix} 1 & & & & & & \\ & \ddots & & & & & \\ & & 0 & \cdots & 1 & & \\ & & \vdots & & \vdots & & \\ & & 1 & \cdots & 0 & & \\ & & & & & \ddots & \\ & & & & & & 1 \end{bmatrix},$$

where the rows (and columns) containing the interchanged 0’s and 1’s are the ith and jth.

(ii) To multiply row i by a scalar c, we should multiply by an elementary matrix of the form

$$\begin{bmatrix} 1 & & & & & \\ & \ddots & & & & \\ & & 1 & & & \\ & & & c & & \\ & & & & \ddots & \\ & & & & & 1 \end{bmatrix},$$

where the entry c appears in the ith diagonal position.

(iii) To add c times row i to row j, we should multiply by an elementary matrix of the form

$$\begin{bmatrix} 1 & & & & & & \\ & \ddots & & & & & \\ & & 1 & & & & \\ & & \vdots & \ddots & & & \\ & & c & \cdots & 1 & & \\ & & & & & \ddots & \\ & & & & & & 1 \end{bmatrix},$$

where the entry c sits in row j, column i.

Here’s an easy way to remember the form of these matrices: each elementary matrix is obtained
by performing the corresponding elementary row operation on the identity matrix.

Example 1. Let

$$A = \begin{bmatrix} 4 & 3 & 5 \\ 1 & 2 & 5 \end{bmatrix}.$$

We put A in reduced echelon form by the following sequence of row operations:

$$\begin{bmatrix} 4 & 3 & 5 \\ 1 & 2 & 5 \end{bmatrix} \rightsquigarrow \begin{bmatrix} 1 & 2 & 5 \\ 4 & 3 & 5 \end{bmatrix} \rightsquigarrow \begin{bmatrix} 1 & 2 & 5 \\ 0 & -5 & -15 \end{bmatrix} \rightsquigarrow \begin{bmatrix} 1 & 2 & 5 \\ 0 & 1 & 3 \end{bmatrix} \rightsquigarrow \begin{bmatrix} 1 & 0 & -1 \\ 0 & 1 & 3 \end{bmatrix}.$$

These steps correspond to multiplying, in sequence from right to left, by the elementary matrices

$$E_1 = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}, \quad E_2 = \begin{bmatrix} 1 & 0 \\ -4 & 1 \end{bmatrix}, \quad E_3 = \begin{bmatrix} 1 & 0 \\ 0 & -\frac{1}{5} \end{bmatrix}, \quad E_4 = \begin{bmatrix} 1 & -2 \\ 0 & 1 \end{bmatrix};$$
now the reader can check that

$$E = E_4 E_3 E_2 E_1 = \begin{bmatrix} 1 & -2 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} 1 & 0 \\ 0 & -\frac{1}{5} \end{bmatrix}\begin{bmatrix} 1 & 0 \\ -4 & 1 \end{bmatrix}\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} = \begin{bmatrix} \frac{2}{5} & -\frac{3}{5} \\ -\frac{1}{5} & \frac{4}{5} \end{bmatrix}$$

and, indeed,

$$EA = \begin{bmatrix} \frac{2}{5} & -\frac{3}{5} \\ -\frac{1}{5} & \frac{4}{5} \end{bmatrix}\begin{bmatrix} 4 & 3 & 5 \\ 1 & 2 & 5 \end{bmatrix} = \begin{bmatrix} 1 & 0 & -1 \\ 0 & 1 & 3 \end{bmatrix},$$
as it should. ▽
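The arithmetic of Example 1 can be confirmed on a computer; the sketch below is only an illustration, assuming NumPy is available.

    import numpy as np

    A  = np.array([[4., 3., 5.],
                   [1., 2., 5.]])
    E1 = np.array([[0., 1.], [1., 0.]])    # swap the two rows
    E2 = np.array([[1., 0.], [-4., 1.]])   # add -4 times row 1 to row 2
    E3 = np.array([[1., 0.], [0., -1/5]])  # multiply row 2 by -1/5
    E4 = np.array([[1., -2.], [0., 1.]])   # add -2 times row 2 to row 1

    E = E4 @ E3 @ E2 @ E1
    print(E)        # [[ 0.4 -0.6] [-0.2  0.8]], i.e. [[2/5, -3/5], [-1/5, 4/5]]
    print(E @ A)    # the reduced echelon form [[1, 0, -1], [0, 1, 3]]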

Example 2. Let’s revisit Example 4 on p. 127. Let


 
$$A = \begin{bmatrix} 1 & 1 & 3 & -1 & 0 \\ -1 & 1 & 1 & 1 & 2 \\ 0 & 1 & 2 & 2 & -1 \\ 2 & -1 & 0 & 1 & -6 \end{bmatrix}.$$
To clear out the entries below the first pivot, we must multiply by the product of the two elementary
matrices E1 and E2 :
    
$$E_2 E_1 = \begin{bmatrix} 1 & & & \\ & 1 & & \\ & & 1 & \\ -2 & & & 1 \end{bmatrix}\begin{bmatrix} 1 & & & \\ 1 & 1 & & \\ & & 1 & \\ & & & 1 \end{bmatrix} = \begin{bmatrix} 1 & & & \\ 1 & 1 & & \\ & & 1 & \\ -2 & & & 1 \end{bmatrix};$$
to change the pivot in the second row to 1 and then clear out below, we multiply first by
 
$$E_3 = \begin{bmatrix} 1 & & & \\ & \frac{1}{2} & & \\ & & 1 & \\ & & & 1 \end{bmatrix}$$

and then by the product


    
$$E_5 E_4 = \begin{bmatrix} 1 & & & \\ & 1 & & \\ & & 1 & \\ & 3 & & 1 \end{bmatrix}\begin{bmatrix} 1 & & & \\ & 1 & & \\ & -1 & 1 & \\ & & & 1 \end{bmatrix} = \begin{bmatrix} 1 & & & \\ & 1 & & \\ & -1 & 1 & \\ & 3 & & 1 \end{bmatrix}.$$
We then change the pivot in the third row to 1 and clear out below, multiplying by
   
$$E_6 = \begin{bmatrix} 1 & & & \\ & 1 & & \\ & & \frac{1}{2} & \\ & & & 1 \end{bmatrix} \qquad\text{and}\qquad E_7 = \begin{bmatrix} 1 & & & \\ & 1 & & \\ & & 1 & \\ & & -3 & 1 \end{bmatrix}.$$

Now we clear out above the pivots by multiplying by


   
$$E_8 = \begin{bmatrix} 1 & & 1 & \\ & 1 & & \\ & & 1 & \\ & & & 1 \end{bmatrix} \qquad\text{and}\qquad E_9 = \begin{bmatrix} 1 & -1 & & \\ & 1 & & \\ & & 1 & \\ & & & 1 \end{bmatrix}.$$
The net result is this: when we multiply the product
 1

4 − 43 1
2 0
 1 1
0 0
 2 2 
E9 E8 E7 E6 (E5 E4 )E3 (E2 E1 ) =  1 
 −4 − 14 1
2 0
1 9
4 4 − 23 1
by the original matrix, we do in fact get the reduced echelon form. ▽
Recall from Section 1 that if we want to find the constraint equations that a vector b must
satisfy in order for Ax = b to be consistent, we reduce the augmented matrix [A | b] to echelon form
[U | c] and set equal to 0 those entries of c corresponding to the rows of zeroes in U . That is, when A
is an m×n matrix of rank r, the constraint equations are merely the equations cr+1 = · · · = cm = 0.
Letting E be the product of the elementary matrices corresponding to the elementary row operations
required to put A in echelon form, we have U = EA and so
(†) [U | c] = [EA | Eb] .
That is, the constraint equations are the equations
Er+1 · b = 0, ... , Em · b = 0.
Interestingly, we can use the equation (†) to find a simple way to compute E: when we reduce the
augmented matrix [A | b] to echelon form [U | c], E is the matrix so that Eb = c.
Example 3. Taking the matrix A from Example 2, let’s find the constraint equations for
Ax = b to be consistent. We start with the augmented matrix
 
$$[A \mid b] = \left[\begin{array}{ccccc|c} 1 & 1 & 3 & -1 & 0 & b_1 \\ -1 & 1 & 1 & 1 & 2 & b_2 \\ 0 & 1 & 2 & 2 & -1 & b_3 \\ 2 & -1 & 0 & 1 & -6 & b_4 \end{array}\right]$$
and reduce to echelon form
 
$$[U \mid c] = \left[\begin{array}{ccccc|c} 1 & 1 & 3 & -1 & 0 & b_1 \\ 0 & 2 & 4 & 0 & 2 & b_1 + b_2 \\ 0 & 0 & 0 & 4 & -4 & -b_1 - b_2 + 2b_3 \\ 0 & 0 & 0 & 0 & 0 & b_1 + 9b_2 - 6b_3 + 4b_4 \end{array}\right].$$
Now it is easy to see that if
   
$$Eb = \begin{bmatrix} b_1 \\ b_1 + b_2 \\ -b_1 - b_2 + 2b_3 \\ b_1 + 9b_2 - 6b_3 + 4b_4 \end{bmatrix}, \qquad\text{then}\qquad E = \begin{bmatrix} 1 & & & \\ 1 & 1 & & \\ -1 & -1 & 2 & \\ 1 & 9 & -6 & 4 \end{bmatrix}.$$

The reader should check that, in fact, EA = U .


We could continue our Gaussian elimination to reach reduced echelon form:
 1 3 1

1 0 1 0 −2 4 b1 − 4 b2 + 2 b3
0 1 2 0 1 1 1 

[R | d] =  2 b1 + 2 b2 
1 1 1 .
0 0 0 1 −1 − 4 b1 − 4 b2 + 2 b3 
0 0 0 0 0 b1 + 9b2 − 6b3 + 4b4
From this we see that R = E ′ A, where
 1

4 − 43 1
2 0
 1 1
0 0
 2 
E′ =  1 2
,
 −4 − 41 1
2 0
1 9 −6 4
which is very close to—but not the same as—the product of elementary matrices we obtained at
the end of Example 2. Can you explain why the first three rows must agree here, but not the last?
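The relation EA = U claimed in Example 3 is easy to verify numerically; the sketch below is only an illustration, assuming NumPy is available.

    import numpy as np

    A = np.array([[ 1,  1, 3, -1,  0],
                  [-1,  1, 1,  1,  2],
                  [ 0,  1, 2,  2, -1],
                  [ 2, -1, 0,  1, -6]], dtype=float)

    E = np.array([[ 1, 0, 0, 0],
                  [ 1, 1, 0, 0],
                  [-1,-1, 2, 0],
                  [ 1, 9,-6, 4]], dtype=float)

    U = np.array([[1, 1, 3,-1, 0],
                  [0, 2, 4, 0, 2],
                  [0, 0, 0, 4,-4],
                  [0, 0, 0, 0, 0]], dtype=float)

    print(np.allclose(E @ A, U))   # True: E carries A to its echelon form U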

We now concentrate on square (n × n) matrices. Recall that the inverse of the n × n matrix
A is the matrix A−1 satisfying AA−1 = A−1 A = In . It is convenient to have an inverse matrix if
we wish to solve the system Ax = b for numerous vectors b. If A is invertible, we can solve as
follows:3

Ax = b
⇓ multiplying both sides of the equation by A−1 on the left
A−1 (Ax) = A−1 b
⇓ using the associative property
(A−1 A)x = A−1 b
⇓ using the definition of A−1
x = In x = A−1 b

We aren’t done! We’ve shown that if x is a solution, then it must satisfy x = A−1 b. That is,
we’ve shown that the vector A−1 b is a candidate for a solution. But now we check that it truly is
a solution by straightforward calculation:

Ax = A(A−1 b) = (AA−1 )b = In b = b,

as required; but note that we have used both pieces of the definition of the inverse matrix to prove
that the system has a unique solution (which we “discovered” along the way).
It is a consequence of this computation that if A is an invertible n×n matrix, then Ax = c has a
unique solution for every c ∈ Rn , and so it follows from Proposition 1.6 that A must be nonsingular.
What about the converse? If A is nonsingular, must A be invertible? Well, if A is nonsingular, we
know that every equation Ax = c has a unique solution. In particular, for j = 1, . . . , n, there is
3
We will write the “implies” symbol “=⇒” vertically so that we can indicate the reasoning in each step.

a unique vector bj that solves Abj = ej , the j th standard basis vector. If we let B be the n × n
matrix whose column vectors are b1 , . . . , bn , then we have
   
$$AB = A\begin{bmatrix} | & | & & | \\ b_1 & b_2 & \cdots & b_n \\ | & | & & | \end{bmatrix} = \begin{bmatrix} | & | & & | \\ e_1 & e_2 & \cdots & e_n \\ | & | & & | \end{bmatrix} = I_n.$$
This suggests that the matrix we’ve constructed should be the inverse matrix of A. But we need
to know that BA = In as well. Here is a very elegant way to understand why this is so. We can
find the matrix B by forming the giant augmented matrix
   
$$\big[\; A \;\big|\; e_1 \;\cdots\; e_n \;\big] = \big[\, A \mid I_n \,\big]$$
and using Gaussian elimination to obtain the reduced echelon form
 
 
$$\big[\, I_n \mid B \,\big].$$

(Note that the reduced echelon form of A must be In because A is nonsingular.) But this tells us
that if E is the product of the elementary matrices required to put A in reduced echelon form, then
we have
E [A | I ] = [I | B] ,
and so B = E and BA = In , which is what we needed to check. In conclusion, we have proved the
following

Theorem 2.1. An n × n matrix is nonsingular if and only if it is invertible.

Note that Gaussian elimination will also let us know when A is not invertible: if we come to a
row of zeroes while reducing A to echelon form, then, of course, A is singular and so it cannot be
invertible. The following observation is often very useful.

Corollary 2.2. If A and B are n×n matrices satisfying BA = In , then B = A−1 and A = B −1 .

Proof. By Exercise 4.1.19a, the equation Ax = 0 has only the trivial solution. Hence, by
Proposition 1.6, A is nonsingular; according to Theorem 2.1, A is therefore invertible. Since A has
an inverse matrix, A−1 , we deduce that
BA = In
⇓ multiplying both sides of the equation by A−1 on the right
(BA)A−1 = In A−1
⇓ using the associative property
B(AA−1 ) = A−1
⇓ using the definition of A−1
B = A−1 ,

as desired. Since AB = In and BA = In , it now follows that A = B −1 , as well. 

Example 4. We wish to determine the inverse of the matrix


 
$$A = \begin{bmatrix} 1 & -1 & 1 \\ 2 & -1 & 0 \\ 1 & -2 & 2 \end{bmatrix}$$
(if it exists). We apply Gaussian elimination to the augmented matrix:
   
$$[A \mid I_3] = \left[\begin{array}{ccc|ccc} 1 & -1 & 1 & 1 & 0 & 0 \\ 2 & -1 & 0 & 0 & 1 & 0 \\ 1 & -2 & 2 & 0 & 0 & 1 \end{array}\right] \rightsquigarrow \left[\begin{array}{ccc|ccc} 1 & -1 & 1 & 1 & 0 & 0 \\ 0 & 1 & -2 & -2 & 1 & 0 \\ 0 & -1 & 1 & -1 & 0 & 1 \end{array}\right] \rightsquigarrow \left[\begin{array}{ccc|ccc} 1 & -1 & 1 & 1 & 0 & 0 \\ 0 & 1 & -2 & -2 & 1 & 0 \\ 0 & 0 & -1 & -3 & 1 & 1 \end{array}\right]$$

$$\rightsquigarrow \left[\begin{array}{ccc|ccc} 1 & -1 & 1 & 1 & 0 & 0 \\ 0 & 1 & -2 & -2 & 1 & 0 \\ 0 & 0 & 1 & 3 & -1 & -1 \end{array}\right] \rightsquigarrow \left[\begin{array}{ccc|ccc} 1 & -1 & 0 & -2 & 1 & 1 \\ 0 & 1 & 0 & 4 & -1 & -2 \\ 0 & 0 & 1 & 3 & -1 & -1 \end{array}\right] \rightsquigarrow \left[\begin{array}{ccc|ccc} 1 & 0 & 0 & 2 & 0 & -1 \\ 0 & 1 & 0 & 4 & -1 & -2 \\ 0 & 0 & 1 & 3 & -1 & -1 \end{array}\right].$$
It follows that

$$A^{-1} = \begin{bmatrix} 2 & 0 & -1 \\ 4 & -1 & -2 \\ 3 & -1 & -1 \end{bmatrix}.$$
(The reader should check our arithmetic by multiplying AA−1 or A−1 A.) ▽
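The [A | I] procedure of Example 4 is easy to carry out on a computer; the sketch below is only an illustration, assuming SymPy is available.

    from sympy import Matrix, eye

    A = Matrix([[1, -1, 1],
                [2, -1, 0],
                [1, -2, 2]])

    # Row reduce [A | I]; since A is nonsingular the left block becomes I
    # and the right block is the inverse of A.
    M, _ = A.row_join(eye(3)).rref()
    A_inv = M[:, 3:]
    print(A_inv)      # Matrix([[2, 0, -1], [4, -1, -2], [3, -1, -1]])
    assert A * A_inv == eye(3) and A_inv * A == eye(3)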

Example 5. It is convenient to derive the formula for the inverse of a general 2 × 2 matrix
first given in Example 9 of Chapter 1, Section 4. Let

$$A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}.$$

We assume a ≠ 0 to start with.
$$\left[\begin{array}{cc|cc} a & b & 1 & 0 \\ c & d & 0 & 1 \end{array}\right] \rightsquigarrow \left[\begin{array}{cc|cc} 1 & \frac{b}{a} & \frac{1}{a} & 0 \\ c & d & 0 & 1 \end{array}\right] \rightsquigarrow \left[\begin{array}{cc|cc} 1 & \frac{b}{a} & \frac{1}{a} & 0 \\ 0 & d - \frac{bc}{a} & -\frac{c}{a} & 1 \end{array}\right] \quad (\text{assuming } ad - bc \neq 0)$$

$$\rightsquigarrow \left[\begin{array}{cc|cc} 1 & \frac{b}{a} & \frac{1}{a} & 0 \\ 0 & 1 & -\frac{c}{ad-bc} & \frac{a}{ad-bc} \end{array}\right] \rightsquigarrow \left[\begin{array}{cc|cc} 1 & 0 & \frac{1}{a} + \frac{b}{a}\cdot\frac{c}{ad-bc} & -\frac{b}{ad-bc} \\ 0 & 1 & -\frac{c}{ad-bc} & \frac{a}{ad-bc} \end{array}\right] = \left[\begin{array}{cc|cc} 1 & 0 & \frac{d}{ad-bc} & -\frac{b}{ad-bc} \\ 0 & 1 & -\frac{c}{ad-bc} & \frac{a}{ad-bc} \end{array}\right],$$

and so we see that, provided ad − bc ≠ 0,

$$A^{-1} = \frac{1}{ad-bc}\begin{bmatrix} d & -b \\ -c & a \end{bmatrix}.$$
As a check, we have

$$\begin{bmatrix} a & b \\ c & d \end{bmatrix}\left(\frac{1}{ad-bc}\begin{bmatrix} d & -b \\ -c & a \end{bmatrix}\right) = I_2 = \left(\frac{1}{ad-bc}\begin{bmatrix} d & -b \\ -c & a \end{bmatrix}\right)\begin{bmatrix} a & b \\ c & d \end{bmatrix}.$$

Of course, we have derived this assuming a ≠ 0, but the reader can check easily that the formula
works fine even when a = 0. We do see, however, from the row reduction that

$$\begin{bmatrix} a & b \\ c & d \end{bmatrix} \text{ is nonsingular} \iff ad - bc \neq 0. \quad ▽$$

We have shown in the course of proving Theorem 2.1 that when A is square, any B that satisfies
AB = I (a so-called right inverse of A) must also satisfy BA = I (and thus is a left inverse of A).
Likewise, we have established in Corollary 2.2 that when A is square, any left inverse of A is a bona
fide inverse of A. Indeed, it will never happen that a non-square matrix has both a left and a right
inverse (see Exercise 9).

Remark. Even when A is square, the left and right inverses have rather different interpreta-
tions. As we saw in the proof of Theorem 2.1, the columns of the right inverse arise as the solutions
of Ax = ej . On the other hand, the left inverse of A is the product of the elementary matrices by
which we reduce A to its reduced echelon form, I. (See Exercise 8.)

EXERCISES 4.2

*1. For each of the matrices A in Exercise 4.1.3, find a product of elementary matrices E = · · · E2 E1
so that EA is in echelon form. Use the matrix E you’ve found to give constraint equations for
Ax = b to be consistent.

2. Use Gaussian
" elimination
# to find A−1 (if it exists):
1 2
a. A =
−1 3
 
1 2 3
 
b. A =  1 1 2
0 1 2
 
1 0 1
 
*c. A =  0 2 1
−1 3 1
 
1 2 3
 
d. A =  4 5 6
7 8 9
 
2 3 4
 
*e. A =  2 1 1
−1 1 2
3. In each case, given A and b,
(i) Find A−1 .
(ii) Use your answer to (i) to solve Ax = b.
(iii) Use your answer to (ii) to express b as a linear combination of the columns of A.
" # " #
2 3 3
a. A = ,b=
3 5 4
   
1 1 1 1
   
*b. A =  0 2 3 , b =  1 
3 2 2 2
   
1 1 1 3
   
c. A =  0 1 1 , b =  0 
1 2 1 1
   
1 1 1 1 2
0 1 1 1  
 0
*d. A =   , b =  
0 0 1 3 1
0 0 1 4 1
" #
1 −1 1
4. a. Find two different right inverses of the matrix A = .
2 −1 0
b. Give a nonzero matrix that has no right inverse.
 
1 2
 
c. Find two left inverses of the matrix A =  0 −1 .
1 1
d. Give a nonzero matrix that has no left inverse.

5. Prove that the inverse of every elementary matrix is again an elementary matrix. Indeed, give
a simple prescription for determining the inverse of each type of elementary matrix.

6. Using Theorem 2.1 and Proposition 4.3 of Chapter 1, prove that if AB and B are nonsingular,
then A is nonsingular. (Cf. Exercise 4.1.17.)
♯ 7. Suppose A is an invertible m × m matrix and B is an invertible n × n matrix.
a. Prove that the matrix " #
A O
O B
is invertible and give a formula for its inverse.
b. Suppose C is an arbitrary m × n matrix. Is the matrix
" #
A C
O B
invertible?
(See Exercise 1.4.12 for the notion of block multiplication.)

8. Complete the following alternative argument that the matrix obtained by Gaussian elimination
must be the inverse matrix of A. Suppose A is nonsingular.
a. Show there are finitely many elementary matrices E1 , E2 , . . . , Ek so that
Ek Ek−1 · · · E2 E1 A = I.
b. Let B = Ek Ek−1 · · · E2 E1 . Prove that AB = I. (Hint: Use Proposition 4.3 of Chapter 1.)

9. Let A be an m × n matrix. Recall that the n × m matrix B is a left inverse of A if BA = In


and a right inverse if AB = Im .
a. Show that A has a right inverse if and only if we can solve Ax = b for every b ∈ Rm if
and only if rank(A) = m.
b. Show that A has a left inverse if and only if Ax = 0 has the unique solution x = 0 if and
only if rank(A) = n. (Hint for “⇐=”: If rank(A) = n, what is the reduced echelon form
of A?)
c. Show that A has both a left inverse and a right inverse if and only if A is invertible if and
only if m = n = rank(A).

3. Linear Independence, Basis, and Dimension

Given vectors v1 , . . . , vk ∈ Rn and v ∈ Rn , it is natural to ask whether v ∈ Span(v1 , . . . , vk ).


That is, do there exist scalars c1 , . . . , ck so that v = c1 v1 + c2 v2 + · · · + ck vk ? This is in turn a
question of whether a certain (inhomogeneous) system of linear equations has a solution. As we
saw in Section 1, one is often interested in the allied question: Is that solution unique?

Example 1. Let
       
$$v_1 = \begin{bmatrix} 1 \\ 1 \\ 2 \end{bmatrix}, \quad v_2 = \begin{bmatrix} 1 \\ -1 \\ 0 \end{bmatrix}, \quad v_3 = \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}, \quad\text{and}\quad v = \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}.$$
We ask first of all whether v ∈ Span(v1 , v2 , v3 ). This is a familiar question when we recast it in
matrix notation: Let

$$A = \begin{bmatrix} 1 & 1 & 1 \\ 1 & -1 & 0 \\ 2 & 0 & 1 \end{bmatrix} \qquad\text{and}\qquad b = \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}.$$
Is the system Ax = b consistent? Immediately we write down the appropriate augmented matrix
and reduce to echelon form:
   
$$\left[\begin{array}{ccc|c} 1 & 1 & 1 & 1 \\ 1 & -1 & 0 & 1 \\ 2 & 0 & 1 & 0 \end{array}\right] \rightsquigarrow \left[\begin{array}{ccc|c} 1 & 1 & 1 & 1 \\ 0 & 2 & 1 & 0 \\ 0 & 0 & 0 & -2 \end{array}\right],$$
so the system is obviously inconsistent. The answer is: No, v is not in Span(v1 , v2 , v3 ).
What about

$$w = \begin{bmatrix} 2 \\ 3 \\ 5 \end{bmatrix}?$$
As the reader can easily check, w = 3v1 − v3 , so w ∈ Span(v1 , v2 , v3 ). What’s more, w =
2v1 − v2 + v3 , as well. So, obviously, there is no unique expression for w as a linear combination
of v1 , v2 , and v3 . But we can conclude more: setting the two expressions for w equal, we obtain

3v1 − v3 = 2v1 − v2 + v3 , i.e., v1 + v2 − 2v3 = 0.



That is, there is a nontrivial relation among the vectors v1 , v2 , and v3 , and this is the reason
we have different ways of expressing w as a linear combination of the three of them. Indeed,
since v1 = −v2 + 2v3 , we can see easily that any linear combination of v1 , v2 , and v3 is a linear
combination just of v2 and v3 :
c1 v1 + c2 v2 + c3 v3 = c1 (−v2 + 2v3 ) + c2 v2 + c3 v3 = (c2 − c1 )v2 + (c3 + 2c1 )v3 .
The vector v1 was redundant, because
Span(v1 , v2 , v3 ) = Span(v2 , v3 ).
We might surmise that the vector w can now be written uniquely as a linear combination of v2
and v3 , and this is easy to check:
   
$$[A' \mid w] = \left[\begin{array}{cc|c} 1 & 1 & 2 \\ -1 & 0 & 3 \\ 0 & 1 & 5 \end{array}\right] \rightsquigarrow \left[\begin{array}{cc|c} 1 & 1 & 2 \\ 0 & 1 & 5 \\ 0 & 0 & 0 \end{array}\right],$$
and from the fact that the matrix A′ has rank 2 we infer that the system of equations has a unique
solution. ▽
Remark. In the language of functions, if A is the standard matrix of a linear map T : Rn → Rm ,
we are interested in the image of T (i.e., the set of w ∈ Rm so that w = T (v) for some v ∈ Rn ),
and the issue of whether T is one-to-one (i.e., given w in the image, is there exactly one v ∈ Rn so
that T (v) = w?).
Generalizing the preceding example, we now recast Proposition 1.5:
Proposition 3.1. Let v1 , . . . , vk ∈ Rn and let V = Span(v1 , . . . , vk ). An arbitrary vector
v ∈ Span(v1 , . . . , vk ) has a unique expression as a linear combination of v1 , . . . , vk if and only if
the zero vector has a unique expression as a linear combination of v1 , . . . , vk , i.e.,
c1 v1 + c2 v2 + · · · + ck vk = 0 =⇒ c1 = c2 = · · · = ck = 0.
Proof. Suppose for some v ∈ V there are two different expressions
v = c1 v1 + c2 v2 + · · · + ck vk and
v = d1 v1 + d2 v2 + · · · + dk vk .

Then, subtracting, we obtain

0 = (c1 − d1 )v1 + · · · + (ck − dk )vk ,


and so the zero vector has a nontrivial representation as a linear combination of v1 , . . . , vk (by
which we mean that not all the coefficients are 0).
Conversely, suppose there is a nontrivial linear combination
0 = s1 v1 + · · · + sk vk .
Then, given any vector v ∈ V , we can express v as a linear combination of v1 , . . . , vk in several
ways: for instance, adding
v = c1 v1 + c2 v2 + · · · + ck vk and

0 = s1 v1 + s2 v2 + · · · + sk vk ,

we obtain another formula for v, namely,

v = (c1 + s1 )v1 + · · · + (ck + sk )vk .

This completes the proof. 

This discussion leads us to make the following

Definition. The (indexed) set of vectors {v1 , . . . , vk } is called linearly independent if

c1 v1 + c2 v2 + · · · + ck vk = 0 =⇒ c1 = c2 = · · · = ck = 0,

i.e., if the only way of expressing the zero vector as a linear combination of v1 , . . . , vk is the trivial
linear combination 0v1 + · · · + 0vk .
The set of vectors {v1 , . . . , vk } is called linearly dependent if it is not linearly independent, i.e.,
if there is some expression

c1 v1 + c2 v2 + · · · + ck vk = 0, where not all the ci ’s are 0.

Remark. The language is problematic here. Many mathematicians—often including the author
of this text—tend to say things like “the vectors v1 , . . . , vk are linearly independent.” But linear
independence (or dependence) is a property of the whole collection of vectors, not of the individual
vectors. What’s worse, we really should refer to an ordered list of vectors rather than to a set of
vectors: for example, any list in which some vector, v, appears twice is obviously giving a linearly
dependent collection; but the set {v, v} is indistinguishable from the set {v}. There seems to be
no ideal route out of this morass! Having said all this, we warn the gentle reader that we may
occasionally say “the vectors v1 , . . . , vk are linearly (in)dependent” where it would be too clumsy
to be more pedantic. Just stay alert!!

Remark. Here is a piece of advice: It is virtually always the case that when you are presented
with a set of vectors {v1 , . . . , vk } that you are to prove linearly independent, you should write:
“Suppose c1 v1 + c2 v2 + · · · + ck vk = 0. I must show that c1 = · · · = ck = 0.”
You then use whatever hypotheses you’re given to arrive at that conclusion.

Example 2. We wish to decide whether the vectors


     
$$v_1 = \begin{bmatrix} 1 \\ 0 \\ 1 \\ 2 \end{bmatrix}, \quad v_2 = \begin{bmatrix} 2 \\ 1 \\ 1 \\ 1 \end{bmatrix}, \quad\text{and}\quad v_3 = \begin{bmatrix} 1 \\ 1 \\ 0 \\ -1 \end{bmatrix} \in \mathbb{R}^4$$
form a linearly independent set. Suppose c1 v1 + c2 v2 + c3 v3 = 0, i.e.,
     
$$c_1\begin{bmatrix} 1 \\ 0 \\ 1 \\ 2 \end{bmatrix} + c_2\begin{bmatrix} 2 \\ 1 \\ 1 \\ 1 \end{bmatrix} + c_3\begin{bmatrix} 1 \\ 1 \\ 0 \\ -1 \end{bmatrix} = \mathbf{0}.$$

Can we conclude that c1 = c2 = c3 = 0? We recognize this as a homogeneous system of linear


equations:
 
$$\begin{bmatrix} 1 & 2 & 1 \\ 0 & 1 & 1 \\ 1 & 1 & 0 \\ 2 & 1 & -1 \end{bmatrix}\begin{bmatrix} c_1 \\ c_2 \\ c_3 \end{bmatrix} = \mathbf{0}.$$
By now we are old hands at solving such systems. We find that the echelon form of the coefficient
matrix is

$$\begin{bmatrix} 1 & 2 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix},$$
and so our system of equations in fact has infinitely many solutions. For example, we can take
c1 = 1, c2 = −1, and c3 = 1. The vectors therefore form a linearly dependent set. ▽
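Linear (in)dependence of a finite list of vectors can be tested mechanically: put the vectors in the columns of a matrix and examine its rank and null space. The sketch below is only an illustration (it assumes SymPy is available) applied to the vectors of Example 2.

    from sympy import Matrix

    # Columns are the vectors v1, v2, v3 of Example 2.
    A = Matrix([[1, 2,  1],
                [0, 1,  1],
                [1, 1,  0],
                [2, 1, -1]])

    # The columns are linearly independent exactly when Ac = 0 forces c = 0,
    # i.e. when the rank equals the number of columns.
    print(A.rank())        # 2, so the set is linearly dependent
    print(A.nullspace())   # [Matrix([[1], [-1], [1]])]: c1 = 1, c2 = -1, c3 = 1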

Example 3. Suppose u, v, w ∈ Rn . We show next that if {u, v, w} is linearly independent,


then so is {u + v, v + w, u + w}. Suppose

c1 (u + v) + c2 (v + w) + c3 (u + w) = 0.

We must show that c1 = c2 = c3 = 0. We use the distributive property to rewrite our equation as

(c1 + c3 )u + (c1 + c2 )v + (c2 + c3 )w = 0.

Since {u, v, w} is linearly independent, we may infer that

c1 + c3 = 0
c1 + c2 = 0
c2 + c3 = 0 ,

and we leave it to the reader to check that the only solution of this system of equations is, in fact,
c1 = c2 = c3 = 0, as desired. ▽

Example 4. Any time one has a list of vectors v1 , . . . , vk in which one of the vectors is the
zero vector, say v1 = 0, then the set of vectors must be linearly dependent. For the equation

1v1 = 0

is a nontrivial linear combination of the vectors yielding the zero vector. ▽

Example 5. How can two nonzero vectors u and v give rise to a linearly dependent set? By
definition, this means that there is a linear combination

au + bv = 0,

where either a ≠ 0 or b ≠ 0. Suppose a ≠ 0. Then we may write u = −(b/a)v, so u is a scalar multiple
of v. (Similarly, you may show that if b ≠ 0, v must be a scalar multiple of u.) So two linearly
dependent vectors are parallel (and vice versa).

How can a collection of three nonzero vectors be linearly dependent? As before, there must be
a linear combination
au + bv + cw = 0,
where (at least) one of a, b, and c is nonzero. Say a ≠ 0. This means that we can solve:

$$u = -\frac{1}{a}(bv + cw) = \left(-\frac{b}{a}\right)v + \left(-\frac{c}{a}\right)w,$$
so u ∈ Span(v, w). In particular, Span(u, v, w) is either a line (if all three vectors u, v, w are
parallel) or a plane. ▽

The appropriate generalization of the last example is the following useful

Proposition 3.2. Suppose v1 , . . . , vk ∈ Rn form a linearly independent set, and suppose


x ∈ Rn . Then {v1 , . . . , vk , x} is linearly independent if and only if x ∉ Span(v1 , . . . , vk ).

[Figure 3.1: adjoining a vector x that lies outside Span(v1 , . . . , vk ).]

Proof. Although Figure 3.1 suggests the result is quite plausible, we will prove the contrapos-
itive:
{v1 , . . . , vk , x} is linearly dependent if and only if x ∈ Span(v1 , . . . , vk ).
Suppose x ∈ Span(v1 , . . . , vk ). Then x = c1 v1 + c2 v2 + · · · + ck vk for some scalars c1 , . . . , ck , so

c1 v1 + c2 v2 + · · · + ck vk + (−1)x = 0,

from which we conclude that {v1 , . . . , vk , x} is linearly dependent (since at least one of the coeffi-
cients is nonzero).
Now suppose {v1 , . . . , vk , x} is linearly dependent. This means that there are scalars c1 , . . . , ck ,
and c, not all 0, so that
c1 v1 + c2 v2 + · · · + ck vk + cx = 0.
Note that we cannot have c = 0: for if c were 0, we’d have c1 v1 + c2 v2 + · · · + ck vk = 0, and linear
independence of {v1 , . . . , vk } implies c1 = · · · = ck = 0, which contradicts our assumption that
{v1 , . . . , vk , x} is linearly dependent. Therefore, c ≠ 0, and so

$$x = -\frac{1}{c}(c_1 v_1 + c_2 v_2 + \cdots + c_k v_k) = \left(-\frac{c_1}{c}\right)v_1 + \left(-\frac{c_2}{c}\right)v_2 + \cdots + \left(-\frac{c_k}{c}\right)v_k,$$
which tells us that x ∈ Span(v1 , . . . , vk ), as required. 

Proposition 3.2 has the following consequence: if {v1 , . . . , vk } is linearly independent, then
Span(v1 ) ⊊ Span(v1 , v2 ) ⊊ · · · ⊊ Span(v1 , . . . , vk ).
That is, with each additional vector, the subspace spanned gets larger. We now formalize the notion
of “size” of a subspace. But we now understand that when we have a set of linearly independent
vectors, no proper subset will yield the same span. In other words, we will have an “efficient” set
of spanning vectors (i.e., there is no redundancy in the vectors we’ve chosen: no proper subset will
do). This motivates the following

Definition. Let V ⊂ Rn be a subspace. The set of vectors {v1 , . . . , vk } is called a basis for V
if
(i) v1 , . . . , vk span V , i.e., V = Span(v1 , . . . , vk ), and
(ii) {v1 , . . . , vk } is linearly independent.
We comment that the plural of basis is bases.

Example 6. Recall that the vectors


     
$$e_1 = \begin{bmatrix} 1 \\ 0 \\ \vdots \\ 0 \\ 0 \end{bmatrix}, \quad e_2 = \begin{bmatrix} 0 \\ 1 \\ \vdots \\ 0 \\ 0 \end{bmatrix}, \quad \ldots, \quad e_n = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \\ 1 \end{bmatrix} \in \mathbb{R}^n$$
are called the standard basis for Rn . To check that they comprise a basis, we must establish
that properties (i) and (ii) above hold for V = Rn . The first is obvious: if x ∈ Rn , then x =
x1 e1 + x2 e2 + · · · + xn en . The second is not much harder. Suppose c1 e1 + c2 e2 + · · · + cn en = 0.
Then this means that

$$\mathbf{c} = \begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_n \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix},$$
and so c1 = c2 = · · · = cn = 0. ▽

Example 7. Consider the plane given by V = {x ∈ R3 : x1 − x2 + 2x3 = 0} ⊂ R3 . Our


algorithms of Section 1 tell us that the vectors
   
$$v_1 = \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix} \qquad\text{and}\qquad v_2 = \begin{bmatrix} -2 \\ 0 \\ 1 \end{bmatrix}$$
span V . Since these vectors are not parallel, we can deduce (see Example 5) that they must be
linearly independent.
For the practice, however, we give a direct argument. Suppose
   
$$c_1 v_1 + c_2 v_2 = c_1\begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix} + c_2\begin{bmatrix} -2 \\ 0 \\ 1 \end{bmatrix} = \mathbf{0}.$$

Writing out the entries explicitly, we obtain


   
$$\begin{bmatrix} c_1 - 2c_2 \\ c_1 \\ c_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix},$$
from which we conclude that c1 = c2 = 0, as required. (For future reference, we note that this
information came from the free variable “slots.”) Therefore, {v1 , v2 } is linearly independent and
gives a basis for V , as required. ▽

The following observation may prove useful.

Corollary 3.3. Let V ⊂ Rn be a subspace, and let v1 , . . . , vk ∈ V . Then {v1 , . . . , vk } is a


basis for V if and only if every vector of V can be written uniquely as a linear combination of
v1 , . . . , vk .

Proof. This is immediate from Proposition 3.1. 

Definition. When we write v = c1 v1 +c2 v2 +· · ·+ck vk , we refer to c1 , . . . , ck as the coordinates


of v with respect to the (ordered) basis {v1 , . . . , vk }.

Example 8. Consider the three vectors


     
$$v_1 = \begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix}, \quad v_2 = \begin{bmatrix} 1 \\ 1 \\ 2 \end{bmatrix}, \quad\text{and}\quad v_3 = \begin{bmatrix} 1 \\ 0 \\ 2 \end{bmatrix}.$$
Let’s take a general vector b ∈ R3 and ask first of all whether it has a unique expression as a linear
combination of v1 , v2 , and v3 . Forming the augmented matrix and row reducing, we find
   
$$\left[\begin{array}{ccc|c} 1 & 1 & 1 & b_1 \\ 2 & 1 & 0 & b_2 \\ 1 & 2 & 2 & b_3 \end{array}\right] \rightsquigarrow \left[\begin{array}{ccc|c} 1 & 0 & 0 & 2b_1 - b_3 \\ 0 & 1 & 0 & -4b_1 + b_2 + 2b_3 \\ 0 & 0 & 1 & 3b_1 - b_2 - b_3 \end{array}\right].$$
It follows from Corollary 3.3 that {v1 , v2 , v3 } is a basis for R3 , for an arbitrary vector b ∈ R3 can
be written in the form
$$b = \underbrace{(2b_1 - b_3)}_{c_1}\, v_1 + \underbrace{(-4b_1 + b_2 + 2b_3)}_{c_2}\, v_2 + \underbrace{(3b_1 - b_2 - b_3)}_{c_3}\, v_3.$$
And, what’s more,
c1 = 2b1 − b3 ,
c2 = −4b1 + b2 + 2b3 , and
c3 = 3b1 − b2 − b3
give the coordinates of b with respect to the basis {v1 , v2 , v3 }. ▽
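Finding coordinates with respect to a basis is just solving a linear system, which a computer does readily; the sketch below is only an illustration (it assumes NumPy is available, and the test vector b is of our own choosing).

    import numpy as np

    # Basis vectors of Example 8, assembled as the columns of a matrix.
    V = np.array([[1., 1., 1.],
                  [2., 1., 0.],
                  [1., 2., 2.]])

    b = np.array([1., 2., 3.])

    # Coordinates of b with respect to {v1, v2, v3}: solve V c = b.
    c = np.linalg.solve(V, b)
    print(c)                   # [-1.  4. -2.], matching 2b1-b3, -4b1+b2+2b3, 3b1-b2-b3
    assert np.allclose(V @ c, b)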

Another example, which will be quite important to us in the future, is the following

Proposition 3.4. Let A be an n × n matrix. Then A is nonsingular if and only if its column
vectors form a basis for Rn .

Proof. As usual, let’s denote the column vectors of A by a1 , a2 , . . . , an . Using Corollary 3.3,
we are to prove that A is nonsingular if and only if every vector in Rn can be written uniquely as
a linear combination of a1 , a2 , . . . , an . But this is exactly what Proposition 1.6 tells us. 

Given a subspace V ⊂ Rn , how do we know there is some basis for it? This is a consequence of
Proposition 3.2 as well.

Theorem 3.5. Any subspace V ⊂ Rn other than the trivial subspace has a basis.

Proof. Since V ≠ {0}, we choose a nonzero vector v1 ∈ V . If v1 spans V , then we know
{v1 } will constitute a basis for V . If not, choose v2 ∉ Span(v1 ). From Proposition 3.2 we infer
that {v1 , v2 } is linearly independent. If v1 , v2 span V , then {v1 , v2 } will be a basis for V . If not,
choose v3 ∉ Span(v1 , v2 ). Once again, we know that {v1 , v2 , v3 } will be linearly independent and
hence will form a basis for V if the three vectors span V . We continue in this fashion, and we are
guaranteed that the process will terminate in at most n steps, because, according to Exercise 6,
once we have n + 1 vectors in Rn , they must form a linearly dependent set. 

Once we realize that every subspace V ⊂ Rn has some basis, we are confronted with the problem
that it has many of them. For example, Proposition 3.4 gives us a way of finding zillions of bases
for Rn . As we shall now show, all bases for a given subspace have one thing in common: they all
consist of the same number of elements.

Proposition 3.6. Let V ⊂ Rn be a subspace, let {v1 , . . . , vk } be a basis for V , and let
w1 , . . . , wℓ ∈ V . If ℓ > k, then {w1 , . . . , wℓ } must be linearly dependent.

Proof. Each vector in V can be written uniquely as a linear combination of v1 , . . . , vk . So let’s


write each vector w1 , . . . , wℓ as such:

w1 = a11 v1 + a21 v2 + · · · + ak1 vk


w2 = a12 v1 + a22 v2 + · · · + ak2 vk
..
.
wℓ = a1ℓ v1 + a2ℓ v2 + · · · + akℓ vk .

We now form the k × ℓ matrix A = [aij ]. This gives the matrix equation
   
$$(\ast)\qquad \begin{bmatrix} | & | & & | \\ v_1 & v_2 & \cdots & v_k \\ | & | & & | \end{bmatrix} A = \begin{bmatrix} | & | & & | \\ w_1 & w_2 & \cdots & w_\ell \\ | & | & & | \end{bmatrix}.$$
Since ℓ > k, there cannot be a pivot in every column of A and so there is a nonzero vector c
satisfying
 
$$A\begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_\ell \end{bmatrix} = \mathbf{0}.$$

Using (∗) and associativity, we have


    
$$\begin{bmatrix} | & | & & | \\ w_1 & w_2 & \cdots & w_\ell \\ | & | & & | \end{bmatrix}\begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_\ell \end{bmatrix} = \begin{bmatrix} | & | & & | \\ v_1 & v_2 & \cdots & v_k \\ | & | & & | \end{bmatrix}\left(A\begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_\ell \end{bmatrix}\right) = \mathbf{0}.$$
That is, we have found a nontrivial linear combination
c1 w1 + · · · + cℓ wℓ = 0,
which means that {w1 , . . . , wℓ } is linearly dependent, as was claimed. 

Remark. We can easily avoid equation (∗) in its matrix form. Since
$$w_j = \sum_{i=1}^{k} a_{ij} v_i,$$

we have

$$(\ast\ast)\qquad \sum_{j=1}^{\ell} c_j w_j = \sum_{j=1}^{\ell} c_j\left(\sum_{i=1}^{k} a_{ij} v_i\right) = \sum_{i=1}^{k}\left(\sum_{j=1}^{\ell} a_{ij} c_j\right) v_i.$$

As before, since ℓ > k, there is a nonzero vector c so that Ac = 0; this choice of c makes the right-
hand side of (∗∗) the zero vector. Consequently, there is a nontrivial relation among w1 , . . . , wℓ .

This proposition leads directly to our main result.

Theorem 3.7. Let V ⊂ Rn be a subspace, and let {v1 , . . . , vk } and {w1 , . . . , wℓ } be two bases
for V . Then we have k = ℓ.

Proof. Since {v1 , . . . , vk } forms a basis for V and {w1 , . . . , wℓ } is known to be linearly in-
dependent, we use Proposition 3.6 to conclude that ℓ ≤ k. Now here’s the trick: {w1 , . . . , wℓ }
is likewise a basis for V and {v1 , . . . , vk } is known to be linearly independent, so we infer from
Proposition 3.6 that k ≤ ℓ. The only way both inequalities can hold is for k and ℓ to be equal, as
we wished to show. 
We now make the official

Definition. The dimension of a subspace V ⊂ Rn is the number of vectors in any basis for V .
We denote the dimension of V by dim V . By convention, dim{0} = 0.

As we shall see in our applications, dimension is a powerful tool. Here is the first instance.

Lemma 3.8. Suppose V and W are subspaces of Rn with the property that W ⊂ V . If
dim V = dim W , then V = W .

Proof. Let dim W = k and let {v1 , . . . , vk } be a basis for W . If W ⊊ V , then there must be
a vector v ∈ V with v ∉ W . By virtue of Proposition 3.2, we know that {v1 , . . . , vk , v} is linearly
independent, so dim V ≥ k + 1. This is a contradiction. Therefore, V = W .
The next result is quite useful.

Proposition 3.9. Let V ⊂ Rn be a k-dimensional subspace. Then any k vectors that span V
must be linearly independent and any k linearly independent vectors in V must span V .

Proof. Left to the reader in Exercise 17. 

Example 9. Let V = Span(v1 , v2 , v3 , v4 ) ⊂ R3 , where


       
$$v_1 = \begin{bmatrix} 1 \\ 1 \\ 2 \end{bmatrix}, \quad v_2 = \begin{bmatrix} 2 \\ 2 \\ 4 \end{bmatrix}, \quad v_3 = \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}, \quad\text{and}\quad v_4 = \begin{bmatrix} 3 \\ 4 \\ 7 \end{bmatrix}.$$
We want a subset of {v1 , v2 , v3 , v4 } that will give us a basis for V . Of course, this set of 4 vectors
must be linearly dependent, since V ⊂ R3 and R3 is but 3-dimensional. But let’s examine the
solutions of
c1 v1 + c2 v2 + c3 v3 + c4 v4 = 0,

or, in matrix form,


 
$$\begin{bmatrix} 1 & 2 & 0 & 3 \\ 1 & 2 & 1 & 4 \\ 2 & 4 & 1 & 7 \end{bmatrix}\begin{bmatrix} c_1 \\ c_2 \\ c_3 \\ c_4 \end{bmatrix} = \mathbf{0}.$$
As usual, we proceed to reduced echelon form:
 
$$R = \begin{bmatrix} 1 & 2 & 0 & 3 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 0 \end{bmatrix},$$
from which we find that the vectors
   
$$\begin{bmatrix} -2 \\ 1 \\ 0 \\ 0 \end{bmatrix} \qquad\text{and}\qquad \begin{bmatrix} -3 \\ 0 \\ -1 \\ 1 \end{bmatrix}$$
span the space of solutions. In particular, this tells us that
−2v1 + v2 = 0 and −3v1 − v3 + v4 = 0,
and so the vectors v2 and v4 can be expressed as linear combinations of the vectors v1 and v3 . On
the other hand, {v1 , v3 } is linearly independent (why?), so this gives a basis for V . ▽
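The procedure of Example 9, extracting a basis from a spanning set by reading off the pivot columns, is easy to automate; the sketch below is only an illustration, assuming SymPy is available.

    from sympy import Matrix

    # Columns are v1, v2, v3, v4 from Example 9.
    A = Matrix([[1, 2, 0, 3],
                [1, 2, 1, 4],
                [2, 4, 1, 7]])

    R, pivots = A.rref()
    print(R)               # matches the reduced echelon form R in the text
    print(pivots)          # (0, 2): columns 1 and 3, so {v1, v3} is a basis for V
    print(A.nullspace())   # the relation vectors (-2, 1, 0, 0) and (-3, 0, -1, 1)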

3.1. Abstract Vector Spaces. We have not yet dealt with vector spaces other than Euclidean
spaces. In general, a vector space is a set endowed with the operations of addition and scalar
multiplication, subject to the properties listed in Exercise 1.1.12. Notions of linear independence
and basis proceed analogously; the remark on p. 157 shows that dimension is well-defined in the
general setting.

Examples 10. Here are a few examples of so-called “abstract” vector spaces. Others appear
in the exercises.

(a) Let Mm×n denote the set of all m×n matrices. As we’ve seen in Proposition 4.1 of Chapter
1, Mm×n is a vector space, using the operations of matrix addition and scalar multiplication
we’ve already defined. The zero “vector” is the zero matrix O. This space can naturally
be identified with Rmn (see Exercise 24).
(b) Let F(U ) denote the collection of all real valued functions defined on some subset U ⊂ Rn .
If f ∈ F(U ) and c ∈ R, then we can define a new function cf ∈ F(U ) by multiplying the
value of f at each point by the scalar c: i.e.,

(cf )(t) = cf (t) for each t ∈ U .

Similarly, if f, g ∈ F(U ), then we can define the new function f + g ∈ F(U ) by adding the
values of f and g at each point: i.e.,

(f + g)(t) = f (t) + g(t) for each t ∈ U .

By these formulas we define scalar multiplication and vector addition in F(U ). The zero
“vector” in F(U ) is the zero function. The various properties of a vector space follow from
the corresponding properties of the real numbers (as everything is defined in terms of the
values of the function at every point t). Since an element of F(U ) is a function, F(U ) is
often called a function space.
(c) Let Rω denote the collection of all infinite sequences of real numbers. That is, an element of
Rω looks like x = (x1 , x2 , x3 , . . .), where xi ∈ R, i = 1, 2, 3, . . .. Operations are defined in the obvious
way: if c ∈ R and y = (y1 , y2 , y3 , . . .), then we set cx = (cx1 , cx2 , cx3 , . . .) and
x + y = (x1 + y1 , x2 + y2 , x3 + y3 , . . .). ▽

The vector space of functions on an open subset U ⊂ Rn has various subspaces that will be of
particular interest to us. For any k ≥ 0 we have Ck (U ), the space of Ck functions on U ; indeed, we
have the hierarchy

C∞ (U ) ⊂ · · · ⊂ Ck+1 (U ) ⊂ Ck (U ) ⊂ · · · ⊂ C2 (U ) ⊂ C1 (U ) ⊂ C0 (U ).

(That these are all subspaces follows from the standard facts that sums and scalar multiples of Ck
functions are again Ck .) We can also consider the subspaces of polynomial functions. We denote
by Pk the vector space of polynomials of degree ≤ k in one variable.
As we ask the reader to check in Exercise 26, the vector space Pk has dimension k + 1. In
general, we say a vector space is finite-dimensional if it has dimension n for some n ∈ N and
infinite-dimensional if not. The vector space C∞ (R) is infinite-dimensional, as it contains the
polynomials of arbitrarily high degree.
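One concrete way to verify that the monomials 1, x, . . . , xk are linearly independent in Pk (and hence that dim Pk = k + 1, as Exercise 26 asks) is to evaluate them at k + 1 distinct points: the resulting Vandermonde matrix is invertible. A minimal numerical sketch, assuming Python with NumPy is available:

```python
import numpy as np

k = 4
points = np.arange(k + 1.0)                    # k + 1 distinct sample points 0, 1, ..., k
V = np.vander(points, k + 1, increasing=True)  # V[i, j] = points[i] ** j

# If c0*1 + c1*x + ... + ck*x**k vanished at all k + 1 points, then V @ c = 0;
# since V is invertible, the only such c is 0, so the monomials are independent.
print(np.linalg.matrix_rank(V))   # k + 1
print(np.linalg.det(V) != 0)      # True (a Vandermonde determinant at distinct points)
```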

EXERCISES 4.3

     
1. Let v1 = (1, 2, 3), v2 = (2, 4, 5), and v3 = (2, 4, 6) ∈ R3 . Is each of the following statements correct
or incorrect? Explain.
a. The set {v1 , v2 , v3 } is linearly dependent.
b. Each of the vectors v1 , v2 , and v3 can be written as a linear combination of the others.

*2. Decide whether each of the following sets of vectors is linearly independent.
   a. {(1, 4), (2, 9)} ⊂ R2
   b. {(1, 4, 1), (2, 9, 2)} ⊂ R3
   c. {(1, 4, 0), (2, 9, 0), (3, −2, 0)} ⊂ R3
   d. {(1, 1, 1), (2, 3, 3), (0, 1, 2)} ⊂ R3
   e. {(1, 1, 1, 3), (1, 1, 3, 1), (1, 3, 1, 1), (3, 1, 1, 1)} ⊂ R4
   f. {(1, 1, 1, −3), (1, 1, −3, 1), (1, −3, 1, 1), (−3, 1, 1, 1)} ⊂ R4

3. Suppose v, w ∈ Rn and {v, w} is linearly independent. Prove that {v − w, 2v + w} is linearly


independent as well.

4. Suppose {u, v, w} ⊂ R3 is linearly independent.


a. Prove that u · (v × w) ≠ 0.
b. Prove that {u × v, v × w, w × u} is linearly independent as well.
♯ 5. Suppose v1 , . . . , vk are nonzero vectors with the property that vi · vj = 0 whenever i ≠ j. Prove
that {v1 , . . . , vk } is linearly independent. (Hint: “Suppose c1 v1 + c2 v2 + · · · + ck vk = 0.” Start
by showing c1 = 0.)
♯ 6. Suppose k > n. Prove that any k vectors in Rn must form a linearly dependent set. (So what
can you conclude if you have k linearly independent vectors in Rn ?)

7. Suppose v1 , . . . , vk ∈ Rn form a linearly dependent set. Prove that for some 1 ≤ j ≤ k we have
vj ∈ Span(v1 , . . . , vj−1 , vj+1 , . . . , vk ). That is, one of the vectors v1 , . . . , vk can be written as
a linear combination of the remaining vectors.

8. Suppose v1 , . . . , vk ∈ Rn form a linearly dependent set. Prove that either v1 = 0 or vi+1 ∈


Span(v1 , . . . , vi ) for some i = 1, 2, . . . , k − 1.

9. Let A be an m × n matrix and b1 , . . . , bk ∈ Rm . Suppose {b1 , . . . , bk } is linearly independent.


Suppose that v1 , . . . , vk ∈ Rn are chosen so that Av1 = b1 , . . . , Avk = bk . Prove that
{v1 , . . . , vk } must be linearly independent.

10. Suppose T : Rn → Rn is a linear map. Prove that if [T ] is nonsingular and {v1 , . . . , vk } is


linearly independent, then {T (v1 ), T (v2 ), . . . , T (vk )} is likewise linearly independent.

♯ 11. Suppose T : Rn → Rm is a linear map and [T ] has rank n. Suppose v1 , . . . , vk ∈ Rn and


{v1 , . . . , vk } is linearly independent. Prove that {T (v1 ), . . . , T (vk )} ⊂ Rm is likewise linearly
independent. (N.B.: If you did not explicitly make use of the assumption that rank([T ]) = n,
your proof cannot be correct. Why?)

*12. Decide whether the following sets of vectors give a basis for the indicated space.
   a. {(1, 2, 1), (2, 4, 5), (1, 2, 3)}; R3
   b. {(1, 0, 1), (1, 2, 4), (2, 2, 5), (2, 2, −1)}; R3
   c. {(1, 0, 2, 3), (0, 1, 1, 1), (1, 1, 4, 4)}; R4
   d. {(1, 0, 2, 3), (0, 1, 1, 1), (1, 1, 4, 4), (2, −2, 1, 2)}; R4

13. Find a basis for each of the given subspaces and determine its dimension.
   *a. V = Span((1, 2, 3), (3, 4, 7), (5, −2, 3)) ⊂ R3
   b. V = {x ∈ R4 : x1 + x2 + x3 + x4 = 0, x2 + x4 = 0} ⊂ R4
   c. V = Span((1, 2, 3))⊥ ⊂ R3
   d. V = {x ∈ R5 : x1 = x2 , x3 = x4 } ⊂ R5

14. In each case, check that {v1 , . . . , vn } is a basis for Rn and give the coordinates of the given
vector b ∈ Rn with respect to that basis.
   a. v1 = (2, 3), v2 = (3, 5); b = (3, 4)
   *b. v1 = (1, 0, 3), v2 = (1, 2, 2), v3 = (1, 3, 2); b = (1, 1, 2)
   c. v1 = (1, 0, 1), v2 = (1, 1, 2), v3 = (1, 1, 1); b = (3, 0, 1)
   *d. v1 = (1, 0, 0, 0), v2 = (1, 1, 1, 0), v3 = (1, 1, 3, 1), v4 = (1, 1, 1, 4); b = (2, 0, …, 1)

15. Find a basis for the intersection of the subspaces

V = Span((1, 0, 1, 1), (2, 1, 1, 2)) and W = Span((0, 1, 1, 0), (2, 0, 1, 2)) ⊂ R4 .

♯ 16. Suppose v1 , . . . , vn are nonzero, mutually orthogonal vectors in Rn .


a. Prove that they form a basis for Rn .
b. Given any x ∈ Rn , give an explicit formula for the coordinates of x with respect to the
basis {v1 , . . . , vn }.
c. Deduce from your answer to part b that x = Σ_{i=1}^{n} projvi x.

17. Prove Proposition 3.9. (Hint: Exercise 7 and Lemma 3.8 may be of help.)
♯ 18. Let V ⊂ Rn be a subspace, and suppose you are given a linearly independent set of vectors
{v1 , . . . , vk } ⊂ V . Prove that there are vectors vk+1 , . . . , vℓ ∈ V so that {v1 , . . . , vℓ } forms a
basis for V .

19. Suppose V and W are subspaces of Rn and W ⊂ V . Prove that dim W ≤ dim V . (Hint: Start
with a basis for W and apply Exercise 18.)

20. Suppose A is an n × n matrix, and let v1 , . . . , vn ∈ Rn . Suppose {Av1 , . . . , Avn } is linearly


independent. Prove that A is nonsingular.

21. *a. Suppose U and V are subspaces of Rn with U ∩ V = {0}. If {u1 , . . . , uk } is a basis for U
and {v1 , . . . , vℓ } is a basis for V , prove that {u1 , . . . , uk , v1 , . . . , vℓ } is a basis for U + V .
b. Let U and V be subspaces of Rn . Prove that if U ∩ V = {0}, then dim(U + V ) =
dim U + dim V .
c. Let U and V be subspaces of Rn . Prove that dim(U + V ) = dim U + dim V − dim(U ∩ V ).
(Hint: Start with a basis for U ∩ V , and use Exercise 18.)

22. Let T : Rn → Rm be a linear map. Define

ker(T ) = {x ∈ Rn : T (x) = 0} and


image(T ) = {y ∈ Rm : y = T (x) for some x ∈ Rn }.

a. Check that ker(T ) and image(T ) are subspaces of Rn and Rm , respectively.


b. Let {v1 , . . . , vk } be a basis for ker(T ), and, using Exercise 18, extend to a basis {v1 , . . . , vk ,
vk+1 , . . . , vn } for Rn . Prove that {T (vk+1 ), . . . , T (vn )} gives a basis for image(T ).
c. Deduce that dim ker(T ) + dim image(T ) = n.

*23. Decide whether the following sets of vectors are linearly independent.
   a. { [1 0; 0 1], [0 1; 1 0], [1 1; 1 −1] } ⊂ M2×2
b. {f1 , f2 , f3 } ⊂ P1 , where f1 (t) = t, f2 (t) = t + 1, f3 (t) = t + 2
c. {f1 , f2 , f3 } ⊂ C∞ (R), where f1 (t) = 1, f2 (t) = cos t, f3 (t) = sin t

d. {f1 , f2 , f3 } ⊂ C0 (R), where f1 (t) = 1, f2 (t) = sin2 t, f3 (t) = cos2 t


e. {f1 , f2 , f3 } ⊂ C∞ (R), where f1 (t) = 1, f2 (t) = cos t, f3 (t) = cos 2t
f. {f1 , f2 , f3 } ⊂ C∞ (R), where f1 (t) = 1, f2 (t) = cos 2t, f3 (t) = cos2 t

24. Recall that Mm×n denotes the vector space of m × n matrices.


a. Give a basis for and determine the dimension of Mm×n .
b. Show that the set of diagonal matrices, the set of upper triangular matrices, and the set
of lower triangular matrices are all subspaces of Mn×n and determine their dimensions.
c. Show that the set of symmetric matrices, S, and the set of skew-symmetric matrices, K,
are subspaces of Mn×n . What are their dimensions? Show that S + K = Mn×n . (See
Exercise 1.4.36.)
♯ 25. Let V be a vector space.
a. Let V ∗ denote the set of all linear transformations from V to R. Show that V ∗ is a vector
space.
b. Suppose {v1 , . . . , vn } is a basis for V . For i = 1, . . . , n, define fi ∈ V ∗ by

fi (a1 v1 + a2 v2 + · · · + an vn ) = ai .

Prove that {f1 , . . . , fn } gives a basis for V ∗ .


c. Deduce that whenever V is finite-dimensional, dim V ∗ = dim V .

26. Show that the set Pk of polynomials in one variable of degree ≤ k is a vector space of dimension
k + 1. (Hint: Suppose c0 + c1 x + · · · + ck xk = 0 for all x. Differentiate.)

27. Recall that f : Rn − {0} → R is homogeneous of degree k if f (tx) = tk f (x) for all t > 0.
a. Show that the set Pk,n of homogeneous polynomials of degree k in n variables is a vector
space.
b. Fix k ∈ N. Show that the monomials xi11 xi22 · · · xinn , where i1 + i2 + · · · + in = k, form a
basis for Pk,n .
c. Show that dim Pk,n = \binom{n−1+k}{k}. (Hint: It may help to remember that \binom{j}{k} = \binom{j}{j−k}.
Recall also that the binomial coefficient \binom{n}{k} = n!/(k! (n − k)!) gives the number of k-element subsets
of a given n-element set.)
d. Using the interpretation in part c, prove that \sum_{i=0}^{k} \binom{n+i}{i} = \binom{n+k+1}{k}.

4. The Four Fundamental Subspaces

Given an m × n matrix A (or, more conceptually, a linear map T : Rn → Rm ), there are four
natural subspaces to consider. It is one of our goals to understand the relations among them. We
begin with the column space and row space.

Definition. Let A be an m × n matrix with row vectors A1 , . . . , Am ∈ Rn and column vectors


a1 , . . . , an ∈ Rm . We define the column space of A to be the subspace of Rm spanned by a1 , . . . , an :

C(A) = Span(a1 , . . . , an ) ⊂ Rm .

We define the row space of A to be the subspace of Rn spanned by A1 , . . . , Am :

R(A) = Span(A1 , . . . , Am ) ⊂ Rn .

Our work in Section 1 gives an important alternative interpretation of the column space.

Proposition 4.1. Let A be an m × n matrix. Let b ∈ Rm . Then b ∈ C(A) if and only if


b = Ax for some x ∈ Rn . That is,

C(A) = {b ∈ Rm : Ax = b is consistent}.

Proof. By definition, C(A) = Span(a1 , . . . , an ), and so b ∈ C(A) if and only if b is a linear


combination of the vectors a1 , . . . , an , i.e., b = x1 a1 + · · · + xn an for some scalars x1 , . . . , xn .
Recalling our crucial observation (∗) on p. 129, we conclude that b ∈ C(A) if and only if b = Ax
for some x ∈ Rn . The final reformulation is straightforward so long as we remember that the
system Ax = b is consistent provided it has a solution. 

Remark. If we think of A as the standard matrix of a linear map T : Rn → Rm , then C(A) ⊂


Rm is the set of all the values of T , i.e., its image, denoted image(T ).

Perhaps the most natural subspace of all comes from solving a homogeneous system of linear
equations.

Definition. Let A be an m × n matrix. The nullspace of A is the set of solutions of the system
Ax = 0:
N(A) = {x ∈ Rn : Ax = 0}.

Recall (see Exercise 1.4.3) that N(A) is in fact a subspace. If we think of A as the standard matrix
of a linear map T : Rn → Rm , then N(A) ⊂ Rn is often called the kernel of T , denoted ker(T ).
We might surmise that our algorithm in Section 1 for finding the general solution of the homo-
geneous linear system Ax = 0 produces a basis for N(A).

Example 1. Let’s find a basis for the nullspace of the matrix


" #
1 2 1 −1
A= .
1 0 1 1

Of course, we bring A to its reduced echelon form


" #
1 0 1 1
R=
0 1 0 −1

and read off the general solution

x1 = −x3 − x4
x2 = x4
x3 = x3
x4 = x4 ,
§4. The Four Fundamental Subspaces 165

i.e.,
       
x1 −x3 −x4 −1 −1
x   
x4     
 2   0  1
x= =  = x3   + x4   .
 x3   x3   1  0
x4 x4 0 1
From this we see that the vectors
   
−1 −1
 0  1
   
v1 =   and v2 =  
 1  0
0 1
span N(A). On the other hand, they are clearly linearly independent, for if
       
−1 −1 −c1 − c2 0
 0  1  c2   
      0
c1   + c2  =  =  ,
 1  0  c1  0
0 1 c2 0
then c1 = c2 = 0. Thus, {v1 , v2 } gives a basis for N(A). ▽
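The computation in Example 1 is the kind of thing a computer algebra system does instantly; the sketch below (Python with SymPy, an assumption about the reader's toolkit) confirms that nullspace() returns exactly the vectors obtained by setting each free variable equal to 1 in turn.

```python
from sympy import Matrix

A = Matrix([[1, 2, 1, -1],
            [1, 0, 1,  1]])

R, pivots = A.rref()
print(R)                 # rows (1, 0, 1, 1) and (0, 1, 0, -1), pivots in columns 0 and 1
basis = A.nullspace()    # the columns (-1, 0, 1, 0) and (-1, 1, 0, 1)
for v in basis:
    assert A * v == Matrix([0, 0])   # each one really does solve Ax = 0
```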
One of the most beautiful and powerful relations among these subspaces is the following:
Proposition 4.2. Let A be an m × n matrix. Then N(A) = R(A)⊥ .
Proof. If x ∈ N(A), then, by definition, Ai · x = 0 for all i = 1, 2, . . . , m. (Remember that
A1 , . . . , Am denote the row vectors of the matrix A.) So it follows (see Exercise 1.3.3) that x
is orthogonal to any linear combination of A1 , . . . , Am , hence to any vector in R(A). That is,
x ∈ R(A)⊥ , so N(A) ⊂ R(A)⊥ . Now we need only show that R(A)⊥ ⊂ N(A). If x ∈ R(A)⊥ , this
means that x is orthogonal to every vector in R(A), so, in particular, x is orthogonal to each of
the row vectors A1 , . . . , Am . But this means that Ax = 0, so x ∈ N(A), as required. 
It is also the case that R(A) = N(A)⊥ , but we are not quite yet in a position to establish this.
Since C(A) = R(AT ), the following is immediate:
Corollary 4.3. Let A be an m × n matrix. Then N(AT ) = C(A)⊥ .
In fact, we really came across this earlier, when we found constraint equations for Ax = b to be
consistent. Just as multiplying A by x takes linear combinations of the columns of A, so then does
multiplying AT by x take linear combinations of the rows of A (perhaps it helps to think of AT x
as (xT A)T ). Corollary 4.3 is the statement that any linear combination of the rows of A that gives
0 corresponds to a constraint on C(A) and vice versa. What is, however, far from clear is whether
the vectors we obtain as coefficients of the constraint equations form a linearly independent set.
Example 2. Let

A = \begin{pmatrix} 1 & 2 \\ 1 & 1 \\ 0 & 1 \\ 1 & 2 \end{pmatrix}.

We wish to find a homogeneous system of linear equations describing C(A). That is, we seek the
equations b ∈ R4 must satisfy in order for Ax = b to be consistent. By row reduction, we find:

\begin{pmatrix} 1 & 2 & b_1 \\ 1 & 1 & b_2 \\ 0 & 1 & b_3 \\ 1 & 2 & b_4 \end{pmatrix} ⟶ \begin{pmatrix} 1 & 2 & b_1 \\ 0 & −1 & b_2 − b_1 \\ 0 & 1 & b_3 \\ 0 & 0 & b_4 − b_1 \end{pmatrix} ⟶ \begin{pmatrix} 1 & 2 & b_1 \\ 0 & 1 & b_1 − b_2 \\ 0 & 0 & −b_1 + b_2 + b_3 \\ 0 & 0 & −b_1 + b_4 \end{pmatrix},

and so the constraint equations are

−b1 + b2 + b3 = 0
−b1 + b4 = 0 .

Now, if we keep track of the row operations involved in reducing A to echelon form, we find
that

\begin{pmatrix} 1 & 0 & 0 & 0 \\ −1 & 1 & 0 & 0 \\ −1 & 1 & 1 & 0 \\ −1 & 0 & 0 & 1 \end{pmatrix} A = \begin{pmatrix} 1 & 2 \\ 0 & −1 \\ 0 & 0 \\ 0 & 0 \end{pmatrix},

from which we see that

−A1 + A2 + A3 = −A1 + A4 = 0.

Thus, we infer that

(−1, 1, 1, 0) and (−1, 0, 0, 1)

span N(AT ). On the other hand, in this instance, it is easy to see they are linearly independent
and hence give a basis for N(AT ). ▽
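The bookkeeping of Example 2, row reducing [A | b] while keeping track of the row operations, can also be phrased as: compute a basis for N(AT ) and read off one constraint equation per basis vector. A minimal sketch (again Python with SymPy, which the text does not presuppose):

```python
from sympy import Matrix

A = Matrix([[1, 2],
            [1, 1],
            [0, 1],
            [1, 2]])

left_nullspace = A.T.nullspace()   # basis for N(A^T)
print(left_nullspace)              # the columns (-1, 1, 1, 0) and (-1, 0, 0, 1)

# Each basis vector w gives the constraint w . b = 0 on C(A):
#   -b1 + b2 + b3 = 0   and   -b1 + b4 = 0,
# exactly the constraint equations found above.
for w in left_nullspace:
    assert w.T * A == Matrix([[0, 0]])   # w is orthogonal to every column of A
```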

Example 3. Let

A = \begin{pmatrix} 1 & 1 & 0 & 1 & 4 \\ 1 & 2 & 1 & 1 & 6 \\ 0 & 1 & 1 & 1 & 3 \\ 2 & 2 & 0 & 1 & 7 \end{pmatrix}.

Gaussian elimination gives us the reduced echelon form R:

R = \begin{pmatrix} 1 & 0 & −1 & 0 \\ −1 & 1 & 0 & 0 \\ 1 & −1 & 1 & 0 \\ −1 & −1 & 1 & 1 \end{pmatrix} \begin{pmatrix} 1 & 1 & 0 & 1 & 4 \\ 1 & 2 & 1 & 1 & 6 \\ 0 & 1 & 1 & 1 & 3 \\ 2 & 2 & 0 & 1 & 7 \end{pmatrix} = \begin{pmatrix} 1 & 0 & −1 & 0 & 1 \\ 0 & 1 & 1 & 0 & 2 \\ 0 & 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix}.

From this information, we wish to read off bases for each of the subspaces R(A), N(A), C(A), and
N(AT ).

Using the result of Exercise 1, R(A) = R(R), so the nonzero rows of R span R(A); now we
need only check that they form a linearly independent set. We keep an eye on the pivot "slots":
Suppose

c1 (1, 0, −1, 0, 1) + c2 (0, 1, 1, 0, 2) + c3 (0, 0, 0, 1, 1) = 0.

This means that

(c1 , c2 , −c1 + c2 , c3 , c1 + 2c2 + c3 ) = (0, 0, 0, 0, 0),

and so c1 = c2 = c3 = 0, as promised.
From the reduced echelon form R, we read off the vectors that span N(A): the general solution
of Ax = 0 is

x = (x3 − x5 , −x3 − 2x5 , x3 , −x5 , x5 ) = x3 (1, −1, 1, 0, 0) + x5 (−1, −2, 0, −1, 1),

so

(1, −1, 1, 0, 0) and (−1, −2, 0, −1, 1)

span N(A). On the other hand, these vectors are linearly independent, for if we take a linear
combination

x3 (1, −1, 1, 0, 0) + x5 (−1, −2, 0, −1, 1) = 0,

we infer (from the free variable slots) that x3 = x5 = 0. Thus, these two vectors form a basis for
N(A).
Obviously, C(A) is spanned by the five column vectors of A. But these vectors cannot be
linearly independent—that's what vectors in the nullspace of A tell us. From our vectors spanning
N(A), we know that

(∗) a1 − a2 + a3 = 0 and −a1 − 2a2 − a4 + a5 = 0.

These equations tell us that a3 and a5 can be written as linear combinations of a1 , a2 , and a4 , and
so these latter three vectors span C(A). If we can check that they form a linearly independent set,
we'll know they give a basis for C(A). We form a matrix A′ with these columns (easier: cross out
the third and fifth columns of A), and reduce it to echelon form (easier: cross out the third and
fifth columns of R). Well, we have

A′ = \begin{pmatrix} 1 & 1 & 1 \\ 1 & 2 & 1 \\ 0 & 1 & 1 \\ 2 & 2 & 1 \end{pmatrix} ⟶ \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix} = R′ ,

and so only the trivial linear combination of the columns of A′ will yield the zero vector. In
conclusion, the vectors

a1 = (1, 1, 0, 2), a2 = (1, 2, 1, 2), and a4 = (1, 1, 1, 1)

are linearly independent and span C(A).
What about N(AT )? The only row of zeroes in R arises as the linear combination

−A1 − A2 + A3 + A4 = 0

of the rows of A, so we expect the vector

v = (−1, −1, 1, 1)

to give a basis for N(AT ). ▽

We now state the formal results regarding the four fundamental subspaces.

Theorem 4.4. Let A be an m × n matrix. Let U and R, resp., denote the echelon and reduced
echelon form, resp., of A, and write EA = U (so E is the product of the elementary matrices by
which we reduce A to echelon form).
(1) The nonzero rows of U (or of R) give a basis for R(A).
(2) The vectors obtained by setting each free variable equal to 1 and the remaining free variables
equal to 0 in the general solution of Ax = 0 (which we read off from Rx = 0) give a basis
for N(A).
(3) The pivot columns of A (i.e., the columns of the original matrix A corresponding to the
pivots in U ) give a basis for C(A).
(4) The (transposes of the) rows of E that correspond to the zero rows of U give a basis for
N(AT ). (The same works with E ′ if we write E ′ A = R.)

Proof. For simplicity of exposition, let's assume that the reduced echelon form takes the shape

R = \begin{pmatrix} 1 & & & b_{1,r+1} & b_{1,r+2} & \cdots & b_{1n} \\ & \ddots & & \vdots & \vdots & & \vdots \\ & & 1 & b_{r,r+1} & b_{r,r+2} & \cdots & b_{rn} \\ & O & & & & O & \end{pmatrix},

where the first r rows carry an r × r identity block followed by the entries bij , and the last m − r
rows are zero.

(1) Since row operations are invertible, R(A) = R(U ) (see Exercise 1). Clearly the nonzero
rows of U span R(U ). Moreover, they are linearly independent because of the pivots. Let
U1 , . . . , Ur denote the nonzero rows of U ; because of our simplifying assumption on R, we
know that the pivots of U occur in the first r columns as well. Suppose now that

c1 U1 + · · · + cr Ur = 0.

The first entry of the left-hand side is c1 u11 (since the first entry of the vectors U2 , . . . , Ur
is 0 by definition of echelon form). Since u11 ≠ 0 by definition of pivot, we must have
c1 = 0. Continuing in this fashion, we find that c1 = c2 = · · · = cr = 0. In conclusion,
{U1 , . . . , Ur } forms a basis for R(U ), hence for R(A).
(2) Ax = 0 if and only if Rx = 0, which means that

x1 + b1,r+1 xr+1 + b1,r+2 xr+2 + . . . + b1n xn = 0


x2 + b2,r+1 xr+1 + b2,r+2 xr+2 + . . . + b2n xn = 0
.. .. .. .. ..
. . . . .
xr + br,r+1 xr+1 + br,r+2 xr+2 + . . . + brn xn = 0 .

Thus, an arbitrary element of N(A) can be written in the form

x = (x1 , . . . , xr , xr+1 , . . . , xn )
  = xr+1 (−b1,r+1 , . . . , −br,r+1 , 1, 0, . . . , 0) + xr+2 (−b1,r+2 , . . . , −br,r+2 , 0, 1, . . . , 0) + · · · + xn (−b1n , . . . , −brn , 0, 0, . . . , 1).

The assertion is then that the vectors

(−b1,r+1 , . . . , −br,r+1 , 1, 0, . . . , 0), (−b1,r+2 , . . . , −br,r+2 , 0, 1, . . . , 0), . . . , (−b1n , . . . , −brn , 0, 0, . . . , 1)

give a basis for N(A). They obviously span (since every vector in N(A) can be expressed
as a linear combination of them). We need to check linear independence: the key is the
pattern of 1's and 0's in the free variable "slots." Suppose

0 = xr+1 (−b1,r+1 , . . . , −br,r+1 , 1, 0, . . . , 0) + xr+2 (−b1,r+2 , . . . , −br,r+2 , 0, 1, . . . , 0) + · · · + xn (−b1n , . . . , −brn , 0, 0, . . . , 1).

Then we get xr+1 = xr+2 = · · · = xn = 0, as required.


(3) Let’s continue with the notational simplification that the pivots occur in the first r columns.
Then we need to establish the fact that the first r column vectors of the original matrix A
give a basis for C(A). These vectors form a linearly independent set, since the only solution
of
c1 a1 + · · · + cr ar = 0
is c1 = c2 = · · · = cr = 0 (look only at the first r columns of A and the first r columns of
R). It is more interesting to understand why a1 , . . . , ar span C(A). Consider each of the
basis vectors for N(A) given above: each one gives us a linear combination of the column
vectors of A that results in the zero vector. In particular, we find that

−b1,r+1 a1 − . . . − br,r+1 ar + ar+1 = 0


−b1,r+2 a1 − . . . − br,r+2 ar + ar+2 = 0
.. .. .. ..
. . . .
−b1n a1 − . . . − brn ar + an = 0 ,

from which we conclude that the vectors ar+1 , . . . , an are all linear combinations of a1 , . . . , ar .
It follows that C(A) is spanned by a1 , . . . , ar , as required.
(4) We are interested in the linear relations among the rows of A. The key point here is that
the first r rows of the echelon matrix U form a linearly independent set, whereas the last
m − r rows of U consist just of 0. Thus, N(U T ) is spanned by the last m − r standard basis
vectors for Rm . Using EA = U , we see that

AT = (E −1 U )T = U T (E −1 )T = U T (E T )−1 ,

and so

x ∈ N(AT ) ⇐⇒ x ∈ N(U T (E T )−1 ) ⇐⇒ (E T )−1 x ∈ N(U T )


⇐⇒ x = E T y for some y ∈ N(U T ).

This tells us that the last m − r rows of E span N(AT ). But these vectors are linearly
independent, since E is nonsingular. 

Remark. Referring to our earlier discussion of (†) on p. 143 and our discussion in Sections 1
and 2 of this chapter, we finally know that finding the constraint equations for C(A) will give a
basis for N(AT ). It is also worth noting that to find bases for the four fundamental subspaces of
the matrix A, we need only find the echelon form of A to deal with R(A) and C(A), the reduced
echelon form of A to deal with N(A), and the echelon form of the augmented matrix [A | b] to
deal with N(AT ).

Example 4. We want bases for R(A), N(A), C(A), and N(AT ), given the matrix

A = \begin{pmatrix} 1 & 1 & 2 & 0 & 0 \\ 0 & 1 & 1 & −1 & −1 \\ 1 & 1 & 2 & 1 & 2 \\ 2 & 1 & 3 & −1 & −3 \end{pmatrix}.

We leave it to the reader to check that the reduced echelon form of A is

R = \begin{pmatrix} 1 & 0 & 1 & 0 & −1 \\ 0 & 1 & 1 & 0 & 1 \\ 0 & 0 & 0 & 1 & 2 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix}

and that EA = U , where

E = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ −1 & 0 & 1 & 0 \\ −4 & 1 & 2 & 1 \end{pmatrix} and U = \begin{pmatrix} 1 & 1 & 2 & 0 & 0 \\ 0 & 1 & 1 & −1 & −1 \\ 0 & 0 & 0 & 1 & 2 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix}.

Alternatively, the echelon form of the augmented matrix [A | b] is

[EA | Eb] = \begin{pmatrix} 1 & 1 & 2 & 0 & 0 & b_1 \\ 0 & 1 & 1 & −1 & −1 & b_2 \\ 0 & 0 & 0 & 1 & 2 & −b_1 + b_3 \\ 0 & 0 & 0 & 0 & 0 & −4b_1 + b_2 + 2b_3 + b_4 \end{pmatrix}.
Then we have the following bases for the respective subspaces:

R(A): {(1, 1, 2, 0, 0), (0, 1, 1, −1, −1), (0, 0, 0, 1, 2)}

N(A): {(−1, −1, 1, 0, 0), (1, −1, 0, −2, 1)}

C(A): {(1, 0, 1, 2), (1, 1, 1, 1), (0, −1, 1, −1)}

N(AT ): {(−4, 1, 2, 1)}

The reader should check these all carefully. Note that dim R(A) = dim C(A) = 3, dim N(A) = 2,
and dim N(AT ) = 1. ▽
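Checking Example 4 by hand is good practice, but it is also the sort of verification a machine handles well. The sketch below (Python with SymPy assumed) computes bases for all four subspaces and confirms the dimension count and the orthogonality of R(A) and N(A):

```python
from sympy import Matrix

A = Matrix([[1, 1, 2,  0,  0],
            [0, 1, 1, -1, -1],
            [1, 1, 2,  1,  2],
            [2, 1, 3, -1, -3]])

row_basis  = A.rowspace()       # nonzero rows of the echelon form
null_basis = A.nullspace()
col_basis  = A.columnspace()    # the pivot columns of A itself
left_basis = A.T.nullspace()    # basis for N(A^T)

print(len(row_basis), len(null_basis), len(col_basis), len(left_basis))  # 3 2 3 1
assert len(row_basis) + len(null_basis) == A.cols      # rank + nullity = 5

for r in row_basis:             # every row-space vector is orthogonal to N(A)
    for v in null_basis:
        assert (r * v)[0, 0] == 0
```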

We now deduce the following results on dimension. Recall that the rank of a matrix is the
number of pivots in its echelon form.

Theorem 4.5. Let A be an m × n matrix of rank r. Then


(1) dim R(A) = dim C(A) = r.
(2) dim N(A) = n − r.
(3) dim N(AT ) = m − r.

Proof. There are r pivots and a pivot in each nonzero row of U , so dim R(A) = r. Similarly, we
have a basis vector for C(A) for every pivot, so dim C(A) = r, as well. We see that dim N(A) is equal
to the number of free variables, and this is the difference between the total number of variables (n)
and the number of pivot variables (r). Last, the number of zero rows in U is the difference between
the total number of rows (m) and the number of nonzero rows (r), so dim N(AT ) = m − r. 

An immediate corollary of Theorem 4.5 is the following. The dimension of the nullspace of A
is often called the nullity of A, denoted null(A). (Cf. also Exercise 4.3.22.)

Corollary 4.6 (Nullity-Rank Theorem). Let A be an m × n matrix. Then

null(A) + rank(A) = n.

Now we are in a position to complete our discussion of orthogonal complements.

Proposition 4.7. Let V ⊂ Rn be a k-dimensional subspace. Then dim V ⊥ = n − k.

Proof. Choose a basis {v1 , . . . , vk } for V , and let these be the rows of a k × n matrix A.
By construction, we have R(A) = V . Notice also that rank(A) = dim R(A) = dim V = k. By
Proposition 4.2, we have V ⊥ = N(A), so dim V ⊥ = dim N(A) = n − k. 

Now we arrive at our desired conclusion:

Proposition 4.8. Let V ⊂ Rn be a subspace. Then (V ⊥ )⊥ = V .

Proof. By Exercise 1.3.10, we have V ⊂ (V ⊥ )⊥ . Now we calculate dimensions: if dim V = k,


then dim V ⊥ = n − k, and dim(V ⊥ )⊥ = n − (n − k) = k. Applying Lemma 3.8, we deduce that
V = (V ⊥ )⊥ . 

We can finally bring this discussion to a close with the geometric characterization of the relations
among the four fundamental subspaces. Note that this result completes the story of Theorem 4.5.

Theorem 4.9. Let A be an m × n matrix. Then


(1) R(A)⊥ = N(A)
(2) N(A)⊥ = R(A)
(3) C(A)⊥ = N(AT )
(4) N(AT )⊥ = C(A)

Proof. These are immediate from Proposition 4.2, Corollary 4.3, and Proposition 4.8. 

Now, using Theorem 4.9, we have an alternative way of expressing a subspace V spanned by a
given set of vectors v1 , . . . , vk as the solution set of a homogeneous system of linear equations. We
use the vectors as rows of a matrix A; let {w1 , . . . , wℓ } give a basis for N(A). Since V = R(A) =
N(A)⊥ , we see that V is defined by the equations

w1 · x = 0, ..., wℓ · x = 0.

Example 5. Let

v1 = (1, 1, 0, 1) and v2 = (2, 1, 1, 2).

We wish to write V = Span(v1 , v2 ) as the solution set of a homogeneous system of linear equations.
We introduce the matrix

A = \begin{pmatrix} 1 & 1 & 0 & 1 \\ 2 & 1 & 1 & 2 \end{pmatrix}

and find that

w1 = (−1, 1, 1, 0) and w2 = (−1, 0, 0, 1)

give a basis for N(A). By our earlier comments,

V = R(A) = N(A)⊥
  = {x ∈ R4 : w1 · x = 0, w2 · x = 0}
  = {x ∈ R4 : −x1 + x2 + x3 = 0, −x1 + x4 = 0}. ▽
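The recipe in Example 5, putting the spanning vectors into the rows of a matrix and reading the defining equations from a basis for its nullspace, takes only a couple of lines with a computer algebra system (SymPy is assumed in this sketch):

```python
from sympy import Matrix

A = Matrix([[1, 1, 0, 1],    # rows are the spanning vectors v1, v2 of Example 5
            [2, 1, 1, 2]])

for w in A.nullspace():
    print(w.T)
# [[-1, 1, 1, 0]]  gives  -x1 + x2 + x3      = 0
# [[-1, 0, 0, 1]]  gives  -x1           + x4 = 0
```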

Earlier, e.g., in Example 2, we determined the constraint equations for the column space. The
column space, as we’ve seen, is the intersection of hyperplanes whose normal vectors are the basis
vectors for N(AT ). This is an application of the result that C(A) = N(AT )⊥ . As we interchange
A and AT , we turn one method of solving the problem into the other.
To close our discussion now, we introduce in Figure 4.1 a schematic diagram summarizing the
geometric relation among our four fundamental subspaces. We know that N(A) and R(A) are

[Figure 4.1: schematic of the four fundamental subspaces; N(A) and R(A) sit in Rn , N(AT ) and C(A) sit in Rm , with the maps T : Rn → Rm and S : Rm → Rn between them.]

orthogonal complements of one another in Rn and that, similarly, N(AT ) and C(A) are orthogonal
complements of one another in Rm . But there is more to be said.
Recall that, given an m × n matrix A, we have linear maps T : Rn → Rm and S : Rm → Rn
whose standard matrices are A and AT , respectively. T sends all of N(A) to 0 ∈ Rm and S sends
all of N(AT ) to 0 ∈ Rn . Now, the column space of A consists of all vectors of the form Ax for some
x ∈ Rn ; that is, it is the image of the function T . Since dim R(A) = dim C(A), this suggests that T
maps the subspace R(A) one-to-one and onto C(A). (And, symmetrically, S maps C(A) one-to-one
and onto R(A). These are, however, generally not inverse functions. Why? See Exercise 18.)

Proposition 4.10. For each b ∈ C(A), there is a unique vector x ∈ R(A) so that Ax = b.

(See Figure 4.2.)

[Figure 4.2: the solution set {x : Ax = b} is a translate of N(A) = {x : Ax = 0} and meets R(A) in exactly one point.]

Proof. Let {v1 , . . . , vr } be a basis for R(A). Then Av1 , . . . , Avr are r vectors in C(A). They
are linearly independent (by a modification of the proof of Exercise 4.3.11 that we leave to the
reader). Therefore, by Proposition 3.9, these vectors must span C(A). This tells us that every
vector b ∈ C(A) is of the form b = Ax for some x ∈ R(A) (why?). And there can be only one
such vector x because R(A) ∩ N(A) = {0}. 

Remark. There is a further geometric interpretation of the vector x ∈ R(A) that arises in the
preceding Proposition. Of all the solutions of Ax = b, it is the one of least length. Why?
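Numerically, the special solution x ∈ R(A) of Proposition 4.10 is exactly what the Moore–Penrose pseudoinverse produces, and it is indeed the solution of least length. The following sketch (Python with NumPy, not a tool this text relies on) illustrates this for the matrix of Example 5; the connection with projections is taken up at the end of Chapter 5.

```python
import numpy as np

A = np.array([[1., 1., 0., 1.],
              [2., 1., 1., 2.]])
b = A @ np.array([1., 1., 1., 0.])     # b = (2, 4) certainly lies in C(A)

x_star = np.linalg.pinv(A) @ b         # the solution of Ax = b lying in R(A)
assert np.allclose(A @ x_star, b)

# x_star is orthogonal to N(A) (basis found in Example 5), hence has least length:
for w in [np.array([-1., 1., 1., 0.]), np.array([-1., 0., 0., 1.])]:
    assert abs(x_star @ w) < 1e-10
```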

EXERCISES 4.4

*1. Show that if B is obtained from A by performing one or more row operations, then R(B) =
R(A).
 
1 2 1 1
 
2. Let A =  −1 0 3 4 .
2 2 −2 −3
a. Give constraint equations for C(A).
b. Find a basis for N(AT ).

3. For each of the following matrices A, give bases for R(A), N(A), C(A), and N(AT ). Check
dimensions
" and orthogonality.
#
1 2 3
a. A =
2 4 6
 
2 1 3
 
b. A =  4 3 5
3 3 3
" #
1 −2 1 0
c. A =
2 −4 3 −1
 
1 −1 1 1 0
 1 0 2 1 1
 
d. A =  
 0 2 2 2 0
−1 1 −1 0 −1
 
1 1 0 1 −1
 1 1 2 −1 1
 
e. A =  
 2 2 2 0 0
−1 −1 2 −3 3
 
1 1 0 5 0 −1
 0 1 1 3 −2 0
 
*f. A =  
 −1 2 3 4 1 −6 
0 4 4 12 −1 −7
4. Given each matrix A, find matrices X and Y so that C(A) = N(X) and N(A) = C(Y ).
   *a. A = [3 −1; 6 −2; −9 3]
   b. A = [1 1 0; 2 1 1; 1 −1 2]
   c. A = [1 1 1; 0 1 2; 1 1 1; 1 0 2]
5. In each case, construct a matrix with the requisite properties or explain why no such matrix
exists.
       
1 0 1 0
a.      
The column space contains 1 and 1 and the nullspace contains 0 and 1 .

1 1 1 0
   
    1 1
1 0    
0 0
*b. The column space contains  1  and  1  and the nullspace contains   and  .
1 0
1 1
0 1
   
1 1
*c.  
The column space has basis 0 and the nullspace contains 2 .

1 0
     
1 −1 1
d. 
The nullspace contains 0 ,  
2 , and the row space contains  1 .
1 1 −1
       
1 0 1 2
*e.      
The column space has basis 0 , 1 , and the row space has basis 1 , 0 .

1 1 1 1
 
1
f. The column space and the nullspace both have basis .
0
 
1
g. The column space and the nullspace both have basis  0 .
0

6. a. Construct a nonzero 3 × 3 matrix A with C(A) ⊂ N(A).


b. Construct a 3 × 3 matrix A with N(A) ⊂ C(A).
c. Can there be a 3 × 3 matrix A with N(A) = C(A)? Why or why not?
d. Can there be a 4 × 4 matrix A with N(A) = C(A)? Why or why not?
   
7. Let V ⊂ R5 be spanned by (1, 0, 1, 1, 1) and (0, 1, −1, 0, 2). Give a homogeneous system of equations having
V as its solution set.
*8. Give a basis for the orthogonal complement of each of the following subspaces of R4 :
   a. V = Span((1, 0, 3, 4), (0, 1, 2, −5))
b. W = {x ∈ R4 : x1 + 3x3 + 4x4 = 0, x2 + 2x3 − 5x4 = 0}

9. a. Give a basis for the orthogonal complement of the subspace V ⊂ R4 given by

V = {x ∈ R4 : x1 + x2 − 2x4 = 0, x1 − x2 − x3 + 6x4 = 0, x2 + x3 − 4x4 = 0}.



b. Give a basis for the orthogonal complement of the subspace

W = Span((1, 1, 0, −2), (1, −1, −1, 6), (0, 1, 1, −4)) ⊂ R4 .
−2 6 −4

c. Give a matrix B so that the subspace W defined in part b can be written in the form
W = N(B).

*10. Let A be an m × n matrix with rank r. Suppose A = BU , where U is in echelon form. Prove
that the first r columns of B give a basis for C(A). (In particular, if EA = U , where U is the
echelon form of A and E is the product of elementary matrices by which we reduce A to U ,
then the first r columns of E −1 give a basis for C(A).)

11. According to Proposition 4.10, if A is an m × n matrix, then for each b ∈ C(A), there is a
unique x ∈ R(A) with Ax = b. In each case, give a formula for that x.
   a. A = [1 2 3; 1 2 3]
   *b. A = [1 1 1; 0 1 −1]
♯ 12. Let A be an m × n matrix and B be an n × p matrix. Prove that
a. N(B) ⊂ N(AB).
b. C(AB) ⊂ C(A). (Hint: Use Proposition 4.1.)
c. N(B) = N(AB) when A is n × n and nonsingular.
d. C(AB) = C(A) when B is n × n and nonsingular.

13. Continuing Exercise 12: Let A be an m × n matrix and B be an n × p matrix.


a. Prove that rank(AB) ≤ rank(A). (Hint: See part b of the earlier exercise.)
b. Prove that if n = p and B is nonsingular, then rank(AB) = rank(A).
c. Prove that rank(AB) ≤ rank(B). (Hint: Use part a of said exercise and Theorem 4.5.)
d. Prove that if m = n and A is nonsingular, then rank(AB) = rank(B).
e. Prove that if rank(AB) = n, then rank(A) = rank(B) = n.
♯ 14. Let A be an m × n matrix. Prove that N(AT A) = N(A). (Hint: Use Exercise 12 and Exercise
1.4.32.)

15. Let A be an m × n matrix.


a. Use Theorem 4.9 to prove that N(AT A) = N(A). (Hint: You’ve already proved “⊃” in
Exercise 12. Now, if x ∈ N(AT A), then Ax ∈ C(A) ∩ N(AT ).)
b. Prove that rank(A) = rank(AT A).
c. Prove that C(AT A) = C(AT ). (Hint: You’ve already proved “⊂” in Exercise 12. Use part
b to see the two spaces have the same dimension.)

16. Suppose A is an n × n matrix with the property that A2 = A.


a. Prove that C(A) = {x ∈ Rn : x = Ax}.
b. Prove that N(A) = {x ∈ Rn : x = u − Au for some u ∈ Rn }.
c. Prove that C(A) ∩ N(A) = {0}.

d. Prove that C(A) + N(A) = Rn .

17. Suppose U and V are subspaces of Rn . Prove that (U ∩ V )⊥ = U ⊥ + V ⊥ . (Hint: Use Exercise
1.3.12 and Proposition 4.8.)

18. a. Show that if the m × n matrix A has rank 1, then there are nonzero vectors u ∈ Rm and
v ∈ Rn so that A = uvT . Describe the geometry of the four fundamental subspaces in
terms of u and v.
Pursuing the discussion on p. 174,
b. Suppose A is an m × n matrix of rank n. Show that AT A = In if and only if the column
vectors a1 , . . . , an ∈ Rn are mutually orthogonal unit vectors.
c. Suppose A is an m×n matrix of rank 1. Using the notation of part a, show that (S ◦ T )(x) =
x for each x ∈ R(A) if and only if kukkvk = 1. Interpret T geometrically.
d. Can you generalize? (See Exercise 9.1.15.)

5. The Nonlinear Case: Introduction to Manifolds

We have seen that given a linear subspace V of Rn , we can represent it either explicitly (para-
metrically) as the span of its basis vectors or implicitly as the solution set of a homogeneous system
of linear equations (i.e., the nullspace of an appropriate matrix A). Proposition 4.2 gives a geometric
interpretation of that matrix: its row vectors must span the orthogonal complement of V .
In the nonlinear case, sometimes we are just as fortunate. Given the hyperbola with equation
xy = 1, it is easy to solve (everywhere) explicitly for either x or y as a function of the other. In
the case of the circle x2 + y 2 = 1, we can solve for y as a function of x locally near any point not

on the x-axis (viz., y = ± 1 − x2 ), and for x as a function of y near any point not on the y-axis
(analogously).
But it is important to understand that going back and forth between these two approaches can
be far more difficult—if not impossible—in the nonlinear case. For example, with a bit of luck, we
can see that the parametric curve

g(t) = (t2 − 1, t(t2 − 1)), t ∈ R,

is given by the algebraic equation y 2 = x2 (x + 1) (the curve pictured in Figure 1.4(b) on p. 53).
On the other hand, the cycloid, presented parametrically as the image of the function

g(t) = (t − sin t, 1 − cos t), t ∈ R,

(see Figure 1.6 on p. 54) is obviously the graph y = f (x) for some function f , but I believe no one
can find f explicitly. Nor is there a function on R2 whose zero-set is the cycloid. Nevertheless, it is
easy to see that locally we can write x as a function of y away from the cusps. On the other hand,

given the hypocycloid x2/3 + y 2/3 = 1, we can find the parametrization

g(t) = (cos3 t, sin3 t), t ∈ [0, 2π],

but giving an explicit (global) parametrization of the curve y 2 − x3 + x = 0 in terms of elementary
functions is impossible.

[Figure 5.1: the curve y 2 = x3 − x.]

However, as Figure 5.1 suggests, away from the points lying on the x-axis,
we can write y as a function of x (explicitly in this case: y = ±√(x3 − x)), and near each of those
three points we can write x as a function of y (explicitly only if you know how to solve the cubic
equation x3 − x = y 2 explicitly).
Given the hyperplane a · x = 0 in Rn , we can solve for xn as a function of x1 , . . . , xn−1 —i.e.,
we can represent the hyperplane as a graph over the x1 · · · xn−1 -plane—if and only if an ≠ 0 (and,
likewise, we can solve for xk in terms of the remaining variables if and only if ak ≠ 0). More
generally, given a system of linear equations, we apply Gaussian elimination and solve for the pivot
variables as functions of the free variables. In particular, as Theorem 4.4 shows, if rank(A) = r,
then we solve for the r pivot variables as functions of the n − r free variables.
Now, since the derivative gives us the best linear approximation of a function, we expect that
if the tangent plane to a surface at a point is a graph, then so locally should be the surface, as
depicted in Figure 5.2. We suggested in Section 4 of Chapter 3 that, given a level surface f = c
of a differentiable function f : Rn → R, the vector ∇f (a)—provided it is nonzero—should be the
normal vector to the tangent plane at a; equivalently, the subspace of Rn parallel to the tangent
plane should be the nullspace of the matrix [Df (a)]. To establish these facts we need the Implicit
Function Theorem, whose proof we delay to Chapter 6.
Theorem 5.1 (Implicit Function Theorem, simple case). Suppose U ⊂ Rn is open, a ∈ U ,
and f : U → R is C1 . Suppose that f (a) = 0 and ∂f /∂xn (a) ≠ 0. Then there are neighborhoods V of
(a1 , . . . , an−1 ) and W of an and a C1 function φ : V → W so that

f (x) = 0, (x1 , . . . , xn−1 ) ∈ V, and xn ∈ W ⇐⇒ xn = φ(x1 , . . . , xn−1 ).

That is, near a, the level surface f = 0 can be expressed as a graph over the x1 · · · xn−1 -plane; i.e.,
near a, the equation f = 0 defines xn implicitly as a function of the remaining variables.

[Figure 5.2: near a, the level set f (x) = c is the graph xn = φ(x1 , . . . , xn−1 ) over the neighborhood V in the x1 · · · xn−1 -plane.]

More generally, provided Df (a) ≠ 0, we know that some partial derivative ∂f /∂xk (a) ≠ 0, and so
locally the equation f = 0 expresses xk implicitly as a function of x1 , . . . , xk−1 , xk+1 , . . . , xn .

Example 1. Consider the curve

f (x, y) = y 3 − 3y − x = 0,

as shown in Figure 5.3. Although it is globally a graph of x as a function of y, we see that
∂f /∂y = 3(y 2 − 1) = 0 at the points ±(−2, 1). Away from these points, y is given (implicitly) locally
as a function of x. We recognize these as the three (C1 ) local inverse functions φ1 , φ2 , and φ3 of
g(x) = x3 − 3x, defined, respectively, on the intervals (−2, ∞), (−2, 2), (−∞, 2). ▽

[Figure 5.3: the curve x = y 3 − 3y, with the three local branches y = φ1 (x), y = φ2 (x), y = φ3 (x).]

Example 2. Consider the surface

f (x, y, z) = z 2 + xz + y = 0,

pictured in Figure 5.4. Note first of all that it is globally a graph: y = −(z 2 + xz). On the other
hand, ∂f /∂z = 2z + x = 0 on f = 0 precisely when x = −2z and y = z 2 . That is, away from points
of the form (−2t, t2 , t) for some t ∈ R, we can locally write z = φ(x, y). Of course, it doesn't take a
wizard to do so: we have

z = (−x ± √(x2 − 4y))/2,

and away from points of the designated form we can choose either the positive or negative square
root. It is along the curve 4y = x2 (in the xy-plane) that the two roots of this quadratic equation
in z coalesce. (Note that this curve is the projection of the locus of points on the surface where
∂f /∂z = 0.) ▽

[Figure 5.4: the surface z 2 + xz + y = 0.]
Now we can legitimize (finally) the process of implicit differentiation introduced in beginning
calculus classes. Suppose U ⊂ Rn is open, a ∈ U , f : U → R is C1 , and ∂f /∂xn (a) ≠ 0. For convenience
here, let's write x̄ = (x1 , . . . , xn−1 ) and ā = (a1 , . . . , an−1 ). Then, by Theorem 5.1, f = 0 defines xn
implicitly as a C1 function φ(x̄) near ā. Then we have

Lemma 5.2. For j = 1, . . . , n − 1, we have

∂φ/∂xj (ā) = − (∂f /∂xj (a)) / (∂f /∂xn (a)).

Proof. Define g : V → Rn by g(x̄) = (x̄, φ(x̄)). Then (f ◦ g)(x̄) = 0 for all x̄ ∈ V . Thus, by the
Chain Rule, D(f ◦ g)(ā) = Df (g(ā))Dg(ā) = 0, so that

\begin{pmatrix} ∂f/∂x_1 & \cdots & ∂f/∂x_{n−1} & ∂f/∂x_n \end{pmatrix} \begin{pmatrix} 1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & 1 \\ ∂φ/∂x_1 & \cdots & ∂φ/∂x_{n−1} \end{pmatrix} = \begin{pmatrix} 0 & \cdots & 0 \end{pmatrix}.

(Here all the derivatives of φ are evaluated at ā, and all the derivatives of f are evaluated at
g(ā) = a.) In particular, for any j = 1, . . . , n − 1, we have

∂f /∂xj (a) + ∂f /∂xn (a) · ∂φ/∂xj (ā) = 0,

from which the result is immediate.
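Lemma 5.2 is easy to sanity-check numerically on the surface z 2 + xz + y = 0 of Example 2, where we can also solve for z explicitly. The sketch below (NumPy assumed) compares a difference quotient of the explicit branch φ with the quotient −(∂f /∂x)/(∂f /∂z):

```python
import numpy as np

def phi(x, y):
    # the explicit branch of z**2 + x*z + y = 0 with the positive square root
    return (-x + np.sqrt(x**2 - 4*y)) / 2

x0, y0 = 1.0, -1.0
z0 = phi(x0, y0)

fx = z0            # df/dx = z       at (x0, y0, z0)
fz = 2*z0 + x0     # df/dz = 2z + x  at (x0, y0, z0)

h = 1e-6
numeric = (phi(x0 + h, y0) - phi(x0 - h, y0)) / (2*h)   # central difference for d(phi)/dx
formula = -fx / fz                                      # Lemma 5.2
print(numeric, formula)    # the two agree to about ten significant digits
```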

Now we can officially prove our assertion from Chapter 3.

Proposition 5.3. Suppose U ⊂ Rn is open, a ∈ U , f : U → R is C1 , and Df (a) ≠ 0. Suppose
f (a) = c. Then the tangent hyperplane at a of the level surface M = f −1 ({c}) is given by

Ta M = {x ∈ Rn : Df (a)(x − a) = 0};

that is, ∇f (a) is normal to the tangent hyperplane.

Proof. Since Df (a) ≠ 0, we may assume without loss of generality that ∂f /∂xn (a) ≠ 0. Applying
Theorem 5.1 to the function f − c, we know that M can be expressed near a as the graph xn = φ(x̄)
for some C1 function φ. Now, the tangent plane to the graph xn = φ(x̄) at a = (ā, φ(ā)) is the graph
of Dφ(ā), translated so that it passes through a:

xn − an = Dφ(ā)(x̄ − ā) = Σ_{j=1}^{n−1} ∂φ/∂xj (ā)(xj − aj ) = − Σ_{j=1}^{n−1} (∂f /∂xj (a) / ∂f /∂xn (a)) (xj − aj ) (by Lemma 5.2),

and so, by simple algebra, we obtain

Σ_{j=1}^{n−1} ∂f /∂xj (a)(xj − aj ) + ∂f /∂xn (a)(xn − an ) = Df (a)(x − a) = 0,

as required.

From Theorem 5.1 we infer that if f : Rn → R is C1 and ∇f ≠ 0 on the level surface M =


f −1 ({c}), then at each point a ∈ M , we can locally represent M as a graph over (at least) one of
the n coordinate hyperplanes. We call such a set M a smooth hypersurface or (n − 1)-dimensional
manifold. More generally, a subset M ⊂ Rn is an (n − m)-dimensional manifold if each point has
a neighborhood that is a C1 graph over some (n − m)-dimensional coordinate plane. The general
version of the Implicit Function Theorem, which we shall prove in Chapter 6, tells us that this is true
whenever M is the level set of a C1 function F : Rn → Rm with the property that rank(DF(x)) = m
at every point x ∈ M . Moreover, generalizing the result of Proposition 5.3, the (n−m)-dimensional
tangent plane of M at a point a is then obtained by translating the (n − m)-dimensional subspace
N([DF(a)]) so that it passes through a.

[Figure 5.5: the intersection of the cylinders x2 + y 2 = a2 and x2 + z 2 = b2 .]

Example 3. Suppose a, b > 0. Consider the intersection M of the cylinders x2 + y 2 = a2 and
x2 + z 2 = b2 . We claim that so long as a ≠ b, M is a smooth curve (1-dimensional manifold), as
pictured in Figure 5.5. If we define F : R3 → R2 by

F(x, y, z) = (x2 + y 2 − a2 , x2 + z 2 − b2 ),

then M = F−1 ({0}). To see that M is a 1-dimensional manifold, we need only check that
rank(DF(x)) = 2 for every x ∈ M . We have

DF(x, y, z) = \begin{pmatrix} 2x & 2y & 0 \\ 2x & 0 & 2z \end{pmatrix} = 2 \begin{pmatrix} x & y & 0 \\ x & 0 & z \end{pmatrix}.

If x ≠ 0, this matrix will have 2 pivots, since y and z can't be simultaneously 0. If x = 0, then
both y and z are nonzero, and once again the matrix has 2 pivots. Thus, as claimed, the rank of
DF(x) is 2 for every x ∈ M , and so M is a smooth curve. ▽
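The rank condition in Example 3 can be spot-checked by machine as well. With a = 1 and b = 2, the sketch below (NumPy assumed) samples points of M and confirms that DF has rank 2 at each of them:

```python
import numpy as np

a, b = 1.0, 2.0    # radii of the two cylinders (a != b)

def DF(x, y, z):
    return np.array([[2*x, 2*y, 0.0],
                     [2*x, 0.0, 2*z]])

# Points of M: x = a cos t, y = a sin t on the first cylinder, z = sqrt(b**2 - x**2) > 0.
for t in np.linspace(0.0, 2*np.pi, 25):
    x, y = a*np.cos(t), a*np.sin(t)
    z = np.sqrt(b**2 - x**2)
    assert np.linalg.matrix_rank(DF(x, y, z)) == 2
```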

EXERCISES 4.5

1. Can one solve for one of the variables in terms of the other to express each of the following as
a graph? What about locally?
a. xy = 0
b. 2 sin(xy) = 1

2. Decide whether each of the following is a smooth curve (1-dimensional manifold). If not, what
are the trouble points?
a. y 2 − x3 + x = 0
b. y 2 − x3 − x2 = 0
c. z − xy = y − x2 = 0
d. x2 + y 2 + z 2 − 1 = x2 − x + y 2 = 0
e. x2 + y 2 + z 2 − 1 = z 2 − xy = 0

*3. Let f (x, y, z) = xy 2 + sin(xz) + ez and a = (1, −1, 0).
   a. Show that the equation f = 2 defines z as a C1 function z = φ(x, y) near a.
   b. Find ∂φ/∂x (1, −1) and ∂φ/∂y (1, −1).
   c. Find the equation of the tangent plane of the surface f −1 ({2}) at a in two ways.

4. Suppose h : R2 → R is C1 and ∂h/∂x2 ≠ 0. Show that the equation h(y/x, z/x) = 0 defines z (locally)
implicitly as a C1 function z = φ(x, y) and show that

x ∂φ/∂x + y ∂φ/∂y = φ(x, y).

5. Prove that S n−1 = {x ∈ Rn : ‖x‖ = 1} is an (n − 1)-dimensional manifold. (Hint: Note that
‖x‖ = 1 ⇐⇒ ‖x‖2 = 1.)

*6. Let f : R3 → R be given by

f (x, y, z) = z 2 + 4x3 z − 6xyz + 4y 3 − 3x2 y 2 .

Is M = f −1 ({0}) a smooth surface (2-dimensional manifold)? If not, at what points does it
fail to be so?

7. Show that the intersection of the surfaces x2 + 2y 2 + 3z 2 = 9 and x2 + y 2 = z 2 is a smooth
curve. Find its tangent line at the point a = (1, 1, √2).

8. Investigate what happens in Example 3 when a = b.

9. Show that the set of nonzero singular 2 × 2 matrices is a 3-dimensional manifold in M2×2 = R4 .
 
10. Consider the curve f (x, y) = 4y 3 − 3y − x = 0.
   a. Sketch the curve.
   b. Check that y is given (locally) by the following C1 functions of x on the given intervals:
      φ1 (x) = (1/2)((x + √(x2 − 1))^{1/3} + (x + √(x2 − 1))^{−1/3}), x ∈ (1, ∞)
      φ2 (x) = cos((1/3) arccos x), x ∈ (−1, 1)
      Give the remaining functions (two defined on (−1, 1), one on (−∞, −1)).
   c. Show that the function φ : (−1, ∞) → R defined by
      φ(x) = φ2 (x) for x ∈ (−1, 1), φ(1) = 1, and φ(x) = φ1 (x) for x ∈ (1, ∞)
      is C1 and that the value of φ′ (1) agrees with that given by Lemma 5.2.

*11. Let M = {x ∈ R4 : x1² + x2² + x3² + x4² = 1, x1 x2 = x3 x4 }.
   a. Show that M is a smooth surface (2-dimensional manifold).
   b. Find the tangent plane of M at a = (1, 0, 0, 0) and at a = (1/2, −1/2, −1/2, 1/2).

12. Suppose f : R3 → R is C2 and ∂f /∂z (a) ≠ 0, so that f = 0 defines z implicitly as a C2 function φ
of x and y near a. Show that

∂²φ/∂x² (a) = − [ (∂²f /∂z²)(∂f /∂x)² − 2 (∂²f /∂x∂z)(∂f /∂x)(∂f /∂z) + (∂²f /∂x²)(∂f /∂z)² ] / (∂f /∂z)³ ,

where all the partial derivatives on the right-hand side are evaluated at a.

13. Consider the three (pairwise) skew lines

ℓ1 : x = s (1, 0, 0)
ℓ2 : x = (0, 1, 0) + t (1, 0, 1)
ℓ3 : x = (1, 2, 2) + u (1, 0, 2).

Show that through each point of ℓ3 there is a single line that intersects both ℓ1 and ℓ2 . Now,
find the equation of the surface formed by all the lines intersecting the three lines ℓ1 , ℓ2 , and
ℓ3 . Is it everywhere smooth? Sketch it.

14. Suppose X ⊂ Rn is a k-dimensional manifold and Y ⊂ Rp is an ℓ-dimensional manifold. Prove


that

X × Y = {(x, y) ∈ Rn × Rp : x ∈ X and y ∈ Y}
is a (k + ℓ)-dimensional manifold in Rn+p . (Hint: Recall that X is locally a graph over a k-
dimensional coordinate plane in Rn and Y is locally a graph over an ℓ-dimensional coordinate
plane in Rp .)

15. a. Suppose A is an n×(n+1) matrix of rank n. Show that the one-dimensional solution space
of Ax = b varies continuously with b ∈ Rn . (First you must decide what this means!)
b. Generalize.
CHAPTER 5
Extremum Problems
In this chapter we turn to one of the standard topics in differential calculus, solving maxi-
mum/minimum problems. In single-variable calculus, the strategy is to invoke the Maximum Value
Theorem (which guarantees that a continuous function on a closed interval achieves its maximum
and minimum) and then to examine all critical points and the endpoints of the interval. In problems
that are posed on open intervals, one must work harder to understand the global behavior of the
function. For example, it is not too hard to prove that if a differentiable function has precisely one
critical point on an interval and that critical point is a local maximum point, then it must indeed
be the global maximum point. As we shall see, all of these issues are—not surprisingly—rather
more subtle in higher dimensions. But just to stimulate the reader’s geometric intuition, we pose
a direct question here.
Query: Suppose f : R2 → R is C1 and there is exactly one point a at which the
tangent plane of the graph of f is horizontal. Suppose a is a local minimum point.
Must it be a global minimum point?
We close the chapter with a discussion of projections and inconsistent linear systems, along with a
brief treatment of inner product spaces.

1. Compactness and the Maximum Value Theorem

In Section 2 of Chapter 2 we introduced the basic topological notions of open and closed sets
and sequences. Here we return to a few more questions of the topology of Rn in order to frame the
higher-dimensional version of the Maximum Value Theorem. Let’s begin by reminding ourselves
why a closed interval is needed in the case of a continuous function of one variable: As Figure 1.1

[Figure 1.1: continuous functions with no maximum value on an interval missing an endpoint and on an unbounded interval.]

illustrates, when an endpoint is missing or the interval extends to infinity, the function may have
no maximum value. We now make the “obvious” definition in higher dimensions:
Definition. We say S ⊂ Rn is bounded if all the points of S lie in some ball centered at the
origin, i.e., if there is a constant M so that ‖x‖ ≤ M for all x ∈ S. We say S ⊂ Rn is compact if it

is a bounded, closed subset. That is, all the points of S lie in some ball centered at the origin, and
any convergent sequence of points in S converges to a point in S.
Examples 1. We saw in Example 6 of Chapter 2, Section 2 that a closed interval in R is a
closed subset, and it is obviously bounded, so it is in fact compact. Here are a few more examples.
(a) The unit sphere S n−1 = {x ∈ Rn : ‖x‖ = 1} is compact. Indeed, by Corollary 3.7 of Chapter
2, any level set of a continuous function is closed, so provided we have a bounded set, it will
also be compact. (Note that we write S n−1 because the sphere is an (n − 1)-dimensional
manifold, as Exercise 4.5.5 shows.)
(b) Any rectangle [a1 , b1 ] × · · · × [an , bn ] ⊂ Rn is compact. This set is obviously bounded, and
is closed because of Exercise 2.2.4.
(c) The set of 2 × 2 matrices of determinant 1 is a closed subset of R4 (because the determinant
is a polynomial expression in the entries of the matrix), but is not compact. The set is
unbounded, as we can take matrices of the form [k 0; 0 1/k] for arbitrarily large k. ▽
One of the most important features of a compact set is the following
Theorem 1.1. If X ⊂ Rn is compact and {xk } is a sequence of points in X, then there is a
convergent subsequence {xkj } (which a fortiori converges to a point in X).
Proof. We first prove that any sequence of points in a rectangle [a1 , b1 ]× · · · × [an , bn ] ⊂ Rn has
a convergent subsequence. (This was the result of Exercise 2.2.15, but the argument is sufficiently
subtle that we include the proof here.) We proceed by induction on n.
Step (i): Suppose n = 1. Given a sequence {xk } of real numbers with a ≤ xk ≤ b for all k, we
claim that there is a convergent subsequence. If there are only finitely many distinct numbers xk ,
this is easy: at least one value must be taken on infinitely often, and we choose k1 < k2 < . . . so
that xk1 = xk2 = . . ..
If there are infinitely many distinct numbers among the xk , then we use the famous “successive
bisection” argument. Let I0 = [a, b]. There must be infinitely many distinct elements of our
sequence either to the left of the midpoint of I0 or to the right; let I1 = [a1 , b1 ] be the half that
contains infinitely many (if both do, let’s agree to choose the left half). Choose xk1 ∈ I1 . At
the next step, there must be infinitely many distinct elements of our sequence either to the left
or to the right of the midpoint of I1 . Let I2 = [a2 , b2 ] be the half that contains infinitely many
(and choose the left half if both do), and choose xk2 ∈ I2 with k1 < k2 . Continue this process
inductively. Suppose we have the interval Ij = [aj , bj ] containing infinitely many distinct elements
of our sequence, as well as k1 < k2 < · · · < kj with xkℓ ∈ Iℓ for ℓ = 1, 2, . . . , j. Then there must be
infinitely many distinct elements of our sequence either to the left or to the right of the midpoint of
the interval Ij , and we let Ij+1 = [aj+1 , bj+1 ] be the half that contains infinitely many (once again
choosing the left half if both do). We also choose xkj+1 ∈ Ij+1 with kj < kj+1 .
At the end of all this, why does the subsequence {xkj } converge? Well, in fact, we know what
its limit must be. The set of left endpoints aj is nonempty and bounded above by b, hence has a
least upper bound, α. First of all, the left endpoints aj must converge to α, because (see Figure
1.2)
a1 ≤ a2 ≤ · · · ≤ aj ≤ · · · ≤ α ≤ · · · ≤ bj ≤ · · · ≤ b2 ≤ b1 ,

[Figure 1.2: the nested intervals I1 ⊃ I2 ⊃ I3 ⊃ · · · produced by successive bisection, closing down on α.]

and so α − aj ≤ bj − aj = (b − a)/2j → 0 as j → ∞. But since α and xkj both lie in the interval
[aj , bj ], it follows that |α − xkj | ≤ bj − aj → 0 as j → ∞.
Step (ii): Suppose now n ≥ 2 and we know the result to be true in Rn−1 . We introduce some
notation: given x = (x1 , . . . , xn ) ∈ Rn , we write x̄ = (x1 , . . . , xn−1 ) ∈ Rn−1 . Given a sequence {xk } of points
in the rectangle [a1 , b1 ] × · · · × [an , bn ] ⊂ Rn , consider the sequence {x̄k } of points in the rectangle
[a1 , b1 ] × · · · × [an−1 , bn−1 ] ⊂ Rn−1 . By our induction hypothesis, there is a convergent subsequence
{x̄kj }. Now the sequence of nth coordinates of the corresponding vectors xkj , lying in the closed
interval [an , bn ], has in turn a convergent subsequence, indexed by kj1 < kj2 < · · · < kjℓ < · · ·.
But then, by Exercises 2.2.6 and 2.2.2, it now follows that the subsequence {xkjℓ } converges, as
required.
Step (iii): Now we turn to the case of our general compact subset X. Since it is bounded, it is
contained in some ball B(0, R) centered at the origin, hence in some cube [−R, R] × · · · × [−R, R].
Thus, given a sequence {xk } of points in X, it lies in this rectangle, and hence by what we’ve
already proved has a convergent subsequence. The limit of that subsequence is, of course, a point
of the rectangle, but must in fact lie in X, since X is also closed. This completes the proof. 

The result that is the cornerstone of our work in this chapter is the following

Theorem 1.2 (Maximum Value Theorem). Let X ⊂ Rn be compact, and let f : X → R be a continuous function.¹ Then f takes on its maximum and minimum values; that is, there are points
y and z ∈ X so that
f (y) ≤ f (x) ≤ f (z) for all x ∈ X .

Proof. First we show that f is bounded (by which we mean that the set of its values is a
bounded subset of R). Assume to the contrary that the values of f are arbitrarily large. Then for
each k ∈ N there is a point xk ∈ X so that f (xk ) > k. By Theorem 1.1, since X is compact, the
sequence {xk } has a convergent subsequence, say xkj → a. Since f is continuous, by Proposition 3.6 of Chapter 2, f (a) = lim_{j→∞} f (xkj ), but this is impossible, since f (xkj ) → ∞ as j → ∞. An
identical argument shows that the values of f are bounded below as well.
¹ Although we have not heretofore defined continuity of a function defined on an arbitrary subset of Rn , there is no serious problem. We say f : X → R is continuous at a ∈ X if, given any ε > 0, there is δ > 0 so that |f (x) − f (a)| < ε whenever kx − ak < δ and x ∈ X.

Since the set of values of f is bounded above, it has a least upper bound, M . By the definition
of least upper bound, for each k ∈ N there is xk ∈ X so that M − f (xk ) < 1/k. As before, since
X is compact, the sequence {xk } has a convergent subsequence, say xkj → z. Then, by continuity, f (z) = lim_{j→∞} f (xkj ) = M , so f takes on its maximum value at z. An identical argument shows that f takes on its minimum value as well. □

Given any linear map T : Rn → Rm , the function
f : S n−1 → R, f (x) = kT (x)k
is continuous (see Exercises 2.3.2 and 2.3.7 and Proposition 3.5 of Chapter 2). Therefore, by Theorem 1.2, f takes on its maximum value, which we denote by kT k, called the norm of T :
kT k = max_{kxk=1} kT (x)k.

Since T is linear, the following formula follows immediately:

Proposition 1.3. Let T : Rn → Rm be a linear map. Then for any x ∈ Rn , we have

kT (x)k ≤ kT kkxk.

Moreover, for any scalar c we have kcT k = |c|kT k; and if S : Rn → Rm is another linear map, we
have kS + T k ≤ kSk + kT k.

Proof. There is nothing to prove when x = 0. When x ≠ 0, we have
kT (x/kxk)k ≤ kT k,
by definition of the norm, and so, using the linearity of T , we have
kT (x)k = kT ((x/kxk) kxk)k = kxk kT (x/kxk)k ≤ kT k kxk,
as required.
That max_{kxk=1} kcT (x)k = |c| max_{kxk=1} kT (x)k = |c|kT k is evident. Last, since
k(S + T )(x)k ≤ kS(x)k + kT (x)k,
we have
max_{kxk=1} k(S + T )(x)k ≤ max_{kxk=1} (kS(x)k + kT (x)k) ≤ max_{kxk=1} kS(x)k + max_{kxk=1} kT (x)k = kSk + kT k. □

We will compute a few nontrivial examples of the norm of a linear map in the Exercises of Section
4, but in the meantime we have the following.

Example 2. Let A be an n × n diagonal matrix, with diagonal entries d1 , . . . , dn . Then for any x ∈ S n−1 we have
kAxk^2 = (d1 x1 )^2 + (d2 x2 )^2 + · · · + (dn xn )^2 ≤ max(d1^2 , . . . , dn^2 )(x1^2 + · · · + xn^2 ) = max(d1^2 , . . . , dn^2 ).
Note, moreover, that this maximum value is achieved, for if max(|d1 |, |d2 |, . . . , |dn |) = |di |, then Aei = di ei and kAei k = |di |. Thus, we conclude that
kAk = max(|d1 |, |d2 |, . . . , |dn |). ▽
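
For readers who wish to experiment, the norm of a concrete matrix can be estimated numerically by sampling unit vectors; for a diagonal matrix the answer max(|d1 |, . . . , |dn |) is easy to check. The Python sketch below is illustrative only; it also compares with numpy's built-in spectral norm, which computes the same quantity (the largest singular value of A).

```python
import numpy as np

rng = np.random.default_rng(0)

def operator_norm_estimate(A, samples=20000):
    """Estimate max over unit vectors x of |Ax| by random sampling."""
    X = rng.normal(size=(samples, A.shape[1]))
    X /= np.linalg.norm(X, axis=1, keepdims=True)      # random unit vectors
    return np.max(np.linalg.norm(X @ A.T, axis=1))

D = np.diag([3.0, -5.0, 2.0])
print(operator_norm_estimate(D))       # close to 5 = max(|d_1|, |d_2|, |d_3|)
print(np.linalg.norm(D, 2))            # exactly 5.0 (largest singular value)
```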

For future reference, we include the following important and surprising result.

Theorem 1.4 (Uniform Continuity Theorem). Let X ⊂ Rn be compact and let f : X → R be continuous. Then f is uniformly continuous; i.e., given ε > 0, there is δ > 0 so that whenever kx − yk < δ, x, y ∈ X, we have |f (x) − f (y)| < ε.

Proof. We argue by contradiction. Suppose that for some ε0 > 0 there were no such δ > 0.
Then for every m ∈ N, we could find xm , ym ∈ X with kxm −ym k < 1/m and |f (xm )−f (ym )| ≥ ε0 .
Since X is compact, we may choose a convergent subsequence xmk → a. Now since kxm − ym k → 0
as m → ∞, it must be the case that ymk → a as well. Since f is continuous at a, given ε0 > 0,
there is δ0 > 0 so that whenever kx − ak < δ0 , we have |f (x) − f (a)| < ε0 /2. By the triangle
inequality, whenever k is sufficiently large that kxmk − ak < δ0 and kymk − ak < δ0 , we have

|f (xmk ) − f (ymk )| ≤ |f (xmk ) − f (a)| + |f (ymk ) − f (a)| < ε0 ,

contradicting our hypothesis that |f (xm ) − f (ym )| ≥ ε0 for all m. 

EXERCISES 5.1

*1. Which of the following are compact subsets of the given Rn ? Give your reasoning. (Identify the space of all n × n matrices with R^{n^2} .)
a. {(x, y) ∈ R2 : x^2 + y^2 = 1}
b. {(x, y) ∈ R2 : x^2 + y^2 ≤ 1}
c. {(x, y) ∈ R2 : x^2 − y^2 = 1}
d. {(x, y) ∈ R2 : x^2 − y^2 ≤ 1}
e. {(x, y) ∈ R2 : y = sin(1/x) for some 0 < x ≤ 1}
f. {(e^t cos t, e^t sin t) ∈ R2 : t ∈ R}
g. {(e^t cos t, e^t sin t) ∈ R2 : t ≤ 0}
h. {(x, y, z) ∈ R3 : x^2 + y^2 + z^2 ≤ 1}
i. {(x, y, z) ∈ R3 : x^3 + y^3 + z^3 ≤ 1}
j. {3 × 3 matrices A : det A = 1}
k. {2 × 2 matrices A : AT A = I}
l. {3 × 3 matrices A : AT A = I}
2. If X ⊂ Rn is not compact, then show that there is an unbounded continuous function f : X → R.

3. Let T : Rn → R be a linear map. Prove that there is a vector a ∈ Rn so that T (x) = a · x, and
deduce that kT k = kak.

4. Find kAk if
*a. A = [ 1 1 ; 1 1 ]          b. A = [ 3 4 ; 3 4 ]
♯ 5. Suppose A is an m × n matrix. Prove that kAk ≤ (Σ_{i,j} a_{ij}^2 )^{1/2} ≤ √n kAk.

♯ 6. Suppose T : Rn → Rm and S : Rm → Rℓ are linear maps. Show that kS ◦ T k ≤ kSkkT k. (In particular, when A is an ℓ × m matrix and B is an m × n matrix, we have kABk ≤ kAkkBk.)

7. Let A be an m × n matrix. Show that kAk = kAT k. (Hint: Start by showing that kAk ≤ kAT k
by using Proposition 4.5 of Chapter 1.)

8. Suppose S ⊂ Rn is compact and a ∈ Rn is fixed. Show that there is a point of S closest to a.


(Hint: Use Exercise 2.3.2.)

*9. Suppose S ⊂ Rn has the property that any sequence of points in S has a subsequence converging
to a point in S. Prove that S is compact.

10. Suppose f : X → Rm is continuous and X is compact. Prove that the set f (X) = {y ∈ Rm :
y = f (x) for some x ∈ X} is compact. (Hint: Use Exercise 9.)

11. Suppose S1 ⊃ S2 ⊃ S3 ⊃ . . . are nonempty compact subsets of Rn . Prove that there is x ∈ Rn


so that x ∈ Sk for all k ∈ N. (Cf. Exercise 2.2.10.)
♯ 12. Suppose X ⊂ Rn is a compact set. Suppose U1 , U2 , U3 , . . . ⊂ Rn are open sets whose union
contains X. Prove that for some N ∈ N we have X ⊂ U1 ∪ · · · ∪ UN . (Hint: If not, for each k,
choose xk ∈ X so that xk ∈/ U1 ∪ · · · ∪ Uk .)

13. Suppose X ⊂ Rn is compact, and U1 , U2 , U3 , . . . ⊂ Rn are open sets whose union contains X.
Prove that there is a number δ > 0 so that for every x ∈ X, there is some j ∈ N so that
B(x, δ) ⊂ Uj . (Hint: If not, for each k ∈ N, what happens with δ = 1/k?)

2. Maximum/Minimum Problems

Definition. Let X ⊂ Rn , and let a ∈ X. The function f : X → R has a global maximum at a


if f (x) ≤ f (a) for all x ∈ X. f has a local maximum at a if, for some δ > 0, we have f (x) ≤ f (a)
for all x ∈ B(a, δ) ∩ X. We say a is a (local or global) maximum point of f .
Analogously, f has a global minimum at a if f (x) ≥ f (a) for all x ∈ X. f has a local minimum
at a if, for some δ > 0, we have f (x) ≥ f (a) for all x ∈ B(a, δ) ∩ X. We say a is a (local or global)
minimum point of f .
If a is either a local maximum or local minimum point, we say it is an extremum.

We begin with a somewhat silly example:

Example 1. If f (x) = 1 for x ∈ Q and f (x) = 0 for x ∉ Q, then every point a ∈ Q is a global maximum point and every point a ∉ Q is a global minimum point. ▽

Now for something a bit more substantial.

Example 2. Let f : R2 → R be defined by f (x, y) = x^2 + 2xy + 3y^2 . Then
f (x, y) = (x + y)^2 + 2y^2 ≥ 0 for all x, y.
From this we infer that 0 is a global minimum point. Indeed, (x + y)^2 + 2y^2 = 0 if and only if x + y = y = 0, if and only if x = y = 0, so 0 is the only global minimum point of f . But is 0 the only extremum? ▽

Lemma 2.1. Suppose f is defined on some neighborhood of the extremum a and f is differ-
entiable at a. Then Df (a) = O (or, equivalently, ∇f (a) = 0).

Proof. Suppose that a is a local minimum (the case of a local maximum is left to the reader).
Then for any v ∈ Rn , there is δ > 0 so that we have

f (a + tv) − f (a) ≥ 0 for all real numbers t with |t| < δ.

This means that
lim_{t→0+} (f (a + tv) − f (a))/t ≥ 0 and lim_{t→0−} (f (a + tv) − f (a))/t ≤ 0.
Since f is differentiable at a, the directional derivative Dv f (a) exists, and so we must have
Df (a)v = Dv f (a) = lim_{t→0} (f (a + tv) − f (a))/t = 0.
Since v is arbitrary, we infer that Df (a) = 0. □

Remark. Geometrically, if we consider f as a function of xi only, fixing all the other variables,
we get a curve with a local minimum at ai , which must therefore have a flat tangent line. That is,
all partial derivatives of f at a must be 0, and so the tangent plane must be horizontal.

Definition. Suppose f is differentiable at a. We say a is a critical point if Df (a) = 0. A critical point a is a saddle point if for every δ > 0, there are points x, y ∈ B(a, δ) with f (x) < f (a) and f (y) > f (a).

Figure 2.1. (a) local minimum; (b) local maximum; (c) saddle point.

In Section 3 we will devise a second-derivative test to attempt to distinguish among local maxima, local minima, and saddle points, typical ones of which are shown in Figure 2.1. In the sketch in Figure 2.2(a), we cannot tell whether we are at a local maximum or a local minimum; however, in (b) and (c) we strongly suspect a saddle point.

Figure 2.2. Sketches (a), (b), and (c).
Example 3. The prototypical example of a saddle point is provided by the function f (x, y) = x^2 − y^2 . The origin is a critical point, and clearly f (x, 0) > 0 for x ≠ 0 and f (0, y) < 0 for y ≠ 0. In the graph we see parabolas opening upwards in the x-direction and those opening downwards in the y-direction (see Figure 2.3(a)).
A somewhat more interesting example is provided by the so-called monkey saddle, pictured in Figure 2.3(b), which is the graph of f (x, y) = 3xy^2 − x^3 . Note that whereas the usual saddle surface allows room for the legs, in the case of the monkey saddle, there is also room for the monkey's tail.

Figure 2.3. (a) The saddle z = x^2 − y^2 ; (b) the monkey saddle z = 3xy^2 − x^3 .

Now we turn to the standard fare in differential calculus, the typical “applied extremum prob-
lems.” If we are fortunate enough to have a differentiable function on a compact region X, then
the Maximum Value Theorem guarantees both a global maximum and a global minimum, and we
can test for critical points on the interior of X (points having a neighborhood wholly contained in
X). It still remains to examine the function on the boundary of X, as well.

Example 4. We want to find the hottest and coldest points on the metal plate R = [0, π] × [0, π], whose temperature is given by f (x, y) = sin x + cos 2y. Since f is continuous and R is compact, we know the global maximum and minimum exist. We find that
Df (x, y) = [ cos x   −2 sin 2y ],
and so the only critical point in the interior of R is (π/2, π/2). The boundary of R consists of four line segments, as indicated in Figure 2.4. On C1 and C3 we have f (x, 0) = f (x, π) = sin x + 1, x ∈ [0, π], which achieves a maximum at π/2 and minima at 0 and π. Similarly, on C2 and C4 we have f (0, y) = f (π, y) = cos 2y, y ∈ [0, π], which achieves its maximum at 0 and π and its minimum at π/2. We now mark the values of f at the nine points we've unearthed. We see that the hottest points are (π/2, 0) and (π/2, π), and the coldest points are (0, π/2) and (π, π/2). On the other hand, the critical point at the center of the square is a saddle point (why?). ▽

Figure 2.4. The square R with boundary edges C1 (bottom), C2 (right), C3 (top), and C4 (left); the values of f at the nine marked points are 1, 2, 1 along the bottom and top edges and −1, 0, −1 along the horizontal midline.
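
A quick numerical check (not part of the text) corroborates this: evaluating f on a fine grid over R locates the maximum value 2 and the minimum value −1 at the points found above. A Python sketch:

```python
import numpy as np

# Evaluate f(x, y) = sin x + cos 2y on a fine grid over R = [0, pi] x [0, pi].
x = np.linspace(0, np.pi, 401)
y = np.linspace(0, np.pi, 401)
X, Y = np.meshgrid(x, y)
F = np.sin(X) + np.cos(2 * Y)

imax = np.unravel_index(np.argmax(F), F.shape)
imin = np.unravel_index(np.argmin(F), F.shape)
print(F[imax], (X[imax], Y[imax]))   # 2.0 at (pi/2, 0) (and also at (pi/2, pi))
print(F[imin], (X[imin], Y[imin]))   # -1.0 at (0, pi/2) (and also at (pi, pi/2))
```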

Somewhat more challenging are extremum problems where the domain is not naturally compact.
Consider the following

Example 5. Of all rectangular boxes with no lid and having a volume of 4 m3 , we wish to
determine the dimensions of the one with least total surface area. Let x, y, and z represent the
length, width, and height of the box, respectively, measured in meters. Given that xyz = 4, we
wish to minimize the surface area xy + 2z(x + y). Substituting z = 4/(xy), we then define the surface area as a function of the independent variables x and y:
f (x, y) = xy + (8/(xy))(x + y) = xy + 8(1/x + 1/y).
Figure 2.5. (a) The box with dimensions x, y, z; (b) the compact region S in the first quadrant bounded by x = 1/2, y = 1/2, and the hyperbola xy = 12, containing the critical point a.
Note that the domain of f is the open first quadrant, i.e., X = {(x, y) : x > 0 and y > 0}, which is definitely not compact. What guarantees that our function f achieves a minimum value on X? (Note, for example, that f has no maximum value on X.) The heuristic answer is this: if either x or y gets either very small or very large, the value of f gets very large. We shall make this precise soon.
Let's first of all find the critical points of f . We have
Df (x, y) = [ y − 8/x^2   x − 8/y^2 ],
so at a critical point we must have
y − 8/x^2 = x − 8/y^2 = 0,
whence x = y = 2. The sole critical point is a = (2, 2), and f (2, 2) = 12. Now it is not difficult to establish the fact that a is the global minimum point of f . Let
S = {(x, y) : 1/2 ≤ x ≤ 24, 1/2 ≤ y ≤ 12/x},
as in Figure 2.5(b). Then S is compact, so that the restriction of f to the set S attains its global minimum value. Here is the crucial point: whenever (x, y) is on the boundary of or outside S, we have f (x, y) > 12. (For if either x ≤ 1/2 or y ≤ 1/2, then we have f (x, y) > 8(1/x + 1/y) > 16; and if xy ≥ 12, then we have f (x, y) > 12.) Since f (a) = 12, it follows that the global minimum of f on S cannot occur on the boundary of S, hence must occur at an interior point, and therefore at a critical point of f . It follows that a is the global minimum point of f on S, hence on all of X, since f (x) > f (a) whenever x ∉ S.
In summary, the box of the least surface area has dimensions 2 m × 2 m × 1 m. ▽
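
Here is a small numerical corroboration (illustrative only, not part of the argument): evaluating f on a grid covering the region S shows the minimum value is about 12, attained near (2, 2).

```python
import numpy as np

# Evaluate f(x, y) = xy + 8(1/x + 1/y) on a grid covering the region S.
x = np.linspace(0.5, 24, 1200)
y = np.linspace(0.5, 24, 1200)
X, Y = np.meshgrid(x, y)
F = X * Y + 8 * (1 / X + 1 / Y)

i = np.unravel_index(np.argmin(F), F.shape)
print(F[i], (X[i], Y[i]))      # approximately 12, attained near (2, 2)
```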

In general, when confronted with a maximum/minimum problem on a noncompact region, we


must usually be somewhat resourceful—either with such estimates, or with algebraic or geometric
arguments.

EXERCISES 5.2

1. Find all the critical points of the following scalar functions:
*a. f (x, y) = x^2 + 3x − 2y + 4y^2
b. f (x, y) = xy + x − y
c. f (x, y) = sin x + sin y
d. f (x, y) = x^2 − 3x^2 y + y^3
e. f (x, y) = x^2 y + x^3 − x^2 + y^2
f. f (x, y) = (x^2 + y^2 )e^{−y}
*g. f (x, y) = (x − y)e^{−(x^2 +y^2 )/4}
h. f (x, y) = x^2 y − 4xy − y^2
*i. f (x, y, z) = xyz − x^2 − y^2 + z^2
j. f (x, y, z) = x^3 + xz^2 − 3x^2 + y^2 + 2z^2
k. f (x, y, z) = e^{−(x^2 +y^2 +z^2 )/6} (x − y + z)
l. f (x, y, z) = xyz − x^2 − y^2 − z^2
2. A rectangular box with edges parallel to the coordinate axes has one corner at the origin and
the opposite corner on the plane x + 2y + 3z = 6. What is the maximum possible volume of
the box?

*3. A rectangular box is inscribed in a hemisphere of radius r. Find the dimensions of the box of
maximum volume.

*4. The temperature of the circular plate D = {x : kxk ≤ 2} ⊂ R2 is given by the function f (x, y) = x^2 + 2y^2 − 2x. Find the maximum and minimum values of the temperature on D.
5. Two non-overlapping rectangles with their sides parallel to the coordinate axes are inscribed in the triangle with vertices at (0, 0), (1, 0), and (0, 1). What configuration will maximize the sum of their areas?

*6. A post office employee has 12 ft2 of cardboard from which to construct a rectangular box with
no lid. Find the dimensions of the box with largest possible volume.

7. Show that the rectangular box of maximum volume with a given surface area is a cube.

8. The material for the sides of a rectangular box cost twice as much per ft2 as that for the top and
bottom. Find the relative dimensions of the box with greatest volume that can be constructed
for a given cost.
9. Find the equation of the plane through the point (1, 2, 2) that cuts off the smallest possible volume in the first octant.

*10. A long flat piece of sheet metal, 12′′ wide, is to be bent to form a long trough with cross-sections
an isosceles trapezoid. Find the shape of the trough with maximum cross-sectional area. (Hint:
It will help to use an angle as one of your variables.)

11. A pentagon is formed by placing an isosceles triangle atop a rectangle. If the perimeter P of
the pentagon is fixed, find the dimensions of the rectangle and the height of the triangle that
give the pentagon of maximum area.

12. An ellipse is formed by intersecting the cylinder x2 + y 2 = 1 and the plane x + 2y + z = 0. Find
the highest and lowest points on the ellipse. (As usual, the z-axis is vertical.)

13. Suppose x, y, and z are positive numbers with xy 2 z 3 = 108. Find (with proof) the minimum
value of their sum.

14. Let a1 , . . . , ak ∈ Rn be fixed points. Show that the function


f (x) = Σ_{j=1}^{k} kx − aj k^2

has a global minimum and find the global minimum point.

15. (Cf. Exercise 14.) Let a1 , a2 , a3 ∈ R2 be three noncollinear points. Show that the function
f (x) = Σ_{j=1}^{3} kx − aj k

has a global minimum and characterize the global minimum point. (Hint: Your answer will be
geometric in nature. Can you give an explicit geometric construction?)

3. Quadratic Forms and the Second Derivative Test

Just as the second derivative test in single-variable calculus often allows us to differentiate
between local minima and local maxima, there is something quite analogous in the multivariable
case, to which we now turn. Of course, even with just one variable, if f ′ (a) = f ′′ (a) = 0, we
do not have enough information, and we need higher derivatives to infer the local behavior of f
at a; lying behind this is the theory of the Taylor polynomial, which works analogously in the
multivariable case. In the interest of time, however, we shall content ourselves here with just the
second derivative.
First, we need a one-variable generalization of the Mean Value Theorem. (In truth, it is Taylor’s
Theorem with Remainder for the first degree Taylor polynomial. See Chapter 20 of Spivak.)

Lemma 3.1. Suppose g : [0, 1] → R is twice differentiable. Then
g(1) = g(0) + g′(0) + (1/2)g′′(ξ) for some 0 < ξ < 1.

Proof. Define the polynomial P by P (t) = g(0) + g′ (0)t + Ct2 , where C = g(1) − g(0) − g ′ (0).
This choice of C makes P (1) = g(1), and it is easy to see that P (0) = g(0) and P ′ (0) = g ′ (0) as
well, as shown in Figure 3.1. Then the function h = g − P satisfies h(0) = h′ (0) = h(1) = 0. By

Figure 3.1

Rolle’s Theorem, since h(0) = h(1) = 0, there is c ∈ (0, 1) so that h′ (c) = 0. By Rolle’s Theorem
applied to h′ , since h′ (0) = h′ (c) = 0, there is ξ ∈ (0, c) so that h′′ (ξ) = 0. This means that
g′′ (ξ) = P ′′ (ξ) = 2C, and so

g(1) = P (1) = g(0) + g′(0) + (1/2)g′′(ξ),

as required. 

The derivative in the multivariable setting becomes a linear map (or vector); as we shall soon
see, the second derivative should become a quadratic form, i.e., a quadratic function of a vector
variable.

Definition. Assume f is a C2 function in a neighborhood of a. Define the symmetric matrix
Hess(f )(a) = ( ∂^2 f /∂xi ∂xj (a) ),
the n × n matrix whose (i, j) entry is ∂^2 f /∂xi ∂xj (a). Hess(f )(a) is called the Hessian matrix of f at a. Define the associated quadratic form Hf,a : Rn → R by
Hf,a (h) = hT (Hess(f )(a)) h = Σ_{i,j=1}^{n} (∂^2 f /∂xi ∂xj )(a) hi hj .

Now we are in a position to state the generalization of Lemma 3.1 to functions of several
variables. This will enable us to deduce the appropriate second derivative test for extrema.

Proposition 3.2. Suppose f : B(a, r) → R is C2 . Then for all h with khk < r we have
f (a + h) = f (a) + Df (a)h + (1/2)Hf,a+ξh (h) for some 0 < ξ < 1.
Consequently,
f (a + h) = f (a) + Df (a)h + (1/2)Hf,a (h) + ǫ(h), where ǫ(h)/khk^2 → 0 as h → 0.

Remark. Just as the derivative gives the best linear approximation to f at a, so adding the quadratic term (1/2)Hf,a (h) gives the best possible quadratic approximation to f at a. This is the second-degree Taylor polynomial of f at a. For further reading on multivariable Taylor polynomials, consult, e.g., C. H. Edwards' Advanced Calculus of Several Variables or Hubbard and Hubbard's Vector Calculus, Linear Algebra, and Differential Forms: A Unified Approach.

Proof. We apply Lemma 3.1 to the function g(t) = f (a + th). Using the chain rule twice (and applying Theorem 6.1 of Chapter 3 as well), we have
g′(t) = Df (a + th)h = Σ_{i=1}^{n} (∂f /∂xi )(a + th) hi ,
g′′(t) = Σ_{i=1}^{n} Σ_{j=1}^{n} (∂^2 f /∂xj ∂xi )(a + th) hj hi = Σ_{i,j=1}^{n} (∂^2 f /∂xi ∂xj )(a + th) hi hj = Hf,a+th (h).
Now substitution yields the first result.
Since f is C2 , given any ε > 0, there is δ > 0 so that whenever kvk < δ we have
kHess(f )(a + v) − Hess(f )(a)k < ε.
Using the Cauchy-Schwarz inequality, Proposition 2.3 of Chapter 1, and Proposition 1.3, we find that |hT Ah| ≤ kAk khk^2 . So whenever khk < δ, we have, for any 0 < ξ < 1,
|Hf,a+ξh (h) − Hf,a (h)| < ε khk^2 .
By definition, ǫ(h) = (1/2)(Hf,a+ξh (h) − Hf,a (h)), so
|ǫ(h)|/khk^2 = |Hf,a+ξh (h) − Hf,a (h)|/(2 khk^2 ) < ε/2
whenever khk < δ. Since ε > 0 was arbitrary, this proves the result. □

Definition. Given a symmetric n × n matrix A, we say the associated quadratic form Q : Rn → R, Q(x) = xT Ax, is
positive definite if Q(x) > 0 for all x ≠ 0,
negative definite if Q(x) < 0 for all x ≠ 0,
positive semidefinite if Q(x) ≥ 0 for all x and Q(x) = 0 for some x ≠ 0,
negative semidefinite if Q(x) ≤ 0 for all x and Q(x) = 0 for some x ≠ 0, and
indefinite if Q(x) > 0 for some x and Q(x) < 0 for other x.

Examples 1.
(a) The quadratic form Q(x) = x1^2 + 4x1 x2 + 5x2^2 = xT [ 1 2 ; 2 5 ] x is positive definite, as we see by completing the square:
x1^2 + 4x1 x2 + 5x2^2 = (x1 + 2x2 )^2 + x2^2 ;
being the sum of two squares (with positive coefficients), it is nonnegative and can vanish only if x2 = x1 + 2x2 = 0, i.e., only if x = 0.
(b) The quadratic form Q(x) = x1^2 + 2x1 x2 − x2^2 = xT [ 1 1 ; 1 −1 ] x is indefinite, as we can see either by completing the square or merely by observing that Q(t, 0) = t^2 > 0 and Q(0, t) = −t^2 < 0 for t ≠ 0.
(c) The quadratic form Q(x) = x1^2 + 2x1 x2 + 2x2^2 + 2x1 x3 + 2x3^2 = xT [ 1 1 1 ; 1 2 0 ; 1 0 2 ] x is, however, positive semidefinite, for
x1^2 + 2x1 x2 + 2x2^2 + 2x1 x3 + 2x3^2 = (x1 + x2 + x3 )^2 + x2^2 − 2x2 x3 + x3^2
= (x1 + x2 + x3 )^2 + (x2 − x3 )^2 ≥ 0,
but note that Q(−2, 1, 1) = 0. ▽

Theorem 3.3. Suppose f : B(a, r) → R is C2 and a is a critical point. If Hf,a is positive (resp., negative) definite, then a is a local minimum (resp., maximum) point; if Hf,a is indefinite, then a is a saddle point. If Hf,a is semidefinite, we can draw no conclusions.

Proof. By Proposition 3.2, given ε > 0, there is δ > 0 so that
f (a + h) − f (a) = (1/2)Hf,a (h) + ǫ(h), where |ǫ(h)|/khk^2 < ε whenever khk < δ.

Suppose now that Hf,a is positive definite. By the Maximum Value Theorem, Theorem 1.2, there is a number m > 0 so that Hf,a (x) ≥ m for all unit vectors x. This means that Hf,a (h) ≥ mkhk^2 for all h. So now, choosing ε = m/4, we have
f (a + h) − f (a) = (1/2)Hf,a (h) + ǫ(h) > (1/4)mkhk^2 > 0
for all h with 0 < khk < δ. This means that a is a local minimum, as desired. The negative definite case is analogous.
Now suppose Hf,a is indefinite. Then there are unit vectors x and y so that Hf,a (x) = m1 > 0 and Hf,a (y) = m2 < 0. Choose ε = (1/4) min(m1 , −m2 ). Now, letting h = tx (resp., ty) with 0 < |t| < δ, we see that
f (a + tx) − f (a) > (1/4)m1 t^2 > 0 and f (a + ty) − f (a) < (1/4)m2 t^2 < 0.
This means that a is a saddle point of f .
Last, note that if Hf,a is positive semidefinite, then a may be either a local minimum, a local maximum (!), or a saddle point. Consider, respectively, the functions f (x, y) = x^2 + y^4 , −x^4 − y^4 , and x^2 + y^3 , all at the origin. □
Corollary 3.4. When n = 2, assume f is C2 near the critical point a and Hess(f )(a) = [ A B ; B C ]. Then
AC − B^2 > 0 and A > 0 =⇒ a is a local minimum;
AC − B^2 > 0 and A < 0 =⇒ a is a local maximum;
AC − B^2 < 0 =⇒ a is a saddle point;
AC − B^2 = 0 =⇒ the test is inconclusive.

Proof. This is just the usual process of completing the square: When A ≠ 0,
Ax^2 + 2Bxy + Cy^2 = A (x + (B/A)y)^2 + (C − B^2/A) y^2 = A (x + (B/A)y)^2 + ((AC − B^2)/A) y^2 ,
so the quadratic form is positive definite when A > 0 and AC − B^2 > 0, negative definite when A < 0 and AC − B^2 > 0, and indefinite when AC − B^2 < 0. When A = 0, we have 2Bxy + Cy^2 = y(2Bx + Cy), and so the quadratic form is indefinite provided B ≠ 0, i.e., provided AC − B^2 < 0. □
Example 2. Let’s find and classify the critical points of the function f : R2 → R, f =
y
x3 + y2 − 6xy. Then  
x  
Df = 3x2 − 6y 2y − 6x ,
y
 
0
and so
 at a critical point we must have 2y = x2 = 6x. Thus, the critical points are a = and
0
6
b= .
18
Now, we calculate the Hessian:
   
x 6x −6
Hess(f ) = ,
y −6 2
and so        
0 0 −6 6 36 −6
Hess(f ) = and Hess(f ) = .
0 −6 2 18 −6 2
We see that Hf,a is indefinite, so a is a saddle point, and Hf,b is positive definite, so b is a local
minimum point. ▽
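
Looking ahead to the remark at the end of this section, the definiteness of the Hessian can also be read off from the signs of its eigenvalues. The Python sketch below (illustrative only, not part of the text) classifies the two critical points of this example that way.

```python
import numpy as np

# Classify the critical points of f(x, y) = x^3 + y^2 - 6xy via the
# eigenvalues of the Hessian.
def hessian(x, y):
    return np.array([[6.0 * x, -6.0],
                     [-6.0,     2.0]])

for point in [(0.0, 0.0), (6.0, 18.0)]:
    eigs = np.linalg.eigvalsh(hessian(*point))
    if np.all(eigs > 0):
        kind = "local minimum"
    elif np.all(eigs < 0):
        kind = "local maximum"
    elif eigs.min() < 0 < eigs.max():
        kind = "saddle point"
    else:
        kind = "inconclusive (semidefinite)"
    print(point, eigs, kind)     # (0,0): saddle;  (6,18): local minimum
```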

The process of completing the square as we’ve done in Examples 1 can be couched in matrix
language; indeed, it is intimately related to the reduction to echelon form, as we shall now see.

Example 3. Suppose we begin to reduce the symmetric matrix
A = [ 1 3 2 ; 3 4 −4 ; 2 −4 −10 ]
to echelon form. The first step is
A = [ 1 3 2 ; 3 4 −4 ; 2 −4 −10 ] → [ 1 3 2 ; 0 −5 −10 ; 0 −10 −14 ] = A′ ,
where
A′ = E1 A with E1 = [ 1 0 0 ; −3 1 0 ; −2 0 1 ], and so A = E1^{−1} A′ with E1^{−1} = [ 1 0 0 ; 3 1 0 ; 2 0 1 ].
There are already two interesting observations to make: the first column of E1^{−1} is the transpose of the first row of A (hence of A′ ); and if we remove the first row and column from A′ , what's left is also symmetric. Indeed, we can write
A = (1, 3, 2)T [ 1 3 2 ] + [ 0 0 0 ; 0 −5 −10 ; 0 −10 −14 ];
since the first term is symmetric (why?), the latter term must be as well. Now we just continue:
A′ = [ 1 3 2 ; 0 −5 −10 ; 0 −10 −14 ] → [ 1 3 2 ; 0 −5 −10 ; 0 0 6 ] = U ,
and so, as before,
U = E2 A′ with E2 = [ 1 0 0 ; 0 1 0 ; 0 −2 1 ], and so A′ = E2^{−1} U with E2^{−1} = [ 1 0 0 ; 0 1 0 ; 0 2 1 ].
Summarizing, we have A = LU , where
L = E1^{−1} E2^{−1} = [ 1 0 0 ; 3 1 0 ; 2 2 1 ]
is a lower triangular matrix with 1's on the diagonal. Now here comes the amazing thing: if we factor out the diagonal entries of the echelon matrix U , we are left with LT :
U = [ 1 0 0 ; 0 −5 0 ; 0 0 6 ] [ 1 3 2 ; 0 1 2 ; 0 0 1 ] = D LT .
Because A is symmetric, we are left with the formula
A = LDLT ,
corresponding to the formula we get by completing the square:
x1^2 + 6x1 x2 + 4x2^2 + 4x1 x3 − 8x2 x3 − 10x3^2 = (x1 + 3x2 + 2x3 )^2 − 5x2^2 − 20x2 x3 − 14x3^2
= (x1 + 3x2 + 2x3 )^2 − 5(x2^2 + 4x2 x3 ) − 14x3^2
(∗) = (x1 + 3x2 + 2x3 )^2 − 5(x2 + 2x3 )^2 + 6x3^2 .
To complete the circle of ideas, note that
Q(x) = xT Ax = xT (LDLT )x = (LT x)T D(LT x)
recaptures the form of (∗). ▽
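
The bookkeeping of Example 3 is easy to automate. The following Python sketch (illustrative only; it assumes, as in the example, that no row exchanges are needed, i.e., that all pivots are nonzero) computes L and D by elimination and checks that A = LDLT .

```python
import numpy as np

def ldlt(A):
    """Symmetric A = L D L^T without pivoting (assumes nonzero pivots)."""
    A = np.array(A, dtype=float)
    n = A.shape[0]
    L = np.eye(n)
    d = np.zeros(n)
    S = A.copy()
    for j in range(n):
        d[j] = S[j, j]
        L[j+1:, j] = S[j+1:, j] / d[j]                      # multipliers below the pivot
        S[j+1:, j+1:] -= np.outer(L[j+1:, j], S[j, j+1:])   # eliminate row/column j
    return L, np.diag(d)

A = np.array([[1, 3, 2], [3, 4, -4], [2, -4, -10]], dtype=float)
L, D = ldlt(A)
print(L)                              # [[1,0,0],[3,1,0],[2,2,1]]
print(np.diag(D))                     # [ 1. -5.  6.]
print(np.allclose(A, L @ D @ L.T))    # True
```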

Remark. Of course, not every symmetric matrix can be written in the form LDLT ; e.g., take A = [ 0 1 ; 1 0 ]. The problem arises when we have to switch rows to get pivots in the appropriate places. Nevertheless, by doing appropriate row operations together with the companion column operations (to maintain symmetry), one can show that every symmetric matrix can be written in the form EDE T , where E is the product of elementary matrices with only 1's on the diagonal (i.e., elementary matrices of type (iii)). See Exercise 8b for the example of the matrix A given just above.

Proposition 3.5. Suppose A is a symmetric matrix with associated quadratic form Q. Suppose
A = LDLT , where L is lower triangular with 1’s on the diagonal and D is diagonal. If all the entries
of D are positive (resp., negative), then Q is positive (resp., negative) definite; if all the entries of
D are nonnegative (resp., nonpositive) and at least one is 0, then Q is positive (resp., negative)
semidefinite; and if entries of D have opposite sign, then Q is indefinite.
Conversely, if Q is positive (resp., negative) definite, then there are a lower triangular matrix
L with 1’s on the diagonal and a diagonal matrix D with positive (resp., negative) entries so that
A = LDLT . If Q is semidefinite (resp., indefinite), the matrix EAE T (where E is a suitable product
of elementary matrices of type (iii)) can be written in the form LDLT , where now there is at least
one 0 (resp., real numbers of opposite sign) on the diagonal of D.

Sketch of proof. Suppose A = LDLT , where L is lower triangular with 1’s on the diagonal
(or, more generally, A = EDE T , where E is invertible). Let d1 , . . . , dn be the diagonal entries of
the diagonal matrix D. Letting y = LT x, we have
Q(x) = xT Ax = xT (LDLT )x = (LT x)T D(LT x) = yT Dy = Σ_{i=1}^{n} di yi^2 .

Realizing that y = 0 ⇐⇒ x = 0, the conclusions of the first part of the Proposition are now
evident.
Suppose Q is positive definite. Then, in particular, Q(e1 ) = a11 > 0, so we can write
A = a11 v vT + [ 0 0 ; 0 B ], where v = (1, a12 /a11 , . . . , a1n /a11 )T
(the second term has zero first row and column, with the (n − 1) × (n − 1) block B in the lower right), where B is also symmetric and the quadratic form on Rn−1 associated to B is likewise positive definite. We now continue by induction. (For example, if the upper left entry of B were 0, this would mean that Q(a12 e1 − a11 e2 ) = 0, contradicting the hypothesis that Q is positive definite.)

Remark. More explicitly, the matrix B corresponds to the quadratic form
Q′(x) = Q(x) − (1/a11 )(Ax · e1 )^2 = Q(x − ((Ax · e1 )/a11 ) e1 ).
Now, Q′(x) ≥ 0, and Q′(x) = 0 if and only if a11 x = (Ax · e1 )e1 , if and only if x is a scalar multiple of e1 . This means precisely that B is positive definite.
An analogous argument works when Q is negative definite. If A = O, there is nothing to prove.
If not, in the semidefinite or indefinite case, if a11 = 0, we first find an appropriate elementary
matrix, E1 , so that the first entry of the symmetric matrix B = E1 AE1T is nonzero, and then we
continue as above. 
Remark . We will see another way, introduced in the next section and developed fully in
Chapter 9, of analyzing the nature of the quadratic form Q associated to a symmetric matrix A.
The signs of the eigenvalues of A will tell the whole story.

EXERCISES 5.3

*1. Classify the critical points of the functions in Exercise 5.2.1.

2. Consider the function f (x, y) = 2x^4 − 3x^2 y + y^2 .
a. Show that the origin is a critical point of f and that, restricting f to any line through the origin, the origin is a local minimum point.
b. Is the origin a local minimum point of f ?²

*3. Describe the graph of f (x, y) = (2x^2 + y^2 )e^{−(x^2 +y^2 )} .

4. Let f (x, y) = x^3 + e^{3y} − 3xe^y .
a. Show that f has exactly one critical point a, which is a local minimum point.
b. Show that a is not a global minimum point.

5. Suppose f : R2 → R is C2 and harmonic (see Example 2 on p. 117). Assume ∂^2 f /∂x^2 (a) ≠ 0. Prove that a cannot be an extremum of f .

6. For each of the following symmetric matrices A, write A = LDLT as in Example 3. Use your answer to determine whether the associated quadratic form Q given by Q(x) = xT Ax is positive definite, negative definite, indefinite, etc.
a. A = [ 1 3 ; 3 13 ]
b. A = [ 2 3 ; 3 4 ]
*c. A = [ 2 2 −2 ; 2 −1 4 ; −2 4 1 ]
d. A = [ 1 −2 2 ; −2 6 −6 ; 2 −6 9 ]
e. A = [ 1 1 −3 1 ; 1 0 −3 0 ; −3 −3 11 −1 ; 1 0 −1 2 ]

² We've seen several textbooks that purportedly prove Theorem 3.3 by showing, for example, that if Hf,a is positive definite, then the restriction of f to any line through a has a local minimum at a, and then concluding that a must be a local minimum point of f . We hope that this exercise will convince you that such a proof must be flawed.
7. Suppose A = LDU , where L is lower triangular with 1’s on the diagonal, D is diagonal, and
U is upper triangular with 1’s on the diagonal. Prove that this decomposition is unique; i.e., if
A = LDU = L′ D ′ U ′ , where L′ , D ′ , and U ′ have the same defining properties as L, D, and U ,
respectively, then L = L′ , D = D ′ , and U = U ′ . (Hint: The product of two lower triangular
matrices is lower triangular, and likewise for upper.)
 
0 2
8. a. Let A = . After making a row exchange (and corresponding column exchange to
2 1
 
T 1 2
preserve symmetry), we get B = E1 AE1 = . Now write B = LDLT and get a
2 0
corresponding equation for A. How then have we expressed the associated quadratic form
Q(x) = 4x1 x2 + x22 as a sum (or difference) of squares?
   
0 1 T 1 1
*b. Let A = . By considering B = E1 AE1 = , where E1 is the elementary matrix
1 0 1 0
corresponding to adding 1/2 of the second row to the first, show that
1 1
  
T 2 −2 1
A = EDE where E= and D = .
1 1 −1
What is the corresponding expression for the quadratic form Q(x) = 2x1 x2 as a sum (or
difference) of squares?

4. Lagrange Multipliers

Most extremum problems, including those encountered in single-variable calculus, involve func-
tions of several variables with some constraints. Consider, for example, the box of prescribed
volume, a cylinder inscribed in a sphere of given radius, or the desire to maximize profit with only
a certain amount of working capital. There is an elegant and powerful way to approach all these
problems using multivariable calculus, the method of Lagrange multipliers. A generalization to
infinite dimensions, which we shall not study here, is central in the calculus of variations, which is
a powerful tool in mechanics, thermodynamics, and differential geometry.

Example 1. Your boat has sprung a leak in the middle of the lake and you are trying to find the closest point on the shoreline. As suggested by Figure 4.1, we imagine dropping a rock in the water at the location of the boat and watching the circular waves radiate outwards. The moment the first wave touches the shoreline, we know that the point a at which it touches must be closest to us. And at that point, the circle must be tangent to the shoreline.

Figure 4.1. Level curves f (x, y) = constant (circles centered at the boat) and the shoreline g(x, y) = 0, tangent at the point a, where ∇f (a) = λ∇g(a).
Let’s place the origin at the point at which we drop the rock. Then the circles emanating from
this point are level curves of f (x) = kxk. Suppose, moreover, that the shoreline is a level curve of
a differentiable function g. By Proposition 5.3 of Chapter 4, the gradient is normal to level sets,
so if the tangent line of the circle at a and the tangent line of the shoreline at a are the same, this
means that we should have
∇f (a) = λ∇g(a) for some scalar λ. ▽

We now want to study the calculus of constrained extrema a bit more carefully.

Definition . Suppose U ⊂ Rn is open and f : U → R and g : U → Rm are differentiable.


Suppose g(a) = 0. We say a is a local maximum (resp., minimum) point of f subject to the
constraint g(x) = 0 if for some δ > 0, f (x) ≤ f (a) (resp., f (x) ≥ f (a)) for all x ∈ B(a, δ)
satisfying g(x) = 0. More succinctly, letting M = g−1 ({0}), a is an extremum of the restriction of
f to the set M .

Theorem 4.1. Suppose U ⊂ Rn is open, f : U → R is differentiable, and g : U → Rm is C1 .


Suppose g(a) = 0 and rank(Dg(a)) = m. If a is a local extremum of f subject to the constraint
g = 0, then there are scalars λ1 , . . . , λm so that
Df (a) = λ1 Dg1 (a) + · · · + λm Dgm (a).
The scalars λ1 , . . . , λm are called Lagrange multipliers.

Remark. As usual, this is a necessary condition for a constrained extremum, but not a sufficient
one. There may be (constrained) saddle points as well.
Proof. By the Implicit Function Theorem, we can represent M = g−1 ({0}) locally near a as a graph over some coordinate (n − m)-plane. For concreteness, let's say that locally
M = { (x, φ(x)) : x ∈ V ⊂ Rn−m },
where φ : V → Rm is C1 . Thus, we can define a local parametrization of M by
Φ : V → Rn , Φ(x) = (x, φ(x)),
as shown in Figure 4.2, with Φ(a) = a.

Figure 4.2. The parametrization Φ maps V ⊂ Rn−m onto M ⊂ Rn ; T denotes the tangent space of M at a, and f maps Rn to R.

Now we have two crucial pieces of information:
g◦ Φ = 0 and f ◦ Φ has a local extremum at a.


Differentiating by the chain rule, and applying Lemma 2.1, we have
(†) Dg(a)◦ DΦ(a) = O and Df (a)◦ DΦ(a) = 0.
The first equation in (†) tells us that T, the (n − m)-dimensional image of the linear map DΦ(a),
satisfies T ⊂ ker Dg(a) (or C([DΦ(a)]) ⊂ N([Dg(a)])). But, by the Nullity-Rank Theorem,
Corollary 4.6 of Chapter 4, we have
dim N([Dg(a)]) = n − rank([Dg(a)]) = n − m,
by hypothesis. Since dim T = n − m = dim N([Dg(a)]), we must have T = N([Dg(a)]). Moreover,
T = N([Dg(a)]) = R([Dg(a)])⊥ .
On the other hand, the latter equation in (†) tells us that
T ⊂ N([Df (a)]) = R([Df (a)])⊥ .
Thus,
R([Dg(a)])⊥ ⊂ R([Df (a)])⊥ ,

so, taking orthogonal complements and using Exercise 1.3.9 and Proposition 4.8 of Chapter 4, we
have

R([Df (a)]) ⊂ R([Dg(a)]),



so Df (a) is a linear combination of the linear maps Dg1 (a), . . . , Dgm (a)—or, more geometrically,
∇f (a) is a linear combination of the vectors ∇g1 (a), . . . , ∇gm (a)—as we needed to show. 
Remark. The subspace T = image(DΦ(a)) = R([Dg(a)])⊥ is called the tangent space of M at a. We shall return to such matters in Chapter 6.
Example 2. The temperature at the point (x, y, z) in space is given by f (x, y, z) = xy + z^2 . We wish to find the hottest and coldest points on the sphere x^2 + y^2 + z^2 = 2z (the sphere of radius 1 centered at (0, 0, 1)). That is, we must find the extrema of f subject to the constraint g(x, y, z) = x^2 + y^2 + z^2 − 2z = 0. By Theorem 4.1, we must find points x satisfying g(x) = 0 at which Df (x) = λDg(x) for some scalar λ. That is, we seek points x so that
(∗) [ y x 2z ] = λ [ x y z − 1 ] for some scalar λ.
(Notice that we removed the factor of 2 from Dg.)
Eliminating λ, we see that, provided none of our denominators is 0,
y/x = x/y = 2z/(z − 1).
So either
y = x and 2z = z − 1, or y = −x and 2z = 1 − z;
the former leads to z = −1, which is impossible, and the latter leads to
z = 1/3 , y = −x, x = ± (1/3)√(5/2) .
Now, we infer from (∗) that if x = 0, then y = 0 as well (and vice versa), and then z can be arbitrary, so we also find that the north and south poles of the sphere are constrained critical points. On the other hand, we cannot have the denominator z − 1 = 0, for, by (∗), that would require z = 0, and these equations cannot hold simultaneously.
Calculating the values of f at our various constrained critical points, we have
f (√(5/2)/3, −√(5/2)/3, 1/3) = f (−√(5/2)/3, √(5/2)/3, 1/3) = −1/6 , f (0, 0, 0) = 0, and f (0, 0, 2) = 4.
Thus, the topmost point (0, 0, 2) is the hottest, and the two points (√(5/2)/3, −√(5/2)/3, 1/3) and (−√(5/2)/3, √(5/2)/3, 1/3) are the coldest. ▽

Remark. We surmise that the origin is a saddle point. Indeed, representing the sphere locally as a graph near the origin, we have z = 1 − √(1 − (x^2 + y^2 )) and
f (x, y, z) = xy + (1 − √(1 − (x^2 + y^2 )))^2 = xy + higher-order terms.
(This is easiest to see by using √(1 + u) = 1 + u/2 + higher-order terms.) Even easier, the origin is a non-constrained critical point of f . Since f is a quadratic polynomial with no linear term, Hf,0 = 2f , and on the tangent plane of the sphere at 0 we just get 2xy, which is indefinite. (Also see Exercise 34.)
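
As a purely numerical sanity check (not part of the text), one can sample many points of the sphere and compare the extreme values of f with those computed above; a Python sketch:

```python
import numpy as np

# Sample points on the sphere x^2 + y^2 + z^2 = 2z (center (0,0,1), radius 1)
# and compare the extremes of f(x, y, z) = xy + z^2 with the values above.
rng = np.random.default_rng(1)
v = rng.normal(size=(200000, 3))
v /= np.linalg.norm(v, axis=1, keepdims=True)   # uniform on the unit sphere
pts = v + np.array([0.0, 0.0, 1.0])             # shift the center to (0, 0, 1)
f = pts[:, 0] * pts[:, 1] + pts[:, 2] ** 2
print(f.max(), f.min())     # approximately 4 and -1/6
```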

Example 3. Find the shortest possible distance from the ellipse x^2 + 2y^2 = 2 to the line x + y = 2. We need to consider the (square of the) distance between pairs of points, one on the ellipse, the other on the line. This means that we need to work in R2 × R2 , with coordinates (x, y) and (u, v), respectively. Let's try to minimize
f (x, y, u, v) = (x − u)^2 + (y − v)^2
subject to the constraints
g(x, y, u, v) = ( x^2 + 2y^2 − 2 , u + v − 2 ) = (0, 0).
(The rank condition on g is easily checked in this case.) So we need to find points at which, for some scalars λ and µ, we have Df = λDg1 + µDg2 , i.e.,
[ x − u   y − v   −(x − u)   −(y − v) ] = λ [ x   2y   0   0 ] + µ [ 0   0   1   1 ].
We see that we must have x − u = y − v, and so x = 2y as well. Now substituting into the constraint equations yields two critical points:
( 2/√3 , 1/√3 , 1 + 1/(2√3) , 1 − 1/(2√3) )   and   ( −2/√3 , −1/√3 , 1 − 1/(2√3) , 1 + 1/(2√3) ).
As a check, note that the vector from (u, v) to (x, y) in each case is normal to both the ellipse and the line, as Figure 4.3 corroborates. Evidently, the first point gives the shortest possible distance, and we leave it to the reader to establish this rigorously. ▽

Figure 4.3. The ellipse x^2 + 2y^2 = 2 and the line x + y = 2, with the nearest pair of points (x, y) and (u, v).

We close this section with an application of the method of Lagrange multipliers to linear algebra.
Suppose A is a symmetric n × n matrix. Let’s find the extrema of the quadratic form Q(x) = xT Ax
subject to the constraint g(x) = kxk2 = 1. By Theorem 4.1, we seek x ∈ Rn so that for some scalar
λ we have DQ(x) = λDg(x). Applying the result of Exercise 3.2.14 (and canceling a pair of 2’s),
this means that at any constrained extremum we must have
Ax = λx for some scalar λ.
Such a vector x is called an eigenvector of A, and the Lagrange multiplier λ is called an eigenvalue.
Note that by compactness of the unit sphere, Q must have at least a global minimum and a global
maximum; hence A must have at least two eigenvalues and corresponding eigenvectors.
 
Example 4. Consider A = [ 6 2 ; 2 9 ]. Proceeding as above, we arrive at the system of equations
6x + 2y = λx
2x + 9y = λy.
Eliminating λ, we obtain
(6x + 2y)/x = (2x + 9y)/y ,
from which we find the equation
2(y/x)^2 − 3(y/x) − 2 = (2(y/x) + 1)((y/x) − 2) = 0,
so either y = 2x or y = −x/2. Substituting into the constraint equation, we obtain the critical points (eigenvectors) (1/√5, 2/√5) and (−2/√5, 1/√5), with respective Lagrange multipliers (eigenvalues) 10 and 5. ▽
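
The computation above is, of course, exactly an eigenvalue computation, so it can be checked against a numerical eigenvalue routine; the Python sketch below (illustrative only) also confirms that Q attains the values 5 and 10 on the unit circle.

```python
import numpy as np

A = np.array([[6.0, 2.0], [2.0, 9.0]])
eigenvalues, eigenvectors = np.linalg.eigh(A)
print(eigenvalues)      # [ 5. 10.]
print(eigenvectors)     # columns along (-2, 1)/sqrt(5) and (1, 2)/sqrt(5), up to sign

# Q(x) = x^T A x on the unit circle attains exactly these eigenvalues:
t = np.linspace(0, 2 * np.pi, 100001)
X = np.stack([np.cos(t), np.sin(t)])
Q = np.einsum('ik,ij,jk->k', X, A, X)
print(Q.min(), Q.max())   # approximately 5 and 10
```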

EXERCISES 5.4

 
1. a. Find the minimum value of f (x, y) = x^2 + y^2 on the curve x + y = 2. Why is there no maximum?
b. Find the maximum value of g(x, y) = x + y on the curve x^2 + y^2 = 2. Is there a minimum?
c. How are the questions (and answers) in parts a and b related?

*2. A wire has the shape of the circle x^2 + y^2 − 2y = 0. Its temperature at the point (x, y) is given by T (x, y) = 2x^2 + 3y. Find the maximum and minimum temperatures of the wire. (Be sure you've found all potential critical points!)
 
3. Find the maximum value of f (x, y, z) = 2x + 2y − z on the sphere of radius 2 centered at the origin.

4. Find the maximum and minimum values of the function f (x, y) = x^2 + xy + y^2 on the unit disk D = {x ∈ R2 : kxk ≤ 1}.

5. Find the point(s) on the ellipse x^2 + 4y^2 = 4 closest to the point (1, 0).

6. The temperature at the point x is given by f (x, y, z) = x^2 + 2y + 2z. Find the hottest and coldest points on the sphere x^2 + y^2 + z^2 = 3.

7. Find the volume of the largest rectangular box (with all its edges parallel to the coordinate axes) that can be inscribed in the ellipsoid x^2 + y^2/2 + z^2/3 = 1.

8. A space probe in the shape of the ellipsoid 4x^2 + y^2 + 4z^2 = 16 enters the earth's atmosphere and its surface begins to heat. After one hour, the temperature in °C on its surface is given by f (x, y, z) = 2x^2 + yz − 4z + 600. Find the hottest and coldest points on the probe's surface.

9. The temperature in space is given by f (x, y, z) = 3xy + z^3 − 3z. Prove that there are hottest and coldest points on the sphere x^2 + y^2 + z^2 − 2z = 0, and find them.
10. Let f (x, y, z) = xy + z^3 and S = {(x, y, z) : x^2 + y^2 + z^2 = 1, z ≥ 0}. Prove that f attains its global maximum and minimum on S and determine its global maximum and minimum points.

11. Among all triangles inscribed in the unit circle, which have the greatest area? (Hint: Consider
the three small triangles formed by joining the vertices to the center of the circle.)

12. Among all triangles inscribed in the unit circle, which have the greatest perimeter?
 
*13. Find the ellipse x^2/a^2 + y^2/b^2 = 1 that passes through the point (3, 1) and has the least area. (Recall that the area of the ellipse is πab.)

14. If α, β, and γ are the angles of a triangle, show that
sin(α/2) sin(β/2) sin(γ/2) ≤ 1/8 .
For what triangles is the maximum attained?
♯ *15. Find the points closest to and farthest from the origin on the ellipse 2x2 + 4xy + 5y 2 = 1.

16. Solve Exercise 5.2.8 anew using Lagrange multipliers.

17. Solve Exercise 5.2.9 anew using Lagrange multipliers.

18. Find the maximum and minimum values of the function f (x) = x1 + · · · + xn subject to the
constraint kxk = 1.

*19. Find the maximum volume of an n-dimensional rectangular parallelepiped of diameter δ.

20. Suppose x1 , . . . , xn are positive numbers. Prove that
(x1 x2 · · · xn )^{1/n} ≤ (x1 + · · · + xn )/n .
21. Suppose p, q > 0 and 1/p + 1/q = 1. Suppose x, y > 0. Use Lagrange multipliers to prove that
x^p/p + y^q/q ≥ xy.
(Hint: Minimize the left-hand side subject to the constraint xy = constant.)

22. Solve Exercise 5.2.11 anew using Lagrange multipliers.

23. A silo is built by putting a right circular cone atop a right circular cylinder (both having the
same radius). What dimensions will give the silo of maximum volume for a given surface area?

24. Solve Exercise 5.2.12 anew using Lagrange multipliers.

25. Use Lagrange multipliers to find the point closest to the origin on the line of intersection of the
planes x + 2y + z = 5 and 2x + y − z = 1.

26. In each case, find the point in the given subspace V closest to b.
*a. V = {x ∈ R3 : x1 − x2 + 3x3 = 2x1 + x2 = 0}, b = (3, 7, 1)
b. V = {x ∈ R4 : x1 + x2 + x3 + x4 = x1 + 2x3 + x4 = 0}, b = (3, 1, 1, −1)

*27. Find the points on the curve of intersection of the two surfaces x2 − xy + y 2 − z 2 = 1 and
x2 + y 2 = 1 that are closest to the origin.

28. Show that of all quadrilaterals with fixed side lengths, the one of maximum area can be inscribed
in a circle. (Hint: Use as variables a pair of opposite angles. See also Exercise 1.2.14.)

29. For each of the following symmetric matrices A, find all the extrema of Q(x) = xT Ax subject to the constraint kxk^2 = 1. Also determine the Lagrange multiplier each time.
*a. A = [ 1 2 ; 2 −2 ]          b. A = [ 0 3 ; 3 −8 ]
30. Find the norm of each of the following matrices. Note: A calculator will be helpful.
*a. [ 1 1 ; 0 1 ]          b. [ 2 1 ; 0 3 ]          c. [ 2 1 ; 1 3 ]

31. A (frictionless) lasso is thrown around two pegs, as pictured in Figure 4.4, and a large weight
hung from the free end. Treating the mass of the rope as insignificant, and supposing the weight
hangs freely, what is the equilibrium position of the system?

Figure 4.4. A lasso looped around two pegs, with a weight hanging from the free end.

32. (Interpreting the Lagrange multiplier)


a. Suppose a = ψ(c) is a local extreme point of the function f relative to the constraint
g(x) = c; suppose moreover that ψ is a differentiable function. Show that λ = (f ◦ ψ)′ (c).
b. Assume that f and g are C2 . Use the Implicit Function Theorem to show that the extreme
point a is given locally as a differentiable function of c whenever the “bordered Hessian”
[ Hess(f )(a) − λHess(g)(a)   ∇g(a) ; Dg(a)   0 ]
is invertible.

33. (An application of Exercise 32 to economics) Let x ∈ Rn be the commodity vector, p ∈ Rn


the price vector, and f : Rn → R the “production function,” so that f (x) tells us how many
widgets are produced using xi units of item i, i = 1, . . . , n. Prove that to produce the greatest
number of widgets with a given budget, we must have
1 ∂f 1 ∂f
λ= = ··· = .
p1 ∂x1 pn ∂xn
1 ∂f
The quantity is called the marginal productivity per dollar for item i. Explain why this
pi ∂xi
result is believable. What does the result of Exercise 32a tell us in this case?

34. (A second derivative test for constrained extrema) Suppose a is a critical point of f subject
to the constraint g(x) = c, Df (a) = λDg(a), and Dg(a) 6= O. Show that a is a constrained
local maximum (resp., minimum) of f on S = {x : g(x) = c} if the restriction of the Hessian of
f − λg to the tangent space Ta S is negative (resp., positive) definite. (Hint: Parametrize the
constraint surface g = c locally by Φ with Φ(a) = a and apply Theorem 3.3 to f ◦ Φ.) There
is an interpretation in terms of the bordered Hessian (see Exercise 32b), which is indicated in
Exercise 9.4.21.

5. Projections, Least Squares, and Inner Product Spaces

Example 1. Suppose we’re given the system Ax = b to solve, where


   
1 2 1
   
A = 0 1 and b = 1.
1 1 1
It is easy to check that b ∈
/ C(A), and so this system is inconsistent. The best we can do is to solve
Ax = p, where p is the vector in C(A) that is closest to b. Clearly that point is p = b − proja b,
where a is the normal vector to C(A) ⊂ R3 , as shown in Figure 5.1. Now we see how to solve our

b
C(A)

Figure 5.1

problem. C(A) is the plane in R3 with normal vector


 
−1
 
a =  1,
1
and if we compute proja b, then we will have

p = b − proja b ∈ Span(a)⊥ = C(A) and b − p = proja b ∈ C(A)⊥ .

In our case, we have


 
−1
b·a 1 
proja b = 2a = 3  1,
kak
1

and so
     
1 −1 4/3
  1   
p =  1  −  1  =  2/3  .
3
1 1 2/3
216 Chapter 5. Extremum Problems

Now it is an easy matter to solve Ax = p; indeed, the solution is


 
0
x= .
2/3
This is called the least squares solution of the original problem, inasmuch as Ax is the vector in
C(A) closest to b. ▽
In general, given b ∈ Rn and an m-dimensional subspace V ⊂ Rn , we can ask for the projection
of b onto V , i.e., the point in V closest to b, which we denote by projV b. We first make the official
Definition. Let V ⊂ Rn be a subspace, and let b ∈ Rn . We define the projection of b onto V
to be the unique vector p ∈ V with the property that b − p ∈ V ⊥ . We write p = projV b.
We ask the reader to show in Exercise 10 that projection onto a subspace V gives a linear map.
As we know from Chapter 4, we can be given V either explicitly (say V = C(A) for some n × m
matrix A) or implicitly (say V = N(B) for some m × n matrix B). We will start by applying the
methods of this chapter to obtain a simple solution of the problem (and then we will indicate that
we could have omitted the calculus completely).
Suppose A is an n × m matrix of rank m (so that the column vectors a1 , . . . , am give a basis
for our subspace V ). Define
f : Rm → R by f (x) = kAx − bk2 .
We seek critical points of f . Write h(x) = kxk2 and g(x) = Ax−b, so that f = h◦ g. Then Dh(y) =
2yT and Dg(x) = A, so, differentiating f by the chain rule, we have Df (x) = Dh(g(x))Dg(x) =
2(Ax − b)T A. Thus, Df (x) = 0 ⇐⇒ (Ax − b)T A = 0. Transposing for convenience, we deduce
that x is a critical point if and only if
(∗) (AT A)x = AT b.
Now, AT A is an m × m matrix, and by Exercise 4.4.14 this matrix is nonsingular, so the equation (∗) has a unique solution x̄. We claim that x̄ is the global minimum point. This is just the Pythagorean Theorem again: since AT (Ax̄ − b) = 0, we have Ax̄ − b ∈ N(AT ) = C(A)⊥ , so, for any x ∈ Rm with x ≠ x̄, as Figure 5.2 shows,
f (x) = kAx − bk^2 = kA(x − x̄) + (Ax̄ − b)k^2 = kA(x − x̄)k^2 + kAx̄ − bk^2 > f (x̄).
The vector x̄ is called the least squares solution of the (inconsistent) linear system Ax = b, and (∗) gives the associated normal equations.

Figure 5.2. The subspace V = C(A) ⊂ Rn , the vector b, its projection Ax̄ onto V , and another point Ax of V .

Remark. When A has rank less than m, the linear system (∗) is still consistent (see Exercise 4.4.15) and has infinitely many solutions. We define the least squares solution to be the one of smallest length, i.e., the unique vector x̄ ∈ R(A) that satisfies the equation. See Proposition 4.10 of Chapter 4. This leads to the pseudoinverse that is important in numerical analysis (cf. Strang).

Example 2. We wish to find the least squares solution of the system Ax = b, where
A = [ 2 1 ; 1 1 ; 0 1 ; 1 −1 ] and b = (2, 1, 1, −1).
We need only solve the normal equations AT Ax = AT b. Now,
AT A = [ 6 2 ; 2 4 ] and AT b = (4, 5),
and so, using the formula for the inverse of a 2 × 2 matrix in Example 5 on p. 146,
x̄ = (AT A)−1 AT b = (1/20) [ 4 −2 ; −2 6 ] (4, 5) = (1/10)(3, 11)
is the least squares solution. ▽
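
Numerically, the normal equations can be solved directly; the following Python sketch (illustration only, not part of the text) reproduces the answer and compares it with numpy's least squares routine.

```python
import numpy as np

A = np.array([[2.0, 1.0], [1.0, 1.0], [0.0, 1.0], [1.0, -1.0]])
b = np.array([2.0, 1.0, 1.0, -1.0])

x_bar = np.linalg.solve(A.T @ A, A.T @ b)    # the normal equations (*)
print(x_bar)                                 # [0.3 1.1] = (3/10, 11/10)

# numpy's least squares routine gives the same answer:
print(np.linalg.lstsq(A, b, rcond=None)[0])
```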

This is all it takes to give an explicit formula for projection onto a subspace V ⊂ Rn . In
particular, denote by
projV : Rn → Rn

the function which assigns to each vector b ∈ Rn the vector p ∈ V closest to b. Start by choosing
a basis {v1 , . . . , vm } for V , and let
A = [ v1 v2 · · · vm ]
be the n × m matrix whose column vectors are these basis vectors. Then, given b ∈ Rn , we know
that if we take x = (AT A)−1 AT b, then Ax = p = projV b. That is,

p = projV b = A(AT A)−1 AT b,

from which we deduce that the matrix

(†) P = A(AT A)−1 AT

is the appropriate “projection matrix”: i.e.,

[projV ] = A(AT A)−1 AT .

In Section 5.2, we’ll see a bit more of the geometry underlying the formula for the projection matrix.

Example 3. If b ∈ C(A) to begin with, then b = Ax for some x ∈ Rm , and



P b = A(AT A)−1 AT b = A(AT A)−1 (AT A)x = Ax = b,

as it should be. And if b ∈ C(A)⊥ , then b ∈ N(AT ), so



P b = A(AT A)−1 AT b = A(AT A)−1 (AT b) = 0,

as it should be. ▽

Example 4. Note that when dim V = 1, we recover our formula for projection onto a line from
Section 2 of Chapter 1. If a ∈ Rn is a nonzero vector, we consider it as an n × 1 matrix and the
projection formula becomes
1
P = aaT ;
kak2

that is,

1 1 a·b
Pb = 2
(aaT )b = 2
a(aT b) = a,
kak kak kak2
as before. ▽

Example 5. Let V ⊂ R3 be the plane defined by the equation x1 − 2x2 + x3 = 0. Then
v1 = (2, 1, 0) and v2 = (−1, 0, 1)
form a basis for V , and we take
A = [ 2 −1 ; 1 0 ; 0 1 ].
Then, since
AT A = [ 5 −2 ; −2 2 ], we have (AT A)−1 = (1/6) [ 2 2 ; 2 5 ],
and so
P = A(AT A)−1 AT = (1/6) [ 5 2 −1 ; 2 2 2 ; −1 2 5 ]. ▽
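
The formula (†) is easy to evaluate numerically. The Python sketch below (illustrative only) recomputes P for this example and checks the characteristic properties of a projection: it fixes V , annihilates the normal direction, and satisfies P^2 = P.

```python
import numpy as np

A = np.array([[2.0, -1.0], [1.0, 0.0], [0.0, 1.0]])   # columns form a basis for V
P = A @ np.linalg.inv(A.T @ A) @ A.T
print(np.round(6 * P))    # [[ 5.  2. -1.] [ 2.  2.  2.] [-1.  2.  5.]]

# Sanity checks: P fixes vectors in V and kills the normal vector (1, -2, 1).
print(np.allclose(P @ A, A))                           # True
print(np.allclose(P @ np.array([1.0, -2.0, 1.0]), 0))  # True
print(np.allclose(P @ P, P))                           # True: projecting twice changes nothing
```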

Now, what happens if we are given the subspace implicitly? This sounds like the perfect set-up
for Lagrange multipliers. Suppose the m-dimensional subspace V ⊂ Rn is given as the nullspace
of an (n − m) × n matrix B of rank n − m. To find the point in V closest to b ∈ Rn , we want to
minimize the function

f : Rn → R, f (x) = kx − bk2 , subject to the constraint g(x) = Bx = 0.


The method of Lagrange multipliers, Theorem 4.1, tells us that we must have (dropping the factor of 2)
(x − b)T = Σ_{i=1}^{n−m} λi Bi for some scalars λ1 , . . . , λn−m ,
where, as usual, Bi are the rows of B. Transposing this equation, we have
x − b = B T λ, where λ = (λ1 , . . . , λn−m )T .
Multiplying this equation by B and using the constraint equation, we get
(BB T )λ = −Bb.
By analogy with our treatment of the equation (∗), the matrix BB T has rank n − m, and so we can solve for λ, hence for the constrained extremum x0 :
(‡) x0 = b + B T (−(BB T )−1 Bb) = b − B T (BB T )−1 Bb.
Note that, according to our projection formula (†), we can interpret this answer as
x0 = b − projC(B T ) b = b − projR(B) b = projR(B)⊥ b = projN(B) b,
as it should be.

5.1. Data fitting. Perhaps the most natural setting in which inconsistent systems of equations arise is that of fitting data to a curve when they won't quite fit. For example, in our laboratory work many of us have tried to find the right constants a and k so that the data points (x1 , y1 ), . . . , (xm , ym ) lie on the curve y = ax^k . Taking natural logarithms, we see that this is equivalent to fitting the points (ui , vi ) = (log xi , log yi ), i = 1, . . . , m, to a line v = ku + log a—whence the convenience of log-log paper. The least squares solution of such problems is called the least squares line fitting the points (or the line of regression in statistics).
Example 6. Find the least squares line y = ax + b for the data points (−1, 0), (1, 1), and (2, 3). (See Figure 5.3.) We get the system of equations
−1a + b = 0
1a + b = 1
2a + b = 3 ,
which in matrix form becomes
A (a, b)T = [ −1 1 ; 1 1 ; 2 1 ] (a, b)T = (0, 1, 3).
The least squares solution is
(a, b)T = (AT A)−1 AT (0, 1, 3) = (1/14) [ 3 −2 ; −2 6 ] (7, 4) = (1/14)(13, 10).
That is, the least squares line is
y = (13/14)x + 5/7 . ▽

Figure 5.3. The three data points and the least squares line.
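
The same computation in Python (an illustrative sketch, not part of the text); the last line anticipates the observation made below that the errors sum to zero.

```python
import numpy as np

xs = np.array([-1.0, 1.0, 2.0])
ys = np.array([0.0, 1.0, 3.0])
A = np.column_stack([xs, np.ones_like(xs)])

a, b = np.linalg.solve(A.T @ A, A.T @ ys)   # normal equations
print(a, b)                                 # 13/14 ≈ 0.9286 and 5/7 ≈ 0.7143

errors = ys - (a * xs + b)
print(errors.sum())                         # essentially 0
```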
   
When we find the least squares line y = ax + b fitting the data points $\begin{bmatrix}x_1\\y_1\end{bmatrix},\dots,\begin{bmatrix}x_m\\y_m\end{bmatrix}$, we are finding the least squares solution of the (inconsistent) system $A\begin{bmatrix}a\\b\end{bmatrix} = \mathbf{y}$, where

$$A = \begin{bmatrix}x_1 & 1\\ x_2 & 1\\ \vdots & \vdots\\ x_m & 1\end{bmatrix} \quad\text{and}\quad \mathbf{y} = \begin{bmatrix}y_1\\ y_2\\ \vdots\\ y_m\end{bmatrix}.$$

Let's denote by $\bar{\mathbf{y}} = A\begin{bmatrix}a\\b\end{bmatrix}$ the projection of $\mathbf{y}$ onto C(A). The least squares solution $\begin{bmatrix}a\\b\end{bmatrix}$ has the property that $\|\mathbf{y} - \bar{\mathbf{y}}\|$ is as small as possible. If we define the error vector $\epsilon = \mathbf{y} - \bar{\mathbf{y}}$, then we have

$$\epsilon = \begin{bmatrix}\epsilon_1\\ \epsilon_2\\ \vdots\\ \epsilon_m\end{bmatrix} = \begin{bmatrix}y_1 - \bar y_1\\ y_2 - \bar y_2\\ \vdots\\ y_m - \bar y_m\end{bmatrix} = \begin{bmatrix}y_1 - (ax_1 + b)\\ y_2 - (ax_2 + b)\\ \vdots\\ y_m - (ax_m + b)\end{bmatrix}.$$

The least squares process chooses a and b so that $\|\epsilon\|^2 = \epsilon_1^2 + \cdots + \epsilon_m^2$ is as small as possible. But something interesting happens. Recall that

$$\epsilon = \mathbf{y} - \bar{\mathbf{y}} \in C(A)^\perp.$$

Thus, $\epsilon$ is orthogonal to each of the column vectors of A, and so, in particular,

$$\begin{bmatrix}\epsilon_1\\ \vdots\\ \epsilon_m\end{bmatrix}\cdot\begin{bmatrix}1\\ \vdots\\ 1\end{bmatrix} = \epsilon_1 + \cdots + \epsilon_m = 0.$$

That is, in the process of minimizing the sum of the squares of the errors $\epsilon_i$, we have in fact made their (algebraic) sum equal to 0.

5.2. Orthogonal Bases. We have seen how to find the projection of a vector onto a subspace
V ⊂ Rn using the so-called normal equations. But the inner workings of the formula (†) on p. 217
escape us. Since we have known since Chapter 1 how to project a vector x onto a line, it might
seem more natural to start with a basis {v1 , . . . , vk } for V and sum up the projections of x onto
the vj 's. However, as we see in Figure 5.4(a), when we start with x ∈ V and add the projections of x onto the vectors of an arbitrary basis for V , the resulting vector needn't have much to do with x. Nevertheless, the diagram on the right suggests that when we start with a basis consisting of mutually orthogonal vectors, the process may work. We begin by proving this as a lemma.

Figure 5.4

Definition. Let v1 , . . . , vk ∈ Rn . We say {v1 , . . . , vk } is an orthogonal set of vectors provided vi · vj = 0 whenever i ≠ j. We say {v1 , . . . , vk } is an orthogonal basis for a subspace V if {v1 , . . . , vk } is both a basis for V and an orthogonal set.

Lemma 5.1. Suppose {v1 , . . . , vk } is a basis for V and x ∈ V . Then

$$x = \sum_{i=1}^{k}\operatorname{proj}_{v_i}x = \sum_{i=1}^{k}\frac{x\cdot v_i}{\|v_i\|^2}\,v_i$$

if and only if {v1 , . . . , vk } is an orthogonal basis for V .

Proof. Suppose {v1 , . . . , vk } is an orthogonal basis for V . Then there are scalars c1 , . . . , ck so
that
x = c1 v1 + · · · + ci vi + · · · + ck vk .
Taking advantage of the orthogonality of the vj 's, we take the dot product of this equation with vi :

$$x\cdot v_i = c_1(v_1\cdot v_i) + \cdots + c_i(v_i\cdot v_i) + \cdots + c_k(v_k\cdot v_i) = c_i\|v_i\|^2,$$

and so

$$c_i = \frac{x\cdot v_i}{\|v_i\|^2}.$$

(Note that vi ≠ 0 since {v1 , . . . , vk } forms a basis for V .)
Conversely, suppose that every vector x ∈ V is the sum of its projections on v1 , . . . , vk . Let's just examine what this means when x = v1 : we are given that

$$v_1 = \sum_{i=1}^{k}\operatorname{proj}_{v_i}v_1 = \sum_{i=1}^{k}\frac{v_1\cdot v_i}{\|v_i\|^2}\,v_i.$$

Recall from Proposition 3.1 of Chapter 4 that every vector has a unique expansion as a linear combination of basis vectors, so comparing coefficients of v2 , . . . , vk on either side of this equation, we conclude that

$$v_1\cdot v_i = 0 \quad\text{for all } i = 2, \dots, k.$$

A similar argument shows that vi · vj = 0 for all i ≠ j, and the proof is complete. □

As we mentioned above, if {v1 , . . . , vk } is a basis for V , then every vector x ∈ V can be written
uniquely as a linear combination

x = c1 v1 + c2 v2 + · · · + ck vk .

We recall that the coefficients c1 , c2 , . . . , ck that appear here are called the coordinates of x with
respect to the basis {v1 , . . . , vk }. It is worth emphasizing that when {v1 , . . . , vk } forms an orthog-
onal basis for V , it is quite easy to compute the coordinates of x by using the dot product, that is,
$c_i = x\cdot v_i/\|v_i\|^2$. As we saw in Example 8 of Section 3 of Chapter 4 (see also Section 1 of Chapter
9), when the basis is not orthogonal, it is far more tedious to compute these coordinates.
Not only do orthogonal bases make it easy to calculate coordinates, they also make projections
quite easy to compute, as we now see.

Proposition 5.2. Let V ⊂ Rn be a k-dimensional subspace. For any vector b ∈ Rn ,

$$(**)\qquad \operatorname{proj}_V b = \sum_{i=1}^{k}\operatorname{proj}_{v_i}b = \sum_{i=1}^{k}\frac{b\cdot v_i}{\|v_i\|^2}\,v_i$$

if and only if {v1 , . . . , vk } is an orthogonal basis for V .

Proof. Assume {v1 , . . . , vk } is an orthogonal basis for V and write b = p + (b − p), where p = projV b (and so b − p ∈ V ⊥ ). Then, since p ∈ V , by Lemma 5.1, we know

$$p = \sum_{i=1}^{k}\frac{p\cdot v_i}{\|v_i\|^2}\,v_i.$$

Moreover, for i = 1, . . . , k, we have b · vi = p · vi , since b − p ∈ V ⊥ . Thus,

$$\operatorname{proj}_V b = p = \sum_{i=1}^{k}\operatorname{proj}_{v_i}p = \sum_{i=1}^{k}\frac{p\cdot v_i}{\|v_i\|^2}\,v_i = \sum_{i=1}^{k}\frac{b\cdot v_i}{\|v_i\|^2}\,v_i = \sum_{i=1}^{k}\operatorname{proj}_{v_i}b.$$

Conversely, suppose $\operatorname{proj}_V b = \sum_{i=1}^{k}\operatorname{proj}_{v_i}b$ for all b ∈ Rn . In particular, when b ∈ V , we deduce that b = projV b can be written as a linear combination of v1 , . . . , vk , so these vectors span V ; since V is k-dimensional, {v1 , . . . , vk } gives a basis for V . By Lemma 5.1, it must be an orthogonal basis. □

We now have another way to calculate the projection of a vector on a subspace V , provided we
can come up with an orthogonal basis for V .

Example 7. We return to Example 5 on p. 218. The basis {v1 , v2 } we used there was certainly not an orthogonal basis, but it is not hard to find one that is. Instead, we take

$$w_1 = \begin{bmatrix}-1\\0\\1\end{bmatrix} \quad\text{and}\quad w_2 = \begin{bmatrix}1\\1\\1\end{bmatrix}.$$

(It is immediate that w1 · w2 = 0 and that w1 , w2 lie in the plane x1 − 2x2 + x3 = 0.) Now, we calculate

$$\begin{aligned}
\operatorname{proj}_V b &= \operatorname{proj}_{w_1}b + \operatorname{proj}_{w_2}b = \frac{b\cdot w_1}{\|w_1\|^2}w_1 + \frac{b\cdot w_2}{\|w_2\|^2}w_2\\
&= \left(\frac{1}{\|w_1\|^2}w_1w_1^T + \frac{1}{\|w_2\|^2}w_2w_2^T\right)b\\
&= \left(\frac{1}{2}\begin{bmatrix}1 & 0 & -1\\ 0 & 0 & 0\\ -1 & 0 & 1\end{bmatrix} + \frac{1}{3}\begin{bmatrix}1 & 1 & 1\\ 1 & 1 & 1\\ 1 & 1 & 1\end{bmatrix}\right)b\\
&= \begin{bmatrix}\tfrac{5}{6} & \tfrac{1}{3} & -\tfrac{1}{6}\\ \tfrac{1}{3} & \tfrac{1}{3} & \tfrac{1}{3}\\ -\tfrac{1}{6} & \tfrac{1}{3} & \tfrac{5}{6}\end{bmatrix}b,
\end{aligned}$$

as we found earlier. ▽

Remark. This is exactly what we get from formula (†) on p. 217 when {v1 , . . . , vk } is an orthogonal set. In particular,

$$P = A(A^TA)^{-1}A^T = \begin{bmatrix} | & | & & |\\ v_1 & v_2 & \cdots & v_k\\ | & | & & |\end{bmatrix}\begin{bmatrix}\frac{1}{\|v_1\|^2} & & &\\ & \frac{1}{\|v_2\|^2} & &\\ & & \ddots &\\ & & & \frac{1}{\|v_k\|^2}\end{bmatrix}\begin{bmatrix}v_1^T\\ v_2^T\\ \vdots\\ v_k^T\end{bmatrix} = \sum_{i=1}^{k}\frac{1}{\|v_i\|^2}\,v_iv_i^T.$$

Now it is time to develop an algorithm for transforming a given (ordered) basis {v1 , . . . , vk }
for a subspace into an orthogonal basis {w1 , . . . , wk }, as shown in Figure 5.5. The idea is quite
simple. We set
w1 = v1 .

If v2 is orthogonal to w1 , then we set w2 = v2 . Of course, in general, it will not be, and we want
w2 to be the part of v2 that is orthogonal to w1 ; i.e., we set
$$w_2 = v_2 - \operatorname{proj}_{w_1}v_2 = v_2 - \frac{v_2\cdot w_1}{\|w_1\|^2}w_1.$$

Then, by construction, w1 and w2 are orthogonal and Span(w1 , w2 ) ⊂ Span(v1 , v2 ). Since w2 ≠ 0 (why?), {w1 , w2 } must be linearly independent and therefore give a basis for Span(v1 , v2 ) by Lemma 3.8. We continue, replacing v3 by its part orthogonal to the plane spanned by w1 and w2 :

Figure 5.5

$$w_3 = v_3 - \operatorname{proj}_{\operatorname{Span}(w_1,w_2)}v_3 = v_3 - \operatorname{proj}_{w_1}v_3 - \operatorname{proj}_{w_2}v_3 = v_3 - \frac{v_3\cdot w_1}{\|w_1\|^2}w_1 - \frac{v_3\cdot w_2}{\|w_2\|^2}w_2.$$

Note that we are making definite use of Proposition 5.2 here: we must use w1 and w2 in the formula here, rather than v1 and v2 , because the formula (∗∗) requires an orthogonal basis. Once again, we find that w3 ≠ 0 (why?), and so {w1 , w2 , w3 } must be linearly independent, and, consequently, an orthogonal basis for Span(v1 , v2 , v3 ). The process continues until we have arrived at vk and replaced it by

$$w_k = v_k - \operatorname{proj}_{\operatorname{Span}(w_1,\dots,w_{k-1})}v_k = v_k - \frac{v_k\cdot w_1}{\|w_1\|^2}w_1 - \frac{v_k\cdot w_2}{\|w_2\|^2}w_2 - \cdots - \frac{v_k\cdot w_{k-1}}{\|w_{k-1}\|^2}w_{k-1}.$$

Summarizing, we have the algorithm that goes by the name of the Gram-Schmidt process.

Theorem 5.3 (Gram-Schmidt Process). Given a basis {v1 , . . . , vk } for a subspace V ⊂ Rn , we obtain an orthogonal basis {w1 , . . . , wk } for V as follows:

$$w_1 = v_1, \qquad w_2 = v_2 - \frac{v_2\cdot w_1}{\|w_1\|^2}w_1, \qquad \dots$$

and, assuming w1 , . . . , wj have been defined,

$$w_{j+1} = v_{j+1} - \frac{v_{j+1}\cdot w_1}{\|w_1\|^2}w_1 - \frac{v_{j+1}\cdot w_2}{\|w_2\|^2}w_2 - \cdots - \frac{v_{j+1}\cdot w_j}{\|w_j\|^2}w_j,$$

$$\vdots$$

$$w_k = v_k - \frac{v_k\cdot w_1}{\|w_1\|^2}w_1 - \frac{v_k\cdot w_2}{\|w_2\|^2}w_2 - \cdots - \frac{v_k\cdot w_{k-1}}{\|w_{k-1}\|^2}w_{k-1}.$$

If we so desire, we can arrange for an orthogonal basis consisting of unit vectors by dividing each of w1 , . . . , wk by its respective length:

$$q_1 = \frac{w_1}{\|w_1\|},\quad q_2 = \frac{w_2}{\|w_2\|},\quad \dots,\quad q_k = \frac{w_k}{\|w_k\|}.$$

The set {q1 , . . . , qk } is called an orthonormal basis for V .
     
Example 8. Let $v_1 = \begin{bmatrix}1\\1\\1\\1\end{bmatrix}$, $v_2 = \begin{bmatrix}3\\1\\-1\\1\end{bmatrix}$, and $v_3 = \begin{bmatrix}1\\1\\3\\3\end{bmatrix}$. We want to use the Gram-Schmidt process to give an orthogonal basis for V = Span(v1 , v2 , v3 ) ⊂ R4 . We take

$$w_1 = v_1 = \begin{bmatrix}1\\1\\1\\1\end{bmatrix};$$

$$w_2 = v_2 - \frac{v_2\cdot w_1}{\|w_1\|^2}w_1 = \begin{bmatrix}3\\1\\-1\\1\end{bmatrix} - \frac{4}{4}\begin{bmatrix}1\\1\\1\\1\end{bmatrix} = \begin{bmatrix}2\\0\\-2\\0\end{bmatrix};$$

$$w_3 = v_3 - \frac{v_3\cdot w_1}{\|w_1\|^2}w_1 - \frac{v_3\cdot w_2}{\|w_2\|^2}w_2 = \begin{bmatrix}1\\1\\3\\3\end{bmatrix} - \frac{8}{4}\begin{bmatrix}1\\1\\1\\1\end{bmatrix} - \frac{-4}{8}\begin{bmatrix}2\\0\\-2\\0\end{bmatrix} = \begin{bmatrix}0\\-1\\0\\1\end{bmatrix}.$$

And if we desire an orthonormal basis, then we take

$$q_1 = \frac{1}{2}\begin{bmatrix}1\\1\\1\\1\end{bmatrix},\qquad q_2 = \frac{1}{\sqrt{2}}\begin{bmatrix}1\\0\\-1\\0\end{bmatrix},\qquad q_3 = \frac{1}{\sqrt{2}}\begin{bmatrix}0\\-1\\0\\1\end{bmatrix}.$$

It's always a good idea to check that the vectors form an orthogonal (or orthonormal) set, and it's easy—with these numbers—to do so. ▽
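The Gram-Schmidt process of Theorem 5.3 translates almost verbatim into code. Here is a minimal sketch (written for clarity rather than numerical stability) applied to the vectors of Example 8.

```python
import numpy as np

def gram_schmidt(vs):
    """Return an orthogonal basis for Span(vs), following Theorem 5.3."""
    ws = []
    for v in vs:
        w = v.astype(float)
        for u in ws:
            w = w - (np.dot(v, u) / np.dot(u, u)) * u   # subtract proj_u v
        ws.append(w)
    return ws

v1 = np.array([1, 1, 1, 1])
v2 = np.array([3, 1, -1, 1])
v3 = np.array([1, 1, 3, 3])

for w in gram_schmidt([v1, v2, v3]):
    print(w)   # [1,1,1,1], [2,0,-2,0], [0,-1,0,1] (as floats)
```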

5.3. Inner Product Spaces. In certain abstract vector spaces we may define a notion of dot
product.

Definition. Let V be a real vector space. We say V is an inner product space if for every pair of elements u, v ∈ V there is a real number ⟨u, v⟩, called the inner product of u and v, such that:
(1) ⟨u, v⟩ = ⟨v, u⟩ for all u, v ∈ V ;
(2) ⟨cu, v⟩ = c⟨u, v⟩ for all u, v ∈ V and scalars c;
(3) ⟨u + v, w⟩ = ⟨u, w⟩ + ⟨v, w⟩ for all u, v, w ∈ V ;
(4) ⟨v, v⟩ ≥ 0 for all v ∈ V and ⟨v, v⟩ = 0 only if v = 0.

Examples 9. (a) Fix k + 1 distinct real numbers t1 , t2 , . . . , tk+1 and define an inner product on Pk by the formula

$$\langle p, q\rangle = \sum_{i=1}^{k+1}p(t_i)q(t_i), \qquad p, q \in \mathcal{P}_k.$$

All the properties of an inner product are obvious except for the very last. If ⟨p, p⟩ = 0, then $\sum_{i=1}^{k+1}p(t_i)^2 = 0$, and so we must have p(t1 ) = p(t2 ) = · · · = p(tk+1 ) = 0. But if a polynomial of degree ≤ k has (at least) k + 1 roots, then it must be the zero polynomial.
(b) Let C0 ([a, b]) denote the vector space of continuous functions on the interval [a, b]. If f, g ∈ C0 ([a, b]), define

$$\langle f, g\rangle = \int_a^b f(t)g(t)\,dt.$$
We verify that the defining properties hold.
(1) $\langle f, g\rangle = \int_a^b f(t)g(t)\,dt = \int_a^b g(t)f(t)\,dt = \langle g, f\rangle$.
(2) $\langle cf, g\rangle = \int_a^b (cf)(t)g(t)\,dt = \int_a^b cf(t)g(t)\,dt = c\int_a^b f(t)g(t)\,dt = c\langle f, g\rangle$.
(3) $\langle f+g, h\rangle = \int_a^b (f+g)(t)h(t)\,dt = \int_a^b \bigl(f(t)+g(t)\bigr)h(t)\,dt = \int_a^b \bigl(f(t)h(t)+g(t)h(t)\bigr)\,dt = \int_a^b f(t)h(t)\,dt + \int_a^b g(t)h(t)\,dt = \langle f, h\rangle + \langle g, h\rangle$.
(4) $\langle f, f\rangle = \int_a^b f(t)^2\,dt \ge 0$ since f(t)² ≥ 0 for all t. On the other hand, if $\langle f, f\rangle = \int_a^b f(t)^2\,dt = 0$, then since f is continuous and f² ≥ 0, it must be the case that f = 0. (If not, we would have f(t₀) ≠ 0 for some t₀, and then f(t)² would be positive on some small interval containing t₀; it would then follow that $\int_a^b f(t)^2\,dt > 0$.)
The same inner product can be defined on subspaces of C0 ([a, b]), e.g., Pk .
(c) We define an inner product on Mn×n in Exercise 18. ▽
If V is an inner product space, we define length, orthogonality, and the angle between vectors just as we did in Rn . If v ∈ V , we define its length to be $\|v\| = \sqrt{\langle v, v\rangle}$. We say v and w are orthogonal if ⟨v, w⟩ = 0. Since the Cauchy-Schwarz inequality can be established in general by following the proof of Proposition 2.3 of Chapter 1 verbatim, we can define the angle θ between v and w by the equation

$$\cos\theta = \frac{\langle v, w\rangle}{\|v\|\,\|w\|}.$$

We can define orthogonal subspaces, orthogonal complements, and the Gram-Schmidt process analogously.
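To get a feel for this inner product numerically, one can approximate the integral by quadrature. The sketch below is a rough midpoint-rule approximation (purely illustrative; the functions and interval are made-up choices) that computes the angle between f(t) = 1 and g(t) = t in C0 ([0, 1]).

```python
import numpy as np

def inner(f, g, a, b, n=10000):
    """Midpoint-rule approximation of <f, g> = integral of f(t) g(t) on [a, b]."""
    t = a + (np.arange(n) + 0.5) * (b - a) / n
    return np.sum(f(t) * g(t)) * (b - a) / n

f = lambda t: np.ones_like(t)
g = lambda t: t

ip = inner(f, g, 0.0, 1.0)
cos_theta = ip / np.sqrt(inner(f, f, 0.0, 1.0) * inner(g, g, 0.0, 1.0))
print(np.degrees(np.arccos(cos_theta)))   # about 30 degrees, since cos(theta) = sqrt(3)/2
```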
We can use the inner product defined in Example 9(a) to prove the following important result
about curve fitting.
Theorem 5.4 (Lagrange Interpolation Formula). Given k + 1 points

$$\begin{bmatrix}t_1\\b_1\end{bmatrix},\ \begin{bmatrix}t_2\\b_2\end{bmatrix},\ \dots,\ \begin{bmatrix}t_{k+1}\\b_{k+1}\end{bmatrix}$$

in the plane with t1 , t2 , . . . , tk+1 distinct, there is exactly one polynomial p ∈ Pk whose graph passes through the points.
Proof. We begin by explicitly constructing a basis for Pk consisting of mutually orthogonal
vectors of length 1 with respect to the inner product defined in Example 9(a). That is, to start,
we seek a polynomial p1 ∈ Pk so that
p1 (t1 ) = 1, p1 (t2 ) = 0, ..., p1 (tk+1 ) = 0.
The polynomial q1 (t) = (t − t2 )(t − t3 ) · · · (t − tk+1 ) has the property that q1 (tj ) = 0 for j = 2, 3, . . . , k + 1, and q1 (t1 ) = (t1 − t2 )(t1 − t3 ) · · · (t1 − tk+1 ) ≠ 0 (why?). So now we set

$$p_1(t) = \frac{(t - t_2)(t - t_3)\cdots(t - t_{k+1})}{(t_1 - t_2)(t_1 - t_3)\cdots(t_1 - t_{k+1})};$$

then, as desired, p1 (t1 ) = 1 and p1 (tj ) = 0 for j = 2, 3, . . . , k + 1. Similarly, we can define

$$p_2(t) = \frac{(t - t_1)(t - t_3)\cdots(t - t_{k+1})}{(t_2 - t_1)(t_2 - t_3)\cdots(t_2 - t_{k+1})}$$

and polynomials p3 , . . . , pk+1 so that

$$p_i(t_j) = \begin{cases}1, & \text{when } i = j\\ 0, & \text{when } i \ne j.\end{cases}$$

Like the standard basis vectors in Euclidean space, p1 , p2 , . . . , pk+1 are unit vectors in Pk that
are orthogonal to one another. It follows from Exercise 4.3.5 that these vectors form a linearly
independent set, hence a basis for Pk (why?). In Figure 5.6 we give the graphs of the Lagrange
“basis polynomials” p1 , p2 , p3 for P2 when t1 = −1, t2 = 0, and t3 = 2.

Figure 5.6

Now it is easy to see that the appropriate linear combination

p = b1 p1 + b2 p2 + · · · + bk+1 pk+1

has the desired properties: viz., p(tj ) = bj for j = 1, 2, . . . , k+1. On the other hand, two polynomials
of degree ≤ k with the same values at k+1 points must be equal since their difference is a polynomial
of degree ≤ k with at least k+1 roots. This establishes uniqueness. (More elegantly, any polynomial
q with q(tj ) = bj , j = 1, . . . , k + 1, must satisfy ⟨q, pj ⟩ = bj , j = 1, . . . , k + 1.) □
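The proof is constructive, and the construction is easy to code. The following sketch (the interpolation values are made-up illustrative data) builds the Lagrange basis polynomials for the nodes t₁ = −1, t₂ = 0, t₃ = 2 used in Figure 5.6 and assembles the interpolating polynomial p = b₁p₁ + b₂p₂ + b₃p₃.

```python
def lagrange_basis(ts, i):
    """Return p_i as a callable: p_i(t_i) = 1 and p_i(t_j) = 0 for j != i."""
    def p(t):
        val = 1.0
        for j, tj in enumerate(ts):
            if j != i:
                val = val * (t - tj) / (ts[i] - tj)
        return val
    return p

ts = [-1.0, 0.0, 2.0]     # the nodes t_1, t_2, t_3 of Figure 5.6
bs = [2.0, -1.0, 5.0]     # hypothetical values b_1, b_2, b_3 to interpolate

def p(t):
    return sum(b * lagrange_basis(ts, i)(t) for i, b in enumerate(bs))

print([p(t) for t in ts])  # recovers [2.0, -1.0, 5.0]
```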

EXERCISES 5.5

1. Find the projection of the given vector b ∈ Rn onto the given hyperplane V ⊂ Rn .
   a. $V = \{x_1 + x_2 + x_3 = 0\} \subset \mathbb{R}^3$, $b = \begin{bmatrix}2\\1\\1\end{bmatrix}$
   *b. $V = \{x_1 + x_2 + x_3 = 0\} \subset \mathbb{R}^4$, $b = \begin{bmatrix}0\\1\\2\\3\end{bmatrix}$
   c. $V = \{x_1 - x_2 + x_3 + 2x_4 = 0\} \subset \mathbb{R}^4$, $b = \begin{bmatrix}1\\1\\1\\1\end{bmatrix}$

2. Check from the formula P = A(AT A)−1 AT for the projection matrix that P = P T and P 2 = P .
Show that I − P has the same properties; explain.
   
1 0
3. Let V = Span  0  ,  1  ⊂ R3 . Construct the matrix [projV ]
1 −2
a. by finding [proj(V ⊥ ) ];
b. by using the projection matrix P given in formula (†) on p. 217.
c. by finding an orthogonal basis for V .

*4. a. Find the least squares solution of


x1 + x2 = 4
2x1 + x2 = −2
x1 − x2 = 1 .
     
1 1 4
b. Find the point on the plane spanned by  2  and  1  that is closest to  −2 .
1 −1 1
5. a. Find the least squares solution of
x1 + x2 = 1
x1 − 3x2 = 4
2x1 + x2 = 3 .
     
1 1 1
b. Find the point on the plane spanned by  1  and  −3  that is closest to  4 .
2 1 3

6. Solve Exercise 5.4.26 anew using (‡) on p. 219.


       
−1 0 1 2
7. Consider the four data points , , , .
0 1 3 5
*a. Find the “least squares horizontal line” y = a fitting the data points. Check that the sum
of the errors is 0.
b. Find the “least squares line” y = ax + b fitting the data points. Check that the sum of the
errors is 0.
*c. Find the “least squares parabola” y = ax2 + bx + c fitting the data points. (Calculator
recommended.) What is true of the sum of the errors in this case?
       
1 2 3 4
8. Consider the four data points , , , .
1 2 1 3
a. Find the “least squares horizontal line” y = a fitting the data points. Check that the sum
of the errors is 0.
b. Find the “least squares line” y = ax + b fitting the data points. Check that the sum of the
errors is 0.
c. Find the “least squares parabola” y = ax2 + bx + c fitting the data points. (Calculator
recommended.) What is true of the sum of the errors in this case?

9. Derive the equation (∗) on p. 216 by starting with the equation Ax = p and using the result
of Theorem 4.9 of Chapter 4.

10. Prove from the definition of projV on p. 216 that



a. projV (x + y) = projV x + projV y for all vectors x and y;


b. projV (cx) = cprojV x for all vectors x and scalars c.
c. for any b ∈ Rn we have b = projV b + projV ⊥ b.
Parts a and b tell us that projV is a linear map.

11. Using the definition of projection on p. 216, prove that


a. if [projV ] = A, then A = A2 and A = AT . (Hint: For the latter, show that Ax · y = x · Ay
for all x, y. It may be helpful to write x and y as the sum of vectors in V and V ⊥ .)
b. if A2 = A and A = AT , then A is a projection matrix. (Hints: First decide onto which
subspace it should be projecting. Then show that for all x, the vector Ax lies in that
subspace and x − Ax is orthogonal to that subspace.)

12. Execute the Gram-Schmidt process in each case to give an orthonormal basis for the subspace
spanned
 bythegiven
 vectors.
1 2 3
a.  0 ,  1 ,  2 
0 0 1
     
1 0 0
b.  1 ,  1 ,  0 
1 1 1
     
1 2 0
0 1  1
*c.  ,  ,  
1 0  2
0 1 −3
     
−1 2 −1
 2   −4   3 
d.      
 0 ,  1 ,  1 
2 −4 1
     
*13. Let $V = \operatorname{Span}\left(\begin{bmatrix}1\\-1\\0\\2\end{bmatrix}, \begin{bmatrix}1\\0\\1\\1\end{bmatrix}\right) \subset \mathbb{R}^4$, and let $b = \begin{bmatrix}1\\-3\\1\\1\end{bmatrix}$.
a. Find an orthogonal basis for V .
b. Use your answer to part a to find p = projV b.
c. Letting
 
1 1
 −1 0
 
A= ,
 0 1
2 1

use your answer to part b to give the least squares solution of Ax = b.


   
1 0
 0   1 
14. Let V = Span     4
 1  ,  −1  ⊂ R .
1 1
a. Give an orthogonal basis for V .

b. Give an orthogonal basis for V ⊥ .


c. Given a general vector x ∈ R4 , find v ∈ V and w ∈ V ⊥ so that x = v + w.

15. According to Proposition 4.10 of Chapter 4, if A is an m × n matrix, then for each b ∈ C(A),
there is a "unique x ∈ R(A)
# with Ax = b. In each case, give a formula for that x.
1 2 3
a. A =
1 2 3
" #
1 1 1
*b. A =
0 1 −1
" #
1 1 1 1
c. A =
1 1 3 −5
 
1 1 1 1
 
d. A =  1 1 3 −5 
2 2 4 −4
♯ 16. Let A be an n × n matrix and, as usual, let a1 , . . . , an denote its column vectors.
a. Suppose a1 , . . . , an form an orthonormal set. Prove that A−1 = AT .
*b. Suppose a1 , . . . , an form an orthogonal set and each is nonzero. Find the appropriate
formula for A−1 .
Ra
17. Let V = C0 ([−a, a]) with the inner product hf, gi = −a f (t)g(t)dt. Let U + ⊂ V be the subset
of even functions, and let U − ⊂ V be the subset of odd functions. That is, U + = {f ∈ V :
f (−t) = f (t) for all t ∈ [−a, a]} and U − = {f ∈ V : f (−t) = −f (t) for all t ∈ [−a, a]}.
a. Prove that U + and U − are orthogonal subspaces of V .
b. Use the fact that every function can be written as the sum of an even and an odd function, viz.,

$$f(t) = \underbrace{\tfrac{1}{2}\bigl(f(t) + f(-t)\bigr)}_{\text{even}} + \underbrace{\tfrac{1}{2}\bigl(f(t) - f(-t)\bigr)}_{\text{odd}},$$

to prove that U − = (U + )⊥ and U + = (U − )⊥ .

18. (See Exercise 1.4.22 for the definition and basic properties of trace.)
a. If A, B ∈ Mn×n , define hA, Bi = tr(AT B). Check that this is an inner product on Mn×n .
b. Check that if A is symmetric and B is skew-symmetric, then hA, Bi = 0. (Hint: Show
that hA, Bi = −hB, Ai.)
c. Deduce that the subspaces of symmetric and skew-symmetric matrices (cf. Exercise 4.3.24)
are orthogonal complements in Mn×n .

19. Let g1 (t) = 1 and g2 (t) = t. Using the inner product defined in Example 9(b), find the
orthogonal complement of Span(g1 , g2 ) in
a. P2 ⊂ C0 ([−1, 1])
*b. P2 ⊂ C0 ([0, 1])
c. P3 ⊂ C0 ([−1, 1])

*20. Show that for any positive integer n, the functions 1, cos t, sin t, cos 2t, sin 2t, . . . , cos nt, sin nt
are orthogonal in C∞ ([−π, π]) ⊂ C0 ([−π, π]) (using the inner product defined in Example 9(b)).
CHAPTER 6
Solving Nonlinear Problems
In this brief chapter we introduce some important techniques for dealing with nonlinear prob-
lems (and in the infinite-dimensional setting, as well, although that is too far off-track for us here).
As we’ve said all along, we expect the derivative of a nonlinear function to dictate locally how the
function behaves. In this chapter we come to the rigorous treatment of the inverse and implicit
function theorems, to which we alluded at the end of Chapter 4, and to a few equivalent descriptions
of a k-dimensional manifold, which will play a prominent role in Chapter 8.

1. The Contraction Mapping Principle

We begin with a useful result about summing series of vectors. It will be important not just in
our immediate work, but also in our treatment of matrix exponentials in Chapter 9.

Proposition 1.1. Suppose {ak } is a sequence of vectors in Rn and the series

$$\sum_{k=1}^{\infty}\|a_k\|$$

converges (i.e., the sequence of partial sums $t_k = \|a_1\| + \cdots + \|a_k\|$ is a convergent sequence of real numbers). Then the series

$$\sum_{k=1}^{\infty}a_k$$

of vectors converges (i.e., the sequence of partial sums sk = a1 + · · · + ak is a convergent sequence of vectors in Rn ).

Proof. We first prove the result in the case n = 1. Given a sequence {ak } of real numbers, define bk = ak + |ak |. Note that

$$b_k = \begin{cases}2a_k, & \text{if } a_k \ge 0\\ 0, & \text{otherwise.}\end{cases}$$

Now, the series $\sum_{k=1}^{\infty}b_k$ converges by comparison with $\sum 2|a_k|$. (Directly: since bk ≥ 0, the partial sums form a nondecreasing sequence that is bounded above by $2\sum|a_k|$. That nondecreasing sequence must converge to its least upper bound. See Example 4(c) of Chapter 2, Section 2.) Since ak = bk − |ak |, the series $\sum a_k$ converges, being the sum of the two convergent series $\sum b_k$ and $-\sum|a_k|$.
We use this case to derive the general result. Denote by ak,j , j = 1, . . . , n, the j th component of the vector ak . Obviously, we have $|a_{k,j}| \le \|a_k\|$. By comparison with the convergent series $\sum\|a_k\|$, for any j = 1, . . . , n, the series $\sum_k |a_{k,j}|$ converges, and hence, by what we've just proved, so does the series $\sum_k a_{k,j}$. Since this is true for each j = 1, . . . , n, the series

$$\sum_k a_k = \begin{bmatrix}\sum_k a_{k,1}\\ \vdots\\ \sum_k a_{k,n}\end{bmatrix}$$

converges as well, as we wished to establish. □

Remark. The result holds even if we use something other than the Euclidean length in Rn . For example, we can apply the result using the norm defined on the vector space of m × n matrices in Section 1 of Chapter 5, since the triangle inequality $\|A + B\| \le \|A\| + \|B\|$ holds (see Proposition 1.3 of Chapter 5) and $|a_{ij}| \le \|A\|$ for any matrix A = [aij ] (why?).

The following result is crucial in both pure and applied mathematics, and applies in infinite-
dimensional settings as well.

Definition. Let X be a subset of Rn . A function f : X → X is called a contraction mapping if there is a constant c, 0 < c < 1, so that

$$\|f(x) - f(y)\| \le c\|x - y\| \quad\text{for all } x, y \in X.$$

(It is crucial that c be strictly less than 1, as Exercise 2 illustrates.)

Example 1. Consider f : [0, π/3] → [0, 1] ⊂ [0, π/3] given by f (x) = cos x. Then by the mean value theorem, for any x, y ∈ [0, π/3],

$$|f(x) - f(y)| = |\sin z|\,|x - y| \quad\text{for some } z \text{ between } x \text{ and } y$$
$$\le \frac{\sqrt{3}}{2}|x - y| \quad\text{since } 0 < z < \pi/3.$$

Since $\sqrt{3}/2 < 1$, f is a contraction mapping. ▽

Theorem 1.2 (Contraction Mapping Principle). Let X ⊂ Rn be closed. Let f : X → X be a


contraction mapping. Then there is a unique point x ∈ X such that f (x) = x. (Not surprisingly,
x is called a fixed point of f .)

Proof. Let x0 ∈ X be arbitrary, and define a sequence recursively by

xk+1 = f (xk ).

Our goal is to show that, inasmuch as f is a contraction mapping, this sequence converges to some
point x ∈ X. Then, by continuity of f (see Exercise 1), we will have

$$f(x) = \lim_{k\to\infty}f(x_k) = \lim_{k\to\infty}x_{k+1} = x.$$

Consider the equation

$$x_k = x_0 + (x_1 - x_0) + (x_2 - x_1) + \cdots + (x_k - x_{k-1}) = x_0 + \sum_{j=1}^{k}(x_j - x_{j-1}).$$

This suggests that we set

$$a_k = x_k - x_{k-1}$$

and try to determine whether the series $\sum a_k$ converges. To this end, we wish to apply Proposition 1.1, and so we begin by estimating $\|a_k\|$: by the definition of the sequence {xk } and the definition of a contraction mapping, we have

$$\|a_k\| = \|x_k - x_{k-1}\| = \|f(x_{k-1}) - f(x_{k-2})\| \le c\|x_{k-1} - x_{k-2}\| = c\|a_{k-1}\|$$

for some constant 0 < c < 1, so that

$$\|a_k\| \le c\|a_{k-1}\| \le c^2\|a_{k-2}\| \le \cdots \le c^{k-1}\|a_1\|.$$

Therefore,

$$\sum_{k=1}^{K}\|a_k\| \le \left(\sum_{k=1}^{K}c^{k-1}\right)\|a_1\| = \frac{1 - c^K}{1 - c}\|a_1\|.$$

Since 0 < c < 1,

$$\lim_{K\to\infty}\frac{1 - c^K}{1 - c} = \frac{1}{1 - c},$$

and so the series $\sum\|a_k\|$ converges. By Proposition 1.1, we infer that the series $\sum a_k$ converges to some vector a ∈ Rn . It follows, then, that xk → x0 + a = x, as required.
Two issues remain. First, since xk → x, all the xk 's are elements of X, and X is closed, we know that x ∈ X as well. Second, the uniqueness of the fixed point is left to the reader in Exercise 1. □

Example 2. According to Theorem 1.2, the function f introduced in Example 1 must have a unique fixed point in the interval [0, π/3]. Following the proof with x0 = 0, we obtain the following values:

 k      xk            k      xk
 1      1.            11     0.744237
 2      0.540302      12     0.735604
 3      0.857553      13     0.741425
 4      0.654289      14     0.737506
 5      0.793480      15     0.740147
 6      0.701368      16     0.738369
 7      0.763959      17     0.739567
 8      0.722102      18     0.738760
 9      0.750417      19     0.739303
10      0.731404      20     0.738937

Figure 1.1

Indeed, as Figure 1.1 illustrates, the values xk are converging to the x-coordinate of the intersection of the graph of f (x) = cos x with the diagonal y = x. ▽
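The iteration in the proof of Theorem 1.2 is easy to reproduce. Here is a minimal sketch of the computation behind the table above.

```python
import math

x = 0.0                 # x_0
for k in range(1, 21):
    x = math.cos(x)     # x_{k+1} = f(x_k)
    print(k, x)
# the values approach the fixed point x ≈ 0.739085
```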

Example 2 shows that this is a very slow method to obtain the solution of cos x = x. Far better is Newton's method, familiar to every student of calculus. Given a differentiable function g : R → R, we start at xk , draw the tangent line to the graph of g at xk and let xk+1 be the x-intercept of that tangent line, as shown in Figure 1.2.

Figure 1.2

We obtain in this way a sequence, and one hopes that if x0 is sufficiently close to a root a, then the sequence will converge to a. It is easy to see that the recursion formula for this sequence is

$$x_{k+1} = x_k - \frac{g(x_k)}{g'(x_k)},$$

so, in fact, we are looking for a fixed point of the mapping $f(x) = x - \dfrac{g(x)}{g'(x)}$. If we assume g is twice differentiable, then we find that $f' = g\,g''/(g')^2$, so f will be a contraction mapping whenever $|g\,g''/(g')^2| \le c < 1$. In particular, if |g′′| ≤ M and |g′| ≥ m, then iterating f will converge to a root a of g if we start in any closed interval containing a on which |g| < m²/M (provided f maps that interval back to itself). For the strongest result, see Exercise 8.

Example 3. Reconsidering the problem of Example 2, let's use Newton's method to approximate the root of cos x = x by taking g(x) = x − cos x and iterating the map

$$f(x) = x - \frac{x - \cos x}{1 + \sin x}.$$

 k      xk            k      xk
 0      1.            0      0.523599
 1      0.750364      1      0.751883
 2      0.739113      2      0.739121
 3      0.739085      3      0.739085
 4      0.739085      4      0.739085

Here we see that, whether we start at x0 = 1 or at x0 = π/6, Newton's method converges to the root quite rapidly. Indeed, on the interval [π/6, π/3], we have m = 1.5, M = .87, and |g| ≤ .55, which is far smaller than m²/M ≈ 2.6. ▽
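For comparison, here is the corresponding Newton iteration for g(x) = x − cos x; it reproduces the table above, starting from either initial guess.

```python
import math

def newton(x, steps=4):
    for k in range(1, steps + 1):
        x = x - (x - math.cos(x)) / (1 + math.sin(x))  # x_{k+1} = x_k - g(x_k)/g'(x_k)
        print(k, x)

newton(1.0)             # converges to 0.739085...
newton(math.pi / 6)
```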

To move to higher dimensions, we need a multivariable Mean Value Theorem. The Mean Value
Theorem, although often misinterpreted in beginning calculus courses, tells us that if we have
bounds on the size of the derivative of a differentiable function, then we have bounds on how much
the function itself can change from one point to another. A crucial tool here will be the norm of a
linear map, introduced in Chapters 3 and 5.

Proposition 1.3 (The Mean Value Inequality). Suppose U ⊂ Rn is open, f : U → Rm is C1 , and a and b are points in U so that the line segment between them is contained in U .1 Then

$$\|f(b) - f(a)\| \le \Bigl(\max_{x\in[a,b]}\|Df(x)\|\Bigr)\|b - a\|.$$

Proof. Define g : [0, 1] → Rm by g(t) = f (a + t(b − a)). Note that

$$f(b) - f(a) = g(1) - g(0).$$

By the chain rule, g is differentiable and

$$(*)\qquad g'(t) = Df(a + t(b - a))(b - a).$$

Applying Lemma 5.3 of Chapter 3, we have

$$\|f(b) - f(a)\| = \Bigl\|\int_0^1 g'(t)\,dt\Bigr\| \le \int_0^1\|g'(t)\|\,dt \le \max_{t\in[0,1]}\|g'(t)\|.$$

By (∗), we have $\|g'(t)\| \le \|Df(a + t(b-a))\|\,\|b - a\|$, and so

$$\max_{t\in[0,1]}\|g'(t)\| \le \Bigl(\max_{x\in[a,b]}\|Df(x)\|\Bigr)\|b - a\|.$$

This completes the proof. □

EXERCISES 6.1

1. Prove that any contraction mapping is continuous and has at most one fixed point.
1
More generally, all we need is a C1 path in U joining a and b.

2. Let f : R → R be given by f (x) = x2 + 1. Show that f has no fixed point and that |f ′ (x)| < 1
for all x ∈ R. Why does this not contradict Theorem 1.2?
*3. For the sequence {xk } defined in the proof of Theorem 1.2, prove that $\|x_k - x\| \le \dfrac{c^k}{1-c}\|x_1 - x_0\|$.
This gives an a priori estimate on how fast the sequence converges to the fixed point.

4. A sequence {xk } of points in Rn is called a Cauchy sequence if for all ε > 0 there is K so that whenever k, ℓ > K, we have $\|x_k - x_\ell\| < \varepsilon$. It is a fact that any Cauchy sequence in Rn is convergent. (See Exercise 2.2.14.) Suppose 0 < c < 1 and {xk } is a sequence of points in Rn so that $\|x_{k+1} - x_k\| < c\|x_k - x_{k-1}\|$ for all k ∈ N. Prove that {xk } is a Cauchy sequence, hence convergent. (Hint: Show that whenever k, ℓ > K, we have $\|x_k - x_\ell\| < \dfrac{c^K}{1-c}\|x_1 - x_0\|$.)
5. Use the result of Exercise 2.2.14 to give a different proof of Proposition 1.1.
♯ 6. a. Show that if H is any square matrix with kHk < 1, then I − H is invertible. (Hint:
P

Consider the geometric series H k . You will need to use the result of Exercise 5.1.6.)
k=0
b. Suppose, more generally, that A is an invertible n × n matrix. Show that when kHk <
1/kA−1 k, the matrix A + H is invertible as well. (Hint: Write A + H = A(I + A−1 H).)
2
c. Prove that the set of invertible n × n matrices is an open subset of Mn×n = Rn . This set
P 2 1/2
is denoted GL(n), the general linear group. (Hint: By Exercise 5.1.5, if hij < δ,
then kHk < δ.)
♯ 7. Continuing Exercise 6:
ε
a. Show that if kHk < ε < 1, then k(I + H)−1 − Ik < .
1−ε
b. More generally, if A is invertible and kA−1 kkHk < ε < 1, then estimate k(A+H)−1 −A−1 k.
c. Let X ⊂ Mn×n be the set of invertible n × n matrices (by Exercise 6, this is an open
subset). Prove that the function f : X → X, f (A) = A−1 , is continuous.

8. Suppose x0 ∈ U ⊂ R, g : U → R is C2 , and g′(x0 ) ≠ 0. Set h0 = −g(x0 )/g′(x0 ) and x1 = x0 + h0 . Prove that if |g′′| ≤ M on the interval $\bigl[x_1 - |h_0|,\, x_1 + |h_0|\bigr]$ and $|g(x_0)|M \le \tfrac{1}{2}(g'(x_0))^2$, then Newton's method converges, starting at x0 , to the unique root of g in that interval,2 as follows.
   a. According to Proposition 3.2 of Chapter 5, we have $g(x_1) = g(x_0) + g'(x_0)h_0 + \tfrac{1}{2}g''(\xi)h_0^2$ for some ξ between x0 and x1 . Prove that $|g(x_1)| \le \tfrac{1}{4}|g(x_0)|$.
   b. Using the fact that g′(x1 ) = g′(x0 ) + g′′(c)h0 for some c between x0 and x1 , show that
      $$\frac{1}{|g'(x_1)|} \le \frac{2}{|g'(x_0)|}.$$
      Now deduce that
      $$\frac{|g(x_1)|}{g'(x_1)^2} \le \frac{|g(x_0)|}{g'(x_0)^2} \quad\text{and hence that}\quad |g(x_1)|M \le \tfrac{1}{2}(g'(x_1))^2.$$
   c. Deduce that $|g(x_1)/g'(x_1)| \le |h_0|/2$.

2
We learned of the n-dimensional version of this result, which we give in Exercise 10, called Kantarovich’s
Theorem, in Hubbard and Hubbard’s Vector Calculus, Linear Algebra, and Differential Forms.

d. Prove analogously that if, when we apply Newton’s method, we set xk+1 = xk + hk , then
|hk | ≤ |h0 |/2k . Deduce that iterating Newton’s method converges to a point in the given
interval.

9. Using the result of Exercise 8:


*a. Let g(x) = x2 − 2. Carry out two steps of Newton’s method starting at x0 = 1. Give an
interval that is guaranteed to contain a nearby root of g.
b. Let g(x) = x3 − 2. Carry out two steps of Newton’s method starting at x0 = 5/4. Give an
interval that is guaranteed to contain a nearby root of g.
*c. Let g(x) = x − cos 2x. Carry out two steps of Newton’s method starting at x0 = π/4. Give
an interval that is guaranteed to contain a nearby root of g.

10. Suppose x0 ∈ U ⊂ Rn , g : U → Rn is C2 , and Dg(x0 ) is invertible. Newton’s method in n


dimensions is given by iterating the map
f (x) = x − Dg(x)−1 g(x),
starting at x0 . Set h0 = −Dg(x0 )−1q
g(x0 ) and x1 = x0 + h0 . Let B = B(x1 , kh0 k); suppose
Pn 2
kHess(gi )k ≤ Mi on B, and set M = i=1 Mi . Suppose, moreover, that

kDg(x0 )−1 k2 kg(x0 )kM ≤ 1/2.


Prove that Newton’s method converges, starting at x0 , to a point of B, as follows.
a. Using Proposition 3.2 of Chapter 5, prove that kg(x1 )k ≤ 12 M kh0 k2 ≤ 41 kg(x0 )k.
b. Show that kDg(x1 ) − Dg(x0 )k ≤ M kh0 k. Using Exercise 7, deduce that kDg(x1 )−1 k ≤

2kDg(x0 )−1 k. (Hint: Let H = Dg(x0 )−1 Dg(x1 ) − Dg(x0 ) ; show that kHk ≤ 1/2.)
c. Now show that kDg(x1 )−1 k2 kg(x1 )k ≤ kDg(x0 )−1 k2 kg(x0 )k, and conclude that
kDg(x1 )−1 k2 kg(x1 )kM ≤ 1/2.
d. Prove that kDg(x1 )−1 g(x1 )k ≤ kh0 k/2.
e. Letting hk = −Dg(xk )−1 g(xk ), prove analogously that khk k ≤ kh0 k/2k . Deduce that
iterating Newton’s method converges to a point in the given ball.

11. Using the result of Exercise 10:    


x1 4x1 + x22 − 4
*a. Let g : R2 → R2 be defined by g = . Do one step of Newton’s method
x2 4x1 x2 − 1
 
1
to solve g(x) = 0, starting at x0 = , and find a ball in R2 that is guaranteed to contain
0
a root of g.    
x1 x21 + x22 − 5
b. Let g : R2 → R2 be defined by g = 3 . Do one step of Newton’s method
2 x1 x2 − 1
x2
 
2
to solve g(x) = 0, starting at x0 = , and find a ball in R2 that is guaranteed to contain
0
a root of g.    
x1 4 sin x1 + x22
c. Let g : R2 → R2 be defined by g = . Do one step of Newton’s method
x2 2x1 x2 − 1
 
π
to solve g(x) = 0, starting at x0 = , and find a ball in R2 that is guaranteed to contain
0
a root of g.
   
x1 x1 − 41 x22
d. Let g : R2 → R2 be defined by g= . Do one step of Newton’s method
x2 x2 − cos x1
 
0
to solve g(x) = 0, starting at x0 = , and find a ball in R2 that is guaranteed to contain
1
a root of g.

12. Prove the following, slightly stronger version of Proposition 1.3. Suppose U ⊂ Rn is open,
f : U → Rm is differentiable, and a and b are points in U so that the line segment between
them is contained in U . Then prove that there is a point ξ on that line segment so that
kf (b) − f (a)k ≤ kDf (ξ)kkb − ak. (Hints: Define g as before, let v = g(1) − g(0) and define
φ : [0, 1] → R by φ(t) = g(t) · v. Apply the usual mean value theorem and the Cauchy-Schwarz
inequality, Proposition 2.3 of Chapter 1, to show that kvk2 = φ(1) − φ(0) ≤ kg′ (c)kkvk for
some c ∈ (0, 1).)

2. The Inverse and Implicit Function Theorems

When we study functions f : R → R in single-variable calculus, it is usually quite simple to


decide when a function has an inverse function. Any increasing (or decreasing) function certainly
has an inverse, even if we are unable to give it explicitly (e.g., what is the inverse of the function
f (x) = x5 + x + 1?). Sometimes we make up names for inverse functions, such as log, the inverse
of exp, and arcsin, the inverse of sin (restricted to the interval [−π/2, π/2]).
Since a differentiable function on an interval in R with nowhere zero derivative has a differen-
tiable inverse, it is tempting to think that if the derivative f ′ (a) 6= 0, then f should have a local
inverse at a.

Example 1. Let

$$f(x) = \begin{cases}\dfrac{x}{2} + x^2\sin\dfrac{1}{x}, & x \ne 0\\[4pt] 0, & x = 0.\end{cases}$$

Then, calculating from the definition, we find

$$f'(0) = \frac{1}{2} + \lim_{h\to 0}h\sin\frac{1}{h} = \frac{1}{2} > 0.$$

On the other hand, if x ≠ 0,

$$f'(x) = \frac{1}{2} + 2x\sin\frac{1}{x} - \cos\frac{1}{x},$$

so there are points (e.g., x = 1/(2πn) for any nonzero integer n) arbitrarily close to 0 where f′(x) < 0. That is, despite the fact that f′(0) > 0, there is no interval around 0 on which f is increasing, as Figure 2.1 suggests. Thus, f has no inverse on any neighborhood of 0! ▽

Figure 2.1

All right, so we need a stronger hypothesis. If we assume f is C1 , then it will follow that if
f ′ (a) > 0, then f ′ > 0 on an interval around a, and so f will be increasing—hence invertible—on
that interval. That is the result that generalizes nicely to higher dimensions.

Theorem 2.1 (Inverse Function Theorem). Suppose U ⊂ Rn is open, x0 ∈ U , f : U → Rn is C1 , and Df (x0 ) is invertible. Then there is a neighborhood V ⊂ U of x0 on which f has a C1 inverse function. That is, there are neighborhoods V of x0 and W of f (x0 ) = y0 and a C1 function g : W → V so that

$$f(g(y)) = y \ \text{for all } y \in W \quad\text{and}\quad g(f(x)) = x \ \text{for all } x \in V.$$

Moreover, if f (x) = y, we have

$$Dg(y) = \bigl(Df(x)\bigr)^{-1}.$$

Proof. Without loss of generality, we assume that x0 = y0 = 0 and that Df (0) = I. (We make appropriate translations and then replace f (x) by Df (0)−1 f (x).) Since f is C1 , there is r > 0 so that

$$\|Df(x) - I\| \le \tfrac{1}{2} \quad\text{whenever } \|x\| \le r.$$

Now, fix y with ‖y‖ < r/2, and define the function φ by

$$\varphi(x) = x - f(x) + y.$$

Note that ‖Dφ(x)‖ = ‖Df (x) − I‖. Whenever ‖x‖ ≤ r, we have (by Proposition 1.3)

$$\|\varphi(x)\| \le \|x - f(x)\| + \|y\| < \frac{r}{2} + \frac{r}{2} = r,$$

and so φ maps the closed ball $\overline{B}(0, r)$ to itself. Moreover, if $x_1, x_2 \in \overline{B}(0, r)$, by Proposition 1.3 we have

$$\|\varphi(x_1) - \varphi(x_2)\| \le \tfrac{1}{2}\|x_1 - x_2\|,$$

so φ is a contraction mapping on $\overline{B}(0, r)$. By Theorem 1.2, φ has a unique fixed point $x_y \in \overline{B}(0, r)$. That is, there is a unique point $x_y \in \overline{B}(0, r)$ so that f (xy ) = y. We leave it to the reader to check in Exercise 10 that in fact xy lies in the open ball B(0, r).
As pictured in Figure 2.2, take W = B(0, r/2) and V = f −1 (W ) ∩ B(0, r) (note that V is open because f is continuous; see also Exercise 2.2.7). Define g : W → V by g(y) = xy . We claim first of all that g is continuous. Indeed, define $\psi : \overline{B}(0, r) \to \mathbb{R}^n$ by ψ(x) = f (x) − x. Then, by Proposition 1.3 we have

$$\bigl\|\bigl(f(u) - u\bigr) - \bigl(f(v) - v\bigr)\bigr\| = \|\psi(u) - \psi(v)\| \le \tfrac{1}{2}\|u - v\|.$$

Figure 2.2

It follows from the triangle inequality (see Exercise 1.2.17) that

$$\|u - v\| - \|f(u) - f(v)\| \le \tfrac{1}{2}\|u - v\|,$$

and so

$$\tfrac{1}{2}\|u - v\| \le \|f(u) - f(v)\|.$$

Writing f (u) = y and f (v) = z, we have

$$(*)\qquad \|g(y) - g(z)\| \le 2\|y - z\|.$$

It follows that g is continuous (e.g., given ε > 0, take δ = ε/2).
It follows that g is continuous (e.g., given ε > 0, take δ = ε/2).
Next, we check that g is differentiable. Fix y ∈ W and write g(y) = x; we wish to prove that $Dg(y) = \bigl(Df(x)\bigr)^{-1}$. Choose k sufficiently small that y + k ∈ W . Set g(y + k) = x + h, so that h = g(y + k) − g(y). For ease of notation, write A = Df (x). We are to prove that

$$\frac{g(y+k) - g(y) - A^{-1}k}{\|k\|} \to 0 \quad\text{as } k \to 0.$$

We consider instead the result of multiplying this quantity by (the fixed matrix) A:

$$\frac{A\bigl(g(y+k) - g(y)\bigr) - k}{\|k\|} = \frac{Ah - k}{\|k\|} = -\frac{f(x+h) - f(x) - Df(x)h}{\|h\|}\cdot\frac{\|h\|}{\|k\|}.$$

We infer from (∗) that ‖h‖ ≤ 2‖k‖, so as k → 0, it follows that h → 0 as well. Note, moreover, that h ≠ 0 when k ≠ 0 (why?). Now we analyze the final product above: the first term approaches 0 by the differentiability of f ; the second is bounded above by 2. Thus, the product approaches 0, as desired.
The last order of business is to see that g is C1 . We have

$$Dg(y) = \bigl(Df(g(y))\bigr)^{-1},$$

so we see that Dg is the composition of the function y ↦ Df (g(y)) and the function A ↦ A−1 on the space of invertible matrices. Since g is continuous and f is C1 , the former is continuous. By Exercise 6.1.7, the latter is continuous (indeed, we will prove much more in Corollary 5.19 of Chapter 7 when we study determinants in detail). Since the composition of continuous functions is continuous, the function y ↦ Dg(y) is continuous, as required. □

Remark. More generally, with a bit more work, one can show that if f is Ck (or smooth), then
the local inverse g is likewise Ck (or smooth).
It is important to remember that this theorem guarantees only a local inverse function. It may
be rather difficult to determine whether f is globally one-to-one. Indeed, as the following example
shows, even if Df is everywhere invertible, the function f may be very much not one-to-one.

Example 2. Define f : R2 → R2 by

$$f\begin{bmatrix}u\\v\end{bmatrix} = \begin{bmatrix}e^u\cos v\\ e^u\sin v\end{bmatrix}.$$

Then f is C1 , and

$$Df\begin{bmatrix}u\\v\end{bmatrix} = \begin{bmatrix}e^u\cos v & -e^u\sin v\\ e^u\sin v & e^u\cos v\end{bmatrix}$$

is everywhere nonsingular, since its determinant is $e^{2u} \ne 0$. Nevertheless, since sin and cos are periodic, it is clear that f is not one-to-one: We have $f\begin{bmatrix}u\\v\end{bmatrix} = f\begin{bmatrix}u\\v+2\pi k\end{bmatrix}$ for any integer k.
On the other hand, if $f\begin{bmatrix}u\\v\end{bmatrix} = \begin{bmatrix}x\\y\end{bmatrix}$, then we apparently can solve for u and v:

$$(\dagger)\qquad g\begin{bmatrix}x\\y\end{bmatrix} = \begin{bmatrix}\tfrac{1}{2}\log(x^2+y^2)\\ \arctan(y/x)\end{bmatrix}$$

certainly satisfies $f\circ g\begin{bmatrix}x\\y\end{bmatrix} = \begin{bmatrix}x\\y\end{bmatrix}$. So, why is g not the inverse function of f ? Recall that arctan : R → (−π/2, π/2). So, as shown in Figure 2.3, if we consider the domain of f to be $\left\{\begin{bmatrix}u\\v\end{bmatrix} : -\pi/2 < v < \pi/2\right\}$ and the domain of g to be $\left\{\begin{bmatrix}x\\y\end{bmatrix} : x > 0\right\}$, then f and g will be inverse functions.

Figure 2.3

Let's calculate the derivative of any local inverse g according to Theorem 2.1. If $f\begin{bmatrix}u\\v\end{bmatrix} = \begin{bmatrix}x\\y\end{bmatrix}$, then

$$Dg\begin{bmatrix}x\\y\end{bmatrix} = \left(Df\begin{bmatrix}u\\v\end{bmatrix}\right)^{-1} = \begin{bmatrix}e^{-u}\cos v & e^{-u}\sin v\\ -e^{-u}\sin v & e^{-u}\cos v\end{bmatrix} = \begin{bmatrix}\dfrac{x}{x^2+y^2} & \dfrac{y}{x^2+y^2}\\[6pt] -\dfrac{y}{x^2+y^2} & \dfrac{x}{x^2+y^2}\end{bmatrix}.$$

Note that we get the same formula differentiating our specific inverse function (†). It is a bit surprising that the derivative of any other inverse function, with different domain and range, must be given by the identical formula. ▽
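As a quick numerical sanity check of the formula Dg(y) = (Df(x))⁻¹, one can compare the inverse of Df at a point with the derivative of the explicit local inverse (†), estimated by finite differences. A rough sketch (the sample point and step size are arbitrary choices):

```python
import numpy as np

def f(u, v):
    return np.array([np.exp(u) * np.cos(v), np.exp(u) * np.sin(v)])

def g(x, y):                      # the explicit local inverse (†), valid for x > 0
    return np.array([0.5 * np.log(x**2 + y**2), np.arctan(y / x)])

u, v = 0.3, 0.2                   # arbitrary point with -pi/2 < v < pi/2
x, y = f(u, v)

Df = np.array([[np.exp(u) * np.cos(v), -np.exp(u) * np.sin(v)],
               [np.exp(u) * np.sin(v),  np.exp(u) * np.cos(v)]])

h = 1e-6                          # finite-difference approximation of Dg at (x, y)
Dg = np.column_stack([(g(x + h, y) - g(x - h, y)) / (2 * h),
                      (g(x, y + h) - g(x, y - h)) / (2 * h)])

print(np.round(Dg - np.linalg.inv(Df), 6))   # approximately the zero matrix
```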

Now we are finally in a position to prove the Implicit Function Theorem, which first arose in
our informal discussion of manifolds in Section 5 of Chapter 4. It is without question one of the
most important theorems in higher mathematics.

Theorem 2.2 (Implicit Function Theorem). Suppose U ⊂ Rn is open and F : U → Rm is C1 . Writing a vector in Rn as $\begin{bmatrix}x\\y\end{bmatrix}$, with x ∈ Rn−m and y ∈ Rm , suppose that $F\begin{bmatrix}x_0\\y_0\end{bmatrix} = 0$ and the m × m matrix $\dfrac{\partial F}{\partial y}\begin{bmatrix}x_0\\y_0\end{bmatrix}$ is invertible. Then there are neighborhoods V of x0 and W of y0 and a C1 function φ : V → W so that

$$F\begin{bmatrix}x\\y\end{bmatrix} = 0,\ x \in V,\ \text{and}\ y \in W \iff y = \varphi(x).$$

Moreover,

$$D\varphi(x) = -\left(\frac{\partial F}{\partial y}\begin{bmatrix}x\\ \varphi(x)\end{bmatrix}\right)^{-1}\frac{\partial F}{\partial x}\begin{bmatrix}x\\ \varphi(x)\end{bmatrix}.$$

Proof. Define f : U → Rn = Rn−m × Rm by

$$f\begin{bmatrix}x\\y\end{bmatrix} = \begin{bmatrix}x\\[2pt] F\begin{bmatrix}x\\y\end{bmatrix}\end{bmatrix}.$$

Note that the linear map

$$Df\begin{bmatrix}x_0\\y_0\end{bmatrix} = \begin{bmatrix}I & O\\[4pt] \dfrac{\partial F}{\partial x}\begin{bmatrix}x_0\\y_0\end{bmatrix} & \dfrac{\partial F}{\partial y}\begin{bmatrix}x_0\\y_0\end{bmatrix}\end{bmatrix}$$

is invertible (see Exercise 4.2.7). This means that—as illustrated in Figure 2.4—there are neighborhoods V ⊂ Rn−m of x0 , W ⊂ Rm of y0 , and Z ⊂ Rn of $\begin{bmatrix}x_0\\0\end{bmatrix}$ and a C1 function g : Z → V × W so that g is an inverse of f on V × W .

Figure 2.4

Now define φ : V → W by

$$\begin{bmatrix}x\\ \varphi(x)\end{bmatrix} = g\begin{bmatrix}x\\0\end{bmatrix}.$$

φ is obviously C1 since g is. And

$$\begin{bmatrix}x\\[2pt] F\begin{bmatrix}x\\ \varphi(x)\end{bmatrix}\end{bmatrix} = f\begin{bmatrix}x\\ \varphi(x)\end{bmatrix} = f\left(g\begin{bmatrix}x\\0\end{bmatrix}\right) = \begin{bmatrix}x\\0\end{bmatrix},$$

so $F\begin{bmatrix}x\\ \varphi(x)\end{bmatrix} = 0$, as desired. On the other hand, if $F\begin{bmatrix}x\\y\end{bmatrix} = 0$, x ∈ V , and y ∈ W , then $\begin{bmatrix}x\\y\end{bmatrix} = g\begin{bmatrix}x\\0\end{bmatrix}$, so y must be equal to φ(x).
Since we know that φ is C1 , we can calculate the derivative by "implicit differentiation": define h : V → Rm by $h(x) = F\begin{bmatrix}x\\ \varphi(x)\end{bmatrix}$. Then h is C1 and, since h(x) = 0 for all x ∈ V , we have

$$O = Dh(x) = \frac{\partial F}{\partial x}\begin{bmatrix}x\\ \varphi(x)\end{bmatrix} + \frac{\partial F}{\partial y}\begin{bmatrix}x\\ \varphi(x)\end{bmatrix}D\varphi(x).$$

Since by hypothesis $\dfrac{\partial F}{\partial y}\begin{bmatrix}x\\ \varphi(x)\end{bmatrix}$ is invertible, the desired result is immediate. □

Remark. With not much more work, one can prove analogously that if F is Ck (or smooth),
then y is given locally as a Ck (or smooth) function of x. We may take this for granted in our later
work.
 
Example 3. Consider the function F : R2 → R, $F\begin{bmatrix}x\\y\end{bmatrix} = x^3e^y + 2x\cos(xy)$. We claim that the equation $F\begin{bmatrix}x\\y\end{bmatrix} = 3$ defines y locally as a function of x near the point $\begin{bmatrix}x_0\\y_0\end{bmatrix} = \begin{bmatrix}1\\0\end{bmatrix}$. By the Implicit Function Theorem, Theorem 2.2, we need only check that $\dfrac{\partial F}{\partial y}\begin{bmatrix}1\\0\end{bmatrix} \ne 0$. Well,

$$\frac{\partial F}{\partial y} = x^3e^y - 2x^2\sin(xy) \quad\text{and so}\quad \frac{\partial F}{\partial y}\begin{bmatrix}1\\0\end{bmatrix} = 1,$$

so we know that in a neighborhood of x0 = 1 there is a C1 function φ with φ(1) = 0 whose graph is (the appropriate piece of) the level curve F = 3, as shown in Figure 2.5. Of course, farther away, the curve apparently gets quite crazy.

Figure 2.5

If we're interested in the best linear approximation to the curve at the point $\begin{bmatrix}1\\0\end{bmatrix}$, then we also know from Theorem 2.2 that

$$\varphi'(1) = -\left.\frac{\partial F/\partial x}{\partial F/\partial y}\right|_{\begin{bmatrix}1\\0\end{bmatrix}} = -\frac{5}{1} = -5,$$

so the line 5x + y = 5 is the desired tangent line of the curve at that point. ▽
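One can corroborate this numerically: solve F(x, y) = 3 for y at a few values of x near 1 (say by bisection in y) and estimate φ′(1) by a difference quotient. The sketch below does this; the bracketing interval, tolerance, and step size are ad hoc choices for illustration, and it assumes a sign change of F − 3 on the chosen interval.

```python
import math

def F(x, y):
    return x**3 * math.exp(y) + 2 * x * math.cos(x * y)

def phi(x, lo=-1.0, hi=1.0, tol=1e-12):
    """Solve F(x, y) = 3 for y by bisection (assumes a sign change on [lo, hi])."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if (F(x, lo) - 3) * (F(x, mid) - 3) <= 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

h = 1e-5
print((phi(1 + h) - phi(1 - h)) / (2 * h))   # about -5, in agreement with phi'(1)
```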
Example 4. Consider F : R5 → R3 given by

$$F\begin{bmatrix}x_1\\x_2\\y_1\\y_2\\y_3\end{bmatrix} = \begin{bmatrix}2x_1 + x_2 + y_1 + y_3 - 1\\ x_1x_2^3 + x_1y_1 + x_2^2y_2^2 - y_2y_3\\ x_2y_1y_3 + x_1y_1^2 + y_2y_3^2\end{bmatrix}, \quad\text{and let}\quad a = \begin{bmatrix}0\\1\\-1\\1\\1\end{bmatrix}.$$

Does the equation F = 0 define $y = \begin{bmatrix}y_1\\y_2\\y_3\end{bmatrix}$ implicitly as a function of $x = \begin{bmatrix}x_1\\x_2\end{bmatrix}$ near a? Note first of all that F is C1 . We begin by calculating the derivative of F:

$$DF\begin{bmatrix}x_1\\x_2\\y_1\\y_2\\y_3\end{bmatrix} = \begin{bmatrix}2 & 1 & 1 & 0 & 1\\ x_2^3 + y_1 & 3x_1x_2^2 + 2x_2y_2^2 & x_1 & 2x_2^2y_2 - y_3 & -y_2\\ y_1^2 & y_1y_3 & x_2y_3 + 2x_1y_1 & y_3^2 & x_2y_1 + 2y_2y_3\end{bmatrix},$$

and so

$$DF(a) = \begin{bmatrix}2 & 1 & 1 & 0 & 1\\ 0 & 2 & 0 & 1 & -1\\ 1 & -1 & 1 & 1 & 1\end{bmatrix}.$$

In particular, we see that

$$\frac{\partial F}{\partial y}(a) = \begin{bmatrix}1 & 0 & 1\\ 0 & 1 & -1\\ 1 & 1 & 1\end{bmatrix},$$

which is easily checked to be nonsingular, and so the hypotheses of the Implicit Function Theorem, Theorem 2.2, are fulfilled. There is a neighborhood of a in which we have y = φ(x). Moreover, we have

$$D\varphi\begin{bmatrix}0\\1\end{bmatrix} = -\left(\frac{\partial F}{\partial y}(a)\right)^{-1}\frac{\partial F}{\partial x}(a) = -\begin{bmatrix}2 & 1 & -1\\ -1 & 0 & 1\\ -1 & -1 & 1\end{bmatrix}\begin{bmatrix}2 & 1\\ 0 & 2\\ 1 & -1\end{bmatrix} = \begin{bmatrix}-3 & -5\\ 1 & 2\\ 1 & 4\end{bmatrix}.$$

With this information, we can easily give the tangent plane at a of the surface F = 0. ▽
Remark . In general, we shall not always be so chivalrous (nor shall life) as to set up the
notation precisely as in the statement of Theorem 2.2. Just as in the case of linear equations where
the first r variables needn’t always be the pivot variables, here the last m variables needn’t always
be (locally) the dependent variables. In general, it is a matter of finding m pivots in some m
columns of the m × n derivative matrix.

EXERCISES 6.2

1. By applying the Inverse Function Theorem, Theorem 2.1, determine at which points x0 the
1
  f has a local C inverse g, and calculate Dg(f (x0 )).
given function
x x2 − y 2
*a. f =
y   2xy2 
x x/(x + y 2 )
b. f =
y y/(x2 + y 2 )
   
x x + h(y)
c. f = for any C1 function h : R → R
y y
   
x x + ey
d. f =
y y + ex
   
x x+y+z
e. f y  =  xy + xz + yz  (cf. also Exercise 2)
z xyz
      
u u u+v
2. Let U = : 0 < v < u , and define f : U → R2 by f = .
v v uv
a. Show that f has a global inverse function g. Determine the domain of and an explicit
formula for g.
b. Calculate Dg both directly and by the formula given in the Inverse Function Theorem.
Compare your answers.
c. What does this exercise have to do with Example 2 in Chapter 4, Section 5? In particular,
give a concrete interpretation of your answer to part b.
1
3. Check that in each
 of  the following cases, the equation F = 0 defines y locally as a C function
x0
φ(x) near a = , and calculate Dφ(x0 ).
y0
 
x 
a. F = y 2 − x3 − 2 sin π(x − y) , x0 = 1, y0 = −1
y
 
x1  
1
*b. F x2  = e x 1 y 2
+ y cos x1 x2 − 1, x0 = , y0 = 0
2
y
 
x1  
0
c. F x2  = e 1 + y arctan x2 − (1 + π/4), x0 =
x y 2 , y0 = 1
1
y
 
x  2 2 − y2 − 2
  
x − y 1
d. F  y1  = 1 2 , x0 = 2, y0 =
x − y1 + y2 − 2 1
y
 2
x1      
x2  x21 − x22 − y13 + y22 + 4 2 2
e. 
F  =  , x0 = , y0 =
y1 2 2
2x1 x2 + x2 − 2y1 + 3y2 + 8 4 −1 1
y2

*4. Show that the equations x2 y +xy 2+ t2 − 2 2


 1 = 0 and x + y − 2yt = 0 define x and y implicitly
x −1
1
as C functions of t near  y  =  1 . Find the tangent line at this point to the curve so
t 1

defined.
   
x 1
5. Let F y  = x2 + 2y 2 − 2xz − z 2 = 0. Show that near the point a =  1 , z is given implicitly
z 1
as a C1 function of x and y. Find the largest neighborhood of a on which this is true.

*6. Using the law of cosines (Exercise 1.2.12) and Theorem 2.2, show that the angles of a triangle
are C1 functions of the sides. To a small change in which one of the sides (keeping the other
two fixed) is an angle most sensitive?

7. Define f : Mn×n → Mn×n by f (A) = A2 .


a. By applying the Inverse Function Theorem, Theorem 2.1, show that every matrix B in a
neighborhood of I has (at least) two square roots A (i.e., A2 = B), each varying as a C1
function of B. (See Exercise 3.1.13.)
b. Can  if there are precisely two or more? (Hint: In the 2 × 2 case, what is
you decide
1 0
Df ?)
0 −1
∂F
8. Suppose U ⊂ R3 is an open set, F : U → R is a C1 function, and on F 
= 0we have 6= 0,
p ∂p
∂F ∂F
6= 0, and 6= 0. (You might use, as an example, the equation F V  = pV − RT = 0
∂V ∂T T
for one mole of ideal gas; here R is the so-called gas constant.)
 Then
 it is a consequence of the
p0
Implicit Function Theorem that in some neighborhood of  V0 , each of p, V , and T can be
T0
written
 as  a differentiable function of the remaining two 
variables.
 Physical chemists denote
∂p V
by the partial derivative of the function p = p with respect to V , holding T
∂V T T
constant, etc. Prove the thermodynamicist’s magic formula
     
∂p ∂V ∂T
= −1.
∂V T ∂T p ∂p V

9. Using the notation of Exercise 8, physical chemists define the expansion coefficient α and
isothermal compressibility β to be, respectively,
   
1 ∂V 1 ∂V
α= and β=− .
V ∂T p V ∂p T
*a. Calculate α and β for an ideal gas.
∂p α
b. Show that in general we have = .
∂T V β
10. Check that, under the hypotheses in place in the proof of Theorem 2.1, if kxk = r, then
kf (x)k ≥ r/2. (Hint: Use Exercise 1.2.17.)
♯ 11. Let B = B(0, r) ⊂ Rn . Suppose U ⊂ Rn is an open subset containing the closed ball B,
f : U → Rn is C1 , f (0) = 0, and kDf (x)− Ik ≤ s < 1 for all x ∈ B. Prove that if kyk < r(1− s),
then there is x ∈ B such that f (x) = y.

12. Suppose U ⊂ Rn is open and f : U → Rm is C1 with f (a) = 0 and rank(Df (a)) = m. Prove
that for every c sufficiently close to 0 ∈ Rm the equation f (x) = c has a solution near a.
2 2
13. (Theenvelope
    f: R
of a family of curves) Suppose  × (a, b) → R is C and for each t ∈ (a, b),
x x
∇f 6= 0 on the level curve Ct = f = 0 . (Here the gradient denotes differentiation
t t
with respect only to x.) The curve C is called the envelope of the family of curves {Ct : t ∈
(a, b)} if each member
  of the family
 is tangent to C at some point (depending on t).
x0 ∂f x0
a. Suppose f = = 0 and the matrix
t0 ∂t t0
     
∂f x0 ∂f x0
 ∂x t0 
 2   ∂y t0  
 ∂ f x0 ∂ f x0 
2

∂x∂t t0 ∂y∂t t0

is nonsingular. Show that for some δ > 0, there is a C1 curve g : (t0 − δ, t0 + δ) → R2 with
g(t0 ) = x0 so that
   
g(t) ∂f g(t)
f = = 0.
t ∂t t

Conclude that g is a parametrization of the envelope C near x0 .


b. Find the envelopes of the following families of curves (portions of which are sketched in
Figure 
2.6).

x
(i) f = (cos t)x + (sin t)y = 1,
t
 
x
(ii) f = y + t2 x − t = 0
t
   
x x 2  y 2
(iii) f = + = 1, t ∈ (0, 1)
t t 1−t


Figure 2.6

3. Manifolds Revisited

In Chapter 4, we introduced k-dimensional manifolds in Rn informally as being locally the


graph of a C1 function over an open subset of a k-dimensional coordinate plane. We suggested that,
because of the Implicit Function Theorem, under the appropriate hypotheses, a level set of a C1
function is a prototypical example. Indeed, as we now wish to make clear, there are three equivalent
formulations, roughly these:
explicit: near each point, M is a graph over some k-dimensional coordinate plane;
implicit: near each point, M is the level set of some function whose derivative has maximum
rank;
parametric: near each point, M is parametrized by some one-to-one function whose derivative
has maximum rank (e.g., a parametrized curve with nonzero velocity).
We’ve seen that the implicit formulation arises in working with Lagrange multipliers, and the
parametric formulation will be crucial for our work with integration in Chapter 8. In this brief
section, we are going to make the three definitions quite precisely and then prove their equivalence in
Theorem 3.1. To make our life easier in Chapter 8, we will replace the C1 condition with “smooth.”

Definition . We say M ⊂ Rn is a k-dimensional manifold if any one of the following three


criteria holds:
(1) For any p ∈ M , there is a neighborhood W ⊂ Rn of p so that M ∩ W is the graph of a
smooth function f : V → Rn−k , where V ⊂ Rk is an open set. Here we are allowed to choose
any k integers 1 ≤ i1 < · · · < ik ≤ n; then Rk is the xi1 · · · xik -plane, and Rn−k is the plane
of the complementary coordinates.
(2) For any p ∈ M , there are a neighborhood W ⊂ Rn of p and a smooth function F : W →
Rn−k so that F−1 (0) = M ∩ W and rank(DF(x)) = n − k for every x ∈ M ∩ W .
(3) For any p ∈ M , there is a neighborhood W ⊂ Rn of p so that M ∩ W is the image of
a smooth function g : U → Rn for some open set U ⊂ Rk , with the properties that g is


one-to-one, rank(Dg(u)) = k for all u ∈ U , and g−1 : M ∩ W → U is continuous. (See


Figure 3.1.)

If the curious reader wonders why the last (and obviously technical) condition is included in
the third definition, see Exercises 2 and 3.

Theorem 3.1. The three criteria given in this definition are all equivalent.

Proof. The Implicit Function Theorem, Theorem 2.2, tells us precisely that (2)⇒(1). And (1)⇒(3) is obvious, since we can set $g(u) = \begin{bmatrix}u\\ f(u)\end{bmatrix}$ (where, for ease of notation, we assume here that Rk is the x1 · · · xk -plane). So it remains only to check that (3)⇒(2).
Suppose, as in the third definition, that we are given a neighborhood $\widetilde W \subset \mathbb{R}^n$ of p ∈ M so that M ∩ $\widetilde W$ is the image of a smooth function g : U → Rn for some open set U ⊂ Rk , with the properties that g is one-to-one, rank(Dg(u)) = k for all u ∈ U , and g−1 : M ∩ $\widetilde W$ → U is continuous. The last condition tells us that if g(u0 ) = p, then points sufficiently close to p in M must map by g−1 close to u0 ; that is, all points of M ∩ $\widetilde W$ are the image under g of a neighborhood of u0 .
We may assume that g(0) = p and (renumbering coordinates in Rn as necessary) $Dg(0) = \begin{bmatrix}A\\B\end{bmatrix}$, where A is an invertible k × k matrix. We define G : U × Rn−k → Rn by

$$G\begin{bmatrix}u\\v\end{bmatrix} = g(u) + \begin{bmatrix}0\\v\end{bmatrix}.$$

Since

$$DG\begin{bmatrix}0\\0\end{bmatrix} = \begin{bmatrix}A & O\\ B & I_{n-k}\end{bmatrix}$$

is invertible (see Exercise 4.2.7), it follows from the Inverse Function Theorem, Theorem 2.1, that there are neighborhoods V = V1 × V2 ⊂ Rk × Rn−k of $\begin{bmatrix}0\\0\end{bmatrix}$ and W ⊂ Rn of p and a local (smooth) inverse H : W → V of G. (Shrinking W if necessary, we assume W ⊂ $\widetilde W$.) Writing $H(x) = \begin{bmatrix}H_1(x)\\ H_2(x)\end{bmatrix} \in \mathbb{R}^k\times\mathbb{R}^{n-k}$, we define F : W → Rn−k by F = H2 . Now suppose F(x) = 0. Since x ∈ W , $x = G\begin{bmatrix}u\\v\end{bmatrix}$ for a unique vector $\begin{bmatrix}u\\v\end{bmatrix} \in V$. Then

$$F(x) = F\left(G\begin{bmatrix}u\\v\end{bmatrix}\right) = H_2\left(G\begin{bmatrix}u\\v\end{bmatrix}\right) = v,$$

so F(x) = 0 if and only if v = 0, which means that x = g(u). This proves that the equation F = 0 defines that portion of M given by g(u) for all u ∈ V1 . But because W ⊂ $\widetilde W$, we know that such points comprise all of M ∩ W . □

Example 1. Perhaps an explicit example will make this proof a bit more understandable. Suppose g : R → R3 is given by $g(u) = \begin{bmatrix}u\\u^2\\u^3\end{bmatrix}$ and M is the image of g. We wish to write M (perhaps locally) as the level set of a function near p = 0. As in the proof, we define

$$G\begin{bmatrix}u\\v_1\\v_2\end{bmatrix} = \begin{bmatrix}u\\u^2\\u^3\end{bmatrix} + \begin{bmatrix}0\\v_1\\v_2\end{bmatrix} = \begin{bmatrix}u\\ u^2+v_1\\ u^3+v_2\end{bmatrix}.$$

We can explicitly construct the inverse function

$$G^{-1}\begin{bmatrix}x\\y\\z\end{bmatrix} = H\begin{bmatrix}x\\y\\z\end{bmatrix} = \begin{bmatrix}x\\ y - x^2\\ z - x^3\end{bmatrix}.$$

The proof tells us to define F = H2 , and, indeed, this works. M is the zero-set of the function F : R3 → R2 given by

$$F\begin{bmatrix}x\\y\\z\end{bmatrix} = \begin{bmatrix}y - x^2\\ z - x^3\end{bmatrix}.$$

We ask the reader to carry this procedure out in Exercise 6 in a situation where it will only work locally. ▽

There are corresponding notions of the tangent space of the manifold M at p. (Recall that we
shall attempt to refer to the tangent space as a subspace, whereas the tangent plane is obtained by
translating it to pass through the point p.)

Definition. If the manifold M is presented in the three respective forms above, then its tangent
space at p, denoted Tp M , is defined as follows.
 
a
(1) assuming M is locally the graph of f with p = , then Tp M is the graph of Df (a);
f (a)
(2) assuming M is locally a level set of F, then Tp M = N([DF(p)]).
(3) assuming M is locally parametrized by g with p = g(a), then Tp M is the image of the
linear map Dg(a) : Rk → Rn .

Once again, we need to check that these three recipes all give the same k-dimensional subspace
of Rn . The ideas involved in this check have all emerged already in the preceding chapters. Since
(1) is a special case of (3) (why?), we need only check that N([DF(p)]) = image(Dg(a)). Note
that both of these are k-dimensional subspaces, because of our rank conditions on F and g. So it
suffices to show that image(Dg(a)) ⊂ N([DF(p)]). But this is easy: the function F◦ g : U → Rn−k
is identically 0, so, by the chain rule, DF(p)◦ Dg(a) = O, which says precisely that any vector in
the image of Dg(a) is in the kernel of DF(p).

EXERCISES 6.3

  
x
*1. Show that the set X = : y = |x| is not a 1-dimensional manifold, even though the
y
 3 
t
function g(t) = 3 gives a C1 “parametrization” of it. What’s going on?
|t |
 
cos 2t cos t
2. Show that the parametric curve g(t) = , t ∈ (−π/2, π/4), is not a 1-dimensional
cos 2t sin t
manifold. (Hint: Stare at Figure 3.2.)

3. Consider the following union of parallel lines:


  
x
X= : y = q for some q ∈ Q ⊂ R2 .
y
Is X a 1-dimensional manifold?

4. Is the union of the hyperbola xy = 1 and its asymptote y = 0 a 1-dimensional manifold? Give
your reasoning.
§3. Manifolds Revisited 253

Figure 3.2

5. Show the equivalence of


 the three definitions for each of the following 1-dimensional manifolds:
2
t
*a. parametric curve  t 
t4
 
cos t
b. parametric curve
3 sin t
c. implicit curve x2 + y 2 = 1, x2 + y 2 + z 2 = 2x
d. implicit curve x2 + y 2 = 1, z 2 + w2 = 1, xz + yw = 0
 2

u+u
6. Suppose g : R → R3 is given by g(u) =  u2 . Let M be the image of g.
u3
a. Show that g is globally one-to-one.
b. Following the proof given of Theorem 3.1, find a neighborhood W of 0 ∈ R3 and F : W →
R2 so that M ∩ W = F−1 (0).

7. Show the equivalence of the three definitions for each of the following 2-dimensional manifolds:
a. implicit surface x2 + y 2 = 1 (in R3 )
b. implicit surface x2 + 2 2 3
 y = z  (in R − {0})
u cos v
*c. parametric surface  u sin v , u > 0, v ∈ R
v
 
sin u cos v
d. parametric surface  sin u sin v , 0 < u < π, 0 < v < 2π
cos u
 
sin u cos v
e. parametric surface  sin u sin v , 0 < u < π, 0 < v < 2π
2 cos u
 
(3 + 2 cos u) cos v
f. parametric surface  (3 + 2 cos u) sin v , 0 ≤ u, v ≤ 2π
2 sin u
8. a. Show that
  
 x 
X =  y  : (x2 + y 2 + z 2 )2 − 10(x2 + y 2 ) + 6z 2 + 9 = 0
 
z

is a 2-manifold.
254 Chapter 6. Solving Nonlinear Problems
 
x p
b. Check that  y  ∈ X ⇐⇒ ( x2 + y 2 − 2)2 + z 2 = 1. Use this to sketch X.
z
9. At what points is   
 x 
X = y  : (x2 + y 2 + z 2 )2 − 4(x2 + y 2 ) = 0
 
z
a smooth surface? Proof? Give the equation of its tangent space at such a point.

10. Prove that the equations


x21 + x22 + x23 + x24 = 4 and x1 x2 + x3 x4 = 0
 
1
 1
define a smooth surface in R4 . Give a basis for its tangent space at  
 −1 .
1

11. Prove (1)=⇒(2) in Theorem 3.1 directly.


 
x
12. Writing ∈ R3 × R3 , show that the equations
y
kxk2 = kyk2 = 1 and x·y =0
define a 3-dimensional manifold in R6 . Give a geometric interpretation of this manifold.

13. Recall from Exercise 1.4.34 that an n × n matrix A is called orthogonal if AT A = I.


a. Prove that the set O(n) of n × n orthogonal matrices forms a n(n−1)
2 -dimensional manifold
2
in Mn×n = R . (Hint: Consider F : Mn×n → {symmetric n × n matrices} = Rn(n+1)/2
n

defined by F(A) = AT A − I. Use Exercise 3.1.13.)


b. Show that the tangent space of O(n) at I is the set of skew-symmetric n × n matrices.
" #
A
14. Prove (3)=⇒(1) in Theorem 3.1 directly. (Hint: Suppose g(0) = p and Dg(0) = , where
B
" #
g1
A is an invertible k × k matrix. Write g = : U → Rk × Rn−k and observe that g1 has a
g2
local inverse. What about the general case?)
CHAPTER 7
Integration
We turn now to the integral, with which, intuitively, we chop a large problem into small,
understandable bits and add them up, then proceeding to a limit in some fashion. We start with
the definition and then proceed to the computation, which is, once again, based on reducing the
problem to several one-variable calculus problems. We then learn how to exploit symmetry by
using different coordinate systems and tackle various standard physical applications (e.g., center of
mass, moment of inertia, and gravitational attraction). The discussion of determinants, initiated in
Chapter 1, culminates here with a complete treatment and their role in integration and the change
of variables theorem.

1. Multiple Integrals

In single-variable calculus the integral is motivated by the problem of finding the area under a
curve y = f (x) over aninterval
 [a, b]. Now we want to find the volume of the region in R3 lying
x
under the graph z = f and over the rectangle R = [a, b] × [c, d] in the xy-plane. Once we see
y
how partitions, upper and lower sums, and the integral are defined for rectangles in R2 , then it is
simple (although notationally discomforting) to generalize to higher dimensions.

Definition. Let R = [a, b] × [c, d] be a rectangle in R2 . Let f : R → R be a bounded function.


Given partitions P1 = {a = x0 < x1 < · · · < xk = b} and P2 = {c = y0 < y1 < · · · < yℓ = d} of [a, b]
and [c, d], respectively, denote by P = P1 × P2 the partition of the rectangle R into subrectangles
Rij = [xi−1 , xi ] × [yj−1 , yj ], 1 ≤ i ≤ k, 1 ≤ j ≤ ℓ.
Let Mij = sup f (x) and mij = inf f (x), as indicated in Figure 1.1. Define the upper sum of f
x∈Rij x∈Rij

Mij mij
Rij

Figure 1.1
255
256 Chapter 7. Integration

with respect to the partition P,


X
U (f, P) = Mij area(Rij ),
i,j

and the analogous lower sum


X
L(f, P) = mij area(Rij ).
i,j

We say f is integrable on R if there is a unique number I satisfying

L(f, P) ≤ I ≤ U (f, P) for all partitions P .


Z
In that event, we denote I = f dA, called the integral of f over R.
R

(Note that the inequality L(f, P) ≤ U (f, P) is obvious, as mij ≤ Mij for all i and j.)

Example 1. Let f be a constant function, viz., f (x) = α for all x ∈ R. Then


Z for any partition
P of R we have L(f, P) = αarea(R) = U (f, P), so f is integrable on R and f dA = αarea(R).
R

In higher dimensions, we proceed analogously, but the notation is horrendous. Let R = [a1 , b1 ]×
[a2 , b2 ] × · · · × [an , bn ] ⊂ Rn be a rectangle in Rn . We obtain a partition of R by dividing each of
the intervals into subintervals,

a1 = x1,0 < x1,1 < · · · < x1,k1 = b1 ,


a2 = x2,0 < x2,1 < · · · < x2,k2 = b2 ,
..
.
an = xn,0 < xn,1 < · · · < xn,kn = bn ,

and in such a way forming a “paving” of R by subrectangles

Rj1 j2 ...jn = [x1,j1 −1 , x1,j1 ] × [x2,j2 −1 , x2,j2 ] × · · · × [xn,jn−1 , xn,jn ] for some 1 ≤ js ≤ ks , s = 1, . . . , n.

We will usually suppress all the subscripts and just refer to the partition as {Ri }. We define the
volume of a rectangle R = [a1 , b1 ] × [a2 , b2 ] × · · · × [an , bn ] ⊂ Rn to be

vol(R) = (b1 − a1 )(b2 − a2 ) · · · (bn − an ).

Then upper sums, lower sums, and the integral are defined as before, substituting volume (of a
rectangle
Z in Rn ) for area (of a rectangle in R2 ). In dimensions n ≥ 3, we denote the integral by
f dV .
R
We need some criteria to detect integrability of functions. Then we will find soon that we can
evaluate integrals by reverting to our techniques from one-variable calculus.

Definition. Let P and P′ be partitions of a given rectangle R. We say P′ is a refinement of P


if for every rectangle Q′ ∈ P′ there is a rectangle Q ∈ P so that Q′ ⊂ Q. (See Figure 1.2.)
§1. Multiple Integrals 257

Q
Q′

partition P of the rectangle R refinement P ′ of the partition P

Figure 1.2

Lemma 1.1. Let P and P′ be partitions of a given rectangle R, and suppose P is a refinement
of P′ . Suppose f is a bounded function on R. Then we have
L(f, P′ ) ≤ L(f, P) ≤ U (f, P) ≤ U (f, P′ ).

Proof. It suffices to check the following: let Q be a single rectangle, and let Q = {Q1 , . . . , Qr }
be a partition of Q. Let m = inf x∈Q f (x), mi = inf x∈Qi f (x), M = supx∈Q f (x), Mi = supx∈Qi f (x).
Then we claim that
Xr Xr
marea(Q) ≤ mi area(Qi ) ≤ Mi area(Qi ) ≤ M area(Q).
i=1 i=1
This is immediate from the fact that m ≤ mi ≤ Mi ≤ M for all i = 1, . . . , r. 

Corollary 1.2. If P′ and P′′ are two partitions of R, we have L(f, P′ ) ≤ U (f, P′′ ).

Proof. Let P be the partition of R formed by taking the union of the respective partitions in
each coordinate, as indicated in Figure 1.3. P is called the common refinement of P′ and P′′ . Then
by Lemma 1.1, we have
L(f, P′ ) ≤ L(f, P) ≤ U (f, P) ≤ U (f, P′′ ),
as required. 

Proposition 1.3 (Convenient Criterion). Given a bounded function on a rectangle R, f is


integrable on R if and only if, for any ε > 0, there is a partition P of R so that U (f, P)−L(f, P) < ε.

Proof. ⇐=: Suppose there were two different numbers I1 and I2 satisfying L(f, P) ≤ Ij ≤
U (f, P) for all partitions P. Choosing ε = |I2 − I1 | yields a contradiction.
=⇒: Now suppose f is integrable, so that there is a unique number I satisfying L(f, P) ≤ I ≤
U (f, P) for all partitions P. Given ε > 0, we can find a partitions P′ and P′′ so that
I − L(f, P′ ) < ε/2 and U (f, P′′ ) − I < ε/2.
(If we could not get as close as desired to I with upper and lower sums, we would violate uniqueness
of I.) Let P be the common refinement of P′ and P′′ . Then
L(f, P′ ) ≤ L(f, P) ≤ U (f, P) ≤ U (f, P′′ ),
so
U (f, P) − L(f, P) ≤ U (f, P′′ ) − L(f, P′ ) < ε,
258 Chapter 7. Integration

′′
partition P ′ of the rectangle R partition P of the rectangle R

common refinement of partitions P ′ and P ′ ′

Figure 1.3

as required. 

We need to be aware of the basic properties of the integral (which we leave to the reader as
exercises):

Proposition 1.4. Suppose f and g are integrable functions on R. Then f + g is integrable on


R and we have Z Z Z
(f + g)dV = f dV + gdV .
R R R

Proof. See Exercise 9. 

Proposition 1.5. Suppose f is an integrable function on R and α is a scalar. Then αf is


integrable on R and we have Z Z
(αf )dV = α f dV .
R R

Proof. See Exercise 9. 

Proposition 1.6. Suppose R = R′ ∪ R′′ is the union of two subrectangles. Then f is integrable
on R if and only if f is integrable on both R′ and R′′ , in which case we have
Z Z Z
f dV = f dV + f dV .
R R′ R′′

Proof. See Exercise 9. 

Proposition 1.7. Let R ⊂ Rn be a rectangle. Suppose f : R → R is continuous. Then f is


integrable.
§1. Multiple Integrals 259

Proof. Given ε > 0, we must find a partition P of R so that U (f, P) − L(f, P) < ε. Since f
is continuous on the compact set R, it follows from Theorem 1.4 of Chapter 5 that f is uniformly
continuous. That means that given any ε > 0, there is δ > 0 so that whenever kx − yk < δ,
ε
x, y ∈ R, we have |f (x) − f (y)| < . Partition R into subrectangles Ri , i = 1, . . . , k, of
vol(R) √
diameter less than δ (e.g., whose sidelengths are less than δ/ n). Then on any such subrectangle
ε
Ri , we will have Mi − mi < , and so
vol(R)
k
X ε
U (f, P) − L(f, P) = (Mi − mi )vol(Ri ) < vol(R) = ε,
vol(R)
i=1
as needed. 

Definition . We say X ⊂ Rn has (n-dimensional) volume zero if for every ε > 0, there are
P
s
finitely many rectangles R1 , . . . , Rs so that X ⊂ R1 ∪ · · · ∪ Rs and vol(Ri ) < ε.
i=1

Proposition 1.8. Suppose f : R → R is a bounded function and the set X = {x ∈ R :


f is not continuous at x} has volume zero. Then f is integrable on R.

Proof. Let ε > 0 be given. We must find a partition P of R so that U (f, P) − L(f, P) < ε.
Since f is bounded, there is a real number M so that |f | ≤ M . Because X has volume zero,
we can find finitely many rectangles R1′ , . . . , Rs′ , as shown in Figure 1.4, that cover X and satisfy
Ps
vol(Rj′ ) < ε/4M . We can also ensure that no point of X is a frontier point of the union of
j=1
these rectangles (see Exercise 2.2.8). Now create a partition of R in such a way that each of Rj′ ,

Figure 1.4
j = 1, . . . , s, will be a union of subrectangles of this partition, as shown in Figure 1.5. Consider the
Ss
closure of Y = R − Rj′ ; it too is compact, and f is continuous on Y , hence uniformly continuous.
j=1
Proceeding as in the proof of Proposition 1.7, we can refine the partition to obtain a partition
P = {R1 , . . . , Rt } of R with the property that
X ε
(Mi − mi )vol(Ri ) < .
2
Ri ⊂Y

But we already know that


X ε ε
(Mi − mi )vol(Ri ) < (2M ) = .
s
4M 2
R′j
S
Ri ⊂
j=1
260 Chapter 7. Integration

Figure 1.5

Therefore, U (f, P) − L(f, P) < ε, as required. 

If we want to integrate over a nonrectangular bounded set Ω, we pick a rectangle R with Ω ⊂ R.


Given a bounded function f : Ω → R, define

f˜: R → R

f (x), x ∈ Ω
˜
f (x) = .
0, otherwise

We then define f to be integrable when f˜ is, and set


Z Z
f dV = f˜dV .
Ω R
(We leave it to the reader to check in Exercise 8 that this is well-defined.)

Definition. We say a subset Ω ⊂ Rn is a region if it is the closure of a bounded open subset


of Rn and its frontier, i.e., the set of its frontier points, has volume 0.

Remark. Note, first of all, that any region is compact.


As we ask the reader to check in Exercise 12, if m < n, X ⊂ Rm is compact, and φ : X → Rn
is C1 , then φ(X) has volume 0 in Rn . So, any time the frontier of Ω is a finite union of such sets,
it has volume 0 and Ω is a region.

Corollary 1.9. If Ω ⊂ Rn is a region and f : Ω → R is continuous, then f is integrable on Ω.

Proof. Recall that to integrate f over Ω we must integrate f˜ over some rectangle R containing
Ω. The function f˜ is continuous on all of R except for the frontier of Ω, which is a set of volume
zero. 
Z
n
Corollary 1.10. If Ω ⊂ R is a region, then vol(Ω) = 1dV is well-defined.

Proof. The constant function 1 is continuous on Ω. 

The following result is often quite useful.

Proposition 1.11. Suppose f and g are integrable functions on the region Ω and f ≤ g. Then
Z Z
f dV ≤ gdV .
Ω Ω
§1. Multiple Integrals 261

Proof. Let R be a rectangle containing Ω and let f˜ and g̃ be the functions as defined above.
Then we have f˜ ≤ g̃ everywhere
Z on R.
Z Then, Zapplying Propositions 1.4 and 1.5, the function
h = g̃ − f˜ is integrable and hdV = gdV − f dV . On the other hand, since h ≥ 0, for any
R Ω Ω Z
partition P of R, the lower sum L(h, P) ≥ 0, and therefore hdV ≥ 0. The desired result now
R
follows immediately. 

EXERCISES 7.1


  0, 0 ≤ y ≤ 1
x 2
*1. Suppose
Z f = . Prove that f is integrable on R = [0, 1] × [0, 1] and find
y 1, 1 < y ≤ 1
2
f dA.
R
2. Show directly that the function

  1,
x x=y
f =
y 0, otherwise
Z
is integrable on R = [0, 1] × [0, 1] and find f dA. (Hint: Partition R into 1/N by 1/N
R
squares.)

3. Show directly that the function



  1,
x y<x
f =
y 0, otherwise
Z
is integrable on R = [0, 1] × [0, 1] and find f dA.
R
4. Show directly that the function

  
x 1, x = 12 , 13 , 41 , 15 , . . .
f =
y 0, otherwise
Z
is integrable on R = [0, 1] × [0, 1] and find f dA.
R
♯ 5.
Z f : R → R is nonnegative, continuous, and positive at some point of R. Prove
a. Suppose
that f dV > 0.
R
b. Give an example to show the result of part a is false if we remove the hypothesis of
continuity.

6. Let Ω ⊂ Rn be a region and suppose f : Ω → R is continuous.


a. If m and M are, respectively, the minimum and maximum values of f , prove that
Z
mvol(Ω) ≤ f dV ≤ M vol(Ω).

262 Chapter 7. Integration

(Hint: Use Proposition 1.11.)


b. (Mean Value Theorem for Integrals) Suppose Ω is connected (this means that any pair of
points in Ω can be joined by a path in Ω). Prove that there is a point c ∈ Ω so that
Z
f dV = f (c)vol(Ω).

(Hint: Apply the Intermediate Value Theorem.)
♯ 7. Suppose f is continuous at a and integrable on a neighborhood of a. Prove that
Z
1
lim f dV = f (a).
ε→0+ volB(a, ε) B(a,ε)

Z
*8. Check that f dV is well-defined. That is, if R and R′ are two rectangles containing Ω and

f˜ and f˜′ are the corresponding
Z functions,
˜ ˜′
Z check that f is integrable over R if and only if f is
integrable over R′ and that f˜dV = f˜′ dV .
R R′

9. a. Prove Proposition 1.4. (Hint: If P = {Ri } is a partition and mfi , mgi , mfi +g , Mif , Mig ,
Mif +g denote the obvious, show that
mfi + mgi ≤ mfi +g ≤ Mif +g ≤ Mif + Mig .
Z Z
It will also be helpful to see that f dV + gdV is the unique number between L(f, P)+
R R
L(g, P) and U (f, P) + U (g, P) for all partitions P.)
b. Prove Proposition 1.5.
c. Prove Proposition 1.6.
♯ 10. Suppose f is integrable on R. Given ε > 0, prove there is δ > 0 so that whenever all the
rectangles of a partition P have diameter less than δ, we have U (f, P) − L(f, P) < ε. (Hint:
By Proposition 1.3, there is a partition P′ (as indicated by the darker lines in Figure 1.6) so
that U (f, P′ ) − L(f, P′ ) < ε/2. Show that covering the dividing hyperplanes (of total area A)

Figure 1.6

of the partition by rectangles of diameter < δ requires at most volume 2Aδ. If |f | ≤ M , then
we can pick δ so that that total volume is at most ε/4M . Show that this δ works.)
♯ 11. Let X ⊂ Rn be a set of volume 0.
a. Show that for every ε > 0, there are finitely many cubes C1 , . . . , Cr so that X ⊂ C1 ∪· · ·∪Cr
Pr
and vol(Ci ) < ε. (Hint: If R is a rectangle with vol(R) < δ, show that there is a
i=1
rectangle R′ containing R with vol(R′ ) < δ and whose sidelengths are rational numbers.)
§1. Multiple Integrals 263

b. Let T : Rn → Rn be a linear map. Prove that T (X) has volume 0 as well. (Hint: Show
that there is a constant k so that for any cube C, the image T (C) is contained in a cube
whose volume is at most k times the volume of C.) Query: What goes wrong with this if
T : Rn → Rm and m < n?
♯ 12. Let m < n, let X ⊂ Rm be compact and U ⊂ Rm an open set containing X. Suppose
φ : U → Rn is C1 . Prove φ(X) has volume 0 in Rn . (Hints: Take X ⊂ C, where C is a cube.
Show that if N is sufficiently large and we divide C into N m subcubes, then X is covered by
such cubes all contained in U ,1 and φ(X) will be contained in at most N m cubes in Rn . Argue
by continuity of Dφ that there is a constant k (not depending on N ) so that each of these will
have volume less than (k/N )n .)

13. We’ve seen in Proposition 1.8 a sufficient condition for f to be integrable. Show that it isn’t
necessary by considering the famous function f : [0, 1] → R given by

 1 , x = p in lowest terms
f (x) = q q
.
0, otherwise

(Hint: Why is Q ∩ [0, 1] not a set of length zero?)

14. A subset X ⊂ Rn has measure zero if, given any ε > 0, there is a sequence of rectangles R1 ,
R2 , R3 , . . . , Rk , . . . , so that

[ ∞
X
X⊂ Ri and vol(Ri ) < ε.
i=1 i=1

a. Prove that any set of volume 0 has measure 0.


b. Give an example of a set of measure 0 that does not have volume 0.
c. Prove that if X is compact and has measure 0, then X has volume 0. (Hint: See Exercise
5.1.12.)
S

d. Suppose X1 , X2 , . . . is a sequence of sets of measure 0. Prove that Xi has measure 0.
i=1
15. In this (somewhat challenging) exercise, we discover precisely which bounded functions are
integrable. Let f : R → R be a bounded function.
a. Let a ∈ R and δ > 0. Define

M (f, a, δ) = sup f (x)


x∈B(a,δ)∩R

m(f, a, δ) = inf f (x)


x∈B(a,δ)∩R

o(f, a) = lim M (f, a, δ) − m(f, a, δ)


δ→0+

Prove that o(f, a) makes sense (i.e., the limit exists) and is nonnegative; it is called the
oscillation of f at a. Prove that f is continuous at a if and only if o(f, a) = 0.
b. For any ε > 0, set Dε = {x ∈ R : o(f, x) ≥ ε}, and let D = {x ∈ R : f is discontinuous at x}.
Show that D = D1 ∪ D1/2 ∪ D1/3 ∪ · · · and that Dε is a closed set.

1
This follows from Exercise 5.1.13.
264 Chapter 7. Integration

c. Suppose that f is integrable on R. Prove that for any k ∈ N, D1/k has volume 0. Deduce
that if f is integrable on R, then D has measure 0. (Hint: Use Exercise 14.)
d. Conversely, prove that if D has measure 0, then f is integrable. (Hints: Choose ε > 0 and
apply the convenient criterion. If D has measure 0, then so has Dε , and so it has volume
0 (why?). Create a partition consisting of rectangles disjoint from Dε and of rectangles of
small total volume that cover Dε .)

2. Iterated Integrals and Fubini’s Theorem

In one-variable integral calculus we learned that we could compute the volume of a solid region
by slicing it by parallel planes and integrating the cross-sectional area. In particular, given a
 R = [a, b] × [c, d], if we are interested in finding the volume over R and under the graph
rectangle
x
z=f , we could slice by planes perpendicular to the x-axis, as shown in Figure 2.1, obtaining
y

c
a d

Figure 2.1
Z b 
volume = cross-sectional area at x dx
a
Z b Z d   ! Z bZ d  
x x
= f dy dx = f dydx
a c y a c y
| {z }
x fixed
This expression is called an iterated integral. Perhaps it would be more suggestive to call it a nested
integral. Calculating iterated integrals reverts to one-variable calculus skills (finding antiderivatives
and applying the Fundamental Theorem of Calculus) along with a healthy dose of neat bookkeeping.

Example 1.
Z 1Z 2 Z 1 i2
2

1 + x + xy dydx = (1 + x2 )y + 21 xy 2 dx
0 1 0 | {z } y=1
x fixed
Z 1 i1

= 1 + x2 + 23 x dx = x + 13 x3 + 34 x2
0 x=0
§2. Iterated Integrals and Fubini’s Theorem 265

1 3 25
=1+ + = . ▽
3 4 12
Examples 2. Let’s investigate an obvious question.
Z 1Z 2
2
(a) We wish to evaluate xyex+y dxdy.
−1 0
Z Z ! Z
1 2
2
1  2
i2  R
xyex+y dx dy = yey (x − 1)ex dy (recalling that xex dx = xex − ex )
−1 0 −1 x=0
Z 1 i1
2 2
= yey (e2 + 1)dy = 12 (e2 + 1)ey = 0.
−1 y=−1
Z 2Z 1
2
(b) Now let’s consider xyex+y dydx.
0 −1
Z !
2Z 1 Z 2 Z 1
x+y 2 y2
xye dydx = (xex )(ye )dy dx
0 −1 0 −1
Z !
2 i1
1 x y2
= 2 (xe )e dx
0 y=−1

= 0.
2
More to the point, we should observe that for fixed x, the function (xex )(yey ) is an odd
function of y, and hence the integral as y varies from −1 to 1 must be 0.
We shall prove in a moment that for reasonable functions the iterated integrals in either order are
equal, and so it behooves us to think a minute about symmetry (or about the difficulty of finding
an antiderivative) and choose the more convenient order of integration. ▽
2
Example 3.Suppose
  we wish   over the triangle Ω ⊂ R
 to find the volume of the region lying
0 1 1 x
with vertices at , , and and bounded above by z = f = xy. Then we wish to
0 0 1 y
find the integral of f over the region Ω. By definition, we consider Ω as a subset of, say, the square
R = [0, 1] × [0, 1] and define f˜: R → R by
  
  xy, x
x ∈Ω
f˜ = y ,
y 
0, otherwise
 
˜ x
whose graph is sketched in Figure 2.2. Note that for x fixed, f = xy when 0 ≤ y ≤ x and is 0
y
otherwise. So Z   Z Z Z
1 x 1 x
x
f˜ dy = xydy + 0dy = xydy.
0 y 0 x 0
Thus, we have
Z   !
1Z 1 Z 1 Z x
x
f˜ dydx = xydy dx
0 0 y 0 0
Z !
1 ix
1 2
= 2 xy dx
0 y=0
266 Chapter 7. Integration

x

Figure 2.2

Z 1
1 3 1
= 2 x dx = . ▽
0 8

Example 4. Suppose we slice into a cylindrical tree trunk, x2 + y 2 ≤ a2 , and remove the wedge

Figure 2.3

bounded below by z = 0 and above by z = y, as depicted in Figure 2.3. What is the volume of the
chunk we remove?
We see that the plane z = y lies above the plane z = 0 when y ≥ 0, so we let Ω =

y=√a 2 −x 2

x
−a a

Figure 2.4
§2. Iterated Integrals and Fubini’s Theorem 267
  
x
: x2 + y2 ≤ a2 , y ≥ 0 , as indicated in Figure 2.4, and to obtain the volume we calculate:
y
Z Z Z √ Z !
a a2 −x2 a i√a2 −x2
1 2
ydA = ydydx = 2 y y=0 dx
Ω −a 0 −a
Z a Z a
1 2
= (a2 − x2 )dx = (a2 − x2 )dx = a3 . ▽
2 −a 0 3
The fact that we can compute volume using either a multiple integral or an iterated integral
suggests that, at least for “reasonable” functions, we should in general be able to calculate multiple
integrals by computing iterated integrals. The crucial theorem that allows us to calculate multiple
integrals with relative ease is the following

Theorem 2.1 (Fubini’s Theorem, 2-dimensional case). Suppose f is  integrable


 on a rectangle
x
R = [a, b] × [c, d] ⊂ R2 . Suppose that for each x ∈ [a, b], the function f is integrable on [c, d],
y
Z d  
x
i.e., F (x) = f dy exists. Suppose next that the function F is integrable on [a, b], i.e.,
c y
Z Z Z   !
b b d
x
F (x)dx = f dy dx
a a c y

exists. Then we have !


Z Z b Z d  
x
f dA = f dy dx.
R a c y

Proof. Let P be an arbitrary  partition of R into rectangles Rij = [xi−1 , xi ] × [yj−1 , yj ],


x
i = 1, . . . , k, j = 1, . . . , ℓ. When ∈ Rij , we have
y
 
x
mij ≤ f ≤ Mij , and so
Z yj  y 
x
mij (yj − yj−1 ) ≤ f dy ≤ Mij (yj − yj−1 ).
yj−1 y

So now when x ∈ [xi−1 , xi ], we have



X Z d   ℓ
X
x
mij (yj − yj−1 ) ≤ f dy ≤ Mij (yj − yj−1 ), whence
c y
j=1 j=1
Z Z   !
X
ℓ  xi d X
ℓ 
x
mij (yj − yj−1 ) (xi − xi−1 ) ≤ f dy dx ≤ Mij (yj − yj−1 ) (xi − xi−1 ).
xi−1 c y
j=1 j=1

Finally, summing over i, we have


k X Z Z   !
X ℓ  b d
x
mij (yj − yj−1 ) (xi − xi−1 ) ≤ f dy dx
a c y
i=1 j=1
k X
X ℓ 
≤ Mij (yj − yj−1 ) (xi − xi−1 ).
i=1 j=1
268 Chapter 7. Integration

But this can be rewritten as



k X Z Z  k X ℓ
!
X b d X x
mij area(Rij ) ≤ dy dx ≤
f Mij area(Rij ),
a c y
i=1 j=1 i=1 j=1
Z b Z d   !
x
i.e., L(f, P) ≤ f dy dx ≤ U (f, P).
a c y

Since f is integrable on R, if a number I satisfies

L(f, P) ≤ I ≤ U (f, P) for all partitions P of [a, b],


Z
then I = f dA. This completes the proof. 
R

Corollary 2.2. Suppose f is integrable on the rectangle R = [a, b] × [c, d] and the iterated
integrals
Z bZ d   Z dZ b  
x x
f dydx and f dxdy
a c y c a y
Z d  
x
both exist. (That is, for each x, the integral f
dy exists and defines a function of x that is
c y
Z b  
x
integrable on [a, b]. And, likewise, for each y, the integral f dx exists and defines a function
a y
of y that is integrable on [c, d].) Then
Z bZ d   Z Z dZ b  
x x
f dydx = f dA = f dxdy.
a c y R c a y

In general, in n dimensions, we have:

Theorem 2.3 (Fubini’s Theorem, general case). Let R ⊂ Rn be a rectangle, say

R = [a1 , b1 ] × · · · × [an , bn ].

Suppose f : R → R is integrable and that, moreover, the integrals


Z bn Z bn−1 Z bn 
f (x)dxn , f (x)dxn dxn−1 , ...,
an an−1 an
Z Z Z  !
b1 bn−1 bn
... f (x)dxn dxn−1 · · · dx1
a1 an−1 an

all exist. Then the multiple integral and the iterated integral are equal:
Z Z b1 Z bn
f (x)dV = ... f (x)dxn · · · dx1 .
R a1 an

(The same is true for the iterated integral in any order, provided all the intermediate integrals
exist.) In particular, whenever f is continuous on R, then the multiple integral equals any of the
n! possible iterated integrals.
§2. Iterated Integrals and Fubini’s Theorem 269

Example 5. It is easy to find a function f on the rectangle R = [0, 1] × [0, 1] that is integrable
but whose iterated integral doesn’t exist. Take

  
x 1, x = 0, y ∈ Q
f = .
y 0, otherwise

Z 1   Z
0
The integral f dy does not exist, but it is easy to see that f is integrable and f dA = 0.
0 y R

Example 6. It is somewhat harder to find a function whose iterated integral exists but that
is not integrable. Let

  
x 1, y∈Q
f = .
y 2x, y ∈ /Q
Z 1   Z 1Z 1  
x x
Then f dx = 1 for every y ∈ [0, 1], so the iterated integral f dxdy exists and
0 y 0 0 y
equals 1. Whether f is integrable on R = [0, 1] × [0, 1] is more subtle. Probably the easiest way
to see that it is not is this: if it were, by Proposition 1.6, then it would also be integrable on
R′ = [0, 12 ] × [0, 1]. For any partition P of R′ , we have U (f, P) = 21 , whereas we can make L(f, P)
Z 1 Z 1/2
1
as close to 2xdxdy = as we wish.
0 0 4
We
Z 1Z 1   ask the reader to decide in Exercise 4 whether the other iterated integral,
x
f dydx, exists.
0 0 y

Example 7. More subtle yet is a nonintegrable function on R = [0, 1] × [0, 1] both of whose
iterated integrals exist. Define

  
x 1, x = m n
q and y = q for some m, n, q ∈ N with q prime
f = .
y 0, otherwise

First of all, f is not integrable on R since L(f, P) = 0 and U (f, P) = 1 for every partition P of R
Z 1  
x
(see Exercise 5). Next, we claim that for any x, f dy exists and equals 0. When x ∈ / Q,
y
0  
m x
this is obvious. When x = , only for finitely many y ∈ [0, 1] is f not equal to 0, and so
q y
Z 1Z 1  
x
the integral exists. Obviously, then, the iterated integral f dydx exists. The same
0 0 y
argument applies when we reverse the order. ▽

Example 8 (Changing the order of integration). You are asked to evaluate the iterated integral
Z 1Z 1
sin x
dxdy.
0 y x
270 Chapter 7. Integration
Z
sin x
It is a classical fact that dx cannot be evaluated in elementary terms, and so (other than
x
resorting to numerical integration) we are stymied. To be careful, we define

   sin x
x , x 6= 0
f = x .
y 1, x=0

Then f is continuous and


Z we recognize (applying Theorem 2.1) that the iterated integral is equal
to the double integral f dA, where

  
x
Ω= : 0 ≤ y ≤ 1, y ≤ x ≤ 1 ,
y

which is the triangle pictured in Figure 2.5. Once we have a picture of Ω, we see that we can

1
y=x
x=y x=1
Ω Ω
y=0 1

Figure 2.5

equally well represent it in the form


  
x
Ω= : 0 ≤ x ≤ 1, 0 ≤ y ≤ x ,
y

and so, writing the iterated integral in the other order,


Z Z 1 Z x !
sin x
f dA = dy dx
Ω 0 0 x
| {z }
x fixed
Z   x !
1
sin x
= y dx
0 x y=0
Z ! Z 1
1
sin x
= · x dx = sin xdx = 1 − cos 1.
0 x 0

The moral of this story is that, when confronted by an iterated integral that cannot be evaluated
in elementary terms, it doesn’t hurt to change the order of integration and see what happens. ▽

Example 9. Let Ω ⊂ R3 be the region in the first octant bounded belowZ by the paraboloid
z = x2 + y 2 and above by the plane z = 4, shown in Figure 2.6. Evaluate xdV . It is most

natural to integrate first with respect to z; notice that the projection of Ω onto the xy-plane is the
§2. Iterated Integrals and Fubini’s Theorem 271

z=x2+y2

Figure 2.6

quarter
  of the disk of radius 2 centered at the origin lying in the first quadrant. For each point
x
in that quarter-disk, z varies from x2 + y 2 to 4. Thus, we have
y
Z Z √
2Z 4−x2 Z 4
xdV = xdzdydx
Ω 0 0 x2 +y 2
Z √
2Z 4−x2 
= x 4 − (x2 + y 2 ) dydx
0 0
Z 2  
= x (4 − x2 )3/2 − 31 (4 − x2 )3/2 dx
0
i2 64
2
= − 15 (4 − x2 )5/2 = .
0 15
We will revisit this example in Section 3. ▽

Example 10. Let Ω = {x ∈ Rn : 0 ≤ xn ≤ xn−1 ≤ · · · ≤ x2 ≤ x1 ≤ 1}. This region is pictured


in the case n = 3 in Figure 2.7. Then

x3

x2

x1

Figure 2.7
Z 1 Z x1 Z xn−1
vol(Ω) = ... dxn · · · dx2 dx1
0 0 0
Z 1 Z x1 Z xn−2
= ... xn−1 dxn−1 · · · dx2 dx1
0 0 0
272 Chapter 7. Integration
Z 1 Z x1 Z xn−3
1 2
= ... 2 xn−2 dxn−2 · · · dx2 dx1
0 0 0
Z 1
1 1
= ··· = x1n−1 dx1 = ▽
0 (n − 1)! n!

EXERCISES 7.2

Z
1. Evaluate the integrals f dV for the given function f and rectangle R:
  R
x
*a. f = ex cos y, R = [0, 1] × [0, π2 ]
y
 
x y
b. f = , R = [1, 3] × [2, 4]
y
  x
x x
*c. f = 2 , R = [0, 1] × [1, 3]
y x +y
 
x
d. f y  = (x + y)z, R = [−1, 1] × [1, 2] × [2, 3]
z
Z
2. Interpret each of the following iterated integrals as a double integral f dA for the appropriate

region Ω, sketch Ω, and change the order of integration. (You may assume f is continuous.)
Z 1Z 1  
x
a. f dydx
y
Z0 1 Zx2x  
x
*b. f dydx
y
Z0 2 Z0 4  
x
c. f dxdy
1 y 2√ y
Z Z 1 2  
1−y
x
d. f dxdy
y
Z−1 0
1Z x  
x
e. f
dydx
x2 y
Z0 2 Z x+2  
x
*f. f dydx
−1 x2 y

3. Z
Evaluate each of the following iterated integrals. In addition, interpret each as a double integral
f dA, sketch the region Ω, change the order of integration and evaluate the alternative

iterated
Z 1 Zintegral.
x
a. (x + y)dydx
0 0√
Z 1 Z 1−y2
b. √ ydxdy
0 − 1−y 2
Z 1Z x
x
*c. dydx
0 x 2 1 + y2
§2. Iterated Integrals and Fubini’s Theorem 273
Z 1Z 1  
x
4. Given the function f in Example 6, does the iterated integral f dydx exist?
0 0 y

5. Check that for the function f defined in Example 7, for every partition P of R,
U (f, P) = 1 and L(f, P) = 0. (Hint: Show that for every δ > 0, if 1/q < δ, then every
interval of length δ in [0, 1] contains a point of the form k/q.)

6. Let

   1 , x = p in lowest terms, y ∈ Q
x
f = q q
.
y 0, otherwise

Decide whether f is integrable on R = [0, 1] × [0, 1] and whether the iterated integrals
Z 1Z 1   Z 1Z 1  
x x
f dxdy and f dydx
0 0 y 0 0 y

exist.

7. Is there an integrable function on a rectangle neither of whose iterated integrals exists?

8. Evaluate
Z 4 Zthe following iterated integrals:
2
1
*a. √ 1 + x3
dxdy
0 y
Z 1Z 1
4
b. √
ey dydx
3
Z0 1 Z 1x
c. √
ey/x dxdy (Be careful: why does the double integral even exist?)
0 y

9. Find the volume of the region in the first octant of R3 bounded below by the xy-plane, on the
sides by x = 0 and y = 2x, and above by y 2 + z 2 = 16.

10. Find the volume of the region in the R3 bounded below by the xy-plane, above by z = y, and
on the sides by y = 4 − x2 .

*11. Find the volume of the region in R3 bounded by the cylinders x2 + y 2 = 1 and x2 + z 2 = 1.
Z
12. Interpret each of the following iterated integrals as a triple integral f dV for the appropriate

region Ω, sketch Ω, and change the order of integration so that the innermost integral is taken
with respect to y. (You may
 assume f is continuous.)
Z 1 Z 1−x Z 1−x−y x
*a. f y  dzdydx
0 0 0 z
 
Z 1Z 1−x2
y Z
x
b. f y  dzdydx
0 0 0 z
 
Z 1 Z √1−x2 Z 1 x
c. √ √ f y  dzdydx

−1 − 1−x2 x2 +y 2 z
274 Chapter 7. Integration
 
Z 1 Z 1−x2 Z x+y x
*d. f y  dzdydx
0 0 0 z
 
Z 1Z 1−x Z x+y x
e. f y  dzdydx
0 0 0 z

*13. Suppose a, b, and c are positive. Find the volume of the tetrahedron bounded by the coordinate
planes and the plane x/a + y/b + z/c = 1.

14. Find the volume of the region in R3 bounded by z = 1 − x2 , z = x2 − 1, y + z = 1, and y = 0.

*15. Let Ω ⊂ R3 be the portion of the cube 0Z ≤ x, y, z ≤ 1 lying above the plane y + z = 1 and
below the plane x + y + z = 2. Evaluate xdV .

16. Let
 
x−y x
. f =
(x + y)3 y
Z 1Z 1   Z 1Z 1  
x x
Calculate the iterated integrals f dxdy and f dydx. Explain your re-
0 0 y 0 0 y
sults.

17. Let R = [0, 1] × [0, 1]. Define f : R → R by



kℓ(k+1)(ℓ+1) 1
  

 2ℓ−k
, k+1 < x ≤ k1 , 1
ℓ+1 < y ≤ 1ℓ , k < ℓ
x 1 1
f = −k2 (k + 1)2 , < x, y ≤ .
y 

k+1 k

0, otherwise

Decide if both iterated integrals exist and if Zthey are equal. Is f integrable on R? (Hint: To
see where this function came from, calculate f dA.)
1
[ k+1 , k1 ]×[ ℓ+1
1
, 1ℓ ]

18. (Exploiting symmetry) Let R ⊂ Rn . Suppose f : R → R is integrable.


a. Suppose R is a rectangle that is symmetric about the hyperplane x1 = 0, i.e.,
   
x1 −x1
 x2   x2 
   
 ..  ∈ R ⇐⇒  ..  ∈ R,
 .   . 
xn xn
   
−x1 x1
 x2   x2  Z
   
and f  ..  = −f  .. . Prove that f dV = 0.
 .  . R
xn xn
b. Suppose R is a rectangle that is symmetric about the origin, i.e., x ∈ ZR ⇐⇒ −x ∈ R,
and suppose f is an odd function, so that f (−x) = −f (x). Prove that f dV = 0.
R
c. Generalize the results of parts a and b to allow regions other than rectangles.
§2. Iterated Integrals and Fubini’s Theorem 275

19. Assume f is C2 . Prove Theorem 6.1 of Chapter 3 by applying Fubini’s Theorem. (Hint: Proceed
by contradiction: if the mixed partials are not equal at some point, apply Exercise 2.3.5 to show
∂2f ∂2f
we can find a rectangle on which, say, > . Exercise 7.1.5 may also be useful.)
∂x∂y ∂y∂x
♯ 20. ∂f
(Differentiating under the integral sign) Suppose f : [a, b] × [c, d] → R is continuous and is
Z d   ∂x
x
continuous. Define F (x) = f dy.
c y
a. Prove that F is continuous. (Hint: You will need to use uniform continuity of f .)
Z d  
′ ∂f x
b. Prove that F is differentiable and that F (x) = dy. (Hint: Let φ(t) =
Z d   Z x c ∂x y
∂f t
dy, and let Φ(x) = φ(t)dt. Show that φ is continuous and that F (x) =
c ∂x y a
Φ(x) + const.)
Z 1 x Z 1
y −1 ′ y−1
21. Let F (x) = dy. Use Exercise 20 to calculate F (x) and prove that dy =
0 log y 0 log y
F (1) = log 2.
Z x 2 Z 1 −x2 (t2 +1)
e
−t2
22. Let f (x) = e dt and g(x) = dt.
0 0 t2 + 1
a. Using Exercise 20 as necessary, prove that f ′ (x) + g ′ (x) = 0 for all x.
Z ∞ Z N
−t2 2 √
b. Prove that f (x)+g(x) = π/4 for all x. Deduce that e dt = lim e−t dt = π/2.
0 N →∞ 0

∂f
23. Suppose f : [a, b] × [c, d] → R is continuous and is continuous. Suppose g : [a, b] → (c, d) is
Z g(x)   ∂x
x
differentiable. Let h(x) = f dy. Use the chain rule and Exercise 20 to show that
c y
Z g(x)    
′ ∂f x x
h (x) = dy + f g′ (x).
c ∂x y g(x)
  Z z  
x x
(Hint: Consider F = f dy.)
z c y
x Z
dy
24. Evaluate 2 2
. Use Exercise 23 to evaluate
Z x 0 x +y Z x
dy dy
*a. 2 2 2
b. 2 2 3
0 (x + y ) 0 (x + y )
Z x
25. Suppose f is continuous. Let h(x) = sin(x − y)f (y)dy. Show that h(0) = h′ (0) = 0 and
0
h′′ (x) + h(x) = f (x).

26. Suppose f is continuous. Prove that


Z x Z x1 Z x2 Z xn−1 Z x
1
··· f (xn )dxn · · · dx3 dx2 dx1 = (x − t)n−1 f (t)dt.
0 0 0 0 (n − 1)! 0

(Hint: Start by doing the cases n = 2 and n = 3.)


276 Chapter 7. Integration

3. Polar, Cylindrical, and Spherical Coordinates

In this section we introduce three extremely useful alternative coordinate systems in two and
three dimensions. We treat the question of changes of variables in multiple integrals intuitively
here, leaving the official proofs for Section 6.

x2+y2=2

x2+y2=1
S

Figure 3.1
Z  
x
Suppose one wished to calculate f dA, where S is the annular region between two
S y
concentric circles, as shown in Figure 3.1. As we quickly realize if we try to write down iterated
integrals in xy-coordinates, although it is not impossible to evaluate them, it is far from a pleasant
task. It would be much more sensible to work in a coordinate system that is built around the radial
symmetry. This is the place of polar coordinates.
Polar coordinates on the xy-plane are defined as follows: As shown in Figure 3.2, let r =
p x
x2 + y 2 denote the distance of the point from the origin, and let θ denote the angle from
y

x
r y
θ

Figure 3.2
the positive x-axis to the vector from the origin to the point. Ordinarily, we adopt the convention
that
r≥0 and 0 ≤ θ < 2π or − π ≤ θ < π.
It is better to express x and y in terms of r and θ, and we do this by means of the mapping
g : [0, ∞) × [0, 2π) → R2
§3. Polar, Cylindrical, and Spherical Coordinates 277
     
r r cos θ x
g = = .
θ r sin θ y
Z  
x
To evaluate a double integral f dA in polar coordinates, we first determine the region
S y
Ω in the rθ-plane that maps to S. We substitute x = r cos θ and y = r sin θ, and then realize that a
little rectangle ∆r by ∆θ in the rθ-plane maps to an “annular chunk” whose area is approximately

θ y
r∆θ
∆θ g ∆r

∆r r x

Figure 3.3

∆r(r∆θ) in the xy-plane (see Figure 3.3). That is, partitioning the region Ω into little rectangles
corresponds to “partitioning” S into such annular pieces. Summing over all the subrectangles of a
partition suggests a formula like
Z   Z  
x r cos θ
f dA = f rdrdθ.
S y Ω r sin θ

A rigorous justification will come in Section 6.


  
x 2 2
Example 1. Let S be the annular region : 1 ≤ x + y ≤ 2 pictured in Figure 3.1. We
y
Z p
wish to evaluate x2 + y 2 dA.
S
Z p Z Z √
2π 2
x2 + y 2 dA = r
|{z} rdrdθ
| {z }
S 0 1 √ dA
x2 +y 2
Z Z √
2π 2
= r 2 drdθ
0 1

2π(2 2 − 1)
= .
3
If you are not yet convinced, try doing this in Cartesian coordinates! ▽

Example 2. Let S ⊂ R2 be the region inside the circle x2 + y 2 = 9, below the Z line y = x,
above the x-axis, and lying to the right of x = 1, as shown in Figure 3.4. Evaluate xydA. We
S
begin by finding the region Ω in the rθ-plane that maps to S, as shown in Figure 3.5. Clearly θ
goes from 0 to π/4, and for each fixed θ, we see that r starts at r = sec θ (as we enter S at the line
x = 1) and increases to r = 3 (as we exit S at the circle). (We think naturally of determining r as
278 Chapter 7. Integration

y
x=1
y=x

S
x

x2+y2=9

Figure 3.4

y
x=1
y=x
θ
r=sec θ S
g
π/4
Ω x

3 r x2+y2=9

Figure 3.5

a function of θ, so that naturally we would place θ on the horizontal axis and r on the vertical; for
reasons we’ll see in Chapter 8, this is not a good idea.)
Therefore, we have
Z Z π/4 Z 3
xydA = (r| cos
{z θ})(r| sin
{z θ}) rdrdθ
| {z }
S 0 sec θ x y dA
Z π/4 Z 3
= r 3 cos θ sin θdrdθ
0 sec θ
Z π/4
1
= (81 − sec4 θ) cos θ sin θdθ
4 0
Z π/4 
1 sin θ 
= 81 cos θ sin θ − dθ
4 0 cos3 θ

1 2 1  π/4 79
= 81 sin θ − = . ▽
8 cos2 θ 0 16
Z ∞
2
Example 3. We wish to evaluate the improper integral e−x dx. This “Gaussian integral”
0
is ubiquitous in probability, statistics, and statistical mechanics. Although one way of doing so was
§3. Polar, Cylindrical, and Spherical Coordinates 279

given in Exercise 7.2.22, the approach we take here is more amenable to generalization.
Taking advantage of the property ea+b = ea eb , we exploit radial symmetry by calculating instead
the double integral
Z ∞ 2 Z ∞ Z ∞ Z
−x2 −x2 −y 2 2 2
e dx = e e dydx = e−(x +y ) dA.
0 0 0 [0,∞)×[0,∞)

Converting to polar coordinates, we have


Z Z π/2 Z ∞
2 2 2
e−(x +y ) dA = e−r rdrdθ
[0,∞)×[0,∞) 0 0
Z π/2 Z R
2
= lim e−r rdrdθ
R→∞ 0 0
π i
2 R π 2

− 21 e−r
= lim = lim 1 − e−R
R→∞ 2 0 R→∞ 4
π
= ,
4

π
and so our original integral is equal to .
2
Remark. We should probably stop to worry for a moment about convergence of these improper
integrals. First of all,
Z ∞ Z N
−x2 2
e dx = lim e−x dx
0 N →∞ 0
Z N
2 2
exists because, for example, when x ≥ 1, we have 0 < e−x ≤ e−x , and so the integrals e−x dx
Z ∞
0
−x −1
increase as N → ∞ and are all bounded above by 1 + e dx = 1 + e . Now it is easy to see,
1

N

N 2

Figure 3.6
2 2
as Figure 3.6 suggests, that the integral of e−(x +y ) over the square [0, N ] × [0, N ] lies between the

integral over the quarter-disk of radius N and the integral over the quarter-disk of radius N 2,
both of which approach π/4. ▽

In general, it is good to use polar coordinates when either the form of the integrand or the
shape of the region recommends it.
280 Chapter 7. Integration

Next we come to three dimensions. Cylindrical coordinates r, θ, z are merely polar coordinates
(used in the xy-plane) along with the cartesian coordinate z:

g : [0, ∞) × [0, 2π) × R → R3


   
r r cos θ
g θ  =  r sin θ  .
z z

The intuitive argument we gave earlier for polar coordinates suggests now that a little rectangle ∆r

z z
∆z
∆z
g
∆r ∆r
∆θ θ r∆θ
y

r x

Figure 3.7

by ∆θ by ∆z in rθz-space corresponds to a “chunk” with approximate volume ∆V ≈ ∆r(r∆θ)∆z,


as pictured in Figure 3.7. If g maps the region Ω in rθz-space to our region S ⊂ R3 , then we expect
      !
Z x Z r cos θ ZZ Z r cos θ
f y  dV = f  r sin θ  rdrdθdz = f  r sin θ  dz rdrdθ.
S z Ω z z

Indeed, as suggested by the last integral above, it is almost always preferable to set up an iterated
integral with dz innermost, and then the usual rdrdθ outside (integrating over the projection of Ω
onto the xy-plane).

Example 4. Revisiting Example 9 of Section 2, let S ⊂ R3 be the region in the firstZoctant


bounded below by the paraboloid z = x2 + y 2 and above by the plane z = 4. To evaluate xdV
S
by using cylindrical coordinates, we realize that S is the image under g of the region
  
 r 
Ω =  θ  : 0 ≤ r ≤ 2, 0 ≤ θ ≤ π/2, r 2 ≤ z ≤ 4 .
 
z

Thus, we have
Z Z π/2 Z 2 Z 4
xdV = |r cos
{z θ} rdzdrdθ
| {z }
S 0 0 r2 x dV
Z π/2 Z 2 Z 4
= r 2 cos θdzdrdθ
0 0 r2
Z π/2 Z 2
= r 2 cos θ(4 − r 2 )drdθ
0 0
§3. Polar, Cylindrical, and Spherical Coordinates 281

Z π/2
64 64
= cos θdθ = ,
0 15 15
which, reassuringly, is the same answer we obtained earlier. ▽

Example 5. Let S be the region bounded above by theZ paraboloid z = 6 − x2 − y 2 and below
p
by the cone z = x2 + y 2 , as pictured in Figure 3.8. Find zdV . The symmetry of S about the
S

z=6—r2

z=r

r=2

Figure 3.8

z-axis makes cylindrical coordinates a natural. The surfaces z = 6 − r 2 and z = r intersect when
r = 2, so we see that S is the image under g of the region
  
 r 
Ω=  
θ : 0 ≤ r ≤ 2, 0 ≤ θ ≤ 2π, r ≤ z ≤ 6 − r 2
.
 
z

Thus, we have
Z Z 2π Z 2 Z 6−r 2
zdV = z |rdzdrdθ
{z }
S 0 0 r
dV
Z 2π Z 2 
= 1
2 (6 − r 2 )2 − r 2 rdrdθ
0 0
Z 2
92
=π (36 − 13r 2 + r 4 )rdr = π. ▽
0 3
Last, we come to spherical coordinates: ρ represents the distance from the origin to the point,
φ the angle from the positive z-axis to the vector from the origin to the point, and θ the angle from
the positive x-axis to the projection of that vector into the xy-plane. That is, in some sense, φ
specifies the latitude of the point and θ specifies its longitude. (As shown in Figure 3.9, when ρ and
φ are held constant, we get a circle parallel to the xy-plane; when ρ and θ are held constant, we get
a great circle going from the north pole to the south pole.) Notice that we make the convention
that
ρ ≥ 0, 0 ≤ φ ≤ π, and 0 ≤ θ ≤ 2π.
282 Chapter 7. Integration

φ constant

φ
y
θ

x θ constant

Figure 3.9

z
ρ sin φ

φ ρ cos φ

θ
y
x ρ sin φ cos θ
ρ sin φ sin θ

Figure 3.10

As usual, we use basic trigonometry to express x, y, and z in terms of our new coordinates ρ,
φ, and θ (see also Figure 3.10):

g : [0, ∞) × [0, π] × [0, 2π) → R3


   
ρ ρ sin φ cos θ
g φ =  ρ sin φ sin θ  .
θ ρ cos φ

As suggested by Figure 3.11, a rectangle ∆ρ by ∆φ by ∆θ in ρφθ-space maps to a spherical chunk

θ z

∆θ
g ρ∆φ
∆ρ φ
∆φ φ ∆ρ
y
ρ sin ρ sinφ ∆θ
θ φ
ρ x

Figure 3.11
§3. Polar, Cylindrical, and Spherical Coordinates 283

of volume approximately

∆V ≈ (∆ρ)(ρ∆φ)(ρ sin φ∆θ) = ρ2 sin φ∆ρ∆φ∆θ.

So, if g maps the region Ω to S, we expect that


   
Z x Z ρ sin φ cos θ
f y  dV = f  ρ sin φ sin θ  ρ2 sin φdρdφdθ.
S z Ω ρ cos φ

Example 6. Let S ⊂ R3 be the “ice-cream cone” bounded above by the sphere x2 +y 2 +z 2 = a2


p
and below by the cone z = c x2 + y 2 , where c is a fixed positive constant, as depicted in Figure
3.12. It is easy to see that the region Ω in ρφθ-space mapping to S is given by

ρ=a

φ0 φ=φ0

Figure 3.12

  
 ρ 
Ω =  φ  : 0 ≤ ρ ≤ a, 0 ≤ φ ≤ φ0 , 0 ≤ θ ≤ 2π ,
 
θ

where φ0 = arctan(1/c).
The volume of S is calculated using spherical coordinates as follows:
Z Z 2π Z φ0 Z a
vol(S) = 1dV = ρ2 sin φdρdφdθ
S 0 0 0
 
2π 3 2π 3 c
= a (1 − cos φ0 ) = a 1− √ .
3 3 1 + c2
Z
We can calculate zdV as well:
S
Z Z 2π Z φ0 Z a
zdV = (ρ cos φ) ρ2 sin φdρdφdθ
S 0 0 0 | {z } | {z }
z dV
Z 2π Z φ0 Z a
= ρ3 sin φ cos φdρdφdθ
0 0 0
 
π 4 2 π 4 1
= a sin φ0 = a . ▽
4 4 1 + c2
284 Chapter 7. Integration
 
0 Z
Example 7. Let S be the sphere of radius a centered at  0 . We wish to evaluate z 2 dV .
a S
We observe first that, by Exercise 1.2.14, the triangle shown in Figure 3.13 is a right triangle, and
so the equation of the sphere is ρ = 2a cos φ, 0 ≤ φ ≤ π/2. So we have

ρ
φ

Figure 3.13

Z Z 2π Z π/2 Z 2a cos φ
z 2 dV = (ρ2 cos2 φ) ρ2 sin φdρdφdθ
S 0 0 0 | {z } | {z }
z2 dV
Z 2π Z π/2 Z 2a cos φ
= ρ4 cos2 φ sin φdρdφdθ
0 0 0
Z π/2
64 5 8 5
= πa cos7 φ sin φdφ = πa . ▽
5 0 5

EXERCISES 7.3

1. Sketch the curves:


a. r = 4 cos θ
b. r = 3 sec θ
c. r = 1 − sin θ
d. r = 1/(cos θ + sin θ)

2. Find the area of the region bounded on the left by x = 1 and on the right by x2 + y 2 = 4.
Check your answer with simple geometry.

3. Find the area of the cardioid r = 1 + cos θ, pictured in Figure 3.14.


§3. Polar, Cylindrical, and Spherical Coordinates 285

Figure 3.14
Z
1
4. For ε > 0, let Sε = {x : ε ≤ kxk ≤ 1} ⊂ Evaluate lim R2 . p dA. (This is often
Z ε→0+ Sε x2 + y2
expressed as the improper integral (x2 + y 2 )−1/2 dA.)
B(0,1)
Z
*5. Let S be the annular region shown in Figure 3.1. Evaluate y 2 dA
S
a. directly Z
b. by instead calculating (x2 + y 2 )dA.
S
Z
*6. Calculate y(x2 + y 2 )−5/2 dA , where S is the plane region lying above the x-axis, bounded
S
on the left by x = 1 and above by x2 + y 2 = 2.
Z
7. Calculate (x2 + y 2 )−3/2 dA , where S is the plane region bounded below by y = 1 and above
S
by x2 + y 2 = 4.
 
x y2
8. Let f =p . Let S be the planar region lying inside the circle x2 + y 2 = 2x, above
y x2 + y 2 Z
the x-axis, and to the right of x = 1. Evaluate f dA.
S
Z 1Z 1
xex
*9. Evaluate dxdy.
0 y x2 + y 2
10. Find the volume of the region bounded above by z = 2y and below by z = x2 + y 2 .

11. Find the volume of the “doughnut with no hole,” ρ = sin φ, pictured in Figure 3.15.

*12. Sketch and find the volume of the “pillow” ρ = sin θ, 0 ≤ θ ≤ π.

13. Find the volume of the region inside both x2 + y 2 = 1 and x2 + y 2 + z 2 = 2.

14. Find the volume of the region inside both x2 + y 2 + z 2 = 4a2 and x2 + y 2 = 2ay.

15. Find the volume of the region bounded above by x2 + y 2 + z 2 = 2 and below by z = x2 + y 2 .

16. Find the volume of the region inside the sphere x2 + y 2 + z 2 = a2 by integrating in
a. cylindrical coordinates
286 Chapter 7. Integration

Figure 3.15

b. spherical coordinates

*17. Find the volume of a right circular cone of base radius a and height h by integrating in
a. cylindrical coordinates
b. spherical coordinates
p
18. Find the volume of the region lying above the cone z = x2 + y 2 and inside the sphere
x2 + y 2 + z 2 = 2 by integrating in
a. cylindrical coordinates
b. spherical coordinates

19. Find the volume of the region lying above the plane z = a and inside the sphere
x2 + y 2 + z 2 = 4a2 by integrating in
a. cylindrical coordinates
b. spherical coordinates
Z
3
*20. Let S ⊂ R be the unit ball. Use symmetry principles to compute x2 dV as easily as possible.
S
Z
2 +y 2 +z 2 )
21. a. Evaluate e−(x dV .
Z
R3
2 +2y 2 +3z 2 )
b. Evaluate e−(x dV .
R3

*22. Find the volume of the region in R3 bounded above by the plane z = 3x + 4y and below by the
paraboloid z = x2 + y 2 .
Z
z
23. Evaluate 2 2 2 3/2
dV , where S is the region bounded below by the sphere x2 + y 2 +
S (x + y + z )
z 2 = 2z and above by the sphere x2 + y 2 + z 2 = 1.

24. Find the volume of the region in R3 bounded by the cylinders x2 + y 2 = 1, y 2 + z 2 = 1, and
x2 + z 2 = 1. (Hint: Make full use of symmetry.)

4. Physical Applications

So far we have focused on area and volume as our interpretation of the multiple integral. Now
we discuss average value and mass (which have both physical and probabilistic interpretations),
§4. Physical Applications 287

center of mass, moment of inertia, and gravitational attraction.


Recall from one-variable calculus the notion of the average value of an integrable function.
Given a real-valued function f on an interval [a, b], we may take the uniform partition Pk of the

interval into k equal subintervals, xi = a + i b−ak , i = 1, . . . , k, and calculate the average of the
values f (x1 ), . . . , f (xk ):
k
1X
y (k) = f (xi ).
k
i=1
Multiplying and dividing by b − a gives
k
1 X b−a
y (k) = f (xi ) .
b−a k
i=1

Now let’s suppose that f is bounded. Then, as usual, mi ≤ f (xi ) ≤ Mi for each i = 1, . . . , k, and
so
1 1
L(f, Pk ) ≤ y (k) ≤ U (f, Pk )
b−a b−a
for every uniform partition Pk of the interval [a, b]. Now assume that f is integrable. Then it
Z b
follows from Exercise 7.1.10 that L(f, Pk ) and U (f, Pk ) both approach f (x)dx as k → ∞, and
a
so
Z b
(k) 1
y → f (x)dx as k → ∞.
b−a a
This motivates the following

Definition. Let f be an integrable function on the interval [a, b]. We define the average value
of f on [a, b] to be
Z b
1
f= f (x)dx.
b−a a
In general, if Ω ⊂ Rn is a region and f : Ω → R is integrable, we define the average value of f on
Ω to be Z
1
f= f dV .
vol(Ω) Ω

 Example
 1. A round hotplate S is given by the disk r ≤ π/2. Its temperature is given by
x p
f = cos x2 + y 2 . We want to determine the average temperature of the plate. We calculate
y
Z
1
f= f dA
area(S) S

by proceeding in polar coordinates:


Z Z 2π Z π/2 iπ/2
π

f dA = (cos r)rdrdθ = 2π(r sin r + cos r) = 2π 2 −1 ,
S 0 0 0

and so 
2π π2 − 1 4(π − 2)
f= π 2 = ≈ 0.463. ▽
π( 2 ) π2
288 Chapter 7. Integration

It is useful to define the integral of a vector-valued function f : Ω → Rm component by com-


ponent (generalizing what we did in Lemma 5.3 of Chapter 3): assuming each of the component
functions f1 , . . . , fm is integrable on Ω, set
R 
Z f 1 dV
 Ω . 
f dV = 
R
.. .


Ω fm dV
Then we can define the average value of f on Ω in the obvious way:
Z
1
f= f dV .
vol(Ω) Ω
In particular, we define the centroid or center of mass of the region Ω to be
Z
1
x= xdV .
vol(Ω) Ω
Example 2. We want to find the centroid of the plane region Ω bounded below by y = 0,
above by y = x2 , and on the right by x = 1. Its area is given by
Z 1 Z x2
1
area(Ω) = dydx = .
0 0 3
Now, integrating the position vector x over Ω gives
1

0.8

0.6

0.4

0.2

0.2 0.4 0.6 0.8 1

Figure 4.1
Z Z 1 Z x2   Z 1   
x x3 1/4
xdA = dydx = 1 4 dx = ,
Ω 0 0 y 0 2x 1/10
 
3/4
so x = , which makes physical sense (see Figure 4.1). ▽
3/10
It is useful to observe that when the region Ω is symmetric about an axis, its centroid will lie on
that axis. (See Exercise 7.2.18.)
When a mass distribution Ω is non-uniform, it is important to understand the idea of density.
Much like instantaneous velocity (or slope of a curve), which is defined as a limit of average
velocities (or slopes of secant lines), we define the density δ(x) to be the limit as r → 0+ of the
average density (mass/volume) of a cube of sidelength r centered at x.2 Then it is quite plausible
2
More precisely, the average density of that portion of the cube lying in Ω.
§4. Physical Applications 289

that, with some reasonable assumptions on the behavior of “mass,” it should be recaptured by
integrating the density function.

Lemma 4.1.ZLet Ω ⊂ Rn be a region. Assume the density function δ : Ω → R is integrable.


Then mass(Ω) = δdV .

Proof. As usual, it suffices to assume Ω is a rectangle R. For any partition P = {Ri } of R, let
mi = inf x∈Ri δ(x) and Mi = supx∈Ri δ(x). Then mi vol(Ri ) ≤ mass(Ri ) ≤ Mi vol(Ri ). (Suppose,
for example, that
mass(Ri )
Mi < = δ∗ .
vol(Ri )
Then, in particular, for all x ∈ Ri , we have δ(x) < δ∗ , and so, by the definition of δ, for each x
there is a cube centered at x whose average density is less than δ∗ . By compactness, we can cover
Ri by finitely many such cubes and we see that the average density of Ri itself is less than δ∗ , which
is a contradiction.) It now follows that L(δ, P) ≤ mass(R) ≤ U (δ, Z
P) for any partition P of R, and
so, since we’ve assumed δ is integrable, we must have mass(R) = δdV . 
R

Remark. We should be a little bit careful here. The Fundamental Theorem of Calculus tells
Rx
us that we can recover f by differentiating its integral F (x) = a f (t)dt provided f is continuous.
If we start with an arbitrary integrable function f , e.g., the function in Exercise 7.1.13, this will
of course not work. A similar situation occurs if we start with an integrable δ, define the mass by
integrating, and then try to recapture δ by “differentiating” (taking the limit of average densities).
Since we are concerned here with physical applications, we will tacitly assume δ is continuous (see
Exercise 7.1.7). In more sophisticated treatments, we really would like to allow point masses and
“generalized functions,” called distributions; this will have to wait for a more advanced course.

Now, generalizing our earlier definition of center of mass, if Ω is a mass distribution with density
function δ, then we define the center of mass to be the weighted average
Z
1
x= δ(x)xdV .
mass(Ω) Ω
This is a natural generalization of the weighted average we see with a system of finitely many point
masses m1 , . . . , mN at positions x1 , . . . , xN , respectively, as shown in Figure 4.2. In this case, the
weighted average is
P
N
mi xi
i=1
x= N ,
P
mi
i=1
and it has the following physical interpretation. If external forces Fi act on the point masses mi ,
they impart accelerations x′′i according to Newton’s second law: Fi = mi x′′i . Consider the resultant
PN PN
force F = Fi acting on the total mass m = mi (any internal forces cancel ultimately by
i=1 i=1
Newton’s third law). Then
N
X N
X
F= Fi = mi x′′i = mx′′ .
i=1 i=1
290 Chapter 7. Integration

mN
m = m1+m2+m3+...+mN
xN
x
m1 x3 m3
x1

x2
m2

Figure 4.2

That is, as the forces act and time passes, the center of mass of the system translates exactly as if
we concentrated the total mass m at x and let the resultant force F act there.
Next, let’s consider now a rigid body3 consisting of point masses m1 , . . . , mN rotating about
an axis ℓ; a typical such mass is pictured in Figure 4.3. The kinetic energy of the system is

ω
ri
mi
xi

Figure 4.3

N
X N
1 1X
K.E. = mi kx′i k2 = mi (ri ω)2 ,
2 2
i=1 i=1
where ω is the angular speed with which the body is rotating about the axis and ri is the distance
from the axis of rotation to the point mass mi . (Remember that each mass is moving in a circle
whose center lies on ℓ.) Regrouping,
N
X
K.E. = 12 Iω 2 , where I= mi ri2 .
i=1
I is called the moment of inertia of the rigid body about ℓ.
In the case of a mass distribution Ω forming a rigid body, we define by analogy (partitioning it
and approximating it by a finite number of masses) its moment of inertia about an axis ℓ to be
Z
I= δr 2 dV ,

where r is the distance from ℓ.
3
A rigid body does not move relative to itself; imagine the masses connected to one another by inflexible rods.
§4. Physical Applications 291

Example 3. Let’s find the moment of inertia of a uniform solid ball Ω of radius a about an
axis through its center. We may as well place the ball with its center at the origin and let the axis
be the z-axis. Then, using spherical coordinates, we have (since δ is constant):
Z Z 2π Z π Z a
2
I= δr dV = δ (ρ sin φ)2 ρ2 sin φdρdφdθ
Ω 0 0 0 | {z } | {z }
r2 dV
Z π Z a
= 2πδ ρ4 sin3 φdρdφ
0 0
 
a5 4 4 3 2 2 2
= 2πδ · · = πa δ a = ma2 ,
5 3 3 5 5
where m = 34 πa3 δ is the total mass of Ω. ▽

Example 4. One of the classic applications of the moment of inertia is to decide which rolling

Figure 4.4

object wins the race down a ramp. Given a hula hoop, a wooden nickel, a hollow ball, a solid ball,
or, something more imaginative like a solid cone, as pictured in Figure 4.4, which one gets to the
bottom first?
We use the basic result from physics (see the remark on p. 337 and Example 6 of Chapter 8,
Section 3) that, if we ignore friction, total energy—potential plus kinetic—is conserved.4 We mea-
sure potential energy relative to ground level, so a mass m has potential energy mgh at (relatively
small) heights h. If the rolling radius is a, its angular speed is ω, and its linear speed is v, then we
have aω = v, so when the mass has descended a vertical height h, we have

original (potential) energy = final (kinetic) energy
$$mgh = \underbrace{\tfrac{1}{2}mv^2}_{\text{translational K.E.}} + \underbrace{\tfrac{1}{2}I\left(\tfrac{v}{a}\right)^2}_{\text{rotational K.E.}} = \tfrac{1}{2}m\left(1 + \frac{I}{ma^2}\right)v^2.$$

Thus, the object’s speed is greatest when the fraction I/ma2 is smallest. We calculated in Example
3 that this fraction is 2/5 for a solid ball. For a hula-hoop of radius a or for a hollow cylinder of
⁴Of course, for the objects to roll, there must be some friction.

radius a, it is obviously 1 (why?). So the solid ball beats the hula-hoop or hollow cylinder. What
about the other shapes? (See Exercises 16, 17, and 19.) And is there an optimal shape? ▽
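The energy balance above can be turned into a one-line formula for the final speed, $v = \sqrt{2gh/(1 + I/ma^2)}$, so the winner is simply the shape with the smallest $I/ma^2$. The following Python sketch tabulates a few cases (an aside; the values $1$ and $2/5$ come from the text, while $1/2$ for a uniform solid cylinder is the standard physics value, not derived here).

import math

def final_speed(k, h=1.0, g=9.8):
    """Speed at the bottom of a drop of height h when I/(m a^2) = k."""
    return math.sqrt(2 * g * h / (1 + k))

for name, k in [("hula hoop / hollow cylinder", 1.0),
                ("uniform solid cylinder", 0.5),
                ("uniform solid ball", 2 / 5)]:
    print(f"{name:28s}  I/ma^2 = {k:.2f}  v = {final_speed(k):.3f}")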

Newton’s law of gravitation applies to point masses: the force F exerted by a mass m at position
x on a test mass (which we take to have mass 1 unit) at the origin is given by
$$\mathbf{F} = Gm\,\frac{\mathbf{x}}{\|\mathbf{x}\|^3}.$$
Thus, the gravitational force exerted by a collection of masses mi , i = 1, . . . , N , at positions xi on
the test mass is given by
$$\mathbf{F} = \sum_{i=1}^N \mathbf{F}_i = G\sum_{i=1}^N m_i\,\frac{\mathbf{x}_i}{\|\mathbf{x}_i\|^3},$$
and, thus, the gravitational force exerted by a continuous mass distribution Ω with density function
$\delta$ is
$$\mathbf{F} = G\int_\Omega \delta\,\frac{\mathbf{x}}{\|\mathbf{x}\|^3}\,dV.$$

Example 5. Find the gravitational attraction on a unit mass at the origin of the uniform region $\Omega$ bounded above by the sphere $x^2+y^2+z^2 = 2a^2$ and below by the paraboloid $az = x^2+y^2$, pictured in Figure 4.5. (Take $\delta = 1$.)

Figure 4.5. The region bounded above by the sphere $x^2+y^2+z^2 = 2a^2$ and below by the paraboloid $az = x^2+y^2$; the two surfaces meet above the circle $r = a$.

Since Ω is symmetric about the z-axis, the net force will lie entirely in the z-direction, so we
calculate only the e3 -component of F. Working in cylindrical coordinates, we see that Ω lies over
the disk of radius a centered at the origin in the xy-plane, and so
\begin{align*}
F_3 &= G\int_\Omega \frac{z}{(x^2+y^2+z^2)^{3/2}}\,dV = G\int_0^{2\pi}\!\!\int_0^a\!\!\int_{r^2/a}^{\sqrt{2a^2-r^2}} \frac{z}{(r^2+z^2)^{3/2}}\,r\,dz\,dr\,d\theta\\
&= 2\pi G\int_0^a r\Bigl[-(r^2+z^2)^{-1/2}\Bigr]_{r^2/a}^{\sqrt{2a^2-r^2}}\,dr = 2\pi G\int_0^a \left(\frac{a}{\sqrt{a^2+r^2}} - \frac{r}{a\sqrt{2}}\right)dr = 2\pi Ga\left(\log\bigl(1+\sqrt{2}\bigr) - \frac{1}{2\sqrt{2}}\right).
\end{align*}

We leave it to the reader to set the problem up in spherical coordinates (see Exercise 24). ▽

Example 6. Newton wanted to understand the gravitational attraction of the earth, which he
took to be a uniform ball. Most of us are taught nowadays that the gravitational attraction of the
earth on a point mass outside the earth is that of a point mass M concentrated at the center of
the earth. But what happens if the point mass is inside the earth? We put the earth (a ball of

radius $R$) with its center at the origin and the point mass at $\begin{pmatrix}0\\0\\b\end{pmatrix}$, $b > 0$, as shown in Figure 4.6.

Figure 4.6. The geometry of Example 6: the ball of radius $R$, the point mass at height $b$ on the $z$-axis, a mass element at spherical coordinates $(\rho,\phi)$ at distance $d$ from the point mass, and the angle $\alpha$.
By symmetry, the net force will lie in the z-direction, so we compute only that component. If the
earth has (constant) density δ, we have
$$F_3 = -G\delta\int_\Omega \frac{\cos\alpha}{d^2}\,dV = -G\delta\int_\Omega \frac{b-\rho\cos\phi}{(b^2+\rho^2-2b\rho\cos\phi)^{3/2}}\,dV = -2\pi G\delta\int_0^R\!\!\int_0^\pi \frac{(b-\rho\cos\phi)\,\rho^2\sin\phi}{(b^2+\rho^2-2b\rho\cos\phi)^{3/2}}\,d\phi\,d\rho.$$

(Note that we are going to do the φ integral first!)


Fixing $\rho$, let $u = b^2+\rho^2-2b\rho\cos\phi$, $du = 2b\rho\sin\phi\,d\phi$, so
\begin{align*}
\int_0^\pi \frac{(b-\rho\cos\phi)\,\rho^2\sin\phi}{(b^2+\rho^2-2b\rho\cos\phi)^{3/2}}\,d\phi
&= \int_{(b-\rho)^2}^{(b+\rho)^2} \frac{b - \frac{b^2+\rho^2-u}{2b}}{u^{3/2}}\cdot\frac{\rho}{2b}\,du
= \frac{\rho}{4b^2}\int_{(b-\rho)^2}^{(b+\rho)^2} \bigl((b^2-\rho^2)u^{-3/2} + u^{-1/2}\bigr)\,du\\
&= \frac{\rho}{4b^2}\Bigl[(b^2-\rho^2)(-2u^{-1/2}) + 2u^{1/2}\Bigr]_{(b-\rho)^2}^{(b+\rho)^2}
= \frac{\rho}{2b^2}\Bigl[(b^2-\rho^2)\Bigl(\frac{1}{|b-\rho|} - \frac{1}{b+\rho}\Bigr) + \bigl((b+\rho) - |b-\rho|\bigr)\Bigr]\\
&= \begin{cases} 2\rho^2/b^2, & \rho \le b\\ 0, & \rho > b \end{cases}.
\end{align*}

Now we do the $\rho$ integral. In the event that $b \ge R$, we have
$$F_3 = -2\pi G\delta\int_0^R \frac{2\rho^2}{b^2}\,d\rho = -\frac{4\pi G\delta}{b^2}\cdot\frac{R^3}{3} = -\frac{GM}{b^2},$$
where $M = 4\pi\delta R^3/3$ is the total mass of the earth. On the other hand, if $b < R$, then, since the integrand vanishes whenever $\rho > b$, we have
$$F_3 = -2\pi G\delta\int_0^b \frac{2\rho^2}{b^2}\,d\rho = -\frac{4\pi G\delta}{3}\,b = -G\frac{M}{R^3}\,b,$$
which, interestingly, is linear in b. (When b = R, of course, the two answers agree.) Incidentally,
we will be able to rederive these results in a matter of seconds in Section 6 of Chapter 8. ▽
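Both formulas are easy to test numerically. The Python sketch below (an aside assuming NumPy; not part of the text) evaluates the $(\rho,\phi)$ double integral for $F_3$ by a midpoint Riemann sum and compares it with $-GM/b^2$ and $-GMb/R^3$. For $b < R$ the integrand has an integrable singularity at $\rho = b$, $\phi = 0$, so that case converges more slowly.

import numpy as np

G, delta, R = 1.0, 1.0, 1.0
M = 4 / 3 * np.pi * delta * R ** 3

def F3(b, n=2000):
    """Midpoint Riemann sum for the (rho, phi) integral in Example 6."""
    rho = (np.arange(n) + 0.5) * (R / n)
    phi = (np.arange(n) + 0.5) * (np.pi / n)
    Rh, Ph = np.meshgrid(rho, phi, indexing="ij")
    num = (b - Rh * np.cos(Ph)) * Rh ** 2 * np.sin(Ph)
    den = (b ** 2 + Rh ** 2 - 2 * b * Rh * np.cos(Ph)) ** 1.5
    return -2 * np.pi * G * delta * (num / den).sum() * (R / n) * (np.pi / n)

print(F3(2.0), -G * M / 2.0 ** 2)       # point mass outside the earth: both are about -1.047
print(F3(0.5), -G * M * 0.5 / R ** 3)   # point mass inside the earth:  both are about -2.094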

EXERCISES 7.4

*1. Find the average distance from the origin to the points in the ball B(0, a) ⊂ R2 .

2. Find the average distance from the origin to the points in the ball B(0, a) ⊂ R3 .

*3. Find the average distance from a point on the boundary of a ball of radius a in R2 to the points
inside the ball.

*4. Find the average distance from a point on the boundary of a ball of radius a in R3 to the points
inside the ball.

5. Find the average distance from one corner of a square of sidelength a to the points inside the
square.

6. Consider the region $\Omega$ lying inside the circle $x^2+y^2 = 2x$, above the $x$-axis, and to the right of $x = 1$, with density $\delta\begin{pmatrix}x\\y\end{pmatrix} = \dfrac{y}{x^2+y^2}$. Find the center of mass of $\Omega$.
*7. Consider the region $\Omega$ lying inside the circle $x^2+y^2 = 2x$ and outside the circle $x^2+y^2 = 1$. If its density function is given by $\delta\begin{pmatrix}x\\y\end{pmatrix} = (x^2+y^2)^{-1/2}$, find its center of mass.

8. Find the center of mass of a uniform semicircular plate of radius a in R2 .

9. Find the center of mass of a uniform solid hemisphere of radius a in R3 .

10. Find the center of mass of the uniform region in Exercise 7.3.19.

*11. Find the center of mass of the uniform tetrahedron bounded by the coordinate planes and the
plane x/a + y/b + z/c = 1.

*12. Find the mass of a solid cylinder of height h and base radius a if its density at x is equal to
the distance from x to the axis of the cylinder. Next find its moment of inertia about the axis.

13. Find the moment of inertia about the z-axis of a solid ball of radius a centered at the origin,
whose density is given by δ(x) = kxk.
14. Let $\Omega$ be the region bounded above by $x^2+y^2+z^2 = 4$ and below by $z = \sqrt{x^2+y^2}$. Calculate
the moment of inertia of Ω about the z-axis by integrating in both cylindrical and spherical
coordinates.

15. Find the moment of inertia about the $z$-axis of the region of constant density $\delta = 1$ bounded above by the sphere $x^2+y^2+z^2 = 4$ and below by the cone $z\sqrt{3} = \sqrt{x^2+y^2}$.

*16. Find the moment of inertia about the z-axis of each of the following uniform objects:
a. a hollow cylindrical can x2 + y 2 = a2 , 0 ≤ z ≤ h
b. the solid cylinder x2 + y 2 ≤ a2 , 0 ≤ z ≤ h
c. the solid cone of base radius a and height h symmetric about the z-axis
Express each of your answers in the form I = kma2 for the appropriate constant k.

17. a. Let 0 < b < a. Find the moment of inertia Ia,b about the z-axis of the uniform region
b2 ≤ x2 + y 2 + z 2 ≤ a2 .
b. Find $\displaystyle\lim_{b\to a}\frac{I_{a,b}}{a^3-b^3}$.

c. Use your answer to part b to show that the moment of inertia of a uniform hollow spherical shell $x^2+y^2+z^2 = a^2$ about the $z$-axis is $\frac{2}{3}ma^2$, where $m$ is its total mass.

18. Let Ω ⊂ Rn be a region. For what value of a ∈ Rn is the integral


$$\int_\Omega \|\mathbf{x}-\mathbf{a}\|^2\,dV$$
minimized? (Cf. Exercise 5.2.14.)

19. Let Ω be the uniform solid of revolution obtained by rotating the graph of y = |x|n , |x| ≤ a1/n ,
about the $x$-axis, as indicated in Figure 4.7. Show that $\dfrac{I}{ma^2} = \dfrac{2n+1}{2(4n+1)}$.

Figure 4.7

20. Let Ωε denote the uniform solid region described in spherical coordinates by 0 ≤ ρ ≤ a,
0 ≤ φ ≤ ε.
a. Find the center of mass of Ωε .
b. Find the limiting position of the center of mass as ε → 0+ . Explain your answer.

21. (Pappus’s Theorem) Suppose R ⊂ R2 is a plane region (say, that bounded by the graphs of
f and g on the interval [a, b]), and let Ω ⊂ R3 be obtained by revolving R about the x-axis.
Prove that the volume of $\Omega$ is equal to
$$\operatorname{vol}(\Omega) = 2\pi\bar{y}\cdot\operatorname{area}(R),$$
where $\bar{y}$ is the $y$-coordinate of the center of mass of $R$.

22. Let Ω denote a mass distribution. Denote by I the moment of inertia of Ω about a given axis ℓ,
and by I0 the moment of inertia about the axis ℓ0 parallel to ℓ and passing through the center
of mass of Ω. Then prove the parallel axis theorem:

I = I0 + mh2 ,

where m is the total mass of Ω and h is the distance between ℓ and ℓ0 .

23. Calculate the gravitational attraction of a solid ball of radius R on a unit mass on its boundary
if its density is equal to distance from the center of the ball.

24. Set up Example 5 in spherical coordinates and verify the calculations.

25. Prove or give a counterexample: The gravitational force on a test mass of a body with total
mass M is equal to that of a point mass M located at the center of mass of the body.

26. Show that Newton’s first result in Example 6 still works for a nonuniform earth, so long as
the density δ is radially symmetric (i.e., is a function of ρ only). What happens to the second
result?

27. Consider the solid region Ω bounded by (x2 + y 2 + z 2 )3/2 = kz (k > 0), with k chosen so that
the volume of Ω is equal to the volume of the unit ball.
a. Find k.
b. Taking δ = 1, find the gravitational attraction of Ω on a unit test mass at the origin.

Remark. Your answer to part b should be somewhat larger than 4πG/3, the gravitational
attraction of the unit ball (with δ = 1) on a unit mass on its boundary. In fact, Ω is the region
of appropriate mass that maximizes the gravitational attraction on a point mass at the origin.
Can you think of any explanation—physical, geometric, or otherwise?

28. A completely uniform forest is in the shape of a plane region Ω. The forest service will locate
a helipad somewhere in the forest and, in the event of fire, will dispatch helicopters to fight it.
If a fire is equally likely to start anywhere in the forest, where should the forest service locate
the helipad to minimize fire damage? (Let’s take the simplest model possible: Assume that fire
spreads radially at a constant rate, that the helicopters fly at a constant rate and take off as
soon as the fire starts. So what are we trying to minimize here?)

5. Determinants and n-dimensional Volume

We now want to complete the discussion of determinants initiated in Section 5 of Chapter 1. We


will see soon the relation between such “multilinear” functions and n-dimensional volume. Indeed,
determinants will play a central rôle in all our remaining work.
Theorem 5.1. For each $n \ge 1$, there is exactly one function $D\colon \underbrace{\mathbb{R}^n\times\cdots\times\mathbb{R}^n}_{n\text{ times}} \to \mathbb{R}$ having the following properties:

(1) If any pair of the vectors v1 , . . . , vn are exchanged, D changes sign. That is,
D(v1 , . . . , vi , . . . , vj , . . . , vn ) = −D(v1 , . . . , vj , . . . , vi , . . . , vn ) for any 1 ≤ i < j ≤ n.
(2) For all v1 , . . . , vn ∈ Rn and c ∈ R, we have
D(cv1 , v2 , . . . , vn ) = D(v1 , cv2 , . . . , vn ) = · · · = D(v1 , . . . , vn−1 , cvn ) = cD(v1 , . . . , vn ).
(3) For any vectors v1 , . . . , vn and vi′ , we have

D(v1 , . . . , vi−1 , vi + vi′ , vi+1 , . . . , vn ) =


D(v1 , . . . , vi−1 , vi , vi+1 , . . . , vn ) + D(v1 , . . . , vi−1 , vi′ , vi+1 , . . . , vn ).
(4) If {e1 , . . . , en } is the standard basis for Rn , then we have
D(e1 , . . . , en ) = 1.
Properties (2) and (3) indicate that D is linear as a function of each of its variables (whence
“multi linear”); property (1) indicates that D is “alternating.” Property (4) can be interpreted as
saying that the unit cube should have volume 1.

Definition. Given an n × n matrix A with column vectors a1 , . . . , an ∈ Rn , set


det A = D(a1 , . . . , an ).
This is called the determinant of A.

Since most of our work with matrices has centered on row operations, it would perhaps be more
convenient to define the determinant in terms of the rows of A. But it really is inconsequential for
two reasons: first, everything we proved using row operations (and, correspondingly, left multipli-
cation by elementary matrices) works verbatim for column operations (and, correspondingly, right
multiplication by elementary matrices); second, we will prove shortly that det AT = det A.
Properties (1)–(3) of D listed in Theorem 5.1 allow us to see the effect of elementary column
operations on the determinant of a matrix. Indeed, Property (1) corresponds to a column inter-
change; Property (2) corresponds to multiplying a column by a scalar; and Property (3) tells us—in
combination with Property (1)—that adding a multiple of one column to another does not change
the determinant.

Example 1. We can calculate the determinant of the matrix


 
$$A = \begin{pmatrix} 0&0&0&4\\ 0&2&0&0\\ 0&0&1&0\\ 3&0&0&0 \end{pmatrix}$$
as follows. First we factor out the 3 from the first column to get
$$\begin{vmatrix} 0&0&0&4\\ 0&2&0&0\\ 0&0&1&0\\ 3&0&0&0 \end{vmatrix} = 3\begin{vmatrix} 0&0&0&4\\ 0&2&0&0\\ 0&0&1&0\\ 1&0&0&0 \end{vmatrix}$$

by Property (2). Repeating this process with the 4 and the 2, we obtain

$$3\begin{vmatrix} 0&0&0&4\\ 0&2&0&0\\ 0&0&1&0\\ 1&0&0&0 \end{vmatrix} = 2\cdot 4\cdot 3\begin{vmatrix} 0&0&0&1\\ 0&1&0&0\\ 0&0&1&0\\ 1&0&0&0 \end{vmatrix}.$$
Now interchanging columns 1 and 4 introduces a factor of −1 by Property (1), and we have

$$\det A = 24\begin{vmatrix} 0&0&0&1\\ 0&1&0&0\\ 0&0&1&0\\ 1&0&0&0 \end{vmatrix} = -24\begin{vmatrix} 1&0&0&0\\ 0&1&0&0\\ 0&0&1&0\\ 0&0&0&1 \end{vmatrix} = -24,$$
since Property (4) tells us that det I4 = 1. ▽
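As a sanity check, any numerical linear algebra package returns the same value; here is a one-off Python sketch (an aside, assuming NumPy is available):

import numpy as np

A = np.array([[0, 0, 0, 4],
              [0, 2, 0, 0],
              [0, 0, 1, 0],
              [3, 0, 0, 0]], dtype=float)
print(np.linalg.det(A))   # -24.0, up to rounding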

To calculate the effect of the third type of column operation—adding a multiple of one column
to another—we need the following observation.

Lemma 5.2. If two columns of a matrix A are equal, then det A = 0.

Proof. If ai = aj , then the matrix is unchanged when we switch columns i and j. On the
other hand, by Property (1), its determinant changes sign when we do so. That is, we have
det A = − det A. This can happen only when det A = 0. 
Now we can easily prove the

Proposition 5.3. Let A be an n × n matrix and let A′ be the matrix obtained by adding a
multiple of one column of A to another. Then det A′ = det A.

Proof. Suppose A′ is obtained from A by replacing the ith column by its sum with c times the
j th column; i.e., a′i = ai + caj , with i 6= j. (As a notational convenience, we assume i < j, but that
really is inconsequential.) We wish to show that
det A′ = D(a1 , . . . , ai−1 , ai + caj , ai+1 , . . . , aj , . . . , an )
= D(a1 , . . . , ai−1 , ai , ai+1 , . . . , aj , . . . , an ) = det A.
By Property (3), we have
det A′ = D(a1 , . . . , ai−1 , ai + caj , ai+1 , . . . , aj , . . . , an )
= D(a1 , . . . , ai−1 , ai , ai+1 , . . . , aj , . . . , an ) + D(a1 , . . . , ai−1 , caj , ai+1 , . . . , aj , . . . , an )
= D(a1 , . . . , ai−1 , ai , ai+1 , . . . , aj , . . . , an ) + cD(a1 , . . . , ai−1 , aj , ai+1 , . . . , aj , . . . , an )
= D(a1 , . . . , ai−1 , ai , ai+1 , . . . , aj , . . . , an ),
since D(a1 , . . . , ai−1 , aj , ai+1 , . . . , aj , . . . , an ) = 0 by the preceding Lemma. 

Example 2. We now use column operations to calculate the determinant of the matrix
 
$$A = \begin{pmatrix} 2&2&1\\ 4&1&0\\ 6&0&1 \end{pmatrix}.$$

First we exchange columns 1 and 3, and then we proceed to (column) echelon form:

$$\det A = \begin{vmatrix} 2&2&1\\ 4&1&0\\ 6&0&1 \end{vmatrix} = -\begin{vmatrix} 1&2&2\\ 0&1&4\\ 1&0&6 \end{vmatrix} = -\begin{vmatrix} 1&0&0\\ 0&1&4\\ 1&-2&4 \end{vmatrix} = -\begin{vmatrix} 1&0&0\\ 0&1&0\\ 1&-2&12 \end{vmatrix}.$$
But
$$\begin{vmatrix} 1&0&0\\ 0&1&0\\ 1&-2&12 \end{vmatrix} = 12\begin{vmatrix} 1&0&0\\ 0&1&0\\ 1&-2&1 \end{vmatrix},$$
and now we can use the pivots to column reduce to the identity matrix without changing the determinant. Thus,
$$\det A = -12\begin{vmatrix} 1&0&0\\ 0&1&0\\ 1&-2&1 \end{vmatrix} = -12\begin{vmatrix} 1&0&0\\ 0&1&0\\ 0&0&1 \end{vmatrix} = -12. \;\triangledown$$
This is altogether too brain-twisting. We will now go back to the theory and soon show that it’s
perfectly all right to use row operations. First, let’s summarize what we’ve established so far: we
have

Proposition 5.4. Let A be an n × n matrix.


(1) Let A′ be obtained from A by exchanging two columns. Then det A′ = − det A.
(2) Let A′ be obtained from A by multiplying some column by the number c. Then det A′ =
c det A.
(3) Let A′ be obtained from A by adding a multiple of one column to another. Then det A′ =
det A.

Generalizing our discovery in Example 5 of Section 2 of Chapter 4, we have the following


characterization of nonsingular matrices that will be critical both here and in Chapter 9.

Theorem 5.5. Let A be a square matrix. Then A is nonsingular if and only if det A ≠ 0.

Proof. Suppose A is nonsingular. Then its reduced (column) echelon form is the identity
matrix. Turning this upside down, we can start with the identity matrix and perform a sequence
of column operations to obtain A. If we keep track of their effects on the determinant, we see
that we’ve started with det I = 1 and multiplied it by a nonzero number to obtain det A. That is,
det A ≠ 0. Conversely, suppose A is singular. Then its (column) echelon form U has a column of
zeroes and therefore (see Exercise 2) det U = 0. It follows as in the previous case that det A = 0. 

Reinterpreting Proposition 5.4, we have

Corollary 5.6. Let E be an elementary matrix and let A be an arbitrary square matrix. Then

det(AE) = det E det A.

Proof. Left to the reader in Exercise 3. 

Of especial interest is the “product rule” for determinants.



Proposition 5.7. Let A and B be n × n matrices. Then

det(AB) = det A det B.

Proof. Suppose B is singular, so that there is some nontrivial linear relation among its column
vectors:
c1 b1 + · · · + cn bn = 0.
Then, multiplying by A on the left, we find that

c1 (Ab1 ) + · · · + cn (Abn ) = 0,

from which we conclude that there is (the same) nontrivial linear relation among the column
vectors of AB, and so AB is singular as well. We infer from Theorem 5.5 that both det B = 0 and
det AB = 0, and so the result holds in this case.
Now, if B is nonsingular, we know that we can write B as a product of elementary matrices,
viz., B = E1 E2 · · · Em . We now apply Corollary 5.6 twice: first, we have

det B = det(IE1 E2 · · · Em ) = det E1 det E2 · · · det Em det I = det E1 det E2 · · · det Em ;

but then we have

det AB = det(AE1 E2 · · · Em ) = det E1 det E2 · · · det Em det A


= det A(det E1 det E2 · · · det Em ) = det A det B,

as claimed. 

A consequence of this Proposition is that det(AB) = det(BA), even though matrix multiplication
is not commutative. Thus, we have

Corollary 5.8. Let E be an elementary matrix and let A be an arbitrary square matrix. Then

det(EA) = det E det A.

Now we infer that, analogous to Proposition 5.4, we have

Proposition 5.9. Let A be an n × n matrix.


(1) Let A′ be obtained from A by exchanging two rows. Then det A′ = − det A.
(2) Let A′ be obtained from A by multiplying some row by the number c. Then det A′ = c det A.
(3) Let A′ be obtained from A by adding a multiple of one row to another. Then det A′ = det A.

Another useful observation is the following


Corollary 5.10. If $A$ is nonsingular, then $\det(A^{-1}) = \dfrac{1}{\det A}$.
Proof. From the equation AA−1 = I and Proposition 5.7 we deduce that det A det A−1 = 1,
so det A−1 = 1/ det A. 

Since we’ve seen that row and column operations have the same effect on determinant, it should
not come as a surprise that a matrix and its transpose have the same determinant.

Proposition 5.11. Let A be a square matrix. Then


det AT = det A.

Proof. Suppose A is singular. Then so is AT (why?). Thus, det AT = 0 = det A, and so the
result holds in this case. Suppose now that A is nonsingular. As in the preceding proof, we write
$A = E_1E_2\cdots E_m$. Now we have $A^T = (E_1E_2\cdots E_m)^T = E_m^T\cdots E_2^TE_1^T$, and so, using the product rule and Exercise 4, we obtain
$$\det A^T = \det(E_m^T)\cdots\det(E_2^T)\det(E_1^T) = \det E_1\det E_2\cdots\det E_m = \det A. \;\square$$
The following result can be useful:

Proposition 5.12. If A is an upper (lower) triangular n×n matrix, then det A = a11 a22 · · · ann .

Proof. If aii = 0 for some i, then A is singular (why?) and so det A = 0, and the desired
equality holds in this case. Now assume all the aii are nonzero. Let Ai be the ith row vector of A,
as usual, and write Ai = aii Bi , where the ith entry of Bi is 1. Then, using Property (2) repeatedly,
we have det A = a11 · · · ann det B. Now B is an upper triangular matrix with 1’s on the diagonal,
so we can use the pivots to clear out the upper (lower) entries without changing the determinant,
and thus det B = det I = 1. So det A = a11 a22 · · · ann , as promised. 

Remark. As we shall prove in Theorem 1.1 of Chapter 9, any two matrices A and A′ repre-
senting a linear map T are related by the equation A′ = P −1 AP for some invertible matrix P . As
a consequence of Proposition 5.7, we have
det A′ = det(P −1 AP ) = det(AP P −1 ) = det A det(P P −1 ) = det A,
and so it makes sense to define det T = det A for any matrix representative of T .

We now come to the geometric meaning of det T : it gives the factor by which signed volume is
distorted under the mapping by T . (See Exercise 24 for another approach.)

Proposition 5.13. Let T : Rn → Rn be a linear map, and let R be a parallelepiped. Then


vol(T (R)) = | det T |vol(R). Indeed, if Ω ⊂ Rn is a general region, then vol(T (Ω)) = | det T |vol(Ω).

Proof. When T has rank < n, det T = 0 and the image of T lies in a subspace of dimension
< n; hence, by Exercise 7.1.12, T (R) has volume zero. When T has rank n, we can write [T ] as
a product of elementary matrices. Because of Proposition 5.7, it now suffices to prove the result
when [T ] is an elementary matrix itself.
Recall that there are three kinds of elementary matrices (see p. 140). When R is a rectangle,
it is clear that the first type does not change volume, and the second multiplies the volume by |c|;
the third (a shear) does not change the volume, for the following reason. The transformation is the
identity in all directions other than the xi xj -plane, and we’ve already checked that in 2 dimensions
the determinant gives the signed area. (See also Exercise 24.)
Suppose Ω is a region. Then we can take a rectangle R containing Ω and consider the function

$$\chi\colon R\to\mathbb{R}, \qquad \chi(\mathbf{x}) = \begin{cases} 1, & \mathbf{x}\in\Omega\\ 0, & \text{otherwise} \end{cases}.$$

Since by our definition of region, χ is integrable, given ε > 0, we can find a partition P of R so that
U (χ, P) − L(χ, P) < ε. That is, the sum of the volumes of those subrectangles of P that intersect
the frontier of Ω is less than ε. In particular, this means Ω contains a union, S1 , of subrectangles of
P and is contained in a union, S2 , of subrectangles of P, as shown in Figure 5.1, with the property
that $\operatorname{vol}(S_2) - \operatorname{vol}(S_1) < \varepsilon$.

Figure 5.1. The region $\Omega$ contains a union $S_1$ of subrectangles of $\mathcal{P}$ and is contained in a union $S_2$ of subrectangles of $\mathcal{P}$.

And, likewise, $T(\Omega)$ contains a union, $T(S_1)$, of parallelepipeds and is contained in a union, $T(S_2)$, of parallelepipeds, with $\operatorname{vol}(T(S_i)) = |c|\operatorname{vol}(S_i)$ or $\operatorname{vol}(T(S_i)) = \operatorname{vol}(S_i)$,
depending on the nature of the elementary matrix. In either event, we see that
| det T |vol(S1 ) ≤ vol(T (Ω)) ≤ | det T |vol(S2 )
and since ε > 0 was arbitrary, we are done. (Note that, by Exercise 7.1.11 and Corollary 1.10,
T (Ω) has a well-defined volume.) 

5.1. Formulas for the determinant. In Chapter 1 we had explicit formulas for the determi-
nant of 2 × 2 and 3 × 3 matrices. It is sometimes more useful to have a recursive way of calculating
the determinant. Given an n × n matrix A with n ≥ 2, denote by Aij the (n − 1) × (n − 1) matrix
obtained by deleting the ith row and the j th column from A. Define the ij th cofactor of the matrix
to be
cij = (−1)i+j det Aij .
Note that we include the coefficient of ±1 according to the “checkerboard” pattern as indicated
below:
$$\begin{pmatrix} +&-&+&\cdots\\ -&+&-&\cdots\\ +&-&+&\cdots\\ \vdots&\vdots&\vdots&\ddots \end{pmatrix}$$
Then we have the following formula, called the expansion in cofactors along the ith row :
Proposition 5.14. Let A be an n × n matrix. Then for any fixed i, we have
$$\det A = \sum_{j=1}^n a_{ij}c_{ij}.$$

Using rows here allows us to check that the expression on the right-hand side of this equation
satisfies the properties of a determinant as set forth in Theorem 5.1. However, using the fact that
det AT = det A, we can transpose this result to obtain the expansion in cofactors along the j th
column:

Proposition 5.15. Let A be an n × n matrix. Then for any fixed j, we have


$$\det A = \sum_{i=1}^n a_{ij}c_{ij}.$$

Note that when we define the determinant of a 1 × 1 matrix by the obvious rule,

det [a] = a,

Proposition 5.15 yields the familiar formula for the determinant of a 2 × 2 matrix, and, again, that
of a 3 × 3 matrix.

Example 3. Let
 
$$A = \begin{pmatrix} 2&1&3\\ 1&-2&3\\ 0&2&1 \end{pmatrix}.$$
We calculate $\det A$ by expanding in cofactors along the second row:
$$\det A = (-1)^{2+1}(1)\begin{vmatrix} 1&3\\ 2&1 \end{vmatrix} + (-1)^{2+2}(-2)\begin{vmatrix} 2&3\\ 0&1 \end{vmatrix} + (-1)^{2+3}(3)\begin{vmatrix} 2&1\\ 0&2 \end{vmatrix} = -(1)(-5) + (-2)(2) - (3)(4) = -11.$$
Of course, because of the 0 entry in the third row, we'd have been smarter to expand in cofactors along the third row, obtaining
$$\det A = (-1)^{3+1}(0)\begin{vmatrix} 1&3\\ -2&3 \end{vmatrix} + (-1)^{3+2}(2)\begin{vmatrix} 2&3\\ 1&3 \end{vmatrix} + (-1)^{3+3}(1)\begin{vmatrix} 2&1\\ 1&-2 \end{vmatrix} = -2(3) + 1(-5) = -11. \;\triangledown$$
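The recursive structure of the cofactor expansion is easy to render in code. Here is a minimal Python sketch (an aside; it is fine for small matrices but, as the remark at the end of this subsection explains, hopeless for large ones):

def det(A):
    """Determinant of a square matrix (a list of rows) by cofactor expansion
    along the first row."""
    n = len(A)
    if n == 1:
        return A[0][0]
    total = 0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in A[1:]]   # delete the first row and the (j+1)st column
        total += (-1) ** j * A[0][j] * det(minor)
    return total

print(det([[2, 1, 3], [1, -2, 3], [0, 2, 1]]))   # -11, as in Example 3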

Sketch of proof of Proposition 5.15. As we mentioned earlier, we must check that the ex-
pression on the right-hand side has the requisite properties. When we form a new matrix A′
by switching two adjacent columns (say columns k and k + 1) of A, then whenever j 6= k and
j 6= k + 1, we have a′ij = aij and c′ij = −cij ; on the other hand, when j = k, we have a′ik = ai,k+1
and c′ik = −ci,k+1 ; when j = k + 1, we have a′i,k+1 = aik and c′i,k+1 = −cik , so
$$\sum_{j=1}^n a'_{ij}c'_{ij} = -\sum_{j=1}^n a_{ij}c_{ij},$$

as required. We can exchange an arbitrary pair of columns by exchanging an odd number of adjacent
pairs in succession (see Exercise 16), so the general result follows.
The remaining properties are easier to check. If we multiply the kth column by c, then for
j 6= k, we have a′ij = aij and c′ij = ccij , whereas for j = k, we have c′ik = cik and a′ik = caik . Thus,
$$\sum_{j=1}^n a'_{ij}c'_{ij} = c\sum_{j=1}^n a_{ij}c_{ij},$$

as required. Suppose now that we replace the kth column by the sum of two column vectors, viz.,
a′k = ak + a′′k . Then for j 6= k, we have c′ij = cij + c′′ij and a′ij = aij = a′′ij . When j = k, we likewise

have c′ik = cik = c′′ik , but a′ik = aik + a′′ik . So


$$\sum_{j=1}^n a'_{ij}c'_{ij} = \sum_{j=1}^n a_{ij}c_{ij} + \sum_{j=1}^n a''_{ij}c''_{ij},$$

as required. 

Proof of Theorem 5.1. Proposition 5.15 establishes that a function D exists satisfying the
properties listed in the statement of the Theorem. On the other hand, as we saw, calculating
determinants just using the properties, there can only be one such function, because, by reducing
the matrix to echelon form by column or row operations, we are able to compute the determinant.
(See also Exercise 21.) 

Remark. It is worth remarking that expansion in cofactors is an important theoretical tool, but
a computational nightmare. Even using calculators and computers, to compute an n×n determinant
by expanding in cofactors requires more than n! multiplications5 (and lots of additions). On the
other hand, to compute an n × n determinant by row reducing the matrix to upper triangular form
requires slightly fewer than $\frac{1}{3}n^3$ multiplications (and additions). Now, Stirling's formula tells us
that n! grows faster than (n/e)n , which gets large much faster than does n3 . Indeed, consider the
following table displaying the number of operations required:

n cofactors row or column operations


2 2 2
3 9 8
4 40 20
5 205 40
6 1236 70
7 8659 112
8 69280 168
9 623529 240
10 6235300 330

Thus, we see that once n > 4, it is sheer folly to calculate a determinant by the cofactor method
(unless almost all the entries of the matrix happen to be 0).
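The two columns of the table can be reproduced with a few lines of Python (an illustrative sketch; the precise counting conventions—$M(1) = 0$, $M(n) = n\,(M(n-1)+1)$ for cofactors and $n(n-1)(n+1)/3$ for elimination—are assumptions inferred from the table, but the moral, factorial versus cubic growth, does not depend on them):

def cofactor_mults(n):
    """Multiplications used by cofactor expansion of an n x n determinant."""
    return 0 if n == 1 else n * (cofactor_mults(n - 1) + 1)

def elimination_mults(n):
    """Multiplications used by reduction to triangular form."""
    return n * (n - 1) * (n + 1) // 3

for n in range(2, 11):
    print(n, cofactor_mults(n), elimination_mults(n))
# 2 2 2, 3 9 8, 4 40 20, ..., 10 6235300 330 -- the table above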

We conclude this section with a few classic formulas. The first is particularly useful for solving
2 × 2 systems of equations and may be useful even for larger n if you are interested only in a certain
component xi of the solution vector.

Proposition 5.16 (Cramer’s Rule). Let A be a nonsingular n × n matrix, and let b ∈ Rn .


Then the ith coordinate of the vector x solving Ax = b is
$$x_i = \frac{\det B_i}{\det A},$$
where Bi is the matrix obtained by replacing the ith column of A by the vector b.
⁵In fact, as n gets larger, the number of multiplications is essentially (e − 1)n!.

Proof. This is amazingly simple. We calculate the determinant of the matrix obtained by
replacing the ith column of A by b = Ax = x1 a1 + · · · + xn an :

$$\det B_i = \det\begin{pmatrix} | & | & & | & & |\\ \mathbf{a}_1 & \mathbf{a}_2 & \cdots & x_1\mathbf{a}_1+\cdots+x_n\mathbf{a}_n & \cdots & \mathbf{a}_n\\ | & | & & | & & | \end{pmatrix} = \det\begin{pmatrix} | & | & & | & & |\\ \mathbf{a}_1 & \mathbf{a}_2 & \cdots & x_i\mathbf{a}_i & \cdots & \mathbf{a}_n\\ | & | & & | & & | \end{pmatrix} = x_i\det A,$$
since the multiples of columns other than the ith do not contribute to the determinant. 

Example 4. We wish to solve


" #" # " #
2 3 x1 3
= .
4 7 x2 −1

We have " # " #


3 3 2 3
B1 = and B2 = ,
−1 7 4 −1
so det B1 = 24, det B2 = −14, and det A = 2. Therefore, x1 = 12 and x2 = −7. ▽
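Cramer's Rule translates directly into code. Here is a short Python sketch of Example 4 (an aside, assuming NumPy):

import numpy as np

A = np.array([[2.0, 3.0], [4.0, 7.0]])
b = np.array([3.0, -1.0])

x = []
for i in range(2):
    Bi = A.copy()
    Bi[:, i] = b                                   # replace the ith column of A by b
    x.append(np.linalg.det(Bi) / np.linalg.det(A))
print(x)                                           # [12.0, -7.0], up to rounding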

We now deduce from Cramer’s rule an “explicit” formula for the inverse of a nonsingular matrix.
Students seem always to want an alternative to Gaussian elimination, but what follows is practical
only for the 2 × 2 case (where it gives us our familiar formula from Example 5 on p. 146) and—
barely—for the 3 × 3 case.

Proposition 5.17. Let A be a nonsingular matrix, and let C = [cij ] be the matrix of its
cofactors. Then
$$A^{-1} = \frac{1}{\det A}\,C^T.$$
Proof. We recall from p. 145 that the j th column vector of A−1 is the solution of Ax = ej ,
where ej is the j th standard basis vector for Rn . Now, Cramer’s rule tells us that the ith coordinate
of the $j$th column of $A^{-1}$ is
$$(A^{-1})_{ij} = \frac{1}{\det A}\det B,$$
where $B$ is the matrix obtained by replacing the $i$th column of $A$ by $\mathbf{e}_j$. Now, we calculate $\det B$ by expanding in cofactors along the $i$th column. Since the only nonzero entry of that column is the $j$th, and since all its remaining columns are those of the original matrix $A$, we find that
$$\det B = (-1)^{i+j}\det A_{ji} = c_{ji},$$
and this proves the result. $\square$

For 3 × 3 matrices, this formula isn’t bad when det A would cause troublesome arithmetic doing
Gaussian elimination.

Example 5. Consider the matrix


 
$$A = \begin{pmatrix} 1&2&1\\ -1&1&2\\ 2&0&3 \end{pmatrix};$$
then
$$\det A = (1)\begin{vmatrix} 1&2\\ 0&3 \end{vmatrix} - (2)\begin{vmatrix} -1&2\\ 2&3 \end{vmatrix} + (1)\begin{vmatrix} -1&1\\ 2&0 \end{vmatrix} = 15,$$
and so we suspect the fractions would not be fun if we implemented Gaussian elimination. Undaunted, we calculate the cofactor matrix:
$$C = \begin{pmatrix} 3&7&-2\\ -6&1&4\\ 3&-3&3 \end{pmatrix},$$
and so
$$A^{-1} = \frac{1}{\det A}\,C^T = \frac{1}{15}\begin{pmatrix} 3&-6&3\\ 7&1&-3\\ -2&4&3 \end{pmatrix}. \;\triangledown$$
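For the skeptical reader, here is a Python sketch (an aside, assuming NumPy) that builds the cofactor matrix entry by entry and checks the formula $A^{-1} = \frac{1}{\det A}C^T$ on the matrix of Example 5:

import numpy as np

A = np.array([[1.0, 2.0, 1.0], [-1.0, 1.0, 2.0], [2.0, 0.0, 3.0]])
n = A.shape[0]
C = np.zeros_like(A)
for i in range(n):
    for j in range(n):
        minor = np.delete(np.delete(A, i, axis=0), j, axis=1)   # delete row i and column j
        C[i, j] = (-1) ** (i + j) * np.linalg.det(minor)

A_inv = C.T / np.linalg.det(A)
print(np.round(C))                         # [[ 3.  7. -2.] [-6.  1.  4.] [ 3. -3.  3.]]
print(np.allclose(A @ A_inv, np.eye(n)))   # True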
In general, the determinant of an n × n matrix can be written as the sum of n! terms, each (±)
the product of n entries of the matrix, one from each row and column. This can be deduced either
from the recursive formula of Proposition 5.15 or directly from the properties of Theorem 5.1.

Definition . A permutation σ of the numbers 1, . . . , n is a one-to-one function σ mapping


{1, . . . , n} to itself. The sign of the permutation σ, denoted sign(σ), is +1 if an even number of
exchanges is required to change the ordered set {1, . . . , n} to the ordered set {σ(1), . . . , σ(n)} and
−1 if an odd number of exchanges is required.

Remark. It is a consequence of the well-definedness of the determinant, which we’ve already


proved, that the sign of a permutation is itself well-defined. One can then define sign(σ) to be the
determinant of the permutation matrix whose ij-entry is 1 when j = σ(i) and 0 otherwise.

Proposition 5.18. Let A be an n × n matrix. Then


$$\det A = \sum_{\text{permutations }\sigma\text{ of }1,\dots,n} \operatorname{sign}(\sigma)\,a_{\sigma(1)1}a_{\sigma(2)2}\cdots a_{\sigma(n)n} = \sum_{\text{permutations }\sigma\text{ of }1,\dots,n} \operatorname{sign}(\sigma)\,a_{1\sigma(1)}a_{2\sigma(2)}\cdots a_{n\sigma(n)}.$$

Proof. The $j$th column of $A$ is the vector $\mathbf{a}_j = \sum_{i=1}^n a_{ij}\mathbf{e}_i$, and so, by Properties (2) and (3), we have
$$\det A = D(\mathbf{a}_1,\dots,\mathbf{a}_n) = D\!\left(\sum_{i_1=1}^n a_{i_11}\mathbf{e}_{i_1}, \sum_{i_2=1}^n a_{i_22}\mathbf{e}_{i_2},\dots, \sum_{i_n=1}^n a_{i_nn}\mathbf{e}_{i_n}\right) = \sum_{i_1,\dots,i_n=1}^n a_{i_11}a_{i_22}\cdots a_{i_nn}\,D(\mathbf{e}_{i_1},\mathbf{e}_{i_2},\dots,\mathbf{e}_{i_n}),$$

which, by Property (1),


$$= \sum_{\text{permutations }\sigma\text{ of }1,\dots,n} \operatorname{sign}(\sigma)\,a_{\sigma(1)1}a_{\sigma(2)2}\cdots a_{\sigma(n)n}\,D(\mathbf{e}_1,\dots,\mathbf{e}_n) = \sum_{\text{permutations }\sigma\text{ of }1,\dots,n} \operatorname{sign}(\sigma)\,a_{\sigma(1)1}a_{\sigma(2)2}\cdots a_{\sigma(n)n},$$

by Property (4). To obtain the second equality, we apply Proposition 5.11. 

Recall that GL(n) denotes the set of invertible n × n matrices (which, by Exercise 6.1.6, is an
open subset of Mn×n ).

Corollary 5.19. The function f : GL(n) → GL(n), f (A) = A−1 , is smooth.

Proof. Proposition 5.18 shows that the determinant of an n × n matrix is a polynomial expres-
sion (of degree n) in its n2 entries. Thus, we infer from Proposition 5.17 that each entry of A−1 is
a rational function (quotient of polynomials) of the entries of A. 
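The permutation formula is also the easiest one to implement literally. Here is a Python sketch (an aside; it computes the sign of a permutation by counting inversions, a standard equivalent of the exchange count in the definition above):

from itertools import permutations
from math import prod

def sign(sigma):
    """+1 or -1 according to the parity of the number of inversions of sigma."""
    inversions = sum(1 for i in range(len(sigma))
                       for j in range(i + 1, len(sigma)) if sigma[i] > sigma[j])
    return -1 if inversions % 2 else 1

def det(A):
    """Sum over all permutations sigma of sign(sigma) * a_{1 sigma(1)} ... a_{n sigma(n)}."""
    n = len(A)
    return sum(sign(sigma) * prod(A[i][sigma[i]] for i in range(n))
               for sigma in permutations(range(n)))

print(det([[2, 1, 3], [1, -2, 3], [0, 2, 1]]))   # -11 again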

EXERCISES 7.5

1. Calculate the following determinants:
a. $\begin{vmatrix} -1&6&-2\\ 3&4&5\\ 5&2&1 \end{vmatrix}$
*b. $\begin{vmatrix} 1&0&2&0\\ -1&2&-2&0\\ 0&1&2&6\\ 1&1&3&2 \end{vmatrix}$
c. $\begin{vmatrix} 1&4&1&-3\\ 2&10&0&1\\ 0&0&2&2\\ 0&0&-2&1 \end{vmatrix}$
*d. $\begin{vmatrix} 2&-1&0&0&0\\ -1&2&-1&0&0\\ 0&-1&2&-1&0\\ 0&0&-1&2&-1\\ 0&0&0&-1&2 \end{vmatrix}$
2. Suppose one column of the matrix A consists only of 0 entries, i.e., ai = 0 for some i. Prove
that det A = 0.

3. Prove Corollary 5.6.


♯ 4. Prove (without using Proposition 5.11) that for any elementary matrix E, we have det E T =
det E. (Hint: Consider each of the three types of elementary matrices.)

5. Let A be an n × n matrix and let c be a scalar. Prove det(cA) = cn det A.

6. Prove that if the entries of a matrix A are integers, then det A is an integer. Hint: Use
Proposition 5.14 and induction or Proposition 5.18.

7. Given that 1898, 3471, 7215, and 8164 are all divisible by 13, use only the properties of deter-
minants and the result of Exercise 6 to prove that

$$\begin{vmatrix} 1&8&9&8\\ 3&4&7&1\\ 7&2&1&5\\ 8&1&6&4 \end{vmatrix}$$
is divisible by 13.
     
8. Let $A = \begin{pmatrix} a_1\\ a_2 \end{pmatrix}$, $B = \begin{pmatrix} b_1\\ b_2 \end{pmatrix}$, and $C = \begin{pmatrix} c_1\\ c_2 \end{pmatrix}$ be points in $\mathbb{R}^2$. Show that the signed area of $\triangle ABC$ is given by
$$\frac{1}{2}\begin{vmatrix} a_1&b_1&c_1\\ a_2&b_2&c_2\\ 1&1&1 \end{vmatrix}.$$

9. Let A be an n × n matrix. Prove that


 
$$\det\begin{pmatrix} 1 & 0 & \cdots & 0\\ 0 & & &\\ \vdots & & A &\\ 0 & & & \end{pmatrix} = \det A.$$
What’s the interpretation in terms of (signed) volume?

10. Generalizing Exercise 9, we have:


a. Suppose $A\in M_{k\times k}$, $B\in M_{k\times\ell}$, and $D\in M_{\ell\times\ell}$. Prove that
$$\det\begin{pmatrix} A & B\\ O & D \end{pmatrix} = \det A\,\det D.$$
b. Suppose now that $A$, $B$, and $D$ are as in part a and $C\in M_{\ell\times k}$. Prove that if $A$ is invertible, then
$$\det\begin{pmatrix} A & B\\ C & D \end{pmatrix} = \det A\,\det(D - CA^{-1}B).$$
c. If we assume, moreover, that $k = \ell$ and $AC = CA$, then deduce that
$$\det\begin{pmatrix} A & B\\ C & D \end{pmatrix} = \det(AD - CB).$$
d. Give examples to show the result of part c needn’t hold when A and C do not commute.

*11. Suppose A is an orthogonal n × n matrix. (Recall that this means that AT A = In .) Compute
det A.

12. Suppose A is a skew-symmetric n × n matrix. (Recall that this means that AT = −A.) Prove
that when n is odd, det A = 0. Give an example to show this needn’t be true when n is even.
(Hint: Use Exercise 5.)
 
*13. Let $A = \begin{pmatrix} 1&2&1\\ 2&3&0\\ 1&4&2 \end{pmatrix}$.
a. If $A\mathbf{x} = \begin{pmatrix} 1\\ 2\\ -1 \end{pmatrix}$, use Cramer's Rule to find $x_2$.
b. Find A−1 using cofactors.

*14. Using cofactors, find the determinant and the inverse of the matrix
 
$$A = \begin{pmatrix} -1&2&3\\ 2&1&0\\ 0&2&3 \end{pmatrix}.$$

♯ 15. a. Suppose A is an n × n matrix with integer entries and det A = ±1. Prove that A−1 has
all integer entries.
b. Conversely, suppose A and A−1 are both matrices with integer entries. Prove that det A =
±1.

16. Prove that the exchange of any pair of rows (or columns) of a matrix can be accomplished by
an odd number of exchanges of adjacent pairs.

17. Suppose A is an orthogonal n × n matrix. Show that the cofactor matrix C = ±A.

18. Generalizing the result of Proposition 5.17, prove that AC T = (det A)I even if A happens to
be singular. In particular, when A is singular, what can you conclude about the columns of
C T?
   
19. a. Show that if $\begin{pmatrix} x_1\\ y_1 \end{pmatrix}$ and $\begin{pmatrix} x_2\\ y_2 \end{pmatrix}$ are distinct points in $\mathbb{R}^2$, then the unique line passing through them is given by the equation
$$\begin{vmatrix} 1&1&1\\ x&x_1&x_2\\ y&y_1&y_2 \end{vmatrix} = 0.$$
b. Show that if $\begin{pmatrix} x_1\\ y_1\\ z_1 \end{pmatrix}$, $\begin{pmatrix} x_2\\ y_2\\ z_2 \end{pmatrix}$, and $\begin{pmatrix} x_3\\ y_3\\ z_3 \end{pmatrix}$ are noncollinear points in $\mathbb{R}^3$, then the unique plane passing through them is given by the equation
$$\begin{vmatrix} 1&1&1&1\\ x&x_1&x_2&x_3\\ y&y_1&y_2&y_3\\ z&z_1&z_2&z_3 \end{vmatrix} = 0.$$

20. As we saw in Exercises 4.1.22 and 4.1.23, through any three noncollinear points in $\mathbb{R}^2$ there pass a unique parabola⁶ $y = ax^2 + bx + c$ and a unique circle $x^2 + y^2 + ax + by + c = 0$. Given three such points $\begin{pmatrix} x_1\\ y_1 \end{pmatrix}$, $\begin{pmatrix} x_2\\ y_2 \end{pmatrix}$, and $\begin{pmatrix} x_3\\ y_3 \end{pmatrix}$, show that the equations of the parabola and circle are, respectively,
$$\begin{vmatrix} 1&1&1&1\\ x&x_1&x_2&x_3\\ x^2&x_1^2&x_2^2&x_3^2\\ y&y_1&y_2&y_3 \end{vmatrix} = 0 \qquad\text{and}\qquad \begin{vmatrix} 1&1&1&1\\ x&x_1&x_2&x_3\\ y&y_1&y_2&y_3\\ x^2+y^2&x_1^2+y_1^2&x_2^2+y_2^2&x_3^2+y_3^2 \end{vmatrix} = 0.$$

21. Using Corollary 5.6, prove that the determinant function is uniquely determined by the prop-
erties listed in Theorem 5.1. (Hint: Mimic the proof of Proposition 5.7. It might be helpful to
consider two functions $\det$ and $\widetilde{\det}$ that have these properties and prove that $\det(A) = \widetilde{\det}(A)$ for every square matrix $A$.)

22. Let v1 , . . . , vk ∈ Rn . Show that



$$\begin{vmatrix} \mathbf{v}_1\cdot\mathbf{v}_1 & \mathbf{v}_1\cdot\mathbf{v}_2 & \dots & \mathbf{v}_1\cdot\mathbf{v}_k\\ \mathbf{v}_2\cdot\mathbf{v}_1 & \mathbf{v}_2\cdot\mathbf{v}_2 & \dots & \mathbf{v}_2\cdot\mathbf{v}_k\\ \vdots & \vdots & \ddots & \vdots\\ \mathbf{v}_k\cdot\mathbf{v}_1 & \mathbf{v}_k\cdot\mathbf{v}_2 & \dots & \mathbf{v}_k\cdot\mathbf{v}_k \end{vmatrix}$$
is the square of the (k-dimensional) volume of the k-dimensional parallelepiped spanned by


v1 , . . . , vk . (Hints: First take care of the case that {v1 , . . . , vk } is linearly dependent. Now,
supposing they are linearly independent and therefore span a k-dimensional subspace V , choose
an orthonormal basis {uk+1 , . . . , un } for V ⊥ . What is the relation between the k-dimensional
volume of the parallelepiped spanned by v1 , . . . , vk and the n-dimensional volume of the par-
allelepiped spanned by v1 , . . . , vk , uk+1 , . . . , un ?)

23. a. Using Proposition 5.18, prove that D(det)(I)B = trB = b11 + · · · + bnn . (See Exercise
1.4.22.)
b. More generally, show that for any invertible matrix A, D(det)(A)B = det A tr(A−1 B).

24. Give an alternative proof of Proposition 5.13 for general parallelepipeds as follows. Let R ⊂ Rn
be a parallelepiped. Suppose T : Rn → Rn is a linear map of either of the forms
       
$$T\begin{pmatrix} x_1\\ x_2\\ \vdots\\ x_n \end{pmatrix} = \begin{pmatrix} cx_1\\ x_2\\ \vdots\\ x_n \end{pmatrix} \qquad\text{or}\qquad T\begin{pmatrix} x_1\\ x_2\\ \vdots\\ x_n \end{pmatrix} = \begin{pmatrix} x_1 + cx_2\\ x_2\\ \vdots\\ x_n \end{pmatrix}.$$

Calculate the volume of R and of T (R) by applying Fubini’s Theorem, putting the x1 integral
innermost. (This is in essence a proof of Cavalieri’s principle.)

⁶Here we must also assume that no pair of the points lies on a vertical line.

25. (from the 1994 Putnam Exam) Find the value of m so that the line y = mx bisects the region
$$\left\{\begin{pmatrix} x\\ y \end{pmatrix}\in\mathbb{R}^2 : \frac{x^2}{4} + y^2 \le 1,\ x\ge 0,\ y\ge 0\right\}.$$

26. Given any ellipse, show that there are infinitely many inscribed triangles of maximal area.

27. (from the 1994 Putnam Exam) Let A and B be 2 × 2 matrices with integer entries such that
A, A + B, A + 2B, A + 3B, and A + 4B are all invertible matrices whose inverses have integer
entries. Prove that A + 5B is invertible and that its inverse has integer entries. (Hint: Use
Exercise 15.)

6. Change of Variables Theorem

We end this chapter with a general theorem justifying our formulas for integration in polar,
cylindrical, and spherical coordinates. Since we know that the determinant tells us the factor by
which linear maps distort signed volume, and since the derivative gives the best linear approxima-
tion, we expect a change of variables formula to involve the determinant of the derivative matrix.
Giving a rigorous proof is, however, another matter.
Since integration is based upon rectangles rather than balls, it is most convenient to choose (for
this section only) a different norm to measure vectors and linear maps, which, for obvious reasons,
we dub the cubical norm.
Definition. If $\mathbf{x}\in\mathbb{R}^n$, set $\|\mathbf{x}\|_\square = \max(|x_1|, |x_2|, \dots, |x_n|)$ (the subscript $\square$ distinguishes the cubical norm from the usual one). If $T\colon\mathbb{R}^n\to\mathbb{R}^m$ is a linear map, set
$$\|T\|_\square = \max_{\|\mathbf{x}\|_\square = 1} \|T(\mathbf{x})\|_\square.$$

We leave it to the reader to check in Exercise 1 that these are indeed norms and, as will be crucial for us, that $\|T(\mathbf{x})\|_\square \le \|T\|_\square\,\|\mathbf{x}\|_\square$ for all $\mathbf{x}\in\mathbb{R}^n$. Our first result, depicted in Figure 6.1, estimates
how much a C1 map can distort a cube.
Lemma 6.1. Let Cr denote the cube in Rn of sidelength 2r centered at 0. Suppose U ⊂ Rn is
an open set containing Cr and φ : U → Rn is a C1 function with the property that φ(0) = 0 and
$\|D\phi(\mathbf{x}) - I\|_\square < \varepsilon$ for all $\mathbf{x}\in C_r$ and some $0 < \varepsilon < 1$. Then
$$C_{(1-\varepsilon)r} \subset \phi(C_r) \subset C_{(1+\varepsilon)r}.$$

Proof. One can check that Proposition 1.3 of Chapter 6 holds when we use the $\|\cdot\|_\square$ norm instead of the usual one (see Exercise 1f). Then if $\mathbf{x}\in C_r$, we have
$$\|\phi(\mathbf{x})\|_\square \le \max_{\mathbf{y}\in[\mathbf{0},\mathbf{x}]} \|D\phi(\mathbf{y})\|_\square\,\|\mathbf{x}\|_\square < (1+\varepsilon)r,$$
so $\phi(C_r)\subset C_{(1+\varepsilon)r}$. The other inclusion can be proved by applying Exercise 6.2.11 in the $\|\cdot\|_\square$ norm. $\square$
The crucial ingredient in the proof of the Change of Variables Theorem is the following result,
which says that for sufficiently small cubes C, the image g(C) is well approximated by the image
under the derivative at the center of C.

Figure 6.1. The cube $C_r$ and its image $\phi(C_r)$, which contains $C_{(1-\varepsilon)r}$ and is contained in $C_{(1+\varepsilon)r}$.

Proposition 6.2. Suppose U ⊂ Rn is open, g : U → Rn is C1 and Dg(x) is invertible for every


x ∈ U . Let C ⊂ U be a cube with center a, and suppose

$$\|Dg(\mathbf{a})^{-1}\circ Dg(\mathbf{x}) - I\|_\square < \varepsilon < 1 \qquad\text{for all } \mathbf{x}\in C.$$

Then g(C) is a region (and hence has volume) and

(1 − ε)n | det Dg(a)|vol(C) ≤ vol(g(C)) ≤ (1 + ε)n | det Dg(a)|vol(C).

Proof. Since g is C1 with invertible derivative at each point of U , g maps open sets to open
sets and the frontier of g(C) is the image of the frontier of C, hence a set of zero volume (see
Exercise 7.1.12). Therefore, g(C) is a region.
Suppose the sidelength of the cube C is 2r. We apply Lemma 6.1 to the function φ defined by

$$\phi(\mathbf{x}) = Dg(\mathbf{a})^{-1}\bigl(g(\mathbf{x}+\mathbf{a}) - g(\mathbf{a})\bigr), \qquad \mathbf{x}\in C_r.$$

Then φ(0) = 0, Dφ(0) = I, and Dφ(x) = Dg(a)−1 ◦ Dg(x + a) so, by the hypothesis, kDφ(x) −
Ik < ε for all x ∈ Cr . Therefore, we have

C(1−ε)r ⊂ φ(Cr ) ⊂ C(1+ε)r ,

and so
 
$$g(\mathbf{a}) + Dg(\mathbf{a})\bigl(C_{(1-\varepsilon)r}\bigr) \subset g(C) \subset g(\mathbf{a}) + Dg(\mathbf{a})\bigl(C_{(1+\varepsilon)r}\bigr).$$

Applying Proposition 5.13, using the fact that $\operatorname{vol}(C_{\alpha r}) = \alpha^n\operatorname{vol}(C_r)$, and remembering that
translation preserves volume, we obtain the result. 

We begin our onslaught on the Change of Variables Theorem with a very simple case, whose
proof is left to the reader in Exercise 2.

Lemma 6.3. Suppose T : Rn → Rn is a linear map whose standard matrix is diagonal and
nonsingular. Let R ⊂ Rn be a rectangle, and suppose f is integrable on T (R). Then f ◦ T is
integrable on R and
$$\int_{T(R)} f(\mathbf{y})\,dV_{\mathbf{y}} = |\det T|\int_R (f\circ T)(\mathbf{x})\,dV_{\mathbf{x}}.$$

Theorem 6.4 (Change of Variables Theorem). Let Ω ⊂ Rn be a region and let U be an open
set containing Ω so that g : U → Rn is one-to-one and C1 with invertible derivative at each point.
Suppose f : g(Ω) → R and (f ◦ g)| det Dg| : Ω → R are both integrable. Then
$$\int_{g(\Omega)} f(\mathbf{y})\,dV_{\mathbf{y}} = \int_\Omega (f\circ g)(\mathbf{x})\,|\det Dg(\mathbf{x})|\,dV_{\mathbf{x}}.$$

Remark . One can strengthen the theorem, in particular by allowing Dg(x) to fail to be
invertible on a set of volume 0. This is important for many applications—e.g., polar, cylindrical,
and spherical coordinates. But we won’t bother justifying it here.

Proof. First, we cover Ω with a union of rectangles with rational sidelengths (as usual, by
working with the function f˜). Then, dividing these rectangles into cubes, we may assume R is a
cube.
There are positive constants M and N so that |f | ≤ M (by integrability) and k(Dg)−1 k ≤ N
(by continuity and compactness). Choose 0 < ε < 1. By uniform continuity, Theorem 1.4 of
Chapter 5, there is δ1 > 0 so that kDg(x) − Dg(y)k ≤ ε/N whenever kx − yk < δ1 , x, y ∈ R.
Similarly, there is δ2 > 0 so that | det Dg(x) − det Dg(y)| < ε/M whenever kx − yk < δ2 , x, y ∈ R.
And by integrability of (f ◦ g)| det Dg|, there is δ3 > 0 so that whenever the diameter of the cubes
of a cubical partition P is less than δ3 , we have U ((f ◦ g)| det Dg|, P) − L((f ◦ g)| det Dg|, P) < ε (see
Exercise 7.1.10).
Suppose P = {R1 , . . . , Rs } is a partition of R into cubes of diameter less than δ = min(δ1 , δ2 , δ3 ).
Let

Mi = sup (f ◦ g)(x)
x∈Ri

mi = inf (f ◦ g)(x)
x∈Ri
fi = sup (f ◦ g)(x)| det Dg(x)|
M
x∈Ri

e i = inf (f ◦ g)(x)| det Dg(x)|


m
x∈Ri

We claim that if ai is the center of the cube Ri , then

$$(\ast)\qquad \widetilde{m}_i - \varepsilon \le m_i\,|\det Dg(\mathbf{a}_i)| \qquad\text{and}\qquad M_i\,|\det Dg(\mathbf{a}_i)| \le \widetilde{M}_i + \varepsilon.$$

We check the latter: choose a sequence of points xk ∈ Ri so that (f ◦ g)(xk ) → Mi (and we assume
Mi > 0 and all (f ◦ g)(xk ) > 0 for convenience). We have | det Dg(ai )| < | det Dg(xk )| + ε/M and
so
$$(f\circ g)(\mathbf{x}_k)\,|\det Dg(\mathbf{a}_i)| < (f\circ g)(\mathbf{x}_k)\,|\det Dg(\mathbf{x}_k)| + \frac{\varepsilon}{M}(f\circ g)(\mathbf{x}_k) \le (f\circ g)(\mathbf{x}_k)\,|\det Dg(\mathbf{x}_k)| + \varepsilon \le \widetilde{M}_i + \varepsilon.$$
Taking the limit as $k\to\infty$, we conclude that
$$M_i\,|\det Dg(\mathbf{a}_i)| \le \widetilde{M}_i + \varepsilon,$$

as required.

On any cube Ri with center ai , we have


$$\|Dg(\mathbf{a}_i)^{-1}\circ Dg(\mathbf{x}) - I\|_\square \le \|Dg(\mathbf{a}_i)^{-1}\|_\square\,\|Dg(\mathbf{x}) - Dg(\mathbf{a}_i)\|_\square < N\cdot\frac{\varepsilon}{N} = \varepsilon$$
for all $\mathbf{x}\in R_i$. By Proposition 6.2, we have
$$(1-\varepsilon)^n\,|\det Dg(\mathbf{a}_i)|\operatorname{vol}(R_i) \le \operatorname{vol}(g(R_i)) \le (1+\varepsilon)^n\,|\det Dg(\mathbf{a}_i)|\operatorname{vol}(R_i).$$
Now, $\displaystyle\int_{g(R)} f\,dV = \sum_{i=1}^s \int_{g(R_i)} f\,dV$, and
$$m_i\operatorname{vol}(g(R_i)) \le \int_{g(R_i)} f\,dV \le M_i\operatorname{vol}(g(R_i)) \qquad\text{for } i = 1,\dots,s.$$

Therefore, we have
$$(1-\varepsilon)^n\sum_{i=1}^s m_i\,|\det Dg(\mathbf{a}_i)|\operatorname{vol}(R_i) \le \int_{g(R)} f\,dV \le (1+\varepsilon)^n\sum_{i=1}^s M_i\,|\det Dg(\mathbf{a}_i)|\operatorname{vol}(R_i).$$

Substituting the inequalities (∗), we find


$$(1-\varepsilon)^n\sum_{i=1}^s (\widetilde{m}_i - \varepsilon)\operatorname{vol}(R_i) \le \int_{g(R)} f\,dV \le (1+\varepsilon)^n\sum_{i=1}^s (\widetilde{M}_i + \varepsilon)\operatorname{vol}(R_i).$$
Now, since 0 < ε < 1, we have
(1 + ε)n < 2n , (1 + ε)n − 1 < 2n−1 nε (by the mean value theorem),
(1 − ε)n < 1, and 1 − (1 − ε)n < nε (by the mean value theorem).
Therefore,
$$\sum_{i=1}^s \widetilde{m}_i\operatorname{vol}(R_i) - \varepsilon\operatorname{vol}(R)\bigl(1 + Mn\bigr) \le \int_{g(R)} f\,dV \le \sum_{i=1}^s \widetilde{M}_i\operatorname{vol}(R_i) + \varepsilon\operatorname{vol}(R)\bigl(2^n + 2^{n-1}Mn\bigr).$$

We’ve almost arrived at the end. For convenience, let β = vol(R) 2n + 2n−1 M n . Recall that
since (f ◦ g)| det Dg| is integrable, its Z
integral is theZunique number lying between all its upper
and lower sums. Suppose now that f dV 6= (f ◦ g)| det Dg|dV . In particular, suppose
Z Z g(R) R

f dV = (f ◦ g)| det Dg|dV + γ for some γ > 0. Let ε > 0 be chosen small enough so that
g(R) R
(β + 1)ε < γ. We have
$$\int_{g(R)} f\,dV \le U\bigl((f\circ g)|\det Dg|, \mathcal{P}\bigr) + \beta\varepsilon < \int_R (f\circ g)|\det Dg|\,dV + (\beta+1)\varepsilon < \int_R (f\circ g)|\det Dg|\,dV + \gamma = \int_{g(R)} f\,dV,$$

which is a contradiction. Similarly, supposing that γ < 0 leads to a contradiction. Thus,


$$\int_{g(R)} f\,dV = \int_R (f\circ g)|\det Dg|\,dV,$$

as desired. 

Examples 1. First, to be official, we check that the formulas we derived in a heuristic manner
in Section 3 are valid.
   
(a) polar coordinates: Let $g\begin{pmatrix} r\\ \theta \end{pmatrix} = \begin{pmatrix} r\cos\theta\\ r\sin\theta \end{pmatrix}$. Then
$$Dg\begin{pmatrix} r\\ \theta \end{pmatrix} = \begin{pmatrix} \cos\theta & -r\sin\theta\\ \sin\theta & r\cos\theta \end{pmatrix} \qquad\text{and}\qquad \det Dg\begin{pmatrix} r\\ \theta \end{pmatrix} = r.$$
(b) cylindrical coordinates: $g\begin{pmatrix} r\\ \theta\\ z \end{pmatrix} = \begin{pmatrix} r\cos\theta\\ r\sin\theta\\ z \end{pmatrix}$. Then
$$Dg\begin{pmatrix} r\\ \theta\\ z \end{pmatrix} = \begin{pmatrix} \cos\theta & -r\sin\theta & 0\\ \sin\theta & r\cos\theta & 0\\ 0 & 0 & 1 \end{pmatrix} \qquad\text{and}\qquad \det Dg\begin{pmatrix} r\\ \theta\\ z \end{pmatrix} = r.$$
(c) spherical coordinates: Let $g\begin{pmatrix} \rho\\ \phi\\ \theta \end{pmatrix} = \begin{pmatrix} \rho\sin\phi\cos\theta\\ \rho\sin\phi\sin\theta\\ \rho\cos\phi \end{pmatrix}$. Then
$$Dg\begin{pmatrix} \rho\\ \phi\\ \theta \end{pmatrix} = \begin{pmatrix} \sin\phi\cos\theta & \rho\cos\phi\cos\theta & -\rho\sin\phi\sin\theta\\ \sin\phi\sin\theta & \rho\cos\phi\sin\theta & \rho\sin\phi\cos\theta\\ \cos\phi & -\rho\sin\phi & 0 \end{pmatrix},$$
and, expanding in cofactors along the third row, we find that
$$\det Dg\begin{pmatrix} \rho\\ \phi\\ \theta \end{pmatrix} = \cos\phi\,(\rho^2\sin\phi\cos\phi) + \rho\sin\phi\,(\rho\sin^2\phi) = \rho^2\sin\phi. \;\triangledown$$
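These three Jacobians can also be checked symbolically; the following Python sketch uses SymPy (an aside, not part of the text):

import sympy as sp

rho, phi, theta, r, z = sp.symbols('rho phi theta r z', positive=True)

polar = sp.Matrix([r * sp.cos(theta), r * sp.sin(theta)])
cylindrical = sp.Matrix([r * sp.cos(theta), r * sp.sin(theta), z])
spherical = sp.Matrix([rho * sp.sin(phi) * sp.cos(theta),
                       rho * sp.sin(phi) * sp.sin(theta),
                       rho * sp.cos(phi)])

print(sp.simplify(polar.jacobian([r, theta]).det()))             # r
print(sp.simplify(cylindrical.jacobian([r, theta, z]).det()))    # r
print(sp.simplify(spherical.jacobian([rho, phi, theta]).det()))  # rho**2*sin(phi)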
       
Example 2. Let $S\subset\mathbb{R}^2$ be the parallelogram with vertices $\begin{pmatrix} 0\\ 0 \end{pmatrix}$, $\begin{pmatrix} 3\\ 1 \end{pmatrix}$, $\begin{pmatrix} 4\\ 3 \end{pmatrix}$, and $\begin{pmatrix} 1\\ 2 \end{pmatrix}$, as pictured in Figure 6.2. Evaluate $\int_S x\,dA$. Of course, with a bit of patience, we could evaluate this by three different iterated integrals in cartesian coordinates, but it makes sense to take a linear transformation $g$ that maps the unit square, $R$, to the region $S$, e.g.,
$$g\begin{pmatrix} u\\ v \end{pmatrix} = \begin{pmatrix} 3&1\\ 1&2 \end{pmatrix}\begin{pmatrix} u\\ v \end{pmatrix} = \begin{pmatrix} x\\ y \end{pmatrix}.$$

Figure 6.2. The linear map $g$ carries the unit square $R$ in the $uv$-plane to the parallelogram $S$ in the $xy$-plane.

Then, applying the Change of Variables Theorem, we have
$$\int_S x\,dA_{xy} = \int_R \underbrace{(3u+v)}_{x}\,\underbrace{5}_{|\det Dg|}\,dA_{uv} = 5\int_0^1\!\!\int_0^1 (3u+v)\,dv\,du = 5\int_0^1 \bigl(3u+\tfrac{1}{2}\bigr)\,du = 5\cdot 2 = 10. \;\triangledown$$

Example 3. Let $S\subset\mathbb{R}^2$ be the region bounded by the curves $y = x$, $y = 3$, $xy = 1$, and $xy = 4$. We wish to evaluate $\int_S y\,dA$. The equations of the boundary curves suggest a substitution $u = xy$, $v = y/x$. To determine the function $g$ so that $g\begin{pmatrix} u\\ v \end{pmatrix} = \begin{pmatrix} x\\ y \end{pmatrix}$, we need the inverse function (note that $S$ lies in the first quadrant):
$$g\begin{pmatrix} u\\ v \end{pmatrix} = \begin{pmatrix} \sqrt{u/v}\\ \sqrt{uv} \end{pmatrix}.$$
Looking at Figure 6.3, it is easy to check that $g$ maps the region $\Omega = \left\{\begin{pmatrix} u\\ v \end{pmatrix} : 1\le u\le 4,\ 1\le v\le \dfrac{9}{u}\right\}$ to $S$.

Figure 6.3. The map $g$ carries the region $\Omega$ in the $uv$-plane (bounded by $u=1$, $u=4$, $v=1$, and $uv=9$) to the region $S$ in the $xy$-plane (bounded by $xy=1$, $xy=4$, $y=x$, and $y=3$).

Now,
$$Dg\begin{pmatrix} u\\ v \end{pmatrix} = \begin{pmatrix} \dfrac{1}{2\sqrt{uv}} & -\dfrac{1}{2}\sqrt{\dfrac{u}{v^3}}\\[2mm] \dfrac{1}{2}\sqrt{\dfrac{v}{u}} & \dfrac{1}{2}\sqrt{\dfrac{u}{v}} \end{pmatrix}, \qquad\text{so}\qquad \det Dg\begin{pmatrix} u\\ v \end{pmatrix} = \frac{1}{2v}.$$
Then, by the Change of Variables Theorem, we have
$$\int_S y\,dA_{xy} = \int_\Omega \sqrt{uv}\,\frac{1}{2v}\,dA_{uv} = \tfrac{1}{2}\int_1^4\!\!\int_1^{9/u} \sqrt{\frac{u}{v}}\,dv\,du = \int_1^4 \sqrt{u}\,\sqrt{v}\,\Big|_1^{9/u}\,du = \int_1^4 \bigl(3 - \sqrt{u}\bigr)\,du = \frac{13}{3}. \;\triangledown$$
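Both of these answers are easy to confirm numerically. The Python sketch below (an aside, assuming NumPy) approximates each $uv$-integral by a midpoint Riemann sum and compares with the exact values $10$ and $13/3$:

import numpy as np

def midpoint_2d(f, u0, u1, v_lo, v_hi, n=1500):
    """Midpoint Riemann sum of f over { (u,v) : u0 <= u <= u1, v_lo(u) <= v <= v_hi(u) }."""
    total, du = 0.0, (u1 - u0) / n
    for i in range(n):
        u = u0 + (i + 0.5) * du
        a, b = v_lo(u), v_hi(u)
        dv = (b - a) / n
        v = a + (np.arange(n) + 0.5) * dv
        total += f(u, v).sum() * du * dv
    return total

# Example 2: integrand (3u + v) * 5 over the unit square; exact value 10.
print(midpoint_2d(lambda u, v: 5 * (3 * u + v), 0, 1, lambda u: 0, lambda u: 1))

# Example 3: integrand sqrt(uv)/(2v) over 1 <= u <= 4, 1 <= v <= 9/u; exact value 13/3.
print(midpoint_2d(lambda u, v: np.sqrt(u * v) / (2 * v), 1, 4, lambda u: 1, lambda u: 9 / u))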
EXERCISES 7.6

1. Suppose x, y ∈ Rn , S and T are linear maps from Rn to Rm , and c ∈ R.


a. Prove that $\|\mathbf{x}+\mathbf{y}\|_\square \le \|\mathbf{x}\|_\square + \|\mathbf{y}\|_\square$ and $\|c\mathbf{x}\|_\square = |c|\,\|\mathbf{x}\|_\square$.
b. Prove that $\|S+T\|_\square \le \|S\|_\square + \|T\|_\square$ and $\|cT\|_\square = |c|\,\|T\|_\square$.
c. Prove that $\|T(\mathbf{x})\|_\square \le \|T\|_\square\,\|\mathbf{x}\|_\square$.

d. Suppose the standard matrix for $T$ is the $m\times n$ matrix $A$. Prove that
$$\|T\|_\square = \max_{1\le i\le m} \sum_{j=1}^n |a_{ij}|.$$
√ √
e. Check that kxk ≤ kxk ≤ nkxk and √1n kT k ≤ kT k ≤ nkT k .
f. Suppose $g\colon [a,b]\to\mathbb{R}^n$ is continuous. Prove that $\left\|\displaystyle\int_a^b g(t)\,dt\right\|_\square \le \displaystyle\int_a^b \|g(t)\|_\square\,dt$. (This is needed to prove Proposition 1.3 of Chapter 6 with the $\|\cdot\|_\square$ norm.)

2. Prove Lemma 6.3.


3. Find the area of the ellipse $\dfrac{x^2}{a^2} + \dfrac{y^2}{b^2} \le 1$ and the volume of the ellipsoid $\dfrac{x^2}{a^2} + \dfrac{y^2}{b^2} + \dfrac{z^2}{c^2} \le 1$.
(Cf. also Exercise 9.4.17.)
       
4. Let $S$ be the triangle with vertices at $\begin{pmatrix} 0\\ 0 \end{pmatrix}$, $\begin{pmatrix} 1\\ 0 \end{pmatrix}$, and $\begin{pmatrix} 0\\ 1 \end{pmatrix}$. Let $f\begin{pmatrix} x\\ y \end{pmatrix} = e^{(x-y)/(x+y)}$. Evaluate the integral $\displaystyle\int_S f\,dA$
a. by changing to polar coordinates
b. by making the change of variables u = x − y, v = x + y.

*5. Let $S$ be the plane region bounded by $y = 0$, $2x+y = 1$, $2x+y = 5$, and $-x+3y = 1$. Evaluate
$$\int_S \frac{x-3y}{2x+y}\,dA.$$
6. Rework Example 3 with the substitution u = xy, v = y.
*7. Let $S$ be the plane region bounded by $x = 0$, $y = 0$, and $x+y = 1$. Evaluate $\displaystyle\int_S \cos\!\left(\frac{x-y}{x+y}\right)dA$.
(Remark: The integrand is undefined at the origin. Does this cause a problem?)

8. Find the volume of the region bounded below by the plane z = 0 and above by the elliptical
paraboloid z = 16 − x2 − 4y 2 .

9. Let $S$ be the plane region in the first quadrant bounded by the curves $y = x$, $y = 2x$, and $xy = 3$. Evaluate $\displaystyle\int_S x\,dA$.

*10. Let $S$ be the plane region in the first quadrant bounded by the curves $y = x$, $y = 2x$, $xy = 3$, and $xy = 1$. Evaluate $\displaystyle\int_S \frac{x}{y}\,dA$.

11. Let $S$ be the region in the first quadrant bounded by $y = 0$, $y = x$, $xy = 1$, and $x^2-y^2 = 1$. Evaluate $\displaystyle\int_S (x^2+y^2)\,dA$. (Hint: The obvious change of variables is $u = xy$, $v = x^2-y^2$. Here it is too hard to find $\begin{pmatrix} x\\ y \end{pmatrix} = g\begin{pmatrix} u\\ v \end{pmatrix}$ explicitly, but how can you find $\det Dg$ another way?)
1
Z S be the region bounded by y = −x, y =
12. Let 3, y = 2x, and y = 2x − 1. Evaluate
x+y
4
dA.
S (2x − y + 1)

*13. Let $S$ be the region with $x\ge 0$ bounded by $y+x^2 = 0$, $x-y = 2$, and $x^2-2x+4y = 0$. Evaluate $\displaystyle\int_S (x-y+1)^{-2}\,dA$. (Hint: Consider $x = u+v$, $y = v-u^2$.)

*14. Suppose 0 < b < a. Define g : (0, b) × (0, 2π) × (0, 2π) → R3 by
   
$$g\begin{pmatrix} r\\ \theta\\ \phi \end{pmatrix} = \begin{pmatrix} (a+r\cos\phi)\cos\theta\\ (a+r\cos\phi)\sin\theta\\ r\sin\phi \end{pmatrix}.$$
Describe and sketch the image of g, and find its volume.

15. Let
$$A = \begin{pmatrix} 1&1&1&\cdots&1\\ 1&2&1&\cdots&1\\ 1&2&3&\cdots&1\\ \vdots&\vdots&\vdots&\ddots&\vdots\\ 1&2&3&\cdots&n \end{pmatrix}.$$
Given that $\displaystyle\int_{\mathbb{R}^n} f\,dV = 1$, evaluate $\displaystyle\int_{\mathbb{R}^n} f(A^{-1}\mathbf{x})\,dV$.

16. Let S = {x ∈ Rn : xi ≥ 0 for all i, x1 + 2x2 + 3x3 + · · · + nxn ≤ n}. Find vol(S).
*17. Define spherical coordinates in $\mathbb{R}^4$ and calculate $\displaystyle\int_{B(\mathbf{0},a)} \|\mathbf{x}\|\,dV$.
Z
1
18. Let R = [0, 1] × [0, 1], and consider the integral I = dA.
R 1 − xy
P

1
a. By expanding the integrand in a geometric series, show that I = k2 . (To be completely
k=1
rigorous, you will need to write I as the limit of integrals over [0, 1] × [0, 1 − δ] as δ → 0+ .
Why?)
b. Evaluate I by rotating the plane through π/4. A reasonable amount of cleverness will be
required.7

19. Let an denote the n-dimensional volume of the n-dimensional unit ball B(0, 1) ⊂ Rn . Prove
that
$$a_n = \begin{cases} \pi^m/m!, & n = 2m\\ \pi^m 2^{2m+1} m!/(2m+1)!, & n = 2m+1 \end{cases}.$$
(Hint: Proceed by induction with gaps of 2.)

⁷We learned of this calculation from Simmons' Calculus with Analytic Geometry, first edition, pp. 751–2.
CHAPTER 8
Differential Forms and Integration on Manifolds
In this chapter we come to the culmination of our study of multivariable calculus. Just as
in single-variable calculus, we’ve studied two seemingly unrelated topics—the derivative and the
integral. Now the time has come to make the connection between the two, namely, the multivariable
version of the Fundamental Theorem of Calculus. After building up to the ultimate theorem, we
consider some nontrivial applications to physics and topology.

1. Motivation

We want to be able to integrate on k-dimensional manifolds, so we begin by introducing the


appropriate integrands, which are called (differential) k-forms. These integrals should generalize
the ideas of work (done by a force field along a directed curve) and flux (of a vector field outwards
across a surface). But not only are k-forms invented to be integrated, they can also be differentiated.
There is a natural operator d, called the exterior derivative, which will turn k-forms into k+1-forms.
The classical Fundamental Theorem of Calculus, we recall, tells us that
$$\int_a^b f'(t)\,dt = f(b) - f(a)$$
whenever $f$ is $C^1$. We should think of this as relating the integral of the derivative over the interval $[a,b]$ to the "integral" of $f$ over the boundary of the interval, which in this case is the signed sum of the values $f(b)$ and $f(a)$. Notice that there is a notion of direction or orientation built into the integral, as indicated in Figure 1.1, inasmuch as $\displaystyle\int_b^a f(t)\,dt = -\int_a^b f(t)\,dt$. In this guise, we can

Figure 1.1. The interval from $a$ to $b$, with its orientation.

write the Fundamental Theorem of Calculus in the form


$$\int_{[a,b]} df = \int_{\partial[a,b]} f = f(b) - f(a).$$

More generally, we will prove Stokes’s Theorem, which says that


$$\int_M d\omega = \int_{\partial M} \omega$$
for any k-form ω and compact, oriented k-dimensional manifold M with boundary ∂M . The original
versions of Stokes’s Theorem all arose in the first half of the nineteenth century in connection with
physics, particularly potential theory and electrostatics.

Just as the Fundamental Theorem of Calculus tells us that our displacement is the integral
of our velocity, so can it tell us the area of a plane region by tracing around its boundary (see
Exercises 1.5.3 and 8.3.26). Another instance of the Fundamental Theorem of Calculus is Gauss’s
Law in physics, which tells us that the total flux of the electric field across a “Gaussian surface” is
proportional to the total charge contained inside that surface. And, as we shall see in Section 7,
another application is the Hairy Ball Theorem, which tells us we can’t comb the hairs on a billiard
ball. The elegant modern-day theory of calibrated geometries, which grew out of understanding
minimal surfaces (the surfaces of least area with a given boundary curve), is based on differential
forms and Stokes’s Theorem.
As we’ve seen in Sections 5 and 6 of Chapter 7, determinants play a crucial rôle in the under-
standing of n-dimensional volume, and so it is not surprising that k-forms, the objects we wish to
integrate over k-dimensional surfaces, will be built out of determinants. We turn to this multilinear
algebra in the next section.

EXERCISES 8.1

1. Why does a (plane) mirror reverse left and right but not up and down?
2. Appropriating from Tom and Ray Magliozzi’s Car Talk :
RAY: Picture this. It’s 1936. You’re in your second year of high school. Europe is
on the brink of yet another war.
TOM: Second senior year in high school.
RAY: In a secret location in Germany, German officers are gathered around a table
with the designers and builders of its new personnel carrier. They’re going over every
little detail and leaving no stone unturned. They want everything to be flawless. One
of the officers stands up and says, “I have a question about the fan belt, about the
longevity of the fan belt.” You with me?
TOM: They spoke English there?
RAY: Oh, yeah.
TOM: Just like in all the movies?
RAY: I’m reading the subtitles.
TOM: Just like in all the movies. I often wondered how come they all spoke English?
RAY: Well, it’s so close to German, after all.
TOM: Yeah. You just add an ish or ein to the end of everything.
RAY: Anyway, this fan belt looks just like the belt around your waist. It’s a flat
piece of rubber, and it’s designed to run around the fan and the generator. So, he
asks, “How long do you expect the belt to last?” The engineer says, “30 to 40 thousand
kilometers.” The officer says, “Not good enough.”
TOM: He said, how many miles is that?
RAY: The colonel says . . .
TOM: That’s why I never made any money in scriptwriting.
RAY: Yeah. The colonel says, “Not good enough. We need it to last at least 60K.”
The engineer says, “Huh. Not a problem. It’s just a question of taking off the belt and
flipping it over, right?”
TOM: Sure.
RAY: Turning it inside-out.
TOM: Yeah.
RAY: The officer says, “That’s unacceptable. Our soldiers will be engaged in battle.
We can’t ask them to change fan belts in the middle of the battlefield.”
TOM: Well, it’s a good point.

RAY: That’s right.


TOM: I mean, come on. You can’t tell the guys to stop shooting, your fan belt’s
got to be replaced.
RAY: Exactly. Hold your fire. So, the engineers huddle together, and they come up
with a clever design change. And I think I mentioned they do not change the material of
the belt in any way, yet they satisfy the new longevity requirement quite easily. What
did they do?
TOM: Whew!

2. Differential Forms

We have learned how to calculate multiple integrals over regions in Rn . Our next goal is to
be able to integrate over compact manifolds, e.g., curves and surfaces in R3 . In some sense, the
most basic question is this: we know that determinant gives the signed volume of an n-dimensional
parallelepiped in Rn ; how do we find the signed volume of a k-dimensional parallelepiped in Rn ,
and what does “signed” mean in this instance?

2.1. The multilinear set-up. We begin by using the determinant to define various mul-
tilinear functions of (ordered) sets of k vectors in Rn . First, we define n different linear maps
dxi : Rn → R, i = 1, . . . , n, as follows: if
$$v = \begin{pmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{pmatrix} \in \mathbb{R}^n, \qquad \text{then set } dx_i(v) = v_i.$$

(The reason for the bizarre notation will soon become clear.) Note that the set of linear maps from
Rn to R is an n-dimensional vector space, often denoted (Rn )∗ , and {dx1 , . . . , dxn } is a basis for it.
(See Exercise 4.3.25.) For if φ : Rn → R is a linear map, then, letting {e1 , . . . , en } be the standard
basis for Rn , set ai = φ(ei ), i = 1, . . . , n. Then φ = a1 dx1 + · · · + an dxn , so dx1 , . . . , dxn span
(Rn )∗ . Why do they form a linearly independent set? Well, suppose φ = c1 dx1 + · · · + cn dxn is the
zero linear map. Then, in particular, φ(ei ) = ci = 0 for all i = 1, . . . , n, as required.
Now, if I = (i1 , . . . , ik ) is an ordered k-tuple, define
$$dx_I : \underbrace{\mathbb{R}^n \times \cdots \times \mathbb{R}^n}_{k \text{ times}} \to \mathbb{R} \qquad \text{by}^1$$
$$dx_I(v_1, \ldots, v_k) = \begin{vmatrix} dx_{i_1}(v_1) & \cdots & dx_{i_1}(v_k) \\ \vdots & \ddots & \vdots \\ dx_{i_k}(v_1) & \cdots & dx_{i_k}(v_k) \end{vmatrix}.$$
¹Here we revert to the usual notation for functions, inasmuch as v1 , . . . , vk are all vectors.

As is the case with the determinant, dxI defines an alternating, multilinear function of k vectors
in Rn . If we write
$$v_i = \begin{pmatrix} v_{i,1} \\ v_{i,2} \\ \vdots \\ v_{i,n} \end{pmatrix}, \qquad i = 1, \ldots, k,$$
then
$$dx_I(v_1, \ldots, v_k) = \begin{vmatrix} v_{1,i_1} & \cdots & v_{k,i_1} \\ \vdots & \ddots & \vdots \\ v_{1,i_k} & \cdots & v_{k,i_k} \end{vmatrix}.$$

When i1 < i2 < · · · < ik , this is of course the determinant of the k × k matrix obtained by taking rows i1 , . . . , ik of the matrix
$$\begin{pmatrix} | & | & & | \\ v_1 & v_2 & \cdots & v_k \\ | & | & & | \end{pmatrix}.$$
Example 1. Let n = 3, I = (1, 3), and let $v_1 = \begin{pmatrix} 2 \\ 4 \\ 5 \end{pmatrix}$ and $v_2 = \begin{pmatrix} -1 \\ 0 \\ 3 \end{pmatrix}$. Then
$$dx_{(1,3)}(v_1, v_2) = \begin{vmatrix} dx_1(v_1) & dx_1(v_2) \\ dx_3(v_1) & dx_3(v_2) \end{vmatrix} = \begin{vmatrix} 2 & -1 \\ 5 & 3 \end{vmatrix} = 11. \qquad \triangledown$$

Example 2. Let n = 4, I = (3, 1, 4), and let
$$v_1 = \begin{pmatrix} 1 \\ -1 \\ 0 \\ 2 \end{pmatrix}, \qquad v_2 = \begin{pmatrix} 0 \\ 3 \\ -2 \\ 1 \end{pmatrix}, \qquad\text{and}\qquad v_3 = \begin{pmatrix} 2 \\ 1 \\ 1 \\ 1 \end{pmatrix}.$$
Then
$$dx_{(3,1,4)}(v_1, v_2, v_3) = \begin{vmatrix} dx_3(v_1) & dx_3(v_2) & dx_3(v_3) \\ dx_1(v_1) & dx_1(v_2) & dx_1(v_3) \\ dx_4(v_1) & dx_4(v_2) & dx_4(v_3) \end{vmatrix} = \begin{vmatrix} 0 & -2 & 1 \\ 1 & 0 & 2 \\ 2 & 1 & 1 \end{vmatrix} = -5. \qquad \triangledown$$
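For readers who like to confirm such computations by machine, here is a minimal sketch in Python with NumPy (neither of which is assumed anywhere in this text): it evaluates dxI (v1 , . . . , vk ) by forming the matrix whose columns are the vi , keeping the rows indexed by I, and taking the determinant, thereby reproducing Examples 1 and 2.

```python
import numpy as np

def dx(I, vectors):
    """Evaluate dx_I(v_1, ..., v_k): take the matrix whose columns are the v_i,
    keep the rows indexed by I (in the given order), and return its determinant."""
    A = np.column_stack(vectors)      # columns are v_1, ..., v_k
    rows = [i - 1 for i in I]         # convert the 1-based indices in I to 0-based
    return np.linalg.det(A[rows, :])

# Example 1: n = 3, I = (1, 3)
v1, v2 = np.array([2, 4, 5]), np.array([-1, 0, 3])
print(dx((1, 3), [v1, v2]))           # 11.0

# Example 2: n = 4, I = (3, 1, 4)
w1 = np.array([1, -1, 0, 2]); w2 = np.array([0, 3, -2, 1]); w3 = np.array([2, 1, 1, 1])
print(dx((3, 1, 4), [w1, w2, w3]))    # -5.0
```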

When i1 < i2 < · · · < ik , we say that the ordered k-tuple I = (i1 , . . . , ik ) is increasing. If I is
a k-tuple with no repeated index, we denote by I < the associated increasing k-tuple. For example,
if I = (2, 4, 5, 1), then I < = (1, 2, 4, 5), and we observe that

dx(2,4,5,1) = −dx(2,4,1,5) = +dx(2,1,4,5) = −dx(1,2,4,5) .

In general, dxI = (−1)s dxI < , where s is the number of exchanges required to move from I to I < .
Note that if we switch two of the indices in the ordered k-tuple, this amounts to switching two
rows in the matrix, and the determinant changes sign. Similarly, if two of the indices are equal,
the determinant will always be 0, so dxI = 0 whenever there is a repeated index in I.
It follows from Theorem 5.1 or Proposition 5.18 of Chapter 7 that the set of dxI with I increasing
spans the vector space of alternating multilinear functions from (Rn )k to R, denoted Λk (Rn )∗ . In
particular, if T ∈ Λk (Rn )∗ , then for any increasing k-tuple I, set aI = T (ei1 , . . . , eik ). Then we
leave it to the reader to check that
$$T = \sum_{I \text{ increasing}} a_I\, dx_I$$

and that the set of dxI with I increasing forms a linearly independent set (see Exercise 1). Since
counting the increasing sequences of k numbers between 1 and n is the same as counting the number
of k-element subsets of an n-element set, we have
$$\dim \Lambda^k(\mathbb{R}^n)^* = \binom{n}{k}.$$
Remark. Suppose I is an increasing k-tuple. We have the following geometric interpretation:
given vectors v1 , . . . , vk ∈ Rn , the number dxI (v1 , . . . , vk ) is the signed volume of the projection
onto the xi1 xi2 . . . xik -plane of the parallelepiped spanned by v1 , . . . , vk . See Figure 2.1.

the signed area (as viewed from the positive x2 -axis) of this parallelogram is dx(3,1) (v, w)

Figure 2.1

Generalizing the cross product of vectors in R3 (see Exercise 3), we define the product of these alternating multilinear functions as follows. If I and J are ordered k- and ℓ-tuples, respectively, we define
$$dx_I \wedge dx_J = dx_{(I,J)},$$
where by (I, J) we mean the ordered (k + ℓ)-tuple obtained by concatenating I and J.

Example 3.
dx(1,2) ∧ dx3 = dx(1,2,3)
dx(1,5) ∧ dx(4,2) = dx(1,5,4,2) = −dx(1,2,4,5)
dx(1,3,2) ∧ dx(3,4) = dx(1,3,2,3,4) = 0 ▽
We extend by linearity: if $\omega = \sum_I a_I\, dx_I$ and $\eta = \sum_J b_J\, dx_J$, then we set $\omega \wedge \eta = \sum (a_I b_J)\, dx_I \wedge dx_J = \sum (a_I b_J)\, dx_{(I,J)}$. This is called the wedge product of ω and η.

Example 4. Suppose ω = a1 dx1 + a2 dx2 and η = b1 dx1 + b2 dx2 ∈ Λ1 (R2 )∗ = (R2 )∗ . Then
let’s compute ω ∧ η ∈ Λ2 (R2 )∗ :
ω ∧ η = (a1 dx1 + a2 dx2 ) ∧ (b1 dx1 + b2 dx2 )
= a1 b1 dx1 ∧ dx1 + a2 b1 dx2 ∧ dx1 + a1 b2 dx1 ∧ dx2 + a2 b2 dx2 ∧ dx2
= a1 b1 dx(1,1) + a2 b1 dx(2,1) + a1 b2 dx(1,2) + a2 b2 dx(2,2)
= (a1 b2 − a2 b1 )dx(1,2) .
Of course, it should not be altogether surprising that the determinant of the coefficient matrix $\begin{pmatrix} a_1 & a_2 \\ b_1 & b_2 \end{pmatrix}$ has emerged here. ▽
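The rule dxI ∧ dxJ = dx(I,J) , extended by bilinearity, is easy to mechanize. The following minimal sketch (plain Python; purely illustrative and not part of the text's development) stores a form as a dictionary from increasing index tuples to coefficients and wedges by concatenating tuples and sorting with the appropriate sign; it reproduces the computation of Example 4.

```python
from itertools import product

def normalize(I):
    """Sort the index tuple I, counting exchanges (transpositions) to track the
    sign; return (sign, increasing tuple), or (0, None) if an index repeats."""
    if len(set(I)) < len(I):
        return 0, None
    I, sign = list(I), 1
    for i in range(len(I)):
        m = min(range(i, len(I)), key=I.__getitem__)
        if m != i:
            I[i], I[m] = I[m], I[i]   # each exchange flips the sign
            sign = -sign
    return sign, tuple(I)

def wedge(omega, eta):
    """omega, eta: dicts {increasing index tuple: coefficient}."""
    result = {}
    for (I, a), (J, b) in product(omega.items(), eta.items()):
        sign, K = normalize(I + J)
        if sign != 0:
            result[K] = result.get(K, 0) + sign * a * b
    return result

# Example 4 with a1, a2, b1, b2 = 2, 3, 5, 7: the answer is (a1*b2 - a2*b1) dx(1,2).
omega = {(1,): 2, (2,): 3}
eta   = {(1,): 5, (2,): 7}
print(wedge(omega, eta))              # {(1, 2): -1}, since 2*7 - 3*5 = -1
```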

Proposition 2.1. The wedge product enjoys the following properties.


(1) It is bilinear: (ω + φ) ∧ η = ω ∧ η + φ ∧ η and (cω) ∧ η = c(ω ∧ η).
(2) It is skew-commutative: ω ∧ η = (−1)kℓ η ∧ ω, when ω ∈ Λk (Rn )∗ and η ∈ Λℓ (Rn )∗ .
(3) It is associative: (ω ∧ η) ∧ φ = ω ∧ (η ∧ φ).

Proof. (1) and (3) are obvious from the definition. For (2), we observe that to change the
ordered (k + ℓ)-tuple (i1 , . . . , ik , j1 , . . . , jℓ ) to the ordered (k + ℓ)-tuple (j1 , . . . , jℓ , i1 , . . . , ik ) requires
kℓ exchanges: to move j1 past i1 , . . . , ik requires k exchanges, to move j2 past i1 , . . . , ik requires k
more, and so on. 

Now that we’ve established associativity, we can make the crucial observation that
dxi ∧ dxj = dx(i,j) and, moreover,
dxi1 ∧ dxi2 ∧ · · · ∧ dxik = dx(i1 ,...,ik ) .
As has been our custom throughout this text, when we work in R3 , it is often more convenient to
write x, y, z for x1 , x2 , x3 .

2.2. Differential forms on Rn and the exterior derivative. A (differential) 0-form on Rn


is a smooth function. An n-form on Rn is an expression of the form2
ω = f (x)dx1 ∧ · · · ∧ dxn
for some smooth function f . As we shall soon see, these (rather than functions) are precisely what
it makes sense to integrate over regions in Rn . A (differential) k-form on Rn is an expression
$$\omega = \sum_{\text{increasing } k\text{-tuples } I} f_I(x)\, dx_I = \sum_{i_1 < \cdots < i_k} f_I\, dx_{i_1} \wedge \cdots \wedge dx_{i_k}$$

for some smooth functions fI . (Remember that the dxI with I increasing give a basis for Λk (Rn )∗ .)
As usual, if k > n, the only k-form is 0.
We can perform the obvious algebraic manipulations with forms: we can add two k-forms, we
can multiply a k-form by a function, we can form the wedge product of a k-form and an ℓ-form.
The set of k-forms on Rn is naturally a vector space, which we denote by Ak (Rn ).3 For reference
we list the relevant algebraic properties:
Proposition 2.2. Let U ⊂ Rn be an open set. Let ω ∈ Ak (U ), η ∈ Aℓ (U ), and φ ∈ Am (U ).
(1) When k = ℓ = m, ω + η = η + ω and (ω + η) + φ = ω + (η + φ).
(2) ω ∧ η = (−1)kℓ η ∧ ω.
(3) (ω ∧ η) ∧ φ = ω ∧ (η ∧ φ).
(4) When k = ℓ, (ω + η) ∧ φ = (ω ∧ φ) + (η ∧ φ).
Determinants (and hence volume) are already built into the structure of k-forms. As the name
“differential form” suggests, their substantial power comes, however, from our ability to differentiate
them. We begin with the case of a 0-form, i.e., a smooth function f : U → R. Then for any x ∈ U
we want df (x) = Df (x) as a linear map on Rn . In other words, we have
$$df = \sum_{j=1}^{n} \frac{\partial f}{\partial x_j}\, dx_j.$$

In particular, note that if we take f to be the ith coordinate function, then df = dxi and dxi (v) = Dxi (v) = vi , so this explains (in part) our original choice of notation. If $\omega = \sum_I f_I(x)\, dx_I$ is a k-form, then we define
$$d\omega = \sum_I df_I \wedge dx_I = \sum_I \sum_{j=1}^{n} \frac{\partial f_I}{\partial x_j}\, dx_j \wedge dx_{i_1} \wedge \cdots \wedge dx_{i_k}.$$

2
Sorry about that. You think of a better word!
3
For those of you who may see such words in the future, it is in fact a module over the ring of smooth functions.
Indeed, because we can multiply using the wedge product, if we put all the k-forms together, k = 0, 1, . . . , n, we get
what is called a graded algebra.

(Note that for a fixed k-tuple I, only the terms dxj where j is different from i1 , . . . , ik will appear.)

Examples 5.
(a) Suppose f : R → R is smooth. Then we have df = f ′ (x)dx.
(b) Let ω = ydx + xdy ∈ A1 (R2 ). Then dω = dy ∧ dx + dx ∧ dy = 0.
(c) Let ω = −ydx + xdy ∈ A1 (R2 ). Then dω = −dy ∧ dx + dx ∧ dy = 2dx ∧ dy.
(d) Let $\omega = d\left(\arctan \frac{y}{x}\right) = \dfrac{-y\,dx + x\,dy}{x^2 + y^2} \in A^1(\mathbb{R}^2 - \{0\})$. Then
$$d\omega = d\left(-\frac{y}{x^2+y^2}\right) \wedge dx + d\left(\frac{x}{x^2+y^2}\right) \wedge dy = -\frac{\partial}{\partial y}\left(\frac{y}{x^2+y^2}\right) dy \wedge dx + \frac{\partial}{\partial x}\left(\frac{x}{x^2+y^2}\right) dx \wedge dy$$
$$= \frac{(x^2+y^2) - 2y^2}{(x^2+y^2)^2}\, dx \wedge dy + \frac{(x^2+y^2) - 2x^2}{(x^2+y^2)^2}\, dx \wedge dy = 0.$$
(e) Let ω = x1 dx2 + x3 dx4 + x5 dx6 ∈ A1 (R6 ). Then dω = dx1 ∧ dx2 + dx3 ∧ dx4 + dx5 ∧ dx6 .
(f) Let ω = (x2 + eyz )dy ∧ dz + (y 2 + sin(x3 z))dz ∧ dx + (z 2 + arctan(x2 + y 2 ))dx ∧ dy ∈ A2 (R3 ).
Then
dω = 2xdx ∧ dy ∧ dz + 2ydy ∧ dz ∧ dx + 2zdz ∧ dx ∧ dy
= 2(x + y + z)dx ∧ dy ∧ dz. ▽
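Such exterior derivatives are easy to double-check with a computer algebra system. For a 1-form ω = P dx + Q dy on R2 the definition gives dω = (∂Q/∂x − ∂P/∂y) dx ∧ dy, so the following minimal sketch in Python with SymPy (not assumed anywhere in the text) verifies parts (c) and (d) of Examples 5.

```python
import sympy as sp

x, y = sp.symbols('x y')

def d_of_1form(P, Q):
    """For omega = P dx + Q dy on R^2, return the coefficient of dx ^ dy in
    d(omega), namely dQ/dx - dP/dy."""
    return sp.simplify(sp.diff(Q, x) - sp.diff(P, y))

# Example 5(c): omega = -y dx + x dy  gives  d(omega) = 2 dx ^ dy
print(d_of_1form(-y, x))                                # 2

# Example 5(d): omega = (-y dx + x dy)/(x^2 + y^2)  gives  d(omega) = 0
print(d_of_1form(-y/(x**2 + y**2), x/(x**2 + y**2)))    # 0
```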

The operator d, called the exterior derivative, enjoys the following properties:

Proposition 2.3. Let ω ∈ Ak (U ) and η ∈ Aℓ (U ). Let f be a function.


(1) When k = ℓ, we have d(ω + η) = dω + dη.
(2) d(f ω) = df ∧ ω + f dω
(3) d(ω ∧ η) = dω ∧ η + (−1)k ω ∧ dη.
(4) d(dω) = 0.

Proof. (1) and (2) are immediate; indeed, (2) is a consequence of (3). To prove (3), we note
that because d commutes with sums, it suffices to consider the case that ω = f dxI and η = gdxJ .
Then, since the product rule gives d(f g) = gdf + f dg, we have
d(ω ∧ η) = d(f gdxI ∧ dxJ ) = d(f g) ∧ dxI ∧ dxJ
= (gdf + f dg) ∧ dxI ∧ dxJ = gdf ∧ dxI ∧ dxJ + f dg ∧ dxI ∧ dxJ
= (df ∧ dxI ) ∧ (gdxJ ) + (−1)k (f dxI ) ∧ (dg ∧ dxJ )

(since we must switch dg ∈ A1 (U ) and dxI ∈ Ak (U ))

= dω ∧ η + (−1)k ω ∧ dη.
To prove (4), suppose ω = f dxI . Then
$$d\omega = \sum_{j=1}^{n} \frac{\partial f}{\partial x_j}\, dx_j \wedge dx_I$$
and
$$(*) \qquad d(d\omega) = \sum_{i=1}^{n}\sum_{j=1}^{n} \frac{\partial^2 f}{\partial x_i\, \partial x_j}\, dx_i \wedge dx_j \wedge dx_I.$$
Since dxi ∧ dxj = −dxj ∧ dxi , we can rewrite the right-hand side of (∗) as
$$\sum_{i<j} \left( \frac{\partial^2 f}{\partial x_i\, \partial x_j} - \frac{\partial^2 f}{\partial x_j\, \partial x_i} \right) dx_i \wedge dx_j \wedge dx_I.$$
But by Theorem 6.1 of Chapter 3, we have
$$\frac{\partial^2 f}{\partial x_i\, \partial x_j} = \frac{\partial^2 f}{\partial x_j\, \partial x_i},$$
and so this sum is 0, as required. □

2.3. Pullback. All the algebraic and differential structure inherent in differential forms en-
dows them with a very natural behavior under mappings. The main point is to generalize the
procedure of “integration by substitution” familiar to all calculus students: When confronted with the integral $\int_a^b f(g(u))\,g'(u)\,du$, we substitute x = g(u), formally write dx = g′(u)du, and say $\int_a^b f(g(u))\,g'(u)\,du = \int_{g(a)}^{g(b)} f(x)\,dx$. The proof that this works is, of course, the chain rule. Now we put this procedure in the proper setting.

Definition . Let U ⊂ Rm be open, and let g : U → Rn be smooth. If ω ∈ Ak (Rn ), then we


define g∗ ω ∈ Ak (U ) (the pullback of ω by g) as follows. To pull back a function (0-form) f , we just
compose functions:
g∗ f = f ◦ g.
To pull back the basis 1-forms, if g(u) = x, then set
$$g^* dx_i = dg_i = \sum_{j=1}^{m} \frac{\partial g_i}{\partial u_j}\, du_j.$$

Note that the coefficients of g∗ dxi , written as a linear combination of du1 , . . . , dum , are the entries
of the ith row of the derivative matrix of g. Now just let the pullback of a wedge product be the
wedge product of the pullbacks:

g∗ (dxi1 ∧ · · · ∧ dxik ) = dgi1 ∧ · · · ∧ dgik , which we can abbreviate as dgI .

Last, we take the pullback of a sum to be the sum of the pullbacks:
$$g^*\left(\sum_I f_I\, dx_I\right) = \sum_I (f_I \circ g)\, dg_I = \sum_I (f_I \circ g)\, dg_{i_1} \wedge \cdots \wedge dg_{i_k}.$$

Examples 6.
(a) If g : R → R, then g ∗ (f (x)dx) = f (g(u))g ′ (u)du.
(b) Let g : R → R2 be given by
$$g(t) = \begin{pmatrix} \cos t \\ \sin t \end{pmatrix}.$$
Then g∗ dx = − sin t dt and g∗ dy = cos t dt, so g∗ (−ydx + xdy) = (− sin t)(− sin t dt) + (cos t)(cos t dt) = dt.
(c) Let g : R2 → R2 be given by
$$g\begin{pmatrix} u \\ v \end{pmatrix} = \begin{pmatrix} u\cos v \\ u\sin v \end{pmatrix}.$$
If ω = x dx + y dy, then
$$g^*\omega = (u\cos v)(\cos v\,du - u\sin v\,dv) + (u\sin v)(\sin v\,du + u\cos v\,dv) = u(\cos^2 v + \sin^2 v)\,du + u^2(-\cos v\sin v + \cos v\sin v)\,dv = u\,du.$$
Moreover,
$$g^*(dx \wedge dy) = g^*dx \wedge g^*dy = (\cos v\,du - u\sin v\,dv) \wedge (\sin v\,du + u\cos v\,dv) = u(\cos^2 v + \sin^2 v)\,du \wedge dv = u\,du \wedge dv,$$
so $g^*\big(e^{-(x^2+y^2)}\, dx \wedge dy\big) = u e^{-u^2}\, du \wedge dv$.
(d) Let g : R2 → R3 be given by
$$g\begin{pmatrix} u \\ v \end{pmatrix} = \begin{pmatrix} u\cos v \\ u\sin v \\ v \end{pmatrix}.$$
Then
$$g^*dx = \cos v\,du - u\sin v\,dv, \qquad g^*dy = \sin v\,du + u\cos v\,dv, \qquad g^*dz = dv,$$
and so
$$g^*(dx \wedge dy) = u\,du \wedge dv, \qquad g^*(dx \wedge dz) = \cos v\,du \wedge dv, \qquad g^*(dy \wedge dz) = \sin v\,du \wedge dv.$$
Therefore, if ω = (x2 + y 2 )dx ∧ dy + x dx ∧ dz + y dy ∧ dz, then we have
$$g^*\omega = u^2(u\,du \wedge dv) + (u\cos v)(\cos v\,du \wedge dv) + (u\sin v)(\sin v\,du \wedge dv) = u(u^2 + 1)\,du \wedge dv. \qquad \triangledown$$
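Since pulling back is just substitution combined with the chain rule, these computations can also be checked symbolically. The sketch below (Python with SymPy; illustrative only) recomputes part (c) of Examples 6, using the fact (made explicit just below) that g∗ (dx ∧ dy) = det(Dg) du ∧ dv for a map of two variables.

```python
import sympy as sp

u, v = sp.symbols('u v')

# g(u, v) = (u cos v, u sin v), as in part (c) of Examples 6
g = sp.Matrix([u * sp.cos(v), u * sp.sin(v)])

# g*(dx ^ dy) = det(Dg) du ^ dv
print(sp.simplify(g.jacobian([u, v]).det()))      # u

# g*(x dx + y dy): substitute x = u cos v, y = u sin v, dx = dg1, dy = dg2
x_expr, y_expr = g
dg1 = sp.Matrix([sp.diff(x_expr, u), sp.diff(x_expr, v)])   # (du, dv)-coefficients of dg1
dg2 = sp.Matrix([sp.diff(y_expr, u), sp.diff(y_expr, v)])
print(sp.simplify(x_expr * dg1 + y_expr * dg2).T)           # Matrix([[u, 0]]), i.e. u du
```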

It is impossible to miss the appearance of determinants of the derivative matrix in the calculation we just performed. Indeed, if I is an ordered k-tuple,
$$g^* dx_I = \sum_{\text{increasing } k\text{-tuples } J} \det\left(\frac{\partial g_I}{\partial u_J}\right) du_J; \qquad \text{i.e.,}$$
$$g^*\big(dx_{i_1} \wedge \cdots \wedge dx_{i_k}\big) = \sum_{1 \le j_1 < \cdots < j_k \le m} \begin{vmatrix} \dfrac{\partial g_{i_1}}{\partial u_{j_1}} & \cdots & \dfrac{\partial g_{i_1}}{\partial u_{j_k}} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial g_{i_k}}{\partial u_{j_1}} & \cdots & \dfrac{\partial g_{i_k}}{\partial u_{j_k}} \end{vmatrix}\; du_{j_1} \wedge \cdots \wedge du_{j_k}.$$

We need one last technical result before we turn to integrating.

Proposition 2.4. Let U ⊂ Rm be open, and let g : U → Rn be smooth. If ω ∈ Ak (Rn ), then

g∗ (dω) = d(g∗ ω).

Proof. The statement for k = 0 is the chain rule (Theorem 3.2 of Chapter 3):
$$d(g^*f) = d(f \circ g) = \sum_{j=1}^{m}\left(\sum_{i=1}^{n}\Big(\frac{\partial f}{\partial x_i}\circ g\Big)\frac{\partial g_i}{\partial u_j}\right) du_j = \sum_{i=1}^{n}\Big(\frac{\partial f}{\partial x_i}\circ g\Big)\left(\sum_{j=1}^{m}\frac{\partial g_i}{\partial u_j}\, du_j\right) = \sum_{i=1}^{n}\Big(\frac{\partial f}{\partial x_i}\circ g\Big)\, g^*dx_i = \sum_{i=1}^{n} g^*\Big(\frac{\partial f}{\partial x_i}\, dx_i\Big) = g^*(df).$$
Since the pullback of a wedge product is the wedge product of the pullbacks, we infer that g∗ (dxI ) = dgI . Because d and pullback are linear, it suffices to prove the result for ω = f dxI . Well,
$$g^*\big(d(f\,dx_I)\big) = g^*\big(df \wedge dx_I\big) = g^*(df) \wedge g^*(dx_I) = g^*(df) \wedge dg_I = d(g^*f) \wedge dg_I = d\big((g^*f)\, dg_I\big) = d\big(g^*(f\,dx_I)\big).$$

(Notice that at the penultimate step we use the rule for differentiating the wedge product and the
fact that d(dgi ) = 0.) 

Now we come to integration. Given an n-form ω = f (x)dx1 ∧ · · · ∧ dxn on a region Ω ⊂ Rn , we define
$$\int_\Omega \omega = \int_\Omega f\, dV.$$
Note that since f is smooth, it is continuous, and hence integrable on any region Ω. It is very
important to emphasize here that the n-form ω must be written as a functional multiple of the
standard n-form dx1 ∧ · · · ∧ dxn .
In some sense, the whole point of differential forms is the following restatement of the Change
of Variables Theorem:

Proposition 2.5. Let Ω ⊂ Rn be a region, and let g : Ω → Rn be smooth, one-to-one, with det(Dg) > 0. Then for any n-form ω = f dx1 ∧ · · · ∧ dxn on S = g(Ω), we have
$$\int_S \omega = \int_\Omega g^*\omega.$$

Let Ω ⊂ Rk be a region, and let g : Ω → Rn be a smooth one-to-one map whose derivative has rank k at every point. (Actually, it is allowed to have lesser rank on a set of volume 0, but we won’t bother with this now.) We say that M = g(Ω) ⊂ Rn is a parametrized k-dimensional manifold. If ω is a k-form on Rn , we define
$$\int_M \omega = \int_\Omega g^*\omega.$$
If g1 : Ω1 → Rn and g2 : Ω2 → Rn are two parametrizations of the same k-manifold M , it can be checked that g2−1 ◦ g1 is smooth. Then, provided det D(g2−1 ◦ g1 ) > 0 (which, as we shall soon see, means that g1 and g2 parametrize M with the same orientation),
$$\int_{\Omega_2} g_2^*\omega = \int_{\Omega_1} (g_2^{-1}\circ g_1)^*(g_2^*\omega) \qquad\text{(by Proposition 2.5)}$$
$$= \int_{\Omega_1} \big(g_2 \circ (g_2^{-1}\circ g_1)\big)^*\omega \qquad\text{(see Exercise 16)}$$
$$= \int_{\Omega_1} \big((g_2 \circ g_2^{-1})\circ g_1\big)^*\omega \qquad\text{(by associativity)}$$
$$= \int_{\Omega_1} g_1^*\omega.$$

That is, the integral of ω over the (oriented) parametrized manifold M is well-defined.

EXERCISES 8.2

1. Prove that as I ranges over all increasing k-tuples, the dxI form a linearly independent set in Λk (Rn )∗ . Also check that for any T ∈ Λk (Rn )∗ , $T = \sum_{I \text{ increasing}} a_I\, dx_I$, where aI = T (ei1 , . . . , eik ).

2. a. Suppose ω ∈ Λk (Rn )∗ and k is odd. Prove that ω ∧ ω = 0.


b. Give an example to show that the result of part a need not hold when k is even.

3. Suppose v, w ∈ R3 . Show that dx(v × w) = dy ∧ dz(v, w), dy(v × w) = dz ∧ dx(v, w), and
dz(v × w) = dx ∧ dy(v, w).

4. Simplify the following expressions:


*a. (2dx + 3dy + 4dz) ∧ (dx − dy + 2dz)
b. (dx + dy − dz) ∧ (dx + 2dy + dz) ∧ (dx − 2dy + dz)
*c. (2dx ∧ dy + dy ∧ dz) ∧ (3dx − dy + 4dz)
d. (dx1 ∧ dx2 + dx3 ∧ dx4 ) ∧ (dx1 ∧ dx2 + dx3 ∧ dx4 )
e. (dx1 ∧ dx2 + dx3 ∧ dx4 + dx5 ∧ dx6 ) ∧ (dx1 ∧ dx2 + dx3 ∧ dx4 + dx5 ∧ dx6 ) ∧ (dx1 ∧ dx2 +
dx3 ∧ dx4 + dx5 ∧ dx6 )
♯ 5. Let n ∈ R3 be a unit vector, and let v and w be orthogonal to n. Let

φ = n1 dy ∧ dz + n2 dz ∧ dx + n3 dx ∧ dy.

Prove that φ(v, w) is equal to the signed area of the parallelogram spanned by v and w (the
sign being determined by whether n, v, w form a right-handed system for R3 ).

*6. Calculate the exterior derivatives of the following differential forms:



a. ω = exy dx
b. ω = z 2 dx + x2 dy + y 2 dz
c. ω = x2 dy ∧ dz + y 2 dz ∧ dx + z 2 dx ∧ dy
d. ω = x1 x2 dx3 ∧ dx4

*7. Can there be a function f so that df is the given 1-form ω (everywhere ω is defined)? If so,
can you find f ?
a. ω = −ydx + xdy
b. ω = 2xydx + x2 dy
c. ω = ydx + zdy + xdz
d. ω = (x2 + yz)dx + (xz + cos y)dy + (z + xy)dz
e. ω = x/(x2 + y 2 ) dx + y/(x2 + y 2 ) dy
f. ω = −y/(x2 + y 2 ) dx + x/(x2 + y 2 ) dy

8. For each of the following k-forms ω, can there be a (k − 1)-form η (defined wherever ω is) so
that dη = ω?
a. ω = dx ∧ dy
b. ω = xdx ∧ dy
c. ω = zdx ∧ dy
d. ω = zdx ∧ dy + ydx ∧ dz + zdy ∧ dz
e. ω = xdy ∧ dz + ydx ∧ dz + zdx ∧ dy
f. ω = (x2 + y 2 + z 2 )−1 (xdy ∧ dz + ydz ∧ dx + zdx ∧ dy)
g. ω = x5 dx1 ∧ dx2 ∧ dx3 ∧ dx4 + x1 dx2 ∧ dx4 ∧ dx3 ∧ dx5
♯ 9. (The star operator)
a. Define ⋆ : A1 (R2 ) → A1 (R2 ) by ⋆dx = dy and ⋆dy = −dx, extending by linearity. If f is a
smooth function, show that
$$d\!\star\!(df) = \left(\frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2}\right) dx \wedge dy.$$

b. Define ⋆ : A1 (R3 ) → A2 (R3 ) by ⋆dx = dy ∧ dz, ⋆dy = dz ∧ dx, and ⋆dz = dx ∧ dy, extending
by linearity. If f is a smooth function, show that
$$d\!\star\!(df) = \left(\frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2} + \frac{\partial^2 f}{\partial z^2}\right) dx \wedge dy \wedge dz.$$

(Note that we can generalize the definition of the star operator by declaring that, in Rn , ⋆ of
a basis 1-form φ = dxi is the “complementary” (n − 1)-form, subject to the sign requirement
that φ ∧ ⋆φ = dx1 ∧ · · · ∧ dxn .)

10. Suppose ω ∈ A1 (Rn ) and there is a nowhere-zero function λ so that λω is the exterior derivative
of some function f . Prove that ω ∧ dω = 0. (This problem gives a useful criterion for deciding
whether the differential equation ω = 0 has an integrating factor λ.)

11. In each case, calculate the pullback g∗ ω and simplify your answer as much as possible.
a. g : (−π/2, π/2) → R, g(u) = sin u, ω = dx/√(1 − x2 )
*b. g : R → R2 , g(v) = (3 cos 2v, 3 sin 2v), ω = −ydx + xdy
c. g : R2 → R2 , g(u, v) = (3u cos 2v, 3u sin 2v), ω = −ydx + xdy
d. g : R2 → R3 , g(u, v) = (cos u, sin u, v), ω = zdx + xdy + ydz
*e. g : R2 → R3 , g(u, v) = (cos u, sin u, v), ω = zdx ∧ dy + ydz ∧ dx
f. g : R2 → R4 , g(u, v) = (cos u, sin v, sin u, cos v), ω = x2 dx1 + x3 dx4
g. g : R2 → R4 , g(u, v) = (cos u, sin v, sin u, cos v), ω = x1 dx3 − x2 dx4
h. g : R2 → R4 , g(u, v) = (cos u, sin v, sin u, cos v), ω = (−x3 dx1 + x1 dx3 ) ∧ (−x2 dx4 + x4 dx2 )

12. For each part of Exercise 11, calculate g∗ (dω) and d(g∗ ω) and compare your answers.

13. Let g : (0, ∞) × (0, π) × (0, 2π) → R3 be the usual spherical coordinates mapping given on
p. 282. Compute g∗ (dx ∧ dy ∧ dz).
♯ 14. We say a k-form ω is closed if dω = 0 and exact if ω = dη for some (k − 1)-form η.
a. Prove that an exact form is closed. Is every closed form exact? (Hint: Work with Example
5(d).)
b. Prove that if ω and φ are closed, then ω ∧ φ is closed.
c. Prove that if ω is exact and φ is closed, then ω ∧ φ is exact.
15. Suppose k ≤ n. Let ω1 , . . . , ωk ∈ (Rn )∗ and suppose that $\sum_{i=1}^{k} dx_i \wedge \omega_i = 0$. Prove that there are scalars aij such that aij = aji and $\omega_i = \sum_{j=1}^{k} a_{ij}\, dx_j$.

16. Suppose $\mathbb{R}^\ell \xrightarrow{\;h\;} \mathbb{R}^m \xrightarrow{\;g\;} \mathbb{R}^n$. Prove that (g ◦ h)∗ = h∗ ◦ g∗ . (Hint: It suffices to prove (g ◦ h)∗ dxi = h∗ (g∗ dxi ). Why?)

17. a. Suppose I = (i1 , . . . , in ) is an ordered n-tuple and I < = (1, 2, . . . , n). Then we can
define a permutation σ of the numbers 1, . . . , n by σ(j) = ij , j = 1, . . . , n. Show that
dxI = sign(σ)dx1 ∧ · · · ∧ dxn .
b. Suppose $\omega_i = \sum_{j=1}^{n} a_{ij}\, dx_j$, i = 1, . . . , n, are 1-forms on Rn . Use Proposition 5.18 of Chapter 7 to prove that ω1 ∧ · · · ∧ ωn = (det A)dx1 ∧ · · · ∧ dxn .

c. Suppose g : Rn → Rn is smooth. Show that dg1 ∧ · · · ∧ dgn = det(Dg)dx1 ∧ · · · ∧ dxn .

18. Suppose φ1 , . . . , φk ∈ (Rn )∗ and v1 , . . . , vk ∈ Rn . Prove that
$$\phi_1 \wedge \cdots \wedge \phi_k\,(v_1, \ldots, v_k) = \det\big[\phi_i(v_j)\big].$$
(Hints: First of all, it suffices to check this holds when the vj are standard basis vectors. Why? Write out the φi as linear combinations of the dxj , $\phi_i = \sum_{j=1}^{n} a_{ij}\, dx_j$, and show that both sides of the desired equality are
$$\begin{vmatrix} a_{1 j_1} & \cdots & a_{1 j_k} \\ \vdots & \ddots & \vdots \\ a_{k j_1} & \cdots & a_{k j_k} \end{vmatrix}$$
when we take v1 = ej1 , . . . , vk = ejk .)

19. Suppose U ⊂ Rm is open and g : U → Rn is smooth. Prove that for any ω ∈ Ak (Rn ) and v1 , . . . , vk ∈ Rm , we have
$$g^*\omega(a)(v_1, \ldots, v_k) = \omega(g(a))\big(Dg(a)v_1, \ldots, Dg(a)v_k\big).$$
(Hint: Consider ω = dxI .)

20. Prove that there is a unique linear operator d mapping Ak (U ) → Ak+1 (U ) for all k that satisfies the properties in Proposition 2.3 and $df = \sum_{j=1}^{n} \frac{\partial f}{\partial x_j}\, dx_j$. (This tells us that, appearances to the contrary notwithstanding, the exterior derivative d does not depend on our coordinate system.)

3. Line Integrals and Green’s Theorem


We begin with a 1-form $\omega = \sum F_i\, dx_i$ on Rn and a parametrized curve C, given by a C1 function g : [a, b] → Rn (ordinarily with g′ ≠ 0). Then we define
$$\int_C \omega = \int_{[a,b]} g^*\omega = \int_a^b \sum_{i=1}^{n} F_i(g(t))\, g_i'(t)\, dt.$$

Now we define a vector field (vector-valued function) F : Rn → Rn by
$$\mathbf{F} = \begin{pmatrix} F_1 \\ F_2 \\ \vdots \\ F_n \end{pmatrix}.$$
We then recognize that
$$\int_C \omega = \int_a^b \mathbf{F}(g(t)) \cdot g'(t)\, dt = \int_a^b \mathbf{F}(g(t)) \cdot \frac{g'(t)}{\|g'(t)\|}\, \|g'(t)\|\, dt = \int_C \mathbf{F} \cdot \mathbf{T}\, ds,$$
where ds is classically called the “element of arclength” on C and T is the unit tangent vector (see
Section 5 of Chapter 3). The most general path over which we’ll be integrating will be a finite
union of C1 paths, as above. In particular, we say the path C is piecewise-C1 if C = C1 ∪ · · · ∪ Cs ,
where Cj is the image of the C1 function gj : [aj , bj ] → Rn .
Remark. Consider the curve C − parametrized by h : [a, b] → Rn , h(u) = g(a + b − u). Then
$$\int_{[a,b]} h^*\omega = \int_a^b \mathbf{F}(h(u)) \cdot h'(u)\, du = \int_a^b \mathbf{F}(g(a+b-u)) \cdot \big(-g'(a+b-u)\big)\, du$$
$$= -\int_a^b \mathbf{F}(g(t)) \cdot g'(t)\, dt \qquad \text{(substituting } t = a+b-u\text{)}$$
$$= -\int_{[a,b]} g^*\omega.$$

Note that h(a) = g(b) and h(b) = g(a): when we go backwards on C, the integral of ω changes
sign. We can think of obtaining C − by reversing the orientation (or direction) of C.
In comparing C and C − , the unit tangent vector T reverses direction, so that F · T changes
sign, but ds does not. That is, the notation notwithstanding, ds is not a 1-form, as its value on a
tangent vector to C is the length of that tangent vector; this, in turn, is not a linear function of
tangent vectors! It would probably be better to write |ds|.
   
1 2
Example 1. Let C be the line segment from  −1  to  2 , and let ω = xydz. We wish to
0 2
Z
calculate ω. The first step is to parametrize C:
C
 
1+t
g(t) =  −1 + 3t  , 0 ≤ t ≤ 1.
2t
Then
Z Z Z 1 Z 1

1
ω= g ω= (1 + t)(−1 + 3t)(2dt) = 2 (3t2 + 2t − 1)dt = 2(t3 + t2 − t) 0 = 2. ▽
C [0,1] 0 0

Example 2. Let ω = −ydx + xdy. Consider two parametrized curves C1 and C2 , as shown in Figure 3.1, starting at $A = \begin{pmatrix}1\\0\end{pmatrix}$ and ending at $B = \begin{pmatrix}0\\1\end{pmatrix}$, and parametrized respectively by
$$g(t) = \begin{pmatrix}\cos t\\ \sin t\end{pmatrix}, \quad 0 \le t \le \frac{\pi}{2}, \qquad\text{and}\qquad h(t) = \begin{pmatrix}1-t\\ t\end{pmatrix}, \quad 0 \le t \le 1.$$

Figure 3.1

$$\int_{C_1} \omega = \int_{[0,\pi/2]} g^*\omega = \int_0^{\pi/2} (-\sin t)(-\sin t\,dt) + (\cos t)(\cos t\,dt) = \int_0^{\pi/2} 1\, dt = \frac{\pi}{2};$$
$$\int_{C_2} \omega = \int_{[0,1]} h^*\omega = \int_0^1 (-t)(-dt) + (1-t)(dt) = \int_0^1 1\, dt = 1.$$
Thus, we see that $\int_A^B \omega$ depends not just on the endpoints of the path, but on the particular path joining them. ▽
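A quick numerical check of these two answers (π/2 and 1) only requires pulling back and doing one-variable quadrature. Here is a minimal sketch using Python with NumPy and SciPy (none of which, of course, is assumed by the text):

```python
import numpy as np
from scipy.integrate import quad

def line_integral(F, g, dg, a, b):
    """Compute int_a^b F(g(t)) . g'(t) dt for the 1-form with coefficient vector F."""
    value, _ = quad(lambda t: np.dot(F(g(t)), dg(t)), a, b)
    return value

F = lambda p: np.array([-p[1], p[0]])                  # omega = -y dx + x dy

# C1: quarter circle from (1, 0) to (0, 1)
print(line_integral(F, lambda t: np.array([np.cos(t), np.sin(t)]),
                       lambda t: np.array([-np.sin(t), np.cos(t)]), 0, np.pi/2))  # ~1.5708
# C2: straight segment from (1, 0) to (0, 1)
print(line_integral(F, lambda t: np.array([1 - t, t]),
                       lambda t: np.array([-1.0, 1.0]), 0, 1))                    # ~1.0
```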

Recall from your integral calculus (or introductory physics) class the definition of work done by
a force in displacing an object. When the force and the displacement are parallel, the definition is
work = force × displacement,
and in general only the component of the force vector F in the direction of the displacement vector
d is considered to do work, so
work = F · d.
If a vector field F moves a particle along a parametrized curve C, then it is reasonable to suggest that the total work should be $\int_C \mathbf{F} \cdot \mathbf{T}\, ds$: Instantaneously the particle moves in the direction of T and only the component of F in that direction should contribute. Without providing complete rigor, we see from Figure 3.2 that the amount of work done by the force in moving the particle along C during a very small time interval [t, t + h] is approximately $\mathbf{F}(g(t)) \cdot \big(g(t+h) - g(t)\big) \approx \mathbf{F}(g(t)) \cdot g'(t)\, h$, which suggests that the total work should be given by $\int_a^b \mathbf{F}(g(t)) \cdot g'(t)\, dt$.

Figure 3.2

Example 3. What is the relation between work and energy? As we saw in Section 4 of Chapter 7, the kinetic energy of a particle with mass m and velocity v is defined to be K.E. = ½ m‖v‖2 . Suppose a particle with mass m moves along a curve C, its position at time t being given by g(t), t ∈ [a, b]. Then the work done by the force field F on the particle is given by
$$\text{work} = \int_C \mathbf{F} \cdot \mathbf{T}\, ds = \int_a^b \mathbf{F}(g(t)) \cdot g'(t)\, dt = \int_a^b m\, g''(t) \cdot g'(t)\, dt \qquad\text{(by Newton's second law of motion)}$$
$$= m \int_a^b \left(\tfrac{1}{2}\|g'\|^2\right)'(t)\, dt = \tfrac{1}{2} m \big(\|g'(b)\|^2 - \|g'(a)\|^2\big) \qquad\text{(by the Fundamental Theorem of Calculus)}$$
$$= \Delta\big(\tfrac{1}{2} m \|\mathbf{v}\|^2\big) = \Delta(\text{K.E.}).$$

That is, assuming F is the only force acting on the particle, the work done in moving it along a
path is the particle’s change in kinetic energy along that path. ▽

3.1. The Fundamental Theorem of Calculus for Line Integrals.

Proposition 3.1. Suppose ω = df for some C1 function f . Then for any path (i.e., piecewise-C1 manifold) C starting at A and ending at B, we have
$$\int_C \omega = f(B) - f(A).$$
Equivalently, when F = ∇f , we have
$$\int_C \mathbf{F} \cdot \mathbf{T}\, ds = f(B) - f(A).$$

Proof. It follows from Theorem 3.1 of Chapter 6 that any C1 segment of C is a finite union of parametrized curves Cj , j = 1, . . . , s, where Cj is the image of a C1 function gj : [aj , bj ] → Rn . Let gj (aj ) = Aj and gj (bj ) = Bj . We may arrange that A1 = A, Bj = Aj+1 , j = 1, . . . , s − 1, and Bs = B. It suffices to prove the result for Cj , for then we will have
$$\int_C \omega = \sum_{j=1}^{s} \int_{C_j} \omega = \sum_{j=1}^{s} \big(f(B_j) - f(A_j)\big) = f(B) - f(A).$$
Now, we have
$$\int_{C_j} \omega = \int_{a_j}^{b_j} g_j^*\omega \qquad\text{(by definition)}$$
$$= \int_{a_j}^{b_j} g_j^*(df) = \int_{a_j}^{b_j} d(g_j^* f) \qquad\text{(since $d$ commutes with pullback)}$$
$$= \int_{a_j}^{b_j} d(f \circ g_j) \qquad\text{(by definition of pullback)}$$
$$= \int_{a_j}^{b_j} (f \circ g_j)'(t)\, dt = (f \circ g_j)(b_j) - (f \circ g_j)(a_j) \qquad\text{(by the Fundamental Theorem of Calculus)}$$
$$= f(B_j) - f(A_j),$$
as required. Note that the proof amounts merely to applying the standard Fundamental Theorem of Calculus, along with the definition of line integration by pullback. The fact that d commutes with pullback, in this instance, is simply the chain rule. □
Theorem 3.2. Let $\omega = \sum F_i\, dx_i$ be a 1-form (or let F be the corresponding force field) on an open subset U ⊂ Rn . The following are equivalent:
(1) $\oint_C \omega = 0$ for every closed curve C ⊂ U .
(2) $\int_A^B \omega$ is path-independent in U .
(3) ω = df (or F = ∇f ) for some potential function f on U .
Remark. Note that we are using the notation $\oint_C \omega$ to denote the integral of ω around the closed curve (or loop) C. This notation is prevalent in physics texts. Next, in light of Example 3, there is no net work done by F around closed paths, so that kinetic energy is conserved. This is why such force fields are called conservative. Physicists refer to −f as the potential energy (P.E.). It then follows from Proposition 3.1 that the total energy, K.E. + P.E., is conserved along all curves: for
$$\Delta(\text{K.E.}) = \text{work} = f(B) - f(A) = -\Delta(\text{P.E.}), \qquad\text{and so}\qquad \Delta(\text{K.E.} + \text{P.E.}) = 0.$$

Proof. (1) ⟹ (2): If C1 and C2 are two paths from A to B, then C = C1 ∪ C2− is a closed curve, as indicated in Figure 3.3(a). Then
$$0 = \int_C \omega = \int_{C_1} \omega - \int_{C_2} \omega \implies \int_{C_1} \omega = \int_{C_2} \omega.$$

Figure 3.3

(2) ⟹ (3): (Here we assume any two points of U can be joined by a path. If not, one must repeat the argument on each connected “piece” of U .) Fix a ∈ U , and define f : U → R by
$$f(\mathbf{x}) = \int_{\mathbf{a}}^{\mathbf{x}} \omega, \qquad\text{where the integral is computed along any path from } \mathbf{a} \text{ to } \mathbf{x}.$$
By path-independence, f is well-defined. Now, to show that df = ω, we must evidently establish that $\frac{\partial f}{\partial x_i}(\mathbf{x}) = F_i(\mathbf{x})$. Now, as Figure 3.3(b) suggests,
$$\frac{\partial f}{\partial x_i}(\mathbf{x}) = \lim_{h\to 0} \frac{1}{h}\big(f(\mathbf{x} + h\mathbf{e}_i) - f(\mathbf{x})\big) = \lim_{h\to 0} \frac{1}{h}\int_{\mathbf{x}}^{\mathbf{x}+h\mathbf{e}_i} \omega = \lim_{h\to 0} \frac{1}{h}\int_0^h F_i(\mathbf{x} + t\mathbf{e}_i)\, dt = F_i(\mathbf{x})$$
by the usual Fundamental Theorem of Calculus.
(3) ⟹ (1): This is immediate from Proposition 3.1. □

Remark. We know that when ω = df , it must be the case that dω = 0. That is, a necessary
condition for the 1-form ω to be exact is that it be closed. As we saw in Example 5(d) of Section
2, the condition is definitely not sufficient. We shall soon see that the topology of the region on
which ω is defined is relevant.
3.2. Finding a potential function. If we know that $\int \omega$ is path-independent on a region, then we can construct a potential function by choosing a convenient path. We illustrate the general principle with some examples.
 
Examples 4. Let ω = (ex + 2xy) dx + (x2 + cos y) dy. We show two different ways to calculate a potential function f , i.e., a function f with df = ω.

Figure 3.4

(a) Take the line segment C joining $\mathbf{0} = \begin{pmatrix}0\\0\end{pmatrix}$ and $\mathbf{x}_0 = \begin{pmatrix}x_0\\y_0\end{pmatrix}$; we take the obvious parametrization:
$$g(t) = t\mathbf{x}_0 = \begin{pmatrix} t x_0 \\ t y_0 \end{pmatrix}, \qquad 0 \le t \le 1.$$
Then
$$f\begin{pmatrix}x_0\\y_0\end{pmatrix} = \int_{\mathbf{0}}^{\mathbf{x}_0} \omega = \int_{[0,1]} g^*\omega = \int_0^1 \Big( (e^{t x_0} + 2t^2 x_0 y_0)\, x_0 + (t^2 x_0^2 + \cos(t y_0))\, y_0 \Big)\, dt$$
$$= \Big[ e^{t x_0} + \tfrac{2}{3} t^3 x_0^2 y_0 + \tfrac{1}{3} t^3 x_0^2 y_0 + \sin(t y_0) \Big]_0^1 = e^{x_0} + x_0^2 y_0 + \sin y_0 - 1,$$
and so we set $f\begin{pmatrix}x\\y\end{pmatrix} = e^x + x^2 y + \sin y - 1$, and it is easy to check that df = ω.
(b) Now we take the two-step path, as shown in Figure 3.4(b), first varying x and then varying y, to get from 0 to x0 . That is, we have the two parametrizations:
$$C_1: \; g_1(t) = \begin{pmatrix} t \\ 0 \end{pmatrix}, \quad 0 \le t \le x_0, \qquad\qquad C_2: \; g_2(t) = \begin{pmatrix} x_0 \\ t \end{pmatrix}, \quad 0 \le t \le y_0.$$
Then we have
$$f\begin{pmatrix}x_0\\y_0\end{pmatrix} = \int_{C_1} \omega + \int_{C_2} \omega = \int_0^{x_0} e^t\, dt + \int_0^{y_0} \big(x_0^2 + \cos t\big)\, dt = (e^{x_0} - 1) + (x_0^2 y_0 + \sin y_0).$$
Once again, we have $f\begin{pmatrix}x\\y\end{pmatrix} = e^x - 1 + x^2 y + \sin y$.
(c) As a variation on the approach of part (b), we proceed purely by antidifferentiating. If we seek a function f with df = ω, then this means that
$$(*) \qquad \frac{\partial f}{\partial x} = e^x + 2xy \qquad\text{and}\qquad \frac{\partial f}{\partial y} = x^2 + \cos y.$$
Integrating the first equation, holding y fixed, we obtain
$$(\dagger) \qquad f\begin{pmatrix}x\\y\end{pmatrix} = \int (e^x + 2xy)\, dx = e^x + x^2 y + h(y)$$
for some arbitrary function h (this is the “constant of integration”). Differentiating (†) with respect to y and comparing with the latter equation in (∗), we find
$$\frac{\partial f}{\partial y} = x^2 + h'(y) = x^2 + \cos y,$$
whence h′ (y) = cos y and h(y) = sin y + C. Thus, the general potential function is $f\begin{pmatrix}x\\y\end{pmatrix} = e^x + x^2 y + \sin y + C$ for any constant C.
Note that even though it is computationally more clumsy, the approach in (a) requires only that
we be able to draw a line segment from the “base point” (in this case, the origin) to all the other
points of our region. The approaches in (b) and (c) require some further sort of convexity: we must
be able to start at our base point and reach every other point by a path that is first horizontal and
then vertical. ▽
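Approach (c) is easy to mechanize with a computer algebra system. The following minimal sketch (Python with SymPy; illustrative only, and not part of the text's algorithm) carries out the same antidifferentiation for the 1-form of Examples 4.

```python
import sympy as sp

x, y = sp.symbols('x y')
F1 = sp.exp(x) + 2*x*y        # coefficient of dx
F2 = x**2 + sp.cos(y)         # coefficient of dy

# The form had better be closed: dF1/dy = dF2/dx.
assert sp.simplify(sp.diff(F1, y) - sp.diff(F2, x)) == 0

f0 = sp.integrate(F1, x)                     # integrate in x, holding y fixed
h_prime = sp.simplify(F2 - sp.diff(f0, y))   # compare with F2 to find h'(y)
f = f0 + sp.integrate(h_prime, y)            # exp(x) + x**2*y + sin(y)
print(f)

# Sanity check: df reproduces the original 1-form.
assert sp.simplify(sp.diff(f, x) - F1) == 0 and sp.simplify(sp.diff(f, y) - F2) == 0
```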

We now prove a general result along these lines: Suppose an open subset U ⊂ Rn has the
property that for some point a ∈ U , the line segment from a to each and every point x ∈ U lies
entirely in U . (Such a region is called star-shaped with respect to a, as Figure 3.5 suggests.) Then

Figure 3.5

we have:
Proposition 3.3. Let ω be a closed 1-form on a star-shaped region. Then ω is exact.

Proof. Write $\omega = \sum F_i\, dx_i$. For any x ∈ U , we can parametrize the line segment from a to x by
$$g(t) = \mathbf{a} + t(\mathbf{x} - \mathbf{a}), \qquad 0 \le t \le 1.$$
Then we have
$$f(\mathbf{x}) = \int_{\mathbf{a}}^{\mathbf{x}} \omega = \int_0^1 \left( \sum_{j=1}^{n} F_j(\mathbf{a} + t(\mathbf{x}-\mathbf{a}))\,(x_j - a_j) \right) dt = \sum_{j=1}^{n} (x_j - a_j) \int_0^1 F_j(\mathbf{a} + t(\mathbf{x}-\mathbf{a}))\, dt.$$
Using Exercise 7.2.20 to calculate the derivative of f ,
$$\frac{\partial f}{\partial x_i} = \int_0^1 F_i(\mathbf{a} + t(\mathbf{x}-\mathbf{a}))\, dt + \sum_{j=1}^{n} \int_0^1 t\, \frac{\partial F_j}{\partial x_i}(\mathbf{a} + t(\mathbf{x}-\mathbf{a}))\,(x_j - a_j)\, dt$$
$$= \int_0^1 F_i(\mathbf{a} + t(\mathbf{x}-\mathbf{a}))\, dt + \int_0^1 t \left( \sum_{j=1}^{n} \frac{\partial F_i}{\partial x_j}(\mathbf{a} + t(\mathbf{x}-\mathbf{a}))\,(x_j - a_j) \right) dt$$
(using the fact that $\frac{\partial F_j}{\partial x_i} = \frac{\partial F_i}{\partial x_j}$, since dω = 0)
$$= \int_0^1 F_i(\mathbf{a} + t(\mathbf{x}-\mathbf{a}))\, dt + \int_0^1 t\, (F_i \circ g)'(t)\, dt \qquad\text{(by the chain rule)}$$
$$= \int_0^1 F_i(\mathbf{a} + t(\mathbf{x}-\mathbf{a}))\, dt + \Big[ t\,(F_i \circ g)(t) \Big]_0^1 - \int_0^1 F_i(\mathbf{a} + t(\mathbf{x}-\mathbf{a}))\, dt \qquad\text{(integrating by parts)}$$
$$= F_i(\mathbf{x}).$$
That is, df = ω, as required. □

Example 5. Let C be the parametric curve
$$g(t) = \begin{pmatrix} e^{t^7} \\ t^6 + 4t^3 - 1 \\ t^4 + (t - t^2)e^{\sin t} \end{pmatrix}, \qquad 0 \le t \le 1,$$
and let $\omega = \left(\dfrac{z}{x} + y\right) dx + (x + z)\, dy + (\log x + y + 2z)\, dz$. We wish to calculate $\int_C \omega$.
We certainly hope that the 1-form ω is exact (or, equivalently, that the corresponding force field is conservative). For then we can apply the Fundamental Theorem of Calculus for Line Integrals, Proposition 3.1.
If ω is to be equal to df for some function f , we need to solve
$$\frac{\partial f}{\partial x} = \frac{z}{x} + y, \qquad \frac{\partial f}{\partial y} = x + z, \qquad \frac{\partial f}{\partial z} = \log x + y + 2z.$$
Integrating the first equation, we obtain:
$$f\begin{pmatrix}x\\y\\z\end{pmatrix} = \int \left(\frac{z}{x} + y\right) dx = z\log x + xy + g\begin{pmatrix}y\\z\end{pmatrix},$$
where $g\begin{pmatrix}y\\z\end{pmatrix}$ is the “constant of integration.” Differentiating with respect to y, we have
$$\frac{\partial f}{\partial y} = x + \frac{\partial g}{\partial y} = x + z,$$
and so we find that $\frac{\partial g}{\partial y} = z$. Thus, $g\begin{pmatrix}y\\z\end{pmatrix} = yz + h(z)$ for some appropriate “constant of integration” h(z). So
$$f\begin{pmatrix}x\\y\\z\end{pmatrix} = z\log x + xy + yz + h(z).$$
Now, differentiating with respect to z, we have
$$\frac{\partial f}{\partial z} = \log x + y + h'(z) = \log x + y + 2z,$$
and so—finally—h(z) = z 2 + c, whence
$$f\begin{pmatrix}x\\y\\z\end{pmatrix} = z\log x + xy + yz + z^2 + c.$$
Now comes the easy part. The curve goes from
$$A = g(0) = \begin{pmatrix}1\\-1\\0\end{pmatrix} \qquad\text{to}\qquad B = g(1) = \begin{pmatrix}e\\4\\1\end{pmatrix},$$
and so
$$\int_C \omega = f(B) - f(A) = (1 + 4e + 4 + 1) - (-1) = 4e + 7. \qquad \triangledown$$

Example 6. Newton’s law of gravitation states that the gravitational force exerted by a point mass M at the origin on a unit test mass is radial and inverse-square in magnitude:
$$\mathbf{F} = -GM\, \frac{\mathbf{x}}{\|\mathbf{x}\|^3}.$$
(The corresponding 1-form is ω = −GM (x2 + y 2 + z 2 )−3/2 (xdx + ydy + zdz).) Since
$$d(\|\mathbf{x}\|) = \|\mathbf{x}\|^{-1}(x\,dx + y\,dy + z\,dz)$$
(see Example 1 of Chapter 3, Section 4), it follows immediately that a potential function for the gravitational field is f (x) = GM/‖x‖. (Physicists ordinarily choose the constant so that the potential goes to 0 as x goes to infinity.)
Let’s now consider the case of the gravitational field of the earth; note that the gravitational acceleration at the surface of the earth is given by g = GM/R2 , where R is the radius of the earth. By Proposition 3.1, the work done (against gravity) to lift a unit test mass from a point A on the surface of the earth to a point B height h units above the surface of the earth is therefore
$$-(f(B) - f(A)) = GM\left(\frac{1}{R} - \frac{1}{R+h}\right) = GM\, \frac{h}{R(R+h)} = \frac{GM}{R^2}\cdot\frac{h}{1 + \frac{h}{R}} \approx gh,$$
provided h is quite small compared to R. This checks with the standard formula for the potential energy of a mass m at (small) height h above the surface of the earth: P.E. = mgh. ▽

3.3. Green’s Theorem. We have seen that whenever ω = df for some function f , it is the case that $\oint_C \omega = 0$ for all closed curves C. So certainly we expect that the size of dω on a region will affect the integral of ω around the boundary of that region. The precise statement is the following

Theorem 3.4 (Green’s Theorem for a rectangle). Let R ⊂ R2 be a rectangle, and let ω be a 1-form on R. Then
$$\int_{\partial R} \omega = \int_R d\omega.$$
(Here the boundary ∂R is traversed counterclockwise.)

Proof. Take R = [a, b] × [c, d], as shown in Figure 3.6, and write ω = P dx + Q dy. Then
$$d\omega = \left(\frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y}\right) dx \wedge dy.$$

Figure 3.6

Now we merely calculate, using Fubini’s Theorem appropriately:
$$\int_R d\omega = \int_R \left(\frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y}\right) dA = \int_c^d \left(\int_a^b \frac{\partial Q}{\partial x}\, dx\right) dy - \int_a^b \left(\int_c^d \frac{\partial P}{\partial y}\, dy\right) dx$$
$$= \int_c^d \big( Q(b, y) - Q(a, y) \big)\, dy - \int_a^b \big( P(x, d) - P(x, c) \big)\, dx \qquad\text{(by the Fundamental Theorem of Calculus)}$$
$$= \int_a^b P(x, c)\, dx + \int_c^d Q(b, y)\, dy - \int_a^b P(x, d)\, dx - \int_c^d Q(a, y)\, dy = \int_{\partial R} \omega,$$
as required. □
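Here is a numerical sanity check of Theorem 3.4 for one concrete 1-form on a rectangle, comparing the counterclockwise boundary integral with the double integral of dω (Python with SciPy; a purely illustrative sketch, with ω = −x2 y dx + xy 2 dy chosen arbitrarily):

```python
from scipy.integrate import quad, dblquad

P = lambda x, y: -x**2 * y            # omega = P dx + Q dy
Q = lambda x, y:  x * y**2
a, b, c, d = 0.0, 1.0, 0.0, 2.0       # R = [a, b] x [c, d]

# Boundary integral over the four edges, traversed counterclockwise.
bottom, _ = quad(lambda x: P(x, c), a, b)     # y = c, dy = 0
right,  _ = quad(lambda y: Q(b, y), c, d)     # x = b, dx = 0
top,    _ = quad(lambda x: -P(x, d), a, b)    # y = d, traversed right to left
left,   _ = quad(lambda y: -Q(a, y), c, d)    # x = a, traversed top to bottom
boundary = bottom + right + top + left

# Double integral of d(omega) = (dQ/dx - dP/dy) dx ^ dy = (x^2 + y^2) dx ^ dy.
interior, _ = dblquad(lambda y, x: x**2 + y**2, a, b, lambda x: c, lambda x: d)

print(boundary, interior)             # both are 10/3 = 3.3333...
```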
For most applications, the following observation is adequate:

Corollary 3.5. If S ⊂ R2 is parametrized by a rectangle, and ω is a 1-form on S, then
$$\int_{\partial S} \omega = \int_S d\omega.$$

Proof. Let g : R → S ⊂ R2 be a parametrization. Then, applying Proposition 2.4, we have
$$\int_{\partial S} \omega = \int_{\partial R} g^*\omega = \int_R d(g^*\omega) = \int_R g^*(d\omega) = \int_S d\omega.$$
(It is important to understand that both S and ∂S inherit an orientation from the parametrization g.) □
Example 7. Suppose ω is a smooth 1-form on the unit disk D in R2 . Can we infer that $\int_{\partial D} \omega = \int_D d\omega$? The naı̈ve answer is “of course,” parametrizing by polar coordinates and applying Corollary 3.5. The difficulty that arises is that we only get a bona fide parametrization on (0, 1] × (0, 2π). But we can apply Corollary 3.5 on the rectangle Rδ,ε = [δ, 1] × [ε, 2π] when δ, ε > 0 are small. Let Dδ,ε = g(Rδ,ε ), as indicated in Figure 3.7. Because ω is smooth on all of the unit disk, we have
$$\int_D d\omega = \lim_{\delta,\varepsilon\to 0^+} \int_{D_{\delta,\varepsilon}} d\omega = \lim_{\delta,\varepsilon\to 0^+} \int_{R_{\delta,\varepsilon}} g^*d\omega = \lim_{\delta,\varepsilon\to 0^+} \int_{\partial R_{\delta,\varepsilon}} g^*\omega = \lim_{\delta,\varepsilon\to 0^+} \int_{\partial D_{\delta,\varepsilon}} \omega = \int_{\partial D} \omega.$$

Figure 3.7

(We leave it to the reader to justify the first and last equalities.) We shall not belabor such details in the future. ▽

More generally, we observe that Green’s Theorem holds for any region S that can be decomposed as a finite union of parametrized rectangles overlapping only along their edges. For, as Figure 3.8 illustrates, if $S = \bigcup_{i=1}^{k} S_i$, because the integrals over interior boundary segments cancel in pairs, we have
$$\int_{\partial S} \omega = \sum_{i=1}^{k} \int_{\partial S_i} \omega = \sum_{i=1}^{k} \int_{S_i} d\omega = \int_S d\omega.$$

Remark. We do not usually stop to express every “reasonable” region explicitly as a union of
parametrized rectangles. (For most purposes, our work in Section 5 will obviate all such worries.)
Figure 3.8

In Example 7 we already dealt with the case of a disk. To set our minds further at ease, we can easily check that
$$g_1 : [0,1]\times[0,\pi] \to \mathbb{R}^2, \qquad g_1\begin{pmatrix} r \\ \theta \end{pmatrix} = r\begin{pmatrix} \cos\theta \\ \sin\theta \end{pmatrix}$$
maps a rectangle to a half-disk, and that
$$g_2 : [0,1]\times[0,\pi/2] \to \mathbb{R}^2, \qquad g_2\begin{pmatrix} r \\ \theta \end{pmatrix} = \frac{r}{\cos\theta + \sin\theta}\begin{pmatrix} \cos\theta \\ \sin\theta \end{pmatrix}$$
maps a rectangle to the triangle with vertices at $\begin{pmatrix}0\\0\end{pmatrix}$, $\begin{pmatrix}1\\0\end{pmatrix}$, and $\begin{pmatrix}0\\1\end{pmatrix}$.

Example 8. We can use Green’s Theorem to calculate the area of a planar region S by line integration. Since
$$dx \wedge dy = d(x\,dy) = d(-y\,dx) = d\big(\tfrac{1}{2}(-y\,dx + x\,dy)\big),$$
we have
$$\text{area}(S) = \int_{\partial S} x\,dy = \int_{\partial S} -y\,dx = \frac{1}{2}\int_{\partial S} -y\,dx + x\,dy. \qquad \triangledown$$
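Applied to a closed polygon traversed counterclockwise, the third formula becomes the classical “shoelace” rule: on the edge from (xi , yi ) to (xi+1 , yi+1 ) the integral of ½(−y dx + x dy) works out to ½(xi yi+1 − xi+1 yi ). A minimal sketch (plain Python; illustrative only):

```python
def polygon_area(vertices):
    """Signed area of a polygon (counterclockwise vertex order gives a positive
    answer), via  area = (1/2) * sum of (x_i*y_{i+1} - x_{i+1}*y_i)."""
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        total += x0 * y1 - x1 * y0
    return total / 2.0

print(polygon_area([(0, 0), (1, 0), (1, 1), (0, 1)]))                   # 1.0 (unit square)
print(polygon_area([(0, 0), (2, 0), (2, 1), (1, 1), (1, 2), (0, 2)]))   # 3.0 (an L-shape)
```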

Definition . A subset X ⊂ Rn is called simply connected if it is connected and every simple


closed curve in X can be continuously shrunk to a point in X.

Corollary 3.6. Let Ω ⊂ R2 be a simply connected region. If ω is a smooth 1-form on Ω with dω = 0, then ω is exact, i.e., there is a function f so that ω = df .

Proof. By Green’s Theorem, for any rectangle R ⊂ Ω, we have
$$\int_{\partial R} \omega = \int_R d\omega = 0,$$
and so the integral along a polygonal path consisting of horizontal and vertical segments depends only on its endpoints. As the proof of Theorem 3.2 showed, this is sufficient to construct a potential function f . □

To emphasize the importance of all the hypotheses, we give an important example.


Example 9. Let $\omega = -\dfrac{y}{x^2+y^2}\,dx + \dfrac{x}{x^2+y^2}\,dy$. Then, as we calculated in Example 5(d) of Section 2, dω = 0. And yet, letting C be the unit circle, it is easy to check that $\oint_C \omega = 2\pi$. So ω cannot be exact. We shall see further instances of this phenomenon in later sections. ▽
Nevertheless, we can use Green’s Theorem to draw a very interesting conclusion.
Example 10. Suppose C is any simple closed curve in the plane that encircles the origin, and let Γ be a circle centered at the origin lying in the interior of C, as shown in Figure 3.9. Let S be the region lying between C and Γ. If we orient C and Γ counterclockwise, then we have ∂S = C + Γ− .

Figure 3.9

Once again, let $\omega = -\dfrac{y}{x^2+y^2}\,dx + \dfrac{x}{x^2+y^2}\,dy$. Then, as in Example 9, we have dω = 0. But now ω is smooth everywhere on S, and so
$$0 = \int_S d\omega = \int_{\partial S} \omega = \int_C \omega - \int_\Gamma \omega.$$
That is,
$$\int_C \omega = \int_\Gamma \omega = 2\pi,$$
and this is true for any simple closed curve C with the origin in its interior.

Figure 3.10

More generally, consider the curves shown in Figure 3.10. Then $\int_C \omega = 2\pi$, 4π, and 0, respectively, in parts (a), (b), and (c). For reasons we leave to the reader to surmise, for a closed plane curve not passing through the origin, the integer
$$\frac{1}{2\pi}\int_C -\frac{y}{x^2+y^2}\,dx + \frac{x}{x^2+y^2}\,dy$$
is called the winding number of C around the origin. ▽
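The winding number can be computed numerically from any parametrization by pulling back and integrating. The sketch below (Python with NumPy and SciPy, purely illustrative) checks a circle around the origin, the same circle traversed twice, and a circle that misses the origin.

```python
import numpy as np
from scipy.integrate import quad

def winding_number(g, dg, a, b):
    """(1/2pi) int_C (-y dx + x dy)/(x^2 + y^2) for the closed curve t -> g(t)."""
    def integrand(t):
        x, y = g(t)
        dx, dy = dg(t)
        return (-y * dx + x * dy) / (x**2 + y**2)
    value, _ = quad(integrand, a, b, limit=200)
    return value / (2 * np.pi)

print(winding_number(lambda t: (np.cos(t), np.sin(t)),
                     lambda t: (-np.sin(t), np.cos(t)), 0, 2*np.pi))            # ~1.0
print(winding_number(lambda t: (np.cos(2*t), np.sin(2*t)),
                     lambda t: (-2*np.sin(2*t), 2*np.cos(2*t)), 0, 2*np.pi))    # ~2.0
print(winding_number(lambda t: (3 + np.cos(t), np.sin(t)),
                     lambda t: (-np.sin(t), np.cos(t)), 0, 2*np.pi))            # ~0.0
```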

EXERCISES 8.3

*1. Let ω = ydx + xdy. Compare and contrast the integrals $\int_C \omega$ for the following parametrized curves C. (Be sure to sketch C.)
a. g : [0, 1] → R2 , g(t) = (t, t)
b. g : [0, 1] → R2 , g(t) = (t, t2 )
c. g : [0, 1] → R2 , g(t) = (1 − t, 1 − t)
d. g : [0, π/2] → R2 , g(t) = (cos2 t, 1 − sin2 t)
e. g : [0, π/4] → R2 , g(t) = (sin 2t, 1 − cos 2t)
f. g : [0, π/2] → R2 , g(t) = (cos t, 1 − sin t)

*2. Repeat Exercise 1 with ω = y 2 dx + xdy.

3. Calculate the following line integrals:
a. $\int_C xy^3\, dx$, where C is the unit circle x2 + y 2 = 1, oriented counterclockwise
b. $\int_C z\,dx + x\,dy + y\,dz$, where C is the line segment from (0, 1, 2) to (1, −1, 3)
c. $\int_C y^2\,dx + z\,dy - 3xy\,dz$, where C is the line segment from (1, 0, 1) to (2, 3, −1)
d. $\int_C y\, dx$, where C is the intersection of the unit sphere and the plane x + y + z = 0, oriented counterclockwise as viewed from high above the xy-plane. (Hint: Find an orthonormal basis for the plane.)

4. Let C be the curve of intersection of the upper hemisphere x2 + y 2 + z 2 = 4, z ≥ 0, and the cylinder x2 + y 2 = 2x, oriented counterclockwise as viewed from high above the xy-plane. Evaluate $\int_C y\,dx + z\,dy + x\,dz$.
5. Let $\omega = \dfrac{x}{x^2+y^2}\,dx + \dfrac{y}{x^2+y^2}\,dy$. If C is an arbitrary path from (1, 1) to (2, 2) not passing through the origin, calculate $\int_C \omega$.

6. Determine which of the following 1-forms ω are exact (or, in other words, which of the corresponding vector fields F are conservative). For those that are, construct (following one of the algorithms in the text) a potential function f . For those that are not, give a closed curve C for which $\oint_C \omega \ne 0$.
a. ω = (x + y)dx + (x + y)dy
b. ω = y 2 dx + x2 dy
c. ω = (ex + 2xy)dx + (x2 + y 2 )dy
d. ω = (x2 + y + z)dx + (x + y 2 + z)dy + (x + y + z 2 )dz
e. ω = y 2 zdx + (2xyz + sin z)dy + (xy 2 + y cos z)dz
7. Let f : R → R and $\omega = f(\|\mathbf{x}\|)\left(\sum_{i=1}^{n} x_i\, dx_i\right) \in A^1(\mathbb{R}^n)$.
a. Assuming f is differentiable, prove that dω = 0 on Rn − {0}.
b. Assuming f is continuous, prove that ω is exact.

8. Let C be the parametric curve
$$g(t) = \begin{pmatrix} e^{t^7}\cos(2\pi t^{21}) \\ t^{17} + 4t^3 - 1 \\ t^4 + (t - t^2)e^{\sin t} \end{pmatrix}, \qquad 0 \le t \le 1.$$
Calculate $\int_C (3x + y^2 + 2xz)\,dx + (2xy + ze^{yz} + y)\,dy + (x^2 + ye^{yz} + ze^{z^2})\,dz$. (Hint: This problem should involve very little computation!)
9. Let C be any closed curve in the plane. Show that $\oint_C y\,dx = -\oint_C x\,dy$. What is the geometric interpretation of these integrals?
10. Calculate each of the following line integrals $\int_C \omega$ directly and by applying Green’s Theorem. (In all cases, C is traversed counterclockwise.)
a. ω = (x2 − y 2 )dx + 2xydy, C is the square with vertices (0, 0), (1, 0), (1, 1), and (0, 1)
b. ω = −y 3 dx + x3 dy, C as in part a.
*c. ω = −x2 ydx + xy 2 dy, C is the circle of radius a centered at the origin
*d. ω = $\sqrt{x^2 + y^2}\,(-y\,dx + x\,dy)$, C is the circle x2 + y 2 = 2x
e. ω = −y 2 dx + x2 dy, C is the boundary of the sector of the circle r ≤ a, 0 ≤ θ ≤ π/4
11. Let C be the circle x2 + y 2 = 2x, oriented counterclockwise. Evaluate $\int_C \omega$, where $\omega = \big({-y^2} + e^{x^2}\big)\,dx + \big(x + \sin(y^3)\big)\,dy$.
*12. Use Green’s Theorem to find the area of the ellipse $\dfrac{x^2}{a^2} + \dfrac{y^2}{b^2} \le 1$.
13. Find the area inside the hypocycloid x2/3 + y 2/3 = 1. (Hint: Parametrize by g(t) = (cos3 t, sin3 t).)

*14. Let 0 < b < a. Find the area beneath one arch of the trochoid (as shown in Figure 3.11)
$$g(t) = \begin{pmatrix} at - b\sin t \\ a - b\cos t \end{pmatrix}, \qquad 0 \le t \le 2\pi.$$

Figure 3.11

15. Find the area of the plane region bounded by the evolute
$$g(t) = \begin{pmatrix} a(\cos t + t\sin t) \\ a(\sin t - t\cos t) \end{pmatrix}, \qquad 0 \le t \le 2\pi,$$

and the line segment AB, as pictured in Figure 3.12.

Figure 3.12

16. Use symmetry considerations to find the following.
a. Let C be the polygonal curve shown in Figure 3.13(a). Compute $\oint_C (e^{x^2} - 2xy)\,dx + (2xy - x^2)\,dy$.

Figure 3.13

b. Let C be the curve pictured in Figure 3.13(b); you might visualize it as a racetrack with two semicircular ends. Compute $\oint_C (4x^3 y - 3y^2)\,dx + (x^4 + e^{\sin y})\,dy$.

17. Let C be an oriented curve in R2 , and let n be the unit outward-pointing normal (this means that {n, T} gives a right-handed basis for R2 ). Define the 1-form σ on C by σ = −n2 dx + n1 dy.
a. Show that σ(T) = 1.
b. Show that $\int_C \sigma$ gives the arclength of C.
c. Can you explain how your answers to parts a and b might be related?

18. Let C be an oriented curve in R2 , and let n be the unit outward-pointing normal (this means that {n, T} gives a right-handed basis for R2 ). Let $\mathbf{F} = \begin{pmatrix} F_1 \\ F_2 \end{pmatrix}$ be a vector field on the plane, and let ω = F1 dx + F2 dy be the corresponding 1-form. Show that
$$\int_C \mathbf{F}\cdot\mathbf{n}\, ds = \int_C F_1\,dy - F_2\,dx = \int_C \star\omega.$$
This is called the flux of F across C. (See Exercise 8.2.9.) Conclude that when C = ∂S, we have
$$\int_C \mathbf{F}\cdot\mathbf{n}\, ds = \int_S \left(\frac{\partial F_1}{\partial x} + \frac{\partial F_2}{\partial y}\right) dA.$$
19. Prove Green’s theorem for the annular region $\Omega = \left\{ \begin{pmatrix} x \\ y \end{pmatrix} : a \le \sqrt{x^2 + y^2} \le b \right\}$, pictured in Figure 3.14.

Figure 3.14

20. Give a direct proof of Green’s theorem for
a. a triangle with vertices at (0, 0), (a, 0), and (0, b).
b. the region $\left\{ \begin{pmatrix} x \\ y \end{pmatrix} : a \le x \le b,\; g(x) \le y \le h(x) \right\}$. (Hint: Exercise 7.2.23 will be helpful.)

21. Suppose C is a piecewise C1 closed curve in R2 that intersects itself finitely many times and does not pass through the origin. Show that the line integral
$$\frac{1}{2\pi}\int_C \frac{-y\,dx + x\,dy}{x^2 + y^2}$$
is always an integer. (See the discussion of Example 10.)
22. Suppose C is a piecewise-C1 closed curve in R2 that intersects itself finitely many times and does not pass through (1, 0) or (−1, 0). Show that there are integers m and n so that
$$\frac{1}{2\pi}\int_C A\,\frac{-y\,dx + (x-1)\,dy}{(x-1)^2 + y^2} + B\,\frac{-y\,dx + (x+1)\,dy}{(x+1)^2 + y^2} = mA + nB.$$
23. An ant finds himself in the xy-plane in the presence of the force field $\mathbf{F} = \begin{pmatrix} y^3 + x^2 y \\ 2x^2 - 6xy \end{pmatrix}$. Around what simple closed curve beginning and ending at the origin should he travel counterclockwise (once) so as to maximize the work done on him by F?

24. Suppose Ω ⊂ R2 is a region with the property that every simple closed curve in Ω bounds a
region contained in Ω that is a finite union of parametrized rectangles. Prove that if ω is a 1-form
on Ω with dω = 0, then ω is exact, i.e., there is a potential function f with ω = df .

25. a. Suppose there is a current c in a river. Show that if we row at a constant ground speed
v > c directly downstream a certain distance and then directly back upstream to our
beginning point, the time required (ignoring the time to turn around) is always greater
than the time it would take with no current. (This is just an Algebra I problem!)
b. Show that the same is true no matter what closed path C we take in the river. (Assume
we still row with ground velocity v, with kvk > c constant.) (Hint: Express the time of
the trip as a line integral over C and do some clever estimates. The diagram in Figure
3.15 may help.)


Figure 3.15

26. According to Webster, a planimeter, pictured in Figure 3.16, is “an instrument for measuring
the area of a regular or irregular plane figure by tracing the perimeter of the figure.” As we

Figure 3.16

show a bit more schematically in Figure 3.17, an arm of fixed length b has one fixed end; to
the other is attached another arm of length a which is free to rotate. A wheel (for convenience
attached slightly off the near end) turns as the arm rotates about the pivot point. Use Green’s
Theorem to explain how the amount that the wheel rotates tells us the area of the figure.
Figure 3.17

4. Surface Integrals and Flux

Suppose U ⊂ R2 is a bounded open set and g : U → Rn is a one-to-one smooth map with the
property that Dg(a) has rank 2 for all a ∈ U . Then we call S = g(U ) a parametrized surface.

Examples 1. (a) Consider g : (0, 2π) × (0, a) → R3 given by
$$g\begin{pmatrix} u \\ v \end{pmatrix} = \begin{pmatrix} v\cos u \\ v\sin u \\ v \end{pmatrix}.$$
This is a parametrization of that portion of the cone $z = \sqrt{x^2 + y^2}$ between z = 0 and z = a, less one ruling, as shown in Figure 4.1.

Figure 4.1

(b) Consider g : (0, 2π) × (0, 2π) → R3 given by
$$g\begin{pmatrix} u \\ v \end{pmatrix} = \begin{pmatrix} (a + b\cos v)\cos u \\ (a + b\cos v)\sin u \\ b\sin v \end{pmatrix}.$$

Assuming 0 < b < a, the image of g is most of a torus, as pictured in Figure 4.2, the surface
of revolution obtained by rotating a circle of radius b about an axis a units from its center.

Figure 4.2

(c) Consider g : R × (0, ∞) → R3 given by
$$g\begin{pmatrix} u \\ v \end{pmatrix} = \begin{pmatrix} v\cos u \\ v\sin u \\ u \end{pmatrix}.$$
This parametrized surface, pictured in part in Figure 4.3, resembles a spiral ramp, and is officially called a helicoid. ▽

Figure 4.3

As we expect by now, to define the integral of a 2-form over a parametrized surface S, we pull back and integrate: when ω ∈ A2 (Rn ) and S = g(U ), we set
$$\int_S \omega = \int_U g^*\omega$$
(provided the integral exists).

Examples 2. For these examples, let’s fix ω = zdx ∧ dy ∈ A2 (R3 ).
(a) Let R = (0, 1) × (0, 2π) ⊂ R2 , and let g : R → R3 be given by
$$g\begin{pmatrix} r \\ \theta \end{pmatrix} = \begin{pmatrix} r\cos\theta \\ r\sin\theta \\ \sqrt{1 - r^2} \end{pmatrix}.$$
Then we recognize that g is a parametrization of the upper unit hemisphere, S. We then have
$$g^*(z\,dx\wedge dy) = \sqrt{1 - r^2}\; r\, dr\wedge d\theta,$$
and so
$$\int_S \omega = \int_R g^*\omega = \int_0^{2\pi}\!\!\int_0^1 \sqrt{1 - r^2}\; r\, dr\, d\theta = \frac{2\pi}{3}.$$
(b) Now consider g : (0, π/2) × (0, 2π) → R3 given by
$$g\begin{pmatrix} \phi \\ \theta \end{pmatrix} = \begin{pmatrix} \sin\phi\cos\theta \\ \sin\phi\sin\theta \\ \cos\phi \end{pmatrix}.$$
This is an alternative parametrization of the upper hemisphere, S. Then
$$g^*(z\,dx\wedge dy) = \cos\phi\,(\cos\phi\sin\phi\, d\phi\wedge d\theta) = \cos^2\phi\sin\phi\, d\phi\wedge d\theta,$$
and so
$$\int_S \omega = \int_{(0,\pi/2)\times(0,2\pi)} g^*\omega = \int_0^{2\pi}\!\!\int_0^{\pi/2} \cos^2\phi\sin\phi\, d\phi\, d\theta = \frac{2\pi}{3}.$$
(c) Now let’s do the lower hemisphere correspondingly in each of these two ways. Parametrizing by R = (0, 1) × (0, 2π), we have
$$h\begin{pmatrix} r \\ \theta \end{pmatrix} = \begin{pmatrix} r\cos\theta \\ r\sin\theta \\ -\sqrt{1 - r^2} \end{pmatrix}.$$
We then have $h^*(z\,dx\wedge dy) = -\sqrt{1 - r^2}\; r\, dr\wedge d\theta$, and so
$$\int_S \omega = \int_R h^*\omega = -\frac{2\pi}{3}.$$
On the other hand, in spherical coordinates, we have k : (π/2, π) × (0, 2π) → R3 given by the same formula as g in part (b) above, and so
$$\int_S \omega = \int_{(\pi/2,\pi)\times(0,2\pi)} k^*\omega = \frac{2\pi}{3}.$$
What gives? ▽

The answer to the query is very simple. Imagine you were walking around on the unit sphere with your feet on the surface (your body pointing radially outwards, normal to the sphere). As you look down, you determine that a basis for the tangent plane to the sphere will be “correctly oriented” if you see a positive (counterclockwise) rotation from the first vector (u) to the second (v), as pictured in Figure 4.4. We will say that your body is pointing in the direction of the outward-pointing normal vector to the surface. Note that then n, u, v form a positively-oriented basis for R3 , i.e.,
$$\begin{vmatrix} | & | & | \\ \mathbf{n} & \mathbf{u} & \mathbf{v} \\ | & | & | \end{vmatrix} > 0.$$

Figure 4.4

More generally, an orientation on a surface S ⊂ Rn is a continuously varying notion of what a positively oriented basis for the tangent plane at each point should be. In particular, S has an orientation if and only if we can choose various parametrizations g : U → Rn of (subsets of) S (the union of whose images covers all of S) so that ∂g/∂u1 and ∂g/∂u2 give a positively oriented basis of the tangent plane of S at every point of g(U ). We say a surface is orientable if there is an orientation on S. (See Exercise 26.)
An alternative characterization of an orientation on a surface S is the following. Recall that
dim Λ2 (R2 )∗ = 1; i.e., any nonzero element of this vector space is either a positive or a negative
multiple of dx1 ∧ dx2 . Given a nonzero element φ ∈ Λ2 (R2 )∗ , it defines an orientation on R2 in an
obvious way: the basis vectors v1 , v2 are said to define a positive orientation on R2 if and only if
φ(v1 , v2 ) > 0. Now, by analogy, a nowhere-zero 2-form ω on the surface S defines an orientation
on S: for each point a ∈ S, the tangent vectors u and v at a will form a positively oriented basis
for the tangent plane if and only if ω(a)(u, v) > 0. (We will abuse notation as follows: given an
orientation on S, i.e., a compatible choice of positively oriented basis for each tangent plane of S,
we will say ω > 0 on S if the value of ω on that basis is positive and ω < 0 if the value is negative.)
Orienting the sphere as pictured in Figure 4.4, we now see from Figure 4.5 that dx ∧ dy > 0 on the upper hemisphere, whereas dx ∧ dy < 0 on the lower. This explains the sign disparity in the two calculations in part (c) of the preceding Example.

Figure 4.5
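One can confirm the sign disparity symbolically. The sketch below (Python with SymPy; not part of the text) pulls z dx ∧ dy back under each of the two parametrizations of the lower hemisphere used in part (c) and integrates, recovering −2π/3 and +2π/3.

```python
import sympy as sp

r, theta, phi = sp.symbols('r theta phi', positive=True)

def coeff(x, y, z, s, t):
    """Coefficient of ds ^ dt in the pullback of z dx ^ dy under (s, t) -> (x, y, z)."""
    return z * (sp.diff(x, s) * sp.diff(y, t) - sp.diff(x, t) * sp.diff(y, s))

# Lower hemisphere via h(r, theta) = (r cos th, r sin th, -sqrt(1 - r^2)):
h = (r*sp.cos(theta), r*sp.sin(theta), -sp.sqrt(1 - r**2))
print(sp.integrate(sp.integrate(coeff(*h, r, theta), (r, 0, 1)), (theta, 0, 2*sp.pi)))
# -2*pi/3

# Lower hemisphere via spherical coordinates with phi in (pi/2, pi):
k = (sp.sin(phi)*sp.cos(theta), sp.sin(phi)*sp.sin(theta), sp.cos(phi))
print(sp.integrate(sp.integrate(coeff(*k, phi, theta), (phi, sp.pi/2, sp.pi)),
                   (theta, 0, 2*sp.pi)))
# 2*pi/3
```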

Example 3. The standard example of a non-orientable surface is the Möbius strip, pictured in
Figure 4.6. Observe that if you slide the positive basis {u, v} once around the strip, it will return
with the opposite orientation. Alternatively, if you start with an outward-pointing normal n and
travel once around the Möbius strip, the normal returns pointing in the opposite direction. ▽
Figure 4.6

Definition . If S is an oriented surface, its (oriented) area 2-form σ is the 2-form with the
property that σ(a) assigns to each pair of tangent vectors at a the signed area of the parallelogram
they span. (By signed area we mean the obvious: the pair of tangent vectors form a positively-
oriented basis if and only if the signed area is positive.)
4.1. Oriented Surfaces in R3 and Flux. Let S ⊂ R3 be an oriented surface with outward-pointing unit normal $\mathbf{n} = \begin{pmatrix} n_1 \\ n_2 \\ n_3 \end{pmatrix}$. Then we claim that
$$\sigma = n_1\, dy\wedge dz + n_2\, dz\wedge dx + n_3\, dx\wedge dy$$
is its area 2-form. This was the point of Exercise 8.2.5, but we give the argument here. If u and v are in the tangent plane to S, then
$$\sigma(\mathbf{u}, \mathbf{v}) = \begin{vmatrix} | & | & | \\ \mathbf{n} & \mathbf{u} & \mathbf{v} \\ | & | & | \end{vmatrix}$$
gives the signed volume of the parallelepiped spanned by n, u, and v. Since n is a unit vector orthogonal to u and v, this volume is the area of the parallelogram spanned by u and v; our definition of orientation dictates that the signs agree.

Example 4. Consider the surface of revolution S defined by z = f (r), 0 ≤ r ≤ a, oriented so that its outward-pointing normal has a positive e3 -component. We can parametrize S by
$$g : (0, a)\times(0, 2\pi) \to \mathbb{R}^3, \qquad g\begin{pmatrix} r \\ \theta \end{pmatrix} = \begin{pmatrix} r\cos\theta \\ r\sin\theta \\ f(r) \end{pmatrix}.$$
Since the vector $\dfrac{\partial g}{\partial r} \times \dfrac{\partial g}{\partial \theta}$ has a positive e3 -component, this is an appropriate parametrization. Now, the unit normal is
$$\mathbf{n} = \frac{1}{\sqrt{1 + f'(r)^2}} \begin{pmatrix} -\tfrac{x}{r}\, f'(r) \\ -\tfrac{y}{r}\, f'(r) \\ 1 \end{pmatrix},$$
and so
$$\sigma = \frac{1}{\sqrt{1 + f'(r)^2}} \left( -\frac{x}{r}\, f'(r)\, dy\wedge dz - \frac{y}{r}\, f'(r)\, dz\wedge dx + dx\wedge dy \right).$$
Pulling back, we have
$$g^*\sigma = r\sqrt{1 + f'(r)^2}\; dr\wedge d\theta,$$
and so the surface area of S is given by
$$\int_{(0,a)\times(0,2\pi)} g^*\sigma = \int_0^{2\pi}\!\!\int_0^a r\sqrt{1 + f'(r)^2}\; dr\, d\theta,$$
which agrees with the formula usually derived in single variable integral calculus. ▽

Example 5. Given a plane n · x = c, with ‖n‖ = 1, then, assuming n3 ≠ 0, we can give a parametrization by thinking of the plane as a graph over the xy-plane:
$$g\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} x \\ y \\ \tfrac{1}{n_3}(c - n_1 x - n_2 y) \end{pmatrix}.$$
Then
$$g^*\sigma = n_1\left(\frac{n_1}{n_3}\right) dx\wedge dy + n_2\left(\frac{n_2}{n_3}\right) dx\wedge dy + n_3\, dx\wedge dy = \frac{1}{n_3}\, dx\wedge dy.$$
Recall that if u and v are two vectors in the plane, then σ(u, v) gives the signed area of the parallelogram they span, whereas (dx ∧ dy)(u, v) gives the signed area of its projection into the xy-plane. As we see from Figure 4.7, the area of the projection is |n3 | = | cos γ| times the area of the original parallelogram, where γ is the angle between the plane and the xy-plane, so the general theory is compatible with a more intuitive, geometric approach. ▽

Figure 4.7
Given a vector field F = (F1, F2, F3) on an open subset of R³, we saw in Section 3 that integrating
the 1-form ω = F1 dx + F2 dy + F3 dz along an oriented curve computes the work done by F in
moving a test particle along that curve. What is the meaning of integrating the corresponding
2-form η = F1 dy ∧ dz + F2 dz ∧ dx + F3 dx ∧ dy over an oriented surface S? (The observant reader

who’s worked Exercise 8.2.9 will recognize that η = ⋆ω. See also Exercise 8.3.18.) Well, if u and v
are tangent to S, then

| | |


η(u, v) = F u v = (F · n) × (signed area of the parallelogram spanned by u and v).

| | |
Z Z
That is, η represents the flux of F outwards across S, often written F · ndS. Here dS repre-
S S
sents an element of (nonoriented) surface area, just as ds represented the element of (nonoriented)
arclength on a curve; in neither case should these be interpreted as the exterior derivative of some-
thing.
A physical interpretation is the following: imagine a fluid in motion (not depending on time),
and let F(x) represent the velocity of the fluid at x multiplied by the density of the fluid at x.
(Note that F points in the direction of the velocity and has units of mass/(area × time).) Then the
mass of fluid that flows across a small area ∆S of S in a small amount of time ∆t is approximately
∆m ≈ δ∆V ≈ δ(v∆t · n)(∆S) ≈ (F · n)∆S∆t,
so that
∆m/∆t ≈ (F · n)∆S.
Taking the limit as ∆t → 0 and summing over the bits of area ∆S, we infer that ∫_S η represents the rate at which mass is transferred across S by the fluid flow.
 
Example 6. We wish to find the flux of the vector field F = (xz², yx², zy²) outwards across the sphere S of radius a centered at the origin. That is, we wish to find the integral over S of the 2-form η = xz² dy ∧ dz + yx² dz ∧ dx + zy² dx ∧ dy. Calculating the pullback under the spherical coordinate parametrization g : (0, π) × (0, 2π) → R³,
g(φ, θ) = a ( sin φ cos θ, sin φ sin θ, cos φ ),
we have
g*η = a⁵( sin φ cos θ cos²φ (sin²φ cos θ) + sin³φ sin θ cos²θ (sin²φ sin θ) + cos φ sin²φ sin²θ (sin φ cos φ) ) dφ ∧ dθ
    = a⁵( sin³φ cos²φ + sin⁵φ cos²θ sin²θ ) dφ ∧ dθ,
and so
∫_S η = ∫_{(0,π)×(0,2π)} a⁵( sin³φ cos²φ + sin⁵φ cos²θ sin²θ ) dφ ∧ dθ
      = a⁵ ∫_0^π ∫_0^{2π} ( sin³φ cos²φ + sin⁵φ cos²θ sin²θ ) dθ dφ
      = 2πa⁵ ∫_0^π ( sin³φ cos²φ + (1/8) sin⁵φ ) dφ
      = 2πa⁵ ∫_0^π ( (1/8) sin φ + (3/4) cos²φ sin φ − (7/8) cos⁴φ sin φ ) dφ = (4/5)πa⁵. ▽
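As a purely numerical sanity check of this value (a sketch assuming NumPy, with a = 1, so the exact answer is 4π/5 ≈ 2.513):

```python
# Integrate the pulled-back 2-form over (0, π) × (0, 2π) on a midpoint grid.
import numpy as np

a = 1.0
n = 2000
phi = (np.arange(n) + 0.5) * (np.pi / n)
theta = (np.arange(n) + 0.5) * (2 * np.pi / n)
P, T = np.meshgrid(phi, theta, indexing='ij')

# integrand of g*η = a⁵( sin³φ cos²φ + sin⁵φ cos²θ sin²θ ) dφ ∧ dθ
integrand = a**5 * (np.sin(P)**3 * np.cos(P)**2
                    + np.sin(P)**5 * np.cos(T)**2 * np.sin(T)**2)
flux = integrand.sum() * (np.pi / n) * (2 * np.pi / n)

print(flux, 4 * np.pi * a**5 / 5)   # both ≈ 2.513
```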

4.2. Surface area. We have pilfered Figure 4.8 from someone who, in turn, plagiarized from the book Matematicheskii Analiz na Mnogoobraziyakh by Mikhail Spivak. As this example,

[Figure 4.8: the Schwarz lantern inscribed in a cylinder, shown from the side and from the top.]

due to Hermann Schwarz, illustrates, one must be far more careful in defining surface area by a limiting process than in defining the arclength of curves. It seems natural to approximate a surface by inscribed triangles. But, even as the triangles get smaller and smaller, the sum of their areas may go to infinity, even in the case of a surface as simple as a cylinder. In particular, by moving the planes of the hexagons closer together, the triangles become more and more nearly orthogonal to the cylinder. The area of the individual triangles approaches hℓ/2, and the number of triangles grows without bound.
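This blow-up can be seen concretely in a rough numerical experiment (a sketch assuming NumPy; the triangle dimensions in the comments are worked out for the standard lantern construction, with n points on each of m + 1 equally spaced circles on a cylinder of radius 1 and height h, alternate circles rotated by π/n):

```python
# Schwarz lantern: 2mn inscribed triangles in a cylinder of radius 1, height h.
# Each triangle has base 2 sin(π/n) and height sqrt((h/m)² + (1 − cos(π/n))²).
import numpy as np

def lantern_area(n, m, h=1.0):
    base = 2 * np.sin(np.pi / n)
    height = np.sqrt((h / m)**2 + (1 - np.cos(np.pi / n))**2)
    return 2 * m * n * (0.5 * base * height)

# True lateral area of the cylinder is 2πh ≈ 6.2832 for h = 1.
for n in [10, 50, 100]:
    print(n, lantern_area(n, m=n), lantern_area(n, m=n**3))
```

With m proportional to n the total area tends to the true lateral area 2πh, but with m = n³ it grows without bound.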
For an oriented surface S ⊂ R3 , we can (and did) explicitly write down the 2-form σ that gives
the oriented area-form on S. In analogy with our development of arclength of a curve and our
treatment of change of variables in Chapter 7, we next give a definition of surface area that will
work for any parametrized surface. We need the result of Exercise 7.5.22: if u and v are vectors in
Rⁿ, the area of the parallelogram they span is given by
√ det ( u·u  u·v ; v·u  v·v ).
(Here is the sketch of a proof. We may assume {u, v} is linearly independent, and let {v3, . . . , vn} be an orthonormal basis for Span(u, v)⊥. Then we know that the volume of the n-dimensional parallelepiped spanned by u, v, v3, . . . , vn is the absolute value of the determinant of the matrix
A = ( u  v  v3  · · ·  vn )   (with the indicated vectors as columns).
But by our choice of the vectors v3, . . . , vn, this volume is evidently the area of the parallelogram spanned by u and v. But by Propositions 5.11 and 5.7 of Chapter 7, we have
(det A)² = det(AᵀA) = det ( u·u  u·v  0  · · ·  0 ; v·u  v·v  0  · · ·  0 ; 0  0  1  · · ·  0 ; . . . ; 0  0  0  · · ·  1 ) = det ( u·u  u·v ; v·u  v·v ),
as required.) If g is a parametrization of a smooth surface, then for sufficiently small ∆u and ∆v,

we expect that the area of the image g([u, u + ∆u] × [v, v + ∆v]) should be approximately the area of the parallelogram that is the image of this rectangle under the linear map Dg(u, v), and that, in turn, is ∆u ∆v times the area of the parallelogram spanned by ∂g/∂u and ∂g/∂v.
With this motivation, we now make the following

Definition. Let S ⊂ Rⁿ be a parametrized surface, given by g : Ω → Rⁿ, for some region Ω ⊂ R². Let
E = ‖∂g/∂u‖²,   F = ∂g/∂u · ∂g/∂v,   G = ‖∂g/∂v‖².
We define the surface area of S to be
area(S) = ∫_Ω √(EG − F²) dA_{uv}.

We leave it to the reader to check in Exercise 20 that for a parametrized, oriented surface in
R3 this gives the same result as integrating the area 2-form σ over the surface.
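As a small illustration of the definition (a sketch assuming SymPy), one can compute E, F, and G for the usual spherical parametrization and recover the area 4πa² of the sphere of radius a:

```python
import sympy as sp

a, phi, theta = sp.symbols('a phi theta', positive=True)
g = sp.Matrix([a*sp.sin(phi)*sp.cos(theta),
               a*sp.sin(phi)*sp.sin(theta),
               a*sp.cos(phi)])
gu, gv = g.diff(phi), g.diff(theta)

E = sp.simplify(gu.dot(gu))   # a**2
F = sp.simplify(gu.dot(gv))   # 0
G = sp.simplify(gv.dot(gv))   # a**2*sin(phi)**2

# On (0, π) we have sin φ ≥ 0, so sqrt(EG − F²) = a² sin φ:
print(sp.simplify(E*G - F**2 - (a**2*sp.sin(phi))**2))           # 0
area = sp.integrate(a**2*sp.sin(phi), (phi, 0, sp.pi), (theta, 0, 2*sp.pi))
print(E, F, G, area)   # a**2, 0, a**2*sin(phi)**2, 4*pi*a**2
```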

EXERCISES 8.4

1. Let S be that portion of the plane x + 2y + 2z = 4 lying in the first octant, oriented with
outward normal pointing upwards. Find
a. the area of S
b. ∫_S (x − y + 3z) σ
c. ∫_S z dx ∧ dy + y dz ∧ dx + x dy ∧ dz

2. Find the area of that portion of the cylinder x2 + y 2 = a2 lying above the xy-plane and below
the plane z = y.
3. Find the area of that portion of the cone z = √(2(x² + y²)) lying beneath the plane y + z = 1.

*4. Find the area of that portion of the cylinder x2 +y 2 = 2y lying inside the sphere x2 +y 2 +z 2 = 4.
♯5. Let S be the sphere of radius a centered at the origin, oriented with normal pointing outwards. Evaluate ∫_S x dy ∧ dz + y dz ∧ dx + z dx ∧ dy explicitly. What formula do you deduce for the surface area of S?

6. Let S be the surface of the unit sphere, and let its area element be σ.
a. Calculate ∫_S x² σ directly.
b. Evaluate the integral in part a without doing any calculations. (Hint: Why is ∫_S x² σ = ∫_S y² σ = ∫_S z² σ?)

7. Find the surface area of the torus given parametrically in Example 1(b).

*8. Find the surface area of that portion of a sphere of radius a lying between two parallel planes
(both intersecting the sphere) a distance h apart.

9. Let S be that portion of the helicoid given parametrically by
g(u, v) = (u cos v, u sin v, v),   0 ≤ u ≤ 1, 0 ≤ v ≤ 2π.
a. With the orientation determined by g, decide whether the outward-pointing normal points upwards or downwards.
b. If we orient S with the normal pointing upwards, compute ∫_S x dz ∧ dx.

*10. We can parametrize the unit sphere (except for the north pole) by stereographic projection from the north pole, as indicated in Figure 4.9. If (u, v, 0) is the point where the line through (0, 0, 1) and (x, y, z) (on the sphere) intersects the plane z = 0, solve for u and v. Then solve for g(u, v) = (x, y, z). Explain geometrically why stereographic projection is an orientation-reversing parametrization.

[Figure 4.9: stereographic projection from the north pole; the line through (0, 0, 1) and a point (x, y, z) of the sphere meets the plane z = 0 at the point (u, v, 0), and g(u, v) = (x, y, z).]

11. Let ω = x dy ∧ dz. Let S be the unit sphere, oriented with outward-pointing normal. Calculate ∫_S ω by parametrizing S
a. by spherical coordinates
b. as a union of graphs
c. by stereographic projection (see Exercise 10)
12. Let S be the unit upper hemisphere, oriented with outward-pointing normal. Calculate ∫_S z σ by showing that zσ = dx ∧ dy as 2-forms on S.

13. Let S be the cylinder x² + y² = a², 0 ≤ z ≤ h, oriented with outward-pointing normal. Calculate ∫_S ω for
a. ω = z dx ∧ dy
b. ω = y dx ∧ dz.

*14. Find the moment of inertia about the z-axis of a uniform spherical shell of radius a centered
at the origin.

*15. Find the flux of the vector field F(x) = x outwards across the following surfaces (all oriented
with outward-pointing normal pointing away from the origin):
a. the surface of the sphere of radius a centered at the origin
b. the surface of the cylinder x2 + y 2 = a2 , −h ≤ z ≤ h
c. the surface of the cylinder x2 + y 2 = a2 , −h ≤ z ≤ h, together with the two disks,
 
x² + y² ≤ a², z = ±h
d. the surface of the cube with vertices at (±1, ±1, ±1)

16. Find the flux of the vector field F = (x², y², z²) outwards across the given surface S (all oriented
with outward-pointing normal pointing away from the origin, unless otherwise specified):
a. S is the sphere of radius a centered at the origin
b. S is the upper hemisphere of radius a centered at the origin
c. S is the cone z = √(x² + y²), 0 < z < 1, with outward-pointing normal having a negative
e3 -component
d. S is the cylinder x2 + y 2 = a2 , 0 ≤ z ≤ h
e. S is the cylinder x2 + y 2 = a2 , 0 ≤ z ≤ h, along with the disks x2 + y 2 ≤ a2 , z = 0 and
z=h
 
*17. Calculate the flux of the vector field F = (xz, yz, x² + y²) outwards across the surface of the paraboloid S given by z = 4 − x² − y², z ≥ 0 (with outward-pointing normal having positive e3-component).

*18. Find the flux of the vector field F(x) = x/‖x‖³ outwards across the given surface (oriented
with outward-pointing normal pointing away from the origin):
a. the surface of the sphere of radius a centered at the origin
b. the surface of the cylinder x2 + y 2 = a2 , −h ≤ z ≤ h
c. the surface of the cylinder x2 + y 2 = a2 , −h ≤ z ≤ h, together with the two disks,
x² + y² ≤ a², z = ±h
d. the surface of the cube with vertices at (±1, ±1, ±1)
p
19. Let S be that portion of the cone z = x2 + y 2 lying inside 2 2 2
Z the sphere x + y + z = 2ax,
and oriented with normal pointing downwards. Calculate ω for
S
a. ω = dx ∧ dy
x y
b. ω = dy ∧ dz + dz ∧ dx − dx ∧ dy.
z z
20. Suppose g : Ω → R³ gives a parametrized, oriented surface with unit outward normal n. Let N = ∂g/∂u × ∂g/∂v, so that n = N/‖N‖. Check that
g*(n1 dy ∧ dz + n2 dz ∧ dx + n3 dx ∧ dy) = ‖N‖ du ∧ dv = √(EG − F²) du ∧ dv.

21. Sketch the parametrized surface g : [0, 2π] × [−1, 1] → R³ given by
g(u, v) = ( (2 + v sin(u/2)) cos u, (2 + v sin(u/2)) sin u, v cos(u/2) ).
Compare g*(dy ∧ dz) at (u, v) = (0, 0) and at (u, v) = (2π, 0). Explain.

*22. Consider the “flat torus”
X = { (x1, x2, x3, x4) ∈ R⁴ : x1² + x2² = 1, x3² + x4² = 1 } ⊂ R⁴.
Orient X so that dx2 ∧ dx4 > 0 at the point (1, 0, 1, 0) ∈ X. Calculate ∫_X ω for
a. ω = dx1 ∧ dx2 + dx3 ∧ dx4
b. ω = dx1 ∧ dx3 + dx2 ∧ dx4
c. ω = x2x4 dx1 ∧ dx3.

23. Consider the cylinder S with equation x² + y² = 1, −1 ≤ z ≤ 1, oriented with unit normal pointing outwards. Calculate
a. ∫_S x dy ∧ dz − z dx ∧ dy
b. ∫_{C1} xz dy and ∫_{C2} xz dy (See Figure 4.10.)
Compare your answers and explain.

24. Let S be the hemisphere x² + y² + z² = a², z ≥ 0, oriented with unit normal pointing upwards. Let C be the boundary curve, x² + y² = a², z = 0, oriented counterclockwise. Calculate
a. ∫_S dx ∧ dy + 2z dz ∧ dx
b. ∫_C x dy + z² dx
Compare your answers and explain.

[Figure 4.10: the cylinder S of Exercise 23, with its two boundary circles C1 and C2.]

25. Construct two Möbius strips out of paper: For each, cut out a long rectangle, and attach the
short edges with opposite orientations.
a. Cut along the center circle of the first strip. What happens? Explain. What happens if
you repeat the process?
b. Make parallel cuts in the second strip one third of the way from either edge. What happens?
Explain.

26. Prove or give a counterexample: If S is an orientable surface, then there are exactly two possible
orientations on S.

5. Stokes’s Theorem

We now come to the generalization of Green’s Theorem to higher dimensions. We first stop to
make the official definition of the integral of a differential form over a compact, oriented manifold.
So far we have dealt only with the integrals of 1- and 2-forms over parametrized curves and surfaces,
respectively.

5.1. Integrating over a general compact, oriented k-dimensional manifold. We know


how to integrate a k-form over a parametrized k-dimensional manifold by pulling back. In general,
a manifold will be a union of parametrized pieces that overlap, and so summing the integrals will
give a meaningless result. To solve this problem, we introduce one of the powerful tools in the
study of manifolds, one that allows us to chop a global problem into local ones and then add up
the answers.
We start with a

Definition. A subset M ⊂ Rⁿ is called a k-dimensional manifold with boundary if for each point p ∈ M there is an open set W ⊂ Rⁿ containing p and a parametrization⁴ g : U → Rⁿ so that
(i) g(U) = V = W ∩ M; and
(ii) U is an open subset either of Rᵏ or of Rᵏ₊ = {u ∈ Rᵏ : uk ≥ 0}.⁵

⁴ Recall from Section 3 of Chapter 6 that this means that g is a one-to-one smooth map from U to W ∩ M so that Dg(u) has rank k for every u ∈ U and g⁻¹ : W ∩ M → U is continuous.
⁵ We say U ⊂ Rᵏ₊ is an open subset of Rᵏ₊ if it is the intersection of Rᵏ₊ with some open subset of Rᵏ.


See Figure 5.1. We say p is a boundary point of M (written p ∈ ∂M) if p = g(u) for some parametrization g : U → Rⁿ with U ⊂ Rᵏ₊ and u ∈ ∂Rᵏ₊ = {u ∈ Rᵏ : uk = 0}.

[Figure 5.1: a coordinate chart g carrying an open set U (in Rᵏ or in Rᵏ₊) onto V = W ∩ M.]


g(U ) is sometimes called a coordinate chart on M . A coordinate ball on M is the image of
some ball under some parametrization.

As was the case with surfaces, an orientation on a manifold with boundary M ⊂ Rⁿ is a continuously varying notion of what a positively oriented basis for the tangent space at each point should be. M has an orientation if and only if we can cover M by coordinate charts g : U → Rⁿ so that {∂g/∂u1, . . . , ∂g/∂uk} is a positive basis for the tangent space of M at each point. We say M is orientable if there is some orientation on M.
We leave it to the reader to prove, using Theorem 5.1, that M is orientable if and only if there
is a nowhere-vanishing k-form on M (see Exercise 23). Then we can make the

Definition. Let M be an oriented k-dimensional manifold with boundary. Its (oriented) volume
form is the k-form σ with the property that σ(a) assigns to each k-tuple of tangent vectors at a
the signed volume of the parallelepiped they span.

Now we come to the main technical tool that will enable us to define integration on manifolds.

Theorem 5.1. Let M ⊂ Rⁿ be a compact k-dimensional manifold with boundary. Then there are smooth real-valued functions ρ1, . . . , ρN on M so that
(i) 0 ≤ ρi ≤ 1 for all i;
(ii) each ρi is zero outside some coordinate ball;
(iii) ρ1 + · · · + ρN = 1.
{ρi} is called a partition of unity on M.

Proof. Step 1. Define h : R → R by
h(x) = e^{−1/x} for x > 0, and h(x) = 0 for x ≤ 0.
Then h is smooth (in particular, all its derivatives at 0 are equal to 0, as we ask the reader to prove in Exercise 25). Set
j(x) = ( ∫_0^x h(t)h(1 − t) dt ) / ( ∫_0^1 h(t)h(1 − t) dt ),
and define ψ : Rᵏ → R by ψ(x) = j(3 − 2‖x‖). Then ψ is a smooth function with ψ(x) = 1 whenever ‖x‖ ≤ 1 and ψ(x) = 0 whenever ‖x‖ ≥ 3/2. ψ is often called a bump function. (See Figure 5.2 for the graph of ψ for k = 1.)

[Figure 5.2: the graphs of y = h(x)h(1 − x), y = j(x), and y = ψ(x).]
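A rough numerical rendering of h, j, and ψ for k = 1 (a sketch assuming NumPy and SciPy; the printed values simply illustrate that ψ = 1 on [−1, 1] and ψ = 0 outside (−3/2, 3/2)):

```python
import numpy as np
from scipy.integrate import quad

def h(x):
    return np.exp(-1.0 / x) if x > 0 else 0.0

def j(x):
    # j(x) = ∫₀ˣ h(t)h(1−t) dt / ∫₀¹ h(t)h(1−t) dt; the integrand vanishes
    # outside [0, 1], so the upper limit may be clamped to [0, 1].
    num, _ = quad(lambda t: h(t) * h(1 - t), 0, min(max(x, 0.0), 1.0))
    den, _ = quad(lambda t: h(t) * h(1 - t), 0, 1)
    return num / den if x > 0 else 0.0

def psi(x):
    # ψ(x) = j(3 − 2|x|): equal to 1 for |x| ≤ 1 and 0 for |x| ≥ 3/2
    return j(3 - 2 * abs(x))

for x in [0.0, 0.5, 1.0, 1.2, 1.4, 1.5, 2.0]:
    print(x, round(psi(x), 4))
```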

Step 2. For each point p ∈ M , choose a coordinate chart whose domain is a ball of radius 2
in Rk (why can we do so?).6 The images of the balls of radius 1 obviously cover all of M ; indeed
we can choose a sequence (countable number) of p’s so that this is true. (See Exercise 26.) By
Exercise 5.1.12, finitely many of these images of balls of radius 1, say, V1 , . . . , VN , cover all of M .
Let gi : B(0, 2) → Vi be the respective coordinate charts, and define θi = ψ ◦ gi−1 , interpreting θi to
be defined on all of M by letting it be 0 outside of Vi (note that the fact that ψ is 0 outside the
ball of radius 3/2 means that θi will be smooth). Set
ρi = θi / (θ1 + · · · + θN).
Note that for each p ∈ M, we have p = gj(u) for some j and some u ∈ B(0, 1), and hence θj(p) = 1 for some j. Thus, the sum is everywhere positive. These functions ρi fulfill the requirements of the theorem. □

⁶ For those p in the boundary, this will be a half-ball, i.e., the points in the ball with nonnegative kth coordinate.

Now it is easy to define the integral. Let M ⊂ Rn be a compact, oriented k-dimensional manifold
(with piecewise-smooth boundary). Let ω be a k-form on M .7 Let {ρi } be a partition of unity,
and let gi be the corresponding parametrizations, which we may take to be orientation-preserving
(how?). Now we set
∫_M ω = ∫_M ( ∑_{i=1}^N ρi ω ) = ∑_{i=1}^N ∫_{B(0,2)} gi*(ρi ω).

The point is that the form ρi ω is nonzero only inside the image of the parametrization gi .
One last technical point. Let M be a k-dimensional manifold with boundary, and let p be a
boundary point. The tangent space of ∂M at p is a (k − 1)-dimensional subspace of the tangent
space of M at p, and its orthogonal complement is 1-dimensional. That 1-dimensional subspace has
two possible basis vectors, called the inward- and outward-pointing normal vectors. By definition,
if we follow a curve starting at p whose tangent vector is the inward-pointing normal, we move into
M, as shown in Figure 5.3. We endow ∂M with an orientation, called the boundary orientation, by saying that the outward normal, n, followed by a positively-oriented basis for the tangent space of ∂M, should provide a positively-oriented basis for the tangent space of M. For examples, see Figure 5.4. We ask the reader to check in Exercise 1 that the boundary orientation on ∂Rᵏ₊ is the usual one on Rᵏ⁻¹ precisely when k is even.

[Figure 5.3: near a boundary point p: the tangent space Tp(∂M), a path moving into M along the inward-pointing normal, and the outward-pointing normal n.]

[Figure 5.4: the boundary orientations of R²₊ and R³₊.]


⁷ We are being a bit casual about what a smooth function or k-form on M ought to mean. We might start with
something defined on a neighborhood of M in Rn or, instead, we might just know the pullbacks under coordinate
charts are smooth. Because of Theorem 3.1 of Chapter 6, these notions are equivalent. We leave the technical details
to a more advanced course. In practice, except for results such as Theorem 5.1, we will usually start with objects
defined on Rn anyhow.

5.2. Stokes’s Theorem. Now we come to the crowning result. We will give various physical
interpretations and applications in the next section, as well as some applications to topology in the
last section of the chapter. Here we will give the theorem and some concrete examples.

Theorem 5.2 (Stokes’s Theorem). Let M be a compact, oriented k-dimensional manifold with
boundary, and let ω be a smooth (k − 1)-form on M . Then
∫_{∂M} ω = ∫_M dω.

(Here ∂M is endowed with the boundary orientation, as described above.)

Remark. Note that the usual Fundamental Theorem of Calculus, the Fundamental Theorem
of Calculus for Line Integrals (Proposition 3.1), and Green’s Theorem (Corollary 3.5) are all special
cases of this theorem. When we’re orienting the boundary of an oriented line segment, we assign a
+ when the outward-pointing normal agrees with the orientation on the segment, and a − when it
disagrees. This is compatible with the signs in
∫_a^b df = ∫_a^b f′(t) dt = f(b) − f(a).

Proof. Since both sides of the desired equation are linear in ω, we can (by using a partition of
unity) reduce to the case that ω is zero outside of a compact subset of a single coordinate chart,
g : U → Rn (where U is open in either Rk or Rk+ ). Then we have
∫_M dω = ∫_{g(U)} dω = ∫_U g*(dω) = ∫_U d(g*ω).

g*ω, being a (k − 1)-form on U ⊂ Rᵏ, can be written as follows:
g*ω = ∑_{i=1}^k fi(x) dx1 ∧ · · · ∧ dx̂i ∧ · · · ∧ dxk,
where dx̂i indicates that the dxi term is omitted. So we have
d(g*ω) = ∑_{i=1}^k (∂fi/∂xi) dxi ∧ dx1 ∧ · · · ∧ dx̂i ∧ · · · ∧ dxk = ∑_{i=1}^k (−1)^{i−1} (∂fi/∂xi) dx1 ∧ · · · ∧ dxi ∧ · · · ∧ dxk.

Case 1. Suppose U is open in Rᵏ; this means that ω = 0 on ∂M, and so we need only show that ∫_M dω = ∫_U d(g*ω) = 0. The crucial point is this: since g*ω is smooth and 0 outside of a compact subset of U, we may choose a rectangle R containing U, as shown in Figure 5.5, and extend the functions fi to functions on all of R by setting them equal to 0 outside of U. Finally, we integrate over R = [a1, b1] × · · · × [ak, bk]:

∫_U d(g*ω) = ∫_R ∑_{i=1}^k (−1)^{i−1} (∂fi/∂xi) dx1 ∧ · · · ∧ dxi ∧ · · · ∧ dxk
= ∑_{i=1}^k (−1)^{i−1} ∫_R (∂fi/∂xi) dx1 dx2 · · · dxk
= ∑_{i=1}^k (−1)^{i−1} ∫_{ak}^{bk} · · · ∫_{a1}^{b1} ( ∫_{ai}^{bi} (∂fi/∂xi) dxi ) dx1 · · · dx̂i · · · dxk
= ∑_{i=1}^k (−1)^{i−1} ∫_{ak}^{bk} · · · ∫_{a1}^{b1} ( fi(x1, . . . , bi, . . . , xk) − fi(x1, . . . , ai, . . . , xk) ) dx1 · · · dx̂i · · · dxk
= 0,

since fi = 0 everywhere on the boundary of R. (Note the applications of Fubini's Theorem and the traditional Fundamental Theorem of Calculus.)

[Figure 5.5: a rectangle R = [a1, b1] × · · · × [ak, bk] containing U; the fi vanish outside U.]
Case 2. Now comes the more interesting situation. Suppose U is open in Rᵏ₊, and once again we extend the functions fi to functions on a rectangle R ⊂ Rᵏ₊ by letting them be 0 outside of U. In this case, the rectangle is of the form R = [a1, b1] × · · · × [a_{k−1}, b_{k−1}] × [0, bk], as we see in Figure 5.6. Now we have

∫_U d(g*ω) = ∫_R ∑_{i=1}^k (−1)^{i−1} (∂fi/∂xi) dx1 ∧ · · · ∧ dxi ∧ · · · ∧ dxk
= ∑_{i=1}^k (−1)^{i−1} ∫_R (∂fi/∂xi) dx1 dx2 · · · dxk
= ∑_{i=1}^k (−1)^{i−1} ∫_{ak}^{bk} · · · ∫_{a1}^{b1} ( ∫_{ai}^{bi} (∂fi/∂xi) dxi ) dx1 · · · dx̂i · · · dxk
= ∑_{i=1}^k (−1)^{i−1} ∫_{ak}^{bk} · · · ∫_{a1}^{b1} ( fi(x1, . . . , bi, . . . , xk) − fi(x1, . . . , ai, . . . , xk) ) dx1 · · · dx̂i · · · dxk
= (−1)^{k−1} ∫_{a_{k−1}}^{b_{k−1}} · · · ∫_{a1}^{b1} ( fk(x1, . . . , x_{k−1}, bk) − fk(x1, . . . , x_{k−1}, 0) ) dx1 · · · dx_{k−1}

(since all the other integrals vanish for the same reason as in Case 1)

= (−1)ᵏ ∫_{U ∩ ∂Rᵏ₊} fk(x1, . . . , x_{k−1}, 0) dx1 · · · dx_{k−1}
= ∫_{U ∩ ∂Rᵏ₊} g*ω = ∫_{∂M} ω,

as required. Note the crucial sign in the definition of the boundary orientation (see also Exercise 1). □

[Figure 5.6: the rectangle R = [a1, b1] × · · · × [a_{k−1}, b_{k−1}] × [0, bk] in Rᵏ₊, containing U.]

Remark . Although we won’t take the time to prove it here, Stokes’s Theorem is also valid
when the boundary, rather than being a manifold itself, is piecewise smooth, e.g., a union of smooth
(k − 1)-dimensional manifolds with boundary intersecting along (k − 2)-dimensional manifolds. For
example, we may take a cube or a solid cylinder, whose boundary is the union of a cylinder and
two disks. The theorem also applies to such non-manifolds as a solid cone.

Corollary 5.3. Let M be a compact, oriented k-dimensional manifold without boundary. Let ω be an exact k-form, i.e., ω = dη for some (k − 1)-form η. Then ∫_M ω = 0.

Proof. This is immediate from case 1 of the proof of Theorem 5.2. 

Example 1. Let C be the intersection of the unit sphere x² + y² + z² = 1 and the plane x + 2y + z = 0, oriented counterclockwise as viewed from high above the xy-plane. We wish to evaluate ∫_C (z − x) dx + (x − y) dy + (y − z) dz.
We let ω = (z−x)dx+(x−y)dy+(y−z)dz and M be that portion of the plane x+2y+z = 0 lying
inside the unit sphere, oriented so that the outward-pointing normal has a positive e3 -component,
as shown in Figure 5.7. Then ∂M = C, and by Stokes’s Theorem we have
(∗) ∫_C ω = ∫_{∂M} ω = ∫_M dω = ∫_M (dy ∧ dz + dz ∧ dx + dx ∧ dy).

[Figure 5.7: M, the portion of the plane x + 2y + z = 0 inside the unit sphere, and its projection D onto the xy-plane.]

Parametrizing the plane by projection on the xy-plane, we have M = g(D), where D is the interior of the ellipse 2x² + 4xy + 5y² = 1 (why?), and
g(x, y) = (x, y, −x − 2y).
(The reader should check that g is an orientation-preserving parametrization.) Therefore,
∫_M dω = ∫_D g*(dω) = ∫_D (1 + 2 + 1) dx ∧ dy = 4 area(D).
Now, by Exercise 5.4.15 or by techniques we shall learn in Chapter 9, this ellipse has semimajor axis 1 and semiminor axis 1/√6, so, using the result of Exercise 8.3.12, its area is π/√6, and the integral is 4π/√6.
Alternatively, applying our discussion of flux in Section 4, we recognize the surface integral in (∗) as the flux of the constant vector field F = (1, 1, 1) outwards across M. Since the unit normal of M is n = (1/√6)(1, 2, 1), we see that
∫_M (dy ∧ dz + dz ∧ dx + dx ∧ dy) = ∫_M (1, 1, 1) · (1/√6)(1, 2, 1) dS = (4/√6) area(M) = (4/√6) π,
since M is, after all, a disk of radius 1. ▽
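A direct numerical check of this value (a sketch assuming NumPy; the orthonormal basis {u, v} of the plane below is one convenient choice, with u × v = n, so the parametrization traverses C counterclockwise as seen from above):

```python
import numpy as np

n = np.array([1.0, 2.0, 1.0]) / np.sqrt(6)
u = np.array([1.0, 0.0, -1.0]) / np.sqrt(2)
v = np.cross(n, u)                       # then u × v = n

N = 200000
t = (np.arange(N) + 0.5) * (2 * np.pi / N)
c = np.outer(np.cos(t), u) + np.outer(np.sin(t), v)      # points of C
dc = np.outer(-np.sin(t), u) + np.outer(np.cos(t), v)    # c'(t)
x, y, z = c[:, 0], c[:, 1], c[:, 2]
F = np.stack([z - x, x - y, y - z], axis=1)              # the 1-form ω as a field

integral = np.sum((F * dc).sum(axis=1)) * (2 * np.pi / N)
print(integral, 4 * np.pi / np.sqrt(6))    # both ≈ 5.1302
```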

Example 2. Let S be the sphere x² + y² + (z − 1)² = 1, oriented in the customary fashion. We wish to evaluate ∫_S ω, where ω = xz dy ∧ dz + yz dz ∧ dx + z² dx ∧ dy. Let M be the compact 3-manifold with boundary whose boundary is S, i.e., M = {x ∈ R³ : x² + y² + (z − 1)² ≤ 1}, oriented by the standard orientation on R³. We apply Stokes’s Theorem to M:
∫_S ω = ∫_{∂M} ω = ∫_M dω = ∫_M 4z dx ∧ dy ∧ dz = ∫_M 4z dV = 4 z̄ vol(M) = 4 · 1 · (4π/3) = 16π/3.
Of course, we could compute the surface integral directly, parametrizing S by, for example, spherical coordinates centered at (0, 0, 1). ▽
Example 3. Suppose we wish to calculate the flux of the vector field F = (xz, yz, x² + y²) outwards across the surface of the paraboloid S given by z = 4 − x² − y², z ≥ 0 (with outward-pointing normal having positive e3-component). That is, we must compute the integral of ω = xz dy ∧ dz + yz dz ∧ dx + (x² + y²) dx ∧ dy. How might we do this with Stokes’s Theorem? If ω were exact, i.e., if ω = dη for some 1-form η, then we would have ∫_S ω = ∫_{∂S} η; but since dω = 2z dx ∧ dy ∧ dz ≠ 0, we know that ω cannot be exact. What now?

[Figure 5.8: the paraboloid S capped off by the disk D, bounding the region M.]

If we attach the disk D = {x² + y² ≤ 4, z = 0} to S, then we have a (piecewise-smooth) closed surface, which bounds the region M = {0 ≤ z ≤ 4 − x² − y²} ⊂ R³, as shown in Figure 5.8. Then we have ∂M = S ∪ D⁻ (where by D⁻ we mean the disk with outward-pointing normal given by −e3). Applying Stokes’s Theorem, we find
∫_{∂M} ω = ∫_M dω = ∫_M 2z dx ∧ dy ∧ dz = ∫_M 2z dV = ∫_0^{2π} ∫_0^2 ∫_0^{4−r²} 2rz dz dr dθ = (64/3)π.
But we are interested in the integral of ω only over the surface S. Since
∫_{∂M} ω = ∫_S ω + ∫_{D⁻} ω = ∫_S ω − ∫_D ω
(where by D we mean the disk with its usual upwards orientation), we have
∫_S ω = ∫_{∂M} ω + ∫_D ω = (64/3)π + ∫_0^{2π} ∫_0^2 r² · r dr dθ = (64/3)π + 8π = (88/3)π.
We leave it to the reader to check this by a direct calculation (see Exercise 8.4.17). ▽

Example 4. We come now to the 3-dimensional analogue of Example 9 of Section 3. It will play
a major rôle in physical and topological applications in upcoming sections. Consider the 2-form
ω = (x dy ∧ dz + y dz ∧ dx + z dx ∧ dy) / (x² + y² + z²)^{3/2},
which is defined and smooth on R3 − {0}. The astute reader may recognize that on a sphere of
radius a centered at the origin, ω is 1/a2 times the area 2-form.
Pulling back by the spherical coordinates parametrization given on p. 315, with a bit of work
we see that
g∗ ω = sin φdφ ∧ dθ,

which establishes again the geometric interpretation of ω. It is also clear that d(g∗ ω) = 0; since
det Dg 6= 0 whenever ρ 6= 0 and φ 6= 0, π, it follows that dω = 0. (Of course, it isn’t too hard to
calculate this directly!)
So here we have a 2-form whose integral over any sphere centered at the origin (with outward-pointing normal) is 4π, and yet, for any ball B centered at the origin, ∫_B dω = 0. What happened to Stokes’s Theorem? The problem is that ω is not defined, let alone smooth, on all of B.
But there is more to be learned here. If Ω ⊂ R³ is a compact 3-manifold with boundary with 0 ∉ ∂Ω, then we claim that
∫_{∂Ω} ω = 4π if 0 ∈ Ω, and ∫_{∂Ω} ω = 0 if 0 ∉ Ω,
rather like what happened with the winding number in Example 10 of Section 3. When 0 ∉ Ω, we know that ω is a (smooth) 2-form on all of Ω, and hence Stokes’s Theorem applies directly to give
∫_{∂Ω} ω = ∫_Ω dω = 0.

When 0 ∈ Ω, however, we choose ε > 0 small enough so that the closed ball B(0, ε) ⊂ Ω, and we let Ωε = Ω − B(0, ε), as pictured in Figure 5.9, recalling that ∂Ωε = ∂Ω + Sε⁻. (Here Sε denotes the sphere of radius ε centered at 0, with its usual outward orientation.) Then ω is a smooth form defined on all of Ωε and we have
0 = ∫_{Ωε} dω = ∫_{∂Ωε} ω = ∫_{∂Ω} ω − ∫_{Sε} ω.
Therefore, we have
∫_{∂Ω} ω = ∫_{Sε} ω = 4π,
as we learned above. ▽

[Figure 5.9: the region Ωε obtained by removing the small ball B(0, ε) from Ω.]
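The pullback computation can be confirmed symbolically (a sketch assuming SymPy): on the sphere of radius a, the coefficient of dφ ∧ dθ in the pullback of ω simplifies to sin φ.

```python
import sympy as sp

a, phi, theta = sp.symbols('a phi theta', positive=True)
x = a*sp.sin(phi)*sp.cos(theta)
y = a*sp.sin(phi)*sp.sin(theta)
z = a*sp.cos(phi)

def pull(f, g):
    # coefficient of dφ ∧ dθ in the pullback of df ∧ dg
    return sp.diff(f, phi)*sp.diff(g, theta) - sp.diff(f, theta)*sp.diff(g, phi)

num = sp.simplify(x*pull(y, z) + y*pull(z, x) + z*pull(x, y))       # a**3*sin(phi)
den = sp.simplify((x**2 + y**2 + z**2)**sp.Rational(3, 2))          # a**3
print(sp.simplify(num/den))                                         # sin(phi)
```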

EXERCISES 8.5

*1. Check that the boundary orientation on ∂Rk+ is (−1)k times the usual orientation on Rk−1 .

2. Let C be the intersection of the cylinder x² + y² = 1 and the plane 2x + 3y − z = 1, oriented counterclockwise as viewed from high above the xy-plane. Evaluate
∫_C y dx − 2z dy + x dz
directly and by applying Stokes’s Theorem.
*3. Compute ∫_C (y − z) dx + (z − x) dy + (x − y) dz, where C is the intersection of the cylinder x² + y² = a² and the plane x/a + z/b = 1, oriented clockwise as viewed from high above the xy-plane.

4. Let C be the intersection of the sphere x² + y² + z² = 2 and the plane z = 1, oriented counterclockwise as viewed from high above the xy-plane. Evaluate
∫_C (−y³ + z) dx + (x³ + 2y) dy + (y − x) dz.

5. Let C be the intersection of the sphere x² + y² + z² = a² and the plane x + y + z = 0, oriented counterclockwise as viewed from high above the xy-plane. Evaluate
∫_C 2z dx + 3x dy − dz.

*6. Let Ω ⊂ R³ be the region bounded above by the sphere x² + y² + z² = a² and below by the plane z = 0. Compute
∫_{∂Ω} xz dy ∧ dz + yz dz ∧ dx + (x² + y² + z²) dx ∧ dy
directly and by applying Stokes’s Theorem.

7. Let ω = y² dy ∧ dz + x² dz ∧ dx + z² dx ∧ dy, and let M be the solid paraboloid 0 ≤ z ≤ 1 − x² − y². Evaluate ∫_{∂M} ω directly and by applying Stokes’s Theorem.

8. Let M be the surface of the paraboloid z = 1 − x² − y², z ≥ 0, oriented so that the outward-pointing normal has positive e3-component. Let F = (x²z, y²z, x² + y²). Compute ∫_M F · n dS directly and by applying Stokes’s Theorem. Be careful!

9. Let M be the surface pictured in Figure 5.10, with boundary curve x² + y² = 4, z = 0. Calculate
∫_M yz dy ∧ dz + x³ dz ∧ dx + y² dx ∧ dy.

10. Suppose M and M′ are two compact, oriented k-dimensional manifolds with boundary, and suppose ∂M = ∂M′ (as oriented (k − 1)-dimensional manifolds). Prove that for any (k − 1)-form ω, ∫_M dω = ∫_{M′} dω.
11. Use the result of Exercise 10 to compute ∫_M dω for the given surface M and 1-form ω:
[Figure 5.10: the surface M of Exercise 9, whose boundary is the circle x² + y² = 4 in the plane z = 0.]

a. M is the upper hemisphere x2 +y 2 +z 2 = a2 , z ≥ 0, oriented with outward-pointing normal


having positive e3 -component; ω = (x3 + 3x2 y − y)dx + (y 3 z + x + x3 )dy + (x2 + y 2 + z)dz
b. M is that portion of the paraboloid z = x2 + y 2 lying beneath z = 4, oriented with
outward-pointing normal having negative e3 -component; ω = ydx + zdy + xdz
c. M is the union of the cylinder x2 + y 2 = 1, 0 ≤ z ≤ 2, and the disk x2 + y 2 ≤ 1,
z = 0, oriented so that the normal to the cylindrical portion points radially outwards;
ω = −y 3 zdx + x3 zdy + x2 y 2 dz

12. Let M = {x ∈ R⁴ : x1² + x2² + x3² ≤ x4 ≤ 1}, with the standard orientation inherited from R⁴. Evaluate ∫_{∂M} ω:
*a. ω = (x1³x2⁴ + x4) dx1 ∧ dx2 ∧ dx3
b. ω = ‖x‖² dx1 ∧ dx2 ∧ dx3

13. Redo Exercise 8.4.22c by applying Stokes’s Theorem.

14. Suppose f is a smooth function on a compact 3-manifold with boundary M ⊂ R³. At a point of ∂M, let Dn f denote the directional derivative of f in the direction of the unit outward normal. Show that
∫_{∂M} Dn f dS = ∫_M ∇²f dV,
where ∇²f = ∂²f/∂x² + ∂²f/∂y² + ∂²f/∂z² is the Laplacian of f. (Hint: ∇²f dx ∧ dy ∧ dz = d⋆df. See Exercise 8.2.9.)

15. Let S be that portion of the cylinder x2 + y 2 = a2 lying above the xy-plane and below the
sphere x2 + (y − a)2 + z 2 = 4a2 . Let C be the intersection of the cylinder and sphere, oriented
clockwise as viewed from high above the xy-plane.
a. Evaluate ∫_S z dS.
b. Use your answer to part a to evaluate ∫_C y(z² − 1) dx + x(1 − z²) dy + z² dz.

16. Let S be that portion of the cylinder x² + y² = a² lying above the xy-plane and below the sphere (x − a)² + y² + z² = 4a². Let C be the intersection of the cylinder and sphere, oriented clockwise as viewed from high above the xy-plane.
a. Evaluate ∫_S z² dS.
b. Use your answer to part a to evaluate ∫_C y(z³ + 1) dx − x(z³ + 1) dy + z dz.
17. Let
M = { (x1, x2, y1, y2) ∈ R² × R² : ‖x‖² + ‖y‖² = 1, x · y = 0 } ⊂ R⁴,
oriented so that (dx2 ∧ dy2)/(y1² − x1²) > 0 on M. Evaluate ∫_M (y1² − x1²) dx2 ∧ dy2. (Hint: By applying an appropriate linear transformation, you should be able to recognize M as a torus.)

*18. Let C be the intersection of the sphere x2 + y 2 + z 2 = 1 and the plane x + y + z = 0, oriented
counterclockwise as viewed from high above the xy-plane. Evaluate
∫_C z³ dx.

(Hint: Give an orthonormal basis for the plane x + y + z = 0, and use polar coordinates.)

19. Let C be the intersection of the sphere x2 + y 2 + z 2 = 1 and the plane x + y + z = 0, oriented
counterclockwise as viewed from high above the xy-plane. Evaluate
∫_C xy² dx + yz² dy + zx² dz.

(Hint: See Exercise 18.)

20. Suppose ω ∈ Ak−2 (Rk ). Complete the following proof that d(dω) = 0. Write d(dω) =
f (x)dx1 ∧ · · · ∧ dxk , and suppose f (a) > 0. By considering the integral of d(dω) over a small
ball centered at a and applying Corollary 5.3, arrive at a contradiction.

21. We saw in Example 8 of Section 3 that there are 1-forms ω on R² with the property that for every region S ⊂ R² we have area(S) = ∫_{∂S} ω. Can there be such a 1-form on
a. the unit sphere?
b. the torus?
c. the punctured sphere (i.e., the sphere less the north pole)?

22. In this exercise we sketch a proof that the graph of a function f satisfying the minimal surface equation (see p. 118) on a region Ω ⊂ R² has less area than any other surface with the same boundary curve.⁸
a. Consider the area 2-form σ of the graph of f:
σ = (1/√(1 + ‖∇f‖²)) ( −(∂f/∂x) dy ∧ dz − (∂f/∂y) dz ∧ dx + dx ∧ dy ).
Show that dσ = 0 if and only if f satisfies the minimal surface equation.

⁸ This is an illustration of the use of calibrations, introduced by Reese Harvey and Blaine Lawson in their seminal paper, Calibrated Geometries, Acta Math. 148 (1982), pp. 47–157.
b. Show that for any compact oriented surface N ⊂ R³, ∫_N σ ≤ area(N), and equality holds if and only if N is parallel to the graph of f. (Hint: Interpret ∫_N σ as a flux integral.)
c. Let M be the graph of f over Ω, and let N be a different oriented surface with ∂N = ∂M .
Deduce that area(M ) < area(N ).

23. a. Prove that M is an orientable k-dimensional manifold with boundary if and only if there is
a nowhere-zero k-form on M . (Hint: For “=⇒,” use definition (1) of a manifold on p. 250
and a partition of unity to glue together compatibly chosen forms on coordinate charts.
Although we’ve only proved Theorem 5.1 for a compact manifold M , the proof can easily
be adapted to show that for any manifold M and any covering {Uj } by coordinate charts,
we have a sequence of such functions ρi , each of which is zero outside some Uj .)
b. Conclude that M is orientable if and only if there is a volume form globally defined on M .

24. Let M be a compact, orientable k-dimensional manifold (with no boundary), and let ω be a
(k − 1)-form. Show that dω = 0 at some point of M . (Hint: Using Exercise 23, write dω = f σ,
where σ is the volume form of M . Without loss of generality, you may assume M is connected.
Why?)

25. Let h(x) = e^{−1/x} for x > 0 and h(x) = 0 for x ≤ 0. Because exponential functions grow faster at infinity than any
polynomial, it should be plausible that all the derivatives of h at 0 are 0. But give a rigorous
proof as follows:
a. Let f (x) = e−1/x , x > 0. Prove by induction that f (k) , the kth derivative of f , is given by
f (k) (x) = e−1/x pk (1/x) for some polynomial pk of degree 2k.
b. Prove by induction that h(k) (0) = 0 for all k ≥ 0.

26. Let X ⊂ Rn . Prove that given any collection {Vα } of open subsets of Rn whose union contains
X, there is a sequence Vα1 , Vα2 , . . . of these sets whose union contains X. (Hint: Consider all
balls B(q, 1/k) ⊂ Rn (for some k ∈ N) centered at points q ∈ Rn all of whose coordinates are
rational. This collection is countable, i.e., can be arranged in a sequence. Show that we can
choose such balls B(qi , 1/ki ), i = 1, 2, . . ., covering all of X with the additional property that
each is contained in some Vαi .)

6. Applications to Physics

6.1. The Dictionary in R3 . We have already seen that a vector field in R3 can plausibly be
interpreted as either a 1-form or a 2-form, the former when we are calculating work, the latter when
we are calculating flux. We have already seen that for any function f , the 1-form df corresponds
to the vector field ∇f . We want to give the traditional interpretations of the exterior derivative as
it acts on 1- and 2-forms.

Given a 1-form ω = F1 dx1 + F2 dx2 + F3 dx3 ∈ A¹(R³), we have
dω = (∂F3/∂x2 − ∂F2/∂x3) dx2 ∧ dx3 + (∂F1/∂x3 − ∂F3/∂x1) dx3 ∧ dx1 + (∂F2/∂x1 − ∂F1/∂x2) dx1 ∧ dx2.

(We stick to the subscript notation here to make the symmetries as clear as possible.) Correspondingly, given the vector field F = (F1, F2, F3), we set
curl F = ( ∂F3/∂x2 − ∂F2/∂x3,  ∂F1/∂x3 − ∂F3/∂x1,  ∂F2/∂x1 − ∂F1/∂x2 ).

Note first of all that d2 = 0 tells us that

curl (∇f ) = 0 for all C2 functions f .

In somewhat older books one often sees the notation “rot,” rather than curl; both terms suggest
that we think of curl F as having something to do with rotation (curling).
Stokes’s Theorem can now be phrased in the following classical form:

Theorem 6.1 (Classical Stokes’s Theorem). Let S ⊂ R3 be a compact, oriented surface with
boundary. Let F be a smooth vector field defined on all of S. Then we have
∫_{∂S} F · T ds = ∫_S curl F · n dS
(the integrand on the left is the 1-form ω corresponding to F, and the integrand on the right is dω).

If we return to our discussion of flux in Section 4 and visualize F as the velocity field of a fluid, then the line integral ∫_C F · T ds around a closed curve C may be interpreted as the circulation
of F around C, which we might visualize as a measure of the tendency of a piece of wire in the
shape of C to turn (or circulate) when dropped in the fluid. Applying the theorem with S = Dr ,
a 2-dimensional disk of radius r centered at a with normal vector n, and using continuity (see
Exercise 7.1.7), we have
curl F(a) · n = lim_{r→0⁺} (1/(πr²)) ∫_{∂Dr} F · T ds.

In particular, if, as pictured in Figure 6.1, we stick a very small paddlewheel (of radius r) in the
fluid, it will spin the fastest when the axle points in the direction of curl F (and—at least in the
limit—won’t spin at all when the axle is orthogonal to curl F!). Indeed, if the fluid—and hence the
paddlewheel—is spinning about an axis with angular speed ν, then ‖curl F‖ = 2ν (see Exercise 1).
Now, given the 2-form

ω = F1 dx2 ∧ dx3 + F2 dx3 ∧ dx1 + F3 dx1 ∧ dx2 ∈ A2 (R3 )



[Figure 6.1: a small paddlewheel in the fluid; it spins fastest when its axle points along curl F.]

(which happens to be obtained by applying the star operator, defined in Exercise 8.2.9, to our
original 1-form), then
 
dω = (∂F1/∂x1 + ∂F2/∂x2 + ∂F3/∂x3) dx1 ∧ dx2 ∧ dx3.
Correspondingly, given the vector field F = (F1, F2, F3), we set
div F = ∂F1/∂x1 + ∂F2/∂x2 + ∂F3/∂x3.

“div” is short for divergence, a term that is à propos, as we shall soon see. In this case, d2 = 0 can
be restated as
div (curl F) = 0 for all C2 vector fields F.
Stokes’s Theorem now takes the following form, sometimes called Gauss’s Theorem:

Theorem 6.2 (Classical Divergence Theorem). Suppose F is a smooth vector field on a compact
3-manifold with boundary, Ω ⊂ R3 . Then
∫_{∂Ω} F · n dS = ∫_Ω div F dV
(the integrand on the left is the 2-form ω corresponding to F, and the integrand on the right is dω).

Once again, we get from this a limiting interpretation of the divergence: Applying Exercise
7.1.7, we find
(∗) div F(a) = lim_{r→0⁺} ( 1 / ((4/3)πr³) ) ∫_{∂B(a,r)} F · n dS.
That is, div F(a) is a measure of the flux (per unit volume) outwards across very small spheres
centered at a. If that flux is positive, we can visualize a as a source of the field, with a net divergence
of the fluid flow; if the flux is negative, we can visualize a as a sink , with a net confluence of the
fluid. We shall see a beautiful alternative interpretation of the divergence in Chapter 9.
Given a vector field F (in the context of work) and the corresponding 1-form ω, applying the
star operator introduced in Exercise 8.2.9 gives the 2-form ⋆ω corresponding to the same vector
field F (in the context of flux)—and vice versa. That is, when we have an oriented surface S, the
2-form ⋆ω gives the normal component of F times the area 2-form σ of S. In particular, if we start

with a function f , then on S, ⋆df = (Dn f )σ, where Dn f = ∇f · n is the directional derivative of
f in the normal direction.
We summarize the relation among forms and vector fields, the d operator and gradient, curl,
and divergence, in the following table:

Differential Forms                Fields

0-forms                           functions (scalar fields)
   ↓ d                               ↓ grad
1-forms   (d² = 0)                vector fields (work)          curl(grad) = 0
   ↓ d                               ↓ curl
2-forms   (d² = 0)                vector fields (flux)          div(curl) = 0
   ↓ d                               ↓ div
3-forms                           functions (scalar fields)
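Both identities in the right-hand column are easy to spot-check symbolically (a sketch assuming SymPy's sympy.vector module, with arbitrarily chosen fields f and F):

```python
from sympy import sin, exp
from sympy.vector import CoordSys3D, gradient, curl, divergence

C = CoordSys3D('C')
f = C.x**2 * sin(C.y) + exp(C.z) * C.x            # an arbitrary scalar field
F = C.x*C.y*C.i + C.y*C.z**2*C.j + sin(C.x)*C.k   # an arbitrary vector field

print(curl(gradient(f)))      # 0 — curl(grad f) = 0
print(divergence(curl(F)))    # 0 — div(curl F) = 0
```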

6.2. Gauss’s Law. In this passage we concentrate on inverse square forces, either gravitation
(according to Newton’s law of gravitation) or electrostatic attraction (according to Coulomb’s
law). We will stick with the notation of Newton’s law of gravitation, as we discussed in Section 4
of Chapter 7: the gravitational attraction of a mass M at the origin on a unit test mass at position
x is given by
F = −GM x/‖x‖³.
(Here G is the universal gravitation constant.) As we saw in Example 4 of Section 5, div F = 0 (except at the origin) and for any compact surface S ⊂ R³ bounding a region Ω, we have
∫_S F · n dS = −4πGM if 0 ∈ Ω, and ∫_S F · n dS = 0 otherwise.

(We must also stipulate that 0 ∉ S for the integral to make sense.) More generally, if Fa is the gravitational force field due to a point mass at a point a ∉ S, then
∫_S Fa · n dS = −4πGM if a ∈ Ω, and ∫_S Fa · n dS = 0 otherwise.

If we have point masses M1, . . . , Mk at points a1, . . . , ak, then the flux of the resultant gravitational force F = ∑_{j=1}^k F_{aj} outwards across the surface S (on which, once again, none of the point masses lies) is given by
∫_S F · n dS = −4πG ∑_{aj ∈ Ω} Mj.

Indeed, given a mass distribution with (integrable) density function δ on a region D, we can, in
fact, write an explicit formula for the gravitational field (see Section 4 of Chapter 7):
(†) F(x) = G ∫_D ( (y − x)/‖y − x‖³ ) δ(y) dV_y.
(When x ∈ D, this integral is improper, yet convergent, as can be verified by using spherical coordinates centered at the point x.) It should come as no surprise, approximating the mass distribution by a finite set of point masses, that the flux of the resulting gravitational force F is given by
∫_S F · n dS = −4πG ∫_Ω δ dV = −4πGM,
where M is the mass inside S = ∂Ω. This is Gauss’s law.
Using the limiting formula for divergence given in (∗) on p. 378, we see that, even if F isn’t
apparently smooth, it is plausible to define

div F(x) = −4πGδ(x)

when δ is continuous on D (and div F(x) = 0 when x ∉ D).


Now we can determine, as did Newton (following the lines of Example 6 of Chapter 7, Section
4), the gravitational field F inside the earth, assuming—albeit incorrectly—that the earth is a ball
of uniform density. Take the earth to be a ball of radius R centered at the origin and have constant
density and total mass M. Fix x with ‖x‖ = b < R. First of all, we have
∫_{∂B(0,b)} F · n dS = −4πG (mass of the earth inside B(0, b)) = −4πG (b/R)³ M.
Now, by symmetry, F points radially inwards, and so
∫_{∂B(0,b)} F · n dS = −‖F‖ area(∂B(0, b)) = −‖F‖ (4πb²).
Thus, we have ‖F(x)‖ = (GM/R³) ‖x‖. Since F is radial, we have
F(x) = −(GM/R³) x.
It is often surprising to find that the gravitational force inside the earth is linear in the distance
from the center. Notice that at the earth’s surface, this analysis is in accord with the inverse-square
nature of the field. (See Exercise 2.)
As an amusing application, we calculate the time required to travel in a perfectly frictionless
tunnel inside the earth from one point on the surface to another. We suppose that we start the trip
with zero speed. When the mass is at position x, the component of the gravitational force acting
in the direction of the tunnel is
−‖F‖ sin θ = −(GM/R³) u,
where u is the displacement of the mass from the center of the tunnel (see Figure 6.2). By Newton’s second law, we have
m u″(t) = −(GM/R³) m u(t),   i.e.,   u″(t) = −(GM/R³) u(t).
[Figure 6.2: a straight tunnel through the earth; x is the position of the mass and u its displacement from the midpoint of the tunnel.]

The general solution is
u(t) = a cos(√(GM/R³) t) + b sin(√(GM/R³) t).
If we start with the initial conditions u(0) = u0 and u′(0) = 0, then we have
u(t) = u0 cos(√(GM/R³) t),
and we see that the mass reaches the opposite end of the tunnel after time
T = π/√(GM/R³) = π √(R/g) ≈ 42 min.

As was pointed out to me my freshman year of college, this is rather less time than many of our
commutes!
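For the curious, a one-line numerical check of the 42-minute figure (the values of R and g below are rough figures for the earth, supplied here as assumptions rather than taken from the text):

```python
import math

R = 6.371e6     # radius of the earth in meters (approximate)
g = 9.81        # surface gravitational acceleration in m/s² (approximate)
T = math.pi * math.sqrt(R / g)
print(T / 60)   # ≈ 42.2 minutes
```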

6.3. Maxwell’s Equations. Let E denote the electric field, B the magnetic field, ρ the charge
density, and J the current density. All of these are functions on (some region in) R3 × R (space-
time), on which we use coordinates x, y, z, and t. The classical presentation of Maxwell’s equations
is the following system of four partial differential equations (ignoring various constants such as 4π
and c, the speed of light).
Gauss’s law:  div E = ρ
no magnetic monopoles:  div B = 0
Faraday’s law:  curl E = −∂B/∂t
Ampère’s law:  curl B = ∂E/∂t + J
These are all “differential” versions of equivalent “integral” statements obtained by applying Stokes’s
Theorem, as we already encountered Gauss’s Law in the previous subsection. Briefly: suppose S is
an oriented surface (perhaps imagined) and ∂S represents a wire. Then Faraday’s law states that
∫_{∂S} E · T ds = −∫_S (∂B/∂t) · n dS = −(d/dt) ∫_S B · n dS
(using the result of Exercise 7.2.20 to differentiate under the integral sign); i.e., the voltage around the loop ∂S equals the negative of the rate of change of magnetic flux across the loop. (More colloquially, a moving magnetic field induces an electric field that in turn does work, namely, creates a voltage drop across the loop.) On the other hand, Ampère’s law states that (in steady state, with no time variation)
∫_{∂S} B · T ds = ∫_S J · n dS,
i.e., the circulation of the magnetic field around the wire is the flux of the current density across
the loop.
Let
ω = (E1 dx + E2 dy + E3 dz) ∧ dt + (B1 dy ∧ dz + B2 dz ∧ dx + B3 dx ∧ dy).
Then

dω = [ (∂E3/∂y − ∂E2/∂z + ∂B1/∂t) dy ∧ dz + (∂E1/∂z − ∂E3/∂x + ∂B2/∂t) dz ∧ dx + (∂E2/∂x − ∂E1/∂y + ∂B3/∂t) dx ∧ dy ] ∧ dt + (∂B1/∂x + ∂B2/∂y + ∂B3/∂z) dx ∧ dy ∧ dz,
and so we see that
dω = 0 ⟺ div B = 0 and curl E + ∂B/∂t = 0.
Next, let
θ = −(E1 dy ∧ dz + E2 dz ∧ dx + E3 dx ∧ dy) + (B1 dx + B2 dy + B3 dz) ∧ dt.
(Using the star operator defined in Exercise 8.2.9, one can check that θ = ⋆ω. The subtlety is that
we’re working in space-time, endowed with a Lorentz metric in which the standard orthonormal
basis {e1 , . . . , e4 } has the property that e4 ·e4 = −1; this introduces a minus sign so that ⋆(dx∧dt) =
−dy ∧ dz, etc.) Then an analogous calculation shows that
dθ = 0 ⟺ div E = 0 and curl B − ∂E/∂t = 0.
This would hold, for example, in a vacuum, where ρ = 0 and J = 0. But, in general, the first and
last of Maxwell’s equations are equivalent to the equation
dθ = (J1 dy ∧ dz + J2 dz ∧ dx + J3 dx ∧ dy) ∧ dt − ρdx ∧ dy ∧ dz.
Since dω = 0 on R4 , there is a 1-form
α = a1 dx + a2 dy + a3 dz − ϕdt
so that dα = ω (see Exercise 8.7.12). Of course, α is far from unique; for any function f , we will
have d(α + df ) = ω as well. Let β = α + df , where f is a solution of the inhomogeneous wave
equation  
2 ∂2f ∂a1 ∂a2 ∂a3 ∂ϕ
∇ f− 2 =− + + + .
∂t ∂x ∂y ∂z ∂t
This means that d⋆df = −d⋆α, and so d⋆β = 0. Writing
β = A1 dx + A2 dy + A3 dz − φdt,
the condition that d⋆β = 0 is equivalent to
∂A1 ∂A2 ∂A3 ∂φ
(∗) + + + = 0.
∂x ∂y ∂z ∂t

Since dβ = ω, ⋆dβ = ⋆ω = θ, we calculate
d⋆dβ = dθ = (J1 dy ∧ dz + J2 dz ∧ dx + J3 dx ∧ dy) ∧ dt − ρ dx ∧ dy ∧ dz.
Using (∗) to substitute
∂²A1/∂x∂t + ∂²A2/∂y∂t + ∂²A3/∂z∂t = −∂²φ/∂t²,
we can check that solving Maxwell’s equations is equivalent to finding A = (A1, A2, A3) and φ satisfying the inhomogeneous wave equations
∇²A − ∂²A/∂t² = −J   and   ∇²φ − ∂²φ/∂t² = −ρ.
Solving such equations is a standard topic in an upper-division course in partial differential equa-
tions.

EXERCISES 8.6

1. Write down the vector field F corresponding to a rotation counterclockwise about an axis in
the direction of the unit vector a with angular speed ν, and check that curl F = 2νa.

2. Using Gauss’s law, show that the gravitational field of a uniform ball outside the ball is that
of a point mass at its center.

3. (Green’s formulas) Let f, g : Ω → R be smooth functions on a region Ω ⊂ R³. Recall that Dn g denotes the directional derivative of g in the normal direction.
a. Prove that
∫_{∂Ω} (Dn g) dS = ∫_Ω ∇²g dV,
∫_{∂Ω} f (Dn g) dS = ∫_Ω ( f ∇²g + ∇f · ∇g ) dV,
∫_{∂Ω} ( f Dn g − g Dn f ) dS = ∫_Ω ( f ∇²g − g ∇²f ) dV.
(Hint: ∇²g dx ∧ dy ∧ dz = d⋆dg.)
b. We say f is harmonic on Ω if ∇²f = 0 on Ω. Prove that if f and g are harmonic on Ω, then
∫_{∂Ω} (Dn f) dS = 0,
∫_{∂Ω} f (Dn f) dS = ∫_Ω ‖∇f‖² dV,
∫_{∂Ω} f (Dn g) dS = ∫_{∂Ω} g (Dn f) dS.

4. (See Exercise 3.) Prove that if f and g are harmonic on a region Ω and f = g on ∂Ω, then
f = g everywhere on Ω. (Hint: Consider f − g.)

5. a. Prove that g : R³ − {0} → R, g(x) = 1/‖x‖, is harmonic. (See Exercise 3.)
b. Prove that if f is harmonic on B(0, R) ⊂ R³, then f has the mean value property: f(0) is the average of the values of f on the sphere of any radius r < R centered at 0. (Hint: Apply the appropriate results of Exercise 3 with Ωε = {x : ε ≤ ‖x‖ ≤ r} and g as in part
a; then let ε → 0+ .)
c. Deduce the maximum principle for harmonic functions: if f is harmonic on a region Ω,
then f takes on its maximum value on ∂Ω.

6. Let S ⊂ R3 be a closed, oriented surface. Using the formula (†) for the gravitational field F,
show that
a. the flux of F outwards across S is 0 when no points of D lie on or inside S.
b. the flux of F outwards across S is −4πG ∫_D δ dV when all of D lies inside S.
(Hint: Change the order of integration.)

*7. Try to determine which of the vector fields pictured in Figure 6.3 have zero divergence and
which have zero curl. Justify your answers.

8. Let F be a smooth vector field on an open set U ⊂ Rn . A parametrized curve g is a flow line
for a vector field F if g′ (t) = F(g(t)) for all t.
a. Give a vector field with a closed flow line.
b. Prove that if F is conservative, then it can have no closed flow line (other than a single
point).
c. Prove that if n = 2 and F has a closed flow line C, then div F must equal 0 at some point
inside C. (Hint: See Exercise 8.3.18.)

9. Let Ω ⊂ R³ be a compact 3-manifold with boundary.
a. Prove that ∫_{∂Ω} f n dS = ∫_Ω ∇f dV. (Hint: Apply Stokes’s Theorem to each component.)
b. Deduce that ∫_{∂Ω} n dS = 0. Give a (geometric) plausibility argument for this result.

10. (Archimedes’ law of buoyancy) Prove that when a floating body in a uniform liquid is at
equilibrium, it displaces its own weight, as follows. Let Ω denote the portion of the body that
is submerged.
a. The force exerted by the pressure of the liquid on a planar piece of surface is directed inward
normal to the surface, and pressure is force per unit area. Deduce that the buoyancy force is given by B = −∫_{∂Ω} p n dS, where p is the pressure.
b. Assuming that ∇p = δg, where δ is the (constant) density of the liquid and g is the
acceleration of gravity, deduce that B = −M g, where M is the mass of the displaced
liquid. (Hint: Apply Exercise 9.)
c. Deduce the result.

11. Let v be the velocity field of a fluid flow, and let δ be the density of the fluid. (These are both C¹ functions of position and time.) Let F = δv. The law of conservation of mass states that
(d/dt) ∫_Ω δ dV = −∫_{∂Ω} F · n dS.
Show that the validity of this equation for all regions Ω is equivalent to the equation of continuity:
div F + ∂δ/∂t = 0.
(Hint: Use Exercise 7.2.20.)

[Figure 6.3: the eight vector fields, (a)–(h), referred to in Exercise 7.]
 
12. Suppose a body Ω ⊂ R³ has (C²) temperature u(x, t) at position x ∈ Ω at time t. Assume that the heat flow vector is q = −K∇u, where K is a constant (called the heat conductivity of the body); the flux of q outwards across an oriented surface S represents the rate of heat flow across S.
a. Show that the rate of heat flow across ∂Ω into Ω is F = ∫_Ω K∇²u dV.
b. Let c denote the heat capacity of the body; the amount of heat required to raise the temperature of the volume ∆V by ∆T degrees is approximately (c∆T)∆V; thus, the rate at which the volume ∆V absorbs heat is c(∂u/∂t)∆V. Conclude that the rate of heat flow into Ω is F = ∫_Ω c (∂u/∂t) dV.
c. Deduce that the heat flow within Ω is governed by the partial differential equation c ∂u/∂t = K∇²u.
13. Suppose Ω ⊂ R³ is a region and u : Ω × [0, ∞) → R is a C² solution of the heat equation ∇²u = ∂u/∂t. Suppose u(x, 0) = 0 for all x ∈ Ω and Dn u = 0 on ∂Ω (this means the region is insulated along the boundary).
a. Consider the “energy” E(t) = (1/2) ∫_Ω u² dV. Note that E(0) = 0. Prove that E′(t) ≤ 0 (this means that heat dissipates) and show that E(t) = 0 for all t ≥ 0. (Hint: Use Exercise 7.2.20.)
b. Prove that u(x, t) = 0 for all x ∈ Ω and all t ≥ 0.
c. Prove that if u1 and u2 are two solutions of the heat equation that agree at t = 0 and agree on ∂Ω for all time t ≥ 0, then they must agree for all time t ≥ 0.
14. Suppose Ω ⊂ R³ is a region and u : Ω × R → R is a C² solution of the wave equation ∇²u = ∂²u/∂t². Suppose that u(x, t) = f(x) for all x ∈ ∂Ω and all t (e.g., in two dimensions, the drumhead is clamped along the boundary of Ω). Prove that the total energy
E(t) = (1/2) ∫_Ω ( (∂u/∂t)² + ‖∇u‖² ) dV
is constant. Here by ∇u we mean the vector of derivatives with respect only to the space variables.

7. Applications to Topology

We are going to give a brief introduction to the field of topology by using the techniques of
differential forms and Stokes’s Theorem to prove three rather deep theorems. The basic ingredient
of several of our proofs is the following. Let Sⁿ denote the n-dimensional unit sphere, Sⁿ = {x ∈ R^{n+1} : ‖x‖ = 1}, and Dⁿ the closed unit ball, Dⁿ = {x ∈ Rⁿ : ‖x‖ ≤ 1}. (Then ∂D^{n+1} = Sⁿ.)

Lemma 7.1. There is an n-form ω on S n whose integral is nonzero.

Proof. It is easy to check directly that the volume form
ω = ∑_{i=1}^{n+1} (−1)^{i−1} xi dx1 ∧ · · · ∧ dx̂i ∧ · · · ∧ dx_{n+1}
is such a form. □

Theorem 7.2. There is no smooth function r : D n+1 → S n with the property that r(x) = x
for all x ∈ S n .

Proof. Suppose there were such an r. Letting ω be an n-form on S n as in Lemma 7.1, we have
∫_{Sⁿ} ω = ∫_{Sⁿ} r*ω = ∫_{D^{n+1}} d(r*ω) = ∫_{D^{n+1}} r*(dω) = 0,
inasmuch as the only (n + 1)-form on an n-dimensional manifold is 0 (and hence dω = 0). But this
is a contradiction, since we chose ω with a nonzero integral. 

Corollary 7.3 (Brouwer Fixed Point Theorem). Let f : D n → D n be smooth. Then there
must be a point x ∈ D n so that f (x) = x; i.e., f must have a fixed point.

Proof. Suppose not. Then for all x ∈ Dⁿ, the points x and f(x) are distinct. Define r : Dⁿ → Sⁿ⁻¹ by setting r(x) to be the point where the ray starting at f(x) and passing through x intersects the unit sphere, as shown in Figure 7.1.

[Figure 7.1: r(x) is the point where the ray from f(x) through x meets the unit sphere.]

We leave it to the reader to check in Exercise 1 that r is
in fact smooth. By construction, whenever x ∈ S n−1 , we have r(x) = x. By Theorem 7.2, no such
function can exist, and hence f must have a fixed point. 
Topology is in some sense the study of continuous (or, in our case, smooth) deformations of
objects. An old saw is that a topologist is one who cannot tell the difference between a doughnut
and a coffee cup. This is because we can continuously deform one to the other, assuming we have

flexible, plastic objects: the “hole” in the doughnut becomes the “hole” in the handle of the cup.
The crucial notion here is the following:

Definition. Suppose X ⊂ Rⁿ and Y ⊂ Rᵐ. Let f : X → Y and g : X → Y be (smooth) functions. We say they are (smoothly) homotopic if there is a (smooth) map H : X × [0, 1] → Y so that H(x, 0) = f(x) and H(x, 1) = g(x) for all x ∈ X.

Example 1. The identity function f : Dⁿ → Dⁿ, f(x) = x, is homotopic to the constant map g(x) = 0. We merely set
H(x, t) = (1 − t)x.
The homotopy shrinks the unit ball gradually to its center. ▽

Example 2. Are the maps f , g : S 1 → S 1 given by


       
f(cos t, sin t) = (cos t, sin t)  and  g(cos t, sin t) = (cos 2t, sin 2t)
homotopic? These parametrized curves wrap once and twice, respectively, around the unit circle, so
the winding numbers of these curves about the origin are 1 and 2, respectively. If we surmise that
the winding number should vary continuously as we continuously deform the curve, then we guess
that the curves cannot be homotopic. Let’s make this precise: suppose there were a homotopy
H : S 1 × [0, 1] → S 1 between f and g. Let ω = −ydx + xdy ∈ A1 (S 1 ). Then

Figure 7.2
∫_{S¹} f*ω = 2π  and  ∫_{S¹} g*ω = 4π.
On the other hand,
∫_{∂(S¹×[0,1])} H*ω = ∫_{S¹×[0,1]} d(H*ω) = ∫_{S¹×[0,1]} H*(dω) = 0,

since any 2-form on S 1 must be 0. On the other hand, as we see from Figure 7.2,
∂(S 1 × [0, 1]) = (S 1 × {1})− ∪ (S 1 × {0}),

so
∫_{∂(S¹×[0,1])} H*ω = ∫_{S¹} f*ω − ∫_{S¹} g*ω.

In conclusion, if f and g are homotopic, then we must have


∫_{S¹} f*ω = ∫_{S¹} g*ω;
since 2π ≠ 4π, we infer that f and g cannot be homotopic. ▽
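Readers who like to experiment may enjoy checking these two integrals numerically. The short Python sketch below is an illustration of ours, not part of the exposition; the function name is made up for the occasion. It samples each curve, approximates derivatives by periodic central differences, and forms a Riemann sum for ω = −y dx + x dy.

import numpy as np

def integral_of_omega(curve, n=100_000):
    # Approximate the line integral of omega = -y dx + x dy over a closed curve
    # parametrized by t in [0, 2*pi).
    t = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
    dt = 2.0 * np.pi / n
    x, y = curve(t)
    dx = (np.roll(x, -1) - np.roll(x, 1)) / (2 * dt)   # periodic estimate of x'(t)
    dy = (np.roll(y, -1) - np.roll(y, 1)) / (2 * dt)
    return np.sum((-y * dx + x * dy) * dt)

f = lambda t: (np.cos(t), np.sin(t))           # wraps once around the circle
g = lambda t: (np.cos(2 * t), np.sin(2 * t))   # wraps twice

print(integral_of_omega(f) / (2 * np.pi))      # approximately 1
print(integral_of_omega(g) / (2 * np.pi))      # approximately 2

Dividing by 2π recovers the winding numbers 1 and 2, in agreement with the computation above.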

In general, we have the following important result:

Proposition 7.4. Suppose X is a compact, oriented k-dimensional manifold and f , g : X → Y


are homotopic maps. Then for any closed k-form ω on Y , we have
∫_X f*ω = ∫_X g*ω.

Proof. We leave this to the reader in Exercise 3. 

By the way, it is time to give a more precise definition of the term “simply connected.” A closed
curve in Rn is nothing other than the image of a map S 1 → X.

Definition. We say X ⊂ Rn is simply connected if every pair of points in X can be joined by


a path and every map f : S 1 → X is homotopic to a constant map.

As a consequence of Proposition 7.4, we have

Corollary 7.5. Suppose X is a simply connected manifold. Then every closed 1-form on X is
exact.
Proof. Let f : S¹ → X be a closed curve. f is homotopic to a constant map g. Since g*ω = 0,
we infer that ∫_{S¹} f*ω = 0. The result now follows from Theorem 3.2. □

Note that this is the generalization of the local result we obtained earlier, Proposition 3.3.
Before moving on to our last topic, we stop to state and prove one of the cornerstones of classical
mathematics. We assume a modest familiarity with the complex numbers.

Theorem 7.6 (Fundamental Theorem of Algebra). Let n ≥ 1 and a0 , a1 , . . . , an−1 ∈ C; con-


sider a polynomial p(z) = z n + an−1 z n−1 + · · · + a1 z + a0 . Then p has n roots in C (counting
multiplicities).

Proof. (We identify C with R² for purposes of the vector calculus.⁹) Since
lim_{z→∞} (a_{n−1}z^{n−1} + · · · + a1 z + a0)/zⁿ = 0,
there is R > 0 so that whenever |z| ≥ R we have
|(a_{n−1}z^{n−1} + · · · + a1 z + a0)/zⁿ| ≤ 1/2.

⁹Recall that complex numbers are of the form z = x + iy, x, y ∈ R. We add complex numbers as vectors in
R², and we multiply by using the distributive property and the rule i² = −1: if z = x + iy and w = u + iv, then
zw = (xu − yv) + i(xv + yu). It is customary to denote the length of the complex number z by |z| and the reader
can easily check that |zw| = |z||w|. In addition, de Moivre’s formula tells us that if z = r(cos θ + i sin θ), then
zⁿ = rⁿ(cos nθ + i sin nθ).

On ∂B(0, R) we have a homotopy H : ∂B(0, R) × [0, 1] → C − {0} between p and g(z) = z n given
by
 
H(z, t) = zⁿ + (1 − t)(a_{n−1}z^{n−1} + · · · + a1 z + a0) = t g(z) + (1 − t) p(z).
The crucial issue is that, by the triangle inequality,
|zⁿ + (1 − t)(a_{n−1}z^{n−1} + · · · + a1 z + a0)| ≥ |zⁿ| − (1 − t)|a_{n−1}z^{n−1} + · · · + a1 z + a0| ≥ |zⁿ|(1 − 1/2) = Rⁿ/2,
so the function H indeed takes values in C − {0}.
Recall that the 1-form ω = (−y dx + x dy)/(x² + y²) is a closed form on C − {0} = R² − {0}. Moreover,
writing g(R cos t, R sin t) = Rⁿ(cos nt, sin nt), 0 ≤ t ≤ 2π, we see that
∫_{∂B(0,R)} g*ω = ∫₀^{2π} [ −(sin nt)(−n sin nt) + (cos nt)(n cos nt) ] dt = ∫₀^{2π} n dt = 2πn,
and hence, by Proposition 7.4, we have ∫_{∂B(0,R)} p*ω = 2πn as well. Now, suppose p had no root
in B(0, R). Then p would actually be a smooth map from all of B(0, R) to C − {0} and we would have
2πn = ∫_{∂B(0,R)} p*ω = ∫_{B(0,R)} d(p*ω) = ∫_{B(0,R)} p*(dω) = 0,
which is a contradiction. Therefore, p has at least one root on B(0, R). The stronger statement of
the theorem follows easily by induction on n. 

We can actually obtain a stronger, more localized version. We need the following computational
result, a more elegant proof of which is suggested in Exercise 8.

Lemma 7.7. Let ω = (−ydx + xdy)/(x2 + y 2 ) ∈ A1 (C − {0}), and suppose f and g are smooth
maps to C − {0}. Then (f g)∗ ω = f ∗ ω + g ∗ ω.

Proof. Write f = u + iv and g = U + iV . Then f g = (uU − vV ) + i(uV + vU ) and so, using


the product rule and a bit of high school algebra, we obtain
 
(fg)*ω = (fg)*( (−y dx + x dy)/(x² + y²) )
  = [ −(uV + vU) d(uU − vV) + (uU − vV) d(uV + vU) ] / [ (uU − vV)² + (uV + vU)² ]
  = [ −(uV + vU) d(uU − vV) + (uU − vV) d(uV + vU) ] / [ (u² + v²)(U² + V²) ]
  = [ −(uV + vU)(U du − V dv + u dU − v dV) + (uU − vV)(V du + U dv + u dV + v dU) ] / [ (u² + v²)(U² + V²) ]
  = [ (U² + V²)(−v du + u dv) + (u² + v²)(−V dU + U dV) ] / [ (u² + v²)(U² + V²) ]
  = (−v du + u dv)/(u² + v²) + (−V dU + U dV)/(U² + V²) = f*ω + g*ω,
as required. □

Now we have an intriguing application of winding numbers (see Section 3) that gives a two-
dimensional analogue of Gauss’s Law from the preceding section. We make use of the Fundamental
Theorem of Algebra.

Proposition 7.8. Let p be a polynomial with complex coefficients. Let D ⊂ C be a region so


that no root of p lies on C = ∂D. Then
(1/2π) ∫_C p*( (−y dx + x dy)/(x² + y²) )
is equal to the number of roots of p in D.

Proof. As usual, let ω = (−y dx + x dy)/(x² + y²). Using Theorem 7.6, we factor p(z) =
c(z − r1)(z − r2) · · · (z − rn), where c ≠ 0 and rj ∈ C, j = 1, . . . , n, are the roots of p. Let
fj(z) = z − rj. Then we claim that
(1/2π) ∫_C fj*ω = 1 if rj ∈ D, and 0 otherwise.
The former is a consequence of Example 10 on p. 345; the latter follows from Corollary 7.5. Applying
Lemma 7.7 repeatedly, we see that p*ω = Σ_{j=1}^{n} fj*ω, and so
(1/2π) ∫_C p*ω = Σ_{j=1}^{n} (1/2π) ∫_C fj*ω = Σ_{rj ∈ D} 1
is equal to the number of roots of p in D. □

There are far-reaching generalizations of this result that you may learn about in a differential
topology or differential geometry course. An interesting application is the study of how roots of a
polynomial vary as we change the polynomial; see Exercise 9.
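Proposition 7.8 is also easy to test numerically. The Python sketch below is ours and purely illustrative (the polynomial and helper name are chosen arbitrarily): it approximates (1/2π)∫_C p*ω over a circle C = ∂D and compares the result with a direct count of the roots inside D.

import numpy as np

def winding_count(coeffs, center, radius, n=200_000):
    # Estimate (1/2pi) times the integral of p*omega over the circle
    # |z - center| = radius; the pullback is just omega evaluated along p(z(t)).
    t = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
    dt = 2.0 * np.pi / n
    w = np.polyval(coeffs, center + radius * np.exp(1j * t))
    x, y = w.real, w.imag
    dx = (np.roll(x, -1) - np.roll(x, 1)) / (2 * dt)
    dy = (np.roll(y, -1) - np.roll(y, 1)) / (2 * dt)
    return np.sum((-y * dx + x * dy) / (x**2 + y**2) * dt) / (2 * np.pi)

p = [1, 0, -1, 0]                                 # p(z) = z^3 - z, with roots 0, 1, -1
print(winding_count(p, center=1.0, radius=1.5))   # approximately 2
print(np.sum(np.abs(np.roots(p) - 1.0) < 1.5))    # the roots 0 and 1 lie inside the disk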
A vector field v on S n is a smooth function v : S n → Rn+1 with the property that x · v(x) = 0
for all x. (That is, v(x) is tangent to the sphere at x.)

Example 3. There is an obvious nowhere-zero vector field on S 1 , the unit circle, which we’ve
seen many times this chapter:
   
v(x1, x2) = (−x2, x1).
Indeed, an analogous formula works on S^{2m−1} ⊂ R^{2m}:
v(x1, x2, . . . , x_{2m−1}, x_{2m}) = (−x2, x1, . . . , −x_{2m}, x_{2m−1}).

(If we visualize the vector field in the case of the circle as pushing around the circle, in the higher-
dimensional case, we imagine pushing in each of the orthogonal x1 x2 -, x3 x4 -, . . . , x2m−1 x2m -planes
independently.) ▽

In contrast with the preceding example, however, it is somewhat surprising that there is no
nowhere-zero vector field on S n when n is even. The following result is usually affectionately called
the Hairy Ball Theorem, as it says that we cannot “comb the hairs” on an even-dimensional sphere.

Theorem 7.9. Any vector field on the unit sphere S 2m must vanish somewhere.

Proof. We proceed by contradiction. Suppose v were a nowhere-zero vector field on S 2m ; we


may assume (by normalizing) that ‖v(x)‖ = 1 for all x ∈ S 2m . We now use the vector field to
define a homotopy between the identity map f : S 2m → S 2m and the antipodal map g : S 2m → S 2m ,
g(x) = −x. Namely, we follow along the semicircle from x to −x in the direction of v(x), as pictured
in Figure 7.3. To be specific, define H : S 2m × [0, 1] → S 2m by

Figure 7.3
 
H(x, t) = (cos πt)x + (sin πt)v(x).
Clearly, H is a smooth function. Now we apply Proposition 7.4, using the form ω defined in Lemma
7.1. In particular, we calculate g∗ ω explicitly:
g*ω = g*( Σ_{i=1}^{2m+1} (−1)^{i−1} xi dx1 ∧ · · · ∧ \widehat{dx_i} ∧ · · · ∧ dx_{2m+1} )
    = Σ_{i=1}^{2m+1} (−1)^{i−1} (−xi)(−dx1) ∧ · · · ∧ \widehat{(−dx_i)} ∧ · · · ∧ (−dx_{2m+1}) = (−1)^{2m+1} ω = −ω.
Thus, we have
∫_{S^{2m}} ω = ∫_{S^{2m}} f*ω = ∫_{S^{2m}} g*ω = −∫_{S^{2m}} ω;
since ∫_{S^{2m}} ω ≠ 0, we have arrived at a contradiction. □

EXERCISES 8.7

1. Check that the mapping r defined in the proof of Corollary 7.3 is in fact smooth.

2. Consider the maps f and g defined in Example 2 as maps from [0, 2π] to R2 (rather than to
S 1 ). Determine whether they are homotopic.

3. Prove Proposition 7.4.


4. Let f : C → C be given by f(z) = z⁴ − 3z + 9, and let Ω = {|z| ≤ 2}. Evaluate ∫_{∂Ω} f*ω, where,
as usual, ω = (−y dx + x dy)/(x² + y²). How many roots does f have in Ω?

5. Show that Corollary 7.3 need not hold on the following spaces:
a. S n
b. the annulus {x ∈ R² : 1 ≤ ‖x‖ ≤ 2}
c. a solid torus
d. B n (the open unit ball)

6. Prove the following generalization of Theorem 7.2: let M be any compact, orientable manifold
with boundary. Then there is no function f : M → ∂M with the property that f (x) = x for all
x ∈ ∂M .

Figure 7.4 (the curves C1 , C2 , C3 , C4 , C5 )

7. As pictured in Figure 7.4, let

Z = {x2 + y 2 = 1, z = 0} ∪ {x = y = 0} ∪ {x = z = 0, y ≥ 1} ⊂ R3 .

Suppose ω is a continuously differentiable 1-form on R³ − Z satisfying dω = 0. Suppose,
moreover, that ∫_{C1} ω = 3 and ∫_{C2} ω = −7. Calculate ∫_{C3} ω, ∫_{C4} ω, and ∫_{C5} ω. Give your
reasoning.

8. a. Let z = x + iy. Show that


dz/z = (x dx + y dy)/(x² + y²) + i (−y dx + x dy)/(x² + y²).
b. Let U ⊂ C be open and suppose f, g : U → C − {0} are differentiable. Letting ω =
(−ydx + xdy)/(x2 + y 2 ), prove that (f g)∗ ω = f ∗ ω + g∗ ω. (Hint: What is (f g)∗ (dz/z)?)

9. Let ω = (−ydx + xdy)/(x2 + y 2 ).



a. Suppose U ⊂ C is open, f, g : U → C are smooth, and C ⊂ U is a closed curve. Suppose


that on C we have f, g ≠ 0 and |g − f| < |f|. Prove that
∫_C g*ω = ∫_C f*ω.

(Hint: Use a homotopy similar to that appearing in the proof of Theorem 7.6.)
b. Let a0 , a1 , . . . , an−1 ∈ C and p(z) = z n + an−1 z n−1 + · · · + a1 z + a0 . Let D ⊂ C be a
region so that no root of p lies on C = ∂D. Prove that there is δ > 0 so that whenever
|bj − aj | < δ for all j = 0, 1, . . . , n − 1, the polynomial P (z) = z n + bn−1 z n−1 + · · · + b1 z + b0
has the same number of roots in D as p.
c. Deduce from part b that “the roots of a polynomial vary continuously with the coefficients.”
(Cf. Example 2 on p. 181 and Exercise 6.2.2. See also Exercise 9.4.22 for an interesting appli-
cation to linear algebra.)

10. Let f : S 2m → S 2m be a smooth map. Prove that there exists x ∈ S 2m so that either f (x) = x
or f (x) = −x.

11. Let n ≥ 2 and f : Dⁿ → Rⁿ be smooth. Suppose ‖f(x) − x‖ < 1 for all x ∈ S^{n−1}. Prove that
there is some x ∈ Dⁿ so that f(x) = 0. (Hint: If not, show that the restriction of the map
f/‖f‖ : Dⁿ → S^{n−1} to ∂Dⁿ is homotopic to the identity map.)
12. We wish to give a generalization of Proposition 3.3. Suppose U ⊂ Rn is an open subset that is
star-shaped with respect to the origin.
a. For any k = 1, . . . , n, given a k-form φ = f_I dx_I on U, define the (k − 1)-form
I(φ) = ( ∫₀¹ t^{k−1} f_I(tx) dt ) Σ_{j=1}^{k} (−1)^{j−1} x_{i_j} dx_{i_1} ∧ · · · ∧ \widehat{dx_{i_j}} ∧ · · · ∧ dx_{i_k}.
Then make I linear. Prove that φ = d(I(φ)) + I(dφ).
b. Prove that if ω is a closed k-form on U , then ω is exact.

13. Use the result of Exercise 12 to express each of the following closed forms ω on R3 in the form
ω = dη.
a. ω = (ex cos y + z)dx + (2yz 2 − ex sin y)dy + (x + 2y 2 z + ez )dz
b. ω = (2x + y 2 )dy ∧ dz + (3y + z)dx ∧ dz + (z − xy)dx ∧ dy
c. ω = xyzdx ∧ dy ∧ dz

14. Draw an orientable surface whose boundary is the boundary curve of the Möbius strip, as
pictured in Figure 7.5. (More generally, every simple closed curve in R3 bounds an orientable
surface. Can you see why?)

15. Find three everywhere linearly independent vector fields on S 1 × S 2 .

16. Fill in the details in the following alternative proof of Theorem 7.9 following J. Milnor. Given
a (smooth) unit vector field v on S n , first extend v to be a vector field V on Rn+1 by setting

V(x) = ‖x‖² v(x/‖x‖) for x ≠ 0,  and V(0) = 0.

Figure 7.5

a. Check that V is C1 .
b. Define ft : Dⁿ⁺¹ → Rⁿ⁺¹ by ft(x) = x + tV(x). Apply the inverse function theorem to
prove that for t sufficiently small, ft maps the closed unit ball one-to-one and onto the
closed ball of radius √(1 + t²). (Hints: To establish one-to-one, first use the inverse function
theorem to show that the function F : Dⁿ⁺¹ × R → Rⁿ⁺¹ × R given by F(x, t) = (ft(x), t) is
locally one-to-one. Now proceed by contradiction: suppose there were a sequence tk → 0
and points xk , yk ∈ Dⁿ⁺¹ so that f_{tk}(xk) = f_{tk}(yk). Use compactness of Dⁿ⁺¹ to pass to
convergent subsequences x_{kj} and y_{kj}. To establish onto, you will need to use the fact that
the only nonempty subset of Dⁿ⁺¹ that is both open (in Dⁿ⁺¹) and closed is Dⁿ⁺¹ itself.)

c. Apply the Change of Variables Theorem to see that the volume of B(0, √(1 + t²)) must be
a polynomial expression in t.
d. Deduce that you have arrived at a contradiction when n is even.
CHAPTER 9
Eigenvalues, Eigenvectors, and Applications
We have seen the importance of choosing the appropriate coordinates in doing multiple inte-
gration. Now we turn to what is really a much more basic question. Given a linear transformation
T : Rn → Rn , can we choose appropriate (convenient?) coordinates on Rn so that the matrix for
T (in these coordinates) is as simple as possible, say diagonal? For this the fundamental tool is
eigenvalues and eigenvectors. We then give applications to difference and differential equations and
quadratic forms.

1. Linear Transformations and Change of Basis

In all our previous work, we have referred to the “standard matrix” of a linear transformation.
Now we wish to broaden our scope.
Definition. Let V be a finite-dimensional vector space and let T : V → V be a linear trans-
formation. Let B = {v1 , . . . , vn } be an ordered basis for V . Define numbers aij , i = 1, . . . , n,
j = 1, . . . , n, by
T (vj ) = a1j v1 + a2j v2 + · · · + anj vn , j = 1, . . . , n.
Then we define A = [aij ] to be the matrix for T with respect to B, also denoted [T ]B . As before,
we have
A = [ T(v1) | T(v2) | · · · | T(vn) ],
where now the column vectors are the coordinates of the vectors with respect to the basis B.
We might agree that, generally, the easiest matrices to understand are diagonal. If we think
of our examples of projection and reflection in Rn , we obtain some particularly simple diagonal
matrices.
Example 1. Suppose V ⊂ Rn is a subspace. Choose a basis {v1 , . . . , vk } for V and a basis
{vk+1 , . . . , vn } for V ⊥ . Then B = {v1 , . . . , vn } forms a basis for Rn (why?). Let T = projV : Rn →
Rn be the linear transformation given by projecting onto V , and let S : Rn → Rn be the linear
transformation given by reflecting across V . Then we have
T (v1 ) = v1 S(v1 ) = v1
.. ..
. .
T (vk ) = vk S(vk ) = vk
and
T (vk+1 ) = 0 S(vk+1 ) = −vk+1
.. ..
. .
T (vn ) = 0 S(vn ) = −vn

Then the matrices for T and S with respect to the basis B are, respectively,
" # " #
Ik O Ik O
B= and C = . ▽
O O O −In−k
Example 2. Let T : R2 → R2 be the linear transformation defined by multiplying by
" #
3 1
A= .
2 2
It is rather difficult to understand this function until we discover that if we take
   
1 −1
v1 = and v2 = ,
1 2
then T (v1 ) = 4v1 and T (v2 ) = v2 , so that the matrix for T with respect to the ordered basis

Figure 1.1

B = {v1 , v2 } is the diagonal matrix " #


4 0
B= .
0 1
Now it is rather straightforward to picture the linear transformation: as we see from Figure 1.1,
it stretches the v1 -axis by a factor of 4 and leaves the v2 -axis unchanged. Since we can “pave”
the plane by parallelograms formed by v1 and v2 , we are able to describe the effects of T quite
explicitly. We shall soon see how to find v1 and v2 .
For future reference, let’s consider the matrix P with column vectors v1 and v2 . Since T (v1 ) =
4v1 and T (v2 ) = v2 , we observe that
" #" # " # " #" #
3 1 1 −1 4 −1 1 −1 4 0
AP = = = = P B.
2 2 1 2 4 2 1 2 0 1
This might be rewritten as B = P −1 AP , in the form that will occupy our attention for the rest of
this section.
It would have been a more honest exercise here to start with the geometric description of T ,
i.e., its action on the basis vectors v1 and v2 , and try to find the standard matrix for T . As the
reader can check, we have
e1 = (2/3)v1 − (1/3)v2
e2 = (1/3)v1 + (1/3)v2 ,

and so we compute that
T(e1) = (2/3)T(v1) − (1/3)T(v2) = (8/3)v1 − (1/3)v2 = (3, 2),  and
T(e2) = (1/3)T(v1) + (1/3)T(v2) = (4/3)v1 + (1/3)v2 = (1, 2).
What a relief! ▽

Given a (finite-dimensional) vector space V and an ordered basis B = {v1 , . . . , vn } for V , we


can define a linear transformation
CB : V → R n ,
that assigns to each vector v its vector of coordinates with respect to the basis B. That is,
 
c1
 c2 
 
CB (c1 v1 + c2 v2 + · · · + cn vn ) =  .. .
 . 
cn

Of course, when B is the standard basis E for Rn , this is what you’d expect:
 
x1
 x2 
 
CE (x) =  .. .
 . 
xn

Suppose T : Rn → Rn is a linear transformation and T (x) = y; to say that A is the standard


matrix for T is to say that multiplying A by the coordinate vector of x (in the standard basis)
gives the coordinate vector of y (in the standard basis). Likewise, suppose T : V → V is a linear
transformation, T (v) = w, and B is an ordered basis for V . Then let CB (v) = x be the coordinate
vector of v with respect to the basis B, and let CB (w) = y be the coordinate vector of w with
respect to the basis B. To say that A is the matrix for T with respect to the basis B (see the
definition on p. 396) is to say Ax = y. (See Figure 1.2.)

Figure 1.2
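Computationally, finding the coordinate vector C_B(v) amounts to solving a linear system whose coefficient matrix has the basis vectors as its columns. The following minimal Python sketch (ours, purely illustrative; the basis is the one from Example 2) makes this concrete.

import numpy as np

def coordinates(basis_vectors, v):
    # Solve M c = v, where the columns of M are the basis vectors; c is C_B(v).
    M = np.column_stack(basis_vectors)
    return np.linalg.solve(M, v)

v1, v2 = np.array([1.0, 1.0]), np.array([-1.0, 2.0])
print(coordinates([v1, v2], np.array([1.0, 0.0])))   # e1 = (2/3) v1 - (1/3) v2
print(coordinates([v1, v2], np.array([0.0, 1.0])))   # e2 = (1/3) v1 + (1/3) v2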

Suppose now that we have a linear transformation T : V → V and two ordered bases B =
{v1 , . . . , vn } and B ′ = {v1′ , . . . , vn′ } for V . (Often in our applications, as the notation suggests, V

will be Rn and B will be the standard basis E.) Let Aold = [T ]B be the matrix for T with respect
to the “old” basis B, and let Anew = [T ]B′ be the matrix for T with respect to the “new” basis B ′ .
The fundamental issue now is to compute Anew if we know Aold . Define the change-of-basis matrix
P to be the matrix whose column vectors are the coordinates of the new basis vectors with respect
to the old basis: i.e.,
vj′ = p1j v1 + p2j v2 + · · · + pnj vn .
When B is the standard basis, we have our usual schematic picture
 
| | |
 
P =  v1′ v2′ · · · vn′  .
| | |
Note that P must be invertible, since we can similarly express each of the old basis vectors as
a linear combination of the new basis vectors. (Cf. Proposition 3.4 of Chapter 4.) Then, as the
diagram in Figure 1.3 summarizes, we have the following

Theorem 1.1 (Change-of-Basis Formula). Let T : V → V be a linear transformation, and let


B = {v1 , . . . , vn } and B ′ = {v1′ , . . . , vn′ } be ordered bases for V . If [T ]B and [T ]B′ are the matrices
for T with respect to the respective bases and P is the change-of-basis matrix (whose columns are
the coordinates of the new basis vectors with respect the old basis), then we have
[T ]B′ = P −1 [T ]B P .

Figure 1.3

Remark. Two matrices A and B are called similar if B = P −1 AP for some invertible matrix P
(see Exercise 9). Theorem 1.1 tells us that any two matrices representing a linear map T : V → V
are similar.

Proof. Given a vector v ∈ V , denote by x and x′ , respectively, its coordinate vectors with
respect to the bases B and B ′ . The important relation here is
x = P x′ .
We derive this as follows: Using the equations v = Σ_{i=1}^{n} xi vi and
v = Σ_{j=1}^{n} xj′ vj′ = Σ_{j=1}^{n} xj′ ( Σ_{i=1}^{n} pij vi ) = Σ_{i=1}^{n} ( Σ_{j=1}^{n} pij xj′ ) vi ,

we deduce from Corollary 4.3.3 that


xi = Σ_{j=1}^{n} pij xj′ .

(If we think of the old basis as the standard basis for Rn , then this is our familiar fact that
multiplying P by x′ takes the appropriate linear combination of the columns of P .)
Likewise, if T (v) = w, let y and y′ , respectively, denote the coordinate vectors of w with
respect to bases B and B ′ . Now compare the equations

y′ = [T ]B′ x′ and y = [T ]B x

using

y = P y′ and x = P x′ :

On one hand, we have

y = P y′ = P ([T ]B′ x′ ) = (P [T ]B′ )x′ ,

and on the other hand,

y = [T ]B x = [T ]B (P x′ ) = ([T ]B P )x′ ,

from which we conclude that

[T ]B P = P [T ]B′ , i.e., [T ]B′ = P −1 [T ]B P . 

Example 3. Let’s return to Example 2 as a test case for the change-of-basis formula. (Of
course, we’ve already seen there that it works!) Given the matrix
" #
3 1
A = [T ] =
2 2

of a linear transformation T : R2 → R2 with respect to the standard basis, let’s calculate its matrix
[T ]B′ with respect to the new basis B ′ = {v1 , v2 }, where
   
1 −1
v1 = and v2 = .
1 2

The change-of-basis matrix is
P = [1 −1; 1 2],  and  P⁻¹ = (1/3) [2 1; −1 1],
from which the reader should calculate that, indeed,
[T]B′ = P⁻¹AP = (1/3) [2 1; −1 1] [3 1; 2 2] [1 −1; 1 2] = [4 0; 0 1]. ▽
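As a quick machine check of Example 3 (a sketch of ours, not part of the text's development), one can let the computer carry out the product P⁻¹AP:

import numpy as np

A = np.array([[3.0, 1.0],
              [2.0, 2.0]])
P = np.array([[1.0, -1.0],
              [1.0,  2.0]])          # columns are v1 and v2
print(np.linalg.inv(P) @ A @ P)      # approximately [[4, 0], [0, 1]]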

Example 4. We wish to calculate the standard matrix for the linear transformation T = projV ,
where V ⊂ R3 is the plane x1 − 2x2 + x3 = 0. If we choose a basis B = {v1 , v2 , v3 } for R3 so that
{v1 , v2 } is a basis for V and v3 is normal to the plane, then (see Example 1) we’ll have
 
1 0 0
 
[T ]B =  0 1 0.
0 0 0
So we take v1 = (−1, 0, 1), v2 = (1, 1, 1), and v3 = (1, −2, 1).
We wish to know the standard matrix, which means that B ′ = {e1 , e2 , e3 } should be the standard
basis for R3 . Then the inverse of the change-of-basis matrix is
 
−1 1 1
 
P −1 =  0 1 −2  ,
1 1 1

and so
P = [−1/2 0 1/2; 1/3 1/3 1/3; 1/6 −1/3 1/6].

Now we use the change-of-basis formula:
[T] = [T]B′ = P⁻¹ [T]B P = [−1 1 1; 0 1 −2; 1 1 1] [1 0 0; 0 1 0; 0 0 0] [−1/2 0 1/2; 1/3 1/3 1/3; 1/6 −1/3 1/6]
    = [5/6 1/3 −1/6; 1/3 1/3 1/3; −1/6 1/3 5/6]. ▽
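Example 4 can be double-checked numerically in two independent ways: via the change-of-basis formula, and via the formula proj_V = I − nnᵀ/‖n‖², where n = (1, −2, 1) is a normal to the plane. The Python sketch below is ours; the second formula is a standard alternative, not the method used in the example.

import numpy as np

Pinv = np.array([[-1.0, 1.0,  1.0],
                 [ 0.0, 1.0, -2.0],
                 [ 1.0, 1.0,  1.0]])          # columns are v1, v2, v3
T_B = np.diag([1.0, 1.0, 0.0])
print(Pinv @ T_B @ np.linalg.inv(Pinv))       # the standard matrix [T]

n = np.array([1.0, -2.0, 1.0])
print(np.eye(3) - np.outer(n, n) / (n @ n))   # same matrix, another way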
Example 5. Suppose we consider the linear transformation T : R³ → R³ defined by rotating
an angle 2π/3 about the line spanned by (1, −1, 1). (The angle is measured counterclockwise from

Figure 1.4

a vantage point on the “positive side” of this line.) Once again, the key is to choose a convenient
new basis adapted to the geometry of the problem. We choose v3 = (1, −1, 1) along the axis of
rotation and v1, v2 to be an orthonormal basis for the plane orthogonal to that axis: e.g.,
v1 = (1/√2)(1, 1, 0)  and  v2 = (1/√6)(−1, 1, 2).
Now let’s compute:

T(v1) = −(1/2)v1 + (√3/2)v2
T(v2) = −(√3/2)v1 − (1/2)v2
T(v3) = v3
(Now it should be clear why we chose v1 , v2 to be orthonormal. We also want v1 , v2 , v3 to form
a “right-handed system” so that we’re turning in the correct direction, as indicated in Figure 1.4.
But there’s no need to worry about the length of v3 .) Thus, we have
[T]B = [−1/2 −√3/2 0; √3/2 −1/2 0; 0 0 1].
Next, we take B′ = {e1 , e2 , e3 } and the inverse of the change-of-basis matrix is
P⁻¹ = [1/√2 −1/√6 1; 1/√2 1/√6 −1; 0 2/√6 1],
so that
P = [1/√2 1/√2 0; −1/√6 1/√6 2/√6; 1/3 −1/3 1/3].
(Exercise 5.5.16 may be helpful here, but, as a last resort, there’s always Gaussian elimination.)
Once again, we solve for
[T] = [T]B′ = P⁻¹ [T]B P = [1/√2 −1/√6 1; 1/√2 1/√6 −1; 0 2/√6 1] [−1/2 −√3/2 0; √3/2 −1/2 0; 0 0 1] [1/√2 1/√2 0; −1/√6 1/√6 2/√6; 1/3 −1/3 1/3]
    = [0 −1 0; 0 0 −1; 1 0 0],
amazingly enough. In hindsight, then, we should be able to see the effect of T on the standard
basis vectors quite plainly. Can you? ▽

Remark. Suppose we first rotate π/2 about the x3 -axis and then rotate π/2 about the x1 -axis.
We leave it to the reader to check that the result is the linear transformation whose matrix we
just calculated. This raises a fascinating question: Is the composition of rotations always again a
rotation? If so, is there a way of predicting the ultimate axis and angle?
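Here is a small Python check of the claim in the Remark (a sketch of ours): multiplying the two rotation matrices reproduces the matrix found in Example 5.

import numpy as np

# Rotation by pi/2 about the x3-axis, followed by rotation by pi/2 about the x1-axis.
R3 = np.array([[0.0, -1.0, 0.0],
               [1.0,  0.0, 0.0],
               [0.0,  0.0, 1.0]])
R1 = np.array([[1.0, 0.0,  0.0],
               [0.0, 0.0, -1.0],
               [0.0, 1.0,  0.0]])
print(R1 @ R3)   # [[0, -1, 0], [0, 0, -1], [1, 0, 0]], the matrix of Example 5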

EXERCISES 9.1

   
*1. Let v1 = (2, 3) and v2 = (1, 2), and consider the basis B′ = {v1 , v2 } for R².
a. Suppose T : R² → R² is a linear transformation whose standard matrix is [T] = [1 5; 2 −2].
Find the matrix for T with respect to the basis B ′ .
b. If S : R2 → R2 is a linear transformation defined by

S(v1 ) = 2v1 + v2
S(v2 ) = −v1 + 3v2 ,

then give the standard matrix for S.

2. Derive the result of Exercise 1.4.10a by the change-of-basis formula.

3. Let T : R3 → R3 be the linear transformation given by reflecting across the plane


−x1 + x2 + x3 = 0.
a. Find an orthogonal basis {v1 , v2 , v3 } for R3 so that v1 , v2 span the plane and v3 is or-
thogonal to it.
b. Give the matrix representing T with respect to your basis in part a.
c. Use the change-of-basis theorem to give the matrix representing T with respect to the
standard basis.

4. Use the change-of-basis formula to find the standard matrix for projection onto the plane
spanned by (1, 0, 1) and (0, 1, −2).

*5. Let T : R3 → R3 be the linear transformation given by reflecting across the plane
x1 − 2x2 + 2x3 = 0. Use the change-of-basis formula to find its standard matrix.

6. Check the result claimed in the remark on p. 403.

7. Let V ⊂ R3 be the subspace defined by

V = {x ∈ R3 : x1 − x2 + x3 = 0}.

Find the standard matrix for each of the following linear transformations:
a. projection on V
b. reflection across V
c. rotation of V through angle π/6 (as viewed from high above)

*8. Find the standard matrix for the linear transformation giving projection onto the plane in R⁴
spanned by (1, 0, 2, 1) and (0, 1, −1, 1).

9. Let A and B be n × n matrices. We say B is similar to A if there is an invertible matrix P so


that B = P −1 AP . (Hint: B = P −1 AP ⇐⇒ P B = AP .)
a. If c is any scalar, show that cI is similar only to itself.
b. Show that  if B is similar to A, then A is similar to B.
c. Show that [a 0; 0 b] is similar to [b 0; 0 a].
d. Show that for any real numbers a and b, the matrices [1 a; 0 2] and [1 b; 0 2] are similar.
e. Show that [2 1; 0 2] is not similar to [2 0; 0 2].
f. Show that [2 1; 0 2] is not diagonalizable, i.e., is not similar to any diagonal matrix.

10. See Exercise 9 for the relevant definition. Prove or give a counterexample:
a. If B is similar to A, then B T is similar to AT .
b. If B 2 is similar to A2 , then B is similar to A.
c. If B is similar to A and A is nonsingular, then B is nonsingular.
d. If B is similar to A and A is symmetric, then B is symmetric.
e. If B is similar to A, then N(B) = N(A).
f. If B is similar to A, then rank(B) = rank(A).

11. See Exercise 9 for the relevant definition. Suppose A and B are n × n matrices.
a. Show that if either A or B is nonsingular, then AB and BA are similar.
b. Must AB and BA be similar in general?
 
12. *a. Let a = (sin φ cos θ, sin φ sin θ, cos φ), 0 ≤ φ < π/2. Prove that the intersection of the circular cylinder
x1² + x2² = 1 with the plane a · x = 0 is an ellipse. (Hint: Consider the new basis
v1 = (−sin θ, cos θ, 0), v2 = (−cos φ cos θ, −cos φ sin θ, sin φ), v3 = a.)
b. Describe the projection of the cylindrical region x1² + x2² = 1, −h ≤ x3 ≤ h onto the
general plane a · x = 0. (Hint: Special cases are the planes x3 = 0 and x1 = 0.)
   
13. A cube with vertices at (±1, ±1, ±1) is rotated about the long diagonal through ±(1, 1, 1). Describe
the resulting surface and give equation(s) for it.

14. In this exercise we give the general version of the change-of-basis formula for a linear transfor-
mation T : V → W .

a. Suppose V and V′ are ordered bases for the vector space V and W and W′ are ordered bases
for the vector space W . Let P be the change of basis matrix from V to V′ and let Q be
the change of basis matrix from W to W′ . Suppose T : V → W is a linear transformation
whose matrix with respect to the bases V and W is [T ]W V and whose matrix with respect
′ ′ ′ ′
to the new bases V and W is [T ]V′ . Prove that [T ]V′ = Q−1 [T ]W
W W
V P.
b. Consider the identity transformation T : V → V . Using the basis V′ in the domain and
the basis V in the range, show that the matrix for [T ]VV′ is the change of basis matrix P .

15. (See the discussion on p. 174 and Exercise 4.4.18.) Let A be an n × n matrix. Prove that the
functions T : R(A) → C(A) and S : C(A) → R(A) are inverse functions if and only if A = QP ,
where P is a projection matrix and Q is orthogonal.

2. Eigenvalues, Eigenvectors, and Diagonalizability

As we shall soon see, it is often necessary in applications to compute (high) powers of a given
square matrix. When A is diagonalizable, i.e., there is an invertible matrix P so that P −1 AP = Λ
is diagonal, we have

A = PΛP⁻¹, and so
A^k = (PΛP⁻¹)(PΛP⁻¹) · · · (PΛP⁻¹) = PΛ^k P⁻¹   (k factors).

Since Λk is easy to calculate, we are left with a very computable formula for Ak . We will see a
number of applications of this principle in Section 3. We turn first to the matter of finding the
diagonal matrix Λ if, in fact, A is diagonalizable. Then we will try to develop some criteria that
guarantee diagonalizability.

2.1. The Characteristic Polynomial. Recall that a linear transformation T : V → V is


diagonalizable if there is an (ordered) basis B = {v1 , . . . , vn } for V so that the matrix for T with
respect to that basis is diagonal. This means precisely that, for some scalars λ1 , . . . , λn , we have

T (v1 ) = λ1 v1 ,
T (v2 ) = λ2 v2 ,
..
.
T (vn ) = λn vn .

Likewise, an n×n matrix A is diagonalizable if there is a basis {v1 , . . . , vn } for Rn with the property
that Avi = λi vi for all i = 1, . . . , n.
This observation leads us to the following

Definition. Let T : V → V be a linear transformation. A nonzero vector v ∈ V is called an


eigenvector of T if there is a scalar λ so that T (v) = λv. The scalar λ is called the associated
eigenvalue of T .

In other words, an eigenvector of a linear transformation T is a (nonzero) vector that is merely


stretched (perhaps in the negative direction) by T . The line spanned by the vector is identical to
the line spanned by its image under T .
This definition, in turn, leads to a convenient reformulation of diagonalizability:

Proposition 2.1. The linear transformation T : V → V is diagonalizable if and only if there


is a basis for V consisting of eigenvectors of T .

At this juncture, the obvious question to ask is how we should find eigenvectors. Let’s start
by observing that, if we include the zero vector, the set of eigenvectors with eigenvalue λ forms a
subspace.

Lemma 2.2. Let T : V → V be a linear transformation, and let λ be any scalar. Then
E(λ) = {v ∈ V : T (v) = λv} = ker(T − λI)
is a subspace of V . dim E(λ) > 0 if and only if λ is an eigenvalue, in which case we call E(λ) the
λ-eigenspace.

Proof. That E(λ) is a subspace follows immediately once we recognize that it is the kernel
(or nullspace) of a linear map. (In the more familiar matrix notation, {x ∈ Rn : Ax = λx} =
N(A − λI).) Now, by definition, λ is an eigenvalue precisely when there is a nonzero vector in
E(λ). 
We now come to the main computational tool for finding eigenvalues.

Proposition 2.3. Let A be an n × n matrix. Then λ is an eigenvalue of A if and only if


det(A − λI) = 0.

Proof. From Lemma 2.2 we infer that λ is an eigenvalue if and only if the matrix A − λI is
singular. Next we conclude from Theorem 5.5 of Chapter 7 that A − λI is singular precisely when
det(A − λI) = 0. Putting the two statements together, we obtain the result. 
Once we use this criterion to find the eigenvalues λ, it is an easy matter to find the corresponding
eigenvectors merely by finding N(A − λI).

Example 1. Let’s find the eigenvalues and eigenvectors of the matrix


" #
3 1
A= .
−3 7
We start by calculating

3 − t 1

det(A − tI) = = (3 − t)(7 − t) − (1)(−3) = t2 − 10t + 24.
−3 7 − t

Since t2 − 10t + 24 = (t − 4)(t − 6) = 0 when t = 4 or t = 6, these are our two eigenvalues. We now
proceed to find the corresponding eigenspaces:
E(4): We see that
   
v1 = (1, 1) is a basis for N(A − 4I) = N([−1 1; −3 3]).

E(6): We see that


   
v2 = (1, 3) is a basis for N(A − 6I) = N([−3 1; −3 1]).
Since we observe that {v1 , v2 } is linearly independent, the matrix A is diagonalizable. Indeed,
as the reader can check, if we take P = [1 1; 1 3], then
P⁻¹AP = [3/2 −1/2; −1/2 1/2] [3 1; −3 7] [1 1; 1 3] = [4 0; 0 6],
as should be the case. ▽
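For readers who want to confirm such computations by machine, here is a brief Python sketch (ours, not part of the text): numpy's eig returns the eigenvalues, in some order, together with unit eigenvectors as the columns of a second matrix.

import numpy as np

A = np.array([[ 3.0, 1.0],
              [-3.0, 7.0]])
eigenvalues, eigenvectors = np.linalg.eig(A)
print(eigenvalues)                # 4 and 6 (possibly in either order)
print(eigenvectors)               # columns proportional to (1, 1) and (1, 3)

P = np.array([[1.0, 1.0],
              [1.0, 3.0]])
print(np.linalg.inv(P) @ A @ P)   # approximately diag(4, 6)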

Example 2. Let’s find the eigenvalues and eigenvectors of the matrix


 
1 2 1
 
A = 0 1 0.
1 3 1
We begin by computing

det(A − tI) = det [1−t 2 1; 0 1−t 0; 1 3 1−t]
(expanding in cofactors along the second row)
= (1 − t)[(1 − t)(1 − t) − 1] = (1 − t)(t² − 2t) = −t(t − 1)(t − 2).
Thus, the eigenvalues of A are 0, 1, and 2. We next find the respective eigenspaces:
E(0): We see that v1 = (−1, 0, 1) is a basis for N(A − 0I) = N([1 2 1; 0 1 0; 1 3 1]) = N([1 0 1; 0 1 0; 0 0 0]).
E(1): We see that v2 = (3, −1, 2) is a basis for N(A − 1I) = N([0 2 1; 0 0 0; 1 3 0]) = N([1 0 −3/2; 0 1 1/2; 0 0 0]).
E(2): We see that v3 = (1, 0, 1) is a basis for N(A − 2I) = N([−1 2 1; 0 −1 0; 1 3 −1]) = N([1 0 −1; 0 1 0; 0 0 0]).
Once again, A is diagonalizable: as the reader can check, {v1 , v2 , v3 } is linearly independent and
therefore gives a basis for R³. Just to be sure, we let
P = [−1 3 1; 0 −1 0; 1 2 1];
then
P⁻¹AP = [−1/2 −1/2 1/2; 0 −1 0; 1/2 5/2 1/2] [1 2 1; 0 1 0; 1 3 1] [−1 3 1; 0 −1 0; 1 2 1] = [0 0 0; 0 1 0; 0 0 2],
as we expected. ▽

Remark. There is a built-in check here for the eigenvalues. If λ is truly to be an eigenvalue of
A, we must find a nonzero vector in N(A − λI). If we do not, then λ cannot be an eigenvalue.

Example 3. Let’s find the eigenvalues and eigenvectors of the matrix


" #
0 1
A= .
−1 0
As usual, we calculate
det(A − tI) = det [−t 1; −1 −t] = t² + 1.
Since t2 + 1 ≥ 1 for all real numbers t, there is no real number λ so that det(A − λI) = 0. Since
our scalars are allowed only to be real numbers, this matrix has no eigenvalue. On the other hand,
as the reader may see in a more advanced course, it is often convenient to allow complex numbers
as scalars. ▽

It is evident that we are going to find the eigenvalues of a matrix A by finding the (real) roots
of the polynomial det(A − tI). This leads us to make our next

Definition. Let A be a square matrix. Then p(t) = pA (t) = det(A − tI) is called the charac-
teristic polynomial of A.1

We can restate Proposition 2.3 by saying that the eigenvalues of A are the real roots of the
characteristic polynomial pA (t). It is comforting to observe that similar matrices have the same
characteristic polynomial, and hence it makes sense to refer to the characteristic polynomial of a
linear map T : V → V .

Lemma 2.4. If B = P −1 AP for some invertible matrix P , then pA (t) = pB (t).

Proof. We have

pB (t) = det(B − tI) = det(P −1 AP − tI) = det P −1 (A − tI)P = det(A − tI) = pA (t),

by virtue of the product rule for determinants. 

As a consequence, if V is a finite-dimensional vector space and T : V → V is a linear transformation,


then we can define the characteristic polynomial of T to be that of the matrix A for T with respect
to any basis for V . By Lemma 2.4 we’ll get the same answer no matter what basis we choose.
1
That the characteristic polynomial of an n × n matrix is in fact a polynomial of degree n seems pretty evident
from examples; but the fastidious reader can establish this by expanding in cofactors.

Remark . In order to determine the eigenvalues of a matrix, we must find the roots of its
characteristic polynomial. In real-world applications (where the matrices tend to get quite large),
one might solve this numerically (e.g., using Newton’s method). However, there are more sophisti-
cated methods for finding the eigenvalues without even calculating the characteristic polynomial; a
powerful such method is based on the Gram-Schmidt process. The interested reader should consult
Strang or Wilkinson for more details.
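To give just the flavor of such methods, here is a minimal Python sketch (ours, purely illustrative) of the unshifted QR iteration: one repeatedly factors A = QR and replaces A by RQ, which is similar to A and so has the same eigenvalues. Practical implementations add shifts and other refinements, and convergence is not guaranteed for every matrix.

import numpy as np

def qr_iteration(A, steps=200):
    # For many matrices with real eigenvalues, the iterates approach an upper
    # triangular matrix whose diagonal entries are the eigenvalues.
    A = np.array(A, dtype=float)
    for _ in range(steps):
        Q, R = np.linalg.qr(A)
        A = R @ Q            # similar to the previous A
    return np.diag(A)

print(qr_iteration([[3.0, 1.0], [-3.0, 7.0]]))   # approximately [6., 4.]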
For the lion’s share of the matrices that we shall encounter here, the eigenvalues will be integers,
and so we take this opportunity to remind you of a trick from high school algebra.

Proposition 2.5 (Rational Roots Test). Let p(t) = an tn + an−1 tn−1 + · · · + a1 t + a0 be a


polynomial with integer coefficients. If t = r/s is a rational root (in lowest terms) of p(t), then r
must be a factor of a0 and s must be a factor of an .

Proof. You can find a proof in most abstract algebra texts, but, for obvious reasons, we
recommend Abstract Algebra: A Geometric Approach, by someone named T. Shifrin, p. 105. 

In particular, when the leading coefficient an is ±1, as is always the case with the characteristic
polynomial, any rational root must in fact be an integer that divides a0 . So, in practice, we test
the various factors of a0 (being careful to try both positive and negative such). Once we find one
root r, we can divide p(t) by t − r to obtain a polynomial of smaller degree.

Example 4. The characteristic polynomial of the matrix


 
4 −3 3
 
A = 0 1 4
2 −2 1
is p(t) = −t3 + 6t2 − 11t + 6. The factors of 6 are ±1, ±2, ±3, and ±6. Since p(1) = 0, we know
that 1 is a root (so we were lucky!). Now,
−p(t)/(t − 1) = t² − 5t + 6 = (t − 2)(t − 3),
and we have succeeded in finding all three eigenvalues of A. They are 1, 2, and 3. ▽
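This trial-and-error procedure is easy to automate. The Python sketch below is ours and assumes integer coefficients with a nonzero constant term; it simply tests every divisor of a0 (with both signs).

def integer_roots(coeffs):
    # coeffs lists a_n, ..., a_1, a_0 (highest degree first); a_0 is assumed nonzero.
    def p(t):
        value = 0
        for c in coeffs:
            value = value * t + c     # Horner's rule
        return value
    a0 = coeffs[-1]
    divisors = [d for d in range(1, abs(a0) + 1) if a0 % d == 0]
    return [r for d in divisors for r in (d, -d) if p(r) == 0]

# p(t) = -t^3 + 6t^2 - 11t + 6, the characteristic polynomial of Example 4
print(integer_roots([-1, 6, -11, 6]))   # [1, 2, 3]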

Remark. It might be nice to have a few shortcuts for calculating the characteristic polynomial
of small matrices. For 2 × 2 matrices, it’s quite easy:

det [a−t b; c d−t] = (a − t)(d − t) − bc = t² − (a + d)t + (ad − bc) = t² − (tr A)t + det A.
(Recall that the trace of a matrix is the sum of its diagonal entries. The trace of A is denoted trA.)
For 3 × 3 matrices, similarly,

det [a11−t a12 a13; a21 a22−t a23; a31 a32 a33−t] = −t³ + (tr A)t² − (C11 + C22 + C33)t + det A,

where Cii is the iith cofactor, the determinant of the 2 × 2 submatrix formed by deleting the ith
row and column from A.

In general, the characteristic polynomial p(t) of an n × n matrix A is always of the form


p(t) = (−1)n tn + (−1)n−1 trA tn−1 + · · · + det A .
Note that the constant term is always det A (with no minus signs) because p(0) = det(A − 0I) =
det A.

In the long run, these formulas notwithstanding, it’s sometimes best to calculate the character-
istic polynomial of 3 × 3 matrices by expansion in cofactors. If one is both attentive and fortunate,
this may save the trouble of factoring the polynomial.

Example 5. Let’s find the characteristic polynomial of


 
2 0 0
 
A = 1 2 1.
0 1 2
We calculate the determinant by expanding in cofactors along the first row:

det [2−t 0 0; 1 2−t 1; 0 1 2−t] = (2 − t) det [2−t 1; 1 2−t]
= (2 − t)[(2 − t)² − 1] = (2 − t)(t² − 4t + 3)
= (2 − t)(t − 3)(t − 1).
But that was too easy. Let’s try the characteristic polynomial of
 
2 0 1
 
B = 1 3 1.
1 1 2
Again, we expand in cofactors along the first row:

det [2−t 0 1; 1 3−t 1; 1 1 2−t] = (2 − t) det [3−t 1; 1 2−t] + det [1 3−t; 1 1]
= (2 − t)[(3 − t)(2 − t) − 1] + [1 − (3 − t)]
= (2 − t)(t² − 5t + 5) − (2 − t) = (2 − t)(t² − 5t + 4)
= (2 − t)(t − 1)(t − 4).
OK, perhaps we were a bit lucky there, too. ▽

2.2. Diagonalizability. Judging by the foregoing examples, it seems to be the case that
when an n × n matrix (or linear transformation) has n distinct eigenvalues, the corresponding
eigenvectors form a linearly independent set and will therefore give a “diagonalizing basis.” Let’s
begin by proving a slightly stronger statement.

Theorem 2.6. Let T : V → V be a linear transformation. Let λ1 , . . . , λk be k distinct scalars.


Suppose v1 , . . . , vk are eigenvectors of T with respective eigenvalues λ1 , . . . , λk . Then {v1 , . . . , vk }
is a linearly independent set of vectors.

Proof. Let m be the largest number between 1 and k (inclusive) so that {v1 , . . . , vm } is linearly
independent. We want to see that m = k. By way of contradiction, suppose m < k. Then we know
that {v1 , . . . , vm } is linearly independent and {v1 , . . . , vm , vm+1 } is linearly dependent. It follows
from Proposition 3.2 of Chapter 4 that vm+1 = c1 v1 + · · · + cm vm for some scalars c1 , . . . , cm . Then
(using repeatedly the fact that T (vi ) = λi vi )

0 = (T − λm+1 I)vm+1 = (T − λm+1 I)(c1 v1 + · · · + cm vm )


= c1 (λ1 − λm+1 )v1 + · · · + cm (λm − λm+1 )vm .

Since λi − λm+1 ≠ 0 for i = 1, . . . , m, and since {v1 , . . . , vm } is linearly independent, the only
possibility is that c1 = · · · = cm = 0, contradicting the fact that vm+1 ≠ 0 (by the very definition
of eigenvector). Thus, it cannot happen that m < k, and the proof is complete. 

We now arrive at our first result that gives a sufficient condition for a linear transformation to
be diagonalizable.

Corollary 2.7. Suppose V is an n-dimensional vector space and T : V → V has n distinct


(real) eigenvalues. Then T is diagonalizable.

Proof. The set of the n corresponding eigenvectors will be linearly independent and will hence
give a basis for V . The matrix for T with respect to a basis of eigenvectors is always diagonal. 

Remark. Of course, there are many diagonalizable (indeed, diagonal) matrices with repeated
eigenvalues. Certainly the identity matrix and the matrix
 
2 0 0
 
0 3 0
0 0 2
are diagonal, and yet they fail to have distinct eigenvalues.

We spend the rest of this section discussing the two ways the hypotheses of Corollary 2.7 can
fail: the characteristic polynomial may have complex roots or it may have repeated roots.

Example 6. Consider the matrix
A = [1/√2 −1/√2; 1/√2 1/√2].

The reader may well recall from Chapter 1 that multiplying by A gives a rotation of the plane
through an angle of π/4. Now, what are the eigenvalues of A? The characteristic polynomial is

p(t) = t² − (tr A)t + det A = t² − √2 t + 1,
whose roots (by the quadratic formula) are
λ = (√2 ± √(−2))/2 = (1 ± i)/√2.
After a bit of thought, it should come as no surprise that A has no (real) eigenvector, as there can
be no line through the origin that is unchanged after a rotation. ▽

We have seen that when the characteristic polynomial has distinct (real) roots, we get a 1-
dimensional eigenspace for each. What happens if the characteristic polynomial has some repeated
roots?

Example 7. Consider the matrix


" #
1 1
A= .
−1 3

Its characteristic polynomial is p(t) = t2 − 4t + 4 = (t − 2)2 , so 2 is a repeated eigenvalue. Now


let’s find the corresponding eigenvectors:
N(A − 2I) = N([−1 1; −1 1]) = N([1 −1; 0 0])
is 1-dimensional, with basis (1, 1).
It follows that A cannot be diagonalized. (See also Exercise 16.) ▽

Example 8. Both the matrices


   
A = [2 0 0 0; 0 2 0 0; 0 0 3 1; 0 0 0 3]  and  B = [2 1 0 0; 0 2 0 0; 0 0 3 0; 0 0 0 3]
have the characteristic polynomial p(t) = (t − 2)²(t − 3)² (why?). For A, there are two linearly
independent eigenvectors with eigenvalue 2 but only one linearly independent eigenvector with
eigenvalue 3. For B, there are two linearly independent eigenvectors with eigenvalue 3 but only one
linearly independent eigenvector with eigenvalue 2. As a result, neither can be diagonalized. ▽

It would be convenient to have a bit of terminology here.

Definition . Let λ be an eigenvalue of a linear transformation. The algebraic multiplicity of


λ is its multiplicity as a root of the characteristic polynomial p(t), i.e., the highest power of t − λ
dividing p(t). The geometric multiplicity of λ is the dimension of the λ-eigenspace E(λ).

Example 9. For the matrices in Example 8, both the eigenvalues 2 and 3 have algebraic
multiplicity 2. For matrix A, the eigenvalue 2 has geometric multiplicity 2 and the eigenvalue 3
has geometric multiplicity 1; for matrix B, the eigenvalue 2 has geometric multiplicity 1 and the
eigenvalue 3 has geometric multiplicity 2. ▽

From the examples we’ve seen, it seems quite plausible that the geometric multiplicity of an
eigenvalue can be no larger than its algebraic multiplicity, but we stop to give a proof.

Proposition 2.8. Let λ be an eigenvalue of algebraic multiplicity m and geometric multiplicity


d. Then 1 ≤ d ≤ m.

Proof. Suppose λ is an eigenvalue of the linear transformation T . Then d = dim E(λ) ≥ 1 by


definition. Now, choose a basis {v1 , . . . , vd } for E(λ) and extend it to a basis B = {v1 , . . . , vn } for
V . Then the matrix for T with respect to the basis B is of the form
 
A = [λId B; O C],
and so, by Exercise 7.5.10, the characteristic polynomial
pA(t) = det(A − tI) = det((λ − t)Id) det(C − tI) = (λ − t)^d det(C − tI).
Since the characteristic polynomial does not depend on the basis and since (t − λ)m is the largest
power of t − λ dividing the characteristic polynomial, it follows that d ≤ m. 

We are now able to give a necessary and sufficient criterion for a linear transformation to be
diagonalizable. Based on our experience with examples, it should come as no great surprise.

Theorem 2.9. Let T : V → V be a linear transformation. Let its distinct eigenvalues be


λ1 , . . . , λk and assume these are all real numbers. Then T is diagonalizable if and only if the
geometric multiplicity, di , of each λi equals its algebraic multiplicity, mi .

Proof. Let V be an n-dimensional vector space. Then the characteristic polynomial of T has
degree n, and we have
p(t) = ±(t − λ1 )m1 (t − λ2 )m2 · · · (t − λk )mk ;

therefore,
n = Σ_{i=1}^{k} mi .
Now, suppose T is diagonalizable. Then there is a basis B consisting of eigenvectors. At most
di of these basis vectors lie in E(λi), and so n ≤ Σ_{i=1}^{k} di . On the other hand, by Proposition 2.8,
di ≤ mi for i = 1, . . . , k. Putting these together, we have
n ≤ Σ_{i=1}^{k} di ≤ Σ_{i=1}^{k} mi = n.
Thus, we must have equality at every stage here, which implies that di = mi for all i = 1, . . . , k.
Conversely, suppose di = mi for i = 1, . . . , k. If we choose a basis Bi for each eigenspace E(λi )
and let B = B1 ∪ · · · ∪ Bk , then we assert that B is a basis for V . There are n vectors in B, so
we need only check that the set of vectors is linearly independent. This is a generalization of the
argument of Theorem 2.6, and we leave it to Exercise 25. 

Example 10. The matrices


   
A = [−1 4 2; −1 3 1; −1 2 2]  and  B = [0 3 1; −1 3 1; 0 1 1]

both have characteristic polynomial p(t) = −(t − 1)2 (t − 2). That is, the eigenvalue 1 has algebraic
multiplicity 2 and the eigenvalue 2 has algebraic multiplicity 1. To decide whether the matrices are
diagonalizable, we need to know the geometric multiplicity of the eigenvalue 1. Well,
   
A − I = [−2 4 2; −1 2 1; −1 2 1] → [1 −2 −1; 0 0 0; 0 0 0]

has rank 1 and so dim EA (1) = 2. We infer from Theorem 2.9 that A is diagonalizable. Indeed, as
the reader can check, a diagonalizing basis is
     
 1 1 2 
 0 , 1 , 1  .
 
1 −1 1

On the other hand,


   
B − I = [−1 3 1; −1 2 1; 0 1 0] → [1 0 −1; 0 1 0; 0 0 0]

has rank 2 and so dim EB (1) = 1. Since the eigenvalue 1 has geometric multiplicity 1, it follows
from Theorem 2.9 that B is not diagonalizable. ▽
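Since the geometric multiplicity of λ is dim N(A − λI) = n − rank(A − λI), these multiplicities are also easy to check by machine. A short Python sketch (ours, purely illustrative):

import numpy as np

A = np.array([[-1.0, 4.0, 2.0], [-1.0, 3.0, 1.0], [-1.0, 2.0, 2.0]])
B = np.array([[ 0.0, 3.0, 1.0], [-1.0, 3.0, 1.0], [ 0.0, 1.0, 1.0]])
for M in (A, B):
    rank = np.linalg.matrix_rank(M - np.eye(3))
    print(3 - rank)    # geometric multiplicity of the eigenvalue 1: 2 for A, 1 for B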

In the next section we will see the power of diagonalizing matrices in several applications.

EXERCISES 9.2

1. Find the eigenvalues and eigenvectors of the following matrices.
*a. [1 5; 2 4]
b. [0 1; 1 0]
c. [10 −6; 18 −11]
d. [1 3; 3 1]
*e. [1 1; −1 3]
*f. [−1 1 2; 1 2 1; 2 1 −1]
g. [1 0 0; −2 1 2; −2 0 3]
h. [1 −1 2; 0 1 0; 0 −2 3]
*i. [2 0 1; 0 1 2; 0 0 1]
j. [1 −2 2; −1 0 −1; 0 2 −1]
k. [3 1 0; 0 1 2; 0 1 2]
*l. [1 −6 4; −2 −4 5; −2 −6 7]
m. [3 2 −2; 2 2 −1; 2 1 0]
n. [1 0 0 1; 0 1 1 1; 0 0 2 0; 0 0 0 2]
2. Prove that 0 is an eigenvalue of A if and only if A is singular.

3. Prove that the eigenvalues of an upper (or lower) triangular matrix are its diagonal entries.

4. What are the eigenvalues and eigenvectors of a projection? a reflection?

5. Suppose A is nonsingular. Prove that the eigenvalues of A−1 are the reciprocals of the eigen-
values of A.

6. Suppose x is an eigenvector of A with corresponding eigenvalue λ.


a. Prove that for any n ∈ N, x is an eigenvector of An with corresponding eigenvalue λn .
b. Prove or give a counterexample: x is an eigenvector of A + I.
c. If x is an eigenvector of B with corresponding eigenvalue µ, prove or give a counterexample:
x is an eigenvector of A + B with corresponding eigenvalue λ + µ.
d. Prove or give a counterexample: if λ is an eigenvalue of A and µ is an eigenvalue of B,
then λ + µ is an eigenvalue of A + B.

7. Prove or give a counterexample: If A and B have the same characteristic polynomial, then
there is an invertible matrix P so that B = P −1 AP .
♯ 8. Suppose A is a square matrix. Suppose x is an eigenvector of A with corresponding eigenvalue
λ and y is an eigenvector of AT with corresponding eigenvalue µ. Prove that if λ 6= µ, then
x · y = 0.

9. Prove or give a counterexample:


a. A and AT have the same eigenvalues.
b. A and AT have the same eigenvectors.

10. Prove that the product of the roots of the characteristic polynomial of A is equal to det A.
(Hint: If λ1 , . . . , λn are the roots, show that p(t) = ±(t − λ1 )(t − λ2 ) . . . (t − λn ).)

11. Let A and B be n × n matrices.


a. Suppose A (or B) is nonsingular. Prove that the characteristic polynomials of AB and
BA are equal.
b. (more challenging) Prove the result of part a when both A and B are singular.

*12. Decide whether each of the matrices in Exercise 1 is diagonalizable. Give your reasoning.

13. Prove or give a counterexample.


a. If A is an n × n matrix with n distinct (real) eigenvalues, then A is diagonalizable.

b. If A is diagonalizable and AB = BA, then B is diagonalizable.


c. If there is an invertible matrix P so that A = P −1 BP , then A and B have the same
eigenvalues.
d. If A and B have the same eigenvalues, then there is an invertible matrix P so that A =
P −1 BP .
e. There is no real 2 × 2 matrix A satisfying A2 = −I.
f. If A and B are diagonalizable and have the same eigenvalues (with the same algebraic
multiplicities), then there is an invertible matrix P so that A = P −1 BP .

14. Suppose A is a 2 × 2 matrix whose eigenvalues are integers. If det A = 120, explain why A
must be diagonalizable.

15. Is the linear transformation T : Mn×n → Mn×n defined by T (X) = X T diagonalizable? (Hint:
Consider the equation X T = λX. What are the corresponding eigenspaces? Exercise 1.4.36
may also be relevant.)
   
*16. Let A = [1 1; −1 3]. We saw in Example 7 that A has repeated eigenvalue 2 and v1 = (1, 1)
spans E(2).
a. Calculate (A − 2I)2 .
b. Solve (A − 2I)v2 = v1 for v2 . Explain how we know a priori that this equation has a
solution.
c. Give the matrix for A with respect to the basis {v1 , v2 }.
This is the closest to diagonal one can get and is called the Jordan canonical form of A.

17. Prove that if λ is an eigenvalue of A with geometric multiplicity d, then λ is an eigenvalue of


AT with geometric multiplicity d. (Hint: Use Theorem 4.5 of Chapter 4.)

18. Suppose A is an n × n matrix with the property that A2 = A.


a. Show that if λ is an eigenvalue of A, then λ = 0 or λ = 1.
b. Prove that A is diagonalizable. (Hint: See Exercise 4.4.16.)

19. Suppose A is an n × n matrix with the property that A2 = I.


a. Show that if λ is an eigenvalue of A, then λ = 1 or λ = −1.
b. Prove that

E(1) = {x ∈ Rⁿ : x = (1/2)(u + Au) for some u ∈ Rⁿ}  and
E(−1) = {x ∈ Rⁿ : x = (1/2)(u − Au) for some u ∈ Rⁿ}.

c. Prove that E(1) + E(−1) = Rn and deduce that A is diagonalizable.


(For an application, see Exercise 15.)

20. Let A be an orthogonal 3 × 3 matrix.


a. Prove that the characteristic polynomial pA has a real root.
b. Prove that ‖Ax‖ = ‖x‖ for all x ∈ R³ and deduce that the only (real) eigenvalues of A
can be 1 and −1.
c. Prove that if det A = 1, then 1 must be an eigenvalue of A.

d. Prove that if det A = 1 and A 6= I, then µA : R3 → R3 is given by rotation through some


angle θ about some axis. (Hint: First show dim E(1) = 1. Then show that µA maps E(1)⊥
to itself and use Exercise 1.4.34.)
e. (Cf. the remark on p. 403.) Prove that the composition of rotations in R3 is again a
rotation.

21. Consider the linear map T : R3 → R3 whose standard matrix is the matrix

 1 1

6 1

6

6√ 3 + 6 6 − √3
 6 6
C =  13 − 2 1
+ 6 
√6 3√ 3
1 6 1 6 1
6 + 3 3 − 6 6

given on p. 27. Show that T is indeed a rotation. Find the axis and angle of rotation.

22. Let A be an n × n matrix all of whose eigenvalues are real numbers. Prove that there is a basis
for Rn with respect to which the matrix for A becomes upper triangular. (Hint: Consider a
basis {v1 , v2′ , . . . , vn′ }, where v1 is an eigenvector.)
♯ 23. Suppose T : V → V is a linear transformation. Suppose T is diagonalizable (i.e., there is a basis
for V consisting of eigenvectors of T ). Suppose, moreover, that there is a subspace W ⊂ V with
the property that T (W ) ⊂ W . Prove that there is a basis for W consisting of eigenvectors of
T . (Hint: Using Exercise 4.3.18, concoct a basis for V by starting with a basis for W . Consider
the matrix for T with respect to this basis; what is its characteristic polynomial?)

24. Suppose A and B are n × n matrices.


a. Suppose both A and B are diagonalizable and that they have the same eigenvectors. Prove
that AB = BA.
b. Suppose A has n distinct eigenvalues and AB = BA. Prove that every eigenvector of
A is also an eigenvector of B. Conclude that B is diagonalizable. (Query: Need every
eigenvector of B be an eigenvector of A?)
c. Suppose A and B are diagonalizable and AB = BA. Prove that A and B are simul-
taneously diagonalizable; i.e., there is a nonsingular matrix P so that both P −1 AP and
P −1 BP are diagonal. (Hint: If E(λ) is the λ-eigenspace for A, show that if v ∈ E(λ), then
Bv ∈ E(λ). Now use Exercise 23.)

25. a. Let λ 6= µ be eigenvalues of a linear transformation. Suppose {v1 , . . . , vk } ⊂ E(λ)


is linearly independent and {w1 , . . . , wℓ } ⊂ E(µ) is linearly independent. Prove that
{v1 , . . . , vk , w1 , . . . , wℓ } is linearly independent.
(i) (i)
b. More generally, if λ1 , . . . , λk are distinct and {v1 , . . . , vdi } ⊂ E(λi ) is linearly independent
(i)
for i = 1, . . . , k, prove that {vj : i = 1, . . . , k, j = 1, . . . , di } is linearly independent.

3. Difference Equations and Ordinary Differential Equations

Suppose A is a diagonalizable matrix. Then there is a nonsingular matrix P so that


 
P⁻¹AP = Λ = diag(λ1 , λ2 , . . . , λn),
where the diagonal entries of Λ are the eigenvalues λ1 , . . . , λn of A. Then it is easy to use this to
calculate the powers of A:
A = PΛP⁻¹
A² = (PΛP⁻¹)² = (PΛP⁻¹)(PΛP⁻¹) = PΛ(P⁻¹P)ΛP⁻¹ = PΛ²P⁻¹
A³ = A²A = (PΛ²P⁻¹)(PΛP⁻¹) = PΛ²(P⁻¹P)ΛP⁻¹ = PΛ³P⁻¹
...
A^k = PΛ^k P⁻¹.
We now show how linear algebra can be applied to solve some simple difference equations and
systems of differential equations. Both arise very naturally in modeling economic, physical, and
biological problems. For the most basic example, we need only take “exponential growth.” When
we model a discrete growth process and stipulate that population doubles each year, then ak , the
population after k years, obeys the law: ak+1 = 2ak . When we model a continuous growth process,
we stipulate that the rate of change of the population x(t) is proportional to the population at that
instant, giving the differential equation ẋ(t) = kx(t).

3.1. Difference Equations.

Example 1. (A cat/mouse population problem) Suppose the cat population at month k is c_k and the mouse population at month k is m_k, and let x_k = [c_k ; m_k] denote the population vector at month k. Suppose

x_{k+1} = A x_k ,   where A = [0.7  0.2 ; −0.6  1.4],
and an initial population vector x0 is given. Then the population vector xk can be computed from
x_k = A^k x_0 ,

so we want to compute A^k by diagonalizing the matrix A.
Since the characteristic polynomial of A is p(t) = t^2 − 2.1t + 1.1 = (t − 1)(t − 1.1), we see that the eigenvalues of A are 1 and 1.1. The corresponding eigenvectors are

v1 = [2 ; 3]   and   v2 = [1 ; 2],

and so we form the change-of-basis matrix

P = [2  1 ; 3  2].

Then we have

A = PΛP^{-1},   where Λ = [1  0 ; 0  1.1],

and so

A^k = PΛ^kP^{-1} = [2  1 ; 3  2] [1  0 ; 0  (1.1)^k] [2  −1 ; −3  2].

In particular, if x_0 = [c_0 ; m_0] is the original population vector, we have

x_k = [c_k ; m_k] = [2  1 ; 3  2] [1  0 ; 0  (1.1)^k] [2  −1 ; −3  2] [c_0 ; m_0]
    = [2  1 ; 3  2] [1  0 ; 0  (1.1)^k] [2c_0 − m_0 ; −3c_0 + 2m_0]
    = [2  1 ; 3  2] [2c_0 − m_0 ; (1.1)^k(−3c_0 + 2m_0)]
    = (2c_0 − m_0) [2 ; 3] + (−3c_0 + 2m_0)(1.1)^k [1 ; 2].

We can now see what happens as time passes. If 3c_0 = 2m_0, the second term drops out and the population vector stays constant. If 3c_0 < 2m_0, the first term is still constant, and the second term increases exponentially; but note that the contribution to the mouse population is double the contribution to the cat population. And if 3c_0 > 2m_0, we see that the population vector decreases exponentially, the mouse population being the first to disappear (why?). ▽
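A quick numerical check of this closed form (an illustrative sketch only, assuming NumPy; the initial populations chosen here are arbitrary):

import numpy as np

A = np.array([[0.7, 0.2], [-0.6, 1.4]])
c0, m0 = 10.0, 20.0
x0 = np.array([c0, m0])
k = 12
xk_power = np.linalg.matrix_power(A, k) @ x0
# The formula derived above: x_k = (2c0 - m0)[2;3] + (-3c0 + 2m0)(1.1)^k [1;2].
xk_formula = (2*c0 - m0)*np.array([2.0, 3.0]) + (-3*c0 + 2*m0)*(1.1**k)*np.array([1.0, 2.0])
print(np.allclose(xk_power, xk_formula))   # True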

The story for a general diagonalizable matrix A is the same. The column vectors of P are the eigenvectors v1 , . . . , vn , the diagonal entries of Λ^k are λ_1^k , . . . , λ_n^k , and so, letting

P^{-1}x_0 = [c_1 ; c_2 ; . . . ; c_n],

we have

(∗)   A^k x_0 = PΛ^k(P^{-1}x_0) = [v_1  v_2  · · ·  v_n] diag(λ_1^k , . . . , λ_n^k) [c_1 ; c_2 ; . . . ; c_n]
            = c_1λ_1^k v_1 + c_2λ_2^k v_2 + · · · + c_nλ_n^k v_n .

This formula will have all the information we need, and we will see physical interpretations of
analogous formulas when we discuss systems of differential equations shortly.

Example 2. (The Fibonacci sequence) The renowned Fibonacci sequence,

1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, . . . ,

is obtained by letting each number (starting with the third) be the sum of the preceding two: if we
let ak denote the kth number in the sequence, then

ak+1 = ak + ak−1 , a0 = a1 = 1.
 
Thus, if we define x_k = [a_k ; a_{k+1}], k ≥ 0, then we can encode the pattern of the sequence in the matrix equation

[a_k ; a_{k+1}] = [0  1 ; 1  1] [a_{k−1} ; a_k],   k ≥ 1.

In other words, setting

A = [0  1 ; 1  1]   and   x_k = [a_k ; a_{k+1}],

we have

x_{k+1} = A x_k  for all k ≥ 0,  with x_0 = [1 ; 1].
Once again, by computing the powers of the matrix A, we can calculate x_k = A^k x_0 , and hence the kth term in the Fibonacci sequence.
The characteristic polynomial of A is p(t) = t^2 − t − 1, and so the eigenvalues are

λ1 = (1 + √5)/2   and   λ2 = (1 − √5)/2.

The corresponding eigenvectors are

v1 = [1 ; λ1]   and   v2 = [1 ; λ2].

Then

P = [1  1 ; λ1  λ2]   and   P^{-1} = (1/√5) [−λ2  1 ; λ1  −1],

so we have

[c_1 ; c_2] = P^{-1} [1 ; 1] = (1/√5) [1 − λ2 ; λ1 − 1] = (1/√5) [λ1 ; −λ2].

Now we use the formula (∗) above to calculate

x_k = A^k x_0 = c_1λ_1^k v_1 + c_2λ_2^k v_2 = (λ1/√5) λ_1^k [1 ; λ1] − (λ2/√5) λ_2^k [1 ; λ2].

In particular, reading off the first coordinate of this vector, we find that the kth number in the Fibonacci sequence is

a_k = (1/√5) (λ_1^{k+1} − λ_2^{k+1}) = (1/√5) [ ((1+√5)/2)^{k+1} − ((1−√5)/2)^{k+1} ].

It’s far from obvious (at least to the author) that each such number is an integer! We would be remiss if we didn’t point out one of the classic facts about the Fibonacci sequence: if we take the ratio of successive terms, we get

a_{k+1}/a_k = (λ_1^{k+2} − λ_2^{k+2}) / (λ_1^{k+1} − λ_2^{k+1}).

Now, |λ2| ≈ 0.618, so lim_{k→∞} λ_2^k = 0 and we have

lim_{k→∞} a_{k+1}/a_k = λ1 ≈ 1.618.

This is the famed golden ratio. ▽
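Here is a small sketch (plain Python, offered only as an illustration) comparing the closed form with the recurrence and watching the ratio approach the golden ratio:

import math

lam1 = (1 + math.sqrt(5)) / 2
lam2 = (1 - math.sqrt(5)) / 2

def fib_closed(k):
    # a_k = (lam1^{k+1} - lam2^{k+1}) / sqrt(5), rounded to the nearest integer
    return round((lam1**(k + 1) - lam2**(k + 1)) / math.sqrt(5))

a = [1, 1]
for k in range(2, 15):
    a.append(a[-1] + a[-2])

print(a == [fib_closed(k) for k in range(15)])   # True
print(a[-1] / a[-2])                              # approximately 1.618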

3.2. Systems of Differential Equations. Another powerful application of linear algebra


comes from the study of systems of ordinary differential equations (ODE’s). For example, we have
the constant-coefficient system of linear ordinary differential equations:
ẋ1 (t) = a11 x1 (t) + a12 x2 (t)
ẋ2 (t) = a21 x1 (t) + a22 x2 (t).
Here, and throughout this section, we use a dot to represent differentiation with respect to t (time).
The main problem we address in this section is the following. Given an n × n (constant) matrix
A and a vector x0 ∈ Rn , we wish to find all differentiable vector-valued functions x(t) so that
ẋ(t) = Ax(t), x(0) = x0 .
(The vector x0 is called the initial value of the solution x(t).)

Example 3. Suppose n = 1, so that A = [a] for some real number a. Then we have simply
the ordinary differential equation
ẋ(t) = ax(t), x(0) = x0 .
The trick of “separating variables” that the reader most likely learned in her integral calculus course
leads to the solution x(t) = x0 e^{at} . As we can easily check, ẋ(t) = ax(t), so we have in fact found
a solution. Do we know there can be no more? Suppose y(t) were any solution of the original
problem. Then the function z(t) = y(t)e^{−at} satisfies the equation

ż(t) = ẏ(t)e^{−at} + y(t)(−ae^{−at}) = (ay(t))e^{−at} + y(t)(−ae^{−at}) = 0,

and so z(t) must be a constant function. Since z(0) = y(0) = x0 , we see that y(t) = x0 e^{at} . The
original differential equation (with its initial condition) has a unique solution. ▽

Example 4. Consider perhaps the simplest possible 2 × 2 example:


ẋ1 (t) = ax1 (t)
ẋ2 (t) = bx2 (t)
with the initial conditions x1 (0) = (x1 )0 , x2 (0) = (x2 )0 . In matrix notation, this is the ODE
ẋ(t) = Ax(t), x(0) = x0 , where
A = [a  0 ; 0  b],   x(t) = [x1(t) ; x2(t)],   and   x_0 = [(x1)_0 ; (x2)_0].

Since x1(t) and x2(t) appear completely independently in these equations, we infer from Example 3 that the unique solution of this system of equations will be

x1(t) = (x1)_0 e^{at},   x2(t) = (x2)_0 e^{bt}.

In vector notation, we have

x(t) = [x1(t) ; x2(t)] = [e^{at}  0 ; 0  e^{bt}] x_0 = E(t)x_0 ,

where E(t) is the diagonal 2 × 2 matrix with entries e^{at} and e^{bt} . This result is easily generalized to the case of a diagonal n × n matrix. ▽

Recall that for any real number x, we have the Taylor series expansion

(†)   e^x = Σ_{k=0}^{∞} x^k/k! = 1 + x + (1/2)x^2 + (1/6)x^3 + · · · + (1/k!)x^k + · · · .

Now, given an n × n matrix A, we define a new n × n matrix e^A , called the exponential of A, by

e^A = I + A + (1/2)A^2 + (1/6)A^3 + · · · + (1/k!)A^k + · · · = Σ_{k=0}^{∞} A^k/k! .

That the series converges is immediate from Proposition 1.1 of Chapter 6. In general, however, trying to evaluate this series directly is extremely difficult, because the coefficients of A^k are not easily expressed in terms of the coefficients of A. However, when A is a diagonalizable matrix, it is easy to compute e^A : there is an invertible matrix P so that Λ = P^{-1}AP is diagonal. Thus, A = PΛP^{-1} and A^k = PΛ^kP^{-1} for all k ∈ N, and so

e^A = Σ_{k=0}^{∞} A^k/k! = Σ_{k=0}^{∞} PΛ^kP^{-1}/k! = P (Σ_{k=0}^{∞} Λ^k/k!) P^{-1} = P e^Λ P^{-1}.
Example 5. Let A = [2  0 ; 3  −1]. Then A = PΛP^{-1}, where

Λ = [2  0 ; 0  −1]   and   P = [1  0 ; 1  1].

Then we have

e^{tΛ} = [e^{2t}  0 ; 0  e^{−t}]   and   e^{tA} = P e^{tΛ} P^{-1} = [e^{2t}  0 ; e^{2t} − e^{−t}  e^{−t}]. ▽
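A numerical sanity check of this example (an illustrative sketch, assuming NumPy and SciPy are available):

import numpy as np
from scipy.linalg import expm

A = np.array([[2.0, 0.0], [3.0, -1.0]])
P = np.array([[1.0, 0.0], [1.0, 1.0]])          # eigenvectors for the eigenvalues 2 and -1
t = 0.5
etA_diag = P @ np.diag(np.exp(t * np.array([2.0, -1.0]))) @ np.linalg.inv(P)
print(np.allclose(expm(t * A), etA_diag))        # True: e^{tA} = P e^{tLambda} P^{-1}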

The result of Example 4 generalizes to the n × n case. Indeed, whenever we can solve a problem
for diagonal matrices, we can solve it for diagonalizable matrices by making the appropriate change-
of-basis. So we should not be surprised by the following result.

Proposition 3.1. Let A be a diagonalizable n × n matrix. The general solution of the initial value problem

(‡)   ẋ(t) = Ax(t), x(0) = x_0

is given by x(t) = e^{tA} x_0 .

Proof. As above, since A is diagonalizable, there are an invertible matrix P and a diagonal matrix Λ so that A = PΛP^{-1} and e^{tA} = P e^{tΛ} P^{-1}. Since the derivative of the diagonal matrix

e^{tΛ} = diag(e^{tλ1} , e^{tλ2} , . . . , e^{tλn})

is obviously

diag(λ1 e^{tλ1} , λ2 e^{tλ2} , . . . , λn e^{tλn}) = Λe^{tΛ},

we have

(e^{tA})• = (P e^{tΛ} P^{-1})• = P (e^{tΛ})• P^{-1} = P Λe^{tΛ} P^{-1} = (PΛP^{-1})(P e^{tΛ} P^{-1}) = Ae^{tA}.

We begin by checking that x(t) = e^{tA} x_0 is indeed a solution:

ẋ(t) = (e^{tA} x_0)• = (Ae^{tA})x_0 = A(e^{tA} x_0) = Ax(t),

as required.
Now suppose that y(t) is a solution of the equation (‡), and consider the vector function
z(t) = e−tA y(t). Then by the product rule, we have

ż(t) = (e−tA )• y(t) + e−tA ẏ(t)


 
= −Ae−tA y(t) + e−tA Ay(t) = −Ae−tA + e−tA A y(t) = 0,

as Ae−tA = e−tA A. This implies that z(t) must be a constant vector, and so

z(t) = z(0) = y(0) = x0 ,

whence y(t) = etA z(t) = etA x0 for all t, as required. 

Remark . A more sophisticated interpretation of this result is the following: If we view the
system (‡) of ODE’s in a coordinate system derived from the eigenvectors of the matrix A, then
the system is uncoupled.

Example 6. Continuing Example 5, we see that the general solution of the system ẋ(t) = Ax(t) has the form

x(t) = [x1(t) ; x2(t)] = e^{tA} [c1 ; c2]   for appropriate constants c1 and c2
     = [c1 e^{2t} ; c1 e^{2t} − c1 e^{−t} + c2 e^{−t}] = c1 e^{2t} [1 ; 1] + (c2 − c1) e^{−t} [0 ; 1].

Of course, this is the expression we get when we write

x(t) = e^{tA} [c1 ; c2] = P e^{tΛ} (P^{-1} [c1 ; c2])

and obtain the familiar linear combination of the columns of P (which are the eigenvectors of A). If, in particular, we wish to study the long-term behavior of the solution, we observe that lim_{t→∞} e^{−t} = 0 and lim_{t→∞} e^{2t} = ∞, so that x(t) behaves like c1 e^{2t} [1 ; 1] as t → ∞. In general, this type of analysis of diagonalizable systems is called normal mode analysis, and the vector functions

e^{2t} [1 ; 1]   and   e^{−t} [0 ; 1]

corresponding to the eigenvectors are called the normal modes of the system. ▽
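The normal-mode decomposition is easy to reproduce numerically; here is a sketch (assuming NumPy/SciPy; the initial vector is arbitrary):

import numpy as np
from scipy.linalg import expm

A = np.array([[2.0, 0.0], [3.0, -1.0]])
x0 = np.array([1.0, 4.0])
lams, P = np.linalg.eig(A)
c = np.linalg.solve(P, x0)                       # coordinates of x0 in the eigenbasis
t = 0.7
x_modes = sum(c[i] * np.exp(lams[i] * t) * P[:, i] for i in range(2))
print(np.allclose(expm(t * A) @ x0, x_modes))    # True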

To emphasize the analogy with the solution of difference equations earlier and the formula (∗) on p. 419, we rephrase Proposition 3.1 so as to highlight the normal modes.

Corollary 3.2. Suppose A is diagonalizable, with eigenvalues λ1 , . . . , λn and corresponding eigenvectors v1 , . . . , vn , and write A = PΛP^{-1} , as usual. Then the solution of the initial value problem

ẋ(t) = Ax(t), x(0) = x_0

is

(††)   x(t) = e^{tA} x_0 = P e^{tΛ} (P^{-1} x_0) = [v_1  v_2  · · ·  v_n] diag(e^{λ1 t} , . . . , e^{λn t}) [c_1 ; c_2 ; . . . ; c_n]
           = c_1 e^{λ1 t} v_1 + c_2 e^{λ2 t} v_2 + · · · + c_n e^{λn t} v_n ,

where P^{-1} x_0 = [c_1 ; c_2 ; . . . ; c_n].

Note that the general solution is a linear combination of the normal modes eλ1 t v1 , . . . , eλn t vn .
Even when A is not diagonalizable, we may differentiate the exponential series term-by-term2 to obtain

(e^{tA})• = (I + tA + (t^2/2!)A^2 + (t^3/3!)A^3 + · · · + (t^k/k!)A^k + (t^{k+1}/(k+1)!)A^{k+1} + · · ·)•
         = A + tA^2 + (t^2/2!)A^3 + · · · + (t^{k−1}/(k−1)!)A^k + (t^k/k!)A^{k+1} + · · ·
         = A (I + tA + (t^2/2!)A^2 + · · · + (t^{k−1}/(k−1)!)A^{k−1} + (t^k/k!)A^k + · · ·) = Ae^{tA}.
Thus, we have

Theorem 3.3. Suppose A is an n × n matrix. Then the unique solution of the initial value
problem
ẋ(t) = Ax(t), x(0) = x0
is x(t) = etA x0 .

Example 7. Consider the differential equation ẋ(t) = Ax(t) when

A = [0  −1 ; 1  0].

The unsophisticated (but tricky) approach is to write this system out explicitly:

ẋ1(t) = −x2(t)
ẋ2(t) = x1(t)

and differentiate again, obtaining

(∗∗)   ẍ1(t) = −ẋ2(t) = −x1(t)
       ẍ2(t) = ẋ1(t) = −x2(t).

That is, our vector function x(t) satisfies the second-order differential equation

ẍ(t) = −x(t).

Now, the equations (∗∗) have the “obvious” solutions

x1(t) = a1 cos t + b1 sin t   and   x2(t) = a2 cos t + b2 sin t

for some constants a1 , a2 , b1 , and b2 (although it is far from obvious that these are the only solutions). Some information was lost in the process; in particular, since ẋ1 = −x2 , the constants must satisfy the equations

a2 = −b1   and   b2 = a1 .

That is, the vector function

x(t) = [x1(t) ; x2(t)] = [a cos t − b sin t ; a sin t + b cos t] = [cos t  −sin t ; sin t  cos t] [a ; b]

gives a solution of the original differential equation.

2 See Spivak, Calculus, Third Edition, Chapter 24, for the proof in the real case; Proposition 1.1 of Chapter 6 applies to show that it works for the matrix case as well.

On the other hand, Theorem 3.3 tells us that the general solution should be of the form

x(t) = e^{tA} x_0 ,

and so we suspect that

e^{t [0  −1 ; 1  0]} = [cos t  −sin t ; sin t  cos t]

should hold. Well,

e^{tA} = I + tA + (t^2/2!)A^2 + (t^3/3!)A^3 + (t^4/4!)A^4 + · · ·
       = [1  0 ; 0  1] + t [0  −1 ; 1  0] + (t^2/2!) [−1  0 ; 0  −1] + (t^3/3!) [0  1 ; −1  0] + (t^4/4!) [1  0 ; 0  1] + · · ·
       = [1 − t^2/2! + t^4/4! − · · ·   −t + t^3/3! − t^5/5! + · · · ; t − t^3/3! + t^5/5! − · · ·   1 − t^2/2! + t^4/4! − · · ·].

Since the power series expansions (Taylor series) for sin and cos are, indeed,

sin t = t − (1/3!)t^3 + (1/5!)t^5 − · · · + (−1)^k (1/(2k+1)!) t^{2k+1} + · · ·
cos t = 1 − (1/2!)t^2 + (1/4!)t^4 − · · · + (−1)^k (1/(2k)!) t^{2k} + · · · ,

the formulas agree. (Another approach to computing e^{tA} is to diagonalize A over the complex numbers, but we don’t stop to do this here.3 ) ▽

3 But we must remind you of the famous formula, usually attributed to Euler: e^{it} = cos t + i sin t.
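A one-line numerical confirmation of this rotation formula (a sketch only, assuming NumPy/SciPy):

import numpy as np
from scipy.linalg import expm

A = np.array([[0.0, -1.0], [1.0, 0.0]])
t = 1.2
R = np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]])
print(np.allclose(expm(t * A), R))   # True: e^{tA} is rotation through angle t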

Example 8. Let’s now consider the case of a non-diagonalizable matrix, e.g.,

A = [2  1 ; 0  2].

The system

ẋ1(t) = 2x1 + x2
ẋ2(t) = 2x2
is already partially uncoupled, so we know that x2(t) must take the form x2(t) = ce^{2t} for some constant c. Now, in order to find x1(t), we must solve the inhomogeneous ODE

ẋ1(t) = 2x1(t) + ce^{2t}.

In elementary differential equations courses, one is taught to look for a solution of the form

x1(t) = ae^{2t} + bte^{2t};

in this case,

ẋ1(t) = (2a + b)e^{2t} + 2bte^{2t} = 2x1(t) + be^{2t},

and so taking b = c gives the desired solution of our equation. That is, the solution of the system is the vector function

x(t) = [ae^{2t} + cte^{2t} ; ce^{2t}] = [e^{2t}  te^{2t} ; 0  e^{2t}] [a ; c].

The explanation of the trick is quite simple. Let’s calculate the matrix exponential e^{tA} by writing

A = [2  0 ; 0  2] + [0  1 ; 0  0] = 2I + B,   where B = [0  1 ; 0  0].

The powers of A are easy to compute because B^2 = O: by the binomial theorem,

(2I + B)^k = 2^k I + k2^{k−1} B,

and so

e^{tA} = Σ_{k=0}^{∞} (t^k/k!) A^k = Σ_{k=0}^{∞} (t^k/k!) (2^k I + k2^{k−1} B)
       = Σ_{k=0}^{∞} ((2t)^k/k!) I + Σ_{k=0}^{∞} (t^k/k!) k2^{k−1} B
       = e^{2t} I + t Σ_{k=1}^{∞} ((2t)^{k−1}/(k−1)!) B = e^{2t} I + t Σ_{k=0}^{∞} ((2t)^k/k!) B
       = e^{2t} I + te^{2t} B = [e^{2t}  te^{2t} ; 0  e^{2t}].

A similar phenomenon occurs more generally (see Exercise 14). ▽
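The same trick can be checked numerically (a sketch only, assuming NumPy/SciPy):

import numpy as np
from scipy.linalg import expm

A = np.array([[2.0, 1.0], [0.0, 2.0]])           # A = 2I + B with B^2 = O
B = np.array([[0.0, 1.0], [0.0, 0.0]])
t = 0.3
print(np.allclose(expm(t * A), np.exp(2*t) * (np.eye(2) + t * B)))   # True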
Let’s consider the general nth order linear ODE with constant coefficients:
(⋆) y (n) (t) + an−1 y (n−1) (t) + · · · + a2 ÿ(t) + a1 ẏ(t) + a0 y(t) = 0.
Here a0 , a1 , . . . , an−1 are scalars, and y(t) is assumed to be Cn ; y (k) denotes its kth derivative. We
can use the power of Theorem 3.3 to derive the following general result.
Theorem 3.4. Let n ∈ N. The set of solutions of the nth order ODE (⋆) is an n-dimensional
subspace of C∞ (R), the vector space4 of smooth functions. In particular, the initial value problem
y (n) (t) + an−1 y (n−1) (t) + · · · + a2 ÿ(t) + a1 ẏ(t) + a0 y(t) = 0
y(0) = c0 , ẏ(0) = c1 , ÿ(0) = c2 , ..., y (n−1) (0) = cn−1
has a unique solution.
Proof. The trick is to concoct a way to apply Theorem 3.3. We introduce the vector function x(t) defined by

x(t) = [y(t) ; ẏ(t) ; ÿ(t) ; . . . ; y^{(n−1)}(t)],

and observe that it satisfies the first-order system of ODE’s

ẋ(t) = [ẏ(t) ; ÿ(t) ; . . . ; y^{(n)}(t)] = [0  1  0  · · ·  0 ; 0  0  1  · · ·  0 ; . . . ; 0  0  0  · · ·  1 ; −a_0  −a_1  −a_2  · · ·  −a_{n−1}] [y(t) ; ẏ(t) ; ÿ(t) ; . . . ; y^{(n−1)}(t)] = Ax(t),

where A is the obvious matrix of coefficients. We infer from Theorem 3.3 that the general solution is x(t) = e^{tA} x_0 , so

[y(t) ; ẏ(t) ; ÿ(t) ; . . . ; y^{(n−1)}(t)] = e^{tA} [c_0 ; c_1 ; c_2 ; . . . ; c_{n−1}] = c_0 v_1(t) + c_1 v_2(t) + · · · + c_{n−1} v_n(t),

where v_j(t) are the columns of e^{tA} . In particular, if we let q_1(t), . . . , q_n(t) denote the first entries of the vector functions v_1(t), . . . , v_n(t), respectively, we see that

y(t) = c_0 q_1(t) + c_1 q_2(t) + · · · + c_{n−1} q_n(t);

that is, the functions q_1 , . . . , q_n span the vector space of solutions of the differential equation (⋆). Note that these functions are C∞ since the entries of e^{tA} are. Last, we claim that these functions are linearly independent. For suppose that for some scalars c_0 , c_1 , . . . , c_{n−1} we have

y(t) = c_0 q_1(t) + c_1 q_2(t) + · · · + c_{n−1} q_n(t) = 0.

Then, differentiating, we have the same linear relation among the kth derivatives of q_1 , . . . , q_n , k = 1, . . . , n − 1, and so we have

0 = [y(t) ; ẏ(t) ; ÿ(t) ; . . . ; y^{(n−1)}(t)] = e^{tA} [c_0 ; c_1 ; c_2 ; . . . ; c_{n−1}].

Since e^{tA} is an invertible matrix (see Exercise 17), we infer that c_0 = c_1 = · · · = c_{n−1} = 0, and so {q_1 , . . . , q_n } is linearly independent.

4 See section 3.1 of Chapter 4.
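To see the proof in action, here is a sketch (assuming NumPy/SciPy; the particular equation is the one from Exercise 15a) that converts ÿ − ẏ − 2y = 0, y(0) = −1, ẏ(0) = 4 into a first-order system and reads off y(t) from the first entry of e^{tA} x_0:

import numpy as np
from scipy.linalg import expm

# Companion matrix: x = (y, y'), xdot = A x, since y'' = 2y + y'.
A = np.array([[0.0, 1.0],
              [2.0, 1.0]])
x0 = np.array([-1.0, 4.0])

def y(t):
    return (expm(t * A) @ x0)[0]

# The exact solution is y(t) = e^{2t} - 2e^{-t} (roots 2 and -1 of r^2 - r - 2 = 0).
t = 0.9
print(np.isclose(y(t), np.exp(2*t) - 2*np.exp(-t)))   # True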

Example 9. Let

A = [−3  2 ; 2  −3],

and consider the second-order system of ODE’s

ẍ(t) = Ax(t), x(0) = x_0 , ẋ(0) = ẋ_0 .

The experience we gained in Example 7 suggests that if we can uncouple this system (by finding eigenvalues and eigenvectors), we should expect to find normal modes that are sinusoidal in nature.

The characteristic polynomial of A is p(t) = t^2 + 6t + 5, and so its eigenvalues are λ1 = −1 and λ2 = −5, with corresponding eigenvectors

v1 = [1 ; 1]   and   v2 = [1 ; −1].

(Note, as a check, that because A is symmetric, the eigenvectors are orthogonal. See Exercise 9.2.8.) As usual, we write P^{-1}AP = Λ, where

Λ = [−1  0 ; 0  −5]   and   P = [1  1 ; 1  −1].

Let’s make the “uncoupling” change of coordinates y = P^{-1}x, i.e.,

y = [y1 ; y2] = (1/2) [1  1 ; 1  −1] [x1 ; x2].

Then the system of differential equations becomes

ÿ(t) = P^{-1}ẍ(t) = P^{-1}Ax = ΛP^{-1}x = Λy,

i.e.,

ÿ1(t) = −y1
ÿ2(t) = −5y2 ,

whose general solution is

y1(t) = a1 cos t + b1 sin t
y2(t) = a2 cos √5 t + b2 sin √5 t.

This means that in the original coordinates, we have x = P y, i.e.,

x = [x1 ; x2] = [1  1 ; 1  −1] [a1 cos t + b1 sin t ; a2 cos √5 t + b2 sin √5 t]
  = (a1 cos t + b1 sin t) [1 ; 1] + (a2 cos √5 t + b2 sin √5 t) [1 ; −1].

The four constants can be determined from the initial conditions x_0 and ẋ_0 . In particular, if we start with

x_0 = [1 ; 0]   and   ẋ_0 = [0 ; 0],
then a1 = a2 = 1/2 and b1 = b2 = 0. Note that the form of our solution looks very much like the
normal mode decomposition of the solution (††) of the first-order system earlier.
A physical system that leads to this differential equation is the following. Hooke’s Law says
that a spring with spring constant k exerts a restoring force F = −kx on a mass m that is displaced
x units from its equilibrium position (corresponding to the “natural length” of the spring). Now
imagine a system, as pictured in Figure 3.1, consisting of two masses (m1 and m2 ) connected to
each other and to walls by three springs (with spring constants k1 , k2 , and k3 ). Denote by x1 and
x2 the displacement of masses m1 and m2 , resp., from equilibrium position. Hooke’s Law, as stated
Figure 3.1 (two masses m1 and m2 connected to the walls and to each other by springs with constants k1 , k2 , k3 )

above, and Newton’s second law of motion (“force = mass × acceleration”) give us the following
system of equations:

m1 ẍ1 (t) = −k1 x1 + k2 (x2 − x1 ) = − (k1 + k2 )x1 + k2 x2


m2 ẍ2 (t) = k2 (x1 − x2 ) − k3 x2 = k2 x1 − (k2 + k3 )x2 .

Setting m1 = m2 = 1, k1 = k3 = 1, and k2 = 2 gives the system of differential equations with which


we began. Here the normal modes correspond to sinusoidal motion with x1 = x2 (so we observe the masses moving “in parallel,” the middle spring staying at its natural length) and frequency 1, and sinusoidal motion with x1 = −x2 (so we observe the masses moving “in antiparallel,” the middle spring compressing symmetrically) and frequency √5. ▽
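A numerical check of this normal-mode solution (a sketch only, assuming NumPy/SciPy): rewrite the second-order system as a first-order one in z = (x, ẋ) and compare.

import numpy as np
from scipy.linalg import expm

A = np.array([[-3.0, 2.0], [2.0, -3.0]])
M = np.block([[np.zeros((2, 2)), np.eye(2)],
              [A, np.zeros((2, 2))]])            # zdot = M z with z = (x, xdot)
z0 = np.array([1.0, 0.0, 0.0, 0.0])              # x0 = (1, 0), xdot0 = (0, 0)
t = 1.3
x_numeric = (expm(t * M) @ z0)[:2]
x_modes = 0.5*np.cos(t)*np.array([1.0, 1.0]) + 0.5*np.cos(np.sqrt(5)*t)*np.array([1.0, -1.0])
print(np.allclose(x_numeric, x_modes))           # True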

3.3. Flows and the Divergence Theorem. Let U ⊂ Rn be an open subset. Let F : U → Rn
be a vector field on U . So far we have dealt with vector fields of the form F(x) = Ax, where A is
an n × n matrix. But, more generally, we can try to solve the system of differential equations

ẋ(t) = F(x(t)), x(0) = x0 .

We will write the solution of this initial value problem as φt (x0 ), indicating its functional depen-
dence on both time and the initial value. The function φ is called the flow of the vector field F.
Note that φ0 (x) = x for all x ∈ U .
Examples 10. (a) The flow of the vector field F(x) = x on R is φt(x) = e^t x.
(b) The flow of the vector field F([x ; y]) = [−y ; x] on R2 is

φt([x ; y]) = [cos t  −sin t ; sin t  cos t] [x ; y];

i.e., the flow lines are circles centered at the origin.
(c) Let A = [2  1 ; 5  −2]. The flow of the vector field F(x) = Ax on R2 is

φt(x) = e^{tA} x = P e^{tΛ} (P^{-1}x) = (1/6) ((5x1 + x2) e^{3t} [1 ; 1] + (−x1 + x2) e^{−3t} [−1 ; 5]),

where

Λ = [3  0 ; 0  −3]   and   P = [1  −1 ; 1  5].

(d) The flow of the general linear differential equation ẋ(t) = Ax(t) is given by φt(x) = e^{tA} x. Finding an explicit formula for the flow of a nonlinear differential equation may be somewhat difficult. ▽

It is proved in more advanced courses that if F is a smooth vector field on an open set U ⊂ Rn , then for any x ∈ U , there are a neighborhood V of x and ε > 0 so that for any y ∈ V the flow starting at y, φt(y), is defined for all |t| < ε. Moreover, the function φ : V × (−ε, ε) → Rn , φ(y, t) = φt(y), is smooth. We now want to give another interpretation of divergence of the vector field F, first discussed in Section 6 of Chapter 8. It is a natural generalization of the elementary observation that the derivative of the area of a circle with respect to its radius is the circumference.

First we need to extend the definition of divergence to n dimensions: If F = [F1 ; . . . ; Fn] is a smooth vector field on Rn , we set

div F = ∂F1/∂x1 + ∂F2/∂x2 + · · · + ∂Fn/∂xn .

Proposition 3.5. Let F be a smooth vector field on U ⊂ Rn , let φt denote the flow of F, and let Ω ⊂ U be a compact region with piecewise smooth boundary. Let V(t) = vol(φt(Ω)). Then

V̇(0) = ∫_Ω div F dV .

Remark. Using (the obvious generalization of) the Divergence Theorem, Theorem 6.2 of Chapter 8, we have the intuitively appealing result that

V̇(0) = ∫_{∂Ω} F · n dS.

That is, what causes net increase in the volume of the region is flow across its boundary.

Examples 11. (a) In Figure 3.2, we see the flow of the unit square under the vector field

F([x ; y]) = [2x + y ; 5x − 2y].

Figure 3.2

Note that area is preserved under the flow, as div F = 0.
(b) In Figure 3.3 (with thanks to John Polking’s MATLAB software pplane5), we see the flow of certain regions Ω. In (a), the region expands (as div F > 0), whereas in (b) the region maintains its area (as div F = 0). ▽
Figure 3.3 ((a) the flow φt(Ω) for the system x′ = x + 2y, y′ = −2x + y; (b) the flow φt(Ω) for the system x′ = −x + 2y, y′ = 5x + y)

Proof. We have

V(t) = ∫_Ω φt*(dx1 ∧ · · · ∧ dxn) = ∫_Ω d(φt)1 ∧ · · · ∧ d(φt)n .

By Exercise 7.2.20, we have

V̇(0) = ∫_Ω ∂/∂t|_{t=0} (d(φt)1 ∧ · · · ∧ d(φt)n).

Now the fact that mixed partials are equal tells us that ∂²φt/∂t∂xi = ∂²φt/∂xi∂t, and so ∂/∂t (dφt) = d(∂φt/∂t) = d(φ̇t). Moreover, φ̇0(x) = F(x) (since φ̇t(x) = F(φt(x))), and φ0(x) = x, so the latter integral can be rewritten

V̇(0) = ∫_Ω (dF1 ∧ dx2 ∧ · · · ∧ dxn + dx1 ∧ dF2 ∧ dx3 ∧ · · · ∧ dxn + · · · + dx1 ∧ · · · ∧ dxn−1 ∧ dFn)
     = ∫_Ω div F dx1 ∧ · · · ∧ dxn ,

as required. 
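For a linear vector field F(x) = Ax the proposition can be checked by hand, since φt(Ω) has volume det(e^{tA}) · vol(Ω) and div F = tr A. Here is a small numerical sketch (assuming NumPy/SciPy), using the field of Figure 3.3(a):

import numpy as np
from scipy.linalg import expm

A = np.array([[1.0, 2.0], [-2.0, 1.0]])          # x' = x + 2y, y' = -2x + y; div F = tr A = 2
vol_Omega = 1.0                                  # take Omega to be the unit square
V = lambda t: np.linalg.det(expm(t * A)) * vol_Omega
h = 1e-6
Vdot0 = (V(h) - V(-h)) / (2 * h)                 # numerical derivative at t = 0
print(np.isclose(Vdot0, np.trace(A) * vol_Omega))   # True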

EXERCISES 9.3

 
1. Let A = [2  5 ; 1  −2]. Calculate A^k for all k ≥ 1.

*2. Suppose each of two tubs contains two bottles of beer; two are Budweiser and two are Beck’s.
Each minute, Fraternity Freddy picks a bottle of beer at random from each tub and replaces it
in the other tub. After a long time, what portion of the time will there be exactly one bottle

of Beck’s in the first tub? at least one bottle of Beck’s? (Hint: Let xk be the vector whose
entries are, respectively, the probabilities that there are 2 Beck’s, 1 of each, or 2 Buds in the
first tub.)

*3. Gambling Gus has $200 and plays a game where he must continue playing until he has either
lost all his money or doubled it. In each game, he has a 2/5 chance of winning $100 and a
3/5 chance of losing $100. What is the probability that he eventually loses all his money?
(Warning: Calculator or computer suggested.)

*4. If a0 = 2, a1 = 3, and ak+1 = 3ak − 2ak−1 , for all k ≥ 1, use methods of linear algebra to
determine the formula for ak .

5. If a0 = a1 = 1 and ak+1 = ak + 6ak−1 for all k ≥ 1, use methods of linear algebra to determine
the formula for ak .

6. Suppose a0 = 0, a1 = 1, and ak+1 = 3ak + 4ak−1 for all k ≥ 1. Use methods of linear algebra
to find an explicit formula for ak .

7. If a0 = 0, a1 = 1, and ak+1 = 4ak − 4ak−1 for all k ≥ 1, use methods of linear algebra to
determine the formula for ak . (Hint: The matrix will not be diagonalizable, but you can get
close if you stare at Exercise 9.2.16.)

*8. If a0 = 0, a1 = a2 = 1, and ak+1 = 2ak + ak−1 − 2ak−2 for k ≥ 2, use methods of linear algebra
to determine the formula for ak .

9. Consider the cat/mouse population problem studied in Example 1. Solve the following versions, including an investigation of the dependence on the original populations.
a. c_{k+1} = 0.7c_k + 0.1m_k , m_{k+1} = −0.2c_k + m_k
*b. c_{k+1} = 1.3c_k + 0.2m_k , m_{k+1} = −0.1c_k + m_k
c. c_{k+1} = 1.1c_k + 0.3m_k , m_{k+1} = 0.1c_k + 0.9m_k
What conclusions do you draw?

10. Check that if A is an n × n matrix and the n × n differentiable matrix function E(t) satisfies
Ė(t) = AE(t) and E(0) = I, then E(t) = etA for all t ∈ R.

11. Calculate e^{tA} and use your answer to solve ẋ(t) = Ax(t), x(0) = x_0 .
*a. A = [1  5 ; 2  4], x_0 = [6 ; −1]
b. A = [0  1 ; 1  0], x_0 = [1 ; 3]
c. A = [1  3 ; 3  1], x_0 = [5 ; 1]
*d. A = [1  1 ; −1  3], x_0 = [2 ; −1]
*e. A = [−1  1  2 ; 1  2  1 ; 2  1  −1], x_0 = [2 ; 0 ; 4]
f. A = [1  −2  2 ; −1  0  −1 ; 0  2  −1], x_0 = [3 ; −1 ; −4]
12. Solve ẍ = Ax, x(0) = x_0 , ẋ(0) = ẋ_0 .
*a. A = [1  5 ; 2  4], x_0 = [7 ; 0], ẋ_0 = [−5 ; 2]
b. A = [0  1 ; 1  0], x_0 = [−2 ; 2], ẋ_0 = [1 ; 3]
c. A = [1  3 ; 3  1], x_0 = [−2 ; 4], ẋ_0 = [2 − 3√2 ; 2 + 3√2]
*d. A = [0  1 ; 0  0], x_0 = [1 ; 2], ẋ_0 = [2 ; 1]
13. Find the motion of the two-mass, three-spring system in Example 9 when
a. m1 = m2 = 1 and k1 = k3 = 1, k2 = 3
b. m1 = m2 = 1 and k1 = 1, k2 = 2, k3 = 4
*c. m1 = 1, m2 = 2, k1 = 1, and k2 = k3 = 2

*14. Let

J = [2  1  0 ; 0  2  1 ; 0  0  2].

Calculate e^{tJ} .

*15. By mimicking the proof of Theorem 3.4, convert the following second-order differential equations
into first-order systems and use matrix exponentials to solve:
a. ÿ(t) − ẏ(t) − 2y(t) = 0, y(0) = −1, ẏ(0) = 4
b. ÿ(t) − 2ẏ(t) + y(t) = 0, y(0) = 1, ẏ(0) = 2

16. Let a, b ∈ R. Convert the constant coefficient second-order differential equation

ÿ(t) + aẏ(t) + by(t) = 0

into a first-order system by letting x(t) = [y(t) ; ẏ(t)]. Considering separately the cases a^2 − 4b ≠ 0 and a^2 − 4b = 0, use matrix exponentials to find the general solution.
17. a. Prove that for any square matrix A, (e^A)^{-1} = e^{−A} . (Hint: Show (e^{tA})^{-1} = e^{−tA} for all t ∈ R.)
b. Prove that if A is skew-symmetric (i.e., A^T = −A), then e^A is an orthogonal matrix.
c. Prove that when the eigenvalues of A are real, det(e^A) = e^{tr A} . (Hint: Use Exercise 9.2.22.)

18. Consider the mapping exp : Mn×n → Mn×n given by exp(A) = eA . By Exercise 17, eA is
always invertible.
a. Use the Inverse Function Theorem to show that for every matrix B sufficiently close to I, there is a unique A sufficiently close to O so that e^A = B.
−1 −2
b. Can the matrices and be written in the form eA for some A?
−1 1
19. Use Proposition 3.5 to deduce that the derivative with respect to r of the volume of a ball of
radius r (in Rn ) is the volume (surface area) of the sphere of radius r.

20. It can be proved using (a generalization of) the Contraction Mapping Principle, Theorem 1.2
of Chapter 6, that when F is a smooth vector field, given a, there are δ, ε > 0 so that the
differential equation ẋ(t) = F(x(t)), x(0) = x0 , has a unique solution for all x0 ∈ B(a, δ) and
defined for all |t| < ε.
a. Assuming this result, prove that whenever |s|, |t|, and |s + t| < ε, we have φs+t = φs ◦ φt .
(Hint: Fix t = t0 and vary s.)
b. Deduce that φ−t = (φt )−1 .
c. By considering the example F(x) = √|x|, show that uniqueness may fail when the vector field isn’t smooth. Indeed, show that the initial value problem ẋ(t) = √|x(t)|, x(0) = 0, has infinitely many solutions.
21. Generalizing Proposition 3.5 somewhat, prove that V̇(t) = ∫_{φt(Ω)} div F dV . (Hint: Use Exercise 20 and the Proposition as stated.)

22. a. Show that the space-derivative of the flow φt satisfies the first variation equation

(Dφt(x))• = DF(φt(x)) Dφt(x),   Dφ0(x) = I .

b. For fixed x, let J(t) = det(Dφt(x)). Using Exercise 7.5.23, show that

J̇(t) = div F(φt(x)) J(t).

Deduce that J(t) = e^{∫_0^t div F(φs(x)) ds} .

4. The Spectral Theorem

We now turn to the study of a large class of diagonalizable matrices, the symmetric matrices.
Recall that a square matrix A is symmetric when A = AT . To begin our exploration, let’s start
with a general symmetric 2 × 2 matrix

A = [a  b ; b  c],

whose characteristic polynomial is p(t) = t2 − (a + c)t + (ac − b2 ). By the quadratic formula, its
eigenvalues are
p p
(a + c) ± (a + c)2 − 4(ac − b2 ) (a + c) ± (a − c)2 + 4b2
λ= = .
2 2
Only when A is diagonal are the eigenvalues not distinct. Thus, A is diagonalizable. Moreover, the
corresponding eigenvectors are
" # " #
b λ2 − c
v1 = and v2 = ;
λ1 − a b

note that
v1 · v2 = b(λ2 − c) + (λ1 − a)b = b(λ1 + λ2 − a − c) = 0,
and so the eigenvectors are orthogonal. Since there is an orthogonal basis for R2 consisting of
eigenvectors of A, we of course have an orthonormal basis for R2 consisting of eigenvectors of A.
That is, by an appropriate rotation of the usual basis, we obtain a diagonalizing basis for A.

Example 1. The eigenvalues of

A = [1  2 ; 2  −2]

are λ1 = 2 and λ2 = −3, with corresponding eigenvectors

v1 = [2 ; 1]   and   v2 = [−1 ; 2].

By normalizing the vectors, we obtain an orthonormal basis

Figure 4.1 (the eigenvectors v1 and v2)

q1 = (1/√5) [2 ; 1],   q2 = (1/√5) [−1 ; 2].

See Figure 4.1. ▽

From Proposition 4.5 of Chapter 1 we recall that for all x, y ∈ Rn and n × n matrices A we
have
Ax · y = x · AT y.
In particular, when A is symmetric,

Ax · y = x · Ay.

Generalizing somewhat, we say a linear map T : Rn → Rn is symmetric if T x · y = x · T y for


all x, y ∈ Rn . It is easy to see that the matrix for a symmetric linear map with respect to any
orthonormal basis is symmetric.
In general, we have the following important result. Its name comes from the word spectrum,
associated with the physical concept of decomposing light into its components of different colors.

Theorem 4.1 (Spectral Theorem). Let T : Rn → Rn be a symmetric linear map. Then


(1) The eigenvalues of T are real.
(2) There is an orthonormal basis for Rn consisting of eigenvectors of T . That is, if A is
the standard matrix for T , then there is an orthogonal matrix Q so that Q−1 AQ = Λ is
diagonal.

Proof. We proceed by induction on n. The case n = 1 is automatic. Now assume that the
result is true for all symmetric linear maps T ′ : Rn−1 → Rn−1 . Given a symmetric linear map
T : Rn → Rn , we begin by proving that it has a real eigenvalue. We choose to use calculus to prove
this, but for a purely linear-algebraic proof, see Exercise 16. Consider the function

f : Rn → R, f (x) = Ax · x = xT Ax.

By compactness of the unit sphere, f has a maximum subject to the constraint g(x) = ‖x‖^2 = 1. Applying the method of Lagrange multipliers, we infer that there is a unit vector v so that Df(v) = λDg(v) for some scalar λ. By Exercise 3.2.14, this means

Av = λv,

and so we’ve found an eigenvector of A; the Lagrange multiplier is the corresponding eigenvalue.
(Incidentally, this was derived at the end of Section 4 of Chapter 5.)
By what we’ve just established, T has a real eigenvalue λ1 and a corresponding eigenvector v1 of length 1. Let W = Span(v1)^⊥ ⊂ Rn ; note that if w · v1 = 0, then T(w) · v1 = w · T(v1) =
λ1 w · v1 = 0, so that T (w) ∈ W whenever w ∈ W . If we let T ′ = T |W be the restriction of T
to W , since dim W = n − 1, it follows from our induction hypothesis that there is an orthonormal
basis {v2 , . . . , vn } for W consisting of eigenvectors of T ′ . Then {v1 , v2 , . . . , vn } is the requisite
orthonormal basis for Rn , since T (v1 ) = λ1 v1 and T (vi ) = T ′ (vi ) = λi vi for i ≥ 2. 

Example 2. Consider the symmetric matrix

A = [0  1  1 ; 1  1  0 ; 1  0  1].

Its characteristic polynomial is p(t) = −t^3 + 2t^2 + t − 2 = −(t^2 − 1)(t − 2) = −(t + 1)(t − 1)(t − 2), so the eigenvalues of A are −1, 1, and 2. As the reader can check, the corresponding eigenvectors are

v1 = [−2 ; 1 ; 1],   v2 = [0 ; −1 ; 1],   and   v3 = [1 ; 1 ; 1].

Note that these three vectors form an orthogonal basis for R3 , and we can easily obtain an orthonormal basis by normalizing:

q1 = (1/√6) [−2 ; 1 ; 1],   q2 = (1/√2) [0 ; −1 ; 1],   and   q3 = (1/√3) [1 ; 1 ; 1].

The orthogonal diagonalizing matrix Q is therefore

Q = [−2/√6   0   1/√3 ; 1/√6   −1/√2   1/√3 ; 1/√6   1/√2   1/√3]. ▽
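Numerically, an orthonormal eigenbasis for a symmetric matrix can be obtained with a single library call; here is a sketch (assuming NumPy):

import numpy as np

A = np.array([[0.0, 1.0, 1.0],
              [1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0]])
w, Q = np.linalg.eigh(A)                          # eigenvalues in increasing order: -1, 1, 2
print(np.allclose(Q.T @ Q, np.eye(3)))            # True: Q is orthogonal
print(np.allclose(Q.T @ A @ Q, np.diag(w)))       # True: Q^T A Q is diagonal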

Example 3. Consider the symmetric matrix

A = [5  −4  −2 ; −4  5  −2 ; −2  −2  8].

Its characteristic polynomial is p(t) = −t^3 + 18t^2 − 81t = −t(t − 9)^2 , so the eigenvalues of A are 0, 9, and 9. It is easy to check that

v1 = [2 ; 2 ; 1]

gives a basis for E(0) = N(A). As for E(9), we find

A − 9I = [−4  −4  −2 ; −4  −4  −2 ; −2  −2  −1],

which has rank 1, and so, as the spectral theorem guarantees, E(9) is 2-dimensional, with basis

v2 = [−1 ; 1 ; 0]   and   v3 = [−1 ; 0 ; 2].

If we want an orthogonal (or orthonormal) basis, we must use the Gram-Schmidt process, Theorem 5.3 of Chapter 5: we take w2 = v2 and let

w3 = v3 − proj_{w2} v3 = [−1 ; 0 ; 2] − (1/2) [−1 ; 1 ; 0] = [−1/2 ; −1/2 ; 2].

It is convenient to eschew fractions, and so we let

w3′ = 2w3 = [−1 ; −1 ; 4].

As a check, note that v1 , w2 , w3′ do in fact form an orthogonal basis. As before, if we want the orthogonal diagonalizing matrix Q, we take

q1 = (1/3) [2 ; 2 ; 1],   q2 = (1/√2) [−1 ; 1 ; 0],   and   q3 = (1/(3√2)) [−1 ; −1 ; 4],

whence

Q = [2/3   −1/√2   −1/(3√2) ; 2/3   1/√2   −1/(3√2) ; 1/3   0   4/(3√2)].

We reiterate that repeated eigenvalues cause no problem with symmetric matrices. ▽

We conclude this discussion with a comparison to our study of projections in Chapter 5. Note that if we write out A = QΛQ^{-1} = QΛQ^T , we see that

A = [q_1  q_2  · · ·  q_n] diag(λ_1 , . . . , λ_n) [q_1^T ; q_2^T ; . . . ; q_n^T] = Σ_{i=1}^{n} λ_i q_i q_i^T .

This is the so-called spectral decomposition of A: multiplying by a symmetric matrix A is the same as taking a weighted sum (weighted by the eigenvalues) of projections onto the respective eigenspaces. This is, indeed, a beautiful result with many applications in higher mathematics and physics.
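The spectral decomposition is easy to verify numerically (a sketch only, assuming NumPy; the matrix is the one from Example 3):

import numpy as np

A = np.array([[5.0, -4.0, -2.0],
              [-4.0, 5.0, -2.0],
              [-2.0, -2.0, 8.0]])
w, Q = np.linalg.eigh(A)
# Rebuild A as a weighted sum of rank-one projections q_i q_i^T.
A_rebuilt = sum(w[i] * np.outer(Q[:, i], Q[:, i]) for i in range(3))
print(np.allclose(A, A_rebuilt))   # True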

4.1. Conics and Quadric Surfaces. We now use the Spectral Theorem to analyze the equa-
tions of conic sections and quadric surfaces.

Example 4. Suppose we are given the quadratic equation

x_1^2 + 4x_1x_2 − 2x_2^2 = 6

to graph. Then we notice that we can write the quadratic expression

x_1^2 + 4x_1x_2 − 2x_2^2 = [x_1  x_2] [1  2 ; 2  −2] [x_1 ; x_2] = x^T Ax,

where

A = [1  2 ; 2  −2]

is the symmetric matrix we analyzed in Example 1 above. Thus, we know that

A = QΛQ^T ,   where Q = (1/√5) [2  −1 ; 1  2]   and   Λ = [2  0 ; 0  −3].

So, if we make the substitution y = Q^T x, then we have

x^T Ax = x^T (QΛQ^T)x = (Q^T x)^T Λ(Q^T x) = y^T Λy = 2y_1^2 − 3y_2^2 .

Note that the conic is much easier to understand in the y1y2-coordinates. Indeed, we recognize that the equation 2y_1^2 − 3y_2^2 = 6 can be written in the form

y_1^2/3 − y_2^2/2 = 1,

from which we see that this is a hyperbola with asymptotes y_2 = ±√(2/3) y_1 , as pictured in Figure 4.2.

Figure 4.2

Now recall that the y1y2-coordinates are the coordinates with respect to the basis formed by the column vectors of Q. Thus, if we want to sketch the picture in the original x1x2-coordinates, we first draw in the basis vectors q1 and q2 , and these establish the y1- and y2-axes, respectively, as shown in Figure 4.3. ▽

Figure 4.3
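A quick numerical confirmation that the substitution y = Q^T x diagonalizes the quadratic form (a sketch, assuming NumPy; note that eigh may order the eigenvalues differently from the discussion above):

import numpy as np

A = np.array([[1.0, 2.0], [2.0, -2.0]])
w, Q = np.linalg.eigh(A)                  # eigenvalues -3, 2 in increasing order
x = np.random.rand(2)
y = Q.T @ x
print(np.isclose(x @ A @ x, w[0]*y[0]**2 + w[1]*y[1]**2))   # True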



It’s worth recalling that the equation

x_1^2/a^2 + x_2^2/b^2 = 1

represents an ellipse (with semiaxes a and b), whereas the equation

x_1^2/a^2 − x_2^2/b^2 = 1

represents a hyperbola with vertices (±a, 0) and asymptotes x_2 = ±(b/a) x_1 .
Quadric surfaces include those shown in Figure 4.4: ellipsoids, cylinders, and hyperboloids of 1 and 2 sheets. There are also paraboloids (both elliptic and hyperbolic), but we come to these a bit later. We turn to another example.

Figure 4.4 (ellipsoid, cylinder, hyperboloid of one sheet, hyperboloid of two sheets)

Example 5. Consider the surface defined by the equation

2x_1x_2 + 2x_1x_3 + x_2^2 + x_3^2 = 2.

We observe that if

A = [0  1  1 ; 1  1  0 ; 1  0  1]

is the symmetric matrix from Example 2, then

x^T Ax = 2x_1x_2 + 2x_1x_3 + x_2^2 + x_3^2 ,

and so we use the diagonalization and the substitution y = Q^T x as before to write

x^T Ax = y^T Λy,   where Λ = diag(−1, 1, 2);

that is, in terms of the coordinates y = [y_1 ; y_2 ; y_3], we have

2x_1x_2 + 2x_1x_3 + x_2^2 + x_3^2 = −y_1^2 + y_2^2 + 2y_3^2 ,

and the graph of −y_1^2 + y_2^2 + 2y_3^2 = 2 is the hyperboloid of one sheet shown in Figure 4.5. This is the picture with respect to the “new basis” {q1 , q2 , q3 } (given in the solution of Example 2).

Figure 4.5

The picture with respect to the standard basis, then, is as shown in Figure 4.6. (This figure is obtained by multiplying by the matrix Q. Why?) ▽

Figure 4.6

The alert reader may have noticed that we’re lacking certain curves and surfaces. If there are linear terms present along with the quadratic, we must adjust accordingly. For example, we recognize that

x_1^2 + 2x_2^2 = 1

is the equation of an ellipse centered at the origin. Correspondingly, by completing the square, we see that

x_1^2 + 2x_1 + 2x_2^2 − 3x_2 = 13/2

is the equation of a congruent ellipse centered at (−1, 3/4). However, the linear terms become all important when the symmetric matrix defining the quadratic terms is singular. For example,

x_1^2 − x_1 = 1

defines a pair of lines, whereas

x_1^2 − x_2 = 1

defines a parabola.

Example 6. We wish to sketch the surface

5x_1^2 − 8x_1x_2 − 4x_1x_3 + 5x_2^2 − 4x_2x_3 + 8x_3^2 + 2x_1 + 2x_2 + x_3 = 9.

No, we did not pull this mess out of a hat. The quadratic terms came, as might be predicted, from Example 3. Thus, we make the change of coordinates given by y = Q^T x, with

Q = [2/3   −1/√2   −1/(3√2) ; 2/3   1/√2   −1/(3√2) ; 1/3   0   4/(3√2)].

Since x = Qy, we have

2x_1 + 2x_2 + x_3 = [2  2  1] Qy = 3y_1 ,

and so our given equation becomes, in the y1y2y3-coordinates,

9y_2^2 + 9y_3^2 + 3y_1 = 9.

Rewriting this a bit, we have

y_1 = 3(1 − y_2^2 − y_3^2),

which we recognize as a (circular) paraboloid, shown in Figure 4.7.

Figure 4.7

The sketch of the surface in our original x1x2x3-coordinates is then shown in Figure 4.8. ▽

Figure 4.8

EXERCISES 9.4

1. Find orthogonal matrices that diagonalize each of the following symmetric matrices:
*a. [6  2 ; 2  9]
b. [2  0  0 ; 0  1  −1 ; 0  −1  1]
*c. [2  2  −2 ; 2  −1  −1 ; −2  −1  −1]
*d. [3  2  2 ; 2  2  0 ; 2  0  4]
e. [1  −2  2 ; −2  1  2 ; 2  2  1]
f. [1  0  1  0 ; 0  1  0  1 ; 1  0  1  0 ; 0  1  0  1]
   
*2. Suppose A is a symmetric matrix with eigenvalues 2 and 5. If the vectors [1 ; 1 ; 1] and [1 ; −1 ; 0] span the 5-eigenspace, what is A [1 ; 1 ; 2]? Give your reasoning.

3. A symmetric matrix A has eigenvalues 1 and 2. Find A if [1 ; 1 ; 1] spans E(2).

4. Suppose A is symmetric, A [1 ; 1] = [2 ; 2], and det A = 6. Give the matrix A. Explain your reasoning clearly. (Hint: What are the eigenvalues of A?)

*5. Prove that if λ is the only eigenvalue of a symmetric matrix A, then A = λI.

6. Decide (as efficiently as possible) which of the following matrices are diagonalizable. Give your reasoning.

A = [5  0  2 ; 0  5  0 ; 0  0  5],   B = [5  0  2 ; 0  5  0 ; 2  0  5],
C = [1  2  4 ; 0  2  2 ; 0  0  3],   D = [1  2  4 ; 0  2  2 ; 0  0  1].

7. Suppose A is a diagonalizable matrix whose eigenspaces are orthogonal. Prove that A is sym-
metric.

8. Suppose A is a symmetric n × n matrix. Using the spectral theorem, prove that if Ax · x = 0


for every vector x ∈ Rn , then A = O.

9. Apply the spectral theorem to prove that any symmetric matrix A satisfying A2 = A is in fact
a projection matrix.

10. Suppose T is a symmetric linear map satisfying [T]^4 = I. Use the spectral theorem to give a complete description of T : Rn → Rn . (Hint: For starters, what are the potential eigenvalues of T ?)

11. Let A be an m × n matrix. Show that ‖A‖ = √λ, where λ is the largest eigenvalue of the symmetric matrix A^T A.

12. We say a symmetric matrix A is positive definite if Ax · x > 0 for all x ≠ 0, negative definite if Ax · x < 0 for all x ≠ 0, and positive (resp., negative) semidefinite if Ax · x ≥ 0 (resp., ≤ 0) for all x.
a. Prove that if A and B are positive (negative) definite, then so is A + B.
b. Prove that A is positive (resp., negative) definite if and only if all its eigenvalues are positive (resp., negative).
c. Prove that A is positive (resp., negative) semidefinite if and only if all its eigenvalues are nonnegative (resp., nonpositive).
d. Prove that if C is any m × n matrix of rank n, then A = C^T C has positive eigenvalues.
e. Prove or give a counterexample: if A and B are positive definite, then so is AB. What about AB + BA?

13. Let A be an n × n matrix. Prove that A is nonsingular if and only if every eigenvalue of AT A
is positive.

14. Prove that if A is a positive semidefinite (symmetric) matrix, then there is a unique positive semidefinite (symmetric) matrix B with B^2 = A.

15. Suppose A and B are symmetric and AB = BA. Prove there is an orthogonal matrix Q so that
both Q−1 AQ and Q−1 BQ are diagonal. (Hint: Let λ be an eigenvalue of A. Use the Spectral
Theorem to show that there is an orthonormal basis for E(λ) consisting of eigenvectors of B.)

16. Prove using only methods of linear algebra that the eigenvalues of a symmetric matrix are real. (Hints: Let λ = a + bi be a putative complex eigenvalue of A, and consider the real matrix

B = (A − (a + bi)I)(A − (a − bi)I) = A^2 − 2aA + (a^2 + b^2)I = (A − aI)^2 + b^2 I .

Show that B is singular, and that if v ∈ N(B) is a nonzero vector, then (A − aI)v = 0 and b = 0.)

17. If A is a positive definite symmetric n × n matrix, what is the volume of the n-dimensional
ellipsoid {x ∈ Rn : Ax · x ≤ 1}? (See also Exercise 7.6.3.)

18. Sketch the following conic sections, giving axes of symmetry and asymptotes (if any).
a. 6x_1x_2 − 8x_2^2 = 9
*b. 3x_1^2 − 2x_1x_2 + 3x_2^2 = 4
*c. 16x_1^2 + 24x_1x_2 + 9x_2^2 − 3x_1 + 4x_2 = 5
d. 10x_1^2 + 6x_1x_2 + 2x_2^2 = 11
e. 7x_1^2 + 12x_1x_2 − 2x_2^2 − 2x_1 + 4x_2 = 6

19. Sketch the following quadric surfaces.
*a. 3x_1^2 + 2x_1x_2 + 2x_1x_3 + 4x_2x_3 = 4
b. 4x_1^2 − 2x_1x_2 − 2x_1x_3 + 3x_2^2 + 4x_2x_3 + 3x_3^2 = 6
c. −x_1^2 + 2x_2^2 − x_3^2 − 4x_1x_2 − 10x_1x_3 + 4x_2x_3 = 6
*d. 2x_1^2 + 2x_1x_2 + 2x_1x_3 + 2x_2x_3 − x_1 + x_2 + x_3 = 1
e. 3x_1^2 + 4x_1x_2 + 8x_1x_3 + 4x_2x_3 + 3x_3^2 = 8
f. 3x_1^2 + 2x_1x_3 − x_2^2 + 3x_3^2 + 2x_2 = 0

20. Let a, b, c ∈ R, and let Q(x) = ax_1^2 + 2bx_1x_2 + cx_2^2 .
a. The Spectral Theorem tells us that there exists an orthonormal basis for R2 with respect to whose coordinates y1 , y2 we have

Q(x) = Q̃(y) = λy_1^2 + µy_2^2 .

In high school analytic geometry one derives the formula

cot 2α = (a − c)/(2b)

for the angle α through which we must rotate the x1x2-axes to get the appropriate y1y2-axes. Derive this using eigenvalues and eigenvectors, and determine the type (ellipse, hyperbola, etc.) of the conic section Q(x) = 1 from a, b, and c. (Hint: Use the characteristic polynomial to eliminate λ^2 in your computation of tan 2α.)
b. Use the formula for Q̃ above to find the maximum and minimum of Q on the unit circle ‖x‖ = 1.

21. In this exercise we consider the nature of the restriction of a quadratic form to a hyperplane.
Let A be a symmetric n × n matrix.

a. Show that the quadratic form Q(x) = x^T Ax on Rn is positive definite when restricted to the subspace x_n = 0 if and only if all the roots of

det [ A − tI   (0, . . . , 0, 1)^T ; (0, . . . , 0, 1)   0 ] = 0

are positive.
b. Use the change-of-basis theorem to prove that the restriction to the subspace b · x = 0 is positive definite if and only if all the roots of

det [ A − tI   b ; b^T   0 ] = 0

are positive.
c. Use this result to give a bordered Hessian test for the point a to be a constrained maximum (minimum) of the function f subject to the constraint g = c. (See Exercises 5.4.34 and 5.4.32b.)
d. What is the analogous result for an arbitrary subspace?

22. We saw in Section 3 of Chapter 5 that we can write a symmetric n × n matrix A in the form
A = LDLT (where L is lower triangular with diagonal entries 1 and D is diagonal); we saw
in this section that we can write A = QΛQT for some orthogonal matrix Q. Although the
diagonal entries of D obviously need not be the eigenvalues of A, the point of this exercise is
to see that the signs of these numbers must agree. That is, the number of positive entries in D
equals the number of positive eigenvalues of A, the number of negative entries in D equals the
number of negative eigenvalues of A, and the number of zero (diagonal) entries in D equals the
number of zero eigenvalues.
a. Assume first that A is nonsingular. Consider the “straight line path” joining I and L (stick
a parameter s in front of the non-diagonal entries of L and let s vary from 0 to 1). We
then obtain a path in Mn×n joining D and A. Show that all the matrices in this path are
nonsingular and, applying Exercise 8.7.9, show that the number of positive eigenvalues of
D equals the number of positive eigenvalues of A. Deduce the result in this case.
b. In general, prove that the number of zero diagonal entries in D is equal to dim N(A) =
dim E(0). By considering the matrix A + εI for ε > 0 sufficiently small, use part a to
deduce the result.

Remark. Comparing Proposition 3.5 of Chapter 5 with Exercise 12 above, we can easily
derive the result of this exercise when A is either positive or negative definite. But the indefinite
case is more subtle.
Glossary of Notations and Results from
Single-Variable Calculus
Notations

notation definition discussion/page reference

∈ is an element of x ∈ X means that x belongs


to the set X
⊂ subset X ⊂ Y means that every ele-
ment x of X belongs to Y as
well; two sets X and Y are equal
if X ⊂ Y and Y ⊂ X.
( proper subset X ( Y means that X ⊂ Y and
X 6= Y .
=⇒ implies P =⇒ Q means that whenever
P is true, Q must be as well
⇐⇒ if and only if P ⇐⇒ Q means P =⇒ Q and
Q =⇒ P
gives by row operations See p. 124
\binom{n}{k} binomial coefficient 163
∂f/∂xj partial derivative of f with respect to xj 79
∂²f/∂xj∂xi second-order partial derivative 115
∇2 f Laplacian of f 117
∇f gradient of f 99
∫R f dA, ∫R f dV integral of f over R 256

∧ wedge product 324


∂R boundary of R 342
Ai ith row vector of the matrix A 28


aj j th column vector of the matrix A 28


A−1 inverse of the matrix A 33
AT transpose of the matrix A 34
Aθ matrix giving rotation through angle θ 27

\overrightarrow{AB} vector corresponding to the directed line segment from A to B 1
AB product of the matrices A and B 30
Ax product of the matrix A and the vector x 28
Aij (n − 1) × (n − 1) matrix obtained by deleting the 302
ith row and the j th column from the n × n matrix
A
B(a, δ) ball 62
B(a, δ) closed ball 66
B basis 396
C1 , Ck , C∞ continuously differentiable, smooth functions 89, 115, 115, 159
C(A) column space of the matrix A 163
CB coordinates with respect to a basis B 398
cij ij th cofactor 302
curl F curl of vector field F 377
d (exterior) derivative 325
Df (a) derivative of f at a 84
Dv f (a) directional derivative of f at a in direction v 79
det A determinant of the square matrix A 297
div F divergence of vector field F 378
{e1 , . . . , en } standard basis for Rn 19, 154
E(λ) λ-eigenspace 406
eA exponential of the square matrix A 422
 
f([x1 ; . . . ; xn]) function of a vector variable 56

f˜ extension of f by 0 260
f average value of f 288

graph(f ) graph of the function f 55


Hess(f ) Hessian matrix of f 199
Hf,a quadratic form associated to Hessian of f 199
g∗ ω pullback of ω by g 327
I identity matrix, moment of inertia 290
In n × n identity matrix 33
image(T ) image of a linear transformation T 164
κ(s) curvature 110
ker(T ) kernel of a linear transformation T 164
lim f (x) limit of f (x) as x approaches a 69
x→a

ℓ(g) arclength of g 107


L(f, P) lower sum of f with respect to partition P 256
Λk (Rn )∗ vector space of alternating multilinear functions 323
from (Rn )k to R
Mm×n vector space of m × n matrices 159
µA linear transformation defined by multiplication 28
by A
n outward-pointing unit normal 348
N(A) nullspace of the matrix A 164
N(s) principal normal vector 110
ω differential form 325
⋆ω star operator 331
Ω region 260
P plane, parallelogram, or partition 17, 42, 255
Pk vector space of polynomials of degree ≤ k 159
pA (t) characteristic polynomial of the matrix A 408
projy x projection of x onto y 10
projV b projection of b onto the subspace V 216
Q quadratic form 200
r, θ polar coordinates 276
r, θ, z cylindrical coordinates 280

ρ, φ, θ spherical coordinates 281


Rn (real) n-dimensional space 1
(Rn )∗ vector space of linear maps from Rn to R 321
R rectangle 62
R(A) row space of the matrix A 164
ρ(x) rotation of x ∈ R2 through angle π/2 15
S n−1 unit sphere 387
Span(v1 , . . . , vk ) span of v1 , . . . , vk 18
sup S least upper bound (supremum) of S 66
S closure of the subset S 67
T linear map (or transformation) 23
[T ] standard matrix of linear map T 24
‖T‖ norm of the linear map T 93
‖T‖ cubical norm of the linear map T 311
T(s) unit tangent vector 110
trA trace of the matrix A 39
U (f, P) upper sum of f with respect to partition P 256
U +V sum of the subspaces U and V 23
U ∩V intersection of the subspaces U and V 23
x×y cross product of the vectors x, y ∈ R3 46
V⊥ orthogonal complement of subspace V 21
‖x‖ length of the vector x 2
xk sequence 64
xkj subsequence 67
x least squares solution 216
x·y dot product of the vectors x and y 8
hx, yi inner product of the vectors x and y 226
xk , x⊥ components of x parallel to and orthogonal to 10
another vector
0 zero vector 1
O zero matrix 29

Results from Single-Variable Calculus

Intermediate Value Theorem: Let f : [a, b] → R be continuous. Then for any y between f (a)
and f (b), there is x ∈ [a, b] with f (x) = y.
Rolle’s Theorem: Suppose f : [a, b] → R is continuous, f is differentiable on (a, b) and f (a) =
f (b). Then there is c ∈ (a, b) so that f ′ (c) = 0. (Proof : By the maximum value theorem, Theorem
1.2 of Chapter 5, f takes on its maximum and minimum values on [a, b]. If f is constant on [a, b],
then f ′ (c) = 0 for all c ∈ (a, b). If not, say f (x) > f (a) for some x ∈ (a, b), in which case f takes on
a global maximum at some c ∈ (a, b). Then f ′ (c) = 0 (by Lemma 2.1 of Chapter 5). Alternatively,
f (x) < f (a) for some x ∈ (a, b), in which case f takes on a global minimum at some c ∈ (a, b).
Then in this case, as well, f ′ (c) = 0.)
Mean Value Theorem: Suppose f : [a, b] → R is continuous and f is differentiable on (a, b).
Then there is c ∈ (a, b) so that f (b) − f (a) = f ′ (c)(b − a).
Fundamental Theorem of Calculus, Part I: Suppose f is continuous on [a, b] and we set F(x) = ∫_a^x f(t) dt. Then F′(x) = f(x) for all x ∈ (a, b).
Fundamental Theorem of Calculus, Part II: Suppose f is integrable on [a, b] and f = F′. Then ∫_a^b f(x) dx = F(b) − F(a).

Basic differentiation formulas:


product rule: (f g)′ = f ′ g + f g ′
quotient rule: (f /g)′ = (f ′ g − f g ′ )/g 2
chain rule: (f ◦ g)′ = (f ′ ◦ g)g′
Note: We use log to denote the natural logarithm (ln).

Function Derivative
x^n nx^{n−1}
e^x e^x
log x 1/x
sin x cos x
cos x − sin x
tan x sec^2 x
sec x sec x tan x
cot x − csc^2 x
csc x − csc x cot x
arcsin x 1/√(1 − x^2)
arctan x 1/(1 + x^2)

Basic trigonometric formulas:

sin^2 θ + cos^2 θ = 1    tan^2 θ + 1 = sec^2 θ    cot^2 θ + 1 = csc^2 θ
cos 2θ = cos^2 θ − sin^2 θ = 2 cos^2 θ − 1 = 1 − 2 sin^2 θ
sin 2θ = 2 sin θ cos θ
law of cosines: c^2 = a^2 + b^2 − 2ab cos γ
law of sines: (sin α)/a = (sin β)/b = (sin γ)/c

Basic integration formulas:

integration by parts: ∫ f′(x)g(x) dx = f(x)g(x) − ∫ f(x)g′(x) dx
integration by substitution: ∫ f(g(x))g′(x) dx = F(g(x)), where F(u) = ∫ f(u) du

Miscellaneous integration formulas:

∫ x^n dx = x^{n+1}/(n+1), n ≠ −1            ∫ e^x dx = e^x
∫ dx/x = log|x|                             ∫ sin x dx = −cos x
∫ cos x dx = sin x                          ∫ tan x dx = −log|cos x|
∫ sin^2 x dx = (1/2)(x − sin x cos x)       ∫ cos^2 x dx = (1/2)(x + sin x cos x)
∫ tan^2 x dx = tan x − x                    ∫ sec x dx = log|sec x + tan x|
∫ sec^2 x dx = tan x                        ∫ sec^3 x dx = (1/2)(sec x tan x + log|sec x + tan x|)
∫ sin^3 x dx = −cos x + (1/3)cos^3 x        ∫ cos^3 x dx = sin x − (1/3)sin^3 x
∫ tan^3 x dx = (1/2)tan^2 x + log|cos x|    ∫ dx/√(a^2 − x^2) = arcsin(x/a)
∫ dx/(a^2 + x^2) = (1/a) arctan(x/a)        ∫ dx/(a^2 − x^2) = (1/(2a)) log|(x + a)/(x − a)|
∫ √(x^2 ± a^2) dx = (x/2)√(x^2 ± a^2) ± (a^2/2) log|x + √(x^2 ± a^2)|
∫ dx/√(x^2 ± a^2) = log|x + √(x^2 ± a^2)|
∫ √(a^2 − x^2) dx = (x/2)√(a^2 − x^2) + (a^2/2) arcsin(x/a)
∫ log x dx = x log x − x
∫ e^{ax} sin bx dx = (e^{ax}/(a^2 + b^2))(a sin bx − b cos bx)
∫ e^{ax} cos bx dx = (e^{ax}/(a^2 + b^2))(a cos bx + b sin bx)

Greek alphabet

alpha α iota ι rho ρ


beta β kappa κ sigma σ Σ
gamma γ Γ lambda λ Λ tau τ
delta δ ∆ mu µ upsilon υ Υ
epsilon ǫ (ε) nu ν phi φ (ϕ) Φ
zeta ζ xi ξ Ξ chi χ
eta η omicron o psi ψ Ψ
theta θ Θ pi π Π omega ω Ω
For Further Reading

Apostol, Tom M., Calculus (two volumes), 2nd ed. Waltham, MA: Blaisdell Publishing Co., 1967.
Although the first volume is needed for rudimentary vector algebra, the second volume includes
linear algebra, multivariable calculus (although only treating the “classic” versions of Stokes’s
Theorem), and an introduction to probability theory and numerical analysis.

Bamberg, Paul, and Sternberg, Shlomo, A Course in Mathematics for Students of Physics (two
volumes), Cambridge: Cambridge University Press, 1988. This book includes much of the
mathematics of our course, but also a volume’s worth of interesting physics (using differential
forms).

Edwards, Jr., C. H. , Advanced Calculus of Several Variables, New York: Dover Publications, 1994
(originally published by Academic Press, 1973). This very well-written book parallels ours for
students who have already had standard courses in linear algebra and multivariable calculus.
Of particular note is the last chapter, on the calculus of variations.

Friedberg, Stephen H., Insel, Arnold J., and Spence, Lawrence E., Linear Algebra, 3rd ed. Up-
per Saddle River, NJ: Prentice Hall, 1997. A well-written, somewhat more advanced book
concentrating on the theoretical aspects of linear algebra.

Hubbard, John H., and Hubbard, Barbara Burke, Vector Calculus, Linear Algebra, and Differential
Forms: A Unified Approach, 2nd ed. Upper Saddle River, NJ: Prentice Hall, 2002. Very similar
in spirit to our text, this book is wonderfully idiosyncratic and includes Lebesgue integration,
Kantarovich’s Theorem, and the exterior derivative from a non-standard definition. It also
treats the Taylor polynomial in several variables.

Shifrin, Theodore, and Adams, Malcolm, Linear Algebra: A Geometric Approach, New York:
W. H. Freeman, 2002. Includes a few advanced topics in linear algebra that we did not have
time to discuss in this text, e.g., complex eigenvalues, Jordan canonical form, and computer
graphics.

Spivak, Michael, Calculus, 3rd ed. Houston, TX: Publish or Perish, 1994. The beautiful, ultimate
source for single-variable calculus “done right.”


Spivak, Michael, Calculus on Manifolds: A Modern Approach to Classical Theorems of Advanced Calculus, Boulder, CO: Westview Press, 1965. A very terse and sophisticated version of this text, intended to introduce students who’ve seen linear algebra and multivariable calculus to the rigorous approach and to a more formal treatment of Stokes’s Theorem.

Strang, Gilbert, Linear Algebra and its Applications, 3rd ed. Philadelphia: Saunders, 1988. A
classic text, with far more depth on applications.

More Advanced Reading

Flanders, Harley, Differential Forms with Applications to the Physical Sciences, New York: Dover
Publications, 1989 (originally published by Academic Press in 1963). A short, sophisticated
treatment of differential forms with applications to physics, topology, differential geometry, and
partial differential equations.

Guillemin, Victor, and Pollack, Alan, Differential Topology, Englewood Cliffs, NJ: Prentice Hall,
1974. The perfect follow-up to our introduction to manifolds and the material of Chapter 8,
Section 7.

Munkres, James, Topology, 2nd ed. Upper Saddle River, NJ: Prentice Hall, 2000. A classic,
extremely well-written text on point-set topology, to follow up on our discussion of open and
closed sets, compactness, maximum value theorem, etc.

Shifrin, Theodore, Abstract Algebra: A Geometric Approach, Upper Saddle River, NJ: Prentice
Hall, 1996. A first course in abstract algebra that will be accessible to anyone who’s enjoyed
this course.

Wilkinson, J. H., The Algebraic Eigenvalue Problem, New York: Oxford Univ. Press, 1965. An advanced book that includes a proof of the algorithm based on the Gram-Schmidt process to calculate eigenvalues and eigenvectors numerically.
