Preface
I began writing this text as I taught a brand-new course combining linear algebra and a rigorous
approach to multivariable calculus. Some of the students had already taken a proof-oriented single-
variable calculus course (using Spivak’s beautiful book, Calculus), but many had not: There were
sophomores who wanted a more challenging entrée to higher-level mathematics, as well as freshmen
who’d scored a 5 on the Advanced Placement Calculus BC exam. My goal was to include all the standard computational material found in the usual linear algebra and multivariable calculus courses and more, to interweave the material as effectively as possible, and to include complete proofs.
Although there have been a number of books that include both the linear algebra and the
calculus material, they have tended to segregate the material. Advanced calculus books treat the
rigorous multivariable calculus, but presume the students have already mastered linear algebra. I
wanted to integrate the material so as to emphasize the recurring theme of implicit versus explicit
that persists in linear algebra and analysis. In every linear algebra course we should learn how to
go back and forth between a system of equations and a parametrization of its solution set. But the
same problem occurs, in principle, in calculus: To solve constrained maximum/minimum problems
we must either parametrize the constraint set or use Lagrange multipliers; to integrate over a curve
or surface, we need a parametric representation. Of course, in the linear case one can globally
go back and forth; it’s not so easy in the nonlinear case, but, as we’ll learn, it should at least be
possible in principle locally.
The prerequisites for this book are a solid background in single-variable calculus and, if not some
experience writing proofs, a strong interest in grappling with them. In presenting the material, I
have included plenty of examples, clear proofs, and significant motivation for the crucial concepts.
We all know that to learn (and enjoy?) mathematics one must work lots of problems, from the
routine to the more challenging. To this end, I have provided numerous exercises of varying levels
of difficulty, both computational and more proof-oriented. Some of the proof exercises require the
student “merely” to understand and modify a proof in the text; others may require a good deal
of ingenuity. I also ask students for lots of examples and counterexamples. Generally speaking,
exercises are arranged in order of increasing difficulty. To offer a bit more guidance, I have marked
with an asterisk (*) those problems to which short answers, hints, or—in some cases—complete
solutions are given at the back of the book. As a guide to the new teacher, I have marked with
a sharp (♯ ) some important exercises to which reference is made later. An Instructors’ Solutions
Manual is available from the publisher.
Comments on Contents
The linear algebraic material with which we begin the course in Chapter 1 is concrete, establishes the link with geometry, and is a good self-contained setting for working on proofs. We introduce vectors, dot products, subspaces, linear transformations, and matrix computations.
At this early stage we emphasize the two interpretations of multiplying a matrix A by a vector
x: the linear equations viewpoint (considering the dot products of the rows of A with x) and the
linear combinations viewpoint (taking the linear combination of the columns of A weighted by the
coordinates of x). We end the chapter with a discussion of 2 × 2 and 3 × 3 determinants, area,
volume, and the cross product.
In Chapter 2 we begin to make the transition to calculus, introducing scalar functions of a
vector variable—their graphs and their level sets—and vector-valued functions. We introduce the
requisite language of open and closed sets, sequences, and limits and continuity, including the proofs
of the usual limit theorems. (Generally, however, I give these short shrift in lecture, as I don’t have
the time to emphasize δ-ε arguments.)
We come to the concepts of differential calculus in Chapter 3. We quickly introduce partial and directional derivatives, which are immediate to calculate, and then come to the definition of differentiability, the characterization of differentiable functions, and the standard differentiation rules. We give the
gradient vector its own brief section, in which we emphasize its geometric meaning. Then comes
a section on curves, in which we mention Kepler’s laws (the second is proved in the text and the
other two are left as an exercise), arclength, and curvature of a space curve.
In the first four sections of Chapter 4 we give an accelerated treatment of Gaussian elimination
(including a proof of uniqueness of reduced echelon form) and the theory of linear systems, the
standard material on linear independence and dimension (including a brief mention of abstract
vector spaces), and the four fundamental subspaces associated to a matrix. In the last section, we
begin our assault on the nonlinear case, introducing (with no proofs) the implicit function theorem
and the notion of a manifold.
Chapter 5 is a blend of topology, calculus, and linear algebra—quadratic forms and projections.
We start with the topological notion of compactness and prove the maximum value theorem in
higher dimensions. We then turn to the calculus of applied maximum/minimum problems and then
to the analysis of the second-derivative test and the Hessian. Then comes one of the most important
topics in applications, Lagrange multipliers (with a rigorous proof). In the last section, we return
to linear algebra, to discuss projections (from both the explicit and the implicit approaches), least-
squares solutions of inconsistent systems, the Gram-Schmidt process, and a brief discussion of
abstract inner product spaces (including a nice proof of Lagrange interpolation).
Chapter 6 is a brief, but sophisticated, introduction to the inverse and implicit function theorems. We present our favorite proof, using the contraction mapping principle (which is more elegant and works just fine in the infinite-dimensional setting). In the last section we prove that all
three definitions of a manifold are (locally) equivalent: the implicit representation, the parametric
representation, and the representation as a graph. (In the year-long course that I teach, I find I
have time to treat this chapter only lightly.)
In Chapter 7 we study the multidimensional (Riemann) integral. In the first two sections we
deal predominantly with the theory of the multiple integral and, then, Fubini’s Theorem and the
computation of iterated integrals. Then we introduce (as is customary in a typical multivariable
calculus course) polar, cylindrical, and spherical coordinates and various physical applications. We
conclude the chapter with a careful treatment of determinants (which will play a crucial role in
Chapters 8 and 9) and a proof of the Change of Variables Theorem.
In single-variable calculus, one of the truly impressive results is the Fundamental Theorem
of Calculus. In Chapter 8 we start by laying the groundwork for the analogous multidimensional
result, introducing differential forms in a very explicit fashion. We then parallel a traditional vector
calculus course, introducing line integrals and Green’s Theorem, surface integrals and flux, and,
then, finally stating and proving the general Stokes’ Theorem for compact oriented manifolds. We
do not skimp on concrete and nontrivial examples throughout. In Section 8.6 we introduce the
standard terminology of divergence and curl and give the “classical” versions of Stokes’ and the
Divergence Theorems, along with some applications to physics. In Section 8.7 we begin to illustrate
the power of Stokes’ Theorem by proving the Fundamental Theorem of Algebra, a special case of
the argument principle, and the “hairy ball theorem” from topology.
In Chapter 9 we complete our study of linear algebra, including standard material on change of basis (with a geometric slant), eigenvalues and eigenvectors, and a discussion of diagonalizability. The
remainder of the chapter is devoted to applications: difference and differential equations, and a
brief discussion of flows and their relation to the Divergence Theorem of Chapter 8. We close with
the Spectral Theorem. (With the exception of Section 3.3, which relies on Chapter 8, and the
proof of the Spectral Theorem, which relies on Section 4 of Chapter 5, topics in this chapter can
be covered at any time after completing Chapter 4.)
We have included a glossary of notations and a quick compilation of relevant results from
trigonometry and single-variable calculus (including a short table of integrals), along with a much-
requested list of the Greek alphabet.
There are over 800 exercises in the text, many with multiple parts. Here are a few particularly
interesting (and somewhat unusual) exercises included in this text:
• Exercises 1.2.22–26 and Exercises 1.5.19 and 1.5.20 on the geometry of triangles, and Exercise 1.5.17, a nice glimpse of affine geometry;
• Exercise 2.1.12, a parametrization of a hyperboloid of one sheet in which the parameter curves are the two families of rulings;
• Exercises 2.3.15–17, 3.1.10, and 3.2.18–19, exploring the infamous sorts of discontinuous and non-differentiable functions;
• Example 3.4.3, introducing the reflectivity property of the ellipse via the gradient, with follow-ups in Exercises 3.4.8, 3.4.9, and 3.4.13, and then Kepler’s first and third laws in Exercise 3.5.15;
• Exercise 3.5.14, the famous fact (due to Huygens) that the evolute of a cycloid is a congruent cycloid;
• Exercise 4.5.13, in which we discover that the lines passing through three pairwise-skew lines generate a saddle surface;
• Exercises 5.1.5, 5.1.7, and 9.4.11, exploring the (operator) norm of a matrix.
I have been using the text for a number of years in a course for highly motivated freshmen and sophomores. Since this is the first “serious” course in mathematics for many of them, and because of time limitations, I must give somewhat short shrift to many of the complicated analytic proofs.
For example, I only have time to talk about the Inverse and Implicit Function Theorems and to
sketch the proof of the Change of Variables Theorem, and don’t include all the technical aspects
of the proof of Stokes’ Theorem. On the other hand, I cover most of the linear algebra material
thoroughly. I do plenty of examples and assign a broad range of homework problems, from the
computational to the more challenging proofs.
It would also be quite appropriate to use the text in courses in advanced calculus or multivariable
analysis. Depending on the students’ background, I might bypass the linear algebra material or
assign some of it as review reading and highlight a few crucial results. I would spend more time
on the analytic material (especially in Chapters 3, 6, and 7) and treat Stokes’ Theorem from the
differential form viewpoint very carefully, including the applications in Section 8.7. The approach
of the text will give the students a very hands-on understanding of rather abstract material. In such
courses, I would spend more time in class on proofs and assign a greater proportion of theoretical
homework problems.
Acknowledgments
I would like to thank my students of the past years for enduring preliminary versions of this text
and for all their helpful comments and suggestions. I would like to acknowledge helpful conversations
with my colleagues Malcolm Adams and Jason Cantarella. I would also like to thank the following
reviewers, along with several anonymous referees, who offered many helpful comments:
Quo-Shin Chi Washington University
Philip B. Yasskin Texas A&M University
Mohamed Elhamdadi University of South Florida
I am very grateful to my editor, Laurie Rosatone, for her enthusiastic support, encouragement, and
guidance.
I welcome any comments and suggestions. Please address any e-mail correspondence to
[email protected]
and please keep an eye on
http://www.math.uga.edu/~shifrin/Multivariable.html
or
http://www.wiley.com/college/shifrin
for the latest in typos and corrections.
CHAPTER 1
Vectors and Matrices
Linear algebra provides a beautiful example of the interplay between two branches of mathe-
matics, geometry and algebra. Moreover, it provides the foundations for all of our upcoming work
with calculus, which is based on the idea of approximating the general function locally by a linear
one. In this chapter, we introduce the basic language of vectors, linear functions, and matrices.
We emphasize throughout the symbiotic relation between geometric and algebraic calculations and
interpretations. This is true also of the last section, where we discuss the determinant in two and
three dimensions and define the cross product.
1. Vectors in $\mathbb{R}^n$
A point in $\mathbb{R}^n$ is an ordered $n$-tuple of real numbers, written $(x_1, \dots, x_n)$. To it we may associate the vector $\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}$, which we visualize geometrically as the arrow pointing from the origin to the point. We shall (purposely) use the boldface letter $\mathbf{x}$ to denote both the point and the corresponding vector, as illustrated in Figure 1.1. We denote by $\mathbf{0}$ the vector all of whose coordinates are 0, called the zero vector.
Figure 1.1
More generally, any two points $A$ and $B$ in space determine the arrow pointing from $A$ to $B$, as shown in Figure 1.2, again specifying a vector that we denote $\overrightarrow{AB}$. We often refer to $A$ as the “tail” of the vector $\overrightarrow{AB}$ and $B$ as its “head.” If $A = \begin{bmatrix} a_1 \\ \vdots \\ a_n \end{bmatrix}$ and $B = \begin{bmatrix} b_1 \\ \vdots \\ b_n \end{bmatrix}$, then $\overrightarrow{AB}$ is equal to the vector $\mathbf{v} = \begin{bmatrix} b_1 - a_1 \\ \vdots \\ b_n - a_n \end{bmatrix}$, whose tail is at the origin, as indicated in Figure 1.2.
Figure 1.2
The Pythagorean Theorem tells us that when $n = 2$ the length of the vector $\mathbf{x}$ is $\sqrt{x_1^2 + x_2^2}$. A repeated application of the Pythagorean Theorem, as indicated in Figure 1.3, leads to the following definition of the length (or norm) of a vector $\mathbf{x} \in \mathbb{R}^n$:
$$\|\mathbf{x}\| = \sqrt{x_1^2 + x_2^2 + \cdots + x_n^2}.$$

Figure 1.3
There are two crucial algebraic operations one can perform on vectors, both of which have clear
geometric interpretations.
Scalar multiplication: If $c$ is a real number and $\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}$ is a vector, then we define $c\mathbf{x}$ to be the vector $\begin{bmatrix} cx_1 \\ cx_2 \\ \vdots \\ cx_n \end{bmatrix}$. Note that $c\mathbf{x}$ points in either the same direction as $\mathbf{x}$ or the opposite direction, depending on whether $c > 0$ or $c < 0$, respectively. Thus, multiplication by the real number $c$ simply stretches (or shrinks) the vector by a factor of $|c|$ and reverses its direction when
c is negative. Since this is a geometric “change of scale,” we refer to the real number c as a scalar
and the multiplication cx as scalar multiplication.
Note that whenever $\mathbf{x} \neq \mathbf{0}$ we can find a unit vector with the same direction by taking
$$\frac{\mathbf{x}}{\|\mathbf{x}\|} = \frac{1}{\|\mathbf{x}\|}\,\mathbf{x},$$
as shown in Figure 1.4.
Figure 1.4
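For instance, taking $\mathbf{x} = \begin{bmatrix} 3 \\ 4 \end{bmatrix}$, we have $\|\mathbf{x}\| = \sqrt{3^2 + 4^2} = 5$, so the unit vector with the same direction is
$$\frac{\mathbf{x}}{\|\mathbf{x}\|} = \begin{bmatrix} 3/5 \\ 4/5 \end{bmatrix}, \quad \text{and indeed} \quad \left\|\begin{bmatrix} 3/5 \\ 4/5 \end{bmatrix}\right\| = \sqrt{\tfrac{9}{25} + \tfrac{16}{25}} = 1.$$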
Given a nonzero vector x, any scalar multiple cx lies on the line through the origin and passing
through the head of the vector x. For this reason, we make the following
Definition. We say two vectors x and y are parallel if one is a scalar multiple of the other,
i.e., if there is a scalar c so that y = cx or x = cy. We say x and y are nonparallel if they are not
parallel.
Vector addition: If $\mathbf{x} = \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix}$ and $\mathbf{y} = \begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix}$, then we define $\mathbf{x} + \mathbf{y} = \begin{bmatrix} x_1 + y_1 \\ \vdots \\ x_n + y_n \end{bmatrix}$. To
understand this geometrically, we move the vector y so that its tail is at the head of x, and draw
the arrow from the origin to its head. This is the so-called parallelogram law for vector addition,
Figure 1.5
for, as we see in Figure 1.5, x + y is the “long” diagonal of the parallelogram spanned by x and y.
Notice that the picture makes it clear that vector addition is commutative; i.e.,
x + y = y + x.
This also follows immediately from the algebraic definition because addition of real numbers is
commutative. (See Exercise 12 for an exhaustive list of the properties of vector addition and scalar
multiplication.)
Remark. We emphasize here that the notions of vector addition and scalar multiplication make sense geometrically for vectors of the form $\overrightarrow{AB}$ which do not necessarily have their tails at the origin. If we wish to add $\overrightarrow{AB}$ to $\overrightarrow{CD}$, we simply recall that $\overrightarrow{CD}$ is equal to any vector with the same length and direction, so we just translate $\overrightarrow{CD}$ so that $C$ and $B$ coincide; then the arrow from $A$ to the point $D$ in its new position is the sum $\overrightarrow{AB} + \overrightarrow{CD}$.
Subtraction of one vector from another is easy to define algebraically. If $\mathbf{x}$ and $\mathbf{y}$ are as above, then we set
$$\mathbf{x} - \mathbf{y} = \begin{bmatrix} x_1 - y_1 \\ \vdots \\ x_n - y_n \end{bmatrix}.$$
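For instance, with $\mathbf{x} = \begin{bmatrix} 2 \\ 3 \end{bmatrix}$ and $\mathbf{y} = \begin{bmatrix} -1 \\ 1 \end{bmatrix}$,
$$\mathbf{x} - \mathbf{y} = \begin{bmatrix} 2 - (-1) \\ 3 - 1 \end{bmatrix} = \begin{bmatrix} 3 \\ 2 \end{bmatrix}.$$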
As is the case with real numbers, we have the following interpretation of the difference $\mathbf{x} - \mathbf{y}$: It is the vector which, when added to $\mathbf{y}$, yields $\mathbf{x}$; that is,
$$(\mathbf{x} - \mathbf{y}) + \mathbf{y} = \mathbf{x}.$$

Figure 1.6
Pictorially, we see that $\mathbf{x} - \mathbf{y}$ is drawn, as shown in Figure 1.6, by putting its tail at $\mathbf{y}$ and its head at $\mathbf{x}$, thereby resulting in the other diagonal of the parallelogram determined by $\mathbf{x}$ and $\mathbf{y}$. Note that if $A$ and $B$ are points in space and we set $\mathbf{x} = \overrightarrow{OA}$ and $\mathbf{y} = \overrightarrow{OB}$, then $\mathbf{y} - \mathbf{x} = \overrightarrow{AB}$. Moreover, as Figure 1.6 also suggests, we have $\mathbf{x} - \mathbf{y} = \mathbf{x} + (-\mathbf{y})$.
Example 1. Let $A$ and $B$ be points in $\mathbb{R}^n$. The midpoint $M$ of the line segment joining them is the point halfway from $A$ to $B$; that is, $\overrightarrow{AM} = \tfrac{1}{2}\overrightarrow{AB}$. Using the notation as above, we set $\mathbf{x} = \overrightarrow{OA}$ and $\mathbf{y} = \overrightarrow{OB}$, and we have
$$(*) \qquad \overrightarrow{OM} = \mathbf{x} + \overrightarrow{AM} = \mathbf{x} + \tfrac{1}{2}(\mathbf{y} - \mathbf{x}) = \tfrac{1}{2}(\mathbf{x} + \mathbf{y}).$$
In particular, the vector from the origin to the midpoint of $AB$ is the average of the vectors $\mathbf{x}$ and $\mathbf{y}$. See Exercise 8 for a generalization to three vectors and Section 4 of Chapter 7 for more.
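To illustrate with numbers: if $A = \begin{bmatrix} 1 \\ 3 \end{bmatrix}$ and $B = \begin{bmatrix} 5 \\ 7 \end{bmatrix}$, then
$$\overrightarrow{OM} = \tfrac{1}{2}\left(\begin{bmatrix} 1 \\ 3 \end{bmatrix} + \begin{bmatrix} 5 \\ 7 \end{bmatrix}\right) = \begin{bmatrix} 3 \\ 5 \end{bmatrix},$$
which is indeed halfway between $A$ and $B$ in each coordinate.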
From this formula follows one of the classic results from high school geometry: The diagonals of a parallelogram bisect one another. We’ve seen that the midpoint $M$ of $AB$ is, by virtue of the formula $(*)$, also the midpoint of the diagonal $OC$. (See Figure 1.7.) ▽

Figure 1.7
It should now be evident that vector methods provide a great tool for translating theorems
from Euclidean geometry into simple algebraic statements. Here is another example. Recall that a
median of a triangle is a line segment from a vertex to the midpoint of the opposite side.
Proposition 1.1. The medians of a triangle intersect at a point that is two-thirds of the way
from each vertex to the opposite side.
Proof. We may put one of the vertices of the triangle at the origin, so that the picture is as in Figure 1.8(a). Let $\mathbf{x} = \overrightarrow{OA}$, $\mathbf{y} = \overrightarrow{OB}$, and let $L$, $M$, and $N$ be the midpoints of $OA$, $AB$, and $OB$, respectively.

Figure 1.8

The battle plan is the following: We let $P$ denote the point $2/3$ of the way from
$B$ to $L$, $Q$ the point $2/3$ of the way from $O$ to $M$, and $R$ the point $2/3$ of the way from $A$ to $N$. Although we’ve indicated $P$, $Q$, and $R$ as distinct points in Figure 1.8(b), our goal is to prove that $P = Q = R$; we do this by expressing all the vectors $\overrightarrow{OP}$, $\overrightarrow{OQ}$, and $\overrightarrow{OR}$ in terms of $\mathbf{x}$ and $\mathbf{y}$. Indeed,
$$\overrightarrow{OP} = \mathbf{y} + \tfrac{2}{3}\big(\tfrac{1}{2}\mathbf{x} - \mathbf{y}\big) = \tfrac{1}{3}(\mathbf{x} + \mathbf{y}), \quad \overrightarrow{OQ} = \tfrac{2}{3} \cdot \tfrac{1}{2}(\mathbf{x} + \mathbf{y}) = \tfrac{1}{3}(\mathbf{x} + \mathbf{y}), \quad \overrightarrow{OR} = \mathbf{x} + \tfrac{2}{3}\big(\tfrac{1}{2}\mathbf{y} - \mathbf{x}\big) = \tfrac{1}{3}(\mathbf{x} + \mathbf{y}).$$
We conclude that, as desired, $\overrightarrow{OP} = \overrightarrow{OQ} = \overrightarrow{OR}$, and so $P = Q = R$. That is, if we go $2/3$ of the way down any of the medians, we end up at the same point; this is, of course, the point of intersection of the three medians.
The astute reader might notice that we could have been more economical in the last proof. Suppose
we merely check that the points 2/3 of the way down two of the medians (say P and Q) agree. It
would then follow (say, by relabeling the triangle slightly) that the same is true of a different pair
of medians (say P and R). But since any two pairs must have a point in common, we may now
conclude that all three points are equal.
EXERCISES 1.1
1. Given $\mathbf{x} = \begin{bmatrix} 2 \\ 3 \end{bmatrix}$ and $\mathbf{y} = \begin{bmatrix} -1 \\ 1 \end{bmatrix}$, calculate the following both algebraically and geometrically.
a. $\mathbf{x} + \mathbf{y}$
b. $\mathbf{x} - \mathbf{y}$
c. $\mathbf{x} + 2\mathbf{y}$
d. $\tfrac{1}{2}\mathbf{x} + \tfrac{1}{2}\mathbf{y}$
e. $\mathbf{y} - \mathbf{x}$
f. $2\mathbf{x} - \mathbf{y}$
g. $\|\mathbf{x}\|$
h. $\dfrac{\mathbf{x}}{\|\mathbf{x}\|}$
*2. Three vertices of a parallelogram are $\begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix}$, $\begin{bmatrix} 2 \\ 4 \\ 3 \end{bmatrix}$, and $\begin{bmatrix} 3 \\ 1 \\ 5 \end{bmatrix}$. What are all the possible positions of the fourth vertex? Give your reasoning.
4. Given $\triangle ABC$, let $M$ and $N$ be the midpoints of $AB$ and $AC$, respectively. Prove that $\overrightarrow{MN} = \tfrac{1}{2}\overrightarrow{BC}$.
5. Let ABCD be an arbitrary quadrilateral. Let P , Q, R, and S be the midpoints of AB, BC,
CD, and DA, respectively. Use vector methods to prove that P QRS is a parallelogram. (Hint:
Use Exercise 4.)
*6. In $\triangle ABC$ pictured in Figure 1.9, $\|\overrightarrow{AD}\| = \tfrac{2}{3}\|\overrightarrow{AB}\|$ and $\|\overrightarrow{CE}\| = \tfrac{2}{5}\|\overrightarrow{CB}\|$. Let $Q$ denote the midpoint of $CD$; show that $\overrightarrow{AQ} = c\overrightarrow{AE}$ for some scalar $c$ and determine the ratio $c = \|\overrightarrow{AQ}\|/\|\overrightarrow{AE}\|$. In what ratio does $CD$ divide $AE$?

Figure 1.9
7. Consider parallelogram $ABCD$. Suppose $\overrightarrow{AE} = \tfrac{1}{3}\overrightarrow{AB}$ and $\overrightarrow{DP} = \tfrac{3}{4}\overrightarrow{DE}$. Show that $P$ lies on the diagonal $AC$. (See Figure 1.10.)

Figure 1.10
8. Let $A$, $B$, and $C$ be vertices of a triangle in $\mathbb{R}^3$. Let $\mathbf{x} = \overrightarrow{OA}$, $\mathbf{y} = \overrightarrow{OB}$, and $\mathbf{z} = \overrightarrow{OC}$. Show that the head of the vector $\mathbf{v} = \tfrac{1}{3}(\mathbf{x} + \mathbf{y} + \mathbf{z})$ lies on each median of $\triangle ABC$ (and thus is the point of intersection of the three medians). It follows (see Section 4 of Chapter 7) that when we put equal masses at $A$, $B$, and $C$, the center of mass of that system is given by the intersection of the medians of the triangle.
11. “Discover” the fraction $2/3$ that appears in Proposition 1.1 by finding the intersection of two medians. (Hint: A point on the line $\overleftrightarrow{OM}$ can be written in the form $t(\mathbf{x} + \mathbf{y})$ for some scalar $t$, and a point on the line $\overleftrightarrow{AN}$ can be written in the form $\mathbf{x} + s(\tfrac{1}{2}\mathbf{y} - \mathbf{x})$ for some scalar $s$. You will need to use the result of Exercise 10.)
12. Verify both algebraically and geometrically that the following properties of vector arithmetic
hold. (Do so for n = 2 if the general case is too intimidating.)
a. For all x, y ∈ Rn , x + y = y + x.
b. For all x, y, z ∈ Rn , (x + y) + z = x + (y + z).
c. 0 + x = x for all x ∈ Rn .
d. For each x ∈ Rn , there is a vector −x so that x + (−x) = 0.
e. For all c, d ∈ R and x ∈ Rn , c(dx) = (cd)x.
f. For all c ∈ R and x, y ∈ Rn , c(x + y) = cx + cy.
g. For all c, d ∈ R and x ∈ Rn , (c + d)x = cx + dx.
h. For all x ∈ Rn , 1x = x.
♯ 13. a. Using only the properties listed in Exercise 12, prove that for any x ∈ Rn , we have 0x = 0.
(It often surprises students that this is a consequence of the properties in Exercise 12.)
b. Using the result of part a, prove that (−1)x = −x. (Be sure that you didn’t use this fact
in your proof of part a!)
2. Dot Product
We discuss next one of the crucial constructions in linear algebra, the dot product $\mathbf{x} \cdot \mathbf{y}$ of two vectors $\mathbf{x}, \mathbf{y} \in \mathbb{R}^n$. By way of motivation, let’s recall some basic results from plane geometry. Let $P = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$ and $Q = \begin{bmatrix} y_1 \\ y_2 \end{bmatrix}$ be points in the plane, as pictured in Figure 2.1. Then we observe
Figure 2.1
that when $\angle POQ$ is a right angle, $\triangle OAP$ is similar to $\triangle OBQ$, and so $x_2/x_1 = -y_1/y_2$, whence $x_1 y_1 + x_2 y_2 = 0$. This leads us to make the following

Definition. Given vectors $\mathbf{x}, \mathbf{y} \in \mathbb{R}^n$, we define their dot product to be
$$\mathbf{x} \cdot \mathbf{y} = x_1 y_1 + x_2 y_2 + \cdots + x_n y_n.$$
We know that when the vectors x and y ∈ R2 are perpendicular, their dot product is 0. By
starting with the algebraic properties of the dot product, we are able to get a great deal of geometry
out of it.
Proposition 2.1. For any $\mathbf{x}, \mathbf{y}, \mathbf{z} \in \mathbb{R}^n$ and any $c \in \mathbb{R}$, we have
(1) $\mathbf{x} \cdot \mathbf{y} = \mathbf{y} \cdot \mathbf{x}$;
(2) $\mathbf{x} \cdot \mathbf{x} \ge 0$, and $\mathbf{x} \cdot \mathbf{x} = 0$ if and only if $\mathbf{x} = \mathbf{0}$;
(3) $(c\mathbf{x}) \cdot \mathbf{y} = c(\mathbf{x} \cdot \mathbf{y})$;
(4) $\mathbf{x} \cdot (\mathbf{y} + \mathbf{z}) = \mathbf{x} \cdot \mathbf{y} + \mathbf{x} \cdot \mathbf{z}$.

Proof. In order to simplify the notation, we give the proof with $n = 2$. Since multiplication of real numbers is commutative, we have
$$\mathbf{x} \cdot \mathbf{y} = x_1 y_1 + x_2 y_2 = y_1 x_1 + y_2 x_2 = \mathbf{y} \cdot \mathbf{x}.$$
The square of a real number is nonnegative and the sum of nonnegative numbers is nonnegative, so $\mathbf{x} \cdot \mathbf{x} = x_1^2 + x_2^2 \ge 0$, and it is equal to 0 only when $x_1 = x_2 = 0$. The next property follows from the associative and distributive properties of real numbers:
$$(c\mathbf{x}) \cdot \mathbf{y} = (cx_1)y_1 + (cx_2)y_2 = c(x_1 y_1) + c(x_2 y_2) = c(x_1 y_1 + x_2 y_2) = c(\mathbf{x} \cdot \mathbf{y}).$$
The last result follows from the commutative, associative, and distributive properties of real numbers:
$$\mathbf{x} \cdot (\mathbf{y} + \mathbf{z}) = x_1(y_1 + z_1) + x_2(y_2 + z_2) = x_1 y_1 + x_1 z_1 + x_2 y_2 + x_2 z_2 = (x_1 y_1 + x_2 y_2) + (x_1 z_1 + x_2 z_2) = \mathbf{x} \cdot \mathbf{y} + \mathbf{x} \cdot \mathbf{z}. \qquad \square$$
Corollary 2.2. For any vectors $\mathbf{x}, \mathbf{y} \in \mathbb{R}^n$, we have $\|\mathbf{x} + \mathbf{y}\|^2 = \|\mathbf{x}\|^2 + 2\,\mathbf{x} \cdot \mathbf{y} + \|\mathbf{y}\|^2$.

Proof. Using the properties of Proposition 2.1, we compute
$$\|\mathbf{x} + \mathbf{y}\|^2 = (\mathbf{x} + \mathbf{y}) \cdot (\mathbf{x} + \mathbf{y}) = \mathbf{x} \cdot \mathbf{x} + \mathbf{x} \cdot \mathbf{y} + \mathbf{y} \cdot \mathbf{x} + \mathbf{y} \cdot \mathbf{y} = \|\mathbf{x}\|^2 + 2\,\mathbf{x} \cdot \mathbf{y} + \|\mathbf{y}\|^2,$$
as desired. $\square$
The geometric meaning of this result comes from the Pythagorean Theorem: When $\mathbf{x}$ and $\mathbf{y}$ are perpendicular vectors in $\mathbb{R}^2$, then we have $\|\mathbf{x} + \mathbf{y}\|^2 = \|\mathbf{x}\|^2 + \|\mathbf{y}\|^2$, and so, by Corollary 2.2, it must be the case that $\mathbf{x} \cdot \mathbf{y} = 0$. (And the converse follows, too, from the converse of the Pythagorean Theorem.) That is, two vectors in $\mathbb{R}^2$ are perpendicular if and only if their dot product is 0.
Motivated by this, we use the algebraic definition of the dot product of vectors in $\mathbb{R}^n$ to bring in the geometry. In keeping with current use of the terminology, and falling prey to the penchant to have several names for the same thing, we make the following

Definition. We say vectors $\mathbf{x}$ and $\mathbf{y}$ in $\mathbb{R}^n$ are orthogonal (or perpendicular) if $\mathbf{x} \cdot \mathbf{y} = 0$.

Figure 2.2
Armed with this definition, we proceed to a construction that will be important in much of our future work. Starting with two vectors $\mathbf{x}, \mathbf{y} \in \mathbb{R}^n$, where $\mathbf{y} \neq \mathbf{0}$, Figure 2.2 suggests that we should be able to write $\mathbf{x}$ as the sum of a vector $\mathbf{x}^{\parallel}$ that is parallel to $\mathbf{y}$ and a vector $\mathbf{x}^{\perp}$ that is orthogonal to $\mathbf{y}$. Let’s suppose we have such an equation:
$$\mathbf{x} = \mathbf{x}^{\parallel} + \mathbf{x}^{\perp}, \quad \text{where $\mathbf{x}^{\parallel}$ is a scalar multiple of $\mathbf{y}$ and $\mathbf{x}^{\perp}$ is orthogonal to $\mathbf{y}$.}$$
To say that $\mathbf{x}^{\parallel}$ is a scalar multiple of $\mathbf{y}$ means that we can write $\mathbf{x}^{\parallel} = c\mathbf{y}$ for some scalar $c$. Now, assuming such an expression exists, we can determine $c$ by taking the dot product of both sides of the equation with $\mathbf{y}$:
$$\mathbf{x} \cdot \mathbf{y} = (c\mathbf{y} + \mathbf{x}^{\perp}) \cdot \mathbf{y} = c\|\mathbf{y}\|^2, \quad \text{so} \quad c = \frac{\mathbf{x} \cdot \mathbf{y}}{\|\mathbf{y}\|^2} \quad \text{and} \quad \mathbf{x}^{\parallel} = \frac{\mathbf{x} \cdot \mathbf{y}}{\|\mathbf{y}\|^2}\,\mathbf{y},$$
as required. Note, moreover, that $\mathbf{x}^{\parallel}$ is the unique multiple of $\mathbf{y}$ that satisfies the equation $(\mathbf{x} - \mathbf{x}^{\parallel}) \cdot \mathbf{y} = 0$. We call $\mathbf{x}^{\parallel} = \mathrm{proj}_{\mathbf{y}}\mathbf{x}$ the projection of $\mathbf{x}$ onto $\mathbf{y}$.
Example 1. Let $\mathbf{x} = \begin{bmatrix} 2 \\ 3 \\ 1 \end{bmatrix}$ and $\mathbf{y} = \begin{bmatrix} -1 \\ 1 \\ 1 \end{bmatrix}$. Then $\mathbf{x} \cdot \mathbf{y} = 2$ and $\|\mathbf{y}\|^2 = 3$, so
$$\mathbf{x}^{\parallel} = \frac{\mathbf{x} \cdot \mathbf{y}}{\|\mathbf{y}\|^2}\,\mathbf{y} = \frac{2}{3}\begin{bmatrix} -1 \\ 1 \\ 1 \end{bmatrix} \quad \text{and} \quad \mathbf{x}^{\perp} = \begin{bmatrix} 2 \\ 3 \\ 1 \end{bmatrix} - \frac{2}{3}\begin{bmatrix} -1 \\ 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 8/3 \\ 7/3 \\ 1/3 \end{bmatrix}.$$
To double-check, we compute $\mathbf{x}^{\perp} \cdot \mathbf{y} = \begin{bmatrix} 8/3 \\ 7/3 \\ 1/3 \end{bmatrix} \cdot \begin{bmatrix} -1 \\ 1 \\ 1 \end{bmatrix} = 0$, as it should be. ▽
Suppose $\mathbf{x}, \mathbf{y} \in \mathbb{R}^2$. We shall see next that the formula for the projection of $\mathbf{x}$ onto $\mathbf{y}$ enables us to calculate the angle between the vectors $\mathbf{x}$ and $\mathbf{y}$. Consider the right triangle in Figure 2.3; let $\theta$ denote the angle between the vectors $\mathbf{x}$ and $\mathbf{y}$. Remembering that the cosine of an angle is the ratio of the signed length of the adjacent side to the length of the hypotenuse, we see that
$$\cos\theta = \frac{\text{signed length of } \mathbf{x}^{\parallel}}{\text{length of } \mathbf{x}} = \frac{c\|\mathbf{y}\|}{\|\mathbf{x}\|} = \frac{\dfrac{\mathbf{x} \cdot \mathbf{y}}{\|\mathbf{y}\|^2}\,\|\mathbf{y}\|}{\|\mathbf{x}\|} = \frac{\mathbf{x} \cdot \mathbf{y}}{\|\mathbf{x}\|\|\mathbf{y}\|}.$$

Figure 2.3

This, then, is the geometric interpretation of the dot product:
$$\mathbf{x} \cdot \mathbf{y} = \|\mathbf{x}\|\|\mathbf{y}\|\cos\theta.$$
Will this formula still make sense even when $\mathbf{x}, \mathbf{y} \in \mathbb{R}^n$? Geometrically, we simply restrict our attention to the plane spanned by $\mathbf{x}$ and $\mathbf{y}$ and measure the angle $\theta$ in that plane, and so we blithely make the

Definition. Let $\mathbf{x}$ and $\mathbf{y}$ be nonzero vectors in $\mathbb{R}^n$. We define the angle between them to be the unique $\theta$ satisfying $0 \le \theta \le \pi$ so that
$$\cos\theta = \frac{\mathbf{x} \cdot \mathbf{y}}{\|\mathbf{x}\|\|\mathbf{y}\|}.$$
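For example, taking $\mathbf{x} = \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}$ and $\mathbf{y} = \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}$ in $\mathbb{R}^3$, we find
$$\cos\theta = \frac{\mathbf{x} \cdot \mathbf{y}}{\|\mathbf{x}\|\|\mathbf{y}\|} = \frac{1}{\sqrt{2}\,\sqrt{2}} = \frac{1}{2}, \quad \text{so} \quad \theta = \frac{\pi}{3}.$$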
Since our geometric intuition may be misleading in $\mathbb{R}^n$, we should check algebraically that this definition makes sense. Since $|\cos\theta| \le 1$, the following result gives us what is needed.

Proposition 2.3 (Cauchy-Schwarz Inequality). For any vectors $\mathbf{x}, \mathbf{y} \in \mathbb{R}^n$, we have
$$|\mathbf{x} \cdot \mathbf{y}| \le \|\mathbf{x}\|\|\mathbf{y}\|.$$
Moreover, equality holds if and only if one of the vectors is a scalar multiple of the other.
Proof. If $\mathbf{y} = \mathbf{0}$, then there’s nothing to prove. If $\mathbf{y} \neq \mathbf{0}$, then we observe that the quadratic function of $t$ given by
$$g(t) = \|\mathbf{x} + t\mathbf{y}\|^2 = \|\mathbf{x}\|^2 + 2t\,\mathbf{x} \cdot \mathbf{y} + t^2\|\mathbf{y}\|^2$$
takes its minimum at $t_0 = -\dfrac{\mathbf{x} \cdot \mathbf{y}}{\|\mathbf{y}\|^2}$. The minimum value
$$g(t_0) = \|\mathbf{x}\|^2 - 2\,\frac{(\mathbf{x} \cdot \mathbf{y})^2}{\|\mathbf{y}\|^2} + \frac{(\mathbf{x} \cdot \mathbf{y})^2}{\|\mathbf{y}\|^2} = \|\mathbf{x}\|^2 - \frac{(\mathbf{x} \cdot \mathbf{y})^2}{\|\mathbf{y}\|^2}$$
is necessarily nonnegative, so
$$|\mathbf{x} \cdot \mathbf{y}| \le \|\mathbf{x}\|\|\mathbf{y}\|,$$
as desired. Equality holds if and only if $\mathbf{x} + t\mathbf{y} = \mathbf{0}$ for some scalar $t$. (See Exercise 9 for a discussion of how this proof relates to our formula for $\mathrm{proj}_{\mathbf{y}}\mathbf{x}$ above.) $\square$
One of the most useful applications of this result is the famed triangle inequality, which tells us
that the sum of the lengths of two sides of a triangle cannot be less than the length of the third.
Corollary 2.4 (Triangle Inequality). For any vectors $\mathbf{x}, \mathbf{y} \in \mathbb{R}^n$, we have $\|\mathbf{x} + \mathbf{y}\| \le \|\mathbf{x}\| + \|\mathbf{y}\|$.

Proof. By Corollary 2.2 and the Cauchy-Schwarz inequality,
$$\|\mathbf{x} + \mathbf{y}\|^2 = \|\mathbf{x}\|^2 + 2\,\mathbf{x} \cdot \mathbf{y} + \|\mathbf{y}\|^2 \le \|\mathbf{x}\|^2 + 2\|\mathbf{x}\|\|\mathbf{y}\| + \|\mathbf{y}\|^2 = (\|\mathbf{x}\| + \|\mathbf{y}\|)^2.$$
Since square root preserves inequality, we conclude that $\|\mathbf{x} + \mathbf{y}\| \le \|\mathbf{x}\| + \|\mathbf{y}\|$, as desired. $\square$
Remark. The dot product also arises in situations removed from geometry. The economist introduces the commodity vector $\mathbf{x}$, whose entries are the quantities of various commodities that happen to be of interest, and the price vector $\mathbf{p}$. For example, we might consider
$$\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{bmatrix} \quad \text{and} \quad \mathbf{p} = \begin{bmatrix} p_1 \\ p_2 \\ p_3 \\ p_4 \\ p_5 \end{bmatrix} \in \mathbb{R}^5,$$
where x1 represents the number of pounds of flour, x2 the number of dozens of eggs, x3 the number
of pounds of chocolate chips, x4 the number of pounds of walnuts, and x5 the number of pounds of
butter needed to produce a certain massive quantity of chocolate chip cookies, and pi is the price
(in dollars) of a unit of the $i$th commodity (e.g., $p_2$ is the price of a dozen eggs). Then it is easy to see that
$$\mathbf{p} \cdot \mathbf{x} = p_1 x_1 + p_2 x_2 + p_3 x_3 + p_4 x_4 + p_5 x_5$$
is the total cost of producing the massive quantity of cookies. (To be realistic, we might also want
to include x6 as the number of hours of labor, with corresponding hourly wage p6 .) We will return
to this interpretation in Section 4.
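To make this concrete with some entirely hypothetical quantities and prices: if $\mathbf{x} = (20, 5, 10, 8, 12)$ and $\mathbf{p} = (0.80, 2.50, 3.00, 6.00, 4.00)$, then
$$\mathbf{p} \cdot \mathbf{x} = (0.80)(20) + (2.50)(5) + (3.00)(10) + (6.00)(8) + (4.00)(12) = 16 + 12.5 + 30 + 48 + 48 = 154.5,$$
a total cost of \$154.50 for the batch of cookies.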
EXERCISES 1.2
1. For each of the following pairs of vectors $\mathbf{x}$ and $\mathbf{y}$, calculate $\mathbf{x} \cdot \mathbf{y}$ and the angle $\theta$ between the vectors.
a. $\mathbf{x} = \begin{bmatrix} 2 \\ 5 \end{bmatrix}$, $\mathbf{y} = \begin{bmatrix} -5 \\ 2 \end{bmatrix}$
b. $\mathbf{x} = \begin{bmatrix} 2 \\ 1 \end{bmatrix}$, $\mathbf{y} = \begin{bmatrix} -1 \\ 1 \end{bmatrix}$
*c. $\mathbf{x} = \begin{bmatrix} 5 \\ 8 \end{bmatrix}$, $\mathbf{y} = \begin{bmatrix} 1 \\ -4 \end{bmatrix}$
d. $\mathbf{x} = \begin{bmatrix} 1 \\ 4 \\ -3 \end{bmatrix}$, $\mathbf{y} = \begin{bmatrix} 5 \\ 1 \\ 3 \end{bmatrix}$
e. $\mathbf{x} = \begin{bmatrix} 1 \\ -1 \\ 6 \end{bmatrix}$, $\mathbf{y} = \begin{bmatrix} 5 \\ 3 \\ 2 \end{bmatrix}$
*f. $\mathbf{x} = \begin{bmatrix} 3 \\ -4 \\ 1 \end{bmatrix}$, $\mathbf{y} = \begin{bmatrix} -1 \\ 0 \\ 7 \end{bmatrix}$
g. $\mathbf{x} = \begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \end{bmatrix}$, $\mathbf{y} = \begin{bmatrix} 1 \\ -3 \\ -1 \\ 5 \end{bmatrix}$
*2. For each pair of vectors in Exercise 1, calculate $\mathrm{proj}_{\mathbf{y}}\mathbf{x}$ and $\mathrm{proj}_{\mathbf{x}}\mathbf{y}$.
*3. Find the angle between the long diagonal of a cube and a face diagonal.
4. Find the angle that the long diagonal of a 3 × 4 × 5 rectangular box makes with the longest
edge.
6. Suppose x, y, z ∈ R2 are unit vectors satisfying x + y + z = 0. What can you say about the
angles between each pair?
7. Let $\mathbf{e}_1 = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}$, $\mathbf{e}_2 = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}$, and $\mathbf{e}_3 = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}$ be the so-called standard basis for $\mathbb{R}^3$. Let $\mathbf{x} \in \mathbb{R}^3$ be a nonzero vector. For $i = 1, 2, 3$, let $\theta_i$ denote the angle between $\mathbf{x}$ and $\mathbf{e}_i$. Compute $\cos^2\theta_1 + \cos^2\theta_2 + \cos^2\theta_3$.
*8. Let $\mathbf{x} = \begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix}$ and $\mathbf{y} = \begin{bmatrix} 1 \\ 2 \\ \vdots \\ n \end{bmatrix} \in \mathbb{R}^n$. Let $\theta_n$ be the angle between $\mathbf{x}$ and $\mathbf{y}$ in $\mathbb{R}^n$. Find $\lim\limits_{n \to \infty} \theta_n$. (Hint: You may need to recall the formulas for $1 + 2 + \cdots + n$ and $1^2 + 2^2 + \cdots + n^2$ from your beginning calculus course.)
9. With regard to the proof of Proposition 2.3, how is $t_0\mathbf{y}$ related to $\mathbf{x}^{\parallel}$? What does this say about $\mathrm{proj}_{\mathbf{y}}\mathbf{x}$?
10. Use vector methods to prove that a parallelogram is a rectangle if and only if its diagonals have
the same length.
11. Use the fundamental properties of the dot product to prove that
$$\|\mathbf{x} + \mathbf{y}\|^2 + \|\mathbf{x} - \mathbf{y}\|^2 = 2\left(\|\mathbf{x}\|^2 + \|\mathbf{y}\|^2\right).$$
*12. Use the dot product to prove the law of cosines: As shown in Figure 2.4,
$$c^2 = a^2 + b^2 - 2ab\cos\theta.$$
Figure 2.4
13. Use vector methods to prove that the diagonals of a parallelogram are orthogonal if and only
if the parallelogram is a rhombus (i.e., has all sides of equal length).
♯ 14. Use vector methods to prove that a triangle inscribed in a circle and having a diameter as one of its sides must be a right triangle. (See Figure 2.5.)

Figure 2.5
18. Use the Cauchy-Schwarz inequality to solve the following max/min problem: If the (long)
diagonal of a rectangular box has length c, what is the greatest the sum of the length, width,
and height of the box can be? For what shape box does the maximum occur?
19. Give an alternative proof of the Cauchy-Schwarz inequality, as follows. Let $a = \|\mathbf{x}\|$, $b = \|\mathbf{y}\|$, and deduce from $\|b\mathbf{x} - a\mathbf{y}\|^2 \ge 0$ that $\mathbf{x} \cdot \mathbf{y} \le ab$. Now how do you show that $|\mathbf{x} \cdot \mathbf{y}| \le ab$? When does equality hold?
♯ 20. a. Let $\mathbf{x}$ and $\mathbf{y}$ be vectors with $\|\mathbf{x}\| = \|\mathbf{y}\|$. Prove that the vector $\mathbf{x} + \mathbf{y}$ bisects the angle between $\mathbf{x}$ and $\mathbf{y}$.
b. More generally, if $\mathbf{x}$ and $\mathbf{y}$ are arbitrary nonzero vectors, let $a = \|\mathbf{x}\|$ and $b = \|\mathbf{y}\|$. Prove that the vector $b\mathbf{x} + a\mathbf{y}$ bisects the angle between $\mathbf{x}$ and $\mathbf{y}$.
21. Use vector methods to prove that the diagonals of a parallelogram bisect the vertex angles if
and only if the parallelogram is a rhombus.
22. Given $\triangle ABC$ with $D$ on $BC$ as shown in Figure 2.6, prove that if $AD$ bisects $\angle BAC$, then $\|\overrightarrow{BD}\|/\|\overrightarrow{CD}\| = \|\overrightarrow{AB}\|/\|\overrightarrow{AC}\|$. (Hint: Use Exercise 20b. Let $\mathbf{x} = \overrightarrow{AB}$ and $\mathbf{y} = \overrightarrow{AC}$; give two expressions for $\overrightarrow{AD}$ in terms of $\mathbf{x}$ and $\mathbf{y}$ and use Exercise 1.1.10.)

Figure 2.6
23. Use vector methods to prove that the angle bisectors of a triangle have a common point. (Hint: Given $\triangle OAB$, let $\mathbf{x} = \overrightarrow{OA}$, $\mathbf{y} = \overrightarrow{OB}$, $a = \|\overrightarrow{OA}\|$, $b = \|\overrightarrow{OB}\|$, and $c = \|\overrightarrow{AB}\|$. If we define the point $P$ by $\overrightarrow{OP} = \frac{1}{a+b+c}(b\mathbf{x} + a\mathbf{y})$, use Exercise 20b to show that $P$ lies on all three angle bisectors.)
24. Use vector methods to prove that the altitudes of a triangle have a common point. Recall that the altitudes of a triangle are the lines passing through a vertex and perpendicular to the opposite side. (Hint: See Figure 2.7. Let $C$ be the point of intersection of the altitude from $B$ to $\overleftrightarrow{OA}$ and the altitude from $A$ to $\overleftrightarrow{OB}$. Prove that $\overrightarrow{OC}$ is orthogonal to $\overrightarrow{AB}$.)
Figure 2.7
25. Use vector methods to prove that the perpendicular bisectors of the sides of a triangle intersect in a point, as follows. Assume the triangle $OAB$ has one vertex at the origin, and let $\mathbf{x} = \overrightarrow{OA}$ and $\mathbf{y} = \overrightarrow{OB}$.
a. Let $\mathbf{z}$ be the point of intersection of the perpendicular bisectors of $OA$ and $OB$. Prove that (using the notation of Exercise 16)
$$\mathbf{z} = \tfrac{1}{2}\mathbf{x} + c\,\rho(\mathbf{x}), \quad \text{where} \quad c = \frac{\|\mathbf{y}\|^2 - \mathbf{x} \cdot \mathbf{y}}{2\,\rho(\mathbf{x}) \cdot \mathbf{y}}.$$
b. Show that $\mathbf{z}$ lies on the perpendicular bisector of $AB$. (Hint: What is the dot product of $\mathbf{z} - \tfrac{1}{2}(\mathbf{x} + \mathbf{y})$ with $\mathbf{y} - \mathbf{x}$?)
26. Let P be the intersection of the medians of △OAB (see Proposition 1.1), Q the intersection of
its altitudes (see Exercise 24), and R the intersection of the perpendicular bisectors of its sides
(see Exercise 25). Show that P , Q, and R are collinear and that P is two-thirds of the way
from Q to R. Does the intersection of the angle bisectors (see Exercise 23) lie on this line as
well?
3. Subspaces of $\mathbb{R}^n$
Figure 3.1
Figure 3.2
The set of simultaneous solutions $\mathbf{x} \in \mathbb{R}^n$ of the equations
$$A_1 \cdot \mathbf{x} = 0, \quad A_2 \cdot \mathbf{x} = 0, \quad \dots, \quad A_m \cdot \mathbf{x} = 0$$
forms a subspace of $\mathbb{R}^n$. ▽
Examples 2. Let’s consider next a few subsets of $\mathbb{R}^2$, as pictured in Figure 3.3, that are not subspaces.
(a) $S = \left\{ \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \in \mathbb{R}^2 : x_2 = 2x_1 + 1 \right\}$ is not a subspace. All three criteria fail, but it suffices to point out that $\mathbf{0} \notin S$.
(b) $S = \left\{ \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \in \mathbb{R}^2 : x_1 x_2 = 0 \right\}$ is not a subspace. Each of the vectors $\mathbf{v} = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$ and $\mathbf{w} = \begin{bmatrix} 0 \\ 1 \end{bmatrix}$ lies in $S$, and yet their sum $\mathbf{v} + \mathbf{w} = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$ does not.
(c) $S = \left\{ \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \in \mathbb{R}^2 : x_2 \ge 0 \right\}$ is not a subspace. The vector $\mathbf{v} = \begin{bmatrix} 0 \\ 1 \end{bmatrix}$ lies in $S$, and yet any negative scalar multiple of it, e.g., $(-2)\mathbf{v} = \begin{bmatrix} 0 \\ -2 \end{bmatrix}$, does not. ▽

Figure 3.3
Definition. Given vectors $\mathbf{v}_1, \dots, \mathbf{v}_k \in \mathbb{R}^n$ and scalars $c_1, \dots, c_k$, the vector
$$\mathbf{v} = c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots + c_k\mathbf{v}_k$$
(as illustrated in Figure 3.4) is called a linear combination of $\mathbf{v}_1, \dots, \mathbf{v}_k$. The set of all linear combinations of $\mathbf{v}_1, \dots, \mathbf{v}_k$ is called their span, denoted $\mathrm{Span}(\mathbf{v}_1, \dots, \mathbf{v}_k)$.
Figure 3.4
The vectors $\mathbf{e}_1, \dots, \mathbf{e}_n$ are often called the standard basis vectors for $\mathbb{R}^n$. Obviously, given the vector $\mathbf{x} = \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix}$, we have $\mathbf{x} = x_1\mathbf{e}_1 + x_2\mathbf{e}_2 + \cdots + x_n\mathbf{e}_n$.
Suppose $\mathbf{v}, \mathbf{w} \in \mathrm{Span}(\mathbf{v}_1, \dots, \mathbf{v}_k)$, say
$$\mathbf{v} = c_1\mathbf{v}_1 + \cdots + c_k\mathbf{v}_k \quad \text{and} \quad \mathbf{w} = d_1\mathbf{v}_1 + \cdots + d_k\mathbf{v}_k;$$
adding, we obtain
$$\mathbf{v} + \mathbf{w} = (c_1\mathbf{v}_1 + \cdots + c_k\mathbf{v}_k) + (d_1\mathbf{v}_1 + \cdots + d_k\mathbf{v}_k) = (c_1 + d_1)\mathbf{v}_1 + \cdots + (c_k + d_k)\mathbf{v}_k,$$
which is again a linear combination of $\mathbf{v}_1, \dots, \mathbf{v}_k$ and hence lies in $\mathrm{Span}(\mathbf{v}_1, \dots, \mathbf{v}_k)$.
There are really two different ways that subspaces of Rn arise: as being the span of a collection
of vectors (the “parametric” approach), or as being the set of solutions of a (homogeneous) system
of linear equations (the “implicit” approach). We shall study the connections between the two in
detail in Chapter 4.
Example 4. As the reader can verify, the vector $A = \begin{bmatrix} -1 \\ 3 \\ 2 \end{bmatrix}$ is orthogonal to both the vectors that span the plane $P_1$ given in Example 3 above. Thus, every vector in $P_1$ is orthogonal to $A$ and we suspect that
$$P_1 = \{\mathbf{x} \in \mathbb{R}^3 : A \cdot \mathbf{x} = 0\} = \{\mathbf{x} \in \mathbb{R}^3 : -x_1 + 3x_2 + 2x_3 = 0\}.$$
Strictly speaking, we only know that every vector in $P_1$ is a solution of this equation. But note that if $\mathbf{x}$ is a solution, then $x_1 = 3x_2 + 2x_3$, so
$$\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 3x_2 + 2x_3 \\ x_2 \\ x_3 \end{bmatrix} = x_2\begin{bmatrix} 3 \\ 1 \\ 0 \end{bmatrix} + x_3\begin{bmatrix} 2 \\ 0 \\ 1 \end{bmatrix} = (-x_2)\begin{bmatrix} 1 \\ -1 \\ 2 \end{bmatrix} + (2x_2 + x_3)\begin{bmatrix} 2 \\ 0 \\ 1 \end{bmatrix},$$
so $\mathbf{x} \in P_1$ and the two sets are equal.¹ Thus, the discussion of Example 1(e) gives another justification that $P_1$ is a subspace of $\mathbb{R}^3$.
On the other hand, one can check analogously that
$$P_2 = \{\mathbf{x} \in \mathbb{R}^3 : -x_1 + 3x_2 + 2x_3 = -1\},$$
and so clearly $\mathbf{0} \notin P_2$ and $P_2$ is not a subspace. It is an affine plane parallel to $P_1$. ▽
Definition. Let $V$ and $W$ be subspaces of $\mathbb{R}^n$. We say they are orthogonal subspaces if every element of $V$ is orthogonal to every element of $W$, i.e., if
$$\mathbf{v} \cdot \mathbf{w} = 0 \quad \text{for every } \mathbf{v} \in V \text{ and every } \mathbf{w} \in W.$$
As indicated in Figure 3.5, given a subspace $V \subset \mathbb{R}^n$, define
$$V^{\perp} = \{\mathbf{x} \in \mathbb{R}^n : \mathbf{x} \cdot \mathbf{v} = 0 \text{ for every } \mathbf{v} \in V\}.$$
$V^{\perp}$ (read “$V$ perp”) is called the orthogonal complement of $V$.²
Figure 3.5
Example 5. Let $V = \mathrm{Span}\left(\begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix}\right)$. Then $V^{\perp}$ is the plane $W = \{\mathbf{x} \in \mathbb{R}^3 : x_1 + 2x_2 + x_3 = 0\}$. Now what is the orthogonal complement of $W$? We suspect it is just the line $V$, but we will have to wait until Chapter 4 to have the appropriate tools. ▽
EXERCISES 1.3
*1. Which of the following are subspaces? Justify your answer in each case.
a. $\{\mathbf{x} \in \mathbb{R}^2 : x_1 + x_2 = 1\}$
b. $\left\{\mathbf{x} \in \mathbb{R}^3 : \mathbf{x} = \begin{bmatrix} a \\ b \\ a+b \end{bmatrix} \text{ for some } a, b \in \mathbb{R}\right\}$
c. $\{\mathbf{x} \in \mathbb{R}^3 : x_1 + 2x_2 < 0\}$
d. $\{\mathbf{x} \in \mathbb{R}^3 : x_1^2 + x_2^2 + x_3^2 = 1\}$
e. $\{\mathbf{x} \in \mathbb{R}^3 : x_1^2 + x_2^2 + x_3^2 = 0\}$
f. $\{\mathbf{x} \in \mathbb{R}^3 : x_1^2 + x_2^2 + x_3^2 = -1\}$
g. $\left\{\mathbf{x} \in \mathbb{R}^3 : \mathbf{x} = s\begin{bmatrix} 2 \\ 1 \\ 1 \end{bmatrix} + t\begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix} \text{ for some } s, t \in \mathbb{R}\right\}$
h. $\left\{\mathbf{x} \in \mathbb{R}^3 : \mathbf{x} = \begin{bmatrix} 3 \\ 0 \\ 1 \end{bmatrix} + s\begin{bmatrix} 2 \\ 1 \\ 1 \end{bmatrix} + t\begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix} \text{ for some } s, t \in \mathbb{R}\right\}$
i. $\left\{\mathbf{x} \in \mathbb{R}^3 : \mathbf{x} = \begin{bmatrix} 2 \\ 4 \\ -1 \end{bmatrix} + s\begin{bmatrix} 2 \\ 1 \\ 1 \end{bmatrix} + t\begin{bmatrix} 1 \\ 2 \\ -1 \end{bmatrix} \text{ for some } s, t \in \mathbb{R}\right\}$
*2. Criticize the following argument: By Exercise 1.1.13, for any vector v, we have 0v = 0. So the
first criterion for subspaces is, in fact, a consequence of the second criterion and could therefore
be omitted.
♯ 3. Suppose x, v1 , . . . , vk ∈ Rn and x is orthogonal to each of the vectors v1 , . . . , vk . Prove that x
is orthogonal to any linear combination c1 v1 + c2 v2 + · · · + ck vk .
11. Suppose $U$ and $V$ are subspaces of $\mathbb{R}^n$. Prove that their intersection
$$U \cap V = \{\mathbf{x} \in \mathbb{R}^n : \mathbf{x} \in U \text{ and } \mathbf{x} \in V\}$$
is a subspace of $\mathbb{R}^n$.
12. Suppose $U$ and $V$ are subspaces of $\mathbb{R}^n$. Prove that $(U + V)^{\perp} = U^{\perp} \cap V^{\perp}$. (See the footnote on p. 21.)
4. Linear Transformations and Matrix Algebra

We are heading towards calculus and the study of functions. As we learned in the case of one variable, differential calculus is based on the idea of the best (affine) linear approximation of a function. Thus, our first brush with functions is with those that are linear.
First we introduce a bit of notation. If X and Y are sets, a function f : X → Y is a rule that
assigns to each element x ∈ X a single element y ∈ Y ; we write y = f (x). We call X the domain of f
and Y the range. The image of f is the set of all its values, i.e., {y ∈ Y : y = f (x) for some x ∈ X}.
Figure 4.1

Definition. A function $T: \mathbb{R}^n \to \mathbb{R}^m$ is called a linear transformation (or linear map) if it satisfies
(1) $T(\mathbf{u} + \mathbf{v}) = T(\mathbf{u}) + T(\mathbf{v})$ for all $\mathbf{u}, \mathbf{v} \in \mathbb{R}^n$;
(2) $T(c\mathbf{v}) = c\,T(\mathbf{v})$ for all $\mathbf{v} \in \mathbb{R}^n$ and all scalars $c$.
The main point of the linearity properties is that the values of $T$ on the standard basis vectors $\mathbf{e}_1, \dots, \mathbf{e}_n$ completely determine the function $T$: For suppose $\mathbf{x} = x_1\mathbf{e}_1 + \cdots + x_n\mathbf{e}_n \in \mathbb{R}^n$; then
$$(*) \qquad T(\mathbf{x}) = T(x_1\mathbf{e}_1 + \cdots + x_n\mathbf{e}_n) = x_1 T(\mathbf{e}_1) + \cdots + x_n T(\mathbf{e}_n).$$
In particular, let
$$T(\mathbf{e}_j) = \begin{bmatrix} a_{1j} \\ a_{2j} \\ \vdots \\ a_{mj} \end{bmatrix} \in \mathbb{R}^m;$$
then to $T$ we can naturally associate the $m \times n$ array
$$A = \begin{bmatrix} a_{11} & \cdots & a_{1n} \\ a_{21} & \cdots & a_{2n} \\ \vdots & \ddots & \vdots \\ a_{m1} & \cdots & a_{mn} \end{bmatrix},$$
which we call the standard matrix for $T$. (We will often denote this by $[T]$.) To emphasize: the $j$th column of $A$ is the vector in $\mathbb{R}^m$ obtained by applying $T$ to the $j$th standard basis vector, $\mathbf{e}_j$.
Example 1. The most basic example of a linear map is the following. Fix $\mathbf{a} \in \mathbb{R}^n$, and define $T: \mathbb{R}^n \to \mathbb{R}$ by $T(\mathbf{x}) = \mathbf{a} \cdot \mathbf{x}$. By Proposition 2.1, we have
$$T(\mathbf{x} + \mathbf{y}) = \mathbf{a} \cdot (\mathbf{x} + \mathbf{y}) = \mathbf{a} \cdot \mathbf{x} + \mathbf{a} \cdot \mathbf{y} = T(\mathbf{x}) + T(\mathbf{y}) \quad \text{and} \quad T(c\mathbf{x}) = \mathbf{a} \cdot (c\mathbf{x}) = c(\mathbf{a} \cdot \mathbf{x}) = c\,T(\mathbf{x}),$$
so $T$ is indeed a linear map. ▽
Figure 4.2
" # " #
x1 y1
x= and y =
x2 y2
are vectors, then
" #! " # " # " #
x1 + y 1 −(x2 + y2 ) −x2 −y2
T (x + y) = T = = + = T (x) + T (y),
x2 + y 2 x1 + y 1 x1 y1
Figure 4.3
Figure 4.4
(d) Consider the function $T: \mathbb{R}^3 \to \mathbb{R}^3$ defined by reflecting across the plane $x_3 = 0$. Then $T(\mathbf{e}_1) = \mathbf{e}_1$, $T(\mathbf{e}_2) = \mathbf{e}_2$, and $T(\mathbf{e}_3) = -\mathbf{e}_3$, so the standard matrix for $T$ is
$$\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \end{bmatrix}. \qquad ▽$$
(e) Generalizing (a), we consider rotation of $\mathbb{R}^2$ through the angle $\theta$ (given in radians). By the same geometric argument we suggested earlier (see Figure 4.5), this is a linear transformation of $\mathbb{R}^2$.

Figure 4.5

Now, as we can see from Figure 4.6, the standard matrix has as its first column
$$T(\mathbf{e}_1) = \begin{bmatrix} \cos\theta \\ \sin\theta \end{bmatrix}$$
(by the usual definition of $\cos\theta$ and $\sin\theta$, in fact) and as its second
$$T(\mathbf{e}_2) = \begin{bmatrix} -\sin\theta \\ \cos\theta \end{bmatrix}.$$
Thus the standard matrix for rotation through $\theta$ is $A_\theta = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$.

Figure 4.6
(f) If $\ell \subset \mathbb{R}^2$ is the line spanned by $\begin{bmatrix} 1 \\ 2 \end{bmatrix}$, then we can consider the linear maps $S, T: \mathbb{R}^2 \to \mathbb{R}^2$ given respectively by projection onto, and reflection across, the line $\ell$. Their standard matrices are
$$A = \begin{bmatrix} \tfrac{1}{5} & \tfrac{2}{5} \\[2pt] \tfrac{2}{5} & \tfrac{4}{5} \end{bmatrix} \quad \text{and} \quad B = \begin{bmatrix} -\tfrac{3}{5} & \tfrac{4}{5} \\[2pt] \tfrac{4}{5} & \tfrac{3}{5} \end{bmatrix}.$$
then it seems impossible to discern the geometric nature of the linear map represented by
such a matrix.3 In these examples, the standard “coordinate system” built into matrices
just masks the geometry, and, as we shall see, the solution is to change our coordinate
system. This we do in Chapter 9. ▽
³For the curious among you, multiplication by $C$ gives a rotation of $\mathbb{R}^3$ through an angle of $\pi/2$ about the line spanned by $\begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix}$. See Exercise 9.2.21.
Let $T: \mathbb{R}^n \to \mathbb{R}^m$ be a linear map, and let $A$ be its standard matrix. We want to define the product of the $m \times n$ matrix $A$ with the vector $\mathbf{x} \in \mathbb{R}^n$ in such a way that the vector $T(\mathbf{x}) \in \mathbb{R}^m$ is equal to $A\mathbf{x}$. (We will occasionally denote the linear map defined in this way by $\mu_A$.) In accordance with the formula $(*)$ on p. 24, we have
$$A\mathbf{x} = T(\mathbf{x}) = \sum_{i=1}^{n} x_i T(\mathbf{e}_i) = \sum_{i=1}^{n} x_i \mathbf{a}_i,$$
where
$$\mathbf{a}_1 = \begin{bmatrix} a_{11} \\ \vdots \\ a_{m1} \end{bmatrix}, \quad \mathbf{a}_2 = \begin{bmatrix} a_{12} \\ \vdots \\ a_{m2} \end{bmatrix}, \quad \dots, \quad \mathbf{a}_n = \begin{bmatrix} a_{1n} \\ \vdots \\ a_{mn} \end{bmatrix} \in \mathbb{R}^m$$
are the column vectors of the matrix $A$. That is, $A\mathbf{x}$ is the linear combination of the vectors $\mathbf{a}_1, \dots, \mathbf{a}_n$, weighted according to the coordinates of the vector $\mathbf{x}$.
There is, however, an alternative interpretation. Let
$$A_1 = \begin{bmatrix} a_{11} \\ \vdots \\ a_{1n} \end{bmatrix}, \quad A_2 = \begin{bmatrix} a_{21} \\ \vdots \\ a_{2n} \end{bmatrix}, \quad \dots, \quad A_m = \begin{bmatrix} a_{m1} \\ \vdots \\ a_{mn} \end{bmatrix} \in \mathbb{R}^n$$
be the row vectors of the matrix $A$; then the $i$th coordinate of $A\mathbf{x}$ is the dot product $A_i \cdot \mathbf{x}$. As we shall study in great detail in Chapter 4, this allows us to interpret the equation $A\mathbf{x} = \mathbf{y}$ as a system of $m$ linear equations in the variables $x_1, \dots, x_n$.
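To see the two viewpoints side by side in a small numeric case: with
$$A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}, \quad \mathbf{x} = \begin{bmatrix} 5 \\ 6 \end{bmatrix},$$
the column (linear combinations) viewpoint gives
$$A\mathbf{x} = 5\begin{bmatrix} 1 \\ 3 \end{bmatrix} + 6\begin{bmatrix} 2 \\ 4 \end{bmatrix} = \begin{bmatrix} 17 \\ 39 \end{bmatrix},$$
while the row (dot product) viewpoint gives the same answer entry by entry: $A_1 \cdot \mathbf{x} = 1 \cdot 5 + 2 \cdot 6 = 17$ and $A_2 \cdot \mathbf{x} = 3 \cdot 5 + 4 \cdot 6 = 39$.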
4.1. Algebra of Linear Functions. Denote by $\mathcal{M}_{m \times n}$ the set of all $m \times n$ matrices. In an obvious way this set can be identified with $\mathbb{R}^{mn}$ (how?). Indeed, we begin by observing that we can add $m \times n$ matrices and multiply them by scalars just as we did vectors.
For future reference, we call a matrix square if $m = n$ (i.e., it has equal numbers of rows and columns). We refer to the entries $a_{ii}$, $i = 1, \dots, n$, as diagonal entries. We call the (square) matrix a diagonal matrix if $a_{ij} = 0$ whenever $i \neq j$, i.e., if every non-diagonal entry is 0. A square matrix all of whose entries below the diagonal are 0 is called upper triangular; one all of whose entries above the diagonal are 0 is called lower triangular.
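For instance, among $3 \times 3$ matrices,
$$\begin{bmatrix} 2 & 0 & 0 \\ 0 & 5 & 0 \\ 0 & 0 & -1 \end{bmatrix}, \quad \begin{bmatrix} 1 & 4 & 2 \\ 0 & 3 & 7 \\ 0 & 0 & 6 \end{bmatrix}, \quad \begin{bmatrix} 1 & 0 & 0 \\ 4 & 3 & 0 \\ 2 & 7 & 6 \end{bmatrix}$$
are, respectively, diagonal, upper triangular, and lower triangular.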
If $S, T: \mathbb{R}^n \to \mathbb{R}^m$ are linear maps and $c \in \mathbb{R}$, then we can obviously form the linear maps $cT: \mathbb{R}^n \to \mathbb{R}^m$ and $S + T: \mathbb{R}^n \to \mathbb{R}^m$, defined, respectively, by
$$(cT)(\mathbf{x}) = c\,T(\mathbf{x}) \quad \text{and} \quad (S + T)(\mathbf{x}) = S(\mathbf{x}) + T(\mathbf{x}).$$
Denote by $O$ the zero matrix, the $m \times n$ matrix all of whose entries are 0. As the reader can easily check, scalar multiplication of matrices and matrix addition satisfy the same properties as scalar multiplication of vectors and vector addition (see Exercise 1.1.12). We list them here for reference.
(1) For all $A, B \in \mathcal{M}_{m \times n}$, $A + B = B + A$.
(2) For all $A, B, C \in \mathcal{M}_{m \times n}$, $(A + B) + C = A + (B + C)$.
(3) $O + A = A$ for all $A \in \mathcal{M}_{m \times n}$.
(4) For each $A \in \mathcal{M}_{m \times n}$, there is a matrix $-A$ so that $A + (-A) = O$.
(5) For all $c, d \in \mathbb{R}$ and $A \in \mathcal{M}_{m \times n}$, $c(dA) = (cd)A$.
(6) For all $c \in \mathbb{R}$ and $A, B \in \mathcal{M}_{m \times n}$, $c(A + B) = cA + cB$.
(7) For all $c, d \in \mathbb{R}$ and $A \in \mathcal{M}_{m \times n}$, $(c + d)A = cA + dA$.
(8) $1A = A$.
Of all the operations one performs on functions, probably the most powerful is composition. Recall that when $g(x)$ is in the domain of $f$, we define $(f \circ g)(x) = f(g(x))$. So, suppose we have linear maps $S: \mathbb{R}^p \to \mathbb{R}^n$ and $T: \mathbb{R}^n \to \mathbb{R}^m$. Then we define $T \circ S: \mathbb{R}^p \to \mathbb{R}^m$ by $(T \circ S)(\mathbf{x}) = T(S(\mathbf{x}))$. It is well known that composition of functions is not commutative⁴ but is associative, inasmuch as
$$\big((f \circ g) \circ h\big)(x) = f(g(h(x))) = \big(f \circ (g \circ h)\big)(x).$$
We want to define matrix multiplication so that it corresponds to the composition of linear maps. Let $A$ be the $m \times n$ matrix representing $T$ and let $B$ be the $n \times p$ matrix representing $S$. We expect that the $m \times p$ matrix $C$ representing $T \circ S$ can be expressed in terms of $A$ and $B$. The $j$th column of $C$ is the vector $(T \circ S)(\mathbf{e}_j) \in \mathbb{R}^m$. Now,
$$T(S(\mathbf{e}_j)) = T\left(\begin{bmatrix} b_{1j} \\ b_{2j} \\ \vdots \\ b_{nj} \end{bmatrix}\right) = b_{1j}\mathbf{a}_1 + b_{2j}\mathbf{a}_2 + \cdots + b_{nj}\mathbf{a}_n,$$
where $\mathbf{a}_1, \dots, \mathbf{a}_n$ are the column vectors of $A$. That is, the $j$th column of $C$ is the product of the matrix $A$ with the vector $\mathbf{b}_j$. So we now make the definition:
$$(AB)_{ij} = a_{i1}b_{1j} + a_{i2}b_{2j} + \cdots + a_{in}b_{nj} = \sum_{k=1}^{n} a_{ik}b_{kj},$$
i.e., the dot product of the $i$th row vector of $A$ and the $j$th column vector of $B$, both of which are vectors in $\mathbb{R}^n$. Graphically, we have
$$\begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ \vdots & \vdots & & \vdots \\ a_{i1} & a_{i2} & \cdots & a_{in} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix}\begin{bmatrix} b_{11} & \cdots & b_{1j} & \cdots & b_{1p} \\ b_{21} & \cdots & b_{2j} & \cdots & b_{2p} \\ \vdots & & \vdots & & \vdots \\ b_{n1} & \cdots & b_{nj} & \cdots & b_{np} \end{bmatrix} = \begin{bmatrix} \cdots & \cdots & \cdots \\ \cdots & (AB)_{ij} & \cdots \\ \cdots & \cdots & \cdots \end{bmatrix}.$$
We reiterate that in order for the product AB to be defined, the number of columns of A must
equal the number of rows of B.
Example 4. If
$$A = \begin{bmatrix} 1 & 3 \\ 2 & -1 \\ 1 & 1 \end{bmatrix} \quad \text{and} \quad B = \begin{bmatrix} 4 & 1 & 0 & -2 \\ -1 & 1 & 5 & 1 \end{bmatrix},$$
then
$$AB = \begin{bmatrix} 1 & 3 \\ 2 & -1 \\ 1 & 1 \end{bmatrix}\begin{bmatrix} 4 & 1 & 0 & -2 \\ -1 & 1 & 5 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 4 & 15 & 1 \\ 9 & 1 & -5 & -5 \\ 3 & 2 & 5 & -1 \end{bmatrix}.$$
Notice also that the product $BA$ does not make sense: $B$ is a $2 \times 4$ matrix and $A$ is $3 \times 2$, and $4 \neq 3$. ▽

⁴e.g., $\sin(x^2) \neq \sin^2 x$
The preceding example brings out an important point about the nature of matrix multiplication:
it can happen that the matrix product AB is defined and the product BA is not. Now if A is an
m × n matrix and B is an n × m matrix, then both products AB and BA make sense: AB is m × m
and BA is n × n. Notice that these are both square matrices, but of different sizes. But even if we
start with both A and B as n × n matrices, the products AB and BA need not be equal.
When—and only when—A is a square matrix (i.e., m = n), we can multiply A by itself,
obtaining A2 = AA, A3 = A2 A = AA2 , etc. If we think of Ax as resulting from x by performing
some geometric procedure, then (A2 )x should result from performing that procedure twice, (A3 )x
thrice, and so on.
Example 6. Let
$$A = \begin{bmatrix} \tfrac{1}{5} & \tfrac{2}{5} \\[2pt] \tfrac{2}{5} & \tfrac{4}{5} \end{bmatrix}.$$
Then it is easy to check that $A^2 = A$, so $A^n = A$ for all positive integers $n$ (why?). What is the geometric explanation? Note that
$$A\begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} \tfrac{1}{5} \\[2pt] \tfrac{2}{5} \end{bmatrix} = \tfrac{1}{5}\begin{bmatrix} 1 \\ 2 \end{bmatrix} \quad \text{and} \quad A\begin{bmatrix} 0 \\ 1 \end{bmatrix} = \begin{bmatrix} \tfrac{2}{5} \\[2pt] \tfrac{4}{5} \end{bmatrix} = \tfrac{2}{5}\begin{bmatrix} 1 \\ 2 \end{bmatrix},$$
so that for every $\mathbf{x} \in \mathbb{R}^2$, we see that $A\mathbf{x}$ lies on the line spanned by $\begin{bmatrix} 1 \\ 2 \end{bmatrix}$. Indeed, we can tell more:
$$A\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} \tfrac{1}{5}(x_1 + 2x_2) \\[2pt] \tfrac{2}{5}(x_1 + 2x_2) \end{bmatrix} = \frac{x_1 + 2x_2}{5}\begin{bmatrix} 1 \\ 2 \end{bmatrix} = \frac{\mathbf{x} \cdot \begin{bmatrix} 1 \\ 2 \end{bmatrix}}{\left\|\begin{bmatrix} 1 \\ 2 \end{bmatrix}\right\|^2}\begin{bmatrix} 1 \\ 2 \end{bmatrix}$$
is the projection of $\mathbf{x}$ onto the line spanned by $\begin{bmatrix} 1 \\ 2 \end{bmatrix}$. This explains why $A^2\mathbf{x} = A\mathbf{x}$ for every $\mathbf{x} \in \mathbb{R}^2$: $A^2\mathbf{x} = A(A\mathbf{x})$, and once we’ve projected the vector $\mathbf{x}$ onto the line, it stays exactly the same. ▽
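The claim $A^2 = A$ can also be checked directly:
$$A^2 = \frac{1}{25}\begin{bmatrix} 1 & 2 \\ 2 & 4 \end{bmatrix}\begin{bmatrix} 1 & 2 \\ 2 & 4 \end{bmatrix} = \frac{1}{25}\begin{bmatrix} 5 & 10 \\ 10 & 20 \end{bmatrix} = \frac{1}{5}\begin{bmatrix} 1 & 2 \\ 2 & 4 \end{bmatrix} = A.$$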
Example 7. There is an interesting way to interpret matrix powers in terms of directed graphs. Starting with the matrix
$$A = \begin{bmatrix} 0 & 2 & 1 \\ 1 & 1 & 1 \\ 1 & 0 & 1 \end{bmatrix},$$
we draw a graph with 3 nodes (vertices) and $a_{ij}$ directed edges (paths) from node $i$ to node $j$, as shown in Figure 4.7. For example, there are 2 edges from node 1 to node 2 and none from node 3 to node 2.

Figure 4.7
We calculate
$$A^2 = \begin{bmatrix} 0 & 2 & 1 \\ 1 & 1 & 1 \\ 1 & 0 & 1 \end{bmatrix}\begin{bmatrix} 0 & 2 & 1 \\ 1 & 1 & 1 \\ 1 & 0 & 1 \end{bmatrix} = \begin{bmatrix} 3 & 2 & 3 \\ 2 & 3 & 3 \\ 1 & 2 & 2 \end{bmatrix}, \quad A^3 = \begin{bmatrix} 5 & 8 & 8 \\ 6 & 7 & 8 \\ 4 & 4 & 5 \end{bmatrix}, \quad \dots, \quad A^7 = \begin{bmatrix} 272 & 338 & 377 \\ 273 & 337 & 377 \\ 169 & 208 & 233 \end{bmatrix}.$$
For example, the 13-entry of $A^2$ is
$$(A^2)_{13} = a_{11}a_{13} + a_{12}a_{23} + a_{13}a_{33} = (0)(1) + (2)(1) + (1)(1) = 3.$$
With a bit of thought, the reader will convince herself that the $ij$-entry of $A^2$ is the number of “two-step” directed paths from node $i$ to node $j$. Similarly, the $ij$-entry of $A^n$ is the number of $n$-step directed paths from node $i$ to node $j$. ▽
We have seen that, in general, matrix multiplication is not commutative. However, it does have the following crucial properties. Let $I_n$ denote the $n \times n$ matrix with 1’s on the diagonal and 0’s elsewhere.

Proposition 4.1. Whenever the following sums and products are defined, we have
(1) $A(BC) = (AB)C$;
(2) $A(B + C) = AB + AC$ and $(A + B)C = AC + BC$;
(3) $AI_n = A = I_m A$ for any $m \times n$ matrix $A$.

Proof. These are all immediate from the linear map viewpoint. $\square$
Definition. We say an $n \times n$ matrix $A$ is invertible if there is an $n \times n$ matrix $B$ so that $AB = BA = I_n$; in that event, $B$ is called the inverse of $A$ and is denoted $A^{-1}$.

If $A$ is the matrix representing the linear transformation $T: \mathbb{R}^n \to \mathbb{R}^n$, then $A^{-1}$ represents the inverse function $T^{-1}$, which must then also be a linear transformation.
Example 9. It will be convenient for our future work to have the inverse of a $2 \times 2$ matrix
$$A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}.$$
Provided $ad - bc \neq 0$, if we set
$$A^{-1} = \frac{1}{ad - bc}\begin{bmatrix} d & -b \\ -c & a \end{bmatrix},$$
then an easy calculation shows that $AA^{-1} = A^{-1}A = I_2$, as needed. ▽
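For instance, applying the formula to $A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}$, we have $ad - bc = 1 \cdot 4 - 2 \cdot 3 = -2$, so
$$A^{-1} = -\frac{1}{2}\begin{bmatrix} 4 & -2 \\ -3 & 1 \end{bmatrix} = \begin{bmatrix} -2 & 1 \\ \tfrac{3}{2} & -\tfrac{1}{2} \end{bmatrix},$$
and one checks directly that $AA^{-1} = I_2$.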
Example 10. It follows immediately from Example 9 that for our rotation matrix
$$A_\theta = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}, \quad \text{we have} \quad A_\theta^{-1} = \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix}.$$
Since $\cos(-\theta) = \cos\theta$ and $\sin(-\theta) = -\sin\theta$, we see that this is the matrix $A_{(-\theta)}$. If we think about the corresponding linear maps, this result becomes obvious: to invert (or “undo”) a rotation through angle $\theta$, we must rotate through angle $-\theta$. ▽
Example 11. As an application of Example 9, we can now show that any two nonparallel vectors $\mathbf{u}, \mathbf{v} \in \mathbb{R}^2$ must span $\mathbb{R}^2$. It is easy to check that if $\mathbf{u} = \begin{bmatrix} u_1 \\ u_2 \end{bmatrix}$ and $\mathbf{v} = \begin{bmatrix} v_1 \\ v_2 \end{bmatrix}$ are nonparallel, then $u_1 v_2 - u_2 v_1 \neq 0$, so the matrix
$$A = \begin{bmatrix} u_1 & v_1 \\ u_2 & v_2 \end{bmatrix}$$
is invertible. Given $\mathbf{x} \in \mathbb{R}^2$, define $\mathbf{c} = \begin{bmatrix} c_1 \\ c_2 \end{bmatrix}$ by $\mathbf{c} = A^{-1}\mathbf{x}$. Then we have
$$\mathbf{x} = A(A^{-1}\mathbf{x}) = A\mathbf{c} = c_1\mathbf{u} + c_2\mathbf{v},$$
as required. ▽
We shall learn in Chapter 4 how to calculate the inverse of a matrix in a straightforward fashion.
We end the present discussion of inverses with a very important observation.
Proposition 4.3. Suppose $A$ and $B$ are invertible $n \times n$ matrices. Then their product $AB$ is invertible, and
$$(AB)^{-1} = B^{-1}A^{-1}.$$
Remark. Some people refer to this result rather endearingly as the “shoe-sock theorem,” for
to undo (invert) the process of putting on one’s socks and then one’s shoes, one must first remove
the shoes and then remove the socks.
Proof. To prove the matrix $AB$ is invertible, we need only check that the candidate for the inverse works. That is, we need to check that
$$(AB)(B^{-1}A^{-1}) = A(BB^{-1})A^{-1} = AI_nA^{-1} = AA^{-1} = I_n,$$
and, similarly, $(B^{-1}A^{-1})(AB) = B^{-1}(A^{-1}A)B = B^{-1}B = I_n$. $\square$
4.2. The Transpose. The final matrix operation we will discuss in this chapter is the transpose. When $A$ is an $m \times n$ matrix with entries $a_{ij}$, the matrix $A^{\mathsf{T}}$ (read “$A$ transpose”) is the $n \times m$ matrix whose $ij$-entry is $a_{ji}$; i.e., the $i$th row of $A^{\mathsf{T}}$ is the $i$th column of $A$. We say a square matrix $A$ is symmetric if $A^{\mathsf{T}} = A$ and skew-symmetric if $A^{\mathsf{T}} = -A$.
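For instance,
$$A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix} \quad \Longrightarrow \quad A^{\mathsf{T}} = \begin{bmatrix} 1 & 4 \\ 2 & 5 \\ 3 & 6 \end{bmatrix},$$
and a matrix such as $\begin{bmatrix} 0 & 1 \\ 1 & 2 \end{bmatrix}$ is symmetric, while $\begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}$ is skew-symmetric.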
Proposition 4.4. For matrices $A$ and $B$ and scalar $c$ (of appropriate sizes for the indicated operations), we have
(1) $(A^{\mathsf{T}})^{\mathsf{T}} = A$;
(2) $(A + B)^{\mathsf{T}} = A^{\mathsf{T}} + B^{\mathsf{T}}$;
(3) $(cA)^{\mathsf{T}} = cA^{\mathsf{T}}$;
(4) $(AB)^{\mathsf{T}} = B^{\mathsf{T}}A^{\mathsf{T}}$, where $A$ is $m \times n$ and $B$ is $n \times p$.

Proof. The first is obvious, since we swap rows and columns and then swap again, returning to our original matrix. The second and third are immediate to check. The last result is more interesting, and we will use it to derive a crucial result in a moment. Note, first, that $AB$ is an $m \times p$ matrix, so $(AB)^{\mathsf{T}}$ will be a $p \times m$ matrix; $B^{\mathsf{T}}A^{\mathsf{T}}$ is the product of a $p \times n$ matrix and an $n \times m$ matrix and hence will be $p \times m$ as well, so the shapes agree. Now, the $ji$-entry of $AB$ is the dot product of the $j$th row vector of $A$ and the $i$th column vector of $B$, i.e., the $ij$-entry of $(AB)^{\mathsf{T}}$ is
$$\big((AB)^{\mathsf{T}}\big)_{ij} = (AB)_{ji} = A_j \cdot \mathbf{b}_i.$$
On the other hand, the $ij$-entry of $B^{\mathsf{T}}A^{\mathsf{T}}$ is the dot product of the $i$th row vector of $B^{\mathsf{T}}$ and the $j$th column vector of $A^{\mathsf{T}}$; but this is, by definition, the dot product of the $i$th column vector of $B$ and the $j$th row vector of $A$. That is,
$$(B^{\mathsf{T}}A^{\mathsf{T}})_{ij} = \mathbf{b}_i \cdot A_j,$$
and, since the dot product is commutative, the two formulas agree. $\square$
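A quick numeric check of the product rule: with $A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}$ and $B = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$,
$$AB = \begin{bmatrix} 2 & 1 \\ 4 & 3 \end{bmatrix}, \quad (AB)^{\mathsf{T}} = \begin{bmatrix} 2 & 4 \\ 1 & 3 \end{bmatrix}, \quad B^{\mathsf{T}}A^{\mathsf{T}} = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}\begin{bmatrix} 1 & 3 \\ 2 & 4 \end{bmatrix} = \begin{bmatrix} 2 & 4 \\ 1 & 3 \end{bmatrix},$$
whereas $A^{\mathsf{T}}B^{\mathsf{T}} = \begin{bmatrix} 3 & 1 \\ 4 & 2 \end{bmatrix}$, so the order of the factors really does matter.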
The transpose matrix will be so important to us because of the interplay between dot product and transpose. If $\mathbf{x}$ and $\mathbf{y}$ are vectors in $\mathbb{R}^n$, then by virtue of our very definition of matrix multiplication,
$$\mathbf{x} \cdot \mathbf{y} = \mathbf{x}^{\mathsf{T}}\mathbf{y},$$
provided we agree to think of a $1 \times 1$ matrix as a scalar. Now we have the highly useful

Proposition 4.5. Let $A$ be an $m \times n$ matrix, $\mathbf{x} \in \mathbb{R}^n$, and $\mathbf{y} \in \mathbb{R}^m$. Then
$$A\mathbf{x} \cdot \mathbf{y} = \mathbf{x} \cdot A^{\mathsf{T}}\mathbf{y}.$$
Remark. You might remember this: to move the matrix “across the dot product,” you must
transpose it.
Proof. We just calculate, using the formula for the transpose of a product and, as usual, associativity:
$$A\mathbf{x} \cdot \mathbf{y} = (A\mathbf{x})^{\mathsf{T}}\mathbf{y} = (\mathbf{x}^{\mathsf{T}}A^{\mathsf{T}})\mathbf{y} = \mathbf{x}^{\mathsf{T}}(A^{\mathsf{T}}\mathbf{y}) = \mathbf{x} \cdot A^{\mathsf{T}}\mathbf{y}. \qquad \square$$
Example 13. We return to the economic interpretation of the dot product given in the remark on p. 12. Suppose that $m$ different ingredients are required to manufacture $n$ different products. To manufacture the product vector $\mathbf{x} = \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix}$ requires the ingredient vector $\mathbf{y} = \begin{bmatrix} y_1 \\ \vdots \\ y_m \end{bmatrix}$, and we suppose $\mathbf{x}$ and $\mathbf{y}$ are related by the equation $\mathbf{y} = A\mathbf{x}$ for some $m \times n$ matrix $A$. If each unit of ingredient $j$ costs a price $p_j$, then the cost of producing $\mathbf{x}$ is
$$\sum_{j=1}^{m} p_j y_j = \mathbf{y} \cdot \mathbf{p} = A\mathbf{x} \cdot \mathbf{p} = \mathbf{x} \cdot A^{\mathsf{T}}\mathbf{p} = \sum_{i=1}^{n} q_i x_i,$$
where $\mathbf{q} = A^{\mathsf{T}}\mathbf{p}$. Notice then that $q_i$ is the amount it costs to produce a unit of the $i$th product. Our fundamental formula, Proposition 4.5, tells us that the total cost of the ingredients should equal the total worth of the products we manufacture. ▽
EXERCISES 1.4
" # " # " # 0 1
1 2 2 1 1 2 1
1. Let A = ,B = ,C = , and D = 1 0 . Calculate
3 4 4 3 0 1 2
2 3
each of the following expressions or explain why it is not defined.
a. A+B e. AB i. BD
*b. 2A − B *f. BA j. DB
c. A−C *g. AC *k. CD
d. C +D *h. CA *l. DC
2. a. If A is an m × n matrix and Ax = 0 for all x ∈ Rn , prove that A = O.
b. If A and B are m × n matrices and Ax = Bx for all x ∈ Rn , prove that A = B.
♯ 3. Let A be an m × n matrix. Show that V = {x ∈ Rn : Ax = 0} is a subspace of Rn .
♯ 4. Let $A$ be an $m \times n$ matrix.
a. Show that $V = \left\{\begin{bmatrix} \mathbf{x} \\ A\mathbf{x} \end{bmatrix} : \mathbf{x} \in \mathbb{R}^n\right\} \subset \mathbb{R}^{m+n}$ is a subspace of $\mathbb{R}^{m+n}$.
b. When $m = 1$, show that $V \subset \mathbb{R}^{n+1}$ is a hyperplane (see Example 1(e) in Section 3) by finding a vector $\mathbf{b} \in \mathbb{R}^{n+1}$ so that $V = \{\mathbf{z} \in \mathbb{R}^{n+1} : \mathbf{b} \cdot \mathbf{z} = 0\}$.
b. Use your answer to part a to derive the addition formulas for cos and sin.
8. Prove or give a counterexample. Assume all relevant matrices are square and of the same size.
a. If AB = CB and B 6= O, then A = C.
b. If A2 = A, then A = O or A = I.
c. (A + B)(A − B) = A2 − B 2 .
d. If AB = BC and B is invertible, then A = C.
9. Find all $2 \times 2$ matrices $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$ satisfying
a. $A^2 = I_2$
*b. $A^2 = O$
c. $A^2 = -I_2$
10. a. Show that the matrix giving reflection across the line spanned by $\begin{bmatrix} \cos\theta \\ \sin\theta \end{bmatrix}$ is
$$R = \begin{bmatrix} \cos 2\theta & \sin 2\theta \\ \sin 2\theta & -\cos 2\theta \end{bmatrix}.$$
b. Letting $A_\theta$ be the rotation matrix defined on p. 27, check that
$$A_{2\theta}\begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix} = R = A_\theta\begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}A_{(-\theta)}.$$
11. For each of the following matrices $A$, find a formula for $A^n$. (If you know how to do proof by induction, please do.)
a. $A = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}$
b. $A = \begin{bmatrix} d_1 & & & \\ & d_2 & & \\ & & \ddots & \\ & & & d_m \end{bmatrix}$ (all nondiagonal entries are 0)
12. Suppose $A$ and $A'$ are $m \times m$ matrices, $B$ and $B'$ are $m \times n$ matrices, $C$ and $C'$ are $n \times m$ matrices, and $D$ and $D'$ are $n \times n$ matrices. Check the following formula for the product of “block” matrices:
$$\begin{bmatrix} A & B \\ C & D \end{bmatrix}\begin{bmatrix} A' & B' \\ C' & D' \end{bmatrix} = \begin{bmatrix} AA' + BC' & AB' + BD' \\ CA' + DC' & CB' + DD' \end{bmatrix}.$$
*13. Let T : R2 → R2 be the linear transformation defined by rotating the plane π/2 counterclock-
wise; let S : R2 → R2 be the linear transformation defined by reflecting the plane across the
line x1 + x2 = 0.
14. Calculate the standard matrix for each of the following linear transformations T :
a. T : R2 → R2 given by rotating −π/4 about the origin and then reflecting across the line
x1 − x2 = 0.
b. T : R3 → R3 given by rotating π/2 about the x1 -axis (as viewed from the positive side)
and then reflecting across the plane x2 = 0.
c. T : R3 → R3 given by rotating −π/2 about the x1 -axis (as viewed from the positive side)
and then rotating π/2 about the x3 -axis.
15. Consider the cube with vertices $\begin{bmatrix} \pm 1 \\ \pm 1 \\ \pm 1 \end{bmatrix}$, pictured in Figure 4.8. (Note that the coordinate axes pass through the centers of the various faces.) Give the standard matrices for each of the following symmetries of the cube.
Figure 4.8
Figure 4.9
a. $120°$ rotation counterclockwise (as viewed from high above) about the line joining $\begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}$ and the vertex $\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}$
b. $180°$ rotation about the line joining $\begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}$ and $\begin{bmatrix} 0 \\ 0 \\ -1 \end{bmatrix}$
c. reflection across the plane containing one edge and the midpoint of the opposite edge (Hint: Note where the coordinate axes intersect the tetrahedron.)
21. Suppose A is an n × n matrix satisfying A10 = O. Prove that the matrix In − A is invertible.
(Hint: As a warm-up, try assuming A2 = O.)
22. Define the trace of an $n \times n$ matrix $A$ (denoted $\mathrm{tr}\,A$) to be the sum of its diagonal entries:
$$\mathrm{tr}\,A = \sum_{i=1}^{n} a_{ii}.$$
c. Prove that $\mathrm{tr}(AB) = \mathrm{tr}(BA)$. (Hint: $\sum\limits_{k=1}^{n}\sum\limits_{\ell=1}^{n} c_{k\ell} = \sum\limits_{\ell=1}^{n}\sum\limits_{k=1}^{n} c_{k\ell}$.)
23. Let
$$A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}, \quad B = \begin{bmatrix} 2 & 1 \\ 4 & 3 \end{bmatrix}, \quad C = \begin{bmatrix} 1 & 2 & 1 \\ 0 & 1 & 2 \end{bmatrix}, \quad \text{and} \quad D = \begin{bmatrix} 0 & 1 \\ 1 & 0 \\ 2 & 3 \end{bmatrix}.$$
Calculate each of the following expressions or explain why it is not defined.
a. $A^{\mathsf{T}}$  *e. $A^{\mathsf{T}}C$  i. $D^{\mathsf{T}}B$
*b. $2A - B^{\mathsf{T}}$  f. $AC^{\mathsf{T}}$  *j. $CC^{\mathsf{T}}$
c. $C^{\mathsf{T}}$  *g. $C^{\mathsf{T}}A^{\mathsf{T}}$  *k. $C^{\mathsf{T}}C$
d. $C^{\mathsf{T}} + D$  h. $BD^{\mathsf{T}}$  l. $C^{\mathsf{T}}D^{\mathsf{T}}$
*24. Suppose A and B are symmetric. Prove that AB is symmetric if and only if AB = BA.
26. Suppose $A$ is invertible. Check that $(A^{-1})^{\mathsf{T}}A^{\mathsf{T}} = I$ and $A^{\mathsf{T}}(A^{-1})^{\mathsf{T}} = I$, and deduce that $A^{\mathsf{T}}$ is likewise invertible.
*27. Let $A_\theta$ be the rotation matrix defined on p. 27. Explain why $A_\theta^{-1} = A_\theta^{\mathsf{T}}$.
28. An n × n matrix is called a permutation matrix if it has a single 1 in each row and column and
all its remaining entries are 0.
a. Write down all the 2 × 2 permutation matrices. How many are there?
b. Write down all the 3 × 3 permutation matrices. How many are there?
c. Prove that the product of two permutation matrices is again a permutation matrix. Do
they commute?
d. Prove that every permutation matrix is invertible and $P^{-1} = P^{\mathsf{T}}$.
e. If A is an n × n matrix and P is an n × n permutation matrix, describe the matrices P A
and AP .
♯ 29. Let A be an m × n matrix and let x, y ∈ Rn . Prove that if Ax = 0 and y = AT b for some
b ∈ Rm , then x · y = 0.
♯ 30. Suppose A is a symmetric n × n matrix. Let V ⊂ Rn be a subspace with the property that
Ax ∈ V for every x ∈ V . Prove that Ay ∈ V ⊥ for all y ∈ V ⊥ .
♯ *32. Suppose $A$ is an $m \times n$ matrix and $\mathbf{x} \in \mathbb{R}^n$ satisfies $(A^{\mathsf{T}}A)\mathbf{x} = \mathbf{0}$. Prove that $A\mathbf{x} = \mathbf{0}$. (Hint: What is $\|A\mathbf{x}\|$?)
b. Fill in the missing columns in the following matrices to make them orthogonal:
$$\begin{bmatrix} \frac{\sqrt{3}}{2} & ? \\[2pt] -\frac{1}{2} & ? \end{bmatrix}, \quad \begin{bmatrix} 1 & 0 & ? \\ 0 & -1 & ? \\ 0 & 0 & ? \end{bmatrix}, \quad \begin{bmatrix} \frac{1}{3} & ? & \frac{2}{3} \\[2pt] \frac{2}{3} & ? & -\frac{2}{3} \\[2pt] \frac{2}{3} & ? & \frac{1}{3} \end{bmatrix}$$
for some real number θ. (Hint: Use part a, rather than the original definition.)
*d. Prove that if A is an orthogonal 2 × 2 matrix, then µA : R2 → R2 is either a rotation or
the composition of a rotation and a reflection.
e. Assume for now that AT = A−1 when A is orthogonal (this is a consequence of Corollary
2.2 of Chapter 4). Prove that the row vectors A1 , . . . , An of an orthogonal matrix A are
unit vectors that are orthogonal to one another.
37. Suppose A is an n × n matrix that commutes with all n × n matrices; i.e., AB = BA for all
B ∈ Mn×n . What can you say about A?
5. Introduction to Determinants and the Cross Product
Let x and y be vectors in R2 and consider the parallelogram P they span. The area of P is
nonzero so long as x and y are not collinear. We want to express the area of P in terms of the
coordinates of x and y. First notice that the area of the parallelogram pictured in Figure 5.1 is the
Figure 5.1
same as the area of the rectangle obtained by moving the shaded triangle from the right side to
the left. This rectangle has area A = bh, where b = kxk is the base and h = kyk sin θ is the height.
We could calculate $\sin\theta$ from the formula
$$\cos\theta = \frac{x\cdot y}{\|x\|\,\|y\|},$$
but instead we note (see Figure 5.2) that
$$\|x\|\,\|y\|\sin\theta = \|x\|\,\|y\|\cos\left(\tfrac{\pi}{2} - \theta\right) = \rho(x)\cdot y,$$
where ρ(x) is the vector obtained by rotating x an angle π/2 counterclockwise (see Exercise 1.2.16).
Figure 5.2
If $x = \begin{bmatrix} x_1\\ x_2\end{bmatrix}$ and $y = \begin{bmatrix} y_1\\ y_2\end{bmatrix}$, then we have
$$\operatorname{area}(\mathcal P) = \rho(x)\cdot y = \begin{bmatrix} -x_2\\ x_1\end{bmatrix}\cdot\begin{bmatrix} y_1\\ y_2\end{bmatrix} = x_1y_2 - x_2y_1.$$
Example 1. If $x = \begin{bmatrix} 3\\1\end{bmatrix}$ and $y = \begin{bmatrix} 4\\3\end{bmatrix}$, then the area of the parallelogram spanned by x and y is $x_1y_2 - x_2y_1 = 3\cdot 3 - 1\cdot 4 = 5$. On the other hand, if we interchange the two, letting $x = \begin{bmatrix} 4\\3\end{bmatrix}$ and $y = \begin{bmatrix} 3\\1\end{bmatrix}$, then we get $x_1y_2 - x_2y_1 = 4\cdot 1 - 3\cdot 3 = -5$. Certainly the parallelogram hasn't
changed, nor does it make sense to have negative area. What is the explanation? In deriving
our formula for the area above, we assumed 0 < θ < π; but if we must turn clockwise to get from
x to y, this means that θ is negative, resulting in a sign discrepancy in the area calculation. ▽
So we should amend our earlier result. We define the signed area of the parallelogram P to be
Figure 5.3
the area of P when one turns counterclockwise from x to y and to be the negative of the area of P when
one turns clockwise from x to y, as illustrated in Figure 5.3. Then we have:
signed area(P) = x1 y2 − x2 y1 .
Because of its geometric significance, we consider the function⁵
(∗) D(x, y) = x1 y2 − x2 y1 ;
this is the function that associates to each ordered pair of vectors x, y ∈ R2 the signed area of the
parallelogram they span.
Next, let's explore the properties of the signed area function D on R² × R².⁶
Property 1. If x, y ∈ R2 , then D(y, x) = −D(x, y).
Algebraically, we have
D(y, x) = y1 x2 − y2 x1 = −(x1 y2 − x2 y1 ) = −D(x, y).
Geometrically, this was the point of our introducing the notion of signed area.
Property 2. If x, y ∈ R2 and c ∈ R, then
D(cx, y) = cD(x, y) = D(x, cy).
This follows immediately from the formula (∗):
D(cx, y) = (cx1 )y2 − (cx2 )y1 = c(x1 y2 − x2 y1 ) = cD(x, y).
Geometrically, if we stretch one of the edges of the parallelogram by a factor of c > 0, then the
area is multiplied by a factor of c. And if c < 0, the area is multiplied by a factor of |c| and the
signed area changes sign (why?).
⁵Here, since x and y are themselves vectors, we use the customary notation for functions.
⁶Recall that, given two sets X and Y, their product X × Y consists of all ordered pairs (x, y), where x ∈ X and y ∈ Y.
Property 3. If x, y, z ∈ R², then D(x + y, z) = D(x, z) + D(y, z). Algebraically,
$$D(x+y, z) = (x_1+y_1)z_2 - (x_2+y_2)z_1 = (x_1z_2 - x_2z_1) + (y_1z_2 - y_2z_1) = D(x,z) + D(y,z),$$
as required. (The formula for D(x, y + z) can now be deduced using Property 1.) Geometrically,
we can deduce the result from Figure 5.4: the area of parallelogram OBCD (D(x + y, z)) is equal
Figure 5.4
to the sum of the areas of parallelograms OAED (D(x, z)) and ABCE (D(y, z)). The proof of
this, in turn, follows from the fact that △OAB is congruent to △DEC.
As we ask the reader to check in Exercise 4, one can deduce from the four properties above and
the geometry of linear maps the fact that the determinant represents the signed area of the paral-
lelogram.
We next turn to the case of 3 × 3 determinants. The general case will wait until Chapter 7.
Given three vectors,
$$x = \begin{bmatrix} x_1\\ x_2\\ x_3\end{bmatrix}, \quad y = \begin{bmatrix} y_1\\ y_2\\ y_3\end{bmatrix}, \quad\text{and}\quad z = \begin{bmatrix} z_1\\ z_2\\ z_3\end{bmatrix} \in \mathbb R^3,$$
we define
$$D(x, y, z) = \begin{vmatrix} | & | & |\\ x & y & z\\ | & | & | \end{vmatrix} = x_1\begin{vmatrix} y_2 & z_2\\ y_3 & z_3\end{vmatrix} - x_2\begin{vmatrix} y_1 & z_1\\ y_3 & z_3\end{vmatrix} + x_3\begin{vmatrix} y_1 & z_1\\ y_2 & z_2\end{vmatrix}.$$
Multiplying this out, we get three positive terms and three negative terms; a handy mnemonic device for this formula is depicted in Figure 5.5: repeat the first two columns to the right of the array; the three products along the diagonals running down to the right are taken with a plus sign, and the three products along the diagonals running up to the right with a minus sign.
Figure 5.5
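For instance, taking $x = \begin{pmatrix}1\\2\\3\end{pmatrix}$, $y = \begin{pmatrix}0\\1\\4\end{pmatrix}$, and $z = \begin{pmatrix}5\\6\\0\end{pmatrix}$ (vectors chosen purely for illustration), the defining formula gives
$$D(x,y,z) = 1\,(1\cdot 0 - 6\cdot 4) - 2\,(0\cdot 0 - 5\cdot 4) + 3\,(0\cdot 6 - 5\cdot 1) = -24 + 40 - 15 = 1,$$
and the mnemonic produces the same six terms: $0 + 0 + 40 - 15 - 24 - 0 = 1$.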
This function D of three vectors in R3 has properties quite analogous to those in the two-
dimensional case. In particular, it follows immediately from the latter that if x, y, z, and w are
vectors in R3 and c is a scalar, then
D(x, z, y) = −D(x, y, z)
D(x, cy, z) = cD(x, y, z) = D(x, y, cz)
D(x, y + w, z) = D(x, y, z) + D(x, w, z) and D(x, y, z + w) = D(x, y, z) + D(x, y, w).
It is also immediately obvious from the definition that if x, y, z, and w are vectors in R3 and c is
a scalar, then
D(cx, y, z) = cD(x, y, z)
D(x + w, y, z) = D(x, y, z) + D(w, y, z).
Least elegant is the verification that D(y, x, z) = −D(x, y, z):
$$D(y,x,z) = y_1\begin{vmatrix} x_2 & z_2\\ x_3 & z_3\end{vmatrix} - y_2\begin{vmatrix} x_1 & z_1\\ x_3 & z_3\end{vmatrix} + y_3\begin{vmatrix} x_1 & z_1\\ x_2 & z_2\end{vmatrix}$$
$$= y_1(x_2z_3 - x_3z_2) + y_2(x_3z_1 - x_1z_3) + y_3(x_1z_2 - x_2z_1)$$
$$= -x_1(y_2z_3 - y_3z_2) + x_2(y_1z_3 - y_3z_1) - x_3(y_1z_2 - y_2z_1)$$
$$= -x_1\begin{vmatrix} y_2 & z_2\\ y_3 & z_3\end{vmatrix} + x_2\begin{vmatrix} y_1 & z_1\\ y_3 & z_3\end{vmatrix} - x_3\begin{vmatrix} y_1 & z_1\\ y_2 & z_2\end{vmatrix} = -D(x, y, z).$$
Summarizing, we have:
Property 1. If x, y, z ∈ R3 , then
D(y, x, z) = D(x, z, y) = D(z, y, x) = −D(x, y, z).
Note that, as a consequence, whenever two of x, y, and z are the same, we have D(x, y, z) = 0: interchanging the two equal vectors changes the sign of D while leaving its value unchanged, so D(x, y, z) = −D(x, y, z).
If we let y′ = y − projx y and z′ = z − projx z − projy′ z, then it follows from the properties of
D that D(x, y, z) = D(x, y′ , z′ ). Moreover, we shall see when we study determinants in Chapter
7 that the results of Exercise 4 hold in three dimensions as well, so that the latter value is not
Figure 5.6
changed by rotating R3 to make x = αe1 , y′ = βe2 , and z′ = γe3 . Since rotation doesn’t change
signed volume, we deduce that D(x, y, z) equals the signed volume of the parallelepiped spanned
by x, y, and z, as suggested in Figure 5.6. For an alternative argument, see Exercise 18.
Given two vectors x, y ∈ R3 , define a vector, called their cross product, by
x × y = (x2 y3 − x3 y2 )e1 + (x3 y1 − x1 y3 )e2 + (x1 y2 − x2 y1 )e3
$$= \begin{vmatrix} e_1 & x_1 & y_1\\ e_2 & x_2 & y_2\\ e_3 & x_3 & y_3\end{vmatrix},$$
where the latter is to be interpreted "formally." The geometric interpretation of the cross product, as indicated in Figure 5.7, is the content of the following

Proposition. The vector x × y is orthogonal to both x and y, its length ‖x × y‖ equals the area of the parallelogram P spanned by x and y, and (when x and y are nonparallel) the triple x, y, x × y is right-handed.

Remark. More colloquially, if you curl the fingers of your right hand from x towards y, your thumb points in the direction of x × y.
Proof. The orthogonality is an immediate consequence of the properties once we realize that
the formula for the cross product guarantees that
z · (x × y) = D(z, x, y).
In particular, x · (x × y) = D(x, x, y) = 0.
Figure 5.7
Moreover, we have
$$\|x\times y\| = \operatorname{area}(\mathcal P).$$
When x and y are nonparallel, we have D(x, y, x × y) = kx × yk2 > 0, so the vectors span a
parallelepiped of positive signed volume, as desired.
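For instance, with $x = \begin{pmatrix}1\\2\\3\end{pmatrix}$ and $y = \begin{pmatrix}4\\5\\6\end{pmatrix}$ (an illustrative choice), the formula gives
$$x\times y = (2\cdot 6 - 3\cdot 5)\,e_1 + (3\cdot 4 - 1\cdot 6)\,e_2 + (1\cdot 5 - 2\cdot 4)\,e_3 = \begin{pmatrix} -3\\ 6\\ -3\end{pmatrix},$$
and one checks directly that $x\cdot(x\times y) = -3 + 12 - 9 = 0$ and $y\cdot(x\times y) = -12 + 30 - 18 = 0$, as the Proposition promises.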
Example. The plane P₁ passing through a point x₀ with normal vector $A = \begin{pmatrix}3\\-2\\1\end{pmatrix}$, where A · x₀ = 7, is
$$P_1 = \{x\in\mathbb R^3 : A\cdot(x - x_0) = 0\} = \{x\in\mathbb R^3 : A\cdot x = A\cdot x_0\} = \{x\in\mathbb R^3 : 3x_1 - 2x_2 + x_3 = 7\}. \ \triangledown$$
Figure 5.8
EXERCISES 1.5
1. Give a geometric proof that D(x, y + cx) = D(x, y) for any scalar c.
4. a. Check that when A and B are 2 × 2 matrices, we have det(AB) = det A det B.
b. Let A = Aθ be a rotation matrix. Check that det(Aθ B) = det B for any 2 × 2 matrix B.
c. Use the result of part b and the properties of determinants to give an alternative proof
that D(x, y) is the signed area of the parallelogram spanned by x and y.
5. Calculate the cross product of the given vectors x and y.
*a. $x = \begin{pmatrix}1\\0\\-1\end{pmatrix}$, $y = \begin{pmatrix}1\\2\\1\end{pmatrix}$
b. $x = \begin{pmatrix}1\\-2\\1\end{pmatrix}$, $y = \begin{pmatrix}7\\1\\-5\end{pmatrix}$
6. Find the area of the triangle with the given vertices.
*a. $A = \begin{pmatrix}0\\0\\0\end{pmatrix}$, $B = \begin{pmatrix}1\\0\\-1\end{pmatrix}$, $C = \begin{pmatrix}1\\2\\1\end{pmatrix}$
b. $A = \begin{pmatrix}1\\-1\\1\end{pmatrix}$, $B = \begin{pmatrix}2\\-1\\0\end{pmatrix}$, $C = \begin{pmatrix}2\\1\\2\end{pmatrix}$
c. $A = \begin{pmatrix}0\\0\\0\end{pmatrix}$, $B = \begin{pmatrix}1\\-2\\1\end{pmatrix}$, $C = \begin{pmatrix}7\\1\\-5\end{pmatrix}$
d. $A = \begin{pmatrix}1\\1\\1\end{pmatrix}$, $B = \begin{pmatrix}2\\-1\\2\end{pmatrix}$, $C = \begin{pmatrix}8\\2\\-4\end{pmatrix}$
10. Given the nonzero vector a ∈ R3 , a · x = b ∈ R, and a × x = c ∈ R3 , can you determine the
vector x ∈ R3 ? If so, give a geometric construction for x.
13. Let P be a parallelogram in R3 . Let P1 be its projection on the x2 x3 -plane, P2 be its projection
on the x1 x3 -plane, and P3 be its projection on the x1 x2 -plane. Prove that
$$\operatorname{area}(\mathcal P)^2 = \operatorname{area}(\mathcal P_1)^2 + \operatorname{area}(\mathcal P_2)^2 + \operatorname{area}(\mathcal P_3)^2.$$
(How’s that for a generalization of the Pythagorean Theorem?)
14. Let x, y, z ∈ R3 .
a. Show that x × y = −y × x and x × (y + z) = x × y + x × z.
b. Show that cross product is not associative; i.e., give specific vectors so that $(x\times y)\times z \neq x\times(y\times z)$.
*15. Given $a = \begin{pmatrix}a\\b\\c\end{pmatrix} \in \mathbb R^3$, define $T : \mathbb R^3\to\mathbb R^3$ by $T(x) = a\times x$. Prove that T is a linear transformation and give its standard matrix. Explain in the context of Proposition 4.5 why [T] is skew-symmetric.
16. Let x, y, z, w ∈ R³. Show that $(x\times y)\cdot(z\times w) = \begin{vmatrix} x\cdot z & x\cdot w\\ y\cdot z & y\cdot w\end{vmatrix}$.
c. Suppose x is the intersection of the medians of the triangle with vertices u, v, and w.
Compare the areas of the three triangles formed by joining x with any pair of the vertices.
(Cf. Exercise 1.1.8.)
d. Let r = D(v, w), s = D(w, u), and t = D(u, v). Show that ru + sv + tw = 0. Give a
physical interpretation of this result.
18. In this exercise, we give a self-contained derivation of the geometric interpretation of the 3 × 3
determinant as signed volume.
a. By direct algebraic calculation, show that kx × yk2 = kxk2 kyk2 − (x · y)2 . Deduce that
kx × yk is the area of the parallelogram spanned by x and y.
b. Show that z · (x × y) is the signed volume of the parallelepiped spanned by x, y, and z.
c. Conclude that D(x, y, z) equals the signed volume of that parallelepiped.
19. (Heron's formula) Given △OAB, let $\overrightarrow{OA} = x$ and $\overrightarrow{OB} = y$, and set ‖x‖ = a, ‖y‖ = b, and ‖x − y‖ = c. Let $s = \frac12(a + b + c)$ be the semiperimeter of the triangle. Use the formulas
$$\|x\times y\|^2 = \|x\|^2\|y\|^2 - (x\cdot y)^2 \quad\text{(see Exercise 18)}$$
$$\|x - y\|^2 = \|x\|^2 + \|y\|^2 - 2\,x\cdot y$$
to prove that the area A of △OAB satisfies
$$A^2 = \frac14\left(a^2b^2 - \frac14\left(c^2 - a^2 - b^2\right)^2\right) = s(s-a)(s-b)(s-c).$$
20. Let △ABC have sides a, b, and c. Let $s = \frac12(a+b+c)$ be its semiperimeter. Prove that the inradius of the triangle (i.e., the radius of its inscribed circle) is $r = \sqrt{(s-a)(s-b)(s-c)/s}$.
CHAPTER 2
Functions, Limits, and Continuity
In this brief chapter we introduce examples of non-linear functions, their graphs, and their level
sets. As usual in calculus, the notion of limit is a cornerstone on which calculus is built. To discuss
“nearness,” we need the concepts of open and closed sets and of convergent sequences. We then give
the usual theorems on limits of functions and several equivalent ways of thinking about continuity.
All of this will be the foundation for our work on differential calculus, which comes next.
1. Scalar- and Vector-Valued Functions

1.1. Parametrized Curves. First, we might study a vector-valued function of a single vari-
able. If we think of the independent variable, t, as time, then we can visualize f : (a, b) → Rn as
a parametrized curve—we can imagine a particle moving in Rn as time varies, and f (t) gives its
position at time t. At this point, we just give an assortment of examples. The careful analysis,
including the associated differential calculus and physical interpretations, will come in the next
chapter.
Example 1. The easiest examples, perhaps, are linear. Imagine a particle starting at position
x0 and moving with constant velocity v. Then its position at time t is evidently f (t) = x0 + tv and
its trajectory is a line passing through x0 and having direction vector v, as shown in Figure 1.1.
We refer to the vector-valued function f as a parametrization of the line. Here t is free to vary over
Figure 1.1
all of R. When we wish to parametrize the line passing through two points A and B, it is natural
Figure 1.2
to use one of those points, say A, as x₀ and the vector $\overrightarrow{AB}$ as the direction vector v, as indicated in Figure 1.2. ▽
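For instance, the line through $A = \begin{pmatrix}1\\0\\2\end{pmatrix}$ and $B = \begin{pmatrix}3\\1\\1\end{pmatrix}$ (points chosen just for illustration) is parametrized by
$$f(t) = \begin{pmatrix}1\\0\\2\end{pmatrix} + t\begin{pmatrix}2\\1\\-1\end{pmatrix} = \begin{pmatrix}1+2t\\ t\\ 2-t\end{pmatrix},$$
with f(0) = A and f(1) = B.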
Example 2. The next curve with which every mathematics student is familiar is the circle.
Essentially by the very definition of the trigonometric functions cos and sin, we obtain a very
natural parametrization of a circle of radius a, as pictured in Figure 1.3(a):
" # " #
cos t a cos t
f (t) = a = , 0 ≤ t ≤ 2π.
sin t a sin t
Similarly, we may consider
$$f(t) = \begin{bmatrix} a\cos t\\ b\sin t\end{bmatrix}, \qquad 0\le t\le 2\pi;$$
Figure 1.3
the latter gives a natural parametrization of the ellipse, as shown in Figure 1.3(b). Be warned,
however; here t is not the angle between the position vector and the positive x-axis, as Figure 1.3(c)
indicates. ▽
Figure 1.4. (a) the cuspidal cubic y² = x³; (b) the nodal cubic y² = x³ + x², with the line y = tx
Example 3. Consider the two cubic curves in R2 illustrated in Figure 1.4. On the left is the
cuspidal cubic y 2 = x3 , and on the right is the nodal cubic y 2 = x3 +x2 . These can be parametrized,
respectively, by the functions
" # " #
t2 t2 − 1
f (t) = 3 and f (t) = ,
t t(t2 − 1)
as the reader can verify.1 Now consider the twisted cubic in R3 , illustrated in Figure 1.5, given by
Figure 1.5. The twisted cubic, whose projections are y = x², z = x³, and z² = y³
¹To see where the latter came from, as suggested by Figure 1.4(b), we substitute y = tx in the equation and solve for x.
$$f(t) = \begin{bmatrix} t\\ t^2\\ t^3\end{bmatrix}, \qquad t\in\mathbb R.$$
Its projections in the xy-, xz-, and yz-coordinate planes are, respectively, y = x2 , z = x3 , and
z 2 = y 3 (the cuspidal cubic). ▽
Example 4. Our last example is a classic called the cycloid: It is the trajectory of a dot on
a rolling wheel (circle). Consider the illustration in Figure 1.6. Assuming the wheel rolls without
Figure 1.6
slipping, the distance it travels along the ground is equal to the length of the circular arc subtended
by the angle through which it has turned. That is, if the radius of the circle is a and it has turned
through angle t, then the point of contact with the x-axis, Q, is $at$ units to the right. The vector
Figure 1.7
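The computation suggested by Figure 1.7 (with C denoting the center of the wheel) runs as follows: the center sits at $C = \begin{pmatrix} at\\ a\end{pmatrix}$, and the dot P is displaced from C by $\begin{pmatrix} -a\sin t\\ -a\cos t\end{pmatrix}$, so the cycloid is parametrized by
$$f(t) = \begin{pmatrix} at - a\sin t\\ a - a\cos t\end{pmatrix} = a\begin{pmatrix} t - \sin t\\ 1 - \cos t\end{pmatrix}.$$
(At t = 0 the dot is at the origin, the point of contact, as it should be.)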
1.2. Scalar Functions of Several Variables. Next, we might study a scalar-valued function
of several variables. For example, we might study elevation of the earth as a function of position
on the surface of the earth, temperature at noon as a function of position in space, or, indeed,
temperature as a function of both position and time. If we have a function of n variables, to avoid
cumbersome notation, we will typically write
$$f\begin{pmatrix} x_1\\ \vdots\\ x_n\end{pmatrix} \qquad\text{rather than}\qquad f\left(\begin{bmatrix} x_1\\ \vdots\\ x_n\end{bmatrix}\right).$$
It would be typographically more pleasant and economical to suppress the vector notation and
write merely f (x1 , . . . , xn ), as do most mathematicians. We hope our choice will make it easier for
the reader to keep vectors in columns and not confuse rows and columns of matrices.
When n = 1 or n = 2, such functions are often best visualized by their graphs
(" # )
x n
graph(f ) = :x∈R ⊂ Rn+1 ,
f (x)
as pictured, for example, in Figure 1.8. There are two ways to try to visualize functions and their
Figure 1.8
graphs, as we shall see in further detail in Chapter 3. One is to fix all of the coordinates of x but
one, and see how f varies with each of x1 , . . . , xn individually. This corresponds to taking slices
of the graph, as shown in Figure 1.9. The other is to think of a topographical map, in which we
see curves representing points at the same elevation. One then can lift each of these up to the
appropriate height and imagine the surface interpolating among them, as illustrated in Figure 1.10.
These curves are called level curves or contour curves of the function.
Example 5. Suppose we see families of concentric circles as the level curves, as shown in Figure
1.11. We see that in (a) the circles are evenly spaced, whereas in (b) they grow closer together as
we move outwards. This tells us that in (a) the value of f grows linearly with the distance from
the origin and in (b) it grows more quickly. Indeed, it is not surprising to see the corresponding
graphs in Figure 1.12: the respective functions are f (x) = kxk and f (x) = kxk2 . ▽
Figure 1.9
Figure 1.10
Figure 1.11
Figure 1.12
Figure 1.13
Consider next the function
$$f\begin{pmatrix} r\\ \theta\end{pmatrix} = \begin{bmatrix} r\cos\theta\\ r\sin\theta\end{bmatrix}, \qquad r > 0,\ 0\le\theta<2\pi,$$
as illustrated in Figure 1.13. This is a one-to-one mapping onto $\mathbb R^2 - \{0\}$. The coordinates $\begin{pmatrix} r\\ \theta\end{pmatrix}$ are often called the polar coordinates of the point $\begin{bmatrix} x\\ y\end{bmatrix} = \begin{bmatrix} r\cos\theta\\ r\sin\theta\end{bmatrix}$. ▽
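For example, the point $\begin{bmatrix}1\\1\end{bmatrix}$ has polar coordinates $r = \sqrt 2$, $\theta = \pi/4$, since $\sqrt 2\cos(\pi/4) = \sqrt 2\sin(\pi/4) = 1$.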
Figure 1.14
Likewise, consider $f\begin{pmatrix} u\\ v\end{pmatrix} = \begin{bmatrix} u\cos v\\ u\sin v\\ u\end{bmatrix}$, u ≥ 0. For fixed v = v₀, the image of the curve u ↦ f(u, v₀) is a ray making an angle of π/4 with the z-axis and whose projection into the xy-plane makes an angle of v₀ with the positive x-axis. Thus, the image of f is a cone, as pictured in Figure 1.14. ▽
EXERCISES 2.1
2. a. Give parametric equations for the circle x2 + y 2 = 1 in terms of the length t pictured in
Figure 1.15. (Hint: Use similar triangles and algebra.)
Figure 1.15
b. Use your answer to part a to produce infinitely many positive integer solutions² of X² + Y² = Z² with distinct ratios Y/X.
3. A string is unwound from a circular reel of radius a, being pulled taut at each instant. Give
parametric equations for the tip of the string P in terms of the angle θ, as pictured in Figure
1.16.
Figure 1.16
4. A wheel of radius a (perhaps belonging to a train) rolls along the x-axis. If a point P (on the
wheel) is located a distance b from the center of the wheel, what are the parametric equations
of its locus as the wheel rolls? (Note that when b = a we obtain a cycloid.) See Figure 1.17.
5. *a. A circle of radius b rolls without slipping outside a circle of radius a > b. Give the
parametric equations of a point P on the circumference of the rolling circle (in terms of
the angle θ of the line joining the centers of the two circles). (See Figure 1.18(a).)
b. Now it rolls inside. Do the same as for part a.
These curves are called, respectively, an epicycloid and a hypocycloid.
²These are called Pythagorean triples. Fermat asked whether there were any nonzero integer solutions of the corresponding equations Xⁿ + Yⁿ = Zⁿ for n ≥ 3. In 1995, Andrew Wiles proved in a tour de force of algebraic number theory that there can be none.
Figure 1.17 (the cases b < a and b > a)
Figure 1.18
6. A coin of radius 1′′ is rolled (without slipping) around the outside of a coin of radius 2′′ . How
many complete revolutions does its “head” make? Now explain the correct answer! (There is
a famous story that the Educational Testing Service screwed this one up and was challenged
by a precocious high school student who knew that he had done the problem correctly.)
*7. A dog buries a bone at $\begin{pmatrix}0\\1\end{pmatrix}$. He is at the end of a 1-unit-long leash, and his master walks
down the positive x-axis, dragging the dog along. Since the dog wants to get back to the bone,
he pulls the leash taut. (It was pointed out to me by some students a few years ago that the
Figure 1.19
realism of this model leaves something to be desired.) The curve the dog travels is called a tractrix (why?). Give parametric equations of the curve in terms of the parameters
a. θ
b. t
as pictured in Figure 1.19. (Hint: The fact that the leash is pulled taut means that the leash
is tangent to the curve. Show that θ ′ (t) = sin θ(t).)
8. Prove that the twisted cubic (given in Example 3) has the property that any three distinct
points on it determine a plane; i.e., no three distinct points are collinear.
9. Sketch families of level curves and the graphs of the following functions f.
a. $f\begin{pmatrix}x\\y\end{pmatrix} = 1 - y$
b. $f\begin{pmatrix}x\\y\end{pmatrix} = x^2 - y$
c. $f\begin{pmatrix}x\\y\end{pmatrix} = x^2 - y^2$
d. $f\begin{pmatrix}x\\y\end{pmatrix} = xy$
10. Consider the surfaces
$$X = \left\{\begin{bmatrix}x\\y\\z\end{bmatrix} : x^2+y^2-z^2 = 1\right\} \qquad\text{and}\qquad Y = \left\{\begin{bmatrix}x\\y\\z\end{bmatrix} : x^2+y^2-z^2 = -1\right\}.$$
a. Show that every point in the image of g lies on the hyperboloid $x^2 + y^2 - z^2 = 1$.
b. Show that the curves $g\begin{pmatrix}s\\t_0\end{pmatrix}$ and $g\begin{pmatrix}s_0\\t\end{pmatrix}$ (for s₀ and t₀ constants) are (subsets of) lines.
(See Figure 1.20.)
c. (more challenging) What is the image of g?
Figure 1.20
2. A Bit of Topology in Rn
Having introduced functions, we must next decide what it means for a function to be continuous.
In one-variable calculus we study functions defined on intervals and come to appreciate the difference
between open and closed intervals. For example, the notion of limit is couched in terms of open
intervals, whereas the maximum value theorem for continuous functions depends crucially on closed
intervals. Matters are somewhat more subtle in higher dimensions, and we begin our assault on
the analogous notions in Rn .
so x ∈ B(a, δ). And if x ∈ B(a, δ), then |xi − ai | ≤ kx − ak < δ for all i = 1, . . . , n. Figure 2.1
illustrates these relationships.
Figure 2.1
(Strictly speaking, we should call this a rectangular parallelepiped, but that’s too much of a mouth-
ful.) For reasons that will be obvious in a moment, when we construct the rectangle from open
intervals, viz.,
S = (a1 , b1 ) × (a2 , b2 ) × · · · × (an , bn ) = {x ∈ Rn : ai < xi < bi , i = 1, . . . , n},
we call it an open rectangle.
Definition. We say a subset U ⊂ Rn is open if for every a ∈ U , there is some ball centered at
a that is completely contained in U ; that is, there is δ > 0 so that B(a, δ) ⊂ U .
Examples 1.
(a) First of all, an open interval (a, b) ⊂ R is an open subset. Given any c ∈ (a, b), choose δ = min(c − a, b − c); then B(c, δ) = (c − δ, c + δ) ⊂ (a, b).
Figure 2.2
(b) Likewise, an open rectangle S = (a₁, b₁) × ⋯ × (aₙ, bₙ) is open: given c ∈ S, let δᵢ = min(cᵢ − aᵢ, bᵢ − cᵢ) and set δ = min(δ₁, . . . , δₙ), as illustrated (for n = 2) in Figure 2.3.
Figure 2.3
Then we claim that B(c, δ) ⊂ S. For if x ∈ B(c, δ), then |xi − ci | ≤ kx − ck < δ ≤ δi , so
ai < xi < bi , as
required.
(c) Consider $S = \left\{\begin{pmatrix}x\\y\end{pmatrix} : 0 < xy < 1\right\}$. We want to show that S is open, so we choose $c = \begin{pmatrix}a\\b\end{pmatrix}\in S$. Without loss of generality, we may assume that 0 < b ≤ a, as shown in Figure 2.4. We claim that the ball of radius
$$\delta = \frac{\frac1b - a}{2\left(a + \frac1b\right)} = \frac12\cdot\frac{1 - ab}{1 + ab}$$
Figure 2.4
centered at c is wholly contained in the region S. We consider the open rectangle centered at c with base 1/b − a and height 2δ; by construction, this rectangle is contained in S. Since b ≤ a and ab < 1, it is easy to check that the height is smaller than the base, and so the ball of radius δ centered at c is contained in the rectangle, hence in S. ▽
As we shall see in the next section, the concept of open sets is integral to the notion of continuity
of a function.
We turn next to a discussion of sequences. The connections to open sets will become clear.
Definition. A sequence of vectors (or points) in Rⁿ is a function from the set of natural
numbers, N, to Rn , i.e., an assignment of a vector xk ∈ Rn to each natural number k ∈ N. We refer
to xk as the kth term of the sequence. We often abuse notation and write {xk } for such a sequence,
even though we are thinking of the actual function and not the set of its values.
We say the sequence {xₖ} converges to a (denoted xₖ → a or $\lim_{k\to\infty} x_k = a$) if for all ε > 0, there is K ∈ N such that
$$\|x_k - a\| < \varepsilon \qquad\text{whenever}\qquad k > K.$$
(That is, given any neighborhood of a, “eventually”—past some K—all the elements xk of the
sequence lie inside.) We say the sequence {xk } is convergent if it converges to some a.
Examples 2. Here are a few examples of sequences, both convergent and non-convergent.
(a) Let $x_k = \frac{k}{k+1}$. We suspect that xₖ → 1. To prove this, note that, given any ε > 0,
$$|x_k - 1| = \left|\frac{k}{k+1} - 1\right| = \frac{1}{k+1} < \varepsilon$$
whenever k + 1 > 1/ε. If we let K = [1/ε] (the greatest integer less than or equal to 1/ε),
then it is easy to see that k > K =⇒ k + 1 > 1/ε, as required.
(b) The sequence {xk = (1+ k1 )k } of real numbers is a famous one (think of compound interest),
and converges to e, as the reader can check by taking logs and applying Proposition 3.6.
(c) The sequence 1, −1, 1, −1, 1, . . ., i.e., {xk = (−1)k+1 }, is not convergent. Since its consecu-
tive terms are 2 units apart, no matter what a ∈ R and K ∈ N we pick, whenever ε < 1,
we cannot have |xk − a| < ε whenever k > K. For if we did, we would have (by the triangle
inequality)
whenever k > log2 (kx0 k/ε) = log(kx0 k/ε)/ log 2. So, if we take K = [log(kx0 k/ε)/ log 2]+1,
then it follows that whenever k > K, we have ‖xₖ‖ < ε, as required.
(e) Let $A = \begin{bmatrix} 2 & 0\\ 0 & 1\end{bmatrix}$ and $x_0 = \begin{bmatrix}1\\1\end{bmatrix}$. Define a sequence of vectors in R² recursively by
$$x_k = \frac{Ax_{k-1}}{\|Ax_{k-1}\|}, \qquad k\ge 1.$$
As the reader can easily prove by induction, we have $x_k = \frac{1}{\sqrt{2^{2k}+1}}\begin{bmatrix} 2^k\\ 1\end{bmatrix}$, and it follows that
$$\lim_{k\to\infty} x_k = \begin{bmatrix}1\\0\end{bmatrix}. \ \triangledown$$
and so we can make k(xk + yk ) − (a + b)k < ε by making kxk − ak < ε/2 and kyk − bk < ε/2. To
this end, we use the definition of convergence of the sequences {xₖ} and {yₖ} as follows: there are K₁, K₂ ∈ N so that
$$\|x_k - a\| < \frac{\varepsilon}{2} \quad\text{whenever } k > K_1 \qquad\text{and}\qquad \|y_k - b\| < \frac{\varepsilon}{2} \quad\text{whenever } k > K_2.$$
Thus, if we take K = max(K₁, K₂), whenever k > K, we will have k > K₁ and k > K₂, and so
$$\|(x_k + y_k) - (a + b)\| \le \|x_k - a\| + \|y_k - b\| < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon,$$
as was required. ▽
Examples 4. (a) Let S = [0, 1]. Then S is bounded above (e.g., by 2) and sup S = 1.
(b) Let S = {x ∈ Q : x² < 2}. Then S is bounded above (e.g., by 2), and sup S = √2. (Note that √2 ∉ Q. The point is that the irrational numbers fill in all the "holes" amongst the rationals.)
(c) Suppose {xk } is a sequence of real numbers that is both bounded above and nondecreasing
(i.e., xk ≤ xk+1 for all k ∈ N). Then the sequence must converge. Since the sequence is
bounded above, there is a least upper bound, α, for the set of its values. Now we claim
that xk → α. Given ε > 0, there is K ∈ N so that α − xK < ε (for otherwise α would not
be the least upper bound). But then the fact that the sequence is nondecreasing tells us
that whenever k > K we have 0 ≤ α − xk ≤ α − xK < ε, as required. ▽
Definition. Suppose S ⊂ Rn . If S has the property that every convergent sequence of points
in S converges to a point in S, then we say S is closed. That is, S is closed if the following is true:
Whenever a convergent sequence xk → a has the property that xk ∈ S for all k ∈ N, then a ∈ S as
well.
This definition seems a bit strange, but it is exactly what we will need for many applications
to come. In the meantime, if we need to decide whether or not a set is closed, it is easiest to use
the following

Proposition. A set S ⊂ Rⁿ is closed if and only if its complement Rⁿ − S is open.
Proof. Suppose Rⁿ − S is open and {xₖ} is a convergent sequence with xₖ ∈ S and limit a. Suppose that a ∉ S. Then there is a neighborhood B(a, ε) of a wholly contained in Rⁿ − S, which means no element of the sequence {xₖ} lies in that neighborhood, contradicting the fact that xₖ → a. Therefore, a ∈ S, as desired.
Suppose S is closed and b ∉ S. We claim that there is a neighborhood of b lying entirely in Rⁿ − S. Suppose not. Then for every k ∈ N, the ball B(b, 1/k) intersects S; that is, we can find a point xₖ ∈ S with ‖xₖ − b‖ < 1/k. Then {xₖ} is a sequence of points in S converging to the point b ∉ S, contradicting the hypothesis that S is closed.
Note that most sets are neither open nor closed. For example, the interval S = (0, 1] ⊂ R is not
open because there is no neighborhood of the point 1 contained in S, and it is not closed because
of the reasoning in Example 5. Be careful not to make a common mistake here: just because a set
isn’t open, it need not be closed, and vice versa.
For future use, we make the following
Definition. Suppose S ⊂ Rn . We define the closure of S to be the smallest closed set containing
S. It is denoted by S.
We should think of S as containing all the points of S and all points that can be obtained as
limits of convergent sequences of points of S. A slightly different formulation of this notion is given
in Exercise 8.
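For example, the closure of the interval (0, 1] ⊂ R is [0, 1], since 0 is the limit of the sequence {1/k} of points of the set; and the closure of Q in R is all of R, since every real number is the limit of a sequence of rational numbers.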
EXERCISES 2.2
*1. Which of the following subsets of Rⁿ is open? closed? neither? Prove your answer.
a. {x : 0 < x ≤ 2} ⊂ R
b. {x : x = 2⁻ᵏ for some k ∈ N or x = 0} ⊂ R
c. $\left\{\begin{pmatrix}x\\y\end{pmatrix} : y > 0\right\}\subset\mathbb R^2$
d. $\left\{\begin{pmatrix}x\\y\end{pmatrix} : y \ge 0\right\}\subset\mathbb R^2$
e. $\left\{\begin{pmatrix}x\\y\end{pmatrix} : y > x\right\}\subset\mathbb R^2$
f. $\left\{\begin{pmatrix}x\\y\end{pmatrix} : xy \neq 0\right\}\subset\mathbb R^2$
g. $\left\{\begin{pmatrix}x\\y\end{pmatrix} : y = x\right\}\subset\mathbb R^2$
h. {x : 0 < ‖x‖ < 1} ⊂ Rⁿ
i. {x : ‖x‖ > 1} ⊂ Rⁿ
j. {x : ‖x‖ ≤ 1} ⊂ Rⁿ
k. Q ⊂ R, the set of rational numbers
l. $\left\{x : \|x\| < 1 \text{ or } \left\|x - \begin{pmatrix}1\\0\end{pmatrix}\right\| < 1\right\}\subset\mathbb R^2$
m. ∅ (the empty set)
♯ 2. Let {xk } be a sequence of points in Rn . For i = 1, . . . , n, let xk,i denote the ith coordinate of
the vector xk . Prove that xk → a if and only if xk,i → ai for all i = 1, . . . , n.
♯ 7. a. Suppose U and V are open subsets of Rn . Prove that U ∪ V and U ∩ V are open as well.
(Recall that U ∪ V = {x ∈ Rn : x ∈ U or x ∈ V } and U ∩ V = {x ∈ Rn : x ∈ U and
x ∈ V }.)
b. Suppose C and D are closed subsets of Rn . Prove that C ∪ D and C ∩ D are closed as
well.
♯ 8. Let S ⊂ Rn . We say a ∈ S is an interior point of S if some neighborhood of a is contained in
S. We say a ∈ Rn is a frontier point of S if every neighborhood of a contains both points in S
and points not in S.
a. Show that every point of S is either an interior point or a frontier point, but give examples
to show that a frontier point of S may or may not belong to S.
b. Give an example of a set S every point of which is a frontier point.
c. Prove that the set of frontier points of S is always a closed set.
d. Let S ′ be the union of S and the set of frontier points of S. Prove that S ′ is closed.
e. Suppose C is a closed set containing S. Prove that S ′ ⊂ C. Thus, S ′ is the smallest
closed set containing S, which we have earlier called S, the closure of S. (Hint: Show that
Rn − C ⊂ Rn − S ′ .)
9. Continuing Exercise 8:
a. Is it true that all the interior points of S are points of S? Is this true if S is open? (Give
proofs or counterexamples.)
b. Let S ⊂ Rn and let F be the set of the frontier points of S. Is it true that the set of
frontier points of F is F itself? (Proof or counterexample.)
♯ *10. a. Suppose I0 = [a, b] is a closed interval, and for each k ∈ N, Ik is a closed interval with the
property that Ik ⊂ Ik−1 . Prove that there is a point x ∈ R so that x ∈ Ik for all k ∈ N.
b. Give an example to show the result of part a is false if the intervals are not closed.
11. Prove that the only subsets of R that are both open and closed are the empty set and R
itself. (Hint: Suppose S is such a nonempty subset that is not equal to R. Then there
are some points a ∈ S and b ∉ S. Without loss of generality (how?), assume a < b. Let α = sup{x ∈ R : [a, x] ⊂ S}. Show that neither α ∈ S nor α ∉ S is possible.)
12. A sequence {xk } of points in Rn is called a Cauchy sequence if for all ε > 0 there is K ∈ N so
that whenever k, ℓ > K, we have kxk − xℓ k < ε.
a. Prove that any convergent sequence is Cauchy.
b. Prove that if a subsequence of a Cauchy sequence converges, then the sequence itself must
converge. (Hint: Suppose ε > 0. If xkj → a, then there is J ∈ N so that whenever j > J,
we have kxkj − ak < ε/2. There is also K ∈ N so that whenever k, ℓ > K, we have
kxk − xℓ k < ε/2. Choose j > J so that kj > K.)
*13. Prove that if {xk } is a Cauchy sequence, then all the points lie in some ball centered at the
origin.
14. a. Suppose {xk } is a sequence of points in R satisfying a ≤ xk ≤ b for all k ∈ N. Prove that
{xₖ} has a convergent subsequence (see Exercise 6). (Hint: If there are only finitely many distinct terms in the sequence, this should be easy. If there are infinitely many distinct terms in the sequence, then there must be infinitely many either in the left half-interval [a, (a+b)/2] or in the right half-interval [(a+b)/2, b]. Let [a₁, b₁] be such a half-interval. Continue the process, and apply Exercise 10.)
b. Use the results of Exercises 12 and 13 to prove that any Cauchy sequence in R is convergent.
c. Now prove that any Cauchy sequence in Rn is convergent. (Hint: Use Exercise 2.)
♯ 15. Suppose S ⊂ Rn is a closed set that is a subset of the rectangle [a1 , b1 ] × · · · × [an , bn ]. Prove
that any sequence of points in S has a convergent subsequence. (Hint: Use repeatedly the idea
of Exercise 14a.)
3. Limits and Continuity

The concept on which all of calculus is founded is that of the limit. Limits are rather more subtle
when we consider functions of more than one variable. We begin with the obligatory definition and
some standard properties of limits.
Definition. We say
$$\lim_{x\to a} f(x) = \ell$$
("f(x) approaches ℓ ∈ Rᵐ as x approaches a") if for every ε > 0 there is δ > 0 so that
$$\|f(x) - \ell\| < \varepsilon \qquad\text{whenever}\qquad 0 < \|x - a\| < \delta.$$
(Note that even if f(a) is defined, we say nothing whatsoever about its relation to ℓ.)
We begin by observing that for a vector-valued function, calculating a limit may be done
component by component. As is customary by now, we denote the components of f by f1 , . . . , fm .
Proposition 3.1. $\lim_{x\to a} f(x) = \ell$ if and only if $\lim_{x\to a} f_j(x) = \ell_j$ for all j = 1, . . . , m.
Proof. The proof is based on Figure 2.1. Suppose $\lim_{x\to a} f(x) = \ell$. We must show that for any j = 1, . . . , m, we have $\lim_{x\to a} f_j(x) = \ell_j$. Given ε > 0, there is δ > 0 so that whenever 0 < ‖x − a‖ < δ, we have ‖f(x) − ℓ‖ < ε. But since we have
$$|f_j(x) - \ell_j| \le \|f(x) - \ell\|,$$
we see that whenever 0 < ‖x − a‖ < δ, we have |fⱼ(x) − ℓⱼ| < ε, as required.
Now, suppose that $\lim_{x\to a} f_j(x) = \ell_j$ for j = 1, . . . , m. Given ε > 0, there are δ₁, . . . , δₘ > 0 so that
$$|f_j(x) - \ell_j| < \frac{\varepsilon}{\sqrt m} \qquad\text{whenever}\qquad 0 < \|x - a\| < \delta_j.$$
Taking δ = min(δ₁, . . . , δₘ), whenever 0 < ‖x − a‖ < δ we have
$$\|f(x) - \ell\| = \Bigl(\sum_{j=1}^m |f_j(x) - \ell_j|^2\Bigr)^{1/2} < \Bigl(\sum_{j=1}^m \frac{\varepsilon^2}{m}\Bigr)^{1/2} = \varepsilon,$$
as required.
Example 2. Let f : Rⁿ → R be defined by f(x) = ‖x‖². Then we claim that $\lim_{x\to a} f(x) = \|a\|^2$.
(1) Suppose first that a = 0. Since r² ≤ r whenever 0 ≤ r ≤ 1, we know that when 0 < ε ≤ 1, we can choose δ = ε and then
$$0 < \|x\| < \delta = \varepsilon \implies |f(x)| = \|x\|^2 < \varepsilon^2 \le \varepsilon,$$
as required. But what if some (admittedly, silly) person hands us an ε > 1? The trick to take care of this is to let δ = min(1, ε). Should ε be bigger than 1, then δ = 1, and so when 0 < ‖x‖ < δ, we know that ‖x‖ < 1 and, once again, |f(x)| < 1 < ε, as required.
(2) Now suppose a ≠ 0. Given ε > 0, let $\delta = \min\left(\|a\|, \frac{\varepsilon}{3\|a\|}\right)$. Now suppose 0 < ‖x − a‖ < δ. Then, in particular, we have ‖x‖ < ‖a‖ + δ ≤ 2‖a‖, so that ‖x + a‖ ≤ ‖x‖ + ‖a‖ < 3‖a‖. Then
$$\bigl|f(x) - \|a\|^2\bigr| = |x\cdot x - a\cdot a| = |(x+a)\cdot(x-a)| \le \|x+a\|\,\|x-a\| < 3\|a\|\cdot\frac{\varepsilon}{3\|a\|} = \varepsilon,$$
as required.
Such sleight of hand (and more) is often required when the function is nonlinear. ▽
Example 3. Define f : R² − {0} → R by $f\begin{pmatrix}x\\y\end{pmatrix} = \dfrac{x^2y}{x^2+y^2}$. Does $\lim_{\mathbf x\to 0} f(\mathbf x)$ exist? Since $|x|\le\sqrt{x^2+y^2}$ and $|y|\le\sqrt{x^2+y^2}$, we have (writing $\mathbf x = \begin{pmatrix}x\\y\end{pmatrix}$)
$$|f(\mathbf x)| \le \frac{\|\mathbf x\|^3}{\|\mathbf x\|^2} = \|\mathbf x\|,$$
and so f(x) → 0 as x → 0. (In particular, taking δ = ε will work.) An alternative approach, which will be useful later, is this:
$$|f(\mathbf x)| = |y|\,\frac{x^2}{x^2+y^2} \le |y|,$$
since $0 \le \frac{x^2}{x^2+y^2} \le 1$. Once again, |y| ≤ ‖x‖, and hence approaches 0 as x → 0. Thus, so does |f(x)|. (See Figure 3.1(a).) ▽
Figure 3.1
Example 4. Let's modify the previous example slightly. Define f : R² − {0} → R by $f\begin{pmatrix}x\\y\end{pmatrix} = \dfrac{x^2}{x^2+y^2}$. We ask again whether $\lim_{\mathbf x\to 0} f(\mathbf x)$ exists. Note that
$$\lim_{h\to 0} f\begin{pmatrix}h\\0\end{pmatrix} = \lim_{h\to 0}\frac{h^2}{h^2} = 1, \qquad\text{whereas}\qquad \lim_{k\to 0} f\begin{pmatrix}0\\k\end{pmatrix} = \lim_{k\to 0}\frac{0}{k^2} = 0.$$
Thus, $\lim_{\mathbf x\to 0} f(\mathbf x)$ cannot exist (there is no number ℓ so that both 1 and 0 are less than ε away from ℓ when 0 < ε < 1/2). (See Figure 3.1(b).) Now, what about $f\begin{pmatrix}x\\y\end{pmatrix} = \dfrac{xy}{x^2+y^2}$? In this case we have
$$\lim_{h\to 0} f\begin{pmatrix}h\\0\end{pmatrix} = \lim_{k\to 0} f\begin{pmatrix}0\\k\end{pmatrix} = 0,$$
so we might surmise that the limit exists and equals 0. But consider what happens if x approaches 0 along the line y = x:
$$\lim_{h\to 0} f\begin{pmatrix}h\\h\end{pmatrix} = \lim_{h\to 0}\frac{h^2}{2h^2} = \frac12.$$
Once again, the limit does not exist. ▽
The fundamental properties of limits with which every calculus student is familiar generalize
in an obvious way to the multivariable setting.
Theorem 3.2. Suppose f and g map a neighborhood of a ∈ Rⁿ (with the possible exception of the point a itself) to Rᵐ and k maps the same neighborhood to R. Suppose
$$\lim_{x\to a} f(x) = \ell, \qquad \lim_{x\to a} g(x) = \mathbf m, \qquad\text{and}\qquad \lim_{x\to a} k(x) = c.$$
Then
$$\lim_{x\to a}\bigl(f(x) + g(x)\bigr) = \ell + \mathbf m,$$
$$\lim_{x\to a}\bigl(f(x)\cdot g(x)\bigr) = \ell\cdot\mathbf m,$$
$$\lim_{x\to a} k(x)f(x) = c\,\ell.$$
and
$$\|g(x) - \mathbf m\| < \frac{\varepsilon}{2(\|\ell\| + 1)} \qquad\text{whenever}\qquad 0 < \|x - a\| < \delta_2.$$
Note that when 0 < ‖x − a‖ < δ₁, we have (by the triangle inequality) ‖f(x)‖ < ‖ℓ‖ + 1. Now, let δ = min(δ₁, δ₂). Whenever 0 < ‖x − a‖ < δ, we have
$$|f(x)\cdot g(x) - \ell\cdot\mathbf m| = |f(x)\cdot(g(x) - \mathbf m) + (f(x) - \ell)\cdot\mathbf m| \le \|f(x)\|\,\|g(x) - \mathbf m\| + \|f(x) - \ell\|\,\|\mathbf m\|$$
$$< (\|\ell\| + 1)\|g(x) - \mathbf m\| + \|f(x) - \ell\|\,\|\mathbf m\| < (\|\ell\| + 1)\frac{\varepsilon}{2(\|\ell\| + 1)} + \|\mathbf m\|\frac{\varepsilon}{2(\|\mathbf m\| + 1)} < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon,$$
as required.
The proof of the last equality is left to the reader in Exercise 4.
Once we have the concept of limit, the definition of continuity is quite straightforward.
That is, f is continuous at a if, given any ε > 0, there is δ > 0 so that ‖f(x) − f(a)‖ < ε whenever ‖x − a‖ < δ.
Corollary 3.3. Suppose f and g map a neighborhood of a ∈ Rn to Rm and k maps the same
neighborhood to R. If each function is continuous at a, then so are f + g, f · g, and kf .
It is perhaps a bit more interesting to relate the definition of continuity to our notions of
open and closed sets from the previous section. Let’s first introduce a bit of standard notation: if
f : X → Y is a function and Z ⊂ Y , we write f −1 (Z) = {x ∈ X : f (x) ∈ Z}, as illustrated in
Figure 3.2. This is called the preimage of Z under the mapping f ; be careful to remember that f
may not be one-to-one and hence may well have no inverse function.
Figure 3.2
Figure 3.3
Figure 3.4
$$(f\circ g)(x) = \frac{x^4}{2x^4} = \frac12,\ x\neq 0, \qquad (f\circ g)(0) = 0,$$
which is definitely not a continuous function. Thus, f cannot be continuous. (If it were, according to Proposition 3.5, letting $g(x) = \begin{pmatrix} x\\ x^2\end{pmatrix}$, f∘g would have to be continuous.) ▽
Proof. Suppose f is continuous at a. Given ε > 0, there is δ > 0 so that whenever kx − ak < δ,
we have kf (x) − f (a)k < ε. Suppose xk → a. There is K ∈ N so that whenever k > K, we have
kxk − ak < δ, and hence kf (xk ) − f (a)k < ε. Thus, f (xk ) → f (a), as required.
The converse is a bit trickier. We proceed by proving the contrapositive. Suppose f is not
continuous at a. This means that for some ε0 > 0, it is the case that for every δ > 0 there is
some x with kx − ak < δ and kf (x) − f (a)k ≥ ε0 . So, for each k ∈ N, there is a point xk so that
kxk − ak < 1/k and kf (xk ) − f (a)k ≥ ε0 . But this means that the sequence {xk } converges to a
and yet clearly the sequence {f (xk )} cannot converge to f (a).
Corollary 3.7. Suppose f : Rn → Rm is continuous. Then for any c ∈ Rm , the level set
f −1 ({c}) = {x ∈ Rn : f (x) = c} is a closed set.
Proof. Suppose {xk } is a convergent sequence of points in f −1 ({c}), and let a be its limit. By
Proposition 3.6, f (xk ) → f (a). Since f (xk ) = c for all k, it follows that f (a) = c as well, and so
a ∈ f −1 ({c}), as we needed to show.
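For instance, the unit sphere {x ∈ Rⁿ : ‖x‖ = 1} is a closed set: it is the level set f⁻¹({1}) of the continuous function f(x) = ‖x‖ (see Exercise 2).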
EXERCISES 2.3
1. Prove that if $\lim_{x\to a} f(x)$ exists, it must be unique. (Hint: if ℓ and m are two putative limits, choose ε = ‖ℓ − m‖/2.)
♯ 2. Prove that f : Rn → R, f (x) = kxk is continuous. (Hint: Write x = a + (x − a).)
♯ 3. (Squeeze Principle) Suppose f, g, and h are real-valued functions on a neighborhood of a (perhaps not including the point a itself). Suppose f(x) ≤ g(x) ≤ h(x) for all x and $\lim_{x\to a} f(x) = \ell = \lim_{x\to a} h(x)$. Prove that $\lim_{x\to a} g(x) = \ell$. (Hint: Given ε > 0, show that there is δ > 0 so that whenever 0 < ‖x − a‖ < δ, we have −ε < f(x) − ℓ ≤ g(x) − ℓ ≤ h(x) − ℓ < ε.)
4. Suppose $\lim_{x\to a} f(x) = \ell$ and $\lim_{x\to a} k(x) = c$. Prove that $\lim_{x\to a} k(x)f(x) = c\,\ell$.
♯ 5. Suppose U ⊂ Rn is open and f : U → R is continuous. If a ∈ U and f (a) > 0, prove that there
is δ > 0 so that f (x) > 0 for all x ∈ B(a, δ). (That is, a continuous function that is positive at
a point must be positive on a neighborhood of that point.) Can you state a somewhat stronger
result?
b. Deduce the result of part a in an alternative way by showing that for any m × n matrix A, we have
$$\|Ax\| \le \Bigl(\sum_{i,j} a_{ij}^2\Bigr)^{1/2}\|x\|.$$
*8. Using Theorem 3.2 whenever possible (and standard facts from one-variable calculus), decide in each case whether $\lim_{\mathbf x\to 0} f(\mathbf x)$ exists. Provide appropriate justification.
a. $f\begin{pmatrix}x\\y\end{pmatrix} = \dfrac{xy}{x+y+1}$
b. $f\begin{pmatrix}x\\y\end{pmatrix} = \dfrac{\sin(x^2+y^2)}{x^2+y^2}$
c. $f\begin{pmatrix}x\\y\end{pmatrix} = \dfrac{x^2-y^2}{x-y}$, x ≠ y; $f\begin{pmatrix}x\\x\end{pmatrix} = 0$
d. $f\begin{pmatrix}x\\y\end{pmatrix} = e^{x^2+y^2}$
e. $f\begin{pmatrix}x\\y\end{pmatrix} = e^{-1/(x^2+y^2)}$
f. $f\begin{pmatrix}x\\y\end{pmatrix} = \dfrac{x^2+y^2}{y}$, y ≠ 0; $f\begin{pmatrix}x\\0\end{pmatrix} = 0$
g. $f\begin{pmatrix}x\\y\end{pmatrix} = \dfrac{x^3}{x^2+y^2}$
h. $f\begin{pmatrix}x\\y\end{pmatrix} = \dfrac{x\sin^2 y}{x^2+y^2}$
i. $f\begin{pmatrix}x\\y\end{pmatrix} = \dfrac{xy}{x^3-y^3}$, x ≠ y; $f\begin{pmatrix}x\\x\end{pmatrix} = 0$
j. $f\begin{pmatrix}x\\y\end{pmatrix} = \dfrac{x^2+y^2}{x+y}$, x ≠ −y; $f\begin{pmatrix}x\\-x\end{pmatrix} = 0$
10. Use Exercise 9 to find the limit of each of the following sequences of points in R, presuming it exists.
*a. $x_0 = 1$, $x_k = \sqrt{2x_{k-1}}$
b. $x_0 = 5$, $x_k = \dfrac{x_{k-1}}{2} + \dfrac{2}{x_{k-1}}$
c. $x_0 = 1$, $x_k = 1 + \dfrac{1}{x_{k-1}}$
*d. $x_0 = 1$, $x_k = 1 + \dfrac{1}{1 + x_{k-1}}$
11. Give an example of a discontinuous function f : R → R having the property that for every
c ∈ R the level set f −1 ({c}) is closed.
13. Prove that if f is continuous, then the preimage of every closed set is closed.
14. Identify Mm×n , the set of m × n matrices, with Rmn in the obvious way.
a. Prove that when n = 2 or 3, the set of n × n matrices with nonzero determinant is an open
subset of Mn×n .
b. Prove that the set of n × n matrices A satisfying AT A = In is a closed subset of Mn×n .
15. a. Let
$$f\begin{pmatrix}x\\y\end{pmatrix} = \begin{cases} 0, & |y| > x^2 \text{ or } y = 0\\ 1, & \text{otherwise.}\end{cases}$$
Show that f is continuous at 0 on every line through the origin but is not continuous at 0.
b. Give a function that is continuous at 0 along every line and every parabola y = kx2
through the origin but is not continuous at 0.
17. Generalizing Example 5, for what positive values of α, β, γ, and δ is the analogous function
$$f\begin{pmatrix}x\\y\end{pmatrix} = \frac{|x|^\alpha |y|^\beta}{|x|^\gamma + |y|^\delta},\ \mathbf x\neq 0, \qquad f(0) = 0,$$
continuous at 0?
18. a. Suppose A is an invertible n × n matrix. Show that the solution of Ax = b varies contin-
uously with b ∈ Rn .
b. Show that the solution of Ax = b varies continuously as a function of the pair $\begin{pmatrix}A\\b\end{pmatrix}$, as A varies
over all invertible matrices and b over Rn . (You should be able to get the cases n = 1 and
n = 2. What do you need for n > 2?)
CHAPTER 3
The Derivative
In this chapter we start in earnest on calculus. The immediate goal is to define the tangent plane
at a point to the graph of a function, which should be the suitable generalization of the tangent
lines in single-variable calculus. The fundamental computational tool is the partial derivative, a
direct application of single-variable calculus tools. But the actual definition of a differentiable
function immediately involves linear algebra. We establish various differentiation rules and then
introduce the gradient, which, as common parlance has come to suggest, tells us in which direction
a scalar function increases the fastest; thus, it is highly important for physical and mathematical
applications. We conclude the chapter with a discussion of Kepler’s laws, the geometry of curves,
and higher-order derivatives.
1. Partial Derivatives and Directional Derivatives

Figure 1.1
Definition. We define the partial derivatives $\dfrac{\partial f}{\partial x}$ and $\dfrac{\partial f}{\partial y}$ as follows:
$$\frac{\partial f}{\partial x}\begin{pmatrix}a\\b\end{pmatrix} = \lim_{h\to 0}\frac{f\begin{pmatrix}a+h\\b\end{pmatrix} - f\begin{pmatrix}a\\b\end{pmatrix}}{h}$$
$$\frac{\partial f}{\partial y}\begin{pmatrix}a\\b\end{pmatrix} = \lim_{k\to 0}\frac{f\begin{pmatrix}a\\b+k\end{pmatrix} - f\begin{pmatrix}a\\b\end{pmatrix}}{k}.$$
Very simply, if we fix b, then $\frac{\partial f}{\partial x}\begin{pmatrix}a\\b\end{pmatrix}$ is the derivative at a (or slope) of the function $F(x) = f\begin{pmatrix}x\\b\end{pmatrix}$, as indicated in Figure 1.1. There is an analogous interpretation of $\frac{\partial f}{\partial y}\begin{pmatrix}a\\b\end{pmatrix}$.
More generally, if U ⊂ Rn is open and a ∈ U , we define the j th partial derivative of f : U → Rm
at a to be
$$\frac{\partial f}{\partial x_j}(a) = \lim_{t\to 0}\frac{f(a + te_j) - f(a)}{t}, \qquad j = 1, \ldots, n$$
(provided this limit exists). Many authors use the alternative notation Dj f (a) to represent the j th
partial derivative of f at a.
Example 1. Let $f\begin{pmatrix}x\\y\end{pmatrix} = x^3y^5 + e^{xy}\sin(2x+3y)$. Then
$$\frac{\partial f}{\partial x}\begin{pmatrix}x\\y\end{pmatrix} = 3x^2y^5 + e^{xy}\bigl(y\sin(2x+3y) + 2\cos(2x+3y)\bigr) \qquad\text{and}$$
$$\frac{\partial f}{\partial y}\begin{pmatrix}x\\y\end{pmatrix} = 5x^3y^4 + e^{xy}\bigl(x\sin(2x+3y) + 3\cos(2x+3y)\bigr). \ \triangledown$$
The partial derivatives of f measure the rate of change of f in the directions of the coordinate
axes, i.e., in the directions of the standard basis vectors e1 , . . . , en . Given any nonzero vector v, it
is natural to consider the rate of change of f in the direction of v.

Definition. Given a ∈ U and a vector v ∈ Rⁿ, we define the directional derivative of f at a in the direction v to be
$$D_{\mathbf v}f(a) = \lim_{t\to 0}\frac{f(a + t\mathbf v) - f(a)}{t}$$
(provided this limit exists).

Note that the jth partial derivative of f at a is just $D_{e_j}f(a)$. When n = 2 and m = 1, as we see
from Figure 1.2, if kvk = 1, the directional derivative Dv f (a) is just the slope at a of the graph we
obtain by restricting to the line through a with direction v.
Figure 1.2
Note that the directional derivative depends not only on the direction of v, but also on its magnitude. It is
for this reason that many calculus books require that one specify a unit vector v. It makes more
sense to think of Dv f (a) as the rate of change of f as experienced by an observer moving with
instantaneous velocity v. We shall return to this interpretation in Section 3.
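By way of illustration (the function and vectors here are chosen purely as an example), let $f\begin{pmatrix}x\\y\end{pmatrix} = x^2y$, $a = \begin{pmatrix}1\\2\end{pmatrix}$, and $v = \begin{pmatrix}3\\4\end{pmatrix}$. Then
$$f(a + t\mathbf v) = (1+3t)^2(2+4t) = 2 + 16t + 42t^2 + 36t^3,$$
so $D_{\mathbf v}f(a) = \lim_{t\to 0}\frac{f(a+t\mathbf v) - f(a)}{t} = 16$; replacing v by v/2 gives 8 instead, illustrating the dependence on magnitude.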
Figure 1.3
are nonzero. ▽
EXERCISES 3.1
1. Calculate the partial derivatives of the following functions:
*a. $f\begin{pmatrix}x\\y\end{pmatrix} = x^3 + 3xy^2 - 2y + 7$
b. $f\begin{pmatrix}x\\y\end{pmatrix} = \sqrt{x^2+y^2}$
*c. $f\begin{pmatrix}x\\y\end{pmatrix} = \arctan\dfrac{y}{x}$
d. $f\begin{pmatrix}x\\y\end{pmatrix} = e^{-(x^2+y^2)}$
e. $f\begin{pmatrix}x\\y\end{pmatrix} = (x + y^2)\log x$
f. $f\begin{pmatrix}x\\y\\z\end{pmatrix} = e^{xy}z^2 - xy\sin(\pi yz)$
2. Calculate the directional derivative of the given function f at the given point a in the direction of the given vector v.
*a. $f\begin{pmatrix}x\\y\end{pmatrix} = x^2 + xy$, $a = \begin{pmatrix}2\\1\end{pmatrix}$, $v = \begin{pmatrix}1\\-1\end{pmatrix}$
*b. $f\begin{pmatrix}x\\y\end{pmatrix} = x^2 + xy$, $a = \begin{pmatrix}2\\1\end{pmatrix}$, $v = \frac{1}{\sqrt 2}\begin{pmatrix}1\\-1\end{pmatrix}$
c. $f\begin{pmatrix}x\\y\end{pmatrix} = ye^{-x}$, $a = \begin{pmatrix}0\\1\end{pmatrix}$, $v = \begin{pmatrix}3\\4\end{pmatrix}$
d. $f\begin{pmatrix}x\\y\end{pmatrix} = ye^{-x}$, $a = \begin{pmatrix}0\\1\end{pmatrix}$, $v = \frac15\begin{pmatrix}3\\4\end{pmatrix}$
3. For each of the following functions f and points a, find the unit vector v with the property that $D_{\mathbf v}f(a)$ is as large as possible.
*a. $f\begin{pmatrix}x\\y\end{pmatrix} = x^2 + xy$, $a = \begin{pmatrix}2\\1\end{pmatrix}$
b. $f\begin{pmatrix}x\\y\end{pmatrix} = ye^{-x}$, $a = \begin{pmatrix}0\\1\end{pmatrix}$
c. $f\begin{pmatrix}x\\y\\z\end{pmatrix} = \dfrac1x + \dfrac1y + \dfrac1z$, $a = \begin{pmatrix}1\\-1\\1\end{pmatrix}$
4. Suppose Dv f (a) exists. Prove that D−v f (a) exists and calculate it in terms of the former.
5. a. Show that there can be no function f : Rn → R so that for some point a ∈ Rn we have
Dv f (a) > 0 for all nonzero vectors v ∈ Rn .
b. Show that there can, however, be a function f : Rn → R so that for some vector v ∈ Rn
we have Dv f (a) > 0 for all points a ∈ Rn .
6. Consider the ideal gas law pV = nRT . (Here p is pressure, V is volume, n is the number of
moles of gas present, R is the universal gas constant, and T is temperature.) Assume n is fixed.
Solve for each of p, V, and T as functions of the others, viz.,
$$p = f\begin{pmatrix}V\\T\end{pmatrix}, \qquad V = g\begin{pmatrix}p\\T\end{pmatrix}, \qquad\text{and}\qquad T = h\begin{pmatrix}p\\V\end{pmatrix}.$$
Compute the partial derivatives of f, g, and h. What is
$$\frac{\partial f}{\partial V}\cdot\frac{\partial g}{\partial T}\cdot\frac{\partial h}{\partial p}, \qquad\text{or, more colloquially,}\qquad \frac{\partial p}{\partial V}\cdot\frac{\partial V}{\partial T}\cdot\frac{\partial T}{\partial p}?$$
7. Suppose f : R → R is differentiable, and let $g\begin{pmatrix}x\\y\end{pmatrix} = f\left(\dfrac{x}{y}\right)$ (for y ≠ 0). Show that
$$x\frac{\partial g}{\partial x} + y\frac{\partial g}{\partial y} = 0.$$
8. Suppose f : R → R is differentiable, and let $g\begin{pmatrix}x\\y\end{pmatrix} = f\bigl(\sqrt{x^2+y^2}\bigr)$ for x ≠ 0. Show that
$$y\frac{\partial g}{\partial x} = x\frac{\partial g}{\partial y}.$$
Show that the partial derivatives of f exist at 0 and yet f is not continuous at 0. Do other
directional derivatives of f exist at 0?
*11. Suppose T : Rn → Rm is a linear map. Show that the directional derivative Dv T (a) exists for
all a ∈ Rn and all v ∈ Rn and calculate it.
12. Identify the set $M_{n\times n}$ of n × n matrices with $\mathbb R^{n^2}$.
a. Define f : Mn×n → Mn×n by f (A) = AT . For any A, B ∈ Mn×n , prove that DB f (A) =
B T.
b. Define f : Mn×n → R by f (A) = trA. For any A, B ∈ Mn×n , prove that DB f (A) = trB.
(For the definition of trace, see Exercise 1.4.22.)
13. Identify the set $M_{n\times n}$ of n × n matrices with $\mathbb R^{n^2}$.
a. Define f : Mn×n → Mn×n by f (A) = A2 . For any A, B ∈ Mn×n , prove that DB f (A) =
AB + BA.
b. Define f : Mn×n → Mn×n by f (A) = AT A. Calculate DB f (A).
2. Differentiability
Recall that a function f : R → R is differentiable at a, with derivative m = f′(a), precisely when
$$\lim_{h\to 0}\frac{f(a+h) - f(a) - mh}{h} = 0.$$
That is, the tangent line—the line passing through $\begin{pmatrix}a\\f(a)\end{pmatrix}$ with slope m = f′(a)—is the best (affine) linear approximation to the graph of f at a, in the sense that the error goes to 0 faster than h as h → 0. (See Figure 2.1.) Generalizing the latter notion, we make the
Figure 2.1

Definition. Let U ⊂ Rⁿ be open, and let a ∈ U. We say f : U → Rᵐ is differentiable at a if there is a linear map Df(a) : Rⁿ → Rᵐ so that
$$\lim_{h\to 0}\frac{f(a+h) - f(a) - Df(a)h}{\|h\|} = 0.$$
This says that Df (a) is the best linear approximation to the function f − f (a) at a, in the
sense that the difference f (a + h) − f (a) − Df (a)h is small compared to h. See Figure 2.2 and
Figure 2.2
compare Figure 2.1. Equivalently, writing x = a + h, the function g(x) = f (a) + Df (a)(x − a) is
the best affine linear approximation to f near a. Indeed, the graph of g is called the tangent plane
of the graph at a. The tangent plane is obtained by translating the graph of Df(a), a subspace of Rⁿ × Rᵐ, so that it passes through $\begin{pmatrix}a\\f(a)\end{pmatrix}$.
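For example (anticipating the formula for Df(a) in terms of partial derivatives established in Proposition 2.1 below), for $f\begin{pmatrix}x\\y\end{pmatrix} = x^2 + y^2$ at $a = \begin{pmatrix}1\\1\end{pmatrix}$ we have $Df(a) = \begin{bmatrix}2 & 2\end{bmatrix}$, so the tangent plane of the graph at a is
$$z = f(a) + Df(a)(x - a) = 2 + 2(x - 1) + 2(y - 1) = 2x + 2y - 2.$$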
Remark. The derivative Df(a), if it exists, must be unique. If there were two linear maps T, T′ : Rⁿ → Rᵐ satisfying
$$\lim_{h\to 0}\frac{f(a+h) - f(a) - T(h)}{\|h\|} = 0 \qquad\text{and}\qquad \lim_{h\to 0}\frac{f(a+h) - f(a) - T'(h)}{\|h\|} = 0,$$
then we would have
$$\lim_{h\to 0}\frac{(T - T')(h)}{\|h\|} = 0.$$
Proposition 2.1. Suppose f is differentiable at a. Then $Df(a)(e_j) = \dfrac{\partial f}{\partial x_j}(a)$ for j = 1, . . . , n; that is, the columns of the standard matrix of Df(a) are the partial derivatives of f.

Proof. Since we assume f is differentiable at a, we know there is a linear map Df(a) with the property that
$$\lim_{h\to 0}\frac{f(a+h) - f(a) - Df(a)h}{\|h\|} = 0.$$
As we did in the remark above, for any j = 1, . . . , n, we consider h = teⱼ, and let t → 0. Then we have
$$0 = \lim_{t\to 0}\frac{f(a + te_j) - f(a) - Df(a)(te_j)}{|t|}.$$
Considering separately the cases t > 0 and t < 0, we find that
$$0 = \lim_{t\to 0^+}\frac{f(a + te_j) - f(a) - Df(a)(te_j)}{t} = \lim_{t\to 0^+}\frac{f(a + te_j) - f(a)}{t} - Df(a)(e_j)$$
$$0 = \lim_{t\to 0^-}\frac{f(a + te_j) - f(a) - Df(a)(te_j)}{-t} = -\left(\lim_{t\to 0^-}\frac{f(a + te_j) - f(a)}{t} - Df(a)(e_j)\right),$$
and so $Df(a)(e_j) = \lim_{t\to 0}\frac{f(a + te_j) - f(a)}{t} = \frac{\partial f}{\partial x_j}(a)$, as required.
Example 1. When n = 1, we have parametric equations of a curve in Rᵐ. We see that if f is differentiable at a, then
$$Df(a) = \begin{bmatrix} f_1'(a)\\ f_2'(a)\\ \vdots\\ f_m'(a)\end{bmatrix},$$
and we can think of Df(a) = Df(a)(1) as the velocity vector of the parametrized curve at the point f(a), which we will usually denote by the (more) familiar f′(a). See Section 5 for further discussion of this topic. ▽
Example 2. Let $f\begin{pmatrix}x\\y\end{pmatrix} = xy$. To prove that f is differentiable at $\mathbf a = \begin{pmatrix}a\\b\end{pmatrix}$, we must exhibit a linear map Df(a) with the requisite property. By Proposition 2.1, we know the only candidate is
$$Df\begin{pmatrix}a\\b\end{pmatrix} = \begin{bmatrix} b & a\end{bmatrix},$$
Proposition 2.2. If f is differentiable at a, then f is continuous at a.

Proof. Suppose f is differentiable at a; we must show that $\lim_{x\to a} f(x) = f(a)$ or, equivalently, that $\lim_{h\to 0} f(a+h) = f(a)$. We have a linear map Df(a) : Rⁿ → Rᵐ so that
$$\lim_{h\to 0}\frac{f(a+h) - f(a) - Df(a)h}{\|h\|} = 0.$$
This means that
$$\lim_{h\to 0}\bigl(f(a+h) - f(a) - Df(a)h\bigr) = \lim_{h\to 0}\frac{f(a+h) - f(a) - Df(a)h}{\|h\|}\cdot\lim_{h\to 0}\|h\| = 0.$$
Since the linear map Df(a) is continuous, we also have $\lim_{h\to 0} Df(a)h = 0$, and hence $\lim_{h\to 0} f(a+h) = f(a)$,
as required.
Let’s now study a few examples to see just how subtle the issue of differentiability is.
Example 6. Define f : R² → R by
$$f\begin{pmatrix}x\\y\end{pmatrix} = \frac{xy}{x^2+y^2},\ \mathbf x\neq 0, \qquad f(0) = 0.$$
Since $f\begin{pmatrix}x\\0\end{pmatrix} = 0$ for all x and $f\begin{pmatrix}0\\y\end{pmatrix} = 0$ for all y, certainly
$$\frac{\partial f}{\partial x}(0) = \frac{\partial f}{\partial y}(0) = 0.$$
However, we have already seen in Exercise 3.1.9 that f is discontinuous, so it cannot be differentiable. For practice, we check directly: if Df(0) existed, by Proposition 2.1 we would have Df(0) = 0. Now let's consider
$$\lim_{\mathbf h\to 0}\frac{f(\mathbf h) - f(0) - Df(0)\mathbf h}{\|\mathbf h\|} = \lim_{\mathbf h\to 0}\frac{f(\mathbf h)}{\|\mathbf h\|} = \lim_{\begin{pmatrix}h\\k\end{pmatrix}\to 0}\frac{hk}{(h^2+k^2)^{3/2}}.$$
Like many of the limits we considered in Chapter 2, this one obviously does not exist; indeed, as h → 0 along the line h = k, this fraction becomes
$$\frac{h^2}{(2h^2)^{3/2}} = \frac{1}{2\sqrt 2\,|h|},$$
which is clearly unbounded as h → 0. What's more, as the reader can check, f has directional derivatives at 0 only in the directions of the axes.
Example 7. Define f : R² → R by
$$f\begin{pmatrix}x\\y\end{pmatrix} = \frac{x^2y}{x^2+y^2},\ \mathbf x\neq 0, \qquad f(0) = 0.$$
As in Example 6, both partial derivatives of this function at 0 are 0. This function, as we saw in Example 3 of Chapter 2, Section 3, is continuous, so differentiability is a bit more unclear. But we just try to calculate:
$$\lim_{\mathbf h\to 0}\frac{f(\mathbf h) - f(0) - Df(0)\mathbf h}{\|\mathbf h\|} = \lim_{\mathbf h\to 0}\frac{f(\mathbf h)}{\|\mathbf h\|} = \lim_{\begin{pmatrix}h\\k\end{pmatrix}\to 0}\frac{h^2k}{(h^2+k^2)^{3/2}}.$$
When h → 0 along either coordinate axis, the limit is obviously 0; however, when h → 0 along the line h = k, the limit does not exist (the function is equal to $+\frac{1}{2\sqrt 2}$ when h > 0 and $-\frac{1}{2\sqrt 2}$ when h < 0). Thus, f is not differentiable at 0.
Proposition 2.3. Suppose f is differentiable at a. Then the directional derivative $D_{\mathbf v}f(a)$ exists for every v ∈ Rⁿ, and $D_{\mathbf v}f(a) = Df(a)\mathbf v$.

Proof. Since f is differentiable at a, we know that its derivative, Df(a), has the property that
$$\lim_{h\to 0}\frac{f(a+h) - f(a) - Df(a)h}{\|h\|} = 0.$$
Substituting h = tv and letting t → 0, we have
$$\lim_{t\to 0}\frac{f(a + t\mathbf v) - f(a) - Df(a)(t\mathbf v)}{|t|} = 0.$$
Since Df(a) is a linear map, Df(a)(tv) = tDf(a)v. Proceeding as in the proof of Proposition 2.1, letting t approach 0 through positive values, we have
$$\lim_{t\to 0^+}\frac{f(a + t\mathbf v) - f(a) - tDf(a)\mathbf v}{t} = 0, \qquad\text{and so}\qquad \lim_{t\to 0^+}\frac{f(a + t\mathbf v) - f(a)}{t} = Df(a)\mathbf v.$$
Remark. Let’s consider the case of a function f : R2 → R, as we pictured in Figures 1.1 and
1.2. As a consequence of Proposition 2.3, the tangent plane of the graph of f at a contains the
tangent lines at a of the slices by all vertical planes. The function f given in Example 2 of Section
1 cannot be differentiable at 0, as it is clear from Figure 1.3 that the tangent lines to the various
vertical slices at the origin do not lie in a plane.
Since it is so tedious to determine from the definition whether a function is differentiable, the
following Proposition is useful indeed.
Proposition 2.4. Suppose the partial derivatives of f exist in a neighborhood of a and are continuous at a (for short, f is C¹ at a). Then f is differentiable at a.

Figure 2.3

Proof. We give the proof for n = 2 and m = 1, writing $\mathbf a = \begin{pmatrix}a\\b\end{pmatrix}$ and $\mathbf h = \begin{pmatrix}h\\k\end{pmatrix}$. By Proposition 2.1, the only candidate for the derivative is $Df(\mathbf a) = \begin{bmatrix}\frac{\partial f}{\partial x}(\mathbf a) & \frac{\partial f}{\partial y}(\mathbf a)\end{bmatrix}$, so we must estimate
$$f(\mathbf a+\mathbf h) - f(\mathbf a) - Df(\mathbf a)\mathbf h = f(\mathbf a+\mathbf h) - f(\mathbf a) - \left(\frac{\partial f}{\partial x}(\mathbf a)h + \frac{\partial f}{\partial y}(\mathbf a)k\right).$$
Now, here is the new twist: as Figure 2.3 indicates, we calculate f(a + h) − f(a) by taking a two-step route:
$$f(\mathbf a+\mathbf h) - f(\mathbf a) = f\begin{pmatrix}a+h\\b+k\end{pmatrix} - f\begin{pmatrix}a\\b\end{pmatrix} = \left(f\begin{pmatrix}a+h\\b\end{pmatrix} - f\begin{pmatrix}a\\b\end{pmatrix}\right) + \left(f\begin{pmatrix}a+h\\b+k\end{pmatrix} - f\begin{pmatrix}a+h\\b\end{pmatrix}\right),$$
and so, regrouping in a clever fashion and using the Mean Value Theorem twice, we obtain
$$f(\mathbf a+\mathbf h) - f(\mathbf a) - Df(\mathbf a)\mathbf h = \left(f\begin{pmatrix}a+h\\b\end{pmatrix} - f\begin{pmatrix}a\\b\end{pmatrix} - \frac{\partial f}{\partial x}(\mathbf a)h\right) + \left(f\begin{pmatrix}a+h\\b+k\end{pmatrix} - f\begin{pmatrix}a+h\\b\end{pmatrix} - \frac{\partial f}{\partial y}(\mathbf a)k\right)$$
$$= \left(\frac{\partial f}{\partial x}\begin{pmatrix}a+\xi\\b\end{pmatrix} - \frac{\partial f}{\partial x}(\mathbf a)\right)h + \left(\frac{\partial f}{\partial y}\begin{pmatrix}a+h\\b+\eta\end{pmatrix} - \frac{\partial f}{\partial y}(\mathbf a)\right)k$$
for some ξ between 0 and h and some η between 0 and k.
Now, observe that $\frac{|h|}{\|\mathbf h\|}\le 1$ and $\frac{|k|}{\|\mathbf h\|}\le 1$; as h → 0, continuity of the partial derivatives guarantees that
$$\lim_{\mathbf h\to 0}\left(\frac{\partial f}{\partial x}\begin{pmatrix}a+\xi\\b\end{pmatrix} - \frac{\partial f}{\partial x}(\mathbf a)\right) = \lim_{\mathbf h\to 0}\left(\frac{\partial f}{\partial y}\begin{pmatrix}a+h\\b+\eta\end{pmatrix} - \frac{\partial f}{\partial y}(\mathbf a)\right) = 0,$$
since ξ → 0 and η → 0 as h → 0. Thus,
$$\frac{|f(\mathbf a+\mathbf h) - f(\mathbf a) - Df(\mathbf a)\mathbf h|}{\|\mathbf h\|} \le \left|\frac{\partial f}{\partial x}\begin{pmatrix}a+\xi\\b\end{pmatrix} - \frac{\partial f}{\partial x}(\mathbf a)\right|\frac{|h|}{\|\mathbf h\|} + \left|\frac{\partial f}{\partial y}\begin{pmatrix}a+h\\b+\eta\end{pmatrix} - \frac{\partial f}{\partial y}(\mathbf a)\right|\frac{|k|}{\|\mathbf h\|},$$
and therefore indeed approaches 0 as h → 0.
Example 8. We know that the function f given in Example 7 is not differentiable. It follows
from Proposition 2.4 that f cannot be C1 at 0. Let’s verify this directly.
It is obvious that $\frac{\partial f}{\partial x}(0) = \frac{\partial f}{\partial y}(0) = 0$, and for x ≠ 0, we have
$$\frac{\partial f}{\partial x}\begin{pmatrix}x\\y\end{pmatrix} = \frac{2xy^3}{(x^2+y^2)^2} \qquad\text{and}\qquad \frac{\partial f}{\partial y}\begin{pmatrix}x\\y\end{pmatrix} = \frac{x^2(x^2-y^2)}{(x^2+y^2)^2}.$$
So we see that when x ≠ 0, $\frac{\partial f}{\partial x}\begin{pmatrix}x\\x\end{pmatrix} = \frac12$ and $\frac{\partial f}{\partial y}\begin{pmatrix}x\\0\end{pmatrix} = 1$, neither of which approaches 0 as x → 0. Thus, f is not C¹ at 0.
Example 9. To see that the sufficient condition for differentiability given by Proposition 2.4
is not necessary, we consider the classic example of the function f : R → R defined by
$$f(x) = \begin{cases} x^2\sin\dfrac1x, & x\neq 0\\ 0, & x = 0.\end{cases}$$
Then it is easy to check that f′(0) = 0, and yet $f'(x) = 2x\sin\frac1x - \cos\frac1x$ has no limit as x → 0.
Thus, f is differentiable on all of R, but is not C1 .
EXERCISES 3.2
1. Find the equation of the tangent plane of the graph z = f(x) at the indicated point.
*a. $f\begin{pmatrix}x\\y\end{pmatrix} = e^{xy}$, $a = \begin{pmatrix}-1\\2\end{pmatrix}$
b. $f\begin{pmatrix}x\\y\end{pmatrix} = x^2 + y^2$, $a = \begin{pmatrix}-1\\2\end{pmatrix}$
c. $f\begin{pmatrix}x\\y\end{pmatrix} = \sqrt{x^2+y^2}$, $a = \begin{pmatrix}3\\4\end{pmatrix}$
d. $f\begin{pmatrix}x\\y\end{pmatrix} = \sqrt{4 - x^2 - y^2}$, $a = \begin{pmatrix}1\\1\end{pmatrix}$
e. $f\begin{pmatrix}x\\y\\z\end{pmatrix} = xyz$, $a = \begin{pmatrix}1\\2\\3\end{pmatrix}$
*f. $f\begin{pmatrix}x\\y\\z\end{pmatrix} = \sin(xy)z^2 + e^{xz+1}$, $a = \begin{pmatrix}1\\0\\-1\end{pmatrix}$
2. Calculate the directional derivative of f at a in the given direction v:
*a. $f\begin{pmatrix}x\\y\end{pmatrix} = e^x\cos y$, $a = \begin{pmatrix}0\\\pi/4\end{pmatrix}$, $v = \begin{pmatrix}1\\1\end{pmatrix}$
b. $f\begin{pmatrix}x\\y\end{pmatrix} = e^x\cos y$, $a = \begin{pmatrix}0\\\pi/4\end{pmatrix}$, $v = \begin{pmatrix}1\\-1\end{pmatrix}$
c. $f\begin{pmatrix}x\\y\end{pmatrix} = xy^2$, $a = \begin{pmatrix}3\\1\end{pmatrix}$, $v = \begin{pmatrix}1\\2\end{pmatrix}$
d. $f\begin{pmatrix}x\\y\end{pmatrix} = x^2 + y^2$, $a = \begin{pmatrix}2\\1\end{pmatrix}$, $v = \frac{1}{\sqrt 5}\begin{pmatrix}2\\1\end{pmatrix}$
*e. $f\begin{pmatrix}x\\y\end{pmatrix} = \sqrt{x^2+y^2}$, $a = \begin{pmatrix}2\\1\end{pmatrix}$, $v = \frac{1}{\sqrt 5}\begin{pmatrix}2\\1\end{pmatrix}$
f. $f\begin{pmatrix}x\\y\\z\end{pmatrix} = e^{xyz}$, $a = \begin{pmatrix}1\\-1\\-1\end{pmatrix}$, $v = \begin{pmatrix}2\\2\\1\end{pmatrix}$
*4. Use the technique of Example 4 to estimate your gas mileage if you used 6.5 gallons to drive
224 miles.
5. Two sides of a triangle are x = 3 and y = 4, and the included angle is θ = π/3. To a small
change in which of these three variables is the area of the triangle most sensitive? Why?
6. Let U ⊂ Rn be an open set, and let a ∈ U . Suppose m > 1. Prove that the function f : U → Rm
is differentiable at a if and only if each component function fi , i = 1, . . . , m, is differentiable at
a. (Hint: Review the proof of Proposition 3.1 of Chapter 2.)
7. Show that any linear map is differentiable and is its own derivative (at an arbitrary point).
a
8. Show that the tangent plane of the cone z 2 = x2 + y 2 at b 6= 0 intersects the cone in a line.
c
9. Show that the tangent plane of the saddle surface z = xy at any point intersects the surface in
a pair of lines.
10. Find the derivative of the map $f\begin{pmatrix}x\\y\end{pmatrix} = \begin{bmatrix}x^2 - y^2\\ 2xy\end{bmatrix}$ at the point a. Show that whenever a ≠ 0, the linear map Df(a) is a scalar multiple of a rotation matrix.
11. Prove from the definition that the following functions are differentiable:
a. $f\begin{pmatrix}x\\y\end{pmatrix} = x^2 + y^2$
b. $f\begin{pmatrix}x\\y\end{pmatrix} = xy^2$
c. f : Rⁿ → R, f(x) = ‖x‖²
12. Let
$$f\begin{pmatrix}x\\y\end{pmatrix} = \frac{x^2y}{x^4+y^2},\ \mathbf x\neq 0, \qquad f(0) = 0.$$
Show directly that f fails to be C1 at the origin. (Of course, this follows from Example 5 of
Section 3 of Chapter 2 and Propositions 2.2 and 2.4.)
13. Use the results of Exercise 3.1.13 to show that f (A) = A2 and f (A) = AT A are differentiable
functions mapping Mn×n to Mn×n .
♯ 14. Let A be an n × n matrix. Define f : Rn → R by f (x) = Ax · x = xT Ax.
a. Show that f is differentiable and Df (a)h = Aa · h + Ah · a.
b. Deduce that when A is symmetric, Df (a)h = 2Aa · h.
15. Let a ∈ Rn , δ > 0, and suppose f : B(a, δ) → R is differentiable at a. Suppose f (a) ≥ f (x) for
all x ∈ B(a, δ). Prove that Df (a) = O.
16. Let a ∈ R2 , δ > 0, and suppose f : B(a, δ) → R is differentiable and Df (x) = O for all
x ∈ B(a, δ). Prove that f (x) = f (a) for all x ∈ B(a, δ). (Hint: Start with the proof of
Proposition 2.4.)
§3. Differentiation Rules 93
17. Let
y, x 6= 0
x
f = .
y 0, x = 0
18. Let
x xy 6
f = , x 6= 0, f (0) = 0.
y x4 + y 8
a. Find all the directional derivatives of f at 0.
b. Is f continuous at 0?
c. Is f differentiable at 0?
19. a. Let f : R2 → R be the function defined in Example 5 of Chapter 2, Section 3. Show that
f has directional derivatives at 0 in every direction but is not differentiable at 0.
b. Find a function all of whose directional derivatives at 0 are 0 but, nevertheless, that is not
differentiable at 0.
c. Find a function all of whose directional derivatives at 0 are 0 but that is unbounded in any
neighborhood of 0.
d. Find a function all of whose directional derivatives at 0 are 0, all of whose directional
derivatives exist at every point, and that is unbounded in any neighborhood of 0.
3. Differentiation Rules
In practice, most of the time Proposition 2.4 is sufficient for us to calculate explicit derivatives.
However, it is reassuring to know that the sum, product, and quotient rules from elementary
calculus go over to the multivariable case. We shall come to the chain rule shortly.
For the next proofs, we need the notion of the norm of a linear map T : Rn → Rm . We set
kT k = max kT (x)k.
kxk=1
(In Section 1 of Chapter 5 we will prove the maximum value theorem, which states that a continuous
function on a closed and bounded subset of Rn achieves its maximum value.
Since the unit sphere
x
in Rn is closed and bounded, this maximum exists.) When x 6= 0, we have
T kxk
≤ kT k, and
so, by linearity, the following formula follows immediately:
kT (x)k ≤ kT kkxk.
Proof. These are much like the proofs of the corresponding results in single-variable calculus.
Here, however, we insert the candidate for the derivative in the definition and check that the limit
is indeed 0.
(f + g)(a + h) − (f + g)(a) − Df (a) + Dg(a) h
(1): lim
h→0 khk
f (a + h) − f (a) − Df (a)h + g(a + h) − g(a) − Dg(a)h
= lim
h→0 khk
f (a + h) − f (a) − Df (a)h g(a + h) − g(a) − Dg(a)h
= lim + lim = 0 + 0 = 0.
h→0 khk h→0 khk
(2): We proceed much asin the proof of the limit of the product in Theorem 3.2 of Chapter 2.
(kf )(a + h) − (kf )(a) − Dk(a)h f (a) + k(a)Df (a)h
lim
h→0 khk
k(a + h)f (a + h) − k(a)f (a) − Dk(a)h f (a) + k(a)Df (a)h
= lim
h→0 khk
k(a + h) − k(a) f (a + h) + k(a) f (a + h) − f (a) − Dk(a)h f (a) + k(a)Df (a)h
= lim
h→0 khk
k(a + h) − k(a) f (a + h) − Dk(a)h f (a) k(a) f (a + h) − f (a) − k(a)Df (a)h
= lim + lim
h→0 khk h→0 khk
k(a + h) − k(a) f (a + h) − Dk(a)h f (a) f (a + h) − f (a) − Df (a)h
= lim + k(a) lim
h→0 khk h→0 khk
Now, the second term clearly approaches 0. To handle the first term, we have to use continuity in
a rather subtle way, remembering that if f is differentiable at a, then it is necessarily continuous
at a (Proposition 2.2).
k(a + h) − k(a) f (a + h) − Dk(a)h f (a)
khk
k(a + h) − k(a) − Dk(a)h f (a + h) + Dk(a)h f (a + h) − f (a)
=
khk
k(a + h) − k(a) − Dk(a)h Dk(a)h f (a + h) − f (a)
= f (a + h) + .
khk khk
Now here the first term clearly approaches 0, but the second term is a bit touchy. The length of
the second term is
|Dk(a)h|kf (a + h) − f (a)k
≤ kDk(a)kkf (a + h) − f (a)k,
khk
which in turn goes to 0 as h → 0 by continuity of f at a. This concludes the proof of (2).
The proof of (3) is virtually identical to that of (2) and is left to the reader in Exercise 9.
§3. Differentiation Rules 95
(∗) 0 < khk < δ1 =⇒ kg(a + h) − g(a) − Dg(a)hk < εkhk and
(∗∗) kkk < η =⇒ kf (b + k) − f (b) − Df (b)kk ≤ εkkk.
Setting k = g(a + h) − g(a) and rewriting (∗), we conclude that whenever 0 < khk < δ1 , we have
and so
kkk < kDg(a)hk + εkhk ≤ kDg(a)k + ε khk.
Let δ2 = η kDg(a)k + ε and set δ = min(δ1 , δ2 ).
Finally, we start with the numerator of the fraction whose limit we seek.
Remark. Those who wish to end with a perfect ε at the end may replace the ε in (∗) with
ε ε
and that in (∗∗) with .
2(kDf (b)k + 1) 2(kDg(a)k + ε)
96 Chapter 3. The Derivative
x
Example 1. Suppose the temperature in space is given by f y = xyz 2 + e3xy−2z and the
z
position ofa bumblebee
is given as a function of timet by g : R → R3 . If at time t = 0 the bumblebee
1 −1
is at a = 2 and her velocity vector is v = 1 , as indicated in Figure 3.1, then we might ask
3 2
at what rate she perceives the temperature to be changing at that instant. The temperature she
measures at time t is (f ◦ g)(t), and so she wants to calculate (f ◦ g)′ (0) = D(f ◦ g)(0). We have
Figure 3.1
x
Df y = yz 2 + 3ye3xy−2z xz 2 + 3xe3xy−2z 2xyz − 2e3xy−2z ,
z
h i
so Df (a) = 24 12 10 . Then
h i −1
(f ◦ g)′ (0) = Df (g(0))g′ (0) = Df (a)v = 24 12 10 1 = 8.
2
Note that in order to apply the chain rule, we need to know only her position and velocity vector
at that instant, not even what her path near a might be. ▽
Example 2. Let
x x2 − y 2 u u cos v
f = and g = .
y 2xy v u sin v
Since
x 2x −2y u cos v −u sin v
Df = and Dg = ,
y 2y 2x v sin v u cos v
we have
u u u
D(f ◦ g) = Df g Dg
v v v
" #" # " #
2u cos v −2u sin v cos v −u sin v u cos 2v −u2 sin 2v
= =2 .
2u sin v 2u cos v sin v u cos v u sin 2v u2 cos 2v
2
u u cos 2v
On the other hand, as the reader can verify, (f g)
◦ = 2 , and so we can double-check
v u sin 2v
the calculation of the derivative directly. ▽
EXERCISES 3.3
*2. Suppose
2y − sin x x
x 3x + y − z
f = ex+3y and g y = .
y x + yz + 1
xy + y 3 z
*5. An airplane is flying near a radar tower. At the instant it is exactly 3 miles due west of the
tower, it is 4 miles high and flying with a ground speed of 450 mph and climbing at a rate of 5
mph. If at that instant it is flying
a. due east,
b. northeast,
at what rate is it approaching the radar tower at that instant?
*6. An ideal gas obeys the law pV = nRT , where p is pressure, V is volume, n is the number of
moles, R is the universal gas constant, and T is temperature. Suppose for a certain quantity
of ideal gas, nR = 1 ℓ-atm/◦ K. At a given instant, the volume is 10 ℓ and is increasing at the
rate of 1 ℓ/min; the temperature is 300◦ K and is increasing at the rate of 5◦ K/min. At what
rate is the pressure increasing at that instant?
7. Ohm’s law tells us that V = IR, where V is the voltage in an electric circuit, I is the current flow
(in amps), and R is the resistance (in ohms). Suppose that as time passes, the voltage decreases
as the battery wears out and the resistance increases as the resistor heats up. Assuming V and
R vary as differentiable functions of t, at what rate is the current flow changing at the instant
t0 if R(t0 ) = 100 ohm, R′ (t0 ) = 0.5 ohm/sec, I(t0 ) = 0.1 amp, and V ′ (t0 ) = −0.1 volt/sec?
9. Prove (3) in Proposition 3.1. (One approach is to mimic the proof given of (2). Another is to
apply (1) and (2) appropriately.)
♯ 10. Suppose U ⊂ Rn is open and a ∈ U . Let f , g : U → R3 be differentiable at a. Prove that f × g
is differentiable at a and D(f × g)(a)v = Df (a)v × g(a) + f (a) × Dg(a)v for any v ∈ Rn .
(Hint: Follow the proof of part (2) of Proposition 3.1, and use Exercise 1.5.14.)
∂h ∂h
x +y = rf ′ (r).
∂x ∂y
§4. The Gradient 99
*14. Suppose h : R → R is continuous and u, v : (a, b) → R are differentiable. Prove that the function
F : (a, b) → R given by
Z v(t)
F (t) = h(s)ds
u(t)
is differentiable and calculate F ′.
(Hint: Recall that the Fundamental Theorem of Calculus
Rx
tells you how to differentiate functions such as H(x) = a h(s)ds.)
2 u u+v
15. If f : R → R is differentiable and F =f , show that
v u−v
∂F ∂F ∂f 2 ∂f 2
= − ,
∂u ∂v ∂x ∂y
u+v
where the functions on the right-hand side are evaluated at .
u−v
r r cos θ
*16. Suppose f : R2 → R is differentiable, let F =f . Calculate
θ r sin θ
2 2
∂F 1 ∂F
+
∂r r2 ∂θ
in terms of the partial derivatives of f .
∂f ∂f
17. Suppose f : R2 → R is differentiable and =c for some nonzero constant c. Prove that
∂t ∂x
x u x
f = h(x + ct) for some function h. (Hint: Let = .)
t v x + ct
4. The Gradient
To develop physical intuition, it is important to recast Proposition 2.3 in more geometric terms
when f is a scalar-valued function.
Now we can interpret the directional derivative of a differentiable function as a dot product:
(∗) Dv f (a) = Df (a)v = ∇f (a) · v.
If we consider the directional derivative in the direction of various unit vectors v, we infer from the
Cauchy-Schwarz inequality, Proposition 2.3 of Chapter 1, that
As a consequence, we have
Remark. Be sure to distinguish between a level surface of f and the graph of f (which, in this
case, would reside in R4 ).
∇f(P)
v2 β α
v1
P
F1 F2
Figure 4.1
EXERCISES 3.4
Figure 4.2
h i
*4. Suppose a hillside is given by z = f (x), x ∈ U ⊂ R2 . Suppose f (a) = c and Df (a) = 3 −4 .
a
a. Find a vector tangent to the curve of steepest ascent on the hill at .
c
a
b. Find the angle that a stream makes with the horizontal at if it flows in the e2 direction
c
at that point.
5. As shown in Figure 4.3, at a certain moment, a ladybug is at position x0 and moving with
velocity vector v. At that moment, the angle ∠ax0 b = π/2, her velocity bisects that angle,
x0
a b
Figure 4.3
and her speed is 5 units/sec. At what rate is the sum of her distances from a and b decreasing
at that moment? Give your reasoning clearly.
6. Suppose that, in a neighborhood of the point a, the level curve C = {x ∈ R2 : f (x) = c} can
be parametrized by a differentiable function g : (−ε, ε) → R2 , with g(0) = a. Use the chain
rule to prove that ∇f (a) is orthogonal to the tangent vector to C at a.
§4. The Gradient 103
7. Check that the definition of an ellipse given in Example 3 gives the usual Cartesian equation
of the form
x2 y 2
+ 2 =1
a2 b
±c
when the foci are at . (Hint: You should find that a2 = b2 + c2 .)
0
8. By analogy with Example 3, prove that light emanating from the focus of a parabola reflects
off the parabola in the direction of the axis of the parabola. This is why automobile headlights
use parabolic reflectors. (A convenient definition of a parabola is this: it is the locus of points
focus
directrix
Figure 4.4
equidistant from a point (the focus) and a line (the directrix ), as pictured in Figure 4.4.)
9. Using Figure 4.5 as a guide, complete Dandelin’s proof (dating from 1822) that the appropriate
conic section is an ellipse. Find spheres that are inscribed in the cone and tangent to the plane
Q2
F2
F1 P
Q1
Figure 4.5
of the ellipse. Letting F1 and F2 be the points of tangency and P a point of the ellipse, let Q1
and Q2 be points where the generator of the cone through P intersects the respective spheres.
104 Chapter 3. The Derivative
10. Suppose f : R2 → R is a differentiable function whose gradient is nowhere 0 and that satisfies
∂f ∂f
=2
∂x ∂y
everywhere.
a. Find (with proof) the level curves of f .
x
b. Show that there is a differentiable function F : R → R so that f = F (2x + y).
y
11. Suppose f : R2 − {0} → R is a differentiable function whose gradient is nowhere 0 and that
satisfies
∂f ∂f
−y +x =0
∂x ∂y
everywhere.
a. Find (with proof) the level curves of f .
b. Show that there is a differentiable function F defined on the set of positive real numbers
so that f (x) = F (kxk).
x2 + y 2 + z 2 = 1 and z = x2 + y 2 + c
13. Prove the so-called pedal property of the ellipse: If n is the unit normal to the ellipse at P , then
−−→ −−→
(F1 P · n)(F2 P · n) = constant.
14. The height of land in the vicinity of a hill is given in terms of horizontal
coordinates x and
1
x 40
y by h = . A stream passes through the point 1 and follows a path of
y 4 + x2 + 3y 2 5
“steepest descent.” Find the equation of the path of the stream on a map of the region.
15. A drop of water falls onto a football and rolls down following the path of steepest descent; that
is, it moves in the direction tangent to the football most nearly vertically downward. Find the
path the water drop follows if the surface of the football is ellipsoidal and given by the equation
4x2 + y 2 + 4z 2 = 9
1
and the drop starts at the point 1 .
1
§5. Curves 105
5. Curves
In this section, we return to the study of (parametrized) curves with which we began Chapter
2. Now we bring in the appropriate differential calculus to discuss velocity, acceleration, some basic
principles from physics, and the notion of curvature.
If g : (a, b) → Rn is a twice-differentiable vector-valued function, we can visualize g(t) as denot-
ing the position of a particle at time t, and hence the image of g represents its trajectory as time
passes. Then g′ (t) is the velocity vector of the particle at time t and g′′ (t) is its acceleration vector
at time t. The length of the velocity vector, kg′ (t)k, is called the speed of the particle. In physics,
a particle of mass m is said to have kinetic energy
K.E. = 21 m(speed)2 ,
and acceleration looms large because of Newton’s second law of motion, which says that a force
acting on an object imparts an acceleration according to the equation
−−→ −−−−−−−−→
force = (mass)acceleration or, in other words, F = ma.
As a quick application of some vector calculus, let’s discuss a few properties of motion in a central
force field. We call a force field F : U → R3 on an open subset U ⊂ R3 central if F(x) = ψ(x)x
for some continuous function ψ : U → R; that is, F is everywhere a scalar multiple of the position
vector.
Newton discovered that the gravitational field of a point mass M is an inverse square force
directed toward the point mass. If we assume the point mass is at the origin, then the force exerted
on a unit test mass at position x is
GM x GM
F(x) = − 2
=− x,
kxk kxk kxk3
where G is the universal gravitational constant. Newton published his laws of motion in 1687 in his
Philosophiae Naturalis Principia Mathematica. Interestingly, Kepler had published his empirical
observations almost a century earlier, in 1596:1
Kepler’s first law: Planets move in ellipses with the sun at one focus.
Kepler’s second law: The position vector from the sun to the planet sweeps out area at a
constant rate.
Kepler’s third law: The square of the period of a planet is proportional to the cube of the
semimajor axis of its elliptical orbit.
For the first and third laws we refer the reader to Exercise 15, but here we prove a generalization
of the second.
Proposition 5.1. Let F be a central force field on R3 . Then the trajectory of any particle lies
in a plane; assuming the trajectory is not a line, the position vector sweeps out area at a constant
rate.
1
Somewhat earlier he had surmised that the positions of the six known planets were linked to the famous five
regular polyhedra.
106 Chapter 3. The Derivative
Proof. Let the trajectory of the particle be given by g(t), and let its mass be m. Consider the
vector function A(t) = g(t) × g′ (t). By Exercise 3.3.10 and by Newton’s second law of motion, we
have
A′ (t) = g′ (t) × g′ (t) + g(t) × g′′ (t) = g(t) × 1
m ψ(g(t))g(t) = 0,
since the cross product of any vector with a scalar multiple of itself is 0. Thus, A(t) = A0 is a
constant. If A0 = 0, the particle moves on a line (why?). If A0 6= 0, then note that g lies on the
plane
A0 · x = 0,
as A0 · g(t) = A(t) · g(t) = 0 for all t.
Assume now the trajectory is not linear. Let A(t) denote the area swept out by the position
vector g(t) from time t0 to time t. Since A(t + h) − A(t) equals the area subtended by the position
g(t+h)
g(t)
Figure 5.1
vectors g(t) and g(t + h) (see Figure 5.1), for h small, this is approximately the area of the triangle
determined by the pair of vectors, or equivalently, by the vectors g(t) and g(t + h) − g(t). By
Proposition 5.1 of Chapter 1, this area is 12 kg(t) × g(t + h) − g(t) k, so that
A(t + h) − A(t)
A′ (t) = lim
h→0 h
1 kg(t) × g(t + h) − g(t) k
= lim
h→0+ 2 h
1
g(t + h) − g(t)
= lim
g(t) ×
h→0+ 2 h
= 12 kg(t) × g′ (t)k = 12 kA0 k.
That is, the position vector sweeps out area at a constant rate.
One of the most useful (yet intuitively quite apparent) results about curves is the following.
to obtain
as required.
Physically, one should think of it this way: if the velocity vector had a nonzero projection on the
position vector, that would mean that the particle’s distance from the center of the sphere would
be changing. Analogously, as we ask the reader to show in Exercise 2, if a particle moves with
constant speed, then its acceleration must be orthogonal to its velocity.
Now we leave physics behind for a while and move on to discuss some geometry. We begin with
a generalization of the triangle inequality, Corollary 2.4 of Chapter 1.
Lemma 5.3. Suppose g : [a, b] → Rn is continuous (except perhaps at finitely many points).
Then, defining the integral of g component by component, i.e.,
R
b
Z b g 1 (t)dt
a .
g(t)dt = .. ,
Rb
a
a ng (t)dt
we have
Z b
Z b
g(t)dt
≤ kg(t)kdt.
a a
Z b
Proof. Let v = g(t)dt. If v = 0, there is nothing to prove. By the Cauchy-Schwarz
a
inequality, Proposition 2.3 of Chapter 1, |v · g(t)| ≤ kvkkg(t)k, so
Z b Z b Z b Z b
2
kvk = v · g(t)dt = v · g(t)dt ≤ kvkkg(t)kdt = kvk kg(t)kdt.
a a a a
Z b
Assuming v 6= 0, we now infer that kvk ≤ kg(t)kdt, as required.
a
That is, ℓ(g, P) is the length of the inscribed polygon with vertices at g(ti ), i = 0, . . . , k, as indicated
in Figure 5.2. We define the arclength of g to be
The following result is not in the least surprising: The distance a particle travels is the integral
of its speed.
108 Chapter 3. The Derivative
a b
Given this partition, the length of this polygonal
P, of [a,b], path is ℓ(g,P).
Figure 5.2
Figure 5.3
a constant pitch. If we take one “coil” of the helix, letting t run from 0 to 2π, then the arclength
of that portion is
Z 2π Z 2π
−a sin t
Z 2π p p
ℓ(g) = kg′ (t)kdt =
a cos t
dt = a2 + b2 dt = 2π a2 + b2 . ▽
0 0
b
0
We say the parametrized curve is arclength-parametrized if kg′ (t)k = 1 for all t, so s(t) = t+c for
some constant c. Typically, when the curve is arclength-parametrized, we use s as the parameter.
Then
1√
2 √1 + s
g′ (s) = 12 1 − s , and kg′ (s)k = 1. ▽
√1
2
If g is arclength-parametrized, then the velocity vector g′ (s) is the unit tangent vector at each
point, which we denote by T(s). Let’s assume now that g is twice differentiable. Since kT(s)k = 1
for all s, it follows from Proposition 5.2 that T(s) · T′ (s) = 0. Define the curvature of the curve to
be κ(s) = kT′ (s)k; assuming T′ (s) 6= 0, define the principal normal vector N(s) = T′ (s)/kT′ (s)k.
(See Figure 5.4.)
T(s)
g(s)
N(s)
Figure 5.4
Figure 5.5
Note that so long as g′ (t) never vanishes, the arclength s is a differentiable function of t with
everywhere positive derivative; thus, it has a differentiable inverse function, which we write t(s).
We can “reparametrize by arclength” by considering the composition h(s) = g(t(s)), and then, of
course, g(t) = h(s(t)). Writing2 υ(t) = s′ (t) = kg′ (t)k for the speed, we have by the chain rule
Then we have
" # " #
− cos t − cos t
g′ (t) = 3 cos t sin t , so υ(t) = 3 cos t sin t and T(s(t)) = .
sin t sin t
EXERCISES 3.5
1. Suppose g : (a, b) → Rn is a differentiable parametrized curve with the property that at each t,
the position and velocity vectors are orthogonal. Prove that g lies on a sphere centered at the
origin.
4. Suppose a particle moves in a central force field in R3 with constant speed. What can you say
about its trajectory? (Proof?)
5. Suppose g : (a, b) → Rn is nowhere zero and g′ (t) = λ(t)g(t) for some scalar function λ. Prove
(rigorously) that g/kgk is constant. (Hint: Set h = g/kgk, write g = kgkh, and differentiate.)
6. Suppose g : (a, b) → Rn is a differentiable parametrized curve and that for some point p ∈ Rn
we have kg(t0 ) − pk ≤ kg(t) − pk for all t ∈ (a, b). Prove that g′ (t0 ) · (g(t0 ) − p) = 0. Give a
geometric explanation.
9. Prove that for a parametrized curve g : (a, b) → R3 , we have κ = kg′ × g′′ k/υ 3 .
10. Using the formula (†) for acceleration, explain how engineers might decide at what angle to
bank a road that is a circle of radius 1/4 mile and around which cars wish to drive safely at 40
mph.
b. Show that B′ · T = B′ · B = 0, and deduce that B′ (s) is a scalar multiple of N(s) for every
s. (Hint: See Exercise 3.)
c. Define the torsion τ of the curve by B′ = −τ N. Show that g is a planar curve if and only
if τ (s) = 0 for all s.
d. Show that N′ = −κT + τ B.
The equations
T′ = κN, N′ = −κT + τ B, B′ = −τ N
are called the Frenet formulas for the arclength-parametrized curve g.
*12. (See Exercise 11 for the definition of torsion.) Calculate the curvature and torsion of the helix
presented in Example 1. Explain the meaning of the sign of the torsion.
13. (See Exercise 11 for the definition of torsion.) Calculate the curvature and torsion of the curve
et cos t
g(t) = et sin t .
et
14. A pendulum is made, as pictured in Figure 5.6, by hanging from the cusp where two arches of a
cycloid meet a string of length equal to the length of one of the arches. As it swings, the string
Figure 5.6
wraps around the cycloid and extends tangentially to the bob at the end. Given the equation
" #
t + sin t
f (t) = , 0 ≤ t ≤ 2π,
1 − cos t
for the cycloid, find the parametric equation of the bob, P , of the pendulum.3
15. Assuming that the force field is inverse square, prove Kepler’s first and third laws, as follows.
Without loss of generality, we may assume that the planet has mass 1 and moves in the xy-
plane. (You will need to use polar coordinates, as introduced in Example 6 of Chapter 2,
Section 1.)
3
This phenomenon was originally discovered by the Dutch mathematician Huygens, in an effort to design a
pendulum whose period would not depend on the amplitude of its motion, hence one ideal for an accurate clock.
114 Chapter 3. The Derivative
a. Suppose a, b > 0 and a2 = b2 + c2 . Show that the polar coordinates equation of the ellipse
(x − c)2 y 2 c b2
+ = 1 is r(1 − cos θ) = .
a2 b2 a a
This is an ellipse with semimajor axis a and semiminor axis b, with one focus at the origin.
(Hint: Expand the left-hand side in polar coordinates and express the result as a difference
of squares.)
eθ(t)
er(t)
g(t)
r(t)
θ(t)
Figure 5.7
b. Let r(t) and θ(t) be the polar coordinates of g(t), and let
cos θ(t) − sin θ(t)
er (t) = and eθ (t) = ,
sin θ(t) cos θ(t)
as pictured in Figure 5.7. Show that
g′ (t) = r ′ (t)er (t) + r(t)θ ′ (t)eθ (t)
g′′ (t) = r ′′ (t) − r(t)θ ′ (t)2 er (t) + 2r ′ (t)θ ′ (t) + r(t)θ ′′ (t) eθ (t).
c. Let A0 be as in the proof of Proposition 5.1. Show that g′′ (t) × A0 = GM θ ′ (t)eθ (t) =
GM e′r (t), and deduce that g′ (t) × A0 = GM (er (t) + c) for some constant vector c.
d. Dot the previous equation with g(t) and use the fact that g(t) × g′ (t) = A0 to deduce
that GM r(t)(1 − kck cos θ(t)) = kA0 k2 if we assume c is a negative scalar multiple of e1 .
Deduce that when kck ≥ 1 the path of the planet is unbounded and that when kck < 1
the orbit of the planet is an ellipse with one focus at the origin.
e. As we shall see in Chapter 7, the area of an ellipse with semimajor axis a and semiminor
4π 2 3
axis b is πab; show that the period T = 2πab/kA0 k. Now prove that T 2 = a .
GM
16. (pilfered from Which Way did the Bicycle Go . . . and other intriguing mathematical mysteries,
published by the M.A.A.)
“This track, as you perceive, was made by a rider who was going from the
direction of the school.”
“Or towards it?”
“No, no, my dear Watson . . . It was undoubtedly heading away from the
school.”
So spoke Sherlock Holmes.4 Imagine a 20-foot wide mud patch through which a bicycle has
just passed, with its front and rear tires leaving tracks as illustrated in Figure 5.8. (We have
4
The Return of Sherlock Holmes,“The Adventure of the Priory School”
§6. Higher-Order Partial Derivatives 115
Figure 5.8
taken the liberty of helping you in your capacity as sleuth by dashing the path of one of the
wheels.) In which direction was the bicyclist traveling? Explain your answer.
∂f
= yexy sin z + y 3 z 4
∂x
2
∂ f
= yexy cos z + 4y 3 z 3
∂z∂x
∂3f
= −yexy sin z + 12y 3 z 2 , and
∂z 2 ∂x
∂3f
= exy (xy + 1) cos z + 12y 2 z 3 . ▽
∂y∂z∂x
It is a hassle to keep track of the order in which we calculate higher-order partial derivatives.
Luckily, the following result tells us that for smooth functions, the order in which we calculate the
116 Chapter 3. The Derivative
partial derivatives does not matter. This is an intuitively obvious result, but the proof is quite
subtle.
Theorem 6.1. Let U ⊂ Rn be open, and suppose f : U → Rm is a C2 function. Then for any
i and j we have
∂2f ∂2f
= .
∂xi ∂xj ∂xj ∂xi
Proof. It suffices to prove the result when m = 1. For ease of notation, we take n = 2, i = 1,
and j = 2. Introduce the function
h a+h a+h a a
∆ =f −f −f +f ,
k b+k b b+k b
s s
as indicated schematically in Figure 6.1. Letting q(s) = f −f , and applying the Mean
b+k b
a a+h
b+k — + b+k
r(t)
a + — a+h
b q(s) b
Figure 6.1
Value Theorem, we have
h
∆ = q(a + h) − q(a) = hq ′ (ξ) for some ξ between a and a + h
k
∂f ξ ∂f ξ
=h −
∂x b + k ∂x b
2
∂ f ξ
= hk for some η between b and b + k.
∂y∂x η
a+h a
On the other hand, letting r(t) = f −f , we have
t t
h
∆ = r(b + k) − r(k) = kr ′ (τ ) for some τ between b and b + k
k
∂f a + h ∂f a
=k −
∂y τ ∂y τ
∂2f σ
= hk for some σ between a and a + h.
∂x∂y τ
Therefore, we have
1 h ∂2f ξ ∂2f σ
∆ = = .
hk k ∂y∂x η ∂x∂y τ
§6. Higher-Order Partial Derivatives 117
∂2f ∂2f
Now ξ, σ → a and η, τ → b as h, k → 0, and since the functions and are continuous,
∂x∂y ∂y∂x
we have
∂2f a ∂2f a
= ,
∂x∂y b ∂y∂x b
as required.
Example 2. (Harmonic functions) If f is a C2 function on (an open subset of) Rn , the expres-
sion
∂2f ∂2f ∂2f
∇2 f = 2 + 2 + ··· + 2
∂x1 ∂x2 ∂xn
is called the Laplacian of f . A solution of the equation ∇2 f = 0 is called a harmonic function. As
we shall see in Chapter 8, the Laplacian and harmonic functions play an important rôle in physical
applications. For example, the gravitational (resp., electrostatic) potential is a harmonic function
in mass-free (resp., charge-free) space. ▽
∂2f 2
2∂ f
(∗) = c
∂t2 ∂x2
models the displacement of a one-dimensional vibrating string (with “wave velocity” c) from its
equilibrium position. By a clever use of the chain rule, we can find an explicit formula for its general
solution, assuming f is C2 . Let
" # 1
x u 2 (u + v)
=g = 1
2c (u − v)
t v
u u
(so that u = x + ct and v = x − ct), and set F =f g . Then by the chain rule, we have
v v
u u u
DF = Df g Dg
v v v
" 1 1
#
∂f u ∂f u 2 2
= g g 1 1 ,
∂x v ∂t v −
2c 2c
so
" 1 #
∂F ∂f u ∂f u 2 1 ∂f u 1 ∂f u
= g g 1 = g − g .
∂v ∂x v ∂t v − 2c 2 ∂x v 2c ∂t v
118 Chapter 3. The Derivative
Now,
differentiating
with
respect
to u, we have to apply the chain rule to each of the functions
∂f u ∂f u
g and g :
∂x v ∂t v
" 1 # 2 2 " 1 #
∂2F 1 ∂2f u ∂2f u 2 1 ∂ f u ∂ f u 2
= 2
g g 1 − g 2
g 1
∂u∂v 2 ∂x v ∂t∂x v 2c ∂x∂t v ∂t v
2c 2c
1 ∂2f u 1 ∂2f u 1 ∂2f u 1 ∂2f u
= 2
g + g − g − 2 2 g
4 ∂x v c ∂t∂x v c ∂x∂t v c ∂t v
2
1 ∂ f u 1 ∂2f u
= 2
g − 2 2
g = 0,
4 ∂x v c ∂t v
where at the last step we use Theorem 6.1. Now what can we say about the general solution of the
∂2F
equation = 0? On any rectangle in the uv-plane, we can infer that
∂u∂v
u
F = φ(u) + ψ(v)
v
∂ ∂F ∂F
for some differentiable functions φ and ψ. (For = 0 tells us that is independent of
∂u ∂v ∂v
u, hence a function of v only, whose antiderivative we call ψ(v). But the constant of integration
can be an arbitrary function of u. To examine this argument a bit more carefully, we recommend
that the reader consider Exercise 11.)
In conclusion, on a suitable domain, the general solution of the wave equation (∗) can be written
in the form
x
f = φ(x + ct) + ψ(x − ct)
t
for arbitrary C2 functions φ and ψ. The physical interpretation is this: the general solution is the
superposition of two traveling waves, one moving to the right along the string with speed c, the
other moving to the left with speed c. ▽
Example 4. (Minimal Surfaces) When you dip a piece of wire shaped in the form of a closed
curve C into soap film, the resulting surface you see is called a minimal surface, so called because
in principle surface tension dictates that the surface should have least area among all those surfaces
x
having that curve C as boundary. If the minimal surface is in the form of a graph z = f ,
y
then it is shown in a differential geometry course that f must be a solution of the minimal surface
equation ∂f 2 ∂ 2 f ∂f ∂f ∂ 2 f
∂f 2 ∂ 2 f
1+ − 2 + 1 + = 0.
∂y ∂x2 ∂x ∂y ∂x∂y ∂x ∂y 2
(See also Exercise 8.5.22.) Examples of minimal surfaces are:
(a) a plane;
(b) a helicoid—the spiral surface obtained by joining points of a helix “horizontally” to its
vertical axis, as pictured in Figure 6.2(a);
1
(c) a catenoid—the surface of revolution obtained by rotating a catenary y = 2c (ecx + e−cx )
(for any c > 0) about the x-axis, as pictured in Figure 6.2(b).
(See Exercise 10.) ▽
§6. Higher-Order Partial Derivatives 119
(a) (b)
Figure 6.2
EXERCISES 3.6
1. Define f : R2 → R by
x x2 − y 2
f = xy , x 6= 0, f (0) = 0.
y x2 + y 2
∂f 0 ∂f x
a. Show that = −y for all y and = x for all x.
∂x y ∂y 0
∂2f ∂2f
b. Deduce that (0) = 1 but (0) = −1.
∂x∂y ∂y∂x
c. Conclude that f is not C2 at 0.
2. Check
that the following are harmonic functions.
x
a. f = 3x2 − 5xy − 3y 2
y
x
b. f = log(x2 + y 2 )
y
x
c. f y = x2 + xy + 2y 2 − 3z 2 + xyz
z
x
d. f y = (x2 + y 2 + z 2 )−1/2
z
3. Check that the following functions are solutions of the one-dimensional wave equation given in
Example
3.
x
a. f = cos(x + ct)
t
x
b. f = sin 5x cos 5ct
t
x 2 /4kt
4. Let f = t−1/2 e−x . Show that f is a solution of the one-dimensional heat equation
t
∂f ∂2f
= k 2.
∂t ∂x
120 Chapter 3. The Derivative
*5. Suppose
we are given a solution f of the one-dimensional
wave equation, with initial position
x ∂f x
f = h(x) and initial velocity = k(x). Express the functions φ and ψ in the
0 ∂t 0
solution of Example 3 in terms of h and k.
2 2 2 2 u u u
6. Suppose f : R → R and g : R → R are C , and let F =f g . Writing g1 =
v v v
u u u
x and g2 =y , show that
v v v
∂2F ∂f ∂ 2 x ∂f ∂ 2 y ∂ 2 f ∂x ∂x ∂2f ∂x ∂y ∂x ∂y ∂ 2 f ∂y ∂y
= + + 2 + + + 2 ,
∂u∂v ∂x ∂u∂v ∂y ∂u∂v ∂x ∂u ∂v ∂x∂y ∂u ∂v ∂v ∂u ∂y ∂u ∂v
u
where the partial derivatives of f are evaluated at g .
v
r r cos θ
7. Suppose f : R2 → R is C2 . Let F =f . Show that
θ r sin θ
∂2F 1 ∂F 1 ∂2F ∂2f ∂2f
+ + = + ,
∂r 2 r ∂r r 2 ∂θ 2 ∂x2 ∂y 2
r r cos θ
where the left-hand side is evaluated at and the right-hand side is evaluated at .
θ r sin θ
(This is the formula for the Laplacian in polar coordinates.)
r
8. Use the result of Exercise 7 to show that for any integer n, the functions F = r n cos nθ
θ
r
and F = r n sin nθ are harmonic.
θ
10. Check that the following functions f : R2 → R are indeed solutions of the minimal surface
equation
given in Example 4.
x
a. f =c
y
x
b. f = arctan(y/x)
y
q
x 1 x
2
c. f = 2 (e + e−x ) − y 2 (For this one a computer algebra system is recommended.)
y
0,
u u < 0 or v < 0 ∂2F
11. Define F = . Show that F is C2 and = 0, and yet F cannot
v u3 ,
u ≥ 0 and v > 0 ∂u∂v
be written in the form prescribed by the discussion of Example 3. Resolve this paradox.
CHAPTER 4
Implicit and Explicit Solutions of Linear Systems
We have seen that we can view the unit circle {x ∈ R2 : kxk = 1} either as the set of
cos t
solutions of an equation or in terms of a parametric representation : t ∈ [0, 2π) . These
sin t
are, respectively, the implicit and explicit representations of this subset of R2 . Similarly, any
subspace V ⊂ Rn can be represented in two ways:
(i) V = Span(v1 , . . . , vk ) for appropriate vectors v1 , . . . , vk ∈ Rn —this is the explicit or
parametric representation;
(ii) V = {x ∈ Rn : Ax = 0} for an appropriate m × n matrix A—this is the implicit
representation, viewing V as the intersection of the hyperplanes defined by Ai · x = 0.
In this chapter we will see how to go back and forth between these two approaches. The central
tool is Gaussian elimination, with which we deal in depth in the first two sections. We then come to
the central notion of dimension and some useful applications. In the last section, we will begin to
investigate to what extent we can relate implicit and explicit descriptions in the nonlinear setting.
In this section we give an explicit algorithm for solving a system of m linear equations in n
variables:
a11 x1 + a12 x2 + . . . + a1n xn = b1
a21 x1 + a22 x2 + . . . + a2n xn = b2
.. ..
. .
am1 x1 + am2 x2 + . . . + amn xn = bm .
Geometrically, a solution of the system Ax = b is a vector x having the requisite dot products with
the row vectors Ai of the matrix A:
Ai · x = bi for all i = 1, 2, . . . , m.
That is, the system of equations describes the intersection of the m hyperplanes with normal vectors
Ai and at (signed) distance bi /kAi k from the origin.
121
122 Chapter 4. Implicit and Explicit Solutions of Linear Systems
To solve a system of linear equations, we want to give an explicit parametric description of the
general solution. Some systems are relatively simple to solve. For example, taking the system
x1 − x3 = 1
x2 + 2x3 = 2 ,
we see that these equations allow us to determine x1 and x2 in terms of x3 ; in particular, we can
write x1 = 1 + x3 and x2 = 2 − 2x3 , wherex3 is free to take on any real value. Thus, any solution
1+t
of this system is of the form x = 2 − 2t for some t ∈ R. (It is easily checked that every vector
t
of this form is in fact a solution, as (1 + t) − t = 1 and (2 − 2t) + 2t = 2 for every t ∈ R.)Thus,
1
we see that the
intersection
of the two given planes is the line in R3 passing through 2 with
1 0
direction vector −2 .
1
More complicated systems of equations require some algebraic manipulations before we can
easily read off the general solution in parametric form. There are three basic operations we can
perform on systems of equations that will not affect the solution set. They are the following
elementary operations:
(i) interchange any pair of equations;
(ii) multiply any equation by a nonzero real number;
(iii) replace any equation by its sum with a multiple of any other equation.
then we use operation (ii), multiplying the first equation by 1/2, to get
x1 + x2 − 2x4 = 3
3x1 − 2x2 + 9x4 = 4 ;
now we use operation (iii), adding −3 times the first equation to the second:
x1 + x2 − 2x4 = 3
− 5x2 + 15x4 = −5 .
Next we use operation (ii) again, multiplying the second equation by −1/5, to obtain
x1 + x2 − 2x4 = 3
x2 − 3x4 = 1 ;
§1. Gaussian Elimination and the Theory of Linear Systems 123
finally, we use operation (iii), adding −1 times the second equation to the first:
x1 + x4 = 2
x2 − 3x4 = 1 .
From this we see that x1 and x2 are determined by x4 , while x3 and x4 are free to take on any
values. Thus, we read off the general solution of the system of equations:
x1 = 2 − x4
x2 = 1 + 3x4
x3 = x3
x4 = x4
We now describe a systematic technique, using the three allowable elementary operations, for
solving systems of m equations in n variables. Before going any further, we should make the official
observation that performing elementary operations on a system of equations does not change its
solutions.
Since we have established that elementary operations do not affect the solution set of a system of
equations, we can freely perform elementary row operations on the augmented matrix of a system
of equations with the goal of finding an “equivalent” augmented matrix from which we can easily
read off the general solution.
Definition. We call the first nonzero entry of a row (reading left to right) its leading entry. A
matrix is in echelon1 form if
(1) the leading entries move to the right in successive rows;
(2) the entries of the column below each leading entry are all 0;2
(3) all rows of zeroes are at the bottom of the matrix.
A matrix is in reduced echelon form if it is in echelon form and, in addition,
(4) every leading entry is 1;
(5) all the entries of the column above each leading entry are 0 as well.
We call the leading entry of a certain row of a matrix a pivot if there is no leading entry above
it in the same column. When a matrix is in echelon form, we refer to the columns in which a pivot
appears as pivot columns and to the corresponding variables (in the original system of equations)
as pivot variables. The remaining variables are called free variables.
1
The word echelon derives from theFrench “échelle,” ladder. Although we don’t usually draw the rungs of the
1 2 3 4
ladder, they are there: 0 0 1 2 .
0 0 0 3
2
Condition (2) is actually a consequence of (1), but we state it anyway for clarity.
§1. Gaussian Elimination and the Theory of Linear Systems 125
Notice that the pivot variables, x1 , x3 , and x4 , are completely determined by the free variables x2
and x5 . As usual, we can write the general solution in terms of the free variables only:
x1 1−2x2 −4x5 1 −2 −4
x2 x2 0 1 0
x = x3 = 2
+2x5 = 2 + x2 0 + x5
2 . ▽
x4 1 − x5 1 0 −1
x5 x5 0 0 1
In this last example, we see that the general solution is the sum of a particular solution—
obtained by setting all the free variables equal to 0—and a linear combination of vectors, one
for each free variable—obtained by setting that free variable equal to 1 and the remaining free
variables equal to 0 and ignoring the particular solution. In other words, if xk is a free variable,
the corresponding vector in the general solution has kth coordinate equal to 1 and j th coordinate
equal to 0 for all the other free variables xj . Concentrate on the circled entries in the vectors from
Example 3:
−2 −4
n n
1 0
x2
0 + x5 2 .
0 −1
0n 1n
We refer to this as the standard form of the general solution. The general solution of any system
in reduced echelon form can be presented in this manner.
Our strategy now is to transform the augmented matrix of any system of linear equations into
echelon form by performing a sequence of elementary row operations. The algorithm goes by the
name of Gaussian elimination.
126 Chapter 4. Implicit and Explicit Solutions of Linear Systems
The first step is to identify the first column (starting at the left) that does not consist only
of 0’s; usually this is the first column, but it may not be. Pick a row whose entry in this column
is nonzero—usually the uppermost such row, but you may choose another if it helps with the
arithmetic—and interchange this with the first row; now the first entry of the first nonzero column
is nonzero. This will be our first pivot. Next, we add the appropriate multiple of the top row to all
the remaining rows to make all the entries below the pivot equal to 0. To consider two examples,
if we begin with the matrices
3 −1 2 7 0 2 4 3
A = 2 1 3 3 and B = 0 1 2 −1 ,
2 2 4 2 0 2 3 3
then we begin by switching the first and third rows of A and the first and second rows of B (to
avoid fractions). After clearing out the first pivot column we have
2n 2 4 2 0 1n 2 −1
′ ′
A = 0 −1 −1 1 and B = 0 0 0 5.
0 −4 −4 4 0 0 −1 5
We have circled the pivots for emphasis. (If we are headed for the reduced echelon form, we might
replace the first row of A′ by 1 1 2 1 .)
The next step is to find the first column (again, starting at the left) in the new matrix having
a nonzero entry below the first row . Pick a row below the first that has a nonzero entry in this
column, and, if necessary, interchange it with the second row. Now the second entry of this column
is nonzero; this is our second pivot. (Once again, if we’re calculating the reduced echelon form, we
multiply by the reciprocal of this entry to make the pivot 1.) We then add appropriate multiples
of the second row to the rows beneath it to make all the entries beneath the pivot equal to 0.
Continuing with our examples, we obtain
2n 2 4 2 0 1n 2 −1
n 5
A′′ = 0 −1 n −1 1 and B ′′ = 0 0 −1 .
0 0 0 0 0 0 0 5n
At this point, both A′′ and B ′′ are in echelon form; note that the zero row of A′′ is at the bottom,
and that the pivots move toward the right and down.
The process continues until we can find no more pivots—either because we have a pivot in each
row or because we’re left with nothing but rows of zeroes. At this stage, if we are interested in
finding the reduced echelon form, we clear out the entries in the pivot columns above the pivots
and then make all the pivots equal to 1. (Two words of advice here: if we start at the right and
work our way up and to the left, we in general minimize the amount of arithmetic that must be
done. Also, we always do our best to avoid fractions.) Continuing with our examples, we find the
reduced echelon forms of A and B respectively:
2n 2 4 2 1n 1 2 1 1n 0 1 2
A′′ = 0 −1 n −1 1 0 1n 1 −1 0 1n 1 −1 = RA
0 0 0 0 0 0 0 0 0 0 0 0
§1. Gaussian Elimination and the Theory of Linear Systems 127
0 1n 2 −1 0 1n 2 −1 0 1n 2 0
′′ n 5
B = 0 0 −1 0 0 1n −5 0 0 1n 0
0 0 0 5n 0 0 0 1n 0 0 0 1n
0 1n 0 0
0 0 1n 0 = RB .
0 0 0 1n
We must be careful from now on to distinguish between the symbols “=” and “ ”; when we convert
one matrix to another by performing one or more row operations, we do not have equal matrices.
Here is one last example:
Example 4. Give the general solution of the following system of linear equations:
x1 + x2 + 3x3 − x4 = 0
−x1 + x2 + x3 + x4 + 2x5 = −4
x2 + 2x3 + 2x4 − x5 = 0
2x1 − x2 + x4 − 6x5 = 9 .
We begin with the augmented matrix of coefficients and put it in reduced echelon form:
1 1 3 −1 0 0 1 1 3 −1 0 0
−1 1 1 1 2 −4 0 2 4 0 2 −4
0 1 2 2 −1 0 0 1 2 2 −1 0
2 −1 0 1 −6 9 0 −3 −6 3 −6 9
1 1 3 −1 0 0 1 1 3 −1 0 0
0 1 2 0 1 −2 −2
0 1 2 0 1
0 0 0 2 −2 2 0 0 0 1 −1 1
0 0 0 3 −3 3 0 0 0 0 0 0
1 0 1 0 −2 3
0 1 2 0 1 −2
0 0 0 1 −1 1
0 0 0 0 0 0
Thus, the system of equations is given in reduced echelon form by
x1 + x3 − 2x5 = 3
x2 + 2x3 + x5 = −2
x4 − x5 = 1 ,
x1 = 3 − x3 + 2x5
x2 = −2 − 2x3 − x5
x3 = x3
x4 = 1 + x5
x5 = x5 ,
128 Chapter 4. Implicit and Explicit Solutions of Linear Systems
When we reduce a matrix to echelon form, we must make a number of choices along the way,
and the echelon form may well depend on the choices. But we shall now prove (using an inductive
argument) that any two echelon forms of the same matrix must have pivots in the same columns,
and from this it will follow that the reduced echelon form must be unique.
Theorem 1.2. Suppose A and B are echelon forms of the same nonzero matrix M . Then all
the their pivots appear in the same positions. As a consequence, if they are in reduced echelon
form, then they are equal.
Proof. We begin by noting that we can transform M to both A and B by sequences of ele-
mentary row operations. It follows that we can proceed from A to B by a sequence of elementary
row operations: The inverse of an elementary row operation is itself an elementary row operation,
so we can first transform A to M and then transform M to B.
Suppose the ith column of A is its first pivot column; this column vector is the standard basis
vector e1 ∈ Rm and all previous columns are zero. If we perform any elementary row operation on
A, the first i − 1 columns remain zero and the ith column remains nonzero. Thus, the ith column
is the first nonzero column of B, i.e., it is B’s first pivot column.
Next we prove that all the pivots must be in the same locations. We do this by induction on
m, the number of rows. We’ve already established that this must be the case for m = 1. Now
assume that the statement is true for m = k and consider (k + 1) × n matrices A and B satisfying
the hypotheses. By what we’ve already said, A and B have the same first pivot column; by using
an elementary row operation of type (ii) appropriately, we may assume those respective first pivot
entries in the first row are equal. Now, the k × n matrices A′ and B ′ obtained from A and B
by deleting their first rows are also in echelon form. Furthermore, any sequence of elementary
row operations that transforms A to B cannot involve the first row in a nontrivial way (if we add
a multiple of the first row to any other row, we must later subtract it again). Thus, A′ can be
transformed to B ′ by a sequence of elementary row operations. By the induction hypothesis we
can now conclude that A′ and B ′ have pivots in the same locations and, thus, so do A and B.
Last, we prove that if A and B are in reduced echelon form, then they are equal. Again we
proceed by induction on m. The case m = 1 is trivial. Assume that the statement is true for m = k
and consider the case m = k + 1. If the matrix A has a row of zeroes, then so must the matrix
B; we delete these rows and apply the induction hypothesis to conclude that A = B. Now, if the
last row of A is nonzero, it must contain the last pivot of A (say, in the j th column). Then we
know that the last pivot of B must be in the j th column as well. Since the matrices are in reduced
echelon form, their j th columns must be the last standard basis vector em ∈ Rm . Because of this,
the sequence of elementary row operations that transforms A to B cannot involve the last row in
§1. Gaussian Elimination and the Theory of Linear Systems 129
a nontrivial way. Thus, if we let A′ and B ′ be the matrices obtained from A and B by deleting
the last row, we see that A′ can be transformed to B ′ by a sequence of elementary row operations
and that A′ and B ′ are both in reduced echelon form. The induction hypothesis applies to A′ and
B ′ , so we conclude that A′ = B ′ . Finally, we need to argue that the bottom rows of A and B are
identical. But any elementary row operation that would alter the last row would also have to make
some change in the first j entries. Since the last rows of A and B are known to agree in the first j
entries we conclude that they must agree everywhere.
1.1. Consistency. We recall from Chapter 1 that the product Ax can be expressed as
a11 x1 + · · · + a1n xn a11 a12 a1n
a21 x1 + · · · + a2n xn a21 a22 a2n
(∗) Ax =
.. = x1 . + x2 . + · · · + xn .
. . .
. . . .
am1 x1 + · · · + amn xn am1 am2 amn
= x1 a 1 + x2 a 2 + · · · + xn a n ,
c1
..
where a1 , . . . , an ∈ Rm are the column vectors of the matrix A. Thus, a solution c = . of the
cn
linear system Ax = b provides scalars c1 , . . . , cn so that
b = c1 a1 + · · · + cn an ;
Suppose we want to express the vector b as a linear combination of the vectors v1 , v2 , and v3 .
Writing out the expression
1 1 2 4
0 1 1 3
x1 v1 + x2 v2 + x3 v3 = x1 + x2 + x3 = ,
1 1 1 1
2 1 2 2
x1 + x2 + 2x3 = 4
x2 + x3 = 3
x1 + x2 + x3 = 1
2x1 + x2 + 2x3 = 2 .
130 Chapter 4. Implicit and Explicit Solutions of Linear Systems
which obviously has no solution. Thus, the original system of equations has no solution: the vector
b in this example cannot be written as a linear combination of v1 , v2 , and v3 . ▽
A system of equations is consistent precisely when a solution exists. We see that the system of
equations in Example 6 is inconsistent and the system of equations in Example 5 is consistent. It
is easy to recognize an inconsistent system of equations from the echelon form of its augmented
matrix: the system is inconsistent only when there is an equation which reads
for some nonzero scalar c, i.e., when there is a row in the echelon form of the augmented matrix
where all but the rightmost entry are 0.
Turning this around a bit, let [U | c] denote the echelon form of the augmented matrix [A | b].
The system Ax = b is consistent if and only if any zero row in U corresponds to a zero entry in
the vector c.
There are two geometric interpretations of consistency. From the standpoint of row vectors,
the system Ax = b is consistent precisely when the intersection of the hyperplanes
A1 · x = b1 , ..., Am · x = bm
is nonempty. From the point of view of column vectors, the system Ax = b is consistent precisely
when the vector b can be written as a linear combination of the column vectors a1 , . . . , an of A.
In the next example, we characterize those vectors b ∈ R4 that can be expressed as a linear
combination of the three vectors v1 , v2 , and v3 from Examples 5 and 6.
have a solution? We form the augmented matrix [A | b] and determine its echelon form:
1 1 2 b1 1 1 2 b1 1 1 2 b1
0 1 1 b2 0 1 1 b2 0 1 1 b2
.
1 1 1 b3 0 0 −1 b3 − b1 0 0 1 b1 − b3
2 1 2 b4 0 −1 −2 b4 − 2b1 0 0 0 −b1 + b2 − b3 + b4
We infer from the last row of the latter matrix that the original system of equations will have a
solution if and only if
(†) −b1 + b2 − b3 + b4 = 0.
That is, the vector b can be written as a linear combination of v1 , v2 , and v3 precisely when b
satisfies the constraint equation (†). ▽
Example 8. Given
1 −1 1
3 2 −1
A= ,
1 4 −3
3 −3 3
we wish to find all vectors b ∈ R4 so that Ax = b is consistent, i.e., all vectors b that can be
expressed as a linear combination of the columns of A.
We consider the augmented matrix [A | b] and determine its echelon form [U | c]. In order for
the system to be consistent, every entry of c corresponding to a row of zeroes in U must be 0 as
well:
1 −1 1 b1 1 −1 1 b1
3 2 −1 b2 0 5 −4 b2 − 3b1
[A | b] =
1 4 −3 b3 0 5 −4 b3 − b1
3 3 −3 b4 0 0 0 b4 − 3b1
1 −1 1 b1
0 5 −4 b2 − 3b1
.
0 0 0 b3 − b2 + 2b1
0 0 0 b4 − 3b1
Thus, we conclude that Ax = b is consistent if and only if b satisfies the constraint equations
4
These equations describe
of two hyperplanes through the origin in R with respec-
the intersection
2 −3
−1 0
tive normal vectors
1 and 0 . ▽
0 1
Notice that here we have reversed the process at the beginning of this section. There we
expressed the general solution of a system of linear equations as a linear combination of certain
vectors. Here, starting with the column vectors of the matrix A, we have found the constraint
equations a vector b must satisfy in order to be a linear combination of them (that is, to be in
the plane they span). This is the process of determining Cartesian equations for a space defined
parametrically.
Proof. First we observe that any such vector u is a solution of Ax = b. By linearity, we have
Au = A(u1 + v) = Au1 + Av = b + 0 = b.
Conversely, every solution of Ax = b can be written in this form, for if u is an arbitrary solution
of Ax = b, then, by linearity again,
A(u − u1 ) = Au − Au1 = b − b = 0,
so v = u−u1 is a solution of the associated homogeneous system; now we just solve for u, obtaining
u = u1 + v, as required.
Remark. As Figure 1.1 suggests, when the inhomogeneous system Ax = b is consistent, its
solutions are obtained by translating the set of solutions of the associated homogeneous system by
u v
v
solutions of Ax=b
u1
solutions of Ax=0
Figure 1.1
a particular solution u1 .
Of course, a homogeneous system is always consistent, since the trivial solution, x = 0, is always
a solution of Ax = 0. Now, if the rank of A is r, then there will be r pivot variables and n − r free
variables in the general solution of Ax = 0. In particular, if r = n, then x = 0 is the only solution
of Ax = 0.
Definition . If the system of equations Ax = b has precisely one solution, then we say that
the system has a unique solution.
Thus, a homogeneous system Ax = 0 has a unique solution when r = n and infinitely many
solutions when r < n. Note that it is impossible to have r > n, since there cannot be more pivots
than columns. Similarly, there cannot be more pivots than rows in the matrix, so it follows that
whenever n > m (i.e., there are more variables than equations), the homogeneous system Ax = 0
must have infinitely many solutions.
From Proposition 1.4 we know that if the inhomogeneous system Ax = b is consistent, then its
solutions are obtained by translating the solutions of the associated homogeneous system Ax = 0
by a particular solution. So we have the
Proposition 1.5. Suppose the system Ax = b is consistent. Then it has a unique solution if
and only if the associated homogeneous system Ax = 0 has only the trivial solution. This happens
exactly when r = n.
§1. Gaussian Elimination and the Theory of Linear Systems 135
We conclude this discussion with an important special case. It is natural to ask when the
inhomogeneous system Ax = b has a unique solution for every b ∈ Rm . From Proposition 1.3 we
infer that for the system always to be consistent, we must have r = m; from Proposition 1.5 we
infer that for solutions to be unique, we must have r = n. And so we see that we can only have
both conditions when r = m = n.
(1) A is nonsingular.
(2) Ax = 0 has only the trivial solution.
(3) For every b ∈ Rn , the equation Ax = b has a unique solution.
EXERCISES 4.1
*2. Decide which of the following matrices are in echelon form, which are in reduced echelon form,
and which are neither. Justify your answers.
" #
0 1 1 1 0
a.
2 3 e. 0 0 0
" #
2 1 3 0 0 1
b.
0 1 −1 1 1 0 −1
" #
1 0 2 f. 0 2 1 0
c. 0 0 0 1
0 1 −1
" # 1 0 −2 0 1
1 1 0
d. g. 0 1 1 0 1
0 0 2
0 0 0 1 4
3. For each of the following matrices A, determine its reduced echelon form and give the general
solution of Ax = 0 in standard
form.
1 0 −1 1 2 −1
a. A = −2 3 −1 1 3 1
c. A =
3 −3 0 2 4 3
2 −2 4 −1 1 6
" #
*b. A = −1 1 −2 1 −2 1 0
d. A =
3 −3 6 2 −4 3 −1
136 Chapter 4. Implicit and Explicit Solutions of Linear Systems
1 1 1 1 1 −1 1 1 0
1 2 1 2 1 0 2 1 1
*e. A = g. A =
1 3 2 4 0 2 2 2 0
1 2 2 3 −1 1 −1 0 −1
1 2 0 −1 −1 1 1 0 5 0 −1
−1 −3 1 2 3 0 1 1 3 −2 0
*f. A = h. A =
1 −1 3 1 1 −1 2 3 4 1 −6
2 −3 7 3 4 0 4 4 12 −1 −7
4. Give the general
solution of
the equation
Ax
= b in standard form.
2 1 −1 3
*a. A = 1 2 1 , b = 0
−1 1 2 −3
" # " #
1 1 1 1 6
b. A = , b=
3 3 2 0 17
1 1 1 −1 0 −2
2 0 4 1 −1 10
c. A = , b =
1 2 0 −2 2 −3
0 1 −1 2 4 7
1 0
*5. Find all the unit vectors x ∈ R3 that make an angle of π/3 with the vectors 0 and 1 .
−1 1
4
6. Findthenormal
vector inR spanned
to the hyperplane by
1 1 1 1 2 1
1 2 3 1 2 3
*a.
1 , 1 , 2 b.
1 , 1 , 2 .
1 2 4 1 2 3
2 −1 −4
*7. A circle C passes through the points , , and . Find the center and radius of
6 7 −2
C. (Hint: The equation of a circle can be written in the form x2 + y 2 + ax + by + c = 0. Why?)
*8. By solving a system of equations, find the linear combination of the vectors
1 0 2
v1 = 0 , v2 = 1 , v3 = 1
−1 2 1
3
that gives b = 0 .
−2
*9. For each of the following vectors b ∈ R4 , decide whether b is a linear combination of
1 0 1
0 −1 −2
v1 =
1,
v2 =
0,
and v3 =
1.
−2 1 0
§1. Gaussian Elimination and the Theory of Linear Systems 137
1 1 1
1 −1 1
a. b=
1
b. b=
1
c. b=
0
1 −1 −2
3
*10. Decide
whether
each of the following collections of vectors R .
spans
1 1 1 1 3 2
a. 1 , 2 c. 0 , −1 , 5 , 3
1 2 1 1 3 2
1 1 1 1 2 0
b. 1 , 2 , 3 d. 0 , 1 , 1
1 2 3 −1 1 5
11. Find the constraint
equations
that b must satisfy in order for Ax = b to be consistent.
3 −1
a. A = 6 −2
−9 3
1 1 1
*b. A = −1 1 2
1 3 4
1 2 1
0 1 1
c. A =
−1 3 4
−2 −1 1
12. Find the constraint
equations
that
b must satisfy in order to be
an
element
of
1 0 1 1 0 2
0 1 1 0 1 −1
a. V = Span
1 , 1 , 1 b. V = Span
1 , 1 , 1
1 2 0 1 2 0
1 4
2 1 1 1
both the vectors 1
and are solutions of the equation Ax = b;
0
2 3
1
0
2
c. the
orthogonal to 1 and for some nonzero vector b ∈ R both the vectors
rows ofA are
1 1
0 1 0
and are solutions of the equation Ax = b;
1 1
0 1
138 Chapter 4. Implicit and Solutions
Explicit of Linear Systems
1 2
d. for some nonzero vectors b1 , b2 ∈ R2 both the vectors 0 and 1 are solutions of
1 1
1 1
the equation Ax = b1 and both the vectors 0 and 1 are solutions of the equation
Ax = b2 . 0 1
" #
1 α
*14. Let A= .
α 3α
a. For which numbers α will A be singular?
b. For all numbers α not on your list in part a, we can solve Ax = b for every vector b ∈ R2 .
For each of the numbers α on your list, give the vectors b for which we can solve Ax = b.
1 1 α
15. Let A = α 2 α .
α α 1
a. For which numbers α will A be singular?
b. For all numbers α not on your list in part a, we can solve Ax = b for every vector b ∈ R3 .
For each of the numbers α on your list, give the vectors b for which we can solve Ax = b.
18. In each case, give positive integers m and n and an example of an m × n matrix A with the
stated property, or explain why none can exist.
*a. Ax = b is inconsistent for every b ∈ Rm .
*b. Ax = b has one solution for every b ∈ Rm .
c. Ax = b has either zero or one solution for every b ∈ Rm .
d. Ax = b has infinitely many solutions for every b ∈ Rm .
*e. Ax = b has infinitely many solutions whenever it is consistent.
f. There are vectors b1 , b2 , b3 so that Ax = b1 has no solution, Ax = b2 has exactly one
solution, and Ax = b3 has infinitely many solutions.
19. ♯ a. Suppose A ∈ Mm×n , B ∈ Mn×m , and BA = In . Prove that if for some b ∈ Rm the
equation Ax = b has a solution, then that solution is unique.
b. Suppose A ∈ Mm×n , C ∈ Mn×m , and AC = Im . Prove that the system Ax = b is
consistent for every b ∈ Rm .
♯ c. Suppose A ∈ M
m×n and B, C ∈ Mn×m are matrices that satisfy BA = In and AC = Im .
Prove that B = C.
§1. Gaussian Elimination and the Theory of Linear Systems 139
is nonsingular.
b. Show that the system of equations
x21 x1 1 a y1
2
x2 x2 1 b = y2
x23 x3 1 c y3
always has a unique solution. Deduce that if P1 , P2 , and P3 are not collinear, then they
lie on a unique parabola y = ax2 + bx + c.
23. Let $P_i = \begin{bmatrix} x_i \\ y_i \end{bmatrix} \in \mathbb{R}^2$, i = 1, 2, 3. Let
$$A = \begin{bmatrix} x_1 & y_1 & 1 \\ x_2 & y_2 & 1 \\ x_3 & y_3 & 1 \end{bmatrix}.$$
a. Prove that the three points P1 , P2 , and P3 are collinear if and only if the equation Ax = 0
has a nontrivial solution. (Hint: A general line in R2 is of the form ax + by + c = 0, where
a and b are not both 0.)
b. Prove that if the three given points are not collinear, then there is a unique circle passing
through them. (Hint: If you set up a system of linear equations as suggested by the hint
for Exercise 7, you should use part a to deduce that the appropriate coefficient matrix is
nonsingular.)
§2. Elementary Matrices and Calculating Inverse Matrices

Just as multiplying A on the right by the column vector x with entries x₁, . . . , xₙ gives us the linear combination x₁a₁ + x₂a₂ + · · · + xₙaₙ of the columns of A, the reader can easily check that multiplying A on the left by the row vector [x₁ x₂ · · · xₘ],
$$\begin{bmatrix} x_1 & x_2 & \cdots & x_m \end{bmatrix} \begin{bmatrix} A_1 \\ A_2 \\ \vdots \\ A_m \end{bmatrix},$$
gives the linear combination x₁A₁ + x₂A₂ + · · · + xₘAₘ of the rows of A.
For example, if
$$A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{bmatrix}, \quad E_1 = \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad E_2 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 4 \end{bmatrix}, \quad E_3 = \begin{bmatrix} 1 & 0 & 0 \\ -2 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix},$$
then
$$E_1 A = \begin{bmatrix} 3 & 4 \\ 1 & 2 \\ 5 & 6 \end{bmatrix}, \quad E_2 A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 20 & 24 \end{bmatrix}, \quad \text{and} \quad E_3 A = \begin{bmatrix} 1 & 2 \\ 1 & 0 \\ 5 & 6 \end{bmatrix}.$$
Such matrices, which effect the corresponding elementary row operations, are called elementary matrices. Note that each elementary matrix differs from the identity matrix only in a small way. (N.B. Here we establish the custom that blank spaces in a matrix represent 0's.)
(i) To interchange rows i and j, we should multiply by an elementary matrix of the form
$$\begin{bmatrix} 1 & & & & & & \\ & \ddots & & & & & \\ & & 0 & \cdots & 1 & & \\ & & \vdots & \ddots & \vdots & & \\ & & 1 & \cdots & 0 & & \\ & & & & & \ddots & \\ & & & & & & 1 \end{bmatrix},$$
where the off-diagonal 1's appear in rows i and j.
(ii) To multiply row i by the nonzero scalar c, we should multiply by an elementary matrix of the form
$$\begin{bmatrix} 1 & & & & \\ & \ddots & & & \\ & & c & & \\ & & & \ddots & \\ & & & & 1 \end{bmatrix},$$
where c appears in the ith diagonal entry.
(iii) To add c times row i to row j, we should multiply by an elementary matrix of the form
$$\begin{bmatrix} 1 & & & & & \\ & \ddots & & & & \\ & & 1 & & & \\ & & \vdots & \ddots & & \\ & & c & \cdots & 1 & \\ & & & & & \ddots \end{bmatrix},$$
where c appears in the jth row and ith column.
Here’s an easy way to remember the form of these matrices: each elementary matrix is obtained by performing the corresponding elementary row operation on the identity matrix.
" #
4 3 5
Example 1. Let A = . We put A in reduced echelon form by the following
1 2 5
sequence of row operations:
" # " # " # " # " #
4 3 5 1 2 5 1 2 5 1 2 5 1 0 −1
.
1 2 5 4 3 5 0 −5 −15 0 1 3 0 1 3
These steps correspond to multiplying, in sequence from right to left, by the elementary matrices
$$E_1 = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}, \quad E_2 = \begin{bmatrix} 1 & 0 \\ -4 & 1 \end{bmatrix}, \quad E_3 = \begin{bmatrix} 1 & 0 \\ 0 & -\frac{1}{5} \end{bmatrix}, \quad E_4 = \begin{bmatrix} 1 & -2 \\ 0 & 1 \end{bmatrix};$$
now the reader can check that
$$E = E_4 E_3 E_2 E_1 = \begin{bmatrix} 1 & -2 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & -\frac{1}{5} \end{bmatrix} \begin{bmatrix} 1 & 0 \\ -4 & 1 \end{bmatrix} \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} = \begin{bmatrix} \frac{2}{5} & -\frac{3}{5} \\ -\frac{1}{5} & \frac{4}{5} \end{bmatrix}$$
and, indeed,
$$EA = \begin{bmatrix} \frac{2}{5} & -\frac{3}{5} \\ -\frac{1}{5} & \frac{4}{5} \end{bmatrix} \begin{bmatrix} 4 & 3 & 5 \\ 1 & 2 & 5 \end{bmatrix} = \begin{bmatrix} 1 & 0 & -1 \\ 0 & 1 & 3 \end{bmatrix},$$
as it should. ▽
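Readers who wish to verify such computations by machine can do so in a few lines of Python; the sketch below (using the sympy library, an assumption of ours rather than anything in the text) reproduces the elementary matrices of Example 1 and checks that E = E₄E₃E₂E₁ and EA agree with the matrices displayed above.

```python
import sympy as sp

A = sp.Matrix([[4, 3, 5], [1, 2, 5]])
E1 = sp.Matrix([[0, 1], [1, 0]])                   # interchange rows 1 and 2
E2 = sp.Matrix([[1, 0], [-4, 1]])                  # add -4(row 1) to row 2
E3 = sp.Matrix([[1, 0], [0, sp.Rational(-1, 5)]])  # multiply row 2 by -1/5
E4 = sp.Matrix([[1, -2], [0, 1]])                  # add -2(row 2) to row 1

E = E4 * E3 * E2 * E1
print(E)       # Matrix([[2/5, -3/5], [-1/5, 4/5]])
print(E * A)   # Matrix([[1, 0, -1], [0, 1, 3]]) -- the reduced echelon form
```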
We now concentrate on square (n × n) matrices. Recall that the inverse of the n × n matrix
A is the matrix A−1 satisfying AA−1 = A−1 A = In . It is convenient to have an inverse matrix if
we wish to solve the system Ax = b for numerous vectors b. If A is invertible, we can solve as
follows:³
Ax = b
⇓ multiplying both sides of the equation by A−1 on the left
A−1 (Ax) = A−1 b
⇓ using the associative property
(A−1 A)x = A−1 b
⇓ using the definition of A−1
x = In x = A−1 b
We aren’t done! We’ve shown that if x is a solution, then it must satisfy x = A−1 b. That is,
we’ve shown that the vector A−1 b is a candidate for a solution. But now we check that it truly is
a solution by straightforward calculation:
Ax = A(A−1 b) = (AA−1 )b = In b = b,
as required; but note that we have used both pieces of the definition of the inverse matrix to prove
that the system has a unique solution (which we “discovered” along the way).
It is a consequence of this computation that if A is an invertible n×n matrix, then Ax = c has a
unique solution for every c ∈ Rn , and so it follows from Proposition 1.6 that A must be nonsingular.
What about the converse? If A is nonsingular, must A be invertible? Well, if A is nonsingular, we
know that every equation Ax = c has a unique solution. In particular, for j = 1, . . . , n, there is
³We will write the “implies” symbol “⟹” vertically so that we can indicate the reasoning in each step.
a unique vector bj that solves Abj = ej , the j th standard basis vector. If we let B be the n × n
matrix whose column vectors are b1 , . . . , bn , then we have
$$AB = A \begin{bmatrix} | & | & & | \\ b_1 & b_2 & \cdots & b_n \\ | & | & & | \end{bmatrix} = \begin{bmatrix} | & | & & | \\ e_1 & e_2 & \cdots & e_n \\ | & | & & | \end{bmatrix} = I_n.$$
This suggests that the matrix we’ve constructed should be the inverse matrix of A. But we need
to know that BA = In as well. Here is a very elegant way to understand why this is so. We can
find the matrix B by forming the giant augmented matrix
$$\bigl[\, A \mid e_1 \;\cdots\; e_n \,\bigr] = \bigl[\, A \mid I_n \,\bigr]$$
and using Gaussian elimination to obtain the reduced echelon form
$$\bigl[\, I_n \mid B \,\bigr].$$
(Note that the reduced echelon form of A must be In because A is nonsingular.) But this tells us
that if E is the product of the elementary matrices required to put A in reduced echelon form, then
we have
$$E \bigl[\, A \mid I_n \,\bigr] = \bigl[\, I_n \mid B \,\bigr],$$
and so B = E and BA = In, which is what we needed to check. In conclusion, we have proved the following

Theorem 2.1. An n × n matrix A is invertible if and only if it is nonsingular.
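As a quick machine illustration of the [A | Iₙ] ⇝ [Iₙ | B] procedure just described, here is a minimal Python/sympy sketch (the library and the particular matrix, chosen to match Example 1, are our own additions, not the author's):

```python
import sympy as sp

A = sp.Matrix([[4, 3], [1, 2]])          # any nonsingular square matrix will do
augmented = A.row_join(sp.eye(2))        # form the giant augmented matrix [A | I]
rref, _ = augmented.rref()               # Gaussian elimination to reduced echelon form
B = rref[:, 2:]                          # the right half is the inverse
print(B)                                 # Matrix([[2/5, -3/5], [-1/5, 4/5]])
print(B * A, A * B)                      # both products give the identity
```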
Note that Gaussian elimination will also let us know when A is not invertible: if we come to a
row of zeroes while reducing A to echelon form, then, of course, A is singular and so it cannot be
invertible. The following observation is often very useful.
Corollary 2.2. If A and B are n×n matrices satisfying BA = In , then B = A−1 and A = B −1 .
Proof. By Exercise 4.1.19a, the equation Ax = 0 has only the trivial solution. Hence, by
Proposition 1.6, A is nonsingular; according to Theorem 2.1, A is therefore invertible. Since A has
an inverse matrix, A−1 , we deduce that
BA = In
⇓ multiplying both sides of the equation by A−1 on the right
(BA)A−1 = In A−1
⇓ using the associative property
B(AA−1 ) = A−1
⇓ using the definition of A−1
B = A⁻¹. It follows that AB = AA⁻¹ = In as well, so A = B⁻¹.
Example 5. It is convenient to derive the formula for the inverse of a general 2 × 2 matrix
first given in Example 9 of Chapter 1, Section 4. Let
" #
a b
A= .
c d
We assume a ≠ 0 to start with.
" # " # " #
a b 1 0 1 ab a1 0 1 b
a
1
a 0
(assuming ad − bc 6= 0)
c d 0 1 c d 0 1 0 d − bca − c
a 1
" # " # " #
1 ab 1
a 0 1 0 1
a − b
a (− c
ad−bc ) − b a
a ad−bc 1 0 d
ad−bc − b
ad−bc ,
c a c a = c a
0 1 − ad−bc ad−bc 0 1 − ad−bc ad−bc 0 1 − ad−bc ad−bc
Of course, we have derived this assuming a ≠ 0, but the reader can check easily that the formula works fine even when a = 0. We do see, however, from the row reduction that
$$\begin{bmatrix} a & b \\ c & d \end{bmatrix} \text{ is nonsingular} \iff ad - bc \ne 0. \quad \triangledown$$
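The formula can also be confirmed symbolically; the following sketch (again with sympy, our own choice of tool) checks that $\frac{1}{ad-bc}\begin{bmatrix} d & -b \\ -c & a \end{bmatrix}$ really is a two-sided inverse whenever ad − bc ≠ 0.

```python
import sympy as sp

a, b, c, d = sp.symbols('a b c d')
A = sp.Matrix([[a, b], [c, d]])
formula = sp.Matrix([[d, -b], [-c, a]]) / (a*d - b*c)

print(sp.simplify(A * formula))   # identity matrix, provided a*d - b*c != 0
print(sp.simplify(formula * A))   # identity matrix again
print(A.det())                    # a*d - b*c, the nonsingularity criterion
```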
We have shown in the course of proving Theorem 2.1 that when A is square, any B that satisfies
AB = I (a so-called right inverse of A) must also satisfy BA = I (and thus is a left inverse of A).
Likewise, we have established in Corollary 2.2 that when A is square, any left inverse of A is a bona
fide inverse of A. Indeed, it will never happen that a non-square matrix has both a left and a right
inverse (see Exercise 9).
Remark. Even when A is square, the left and right inverses have rather different interpreta-
tions. As we saw in the proof of Theorem 2.1, the columns of the right inverse arise as the solutions
of Ax = ej . On the other hand, the left inverse of A is the product of the elementary matrices by
which we reduce A to its reduced echelon form, I. (See Exercise 8.)
EXERCISES 4.2
*1. For each of the matrices A in Exercise 4.1.3, find a product of elementary matrices E = · · · E2 E1
so that EA is in echelon form. Use the matrix E you’ve found to give constraint equations for
Ax = b to be consistent.
2. Use Gaussian elimination to find A⁻¹ (if it exists):
a. $A = \begin{bmatrix} 1 & 2 \\ -1 & 3 \end{bmatrix}$
b. $A = \begin{bmatrix} 1 & 2 & 3 \\ 1 & 1 & 2 \\ 0 & 1 & 2 \end{bmatrix}$
*c. $A = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 2 & 1 \\ -1 & 3 & 1 \end{bmatrix}$
d. $A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}$
*e. $A = \begin{bmatrix} 2 & 3 & 4 \\ 2 & 1 & 1 \\ -1 & 1 & 2 \end{bmatrix}$
3. In each case, given A and b,
(i) Find A−1 .
(ii) Use your answer to (i) to solve Ax = b.
(iii) Use your answer to (ii) to express b as a linear combination of the columns of A.
a. $A = \begin{bmatrix} 2 & 3 \\ 3 & 5 \end{bmatrix}$, $b = \begin{bmatrix} 3 \\ 4 \end{bmatrix}$
*b. $A = \begin{bmatrix} 1 & 1 & 1 \\ 0 & 2 & 3 \\ 3 & 2 & 2 \end{bmatrix}$, $b = \begin{bmatrix} 1 \\ 1 \\ 2 \end{bmatrix}$
c. $A = \begin{bmatrix} 1 & 1 & 1 \\ 0 & 1 & 1 \\ 1 & 2 & 1 \end{bmatrix}$, $b = \begin{bmatrix} 3 \\ 0 \\ 1 \end{bmatrix}$
*d. $A = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 0 & 1 & 1 & 1 \\ 0 & 0 & 1 & 3 \\ 0 & 0 & 1 & 4 \end{bmatrix}$, $b = \begin{bmatrix} 2 \\ 0 \\ 1 \\ 1 \end{bmatrix}$
" #
1 −1 1
4. a. Find two different right inverses of the matrix A = .
2 −1 0
b. Give a nonzero matrix that has no right inverse.
1 2
c. Find two left inverses of the matrix A = 0 −1 .
1 1
d. Give a nonzero matrix that has no left inverse.
5. Prove that the inverse of every elementary matrix is again an elementary matrix. Indeed, give
a simple prescription for determining the inverse of each type of elementary matrix.
6. Using Theorem 2.1 and Proposition 4.3 of Chapter 1, prove that if AB and B are nonsingular,
then A is nonsingular. (Cf. Exercise 4.1.17.)
♯ 7. Suppose A is an invertible m × m matrix and B is an invertible n × n matrix.
a. Prove that the matrix
$$\begin{bmatrix} A & O \\ O & B \end{bmatrix}$$
is invertible and give a formula for its inverse.
b. Suppose C is an arbitrary m × n matrix. Is the matrix
$$\begin{bmatrix} A & C \\ O & B \end{bmatrix}$$
invertible?
(See Exercise 1.4.12 for the notion of block multiplication.)
8. Complete the following alternative argument that the matrix obtained by Gaussian elimination
must be the inverse matrix of A. Suppose A is nonsingular.
a. Show there are finitely many elementary matrices E1 , E2 , . . . , Ek so that
Ek Ek−1 · · · E2 E1 A = I.
b. Let B = Ek Ek−1 · · · E2 E1 . Prove that AB = I. (Hint: Use Proposition 4.3 of Chapter 1.)
§3. Linear Independence, Basis, and Dimension
Example 1. Let
$$v_1 = \begin{bmatrix} 1 \\ 1 \\ 2 \end{bmatrix}, \quad v_2 = \begin{bmatrix} 1 \\ -1 \\ 0 \end{bmatrix}, \quad v_3 = \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}, \quad \text{and} \quad v = \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}.$$
We ask first of all whether v ∈ Span(v₁, v₂, v₃). This is a familiar question when we recast it in matrix notation: Let
$$A = \begin{bmatrix} 1 & 1 & 1 \\ 1 & -1 & 0 \\ 2 & 0 & 1 \end{bmatrix} \quad \text{and} \quad b = \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}.$$
Is the system Ax = b consistent? Immediately we write down the appropriate augmented matrix and reduce to echelon form:
$$\begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & -1 & 0 & 1 \\ 2 & 0 & 1 & 0 \end{bmatrix} \rightsquigarrow \begin{bmatrix} 1 & 1 & 1 & 1 \\ 0 & 2 & 1 & 0 \\ 0 & 0 & 0 & -2 \end{bmatrix},$$
so the system is obviously inconsistent. The answer is: No, v is not in Span(v1 , v2 , v3 ).
What about
$$w = \begin{bmatrix} 2 \\ 3 \\ 5 \end{bmatrix}?$$
As the reader can easily check, w = 3v1 − v3 , so w ∈ Span(v1 , v2 , v3 ). What’s more, w =
2v1 − v2 + v3 , as well. So, obviously, there is no unique expression for w as a linear combination
of v₁, v₂, and v₃. But we can conclude more: setting the two expressions for w equal, we obtain
$$v_1 + v_2 - 2v_3 = \mathbf{0}.$$
That is, there is a nontrivial relation among the vectors v₁, v₂, and v₃, and this is the reason
we have different ways of expressing w as a linear combination of the three of them. Indeed,
since v1 = −v2 + 2v3 , we can see easily that any linear combination of v1 , v2 , and v3 is a linear
combination just of v2 and v3 :
c1 v1 + c2 v2 + c3 v3 = c1 (−v2 + 2v3 ) + c2 v2 + c3 v3 = (c2 − c1 )v2 + (c3 + 2c1 )v3 .
The vector v1 was redundant, because
Span(v1 , v2 , v3 ) = Span(v2 , v3 ).
We might surmise that the vector w can now be written uniquely as a linear combination of v2
and v3 , and this is easy to check:
$$\bigl[\, A' \mid w \,\bigr] = \begin{bmatrix} 1 & 1 & 2 \\ -1 & 0 & 3 \\ 0 & 1 & 5 \end{bmatrix} \rightsquigarrow \begin{bmatrix} 1 & 1 & 2 \\ 0 & 1 & 5 \\ 0 & 0 & 0 \end{bmatrix},$$
and from the fact that the matrix A′ has rank 2 we infer that the system of equations has a unique
solution. ▽
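A short sympy sketch confirming the two computations of Example 1, the inconsistency for v and the unique expression for w in terms of v₂ and v₃, may be a helpful check (the library and code are our own illustration, not part of the text):

```python
import sympy as sp

v1, v2, v3 = sp.Matrix([1, 1, 2]), sp.Matrix([1, -1, 0]), sp.Matrix([1, 0, 1])
A = v1.row_join(v2).row_join(v3)

# Is v in Span(v1, v2, v3)?  Row-reduce the augmented matrix [A | v]:
v = sp.Matrix([1, 1, 0])
print(A.row_join(v).rref()[0])   # last row (0 0 0 | 1): the system is inconsistent

# w, by contrast, has a unique expression in terms of v2 and v3 alone:
w = sp.Matrix([2, 3, 5])
print(v2.row_join(v3).row_join(w).rref()[0])   # reads off w = -3*v2 + 5*v3
```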
Remark. In the language of functions, if A is the standard matrix of a linear map T : Rn → Rm ,
we are interested in the image of T (i.e., the set of w ∈ Rm so that w = T (v) for some v ∈ Rn ),
and the issue of whether T is one-to-one (i.e., given w in the image, is there exactly one v ∈ Rn so
that T (v) = w?).
Generalizing the preceding example, we now recast Proposition 1.5:
Proposition 3.1. Let v1 , . . . , vk ∈ Rn and let V = Span(v1 , . . . , vk ). An arbitrary vector
v ∈ Span(v1 , . . . , vk ) has a unique expression as a linear combination of v1 , . . . , vk if and only if
the zero vector has a unique expression as a linear combination of v1 , . . . , vk , i.e.,
$$c_1 v_1 + c_2 v_2 + \cdots + c_k v_k = \mathbf{0} \implies c_1 = c_2 = \cdots = c_k = 0.$$
Proof. Suppose for some v ∈ V there are two different expressions
$$v = c_1 v_1 + c_2 v_2 + \cdots + c_k v_k \quad \text{and} \quad v = d_1 v_1 + d_2 v_2 + \cdots + d_k v_k.$$
Subtracting, we obtain
$$\mathbf{0} = (c_1 - d_1)v_1 + (c_2 - d_2)v_2 + \cdots + (c_k - d_k)v_k,$$
and since the two expressions are different, cᵢ − dᵢ ≠ 0 for at least one i, so the zero vector has a nontrivial expression as a linear combination of v₁, . . . , vₖ. Conversely, if
$$\mathbf{0} = s_1 v_1 + s_2 v_2 + \cdots + s_k v_k$$
with some sᵢ ≠ 0, then v = (c₁ + s₁)v₁ + · · · + (cₖ + sₖ)vₖ gives a second expression for v.

Definition. The set of vectors {v₁, . . . , vₖ} is called linearly independent if
$$c_1 v_1 + c_2 v_2 + \cdots + c_k v_k = \mathbf{0} \implies c_1 = c_2 = \cdots = c_k = 0,$$
i.e., if the only way of expressing the zero vector as a linear combination of v₁, . . . , vₖ is the trivial linear combination 0v₁ + · · · + 0vₖ.
The set of vectors {v₁, . . . , vₖ} is called linearly dependent if it is not linearly independent, i.e., if there is some expression
$$c_1 v_1 + c_2 v_2 + \cdots + c_k v_k = \mathbf{0},$$
where the scalars c₁, . . . , cₖ are not all 0.
Remark. The language is problematic here. Many mathematicians—often including the author
of this text—tend to say things like “the vectors v1 , . . . , vk are linearly independent.” But linear
independence (or dependence) is a property of the whole collection of vectors, not of the individual
vectors. What’s worse, we really should refer to an ordered list of vectors rather than to a set of
vectors: for example, any list in which some vector, v, appears twice is obviously giving a linearly
dependent collection; but the set {v, v} is indistinguishable from the set {v}. There seems to be
no ideal route out of this morass! Having said all this, we warn the gentle reader that we may
occasionally say “the vectors v1 , . . . , vk are linearly (in)dependent” where it would be too clumsy
to be more pedantic. Just stay alert!!
Remark. Here is a piece of advice: It is virtually always the case that when you are presented
with a set of vectors {v1 , . . . , vk } that you are to prove linearly independent, you should write:
“Suppose c1 v1 + c2 v2 + · · · + ck vk = 0. I must show that c1 = · · · = ck = 0.”
You then use whatever hypotheses you’re given to arrive at that conclusion.
Example 3. Suppose u, v, w ∈ Rⁿ and {u, v, w} is linearly independent; let's check that {u + v, v + w, u + w} is linearly independent as well. Following the advice above, suppose
$$c_1(u + v) + c_2(v + w) + c_3(u + w) = \mathbf{0}.$$
We must show that c₁ = c₂ = c₃ = 0. We use the distributive property to rewrite our equation as
$$(c_1 + c_3)u + (c_1 + c_2)v + (c_2 + c_3)w = \mathbf{0}.$$
Since {u, v, w} is linearly independent, the coefficients must all vanish:
$$c_1 + c_3 = 0, \qquad c_1 + c_2 = 0, \qquad c_2 + c_3 = 0,$$
and we leave it to the reader to check that the only solution of this system of equations is, in fact,
c1 = c2 = c3 = 0, as desired. ▽
Example 4. Any time one has a list of vectors v₁, . . . , vₖ in which one of the vectors is the zero vector, say v₁ = 0, then the set of vectors must be linearly dependent, for the equation
$$1 v_1 = \mathbf{0}$$
exhibits a nontrivial linear combination of the vectors that equals the zero vector. ▽
Example 5. How can two nonzero vectors u and v give rise to a linearly dependent set? By definition, this means that there is a linear combination
$$au + bv = \mathbf{0},$$
where a and b are not both 0. Say a ≠ 0; then u = −(b/a)v, so u and v are parallel.
How can a collection of three nonzero vectors be linearly dependent? As before, there must be
a linear combination
au + bv + cw = 0,
where (at least) one of a, b, and c is nonzero. Say a ≠ 0. This means that we can solve:
$$u = -\frac{1}{a}(bv + cw) = \left(-\frac{b}{a}\right)v + \left(-\frac{c}{a}\right)w,$$
so u ∈ Span(v, w). In particular, Span(u, v, w) is either a line (if all three vectors u, v, w are
parallel) or a plane. ▽
Figure 3.1

Proposition 3.2. Suppose {v₁, . . . , vₖ} is linearly independent. Then {v₁, . . . , vₖ, x} is linearly independent if and only if x ∉ Span(v₁, . . . , vₖ).
Proof. Although Figure 3.1 suggests the result is quite plausible, we will prove the contrapos-
itive:
{v1 , . . . , vk , x} is linearly dependent if and only if x ∈ Span(v1 , . . . , vk ).
Suppose x ∈ Span(v1 , . . . , vk ). Then x = c1 v1 + c2 v2 + · · · + ck vk for some scalars c1 , . . . , ck , so
c1 v1 + c2 v2 + · · · + ck vk + (−1)x = 0,
from which we conclude that {v1 , . . . , vk , x} is linearly dependent (since at least one of the coeffi-
cients is nonzero).
Now suppose {v1 , . . . , vk , x} is linearly dependent. This means that there are scalars c1 , . . . , ck ,
and c, not all 0, so that
c1 v1 + c2 v2 + · · · + ck vk + cx = 0.
Note that we cannot have c = 0: for if c were 0, we’d have c1 v1 + c2 v2 + · · · + ck vk = 0, and linear
independence of {v1 , . . . , vk } implies c1 = · · · = ck = 0, which contradicts our assumption that
{v₁, . . . , vₖ, x} is linearly dependent. Therefore, c ≠ 0, and so
$$x = -\frac{1}{c}(c_1 v_1 + c_2 v_2 + \cdots + c_k v_k) = \left(-\frac{c_1}{c}\right)v_1 + \left(-\frac{c_2}{c}\right)v_2 + \cdots + \left(-\frac{c_k}{c}\right)v_k,$$
which tells us that x ∈ Span(v1 , . . . , vk ), as required.
Proposition 3.2 has the following consequence: if {v₁, . . . , vₖ} is linearly independent, then
$$\text{Span}(v_1) \subsetneq \text{Span}(v_1, v_2) \subsetneq \cdots \subsetneq \text{Span}(v_1, \ldots, v_k).$$
That is, with each additional vector, the subspace spanned gets larger. We now formalize the notion
of “size” of a subspace. But we now understand that when we have a set of linearly independent
vectors, no proper subset will yield the same span. In other words, we will have an “efficient” set
of spanning vectors (i.e., there is no redundancy in the vectors we’ve chosen: no proper subset will
do). This motivates the following
Definition. Let V ⊂ Rn be a subspace. The set of vectors {v1 , . . . , vk } is called a basis for V
if
(i) v1 , . . . , vk span V , i.e., V = Span(v1 , . . . , vk ), and
(ii) {v1 , . . . , vk } is linearly independent.
We comment that the plural of basis is bases.
Another example, which will be quite important to us in the future, is the following
Proposition 3.4. Let A be an n × n matrix. Then A is nonsingular if and only if its column
vectors form a basis for Rn .
Proof. As usual, let’s denote the column vectors of A by a1 , a2 , . . . , an . Using Corollary 3.3,
we are to prove that A is nonsingular if and only if every vector in Rn can be written uniquely as
a linear combination of a1 , a2 , . . . , an . But this is exactly what Proposition 1.6 tells us.
Given a subspace V ⊂ Rn , how do we know there is some basis for it? This is a consequence of
Proposition 3.2 as well.
Theorem 3.5. Any subspace V ⊂ Rn other than the trivial subspace has a basis.
Once we realize that every subspace V ⊂ Rn has some basis, we are confronted with the problem
that it has many of them. For example, Proposition 3.4 gives us a way of finding zillions of bases
for Rn . As we shall now show, all bases for a given subspace have one thing in common: they all
consist of the same number of elements.
Proposition 3.6. Let V ⊂ Rn be a subspace, let {v1 , . . . , vk } be a basis for V , and let
w1 , . . . , wℓ ∈ V . If ℓ > k, then {w1 , . . . , wℓ } must be linearly dependent.
Proof. Since {v₁, . . . , vₖ} spans V, we can write each
$$w_j = a_{1j} v_1 + a_{2j} v_2 + \cdots + a_{kj} v_k, \qquad j = 1, \ldots, \ell,$$
for appropriate scalars aᵢⱼ. We now form the k × ℓ matrix A = [aᵢⱼ]. This gives the matrix equation
$$(*)\qquad \begin{bmatrix} | & | & & | \\ v_1 & v_2 & \cdots & v_k \\ | & | & & | \end{bmatrix} A = \begin{bmatrix} | & | & & | \\ w_1 & w_2 & \cdots & w_\ell \\ | & | & & | \end{bmatrix}.$$
Since ℓ > k, there cannot be a pivot in every column of A and so there is a nonzero vector
$$c = \begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_\ell \end{bmatrix} \quad \text{satisfying} \quad Ac = \mathbf{0}.$$
Multiplying both sides of (∗) on the right by c, we find that c₁w₁ + c₂w₂ + · · · + cₗwₗ = 0, a nontrivial relation; hence {w₁, . . . , wₗ} is linearly dependent.
Remark. We can easily avoid equation (∗) in its matrix form. Since
$$w_j = \sum_{i=1}^{k} a_{ij} v_i,$$
we have
$$(**)\qquad \sum_{j=1}^{\ell} c_j w_j = \sum_{j=1}^{\ell} c_j \left( \sum_{i=1}^{k} a_{ij} v_i \right) = \sum_{i=1}^{k} \left( \sum_{j=1}^{\ell} a_{ij} c_j \right) v_i.$$
As before, since ℓ > k, there is a nonzero vector c so that Ac = 0; this choice of c makes the right-
hand side of (∗∗) the zero vector. Consequently, there is a nontrivial relation among w1 , . . . , wℓ .
Theorem 3.7. Let V ⊂ Rn be a subspace, and let {v1 , . . . , vk } and {w1 , . . . , wℓ } be two bases
for V . Then we have k = ℓ.
Proof. Since {v1 , . . . , vk } forms a basis for V and {w1 , . . . , wℓ } is known to be linearly in-
dependent, we use Proposition 3.6 to conclude that ℓ ≤ k. Now here’s the trick: {w1 , . . . , wℓ }
is likewise a basis for V and {v1 , . . . , vk } is known to be linearly independent, so we infer from
Proposition 3.6 that k ≤ ℓ. The only way both inequalities can hold is for k and ℓ to be equal, as
we wished to show.
We now make the official
Definition. The dimension of a subspace V ⊂ Rn is the number of vectors in any basis for V .
We denote the dimension of V by dim V . By convention, dim{0} = 0.
As we shall see in our applications, dimension is a powerful tool. Here is the first instance.
Lemma 3.8. Suppose V and W are subspaces of Rn with the property that W ⊂ V . If
dim V = dim W , then V = W .
Proof. Let dim W = k and let {v₁, . . . , vₖ} be a basis for W. If W ⊊ V, then there must be a vector v ∈ V with v ∉ W. By virtue of Proposition 3.2, we know that {v₁, . . . , vₖ, v} is linearly independent, so dim V ≥ k + 1. This is a contradiction. Therefore, V = W.
The next result is quite useful.
Proposition 3.9. Let V ⊂ Rn be a k-dimensional subspace. Then any k vectors that span V
must be linearly independent and any k linearly independent vectors in V must span V .
3.1. Abstract Vector Spaces. We have not yet dealt with vector spaces other than Euclidean
spaces. In general, a vector space is a set endowed with the operations of addition and scalar
multiplication, subject to the properties listed in Exercise 1.1.12. Notions of linear independence
and basis proceed analogously; the remark on p. 157 shows that dimension is well-defined in the
general setting.
Examples 10. Here are a few examples of so-called “abstract” vector spaces. Others appear
in the exercises.
(a) Let Mm×n denote the set of all m×n matrices. As we’ve seen in Proposition 4.1 of Chapter
1, Mm×n is a vector space, using the operations of matrix addition and scalar multiplication
we’ve already defined. The zero “vector” is the zero matrix O. This space can naturally
be identified with Rmn (see Exercise 24).
(b) Let F(U) denote the collection of all real-valued functions defined on some subset U ⊂ Rⁿ. If f ∈ F(U) and c ∈ R, then we can define a new function cf ∈ F(U) by multiplying the value of f at each point by the scalar c: i.e.,
$$(cf)(x) = c\,f(x) \quad \text{for all } x \in U.$$
Similarly, if f, g ∈ F(U), then we can define the new function f + g ∈ F(U) by adding the values of f and g at each point: i.e.,
$$(f + g)(x) = f(x) + g(x) \quad \text{for all } x \in U.$$
By these formulas we define scalar multiplication and vector addition in F(U). The zero "vector" in F(U) is the zero function. The various properties of a vector space follow from the corresponding properties of the real numbers (as everything is defined in terms of the values of the functions at each point). Since an element of F(U) is a function, F(U) is often called a function space.
(c) Let Rω denote the collection of all infinite sequences of real numbers. That is, an element of Rω looks like
$$x = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ \vdots \end{bmatrix}, \quad \text{where } x_i \in \mathbb{R},\ i = 1, 2, 3, \ldots.$$
Operations are defined in the obvious way: if c ∈ R and $y = \begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ \vdots \end{bmatrix}$, then we set
$$cx = \begin{bmatrix} cx_1 \\ cx_2 \\ cx_3 \\ \vdots \end{bmatrix} \quad \text{and} \quad x + y = \begin{bmatrix} x_1 + y_1 \\ x_2 + y_2 \\ x_3 + y_3 \\ \vdots \end{bmatrix}. \quad \triangledown$$
The vector space of functions on an open subset U ⊂ Rn has various subspaces that will be of
particular interest to us. For any k ≥ 0 we have Ck (U ), the space of Ck functions on U ; indeed, we
have the hierarchy
C∞ (U ) ⊂ · · · ⊂ Ck+1 (U ) ⊂ Ck (U ) ⊂ · · · ⊂ C2 (U ) ⊂ C1 (U ) ⊂ C0 (U ).
(That these are all subspaces follows from the standard facts that sums and scalar multiples of Ck
functions are again Ck .) We can also consider the subspaces of polynomial functions. We denote
by Pk the vector space of polynomials of degree ≤ k in one variable.
As we ask the reader to check in Exercise 26, the vector space Pk has dimension k + 1. In
general, we say a vector space is finite-dimensional if it has dimension n for some n ∈ N and
infinite-dimensional if not. The vector space C∞ (R) is infinite-dimensional, as it contains the
polynomials of arbitrarily high degree.
EXERCISES 4.3
1. Let $v_1 = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}$, $v_2 = \begin{bmatrix} 2 \\ 4 \\ 5 \end{bmatrix}$, and $v_3 = \begin{bmatrix} 2 \\ 4 \\ 6 \end{bmatrix} \in \mathbb{R}^3$. Is each of the following statements correct
or incorrect? Explain.
a. The set {v1 , v2 , v3 } is linearly dependent.
b. Each of the vectors v1 , v2 , and v3 can be written as a linear combination of the others.
*2. Decide whether each of the following sets of vectors is linearly independent.
a. $\left\{ \begin{bmatrix} 1 \\ 4 \end{bmatrix}, \begin{bmatrix} 2 \\ 9 \end{bmatrix} \right\} \subset \mathbb{R}^2$
b. $\left\{ \begin{bmatrix} 1 \\ 4 \\ 0 \end{bmatrix}, \begin{bmatrix} 2 \\ 9 \\ 0 \end{bmatrix} \right\} \subset \mathbb{R}^3$
c. $\left\{ \begin{bmatrix} 1 \\ 4 \\ 0 \end{bmatrix}, \begin{bmatrix} 2 \\ 9 \\ 0 \end{bmatrix}, \begin{bmatrix} 3 \\ -2 \\ 0 \end{bmatrix} \right\} \subset \mathbb{R}^3$
d. $\left\{ \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}, \begin{bmatrix} 2 \\ 3 \\ 3 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \\ 2 \end{bmatrix} \right\} \subset \mathbb{R}^3$
e. $\left\{ \begin{bmatrix} 1 \\ 1 \\ 1 \\ 3 \end{bmatrix}, \begin{bmatrix} 1 \\ 1 \\ 3 \\ 1 \end{bmatrix}, \begin{bmatrix} 1 \\ 3 \\ 1 \\ 1 \end{bmatrix}, \begin{bmatrix} 3 \\ 1 \\ 1 \\ 1 \end{bmatrix} \right\} \subset \mathbb{R}^4$
f. $\left\{ \begin{bmatrix} 1 \\ 1 \\ 1 \\ -3 \end{bmatrix}, \begin{bmatrix} 1 \\ 1 \\ -3 \\ 1 \end{bmatrix}, \begin{bmatrix} 1 \\ -3 \\ 1 \\ 1 \end{bmatrix}, \begin{bmatrix} -3 \\ 1 \\ 1 \\ 1 \end{bmatrix} \right\} \subset \mathbb{R}^4$
7. Suppose v1 , . . . , vk ∈ Rn form a linearly dependent set. Prove that for some 1 ≤ j ≤ k we have
vj ∈ Span(v1 , . . . , vj−1 , vj+1 , . . . , vk ). That is, one of the vectors v1 , . . . , vk can be written as
a linear combination of the remaining vectors.
*12. Decide whether the following sets of vectors give a basis for the indicated space.
a. $\begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix}, \begin{bmatrix} 2 \\ 4 \\ 5 \end{bmatrix}, \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}$; R³
b. $\begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}, \begin{bmatrix} 1 \\ 2 \\ 4 \end{bmatrix}, \begin{bmatrix} 2 \\ 2 \\ 5 \end{bmatrix}, \begin{bmatrix} 2 \\ 2 \\ -1 \end{bmatrix}$; R³
c. $\begin{bmatrix} 1 \\ 0 \\ 2 \\ 3 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \\ 1 \\ 1 \end{bmatrix}, \begin{bmatrix} 1 \\ 1 \\ 4 \\ 4 \end{bmatrix}$; R⁴
d. $\begin{bmatrix} 1 \\ 0 \\ 2 \\ 3 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \\ 1 \\ 1 \end{bmatrix}, \begin{bmatrix} 1 \\ 1 \\ 4 \\ 4 \end{bmatrix}, \begin{bmatrix} 2 \\ -2 \\ 1 \\ 2 \end{bmatrix}$; R⁴
14. In each case, check that {v₁, . . . , vₙ} is a basis for Rⁿ and give the coordinates of the given vector b ∈ Rⁿ with respect to that basis.
a. $v_1 = \begin{bmatrix} 2 \\ 3 \end{bmatrix}$, $v_2 = \begin{bmatrix} 3 \\ 5 \end{bmatrix}$; $b = \begin{bmatrix} 3 \\ 4 \end{bmatrix}$
*b. $v_1 = \begin{bmatrix} 1 \\ 0 \\ 3 \end{bmatrix}$, $v_2 = \begin{bmatrix} 1 \\ 2 \\ 2 \end{bmatrix}$, $v_3 = \begin{bmatrix} 1 \\ 3 \\ 2 \end{bmatrix}$; $b = \begin{bmatrix} 1 \\ 1 \\ 2 \end{bmatrix}$
c. $v_1 = \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}$, $v_2 = \begin{bmatrix} 1 \\ 1 \\ 2 \end{bmatrix}$, $v_3 = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}$; $b = \begin{bmatrix} 3 \\ 0 \\ 1 \end{bmatrix}$
*d. $v_1 = \begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \end{bmatrix}$, $v_2 = \begin{bmatrix} 1 \\ 1 \\ 0 \\ 0 \end{bmatrix}$, $v_3 = \begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \end{bmatrix}$, $v_4 = \begin{bmatrix} 1 \\ 1 \\ 3 \\ 4 \end{bmatrix}$; $b = \begin{bmatrix} 2 \\ 0 \\ 1 \\ 1 \end{bmatrix}$
17. Prove Proposition 3.9. (Hint: Exercise 7 and Lemma 3.8 may be of help.)
♯ 18. Let V ⊂ Rn be a subspace, and suppose you are given a linearly independent set of vectors
{v1 , . . . , vk } ⊂ V . Prove that there are vectors vk+1 , . . . , vℓ ∈ V so that {v1 , . . . , vℓ } forms a
basis for V .
19. Suppose V and W are subspaces of Rn and W ⊂ V . Prove that dim W ≤ dim V . (Hint: Start
with a basis for W and apply Exercise 18.)
21. *a. Suppose U and V are subspaces of Rn with U ∩ V = {0}. If {u1 , . . . , uk } is a basis for U
and {v1 , . . . , vℓ } is a basis for V , prove that {u1 , . . . , uk , v1 , . . . , vℓ } is a basis for U + V .
b. Let U and V be subspaces of Rn . Prove that if U ∩ V = {0}, then dim(U + V ) =
dim U + dim V .
c. Let U and V be subspaces of Rn . Prove that dim(U + V ) = dim U + dim V − dim(U ∩ V ).
(Hint: Start with a basis for U ∩ V , and use Exercise 18.)
*23. Decide whether the following sets of vectors are linearly independent.
a. $\left\{ \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}, \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \right\} \subset M_{2\times 2}$
b. {f1 , f2 , f3 } ⊂ P1 , where f1 (t) = t, f2 (t) = t + 1, f3 (t) = t + 2
c. {f1 , f2 , f3 } ⊂ C∞ (R), where f1 (t) = 1, f2 (t) = cos t, f3 (t) = sin t
25. Let {v₁, . . . , vₙ} be a basis for Rⁿ, and define fᵢ : Rⁿ → R by
$$f_i(a_1 v_1 + a_2 v_2 + \cdots + a_n v_n) = a_i.$$
Show that fᵢ is a linear map for each i = 1, . . . , n.
26. Show that the set Pₖ of polynomials in one variable of degree ≤ k is a vector space of dimension k + 1. (Hint: Suppose c₀ + c₁x + · · · + cₖxᵏ = 0 for all x. Differentiate.)
27. Recall that f : Rⁿ − {0} → R is homogeneous of degree k if f(tx) = tᵏf(x) for all t > 0.
a. Show that the set Pₖ,ₙ of homogeneous polynomials of degree k in n variables is a vector space.
b. Fix k ∈ N. Show that the monomials $x_1^{i_1} x_2^{i_2} \cdots x_n^{i_n}$, where i₁ + i₂ + · · · + iₙ = k, form a basis for Pₖ,ₙ.
c. Show that $\dim P_{k,n} = \binom{n-1+k}{k}$.⁴ (Hint: It may help to remember that $\binom{j}{k} = \binom{j}{j-k}$.)
d. Using the interpretation in part c, prove that $\sum_{i=0}^{k} \binom{n+i}{i} = \binom{n+k+1}{k}$.

⁴Recall that the binomial coefficient $\binom{n}{k} = n!/\bigl(k!(n-k)!\bigr)$ gives the number of k-element subsets of a given n-element set.
§4. The Four Fundamental Subspaces

Given an m × n matrix A (or, more conceptually, a linear map T : Rⁿ → Rᵐ), there are four natural subspaces to consider. It is one of our goals to understand the relations among them. We begin with the column space and row space.

Definition. Let A be an m × n matrix with column vectors a₁, . . . , aₙ. The column space of A is the subspace of Rᵐ spanned by the columns:
$$C(A) = \text{Span}(a_1, \ldots, a_n) \subset \mathbb{R}^m.$$
Similarly, the row space of A is the subspace of Rⁿ spanned by the row vectors A₁, . . . , Aₘ of A:
$$R(A) = \text{Span}(A_1, \ldots, A_m) \subset \mathbb{R}^n.$$
Our work in Section 1 gives an important alternative interpretation of the column space.
$$C(A) = \{\, b \in \mathbb{R}^m : Ax = b \text{ is consistent} \,\}.$$
Perhaps the most natural subspace of all comes from solving a homogeneous system of linear
equations.
Definition. Let A be an m × n matrix. The nullspace of A is the set of solutions of the system
Ax = 0:
N(A) = {x ∈ Rn : Ax = 0}.
Recall (see Exercise 1.4.3) that N(A) is in fact a subspace. If we think of A as the standard matrix
of a linear map T : Rn → Rm , then N(A) ⊂ Rn is often called the kernel of T , denoted ker(T ).
We might surmise that our algorithm in Section 1 for finding the general solution of the homogeneous linear system Ax = 0 produces a basis for N(A). Suppose, for instance, that for a certain matrix A the general solution of Ax = 0 is
$$x_1 = -x_3 - x_4, \qquad x_2 = x_4, \qquad x_3 = x_3, \qquad x_4 = x_4,$$
i.e.,
$$x = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} = \begin{bmatrix} -x_3 - x_4 \\ x_4 \\ x_3 \\ x_4 \end{bmatrix} = x_3 \begin{bmatrix} -1 \\ 0 \\ 1 \\ 0 \end{bmatrix} + x_4 \begin{bmatrix} -1 \\ 1 \\ 0 \\ 1 \end{bmatrix}.$$
From this we see that the vectors
$$v_1 = \begin{bmatrix} -1 \\ 0 \\ 1 \\ 0 \end{bmatrix} \quad \text{and} \quad v_2 = \begin{bmatrix} -1 \\ 1 \\ 0 \\ 1 \end{bmatrix}$$
span N(A). On the other hand, they are clearly linearly independent, for if
$$c_1 \begin{bmatrix} -1 \\ 0 \\ 1 \\ 0 \end{bmatrix} + c_2 \begin{bmatrix} -1 \\ 1 \\ 0 \\ 1 \end{bmatrix} = \begin{bmatrix} -c_1 - c_2 \\ c_2 \\ c_1 \\ c_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \end{bmatrix},$$
then c₁ = c₂ = 0. Thus, {v₁, v₂} gives a basis for N(A). ▽
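To see the algorithm at work by machine, here is a brief sympy sketch (our own illustration). Since the matrix producing the general solution above is not reproduced here, the code uses a hypothetical A whose reduced echelon form yields exactly the equations x₁ = −x₃ − x₄, x₂ = x₄.

```python
import sympy as sp

# Hypothetical matrix consistent with the general solution displayed above:
A = sp.Matrix([[1, 0, 1, 1],
               [0, 1, 0, -1]])

for v in A.nullspace():   # one basis vector per free variable
    print(v.T)            # (-1, 0, 1, 0) and (-1, 1, 0, 1)
    print((A * v).T)      # (0, 0): each really solves Ax = 0
```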
One of the most beautiful and powerful relations among these subspaces is the following:
Proposition 4.2. Let A be an m × n matrix. Then N(A) = R(A)⊥ .
Proof. If x ∈ N(A), then, by definition, Ai · x = 0 for all i = 1, 2, . . . , m. (Remember that
A1 , . . . , Am denote the row vectors of the matrix A.) So it follows (see Exercise 1.3.3) that x
is orthogonal to any linear combination of A1 , . . . , Am , hence to any vector in R(A). That is,
x ∈ R(A)⊥ , so N(A) ⊂ R(A)⊥ . Now we need only show that R(A)⊥ ⊂ N(A). If x ∈ R(A)⊥ , this
means that x is orthogonal to every vector in R(A), so, in particular, x is orthogonal to each of
the row vectors A1 , . . . , Am . But this means that Ax = 0, so x ∈ N(A), as required.
It is also the case that R(A) = N(A)⊥ , but we are not quite yet in a position to establish this.
Since C(A) = R(AT ), the following is immediate:
Corollary 4.3. Let A be an m × n matrix. Then N(AT ) = C(A)⊥ .
In fact, we really came across this earlier, when we found constraint equations for Ax = b to be
consistent. Just as multiplying A by x takes linear combinations of the columns of A, so then does
multiplying AT by x take linear combinations of the rows of A (perhaps it helps to think of AT x
as (xT A)T ). Corollary 4.3 is the statement that any linear combination of the rows of A that gives
0 corresponds to a constraint on C(A) and vice versa. What is, however, far from clear is whether
the vectors we obtain as coefficients of the constraint equations form a linearly independent set.
Example 2. Let
$$A = \begin{bmatrix} 1 & 2 \\ 1 & 1 \\ 0 & 1 \\ 1 & 2 \end{bmatrix}.$$
We wish to find a homogeneous system of linear equations describing C(A). That is, we seek the
equations b ∈ R4 must satisfy in order for Ax = b to be consistent. By row reduction, we find:
$$\begin{bmatrix} 1 & 2 & b_1 \\ 1 & 1 & b_2 \\ 0 & 1 & b_3 \\ 1 & 2 & b_4 \end{bmatrix} \rightsquigarrow \begin{bmatrix} 1 & 2 & b_1 \\ 0 & -1 & b_2 - b_1 \\ 0 & 1 & b_3 \\ 0 & 0 & b_4 - b_1 \end{bmatrix} \rightsquigarrow \begin{bmatrix} 1 & 2 & b_1 \\ 0 & 1 & b_1 - b_2 \\ 0 & 0 & -b_1 + b_2 + b_3 \\ 0 & 0 & -b_1 + b_4 \end{bmatrix},$$
so the constraint equations are
$$-b_1 + b_2 + b_3 = 0 \qquad \text{and} \qquad -b_1 + b_4 = 0.$$
Now, if we keep track of the row operations involved in reducing A to echelon form, we find that
$$\begin{bmatrix} 1 & 0 & 0 & 0 \\ -1 & 1 & 0 & 0 \\ -1 & 1 & 1 & 0 \\ -1 & 0 & 0 & 1 \end{bmatrix} A = \begin{bmatrix} 1 & 2 \\ 0 & -1 \\ 0 & 0 \\ 0 & 0 \end{bmatrix},$$
from which we see that
$$-A_1 + A_2 + A_3 = -A_1 + A_4 = \mathbf{0}.$$
Thus the vectors
$$\begin{bmatrix} -1 \\ 1 \\ 1 \\ 0 \end{bmatrix} \quad \text{and} \quad \begin{bmatrix} -1 \\ 0 \\ 0 \\ 1 \end{bmatrix}$$
span N(Aᵀ). On the other hand, in this instance, it is easy to see they are linearly independent and hence give a basis for N(Aᵀ). ▽
Example 3. Let
$$A = \begin{bmatrix} 1 & 1 & 0 & 1 & 4 \\ 1 & 2 & 1 & 1 & 6 \\ 0 & 1 & 1 & 1 & 3 \\ 2 & 2 & 0 & 1 & 7 \end{bmatrix}.$$
Gaussian elimination gives us the reduced echelon form R; keeping track of the row operations as before, we have
$$R = \begin{bmatrix} 1 & 0 & -1 & 0 \\ -1 & 1 & 0 & 0 \\ 1 & -1 & 1 & 0 \\ -1 & -1 & 1 & 1 \end{bmatrix} \begin{bmatrix} 1 & 1 & 0 & 1 & 4 \\ 1 & 2 & 1 & 1 & 6 \\ 0 & 1 & 1 & 1 & 3 \\ 2 & 2 & 0 & 1 & 7 \end{bmatrix} = \begin{bmatrix} 1 & 0 & -1 & 0 & 1 \\ 0 & 1 & 1 & 0 & 2 \\ 0 & 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}.$$
From this information, we wish to read off bases for each of the subspaces R(A), N(A), C(A), and N(Aᵀ).
Using the result of Exercise 1, R(A) = R(R), so the nonzero rows of R span R(A); now we need only check that they form a linearly independent set. We keep an eye on the pivot "slots": Suppose
$$c_1 \begin{bmatrix} 1 \\ 0 \\ -1 \\ 0 \\ 1 \end{bmatrix} + c_2 \begin{bmatrix} 0 \\ 1 \\ 1 \\ 0 \\ 2 \end{bmatrix} + c_3 \begin{bmatrix} 0 \\ 0 \\ 0 \\ 1 \\ 1 \end{bmatrix} = \mathbf{0}.$$
Looking at the pivot slots (the first, second, and fourth entries), we see immediately that c₁ = c₂ = c₃ = 0, as promised.
From the reduced echelon form R, we read off the vectors that span N(A): the general solution of Ax = 0 is
$$x = \begin{bmatrix} x_3 - x_5 \\ -x_3 - 2x_5 \\ x_3 \\ -x_5 \\ x_5 \end{bmatrix} = x_3 \begin{bmatrix} 1 \\ -1 \\ 1 \\ 0 \\ 0 \end{bmatrix} + x_5 \begin{bmatrix} -1 \\ -2 \\ 0 \\ -1 \\ 1 \end{bmatrix},$$
so
$$\begin{bmatrix} 1 \\ -1 \\ 1 \\ 0 \\ 0 \end{bmatrix} \quad \text{and} \quad \begin{bmatrix} -1 \\ -2 \\ 0 \\ -1 \\ 1 \end{bmatrix}$$
span N(A). On the other hand, these vectors are linearly independent, for if we take a linear
combination
$$x_3 \begin{bmatrix} 1 \\ -1 \\ 1 \\ 0 \\ 0 \end{bmatrix} + x_5 \begin{bmatrix} -1 \\ -2 \\ 0 \\ -1 \\ 1 \end{bmatrix} = \mathbf{0},$$
we infer (from the free variable slots) that x3 = x5 = 0. Thus, these two vectors form a basis for
N(A).
Obviously, C(A) is spanned by the five column vectors of A. But these vectors cannot be linearly independent—that's what vectors in the nullspace of A tell us. From our vectors spanning N(A) we read off the relations a₃ = a₂ − a₁ and a₅ = a₁ + 2a₂ + a₄, so C(A) is in fact spanned by the pivot columns a₁, a₂, and a₄; these are easily checked to be linearly independent and hence give a basis for C(A). Finally, the last row of the matrix of row operations above gives the relation −A₁ − A₂ + A₃ + A₄ = 0, so the vector $\begin{bmatrix} -1 \\ -1 \\ 1 \\ 1 \end{bmatrix}$ gives a basis for N(Aᵀ). ▽

We now state the formal results regarding the four fundamental subspaces.
Theorem 4.4. Let A be an m × n matrix. Let U and R, resp., denote the echelon and reduced
echelon form, resp., of A, and write EA = U (so E is the product of the elementary matrices by
which we reduce A to echelon form).
(1) The nonzero rows of U (or of R) give a basis for R(A).
(2) The vectors obtained by setting each free variable equal to 1 and the remaining free variables
equal to 0 in the general solution of Ax = 0 (which we read off from Rx = 0) give a basis
for N(A).
(3) The pivot columns of A (i.e., the columns of the original matrix A corresponding to the
pivots in U ) give a basis for C(A).
(4) The (transposes of the) rows of E that correspond to the zero rows of U give a basis for
N(AT ). (The same works with E ′ if we write E ′ A = R.)
Proof. For simplicity of exposition, let's assume that the reduced echelon form takes the shape
$$R = \begin{bmatrix} 1 & & & b_{1,r+1} & b_{1,r+2} & \cdots & b_{1n} \\ & \ddots & & \vdots & \vdots & & \vdots \\ & & 1 & b_{r,r+1} & b_{r,r+2} & \cdots & b_{rn} \\ & O & & & & O & \end{bmatrix},$$
where the first r rows contain the pivots and the last m − r rows are zero.
(1) Since row operations are invertible, R(A) = R(U ) (see Exercise 1). Clearly the nonzero
rows of U span R(U ). Moreover, they are linearly independent because of the pivots. Let
U1 , . . . , Ur denote the nonzero rows of U ; because of our simplifying assumption on R, we
know that the pivots of U occur in the first r columns as well. Suppose now that
c1 U1 + · · · + cr Ur = 0.
The first entry of the left-hand side is c1 u11 (since the first entry of the vectors U2 , . . . , Ur
is 0 by definition of echelon form). Since u₁₁ ≠ 0 by definition of pivot, we must have
c1 = 0. Continuing in this fashion, we find that c1 = c2 = · · · = cr = 0. In conclusion,
{U1 , . . . , Ur } forms a basis for R(U ), hence for R(A).
(2) Ax = 0 if and only if Rx = 0, which means that x₁ = −b₁,ᵣ₊₁xᵣ₊₁ − · · · − b₁ₙxₙ, . . . , xᵣ = −bᵣ,ᵣ₊₁xᵣ₊₁ − · · · − bᵣₙxₙ, with the free variables xᵣ₊₁, . . . , xₙ arbitrary. We claim that the vectors obtained by setting one free variable equal to 1 and the remaining free variables equal to 0 give a basis for N(A). They obviously span (since every vector in N(A) can be expressed as a linear combination of them). We need to check linear independence: the key is the pattern of 1's and 0's in the free variable "slots." Suppose
$$\mathbf{0} = x_{r+1} \begin{bmatrix} -b_{1,r+1} \\ \vdots \\ -b_{r,r+1} \\ 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix} + x_{r+2} \begin{bmatrix} -b_{1,r+2} \\ \vdots \\ -b_{r,r+2} \\ 0 \\ 1 \\ \vdots \\ 0 \end{bmatrix} + \cdots + x_n \begin{bmatrix} -b_{1n} \\ \vdots \\ -b_{rn} \\ 0 \\ 0 \\ \vdots \\ 1 \end{bmatrix}.$$
Examining the free-variable slots in turn, we conclude that xᵣ₊₁ = xᵣ₊₂ = · · · = xₙ = 0, as needed.
(3) Each of the basis vectors of N(A) just constructed gives a relation among the columns of A; explicitly, aⱼ = b₁ⱼa₁ + · · · + bᵣⱼaᵣ for j = r + 1, . . . , n,
from which we conclude that the vectors ar+1 , . . . , an are all linear combinations of a1 , . . . , ar .
It follows that C(A) is spanned by a1 , . . . , ar , as required.
(4) We are interested in the linear relations among the rows of A. The key point here is that
the first r rows of the echelon matrix U form a linearly independent set, whereas the last
m − r rows of U consist just of 0. Thus, N(U T ) is spanned by the last m − r standard basis
vectors for Rm . Using EA = U , we see that
AT = (E −1 U )T = U T (E −1 )T = U T (E T )−1 ,
and so x ∈ N(Aᵀ) if and only if Uᵀ(Eᵀ)⁻¹x = 0, i.e., if and only if (Eᵀ)⁻¹x ∈ N(Uᵀ).
This tells us that the last m − r rows of E span N(Aᵀ). But these vectors are linearly
independent, since E is nonsingular.
Remark. Referring to our earlier discussion of (†) on p. 143 and our discussion in Sections 1
and 2 of this chapter, we finally know that finding the constraint equations for C(A) will give a
basis for N(AT ). It is also worth noting that to find bases for the four fundamental subspaces of
the matrix A, we need only find the echelon form of A to deal with R(A) and C(A), the reduced
echelon form of A to deal with N(A), and the echelon form of the augmented matrix [A | b] to
deal with N(AT ).
Example 4. We want bases for R(A), N(A), C(A), and N(Aᵀ), given the matrix
$$A = \begin{bmatrix} 1 & 1 & 2 & 0 & 0 \\ 0 & 1 & 1 & -1 & -1 \\ 1 & 1 & 2 & 1 & 2 \\ 2 & 1 & 3 & -1 & -3 \end{bmatrix}.$$
Carrying out the procedure of Theorem 4.4, one obtains the bases
$$R(A)\!: \begin{bmatrix} 1 \\ 1 \\ 2 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \\ 1 \\ -1 \\ -1 \end{bmatrix}, \begin{bmatrix} 0 \\ 0 \\ 0 \\ 1 \\ 2 \end{bmatrix}; \qquad N(A)\!: \begin{bmatrix} -1 \\ -1 \\ 1 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 1 \\ -1 \\ 0 \\ -2 \\ 1 \end{bmatrix};$$
$$C(A)\!: \begin{bmatrix} 1 \\ 0 \\ 1 \\ 2 \end{bmatrix}, \begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \end{bmatrix}, \begin{bmatrix} 0 \\ -1 \\ 1 \\ -1 \end{bmatrix}; \qquad N(A^T)\!: \begin{bmatrix} -4 \\ 1 \\ 2 \\ 1 \end{bmatrix}.$$
The reader should check these all carefully. Note that dim R(A) = dim C(A) = 3, dim N(A) = 2, and dim N(Aᵀ) = 1. ▽
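Readers following along by machine can compute all four subspaces of Example 4 at once with a sympy sketch such as the following (our own illustration; sympy's rowspace, nullspace, and columnspace routines correspond to the recipes of Theorem 4.4).

```python
import sympy as sp

A = sp.Matrix([[1, 1, 2, 0, 0],
               [0, 1, 1, -1, -1],
               [1, 1, 2, 1, 2],
               [2, 1, 3, -1, -3]])

print(A.rank())           # 3
print(A.rowspace())       # basis for R(A): nonzero rows of the echelon form
print(A.nullspace())      # basis for N(A): 2 vectors, one per free variable
print(A.columnspace())    # basis for C(A): the 3 pivot columns of A
print(A.T.nullspace())    # basis for N(A^T): 1 vector
```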
We now deduce the following results on dimension. Recall that the rank of a matrix is the number of pivots in its echelon form.

Theorem 4.5. Let A be an m × n matrix of rank r. Then dim R(A) = dim C(A) = r, dim N(A) = n − r, and dim N(Aᵀ) = m − r.

Proof. There are r pivots and a pivot in each nonzero row of U, so dim R(A) = r. Similarly, we
have a basis vector for C(A) for every pivot, so dim C(A) = r, as well. We see that dim N(A) is equal
to the number of free variables, and this is the difference between the total number of variables (n)
and the number of pivot variables (r). Last, the number of zero rows in U is the difference between
the total number of rows (m) and the number of nonzero rows (r), so dim N(AT ) = m − r.
An immediate corollary of Theorem 4.5 is the following. The dimension of the nullspace of A is often called the nullity of A, denoted null(A). (Cf. also Exercise 4.3.22.)

Corollary 4.6. Let A be an m × n matrix. Then
$$\text{null}(A) + \text{rank}(A) = n.$$

Proposition 4.8. Let V ⊂ Rⁿ be a k-dimensional subspace. Then dim V⊥ = n − k.

Proof. Choose a basis {v₁, . . . , vₖ} for V, and let these be the rows of a k × n matrix A. By construction, we have R(A) = V. Notice also that rank(A) = dim R(A) = dim V = k. By Proposition 4.2, we have V⊥ = N(A), so dim V⊥ = dim N(A) = n − k.
We can finally bring this discussion to a close with the geometric characterization of the relations among the four fundamental subspaces. Note that this result completes the story of Theorem 4.5.

Theorem 4.9. Let A be an m × n matrix. Then R(A) = N(A)⊥ and C(A) = N(Aᵀ)⊥.

Proof. These are immediate from Proposition 4.2, Corollary 4.3, and Proposition 4.8.
Now, using Theorem 4.9, we have an alternative way of expressing a subspace V spanned by a
given set of vectors v1 , . . . , vk as the solution set of a homogeneous system of linear equations. We
use the vectors as rows of a matrix A; let {w1 , . . . , wℓ } give a basis for N(A). Since V = R(A) =
N(A)⊥ , we see that V is defined by the equations
w1 · x = 0, ..., wℓ · x = 0.
Example 5. Let
$$v_1 = \begin{bmatrix} 1 \\ 1 \\ 0 \\ 1 \end{bmatrix} \quad \text{and} \quad v_2 = \begin{bmatrix} 2 \\ 1 \\ 1 \\ 2 \end{bmatrix}.$$
We wish to write V = Span(v₁, v₂) as the solution set of a homogeneous system of linear equations. We introduce the matrix
$$A = \begin{bmatrix} 1 & 1 & 0 & 1 \\ 2 & 1 & 1 & 2 \end{bmatrix}$$
and find that
$$w_1 = \begin{bmatrix} -1 \\ 1 \\ 1 \\ 0 \end{bmatrix} \quad \text{and} \quad w_2 = \begin{bmatrix} -1 \\ 0 \\ 0 \\ 1 \end{bmatrix}$$
give a basis for N(A). By our earlier comments,
$$V = R(A) = N(A)^{\perp} = \{x \in \mathbb{R}^4 : w_1 \cdot x = 0,\ w_2 \cdot x = 0\} = \{x \in \mathbb{R}^4 : -x_1 + x_2 + x_3 = 0,\ -x_1 + x_4 = 0\}. \quad \triangledown$$
Earlier, e.g., in Example 2, we determined the constraint equations for the column space. The
column space, as we’ve seen, is the intersection of hyperplanes whose normal vectors are the basis
vectors for N(AT ). This is an application of the result that C(A) = N(AT )⊥ . As we interchange
A and AT , we turn one method of solving the problem into the other.
To close our discussion now, we introduce in Figure 4.1 a schematic diagram summarizing the geometric relation among our four fundamental subspaces.

Figure 4.1

We know that N(A) and R(A) are
orthogonal complements of one another in Rn and that, similarly, N(AT ) and C(A) are orthogonal
complements of one another in Rm . But there is more to be said.
Recall that, given an m × n matrix A, we have linear maps T : Rn → Rm and S : Rm → Rn
whose standard matrices are A and AT , respectively. T sends all of N(A) to 0 ∈ Rm and S sends
all of N(AT ) to 0 ∈ Rn . Now, the column space of A consists of all vectors of the form Ax for some
x ∈ Rn ; that is, it is the image of the function T . Since dim R(A) = dim C(A), this suggests that T
maps the subspace R(A) one-to-one and onto C(A). (And, symmetrically, S maps C(A) one-to-one
and onto R(A). These are, however, generally not inverse functions. Why? See Exercise 18.)
Proposition 4.10. For each b ∈ C(A), there is a unique vector x ∈ R(A) so that Ax = b.
Figure 4.2
Proof. Let {v1 , . . . , vr } be a basis for R(A). Then Av1 , . . . , Avr are r vectors in C(A). They
are linearly independent (by a modification of the proof of Exercise 4.3.11 that we leave to the
reader). Therefore, by Proposition 3.9, these vectors must span C(A). This tells us that every
vector b ∈ C(A) is of the form b = Ax for some x ∈ R(A) (why?). And there can be only one
such vector x because R(A) ∩ N(A) = {0}.
Remark. There is a further geometric interpretation of the vector x ∈ R(A) that arises in the
preceding Proposition. Of all the solutions of Ax = b, it is the one of least length. Why?
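The least-length solution can be computed with the pseudoinverse, as in the following numpy sketch (the matrix, borrowed from Exercise 11a below, and the use of numpy are our own illustration):

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [1.0, 2.0, 3.0]])
b = np.array([6.0, 6.0])            # b is in C(A), since b = A @ (1, 1, 1)

x = np.linalg.pinv(A) @ b           # the unique solution lying in R(A)
print(x)                            # [3/7, 6/7, 9/7] -- a multiple of (1, 2, 3)
print(A @ x)                        # [6, 6]

# Any other solution differs by a nullspace vector and is strictly longer:
x2 = x + np.array([2.0, -1.0, 0.0]) # (2, -1, 0) lies in N(A)
print(np.linalg.norm(x), np.linalg.norm(x2))
```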
EXERCISES 4.4
*1. Show that if B is obtained from A by performing one or more row operations, then R(B) =
R(A).
2. Let $A = \begin{bmatrix} 1 & 2 & 1 & 1 \\ -1 & 0 & 3 & 4 \\ 2 & 2 & -2 & -3 \end{bmatrix}$.
a. Give constraint equations for C(A).
b. Find a basis for N(AT ).
3. For each of the following matrices A, give bases for R(A), N(A), C(A), and N(Aᵀ). Check dimensions and orthogonality.
a. $A = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 4 & 6 \end{bmatrix}$
b. $A = \begin{bmatrix} 2 & 1 & 3 \\ 4 & 3 & 5 \\ 3 & 3 & 3 \end{bmatrix}$
c. $A = \begin{bmatrix} 1 & -2 & 1 & 0 \\ 2 & -4 & 3 & -1 \end{bmatrix}$
d. $A = \begin{bmatrix} 1 & -1 & 1 & 1 & 0 \\ 1 & 0 & 2 & 1 & 1 \\ 0 & 2 & 2 & 2 & 0 \\ -1 & 1 & -1 & 0 & -1 \end{bmatrix}$
e. $A = \begin{bmatrix} 1 & 1 & 0 & 1 & -1 \\ 1 & 1 & 2 & -1 & 1 \\ 2 & 2 & 2 & 0 & 0 \\ -1 & -1 & 2 & -3 & 3 \end{bmatrix}$
*f. $A = \begin{bmatrix} 1 & 1 & 0 & 5 & 0 & -1 \\ 0 & 1 & 1 & 3 & -2 & 0 \\ -1 & 2 & 3 & 4 & 1 & -6 \\ 0 & 4 & 4 & 12 & -1 & -7 \end{bmatrix}$
4. Given each matrix A, find matrices X and Y so that C(A) = N(X) and N(A) = C(Y).
*a. $A = \begin{bmatrix} 3 & -1 \\ 6 & -2 \\ -9 & 3 \end{bmatrix}$
b. $A = \begin{bmatrix} 1 & 1 & 0 \\ 2 & 1 & 1 \\ 1 & -1 & 2 \end{bmatrix}$
c. $A = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 0 & 2 \end{bmatrix}$
5. In each case, construct a matrix with the requisite properties or explain why no such matrix
exists.
a. The column space contains $\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}$ and $\begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}$, and the nullspace contains $\begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}$ and $\begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}$.
*b. The column space contains $\begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}$ and $\begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}$, and the nullspace contains $\begin{bmatrix} 1 \\ 1 \\ 1 \\ 0 \end{bmatrix}$ and $\begin{bmatrix} 1 \\ 0 \\ 1 \\ 1 \end{bmatrix}$.
*c. The column space has basis $\begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}$, and the nullspace contains $\begin{bmatrix} 1 \\ 2 \\ 0 \end{bmatrix}$.
d. The nullspace contains $\begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}$ and $\begin{bmatrix} -1 \\ 2 \\ 1 \end{bmatrix}$, and the row space contains $\begin{bmatrix} 1 \\ 1 \\ -1 \end{bmatrix}$.
*e. The column space has basis $\begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}$, $\begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}$, and the row space has basis $\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}$, $\begin{bmatrix} 2 \\ 0 \\ 1 \end{bmatrix}$.
f. The column space and the nullspace both have basis $\begin{bmatrix} 1 \\ 0 \end{bmatrix}$.
g. The column space and the nullspace both have basis $\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}$.
c. Give a matrix B so that the subspace W defined in part b can be written in the form
W = N(B).
*10. Let A be an m × n matrix with rank r. Suppose A = BU , where U is in echelon form. Prove
that the first r columns of B give a basis for C(A). (In particular, if EA = U , where U is the
echelon form of A and E is the product of elementary matrices by which we reduce A to U ,
then the first r columns of E −1 give a basis for C(A).)
11. According to Proposition 4.10, if A is an m × n matrix, then for each b ∈ C(A), there is a unique x ∈ R(A) with Ax = b. In each case, give a formula for that x.
a. $A = \begin{bmatrix} 1 & 2 & 3 \\ 1 & 2 & 3 \end{bmatrix}$  *b. $A = \begin{bmatrix} 1 & 1 & 1 \\ 0 & 1 & -1 \end{bmatrix}$
♯ 12. Let A be an m × n matrix and B be an n × p matrix. Prove that
a. N(B) ⊂ N(AB).
b. C(AB) ⊂ C(A). (Hint: Use Proposition 4.1.)
c. N(B) = N(AB) when A is n × n and nonsingular.
d. C(AB) = C(A) when B is n × n and nonsingular.
17. Suppose U and V are subspaces of Rn . Prove that (U ∩ V )⊥ = U ⊥ + V ⊥ . (Hint: Use Exercise
1.3.12 and Proposition 4.8.)
18. a. Show that if the m × n matrix A has rank 1, then there are nonzero vectors u ∈ Rm and
v ∈ Rn so that A = uvT . Describe the geometry of the four fundamental subspaces in
terms of u and v.
Pursuing the discussion on p. 174,
b. Suppose A is an m × n matrix of rank n. Show that AᵀA = Iₙ if and only if the column vectors a₁, . . . , aₙ ∈ Rᵐ are mutually orthogonal unit vectors.
c. Suppose A is an m×n matrix of rank 1. Using the notation of part a, show that (S ◦ T )(x) =
x for each x ∈ R(A) if and only if kukkvk = 1. Interpret T geometrically.
d. Can you generalize? (See Exercise 9.1.15.)
§5. The Nonlinear Case: Introduction to Manifolds

We have seen that given a linear subspace V of Rⁿ, we can represent it either explicitly (para-
metrically) as the span of its basis vectors or implicitly as the solution set of a homogeneous system
of linear equations (i.e., the nullspace of an appropriate matrix A). Proposition 4.2 gives a geometric
interpretation of that matrix: its row vectors must span the orthogonal complement of V .
In the nonlinear case, sometimes we are just as fortunate. Given the hyperbola with equation
xy = 1, it is easy to solve (everywhere) explicitly for either x or y as a function of the other. In
the case of the circle x² + y² = 1, we can solve for y as a function of x locally near any point not on the x-axis (viz., $y = \pm\sqrt{1 - x^2}$), and for x as a function of y near any point not on the y-axis (analogously).
But it is important to understand that going back and forth between these two approaches can
be far more difficult—if not impossible—in the nonlinear case. For example, with a bit of luck, we
can see that the parametric curve
$$g(t) = \begin{bmatrix} t^2 - 1 \\ t(t^2 - 1) \end{bmatrix}, \qquad t \in \mathbb{R},$$
is given by the algebraic equation y 2 = x2 (x + 1) (the curve pictured in Figure 1.4(b) on p. 53).
On the other hand, the cycloid, presented parametrically as the image of the function
" #
t − sin t
g(t) = , t ∈ R,
1 − cos t
(see Figure 1.6 on p. 54) is obviously the graph y = f (x) for some function f , but I believe no one
can find f explicitly. Nor is there a function on R2 whose zero-set is the cycloid. Nevertheless, it is
easy to see that locally we can write x as a function of y away from the cusps. On the other hand, consider the curve y² = x³ − x, pictured in Figure 5.1: expressing the entire curve as the graph of one of the variables as a function of the other

Figure 5.1. y² = x³ − x

is impossible. However, as Figure 5.1 suggests, away from the points lying on the x-axis,
we can write y as a function of x (explicitly in this case: $y = \pm\sqrt{x^3 - x}$), and near each of those three points we can write x as a function of y (explicitly only if you know how to solve the cubic equation x³ − x = y² explicitly).
Given the hyperplane a · x = 0 in Rn , we can solve for xn as a function of x1 , . . . , xn−1 —i.e.,
we can represent the hyperplane as a graph over the x1 · · · xn−1 -plane—if and only if an 6= 0 (and,
likewise, we can solve for xk in terms of the remaining variables if and only if ak 6= 0). More
generally, given a system of linear equations, we apply Gaussian elimination and solve for the pivot
variables as functions of the free variables. In particular, as Theorem 4.4 shows, if rank(A) = r,
then we solve for the r pivot variables as functions of the n − r free variables.
Now, since the derivative gives us the best linear approximation of a function, we expect that
if the tangent plane to a surface at a point is a graph, then so locally should be the surface, as
depicted in Figure 5.2. We suggested in Section 4 of Chapter 3 that, given a level surface f = c
of a differentiable function f : Rn → R, the vector ∇f (a)—provided it is nonzero—should be the
normal vector to the tangent plane at a; equivalently, the subspace of Rn parallel to the tangent
plane should be the nullspace of the matrix [Df (a)]. To establish these facts we need the Implicit
Function Theorem, whose proof we delay to Chapter 6.
Theorem 5.1 (Implicit Function Theorem, simple case). Suppose U ⊂ Rⁿ is open, a ∈ U, and f : U → R is C¹. Suppose that f(a) = 0 and $\frac{\partial f}{\partial x_n}(a) \ne 0$. Then there are neighborhoods V of $\overline{a} = \begin{bmatrix} a_1 \\ \vdots \\ a_{n-1} \end{bmatrix}$ and W of aₙ and a C¹ function φ : V → W so that
$$f(x) = 0, \quad \overline{x} = \begin{bmatrix} x_1 \\ \vdots \\ x_{n-1} \end{bmatrix} \in V, \quad \text{and} \quad x_n \in W \iff x_n = \varphi(\overline{x}).$$
That is, near a, the level surface f = 0 can be expressed as a graph over the x₁ · · · xₙ₋₁-plane; i.e., near a, the equation f = 0 defines xₙ implicitly as a function of the remaining variables.

Figure 5.2
More generally, provided Df(a) ≠ 0, we know that some partial derivative $\frac{\partial f}{\partial x_k}(a) \ne 0$, and so locally the equation f = 0 expresses xₖ implicitly as a function of x₁, . . . , xₖ₋₁, xₖ₊₁, . . . , xₙ.
Figure 5.3

For example, consider the curve f(x, y) = y³ − 3y − x = 0, pictured in Figure 5.3. Here
$$\frac{\partial f}{\partial y} = 3(y^2 - 1) = 0 \quad \text{at the points} \quad \pm\begin{bmatrix} -2 \\ 1 \end{bmatrix}.$$
Away from these points, y is given (implicitly) locally as a function of x. We recognize these as the three (C¹) local inverse functions φ₁, φ₂, and φ₃ of g(x) = x³ − 3x, defined, respectively, on the intervals (−2, ∞), (−2, 2), (−∞, 2). ▽
Figure 5.4

Next, consider the surface f(x, y, z) = z² + xz + y = 0, shown in Figure 5.4. Away from points of the form $\begin{bmatrix} -2t \\ t^2 \\ t \end{bmatrix}$ for some t ∈ R, we can locally write $z = \varphi\begin{pmatrix} x \\ y \end{pmatrix}$. Of course, it doesn't take a wizard to do so: we have
$$z = \frac{-x \pm \sqrt{x^2 - 4y}}{2},$$
and away from points of the designated form we can choose either the positive or negative square root. It is along the curve 4y = x² (in the xy-plane) that the two roots of this quadratic equation in z coalesce. (Note that this curve is the projection of the locus of points on the surface where $\frac{\partial f}{\partial z} = 0$.) ▽
∂z
Now we can legitimize (finally) the process of implicit differentiation introduced in beginning calculus classes. Suppose U ⊂ Rⁿ is open, a ∈ U, f : U → R is C¹, and $\frac{\partial f}{\partial x_n}(a) \ne 0$. For convenience here, let's write
$$\overline{x} = \begin{bmatrix} x_1 \\ \vdots \\ x_{n-1} \end{bmatrix}.$$
Lemma 5.2. With the hypotheses above, let φ be the function provided by Theorem 5.1, so that f = 0 defines xₙ = φ(x̄) near a. Then, for j = 1, . . . , n − 1,
$$\frac{\partial \varphi}{\partial x_j}(\overline{a}) = -\,\frac{\partial f/\partial x_j\,(a)}{\partial f/\partial x_n\,(a)}.$$

Proof. By Theorem 5.1, f = 0 defines xₙ implicitly as a C¹ function φ(x̄) near a. Setting $g(\overline{x}) = \begin{bmatrix} \overline{x} \\ \varphi(\overline{x}) \end{bmatrix}$, we have f(g(x̄)) = 0 for all x̄ ∈ V, so by the chain rule
$$\mathbf{0} = D(f \circ g)(\overline{a}) = Df(a)\,Dg(\overline{a}).$$
(Here all the derivatives of φ are evaluated at ā, and all the derivatives of f are evaluated at g(ā) = a.) In particular, for any j = 1, . . . , n − 1, we have
$$\frac{\partial f}{\partial x_j}(a) + \frac{\partial f}{\partial x_n}(a)\,\frac{\partial \varphi}{\partial x_j}(\overline{a}) = 0,$$
from which the result is immediate.
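Lemma 5.2 is easy to test symbolically. The sketch below (our own, using sympy) checks the formula on the surface z² + xz + y = 0 discussed above, on the branch with the positive square root.

```python
import sympy as sp

x, y, z = sp.symbols('x y z')
f = z**2 + x*z + y                     # the surface from the example above

# Implicit differentiation: on a branch z = phi(x, y),
#   d(phi)/dx = -(df/dx)/(df/dz), evaluated at z = phi(x, y).
phi = (-x + sp.sqrt(x**2 - 4*y)) / 2   # branch with the positive square root
lhs = sp.diff(phi, x)
rhs = (-sp.diff(f, x) / sp.diff(f, z)).subs(z, phi)
print(sp.simplify(lhs - rhs))          # 0: the two derivatives agree
```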
We can now establish the promised description of the tangent plane of a level surface: if M = f⁻¹({c}), a ∈ M, and Df(a) ≠ 0, then
$$T_a M = \{x \in \mathbb{R}^n : Df(a)(x - a) = 0\}.$$

Proof. Since Df(a) ≠ 0, we may assume without loss of generality that $\frac{\partial f}{\partial x_n}(a) \ne 0$. Applying Theorem 5.1 to the function f − c, we know that M can be expressed near a as the graph xₙ = φ(x̄) for some C¹ function φ. Now, the tangent plane to the graph xₙ = φ(x̄) at $a = \begin{bmatrix} \overline{a} \\ \varphi(\overline{a}) \end{bmatrix}$ is the graph of Dφ(ā), translated so that it passes through a:
$$x_n - a_n = D\varphi(\overline{a})(\overline{x} - \overline{a}) = \sum_{j=1}^{n-1} \frac{\partial \varphi}{\partial x_j}(\overline{a})(x_j - a_j) = \sum_{j=1}^{n-1} \left( -\frac{\partial f/\partial x_j\,(a)}{\partial f/\partial x_n\,(a)} \right)(x_j - a_j) \qquad \text{(by Lemma 5.2)},$$
as required.
Figure 5.5. The cylinders x² + y² = a² and x² + z² = b²

Consider the curve M of intersection of the two cylinders x² + y² = a² and x² + z² = b² (where a, b > 0 and a ≠ b), pictured in Figure 5.5. If we define F : R³ → R² by
$$\mathbf{F}\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{bmatrix} x^2 + y^2 - a^2 \\ x^2 + z^2 - b^2 \end{bmatrix},$$
then M = F⁻¹({0}). To see that M is a 1-dimensional manifold, we need only check that rank(DF(x)) = 2 for every x ∈ M. We have
$$D\mathbf{F}\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{bmatrix} 2x & 2y & 0 \\ 2x & 0 & 2z \end{bmatrix} = 2\begin{bmatrix} x & y & 0 \\ x & 0 & z \end{bmatrix}.$$
If x ≠ 0, this matrix will have 2 pivots, since y and z can't be simultaneously 0. If x = 0, then both y and z are nonzero, and once again the matrix has 2 pivots. Thus, as claimed, the rank of DF(x) is 2 for every x ∈ M, and so M is a smooth curve. ▽
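The rank computation can be replicated in a couple of lines of sympy (our own sketch; the sample point is one convenient choice on M):

```python
import sympy as sp

x, y, z, a, b = sp.symbols('x y z a b', positive=True)
F = sp.Matrix([x**2 + y**2 - a**2, x**2 + z**2 - b**2])
DF = F.jacobian(sp.Matrix([x, y, z]))
print(DF)                                  # Matrix([[2x, 2y, 0], [2x, 0, 2z]])

# Take the point on M with x = 0 (so y = a, z = b) and check the rank there:
print(DF.subs({x: 0, y: a, z: b}).rank())  # 2, as the argument above predicts
```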
EXERCISES 4.5
1. Can one solve for one of the variables in terms of the other to express each of the following as
a graph? What about locally?
a. xy = 0
b. 2 sin(xy) = 1
2. Decide whether each of the following is a smooth curve (1-dimensional manifold). If not, what
are the trouble points?
a. y 2 − x3 + x = 0
b. y 2 − x3 − x2 = 0
c. z − xy = y − x2 = 0
d. x2 + y 2 + z 2 − 1 = x2 − x + y 2 = 0
e. x2 + y 2 + z 2 − 1 = z 2 − xy = 0
*3. Let
$$f\begin{pmatrix} x \\ y \\ z \end{pmatrix} = xy^2 + \sin(xz) + e^z \quad \text{and} \quad a = \begin{bmatrix} 1 \\ -1 \\ 0 \end{bmatrix}.$$
a. Show that the equation f = 2 defines z as a C¹ function $z = \varphi\begin{pmatrix} x \\ y \end{pmatrix}$ near a.
b. Find $\frac{\partial \varphi}{\partial x}\begin{pmatrix} 1 \\ -1 \end{pmatrix}$ and $\frac{\partial \varphi}{\partial y}\begin{pmatrix} 1 \\ -1 \end{pmatrix}$.
c. Find the equation of the tangent plane of the surface f⁻¹({2}) at a in two ways.
4. Suppose h : R² → R is C¹ and $\frac{\partial h}{\partial x_2} \ne 0$. Show that the equation $h\begin{pmatrix} y/x \\ z/x \end{pmatrix} = 0$ defines z (locally) implicitly as a C¹ function $z = \varphi\begin{pmatrix} x \\ y \end{pmatrix}$, and show that
$$x\,\frac{\partial \varphi}{\partial x} + y\,\frac{\partial \varphi}{\partial y} = \varphi\begin{pmatrix} x \\ y \end{pmatrix}.$$
curve. Find its tangent line at the point $a = \begin{bmatrix} 1 \\ 1 \\ \sqrt{2} \end{bmatrix}$.
9. Show that the set of nonzero singular 2 × 2 matrices is a 3-dimensional manifold in M2×2 = R4 .
10. Consider the curve $f\begin{pmatrix} x \\ y \end{pmatrix} = 4y^3 - 3y - x = 0$.
a. Sketch the curve.
b. Check that y is given (locally) by the following C¹ functions of x on the given intervals:
$$\varphi_1(x) = \tfrac{1}{2}\left( \bigl(x + \sqrt{x^2 - 1}\bigr)^{1/3} + \bigl(x + \sqrt{x^2 - 1}\bigr)^{-1/3} \right), \qquad x \in (1, \infty)$$
$$\varphi_2(x) = \cos\bigl( \tfrac{1}{3}\arccos x \bigr), \qquad x \in (-1, 1)$$
Give the remaining functions (two defined on (−1, 1), one on (−∞, −1)).
c. Show that the function φ : (−1, ∞) → R defined by
$$\varphi(x) = \begin{cases} \varphi_2(x), & x \in (-1, 1) \\ 1, & x = 1 \\ \varphi_1(x), & x \in (1, \infty) \end{cases}$$
is C¹ and that the value of φ′(1) agrees with that given by Lemma 5.2.
12. Suppose f : R³ → R is C² and $\frac{\partial f}{\partial z}(a) \ne 0$, so that f = 0 defines z implicitly as a C² function φ of x and y near a. Show that
$$\frac{\partial^2 \varphi}{\partial x^2}(a) = -\,\frac{\dfrac{\partial^2 f}{\partial z^2}\left(\dfrac{\partial f}{\partial x}\right)^{2} - 2\,\dfrac{\partial^2 f}{\partial x\,\partial z}\,\dfrac{\partial f}{\partial x}\,\dfrac{\partial f}{\partial z} + \dfrac{\partial^2 f}{\partial x^2}\left(\dfrac{\partial f}{\partial z}\right)^{2}}{\left(\dfrac{\partial f}{\partial z}\right)^{3}},$$
where all the partial derivatives on the right-hand side are evaluated at a.
Show that through each point of ℓ3 there is a single line that intersects both ℓ1 and ℓ2 . Now,
find the equation of the surface formed by all the lines intersecting the three lines ℓ1 , ℓ2 , and
ℓ3 . Is it everywhere smooth? Sketch it.
15. a. Suppose A is an n×(n+1) matrix of rank n. Show that the one-dimensional solution space
of Ax = b varies continuously with b ∈ Rn . (First you must decide what this means!)
b. Generalize.
CHAPTER 5
Extremum Problems
In this chapter we turn to one of the standard topics in differential calculus, solving maxi-
mum/minimum problems. In single-variable calculus, the strategy is to invoke the Maximum Value
Theorem (which guarantees that a continuous function on a closed interval achieves its maximum
and minimum) and then to examine all critical points and the endpoints of the interval. In problems
that are posed on open intervals, one must work harder to understand the global behavior of the
function. For example, it is not too hard to prove that if a differentiable function has precisely one
critical point on an interval and that critical point is a local maximum point, then it must indeed
be the global maximum point. As we shall see, all of these issues are—not surprisingly—rather
more subtle in higher dimensions. But just to stimulate the reader’s geometric intuition, we pose
a direct question here.
Query: Suppose f : R2 → R is C1 and there is exactly one point a at which the
tangent plane of the graph of f is horizontal. Suppose a is a local minimum point.
Must it be a global minimum point?
We close the chapter with a discussion of projections and inconsistent linear systems, along with a
brief treatment of inner product spaces.
§1. Compactness and the Maximum Value Theorem

In Section 2 of Chapter 2 we introduced the basic topological notions of open and closed sets
and sequences. Here we return to a few more questions of the topology of Rn in order to frame the
higher-dimensional version of the Maximum Value Theorem. Let’s begin by reminding ourselves
why a closed interval is needed in the case of a continuous function of one variable: As Figure 1.1 illustrates, when an endpoint is missing or the interval extends to infinity, the function may have no maximum value.

Figure 1.1

We now make the "obvious" definition in higher dimensions:
Definition. We say S ⊂ Rⁿ is bounded if all the points of S lie in some ball centered at the origin, i.e., if there is a constant M so that ‖x‖ ≤ M for all x ∈ S. We say S ⊂ Rⁿ is compact if it is a bounded, closed subset. That is, all the points of S lie in some ball centered at the origin, and any convergent sequence of points in S converges to a point in S.
Examples 1. We saw in Example 6 of Chapter 2, Section 2 that a closed interval in R is a
closed subset, and it is obviously bounded, so it is in fact compact. Here are a few more examples.
(a) The unit sphere S n−1 = {x ∈ Rn : kxk = 1} is compact. Indeed, by Corollary 3.7 of Chapter
2, any level set of a continuous function is closed, so provided we have a bounded set, it will
also be compact. (Note that we write S n−1 because the sphere is an (n − 1)-dimensional
manifold, as Exercise 4.5.5 shows.)
(b) Any rectangle [a1 , b1 ] × · · · × [an , bn ] ⊂ Rn is compact. This set is obviously bounded, and
is closed because of Exercise 2.2.4.
(c) The set of 2 × 2 matrices of determinant 1 is a closed subset of R⁴ (because the determinant is a polynomial expression in the entries of the matrix), but is not compact. The set is unbounded, as we can take matrices of the form $\begin{bmatrix} k & 0 \\ 0 & 1/k \end{bmatrix}$ for arbitrarily large k. ▽
One of the most important features of a compact set is the following
Theorem 1.1. If X ⊂ Rⁿ is compact and {xₖ} is a sequence of points in X, then there is a
convergent subsequence {xkj } (which a fortiori converges to a point in X).
Proof. We first prove that any sequence of points in a rectangle [a1 , b1 ]× · · · × [an , bn ] ⊂ Rn has
a convergent subsequence. (This was the result of Exercise 2.2.15, but the argument is sufficiently
subtle that we include the proof here.) We proceed by induction on n.
Step (i): Suppose n = 1. Given a sequence {xk } of real numbers with a ≤ xk ≤ b for all k, we
claim that there is a convergent subsequence. If there are only finitely many distinct numbers xk ,
this is easy: at least one value must be taken on infinitely often, and we choose k1 < k2 < . . . so
that xk1 = xk2 = . . ..
If there are infinitely many distinct numbers among the xk , then we use the famous “successive
bisection” argument. Let I0 = [a, b]. There must be infinitely many distinct elements of our
sequence either to the left of the midpoint of I0 or to the right; let I1 = [a1 , b1 ] be the half that
contains infinitely many (if both do, let’s agree to choose the left half). Choose xk1 ∈ I1 . At
the next step, there must be infinitely many distinct elements of our sequence either to the left
or to the right of the midpoint of I1 . Let I2 = [a2 , b2 ] be the half that contains infinitely many
(and choose the left half if both do), and choose xk2 ∈ I2 with k1 < k2 . Continue this process
inductively. Suppose we have the interval Ij = [aj , bj ] containing infinitely many distinct elements
of our sequence, as well as k1 < k2 < · · · < kj with xkℓ ∈ Iℓ for ℓ = 1, 2, . . . , j. Then there must be
infinitely many distinct elements of our sequence either to the left or to the right of the midpoint of
the interval Ij , and we let Ij+1 = [aj+1 , bj+1 ] be the half that contains infinitely many (once again
choosing the left half if both do). We also choose xkj+1 ∈ Ij+1 with kj < kj+1 .
At the end of all this, why does the subsequence {xkj } converge? Well, in fact, we know what
its limit must be. The set of left endpoints aj is nonempty and bounded above by b, hence has a
least upper bound, α. First of all, the left endpoints aj must converge to α, because (see Figure
1.2)
a1 ≤ a2 ≤ · · · ≤ aj ≤ · · · ≤ α ≤ · · · ≤ bj ≤ · · · ≤ b2 ≤ b1 ,
[Figure 1.2: the nested intervals I₁ ⊃ I₂ ⊃ I₃ ⊃ I₄, whose left endpoints increase to α]
and so α − a_j ≤ b_j − a_j = (b − a)/2^j → 0 as j → ∞. But since α and x_{k_j} both lie in the interval [a_j, b_j], it follows that |α − x_{k_j}| ≤ b_j − a_j → 0 as j → ∞.
Step (ii): Suppose now n ≥ 2 and we know the result to be true in Rⁿ⁻¹. We introduce some notation: given x = (x₁, …, xₙ) ∈ Rⁿ, we write x̄ = (x₁, …, x_{n−1}) ∈ Rⁿ⁻¹. Given a sequence {x_k} of points in the rectangle [a₁, b₁] × ⋯ × [aₙ, bₙ] ⊂ Rⁿ, consider the sequence {x̄_k} of points in the rectangle [a₁, b₁] × ⋯ × [a_{n−1}, b_{n−1}] ⊂ Rⁿ⁻¹. By our induction hypothesis, there is a convergent subsequence {x̄_{k_j}}. Now the sequence of nth coordinates of the corresponding vectors x_{k_j}, lying in the closed interval [aₙ, bₙ], has in turn a convergent subsequence, indexed by k_{j₁} < k_{j₂} < ⋯ < k_{j_ℓ} < ⋯. But then, by Exercises 2.2.6 and 2.2.2, it now follows that the subsequence {x_{k_{j_ℓ}}} converges, as required.
Step (iii): Now we turn to the case of our general compact subset X. Since it is bounded, it is contained in some ball B(0, R) centered at the origin, hence in some cube [−R, R] × ⋯ × [−R, R]. Thus, given a sequence {x_k} of points in X, it lies in this rectangle, and hence by what we’ve already proved has a convergent subsequence. The limit of that subsequence is, of course, a point of the rectangle, but must in fact lie in X, since X is also closed. This completes the proof.
The result that is the cornerstone of our work in this chapter is the following.
Theorem 1.2 (Maximum Value Theorem). Let X ⊂ Rⁿ be compact, and let f : X → R be continuous.¹ Then f takes on its maximum and minimum values on X.
Proof. First we show that f is bounded (by which we mean that the set of its values is a bounded subset of R). Assume to the contrary that the values of f are arbitrarily large. Then for each k ∈ N there is a point x_k ∈ X so that f(x_k) > k. By Theorem 1.1, since X is compact, the sequence {x_k} has a convergent subsequence, say x_{k_j} → a. Since f is continuous, by Proposition 3.6 of Chapter 2, f(a) = lim_{j→∞} f(x_{k_j}), but this is impossible, since f(x_{k_j}) → ∞ as j → ∞. An identical argument shows that the values of f are bounded below as well.
¹Although we have not heretofore defined continuity of a function defined on an arbitrary subset of Rⁿ, there is no serious problem. We say f : X → R is continuous at a ∈ X if, given any ε > 0, there is δ > 0 so that |f(x) − f(a)| < ε whenever ‖x − a‖ < δ and x ∈ X.
Since the set of values of f is bounded above, it has a least upper bound, M. By the definition of least upper bound, for each k ∈ N there is x_k ∈ X so that M − f(x_k) < 1/k. As before, since X is compact, the sequence {x_k} has a convergent subsequence, say x_{k_j} → z. Then, by continuity, f(z) = lim_{j→∞} f(x_{k_j}) = M, so f takes on its maximum value at z. An identical argument shows that f takes on its minimum value as well.
We infer from Theorem 1.2 that, given any linear map T : Rⁿ → Rᵐ, the function
f : Sⁿ⁻¹ → R,  f(x) = ‖T(x)‖,
which is continuous (see Exercises 2.3.2 and 2.3.7 and Proposition 3.5 of Chapter 2), takes on its maximum value, which we denote by ‖T‖, called the norm of T:
‖T‖ = max_{‖x‖=1} ‖T(x)‖.
It follows that for every x ∈ Rⁿ we have ‖T(x)‖ ≤ ‖T‖‖x‖. Moreover, for any scalar c we have ‖cT‖ = |c|‖T‖; and if S : Rⁿ → Rᵐ is another linear map, we have ‖S + T‖ ≤ ‖S‖ + ‖T‖. Indeed,
‖S + T‖ = max_{‖x‖=1} ‖(S + T)(x)‖ ≤ max_{‖x‖=1} (‖S(x)‖ + ‖T(x)‖) ≤ max_{‖x‖=1} ‖S(x)‖ + max_{‖x‖=1} ‖T(x)‖ = ‖S‖ + ‖T‖.
We will compute a few nontrivial examples of the norm of a linear map in the Exercises of Section 4, but in the meantime we have the following.
If A is the diagonal n × n matrix with diagonal entries d₁, …, dₙ, then for any unit vector x we have
‖Ax‖² = (d₁x₁)² + (d₂x₂)² + ⋯ + (dₙxₙ)² ≤ max(d₁², d₂², …, dₙ²)(x₁² + ⋯ + xₙ²) = max(d₁², d₂², …, dₙ²).
Note, moreover, that this maximum value is achieved, for if max(|d₁|, |d₂|, …, |dₙ|) = |dᵢ|, then Aeᵢ = dᵢeᵢ and ‖Aeᵢ‖ = |dᵢ|. Thus, we conclude that ‖A‖ = max(|d₁|, |d₂|, …, |dₙ|).
For future reference, we include the following important and surprising result: a continuous function on a compact set is in fact uniformly continuous. That is, if X ⊂ Rⁿ is compact and f : X → R is continuous, then given any ε > 0, there is δ > 0 so that |f(x) − f(y)| < ε whenever x, y ∈ X and ‖x − y‖ < δ.
Proof. We argue by contradiction. Suppose that for some ε₀ > 0 there were no such δ > 0. Then for every m ∈ N, we could find x_m, y_m ∈ X with ‖x_m − y_m‖ < 1/m and |f(x_m) − f(y_m)| ≥ ε₀. Since X is compact, we may choose a convergent subsequence x_{m_k} → a. Now since ‖x_m − y_m‖ → 0 as m → ∞, it must be the case that y_{m_k} → a as well. Since f is continuous at a, given ε₀ > 0, there is δ₀ > 0 so that whenever ‖x − a‖ < δ₀, we have |f(x) − f(a)| < ε₀/2. By the triangle inequality, whenever k is sufficiently large that ‖x_{m_k} − a‖ < δ₀ and ‖y_{m_k} − a‖ < δ₀, we have
|f(x_{m_k}) − f(y_{m_k})| ≤ |f(x_{m_k}) − f(a)| + |f(a) − f(y_{m_k})| < ε₀/2 + ε₀/2 = ε₀,
contradicting the fact that |f(x_m) − f(y_m)| ≥ ε₀ for all m.
EXERCISES 5.1
*1. Which of the following are compact subsets of the given Rⁿ? Give your reasoning. (Identify the space of all n × n matrices with Rⁿ².)
a. {(x, y) ∈ R² : x² + y² = 1}
b. {(x, y) ∈ R² : x² + y² ≤ 1}
c. {(x, y) ∈ R² : x² − y² = 1}
d. {(x, y) ∈ R² : x² − y² ≤ 1}
e. {(x, y) ∈ R² : y = sin(1/x) for some 0 < x ≤ 1}
f. {(eᵗ cos t, eᵗ sin t) ∈ R² : t ∈ R}
g. {(eᵗ cos t, eᵗ sin t) ∈ R² : t ≤ 0}
h. {(x, y, z) ∈ R³ : x² + y² + z² ≤ 1}
i. {(x, y, z) ∈ R³ : x³ + y³ + z³ ≤ 1}
j. {3 × 3 matrices A : det A = 1}
k. {2 × 2 matrices A : AᵀA = I}
l. {3 × 3 matrices A : AᵀA = I}
2. If X ⊂ Rⁿ is not compact, then show that there is an unbounded continuous function f : X → R.
3. Let T : Rⁿ → R be a linear map. Prove that there is a vector a ∈ Rⁿ so that T(x) = a · x, and deduce that ‖T‖ = ‖a‖.
4. Find ‖A‖ if
*a. A = [1 1; 1 1]   b. A = [3 4; 3 4]
♯5. Suppose A is an m × n matrix. Prove that ‖A‖ ≤ (Σ_{i,j} a_{ij}²)^{1/2} ≤ √n ‖A‖.
7. Let A be an m × n matrix. Show that ‖A‖ = ‖Aᵀ‖. (Hint: Start by showing that ‖A‖ ≤ ‖Aᵀ‖ by using Proposition 4.5 of Chapter 1.)
*9. Suppose S ⊂ Rⁿ has the property that any sequence of points in S has a subsequence converging to a point in S. Prove that S is compact.
10. Suppose f : X → Rᵐ is continuous and X is compact. Prove that the set f(X) = {y ∈ Rᵐ : y = f(x) for some x ∈ X} is compact. (Hint: Use Exercise 9.)
13. Suppose X ⊂ Rⁿ is compact, and U₁, U₂, U₃, … ⊂ Rⁿ are open sets whose union contains X. Prove that there is a number δ > 0 so that for every x ∈ X, there is some j ∈ N so that B(x, δ) ⊂ U_j. (Hint: If not, for each k ∈ N, what happens with δ = 1/k?)
2. Maximum/Minimum Problems
From this we infer that 0 is a global minimum point. Indeed, (x + y)² + 2y² = 0 if and only if x + y = y = 0, if and only if x = y = 0, so 0 is the only global minimum point of f. But is 0 the only extremum? ▽
Lemma 2.1. Suppose f is defined on some neighborhood of the extremum a and f is differentiable at a. Then Df(a) = O (or, equivalently, ∇f(a) = 0).
Proof. Suppose that a is a local minimum (the case of a local maximum is left to the reader). Then for any v ∈ Rⁿ, there is δ > 0 so that f(a + tv) ≥ f(a) whenever |t| < δ. Thus, the differentiable function g(t) = f(a + tv) has a local minimum at t = 0, and so 0 = g′(0) = Df(a)v. Since this holds for every v ∈ Rⁿ, we conclude that Df(a) = O.
Remark. Geometrically, if we consider f as a function of x_i only, fixing all the other variables, we get a curve with a local minimum at a_i, which must therefore have a flat tangent line. That is, all partial derivatives of f at a must be 0, and so the tangent plane must be horizontal.
Figure 2.1
In Section 3 we will devise a second-derivative test to attempt to distinguish among local maxima, local minima, and saddle points, typical ones of which are shown in Figure 2.1. In the sketch in Figure 2.2(a), we cannot tell whether we are at a local maximum or a local minimum; however, in (b) and (c) we strongly suspect a saddle point.
[Figure 2.2]
Example 3. The prototypical example of a saddle point is provided by the function f(x, y) = x² − y². The origin is a critical point, and clearly f(x, 0) = x² > 0 for x ≠ 0 and f(0, y) = −y² < 0 for y ≠ 0. In the graph we see parabolas opening upwards in the x-direction and those opening downwards in the y-direction (see Figure 2.3(a)).
A somewhat more interesting example is provided by the so-called monkey saddle, pictured in Figure 2.3(b), which is the graph of f(x, y) = 3xy² − x³. Note that whereas the usual saddle surface allows room for the legs, in the case of the monkey saddle, there is also room for the monkey’s tail. ▽
[Figure 2.3: (a) the saddle z = x² − y²; (b) the monkey saddle z = 3xy² − x³]
Now we turn to the standard fare in differential calculus, the typical “applied extremum prob-
lems.” If we are fortunate enough to have a differentiable function on a compact region X, then
the Maximum Value Theorem guarantees both a global maximum and a global minimum, and we
can test for critical points on the interior of X (points having a neighborhood wholly contained in
X). It still remains to examine the function on the boundary of X, as well.
[Figure 2.4: the square [0, π] × [0, π] with edges C₁ (bottom), C₂ (right), C₃ (top), and C₄ (left); the values of f at the nine marked points are 1, 2, 1 (top row), −1, 0, −1 (middle row), and 1, 2, 1 (bottom row)]
x ∈ [0, π], which achieves a maximum at π/2 and minima at 0 and π. Similarly, on C₂ and C₄ we have f(0, y) = f(π, y) = cos 2y, y ∈ [0, π], which achieves its maximum at 0 and π and its minimum at π/2. We now mark the values of f at the nine points we’ve unearthed. We see that the hottest points are (π/2, 0) and (π/2, π) and the coldest points are (0, π/2) and (π, π/2). On the other hand, the critical point at the center of the square is a saddle point (why?). ▽
Somewhat more challenging are extremum problems where the domain is not naturally compact.
Consider the following
Example 5. Of all rectangular boxes with no lid and having a volume of 4 m³, we wish to determine the dimensions of the one with least total surface area. Let x, y, and z represent the length, width, and height of the box, respectively, measured in meters. Given that xyz = 4, we wish to minimize the surface area xy + 2z(x + y). Substituting z = 4/(xy), we then define the surface area as a function of the independent variables x and y:
f(x, y) = xy + (8/(xy))(x + y) = xy + 8(1/x + 1/y).
[Figure 2.5: (a) the open box with dimensions x, y, z; (b) the region S bounded by x = 1/2, y = 1/2, x = 24, and the hyperbola xy = 12]
Note that the domain of f is the open first quadrant, i.e., X = {(x, y) : x > 0 and y > 0}, which
is definitely not compact. What guarantees that our function f achieves a minimum value on X?
(Note, for example, that f has no maximum value on X.) The heuristic answer is this: if either x
or y gets either very small or very large, the value of f gets very large. We shall make this precise
soon.
Let’s first of all find the critical points of f. We have
Df(x, y) = [y − 8/x²   x − 8/y²],
so at a critical point we must have
y − 8/x² = x − 8/y² = 0,
whence x = y = 2. The sole critical point is a = (2, 2), and f(a) = 12. Now it is not difficult to establish the fact that a is the global minimum point of f. Let
S = {(x, y) : 1/2 ≤ x ≤ 24, 1/2 ≤ y ≤ 12/x},
EXERCISES 5.2
1. Find all the critical points of the following scalar functions:
*a. f(x, y) = x² + 3x − 2y + 4y²
b. f(x, y) = xy + x − y
c. f(x, y) = sin x + sin y
d. f(x, y) = x² − 3x²y + y³
e. f(x, y) = x²y + x³ − x² + y²
f. f(x, y) = (x² + y²)e^(−y)
*g. f(x, y) = (x − y)e^(−(x²+y²)/4)
h. f(x, y) = x²y − 4xy − y²
*i. f(x, y, z) = xyz − x² − y² + z²
j. f(x, y, z) = x³ + xz² − 3x² + y² + 2z²
k. f(x, y, z) = e^(−(x²+y²+z²)/6)(x − y + z)
l. f(x, y, z) = xyz − x² − y² − z²
2. A rectangular box with edges parallel to the coordinate axes has one corner at the origin and
the opposite corner on the plane x + 2y + 3z = 6. What is the maximum possible volume of
the box?
*3. A rectangular box is inscribed in a hemisphere of radius r. Find the dimensions of the box of
maximum volume.
*4. The temperature of the circular plate D = {x ∈ R² : ‖x‖ ≤ √2} is given by the function f(x, y) = x² + 2y² − 2x. Find the maximum and minimum values of the temperature on D.
5. Two non-overlapping rectangles with their sides parallel to the coordinate axes are inscribed in the triangle with vertices at (0, 0), (1, 0), and (0, 1). What configuration will maximize the sum of their areas?
*6. A post office employee has 12 ft² of cardboard from which to construct a rectangular box with no lid. Find the dimensions of the box with largest possible volume.
7. Show that the rectangular box of maximum volume with a given surface area is a cube.
8. The material for the sides of a rectangular box costs twice as much per ft² as that for the top and bottom. Find the relative dimensions of the box with greatest volume that can be constructed for a given cost.
9. Find the equation of the plane through the point (1, 2, 2) that cuts off the smallest possible volume in the first octant.
*10. A long flat piece of sheet metal, 12″ wide, is to be bent to form a long trough whose cross-section is an isosceles trapezoid. Find the shape of the trough with maximum cross-sectional area. (Hint: It will help to use an angle as one of your variables.)
11. A pentagon is formed by placing an isosceles triangle atop a rectangle. If the perimeter P of
the pentagon is fixed, find the dimensions of the rectangle and the height of the triangle that
give the pentagon of maximum area.
12. An ellipse is formed by intersecting the cylinder x² + y² = 1 and the plane x + 2y + z = 0. Find the highest and lowest points on the ellipse. (As usual, the z-axis is vertical.)
13. Suppose x, y, and z are positive numbers with xy²z³ = 108. Find (with proof) the minimum value of their sum.
15. (Cf. Exercise 14.) Let a₁, a₂, a₃ ∈ R² be three noncollinear points. Show that the function
f(x) = Σ_{j=1}³ ‖x − a_j‖
has a global minimum and characterize the global minimum point. (Hint: Your answer will be geometric in nature. Can you give an explicit geometric construction?)
3. Quadratic Forms and the Second Derivative Test
Just as the second derivative test in single-variable calculus often allows us to distinguish between local minima and local maxima, there is something quite analogous in the multivariable case, to which we now turn. Of course, even with just one variable, if f′(a) = f″(a) = 0, we do not have enough information, and we need higher derivatives to infer the local behavior of f at a; lying behind this is the theory of the Taylor polynomial, which works analogously in the multivariable case. In the interest of time, however, we shall content ourselves here with just the second derivative.
First, we need a one-variable generalization of the Mean Value Theorem. (In truth, it is Taylor’s Theorem with Remainder for the first-degree Taylor polynomial. See Chapter 20 of Spivak.)
Lemma 3.1. Suppose g is twice differentiable on an interval containing [0, 1]. Then there is ξ ∈ (0, 1) so that
g(1) = g(0) + g′(0) + ½ g″(ξ).
Proof. Define the polynomial P by P(t) = g(0) + g′(0)t + Ct², where C = g(1) − g(0) − g′(0). This choice of C makes P(1) = g(1), and it is easy to see that P(0) = g(0) and P′(0) = g′(0) as well, as shown in Figure 3.1. Then the function h = g − P satisfies h(0) = h′(0) = h(1) = 0.
[Figure 3.1]
By Rolle’s Theorem, since h(0) = h(1) = 0, there is c ∈ (0, 1) so that h′(c) = 0. By Rolle’s Theorem applied to h′, since h′(0) = h′(c) = 0, there is ξ ∈ (0, c) so that h″(ξ) = 0. This means that g″(ξ) = P″(ξ) = 2C, and so
g(1) = g(0) + g′(0) + ½ g″(ξ),
as required.
The derivative in the multivariable setting becomes a linear map (or vector); as we shall soon
see, the second derivative should become a quadratic form, i.e., a quadratic function of a vector
variable.
The n × n matrix of second-order partial derivatives,
Hess(f)(a) = [∂²f/∂x_i∂x_j (a)],
is called the Hessian matrix of f at a. Define the associated quadratic form H_{f,a} : Rⁿ → R by
H_{f,a}(h) = hᵀ Hess(f)(a) h = Σ_{i,j=1}ⁿ ∂²f/∂x_i∂x_j (a) h_i h_j.
Now we are in a position to state the generalization of Lemma 3.1 to functions of several variables. This will enable us to deduce the appropriate second derivative test for extrema.
Proposition 3.2. Suppose f : B(a, r) → R is C². Then for all h with ‖h‖ < r we have
f(a + h) = f(a) + Df(a)h + ½ H_{f,a+ξh}(h) for some 0 < ξ < 1.
Consequently,
f(a + h) = f(a) + Df(a)h + ½ H_{f,a}(h) + ǫ(h), where ǫ(h)/‖h‖² → 0 as h → 0.
Remark. Just as the derivative gives the best linear approximation to f at a, so adding the quadratic term ½ H_{f,a}(h) gives the best possible quadratic approximation to f at a. This is the second-degree Taylor polynomial of f at a. For further reading on multivariable Taylor polynomials, consult, e.g., C. H. Edwards’ Advanced Calculus of Several Variables or Hubbard and Hubbard’s Vector Calculus, Linear Algebra, and Differential Forms: A Unified Approach.
Proof. We apply Lemma 3.1 to the function g(t) = f(a + th). Using the chain rule twice (and applying Theorem 6.1 of Chapter 3 as well), we have
g′(t) = Df(a + th)h = Σ_{i=1}ⁿ ∂f/∂x_i (a + th) h_i and
g″(t) = Σ_{i=1}ⁿ Σ_{j=1}ⁿ ∂²f/∂x_j∂x_i (a + th) h_j h_i = Σ_{i,j=1}ⁿ ∂²f/∂x_i∂x_j (a + th) h_i h_j = H_{f,a+th}(h).
Now substitution yields the first result.
Since f is C², given any ε > 0, there is δ > 0 so that whenever ‖v‖ < δ we have
‖Hess(f)(a + v) − Hess(f)(a)‖ < ε.
Using the Cauchy-Schwarz inequality, Proposition 2.3 of Chapter 1, and Proposition 1.3, we find that |hᵀAh| ≤ ‖A‖‖h‖². So whenever ‖h‖ < δ, we have, for any 0 < ξ < 1,
|H_{f,a+ξh}(h) − H_{f,a}(h)| < ε‖h‖².
By definition, ǫ(h) = ½(H_{f,a+ξh}(h) − H_{f,a}(h)), so
|ǫ(h)|/‖h‖² = |H_{f,a+ξh}(h) − H_{f,a}(h)|/(2‖h‖²) < ε/2
whenever ‖h‖ < δ. Since ε > 0 was arbitrary, this proves the result.
Examples 1.
(a) The quadratic form Q(x) = x₁² + 4x₁x₂ + 5x₂² = xᵀ[1 2; 2 5]x is positive definite, as we see by completing the square:
x₁² + 4x₁x₂ + 5x₂² = (x₁ + 2x₂)² + x₂²,
which, being the sum of two squares (with positive coefficients), is nonnegative and can vanish only if x₂ = x₁ + 2x₂ = 0, i.e., only if x = 0.
(b) The quadratic form Q(x) = x₁² + 2x₁x₂ − x₂² = xᵀ[1 1; 1 −1]x is indefinite, as we can see either by completing the square or merely by observing that Q(t, 0) = t² > 0 and Q(0, t) = −t² < 0 for t ≠ 0.
(c) The quadratic form Q(x) = x₁² + 2x₁x₂ + 2x₂² + 2x₁x₃ + 2x₃² = xᵀ[1 1 1; 1 2 0; 1 0 2]x is, however, positive semidefinite, for
Q(x) = (x₁ + x₂ + x₃)² + (x₂ − x₃)² ≥ 0,
and yet Q vanishes on the nonzero vectors with x₂ = x₃ and x₁ = −2x₂.
Since a is a critical point, Df(a) = O, and so Proposition 3.2 gives
f(a + h) − f(a) = ½ H_{f,a}(h) + ǫ(h), where |ǫ(h)|/‖h‖² < ε whenever ‖h‖ < δ.
Suppose now that H_{f,a} is positive definite. By the Maximum Value Theorem, Theorem 1.2, there is a number m > 0 so that H_{f,a}(x) ≥ m for all unit vectors x. This means that H_{f,a}(h) ≥ m‖h‖² for all h. So now, choosing ε = m/4, we have
f(a + h) − f(a) = ½ H_{f,a}(h) + ǫ(h) ≥ (m/2)‖h‖² − (m/4)‖h‖² = (m/4)‖h‖² > 0
for all h ≠ 0 with ‖h‖ < δ. This means that a is a local minimum, as desired. The negative definite case is analogous.
Now suppose H_{f,a} is indefinite. Then there are unit vectors x and y so that H_{f,a}(x) = m₁ > 0 and H_{f,a}(y) = m₂ < 0. Choose ε = ¼ min(m₁, −m₂). Now, letting h = tx (resp., ty) with |t| < δ, we see that
f(a + tx) − f(a) ≥ (m₁/2)t² − (m₁/4)t² > 0 and f(a + ty) − f(a) ≤ (m₂/2)t² − (m₂/4)t² < 0
for t ≠ 0, so f takes values both larger and smaller than f(a) arbitrarily near a, and a is a saddle point.
Proof. This is just the usual process of completing the square: when A ≠ 0,
Ax² + 2Bxy + Cy² = A(x + (B/A)y)² + (C − B²/A)y² = A(x + (B/A)y)² + ((AC − B²)/A)y²,
so the quadratic form is positive definite when A > 0 and AC − B² > 0, negative definite when A < 0 and AC − B² > 0, and indefinite when AC − B² < 0. When A = 0, we have 2Bxy + Cy² = y(2Bx + Cy), and so the quadratic form is indefinite provided B ≠ 0, i.e., provided AC − B² < 0.
Example 2. Let’s find and classify the critical points of the function f : R² → R, f(x, y) = x³ + y² − 6xy. Then
Df(x, y) = [3x² − 6y   2y − 6x],
and so at a critical point we must have 3x² = 6y and 2y = 6x, whence x² = 2y = 6x and x = 0 or x = 6. Thus, the critical points are a = (0, 0) and b = (6, 18).
Now, we calculate the Hessian:
Hess(f)(x, y) = [6x −6; −6 2],
and so
Hess(f)(0, 0) = [0 −6; −6 2] and Hess(f)(6, 18) = [36 −6; −6 2].
We see that H_{f,a} is indefinite, so a is a saddle point, and H_{f,b} is positive definite, so b is a local minimum point. ▽
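We can confirm the classification in Example 2 numerically. In this sketch of ours (not from the text), we evaluate the Hessian at each critical point and look at the signs of its eigenvalues; eigenvalues of mixed sign correspond to an indefinite form, and all-positive eigenvalues to a positive definite one, equivalent to the tests of this section.

```python
import numpy as np

def hess(x, y):
    # Hessian of f(x, y) = x^3 + y^2 - 6xy from Example 2
    return np.array([[6.0 * x, -6.0], [-6.0, 2.0]])

for point in [(0.0, 0.0), (6.0, 18.0)]:
    print(point, np.linalg.eigvalsh(hess(*point)))
# (0, 0):  eigenvalues of mixed sign -> saddle point
# (6, 18): both eigenvalues positive -> local minimum
```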
The process of completing the square as we’ve done in Examples 1 can be couched in matrix
language; indeed, it is intimately related to the reduction to echelon form, as we shall now see.
Example 3. Consider the symmetric matrix
A = [1 3 2; 3 4 −4; 2 −4 −10].
Doing the first stage of elimination, we obtain
A′ = [1 3 2; 0 −5 −10; 0 −10 −14] = E₁A, where E₁ = [1 0 0; −3 1 0; −2 0 1], and so A = E₁⁻¹A′ with E₁⁻¹ = [1 0 0; 3 1 0; 2 0 1].
There are already two interesting observations to make: the first column of E₁⁻¹ is the transpose of the first row of A (hence of A′); and if we remove the first row and column from A′, what’s left is also symmetric. Indeed, we can write
A = [1; 3; 2][1 3 2] + [0 0 0; 0 −5 −10; 0 −10 −14];
since the first term is symmetric (why?), the latter term must be as well. Now we just continue, subtracting twice the second row from the third:
A′ = [1 3 2; 0 −5 −10; 0 −10 −14] → [1 3 2; 0 −5 −10; 0 0 6] = U.
Writing U = DLᵀ with D = [1 0 0; 0 −5 0; 0 0 6] and Lᵀ = [1 3 2; 0 1 2; 0 0 1], we arrive at A = LDLᵀ, where L is lower triangular with 1’s on the diagonal and D is the diagonal matrix of pivots.
Remark. Of course, not every symmetric matrix can be written in the form LDLᵀ; e.g., take A = [0 1; 1 0]. The problem arises when we have to switch rows to get pivots in the appropriate
places. Nevertheless, by doing appropriate row operations together with the companion column
operations (to maintain symmetry), one can show that every symmetric matrix can be written in
the form EDE T , where E is the product of elementary matrices with only 1’s on the diagonal
(i.e., elementary matrices of type (iii)). See Exercise 8b for the example of the matrix A given just
above.
Proposition 3.5. Suppose A is a symmetric matrix with associated quadratic form Q. Suppose
A = LDLT , where L is lower triangular with 1’s on the diagonal and D is diagonal. If all the entries
of D are positive (resp., negative), then Q is positive (resp., negative) definite; if all the entries of
D are nonnegative (resp., nonpositive) and at least one is 0, then Q is positive (resp., negative)
semidefinite; and if entries of D have opposite sign, then Q is indefinite.
Conversely, if Q is positive (resp., negative) definite, then there are a lower triangular matrix
L with 1’s on the diagonal and a diagonal matrix D with positive (resp., negative) entries so that
A = LDLT . If Q is semidefinite (resp., indefinite), the matrix EAE T (where E is a suitable product
of elementary matrices of type (iii)) can be written in the form LDLT , where now there is at least
one 0 (resp., real numbers of opposite sign) on the diagonal of D.
Sketch of proof. Suppose A = LDLᵀ, where L is lower triangular with 1’s on the diagonal (or, more generally, A = EDEᵀ, where E is invertible). Let d₁, …, dₙ be the diagonal entries of the diagonal matrix D. Letting y = Lᵀx, we have
Q(x) = xᵀAx = xᵀ(LDLᵀ)x = (Lᵀx)ᵀD(Lᵀx) = yᵀDy = Σ_{i=1}ⁿ d_i y_i².
Realizing that y = 0 ⟺ x = 0, the conclusions of the first part of the Proposition are now evident.
Suppose Q is positive definite. Then, in particular, Q(e₁) = a₁₁ > 0, so we can write
A = a₁₁ [1; a₁₂/a₁₁; ⋯; a₁ₙ/a₁₁][1  a₁₂/a₁₁  ⋯  a₁ₙ/a₁₁] + Ã,
where Ã has first row and first column zero and lower right (n − 1) × (n − 1) block B, where B is also symmetric and the quadratic form on Rⁿ⁻¹ associated to B is likewise positive definite. We now continue by induction. (For example, if the upper left entry of B were 0, this would mean that Q(a₁₂e₁ − a₁₁e₂) = 0, contradicting the hypothesis that Q is positive definite.)
EXERCISES 5.3
6. For each of the following symmetric matrices A, write A = LDLᵀ as in Example 3. Use your answer to determine whether the associated quadratic form Q given by Q(x) = xᵀAx is positive definite, negative definite, indefinite, etc.
a. A = [1 3; 3 13]
b. A = [2 3; 3 4]
²We’ve seen several textbooks that purportedly prove Theorem 3.3 by showing, for example, that if H_{f,a} is positive definite, then the restriction of f to any line through a has a local minimum at a, and then concluding that a must be a local minimum point of f. We hope that this exercise will convince you that such a proof must be flawed.
*c. A = [2 2 −2; 2 −1 4; −2 4 1]
d. A = [1 −2 2; −2 6 −6; 2 −6 9]
e. A = [1 1 −3 1; 1 0 −3 0; −3 −3 11 −1; 1 0 −1 2]
7. Suppose A = LDU , where L is lower triangular with 1’s on the diagonal, D is diagonal, and
U is upper triangular with 1’s on the diagonal. Prove that this decomposition is unique; i.e., if
A = LDU = L′ D ′ U ′ , where L′ , D ′ , and U ′ have the same defining properties as L, D, and U ,
respectively, then L = L′ , D = D ′ , and U = U ′ . (Hint: The product of two lower triangular
matrices is lower triangular, and likewise for upper.)
8. a. Let A = [0 2; 2 1]. After making a row exchange (and corresponding column exchange to preserve symmetry), we get B = E₁AE₁ᵀ = [1 2; 2 0]. Now write B = LDLᵀ and get a corresponding equation for A. How then have we expressed the associated quadratic form Q(x) = 4x₁x₂ + x₂² as a sum (or difference) of squares?
*b. Let A = [0 1; 1 0]. By considering B = E₁AE₁ᵀ = [1 1; 1 0], where E₁ is the elementary matrix corresponding to adding 1/2 of the second row to the first, show that
A = EDEᵀ, where E = [1/2 1/2; 1 −1] and D = [1 0; 0 −1].
What is the corresponding expression for the quadratic form Q(x) = 2x₁x₂ as a sum (or difference) of squares?
4. Lagrange Multipliers
Most extremum problems, including those encountered in single-variable calculus, involve func-
tions of several variables with some constraints. Consider, for example, the box of prescribed
volume, a cylinder inscribed in a sphere of given radius, or the desire to maximize profit with only
a certain amount of working capital. There is an elegant and powerful way to approach all these
problems using multivariable calculus, the method of Lagrange multipliers. A generalization to
infinite dimensions, which we shall not study here, is central in the calculus of variations, which is
a powerful tool in mechanics, thermodynamics, and differential geometry.
Example 1. Your boat has sprung a leak in the middle of the lake and you are trying to find the closest point on the shoreline. As suggested by Figure 4.1, we imagine dropping a rock in the water at the location of the boat and watching the circular waves radiate outwards. The moment the first wave touches the shoreline, we know that the point a at which it touches must be closest to us. And at that point, the circle must be tangent to the shoreline.
[Figure 4.1: level curves f(x) = constant, the constraint curve g(x) = 0, and the point a of tangency, where ∇f(a) = λ∇g(a)]
Let’s place the origin at the point at which we drop the rock. Then the circles emanating from
this point are level curves of f (x) = kxk. Suppose, moreover, that the shoreline is a level curve of
a differentiable function g. By Proposition 5.3 of Chapter 4, the gradient is normal to level sets,
so if the tangent line of the circle at a and the tangent line of the shoreline at a are the same, this
means that we should have
∇f (a) = λ∇g(a) for some scalar λ. ▽
We now want to study the calculus of constrained extrema a bit more carefully. The basic result is the following.
Theorem 4.1. Let g : Rⁿ → Rᵐ be C¹ (m < n), let M = g⁻¹({0}), and suppose f is a differentiable function whose restriction to M has a local extremum at a ∈ M. If Dg₁(a), …, Dgₘ(a) are linearly independent, then there are scalars λ₁, …, λₘ (called Lagrange multipliers) so that
Df(a) = λ₁Dg₁(a) + ⋯ + λₘDgₘ(a).
Remark. As usual, this is a necessary condition for a constrained extremum, but not a sufficient
one. There may be (constrained) saddle points as well.
Proof. By the Implicit Function Theorem, we can represent M = g⁻¹({0}) locally near a as a graph over some coordinate (n − m)-plane. For concreteness, let’s say that locally
M = {(x, φ(x)) : x ∈ V ⊂ Rⁿ⁻ᵐ},
as indicated in Figure 4.2.
[Figure 4.2: the local parametrization Φ of M over the open set V ⊂ Rⁿ⁻ᵐ, and the function f : Rⁿ → R]
so, taking orthogonal complements and using Exercise 1.3.9 and Proposition 4.8 of Chapter 4, we conclude that Df(a) is a linear combination of the linear maps Dg₁(a), …, Dgₘ(a)—or, more geometrically, ∇f(a) is a linear combination of the vectors ∇g₁(a), …, ∇gₘ(a)—as we needed to show.
Remark. The subspace T = image(DΦ(a)) = R([Dg(a)])^⊥ is called the tangent space of M at a. We shall return to such matters in Chapter 6.
Example 2. The temperature at the point (x, y, z) in space is given by f(x, y, z) = xy + z². We wish to find the hottest and coldest points on the sphere x² + y² + z² = 2z (the sphere of radius 1 centered at (0, 0, 1)). That is, we must find the extrema of f subject to the constraint g(x, y, z) = x² + y² + z² − 2z = 0. By Theorem 4.1, we must find points x satisfying g(x) = 0 at which Df(x) = λDg(x) for some scalar λ. That is, we seek points x so that
(∗) [y  x  2z] = λ[x  y  z − 1] for some scalar λ.
Remark. We surmise that the origin is a saddle point. Indeed, representing the sphere locally as a graph near the origin, we have z = 1 − √(1 − (x² + y²)) and
f(x, y, z) = xy + (1 − √(1 − (x² + y²)))² = xy + higher order terms.
(This is easiest to see by using √(1 + u) = 1 + u/2 + higher order terms.) Even easier, the origin is a non-constrained critical point of f. Since f is a quadratic polynomial, ½ H_{f,0} = f, and on the tangent plane of the sphere at 0 we just get xy. (Also see Exercise 34.)
Example 3. Find the shortest possible distance from the ellipse x² + 2y² = 2 to the line x + y = 2. We need to consider the (square of the) distance between pairs of points, one on the ellipse, the other on the line. This means that we need to work in R² × R², with coordinates (x, y) and (u, v), respectively.
[Figure 4.3: the ellipse x² + 2y² = 2 and the line x + y = 2]
Let’s try to minimize
f(x, y, u, v) = (x − u)² + (y − v)²
We close this section with an application of the method of Lagrange multipliers to linear algebra. Suppose A is a symmetric n × n matrix. Let’s find the extrema of the quadratic form Q(x) = xᵀAx subject to the constraint g(x) = ‖x‖² = 1. By Theorem 4.1, we seek x ∈ Rⁿ so that for some scalar λ we have DQ(x) = λDg(x). Applying the result of Exercise 3.2.14 (and canceling a pair of 2’s), this means that at any constrained extremum we must have
Ax = λx for some scalar λ.
Such a vector x is called an eigenvector of A, and the Lagrange multiplier λ is called an eigenvalue. Note that by compactness of the unit sphere, Q must have at least a global minimum and a global maximum; hence A must have at least two eigenvalues and corresponding eigenvectors.
Example 4. Consider A = [6 2; 2 9]. Proceeding as above, we arrive at the system of equations
6x + 2y = λx
2x + 9y = λy.
Eliminating λ, we obtain
(6x + 2y)/x = (2x + 9y)/y,
from which we find the equation
2(y/x)² − 3(y/x) − 2 = (2(y/x) + 1)((y/x) − 2) = 0,
so either y = 2x or y = −x/2. Substituting into the constraint equation, we obtain the critical points (eigenvectors) (1/√5, 2/√5) and (−2/√5, 1/√5), with respective Lagrange multipliers (eigenvalues) 10 and 5. ▽
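Example 4 is easy to check with numpy (our own sketch, not part of the text): the constrained extrema of Q on the unit circle occur at unit eigenvectors of A, with the Lagrange multipliers as eigenvalues.

```python
import numpy as np

A = np.array([[6.0, 2.0], [2.0, 9.0]])
values, vectors = np.linalg.eigh(A)    # eigh is appropriate here: A is symmetric
print(values)                          # [ 5. 10.]
print(vectors)                         # columns: (-2, 1)/sqrt(5) and (1, 2)/sqrt(5), up to sign

for v in vectors.T:                    # Q(x) = x^T A x at each unit eigenvector
    print(v @ A @ v)                   # 5.0, then 10.0
```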
EXERCISES 5.4
1. a. Find the minimum value of f(x, y) = x² + y² on the curve x + y = 2. Why is there no maximum?
b. Find the maximum value of g(x, y) = x + y on the curve x² + y² = 2. Is there a minimum?
c. How are the questions (and answers) in parts a and b related?
*2. A wire has the shape of the circle x² + y² − 2y = 0. Its temperature at the point (x, y) is given by T(x, y) = 2x² + 3y. Find the maximum and minimum temperatures of the wire. (Be sure you’ve found all potential critical points!)
3. Find the maximum value of f(x, y, z) = 2x + 2y − z on the sphere of radius 2 centered at the origin.
4. Find the maximum and minimum values of the function f(x, y) = x² + xy + y² on the unit disk D = {x ∈ R² : ‖x‖ ≤ 1}.
5. Find the point(s) on the ellipse x² + 4y² = 4 closest to the point (1, 0).
6. The temperature at the point x is given by f(x, y, z) = x² + 2y + 2z. Find the hottest and coldest points on the sphere x² + y² + z² = 3.
7. Find the volume of the largest rectangular box (with all its edges parallel to the coordinate axes) that can be inscribed in the ellipsoid
x² + y²/2 + z²/3 = 1.
8. A space probe in the shape of the ellipsoid 4x² + y² + 4z² = 16 enters the earth’s atmosphere and its surface begins to heat. After one hour, the temperature in °C on its surface is given by f(x, y, z) = 2x² + yz − 4z + 600. Find the hottest and coldest points on the probe’s surface.
9. The temperature in space is given by f(x, y, z) = 3xy + z³ − 3z. Prove that there are hottest and coldest points on the sphere x² + y² + z² − 2z = 0, and find them.
10. Let f(x, y, z) = xy + z³ and S = {(x, y, z) : x² + y² + z² = 1, z ≥ 0}. Prove that f attains its global maximum and minimum on S and determine its global maximum and minimum points.
11. Among all triangles inscribed in the unit circle, which have the greatest area? (Hint: Consider
the three small triangles formed by joining the vertices to the center of the circle.)
12. Among all triangles inscribed in the unit circle, which have the greatest perimeter?
*13. Find the ellipse x²/a² + y²/b² = 1 that passes through the point (3, 1) and has the least area. (Recall that the area of the ellipse is πab.)
18. Find the maximum and minimum values of the function f(x) = x₁ + ⋯ + xₙ subject to the constraint ‖x‖ = 1.
Prove the arithmetic-geometric mean inequality: for positive numbers x₁, …, xₙ,
(x₁x₂ ⋯ xₙ)^(1/n) ≤ (x₁ + ⋯ + xₙ)/n.
21. Suppose p, q > 0 and 1/p + 1/q = 1. Suppose x, y > 0. Use Lagrange multipliers to prove that
xᵖ/p + y^q/q ≥ xy.
23. A silo is built by putting a right circular cone atop a right circular cylinder (both having the
same radius). What dimensions will give the silo of maximum volume for a given surface area?
25. Use Lagrange multipliers to find the point closest to the origin on the line of intersection of the
planes x + 2y + z = 5 and 2x + y − z = 1.
26. In each case, find the point in the given subspace V closest to b.
*a. V = {x ∈ R³ : x₁ − x₂ + 3x₃ = 2x₁ + x₂ = 0}, b = (3, 7, 1)
b. V = {x ∈ R⁴ : x₁ + x₂ + x₃ + x₄ = x₁ + 2x₃ + x₄ = 0}, b = (3, 1, 1, −1)
*27. Find the points on the curve of intersection of the two surfaces x2 − xy + y 2 − z 2 = 1 and
x2 + y 2 = 1 that are closest to the origin.
28. Show that of all quadrilaterals with fixed side lengths, the one of maximum area can be inscribed
in a circle. (Hint: Use as variables a pair of opposite angles. See also Exercise 1.2.14.)
29. For each of the following symmetric matrices A, find all the extrema of Q(x) = xᵀAx subject to the constraint ‖x‖² = 1. Also determine the Lagrange multiplier each time.
*a. A = [1 2; 2 −2]   b. A = [0 3; 3 −8]
30. Find the norm of each of the following matrices. Note: A calculator will be helpful.
*a. [1 1; 0 1]   b. [2 1; 0 3]   c. [2 1; 1 3]
31. A (frictionless) lasso is thrown around two pegs, as pictured in Figure 4.4, and a large weight hung from the free end. Treating the mass of the rope as insignificant, and supposing the weight hangs freely, what is the equilibrium position of the system?
[Figure 4.4]
34. (A second derivative test for constrained extrema) Suppose a is a critical point of f subject to the constraint g(x) = c, Df(a) = λDg(a), and Dg(a) ≠ O. Show that a is a constrained local maximum (resp., minimum) of f on S = {x : g(x) = c} if the restriction of the Hessian of f − λg to the tangent space T_aS is negative (resp., positive) definite. (Hint: Parametrize the constraint surface g = c locally by Φ with Φ(a) = a and apply Theorem 3.3 to f ∘ Φ.) There is an interpretation in terms of the bordered Hessian (see Exercise 32b), which is indicated in Exercise 9.4.21.
5. Projections, Least Squares, and Inner Product Spaces
[Figure 5.1: a vector b and its projection onto the column space C(A)]
and so
p = (1, 1, 1) − (1/3)(−1, 1, 1) = (4/3, 2/3, 2/3).
[Figure 5.2: b, the subspace V = C(A), and the closest point Ax̄ ∈ V]
The equation
(∗) AᵀAx̄ = Aᵀb
gives the associated normal equations.
Remark. When A has rank less than m, the linear system (∗) is still consistent (see Exercise
4.4.15) and has infinitely many solutions. We define the least squares solution to be the one of
smallest length, i.e., the unique vector x ∈ R(A) that satisfies the equation. See Proposition 4.10
of Chapter 4. This leads to the pseudoinverse that is important in numerical analysis (cf. Strang).
Example 2. We wish to find the least squares solution of the system Ax = b, where
A = [2 1; 1 1; 0 1; 1 −1] and b = (2, 1, 1, −1).
and so, using the formula for the inverse of a 2 × 2 matrix in Example 5 on p. 146,
x̄ = (AᵀA)⁻¹Aᵀb = (1/20)[4 −2; −2 6](4, 5) = (1/10)(3, 11).
This is all it takes to give an explicit formula for projection onto a subspace V ⊂ Rⁿ. In particular, denote by
proj_V : Rⁿ → Rⁿ
the function which assigns to each vector b ∈ Rⁿ the vector p ∈ V closest to b. Start by choosing a basis {v₁, …, vₘ} for V, and let
A = [v₁ v₂ ⋯ vₘ]
be the n × m matrix whose column vectors are these basis vectors. Then, given b ∈ Rⁿ, we know that if we take x̄ = (AᵀA)⁻¹Aᵀb, then Ax̄ = p = proj_V b. That is,
(†) p = proj_V b = A(AᵀA)⁻¹Aᵀ b,
In Section 5.2, we’ll see a bit more of the geometry underlying the formula for the projection matrix.
as it should be. ▽
Example 4. Note that when dim V = 1, we recover our formula for projection onto a line from Section 2 of Chapter 1. If a ∈ Rⁿ is a nonzero vector, we consider it as an n × 1 matrix and the projection formula becomes
P = (1/‖a‖²) aaᵀ;
that is,
Pb = (1/‖a‖²)(aaᵀ)b = (1/‖a‖²)a(aᵀb) = ((a · b)/‖a‖²) a,
as before. ▽
Now, what happens if we are given the subspace implicitly? This sounds like the perfect set-up for Lagrange multipliers. Suppose the m-dimensional subspace V ⊂ Rⁿ is given as the nullspace of an (n − m) × n matrix B of rank n − m. To find the point in V closest to b ∈ Rⁿ, we want to minimize the function f(x) = ‖x − b‖² subject to the constraint Bx = 0.
The method of Lagrange multipliers, Theorem 4.1, tells us that we must have (dropping the factor of 2)
(x − b)ᵀ = Σ_{i=1}^{n−m} λᵢBᵢ for some scalars λ₁, …, λ_{n−m},
where, as usual, Bᵢ are the rows of B. Transposing this equation, we have
x − b = Bᵀλ, where λ = (λ₁, …, λ_{n−m}).
Multiplying both sides by B and using Bx = 0, we obtain
(BBᵀ)λ = −Bb.
By analogy with our treatment of the equation (∗), the matrix BBᵀ has rank n − m, and so we can solve for λ, hence for the constrained extremum x₀:
(‡) x₀ = b + Bᵀ(−(BBᵀ)⁻¹Bb) = b − Bᵀ(BBᵀ)⁻¹Bb.
Note that, according to our projection formula (†), we can interpret this answer as
x₀ = b − proj_{V^⊥} b = proj_V b,
as it should be.
5.1. Data fitting. Perhaps the most natural setting in which inconsistent systems of equations arise is that of fitting data to a curve when they won’t quite fit. For example, in our laboratory work many of us have tried to find the right constants a and k so that the data points (x₁, y₁), …, (xₘ, yₘ) lie on the curve y = axᵏ. Taking natural logarithms, we see that this is equivalent to fitting the points (uᵢ, vᵢ) = (log xᵢ, log yᵢ), i = 1, …, m, to a line v = ku + log a—whence the convenience of log-log paper. The least squares solution of such problems is called the least squares line fitting the points (or the line of regression in statistics).
Example 6. Find the least squares line y = ax + b for the data points (−1, 0), (1, 1), and (2, 3). (See Figure 5.3.) We get the system of equations
−a + b = 0
a + b = 1
2a + b = 3,
[Figure 5.3: the three data points and the least squares line]
The least squares process chooses a and b so that ‖ǫ‖² = ǫ₁² + ⋯ + ǫₘ² is as small as possible. But something interesting happens. Recall that
ǫ = y − ŷ ∈ C(A)^⊥.
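Solving the normal equations for Example 6 by hand gives a = 13/14 and b = 5/7. The sketch below (ours, not part of the text) confirms this and illustrates the orthogonality ǫ ∈ C(A)^⊥ just noted:

```python
import numpy as np

A = np.array([[-1.0, 1.0], [1.0, 1.0], [2.0, 1.0]])   # columns: the x-values and 1's
y = np.array([0.0, 1.0, 3.0])

a, b = np.linalg.lstsq(A, y, rcond=None)[0]
print(a, b)                       # 13/14 ≈ 0.92857,  5/7 ≈ 0.71429

residual = y - A @ np.array([a, b])
print(A.T @ residual)             # essentially (0, 0): the residual is orthogonal to C(A)
```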
5.2. Orthogonal Bases. We have seen how to find the projection of a vector onto a subspace V ⊂ Rⁿ using the so-called normal equations. But the inner workings of the formula (†) on p. 217 escape us. Since we have known since Chapter 1 how to project a vector x onto a line, it might seem more natural to start with a basis {v₁, …, v_k} for V and sum up the projections of x onto the v_j’s. However, as we see in Figure 5.4(a), when we start with x ∈ V and add the projections
[Figure 5.4: (a) the sum proj_{v₁}x + proj_{v₂}x for a non-orthogonal basis {v₁, v₂}; (b) the sum proj_{w₁}x + proj_{w₂}x for an orthogonal basis {w₁, w₂}]
of x onto the vectors of an arbitrary basis for V, the resulting vector needn’t have much to do with x. Nevertheless, the diagram on the right suggests that when we start with a basis consisting of mutually orthogonal vectors, the process may work. We begin by proving this as a lemma.
Lemma 5.1. Let {v₁, …, v_k} be a basis for the subspace V ⊂ Rⁿ. Then every vector x ∈ V is the sum of its projections onto v₁, …, v_k, i.e.,
x = Σ_{i=1}^k (x · vᵢ/‖vᵢ‖²) vᵢ,
if and only if {v₁, …, v_k} is an orthogonal basis for V.
Proof. Suppose {v₁, …, v_k} is an orthogonal basis for V. Then there are scalars c₁, …, c_k so that
x = c₁v₁ + ⋯ + cᵢvᵢ + ⋯ + c_kv_k.
Taking advantage of the orthogonality of the v_j’s, we take the dot product of this equation with vᵢ:
x · vᵢ = cᵢ(vᵢ · vᵢ) = cᵢ‖vᵢ‖²,
and so
cᵢ = x · vᵢ/‖vᵢ‖².
(Note that vᵢ ≠ 0 since {v₁, …, v_k} forms a basis for V.)
Conversely, suppose that every vector x ∈ V is the sum of its projections on v₁, …, v_k. Let’s just examine what this means when x = v₁: we are given that
v₁ = Σ_{i=1}^k proj_{vᵢ} v₁ = Σ_{i=1}^k (v₁ · vᵢ/‖vᵢ‖²) vᵢ.
Recall from Proposition 3.1 of Chapter 4 that every vector has a unique expansion as a linear combination of basis vectors, so comparing coefficients of v₂, …, v_k on either side of this equation, we conclude that
v₁ · vᵢ = 0 for all i = 2, …, k.
A similar argument shows that vᵢ · v_j = 0 for all i ≠ j, and the proof is complete.
As we mentioned above, if {v₁, …, v_k} is a basis for V, then every vector x ∈ V can be written uniquely as a linear combination
x = c₁v₁ + c₂v₂ + ⋯ + c_kv_k.
We recall that the coefficients c₁, c₂, …, c_k that appear here are called the coordinates of x with respect to the basis {v₁, …, v_k}. It is worth emphasizing that when {v₁, …, v_k} forms an orthogonal basis for V, it is quite easy to compute the coordinates of x by using the dot product, that is, cᵢ = x · vᵢ/‖vᵢ‖². As we saw in Example 8 of Section 3 of Chapter 4 (see also Section 1 of Chapter 9), when the basis is not orthogonal, it is far more tedious to compute these coordinates.
Not only do orthogonal bases make it easy to calculate coordinates, they also make projections quite easy to compute, as we now see.
Proposition 5.2. Let {v₁, …, v_k} be a basis for the subspace V ⊂ Rⁿ. Then
(∗∗) proj_V b = Σ_{i=1}^k proj_{vᵢ} b for all b ∈ Rⁿ
if and only if {v₁, …, v_k} is an orthogonal basis for V.
Proof. Assume {v₁, …, v_k} is an orthogonal basis for V and write b = p + (b − p), where p = proj_V b (and so b − p ∈ V^⊥). Then, since p ∈ V, by Lemma 5.1, we know p = Σ_{i=1}^k (p · vᵢ/‖vᵢ‖²) vᵢ. Since (b − p) · vᵢ = 0 for each i, we have p · vᵢ = b · vᵢ, and so p = Σ_{i=1}^k proj_{vᵢ} b, as required.
Conversely, suppose proj_V b = Σ_{i=1}^k proj_{vᵢ} b for all b ∈ Rⁿ. In particular, when b ∈ V, we deduce that b = proj_V b can be written as a linear combination of v₁, …, v_k, so these vectors span V; since V is k-dimensional, {v₁, …, v_k} gives a basis for V. By Lemma 5.1, it must be an orthogonal basis.
We now have another way to calculate the projection of a vector on a subspace V , provided we
can come up with an orthogonal basis for V .
Example 7. We return to Example 5 on p. 218. The basis {v₁, v₂} we used there was certainly not an orthogonal basis, but it is not hard to find one that is. Instead, we take
w₁ = (−1, 0, 1) and w₂ = (1, 1, 1).
(It is immediate that w₁ · w₂ = 0 and that w₁, w₂ lie in the plane x₁ − 2x₂ + x₃ = 0.) Now, we calculate
proj_V b = proj_{w₁} b + proj_{w₂} b = (b · w₁/‖w₁‖²) w₁ + (b · w₂/‖w₂‖²) w₂
= ((1/‖w₁‖²) w₁w₁ᵀ + (1/‖w₂‖²) w₂w₂ᵀ) b
= ((1/2)[1 0 −1; 0 0 0; −1 0 1] + (1/3)[1 1 1; 1 1 1; 1 1 1]) b
= [5/6 1/3 −1/6; 1/3 1/3 1/3; −1/6 1/3 5/6] b,
as we found earlier. ▽
Remark. This is exactly what we get from formula (†) on p. 217 when {v₁, …, v_k} is an orthogonal set. In particular, AᵀA is then the diagonal matrix with diagonal entries ‖v₁‖², …, ‖v_k‖², and
P = A(AᵀA)⁻¹Aᵀ = [v₁ v₂ ⋯ v_k] diag(1/‖v₁‖², …, 1/‖v_k‖²) [v₁ᵀ; v₂ᵀ; ⋯; v_kᵀ] = Σ_{i=1}^k (1/‖vᵢ‖²) vᵢvᵢᵀ.
Now it is time to develop an algorithm for transforming a given (ordered) basis {v₁, …, v_k} for a subspace into an orthogonal basis {w₁, …, w_k}, as shown in Figure 5.5. The idea is quite simple. We set
w₁ = v₁.
If v₂ is orthogonal to w₁, then we set w₂ = v₂. Of course, in general, it will not be, and we want w₂ to be the part of v₂ that is orthogonal to w₁; i.e., we set
w₂ = v₂ − proj_{w₁} v₂ = v₂ − (v₂ · w₁/‖w₁‖²) w₁.
Then, by construction, w₁ and w₂ are orthogonal and Span(w₁, w₂) ⊂ Span(v₁, v₂). Since w₂ ≠ 0 (why?), {w₁, w₂} must be linearly independent and therefore give a basis for Span(v₁, v₂) by Lemma 3.8. We continue, replacing v₃ by its part orthogonal to the plane spanned by w₁ and w₂:
[Figure 5.5: replacing v₂ by w₂ ⊥ w₁, and then v₃ by w₃ ⊥ Span(w₁, w₂)]
w₃ = v₃ − proj_{Span(w₁,w₂)} v₃ = v₃ − proj_{w₁} v₃ − proj_{w₂} v₃ = v₃ − (v₃ · w₁/‖w₁‖²) w₁ − (v₃ · w₂/‖w₂‖²) w₂.
Note that we are making definite use of Proposition 5.2 here: we must use w₁ and w₂ in the formula here, rather than v₁ and v₂, because the formula (∗∗) requires an orthogonal basis. Once again, we find that w₃ ≠ 0 (why?), and so {w₁, w₂, w₃} must be linearly independent, and, consequently, an orthogonal basis for Span(v₁, v₂, v₃). The process continues until we have arrived at v_k and replaced it by
w_k = v_k − proj_{Span(w₁,…,w_{k−1})} v_k = v_k − (v_k · w₁/‖w₁‖²) w₁ − (v_k · w₂/‖w₂‖²) w₂ − ⋯ − (v_k · w_{k−1}/‖w_{k−1}‖²) w_{k−1}.
Summarizing, we have the algorithm that goes by the name of the Gram-Schmidt process.
w₁ = v₁
w₂ = v₂ − (v₂ · w₁/‖w₁‖²) w₁
w₃ = v₃ − (v₃ · w₁/‖w₁‖²) w₁ − (v₃ · w₂/‖w₂‖²) w₂
⋮
w_k = v_k − (v_k · w₁/‖w₁‖²) w₁ − ⋯ − (v_k · w_{k−1}/‖w_{k−1}‖²) w_{k−1}
w₃ = v₃ − (v₃ · w₁/‖w₁‖²) w₁ − (v₃ · w₂/‖w₂‖²) w₂ = (0, −1, 0, 1).
And if we desire an orthonormal basis, then we take
q₁ = (1/2)(1, 1, 1, 1), q₂ = (1/√2)(1, 0, −1, 0), q₃ = (1/√2)(0, −1, 0, 1).
It’s always a good idea to check that the vectors form an orthogonal (or orthonormal) set, and it’s easy—with these numbers—to do so. ▽
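The Gram-Schmidt process is as easy to program as it is to state. In this sketch of ours (not part of the text; it assumes the inputs are linearly independent, so no wᵢ vanishes), the input vectors are chosen hypothetically so that the output reproduces the orthogonal basis found just above:

```python
import numpy as np

def gram_schmidt(vectors):
    """Replace an ordered, linearly independent list by an orthogonal list with the same span."""
    ws = []
    for v in vectors:
        w = v.astype(float)
        for u in ws:
            w = w - (v @ u) / (u @ u) * u   # subtract proj_u(v)
        ws.append(w)
    return ws

vs = [np.array(v) for v in ([1, 1, 1, 1], [2, 1, 0, 1], [2, 0, 0, 2])]
for w in gram_schmidt(vs):
    print(w)   # (1,1,1,1), (1,0,-1,0), (0,-1,0,1); normalize to get q_1, q_2, q_3
```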
5.3. Inner Product Spaces. In certain abstract vector spaces we may define a notion of dot product.
Definition. Let V be a real vector space. We say V is an inner product space if for every pair of elements u, v ∈ V there is a real number ⟨u, v⟩, called the inner product of u and v, such that:
(1) ⟨u, v⟩ = ⟨v, u⟩ for all u, v ∈ V;
(2) ⟨cu, v⟩ = c⟨u, v⟩ for all u, v ∈ V and scalars c;
(3) ⟨u + v, w⟩ = ⟨u, w⟩ + ⟨v, w⟩ for all u, v, w ∈ V;
(4) ⟨v, v⟩ ≥ 0 for all v ∈ V and ⟨v, v⟩ = 0 only if v = 0.
Examples 9. (a) Fix k + 1 distinct real numbers t₁, t₂, …, t_{k+1} and define an inner product on P_k by the formula
⟨p, q⟩ = Σ_{i=1}^{k+1} p(tᵢ)q(tᵢ), p, q ∈ P_k.
All the properties of an inner product are obvious except for the very last. If ⟨p, p⟩ = 0, then Σ_{i=1}^{k+1} p(tᵢ)² = 0, and so we must have p(t₁) = p(t₂) = ⋯ = p(t_{k+1}) = 0. But if a polynomial of degree ≤ k has (at least) k + 1 roots, then it must be the zero polynomial.
(b) Let C⁰([a, b]) denote the vector space of continuous functions on the interval [a, b]. If f, g ∈ C⁰([a, b]), define
⟨f, g⟩ = ∫ₐᵇ f(t)g(t) dt.
We verify that the defining properties hold.
(1) ⟨f, g⟩ = ∫ₐᵇ f(t)g(t) dt = ∫ₐᵇ g(t)f(t) dt = ⟨g, f⟩.
(2) ⟨cf, g⟩ = ∫ₐᵇ (cf)(t)g(t) dt = ∫ₐᵇ cf(t)g(t) dt = c∫ₐᵇ f(t)g(t) dt = c⟨f, g⟩.
(3) ⟨f + g, h⟩ = ∫ₐᵇ (f + g)(t)h(t) dt = ∫ₐᵇ (f(t) + g(t))h(t) dt = ∫ₐᵇ (f(t)h(t) + g(t)h(t)) dt = ∫ₐᵇ f(t)h(t) dt + ∫ₐᵇ g(t)h(t) dt = ⟨f, h⟩ + ⟨g, h⟩.
(4) ⟨f, f⟩ = ∫ₐᵇ f(t)² dt ≥ 0 since f(t)² ≥ 0 for all t. On the other hand, if ⟨f, f⟩ = ∫ₐᵇ f(t)² dt = 0, then since f is continuous and f² ≥ 0, it must be the case that f = 0. (If not, we would have f(t₀) ≠ 0 for some t₀, and then f(t)² would be positive on some small interval containing t₀; it would then follow that ∫ₐᵇ f(t)² dt > 0.)
The same inner product can be defined on subspaces of C⁰([a, b]), e.g., P_k.
(c) We define an inner product on M_{n×n} in Exercise 18. ▽
If V is an inner product space, we define length, orthogonality, and the angle between vectors just as we did in Rⁿ. If v ∈ V, we define its length to be ‖v‖ = √⟨v, v⟩. We say v and w are orthogonal if ⟨v, w⟩ = 0. Since the Cauchy-Schwarz inequality can be established in general by following the proof of Proposition 2.3 of Chapter 1 verbatim, we can define the angle θ between v and w by the equation
cos θ = ⟨v, w⟩/(‖v‖‖w‖).
We can define orthogonal subspaces, orthogonal complements, and the Gram-Schmidt process analogously.
We can use the inner product defined in Example 9(a) to prove the following important result about curve fitting.
Theorem 5.4 (Lagrange Interpolation Formula). Given k + 1 points
(t₁, b₁), (t₂, b₂), …, (t_{k+1}, b_{k+1})
in the plane with t₁, t₂, …, t_{k+1} distinct, there is exactly one polynomial p ∈ P_k whose graph passes through the points.
Proof. We begin by explicitly constructing a basis for P_k consisting of mutually orthogonal vectors of length 1 with respect to the inner product defined in Example 9(a). That is, to start, we seek a polynomial p₁ ∈ P_k so that
p₁(t₁) = 1, p₁(t₂) = 0, …, p₁(t_{k+1}) = 0.
The polynomial q₁(t) = (t − t₂)(t − t₃) ⋯ (t − t_{k+1}) has the property that q₁(t_j) = 0 for j = 2, 3, …, k + 1, and q₁(t₁) = (t₁ − t₂)(t₁ − t₃) ⋯ (t₁ − t_{k+1}) ≠ 0 (why?). So now we set
p₁(t) = (t − t₂)(t − t₃) ⋯ (t − t_{k+1}) / ((t₁ − t₂)(t₁ − t₃) ⋯ (t₁ − t_{k+1}));
then, as desired, p₁(t₁) = 1 and p₁(t_j) = 0 for j = 2, 3, …, k + 1. Similarly, we can define
p₂(t) = (t − t₁)(t − t₃) ⋯ (t − t_{k+1}) / ((t₂ − t₁)(t₂ − t₃) ⋯ (t₂ − t_{k+1}))
and polynomials p₃, …, p_{k+1} so that
pᵢ(t_j) = 1 when i = j, and pᵢ(t_j) = 0 when i ≠ j.
Like the standard basis vectors in Euclidean space, p₁, p₂, …, p_{k+1} are unit vectors in P_k that are orthogonal to one another. It follows from Exercise 4.3.5 that these vectors form a linearly independent set, hence a basis for P_k (why?). In Figure 5.6 we give the graphs of the Lagrange “basis polynomials” p₁, p₂, p₃ for P₂ when t₁ = −1, t₂ = 0, and t₃ = 2.
[Figure 5.6: the graphs of p₁, p₂, p₃]
Now the polynomial
p = b₁p₁ + b₂p₂ + ⋯ + b_{k+1}p_{k+1}
has the desired properties: viz., p(t_j) = b_j for j = 1, 2, …, k + 1. On the other hand, two polynomials of degree ≤ k with the same values at k + 1 points must be equal, since their difference is a polynomial of degree ≤ k with at least k + 1 roots. This establishes uniqueness. (More elegantly, any polynomial q with q(t_j) = b_j, j = 1, …, k + 1, must satisfy ⟨q, p_j⟩ = b_j, j = 1, …, k + 1.)
EXERCISES 5.5
c. V = {x ∈ R⁴ : x₁ − x₂ + x₃ + 2x₄ = 0}, b = (3, 1, 1, 1)
2. Check from the formula P = A(AT A)−1 AT for the projection matrix that P = P T and P 2 = P .
Show that I − P has the same properties; explain.
3. Let V = Span((1, 0, 1), (0, 1, −2)) ⊂ R³. Construct the matrix [proj_V]
a. by finding [proj_{V^⊥}];
b. by using the projection matrix P given in formula (†) on p. 217;
c. by finding an orthogonal basis for V.
9. Derive the equation (∗) on p. 216 by starting with the equation Ax = p and using the result
of Theorem 4.9 of Chapter 4.
12. Execute the Gram-Schmidt process in each case to give an orthonormal basis for the subspace spanned by the given vectors.
a. (1, 0, 0), (2, 1, 0), (3, 2, 1)
b. (1, 1, 1), (0, 1, 1), (0, 0, 1)
*c. (1, 0, 1, 0), (2, 1, 0, 1), (0, 1, 2, −3)
d. (−1, 2, 0, 2), (2, −4, 1, −4), (−1, 3, 1, 1)
*13. Let V = Span((1, −1, 0, 2), (1, 0, 1, 1)) ⊂ R⁴, and let b = (1, −3, 1, 1).
a. Find an orthogonal basis for V.
b. Use your answer to part a to find p = proj_V b.
c. Letting
A = [1 1; −1 0; 0 1; 2 1],
15. According to Proposition 4.10 of Chapter 4, if A is an m × n matrix, then for each b ∈ C(A), there is a unique x ∈ R(A) with Ax = b. In each case, give a formula for that x.
a. A = [1 2 3; 1 2 3]
*b. A = [1 1 1; 0 1 −1]
c. A = [1 1 1 1; 1 1 3 −5]
d. A = [1 1 1 1; 1 1 3 −5; 2 2 4 −4]
♯16. Let A be an n × n matrix and, as usual, let a₁, …, aₙ denote its column vectors.
a. Suppose a₁, …, aₙ form an orthonormal set. Prove that A⁻¹ = Aᵀ.
*b. Suppose a₁, …, aₙ form an orthogonal set and each is nonzero. Find the appropriate formula for A⁻¹.
17. Let V = C⁰([−a, a]) with the inner product ⟨f, g⟩ = ∫₋ₐᵃ f(t)g(t) dt. Let U⁺ ⊂ V be the subset of even functions, and let U⁻ ⊂ V be the subset of odd functions. That is, U⁺ = {f ∈ V : f(−t) = f(t) for all t ∈ [−a, a]} and U⁻ = {f ∈ V : f(−t) = −f(t) for all t ∈ [−a, a]}.
a. Prove that U⁺ and U⁻ are orthogonal subspaces of V.
b. Use the fact that every function can be written as the sum of an even and an odd function, viz.,
f(t) = ½(f(t) + f(−t)) + ½(f(t) − f(−t)),
where the first summand is even and the second odd, to prove that U⁺ and U⁻ are orthogonal complements of one another in V.
18. (See Exercise 1.4.22 for the definition and basic properties of trace.)
a. If A, B ∈ M_{n×n}, define ⟨A, B⟩ = tr(AᵀB). Check that this is an inner product on M_{n×n}.
b. Check that if A is symmetric and B is skew-symmetric, then ⟨A, B⟩ = 0. (Hint: Show that ⟨A, B⟩ = −⟨B, A⟩.)
c. Deduce that the subspaces of symmetric and skew-symmetric matrices (cf. Exercise 4.3.24) are orthogonal complements in M_{n×n}.
19. Let g₁(t) = 1 and g₂(t) = t. Using the inner product defined in Example 9(b), find the orthogonal complement of Span(g₁, g₂) in
a. P₂ ⊂ C⁰([−1, 1])
*b. P₂ ⊂ C⁰([0, 1])
c. P₃ ⊂ C⁰([−1, 1])
*20. Show that for any positive integer n, the functions 1, cos t, sin t, cos 2t, sin 2t, …, cos nt, sin nt are orthogonal in C^∞([−π, π]) ⊂ C⁰([−π, π]) (using the inner product defined in Example 9(b)).
CHAPTER 6
Solving Nonlinear Problems
In this brief chapter we introduce some important techniques for dealing with nonlinear prob-
lems (and in the infinite-dimensional setting, as well, although that is too far off-track for us here).
As we’ve said all along, we expect the derivative of a nonlinear function to dictate locally how the
function behaves. In this chapter we come to the rigorous treatment of the inverse and implicit
function theorems, to which we alluded at the end of Chapter 4, and to a few equivalent descriptions
of a k-dimensional manifold, which will play a prominent role in Chapter 8.
We begin with a useful result about summing series of vectors. It will be important not just in
our immediate work, but also in our treatment of matrix exponentials in Chapter 9.
Proposition 1.1. Suppose {a_k} is a sequence of vectors in Rⁿ and the series
Σ_{k=1}^∞ ‖a_k‖
converges (i.e., the sequence of partial sums t_k = ‖a₁‖ + ⋯ + ‖a_k‖ is a convergent sequence of real numbers). Then the series
Σ_{k=1}^∞ a_k
of vectors converges (i.e., the sequence of partial sums s_k = a₁ + ⋯ + a_k is a convergent sequence of vectors in Rⁿ).
Proof. We first prove the result in the case n = 1. Given a sequence {a_k} of real numbers, define b_k = a_k + |a_k|. Note that
b_k = 2a_k if a_k ≥ 0, and b_k = 0 otherwise.
Now, the series Σ_{k=1}^∞ b_k converges by comparison with Σ 2|a_k|. (Directly: since b_k ≥ 0, the partial sums form a nondecreasing sequence that is bounded above by 2Σ|a_k|. That nondecreasing sequence must converge to its least upper bound. See Example 4(c) of Chapter 2, Section 2.) Since a_k = b_k − |a_k|, the series Σ a_k converges, being the sum of the two convergent series Σ b_k and −Σ |a_k|.
We use this case to derive the general result. Denote by a_{k,j}, j = 1, …, n, the jth component of the vector a_k. Obviously, we have |a_{k,j}| ≤ ‖a_k‖. By comparison with the convergent series Σ ‖a_k‖, for any j = 1, …, n, the series Σ_k |a_{k,j}| converges, and hence, by what we’ve just proved, so does the series Σ_k a_{k,j}. Since this is true for each j = 1, …, n, the series
Σ_k a_k = (Σ_k a_{k,1}, …, Σ_k a_{k,n})
converges.
Remark. The result holds even if we use something other than the Euclidean length in Rⁿ. For example, we can apply the result using the norm defined on the vector space of m × n matrices in Section 1 of Chapter 5, since the triangle inequality ‖A + B‖ ≤ ‖A‖ + ‖B‖ holds (see Proposition 1.3 of Chapter 5) and |a_{ij}| ≤ ‖A‖ for any matrix A = [a_{ij}] (why?).
The following result is crucial in both pure and applied mathematics, and applies in infinite-dimensional settings as well. We say a function f : X → X is a contraction mapping if there is a constant c < 1 so that ‖f(x) − f(y)‖ ≤ c‖x − y‖ for all x, y ∈ X.
Theorem 1.2 (Contraction Mapping Principle). Suppose X ⊂ Rⁿ is a closed subset and f : X → X is a contraction mapping. Then f has a unique fixed point; i.e., there is a unique point x ∈ X so that f(x) = x.
Example 1. Consider f : [0, π/3] → [0, 1] ⊂ [0, π/3] given by f(x) = cos x. Then by the mean value theorem, for any x, y ∈ [0, π/3],
|f(x) − f(y)| = |sin ξ||x − y| ≤ sin(π/3)|x − y| = (√3/2)|x − y|
for some ξ between x and y, so f is a contraction mapping with c = √3/2 < 1. ▽
Proof. Choose any point x₀ ∈ X and define the sequence {x_k} recursively by
x_{k+1} = f(x_k).
Our goal is to show that, inasmuch as f is a contraction mapping, this sequence converges to some point x ∈ X. Then, by continuity of f (see Exercise 1), we will have
f(x) = f(lim x_k) = lim f(x_k) = lim x_{k+1} = x.
To see that the sequence converges, set a_k = x_k − x_{k−1}, so that x_k = x₀ + (a₁ + ⋯ + a_k). Since ‖a_{k+1}‖ = ‖f(x_k) − f(x_{k−1})‖ ≤ c‖x_k − x_{k−1}‖ = c‖a_k‖ for every k, we have ‖a_k‖ ≤ c^{k−1}‖a₁‖.
Therefore,
Σ_{k=1}^K ‖a_k‖ ≤ Σ_{k=1}^K c^{k−1}‖a₁‖ = ((1 − c^K)/(1 − c))‖a₁‖ ≤ ‖a₁‖/(1 − c) for every K,
so by Proposition 1.1 the series Σ a_k converges, i.e., the sequence {x_k} converges to some point x, which lies in X because X is closed.
Example 2. According to Theorem 1.2, the function f introduced in Example 1 must have a unique fixed point in the interval [0, π/3]. Following the proof with x₀ = 0, we obtain the following values:
[Figure 1.1: the graphs of y = cos x and the diagonal y = x]
k xk k xk
1 1. 11 0.744237
2 0.540302 12 0.735604
3 0.857553 13 0.741425
4 0.654289 14 0.737506
5 0.793480 15 0.740147
6 0.701368 16 0.738369
7 0.763959 17 0.739567
8 0.722102 18 0.738760
9 0.750417 19 0.739303
10 0.731404 20 0.738937
Indeed, as Figure 1.1 illustrates, the values x_k are converging to the x-coordinate of the intersection of the graph of f(x) = cos x with the diagonal y = x. ▽
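The table above is produced by a few lines of code (our own sketch, not part of the text):

```python
import math

x = 0.0                      # x_0 = 0, as in Example 2
for k in range(1, 21):
    x = math.cos(x)          # x_{k+1} = f(x_k)
    print(k, x)              # reproduces the table: x_1 = 1, x_2 = 0.540302, ...
```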
Example 2 shows that this is a very slow method to obtain the solution of cos x = x. Far better is Newton’s method, familiar to every student of calculus. Given a differentiable function g : R → R, we start at x_k, draw the tangent line to the graph of g at x_k, and let x_{k+1} be the x-intercept of that tangent line, as shown in Figure 1.2. We obtain in this way a sequence, and one hopes that if
[Figure 1.2: one step of Newton’s method: the tangent line at (x_k, g(x_k)) crosses the x-axis at x_{k+1}]
x₀ is sufficiently close to a root a, then the sequence will converge to a. It is easy to see that the recursion formula for this sequence is
x_{k+1} = x_k − g(x_k)/g′(x_k),
so, in fact, we are looking for a fixed point of the mapping f(x) = x − g(x)/g′(x). If we assume g is twice differentiable, then we find that f′ = gg″/(g′)², so f will be a contraction mapping whenever |gg″/(g′)²| ≤ c < 1. In particular, if |g″| ≤ M and |g′| ≥ m, then iterating f will converge to a root a of g if we start in any closed interval containing a on which |g| < m²/M (provided f maps that interval back to itself). For the strongest result, see Exercise 8.
Example 3. Reconsidering the problem of Example 2, let’s use Newton’s method to approximate the root of cos x = x by taking g(x) = x − cos x and iterating the map
f(x) = x − (x − cos x)/(1 + sin x).
k xk k xk
0 1. 0 0.523599
1 0.750364 1 0.751883
2 0.739113 2 0.739121
3 0.739085 3 0.739085
4 0.739085 4 0.739085
Here we see that, whether we start at x₀ = 1 or at x₀ = π/6, Newton’s method converges to the root quite rapidly. Indeed, on the interval [π/6, π/3], we have m = 1.5, M = .87, and |g| ≤ .55, which is far smaller than m²/M ≈ 2.6. ▽
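The corresponding Newton iteration (again our own sketch, not part of the text) reproduces the rapid convergence shown in the table:

```python
import math

def newton(g, gp, x, steps=4):
    for _ in range(steps):
        x = x - g(x) / gp(x)          # x_{k+1} = x_k - g(x_k)/g'(x_k)
    return x

g = lambda x: x - math.cos(x)
gp = lambda x: 1 + math.sin(x)
print(newton(g, gp, 1.0))             # 0.739085...
print(newton(g, gp, math.pi / 6))     # the same root, starting from x_0 = pi/6
```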
To move to higher dimensions, we need a multivariable Mean Value Theorem. The Mean Value
Theorem, although often misinterpreted in beginning calculus courses, tells us that if we have
bounds on the size of the derivative of a differentiable function, then we have bounds on how much
the function itself can change from one point to another. A crucial tool here will be the norm of a
linear map, introduced in Chapters 3 and 5.
EXERCISES 6.1
1. Prove that any contraction mapping is continuous and has at most one fixed point.
1
More generally, all we need is a C1 path in U joining a and b.
238 Chapter 6. Solving Nonlinear Problems
√
2. Let f : R → R be given by f (x) = x2 + 1. Show that f has no fixed point and that |f ′ (x)| < 1
for all x ∈ R. Why does this not contradict Theorem 1.2?
ck
*3. For the sequence {xk } defined in the proof of Theorem 1.2, prove that kxk −xk ≤ kx1 −x0 k.
1−c
This gives an a priori estimate on how fast the sequence converges to the fixed point.
4. A sequence {xk } of points in Rn is called a Cauchy sequence if for all ε > 0 there is K so that
whenever k, ℓ > K, we have kxk − xℓ k < ε. It is a fact that any Cauchy sequence in Rn is
convergent. (See Exercise 2.2.14.) Suppose 0 < c < 1 and {xk } is a sequence of points in Rn so
that kxk+1 − xk k < ckxk − xk−1 k for all k ∈ N. Prove that {xk } is a Cauchy sequence, hence
cK
convergent. (Hint: Show that whenever k, ℓ > K, we have kxk − xℓ k < kx1 − x0 k.)
1−c
5. Use the result of Exercise 2.2.14 to give a different proof of Proposition 1.1.
♯ 6. a. Show that if H is any square matrix with kHk < 1, then I − H is invertible. (Hint:
P
∞
Consider the geometric series H k . You will need to use the result of Exercise 5.1.6.)
k=0
b. Suppose, more generally, that A is an invertible n × n matrix. Show that when kHk <
1/kA−1 k, the matrix A + H is invertible as well. (Hint: Write A + H = A(I + A−1 H).)
2
c. Prove that the set of invertible n × n matrices is an open subset of Mn×n = Rn . This set
P 2 1/2
is denoted GL(n), the general linear group. (Hint: By Exercise 5.1.5, if hij < δ,
then kHk < δ.)
♯ 7. Continuing Exercise 6:
ε
a. Show that if kHk < ε < 1, then k(I + H)−1 − Ik < .
1−ε
b. More generally, if A is invertible and kA−1 kkHk < ε < 1, then estimate k(A+H)−1 −A−1 k.
c. Let X ⊂ Mn×n be the set of invertible n × n matrices (by Exercise 6, this is an open
subset). Prove that the function f : X → X, f (A) = A−1 , is continuous.
2
We learned of the n-dimensional version of this result, which we give in Exercise 10, called Kantarovich’s
Theorem, in Hubbard and Hubbard’s Vector Calculus, Linear Algebra, and Differential Forms.
§1. The Contraction Mapping Principle 239
d. Prove analogously that if, when we apply Newton’s method, we set xk+1 = xk + hk , then
|hk | ≤ |h0 |/2k . Deduce that iterating Newton’s method converges to a point in the given
interval.
12. Prove the following, slightly stronger version of Proposition 1.3. Suppose U ⊂ Rn is open,
f : U → Rm is differentiable, and a and b are points in U so that the line segment between
them is contained in U . Then prove that there is a point ξ on that line segment so that
kf (b) − f (a)k ≤ kDf (ξ)kkb − ak. (Hints: Define g as before, let v = g(1) − g(0) and define
φ : [0, 1] → R by φ(t) = g(t) · v. Apply the usual mean value theorem and the Cauchy-Schwarz
inequality, Proposition 2.3 of Chapter 1, to show that kvk2 = φ(1) − φ(0) ≤ kg′ (c)kkvk for
some c ∈ (0, 1).)
f ′ (0) = 1
2 + lim h sin h1 = 1
2 > 0.
h→0
so there are points (e.g., x = 1/2πn for any nonzero integer n) arbitrarily close to 0 where f ′ (x) < 0.
That is, despite the fact that f ′ (0) > 0, there is no interval around 0 on which f is increasing, as
Figure 2.1 suggests. Thus, f has no inverse on any neighborhood of 0! ▽
All right, so we need a stronger hypothesis. If we assume f is C1 , then it will follow that if
f ′ (a) > 0, then f ′ > 0 on an interval around a, and so f will be increasing—hence invertible—on
that interval. That is the result that generalizes nicely to higher dimensions.
0.02
0.01
-0.01
-0.02
Figure 2.1
g : W → V so that
f (g(y)) = y for all y ∈ W and g(f (x)) = x for all x ∈ V .
Moreover, if f (x) = y, we have
−1
Dg(y) = Df (x) .
Proof. Without loss of generality, we assume that x0 = y0 = 0 and that Df (0) = I. (We
make appropriate translations and then replace f (x) by Df (0)−1 f (x).) Since f is C1 , there is r > 0
so that
kDf (x) − Ik ≤ 21 whenever kxk ≤ r.
Now, fix y with kyk < r/2, and define the function φ by
φ(x) = x − f (x) + y.
Note that kDφ(x)k = kDf (x) − Ik. Whenever kxk ≤ r, we have (by Proposition 1.3)
r r
kφ(x)k ≤ kx − f (x)k + kyk < + = r,
2 2
and so φ maps the closed ball B(0, r) to itself. Moreover, if x, y ∈ B(0, r), by Proposition 1.3 we
have
1
kφ(x) − φ(y)k ≤ kx − yk,
2
so φ is a contraction mapping on B(0, r). By Theorem 1.2, φ has a unique fixed point xy ∈ B(0, r).
That is, there is a unique point xy ∈ B(0, r) so that f (xy ) = y. We leave it to the reader to check
in Exercise 10 that in fact xy ∈ B(0, r).
As pictured in Figure 2.2, take W = B(0, r/2) and V = f −1 (W ) ∩ B(0, r) (note that V is open
because f is continuous; see also Exercise 2.2.7). Define g : W → V by g(y) = xy . We claim first of
all that g is continuous. Indeed, define ψ : B(0, r) → Rn by ψ(x) = f (x) − x. Then, by Proposition
1.3 we have
f (u) − u − f (v) − v
= kψ(u) − ψ(v)k ≤ 1 ku − vk.
2
Thus, we have
f (u) − f (v) − u − v
≤ 1 ku − vk.
2
242 Chapter 6. Solving Nonlinear Problems
B(x0,r) f
r W=B(y0 ,r/2)
V x0 y0
g
Figure 2.2
and so
1
2 ku − vk ≤ kf (u) − f (v)k.
Writing f (u) = y and f (v) = z, we have
(∗) kg(y) − g(z)k ≤ 2ky − zk.
It follows that g is continuous (e.g., given ε > 0, take δ = ε/2).
Next, we check that g is differentiable. Fix y ∈ W and write g(y) = x; and we wish to prove
−1
that Dg(y) = Df (x) . Choose k sufficiently small that y + k ∈ W . Set g(y + k) = x + h, so
that h = g(y + k) − g(y). For ease of notation, write A = Df (x). We are to prove that
g(y + k) − g(y) − A−1 k
→0 as k → 0.
kkk
We consider instead the result of multiplying this quantity by (the fixed matrix) A:
A g(y + k) − g(y) − k Ah − k f (x + h) − f (x) − Df (x)h khk
= =− · .
kkk kkk khk kkk
We infer from (∗) that khk ≤ 2kkk, so as k → 0, it follows that h → 0 as well. Note, moreover,
that h 6= 0 when k 6= 0 (why?). Now we analyze the final product above: the first term approaches
0 by the differentiability of f ; the second is bounded above by 2. Thus, the product approaches 0,
as desired.
The last order of business is to see that g is C1 . We have
−1
Dg(y) = Df (g(y)) ,
so we see that Dg is the composition of the function y Df (g(y)) and the function A A−1
1
on the space of invertible matrices. Since g is continuous and f is C , the former is continuous.
By Exercise 6.1.7, the latter is continuous (indeed, we will prove much more in Corollary 5.19 of
Chapter 7 when we study determinants in detail). Since the composition of continuous functions
is continuous, the function y Dg(y) is continuous, as required.
§2. The Inverse and Implicit Function Theorems 243
Remark. More generally, with a bit more work, one can show that if f is Ck (or smooth), then
the local inverse g is likewise Ck (or smooth).
It is important to remember that this theorem guarantees only a local inverse function. It may
be rather difficult to determine whether f is globally one-to-one. Indeed, as the following example
shows, even if Df is everywhere invertible, the function f may be very much not one-to-one.
Example 2. Define f : R2 → R2 by
u eu cos v
f = .
v eu sin v
Then f is C1 , and
u eu cos v −eu sin v
Df =
v eu sin v eu cos v
is everywhere nonsingular, since its determinant is e2u6=0. Nevertheless,
since sin and cos are
u u
periodic, it is clear that f is not one-to-one: We have f =f for any integer k.
v v + 2πk
u x
On the other hand, if f = , then we apparently can solve for u and v:
v y
1
x 2log(x2 + y 2 )
(†) g =
y arctan(y/x)
x x
certainly satisfies f ◦g = . So, why is g not the inverse function of f ? Recall that
y y
arctan : R → (−π/2, π/2). So, as shown in Figure 2.3, if we consider the domain of f to be
v y
f
π/2
u x
−π/2
g
Figure 2.3
u x
: −π/2 < v < π/2 and the domain of g to be : x > 0 , then f and g will be inverse
v y
functions.
u x
Let’s calculate the derivative of any local inverse g according to Theorem 2.1. If f = ,
v y
then
−1 x y
x u e−u cos v e−u sin v +y 2 2 2
x +y . x2
Dg = Df = = y x
y v −e−u sin v e−u cos v− 2
x + y 2 x2 + y 2
Note that we get the same formula differentiating our specific inverse function (†). It is a bit
surprising that the derivative of any other inverse function, with different domain and range, must
be given by the identical formula. ▽
244 Chapter 6. Solving Nonlinear Problems
Now we are finally in a position to prove the Implicit Function Theorem, which first arose in
our informal discussion of manifolds in Section 5 of Chapter 4. It is without question one of the
most important theorems in higher mathematics.
W
)
y0
f
(
( ) ( )
V x0
g
Z
Figure 2.4
Remark. With not much more work, one can prove analogously that if F is Ck (or smooth),
then y is given locally as a Ck (or smooth) function of x. We may take this for granted in our later
work.
2 x
Example 3. Consider the function F : R → R, F = x3 ey + 2x cos(xy). We claim that
y
x x0 1
the equation F = 3 defines y locally as a function of x near the point = . By the
y y0 0
-6 -4 -2 0 2 4 6
-2
-4
-6
Figure 2.5
∂F 1
Implicit Function Theorem, Theorem 2.2, we need only check that 6= 0. Well,
∂y 0
∂F ∂F 1
= x3 ey − 2x2 sin(xy) and so = 1,
∂y ∂y 0
and so
0
1 2 1 1 0 1
DF −1
= 0 2 0 1 −1 .
1 1 −1 1 1 1
1
In particular, we see that
1 0 1
∂F
(a) = 0 1 −1 ,
∂y
1 1 1
which is easily checked to be nonsingular, and so the hypotheses of the Implicit Function Theorem,
Theorem 2.2, are fulfilled. There is a neighborhood of a in which we have y = φ(x). Moreover, we
have
−1 2 1 −1 2 1 −3 −5
0 ∂F ∂F
Dφ =− (a) (a) = − −1 0 10 2 = 1 2.
1 ∂y ∂x
−1 −1 1 1 −1 1 4
With this information, we can easily give the tangent plane at a of the surface F = 0. ▽
Remark . In general, we shall not always be so chivalrous (nor shall life) as to set up the
notation precisely as in the statement of Theorem 2.2. Just as in the case of linear equations where
the first r variables needn’t always be the pivot variables, here the last m variables needn’t always
be (locally) the dependent variables. In general, it is a matter of finding m pivots in some m
columns of the m × n derivative matrix.
§2. The Inverse and Implicit Function Theorems 247
EXERCISES 6.2
1. By applying the Inverse Function Theorem, Theorem 2.1, determine at which points x0 the
1
f has a local C inverse g, and calculate Dg(f (x0 )).
given function
x x2 − y 2
*a. f =
y 2xy2
x x/(x + y 2 )
b. f =
y y/(x2 + y 2 )
x x + h(y)
c. f = for any C1 function h : R → R
y y
x x + ey
d. f =
y y + ex
x x+y+z
e. f y = xy + xz + yz (cf. also Exercise 2)
z xyz
u u u+v
2. Let U = : 0 < v < u , and define f : U → R2 by f = .
v v uv
a. Show that f has a global inverse function g. Determine the domain of and an explicit
formula for g.
b. Calculate Dg both directly and by the formula given in the Inverse Function Theorem.
Compare your answers.
c. What does this exercise have to do with Example 2 in Chapter 4, Section 5? In particular,
give a concrete interpretation of your answer to part b.
1
3. Check that in each
of the following cases, the equation F = 0 defines y locally as a C function
x0
φ(x) near a = , and calculate Dφ(x0 ).
y0
x
a. F = y 2 − x3 − 2 sin π(x − y) , x0 = 1, y0 = −1
y
x1
1
*b. F x2 = e x 1 y 2
+ y cos x1 x2 − 1, x0 = , y0 = 0
2
y
x1
0
c. F x2 = e 1 + y arctan x2 − (1 + π/4), x0 =
x y 2 , y0 = 1
1
y
x 2 2 − y2 − 2
x − y 1
d. F y1 = 1 2 , x0 = 2, y0 =
x − y1 + y2 − 2 1
y
2
x1
x2 x21 − x22 − y13 + y22 + 4 2 2
e.
F = , x0 = , y0 =
y1 2 2
2x1 x2 + x2 − 2y1 + 3y2 + 8 4 −1 1
y2
defined.
x 1
5. Let F y = x2 + 2y 2 − 2xz − z 2 = 0. Show that near the point a = 1 , z is given implicitly
z 1
as a C1 function of x and y. Find the largest neighborhood of a on which this is true.
*6. Using the law of cosines (Exercise 1.2.12) and Theorem 2.2, show that the angles of a triangle
are C1 functions of the sides. To a small change in which one of the sides (keeping the other
two fixed) is an angle most sensitive?
9. Using the notation of Exercise 8, physical chemists define the expansion coefficient α and
isothermal compressibility β to be, respectively,
1 ∂V 1 ∂V
α= and β=− .
V ∂T p V ∂p T
*a. Calculate α and β for an idealgas.
∂p α
b. Show that in general we have = .
∂T V β
10. Check that, under the hypotheses in place in the proof of Theorem 2.1, if kxk = r, then
kf (x)k ≥ r/2. (Hint: Use Exercise 1.2.17.)
♯ 11. Let B = B(0, r) ⊂ Rn . Suppose U ⊂ Rn is an open subset containing the closed ball B,
f : U → Rn is C1 , f (0) = 0, and kDf (x)− Ik ≤ s < 1 for all x ∈ B. Prove that if kyk < r(1− s),
then there is x ∈ B such that f (x) = y.
§2. The Inverse and Implicit Function Theorems 249
12. Suppose U ⊂ Rn is open and f : U → Rm is C1 with f (a) = 0 and rank(Df (a)) = m. Prove
that for every c sufficiently close to 0 ∈ Rm the equation f (x) = c has a solution near a.
2 2
13. (Theenvelope
f: R
of a family of curves) Suppose × (a, b) → R is C and for each t ∈ (a, b),
x x
∇f 6= 0 on the level curve Ct = f = 0 . (Here the gradient denotes differentiation
t t
with respect only to x.) The curve C is called the envelope of the family of curves {Ct : t ∈
(a, b)} if each member
of the family
is tangent to C at some point (depending on t).
x0 ∂f x0
a. Suppose f = = 0 and the matrix
t0 ∂t t0
∂f x0 ∂f x0
∂x t0
2 ∂y t0
∂ f x0 ∂ f x0
2
∂x∂t t0 ∂y∂t t0
is nonsingular. Show that for some δ > 0, there is a C1 curve g : (t0 − δ, t0 + δ) → R2 with
g(t0 ) = x0 so that
g(t) ∂f g(t)
f = = 0.
t ∂t t
Figure 2.6
250 Chapter 6. Solving Nonlinear Problems
3. Manifolds Revisited
g W
M
p
Figure 3.1
If the curious reader wonders why the last (and obviously technical) condition is included in
the third definition, see Exercises 2 and 3.
§3. Manifolds Revisited 251
Theorem 3.1. The three criteria given in this definition are all equivalent.
so F(x) = 0 if and only if v = 0, which means that x = g(u). This proves that the equation F = 0
defines that portion of M given by g(u) for all u ∈ V1 . But because W ⊂ Wf , we know that such
points comprise all of M ∩ W .
The proof tells us to define F = H2 , and, indeed, this works. M is the zero-set of the function
x
y − x2
F : R3 → R2 given by F y = .
z − x3
z
We ask the reader to carry this procedure out in Exercise 6 in a situation where it will only work
locally. ▽
There are corresponding notions of the tangent space of the manifold M at p. (Recall that we
shall attempt to refer to the tangent space as a subspace, whereas the tangent plane is obtained by
translating it to pass through the point p.)
Definition. If the manifold M is presented in the three respective forms above, then its tangent
space at p, denoted Tp M , is defined as follows.
a
(1) assuming M is locally the graph of f with p = , then Tp M is the graph of Df (a);
f (a)
(2) assuming M is locally a level set of F, then Tp M = N([DF(p)]).
(3) assuming M is locally parametrized by g with p = g(a), then Tp M is the image of the
linear map Dg(a) : Rk → Rn .
Once again, we need to check that these three recipes all give the same k-dimensional subspace
of Rn . The ideas involved in this check have all emerged already in the preceding chapters. Since
(1) is a special case of (3) (why?), we need only check that N([DF(p)]) = image(Dg(a)). Note
that both of these are k-dimensional subspaces, because of our rank conditions on F and g. So it
suffices to show that image(Dg(a)) ⊂ N([DF(p)]). But this is easy: the function F◦ g : U → Rn−k
is identically 0, so, by the chain rule, DF(p)◦ Dg(a) = O, which says precisely that any vector in
the image of Dg(a) is in the kernel of DF(p).
EXERCISES 6.3
x
*1. Show that the set X = : y = |x| is not a 1-dimensional manifold, even though the
y
3
t
function g(t) = 3 gives a C1 “parametrization” of it. What’s going on?
|t |
cos 2t cos t
2. Show that the parametric curve g(t) = , t ∈ (−π/2, π/4), is not a 1-dimensional
cos 2t sin t
manifold. (Hint: Stare at Figure 3.2.)
4. Is the union of the hyperbola xy = 1 and its asymptote y = 0 a 1-dimensional manifold? Give
your reasoning.
§3. Manifolds Revisited 253
Figure 3.2
7. Show the equivalence of the three definitions for each of the following 2-dimensional manifolds:
a. implicit surface x2 + y 2 = 1 (in R3 )
b. implicit surface x2 + 2 2 3
y = z (in R − {0})
u cos v
*c. parametric surface u sin v , u > 0, v ∈ R
v
sin u cos v
d. parametric surface sin u sin v , 0 < u < π, 0 < v < 2π
cos u
sin u cos v
e. parametric surface sin u sin v , 0 < u < π, 0 < v < 2π
2 cos u
(3 + 2 cos u) cos v
f. parametric surface (3 + 2 cos u) sin v , 0 ≤ u, v ≤ 2π
2 sin u
8. a. Show that
x
X = y : (x2 + y 2 + z 2 )2 − 10(x2 + y 2 ) + 6z 2 + 9 = 0
z
is a 2-manifold.
254 Chapter 6. Solving Nonlinear Problems
x p
b. Check that y ∈ X ⇐⇒ ( x2 + y 2 − 2)2 + z 2 = 1. Use this to sketch X.
z
9. At what points is
x
X = y : (x2 + y 2 + z 2 )2 − 4(x2 + y 2 ) = 0
z
a smooth surface? Proof? Give the equation of its tangent space at such a point.
1. Multiple Integrals
In single-variable calculus the integral is motivated by the problem of finding the area under a
curve y = f (x) over aninterval
[a, b]. Now we want to find the volume of the region in R3 lying
x
under the graph z = f and over the rectangle R = [a, b] × [c, d] in the xy-plane. Once we see
y
how partitions, upper and lower sums, and the integral are defined for rectangles in R2 , then it is
simple (although notationally discomforting) to generalize to higher dimensions.
Mij mij
Rij
Figure 1.1
255
256 Chapter 7. Integration
(Note that the inequality L(f, P) ≤ U (f, P) is obvious, as mij ≤ Mij for all i and j.)
In higher dimensions, we proceed analogously, but the notation is horrendous. Let R = [a1 , b1 ]×
[a2 , b2 ] × · · · × [an , bn ] ⊂ Rn be a rectangle in Rn . We obtain a partition of R by dividing each of
the intervals into subintervals,
Rj1 j2 ...jn = [x1,j1 −1 , x1,j1 ] × [x2,j2 −1 , x2,j2 ] × · · · × [xn,jn−1 , xn,jn ] for some 1 ≤ js ≤ ks , s = 1, . . . , n.
We will usually suppress all the subscripts and just refer to the partition as {Ri }. We define the
volume of a rectangle R = [a1 , b1 ] × [a2 , b2 ] × · · · × [an , bn ] ⊂ Rn to be
Then upper sums, lower sums, and the integral are defined as before, substituting volume (of a
rectangle
Z in Rn ) for area (of a rectangle in R2 ). In dimensions n ≥ 3, we denote the integral by
f dV .
R
We need some criteria to detect integrability of functions. Then we will find soon that we can
evaluate integrals by reverting to our techniques from one-variable calculus.
Q
Q′
Figure 1.2
Lemma 1.1. Let P and P′ be partitions of a given rectangle R, and suppose P is a refinement
of P′ . Suppose f is a bounded function on R. Then we have
L(f, P′ ) ≤ L(f, P) ≤ U (f, P) ≤ U (f, P′ ).
Proof. It suffices to check the following: let Q be a single rectangle, and let Q = {Q1 , . . . , Qr }
be a partition of Q. Let m = inf x∈Q f (x), mi = inf x∈Qi f (x), M = supx∈Q f (x), Mi = supx∈Qi f (x).
Then we claim that
Xr Xr
marea(Q) ≤ mi area(Qi ) ≤ Mi area(Qi ) ≤ M area(Q).
i=1 i=1
This is immediate from the fact that m ≤ mi ≤ Mi ≤ M for all i = 1, . . . , r.
Corollary 1.2. If P′ and P′′ are two partitions of R, we have L(f, P′ ) ≤ U (f, P′′ ).
Proof. Let P be the partition of R formed by taking the union of the respective partitions in
each coordinate, as indicated in Figure 1.3. P is called the common refinement of P′ and P′′ . Then
by Lemma 1.1, we have
L(f, P′ ) ≤ L(f, P) ≤ U (f, P) ≤ U (f, P′′ ),
as required.
Proof. ⇐=: Suppose there were two different numbers I1 and I2 satisfying L(f, P) ≤ Ij ≤
U (f, P) for all partitions P. Choosing ε = |I2 − I1 | yields a contradiction.
=⇒: Now suppose f is integrable, so that there is a unique number I satisfying L(f, P) ≤ I ≤
U (f, P) for all partitions P. Given ε > 0, we can find a partitions P′ and P′′ so that
I − L(f, P′ ) < ε/2 and U (f, P′′ ) − I < ε/2.
(If we could not get as close as desired to I with upper and lower sums, we would violate uniqueness
of I.) Let P be the common refinement of P′ and P′′ . Then
L(f, P′ ) ≤ L(f, P) ≤ U (f, P) ≤ U (f, P′′ ),
so
U (f, P) − L(f, P) ≤ U (f, P′′ ) − L(f, P′ ) < ε,
258 Chapter 7. Integration
′′
partition P ′ of the rectangle R partition P of the rectangle R
Figure 1.3
as required.
We need to be aware of the basic properties of the integral (which we leave to the reader as
exercises):
Proposition 1.6. Suppose R = R′ ∪ R′′ is the union of two subrectangles. Then f is integrable
on R if and only if f is integrable on both R′ and R′′ , in which case we have
Z Z Z
f dV = f dV + f dV .
R R′ R′′
Proof. Given ε > 0, we must find a partition P of R so that U (f, P) − L(f, P) < ε. Since f
is continuous on the compact set R, it follows from Theorem 1.4 of Chapter 5 that f is uniformly
continuous. That means that given any ε > 0, there is δ > 0 so that whenever kx − yk < δ,
ε
x, y ∈ R, we have |f (x) − f (y)| < . Partition R into subrectangles Ri , i = 1, . . . , k, of
vol(R) √
diameter less than δ (e.g., whose sidelengths are less than δ/ n). Then on any such subrectangle
ε
Ri , we will have Mi − mi < , and so
vol(R)
k
X ε
U (f, P) − L(f, P) = (Mi − mi )vol(Ri ) < vol(R) = ε,
vol(R)
i=1
as needed.
Definition . We say X ⊂ Rn has (n-dimensional) volume zero if for every ε > 0, there are
P
s
finitely many rectangles R1 , . . . , Rs so that X ⊂ R1 ∪ · · · ∪ Rs and vol(Ri ) < ε.
i=1
Proof. Let ε > 0 be given. We must find a partition P of R so that U (f, P) − L(f, P) < ε.
Since f is bounded, there is a real number M so that |f | ≤ M . Because X has volume zero,
we can find finitely many rectangles R1′ , . . . , Rs′ , as shown in Figure 1.4, that cover X and satisfy
Ps
vol(Rj′ ) < ε/4M . We can also ensure that no point of X is a frontier point of the union of
j=1
these rectangles (see Exercise 2.2.8). Now create a partition of R in such a way that each of Rj′ ,
Figure 1.4
j = 1, . . . , s, will be a union of subrectangles of this partition, as shown in Figure 1.5. Consider the
Ss
closure of Y = R − Rj′ ; it too is compact, and f is continuous on Y , hence uniformly continuous.
j=1
Proceeding as in the proof of Proposition 1.7, we can refine the partition to obtain a partition
P = {R1 , . . . , Rt } of R with the property that
X ε
(Mi − mi )vol(Ri ) < .
2
Ri ⊂Y
Figure 1.5
f˜: R → R
f (x), x ∈ Ω
˜
f (x) = .
0, otherwise
Proof. Recall that to integrate f over Ω we must integrate f˜ over some rectangle R containing
Ω. The function f˜ is continuous on all of R except for the frontier of Ω, which is a set of volume
zero.
Z
n
Corollary 1.10. If Ω ⊂ R is a region, then vol(Ω) = 1dV is well-defined.
Ω
Proposition 1.11. Suppose f and g are integrable functions on the region Ω and f ≤ g. Then
Z Z
f dV ≤ gdV .
Ω Ω
§1. Multiple Integrals 261
Proof. Let R be a rectangle containing Ω and let f˜ and g̃ be the functions as defined above.
Then we have f˜ ≤ g̃ everywhere
Z on R.
Z Then, Zapplying Propositions 1.4 and 1.5, the function
h = g̃ − f˜ is integrable and hdV = gdV − f dV . On the other hand, since h ≥ 0, for any
R Ω Ω Z
partition P of R, the lower sum L(h, P) ≥ 0, and therefore hdV ≥ 0. The desired result now
R
follows immediately.
EXERCISES 7.1
0, 0 ≤ y ≤ 1
x 2
*1. Suppose
Z f = . Prove that f is integrable on R = [0, 1] × [0, 1] and find
y 1, 1 < y ≤ 1
2
f dA.
R
2. Show directly that the function
1,
x x=y
f =
y 0, otherwise
Z
is integrable on R = [0, 1] × [0, 1] and find f dA. (Hint: Partition R into 1/N by 1/N
R
squares.)
Z
*8. Check that f dV is well-defined. That is, if R and R′ are two rectangles containing Ω and
Ω
f˜ and f˜′ are the corresponding
Z functions,
˜ ˜′
Z check that f is integrable over R if and only if f is
integrable over R′ and that f˜dV = f˜′ dV .
R R′
9. a. Prove Proposition 1.4. (Hint: If P = {Ri } is a partition and mfi , mgi , mfi +g , Mif , Mig ,
Mif +g denote the obvious, show that
mfi + mgi ≤ mfi +g ≤ Mif +g ≤ Mif + Mig .
Z Z
It will also be helpful to see that f dV + gdV is the unique number between L(f, P)+
R R
L(g, P) and U (f, P) + U (g, P) for all partitions P.)
b. Prove Proposition 1.5.
c. Prove Proposition 1.6.
♯ 10. Suppose f is integrable on R. Given ε > 0, prove there is δ > 0 so that whenever all the
rectangles of a partition P have diameter less than δ, we have U (f, P) − L(f, P) < ε. (Hint:
By Proposition 1.3, there is a partition P′ (as indicated by the darker lines in Figure 1.6) so
that U (f, P′ ) − L(f, P′ ) < ε/2. Show that covering the dividing hyperplanes (of total area A)
Figure 1.6
of the partition by rectangles of diameter < δ requires at most volume 2Aδ. If |f | ≤ M , then
we can pick δ so that that total volume is at most ε/4M . Show that this δ works.)
♯ 11. Let X ⊂ Rn be a set of volume 0.
a. Show that for every ε > 0, there are finitely many cubes C1 , . . . , Cr so that X ⊂ C1 ∪· · ·∪Cr
Pr
and vol(Ci ) < ε. (Hint: If R is a rectangle with vol(R) < δ, show that there is a
i=1
rectangle R′ containing R with vol(R′ ) < δ and whose sidelengths are rational numbers.)
§1. Multiple Integrals 263
b. Let T : Rn → Rn be a linear map. Prove that T (X) has volume 0 as well. (Hint: Show
that there is a constant k so that for any cube C, the image T (C) is contained in a cube
whose volume is at most k times the volume of C.) Query: What goes wrong with this if
T : Rn → Rm and m < n?
♯ 12. Let m < n, let X ⊂ Rm be compact and U ⊂ Rm an open set containing X. Suppose
φ : U → Rn is C1 . Prove φ(X) has volume 0 in Rn . (Hints: Take X ⊂ C, where C is a cube.
Show that if N is sufficiently large and we divide C into N m subcubes, then X is covered by
such cubes all contained in U ,1 and φ(X) will be contained in at most N m cubes in Rn . Argue
by continuity of Dφ that there is a constant k (not depending on N ) so that each of these will
have volume less than (k/N )n .)
13. We’ve seen in Proposition 1.8 a sufficient condition for f to be integrable. Show that it isn’t
necessary by considering the famous function f : [0, 1] → R given by
1 , x = p in lowest terms
f (x) = q q
.
0, otherwise
14. A subset X ⊂ Rn has measure zero if, given any ε > 0, there is a sequence of rectangles R1 ,
R2 , R3 , . . . , Rk , . . . , so that
∞
[ ∞
X
X⊂ Ri and vol(Ri ) < ε.
i=1 i=1
Prove that o(f, a) makes sense (i.e., the limit exists) and is nonnegative; it is called the
oscillation of f at a. Prove that f is continuous at a if and only if o(f, a) = 0.
b. For any ε > 0, set Dε = {x ∈ R : o(f, x) ≥ ε}, and let D = {x ∈ R : f is discontinuous at x}.
Show that D = D1 ∪ D1/2 ∪ D1/3 ∪ · · · and that Dε is a closed set.
1
This follows from Exercise 5.1.13.
264 Chapter 7. Integration
c. Suppose that f is integrable on R. Prove that for any k ∈ N, D1/k has volume 0. Deduce
that if f is integrable on R, then D has measure 0. (Hint: Use Exercise 14.)
d. Conversely, prove that if D has measure 0, then f is integrable. (Hints: Choose ε > 0 and
apply the convenient criterion. If D has measure 0, then so has Dε , and so it has volume
0 (why?). Create a partition consisting of rectangles disjoint from Dε and of rectangles of
small total volume that cover Dε .)
In one-variable integral calculus we learned that we could compute the volume of a solid region
by slicing it by parallel planes and integrating the cross-sectional area. In particular, given a
R = [a, b] × [c, d], if we are interested in finding the volume over R and under the graph
rectangle
x
z=f , we could slice by planes perpendicular to the x-axis, as shown in Figure 2.1, obtaining
y
c
a d
Figure 2.1
Z b
volume = cross-sectional area at x dx
a
Z b Z d ! Z bZ d
x x
= f dy dx = f dydx
a c y a c y
| {z }
x fixed
This expression is called an iterated integral. Perhaps it would be more suggestive to call it a nested
integral. Calculating iterated integrals reverts to one-variable calculus skills (finding antiderivatives
and applying the Fundamental Theorem of Calculus) along with a healthy dose of neat bookkeeping.
Example 1.
Z 1Z 2 Z 1 i2
2
1 + x + xy dydx = (1 + x2 )y + 21 xy 2 dx
0 1 0 | {z } y=1
x fixed
Z 1 i1
= 1 + x2 + 23 x dx = x + 13 x3 + 34 x2
0 x=0
§2. Iterated Integrals and Fubini’s Theorem 265
1 3 25
=1+ + = . ▽
3 4 12
Examples 2. Let’s investigate an obvious question.
Z 1Z 2
2
(a) We wish to evaluate xyex+y dxdy.
−1 0
Z Z ! Z
1 2
2
1 2
i2 R
xyex+y dx dy = yey (x − 1)ex dy (recalling that xex dx = xex − ex )
−1 0 −1 x=0
Z 1 i1
2 2
= yey (e2 + 1)dy = 12 (e2 + 1)ey = 0.
−1 y=−1
Z 2Z 1
2
(b) Now let’s consider xyex+y dydx.
0 −1
Z !
2Z 1 Z 2 Z 1
x+y 2 y2
xye dydx = (xex )(ye )dy dx
0 −1 0 −1
Z !
2 i1
1 x y2
= 2 (xe )e dx
0 y=−1
= 0.
2
More to the point, we should observe that for fixed x, the function (xex )(yey ) is an odd
function of y, and hence the integral as y varies from −1 to 1 must be 0.
We shall prove in a moment that for reasonable functions the iterated integrals in either order are
equal, and so it behooves us to think a minute about symmetry (or about the difficulty of finding
an antiderivative) and choose the more convenient order of integration. ▽
2
Example 3.Suppose
we wish over the triangle Ω ⊂ R
to find the volume of the region lying
0 1 1 x
with vertices at , , and and bounded above by z = f = xy. Then we wish to
0 0 1 y
find the integral of f over the region Ω. By definition, we consider Ω as a subset of, say, the square
R = [0, 1] × [0, 1] and define f˜: R → R by
xy, x
x ∈Ω
f˜ = y ,
y
0, otherwise
˜ x
whose graph is sketched in Figure 2.2. Note that for x fixed, f = xy when 0 ≤ y ≤ x and is 0
y
otherwise. So Z Z Z Z
1 x 1 x
x
f˜ dy = xydy + 0dy = xydy.
0 y 0 x 0
Thus, we have
Z !
1Z 1 Z 1 Z x
x
f˜ dydx = xydy dx
0 0 y 0 0
Z !
1 ix
1 2
= 2 xy dx
0 y=0
266 Chapter 7. Integration
x
Ω
Figure 2.2
Z 1
1 3 1
= 2 x dx = . ▽
0 8
Example 4. Suppose we slice into a cylindrical tree trunk, x2 + y 2 ≤ a2 , and remove the wedge
Figure 2.3
bounded below by z = 0 and above by z = y, as depicted in Figure 2.3. What is the volume of the
chunk we remove?
We see that the plane z = y lies above the plane z = 0 when y ≥ 0, so we let Ω =
y=√a 2 −x 2
Ω
x
−a a
Figure 2.4
§2. Iterated Integrals and Fubini’s Theorem 267
x
: x2 + y2 ≤ a2 , y ≥ 0 , as indicated in Figure 2.4, and to obtain the volume we calculate:
y
Z Z Z √ Z !
a a2 −x2 a i√a2 −x2
1 2
ydA = ydydx = 2 y y=0 dx
Ω −a 0 −a
Z a Z a
1 2
= (a2 − x2 )dx = (a2 − x2 )dx = a3 . ▽
2 −a 0 3
The fact that we can compute volume using either a multiple integral or an iterated integral
suggests that, at least for “reasonable” functions, we should in general be able to calculate multiple
integrals by computing iterated integrals. The crucial theorem that allows us to calculate multiple
integrals with relative ease is the following
Corollary 2.2. Suppose f is integrable on the rectangle R = [a, b] × [c, d] and the iterated
integrals
Z bZ d Z dZ b
x x
f dydx and f dxdy
a c y c a y
Z d
x
both exist. (That is, for each x, the integral f
dy exists and defines a function of x that is
c y
Z b
x
integrable on [a, b]. And, likewise, for each y, the integral f dx exists and defines a function
a y
of y that is integrable on [c, d].) Then
Z bZ d Z Z dZ b
x x
f dydx = f dA = f dxdy.
a c y R c a y
R = [a1 , b1 ] × · · · × [an , bn ].
all exist. Then the multiple integral and the iterated integral are equal:
Z Z b1 Z bn
f (x)dV = ... f (x)dxn · · · dx1 .
R a1 an
(The same is true for the iterated integral in any order, provided all the intermediate integrals
exist.) In particular, whenever f is continuous on R, then the multiple integral equals any of the
n! possible iterated integrals.
§2. Iterated Integrals and Fubini’s Theorem 269
Example 5. It is easy to find a function f on the rectangle R = [0, 1] × [0, 1] that is integrable
but whose iterated integral doesn’t exist. Take
x 1, x = 0, y ∈ Q
f = .
y 0, otherwise
Z 1 Z
0
The integral f dy does not exist, but it is easy to see that f is integrable and f dA = 0.
0 y R
▽
Example 6. It is somewhat harder to find a function whose iterated integral exists but that
is not integrable. Let
x 1, y∈Q
f = .
y 2x, y ∈ /Q
Z 1 Z 1Z 1
x x
Then f dx = 1 for every y ∈ [0, 1], so the iterated integral f dxdy exists and
0 y 0 0 y
equals 1. Whether f is integrable on R = [0, 1] × [0, 1] is more subtle. Probably the easiest way
to see that it is not is this: if it were, by Proposition 1.6, then it would also be integrable on
R′ = [0, 12 ] × [0, 1]. For any partition P of R′ , we have U (f, P) = 21 , whereas we can make L(f, P)
Z 1 Z 1/2
1
as close to 2xdxdy = as we wish.
0 0 4
We
Z 1Z 1 ask the reader to decide in Exercise 4 whether the other iterated integral,
x
f dydx, exists.
0 0 y
Example 7. More subtle yet is a nonintegrable function on R = [0, 1] × [0, 1] both of whose
iterated integrals exist. Define
x 1, x = m n
q and y = q for some m, n, q ∈ N with q prime
f = .
y 0, otherwise
First of all, f is not integrable on R since L(f, P) = 0 and U (f, P) = 1 for every partition P of R
Z 1
x
(see Exercise 5). Next, we claim that for any x, f dy exists and equals 0. When x ∈ / Q,
y
0
m x
this is obvious. When x = , only for finitely many y ∈ [0, 1] is f not equal to 0, and so
q y
Z 1Z 1
x
the integral exists. Obviously, then, the iterated integral f dydx exists. The same
0 0 y
argument applies when we reverse the order. ▽
Example 8 (Changing the order of integration). You are asked to evaluate the iterated integral
Z 1Z 1
sin x
dxdy.
0 y x
270 Chapter 7. Integration
Z
sin x
It is a classical fact that dx cannot be evaluated in elementary terms, and so (other than
x
resorting to numerical integration) we are stymied. To be careful, we define
sin x
x , x 6= 0
f = x .
y 1, x=0
which is the triangle pictured in Figure 2.5. Once we have a picture of Ω, we see that we can
1
y=x
x=y x=1
Ω Ω
y=0 1
Figure 2.5
The moral of this story is that, when confronted by an iterated integral that cannot be evaluated
in elementary terms, it doesn’t hurt to change the order of integration and see what happens. ▽
Example 9. Let Ω ⊂ R3 be the region in the first octant bounded belowZ by the paraboloid
z = x2 + y 2 and above by the plane z = 4, shown in Figure 2.6. Evaluate xdV . It is most
Ω
natural to integrate first with respect to z; notice that the projection of Ω onto the xy-plane is the
§2. Iterated Integrals and Fubini’s Theorem 271
z=x2+y2
Figure 2.6
quarter
of the disk of radius 2 centered at the origin lying in the first quadrant. For each point
x
in that quarter-disk, z varies from x2 + y 2 to 4. Thus, we have
y
Z Z √
2Z 4−x2 Z 4
xdV = xdzdydx
Ω 0 0 x2 +y 2
Z √
2Z 4−x2
= x 4 − (x2 + y 2 ) dydx
0 0
Z 2
= x (4 − x2 )3/2 − 31 (4 − x2 )3/2 dx
0
i2 64
2
= − 15 (4 − x2 )5/2 = .
0 15
We will revisit this example in Section 3. ▽
x3
x2
x1
Figure 2.7
Z 1 Z x1 Z xn−1
vol(Ω) = ... dxn · · · dx2 dx1
0 0 0
Z 1 Z x1 Z xn−2
= ... xn−1 dxn−1 · · · dx2 dx1
0 0 0
272 Chapter 7. Integration
Z 1 Z x1 Z xn−3
1 2
= ... 2 xn−2 dxn−2 · · · dx2 dx1
0 0 0
Z 1
1 1
= ··· = x1n−1 dx1 = ▽
0 (n − 1)! n!
EXERCISES 7.2
Z
1. Evaluate the integrals f dV for the given function f and rectangle R:
R
x
*a. f = ex cos y, R = [0, 1] × [0, π2 ]
y
x y
b. f = , R = [1, 3] × [2, 4]
y
x
x x
*c. f = 2 , R = [0, 1] × [1, 3]
y x +y
x
d. f y = (x + y)z, R = [−1, 1] × [1, 2] × [2, 3]
z
Z
2. Interpret each of the following iterated integrals as a double integral f dA for the appropriate
Ω
region Ω, sketch Ω, and change the order of integration. (You may assume f is continuous.)
Z 1Z 1
x
a. f dydx
y
Z0 1 Zx2x
x
*b. f dydx
y
Z0 2 Z0 4
x
c. f dxdy
1 y 2√ y
Z Z 1 2
1−y
x
d. f dxdy
y
Z−1 0
1Z x
x
e. f
dydx
x2 y
Z0 2 Z x+2
x
*f. f dydx
−1 x2 y
3. Z
Evaluate each of the following iterated integrals. In addition, interpret each as a double integral
f dA, sketch the region Ω, change the order of integration and evaluate the alternative
Ω
iterated
Z 1 Zintegral.
x
a. (x + y)dydx
0 0√
Z 1 Z 1−y2
b. √ ydxdy
0 − 1−y 2
Z 1Z x
x
*c. dydx
0 x 2 1 + y2
§2. Iterated Integrals and Fubini’s Theorem 273
Z 1Z 1
x
4. Given the function f in Example 6, does the iterated integral f dydx exist?
0 0 y
5. Check that for the function f defined in Example 7, for every partition P of R,
U (f, P) = 1 and L(f, P) = 0. (Hint: Show that for every δ > 0, if 1/q < δ, then every
interval of length δ in [0, 1] contains a point of the form k/q.)
6. Let
1 , x = p in lowest terms, y ∈ Q
x
f = q q
.
y 0, otherwise
Decide whether f is integrable on R = [0, 1] × [0, 1] and whether the iterated integrals
Z 1Z 1 Z 1Z 1
x x
f dxdy and f dydx
0 0 y 0 0 y
exist.
8. Evaluate
Z 4 Zthe following iterated integrals:
2
1
*a. √ 1 + x3
dxdy
0 y
Z 1Z 1
4
b. √
ey dydx
3
Z0 1 Z 1x
c. √
ey/x dxdy (Be careful: why does the double integral even exist?)
0 y
9. Find the volume of the region in the first octant of R3 bounded below by the xy-plane, on the
sides by x = 0 and y = 2x, and above by y 2 + z 2 = 16.
10. Find the volume of the region in the R3 bounded below by the xy-plane, above by z = y, and
on the sides by y = 4 − x2 .
*11. Find the volume of the region in R3 bounded by the cylinders x2 + y 2 = 1 and x2 + z 2 = 1.
Z
12. Interpret each of the following iterated integrals as a triple integral f dV for the appropriate
Ω
region Ω, sketch Ω, and change the order of integration so that the innermost integral is taken
with respect to y. (You may
assume f is continuous.)
Z 1 Z 1−x Z 1−x−y x
*a. f y dzdydx
0 0 0 z
Z 1Z 1−x2
y Z
x
b. f y dzdydx
0 0 0 z
Z 1 Z √1−x2 Z 1 x
c. √ √ f y dzdydx
−1 − 1−x2 x2 +y 2 z
274 Chapter 7. Integration
Z 1 Z 1−x2 Z x+y x
*d. f y dzdydx
0 0 0 z
Z 1Z 1−x Z x+y x
e. f y dzdydx
0 0 0 z
*13. Suppose a, b, and c are positive. Find the volume of the tetrahedron bounded by the coordinate
planes and the plane x/a + y/b + z/c = 1.
*15. Let Ω ⊂ R3 be the portion of the cube 0Z ≤ x, y, z ≤ 1 lying above the plane y + z = 1 and
below the plane x + y + z = 2. Evaluate xdV .
Ω
16. Let
x−y x
. f =
(x + y)3 y
Z 1Z 1 Z 1Z 1
x x
Calculate the iterated integrals f dxdy and f dydx. Explain your re-
0 0 y 0 0 y
sults.
Decide if both iterated integrals exist and if Zthey are equal. Is f integrable on R? (Hint: To
see where this function came from, calculate f dA.)
1
[ k+1 , k1 ]×[ ℓ+1
1
, 1ℓ ]
19. Assume f is C2 . Prove Theorem 6.1 of Chapter 3 by applying Fubini’s Theorem. (Hint: Proceed
by contradiction: if the mixed partials are not equal at some point, apply Exercise 2.3.5 to show
∂2f ∂2f
we can find a rectangle on which, say, > . Exercise 7.1.5 may also be useful.)
∂x∂y ∂y∂x
♯ 20. ∂f
(Differentiating under the integral sign) Suppose f : [a, b] × [c, d] → R is continuous and is
Z d ∂x
x
continuous. Define F (x) = f dy.
c y
a. Prove that F is continuous. (Hint: You will need to use uniform continuity of f .)
Z d
′ ∂f x
b. Prove that F is differentiable and that F (x) = dy. (Hint: Let φ(t) =
Z d Z x c ∂x y
∂f t
dy, and let Φ(x) = φ(t)dt. Show that φ is continuous and that F (x) =
c ∂x y a
Φ(x) + const.)
Z 1 x Z 1
y −1 ′ y−1
21. Let F (x) = dy. Use Exercise 20 to calculate F (x) and prove that dy =
0 log y 0 log y
F (1) = log 2.
Z x 2 Z 1 −x2 (t2 +1)
e
−t2
22. Let f (x) = e dt and g(x) = dt.
0 0 t2 + 1
a. Using Exercise 20 as necessary, prove that f ′ (x) + g ′ (x) = 0 for all x.
Z ∞ Z N
−t2 2 √
b. Prove that f (x)+g(x) = π/4 for all x. Deduce that e dt = lim e−t dt = π/2.
0 N →∞ 0
∂f
23. Suppose f : [a, b] × [c, d] → R is continuous and is continuous. Suppose g : [a, b] → (c, d) is
Z g(x) ∂x
x
differentiable. Let h(x) = f dy. Use the chain rule and Exercise 20 to show that
c y
Z g(x)
′ ∂f x x
h (x) = dy + f g′ (x).
c ∂x y g(x)
Z z
x x
(Hint: Consider F = f dy.)
z c y
x Z
dy
24. Evaluate 2 2
. Use Exercise 23 to evaluate
Z x 0 x +y Z x
dy dy
*a. 2 2 2
b. 2 2 3
0 (x + y ) 0 (x + y )
Z x
25. Suppose f is continuous. Let h(x) = sin(x − y)f (y)dy. Show that h(0) = h′ (0) = 0 and
0
h′′ (x) + h(x) = f (x).
In this section we introduce three extremely useful alternative coordinate systems in two and
three dimensions. We treat the question of changes of variables in multiple integrals intuitively
here, leaving the official proofs for Section 6.
x2+y2=2
x2+y2=1
S
Figure 3.1
Z
x
Suppose one wished to calculate f dA, where S is the annular region between two
S y
concentric circles, as shown in Figure 3.1. As we quickly realize if we try to write down iterated
integrals in xy-coordinates, although it is not impossible to evaluate them, it is far from a pleasant
task. It would be much more sensible to work in a coordinate system that is built around the radial
symmetry. This is the place of polar coordinates.
Polar coordinates on the xy-plane are defined as follows: As shown in Figure 3.2, let r =
p x
x2 + y 2 denote the distance of the point from the origin, and let θ denote the angle from
y
x
r y
θ
Figure 3.2
the positive x-axis to the vector from the origin to the point. Ordinarily, we adopt the convention
that
r≥0 and 0 ≤ θ < 2π or − π ≤ θ < π.
It is better to express x and y in terms of r and θ, and we do this by means of the mapping
g : [0, ∞) × [0, 2π) → R2
§3. Polar, Cylindrical, and Spherical Coordinates 277
r r cos θ x
g = = .
θ r sin θ y
Z
x
To evaluate a double integral f dA in polar coordinates, we first determine the region
S y
Ω in the rθ-plane that maps to S. We substitute x = r cos θ and y = r sin θ, and then realize that a
little rectangle ∆r by ∆θ in the rθ-plane maps to an “annular chunk” whose area is approximately
θ y
r∆θ
∆θ g ∆r
∆r r x
Figure 3.3
∆r(r∆θ) in the xy-plane (see Figure 3.3). That is, partitioning the region Ω into little rectangles
corresponds to “partitioning” S into such annular pieces. Summing over all the subrectangles of a
partition suggests a formula like
Z Z
x r cos θ
f dA = f rdrdθ.
S y Ω r sin θ
Example 2. Let S ⊂ R2 be the region inside the circle x2 + y 2 = 9, below the Z line y = x,
above the x-axis, and lying to the right of x = 1, as shown in Figure 3.4. Evaluate xydA. We
S
begin by finding the region Ω in the rθ-plane that maps to S, as shown in Figure 3.5. Clearly θ
goes from 0 to π/4, and for each fixed θ, we see that r starts at r = sec θ (as we enter S at the line
x = 1) and increases to r = 3 (as we exit S at the circle). (We think naturally of determining r as
278 Chapter 7. Integration
y
x=1
y=x
S
x
x2+y2=9
Figure 3.4
y
x=1
y=x
θ
r=sec θ S
g
π/4
Ω x
3 r x2+y2=9
Figure 3.5
a function of θ, so that naturally we would place θ on the horizontal axis and r on the vertical; for
reasons we’ll see in Chapter 8, this is not a good idea.)
Therefore, we have
Z Z π/4 Z 3
xydA = (r| cos
{z θ})(r| sin
{z θ}) rdrdθ
| {z }
S 0 sec θ x y dA
Z π/4 Z 3
= r 3 cos θ sin θdrdθ
0 sec θ
Z π/4
1
= (81 − sec4 θ) cos θ sin θdθ
4 0
Z π/4
1 sin θ
= 81 cos θ sin θ − dθ
4 0 cos3 θ
1 2 1 π/4 79
= 81 sin θ − = . ▽
8 cos2 θ 0 16
Z ∞
2
Example 3. We wish to evaluate the improper integral e−x dx. This “Gaussian integral”
0
is ubiquitous in probability, statistics, and statistical mechanics. Although one way of doing so was
§3. Polar, Cylindrical, and Spherical Coordinates 279
given in Exercise 7.2.22, the approach we take here is more amenable to generalization.
Taking advantage of the property ea+b = ea eb , we exploit radial symmetry by calculating instead
the double integral
Z ∞ 2 Z ∞ Z ∞ Z
−x2 −x2 −y 2 2 2
e dx = e e dydx = e−(x +y ) dA.
0 0 0 [0,∞)×[0,∞)
N
√
N 2
Figure 3.6
2 2
as Figure 3.6 suggests, that the integral of e−(x +y ) over the square [0, N ] × [0, N ] lies between the
√
integral over the quarter-disk of radius N and the integral over the quarter-disk of radius N 2,
both of which approach π/4. ▽
In general, it is good to use polar coordinates when either the form of the integrand or the
shape of the region recommends it.
280 Chapter 7. Integration
Next we come to three dimensions. Cylindrical coordinates r, θ, z are merely polar coordinates
(used in the xy-plane) along with the cartesian coordinate z:
The intuitive argument we gave earlier for polar coordinates suggests now that a little rectangle ∆r
z z
∆z
∆z
g
∆r ∆r
∆θ θ r∆θ
y
r x
Figure 3.7
Indeed, as suggested by the last integral above, it is almost always preferable to set up an iterated
integral with dz innermost, and then the usual rdrdθ outside (integrating over the projection of Ω
onto the xy-plane).
Thus, we have
Z Z π/2 Z 2 Z 4
xdV = |r cos
{z θ} rdzdrdθ
| {z }
S 0 0 r2 x dV
Z π/2 Z 2 Z 4
= r 2 cos θdzdrdθ
0 0 r2
Z π/2 Z 2
= r 2 cos θ(4 − r 2 )drdθ
0 0
§3. Polar, Cylindrical, and Spherical Coordinates 281
Z π/2
64 64
= cos θdθ = ,
0 15 15
which, reassuringly, is the same answer we obtained earlier. ▽
Example 5. Let S be the region bounded above by theZ paraboloid z = 6 − x2 − y 2 and below
p
by the cone z = x2 + y 2 , as pictured in Figure 3.8. Find zdV . The symmetry of S about the
S
z=6—r2
z=r
r=2
Figure 3.8
z-axis makes cylindrical coordinates a natural. The surfaces z = 6 − r 2 and z = r intersect when
r = 2, so we see that S is the image under g of the region
r
Ω=
θ : 0 ≤ r ≤ 2, 0 ≤ θ ≤ 2π, r ≤ z ≤ 6 − r 2
.
z
Thus, we have
Z Z 2π Z 2 Z 6−r 2
zdV = z |rdzdrdθ
{z }
S 0 0 r
dV
Z 2π Z 2
= 1
2 (6 − r 2 )2 − r 2 rdrdθ
0 0
Z 2
92
=π (36 − 13r 2 + r 4 )rdr = π. ▽
0 3
Last, we come to spherical coordinates: ρ represents the distance from the origin to the point,
φ the angle from the positive z-axis to the vector from the origin to the point, and θ the angle from
the positive x-axis to the projection of that vector into the xy-plane. That is, in some sense, φ
specifies the latitude of the point and θ specifies its longitude. (As shown in Figure 3.9, when ρ and
φ are held constant, we get a circle parallel to the xy-plane; when ρ and θ are held constant, we get
a great circle going from the north pole to the south pole.) Notice that we make the convention
that
ρ ≥ 0, 0 ≤ φ ≤ π, and 0 ≤ θ ≤ 2π.
282 Chapter 7. Integration
φ constant
φ
y
θ
x θ constant
Figure 3.9
z
ρ sin φ
φ ρ cos φ
θ
y
x ρ sin φ cos θ
ρ sin φ sin θ
Figure 3.10
As usual, we use basic trigonometry to express x, y, and z in terms of our new coordinates ρ,
φ, and θ (see also Figure 3.10):
θ z
∆θ
g ρ∆φ
∆ρ φ
∆φ φ ∆ρ
y
ρ sin ρ sinφ ∆θ
θ φ
ρ x
Figure 3.11
§3. Polar, Cylindrical, and Spherical Coordinates 283
of volume approximately
ρ=a
φ0 φ=φ0
Figure 3.12
ρ
Ω = φ : 0 ≤ ρ ≤ a, 0 ≤ φ ≤ φ0 , 0 ≤ θ ≤ 2π ,
θ
where φ0 = arctan(1/c).
The volume of S is calculated using spherical coordinates as follows:
Z Z 2π Z φ0 Z a
vol(S) = 1dV = ρ2 sin φdρdφdθ
S 0 0 0
2π 3 2π 3 c
= a (1 − cos φ0 ) = a 1− √ .
3 3 1 + c2
Z
We can calculate zdV as well:
S
Z Z 2π Z φ0 Z a
zdV = (ρ cos φ) ρ2 sin φdρdφdθ
S 0 0 0 | {z } | {z }
z dV
Z 2π Z φ0 Z a
= ρ3 sin φ cos φdρdφdθ
0 0 0
π 4 2 π 4 1
= a sin φ0 = a . ▽
4 4 1 + c2
284 Chapter 7. Integration
0 Z
Example 7. Let S be the sphere of radius a centered at 0 . We wish to evaluate z 2 dV .
a S
We observe first that, by Exercise 1.2.14, the triangle shown in Figure 3.13 is a right triangle, and
so the equation of the sphere is ρ = 2a cos φ, 0 ≤ φ ≤ π/2. So we have
ρ
φ
Figure 3.13
Z Z 2π Z π/2 Z 2a cos φ
z 2 dV = (ρ2 cos2 φ) ρ2 sin φdρdφdθ
S 0 0 0 | {z } | {z }
z2 dV
Z 2π Z π/2 Z 2a cos φ
= ρ4 cos2 φ sin φdρdφdθ
0 0 0
Z π/2
64 5 8 5
= πa cos7 φ sin φdφ = πa . ▽
5 0 5
EXERCISES 7.3
2. Find the area of the region bounded on the left by x = 1 and on the right by x2 + y 2 = 4.
Check your answer with simple geometry.
Figure 3.14
Z
1
4. For ε > 0, let Sε = {x : ε ≤ kxk ≤ 1} ⊂ Evaluate lim R2 . p dA. (This is often
Z ε→0+ Sε x2 + y2
expressed as the improper integral (x2 + y 2 )−1/2 dA.)
B(0,1)
Z
*5. Let S be the annular region shown in Figure 3.1. Evaluate y 2 dA
S
a. directly Z
b. by instead calculating (x2 + y 2 )dA.
S
Z
*6. Calculate y(x2 + y 2 )−5/2 dA , where S is the plane region lying above the x-axis, bounded
S
on the left by x = 1 and above by x2 + y 2 = 2.
Z
7. Calculate (x2 + y 2 )−3/2 dA , where S is the plane region bounded below by y = 1 and above
S
by x2 + y 2 = 4.
x y2
8. Let f =p . Let S be the planar region lying inside the circle x2 + y 2 = 2x, above
y x2 + y 2 Z
the x-axis, and to the right of x = 1. Evaluate f dA.
S
Z 1Z 1
xex
*9. Evaluate dxdy.
0 y x2 + y 2
10. Find the volume of the region bounded above by z = 2y and below by z = x2 + y 2 .
11. Find the volume of the “doughnut with no hole,” ρ = sin φ, pictured in Figure 3.15.
14. Find the volume of the region inside both x2 + y 2 + z 2 = 4a2 and x2 + y 2 = 2ay.
15. Find the volume of the region bounded above by x2 + y 2 + z 2 = 2 and below by z = x2 + y 2 .
16. Find the volume of the region inside the sphere x2 + y 2 + z 2 = a2 by integrating in
a. cylindrical coordinates
286 Chapter 7. Integration
Figure 3.15
b. spherical coordinates
*17. Find the volume of a right circular cone of base radius a and height h by integrating in
a. cylindrical coordinates
b. spherical coordinates
p
18. Find the volume of the region lying above the cone z = x2 + y 2 and inside the sphere
x2 + y 2 + z 2 = 2 by integrating in
a. cylindrical coordinates
b. spherical coordinates
19. Find the volume of the region lying above the plane z = a and inside the sphere
x2 + y 2 + z 2 = 4a2 by integrating in
a. cylindrical coordinates
b. spherical coordinates
Z
3
*20. Let S ⊂ R be the unit ball. Use symmetry principles to compute x2 dV as easily as possible.
S
Z
2 +y 2 +z 2 )
21. a. Evaluate e−(x dV .
Z
R3
2 +2y 2 +3z 2 )
b. Evaluate e−(x dV .
R3
*22. Find the volume of the region in R3 bounded above by the plane z = 3x + 4y and below by the
paraboloid z = x2 + y 2 .
Z
z
23. Evaluate 2 2 2 3/2
dV , where S is the region bounded below by the sphere x2 + y 2 +
S (x + y + z )
z 2 = 2z and above by the sphere x2 + y 2 + z 2 = 1.
24. Find the volume of the region in R3 bounded by the cylinders x2 + y 2 = 1, y 2 + z 2 = 1, and
x2 + z 2 = 1. (Hint: Make full use of symmetry.)
4. Physical Applications
So far we have focused on area and volume as our interpretation of the multiple integral. Now
we discuss average value and mass (which have both physical and probabilistic interpretations),
§4. Physical Applications 287
Now let’s suppose that f is bounded. Then, as usual, mi ≤ f (xi ) ≤ Mi for each i = 1, . . . , k, and
so
1 1
L(f, Pk ) ≤ y (k) ≤ U (f, Pk )
b−a b−a
for every uniform partition Pk of the interval [a, b]. Now assume that f is integrable. Then it
Z b
follows from Exercise 7.1.10 that L(f, Pk ) and U (f, Pk ) both approach f (x)dx as k → ∞, and
a
so
Z b
(k) 1
y → f (x)dx as k → ∞.
b−a a
This motivates the following
Definition. Let f be an integrable function on the interval [a, b]. We define the average value
of f on [a, b] to be
Z b
1
f= f (x)dx.
b−a a
In general, if Ω ⊂ Rn is a region and f : Ω → R is integrable, we define the average value of f on
Ω to be Z
1
f= f dV .
vol(Ω) Ω
Example
1. A round hotplate S is given by the disk r ≤ π/2. Its temperature is given by
x p
f = cos x2 + y 2 . We want to determine the average temperature of the plate. We calculate
y
Z
1
f= f dA
area(S) S
and so
2π π2 − 1 4(π − 2)
f= π 2 = ≈ 0.463. ▽
π( 2 ) π2
288 Chapter 7. Integration
0.8
0.6
0.4
0.2
Figure 4.1
Z Z 1 Z x2 Z 1
x x3 1/4
xdA = dydx = 1 4 dx = ,
Ω 0 0 y 0 2x 1/10
3/4
so x = , which makes physical sense (see Figure 4.1). ▽
3/10
It is useful to observe that when the region Ω is symmetric about an axis, its centroid will lie on
that axis. (See Exercise 7.2.18.)
When a mass distribution Ω is non-uniform, it is important to understand the idea of density.
Much like instantaneous velocity (or slope of a curve), which is defined as a limit of average
velocities (or slopes of secant lines), we define the density δ(x) to be the limit as r → 0+ of the
average density (mass/volume) of a cube of sidelength r centered at x.2 Then it is quite plausible
2
More precisely, the average density of that portion of the cube lying in Ω.
§4. Physical Applications 289
that, with some reasonable assumptions on the behavior of “mass,” it should be recaptured by
integrating the density function.
Proof. As usual, it suffices to assume Ω is a rectangle R. For any partition P = {Ri } of R, let
mi = inf x∈Ri δ(x) and Mi = supx∈Ri δ(x). Then mi vol(Ri ) ≤ mass(Ri ) ≤ Mi vol(Ri ). (Suppose,
for example, that
mass(Ri )
Mi < = δ∗ .
vol(Ri )
Then, in particular, for all x ∈ Ri , we have δ(x) < δ∗ , and so, by the definition of δ, for each x
there is a cube centered at x whose average density is less than δ∗ . By compactness, we can cover
Ri by finitely many such cubes and we see that the average density of Ri itself is less than δ∗ , which
is a contradiction.) It now follows that L(δ, P) ≤ mass(R) ≤ U (δ, Z
P) for any partition P of R, and
so, since we’ve assumed δ is integrable, we must have mass(R) = δdV .
R
Remark. We should be a little bit careful here. The Fundamental Theorem of Calculus tells
Rx
us that we can recover f by differentiating its integral F (x) = a f (t)dt provided f is continuous.
If we start with an arbitrary integrable function f , e.g., the function in Exercise 7.1.13, this will
of course not work. A similar situation occurs if we start with an integrable δ, define the mass by
integrating, and then try to recapture δ by “differentiating” (taking the limit of average densities).
Since we are concerned here with physical applications, we will tacitly assume δ is continuous (see
Exercise 7.1.7). In more sophisticated treatments, we really would like to allow point masses and
“generalized functions,” called distributions; this will have to wait for a more advanced course.
Now, generalizing our earlier definition of center of mass, if Ω is a mass distribution with density
function δ, then we define the center of mass to be the weighted average
Z
1
x= δ(x)xdV .
mass(Ω) Ω
This is a natural generalization of the weighted average we see with a system of finitely many point
masses m1 , . . . , mN at positions x1 , . . . , xN , respectively, as shown in Figure 4.2. In this case, the
weighted average is
P
N
mi xi
i=1
x= N ,
P
mi
i=1
and it has the following physical interpretation. If external forces Fi act on the point masses mi ,
they impart accelerations x′′i according to Newton’s second law: Fi = mi x′′i . Consider the resultant
PN PN
force F = Fi acting on the total mass m = mi (any internal forces cancel ultimately by
i=1 i=1
Newton’s third law). Then
N
X N
X
F= Fi = mi x′′i = mx′′ .
i=1 i=1
290 Chapter 7. Integration
mN
m = m1+m2+m3+...+mN
xN
x
m1 x3 m3
x1
x2
m2
Figure 4.2
That is, as the forces act and time passes, the center of mass of the system translates exactly as if
we concentrated the total mass m at x and let the resultant force F act there.
Next, let’s consider now a rigid body3 consisting of point masses m1 , . . . , mN rotating about
an axis ℓ; a typical such mass is pictured in Figure 4.3. The kinetic energy of the system is
ω
ri
mi
xi
Figure 4.3
N
X N
1 1X
K.E. = mi kx′i k2 = mi (ri ω)2 ,
2 2
i=1 i=1
where ω is the angular speed with which the body is rotating about the axis and ri is the distance
from the axis of rotation to the point mass mi . (Remember that each mass is moving in a circle
whose center lies on ℓ.) Regrouping,
N
X
K.E. = 12 Iω 2 , where I= mi ri2 .
i=1
I is called the moment of inertia of the rigid body about ℓ.
In the case of a mass distribution Ω forming a rigid body, we define by analogy (partitioning it
and approximating it by a finite number of masses) its moment of inertia about an axis ℓ to be
Z
I= δr 2 dV ,
Ω
where r is the distance from ℓ.
3
A rigid body does not move relative to itself; imagine the masses connected to one another by inflexible rods.
§4. Physical Applications 291
Example 3. Let’s find the moment of inertia of a uniform solid ball Ω of radius a about an
axis through its center. We may as well place the ball with its center at the origin and let the axis
be the z-axis. Then, using spherical coordinates, we have (since δ is constant):
Z Z 2π Z π Z a
2
I= δr dV = δ (ρ sin φ)2 ρ2 sin φdρdφdθ
Ω 0 0 0 | {z } | {z }
r2 dV
Z π Z a
= 2πδ ρ4 sin3 φdρdφ
0 0
a5 4 4 3 2 2 2
= 2πδ · · = πa δ a = ma2 ,
5 3 3 5 5
where m = 34 πa3 δ is the total mass of Ω. ▽
Example 4. One of the classic applications of the moment of inertia is to decide which rolling
Figure 4.4
object wins the race down a ramp. Given a hula hoop, a wooden nickel, a hollow ball, a solid ball,
or, something more imaginative like a solid cone, as pictured in Figure 4.4, which one gets to the
bottom first?
We use the basic result from physics (see the remark on p. 337 and Example 6 of Chapter 8,
Section 3) that, if we ignore friction, total energy—potential plus kinetic—is conserved.4 We mea-
sure potential energy relative to ground level, so a mass m has potential energy mgh at (relatively
small) heights h. If the rolling radius is a, its angular speed is ω, and its linear speed is v, then we
have aω = v, so when the mass has descended a vertical height h, we have
Thus, the object’s speed is greatest when the fraction I/ma2 is smallest. We calculated in Example
3 that this fraction is 2/5 for a solid ball. For a hula-hoop of radius a or for a hollow cylinder of
4
Of course, for the objects to roll, there must be some friction.
292 Chapter 7. Integration
radius a, it is obviously 1 (why?). So the solid ball beats the hula-hoop or hollow cylinder. What
about the other shapes? (See Exercises 16, 17, and 19.) And is there an optimal shape? ▽
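The race can be played out numerically; the following editorial sketch (not from the text) uses the standard values of I/ma² for these uniform shapes — the cylinder values are the ones asked for in Exercise 16 — together with the formula v² = 2gh/(1 + I/ma²) derived above:

    import math

    g, h = 9.8, 1.0                       # gravity and vertical drop
    shapes = {                            # k = I/(m a^2), assuming uniform density
        "hula hoop / hollow cylinder": 1.0,
        "hollow ball": 2.0 / 3.0,
        "wooden nickel (solid cylinder)": 0.5,
        "solid ball": 2.0 / 5.0,
    }
    for name, k in sorted(shapes.items(), key=lambda kv: kv[1]):
        v = math.sqrt(2 * g * h / (1 + k))
        print(f"{name:32s} v = {v:.3f} m/s")   # smaller k, faster finish

The solid ball, with the smallest value of I/ma², comes out ahead.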
Newton’s law of gravitation applies to point masses: the force F exerted by a mass m at position
x on a test mass (which we take to have mass 1 unit) at the origin is given by
F = Gm\,\frac{x}{\|x\|^3}.
Thus, the gravitational force exerted by a collection of masses mi , i = 1, . . . , N , at positions xi on
the test mass is given by
F = \sum_{i=1}^N F_i = G\sum_{i=1}^N m_i\,\frac{x_i}{\|x_i\|^3},
and, thus, the gravitational force exerted by a continuous mass distribution Ω with density function
δ is

F = G\int_\Omega \delta\,\frac{x}{\|x\|^3}\, dV.
Example 5. Find the gravitational attraction on a unit mass at the origin of the uniform region
Ω bounded above by the sphere x2 + y 2 + z 2 = 2a2 and below by the paraboloid az = x2 + y 2 ,
pictured in Figure 4.5. (Take δ = 1.)
Figure 4.5
Since Ω is symmetric about the z-axis, the net force will lie entirely in the z-direction, so we
calculate only the e3 -component of F. Working in cylindrical coordinates, we see that Ω lies over
the disk of radius a centered at the origin in the xy-plane, and so
F_3 = G\int_\Omega \frac{z}{(x^2+y^2+z^2)^{3/2}}\, dV
= G\int_0^{2\pi}\!\!\int_0^a\!\!\int_{r^2/a}^{\sqrt{2a^2-r^2}} \frac{z}{(r^2+z^2)^{3/2}}\, r\, dz\, dr\, d\theta
= 2\pi G\int_0^a r\Big[-(r^2+z^2)^{-1/2}\Big]_{z=r^2/a}^{z=\sqrt{2a^2-r^2}}\, dr
= 2\pi G\int_0^a \Big(\frac{a}{\sqrt{a^2+r^2}} - \frac{r}{a\sqrt{2}}\Big)\, dr = 2\pi G a\Big(\log(1+\sqrt{2}) - \frac{1}{2\sqrt{2}}\Big).
We leave it to the reader to set the problem up in spherical coordinates (see Exercise 24). ▽
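The iterated integral above is also easy to check numerically; here is a short editorial sketch (assuming scipy is available, and taking G = a = 1) using dblquad on the (r, z) integral:

    import numpy as np
    from scipy.integrate import dblquad

    a = 1.0
    # F3/(2*pi*G) = ∫_0^a ∫_{r^2/a}^{sqrt(2a^2-r^2)} z r/(r^2+z^2)^{3/2} dz dr
    val, err = dblquad(
        lambda z, r: z * r / (r**2 + z**2) ** 1.5,   # integrand in cylindrical coordinates
        0.0, a,                                      # r-limits
        lambda r: r**2 / a,                          # lower z-limit: the paraboloid
        lambda r: np.sqrt(2 * a**2 - r**2),          # upper z-limit: the sphere
    )
    exact = a * (np.log(1 + np.sqrt(2)) - 1 / (2 * np.sqrt(2)))
    print(2 * np.pi * val, 2 * np.pi * exact)        # agree to quadrature accuracy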
Example 6. Newton wanted to understand the gravitational attraction of the earth, which he
took to be a uniform ball. Most of us are taught nowadays that the gravitational attraction of the
earth on a point mass outside the earth is that of a point mass M concentrated at the center of
the earth. But what happens if the point mass is inside the earth? We put the earth (a ball of
radius R) with its center at the origin and the point mass at (0, 0, b), b > 0, as shown in Figure 4.6.

Figure 4.6
By symmetry, the net force will lie in the z-direction, so we compute only that component. If the
earth has (constant) density δ, we have
F_3 = -G\delta\int_\Omega \frac{\cos\alpha}{d^2}\, dV = -G\delta\int_\Omega \frac{b-\rho\cos\phi}{(b^2+\rho^2-2b\rho\cos\phi)^{3/2}}\, dV
= -2\pi G\delta \int_0^R\!\!\int_0^\pi \frac{b-\rho\cos\phi}{(b^2+\rho^2-2b\rho\cos\phi)^{3/2}}\, \rho^2\sin\phi\, d\phi\, d\rho.
EXERCISES 7.4
*1. Find the average distance from the origin to the points in the ball B(0, a) ⊂ R2 .
2. Find the average distance from the origin to the points in the ball B(0, a) ⊂ R3 .
*3. Find the average distance from a point on the boundary of a ball of radius a in R2 to the points
inside the ball.
*4. Find the average distance from a point on the boundary of a ball of radius a in R3 to the points
inside the ball.
5. Find the average distance from one corner of a square of sidelength a to the points inside the
square.
10. Find the center of mass of the uniform region in Exercise 7.3.19.
*11. Find the center of mass of the uniform tetrahedron bounded by the coordinate planes and the
plane x/a + y/b + z/c = 1.
*12. Find the mass of a solid cylinder of height h and base radius a if its density at x is equal to
the distance from x to the axis of the cylinder. Next find its moment of inertia about the axis.
13. Find the moment of inertia about the z-axis of a solid ball of radius a centered at the origin,
whose density is given by δ(x) = kxk.
14. Let Ω be the region bounded above by x² + y² + z² = 4 and below by z = √(x² + y²). Calculate the moment of inertia of Ω about the z-axis by integrating in both cylindrical and spherical coordinates.
15. Find the moment of inertia about the z-axis of the region of constant density δ = 1 bounded above by the sphere x² + y² + z² = 4 and below by the cone z√3 = √(x² + y²).
*16. Find the moment of inertia about the z-axis of each of the following uniform objects:
a. a hollow cylindrical can x2 + y 2 = a2 , 0 ≤ z ≤ h
b. the solid cylinder x2 + y 2 ≤ a2 , 0 ≤ z ≤ h
c. the solid cone of base radius a and height h symmetric about the z-axis
Express each of your answers in the form I = kma2 for the appropriate constant k.
17. a. Let 0 < b < a. Find the moment of inertia I_{a,b} about the z-axis of the uniform region b² ≤ x² + y² + z² ≤ a².
b. Find lim_{b→a⁻} I_{a,b}/(a³ − b³).
c. Use your answer to part b to show that the moment of inertia of a uniform hollow spherical shell x² + y² + z² = a² about the z-axis is (2/3)ma², where m is its total mass.
19. Let Ω be the uniform solid of revolution obtained by rotating the graph of y = |x|ⁿ, |x| ≤ a^{1/n}, about the x-axis, as indicated in Figure 4.7. Show that I/(ma²) = (2n + 1)/(2(4n + 1)).
Figure 4.7
20. Let Ωε denote the uniform solid region described in spherical coordinates by 0 ≤ ρ ≤ a,
0 ≤ φ ≤ ε.
a. Find the center of mass of Ωε .
b. Find the limiting position of the center of mass as ε → 0+ . Explain your answer.
21. (Pappus’s Theorem) Suppose R ⊂ R2 is a plane region (say, that bounded by the graphs of
f and g on the interval [a, b]), and let Ω ⊂ R3 be obtained by revolving R about the x-axis.
Prove that the volume of Ω is given by vol(Ω) = 2π ȳ · area(R), where ȳ is the y-coordinate of the centroid of R.
22. Let Ω denote a mass distribution. Denote by I the moment of inertia of Ω about a given axis ℓ,
and by I0 the moment of inertia about the axis ℓ0 parallel to ℓ and passing through the center
of mass of Ω. Then prove the parallel axis theorem:
I = I₀ + mh², where m is the total mass of Ω and h is the distance between ℓ and ℓ₀.
23. Calculate the gravitational attraction of a solid ball of radius R on a unit mass on its boundary if its density is equal to the distance from the center of the ball.
25. Prove or give a counterexample: The gravitational force on a test mass of a body with total
mass M is equal to that of a point mass M located at the center of mass of the body.
26. Show that Newton’s first result in Example 6 still works for a nonuniform earth, so long as
the density δ is radially symmetric (i.e., is a function of ρ only). What happens to the second
result?
27. Consider the solid region Ω bounded by (x2 + y 2 + z 2 )3/2 = kz (k > 0), with k chosen so that
the volume of Ω is equal to the volume of the unit ball.
a. Find k.
b. Taking δ = 1, find the gravitational attraction of Ω on a unit test mass at the origin.
Remark. Your answer to part b should be somewhat larger than 4πG/3, the gravitational
attraction of the unit ball (with δ = 1) on a unit mass on its boundary. In fact, Ω is the region
of appropriate mass that maximizes the gravitational attraction on a point mass at the origin.
Can you think of any explanation—physical, geometric, or otherwise?
28. A completely uniform forest is in the shape of a plane region Ω. The forest service will locate
a helipad somewhere in the forest and, in the event of fire, will dispatch helicopters to fight it.
If a fire is equally likely to start anywhere in the forest, where should the forest service locate
the helipad to minimize fire damage? (Let’s take the simplest model possible: Assume that fire
spreads radially at a constant rate, that the helicopters fly at a constant rate and take off as
soon as the fire starts. So what are we trying to minimize here?)
(1) If any pair of the vectors v1 , . . . , vn is exchanged, D changes sign. That is,
D(v1 , . . . , vi , . . . , vj , . . . , vn ) = −D(v1 , . . . , vj , . . . , vi , . . . , vn ) for any 1 ≤ i < j ≤ n.
(2) For all v1 , . . . , vn ∈ Rⁿ and c ∈ R, we have
D(cv1 , v2 , . . . , vn ) = D(v1 , cv2 , . . . , vn ) = · · · = D(v1 , . . . , vn−1 , cvn ) = cD(v1 , . . . , vn ).
(3) For any vectors v1 , . . . , vn and vi′ , we have
D(v1 , . . . , vi + vi′ , . . . , vn ) = D(v1 , . . . , vi , . . . , vn ) + D(v1 , . . . , vi′ , . . . , vn ).
(4) D(e1 , . . . , en ) = 1.
Since most of our work with matrices has centered on row operations, it would perhaps be more
convenient to define the determinant in terms of the rows of A. But it really is inconsequential for
two reasons: first, everything we proved using row operations (and, correspondingly, left multipli-
cation by elementary matrices) works verbatim for column operations (and, correspondingly, right
multiplication by elementary matrices); second, we will prove shortly that det Aᵀ = det A.
Properties (1)–(3) of D listed in Theorem 5.1 allow us to see the effect of elementary column
operations on the determinant of a matrix. Indeed, Property (1) corresponds to a column inter-
change; Property (2) corresponds to multiplying a column by a scalar; and Property (3) tells us—in
combination with Property (1)—that adding a multiple of one column to another does not change
the determinant.
by Property (2). Repeating this process with the 4 and the 2, we obtain

3\begin{vmatrix} 0&0&0&4\\ 0&2&0&0\\ 0&0&1&0\\ 1&0&0&0 \end{vmatrix} = 2\cdot 4\cdot 3\begin{vmatrix} 0&0&0&1\\ 0&1&0&0\\ 0&0&1&0\\ 1&0&0&0 \end{vmatrix}.
Now interchanging columns 1 and 4 introduces a factor of −1 by Property (1), and we have

\det A = 24\begin{vmatrix} 0&0&0&1\\ 0&1&0&0\\ 0&0&1&0\\ 1&0&0&0 \end{vmatrix} = -24\begin{vmatrix} 1&0&0&0\\ 0&1&0&0\\ 0&0&1&0\\ 0&0&0&1 \end{vmatrix} = -24,
since Property (4) tells us that det I4 = 1. ▽
To calculate the effect of the third type of column operation—adding a multiple of one column
to another—we need the following observation.

Lemma 5.2. Suppose two columns of the square matrix A are equal, say ai = aj with i ≠ j. Then det A = 0.

Proof. If ai = aj , then the matrix is unchanged when we switch columns i and j. On the
other hand, by Property (1), its determinant changes sign when we do so. That is, we have
det A = − det A. This can happen only when det A = 0.
Now we can easily prove the
Proposition 5.3. Let A be an n × n matrix and let A′ be the matrix obtained by adding a
multiple of one column of A to another. Then det A′ = det A.
Proof. Suppose A′ is obtained from A by replacing the ith column by its sum with c times the
jth column; i.e., ai′ = ai + caj , with i ≠ j. (As a notational convenience, we assume i < j, but that
really is inconsequential.) We wish to show that
det A′ = D(a1 , . . . , ai−1 , ai + caj , ai+1 , . . . , aj , . . . , an )
= D(a1 , . . . , ai−1 , ai , ai+1 , . . . , aj , . . . , an ) = det A.
By Property (3), we have
det A′ = D(a1 , . . . , ai−1 , ai + caj , ai+1 , . . . , aj , . . . , an )
= D(a1 , . . . , ai−1 , ai , ai+1 , . . . , aj , . . . , an ) + D(a1 , . . . , ai−1 , caj , ai+1 , . . . , aj , . . . , an )
= D(a1 , . . . , ai−1 , ai , ai+1 , . . . , aj , . . . , an ) + cD(a1 , . . . , ai−1 , aj , ai+1 , . . . , aj , . . . , an )
= D(a1 , . . . , ai−1 , ai , ai+1 , . . . , aj , . . . , an ),
since D(a1 , . . . , ai−1 , aj , ai+1 , . . . , aj , . . . , an ) = 0 by the preceding Lemma.
Example 2. We now use column operations to calculate the determinant of the matrix

A = \begin{bmatrix} 2&2&1\\ 4&1&0\\ 6&0&1 \end{bmatrix}.
First we exchange columns 1 and 3, and then we proceed to (column) echelon form:

\det A = \begin{vmatrix} 2&2&1\\ 4&1&0\\ 6&0&1 \end{vmatrix} = -\begin{vmatrix} 1&2&2\\ 0&1&4\\ 1&0&6 \end{vmatrix} = -\begin{vmatrix} 1&0&0\\ 0&1&4\\ 1&-2&4 \end{vmatrix} = -\begin{vmatrix} 1&0&0\\ 0&1&0\\ 1&-2&12 \end{vmatrix}.

But

\begin{vmatrix} 1&0&0\\ 0&1&0\\ 1&-2&12 \end{vmatrix} = 12\begin{vmatrix} 1&0&0\\ 0&1&0\\ 1&-2&1 \end{vmatrix},

and now we can use the pivots to column reduce to the identity matrix without changing the determinant. Thus,

\det A = -12\begin{vmatrix} 1&0&0\\ 0&1&0\\ 1&-2&1 \end{vmatrix} = -12\begin{vmatrix} 1&0&0\\ 0&1&0\\ 0&0&1 \end{vmatrix} = -12. ▽
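Column reduction is essentially what a numerical library does under the hood (via LU factorization); this editorial one-liner, assuming numpy, confirms the value:

    import numpy as np

    A = np.array([[2.0, 2.0, 1.0],
                  [4.0, 1.0, 0.0],
                  [6.0, 0.0, 1.0]])
    print(np.linalg.det(A))   # -12.0 (up to floating-point roundoff)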
This is altogether too brain-twisting. We will now go back to the theory and soon show that it’s
perfectly all right to use row operations. First, let’s summarize what we’ve established so far: we
have
Theorem 5.5. Let A be a square matrix. Then A is nonsingular if and only if det A ≠ 0.
Proof. Suppose A is nonsingular. Then its reduced (column) echelon form is the identity
matrix. Turning this upside down, we can start with the identity matrix and perform a sequence
of column operations to obtain A. If we keep track of their effects on the determinant, we see
that we’ve started with det I = 1 and multiplied it by a nonzero number to obtain det A. That is,
det A 6= 0. Conversely, suppose A is singular. Then its (column) echelon form U has a column of
zeroes and therefore (see Exercise 2) det U = 0. It follows as in the previous case that det A = 0.
Corollary 5.6. Let E be an elementary matrix and let A be an arbitrary square matrix. Then det(AE) = det A det E.

Proposition 5.7 (Product Rule). Let A and B be n × n matrices. Then det(AB) = det A det B.
Proof. Suppose B is singular, so that there is some nontrivial linear relation among its column
vectors:
c1 b1 + · · · + cn bn = 0.
Then, multiplying by A on the left, we find that
c1 (Ab1 ) + · · · + cn (Abn ) = 0,
from which we conclude that there is (the same) nontrivial linear relation among the column
vectors of AB, and so AB is singular as well. We infer from Theorem 5.5 that both det B = 0 and
det AB = 0, and so the result holds in this case.
Now, if B is nonsingular, we know that we can write B as a product of elementary matrices,
viz., B = E1 E2 · · · Em . We now apply Corollary 5.6 twice: first, we have det B = det E1 det E2 · · · det Em , and then

\det(AB) = \det(AE_1E_2\cdots E_m) = \det A\, \det E_1 \det E_2 \cdots \det E_m = \det A \det B,

as claimed.
A consequence of this Proposition is that det(AB) = det(BA), even though matrix multiplication
is not commutative. Thus, we have
Corollary 5.8. Let E be an elementary matrix and let A be an arbitrary square matrix. Then det(EA) = det E det A.
Since we’ve seen that row and column operations have the same effect on determinant, it should
not come as a surprise that a matrix and its transpose have the same determinant.
Proof. Suppose A is singular. Then so is Aᵀ (why?). Thus, det Aᵀ = 0 = det A, and so the result holds in this case. Suppose now that A is nonsingular. As in the preceding proof, we write A = E1 E2 · · · Em . Now we have Aᵀ = (E1 E2 · · · Em)ᵀ = Emᵀ · · · E2ᵀE1ᵀ, and so, using the product rule and Exercise 4, we obtain

\det A^{\mathsf{T}} = \det(E_m^{\mathsf{T}}) \cdots \det(E_2^{\mathsf{T}})\det(E_1^{\mathsf{T}}) = \det E_1 \det E_2 \cdots \det E_m = \det A.
The following result can be useful:
Proposition 5.12. If A is an upper (lower) triangular n×n matrix, then det A = a11 a22 · · · ann .
Proof. If aii = 0 for some i, then A is singular (why?) and so det A = 0, and the desired
equality holds in this case. Now assume all the aii are nonzero. Let Ai be the ith row vector of A,
as usual, and write Ai = aii Bi , where the ith entry of Bi is 1. Then, using Property (2) repeatedly,
we have det A = a11 · · · ann det B. Now B is an upper triangular matrix with 1’s on the diagonal,
so we can use the pivots to clear out the upper (lower) entries without changing the determinant,
and thus det B = det I = 1. So det A = a11 a22 · · · ann , as promised.
Remark. As we shall prove in Theorem 1.1 of Chapter 9, any two matrices A and A′ repre-
senting a linear map T are related by the equation A′ = P −1 AP for some invertible matrix P . As
a consequence of Proposition 5.7, we have
det A′ = det(P −1 AP ) = det(AP P −1 ) = det A det(P P −1 ) = det A,
and so it makes sense to define det T = det A for any matrix representative of T .
We now come to the geometric meaning of det T : it gives the factor by which signed volume is
distorted under the mapping by T . (See Exercise 24 for another approach.)

Proposition 5.13. Let T : Rⁿ → Rⁿ be a linear map, and let Ω ⊂ Rⁿ be a region. Then vol(T (Ω)) = |det T | vol(Ω).
Proof. When T has rank < n, det T = 0 and the image of T lies in a subspace of dimension
< n; hence, by Exercise 7.1.12, T (R) has volume zero. When T has rank n, we can write [T ] as
a product of elementary matrices. Because of Proposition 5.7, it now suffices to prove the result
when [T ] is an elementary matrix itself.
Recall that there are three kinds of elementary matrices (see p. 140). When R is a rectangle,
it is clear that the first type does not change volume, and the second multiplies the volume by |c|;
the third (a shear) does not change the volume, for the following reason. The transformation is the
identity in all directions other than the xi xj -plane, and we’ve already checked that in 2 dimensions
the determinant gives the signed area. (See also Exercise 24.)
Suppose Ω is a region. Then we can take a rectangle R containing Ω and consider the function
\chi: R \to \mathbb{R}, \qquad \chi(x) = \begin{cases} 1, & x \in \Omega\\ 0, & \text{otherwise.} \end{cases}
Since by our definition of region, χ is integrable, given ε > 0, we can find a partition P of R so that
U (χ, P) − L(χ, P) < ε. That is, the sum of the volumes of those subrectangles of P that intersect
the frontier of Ω is less than ε. In particular, this means Ω contains a union, S1 , of subrectangles of
P and is contained in a union, S2 , of subrectangles of P, as shown in Figure 5.1, with the property
that vol(S2 ) − vol(S1 ) < ε. And, likewise, T (Ω) contains a union, T (S1 ), of parallelepipeds and is
Figure 5.1
contained in a union, T (S2 ), of parallelepipeds, with vol(T (Si )) = |c|vol(Si ) or vol(T (Si )) = vol(Si ),
depending on the nature of the elementary matrix. In either event, we see that
| det T |vol(S1 ) ≤ vol(T (Ω)) ≤ | det T |vol(S2 )
and since ε > 0 was arbitrary, we are done. (Note that, by Exercise 7.1.11 and Corollary 1.10,
T (Ω) has a well-defined volume.)
5.1. Formulas for the determinant. In Chapter 1 we had explicit formulas for the determi-
nant of 2 × 2 and 3 × 3 matrices. It is sometimes more useful to have a recursive way of calculating
the determinant. Given an n × n matrix A with n ≥ 2, denote by Aij the (n − 1) × (n − 1) matrix
obtained by deleting the ith row and the j th column from A. Define the ij th cofactor of the matrix
to be
cij = (−1)i+j det Aij .
Note that we include the coefficient of ±1 according to the “checkerboard” pattern as indicated
below:
+ − + ···
− + − ···
+ − + ···
.. .. .. . .
. . . .
Then we have the following formula, called the expansion in cofactors along the ith row :
Proposition 5.14. Let A be an n × n matrix. Then for any fixed i, we have
\det A = \sum_{j=1}^n a_{ij} c_{ij}.
Using rows here allows us to check that the expression on the right-hand side of this equation
satisfies the properties of a determinant as set forth in Theorem 5.1. However, using the fact that
det AT = det A, we can transpose this result to obtain the expansion in cofactors along the j th
column:

Proposition 5.15. Let A be an n × n matrix. Then for any fixed j, we have

\det A = \sum_{i=1}^n a_{ij} c_{ij}.
Note that when we define the determinant of a 1 × 1 matrix by the obvious rule,
det [a] = a,
Proposition 5.15 yields the familiar formula for the determinant of a 2 × 2 matrix, and, again, that
of a 3 × 3 matrix.
Example 3. Let

A = \begin{bmatrix} 2&1&3\\ 1&-2&3\\ 0&2&1 \end{bmatrix}.

We calculate det A by expanding in cofactors along the second row:

\det A = (-1)^{2+1}(1)\begin{vmatrix} 1&3\\ 2&1 \end{vmatrix} + (-1)^{2+2}(-2)\begin{vmatrix} 2&3\\ 0&1 \end{vmatrix} + (-1)^{2+3}(3)\begin{vmatrix} 2&1\\ 0&2 \end{vmatrix}
= -(1)(-5) + (-2)(2) - (3)(4) = -11.
Of course, because of the 0 entry in the third row, we'd have been smarter to expand in cofactors along the third row, obtaining

\det A = (-1)^{3+1}(0)\begin{vmatrix} 1&3\\ -2&3 \end{vmatrix} + (-1)^{3+2}(2)\begin{vmatrix} 2&3\\ 1&3 \end{vmatrix} + (-1)^{3+3}(1)\begin{vmatrix} 2&1\\ 1&-2 \end{vmatrix}
= -2(3) + 1(-5) = -11. ▽
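Expansion in cofactors translates directly into a recursive program; here is an editorial Python sketch (pure Python, no libraries) expanding along the first row, applied to the matrix of Example 3:

    def det(A):
        """Determinant by cofactor expansion along the first row."""
        n = len(A)
        if n == 1:
            return A[0][0]
        total = 0
        for j in range(n):
            minor = [row[:j] + row[j+1:] for row in A[1:]]   # delete row 1, column j+1
            total += (-1) ** j * A[0][j] * det(minor)        # sign (-1)^{1+(j+1)} = (-1)^j
        return total

    print(det([[2, 1, 3], [1, -2, 3], [0, 2, 1]]))   # -11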
Sketch of proof of Proposition 5.15. As we mentioned earlier, we must check that the ex-
pression on the right-hand side has the requisite properties. When we form a new matrix A′
by switching two adjacent columns (say columns k and k + 1) of A, then whenever j 6= k and
j 6= k + 1, we have a′ij = aij and c′ij = −cij ; on the other hand, when j = k, we have a′ik = ai,k+1
and c′ik = −ci,k+1 ; when j = k + 1, we have a′i,k+1 = aik and c′i,k+1 = −cik , so
\sum_{j=1}^n a'_{ij} c'_{ij} = -\sum_{j=1}^n a_{ij} c_{ij},
as required. We can exchange an arbitrary pair of columns by exchanging an odd number of adjacent
pairs in succession (see Exercise 16), so the general result follows.
The remaining properties are easier to check. If we multiply the kth column by c, then for
j 6= k, we have a′ij = aij and c′ij = ccij , whereas for j = k, we have c′ik = cik and a′ik = caik . Thus,
\sum_{j=1}^n a'_{ij} c'_{ij} = c\sum_{j=1}^n a_{ij} c_{ij},
as required. Suppose now that we replace the kth column by the sum of two column vectors, viz.,
ak′ = ak + ak″. Then for j ≠ k, we have cij′ = cij + cij″ and aij′ = aij = aij″. When j = k, we likewise have aik′ = aik + aik″ and cik′ = cik = cik″, so

\sum_{j=1}^n a'_{ij} c'_{ij} = \sum_{j=1}^n a_{ij} c_{ij} + \sum_{j=1}^n a''_{ij} c''_{ij},
as required.
Proof of Theorem 5.1. Proposition 5.15 establishes that a function D exists satisfying the
properties listed in the statement of the Theorem. On the other hand, as we saw when calculating determinants using only the properties, there can be only one such function: by reducing the matrix to echelon form by column or row operations, we are able to compute the determinant.
(See also Exercise 21.)
Remark. It is worth remarking that expansion in cofactors is an important theoretical tool, but
a computational nightmare. Even using calculators and computers, to compute an n×n determinant
by expanding in cofactors requires more than n! multiplications5 (and lots of additions). On the
other hand, to compute an n × n determinant by row reducing the matrix to upper triangular form
requires slightly fewer than \tfrac{1}{3}n^3 multiplications (and additions). Now, Stirling's formula tells us that n! grows faster than (n/e)ⁿ, which gets large much faster than does n³. Indeed, consider the following table displaying (approximately) the number of multiplications required:

n      cofactors (n!)     row reduction (n³/3)
2      2                  3
3      6                  9
4      24                 21
5      120                42
10     3,628,800          333
Thus, we see that once n > 4, it is sheer folly to calculate a determinant by the cofactor method
(unless almost all the entries of the matrix happen to be 0).
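The comparison is easy to reproduce; an editorial sketch:

    import math

    for n in (2, 3, 4, 5, 10, 20):
        print(f"n = {n:2d}: n! = {math.factorial(n):>22,d}, n^3/3 = {n**3/3:>8.0f}")
    # factorial growth overwhelms the cubic cost almost immediately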
We conclude this section with a few classic formulas. The first is particularly useful for solving
2 × 2 systems of equations and may be useful even for larger n if you are interested only in a certain
component xi of the solution vector.

Proposition 5.16 (Cramer's Rule). Let A be a nonsingular n × n matrix, and let x be the solution of Ax = b. Then xi = det Bi /det A, where Bi denotes the matrix obtained by replacing the ith column of A by b.
Proof. This is amazingly simple. We calculate the determinant of the matrix obtained by
replacing the ith column of A by b = Ax = x1 a1 + · · · + xn an :
\det B_i = D(a_1, \ldots, a_{i-1},\ x_1 a_1 + \cdots + x_n a_n,\ a_{i+1}, \ldots, a_n)
= D(a_1, \ldots, a_{i-1},\ x_i a_i,\ a_{i+1}, \ldots, a_n) = x_i \det A,
since the multiples of columns other than the ith do not contribute to the determinant.
We now deduce from Cramer’s rule an “explicit” formula for the inverse of a nonsingular matrix.
Students seem always to want an alternative to Gaussian elimination, but what follows is practical
only for the 2 × 2 case (where it gives us our familiar formula from Example 5 on p. 146) and—
barely—for the 3 × 3 case.
Proposition 5.17. Let A be a nonsingular matrix, and let C = [cij ] be the matrix of its
cofactors. Then
A^{-1} = \frac{1}{\det A}\, C^{\mathsf{T}}.
Proof. We recall from p. 145 that the jth column vector of A⁻¹ is the solution of Ax = ej , where ej is the jth standard basis vector for Rⁿ. Now, Cramer's rule tells us that the ith coordinate of the jth column of A⁻¹ is

(A^{-1})_{ij} = \frac{\det B_{ji}}{\det A},

where Bji is the matrix obtained by replacing the ith column of A by ej . Now, we calculate det Bji by expanding in cofactors along the ith column of the matrix Bji . Since the only nonzero entry of that column is the jth, and since all its remaining columns are those of the original matrix A, we find that

\det B_{ji} = (-1)^{i+j} \det A_{ji} = c_{ji},

and this proves the result.
For 3 × 3 matrices, this formula isn't bad when det A would cause troublesome arithmetic in Gaussian elimination.
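The cofactor formula is short to code; this editorial sketch (assuming numpy) builds the cofactor matrix C and checks that Cᵀ/det A inverts the matrix of Exercise 14:

    import numpy as np

    def inverse_via_cofactors(A):
        n = A.shape[0]
        C = np.empty_like(A, dtype=float)
        for i in range(n):
            for j in range(n):
                minor = np.delete(np.delete(A, i, axis=0), j, axis=1)  # delete row i, col j
                C[i, j] = (-1) ** (i + j) * np.linalg.det(minor)       # cofactor c_ij
        return C.T / np.linalg.det(A)

    A = np.array([[-1.0, 2.0, 3.0], [2.0, 1.0, 0.0], [0.0, 2.0, 3.0]])
    print(np.allclose(inverse_via_cofactors(A) @ A, np.eye(3)))        # True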
Proposition 5.18. Let A be an n × n matrix. Then

\det A = \sum_\sigma \mathrm{sign}(\sigma)\, a_{\sigma(1)1} a_{\sigma(2)2} \cdots a_{\sigma(n)n},

where the sum is taken over all permutations σ of {1, . . . , n}.

Proof. The jth column of A is the vector a_j = \sum_{i=1}^n a_{ij} e_i, and so, by Properties (2) and (3), we have

\det A = D(a_1, \ldots, a_n) = D\Big(\sum_{i_1=1}^n a_{i_1 1}\, e_{i_1},\ \sum_{i_2=1}^n a_{i_2 2}\, e_{i_2},\ \ldots,\ \sum_{i_n=1}^n a_{i_n n}\, e_{i_n}\Big)
= \sum_{i_1,\ldots,i_n=1}^n a_{i_1 1} a_{i_2 2} \cdots a_{i_n n}\, D(e_{i_1}, e_{i_2}, \ldots, e_{i_n}),
Recall that GL(n) denotes the set of invertible n × n matrices (which, by Exercise 6.1.6, is an
open subset of Mn×n ).

Proposition 5.19. The function inv: GL(n) → GL(n) given by inv(A) = A⁻¹ is smooth.
Proof. Proposition 5.18 shows that the determinant of an n × n matrix is a polynomial expres-
sion (of degree n) in its n2 entries. Thus, we infer from Proposition 5.17 that each entry of A−1 is
a rational function (quotient of polynomials) of the entries of A.
EXERCISES 7.5
1. Calculate the following determinants:

a. \begin{vmatrix} -1&6&-2\\ 3&4&5\\ 5&2&1 \end{vmatrix}

*b. \begin{vmatrix} 1&0&2&0\\ -1&2&-2&0\\ 0&1&2&6\\ 1&1&3&2 \end{vmatrix}

c. \begin{vmatrix} 1&4&1&-3\\ 2&10&0&1\\ 0&0&2&2\\ 0&0&-2&1 \end{vmatrix}

*d. \begin{vmatrix} 2&-1&0&0&0\\ -1&2&-1&0&0\\ 0&-1&2&-1&0\\ 0&0&-1&2&-1\\ 0&0&0&-1&2 \end{vmatrix}
2. Suppose one column of the matrix A consists only of 0 entries, i.e., ai = 0 for some i. Prove
that det A = 0.
6. Prove that if the entries of a matrix A are integers, then det A is an integer. Hint: Use
Proposition 5.14 and induction or Proposition 5.18.
7. Given that 1898, 3471, 7215, and 8164 are all divisible by 13, use only the properties of determinants and the result of Exercise 6 to prove that

\begin{vmatrix} 1&8&9&8\\ 3&4&7&1\\ 7&2&1&5\\ 8&1&6&4 \end{vmatrix}

is divisible by 13.
8. Let A = (a₁, a₂), B = (b₁, b₂), and C = (c₁, c₂) be points in R². Show that the signed area of △ABC is given by

\frac{1}{2}\begin{vmatrix} a_1&b_1&c_1\\ a_2&b_2&c_2\\ 1&1&1 \end{vmatrix}.
*11. Suppose A is an orthogonal n × n matrix. (Recall that this means that AT A = In .) Compute
det A.
12. Suppose A is a skew-symmetric n × n matrix. (Recall that this means that Aᵀ = −A.) Prove that when n is odd, det A = 0. Give an example to show this needn't be true when n is even. (Hint: Use Exercise 5.)
*13. Let A = \begin{bmatrix} 1&2&1\\ 2&3&0\\ 1&4&2 \end{bmatrix}.

a. If Ax = \begin{bmatrix} 1\\ 2\\ -1 \end{bmatrix}, use Cramer's Rule to find x₂.
b. Find A⁻¹ using cofactors.
*14. Using cofactors, find the determinant and the inverse of the matrix

A = \begin{bmatrix} -1&2&3\\ 2&1&0\\ 0&2&3 \end{bmatrix}.
♯ 15. a. Suppose A is an n × n matrix with integer entries and det A = ±1. Prove that A−1 has
all integer entries.
b. Conversely, suppose A and A−1 are both matrices with integer entries. Prove that det A =
±1.
16. Prove that the exchange of any pair of rows (or columns) of a matrix can be accomplished by
an odd number of exchanges of adjacent pairs.
17. Suppose A is an orthogonal n × n matrix. Show that the cofactor matrix C = ±A.
18. Generalizing the result of Proposition 5.17, prove that ACᵀ = (det A)I even if A happens to be singular. In particular, when A is singular, what can you conclude about the columns of Cᵀ?
19. a. Show that if (x₁, y₁) and (x₂, y₂) are distinct points in R², then the unique line passing through them is given by the equation

\begin{vmatrix} 1&1&1\\ x&x_1&x_2\\ y&y_1&y_2 \end{vmatrix} = 0.

b. Show that if (x₁, y₁, z₁), (x₂, y₂, z₂), and (x₃, y₃, z₃) are noncollinear points in R³, then the unique plane passing through them is given by the equation

\begin{vmatrix} 1&1&1&1\\ x&x_1&x_2&x_3\\ y&y_1&y_2&y_3\\ z&z_1&z_2&z_3 \end{vmatrix} = 0.
20. As we saw in Exercises 4.1.22 and 4.1.23, through any three noncollinear points in R² there pass a unique parabola⁶ y = ax² + bx + c and a unique circle x² + y² + ax + by + c = 0. Given three such points (x₁, y₁), (x₂, y₂), and (x₃, y₃), show that the equations of the parabola and circle are, respectively,

\begin{vmatrix} 1&1&1&1\\ x&x_1&x_2&x_3\\ x^2&x_1^2&x_2^2&x_3^2\\ y&y_1&y_2&y_3 \end{vmatrix} = 0 \qquad\text{and}\qquad \begin{vmatrix} 1&1&1&1\\ x&x_1&x_2&x_3\\ y&y_1&y_2&y_3\\ x^2+y^2&x_1^2+y_1^2&x_2^2+y_2^2&x_3^2+y_3^2 \end{vmatrix} = 0.
21. Using Corollary 5.6, prove that the determinant function is uniquely determined by the properties listed in Theorem 5.1. (Hint: Mimic the proof of Proposition 5.7. It might be helpful to consider two functions \det and \widetilde{\det} that have these properties and prove that \det(A) = \widetilde{\det}(A) for every square matrix A.)
23. a. Using Proposition 5.18, prove that D(det)(I)B = trB = b11 + · · · + bnn . (See Exercise
1.4.22.)
b. More generally, show that for any invertible matrix A, D(det)(A)B = det A tr(A−1 B).
24. Give an alternative proof of Proposition 5.13 for general parallelepipeds as follows. Let R ⊂ Rⁿ be a parallelepiped. Suppose T : Rⁿ → Rⁿ is a linear map of either of the forms

T\begin{bmatrix} x_1\\ x_2\\ \vdots\\ x_n \end{bmatrix} = \begin{bmatrix} cx_1\\ x_2\\ \vdots\\ x_n \end{bmatrix} \qquad\text{or}\qquad T\begin{bmatrix} x_1\\ x_2\\ \vdots\\ x_n \end{bmatrix} = \begin{bmatrix} x_1 + cx_2\\ x_2\\ \vdots\\ x_n \end{bmatrix}.

Calculate the volume of R and of T (R) by applying Fubini's Theorem, putting the x₁ integral innermost. (This is in essence a proof of Cavalieri's principle.)
⁶Here we must also assume that no pair of the points lies on a vertical line.
25. (from the 1994 Putnam Exam) Find the value of m so that the line y = mx bisects the region

\Big\{ (x, y) \in \mathbb{R}^2 : \frac{x^2}{4} + y^2 \le 1,\ x \ge 0,\ y \ge 0 \Big\}.
26. Given any ellipse, show that there are infinitely many inscribed triangles of maximal area.
27. (from the 1994 Putnam Exam) Let A and B be 2 × 2 matrices with integer entries such that
A, A + B, A + 2B, A + 3B, and A + 4B are all invertible matrices whose inverses have integer
entries. Prove that A + 5B is invertible and that its inverse has integer entries. (Hint: Use
Exercise 15.)
6. Change of Variables Theorem

We end this chapter with a general theorem justifying our formulas for integration in polar,
cylindrical, and spherical coordinates. Since we know that the determinant tells us the factor by
which linear maps distort signed volume, and since the derivative gives the best linear approxima-
tion, we expect a change of variables formula to involve the determinant of the derivative matrix.
Giving a rigorous proof is, however, another matter.
Since integration is based upon rectangles rather than balls, it is most convenient to choose (for
this section only) a different norm to measure vectors and linear maps, which, for obvious reasons,
we dub the cubical norm.
Definition. If x ∈ Rⁿ, set ‖x‖ = max(|x₁|, |x₂|, . . . , |xₙ|). If T : Rⁿ → Rᵐ is a linear map, set

\|T\| = \max_{\|x\| = 1} \|T(x)\|.
We leave it to the reader to check in Exercise 1 that these are indeed norms and, as will be crucial
for us, that kT (x)k ≤ kT k kxk for all x ∈ Rn . Our first result, depicted in Figure 6.1, estimates
how much a C1 map can distort a cube.
Lemma 6.1. Let Cr denote the cube in Rn of sidelength 2r centered at 0. Suppose U ⊂ Rn is
an open set containing Cr and φ : U → Rn is a C1 function with the property that φ(0) = 0 and
kDφ(x) − Ik < ε for all x ∈ Cr and some 0 < ε < 1. Then
C_{(1-\varepsilon)r} \subset \phi(C_r) \subset C_{(1+\varepsilon)r}.
Proof. One can check that Proposition 1.3 of Chapter 6 holds when we use the k · k norm
instead of the usual one (see Exercise 1f). Then if x ∈ Cr , we have
\|\phi(x)\| \le \max_{y\in[0,x]} \|D\phi(y)\|\,\|x\| < (1+\varepsilon)r,
so φ(Cr ) ⊂ C(1+ε)r . The other inclusion can be proved by applying Exercise 6.2.11 in the k · k
norm.
The crucial ingredient in the proof of the Change of Variables Theorem is the following result,
which says that for sufficiently small cubes C, the image g(C) is well approximated by the image
under the derivative at the center of C.
Figure 6.1

Proposition 6.2. Suppose U ⊂ Rⁿ is open and g : U → Rⁿ is C¹ and one-to-one with invertible derivative at each point. Let C ⊂ U be a cube centered at a, and suppose ‖Dg(a)⁻¹ ∘ Dg(x) − I‖ < ε for all x ∈ C, where 0 < ε < 1. Then

(1-\varepsilon)^n |\det Dg(a)|\,\mathrm{vol}(C) \le \mathrm{vol}(g(C)) \le (1+\varepsilon)^n |\det Dg(a)|\,\mathrm{vol}(C).
Proof. Since g is C1 with invertible derivative at each point of U , g maps open sets to open
sets and the frontier of g(C) is the image of the frontier of C, hence a set of zero volume (see
Exercise 7.1.12). Therefore, g(C) is a region.
Suppose the sidelength of the cube C is 2r. We apply Lemma 6.1 to the function φ defined by

\phi(x) = Dg(a)^{-1}\big(g(x+a) - g(a)\big), \qquad x \in C_r.

Then φ(0) = 0, Dφ(0) = I, and Dφ(x) = Dg(a)⁻¹ ∘ Dg(x + a), so, by the hypothesis, ‖Dφ(x) − I‖ < ε for all x ∈ Cr . Therefore, we have
C_{(1-\varepsilon)r} \subset \phi(C_r) \subset C_{(1+\varepsilon)r},

and so
g(a) + Dg(a)\big(C_{(1-\varepsilon)r}\big) \subset g(C) \subset g(a) + Dg(a)\big(C_{(1+\varepsilon)r}\big).
Applying Proposition 5.13, using the fact that vol(C_{αr}) = αⁿ vol(C_r), and remembering that
translation preserves volume, we obtain the result.
We begin our onslaught on the Change of Variables Theorem with a very simple case, whose
proof is left to the reader in Exercise 2.
Lemma 6.3. Suppose T : Rn → Rn is a linear map whose standard matrix is diagonal and
nonsingular. Let R ⊂ Rn be a rectangle, and suppose f is integrable on T (R). Then f ◦ T is
integrable on R and
\int_{T(R)} f(y)\, dV_y = |\det T| \int_R (f\circ T)(x)\, dV_x.
Theorem 6.4 (Change of Variables Theorem). Let Ω ⊂ Rn be a region and let U be an open
set containing Ω so that g : U → Rn is one-to-one and C1 with invertible derivative at each point.
Suppose f : g(Ω) → R and (f ◦ g)| det Dg| : Ω → R are both integrable. Then
\int_{g(\Omega)} f(y)\, dV_y = \int_\Omega (f\circ g)(x)\, |\det Dg(x)|\, dV_x.
Remark . One can strengthen the theorem, in particular by allowing Dg(x) to fail to be
invertible on a set of volume 0. This is important for many applications—e.g., polar, cylindrical,
and spherical coordinates. But we won’t bother justifying it here.
Proof. First, we cover Ω with a union of rectangles with rational sidelengths (as usual, by
working with the function f˜). Then, dividing these rectangles into cubes, we may assume R is a
cube.
There are positive constants M and N so that |f| ≤ M (by integrability) and ‖(Dg)⁻¹‖ ≤ N
(by continuity and compactness). Choose 0 < ε < 1. By uniform continuity, Theorem 1.4 of
Chapter 5, there is δ1 > 0 so that kDg(x) − Dg(y)k ≤ ε/N whenever kx − yk < δ1 , x, y ∈ R.
Similarly, there is δ2 > 0 so that | det Dg(x) − det Dg(y)| < ε/M whenever kx − yk < δ2 , x, y ∈ R.
And by integrability of (f ◦ g)| det Dg|, there is δ3 > 0 so that whenever the diameter of the cubes
of a cubical partition P is less than δ3 , we have U ((f ◦ g)| det Dg|, P) − L((f ◦ g)| det Dg|, P) < ε (see
Exercise 7.1.10).
Suppose P = {R1 , . . . , Rs } is a partition of R into cubes of diameter less than δ = min(δ1 , δ2 , δ3 ).
Let ai denote the center of the cube Ri , and set

M_i = \sup_{x\in R_i} (f\circ g)(x), \qquad m_i = \inf_{x\in R_i} (f\circ g)(x),
\widetilde{M}_i = \sup_{x\in R_i} (f\circ g)(x)|\det Dg(x)|, \qquad \widetilde{m}_i = \inf_{x\in R_i} (f\circ g)(x)|\det Dg(x)|.

We claim that mi |det Dg(ai)| ≥ m̃i − ε and Mi |det Dg(ai)| ≤ M̃i + ε. We check the latter: choose a sequence of points xk ∈ Ri so that (f ∘ g)(xk) → Mi (and we assume Mi > 0 and all (f ∘ g)(xk) > 0 for convenience). We have |det Dg(ai)| < |det Dg(xk)| + ε/M and so

(f\circ g)(x_k)|\det Dg(a_i)| < (f\circ g)(x_k)|\det Dg(x_k)| + \frac{\varepsilon}{M}(f\circ g)(x_k)
\le (f\circ g)(x_k)|\det Dg(x_k)| + \varepsilon \le \widetilde{M}_i + \varepsilon;

letting k → ∞ gives Mi |det Dg(ai)| ≤ M̃i + ε, as required.
Therefore, we have

(1-\varepsilon)^n \sum_{i=1}^s m_i |\det Dg(a_i)|\,\mathrm{vol}(R_i) \le \int_{g(R)} f\, dV \le (1+\varepsilon)^n \sum_{i=1}^s M_i |\det Dg(a_i)|\,\mathrm{vol}(R_i).
Suppose now that \int_{g(R)} f\, dV = \int_R (f\circ g)|\det Dg|\, dV + \gamma for some γ > 0. Let ε > 0 be chosen small enough so that (β + 1)ε < γ. We have

\int_{g(R)} f\, dV \le U\big((f\circ g)|\det Dg|, \mathcal{P}\big) + \beta\varepsilon < \int_R (f\circ g)|\det Dg|\, dV + (\beta+1)\varepsilon
< \int_R (f\circ g)|\det Dg|\, dV + \gamma = \int_{g(R)} f\, dV,

which is a contradiction; the opposite inequality is ruled out similarly, and so the two integrals are equal, as desired.
Examples 1. First, to be official, we check that the formulas we derived in a heuristic manner
in Section 3 are valid.
(a) polar coordinates: Let g(r, θ) = (r cos θ, r sin θ). Then

Dg = \begin{bmatrix} \cos\theta & -r\sin\theta\\ \sin\theta & r\cos\theta \end{bmatrix} \qquad\text{and}\qquad \det Dg = r.

(b) cylindrical coordinates: Let g(r, θ, z) = (r cos θ, r sin θ, z). Then

Dg = \begin{bmatrix} \cos\theta & -r\sin\theta & 0\\ \sin\theta & r\cos\theta & 0\\ 0 & 0 & 1 \end{bmatrix} \qquad\text{and}\qquad \det Dg = r.

(c) spherical coordinates: Let g(ρ, φ, θ) = (ρ sin φ cos θ, ρ sin φ sin θ, ρ cos φ). Then

Dg = \begin{bmatrix} \sin\phi\cos\theta & \rho\cos\phi\cos\theta & -\rho\sin\phi\sin\theta\\ \sin\phi\sin\theta & \rho\cos\phi\sin\theta & \rho\sin\phi\cos\theta\\ \cos\phi & -\rho\sin\phi & 0 \end{bmatrix},

and, expanding in cofactors along the third row, we find that

\det Dg = \cos\phi\,(\rho^2\sin\phi\cos\phi) + \rho\sin\phi\,(\rho\sin^2\phi) = \rho^2\sin\phi. ▽
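These Jacobians can be reproduced symbolically; an editorial sketch using sympy (the polar and cylindrical cases are entirely analogous):

    import sympy as sp

    rho, phi, theta = sp.symbols("rho phi theta", positive=True)
    spherical = sp.Matrix([rho * sp.sin(phi) * sp.cos(theta),
                           rho * sp.sin(phi) * sp.sin(theta),
                           rho * sp.cos(phi)])
    J = spherical.jacobian([rho, phi, theta])   # the derivative matrix Dg
    print(sp.trigsimp(J.det()))                 # rho**2*sin(phi)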
Example 2. Let S ⊂ R² be the parallelogram with vertices (0, 0), (3, 1), (4, 3), and (1, 2), as pictured in Figure 6.2. Evaluate \int_S x\, dA. Of course, with a bit of patience, we could evaluate this

Figure 6.2

by three different iterated integrals in cartesian coordinates, but it makes sense to take a linear transformation g that maps the unit square, R, to the region S, e.g.,

g\begin{pmatrix} u\\ v \end{pmatrix} = \begin{bmatrix} 3 & 1\\ 1 & 2 \end{bmatrix}\begin{pmatrix} u\\ v \end{pmatrix} = \begin{pmatrix} x\\ y \end{pmatrix}.
Then, applying the Change of Variables Theorem, we have

\int_S x\, dA_{xy} = \int_R \underbrace{(3u+v)}_{x}\,\underbrace{5}_{|\det Dg|}\, dA_{uv}
= 5\int_0^1\!\!\int_0^1 (3u+v)\, dv\, du = 5\int_0^1 \big(3u + \tfrac{1}{2}\big)\, du = 5\cdot 2 = 10. ▽
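A quick editorial check of this computation with sympy:

    import sympy as sp

    u, v = sp.symbols("u v")
    # pull back x = 3u + v over the unit square, weighted by |det Dg| = 5
    print(sp.integrate(5 * (3 * u + v), (v, 0, 1), (u, 0, 1)))   # 10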
0 0 0
y
u=1 u=4
g y=3
v y=x
S
Ω uv=9 xy=4
v=1 xy=1
u x
Figure 6.3
to S. Now, "
p #
u √1 − 12 vu3 u 1
2 uv
Dg = pv pu , so det Dg = .
v 1 1 v 2v
2 u 2 v
Then, by the Change of Variables Theorem, we have

\int_S y\, dA_{xy} = \int_\Omega \sqrt{uv}\,\frac{1}{2v}\, dA_{uv}
= \frac{1}{2}\int_1^4\!\!\int_1^{9/u} \sqrt{\frac{u}{v}}\, dv\, du
= \int_1^4 \Big[\sqrt{u}\sqrt{v}\Big]_{v=1}^{v=9/u}\, du = \int_1^4 \big(3 - \sqrt{u}\big)\, du = \frac{13}{3}. ▽
EXERCISES 7.6
8. Find the volume of the region bounded below by the plane z = 0 and above by the elliptical
paraboloid z = 16 − x2 − 4y 2 .
*14. Suppose 0 < b < a. Define g : (0, b) × (0, 2π) × (0, 2π) → R³ by

g\begin{pmatrix} r\\ \theta\\ \phi \end{pmatrix} = \begin{pmatrix} (a + r\cos\phi)\cos\theta\\ (a + r\cos\phi)\sin\theta\\ r\sin\phi \end{pmatrix}.
Describe and sketch the image of g, and find its volume.
15. Let

A = \begin{bmatrix} 1&1&1&\cdots&1\\ 1&2&1&\cdots&1\\ 1&2&3&\cdots&1\\ \vdots&\vdots&\vdots&\ddots&\vdots\\ 1&2&3&\cdots&n \end{bmatrix}.

Given that \int_{\mathbb{R}^n} f\, dV = 1, evaluate \int_{\mathbb{R}^n} f(A^{-1}x)\, dV.
16. Let S = {x ∈ Rn : xi ≥ 0 for all i, x1 + 2x2 + 3x3 + · · · + nxn ≤ n}. Find vol(S).
*17. Define spherical coordinates in R⁴ and calculate \int_{B(0,a)} \|x\|\, dV.
18. Let R = [0, 1] × [0, 1], and consider the integral I = \int_R \frac{1}{1-xy}\, dA.

a. By expanding the integrand in a geometric series, show that I = \sum_{k=1}^\infty \frac{1}{k^2}. (To be completely rigorous, you will need to write I as the limit of integrals over [0, 1] × [0, 1 − δ] as δ → 0⁺. Why?)
b. Evaluate I by rotating the plane through π/4. A reasonable amount of cleverness will be
required.7
19. Let aₙ denote the n-dimensional volume of the n-dimensional unit ball B(0, 1) ⊂ Rⁿ. Prove that

a_n = \begin{cases} \pi^m/m!, & n = 2m\\ \pi^m 2^{2m+1} m!/(2m+1)!, & n = 2m+1. \end{cases}

(Hint: Proceed by induction with gaps of 2.)
⁷We learned of this calculation from Simmons' Calculus with Analytic Geometry, first edition, pp. 751–2.
CHAPTER 8
Differential Forms and Integration on Manifolds
In this chapter we come to the culmination of our study of multivariable calculus. Just as
in single-variable calculus, we’ve studied two seemingly unrelated topics—the derivative and the
integral. Now the time has come to make the connection between the two, namely, the multivariable
version of the Fundamental Theorem of Calculus. After building up to the ultimate theorem, we
consider some nontrivial applications to physics and topology.
1. Motivation
Figure 1.1
Just as the Fundamental Theorem of Calculus tells us that our displacement is the integral
of our velocity, so can it tell us the area of a plane region by tracing around its boundary (see
Exercises 1.5.3 and 8.3.26). Another instance of the Fundamental Theorem of Calculus is Gauss’s
Law in physics, which tells us that the total flux of the electric field across a “Gaussian surface” is
proportional to the total charge contained inside that surface. And, as we shall see in Section 7,
another application is the Hairy Ball Theorem, which tells us we can’t comb the hairs on a billiard
ball. The elegant modern-day theory of calibrated geometries, which grew out of understanding
minimal surfaces (the surfaces of least area with a given boundary curve), is based on differential
forms and Stokes’s Theorem.
As we’ve seen in Sections 5 and 6 of Chapter 7, determinants play a crucial rôle in the under-
standing of n-dimensional volume, and so it is not surprising that k-forms, the objects we wish to
integrate over k-dimensional surfaces, will be built out of determinants. We turn to this multilinear
algebra in the next section.
EXERCISES 8.1
1. Why does a (plane) mirror reverse left and right but not up and down?
2. Appropriating from Tom and Ray Magliozzi’s Car Talk :
RAY: Picture this. It’s 1936. You’re in your second year of high school. Europe is
on the brink of yet another war.
TOM: Second senior year in high school.
RAY: In a secret location in Germany, German officers are gathered around a table
with the designers and builders of its new personnel carrier. They’re going over every
little detail and leaving no stone unturned. They want everything to be flawless. One
of the officers stands up and says, “I have a question about the fan belt, about the
longevity of the fan belt.” You with me?
TOM: They spoke English there?
RAY: Oh, yeah.
TOM: Just like in all the movies?
RAY: I’m reading the subtitles.
TOM: Just like in all the movies. I often wondered how come they all spoke English?
RAY: Well, it’s so close to German, after all.
TOM: Yeah. You just add an ish or ein to the end of everything.
RAY: Anyway, this fan belt looks just like the belt around your waist. It’s a flat
piece of rubber, and it’s designed to run around the fan and the generator. So, he
asks, “How long do you expect the belt to last?” The engineer says, “30 to 40 thousand
kilometers.” The officer says, “Not good enough.”
TOM: He said, how many miles is that?
RAY: The colonel says . . .
TOM: That’s why I never made any money in scriptwriting.
RAY: Yeah. The colonel says, “Not good enough. We need it to last at least 60K.”
The engineer says, “Huh. Not a problem. It’s just a question of taking off the belt and
flipping it over, right?”
TOM: Sure.
RAY: Turning it inside-out.
TOM: Yeah.
RAY: The officer says, “That’s unacceptable. Our soldiers will be engaged in battle.
We can’t ask them to change fan belts in the middle of the battlefield.”
TOM: Well, it’s a good point.
2. Differential Forms
We have learned how to calculate multiple integrals over regions in Rn . Our next goal is to
be able to integrate over compact manifolds, e.g., curves and surfaces in R3 . In some sense, the
most basic question is this: we know that determinant gives the signed volume of an n-dimensional
parallelepiped in Rn ; how do we find the signed volume of a k-dimensional parallelepiped in Rn ,
and what does “signed” mean in this instance?
2.1. The multilinear set-up. We begin by using the determinant to define various mul-
tilinear functions of (ordered) sets of k vectors in Rn . First, we define n different linear maps
dxi : Rn → R, i = 1, . . . , n, as follows: if
v = \begin{pmatrix} v_1\\ v_2\\ \vdots\\ v_n \end{pmatrix} \in \mathbb{R}^n, \qquad\text{then set}\qquad dx_i(v) = v_i.
(The reason for the bizarre notation will soon become clear.) Note that the set of linear maps from
Rn to R is an n-dimensional vector space, often denoted (Rn )∗ , and {dx1 , . . . , dxn } is a basis for it.
(See Exercise 4.3.25.) For if φ : Rn → R is a linear map, then, letting {e1 , . . . , en } be the standard
basis for Rn , set ai = φ(ei ), i = 1, . . . , n. Then φ = a1 dx1 + · · · + an dxn , so dx1 , . . . , dxn span
(Rn )∗ . Why do they form a linearly independent set? Well, suppose φ = c1 dx1 + · · · + cn dxn is the
zero linear map. Then, in particular, φ(ei ) = ci = 0 for all i = 1, . . . , n, as required.
Now, if I = (i₁, . . . , i_k) is an ordered k-tuple, define dx_I : Rⁿ × · · · × Rⁿ (k times) → R by¹

dx_I(v_1, \ldots, v_k) = \begin{vmatrix} dx_{i_1}(v_1) & \cdots & dx_{i_1}(v_k)\\ \vdots & \ddots & \vdots\\ dx_{i_k}(v_1) & \cdots & dx_{i_k}(v_k) \end{vmatrix}.
As is the case with the determinant, dxI defines an alternating, multilinear function of k vectors
in Rⁿ. If we write

v_i = \begin{pmatrix} v_{i,1}\\ v_{i,2}\\ \vdots\\ v_{i,n} \end{pmatrix}, \qquad i = 1, \ldots, k,

then

dx_I(v_1, \ldots, v_k) = \begin{vmatrix} v_{1,i_1} & \cdots & v_{k,i_1}\\ \vdots & \ddots & \vdots\\ v_{1,i_k} & \cdots & v_{k,i_k} \end{vmatrix}.

When i₁ < i₂ < · · · < i_k, this is of course the determinant of the k × k matrix obtained by taking rows i₁, . . . , i_k of the matrix

\begin{bmatrix} | & | & & |\\ v_1 & v_2 & \cdots & v_k\\ | & | & & | \end{bmatrix}.
Example 1. Let n = 3, I = (1, 3), and let v₁ = (2, 4, 5) and v₂ = (−1, 0, 3). Then

dx_{(1,3)}(v_1, v_2) = \begin{vmatrix} dx_1(v_1) & dx_1(v_2)\\ dx_3(v_1) & dx_3(v_2) \end{vmatrix} = \begin{vmatrix} 2 & -1\\ 5 & 3 \end{vmatrix} = 11. ▽
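The definition is immediately computable; an editorial Python sketch (assuming numpy) that selects rows i₁, . . . , i_k and takes a determinant:

    import numpy as np

    def dx_I(I, vectors):
        """Evaluate dx_I(v_1,...,v_k): det of rows i_1,...,i_k of [v_1 ... v_k]."""
        cols = np.column_stack(vectors)
        rows = [i - 1 for i in I]          # 1-indexed tuples, 0-indexed arrays
        return np.linalg.det(cols[rows, :])

    print(dx_I((1, 3), [np.array([2., 4., 5.]), np.array([-1., 0., 3.])]))  # 11.0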
Example 2. Let n = 4, I = (3, 1, 4), and let

v_1 = \begin{pmatrix} 1\\ -1\\ 0\\ 2 \end{pmatrix}, \qquad v_2 = \begin{pmatrix} 0\\ 3\\ -2\\ 1 \end{pmatrix}, \qquad v_3 = \begin{pmatrix} 2\\ 1\\ 1\\ 1 \end{pmatrix}.
¹Here we revert to the usual notation for functions, inasmuch as v₁, . . . , v_k are all vectors.
Then

dx_{(3,1,4)}(v_1, v_2, v_3) = \begin{vmatrix} dx_3(v_1) & dx_3(v_2) & dx_3(v_3)\\ dx_1(v_1) & dx_1(v_2) & dx_1(v_3)\\ dx_4(v_1) & dx_4(v_2) & dx_4(v_3) \end{vmatrix} = \begin{vmatrix} 0 & -2 & 1\\ 1 & 0 & 2\\ 2 & 1 & 1 \end{vmatrix} = -5. ▽
When i1 < i2 < · · · < ik , we say that the ordered k-tuple I = (i1 , . . . , ik ) is increasing. If I is
a k-tuple with no repeated index, we denote by I < the associated increasing k-tuple. For example,
if I = (2, 4, 5, 1), then I^< = (1, 2, 4, 5), and we observe that dx_{(2,4,5,1)} = −dx_{(1,2,4,5)}.
In general, dxI = (−1)s dxI < , where s is the number of exchanges required to move from I to I < .
Note that if we switch two of the indices in the ordered k-tuple, this amounts to switching two
rows in the matrix, and the determinant changes sign. Similarly, if two of the indices are equal,
the determinant will always be 0, so dxI = 0 whenever there is a repeated index in I.
It follows from Theorem 5.1 or Proposition 5.18 of Chapter 7 that the set of dxI with I increasing
spans the vector space of alternating multilinear functions from (Rn )k to R, denoted Λk (Rn )∗ . In
particular, if T ∈ Λk (Rn )∗ , then for any increasing k-tuple I, set aI = T (ei1 , . . . , eik ). Then we
leave it to the reader to check that
T = \sum_{I\ \text{increasing}} a_I\, dx_I
and that the set of dxI with I increasing forms a linearly independent set (see Exercise 1). Since
counting the increasing sequences of k numbers between 1 and n is the same as counting the number
of k-element subsets of an n-element set, we have
\dim \Lambda^k(\mathbb{R}^n)^* = \binom{n}{k}.
Remark. Suppose I is an increasing k-tuple. We have the following geometric interpretation:
given vectors v1 , . . . , vk ∈ Rn , the number dxI (v1 , . . . , vk ) is the signed volume of the projection
onto the xi1 xi2 . . . xik -plane of the parallelepiped spanned by v1 , . . . , vk . See Figure 2.1.
Generalizing the cross product of vectors in R3 (see Exercise 3), we define the product of these
alternating multilinear functions, as follows. If I and J are ordered k- and ℓ-tuples, respectively,
324 Chapter 8. Differential Forms and Integration on Manifolds
Figure 2.1
we define
dxI ∧ dxJ = dx(I,J) ,
where by (I, J) we mean the ordered (k + ℓ)-tuple obtained by concatenating I and J.
Example 3.
dx(1,2) ∧ dx3 = dx(1,2,3)
dx(1,5) ∧ dx(4,2) = dx(1,5,4,2) = −dx(1,2,4,5)
dx(1,3,2) ∧ dx(3,4) = dx(1,3,2,3,4) = 0 ▽
We extend by linearity: if ω = Σ a_I dx_I and η = Σ b_J dx_J, then we set

\omega \wedge \eta = \sum (a_I b_J)\, dx_I \wedge dx_J = \sum (a_I b_J)\, dx_{(I,J)}.

This is called the wedge product of ω and η.
Example 4. Suppose ω = a1 dx1 + a2 dx2 and η = b1 dx1 + b2 dx2 ∈ Λ1 (R2 )∗ = (R2 )∗ . Then
let’s compute ω ∧ η ∈ Λ2 (R2 )∗ :
ω ∧ η = (a1 dx1 + a2 dx2 ) ∧ (b1 dx1 + b2 dx2 )
= a1 b1 dx1 ∧ dx1 + a2 b1 dx2 ∧ dx1 + a1 b2 dx1 ∧ dx2 + a2 b2 dx2 ∧ dx2
= a1 b1 dx(1,1) + a2 b1 dx(2,1) + a1 b2 dx(1,2) + a2 b2 dx(2,2)
= (a1 b2 − a2 b1 )dx(1,2) .
Of course, it should not be altogether surprising that the determinant of the coefficient matrix \begin{bmatrix} a_1 & a_2\\ b_1 & b_2 \end{bmatrix} has emerged here. ▽
Proof. (1) and (3) are obvious from the definition. For (2), we observe that to change the
ordered (k + ℓ)-tuple (i1 , . . . , ik , j1 , . . . , jℓ ) to the ordered (k + ℓ)-tuple (j1 , . . . , jℓ , i1 , . . . , ik ) requires
kℓ exchanges: to move j1 past i1 , . . . , ik requires k exchanges, to move j2 past i1 , . . . , ik requires k
more, and so on.
Now that we’ve established associativity, we can make the crucial observation that
dxi ∧ dxj = dx(i,j) and, moreover,
dxi1 ∧ dxi2 ∧ · · · ∧ dxik = dx(i1 ,...,ik ) .
As has been our custom throughout this text, when we work in R3 , it is often more convenient to
write x, y, z for x1 , x2 , x3 .
Definition. A (differential) k-form on an open set U ⊂ Rⁿ is an expression

\omega = \sum_{I\ \text{increasing}} f_I\, dx_I

for some smooth functions fI. (Remember that the dxI with I increasing give a basis for Λᵏ(Rⁿ)*.) As usual, if k > n, the only k-form is 0.
We can perform the obvious algebraic manipulations with forms: we can add two k-forms, we
can multiply a k-form by a function, we can form the wedge product of a k-form and an ℓ-form.
The set of k-forms on Rn is naturally a vector space, which we denote by Ak (Rn ).3 For reference
we list the relevant algebraic properties:
Proposition 2.2. Let U ⊂ Rn be an open set. Let ω ∈ Ak (U ), η ∈ Aℓ (U ), and φ ∈ Am (U ).
(1) When k = ℓ = m, ω + η = η + ω and (ω + η) + φ = ω + (η + φ).
(2) ω ∧ η = (−1)kℓ η ∧ ω.
(3) (ω ∧ η) ∧ φ = ω ∧ (η ∧ φ).
(4) When k = ℓ, (ω + η) ∧ φ = (ω ∧ φ) + (η ∧ φ).
Determinants (and hence volume) are already built into the structure of k-forms. As the name
“differential form” suggests, their substantial power comes, however, from our ability to differentiate
them. We begin with the case of a 0-form, i.e., a smooth function f : U → R. Then for any x ∈ U
we want df (x) = Df (x) as a linear map on Rn . In other words, we have
df = \sum_{j=1}^n \frac{\partial f}{\partial x_j}\, dx_j.
In particular, note that if we take f to be the ith coordinate function, then df = dxᵢ and dxᵢ(v) = Dxᵢ(v) = vᵢ, so this explains (in part) our original choice of notation. If ω = Σ_I fI(x) dxI is a k-form, then we define

d\omega = \sum_I df_I \wedge dx_I = \sum_I \sum_{j=1}^n \frac{\partial f_I}{\partial x_j}\, dx_j \wedge dx_{i_1} \wedge \cdots \wedge dx_{i_k}.
²Sorry about that. You think of a better word!
³For those of you who may see such words in the future, it is in fact a module over the ring of smooth functions. Indeed, because we can multiply using the wedge product, if we put all the k-forms together, k = 0, 1, . . . , n, we get what is called a graded algebra.
(Note that for a fixed k-tuple I, only the terms dxj where j is different from i1 , . . . , ik will appear.)
Examples 5.
(a) Suppose f : R → R is smooth. Then we have df = f ′ (x)dx.
(b) Let ω = ydx + xdy ∈ A1 (R2 ). Then dω = dy ∧ dx + dx ∧ dy = 0.
(c) Let ω = −ydx + xdy ∈ A1 (R2 ). Then dω = −dy ∧ dx + dx ∧ dy = 2dx ∧ dy.
(d) Let ω = d arctan(y/x) = \frac{-y\,dx + x\,dy}{x^2+y^2} ∈ A¹(R² − {0}). Then

d\omega = d\Big({-\frac{y}{x^2+y^2}}\Big) \wedge dx + d\Big(\frac{x}{x^2+y^2}\Big) \wedge dy
= -\frac{\partial}{\partial y}\Big(\frac{y}{x^2+y^2}\Big)\, dy \wedge dx + \frac{\partial}{\partial x}\Big(\frac{x}{x^2+y^2}\Big)\, dx \wedge dy
= \frac{(x^2+y^2) - 2y^2}{(x^2+y^2)^2}\, dx \wedge dy + \frac{(x^2+y^2) - 2x^2}{(x^2+y^2)^2}\, dx \wedge dy = 0.
(e) Let ω = x1 dx2 + x3 dx4 + x5 dx6 ∈ A1 (R6 ). Then dω = dx1 ∧ dx2 + dx3 ∧ dx4 + dx5 ∧ dx6 .
(f) Let ω = (x2 + eyz )dy ∧ dz + (y 2 + sin(x3 z))dz ∧ dx + (z 2 + arctan(x2 + y 2 ))dx ∧ dy ∈ A2 (R3 ).
Then
dω = 2xdx ∧ dy ∧ dz + 2ydy ∧ dz ∧ dx + 2zdz ∧ dx ∧ dy
= 2(x + y + z)dx ∧ dy ∧ dz. ▽
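Computations like (c) and (d) come down to partial derivatives: for a 1-form ω = P dx + Q dy on the plane, dω = (∂Q/∂x − ∂P/∂y) dx ∧ dy. An editorial sympy check of both examples:

    import sympy as sp

    x, y = sp.symbols("x y")

    def d_of_1form(P, Q):
        """Coefficient of dx∧dy in d(P dx + Q dy)."""
        return sp.simplify(sp.diff(Q, x) - sp.diff(P, y))

    print(d_of_1form(-y, x))                                  # 2   (Example 5(c))
    print(d_of_1form(-y/(x**2 + y**2), x/(x**2 + y**2)))      # 0   (Example 5(d))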
The operator d, called the exterior derivative, enjoys the following properties:

Proposition 2.3. Let U ⊂ Rⁿ be an open set, let ω, φ ∈ Aᵏ(U), and let η ∈ Aˡ(U). Then
(1) d(ω + φ) = dω + dφ.
(2) If f is a smooth function, then d(f ω) = df ∧ ω + f dω.
(3) d(ω ∧ η) = dω ∧ η + (−1)ᵏ ω ∧ dη.
(4) d(dω) = 0.
Proof. (1) and (2) are immediate; indeed, (2) is a consequence of (3). To prove (3), we note
that because d commutes with sums, it suffices to consider the case that ω = f dxI and η = gdxJ .
Then, since the product rule gives d(f g) = gdf + f dg, we have
d(ω ∧ η) = d(f gdxI ∧ dxJ ) = d(f g) ∧ dxI ∧ dxJ
= (gdf + f dg) ∧ dxI ∧ dxJ = gdf ∧ dxI ∧ dxJ + f dg ∧ dxI ∧ dxJ
= (df ∧ dxI ) ∧ (gdxJ ) + (−1)k (f dxI ) ∧ (dg ∧ dxJ )
= dω ∧ η + (−1)k ω ∧ dη.
To prove (4), suppose ω = f dxI. Then

d\omega = \sum_{j=1}^n \frac{\partial f}{\partial x_j}\, dx_j \wedge dx_I
and

(∗) \qquad d(d\omega) = \sum_{i=1}^n \sum_{j=1}^n \frac{\partial^2 f}{\partial x_i \partial x_j}\, dx_i \wedge dx_j \wedge dx_I.

Since dxᵢ ∧ dxⱼ = −dxⱼ ∧ dxᵢ, we can rewrite the right-hand side of (∗) as

\sum_{i<j} \Big(\frac{\partial^2 f}{\partial x_i \partial x_j} - \frac{\partial^2 f}{\partial x_j \partial x_i}\Big)\, dx_i \wedge dx_j \wedge dx_I,

and this vanishes because the mixed partial derivatives of the smooth function f are equal.
2.3. Pullback. All the algebraic and differential structure inherent in differential forms en-
dows them with a very natural behavior under mappings. The main point is to generalize the
procedure of “integration by substitution” familiar to all calculus students: When confronted with the integral \int_a^b f(g(u))\,g'(u)\,du, we substitute x = g(u), formally write dx = g'(u)\,du, and say

\int_a^b f(g(u))\,g'(u)\,du = \int_{g(a)}^{g(b)} f(x)\,dx.

The proof that this works is, of course, the chain rule. Now we
put this procedure in the proper setting.
Definition. Let g : Rᵐ → Rⁿ be a smooth map, with coordinates u₁, . . . , u_m on Rᵐ and x₁, . . . , xₙ on Rⁿ. For a function f we set g*f = f ∘ g, and we define

g^* dx_i = dg_i = \sum_{j=1}^m \frac{\partial g_i}{\partial u_j}\, du_j.

Note that the coefficients of g*dxᵢ, written as a linear combination of du₁, . . . , du_m, are the entries of the ith row of the derivative matrix of g. Now just let the pullback of a wedge product be the wedge product of the pullbacks:

g^*(dx_{i_1} \wedge \cdots \wedge dx_{i_k}) = g^*dx_{i_1} \wedge \cdots \wedge g^*dx_{i_k}, \qquad g^*\Big(\sum_I f_I\, dx_I\Big) = \sum_I (f_I \circ g)\, g^*dx_I.
Examples 6.
(a) If g : R → R, then g ∗ (f (x)dx) = f (g(u))g ′ (u)du.
(b) Let g : R² → R² be given by g(u, v) = (u cos v, u sin v), and let ω = x dx + y dy. Then

g^*\omega = (u\cos v)(\cos v\, du - u\sin v\, dv) + (u\sin v)(\sin v\, du + u\cos v\, dv)
= u(\cos^2 v + \sin^2 v)\, du + u^2(-\cos v\sin v + \cos v\sin v)\, dv = u\, du.
(c) Moreover, with the same g,

g^*(dx \wedge dy) = g^*dx \wedge g^*dy = (\cos v\, du - u\sin v\, dv) \wedge (\sin v\, du + u\cos v\, dv)
= u(\cos^2 v + \sin^2 v)\, du \wedge dv = u\, du \wedge dv,

so g^*\big(e^{-(x^2+y^2)}\, dx \wedge dy\big) = u e^{-u^2}\, du \wedge dv.
(d) Let g : R² → R³ be given by

g\begin{pmatrix} u\\ v \end{pmatrix} = \begin{pmatrix} u\cos v\\ u\sin v\\ v \end{pmatrix},

and let ω = (x² + y²) dx ∧ dy + x dx ∧ dz + y dy ∧ dz. Then

g^*(dx \wedge dy) = u\, du \wedge dv, \qquad g^*(dx \wedge dz) = \cos v\, du \wedge dv, \qquad g^*(dy \wedge dz) = \sin v\, du \wedge dv,

and so

g^*\omega = u^2(u\, du \wedge dv) + (u\cos v)(\cos v\, du \wedge dv) + (u\sin v)(\sin v\, du \wedge dv)
= u(u^2 + 1)\, du \wedge dv. ▽
It is impossible to miss the appearance of determinants of the derivative matrix in the calculation
we just performed. Indeed, if I is an ordered k-tuple,

g^* dx_I = \sum_{\text{increasing }J} \det\Big(\frac{\partial g_I}{\partial u_J}\Big)\, du_J; \qquad\text{i.e.,}

g^*(dx_{i_1} \wedge \cdots \wedge dx_{i_k}) = \sum_{1 \le j_1 < \cdots < j_k \le m} \begin{vmatrix} \frac{\partial g_{i_1}}{\partial u_{j_1}} & \cdots & \frac{\partial g_{i_1}}{\partial u_{j_k}}\\ \vdots & \ddots & \vdots\\ \frac{\partial g_{i_k}}{\partial u_{j_1}} & \cdots & \frac{\partial g_{i_k}}{\partial u_{j_k}} \end{vmatrix}\, du_{j_1} \wedge \cdots \wedge du_{j_k}.
Proposition 2.4. Let g : Rᵐ → Rⁿ be a smooth map. Then for any k-form ω on Rⁿ we have g*(dω) = d(g*ω).

Proof. The statement for k = 0 is the chain rule (Theorem 3.2 of Chapter 3):

d(g^*f) = d(f \circ g) = \sum_{j=1}^m \Big(\sum_{i=1}^n \Big(\frac{\partial f}{\partial x_i} \circ g\Big)\frac{\partial g_i}{\partial u_j}\Big)\, du_j
= \sum_{i=1}^n \Big(\frac{\partial f}{\partial x_i} \circ g\Big) \sum_{j=1}^m \frac{\partial g_i}{\partial u_j}\, du_j = \sum_{i=1}^n g^*\Big(\frac{\partial f}{\partial x_i}\Big)\, g^*dx_i = g^*(df).
Since the pullback of a wedge product is the wedge product of the pullbacks, we infer that g∗ (dxI ) =
dgI . Because d and pullback are linear, it suffices to prove the result for ω = f dxI . Well,
g^*\big(d(f\, dx_I)\big) = g^*(df \wedge dx_I) = g^*(df) \wedge g^*(dx_I) = g^*(df) \wedge dg_I
= d(g^*f) \wedge dg_I = d\big((g^*f)\, dg_I\big) = d\big(g^*(f\, dx_I)\big).
(Notice that at the penultimate step we use the rule for differentiating the wedge product and the
fact that d(dgi ) = 0.)
Let Ω ⊂ Rk be a region, and let g : Ω → Rn be a smooth one-to-one map whose derivative has
rank k at every point. (Actually, it is allowed to have lesser rank on a set of volume 0, but we won’t
bother with this now.) We say that M = g(Ω) ⊂ Rⁿ is a parametrized k-dimensional manifold. If ω is a k-form on Rⁿ, we define

\int_M \omega = \int_\Omega g^*\omega.

If g₁ : Ω₁ → Rⁿ and g₂ : Ω₂ → Rⁿ are two parametrizations of the same k-manifold M, it can be checked that g₂⁻¹ ∘ g₁ is smooth. Then, provided det D(g₂⁻¹ ∘ g₁) > 0 (which, as we shall soon see, means that g₁ and g₂ determine the same orientation of M), we have

\int_{\Omega_1} g_1^*\omega = \int_{\Omega_2} g_2^*\omega.

That is, the integral of ω over the (oriented) parametrized manifold M is well-defined.
EXERCISES 8.2
1. Prove that as I ranges over all increasing k-tuples, the dx_I form a linearly independent set in Λᵏ(Rⁿ)*. Also check that for any T ∈ Λᵏ(Rⁿ)*, we have T = \sum_{I\ \text{increasing}} a_I\, dx_I, where a_I = T(e_{i_1}, . . . , e_{i_k}).
3. Suppose v, w ∈ R3 . Show that dx(v × w) = dy ∧ dz(v, w), dy(v × w) = dz ∧ dx(v, w), and
dz(v × w) = dx ∧ dy(v, w).
4. Suppose n = (n₁, n₂, n₃) ∈ R³ is a unit vector, and v, w span a parallelogram in the plane orthogonal to n. Define

\varphi = n_1\, dy \wedge dz + n_2\, dz \wedge dx + n_3\, dx \wedge dy.
Prove that φ(v, w) is equal to the signed area of the parallelogram spanned by v and w (the
sign being determined by whether n, v, w form a right-handed system for R3 ).
6. Calculate dω for each of the following forms ω:
a. ω = e^{xy} dx
b. ω = z² dx + x² dy + y² dz
c. ω = x² dy ∧ dz + y² dz ∧ dx + z² dx ∧ dy
d. ω = x₁x₂ dx₃ ∧ dx₄
*7. Can there be a function f so that df is the given 1-form ω (everywhere ω is defined)? If so,
can you find f ?
a. ω = −ydx + xdy
b. ω = 2xydx + x2 dy
c. ω = ydx + zdy + xdz
d. ω = (x2 + yz)dx + (xz + cos y)dy + (z + xy)dz
e. ω = \frac{x}{x^2+y^2}\, dx + \frac{y}{x^2+y^2}\, dy
f. ω = -\frac{y}{x^2+y^2}\, dx + \frac{x}{x^2+y^2}\, dy
8. For each of the following k-forms ω, can there be a (k − 1)-form η (defined wherever ω is) so
that dη = ω?
a. ω = dx ∧ dy
b. ω = xdx ∧ dy
c. ω = zdx ∧ dy
d. ω = zdx ∧ dy + ydx ∧ dz + zdy ∧ dz
e. ω = xdy ∧ dz + ydx ∧ dz + zdx ∧ dy
f. ω = (x2 + y 2 + z 2 )−1 (xdy ∧ dz + ydz ∧ dx + zdx ∧ dy)
g. ω = x5 dx1 ∧ dx2 ∧ dx3 ∧ dx4 + x1 dx2 ∧ dx4 ∧ dx3 ∧ dx5
♯ 9. (The star operator)
a. Define ⋆ : A¹(R²) → A¹(R²) by ⋆dx = dy and ⋆dy = −dx, extending by linearity. If f is a smooth function, show that

d{\star}(df) = \Big(\frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2}\Big)\, dx \wedge dy.

b. Define ⋆ : A¹(R³) → A²(R³) by ⋆dx = dy ∧ dz, ⋆dy = dz ∧ dx, and ⋆dz = dx ∧ dy, extending by linearity. If f is a smooth function, show that

d{\star}(df) = \Big(\frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2} + \frac{\partial^2 f}{\partial z^2}\Big)\, dx \wedge dy \wedge dz.

(Note that we can generalize the definition of the star operator by declaring that, in Rⁿ, ⋆ of a basis 1-form φ = dxᵢ is the “complementary” (n − 1)-form, subject to the sign requirement that φ ∧ ⋆φ = dx₁ ∧ · · · ∧ dxₙ.)
10. Suppose ω ∈ A1 (Rn ) and there is a nowhere-zero function λ so that λω is the exterior derivative
of some function f . Prove that ω ∧ dω = 0. (This problem gives a useful criterion for deciding
whether the differential equation ω = 0 has an integrating factor λ.)
11. In each case, calculate the pullback g*ω and simplify your answer as much as possible.
a. g : (−π/2, π/2) → R, g(u) = sin u, ω = dx/√(1 − x²)
*b. g : R → R², g(v) = (3 cos 2v, 3 sin 2v), ω = −y dx + x dy
c. g : R² → R², g(u, v) = (3u cos 2v, 3u sin 2v), ω = −y dx + x dy
d. g : R² → R³, g(u, v) = (cos u, sin u, v), ω = z dx + x dy + y dz
*e. g : R² → R³, g(u, v) = (cos u, sin u, v), ω = z dx ∧ dy + y dz ∧ dx
f. g : R² → R⁴, g(u, v) = (cos u, sin u, sin v, cos v), ω = x₂ dx₁ + x₃ dx₄
g. g : R² → R⁴, g(u, v) = (cos u, sin u, sin v, cos v), ω = x₁ dx₃ − x₂ dx₄
h. g : R² → R⁴, g(u, v) = (cos u, sin u, sin v, cos v), ω = (−x₃ dx₁ + x₁ dx₃) ∧ (−x₂ dx₄ + x₄ dx₂)
12. For each part of Exercise 11, calculate g∗ (dω) and d(g∗ ω) and compare your answers.
13. Let g : (0, ∞) × (0, π) × (0, 2π) → R3 be the usual spherical coordinates mapping given on
p. 282. Compute g∗ (dx ∧ dy ∧ dz).
♯ 14. We say a k-form ω is closed if dω = 0 and exact if ω = dη for some (k − 1)-form η.
a. Prove that an exact form is closed. Is every closed form exact? (Hint: Work with Example
5(d).)
b. Prove that if ω and φ are closed, then ω ∧ φ is closed.
c. Prove that if ω is exact and φ is closed, then ω ∧ φ is exact.
15. Suppose k ≤ n. Let ω₁, . . . , ω_k ∈ (Rⁿ)* and suppose that \sum_{i=1}^k dx_i \wedge \omega_i = 0. Prove that there are scalars a_{ij} such that a_{ij} = a_{ji} and \omega_i = \sum_{j=1}^k a_{ij}\, dx_j.
16. Suppose Rˡ →ʰ Rᵐ →ᵍ Rⁿ are smooth maps. Prove that (g ∘ h)* = h* ∘ g*. (Hint: It suffices to prove (g ∘ h)*dxᵢ = h*(g*dxᵢ). Why?)
17. a. Suppose I = (i₁, . . . , iₙ) is an ordered n-tuple and I^< = (1, 2, . . . , n). Then we can define a permutation σ of the numbers 1, . . . , n by σ(j) = iⱼ, j = 1, . . . , n. Show that dx_I = sign(σ) dx₁ ∧ · · · ∧ dxₙ.
b. Suppose ωᵢ = \sum_{j=1}^n a_{ij}\, dx_j, i = 1, . . . , n, are 1-forms on Rⁿ. Use Proposition 5.18 of Chapter 7 to prove that ω₁ ∧ · · · ∧ ωₙ = (det A) dx₁ ∧ · · · ∧ dxₙ.
19. Suppose U ⊂ Rᵐ is open and g : U → Rⁿ is smooth. Prove that for any ω ∈ Aᵏ(Rⁿ) and v₁, . . . , v_k ∈ Rᵐ, we have

g^*\omega(a)(v_1, \ldots, v_k) = \omega(g(a))\big(Dg(a)v_1, \ldots, Dg(a)v_k\big).

(Hint: Consider ω = dx_I.)
20. Prove that there is a unique linear operator d mapping Aᵏ(U) → Aᵏ⁺¹(U) for all k that satisfies the properties in Proposition 2.3 and df = \sum_{j=1}^n \frac{\partial f}{\partial x_j}\, dx_j. (This tells us that, appearances to the contrary notwithstanding, the exterior derivative d does not depend on our coordinate system.)
Remark. Consider the curve C − parametrized by h : [a, b] → Rn , h(u) = g(a + b − u). Then
\int_{[a,b]} h^*\omega = \int_a^b F(h(u))\cdot h'(u)\, du = \int_a^b F(g(a+b-u))\cdot\big({-g'(a+b-u)}\big)\, du
= -\int_a^b F(g(t))\cdot g'(t)\, dt \qquad (\text{substituting } t = a+b-u)
= -\int_{[a,b]} g^*\omega.
Note that h(a) = g(b) and h(b) = g(a): when we go backwards on C, the integral of ω changes
sign. We can think of obtaining C − by reversing the orientation (or direction) of C.
In comparing C and C − , the unit tangent vector T reverses direction, so that F · T changes
sign, but ds does not. That is, the notation notwithstanding, ds is not a 1-form, as its value on a
tangent vector to C is the length of that tangent vector; this, in turn, is not a linear function of
tangent vectors! It would probably be better to write |ds|.
Example 1. Let C be the line segment from (1, −1, 0) to (2, 2, 2), and let ω = xy dz. We wish to calculate ∫_C ω. The first step is to parametrize C:
g(t) = (1 + t, −1 + 3t, 2t), 0 ≤ t ≤ 1.
Then
∫_C ω = ∫_[0,1] g*ω = ∫₀¹ (1 + t)(−1 + 3t)(2 dt) = 2 ∫₀¹ (3t² + 2t − 1) dt = 2(t³ + t² − t)|₀¹ = 2. ▽
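The pullback computation is easy to check by machine. The following sketch, in Python with the sympy library (an illustration of ours, not part of the text), recomputes this integral and confirms the sign change promised in the Remark above:

import sympy as sp

t, u = sp.symbols('t u')
x, y, z = 1 + t, -1 + 3*t, 2*t                        # the parametrization g(t)
# g*(xy dz) = x(t) y(t) z'(t) dt, so the line integral is an ordinary integral:
print(sp.integrate(x*y*sp.diff(z, t), (t, 0, 1)))     # 2, as computed above

# Reversing orientation via h(u) = g(1 - u) flips the sign:
xr, yr, zr = [c.subs(t, 1 - u) for c in (x, y, z)]
print(sp.integrate(xr*yr*sp.diff(zr, u), (u, 0, 1)))  # -2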
Example 2. Let ω = −y dx + x dy. Consider two parametrized curves C₁ and C₂, as shown in Figure 3.1, starting at A = (1, 0) and ending at B = (0, 1), and parametrized respectively by:
g(t) = (cos t, sin t), 0 ≤ t ≤ π/2, and h(t) = (1 − t, t), 0 ≤ t ≤ 1.
Figure 3.1
∫_{C₁} ω = ∫_[0,π/2] g*ω = ∫₀^{π/2} ((− sin t)(− sin t dt) + (cos t)(cos t dt)) = ∫₀^{π/2} 1 dt = π/2;
∫_{C₂} ω = ∫_[0,1] h*ω = ∫₀¹ ((−t)(−dt) + (1 − t) dt) = ∫₀¹ 1 dt = 1.
Thus, we see that ∫ₐᴮ ω depends not just on the endpoints of the path, but on the particular path joining them. ▽
Recall from your integral calculus (or introductory physics) class the definition of work done by
a force in displacing an object. When the force and the displacement are parallel, the definition is
work = force × displacement,
and in general only the component of the force vector F in the direction of the displacement vector
d is considered to do work, so
work = F · d.
If a vector field F moves a particle along a parametrized curve C, then it is reasonable to suggest that the total work should be ∫_C F · T ds: Instantaneously the particle moves in the direction of T
and only the component of F in that direction should contribute. Without providing complete rigor,
Figure 3.2
we see from Figure 3.2 that the amount of work done by the force in moving the particle along C during a very small time interval [t, t + h] is approximately F(g(t)) · (g(t + h) − g(t)) ≈ F(g(t)) · g′(t)h, which suggests that the total work should be given by ∫ₐᵇ F(g(t)) · g′(t) dt.
Example 3. What is the relation between work and energy? As we saw in Section 4 of Chapter
7, the kinetic energy of a particle with mass m and velocity v is defined to be K.E. = ½m‖v‖².
Suppose a particle with mass m moves along a curve C, its position at time t being given by g(t),
t ∈ [a, b]. Then the work done by the force field F on the particle is given by
work = ∫_C F · T ds = ∫ₐᵇ F(g(t)) · g′(t) dt
= ∫ₐᵇ m g″(t) · g′(t) dt   (by Newton's second law of motion)
= m ∫ₐᵇ (½‖g′‖²)′(t) dt
= ½m(‖g′(b)‖² − ‖g′(a)‖²)   (by the Fundamental Theorem of Calculus)
= ∆(½m‖v‖²) = ∆(K.E.).
That is, assuming F is the only force acting on the particle, the work done in moving it along a
path is the particle’s change in kinetic energy along that path. ▽
Proposition 3.1. Suppose ω = df for some C1 function f . Then for any path (i.e., piecewise-C1
manifold) C starting at A and ending at B, we have
∫_C ω = f(B) − f(A).
Proof. It follows from Theorem 3.1 of Chapter 6 that any C1 segment of C is a finite union
of parametrized curves Cj , j = 1, . . . , s, where Cj is the image of a C1 function gj : [aj , bj ] → Rn .
Let gj (aj ) = Aj and gj (bj ) = Bj . We may arrange that A1 = A, Bj = Aj+1 , j = 1, . . . , s − 1, and
Bs = B. It suffices to prove the result for Cj , for then we will have
∫_C ω = Σⱼ ∫_{Cⱼ} ω = Σⱼ (f(Bⱼ) − f(Aⱼ)) = f(B) − f(A).
Now, we have
∫_{Cⱼ} ω = ∫_{aⱼ}^{bⱼ} gⱼ*ω   (by definition)
= ∫_{aⱼ}^{bⱼ} gⱼ*(df) = ∫_{aⱼ}^{bⱼ} d(gⱼ*f)   (since d commutes with pullback)
= ∫_{aⱼ}^{bⱼ} d(f ∘ gⱼ)   (by definition of pullback)
= ∫_{aⱼ}^{bⱼ} (f ∘ gⱼ)′(t) dt = f(gⱼ(bⱼ)) − f(gⱼ(aⱼ)) = f(Bⱼ) − f(Aⱼ),
as required. Note that the proof amounts merely to applying the standard Fundamental Theorem
of Calculus, along with the definition of line integration by pullback. The fact that d commutes
with pullback, in this instance, is simply the chain rule.
Theorem 3.2. Let ω = Σ Fᵢ dxᵢ be a 1-form (or let F be the corresponding force field) on an open subset U ⊂ Rⁿ. The following are equivalent:
(1) ∮_C ω = 0 for every closed curve C ⊂ U.
(2) ∫ₐᴮ ω is path-independent in U.
(3) ω = df (or F = ∇f ) for some potential function f on U .
Remark. Note that we are using the notation ∮_C ω to denote the integral of ω around the closed curve (or loop) C. This notation is prevalent in physics texts. Next, in light of Example 3,
there is no net work done by F around closed paths, so that kinetic energy is conserved. This is
why such force fields are called conservative. Physicists refer to −f as the potential energy (P.E.).
It then follows from Proposition 3.1 that the total energy, K.E. + P.E., is conserved along all curves:
for
∆(K.E.) = work = f (B) − f (A) = −∆(P.E.), and so ∆(K.E. + P.E.) = 0.
Proof. (1) =⇒ (2): If C1 and C2 are two paths from A to B, then C = C1 ∪ C2− is a closed
curve, as indicated in Figure 3.3(a). Then
0 = ∫_C ω = ∫_{C₁} ω − ∫_{C₂} ω  ⟹  ∫_{C₁} ω = ∫_{C₂} ω.
Figure 3.3
(2) =⇒ (3): (Here we assume any two points of U can be joined by a path. If not, one must
repeat the argument on each connected “piece” of U .) Fix a ∈ U , and define f : U → R by
f(x) = ∫ₐˣ ω, where the integral is computed along any path from a to x.
By path-independence, f is well-defined. Now, to show that df = ω, we must evidently establish that (∂f/∂xᵢ)(x) = Fᵢ(x). Now, as Figure 3.3(b) suggests,
(∂f/∂xᵢ)(x) = lim_{h→0} (1/h)(f(x + heᵢ) − f(x))
= lim_{h→0} (1/h) ∫ₓ^{x+heᵢ} ω = lim_{h→0} (1/h) ∫₀ʰ Fᵢ(x + teᵢ) dt
= Fᵢ(x)
by the usual Fundamental Theorem of Calculus.
(3) =⇒ (1): This is immediate from Proposition 3.1.
Remark. We know that when ω = df , it must be the case that dω = 0. That is, a necessary
condition for the 1-form ω to be exact is that it be closed. As we saw in Example 5(d) of Section
2, the condition is definitely not sufficient. We shall soon see that the topology of the region on
which ω is defined is relevant.
3.2. Finding a potential function. If we know that the integral of ω is path-independent on a region,
then we can construct a potential function by choosing a convenient path. We illustrate the general
principle with some examples.
Examples 4. Let ω = (eˣ + 2xy) dx + (x² + cos y) dy. We show two different ways to calculate a potential function f, i.e., a function f with df = ω.
Figure 3.4
(a) Take the line segment C joining 0 = (0, 0) and x₀ = (x₀, y₀); we take the obvious parametrization:
g(t) = tx₀ = (tx₀, ty₀), 0 ≤ t ≤ 1.
Then
f(x₀, y₀) = ∫₀^{x₀} ω = ∫_[0,1] g*ω
= ∫₀¹ ((e^{tx₀} + 2t²x₀y₀)x₀ + (t²x₀² + cos(ty₀))y₀) dt
= (e^{tx₀} + ⅔t³x₀²y₀ + ⅓t³x₀²y₀ + sin(ty₀))|₀¹
= e^{x₀} + x₀²y₀ + sin y₀ − 1,
and so we set f(x, y) = eˣ + x²y + sin y − 1, and it is easy to check that df = ω.
(b) Now we take the two-step path, as shown in Figure 3.4(b), first varying x and then varying
y, to get from 0 to x0 . That is, we have the two parametrizations:
C₁ : g₁(t) = (t, 0), 0 ≤ t ≤ x₀;   C₂ : g₂(t) = (x₀, t), 0 ≤ t ≤ y₀.
Then we have
f(x₀, y₀) = ∫_{C₁} ω + ∫_{C₂} ω
= ∫₀^{x₀} eᵗ dt + ∫₀^{y₀} (x₀² + cos t) dt
= (e^{x₀} − 1) + (x₀²y₀ + sin y₀).
Once again, we have f(x, y) = eˣ − 1 + x²y + sin y.
(c) As a variation on the approach of part (b), we proceed purely by antidifferentiating. If we
seek a function f with df = ω, then this means that
(∗)   ∂f/∂x = eˣ + 2xy   and   ∂f/∂y = x² + cos y.
Integrating the first equation, holding y fixed, we obtain
(†)   f(x, y) = ∫ (eˣ + 2xy) dx = eˣ + x²y + h(y)
for some arbitrary function h (this is the “constant of integration”). Differentiating (†)
with respect to y and comparing with the latter equation in (∗), we find
∂f/∂y = x² + h′(y) = x² + cos y,
whence h′(y) = cos y and h(y) = sin y + C. Thus, the general potential function is f(x, y) = eˣ + x²y + sin y + C for any constant C.
Note that even though it is computationally more clumsy, the approach in (a) requires only that
we be able to draw a line segment from the “base point” (in this case, the origin) to all the other
points of our region. The approaches in (b) and (c) require some further sort of convexity: we must
be able to start at our base point and reach every other point by a path that is first horizontal and
then vertical. ▽
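The antidifferentiation algorithm of part (c) is entirely mechanical, and the reader with a computer handy may enjoy the following sketch in Python with the sympy library (our own illustration, not part of the text), which carries out exactly those two steps for the form of Examples 4:

import sympy as sp

x, y = sp.symbols('x y')
P = sp.exp(x) + 2*x*y            # coefficient of dx
Q = x**2 + sp.cos(y)             # coefficient of dy

f = sp.integrate(P, x)                     # integrate P dx, holding y fixed
hp = sp.simplify(Q - sp.diff(f, y))        # this is h'(y); it must not involve x
f = f + sp.integrate(hp, y)                # add the "constant of integration" h(y)
print(f)                                   # exp(x) + x**2*y + sin(y)
assert sp.simplify(sp.diff(f, x) - P) == 0 and sp.simplify(sp.diff(f, y) - Q) == 0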
We now prove a general result along these lines: Suppose an open subset U ⊂ Rn has the
property that for some point a ∈ U , the line segment from a to each and every point x ∈ U lies
entirely in U . (Such a region is called star-shaped with respect to a, as Figure 3.5 suggests.) Then
Figure 3.5
we have:
If ω = Σ Fᵢ dxᵢ is a 1-form on such a U with dω = 0, then ω is exact; indeed, setting g(t) = a + t(x − a),
f(x) = ∫₀¹ F(a + t(x − a)) · (x − a) dt
is a potential function. For, differentiating under the integral sign,
(∂f/∂xᵢ)(x) = ∫₀¹ (Fᵢ(a + t(x − a)) + t Σⱼ (∂Fⱼ/∂xᵢ)(a + t(x − a))(xⱼ − aⱼ)) dt
= ∫₀¹ (Fᵢ(a + t(x − a)) + t Σⱼ (∂Fᵢ/∂xⱼ)(a + t(x − a))(xⱼ − aⱼ)) dt
(using the fact that ∂Fⱼ/∂xᵢ = ∂Fᵢ/∂xⱼ, since dω = 0)
= ∫₀¹ Fᵢ(a + t(x − a)) dt + ∫₀¹ t(Fᵢ ∘ g)′(t) dt   (by the chain rule)
= ∫₀¹ Fᵢ(a + t(x − a)) dt + t(Fᵢ ∘ g)(t)|₀¹ − ∫₀¹ Fᵢ(a + t(x − a)) dt   (integrating by parts)
= Fᵢ(x).
∂f/∂y = x + ∂g/∂y = x + z,
and so we find that ∂g/∂y = z. Thus, g(y, z) = yz + h(z) for some appropriate "constant of integration" h(z). So
f(x, y, z) = z log x + xy + yz + h(z).
Now, differentiating with respect to z, we have
∂f/∂z = log x + y + h′(z) = log x + y + 2z,
and so—finally—h(z) = z² + c, whence
f(x, y, z) = z log x + xy + yz + z² + c,
and so
∫_C ω = f(B) − f(A) = (1 + 4e + 4 + 1) − (−1) = 4e + 7. ▽
Example 6. Newton’s law of gravitation states that the gravitational force exerted by a point
mass M at the origin on a unit test mass is radial and inverse-square in magnitude:
F = −GM x/‖x‖³.
(The corresponding 1-form is ω = −GM(x² + y² + z²)^{−3/2}(x dx + y dy + z dz).) Since ∇(1/‖x‖) = −x/‖x‖³ (see Example 1 of Chapter 3, Section 4), it follows immediately that a potential function for the gravitational field is f(x) = GM/‖x‖. (Physicists ordinarily choose the constant so that the
potential goes to 0 as x goes to infinity.)
Let’s now consider the case of the gravitational field of the earth; note that the gravitational
acceleration at the surface of the earth is given by g = GM/R2 , where R is the radius of the earth.
By Proposition 3.1, the work done (against gravity) to lift a unit test mass from a point A on the
surface of the earth to a point B height h units above the surface of the earth is therefore
−(f(B) − f(A)) = GM(1/R − 1/(R + h)) = GM · h/(R(R + h)) = (GM/R²) · h/(1 + h/R) ≈ gh,
provided h is quite small compared to R. This checks with the standard formula for the potential
energy of a mass m at (small) height h above the surface of the earth: P.E. = mgh. ▽
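A quick numerical check of this approximation, a sketch of ours with rough values R ≈ 6.37 × 10⁶ m and g ≈ 9.8 m/s² assumed:

R, g = 6.37e6, 9.8
GM = g * R**2                         # since g = GM/R^2
for h in (10.0, 1000.0, 1e5):
    print(h, GM*h/(R*(R + h)), g*h)   # the two columns agree so long as h << R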
3.3. Green’s
I Theorem. We have seen that whenever ω = df for some function f , it is the
case that ω = 0 for all closed curves C. So certainly we expect that the size of dω on a region will
C
affect the integral of ω around the boundary of that region. The precise statement is the following
Theorem 3.4 (Green’s Theorem for a rectangle). Let R ⊂ R2 be a rectangle, and let ω be a
1-form on R. Then
∫_{∂R} ω = ∫_R dω.
(Here the boundary ∂R is traversed counterclockwise.)
Proof. Take R = [a, b] × [c, d], as shown in Figure 3.6, and write ω = P dx + Qdy. Then
Figure 3.6
dω = (∂Q/∂x − ∂P/∂y) dx ∧ dy.
Now we merely calculate, using Fubini’s Theorem appropriately:
∫_R dω = ∫_R (∂Q/∂x − ∂P/∂y) dA
= ∫_c^d ∫_a^b (∂Q/∂x) dx dy − ∫_a^b ∫_c^d (∂P/∂y) dy dx
= ∫_c^d (Q(b, y) − Q(a, y)) dy − ∫_a^b (P(x, d) − P(x, c)) dx
(It is important to understand that both S and ∂S inherit an orientation from the parametrization
g.)
Example 7. Suppose ω is a smooth 1-form on the unit disk D in R². Can we infer that ∫_{∂D} ω = ∫_D dω? The naïve answer is "of course," parametrizing by polar coordinates and applying Corollary 3.5. The difficulty that arises is that we only get a bona fide parametrization on (0, 1] × (0, 2π).
But we can apply Corollary 3.5 on the rectangle Rδ,ε = [δ, 1] × [ε, 2π] when δ, ε > 0 are small. Let
Figure 3.7
Dδ,ε = g(Rδ,ε ), as indicated in Figure 3.7. Because ω is smooth on all of the unit disk, we have
∫_D dω = lim_{δ,ε→0⁺} ∫_{D_{δ,ε}} dω = lim_{δ,ε→0⁺} ∫_{R_{δ,ε}} g*dω = lim_{δ,ε→0⁺} ∫_{∂R_{δ,ε}} g*ω = lim_{δ,ε→0⁺} ∫_{∂D_{δ,ε}} ω = ∫_{∂D} ω.
(We leave it to the reader to justify the first and last equalities.) We shall not belabor such details
in the future. ▽
More generally, we observe that Green's Theorem holds for any region S that can be decomposed as a finite union of parametrized rectangles overlapping only along their edges. For, as Figure 3.8 illustrates, if S = S₁ ∪ · · · ∪ Sₖ, because the integrals over interior boundary segments cancel in pairs, we have
∫_{∂S} ω = Σᵢ ∫_{∂Sᵢ} ω = Σᵢ ∫_{Sᵢ} dω = ∫_S dω.
Remark. We do not usually stop to express every “reasonable” region explicitly as a union of
parametrized rectangles. (For most purposes, our work in Section 5 will obviate all such worries.)
Figure 3.8
In Example 7 we already dealt with the case of a disk. To set our minds further at ease, we can
easily check that
g₁ : [0, 1] × [0, π] → R²,   g₁(r, θ) = r(cos θ, sin θ)
maps a rectangle to a half-disk, and that
g₂ : [0, 1] × [0, π/2] → R²,   g₂(r, θ) = (r/(cos θ + sin θ))(cos θ, sin θ)
maps a rectangle to the triangle with vertices at (0, 0), (1, 0), and (0, 1).
Example 8. We can use Green’s Theorem to calculate the area of a planar region S by line
integration. Since
dx ∧ dy = d(x dy) = d(−y dx) = d(½(−y dx + x dy)),
we have
area(S) = ∫_{∂S} x dy = ∫_{∂S} −y dx = ½ ∫_{∂S} (−y dx + x dy). ▽
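For instance, applying the last formula to the ellipse x = a cos t, y = b sin t recovers the area πab; here is a short sketch in Python with sympy (our illustration, not part of the text):

import sympy as sp

t, a, b = sp.symbols('t a b', positive=True)
x, y = a*sp.cos(t), b*sp.sin(t)
# area = (1/2) ∮ (-y dx + x dy); the integrand simplifies to a*b
area = sp.integrate(-y*sp.diff(x, t) + x*sp.diff(y, t), (t, 0, 2*sp.pi))/2
print(area)                                # pi*a*b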
Example 9. Let ω = −y/(x² + y²) dx + x/(x² + y²) dy. Then, as we calculated in Example 5(d) of Section 2, dω = 0. And yet, letting C be the unit circle, it is easy to check that ∮_C ω = 2π. So ω cannot be exact. We shall see further instances of this phenomenon in later sections. ▽
Nevertheless, we can use Green’s Theorem to draw a very interesting conclusion.
Example 10. Suppose C is any simple closed curve in the plane that encircles the origin, and
let Γ be a circle centered at the origin lying in the interior of C, as shown in Figure 3.9. Let S be the
Figure 3.9
Figure 3.10
More generally, consider the curves shown in Figure 3.10. Then ∫_C ω = 2π, 4π, and 0, respectively, in parts (a), (b), and (c). For reasons we leave to the reader to surmise, for a closed plane curve not passing through the origin, the integer
(1/2π) ∫_C (−y/(x² + y²) dx + x/(x² + y²) dy)
is called the winding number of C.
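This integer is easy to compute by machine. Here is a sketch in Python with sympy (ours, not from the text): a circle of radius 2 traversed twice winds twice around the origin, matching the value 4π in part (b) above:

import sympy as sp

t = sp.symbols('t')
x, y = 2*sp.cos(t), 2*sp.sin(t)                     # t runs to 4π: two full turns
w = (-y*sp.diff(x, t) + x*sp.diff(y, t))/(x**2 + y**2)
print(sp.integrate(w, (t, 0, 4*sp.pi))/(2*sp.pi))   # 2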
EXERCISES 8.3
*1. Let ω = y dx + x dy. Compare and contrast the integrals ∫_C ω for the following parametrized curves C. (Be sure to sketch C.)
a. g : [0, 1] → R², g(t) = (t, t)
b. g : [0, 1] → R², g(t) = (t, t²)
c. g : [0, 1] → R², g(t) = (1 − t, 1 − t)
d. g : [0, π/2] → R², g(t) = (cos²t, 1 − sin²t)
e. g : [0, π/4] → R², g(t) = (sin 2t, 1 − cos 2t)
f. g : [0, π/2] → R², g(t) = (cos t, 1 − sin t)
3. Calculate the following line integrals:
a. ∫_C xy³ dx, where C is the unit circle x² + y² = 1, oriented counterclockwise
b. ∫_C z dx + x dy + y dz, where C is the line segment from (0, 1, 2) to (1, −1, 3)
c. ∫_C y² dx + z dy − 3xy dz, where C is the line segment from (1, 0, 1) to (2, 3, −1)
d. ∫_C y dx, where C is the intersection of the unit sphere and the plane x + y + z = 0, oriented counterclockwise as viewed from high above the xy-plane. (Hint: Find an orthonormal basis for the plane.)
6. Determine which of the following 1-forms ω are exact (or, in other words, which of the corresponding vector fields F are conservative). For those that are, construct (following one of the algorithms in the text) a potential function f. For those that are not, give a closed curve C for which ∮_C ω ≠ 0.
a. ω = (x + y)dx + (x + y)dy
b. ω = y 2 dx + x2 dy
c. ω = (ex + 2xy)dx + (x2 + y 2 )dy
d. ω = (x2 + y + z)dx + (x + y 2 + z)dy + (x + y + z 2 )dz
e. ω = y 2 zdx + (2xyz + sin z)dy + (xy 2 + y cos z)dz
7. Let f : R → R and ω = f(‖x‖) Σᵢ xᵢ dxᵢ ∈ A¹(Rⁿ).
a. Assuming f is differentiable, prove that dω = 0 on Rn − {0}.
b. Assuming f is continuous, prove that ω is exact.
*14. Let 0 < b < a. Find the area beneath one arch of the trochoid (as shown in Figure 3.11)
g(t) = (at − b sin t, a − b cos t), 0 ≤ t ≤ 2π.
Figure 3.11
15. Find the area of the plane region bounded by the evolute
g(t) = (a(cos t + t sin t), a(sin t − t cos t)), 0 ≤ t ≤ 2π,
pictured in Figure 3.12.
Figure 3.12
Figure 3.13
This is called the flux of F across C. (See Exercise 8.2.9.) Conclude that when C = ∂S, we
have
∫_C F · n ds = ∫_S (∂F₁/∂x + ∂F₂/∂y) dA.
19. Prove Green's Theorem for the annular region Ω = {(x, y) : a ≤ √(x² + y²) ≤ b}, pictured in Figure 3.14.
Figure 3.14
21. Suppose C is a piecewise C1 closed curve in R2 that intersects itself finitely many times and
does not pass through the origin. Show that the line integral
(1/2π) ∫_C (−y dx + x dy)/(x² + y²)
is always an integer. (See the discussion of Example 10.)
22. Suppose C is a piecewise C¹ closed curve in R² that intersects itself finitely many times and does not pass through (1, 0) or (−1, 0). Show that there are integers m and n so that
(1/2π) ∫_C (A (−y dx + (x − 1) dy)/((x − 1)² + y²) + B (−y dx + (x + 1) dy)/((x + 1)² + y²)) = mA + nB.
23. An ant finds himself in the xy-plane in the presence of the force field F = (y³ + x²y, 2x² − 6xy). Around
what simple closed curve beginning and ending at the origin should he travel counterclockwise
(once) so as to maximize the work done on him by F?
24. Suppose Ω ⊂ R2 is a region with the property that every simple closed curve in Ω bounds a
region contained in Ω that is a finite union of parametrized rectangles. Prove that if ω is a 1-form
on Ω with dω = 0, then ω is exact, i.e., there is a potential function f with ω = df .
25. a. Suppose there is a current c in a river. Show that if we row at a constant ground speed
v > c directly downstream a certain distance and then directly back upstream to our
beginning point, the time required (ignoring the time to turn around) is always greater
than the time it would take with no current. (This is just an Algebra I problem!)
b. Show that the same is true no matter what closed path C we take in the river. (Assume
we still row with ground velocity v, with kvk > c constant.) (Hint: Express the time of
the trip as a line integral over C and do some clever estimates. The diagram in Figure
3.15 may help.)
Figure 3.15
26. According to Webster, a planimeter, pictured in Figure 3.16, is “an instrument for measuring
the area of a regular or irregular plane figure by tracing the perimeter of the figure.” As we
Figure 3.16
show a bit more schematically in Figure 3.17, an arm of fixed length b has one fixed end; to
the other is attached another arm of length a which is free to rotate. A wheel (for convenience
attached slightly off the near end) turns as the arm rotates about the pivot point. Use Green’s
Theorem to explain how the amount that the wheel rotates tells us the area of the figure.
Figure 3.17
4. Surface Integrals and Flux
Suppose U ⊂ R2 is a bounded open set and g : U → Rn is a one-to-one smooth map with the
property that Dg(a) has rank 2 for all a ∈ U . Then we call S = g(U ) a parametrized surface.
Figure 4.1
Assuming 0 < b < a, the image of g is most of a torus, as pictured in Figure 4.2, the surface
of revolution obtained by rotating a circle of radius b about an axis a units from its center.
Figure 4.2
Figure 4.3
As we expect by now, to define the integral of a 2-form over a parametrized surface S, we pull
back and integrate: when ω ∈ A2 (Rn ) and S = g(U ), we set
∫_S ω = ∫_U g*ω
(provided the integral exists).
and so
∫_S ω = ∫_R g*ω = ∫₀^{2π} ∫₀¹ √(1 − r²) r dr dθ = 2π/3.
(b) Now consider g : (0, π/2) × (0, 2π) → R³ given by
g(φ, θ) = (sin φ cos θ, sin φ sin θ, cos φ).
This is an alternative parametrization of the upper hemisphere, S. Then
g*(z dx ∧ dy) = cos φ (cos φ sin φ dφ ∧ dθ) = cos²φ sin φ dφ ∧ dθ,
and so
∫_S ω = ∫_{(0,π/2)×(0,2π)} g*ω = ∫₀^{2π} ∫₀^{π/2} cos²φ sin φ dφ dθ = 2π/3.
(c) Now let’s do the lower hemisphere correspondingly in each of these two ways. Parametrizing
by R = (0, 1) × (0, 2π), we have
h(r, θ) = (r cos θ, r sin θ, −√(1 − r²)).
We then have h*(z dx ∧ dy) = −√(1 − r²) r dr ∧ dθ, and so
∫_S ω = ∫_R h*ω = −2π/3.
On the other hand, in spherical coordinates, we have k : (π/2, π) × (0, 2π) → R3 given by
the same formula as g in part (b) above, and so
∫_S ω = ∫_{(π/2,π)×(0,2π)} k*ω = 2π/3.
What gives? ▽
The answer to the query is very simple. Imagine you were walking around on the unit sphere
with your feet on the surface (your body pointing radially outwards, normal to the sphere). As
you look down, you determine that a basis for the tangent plane to the sphere will be “correctly
Figure 4.4
oriented” if you see a positive (counterclockwise) rotation from the first vector (u) to the second
(v), as pictured in Figure 4.4. We will say that your body is pointing in the direction of the
outward-pointing normal vector to the surface. Note that then n, u, v form a positively-oriented
basis for R3 , i.e.,
det [ n  u  v ] > 0.
Figure 4.5
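The sign discrepancy of Example 2(c) can also be seen concretely by comparing the coefficients of the two pullbacks. Here is a sketch in Python with sympy (the code and the names c1, c2 are ours; the parametrizations are those of the example):

import sympy as sp

r, th, ph = sp.symbols('r theta phi')

# graph parametrization h of the lower hemisphere
x, y, z = r*sp.cos(th), r*sp.sin(th), -sp.sqrt(1 - r**2)
c1 = sp.simplify(z*(sp.diff(x, r)*sp.diff(y, th) - sp.diff(x, th)*sp.diff(y, r)))
print(c1)     # -r*sqrt(1 - r**2): integrates to -2π/3

# spherical parametrization k of the lower hemisphere (π/2 < φ < π)
x, y, z = sp.sin(ph)*sp.cos(th), sp.sin(ph)*sp.sin(th), sp.cos(ph)
c2 = sp.simplify(z*(sp.diff(x, ph)*sp.diff(y, th) - sp.diff(x, th)*sp.diff(y, ph)))
print(c2)     # sin(phi)*cos(phi)**2: integrates to +2π/3

The two parametrizations induce opposite orientations on the lower hemisphere, which is exactly the point of the discussion above.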
Example 3. The standard example of a non-orientable surface is the Möbius strip, pictured in
Figure 4.6. Observe that if you slide the positive basis {u, v} once around the strip, it will return
with the opposite orientation. Alternatively, if you start with an outward-pointing normal n and
travel once around the Möbius strip, the normal returns pointing in the opposite direction. ▽
Figure 4.6
Definition. If S is an oriented surface, its (oriented) area 2-form σ is the 2-form with the
property that σ(a) assigns to each pair of tangent vectors at a the signed area of the parallelogram
they span. (By signed area we mean the obvious: the pair of tangent vectors form a positively-
oriented basis if and only if the signed area is positive.)
4.1. Oriented Surfaces in R³ and Flux. Let S ⊂ R³ be an oriented surface with outward-pointing unit normal n = (n₁, n₂, n₃). Then we claim that
σ = n₁ dy ∧ dz + n₂ dz ∧ dx + n₃ dx ∧ dy
is its area 2-form. This was the point of Exercise 8.2.5, but we give the argument here. If u and v
are in the tangent plane to S, then
σ(u, v) = det [ n  u  v ]
gives the signed volume of the parallelepiped spanned by n, u, and v. Since n is a unit vector
orthogonal to u and v, this volume is the area of the parallelogram spanned by u and v; our
definition of orientation dictates that the signs agree.
and so
σ = (1/√(1 + f′(r)²)) (−(x/r) f′(r) dy ∧ dz − (y/r) f′(r) dz ∧ dx + dx ∧ dy).
Pulling back, we have
g*σ = r√(1 + f′(r)²) dr ∧ dθ,
which agrees with the formula usually derived in single variable integral calculus. ▽
Then
g*σ = (n₁²/n₃) dx ∧ dy + (n₂²/n₃) dx ∧ dy + n₃ dx ∧ dy = (1/n₃) dx ∧ dy.
Recall that if u and v are two vectors in the plane, then σ(u, v) gives the signed area of the
parallelogram they span, whereas (dx ∧ dy)(u, v) gives the signed area of its projection into the
xy-plane. As we see from Figure 4.7, the area of the projection is |n3 | = | cos γ| times the area of
Figure 4.7
the original parallelogram, where γ is the angle between the plane and the xy-plane, so the general
theory is compatible with a more intuitive, geometric approach. ▽
Given a vector field F = (F₁, F₂, F₃) on an open subset of R³, we saw in Section 3 that integrating
the 1-form ω = F1 dx + F2 dy + F3 dz along an oriented curve computes the work done by F in
moving a test particle along that curve. What is the meaning of integrating the corresponding
2-form η = F1 dy ∧ dz + F2 dz ∧ dx + F3 dx ∧ dy over an oriented surface S? (The observant reader
who’s worked Exercise 8.2.9 will recognize that η = ⋆ω. See also Exercise 8.3.18.) Well, if u and v
are tangent to S, then
η(u, v) = det [ F  u  v ] = (F · n) × (signed area of the parallelogram spanned by u and v).
That is, ∫_S η represents the flux of F outwards across S, often written ∫_S F · n dS. Here dS represents an element of (nonoriented) surface area, just as ds represented the element of (nonoriented)
arclength on a curve; in neither case should these be interpreted as the exterior derivative of some-
thing.
A physical interpretation is the following: imagine a fluid in motion (not depending on time),
and let F(x) represent the velocity of the fluid at x multiplied by the density of the fluid at x.
(Note that F points in the direction of the velocity and has units of mass/(area × time).) Then the
mass of fluid that flows across a small area ∆S of S in a small amount of time ∆t is approximately
∆m ≈ δ∆V ≈ δ(v∆t · n)(∆S) ≈ (F · n)∆S∆t,
so that
∆m/∆t ≈ (F · n)∆S.
Taking the limit as ∆t → 0 and summing over the bits of area ∆S, we infer that ∫_S η represents the rate at which mass is transferred across S by the fluid flow.
Example 6. We wish to find the flux of the vector field F = (xz², yx², zy²) outwards across the sphere S of radius a centered at the origin. That is, we wish to find the integral over S of the 2-form η = xz² dy ∧ dz + yx² dz ∧ dx + zy² dx ∧ dy. Calculating the pullback under the spherical coordinate parametrization g : (0, π) × (0, 2π) → R³,
g(φ, θ) = a (sin φ cos θ, sin φ sin θ, cos φ),
we have
g*η = a⁵ (sin φ cos θ cos²φ (sin²φ cos θ) + sin³φ sin θ cos²θ (sin²φ sin θ) + cos φ sin²φ sin²θ (sin φ cos φ)) dφ ∧ dθ
= a⁵ (sin³φ cos²φ + sin⁵φ cos²θ sin²θ) dφ ∧ dθ,
and so
∫_S η = ∫_{(0,π)×(0,2π)} a⁵ (sin³φ cos²φ + sin⁵φ cos²θ sin²θ) dφ ∧ dθ
= a⁵ ∫₀^π ∫₀^{2π} (sin³φ cos²φ + sin⁵φ cos²θ sin²θ) dθ dφ
= 2πa⁵ ∫₀^π (sin³φ cos²φ + ⅛ sin⁵φ) dφ
= 2πa⁵ ∫₀^π (⅛ sin φ + ¾ cos²φ sin φ − ⅞ cos⁴φ sin φ) dφ = (4/5)πa⁵. ▽
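The reader can double-check this integral by machine; the following short computation in Python with sympy (ours, not part of the text) returns 4πa⁵/5:

import sympy as sp

ph, th, a = sp.symbols('phi theta a', positive=True)
integrand = a**5*(sp.sin(ph)**3*sp.cos(ph)**2 + sp.sin(ph)**5*sp.cos(th)**2*sp.sin(th)**2)
print(sp.integrate(integrand, (th, 0, 2*sp.pi), (ph, 0, sp.pi)))   # 4*pi*a**5/5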
4.2. Surface area. We have pilfered Figure 4.8 from someone who, in turn, plagiarized from the book Matematicheskii Analiz na Mnogoobraziyakh by Mikhail Spivak (the Russian edition of Michael Spivak's Calculus on Manifolds). As this example,
Figure 4.8 (top view)
due to Hermann Schwarz, illustrates, one must be far more careful in defining surface area by a limiting process than in defining the arclength of curves. It seems natural to approximate a surface by inscribed triangles. But, even as the triangles get smaller and smaller, the sum of their areas may go to infinity, even in the case of a surface as simple as a cylinder. In particular, by moving the planes
of the hexagons closer together, the triangles become more and more orthogonal to the cylinder.
The area of the individual triangles approaches hℓ/2, and the number of triangles grows without
bound.
For an oriented surface S ⊂ R3 , we can (and did) explicitly write down the 2-form σ that gives
the oriented area-form on S. In analogy with our development of arclength of a curve and our
treatment of change of variables in Chapter 7, we next give a definition of surface area that will
work for any parametrized surface. We need the result of Exercise 7.5.22: if u and v are vectors in
Rⁿ, the area of the parallelogram they span is given by
√((u · u)(v · v) − (u · v)²),
the square root of the determinant of the 2 × 2 matrix of dot products.
(Here is the sketch of a proof. We may assume {u, v} is linearly independent, and let {v3 , . . . , vn }
be an orthonormal basis for Span(u, v)⊥ . Then we know that the volume of the n-dimensional
parallelepiped spanned by u, v, v3 , . . . , vn is the absolute value of the determinant of the matrix
A = [ u  v  v₃  · · ·  vₙ ].
But by our choice of the vectors v3 , . . . , vn , this volume is evidently the area of the parallelogram
spanned by u and v. But by Propositions 5.11 and 5.7 of Chapter 7, we have
(det A)² = det(AᵀA), and AᵀA is block-diagonal, with the 2 × 2 matrix of dot products in the upper left and the identity in the lower right; hence
(det A)² = (u · u)(v · v) − (u · v)²,
as required.) If g is a parametrization of a smooth surface, then for sufficiently small ∆u and ∆v, we expect that the area of the image g([u, u + ∆u] × [v, v + ∆v]) should be approximately the area of the parallelogram that is the image of this rectangle under the linear map Dg(u, v), and that, in turn, is ∆u∆v times the area of the parallelogram spanned by ∂g/∂u and ∂g/∂v.
With this motivation, we now make the following
Definition. Let g : U → Rⁿ be a parametrized surface. We define the surface area of S = g(U) to be
∫∫_U √((∂g/∂u · ∂g/∂u)(∂g/∂v · ∂g/∂v) − (∂g/∂u · ∂g/∂v)²) du dv,
provided the integral exists.
We leave it to the reader to check in Exercise 20 that for a parametrized, oriented surface in
R3 this gives the same result as integrating the area 2-form σ over the surface.
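As a concrete instance, here is a sketch in Python with sympy computing the area of a torus with radii a > b. (The parametrization below is an assumption of ours, chosen to match the torus described in Example 1(b); it is not quoted from the text.)

import sympy as sp

u, v, a, b = sp.symbols('u v a b', positive=True)
g = sp.Matrix([(a + b*sp.cos(v))*sp.cos(u),
               (a + b*sp.cos(v))*sp.sin(u),
               b*sp.sin(v)])
gu, gv = g.diff(u), g.diff(v)
E, F, G = gu.dot(gu), gu.dot(gv), gv.dot(gv)
print(sp.simplify(E*G - F**2))   # b**2*(a + b*cos(v))**2, so the integrand is b*(a + b*cos(v))
area = sp.integrate(b*(a + b*sp.cos(v)), (u, 0, 2*sp.pi), (v, 0, 2*sp.pi))
print(area)                      # 4*pi**2*a*b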
EXERCISES 8.4
1. Let S be that portion of the plane x + 2y + 2z = 4 lying in the first octant, oriented with
outward normal pointing upwards. Find
a. the area of S
b. ∫_S (x − y + 3z) σ
c. ∫_S z dx ∧ dy + y dz ∧ dx + x dy ∧ dz
2. Find the area of that portion of the cylinder x2 + y 2 = a2 lying above the xy-plane and below
the plane z = y.
3. Find the area of that portion of the cone z = √(2(x² + y²)) lying beneath the plane y + z = 1.
*4. Find the area of that portion of the cylinder x2 +y 2 = 2y lying inside the sphere x2 +y 2 +z 2 = 4.
♯ 5. Let S be the sphere of radius a centered at the origin, oriented with normal pointing outwards. Evaluate ∫_S x dy ∧ dz + y dz ∧ dx + z dx ∧ dy explicitly. What formula do you deduce for the surface area of S?
7. Find the surface area of the torus given parametrically in Example 1(b).
*8. Find the surface area of that portion of a sphere of radius a lying between two parallel planes
(both intersecting the sphere) a distance h apart.
Figure 4.9
11. Let ω = x dy ∧ dz. Let S be the unit sphere, oriented with outward-pointing normal. Calculate ∫_S ω by parametrizing S
a. by spherical coordinates
b. as a union of graphs
c. by stereographic projection (see Exercise 10)
12. Let S be the unit upper hemisphere, oriented with outward-pointing normal. Calculate ∫_S zσ by showing that zσ = dx ∧ dy as 2-forms on S.
*14. Find the moment of inertia about the z-axis of a uniform spherical shell of radius a centered
at the origin.
*15. Find the flux of the vector field F(x) = x outwards across the following surfaces (all oriented with outward-pointing normal pointing away from the origin):
a. the surface of the sphere of radius a centered at the origin
b. the surface of the cylinder x² + y² = a², −h ≤ z ≤ h
c. the surface of the cylinder x² + y² = a², −h ≤ z ≤ h, together with the two disks x² + y² ≤ a², z = ±h
d. the surface of the cube with vertices at (±1, ±1, ±1)
16. Find the flux of the vector field F = (x, y², z²) outwards across the given surface S (all oriented with outward-pointing normal pointing away from the origin, unless otherwise specified):
a. S is the sphere of radius a centered at the origin
b. S is the upper hemisphere of radius a centered at the origin
c. S is the cone z = √(x² + y²), 0 < z < 1, with outward-pointing normal having a negative e₃-component
d. S is the cylinder x2 + y 2 = a2 , 0 ≤ z ≤ h
e. S is the cylinder x2 + y 2 = a2 , 0 ≤ z ≤ h, along with the disks x2 + y 2 ≤ a2 , z = 0 and
z=h
*17. Calculate the flux of the vector field F = (xz, yz, x² + y²) outwards across the surface of the paraboloid.
*18. Find the flux of the vector field F(x) = x/kxk3 outwards across the given surface (oriented
with outward-pointing normal pointing away from the origin):
a. the surface of the sphere of radius a centered at the origin
b. the surface of the cylinder x2 + y 2 = a2 , −h ≤ z ≤ h
c. the surface of the cylinder x2 + y 2 = a2 , −h ≤ z ≤ h, together with the two disks,
x2 + y 2 ≤ a2 , z = ±h
d. the surface of the cube with vertices at (±1, ±1, ±1)
19. Let S be that portion of the cone z = √(x² + y²) lying inside the sphere x² + y² + z² = 2ax, oriented with normal pointing downwards. Calculate ∫_S ω for
a. ω = dx ∧ dy
b. ω = (x/z) dy ∧ dz + (y/z) dz ∧ dx − dx ∧ dy.
20. Suppose g : Ω → R³ gives a parametrized, oriented surface with unit outward normal n. Let N = ∂g/∂u × ∂g/∂v, so that n = N/‖N‖. Check that
g*(n₁ dy ∧ dz + n₂ dz ∧ dx + n₃ dx ∧ dy) = ‖N‖ du ∧ dv = √(EG − F²) du ∧ dv.
21. Sketch the parametrized surface g : [0, 2π] × [−1, 1] → R³ given by:
g(u, v) = ((2 + v sin(u/2)) cos u, (2 + v sin(u/2)) sin u, v cos(u/2)).
Compare g*(dy ∧ dz) at (0, 0) and (2π, 0). Explain.
23. Consider the cylinder S with equation x² + y² = 1, −1 ≤ z ≤ 1, oriented with unit normal pointing outwards. Calculate
a. ∫_S x dy ∧ dz − z dx ∧ dy
b. ∫_{C₁} xz dy and ∫_{C₂} xz dy (See Figure 4.10.)
Compare your answers and explain.
24. Let S be the hemisphere x² + y² + z² = a², z ≥ 0, oriented with unit normal pointing upwards. Let C be the boundary curve, x² + y² = a², z = 0, oriented counterclockwise. Calculate
a. ∫_S dx ∧ dy + 2z dz ∧ dx
b. ∫_C x dy + z² dx
Figure 4.10
25. Construct two Möbius strips out of paper: For each, cut out a long rectangle, and attach the
short edges with opposite orientations.
a. Cut along the center circle of the first strip. What happens? Explain. What happens if
you repeat the process?
b. Make parallel cuts in the second strip one third the way from either edge. What happens?
Explain.
26. Prove or give a counterexample: If S is an orientable surface, then there are exactly two possible
orientations on S.
5. Stokes’s Theorem
We now come to the generalization of Green’s Theorem to higher dimensions. We first stop to
make the official definition of the integral of a differential form over a compact, oriented manifold.
So far we have dealt only with the integrals of 1- and 2-forms over parametrized curves and surfaces,
respectively.
Figure 5.1
Definition. Let M be an oriented k-dimensional manifold with boundary. Its (oriented) volume
form is the k-form σ with the property that σ(a) assigns to each k-tuple of tangent vectors at a
the signed volume of the parallelepiped they span.
Now we come to the main technical tool that will enable us to define integration on manifolds.
Theorem 5.1. Let M ⊂ Rn be a compact k-dimensional manifold with boundary. Then there
are smooth real-valued functions ρ1 , . . . , ρN on M so that
(i) 0 ≤ ρi ≤ 1 for all i;
⁵We say U ⊂ Rᵏ₊ is an open subset of Rᵏ₊ if it is the intersection of Rᵏ₊ with some open subset of Rᵏ.
Then h is smooth (in particular, all its derivatives at 0 are equal to 0, as we ask the reader to prove
in Exercise 25). Set
j(x) = (∫₀ˣ h(t)h(1 − t) dt) / (∫₀¹ h(t)h(1 − t) dt),
and define ψ : Rᵏ → R by ψ(x) = j(3 − 2‖x‖). Then ψ is a smooth function with ψ(x) = 1 whenever ‖x‖ ≤ 1 and ψ(x) = 0 whenever ‖x‖ ≥ 3/2. ψ is often called a bump function. (See Figure 5.2 for the graph of ψ for k = 1.)
Figure 5.2
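Step 1 translates directly into code. Here is a sketch in Python for the case k = 1 (ours, not part of the text; scipy's quad routine is assumed available for the integrals):

import numpy as np
from scipy.integrate import quad

h = lambda x: np.exp(-1.0/x) if x > 0 else 0.0
denom, _ = quad(lambda t: h(t)*h(1 - t), 0, 1)

def j(x):
    # j increases from 0 to 1 as x runs from 0 to 1; clamp outside [0, 1]
    num, _ = quad(lambda t: h(t)*h(1 - t), 0, min(max(x, 0.0), 1.0))
    return num/denom

psi = lambda x: j(3 - 2*abs(x))          # for k = 1, |x| plays the role of ||x||
print(psi(0.5), psi(1.0), psi(1.6))      # 1.0 1.0 0.0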
Step 2. For each point p ∈ M , choose a coordinate chart whose domain is a ball of radius 2
in Rk (why can we do so?).6 The images of the balls of radius 1 obviously cover all of M ; indeed
we can choose a sequence (countable number) of p’s so that this is true. (See Exercise 26.) By
Exercise 5.1.12, finitely many of these images of balls of radius 1, say, V1 , . . . , VN , cover all of M .
Let gi : B(0, 2) → Vi be the respective coordinate charts, and define θi = ψ ◦ gi−1 , interpreting θi to
be defined on all of M by letting it be 0 outside of Vi (note that the fact that ψ is 0 outside the
ball of radius 3/2 means that θi will be smooth). Set
ρᵢ = θᵢ / (θ₁ + · · · + θ_N).
Note that for each p ∈ M, we have p = gⱼ(u) for some j and some u ∈ B(0, 1), and hence θⱼ(p) = 1 for that j. Thus, the sum in the denominator is everywhere positive. These functions ρᵢ fulfill the requirements of the
theorem.
⁶For those p in the boundary, this will be a half-ball, i.e., the points in the ball with nonnegative kth coordinate.
Now it is easy to define the integral. Let M ⊂ Rn be a compact, oriented k-dimensional manifold
(with piecewise-smooth boundary). Let ω be a k-form on M .7 Let {ρi } be a partition of unity,
and let gi be the corresponding parametrizations, which we may take to be orientation-preserving
(how?). Now we set
∫_M ω = ∫_M (Σᵢ ρᵢ ω) = Σᵢ ∫_{B(0,2)} gᵢ*(ρᵢ ω).
The point is that the form ρi ω is nonzero only inside the image of the parametrization gi .
One last technical point. Let M be a k-dimensional manifold with boundary, and let p be a
boundary point. The tangent space of ∂M at p is a (k − 1)-dimensional subspace of the tangent
space of M at p, and its orthogonal complement is 1-dimensional. That 1-dimensional subspace has
two possible basis vectors, called the inward- and outward-pointing normal vectors. By definition,
if we follow a curve starting at p whose tangent vector is the inward-pointing normal, we move into
M , as shown in Figure 5.3. We endow ∂M with an orientation, called the boundary orientation,
Figure 5.3
by saying that the outward normal, n, followed by a positively-oriented basis for the tangent space
of ∂M should provide a positively-oriented basis for the tangent space of M . For examples, see
Figure 5.4. We ask the reader to check in Exercise 1 that the boundary orientation on ∂Rᵏ₊ is (−1)ᵏ times the usual orientation on Rᵏ⁻¹.
Figure 5.4
5.2. Stokes’s Theorem. Now we come to the crowning result. We will give various physical
interpretations and applications in the next section, as well as some applications to topology in the
last section of the chapter. Here we will give the theorem and some concrete examples.
Theorem 5.2 (Stokes’s Theorem). Let M be a compact, oriented k-dimensional manifold with
boundary, and let ω be a smooth (k − 1)-form on M . Then
∫_{∂M} ω = ∫_M dω.
Remark. Note that the usual Fundamental Theorem of Calculus, the Fundamental Theorem
of Calculus for Line Integrals (Proposition 3.1), and Green’s Theorem (Corollary 3.5) are all special
cases of this theorem. When we’re orienting the boundary of an oriented line segment, we assign a
+ when the outward-pointing normal agrees with the orientation on the segment, and a − when it
disagrees. This is compatible with the signs in
∫ₐᵇ df = ∫ₐᵇ f′(t) dt = f(b) − f(a).
Proof. Since both sides of the desired equation are linear in ω, we can (by using a partition of
unity) reduce to the case that ω is zero outside of a compact subset of a single coordinate chart,
g : U → Rn (where U is open in either Rk or Rk+ ). Then we have
∫_M dω = ∫_{g(U)} dω = ∫_U g*(dω) = ∫_U d(g*ω).
Case 1. Suppose U is open in Rᵏ; this means that ω = 0 on ∂M, and so we need only show that ∫_M dω = ∫_U d(g*ω) = 0. Write g*ω = Σᵢ fᵢ dx₁ ∧ · · · ∧ d̂xᵢ ∧ · · · ∧ dxₖ. The crucial point is this: since g*ω is smooth and 0 outside of a compact subset of U, we may choose a rectangle R containing U, as shown in Figure 5.5, and extend the functions fᵢ to functions on all of R by setting them equal to 0 outside of U. Finally, we integrate over R = [a₁, b₁] × · · · × [aₖ, bₖ]:
∫_U d(g*ω) = ∫_R Σᵢ (−1)^{i−1} (∂fᵢ/∂xᵢ) dx₁ ∧ · · · ∧ dxₖ
= Σᵢ (−1)^{i−1} ∫_R (∂fᵢ/∂xᵢ) dx₁ dx₂ · · · dxₖ
Figure 5.5
= Σᵢ (−1)^{i−1} ∫_{aₖ}^{bₖ} · · · ∫_{a₁}^{b₁} (∫_{aᵢ}^{bᵢ} (∂fᵢ/∂xᵢ) dxᵢ) dx₁ · · · d̂xᵢ · · · dxₖ
= Σᵢ (−1)^{i−1} ∫_{aₖ}^{bₖ} · · · ∫_{a₁}^{b₁} (fᵢ(x₁, . . . , bᵢ, . . . , xₖ) − fᵢ(x₁, . . . , aᵢ, . . . , xₖ)) dx₁ · · · d̂xᵢ · · · dxₖ
= 0,
since fi = 0 everywhere on the boundary of R. (Note the applications of Fubini’s Theorem and the
traditional Fundamental Theorem of Calculus.)
Case 2. Now comes the more interesting situation. Suppose U is open in Rk+ , and once again
we extend the functions fi to functions on a rectangle R ⊂ Rk+ by letting them be 0 outside of U .
In this case, the rectangle is of the form R = [a₁, b₁] × · · · × [aₖ₋₁, bₖ₋₁] × [0, bₖ], as we see in Figure 5.6.
Figure 5.6
As in Case 1, we have
∫_U d(g*ω) = Σᵢ (−1)^{i−1} ∫_{aₖ}^{bₖ} · · · ∫_{a₁}^{b₁} (fᵢ(x₁, . . . , bᵢ, . . . , xₖ) − fᵢ(x₁, . . . , aᵢ, . . . , xₖ)) dx₁ · · · d̂xᵢ · · · dxₖ
= (−1)^{k−1} ∫_{aₖ₋₁}^{bₖ₋₁} · · · ∫_{a₁}^{b₁} (fₖ(x₁, . . . , xₖ₋₁, bₖ) − fₖ(x₁, . . . , xₖ₋₁, 0)) dx₁ · · · dxₖ₋₁
(since all the other integrals vanish for the same reason as in Case 1)
= (−1)ᵏ ∫_{U∩∂Rᵏ₊} fₖ(x₁, . . . , xₖ₋₁, 0) dx₁ · · · dxₖ₋₁
= ∫_{U∩∂Rᵏ₊} g*ω = ∫_{∂M} ω,
as required. Note the crucial sign in the definition of the boundary orientation (see also Exercise
1).
Remark . Although we won’t take the time to prove it here, Stokes’s Theorem is also valid
when the boundary, rather than being a manifold itself, is piecewise smooth, e.g., a union of smooth
(k − 1)-dimensional manifolds with boundary intersecting along (k − 2)-dimensional manifolds. For
example, we may take a cube or a solid cylinder, whose boundary is the union of a cylinder and
two disks. The theorem also applies to such non-manifolds as a solid cone.
Example 1. Let C be the intersection of the unit sphere x² + y² + z² = 1 and the plane x + 2y + z = 0, oriented counterclockwise as viewed from high above the xy-plane. We wish to evaluate ∫_C (z − x) dx + (x − y) dy + (y − z) dz.
We let ω = (z − x) dx + (x − y) dy + (y − z) dz and M be that portion of the plane x + 2y + z = 0 lying inside the unit sphere, oriented so that the outward-pointing normal has a positive e₃-component, as shown in Figure 5.7. Then ∂M = C, and by Stokes's Theorem we have
(∗)   ∫_C ω = ∫_{∂M} ω = ∫_M dω = ∫_M (dy ∧ dz + dz ∧ dx + dx ∧ dy).
Parametrizing the plane by projection on the xy-plane, we have M = g(D), where D is the interior
Figure 5.7
Figure 5.8
(where by D we mean the disk with its usual upwards orientation), then we have
∫_S ω = ∫_{∂M} ω + ∫_D ω = 64π/3 + ∫₀^{2π} ∫₀² r² · r dr dθ = 64π/3 + 8π = 88π/3.
We leave it to the reader to check this by a direct calculation (see Exercise 8.4.17). ▽
Example 4. We come now to the 3-dimensional analogue of Example 9 of Section 3. It will play
a major rôle in physical and topological applications in upcoming sections. Consider the 2-form
ω = (x dy ∧ dz + y dz ∧ dx + z dx ∧ dy) / (x² + y² + z²)^{3/2},
which is defined and smooth on R3 − {0}. The astute reader may recognize that on a sphere of
radius a centered at the origin, ω is 1/a2 times the area 2-form.
Pulling back by the spherical coordinates parametrization given on p. 315, with a bit of work
we see that
g∗ ω = sin φdφ ∧ dθ,
which establishes again the geometric interpretation of ω. It is also clear that d(g∗ ω) = 0; since
det Dg ≠ 0 whenever ρ ≠ 0 and φ ≠ 0, π, it follows that dω = 0. (Of course, it isn't too hard to
calculate this directly!)
So here we have a 2-form whose integral over any sphere centered at the origin (with outward-pointing normal) is 4π, and yet, for any ball B centered at the origin, ∫_B dω = 0. What happened to Stokes's Theorem? The problem is that ω is not defined, let alone smooth, on all of B.
But there is more to be learned here. If Ω ⊂ R3 is a compact 3-manifold with boundary with
0 ∉ ∂Ω, then we claim that
∫_{∂Ω} ω = 4π when 0 ∈ Ω, and ∫_{∂Ω} ω = 0 when 0 ∉ Ω,
rather like what happened with the winding number in Example 10 of Section 3. When 0 ∉ Ω, we know that ω is a (smooth) 2-form on all of Ω, and hence Stokes's Theorem applies directly to give
∫_{∂Ω} ω = ∫_Ω dω = 0.
When 0 ∈ Ω, however, we choose ε > 0 small enough so that the closed ball B(0, ε) ⊂ Ω, and we let Ωε = Ω − B(0, ε), as pictured in Figure 5.9, recalling that ∂Ωε = ∂Ω + Sε⁻. (Here Sε denotes the sphere of radius ε centered at 0, with its usual outward orientation.)
Figure 5.9
Then ω is a smooth form defined on all of Ωε and we have
0 = ∫_{Ωε} dω = ∫_{∂Ωε} ω = ∫_{∂Ω} ω − ∫_{Sε} ω.
Therefore, we have
∫_{∂Ω} ω = ∫_{Sε} ω = 4π,
as we learned above. ▽
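Both claims about ω (that dω = 0 and that its pullback to the unit sphere is sin φ dφ ∧ dθ) are quick to verify by machine; here is a sketch in Python with sympy (our illustration, not part of the text):

import sympy as sp

x, y, z = sp.symbols('x y z')
r3 = (x**2 + y**2 + z**2)**sp.Rational(3, 2)
P, Q, R = x/r3, y/r3, z/r3        # ω = P dy∧dz + Q dz∧dx + R dx∧dy
# dω = (∂P/∂x + ∂Q/∂y + ∂R/∂z) dx∧dy∧dz:
print(sp.simplify(sp.diff(P, x) + sp.diff(Q, y) + sp.diff(R, z)))   # 0

ph, th = sp.symbols('phi theta')
X = sp.Matrix([sp.sin(ph)*sp.cos(th), sp.sin(ph)*sp.sin(th), sp.cos(ph)])
Xp, Xt = X.diff(ph), X.diff(th)
# on the unit sphere, the dφ∧dθ coefficient of the pullback is det[X Xφ Xθ]:
print(sp.simplify(sp.Matrix.hstack(X, Xp, Xt).det()))               # sin(phi)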
EXERCISES 8.5
*1. Check that the boundary orientation on ∂Rk+ is (−1)k times the usual orientation on Rk−1 .
*6. Let Ω ⊂ R3 be the region bounded above by the sphere x2 + y 2 + z 2 = a2 and below by the
plane z = 0. Compute
∫_{∂Ω} xz dy ∧ dz + yz dz ∧ dx + (x² + y² + z²) dx ∧ dy
directly and by applying Stokes’s Theorem.
7. Let ω = y² dy ∧ dz + x² dz ∧ dx + z² dx ∧ dy, and let M be the solid paraboloid 0 ≤ z ≤ 1 − x² − y². Evaluate ∫_{∂M} ω directly and by applying Stokes's Theorem.
9. Let M be the surface pictured in Figure 5.10, with boundary curve x2 +y 2 = 4, z = 0. Calculate
∫_M yz dy ∧ dz + x³ dz ∧ dx + y² dx ∧ dy.
10. Suppose M and M ′ are two compact, oriented k-dimensional manifolds with boundary, and
suppose ∂M = ∂M′ (as oriented (k − 1)-dimensional manifolds). Prove that for any (k − 1)-form ω, ∫_M dω = ∫_{M′} dω.
11. Use the result of Exercise 10 to compute ∫_M dω for the given surface M and 1-form ω:
Figure 5.10
12. Let M = {x ∈ R⁴ : x₁² + x₂² + x₃² ≤ x₄ ≤ 1}, with the standard orientation inherited from R⁴. Evaluate ∫_{∂M} ω:
*a. ω = (x₁³x₂⁴ + x₄) dx₁ ∧ dx₂ ∧ dx₃
b. ω = ‖x‖² dx₁ ∧ dx₂ ∧ dx₃
15. Let S be that portion of the cylinder x2 + y 2 = a2 lying above the xy-plane and below the
sphere x2 + (y − a)2 + z 2 = 4a2 . Let C be the intersection of the cylinder and sphere, oriented
clockwise as viewed from high above the xy-plane.
a. Evaluate ∫_S z dS.
b. Use your answer to part a to evaluate ∫_C y(z² − 1) dx + x(1 − z²) dy + z² dz.
*18. Let C be the intersection of the sphere x2 + y 2 + z 2 = 1 and the plane x + y + z = 0, oriented
counterclockwise as viewed from high above the xy-plane. Evaluate
∫_C z³ dx.
(Hint: Give an orthonormal basis for the plane x + y + z = 0, and use polar coordinates.)
19. Let C be the intersection of the sphere x2 + y 2 + z 2 = 1 and the plane x + y + z = 0, oriented
counterclockwise as viewed from high above the xy-plane. Evaluate
∫_C xy² dx + yz² dy + zx² dz.
20. Suppose ω ∈ Ak−2 (Rk ). Complete the following proof that d(dω) = 0. Write d(dω) =
f (x)dx1 ∧ · · · ∧ dxk , and suppose f (a) > 0. By considering the integral of d(dω) over a small
ball centered at a and applying Corollary 5.3, arrive at a contradiction.
21. We saw in Example 8 of Section 3 that there are 1-forms ω on R² with the property that for every region S ⊂ R² we have area(S) = ∫_{∂S} ω. Can there be such a 1-form on
a. the unit sphere?
b. the torus?
c. the punctured sphere (i.e., the sphere less the north pole)?
22. In this exercise we sketch a proof that the graph of a function f satisfying the minimal surface
equation (see p. 118) on a region Ω ⊂ R2 has less area than any other surface with the same
boundary curve.8
a. Consider the area 2-form σ of the graph of f:
σ = (1/√(1 + ‖∇f‖²)) (−(∂f/∂x) dy ∧ dz − (∂f/∂y) dz ∧ dx + dx ∧ dy).
Show that dσ = 0 if and only if f satisfies the minimal surface equation.
⁸This is an illustration of the use of calibrations, introduced by Reese Harvey and Blaine Lawson in their seminal paper, Calibrated Geometries, Acta Math. 148 (1982), pp. 47–157.
b. Show that for any compact oriented surface N ⊂ R³, ∫_N σ ≤ area(N), and equality holds if and only if N is parallel to the graph of f. (Hint: Interpret ∫_N σ as a flux integral.)
c. Let M be the graph of f over Ω, and let N be a different oriented surface with ∂N = ∂M .
Deduce that area(M ) < area(N ).
23. a. Prove that M is an orientable k-dimensional manifold with boundary if and only if there is
a nowhere-zero k-form on M . (Hint: For “=⇒,” use definition (1) of a manifold on p. 250
and a partition of unity to glue together compatibly chosen forms on coordinate charts.
Although we’ve only proved Theorem 5.1 for a compact manifold M , the proof can easily
be adapted to show that for any manifold M and any covering {Uj } by coordinate charts,
we have a sequence of such functions ρi , each of which is zero outside some Uj .)
b. Conclude that M is orientable if and only if there is a volume form globally defined on M .
24. Let M be a compact, orientable k-dimensional manifold (with no boundary), and let ω be a
(k − 1)-form. Show that dω = 0 at some point of M . (Hint: Using Exercise 23, write dω = f σ,
where σ is the volume form of M . Without loss of generality, you may assume M is connected.
Why?)
25. Let h(x) = e^{−1/x} for x > 0 and h(x) = 0 for x ≤ 0. Because exponential functions grow faster at infinity than any
polynomial, it should be plausible that all the derivatives of h at 0 are 0. But give a rigorous
proof as follows:
a. Let f (x) = e−1/x , x > 0. Prove by induction that f (k) , the kth derivative of f , is given by
f (k) (x) = e−1/x pk (1/x) for some polynomial pk of degree 2k.
b. Prove by induction that h(k) (0) = 0 for all k ≥ 0.
26. Let X ⊂ Rn . Prove that given any collection {Vα } of open subsets of Rn whose union contains
X, there is a sequence Vα1 , Vα2 , . . . of these sets whose union contains X. (Hint: Consider all
balls B(q, 1/k) ⊂ Rn (for some k ∈ N) centered at points q ∈ Rn all of whose coordinates are
rational. This collection is countable, i.e., can be arranged in a sequence. Show that we can
choose such balls B(qi , 1/ki ), i = 1, 2, . . ., covering all of X with the additional property that
each is contained in some Vαi .)
6. Applications to Physics
6.1. The Dictionary in R3 . We have already seen that a vector field in R3 can plausibly be
interpreted as either a 1-form or a 2-form, the former when we are calculating work, the latter when
we are calculating flux. We have already seen that for any function f , the 1-form df corresponds
to the vector field ∇f . We want to give the traditional interpretations of the exterior derivative as
it acts on 1- and 2-forms.
Given a 1-form ω = F₁ dx₁ + F₂ dx₂ + F₃ dx₃, we have
dω = (∂F₃/∂x₂ − ∂F₂/∂x₃) dx₂ ∧ dx₃ + (∂F₁/∂x₃ − ∂F₃/∂x₁) dx₃ ∧ dx₁ + (∂F₂/∂x₁ − ∂F₁/∂x₂) dx₁ ∧ dx₂.
(We stick to the subscript notation here to make the symmetries as clear as possible.) Correspond-
ingly, given the vector field
F = (F₁, F₂, F₃), we set
curl F = (∂F₃/∂x₂ − ∂F₂/∂x₃, ∂F₁/∂x₃ − ∂F₃/∂x₁, ∂F₂/∂x₁ − ∂F₁/∂x₂).
In somewhat older books one often sees the notation “rot,” rather than curl; both terms suggest
that we think of curl F as having something to do with rotation (curling).
Stokes’s Theorem can now be phrased in the following classical form:
Theorem 6.1 (Classical Stokes’s Theorem). Let S ⊂ R3 be a compact, oriented surface with
boundary. Let F be a smooth vector field defined on all of S. Then we have
∫_{∂S} F · T ds = ∫_S curl F · n dS
(the integrand on the left corresponds to ω, the one on the right to dω).
If we return to our discussion of flux in Section 4 and visualize F as the velocity field of a fluid, then the line integral ∫_C F · T ds around a closed curve C may be interpreted as the circulation
of F around C, which we might visualize as a measure of the tendency of a piece of wire in the
shape of C to turn (or circulate) when dropped in the fluid. Applying the theorem with S = Dr ,
a 2-dimensional disk of radius r centered at a with normal vector n, and using continuity (see
Exercise 7.1.7), we have
curl F(a) · n = lim_{r→0⁺} (1/πr²) ∫_{∂D_r} F · T ds.
In particular, if, as pictured in Figure 6.1, we stick a very small paddlewheel (of radius r) in the
fluid, it will spin the fastest when the axle points in the direction of curl F (and—at least in the
limit—won’t spin at all when the axle is orthogonal to curl F!). Indeed, if the fluid—and hence the
paddlewheel—is spinning about an axis with angular speed ν, then kcurl Fk = 2ν (see Exercise 1).
Figure 6.1
Now, given the 2-form ω = F₁ dx₂ ∧ dx₃ + F₂ dx₃ ∧ dx₁ + F₃ dx₁ ∧ dx₂
(which happens to be obtained by applying the star operator, defined in Exercise 8.2.9, to our
original 1-form), then
dω = (∂F₁/∂x₁ + ∂F₂/∂x₂ + ∂F₃/∂x₃) dx₁ ∧ dx₂ ∧ dx₃.
Correspondingly, given the vector field
F = (F₁, F₂, F₃), we set div F = ∂F₁/∂x₁ + ∂F₂/∂x₂ + ∂F₃/∂x₃.
“div” is short for divergence, a term that is à propos, as we shall soon see. In this case, d2 = 0 can
be restated as
div(curl F) = 0 for all C² vector fields F.
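This identity is easy to confirm symbolically; here is a sketch in Python with sympy (ours, not from the text):

import sympy as sp

x, y, z = sp.symbols('x y z')
F1, F2, F3 = [sp.Function(n)(x, y, z) for n in ('F1', 'F2', 'F3')]
curl = (sp.diff(F3, y) - sp.diff(F2, z),
        sp.diff(F1, z) - sp.diff(F3, x),
        sp.diff(F2, x) - sp.diff(F1, y))
div_curl = sp.diff(curl[0], x) + sp.diff(curl[1], y) + sp.diff(curl[2], z)
print(sp.simplify(div_curl))    # 0, by the equality of mixed partials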
Stokes’s Theorem now takes the following form, sometimes called Gauss’s Theorem:
Theorem 6.2 (Classical Divergence Theorem). Suppose F is a smooth vector field on a compact
3-manifold with boundary, Ω ⊂ R3 . Then
Z Z
|F ·{z
ndS} = |div {z
FdV} .
∂Ω ω Ω
dω
Once again, we get from this a limiting interpretation of the divergence: Applying Exercise
7.1.7, we find
(∗)   div F(a) = lim_{r→0⁺} (1/(4πr³/3)) ∫_{∂B(a,r)} F · n dS.
That is, div F(a) is a measure of the flux (per unit volume) outwards across very small spheres
centered at a. If that flux is positive, we can visualize a as a source of the field, with a net divergence
of the fluid flow; if the flux is negative, we can visualize a as a sink , with a net confluence of the
fluid. We shall see a beautiful alternative interpretation of the divergence in Chapter 9.
Given a vector field F (in the context of work) and the corresponding 1-form ω, applying the
star operator introduced in Exercise 8.2.9 gives the 2-form ⋆ω corresponding to the same vector
field F (in the context of flux)—and vice versa. That is, when we have an oriented surface S, the
2-form ⋆ω gives the normal component of F times the area 2-form σ of S. In particular, if we start
with a function f , then on S, ⋆df = (Dn f )σ, where Dn f = ∇f · n is the directional derivative of
f in the normal direction.
We summarize the relation among forms and vector fields, the d operator and gradient, curl,
and divergence, in the following table:
A⁰(U) → A¹(U) → A²(U) → A³(U)   (each arrow is the operator d)
functions → vector fields → vector fields → functions   (gradient, curl, divergence, respectively)
6.2. Gauss’s Law. In this passage we concentrate on inverse square forces, either gravitation
(according to Newton’s law of gravitation) or electrostatic attraction (according to Coulomb’s
law). We will stick with the notation of Newton’s law of gravitation, as we discussed in Section 4
of Chapter 7: the gravitational attraction of a mass M at the origin on a unit test mass at position
x is given by
x
F = −GM .
kxk3
(Here G is the universal gravitation constant.) As we saw in Example 4 of Section 5, div F = 0
(except at the origin) and for any compact surface S ⊂ R3 bounding a region Ω, we have
∫_S F · n dS = −4πGM if 0 ∈ Ω, and 0 otherwise.
(We must also stipulate that 0 ∉ S for the integral to make sense.) More generally, if Fₐ is the gravitational force field due to a point mass at point a ∉ S, then
∫_S Fₐ · n dS = −4πGM if a ∈ Ω, and 0 otherwise.
If we have point masses M₁, . . . , Mₖ at points a₁, . . . , aₖ, then the flux of the resultant gravitational force F = Σⱼ Fₐⱼ outwards across the surface S (on which, once again, none of the point masses lies) is given by
∫_S F · n dS = −4πG Σ_{aⱼ∈Ω} Mⱼ.
Indeed, given a mass distribution with (integrable) density function δ on a region D, we can, in
fact, write an explicit formula for the gravitational field (see Section 4 of Chapter 7):
(†)   F(x) = G ∫_D ((y − x)/‖y − x‖³) δ(y) dV_y.
(When x ∈ D, this integral is improper, yet convergent, as can be verified by using spherical
coordinates centered at the point x.) It should come as no surprise, approximating the mass
distribution by a finite set of point masses, that the flux of the resulting gravitational force F is
given by
∫_S F · n dS = −4πG ∫_Ω δ dV = −4πGM,
where M is the mass inside S = ∂Ω. This is Gauss’s law.
Using the limiting formula for divergence given in (∗) on p. 378, we see that, even if F isn't apparently smooth, it is plausible to define div F = −4πGδ.
Consider, for instance, the field inside an earth of uniform density, total mass M, and radius R. Applying Gauss's law to the sphere of radius ‖x‖ ≤ R centered at the origin, only the mass M(‖x‖/R)³ inside that sphere contributes, and so ‖F(x)‖ · 4π‖x‖² = 4πGM‖x‖³/R³. Thus, we have ‖F(x)‖ = (GM/R³)‖x‖. Since F is radial, we have
F(x) = −(GM/R³) x.
It is often surprising to find that the gravitational force inside the earth is linear in the distance
from the center. Notice that at the earth’s surface, this analysis is in accord with the inverse-square
nature of the field. (See Exercise 2.)
As an amusing application, we calculate the time required to travel in a perfectly frictionless
tunnel inside the earth from one point on the surface to another. We suppose that we start the trip
with zero speed. When the mass is at position x, the component of the gravitational force acting
in the direction of the tunnel is
−‖F‖ sin θ = −(GM/R³) u,
where u is the displacement of the mass from the center of the tunnel (see Figure 6.2). By Newton’s
second law, we have
mu″(t) = −(GMm/R³) u(t), i.e., u″(t) = −(GM/R³) u(t).
Figure 6.2
Thus u(t) = u(0) cos(√(GM/R³) t), and we see that the mass reaches the opposite end of the tunnel after time
T = π/√(GM/R³) = π√(R/g) ≈ 42 min.
As was pointed out to me my freshman year of college, this is rather less time than many of our
commutes!
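With R ≈ 6.37 × 10⁶ m and g ≈ 9.8 m/s² (values we assume for illustration), one line of Python reproduces the 42 minutes:

import math
print(math.pi*math.sqrt(6.37e6/9.8)/60)    # ≈ 42.2 minutes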
6.3. Maxwell’s Equations. Let E denote the electric field, B the magnetic field, ρ the charge
density, and J the current density. All of these are functions on (some region in) R3 × R (space-
time), on which we use coordinates x, y, z, and t. The classical presentation of Maxwell’s equations
is the following system of four partial differential equations (ignoring various constants such as 4π
and c, the speed of light).
Gauss’s law: div E = ρ
no magnetic monopoles: div B = 0
∂B
Faraday’s law: curl E = −
∂t
∂E
Ampère’s law: curl B = +J
∂t
These are all “differential” versions of equivalent “integral” statements obtained by applying Stokes’s
Theorem, as we already encountered Gauss’s Law in the previous subsection. Briefly: suppose S is
an oriented surface (perhaps imagined) and ∂S represents a wire. Then Faraday’s law states that
∫_{∂S} E · T ds = −∫_S (∂B/∂t) · n dS = −(d/dt) ∫_S B · n dS
(using the result of Exercise 7.2.20 to differentiate under the integral sign); i.e., the voltage around
the loop ∂S equals the negative of the rate of change of magnetic flux across the loop. (More
colloquially, a moving magnetic field induces an electric field that in turn does work, namely,
creates a voltage drop across the loop.) On the other hand, Ampère’s law states that (in steady
state, with no time variation)
∫_{∂S} B · T ds = ∫_S J · n dS,
i.e., the circulation of the magnetic field around the wire is the flux of the current density across
the loop.
Let
ω = (E1 dx + E2 dy + E3 dz) ∧ dt + (B1 dy ∧ dz + B2 dz ∧ dx + B3 dx ∧ dy).
Then
dω = (∂E3/∂y − ∂E2/∂z + ∂B1/∂t) dy ∧ dz ∧ dt + (∂E1/∂z − ∂E3/∂x + ∂B2/∂t) dz ∧ dx ∧ dt
   + (∂E2/∂x − ∂E1/∂y + ∂B3/∂t) dx ∧ dy ∧ dt + (∂B1/∂x + ∂B2/∂y + ∂B3/∂z) dx ∧ dy ∧ dz,
and so we see that
dω = 0 ⟺ div B = 0 and curl E + ∂B/∂t = 0.
Next, let
θ = −(E1 dy ∧ dz + E2 dz ∧ dx + E3 dx ∧ dy) + (B1 dx + B2 dy + B3 dz) ∧ dt.
(Using the star operator defined in Exercise 8.2.9, one can check that θ = ⋆ω. The subtlety is that
we’re working in space-time, endowed with a Lorentz metric in which the standard orthonormal
basis {e1 , . . . , e4 } has the property that e4 ·e4 = −1; this introduces a minus sign so that ⋆(dx∧dt) =
−dy ∧ dz, etc.) Then an analogous calculation shows that
dθ = 0 ⟺ div E = 0 and curl B − ∂E/∂t = 0.
This would hold, for example, in a vacuum, where ρ = 0 and J = 0. But, in general, the first and
last of Maxwell’s equations are equivalent to the equation
dθ = (J1 dy ∧ dz + J2 dz ∧ dx + J3 dx ∧ dy) ∧ dt − ρdx ∧ dy ∧ dz.
Since dω = 0 on R4 , there is a 1-form
α = a1 dx + a2 dy + a3 dz − ϕdt
so that dα = ω (see Exercise 8.7.12). Of course, α is far from unique; for any function f , we will
have d(α + df ) = ω as well. Let β = α + df , where f is a solution of the inhomogeneous wave
equation
∇²f − ∂²f/∂t² = −(∂a1/∂x + ∂a2/∂y + ∂a3/∂z + ∂ϕ/∂t).
This means that d⋆df = −d⋆α, and so d⋆β = 0. Writing
β = A1 dx + A2 dy + A3 dz − φdt,
the condition that d⋆β = 0 is equivalent to
(∗)   ∂A1/∂x + ∂A2/∂y + ∂A3/∂z + ∂φ/∂t = 0.
EXERCISES 8.6
1. Write down the vector field F corresponding to a rotation counterclockwise about an axis in
the direction of the unit vector a with angular speed ν, and check that curl F = 2νa.
2. Using Gauss’s law, show that the gravitational field of a uniform ball outside the ball is that
of a point mass at its center.
4. (See Exercise 3.) Prove that if f and g are harmonic on a region Ω and f = g on ∂Ω, then
f = g everywhere on Ω. (Hint: Consider f − g.)
6. Let S ⊂ R³ be a closed, oriented surface. Using the formula (†) for the gravitational field F,
show that
a. the flux of F outwards across S is 0 when no points of D lie on or inside S.
b. the flux of F outwards across S is −4πG ∫_D δ dV when all of D lies inside S.
(Hint: Change the order of integration.)
*7. Try to determine which of the vector fields pictured in Figure 6.3 have zero divergence and
which have zero curl. Justify your answers.
8. Let F be a smooth vector field on an open set U ⊂ Rn . A parametrized curve g is a flow line
for a vector field F if g′ (t) = F(g(t)) for all t.
a. Give a vector field with a closed flow line.
b. Prove that if F is conservative, then it can have no closed flow line (other than a single
point).
c. Prove that if n = 2 and F has a closed flow line C, then div F must equal 0 at some point
inside C. (Hint: See Exercise 8.3.18.)
10. (Archimedes’ law of buoyancy) Prove that when a floating body in a uniform liquid is at
equilibrium, it displaces its own weight, as follows. Let Ω denote the portion of the body that
is submerged.
a. The force exerted by the pressure of the liquid on a planar piece of surface is directed inward
normal to the surface, and pressure is force per unit area. Deduce that the buoyancy force
is given by B = −∫_{∂Ω} p n dS, where p is the pressure.
b. Assuming that ∇p = δg, where δ is the (constant) density of the liquid and g is the
acceleration of gravity, deduce that B = −M g, where M is the mass of the displaced
liquid. (Hint: Apply Exercise 9.)
c. Deduce the result.
11. Let v be the velocity field of a fluid flow, and let δ be the density of the fluid. (These are both
Figure 6.3 (vector fields (a)–(h))
C1 functions of position and time.) Let F = δv. The law of conservation of mass states that
(d/dt) ∫_Ω δ dV = −∫_{∂Ω} F · n dS.
Show that the validity of this equation for all regions Ω is equivalent to the equation of conti-
nuity:
div F + ∂δ/∂t = 0.
(Hint: Use Exercise 7.2.20.)
12. Suppose a body Ω ⊂ R³ has (C²) temperature u(x, t) at position x ∈ Ω at time t. Assume
that the heat flow vector q = −K∇u, where K is a constant (called the heat conductivity of
the body); the flux of q outwards across an oriented surface S represents the rate of heat flow
across S.
a. Show that the rate of heat flow across ∂Ω into Ω is F = ∫_Ω K∇²u dV.
b. Let c denote the heat capacity of the body; the amount of heat required to raise the
temperature of the volume ∆V by ∆T degrees is approximately (c∆T)∆V; thus, the rate
at which the volume ∆V absorbs heat is c(∂u/∂t)∆V. Conclude that the rate of heat flow
into Ω is F = ∫_Ω c(∂u/∂t) dV.
c. Deduce that the heat flow within Ω is governed by the partial differential equation
c ∂u/∂t = K∇²u.
13. Suppose Ω ⊂ R³ is a region and u : Ω × [0, ∞) → R is a C² solution of the heat equation
∇²u = ∂u/∂t. Suppose u(x, 0) = 0 for all x ∈ Ω and D_n u = 0 on ∂Ω (this means the region is
insulated along the boundary).
a. Consider the "energy" E(t) = ½ ∫_Ω u² dV. Note that E(0) = 0. Prove that E′(t) ≤ 0 (this
means that heat dissipates) and show that E(t) = 0 for all t ≥ 0. (Hint: Use Exercise
7.2.20.)
b. Prove that u(x, t) = 0 for all x ∈ Ω and all t ≥ 0.
c. Prove that if u1 and u2 are two solutions of the heat equation that agree at t = 0 and
agree on ∂Ω for all time t ≥ 0, then they must agree for all time t ≥ 0.
14. Suppose Ω ⊂ R³ is a region and u : Ω × R → R is a C² solution of the wave equation
∇²u = ∂²u/∂t². Suppose that u(x, t) = f(x) for all x ∈ ∂Ω and all t (e.g., in two dimensions,
the drumhead is clamped along the boundary of Ω). Prove that the total energy
E(t) = ½ ∫_Ω ( (∂u/∂t)² + ‖∇u‖² ) dV
is constant. Here by ∇u we mean the vector of derivatives with respect only to the space
variables.
7. Applications to Topology
We are going to give a brief introduction to the field of topology by using the techniques of
differential forms and Stokes’s Theorem to prove three rather deep theorems. The basic ingredient
of several of our proofs is the following. Let Sⁿ denote the n-dimensional unit sphere, Sⁿ = {x ∈
Rⁿ⁺¹ : ‖x‖ = 1}, and Dⁿ the closed unit ball, Dⁿ = {x ∈ Rⁿ : ‖x‖ ≤ 1}. (Then ∂Dⁿ⁺¹ = Sⁿ.)
Theorem 7.2. There is no smooth function r : D n+1 → S n with the property that r(x) = x
for all x ∈ S n .
Proof. Suppose there were such an r. Letting ω be an n-form on S n as in Lemma 7.1, we have
∫_{Sⁿ} ω = ∫_{Sⁿ} r*ω = ∫_{Dⁿ⁺¹} d(r*ω) = ∫_{Dⁿ⁺¹} r*(dω) = 0,
inasmuch as the only (n + 1)-form on an n-dimensional manifold is 0 (and hence dω = 0). But this
is a contradiction, since we chose ω with a nonzero integral.
Corollary 7.3 (Brouwer Fixed Point Theorem). Let f : D n → D n be smooth. Then there
must be a point x ∈ D n so that f (x) = x; i.e., f must have a fixed point.
Proof. Suppose not. Then for all x ∈ Dⁿ, the points x and f(x) are distinct. Define r : Dⁿ →
Sⁿ⁻¹ by setting r(x) to be the point where the ray starting at f(x) and passing through x intersects
the unit sphere, as shown in Figure 7.1.

Figure 7.1

We leave it to the reader to check in Exercise 1 that r is
in fact smooth. By construction, whenever x ∈ S n−1 , we have r(x) = x. By Theorem 7.2, no such
function can exist, and hence f must have a fixed point.
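For the reader who wants to experiment, here is one way to compute the retraction r in coordinates (a sketch, assuming numpy; the sample point x and the pretend value of f(x) are made up for illustration). Writing r(x) = x + t(x − f(x)) and requiring ‖r(x)‖ = 1 gives a quadratic in t, and the construction calls for the root t ≥ 0.

```python
import numpy as np

def retraction(x, fx):
    """Point where the ray from f(x) through x meets the unit sphere.

    Solves |x + t*d|^2 = 1 with d = x - f(x); since |x| <= 1 the two roots
    have opposite signs, and we take the root t >= 0, so that x lies
    between f(x) and r(x). If |x| = 1 this root is t = 0, i.e. r(x) = x."""
    d = x - fx
    a = d @ d
    b = 2 * (x @ d)
    c = x @ x - 1.0
    t = (-b + np.sqrt(b * b - 4 * a * c)) / (2 * a)  # the nonnegative root
    return x + t * d

x = np.array([0.2, -0.1])
fx = np.array([0.5, 0.3])      # a pretend value of f(x), for illustration
r = retraction(x, fx)
print(r, np.linalg.norm(r))    # a point on the unit circle
```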
Topology is in some sense the study of continuous (or, in our case, smooth) deformations of
objects. An old saw is that a topologist is one who cannot tell the difference between a doughnut
and a coffee cup. This is because we can continuously deform one to the other, assuming we have
388 Chapter 8. Differential Forms and Integration on Manifolds
flexible, plastic objects: the “hole” in the doughnut becomes the “hole” in the handle of the cup.
The crucial notion here is the following:
Figure 7.2 (the cylinder S¹ × [0, 1])
∫_{S¹} f*ω = 2π and ∫_{S¹} g*ω = 4π.
On the other hand,
∫_{∂(S¹×[0,1])} H*ω = ∫_{S¹×[0,1]} d(H*ω) = ∫_{S¹×[0,1]} H*(dω) = 0,
since any 2-form on S¹ must be 0. Meanwhile, as we see from Figure 7.2,
∂(S 1 × [0, 1]) = (S 1 × {1})− ∪ (S 1 × {0}),
so
∫_{∂(S¹×[0,1])} H*ω = ∫_{S¹} f*ω − ∫_{S¹} g*ω.
By the way, it is time to give a more precise definition of the term "simply connected." A closed
curve in a manifold X is nothing other than the image of a map S¹ → X.
Corollary 7.5. Suppose X is a simply connected manifold. Then every closed 1-form on X is
exact.
Proof. Let f : S¹ → X be a closed curve. f is homotopic to a constant map g. Since ∫_{S¹} g*ω = 0,
we infer that ∫_{S¹} f*ω = 0. The result now follows from Theorem 3.2.
Note that this is the generalization of the local result we obtained earlier, Proposition 3.3.
Before moving on to our last topic, we stop to state and prove one of the cornerstones of classical
mathematics. We assume a modest familiarity with the complex numbers.
Proof. (We identify C with R2 for purposes of the vector calculus.9 ) Since
lim_{z→∞} (a_{n−1}z^{n−1} + ··· + a1z + a0)/zⁿ = 0,
there is R > 0 so that whenever |z| ≥ R we have
|(a_{n−1}z^{n−1} + ··· + a1z + a0)/zⁿ| ≤ 1/2.
⁹Recall that complex numbers are of the form z = x + iy, x, y ∈ R. We add complex numbers as vectors in
R², and we multiply by using the distributive property and the rule i² = −1: if z = x + iy and w = u + iv, then
zw = (xu − yv) + i(xv + yu). It is customary to denote the length of the complex number z by |z|, and the reader
can easily check that |zw| = |z||w|. In addition, de Moivre's formula tells us that if z = r(cos θ + i sin θ), then
zⁿ = rⁿ(cos nθ + i sin nθ).
On ∂B(0, R) we have a homotopy H : ∂B(0, R) × [0, 1] → C − {0} between p and g(z) = zⁿ given
by
H(z, t) = zⁿ + (1 − t)(a_{n−1}z^{n−1} + ··· + a1z + a0) = tg(z) + (1 − t)p(z).
The crucial issue is that, by the triangle inequality, |H(z, t)| ≥ |z|ⁿ − |a_{n−1}z^{n−1} + ··· + a0| ≥
Rⁿ − ½Rⁿ > 0 for |z| = R, so H really does take its values in C − {0}.
We can actually obtain a stronger, more localized version. We need the following computational
result, a more elegant proof of which is suggested in Exercise 8.
Lemma 7.7. Let ω = (−y dx + x dy)/(x² + y²) ∈ A¹(C − {0}), and suppose f and g are smooth
maps to C − {0}. Then (fg)*ω = f*ω + g*ω.
Now we have an intriguing application of winding numbers (see Section 3) that gives a two-
dimensional analogue of Gauss’s Law from the preceding section. We make use of the Fundamental
Theorem of Algebra.
Proof. As usual, let ω = (−y dx + x dy)/(x² + y²). Using Theorem 7.6, we factor p(z) =
c(z − r1)(z − r2) ··· (z − rn), where c ≠ 0 and rj ∈ C, j = 1, . . . , n, are the roots of p. Let
fj(z) = z − rj. Then we claim that
(1/2π) ∫_C fj*ω = 1 if rj ∈ D, and 0 otherwise.
The former is a consequence of Example 10 on p. 345; the latter follows from Corollary 7.5. Applying
Lemma 7.7 repeatedly, we see that p*ω = Σ_{j=1}^n fj*ω, and so
(1/2π) ∫_C p*ω = Σ_{j=1}^n (1/2π) ∫_C fj*ω = Σ_{rj ∈ D} 1,
the number of roots of p lying in D.
There are far-reaching generalizations of this result that you may learn about in a differential
topology or differential geometry course. An interesting application is the study of how roots of a
polynomial vary as we change the polynomial; see Exercise 9.
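The root count can be illustrated numerically (a sketch, assuming numpy; the polynomial below is a made-up example). The winding-number integral above agrees with the classical form (1/2πi)∮_C p′(z)/p(z) dz, which we discretize over the unit circle:

```python
import numpy as np

# p(z) = (z - 0.5)(z + 0.3 - 0.4i)(z - 3): two roots inside the unit circle
coeffs = np.poly([0.5, -0.3 + 0.4j, 3.0])   # coefficients of p
dcoeffs = np.polyder(coeffs)                # coefficients of p'

N = 20000
t = np.linspace(0.0, 2.0 * np.pi, N, endpoint=False)
z = np.exp(1j * t)                          # C = the unit circle
# (1/2*pi*i) * integral of p'(z)/p(z) dz, with dz = i*z dt
vals = np.polyval(dcoeffs, z) / np.polyval(coeffs, z) * 1j * z
winding = vals.sum() * (2 * np.pi / N) / (2 * np.pi * 1j)
print(round(winding.real))                  # 2 = number of roots inside C
```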
A vector field v on S n is a smooth function v : S n → Rn+1 with the property that x · v(x) = 0
for all x. (That is, v(x) is tangent to the sphere at x.)
Example 3. There is an obvious nowhere-zero vector field on S 1 , the unit circle, which we’ve
seen many times this chapter:
v(x1, x2) = (−x2, x1).
Indeed, an analogous formula works on S^{2m−1} ⊂ R^{2m}:
v(x1, x2, . . . , x_{2m−1}, x_{2m}) = (−x2, x1, . . . , −x_{2m}, x_{2m−1}).
(If we visualize the vector field in the case of the circle as pushing around the circle, in the higher-
dimensional case, we imagine pushing in each of the orthogonal x1 x2 -, x3 x4 -, . . . , x2m−1 x2m -planes
independently.) ▽
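One can check the tangency condition x · v(x) = 0 numerically (a quick sketch, assuming numpy):

```python
import numpy as np

def v(x):
    """The rotational field on S^(2m-1): (x1, x2, ...) -> (-x2, x1, ...)."""
    out = np.empty_like(x)
    out[0::2] = -x[1::2]   # odd-indexed slots get minus the next coordinate
    out[1::2] = x[0::2]
    return out

x = np.random.randn(6)
x /= np.linalg.norm(x)         # a random point of S^5
print(x @ v(x))                # 0 (up to roundoff): v(x) is tangent
print(np.linalg.norm(v(x)))    # 1: the field is nowhere zero on the sphere
```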
In contrast with the preceding example, however, it is somewhat surprising that there is no
nowhere-zero vector field on S n when n is even. The following result is usually affectionately called
the Hairy Ball Theorem, as it says that we cannot “comb the hairs” on an even-dimensional sphere.
Theorem 7.9. Any vector field on the unit sphere S 2m must vanish somewhere.
Proof. Suppose v were a nowhere-zero vector field on S^{2m}. Replacing v by v/‖v‖, we may
assume that ‖v(x)‖ = 1 for all x. Then, as suggested in Figure 7.3, we obtain a homotopy between
the identity map f and the antipodal map g(x) = −x by following great circles:

Figure 7.3

H(x, t) = (cos πt)x + (sin πt)v(x).
Clearly, H is a smooth function. Now we apply Proposition 7.4, using the form ω defined in Lemma
7.1. In particular, we calculate g∗ ω explicitly:
g*ω = g*( Σ_{i=1}^{2m+1} (−1)^{i−1} x_i dx_1 ∧ ··· ∧ \widehat{dx_i} ∧ ··· ∧ dx_{2m+1} )
    = Σ_{i=1}^{2m+1} (−1)^{i−1} (−x_i)(−dx_1) ∧ ··· ∧ \widehat{dx_i} ∧ ··· ∧ (−dx_{2m+1}) = (−1)^{2m+1} ω = −ω.
Thus, we have
∫_{S^{2m}} ω = ∫_{S^{2m}} f*ω = ∫_{S^{2m}} g*ω = −∫_{S^{2m}} ω;
since ∫_{S^{2m}} ω ≠ 0, this is a contradiction.
EXERCISES 8.7
1. Check that the mapping r defined in the proof of Corollary 7.3 is in fact smooth.
2. Consider the maps f and g defined in Example 2 as maps from [0, 2π] to R2 (rather than to
S 1 ). Determine whether they are homotopic.
5. Show that Corollary 7.3 need not hold on the following spaces:
a. S n
b. the annulus {x ∈ R2 : 1 ≤ kxk ≤ 2}
c. a solid torus
d. B n (the open unit ball)
6. Prove the following generalization of Theorem 7.2: let M be any compact, orientable manifold
with boundary. Then there is no function f : M → ∂M with the property that f (x) = x for all
x ∈ ∂M .
Figure 7.4 (curves C1, . . . , C5)
Z = {x2 + y 2 = 1, z = 0} ∪ {x = y = 0} ∪ {x = z = 0, y ≥ 1} ⊂ R3 .
(Hint: Use a homotopy similar to that appearing in the proof of Theorem 7.6.)
b. Let a0 , a1 , . . . , an−1 ∈ C and p(z) = z n + an−1 z n−1 + · · · + a1 z + a0 . Let D ⊂ C be a
region so that no root of p lies on C = ∂D. Prove that there is δ > 0 so that whenever
|bj − aj | < δ for all j = 0, 1, . . . , n − 1, the polynomial P (z) = z n + bn−1 z n−1 + · · · + b1 z + b0
has the same number of roots in D as p.
c. Deduce from part b that “the roots of a polynomial vary continuously with the coefficients.”
(Cf. Example 2 on p. 181 and Exercise 6.2.2. See also Exercise 9.4.22 for an interesting appli-
cation to linear algebra.)
10. Let f : S 2m → S 2m be a smooth map. Prove that there exists x ∈ S 2m so that either f (x) = x
or f (x) = −x.
11. Let n ≥ 2 and f : Dⁿ → Rⁿ be smooth. Suppose ‖f(x) − x‖ < 1 for all x ∈ Sⁿ⁻¹. Prove that
there is some x ∈ Dⁿ so that f(x) = 0. (Hint: If not, show that the restriction of the map
f/‖f‖ : Dⁿ → Sⁿ⁻¹ to ∂Dⁿ is homotopic to the identity map.)
12. We wish to give a generalization of Proposition 3.3. Suppose U ⊂ Rn is an open subset that is
star-shaped with respect to the origin.
a. For any k = 1, . . . , n, given a k-form φ = f_I dx_I on U, define the (k − 1)-form
I(φ) = ( ∫_0^1 t^{k−1} f_I(tx) dt ) Σ_{j=1}^k (−1)^{j−1} x_{i_j} dx_{i_1} ∧ ··· ∧ \widehat{dx_{i_j}} ∧ ··· ∧ dx_{i_k}.
Then make I linear. Prove that φ = d(I(φ)) + I(dφ).
b. Prove that if ω is a closed k-form on U , then ω is exact.
13. Use the result of Exercise 12 to express each of the following closed forms ω on R3 in the form
ω = dη.
a. ω = (ex cos y + z)dx + (2yz 2 − ex sin y)dy + (x + 2y 2 z + ez )dz
b. ω = (2x + y 2 )dy ∧ dz + (3y + z)dx ∧ dz + (z − xy)dx ∧ dy
c. ω = xyzdx ∧ dy ∧ dz
14. Draw an orientable surface whose boundary is the boundary curve of the Möbius strip, as
pictured in Figure 7.5. (More generally, every simple closed curve in R3 bounds an orientable
surface. Can you see why?)
16. Fill in the details in the following alternative proof of Theorem 7.9 following J. Milnor. Given
a (smooth) unit vector field v on S n , first extend v to be a vector field V on Rn+1 by setting
V(x) = ‖x‖² v(x/‖x‖) for x ≠ 0, and V(0) = 0.
Figure 7.5
a. Check that V is C1 .
b. Define f_t : Dⁿ⁺¹ → Rⁿ⁺¹ by f_t(x) = x + tV(x). Apply the inverse function theorem to
prove that for t sufficiently small, f_t maps the closed unit ball one-to-one and onto the
closed ball of radius √(1 + t²). (Hints: To establish one-to-one, first use the inverse function
theorem to show that the function F : Dⁿ⁺¹ × R → Rⁿ⁺¹ × R given by F(x, t) = (f_t(x), t) is
locally one-to-one. Now proceed by contradiction: suppose there were a sequence t_k → 0
and points x_k, y_k ∈ Dⁿ⁺¹ so that f_{t_k}(x_k) = f_{t_k}(y_k). Use compactness of Dⁿ⁺¹ to pass to
convergent subsequences x_{k_j} and y_{k_j}. To establish onto, you will need to use the fact that
the only nonempty subset of Dⁿ⁺¹ that is both open (in Dⁿ⁺¹) and closed is Dⁿ⁺¹ itself.)
c. Apply the Change of Variables Theorem to see that the volume of B(0, √(1 + t²)) must be
a polynomial expression in t.
d. Deduce that you have arrived at a contradiction when n is even.
CHAPTER 9
Eigenvalues, Eigenvectors, and Applications
We have seen the importance of choosing the appropriate coordinates in doing multiple inte-
gration. Now we turn to what is really a much more basic question. Given a linear transformation
T : Rn → Rn , can we choose appropriate (convenient?) coordinates on Rn so that the matrix for
T (in these coordinates) is as simple as possible, say diagonal? For this the fundamental tool is
eigenvalues and eigenvectors. We then give applications to difference and differential equations and
quadratic forms.
In all our previous work, we have referred to the “standard matrix” of a linear transformation.
Now we wish to broaden our scope.
Definition. Let V be a finite-dimensional vector space and let T : V → V be a linear trans-
formation. Let B = {v1 , . . . , vn } be an ordered basis for V . Define numbers aij , i = 1, . . . , n,
j = 1, . . . , n, by
T (vj ) = a1j v1 + a2j v2 + · · · + anj vn , j = 1, . . . , n.
Then we define A = [aij ] to be the matrix for T with respect to B, also denoted [T ]B . As before,
we have
A = [ T(v1) | T(v2) | ··· | T(vn) ],
where now the column vectors are the coordinates of the vectors with respect to the basis B.
We might agree that, generally, the easiest matrices to understand are diagonal. If we think
of our examples of projection and reflection in Rn , we obtain some particularly simple diagonal
matrices.
Example 1. Suppose V ⊂ Rn is a subspace. Choose a basis {v1 , . . . , vk } for V and a basis
{vk+1 , . . . , vn } for V ⊥ . Then B = {v1 , . . . , vn } forms a basis for Rn (why?). Let T = projV : Rn →
Rn be the linear transformation given by projecting onto V , and let S : Rn → Rn be the linear
transformation given by reflecting across V . Then we have
T(v1) = v1, . . . , T(vk) = vk,    S(v1) = v1, . . . , S(vk) = vk,
and
T(vk+1) = 0, . . . , T(vn) = 0,    S(vk+1) = −vk+1, . . . , S(vn) = −vn.
Then the matrices for T and S with respect to the basis B are, respectively,
" # " #
Ik O Ik O
B= and C = . ▽
O O O −In−k
Example 2. Let T : R2 → R2 be the linear transformation defined by multiplying by
" #
3 1
A= .
2 2
It is rather difficult to understand this function until we discover that if we take
1 −1
v1 = and v2 = ,
1 2
then T (v1 ) = 4v1 and T (v2 ) = v2 , so that the matrix for T with respect to the ordered basis
v1
Figure 1.1
Of course, when B is the standard basis E for Rn , this is what you’d expect:
CE(x) = (x1, x2, . . . , xn) = x.
Figure 1.2 (the diagram: T : V → V above, A : Rⁿ → Rⁿ below, with vertical arrows CB)
Suppose now that we have a linear transformation T : V → V and two ordered bases B =
{v1 , . . . , vn } and B ′ = {v1′ , . . . , vn′ } for V . (Often in our applications, as the notation suggests, V
will be Rn and B will be the standard basis E.) Let Aold = [T ]B be the matrix for T with respect
to the “old” basis B, and let Anew = [T ]B′ be the matrix for T with respect to the “new” basis B ′ .
The fundamental issue now is to compute Anew if we know Aold . Define the change-of-basis matrix
P to be the matrix whose column vectors are the coordinates of the new basis vectors with respect
to the old basis: i.e.,
vj′ = p1j v1 + p2j v2 + · · · + pnj vn .
When B is the standard basis, we have our usual schematic picture
P = [ v′1 | v′2 | ··· | v′n ].
Note that P must be invertible, since we can similarly express each of the old basis vectors as
a linear combination of the new basis vectors. (Cf. Proposition 3.4 of Chapter 4.) Then, as the
diagram in Figure 1.3 summarizes, we have the following
Figure 1.3 (the diagram relating [T]B and [T]B′ via P and P⁻¹)
Remark. Two matrices A and B are called similar if B = P −1 AP for some invertible matrix P
(see Exercise 9). Theorem 1.1 tells us that any two matrices representing a linear map T : V → V
are similar.
Proof. Given a vector v ∈ V , denote by x and x′ , respectively, its coordinate vectors with
respect to the bases B and B ′ . The important relation here is
x = P x′ .
We derive this as follows: Using the equations v = Σ_{i=1}^n x_i v_i and
v = Σ_{j=1}^n x′_j v′_j = Σ_{j=1}^n x′_j ( Σ_{i=1}^n p_{ij} v_i ) = Σ_{i=1}^n ( Σ_{j=1}^n p_{ij} x′_j ) v_i,
we conclude that x_i = Σ_{j=1}^n p_{ij} x′_j for each i, which is precisely the matrix equation x = Px′.
(If we think of the old basis as the standard basis for Rn , then this is our familiar fact that
multiplying P by x′ takes the appropriate linear combination of the columns of P .)
Likewise, if T (v) = w, let y and y′ , respectively, denote the coordinate vectors of w with
respect to bases B and B ′ . Now compare the equations
y′ = [T ]B′ x′ and y = [T ]B x
using
y = P y′ and x = P x′ :
y = [T]B x = [T]B (Px′) = ([T]B P)x′, while also y = Py′ = P([T]B′ x′) = (P[T]B′)x′.
Since this holds for every x′, we conclude that [T]B P = P[T]B′, i.e., [T]B′ = P⁻¹[T]B P.
Example 3. Let’s return to Example 2 as a test case for the change-of-basis formula. (Of
course, we’ve already seen there that it works!) Given the matrix
" #
3 1
A = [T ] =
2 2
of a linear transformation T : R2 → R2 with respect to the standard basis, let’s calculate its matrix
[T ]B′ with respect to the new basis B ′ = {v1 , v2 }, where
1 −1
v1 = and v2 = .
1 2
Example 4. We wish to calculate the standard matrix for the linear transformation T = projV ,
where V ⊂ R3 is the plane x1 − 2x2 + x3 = 0. If we choose a basis B = {v1 , v2 , v3 } for R3 so that
{v1 , v2 } is a basis for V and v3 is normal to the plane, then (see Example 1) we’ll have
[T]B = [[1, 0, 0], [0, 1, 0], [0, 0, 0]].
So we take
v1 = (−1, 0, 1), v2 = (1, 1, 1), and v3 = (1, −2, 1).
We wish to know the standard matrix, which means that B ′ = {e1 , e2 , e3 } should be the standard
basis for R3 . Then the inverse of the change-of-basis matrix is
P⁻¹ = [[−1, 1, 1], [0, 1, −2], [1, 1, 1]],
and so
P = [[−1/2, 0, 1/2], [1/3, 1/3, 1/3], [1/6, −1/3, 1/6]].
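A quick numerical check of this computation (a sketch, assuming numpy): with P⁻¹ = [v1 v2 v3] as above, the change-of-basis formula gives the standard matrix [T] = P⁻¹ [T]_B P.

```python
import numpy as np

M = np.array([[-1., 1.,  1.],
              [ 0., 1., -2.],
              [ 1., 1.,  1.]])       # columns v1, v2, v3; this is P^{-1}
TB = np.diag([1., 1., 0.])           # [T]_B for projection onto V

T = M @ TB @ np.linalg.inv(M)        # standard matrix of proj_V
print(T.round(4))

# sanity checks: T fixes vectors in the plane and kills the normal
print(T @ np.array([-1., 0., 1.]))   # v1, returned unchanged
print(T @ np.array([ 1., -2., 1.]))  # the normal (1, -2, 1), sent to ~0
```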
Figure 1.4
a vantage point on the “positive side” of this line.) Once again, the key is to choose a convenient
new basis adapted to the geometry of the problem. We choose
v3 = (1, −1, 1)
along the axis of rotation and v1 , v2 to be an orthonormal basis for the plane orthogonal to that
axis: e.g.,
v1 = (1/√2)(1, 1, 0) and v2 = (1/√6)(−1, 1, 2).
Now let's compute:
T(v1) = −(1/2)v1 + (√3/2)v2,
T(v2) = −(√3/2)v1 − (1/2)v2,
T(v3) = v3.
(Now it should be clear why we chose v1 , v2 to be orthonormal. We also want v1 , v2 , v3 to form
a “right-handed system” so that we’re turning in the correct direction, as indicated in Figure 1.4.
But there’s no need to worry about the length of v3 .) Thus, we have
[T]B = [[−1/2, −√3/2, 0], [√3/2, −1/2, 0], [0, 0, 1]].
Next, we take B′ = {e1, e2, e3}, and the inverse of the change-of-basis matrix is
P⁻¹ = [[1/√2, −1/√6, 1], [1/√2, 1/√6, −1], [0, 2/√6, 1]],
so that
P = [[1/√2, 1/√2, 0], [−1/√6, 1/√6, 2/√6], [1/3, −1/3, 1/3]].
(Exercise 5.5.16 may be helpful here, but, as a last resort, there’s always Gaussian elimination.)
Once again, we solve for
[T] = [T]B′ = P⁻¹[T]B P
    = [[1/√2, −1/√6, 1], [1/√2, 1/√6, −1], [0, 2/√6, 1]]
      [[−1/2, −√3/2, 0], [√3/2, −1/2, 0], [0, 0, 1]]
      [[1/√2, 1/√2, 0], [−1/√6, 1/√6, 2/√6], [1/3, −1/3, 1/3]]
    = [[0, −1, 0], [0, 0, −1], [1, 0, 0]],
amazingly enough. In hindsight, then, we should be able to see the effect of T on the standard
basis vectors quite plainly. Can you? ▽
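A numerical sketch (assuming numpy) that redoes the matrix arithmetic and also anticipates the remark below: composing the two quarter-turns reproduces the same matrix.

```python
import numpy as np

s2, s6, s3 = np.sqrt(2), np.sqrt(6), np.sqrt(3)
Pinv = np.array([[1/s2, -1/s6,  1.],
                 [1/s2,  1/s6, -1.],
                 [0.,    2/s6,  1.]])     # columns v1, v2, v3
TB = np.array([[-0.5, -s3/2, 0.],
               [ s3/2, -0.5, 0.],
               [ 0.,    0.,  1.]])        # rotation in the v1v2-plane

T = Pinv @ TB @ np.linalg.inv(Pinv)
print(T.round(10))                        # [[0,-1,0],[0,0,-1],[1,0,0]]

# quarter-turn about the x3-axis, then quarter-turn about the x1-axis
R3 = np.array([[0., -1., 0.], [1., 0., 0.], [0., 0., 1.]])
R1 = np.array([[1., 0., 0.], [0., 0., -1.], [0., 1., 0.]])
print((R1 @ R3).round(10))                # the same matrix
```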
Remark. Suppose we first rotate π/2 about the x3 -axis and then rotate π/2 about the x1 -axis.
We leave it to the reader to check that the result is the linear transformation whose matrix we
just calculated. This raises a fascinating question: Is the composition of rotations always again a
rotation? If so, is there a way of predicting the ultimate axis and angle?
EXERCISES 9.1
*1. Let v1 = (2, 3) and v2 = (1, 2), and consider the basis B′ = {v1, v2} for R².
a. Suppose T : R² → R² is a linear transformation whose standard matrix is [T] = [[1, 5], [2, −2]].
Find the matrix for T with respect to the basis B ′ .
b. If S : R2 → R2 is a linear transformation defined by
S(v1 ) = 2v1 + v2
S(v2) = −v1 + 3v2,
find the standard matrix for S.
*5. Let T : R3 → R3 be the linear transformation given by reflecting across the plane
x1 − 2x2 + 2x3 = 0. Use the change-of-basis formula to find its standard matrix.
Let V = {x ∈ R³ : x1 − x2 + x3 = 0}.
Find the standard matrix for each of the following linear transformations:
a. projection on V
b. reflection across V
c. rotation of V through angle π/6 (as viewed from high above)
*8. Find the standard matrix for the linear transformation giving projection onto the plane in R⁴
spanned by (1, 0, 2, 1) and (0, 1, −1, 1).
10. See Exercise 9 for the relevant definition. Prove or give a counterexample:
a. If B is similar to A, then B T is similar to AT .
b. If B 2 is similar to A2 , then B is similar to A.
c. If B is similar to A and A is nonsingular, then B is nonsingular.
d. If B is similar to A and A is symmetric, then B is symmetric.
e. If B is similar to A, then N(B) = N(A).
f. If B is similar to A, then rank(B) = rank(A).
11. See Exercise 9 for the relevant definition. Suppose A and B are n × n matrices.
a. Show that if either A or B is nonsingular, then AB and BA are similar.
b. Must AB and BA be similar in general?
12. *a. Let a = (sin φ cos θ, sin φ sin θ, cos φ), 0 ≤ φ < π/2. Prove that the intersection of the
circular cylinder x1² + x2² = 1 with the plane a · x = 0 is an ellipse. (Hint: Consider the new basis
v1 = (−sin θ, cos θ, 0), v2 = (−cos φ cos θ, −cos φ sin θ, sin φ), v3 = a.)
b. Describe the projection of the cylindrical region x21 + x22 = 1, −h ≤ x3 ≤ h onto the
general plane a · x = 0. (Hint: Special cases are the planes x3 = 0 and x1 = 0.)
13. A cube with vertices at (±1, ±1, ±1) is rotated about the long diagonal through ±(1, 1, 1).
Describe the resulting surface and give equation(s) for it.
14. In this exercise we give the general version of the change-of-basis formula for a linear transfor-
mation T : V → W .
a. Suppose V and V′ are ordered bases for the vector space V and W and W′ are ordered bases
for the vector space W . Let P be the change of basis matrix from V to V′ and let Q be
the change of basis matrix from W to W′ . Suppose T : V → W is a linear transformation
whose matrix with respect to the bases V and W is [T ]W V and whose matrix with respect
′ ′ ′ ′
to the new bases V and W is [T ]V′ . Prove that [T ]V′ = Q−1 [T ]W
W W
V P.
b. Consider the identity transformation T : V → V . Using the basis V′ in the domain and
the basis V in the range, show that the matrix for [T ]VV′ is the change of basis matrix P .
15. (See the discussion on p. 174 and Exercise 4.4.18.) Let A be an n × n matrix. Prove that the
functions T : R(A) → C(A) and S : C(A) → R(A) are inverse functions if and only if A = QP ,
where P is a projection matrix and Q is orthogonal.
As we shall soon see, it is often necessary in applications to compute (high) powers of a given
square matrix. When A is diagonalizable, i.e., there is an invertible matrix P so that P −1 AP = Λ
is diagonal, we have
A = PΛP⁻¹, and so
Aᵏ = (PΛP⁻¹)(PΛP⁻¹) ··· (PΛP⁻¹)  [k factors]  = PΛᵏP⁻¹.
Since Λk is easy to calculate, we are left with a very computable formula for Ak . We will see a
number of applications of this principle in Section 3. We turn first to the matter of finding the
diagonal matrix Λ if, in fact, A is diagonalizable. Then we will try to develop some criteria that
guarantee diagonalizability.
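In code the principle looks like this (a sketch, assuming numpy and a diagonalizable matrix; the matrix below is the one from Example 2):

```python
import numpy as np

def power(A, k):
    """Compute A^k as P Lambda^k P^{-1} (valid when A is diagonalizable)."""
    lam, P = np.linalg.eig(A)                 # columns of P are eigenvectors
    Ak = P @ np.diag(lam**k) @ np.linalg.inv(P)
    return Ak.real                            # A below has real eigenvalues

A = np.array([[3., 1.], [2., 2.]])            # eigenvalues 4 and 1
print(power(A, 10).round(6))
print(np.linalg.matrix_power(A, 10))          # agrees
```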
T(v1) = λ1v1, T(v2) = λ2v2, . . . , T(vn) = λnvn.
Likewise, an n×n matrix A is diagonalizable if there is a basis {v1 , . . . , vn } for Rn with the property
that Avi = λi vi for all i = 1, . . . , n.
This observation leads us to the following
At this juncture, the obvious question to ask is how we should find eigenvectors. Let’s start
by observing that, if we include the zero vector, the set of eigenvectors with eigenvalue λ forms a
subspace.
Lemma 2.2. Let T : V → V be a linear transformation, and let λ be any scalar. Then
E(λ) = {v ∈ V : T (v) = λv} = ker(T − λI)
is a subspace of V. Moreover, dim E(λ) > 0 if and only if λ is an eigenvalue, in which case we call E(λ) the
λ-eigenspace.
Proof. That E(λ) is a subspace follows immediately once we recognize that it is the kernel
(or nullspace) of a linear map. (In the more familiar matrix notation, {x ∈ Rn : Ax = λx} =
N(A − λI).) Now, by definition, λ is an eigenvalue precisely when there is a nonzero vector in
E(λ).
We now come to the main computational tool for finding eigenvalues.
Proof. From Lemma 2.2 we infer that λ is an eigenvalue if and only if the matrix A − λI is
singular. Next we conclude from Theorem 5.5 of Chapter 7 that A − λI is singular precisely when
det(A − λI) = 0. Putting the two statements together, we obtain the result.
Once we use this criterion to find the eigenvalues λ, it is an easy matter to find the corresponding
eigenvectors merely by finding N(A − λI).
Since t2 − 10t + 24 = (t − 4)(t − 6) = 0 when t = 4 or t = 6, these are our two eigenvalues. We now
proceed to find the corresponding eigenspaces:
E(4): We see that
v1 = (1, 1) is a basis for N(A − 4I) = N([[−1, 1], [−3, 3]]).
then
P⁻¹AP = [[−1/2, −1/2, 1/2], [0, −1, 0], [1/2, 5/2, 1/2]] [[1, 2, 1], [0, 1, 0], [1, 3, 1]] [[−1, 3, 1], [0, −1, 0], [1, 2, 1]]
      = [[0, 0, 0], [0, 1, 0], [0, 0, 2]],
as we expected. ▽
Remark. There is a built-in check here for the eigenvalues. If λ is truly to be an eigenvalue of
A, we must find a nonzero vector in N(A − λI). If we do not, then λ cannot be an eigenvalue.
It is evident that we are going to find the eigenvalues of a matrix A by finding the (real) roots
of the polynomial det(A − tI). This leads us to make our next
Definition. Let A be a square matrix. Then p(t) = pA (t) = det(A − tI) is called the charac-
teristic polynomial of A.1
We can restate Proposition 2.3 by saying that the eigenvalues of A are the real roots of the
characteristic polynomial pA (t). It is comforting to observe that similar matrices have the same
characteristic polynomial, and hence it makes sense to refer to the characteristic polynomial of a
linear map T : V → V .
Proof. We have
pB(t) = det(B − tI) = det(P⁻¹AP − tI) = det(P⁻¹(A − tI)P) = det(A − tI) = pA(t),
Remark . In order to determine the eigenvalues of a matrix, we must find the roots of its
characteristic polynomial. In real-world applications (where the matrices tend to get quite large),
one might solve this numerically (e.g., using Newton’s method). However, there are more sophisti-
cated methods for finding the eigenvalues without even calculating the characteristic polynomial; a
powerful such method is based on the Gram-Schmidt process. The interested reader should consult
Strang or Wilkinson for more details.
For the lion’s share of the matrices that we shall encounter here, the eigenvalues will be integers,
and so we take this opportunity to remind you of a trick from high school algebra.
Proof. You can find a proof in most abstract algebra texts, but, for obvious reasons, we
recommend Abstract Algebra: A Geometric Approach, by someone named T. Shifrin, p. 105.
In particular, when the leading coefficient an is ±1, as is always the case with the characteristic
polynomial, any rational root must in fact be an integer that divides a0 . So, in practice, we test
the various factors of a0 (being careful to try both positive and negative factors). Once we find one
root r, we can divide p(t) by t − r to obtain a polynomial of smaller degree.
Remark. It might be nice to have a few shortcuts for calculating the characteristic polynomial
of small matrices. For 2 × 2 matrices, it’s quite easy:
det [[a − t, b], [c, d − t]] = (a − t)(d − t) − bc = t² − (a + d)t + (ad − bc) = t² − (tr A)t + det A.
(Recall that the trace of a matrix is the sum of its diagonal entries. The trace of A is denoted tr A.)
For 3 × 3 matrices, similarly,
det [[a11 − t, a12, a13], [a21, a22 − t, a23], [a31, a32, a33 − t]] = −t³ + (tr A)t² − (C11 + C22 + C33)t + det A,
where Cii is the iith cofactor, the determinant of the 2 × 2 submatrix formed by deleting the ith
row and column from A.
In the long run, these formulas notwithstanding, it’s sometimes best to calculate the character-
istic polynomial of 3 × 3 matrices by expansion in cofactors. If one is both attentive and fortunate,
this may save the trouble of factoring the polynomial.
2.2. Diagonalizability. Judging by the foregoing examples, it seems to be the case that
when an n × n matrix (or linear transformation) has n distinct eigenvalues, the corresponding
eigenvectors form a linearly independent set and will therefore give a “diagonalizing basis.” Let’s
begin by proving a slightly stronger statement.
Proof. Let m be the largest number between 1 and k (inclusive) so that {v1 , . . . , vm } is linearly
independent. We want to see that m = k. By way of contradiction, suppose m < k. Then we know
that {v1 , . . . , vm } is linearly independent and {v1 , . . . , vm , vm+1 } is linearly dependent. It follows
from Proposition 3.2 of Chapter 4 that vm+1 = c1 v1 + · · · + cm vm for some scalars c1 , . . . , cm . Then
(using repeatedly the fact that T(vi) = λivi) we find
0 = T(vm+1) − λm+1vm+1 = Σ_{i=1}^m ci(λi − λm+1)vi.
Since λi − λm+1 ≠ 0 for i = 1, . . . , m, and since {v1, . . . , vm} is linearly independent, the only
possibility is that c1 = · · · = cm = 0, contradicting the fact that vm+1 6= 0 (by the very definition
of eigenvector). Thus, it cannot happen that m < k, and the proof is complete.
We now arrive at our first result that gives a sufficient condition for a linear transformation to
be diagonalizable.
Proof. The set of the n corresponding eigenvectors will be linearly independent and will hence
give a basis for V . The matrix for T with respect to a basis of eigenvectors is always diagonal.
Remark. Of course, there are many diagonalizable (indeed, diagonal) matrices with repeated
eigenvalues. Certainly the identity matrix and the matrix
[[2, 0, 0], [0, 3, 0], [0, 0, 2]]
are diagonal, and yet they fail to have distinct eigenvalues.
We spend the rest of this section discussing the two ways the hypotheses of Corollary 2.7 can
fail: the characteristic polynomial may have complex roots or it may have repeated roots.
The reader may well recall from Chapter 1 that multiplying by A gives a rotation of the plane
through an angle of π/4. Now, what are the eigenvalues of A? The characteristic polynomial is
p(t) = t² − (tr A)t + det A = t² − √2 t + 1,
We have seen that when the characteristic polynomial has distinct (real) roots, we get a 1-
dimensional eigenspace for each. What happens if the characteristic polynomial has some repeated
roots?
have the characteristic polynomial p(t) = (t − 2)2 (t − 3)2 (why?). For A, there are two linearly
independent eigenvectors with eigenvalue 2 but only one linearly independent eigenvector with
eigenvalue 3. For B, there are two linearly independent eigenvectors with eigenvalue 3 but only one
linearly independent eigenvector with eigenvalue 2. As a result, neither can be diagonalized. ▽
Example 9. For the matrices in Example 8, both the eigenvalues 2 and 3 have algebraic
multiplicity 2. For matrix A, the eigenvalue 2 has geometric multiplicity 2 and the eigenvalue 3
has geometric multiplicity 1; for matrix B, the eigenvalue 2 has geometric multiplicity 1 and the
eigenvalue 3 has geometric multiplicity 2. ▽
From the examples we’ve seen, it seems quite plausible that the geometric multiplicity of an
eigenvalue can be no larger than its algebraic multiplicity, but we stop to give a proof.
We are now able to give a necessary and sufficient criterion for a linear transformation to be
diagonalizable. Based on our experience with examples, it should come as no great surprise.
Proof. Let V be an n-dimensional vector space. Then the characteristic polynomial of T has
degree n, and we have
p(t) = ±(t − λ1 )m1 (t − λ2 )m2 · · · (t − λk )mk ;
therefore,
n = Σ_{i=1}^k m_i.
Now, suppose T is diagonalizable. Then there is a basis B consisting of eigenvectors. At most
di of these basis vectors lie in E(λi), and so n ≤ Σ_{i=1}^k di. On the other hand, by Proposition 2.8,
di ≤ mi for i = 1, . . . , k. Putting these together, we have
n ≤ Σ_{i=1}^k di ≤ Σ_{i=1}^k mi = n.
Thus, we must have equality at every stage here, which implies that di = mi for all i = 1, . . . , k.
Conversely, suppose di = mi for i = 1, . . . , k. If we choose a basis Bi for each eigenspace E(λi )
and let B = B1 ∪ · · · ∪ Bk , then we assert that B is a basis for V . There are n vectors in B, so
we need only check that the set of vectors is linearly independent. This is a generalization of the
argument of Theorem 2.6, and we leave it to Exercise 25.
both have characteristic polynomial p(t) = −(t − 1)2 (t − 2). That is, the eigenvalue 1 has algebraic
multiplicity 2 and the eigenvalue 2 has algebraic multiplicity 1. To decide whether the matrices are
diagonalizable, we need to know the geometric multiplicity of the eigenvalue 1. Well,
A − I = [[−2, 4, 2], [−1, 2, 1], [−1, 2, 1]] ⇝ [[1, −2, −1], [0, 0, 0], [0, 0, 0]]
has rank 1 and so dim EA (1) = 2. We infer from Theorem 2.9 that A is diagonalizable. Indeed, as
the reader can check, a diagonalizing basis is
{(1, 0, 1), (1, 1, −1), (2, 1, 1)}.
has rank 2 and so dim EB (1) = 1. Since the eigenvalue 1 has geometric multiplicity 1, it follows
from Theorem 2.9 that B is not diagonalizable. ▽
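These rank computations are easy to automate (a sketch, assuming numpy; the matrix A is reassembled from the example above, A = I + (A − I)):

```python
import numpy as np

A = np.array([[-1., 4., 2.],
              [-1., 3., 1.],
              [-1., 2., 2.]])

for lam, alg_mult in [(1., 2), (2., 1)]:
    rank = np.linalg.matrix_rank(A - lam * np.eye(3))
    geom_mult = 3 - rank                  # dim N(A - lam*I)
    print(lam, geom_mult, geom_mult == alg_mult)
# both geometric multiplicities match the algebraic ones,
# so Theorem 2.9 says A is diagonalizable
```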
In the next section we will see the power of diagonalizing matrices in several applications.
EXERCISES 9.2
1. Find the eigenvalues and eigenvectors of the following matrices.
*a. [[1, 5], [2, 4]]
b. [[0, 1], [1, 0]]
c. [[10, −6], [18, −11]]
d. [[1, 3], [3, 1]]
*e. [[1, 1], [−1, 3]]
*f. [[−1, 1, 2], [1, 2, 1], [2, 1, −1]]
g. [[1, 0, 0], [−2, 1, 2], [−2, 0, 3]]
h. [[1, −1, 2], [0, 1, 0], [0, −2, 3]]
*i. [[2, 0, 1], [0, 1, 2], [0, 0, 1]]
j. [[1, −2, 2], [−1, 0, −1], [0, 2, −1]]
k. [[3, 1, 0], [0, 1, 2], [0, 1, 2]]
*l. [[1, −6, 4], [−2, −4, 5], [−2, −6, 7]]
m. [[3, 2, −2], [2, 2, −1], [2, 1, 0]]
n. [[1, 0, 0, 1], [0, 1, 1, 1], [0, 0, 2, 0], [0, 0, 0, 2]]
2. Prove that 0 is an eigenvalue of A if and only if A is singular.
3. Prove that the eigenvalues of an upper (or lower) triangular matrix are its diagonal entries.
5. Suppose A is nonsingular. Prove that the eigenvalues of A−1 are the reciprocals of the eigen-
values of A.
7. Prove or give a counterexample: If A and B have the same characteristic polynomial, then
there is an invertible matrix P so that B = P −1 AP .
♯ 8. Suppose A is a square matrix. Suppose x is an eigenvector of A with corresponding eigenvalue
λ and y is an eigenvector of Aᵀ with corresponding eigenvalue µ. Prove that if λ ≠ µ, then
x · y = 0.
10. Prove that the product of the roots of the characteristic polynomial of A is equal to det A.
(Hint: If λ1 , . . . , λn are the roots, show that p(t) = ±(t − λ1 )(t − λ2 ) . . . (t − λn ).)
*12. Decide whether each of the matrices in Exercise 1 is diagonalizable. Give your reasoning.
14. Suppose A is a 2 × 2 matrix whose eigenvalues are integers. If det A = 120, explain why A
must be diagonalizable.
15. Is the linear transformation T : Mn×n → Mn×n defined by T (X) = X T diagonalizable? (Hint:
Consider the equation X T = λX. What are the corresponding eigenspaces? Exercise 1.4.36
may also be relevant.)
*16. Let A = [[1, 1], [−1, 3]]. We saw in Example 7 that A has repeated eigenvalue 2 and v1 = (1, 1)
spans E(2).
a. Calculate (A − 2I)2 .
b. Solve (A − 2I)v2 = v1 for v2 . Explain how we know a priori that this equation has a
solution.
c. Give the matrix for A with respect to the basis {v1 , v2 }.
This is the closest to diagonal one can get and is called the Jordan canonical form of A.
21. Consider the linear map T : R³ → R³ whose standard matrix is the matrix C given on p. 27.
Show that T is indeed a rotation. Find the axis and angle of rotation.
22. Let A be an n × n matrix all of whose eigenvalues are real numbers. Prove that there is a basis
for Rn with respect to which the matrix for A becomes upper triangular. (Hint: Consider a
basis {v1 , v2′ , . . . , vn′ }, where v1 is an eigenvector.)
♯ 23. Suppose T : V → V is a linear transformation. Suppose T is diagonalizable (i.e., there is a basis
for V consisting of eigenvectors of T ). Suppose, moreover, that there is a subspace W ⊂ V with
the property that T (W ) ⊂ W . Prove that there is a basis for W consisting of eigenvectors of
T . (Hint: Using Exercise 4.3.18, concoct a basis for V by starting with a basis for W . Consider
the matrix for T with respect to this basis; what is its characteristic polynomial?)
Then we have
A = PΛP⁻¹, where Λ = [[1, 0], [0, 1.1]],
and so
Aᵏ = PΛᵏP⁻¹ = [[2, 1], [3, 2]] [[1, 0], [0, (1.1)ᵏ]] [[2, −1], [−3, 2]].
In particular, if x0 = (c0, m0) is the original population vector, we have
xk = (ck, mk) = [[2, 1], [3, 2]] [[1, 0], [0, (1.1)ᵏ]] [[2, −1], [−3, 2]] (c0, m0)
             = [[2, 1], [3, 2]] [[1, 0], [0, (1.1)ᵏ]] (2c0 − m0, −3c0 + 2m0)
             = [[2, 1], [3, 2]] (2c0 − m0, (1.1)ᵏ(−3c0 + 2m0))
             = (2c0 − m0) (2, 3) + (−3c0 + 2m0)(1.1)ᵏ (1, 2).
We can now see what happens as time passes. If 3c0 = 2m0, the second term drops out and the
population vector stays constant. If 3c0 < 2m0, the first term is still constant, and the second
term increases exponentially; but note that the contribution to the mouse population is double the
contribution to the cat population. And if 3c0 > 2m0, we see that the population vector decreases
exponentially, the mouse population being the first to disappear (why?). ▽
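A short simulation confirms the three regimes (a sketch, assuming numpy; the matrix A is reassembled from the factors P and Λ above, and the starting populations are made-up values):

```python
import numpy as np

P = np.array([[2., 1.], [3., 2.]])
Lam = np.diag([1., 1.1])
A = P @ Lam @ np.linalg.inv(P)         # the transition matrix

for c0, m0 in [(2., 3.), (1., 3.), (3., 3.)]:  # 3c0 = 2m0, 3c0 < 2m0, 3c0 > 2m0
    x = np.array([c0, m0])
    for _ in range(50):
        x = A @ x
    print((c0, m0), x.round(2))        # steady, growing, collapsing
```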
The story for a general diagonalizable matrix A is the same. The column vectors of P are the
eigenvectors v1, . . . , vn, the diagonal entries of Λᵏ are λ1ᵏ, . . . , λnᵏ, and so, letting
P⁻¹x0 = (c1, c2, . . . , cn),
we have
(∗)   Aᵏx0 = PΛᵏ(P⁻¹x0) = [ v1 | v2 | ··· | vn ] (λ1ᵏc1, λ2ᵏc2, . . . , λnᵏcn) = c1λ1ᵏv1 + ··· + cnλnᵏvn.
This formula will have all the information we need, and we will see physical interpretations of
analogous formulas when we discuss systems of differential equations shortly.
is obtained by letting each number (starting with the third) be the sum of the preceding two: if we
let ak denote the kth number in the sequence, then
ak+1 = ak + ak−1 , a0 = a1 = 1.
Thus, if we define xk = (ak, ak+1), k ≥ 0, then we can encode the pattern of the sequence in the
matrix equation
(ak, ak+1) = [[0, 1], [1, 1]] (ak−1, ak),   k ≥ 1.
xk = Aᵏx0 = c1λ1ᵏv1 + c2λ2ᵏv2 = (λ1/√5) λ1ᵏ (1, λ1) − (λ2/√5) λ2ᵏ (1, λ2).
In particular, reading off the first coordinate of this vector, we find that the kth number in the
Fibonacci sequence is
ak = (1/√5)(λ1^{k+1} − λ2^{k+1}) = (1/√5) [ ((1+√5)/2)^{k+1} − ((1−√5)/2)^{k+1} ].
It’s far from obvious (at least to the author) that each such number is an integer! We would be
remiss if we didn’t point out one of the classic facts about the Fibonacci sequence: if we take the
ratio of successive terms, we get
ak+1/ak = (λ1^{k+2} − λ2^{k+2})/(λ1^{k+1} − λ2^{k+1}) → λ1 = (1 + √5)/2 as k → ∞,
since |λ2| < 1; the limit is the famous golden ratio.
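Checking the closed form against the recurrence (a sketch, assuming numpy; exact integer arithmetic via dtype=object keeps the matrix iteration honest):

```python
import numpy as np

A = np.array([[0, 1], [1, 1]], dtype=object)  # exact integer arithmetic
x = np.array([1, 1], dtype=object)            # (a0, a1)
for _ in range(10):
    x = A @ x
print(x[0])                                   # a10 = 89

lam1, lam2 = (1 + 5**0.5) / 2, (1 - 5**0.5) / 2
k = 10
print((lam1**(k + 1) - lam2**(k + 1)) / 5**0.5)  # ~ 89.0, the closed form
# since |lam2^(k+1)/sqrt(5)| < 1/2, rounding the first term alone works too:
print(round(lam1**(k + 1) / 5**0.5))             # 89
```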
Example 3. Suppose n = 1, so that A = [a] for some real number a. Then we have simply
the ordinary differential equation
ẋ(t) = ax(t), x(0) = x0 .
The trick of “separating variables” that the reader most likely learned in her integral calculus course
leads to the solution x(t) = x0 eat . As we can easily check, ẋ(t) = ax(t), so we have in fact found
a solution. Do we know there can be no more? Suppose y(t) were any solution of the original
problem. Then the function z(t) = y(t)e−at satisfies the equation
ż(t) = ẏ(t)e^{−at} + y(t)(−ae^{−at}) = (ay(t))e^{−at} + y(t)(−ae^{−at}) = 0,
and so z(t) must be a constant function. Since z(0) = y(0) = x0 , we see that y(t) = x0 eat . The
original differential equation (with its initial condition) has a unique solution. ▽
Since x1 (t) and x2 (t) appear completely independently in these equations, we infer from Example
3 that the unique solution of this system of equations will be x(t) = E(t)x0,
where E(t) is the diagonal 2 × 2 matrix with diagonal entries e^{at} and e^{bt}. This result is easily generalized to
the case of a diagonal n × n matrix. ▽
Recall that for any real number x, we have the Taylor series expansion
(†)   eˣ = Σ_{k=0}^∞ xᵏ/k! = 1 + x + (1/2)x² + (1/6)x³ + ··· + (1/k!)xᵏ + ··· .
That the series converges is immediate from Proposition 1.1 of Chapter 6. In general, however,
trying to evaluate this series directly is extremely difficult, because the coefficients of Ak are not
easily expressed in terms of the coefficients of A. However, when A is a diagonalizable matrix, it
is easy to compute eA : there is an invertible matrix P so that Λ = P −1 AP is diagonal. Thus,
A = P ΛP −1 and Ak = P Λk P −1 for all k ∈ N, and so
e^A = Σ_{k=0}^∞ Aᵏ/k! = Σ_{k=0}^∞ PΛᵏP⁻¹/k! = P ( Σ_{k=0}^∞ Λᵏ/k! ) P⁻¹ = P e^Λ P⁻¹.
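In code (a sketch, assuming numpy; scipy.linalg.expm, if available, provides an independent check):

```python
import numpy as np

def expm_diagonalizable(A):
    """e^A = P e^Lambda P^{-1}, valid when A is diagonalizable."""
    lam, P = np.linalg.eig(A)
    return (P @ np.diag(np.exp(lam)) @ np.linalg.inv(P)).real

A = np.array([[2., 0.], [3., -1.]])   # the matrix of Example 5 below
print(expm_diagonalizable(A))

from scipy.linalg import expm        # optional cross-check
print(expm(A))                       # agrees
```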
" #
2 0
Example 5. Let A = . Then A = P ΛP −1 , where
3 −1
" # " #
2 1 0
Λ= and P = .
−1 1 1
Then we have
" # " #
e2t e2t 0
etΛ = and etA = P etΛ P −1 = . ▽
e−t 2t
e −e −t e−t
The result of Example 4 generalizes to the n × n case. Indeed, whenever we can solve a problem
for diagonal matrices, we can solve it for diagonalizable matrices by making the appropriate change-
of-basis. So we should not be surprised by the following result.
Proposition 3.1. Let A be a diagonalizable n × n matrix. Then the unique solution of the initial
value problem (‡) ẋ(t) = Ax(t), x(0) = x0, is x(t) = e^{tA}x0.
Proof. As above, since A is diagonalizable, there are an invertible matrix P and a diagonal
matrix Λ so that A = PΛP⁻¹ and e^{tA} = P e^{tΛ} P⁻¹. Since the derivative of the diagonal matrix
e^{tΛ} = diag(e^{tλ1}, e^{tλ2}, . . . , e^{tλn})
is obviously
diag(λ1e^{tλ1}, λ2e^{tλ2}, . . . , λne^{tλn}) = Λe^{tΛ},
then we have
(e^{tA})• = (P e^{tΛ} P⁻¹)• = P (e^{tΛ})• P⁻¹ = P Λe^{tΛ} P⁻¹ = (PΛP⁻¹)(P e^{tΛ} P⁻¹) = Ae^{tA},
as required.
Now suppose that y(t) is a solution of the equation (‡), and consider the vector function
z(t) = e^{−tA}y(t). Then by the product rule, we have
ż(t) = −Ae^{−tA}y(t) + e^{−tA}ẏ(t) = −Ae^{−tA}y(t) + e^{−tA}Ay(t) = 0,
as Ae^{−tA} = e^{−tA}A. This implies that z(t) must be a constant vector, and so
y(t) = e^{tA}z(0) = e^{tA}x0, as claimed.
Remark . A more sophisticated interpretation of this result is the following: If we view the
system (‡) of ODE’s in a coordinate system derived from the eigenvectors of the matrix A, then
the system is uncoupled.
Example 6. Continuing Example 5, we see that the general solution of the system ẋ(t) = Ax(t)
has the form
x(t) = (x1(t), x2(t)) = e^{tA} (c1, c2)   for appropriate constants c1 and c2
     = (c1e^{2t}, c1e^{2t} − c1e^{−t} + c2e^{−t}) = c1e^{2t} (1, 1) + (c2 − c1)e^{−t} (0, 1),
and obtain the familiar linear combination of the columns of P (which are the eigenvectors of A). If,
in particular, we wish to study the long-term behavior of the solution, we observe that lim_{t→∞} e^{−t} = 0
and lim_{t→∞} e^{2t} = ∞, so that x(t) behaves like c1e^{2t}(1, 1) as t → ∞. In general, this type of analysis
of diagonalizable systems is called normal mode analysis, and the vector functions
e^{2t} (1, 1) and e^{−t} (0, 1)
corresponding to the eigenvectors are called the normal modes of the system. ▽
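One can compare the normal-mode formula with a direct numerical integration (a sketch, assuming numpy and scipy; the constants c1, c2 are arbitrary illustrative values):

```python
import numpy as np
from scipy.integrate import solve_ivp

A = np.array([[2., 0.], [3., -1.]])
c1, c2 = 1.0, 2.0
x0 = c1 * np.array([1., 1.]) + c2 * np.array([0., 1.])

sol = solve_ivp(lambda t, x: A @ x, (0., 1.), x0, rtol=1e-10, atol=1e-12)
t = sol.t[-1]
closed_form = c1 * np.exp(2 * t) * np.array([1., 1.]) \
            + c2 * np.exp(-t) * np.array([0., 1.])
print(sol.y[:, -1])
print(closed_form)    # agrees to within the integrator's tolerance
```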
To emphasize the analogy with the solution of difference equations earlier and the formula (∗)
on p. 419, we rephrase Proposition 3.1 so as to highlight the normal modes.
In this form: the solution of ẋ(t) = Ax(t), x(0) = x0, is
x(t) = c1e^{λ1t}v1 + c2e^{λ2t}v2 + ··· + cne^{λnt}vn,
where
P⁻¹x0 = (c1, c2, . . . , cn).
Note that the general solution is a linear combination of the normal modes eλ1 t v1 , . . . , eλn t vn .
Even when A is not diagonalizable, we may differentiate the exponential series term-by-term²
to obtain
(e^{tA})• = ( I + tA + (t²/2!)A² + (t³/3!)A³ + ··· + (tᵏ/k!)Aᵏ + (t^{k+1}/(k+1)!)A^{k+1} + ··· )•
         = A + tA² + (t²/2!)A³ + ··· + (t^{k−1}/(k−1)!)Aᵏ + (tᵏ/k!)A^{k+1} + ···
         = A ( I + tA + (t²/2!)A² + ··· + (t^{k−1}/(k−1)!)A^{k−1} + (tᵏ/k!)Aᵏ + ··· ) = Ae^{tA}.
Thus, we have
Theorem 3.3. Suppose A is an n × n matrix. Then the unique solution of the initial value
problem
ẋ(t) = Ax(t), x(0) = x0
is x(t) = etA x0 .
Since the power series expansions (Taylor series) for sin and cos are, indeed,
sin t = t − (1/3!)t³ + (1/5!)t⁵ − ··· + ((−1)ᵏ/(2k+1)!)t^{2k+1} + ···,
cos t = 1 − (1/2!)t² + (1/4!)t⁴ − ··· + ((−1)ᵏ/(2k)!)t^{2k} + ···,
the formulas agree. (Another approach to computing etA is to diagonalize A over the complex
numbers, but we don’t stop to do this here.3 ) ▽
In elementary differential equations courses, one is taught to look for a solution of the form
in this case,
ẋ1(t) = (2a + b)e^{2t} + 2bte^{2t} = 2x1(t) + be^{2t},
³But we must remind you of the famous formula, usually attributed to Euler: e^{it} = cos t + i sin t.
and so taking b = c gives the desired solution of our equation. That is, the solution of the system
is the vector function
x(t) = (ae^{2t} + cte^{2t}, ce^{2t}) = [[e^{2t}, te^{2t}], [0, e^{2t}]] (a, c).
The explanation of the trick is quite simple. Let’s calculate the matrix exponential etA by
writing " # " # " #
2 0 0 1 0 1
A= + = 2I + B, where B = .
0 2 0 0 0 0
The powers of A are easy to compute because B 2 = 0: by the binomial theorem,
(2I + B)ᵏ = 2ᵏI + k·2^{k−1}B,
and so
e^{tA} = Σ_{k=0}^∞ (tᵏ/k!)Aᵏ = Σ_{k=0}^∞ (tᵏ/k!)(2ᵏI + k·2^{k−1}B)
      = ( Σ_{k=0}^∞ (2t)ᵏ/k! ) I + ( Σ_{k=0}^∞ k·2^{k−1}(tᵏ/k!) ) B
      = e^{2t} I + t ( Σ_{k=1}^∞ (2t)^{k−1}/(k−1)! ) B = e^{2t} I + t ( Σ_{k=0}^∞ (2t)ᵏ/k! ) B
      = e^{2t} I + te^{2t} B = [[e^{2t}, te^{2t}], [0, e^{2t}]].
A similar phenomenon occurs more generally (see Exercise 14). ▽
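The computation is easy to confirm numerically (a sketch, assuming numpy and scipy; scipy.linalg.expm is used as the reference):

```python
import numpy as np
from scipy.linalg import expm

t = 0.7
A = np.array([[2., 1.], [0., 2.]])
formula = np.exp(2 * t) * np.array([[1., t], [0., 1.]])  # e^{2t}(I + tB)
print(formula)
print(expm(t * A))                                       # agrees
```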
Let’s consider the general nth order linear ODE with constant coefficients:
(⋆) y (n) (t) + an−1 y (n−1) (t) + · · · + a2 ÿ(t) + a1 ẏ(t) + a0 y(t) = 0.
Here a0 , a1 , . . . , an−1 are scalars, and y(t) is assumed to be Cn ; y (k) denotes its kth derivative. We
can use the power of Theorem 3.3 to derive the following general result.
Theorem 3.4. Let n ∈ N. The set of solutions of the nth order ODE (⋆) is an n-dimensional
subspace of C∞ (R), the vector space4 of smooth functions. In particular, the initial value problem
y (n) (t) + an−1 y (n−1) (t) + · · · + a2 ÿ(t) + a1 ẏ(t) + a0 y(t) = 0
y(0) = c0 , ẏ(0) = c1 , ÿ(0) = c2 , ..., y (n−1) (0) = cn−1
has a unique solution.
Proof. The trick is to concoct a way to apply Theorem 3.3. We introduce the vector function
x(t) defined by
x(t) = (y(t), ẏ(t), ÿ(t), . . . , y⁽ⁿ⁻¹⁾(t)),
⁴See Section 3.1 of Chapter 4.
where vj (t) are the columns of etA . In particular, if we let q1 (t), . . . , qn (t) denote the first entries
of the vector functions v1 (t), . . . , vn (t), respectively, we see that
y(t) = c0 q1 (t) + c1 q2 (t) + · · · + cn−1 qn (t);
that is, the functions q1 , . . . , qn span the vector space of solutions of the differential equation (⋆).
Note that these functions are C∞ since the entries of etA are. Last, we claim that these functions
are linearly independent. For suppose that for some scalars c0 , c1 , . . . , cn−1 we have
y(t) = c0q1(t) + c1q2(t) + ··· + cn−1qn(t) = 0.
Then, differentiating, we have the same linear relation among the kth derivatives of q1 , . . . , qn ,
k = 1, . . . , n − 1, and so we have
0 = (y(t), ẏ(t), ÿ(t), . . . , y⁽ⁿ⁻¹⁾(t)) = e^{tA} (c0, c1, c2, . . . , cn−1).
Since etA is an invertible matrix (see Exercise 17), we infer that c0 = c1 = · · · = cn−1 = 0, and so
{q1 , . . . , qn } is linearly independent.
Figure 3.1 (two masses m1, m2 coupled by springs with constants k1, k2, k3)
above, and Newton’s second law of motion (“force = mass × acceleration”) give us the following
system of equations:
3.3. Flows and the Divergence Theorem. Let U ⊂ Rn be an open subset. Let F : U → Rn
be a vector field on U . So far we have dealt with vector fields of the form F(x) = Ax, where A is
an n × n matrix. But, more generally, we can try to solve the system of differential equations
ẋ(t) = F(x(t)), x(0) = x0.
We will write the solution of this initial value problem as φt (x0 ), indicating its functional depen-
dence on both time and the initial value. The function φ is called the flow of the vector field F.
Note that φ0(x) = x for all x ∈ U.
Examples 10. (a) The flow of the vector field F(x) = x on R is φt(x) = eᵗx.
(b) The flow of the vector field F(x, y) = (−y, x) on R² is
φt(x, y) = [[cos t, −sin t], [sin t, cos t]] (x, y);
i.e., the flow lines are circles centered at the origin.
(c) Let A = [[2, 1], [5, −2]]. The flow of the vector field F(x) = Ax on R² is
φt(x) = e^{tA}x = P e^{tΛ}(P⁻¹x) = (1/6) [ (5x1 + x2)e^{3t} (1, 1) + (−x1 + x2)e^{−3t} (−1, 5) ],
where
Λ = [[3, 0], [0, −3]] and P = [[1, −1], [1, 5]].
(d) The flow of the general linear differential equation ẋ(t) = Ax(t) is given by φt (x) = etA x.
Finding an explicit formula for the flow of a nonlinear differential equation may be somewhat
difficult. ▽
It is proved in more advanced courses that if F is a smooth vector field on an open set U ⊂ Rⁿ,
then for any x ∈ U, there are a neighborhood V of x and ε > 0 so that for any y ∈ V the flow
starting at y, φt(y), is defined for all |t| < ε. Moreover, the function φ : V × (−ε, ε) → Rⁿ,
φ(y, t) = φt(y), is smooth. We now want to give another interpretation of the divergence of the vector
field F, first discussed in Section 6 of Chapter 8. It is a natural generalization of the elementary
observation that the derivative of the area of a circle with respect to its radius is the circumference.
First we need to extend the definition of divergence to n dimensions: If F = (F1, . . . , Fn) is a smooth
vector field on Rⁿ, we set
div F = ∂F1/∂x1 + ∂F2/∂x2 + ··· + ∂Fn/∂xn.
Proposition 3.5. Let F be a smooth vector field on U ⊂ Rⁿ, let φt denote the flow of F, and
let Ω ⊂ U be a compact region with piecewise smooth boundary. Let V(t) = vol(φt(Ω)). Then
V̇(0) = ∫_Ω div F dV.
Remark. Using (the obvious generalization of) the Divergence Theorem, Theorem 6.2 of
Chapter 8, we have the intuitively appealing result that V̇(0) = ∫_{∂Ω} F · n dS. That is, what causes
net increase in the volume of the region is flow across its boundary.
Examples 11. (a) In Figure 3.2, we see the flow of the unit square under the vector field
F(x, y) = (2x + y, 5x − 2y). Note that area is preserved under the flow, as div F = 0.

Figure 3.2
(b) In Figure 3.3 (with thanks to John Polking’s MATLAB software pplane5), we see the flow
of certain regions Ω. In (a), the region expands (as div F > 0), whereas in (b) the region
maintains its area (as div F = 0). ▽
Figure 3.3 (phase portraits of Ω and φt(Ω): (a) x′ = x + 2y, y′ = −2x + y, where the region expands; (b) x′ = −x + 2y, y′ = 5x + y, where the region maintains its area)
Proof. We have
V(t) = ∫_Ω φt*(dx1 ∧ ··· ∧ dxn) = ∫_Ω d(φt)1 ∧ ··· ∧ d(φt)n.
By Exercise 7.2.20, we have
V̇(0) = ∫_Ω (∂/∂t)|_{t=0} ( d(φt)1 ∧ ··· ∧ d(φt)n ).
Now the fact that mixed partials are equal tells us that ∂²φt/∂t∂xi = ∂²φt/∂xi∂t, and so
(∂/∂t)(dφt) = d(∂φt/∂t) = d(φ̇t). Moreover, φ̇0(x) = F(x) (since φ̇t(x) = F(φt(x))), and
φ0(x) = x, so the latter integral can be rewritten
V̇(0) = ∫_Ω ( dF1 ∧ dx2 ∧ ··· ∧ dxn + dx1 ∧ dF2 ∧ dx3 ∧ ··· ∧ dxn + ··· + dx1 ∧ ··· ∧ dxn−1 ∧ dFn )
     = ∫_Ω div F dx1 ∧ ··· ∧ dxn,
as required.
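For a linear field F(x) = Ax the proposition can be checked in closed form (a sketch, assuming numpy and scipy): here φt = e^{tA}, so V(t) = det(e^{tA}) · vol(Ω) = e^{t·tr A} · vol(Ω), and V̇(0) = (tr A) · vol(Ω) = ∫_Ω div F dV, since div(Ax) = tr A.

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[1., 2.], [-2., 1.]])   # the field of Figure 3.3(a); tr A = 2
vol_omega = 1.0                        # take Omega = the unit square

V = lambda t: np.linalg.det(expm(t * A)) * vol_omega
h = 1e-6
print((V(h) - V(-h)) / (2 * h))        # ~ 2.0, the central-difference V'(0)
print(np.trace(A) * vol_omega)         # = integral of div F over Omega
```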
EXERCISES 9.3
1. Let A = [[2, 5], [1, −2]]. Calculate Aᵏ for all k ≥ 1.
*2. Suppose each of two tubs contains two bottles of beer; two are Budweiser and two are Beck’s.
Each minute, Fraternity Freddy picks a bottle of beer at random from each tub and replaces it
in the other tub. After a long time, what portion of the time will there be exactly one bottle
of Beck’s in the first tub? at least one bottle of Beck’s? (Hint: Let xk be the vector whose
entries are, respectively, the probabilities that there are 2 Beck’s, 1 of each, or 2 Buds in the
first tub.)
*3. Gambling Gus has $200 and plays a game where he must continue playing until he has either
lost all his money or doubled it. In each game, he has a 2/5 chance of winning $100 and a
3/5 chance of losing $100. What is the probability that he eventually loses all his money?
(Warning: Calculator or computer suggested.)
*4. If a0 = 2, a1 = 3, and ak+1 = 3ak − 2ak−1 , for all k ≥ 1, use methods of linear algebra to
determine the formula for ak .
5. If a0 = a1 = 1 and ak+1 = ak + 6ak−1 for all k ≥ 1, use methods of linear algebra to determine
the formula for ak .
6. Suppose a0 = 0, a1 = 1, and ak+1 = 3ak + 4ak−1 for all k ≥ 1. Use methods of linear algebra
to find an explicit formula for ak .
7. If a0 = 0, a1 = 1, and ak+1 = 4ak − 4ak−1 for all k ≥ 1, use methods of linear algebra to
determine the formula for ak . (Hint: The matrix will not be diagonalizable, but you can get
close if you stare at Exercise 9.2.16.)
*8. If a0 = 0, a1 = a2 = 1, and ak+1 = 2ak + ak−1 − 2ak−2 for k ≥ 2, use methods of linear algebra
to determine the formula for ak .
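The linear-algebra method these exercises call for passes through a companion matrix: writing xk = (ak+1, ak), the recurrence becomes xk = Cxk−1, so xk = C^k x0, and diagonalizing C yields a closed formula. A minimal numerical sketch for the recurrence ak+1 = ak + ak−1 (Fibonacci, deliberately not one of the exercises above):

    # Sketch of the companion-matrix method for a_{k+1} = a_k + a_{k-1}.
    # With x_k = (a_{k+1}, a_k) we have x_k = C x_{k-1}, so x_k = C^k x_0;
    # diagonalizing C gives a closed formula for a_k.
    import numpy as np

    C = np.array([[1.0, 1.0],
                  [1.0, 0.0]])        # companion matrix
    evals, evecs = np.linalg.eig(C)   # eigenvalues (1 +- sqrt(5))/2
    x0 = np.array([1.0, 0.0])         # a_1 = 1, a_0 = 0
    c = np.linalg.solve(evecs, x0)    # coordinates of x_0 in the eigenbasis
    k = 10
    ak = (evecs @ (evals**k * c))[1]  # second entry of C^k x_0 is a_k
    print(round(ak))                  # 55 = a_10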
9. Consider the cat/mouse population problem studied in Example 1. Solve the following versions,
including an investigation of the dependence on the original populations.
10. Check that if A is an n × n matrix and the n × n differentiable matrix function E(t) satisfies Ė(t) = AE(t) and E(0) = I, then E(t) = e^{tA} for all t ∈ R.
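A quick numerical sanity check of the property in Exercise 10 (a sketch; the matrix A is an arbitrary choice):

    # Sketch: finite-difference check that E(t) = expm(tA) satisfies
    # E'(t) = A E(t) and E(0) = I, for an arbitrarily chosen A.
    import numpy as np
    from scipy.linalg import expm

    A = np.array([[0.0, 1.0],
                  [-2.0, 3.0]])
    t, h = 0.7, 1e-6
    E = lambda s: expm(s * A)
    lhs = (E(t + h) - E(t - h)) / (2 * h)          # numerical E'(t)
    print(np.allclose(lhs, A @ E(t), atol=1e-5))   # True
    print(np.allclose(E(0.0), np.eye(2)))          # True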
*14. Let
J = [ 2 1 0
      0 2 1
      0 0 2 ].
Calculate e^{tJ}.
*15. By mimicking the proof of Theorem 3.4, convert the following second-order differential equations
into first-order systems and use matrix exponentials to solve:
a. ÿ(t) − ẏ(t) − 2y(t) = 0, y(0) = −1, ẏ(0) = 4
b. ÿ(t) − 2ẏ(t) + y(t) = 0, y(0) = 1, ẏ(0) = 2
18. Consider the mapping exp : Mn×n → Mn×n given by exp(A) = e^A. By Exercise 17, e^A is always invertible.
a. Use the Inverse Function Theorem to show that for every matrix B sufficiently close to I, there is a unique A sufficiently close to O so that e^A = B.
b. Can the matrices [ −1 0; 0 −1 ] and [ −2 0; 0 1 ] be written in the form e^A for some A?
19. Use Proposition 3.5 to deduce that the derivative with respect to r of the volume of a ball of
radius r (in Rn ) is the volume (surface area) of the sphere of radius r.
20. It can be proved using (a generalization of) the Contraction Mapping Principle, Theorem 1.2 of Chapter 6, that when F is a smooth vector field, given a, there are δ, ε > 0 so that the differential equation ẋ(t) = F(x(t)), x(0) = x0, has a unique solution, defined for all |t| < ε, for every x0 ∈ B(a, δ).
a. Assuming this result, prove that whenever |s|, |t|, and |s + t| < ε, we have φs+t = φs ◦ φt .
(Hint: Fix t = t0 and vary s.)
b. Deduce that φ−t = (φt )−1 .
c. By considering the example F(x) = √|x|, show that uniqueness may fail when the vector field isn't smooth. Indeed, show that the initial value problem ẋ(t) = √|x(t)|, x(0) = 0, has infinitely many solutions.
21. Generalizing Proposition 3.5 somewhat, prove that V̇(t) = ∫_{φt(Ω)} div F dV. (Hint: Use Exercise 20 and the Proposition as stated.)
22. a. Show that the space-derivative of the flow φt satisfies the first variation equation
∂/∂t ( Dφt(x) ) = DF(φt(x)) Dφt(x).
b. For fixed x, let J(t) = det(Dφt(x)). Using Exercise 7.5.23, show that
J̇(t) = div F(φt(x)) J(t).
Deduce that J(t) = exp( ∫_0^t div F(φs(x)) ds ).
We now turn to the study of a large class of diagonalizable matrices, the symmetric matrices.
Recall that a square matrix A is symmetric when A = Aᵀ. To begin our exploration, let's start with a general symmetric 2 × 2 matrix
A = [ a b
      b c ],
whose characteristic polynomial is p(t) = t² − (a + c)t + (ac − b²). By the quadratic formula, its eigenvalues are
λ = ( (a + c) ± √((a + c)² − 4(ac − b²)) ) / 2 = ( (a + c) ± √((a − c)² + 4b²) ) / 2.
The eigenvalues fail to be distinct only when b = 0 and a = c, that is, only when A is a scalar matrix, which is already diagonal. Thus, A is diagonalizable. Moreover, the
corresponding eigenvectors are
" # " #
b λ2 − c
v1 = and v2 = ;
λ1 − a b
note that
v1 · v2 = b(λ2 − c) + (λ1 − a)b = b(λ1 + λ2 − a − c) = 0,
and so the eigenvectors are orthogonal. Since there is an orthogonal basis for R2 consisting of
eigenvectors of A, we of course have an orthonormal basis for R2 consisting of eigenvectors of A.
That is, by an appropriate rotation of the usual basis, we obtain a diagonalizing basis for A.
[Figure 4.1: the orthogonal eigenvectors v1 and v2]
q1 = (1/√5)(2, 1),   q2 = (1/√5)(−1, 2).
See Figure 4.1. ▽
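Numerically, numpy's eigh routine produces exactly such an orthonormal diagonalizing basis for a symmetric matrix. In the sketch below, the matrix A = [1 2; 2 −2] is an assumption, chosen to be consistent with the eigenvalues 2 and −3 and the eigenvectors q1, q2 displayed above:

    # Sketch: numpy.linalg.eigh returns an orthonormal basis of eigenvectors
    # for a symmetric matrix.  Taking A = [[1,2],[2,-2]] is an assumption,
    # consistent with q1, q2 above.  Eigenvalues come back ascending: -3, 2.
    import numpy as np

    A = np.array([[1.0, 2.0],
                  [2.0, -2.0]])
    lam, Q = np.linalg.eigh(A)
    print(lam)                                     # [-3.  2.]
    print(np.allclose(Q.T @ Q, np.eye(2)))         # True: columns orthonormal
    print(np.allclose(Q.T @ A @ Q, np.diag(lam)))  # True: Q diagonalizes A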
From Proposition 4.5 of Chapter 1 we recall that for all x, y ∈ Rn and n × n matrices A we have
Ax · y = x · Aᵀy.
In particular, when A is symmetric,
Ax · y = x · Ay.
The Spectral Theorem. Let T : Rn → Rn be a symmetric linear map. Then there is an orthonormal basis for Rn consisting of eigenvectors of T.
Proof. We proceed by induction on n. The case n = 1 is automatic. Now assume that the
result is true for all symmetric linear maps T ′ : Rn−1 → Rn−1 . Given a symmetric linear map
T : Rn → Rn , we begin by proving that it has a real eigenvalue. We choose to use calculus to prove
this, but for a purely linear-algebraic proof, see Exercise 16. Consider the function
f : Rn → R,   f(x) = Ax · x = xᵀAx,
where A is the (symmetric) matrix of T. By compactness of the unit sphere, f has a maximum subject to the constraint g(x) = ‖x‖² = 1. Applying the method of Lagrange multipliers, we infer that there is a unit vector v so that Df(v) = λDg(v) for some scalar λ. By Exercise 3.2.14, this means
Av = λv,
and so we’ve found an eigenvector of A; the Lagrange multiplier is the corresponding eigenvalue.
(Incidentally, this was derived at the end of Section 4 of Chapter 5.)
By what we’ve just established, T has a real eigenvalue λ1 and a corresponding eigenvector v1
⊥
of length 1. Let W = Span(v1 ) ⊂ Rn ; note that if w · v1 = 0, then T (w) · v1 = w · T (v1 ) =
λ1 w · v1 = 0, so that T (w) ∈ W whenever w ∈ W . If we let T ′ = T |W be the restriction of T
to W , since dim W = n − 1, it follows from our induction hypothesis that there is an orthonormal
basis {v2 , . . . , vn } for W consisting of eigenvectors of T ′ . Then {v1 , v2 , . . . , vn } is the requisite
orthonormal basis for Rn , since T (v1 ) = λ1 v1 and T (vi ) = T ′ (vi ) = λi vi for i ≥ 2.
Example 2. Consider the symmetric matrix
A = [ 0 1 1
      1 1 0
      1 0 1 ].
Its characteristic polynomial is p(t) = −t³ + 2t² + t − 2 = −(t² − 1)(t − 2) = −(t + 1)(t − 1)(t − 2),
so the eigenvalues of A are −1, 1, and 2. As the reader can check, the corresponding eigenvectors
are
v1 = (−2, 1, 1),   v2 = (0, −1, 1),   and   v3 = (1, 1, 1).
Note that these three vectors form an orthogonal basis for R3 , and we can easily obtain an or-
thonormal basis by normalizing:
q1 = (1/√6)(−2, 1, 1),   q2 = (1/√2)(0, −1, 1),   and   q3 = (1/√3)(1, 1, 1).
For another example, consider the symmetric matrix
A = [  5 −4 −2
      −4  5 −2
      −2 −2  8 ].
Its characteristic polynomial is p(t) = −t³ + 18t² − 81t = −t(t − 9)², so the eigenvalues of A are 0,
9, and 9. It is easy to check that
v1 = (2, 2, 1)
gives a basis for E(0) = N(A). As for E(9), we find
A − 9I = [ −4 −4 −2
           −4 −4 −2
           −2 −2 −1 ],
which has rank 1, and so, as the spectral theorem guarantees, E(9) is 2-dimensional, with basis
v2 = (−1, 1, 0)   and   v3 = (−1, 0, 2).
If we want an orthogonal (or orthonormal) basis, we must use the Gram-Schmidt process, Theorem
5.3 of Chapter 5: we take w2 = v2 and let
w3 = v3 − proj_{w2} v3 = (−1, 0, 2) − (1/2)(−1, 1, 0) = (−1/2, −1/2, 2).
As a check, note that v1, w2, w3 do in fact form an orthogonal basis. As before, if we want the
orthogonal diagonalizing matrix Q, we take
q1 = (1/3)(2, 2, 1),   q2 = (1/√2)(−1, 1, 0),   and   q3 = (1/(3√2))(−1, −1, 4),
whence
Q = [ 2/3   −1/√2   −1/(3√2)
      2/3    1/√2   −1/(3√2)
      1/3      0     4/(3√2) ].
We reiterate that repeated eigenvalues cause no problem with symmetric matrices. ▽
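A numerical sanity check of this last example (a sketch):

    # Sketch: verifying the diagonalization of the example above.
    import numpy as np

    A = np.array([[ 5.0, -4.0, -2.0],
                  [-4.0,  5.0, -2.0],
                  [-2.0, -2.0,  8.0]])
    Q = np.column_stack([
        np.array([2.0, 2.0, 1.0]) / 3.0,
        np.array([-1.0, 1.0, 0.0]) / np.sqrt(2.0),
        np.array([-1.0, -1.0, 4.0]) / (3.0 * np.sqrt(2.0)),
    ])
    print(np.allclose(Q.T @ Q, np.eye(3)))                     # True
    print(np.allclose(Q.T @ A @ Q, np.diag([0.0, 9.0, 9.0])))  # True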
We conclude this discussion with a comparison to our study of projections in Chapter 5. Note
that if we write out A = QΛQ⁻¹ = QΛQᵀ, we see that
A = [ q1 q2 · · · qn ][ λ1q1ᵀ ; λ2q2ᵀ ; · · · ; λnqnᵀ ] = Σ_{i=1}^n λi qi qiᵀ,
where the first factor has the qi as its columns, the second factor has the row vectors λiqiᵀ as its rows, and each qiqiᵀ is the matrix of projection onto the line spanned by qi.
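This rank-one decomposition is easy to verify numerically; a minimal sketch with a randomly generated symmetric matrix:

    # Sketch: verifying A = sum_i lambda_i q_i q_i^T for a random
    # symmetric matrix.
    import numpy as np

    rng = np.random.default_rng(0)
    M = rng.standard_normal((4, 4))
    A = (M + M.T) / 2                      # a random symmetric matrix
    lam, Q = np.linalg.eigh(A)
    A_rebuilt = sum(lam[i] * np.outer(Q[:, i], Q[:, i]) for i in range(4))
    print(np.allclose(A, A_rebuilt))       # True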
4.1. Conics and Quadric Surfaces. We now use the Spectral Theorem to analyze the equa-
tions of conic sections and quadric surfaces.
Note that the conic is much easier to understand in the y1y2-coordinates. Indeed, we recognize that the equation 2y1² − 3y2² = 6 can be written in the form
y1²/3 − y2²/2 = 1,
from which we see that this is a hyperbola with asymptotes y2 = ±√(2/3) y1, as pictured in Figure 4.2.
[Figure 4.2: the hyperbola in the y1y2-coordinates]
Now recall that the y1y2-coordinates are the coordinates with respect to the basis formed by the column vectors of Q. Thus, if we want to sketch the picture in the original x1x2-coordinates, we first draw in the basis vectors q1 and q2; these establish the y1- and y2-axes, respectively, and we sketch the hyperbola relative to those axes, as in Figure 4.3.
[Figure 4.3: the same hyperbola drawn relative to the q1- and q2-axes]
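The rotation to principal axes can also be carried out symbolically. In the sketch below, the quadratic form x1² + 4x1x2 − 2x2² is an assumption, chosen to be consistent with the eigenvalues 2 and −3 and the basis q1, q2 above; the substitution x = Qy diagonalizes it:

    # Sketch: substituting x = Q y turns x^T A x into its diagonal form in
    # the y-coordinates.  The matrix A (hence the form x1^2 + 4 x1 x2 - 2 x2^2)
    # is an assumed example consistent with the eigenvalues 2, -3 and the
    # basis q1, q2 above.
    import sympy as sp

    y1, y2 = sp.symbols('y1 y2')
    A = sp.Matrix([[1, 2], [2, -2]])
    Q = sp.Matrix([[2, -1], [1, 2]]) / sp.sqrt(5)   # columns q1, q2
    y = sp.Matrix([y1, y2])
    x = Q * y
    print(sp.expand((x.T * A * x)[0]))              # 2*y1**2 - 3*y2**2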
[Figure 4.4: an ellipsoid, a hyperboloid of one sheet, a hyperboloid of two sheets, and a cylinder]
and the graph of −y1² + y2² + 2y3² = 2 is the hyperboloid of one sheet shown in Figure 4.5. This is the picture with respect to the “new basis” {q1, q2, q3} (given in the solution of Example 2).
[Figure 4.5: the hyperboloid of one sheet in the y1y2y3-coordinates]
The picture with respect to the standard basis, then, is as shown in Figure 4.6. (This figure is obtained from Figure 4.5 by the rotation carrying the standard basis to {q1, q2, q3}.)
[Figure 4.6: the same hyperboloid in the x1x2x3-coordinates]
The alert reader may have noticed that we’re lacking certain curves and surfaces. If there
are linear terms present along with the quadratic, we must adjust accordingly. For example, we
recognize that
x1² + 2x2² = 1
is the equation of an ellipse centered at the origin. Correspondingly, by completing the square, we
see that
x1² + 2x1 + 2x2² − 3x2 = 13/2
is the equation of a congruent ellipse centered at (−1, 3/4). However, the linear terms become all-
important when the symmetric matrix defining the quadratic terms is singular. For example,
x1² − x1 = 1
defines a pair of lines, whereas
x1² − x2 = 1
defines a parabola.
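Completing the square amounts to translating to the center, where the gradient of the quadratic polynomial vanishes; a minimal sympy sketch locating the center found above:

    # Sketch: the center of the translated ellipse is where the gradient of
    # the quadratic polynomial vanishes.
    import sympy as sp

    x1, x2 = sp.symbols('x1 x2')
    g = x1**2 + 2*x1 + 2*x2**2 - 3*x2
    print(sp.solve([sp.diff(g, x1), sp.diff(g, x2)], [x1, x2]))
    # {x1: -1, x2: 3/4}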
EXERCISES 9.4
1. Find orthogonal matrices that diagonalize each of the following symmetric matrices:
*a. [ 6 2; 2 9 ]
b. [ 2 0 0; 0 1 −1; 0 −1 1 ]
*c. [ 2 2 −2; 2 −1 −1; −2 −1 −1 ]
*d. [ 3 2 2; 2 2 0; 2 0 4 ]
e. [ 1 −2 2; −2 1 2; 2 2 1 ]
f. [ 1 0 1 0; 0 1 0 1; 1 0 1 0; 0 1 0 1 ]
*2. Suppose A is a symmetric matrix with eigenvalues 2 and 5. If the vectors (1, 1, 1) and (1, −1, −1) span the 5-eigenspace, what is A(0, 1, 2)? Give your reasoning.
3. A symmetric matrix A has eigenvalues 1 and 2. Find A if (1, 1, 1) spans E(2).
4. Suppose A is symmetric, A(1, 1) = (2, 2), and det A = 6. Give the matrix A. Explain your reasoning clearly. (Hint: What are the eigenvalues of A?)
*5. Prove that if λ is the only eigenvalue of a symmetric matrix A, then A = λI.
6. Decide (as efficiently as possible) which of the following matrices are diagonalizable. Give your reasoning.
A = [ 5 0 2; 0 5 0; 0 0 5 ],   B = [ 5 0 2; 0 5 0; 2 0 5 ],
C = [ 1 2 4; 0 2 2; 0 0 3 ],   D = [ 1 2 4; 0 2 2; 0 0 1 ].
7. Suppose A is a diagonalizable matrix whose eigenspaces are orthogonal. Prove that A is sym-
metric.
9. Apply the spectral theorem to prove that any symmetric matrix A satisfying A2 = A is in fact
a projection matrix.
10. Suppose T is a symmetric linear map satisfying [T]⁴ = I. Use the spectral theorem to give a
complete description of T : Rn → Rn . (Hint: For starters, what are the potential eigenvalues
of T ?)
11. Let A be an m × n matrix. Show that ‖A‖ = √λ, where λ is the largest eigenvalue of the symmetric matrix AᵀA.
12. We say a symmetric matrix A is positive definite if Ax · x > 0 for all x ≠ 0, negative definite if Ax · x < 0 for all x ≠ 0, and positive (resp., negative) semidefinite if Ax · x ≥ 0 (resp., ≤ 0) for all x.
a. Prove that if A and B are positive (negative) definite, then so is A + B.
b. Prove that A is positive (resp., negative) definite if and only if all its eigenvalues are
positive (resp., negative).
c. Prove that A is positive (resp., negative) semidefinite if and only if all its eigenvalues are
nonnegative (resp., nonpositive).
d. Prove that if C is any m × n matrix of rank n, then A = CᵀC has positive eigenvalues.
e. Prove or give a counterexample: if A and B are positive definite, then so is AB. What
about AB + BA?
13. Let A be an n × n matrix. Prove that A is nonsingular if and only if every eigenvalue of AᵀA
is positive.
14. Prove that if A is a positive semidefinite (symmetric) matrix, then there is a unique positive
semidefinite (symmetric) matrix B with B 2 = A.
15. Suppose A and B are symmetric and AB = BA. Prove there is an orthogonal matrix Q so that
both Q−1 AQ and Q−1 BQ are diagonal. (Hint: Let λ be an eigenvalue of A. Use the Spectral
Theorem to show that there is an orthonormal basis for E(λ) consisting of eigenvectors of B.)
16. Prove using only methods of linear algebra that the eigenvalues of a symmetric matrix are real. (Hints: Let λ = a + bi be a putative complex eigenvalue of A, and consider the real matrix
B = ( A − (a + bi)I )( A − (a − bi)I ) = A² − 2aA + (a² + b²)I = (A − aI)² + b²I.
Show that B is singular, and that if v ∈ N(B) is a nonzero vector, then (A − aI)v = 0 and b = 0.)
17. If A is a positive definite symmetric n × n matrix, what is the volume of the n-dimensional
ellipsoid {x ∈ Rn : Ax · x ≤ 1}? (See also Exercise 7.6.3.)
18. Sketch the following conic sections, giving axes of symmetry and asymptotes (if any).
a. 6x1 x2 − 8x22 = 9
*b. 3x21 − 2x1 x2 + 3x22 = 4
*c. 16x21 + 24x1 x2 + 9x22 − 3x1 + 4x2 = 5
d. 10x21 + 6x1 x2 + 2x22 = 11
e. 7x21 + 12x1 x2 − 2x22 − 2x1 + 4x2 = 6
cot 2α = (a − c)/(2b)
for the angle α through which we must rotate the x1x2-axes to get the appropriate y1y2-axes. Derive this using eigenvalues and eigenvectors, and determine the type (ellipse, hyperbola, etc.) of the conic section Q(x) = 1 from a, b, and c. (Hint: Use the characteristic polynomial to eliminate λ² in your computation of tan 2α.)
b. Use the formula for Q̃ above to find the maximum and minimum of Q on the unit circle ‖x‖ = 1.
21. In this exercise we consider the nature of the restriction of a quadratic form to a hyperplane.
Let A be a symmetric n × n matrix.
a. Show that the quadratic form Q(x) = xᵀAx on Rn is positive definite when restricted to the subspace xn = 0 if and only if all the roots of
det [ A − tI   en
      enᵀ      0  ] = 0,   where en = (0, . . . , 0, 1),
are positive.
b. Use the change-of-basis theorem to prove that the restriction to the subspace b · x = 0 is positive definite if and only if all the roots of
det [ A − tI   b
      bᵀ       0 ] = 0
are positive.
c. Use this result to give a bordered Hessian test for the point a to be a constrained maximum
(minimum) of the function f subject to the constraint g = c. (See Exercises 5.4.34 and
5.4.32b.)
d. What is the analogous result for an arbitrary subspace?
22. We saw in Section 3 of Chapter 5 that we can write a symmetric n × n matrix A in the form
A = LDLᵀ (where L is lower triangular with diagonal entries 1 and D is diagonal); we saw
in this section that we can write A = QΛQᵀ for some orthogonal matrix Q. Although the
diagonal entries of D obviously need not be the eigenvalues of A, the point of this exercise is
to see that the signs of these numbers must agree. That is, the number of positive entries in D
equals the number of positive eigenvalues of A, the number of negative entries in D equals the
number of negative eigenvalues of A, and the number of zero (diagonal) entries in D equals the
number of zero eigenvalues.
a. Assume first that A is nonsingular. Consider the “straight line path” joining I and L (stick
a parameter s in front of the non-diagonal entries of L and let s vary from 0 to 1). We
then obtain a path in Mn×n joining D and A. Show that all the matrices in this path are
nonsingular and, applying Exercise 8.7.9, show that the number of positive eigenvalues of
D equals the number of positive eigenvalues of A. Deduce the result in this case.
b. In general, prove that the number of zero diagonal entries in D is equal to dim N(A) =
dim E(0). By considering the matrix A + εI for ε > 0 sufficiently small, use part a to
deduce the result.
Remark. Comparing Proposition 3.5 of Chapter 5 with Exercise 12 above, we can easily
derive the result of this exercise when A is either positive or negative definite. But the indefinite
case is more subtle.
Glossary of Notations and Results from
Single-Variable Calculus
Notations
f̃   extension of f by 0   260
f̄   average value of f   288
Intermediate Value Theorem: Let f : [a, b] → R be continuous. Then for any y between f (a)
and f (b), there is x ∈ [a, b] with f (x) = y.
Rolle’s Theorem: Suppose f : [a, b] → R is continuous, f is differentiable on (a, b) and f (a) =
f (b). Then there is c ∈ (a, b) so that f ′ (c) = 0. (Proof : By the maximum value theorem, Theorem
1.2 of Chapter 5, f takes on its maximum and minimum values on [a, b]. If f is constant on [a, b],
then f ′ (c) = 0 for all c ∈ (a, b). If not, say f (x) > f (a) for some x ∈ (a, b), in which case f takes on
a global maximum at some c ∈ (a, b). Then f ′ (c) = 0 (by Lemma 2.1 of Chapter 5). Alternatively,
f (x) < f (a) for some x ∈ (a, b), in which case f takes on a global minimum at some c ∈ (a, b).
Then in this case, as well, f ′ (c) = 0.)
Mean Value Theorem: Suppose f : [a, b] → R is continuous and f is differentiable on (a, b).
Then there is c ∈ (a, b) so that f (b) − f (a) = f ′ (c)(b − a).
Fundamental Theorem of Calculus, Part I: Suppose f is continuous on [a, b] and we set F(x) = ∫_a^x f(t) dt. Then F′(x) = f(x) for all x ∈ (a, b).
Fundamental Theorem of Calculus, Part II: Suppose f is integrable on [a, b] and f = F′. Then
∫_a^b f(x) dx = F(b) − F(a).
Function          Derivative
x^n               n x^(n−1)
e^x               e^x
log x             1/x
sin x             cos x
cos x             −sin x
tan x             sec² x
sec x             sec x tan x
cot x             −csc² x
csc x             −csc x cot x
arcsin x          1/√(1 − x²)
arctan x          1/(1 + x²)
∫ x^n dx = x^(n+1)/(n + 1),  n ≠ −1
∫ e^x dx = e^x
∫ dx/x = log |x|
∫ sin x dx = −cos x
∫ cos x dx = sin x
∫ tan x dx = −log |cos x|
∫ sin² x dx = (1/2)(x − sin x cos x)
∫ cos² x dx = (1/2)(x + sin x cos x)
∫ tan² x dx = tan x − x
∫ sec x dx = log |sec x + tan x|
∫ sec² x dx = tan x
∫ sec³ x dx = (1/2)(sec x tan x + log |sec x + tan x|)
∫ sin³ x dx = −cos x + (1/3) cos³ x
∫ cos³ x dx = sin x − (1/3) sin³ x
∫ tan³ x dx = (1/2) tan² x + log |cos x|
∫ dx/√(a² − x²) = arcsin(x/a)
∫ dx/(a² + x²) = (1/a) arctan(x/a)
∫ dx/(a² − x²) = (1/(2a)) log |(x + a)/(x − a)|
∫ √(x² ± a²) dx = (x/2)√(x² ± a²) ± (a²/2) log |x + √(x² ± a²)|
∫ dx/√(x² ± a²) = log |x + √(x² ± a²)|
∫ √(a² − x²) dx = (x/2)√(a² − x²) + (a²/2) arcsin(x/a)
∫ log x dx = x log x − x
∫ e^(ax) sin bx dx = e^(ax)(a sin bx − b cos bx)/(a² + b²)
∫ e^(ax) cos bx dx = e^(ax)(a cos bx + b sin bx)/(a² + b²)
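Entries in such a table are easily spot-checked by differentiation; a minimal sympy sketch for one of them:

    # Sketch: differentiate the tabulated antiderivative of sec^3 x and
    # compare with the integrand (should print 0).
    import sympy as sp

    x = sp.symbols('x')
    F = (sp.sec(x)*sp.tan(x) + sp.log(sp.sec(x) + sp.tan(x))) / 2
    print(sp.simplify(sp.diff(F, x) - sp.sec(x)**3))   # 0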
Further Reading
Apostol, Tom M., Calculus (two volumes), 2nd ed. Waltham, MA: Blaisdell Publishing Co., 1967.
Although the first volume is needed for rudimentary vector algebra, the second volume includes
linear algebra, multivariable calculus (although only treating the “classic” versions of Stokes’s
Theorem), and an introduction to probability theory and numerical analysis.
Bamberg, Paul, and Sternberg, Shlomo, A Course in Mathematics for Students of Physics (two
volumes), Cambridge: Cambridge University Press, 1988. This book includes much of the
mathematics of our course, but also a volume’s worth of interesting physics (using differential
forms).
Edwards, C. H., Jr., Advanced Calculus of Several Variables, New York: Dover Publications, 1994
(originally published by Academic Press, 1973). This very well-written book parallels ours for
students who have already had standard courses in linear algebra and multivariable calculus.
Of particular note is the last chapter, on the calculus of variations.
Friedberg, Stephen H., Insel, Arnold J., and Spence, Lawrence E., Linear Algebra, 3rd ed. Up-
per Saddle River, NJ: Prentice Hall, 1997. A well-written, somewhat more advanced book
concentrating on the theoretical aspects of linear algebra.
Hubbard, John H., and Hubbard, Barbara Burke, Vector Calculus, Linear Algebra, and Differential
Forms: A Unified Approach, 2nd ed. Upper Saddle River, NJ: Prentice Hall, 2002. Very similar
in spirit to our text, this book is wonderfully idiosyncratic and includes Lebesgue integration,
Kantarovich’s Theorem, and the exterior derivative from a non-standard definition. It also
treats the Taylor polynomial in several variables.
Shifrin, Theodore, and Adams, Malcolm, Linear Algebra: A Geometric Approach, New York:
W. H. Freeman, 2002. Includes a few advanced topics in linear algebra that we did not have
time to discuss in this text, e.g., complex eigenvalues, Jordan canonical form, and computer
graphics.
Spivak, Michael, Calculus, 3rd ed. Houston, TX: Publish or Perish, 1994. The beautiful, ultimate
source for single-variable calculus “done right.”
Strang, Gilbert, Linear Algebra and its Applications, 3rd ed. Philadelphia: Saunders, 1988. A
classic text, with far more depth on applications.
Flanders, Harley, Differential Forms with Applications to the Physical Sciences, New York: Dover
Publications, 1989 (originally published by Academic Press in 1963). A short, sophisticated
treatment of differential forms with applications to physics, topology, differential geometry, and
partial differential equations.
Guillemin, Victor, and Pollack, Alan, Differential Topology, Englewood Cliffs, NJ: Prentice Hall,
1974. The perfect follow-up to our introduction to manifolds and the material of Chapter 8,
Section 7.
Munkres, James, Topology, 2nd ed. Upper Saddle River, NJ: Prentice Hall, 2000. A classic,
extremely well-written text on point-set topology, to follow up on our discussion of open and
closed sets, compactness, maximum value theorem, etc.
Shifrin, Theodore, Abstract Algebra: A Geometric Approach, Upper Saddle River, NJ: Prentice
Hall, 1996. A first course in abstract algebra that will be accessible to anyone who’s enjoyed
this course.
Wilkinson, J. H., The Algebraic Eigenvalue Problem, New York: Oxford Univ. Press, 1965. An
advanced book that includes a proof of the algorithm based on the Gram-Schmidt process to
calculate eigenvalues and eigenvectors numerically.