
Multivariable Calculus

Lecture Notes for


MATH 2023

Frederick Tsz-Ho Fong


Department of Mathematics
Hong Kong University of Science and Technology

May 10, 2024


Contents

1 Three-Dimensional Space
1.1 Rectangular Coordinates in $\mathbb{R}^3$
1.2 Dot Product
1.3 Cross Product
1.4 Lines and Planes
1.5 Parametric Curves

2 Partial Differentiations
2.1 Functions of Several Variables
2.2 Partial Derivatives
2.3 Chain Rule
2.4 Directional Derivatives
2.5 Tangent Planes
2.6 Local Extrema
2.7 Lagrange’s Multiplier
2.8 Optimizations

3 Multiple Integrations
3.1 Double Integrals in Rectangular Coordinates
3.2 Fubini’s Theorem for General Regions
3.3 Double Integrals in Polar Coordinates
3.4 Triple Integrals in Rectangular Coordinates
3.5 Triple Integrals in Cylindrical Coordinates
3.6 Triple Integrals in Spherical Coordinates

4 Vector Calculus
4.1 Vector Fields on $\mathbb{R}^2$ and $\mathbb{R}^3$
4.2 Line Integrals of Vector Fields
4.3 Conservative Vector Fields
4.4 Green’s Theorem
4.5 Parametric Surfaces
4.6 Stokes’ Theorem
4.7 Divergence Theorem
4.8 Heat Diffusion (Optional)

Preface and Acknowledgement


These lecture notes were written by the author for his Multivariable Calculus courses MATH 0200
at Brown University in 2014 and 2015, and MATH 2023 at the Hong Kong University of Science
and Technology (HKUST) in Spring 2016. These courses and these lecture notes are not intended
to be mathematically rigorous. They aim at developing geometric intuition for the subject.
Students are expected to learn the rigorous treatment of multivariable functions in MATH
2043/3033/3043.
Some examples in these lecture notes are modified from those in the textbooks listed below,
and solutions were rewritten to suit the lecture style of the author.
• Calculus for Scientists and Engineers, Multivariable by W. Briggs, L. Cochran, B. Gillett
• Thomas’ Calculus by J. Hass, C. Heil, M. Weir
• Calculus: Early Transcendentals by J. Stewart
Most diagrams in these lecture notes are produced using Adobe Illustrator, Mathematica, and
LaTeX TikZ code. The author would like to thank Mr. HUNG Chun Kit for assisting in the
making of many diagrams in these lecture notes.
1 — Three-Dimensional Space

“There is no royal road to geometry.”

Euclid

1.1 Rectangular Coordinates in $\mathbb{R}^3$


Throughout the course, we will use an ordered triple ( x, y, z) to represent a point in
three-dimensional space. The real numbers x, y and z in an ordered triple ( x, y, z) are respectively the
x-, y- and z-coordinates which, by convention, are defined according to the following diagram:

Figure 1.1: Rectangular coordinate system in 3-space, showing a point ( a, b, c) and its projection ( a, b, 0) onto the xy-plane

Notation We will use the notation $\mathbb{R}^3$ to denote the entire three-dimensional space.

Any point on the x-axis has the form ( x, 0, 0), i.e. y = 0 and z = 0. Similarly, points on
the y-axis are of the form (0, y, 0), and points on the z-axis are of the form (0, 0, z). The three
coordinate axes meet at a point with coordinates (0, 0, 0) which is called the origin.
A vector in $\mathbb{R}^3$ is an arrow which is based at one point and points at another point. If a
vector v is based at ( x0 , y0 , z0 ) and points toward ( x1 , y1 , z1 ), then the vector is written as:
v = ( x1 − x0 )i + (y1 − y0 )j + (z1 − z0 )k.
For example, the vector based at (3, 2, −1) pointing at (5, 2, 0) is expressed as
(5 − 3)i + (2 − 2)j + (0 − (−1))k = 2i + k.

Remark Consequently, any two vectors that point in the same direction and have the same
length are considered to be equal, even though they may have different base points. For
instance, a vector v based at (1, 2, 3) pointing at (4, 3, 2), i.e.

v = (4 − 1)i + (3 − 2)j + (2 − 3)k = 3i + j − k,

is considered to be equal to the vector w based at (0, 0, 0) pointing at (3, 1, −1). In other
words, we can write w = v.

An alternative notation for a vector is the angular bracket ⟨ a, b, c⟩. We will sometimes write
a vector this way to save the hassle of writing down i, j and k:

Notation ⟨ a, b, c⟩ = ai + bj + ck.

In this course, we make very little conceptual distinction between a point ( x, y, z) and a
vector based at (0, 0, 0) pointing at the point ( x, y, z). However, speaking of notations, one should
use ⟨ x, y, z⟩ or xi + yj + zk to denote a vector and ( x, y, z) to denote a point so as to avoid
confusion.
Vector addition and scalar multiplication are defined as follows:
Definition 1.1 — Vector Additions and Scalar Multiplications. Let a = ⟨ a1 , a2 , a3 ⟩ and b =
⟨b1 , b2 , b3 ⟩ be two vectors in $\mathbb{R}^3$, and c be a real scalar, then:

a + b = ⟨ a1 + b1 , a2 + b2 , a3 + b3 ⟩ (vector addition)
ca = ⟨ca1 , ca2 , ca3 ⟩ (scalar multiplication)

The negative of a vector is defined as: −a = (−1)a. The difference between vectors is
defined as a − b = a + (−b).
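The componentwise operations of Definition 1.1 can be sketched in a few lines of Python. This is only an illustration: vectors are modelled as plain 3-tuples, and the helper names (`vector_between`, `add`, `scale`, `subtract`) are assumptions made for this sketch, not notation from the notes.

```python
def vector_between(base, tip):
    """Vector based at the point `base` and pointing at the point `tip`."""
    return tuple(t - b for b, t in zip(base, tip))


def add(a, b):
    """Vector addition: <a1 + b1, a2 + b2, a3 + b3>."""
    return tuple(ai + bi for ai, bi in zip(a, b))


def scale(c, a):
    """Scalar multiplication: <c*a1, c*a2, c*a3>."""
    return tuple(c * ai for ai in a)


def subtract(a, b):
    """Difference a - b, defined as a + (-1)b."""
    return add(a, scale(-1, b))


# The vector based at (1, 2, 3) pointing at (4, 3, 2) equals the vector
# based at the origin pointing at (3, 1, -1).
v = vector_between((1, 2, 3), (4, 3, 2))
w = vector_between((0, 0, 0), (3, 1, -1))
print(v == w)
```

Because the base point is discarded after the subtraction, two arrows with the same components compare equal, which matches the convention that equal vectors may have different base points.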

Geometrically, these vector operations can be represented by the following diagrams:

Figure 1.2: Geometric representations of various vector operations (the sum a + b and the difference a − b)

Property Vector addition and scalar multiplication have the following algebraic properties:
1. commutative rule: a + b = b + a
2. associative rule: (a + b) + c = a + (b + c)
3. distributive rules: (λ + µ)a = λa + µa and λ(a + b) = λa + λb

1.2 Dot Product


There are two types of products for vectors in $\mathbb{R}^3$, namely the dot product and the cross product.
The former outputs a scalar whereas the latter outputs a vector. In this section, we first talk
about the dot product.
Definition 1.2 — Dot Product. Let a = ⟨ a1 , a2 , a3 ⟩ and b = ⟨b1 , b2 , b3 ⟩, then the dot product
of the vectors a and b is defined as:

a · b = a1 b1 + a2 b2 + a3 b3 .

It is important to note that the dot product of a vector a = ⟨ a1 , a2 , a3 ⟩ with itself is given
by:
\[ \mathbf{a} \cdot \mathbf{a} = a_1^2 + a_2^2 + a_3^2, \]
which is incidentally the square of the length of the vector a (by the Pythagorean theorem in
$\mathbb{R}^3$).
Notation Let a = ⟨ a1 , a2 , a3 ⟩. We denote the length of a vector a by |a|, which is given by:
\[ |\mathbf{a}| = \sqrt{a_1^2 + a_2^2 + a_3^2}. \]

It is important to note that $\mathbf{a} \cdot \mathbf{a} = |\mathbf{a}|^2$.

The length of a vector is sometimes called the norm, or the magnitude, of the vector.
Property It can easily be verified that the dot product satisfies the following algebraic
properties:
1. a · b = b · a.
2. (a + b) · c = a · c + b · c.
3. (λa) · b = λ(a · b)
4. 0 · a = a · 0 = 0.

The following theorem gives the geometric meaning of the dot product:
Theorem 1.1 Let a = ⟨ a1 , a2 , a3 ⟩ and b = ⟨b1 , b2 , b3 ⟩ be two vectors in $\mathbb{R}^3$, and θ be the angle
between these two vectors. Then we have:

a · b = |a| |b| cos θ. (1.1)

Proof. The proof uses the Law of Cosines. Consider the triangle in the diagram below:

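[Diagram: triangle with sides a, b and a − b, where θ is the angle between a and b]

The side opposite to the angle θ is represented by the vector a − b. Using the Law of Cosines:

\begin{align*}
|\mathbf{a} - \mathbf{b}|^2 &= |\mathbf{a}|^2 + |\mathbf{b}|^2 - 2|\mathbf{a}||\mathbf{b}|\cos\theta \\
(\mathbf{a} - \mathbf{b}) \cdot (\mathbf{a} - \mathbf{b}) &= \mathbf{a}\cdot\mathbf{a} + \mathbf{b}\cdot\mathbf{b} - 2|\mathbf{a}||\mathbf{b}|\cos\theta \\
\mathbf{a}\cdot\mathbf{a} - \mathbf{a}\cdot\mathbf{b} - \mathbf{b}\cdot\mathbf{a} + \mathbf{b}\cdot\mathbf{b} &= \mathbf{a}\cdot\mathbf{a} + \mathbf{b}\cdot\mathbf{b} - 2|\mathbf{a}||\mathbf{b}|\cos\theta \\
\mathbf{a}\cdot\mathbf{a} - 2\,\mathbf{a}\cdot\mathbf{b} + \mathbf{b}\cdot\mathbf{b} &= \mathbf{a}\cdot\mathbf{a} + \mathbf{b}\cdot\mathbf{b} - 2|\mathbf{a}||\mathbf{b}|\cos\theta \\
-2\,\mathbf{a}\cdot\mathbf{b} &= -2|\mathbf{a}||\mathbf{b}|\cos\theta \\
\mathbf{a}\cdot\mathbf{b} &= |\mathbf{a}||\mathbf{b}|\cos\theta.
\end{align*}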


One immediate consequence of Equation (1.1) is that it allows us to use the dot product to
find the angle between two vectors. Precisely, the angle θ between two vectors a and b is given
by:
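\[ \theta = \cos^{-1}\left( \frac{\mathbf{a}\cdot\mathbf{b}}{|\mathbf{a}|\,|\mathbf{b}|} \right). \]
The most important case is when the two vectors a and b are perpendicular, also known
as orthogonal. The angle between the vectors is then $\frac{\pi}{2}$, so cos θ = 0 and we have the
following important fact: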
Corollary 1.2 Two non-zero vectors a and b are orthogonal if and only if a · b = 0.

This corollary is particularly useful to determine whether two vectors are perpendicular.
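As a hedged illustration of Equation (1.1) and Corollary 1.2, the following Python sketch computes dot products, lengths and angles for 3-tuples. The function names and the tolerance `tol` are assumptions of this sketch; the clamping of the cosine is only there to guard against floating-point round-off.

```python
import math


def dot(a, b):
    """Dot product a1*b1 + a2*b2 + a3*b3."""
    return sum(ai * bi for ai, bi in zip(a, b))


def length(a):
    """Length |a| = sqrt(a . a)."""
    return math.sqrt(dot(a, a))


def angle_between(a, b):
    """Angle between two non-zero vectors, from a . b = |a||b| cos(theta)."""
    cos_theta = dot(a, b) / (length(a) * length(b))
    # Clamp to [-1, 1] so round-off cannot push acos out of its domain.
    cos_theta = max(-1.0, min(1.0, cos_theta))
    return math.acos(cos_theta)


def is_orthogonal(a, b, tol=1e-12):
    """Two non-zero vectors are orthogonal exactly when a . b = 0."""
    return abs(dot(a, b)) <= tol


print(angle_between((1, 0, 0), (1, 1, 0)))   # close to pi/4
print(is_orthogonal((1, 2, 3), (3, 0, -1)))
```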

■ Example 1.1 Show that any triangle which is inscribed in a circle and has one of its sides
coinciding with a diameter of the circle must be a right-angled triangle.

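■ Solution Let O be the center of the circle. Define vectors a and b as in the diagram below.

[Diagram: circle centered at O with a diameter from the tip of −a to the tip of a; b points from O to the third vertex on the circle; the red vector goes from the tip of b to the tip of −a, and the blue vector goes from the tip of b to the tip of a]

We would like to show the vectors in red and blue are orthogonal to each other. By basic
vector additions and subtractions:

Red vector = −a − b        Blue vector = a − b.

Their dot product equals

\begin{align*}
(-\mathbf{a} - \mathbf{b}) \cdot (\mathbf{a} - \mathbf{b}) &= -\mathbf{a}\cdot\mathbf{a} + \mathbf{a}\cdot\mathbf{b} - \mathbf{b}\cdot\mathbf{a} + \mathbf{b}\cdot\mathbf{b} \\
&= -|\mathbf{a}|^2 + |\mathbf{b}|^2 \qquad \text{(recall } \mathbf{v}\cdot\mathbf{v} = |\mathbf{v}|^2 \text{)}
\end{align*}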

Since both a and b represent the radii of the circle, they have the same magnitude. Therefore
|a| = |b| and we have (−a − b) · (a − b) = 0. This shows the red and blue vectors are
orthogonal.

1.3 Cross Product


The cross product is another important vector operation. In contrast to the dot product, the
cross product gives a vector instead of a scalar. Since a vector is characterized by its length and
direction, we define the cross product by specifying these two attributes:
Definition 1.3 — Cross Product. Given two vectors a = a1 i + a2 j + a3 k and b = b1 i + b2 j + b3 k
in $\mathbb{R}^3$ with angle θ between them, the cross product of a and b, denoted by a × b, is
defined as the vector such that:
• the length is given by |a × b| = |a| |b| sin θ, i.e. the area of the parallelogram formed
by vectors a and b;
• the cross product a × b is orthogonal to both a and b;
• the direction of a × b is determined by the right-hand grab rule illustrated in
Figure 1.3.

Figure 1.3: right-hand grab rule

From the right-hand grab rule, we can clearly see that a × b and b × a are vectors with the
same length but in opposite directions, i.e. a × b = −b × a. The magnitude |a × b|, which is
defined to be |a| |b| sin θ, is the area of the parallelogram formed by a and b:

[Diagram: parallelogram formed by a and b with angle θ between them; its height over the base a is |b| sin θ]

The following are some useful algebraic properties of the cross products. Based on the
definition of cross products we presented above, the proofs are purely geometric and are
omitted here.
Property The cross product satisfies:
1. a × b = −b × a
2. (a + b) × c = a × c + b × c
3. a × 0 = 0
4. a × a = 0

For simple vectors such as i, j and k, their cross products can be easily found from the
definition:
i × j = k, j × k = i, k × i = j.
For more complicated vectors, the cross product can be computed using the following
determinant formula:
Theorem 1.3 — Determinant Formula of Cross Product. Given two vectors a = a1 i + a2 j + a3 k
and b = b1 i + b2 j + b3 k, their cross product is given by:

\begin{align*}
\mathbf{a} \times \mathbf{b} &= \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \end{vmatrix} \tag{1.2} \\
&= (a_2 b_3 - a_3 b_2)\mathbf{i} - (a_1 b_3 - a_3 b_1)\mathbf{j} + (a_1 b_2 - a_2 b_1)\mathbf{k}.
\end{align*}

Proof. The proof follows from expanding:

a × b = ( a1 i + a2 j + a3 k) × (b1 i + b2 j + b3 k)

using the algebraic properties of the cross product. It is left as an exercise for readers. ■
The cross product can be used to find a vector which is orthogonal to a plane.

■ Example 1.2 Given three points in the xyz-space:

A(0, 2, −1), B(4, 0, −1), C (7, −3, 0)

Find a vector n which is orthogonal to the plane passing through A, B and C. Moreover, find
the area of the triangle △ ABC.

■ Solution A vector n is orthogonal to the plane if and only if it is orthogonal to any two
(non-parallel) vectors on the plane. We will first find two vectors on the plane and then take
the cross product. The outcome will give a vector orthogonal to these two vectors (hence
orthogonal to the plane as well).
The following two vectors lie on the plane:
\begin{align*}
\overrightarrow{AB} &= \langle 4, 0, -1 \rangle - \langle 0, 2, -1 \rangle = \langle 4, -2, 0 \rangle \\
\overrightarrow{AC} &= \langle 7, -3, 0 \rangle - \langle 0, 2, -1 \rangle = \langle 7, -5, 1 \rangle
\end{align*}
Taking the cross product: $\overrightarrow{AB} \times \overrightarrow{AC} = \langle -2, -4, -6 \rangle$.
Therefore, the required vector n can be taken to be any non-zero scalar multiple of ⟨−2, −4, −6⟩,
such as ⟨1, 2, 3⟩ or ⟨2, 4, 6⟩.
The length of the cross product $\overrightarrow{AB} \times \overrightarrow{AC}$ is equal to the area of the parallelogram formed
by $\overrightarrow{AB}$ and $\overrightarrow{AC}$. The area of the triangle △ ABC is $\frac{1}{2}$ of the area of this parallelogram.
Therefore,
\[ \text{Area of } \triangle ABC = \frac{1}{2} \left| \overrightarrow{AB} \times \overrightarrow{AC} \right| = \frac{1}{2} \sqrt{(-2)^2 + (-4)^2 + (-6)^2} = \frac{1}{2}\sqrt{56} = \sqrt{14}. \]
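The determinant formula (1.2) and the computation in Example 1.2 can be checked with a short Python sketch. The function names `cross`, `length` and `triangle_area` are illustrative assumptions; vectors and points are plain 3-tuples.

```python
import math


def cross(a, b):
    """Cross product computed from the determinant formula (1.2)."""
    a1, a2, a3 = a
    b1, b2, b3 = b
    return (a2 * b3 - a3 * b2,
            -(a1 * b3 - a3 * b1),
            a1 * b2 - a2 * b1)


def length(v):
    """Euclidean length of a vector."""
    return math.sqrt(sum(x * x for x in v))


def triangle_area(p, q, r):
    """Half the area of the parallelogram spanned by PQ and PR."""
    pq = tuple(qi - pi for pi, qi in zip(p, q))
    pr = tuple(ri - pi for pi, ri in zip(p, r))
    return 0.5 * length(cross(pq, pr))


# Data of Example 1.2.
A, B, C = (0, 2, -1), (4, 0, -1), (7, -3, 0)
AB = tuple(b - a for a, b in zip(A, B))
AC = tuple(c - a for a, c in zip(A, C))
n = cross(AB, AC)
area = triangle_area(A, B, C)
print(n, area)
```

The same `cross` function reproduces i × j = k, j × k = i and k × i = j, and swapping the arguments flips the sign of every component.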

1.4 Lines and Planes


1.4.1 Parametric Equations of Lines
In three-dimensional space, a line can no longer be represented by a single equation such as
x + 2y = 1, as it can in the two-dimensional plane. In order to represent a straight-line (and a
curve as well), we need to introduce the time variable t, and think of a straight-line or a curve
as the path of a particle travelling as t varies.
Suppose the line L passes through the point P0 ( x0 , y0 , z0 ) and is parallel to the vector
v = ⟨v1 , v2 , v3 ⟩ (see Figure 1.4). For any point P( x, y, z) on L, the vector $\overrightarrow{P_0 P}$ is parallel to
the vector v, meaning that $\overrightarrow{P_0 P} = t\mathbf{v}$ for some real number t. Therefore,

⟨ x, y, z⟩ − ⟨ x0 , y0 , z0 ⟩ = tv
⟨ x, y, z⟩ = ⟨ x0 , y0 , z0 ⟩ + t ⟨v1 , v2 , v3 ⟩

Therefore, we have:

x = x0 + tv1
y = y0 + tv2
z = z0 + tv3

which is called the parametric equation of the line L. It is called this way because the variable
t is called the parameter of the line.

Figure 1.4: a straight line L passing through P0 and parallel to v

Notation In this course, the vector r is “reserved” to denote the position vector ⟨ x, y, z⟩.

Using this notation, we can also write the parametric equation of the line L in vector form:
\[ \mathbf{r}(t) = \overrightarrow{OP_0} + t\mathbf{v}. \]

The t-variable in r(t) emphasizes the fact that the position vector r depends on t. It can be
omitted if it is clear that t is the parameter letter.

■ Example 1.3 Find a parametric equation of the line L passing through both A(3, −2, 0) and
B(1, 0, 1). Express your answer in both equation form and vector form.

■ Solution In order to write down the parametric equation of a straight-line, we need two
“ingredients”:
1. a given point P0 on the line, and
2. the direction v of the line.
In this case, we can take P0 to be A(3, −2, 0). The direction of the line L is the vector $\overrightarrow{AB}$,
which is given by:
\[ \overrightarrow{AB} = \overrightarrow{OB} - \overrightarrow{OA} = \langle 1, 0, 1 \rangle - \langle 3, -2, 0 \rangle = \langle -2, 2, 1 \rangle. \]

With P0 (3, −2, 0) and v = ⟨−2, 2, 1⟩, the parametric equation of the line L is given by:

x = 3 − 2t
y = −2 + 2t
z = 0+t

or equivalently,
r(t) = ⟨3, −2, 0⟩ + t ⟨−2, 2, 1⟩ .

Remark In the above example, one may also take P0 to be B(1, 0, 1) and keep v to be ⟨−2, 2, 1⟩;
then the parametric equation of L is given by:

r(t) = ⟨1, 0, 1⟩ + t ⟨−2, 2, 1⟩ .

Although it gives a different r(t), this parametric equation represents the same straight
line L. Every straight line can be represented by many different parametric equations!
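To illustrate that different parametric equations can describe the same line, here is a minimal Python sketch based on Example 1.3. The helper `line_through` is a hypothetical name for this sketch; it returns the function t ↦ P0 + t v.

```python
def line_through(p0, v):
    """Return r(t) = p0 + t*v as a Python function of t."""
    def r(t):
        return tuple(p + t * d for p, d in zip(p0, v))
    return r


A = (3, -2, 0)
B = (1, 0, 1)
v = tuple(b - a for a, b in zip(A, B))   # direction vector AB

r_from_A = line_through(A, v)   # r(t) = <3, -2, 0> + t<-2, 2, 1>
r_from_B = line_through(B, v)   # r(t) = <1, 0, 1> + t<-2, 2, 1>

# The two parametrizations trace the same points, with the parameter
# shifted by one unit: r_from_A(t + 1) == r_from_B(t).
print(r_from_A(1), r_from_B(0))
```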

1.4.2 Equation of Planes


In three dimensions, the equation of a plane can be written in the form Ax + By + Cz = D.
For a plane through a given point P0 ( x0 , y0 , z0 ) with a normal vector n = ⟨ A, B, C ⟩, the equation
of the plane is given by:
Ax + By + Cz = Ax0 + By0 + Cz0 . (1.3)
Equation (1.3) can be proved by considering a variable point P( x, y, z) on the plane. As illustrated in
Figure 1.5, the vector $\overrightarrow{P_0 P}$ lies on the plane and therefore is orthogonal to the normal vector n.
Therefore, we have:
\begin{align*}
\mathbf{n} \cdot \overrightarrow{P_0 P} &= 0 \\
\langle A, B, C \rangle \cdot (\langle x, y, z \rangle - \langle x_0, y_0, z_0 \rangle) &= 0 \\
\langle A, B, C \rangle \cdot \langle x, y, z \rangle - \langle A, B, C \rangle \cdot \langle x_0, y_0, z_0 \rangle &= 0 \\
Ax + By + Cz &= Ax_0 + By_0 + Cz_0.
\end{align*}

Figure 1.5: equation of a plane



In order to find n, one may use the cross product. Here is an example:

■ Example 1.4 Find an equation of the plane in $\mathbb{R}^3$ passing through the following three
points.
A(0, 2, −1), B(4, 0, −1), C (7, −3, 0).

■ Solution The two “ingredients” for finding the equation of a plane are

1. a given point on the plane; and
2. a normal vector to the plane.
In order to find a normal vector of the plane through A, B and C, we take the cross product
of $\overrightarrow{AB}$ and $\overrightarrow{AC}$:
\begin{align*}
\overrightarrow{AB} &= \langle 4, 0, -1 \rangle - \langle 0, 2, -1 \rangle = \langle 4, -2, 0 \rangle \\
\overrightarrow{AC} &= \langle 7, -3, 0 \rangle - \langle 0, 2, -1 \rangle = \langle 7, -5, 1 \rangle
\end{align*}
Taking the cross product: $\overrightarrow{AB} \times \overrightarrow{AC} = \langle -2, -4, -6 \rangle$. Any non-zero vector parallel to this
cross product is a normal vector to the plane. For simplicity, we can take:

n = ⟨1, 2, 3⟩ .

Take A(0, 2, −1) to be the given point P0 . Then, by Equation (1.3) with ( x0 , y0 , z0 ) = (0, 2, −1)
and n = ⟨1, 2, 3⟩, the equation of the plane through A, B and C is given by:
\[ 1x + 2y + 3z = 1(0) + 2(2) + 3(-1). \]

After simplification: x + 2y + 3z = 1.
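The two-ingredient recipe above can be sketched in Python as follows. The function `plane_through` and its return convention (a normal vector n and the constant D in n · ⟨x, y, z⟩ = D) are assumptions of this sketch. It returns the unsimplified normal, so the output for Example 1.4 is a scalar multiple of x + 2y + 3z = 1.

```python
def sub(p, q):
    """Componentwise difference p - q."""
    return tuple(pi - qi for pi, qi in zip(p, q))


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def cross(a, b):
    a1, a2, a3 = a
    b1, b2, b3 = b
    return (a2 * b3 - a3 * b2, -(a1 * b3 - a3 * b1), a1 * b2 - a2 * b1)


def plane_through(a, b, c):
    """Return (n, D) such that the plane through a, b, c is n . <x, y, z> = D."""
    n = cross(sub(b, a), sub(c, a))
    if n == (0, 0, 0):
        raise ValueError("the three points are collinear")
    return n, dot(n, a)


A, B, C = (0, 2, -1), (4, 0, -1), (7, -3, 0)
n, D = plane_through(A, B, C)
print(n, D)   # -2x - 4y - 6z = -2, i.e. x + 2y + 3z = 1 after dividing by -2
```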

1.5 Parametric Curves


1.5.1 Parametric Equations of Curves
In two dimensions, there are two ways to represent a curve, namely in the form $x^2 + y^2 = 1$,
or in the form

x = cos t
y = sin t.

The former is called the Cartesian equation and the latter is called the parametric
equation.
However, in three dimensions, a single Cartesian equation such as $x^2 + y^2 + z^2 = 1$
represents a surface instead. Therefore, we will only use parametric equations to represent
curves in three dimensions.
Definition 1.4 — Parametric Equation of a Curve. The parametric equation of a curve is of the
form:

x = f (t)
y = g(t)
z = h(t)

where f (t), g(t) and h(t) are differentiable functions of t. In vector notations, the parametric
equation of this curve is written as:

r(t) = f (t)i + g(t)j + h(t)k.

The parametric equation of a straight-line is a special case of the parametric equation of a
“curve”. An interesting example of a parametric curve is the helix:
\[ \mathbf{r}_1(t) = (\cos t)\mathbf{i} + (\sin t)\mathbf{j} + \frac{t}{20}\mathbf{k}. \]
It is a curve that winds around a circle while its altitude increases at a constant rate. See Figure
1.6a for the computer sketch. Here is another example of a parametric curve. See Figure 1.6b
for the sketch.
\[ \mathbf{r}_2(t) = (\sin t)\mathbf{i} + (\cos t)\mathbf{j} + (\sin 2t)\mathbf{k}. \]

Figure 1.6: Sketches of two parametric curves: (a) sketch of r1 (t), a helix; (b) sketch of r2 (t)



1.5.2 Derivatives of Parametric Curves


One reason for using vector notations for a parametric curve is that their derivatives carry
various physical and geometric meanings.
Given a parametric curve r(t) = f (t)i + g(t)j + h(t)k, regarded as the position vector of a
particle at time t, then:
• the first derivative, r′ (t) = f ′ (t)i + g′ (t)j + h′ (t)k, represents the velocity vector of the
particle,

• geometrically, the first derivative r′ (t) (if non-zero) is a tangent vector of the curve,
• the length $|\mathbf{r}'(t)| = \sqrt{(f'(t))^2 + (g'(t))^2 + (h'(t))^2}$ represents the speed of the particle,
and
• the second derivative, r′′ (t) = f ′′ (t)i + g′′ (t)j + h′′ (t)k, represents the acceleration vector
of the particle.

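■ Example 1.5 Find the velocity, speed and acceleration of the particle whose path is:
\[ \mathbf{r}(t) = (\sin t)\mathbf{i} + (t^2 - \cos t)\mathbf{j} + e^t\mathbf{k}. \]

■ Solution
\begin{align*}
\text{velocity} = \mathbf{r}'(t) &= (\sin t)'\mathbf{i} + (t^2 - \cos t)'\mathbf{j} + (e^t)'\mathbf{k} \\
&= (\cos t)\mathbf{i} + (2t + \sin t)\mathbf{j} + e^t\mathbf{k} \\
\text{speed} = |\mathbf{r}'(t)| &= \sqrt{(\cos t)^2 + (2t + \sin t)^2 + (e^t)^2} \\
&= \sqrt{\cos^2 t + 4t^2 + 4t\sin t + \sin^2 t + e^{2t}} \\
&= \sqrt{1 + 4t^2 + 4t\sin t + e^{2t}} \\
\text{acceleration} = \mathbf{r}''(t) &= \frac{d}{dt}\mathbf{r}'(t) \\
&= (\cos t)'\mathbf{i} + (2t + \sin t)'\mathbf{j} + (e^t)'\mathbf{k} \\
&= (-\sin t)\mathbf{i} + (2 + \cos t)\mathbf{j} + e^t\mathbf{k}.
\end{align*}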
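One way to sanity-check the derivatives in Example 1.5 is to compare them with central finite differences. The sketch below does this in plain Python; the step size `h` and the function names are assumptions of this illustration, and the finite difference is only an approximation of the true derivative.

```python
import math


def r(t):
    """Position r(t) = <sin t, t^2 - cos t, e^t>."""
    return (math.sin(t), t ** 2 - math.cos(t), math.exp(t))


def velocity(t):
    """r'(t) = <cos t, 2t + sin t, e^t>."""
    return (math.cos(t), 2 * t + math.sin(t), math.exp(t))


def acceleration(t):
    """r''(t) = <-sin t, 2 + cos t, e^t>."""
    return (-math.sin(t), 2 + math.cos(t), math.exp(t))


def speed(t):
    """Simplified speed sqrt(1 + 4t^2 + 4t sin t + e^(2t))."""
    return math.sqrt(1 + 4 * t ** 2 + 4 * t * math.sin(t) + math.exp(2 * t))


def central_difference(curve, t, h=1e-5):
    """Approximate the derivative of a vector-valued function componentwise."""
    plus, minus = curve(t + h), curve(t - h)
    return tuple((p - m) / (2 * h) for p, m in zip(plus, minus))


t0 = 0.7
print(central_difference(r, t0), velocity(t0))
```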

Conservation of Angular Momentum


In physics, given a particle with mass m travelling along the path r(t), the following vector is defined
to be the angular momentum about the origin of the particle:

L(t) = r(t) × mr′ (t).

When L(t) is a non-zero constant vector (independent of t), we say that the angular momentum
is conserved. The conservation of angular momentum implies that the path of the particle is
contained in a plane. It can be explained as follows:
By the definition of cross product, the angular momentum L(t) is always orthogonal to r(t)
(and to r′ (t) too, but we do not need this). Therefore, at any time t, we have:

L(t) · r(t) = 0.

Let r(t) = x (t)i + y(t)j + z(t)k. If L(t) is a constant vector, it can be expressed as L =
Ai + Bj + Ck where A, B and C are fixed numbers. Then:

( Ai + Bj + Ck) · ( x (t)i + y(t)j + z(t)k) = 0


Ax (t) + By(t) + Cz(t) = 0.

Therefore, the point ( x (t), y(t), z(t)) lies on the plane Ax + By + Cz = 0, which is the plane with
normal vector L passing through the origin. In other words, the path of the particle is confined
to this plane.
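The argument above can be illustrated numerically. The sketch below uses a hypothetical elliptical orbit r(t) = (cos t) u + (2 sin t) w lying in the plane spanned by two orthonormal vectors u and w. The orbit, the mass `M` and the vectors `U`, `W` are assumptions chosen for illustration, not an example from the notes. For this path L(t) is constant and L · r(t) = 0 at every time.

```python
import math


def cross(a, b):
    a1, a2, a3 = a
    b1, b2, b3 = b
    return (a2 * b3 - a3 * b2, -(a1 * b3 - a3 * b1), a1 * b2 - a2 * b1)


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


M = 1.0                  # mass of the particle (assumed value)
U = (1.0, 0.0, 0.0)      # orthonormal vectors spanning a tilted plane
W = (0.0, 0.6, 0.8)


def r(t):
    """Position on the ellipse cos(t) U + 2 sin(t) W."""
    return tuple(math.cos(t) * u + 2 * math.sin(t) * w for u, w in zip(U, W))


def r_prime(t):
    """Velocity -sin(t) U + 2 cos(t) W."""
    return tuple(-math.sin(t) * u + 2 * math.cos(t) * w for u, w in zip(U, W))


def angular_momentum(t):
    """L(t) = r(t) x (m r'(t))."""
    return cross(r(t), tuple(M * c for c in r_prime(t)))


for t in (0.0, 1.0, 2.5):
    L = angular_momentum(t)
    print(L, dot(L, r(t)))
```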

Product Rules of Differentiating Curves


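When differentiating the dot or cross product of two curves u(t) and v(t), you may apply the
product rule as in single-variable calculus.

Property Given two curves u(t) and v(t), and a scalar function f (t), we have:
1. $\frac{d}{dt}\big( f(t)\mathbf{u}(t) \big) = f'(t)\mathbf{u}(t) + f(t)\mathbf{u}'(t)$
2. $\frac{d}{dt}\big( \mathbf{u}(t) \cdot \mathbf{v}(t) \big) = \mathbf{u}'(t) \cdot \mathbf{v}(t) + \mathbf{u}(t) \cdot \mathbf{v}'(t)$
3. $\frac{d}{dt}\big( \mathbf{u}(t) \times \mathbf{v}(t) \big) = \mathbf{u}'(t) \times \mathbf{v}(t) + \mathbf{u}(t) \times \mathbf{v}'(t)$.
In the third rule, the order of the factors must be preserved because the cross product is not
commutative.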

Here is a good example on the use of one of the above product rules.

■ Example 1.6 Suppose r(t) represents the path of a particle travelling at uniform speed C. Show
that its velocity and acceleration vectors are always orthogonal.

■ Solution The particle is travelling at uniform speed C. Therefore, |r′ (t)| ≡ C. We want to
show that r′ (t) · r′′ (t) ≡ 0, so it is natural to differentiate |r′ (t)| with respect to t so that the
RHS vanishes and the LHS may be related to r′′ (t).
However, |r′ (t)| is in the form of a square root, so it is cumbersome to differentiate.
Instead, we differentiate $|\mathbf{r}'(t)|^2 = C^2$ using the fact that $|\mathbf{r}'(t)|^2 = \mathbf{r}'(t) \cdot \mathbf{r}'(t)$:
\begin{align*}
|\mathbf{r}'(t)|^2 &= C^2 \\
\mathbf{r}'(t) \cdot \mathbf{r}'(t) &= C^2 \\
\frac{d}{dt}\big( \mathbf{r}'(t) \cdot \mathbf{r}'(t) \big) &= 0 \\
\mathbf{r}''(t) \cdot \mathbf{r}'(t) + \mathbf{r}'(t) \cdot \mathbf{r}''(t) &= 0 \\
2\,\mathbf{r}'(t) \cdot \mathbf{r}''(t) &= 0.
\end{align*}

Therefore, we have r′ (t) · r′′ (t) = 0, as desired.
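As a concrete check of Example 1.6, the helix r(t) = (cos t)i + (sin t)j + tk travels at constant speed √2, so its velocity and acceleration should be orthogonal at every time. The Python sketch below verifies this at a few sample times. The choice of curve and the function names are assumptions of this illustration.

```python
import math


def velocity(t):
    """r'(t) for the helix r(t) = <cos t, sin t, t>."""
    return (-math.sin(t), math.cos(t), 1.0)


def acceleration(t):
    """r''(t) for the same helix."""
    return (-math.cos(t), -math.sin(t), 0.0)


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def speed(t):
    v = velocity(t)
    return math.sqrt(dot(v, v))


for t in (0.0, 0.5, 2.0):
    # The speed stays at sqrt(2) and the dot product stays at zero.
    print(speed(t), dot(velocity(t), acceleration(t)))
```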

1.5.3 Arc Lengths of Curves


For a particle travelling at uniform speed, the distance (i.e. arc length) travelled is simply:
distance = speed × time elapsed.
This is comparable to the area of a rectangle: height × width. However, if one is asked to calculate
the area under a curve y = f ( x ), a ≤ x ≤ b, one should consider the integral
\[ \int_a^b y\,dx = \int_a^b f(x)\,dx \]
under the rationale of Riemann sums: $\int_a^b y\,dx \approx \sum_i y_i \Delta x_i$.
By the same token, if the particle is not travelling at uniform speed, one should calculate
the distance by integration:
\[ \text{distance} = \int (\text{speed})\; d(\text{time}), \]
as compared to $\text{area} = \int \text{height}\; d(\text{width})$. Precisely, we have:
Theorem 1.4 Given a parametric curve r(t), the arc length of the curve from the point at
t = a to the point at t = b is given by:
\[ \text{arc length} = \int_a^b |\mathbf{r}'(t)|\,dt. \tag{1.4} \]

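■ Example 1.7 Find the arc length of the curve:
\[ \mathbf{r}(t) = \frac{1}{2}t^2\,\mathbf{i} + \frac{2\sqrt{2}}{3}t^{3/2}\,\mathbf{j} + t\,\mathbf{k} \]
from (0, 0, 0) to $\left(2, \frac{8}{3}, 2\right)$.

■ Solution It is simple to verify that the initial point corresponds to t = 0 since r(0) = ⟨0, 0, 0⟩,
whereas the final point corresponds to t = 2 since $\mathbf{r}(2) = \left\langle 2, \frac{8}{3}, 2 \right\rangle$.
\begin{align*}
\mathbf{r}'(t) &= \left\langle t, \sqrt{2}\,t^{1/2}, 1 \right\rangle \\
|\mathbf{r}'(t)| &= \sqrt{t^2 + 2t + 1} \\
&= \sqrt{(t + 1)^2} \\
&= t + 1 \qquad \text{(note that } t + 1 > 0 \text{ in our case)}
\end{align*}

The desired arc length is given by:
\begin{align*}
\int_0^2 |\mathbf{r}'(t)|\,dt &= \int_0^2 (t + 1)\,dt \\
&= \left[ \frac{t^2}{2} + t \right]_0^2 \\
&= 4.
\end{align*}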
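Formula (1.4) can also be evaluated numerically, which gives an independent check of the integral in Example 1.7. The sketch below uses the composite Simpson's rule written by hand. The function names and the number of subintervals `n` are assumptions of this illustration. For this curve the integrand is |r′(t)| = t + 1, so the result should be very close to [t²/2 + t] evaluated from 0 to 2, which is 4.

```python
import math


def speed(t):
    """|r'(t)| for r'(t) = <t, sqrt(2) t^(1/2), 1>."""
    dx, dy, dz = t, math.sqrt(2.0 * t), 1.0
    return math.sqrt(dx * dx + dy * dy + dz * dz)


def simpson(f, a, b, n=200):
    """Composite Simpson's rule on [a, b] with an even number n of subintervals."""
    if n % 2 != 0:
        raise ValueError("n must be even")
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 == 1 else 2) * f(a + i * h)
    return total * h / 3


arc_length = simpson(speed, 0.0, 2.0)
print(arc_length)
```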

1.5.4 Arc-Length Parametrization


Let’s begin our discussion by considering the two curves:

r1 (t) = (cos t)i + (sin t)j + tk, 0 ≤ t ≤ 2π

r2 (t) = (cos 2t)i + (sin 2t)j + 2tk, 0 ≤ t ≤ π

If you plot them using Mathematica, you will find that these two curves are the same,
although their speeds are different. The curve r2 (t) is obtained by replacing every t in r1 (t) by
2t. The initial and final times are adjusted so that the end-points of both r1 and r2 are (1, 0, 0)
and (1, 0, 2π ). We say r2 is a reparametrization of r1 .
If r(s) is a parametric curve such that |r′ (s)| = 1 for any s, we say the curve is parametrized
by arc-length. For such a parametrization, it is conventional to use s to denote the parameter.
Given a parametric curve r(t), in theory one can reparametrize the curve by arc-length, such
that with the new parameter s, the curve r(s) travels at unit speed. To find the arc-length
parametrization, you may follow the procedure:
1. Given a curve $\mathbf{r}(t) : [a, b] \to \mathbb{R}^3$, compute the following integral:
\[ s = \int_a^t |\mathbf{r}'(\tau)|\,d\tau. \]
2. Since the upper limit of the above integral is t, the value s is a function of t.
Express t in terms of s whenever it is possible, so that t is a function of s, i.e. t = t(s).
3. Finally, replace all t’s by this function of s in the curve r(t).
The new parametrization $\tilde{\mathbf{r}}(s)$ will be arc-length parametrized. Let’s see some examples before
we learn why it works:

■ Example 1.8 Find the arc-length parametrization of the curve:

r(t) = (cos t)i + (sin t)j + tk, t ∈ [0, 2π ].

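■ Solution By straightforward computations, we get
\[ |\mathbf{r}'(t)| = \sqrt{2}. \]
Therefore,
\[ s(t) = \int_0^t |\mathbf{r}'(\tau)|\,d\tau = \int_0^t \sqrt{2}\,d\tau = \sqrt{2}\,t. \]
Expressing t in terms of s, we get $t = \frac{s}{\sqrt{2}}$. Replacing all t’s in r(t) by $\frac{s}{\sqrt{2}}$, we get an arc-length
parametrization:
\[ \tilde{\mathbf{r}}(s) = \left( \cos\frac{s}{\sqrt{2}} \right)\mathbf{i} + \left( \sin\frac{s}{\sqrt{2}} \right)\mathbf{j} + \frac{s}{\sqrt{2}}\,\mathbf{k}. \]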

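■ Example 1.9 Find the arc-length parametrization of the curve:
\[ \mathbf{r}(t) = \frac{1}{2}t^2\,\mathbf{i} + \frac{2\sqrt{2}}{3}t^{3/2}\,\mathbf{j} + t\,\mathbf{k}, \qquad t \geq 0. \]

■ Solution By straightforward computations (and simplification), we get:
\[ |\mathbf{r}'(t)| = t + 1. \]
Consider:
\[ s = \int_0^t |\mathbf{r}'(\tau)|\,d\tau = \int_0^t (\tau + 1)\,d\tau = \frac{t^2}{2} + t. \]
To solve for t in terms of s, we rewrite this as the quadratic equation $t^2 + 2t - 2s = 0$ and apply
the quadratic formula, taking the non-negative root since t ≥ 0:
\[ t = \frac{-2 + \sqrt{4 + 8s}}{2} = -1 + \sqrt{1 + 2s}. \]
Finally, replacing all t’s in r(t) by this function of s, we get an arc-length parametrization:
\[ \tilde{\mathbf{r}}(s) = \frac{1}{2}\left( -1 + \sqrt{1 + 2s} \right)^2 \mathbf{i} + \frac{2\sqrt{2}}{3}\left( -1 + \sqrt{1 + 2s} \right)^{3/2} \mathbf{j} + \left( -1 + \sqrt{1 + 2s} \right)\mathbf{k}. \]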

To see why this procedure gives an arc-length parametrization, we need to show $|\tilde{\mathbf{r}}'(s)| = 1$.
Note that we relabel the parametrization r as $\tilde{\mathbf{r}}$ just to avoid confusion. Rigorously, they are
related by $\tilde{\mathbf{r}}(s) = \mathbf{r}(t(s))$, regarding t as a function of s.
We first use the chain rule:
\[ \tilde{\mathbf{r}}'(s) = \frac{d\,\mathbf{r}(t(s))}{ds} = \frac{d\mathbf{r}}{dt}\,\frac{dt}{ds} = \mathbf{r}'(t)\,\frac{dt}{ds}. \]
Recall that s is defined to be $s = \int_a^t |\mathbf{r}'(\tau)|\,d\tau$. The Fundamental Theorem of Calculus tells us
that $\frac{ds}{dt} = |\mathbf{r}'(t)|$ and so,
\[ \frac{dt}{ds} = \frac{1}{\;\frac{ds}{dt}\;} = \frac{1}{|\mathbf{r}'(t)|}. \]
Therefore, we have:
\[ |\tilde{\mathbf{r}}'(s)| = |\mathbf{r}'(t)| \cdot \frac{1}{|\mathbf{r}'(t)|} = 1. \]
The parametrization $\tilde{\mathbf{r}}(s)$ has unit speed, and hence is an arc-length parametrization.
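The unit-speed property can be checked numerically for the two arc-length parametrizations found in Examples 1.8 and 1.9. The Python sketch below approximates the derivative with a central difference. The function names and the step size `h` are assumptions of this illustration, and the result is only approximately 1 because of the finite-difference error.

```python
import math


def helix_by_arc_length(s):
    """Example 1.8: r(t) = <cos t, sin t, t> with t = s / sqrt(2)."""
    t = s / math.sqrt(2.0)
    return (math.cos(t), math.sin(t), t)


def example_1_9_by_arc_length(s):
    """Example 1.9: r(t) = <t^2/2, (2 sqrt(2)/3) t^(3/2), t> with t = -1 + sqrt(1 + 2s)."""
    t = -1.0 + math.sqrt(1.0 + 2.0 * s)
    return (0.5 * t ** 2, (2.0 * math.sqrt(2.0) / 3.0) * t ** 1.5, t)


def numerical_speed(curve, s, h=1e-6):
    """Approximate |curve'(s)| with a central difference."""
    plus, minus = curve(s + h), curve(s - h)
    derivative = tuple((p - m) / (2 * h) for p, m in zip(plus, minus))
    return math.sqrt(sum(c * c for c in derivative))


for s in (0.5, 1.0, 3.0):
    print(numerical_speed(helix_by_arc_length, s),
          numerical_speed(example_1_9_by_arc_length, s))
```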

Remark Although the above procedure of finding an arc-length parametrization works in the two
examples we have seen, in general it may be hard to find an arc-length parametrization,
since both steps (integration and solving for t in terms of s) can be difficult if the given
curve r(t) is not nice.

1.5.5 Curvature
Curvature is a quantity that measures the sharpness of a curve, and is closely related to the
acceleration. Imagine you are driving a car along a curved road. On a sharp turn, the force
exerted on your body is proportional to the acceleration according to Newton’s Second Law.
Therefore, given a parametric curve r(t), the magnitude of the acceleration |r′′ (t)| somewhat
reflects the sharpness of the path: the sharper the turn, the larger the |r′′ (t)|.
However, the magnitude |r′′ (t)| is affected not only by the sharpness of the curve, but also
by how fast you drive. In order to give a fair and standardized measurement of sharpness, we
need to get an arc-length parametrization r(s) so that the “car” travels at unit speed.
Definition 1.5 — Curvature. Given a curve γ in $\mathbb{R}^2$ or $\mathbb{R}^3$ which can be arc-length parametrized
by r(s), then its curvature is a function of s defined as:

\[ \kappa(s) := |\mathbf{r}''(s)|. \]

■ Example 1.10 Find the curvature of the circle of radius R centered at the origin (0, 0) in $\mathbb{R}^2$.

■ Solution The circle of radius R centered at the origin (0, 0) on the xy-plane can be
parametrized by:
r(t) = ( R cos t, R sin t).
It can be easily verified that |r′ (t)| = R and so r(t) is not an arc-length parametrization unless R = 1.
To find an arc-length parametrization, we let:
\[ s(t) = \int_0^t |\mathbf{r}'(\tau)|\,d\tau = \int_0^t R\,d\tau = Rt. \]
Therefore, $t(s) = \frac{s}{R}$ as a function of s and so an arc-length parametrization of the circle is:
\[ \tilde{\mathbf{r}}(s) = \left( R\cos\frac{s}{R},\; R\sin\frac{s}{R} \right). \]
To find its curvature, we compute:
\begin{align*}
\tilde{\mathbf{r}}'(s) &= \frac{d}{ds}\left( R\cos\frac{s}{R},\; R\sin\frac{s}{R} \right) = \left( -\sin\frac{s}{R},\; \cos\frac{s}{R} \right) \\
\tilde{\mathbf{r}}''(s) &= \left( -\frac{1}{R}\cos\frac{s}{R},\; -\frac{1}{R}\sin\frac{s}{R} \right) \\
\kappa(s) &= |\tilde{\mathbf{r}}''(s)| = \frac{1}{R}.
\end{align*}
Thus the curvature of the circle is given by $\frac{1}{R}$, i.e. the larger the circle, the smaller the
curvature.
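The result κ = 1/R can be checked numerically by approximating the second derivative of the arc-length parametrization with a second-order central difference. This is a sketch under stated assumptions: the function names and the step size `h` are illustrative, and the output agrees with 1/R only up to finite-difference and round-off error.

```python
import math


def circle_by_arc_length(radius):
    """Arc-length parametrization s -> (R cos(s/R), R sin(s/R))."""
    def curve(s):
        return (radius * math.cos(s / radius), radius * math.sin(s / radius))
    return curve


def numerical_curvature(curve, s, h=1e-4):
    """Approximate |curve''(s)| for an arc-length parametrized curve."""
    plus, mid, minus = curve(s + h), curve(s), curve(s - h)
    second = tuple((p - 2 * c + m) / (h * h) for p, c, m in zip(plus, mid, minus))
    return math.sqrt(sum(x * x for x in second))


for radius in (0.5, 1.0, 3.0):
    kappa = numerical_curvature(circle_by_arc_length(radius), s=0.8)
    print(radius, kappa, 1.0 / radius)
```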
2 — Partial Differentiations

“As you will find in multivariable


calculus, there is often a number of
solutions for any given problem.”

John Nash

2.1 Functions of Several Variables


As the name implies, a function of several variables (or a multivariable function) is a function
which depends on several quantities. Examples of which include:

volume of a cylinder = $\pi r^2 h$

where r is the radius of the cylinder and h is the height.


Symbolically, we denote the function for the volume of a cylinder by V (r, h), which indicates
V depends on both r and h. We can write:

$V(r, h) = \pi r^2 h.$

In this chapter, we will extend theory and applications of single-variable differentiation to


multivariable differentiation. Multivariable integration will be discussed in the next chapter.

2.1.1 Domains of Functions


An input of a function f ( x, y) of two variables involves two quantities x and y. Each input
is then represented by a point ( x, y) in $\mathbb{R}^2$. As in single-variable calculus, the domain of a
function is the set of allowable inputs. For instance, the function $f(x, y) = \sqrt{y - x^2}$ is defined
only when $y \geq x^2$. The domain of this function is given by:

\[ D = \{ (x, y) : y \geq x^2 \} \]

which is the region on or above the parabola $y = x^2$ in $\mathbb{R}^2$ (the parabola is included).


The function $g(x, y) = \frac{1}{x^2 + y^2}$ is defined everywhere on $\mathbb{R}^2$ except the origin (0, 0).
Therefore the domain of g is given by {( x, y) : x ≠ 0 or y ≠ 0}. In short, we can write this set as
$\mathbb{R}^2 \setminus \{(0, 0)\}$, meaning $\mathbb{R}^2$ with the origin removed.

The function $h(x, y) = \frac{1}{xy}$ is undefined when xy = 0, or equivalently when at least one of x
and y is zero. We can write its domain as {( x, y) : x ≠ 0 and y ≠ 0}. Geometrically, it is $\mathbb{R}^2$ with
both x- and y-axes removed.
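The three domains discussed in this subsection can be expressed as simple membership tests. The Python sketch below is only an illustration, and the predicate names are assumptions of this sketch. Note how the domain of g excludes a single point while the domain of h excludes two whole lines.

```python
def in_domain_f(x, y):
    """f(x, y) = sqrt(y - x^2) needs y - x^2 >= 0."""
    return y >= x ** 2


def in_domain_g(x, y):
    """g(x, y) = 1 / (x^2 + y^2) is undefined only at the origin."""
    return x != 0 or y != 0


def in_domain_h(x, y):
    """h(x, y) = 1 / (x y) is undefined on both coordinate axes."""
    return x != 0 and y != 0


# The point (0, 1) lies on the y-axis: it is in the domain of g but not of h.
print(in_domain_g(0, 1), in_domain_h(0, 1))
```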

2.1.2 Graphs of Two-Variable Functions


In single-variable calculus, we visualize a function y = f ( x ) through its graph. The horizontal
x-axis stands for the inputs, and the height of the graph above x represents the output f ( x ).
Many concepts in single-variable calculus, such as derivatives, integrals, critical points, etc. are
introduced using the graph of a function.
For functions of two variables, i.e. f ( x, y), the graph is no longer a curve in $\mathbb{R}^2$, but a surface
in $\mathbb{R}^3$. The inputs involve two variables x and y, and are represented by points on the xy-plane.
The value of the function f ( x, y) is now represented by the height z of the surface above the
point ( x, y).

Figure 2.1: value of f ( x0 , y0 ) is the height of the surface above the point ( x0 , y0 , 0)

Figure 2.2: graphs of two-variable functions: (a) the graph of $z = x^2 - y^2$; (b) the graph of $z = x^2 + y^2$



2.1.3 Level Set Diagrams


Another common way to visualize a two-variable function is through its level sets:
Definition 2.1 — Level Sets. Given a function $f(x_1, \ldots, x_n) : \mathbb{R}^n \to \mathbb{R}$, a level set of the
function f is a subset of $\mathbb{R}^n$ of the form:

f ( x1 , . . . , x n ) = c

where c is a constant.
Consider $f(x, y) = x^2 + y^2$, which is a function from $\mathbb{R}^2$ to $\mathbb{R}$. An example of a level set of f
is $x^2 + y^2 = 1$ (that is, f ( x, y) = 1), which is the unit circle in $\mathbb{R}^2$ centered at the origin. By
taking c to be different values, we get several level sets on the plane. They are circles centered
at the origin with varying radii depending on the value of c chosen:

c = 0    $x^2 + y^2 = 0$    the origin only
c = 1    $x^2 + y^2 = 1$    radius = 1
c = 2    $x^2 + y^2 = 2$    radius = $\sqrt{2}$
c = 3    $x^2 + y^2 = 3$    radius = $\sqrt{3}$

The level set diagram of the two-variable function f ( x, y) consists of some representative
level sets of the function on $\mathbb{R}^2$. See Figure 2.3.
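For the function f(x, y) = x² + y², the level set f(x, y) = c with c > 0 is a circle whose radius is the square root of c. The Python sketch below samples points on such a level set and confirms that f takes the value c at each of them. The helper names and the sampling scheme are assumptions of this illustration.

```python
import math


def f(x, y):
    return x * x + y * y


def level_set_radius(c):
    """Radius of the level set x^2 + y^2 = c, or None when the level set is empty."""
    if c < 0:
        return None
    return math.sqrt(c)


def sample_level_set(c, count=8):
    """Return `count` evenly spaced points on the level set f(x, y) = c."""
    radius = level_set_radius(c)
    if radius is None:
        return []
    return [(radius * math.cos(2 * math.pi * k / count),
             radius * math.sin(2 * math.pi * k / count))
            for k in range(count)]


for c in (1, 2, 3):
    print(c, level_set_radius(c))
```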

Figure 2.3: level set diagram of f ( x, y) = x2 + y2
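The relation between the constant $c$ and the radius of the level circle can be checked numerically. The following is a minimal sketch, not part of the original notes; the helper name `level_set_points` is hypothetical, and it assumes $c \geq 0$.

```python
import math

# f(x, y) = x^2 + y^2, the function used in the text.
def f(x, y):
    return x**2 + y**2

def level_set_points(c, n=8):
    """Return n sample points on the level set f = c (assumes c >= 0)."""
    r = math.sqrt(c)  # x^2 + y^2 = c is a circle of radius sqrt(c)
    return [(r * math.cos(2 * math.pi * k / n),
             r * math.sin(2 * math.pi * k / n)) for k in range(n)]

for c in [0, 1, 2, 3]:
    pts = level_set_points(c)
    # every sampled point should satisfy f(x, y) = c up to rounding
    assert all(abs(f(x, y) - c) < 1e-12 for x, y in pts)
    print(f"c = {c}: radius = {math.sqrt(c):.4f}")
```

Doubling $c$ therefore does not double the radius; the circles bunch closer together as $c$ grows, which matches the spacing seen in a level set diagram with equally spaced values of $c$.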

For three-variable functions f ( x, y, z), we do not attempt to visualize its graph, but we can
visualize its level set diagram. The former requires the fourth dimension while the latter can
be visualized in R3 . A generic level set of a three-variable function f ( x, y, z) is a surface in R3
(see Figures 2.4ab).
To summarize, the graphs and level sets of functions of several variables are:
Function             Graph                                             Level sets
$f(x)$               $y = f(x)$ is a curve in $\mathbb{R}^2$           $f(x) = c$ is generically a point on $\mathbb{R}$
$f(x, y)$            $z = f(x, y)$ is a surface in $\mathbb{R}^3$      $f(x, y) = c$ is generically a curve on $\mathbb{R}^2$
$f(x, y, z)$         cannot visualize                                  $f(x, y, z) = c$ is generically a surface in $\mathbb{R}^3$
$f(x, y, z, w)$      cannot visualize                                  cannot visualize

(a) some level sets of x2 + y2 + z2 (b) some level sets of x3 + y2 − z2

Figure 2.4: level sets of three-variable functions

2.1.4 Continuous Functions


The concept of continuity plays an important role in single variable calculus since many
theorems require continuity as one of the conditions. For multivariable functions, the notion of
continuity is formally defined as follows: a function f ( x, y) is continuous at ( x0 , y0 ) if:
1. ( x0 , y0 ) is in the domain of f ( x, y); and
2. for any $\varepsilon > 0$, there exists $\delta > 0$ such that whenever $(x, y)$ is in the domain of $f$ and

\[ \sqrt{(x - x_0)^2 + (y - y_0)^2} < \delta, \]

we have $|f(x, y) - f(x_0, y_0)| < \varepsilon$.
Don’t panic if you cannot understand this definition at this moment! We will not deal with this
definition in this course, but instead we learn continuity through some examples. The rigorous
approach of dealing with continuity will be covered systematically in MATH 2033 and MATH
3033. In this course, we will use the following facts about continuity without proof:
1. Any polynomial such as f ( x, y) = x2 + xy + y5 is continuous at every ( x0 , y0 ) in R2 (we
can also say it is continuous on R2 ).
2. $\sin x$, $\cos x$, $e^x$, $|x|$ are all continuous everywhere.
3. $\ln x$, $\tan x$, $\sqrt{x}$ are continuous on their domains.
4. The sum, difference and product of two continuous functions are all continuous.
5. The quotient $\frac{f(x,y)}{g(x,y)}$ of two continuous functions $f(x, y)$ and $g(x, y)$ is continuous at $(x_0, y_0)$ whenever $g(x_0, y_0) \neq 0$. For instance, the function

\[ \frac{x^2 - y^2}{x^2 + y^2} \]

is continuous on $\mathbb{R}^2 \setminus \{(0, 0)\}$.


6. The composition $f \circ g$ of two continuous functions $f(t)$ and $g(x, y)$ is continuous on the domain of $f \circ g$. For example, the functions

\[ e^{x - y}, \qquad \cos\left( \frac{x}{1 + x^2 + y^2} \right) \]

are continuous on $\mathbb{R}^2$, whereas the functions

\[ \sqrt{y - x^2}, \qquad \frac{1}{xy} \]

are continuous on their domains.



2.2 Partial Derivatives


2.2.1 First Derivatives
Given a multivariable function such as f ( x, y), one can talk about derivatives with respect to
both variables x and y. Taking partial derivatives means differentiating f ( x, y) with respect to
one of the variables while keeping the other variables fixed.
Definition 2.2 — Partial Derivatives. Given a multivariable function $f(x, y)$, we define

\[ \frac{\partial f}{\partial x}(x, y) := \text{the derivative of } f(x, y) \text{ with respect to } x \text{, regarding } y \text{ as constant} = \lim_{h \to 0} \frac{f(x + h, y) - f(x, y)}{h}; \]

\[ \frac{\partial f}{\partial y}(x, y) := \text{the derivative of } f(x, y) \text{ with respect to } y \text{, regarding } x \text{ as constant} = \lim_{h \to 0} \frac{f(x, y + h) - f(x, y)}{h}. \]

Notation Alternatively, we sometimes denote $\frac{\partial f}{\partial x}$ by $f_x$, and $\frac{\partial f}{\partial y}$ by $f_y$. Note also that we do not use $f'(x, y)$ for multivariable functions, since it is ambiguous whether it means $f_x$ or $f_y$.

Computations of partial derivatives are as easy as single-variable derivatives, as illustrated in the following example:

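■ Example 2.1 Find $\frac{\partial f}{\partial x}$ and $\frac{\partial f}{\partial y}$ for the function:

\[ f(x, y) = x^2 \sin(xy). \]

■ Solution To calculate $\frac{\partial f}{\partial x}$, we regard $y$ as a constant:

\[ \begin{aligned}
\frac{\partial f}{\partial x} &= \frac{\partial}{\partial x} \left( x^2 \sin(xy) \right) && \text{(regarding $y$ as constant)} \\
&= \frac{\partial x^2}{\partial x} \sin(xy) + x^2 \frac{\partial}{\partial x} \sin(xy) && \text{(product rule)} \\
&= 2x \sin(xy) + x^2 \cos(xy) \cdot \frac{\partial}{\partial x}(xy) \\
&= 2x \sin(xy) + x^2 y \cos(xy).
\end{aligned} \]

Similarly, to calculate $\frac{\partial f}{\partial y}$, regard $x$ as a constant:

\[ \begin{aligned}
\frac{\partial f}{\partial y} &= \frac{\partial}{\partial y} \left( x^2 \sin(xy) \right) \\
&= x^2 \frac{\partial}{\partial y} \sin(xy) && \text{(here $x^2$ is regarded as a constant)} \\
&= x^2 \cos(xy) \cdot \frac{\partial}{\partial y}(xy) \\
&= x^2 \cos(xy) \cdot x \\
&= x^3 \cos(xy).
\end{aligned} \]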
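The limit definition of the partial derivatives also gives a way to sanity-check such computations numerically. The following is a rough sketch, not part of the original notes: it approximates each limit by a central difference with a small step `h` (the helper names `partial_x` and `partial_y` are hypothetical) and compares with the formulas from Example 2.1.

```python
import math

def f(x, y):
    return x**2 * math.sin(x * y)

def partial_x(g, x, y, h=1e-6):
    # central difference in x, holding y fixed
    return (g(x + h, y) - g(x - h, y)) / (2 * h)

def partial_y(g, x, y, h=1e-6):
    # central difference in y, holding x fixed
    return (g(x, y + h) - g(x, y - h)) / (2 * h)

def fx_exact(x, y):
    return 2 * x * math.sin(x * y) + x**2 * y * math.cos(x * y)

def fy_exact(x, y):
    return x**3 * math.cos(x * y)

x0, y0 = 1.0, math.pi
print(partial_x(f, x0, y0), fx_exact(x0, y0))  # both close to -pi
print(partial_y(f, x0, y0), fy_exact(x0, y0))  # both close to -1
```

The agreement is only approximate, since a finite step `h` replaces the limit $h \to 0$.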

i Similar to single-variable calculus, to evaluate $f_x$ at a fixed point, say $(x, y) = (1, \pi)$, one should perform the differentiation first, and then substitute $(x, y) = (1, \pi)$ into the derivative. Not the other way round! For example,

\[ f_x(1, \pi) = \left. \left( 2x \sin(xy) + x^2 y \cos(xy) \right) \right|_{(x,y)=(1,\pi)} = 2 \cdot 1 \cdot \sin \pi + 1^2 \cdot \pi \cos \pi = -\pi. \]

Geometric interpretation of partial derivatives


The geometric meaning of the partial derivative $f_x$ is illustrated in Figure 2.5. By keeping $y$ constant (say $y = b$) and letting $x$ vary, the path traced on the surface is the curve of intersection between the plane $y = b$ and the graph of the function. This curve is sometimes called an $x$-curve. The partial derivative $f_x(a, b)$ is the slope of the tangent to this $x$-curve at $(a, b)$:

Figure 2.5: geometric interpretation of $\frac{\partial f}{\partial x}$

Partial derivatives of functions with more than two variables


Partial derivatives of functions with more than two variables, say $f(x, y, z)$, are defined in an analogous way. For instance, $\frac{\partial f}{\partial x}$ is the derivative with respect to $x$ regarding all other variables, i.e. $y$ and $z$, as constant. Although it is not easy to interpret $\frac{\partial f}{\partial x}$ geometrically, since the graph of $f(x, y, z)$ sits inside a 4-dimensional space, the way to compute $\frac{\partial f}{\partial x}$, $\frac{\partial f}{\partial y}$ and $\frac{\partial f}{\partial z}$ is exactly the same as for two-variable functions.
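■ Example 2.2 Given $f(x, y, z) = e^{x^2 + y^3 + xyz}$. Compute $\frac{\partial f}{\partial z}$.

■ Solution

\[ \begin{aligned}
\frac{\partial f}{\partial z} &= \frac{\partial}{\partial z} e^{x^2 + y^3 + xyz} \\
&= e^{x^2 + y^3 + xyz} \cdot \frac{\partial (x^2 + y^3 + xyz)}{\partial z} && \text{(chain rule)} \\
&= e^{x^2 + y^3 + xyz} \cdot (0 + 0 + xy) \\
&= xy \, e^{x^2 + y^3 + xyz}.
\end{aligned} \]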

2.2.2 Second Derivatives


As in single-variable calculus, one can also talk about second derivatives for multivariable
functions. Given a two-variable function f ( x, y), its first partial derivatives f x and f y are also
functions of x and y. Therefore, we can further differentiate them with respect to either x or y.

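■ Example 2.3 Let $f(x, y) = 3x^4 y - 2xy + 5xy^3$. Compute all first and second partial derivatives.

■ Solution It is easy to see that

\[ \frac{\partial f}{\partial x} = 12x^3 y - 2y + 5y^3, \qquad \frac{\partial f}{\partial y} = 3x^4 - 2x + 15xy^2. \]

Then, the second derivatives are:

\[ \begin{aligned}
\frac{\partial}{\partial x}\left( \frac{\partial f}{\partial x} \right) &= \frac{\partial}{\partial x}(12x^3 y - 2y + 5y^3) = 36x^2 y \\
\frac{\partial}{\partial y}\left( \frac{\partial f}{\partial x} \right) &= \frac{\partial}{\partial y}(12x^3 y - 2y + 5y^3) = 12x^3 - 2 + 15y^2 \\
\frac{\partial}{\partial x}\left( \frac{\partial f}{\partial y} \right) &= \frac{\partial}{\partial x}(3x^4 - 2x + 15xy^2) = 12x^3 - 2 + 15y^2 \\
\frac{\partial}{\partial y}\left( \frac{\partial f}{\partial y} \right) &= \frac{\partial}{\partial y}(3x^4 - 2x + 15xy^2) = 30xy.
\end{aligned} \]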

 
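Notation Since it is a bit clumsy to write $\frac{\partial}{\partial y}\left( \frac{\partial f}{\partial x} \right)$ every time, we can use the following short-hand:

\[ \frac{\partial^2 f}{\partial x^2} = \frac{\partial}{\partial x}\left( \frac{\partial f}{\partial x} \right), \qquad \frac{\partial^2 f}{\partial x \partial y} = \frac{\partial}{\partial x}\left( \frac{\partial f}{\partial y} \right), \]

\[ \frac{\partial^2 f}{\partial y \partial x} = \frac{\partial}{\partial y}\left( \frac{\partial f}{\partial x} \right), \qquad \frac{\partial^2 f}{\partial y^2} = \frac{\partial}{\partial y}\left( \frac{\partial f}{\partial y} \right). \]

Similarly for the subscript notations, we write $f_{xx} = (f_x)_x$ and $f_{xy} = (f_x)_y$. The latter means to differentiate by $x$ first and then by $y$. Therefore, it is related to the fraction notation by:

\[ f_{xy} = (f_x)_y = \frac{\partial}{\partial y}\left( \frac{\partial f}{\partial x} \right) = \frac{\partial^2 f}{\partial y \partial x}. \]

The above remark seems to suggest that we should be very careful when converting $\frac{\partial^2 f}{\partial y \partial x}$ into the subscript notation $f_{xy}$. The order of $x$ and $y$ needs to be switched in the conversion. However, thanks to the following important theorem, we don't need to worry about this too much, since in many cases, we have $f_{xy} = f_{yx}$.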
Theorem 2.1 — Mixed Partials Theorem. Consider the function f ( x, y), if at least one of the
second partials f xy and f yx exists and is continuous, then we must have f xy = f yx .

Proof. Beyond the scope of this course. To be covered in MATH 2043/3033/3043. ■


Although the theorem requires that $f_{xy}$ or $f_{yx}$ be continuous, most functions we will encounter in this course have continuous second partial derivatives, and so this theorem applies. In Example 2.3, you may have already noticed that $f_{xy}$ and $f_{yx}$ are equal. The Mixed Partials Theorem tells us that it is not a coincidence!
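The equality of the two mixed partials in Example 2.3 can also be observed numerically. The sketch below is an illustration only, not part of the original notes: it takes the first derivatives $f_x$ and $f_y$ computed in Example 2.3, differentiates each of them in the other variable by a central difference, and compares both results with $12x^3 - 2 + 15y^2$. The sample point $(1, 2)$ and the step size are arbitrary choices.

```python
# First partial derivatives of f(x, y) = 3x^4 y - 2xy + 5xy^3 (Example 2.3).
def fx(x, y):
    return 12 * x**3 * y - 2 * y + 5 * y**3

def fy(x, y):
    return 3 * x**4 - 2 * x + 15 * x * y**2

def f_xy(x, y, h=1e-5):
    # differentiate f_x with respect to y (x first, then y)
    return (fx(x, y + h) - fx(x, y - h)) / (2 * h)

def f_yx(x, y, h=1e-5):
    # differentiate f_y with respect to x (y first, then x)
    return (fy(x + h, y) - fy(x - h, y)) / (2 * h)

x0, y0 = 1.0, 2.0
exact = 12 * x0**3 - 2 + 15 * y0**2  # value of both mixed partials at (1, 2)
print(f_xy(x0, y0), f_yx(x0, y0), exact)
```

The two approximations come from genuinely different computations (one differentiates $f_x$ in $y$, the other $f_y$ in $x$), so their agreement is a meaningful check rather than a restatement.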

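■ Example 2.4 Consider the function:

\[ f(x, y) = \sqrt{\frac{e^{\sin x}}{\sqrt{x^{2014} + x^{2012} + 1}}} + \cos(xy). \]

Find the second partial derivative $\frac{\partial}{\partial y}\left( \frac{\partial f}{\partial x} \right)$.

■ Solution Needless to say, it is very tedious and time-consuming to compute $\frac{\partial f}{\partial x}$. However, by the Mixed Partials Theorem, we can try to find $\frac{\partial}{\partial x}\left( \frac{\partial f}{\partial y} \right)$, and if it is continuous, then $\frac{\partial}{\partial y}\left( \frac{\partial f}{\partial x} \right) = \frac{\partial}{\partial x}\left( \frac{\partial f}{\partial y} \right)$.

It is much easier to compute $\frac{\partial}{\partial x}\left( \frac{\partial f}{\partial y} \right)$ since the “monster” term is gone after differentiating the function by $y$:

\[ \begin{aligned}
\frac{\partial f}{\partial y} &= 0 - \sin(xy) \cdot \frac{\partial}{\partial y}(xy) \\
&= -x \sin(xy) \\
\frac{\partial}{\partial x}\left( \frac{\partial f}{\partial y} \right) &= -\frac{\partial}{\partial x}\left( x \sin(xy) \right) \\
&= -\sin(xy) - xy \cos(xy),
\end{aligned} \]

which is a continuous function. Therefore, $f_{yx} = f_{xy}$ and so

\[ \frac{\partial}{\partial y}\left( \frac{\partial f}{\partial x} \right) = \frac{\partial}{\partial x}\left( \frac{\partial f}{\partial y} \right) = -\sin(xy) - xy \cos(xy). \]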

All examples of multivariable functions we have seen so far are “nice”, in a sense that partial
derivatives exist and are continuous at every point in their domains. In this course, we will use
the following terminology:
Definition 2.3 — $C^k$ functions. A multivariable function $f(x, y)$ is said to be $C^0$ on its domain $D$ if it is continuous at every point $(x_0, y_0)$ in $D$. Moreover, a function $f(x, y)$ is said to be $C^k$ on its domain $D$ if all partial derivatives up to and including order $k$ exist and are continuous at every point $(x_0, y_0)$ in $D$.

i In this course, we will not discuss the difficult notion of differentiable functions, which will
be covered in MATH 3033. Meanwhile, please note that a function being C1 is not the
same as saying it is differentiable!

2.3 Chain Rule


In this section, we assume that the partial derivatives of all functions involved exist and are continuous (i.e. $C^k$ for any $k$), so that we do not need to worry about whether we can differentiate the functions.

2.3.1 Multivariable Chain Rule


In single-variable calculus, we apply the chain rule when there is a chain of relations between variables. For example, if $f(x)$ is a function of $x$, and $x$ is in turn a function of $t$, then $f$ is ultimately a function of $t$. The derivative $\frac{df}{dt}$ can be calculated by:

\[ \frac{df}{dt} = \frac{df}{dx} \frac{dx}{dt}. \]

One may represent this chain of relations by the schematic diagram:

[Diagram: $f \to x \to t$, with the segments labelled $\frac{df}{dx}$ and $\frac{dx}{dt}$]

Figure 2.6: the schematic diagram for the chain rule formula $\frac{df}{dt} = \frac{df}{dx} \frac{dx}{dt}$.

For multivariable functions, the relation between variables can be more complicated. For
example, let the function u( x, y, z) be the temperature at the point ( x, y, z) in the space. Suppose
a particle moves along the path

r(t) = x (t)i + y(t)j + z(t)k.

Then, the coordinates x, y and z all depend on t, and so u is ultimately a function of t. See
Figure 2.7 for the tree diagram of the variables.

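[Tree diagram: $u$ branches to $x$, $y$ and $z$ along segments labelled $\frac{\partial u}{\partial x}$, $\frac{\partial u}{\partial y}$ and $\frac{\partial u}{\partial z}$; each of $x$, $y$ and $z$ then leads to $t$ along segments labelled $\frac{dx}{dt}$, $\frac{dy}{dt}$ and $\frac{dz}{dt}$]

Figure 2.7: tree diagram of variables for $u(x, y, z)$.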

The derivative $\frac{du}{dt}$ is the rate of change of the temperature that the particle “feels” as it travels. The multivariable chain rule can be read off from the tree diagram in Figure 2.7:

\[ \frac{du}{dt} = \frac{\partial u}{\partial x} \frac{dx}{dt} + \frac{\partial u}{\partial y} \frac{dy}{dt} + \frac{\partial u}{\partial z} \frac{dz}{dt}. \]

Precisely, to write down the chain rule for $\frac{du}{dt}$, we find all possible paths from $u$ to $t$ in the tree diagram. Each path consists of some segments. A segment, say, from $u$ to $x$ represents the derivative $\frac{\partial u}{\partial x}$. To write down the chain rule, we “multiply” all segments of each path, and “add” up all the paths.

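■ Example 2.5 Let $u(x, y, z) = x^2 + y^2 - 2z^2$ and $\langle x, y, z \rangle = \langle \cos t, \sin t, t \rangle$. Compute $\frac{du}{dt}$.

■ Solution Although it can be done by substituting $x = \cos t$, $y = \sin t$ and $z = t$ into $u(x, y, z) = x^2 + y^2 - 2z^2$ and then computing $\frac{du}{dt}$ directly, let's try to use the chain rule to do it. From the tree diagram (Figure 2.7), we have:

\[ \begin{aligned}
\frac{du}{dt} &= \frac{\partial u}{\partial x} \frac{dx}{dt} + \frac{\partial u}{\partial y} \frac{dy}{dt} + \frac{\partial u}{\partial z} \frac{dz}{dt} \\
&= \underbrace{2x}_{\frac{\partial u}{\partial x}} \cdot \underbrace{(-\sin t)}_{\frac{dx}{dt}} + \underbrace{2y}_{\frac{\partial u}{\partial y}} \cdot \underbrace{\cos t}_{\frac{dy}{dt}} + \underbrace{(-4z)}_{\frac{\partial u}{\partial z}} \cdot \underbrace{1}_{\frac{dz}{dt}} && \text{(calculate each derivative)} \\
&= -2 \cos t \sin t + 2 \sin t \cos t - 4t && \text{(write $x$, $y$ and $z$ in terms of $t$)} \\
&= -4t.
\end{aligned} \]

As a check, direct substitution gives $u = \cos^2 t + \sin^2 t - 2t^2 = 1 - 2t^2$, whose derivative is also $-4t$.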
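The chain rule computation in Example 2.5 can be compared against a direct numerical derivative of $t \mapsto u(x(t), y(t), z(t))$. The following is a minimal sketch, not part of the original notes; the function names are hypothetical. Summing the three products in the chain rule gives $-4t$, and both computations below agree with that value.

```python
import math

def u(x, y, z):
    return x**2 + y**2 - 2 * z**2

def path(t):
    # the curve <cos t, sin t, t> from Example 2.5
    return math.cos(t), math.sin(t), t

def du_dt_chain(t):
    x, y, z = path(t)
    ux, uy, uz = 2 * x, 2 * y, -4 * z            # partial derivatives of u
    dx, dy, dz = -math.sin(t), math.cos(t), 1.0   # derivatives of the path
    return ux * dx + uy * dy + uz * dz            # sum over the three paths in the tree

def du_dt_numeric(t, h=1e-6):
    # central difference of the composite function t -> u(path(t))
    return (u(*path(t + h)) - u(*path(t - h))) / (2 * h)

for t in [0.0, 0.5, 2.0]:
    print(t, du_dt_chain(t), du_dt_numeric(t), -4 * t)
```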

As illustrated in the above example, once the chain rule formula is correctly written according to the tree diagram, the remaining computations are straightforward. From now on, we will investigate how to write down the chain rule under various configurations of variables. The computations will usually be skipped and are left as exercises for readers.

Examples with more diverse variable configurations


Suppose now that the temperature distribution is changing over time as well, i.e. the tem-
perature u( x, y, z, t) is a function of four variables. Again, a particle travels along a path
r(t) = x (t)i + y(t)j + z(t)k. Then, the tree diagram of variables in this case can be drawn as in
Figure 2.8.

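[Tree diagram: $u$ branches to $x$, $y$, $z$ and $t$ along segments labelled $\frac{\partial u}{\partial x}$, $\frac{\partial u}{\partial y}$, $\frac{\partial u}{\partial z}$ and $\frac{\partial u}{\partial t}$; each of $x$, $y$ and $z$ then leads to $t$ along segments labelled $\frac{dx}{dt}$, $\frac{dy}{dt}$ and $\frac{dz}{dt}$]

Figure 2.8: tree diagram of variables for $u(x, y, z, t)$.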

There are four paths from $u$ to $t$, so we expect the chain rule formula for $\frac{du}{dt}$ to consist of four terms:

\[ \frac{du}{dt} = \frac{\partial u}{\partial x} \frac{dx}{dt} + \frac{\partial u}{\partial y} \frac{dy}{dt} + \frac{\partial u}{\partial z} \frac{dz}{dt} + \frac{\partial u}{\partial t}. \]

i Note that $\frac{du}{dt}$ is different from $\frac{\partial u}{\partial t}$. As the particle travels along its path, the temperature the particle “feels” is ultimately a function of $t$, and so the rate of change of temperature is represented by $\frac{du}{dt}$ (using $d$ instead of the partial $\partial$).
On the other hand, the partial $\frac{\partial u}{\partial t}$ is the time derivative of $u$ regarding $x$, $y$ and $z$ as constant! Therefore, it is the rate of change of temperature when the position is fixed! It is not the rate of change of temperature for the moving particle!

Now consider a slightly more complicated example. Let w be a function of x, y and z,


and each of x, y and z is a function of s and t, as illustrated in Figure 2.9. The function w is
ultimately a function of s and t.
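There are three paths from $w$ to $s$, so the chain rule for $\frac{\partial w}{\partial s}$ is given by the following:

\[ \frac{\partial w}{\partial s} = \frac{\partial w}{\partial x} \frac{\partial x}{\partial s} + \frac{\partial w}{\partial y} \frac{\partial y}{\partial s} + \frac{\partial w}{\partial z} \frac{\partial z}{\partial s}. \]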

Figure 2.9: tree diagram

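■ Example 2.6 Express $\frac{\partial w}{\partial s}$ in terms of $s$ and $r$ where:

\[ w = x + 2y + z^2, \qquad x = \frac{r}{s}, \qquad y = r^2 + \ln s, \qquad z = 2r. \]

■ Solution According to the tree diagram in Figure 2.9, the chain rule for $\frac{\partial w}{\partial s}$ is given by:

\[ \begin{aligned}
\frac{\partial w}{\partial s} &= \frac{\partial w}{\partial x} \frac{\partial x}{\partial s} + \frac{\partial w}{\partial y} \frac{\partial y}{\partial s} + \frac{\partial w}{\partial z} \frac{\partial z}{\partial s} \\
&= \underbrace{1}_{w_x} \cdot \underbrace{\left( -\frac{r}{s^2} \right)}_{x_s} + \underbrace{2}_{w_y} \cdot \underbrace{\frac{1}{s}}_{y_s} + \underbrace{2z}_{w_z} \cdot \underbrace{0}_{z_s} \\
&= -\frac{r}{s^2} + \frac{2}{s}.
\end{aligned} \]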

Now suppose $w$ is a function of $z$ only, $z$ is a function of $x$ and $y$, and both $x$ and $y$ are functions of $t$, as illustrated in Figure 2.10. Ultimately, $w$ is a function of $t$. There are two paths from $w$ to $t$ in the tree diagram. Each path consists of three segments. The chain rule for $\frac{dw}{dt}$ is given by:

\[ \frac{dw}{dt} = \frac{dw}{dz} \frac{\partial z}{\partial x} \frac{dx}{dt} + \frac{dw}{dz} \frac{\partial z}{\partial y} \frac{dy}{dt}. \]

2.3.2 Implicit Differentiation: revisited


Given an implicit equation such as:

\[ x^2 + y^3 + \sin^2 y = 1, \]

it is very difficult (possibly impossible) to express $y$ in terms of $x$. In single-variable calculus, we learned how to find $\frac{dy}{dx}$ using implicit differentiation – regard $y$ as a function of $x$, differentiate both sides by $x$, and then solve for $\frac{dy}{dx}$.

Figure 2.10: tree diagram

The multivariable chain rule offers an alternative approach to implicit differentiation.


Define f ( x, y) = x2 + y3 + sin2 y, then the above implicit equation can be written as
f ( x, y) = 1. Regarding y as a function of x, f ( x, y) is ultimately a function of x. Figure 2.11
shows the tree diagram for the variables.

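[Tree diagram: $f$ branches to $x$ and $y$ along segments labelled $\frac{\partial f}{\partial x}$ and $\frac{\partial f}{\partial y}$; $y$ then leads to $x$ along a segment labelled $\frac{dy}{dx}$]

Figure 2.11: tree diagram for implicit differentiation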

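Therefore, by the chain rule, we have

\[ \frac{df}{dx} = \frac{\partial f}{\partial x} + \frac{\partial f}{\partial y} \frac{dy}{dx}. \]

Recall that $f(x, y) = 1$ is a constant, so $\frac{df}{dx} = 0$, which yields:

\[ 0 = f_x + f_y \frac{dy}{dx}, \quad \text{and so:} \quad \frac{dy}{dx} = -\frac{f_x}{f_y} \quad \text{(wherever } f_y \neq 0 \text{)}. \]

This formula is much more straightforward than the implicit differentiation method learned in single-variable calculus. When $f(x, y) = x^2 + y^3 + \sin^2 y$, we have:

\[ \frac{dy}{dx} = -\frac{2x}{3y^2 + 2 \sin y \cos y}. \]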
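Even though $y$ cannot be written explicitly in terms of $x$, the formula $\frac{dy}{dx} = -\frac{f_x}{f_y}$ can be tested numerically. The sketch below is an illustration only, not part of the original notes: for a given $x$ it solves $x^2 + y^3 + \sin^2 y = 1$ for $y$ by bisection, and then compares a finite-difference slope with the formula. The bracket $[0, 1.5]$, the sample point $x_0 = 0.8$ and the helper `solve_y` are assumptions made for this example; on that bracket $f_y > 0$, so the root is unique.

```python
import math

def f(x, y):
    return x**2 + y**3 + math.sin(y)**2

def solve_y(x, lo=0.0, hi=1.5):
    """Bisection for y in [lo, hi] with f(x, y) = 1 (f is increasing in y there)."""
    for _ in range(80):
        mid = (lo + hi) / 2
        if f(x, mid) < 1:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def dydx_formula(x, y):
    # dy/dx = -f_x / f_y
    return -2 * x / (3 * y**2 + 2 * math.sin(y) * math.cos(y))

x0 = 0.8
y0 = solve_y(x0)
h = 1e-5
# slope of the implicitly defined curve, estimated by a central difference
dydx_numeric = (solve_y(x0 + h) - solve_y(x0 - h)) / (2 * h)
print(y0, dydx_numeric, dydx_formula(x0, y0))
```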

2.3.3 Chain Rule on Second Derivatives


Suppose u( x, y) is a function of x and y. The rectangular-polar coordinates conversion rule is
given by:

x = r cos θ
y = r sin θ

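Therefore, both $x$ and $y$ can be regarded as functions of $r$ and $\theta$, and so $u$ can be regarded as a function of $r$ and $\theta$ as well. By the chain rule, we know:

\[ \begin{aligned}
\frac{\partial u}{\partial r} &= \frac{\partial u}{\partial x} \frac{\partial x}{\partial r} + \frac{\partial u}{\partial y} \frac{\partial y}{\partial r} \\
&= u_x \cos\theta + u_y \sin\theta.
\end{aligned} \]

To find $u_{r\theta}$, one can differentiate the above expression by $\theta$:

\[ \begin{aligned}
\frac{\partial^2 u}{\partial\theta \partial r} &= \frac{\partial}{\partial\theta} \left( u_x \cos\theta + u_y \sin\theta \right) \\
&= \frac{\partial u_x}{\partial\theta} \cos\theta - u_x \sin\theta + \frac{\partial u_y}{\partial\theta} \sin\theta + u_y \cos\theta.
\end{aligned} \]

Next we would like to express $\frac{\partial u_x}{\partial\theta}$ and $\frac{\partial u_y}{\partial\theta}$ in terms of partial derivatives with respect to $x$ and $y$ only. The reason for doing so is that $u_{xx}$ is much easier to compute than $u_{x\theta}$. For instance, if $u(x, y) = x^2 + y$, then $u_x = 2x$, and so $u_{xx} = 2$ and $u_{xy} = 0$. However, to find $u_{x\theta}$ one needs to first express $u_x$ as $2r\cos\theta$.
Since $u_x$ and $u_y$ are both functions of $x$ and $y$, and $(x, y)$ are functions of $(r, \theta)$, the functions $u_x$ and $u_y$ have the same tree diagram as the function $u$. The chain rule for them is given by:

\[ \begin{aligned}
\frac{\partial u_x}{\partial\theta} &= \frac{\partial u_x}{\partial x} \frac{\partial x}{\partial\theta} + \frac{\partial u_x}{\partial y} \frac{\partial y}{\partial\theta} \\
&= u_{xx}(-r\sin\theta) + u_{xy}(r\cos\theta) \\
\frac{\partial u_y}{\partial\theta} &= \frac{\partial u_y}{\partial x} \frac{\partial x}{\partial\theta} + \frac{\partial u_y}{\partial y} \frac{\partial y}{\partial\theta} \\
&= u_{yx}(-r\sin\theta) + u_{yy}(r\cos\theta)
\end{aligned} \]

Substituting them back in, and using $u_{xy} = u_{yx}$ (Theorem 2.1) in the last step, we get:

\[ \begin{aligned}
\frac{\partial^2 u}{\partial\theta \partial r} &= \frac{\partial u_x}{\partial\theta} \cos\theta - u_x \sin\theta + \frac{\partial u_y}{\partial\theta} \sin\theta + u_y \cos\theta \\
&= \left( -u_{xx} r\sin\theta + u_{xy} r\cos\theta \right) \cos\theta - u_x \sin\theta \\
&\quad + \left( -u_{yx} r\sin\theta + u_{yy} r\cos\theta \right) \sin\theta + u_y \cos\theta \\
&= -u_{xx} r\sin\theta\cos\theta + u_{xy} r\left( \cos^2\theta - \sin^2\theta \right) + u_{yy} r\sin\theta\cos\theta \\
&\quad - u_x \sin\theta + u_y \cos\theta
\end{aligned} \]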
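A formula with this many terms is easy to get wrong, so it is worth testing on a concrete function. The sketch below is an illustration only, not part of the original notes: it uses the test function $u(x, y) = x^2 y$, chosen here purely as an assumption for the check. In polar coordinates $u = r^3 \cos^2\theta \sin\theta$, so $u_r = 3r^2 \cos^2\theta \sin\theta$ and $u_{r\theta} = 3r^2(\cos^3\theta - 2\cos\theta\sin^2\theta)$, which can be compared with the general formula above.

```python
import math

def u_r_theta_formula(r, th):
    """Evaluate the general formula for u_{r theta} with u(x, y) = x^2 y."""
    s, c = math.sin(th), math.cos(th)
    x, y = r * c, r * s
    ux, uy = 2 * x * y, x**2             # first partials of u = x^2 y
    uxx, uxy, uyy = 2 * y, 2 * x, 0.0    # second partials of u = x^2 y
    return (-uxx * r * s * c + uxy * r * (c**2 - s**2) + uyy * r * s * c
            - ux * s + uy * c)

def u_r_theta_direct(r, th):
    """Differentiate u = r^3 cos^2(th) sin(th) by r and then by th, by hand."""
    s, c = math.sin(th), math.cos(th)
    return 3 * r**2 * (c**3 - 2 * c * s**2)

for r, th in [(1.5, 0.7), (2.0, -1.1), (0.5, 3.0)]:
    print(u_r_theta_formula(r, th), u_r_theta_direct(r, th))
```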

2.4 Directional Derivatives


Recall that the physical meaning of the partial derivative $\frac{\partial f}{\partial x}$ is the rate of change of $f$ in the direction of $x$. The geometric interpretation was discussed in Figure 2.5. Similarly, $\frac{\partial f}{\partial y}$ is the rate of change of $f$ in the direction of $y$. In this section, we introduce the rate of change of $f$ in any other direction:
Definition 2.4 — Directional Derivative. Given a unit direction $\mathbf{u} = u_1 \mathbf{i} + u_2 \mathbf{j}$ and a two-variable function $f(x, y)$, the directional derivative of $f$ in the direction of $\mathbf{u}$ at the point $(x, y)$ is denoted and defined to be:

\[ D_{\mathbf{u}} f(x, y) = \left. \frac{d}{dt} \right|_{t=0} f(x + tu_1, y + tu_2). \]

i When $\mathbf{u} = \mathbf{i}$, we have $u_1 = 1$ and $u_2 = 0$, and so

\[ D_{\mathbf{i}} f(x, y) = \left. \frac{d}{dt} \right|_{t=0} f(x + t, y) = \lim_{t \to 0} \frac{f(x + t, y) - f(x, y)}{t} = \frac{\partial f}{\partial x}(x, y). \]

Figure 2.12: directional derivative

In practice, we do not need to compute $D_{\mathbf{u}} f$ from the definition, since Theorem 2.2, to be introduced below, will come in handy. In order to introduce this theorem, we first define:
Definition 2.5 — Gradient Vector. Given a two-variable function $f(x, y)$ which is $C^1$ on its domain, the gradient vector of $f$ at $(x, y)$ is denoted and defined as:

\[ \nabla f(x, y) = \frac{\partial f}{\partial x}(x, y) \, \mathbf{i} + \frac{\partial f}{\partial y}(x, y) \, \mathbf{j}. \]

As an example, let $f(x, y) = x^2 y + x^3$, then

\[ \frac{\partial f}{\partial x}(x, y) = 2xy + 3x^2, \qquad \frac{\partial f}{\partial y}(x, y) = x^2. \]

Therefore,

\[ \nabla f(x, y) = (2xy + 3x^2) \mathbf{i} + x^2 \mathbf{j}. \]
The vector ∇ f ( x, y) depends on ( x, y). By taking different values of ( x, y), a different gradient

vector is produced. For instance,

∇ f (1, 1) = 5i + j,
∇ f (1, 0) = 3i + j.

Theorem 2.2 Given a two-variable function f ( x, y) which is C1 on its domain, the directional
derivative of f at ( x, y) in the unit direction u = u1 i + u2 j is given by:

Du f ( x, y) = ∇ f ( x, y) · u.

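By applying this theorem in the special case $\mathbf{u} = \mathbf{i}$, we can see

\[ \nabla f(x, y) \cdot \mathbf{i} = \left( \frac{\partial f}{\partial x}(x, y) \, \mathbf{i} + \frac{\partial f}{\partial y}(x, y) \, \mathbf{j} \right) \cdot \mathbf{i} = \frac{\partial f}{\partial x}(x, y) = D_{\mathbf{i}} f(x, y) \]

as expected. Similarly, we can see $\nabla f(x, y) \cdot \mathbf{j} = \frac{\partial f}{\partial y}(x, y) = D_{\mathbf{j}} f(x, y)$ as expected. Let's see the proof of the general case: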
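Proof of Theorem 2.2. The key idea is to use the chain rule. The directional derivative $D_{\mathbf{u}} f(x_0, y_0)$ at the point $(x_0, y_0)$ is the rate of change of $f$ along the path $\mathbf{r}(t) = \langle x_0, y_0 \rangle + t \langle u_1, u_2 \rangle$, i.e. $x = x_0 + u_1 t$ and $y = y_0 + u_2 t$. By the definition of the directional derivative, $D_{\mathbf{u}} f(x_0, y_0)$ is the derivative $\frac{d}{dt} f(x_0 + u_1 t, y_0 + u_2 t)$ at $t = 0$. Here $f$ is a function of $(x, y)$, and $(x, y)$ are functions of $t$. By the chain rule, we have:

\[ \begin{aligned}
D_{\mathbf{u}} f &= \frac{df}{dt} \\
&= \frac{\partial f}{\partial x} \frac{dx}{dt} + \frac{\partial f}{\partial y} \frac{dy}{dt} \\
&= \frac{\partial f}{\partial x} \frac{d}{dt}(x_0 + tu_1) + \frac{\partial f}{\partial y} \frac{d}{dt}(y_0 + tu_2) \\
&= \frac{\partial f}{\partial x} \cdot u_1 + \frac{\partial f}{\partial y} \cdot u_2.
\end{aligned} \]

On the other hand,

\[ \begin{aligned}
\nabla f \cdot \mathbf{u} &= \left( \frac{\partial f}{\partial x} \mathbf{i} + \frac{\partial f}{\partial y} \mathbf{j} \right) \cdot (u_1 \mathbf{i} + u_2 \mathbf{j}) \\
&= \frac{\partial f}{\partial x} \cdot u_1 + \frac{\partial f}{\partial y} \cdot u_2.
\end{aligned} \]

Therefore, $D_{\mathbf{u}} f = \nabla f \cdot \mathbf{u}$. ■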
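This theorem tells us that the computation of directional derivatives amounts to computing the gradient vector and a dot product. Easy enough? As an example, consider $f(x, y) = x^2 y + x^3$. We worked out in the previous example that $\nabla f(1, 1) = 5\mathbf{i} + \mathbf{j}$, and so the directional derivative of $f$ at $(1, 1)$ along $\mathbf{u} = \frac{1}{\sqrt{2}} \mathbf{i} + \frac{1}{\sqrt{2}} \mathbf{j}$ is given by:

\[ D_{\frac{1}{\sqrt{2}} \mathbf{i} + \frac{1}{\sqrt{2}} \mathbf{j}} f(1, 1) = \nabla f(1, 1) \cdot \left( \frac{1}{\sqrt{2}} \mathbf{i} + \frac{1}{\sqrt{2}} \mathbf{j} \right) = (5\mathbf{i} + \mathbf{j}) \cdot \left( \frac{1}{\sqrt{2}} \mathbf{i} + \frac{1}{\sqrt{2}} \mathbf{j} \right) = \frac{6}{\sqrt{2}}. \]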
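The two routes to a directional derivative, the definition and the gradient formula, can be compared numerically. The following is a minimal sketch, not part of the original notes; the helper `directional_numeric` is hypothetical and approximates the $t$-derivative in Definition 2.4 by a central difference.

```python
import math

def f(x, y):
    return x**2 * y + x**3

def grad_f(x, y):
    # gradient computed by hand: (2xy + 3x^2, x^2)
    return (2 * x * y + 3 * x**2, x**2)

def directional_numeric(x, y, u, h=1e-6):
    # derivative of t -> f(x + t*u1, y + t*u2) at t = 0, by central difference
    u1, u2 = u
    return (f(x + h * u1, y + h * u2) - f(x - h * u1, y - h * u2)) / (2 * h)

u = (1 / math.sqrt(2), 1 / math.sqrt(2))   # a unit vector
gx, gy = grad_f(1.0, 1.0)
by_gradient = gx * u[0] + gy * u[1]        # gradient dotted with u
by_definition = directional_numeric(1.0, 1.0, u)
print(by_gradient, by_definition, 6 / math.sqrt(2))
```

Note that the formula assumes $\mathbf{u}$ is a unit vector; passing a vector of a different length would rescale the answer.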

2.4.1 Geometric Interpretation of Gradient Vectors


Theorem 2.2 not only tells us how to compute the directional derivative Du f ( x, y), but also
tells us in what direction u the derivative Du f ( x, y) is the greatest and the smallest.
Given a function $f(x, y)$, fix a point $(a, b)$. From the dot product formula (1.1), we know:

\[ D_{\mathbf{u}} f(a, b) = \nabla f(a, b) \cdot \mathbf{u} = |\nabla f(a, b)| \underbrace{|\mathbf{u}|}_{=1} \cos\theta = |\nabla f(a, b)| \cos\theta \]

Figure 2.13: $D_{\mathbf{u}} f = \nabla f \cdot \mathbf{u} = |\nabla f| \, |\mathbf{u}| \cos\theta$

where θ is the angle between ∇ f ( a, b) and u.


Since cos θ is the largest when θ = 0 at which cos θ = 1, the directional derivative Du f ( a, b)
is maximized when ∇ f ( a, b) is parallel to u. Therefore, ∇ f ( a, b) is pointing in the direction
at which f increases most rapidly from ( a, b). It is quite intuitive that in order to increase the
value of f most rapidly, one should go along the direction perpendicular to the level curve. It
is indeed true. Let’s state it as a theorem:
Theorem 2.3 Let f ( x, y) be a two-variable function which is C1 on its domain, and ( a, b) be a
point on the level curve f ( x, y) = c. Then the gradient vector ∇ f ( a, b) is orthogonal to the
level curve f ( x, y) = c at the point ( a, b).

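Proof. Let $\mathbf{r}(t) = x(t) \mathbf{i} + y(t) \mathbf{j}$ be a parametrization of the level curve $f(x, y) = c$. In other words, we have $f(x(t), y(t)) = c$ for any $t$, and so

\[ \frac{d}{dt} f(x(t), y(t)) = 0. \]

By the chain rule, we get:

\[ \begin{aligned}
\frac{\partial f}{\partial x} \frac{dx}{dt} + \frac{\partial f}{\partial y} \frac{dy}{dt} &= 0 \\
\left( \frac{\partial f}{\partial x} \mathbf{i} + \frac{\partial f}{\partial y} \mathbf{j} \right) \cdot \left( \frac{dx}{dt} \mathbf{i} + \frac{dy}{dt} \mathbf{j} \right) &= 0 \\
\nabla f \cdot \mathbf{r}'(t) &= 0.
\end{aligned} \]

Therefore, the gradient vector $\nabla f$ is orthogonal to $\mathbf{r}'(t)$, which is the tangent vector of the level curve $\mathbf{r}(t)$. This completes the proof. ■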

Figure 2.14: ∇ f ( a, b) is orthogonal to the level curve of f at ( a, b)
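As a concrete instance of Theorem 2.3, consider the level curve $x^2 + y^2 = 1$ of $f(x, y) = x^2 + y^2$. The parametrization $\mathbf{r}(t) = \cos t \, \mathbf{i} + \sin t \, \mathbf{j}$ used below is a choice made for this illustration, not part of the original notes:

```latex
\begin{aligned}
\nabla f(x, y) &= 2x \, \mathbf{i} + 2y \, \mathbf{j}
  \quad\Longrightarrow\quad
  \nabla f(\mathbf{r}(t)) = 2\cos t \, \mathbf{i} + 2\sin t \, \mathbf{j}, \\
\mathbf{r}'(t) &= -\sin t \, \mathbf{i} + \cos t \, \mathbf{j}, \\
\nabla f(\mathbf{r}(t)) \cdot \mathbf{r}'(t)
  &= (2\cos t)(-\sin t) + (2\sin t)(\cos t) = 0.
\end{aligned}
```

Here the gradient points radially outward while the tangent vector runs along the circle, so the orthogonality is also visible geometrically.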




Figure 2.15: a plot of the field ∇ (sin xy) and the level curves of sin xy.

2.4.2 Directional Derivative of Three-Variable Functions


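The directional derivative and the gradient vector for functions $f(x, y, z)$ of three variables are defined in an analogous way as for two-variable functions. Precisely, we have:

\[ \nabla f = \frac{\partial f}{\partial x} \mathbf{i} + \frac{\partial f}{\partial y} \mathbf{j} + \frac{\partial f}{\partial z} \mathbf{k} \]

Given a unit vector $\mathbf{u}$, the directional derivative of $f(x, y, z)$ in the direction of $\mathbf{u}$ is given by:

\[ D_{\mathbf{u}} f = \nabla f \cdot \mathbf{u}. \]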

Since the level set of a three-variable function is typically a surface, the gradient vector ∇ f
at any given point is orthogonal to the level surface f ( x, y, z) = c at that point. See Figure 2.16.

Figure 2.16: ∇ f ( a, b, c) is orthogonal to the level surface of f at ( a, b, c)

In the next section, we will use this fact to find the equation of the tangent plane to a
surface.

2.5 Tangent Planes


In single-variable calculus, the tangent line to the curve y = f ( x ) at a point ( x0 , f ( x0 )) has
slope equal to f ′ ( x0 ). Using the slope, one can easily write down the equation of the tangent
line as:
y = f ( x0 ) + f ′ ( x0 )( x − x0 ).
In multivariable calculus, the graph of a two-variable function z = f ( x, y) is no longer a
curve but a surface. Therefore, there are infinitely many tangent lines passing through any
given point on a surface. However, there is only one tangent plane at any given point. In this
section, we want to find an equation for the tangent plane.
Recall that the two ingredients of finding an equation of a plane are:
• a normal vector to the plane; and
• any given point on the plane.
Naturally, the point can be taken to be the contact point of the surface ( x0 , y0 , z0 ). To find
the normal vector, we will use the gradient vector.
In the previous section, we see that given a three-variable function g( x, y, z), the gradient
vector ∇ g( x, y, z) is perpendicular to the level surface g( x, y, z) = c. In other words, ∇ g is a
normal vector of the level surface g( x, y, z) = c.

■ Example 2.7 Find the tangent plane to the surface

\[ x^2 + y^2 = z^2 + 3 \]

at the point $(x, y, z) = (2, 0, -1)$.

■ Solution First we need to write the equation of the surface in a level set form, i.e.

\[ x^2 + y^2 - z^2 = 3 \]

Define $g(x, y, z) = x^2 + y^2 - z^2$, then the given surface is the level set $g(x, y, z) = 3$. By direct computations,

\[ \begin{aligned}
\nabla g(x, y, z) &= 2x \, \mathbf{i} + 2y \, \mathbf{j} + (-2z) \, \mathbf{k} \\
\nabla g(2, 0, -1) &= 4\mathbf{i} + 2\mathbf{k}.
\end{aligned} \]

Then, $\mathbf{n} := 4\mathbf{i} + 2\mathbf{k}$ is a normal vector of the surface at $(2, 0, -1)$. The equation of the tangent plane at $(2, 0, -1)$ is given by:

\[ \begin{aligned}
4x + 0y + 2z &= 4(2) + 0(0) + 2(-1), \\
4x + 2z &= 6.
\end{aligned} \]

Given a two-variable function f ( x, y), the graph z = f ( x, y) is a surface. In order to find


the tangent plane at a given point, one can rewrite the graph equation z = f ( x, y) as:

z − f ( x, y) = 0.

Then, one can define g( x, y, z) = z − f ( x, y) so that the graph of the two-variable function
f ( x, y) becomes a level set of a three-variable function g( x, y, z). Let’s look at an example:

■ Example 2.8 Given the function $f(x, y) = x \cos y - y e^x$, find the tangent plane at $(0, 0, 0)$ to the graph $z = x \cos y - y e^x$.

■ Solution First rearrange the terms so that the graph equation becomes a level set:

\[ z - x \cos y + y e^x = 0. \]

Define $g(x, y, z) = z - x \cos y + y e^x$, then the surface under consideration is the level set $g(x, y, z) = 0$. Its gradient is:

\[ \nabla g(x, y, z) = (-\cos y + y e^x) \mathbf{i} + (x \sin y + e^x) \mathbf{j} + \mathbf{k}. \]

The normal vector at $(0, 0, 0)$ is given by:

\[ \mathbf{n} = \nabla g(0, 0, 0) = -\mathbf{i} + \mathbf{j} + \mathbf{k}. \]

The equation of the tangent plane at $(0, 0, 0)$ to the graph is:

\[ -x + y + z = 0. \]
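One way to see that $-x + y + z = 0$ really is tangent to the graph in Example 2.8 is to measure the vertical gap between the graph and the plane near the origin: for a tangent plane, the gap should shrink faster than the distance to the contact point. The following is a rough numerical sketch, not part of the original notes; the helper `gap` and the sample direction $(d, 2d)$ are arbitrary choices for illustration.

```python
import math

def f(x, y):
    return x * math.cos(y) - y * math.exp(x)

def tangent_plane(x, y):
    # -x + y + z = 0 solved for z
    return x - y

def gap(d):
    """Vertical gap between the graph and the plane at the point (d, 2d)."""
    return abs(f(d, 2 * d) - tangent_plane(d, 2 * d))

for d in [1e-1, 1e-2, 1e-3]:
    # the gap shrinks roughly like d^2, much faster than d itself
    print(d, gap(d))
```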

Generally, one can derive a formula for finding the tangent plane of any graph of a two-variable function $f(x, y)$:
Theorem 2.4 Let $f(x, y)$ be a function which is $C^1$ on its domain. The equation of the tangent plane for the graph $z = f(x, y)$ at the point $(x_0, y_0, f(x_0, y_0))$ is given by:

\[ z = f(x_0, y_0) + \frac{\partial f}{\partial x}(x_0, y_0) \cdot (x - x_0) + \frac{\partial f}{\partial y}(x_0, y_0) \cdot (y - y_0). \]

Proof. Write the graph equation $z = f(x, y)$ as:

\[ z - f(x, y) = 0 \]

and define $g(x, y, z) = z - f(x, y)$, then the graph can be regarded as the level set $g(x, y, z) = 0$ of the three-variable function $g$. Its gradient is:

\[ \nabla g(x, y, z) = -\frac{\partial f}{\partial x} \mathbf{i} - \frac{\partial f}{\partial y} \mathbf{j} + \mathbf{k}. \]

At the point $(x_0, y_0, f(x_0, y_0))$, the normal vector to the surface is therefore given by:

\[ \mathbf{n} = \nabla g(x_0, y_0, f(x_0, y_0)) = -\frac{\partial f}{\partial x}(x_0, y_0) \, \mathbf{i} - \frac{\partial f}{\partial y}(x_0, y_0) \, \mathbf{j} + \mathbf{k}. \]

The equation of the tangent plane at $(x_0, y_0, f(x_0, y_0))$ is:

\[ \left( -\frac{\partial f}{\partial x}(x_0, y_0) \right) x + \left( -\frac{\partial f}{\partial y}(x_0, y_0) \right) y + z = \left( -\frac{\partial f}{\partial x}(x_0, y_0) \right) x_0 + \left( -\frac{\partial f}{\partial y}(x_0, y_0) \right) y_0 + f(x_0, y_0) \]

By rearrangement, we get:

\[ z = f(x_0, y_0) + \frac{\partial f}{\partial x}(x_0, y_0) \cdot (x - x_0) + \frac{\partial f}{\partial y}(x_0, y_0) \cdot (y - y_0), \]

as desired. ■

2.6 Local Extrema


For a single-variable function f ( x ), if f ′ (c) = 0, we say (c, f (c)) is a critical point. It is a
candidate for local maximum or minimum. The second derivative may be used to determine
whether the critical point is a local maximum or a local minimum.
In this section, we will extend the concept of critical points to two-variable functions, and
introduce the second derivative test for these functions.

2.6.1 Critical Points


Recall that the equation of the tangent plane to the graph z = f ( x, y) at a point ( a, b) is given
by:
z = f ( a, b) + f x ( a, b)( x − a) + f y ( a, b)(y − b).
This plane is horizontal if and only if f x ( a, b) = f y ( a, b) = 0. This motivates the definition of
critical points for two-variable functions:
Definition 2.6 — Critical Points. Given a $C^1$ function $f(x, y)$, a point $(a, b)$ is said to be a critical point if the tangent plane at $(a, b)$ to the graph $z = f(x, y)$ is horizontal.
Therefore, $(a, b)$ is a critical point $\Leftrightarrow$ $\frac{\partial f}{\partial x}(a, b) = \frac{\partial f}{\partial y}(a, b) = 0$ $\Leftrightarrow$ $\nabla f(a, b) = \mathbf{0}$.

i As in single-variable calculus, the critical points are just candidates of maximum/minimum.


Further investigation is needed to determine whether it is a local maximum or minimum,
or neither.

Figure 2.17: the tangent plane at a critical point of a C1 function is horizontal. However, a
critical point is not always a local maximum or minimum. It can be a saddle like the origin
of z = x2 − y2 , which is a local maximum in the y-direction but is a local minimum in the
x-direction.

■ Example 2.9 Find all critical point(s) of the function f ( x, y) = xy − x2 − y2 − 2x − 2y + 4.



■ Solution We compute $\frac{\partial f}{\partial x}$ and $\frac{\partial f}{\partial y}$, and then set them to zero and solve for $(x, y)$:

\[ \begin{aligned}
0 &= \frac{\partial f}{\partial x} = y - 2x - 2 \\
0 &= \frac{\partial f}{\partial y} = x - 2y - 2.
\end{aligned} \]

It is a system of equations with unknowns $x$ and $y$. The first equation gives $y = 2x + 2$. Substituting it into the second equation, we get:

\[ x - 2(2x + 2) - 2 = 0 \quad \Rightarrow \quad x = -2. \]

When $x = -2$, we have $y = -2$, and so $(x, y) = (-2, -2)$ is a critical point of $f(x, y)$. See Figure 2.18a for its graph.

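■ Example 2.10 Find all critical point(s) of the function $f(x, y) = \sin x \sin y$.

■ Solution Consider:

\[ \begin{aligned}
& \frac{\partial f}{\partial x} = 0 \quad \text{and} \quad \frac{\partial f}{\partial y} = 0 \\
& \cos x \sin y = 0 \quad \text{and} \quad \sin x \cos y = 0 \\
& (\cos x = 0 \text{ or } \sin y = 0) \quad \text{and} \quad (\sin x = 0 \text{ or } \cos y = 0) \\
& \left( x = \frac{\pi}{2} + k\pi \text{ or } y = m\pi \right) \quad \text{and} \quad \left( x = n\pi \text{ or } y = \frac{\pi}{2} + p\pi \right).
\end{aligned} \]

Here $m, n, k, p$ are any integers. Since $\cos x$ and $\sin x$ cannot vanish at the same time (and likewise for $y$), some logical deductions show that these imply the following:

\[ \begin{aligned}
& x = \frac{\pi}{2} + k\pi \quad \text{and} \quad y = \frac{\pi}{2} + p\pi \\
\text{or:} \quad & y = m\pi \quad \text{and} \quad x = n\pi.
\end{aligned} \]

Therefore, there are infinitely many critical points:

\[ \left( \frac{\pi}{2} + k\pi, \frac{\pi}{2} + p\pi \right), \qquad (n\pi, m\pi) \]

where $m, n, k, p$ are any integers. See Figure 2.18b for its graph.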

(a) graph of z = xy − x2 − y2 − 2x − 2y + 4 (b) graph of z = sin x sin y

Figure 2.18: graphs of functions in Examples 2.9 and 2.10
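The two families of critical points found in Example 2.10 can be spot-checked by evaluating the gradient at a few of them. This is an illustrative sketch only, not part of the original notes; the tolerance `1e-12` and the range of integers tested are arbitrary choices, needed because $\pi$ is only represented approximately in floating point.

```python
import math

def grad(x, y):
    # gradient of f(x, y) = sin x sin y
    return (math.cos(x) * math.sin(y), math.sin(x) * math.cos(y))

def is_critical(x, y, tol=1e-12):
    gx, gy = grad(x, y)
    return abs(gx) < tol and abs(gy) < tol

# family 1: (pi/2 + k*pi, pi/2 + p*pi); family 2: (n*pi, m*pi)
family_1 = [(math.pi / 2 + k * math.pi, math.pi / 2 + p * math.pi)
            for k in range(-2, 3) for p in range(-2, 3)]
family_2 = [(n * math.pi, m * math.pi)
            for n in range(-2, 3) for m in range(-2, 3)]

print(all(is_critical(x, y) for x, y in family_1 + family_2))
print(is_critical(math.pi / 2, 0.0))  # a point in neither family
```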



2.6.2 Second Derivative Test of Multivariable Functions


In single-variable calculus, to determine the nature of a critical point x0 of a function f ( x ), we
look at its second derivative at x0 . If f ′′ ( x0 ) > 0, the graph y = f ( x ) is concave up around x0
and so ( x0 , f ( x0 )) is a local minimum point. On the other hand, if f ′′ ( x0 ) < 0, the graph is
concave down near x0 and so ( x0 , f ( x0 )) is a local maximum point.
In multivariable calculus, however, determining the nature of a critical point $(x_0, y_0)$ of a two-variable function $f(x, y)$ is not as simple as in single-variable calculus. Take the following function as an example:

\[ f(x, y) = x^2 + 4xy + y^2. \]

One can easily verify that $\nabla f(0, 0) = \mathbf{0}$ and so $(0, 0)$ is a critical point. For the second derivatives, we find that:

\[ f_{xx}(x, y) = 2, \qquad f_{yy}(x, y) = 2 \]

for every $(x, y)$ in $\mathbb{R}^2$, and in particular at $(0, 0)$. Both are positive numbers. You may be tempted to conclude that $(0, 0)$ is a local minimum point. However, if one plots the graph of this function (see Figure 2.19), one can easily see that $(0, 0)$ is neither a local maximum nor a local minimum.

Figure 2.19: $(0, 0)$ is neither a maximum nor a minimum

Around $(0, 0)$, the graph is concave up in some directions but concave down in other directions. We call this $(0, 0)$ a saddle.
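The saddle behaviour can be confirmed without a plot by restricting $f(x, y) = x^2 + 4xy + y^2$ to two lines through the origin. The choice of the lines $y = x$ and $y = -x$ below is made for this illustration and is not part of the original notes:

```latex
\begin{aligned}
f(t, t)  &= t^2 + 4t^2 + t^2 = 6t^2 > 0 = f(0, 0) && \text{for } t \neq 0
  \quad \text{(concave up along } y = x\text{)}, \\
f(t, -t) &= t^2 - 4t^2 + t^2 = -2t^2 < 0 = f(0, 0) && \text{for } t \neq 0
  \quad \text{(concave down along } y = -x\text{)}.
\end{aligned}
```

So arbitrarily close to $(0, 0)$ there are points where $f$ is larger than $f(0, 0)$ and points where it is smaller, even though $f_{xx}$ and $f_{yy}$ are both positive.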
This example shows the signs of f xx and f yy alone could not conclude the nature of the
critical point. In fact, the second derivative test for two-variable functions is slightly more
complicated than that in single-variable calculus:
Theorem 2.5 — Second Derivative Test for Two-Variable Functions. Let $f(x, y)$ be a $C^2$ function and $(x_0, y_0)$ be a critical point of $f$, i.e. $\nabla f(x_0, y_0) = \mathbf{0}$. Then the nature of this critical point $(x_0, y_0)$ is determined by the following table:

$\left( f_{xx} f_{yy} - f_{xy}^2 \right)(x_0, y_0)$      $f_{xx}(x_0, y_0)$      $(x_0, y_0)$ is a:
$> 0$                                                    $> 0$                   local minimum
$> 0$                                                    $< 0$                   local maximum
$< 0$                                                    anything                saddle

Any other cases are inconclusive.

For the function $f(x, y) = x^2 + 4xy + y^2$ in the above example, to determine the nature of $(0, 0)$ we also need $f_{xy}(0, 0)$, which is equal to 4.
Therefore, we have:

\[ \left( f_{xx} f_{yy} - f_{xy}^2 \right)(0, 0) = 2 \times 2 - 4^2 = -12 < 0, \qquad f_{xx}(0, 0) = 2 > 0. \]

From the table in Theorem 2.5, we conclude $(0, 0)$ is a saddle, as expected from the plot of its graph. Let's look at one more example before we learn the proof of the Second Derivative Test.

■ Example 2.11 Let f ( x, y) = 3y2 − 2y3 − 3x2 + 6xy. Find all critical points and determine
the nature of each of them.

■ Solution To find all critical points, we set:
\[ \frac{\partial f}{\partial x} = -6x + 6y = 0, \qquad \frac{\partial f}{\partial y} = 6y - 6y^2 + 6x = 0. \]

From the first equation, we get $y = x$. Substituting this into the second equation, we get:
\[ 6x - 6x^2 + 6x = 0, \quad \text{or equivalently} \quad 2x - x^2 = 0. \]

By factorization, we get x (2 − x ) = 0. Therefore

x = 0 or x = 2.

By noting that y = x, we have two critical points: (0, 0) and (2, 2).
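Next we compute the second derivatives of $f$:
\[ f_{xx} = -6, \qquad f_{xy} = f_{yx} = 6, \qquad f_{yy} = 6 - 12y. \]
\[
\begin{array}{cccccl}
\text{Critical point } P & f_{xx}(P) & f_{yy}(P) & f_{xy}(P) & \left(f_{xx} f_{yy} - f_{xy}^2\right)(P) & \text{Nature of } P \\ \hline
(0, 0) & -6 & 6 & 6 & -72 & \text{saddle} \\
(2, 2) & -6 & -18 & 6 & 72 & \text{local maximum}
\end{array}
\]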
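The table in Example 2.11 can be reproduced with a few lines of code. The following is only a minimal sketch and not part of the original notes: the helper names `classify` and `second_derivatives` are hypothetical, and the second derivatives of $f$ are typed in by hand rather than computed symbolically.

```python
def classify(fxx, fyy, fxy):
    """Apply the second derivative test (Theorem 2.5) at a critical point."""
    d = fxx * fyy - fxy ** 2           # f_xx f_yy - f_xy^2 at the point
    if d > 0 and fxx > 0:
        return "local minimum"
    if d > 0 and fxx < 0:
        return "local maximum"
    if d < 0:
        return "saddle"
    return "inconclusive"


def second_derivatives(x, y):
    """f_xx, f_yy, f_xy of f(x, y) = 3y^2 - 2y^3 - 3x^2 + 6xy (Example 2.11)."""
    return -6.0, 6.0 - 12.0 * y, 6.0


results = {p: classify(*second_derivatives(*p)) for p in [(0, 0), (2, 2)]}
print(results)
```

Calling `classify(2, 2, 4)` applies the same test to the earlier function $x^2 + 4xy + y^2$ at the origin.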

Explanation of the Second Derivative Test


In single-variable calculus, the second derivative test can be explained using convexity of the graph
y = f ( x ). However, this approach cannot be generalized to higher dimensions.
Before we explain why the above second derivative test works for two-variable functions
f ( x, y), we first seek an alternative explanation of the single-variable second derivative test
using Taylor’s series.
Recall that the Taylor’s series of a given function $f(x)$ about $x = a$ is given by:
\[ f(x) = f(a) + f'(a)(x - a) + \frac{f''(a)}{2!}(x - a)^2 + \frac{f'''(a)}{3!}(x - a)^3 + \dots \]
If $f(x)$ has a critical point at $x = a$, then $f'(a) = 0$. Also, when $x$ is very close to $a$, the
higher-order terms $(x - a)^3$, $(x - a)^4$, etc. are significantly smaller than the quadratic term
$(x - a)^2$. Therefore, the function $f(x)$ is approximately given by:
\[ f(x) \simeq f(a) + \frac{f''(a)}{2!}(x - a)^2 \quad \text{when $x$ is near $a$.} \]
The right-hand side $f(a) + \frac{f''(a)}{2!}(x - a)^2$ is a quadratic function. If $f''(a) > 0$, then the graph
$y = f(a) + \frac{f''(a)}{2!}(x - a)^2$ is a concave up parabola and so $f(a) + \frac{f''(a)}{2!}(x - a)^2 \geq f(a)$. Therefore,
$f(x)$, which is approximately $f(a) + \frac{f''(a)}{2!}(x - a)^2$, is also $\geq f(a)$ when $x$ is near $a$. This explains why
$f(x)$ has a local minimum at $x = a$.
On the other hand, if $f''(a) < 0$, then the graph $y = f(a) + \frac{f''(a)}{2!}(x - a)^2$ is a concave down
parabola. A similar argument as above shows $f(x)$ has a local maximum at $x = a$.
Figure 2.20: the blue graph shows $y = f(x)$ where $f'(0) = 0$; the yellow graph shows $y = f(0) + \frac{f''(0)}{2!} x^2$ where $f''(0) > 0$

Back to multivariable calculus, we now explain the second derivative test using the Taylor’s
series approach. Given a function $f(x, y)$, the multivariable Taylor’s series about $(x, y) = (x_0, y_0)$ is given by:
\[
\begin{aligned}
f(x, y) = {} & f(x_0, y_0) + f_x(x_0, y_0)(x - x_0) + f_y(x_0, y_0)(y - y_0) \\
& + \frac{f_{xx}(x_0, y_0)}{2!}(x - x_0)^2 + \frac{2 f_{xy}(x_0, y_0)}{2!}(x - x_0)(y - y_0) + \frac{f_{yy}(x_0, y_0)}{2!}(y - y_0)^2 \\
& + \text{higher-order terms}
\end{aligned}
\]
The proof is beyond the scope of the course. If $(x_0, y_0)$ is a critical point of $f(x, y)$, then
$f_x(x_0, y_0) = f_y(x_0, y_0) = 0$. For simplicity, denote $P = (x_0, y_0)$, then when $(x, y)$ is near $P$, we
have:
\[ f(x, y) \simeq f(x_0, y_0) + \frac{1}{2}\left( f_{xx}(P)(x - x_0)^2 + 2 f_{xy}(P)(x - x_0)(y - y_0) + f_{yy}(P)(y - y_0)^2 \right). \]
Therefore, to determine whether $(x_0, y_0)$ is a local maximum/minimum or a saddle of $f(x, y)$,
one should determine whether the quadratic function:
\[ f_{xx}(P)(x - x_0)^2 + 2 f_{xy}(P)(x - x_0)(y - y_0) + f_{yy}(P)(y - y_0)^2 \]
is always positive, always negative, or neither.
For simplicity, denote
\[ A = f_{xx}(P), \quad B = f_{xy}(P), \quad C = f_{yy}(P), \quad X = x - x_0, \quad Y = y - y_0. \]
Then, the quadratic expression can be simplified as:
\[ f_{xx}(P)(x - x_0)^2 + 2 f_{xy}(P)(x - x_0)(y - y_0) + f_{yy}(P)(y - y_0)^2 = AX^2 + 2BXY + CY^2. \]
To determine whether $AX^2 + 2BXY + CY^2$ is always positive/negative or neither, one looks
at the discriminant $\Delta = (2B)^2 - 4AC = 4(B^2 - AC)$:
\[
\begin{array}{ccccc}
AC - B^2 & \Delta = 4(B^2 - AC) & A & AX^2 + 2BXY + CY^2 & \text{near } P,\ f(x, y) \text{ is} \\ \hline
> 0 & < 0 & > 0 & \geq 0 & \geq f(P) \\
> 0 & < 0 & < 0 & \leq 0 & \leq f(P) \\
< 0 & > 0 & \text{anything} & + \text{ or } - & \geq f(P) \text{ or } \leq f(P)
\end{array}
\]

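Translating back to the previous notation, we can conclude:
\[
\begin{array}{ccl}
\left(f_{xx} f_{yy} - f_{xy}^2\right)(x_0, y_0) & f_{xx}(x_0, y_0) & (x_0, y_0) \text{ is a:} \\ \hline
> 0 & > 0 & \text{local minimum} \\
> 0 & < 0 & \text{local maximum} \\
< 0 & \text{anything} & \text{saddle}
\end{array}
\]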
This explains the second derivative test for two-variable functions!
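One way to see where the sign conditions on $AX^2 + 2BXY + CY^2$ come from is to complete the square. The identity below is a supplementary sketch rather than part of the original argument, and it assumes $A \neq 0$ (which holds automatically when $AC - B^2 > 0$):

```latex
\[
AX^2 + 2BXY + CY^2
  = A\left(X + \frac{B}{A}\,Y\right)^2 + \frac{AC - B^2}{A}\,Y^2 .
\]
```

When $AC - B^2 > 0$, both terms carry the sign of $A$, so the expression never changes sign. When $AC - B^2 < 0$, the two terms have opposite signs: taking $Y = 0$ gives the sign of $A$, while taking $X = -\frac{B}{A}Y$ with $Y \neq 0$ gives the opposite sign, which is the saddle case.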

2.7 Lagrange’s Multiplier


In the previous section, we learned how to find critical points in the interior of a domain,
namely by solving ∇ f = 0. These critical points are candidates of the maximum or minimum
of the function. We also learned how to determine the local nature of the critical points. However,
to determine the maximum/minimum on the boundary of a domain, the gradient method
does not work, as the tangent plane at the maximum/minimum need not be horizontal (see
Figure 2.21).

Figure 2.21: At the maximum and minimum points of the function $x^2 + 2y^2$ when $(x, y)$ is
restricted to the unit circle $x^2 + y^2 = 1$, the tangent plane may not be horizontal. Therefore,
solving ∇ f = 0 does not give the maximum or minimum points of the function.

When the domain of a function $f(x, y)$ is restricted to a level set such as $x^2 + y^2 = 1$, which
is a unit circle, we use a method called the Lagrange’s Multiplier. We first state the method,
then look at a few examples, and finally explain why it works.
Suppose we want to maximize or minimize a function $f(x, y)$ whose variables $(x, y)$
are restricted by the constraint $g(x, y) = c$. Then, to determine all possible candidates of
maximum/minimum points on the constraint, we:
1. Solve the system of equations
\[ \nabla f(x, y) = \lambda \nabla g(x, y), \qquad g(x, y) = c. \]
Here the unknowns are $x$, $y$ and $\lambda$.


2. The solutions of ( x, y) are the possible candidates of the maximum or minimum points
of the function f ( x, y). We call these boundary critical points.
3. Finally, evaluate f ( x, y) at each boundary critical point found. The point giving the
largest value of f ( x, y) is the maximum point on the boundary, and that giving the
smallest value of f ( x, y) is the minimum.

We call this the Lagrange’s Multiplier method because the scalar $\lambda$ is called the Lagrange’s
Multiplier.

■ Example 2.12 Let $f(x, y) = x^2 + y^2 + 2x + 2y$. Find the maximum and minimum values of
$f$ when $(x, y)$ is restricted to the constraint $x^2 + y^2 = 1$.

■ Solution $f(x, y)$ is the function we want to maximize and minimize. Let $g(x, y) = x^2 + y^2$
so that the level set $g(x, y) = 1$ is our constraint. First we compute:
\[ \nabla f = \langle 2x + 2, 2y + 2 \rangle, \qquad \nabla g = \langle 2x, 2y \rangle. \]
Therefore, the vector equation $\nabla f(x, y) = \lambda \nabla g(x, y)$ is equivalent to the two equations
$2x + 2 = 2\lambda x$ and $2y + 2 = 2\lambda y$. Combining with the constraint equation $x^2 + y^2 = 1$, we
get a system of three equations with three unknowns $x$, $y$ and $\lambda$:
\[
\begin{aligned}
2x + 2 &= 2\lambda x && (1) \\
2y + 2 &= 2\lambda y && (2) \\
x^2 + y^2 &= 1 && (3)
\end{aligned}
\]

Since we are interested in solving for $(x, y)$ and finding $\lambda$ is optional, we divide (1) by (2) so
that $\lambda$ can be canceled. However, we should worry about whether both sides of (2) are zero! Therefore, we
split into two cases.

Case 1: $2y + 2 \neq 0$ (and so $2\lambda y \neq 0$ too)
(1) ÷ (2) gives:
\[ \frac{2x + 2}{2y + 2} = \frac{2\lambda x}{2\lambda y}. \]
After cancellations, we get:
\[ \frac{x + 1}{y + 1} = \frac{x}{y}. \]
By cross multiplication:
\[ y(x + 1) = x(y + 1) \ \Rightarrow\ xy + y = xy + x \ \Rightarrow\ y = x. \]
Substituting $y = x$ into (3), we have $2x^2 = 1$, and so $x = \frac{1}{\sqrt{2}}$ or $-\frac{1}{\sqrt{2}}$. Since $y = x$, the solutions
for $(x, y)$ in this case are:
\[ (x, y) = \left(\frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}}\right), \ \left(-\frac{1}{\sqrt{2}}, -\frac{1}{\sqrt{2}}\right). \]
Case 2: $2y + 2 = 0$
In this case, $y = -1$. Substituting this into (3), we get $x = 0$. However, putting $x = 0$ into (1)
yields $2 = 0$, which is absurd! Therefore, there is no solution in this case.
To sum up, the boundary critical points are $\left(\frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}}\right)$ and $\left(-\frac{1}{\sqrt{2}}, -\frac{1}{\sqrt{2}}\right)$. Evaluating $f(x, y)$
at each point gives:
\[ f\left(\frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}}\right) = 1 + 2\sqrt{2}, \qquad f\left(-\frac{1}{\sqrt{2}}, -\frac{1}{\sqrt{2}}\right) = 1 - 2\sqrt{2}. \]
Therefore, subject to the constraint $x^2 + y^2 = 1$, $\left(\frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}}\right)$ is the maximum point of $f$ with
value $1 + 2\sqrt{2}$, and $\left(-\frac{1}{\sqrt{2}}, -\frac{1}{\sqrt{2}}\right)$ is the minimum point of $f$ with value $1 - 2\sqrt{2}$. See Figure
2.22 (the blue circle is the constraint).
(a) the graph (b) the level set diagram

Figure 2.22: $f(x, y) = x^2 + y^2 + 2x + 2y$ in Example 2.12
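The answer to Example 2.12 can be sanity-checked numerically by walking around the constraint circle. This is only an illustrative sketch, not the Lagrange's Multiplier method itself: it assumes the parametrization $(\cos t, \sin t)$ of the unit circle and a sample count `n` chosen so that $t = \pi/4$ and $t = 5\pi/4$ land exactly on sample points.

```python
import math


def f(x, y):
    return x**2 + y**2 + 2*x + 2*y


# Sample the constraint x^2 + y^2 = 1 through the points (cos t, sin t)
n = 3600
samples = [(math.cos(2 * math.pi * k / n), math.sin(2 * math.pi * k / n))
           for k in range(n)]

max_point = max(samples, key=lambda p: f(*p))
min_point = min(samples, key=lambda p: f(*p))

print("max:", max_point, f(*max_point))   # compare with 1 + 2*sqrt(2)
print("min:", min_point, f(*min_point))   # compare with 1 - 2*sqrt(2)
```

A scan like this only suggests where the extrema are; the system $\nabla f = \lambda \nabla g$ is what pins them down exactly.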

■ Example 2.13 Let f ( x, y) = x2 − 4x + y2 + 9. Find the maximum and minimum points and
values of f ( x, y) subject to the constraint 4x2 + 9y2 = 36.

■ Solution Define $g(x, y) = 4x^2 + 9y^2$, then the constraint is the level set $g(x, y) = 36$. Set up
the Lagrange’s Multiplier system:
\[ \nabla f(x, y) = \lambda \nabla g(x, y), \qquad g(x, y) = 36. \]
By computing $\nabla f$ and $\nabla g$, the above is equivalent to a system of three equations:
\[
\begin{aligned}
2x - 4 &= 8\lambda x && (1) \\
2y &= 18\lambda y && (2) \\
4x^2 + 9y^2 &= 36 && (3)
\end{aligned}
\]

Case 1: $2y = 18\lambda y \neq 0$

By (1) ÷ (2), we get:
\[
\begin{aligned}
\frac{2x - 4}{2y} &= \frac{8\lambda x}{18\lambda y} \\
\frac{x - 2}{y} &= \frac{4x}{9y} && \text{(cancel $\lambda$)} \\
9y(x - 2) &= 4xy && \text{(cross multiplication)} \\
9(x - 2) &= 4x && \text{(cancel $y \neq 0$)} \\
x &= \frac{18}{5}
\end{aligned}
\]
However, substituting $x = \frac{18}{5}$ into (3), we get:
\[ 9y^2 = 36 - 4\left(\frac{18}{5}\right)^2 \]
which is a negative number, but $9y^2$ cannot be negative! Therefore, there is no solution
in this case.

Case 2: $2y = 18\lambda y = 0$

In this case, we have $2y = 18\lambda y = 0$, so $y = 0$. From (3), we have $4x^2 = 36$ and so $x = 3$
or $x = -3$. Therefore, $(x, y) = (3, 0)$ and $(x, y) = (-3, 0)$ are the solutions in this case.
Finally, we evaluate $f$ at each boundary critical point:
\[ f(3, 0) = 6, \qquad f(-3, 0) = 30. \]
Therefore, the minimum point is $(3, 0)$ with value 6; the maximum point is $(-3, 0)$ with value 30.
See Figure 2.23.

(a) the graph (b) the level set diagram

Figure 2.23: $f(x, y) = x^2 - 4x + y^2 + 9$ in Example 2.13

Next we explain how the Lagrange’s Multiplier method works. Suppose a function $f(x, y)$
is subject to the constraint $g(x, y) = c$. At a point $(a, b)$ on the constraint where the maximum
or minimum of $f(x, y)$ is achieved, the level set of $f(x, y)$ through $(a, b)$ is tangent to the constraint
g( x, y) = c. Consequently, the gradient vector ∇ f ( a, b), which is perpendicular to the level set
of f at ( a, b), must be parallel to the gradient vector ∇ g( a, b), which is perpendicular to the
constraint g = c. See Figure 2.24 for an illustration. Therefore, at such a point, we must have:

∇ f ( a, b) = λ∇ g( a, b)

where λ is a scalar.

Figure 2.24: ∇ f and ∇ g are parallel at the boundary critical point ( a, b)
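The parallel-gradient condition can be tested directly on Example 2.13. The sketch below is a hypothetical check, not part of the notes: it uses the fact that two plane vectors are parallel exactly when their 2D cross product $u_1 v_2 - u_2 v_1$ vanishes, and it includes the point $(0, 2)$, which lies on the ellipse but is not a boundary critical point, for contrast.

```python
def grad_f(x, y):
    """Gradient of f(x, y) = x^2 - 4x + y^2 + 9."""
    return (2 * x - 4, 2 * y)


def grad_g(x, y):
    """Gradient of g(x, y) = 4x^2 + 9y^2."""
    return (8 * x, 18 * y)


def cross(u, v):
    """2D cross product; it is zero exactly when u and v are parallel."""
    return u[0] * v[1] - u[1] * v[0]


for point in [(3, 0), (-3, 0), (0, 2)]:
    print(point, cross(grad_f(*point), grad_g(*point)))
```

The cross product vanishes at $(3, 0)$ and $(-3, 0)$ but not at $(0, 2)$, in line with the solution of the example.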



The Lagrange’s Multiplier method also works for three-variable functions, yet the system
of equations may be more complicated. Let’s look at an example:

■ Example 2.14 Find the distance from $(0, 0, 0)$ to the plane $2x + 3y + 4z = 29$ using Lagrange’s Multiplier.

■ Solution The distance from a point $P$ to a plane is defined to be the shortest possible
distance between the given point $P$ and any point $Q$ on the plane. Let’s first formulate this
problem in a mathematical way. We want to:
\[ \text{minimize } \sqrt{x^2 + y^2 + z^2} \quad \text{subject to constraint } 2x + 3y + 4z = 29. \]
However, to minimize $\sqrt{x^2 + y^2 + z^2}$ amounts to calculating $\nabla \sqrt{x^2 + y^2 + z^2}$. As you
can imagine, it would be messy. It is useful to observe that minimizing $\sqrt{x^2 + y^2 + z^2}$ is
equivalent to minimizing $x^2 + y^2 + z^2$, i.e. the square of the distance from the origin. The
latter is much easier to handle. Let:
\[ f(x, y, z) = x^2 + y^2 + z^2, \qquad g(x, y, z) = 2x + 3y + 4z, \]
then the constraint is the level set $g = 29$. Set up the Lagrange’s Multiplier system $\nabla f = \lambda \nabla g$
as in previous examples:
\[
\begin{aligned}
2x &= 2\lambda \\
2y &= 3\lambda \\
2z &= 4\lambda \\
2x + 3y + 4z &= 29
\end{aligned}
\]
Then $x = \lambda$, $y = \frac{3\lambda}{2}$ and $z = 2\lambda$. Substituting them into the constraint equation, we get:
\[ 2\lambda + \frac{9\lambda}{2} + 8\lambda = 29. \]
It is easy to see that $\lambda = 2$, and therefore $(x, y, z) = (2, 3, 4)$. It gives the unique critical point.
It is intuitive that the minimum point must exist in this problem (and there is no maximum
point), so this unique critical point must give the minimum point. Since $f(2, 3, 4) = 29$, the
distance from $(0, 0, 0)$ to the plane is $\sqrt{29}$.
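The same computation works for any plane $ax + by + cz = d$, which gives a quick numerical check of Example 2.14. This is a sketch under stated assumptions: the variable names are illustrative, and the formula `lam = 2 * d / norm_sq` comes from substituting $x = a\lambda/2$, $y = b\lambda/2$, $z = c\lambda/2$ into the constraint.

```python
import math

a, b, c, d = 2.0, 3.0, 4.0, 29.0        # the plane ax + by + cz = d
norm_sq = a * a + b * b + c * c         # squared length of the normal <a, b, c>

# From 2x = a*lam, 2y = b*lam, 2z = c*lam and the constraint:
# (a^2 + b^2 + c^2) * lam / 2 = d
lam = 2 * d / norm_sq
closest = (a * lam / 2, b * lam / 2, c * lam / 2)
distance = math.sqrt(sum(t * t for t in closest))

print(lam, closest, distance)
```

The result agrees with the usual point-to-plane distance $|d| / \sqrt{a^2 + b^2 + c^2}$ for the origin.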

2.8 Optimizations
In this section we will study some examples of optimization using the gradient and/or
Lagrange’s Multiplier methods.

■ Example 2.15 Many airlines require that the sum of length, width and height of a checked
baggage cannot exceed 62 inches. Find the dimensions of the rectangular baggage that has
the greatest possible volume under this regulation.

■Solution Denote l, w, h to be the length, width and height respectively. We need to maximize
the volume of the baggage, which is given by:

V (l, w, h) = lwh (cubic inches).

The constraint is $l + w + h \leq 62$ (inches), but it is intuitively clear that in order to maximize the
volume, the sum $l + w + h$ had better be as large as possible. Define $g(l, w, h) = l + w + h$,
then the constraint can be regarded as the level set $g = 62$. Set up the Lagrange’s Multiplier
system:
\[ \nabla V = \lambda \nabla g, \qquad g(l, w, h) = 62 \]
which is equivalent to
\[ wh = \lambda, \qquad lh = \lambda, \qquad lw = \lambda, \qquad l + w + h = 62. \]
Although it is not too difficult to solve them by hand, let’s type the following command in
Mathematica to solve them:
Solve[{w h == L, l h == L, l w == L, l + w + h == 62}, {l, w, h, L}]

These are all critical points:
\[ (l, w, h) = \left(\frac{62}{3}, \frac{62}{3}, \frac{62}{3}\right), \ (0, 0, 62), \ (0, 62, 0), \ (62, 0, 0). \]
Only the first one is physically relevant. Therefore, the rectangular baggage with the largest
volume under this restriction is a cube!
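As a cross-check that does not rely on Mathematica, one can search a grid of boxes with $l + w + h = 62$. The sketch below is an illustrative brute-force search, not the Lagrange's Multiplier method: the grid resolution `n` is an arbitrary choice (a multiple of 3, so that $62/3$ is a grid value), and integer indices are compared so that no rounding error affects the choice of the best box.

```python
n = 300                                    # grid: l = 62*i/n, w = 62*j/n, h = 62*k/n

# All index triples (i, j, k) with i + j + k = n, i.e. l + w + h = 62
best = max(
    ((i, j, n - i - j) for i in range(n + 1) for j in range(n + 1 - i)),
    key=lambda t: t[0] * t[1] * t[2],      # the volume is proportional to i*j*k
)
l, w, h = (62 * t / n for t in best)
volume = l * w * h

print(best, (l, w, h), volume)
```

The search settles on equal side lengths, matching the critical point $\left(\frac{62}{3}, \frac{62}{3}, \frac{62}{3}\right)$.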

■ Example 2.16 Three cities A, B and C are located at (5, 2), (−4, 4) and (−1, −3) respectively
on the ( x, y)-plane. There is a railtrack whose equation is y = x3 + 1, and a station is going
to be built on the track so that the sum of squares of the distances from each city to the
station is minimized. Find the coordinates of the station.

■ Solution The quantity to be minimized is
\[ f(x, y) = \underbrace{(x - 5)^2 + (y - 2)^2}_{\text{distance}^2 \text{ from station to city } A} + \underbrace{(x + 4)^2 + (y - 4)^2}_{\text{distance}^2 \text{ from station to city } B} + \underbrace{(x + 1)^2 + (y + 3)^2}_{\text{distance}^2 \text{ from station to city } C}. \]
The constraint is that the station has to be on the track, i.e. $y = x^3 + 1$. Define $g(x, y) = y - x^3$,
then the constraint can be written as $g(x, y) = 1$. Set up the Lagrange’s Multiplier system
$\nabla f = \lambda \nabla g$ and $g(x, y) = 1$:
\[
\begin{aligned}
2(x - 5) + 2(x + 4) + 2(x + 1) &= -3\lambda x^2 \\
2(y - 2) + 2(y - 4) + 2(y + 3) &= \lambda \\
y - x^3 &= 1
\end{aligned}
\]
The first two equations simplify to $6x = -3\lambda x^2$ and $6y - 6 = \lambda$. Since $y = x^3 + 1$, we have $\lambda = 6x^3$, and so $6x = -18x^5$, whose only real solution is $x = 0$.
Solving the system, we get $(x, y) = (0, 1)$. Therefore, the station should be located at $(0, 1)$
in order to minimize the sum of squares of the distances.
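Because the track is a graph $y = x^3 + 1$, the answer to Example 2.16 can also be checked by scanning along the track. This is a hedged numerical sketch, not the method used in the solution: the scan range $-3 \leq x \leq 3$ and the step $0.001$ are arbitrary choices, and `total_sq_dist` is a hypothetical helper name.

```python
def total_sq_dist(x):
    """Sum of squared distances from (x, x^3 + 1) to the three cities."""
    y = x**3 + 1                                  # point on the track
    cities = [(5, 2), (-4, 4), (-1, -3)]
    return sum((x - cx)**2 + (y - cy)**2 for cx, cy in cities)


xs = [i / 1000 for i in range(-3000, 3001)]       # scan -3 <= x <= 3
best_x = min(xs, key=total_sq_dist)
station = (best_x, best_x**3 + 1)

print(station, total_sq_dist(best_x))
```

The scan returns the station $(0, 1)$, where the sum of squared distances equals 68.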

■ Example 2.17 — Least Square Approximation. Given a set of data points:

( x1 , y1 ), . . . , ( x N , y N )

on the xy-plane. Find the straight-line y = mx + c such that the sum of squares of distances
between each ( xi , yi ) and ( xi , mxi + c) is minimized.

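■ Solution The quantity to be minimized is:
\[ f(m, c) = \sum_{i=1}^{N} (y_i - m x_i - c)^2. \]
Note that $(x_i, y_i)$’s are given so they should be regarded as constants. The variables are $m$
and $c$. Note that there is no constraint for $m$ and $c$, so we can simply solve $\nabla f(m, c) = 0$ for
critical points.
\[
\begin{aligned}
\frac{\partial f}{\partial m} &= -2 \sum_{i=1}^{N} (y_i - m x_i - c)\, x_i = -2 \left( \sum_{i=1}^{N} x_i y_i - m \sum_{i=1}^{N} x_i^2 - c \sum_{i=1}^{N} x_i \right) \\
\frac{\partial f}{\partial c} &= -2 \sum_{i=1}^{N} (y_i - m x_i - c) = -2 \left( \sum_{i=1}^{N} y_i - m \sum_{i=1}^{N} x_i - cN \right)
\end{aligned}
\]
Set $\frac{\partial f}{\partial m} = \frac{\partial f}{\partial c} = 0$, regarding all $x_i$’s and $y_i$’s to be constants, then:
\[
\begin{aligned}
Am + Bc &= E \\
Bm + Nc &= F
\end{aligned}
\]
where $A = \sum_{i=1}^{N} x_i^2$, $B = \sum_{i=1}^{N} x_i$, $E = \sum_{i=1}^{N} x_i y_i$ and $F = \sum_{i=1}^{N} y_i$. By solving the system
carefully, one should get:
\[
\begin{aligned}
m &= \frac{BF - EN}{B^2 - AN} = \frac{\left(\sum x_i\right)\left(\sum y_i\right) - N\left(\sum x_i y_i\right)}{\left(\sum x_i\right)^2 - N \sum x_i^2} \\
c &= \frac{BE - AF}{B^2 - AN} = \frac{\left(\sum x_i\right)\left(\sum x_i y_i\right) - \left(\sum x_i^2\right)\left(\sum y_i\right)}{\left(\sum x_i\right)^2 - N \sum x_i^2}
\end{aligned}
\]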

It is quite intuitive that this pair of m and c should minimize f since f ≥ 0 and so a minimum
must exist.
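The formulas for $m$ and $c$ translate directly into code. The following is a minimal sketch using the notation $A$, $B$, $E$, $F$ from the solution; the function name `least_squares_line` and the sample data are illustrative assumptions, and the code assumes the $x_i$ are not all equal (otherwise $B^2 - AN = 0$).

```python
def least_squares_line(points):
    """Return (m, c) of the least-squares line y = m x + c through the points."""
    n = len(points)
    a = sum(x * x for x, _ in points)      # A = sum of x_i^2
    b = sum(x for x, _ in points)          # B = sum of x_i
    e = sum(x * y for x, y in points)      # E = sum of x_i y_i
    f = sum(y for _, y in points)          # F = sum of y_i
    denom = b * b - a * n                  # B^2 - AN, zero if all x_i coincide
    m = (b * f - e * n) / denom
    c = (b * e - a * f) / denom
    return m, c


# Points lying exactly on y = 2x + 1 should recover m = 2 and c = 1
print(least_squares_line([(0, 1), (1, 3), (2, 5)]))
# Points that are not collinear
print(least_squares_line([(0, 0), (1, 1), (2, 1)]))
```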

■ Example 2.18 Let $f(x, y) = x^2 - 4x + y^2 + 9$ (which was considered in Example 2.13 in the
previous section). Find the absolute maximum and absolute minimum of $f$ restricted to the
domain $4x^2 + 9y^2 \leq 36$.

■ Solution The Lagrange’s Multiplier method finds the boundary critical points on
$4x^2 + 9y^2 = 36$. For the interior $4x^2 + 9y^2 < 36$, the critical points are simply solutions to
$\nabla f = 0$. The general procedure for an optimization problem with a solid domain is:
1. Find all interior critical points by solving $\nabla f = 0$;
2. Find all boundary critical points using Lagrange’s Multiplier;
3. Evaluate $f$ at each critical point found, and look for the points that give the greatest/lowest
value of $f$.
Interior: Set ∇ f = 0, we get:

2x − 4 = 0
2y = 0

Therefore, the only interior critical point is $(2, 0)$, which is easily checked to lie in the
given region.
Boundary: The boundary is the ellipse $4x^2 + 9y^2 = 36$. We have already found in Example
2.13, using Lagrange’s Multiplier, that the boundary critical points are $(3, 0)$ and $(-3, 0)$.
Finally, evaluate f at each critical point found:

f (2, 0) = 5
f (3, 0) = 6
f (−3, 0) = 30

Therefore, the absolute minimum is 5 (attained at (2, 0)), and the absolute maximum is
30 (attained at (−3, 0)).
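The last step of the procedure, comparing the values of $f$ at all candidates, is easy to mechanize. The sketch below is illustrative only: the candidate list is copied by hand from the solution, and `in_domain` is a hypothetical helper that re-checks that each candidate lies in the region.

```python
def f(x, y):
    return x**2 - 4*x + y**2 + 9


def in_domain(x, y):
    """The solid ellipse 4x^2 + 9y^2 <= 36."""
    return 4 * x**2 + 9 * y**2 <= 36


# Interior critical point (2, 0) and boundary critical points (3, 0), (-3, 0)
candidates = [(2, 0), (3, 0), (-3, 0)]
values = {p: f(*p) for p in candidates if in_domain(*p)}

abs_min = min(values, key=values.get)
abs_max = max(values, key=values.get)
print("absolute minimum:", values[abs_min], "at", abs_min)
print("absolute maximum:", values[abs_max], "at", abs_max)
```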
3 — Multiple Integrations

“Mathematics is not about numbers, equations, computations, or algorithms: it
is about understanding”

William Thurston

3.1 Double Integrals in Rectangular Coordinates


In single-variable calculus, integration is used to find the area under the graph of a function $f(x)$.
In this chapter, we will generalize the concept of integration to multivariable functions. There
are many applications of multiple integrals in the sciences, including deriving moments of inertia,
probability, and, in a later part of the course, finding surface area and surface flux.
Computations of multivariable integrals are not much different from those of single-variable
integrals, but setting up a multivariable integral involves a lot more geometric intuition.
Let’s look at some computations first before we explain the geometry of these integrals.

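■ Example 3.1 Compute the following double integral:
\[ \int_{y=1}^{y=2} \int_{x=0}^{x=1} (4 - x - y^2 x)\,dx\,dy. \]

■ Solution A double integral consists of an inner integral and an outer integral:
\[ \overbrace{\int_{y=1}^{y=2} \underbrace{\int_{x=0}^{x=1} (4 - x - y^2 x)\,dx}_{\text{inner}} \,dy}^{\text{outer}}. \]
When computing the inner integral (which is with respect to $x$ in this example), we regard all
other variable(s) (i.e. $y$) to be constant(s):
\[
\begin{aligned}
\int_{y=1}^{y=2} \int_{x=0}^{x=1} (4 - x - y^2 x)\,dx\,dy
&= \int_{y=1}^{y=2} \left[ 4x - \frac{x^2}{2} - \frac{y^2 x^2}{2} \right]_{x=0}^{x=1} dy \\
&= \int_{y=1}^{y=2} \left( \left( 4 - \frac{1}{2} - \frac{y^2}{2} \right) - 0 \right) dy \\
&= \int_{y=1}^{y=2} \left( \frac{7}{2} - \frac{y^2}{2} \right) dy = \left[ \frac{7y}{2} - \frac{y^3}{6} \right]_{y=1}^{y=2} = \frac{7}{3}.
\end{aligned}
\]
It is worthwhile to note that if we switch the inner and outer integrals, the final answer is the
same!
\[
\begin{aligned}
\int_{x=0}^{x=1} \int_{y=1}^{y=2} (4 - x - y^2 x)\,dy\,dx
&= \int_{x=0}^{x=1} \left[ 4y - xy - \frac{y^3 x}{3} \right]_{y=1}^{y=2} dx \\
&= \int_{x=0}^{x=1} \left( \left( 8 - 2x - \frac{8x}{3} \right) - \left( 4 - x - \frac{x}{3} \right) \right) dx \\
&= \int_{x=0}^{x=1} \left( 4 - \frac{10x}{3} \right) dx = \frac{7}{3}.
\end{aligned}
\]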
It is not a coincidence! Let’s explain why it is true by learning the geometric meaning of double
integrals. Consider the integral:
\[ \int_{x=a}^{x=b} \int_{y=c}^{y=d} f(x, y)\,dy\,dx. \]
The inner integral:
\[ A(x) := \int_{y=c}^{y=d} f(x, y)\,dy \]
is an integral with respect to $y$ keeping $x$ fixed. This quantity represents the area under
the curve obtained by moving along the $y$-direction from $y = c$ to $y = d$ on the surface
$z = f(x, y)$, while keeping $x$ unchanged. See Figure 3.1.

Figure 3.1: geometric meaning of a double integral

The outer integral integrates the inner integral $A(x)$ from $x = a$ to $x = b$, i.e.
\[ \int_{x=a}^{x=b} \int_{y=c}^{y=d} f(x, y)\,dy\,dx = \int_{x=a}^{x=b} A(x)\,dx. \]
Since $A(x)\,dx$ can be thought of as the volume of a solid slice with width $dx$ and cross-section
area $A(x)$, integrating $A(x)\,dx$ means adding up the volumes of these thin slices, and so
the double integral
\[ \int_{x=a}^{x=b} A(x)\,dx \]
is the volume under the graph $z = f(x, y)$ over the base rectangle bounded by $x = a$, $x = b$,
$y = c$ and $y = d$. It is important to understand the geometric meanings of the inner and outer
integrals in order to set up a double integral correctly.
As a double integral represents the volume of a solid, one should not expect there is any
difference if we slice the solid in a different way. For instance, to find the volume under the
graph $z = 6 - 2x - y$ over the rectangular region $0 \leq x \leq 1$ and $0 \leq y \leq 2$, one can set up the
double integral in either way:
\[ \int_{x=0}^{x=1} \underbrace{\int_{y=0}^{y=2} (6 - 2x - y)\,dy}_{A(x)} \,dx \qquad \text{(see Figure 3.2a)} \]
\[ \int_{y=0}^{y=2} \underbrace{\int_{x=0}^{x=1} (6 - 2x - y)\,dx}_{A(y)} \,dy \qquad \text{(see Figure 3.2b)} \]

(a) slices with fixed x (b) slices with fixed y

Figure 3.2: volume under the same graph

Readers should verify that the above integrals indeed give the same value (the answer
is 8). In general, the following Fubini’s Theorem asserts that switching $dx$ and $dy$ (and the
corresponding integral signs) gives the same double integral. Although the statement of the
theorem is geometrically intuitive, the proof is not easy and is beyond the scope of this course.
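The agreement of the two orders can also be observed numerically. The sketch below approximates each iterated integral over a rectangle with the midpoint rule; it is an illustration only, with a hypothetical helper `midpoint_double` and an arbitrary number of subdivisions `n`, and it is not a proof of the theorem.

```python
def midpoint_double(f, x_range, y_range, x_first, n=200):
    """Midpoint-rule iterated integral of f(x, y) over a rectangle.

    x_first=True integrates in x on the inside and then in y (order dx dy);
    x_first=False integrates in y on the inside and then in x (order dy dx).
    """
    (a, b), (c, d) = x_range, y_range
    dx, dy = (b - a) / n, (d - c) / n
    xs = [a + (i + 0.5) * dx for i in range(n)]
    ys = [c + (j + 0.5) * dy for j in range(n)]
    if x_first:
        return sum(sum(f(x, y) for x in xs) * dx for y in ys) * dy
    return sum(sum(f(x, y) for y in ys) * dy for x in xs) * dx


ex31 = lambda x, y: 4 - x - y**2 * x          # integrand of Example 3.1
plane = lambda x, y: 6 - 2 * x - y            # the graph z = 6 - 2x - y

print(midpoint_double(ex31, (0, 1), (1, 2), x_first=True))    # close to 7/3
print(midpoint_double(ex31, (0, 1), (1, 2), x_first=False))   # close to 7/3
print(midpoint_double(plane, (0, 1), (0, 2), x_first=True))   # close to 8
print(midpoint_double(plane, (0, 1), (0, 2), x_first=False))  # close to 8
```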

Theorem 3.1 — Fubini’s Theorem for Rectangular Regions. Let f ( x, y) be a continuous function
over a rectangular region $a \leq x \leq b$ and $c \leq y \leq d$, then:
\[ \int_{y=c}^{y=d} \int_{x=a}^{x=b} f(x, y)\,dx\,dy = \int_{x=a}^{x=b} \int_{y=c}^{y=d} f(x, y)\,dy\,dx. \]

Since the order of integration (i.e. $dx\,dy$ or $dy\,dx$) determines the order of the integral signs,
the above two double integrals can simply be written as:
\[ \int_c^d \int_a^b f(x, y)\,dx\,dy = \int_a^b \int_c^d f(x, y)\,dy\,dx. \]

Even simpler, one may denote $dA := dx\,dy$ or $dy\,dx$ and the rectangular region by $R$. Then, we
can write the integral as:
\[ \iint_R f(x, y)\,dA. \]
When setting up a double integral to find the volume of the solid under a graph z = f ( x, y),
it is worthwhile to observe that the lower and upper limits of the integral are not affected by
the function f ( x, y). Therefore, in order to interpret a double integral in a geometric way, one
may simply draw the base region (or in other words, the top-down view) instead of drawing
the solid in the three-dimensional space.

(a) top-down view of Figure 3.2a: each slice enters at $y = 0$ and leaves at $y = 2$ (b) top-down view of Figure 3.2b: each slice enters at $x = 0$ and leaves at $x = 1$

Figure 3.3: the red arrows represent the cross-section slices in Figures 3.2a and 3.2b.

3.2 Fubini’s Theorem for General Regions


In this section, we demonstrate some examples of double integrals whose base regions are
general regions such as triangles.

■ Example 3.2 Find the volume of the solid under the plane $z = 3 - x - y$ over the triangular
region $R$ bounded by the $x$-axis, $x = 1$ and $y = x$.

■ Solution First we choose an order of integration, say $dy\,dx$. The inner integral should
calculate the area of slices with $y$ varying and $x$ fixed. Since the height $3 - x - y$ of the solid
does not affect how we set up the upper/lower limits, we consider the top-down view of the
solid (see Figure 3.4).
The red strip in Figure 3.4b represents a sample slice with fixed $x$. The strip enters at
$y = 0$ and leaves at $y = x$. Hence, the area of this slice is:
\[ \int_{y=0}^{y=x} \underbrace{(3 - x - y)}_{\text{height}} \,dy. \]
“Summing up” the area of these slices, we integrate by $dx$ over the range of $x$: $0 \leq x \leq 1$ as
shown in Figure 3.4b, i.e.
\[ \int_{x=0}^{x=1} \underbrace{\int_{y=0}^{y=x} (3 - x - y)\,dy}_{\text{inner integral}} \,dx. \]
It will give the volume of the solid as required in this problem. The rest of the task is to
compute the integral:
\[
\begin{aligned}
\int_{x=0}^{x=1} \int_{y=0}^{y=x} (3 - x - y)\,dy\,dx
&= \int_{x=0}^{x=1} \left[ 3y - xy - \frac{y^2}{2} \right]_{y=0}^{y=x} dx \\
&= \int_{x=0}^{x=1} \left( 3x - x^2 - \frac{x^2}{2} \right) dx \\
&= \int_0^1 \left( 3x - \frac{3x^2}{2} \right) dx \\
&= \left[ \frac{3x^2}{2} - \frac{x^3}{2} \right]_0^1 \\
&= 1.
\end{aligned}
\]
Alternatively, we can also integrate first by $dx$ then by $dy$. Then, the inner integral is
represented by the red strip in Figure 3.4c. It enters at $x = y$ and leaves at $x = 1$. The double
integral is therefore:
\[ \int_{y=0}^{y=1} \int_{x=y}^{x=1} (3 - x - y)\,dx\,dy. \]
Readers should compute as an exercise that the answer is again 1.



Figure 3.4: the graph and the top-down views of the function in Example 3.2
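Both set-ups in Example 3.2 can be approximated numerically with inner limits that depend on the outer variable. The sketch below is a hedged illustration: `iterated` is a hypothetical midpoint-rule helper, its limit arguments are passed as functions of the outer variable, and `n` is an arbitrary resolution.

```python
def iterated(f, lo, hi, inner_lo, inner_hi, n=200):
    """Midpoint rule for an iterated integral of f(outer, inner).

    The outer variable runs over [lo, hi]; for each outer value u the inner
    variable runs over [inner_lo(u), inner_hi(u)].
    """
    du = (hi - lo) / n
    total = 0.0
    for i in range(n):
        u = lo + (i + 0.5) * du
        a, b = inner_lo(u), inner_hi(u)
        dv = (b - a) / n
        total += sum(f(u, a + (j + 0.5) * dv) for j in range(n)) * dv * du
    return total


# Order dy dx: for each x in [0, 1], the strip enters at y = 0 and leaves at y = x
dydx = iterated(lambda x, y: 3 - x - y, 0, 1, lambda x: 0, lambda x: x)
# Order dx dy: for each y in [0, 1], the strip enters at x = y and leaves at x = 1
dxdy = iterated(lambda y, x: 3 - x - y, 0, 1, lambda y: y, lambda y: 1)

print(dydx, dxdy)   # both should be close to 1
```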

The fact that we have the freedom to choose our order of integration is guaranteed by the
Fubini’s Theorem, whose proof is again beyond the scope of this course.
Theorem 3.2 — Fubini’s Theorem for General Regions. Let R be a region on the xy-plane and
suppose $f(x, y)$ is a continuous function on $R$, then
\[ \iint_R f(x, y)\,dx\,dy = \iint_R f(x, y)\,dy\,dx \]
where the lower/upper limits of each integral are set up according to the region $R$.

Notation As there is no difference between $dx\,dy$ and $dy\,dx$ as long as the upper and lower
limits are set according to the same region, we may simply write:
\[ dA = dx\,dy \text{ or } dy\,dx \]

Although choosing the dxdy-order will yield the same result as the dydx-order, it happens
often that one order is easier while the other one is harder. Let’s look at the following
example:

■ Example 3.3 Let $R$ be the region in the first quadrant of the $xy$-plane bounded by the unit
circle $x^2 + y^2 = 1$ and the straight line $x + y = 1$. Evaluate the integral
\[ \iint_R \sqrt{1 - x^2}\,dA. \]

■ Solution First choose the order of integration. It seems that integrating $\sqrt{1 - x^2}$
by $dx$ involves trig substitutions that we want to avoid if possible. Let’s try integrating first
by $dy$ then by $dx$ (to see if there is any luck).
Set up the double integral according to the top-down view of the solid (Figure 3.5):
\[ \int_{x=0}^{x=1} \int_{y=1-x}^{y=\sqrt{1-x^2}} \sqrt{1 - x^2}\,dy\,dx. \]
As $x$ is regarded as a constant when dealing with the inner integral, we can easily see that:
\[
\begin{aligned}
\int_{x=0}^{x=1} \int_{y=1-x}^{y=\sqrt{1-x^2}} \sqrt{1 - x^2}\,dy\,dx
&= \int_{x=0}^{x=1} \left[ y\sqrt{1 - x^2} \right]_{y=1-x}^{y=\sqrt{1-x^2}} dx \\
&= \int_{x=0}^{x=1} \left( (1 - x^2) - (1 - x)\sqrt{1 - x^2} \right) dx \\
&= \int_0^1 \left( 1 - x^2 - \sqrt{1 - x^2} + x\sqrt{1 - x^2} \right) dx
\end{aligned}
\]
The only two difficult parts are
\[ \int_0^1 \sqrt{1 - x^2}\,dx \quad \text{and} \quad \int_0^1 x\sqrt{1 - x^2}\,dx. \]
The former can be evaluated by the substitution $x = \sin\theta$, while the latter can be done by
substituting $u = 1 - x^2$. Readers should complete the rest of the computations as an exercise.
The final answer should be $1 - \frac{\pi}{4}$.
We need a trig substitution anyway, but it is easier than doing a substitution for the inner
integral.

Figure 3.5: top-down view of the region in Example 3.3

In the previous example, we see that although Fubini’s Theorem tells us that in theory we
can choose our favorite order of integration, in practice we sometimes have to make a smart
choice. In the next example, one order gives an integral
which cannot be computed in closed form, while the other is extremely easy.

■ Example 3.4 Evaluate the following double integral:
\[ \int_0^1 \int_y^1 \frac{\sin x}{x}\,dx\,dy. \]

■ Solution The integrand $\frac{\sin x}{x}$ does not have a simple antiderivative when integrating by $dx$!
Let’s switch the order of integration first.
The region corresponding to the double integral is formed by strips entering at $x = y$ and
leaving at $x = 1$. The range of $y$ is from 0 to 1. A sketch of the diagram can be found in
Figure 3.6.
Switching the order of integration, the strip for each $x$ enters at $y = 0$ and leaves at $y = x$.
Therefore, Fubini’s Theorem says:
\[ \int_0^1 \int_y^1 \frac{\sin x}{x}\,dx\,dy = \int_0^1 \int_0^x \frac{\sin x}{x}\,dy\,dx. \]
The RHS is much easier to compute:
\[
\begin{aligned}
\int_0^1 \int_0^x \frac{\sin x}{x}\,dy\,dx
&= \int_0^1 \left[ \frac{\sin x}{x}\, y \right]_{y=0}^{y=x} dx \\
&= \int_0^1 \left( \frac{\sin x}{x}\, x - 0 \right) dx \\
&= \int_0^1 \sin x\,dx = \left[ -\cos x \right]_0^1 = 1 - \cos 1.
\end{aligned}
\]

Figure 3.6: the region of integration in Example 3.4
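Although the original order in Example 3.4 has no elementary antiderivative, both orders can still be approximated numerically and compared. This is a hedged sketch using a hypothetical midpoint-rule helper `iterated`; the midpoints never hit $x = 0$, so $\frac{\sin x}{x}$ is always defined. Both values should be close to $\int_0^1 \sin x\,dx = \left[-\cos x\right]_0^1 = 1 - \cos 1$.

```python
import math


def iterated(f, lo, hi, inner_lo, inner_hi, n=200):
    """Midpoint rule for an iterated integral of f(outer, inner)."""
    du = (hi - lo) / n
    total = 0.0
    for i in range(n):
        u = lo + (i + 0.5) * du
        a, b = inner_lo(u), inner_hi(u)
        dv = (b - a) / n
        total += sum(f(u, a + (j + 0.5) * dv) for j in range(n)) * dv * du
    return total


def sinc(x):
    return math.sin(x) / x


# Original order dx dy: outer y in [0, 1], inner x in [y, 1]
dxdy = iterated(lambda y, x: sinc(x), 0, 1, lambda y: y, lambda y: 1)
# Switched order dy dx: outer x in [0, 1], inner y in [0, x]
dydx = iterated(lambda x, y: sinc(x), 0, 1, lambda x: 0, lambda x: x)

print(dxdy, dydx, 1 - math.cos(1))
```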

If the region of integration is the shaded triangle below, which order of integration is better?
\[ \iint_R f(x, y)\,dx\,dy \quad \text{or} \quad \iint_R f(x, y)\,dy\,dx, \]
where $f(x, y)$ is not very complicated, say $f(x, y) = x^2 y$.

[Figure: the shaded triangle in the $xy$-plane, with edges labelled $y = x$, $y = 1$ and $y = 8 - x$]

3.3 Double Integrals in Polar Coordinates


When the region of integration is circular in shape, or the integrand is rotationally symmetric,
it is often more convenient to use polar coordinates to set up the integral instead of
rectangular coordinates. We will see in some examples in this section that some tedious trig
substitutions can be avoided if polar coordinates are used.
Recall that the polar coordinates (r, θ ) and the rectangular coordinates ( x, y) are related by
the following rules:
x = r cos θ
y = r sin θ
Here r is the distance from the point to the origin, and θ is the angle made with the positive
x-axis.
In rectangular coordinates, the region defined by inequalities like $a \leq x \leq b$ and $c \leq y \leq d$,
where $a$, $b$, $c$ and $d$ are constants, describes a rectangle. We have seen that it is very easy to
set-up a double integral when the region is a rectangle.
In polar coordinates, regions defined by inequalities like a ≤ r ≤ b and α ≤ θ ≤ β, where a,
b, α and β are constants, describe a fan shape or a circular sector (see Figure 3.7). It is wise
to use polar coordinates instead of rectangular coordinates when the region of integration is
given by one of these circular shapes.

Figure 3.7: examples of regions good for polar coordinates

To set up a double integral over a region $a \leq r \leq b$ and $\alpha \leq \theta \leq \beta$ using polar coordinates,
the upper/lower limits are simply:
\[ \int_{\theta=\alpha}^{\theta=\beta} \int_{r=a}^{r=b} \quad \text{or} \quad \int_{r=a}^{r=b} \int_{\theta=\alpha}^{\theta=\beta} \]
depending on the order of integration $dr\,d\theta$ or $d\theta\,dr$. However, one should be very cautious that
while $dA = dx\,dy$ in rectangular coordinates, it is instead:
\[ dA = r\,dr\,d\theta \text{ or } r\,d\theta\,dr \]
in polar coordinates. We will explain why it is so after learning a few examples.

■ Example 3.5 Evaluate the integral:
\[ \iint_R (x^2 + y^2)\,dA \]
where $R$ is the semicircular region bounded by the $x$-axis and the curve $y = \sqrt{1 - x^2}$. See
Figure 3.8.

■ Solution The problem is extremely difficult to do in rectangular coordinates. If one attempts
to set it up using $xy$-coordinates, one will get the following double integral:
\[ \int_{-1}^{1} \int_0^{\sqrt{1-x^2}} (x^2 + y^2)\,dy\,dx. \]
After computing the inner integral, one should get:
\[ \int_{-1}^{1} \left( x^2\sqrt{1 - x^2} + \frac{(1 - x^2)^{3/2}}{3} \right) dx. \]
However, things go much better if we switch to polar coordinates, since the region is in
the form $a \leq r \leq b$ and $\alpha \leq \theta \leq \beta$. From Figure 3.8, the region is defined by:
\[ 0 \leq r \leq 1, \qquad 0 \leq \theta \leq \pi. \]
Keeping in mind that $dA = r\,dr\,d\theta$, the required integral is:
\[
\begin{aligned}
\iint_R (x^2 + y^2)\,dA
&= \int_{\theta=0}^{\theta=\pi} \int_{r=0}^{r=1} \underbrace{\left( (r\cos\theta)^2 + (r\sin\theta)^2 \right)}_{x^2 + y^2} r\,dr\,d\theta \\
&= \int_{\theta=0}^{\theta=\pi} \int_{r=0}^{r=1} r^2(\cos^2\theta + \sin^2\theta)\, r\,dr\,d\theta \\
&= \int_{\theta=0}^{\theta=\pi} \int_{r=0}^{r=1} r^3\,dr\,d\theta \\
&= \int_{\theta=0}^{\theta=\pi} \left[ \frac{r^4}{4} \right]_{r=0}^{r=1} d\theta \\
&= \int_{\theta=0}^{\theta=\pi} \frac{1}{4}\,d\theta = \frac{\pi}{4}.
\end{aligned}
\]

Figure 3.8: region of integration for Example 3.5

Annular regions, to be discussed in the next example, are extremely clumsy using rectangular
coordinates but relatively easy using polar coordinates.

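■ Example 3.6 Evaluate the following integral:
\[ \iint_R x\,dA \]
where $R$ is an annular region shown in Figure 3.9.

■ Solution The annular region $R$ is defined by inequalities:
\[ 2 \leq r \leq 4, \qquad 0 \leq \theta \leq 2\pi. \]
Therefore,
\[
\begin{aligned}
\iint_R x\,dA &= \int_0^{2\pi} \int_2^4 r\cos\theta \cdot r\,dr\,d\theta \\
&= \int_0^{2\pi} \int_2^4 r^2\cos\theta\,dr\,d\theta \\
&= \int_0^{2\pi} \left[ \frac{r^3}{3} \cdot \cos\theta \right]_{r=2}^{r=4} d\theta \\
&= \int_0^{2\pi} \frac{56}{3}\cos\theta\,d\theta \\
&= \left[ \frac{56}{3}\sin\theta \right]_0^{2\pi} \\
&= 0
\end{aligned}
\]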
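The polar set-ups of Examples 3.5 and 3.6 can be checked with a small numerical routine. The following sketch is illustrative only: `polar_integral` is a hypothetical midpoint-rule helper that evaluates $f(r\cos\theta, r\sin\theta)$ on a grid in $(r, \theta)$ and multiplies by the factor $r$ from $dA = r\,dr\,d\theta$.

```python
import math


def polar_integral(f, r_range, theta_range, n=200):
    """Midpoint rule for the double integral of f(x, y) with dA = r dr dtheta."""
    (r0, r1), (t0, t1) = r_range, theta_range
    dr, dt = (r1 - r0) / n, (t1 - t0) / n
    total = 0.0
    for i in range(n):
        r = r0 + (i + 0.5) * dr
        for j in range(n):
            t = t0 + (j + 0.5) * dt
            # the extra factor r is what turns dr dtheta into an area element
            total += f(r * math.cos(t), r * math.sin(t)) * r * dr * dt
    return total


ex35 = polar_integral(lambda x, y: x * x + y * y, (0, 1), (0, math.pi))
ex36 = polar_integral(lambda x, y: x, (2, 4), (0, 2 * math.pi))

print(ex35, math.pi / 4)   # Example 3.5: close to pi/4
print(ex36)                # Example 3.6: close to 0
```

Dropping the factor `r` from the sum would give a different value for Example 3.5, which is a quick way to see that $dA = dr\,d\theta$ is not the right area element.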

A Notoriously Difficult Single-Variable Integral


Next we present an application of evaluating a notoriously difficult single-variable integral
using a double integral in polar coordinates.
It is famous (or infamous) that the integral:
\[ \int e^{-x^2}\,dx \]
does not have an easy explicit expression. Nonetheless, this integral is extremely important
in probability, heat flow and many physics and engineering subjects. While this integral is
generally hard to compute, it can be magically done, with the help of a double integral, when
the lower and upper limits are $0$ and $\infty$:
\[ \int_0^\infty e^{-x^2}\,dx. \]

Figure 3.9: region of integration for Example 3.6

Consider the double integral:


ˆ ∞ˆ ∞ 2 − y2
e− x dxdy.
0 0

Observing that $e^{-x^2-y^2} = e^{-x^2}e^{-y^2}$, we can regard $e^{-y^2}$ as a constant in the inner dx-integral:
\[
\int_0^\infty\int_0^\infty e^{-x^2-y^2}\,dx\,dy = \int_0^\infty\int_0^\infty e^{-x^2}e^{-y^2}\,dx\,dy = \int_0^\infty e^{-y^2}\left(\int_0^\infty e^{-x^2}\,dx\right)dy.
\]
Now $\int_0^\infty e^{-x^2}\,dx$ is a constant; in particular, it is independent of y. Therefore, with respect
to the outer dy-integral, one can factor it out to obtain:
\[
\int_0^\infty e^{-y^2}\left(\int_0^\infty e^{-x^2}\,dx\right)dy = \left(\int_0^\infty e^{-x^2}\,dx\right)\left(\int_0^\infty e^{-y^2}\,dy\right).
\]

Now the double integral is split as a product of two single-variable integrals. Note that the
two single-variable integrals are the same as x and y are merely dummy variables! Therefore,
combining everything we have shown above, we have:
\[
\int_0^\infty\int_0^\infty e^{-x^2-y^2}\,dx\,dy = \left(\int_0^\infty e^{-x^2}\,dx\right)\left(\int_0^\infty e^{-y^2}\,dy\right) = \left(\int_0^\infty e^{-x^2}\,dx\right)^2.
\]
Consequently, in order to evaluate the single-variable integral $\int_0^\infty e^{-x^2}\,dx$, one can first evaluate
the double integral
\[
\int_0^\infty\int_0^\infty e^{-x^2-y^2}\,dx\,dy
\]
and then the (positive) square root of the double integral will give the value of $\int_0^\infty e^{-x^2}\,dx$, since the integrand is positive.
In contrast to the single-variable integral, the double integral is relatively easy to find
using polar coordinates. The region of integration is the entire first quadrant which, in polar
coordinates, can be described as:

\[
0 \le r < \infty, \qquad 0 \le \theta \le \frac{\pi}{2}.
\]

Therefore,
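\[
\begin{aligned}
\int_0^\infty\int_0^\infty e^{-x^2-y^2}\,dx\,dy &= \int_0^{\pi/2}\int_0^\infty e^{-r^2}\, r\,dr\,d\theta \\
&= \int_0^{\pi/2} \left[-\frac{1}{2}e^{-r^2}\right]_{r=0}^{r=\infty} d\theta \qquad \left(\text{note that } \frac{d}{dr}e^{-r^2} = e^{-r^2}\cdot(-2r)\right) \\
&= \int_0^{\pi/2} \left(-0+\frac{1}{2}\right) d\theta \\
&= \frac{\pi}{4}.
\end{aligned}
\]
Therefore,
\[
\int_0^\infty e^{-x^2}\,dx = \sqrt{\frac{\pi}{4}} = \frac{\sqrt{\pi}}{2}.
\]
Furthermore, since $e^{-x^2}$ is an even function, we also have:
\[
\int_{-\infty}^{\infty} e^{-x^2}\,dx = \sqrt{\pi}.
\]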
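
As a sanity check of the values derived above, the following Python sketch (standard library only) approximates both the single-variable integral and the polar form of the double integral. The helper name `midpoint` and the truncation point `CUTOFF = 10` are illustrative choices made here, not part of the notes; the improper integrals are truncated because the integrands are negligible beyond that point.

```python
import math

def midpoint(f, a, b, n=2000):
    """Composite midpoint rule for the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

# Assumption: truncating at 10 is harmless since e^{-100} is negligible.
CUTOFF = 10.0

# Single-variable integral of e^{-x^2} over [0, infinity), truncated.
half_line = midpoint(lambda x: math.exp(-x * x), 0.0, CUTOFF)

# Polar form: the theta-integral contributes pi/2, the r-integral tends to 1/2.
radial = midpoint(lambda r: r * math.exp(-r * r), 0.0, CUTOFF)
quarter_plane = (math.pi / 2) * radial

print(half_line, math.sqrt(math.pi) / 2)   # both close to sqrt(pi)/2
print(quarter_plane, math.pi / 4)          # both close to pi/4
print(half_line ** 2, quarter_plane)       # square of the first matches the second
```

The last line illustrates the key identity of the argument: the square of the single-variable integral agrees with the double integral over the first quadrant.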

However, this trick does not work well for every similar definite integral, such as:
\[
\int_0^1 e^{-x^2}\,dx.
\]
Although it is still true (from a similar argument as above) that we have
\[
\left(\int_0^1 e^{-x^2}\,dx\right)^2 = \int_0^1\int_0^1 e^{-x^2-y^2}\,dx\,dy,
\]

the double integral is not “polar-friendly” since the region of integration is a square!

Explanation of the Polar Method


We end this section by explaining why dA = rdrdθ but not simply dA = drdθ.
In rectangular coordinates, the area of a region is calculated by chopping off the region into
tiny rectangular pieces with length and width denoted by the changes ∆x and ∆y. The area of
each rectangular piece is given by ∆A = ∆x · ∆y. When these rectangular pieces get smaller
and smaller, ∆x and ∆y become dx and dy, and so the area of these rectangular pieces becomes
dA = dxdy.
However, things are a bit different in polar coordinates. Instead of chopping a given
region into tiny rectangles, the region is chopped into “fan-shaped” pieces (see Figure 3.10).
Given a pair of ∆r and ∆θ, the area ∆A of the piece is not simply ∆r · ∆θ, but gets
proportionally larger when the piece is further from the origin!
Therefore, one needs to multiply ∆r · ∆θ by r in order to accurately reflect the size of ∆A.
Precisely, ∆A can be calculated as the difference between the area of an outer sector (with
radius r + ∆r) and an inner sector (with radius r):
\[
\begin{aligned}
\Delta A &= \underbrace{\frac{1}{2}(r+\Delta r)^2\,\Delta\theta}_{\text{area of outer sector}} - \underbrace{\frac{1}{2}r^2\,\Delta\theta}_{\text{area of inner sector}} \\
&= \frac{1}{2}\left(r^2 + 2r\Delta r + (\Delta r)^2 - r^2\right)\Delta\theta \\
&= r\Delta r\cdot\Delta\theta + \frac{1}{2}(\Delta r)^2\,\Delta\theta.
\end{aligned}
\]
When ∆r and ∆θ get smaller and smaller, the last term $\frac{1}{2}(\Delta r)^2\Delta\theta$ is negligible when compared
to the other term r∆r∆θ. Therefore, as the region is chopped into infinitesimal pieces, we have:
dA = rdrdθ.
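
The claim that the extra term becomes negligible can be illustrated numerically. The sketch below (the function name `sector_piece_area` and the sample radius r = 2 are choices made here for illustration) compares the exact area of a fan-shaped piece with the approximation r∆r∆θ as the piece shrinks.

```python
def sector_piece_area(r, dr, dtheta):
    """Exact area between radii r and r + dr within an angle dtheta."""
    return 0.5 * (r + dr) ** 2 * dtheta - 0.5 * r ** 2 * dtheta

r = 2.0  # hypothetical sample radius
ratios = []
for k in range(1, 6):
    dr = dtheta = 10.0 ** (-k)
    exact = sector_piece_area(r, dr, dtheta)
    approx = r * dr * dtheta
    ratios.append(exact / approx)
    print(dr, exact / approx)  # the ratio equals 1 + dr / (2 r) up to rounding
```

The ratio of the exact area to r∆r∆θ tends to 1 as ∆r shrinks, which is the sense in which dA = rdrdθ.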

Figure 3.10: area of ∆A gets larger when r gets larger.

3.4 Triple Integrals in Rectangular Coordinates


We now move one dimension up and talk about triple integrals, whose integrands are functions
of three variables such as f ( x, y, z). An example of such an integral is:
\[
\int_0^1\int_0^y\int_z^y f(x,y,z)\,dx\,dz\,dy.
\]

Some triple integrals have important physical meanings. For instance, if f ( x, y, z) is the
density function, then the triple integral
\[
\int_0^1\int_0^y\int_z^y f(x,y,z)\,dx\,dz\,dy
\]

represents the mass of the solid (whose shape is determined by the upper/lower limits of the
integral). It is because dxdzdy represents (infinitesimal) volume, and

density × volume = mass.

In physics, some calculations such as the center of mass of an object and the moment of inertia
about an axis also involve evaluations of triple integrals.

Pillar-Base Approach
Before we see some practical applications of triple integrals, let’s learn how a triple integral is
set up. One helpful technique is the so-called pillar-base approach or pillar-shadow approach.
Let’s explain this through an example:

■ Example 3.7 Let D be the tetrahedral solid bounded by the plane z = y − x, the plane
y = 1, the xy-plane and the yz-plane (see Figure 3.11). Evaluate the triple integral:
\[
\iiint_D x^2\,dz\,dy\,dx.
\]

■Solution We demonstrate the pillar-base approach in the solution. The so-called pillar is the
orange ray labeled M in Figure 3.11, and it determines how the inner-most integral is set-up:
\[
\int_{z=?}^{z=?} x^2\,dz.
\]

Its lower and upper limits of z are determined by, respectively, where the pillar enters the
solid and where the pillar leaves the solid. According to the diagram, it enters at z = 0, i.e.
the xy-plane; and it leaves at z = y − x. Therefore, the inner-most integral should be:
\[
\int_{z=0}^{z=y-x} x^2\,dz.
\]

Note that again the integrand $x^2$ does not affect how we set up this integral.
Next we call the other two variables x and y the base variables or shadow variables.
Suppose light is coming in along the direction parallel to the pillar (i.e. the z-direction); then the
shadow of the solid appears on the base xy-plane as a triangle. The way to set up the middle
and outer-most integrals is just the same as what we did for double integrals.
Since we picked the dydx-order, we draw a sample strip L on the base in the direction of
y. This strip enters at y = x and leaves at y = 1, and therefore the middle and outer-most
integrals should be set up as:
\[
\underbrace{\int_{x=0}^{x=1}\int_{y=x}^{y=1}}_{\text{base}}\ \underbrace{\int_{z=0}^{z=y-x}}_{\text{pillar}} x^2\,dz\,dy\,dx.
\]

Finally, the easiest step is to evaluate the integral. This is as straight-forward as in double
integrals:
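\[
\begin{aligned}
\int_0^1\int_x^1\int_0^{y-x} x^2\,dz\,dy\,dx &= \int_0^1\int_x^1 \left[x^2 z\right]_{z=0}^{z=y-x} dy\,dx \\
&= \int_0^1\int_x^1 x^2(y-x)\,dy\,dx \\
&= \int_0^1 \left[\frac{x^2y^2}{2} - x^3 y\right]_{y=x}^{y=1} dx \\
&= \int_0^1 \left(\frac{x^2}{2} - x^3 - \frac{x^4}{2} + x^4\right) dx \\
&= \int_0^1 \left(\frac{x^2}{2} - x^3 + \frac{x^4}{2}\right) dx \\
&= \left[\frac{x^3}{6} - \frac{x^4}{4} + \frac{x^5}{10}\right]_0^1 \\
&= \frac{1}{6} - \frac{1}{4} + \frac{1}{10} = \frac{1}{60}.
\end{aligned}
\]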
Fubini’s Theorem also holds for triple integrals: if we evaluate the integral using a
different order of integration, say dxdydz or dydxdz (there are 6 possible orders), we get
the same value provided that the upper and lower limits are adjusted to represent the same solid.

■ Example 3.8 Let D be the tetrahedral solid in Example 3.7. Evaluate the triple integral
\[
\iiint_D x^2\,dy\,dz\,dx
\]

using the dydzdx-order.

■ Solution Now the inner-most integral is with respect to dy, meaning the pillar is pointing
along the positive y-axis. It is the ray labeled as M in Figure 3.12. It enters the solid through
the plane z = y − x, or equivalently y = x + z, and it leaves at y = 1. Therefore, the
inner-most integral should be:
\[
\int_{y=x+z}^{y=1} x^2\,dy.
\]

Figure 3.11: solid D in Example 3.7.

The middle and the outer-most variables are z and x, so the base is the shadow of the
solid on the xz-plane, which is the triangle labeled R in Figure 3.12.
Here the order of integration is dzdx, so we draw a sample strip (labeled L) along the
z-axis direction. It enters the region through z = 0 and leaves at the line x + z = 1, or
equivalently, z = 1 − x. Therefore, the whole triple integral should be set as:
\[
\int_{x=0}^{x=1}\int_{z=0}^{z=1-x}\int_{y=x+z}^{y=1} x^2\,dy\,dz\,dx.
\]

It can be verified by straight-forward computations that the answer should be the same as
we got in Example 3.7:
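\[
\begin{aligned}
\int_{x=0}^{x=1}\int_{z=0}^{z=1-x}\int_{y=x+z}^{y=1} x^2\,dy\,dz\,dx &= \int_0^1\int_0^{1-x} \left[x^2 y\right]_{y=x+z}^{y=1} dz\,dx \\
&= \int_0^1\int_0^{1-x} \left(x^2 - x^2(x+z)\right) dz\,dx \\
&= \int_0^1 \left[(x^2-x^3)z - \frac{x^2z^2}{2}\right]_{z=0}^{z=1-x} dx \\
&= \int_0^1 \left((x^2-x^3)(1-x) - \frac{x^2(1-x)^2}{2}\right) dx \\
&= \int_0^1 \left(\frac{x^2}{2} - x^3 + \frac{x^4}{2}\right) dx = \frac{1}{60} \quad \text{(same!)}
\end{aligned}
\]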

Fubini’s Theorem tells us that no matter what order of integration we choose, we
always get the same answer. Therefore, we can sometimes use the notation dV to denote the
volume element dxdydz (or any other order). A generic triple integral can be written as:
\[
\iiint_D f(x,y,z)\,dV.
\]

Although Fubini’s Theorem allows us to switch the order of integration, we sometimes need to
make a smart choice of pillar and base variables to ease our computation. Let’s look at the
next example:

Figure 3.12: the pillar-base diagram for the triple integral in Example 3.8

■ Example 3.9 Let D be the solid bounded by the paraboloid y = x2 + z2 and y = 16 − 3x2 − z2
(see Figure 3.13). Find the volume of the solid, i.e. evaluate the integral:
\[
\iiint_D 1\,dV.
\]

■ Solution After taking a careful look at the diagram, one should see that it would be a bad
choice if we chose either x or z to be the pillar variable. The solid can be decomposed into
two parts as shown in the diagram. If z were chosen to be the pillar direction, then the pillar
would enter the yellow part in a way different from how it enters the blue part. The yellow
part would have $z = \pm\sqrt{16 - 3x^2 - y}$ as the lower/upper limits, while the blue part would have
$z = \pm\sqrt{y - x^2}$. Even worse, the part near the intersection of the blue and the yellow surfaces
is geometrically complicated; it is not easy to set up the inner integral for that part.
However, life is much easier if we choose y as the pillar variable, since then the y-pillar
will enter through the blue surface and leave through the yellow surface. The shadow is an
ellipse on the xz-plane.
To set up the inner integral, we note that the y-pillar enters at $y = x^2 + z^2$ and leaves at
$y = 16 - 3x^2 - z^2$:
\[
\int_{y=x^2+z^2}^{y=16-3x^2-z^2} dy.
\]

The shadow (or base) is an ellipse, whose equation can be obtained by setting $y = x^2 + z^2 = 16 - 3x^2 - z^2$, which gives
\[
2x^2 + z^2 = 8.
\]

For the base integral, we can either choose dzdx or dxdz. For the former, the sample z-strip
enters the ellipse at $z = -\sqrt{8-2x^2}$ and leaves at $z = \sqrt{8-2x^2}$. The min/max values of x are −2
and 2 for the ellipse. Therefore, including the middle and outer integrals, the triple integral should be:
\[
\int_{x=-2}^{x=2}\int_{z=-\sqrt{8-2x^2}}^{z=\sqrt{8-2x^2}}\int_{y=x^2+z^2}^{y=16-3x^2-z^2} dy\,dz\,dx.
\]

Evaluation of this integral is straight-forward (but be careful). It is left as an exercise for
readers. The answer should be $32\pi\sqrt{2}$.
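
Readers who attempt the exercise may want a numerical value to compare against. The sketch below evaluates the set-up above approximately: the inner dy-integral is done by hand (it is just the pillar height, top minus bottom), and the remaining double integral over the elliptical shadow uses a nested midpoint rule. The helper names and the number of subdivisions are assumptions made for illustration.

```python
import math

def midpoint(f, a, b, n=400):
    """Composite midpoint rule for the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

def pillar_height(x, z):
    """Length of the y-pillar: top surface minus bottom surface."""
    return (16 - 3 * x * x - z * z) - (x * x + z * z)

def z_strip(x):
    """Integrate the pillar height along the z-strip inside 2x^2 + z^2 = 8."""
    half_width = math.sqrt(8 - 2 * x * x)
    return midpoint(lambda z: pillar_height(x, z), -half_width, half_width)

volume = midpoint(z_strip, -2.0, 2.0)
print(volume, 32 * math.pi * math.sqrt(2))  # numerical value vs. stated answer
```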

Figure 3.13: the pillar-base diagram for the triple integral in Example 3.9

3.5 Triple Integrals in Cylindrical Coordinates


In order to set up or compute a double integral over a circular region, it is often more convenient
to convert the problem into polar coordinates. For triple integrals, there are also some solids
that are not so “compatible” with rectangular coordinates but are easy to handle if one converts
the problem into cylindrical or spherical coordinates. These solids include cylinders, cones,
spheres, etc.
Cylindrical coordinates in R3 simply combine the polar coordinates for the (x, y)-directions
with the z-coordinate, which is kept unchanged. The conversion rule is given by:

x = r cos θ
y = r sin θ
z=z

where r ≥ 0, 0 ≤ θ ≤ 2π and z can be any real number. Just like polar coordinates, it is good
to keep in mind that x2 + y2 = r2 , which will sometimes simplify your calculations. Figure
3.14 explains the geometry of these conversion rules.

Figure 3.14: cylindrical coordinates

If one sets r = constant, then it describes an infinite cylinder in R3 with z-axis as the
central axis. Therefore, if the solid is cylindrical in shape, it is usually easier to set-up a triple
integral using cylindrical coordinates. Analogous to polar coordinates, the volume element dV
is:
Theorem 3.3 Under cylindrical coordinates (r, θ, z), we have:

dV = rdzdrdθ.

Let’s look at an example:

■ Example 3.10 Find the volume of the solid bounded by the two surfaces $z = 4 - 4(x^2 + y^2)$ and
$z = (x^2 + y^2)^2 - 1$, as shown in Figure 3.15.

■ Solution When using cylindrical coordinates, one often (though not always) sets z as the
pillar variable; then r and θ become the base variables.


The z-pillar enters the solid at z = ( x2 + y2 )2 − 1, which is z = r4 − 1 in cylindrical
coordinates, and leaves the solid at z = 4 − 4( x2 + y2 ), which is z = 4 − 4r2 in cylindrical
coordinates. These determine the lower and upper limit of the inner integral.
The shadow is a circle centered at the origin. To find its radius, we solve:

r4 − 1 = z = 4 − 4r2

which gives r = 1 (rearranging, $r^4 + 4r^2 - 5 = (r^2+5)(r^2-1) = 0$, and r ≥ 0). Therefore, the outer and middle integrals have upper and lower limits
given by:
\[
\int_{\theta=0}^{\theta=2\pi}\int_{r=0}^{r=1}.
\]
Combining all of the above, the volume of the solid is given by:
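\[
\begin{aligned}
\iiint_D dV &= \int_{\theta=0}^{\theta=2\pi}\int_{r=0}^{r=1}\int_{z=r^4-1}^{z=4-4r^2} 1\cdot r\,dz\,dr\,d\theta \\
&= \int_0^{2\pi}\int_0^1 \left[z\right]_{z=r^4-1}^{z=4-4r^2} r\,dr\,d\theta \\
&= \int_0^{2\pi}\int_0^1 (4 - 4r^2 - r^4 + 1)\,r\,dr\,d\theta \\
&= \int_0^{2\pi}\int_0^1 (5r - 4r^3 - r^5)\,dr\,d\theta \\
&= \int_0^{2\pi} \left[\frac{5r^2}{2} - r^4 - \frac{r^6}{6}\right]_{r=0}^{r=1} d\theta \\
&= \int_0^{2\pi} \left(\frac{5}{2} - 1 - \frac{1}{6}\right) d\theta = \frac{8\pi}{3}.
\end{aligned}
\]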

■Example 3.11 Let D be a solid cylinder of radius a with the z-axis as the central axis, bounded
by the planes $z = z_0$ and $z = z_0 + h$, where $z_0$ and h are constants. Therefore, the cylinder has
height h. Suppose the solid has uniform density δ. Evaluate the following triple integral:
\[
I_z := \iiint_D \delta(x^2+y^2)\,dV
\]

which is the moment of inertia of the solid about the z-axis.

■Solution We choose the order of integration dzdrdθ. The z-pillar enters the solid at z = z0
and leaves at z = z0 + h. The shadow is the circle with radius a centered at the origin.

Figure 3.15: the solid in Example 3.10.

Therefore,
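\[
\begin{aligned}
I_z &= \int_0^{2\pi}\int_0^a\int_{z_0}^{z_0+h} \delta\underbrace{(x^2+y^2)}_{r^2}\cdot r\,dz\,dr\,d\theta \\
&= \int_0^{2\pi}\int_0^a\int_{z_0}^{z_0+h} \delta r^3\,dz\,dr\,d\theta \\
&= \int_0^{2\pi}\int_0^a \delta r^3 h\,dr\,d\theta \\
&= \int_0^{2\pi} \left[\frac{\delta h r^4}{4}\right]_{r=0}^{r=a} d\theta \\
&= \int_0^{2\pi} \frac{\delta h a^4}{4}\,d\theta \\
&= \frac{\delta h\pi a^4}{2}.
\end{aligned}
\]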
Most physics/engineering textbooks express the moment of inertia in terms of the total
mass m rather than the density δ. To rewrite the above answer in terms of m, we note that:
\[
\delta = \frac{m}{V}
\]
where V is the total volume of the solid, which is $\pi a^2 h$. Therefore, combining with the above
calculation, we have:
\[
I_z = \frac{m}{\pi a^2 h}\cdot\frac{h\pi a^4}{2} = \frac{ma^2}{2},
\]
which is exactly what you can find in physics or engineering textbooks.
3.6 Triple Integrals in Spherical Coordinates


Another common coordinate system is the spherical coordinates. A point in R3 can be
represented by three numbers (ρ, θ, ϕ) which have the following geometric meaning:

ρ = distance from the point to the origin


θ = the projected angle on the xy-plane counting from the positive x-axis
ϕ = the angle counting from the positive z-axis

Figure 3.16: spherical coordinates

The ranges of (ρ, θ, ϕ) are:

ρ ≥ 0, 0 ≤ θ ≤ 2π, 0 ≤ ϕ ≤ π.

From standard trigonometry, one can figure out the following conversion rules:

x = ρ sin ϕ cos θ
y = ρ sin ϕ sin θ
z = ρ cos ϕ

Similar to polar and cylindrical coordinates, it is good to keep in mind that ρ2 = x2 + y2 + z2 .

Figure 3.17: coordinate planes



The equation $\rho = \rho_0$, where $\rho_0$ is a positive constant, represents the sphere of radius $\rho_0$
centered at the origin. The equation $\phi = \phi_0$ represents a cone with angle $\phi_0$ with the origin as
its vertex. Therefore, spherical coordinates are particularly useful when handling spherical
or conical objects (see Figure 3.17).
To integrate using spherical coordinates, one should note that:
Theorem 3.4 Under the spherical coordinates (ρ, θ, ϕ), the volume element is given by:

dV = ρ2 sin ϕ dρdϕdθ.

Infinitesimally, ρ2 sin ϕ dρdϕdθ is the volume of the little cube in red in Figure 3.18. The
dimensions of the little cube can be found using trigonometry.

Figure 3.18: geometric meaning of dV = ρ2 sin ϕ dρdϕdθ.

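There is a general way to derive a formula for dV for any kind of coordinate system.
Suppose each coordinate of (x, y, z) is a function of (u, v, w). The Jacobian matrix is defined to
be:
\[
\frac{\partial(x,y,z)}{\partial(u,v,w)} =
\begin{bmatrix}
\frac{\partial x}{\partial u} & \frac{\partial x}{\partial v} & \frac{\partial x}{\partial w} \\
\frac{\partial y}{\partial u} & \frac{\partial y}{\partial v} & \frac{\partial y}{\partial w} \\
\frac{\partial z}{\partial u} & \frac{\partial z}{\partial v} & \frac{\partial z}{\partial w}
\end{bmatrix}.
\]
There is a general conversion formula (whose proof is beyond the scope of the course):
\[
dx\,dy\,dz = \left|\det\frac{\partial(x,y,z)}{\partial(u,v,w)}\right| du\,dv\,dw. \tag{3.1}
\]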
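Let’s take cylindrical coordinates as an example. Here (x, y, z) are related to (r, θ, z) by the
conversion rules:
\[
x = r\cos\theta, \qquad y = r\sin\theta, \qquad z = z.
\]
Therefore, the Jacobian matrix is given by:
\[
\frac{\partial(x,y,z)}{\partial(r,\theta,z)} =
\begin{bmatrix}
\frac{\partial}{\partial r} r\cos\theta & \frac{\partial}{\partial\theta} r\cos\theta & \frac{\partial}{\partial z} r\cos\theta \\
\frac{\partial}{\partial r} r\sin\theta & \frac{\partial}{\partial\theta} r\sin\theta & \frac{\partial}{\partial z} r\sin\theta \\
\frac{\partial}{\partial r} z & \frac{\partial}{\partial\theta} z & \frac{\partial}{\partial z} z
\end{bmatrix}
=
\begin{bmatrix}
\cos\theta & -r\sin\theta & 0 \\
\sin\theta & r\cos\theta & 0 \\
0 & 0 & 1
\end{bmatrix}
\]
It is straight-forward to compute that $\det\frac{\partial(x,y,z)}{\partial(r,\theta,z)} = r\cos^2\theta - (-r\sin^2\theta) = r$. Therefore, (3.1)
shows:
dxdydz = rdrdθdz.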
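
The same recipe can be sketched for spherical coordinates. The derivation below takes (u, v, w) = (ρ, ϕ, θ); this ordering of the variables is an assumption made here to match the order dρdϕdθ in Theorem 3.4, and the determinant is expanded along the third row.

```latex
\[
\frac{\partial(x,y,z)}{\partial(\rho,\phi,\theta)} =
\begin{bmatrix}
\sin\phi\cos\theta & \rho\cos\phi\cos\theta & -\rho\sin\phi\sin\theta \\
\sin\phi\sin\theta & \rho\cos\phi\sin\theta & \rho\sin\phi\cos\theta \\
\cos\phi & -\rho\sin\phi & 0
\end{bmatrix}
\]
\[
\begin{aligned}
\det\frac{\partial(x,y,z)}{\partial(\rho,\phi,\theta)}
&= \cos\phi\,\bigl(\rho^2\cos\phi\sin\phi\cos^2\theta + \rho^2\cos\phi\sin\phi\sin^2\theta\bigr) \\
&\quad + \rho\sin\phi\,\bigl(\rho\sin^2\phi\cos^2\theta + \rho\sin^2\phi\sin^2\theta\bigr) \\
&= \rho^2\cos^2\phi\sin\phi + \rho^2\sin^3\phi \\
&= \rho^2\sin\phi .
\end{aligned}
\]
```

Since 0 ≤ ϕ ≤ π, the value ρ² sin ϕ is non-negative, and the conversion formula gives dxdydz = ρ² sin ϕ dρdϕdθ, in agreement with Theorem 3.4. With a different ordering of (ρ, ϕ, θ) the determinant may change sign, but the volume element does not.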

■Example 3.12 Consider a solid sphere S of radius r centered at the origin. Suppose it has
uniform density δ. Derive the moment of inertia about the z-axis, which is given by:
\[
I_z := \iiint_S \delta(x^2+y^2)\,dV
\]

■Solution The solid sphere S can be written as 0 ≤ ρ ≤ r in spherical coordinates. As a full
sphere, the ranges for the angles are:

0 ≤ θ ≤ 2π, 0 ≤ ϕ ≤ π.

Therefore, the set-up of the integral is:
\[
I_z = \int_{\theta=0}^{\theta=2\pi}\int_{\phi=0}^{\phi=\pi}\int_{\rho=0}^{\rho=r} \delta(x^2+y^2)\,\rho^2\sin\phi\,d\rho\,d\phi\,d\theta.
\]

To compute the integral, we need to express x2 + y2 using spherical coordinates:

x2 + y2 = (ρ sin ϕ cos θ )2 + (ρ sin ϕ sin θ )2 = ρ2 sin2 ϕ.

Therefore,
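\[
\begin{aligned}
I_z &= \int_0^{2\pi}\int_0^{\pi}\int_0^{r} \delta\cdot\rho^2\sin^2\phi\cdot\rho^2\sin\phi\,d\rho\,d\phi\,d\theta \\
&= \int_0^{2\pi}\int_0^{\pi}\int_0^{r} \delta\rho^4\sin^3\phi\,d\rho\,d\phi\,d\theta \\
&= \left(\int_0^{2\pi} d\theta\right)\left(\int_0^{\pi}\sin^3\phi\,d\phi\right)\left(\int_0^{r}\delta\rho^4\,d\rho\right) \\
&= 2\pi\cdot\left(\int_0^{\pi}(1-\cos^2\phi)\sin\phi\,d\phi\right)\cdot\frac{\delta r^5}{5} \\
&= 2\pi\cdot\left[-\cos\phi+\frac{\cos^3\phi}{3}\right]_0^{\pi}\cdot\frac{\delta r^5}{5} \\
&= 2\pi\cdot\left(1-\frac{1}{3}+1-\frac{1}{3}\right)\cdot\frac{\delta r^5}{5} \\
&= \frac{8\pi\delta r^5}{15}.
\end{aligned}
\]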
Although it is a perfectly acceptable answer, the moment of inertia in many physics and
engineering books is expressed in terms of the total mass m rather than of the density. Since
the density in this problem is uniform, one can easily derive the moment of inertia formula
in terms of m:
\[
I_z = \frac{8\pi r^5}{15}\cdot\frac{m}{\frac{4\pi r^3}{3}} = \frac{2mr^2}{5}
\]
which is what one can find in physics or engineering books.
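
The two closed forms above can be compared against a direct numerical evaluation of the spherical-coordinate integral. In the sketch below the density `delta` and radius `R` (called r in the example) are hypothetical sample values chosen only for illustration, and `midpoint` is a helper defined here.

```python
import math

def midpoint(f, a, b, n=400):
    """Composite midpoint rule for the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

delta = 2.5  # hypothetical uniform density
R = 1.5      # hypothetical radius of the solid sphere

# The integrand delta * rho^4 * sin^3(phi) factors into a product of three integrals.
rho_part = midpoint(lambda rho: delta * rho ** 4, 0.0, R)
phi_part = midpoint(lambda phi: math.sin(phi) ** 3, 0.0, math.pi)
I_numeric = 2 * math.pi * phi_part * rho_part

mass = delta * (4.0 / 3.0) * math.pi * R ** 3
I_density_form = 8 * math.pi * delta * R ** 5 / 15
I_mass_form = 2 * mass * R ** 2 / 5

print(I_numeric, I_density_form, I_mass_form)
```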

Conical objects are another type of solid compatible with spherical coordinates.
The inequality $0 \le \phi \le \frac{\pi}{3}$ represents an infinite cone above the xy-plane with cone angle
$\frac{\pi}{3}$ counting from the positive z-axis. If one further imposes the inequality 0 ≤ ρ ≤ 1, which
represents the sphere with radius 1 centered at the origin, then the combined inequalities:
\[
0 \le \rho \le 1, \qquad 0 \le \phi \le \frac{\pi}{3}
\]
represent the common part of the sphere and the cone, which is in an ice-cream shape.

■Example 3.13 Find the volume of the “ice-cream cone” D cut from the solid sphere ρ ≤ 1
by the cone $\phi = \frac{\pi}{3}$, as shown in Figure 3.19.

■ Solution The volume is given by $\iiint_D \rho^2\sin\phi\,d\rho\,d\phi\,d\theta$. The limits of the inner-most integral
are determined by where the ρ-ray, labeled M in the diagram, enters the solid and where it
leaves the solid. Evidently, it enters from the origin ρ = 0. When it leaves, it always leaves
from the sphere part and never the cone part, so the upper limit should be ρ = 1.
The ϕ-angle runs from 0 to $\frac{\pi}{3}$, and the θ-ray, labeled L in the diagram, sweeps over the
shadow from 0 to 2π.
Combining all these limits, the volume is given by the integral:
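\[
\begin{aligned}
\int_{\theta=0}^{\theta=2\pi}\int_{\phi=0}^{\phi=\pi/3}\int_{\rho=0}^{\rho=1} \rho^2\sin\phi\,d\rho\,d\phi\,d\theta &= \left(\int_0^{2\pi} d\theta\right)\left(\int_0^{\pi/3}\sin\phi\,d\phi\right)\left(\int_0^1\rho^2\,d\rho\right) \\
&= 2\pi\cdot\left[-\cos\phi\right]_0^{\pi/3}\cdot\frac{1}{3} \\
&= 2\pi\cdot\left(-\frac{1}{2}+1\right)\cdot\frac{1}{3} = \frac{\pi}{3}.
\end{aligned}
\]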

Figure 3.19: ice cream cone


4 — Vector Calculus

“Equations are just the boring part of mathematics. I attempt to see things in
terms of geometry”

Stephen Hawking

4.1 Vector Fields on R2 and R3


4.1.1 Examples of Vector Fields
Vector Calculus is an important tool in physics and engineering. Many concepts in physics,
such as gravitational and electrostatic forces, fluid flow, heat flow, etc. are described using
a mathematical concept called vector fields. Gravitational, magnetic and electric forces are
not just one single vector: both their directions and magnitudes may vary from place to place,
and may even change over time. A vector field consists of a collection of vectors that are denoted
by a vector-valued function F(x, y, z) : R3 → R3, which assigns to each point (x, y, z) a vector
labeled F(x, y, z). For instance, the gravitational force field due to the Sun (whose center is at
the origin) is given by:
\[
\mathbf{F}(x,y,z) = -GMm\,\frac{x\mathbf{i}+y\mathbf{j}+z\mathbf{k}}{(x^2+y^2+z^2)^{3/2}},
\]
where G, M and m are constants. That is to say, at the point (1, 0, 0), the gravitational force is:

F(1, 0, 0) = − GMm i,

whereas at the point (1, 0, 1), the gravitational force is:
\[
\mathbf{F}(1,0,1) = -GMm\left(\frac{1}{2^{3/2}}\,\mathbf{i}+\frac{1}{2^{3/2}}\,\mathbf{k}\right).
\]

With the help of computer software (such as Mathematica), one can visualize a vector field
easily. A plot of the above gravitational field can be found in Figure 4.1.
The general form of a vector field in R3 is given by:

F( x, y, z) = Fx ( x, y, z)i + Fy ( x, y, z)j + Fz ( x, y, z)k

where each of Fx , Fy and Fz is a scalar-valued function.



Figure 4.1: plot of the gravitational force F

Do NOT confuse $F_x$ with the partial derivative $\frac{\partial F}{\partial x}$! Although we used the subscript
notation $F_x$ to denote a partial derivative before, we should avoid using it in this chapter.
Here $F_x$ means the x-component of the vector field F.

Many vector fields in physics are three-dimensional. To begin with, we will also study
two-dimensional vector fields. A two-dimensional vector field is a collection of vectors in R2
that are denoted by a vector-valued function F : R2 → R2 , which assigns to each point ( x, y) a
vector F( x, y). The general form of a two-dimensional vector field is given by:
F( x, y) = Fx ( x, y)i + Fy ( x, y)j.
Figure 4.2 shows the plot of two examples of two-dimensional vector fields:

F( x, y) = (1 − y2 )i and F( x, y) = −yi + xj.


Figure 4.2: two examples of vector fields in R2

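For brevity, we sometimes write down a vector field F in the form of $\langle F_x, F_y\rangle$ in R2, or
$\langle F_x, F_y, F_z\rangle$ in R3. For instance,
\[
\begin{aligned}
(1-y^2)\,\mathbf{i} &= \langle 1-y^2,\ 0\rangle \\
-y\mathbf{i}+x\mathbf{j} &= \langle -y,\ x\rangle \\
-GMm\,\frac{x\mathbf{i}+y\mathbf{j}+z\mathbf{k}}{(x^2+y^2+z^2)^{3/2}} &= -GMm\,\frac{\langle x, y, z\rangle}{(x^2+y^2+z^2)^{3/2}}
\end{aligned}
\]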
4.2 Line Integrals of Vector Fields


4.2.1 Definition of Line Integrals
Work is defined to be: force × displacement. Precisely, if F is a constant force, and ∆r := r2 − r1
is the displacement vector which denotes the change of position from r1 to r2 , then:

Work = F · ∆r.

However, if the force is not a constant, meaning that either the direction or the magnitude
is not uniform, the work done by the force is not as simple as stated above. Likewise, if the
path is not a straight-path but a curved one so that ∆r is changing over time, the work done
by the force is again a bit more complicated. Line integrals of vector fields are introduced to
handle these more complicated scenarios.
Definition 4.1 — Line Integrals of Vector Fields. Given a continuous vector field F( x, y, z) and
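a path C which is parametrized by r(t), a ≤ t ≤ b, the line integral of F over C is defined to
be:
\[
\int_a^b \mathbf{F}\cdot\mathbf{r}'(t)\,dt,
\]
where F is evaluated at the point r(t) on the path.

Notation In the spirit of $\mathbf{r}'(t)\,dt = \frac{d\mathbf{r}}{dt}\,dt = d\mathbf{r}$, we may denote the above line integral in a
concise way as:
\[
\int_C \mathbf{F}\cdot d\mathbf{r}
\]
where C is the given path.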

Let’s first look at some computational examples before we explain the physical and geomet-
ric meanings of line integrals.

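■Example 4.1 Let F(x, y) = −yi + xj and C be the counter-clockwise path along the circular
arc from (1, 0) to (0, 1). Find the line integral $\int_C \mathbf{F}\cdot d\mathbf{r}$.

■ Solution First we parametrize C:
\[
\mathbf{r}(t) = (\cos t)\,\mathbf{i}+(\sin t)\,\mathbf{j}, \qquad 0\le t\le\frac{\pi}{2}.
\]
By the definition of line integrals,
\[
\begin{aligned}
\int_C \mathbf{F}\cdot d\mathbf{r} &= \int_{t=0}^{t=\pi/2} \mathbf{F}\cdot\mathbf{r}'(t)\,dt \\
&= \int_0^{\pi/2} (-y\mathbf{i}+x\mathbf{j})\cdot\left((-\sin t)\,\mathbf{i}+(\cos t)\,\mathbf{j}\right) dt \\
&= \int_0^{\pi/2} (y\sin t+x\cos t)\,dt.
\end{aligned}
\]

Along the path C, we have x = cos t and y = sin t according to the parametrization, so we
have:
\[
\begin{aligned}
\int_C \mathbf{F}\cdot d\mathbf{r} &= \int_0^{\pi/2} \left(\sin^2 t+\cos^2 t\right) dt \\
&= \int_0^{\pi/2} 1\,dt = \frac{\pi}{2}.
\end{aligned}
\]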

Line integrals in three dimensions can be computed in exactly the same way as in two
dimensions. Let’s see one example:

■ Example 4.2 Let $\mathbf{F}(x,y,z) = (y-x^2)\,\mathbf{i}+(z-y^2)\,\mathbf{j}+(x-z^2)\,\mathbf{k}$ and C be a path parametrized
by: $\mathbf{r}(t) = t\,\mathbf{i}+t^2\,\mathbf{j}+t^3\,\mathbf{k}$ where 0 ≤ t ≤ 1. Compute the line integral $\int_C \mathbf{F}\cdot d\mathbf{r}$.

■ Solution We are given the parametrization in the problem so we can proceed to the
computation of the line integral:
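\[
\begin{aligned}
\int_C \mathbf{F}\cdot d\mathbf{r} &= \int_0^1 \left\langle y-x^2,\ z-y^2,\ x-z^2\right\rangle\cdot\mathbf{r}'(t)\,dt \\
&= \int_0^1 \left\langle y-x^2,\ z-y^2,\ x-z^2\right\rangle\cdot\left\langle 1,\ 2t,\ 3t^2\right\rangle dt \\
&= \int_0^1 \left(y-x^2+2t(z-y^2)+3t^2(x-z^2)\right) dt.
\end{aligned}
\]

Along the path C, we have x = t, $y = t^2$ and $z = t^3$, so:
\[
\begin{aligned}
\int_C \mathbf{F}\cdot d\mathbf{r} &= \int_0^1 \left(t^2-t^2+2t(t^3-t^4)+3t^2(t-t^6)\right) dt \\
&= \int_0^1 \left(3t^3+2t^4-2t^5-3t^8\right) dt = \frac{29}{60}.
\end{aligned}
\]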
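
The recipe used in Examples 4.1 and 4.2 (parametrize, substitute, dot with r′(t), integrate in t) translates directly into a short numerical routine. In the sketch below, the function name `line_integral` and its signature are choices made here for illustration; the derivative r′(t) is supplied by hand rather than computed automatically.

```python
import math

def line_integral(F, r, r_prime, a, b, n=2000):
    """Midpoint-rule approximation of the integral of F(r(t)) . r'(t) over [a, b]."""
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        t = a + (i + 0.5) * h
        field = F(*r(t))          # F evaluated at the point r(t)
        velocity = r_prime(t)     # tangent vector r'(t)
        total += sum(f * v for f, v in zip(field, velocity)) * h
    return total

# Example 4.1: F = <-y, x> along the quarter circle from (1, 0) to (0, 1).
value_41 = line_integral(
    lambda x, y: (-y, x),
    lambda t: (math.cos(t), math.sin(t)),
    lambda t: (-math.sin(t), math.cos(t)),
    0.0, math.pi / 2)

# Example 4.2: F = <y - x^2, z - y^2, x - z^2> along r(t) = <t, t^2, t^3>.
value_42 = line_integral(
    lambda x, y, z: (y - x * x, z - y * y, x - z * z),
    lambda t: (t, t * t, t ** 3),
    lambda t: (1.0, 2 * t, 3 * t * t),
    0.0, 1.0)

print(value_41, math.pi / 2)
print(value_42, 29 / 60)
```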

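It is often the case that a path can be broken into several segments as shown in Figure 4.3. If C
denotes the combined path, then the line integral $\int_C \mathbf{F}\cdot d\mathbf{r}$ of any continuous vector field F can
be calculated by breaking it down into:
\[
\int_C \mathbf{F}\cdot d\mathbf{r} = \int_{C_1}\mathbf{F}\cdot d\mathbf{r}+\int_{C_2}\mathbf{F}\cdot d\mathbf{r}+\int_{C_3}\mathbf{F}\cdot d\mathbf{r}+\int_{C_4}\mathbf{F}\cdot d\mathbf{r}+\int_{C_5}\mathbf{F}\cdot d\mathbf{r}.
\]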

Figure 4.3: piecewise path

In this spirit, we often denote the path C by:

C = C1 + C2 + C3 + C4 + C5 .

Figure 4.4: the path in Example 4.3

■ Example 4.3 Consider the directed path C = L1 + Γ1 + L2 + Γ2 starting from (0, 3), first
along the y-axis to the point (0, 1), then along the circular arc Γ1 to (1, 0), then along the
x-axis to (3, 0), and finally back to the point (0, 3) along the circular arc Γ2. See Figure 4.4.
Compute the line integral $\int_C \mathbf{F}\cdot d\mathbf{r}$ where F is given by:

F( x, y) = −yi + xj.

■ Solution Since
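\[
\int_C \mathbf{F}\cdot d\mathbf{r} = \int_{L_1}\mathbf{F}\cdot d\mathbf{r}+\int_{\Gamma_1}\mathbf{F}\cdot d\mathbf{r}+\int_{L_2}\mathbf{F}\cdot d\mathbf{r}+\int_{\Gamma_2}\mathbf{F}\cdot d\mathbf{r},
\]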
C L1 Γ1 L2 Γ2

we compute the line integral of each segment individually.


To compute any line integral, we need to parametrize the path. The line segment L1
connects the points (0, 3) and (0, 1). From Chapter 1, we learned that the parametrization of
the straight-line path from a to b is given by:

r ( t ) = a + t ( b − a ), 0 ≤ t ≤ 1.

Therefore, L1 is parametrized by:

L1 : r1 (t) = ⟨0, 3⟩ + t (⟨0, 1⟩ − ⟨0, 3⟩) = ⟨0, 3 − 2t⟩, 0 ≤ t ≤ 1.

Hence,
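\[
\begin{aligned}
\int_{L_1}\mathbf{F}\cdot d\mathbf{r} &= \int_0^1 \mathbf{F}\cdot\mathbf{r}_1'(t)\,dt \\
&= \int_0^1 \langle -y,\ x\rangle\cdot\langle 0,\ -2\rangle\,dt \\
&= \int_0^1 \langle -(3-2t),\ 0\rangle\cdot\langle 0,\ -2\rangle\,dt \\
&= \int_0^1 0\,dt = 0.
\end{aligned}
\]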

Next we consider Γ1 , which is a clockwise circular arc of unit radius. The reverse path,
commonly denoted as −Γ1 , is a counter-clockwise circular arc of unit radius which can be
parametrized by:
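\[
-\Gamma_1:\quad \mathbf{r}_2(t) = \langle\cos t,\ \sin t\rangle, \qquad 0\le t\le\frac{\pi}{2}.
\]
We will compute $\int_{-\Gamma_1}\mathbf{F}\cdot d\mathbf{r}$, then the line integral over the original path Γ1 is given by:
\[
\int_{\Gamma_1}\mathbf{F}\cdot d\mathbf{r} = -\int_{-\Gamma_1}\mathbf{F}\cdot d\mathbf{r}.
\]
\[
\begin{aligned}
\int_{-\Gamma_1}\mathbf{F}\cdot d\mathbf{r} &= \int_0^{\pi/2} \mathbf{F}\cdot\mathbf{r}_2'(t)\,dt \\
&= \int_0^{\pi/2} \langle -y,\ x\rangle\cdot\langle -\sin t,\ \cos t\rangle\,dt \\
&= \int_0^{\pi/2} \langle -\sin t,\ \cos t\rangle\cdot\langle -\sin t,\ \cos t\rangle\,dt \\
&= \int_0^{\pi/2} (\sin^2 t+\cos^2 t)\,dt = \int_0^{\pi/2} 1\,dt = \frac{\pi}{2}.
\end{aligned}
\]
Therefore,
\[
\int_{\Gamma_1}\mathbf{F}\cdot d\mathbf{r} = -\int_{-\Gamma_1}\mathbf{F}\cdot d\mathbf{r} = -\frac{\pi}{2}.
\]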
Similarly, we parametrize L2 :

L2 : r3 (t) = ⟨1 + 2t, 0⟩, 0 ≤ t ≤ 1.

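\[
\begin{aligned}
\int_{L_2}\mathbf{F}\cdot d\mathbf{r} &= \int_0^1 \langle -y,\ x\rangle\cdot\mathbf{r}_3'(t)\,dt \\
&= \int_0^1 \langle 0,\ 1+2t\rangle\cdot\langle 2,\ 0\rangle\,dt \\
&= \int_0^1 0\,dt = 0.
\end{aligned}
\]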

For the circular arc Γ2 with radius 3, the parametrization is given by:
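\[
\Gamma_2:\quad \mathbf{r}_4(t) = \langle 3\cos t,\ 3\sin t\rangle, \qquad 0\le t\le\frac{\pi}{2}.
\]
\[
\begin{aligned}
\int_{\Gamma_2}\mathbf{F}\cdot d\mathbf{r} &= \int_0^{\pi/2} \langle -y,\ x\rangle\cdot\langle -3\sin t,\ 3\cos t\rangle\,dt \\
&= \int_0^{\pi/2} \langle -3\sin t,\ 3\cos t\rangle\cdot\langle -3\sin t,\ 3\cos t\rangle\,dt \\
&= \int_0^{\pi/2} (9\sin^2 t+9\cos^2 t)\,dt = \int_0^{\pi/2} 9\,dt = \frac{9\pi}{2}.
\end{aligned}
\]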

Finally, we sum up these results to find the value of the desired line integral:
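\[
\int_C \mathbf{F}\cdot d\mathbf{r} = \int_{L_1}\mathbf{F}\cdot d\mathbf{r}+\int_{\Gamma_1}\mathbf{F}\cdot d\mathbf{r}+\int_{L_2}\mathbf{F}\cdot d\mathbf{r}+\int_{\Gamma_2}\mathbf{F}\cdot d\mathbf{r} = 0-\frac{\pi}{2}+0+\frac{9\pi}{2} = 4\pi.
\]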
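
The segment-by-segment computation of Example 4.3 can be reproduced numerically. In this sketch the helper `line_integral` and the dictionary `segments` are names introduced here for illustration. Note one deliberate difference from the solution above: Γ1 is parametrized directly in its own clockwise direction by ⟨sin t, cos t⟩, instead of computing over −Γ1 and flipping the sign, so the two approaches can be compared.

```python
import math

def line_integral(F, r, r_prime, a, b, n=2000):
    """Midpoint-rule approximation of the integral of F(r(t)) . r'(t) over [a, b]."""
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        t = a + (i + 0.5) * h
        field = F(*r(t))
        velocity = r_prime(t)
        total += sum(f * v for f, v in zip(field, velocity)) * h
    return total

F = lambda x, y: (-y, x)
half_pi = math.pi / 2

# Each entry: (r, r', start, end) for one segment of C = L1 + Gamma1 + L2 + Gamma2.
segments = {
    "L1": (lambda t: (0.0, 3 - 2 * t), lambda t: (0.0, -2.0), 0.0, 1.0),
    # Clockwise unit arc from (0, 1) to (1, 0).
    "Gamma1": (lambda t: (math.sin(t), math.cos(t)),
               lambda t: (math.cos(t), -math.sin(t)), 0.0, half_pi),
    "L2": (lambda t: (1 + 2 * t, 0.0), lambda t: (2.0, 0.0), 0.0, 1.0),
    # Counter-clockwise arc of radius 3 from (3, 0) to (0, 3).
    "Gamma2": (lambda t: (3 * math.cos(t), 3 * math.sin(t)),
               lambda t: (-3 * math.sin(t), 3 * math.cos(t)), 0.0, half_pi),
}

values = {name: line_integral(F, *seg) for name, seg in segments.items()}
total = sum(values.values())
print(values)
print(total, 4 * math.pi)
```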

4.2.2 Physical Meaning of Line Integrals


We introduce line integrals here because $\int_C \mathbf{F}\cdot d\mathbf{r}$ is exactly equal to the work done by the force
F to move a particle from the starting point of C to the ending point of C.
By virtue of Riemann sums, the line integral can be thought of as:
\[
\int_a^b \mathbf{F}\cdot\mathbf{r}'(t)\,dt \simeq \sum_i \mathbf{F}\cdot\mathbf{r}'(t_i)\,\Delta t_i
\]

where we divide the time interval [ a, b] into small subdivisions:

a =: t0 < t1 < . . . < ti < . . . < tn := b.

Assume each subdivision is so small that F and r′ are roughly constant in each subdivision.
By the definition of derivatives, we have:
\[
\mathbf{r}'(t_i) \simeq \frac{\Delta\mathbf{r}(t_i)}{\Delta t_i}.
\]
Therefore,
\[
\int_a^b \mathbf{F}\cdot\mathbf{r}'(t)\,dt \simeq \sum_i \mathbf{F}\cdot\Delta\mathbf{r}(t_i).
\]
Since $\mathbf{F}\cdot\Delta\mathbf{r}(t_i)$ is the work done by F with displacement $\Delta\mathbf{r}(t_i)$ (as they are roughly
constant), summing them up gives the approximate total work done by the force over the
whole path C. As n → ∞, the subdivisions become infinitesimal and $\sum_i \mathbf{F}\cdot\Delta\mathbf{r}(t_i)$ becomes
more accurate and approaches the total work done by the force.

4.2.3 Geometric meaning of line integrals


The line integral of a given vector field F over a path C can indicate whether the path C is
overall flowing along the vector field. Recall that the line integral is given by:
\[
\int_C \mathbf{F}\cdot\mathbf{r}'(t)\,dt.
\]

The value of the integral is determined by many factors, including the length of C, the
magnitude of F and the velocity of r(t). However, the sign of F · r′(t) is solely determined by
the angle θ between F and the tangent vector r′(t) of the curve C, since:
\[
\mathbf{F}\cdot\mathbf{r}'(t) = |\mathbf{F}|\,|\mathbf{r}'(t)|\cos\theta.
\]

It is positive when F and r′(t) make an acute angle, and is negative when they make an obtuse
angle. Therefore, the sign of the line integral can reveal whether the path C is overall along
or against the direction of the vector field F. The more positive the value is, the more often
the path is traveling along the vector field on average. Let’s illustrate this point through the
following example:

Figure 4.5: geometric meaning of line integrals

Consider the vector field as shown in Figure 4.5 and the two paths C1 and C2 with directions
indicated in the diagram. Along the path C1, the velocity vector is pointing against the vector
field at the beginning of the path. Then, it turns slightly along the vector field near the very
end of the path. Therefore, F · r′ is negative for most of the time, and so we should expect that:
\[
\int_{C_1}\mathbf{F}\cdot d\mathbf{r} < 0.
\]

On the other hand, the path C2 is along the vector field at all times. The integrand F · r′ is
positive throughout the path C2. Therefore, it is certain that:
\[
\int_{C_2}\mathbf{F}\cdot d\mathbf{r} > 0.
\]

4.2.4 Independence of parametrization


Recall that when computing a line integral, we first pick a parametrization of the path. There
is one logical gap we need to fill in, namely if we choose two different parametrizations of the
same path, will we get the same answer for the line integral?
The answer is yes, as we can show using the chain rule.
Suppose r(τ), a ≤ τ ≤ b, and r(t), c ≤ t ≤ d, are two parametrizations of a path C.
Therefore, their endpoints must match, i.e. τ = a if and only if t = c, and τ = b if and only if
t = d.
Then, using the τ-parametrization, the line integral is given by:
\[
\int_{\tau=a}^{\tau=b} \mathbf{F}\cdot\mathbf{r}'(\tau)\,d\tau.
\]
By the chain rule, we have:
\[
\mathbf{r}'(\tau) = \frac{d\mathbf{r}}{d\tau} = \frac{d\mathbf{r}}{dt}\frac{dt}{d\tau} = \mathbf{r}'(t)\frac{dt}{d\tau},
\]
and so
\[
\int_{\tau=a}^{\tau=b} \mathbf{F}\cdot\mathbf{r}'(\tau)\,d\tau = \int_{\tau=a}^{\tau=b} \mathbf{F}\cdot\mathbf{r}'(t)\,\frac{dt}{d\tau}\,d\tau.
\]
By change of variables, we have $dt = \frac{dt}{d\tau}\,d\tau$ and so:
\[
\int_{\tau=a}^{\tau=b} \mathbf{F}\cdot\mathbf{r}'(\tau)\,d\tau = \int_{t=c}^{t=d} \mathbf{F}\cdot\mathbf{r}'(t)\,dt
\]
which is exactly how we defined the line integral using t as the parameter.


The above shows that no matter how we parametrize the path, as long as the endpoints of
the path are kept unchanged, the line integral must be the same. We call this independence of
parametrization. Physically speaking, this tells us that as long as a particle travels along a fixed path
C, the work done by the force does not depend on how fast the particle travels.
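
Independence of parametrization can be observed numerically by traversing the same quarter circle at two different speeds. In the sketch below, the test field F = ⟨x² − y, xy⟩ and the second parametrization (angle (π/2)τ², 0 ≤ τ ≤ 1) are hypothetical choices made here; along the unit circle this field gives F · r′(t) = sin² t, so the common value is π/4.

```python
import math

def line_integral(F, r, r_prime, a, b, n=2000):
    """Midpoint-rule approximation of the integral of F(r(t)) . r'(t) over [a, b]."""
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        t = a + (i + 0.5) * h
        field = F(*r(t))
        velocity = r_prime(t)
        total += sum(f * v for f, v in zip(field, velocity)) * h
    return total

F = lambda x, y: (x * x - y, x * y)  # hypothetical test field

# Parametrization 1: constant angular speed, 0 <= t <= pi/2.
r1 = lambda t: (math.cos(t), math.sin(t))
r1_prime = lambda t: (-math.sin(t), math.cos(t))
value_t = line_integral(F, r1, r1_prime, 0.0, math.pi / 2)

# Parametrization 2: same arc, but the angle grows like (pi/2) * tau^2, 0 <= tau <= 1.
angle = lambda tau: 0.5 * math.pi * tau * tau
r2 = lambda tau: (math.cos(angle(tau)), math.sin(angle(tau)))
r2_prime = lambda tau: (-math.pi * tau * math.sin(angle(tau)),
                        math.pi * tau * math.cos(angle(tau)))
value_tau = line_integral(F, r2, r2_prime, 0.0, 1.0)

print(value_t, value_tau, math.pi / 4)
```

The particle in the second parametrization starts at rest and speeds up, yet the computed work agrees with the constant-speed traversal.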

4.2.5 Alternative notations for line integrals


In some textbooks, a line integral is denoted using the arc-length parametrization. Recall that a
path r(s) is said to be arc-length parametrized if |r′ (s)| = 1 for any s, i.e. unit speed. For such
a parametrization, it is a convention to use s as a parameter.
Since the line integral of a given vector field F over a path C is independent of parametriza-
tion, we can also denote the line integral using the s parameter:
\[
\int_C \mathbf{F}\cdot\mathbf{r}'(s)\,ds.
\]

The value is the same using any other parametrization.
Since r′(s) is a unit vector, conventionally it is often denoted by $\hat{\mathbf{T}}$ inside a line integral,
meaning that it is a unit tangent vector of the path C. Therefore, you may see that occasionally
the line integral is denoted by:
\[
\int_C \mathbf{F}\cdot\hat{\mathbf{T}}\,ds.
\]
Another common notation for line integrals is the so-called differential form notation. Recall
that r is the position vector r = xi + yj + zk, and so symbolically dr can be regarded as:

dr = dxi + dyj + dzk.

Let F = Fx i + Fy j + Fz k, then (again symbolically)

F · dr = Fx dx + Fy dy + Fz dz.

Therefore, another common notation for line integrals is:
\[ \int_C F_x\,dx + F_y\,dy + F_z\,dz. \]
While the ds-notation does not have much practical use, the differential form notation has some
practical advantages when the path C is made up of segments parallel to the coordinate axes.
Let’s illustrate this through an example:

■ Example 4.4 Compute the line integral
\[ \int_C -y\,dx + x\,dy \]
where C is the path from (0, 1) down to (0, 0) along the y-axis, then to (1, 0) along the x-axis.

■ Solution Note that the path C has two segments. Break C into two segments C1 and C2,
where C1 is the path from (0, 1) to (0, 0) along the y-axis, and C2 is the path from (0, 0) to
(1, 0) along the x-axis.

[Figure: the path C, consisting of C1 from (0, 1) to (0, 0) and C2 from (0, 0) to (1, 0)]

The line integral is broken down into two:
\[ \int_C -y\,dx + x\,dy = \int_{C_1} -y\,dx + x\,dy + \int_{C_2} -y\,dx + x\,dy. \]
Along C1, we have x = 0, and so dx = 0 too. From this we can immediately tell that
\[ \int_{C_1} -y\,dx + x\,dy = \int_{C_1} -y\,d(0) + 0\,dy = 0. \]
Similarly, along C2, we have y = 0, and so:
\[ \int_{C_2} -y\,dx + x\,dy = \int_{C_2} -0\,dx + x\,d(0) = 0. \]
Combining the two results, we know:
\[ \int_C -y\,dx + x\,dy = 0. \]

To conclude, given a path C parametrized by r(t) and a vector field F = Fx i + Fy j + Fz k,
the following are all equivalent notations for line integrals:
\[ \int_C \mathbf{F}\cdot d\mathbf{r} = \int_C \mathbf{F}\cdot\mathbf{r}'(t)\,dt = \int_C \mathbf{F}\cdot\hat{\mathbf{T}}\,ds = \int_C F_x\,dx + F_y\,dy + F_z\,dz. \]

4.3 Conservative Vector Fields


4.3.1 Definition and Consequences
This section discusses a special type of vector field called a conservative vector field. The term
conservative comes from physics, not politics! We will first study its definition, and then investigate the
features that distinguish conservative vector fields from generic ones.
Definition 4.2 — Conservative Vector Field. A vector field F is called a conservative vector
field if and only if it is in the form of F = ∇ f where f is a scalar function. The scalar
function f is called a potential function of the vector field F.

Recall that ∇f denotes the gradient vector of f, defined by:
\[ \nabla f = \frac{\partial f}{\partial x}\mathbf{i} + \frac{\partial f}{\partial y}\mathbf{j} + \frac{\partial f}{\partial z}\mathbf{k}. \]
The most basic method to determine whether a given vector field is conservative is
to solve for the scalar potential f, as illustrated by the following example:

■ Example 4.5 Consider the vector field

\[ \mathbf{F}(x, y, z) = (2x + y)\mathbf{i} + (x + z^3)\mathbf{j} + (3yz^2 + 1)\mathbf{k}. \]

Determine whether or not F is a conservative vector field. If so, find its potential function f
such that F = ∇ f .

■ Solution F is conservative if and only if F = ∇f for some scalar function f, or equivalently,
the following equations hold simultaneously:
\[
\begin{aligned}
\frac{\partial f}{\partial x} &= 2x + y && (1)\\
\frac{\partial f}{\partial y} &= x + z^3 && (2)\\
\frac{\partial f}{\partial z} &= 3yz^2 + 1 && (3)
\end{aligned}
\]
From (1), one can find f(x, y, z) by integrating 2x + y with respect to x, regarding y as a constant. In
single variable calculus, an integration constant is added after the integration. However,
we are now considering partial derivatives, and not only constants but also any expression in y and z will vanish
after differentiating with respect to x! In other words, the “integration constant” is no longer a constant
but instead a function of y and z. Precisely, we have:
\[ f(x, y, z) = x^2 + yx + g(y, z) \qquad (4) \]
where g(y, z) is some function of y and z. We will figure out g(y, z) in the remaining steps.
By differentiating both sides of (4) with respect to y, we get:
\[ \frac{\partial f}{\partial y} = x + \frac{\partial g}{\partial y}. \]
Comparing this with (2), one needs to have
\[ \frac{\partial g}{\partial y} = z^3. \]
Integrating with respect to y yields:
\[ g(y, z) = yz^3 + h(z). \]
Note that, by the same principle discussed above for f, the integration “constant” is no
longer just a constant but a function not depending on y, and hence a function of z only.
By differentiating both sides with respect to z, we get:
\[ \frac{\partial g}{\partial z} = 3yz^2 + h'(z). \]
Since f = x² + yx + g(y, z), we have ∂f/∂z = ∂g/∂z. Finally, by comparing this result with (3),
one must have h′(z) = 1, and so clearly h(z) = z + C,
where C is genuinely a constant this time!
Combining all results above, we have
\[ f(x, y, z) = x^2 + yx + yz^3 + z + C \]
where C is any real constant.

It can be easily checked that F = ∇f. Therefore, F is a conservative vector field with
potential functions given by the above f.
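The claim F = ∇f can be spot-checked numerically. The sketch below is an illustration rather than part of the notes: the helper `grad`, the step size h and the sample points are assumptions, and the gradient is only approximated by central differences.

```python
def f(x, y, z, C=0.0):
    """Potential function found in Example 4.5 (C is the free constant)."""
    return x**2 + y*x + y*z**3 + z + C

def F(x, y, z):
    """Vector field of Example 4.5."""
    return (2*x + y, x + z**3, 3*y*z**2 + 1)

def grad(g, p, h=1e-5):
    """Central-difference approximation of the gradient of g at the point p."""
    out = []
    for i in range(3):
        hi, lo = list(p), list(p)
        hi[i] += h
        lo[i] -= h
        out.append((g(*hi) - g(*lo)) / (2 * h))
    return tuple(out)

# Arbitrary sample points (an assumption, for illustration only).
points = [(0.3, -1.2, 0.7), (2.0, 1.0, -0.5), (-1.0, 0.5, 1.5)]
max_err = max(abs(a - b)
              for p in points
              for a, b in zip(grad(f, p), F(*p)))
print(max_err)  # small: the numerical gradient of f matches F
```

A check at a few points does not prove F = ∇f, but it catches sign and exponent slips quickly.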

It is important to keep in mind that not all vector fields are conservative! Here is one for which
no such f exists:

■ Example 4.6 Let F( x, y) = −yi + xj. Determine whether or not F is a conservative vector
field. If so, find its potential function f such that F = ∇ f .

■ Solution If f is a scalar function such that F = ∇f, then we have:
\[
\begin{aligned}
\frac{\partial f}{\partial x} &= -y && (1)\\
\frac{\partial f}{\partial y} &= x && (2)
\end{aligned}
\]
Solving (1) for f by integration, we get f(x, y) = −xy + g(y) for some function g(y). However,
by differentiating this result with respect to y, we get:
\[ \frac{\partial f}{\partial y} = -x + g'(y). \]
In order to be consistent with (2), we would require
\[ -x + g'(y) = x, \quad\text{or equivalently,}\quad g'(y) = 2x. \]
However, g′(y) is a function of y, not of x! Since it leads to inconsistency, such an f cannot
exist and so F is not conservative.

One important feature of conservative vector fields is the path independence of line integral,
meaning that the line integral depends only on the end-points of the path. Precisely, we have:

Theorem 4.1 Given a conservative vector field F = ∇f, where f is a potential function, the
line integral along any path C from point P0(x0, y0, z0) to point P1(x1, y1, z1) is given by:
\[ \int_C \mathbf{F}\cdot d\mathbf{r} = f(x_1, y_1, z_1) - f(x_0, y_0, z_0). \]

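Proof. The proof is a consequence of the multivariable chain rule and the Fundamental Theorem
of Calculus. Suppose F = ∇f = ⟨∂f/∂x, ∂f/∂y, ∂f/∂z⟩ and the path C is parametrized by:
\[ \mathbf{r}(t) = \langle x(t), y(t), z(t)\rangle, \quad a \le t \le b. \]
Then,
\[
\begin{aligned}
\int_C \mathbf{F}\cdot d\mathbf{r} &= \int_a^b \mathbf{F}\cdot\frac{d\mathbf{r}}{dt}\,dt \\
&= \int_a^b \nabla f\cdot\frac{d\mathbf{r}}{dt}\,dt \\
&= \int_a^b \left\langle \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}, \frac{\partial f}{\partial z}\right\rangle \cdot \left\langle \frac{dx}{dt}, \frac{dy}{dt}, \frac{dz}{dt}\right\rangle dt \\
&= \int_a^b \left(\frac{\partial f}{\partial x}\frac{dx}{dt} + \frac{\partial f}{\partial y}\frac{dy}{dt} + \frac{\partial f}{\partial z}\frac{dz}{dt}\right) dt \\
&= \int_a^b \frac{d}{dt} f(\mathbf{r}(t))\,dt && \text{(chain rule)} \\
&= f(\mathbf{r}(b)) - f(\mathbf{r}(a)) && \text{(Fundamental Theorem of Calculus)}
\end{aligned}
\]
If (x0, y0, z0) and (x1, y1, z1) are the initial and final positions respectively, then r(a) =
⟨x0, y0, z0⟩ and r(b) = ⟨x1, y1, z1⟩. Therefore,
\[ \int_C \mathbf{F}\cdot d\mathbf{r} = f(x_1, y_1, z_1) - f(x_0, y_0, z_0). \]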


The significance of this theorem is that the RHS depends only on the initial and final points
of the curve C, but not on the intermediate path. In other words, if C1 and C2 are two paths with
the same initial and final points, then we have ∫_{C1} F · dr = ∫_{C2} F · dr. Moreover, if C is a closed
path, i.e. its initial and final positions are the same, then ∮_C F · dr = 0.

Notation If C is a closed path, meaning that the two endpoints are the same, it is a
convention to denote the line integral as:
\[ \oint_C \mathbf{F}\cdot d\mathbf{r}. \]

To summarize, we have:
Corollary 4.2 For a conservative vector field F, if C1 and C2 are two paths with the same
initial and final positions, then
\[ \int_{C_1} \mathbf{F}\cdot d\mathbf{r} = \int_{C_2} \mathbf{F}\cdot d\mathbf{r}. \]
Moreover, if C is a closed path, then
\[ \oint_C \mathbf{F}\cdot d\mathbf{r} = 0. \]
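The closed-path statement gives a practical way to detect a non-conservative field: a single closed loop with non-zero line integral rules out a potential. The sketch below compares the fields of Examples 4.5 and 4.6 on one closed loop. The loop, the helper `line_integral`, and the embedding of the planar field of Example 4.6 in R³ with zero k-component are assumptions made for illustration.

```python
import math

def line_integral(F, r, a, b, n=20000):
    """Midpoint-rule approximation of the line integral of F along r(t), t from a to b."""
    total = 0.0
    prev = r(a)
    for i in range(1, n + 1):
        cur = r(a + (b - a) * i / n)
        mid = [(p + q) / 2 for p, q in zip(prev, cur)]
        step = [q - p for p, q in zip(prev, cur)]
        total += sum(f * d for f, d in zip(F(*mid), step))
        prev = cur
    return total

def loop(t):
    """A closed path in space (hypothetical choice): r(0) = r(2*pi)."""
    return (math.cos(t), math.sin(t), math.sin(2 * t))

def F_cons(x, y, z):      # field of Example 4.5 (conservative)
    return (2*x + y, x + z**3, 3*y*z**2 + 1)

def F_rot(x, y, z):       # field of Example 4.6, with k-component set to 0
    return (-y, x, 0.0)

loop_cons = line_integral(F_cons, loop, 0.0, 2 * math.pi)
loop_rot = line_integral(F_rot, loop, 0.0, 2 * math.pi)
print(loop_cons)  # close to 0, as Corollary 4.2 predicts
print(loop_rot)   # close to 2*pi, so this field cannot be conservative
```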

■ Example 4.7 Consider the vector field F(x, y, z) = (2x + y)i + (x + z³)j + (3yz² + 1)k which
appeared in Example 4.5, and the path C given by the parametric equation
\[ \mathbf{r}(t) = (e^t \sin^{2012} t)\,\mathbf{i} + (\cos^{2013} t)\,\mathbf{j} + \frac{t}{\pi}\,\mathbf{k}, \quad 0 \le t \le 2\pi. \]
Find the line integral ∫_C F · dr.

■ Solution It is clear that direct computation of this line integral is extremely laborious (and
may be impossible). Fortunately, it was shown in Example 4.5 that F is a conservative vector
field with potential function

f ( x, y, z) = x2 + yx + yz3 + z + C.

The initial and final positions of the given path are respectively:

r(0) = ⟨0, 1, 0⟩
r(2π ) = ⟨0, 1, 2⟩.

By Theorem 4.1, the line integral in question is simply given by:
\[ \int_C \mathbf{F}\cdot d\mathbf{r} = f(0, 1, 2) - f(0, 1, 0) = (10 + C) - C = 10. \]
Alternatively, one can also find this line integral using the path independence property
of conservative vector fields. Let L be the straight path from (0, 1, 0) to (0, 1, 2),
which are the initial and final positions respectively. Since F is conservative, we must have:
\[ \int_C \mathbf{F}\cdot d\mathbf{r} = \int_L \mathbf{F}\cdot d\mathbf{r}. \]
The latter is much easier to figure out: the line L is parametrized by:
\[ \mathbf{r}_L(t) = \langle 0, 1, 0\rangle + t\,(\langle 0, 1, 2\rangle - \langle 0, 1, 0\rangle) = \langle 0, 1, 2t\rangle, \quad 0 \le t \le 1. \]
Therefore,
\[
\begin{aligned}
\int_L \mathbf{F}\cdot d\mathbf{r} &= \int_0^1 \left\langle 2x + y,\; x + z^3,\; 3yz^2 + 1\right\rangle \cdot \mathbf{r}_L'(t)\,dt \\
&= \int_0^1 \left\langle 2(0) + 1,\; 0 + 8t^3,\; 3(4t^2) + 1\right\rangle \cdot \langle 0, 0, 2\rangle\,dt \\
&= \int_0^1 (24t^2 + 2)\,dt \\
&= \left[8t^3 + 2t\right]_0^1 = 10.
\end{aligned}
\]
By path independence, ∫_C F · dr = ∫_L F · dr = 10.
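Although the integral over C is hopeless by hand, a computer can approximate it directly, which gives an independent check of the value 10. The following is a rough sketch under stated assumptions: the helper `line_integral` and the number of chords n are choices made here, and the path is approximated by short straight chords.

```python
import math

def line_integral(F, r, a, b, n):
    """Midpoint-rule approximation of the line integral of F along r(t), t from a to b."""
    total = 0.0
    prev = r(a)
    for i in range(1, n + 1):
        cur = r(a + (b - a) * i / n)
        mid = [(p + q) / 2 for p, q in zip(prev, cur)]
        step = [q - p for p, q in zip(prev, cur)]
        total += sum(f * d for f, d in zip(F(*mid), step))
        prev = cur
    return total

def F(x, y, z):
    """Vector field of Examples 4.5 and 4.7."""
    return (2*x + y, x + z**3, 3*y*z**2 + 1)

def r(t):
    """The path C of Example 4.7."""
    return (math.exp(t) * math.sin(t)**2012, math.cos(t)**2013, t / math.pi)

# Many chords are used because sin^2012 and cos^2013 have very narrow peaks.
approx = line_integral(F, r, 0.0, 2 * math.pi, n=100000)
print(approx)  # close to 10
```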

4.3.2 Curl Test


It is possible to test whether a given vector field is conservative or not without attempting to
find its potential function. This test is commonly called the curl test. The term curl
comes from the fact that the test involves calculating the curl of a vector field.
Definition 4.3 — Curl of a Vector Field. Given a vector field

F = Fx i + Fy j + Fz k,

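the curl of the vector field F, denoted either by curl(F) or ∇ × F, is defined as:
\[
\begin{aligned}
\nabla\times\mathbf{F} &= \left(\frac{\partial}{\partial x}\mathbf{i} + \frac{\partial}{\partial y}\mathbf{j} + \frac{\partial}{\partial z}\mathbf{k}\right) \times \left(F_x\mathbf{i} + F_y\mathbf{j} + F_z\mathbf{k}\right) \\
&= \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ \frac{\partial}{\partial x} & \frac{\partial}{\partial y} & \frac{\partial}{\partial z} \\ F_x & F_y & F_z \end{vmatrix} \\
&= \left(\frac{\partial F_z}{\partial y} - \frac{\partial F_y}{\partial z}\right)\mathbf{i} + \left(\frac{\partial F_x}{\partial z} - \frac{\partial F_z}{\partial x}\right)\mathbf{j} + \left(\frac{\partial F_y}{\partial x} - \frac{\partial F_x}{\partial y}\right)\mathbf{k}.
\end{aligned}
\]

Remark: The symbol ∇ := ∂/∂x i + ∂/∂y j + ∂/∂z k should be considered as an operator rather than a vector.
It carries no physical or geometric meaning. “Multiplying” ∂/∂x by a function P gives the
partial derivative ∂P/∂x as the product.

Remark: If F = Fx i + Fy j is a two dimensional vector field, the curl ∇ × F can also be defined by
regarding the k-component to be zero, i.e.
\[ \mathbf{F} = F_x\mathbf{i} + F_y\mathbf{j} + 0\mathbf{k}. \]
It can be easily verified that
\[ \nabla\times\mathbf{F} = \left(\frac{\partial F_y}{\partial x} - \frac{\partial F_x}{\partial y}\right)\mathbf{k}. \]

Remark: ∇ × F is called the curl of F because it measures how circular the vector field F is, as a
consequence of the Green’s and Stokes’ Theorems which we will learn very soon.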

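■ Example 4.8 Compute the curl of the vector field:
\[ \mathbf{F} = (2x + y)\mathbf{i} + (x + z^3)\mathbf{j} + (3yz^2 + 1)\mathbf{k}. \]

■ Solution
\[
\begin{aligned}
\nabla\times\mathbf{F} &= \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ \frac{\partial}{\partial x} & \frac{\partial}{\partial y} & \frac{\partial}{\partial z} \\ 2x + y & x + z^3 & 3yz^2 + 1 \end{vmatrix} \\
&= \left(\frac{\partial}{\partial y}(3yz^2 + 1) - \frac{\partial}{\partial z}(x + z^3)\right)\mathbf{i} + \left(\frac{\partial}{\partial z}(2x + y) - \frac{\partial}{\partial x}(3yz^2 + 1)\right)\mathbf{j} \\
&\quad + \left(\frac{\partial}{\partial x}(x + z^3) - \frac{\partial}{\partial y}(2x + y)\right)\mathbf{k} \\
&= (3z^2 - 3z^2)\mathbf{i} + (0 - 0)\mathbf{j} + (1 - 1)\mathbf{k} = \mathbf{0}.
\end{aligned}
\]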
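The component formula for the curl translates directly into a numerical check. The sketch below is an illustration only: the helper `curl`, the step size h and the sample point are assumptions, and partial derivatives are approximated by central differences. The rotating field −y i + x j is included for contrast.

```python
def curl(F, p, h=1e-5):
    """Central-difference approximation of curl F at the point p = (x, y, z)."""
    def d(i, j):
        # partial derivative of the i-th component of F
        # with respect to the j-th variable (0 = x, 1 = y, 2 = z)
        hi, lo = list(p), list(p)
        hi[j] += h
        lo[j] -= h
        return (F(*hi)[i] - F(*lo)[i]) / (2 * h)
    return (d(2, 1) - d(1, 2),   # dFz/dy - dFy/dz
            d(0, 2) - d(2, 0),   # dFx/dz - dFz/dx
            d(1, 0) - d(0, 1))   # dFy/dx - dFx/dy

def F(x, y, z):                  # field of Example 4.8
    return (2*x + y, x + z**3, 3*y*z**2 + 1)

def G(x, y, z):                  # rotating field, for contrast
    return (-y, x, 0.0)

p = (1.0, 2.0, 3.0)              # arbitrary sample point (an assumption)
curl_F = curl(F, p)
curl_G = curl(G, p)
print(curl_F)  # all three components are close to 0
print(curl_G)  # close to (0, 0, 2)
```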

In order to introduce the curl test, we first need to introduce a topological concept about a
region, namely simply-connectedness. A connected region means the region is in one piece, and a
simply-connected region is defined as follows:
Definition 4.4 — Simply-Connected Regions. A region Ω is simply-connected if Ω is con-
nected and every closed loop in Ω can be contracted to a point continuously without leaving
the region Ω.

The set R² with the origin removed is not simply-connected, as the loops that go around
the origin cannot be contracted to a point without “touching” the origin. However, the set R³
with the origin removed is simply-connected: draw a picture to convince yourself of that!
The curl test to be introduced is a very straightforward test to check whether a given vector
field F is conservative, without the need to solve for the potential function f. There is one
crucial condition on the domain of the vector field when applying the curl test.
Theorem 4.3 — Curl Test. Given a vector field F is defined and C1 on a region Ω, then:
1. If F = ∇ f for some scalar function f defined on Ω, then ∇ × F = 0 on Ω.
2. If ∇ × F = 0 and Ω is simply-connected, then F = ∇ f for some scalar function f
defined on Ω.

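Proof. The theorem has two parts. Part (1) is easy, while the other part is very technical (and
hence the proof is omitted).
Part (1) is a consequence of the Mixed Partial Theorem. Suppose F = ∇f, then
F = ∂f/∂x i + ∂f/∂y j + ∂f/∂z k, and so
\[
\begin{aligned}
\nabla\times\mathbf{F} &= \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ \frac{\partial}{\partial x} & \frac{\partial}{\partial y} & \frac{\partial}{\partial z} \\ \frac{\partial f}{\partial x} & \frac{\partial f}{\partial y} & \frac{\partial f}{\partial z} \end{vmatrix} \\
&= \left(\frac{\partial^2 f}{\partial y\,\partial z} - \frac{\partial^2 f}{\partial z\,\partial y}\right)\mathbf{i} + \left(\frac{\partial^2 f}{\partial z\,\partial x} - \frac{\partial^2 f}{\partial x\,\partial z}\right)\mathbf{j} + \left(\frac{\partial^2 f}{\partial x\,\partial y} - \frac{\partial^2 f}{\partial y\,\partial x}\right)\mathbf{k} \\
&= 0\mathbf{i} + 0\mathbf{j} + 0\mathbf{k} = \mathbf{0}.
\end{aligned}
\]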


Therefore, to check whether or not F is conservative, assuming it is defined and C1 on a
simply-connected region, it is not necessary to solve for the potential function f. All that is needed
is to find ∇ × F, which only involves differentiation but not integration. The curl test
only tells you whether or not the vector field is conservative; it does not tell you what the
potential function is. Nevertheless, knowing that F is conservative without knowing the potential
f can still be helpful, since one can then pick an easier path when computing a line integral.
Let’s consider the following example:

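■ Example 4.9 Let C be the path in R² parametrized by:
\[ \mathbf{r}(t) = \left(\cos^{24601} t\right)\mathbf{i} + \frac{2t}{\pi}\,\mathbf{j}, \quad -\frac{\pi}{2} \le t \le \frac{\pi}{2}. \]
Find the line integral:
\[ \int_C \left(2xe^{xy} + x^2ye^{xy}\right)dx + \left(x^3e^{xy} + 2y\right)dy \]

■ Solution Recall that the required line integral can be equivalently written as ∫_C F · dr, where
the vector field F is given by:
\[ \mathbf{F}(x, y) = \left(2xe^{xy} + x^2ye^{xy}\right)\mathbf{i} + \left(x^3e^{xy} + 2y\right)\mathbf{j}. \]
If we compute the line integral directly from the definition, then one would have to first
compute F · r′(t), which is given by:
\[ \underbrace{\left(2\cos^{24601}t\cdot e^{\cos^{24601}t\cdot\frac{2t}{\pi}} + \cos^{2\times 24601}t\cdot\frac{2t}{\pi}\,e^{\cos^{24601}t\cdot\frac{2t}{\pi}}\right)}_{2xe^{xy} + x^2ye^{xy}} \cdot \underbrace{24601\cos^{24600}t\cdot(-\sin t)}_{(\cos^{24601}t)'} + \dots \]
Needless to say, it is overly complicated and almost impossible to integrate by hand.
However, if we try to find the curl ∇ × F, we see that:
\[
\begin{aligned}
\nabla\times\mathbf{F} &= \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ \frac{\partial}{\partial x} & \frac{\partial}{\partial y} & \frac{\partial}{\partial z} \\ 2xe^{xy} + x^2ye^{xy} & x^3e^{xy} + 2y & 0 \end{vmatrix} \\
&= 0\mathbf{i} - 0\mathbf{j} + \left(\frac{\partial(x^3e^{xy} + 2y)}{\partial x} - \frac{\partial(2xe^{xy} + x^2ye^{xy})}{\partial y}\right)\mathbf{k} \\
&= \left(3x^2e^{xy} + x^3ye^{xy} - 2x^2e^{xy} - x^2e^{xy} - x^3ye^{xy}\right)\mathbf{k} \\
&= 0\mathbf{k} = \mathbf{0}.
\end{aligned}
\]
Note that F is defined and C1 everywhere in R². The curl test applies and therefore it shows
F is conservative. Even though the curl test does not tell us what the potential function
is, the line integral ∫_C F · dr is now shown to be path-independent. One may simply join
the end-points of C by a straight-line path L, and compute the line integral ∫_L F · dr
instead.
The end-points of C are given by:
\[
\begin{aligned}
\mathbf{r}(-\pi/2) &= (\cos(-\pi/2))^{24601}\,\mathbf{i} + \frac{2}{\pi}\left(-\frac{\pi}{2}\right)\mathbf{j} = 0\mathbf{i} - \mathbf{j} \\
\mathbf{r}(\pi/2) &= (\cos(\pi/2))^{24601}\,\mathbf{i} + \frac{2}{\pi}\left(\frac{\pi}{2}\right)\mathbf{j} = 0\mathbf{i} + \mathbf{j}
\end{aligned}
\]
The straight-line path L from (0, −1) to (0, 1) is parametrized by:
\[ \mathbf{r}_L(t) = \langle 0, -1\rangle + t\,(\langle 0, 1\rangle - \langle 0, -1\rangle) = \langle 0, 2t - 1\rangle, \quad 0 \le t \le 1. \]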

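Since F is conservative, by the path-independence property of line integrals, we get:
\[
\begin{aligned}
\int_C \mathbf{F}\cdot d\mathbf{r} &= \int_L \mathbf{F}\cdot d\mathbf{r} \\
&= \int_0^1 \mathbf{F}\cdot\mathbf{r}_L'(t)\,dt \\
&= \int_0^1 \left\langle 2xe^{xy} + x^2ye^{xy},\; x^3e^{xy} + 2y\right\rangle \cdot \langle 0, 2\rangle\,dt \\
&= \int_0^1 2\left(x^3e^{xy} + 2y\right)dt \\
&= \int_0^1 2\,(0 + 2(2t - 1))\,dt \\
&= 4\left[t^2 - t\right]_0^1 = 0.
\end{aligned}
\]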

It is worthwhile to note that the condition that F is defined and C1 on a simply-connected
domain is crucial when applying the curl test to show F is conservative. The following example
tells us why:
Let
\[ \mathbf{H}(x, y) = -\frac{y}{x^2 + y^2}\,\mathbf{i} + \frac{x}{x^2 + y^2}\,\mathbf{j}. \]
This vector field is not defined at (x, y) = (0, 0). It can be easily verified (left as an exercise)
that ∇ × H = 0 for any (x, y) ≠ (0, 0). However, when integrating along the unit circle
C : r(t) = (cos t)i + (sin t)j where 0 ≤ t ≤ 2π:
\[
\begin{aligned}
\oint_C \mathbf{H}\cdot d\mathbf{r} &= \oint_C \frac{-y\,dx + x\,dy}{x^2 + y^2} \\
&= \int_0^{2\pi} \frac{-\sin t\,d(\cos t) + \cos t\,d(\sin t)}{\sin^2 t + \cos^2 t} \\
&= \int_0^{2\pi} \frac{(\sin^2 t + \cos^2 t)\,dt}{\sin^2 t + \cos^2 t} \\
&= \int_0^{2\pi} 1\,dt = 2\pi \ne 0.
\end{aligned}
\]
C is a closed path, so H cannot be conservative, even though the curl of H is zero! The curl
test fails here because the domain of H is not simply-connected.
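For readers who want to check the exercise, here is one way the computation can go. It is a sketch using the quotient rule and the two-dimensional curl formula ∇ × F = (∂Fy/∂x − ∂Fx/∂y)k, valid only away from the origin:

```latex
\begin{aligned}
\frac{\partial}{\partial x}\left(\frac{x}{x^2+y^2}\right)
  &= \frac{(x^2+y^2) - x(2x)}{(x^2+y^2)^2}
   = \frac{y^2 - x^2}{(x^2+y^2)^2}, \\
\frac{\partial}{\partial y}\left(-\frac{y}{x^2+y^2}\right)
  &= -\frac{(x^2+y^2) - y(2y)}{(x^2+y^2)^2}
   = \frac{y^2 - x^2}{(x^2+y^2)^2}, \\
\nabla\times\mathbf{H}
  &= \left(\frac{y^2 - x^2}{(x^2+y^2)^2} - \frac{y^2 - x^2}{(x^2+y^2)^2}\right)\mathbf{k}
   = \mathbf{0}, \qquad (x, y) \ne (0, 0).
\end{aligned}
```

The two partial derivatives coincide, so they cancel in the k-component at every point where H is defined.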
The gravitational vector field
\[ \mathbf{F}(x, y, z) = -GMm\,\frac{x\mathbf{i} + y\mathbf{j} + z\mathbf{k}}{(x^2 + y^2 + z^2)^{3/2}} \]
is not defined at (0, 0, 0) but is defined and C1 everywhere else in R³. With some straightforward
(although lengthy) computations, one can verify that ∇ × F = 0 for any (x, y, z) ≠
(0, 0, 0). Since the domain R³ with the origin removed is simply-connected, the curl test applies to
this vector field! It concludes that the gravitational vector field F is conservative.

4.4 Green’s Theorem


4.4.1 The Theorem and its Uses
We will exclusively deal with two-dimensional vector fields in this section. In the previous
section, we saw that if F is a two-dimensional vector field which is defined and C1 on a
simply-connected region Ω in R² (such as the entire R² plane), then the curl test says ∇ × F = 0 if
and only if F is conservative, and so for such a vector field with ∇ × F = 0, we have
\[ \oint_C \mathbf{F}\cdot d\mathbf{r} = 0 \]
for any closed curve C in Ω.
It is natural to ask if there is any hidden relation between ∇ × F and ∮_C F · dr, given that the
former being a zero vector implies the latter is zero. The Green’s Theorem gives the relationship
between them. Before introducing the Green’s Theorem, we need to understand:
Definition 4.5 — Simple Closed Curves. A curve C is called a simple closed curve if the two
endpoints coincide and it does not intersect itself at any point (other than the endpoints).

Theorem 4.4 — Green’s Theorem. Let C be a simple closed curve in R2 which is counter-
clockwise oriented. Suppose the curve C encloses region R. Let F( x, y) be a vector field
which is defined and C1 at every point in R, then:
\[ \underbrace{\oint_C \mathbf{F}\cdot d\mathbf{r}}_{\text{line integral}} = \underbrace{\iint_R (\nabla\times\mathbf{F})\cdot\mathbf{k}\,dA}_{\text{double integral}} \]

Remark: In terms of components of F in rectangular coordinates, i.e. F = Fx i + Fy j, one can easily
verify that ∇ × F = (∂Fy/∂x − ∂Fx/∂y) k, so the above Green’s Theorem can be stated as:
\[ \oint_C F_x\,dx + F_y\,dy = \iint_R \left(\frac{\partial F_y}{\partial x} - \frac{\partial F_x}{\partial y}\right)dA. \]

Remark: In particular, if ∇ × F = 0 and F is defined on all of R², then the Green’s Theorem tells us
that
\[ \oint_C \mathbf{F}\cdot d\mathbf{r} = \iint_R \mathbf{0}\cdot\mathbf{k}\,dA = 0 \]
for any simple closed curve C. This is exactly what the curl test tells us!

The proof of the Green’s Theorem is quite technical if the curve C is complicated. The proof
of the theorem in some special cases can be found in some reference textbooks listed in the
syllabus. Let’s first look at some examples.

■ Example 4.10 Use the Green’s Theorem to evaluate the line integral:
\[ \oint_C -y\,dx + x\,dy \]
where C is a rectangular loop (0, 0) → (1, 0) → (1, 1) → (0, 1) → (0, 0).

■ Solution The vector field corresponding to this line integral is F = −yi + xj, which is
defined and is smooth everywhere on R². The given path C encloses the rectangle R with
vertices (0, 0), (1, 0), (1, 1) and (0, 1).

[Figure: the rectangle R with vertices (0, 0), (1, 0), (1, 1) and (0, 1)]

By straightforward calculation, the curl of F is given by:
\[ \nabla\times\mathbf{F} = 2\mathbf{k}. \]
By the Green’s Theorem, we have:
\[
\begin{aligned}
\oint_C -y\,dx + x\,dy &= \oint_C \mathbf{F}\cdot d\mathbf{r} \\
&= \iint_R (\nabla\times\mathbf{F})\cdot\mathbf{k}\,dA \\
&= \iint_R 2\mathbf{k}\cdot\mathbf{k}\,dA \\
&= \iint_R 2\,dA \\
&= 2 \times \text{area of } R \\
&= 2.
\end{aligned}
\]
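Both sides of the Green’s Theorem can be compared numerically in this example. The sketch below computes the line integral side by side of the loop; the helper `segment_integral` and the number of sample points are assumptions made for illustration.

```python
def segment_integral(F, p, q, n=1000):
    """Midpoint-rule approximation of the line integral of F
    along the straight segment from p to q."""
    dx = (q[0] - p[0]) / n
    dy = (q[1] - p[1]) / n
    total = 0.0
    for i in range(n):
        x = p[0] + (i + 0.5) * dx
        y = p[1] + (i + 0.5) * dy
        Fx, Fy = F(x, y)
        total += Fx * dx + Fy * dy
    return total

def F(x, y):
    return (-y, x)

# Corners of the loop, in the counter-clockwise order given in the example.
corners = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]
sides = [segment_integral(F, p, q) for p, q in zip(corners, corners[1:])]
lhs = sum(sides)   # line integral around C
rhs = 2 * 1.0      # (curl F).k = 2 times the area of R
print(sides)       # approximately [0, 1, 1, 0]
print(lhs, rhs)    # both approximately 2
```

Only the right and top sides contribute, which illustrates why the four-segment computation is more work than the one-line area computation.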

Remark: If a two-dimensional C1 vector field F has the property that (∇ × F) · k = c where c is a
constant, it is best to apply the Green’s Theorem when finding the line integral over a
simple closed curve C. It is because we then have:
\[ \oint_C \mathbf{F}\cdot d\mathbf{r} = \iint_R (\nabla\times\mathbf{F})\cdot\mathbf{k}\,dA = \iint_R c\,dA = c \times \text{area of } R \]

Remark: When C is a closed path going around a rectangle or a square, it is often a good idea to
apply the Green’s Theorem because ∬_R (∇ × F) · k dA is a double integral over a rectangle
or a square, which is very easy to set up. On the other hand, computing ∮_C F · dr would
require breaking it down into four segments and computing the line integral over each of
them.

[Figure 4.6: the path in Example 4.11, made up of the arcs Γ1, Γ2 and the segments L1, L2; the axis ticks mark 1 and 3]

■ Example 4.11 Let F = −y²i + xyj and let C be the multi-segment path which appeared in Example
4.3. For easy reference, see Figure 4.6. Find the line integral using Green’s Theorem:
\[ \oint_C \mathbf{F}\cdot d\mathbf{r}. \]

■ Solution The closed path C consists of four segments. If we attempt to find the line integral
directly, we would have to break it down into four integrals, one for each segment. However,
this closed path C encloses a fan-shape region (denoted by R). The double integral over R
is easy to set up if one converts it into polar coordinates. This suggests that the Green’s
Theorem may come in handy here.
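Since F is defined and is C1 everywhere, we can apply the Green’s Theorem without any
issues. First we compute its curl:
\[ \nabla\times\mathbf{F} = 3y\,\mathbf{k}. \]
By the Green’s Theorem
\[
\begin{aligned}
\oint_C \mathbf{F}\cdot d\mathbf{r} &= \iint_R (\nabla\times\mathbf{F})\cdot\mathbf{k}\,dA \\
&= \iint_R 3y\,\mathbf{k}\cdot\mathbf{k}\,dA \\
&= \iint_R 3y\,dA.
\end{aligned}
\]
The region R can be represented in polar coordinates as:
\[ 1 \le r \le 3, \quad 0 \le \theta \le \frac{\pi}{2}. \]
Recalling that y = r sin θ and dA = r dr dθ, we have:
\[
\begin{aligned}
\iint_R 3y\,dA &= \int_0^{\pi/2}\int_1^3 3r\sin\theta\; r\,dr\,d\theta \\
&= \int_0^{\pi/2} \left[r^3\right]_1^3 \sin\theta\,d\theta \\
&= 26\left[-\cos\theta\right]_0^{\pi/2} \\
&= 26
\end{aligned}
\]
Therefore,
\[ \oint_C \mathbf{F}\cdot d\mathbf{r} = 26. \]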
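The value 26 can be cross-checked from both sides of the Green’s Theorem. The sketch below makes two assumptions not confirmed by the text: C is traversed counter-clockwise, and L2 lies on the x-axis while L1 lies on the y-axis (the region 1 ≤ r ≤ 3, 0 ≤ θ ≤ π/2 is taken from the solution). The helper names are hypothetical.

```python
import math

def double_integral_polar(g, r0, r1, t0, t1, n=400):
    """Midpoint Riemann sum of g(x, y) dA over r0 <= r <= r1, t0 <= theta <= t1,
    using dA = r dr dtheta."""
    dr, dt = (r1 - r0) / n, (t1 - t0) / n
    total = 0.0
    for i in range(n):
        r = r0 + (i + 0.5) * dr
        for j in range(n):
            t = t0 + (j + 0.5) * dt
            total += g(r * math.cos(t), r * math.sin(t)) * r * dr * dt
    return total

def line_integral(F, r, a, b, n=20000):
    """Midpoint-rule approximation of the line integral of F along r(t), t from a to b."""
    total = 0.0
    prev = r(a)
    for i in range(1, n + 1):
        cur = r(a + (b - a) * i / n)
        mid = [(p + q) / 2 for p, q in zip(prev, cur)]
        step = [q - p for p, q in zip(prev, cur)]
        total += sum(f * d for f, d in zip(F(*mid), step))
        prev = cur
    return total

def F(x, y):
    return (-y**2, x * y)

# Double-integral side: (curl F).k = 3y over the fan-shaped region.
area_side = double_integral_polar(lambda x, y: 3 * y, 1.0, 3.0, 0.0, math.pi / 2)

# Line-integral side: four pieces, counter-clockwise (assumed orientation).
pieces = [
    (lambda t: (t, 0.0), 1.0, 3.0),                                    # along the x-axis
    (lambda t: (3 * math.cos(t), 3 * math.sin(t)), 0.0, math.pi / 2),  # outer arc
    (lambda t: (0.0, t), 3.0, 1.0),                                    # down the y-axis
    (lambda t: (math.cos(t), math.sin(t)), math.pi / 2, 0.0),          # inner arc, clockwise
]
line_side = sum(line_integral(F, r, a, b) for r, a, b in pieces)
print(area_side, line_side)  # both close to 26
```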

4.4.2 Significance of the Green’s Theorem


The significance of the Green’s Theorem goes far beyond making computations of line integrals
easier. One important geometric or physical significance is that it gives an interpretation
of the curl of a two-dimensional vector field.
Consider a vector field F in R² which is defined and C1 everywhere. Suppose C is a very tiny
simple closed curve enclosing a tiny region R. Then, the quantity (∇ × F) · k can be regarded
as approximately constant inside the region R.
By the Green’s Theorem, we have:
\[ \oint_C \mathbf{F}\cdot d\mathbf{r} = \iint_R (\nabla\times\mathbf{F})\cdot\mathbf{k}\,dA \simeq (\nabla\times\mathbf{F})\cdot\mathbf{k}\;(\text{area of } R). \]
In other words,
\[ (\nabla\times\mathbf{F})\cdot\mathbf{k} \simeq \frac{1}{\text{area of } R}\oint_C \mathbf{F}\cdot d\mathbf{r}. \]
The line integral ∮_C F · dr is large if F · r′ is large along most of the curve. This happens when
F circulates along the curve C. In other words, the closed-loop line integral ∮_C F · dr indicates
how circular the vector field F is around the curve C. The above result tells us that the quantity
(∇ × F) · k at a point is roughly the circulation of F per unit area around that point. That’s
why we call ∇ × F the curl of F: it is an indicator of the curliness of a vector field!
As an example, you can verify that the curls of the vector fields F = −yi + xj and G =
xi + yj are respectively given by:

(∇ × F) · k = 2
(∇ × G) · k = 0
It suggests that F is more circular than G. By plotting them in Mathematica, we can see that
vectors in F are circling around the origin, while vectors in G are diverging from the origin.
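The interpretation of (∇ × F) · k as circulation per unit area can be tried out numerically: integrate around a small circle and divide by the enclosed area. This is a minimal sketch; the helper names, the radius eps, the sample point (1, 2) and the third field −y²i + xyj (taken from Example 4.11, where (∇ × F) · k = 3y) are choices made here for illustration.

```python
import math

def circulation(F, center, eps, n=2000):
    """Line integral of F counter-clockwise around the circle of radius eps
    about center, approximated by the midpoint rule on n chords."""
    cx, cy = center
    total = 0.0
    for i in range(n):
        t0 = 2 * math.pi * i / n
        t1 = 2 * math.pi * (i + 1) / n
        x0, y0 = cx + eps * math.cos(t0), cy + eps * math.sin(t0)
        x1, y1 = cx + eps * math.cos(t1), cy + eps * math.sin(t1)
        Fx, Fy = F((x0 + x1) / 2, (y0 + y1) / 2)
        total += Fx * (x1 - x0) + Fy * (y1 - y0)
    return total

def curl_estimate(F, center, eps=1e-2):
    """Circulation around a small circle divided by the enclosed area."""
    return circulation(F, center, eps) / (math.pi * eps**2)

point = (1.0, 2.0)   # arbitrary sample point (an assumption)
est_F = curl_estimate(lambda x, y: (-y, x), point)         # expected about 2
est_G = curl_estimate(lambda x, y: (x, y), point)          # expected about 0
est_K = curl_estimate(lambda x, y: (-y**2, x * y), point)  # expected about 3*y = 6
print(est_F, est_G, est_K)
```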

4.4.3 Limitations of the Green’s Theorem


We have stated in the Green’s Theorem that the vector field F needs to be defined at every
point in the region R enclosed by the simple closed curve C. It is a crucial condition!
Let’s consider the vector field
\[ \mathbf{F} = -\frac{y}{x^2 + y^2}\,\mathbf{i} + \frac{x}{x^2 + y^2}\,\mathbf{j}. \]
It is not defined at the origin (0, 0). If we let C be the (counter-clockwise) unit circle centered
at the origin, then C can be parametrized by:
\[ \mathbf{r}(t) = (\cos t)\mathbf{i} + (\sin t)\mathbf{j}, \quad 0 \le t \le 2\pi. \]
By computing the line integral over C directly, one should get:
\[ \oint_C \mathbf{F}\cdot d\mathbf{r} = \int_0^{2\pi} \mathbf{F}\cdot\mathbf{r}'(t)\,dt = \int_0^{2\pi} \langle -\sin t, \cos t\rangle\cdot\langle -\sin t, \cos t\rangle\,dt = \int_0^{2\pi} 1\,dt = 2\pi. \]
However, one can check from direct computations that ∇ × F = 0 at every point except the
origin, and is undefined at the origin. Therefore, the double integral
\[ \iint_R \underbrace{(\nabla\times\mathbf{F})\cdot\mathbf{k}}_{=0}\,dA = 0. \]
Therefore, in this case the Green’s Theorem does not hold as we have:
\[ \underbrace{\oint_C \mathbf{F}\cdot d\mathbf{r}}_{=2\pi} \ne \underbrace{\iint_R (\nabla\times\mathbf{F})\cdot\mathbf{k}\,dA}_{=0}. \]

To summarize, one cannot directly apply the Green’s Theorem if the given curve encloses a
point at which the vector field is not defined. In such cases, we should either compute the line
integral directly by parametrization, or use some other tools to compute it. We will compute
this integral by the so-called hole-drilling technique to be presented in the next subsection.
If the curve does not enclose the origin (at which F is undefined), then the Green’s Theorem
applies to F and such a curve without any issues. For instance, if Γ is a circle centered at
(3, 0) with radius 2, then it does not enclose the origin (which is the only point at which
F = −y/(x² + y²) i + x/(x² + y²) j is undefined). The Green’s Theorem does show that
\[ \oint_\Gamma \mathbf{F}\cdot d\mathbf{r} = \iint_R \underbrace{(\nabla\times\mathbf{F})\cdot\mathbf{k}}_{=0 \text{ at every point in } R}\,dA = \iint_R 0\,dA = 0. \]
Here R is the region enclosed by Γ.
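The contrast between the two circles can be reproduced numerically. The following sketch approximates both line integrals with short chords; the helper `line_integral` and the number of chords are assumptions.

```python
import math

def line_integral(F, r, a, b, n=20000):
    """Midpoint-rule approximation of the line integral of F along r(t), t from a to b."""
    total = 0.0
    prev = r(a)
    for i in range(1, n + 1):
        cur = r(a + (b - a) * i / n)
        mid = [(p + q) / 2 for p, q in zip(prev, cur)]
        step = [q - p for p, q in zip(prev, cur)]
        total += sum(f * d for f, d in zip(F(*mid), step))
        prev = cur
    return total

def F(x, y):
    """The field -y/(x^2+y^2) i + x/(x^2+y^2) j, undefined at the origin."""
    d = x * x + y * y
    return (-y / d, x / d)

def unit_circle(t):      # encloses the origin
    return (math.cos(t), math.sin(t))

def shifted_circle(t):   # centered at (3, 0) with radius 2: origin is outside
    return (3 + 2 * math.cos(t), 2 * math.sin(t))

around_origin = line_integral(F, unit_circle, 0.0, 2 * math.pi)
away_from_origin = line_integral(F, shifted_circle, 0.0, 2 * math.pi)
print(around_origin)     # close to 2*pi
print(away_from_origin)  # close to 0
```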

4.4.4 Winding Number of a Closed Curve


The following vector field (discussed in the previous part)
\[ \mathbf{F} = -\frac{y}{x^2 + y^2}\,\mathbf{i} + \frac{x}{x^2 + y^2}\,\mathbf{j} \]
is a famous one related to the winding number of a closed curve. As discussed
before, it is not defined at the origin, and ∇ × F = 0 at every point except the origin. If C is a
closed curve (not necessarily simple closed) in R² not passing through the origin, there is a
celebrated result in topology that says:
\[ \oint_C \mathbf{F}\cdot d\mathbf{r} = 2\pi \times (\text{number of times } C \text{ travels around the origin counter-clockwise}). \]
Consequently, the quantity
\[ \frac{1}{2\pi}\oint_C \frac{-y}{x^2 + y^2}\,dx + \frac{x}{x^2 + y^2}\,dy \]
is often called the winding number of the curve C. This number is important not only in pure
mathematics but also in physics. Furthermore, the computation of the surface flux of a vector field
satisfying the inverse square law, as we will see later, is also in the same spirit as the
computation of the winding number.
Our goal here is to use the Green’s Theorem to explain why this line integral gives the
winding number.

Circles Centered at Origin


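We start with the simplest case where the curve is a circle with radius ε centered at the origin,
which from now on is denoted by Γε. The path is parametrized by:
\[ \Gamma_\varepsilon : \mathbf{r}(t) = (\varepsilon\cos t)\mathbf{i} + (\varepsilon\sin t)\mathbf{j}, \quad 0 \le t \le 2\pi. \]
Then, by straightforward calculations, we get:
\[
\begin{aligned}
\oint_{\Gamma_\varepsilon} -\frac{y}{x^2 + y^2}\,dx + \frac{x}{x^2 + y^2}\,dy &= \int_{t=0}^{t=2\pi} -\frac{\varepsilon\sin t}{\varepsilon^2}\,d(\varepsilon\cos t) + \frac{\varepsilon\cos t}{\varepsilon^2}\,d(\varepsilon\sin t) \\
&= \int_{t=0}^{t=2\pi} \left(\sin^2 t + \cos^2 t\right)dt \\
&= \int_{t=0}^{t=2\pi} 1\,dt = 2\pi.
\end{aligned}
\]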

Simple Closed Curves Enclosing the Origin

Next we see how to apply the Green’s Theorem on a more arbitrary curve, say the curve C in
Figure 4.7a. We are going to show that the line integral ∮_C F · dr, where
F = −y/(x² + y²) i + x/(x² + y²) j, is equal to 2π. First note that F is not defined at the origin
which is enclosed by the curve C, so we can’t directly apply the Green’s Theorem on this curve C.

Figure 4.7: a curve circling around the origin once

To handle this issue, we drill a hole near the origin, i.e. we remove a tiny disk with radius ε
centered at the origin from the region (see Figure 4.7b), and further split the punctured region
into two parts R1 and R2 by cutting it along line segments L1 and L2 (see Figure 4.7c and 4.7d). We
label each segment of the boundaries by C1, C2, L1, L2, Γ1 and Γ2 with directions indicated in
the diagram. Then, according to the directions of C1, C2 and C, the line integral in question
can be expressed as:
\[ \oint_C \mathbf{F}\cdot d\mathbf{r} = \oint_{C_1 + C_2} \mathbf{F}\cdot d\mathbf{r} = \int_{C_1} \mathbf{F}\cdot d\mathbf{r} + \int_{C_2} \mathbf{F}\cdot d\mathbf{r}. \]

Likewise, according to the directions of Γ1, Γ2 and Γε (recall that Γε is the counter-clockwise
circle with radius ε centered at the origin), we have:
\[ \oint_{\Gamma_\varepsilon} \mathbf{F}\cdot d\mathbf{r} = \oint_{\Gamma_1 + \Gamma_2} \mathbf{F}\cdot d\mathbf{r} = \int_{\Gamma_1} \mathbf{F}\cdot d\mathbf{r} + \int_{\Gamma_2} \mathbf{F}\cdot d\mathbf{r}. \]
Now our goal is to show that ∮_C F · dr and ∮_Γε F · dr are equal. Note that we have already
computed the latter, which is 2π. This will show that the line integral over C is
also 2π.
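In order to show this, we consider each of the regions R1 and R2 shown in Figure 4.7c and 4.7d.
Since R1 does not enclose the origin and F is defined everywhere in R1, the Green’s Theorem
can be applied to the region R1. According to the indicated directions of the boundary curves,
we see that:
\[ \text{boundary of } R_1 = C_1 + L_2 - \Gamma_1 + L_1. \]
Therefore, by the Green’s Theorem, we have:
\[ \underbrace{\int_{C_1} \mathbf{F}\cdot d\mathbf{r} + \int_{L_2} \mathbf{F}\cdot d\mathbf{r} - \int_{\Gamma_1} \mathbf{F}\cdot d\mathbf{r} + \int_{L_1} \mathbf{F}\cdot d\mathbf{r}}_{\oint_{\text{boundary of } R_1} \mathbf{F}\cdot d\mathbf{r}} = \iint_{R_1} \underbrace{(\nabla\times\mathbf{F})}_{=\mathbf{0}}\cdot\mathbf{k}\,dA = 0. \qquad (4.1) \]
Similarly, R2 does not enclose the origin so the Green’s Theorem can be applied to R2:
\[ \underbrace{\int_{C_2} \mathbf{F}\cdot d\mathbf{r} - \int_{L_1} \mathbf{F}\cdot d\mathbf{r} - \int_{\Gamma_2} \mathbf{F}\cdot d\mathbf{r} - \int_{L_2} \mathbf{F}\cdot d\mathbf{r}}_{\oint_{\text{boundary of } R_2} \mathbf{F}\cdot d\mathbf{r}} = \iint_{R_2} \underbrace{(\nabla\times\mathbf{F})}_{=\mathbf{0}}\cdot\mathbf{k}\,dA = 0. \qquad (4.2) \]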

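By summing up (4.1) and (4.2) and canceling out the line integrals over L1 and L2, we get:
\[ \int_{C_1} \mathbf{F}\cdot d\mathbf{r} + \int_{C_2} \mathbf{F}\cdot d\mathbf{r} - \int_{\Gamma_1} \mathbf{F}\cdot d\mathbf{r} - \int_{\Gamma_2} \mathbf{F}\cdot d\mathbf{r} = 0. \]
Since C = C1 + C2 and Γε = Γ1 + Γ2 according to their directions in the diagram, we have
\[ \oint_C \mathbf{F}\cdot d\mathbf{r} - \oint_{\Gamma_\varepsilon} \mathbf{F}\cdot d\mathbf{r} = 0 \]
and hence:
\[ \oint_C \mathbf{F}\cdot d\mathbf{r} = \oint_{\Gamma_\varepsilon} \mathbf{F}\cdot d\mathbf{r} = 2\pi. \]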

Self-Intersecting Curves
Next, let’s use the above result to deal with some self-intersecting curves. Consider the curve C in
Figure 4.8.

Figure 4.8: a self-intersecting curve circling around the origin twice

Since we already know how to deal with any simple closed curve enclosing the origin, we
are going to split the curve C into simple closed curves C1, C2 and C3 according to Figure 4.8.
As both C1 and C2 are simple closed curves enclosing the origin, by our previous discussion
we know:
\[ \oint_{C_1} \mathbf{F}\cdot d\mathbf{r} = \oint_{C_2} \mathbf{F}\cdot d\mathbf{r} = 2\pi. \]
For C3, it is a simple closed curve not enclosing the origin. Therefore, the Green’s Theorem
can be applied on C3 without any issues. Hence, we have
\[ \oint_{-C_3} \mathbf{F}\cdot d\mathbf{r} = \iint_R (\nabla\times\mathbf{F})\cdot\mathbf{k}\,dA = 0 \]
where R is the region enclosed by C3. Here we have again used the fact that ∇ × F = 0 inside
the region R. Reversing the orientation of a curve only changes the sign of the line integral,
so the integral over C3 is 0 as well.
According to the directions indicated on the diagram, we have:
\[ \oint_C \mathbf{F}\cdot d\mathbf{r} = \oint_{C_1} \mathbf{F}\cdot d\mathbf{r} + \oint_{C_2} \mathbf{F}\cdot d\mathbf{r} + \oint_{C_3} \mathbf{F}\cdot d\mathbf{r} = 2\pi + 2\pi + 0 = 4\pi. \]
Therefore, the winding number of this curve C is equal to 2. A similar technique can be
applied to more complicated curves which go around the origin many times.
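The winding number can also be estimated by evaluating the line integral numerically. The sketch below is an illustration only: the three test curves are hypothetical choices (they are not the curves of Figures 4.7 and 4.8), and `winding_number` is a helper name introduced here.

```python
import math

def winding_number(r, a, b, n=20000):
    """Approximate (1 / 2pi) times the line integral of
    (-y dx + x dy) / (x^2 + y^2) along the closed curve r(t), a <= t <= b."""
    total = 0.0
    x0, y0 = r(a)
    for i in range(1, n + 1):
        x1, y1 = r(a + (b - a) * i / n)
        xm, ym = (x0 + x1) / 2, (y0 + y1) / 2
        total += (-ym * (x1 - x0) + xm * (y1 - y0)) / (xm * xm + ym * ym)
        x0, y0 = x1, y1
    return total / (2 * math.pi)

def once(t):    # an off-center ellipse enclosing the origin
    return (0.5 + 2 * math.cos(t), math.sin(t))

def twice(t):   # a wavy curve whose polar angle 2t sweeps from 0 to 4*pi
    rho = 2 + math.cos(3 * t)
    return (rho * math.cos(2 * t), rho * math.sin(2 * t))

def away(t):    # a circle that does not enclose the origin
    return (3 + 2 * math.cos(t), 2 * math.sin(t))

w_once = winding_number(once, 0.0, 2 * math.pi)
w_twice = winding_number(twice, 0.0, 2 * math.pi)
w_away = winding_number(away, 0.0, 2 * math.pi)
print(w_once, w_twice, w_away)  # close to 1, 2 and 0
```

The computed values are only approximately integers; rounding them recovers the winding numbers.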

4.5 Parametric Surfaces


The goal of this and the next section is to generalize the Green’s Theorem for vector fields
on the xy-plane to the Stokes’ Theorem for vector fields on the xyz-space. While the Green’s
Theorem relates the line integral over a closed curve to the double integral over the enclosed
region, the Stokes’ Theorem relates the line integral to an integral over a surface whose boundary
is a closed curve. For that we need to define and make sense of surface integrals.

4.5.1 Surface Parametrizations


To begin with, we need to know how to describe a surface in the xyz-space. Recall that a curve
in space is represented in parametric form r(t) = x(t)i + y(t)j + z(t)k, and is thought of as the
path of a particle. The values of x(t), y(t) and z(t) represent the coordinates of the particle at
time t.
To represent a surface, we need two parameters, say u and v. The general form of a parametric
equation of a surface is:

r(u, v) = x(u, v)i + y(u, v)j + z(u, v)k.

Instead of regarding u and v as time variables, we regard them as the coordinates on a uv-plane,
and the vector r(u, v) is a function that associates each point (u, v) on the uv-plane to a point
(x(u, v), y(u, v), z(u, v)) in the xyz-space. Since there are two parameters, the image of the
function is a surface in the xyz-space. In other words, the function r(u, v) can be thought of as a
transformation that “wraps” the uv-paper onto the surface.

The function r(u, v) is called a parametrization of the surface, and “parametrizing a surface”
means giving a parametrization of the surface. As we will see later,
parametrizing a surface is often the first step of computing a surface integral.

Remark: Although we use (u, v) to denote the parameters, you can use any pair of variables
for the parameters, provided that there is no confusion.

Parametrization via Coordinate Systems


Let’s look at several elementary examples such as cylinders, spheres and cones.

■ Example 4.12 Find a parametrization for the cylinder with radius r0 and with z-axis as the
central axis.

■Solution If, under a certain coordinate system, the surface has one of the coordinates being
constant, then giving a parametrization to that surface is fairly easy: simply take the other
two coordinates as parameters, and define r according to the conversion rules between this
coordinate system and the rectangular coordinate system.
The cylinder described in the problem can be represented by the equation r = r0 under
cylindrical coordinates (r, θ, z). The conversion rule between cylindrical and rectangular
coordinates is given by:

x = r cos θ
y = r sin θ
z=z

Fix r = r0 , then x, y and z are functions of (θ, z). Simply set:

x (θ, z) = r0 cos θ, y(θ, z) = r0 sin θ, z(θ, z) = z,

then the parametrization is given by:

r(θ, z) = (r0 cos θ )i + (r0 sin θ )j + zk, 0 ≤ θ ≤ 2π, −∞ < z < ∞.

Figure 4.9: parametric plot of a cylinder

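One can also specify the range of z so that the parametrization gives a finite cylinder. For
instance,
r(θ, z) = (r0 cos θ)i + (r0 sin θ)j + zk, 0 ≤ θ ≤ 2π, 0 ≤ z ≤ 1
gives the finite cylinder with unit height (from z = 0 to z = 1).
Similarly, a cone making an angle of π/4 with the z-axis can be represented by z = r in cylindrical
coordinates, or ϕ = π/4 in spherical coordinates. Therefore, it can be parametrized in two
different ways:
\[
\begin{aligned}
\mathbf{r}_1(r, \theta) &= (r\cos\theta)\mathbf{i} + (r\sin\theta)\mathbf{j} + r\mathbf{k}, \quad 0 \le r < \infty,\; 0 \le \theta \le 2\pi \\
\mathbf{r}_2(\rho, \theta) &= \left(\rho\sin\frac{\pi}{4}\cos\theta\right)\mathbf{i} + \left(\rho\sin\frac{\pi}{4}\sin\theta\right)\mathbf{j} + \left(\rho\cos\frac{\pi}{4}\right)\mathbf{k} \\
&= \frac{\rho\cos\theta}{\sqrt{2}}\,\mathbf{i} + \frac{\rho\sin\theta}{\sqrt{2}}\,\mathbf{j} + \frac{\rho}{\sqrt{2}}\,\mathbf{k}, \quad 0 \le \rho < \infty,\; 0 \le \theta \le 2\pi
\end{aligned}
\]
A sphere with radius 3 can be represented by ρ = 3 in spherical coordinates, and so it can be
parametrized by:
\[ \mathbf{r}_3(\theta, \phi) = (3\sin\phi\cos\theta)\mathbf{i} + (3\sin\phi\sin\theta)\mathbf{j} + (3\cos\phi)\mathbf{k}, \quad 0 \le \theta \le 2\pi,\; 0 \le \phi \le \pi. \]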
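A quick sanity check for any parametrization is to substitute sample parameter values and confirm that the resulting points satisfy the Cartesian equation of the surface. The sketch below does this for the cylinder, the cone and the sphere, using the standard spherical conversion z = ρ cos ϕ. The radius r0 = 2 and the sample parameter values are assumptions chosen for illustration.

```python
import math

def cylinder(theta, z, r0=2.0):
    """Cylinder of radius r0 about the z-axis (r0 = 2 is a sample value)."""
    return (r0 * math.cos(theta), r0 * math.sin(theta), z)

def cone(rho, theta):
    """Cone phi = pi/4, in spherical coordinates."""
    s, c = math.sin(math.pi / 4), math.cos(math.pi / 4)
    return (rho * s * math.cos(theta), rho * s * math.sin(theta), rho * c)

def sphere(theta, phi):
    """Sphere rho = 3, in spherical coordinates."""
    return (3 * math.sin(phi) * math.cos(theta),
            3 * math.sin(phi) * math.sin(theta),
            3 * math.cos(phi))

# Arbitrary sample parameter pairs (second entries lie in [0, pi]).
samples = [(0.3, 0.7), (1.1, 2.5), (2.0, 0.1), (2.9, 1.3)]

# Each error measures how far the points are from the Cartesian equation.
cyl_err = max(abs(x * x + y * y - 4.0)
              for x, y, _ in (cylinder(u, v) for u, v in samples))
cone_err = max(abs(z - math.hypot(x, y))
               for x, y, z in (cone(u, v) for u, v in samples))
sph_err = max(abs(x * x + y * y + z * z - 9.0)
              for x, y, z in (sphere(u, v) for u, v in samples))
print(cyl_err, cone_err, sph_err)  # all at rounding-error level
```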
Parametrization of Graphs
If a surface can be represented by a Cartesian equation (i.e. level-set form) such as x² + y − z³ = 1,
and you can write one of the variables as a function of the other two variables (such
as y = z³ − x² + 1 in this example), then we can use the other two variables as parameters. For
instance, the surface y = z³ − x² + 1 can be parametrized as:
\[ \mathbf{r}(x, z) = x\,\mathbf{i} + \underbrace{(z^3 - x^2 + 1)}_{y}\,\mathbf{j} + z\,\mathbf{k}. \]

However, x cannot be written as a function of y and z for the surface y = z³ − x² + 1, since
x = ±√(z³ − y + 1) and the ± prevents x from being a function of y and z. On the other hand, z can
be written as a function of x and y, since z³ = x² + y − 1 and there is exactly one real cube root of
x² + y − 1. However, the resulting parametrization
\[ \mathbf{r}(x, y) = x\,\mathbf{i} + y\,\mathbf{j} + \left(x^2 + y - 1\right)^{1/3}\mathbf{k} \]

is not easy to work with.


Some Interesting Surfaces
Although surfaces such as cones, spheres and cylinders can be easily parametrized, it is not
the case for many interesting surfaces.
Below are some interesting examples of surfaces. It is not easy to write down (or even to
explain) their parametrizations, since doing so demands quite a lot of geometric intuition.
A torus (i.e. donut), for example, has a complicated parametrization given by:
r(u, v) = ((3 + cos u) cos v) i + ((3 + cos u) sin v) j + (sin u) k,
with 0 ≤ u ≤ 2π, 0 ≤ v ≤ 2π.

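The Möbius strip, a famous object in topology, has the following parametrization:
\[ \mathbf{r}(u, v) = \left(1 + \frac{v}{2}\cos\frac{u}{2}\right)\cos u\,\mathbf{i} + \left(1 + \frac{v}{2}\cos\frac{u}{2}\right)\sin u\,\mathbf{j} + \frac{v}{2}\sin\frac{u}{2}\,\mathbf{k} \]
where 0 ≤ u ≤ 2π and −1 ≤ v ≤ 1.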
Since r denotes the position vector xi + yj + zk, we can also write down a parametrization
in an equation form (especially when the expression of the parametrization is too long to fit in
one line). For instance, the Möbius strip parametrization can be equivalently written as:
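\[
\begin{aligned}
x &= \left(1 + \frac{v}{2}\cos\frac{u}{2}\right)\cos u \\
y &= \left(1 + \frac{v}{2}\cos\frac{u}{2}\right)\sin u \\
z &= \frac{v}{2}\sin\frac{u}{2}
\end{aligned}
\]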

where 0 ≤ u ≤ 2π and −1 ≤ v ≤ 1.

Figure 4.10: Möbius Strip

The following parametrization, which looks intimidating, describes a very beautiful surface
called the Klein bottle (see Figure 4.11):
\[
\begin{aligned}
x &= -\frac{2}{15}\cos u\,\bigl(3\cos v - 30\sin u + 90\cos^4 u\sin u - 60\cos^6 u\sin u + 5\cos u\sin u\cos v\bigr) \\
y &= -\frac{1}{15}\sin u\,\bigl(3\cos v - 3\cos^2 u\cos v - 48\cos^4 u\cos v + 48\cos^6 u\cos v \\
&\qquad\quad - 60\sin u + 5\cos u\sin u\cos v - 5\cos^3 u\cos v\sin u \\
&\qquad\quad - 80\cos^5 u\sin u\cos v + 80\cos^7 u\sin u\cos v\bigr) \\
z &= \frac{2}{15}\,(3 + 5\cos u\sin u)\sin v
\end{aligned}
\]
for 0 ≤ u ≤ π and 0 ≤ v ≤ 2π.

Figure 4.11: the Klein Bottle

Note that the Klein bottle is self-intersecting as depicted in the diagram.


Geometry of Parametric Surfaces
Consider a parametrization r(u, v) of a surface. Keeping v = v0 fixed and letting u vary, the
parametrization function r(u, v0 ) depends only on u and it will give a curve on the surface.
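This curve is often called a u-curve, since u is the parameter being varied. The tangent vector to any u-curve can
be computed by taking the u-derivative of r, i.e.
\[ \frac{\partial \mathbf{r}}{\partial u} = \text{tangent vector to the } u\text{-curve.} \]
Likewise, when u = u0 is fixed while v varies, the function r(u0, v) depends only on v. The
curve traced out by this function when v varies is called a v-curve, and
\[ \frac{\partial \mathbf{r}}{\partial v} = \text{tangent vector to the } v\text{-curve.} \]
Both ∂r/∂u and ∂r/∂v are tangent to the surface, hence their cross product is normal to the surface.
Provided this cross product is nonzero, a unit normal at each point is therefore given by:
\[ \hat{\mathbf{n}} = \frac{\dfrac{\partial \mathbf{r}}{\partial u} \times \dfrac{\partial \mathbf{r}}{\partial v}}{\left| \dfrac{\partial \mathbf{r}}{\partial u} \times \dfrac{\partial \mathbf{r}}{\partial v} \right|}. \]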
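The recipe above (differentiate along the u-curve and the v-curve, then take the cross product) can also be carried out numerically. The sketch below is only an illustration under our own assumptions: it approximates the partial derivatives by central differences with a hypothetical step size h, and tests the result on the sphere of radius 3 with (u, v) = (θ, ϕ). For this ordering of the parameters the unit normal turns out to point towards the origin.

```python
import math

def sphere(theta, phi, radius=3.0):
    """Sphere of radius 3 parametrized by (theta, phi)."""
    return (radius * math.sin(phi) * math.cos(theta),
            radius * math.sin(phi) * math.sin(theta),
            radius * math.cos(phi))

def partials(r, u, v, h=1e-6):
    """Central-difference approximations of dr/du and dr/dv at (u, v)."""
    r_u = tuple((p - m) / (2 * h) for p, m in zip(r(u + h, v), r(u - h, v)))
    r_v = tuple((p - m) / (2 * h) for p, m in zip(r(u, v + h), r(u, v - h)))
    return r_u, r_v

def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def unit_normal(r, u, v):
    """Unit normal (dr/du x dr/dv) / |dr/du x dr/dv|."""
    n = cross(*partials(r, u, v))
    length = math.sqrt(sum(c * c for c in n))
    return tuple(c / length for c in n)

theta0, phi0 = 0.7, 1.1                      # arbitrary sample point
n_hat = unit_normal(sphere, theta0, phi0)
radial = tuple(c / 3.0 for c in sphere(theta0, phi0))   # outward radial unit vector

print("unit normal :", n_hat)
print("radial unit :", radial)
```

Here n_hat agrees with the negative of the radial unit vector, so swapping the order of the two tangent vectors would give the outward normal instead.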

4.5.2 Surface Integrals


We are going to introduce surface integrals in this subsection. A double integral such as
\[ \iint_R f(x, y)\, dA \]
is a special type of surface integral in which the region of integration R lies on the flat xy-plane. In a general
surface integral, the region of integration can be a curved surface. Many geometric
and physical quantities, such as surface area, surface flux, and moment of inertia for some
shell objects, can be computed using surface integrals. The Stokes’ Theorem to be introduced
in the next section also involves surface integrals.
We first state the definition of surface integrals, compute some examples and then explain
their geometric and physical meaning.
Definition 4.6 — Surface Integrals. Given a surface S parametrized by r(u, v) with a ≤ u ≤ b
and c ≤ v ≤ d, and a continuous, scalar-valued function f(x, y, z), the surface integral of f
over the surface S is denoted and defined to be:
\[ \iint_S f\, dS = \int_{v=c}^{v=d} \int_{u=a}^{u=b} f(\mathbf{r}(u, v)) \left| \frac{\partial \mathbf{r}}{\partial u} \times \frac{\partial \mathbf{r}}{\partial v} \right| du\, dv. \]

i The line integral of a vector field over a curve C is independent of how we parametrize C.
Likewise, the surface integral over a surface S is independent of the choice of
parametrization r(u, v). The proof uses the change-of-variables technique in multivariable
calculus and is omitted here.
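Definition 4.6 translates almost directly into a numerical procedure, and the claimed independence of parametrization can be tested with it. The sketch below is an illustration only: `surface_integral` is a hypothetical helper of ours that uses a midpoint rule on an n × n grid and central differences for the partial derivatives. It integrates f = z over the part of the cone z = r with 0 ≤ r ≤ 1, once with the cylindrical parametrization r1 and once with the spherical one r2 (where 0 ≤ ρ ≤ √2). Both should be close to the exact value 2√2 π/3.

```python
import math

def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def norm(a):
    """Euclidean length of a 3-vector."""
    return math.sqrt(sum(c * c for c in a))

def surface_integral(f, r, u_range, v_range, n=100, h=1e-6):
    """Midpoint-rule approximation of the double integral of f(r(u,v)) |r_u x r_v| du dv."""
    (a, b), (c, d) = u_range, v_range
    du, dv = (b - a) / n, (d - c) / n
    total = 0.0
    for i in range(n):
        u = a + (i + 0.5) * du
        for j in range(n):
            v = c + (j + 0.5) * dv
            r_u = [(p - m) / (2 * h) for p, m in zip(r(u + h, v), r(u - h, v))]
            r_v = [(p - m) / (2 * h) for p, m in zip(r(u, v + h), r(u, v - h))]
            total += f(*r(u, v)) * norm(cross(r_u, r_v)) * du * dv
    return total

def cone_cylindrical(r, theta):
    return (r * math.cos(theta), r * math.sin(theta), r)

def cone_spherical(rho, theta):
    s = math.sin(math.pi / 4)
    return (rho * s * math.cos(theta), rho * s * math.sin(theta), rho * s)

def height(x, y, z):
    """Integrand f(x, y, z) = z."""
    return z

I1 = surface_integral(height, cone_cylindrical, (0.0, 1.0), (0.0, 2 * math.pi))
I2 = surface_integral(height, cone_spherical, (0.0, math.sqrt(2.0)), (0.0, 2 * math.pi))
exact = 2 * math.sqrt(2.0) * math.pi / 3

print(I1, I2, exact)
```

The two approximations differ only by discretization error, which is what the independence statement predicts.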

Notation If a surface is closed (meaning it has no boundary), it is conventional to denote
the integral sign by:
\[ \oiint_S \]
Examples of closed surfaces include spheres and tori, while a hemisphere (only the
spherical part; the flat part is not included) is not closed since it has a circle as its boundary.

■ Example 4.13 Let S be the sphere of radius a centered at the origin. Evaluate the surface
integral
\[ \oiint_S \left(x^2 + y^2\right) dS. \]

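■ Solution We first parametrize the surface. Using spherical coordinates, the sphere is
represented by ρ = a. Take (θ, ϕ) as parameters, then a parametrization of the sphere is given
by:
r(θ, ϕ) = (a sin ϕ cos θ)i + (a sin ϕ sin θ)j + (a cos ϕ)k
where 0 ≤ θ ≤ 2π and 0 ≤ ϕ ≤ π.
According to the definition of surface integrals, we need to compute |∂r/∂θ × ∂r/∂ϕ|. The computation is
straightforward, although quite tedious:

\[
\begin{aligned}
\frac{\partial \mathbf{r}}{\partial \theta} &= (-a\sin\phi\sin\theta)\,\mathbf{i} + (a\sin\phi\cos\theta)\,\mathbf{j} + 0\,\mathbf{k} \\
\frac{\partial \mathbf{r}}{\partial \phi} &= (a\cos\phi\cos\theta)\,\mathbf{i} + (a\cos\phi\sin\theta)\,\mathbf{j} + (-a\sin\phi)\,\mathbf{k} \\
\frac{\partial \mathbf{r}}{\partial \theta} \times \frac{\partial \mathbf{r}}{\partial \phi} &=
\begin{vmatrix}
\mathbf{i} & \mathbf{j} & \mathbf{k} \\
-a\sin\phi\sin\theta & a\sin\phi\cos\theta & 0 \\
a\cos\phi\cos\theta & a\cos\phi\sin\theta & -a\sin\phi
\end{vmatrix} \\
&= (-a^2\sin^2\phi\cos\theta)\,\mathbf{i} + (-a^2\sin^2\phi\sin\theta)\,\mathbf{j} + (-a^2\sin\phi\cos\phi)\,\mathbf{k} \\
\left| \frac{\partial \mathbf{r}}{\partial \theta} \times \frac{\partial \mathbf{r}}{\partial \phi} \right| &= \sqrt{a^4\sin^4\phi\,(\cos^2\theta + \sin^2\theta) + a^4\sin^2\phi\cos^2\phi} \\
&= \sqrt{a^4\sin^2\phi\,(\sin^2\phi + \cos^2\phi)} \\
&= a^2\sin\phi \qquad (\text{since } \sin\phi \ge 0 \text{ for } 0 \le \phi \le \pi).
\end{aligned}
\]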

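Next we compute the surface integral. When (x, y, z) is on the sphere S, we have x =
a sin ϕ cos θ and y = a sin ϕ sin θ. Therefore, the integrand is given by:

x² + y² = a² sin² ϕ (cos² θ + sin² θ) = a² sin² ϕ.

According to the bounds of θ and ϕ in the parametrization, the surface integral of x² + y²
over S is given by:
\[
\begin{aligned}
\oiint_S \left(x^2 + y^2\right) dS &= \int_{\phi=0}^{\phi=\pi} \int_{\theta=0}^{\theta=2\pi} \left(x^2 + y^2\right) \left| \frac{\partial \mathbf{r}}{\partial \theta} \times \frac{\partial \mathbf{r}}{\partial \phi} \right| d\theta\, d\phi \\
&= \int_{\phi=0}^{\phi=\pi} \int_{\theta=0}^{\theta=2\pi} \underbrace{a^2\sin^2\phi}_{x^2 + y^2} \cdot \underbrace{a^2\sin\phi}_{\left| \frac{\partial \mathbf{r}}{\partial \theta} \times \frac{\partial \mathbf{r}}{\partial \phi} \right|} \, d\theta\, d\phi \\
&= \int_{\phi=0}^{\phi=\pi} \int_{\theta=0}^{\theta=2\pi} a^4\sin^3\phi \, d\theta\, d\phi \\
&= 2\pi a^4 \int_{\phi=0}^{\phi=\pi} \sin^3\phi \, d\phi \\
&= 2\pi a^4 \cdot \frac{4}{3} = \frac{8\pi a^4}{3}.
\end{aligned}
\]

Here we used the fact from single-variable calculus that:
\[ \int_0^{\pi} \sin^3\phi \, d\phi = \frac{4}{3}. \]

We leave this part as an exercise.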
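As a sanity check on the value 8πa⁴/3, one can approximate the double integral over (θ, ϕ) by a Riemann sum. The following sketch assumes a midpoint rule with grid sizes chosen by us and the test radius a = 2; it is a numerical check, not a replacement for the derivation.

```python
import math

def sphere_integral_x2_plus_y2(a, n_theta=60, n_phi=300):
    """Midpoint-rule approximation of the integral of x^2 + y^2 over the sphere of radius a."""
    d_theta = 2 * math.pi / n_theta
    d_phi = math.pi / n_phi
    total = 0.0
    for i in range(n_theta):
        theta = (i + 0.5) * d_theta
        for j in range(n_phi):
            phi = (j + 0.5) * d_phi
            x = a * math.sin(phi) * math.cos(theta)
            y = a * math.sin(phi) * math.sin(theta)
            dS = a * a * math.sin(phi) * d_theta * d_phi   # surface element a^2 sin(phi)
            total += (x * x + y * y) * dS
    return total

a = 2.0                                   # arbitrary test radius
approx = sphere_integral_x2_plus_y2(a)
exact = 8 * math.pi * a ** 4 / 3

print(approx, exact)
```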

■ Example 4.14 Let S be the part of the plane 3x + 2y + z = 1 lying over the region 0 ≤ x ≤ 1 and
0 ≤ y ≤ 1. Compute the surface integral:
\[ \iint_S (x + y + z)\, dS. \]

■ Solution The equation of the given plane can be written as a graph z = 1 − 3x − 2y over
the given region 0 ≤ x ≤ 1 and 0 ≤ y ≤ 1. We can take x and y as parameters and so a
parametrization of the plane is given by:

r( x, y) = xi + yj + (1 − 3x − 2y)k

where 0 ≤ x ≤ 1 and 0 ≤ y ≤ 1.
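Next we compute all the ingredients of the surface integral:

\[
\begin{aligned}
\frac{\partial \mathbf{r}}{\partial x} &= \mathbf{i} - 3\mathbf{k} \\
\frac{\partial \mathbf{r}}{\partial y} &= \mathbf{j} - 2\mathbf{k} \\
\frac{\partial \mathbf{r}}{\partial x} \times \frac{\partial \mathbf{r}}{\partial y} &= 3\mathbf{i} + 2\mathbf{j} + \mathbf{k} \\
\left| \frac{\partial \mathbf{r}}{\partial x} \times \frac{\partial \mathbf{r}}{\partial y} \right| &= \sqrt{3^2 + 2^2 + 1^2} = \sqrt{14} \\
x + y + z &= x + y + (1 - 3x - 2y) = 1 - 2x - y
\end{aligned}
\]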

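Finally we compute the surface integral:

\[
\begin{aligned}
\iint_S (x + y + z)\, dS &= \int_0^1 \int_0^1 (1 - 2x - y) \cdot \sqrt{14}\, dx\, dy \\
&= \sqrt{14} \int_0^1 \Bigl[ x - x^2 - xy \Bigr]_{x=0}^{x=1} dy \\
&= \sqrt{14} \int_0^1 -y \, dy \\
&= \left[ -\frac{\sqrt{14}\, y^2}{2} \right]_0^1 \\
&= -\frac{\sqrt{14}}{2}.
\end{aligned}
\]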
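Since the integrand (1 − 2x − y)√14 is linear in x and y, a midpoint Riemann sum over the unit square reproduces the answer essentially exactly. A minimal check, with a grid size n chosen arbitrarily by us:

```python
import math

def integrand(x, y):
    """(x + y + z) * |r_x x r_y| on the plane z = 1 - 3x - 2y."""
    z = 1 - 3 * x - 2 * y
    area_factor = math.sqrt(3 ** 2 + 2 ** 2 + 1 ** 2)   # |3i + 2j + k| = sqrt(14)
    return (x + y + z) * area_factor

n = 100                      # number of subdivisions per side (arbitrary)
h = 1.0 / n
approx = sum(integrand((i + 0.5) * h, (j + 0.5) * h)
             for i in range(n) for j in range(n)) * h * h
exact = -math.sqrt(14) / 2

print(approx, exact)
```

The negative value is not a contradiction: dS is always positive, but the integrand x + y + z is negative on most of this patch of the plane.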

Surface Element
Now we explain the geometric and physical meaning of surface integrals. Given a parametric
surface S with parametrization r(u, v), a ≤ u ≤ b and c ≤ v ≤ d, from now on we
denote:
\[ \text{Notation} \qquad dS = \left| \frac{\partial \mathbf{r}}{\partial u} \times \frac{\partial \mathbf{r}}{\partial v} \right| du\, dv \]

It is called the surface element of the integral. If we subdivide the domain in the uv-plane
into small rectangular pieces with area ∆u ∆v, the parametrization r(u, v) transforms them into
small pieces ∆S on the surface (see Figure 4.12).

Figure 4.12: geometric meaning of the surface element

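If the number of subdivisions is very large so that each ∆S is very small, then ∆S is
approximately a parallelogram. The red side of the parallelogram has length approximately
equal to |∂r/∂u| ∆u (to see this, regard u as the time; then the distance traveled after ∆u units of time
is approximately speed × time). Similarly, the blue side of the parallelogram has length
approximately equal to |∂r/∂v| ∆v. Suppose the angle of the parallelogram is θ, then the area of
∆S is approximately:
\[ \left| \frac{\partial \mathbf{r}}{\partial u} \right| \Delta u \cdot \left| \frac{\partial \mathbf{r}}{\partial v} \right| \Delta v \cdot \sin\theta = \underbrace{\left| \frac{\partial \mathbf{r}}{\partial u} \right| \left| \frac{\partial \mathbf{r}}{\partial v} \right| \sin\theta}_{\left| \frac{\partial \mathbf{r}}{\partial u} \times \frac{\partial \mathbf{r}}{\partial v} \right|} \cdot \Delta u\, \Delta v = \left| \frac{\partial \mathbf{r}}{\partial u} \times \frac{\partial \mathbf{r}}{\partial v} \right| \Delta u\, \Delta v. \]

Infinitesimally, ∆S becomes the surface element dS, and ∆u ∆v becomes du dv. In other
words, the surface element dS represents the area of a very tiny piece of the subdivision on the
surface. Summing up the areas of all tiny subdivisions, we get the surface area of S, i.e.
\[ \iint_S dS = \text{surface area of } S. \]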
Different geometric or physical meanings of the integrand f give different meanings to the
surface integral of f over S. For instance,

If f is                 each element f dS means      ∬_S f dS means
1                       area of dS                   total surface area
surface density         mass of dS                   total mass of the surface
density × (x² + y²)     moment of inertia of dS      moment of inertia of S

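Let S be the sphere of radius a centered at the origin. We computed in Example 4.13 that
\[ \oiint_S \left(x^2 + y^2\right) dS = \frac{8\pi a^4}{3}. \]
Suppose this spherical shell has a uniform surface density δ, then the integral ∯_S δ(x² + y²) dS
is the moment of inertia I_z about the z-axis. Since its formula in many physics textbooks is written
in terms of the total mass m of the sphere, let’s rewrite it in terms of m. The shell has surface
area 4πa², so δ = m/(4πa²), and therefore:
\[
\begin{aligned}
I_z &= \oiint_S \delta \left(x^2 + y^2\right) dS \\
&= \delta \left( \frac{8\pi a^4}{3} \right) \\
&= \frac{m}{4\pi a^2} \cdot \frac{8\pi a^4}{3} \\
&= \frac{2}{3}\, m a^2.
\end{aligned}
\]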

4.5.3 Surface Flux


Surface flux is an important type of surface integral in both mathematics and physics. It will
appear in the statement of the Stokes’ Theorem, and also plays an important role in electricity and
magnetism. In a nutshell, the surface flux of a vector field F through a surface S measures the
“amount” of vectors passing through S.
Let’s begin our discussion with the simplest case where the vector field F is constant and
the surface is simply a flat plane.

Figure 4.13: surface flux for uniform vector field through a plane

The force F can be decomposed into two components, one perpendicular to the plane and
another parallel to the plane. As the flux counts only the amount of vectors passing through the plane,
only the perpendicular component should be counted. Suppose the force F makes an angle
θ with the unit normal vector n̂, then the perpendicular component of the force has (signed) length
F · n̂ = |F| cos θ.
Furthermore, there are vectors at every point on the plane. The larger the plane, the more
vectors pass through it. Therefore, the flux of F through the plane should be defined as:

(F · n̂)(area of the plane).


Now consider a curved surface (so that n̂ is no longer constant) and suppose F is no longer a constant
vector field. Infinitesimally, each area element can be regarded as a tiny flat parallelogram, and
both F and n̂ can be regarded as constant vectors over each tiny parallelogram. Recall that
dS is the area of this element. Therefore, the flux of F through this tiny bit of surface is given
by:
F · n̂ dS
Summing up, the total flux of F through the entire surface is given by a surface integral as
stated in the definition below:

Figure 4.14: flux through a tiny surface element

Definition 4.7 — Surface Flux. Given a vector field F and a surface S, the surface flux of F
through S is defined to be the surface integral:
\[ \iint_S \mathbf{F} \cdot \hat{\mathbf{n}}\, dS \]

where n̂ denotes the unit normal vector to S at each point.

i Note that there are often two choices of n̂, so the sign of the surface flux depends on which
direction of n̂ is chosen. For a closed surface such as a sphere, it is a convention to choose
the outward unit normal.

i There are some surfaces on which you cannot consistently “choose” a unit normal convention. For
instance, if you pick a normal vector on the Möbius strip and let it vary continuously over
the strip, the normal vector may end up pointing in the opposite direction when it returns to
the original position! We call these non-orientable surfaces. We do not define surface flux
on non-orientable surfaces.
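The flip of the normal vector on the Möbius strip can be observed numerically. The sketch below, an illustration under our own assumptions, uses the parametrization of the Möbius strip given earlier, computes ∂r/∂u × ∂r/∂v by central differences along the centre circle v = 0, and compares the normal at u = 0 with the normal at u = 2π. These two parameter values give the same point of the strip.

```python
import math

def mobius(u, v):
    """Parametrization of the Mobius strip from the notes."""
    w = 1 + (v / 2) * math.cos(u / 2)
    return (w * math.cos(u), w * math.sin(u), (v / 2) * math.sin(u / 2))

def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def normal(u, v, h=1e-6):
    """r_u x r_v at (u, v), with central-difference partial derivatives."""
    r_u = [(p - m) / (2 * h) for p, m in zip(mobius(u + h, v), mobius(u - h, v))]
    r_v = [(p - m) / (2 * h) for p, m in zip(mobius(u, v + h), mobius(u, v - h))]
    return cross(r_u, r_v)

start_point, end_point = mobius(0.0, 0.0), mobius(2 * math.pi, 0.0)
n_start, n_end = normal(0.0, 0.0), normal(2 * math.pi, 0.0)
dot = sum(a * b for a, b in zip(n_start, n_end))

# Length of the normal along the whole centre circle: it never vanishes,
# so the normal really is carried continuously around the strip.
min_length = min(math.sqrt(sum(c * c for c in normal(2 * math.pi * k / 200, 0.0)))
                 for k in range(201))

print("same point:", start_point, end_point)
print("normals   :", n_start, n_end, "dot =", dot)
```

The two normals are negatives of each other although the base point is the same, which is exactly the obstruction described in the note above.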

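Generally speaking, the surface flux can be computed by parametrizing the surface. Given
a parametrization r(u, v), with a ≤ u ≤ b and c ≤ v ≤ d, of a surface S, the unit normal vector
n̂ is given by:
\[ \hat{\mathbf{n}} = \pm \frac{\dfrac{\partial \mathbf{r}}{\partial u} \times \dfrac{\partial \mathbf{r}}{\partial v}}{\left| \dfrac{\partial \mathbf{r}}{\partial u} \times \dfrac{\partial \mathbf{r}}{\partial v} \right|} \]

where ± depends on the convention chosen.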


It looks complicated to compute the normal vector and the surface flux. However, recall that
the area element dS is given by:

\[ dS = \left| \frac{\partial \mathbf{r}}{\partial u} \times \frac{\partial \mathbf{r}}{\partial v} \right| du\, dv. \]

Therefore, when one multiplies F · n̂ by dS, the factor |∂r/∂u × ∂r/∂v| cancels! Precisely, we have:

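Theorem 4.5 Let r(u, v), with a ≤ u ≤ b and c ≤ v ≤ d, be a parametrization of a surface S.
The surface flux of a vector field F through S can be computed by:
\[ \iint_S \mathbf{F} \cdot \hat{\mathbf{n}}\, dS = \pm \int_c^d \int_a^b \mathbf{F} \cdot \left( \frac{\partial \mathbf{r}}{\partial u} \times \frac{\partial \mathbf{r}}{\partial v} \right) du\, dv \]

where ± depends on the chosen convention of n̂.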



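Proof. From the given parametrization r(u, v), we have

\[
\begin{aligned}
\iint_S \mathbf{F} \cdot \hat{\mathbf{n}}\, dS &= \int_c^d \int_a^b \mathbf{F} \cdot \hat{\mathbf{n}} \left| \frac{\partial \mathbf{r}}{\partial u} \times \frac{\partial \mathbf{r}}{\partial v} \right| du\, dv \\
&= \pm \int_c^d \int_a^b \mathbf{F} \cdot \frac{\dfrac{\partial \mathbf{r}}{\partial u} \times \dfrac{\partial \mathbf{r}}{\partial v}}{\left| \dfrac{\partial \mathbf{r}}{\partial u} \times \dfrac{\partial \mathbf{r}}{\partial v} \right|} \left| \frac{\partial \mathbf{r}}{\partial u} \times \frac{\partial \mathbf{r}}{\partial v} \right| du\, dv \\
&= \pm \int_c^d \int_a^b \mathbf{F} \cdot \left( \frac{\partial \mathbf{r}}{\partial u} \times \frac{\partial \mathbf{r}}{\partial v} \right) du\, dv.
\end{aligned}
\]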
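Theorem 4.5 suggests a simple numerical recipe: integrate F · (∂r/∂u × ∂r/∂v) over the parameter rectangle and then fix the sign according to the chosen normal. The sketch below is our own illustration (the helper `flux`, its `sign` argument, the grid size and the finite-difference step are all assumptions). It is tested on F = xi + yj + zk through the unit sphere parametrized by (θ, ϕ), where the raw integral is negative because ∂r/∂θ × ∂r/∂ϕ points inward, so the outward flux is obtained with sign = −1.

```python
import math

def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def flux(F, r, u_range, v_range, sign=+1, n=100, h=1e-6):
    """sign * (double integral of F(r(u,v)) . (r_u x r_v) du dv), by the midpoint rule."""
    (a, b), (c, d) = u_range, v_range
    du, dv = (b - a) / n, (d - c) / n
    total = 0.0
    for i in range(n):
        u = a + (i + 0.5) * du
        for j in range(n):
            v = c + (j + 0.5) * dv
            r_u = [(p - m) / (2 * h) for p, m in zip(r(u + h, v), r(u - h, v))]
            r_v = [(p - m) / (2 * h) for p, m in zip(r(u, v + h), r(u, v - h))]
            normal = cross(r_u, r_v)
            total += sum(f * c_ for f, c_ in zip(F(*r(u, v)), normal)) * du * dv
    return sign * total

def unit_sphere(theta, phi):
    return (math.sin(phi) * math.cos(theta),
            math.sin(phi) * math.sin(theta),
            math.cos(phi))

def radial_field(x, y, z):
    """F = x i + y j + z k."""
    return (x, y, z)

raw = flux(radial_field, unit_sphere, (0.0, 2 * math.pi), (0.0, math.pi))
outward = flux(radial_field, unit_sphere, (0.0, 2 * math.pi), (0.0, math.pi), sign=-1)

print(raw, outward, 4 * math.pi)
```

On the unit sphere F · n̂ = 1 for the outward normal, so the outward flux equals the surface area 4π, which the numerical value reproduces up to discretization error.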

■ Example 4.15 Consider the vector field:

\[ \mathbf{F} = -GMm\, \frac{x\,\mathbf{i} + y\,\mathbf{j} + z\,\mathbf{k}}{(x^2 + y^2 + z^2)^{3/2}}. \]

Let S be the part of the horizontal plane z = 1 over the region x² + y² ≤ 1. Compute the surface
flux of F through S, i.e.
\[ \iint_S \mathbf{F} \cdot \hat{\mathbf{n}}\, dS. \]
Here n̂ is chosen to be the upward normal.

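■ Solution First we parametrize the surface S. Since the plane z = 1 is a graph over the
xy-plane, it seems the easiest way is to use x and y as parameters. However, the base region
x² + y² ≤ 1 is not a rectangle so it may be difficult to set up the ranges of x and y for the
parametrization. Since the base region is a disk, we can use cylindrical coordinates
instead. Let
r(r, θ) = (r cos θ)i + (r sin θ)j + k (since z = 1),
where 0 ≤ r ≤ 1 and 0 ≤ θ ≤ 2π. Next we compute all the ingredients:

\[
\begin{aligned}
\frac{\partial \mathbf{r}}{\partial r} &= (\cos\theta)\,\mathbf{i} + (\sin\theta)\,\mathbf{j} \\
\frac{\partial \mathbf{r}}{\partial \theta} &= (-r\sin\theta)\,\mathbf{i} + (r\cos\theta)\,\mathbf{j} \\
\frac{\partial \mathbf{r}}{\partial r} \times \frac{\partial \mathbf{r}}{\partial \theta} &=
\begin{vmatrix}
\mathbf{i} & \mathbf{j} & \mathbf{k} \\
\cos\theta & \sin\theta & 0 \\
-r\sin\theta & r\cos\theta & 0
\end{vmatrix}
= (r\cos^2\theta + r\sin^2\theta)\,\mathbf{k} = r\,\mathbf{k} \quad (\text{which is upward}) \\
\mathbf{F} &= -GMm\, \frac{(r\cos\theta)\,\mathbf{i} + (r\sin\theta)\,\mathbf{j} + z\,\mathbf{k}}{(r^2\cos^2\theta + r^2\sin^2\theta + z^2)^{3/2}} \\
&= -GMm\, \frac{(r\cos\theta)\,\mathbf{i} + (r\sin\theta)\,\mathbf{j} + \mathbf{k}}{(r^2 + 1)^{3/2}} \quad (\text{since } z = 1) \\
\mathbf{F} \cdot \left( \frac{\partial \mathbf{r}}{\partial r} \times \frac{\partial \mathbf{r}}{\partial \theta} \right) &= -\frac{GMm\, r}{(r^2 + 1)^{3/2}}.
\end{aligned}
\]

Therefore, by Theorem 4.5, the surface flux is given by

\[
\begin{aligned}
\iint_S \mathbf{F} \cdot \hat{\mathbf{n}}\, dS &= \int_0^{2\pi} \int_0^1 \mathbf{F} \cdot \left( \frac{\partial \mathbf{r}}{\partial r} \times \frac{\partial \mathbf{r}}{\partial \theta} \right) dr\, d\theta \\
&= -\int_0^{2\pi} \int_0^1 \frac{GMm\, r}{(r^2 + 1)^{3/2}}\, dr\, d\theta \\
&= \int_0^{2\pi} \left[ \frac{GMm}{\sqrt{1 + r^2}} \right]_{r=0}^{r=1} d\theta \quad (\text{by inspection}) \\
&= \int_0^{2\pi} GMm \left( \frac{1}{\sqrt{2}} - 1 \right) d\theta \\
&= 2\pi\, GMm \left( \frac{1}{\sqrt{2}} - 1 \right).
\end{aligned}
\]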
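The step marked “by inspection” can be double-checked numerically. In the sketch below we set GMm = 1 (an arbitrary normalization of ours) and approximate the r-integral by a midpoint sum; the θ-integral only contributes a factor 2π because the integrand does not depend on θ.

```python
import math

def flux_example_4_15(GMm=1.0, n=2000):
    """Midpoint-rule value of the flux integral of -GMm r / (r^2 + 1)^(3/2) over the unit disk."""
    dr = 1.0 / n
    radial = 0.0
    for i in range(n):
        r = (i + 0.5) * dr
        radial += -GMm * r / (r * r + 1.0) ** 1.5 * dr
    return 2 * math.pi * radial            # integrand is independent of theta

approx = flux_example_4_15()
exact = 2 * math.pi * (1 / math.sqrt(2) - 1)

print(approx, exact)
```

The flux is negative, as expected: the field points towards the origin, i.e. downward through the disk, while n̂ was chosen upward.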

Figure 4.15: F and S in Example 4.15

Figure 4.16: F and S in Example 4.16



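■ Example 4.16 Let F = xi − yj. Find the upward flux of F over S, which is the upper part of
the sphere with radius √2 centered at the origin cut out by the plane z = 1.

■ Solution Since the surface is spherical, it is usually best to use spherical coordinates to
parametrize it. Under spherical coordinates, the sphere is represented by ρ = √2, so we use
θ and ϕ for the parameters. Let
\[ \mathbf{r}(\theta, \phi) = \left\langle \sqrt{2}\sin\phi\cos\theta,\ \sqrt{2}\sin\phi\sin\theta,\ \sqrt{2}\cos\phi \right\rangle. \]

The plane z = 1 meets the sphere where √2 cos ϕ = 1, i.e. ϕ = π/4, so the domain of the
parameters is 0 ≤ θ ≤ 2π and 0 ≤ ϕ ≤ π/4.

As in the previous example, we first compute all the ingredients. Since they are all
straightforward computations, some details will be omitted here.

\[
\begin{aligned}
\frac{\partial \mathbf{r}}{\partial \theta} &= \left\langle -\sqrt{2}\sin\phi\sin\theta,\ \sqrt{2}\sin\phi\cos\theta,\ 0 \right\rangle \\
\frac{\partial \mathbf{r}}{\partial \phi} &= \left\langle \sqrt{2}\cos\phi\cos\theta,\ \sqrt{2}\cos\phi\sin\theta,\ -\sqrt{2}\sin\phi \right\rangle \\
\frac{\partial \mathbf{r}}{\partial \theta} \times \frac{\partial \mathbf{r}}{\partial \phi} &= \left\langle -2\sin^2\phi\cos\theta,\ -2\sin^2\phi\sin\theta,\ -2\cos\phi\sin\phi \right\rangle \\
\mathbf{F} = x\,\mathbf{i} - y\,\mathbf{j} &= \left\langle \sqrt{2}\sin\phi\cos\theta,\ -\sqrt{2}\sin\phi\sin\theta,\ 0 \right\rangle \\
\mathbf{F} \cdot \left( \frac{\partial \mathbf{r}}{\partial \theta} \times \frac{\partial \mathbf{r}}{\partial \phi} \right) &= -2\sqrt{2}\sin^3\phi\,(\cos^2\theta - \sin^2\theta) = -2\sqrt{2}\cos 2\theta\,\sin^3\phi.
\end{aligned}
\]
Note that ∂r/∂θ × ∂r/∂ϕ obtained above is a downward normal since the k-component is negative.
The upward flux is given by:
\[
\begin{aligned}
\iint_S \mathbf{F} \cdot \hat{\mathbf{n}}\, dS &= \int_0^{2\pi} \int_0^{\pi/4} \mathbf{F} \cdot \left( -\frac{\partial \mathbf{r}}{\partial \theta} \times \frac{\partial \mathbf{r}}{\partial \phi} \right) d\phi\, d\theta \\
&= \int_0^{2\pi} \int_0^{\pi/4} 2\sqrt{2}\cos 2\theta\,\sin^3\phi \, d\phi\, d\theta \\
&= 2\sqrt{2} \underbrace{\left( \int_0^{2\pi} \cos 2\theta \, d\theta \right)}_{=0} \left( \int_0^{\pi/4} \sin^3\phi \, d\phi \right) = 0.
\end{aligned}
\]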
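The ingredients of this example are easy to get wrong by hand, so a numerical cross-check is worthwhile. The sketch below, under our own naming and grid choices, evaluates F · (∂r/∂θ × ∂r/∂ϕ) on the sphere of radius √2 at a sample point, compares it with −2√2 cos 2θ sin³ϕ, and then approximates the upward flux over 0 ≤ θ ≤ 2π, 0 ≤ ϕ ≤ π/4 by a midpoint sum.

```python
import math

R = math.sqrt(2.0)   # radius of the sphere

def normal_theta_phi(theta, phi):
    """r_theta x r_phi on the sphere of radius sqrt(2); its k-component is negative here."""
    return (-R * R * math.sin(phi) ** 2 * math.cos(theta),
            -R * R * math.sin(phi) ** 2 * math.sin(theta),
            -R * R * math.sin(phi) * math.cos(phi))

def field_on_sphere(theta, phi):
    """F = x i - y j evaluated at r(theta, phi)."""
    x = R * math.sin(phi) * math.cos(theta)
    y = R * math.sin(phi) * math.sin(theta)
    return (x, -y, 0.0)

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def upward_flux(n_theta=200, n_phi=100):
    """Midpoint-rule approximation of the upward flux (note the minus sign)."""
    d_theta = 2 * math.pi / n_theta
    d_phi = (math.pi / 4) / n_phi
    total = 0.0
    for i in range(n_theta):
        theta = (i + 0.5) * d_theta
        for j in range(n_phi):
            phi = (j + 0.5) * d_phi
            total += -dot(field_on_sphere(theta, phi),
                          normal_theta_phi(theta, phi)) * d_theta * d_phi
    return total

theta0, phi0 = 0.4, 0.6          # arbitrary sample point
lhs = dot(field_on_sphere(theta0, phi0), normal_theta_phi(theta0, phi0))
rhs = -2 * math.sqrt(2.0) * math.cos(2 * theta0) * math.sin(phi0) ** 3
flux_value = upward_flux()

print(lhs, rhs, flux_value)
```

The vanishing flux reflects a symmetry: F = xi − yj pushes outward along the x-direction exactly as much as it pulls inward along the y-direction.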

Physical Interpretations of Surface Flux


Generally speaking, the flux of a vector field F through a surface S measures the net “amount”
of vectors F passing through S along a chosen direction of normal vector n̂. The unit of the
“amount” depends on the physical meaning of the vector field F.
For instance, if u is the velocity vector field of a fluid (in m s⁻¹), and the surface area of S has
unit m², then the surface flux
\[ \iint_S \mathbf{u} \cdot \hat{\mathbf{n}}\, dS \]
has unit m³ s⁻¹ and so it measures the net volume of fluid passing through the surface S per unit time in the direction
of n̂. If the flux is positive, then there is more fluid flowing along the direction n̂ than against
it. On the other hand, if the flux is negative, there is more fluid flowing against n̂ than along it.
If S is a closed surface (such as a sphere, a cube, a torus), it is a convention to take n̂ as the
outward unit normal. The surface flux
\[ \oiint_S \mathbf{u} \cdot \hat{\mathbf{n}}\, dS \]
over this closed surface measures the net volume of fluid flowing per unit time in the direction of n̂ through
the surface, or in other words, the net rate at which fluid flows out from the region D enclosed by
S.
If one denotes ϱ as the uniform density of the fluid, then
\[ \oiint_S \varrho\, \mathbf{u} \cdot \hat{\mathbf{n}}\, dS \]
measures the net mass of the fluid flowing out from the enclosed region D through its boundary
S per unit time. Assuming there is no sink or source inside D, by the conservation of mass, the
rate of change of the total mass of fluid enclosed by S is related to the surface flux by:
\[ \frac{\partial}{\partial t} \underbrace{\iiint_D \varrho \, dV}_{\text{total mass in } D} = -\oiint_S \varrho\, \mathbf{u} \cdot \hat{\mathbf{n}}\, dS. \]

Later we can apply the Divergence Theorem on the above relation to derive an important
equation for fluid flow.
Now suppose the vector field J represents the rate of heat transfer per unit area (in joules per
second per square metre). Note that while energy is a scalar, the transfer of heat at different points may have
different directions and so it is a vector quantity. Again, take S to be a closed surface enclosing
a solid region D, then the surface flux of J through S:
\[ \oiint_S \mathbf{J} \cdot \hat{\mathbf{n}}\, dS \]
measures the amount of heat energy flowing out from the region D through S per unit time. This flux
integral is commonly called the heat flux through S by physicists.
If E is an electric field and, again, S is a closed surface, then the flux integral
\[ \oiint_S \mathbf{E} \cdot \hat{\mathbf{n}}\, dS \]
is commonly called the electric flux through S. Gauss’s Law states that this flux integral is
proportional to the total charge enclosed by the surface. If B is a magnetic field
and, again, S is a closed surface, then the flux integral
\[ \oiint_S \mathbf{B} \cdot \hat{\mathbf{n}}\, dS \]
is commonly called the magnetic flux through S. Gauss’s Law for Magnetism asserts that it
must be zero.

4.6 Stokes’ Theorem


4.6.1 Stokes’ Theorem for Simply-Connected Surfaces
Recall that the Green’s Theorem relates the line integral along a closed curve to a certain double
integral over the region enclosed by the curve. The Stokes’ Theorem is its generalization to
three dimensions, which allows the region to be a curved surface in R3 .
Theorem 4.6 — Stokes’ Theorem. Let S be an orientable, simply-connected surface in R3 , and
C be the boundary curve of the surface S. Suppose F is a vector field which is defined and is
C1 on the surface S, then we have:
\[ \underbrace{\oint_C \mathbf{F} \cdot d\mathbf{r}}_{\text{line integral}} = \underbrace{\iint_S (\nabla \times \mathbf{F}) \cdot \hat{\mathbf{n}}\, dS}_{\text{surface integral}} \]

where n̂ is the unit normal vector to S, with direction determined by the right-hand rule
(see Figure 4.17).

Figure 4.17: right-hand rule

i By comparing the statements of the Green’s and Stokes’ Theorems, one can easily see that
the Green’s Theorem is a special case of the Stokes’ Theorem, in the sense that the former
applies to plane curves and the flat regions they enclose on the xy-plane. The unit
normal vector n̂ for the plane region is given by k if the plane curve is traversed
in the counter-clockwise orientation.

i For closed curves in the three-dimensional space, one cannot say whether they are counter-
clockwise or clockwise as it depends on the direction of observations. Therefore, the
counter-clockwise convention of the Green’s Theorem is generalized to the right-hand rule
condition in the statement of the Stokes’ Theorem.

i The Stokes’ Theorem applies only to orientable surfaces. That is, it may not hold for
surfaces such as the Möbius strip. Also, the condition that the vector field F is
defined and C1 on the surface S is crucial. However, we will mostly deal with vector
fields that satisfy this condition.

i The condition that S has to be simply-connected is also crucial, but we will later learn how
to modify the Stokes’ Theorem so as to allow non-simply-connected surface S.

The proof of the Stokes’ Theorem is omitted here. Interested readers may consult a reference
textbook for a proof of one special case. Let’s look at some examples.

■ Example 4.17 Let S be the hemisphere x² + y² + z² = 4, z ≥ 0 above the xy-plane,
and C be its boundary curve oriented counter-clockwise on the xy-plane. Given F =
(z − y)i + xj − xk, determine the line integral
\[ \oint_C \mathbf{F} \cdot d\mathbf{r} \]

using the Stokes’ Theorem.

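■ Solution The Stokes’ Theorem asserts that:

\[ \oint_C \mathbf{F} \cdot d\mathbf{r} = \iint_S (\nabla \times \mathbf{F}) \cdot \hat{\mathbf{n}}\, dS. \]

Since the RHS is a surface integral, we need to parametrize the surface first in order to compute it.
Using spherical coordinates, the parametrization of S is given by:

r(θ, ϕ) = (2 sin ϕ cos θ)i + (2 sin ϕ sin θ)j + (2 cos ϕ)k

where 0 ≤ θ ≤ 2π and 0 ≤ ϕ ≤ π/2. Then,

\[
\begin{aligned}
\frac{\partial \mathbf{r}}{\partial \theta} &= (-2\sin\phi\sin\theta)\,\mathbf{i} + (2\sin\phi\cos\theta)\,\mathbf{j} + 0\,\mathbf{k} \\
\frac{\partial \mathbf{r}}{\partial \phi} &= (2\cos\phi\cos\theta)\,\mathbf{i} + (2\cos\phi\sin\theta)\,\mathbf{j} + (-2\sin\phi)\,\mathbf{k} \\
\frac{\partial \mathbf{r}}{\partial \theta} \times \frac{\partial \mathbf{r}}{\partial \phi} &= (-4\sin^2\phi\cos\theta)\,\mathbf{i} + (-4\sin^2\phi\sin\theta)\,\mathbf{j} + (-4\sin\phi\cos\phi)\,\mathbf{k} \\
\frac{\partial \mathbf{r}}{\partial \phi} \times \frac{\partial \mathbf{r}}{\partial \theta} &= (4\sin^2\phi\cos\theta)\,\mathbf{i} + (4\sin^2\phi\sin\theta)\,\mathbf{j} + (4\sin\phi\cos\phi)\,\mathbf{k}
\end{aligned}
\]
Note that ∂r/∂θ × ∂r/∂ϕ is pointing downward, whereas the right-hand rule for the counter-clockwise
curve C requires the upward normal, so we use ∂r/∂ϕ × ∂r/∂θ instead.
Next we need to compute ∇ × F:

\[
\nabla \times \mathbf{F} =
\begin{vmatrix}
\mathbf{i} & \mathbf{j} & \mathbf{k} \\
\frac{\partial}{\partial x} & \frac{\partial}{\partial y} & \frac{\partial}{\partial z} \\
z - y & x & -x
\end{vmatrix}
= 2\mathbf{j} + 2\mathbf{k}.
\]

Therefore, using the Stokes’ Theorem, we have

\[
\begin{aligned}
\oint_C \mathbf{F} \cdot d\mathbf{r} &= \iint_S (\nabla \times \mathbf{F}) \cdot \hat{\mathbf{n}}\, dS = \iint_S (2\mathbf{j} + 2\mathbf{k}) \cdot \hat{\mathbf{n}}\, dS \\
&= \int_0^{\pi/2} \int_0^{2\pi} (2\mathbf{j} + 2\mathbf{k}) \cdot \left( \frac{\partial \mathbf{r}}{\partial \phi} \times \frac{\partial \mathbf{r}}{\partial \theta} \right) d\theta\, d\phi \quad (\text{using Theorem 4.5}) \\
&= \int_0^{\pi/2} \int_0^{2\pi} \left( 8\sin^2\phi\sin\theta + 8\sin\phi\cos\phi \right) d\theta\, d\phi \\
&= 8 \left( \int_0^{\pi/2} \sin^2\phi \, d\phi \right) \underbrace{\left( \int_0^{2\pi} \sin\theta \, d\theta \right)}_{=0} + 2\pi \int_0^{\pi/2} 4\sin 2\phi \, d\phi \\
&= 8\pi \left[ -\frac{1}{2}\cos 2\phi \right]_0^{\pi/2} = 8\pi.
\end{aligned}
\]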

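i Although the Stokes’ Theorem was used in the previous example (as required by the
problem), it is actually easier to compute the line integral directly by parametrizing the
curve:
Let r(t) = (2 cos t)i + (2 sin t)j + 0k where 0 ≤ t ≤ 2π. Then

\[
\begin{aligned}
\oint_C \mathbf{F} \cdot d\mathbf{r} &= \int_0^{2\pi} \mathbf{F} \cdot \mathbf{r}'(t)\, dt \\
&= \int_0^{2\pi} \bigl( (z - y)\,\mathbf{i} + x\,\mathbf{j} - x\,\mathbf{k} \bigr) \cdot \bigl( (-2\sin t)\,\mathbf{i} + (2\cos t)\,\mathbf{j} + 0\,\mathbf{k} \bigr)\, dt \\
&= \int_0^{2\pi} \bigl( -2(z - y)\sin t + 2x\cos t \bigr)\, dt \\
&= \int_0^{2\pi} \bigl( -2(-2\sin t)\sin t + 2(2\cos t)\cos t \bigr)\, dt \\
&= \int_0^{2\pi} 4(\sin^2 t + \cos^2 t)\, dt = 8\pi.
\end{aligned}
\]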
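Both sides of the Stokes’ Theorem in Example 4.17 can also be approximated numerically and compared. The sketch below is an illustration under our own assumptions: it hard-codes the curl 2j + 2k and the upward normal ∂r/∂ϕ × ∂r/∂θ of the hemisphere of radius 2 as computed above, and uses midpoint sums with grid sizes picked by us.

```python
import math

def surface_side(n_phi=200, n_theta=200):
    """Midpoint sum for the integral of (curl F) . (r_phi x r_theta) over the hemisphere."""
    d_phi = (math.pi / 2) / n_phi
    d_theta = 2 * math.pi / n_theta
    curl = (0.0, 2.0, 2.0)                       # curl of F = (z - y, x, -x)
    total = 0.0
    for i in range(n_phi):
        phi = (i + 0.5) * d_phi
        for j in range(n_theta):
            theta = (j + 0.5) * d_theta
            upward = (4 * math.sin(phi) ** 2 * math.cos(theta),
                      4 * math.sin(phi) ** 2 * math.sin(theta),
                      4 * math.sin(phi) * math.cos(phi))
            total += sum(c * n for c, n in zip(curl, upward)) * d_theta * d_phi
    return total

def line_side(n=400):
    """Midpoint sum for the line integral of F along the circle of radius 2 in the xy-plane."""
    dt = 2 * math.pi / n
    total = 0.0
    for k in range(n):
        t = (k + 0.5) * dt
        x, y, z = 2 * math.cos(t), 2 * math.sin(t), 0.0
        F = (z - y, x, -x)
        velocity = (-2 * math.sin(t), 2 * math.cos(t), 0.0)   # r'(t), counter-clockwise
        total += sum(f * v for f, v in zip(F, velocity)) * dt
    return total

surface_value = surface_side()
line_value = line_side()

print(surface_value, line_value, 8 * math.pi)
```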

Figure 4.18: the sphere and the plane in Example 4.18

The purpose of the previous example is simply to illustrate how to use the Stokes’ Theorem,
although it is not necessary to use it. The line integral in the next example, however, would be
extremely difficult to compute without the Stokes’ Theorem.

■ Example 4.18 Let C be the curve of intersection of the plane Ax + By + Cz = 0 and the
sphere x² + y² + z² = a². See Figure 4.18. Show that
\[ \oint_C y\,dx + z\,dy + x\,dz = \pm \frac{\pi a^2 (A + B + C)}{\sqrt{A^2 + B^2 + C^2}} \]
where ± is determined by the orientation of C.

■ Solution The plane Ax + By + Cz = 0 passes through the origin. Therefore, the curve C is
a great circle on the sphere. However, this great circle is neither horizontal nor vertical, so it is
difficult to parametrize C to compute the line integral.
Let’s use the Stokes’ Theorem to see if there is any luck! When using the Stokes’ Theorem,
one needs to pick a surface S whose boundary curve is C. There are three choices:
1. the disk region enclosed by C on the plane
2. the hemisphere above the plane
3. the hemisphere below the plane
All of the above choices should give the same answer. However, let’s pick the region of the
plane enclosed by C to be the surface S and we will explain why it is the smartest choice
among all three.

Now the given line integral is associated to the vector field F = yi + zj + xk, i.e.
\[ \oint_C y\,dx + z\,dy + x\,dz = \oint_C \mathbf{F} \cdot d\mathbf{r}. \]

In order to apply the Stokes’ Theorem, we need to compute the curl:

\[
\nabla \times \mathbf{F} =
\begin{vmatrix}
\mathbf{i} & \mathbf{j} & \mathbf{k} \\
\frac{\partial}{\partial x} & \frac{\partial}{\partial y} & \frac{\partial}{\partial z} \\
y & z & x
\end{vmatrix}
= -\mathbf{i} - \mathbf{j} - \mathbf{k}.
\]

We also need the unit normal vector n̂. Since the region S lies on the plane whose equation is
Ax + By + Cz = 0, the unit normal is a constant vector given by:

\[ \hat{\mathbf{n}} = \pm \frac{A\,\mathbf{i} + B\,\mathbf{j} + C\,\mathbf{k}}{\sqrt{A^2 + B^2 + C^2}} \]
where ± is determined by the orientation of C.
Therefore, absorbing the overall sign into ±,
\[ (\nabla \times \mathbf{F}) \cdot \hat{\mathbf{n}} = \pm\, (-\mathbf{i} - \mathbf{j} - \mathbf{k}) \cdot \frac{A\,\mathbf{i} + B\,\mathbf{j} + C\,\mathbf{k}}{\sqrt{A^2 + B^2 + C^2}} = \pm \frac{A + B + C}{\sqrt{A^2 + B^2 + C^2}}. \]
Next, we apply the Stokes’ Theorem on these C and S:
\[
\begin{aligned}
\oint_C \mathbf{F} \cdot d\mathbf{r} &= \iint_S (\nabla \times \mathbf{F}) \cdot \hat{\mathbf{n}}\, dS \\
&= \pm \iint_S \frac{A + B + C}{\sqrt{A^2 + B^2 + C^2}}\, dS \\
&= \pm \frac{A + B + C}{\sqrt{A^2 + B^2 + C^2}} \underbrace{\iint_S dS}_{\text{surface area}} \\
&= \pm \frac{\pi a^2 (A + B + C)}{\sqrt{A^2 + B^2 + C^2}}.
\end{aligned}
\]

Note that the surface area of the region S on the plane is πa², since its boundary is a circle
with radius a.
There are two major reasons why the part of the plane enclosed by C is a smarter choice
for S than the hemispheres. For one thing, both ∇ × F and n̂ are constant vector fields if
S is chosen to be a planar region, so that computing the surface integral is very easy – no
parametrization! For another, if either of the hemispheres were chosen to be S, then the surface
integral would need to be computed by parametrization – which can be tedious. It is also very
difficult to determine the range of values of ϕ and θ since the plane cutting the sphere is not
a horizontal one.
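Although parametrizing the tilted great circle by hand is awkward, a computer can do it easily, which gives an independent check of the formula. The sketch below is our own construction, not the method of the notes: it builds an orthonormal basis e1, e2 of the plane with e1 × e2 equal to the unit normal in the direction of Ai + Bj + Ck, traverses the circle a(cos t e1 + sin t e2), and sums y dx + z dy + x dz. With this orientation the right-hand rule gives the sign −, so the expected value is −πa²(A + B + C)/√(A² + B² + C²). The test values A, B, C, a are arbitrary.

```python
import math

def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def normalize(a):
    length = math.sqrt(sum(c * c for c in a))
    return tuple(c / length for c in a)

def circulation(A, B, C, a, n=2000):
    """Line integral of y dx + z dy + x dz around the great circle in the plane Ax+By+Cz=0."""
    n_hat = normalize((A, B, C))
    helper = (1.0, 0.0, 0.0) if abs(n_hat[0]) < 0.9 else (0.0, 1.0, 0.0)
    e1 = normalize(cross(n_hat, helper))     # unit vector in the plane
    e2 = cross(n_hat, e1)                    # second unit vector; e1 x e2 = n_hat
    dt = 2 * math.pi / n
    total = 0.0
    for k in range(n):
        t = (k + 0.5) * dt
        c, s = math.cos(t), math.sin(t)
        x, y, z = (a * (c * e1[i] + s * e2[i]) for i in range(3))
        dx, dy, dz = (a * (-s * e1[i] + c * e2[i]) for i in range(3))
        total += (y * dx + z * dy + x * dz) * dt
    return total

A, B, C, a = 1.0, 2.0, 3.0, 2.0
numeric = circulation(A, B, C, a)
predicted = -math.pi * a ** 2 * (A + B + C) / math.sqrt(A ** 2 + B ** 2 + C ** 2)

print(numeric, predicted)
```

Reversing the direction of travel (for example by swapping e1 and e2) flips the sign, which is the ± in the statement of the example.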
Occasionally, the Stokes’ Theorem can be applied to evaluate a surface integral over an
arbitrary or complicated surface which is not easy to parametrize. If the given vector field
F can be expressed in the form F = ∇ × G for another vector field G, then the Stokes’ Theorem,
applied backward, asserts that:
\[ \iint_S \mathbf{F} \cdot \hat{\mathbf{n}}\, dS = \iint_S (\nabla \times \mathbf{G}) \cdot \hat{\mathbf{n}}\, dS = \oint_C \mathbf{G} \cdot d\mathbf{r} \]

where C is the boundary curve of the surface S. Very often, the line integral is easier to compute
than the surface integral.

i Note that the above discussion holds only when the given vector field F is of the form
F = ∇ × G. If F is not of this form, there is no easy way to apply the Stokes’
Theorem backward.

Figure 4.19: the curve and surfaces in Example 4.19

■ Example 4.19 Let C be an arbitrary simple closed curve on the xy-plane in the xyz-space,
and S be an arbitrary surface above the xy-plane with boundary curve C. See Figure 4.19.
1. Verify that i = ∇ × (−(z/2) j + (y/2) k).
2. Show that:
\[ \iint_S \mathbf{i} \cdot \hat{\mathbf{n}}\, dS = 0. \]
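■ Solution Part (1) is straightforward:

\[
\begin{aligned}
\nabla \times \left( -\frac{z}{2}\,\mathbf{j} + \frac{y}{2}\,\mathbf{k} \right) &=
\begin{vmatrix}
\mathbf{i} & \mathbf{j} & \mathbf{k} \\
\frac{\partial}{\partial x} & \frac{\partial}{\partial y} & \frac{\partial}{\partial z} \\
0 & -\frac{z}{2} & \frac{y}{2}
\end{vmatrix} \\
&= \left( \frac{\partial}{\partial y}\left(\frac{y}{2}\right) - \frac{\partial}{\partial z}\left(-\frac{z}{2}\right) \right) \mathbf{i} - 0\,\mathbf{j} + 0\,\mathbf{k} \\
&= \mathbf{i}.
\end{aligned}
\]
For part (2), we denote G = −(z/2) j + (y/2) k for simplicity. Then i = ∇ × G. Applying the
Stokes’ Theorem backward, we get:
\[
\begin{aligned}
\iint_S \mathbf{i} \cdot \hat{\mathbf{n}}\, dS &= \iint_S (\nabla \times \mathbf{G}) \cdot \hat{\mathbf{n}}\, dS \\
&= \oint_C \mathbf{G} \cdot d\mathbf{r}.
\end{aligned}
\]

Since the curve C is arbitrary in nature, there is no way to parametrize C. However, it is
given that C is on the xy-plane! Therefore, one can use the Green’s Theorem to evaluate the
above line integral. Denote R to be the region on the xy-plane enclosed by the curve C, then
the Green’s Theorem asserts that:
\[
\begin{aligned}
\oint_C \mathbf{G} \cdot d\mathbf{r} &= \iint_R (\nabla \times \mathbf{G}) \cdot \mathbf{k}\, dA \\
&= \iint_R \mathbf{i} \cdot \mathbf{k}\, dA = \iint_R 0 \, dA = 0.
\end{aligned}
\]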

4.6.2 Significance of the Stokes’ Theorem


Interpretation of Curl
Using the Stokes’ Theorem, one can give a geometric interpretation of ∇ × F. Consider a tiny
surface S with boundary curve C. Denote n̂ to be the unit normal vector of S with direction
determined by the right-hand rule. Since the surface S is very small, one can regard the
quantity (∇ × F) · n̂ as nearly constant over the surface S. By the Stokes’ Theorem, we have:
\[ \oint_C \mathbf{F} \cdot d\mathbf{r} = \iint_S (\nabla \times \mathbf{F}) \cdot \hat{\mathbf{n}}\, dS \simeq (\nabla \times \mathbf{F}) \cdot \hat{\mathbf{n}} \left( \iint_S dS \right). \]

Therefore,
\[ (\nabla \times \mathbf{F}) \cdot \hat{\mathbf{n}} \simeq \frac{\displaystyle\oint_C \mathbf{F} \cdot d\mathbf{r}}{\text{Surface area of } S}. \]

This quantity is large when the vector field F circulates about the normal vector n̂. In other
words, the quantity (∇ × F) · n̂ measures the circulation density around any given point. That’s
why ∇ × F is often called the curl of F.
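The “circulation density” interpretation can be tested numerically: compute the line integral of F around a small circle with normal n̂, divide by the area of the disk, and compare with (∇ × F) · n̂ at the centre. The sketch below uses a sample field F = yz i + x² j + xy k of our own choosing, whose curl is xi + 0j + (2x − z)k, together with a hypothetical helper `circulation_density`; the radius eps and the number of sample points are arbitrary.

```python
import math

def field(x, y, z):
    """Sample field F = (yz, x^2, xy); its curl is (x, 0, 2x - z)."""
    return (y * z, x * x, x * y)

def circulation_density(F, p, e1, e2, eps=1e-2, n=400):
    """Circulation of F around a circle of radius eps centred at p, divided by the disk area.

    The circle lies in the plane spanned by the orthonormal vectors e1 and e2 and is
    traversed from e1 towards e2, so the right-hand rule gives the normal e1 x e2.
    """
    dt = 2 * math.pi / n
    total = 0.0
    for k in range(n):
        t = (k + 0.5) * dt
        c, s = math.cos(t), math.sin(t)
        position = [p[i] + eps * (c * e1[i] + s * e2[i]) for i in range(3)]
        velocity = [eps * (-s * e1[i] + c * e2[i]) for i in range(3)]
        value = F(*position)
        total += sum(value[i] * velocity[i] for i in range(3)) * dt
    return total / (math.pi * eps ** 2)

p = (1.0, 2.0, 0.5)                                   # arbitrary base point
curl_at_p = (p[0], 0.0, 2 * p[0] - p[2])              # (1, 0, 1.5)

about_k = circulation_density(field, p, (1, 0, 0), (0, 1, 0))   # normal = k
about_i = circulation_density(field, p, (0, 1, 0), (0, 0, 1))   # normal = i

print(about_k, curl_at_p[2])
print(about_i, curl_at_p[0])
```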

Conservative Vector Fields


Recall that a vector field F is conservative if there exists a scalar function f such that F = ∇f.
If F is defined and C1 everywhere (or on a simply-connected domain), then the curl test asserts
that F is conservative if and only if ∇ × F = 0. Using the Stokes’ Theorem, if F is conservative,
then:
\[ \oint_C \mathbf{F} \cdot d\mathbf{r} = \iint_S \underbrace{(\nabla \times \mathbf{F})}_{=\mathbf{0}} \cdot \hat{\mathbf{n}}\, dS = 0, \]

which recovers the result we proved before in Theorem 4.1. Of course, the Stokes’ Theorem
asserts more than Theorem 4.1 does because it applies to any vector field, not just the conservative
ones.

4.6.3 Surfaces with Multiple Boundaries


When applying the Stokes’ Theorem on surfaces with multiple boundaries (i.e. not simply-
connected), such as the one in Figure 4.20, one needs to be very careful when dealing with the
inner boundary.
The form of the Stokes’ Theorem, as stated in Theorem 4.6, applies only to simply-connected
regions. However, using a similar technique illustrated in the subsection about the winding
number, one can extend the Stokes’ Theorem so that it can be applied to surfaces with holes as
well. Take the region in Figure 4.20 as an example. One can subdivide the surface by cutting
along arcs that connect the outer boundary and the inner boundaries.

Figure 4.20: apply Stokes’ Theorem for surfaces with holes

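Each of the sub-surfaces S1 , S2 and S3 is now simply-connected, and so the Stokes’
Theorem applies to each sub-surface:

\[
\begin{aligned}
\underbrace{\left( \int_{C_1} + \int_{L_1} - \int_{\Gamma_1^-} + \int_{L_2} \right)}_{\text{form the boundary of } S_1} \mathbf{F} \cdot d\mathbf{r} &= \iint_{S_1} (\nabla \times \mathbf{F}) \cdot \hat{\mathbf{n}}\, dS \\
\underbrace{\left( \int_{C_2} + \int_{L_3} - \int_{\Gamma_2^-} + \int_{L_4} + \int_{C_4} - \int_{L_2} - \int_{\Gamma_1^+} - \int_{L_1} \right)}_{\text{form the boundary of } S_2} \mathbf{F} \cdot d\mathbf{r} &= \iint_{S_2} (\nabla \times \mathbf{F}) \cdot \hat{\mathbf{n}}\, dS \\
\underbrace{\left( \int_{C_3} - \int_{L_4} - \int_{\Gamma_2^+} - \int_{L_3} \right)}_{\text{form the boundary of } S_3} \mathbf{F} \cdot d\mathbf{r} &= \iint_{S_3} (\nabla \times \mathbf{F}) \cdot \hat{\mathbf{n}}\, dS.
\end{aligned}
\]

Summing up all three equations and cancelling out the Li terms, we get:
\[
\left( \int_{C_1} + \int_{C_2} + \int_{C_3} + \int_{C_4} - \int_{\Gamma_1^-} - \int_{\Gamma_1^+} - \int_{\Gamma_2^-} - \int_{\Gamma_2^+} \right) \mathbf{F} \cdot d\mathbf{r}
= \left( \iint_{S_1} + \iint_{S_2} + \iint_{S_3} \right) (\nabla \times \mathbf{F}) \cdot \hat{\mathbf{n}}\, dS
\]

Combining all Ci ’s, Γi ’s and Si ’s, we obtain:

\[ \oint_C \mathbf{F} \cdot d\mathbf{r} - \oint_{\Gamma_1} \mathbf{F} \cdot d\mathbf{r} - \oint_{\Gamma_2} \mathbf{F} \cdot d\mathbf{r} = \iint_S (\nabla \times \mathbf{F}) \cdot \hat{\mathbf{n}}\, dS. \]

While the surface integral (RHS) is in the same form as in the usual Stokes’ Theorem, the LHS
is not simply the sum of the boundary line integrals: the outer boundary has a plus sign
in front, and each inner boundary has a minus sign in front.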
For more complicated surfaces (with many, but finitely many, holes), one can apply the
technique illustrated above to establish:
Theorem 4.7 — Stokes’ Theorem for Higher Genusᵃ Surfaces. Let S be an orientable surface
in R3 with n holes. Denote C to be its outer boundary, and Γ1 , Γ2 , . . . , Γn to be its inner
boundaries. Suppose F is a vector field which is defined and C1 on the surface S, then:
\[ \oint_C \mathbf{F} \cdot d\mathbf{r} - \sum_{i=1}^{n} \oint_{\Gamma_i} \mathbf{F} \cdot d\mathbf{r} = \iint_S (\nabla \times \mathbf{F}) \cdot \hat{\mathbf{n}}\, dS. \]

Here n̂ is the unit normal vector to S with orientation determined by the right-hand rule
applied to the outer boundary C.
ᵃ The word genus is the mathematical term for the number of “holes” inside the surface.

4.7 Divergence Theorem


The Green’s and Stokes’ Theorems relate the line integral of a vector field over a simple closed
curve with a double/surface integral of the curl of the vector field. In this section, we are going
to learn the Divergence Theorem, which relates the surface integral over a closed surface with
a triple integral over the solid region enclosed by the surface.

4.7.1 Divergence Operator


In order to state the Divergence Theorem, we need to define:
Definition 4.8 — Divergence Operator. Given a C1 vector field F in R3 whose components in
rectangular coordinates are:
F = Fx i + Fy j + Fz k,
the divergence of F, denoted by ∇ · F, is defined as:

\[ \nabla \cdot \mathbf{F} = \frac{\partial F_x}{\partial x} + \frac{\partial F_y}{\partial y} + \frac{\partial F_z}{\partial z}. \]

i We will see the geometric interpretation of ∇ · F after we state the Divergence Theorem.
Essentially, it measures how diverging the vector field is.

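i The symbol ∇ can be regarded as (∂/∂x) i + (∂/∂y) j + (∂/∂z) k so that we can regard the divergence of
F as a dot product:
\[ \nabla \cdot \mathbf{F} = \left( \frac{\partial}{\partial x}\,\mathbf{i} + \frac{\partial}{\partial y}\,\mathbf{j} + \frac{\partial}{\partial z}\,\mathbf{k} \right) \cdot \left( F_x\,\mathbf{i} + F_y\,\mathbf{j} + F_z\,\mathbf{k} \right). \]
However, the “vector” (∂/∂x) i + (∂/∂y) j + (∂/∂z) k is regarded as an operator and has no physical or
geometric meaning. Note that the divergence of F is a scalar function, in contrast to the
curl ∇ × F which is a vector field.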

4.7.2 Divergence Theorem for Solids without Holes

Theorem 4.8 — Divergence Theorem. Let S be a closed orientable surface enclosing a simply-
connected solid region D. Suppose F is a vector field that is defined and C¹ in and near the
region D. Then we have:
\[
\underbrace{\oiint_{S} \mathbf{F}\cdot\hat{\mathbf{n}}\,dS}_{\text{surface integral}} = \underbrace{\iiint_{D} \nabla\cdot\mathbf{F}\,dV}_{\text{triple integral}}.
\]

Here n̂ is the outward normal of S.

The Divergence Theorem is particularly useful for computing the flux over a closed surface,
as the theorem says we do not need to parametrize the surface and compute the normal vector.
Let’s look at some examples.


■ Example 4.20 Consider the vector field F = 3xi + 4yj − 5zk. Let S be the sphere with radius
a centered at the origin. Evaluate the flux integral
\[
\oiint_{S} \mathbf{F}\cdot\hat{\mathbf{n}}\,dS
\]
with outward unit normal n̂.

■ Solution You can imagine the computation would be quite tedious if we computed this
flux integral directly by parametrizing the sphere. However, since S is a closed surface, one
can try to use the Divergence Theorem:

\[
\nabla\cdot\mathbf{F} = \frac{\partial}{\partial x}(3x) + \frac{\partial}{\partial y}(4y) + \frac{\partial}{\partial z}(-5z) = 3 + 4 - 5 = 2.
\]

Denote D to be the solid sphere with radius a centered at the origin, i.e. the solid region
enclosed by S. By the Divergence Theorem, we have:
\[
\oiint_{S} \mathbf{F}\cdot\hat{\mathbf{n}}\,dS = \iiint_{D} \nabla\cdot\mathbf{F}\,dV = \iiint_{D} 2\,dV = 2\cdot\frac{4}{3}\pi a^3 = \frac{8}{3}\pi a^3.
\]
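
As a sanity check on Example 4.20, the flux can also be approximated directly by parametrizing the sphere in spherical coordinates. This is a minimal sketch rather than part of the notes: the radius a = 1.5, the grid sizes and the function name `sphere_flux` are assumptions made for illustration.

```python
import math

def sphere_flux(F, a, n_phi=200, n_theta=200):
    """Midpoint-rule approximation of the outward flux of F through the sphere of radius a."""
    total = 0.0
    dphi = math.pi / n_phi
    dtheta = 2 * math.pi / n_theta
    for i in range(n_phi):
        phi = (i + 0.5) * dphi
        for j in range(n_theta):
            theta = (j + 0.5) * dtheta
            # the outward unit normal on the sphere is the unit radial vector
            nx = math.sin(phi) * math.cos(theta)
            ny = math.sin(phi) * math.sin(theta)
            nz = math.cos(phi)
            fx, fy, fz = F(a * nx, a * ny, a * nz)
            # surface element: dS = a^2 sin(phi) dphi dtheta
            total += (fx * nx + fy * ny + fz * nz) * a * a * math.sin(phi) * dphi * dtheta
    return total

def F(x, y, z):
    # the field of Example 4.20
    return (3 * x, 4 * y, -5 * z)

a = 1.5  # sample radius (assumed)
direct = sphere_flux(F, a)
via_divergence = 2 * (4 / 3) * math.pi * a**3  # (8/3) * pi * a^3
print(direct, via_divergence)
```

The direct computation needs the normal vector and the surface element at every sample point, whereas the Divergence Theorem reduces the problem to the volume of a ball.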

■ Example 4.21 Let $\mathbf{F} = x^2\mathbf{i} + 4xyz\,\mathbf{j} + ze^{x}\mathbf{k}$, let D be the rectangular box defined by 0 ≤ x ≤ 3,
0 ≤ y ≤ 2 and 0 ≤ z ≤ 1, and let S be the boundary surface of D (i.e. S is the shell of the
box). Evaluate the flux integral:
\[
\oiint_{S} \mathbf{F}\cdot\hat{\mathbf{n}}\,dS
\]
where n̂ is the outward normal.

■ Solution The closed surface S has six faces! If one attempts to compute the flux integral
directly, one needs to split it into six integrals, corresponding to each of its six faces.
However, if one applies the Divergence Theorem, the difficult surface integral becomes a
triple integral over a rectangular region, which is very easy to set up.
We first compute:

\[
\nabla\cdot\mathbf{F} = \frac{\partial}{\partial x}(x^2) + \frac{\partial}{\partial y}(4xyz) + \frac{\partial}{\partial z}(ze^{x}) = 2x + 4xz + e^{x}.
\]

By the Divergence Theorem, we get:


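\[
\begin{aligned}
\oiint_{S} \mathbf{F}\cdot\hat{\mathbf{n}}\,dS &= \iiint_{D} \nabla\cdot\mathbf{F}\,dV \\
&= \int_{z=0}^{z=1}\int_{y=0}^{y=2}\int_{x=0}^{x=3} (2x + 4xz + e^{x})\,dx\,dy\,dz \\
&= 34 + 2e^{3}.
\end{aligned}
\]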

We omit the computational detail of the triple integral above, which is very straightforward.
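
The value 34 + 2e³ can be cross-checked by summing the flux over the six faces directly. The following sketch is an illustration only: the helper name `box_flux` and the grid size are assumptions, and the midpoint rule is a swapped-in numerical technique rather than the method of the notes.

```python
import math

def F(x, y, z):
    # the field of Example 4.21
    return (x**2, 4 * x * y * z, z * math.exp(x))

def box_flux(F, lo, hi, n=100):
    """Outward flux of F through the six faces of the box lo <= (x, y, z) <= hi."""
    total = 0.0
    for axis in range(3):
        u_axis, v_axis = [k for k in range(3) if k != axis]
        du = (hi[u_axis] - lo[u_axis]) / n
        dv = (hi[v_axis] - lo[v_axis]) / n
        for side, sign in ((lo[axis], -1.0), (hi[axis], 1.0)):
            for i in range(n):
                for j in range(n):
                    p = [0.0, 0.0, 0.0]
                    p[axis] = side
                    p[u_axis] = lo[u_axis] + (i + 0.5) * du
                    p[v_axis] = lo[v_axis] + (j + 0.5) * dv
                    # the outward unit normal on this face is sign * e_axis,
                    # so F . n is sign times the matching component of F
                    total += sign * F(*p)[axis] * du * dv
    return total

six_faces = box_flux(F, (0.0, 0.0, 0.0), (3.0, 2.0, 1.0))
via_divergence = 34 + 2 * math.exp(3)
print(six_faces, via_divergence)
```

Only the faces x = 3, y = 2 and z = 1 contribute here, since the relevant component of F vanishes on the faces x = 0, y = 0 and z = 0.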

One should note that the surface integral stated in the Divergence Theorem is
\[
\oiint_{S} \mathbf{F}\cdot\hat{\mathbf{n}}\,dS
\]
and not the integral of (∇ × F) · n̂. In fact, using the Divergence Theorem, one can show that
\[
\oiint_{S} (\nabla\times\mathbf{F})\cdot\hat{\mathbf{n}}\,dS = 0
\]
for any C² vector field F.



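This is because:
\[
\begin{aligned}
\oiint_{S} \underbrace{(\nabla\times\mathbf{F})}_{=:\mathbf{G}}\cdot\hat{\mathbf{n}}\,dS &= \iiint_{D} \nabla\cdot\mathbf{G}\,dV \\
&= \iiint_{D} \nabla\cdot(\nabla\times\mathbf{F})\,dV.
\end{aligned}
\]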

We next show that ∇ · (∇ × F) = 0 for any C² vector field F.


     
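\[
\begin{aligned}
\nabla\times\mathbf{F} &= \left(\frac{\partial F_z}{\partial y} - \frac{\partial F_y}{\partial z}\right)\mathbf{i} + \left(\frac{\partial F_x}{\partial z} - \frac{\partial F_z}{\partial x}\right)\mathbf{j} + \left(\frac{\partial F_y}{\partial x} - \frac{\partial F_x}{\partial y}\right)\mathbf{k} \\
\nabla\cdot(\nabla\times\mathbf{F}) &= \frac{\partial}{\partial x}\left(\frac{\partial F_z}{\partial y} - \frac{\partial F_y}{\partial z}\right) + \frac{\partial}{\partial y}\left(\frac{\partial F_x}{\partial z} - \frac{\partial F_z}{\partial x}\right) + \frac{\partial}{\partial z}\left(\frac{\partial F_y}{\partial x} - \frac{\partial F_x}{\partial y}\right) \\
&= \frac{\partial^2 F_z}{\partial x\,\partial y} - \frac{\partial^2 F_y}{\partial x\,\partial z} + \frac{\partial^2 F_x}{\partial y\,\partial z} - \frac{\partial^2 F_z}{\partial y\,\partial x} + \frac{\partial^2 F_y}{\partial z\,\partial x} - \frac{\partial^2 F_x}{\partial z\,\partial y}.
\end{aligned}
\]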

By the Mixed Partials Theorem, the second derivatives above cancel in pairs, and so
∇ · (∇ × F) = 0.
Therefore, we get:
\[
\oiint_{S} (\nabla\times\mathbf{F})\cdot\hat{\mathbf{n}}\,dS = 0.
\]
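
The identity ∇ · (∇ × F) = 0 can be illustrated numerically with central differences. This is a hedged sketch: the sample field, the evaluation point and the step size `H` are assumptions chosen for illustration, and finite differences stand in for the exact partial derivatives.

```python
import math

H = 1e-3  # finite-difference step (assumed)

def F(x, y, z):
    # arbitrary smooth sample field (an assumption, not from the notes)
    return (math.sin(y * z) + x**2 * y,
            x * math.exp(z) - y**3,
            math.cos(x * y) + z * x)

def partial(f, axis, p, h=H):
    """Central-difference approximation of the partial derivative of a scalar f along axis."""
    forward = list(p)
    backward = list(p)
    forward[axis] += h
    backward[axis] -= h
    return (f(*forward) - f(*backward)) / (2 * h)

def curl(F):
    """Return a function that approximates the curl of F at a point."""
    Fx = lambda x, y, z: F(x, y, z)[0]
    Fy = lambda x, y, z: F(x, y, z)[1]
    Fz = lambda x, y, z: F(x, y, z)[2]
    def G(x, y, z):
        p = (x, y, z)
        return (partial(Fz, 1, p) - partial(Fy, 2, p),
                partial(Fx, 2, p) - partial(Fz, 0, p),
                partial(Fy, 0, p) - partial(Fx, 1, p))
    return G

def divergence(G, p):
    """Approximate the divergence of the vector field G at the point p."""
    return sum(partial(lambda x, y, z, i=i: G(x, y, z)[i], i, p) for i in range(3))

residual = divergence(curl(F), (0.3, -0.7, 1.1))
print(residual)
```

The residual is not exactly zero because of floating-point rounding, but it should be many orders of magnitude smaller than the individual second derivatives.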

4.7.3 Interpretation of Divergence Operator


Take a tiny solid region D with boundary surface S. The Divergence Theorem applied
to a vector field F asserts that:
\[
\oiint_{S} \mathbf{F}\cdot\hat{\mathbf{n}}\,dS = \iiint_{D} \nabla\cdot\mathbf{F}\,dV.
\]

When the region D is very small, one can regard ∇ · F as nearly constant on D, and so we have:
\[
\oiint_{S} \mathbf{F}\cdot\hat{\mathbf{n}}\,dS \simeq (\nabla\cdot\mathbf{F}) \times \text{volume of } D.
\]

This result gives a geometric interpretation of ∇ · F:



\[
\nabla\cdot\mathbf{F} \simeq \frac{1}{\text{volume of } D}\oiint_{S} \mathbf{F}\cdot\hat{\mathbf{n}}\,dS.
\]

In other words, ∇ · F measures the flux density near a point. The more diverging F is
around a point, the higher the flux over a tiny closed surface around that point, resulting in
a greater value of ∇ · F. This justifies the name divergence for ∇ · F.
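
The flux-density interpretation can be tested numerically by taking D to be a small cube. The sketch below reuses the field of Example 4.21; the center point, the cube side and the helper name `flux_density` are assumptions made for illustration.

```python
import math

def F(x, y, z):
    # the field of Example 4.21
    return (x**2, 4 * x * y * z, z * math.exp(x))

def div_F(x, y, z):
    # divergence computed by hand in Example 4.21
    return 2 * x + 4 * x * z + math.exp(x)

def flux_density(F, center, side, n=20):
    """Outward flux of F through a small cube centered at `center`, divided by its volume."""
    half = side / 2
    d = side / n
    total = 0.0
    for axis in range(3):
        u_axis, v_axis = [k for k in range(3) if k != axis]
        for sign in (-1.0, 1.0):
            for i in range(n):
                for j in range(n):
                    p = list(center)
                    p[axis] += sign * half
                    p[u_axis] += -half + (i + 0.5) * d
                    p[v_axis] += -half + (j + 0.5) * d
                    # outward normal on this face is sign * e_axis
                    total += sign * F(*p)[axis] * d * d
    return total / side**3

center = (1.0, 0.5, 0.25)  # sample point (assumed)
approx = flux_density(F, center, side=0.01)
exact = div_F(*center)
print(approx, exact)
```

Shrinking `side` further should bring the two values closer, until rounding error in the subtraction of nearly equal face fluxes starts to dominate.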

4.7.4 Limitations of Divergence Theorem


The condition that F has to be defined in the region D is crucial. Consider the gravitational
force field:
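\[
\mathbf{F} = -GMm\,\frac{x\mathbf{i} + y\mathbf{j} + z\mathbf{k}}{(x^2 + y^2 + z^2)^{3/2}} = -\frac{GMm}{x^2 + y^2 + z^2}\cdot\frac{x\mathbf{i} + y\mathbf{j} + z\mathbf{k}}{\sqrt{x^2 + y^2 + z^2}}.
\]

Under the spherical coordinates (ρ, θ, ϕ), we have ρ² = x² + y² + z². The vector field
$\frac{x\mathbf{i} + y\mathbf{j} + z\mathbf{k}}{\sqrt{x^2 + y^2 + z^2}}$ is the unit radial vector field. For simplicity, we denote
\[
\mathbf{e}_\rho = \frac{x\mathbf{i} + y\mathbf{j} + z\mathbf{k}}{\sqrt{x^2 + y^2 + z^2}},
\]
then the gravitational vector field can be expressed as:
\[
\mathbf{F} = -\frac{GMm}{\rho^2}\,\mathbf{e}_\rho.
\]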

If S is the sphere with radius a centered at the origin, then its outward unit normal is also
radial and so we have n̂ = eρ , and ρ = a on S. Therefore, we have:

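\[
\begin{aligned}
\mathbf{F}\cdot\hat{\mathbf{n}} &= -\frac{GMm}{\rho^2}\,\mathbf{e}_\rho\cdot\mathbf{e}_\rho \\
&= -\frac{GMm}{\rho^2}\,|\mathbf{e}_\rho|^2 \\
&= -\frac{GMm}{\rho^2}.
\end{aligned}
\]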

Hence, the outward flux is given by:


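\[
\begin{aligned}
\oiint_{S} \mathbf{F}\cdot\hat{\mathbf{n}}\,dS &= \oiint_{S} -\frac{GMm}{\rho^2}\,dS \\
&= \oiint_{S} -\frac{GMm}{a^2}\,dS \qquad \text{(since } \rho = a \text{ on } S\text{)} \\
&= -\frac{GMm}{a^2}\underbrace{\oiint_{S} dS}_{\text{surface area of } S} \\
&= -\frac{GMm}{a^2}\cdot 4\pi a^2 = -4\pi GMm.
\end{aligned}
\]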
On the other hand, by direct computation (left as an exercise), one can verify that:
\[
\nabla\cdot\mathbf{F} = \nabla\cdot\left(-GMm\,\frac{x\mathbf{i} + y\mathbf{j} + z\mathbf{k}}{(x^2 + y^2 + z^2)^{3/2}}\right) = 0
\]
whenever (x, y, z) ≠ (0, 0, 0).


Then, the triple integral would be:
\[
\iiint_{D} \underbrace{\nabla\cdot\mathbf{F}}_{=\,0 \text{ except at the origin}}\,dV = 0.
\]

Here D is the solid sphere enclosed by S.


Therefore, in this case we have:
\[
\underbrace{\oiint_{S} \mathbf{F}\cdot\hat{\mathbf{n}}\,dS}_{=\,-4\pi GMm} \neq \underbrace{\iiint_{D} \nabla\cdot\mathbf{F}\,dV}_{=\,0}.
\]

The Divergence Theorem does not hold in this case. The reason is that the vector field F is not
defined at the origin and the surface S encloses the origin!
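
Both sides of this mismatch can be observed numerically. The sketch below is an illustration under assumptions: the constant GMm is set to 1, the sample point and sphere radius are chosen arbitrarily, and finite differences and the midpoint rule stand in for the exact computations.

```python
import math

GMm = 1.0  # assumed value of the constant G*M*m, for illustration only

def F(x, y, z):
    """Gravitational field -GMm (x, y, z) / rho^3, undefined at the origin."""
    rho3 = (x * x + y * y + z * z) ** 1.5
    return (-GMm * x / rho3, -GMm * y / rho3, -GMm * z / rho3)

def divergence(F, p, h=1e-4):
    """Central-difference approximation of div F at a point p away from the origin."""
    total = 0.0
    for axis in range(3):
        forward = list(p)
        backward = list(p)
        forward[axis] += h
        backward[axis] -= h
        total += (F(*forward)[axis] - F(*backward)[axis]) / (2 * h)
    return total

def sphere_flux(F, a, n_phi=200, n_theta=200):
    """Midpoint-rule outward flux of F through the sphere of radius a centered at the origin."""
    total = 0.0
    dphi = math.pi / n_phi
    dtheta = 2 * math.pi / n_theta
    for i in range(n_phi):
        phi = (i + 0.5) * dphi
        for j in range(n_theta):
            theta = (j + 0.5) * dtheta
            nx = math.sin(phi) * math.cos(theta)
            ny = math.sin(phi) * math.sin(theta)
            nz = math.cos(phi)
            fx, fy, fz = F(a * nx, a * ny, a * nz)
            total += (fx * nx + fy * ny + fz * nz) * a * a * math.sin(phi) * dphi * dtheta
    return total

div_sample = divergence(F, (1.0, 2.0, -0.5))  # a point away from the origin
flux = sphere_flux(F, a=2.0)                  # a sphere enclosing the origin
print(div_sample, flux)
```

The divergence is numerically zero at the sample point, while the flux stays close to −4πGMm for every radius, which is the discrepancy described above.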
To conclude, one needs to be very careful when applying the Divergence Theorem if the
region D contains some points at which the vector field is not defined. In the next subsection,
we will learn how to apply the Divergence Theorem (in a modified way) when the surface
encloses some points at which the vector field is not defined.

4.7.5 Gauss’s Law for Gravity


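The purpose of this subsection is to give a proof of Gauss’s Law for Gravity (assuming the
inverse-square law), which says that the gravitational flux:
\[
\oiint_{S} \frac{GMm}{\rho^2}\,\mathbf{e}_\rho\cdot\hat{\mathbf{n}}\,dS
\]
is given by 4πGMm for any closed surface S enclosing the origin. Note that the integrand here
is −F · n̂, so this value is the negative of the flux of F, which was computed to be −4πGMm
for a sphere in the previous subsection.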



We have shown that it is so when S is a sphere centered at the origin, and we are going to
use the Divergence Theorem to show that it is always true for any closed surface S enclosing
the origin. However, we need to be very careful when applying the Divergence Theorem since
the gravitational field is undefined at the origin.
We will adopt the “hole-drilling” technique which was previously used in computing the
winding number integral. Given a solid D containing the origin, we first construct a small
sphere B with radius a centered at the origin. Then, the solid D \ B (i.e. the solid D with B
removed) is a solid not enclosing the “bad” point origin.
Next, we cut this solid into two parts by the horizontal plane z = 0. Label each side of the
resulting solids by Sᵢ, Π and Σᵢ as shown in Figure 4.21. Note that Π is the common side.

Figure 4.21: applying Divergence Theorem on the gravitational force field

Gluing S1 , Π and Σ1 together gives a closed surface not enclosing the origin. Denote D1 to
be the solid enclosed by this closed surface. Hence, one can apply the Divergence Theorem
without any issue:
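\[
\left(\iint_{S_1} + \iint_{\Pi} + \iint_{\Sigma_1}\right) \frac{GMm}{\rho^2}\,\mathbf{e}_\rho\cdot\hat{\mathbf{n}}\,dS = \iiint_{D_1} \underbrace{\nabla\cdot\left(\frac{GMm}{\rho^2}\,\mathbf{e}_\rho\right)}_{=\,0}\,dV = 0
\]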

where n̂ is the outward unit normal of the boundary surface of D1 . Denote n̂up and n̂down to
be the upward and downward normal vector respectively. The above integrals can be expressed
as:
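\[
\iint_{S_1} \frac{GMm}{\rho^2}\,\mathbf{e}_\rho\cdot\hat{\mathbf{n}}_{\text{up}}\,dS + \iint_{\Pi} \frac{GMm}{\rho^2}\,\mathbf{e}_\rho\cdot\hat{\mathbf{n}}_{\text{down}}\,dS + \iint_{\Sigma_1} \frac{GMm}{\rho^2}\,\mathbf{e}_\rho\cdot\hat{\mathbf{n}}_{\text{down}}\,dS = 0. \tag{4.3}
\]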

Similarly, gluing S2 , Π and Σ2 together gives a closed surface not enclosing the origin. By
the Divergence Theorem applied to this surface, we get:
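\[
\iint_{S_2} \frac{GMm}{\rho^2}\,\mathbf{e}_\rho\cdot\hat{\mathbf{n}}_{\text{down}}\,dS + \iint_{\Pi} \frac{GMm}{\rho^2}\,\mathbf{e}_\rho\cdot\hat{\mathbf{n}}_{\text{up}}\,dS + \iint_{\Sigma_2} \frac{GMm}{\rho^2}\,\mathbf{e}_\rho\cdot\hat{\mathbf{n}}_{\text{up}}\,dS = 0. \tag{4.4}
\]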

We then add up the above two equations. First note that S1 and S2 can glue together to
form the closed surface S. Both n̂up of S1 , and n̂down of S2 become the outward normal of S.

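Therefore, taking the S₁ term from (4.3) and the S₂ term from (4.4),
\[
\iint_{S_1} \frac{GMm}{\rho^2}\,\mathbf{e}_\rho\cdot\hat{\mathbf{n}}_{\text{up}}\,dS + \iint_{S_2} \frac{GMm}{\rho^2}\,\mathbf{e}_\rho\cdot\hat{\mathbf{n}}_{\text{down}}\,dS = \oiint_{S} \frac{GMm}{\rho^2}\,\mathbf{e}_\rho\cdot\hat{\mathbf{n}}_{\text{outward}}\,dS.
\]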

For the planar surface Π, the downward normal n̂down is in the opposite direction of the
upward normal n̂up , i.e. n̂down = −n̂up . Therefore,
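\[
\iint_{\Pi} \frac{GMm}{\rho^2}\,\mathbf{e}_\rho\cdot\hat{\mathbf{n}}_{\text{down}}\,dS + \iint_{\Pi} \frac{GMm}{\rho^2}\,\mathbf{e}_\rho\cdot\hat{\mathbf{n}}_{\text{up}}\,dS = 0.
\]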

Finally, the surfaces Σ1 and Σ2 glue together to form the closed sphere Σ. The normal
vectors n̂down of Σ1 , and n̂up of Σ2 , are the inward unit normal of Σ. Therefore,
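\[
\iint_{\Sigma_1} \frac{GMm}{\rho^2}\,\mathbf{e}_\rho\cdot\hat{\mathbf{n}}_{\text{down}}\,dS + \iint_{\Sigma_2} \frac{GMm}{\rho^2}\,\mathbf{e}_\rho\cdot\hat{\mathbf{n}}_{\text{up}}\,dS = \oiint_{\Sigma} \frac{GMm}{\rho^2}\,\mathbf{e}_\rho\cdot\hat{\mathbf{n}}_{\text{inward}}\,dS.
\]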

Summing up (4.3) and (4.4), we get:


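\[
\oiint_{S} \frac{GMm}{\rho^2}\,\mathbf{e}_\rho\cdot\hat{\mathbf{n}}_{\text{outward}}\,dS + 0 + \oiint_{\Sigma} \frac{GMm}{\rho^2}\,\mathbf{e}_\rho\cdot\hat{\mathbf{n}}_{\text{inward}}\,dS = 0.
\]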

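Therefore,
\[
\begin{aligned}
\oiint_{S} \frac{GMm}{\rho^2}\,\mathbf{e}_\rho\cdot\hat{\mathbf{n}}_{\text{outward}}\,dS &= -\oiint_{\Sigma} \frac{GMm}{\rho^2}\,\mathbf{e}_\rho\cdot\hat{\mathbf{n}}_{\text{inward}}\,dS \\
&= \oiint_{\Sigma} \frac{GMm}{\rho^2}\,\mathbf{e}_\rho\cdot\hat{\mathbf{n}}_{\text{outward}}\,dS \\
&= 4\pi GMm \qquad \text{(computed before)}
\end{aligned}
\]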

This holds true for any closed surface S enclosing the origin. This proves Gauss’s Law for
Gravity (assuming the inverse-square law).
However, if S does not enclose the origin, then one can apply the Divergence Theorem on
the gravitational vector field without any issue.
To conclude, for any closed surface S not passing through the origin, we have:
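\[
\oiint_{S} \frac{GMm}{\rho^2}\,\mathbf{e}_\rho\cdot\hat{\mathbf{n}}\,dS =
\begin{cases}
4\pi GMm & \text{if } S \text{ encloses the origin;} \\
0 & \text{otherwise.}
\end{cases}
\]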
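
Gauss’s Law can be illustrated on a surface that is not a sphere. The sketch below approximates the flux of (GMm/ρ²)e_ρ through two rectangular boxes, one enclosing the origin and one not. The boxes, the value GMm = 1 and the helper name `box_flux` are assumptions made for illustration.

```python
import math

GMm = 1.0  # assumed value of the constant G*M*m, for illustration only

def field(x, y, z):
    """The field (GMm / rho^2) e_rho = GMm (x, y, z) / rho^3."""
    rho3 = (x * x + y * y + z * z) ** 1.5
    return (GMm * x / rho3, GMm * y / rho3, GMm * z / rho3)

def box_flux(F, lo, hi, n=120):
    """Midpoint-rule outward flux of F through the six faces of the box lo <= (x, y, z) <= hi."""
    total = 0.0
    for axis in range(3):
        u_axis, v_axis = [k for k in range(3) if k != axis]
        du = (hi[u_axis] - lo[u_axis]) / n
        dv = (hi[v_axis] - lo[v_axis]) / n
        for side, sign in ((lo[axis], -1.0), (hi[axis], 1.0)):
            for i in range(n):
                for j in range(n):
                    p = [0.0, 0.0, 0.0]
                    p[axis] = side
                    p[u_axis] = lo[u_axis] + (i + 0.5) * du
                    p[v_axis] = lo[v_axis] + (j + 0.5) * dv
                    # outward normal on this face is sign * e_axis
                    total += sign * F(*p)[axis] * du * dv
    return total

# an off-center box that encloses the origin
enclosing = box_flux(field, (-1.0, -2.0, -0.5), (3.0, 1.0, 2.0))
# a box that does not enclose the origin
not_enclosing = box_flux(field, (2.0, 2.0, 2.0), (4.0, 4.0, 4.0))
print(enclosing, not_enclosing)
```

The first value should be close to 4πGMm and the second close to 0, independently of the shape and position of the box, as long as no face passes through the origin.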

4.8 Heat Diffusion (Optional)


In this section, we discuss an important differential equation in both mathematics and physics,
the heat equation. Let u( x, y, z, t) be the temperature at point ( x, y, z) at time t. The heat
equation:
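\[
\frac{\partial u}{\partial t} = k\left(\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} + \frac{\partial^2 u}{\partial z^2}\right)
\]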
governs the diffusion of heat.

4.8.1 Derivation of Heat Equation


To begin with, we will use several fundamental laws in physics and the Divergence Theorem
to derive the heat equation.
Heat diffusion is caused by displacement of heat energy. Fourier’s Law in physics asserts
that heat energy transfers according to the following rule:

J = − a∇u

where J is a vector field (in energy per second) representing the flow of heat energy, and a is a
positive constant depending on the medium. In other words, heat energy diffuses from higher
temperature regions to lower ones, and the rate of diffusion is proportional to the magnitude
of ∇u.
Let D be an arbitrary solid region with boundary surface S. Denote by ϱ the energy
density function (in energy per volume), which equals bu for some positive constant b whose
value depends on the medium. Then the triple integral:
\[
\iiint_{D} \varrho\,dV
\]

is the total amount of heat energy contained in the region D.


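On the other hand, the outward flux
\[
\oiint_{S} \mathbf{J}\cdot\hat{\mathbf{n}}\,dS
\]
measures the rate of heat loss through the closed surface S. By the conservation of heat
energy, any change in the heat energy contained in D must be due to heat flowing across the
surface S. In mathematical terms, it is stated as:
\[
\frac{\partial}{\partial t}\iiint_{D} \varrho\,dV = -\oiint_{S} \mathbf{J}\cdot\hat{\mathbf{n}}\,dS.
\]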

The negative sign appears on the RHS because of the outward convention of n̂.
Applying the above physical laws, we get:
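\[
\begin{aligned}
\iiint_{D} b\,\frac{\partial u}{\partial t}\,dV &= -\oiint_{S} (-a\nabla u)\cdot\hat{\mathbf{n}}\,dS \\
\iiint_{D} b\,\frac{\partial u}{\partial t}\,dV &= \oiint_{S} a\nabla u\cdot\hat{\mathbf{n}}\,dS \\
\iiint_{D} b\,\frac{\partial u}{\partial t}\,dV &= \iiint_{D} \nabla\cdot(a\nabla u)\,dV \qquad \text{(Divergence Theorem)}
\end{aligned}
\]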

Since D is arbitrary, we must have:

\[
b\,\frac{\partial u}{\partial t} = \nabla\cdot(a\nabla u) = a\,\nabla\cdot\nabla u.
\]
We leave it as an exercise for readers to verify that:

\[
\nabla\cdot\nabla u = \frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} + \frac{\partial^2 u}{\partial z^2}.
\]

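Therefore, dividing both sides by b, we can conclude that:
\[
\frac{\partial u}{\partial t} = \frac{a}{b}\left(\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} + \frac{\partial^2 u}{\partial z^2}\right)
\]
which is exactly the heat equation by defining k = a/b.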


Very often, ∇ · ∇u is denoted by ∇²u or ∆u. As such, the heat equation can be written as:
\[
\frac{\partial u}{\partial t} = k\,\Delta u.
\]

4.8.2 Fundamental Solution


It can be verified that the following function satisfies the heat equation:

\[
\Phi(x, y, z, t) = \frac{1}{(4\pi k t)^{3/2}}\exp\left(-\frac{x^2 + y^2 + z^2}{4kt}\right).
\]
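
The claim that Φ satisfies the heat equation can be spot-checked with finite differences. This is a sketch under assumptions: the diffusivity `K = 0.7`, the sample point and the step size are chosen for illustration, and central differences replace the exact derivatives.

```python
import math

K = 0.7  # assumed value of the constant k, for illustration only

def Phi(x, y, z, t):
    """Fundamental solution of the heat equation with diffusivity K (valid for t > 0)."""
    r2 = x * x + y * y + z * z
    return math.exp(-r2 / (4 * K * t)) / (4 * math.pi * K * t) ** 1.5

def heat_residual(x, y, z, t, h=1e-3):
    """Central-difference estimate of dPhi/dt - K * Laplacian(Phi) at (x, y, z, t)."""
    d_dt = (Phi(x, y, z, t + h) - Phi(x, y, z, t - h)) / (2 * h)
    center = Phi(x, y, z, t)
    laplacian = (Phi(x + h, y, z, t) + Phi(x - h, y, z, t)
                 + Phi(x, y + h, z, t) + Phi(x, y - h, z, t)
                 + Phi(x, y, z + h, t) + Phi(x, y, z - h, t)
                 - 6 * center) / (h * h)
    return d_dt - K * laplacian

residual = heat_residual(0.3, -0.2, 0.5, 1.0)
print(residual)
```

The residual should be tiny compared with the size of ∂Φ/∂t at the same point, which is of order 10⁻² for these sample values.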

At the point (x, y, z) = (0, 0, 0), we have $\Phi(0, 0, 0, t) = \frac{1}{(4\pi k t)^{3/2}}$, and so:
\[
\lim_{t\to 0} \Phi(0, 0, 0, t) = \lim_{t\to 0} \frac{1}{(4\pi k t)^{3/2}} = \infty.
\]
In contrast, if (x, y, z) ≠ (0, 0, 0), both $\exp\left(-\frac{x^2 + y^2 + z^2}{4kt}\right)$ and $(4\pi k t)^{3/2}$ go to 0 as t → 0.
However, the exponential term goes to 0 faster than the $t^{3/2}$ term, so
\[
\lim_{t\to 0} \Phi(x, y, z, t) = \lim_{t\to 0} \frac{1}{(4\pi k t)^{3/2}}\exp\left(-\frac{x^2 + y^2 + z^2}{4kt}\right) = 0 \quad \text{when } (x, y, z) \neq (0, 0, 0).
\]

Therefore, the function Φ(x, y, z, t) represents the heat diffusion starting from a highly
concentrated heat source at t = 0. As time goes on, the temperature distribution becomes more
and more uniform.
In general, if the initial temperature distribution is given by the function g( x, y, z), it can be
shown (proof beyond the scope of the course) that the following function
\[
u(x, y, z, t) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \Phi(x - u, y - v, z - w, t)\,g(u, v, w)\,du\,dv\,dw
\]
satisfies the heat equation $\frac{\partial u}{\partial t} = k\,\Delta u$ with initial condition g(x, y, z), meaning that
\[
\lim_{t\to 0} u(x, y, z, t) = g(x, y, z).
\]

In other words, the function u(x, y, z, t) predicts how heat diffuses when given an initial
temperature profile g(x, y, z). However, the triple integral involved is generally difficult to
evaluate explicitly.

4.8.3 Steady State


A temperature distribution u(x, y, z, t) is said to be at steady state if it is independent of the
time t, i.e. ∂u/∂t = 0. For such a temperature distribution, the heat equation implies that

∆u = 0.

The above equation is often called the Laplace Equation.


Now consider a closed surface S which encloses a solid region D. Using the Divergence
Theorem, one can show that, at a steady state, if the temperature on the surface S is constant,
then the temperature inside the surface S is also constant. To argue this, we denote by u(x, y, z)
a steady state temperature distribution (i.e. ∆u = 0), and suppose that:

u( x, y, z) = C for any ( x, y, z) on S.

Next we consider the vector field (u − C )∇u. We leave it as an exercise for readers to verify
from the definition that:

\[
\nabla\cdot\big((u - C)\nabla u\big) = |\nabla u|^2 + (u - C)\,\Delta u.
\]
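
Before working the exercise by hand, the product-rule identity for ∇ · ((u − C)∇u) can be spot-checked numerically. The sketch below is an illustration only: the sample function u (which is deliberately not harmonic, so that both terms on the right are exercised), the constant C, the sample point and the helper names are all assumptions.

```python
import math

C = 2.0   # assumed constant, for illustration only
H = 1e-3  # finite-difference step (assumed)

def u(x, y, z):
    # arbitrary smooth sample function (an assumption, not from the notes)
    return x * x * y + math.sin(z) + math.exp(x - y)

def shifted(p, axis, delta):
    """Return a copy of the point p moved by delta along the given axis."""
    q = list(p)
    q[axis] += delta
    return q

def grad(f, p, h=H):
    """Central-difference gradient of a scalar function f at p."""
    return [(f(*shifted(p, a, h)) - f(*shifted(p, a, -h))) / (2 * h) for a in range(3)]

def laplacian(f, p, h=H):
    """Second-order central-difference Laplacian of f at p."""
    total = -6 * f(*p)
    for a in range(3):
        total += f(*shifted(p, a, h)) + f(*shifted(p, a, -h))
    return total / (h * h)

def W(x, y, z):
    """The vector field (u - C) * grad(u)."""
    return [(u(x, y, z) - C) * g for g in grad(u, (x, y, z))]

def divergence(V, p, h=H):
    """Central-difference divergence of a vector field V at p."""
    return sum((V(*shifted(p, a, h))[a] - V(*shifted(p, a, -h))[a]) / (2 * h)
               for a in range(3))

p = (0.4, -0.3, 0.8)
lhs = divergence(W, p)
rhs = sum(g * g for g in grad(u, p)) + (u(*p) - C) * laplacian(u, p)
print(lhs, rhs)
```

Agreement at a few sample points is not a proof, but it is a quick way to catch a sign or factor error before carrying out the product-rule computation from the definition.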

At steady state, we have ∆u = 0 and so ∇ · ((u − C)∇u) = |∇u|². Next we integrate this result
over D:
\[
\iiint_{D} \nabla\cdot\big((u - C)\nabla u\big)\,dV = \iiint_{D} |\nabla u|^2\,dV.
\]
Applying the Divergence Theorem on the LHS, we get:
\[
\oiint_{S} (u - C)\nabla u\cdot\hat{\mathbf{n}}\,dS = \iiint_{D} |\nabla u|^2\,dV.
\]

From our assumption, we have u = C for any point on S. Therefore, the integrand (u − C )∇u · n̂
of the flux integral on LHS is zero, and so:
\[
0 = \iiint_{D} |\nabla u|^2\,dV.
\]

Since the integrand |∇u|² is non-negative, the only scenario for the above to happen is that

∇u( x, y, z) = 0 for any ( x, y, z) in D.

Therefore, u must be a constant in the region D, and by continuity, this constant must be C.
