Mathematical Analysis 2
By everyone who contributed.
github.com/oliver-butterley/ma2
This document is covered by the Creative Commons Attribution-ShareAlike 4.0 International licence (CC BY-SA 4.0).
You are free to share this work (copy and redistribute the material in any medium or format)
and adapt this work (remix, transform, and build upon the material for any purpose, even
commercially), under the obligation of attribution (you must give appropriate credit) and
share-alike (if you remix, transform, or build upon the material, you must distribute your
contributions under the same license as the original).
This text may contain errors, inaccuracies and misleading ideas; the reader takes full responsibility for the consequences. Any resemblance to actual persons, living or dead, events or localities is entirely coincidental.
Preface
This text accompanies the course “Mathematical Analysis 2” taught at the University of Rome Tor Vergata in the department of engineering for the academic year 2022/23. The course was led by Oliver Butterley, in collaboration with Giovanni Canestrari.
The aim of this document is to concisely describe the fundamental details related to the material of the course. These are aptly named “notes” and are most likely not a comprehensive source of all relevant information. We have easy access to a
huge volume of resources and so here we will make connections to whatever is useful,
whenever we can.
These notes are merely written text whereas the central part of the course remains
the time spent working with the material, be it doing exercises, discussing, doing
calculations, etc. This is not text for memorising, this is text that aims to help us
practice and become stronger thinkers.
This text is freely¹ available at github.com/oliver-butterley/ma2. Everyone is
encouraged to contribute improvements to the document during the progress of the
course.
Some of the text comes from previous years and from many other sources; some of the text came to be during the course. The current version is the product of many people, in particular everyone who made suggestions in class and pointed out errors or imprecisions, and everyone who suggested useful additional content.
¹ Free both in the sense of “free speech” and “free beer”.
Contents
Preface iii
Introduction vii
Curves & line integrals 49
Curves, paths & line integrals . . . . . . . . . . . . . . . . . . . . . . . 50
Basic properties of the line integral . . . . . . . . . . . . . . . . . . . . 51
The second fundamental theorem . . . . . . . . . . . . . . . . . . . . . 54
The first fundamental theorem . . . . . . . . . . . . . . . . . . . . . . 54
Potentials & conservative vector fields . . . . . . . . . . . . . . . . . . . 58
Line integrals of scalar fields . . . . . . . . . . . . . . . . . . . . . . . . 61
Multiple integrals 63
Definition of the integral . . . . . . . . . . . . . . . . . . . . . . . . . 63
Evaluation of multiple integrals . . . . . . . . . . . . . . . . . . . . . . 66
Regions bounded by functions . . . . . . . . . . . . . . . . . . . . . . 68
Applications of multiple integrals . . . . . . . . . . . . . . . . . . . . . 71
Green’s theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
Change of variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
Surface integrals 79
Representation of a surface . . . . . . . . . . . . . . . . . . . . . . . . 79
Surface integral of scalar field . . . . . . . . . . . . . . . . . . . . . . . 82
Change of surface parametrization . . . . . . . . . . . . . . . . . . . . . 83
Surface integral of a vector field . . . . . . . . . . . . . . . . . . . . . . 84
Curl and divergence . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
Theorems of Stokes and Gauss . . . . . . . . . . . . . . . . . . . . . . 87
Introduction
We often want to swap the order of summing (or integrating) and often need to consider infinite sums (or integrals). When can we do this and when can't we?
Example (interchanging integrals). Let's try to integrate e^{−xy} − xye^{−xy} with respect to both x and y. We would like to believe that

∫_0^∞ ∫_0^1 (e^{−xy} − xye^{−xy}) dy dx ≟ ∫_0^1 ∫_0^∞ (e^{−xy} − xye^{−xy}) dx dy.

Since ∫_0^1 (e^{−xy} − xye^{−xy}) dy = [ye^{−xy}]_{y=0}^{1} = e^{−x}, the left-hand side is ∫_0^∞ e^{−x} dx = [−e^{−x}]_0^∞ = 1. However, since ∫_0^∞ (e^{−xy} − xye^{−xy}) dx = [xe^{−xy}]_{x=0}^{∞} = 0, the right-hand side is ∫_0^1 0 dy = 0. So how do we know when to trust the interchange of integrals?
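As a sanity check (an addition, not part of the original notes), the two iterated integrals can be approximated numerically; the truncation points and the midpoint rule are arbitrary choices:

```python
import math

def f(x, y):
    return math.exp(-x*y) - x*y*math.exp(-x*y)

def integrate(g, a, b, n=20000):
    # midpoint rule approximation of the integral of g over [a, b]
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h

# inner integral in y has closed form [y e^{-xy}]_0^1 = e^{-x}; check at x = 2
inner_y = integrate(lambda y: f(2.0, y), 0, 1)

# left-hand side: integral of e^{-x} over [0, infinity), truncated at 40
lhs = integrate(lambda x: math.exp(-x), 0, 40)

# right-hand side inner integral: for fixed y = 0.5 it is [x e^{-xy}] -> 0
inner_x = integrate(lambda x: f(x, 0.5), 0, 80, 50000)
```

The computation reproduces the paradox: the left-hand side is close to 1 while the inner integral of the right-hand side is close to 0.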
Example (interchanging limits). We could easily believe that

lim_{x→0} lim_{y→0} x²/(x² + y²) ≟ lim_{y→0} lim_{x→0} x²/(x² + y²).

However lim_{y→0} x²/(x² + y²) = x²/(x² + 0) = 1 and so the left-hand side is 1, whereas lim_{x→0} x²/(x² + y²) = 0/(0 + y²) = 0 and so the right-hand side is 0. What does the graph of this function look like? This example shows that the interchange of limits is untrustworthy. Under what circumstances is it legitimate?
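The two iterated limits can be imitated numerically (a sketch, not from the notes) by sending one variable to 0 much faster than the other; the particular step sizes are arbitrary:

```python
def f(x, y):
    return x*x / (x*x + y*y)

# y -> 0 first (y is far smaller than x), then x -> 0: the value stays near 1
lim_y_then_x = f(1e-8, 1e-16)

# x -> 0 first (x is far smaller than y), then y -> 0: the value stays near 0
lim_x_then_y = f(1e-16, 1e-8)
```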
We need to be rigorous in our logic otherwise, as we have seen in these examples,
the conclusions can be erroneous and the difficulties are often subtle.
Consider a closed curve in R². For a given angle we define the width of this curve to be the smallest distance between two parallel lines which touch the curve in a single point but never cross it (one on each side of the curve). We say that the curve has constant width if this width is equal from every direction. This is just what we would check using calipers on a part and rotating. The following statement is intuitive and true.
Theorem. A circle has constant width.
However the converse is not true, indeed the following is true.
Theorem. There exist constant width curves which are not circles.
This can be proved by constructing many such curves, for example the Reuleaux
triangle. Indeed there are such curves which look similar to regular polygons but still
have constant width.
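This can be checked numerically for the Reuleaux triangle (a sketch added here, not from the notes): sample its boundary, which consists of three circular arcs of radius equal to the side length, each centred at a vertex of an equilateral triangle, and measure the width in many directions:

```python
import math

# vertices of an equilateral triangle with side length 1
A, B, C = (0.0, 0.0), (1.0, 0.0), (0.5, math.sqrt(3)/2)

def arc(center, t0, t1, m=2000):
    # sample the arc of the unit circle around `center` between angles t0, t1
    cx, cy = center
    return [(cx + math.cos(t0 + (t1 - t0)*i/m), cy + math.sin(t0 + (t1 - t0)*i/m))
            for i in range(m + 1)]

# boundary of the Reuleaux triangle: three arcs of radius 1
pts = (arc(A, 0.0, math.pi/3)
       + arc(B, 2*math.pi/3, math.pi)
       + arc(C, 4*math.pi/3, 5*math.pi/3))

def width(theta):
    # distance between the two supporting lines perpendicular to direction theta
    u = (math.cos(theta), math.sin(theta))
    proj = [p[0]*u[0] + p[1]*u[1] for p in pts]
    return max(proj) - min(proj)

widths = [width(2*math.pi*k/360) for k in range(360)]
```

Up to sampling error the width is 1 in every direction, even though the curve is clearly not a circle.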
Mathematical Analysis 1          Mathematical Analysis 2
Sequences & series of numbers    Sequences & series of functions
a_1, a_2, a_3, …                 f_1(x), f_2(x), f_3(x), …
Σ_{n=0}^∞ a_n                    Σ_{n=0}^∞ f_n(x)
Chapter 1
Sequences & series of functions
Analogously to sequences of numbers we can consider a sequence of functions f_0(x), f_1(x), f_2(x), f_3(x), etc. Often it is convenient to write such a sequence as {f_n(x)}_{n∈N}. For example, the following are sequences of functions.
▷ f1 (x) = x2 , f2 (x) = x4 , f3 (x) = x6 , . . .
▷ f1 (x) = ex , f2 (x) = e2x ,f3 (x) = e3x , . . .
▷ f_n(x) = n exp(−(1/2) n² x²)
Note that in the first case we could have instead written fn (x) = x2n and in the
second case we could have written fn (x) = enx . The natural number n is called the
index. Typically the index of the sequence starts from n = 0 or n = 1 but that’s not
essential. The index doesn't need to be n; any other letter, or indeed symbol, can be used.
1.1 Convergence & continuity
We start by recalling the notion of convergence for sequences of numbers.
[Figure: graphs of f_n(x) = xⁿ on [0, 1] for n = 1, …, 15.]
Example. Consider the sequence f_n(x) = xⁿ for x ∈ (0, 1). For each x ∈ (0, 1) we see that f_n(x) → 0. On the other hand, for each n, 2^{−1/n} ∈ (0, 1) and f_n(2^{−1/n}) = 1/2.
Up until now we haven't mentioned the domain of the functions in the sequence but to proceed we need to make this detail rigorous. We will write that “{f_n(x)}_n is a sequence of functions on D ⊂ R” to mean that there is a fixed D ⊂ R and, for each n ∈ N, f_n is a function with domain D (i.e., f_n : D → R).
[Figure: uniform convergence: the graph of f_n lies within the band between f − ϵ and f + ϵ over the interval [a, b].]
Theorem 1.5. Suppose that fn → f uniformly on D and that the fn are continuous on
D. Then f is continuous on D.
Proof. Let p ∈ D. Uniform convergence means that, for each ϵ > 0, there exists N such that for every n ≥ N and every x ∈ D, |f_n(x) − f(x)| < ϵ/3. By continuity of f_N at p, there exists δ > 0 such that |f_N(x) − f_N(p)| < ϵ/3 whenever x ∈ D and |x − p| < δ. For such x,

|f(x) − f(p)| ≤ |f(x) − f_N(x)| + |f_N(x) − f_N(p)| + |f_N(p) − f(p)| < ϵ,

and so f is continuous at p.
[Figure: graphs of f_n for n = 1, …, 6 on [−1, 1].]
Recall that integrals are defined rigorously using the notion of step functions.
Proof. The uniform convergence implies that for each ϵ > 0, there exists N such that for every n ≥ N and every x ∈ D, |f_n(x) − f(x)| < ϵ/(b − a). This means that

|∫_a^b f_n(x) dx − ∫_a^b f(x) dx| ≤ ∫_a^b |f_n(x) − f(x)| dx ≤ (b − a) · ϵ/(b − a) = ϵ.

This shows that ∫_a^b f_n(x) dx → ∫_a^b f(x) dx.
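Uniformity is essential in this theorem. A standard counterexample (an addition, not from the notes) is f_n(x) = 2n²x e^{−n²x²} on [0, 1]: it converges pointwise to 0 but its integrals, which equal 1 − e^{−n²} exactly, converge to 1. A numerical sketch:

```python
import math

def f_n(n, x):
    return 2 * n*n * x * math.exp(-n*n * x*x)

def integrate(g, a, b, m=100000):
    # midpoint rule
    h = (b - a) / m
    return sum(g(a + (i + 0.5)*h) for i in range(m)) * h

pointwise = [f_n(n, 0.3) for n in (5, 10, 20)]              # tends to 0 at fixed x
integrals = [integrate(lambda x: f_n(n, x), 0, 1) for n in (5, 10, 20)]
# each integral is 1 - e^{-n^2} exactly, which tends to 1, not to 0
```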
Series of functions
Recall that, if {a_n}_n is a sequence of numbers, then the series Σ_n a_n is the sequence of numbers {Σ_{k=1}^n a_k}_n (the partial sums). We say that the series Σ_n a_n is convergent if {Σ_{k=1}^n a_k}_n is convergent.
Theorem 1.8. Suppose that the series Σ_n f_n is uniformly convergent to g on D and the f_n are continuous on D. Then g is continuous on D.

Proof. If the f_k are continuous then the partial sums Σ_{k=1}^n f_k are continuous. This means that Theorem 1.5 applies.
Theorem 1.9. Suppose that the series Σ_n f_n is uniformly convergent to g and the f_n are continuous. Then

lim_{n→∞} ∫_a^b Σ_{k=1}^n f_k(x) dx = ∫_a^b g(x) dx.

Proof. Again, that the f_k are continuous means that the Σ_{k=1}^n f_k are continuous. This means that Theorem 1.6 applies.
Here and subsequently it is convenient to recall several common tests which are useful for proving convergence: the ratio test, root test, comparison test, alternating series test and integral test for convergence. For series of functions we have the following test for convergence.
Proof. By the comparison test Σ |f_n(x)| is convergent for all x ∈ D. I.e., for each x the series Σ f_n(x) is absolutely convergent and so we let f(x) be the limit. We compute

|f(x) − Σ_{k=1}^n f_k(x)| = |Σ_{k=n+1}^∞ f_k(x)| ≤ Σ_{k=n+1}^∞ |f_k(x)| ≤ Σ_{k=n+1}^∞ M_k.
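For instance (an illustration assumed here, not taken from the notes), the series Σ_n sin(nx)/n² satisfies the hypotheses with M_n = 1/n², and the resulting uniform tail bound Σ_{k>n} M_k can be observed numerically:

```python
import math

def partial_sum(x, n):
    return sum(math.sin(k*x) / k**2 for k in range(1, n + 1))

N, big = 100, 20000
# tail bound sum_{k>N} M_k with M_k = 1/k^2; note it does not depend on x
tail_bound = sum(1.0 / k**2 for k in range(N + 1, big + 1))

# worst observed deviation of the partial sum S_N from a much longer sum,
# over several sample points x
worst = max(abs(partial_sum(x, big) - partial_sum(x, N))
            for x in (0.1, 1.0, 2.5, 3.0))
```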
Typically the power series will converge for some x and diverge for other x. We
could permit x to be a complex number and the entire work of this section holds
verbatim. However, for the present purposes we will assume that x ∈ R and that the
coefficients an ∈ R and that c ∈ R. To simplify formulae we will often work with
the case that c = 0 since we can always transform a given problem to this special case.
Example. Let a_n = 1/n!. The power series Σ_n a_n xⁿ = Σ_n xⁿ/n! is convergent for all x. To see this we use the ratio test and observe that

|x^{n+1}/(n+1)!| / |xⁿ/n!| = |x|/(n+1)

and that lim_{n→∞} |x|/(n+1) = 0 for any x.
Proof. Since Σ_n a_n x_0ⁿ is convergent there exists M > 0 such that, for all n, |a_n x_0ⁿ| ≤ M. Observe that

|a_n xⁿ| = |a_n x_0ⁿ| |x/x_0|ⁿ ≤ M |x/x_0|ⁿ.

The series Σ_n M |x/x_0|ⁿ is a geometric sum and so convergent whenever |x| < |x_0|. Consequently, by the comparison test, Σ_n a_n xⁿ is absolutely convergent.
Proof. Let A be the set of real numbers for which Σ_n a_n xⁿ is convergent and let r be the least upper bound of A. The series Σ_n a_n xⁿ is convergent whenever |x| < r. If |x| > r and Σ_n a_n xⁿ is convergent then this contradicts the definition of A, and so Σ_n a_n xⁿ is divergent for |x| > r.
In the above paragraphs we worked with the case c = 0 but all of these notions hold
for the general c ∈ R. Consequently Theorem 1.13 implies that the series is convergent
on an interval (c − r, c + r) = {x : |x − c| < r} but divergent when |x − c| > r.
The convergence of the series at x = c − r and x = c + r must be checked manually and can differ between the left and right end points.
Proof. Let |x| < R < r. Observe that the series is uniformly convergent for y ∈ [−R, R]. This means that f is continuous and so we can interchange limit and integral,

∫_0^x f(y) dy = Σ_{n=0}^∞ ∫_0^x a_n yⁿ dy = Σ_{n=0}^∞ (a_n/(n + 1)) x^{n+1}.
Theorem 1.16 (differentiating power series). Suppose that, for x ∈ (−r, r), the series f(x) = Σ_{n=0}^∞ a_n xⁿ is convergent. Then f is differentiable and

f′(x) = Σ_{n=1}^∞ n a_n x^{n−1},

convergent for x ∈ (−r, r).
Proof. Let |x| < R < r. Observe that

Σ_{n=1}^∞ n a_n x^{n−1} = Σ_{n=1}^∞ a_n Rⁿ · (n/R) · (x^{n−1}/R^{n−1}).

Since Σ_{n=1}^∞ a_n Rⁿ is absolutely convergent and (n/R)(|x|/R)^{n−1} is bounded we know that Σ_{n=1}^∞ n a_n x^{n−1} is absolutely convergent (comparison test). For convenience let g(x) = Σ_{n=1}^∞ n a_n x^{n−1} and observe that ∫_0^x g(y) dy = Σ_{n=1}^∞ a_n xⁿ = f(x) − a_0 (by Theorem 1.15). By the fundamental theorem of calculus this concludes the proof.
f(x) = Σ_{n=0}^∞ a_n (x − a)ⁿ
Theorem 1.17 (uniqueness of power series). Suppose that two power series are convergent and are equal in a neighbourhood of a in the sense that, for |x − a| < ϵ,

Σ_n a_n (x − a)ⁿ = Σ_n b_n (x − a)ⁿ = f(x).

Then the two series are equal term-by-term, i.e., a_n = b_n for every n ∈ N. Moreover,

a_n = b_n = f⁽ⁿ⁾(a)/n!.
Proof. The conclusion of Theorem 1.16 can be iterated and implies that f has derivatives of every order and, for k ∈ N,

f⁽ᵏ⁾(x) = k! a_k + Σ_{n=k+1}^∞ n(n−1)⋯(n−k+1) a_n (x − a)^{n−k}.

This means that f⁽ᵏ⁾(a) = k! a_k because all the terms in the sum vanish at x = a.
Observe how the coefficients in the Taylor series coincide with the formula obtained in the above results. Question: Does the Taylor series converge on the entire interval? In general, no. However we can calculate the radius of convergence of the power series. Question: If the Taylor series converges, is it equal to f(x) on the interval? In general it might not be, as seen in the following example.
Example. Let f(x) = e^{−1/x²}. If we proceed to calculate the Taylor series about x = 0 we obtain:

f(x) = exp(−x⁻²), f(0) = 0,
f′(x) = 2x⁻³ exp(−x⁻²), f′(0) = 0,
f″(x) = (−6x⁻⁴ + 4x⁻⁶) exp(−x⁻²), f″(0) = 0,
f‴(x) = 4(2x⁻⁹ − 9x⁻⁷ + 6x⁻⁵) exp(−x⁻²), f‴(0) = 0.

The Taylor series is consequently Σ_{n=0}^∞ 0 = 0. It does converge but has nothing to do with the original function.
Example. What is Taylor’s series for f (x) = ex ? Does differentiating this power
series correspond to expectations?
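A small check suggesting the answer (an addition to the text): the Taylor coefficients of eˣ are 1/n!, and the term-by-term derivative has coefficients (n+1) · 1/(n+1)! = 1/n!, i.e., the same coefficients, as expected since (eˣ)′ = eˣ:

```python
import math

coeffs = [1.0 / math.factorial(n) for n in range(40)]

# coefficients of the term-by-term derivative: (n+1) * a_{n+1}
deriv_coeffs = [(n + 1) * coeffs[n + 1] for n in range(39)]

x = 1.3
series_value = sum(c * x**n for n, c in enumerate(coeffs))
```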
Combining the integrals and integrating by parts we obtain the claimed statement for n + 1. Using the formula for E_n(x) which we have just proved, we estimate

|E_n(x)| ≤ (1/n!) ∫_a^x |x − y|ⁿ A^{n+1} dy ≤ (1/n!) rⁿ A^{n+1} r = rA (rA)ⁿ/n!.

Since (rA)ⁿ/n! → 0 as n → ∞ we have shown that |E_n(x)| → 0 as n → ∞.
Task 1.6.1. Find a function y(x) which satisfies the differential equation
(1 − x2 )y ′′ (x) = −2y(x)
and satisfies the initial conditions y(0) = 1, y ′ (0) = 1.
3. Consequently, by Theorem 1.17, 0 = 2a_n + (n+2)(n+1)a_{n+2} − n(n−1)a_n for each n ∈ N₀;
4. Equivalently a_{n+2} = ((n−2)/(n+2)) a_n;
5. Using the initial conditions, a_0 = y(0) = 1, a_1 = y′(0) = 1;
6. For the even coefficients:
▷ a_2 = ((0−2)/(0+2)) a_0 = −1,
▷ a_4 = ((2−2)/(2+2)) a_2 = 0,
▷ a_6 = ((4−2)/(4+2)) a_4 = 0, …;
7. For the odd coefficients:
▷ a_3 = ((1−2)/(1+2)) a_1 = −1/3,
▷ a_5 = ((3−2)/(3+2)) a_3 = (1/5)(−1/3), …
▷ in general a_{2n+1} = −1/((2n+1)(2n−1));
8. Formally we have the series solution

y(x) = 1 − x² − Σ_{n=0}^∞ x^{2n+1}/((2n+1)(2n−1)),     (1.1)

9. We see that this series is convergent for |x| < 1.
Consequently we have shown that the function defined above (1.1) is well-defined in the interval (−1, 1) and is a solution to the given differential equation.
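As a numerical sanity check (an addition, not from the notes): build the coefficients directly from the recurrence a_{n+2} = ((n−2)/(n+2)) a_n with a_0 = a_1 = 1, and verify that the truncated series satisfies (1 − x²)y″(x) = −2y(x) at a sample point inside (−1, 1):

```python
# coefficients from the recurrence a_{n+2} = (n-2)/(n+2) * a_n
N = 60
a = [0.0] * (N + 2)
a[0], a[1] = 1.0, 1.0
for n in range(N):
    a[n + 2] = (n - 2) / (n + 2) * a[n]

def y(x):
    return sum(a[n] * x**n for n in range(N + 2))

def ypp(x):
    # term-by-term second derivative of the power series
    return sum(n * (n - 1) * a[n] * x**(n - 2) for n in range(2, N + 2))

x = 0.5
residual = (1 - x*x) * ypp(x) + 2 * y(x)   # should vanish for a solution
```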
Chapter 2
Differential calculus in higher dimension
In this part of the course we start to consider higher dimensional space. That is, instead of R we consider Rⁿ for n ∈ N. We will particularly focus on 2D and 3D but everything also holds in any dimension. Going beyond R we have more options for functions and correspondingly more options for derivatives.
Various different notations are commonly used. Here we will primarily use (x, y) ∈ R², (x, y, z) ∈ R³ or, more generally, x = (x_1, x_2, …, x_n) ∈ Rⁿ where each x_k ∈ R.

Definition 2.1 (inner product). x · y = Σ_{k=1}^n x_k y_k ∈ R
We recall that the inner product being zero has a geometric meaning: it means that the two vectors are orthogonal. We also recall that the “length” of a vector is given by the norm, defined as follows.
Definition 2.2 (norm). ∥x∥ = √(x · x) = (Σ_{k=1}^n x_k²)^{1/2}.

For example, in R², ∥(x, y)∥ = √(x² + y²). There are various convenient properties for working with norms and inner products, in particular, the Cauchy-Schwarz inequality |x · y| ≤ ∥x∥ ∥y∥ and the triangle inequality ∥x + y∥ ≤ ∥x∥ + ∥y∥.
The primary higher-dimensional functions we consider in this course are:
Scalar fields: f : Rn → R
Vector fields: f : Rn → Rn
Paths: α : R → Rn
Change of coordinates: x : Rn → Rn
These possibilities all fit into the general pattern of f : Rⁿ → Rᵐ for n, m ∈ N but tradition and the use of each function give us different terminology and symbols. Such functions are useful for representing various practical things, for example: gravitational
force; temperature in a region; wind velocity; fluid flow; electric field; etc.
2.1 Open sets, closed sets, boundary, continuity
Let a ∈ Rn , r > 0. The open n-ball of radius r and centre a is written as
B(a, r) := {x ∈ Rn : ∥x − a∥ < r} .
Definition 2.4 (open set). A set S ⊂ Rn is said to be open if all of its points are
interior points, i.e., if int S = S.
For example, open intervals, open disks, open balls, unions of open intervals, etc.,
are all open sets.
Lemma. Let r > 0, a ∈ Rn . The set B(a, r) ⊂ Rn is open.
Proof. Let b ∈ B(a, r). It suffices to show that b is an interior point. (1) Let r1 =
∥b − a∥ < r. (2) Let r2 = (r − r1 )/2. (3) We claim that B(b, r2 ) ⊂ B(a, r): In
order to see this take any c ∈ B(b, r2 ) and observe that
∥c − a∥ ≤ ∥c − b∥ + ∥b − a∥ ≤ r_2 + r_1 = (r + r_1)/2 < r.
Observe that the radius of the ball will be small for points close to the boundary.
Figure 2.1: Interior points are the centre of a ball contained within the set.
Discussing the “interior” of the set naturally suggests the topic of the “boundary”
of the set. In the following definitions we develop this idea.
[Figure: a Cartesian product A_1 × A_2 shown in the (x_1, x_2)-plane.]
Observe that ext S is an open set. We use the notation S^c = Rⁿ \ S and we say that S^c is the complement of the set S.
Definition 2.7 (boundary). The set Rn \ (int S ∪ ext S) is called the boundary of
S ⊂ Rn and is denoted ∂S.
Definition 2.10 (continuous). A function f is said to be continuous at a if f is defined at a and lim_{x→a} f(x) = f(a). We say f is continuous on S if f is continuous at each point of S.
Even functions which look “nice” can fail to be continuous as we can see in the
following example.
Example (continuity in higher dimensions). Let f be defined, for (x, y) ≠ (0, 0), as

f(x, y) = xy/(x² + y²)

and f(0, 0) = 0. What is the behaviour of f when approaching (0, 0) along the following lines?

line        value
{x = 0}     f(0, t) = 0
{y = 0}     f(t, 0) = 0
{x = y}     f(t, t) = 1/2
{x = −y}    f(t, −t) = −1/2.
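These directional values are easy to confirm numerically (an added check, not from the notes):

```python
def f(x, y):
    return 0.0 if x == 0 and y == 0 else x*y / (x*x + y*y)

checks = []
for t in (0.1, 0.01, 0.001):
    checks.append((f(0.0, t), f(t, 0.0), f(t, t), f(t, -t)))
# each tuple is (0, 0, 1/2, -1/2) no matter how small t is,
# so f has no limit at the origin and is not continuous there
```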
Theorem 2.11. Suppose that limx→a f (x) = b and limx→a g(x) = c. Then
1. limx→a (f (x) + g(x)) = b + c,
2. limx→a λf (x) = λb for every λ ∈ R,
3. limx→a f (x) · g(x) = b · c,
4. limx→a ∥f (x)∥ = ∥b∥.
We prove a couple of the parts of the above theorem here; the other parts are left as exercises.
Proof of 3. Observe that f (x) · g(x) − b · c = (f (x) − b) · (g(x) − c) + b · (g(x) −
c) + c · (f (x) − b). By the triangle inequality and Cauchy-Schwarz,
∥f (x) · g(x) − b · c∥ ≤ ∥f (x) − b∥ ∥g(x) − c∥
+ ∥b∥ ∥g(x) − c∥
+ ∥c∥ ∥f (x) − b∥ .
Since we already know that ∥f (x) − b∥ → 0 and ∥g(x) − c∥ → 0 as x → a, this
implies that ∥f (x) · g(x) − b · c∥ → 0.
Proof of 4. Taking f = g in part 3 implies that lim_{x→a} ∥f(x)∥² = ∥b∥², and taking square roots gives the claim.
When writing a vector field (or similar functions) it is often convenient to divide the
higher-dimensional function into smaller parts. We call these parts the components of a
vector field. For example f (x) = (f1 (x), f2 (x)) in 2D, f (x) = (f1 (x), f2 (x), f3 (x))
in 3D, etc.
Theorem 2.12. Let f (x) = (f1 (x), f2 (x)). Then f is continuous if and only if f1 and
f2 are continuous.
For example, polynomials are continuous everywhere in Rⁿ. This is because they are the finite sum of products of continuous scalar fields.
Figure 2.3: Plot where colour represents the value of f(x, y) = x² + y². The change in f depends on direction.
Example. We can consider the scalar field f (x, y) = sin(x2 + y) + xy as the com-
position of functions.
2.2 Derivatives of scalar fields
We can imagine, for example in Figure 2.3, that in higher dimensions, the derivative of
a scalar field depends on the direction. This motivates the following.
Definition 2.14 (directional derivative). Let S ⊂ Rⁿ and f : S → R. For any a ∈ int S and v ∈ Rⁿ, ∥v∥ = 1, the directional derivative of f with respect to v is defined as

D_v f(a) = lim_{h→0} (1/h)(f(a + hv) − f(a)).
Theorem (mean value). Assume that D_v f(a + tv) exists for each t ∈ [0, 1]. Then, for some θ ∈ (0, 1),

f(a + v) − f(a) = D_v f(z), where z = a + θv.
The following notation is convenient. For any k ∈ {1, 2, …, n}, let e_k be the n-dimensional unit vector where all entries are zero except the kth position which is equal to 1. I.e., e_1 = (1, 0, …, 0), e_2 = (0, 1, 0, …, 0), …, e_n = (0, …, 0, 1).
In practice, to compute the partial derivative ∂f/∂x_k, one should consider all other x_j for j ≠ k as constants and take the derivative with respect to x_k. In a moment we will see this rigorously.
If f : R → R is differentiable, then we know that, when x is close to a,
f (x) ≈ f (a) + (x − a)f ′ (a).
More precisely, we know that¹ f(x) = f(a) + (x − a)f′(a) + ϵ(x − a) where |ϵ(x − a)| = o(|x − a|). This way of seeing differentiability is convenient for the higher dimensional definition of differentiability.
Definition 2.17 (gradient). The gradient of the scalar field f(x, y, z) at the point a is

∇f(a) = (∂f/∂x(a), ∂f/∂y(a), ∂f/∂z(a)).
In general, when working in Rⁿ for some n ∈ N, the gradient of the scalar field f(x_1, …, x_n) at the point a is

∇f(a) = (∂f/∂x_1(a), ∂f/∂x_2(a), …, ∂f/∂x_n(a)).
¹ This is little-o notation and here means that |f(x) − f(a) − (x − a)f′(a)| / |x − a| → 0 as |x − a| → 0.
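The gradient can always be sanity-checked against finite differences. A sketch (an addition to the notes) using the scalar field f(x, y) = sin(x² + y) + xy mentioned earlier; the step size and sample point are arbitrary:

```python
import math

def f(x, y):
    return math.sin(x*x + y) + x*y

def grad(x, y):
    # analytic gradient: (2x cos(x^2 + y) + y, cos(x^2 + y) + x)
    return (2*x*math.cos(x*x + y) + y, math.cos(x*x + y) + x)

h = 1e-6
x0, y0 = 0.7, -0.3
numeric = ((f(x0 + h, y0) - f(x0 - h, y0)) / (2*h),
           (f(x0, y0 + h) - f(x0, y0 - h)) / (2*h))
analytic = grad(x0, y0)
```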
Theorem 2.18. If f is differentiable at a then df a (v) = ∇f (a) · v. This means that,
for x ∈ B(a, r),
f (x) = f (a) + ∇f (a) · (x − a) + ϵ(x − a)
where |ϵ(x − a)| = o(∥x − a∥). Moreover, for any vector v, ∥v∥ = 1,
Dv f (a) = ∇f (a) · v.
Proof. Observe that |f (a + v) − f (a)| = |df a (v) + ϵ(v)|. This means that
|f (a + v) − f (a)| ≤ ∥df a ∥ ∥v∥ + |ϵ(v)|
and so this tends to 0 as ∥v∥ → 0.
Theorem 2.19. Suppose that f (x1 , . . . , xn ) is a scalar field. If the partial derivatives
∂1 f (x), . . . , ∂n f (x) exist for all x ∈ B(a, r) and are continuous at a then f is
differentiable at a.
Using the mean value theorem we know that there exists z_k = u_{k−1} + θ_k v_k e_k such that f(a + u_k) − f(a + u_{k−1}) = v_k D_{e_k} f(a + z_k). Consequently

f(a + v) − f(a) = Σ_{k=1}^n (f(a + u_k) − f(a + u_{k−1}))
= Σ_{k=1}^n v_k D_{e_k} f(a + z_k)
= Σ_{k=1}^n v_k D_{e_k} f(a + u_{k−1}) + Σ_{k=1}^n v_k (D_{e_k} f(a + z_k) − D_{e_k} f(a + u_{k−1})).

To conclude, observe that the second sum vanishes as ∥v∥ → 0 and that the first sum, Σ_{k=1}^n v_k D_{e_k} f(a + u_{k−1}), converges to v · ∇f(a).
Chain rule

When we are working in R we know that, if g and h are differentiable, then f(t) = g ∘ h(t) is also differentiable and f′(t) = g′(h(t)) h′(t). This is called the chain rule and is frequently very useful in calculating derivatives. We now investigate how this extends to higher dimensions.
Example. Suppose that α : R → R³ describes the position α(t) at time t and that f : R³ → R describes the temperature f(α) at a point α. The temperature at time t is equal to g(t) = f(α(t)). We want to calculate g′(t) because this is the change in temperature with respect to time.
In situations like the above example it is convenient to consider the derivative of a path α : R → Rⁿ. Let α : R → Rⁿ and suppose it has the form α(t) = (α_1(t), …, α_n(t)). We define the derivative as

α′(t) := (α_1′(t), …, α_n′(t)).

Here α′ is a vector-valued function which represents the “direction of movement”.
[Figure: a path α(t) with its tangent vector α′(t).]
Example. A particle moves in a circle and its position at time t ∈ [0, 2π] is given by
x(t) = (cos t, sin t).
[Figure: the path x(t) around the unit circle.]

The temperature at a point y = (y_1, y_2) is given by the function f(y) := y_1 + y_2. The temperature the particle experiences at time t is given by g(t) = f(x(t)). Temperature change: g′(t) = ∇f(x(t)) · x′(t) = (1, 1) · (−sin t, cos t) = cos t − sin t.
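This chain-rule computation can be verified with a finite-difference derivative of g (a check added here, not part of the notes):

```python
import math

def x_path(t):
    # position on the unit circle at time t
    return (math.cos(t), math.sin(t))

def f(y1, y2):
    # temperature at the point (y1, y2)
    return y1 + y2

def g(t):
    return f(*x_path(t))

h = 1e-6
# compare the numerical derivative of g with cos(t) - sin(t)
errors = [abs((g(t + h) - g(t - h)) / (2*h) - (math.cos(t) - math.sin(t)))
          for t in (0.0, 0.7, 2.0)]
```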
2.3 Level sets & tangent planes
Let S ⊂ R2 , f : S → R. Suppose c ∈ R and let
L(c) = {x ∈ S : f (x) = c} .
The set L(c) is called the level set. In general this set can be empty or it can be all of S. However the set L(c) is often a curve and this is the case of interest. This is the same notion as that of contour lines on a map. Suppose now that x : I → R² is a differentiable path which lies in the level set and passes through a, i.e., x(t_a) = a for some t_a ∈ I and

f(x(t)) = c

for all t ∈ I. Then

▷ ∇f(a) is normal to the curve at a,
▷ the tangent line at a is {x ∈ R² : ∇f(a) · (x − a) = 0}.
This is because the chain rule implies that ∇f (x(t)) · x′ (t) = 0.
Let f be a differentiable scalar field on S ⊂ R3 and suppose that the level set
L(c) = {x ∈ S : f (x) = c} defines a surface.
▷ The gradient ∇f (a) is normal to every curve α(t) in the surface which passes
through a,
▷ The tangent plane at a is {x ∈ R3 : ∇f (a) · (x − a) = 0}.
The same argument as in R² works in Rⁿ.
[Figure: the gradient ∇f(a) is normal to the level set L(c) at the point a, while α′(t) is tangent to a curve α(t) lying in L(c).]
Remark 2.21. If we use the notation f = (f1 , . . . , fm ), i.e., we write the function using
the “components” where each fk is a scalar field, then Dv f = (Dv f1 , . . . , Dv fm ).
Theorem 2.22. If f is differentiable at a then f is continuous at a and df a (v) =
Dv f (a).
2.5 Jacobian matrix & the chain rule
The relevant differential for higher-dimensional functions is the Jacobian matrix.
Definition 2.23 (Jacobian matrix). Suppose that f : R² → R² and use the notation f(x, y) = (f_1(x, y), f_2(x, y)). The Jacobian matrix of f at a is defined as

Df(a) = ( ∂f_1/∂x(a)  ∂f_1/∂y(a) )
        ( ∂f_2/∂x(a)  ∂f_2/∂y(a) ).
Differentiability of f at a means that f(x) = f(a) + Df(a)(x − a) + ϵ(x − a) where |ϵ(x − a)| = o(∥x − a∥). This is like a Taylor expansion in higher dimensions. Here we see that in higher dimensions we have a matrix form of the chain rule.
Theorem 2.24. Let S ⊂ Rl , T ⊂ Rm be open. Let f : S → T and g : T → Rn and
define
h = g ◦ f : S → Rn .
Let a ∈ S. Suppose that f is differentiable at a and g is differentiable at f (a). Then h
is differentiable at a and
Dh(a) = Dg(f (a)) Df (a).
Example (polar coordinates). Here we consider polar coordinates and calculate the Jacobian of this transformation. We can write the change of coordinates

(r, θ) ↦ (r cos θ, r sin θ)

as the function f(r, θ) = (x(r, θ), y(r, θ)) where f : (0, ∞) × [0, 2π) → R². We calculate the Jacobian matrix of this transformation:

Df(r, θ) = ( ∂x/∂r(r, θ)  ∂x/∂θ(r, θ) ) = ( cos θ  −r sin θ )
           ( ∂y/∂r(r, θ)  ∂y/∂θ(r, θ) )   ( sin θ   r cos θ ).

In particular we see that det Df(r, θ) = r, the familiar value used in change of variables with polar coordinates.
Suppose now that we wish to calculate derivatives of h := g ∘ f for some g : R² → R. Here we take advantage of Theorem 2.24:

Dh(r, θ) = Dg(f(r, θ)) Df(r, θ),

( ∂h/∂r(r, θ)  ∂h/∂θ(r, θ) ) = ( ∂g/∂x(f(r, θ))  ∂g/∂y(f(r, θ)) ) ( cos θ  −r sin θ )
                                                                  ( sin θ   r cos θ ).

In other words, we have shown that

∂h/∂r(r, θ) = ∂g/∂x(r cos θ, r sin θ) cos θ + ∂g/∂y(r cos θ, r sin θ) sin θ,
∂h/∂θ(r, θ) = −r ∂g/∂x(r cos θ, r sin θ) sin θ + r ∂g/∂y(r cos θ, r sin θ) cos θ.
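These formulas are easy to test with a concrete g; the choice g(x, y) = x²y below is an arbitrary sample, not from the notes:

```python
import math

def g(x, y):
    return x*x*y

def gx(x, y):          # dg/dx
    return 2*x*y

def gy(x, y):          # dg/dy
    return x*x

def h(r, th):
    return g(r*math.cos(th), r*math.sin(th))

r, th = 1.5, 0.8
x, y = r*math.cos(th), r*math.sin(th)

# the formulas derived above
dh_dr = gx(x, y)*math.cos(th) + gy(x, y)*math.sin(th)
dh_dth = -r*gx(x, y)*math.sin(th) + r*gy(x, y)*math.cos(th)

# compare against finite differences of h
d = 1e-6
num_dr = (h(r + d, th) - h(r - d, th)) / (2*d)
num_dth = (h(r, th + d) - h(r, th - d)) / (2*d)
```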
2.6 Implicit functions & partial derivatives

Just like with derivatives, we can take higher order partial derivatives. For convenience, when we want to write ∂/∂y ∂/∂x f(x, y), i.e., differentiate first with respect to x and then with respect to y, we write instead ∂²f/∂y∂x (x, y). The analogous notation is used for higher derivatives and any other choice of coordinates.

We first consider the question of when

∂²f/∂y∂x (x, y) ≟ ∂²f/∂x∂y (x, y).
Example (partial derivative problem). Let f : R² → R be defined as f(0, 0) = 0 and, for (x, y) ≠ (0, 0),

f(x, y) := xy(x² − y²)/(x² + y²).

We calculate that ∂²f/∂y∂x (0, 0) = −1 but ∂²f/∂x∂y (0, 0) = 1.
Theorem 2.25. Let f : S → R be a scalar field such that the partial derivatives ∂f/∂x, ∂f/∂y and ∂²f/∂y∂x exist on an open set S ⊂ R² containing x. Further assume that ∂²f/∂y∂x is continuous on S. Then the derivative ∂²f/∂x∂y (x) exists and

∂²f/∂x∂y (x) = ∂²f/∂y∂x (x).
Implicit                    Explicit
x² − y = 0                  y(x) = x²
x² + y² = 1                 y(x) = ±√(1 − x²), |x| ≤ 1
x² − y² − 1 = 0             y(x) = ±√(x² − 1), |x| ≥ 1
x² + y² − e^y − 4 = 0       A mess?
x²y⁴ − 3 = sin(xy)          A huge mess?
Given the above observation, the following method of calculating derivatives is sometimes useful. Suppose that some f : R² → R is given and we suppose there exists some y : R → R such that

f(x, y(x)) = 0 for all x.

Let h(x) := f(x, y(x)) and note that h′(x) = 0. Here we are using the idea that h = f ∘ g where g(x) = (x, y(x)). By the chain rule, h′(x) is equal to

(∂f/∂x(x, y(x))  ∂f/∂y(x, y(x))) (1, y′(x))ᵀ = 0.

Consequently, whenever ∂f/∂y(x, y(x)) ≠ 0,

y′(x) = − (∂f/∂x(x, y(x))) / (∂f/∂y(x, y(x))).
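A quick check of this formula on the circle x² + y² − 1 = 0 (so ∂f/∂x = 2x, ∂f/∂y = 2y and the formula gives y′(x) = −x/y), compared with differentiating the explicit branch y(x) = √(1 − x²):

```python
import math

x0 = 0.6
y0 = math.sqrt(1 - x0*x0)        # explicit branch y(x) = +sqrt(1 - x^2)

# implicit-differentiation formula: y'(x) = -(df/dx) / (df/dy) = -x/y
y_prime_implicit = -(2*x0) / (2*y0)

# numerical derivative of the explicit branch
h = 1e-6
y_prime_numeric = (math.sqrt(1 - (x0 + h)**2) - math.sqrt(1 - (x0 - h)**2)) / (2*h)
```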
Chapter 3
Extrema & other applications
In the previous chapter we introduced various notions of differentials for higher dimensional functions (scalar fields, vector fields, paths, etc.). In this chapter we now explore various applications of these notions and work with some of the implementations, rather than just the objects. Firstly we will consider certain partial differential equations which we now have the tools to solve. Then the majority of the chapter is devoted to searching for extrema (minima / maxima) in various different scenarios. This extends what we already know for functions in R and we will find that in higher dimensions many more possibilities and subtleties exist.
3.1 Partial differential equations
There are a huge number of different types of partial differential equations (PDEs) and here we consider just two types: first order linear PDEs and the 1D wave equation. We start by considering an example of the first type.
Example. Find all solutions of the PDE

3 ∂f/∂x(x, y) + 2 ∂f/∂y(x, y) = 0.

Solution. The given PDE is equivalent to (3, 2) · ∇f(x, y) = 0. We can also phrase this in terms of the directional derivative, namely

D_v f(x, y) = 0 where v = (3, 2).

This means that if a function f is a solution to the PDE then it is constant in the direction (3, 2). This means that all solutions have the form f(x, y) = g(2x − 3y) for some g : R → R.
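This solution form can be checked numerically; the choice g = sin below is arbitrary, and any differentiable g would do:

```python
import math

def g(s):
    return math.sin(s)   # sample choice of g

def f(x, y):
    return g(2*x - 3*y)

# the combination 3 df/dx + 2 df/dy should vanish at every point
h = 1e-6
residuals = []
for (x, y) in ((0.3, -1.2), (2.0, 0.5)):
    fx = (f(x + h, y) - f(x - h, y)) / (2*h)
    fy = (f(x, y + h) - f(x, y - h)) / (2*h)
    residuals.append(3*fx + 2*fy)
```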
The same idea as used for the above example gives the following general result.

Proof. First we prove (⇒). If f(x, y) = g(bx − ay) then, by the chain rule, ∂_x f(x, y) = b g′(bx − ay) and ∂_y f(x, y) = −a g′(bx − ay). Consequently a ∂_x f(x, y) + b ∂_y f(x, y) = ab g′(bx − ay) − ab g′(bx − ay) = 0.

Now we prove (⇐). It's convenient to work in coordinates which correspond to the lines along which the solutions are constant. Let (u, v) = (ax + by, bx − ay). This means that (x, y) = ((au + bv)/(a² + b²), (bu − av)/(a² + b²)). Let h(u, v) = f((au + bv)/(a² + b²), (bu − av)/(a² + b²)). We calculate that

∂_u h(u, v) = (1/(a² + b²)) (a ∂_x f + b ∂_y f)((au + bv)/(a² + b²), (bu − av)/(a² + b²)) = 0.

Namely, h(u, v) is a function of v only and does not depend on u, so we take g(v) = h(u, v) and so f(x, y) = g(bx − ay).
∂²f/∂t² (x, t) = c² ∂²f/∂x² (x, t).

Here x represents the position along the string, t is time and f(x, t) is the displacement of the string from the centre at position x, at time t. The constant c is a fixed parameter depending on the string.

This partial differential equation is derived from the equation of motion F = ma where F is the tension in the string, a is the acceleration from horizontal and m is the mass of a little piece of the string. The equation is valid for small displacements. In this case the boundary conditions are natural: Are the ends of the string fixed? Is only one end fixed? At time t = 0, is the string already moving?
Theorem 3.2. Let F be a twice differentiable function and G a differentiable function.
1. The function defined as
ˆ
x+ct
1 1
f (x, t) = (F (x + ct) + F (x − ct)) + G(s) ds (3.1)
2 2c
x−ct
∂2f ∂2f
satisfies ∂x2 (x, t) = c2 ∂t2 (x, t), f (x, 0) = F (x) and ∂f ∂t
(x, 0) = G(x).
2. Conversely, if a solution of
∂ 2f 2
2∂ f
(x, t) = c (x, t)
∂x2 ∂t2
∂2f ∂2f
satisfies ∂x∂t = ∂t∂x , then it has the above form (3.1).
Proof of part 1. Let f(x, t) be as defined in (3.1) in the statement of the theorem. We calculate the partial derivatives
∂f/∂x(x, t) = ½ (F′(x + ct) + F′(x − ct)) + (1/2c) (G(x + ct) − G(x − ct)),
∂²f/∂x²(x, t) = ½ (F″(x + ct) + F″(x − ct)) + (1/2c) (G′(x + ct) − G′(x − ct)),
∂f/∂t(x, t) = (c/2) (F′(x + ct) − F′(x − ct)) + ½ (G(x + ct) + G(x − ct)),
∂²f/∂t²(x, t) = (c²/2) (F″(x + ct) + F″(x − ct)) + (c/2) (G′(x + ct) − G′(x − ct)).
From this calculation we see that ∂²f/∂t²(x, t) = c² ∂²f/∂x²(x, t). Additionally we have f(x, 0) = F(x) and ∂f/∂t(x, 0) = G(x).
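The calculation above can be checked numerically. The following sketch (with an arbitrary choice of F, G and c) approximates the second derivatives of (3.1) by central finite differences and the integral of G by the trapezoid rule, then confirms the wave equation and the initial conditions up to discretisation error.

```python
import math

c = 2.0  # wave speed, an arbitrary choice for the check

def F(x):  # initial displacement
    return math.sin(x)

def G(x):  # initial velocity
    return math.cos(x)

def integral(g, lo, hi, n=1000):
    # trapezoid rule for the integral of G between the moving limits
    if lo == hi:
        return 0.0
    h = (hi - lo) / n
    s = 0.5 * (g(lo) + g(hi)) + sum(g(lo + k * h) for k in range(1, n))
    return s * h

def f(x, t):
    # d'Alembert's formula (3.1)
    return 0.5 * (F(x + c * t) + F(x - c * t)) \
        + integral(G, x - c * t, x + c * t) / (2 * c)

# approximate the second derivatives by central finite differences
h = 1e-3
x0, t0 = 0.7, 0.3
f_xx = (f(x0 + h, t0) - 2 * f(x0, t0) + f(x0 - h, t0)) / h**2
f_tt = (f(x0, t0 + h) - 2 * f(x0, t0) + f(x0, t0 - h)) / h**2

assert abs(f_tt - c**2 * f_xx) < 1e-3   # the wave equation holds
assert abs(f(x0, 0) - F(x0)) < 1e-12    # initial displacement
f_t0 = (f(x0, h) - f(x0, -h)) / (2 * h)
assert abs(f_t0 - G(x0)) < 1e-3         # initial velocity
```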
Proof of part 2. Suppose that f satisfies the 1D wave equation. Introduce u = x + ct, v = x − ct and observe that x = (u + v)/2, t = (u − v)/2c. Define g(u, v) = f((u + v)/2, (u − v)/2c). By the chain rule
∂g/∂u(u, v) = ½ ∂f/∂x((u + v)/2, (u − v)/2c) + (1/2c) ∂f/∂t((u + v)/2, (u − v)/2c),
∂²g/∂v∂u(u, v) = ¼ ∂²f/∂x²((u + v)/2, (u − v)/2c) − (1/4c) ∂²f/∂t∂x((u + v)/2, (u − v)/2c)
   + (1/4c) ∂²f/∂x∂t((u + v)/2, (u − v)/2c) − (1/4c²) ∂²f/∂t²((u + v)/2, (u − v)/2c) = 0.
Since this second derivative is zero we know that ∂g/∂u is constant in v, therefore we can write ∂g/∂u(u, v) = φ₀(u). In turn this means we can write g(u, v) = φ₁(u) + φ₂(v).
3.2 E x t r e m a ( m i n i m a / m a x i m a / s a d d l e )
Let S ⊂ Rn be open, f : S → R be a scalar field and a ∈ S.
Definition 3.3 (absolute min/max). If f (a) ≤ f (x) (resp. f (a) ≥ f (x)) for all
x ∈ S, then f (a) is said to be the absolute minimum (resp. maximum) of f .
Definition 3.4 (relative min/max). If f (a) ≤ f (x) (resp. f (a) ≥ f (x)) for all
x ∈ B(a, r) for some r > 0, then f (a) is said to be a relative minimum (resp.
maximum) of f .
Collectively we call these points the extrema of the scalar field. In the case of a
scalar field defined on R2 we can visualize the scalar field as a 3D plot like Figure 3.1.
Here we see the extrema as the “flat” places. We sometimes use global as a synonym of
absolute and local as a synonym of relative.
To proceed it is convenient to connect the extrema with the behaviour of the gradient
of the scalar field.
Proof. Suppose f has a relative minimum at a (or consider −f). For any unit vector v let g(u) = f(a + uv). We know that g : R → R has a relative minimum at u = 0 so g′(0) = 0. This means that the directional derivative Dᵥf(a) = 0 for every v. Consequently this means that ∇f(a) = 0.
Figure 3.1: 3D plot of a scalar field f(x, y); the extrema are visible as the “flat” places.
Observe that here and in the subsequent text, we can always consider the case of
f : R → R, i.e., the case of Rn where n = 1. Everything still holds and reduces to the
arguments and formulae previously developed for functions of one variable.
As we see in the example of Figure 3.2, the converse of Theorem 3.5 fails in the sense
that a stationary point might not be a minimum or a maximum. This motivates the
following.
The quintessential saddle has the shape seen in Figure 3.4. However it might be
similar to Figure 3.2 or more complicated using the possibilities available in higher
dimension.
Figure 3.2: graph of a function f(x) with a stationary point which is not an extremum.
Figure 3.3: If f(x, y) = x² + y² then ∇f(x, y) = (2x, 2y).
Figure 3.4: a saddle.
Definition 3.8 (Hessian matrix). Let f : R² → R be twice differentiable and use the notation f(x, y). The Hessian matrix at a ∈ R² is defined as

Hf(a) = ( ∂²f/∂x²(a)    ∂²f/∂x∂y(a) )
        ( ∂²f/∂y∂x(a)   ∂²f/∂y²(a)  ).

Observe that the Hessian matrix Hf(a) is a symmetric matrix since we know that ∂²f/∂x∂y(a) = ∂²f/∂y∂x(a) for twice differentiable functions (Theorem 2.25). The Hessian matrix is defined analogously in any dimension as follows. Let f : Rⁿ → R be twice differentiable. The Hessian matrix at a ∈ Rⁿ is defined as

Hf(a) = ( ∂²f/∂x₁²(a)     ∂²f/∂x₁∂x₂(a)  ⋯  ∂²f/∂x₁∂xₙ(a) )
        ( ∂²f/∂x₂∂x₁(a)   ∂²f/∂x₂²(a)    ⋯  ∂²f/∂x₂∂xₙ(a) )
        (      ⋮                ⋮        ⋱        ⋮       )
        ( ∂²f/∂xₙ∂x₁(a)   ∂²f/∂xₙ∂x₂(a)  ⋯  ∂²f/∂xₙ²(a)   )
Observe that the Hessian matrix is a real symmetric matrix in any dimension. If
f : R → R then Hf (a) is a 1 × 1 matrix and coincides with the second derivative of
f . In this sense what we know about extrema in R is just a special case of everything
we do here.
Lemma. If v = (v₁, …, vₙ)ᵗ then vᵗ Hf(a) v = Σⁿ_{j,k=1} ∂ⱼ∂ₖf(a) vⱼvₖ ∈ R.
The point (0, 0) is a stationary point since ∇f(0, 0) = (0, 0)ᵗ. In this example Hf
does not depend on (x, y) but in general we can expect dependence and so it gives a
different matrix at different points (x, y).
Theorem 3.9 (second order Taylor). Let f be a scalar field twice differentiable on B(a, r). Then, for x close to a,
f(x) ≈ f(a) + ∇f(a) · (x − a) + ½ (x − a)ᵗ Hf(a) (x − a)
in the sense that the error is o(∥x − a∥²). Here we use the convention that (x − a) is a column vector, equivalently an n × 1 matrix.
g″(u) = vᵗ Hf(a + uv) v.
Consequently f(a + v) = f(a) + ∇f(a) · v + ½ vᵗ Hf(a + cv) v for some c ∈ (0, 1). We define the “error” in the approximation as ϵ(v) = ½ vᵗ (Hf(a + cv) − Hf(a)) v and estimate that
|ϵ(v)| ≤ Σⁿ_{j,k=1} |vⱼvₖ| |∂ⱼ∂ₖf(a + cv) − ∂ⱼ∂ₖf(a)|.
Since |vⱼvₖ| ≤ ∥v∥², we observe that |ϵ(v)|/∥v∥² → 0 as ∥v∥ → 0 as required.
3.4 C l a s s i f y i n g s t a t i o n a r y p o i n t s
In order to classify the stationary points we will take advantage of the Hessian matrix
and therefore we need to first understand the following fact about real symmetric matrices.
Theorem 3.10. Let A be a real symmetric matrix and let Q(v) = vt Av. Then
Q(v) > 0 for all v ̸= 0 ⇐⇒ all eigenvalues of A are positive,
Q(v) < 0 for all v ̸= 0 ⇐⇒ all eigenvalues of A are negative.
Theorem 3.11 (classification of stationary points). Let f be a scalar field twice dif-
ferentiable on B(a, r). Suppose ∇f (a) = 0 and consider the eigenvalues of Hf (a).
Then
All eigenvalues are positive =⇒ relative minimum at a,
All eigenvalues are negative =⇒ relative maximum at a,
Some positive, some negative =⇒ a is a saddle point.
Since |ϵ(v)|/∥v∥² → 0 as ∥v∥ → 0, we have |ϵ(v)|/∥v∥² < Λ/2 when ∥v∥ is small, where Λ > 0 denotes the smallest eigenvalue of Hf(a). The argument is analogous for the second part. For the final part consider vⱼ, the eigenvector for the eigenvalue λⱼ, and apply the argument of the first or second part.
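Theorem 3.11 translates directly into a procedure: approximate the Hessian by finite differences, compute its eigenvalues, and read off the signs. A minimal sketch for scalar fields on R² (the 2×2 eigenvalues come from the trace and determinant):

```python
import math

def hessian(f, a, h=1e-4):
    # 2x2 Hessian of f at a by central finite differences
    x, y = a
    fxx = (f(x + h, y) - 2 * f(x, y) + f(x - h, y)) / h**2
    fyy = (f(x, y + h) - 2 * f(x, y) + f(x, y - h)) / h**2
    fxy = (f(x + h, y + h) - f(x + h, y - h)
           - f(x - h, y + h) + f(x - h, y - h)) / (4 * h**2)
    return fxx, fxy, fyy

def classify(f, a):
    fxx, fxy, fyy = hessian(f, a)
    # eigenvalues of the symmetric matrix [[fxx, fxy], [fxy, fyy]]
    tr, det = fxx + fyy, fxx * fyy - fxy**2
    disc = math.sqrt(max(tr**2 / 4 - det, 0.0))
    lam1, lam2 = tr / 2 - disc, tr / 2 + disc
    if lam1 > 0:
        return "relative minimum"   # all eigenvalues positive
    if lam2 < 0:
        return "relative maximum"   # all eigenvalues negative
    if lam1 < 0 < lam2:
        return "saddle point"       # mixed signs
    return "inconclusive"           # a zero eigenvalue: the test says nothing

assert classify(lambda x, y: x**2 + y**2, (0, 0)) == "relative minimum"
assert classify(lambda x, y: -x**2 - y**2, (0, 0)) == "relative maximum"
assert classify(lambda x, y: x**2 - y**2, (0, 0)) == "saddle point"
```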
3.5 A t t a i n i n g   e x t r e m e   v a l u e s
Here we explore the extreme value theorem for continuous scalar fields. The argument
will be in two parts: Firstly we show that continuity implies boundedness; Secondly
we show that boundedness implies that the maximum and minimum are attained.
We use the following notation for intervals / rectangles / cuboids / tesseracts, etc. If
a = (a1 , . . . , an ) and b = (b1 , . . . , bn ) then we consider the n-dimensional closed
Cartesian product
[a, b] = [a1 , b1 ] × · · · × [an , bn ].
We call this set a rectangle (independent of the dimension). As a first step it is convenient
to know that all sequences in our setting have convergent subsequences.
Proof. In order to prove the theorem we construct the subsequence. Firstly we divide
[a, b] into sub-rectangles of size half the original. We then choose a sub-rectangle
which contains infinitely many elements of the sequence and choose the first of these elements
to be part of the sub-sequence. We repeat this process by again dividing the sub-
rectangle we chose by half and choosing the next element of the subsequence. We
repeat to give the full subsequence.
Proof. Suppose the contrary: for all n ∈ N there exists xₙ ∈ [a, b] such that |f(xₙ)| > n. The Bolzano–Weierstrass theorem implies that there exists a subsequence {xₙⱼ}ⱼ which converges to some x ∈ [a, b]. Continuity of f means that f(xₙⱼ) converges to f(x), yet |f(xₙⱼ)| > nⱼ → ∞. This is a contradiction and hence the theorem is proved.
We can now use the above result on the boundedness in order to show that the
extreme values are actually obtained.
Theorem 3.14 (extreme value theorem). Suppose that f is a scalar field continuous at every point in the closed rectangle [a, b]. Then there exist points x, y ∈ [a, b] such that
f(x) = inf f and f(y) = sup f.
Proof. By the boundedness theorem sup f is finite and so there exists a sequence {xₙ}ₙ such that f(xₙ) converges to sup f. The Bolzano–Weierstrass theorem implies that there exists a subsequence {xₙⱼ}ⱼ which converges to some y ∈ [a, b]. By continuity f(xₙⱼ) → f(y) = sup f. The argument for the infimum is analogous.
Theorem 3.15 (Lagrange multipliers in 2D). Suppose that a differentiable scalar field
f (x, y) has a relative minimum or maximum when it is subject to the constraint
g(x, y) = 0.
Then there exists a scalar λ such that, at the extremum point,
∇f = λ∇g.
Figure: level sets f = c₁, c₂, c₃ together with the constraint curve g(x, y) = 0; at the constrained extremum the vectors ∇f and ∇g are parallel.
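As a numerical illustration of Theorem 3.15, take the hypothetical problem of extremising f(x, y) = xy subject to g(x, y) = x + y − 2 = 0; scanning along the constraint locates the extremum, and the gradients there are indeed parallel:

```python
# hypothetical problem: extremise f(x, y) = xy subject to g(x, y) = x + y - 2 = 0
def f(x, y):
    return x * y

# parametrise the constraint as (t, 2 - t) and scan for the maximum of f along it
best_t = max((t / 1000 for t in range(-2000, 4001)), key=lambda t: f(t, 2 - t))
x0, y0 = best_t, 2 - best_t
assert abs(x0 - 1) < 1e-3 and abs(y0 - 1) < 1e-3   # the extremum is at (1, 1)

# grad f = (y, x) and grad g = (1, 1); at (1, 1) they are parallel with lambda = 1
grad_f = (y0, x0)
grad_g = (1.0, 1.0)
lam = grad_f[0] / grad_g[0]
assert abs(grad_f[1] - lam * grad_g[1]) < 1e-3
```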
Theorem 3.16 (Lagrange multipliers in 3D). Suppose that a differentiable scalar field
f (x, y, z) has a relative minimum or maximum when it is subject to the constraints
g1 (x, y, z) = 0, g2 (x, y, z) = 0
and the ∇gk are linearly independent. Then there exist scalars λ1 , λ2 such that, at the
extremum point,
∇f = λ1 ∇g1 + λ2 ∇g2 .
In higher dimensions and possibly with additional constraints we have the following
general theorem.
The Lagrange multiplier method is often stated and far less often proved. Since the
proof is rather involved we will follow this tradition here. See, for example, Chapter 14
of “A First Course in Real Analysis” (2012) by Protter & Morrey for a complete proof
and further discussion.
Let us consider a particular case of the method when n = 3 and m = 2. More pre-
cisely we consider the following problem: Find the maxima and minima of f (x, y, z)
along the curve C defined as
g1 (x, y, z) = 0, g2 (x, y, z) = 0
where g1 , g2 are differentiable functions. In this particular case we will prove the
Lagrange multiplier method. Suppose that a is some point on the curve. Let α(t)
denote a path which lies in the curve C in the sense that α(t) ∈ C for all t ∈ (−1, 1),
α′ (t) ̸= 0 and α(0) = a. If a is a local minimum for f restricted to C it means that
f (α(t)) ≥ f (α(0)) for all t ∈ (−δ, δ) for some δ > 0. In words, moving away from
a along the curve C doesn’t cause f (x) to decrease. Let h(t) = f (α(t)) and observe
that h : R → R so we know how to find the extrema. In particular we know that
h′ (0) = 0. By the chain rule h′ (t) = ∇f (α(t)) · α′ (t) and so
∇f (a) · α′ (0) = 0.
Since we know that g1 (α(t)) = 0 and g2 (α(t)) = 0, again by the chain rule,
∇g1 (a) · α′ (0) = 0, ∇g2 (a) · α′ (0) = 0.
To proceed it is convenient to isolate the following result of linear algebra.
Chapter 4
C u r v e s   &   l i n e   i n t e g r a l s
Circle x² + y² = 4
Semi-circle x² + y² = 4, x ≥ 0
Ellipse x²/4 + y²/9 = 4
Line y = 5x + 2
Line (in 3D) x + 2y + 3z = 0, x = 4y
Parabola (in 3D) y = x², z = x
In the above list the curves are written in a way where we are describing a set of
points using a certain constraint or constraints, in some cases in implicit form, in some cases in explicit form. For example, for the circle we formally mean the set
{(x, y) : x2 + y 2 = 4}. We have the idea that the curves should be sets which
are single connected pieces and we vaguely have an idea that we need curves that are
sufficiently smooth. To proceed we need a precise definition of the 1D objects we can
work with. As part of the definition we force a structure which really allows us to work
with these objects in a useful way.
4.1 C u r v e s ,   p a t h s   &   l i n e   i n t e g r a l s
Let α : [a, b] → Rn be continuous. For convenience, in components we write
α(t) = (α1 (t), . . . , αn (t)). We say that α(t) is differentiable if each component
αk (t) is differentiable on [a, b] and αk′ (t) is continuous (Definition 2.16). We say that
α(t) is piecewise differentiable if [a, b] = [a, c1 ] ∪ [c1 , c2 ] ∪ · · · ∪ [cl , b] and α(t) is
differentiable on each of these intervals.
Note that different functions can trace out the same curve in different ways. Also
note that a path has an inherent direction. We say that this is a parametric representation
of a given curve. We already saw examples of paths in Figure 2.4 and Figure 2.5. A few
examples of paths are as follows.
Observe how some of these paths represent the same curve, perhaps traversed in a
different direction.
Let α(t) be a (piecewise differentiable) path on [a, b] and let f : Rⁿ → Rⁿ be a continuous vector field. Recall that we consider α′(t) and f(x) as n-vectors, i.e., in the case n = 2, α′(t) = (α₁′(t), α₂′(t))ᵗ and f(x) = (f₁(x), f₂(x))ᵗ.
Definition 4.2 (line integral of a vector field). The line integral of the vector field f along the path α is defined as
∫ f · dα = ∫_a^b f(α(t)) · α′(t) dt.
Sometimes the same integral is written as ∫_C f · dα to emphasize that the integral is along the curve C. Alternatively the integral is sometimes written as ∫ f₁ dα₁ + ⋯ + fₙ dαₙ or ∫ f₁ dx₁ + ⋯ + fₙ dxₙ. Each of these different notations is in common usage in different contexts but the underlying quantity is always the same.
Example. Consider the vector field f(x, y) = (√y, x³ + y)ᵗ and the path α(t) = (t², t³) for t ∈ (0, 1). Evaluate ∫ f · dα.
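Assuming the field in the example is f(x, y) = (√y, x³ + y)ᵗ, the integral can also be evaluated numerically; the exact value works out to 4/7 + 1/3 + 1/2 = 59/42. A sketch using the midpoint rule:

```python
import math

def f(x, y):
    # the vector field (sqrt(y), x**3 + y) from the example (as reconstructed)
    return (math.sqrt(y), x**3 + y)

def alpha(t):
    return (t**2, t**3)

def alpha_dot(t):
    return (2 * t, 3 * t**2)

def line_integral(t0, t1, n=20000):
    # midpoint rule for the integral of f(alpha(t)) . alpha'(t)
    h = (t1 - t0) / n
    total = 0.0
    for k in range(n):
        t = t0 + (k + 0.5) * h
        fx, fy = f(*alpha(t))
        dx, dy = alpha_dot(t)
        total += (fx * dx + fy * dy) * h
    return total

value = line_integral(0.0, 1.0)
assert abs(value - 59 / 42) < 1e-4
```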
As already mentioned, for a given curve there are many different choices of parametrization. For example, consider the curve C = {(x, y) : x² + y² = 1, y ≥ 0}. This is a semi-circle and two possible parametrizations are α(t) = (−t, √(1 − t²)), t ∈ [−1, 1] and β(t) = (cos t, sin t), t ∈ [0, π]. These are just two possibilities among many possible choices. For a given curve, to what extent does the line integral depend on the choice of parametrization?
Definition 4.3 (equivalent paths). We say that two paths α(t) and β(t) are equivalent
if there exists a differentiable function u : [c, d] → [a, b] such that α(u(t)) = β(t).
Furthermore, we say that α(t) and β(t) are
▷ in the same direction if u(c) = a and u(d) = b,
▷ in the opposite direction if u(c) = b and u(d) = a.
With this terminology we can precisely describe the dependence of the integral on
the choice of parametrization.
Theorem 4.4 (change of parametrization). Let f be a continuous vector field and let
α, β be equivalent paths. Then
∫ f · dα = ∫ f · dβ   if the paths are in the same direction,
∫ f · dα = −∫ f · dβ  if the paths are in the opposite direction.
Proof. Suppose that the paths are continuously differentiable, decomposing them into pieces if required. Since α(u(t)) = β(t) the chain rule implies that β′(t) = α′(u(t)) u′(t).
In particular
∫ f · dβ = ∫_c^d f(β(t)) · β′(t) dt = ∫_c^d f(α(u(t))) · α′(u(t)) u′(t) dt.
Changing variables (s = u(t)), and adding a minus sign if the paths are in the opposite direction because we need to swap the limits of integration, completes the proof.
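Theorem 4.4 is easy to test numerically: integrate the same field along a semicircle, along a reparametrisation in the same direction, and along the reversed path. A sketch (the field and paths are chosen arbitrarily):

```python
import math

def f(x, y):  # an illustrative vector field
    return (-y, x)

def integrate(path, n=20000):
    # midpoint rule for the line integral of f along path(t), t in [0, pi],
    # with the derivative of the path approximated by central differences
    h = math.pi / n
    dt = 1e-6
    s = 0.0
    for k in range(n):
        t = (k + 0.5) * h
        x, y = path(t)
        dx = (path(t + dt)[0] - path(t - dt)[0]) / (2 * dt)
        dy = (path(t + dt)[1] - path(t - dt)[1]) / (2 * dt)
        fx, fy = f(x, y)
        s += (fx * dx + fy * dy) * h
    return s

alpha = lambda t: (math.cos(t), math.sin(t))  # the semicircle
beta = lambda t: (math.cos(t**2 / math.pi), math.sin(t**2 / math.pi))  # same direction, different speed
gamma = lambda t: (math.cos(math.pi - t), math.sin(math.pi - t))       # opposite direction

Ia, Ib, Ic = integrate(alpha), integrate(beta), integrate(gamma)
assert abs(Ia - math.pi) < 1e-3    # direct computation gives pi
assert abs(Ib - Ia) < 1e-3         # equivalent path, same direction: equal
assert abs(Ic + Ia) < 1e-3         # opposite direction: the sign flips
```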
g′(t) = ∇h(α(t)) · α′(t) and evaluate the line integral
∫ ∇h · dα = ∫_0^1 ∇h(α(t)) · α′(t) dt = ∫_0^1 g′(t) dt = g(1) − g(0) = h(α(1)) − h(α(0)).
This equality has the following intuitive interpretation if we suppose for a moment
that h denotes altitude. In this case the line integral is the sum of all the infinitesimal
altitude changes and equals the total change in altitude.
As a first example of work in physics let's consider gravity. The gravitational field on earth is f(x, y, z) = (0, 0, −mg)ᵗ. If we move a particle from a = (a₁, a₂, a₃) to b = (b₁, b₂, b₃) along the path α(t), t ∈ [0, 1] then the work done is defined as ∫ f · dα. We calculate that
∫ f · dα = ∫_0^1 f(α(t)) · α′(t) dt = −mg ∫_0^1 α₃′(t) dt = mg (a₃ − b₃).
4.3 T h e s e c o n d f u n d a m e n t a l t h e o r e m
Recall that, if φ : R → R is differentiable then ∫_a^b φ′(t) dt = φ(b) − φ(a). This is
called the second fundamental theorem of calculus and is one of the ways in which we
see that differentiation and integration are opposites. The analog for line integrals is
the following.
Proof. Suppose that α(t), t ∈ [a, b] is differentiable. By the chain rule, d/dt φ(α(t)) = ∇φ(α(t)) · α′(t). Consequently
∫ ∇φ · dα = ∫_a^b ∇φ(α(t)) · α′(t) dt = ∫_a^b d/dt φ(α(t)) dt.
By the 2nd fundamental theorem in R we know that ∫_a^b d/dt φ(α(t)) dt = φ(α(b)) − φ(α(a)).
Example (potential energy). Our earth has mass M with centre at (0, 0, 0). Suppose that there is a small particle close to earth which has mass m. The force field of gravitation and the potential energy are, respectively,
f(x) = −GmM x / ∥x∥³,   φ(x) = GmM / ∥x∥.
We can calculate ∇φ(x) and see that it is equal to f(x).
4.4 T h e f i r s t f u n d a m e n t a l t h e o r e m
First we need to consider a basic topological property of sets. In particular we want
to avoid the possibility of the set being several disconnected pieces, in other words
we want to guarantee that we can get from one point to another in the set without ever leaving the set (see Figure 4.1).
Definition 4.6 (connected). The set S ⊂ Rⁿ is said to be connected if, for every pair of points a, b ∈ S, there exists a path α(t), t ∈ [0, 1] such that
▷ α(t) ∈ S for every t ∈ [0, 1],
▷ α(0) = a and α(1) = b.
Figure 4.1: A connected set S containing a path α(t) from a to b.
Recall that, if f : R → R is continuous and we let φ(x) = ∫_a^x f(t) dt then
φ′ (x) = f (x). This is called the first fundamental theorem of calculus and is the
other way in which we see that differentiation and integration are opposites. Again we
have an analog for the line integral but here it becomes a little more subtle since there
are many different paths along which we can integrate between any two points.
Moreover βₖ′(t) = eₖ. Consequently
∂φ/∂xₖ(x) = lim_{h→0} (1/h) (φ(x + heₖ) − φ(x)) = lim_{h→0} (1/h) ∫_0^h f(βₖ(t)) · eₖ dt = fₖ(x).
Definition 4.7 (closed path). We say a path α(t), t ∈ [a, b] is closed if α(a) = α(b).
Observe that, if α(t), t ∈ [a, b] is a closed path then we can divide it into two paths: let c ∈ [a, b] and consider the two paths α(t), t ∈ [a, c] and α(t), t ∈ [c, b]. On the other hand, suppose α(t), t ∈ [a, b] and β(t), t ∈ [c, d] are two paths starting at the same point and finishing at the same point. Then these can be combined to define a closed path (by following one backward).
Note that some authors call such a vector field a gradient (i.e., the vector field is
the gradient of some scalar). If f = ∇φ then the scalar field φ is called the potential
(associated to f). Observe that the potential is not unique: ∇φ = ∇(φ + C) for
any constant C ∈ R.
Theorem 4.9 (conservative vector fields). Let S ⊂ Rⁿ and consider the vector field f : S → Rⁿ. The following are equivalent:
(i) f is conservative, i.e., f = ∇φ on S for some φ,
(ii) ∫ f · dα does not depend on α, as long as the endpoints α(a) = a and α(b) = b are fixed,
(iii) ∫ f · dα = 0 for any closed path α contained in S.
Proof. In the previous theorems (the two fundamental theorems) we proved that (i) is
equivalent to (ii).
Now we prove that (ii) implies (iii): Let α(t) be a closed path and let β(t) be the same path in the opposite direction. By the change of parametrization ∫ f · dα = −∫ f · dβ but, since α and β have the same endpoints, (ii) gives ∫ f · dα = ∫ f · dβ, and so ∫ f · dα = 0.
It remains to prove that (iii) implies (ii): The two paths between a and b can be combined (with a minus sign) to give a closed path.
The above result is a special case of the following general statement which holds in
any dimension.
∫ f · dα = ∫_0^{2π} (sin² t + cos² t) dt = 2π.
Observe that in the above example S is somehow not a “nice” set because of the
“hole” in the middle. Moreover, observe that the line integral is the same for any circle,
independent of the radius.
Theorem 4.11 isn’t really useful in showing that a vector field is conservative because
it is possible for the mixed partial derivatives to all be equal and yet for the field to fail to be
conservative. On the other hand, if a pair of mixed derivatives is not equal then f is
not conservative and so it is useful for proving the negative. Later in this chapter we
will return to this topic.
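The example above with integral 2π is presumably the “vortex” field f(x, y) = (−y, x)/(x² + y²) on R² minus the origin; the following check confirms both that the mixed partials agree away from the origin and that the integral around the unit circle is 2π rather than 0, so this field cannot be conservative:

```python
import math

def f(x, y):
    # the vortex field (-y, x)/(x^2 + y^2), defined away from the origin
    r2 = x * x + y * y
    return (-y / r2, x / r2)

# mixed partial derivatives agree at any point away from the origin
h = 1e-6
x0, y0 = 1.3, -0.4
d_f2_dx = (f(x0 + h, y0)[1] - f(x0 - h, y0)[1]) / (2 * h)
d_f1_dy = (f(x0, y0 + h)[0] - f(x0, y0 - h)[0]) / (2 * h)
assert abs(d_f2_dx - d_f1_dy) < 1e-6

# yet the line integral around the unit circle is 2*pi, not 0
n = 20000
total = 0.0
for k in range(n):
    t = (k + 0.5) * 2 * math.pi / n
    fx, fy = f(math.cos(t), math.sin(t))
    total += (fx * -math.sin(t) + fy * math.cos(t)) * 2 * math.pi / n
assert abs(total - 2 * math.pi) < 1e-6
```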
4.5 P o t e n t i a l s   &   c o n s e r v a t i v e   v e c t o r   f i e l d s
We now turn our attention to the following question: Suppose we are given a vector
field f and we know that f = ∇φ for some φ. How can we find φ? For this we consider
two methods in the following paragraphs. First we describe the method which we call constructing a potential by line integrals. Fix a = (a₁, a₂) and, for x = (x₁, x₂), integrate along the horizontal segment α₁ from (a₁, a₂) to (x₁, a₂) followed by the vertical segment α₂ from (x₁, a₂) to (x₁, x₂).
Let α(t) denote the concatenation of the two paths. We calculate that
∫ f · dα = ∫_{a₁}^{x₁} f(α₁(t)) · α₁′(t) dt + ∫_{a₂}^{x₂} f(α₂(t)) · α₂′(t) dt.
This means that φ(x) = ∫_{a₁}^{x₁} f₁(t, a₂) dt + ∫_{a₂}^{x₂} f₂(x₁, t) dt.
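A sketch of this first method on a hypothetical conservative field f = (2xy, x²)ᵗ, whose potential is φ(x, y) = x²y; the two straight segments are integrated with a midpoint rule:

```python
# hypothetical conservative field f = (2xy, x^2), with potential phi(x, y) = x^2 * y
def f1(x, y):
    return 2 * x * y

def f2(x, y):
    return x * x

def quad(g, lo, hi, n=2000):
    # midpoint rule
    h = (hi - lo) / n
    return sum(g(lo + (k + 0.5) * h) for k in range(n)) * h

def potential(x1, x2, a1=0.0, a2=0.0):
    # horizontal segment (a1, a2) -> (x1, a2), then vertical segment -> (x1, x2)
    return quad(lambda t: f1(t, a2), a1, x1) + quad(lambda t: f2(x1, t), a2, x2)

assert abs(potential(1.5, 2.0) - 1.5**2 * 2.0) < 1e-6
assert abs(potential(-0.5, 3.0) - (-0.5)**2 * 3.0) < 1e-6
```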
Now we describe a different method which we describe as constructing a potential by indefinite integrals. Again suppose that f = ∇φ for some scalar field φ(x, y) which we wish to find. Observe that ∂φ/∂x = f₁ and ∂φ/∂y = f₂. This means that
∫_a^x f₁(t, y) dt + A(y) = φ(x, y) = ∫_b^y f₂(x, t) dt + B(x)
where A(y), B(x) are constants of integration. Calculating and comparing we can then obtain a formula for φ(x, y).
Example. Find a potential for f(x, y) = (eˣy², 2eˣy + 1)ᵗ on R².
This extra property permits the following sufficient condition for a vector field to
be conservative.
Sketch of proof. We have already proved that f being conservative implies the equality of partial derivatives (Theorem 4.11) and therefore we need only assume that ∂ₖfₗ = ∂ₗfₖ and construct a potential. Let φ(x) = ∫ f · dα where α(t) = tx, t ∈ [0, 1]. Since α′(t) = x, φ(x) = ∫_0^1 f(tx) · x dt. Also (this needs proving)
∂φ/∂xₖ(x) = ∫_0^1 (t ∂ₖf(tx) · x + fₖ(tx)) dt.
This is equal to ∫_0^1 (t ∇fₖ(tx) · x + fₖ(tx)) dt because ∂ₖfₗ = ∂ₗfₖ. The integrand is the derivative of g(t) = t fₖ(tx), so by the fundamental theorem of calculus the integral is equal to g(1) − g(0) = fₖ(x) as required.
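The radial construction φ(x) = ∫_0^1 f(tx) · x dt from the sketch of proof can be tried on a concrete (hypothetical) field, f = (y cos(xy), x cos(xy))ᵗ, whose potential is sin(xy):

```python
import math

# hypothetical field f = (y cos(xy), x cos(xy)), whose potential is sin(xy)
def f(x, y):
    return (y * math.cos(x * y), x * math.cos(x * y))

def quad01(g, n=5000):
    # midpoint rule on [0, 1]
    h = 1.0 / n
    return sum(g((k + 0.5) * h) for k in range(n)) * h

def phi(x, y):
    # the radial construction: phi(x) = integral over t in [0,1] of f(tx) . x
    return quad01(lambda t: f(t * x, t * y)[0] * x + f(t * x, t * y)[1] * y)

assert abs(phi(1.0, 0.7) - math.sin(0.7)) < 1e-6
assert abs(phi(0.5, -1.2) - math.sin(-0.6)) < 1e-6
```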
The above gives us a useful tool to check if a given vector field is conservative. Using the idea of “gluing together” several convex regions this result can be manually extended to some more general settings. Later, in Theorem 5.7, we will take advantage of some further ideas in order to significantly extend this result.
Proof. If y(x) satisfies φ(x, y(x)) = C, then by the chain rule and the fact that ∇φ = (p, q)ᵗ, we see that p(x, y(x)) + y′(x) q(x, y(x)) = 0. Conversely, if y(x) is a solution, φ(x, y(x)) must be constant in x.
4.6 L i n e i n t e g r a l s o f s c a l a r f i e l d s
Up until now this chapter has been devoted to line integrals of vector fields but there
is also the obvious question of defining the line integral for scalar fields. This we do
now. Such a line integral allows us also to define the length of a curve in a meaningful
way. Let α(t), t ∈ [a, b] be a path in Rn and let f : Rn → R.
Definition 4.15 (line integral of a scalar field). The line integral of the scalar field f along the path α is defined as
∫ f dα = ∫_a^b f(α(t)) ∥α′(t)∥ dt.
This integral shares the same basic properties as the line integral of a vector field and the proofs are essentially the same. Namely it is linear and also respects how a path can be decomposed or joined with other paths without changing the value of the integral. Moreover, the value of the integral along a given path is independent of the choice of parametrization of the curve. In this case, even if the curve is parametrized in the opposite direction the integral takes the same value. Consequently it makes sense to define the length of the curve as the line integral of the unit scalar field, i.e., the length of a curve parametrized by the path α is ∫_a^b ∥α′(t)∥ dt.
As a simple application, consider that the path represents a wire and the wire has density f(α(t)) at the point α(t). Then the mass of the wire is equal to ∫ f dα.
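The length formula is straightforward to evaluate numerically; as a sanity check, a circle of radius 2 should have length 4π:

```python
import math

def arc_length(path, t0, t1, n=20000):
    # length = integral of ||alpha'(t)||, with the derivative taken by
    # central differences and the integral by the midpoint rule
    h = (t1 - t0) / n
    dt = 1e-6
    total = 0.0
    for k in range(n):
        t = t0 + (k + 0.5) * h
        dx = (path(t + dt)[0] - path(t - dt)[0]) / (2 * dt)
        dy = (path(t + dt)[1] - path(t - dt)[1]) / (2 * dt)
        total += math.hypot(dx, dy) * h
    return total

circle = lambda t: (2 * math.cos(t), 2 * math.sin(t))  # radius 2
assert abs(arc_length(circle, 0, 2 * math.pi) - 4 * math.pi) < 1e-4
```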
Chapter 5
M u l t i p l e   i n t e g r a l s
The extension to higher dimensions of differentiation was established in the previous chapters. We then defined line integrals which are, in a sense, one-dimensional integrals which exist in a higher-dimensional setting. We now take the next step and define higher-dimensional integrals in the sense of how to integrate a scalar field defined on a subset of Rⁿ. The first step will be to rigorously define which scalar fields are integrable and to define the integral. Then we need to find reasonable ways to evaluate such integrals. Among other applications we will use these multiple integrals to calculate volumes and moments of inertia. In Green's theorem we find a connection between multiple integrals and line integrals. We also develop the important topic of change of variables which takes advantage of the Jacobian determinant and is often invaluable for actually working with a given problem.
Figure 5.1: A partition of the rectangle R = [a₁, b₁] × [a₂, b₂] into sub-rectangles [xⱼ, xⱼ₊₁] × [yₖ, yₖ₊₁].
Observe that the value of the integral is independent of the partition, as long as the
function is constant on each sub-rectangle. In this sense the integral is well-defined
(not dependent on the choice of partition used to calculate it).
Theorem 5.1 (basic properties of the integral). Let f, g be step functions. Then
∬_R (af + bg) dxdy = a ∬_R f dxdy + b ∬_R g dxdy for all a, b ∈ R,
∬_R f dxdy = ∬_{R₁} f dxdy + ∬_{R₂} f dxdy if R is divided into R₁ and R₂,
∬_R f dxdy ≤ ∬_R g dxdy if f(x, y) ≤ g(x, y).
We are now in the position to define the set of integrable functions. In order
to define integrability we take advantage of “upper” and “lower” integrals which
“sandwich” the function we really want to integrate.
All the basic properties of the integral of step functions, as stated in Theorem 5.1, also hold for the integral of any integrable function. This can be shown by considering the limiting procedure of the upper and lower integrals of step functions which are part of the definition of integrability.
5.2 E v a l u a t i o n o f m u l t i p l e i n t e g r a l s
Now that we have a definition we can rigorously work with integrals, but it is essential to also have a way to practically evaluate any given integral.
Theorem (evaluating by repeated integration). Let f be a bounded integrable function on R = [a₁, b₁] × [a₂, b₂]. Suppose that, for every y ∈ [a₂, b₂], the integral A(y) = ∫_{a₁}^{b₁} f(x, y) dx exists. Then ∫_{a₂}^{b₂} A(y) dy exists and
∬_R f(x, y) dxdy = ∫_{a₂}^{b₂} ( ∫_{a₁}^{b₁} f(x, y) dx ) dy.
Proof. Let g ≤ f ≤ h be step functions. Then ∫_{a₁}^{b₁} g(x, y) dx and ∫_{a₁}^{b₁} h(x, y) dx are step functions (in y) and so A(y) is integrable. Moreover,
∫_{a₂}^{b₂} ∫_{a₁}^{b₁} g(x, y) dx dy ≤ ∫_{a₂}^{b₂} A(y) dy ≤ ∫_{a₂}^{b₂} ∫_{a₁}^{b₁} h(x, y) dx dy.
This both proves the existence of ∫_{a₂}^{b₂} A(y) dy and gives the value of the integral.
The conditions of the above theorem aren’t immediately easy to check and so it is
convenient to now investigate the integrability of continuous functions.
Proof. Continuity implies boundedness and so the upper and lower integrals exist. Let ϵ > 0. By (uniform) continuity there exists δ > 0 such that |f(x) − f(y)| ≤ ϵ whenever ∥x − y∥ ≤ δ. We can choose a partition such that ∥x − y∥ ≤ δ whenever x, y are in the same sub-rectangle Qⱼₖ. We then define the step functions g, h such that g(x) = inf_{Qⱼₖ} f, h(x) = sup_{Qⱼₖ} f when x ∈ Qⱼₖ. To finish the proof we observe that sup_{Qⱼₖ} f − inf_{Qⱼₖ} f ≤ ϵ and ϵ > 0 can be made arbitrarily small.
This integral naturally allows us to calculate the volume of a solid. Let f, g with f(x, y) ≤ g(x, y) be defined on the rectangle R ⊂ R² and consider the 3D set defined as
V = {(x, y, z) : (x, y) ∈ R, f(x, y) ≤ z ≤ g(x, y)}.
The volume of the set V is equal to Vol(V) = ∬_R [g(x, y) − f(x, y)] dxdy.
Up until now we have considered step functions and continuous functions. Clearly
we can permit some discontinuities and we introduce the following concept to be
able to control the functions with discontinuities sufficiently to guarantee that the
integrals are well-defined.
Definition (content zero set). A bounded subset A ⊂ R2 is said to have content zero
if, for every ϵ > 0, there exists a finite set of rectangles whose union includes A and
the sum of the areas of the rectangles is not greater than ϵ.
Examples of content zero sets include: finite sets of points; bounded line segments; graphs of continuous functions (see below).
Figure 5.4: The graph of a continuous function has content zero.
Theorem. Let f be a bounded function on R and suppose that the set of discontinuities A ⊂ R has content zero. Then the double integral ∬_R f(x, y) dxdy exists.
Proof. Take a cover of A by rectangles with total area not greater than δ > 0. Let P be a partition of R which is finer than the cover of A. We may assume that sup_{Qⱼₖ} f − inf_{Qⱼₖ} f ≤ ϵ on each sub-rectangle of the partition which doesn't contain a discontinuity of f. The contribution to the integral of the bounding step functions from the cover of A is bounded by δ sup|f|.
5.3 R e g i o n s   b o u n d e d   b y   f u n c t i o n s
A major limitation is that we have only integrated over rectangles whereas we would like to integrate over more generally shaped regions. This we develop now.
Suppose S ⊂ R and f is a bounded function on S. We extend f to R by defining
fR(x, y) = f(x, y) if (x, y) ∈ S, and fR(x, y) = 0 otherwise.
We use this notation in the following definition.
Figure 5.5: a region bounded by the graphs of φ₁ and φ₂ over [a, b].
Suppose that there are continuous functions φ1 , φ2 on R and consider the set (see
Figure 5.5)
S = {(x, y) : a ≤ x ≤ b, φ1 (x) ≤ y ≤ φ2 (x)} ⊂ R2 .
Not all sets can be written in this way but many can and such a way of describing a
subset of R2 is convenient for evaluating integrals. Observe that we could also consider
the following set
S = {(x, y) : a ≤ y ≤ b, φ1 (y) ≤ x ≤ φ2 (y)} .
In the first case we could describe the representation as projecting along the y-coordinate
whereas in the second case we are projecting along the x-coordinate. Observe that it
doesn’t make a difference to the integral if we use < or ≤ in the definition of S since
the difference would be a content zero set.
Theorem. Suppose that φ is a continuous function on [a, b]. Then the graph {(x, y) : x ∈ [a, b], y = φ(x)} has zero content.
Proof. By continuity, for every ϵ > 0, there exists δ > 0 such that |φ(x) − φ(y)| ≤ ϵ
whenever |x − y| ≤ δ. We then take partition of [a, b] into subintervals of length less
than δ. Using this partition we generate a cover of the graph which has area not greater
than 2ϵ |b − a|.
Figure 5.6: an upside-down cone with base of radius 5 in the plane {z = 5} and tip at the origin.
Theorem 5.4. Let S = {(x, y) : x ∈ [a, b], φ₁(x) ≤ y ≤ φ₂(x)} where φ₁, φ₂ are continuous and let f be a bounded continuous function on S. Then f is integrable on S and
∬_S f(x, y) dxdy = ∫_a^b ( ∫_{φ₁(x)}^{φ₂(x)} f(x, y) dy ) dx.
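Theorem 5.4 gives a direct recipe: integrate in y between φ₁(x) and φ₂(x), then integrate in x. A sketch, checked on the area between y = 0 and y = x² over [0, 1], which is ∫_0^1 x² dx = 1/3:

```python
def integrate_type1(f, a, b, phi1, phi2, n=500):
    # iterated midpoint rules: inner integral in y, outer integral in x
    hx = (b - a) / n
    total = 0.0
    for j in range(n):
        x = a + (j + 0.5) * hx
        lo, hi = phi1(x), phi2(x)
        hy = (hi - lo) / n
        total += sum(f(x, lo + (k + 0.5) * hy) for k in range(n)) * hy * hx
    return total

# area between y = 0 and y = x^2 over [0, 1]
area = integrate_type1(lambda x, y: 1.0, 0, 1, lambda x: 0.0, lambda x: x * x)
assert abs(area - 1 / 3) < 1e-5
```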
A similar result holds for type 2 regions but with x and y swapped. For higher
dimensions we need to also have an understanding of how to represent subsets of Rⁿ. Take for example a 3D solid: we would hope to be able to “project” along one of the coordinate axes and so describe it using the 2D “shadow” and a pair of continuous functions. For example, consider the upside-down cone of Figure 5.6 which has base of radius 5 lying in the plane {z = 5} and has tip at the origin. In order to describe this set it is convenient to imagine how it projects down onto the xy-plane. We then
describe it as
V = {(x, y, z) : (x, y) ∈ S, γ₁(x, y) ≤ z ≤ γ₂(x, y)}
where S ⊂ R² is the “shadow” and the functions γ₁, γ₂ represent the control we need in the z-coordinate.
5.4 A p p l i c a t i o n s o f m u l t i p l e i n t e g r a l s
Multiple integrals can be used to calculate the area or volume of a given set. Suppose
that
S = {(x, y) : x ∈ [a, b], φ1 (x) ≤ y ≤ φ2 (x)} ⊂ R2
where φ₁, φ₂ are continuous functions. Then the area of S is
∬_S dxdy = ∫_a^b ( ∫_{φ₁(x)}^{φ₂(x)} dy ) dx = ∫_a^b [φ₂(x) − φ₁(x)] dx.
The total mass would then be M = Σₖ mₖ and the centre of mass is the point (p, q) such that
pM = Σₖ mₖxₖ and qM = Σₖ mₖyₖ.
Suppose an object has the shape of a region S and the density of the material is f(x, y) at the point (x, y). Then, similar to the discrete case above, the total mass is M = ∬_S f(x, y) dxdy and the centre of mass is the point (p, q) such that
pM = ∬_S x f(x, y) dxdy and qM = ∬_S y f(x, y) dxdy.
By tradition, if the density is constant, then the centre of mass is called the centroid.
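As a worked (hypothetical) example, the centroid of the region under y = x² over [0, 1] with constant density is (3/4, 3/10); the iterated-integral formulas above reproduce this:

```python
def region_integral(g, n=400):
    # iterated midpoint integral of g over S = {0 <= x <= 1, 0 <= y <= x^2}
    h = 1.0 / n
    total = 0.0
    for j in range(n):
        x = (j + 0.5) * h
        hy = x * x / n
        total += sum(g(x, (k + 0.5) * hy) for k in range(n)) * hy * h
    return total

M = region_integral(lambda x, y: 1.0)     # mass (density 1), exactly 1/3
p = region_integral(lambda x, y: x) / M   # centre of mass (the centroid here)
q = region_integral(lambda x, y: y) / M
assert abs(p - 3 / 4) < 1e-3
assert abs(q - 3 / 10) < 1e-3
```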
5.5 G r e e n ’ s   t h e o r e m
We can now establish a connection between multiple integrals and the line integrals of
the previous chapter.
Proof of Green’s theorem. To start we assume that S is a type 1 region and that Q = 0. Since S = {(x, y) : x ∈ [a, b], φ₁(x) ≤ y ≤ φ₂(x)},
∬_S (∂Q/∂x − ∂P/∂y) dxdy = ∫_a^b ( ∫_{φ₁(x)}^{φ₂(x)} (−∂P/∂y) dy ) dx = ∫_a^b (P(x, φ₁(x)) − P(x, φ₂(x))) dx,
It is then natural to choose four paths α₁(t) = (t, φ₁(t)), α₂(t) = (a, t), α₃(t) = (t, φ₂(t)), α₄(t) = (b, t). We can calculate that ∫_C f · dα = ∫ f · dα₁ − ∫ f · dα₃ = ∫_a^b P(t, φ₁(t)) dt − ∫_a^b P(t, φ₂(t)) dt, since the vertical segments contribute nothing when f = (P, 0)ᵗ. If S is also type 2 then the same argument works for P = 0 and linearity means it works for f = (P, 0)ᵗ + (0, Q)ᵗ. More general regions can be formed by “gluing” together simpler regions of the above type to complete the argument.
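Green's theorem can be sanity-checked on f = (P, Q) = (−y, x) over the unit disc: ∂Q/∂x − ∂P/∂y = 2, so the double integral is 2 · π · 1² = 2π, which should equal the line integral around the unit circle:

```python
import math

# f = (P, Q) = (-y, x): dQ/dx - dP/dy = 2, so the double integral over the
# unit disc is 2 times the disc's area
double_integral = 2 * math.pi * 1.0**2

# line integral of f around the unit circle, alpha(t) = (cos t, sin t)
n = 20000
line = 0.0
for k in range(n):
    t = (k + 0.5) * 2 * math.pi / n
    x, y = math.cos(t), math.sin(t)
    dx, dy = -math.sin(t), math.cos(t)
    line += (-y * dx + x * dy) * 2 * math.pi / n

assert abs(line - double_integral) < 1e-9
```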
The quantity ∂Q/∂x − ∂P/∂y is reminiscent of something we saw with conservative vector
is reminiscent of something we saw with conservative vector
fields and we take advantage of this with the following application. We previously
introduced the concept of connected sets but now we need a slight refinement of the
idea.
The following result extends Theorem 4.13 which was limited to convex sets.
Theorem 5.7 (conservative vector fields on simply connected regions). Let S be a simply connected region and suppose that f = (P, Q) is a vector field, continuously differentiable on S. Then f is conservative if and only if ∂Q/∂x = ∂P/∂y.
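As an illustration, the following sketch (assuming SymPy; the field is chosen arbitrarily) checks the condition ∂Q/∂x = ∂P/∂y for f = (2xy, x²) and recovers a potential, which exists since the whole plane is simply connected.

```python
import sympy as sp

x, y = sp.symbols('x y')
P, Q = 2*x*y, x**2        # illustrative field f = (P, Q) on the (simply connected) plane

# the condition of the theorem: ∂Q/∂x = ∂P/∂y (both equal 2x here)
print(sp.diff(Q, x), sp.diff(P, y))  # 2*x 2*x

# so f is conservative; a potential can be recovered by integrating P in x
phi = sp.integrate(P, x)             # x**2*y
print(sp.diff(phi, x) == P, sp.diff(phi, y) == Q)  # True True
```

Here integrating P in x already produces a full potential; in general a y-dependent constant of integration must be determined from Q.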
For the 2D case we have the following result.
Theorem 5.8 (change of variable in 2D). Suppose that (u, v) ↦ (X(u, v), Y (u, v))
maps T to S one-to-one and X, Y are continuously differentiable. Then
∬_S f (x, y) dxdy = ∬_T f (X(u, v), Y (u, v)) |J(u, v)| dudv.
Polar coordinates
Polar coordinates correspond to the coordinate mapping
x = r cos θ
y = r sin θ.
In this case the Jacobian determinant is
|J(r, θ)| = |det( cos θ, sin θ ; −r sin θ, r cos θ )| = r(cos² θ + sin² θ) = r.
Consequently, the change of variable in the integral gives that
∬_S f (x, y) dxdy = ∬_T f (r cos θ, r sin θ) r drdθ.
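A classic use of this formula is the integral of e^{−x²−y²} over the unit disc, which has no elementary antiderivative in Cartesian coordinates but is immediate in polar form. A sketch assuming SymPy:

```python
import sympy as sp

r, th = sp.symbols('r theta', positive=True)

# ∬_S e^{−x²−y²} dxdy over the unit disc becomes ∬_T e^{−r²} · r drdθ
val = sp.integrate(sp.exp(-r**2) * r, (r, 0, 1), (th, 0, 2*sp.pi))
print(sp.simplify(val))   # equals π(1 − 1/e)
```

The extra factor r is exactly the Jacobian determinant computed above; without it the answer would be wrong.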
Linear transformations
In this case the coordinate mapping is
x = Au + Bv
y = Cu + Dv
where A, B, C, D ∈ R are chosen fixed. The Jacobian determinant is equal to
|J(u, v)| = |det( A, B ; C, D )| = |AD − BC| .
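A quick check of this determinant for one arbitrary choice of A, B, C, D (a sketch assuming SymPy):

```python
import sympy as sp

u, v = sp.symbols('u v')
A, B, C, D = 2, 1, 1, 3            # arbitrary fixed linear map, AD − BC = 5

X, Y = A*u + B*v, C*u + D*v
J = sp.Matrix([X, Y]).jacobian([u, v])
print(J.det(), abs(A*D - B*C))  # 5 5
```

For a linear map the Jacobian is constant, which is why areas scale by the fixed factor |AD − BC|.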
Extension to higher dimensions
The exact analog of Theorem 5.8 holds in any dimension. In particular, in 3D, if we
consider the change of variables (u, v, w) ↦ (X(u, v, w), Y (u, v, w), Z(u, v, w)), then ∭_S f (x, y, z) dxdydz is equal to

∭_T f (X(u, v, w), Y (u, v, w), Z(u, v, w)) |J(u, v, w)| dudvdw

where J(u, v, w) is now the Jacobian matrix of dimension (3 × 3).
Cylindrical coordinates
Cylindrical coordinates correspond to the mapping (require r > 0, 0 ≤ θ ≤ 2π)

x = r cos θ
y = r sin θ
z = z.
Spherical coordinates
Spherical coordinates correspond to how we use latitude, longitude and altitude to specify a position on earth. It is the coordinate mapping (require ρ > 0, 0 ≤ θ ≤ 2π, 0 ≤ φ < π)
x = ρ cos θ sin φ
y = ρ sin θ sin φ
z = ρ cos φ.
In this case the Jacobian determinant is

|J(ρ, θ, φ)| = |det( cos θ sin φ,   sin θ sin φ,   cos φ
                     −ρ sin θ sin φ, ρ cos θ sin φ, 0
                     ρ cos θ cos φ,  ρ sin θ cos φ, −ρ sin φ )| = |−ρ² sin φ| = ρ² sin φ.
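This determinant can be reproduced mechanically: build the coordinate map, take its Jacobian matrix, and simplify. A sketch assuming SymPy:

```python
import sympy as sp

rho, th, ph = sp.symbols('rho theta phi', positive=True)
X = rho * sp.cos(th) * sp.sin(ph)
Y = rho * sp.sin(th) * sp.sin(ph)
Z = rho * sp.cos(ph)

J = sp.Matrix([X, Y, Z]).jacobian([rho, th, ph])
det = sp.simplify(J.det())
# det equals −ρ²·sin φ, so |J| = ρ² sin φ since sin φ ≥ 0 for φ ∈ [0, π]
print(det)
```

The same two lines with the cylindrical map (r cos θ, r sin θ, z) produce the cylindrical Jacobian determinant r.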
Chapter 6

Surface integrals

In this section we consider surfaces and how to define integrals of vector fields over
these surfaces. This is similar in many ways to line integrals but a higher dimensional
version. Curves (for line integrals) are 1D subsets of higher dimensional space whereas
surfaces are 2D subsets of higher dimensional space. Identically to line integrals, the
first step is to understand a practical way to represent the surfaces, just like with curves
we used paths as the parametric representation of the curve. Once we have clarified
the parametric representation of a surface we can define the surface integral (of a vector
field) and show that it satisfies various properties which we would expect, including
that the integral is independent of the choice of parametrization. Similar to how we
were able to use a line integral (of a scalar) to calculate the length of a curve we can use
a surface integral (of a scalar) to calculate the area of a surface.
We then introduce two important operators that act on vector fields, namely curl
and divergence. Using these operators and the surface integral we introduce two theorems, Gauss’ Theorem and Stokes’ Theorem. These theorems connect line integrals
with surface integrals and with volume integrals.
In a similar way, now in 2D we can have a parametric representation of a hemisphere.
Definition (regular point). If (u, v) is a point in T at which ∂r/∂u and ∂r/∂v are continuous and the fundamental vector product is non-zero then r(u, v) is said to be a regular point for that representation.
Just like we saw with paths to represent curves, there are many different ways we
can find the parametric representation of a given surface. If the surface S has the
form z = f (x, y) (the surface is written in explicit form) then we can use x, y as the
parameters and have the representation
r(x, y) = (x, y, f (x, y)) , (x, y) ∈ T.
The region T is the projection of S onto the xy-plane. For such a surface we compute
∂r/∂x = (1, 0, ∂x f),   ∂r/∂y = (0, 1, ∂y f),
and consequently
∂r/∂x × ∂r/∂y = (1, 0, ∂x f) × (0, 1, ∂y f) = (−∂x f, −∂y f, 1).
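This fundamental vector product can be verified symbolically for any concrete f; the paraboloid f (x, y) = x² + y² below is an arbitrary illustrative choice (sketch assuming SymPy).

```python
import sympy as sp

x, y = sp.symbols('x y')
f = x**2 + y**2                     # illustrative explicit surface z = f(x, y)

r = sp.Matrix([x, y, f])
n = sp.diff(r, x).cross(sp.diff(r, y))
print(n.T)  # Matrix([[-2*x, -2*y, 1]]), i.e. (−∂x f, −∂y f, 1)
```

Note the third component is always 1, so for an explicit surface the fundamental vector product never vanishes: every point is regular.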
An example of such a representation is as follows for the hemisphere.
Example (hemisphere representation 2). Let T = [0, 2π] × [0, π/2] and let
r(u, v) = (cos u cos v, sin u cos v, sin v).
The surface r(T ) is the unit hemisphere {(x, y, z) : x² + y² + z² = 1, z ≥ 0}. This is the
representation which is connected to spherical coordinates. We calculate that
∂r/∂u (u, v) = (−sin u cos v, cos u cos v, 0),   ∂r/∂v (u, v) = (−cos u sin v, −sin u sin v, cos v),
and so the fundamental vector product of this representation is
(∂r/∂u × ∂r/∂v)(u, v) = cos v · r(u, v).
In this case many points map to the north pole (0, 0, 1) and the north pole is not a
regular point. Additionally there are two points which map to each point on the line
between equator and north pole {(x, y, z) ∈ r(T ) : y = 0}.
6.2 Surface integral of a scalar field
Mirroring the process for line integrals we will define surface integrals both for scalar
fields and for vector fields. The surface integral of a scalar field is closely related to the
area of a parametric surface, just like the length of a curve is closely related to the line
integral of a scalar field.
Definition 6.2 (area of a parametric surface). The area of the parametric surface S = r(T ) is defined as the double integral

Area(S) = ∬_T ‖∂r/∂u × ∂r/∂v‖ dudv.
Observe that the definition is in terms of a multiple integral over the region T , and
the quantity being integrated is the norm of the fundamental vector product.
Later we will show that Area(S) is independent of the choice of representation, as we require for such a definition; it would be unreasonable if the area of a surface depended on the choice of representation.
We will check that this definition corresponds to a fact that we already know by
computing the surface area of a hemisphere. Let, as before, T = [0, 2π]×[0, π/2] and
let r(u, v) = (cos u cos v, sin u cos v, sin v). The norm of the fundamental vector
product (which we computed earlier) is
‖(∂r/∂u × ∂r/∂v)(u, v)‖ = cos v ∥r(u, v)∥ = cos v.
This means, by Definition 6.2 and evaluating the multiple integral, that
Area(S) = ∬_T cos v dudv = ∫_0^{2π} ∫_0^{π/2} cos v dv du = 2π.
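The same computation can be reproduced symbolically, including the earlier claim that the fundamental vector product is cos v · r(u, v); a sketch assuming SymPy:

```python
import sympy as sp

u, v = sp.symbols('u v')
r = sp.Matrix([sp.cos(u)*sp.cos(v), sp.sin(u)*sp.cos(v), sp.sin(v)])

# fundamental vector product, and the claim that it equals cos v · r(u, v)
fvp = sp.diff(r, u).cross(sp.diff(r, v))
delta = fvp - sp.cos(v)*r
print(all(sp.simplify(e) == 0 for e in delta))   # True

# its norm is cos v, which is non-negative on [0, π/2], so
area = sp.integrate(sp.cos(v), (v, 0, sp.pi/2), (u, 0, 2*sp.pi))
print(area)  # 2*pi
```

The value 2π is half the area 4π of the unit sphere, as expected for a hemisphere.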
The surface integral of a scalar field is defined in a way similar to the area of a surface.
Definition 6.3 (surface integral). Let S = r(T ) be a parametric surface and let f be a scalar field defined on S. The surface integral of f over S is defined as

∬_{r(T)} f dS = ∬_T f (r(u, v)) ‖(∂r/∂u × ∂r/∂v)(u, v)‖ dudv.
Observe that, if we choose f ≡ 1, that is we choose the scalar field identically equal
to 1, then we obtain the formula for the area of the surface (Definition 6.2). This is
just the same as the line integral of a scalar and the length of the corresponding curve.
From the point of view of applications, we could take f as the density of thin material which has the shape of the surface S and then ∬_S f dS is the total mass of
this piece of material. Extending this idea we could also calculate the centre of mass of
this piece of material.
6.3 Change of surface parametrization
In order to validate the definition of a surface integral and consequently that of the area of a surface, we will now show that the value of the evaluated integral doesn’t depend on the choice of representation for any given surface.
Theorem 6.4 (change of surface parametrization). Suppose that q(A) and r(B) are both representations of the same surface, and that r = q ◦ G for some differentiable G : B → A. Then

∬_A (f ◦ q) ‖∂q/∂s × ∂q/∂t‖ dsdt = ∬_B (f ◦ r) ‖∂r/∂u × ∂r/∂v‖ dudv.
Proof. Since r(u, v) = q(S(u, v), T (u, v)) we calculate (chain rule and vector product) that

(∂r/∂u × ∂r/∂v)(u, v) = (∂q/∂s × ∂q/∂t)(S(u, v), T (u, v)) · (∂S/∂u ∂T/∂v − ∂S/∂v ∂T/∂u)(u, v).
Figure 6.1: Two different representations for a given surface.
Observe that ∂S/∂u ∂T/∂v − ∂S/∂v ∂T/∂u is the Jacobian determinant associated to the change of variables (u, v) ↦ (S(u, v), T (u, v)). Consequently, by the change of variables theorem,

∬_A (f ◦ q) ‖∂q/∂s × ∂q/∂t‖ dsdt = ∬_B (f ◦ r) ‖∂r/∂u × ∂r/∂v‖ dudv
6.4 Surface integral of a vector field
In preparation for defining the surface integral of a vector field we need the notion of
the normal vector of a surface. This is a natural geometric notion, for each point in
the surface it is the unit vector field which is orthogonal to the surface.
Definition 6.5 (normal vector). Let S = r(T ) be a parametric surface. At each regular point the two unit normals are

n1 = (∂r/∂u × ∂r/∂v) / ‖∂r/∂u × ∂r/∂v‖   and   n2 = −n1.
By definition ∥n1 ∥ = ∥n2 ∥ = 1. That there are two normal vectors is expected
because there are two sides to the surface at each point, one is just the opposite direction
to the other. If f is a vector field then f · n is the component of the flow in direction
of n.
Definition 6.7 (curl). The curl of f is defined as

∇ × f = (∂fz/∂y − ∂fy/∂z,  ∂fx/∂z − ∂fz/∂x,  ∂fy/∂x − ∂fx/∂y).
Often the notation curl f = ∇ × f and div f = ∇ · f is used instead. Note that the symbols “×” and “·” used in the notation for curl and divergence are not truly representing the vector and scalar product but are more a convenient way to remember the definitions. These quantities satisfy the following basic properties which can all be proved by direct calculation.
▷ If f = ∇φ then ∇ × f = 0,
▷ ∇ · (∇ × f ) = 0,
▷ ∇ × (∇ × f ) = ∇(∇ · f ) − ∇2 f .
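These identities can be spot-checked symbolically; the helper functions and the fields below are arbitrary illustrative choices (sketch assuming SymPy).

```python
import sympy as sp

x, y, z = sp.symbols('x y z')

def curl(F):
    """Curl of a 3-component field, following the definition above."""
    Fx, Fy, Fz = F
    return sp.Matrix([sp.diff(Fz, y) - sp.diff(Fy, z),
                      sp.diff(Fx, z) - sp.diff(Fz, x),
                      sp.diff(Fy, x) - sp.diff(Fx, y)])

def div(F):
    Fx, Fy, Fz = F
    return sp.diff(Fx, x) + sp.diff(Fy, y) + sp.diff(Fz, z)

# a gradient field has zero curl
phi = x**2 * sp.sin(y) * z
grad = sp.Matrix([sp.diff(phi, s) for s in (x, y, z)])
print(curl(grad).T)                  # Matrix([[0, 0, 0]])

# the divergence of a curl vanishes
F = sp.Matrix([x*y*z, sp.cos(x) + y**2, z*x])
print(sp.simplify(div(curl(F))))     # 0
```

Both identities reduce to the equality of mixed second partial derivatives, which is why they hold for any twice continuously differentiable field.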
The quantity defined as ∇²φ = ∇ · (∇φ) = ∂²φ/∂x² + ∂²φ/∂y² + ∂²φ/∂z² is called the Laplacian
and occurs in many applications of physics and mathematics.
Example. If f (x, y, z) = (x, y, z) then ∇ × f = 0, ∇ · f = 3.

Example. If f (x, y, z) = (−y, x, 0) then ∇ × f = (0, 0, 2), ∇ · f = 0.
The above result implies Theorem 5.7 (the 2D vector fields can be written as 3D
vector fields with a zero component).
6.6 Theorems of Stokes and Gauss
Theorem 6.10 (Stokes). Let S = r(T ) be a parametric surface. Suppose that T is
simply connected and that the boundary of T is mapped to C, the boundary of S. Let β
be a counter clockwise parametrization of the boundary of T and let α(t) = r(β(t)).
Then

∬_S (∇ × f ) · n dS = ∫_C f · dα.
Sketch of proof. Write f = (fx, fy, fz) and suppose that fy = fz = 0. This effectively reduces the full problem to the lower dimensional version that we previously considered.
As such, we can then apply Green’s theorem (Theorem 5.5). Finally we conclude for
general f by linearity of the integral.
Just as Green’s Theorem holds for regions which can contain holes, as long as they
are correctly accounted for, we can extend Stokes’ theorem to more general surfaces
with the idea of “cutting and gluing” the surface. In particular this allows the extension
to surfaces with holes, cylinders, spheres, etc. On the other hand the theorem can’t be
extended to the Möbius band because the topology of this surface prevents a similar
process being completed.
Theorem 6.11 (Gauss). Let V ⊂ R3 be a solid with boundary the parametric surface S
and let n be the outward normal unit vector. If f is a vector field then
∭_V ∇ · f dxdydz = ∬_S f · n dS.
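As a sanity check, take f (x, y, z) = (x, y, z) on the unit ball, for which ∇ · f = 3 and f · n = 1 on the unit sphere. A sketch assuming SymPy, using spherical coordinates (volume element ρ² sin φ, surface element sin φ on the unit sphere):

```python
import sympy as sp

rho, th, ph = sp.symbols('rho theta phi', positive=True)

# left side: ∭_V ∇·f dxdydz with f = (x, y, z), so ∇·f = 3, over the unit ball
lhs = sp.integrate(3 * rho**2 * sp.sin(ph),
                   (rho, 0, 1), (ph, 0, sp.pi), (th, 0, 2*sp.pi))

# right side: on the unit sphere n = (x, y, z) itself, so f · n = 1 and the
# surface integral is just the area of the sphere
rhs = sp.integrate(sp.sin(ph), (ph, 0, sp.pi), (th, 0, 2*sp.pi))

print(lhs, rhs)  # 4*pi 4*pi
```

Both sides equal 4π: three times the volume of the unit ball coincides with the area of the unit sphere.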
extended to general solids). We then use basic calculus to express fx as the integral of
the derivative.
Stokes’ Theorem allows us to connect surface integrals (2D) to line integrals (1D).
On the other hand Gauss’ Theorem allows us to connect volume integrals (3D) to
surface integrals (2D). In this way they are similar to each other: the integral decreases dimension and also there is the loss of a derivative. Indeed the fundamental
theorem of calculus for line integral also fits into this same pattern. The branch of
mathematics called “differential geometry” provides a framework in which all these
results can be described in a unified way by the statement
∫_{∂Ω} ω = ∫_Ω dω.
This result is called the “generalized Stokes theorem”.
Note that Gauss’ Theorem is often called the “divergence theorem”. We can use
this theorem for the following interpretation of divergence as a limit, similar to the
way other versions of derivatives are defined.
Theorem. Let Vt be the ball of radius t > 0 centred at a ∈ R3 and let St be its
boundary with outgoing unit normal vector n. Then
(∇ · f )(a) = lim_{t→0} (1 / Vol(Vt)) ∬_{St} f · n dS.
Curl can also be written as a similar limit. Given the similarity of all the terms, it is
not unexpected that there is a relation between curl and divergence with the Jacobian
matrix. Recall that

Jac(f ) = ( ∂fx/∂x  ∂fx/∂y  ∂fx/∂z
            ∂fy/∂x  ∂fy/∂y  ∂fy/∂z
            ∂fz/∂x  ∂fz/∂y  ∂fz/∂z ).
We can immediately see that divergence is the trace of the Jacobian matrix. In order to
see the connection with curl, recall that every real matrix A can be written as the sum
of a symmetric matrix ½(A + Aᵀ) and a skew-symmetric matrix ½(A − Aᵀ). In this case we have that

½(Jac(f ) − Jac(f )ᵀ) = ½ ( 0,                 ∂fx/∂y − ∂fy/∂x,  ∂fx/∂z − ∂fz/∂x
                            ∂fy/∂x − ∂fx/∂y,  0,                ∂fy/∂z − ∂fz/∂y
                            ∂fz/∂x − ∂fx/∂z,  ∂fz/∂y − ∂fy/∂z,  0 )
and we can see that the terms of the skew-symmetric part of the matrix are exactly the terms of curl.
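This correspondence can be checked entry by entry for a concrete field (an arbitrary choice; sketch assuming SymPy):

```python
import sympy as sp

x, y, z = sp.symbols('x y z')
F = sp.Matrix([x*y, y*z**2, sp.sin(x)])   # illustrative field

J = F.jacobian([x, y, z])
skew = (J - J.T) / 2

# curl components, from the definition of curl
cx = sp.diff(F[2], y) - sp.diff(F[1], z)
cy = sp.diff(F[0], z) - sp.diff(F[2], x)
cz = sp.diff(F[1], x) - sp.diff(F[0], y)

# each off-diagonal entry of the skew part is ±(1/2) times a curl component
print(sp.simplify(2*skew[2, 1] - cx),
      sp.simplify(2*skew[0, 2] - cy),
      sp.simplify(2*skew[1, 0] - cz))  # 0 0 0
```

Correspondingly, the symmetric part ½(Jac(f) + Jac(f)ᵀ) carries the divergence (as its trace) and the stretching information of the field.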