
Mathematical Analysis 2

Course notes 2022/23

By everyone who contributed.

github.com/oliver-butterley/ma2
This document is covered by Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0).

You are free to share this work (copy and redistribute the material in any medium or format)
and adapt this work (remix, transform, and build upon the material for any purpose, even
commercially), under the obligation of attribution (you must give appropriate credit) and
share-alike (if you remix, transform, or build upon the material, you must distribute your
contributions under the same license as the original).

This text may contain errors, inaccuracies and misleading ideas; the reader takes full responsibility for the consequences. Any resemblance to actual persons, living or dead, events or localities is entirely coincidental.

Typeset: 19th November 2022

Git commit: 9744bcf

Source available at: github.com/oliver-butterley/ma2

Preface

This text accompanies the course "Mathematical Analysis 2" taught at the University of Rome Tor Vergata in the department of engineering for the academic year 2022/23. The course was led by Oliver Butterley, in collaboration with Giovanni Canestrari.
The aim of this document is to concisely describe the fundamental details related to the material of the course. They are aptly named "notes" and are most likely not a comprehensive source of all relevant information. We have easy access to a huge volume of resources and so here we will make connections to whatever is useful, whenever we can.
These notes are merely written text whereas the central part of the course remains
the time spent working with the material, be it doing exercises, discussing, doing
calculations, etc. This is not text for memorising; it is text that aims to help us practise and become stronger thinkers.
This text is freely1 available at github.com/oliver-butterley/ma2. Everyone is
encouraged to contribute improvements to the document during the progress of the
course.
Some of the text comes from previous years and from many other sources; some of the text came to be during the course. The current version is the product of many people, in particular everyone who made suggestions in class and pointed out errors or imprecisions, and everyone who suggested useful additional content.

¹ Free both in the sense of "free speech" and "free beer".

Contents
Preface iii

Introduction vii

Sequences & series of functions 1


Convergence & continuity . . . . . . . . . . . . . . . . . . . . . . . . . 1
Power series . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
Radius of convergence . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
Integrating & differentiating power series . . . . . . . . . . . . . . . . . 8
Uniqueness & Taylor series . . . . . . . . . . . . . . . . . . . . . . . . 9
Power series & differential equations . . . . . . . . . . . . . . . . . . . . 12

Differential calculus in higher dimension 15


Open sets, closed sets, boundary, continuity . . . . . . . . . . . . . . . . 16
Derivatives of scalar fields . . . . . . . . . . . . . . . . . . . . . . . . . 21
Level sets & tangent planes . . . . . . . . . . . . . . . . . . . . . . . . 27
Derivatives of vector fields . . . . . . . . . . . . . . . . . . . . . . . . . 29
Jacobian matrix & the chain rule . . . . . . . . . . . . . . . . . . . . . . 30
Implicit functions & partial derivatives . . . . . . . . . . . . . . . . . . 32

Extrema & other applications 35


Partial differential equations . . . . . . . . . . . . . . . . . . . . . . . . 35
Extrema (minima / maxima / saddle) . . . . . . . . . . . . . . . . . . . 38
Hessian matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
Classifying stationary points . . . . . . . . . . . . . . . . . . . . . . . . 44
Attaining extreme values . . . . . . . . . . . . . . . . . . . . . . . . . . 45
Extrema with constraints (Lagrange multipliers) . . . . . . . . . . . . . 46

Curves & line integrals 49
Curves, paths & line integrals . . . . . . . . . . . . . . . . . . . . . . . 50
Basic properties of the line integral . . . . . . . . . . . . . . . . . . . . 51
The second fundamental theorem . . . . . . . . . . . . . . . . . . . . . 54
The first fundamental theorem . . . . . . . . . . . . . . . . . . . . . . 54
Potentials & conservative vector fields . . . . . . . . . . . . . . . . . . . 58
Line integrals of scalar fields . . . . . . . . . . . . . . . . . . . . . . . . 61

Multiple integrals 63
Definition of the integral . . . . . . . . . . . . . . . . . . . . . . . . . 63
Evaluation of multiple integrals . . . . . . . . . . . . . . . . . . . . . . 66
Regions bounded by functions . . . . . . . . . . . . . . . . . . . . . . 68
Applications of multiple integrals . . . . . . . . . . . . . . . . . . . . . 71
Green’s theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
Change of variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74

Surface integrals 79
Representation of a surface . . . . . . . . . . . . . . . . . . . . . . . . 79
Surface integral of scalar field . . . . . . . . . . . . . . . . . . . . . . . 82
Change of surface parametrization . . . . . . . . . . . . . . . . . . . . . 83
Surface integral of a vector field . . . . . . . . . . . . . . . . . . . . . . 84
Curl and divergence . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
Theorems of Stokes and Gauss . . . . . . . . . . . . . . . . . . . . . . 87
Introduction

We start by looking at examples which demonstrate some of the motives behind studying analysis in general.

Example (Series). The geometric series S = 1 + 1/2 + 1/4 + 1/8 + 1/16 + ··· can be summed by the following simple trick. Multiplying by 2 we obtain that

    2S = 2 + 1 + 1/2 + 1/4 + 1/8 + 1/16 + ··· = 2 + S

and so S = 2. If we try to do the same to the sum T = 1 + 2 + 4 + 8 + 16 + ··· we get the nonsensical answer

    2T = 2 + 4 + 8 + 16 + ··· = T − 1

and so T = −1. Why should we trust the argument in the first case and not in the second?
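The difference can be sanity-checked numerically (an added illustration, not part of the original argument): the partial sums of the first series stabilise near 2, while those of the second grow without bound.

```python
def partial_sum(ratio, terms):
    """Partial sum 1 + ratio + ratio**2 + ... + ratio**(terms - 1)."""
    total = 0.0
    for n in range(terms):
        total += ratio ** n
    return total

S_approx = partial_sum(0.5, 50)  # stabilises near 2, so the trick is trustworthy
T_approx = partial_sum(2.0, 50)  # grows without bound, so "T = -1" is nonsense
```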
Example (Interchanging sums). If we consider any matrix of numbers, for example,

    ⎛1 2 3⎞
    ⎜4 5 6⎟
    ⎝7 8 9⎠

we can sum first the rows, 6 + 15 + 24 = 45, or first the columns, 12 + 15 + 18 = 45, to obtain the total sum of all the numbers. This is the rule

    ∑_{j=1}^m ∑_{k=1}^n a_{jk} = ∑_{k=1}^n ∑_{j=1}^m a_{jk}.

We would like to believe that also ∑_{j=1}^∞ ∑_{k=1}^∞ a_{jk} = ∑_{k=1}^∞ ∑_{j=1}^∞ a_{jk}. However this doesn't work for the following matrix:

    ⎛ 1  0  0 ···⎞
    ⎜−1  1  0 ···⎟
    ⎜ 0 −1  1 ···⎟
    ⎝ ⋮  ⋮  ⋮  ⋱⎠

(Summing each row first gives 1 + 0 + 0 + ··· = 1, while summing each column first gives 0 + 0 + 0 + ··· = 0.) We often want to swap the order of summing (or integrating) and often need to consider infinite sums (or integrals). When can we do this and when can't we?
Example (Interchanging integrals). Let's try to integrate e^{−xy} − xy e^{−xy} with respect to both x and y. We would like to believe that

    ∫_0^∞ ( ∫_0^1 (e^{−xy} − xy e^{−xy}) dy ) dx  =?  ∫_0^1 ( ∫_0^∞ (e^{−xy} − xy e^{−xy}) dx ) dy.

Since ∫_0^1 (e^{−xy} − xy e^{−xy}) dy = [y e^{−xy}]_{y=0}^{1} = e^{−x}, the left-hand side is ∫_0^∞ e^{−x} dx = [−e^{−x}]_0^∞ = 1. However, since ∫_0^∞ (e^{−xy} − xy e^{−xy}) dx = [x e^{−xy}]_{x=0}^{∞} = 0, the right-hand side is ∫_0^1 0 dy = 0. So how do we know when to trust the interchange of integrals?
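The two inner integrals really do disagree; here is a numerical check (an added illustration; the quadrature rule, the sample points and the cut-off at x = 60 are ad hoc choices, not from the notes).

```python
import math

def integrand(x, y):
    return math.exp(-x * y) - x * y * math.exp(-x * y)

def simpson(g, a, b, n=2000):
    """Composite Simpson rule with n (even) subintervals."""
    h = (b - a) / n
    s = g(a) + g(b)
    for k in range(1, n):
        s += (4 if k % 2 else 2) * g(a + k * h)
    return s * h / 3

# For fixed x > 0 the inner dy-integral equals e**-x ...
x0 = 1.5
inner_dy = simpson(lambda y: integrand(x0, y), 0.0, 1.0)
# ... while for fixed y > 0 the inner dx-integral equals 0 (here truncated at 60).
y0 = 0.7
inner_dx = simpson(lambda x: integrand(x, y0), 0.0, 60.0)
```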
Example (Interchanging limits). We could easily believe that

    lim_{x→0} lim_{y→0} x²/(x² + y²)  =?  lim_{y→0} lim_{x→0} x²/(x² + y²).

However lim_{y→0} x²/(x² + y²) = x²/(x² + 0) = 1 and so the left-hand side is 1, whereas lim_{x→0} x²/(x² + y²) = 0/(0 + y²) = 0 and so the right-hand side is 0. What does the graph of this function look like? This example shows that the interchange of limits is untrustworthy. Under what circumstances is it legitimate?
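Evaluating f(x, y) = x²/(x² + y²) along the two orders of approach makes the disagreement concrete (an added illustration):

```python
def f(x, y):
    return x ** 2 / (x ** 2 + y ** 2)

# y -> 0 first: on the x-axis, f(x, 0) = 1 for every x != 0
inner_y_limit = f(1e-8, 0.0)
# x -> 0 first: on the y-axis, f(0, y) = 0 for every y != 0
inner_x_limit = f(0.0, 1e-8)
```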
We need to be rigorous in our logic otherwise, as we have seen in these examples,
the conclusions can be erroneous and the difficulties are often subtle.

Curves of constant width


The above examples are calculus based but it is worthwhile to consider a real world
application of the rigour and reasoning we aspire to. Suppose we are organising the
production facilities which manufacture a component that is round (maybe a rocket
body, maybe a propellant tube, etc.). As part of the production it is important to
have a procedure which guarantees that the fabrication is done to the correct tolerance.
The idea proposed is:
“We measure the width from all angles to confirm that the manufac-
tured component is correct.”
This is a two-dimensional problem in the sense that we assume that the object is a closed

Figure 1: The Reuleaux triangle is a curve of constant width.

curve in R². For a given angle we define the width of this curve to be the smallest distance between two parallel lines which touch the curve in a single point but never cross it (one on each side of the curve). We say that the curve has constant width if this width is equal from every direction. This is exactly what we would check by using calipers on a part and rotating it. The following statement is intuitive and true.
Theorem. A circle has constant width.
However the converse is not true; indeed, the following holds.
Theorem. There exist constant width curves which are not circles.
This can be proved by constructing many such curves, for example the Reuleaux
triangle. Indeed there are such curves which look similar to regular polygons but still
have constant width.

MA2 versus MA1


Much of what we do in this course builds on ideas established in Mathematical Analysis
1. In particular many of the ideas are extended to the higher dimensional setting. See
Table 1.

Suggested further reading


▷ “Analysis 1” by Terence Tao. (Particularly §1.2 “Why Analysis?” and Appendix A
“The basics of mathematical logic”).

Mathematical Analysis 1            Mathematical Analysis 2
Sequences & series of numbers      Sequences & series of functions
a_1, a_2, a_3, ...                 f_1(x), f_2(x), f_3(x), ...
∑_{n=0}^∞ a_n                      ∑_{n=0}^∞ f_n(x)
(Functions) f : R → R              f : R^n → R (Scalar fields)
                                   f : R^n → R^n (Vector fields)
                                   α : R → R^n (Paths)
(Derivative) f′(x) = df/dx (x)     ∂f/∂x_j (x_1, ..., x_n) (Partial derivatives)
                                   ∇f (Gradient)
                                   D_v f (Directional derivative)
                                   α′ (Derivative of path)
                                   Df (Jacobian matrix)
                                   ∇ · f (Divergence)
                                   ∇ × f (Curl)
(Extrema) sup_{x∈R} f(x)           sup_{x∈R^n} f(x) (Extrema)
                                   Lagrange multiplier method
(Integral) ∫_a^b f(x) dx           Multiple integral
                                   Line integral
                                   Surface integral

Table 1: MA2 versus MA1

Chapter 1
Sequences & series of functions

Analogously to sequences of numbers we can consider a sequence of functions f_0(x), f_1(x), f_2(x), f_3(x), etc. Often it is convenient to write such a sequence as {f_n(x)}_{n∈N}. For example, the following are sequences of functions.
▷ f_1(x) = x², f_2(x) = x⁴, f_3(x) = x⁶, ...
▷ f_1(x) = e^x, f_2(x) = e^{2x}, f_3(x) = e^{3x}, ...
▷ f_n(x) = n exp(−n²x²/2)
Note that in the first case we could have instead written f_n(x) = x^{2n} and in the second case we could have written f_n(x) = e^{nx}. The natural number n is called the index. Typically the index of the sequence starts from n = 0 or n = 1 but that's not essential. The index doesn't need to be n; any other letter, or indeed symbol, can be used.

1.1 Convergence & continuity
We start by recalling the notion of convergence for sequences of numbers.

Definition 1.1. A sequence of numbers a_1, a_2, a_3, ... is said to converge to a if, for each ϵ > 0, there exists N ∈ N such that |a_n − a| < ϵ whenever n ≥ N.

If a sequence {an }n converges to a then we write an → a (as n → ∞). For sequences


of functions we will need to consider two different notions of convergence. In order
to understand this difficulty let us consider the following example.

Figure 1.1: The sequence of functions f_n(x) = x^n.

Example. Consider the sequence f_n(x) = x^n for x ∈ (0, 1). For each x ∈ (0, 1) we see that f_n(x) → 0. On the other hand, for each n, 2^{−1/n} ∈ (0, 1) and f_n(2^{−1/n}) = 1/2.
Up until now we haven't mentioned the domain of the functions in the sequence but to proceed we need to make this detail rigorous. We will write that "{f_n(x)}_n is a sequence of functions on D ⊂ R" to mean that there is a fixed D ⊂ R and, for each n ∈ N, f_n is a function with domain D (i.e., f_n : D → R).

Definition 1.2 (pointwise convergence). Let D ⊂ R, let fn (x) be a sequence of


functions on D and let f (x) be a function on D. If fn (x) → f (x) for each x ∈ D
we say that fn is pointwise convergent to f .

Definition 1.3 (uniform convergence). Let fn (x) be a sequence of functions on


D ⊂ R and let f (x) be a function on D. If, for each ϵ > 0, there exists N such
that for every n ≥ N and every x ∈ D, |fn (x) − f (x)| < ϵ then we say that fn is
uniformly convergent to f .

Example. Show that f_n(x) = x^n converges uniformly on (0, 1/2).

Figure 1.2: Uniform convergence (Definition 1.3) requires that f_n is "close" to f(x) in the uniform sense illustrated here.

Solution. We observe that the sequence converges pointwise to the constant function f(x) = 0. We also observe that |f_n(x) − f(x)| ≤ 2^{−n} for all x ∈ (0, 1/2). This means that, for every ϵ > 0, we can choose N = −log₂(ϵ) and then |f_n(x) − f(x)| ≤ 2^{−n} ≤ ϵ whenever n ≥ N.
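The contrast with the full interval (0, 1) can be seen numerically: the sup of |f_n − f| over (0, 1/2) is 2^{−n}, which tends to 0, while over (0, 1) it stays at 1 for every n. (A sampled approximation of the sup, added here as an illustration.)

```python
def sup_gap(n, right_end, samples=10_000):
    """Approximate the sup of |x**n - 0| over a grid in (0, right_end]."""
    return max((k / samples * right_end) ** n for k in range(1, samples + 1))

gap_on_half = sup_gap(20, 0.5)  # about 2**-20: consistent with uniform convergence
gap_on_one = sup_gap(20, 1.0)   # stays at 1: convergence is not uniform on (0, 1)
```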

Definition 1.4. Let f(x) be a function on D ⊂ R. We say that f is continuous at p ∈ D if, for each ϵ > 0, there is δ > 0 such that |f(x) − f(p)| < ϵ whenever x ∈ D and |x − p| < δ. We say that f is continuous on D if f is continuous at every p ∈ D.

It is natural to consider a sequence of continuous functions which converge and ask


if the function they converge to is continuous. What about the sequence of functions
fn (x) = arctan(nx)?

Theorem 1.5. Suppose that fn → f uniformly on D and that the fn are continuous on
D. Then f is continuous on D.

Proof. Let p ∈ D. Uniform convergence means that, for each ϵ > 0, there exists N such that for every n ≥ N and every x ∈ D, |f_n(x) − f(x)| < ϵ/3. By continuity of

Figure 1.3: The sequence of functions f_n(x) = arctan(nx).

f_N(x) at x = p, there is a δ > 0 such that |f_N(x) − f_N(p)| < ϵ/3 whenever x ∈ D, |x − p| < δ. Since

    |f(x) − f(p)| = |f(x) − f_N(x) + f_N(x) − f_N(p) + f_N(p) − f(p)|

this means that, for all |x − p| < δ,

    |f(x) − f(p)| ≤ |f(x) − f_N(x)| + |f_N(x) − f_N(p)| + |f_N(p) − f(p)| < 3 · (ϵ/3) = ϵ.

This proves the continuity of f at p. Since p ∈ D is arbitrary this shows the continuity of f on D.

Recall that integrals are defined rigorously using the notion of a step function.

Theorem 1.6. Suppose that f_n are continuous functions on [a, b] ⊂ R, uniformly convergent to f. Then

    lim_{n→∞} ∫_a^b f_n(x) dx = ∫_a^b f(x) dx.

4
Proof. The uniform convergence implies that for each ϵ > 0, there exists N such that for every n ≥ N and every x ∈ [a, b], |f_n(x) − f(x)| < ϵ/(b − a). This means that, for n ≥ N,

    |∫_a^b f_n(x) dx − ∫_a^b f(x) dx| ≤ ∫_a^b |f_n(x) − f(x)| dx ≤ (b − a) · ϵ/(b − a) = ϵ.

This shows that ∫_a^b f_n(x) dx → ∫_a^b f(x) dx.
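For f_n(x) = x^n, uniformly convergent to 0 on [0, 1/2] as shown earlier, the theorem predicts ∫_0^{1/2} x^n dx → 0 = ∫_0^{1/2} 0 dx, which matches the exact values (an added check):

```python
def integral_fn(n):
    """Exact value of the integral of x**n over [0, 1/2]."""
    return 0.5 ** (n + 1) / (n + 1)

values = [integral_fn(n) for n in (1, 5, 20, 60)]  # decreases towards 0
```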

Series of functions
Recall that, if {a_n}_n is a sequence of numbers, then the series ∑_n a_n is the sequence {∑_{k=1}^n a_k}_n of numbers (the partial sums). We say that the series ∑_n a_n is convergent if {∑_{k=1}^n a_k}_n is convergent.

Definition 1.7. Let {f_n}_n be a sequence of functions. We say that the series ∑_n f_n
▷ is pointwise convergent if {∑_{k=1}^n f_k(x)}_n is pointwise convergent,
▷ is uniformly convergent if {∑_{k=1}^n f_k(x)}_n is uniformly convergent.

P
Theorem 1.8. Suppose that the series ∑_n f_n is uniformly convergent to g on D and the f_n are continuous on D. Then g is continuous on D.

Proof. If the f_k are continuous then the partial sums ∑_{k=1}^n f_k are continuous. This means that Theorem 1.5 applies.
P
Theorem 1.9. Suppose that the series ∑_n f_n is uniformly convergent to g and the f_n are continuous. Then

    lim_{n→∞} ∫_a^b ∑_{k=1}^n f_k(x) dx = ∫_a^b g(x) dx.

Proof. Again, that the f_k are continuous means that the partial sums ∑_{k=1}^n f_k are continuous. This means that Theorem 1.6 applies.

5
Here and subsequently it is convenient to recall several common tests which are useful for proving convergence: the ratio test, root test, comparison test, alternating series test and integral test. For series of functions we have the following test for convergence.

Theorem 1.10 (Weierstrass M-test). Suppose that {f_n}_n is a sequence of functions on D, that {M_n}_n is a sequence of positive numbers and that |f_n(x)| ≤ M_n for all x ∈ D. If ∑_{n=0}^∞ M_n is convergent then the series ∑_{n=0}^∞ f_n converges absolutely and uniformly on D.

Proof. By the comparison test ∑_n |f_n(x)| is convergent for each x ∈ D. I.e., for each x the series ∑_n f_n(x) is absolutely convergent and so we let f(x) denote the limit. We compute

    |f(x) − ∑_{k=1}^n f_k(x)| = |∑_{k=n+1}^∞ f_k(x)| ≤ ∑_{k=n+1}^∞ |f_k(x)| ≤ ∑_{k=n+1}^∞ M_k.

As ∑_n M_n is convergent this last expression tends to 0 as n → ∞, and this estimate is independent of x, so the convergence is uniform.
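As a concrete use of the M-test (an added illustration): f_n(x) = sin(nx)/n² satisfies |f_n(x)| ≤ 1/n² =: M_n on all of R and ∑ 1/n² converges, so the series converges uniformly. The tail of ∑ M_k bounds the gap between any two partial sums, uniformly in x:

```python
import math

def partial(x, n):
    """Partial sum of sin(k x) / k**2 for k = 1..n."""
    return sum(math.sin(k * x) / k ** 2 for k in range(1, n + 1))

def tail_bound(n_from, n_to):
    """Sum of M_k = 1/k**2 for k = n_from..n_to."""
    return sum(1.0 / k ** 2 for k in range(n_from, n_to + 1))

x = 0.7  # any x works: the bound does not depend on x
gap = abs(partial(x, 2000) - partial(x, 100))
bound = tail_bound(101, 2000)
```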

1.2 Power series

Definition 1.11. Let {a_n}_n be a sequence of numbers and let c be a number. The series ∑_n a_n (x − c)^n is called a power series (centred at c).
P

Typically the power series will converge for some x and diverge for other x. We could permit x to be a complex number and the entire work of this section holds verbatim. However, for the present purposes we will assume that x ∈ R, that the coefficients a_n ∈ R and that c ∈ R. To simplify formulae we will often work with the case c = 0 since we can always transform a given problem to this special case.

Example. Let a_n = 2^{−n}. The power series ∑_n a_n x^n = ∑_n (x/2)^n is convergent when |x| < 2 and divergent when |x| > 2. To see this we apply the root test and observe that lim_{n→∞} (2^{−n} |x|^n)^{1/n} = |x|/2.

6
Example. Let a_n = 1/n!. The power series ∑_n a_n x^n = ∑_n x^n/n! is convergent for all x. To see this we use the ratio test and observe that |x^{n+1}/(n+1)!| / |x^n/n!| = |x|/(n+1) and that lim_{n→∞} |x|/(n+1) = 0 for any x.
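The root test also suggests a numerical estimate of the radius of convergence, r ≈ 1/|a_n|^{1/n} for large n (an added illustration; the particular choices n = 500 and n = 150 are arbitrary):

```python
import math

def radius_estimate(a, n):
    """Root-test estimate 1 / |a(n)|**(1/n) evaluated at a single large n."""
    return 1.0 / abs(a(n)) ** (1.0 / n)

r_first = radius_estimate(lambda n: 2.0 ** -n, 500)                 # close to 2
r_second = radius_estimate(lambda n: 1.0 / math.factorial(n), 150)  # grows with n (radius is infinite)
```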

Example. A convergent power series defines a function f(x) = ∑_n a_n x^n. In the above two examples, are these functions something familiar? Hint: in the first example, compare f(x) with x f(x); in the second, compare f(x) with f′(x).

1.3 Radius of convergence


A key notion is determining exactly the domain on which a power series converges.

Theorem 1.12 (uniformly convergent power series). Suppose that ∑_n a_n x^n converges for some x = x₀ ≠ 0. Let R < |x₀|. Then the series is uniformly and absolutely convergent for all x such that |x| ≤ R.

Proof. Since ∑_n a_n x₀^n is convergent there exists M > 0 such that, for all n, |a_n x₀^n| ≤ M. Observe that, whenever |x| ≤ R,

    |a_n x^n| = |a_n x₀^n| |x/x₀|^n ≤ M (R/|x₀|)^n.

The series ∑_n M (R/|x₀|)^n is a geometric series with ratio R/|x₀| < 1 and so convergent. Consequently, by the M-test, the series is uniformly and absolutely convergent when |x| ≤ R.

Theorem 1.13 (radius of convergence). Suppose there exist x₁, x₂ ≠ 0 such that ∑_n a_n x₁^n is convergent and ∑_n a_n x₂^n is divergent. Then there exists r > 0 such that ∑_n a_n x^n is convergent for |x| < r and divergent for |x| > r.

Proof. Let A be the set of real numbers x for which ∑_n a_n x^n is convergent and let r be the least upper bound of A. By Theorem 1.12 the series ∑_n a_n x^n is convergent whenever |x| < r. If |x| > r and ∑_n a_n x^n were convergent then, again by Theorem 1.12, this would contradict the definition of r, and so ∑_n a_n x^n is divergent for |x| > r.

In the above paragraphs we worked with the case c = 0 but all of these notions hold for general c ∈ R. Consequently Theorem 1.13 implies that the series is convergent on an interval (c − r, c + r) = {x : |x − c| < r} but divergent when |x − c| > r. The convergence of the series at x = c − r and x = c + r must be checked separately and can differ between the left and right end points.

Definition 1.14. This r is the radius of convergence of the series ∑_n a_n (x − c)^n.

We use the following convention: if ∑_n a_n x^n converges for all x we say the radius of convergence is ∞; if ∑_n a_n x^n doesn't converge except at x = 0 we say the radius of convergence is 0. All of the above concerning power series holds verbatim for x a complex number, and then "radius" is more meaningful since it truly corresponds to a disk in the complex plane.

1.4 Integrating & differentiating power series

Let a_n ∈ R, x ∈ R. If the series ∑_n a_n x^n converges we define the function f(x) = ∑_{n=0}^∞ a_n x^n. In general exchanging limits with derivatives and integrals is problematic, but for power series the situation is good.

Theorem 1.15 (integrating power series). Suppose that, for x ∈ (−r, r), the series f(x) = ∑_{n=0}^∞ a_n x^n is convergent. Then f(x) is continuous and ∫_0^x f(y) dy = ∑_{n=0}^∞ (a_n/(n+1)) x^{n+1}.

Proof. Let |x| < R < r. By Theorem 1.12 the series is uniformly convergent for y ∈ [−R, R]. This means that f is continuous and so we can interchange limit and integral,

    ∫_0^x f(y) dy = ∑_{n=0}^∞ ∫_0^x a_n y^n dy = ∑_{n=0}^∞ (a_n/(n+1)) x^{n+1}.
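A concrete instance (added illustration): integrating the geometric series f(y) = ∑ y^n = 1/(1 − y) term by term on (−1, 1) gives ∑ x^{n+1}/(n+1) = −log(1 − x).

```python
import math

def integrated_series(x, terms=200):
    """Partial sum of x**(n+1)/(n+1): the term-by-term integral of sum x**n."""
    return sum(x ** (n + 1) / (n + 1) for n in range(terms))

x = 0.5
series_value = integrated_series(x)
exact_value = -math.log(1 - x)  # integral of 1/(1 - y) from 0 to x
```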

Theorem 1.16 (differentiating power series). Suppose that, for x ∈ (−r, r), the series f(x) = ∑_{n=0}^∞ a_n x^n is convergent. Then f(x) is differentiable and f′(x) = ∑_{n=1}^∞ n a_n x^{n−1}, convergent for x ∈ (−r, r).

Proof. Let |x| < R < r. Observe that

    ∑_{n=1}^∞ n a_n x^{n−1} = ∑_{n=1}^∞ a_n R^n · (n/R) · (x/R)^{n−1}.

Since ∑_{n=1}^∞ a_n R^n is absolutely convergent and (n/R)(|x|/R)^{n−1} is bounded, we know that ∑_{n=1}^∞ n a_n x^{n−1} is absolutely convergent (comparison test). For convenience let g(x) = ∑_{n=1}^∞ n a_n x^{n−1} and observe that ∫_0^x g(y) dy = ∑_{n=1}^∞ a_n x^n = f(x) − a₀ (by Theorem 1.15). By the fundamental theorem of calculus this concludes the proof.
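A quick check of the theorem on the exponential series (added illustration): differentiating ∑ x^n/n! term by term gives ∑ n x^{n−1}/n! = ∑ x^m/m!, i.e. the same function, as expected since (e^x)′ = e^x.

```python
import math

def series_exp(x, terms=60):
    """Partial sum of the power series of exp."""
    return sum(x ** n / math.factorial(n) for n in range(terms))

def series_exp_deriv(x, terms=60):
    """Term-by-term derivative of the series above (Theorem 1.16)."""
    return sum(n * x ** (n - 1) / math.factorial(n) for n in range(1, terms))

x = 1.3
```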

Let a, x and the coefficients a_n be real numbers. The series

    f(x) = ∑_{n=0}^∞ a_n (x − a)^n

defines a function on the interval (a − r, a + r), where r is the radius of convergence. The series is said to represent the function f and is called the power series expansion of f about a. Two important questions are: Given the series, what are the properties of f? Given a function f, can it be represented by a power series? Only rather special functions possess power series expansions; however, this class of functions is very useful in practice.

1.5 Uniqueness & Taylor series


In the next paragraphs we develop the idea that, if two power series represent the
same function, then they must be the same power series. In this sense we have the
uniqueness of power series. The following result is a crucial piece of information about
power series and is one major reason why they are useful.

9
Theorem 1.17 (uniqueness of power series). Suppose that two power series are convergent and are equal in a neighbourhood of a in the sense that, for |x − a| < ϵ,

    ∑_n a_n (x − a)^n = ∑_n b_n (x − a)^n = f(x).

Then the two series are equal term-by-term, i.e., a_n = b_n for every n ∈ N. Moreover,

    a_n = b_n = f^{(n)}(a)/n!.

Proof. The conclusion of Theorem 1.16 can be iterated and implies that f(x) has derivatives of every order and, for k ∈ N,

    f^{(k)}(x) = k! a_k + ∑_{n=k+1}^∞ n(n−1)···(n−k+1) a_n (x − a)^{n−k}.

This means that f^{(k)}(a) = k! a_k because all the terms in the sum vanish at x = a.

Definition 1.18. Suppose that a function f(x) is infinitely differentiable on an open interval about a. The Taylor series generated by f at a is (formally)

    ∑_{n=0}^∞ (f^{(n)}(a)/n!) (x − a)^n.

Observe how the coefficients in the Taylor series coincide with the formula obtained in the above results. Question: Does the Taylor series converge on the entire interval? In general, no. However we can calculate the radius of convergence of the power series. Question: If the Taylor series converges, is it equal to f(x) on the interval? In general it might not be, as seen in the following example.
Example. Let f(x) = e^{−1/x²} (extended by f(0) = 0). If we proceed to calculate the Taylor series about x = 0 we obtain:

    f(x) = exp(−x^{−2})                                     f(0) = 0
    f′(x) = 2x^{−3} exp(−x^{−2})                            f′(0) = 0
    f′′(x) = (−6x^{−4} + 4x^{−6}) exp(−x^{−2})              f′′(0) = 0
    f′′′(x) = 4(2x^{−9} − 9x^{−7} + 6x^{−5}) exp(−x^{−2})   f′′′(0) = 0

The Taylor series is consequently ∑_{n=0}^∞ 0 = 0. It does converge but has nothing to do with the original function.
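Numerically (an added illustration): f is strictly positive away from 0, so the identically-zero Taylor series misses it by a visible margin at, say, x = 1/2.

```python
import math

def f(x):
    """exp(-1/x**2), extended by f(0) = 0."""
    return 0.0 if x == 0 else math.exp(-1.0 / x ** 2)

taylor_value = 0.0                         # the Taylor series at 0 is identically 0
gap_at_half = abs(f(0.5) - taylor_value)   # equals e**-4, clearly nonzero
```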

10
Example. What is the Taylor series for f(x) = e^x? Does differentiating this power series correspond to expectations?

Error term in Taylor series

We define the error term in the nth approximation given by the Taylor series as

    E_n(x) = f(x) − ∑_{k=0}^n (f^{(k)}(a)/k!) (x − a)^k.

Convergence of the Taylor series to f(x) is implied by E_n(x) → 0 as n → ∞. Using this idea we have the following sufficient condition for convergence of a Taylor series.

Theorem 1.19. Assume f is infinitely differentiable on I = (a − r, a + r) and there exists A > 0 such that

    |f^{(n)}(x)| ≤ A^n,  for all n ∈ N, x ∈ I.

Then the Taylor series generated by f at a converges to f(x) for each x ∈ I.

Proof. We will first show, by induction, that

    E_n(x) = (1/n!) ∫_a^x (x − y)^n f^{(n+1)}(y) dy.

Since, by definition, E_0(x) = f(x) − f(a), the case n = 0 is immediate from the fundamental theorem of calculus. We now assume that the statement is true for n and prove it for n + 1. Observe that

    E_{n+1}(x) = E_n(x) − (f^{(n+1)}(a)/(n+1)!) (x − a)^{n+1}

and that (x − a)^{n+1} = (n + 1) ∫_a^x (x − y)^n dy. Consequently

    E_{n+1}(x) = (1/n!) ∫_a^x (x − y)^n f^{(n+1)}(y) dy − (f^{(n+1)}(a)/(n+1)!) (x − a)^{n+1}
               = (1/n!) ∫_a^x (x − y)^n f^{(n+1)}(y) dy − (1/n!) ∫_a^x f^{(n+1)}(a) (x − y)^n dy.

Combining the integrals and integrating by parts we obtain the claimed statement for n + 1. Using the formula for E_n(x) which we have just proved, we estimate

    |E_n(x)| ≤ (1/n!) ∫_a^x |x − y|^n A^{n+1} dy ≤ (1/n!) r · r^n A^{n+1} = rA (rA)^n/n!.

Since (rA)^n/n! → 0 as n → ∞ we have shown that |E_n(x)| → 0 as n → ∞.
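For f = exp at a = 0 the hypothesis holds on I = (−1, 1) with, say, A = e for n ≥ 1 (since |f^{(n)}(x)| = e^x ≤ e ≤ e^n there), and the remainder is indeed controlled by rA(rA)^n/n! (an added numerical check; the choices r = 1, A = e, x = 0.9, n = 10 are illustrative).

```python
import math

def taylor_poly_exp(x, n):
    """Taylor polynomial of exp at a = 0, degree n."""
    return sum(x ** k / math.factorial(k) for k in range(n + 1))

r, A, x, n = 1.0, math.e, 0.9, 10
error = abs(math.exp(x) - taylor_poly_exp(x, n))   # |E_n(x)|
bound = r * A * (r * A) ** n / math.factorial(n)   # r A (rA)**n / n!
```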

1.6 Power series & differential equations

In this section we will use some of the strength of power series in a particular application: a method which we can use to solve certain differential equations. The method is best illustrated with an example. This method of solving differential equations is called the "method of undetermined coefficients".

Task 1.6.1. Find a function y(x) which satisfies the differential equation

    (1 − x²) y′′(x) = −2 y(x)

and satisfies the initial conditions y(0) = 1, y′(0) = 1.

We start by assuming that there exists a power series solution y(x) = ∑_{n=0}^∞ a_n x^n, convergent for x ∈ (−r, r) for some r > 0 to be determined later.
1. By Theorem 1.16,

    y′(x) = ∑_{n=1}^∞ n a_n x^{n−1}  and  y′′(x) = ∑_{n=2}^∞ n(n−1) a_n x^{n−2}.

2. And so

    −2 ∑_{n=0}^∞ a_n x^n = (1 − x²) y′′(x) = (1 − x²) ∑_{n=2}^∞ n(n−1) a_n x^{n−2}
        = ∑_{n=2}^∞ n(n−1) a_n x^{n−2} − ∑_{n=2}^∞ n(n−1) a_n x^n
        = ∑_{n=0}^∞ (n+2)(n+1) a_{n+2} x^n − ∑_{n=0}^∞ n(n−1) a_n x^n;

3. Consequently, by Theorem 1.17, 0 = 2a_n + (n+2)(n+1) a_{n+2} − n(n−1) a_n for each n ∈ N₀;
4. Equivalently a_{n+2} = ((n−2)/(n+2)) a_n;
5. Using the initial conditions, a₀ = y(0) = 1, a₁ = y′(0) = 1;
6. For the even coefficients:
    ▷ a₂ = ((0−2)/(0+2)) a₀ = −1,
    ▷ a₄ = ((2−2)/(2+2)) a₂ = 0,
    ▷ a₆ = ((4−2)/(4+2)) a₄ = 0, ...;
7. For the odd coefficients:
    ▷ a₃ = ((1−2)/(1+2)) a₁ = −1/3,
    ▷ a₅ = ((3−2)/(3+2)) a₃ = (1/5)(−1/3), ...
    ▷ in general a_{2n+1} = −1/((2n+1)(2n−1));
8. Formally we have the series solution

    y(x) = 1 − x² − ∑_{n=0}^∞ x^{2n+1}/((2n+1)(2n−1)),    (1.1)

9. We see that this series is convergent for |x| < 1.
Consequently we have shown that the function defined above (1.1) is well-defined in the interval (−1, 1) and is a solution to the given differential equation.
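The recurrence and the resulting series can be checked numerically (an added illustration): build the coefficients from a_{n+2} = ((n−2)/(n+2)) a_n, then verify that the truncated series satisfies the differential equation at a sample point.

```python
N = 60  # truncation order (illustrative choice)
a = [0.0] * (N + 3)
a[0], a[1] = 1.0, 1.0  # y(0) = 1, y'(0) = 1
for n in range(N + 1):
    a[n + 2] = (n - 2) / (n + 2) * a[n]

def y(x):
    return sum(a[n] * x ** n for n in range(N + 1))

def y_second(x):
    """Term-by-term second derivative (Theorem 1.16 applied twice)."""
    return sum(n * (n - 1) * a[n] * x ** (n - 2) for n in range(2, N + 1))

x = 0.5
residual = (1 - x ** 2) * y_second(x) + 2 * y(x)  # should be ~0
```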

Chapter 2
Differential calculus in higher dimension

In this part of the course we start to consider higher dimensional space. That is, instead of R we consider R^n for n ∈ N. We will particularly focus on 2D and 3D but everything also holds in any dimension. Going beyond R we have more options for functions and correspondingly more options for derivatives.
Various different notations are commonly used. Here we will primarily use (x, y) ∈ R², (x, y, z) ∈ R³ or, more generally, x = (x₁, x₂, ..., x_n) ∈ R^n where x₁ ∈ R, ..., x_n ∈ R. For example, R² is the plane, R³ is 3D space.

Definition 2.1 (inner product). x · y = ∑_{k=1}^n x_k y_k ∈ R.

We recall that the inner product being zero has a geometric meaning: it means that the two vectors are orthogonal. We also recall that the "length" of a vector is given by the norm, defined as follows.

Definition 2.2 (norm). ∥x∥ = √(x · x) = (∑_{k=1}^n x_k²)^{1/2}.

For example, in R² we have ∥(x, y)∥ = √(x² + y²). There are various convenient properties for working with norms and inner products, in particular the Cauchy–Schwarz inequality |x · y| ≤ ∥x∥ ∥y∥ and the triangle inequality ∥x + y∥ ≤ ∥x∥ + ∥y∥.
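These inequalities are easy to spot-check (an added illustration for one sample pair of vectors in R³):

```python
import math

def dot(x, y):
    return sum(xk * yk for xk, yk in zip(x, y))

def norm(x):
    return math.sqrt(dot(x, x))

x = (1.0, -2.0, 3.0)
y = (4.0, 0.5, -1.0)
cauchy_schwarz_ok = abs(dot(x, y)) <= norm(x) * norm(y)
s = tuple(xk + yk for xk, yk in zip(x, y))
triangle_ok = norm(s) <= norm(x) + norm(y)
```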

15
The primary higher-dimensional functions we consider in this course are:
    Scalar fields: f : R^n → R
    Vector fields: f : R^n → R^n
    Paths: α : R → R^n
    Change of coordinates: x : R^n → R^n
These possibilities all fit into the general pattern of f : R^n → R^m for n, m ∈ N, but tradition and the use of the function give us different terminology and symbols. Such functions are useful for representing various practical things, for example: gravitational force; temperature in a region; wind velocity; fluid flow; electric field; etc.

2.1 Open sets, closed sets, boundary, continuity

Let a ∈ R^n, r > 0. The open n-ball of radius r and centre a is written as

    B(a, r) := {x ∈ R^n : ∥x − a∥ < r}.

Definition 2.3 (interior point). Let S ⊂ Rn . A point a ∈ S is said to be an interior


point if there is r > 0 such that B(a, r) ⊂ S. The set of all interior points of S is
denoted int S.

Definition 2.4 (open set). A set S ⊂ Rn is said to be open if all of its points are
interior points, i.e., if int S = S.

For example, open intervals, open disks, open balls, unions of open intervals, etc.,
are all open sets.
Lemma. Let r > 0, a ∈ R^n. The set B(a, r) ⊂ R^n is open.

Proof. Let b ∈ B(a, r). It suffices to show that b is an interior point. (1) Let r₁ = ∥b − a∥ < r. (2) Let r₂ = (r − r₁)/2. (3) We claim that B(b, r₂) ⊂ B(a, r): in order to see this take any c ∈ B(b, r₂) and observe that

    ∥c − a∥ ≤ ∥c − b∥ + ∥b − a∥ < r₂ + r₁ = (r + r₁)/2 < r.

Observe that the radius of the ball will be small for points close to the boundary.

16
Figure 2.1: Interior points are the centre of a ball contained within the set.

Definition 2.5 (Cartesian product). If A1 ⊂ R, A2 ⊂ R then the Cartesian product


is defined as
A1 × A2 := {(x, y) : x ∈ A1 , y ∈ A2 } ⊂ R2 .

Analogously the Cartesian product can be defined in higher dimensions: if A₁ ⊂ R^m, A₂ ⊂ R^n then the Cartesian product A₁ × A₂ is defined as the set of all points (x₁, ..., x_m, y₁, ..., y_n) ∈ R^{m+n} such that (x₁, ..., x_m) ∈ A₁ and (y₁, ..., y_n) ∈ A₂.

Lemma. If A1 , A2 are open subsets of R then A1 × A2 is an open subset of R2 .

Proof. Let a = (a1 , a2 ) ∈ A1 × A2 ⊂ R2 . Since A1 is open there exists r1 > 0 such
that B(a1 , r1 ) ⊂ A1 . Similarly there exists r2 > 0 such that B(a2 , r2 ) ⊂ A2 . Let
r = min{r1 , r2 }. This all means that
B(a, r) ⊂ B(a1 , r1 ) × B(a2 , r2 ) ⊂ A1 × A2 .

Discussing the “interior” of the set naturally suggests the topic of the “boundary”
of the set. In the following definitions we develop this idea.

Definition 2.6 (exterior points). Let S ⊂ Rn . A point a ∉ S is said to be an exterior
point if there exists r > 0 such that B(a, r) ∩ S = ∅. The set of all exterior points of
S is denoted ext S.

Figure 2.2: If A1 , A2 are intervals then A1 × A2 is a rectangle.

Observe that ext S is an open set. We use the notation S c = Rn \ S and we say
that S c is the complement of the set S.

Definition 2.7 (boundary). The set Rn \ (int S ∪ ext S) is called the boundary of
S ⊂ Rn and is denoted ∂S.

Definition 2.8 (closed). A set S ⊂ Rn is said to be closed if ∂S ⊂ S.

Lemma 2.9. S is open ⇐⇒ S c is closed.

Proof. Observe that Rn = int S ∪ ∂S ∪ ext S (disjointly). If x ∈ ∂S then, for every
r > 0, B(x, r) ∩ S ̸= ∅ and B(x, r) ∩ S c ̸= ∅, and so x ∈ ∂(S c ). Similarly with S
and S c swapped, and so ∂S = ∂(S c ). If S is open then int S = S and S c =
ext S ∪ ∂S = ext S ∪ ∂(S c ) and so S c is closed. If S is not open then there exists
a ∈ ∂S ∩ S. Since ∂S = ∂(S c ), this means a ∈ ∂(S c ) but a ∉ S c , hence S c is not closed.

Limits and continuity
Let S ⊂ Rn and f : S → Rm . If a ∈ Rn , b ∈ Rm we write limx→a f (x) = b to mean
that ∥f (x) − b∥ → 0 as ∥x − a∥ → 0. Observe how, if n = m = 1, this is the
familiar notion of the limit of a function on R.
Definition 2.10 (continuous). A function f is said to be continuous at a if f is defined
at a and limx→a f (x) = f (a). We say f is continuous on S if f is continuous at each
point of S.

Even functions which look “nice” can fail to be continuous as we can see in the
following example.
Example (continuity in higher dimensions). Let f be defined, for (x, y) ̸= (0, 0), as
f (x, y) = xy/(x² + y²)
and f (0, 0) = 0. What is the behaviour of f when approaching (0, 0) along the
following lines?
line            value
{x = 0}         f (0, t) = 0
{y = 0}         f (t, 0) = 0
{x = y}         f (t, t) = 1/2
{x = −y}        f (t, −t) = −1/2
Since different lines of approach give different limiting values, the limit of f at (0, 0)
does not exist and f is not continuous at (0, 0).
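The table above can be reproduced numerically. The following minimal Python sketch (using nothing beyond the standard language) evaluates f along the four lines; the values are constant on each line, so no single limit exists at the origin.

```python
# Numerical check that the value of f(x, y) = xy / (x^2 + y^2) near (0, 0)
# depends on the line of approach.

def f(x, y):
    """The scalar field from the example, with f(0, 0) = 0."""
    if (x, y) == (0, 0):
        return 0.0
    return x * y / (x**2 + y**2)

# Approach (0, 0) along four lines; f is constant on each punctured line.
for t in [1.0, 0.1, 0.001]:
    assert f(0, t) == 0.0                 # along {x = 0}
    assert f(t, 0) == 0.0                 # along {y = 0}
    assert abs(f(t, t) - 0.5) < 1e-12     # along {x = y}
    assert abs(f(t, -t) + 0.5) < 1e-12    # along {x = -y}

# Different limits along different lines: f is not continuous at (0, 0).
```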

Theorem 2.11. Suppose that limx→a f (x) = b and limx→a g(x) = c. Then
1. limx→a (f (x) + g(x)) = b + c,
2. limx→a λf (x) = λb for every λ ∈ R,
3. limx→a f (x) · g(x) = b · c,
4. limx→a ∥f (x)∥ = ∥b∥.

We prove a couple of the parts of the above theorem here, the other parts are left as
exercises.
Proof of 3. Observe that f (x) · g(x) − b · c = (f (x) − b) · (g(x) − c) + b · (g(x) −
c) + c · (f (x) − b). By the triangle inequality and the Cauchy–Schwarz inequality,
|f (x) · g(x) − b · c| ≤ ∥f (x) − b∥ ∥g(x) − c∥ + ∥b∥ ∥g(x) − c∥ + ∥c∥ ∥f (x) − b∥ .
Since we already know that ∥f (x) − b∥ → 0 and ∥g(x) − c∥ → 0 as x → a, this
implies that |f (x) · g(x) − b · c| → 0.

Proof of 4. Taking f = g in part 3 gives limx→a ∥f (x)∥² = ∥b∥², and taking the
square root (a continuous function) gives the result.
When writing a vector field (or similar functions) it is often convenient to divide the
higher-dimensional function into smaller parts. We call these parts the components of a
vector field. For example f (x) = (f1 (x), f2 (x)) in 2D, f (x) = (f1 (x), f2 (x), f3 (x))
in 3D, etc.

Theorem 2.12. Let f (x) = (f1 (x), f2 (x)). Then f is continuous if and only if f1 and
f2 are continuous.

Proof. We will independently prove the two implications.
(⇒) Let e1 = (1, 0), e2 = (0, 1) and observe that fk (x) = f (x) · ek . We have
already shown that the continuity of two vector fields implies the continuity of
their inner product.
(⇐) By definition of the norm, ∥f (x) − f (a)∥² = (f1 (x) − f1 (a))² + (f2 (x) − f2 (a))²
and we know |fk (x) − fk (a)| → 0 as ∥x − a∥ → 0.
In higher dimensions the analogous statement is true for the vector field f (x) =
(f1 (x), . . . , fm (x)) with exactly the same proof. I.e., f is continuous if and only if
each fk is continuous.

Example (polynomials). A polynomial in n variables is a scalar field on Rn of the form
f (x1 , . . . , xn ) = ∑k1=0..j · · · ∑kn=0..j ck1,...,kn x1^k1 · · · xn^kn .
E.g., f (x, y) := x + 2xy − x² is a polynomial in 2 variables. Polynomials are
continuous everywhere in Rn . This is because they are the finite sum of products of
continuous scalar fields.

Example (rational functions). A rational function is a scalar field
f (x) = p(x)/q(x)
where p(x) and q(x) are polynomials. A rational function is continuous at every point
x such that q(x) ̸= 0.

Figure 2.3: Plot where colour represents the value of f (x, y) = x² + y² .
The change in f depends on direction.

As described in the following result, continuity is preserved, in an intuitive way,
under composition of functions.

Theorem 2.13. Suppose S ⊂ Rl , T ⊂ Rm , f : S → Rm , g : T → Rn and that
f (S) ⊂ T so that
(g ◦ f )(x) = g(f (x))
makes sense. If f is continuous at a ∈ S and g is continuous at f (a) then g ◦ f is
continuous at a.

Proof. limx→a ∥g(f (x)) − g(f (a))∥ = limy→f (a) ∥g(y) − g(f (a))∥ = 0, where the
substitution y = f (x) is justified by the continuity of f at a.

Example. We can consider the scalar field f (x, y) = sin(x² + y) + xy as a composition
(and sum) of continuous functions, hence it is continuous.

2.2 Derivatives of scalar fields
We can imagine, for example in Figure 2.3, that in higher dimensions the derivative of
a scalar field depends on the direction. This motivates the following.

Definition 2.14 (directional derivative). Let S ⊂ Rn and f : S → R. For any
a ∈ int S and v ∈ Rn , ∥v∥ = 1, the directional derivative of f with respect to v is
defined as
Dv f (a) = limh→0 (1/h) (f (a + hv) − f (a)) .

When h is small we can guarantee that a + hv ∈ S because a ∈ int S, so this
definition makes sense.

Theorem. Suppose S ⊂ Rn , f : S → R, a ∈ int S. Let g(t) := f (a + tv). If one
of the derivatives g ′ (t) or Dv f (a + tv) exists then the other also exists and
g ′ (t) = Dv f (a + tv).
In particular g ′ (0) = Dv f (a).
Proof. By definition (1/h)(g(t + h) − g(t)) = (1/h)(f (a + tv + hv) − f (a + tv));
take the limit h → 0.

The following result is useful for proving later results.

Theorem (mean value). Assume that Dv f (a + tv) exists for each t ∈ [0, 1]. Then for
some θ ∈ (0, 1),
f (a + v) − f (a) = Dv f (z), where z = a + θv.
Proof. Apply the mean value theorem to g(t) = f (a + tv) on [0, 1].

The following notation is convenient. For any k ∈ {1, 2, . . . , n}, let ek be the
n-dimensional unit vector where all entries are zero except the k th position which is
equal to 1. I.e., e1 = (1, 0, . . . , 0), e2 = (0, 1, 0, . . . , 0), . . . , en = (0, . . . , 0, 1).

Definition 2.15 (partial derivatives). We define the partial derivative in xk of
f (x1 , . . . , xn ) at a as
∂f /∂xk (a) = Dek f (a).
Remark. Various symbols are used for partial derivatives: ∂f /∂xk (a) = Dk f (a) = ∂k f (a).
If a function is written f (x, y) we write ∂f /∂x, ∂f /∂y for the partial derivatives. Similarly
for higher dimension.

In practice, to compute the partial derivative ∂f /∂xk , one should consider all other xj
for j ̸= k as constants and take the derivative with respect to xk . In a moment we see
this rigorously.
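The recipe above (freeze all variables except xk) can also be checked numerically with central differences. The following Python sketch is an illustration only, with a hypothetical helper `partial` and a sample polynomial chosen for the example.

```python
# Central-difference approximation of a partial derivative: freeze all
# variables except x_k and differentiate in that one variable.

def partial(f, a, k, h=1e-6):
    """Approximate (∂f/∂x_k)(a) for f : R^n -> R by a central difference."""
    ap, am = list(a), list(a)
    ap[k] += h
    am[k] -= h
    return (f(ap) - f(am)) / (2 * h)

f = lambda x: x[0]**2 + 3 * x[0] * x[1]   # f(x, y) = x^2 + 3xy
a = [1.0, 2.0]
# Exact values: ∂f/∂x = 2x + 3y = 8, ∂f/∂y = 3x = 3 at a = (1, 2).
assert abs(partial(f, a, 0) - 8.0) < 1e-4
assert abs(partial(f, a, 1) - 3.0) < 1e-4
```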
If f : R → R is differentiable, then we know that, when x is close to a,
f (x) ≈ f (a) + (x − a)f ′ (a).
More precisely, we know that1 f (x) = f (a) + (x − a)f ′ (a) + ϵ(x − a) where
|ϵ(x − a)| = o(|x − a|). This way of seeing differentiability is convenient for the
higher dimensional definition of differentiability.

Definition 2.16 (differentiable). Let S ⊂ Rn be open, f : S → R. We say that f
is differentiable at a ∈ S if there exists a linear transformation df a : Rn → R such
that, for x ∈ B(a, r),
f (x) = f (a) + df a (x − a) + ϵ(x − a)
where |ϵ(x − a)| = o(∥x − a∥).

For future convenience we introduce the following notation.

Definition 2.17 (gradient). The gradient of the scalar field f (x, y, z) at the point a is
the vector
∇f (a) = (∂f /∂x (a), ∂f /∂y (a), ∂f /∂z (a)) .
In general, when working in Rn for some n ∈ N, the gradient of the scalar field
f (x1 , . . . , xn ) at the point a is
∇f (a) = (∂f /∂x1 (a), ∂f /∂x2 (a), . . . , ∂f /∂xn (a)) .

1
This is little-o notation and here means that |f (x) − f (a) − (x − a)f ′ (a)| / |x − a| → 0 as
|x − a| → 0.

Theorem 2.18. If f is differentiable at a then df a (v) = ∇f (a) · v. This means that,
for x ∈ B(a, r),
f (x) = f (a) + ∇f (a) · (x − a) + ϵ(x − a)
where |ϵ(x − a)| = o(∥x − a∥). Moreover, for any vector v, ∥v∥ = 1,
Dv f (a) = ∇f (a) · v.
Proof. Since f is differentiable there exists a linear transformation df a : Rn → R
such that f (a + hv) = f (a) + h df a (v) + ϵ(hv) and hence
Dv f (a) = limh→0 (1/h)(f (a + hv) − f (a)) = limh→0 (1/h)(h df a (v) + ϵ(hv)) = df a (v).
In particular df a (ek ) = Dek f (a), so by linearity df a (v) = ∑k vk df a (ek ) = ∇f (a) · v.
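The identity Dv f (a) = ∇f (a) · v can be sanity-checked numerically. The sketch below uses the example field f (x, y) = x² + y² (chosen for illustration, not taken from the notes) at a = (1, 2) with the unit vector v = (3/5, 4/5).

```python
# Check D_v f(a) = ∇f(a) · v for f(x, y) = x^2 + y^2, a differentiable field.

f = lambda x, y: x**2 + y**2
a = (1.0, 2.0)
v = (0.6, 0.8)                  # a unit vector: 0.6^2 + 0.8^2 = 1
grad = (2 * a[0], 2 * a[1])     # ∇f(a) = (2x, 2y) = (2, 4)

h = 1e-6
# Central difference of h -> f(a + hv) at h = 0.
dv_numeric = (f(a[0] + h * v[0], a[1] + h * v[1])
              - f(a[0] - h * v[0], a[1] - h * v[1])) / (2 * h)
dv_formula = grad[0] * v[0] + grad[1] * v[1]   # ∇f(a) · v = 4.4

assert abs(dv_numeric - dv_formula) < 1e-4
```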

Theorem. If f is differentiable at a, then it is continuous at a.

Proof. Observe that |f (a + v) − f (a)| = |df a (v) + ϵ(v)|. This means that
|f (a + v) − f (a)| ≤ ∥df a ∥ ∥v∥ + |ϵ(v)|
and so this tends to 0 as ∥v∥ → 0.

Theorem 2.19. Suppose that f (x1 , . . . , xn ) is a scalar field. If the partial derivatives
∂1 f (x), . . . , ∂n f (x) exist for all x ∈ B(a, r) and are continuous at a then f is
differentiable at a.

Proof. For convenience define the vectors
v = (v1 , v2 , . . . , vn ), uk = (v1 , v2 , . . . , vk , 0, . . . , 0).
Observe that
uk − uk−1 = vk ek , u0 = (0, 0, . . . , 0), un = v.
Using the mean value theorem we know that there exists zk = uk−1 + θk vk ek , for some
θk ∈ (0, 1), such that f (a + uk ) − f (a + uk−1 ) = vk Dek f (a + zk ). Consequently
f (a + v) − f (a) = ∑k=1..n (f (a + uk ) − f (a + uk−1 ))
= ∑k=1..n vk Dek f (a + zk )
= ∑k=1..n vk Dek f (a + uk−1 ) + ∑k=1..n vk (Dek f (a + zk ) − Dek f (a + uk−1 )).
To conclude, observe that, by the continuity of the partial derivatives at a, the second
sum is o(∥v∥) as ∥v∥ → 0 and the first sum differs from v · ∇f (a) by o(∥v∥).

Chain rule
When we are working in R we know that, if g and h are differentiable, then f (t) =
g ◦ h(t) is also differentiable and f ′ (t) = g ′ (h(t)) h′ (t). This is called the chain
rule and is frequently very useful in calculating derivatives. We now investigate how
this extends to higher dimensions.
Example. Suppose that α : R → R3 describes the position α(t) at time t and that
f : R3 → R describes the temperature f (α) at a point α. The temperature at time t
is equal to g(t) = f (α(t)). We want to calculate g ′ (t) because this is the change in
temperature with respect to time.
In situations like the above example it is convenient to consider the derivative of
a path α : R → Rn . Let α : R → Rn and suppose it has the form α(t) =
(α1 (t), . . . , αn (t)). We define the derivative as
α′ (t) := (α1′ (t), . . . , αn′ (t)) .
Here α′ is a vector-valued function which represents the “direction of movement”.
Figure 2.4: α(t) = (cos t, sin t, t), t ∈ R.

Theorem. Let S ⊂ Rn be open and I ⊂ R an interval. Let x : I → S and
f : S → R and define, for t ∈ I,
g(t) = f (x(t)).
Suppose that t ∈ I is such that x′ (t) exists and f is differentiable at x(t). Then g ′ (t)
exists and
g ′ (t) = ∇f (x(t)) · x′ (t).

Proof. Let h > 0 be small. Then
(1/h)[g(t + h) − g(t)] = (1/h)[f (x(t + h)) − f (x(t))]
= ∇f (x(t)) · (1/h)(x(t + h) − x(t)) + (1/h) ∥x(t + h) − x(t)∥ E(x(t), x(t + h) − x(t)),
where E is the error term coming from the differentiability of f and tends to 0 as its
second argument tends to 0. Observe that (1/h)(x(t + h) − x(t)) → x′ (t) as h → 0,
which gives the result.

Example. A particle moves in a circle and its position at time t ∈ [0, 2π] is given by
x(t) = (cos t, sin t).

Figure 2.5: x(t) is the position of a particle. Shading represents temperature f .
The temperature at a point y = (y1 , y2 ) is given by the function f (y) := y1 + y2 . The
temperature the particle experiences at time t is given by g(t) = f (x(t)). Temperature
change: g ′ (t) = ∇f (x(t)) · x′ (t) = (1, 1) · (− sin t, cos t) = cos t − sin t.
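This chain-rule computation can be checked numerically: the composed function is g(t) = cos t + sin t, so its derivative should agree with cos t − sin t.

```python
import math

# The particle's temperature is g(t) = f(x(t)) = cos t + sin t, and the
# chain rule predicts g'(t) = cos t - sin t.

g = lambda t: math.cos(t) + math.sin(t)

t, h = 1.2, 1e-6
g_prime_numeric = (g(t + h) - g(t - h)) / (2 * h)   # central difference
g_prime_formula = math.cos(t) - math.sin(t)         # chain-rule answer

assert abs(g_prime_numeric - g_prime_formula) < 1e-6
```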

2.3 Level sets & tangent planes
Let S ⊂ R2 , f : S → R. Suppose c ∈ R and let
L(c) = {x ∈ S : f (x) = c} .
The set L(c) is called a level set. In general this set can be empty or it can be all of S.
However the set L(c) is often a curve and this is the case of interest. This is the same
notion as that of contour lines on a map. Suppose a ∈ L(c) and x : I → R2 is a
differentiable path which lies in L(c) and passes through a, i.e., x(ta ) = a for some
ta ∈ I and
f (x(t)) = c
for all t ∈ I. Then
▷ ∇f (a) is normal to the curve at a,
▷ The tangent line at a is {x ∈ R2 : ∇f (a) · (x − a) = 0}.
This is because the chain rule implies that ∇f (x(t)) · x′ (t) = 0.

Example. Let f (x1 , x2 , x3 ) := x1² + x2² + x3² .
▷ If c > 0 then L(c) is a sphere,
▷ L(0) is a single point (0, 0, 0),
▷ If c < 0 then L(c) is empty.
Example. Let f (x1 , x2 , x3 ) := x1² + x2² − x3² . See Figure 2.6.
▷ If c > 0 then L(c) is a one-sheeted hyperboloid,
▷ L(0) is an infinite cone,
▷ If c < 0 then L(c) is a two-sheeted hyperboloid.

Figure 2.6: Various surfaces as level sets: (a) sphere, (b) two-sheeted hyperboloid,
(c) infinite cone, (d) one-sheeted hyperboloid.

Let f be a differentiable scalar field on S ⊂ R3 and suppose that the level set
L(c) = {x ∈ S : f (x) = c} defines a surface.
▷ The gradient ∇f (a) is normal to every curve α(t) in the surface which passes
through a,
▷ The tangent plane at a is {x ∈ R3 : ∇f (a) · (x − a) = 0}.
Same argument as in R2 works in Rn .

Figure 2.7: Tangent plane and normal vector.

2.4 Derivatives of vector fields
Essentially everything discussed above for scalar fields extends to vector fields in a predictable
way. This is because of linearity and because we can consider each component
of the vector field independently.
Definition 2.20 (directional derivative). Let S ⊂ Rn and f : S → Rm . For any
a ∈ int S and v ∈ Rn the derivative of the vector field f with respect to v is defined
as
Dv f (a) := limh→0 (1/h) (f (a + hv) − f (a)) .

Remark 2.21. If we use the notation f = (f1 , . . . , fm ), i.e., we write the function using
the “components” where each fk is a scalar field, then Dv f = (Dv f1 , . . . , Dv fm ).

Definition (differentiable). We say that f : Rn → Rm is differentiable at a if there
exists a linear transformation df a : Rn → Rm such that, for x ∈ B(a, r),
f (x) = f (a) + df a (x − a) + ϵ(x − a)
where ∥ϵ(x − a)∥ = o(∥x − a∥).
Theorem 2.22. If f is differentiable at a then f is continuous at a and df a (v) =
Dv f (a).
Proof. Same as for the case of scalar fields when f : Rn → R.

2.5 Jacobian matrix & the chain rule
The relevant differential for higher-dimensional functions is the Jacobian matrix.
Definition 2.23 (Jacobian matrix). Suppose that f : R2 → R2 and use the notation
f (x, y) = (f1 (x, y), f2 (x, y)). The Jacobian matrix of f at a is defined as
Df (a) = [ ∂f1 /∂x (a)   ∂f1 /∂y (a) ]
         [ ∂f2 /∂x (a)   ∂f2 /∂y (a) ].
The Jacobian matrix is defined analogously in any dimension. I.e., if f : Rn → Rm
then the Jacobian at a is the m × n matrix whose (j, k) entry is ∂k fj (a):
Df (a) = [ ∂1 f1 (a)  ∂2 f1 (a)  · · ·  ∂n f1 (a) ]
         [ ∂1 f2 (a)  ∂2 f2 (a)  · · ·  ∂n f2 (a) ]
         [    ...        ...             ...     ]
         [ ∂1 fm (a)  ∂2 fm (a)  · · ·  ∂n fm (a) ].
If we choose a basis then any linear transformation Rn → Rm can be written as an
m × n matrix. We find that df a (v) = Df (a)v.
Let S ⊂ Rn and f : S → Rm . If f is differentiable at a ∈ S then, for all
x ∈ B(a, r) ⊂ S,
f (x) = f (a) + Df (a)(x − a) + ϵ(x − a)
where ∥ϵ(x − a)∥ = o(∥x − a∥). This is like a Taylor expansion in higher dimensions.
Here we see that in higher dimensions we have a matrix form of the chain rule.
Theorem 2.24. Let S ⊂ Rl , T ⊂ Rm be open. Let f : S → T and g : T → Rn and
define
h = g ◦ f : S → Rn .
Let a ∈ S. Suppose that f is differentiable at a and g is differentiable at f (a). Then h
is differentiable at a and
Dh(a) = Dg(f (a)) Df (a).

Proof. Let u = f (a + v) − f (a). Since f and g are differentiable,
h(a + v) − h(a) = g(f (a + v)) − g(f (a))
= Dg(f (a))(f (a + v) − f (a)) + ϵg (u)
= Dg(f (a))Df (a)v + Dg(f (a))ϵf (v) + ϵg (u).
Since ∥u∥ ≤ C ∥v∥ for some C > 0 when ∥v∥ is small, the final two terms are
o(∥v∥), which proves the claim.

Example (polar coordinates). Here we consider polar coordinates and calculate the
Jacobian of this transformation. We can write the change of coordinates
(r, θ) 7→ (r cos θ, r sin θ)
as the function f (r, θ) = (x(r, θ), y(r, θ)) where f : (0, ∞) × [0, 2π) → R2 . We
calculate the Jacobian matrix of this transformation:
Df (r, θ) = [ ∂x/∂r (r, θ)   ∂x/∂θ (r, θ) ]   =   [ cos θ   −r sin θ ]
            [ ∂y/∂r (r, θ)   ∂y/∂θ (r, θ) ]       [ sin θ    r cos θ ].
In particular we see that det Df (r, θ) = r, the familiar value used in change of
variables with polar coordinates.
Suppose now that we wish to calculate derivatives of h := g ◦ f for some g : R2 →
R. Here we take advantage of Theorem 2.24:
Dh(r, θ) = Dg(f (r, θ)) Df (r, θ),
( ∂h/∂r (r, θ)   ∂h/∂θ (r, θ) ) = ( ∂g/∂x (f (r, θ))   ∂g/∂y (f (r, θ)) ) [ cos θ  −r sin θ ; sin θ  r cos θ ].
In other words, we have shown that
∂h/∂r (r, θ) = ∂g/∂x (r cos θ, r sin θ) cos θ + ∂g/∂y (r cos θ, r sin θ) sin θ,
∂h/∂θ (r, θ) = −r ∂g/∂x (r cos θ, r sin θ) sin θ + r ∂g/∂y (r cos θ, r sin θ) cos θ.

2.6 Implicit functions & partial derivatives
Just like with derivatives, we can take higher order partial derivatives. For convenience,
when we want to write ∂/∂y ∂/∂x f (x, y), i.e., differentiate first with respect to x and then
with respect to y, we write instead ∂²f /∂y∂x (x, y). The analogous notation is used for
higher derivatives and any other choice of coordinates.
We first consider the question of when
∂²f /∂y∂x (x, y) =? ∂²f /∂x∂y (x, y).
Example (partial derivative problem). Let f : R2 → R be defined as f (0, 0) = 0
and, for (x, y) ̸= (0, 0),
f (x, y) := xy(x² − y²)/(x² + y²).
We calculate that ∂²f /∂y∂x (0, 0) = −1 but ∂²f /∂x∂y (0, 0) = 1.
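The asymmetry of the mixed partials at the origin can be seen numerically. The sketch below approximates the inner derivatives with a small step and the outer derivatives with a much larger step (the step sizes must be well separated, otherwise the inner difference degenerates on the lines x = ±y).

```python
# Numerically check ∂²f/∂y∂x(0,0) = -1 and ∂²f/∂x∂y(0,0) = +1 for the
# classical counterexample f(x, y) = xy(x² - y²)/(x² + y²), f(0,0) = 0.

def f(x, y):
    if (x, y) == (0.0, 0.0):
        return 0.0
    return x * y * (x**2 - y**2) / (x**2 + y**2)

h1 = 1e-7   # inner step for the first derivatives
k = 1e-3    # outer step (much larger than h1) for the second derivatives

def fx(x, y):
    return (f(x + h1, y) - f(x - h1, y)) / (2 * h1)

def fy(x, y):
    return (f(x, y + h1) - f(x, y - h1)) / (2 * h1)

d2_yx = (fx(0.0, k) - fx(0.0, -k)) / (2 * k)   # ∂²f/∂y∂x at (0, 0)
d2_xy = (fy(k, 0.0) - fy(-k, 0.0)) / (2 * k)   # ∂²f/∂x∂y at (0, 0)

assert abs(d2_yx + 1.0) < 1e-3
assert abs(d2_xy - 1.0) < 1e-3
```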

Theorem 2.25. Let f : S → R be a scalar field such that the partial derivatives ∂f /∂x,
∂f /∂y and ∂²f /∂y∂x exist on an open set S ⊂ R2 containing x. Further assume that
∂²f /∂y∂x is continuous on S. Then the derivative ∂²f /∂x∂y (x) exists and
∂²f /∂x∂y (x) = ∂²f /∂y∂x (x).

In many cases we can choose to write a given curve/function either in implicit or
explicit form.
Implicit                      Explicit
x² − y = 0                    y(x) = x²
x² + y² = 1                   y(x) = ±√(1 − x²), |x| ≤ 1
x² − y² − 1 = 0               y(x) = ±√(x² − 1), |x| ≥ 1
x² + y² − e^y − 4 = 0         A mess?
x²y⁴ − 3 = sin(xy)            A huge mess?

Given the above observation, the following method of calculating derivatives is
sometimes useful. Suppose that some f : R2 → R is given and we suppose there
exists some y : R → R such that
f (x, y(x)) = 0 for all x.
Let h(x) := f (x, y(x)) and note that h′ (x) = 0. Here we are using the idea that
h = f ◦ g where g(x) = (x, y(x)). By the chain rule h′ (x) is equal to
( ∂f /∂x (x, y(x))   ∂f /∂y (x, y(x)) ) · (1, y ′ (x)) = 0.
Consequently, whenever ∂f /∂y (x, y(x)) ̸= 0,
y ′ (x) = − (∂f /∂x (x, y(x))) / (∂f /∂y (x, y(x))).
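The implicit-derivative formula can be tested on the circle from the table above: for f (x, y) = x² + y² − 1 it gives y′(x) = −x/y, which should agree with differentiating the explicit branch y(x) = √(1 − x²) directly.

```python
import math

# Implicit differentiation on the circle x² + y² - 1 = 0.

x = 0.3
y = math.sqrt(1 - x**2)          # explicit branch through (x, y), y > 0

# Formula y'(x) = -(∂f/∂x)/(∂f/∂y) with ∂f/∂x = 2x, ∂f/∂y = 2y.
y_prime_implicit = -(2 * x) / (2 * y)

# Direct central-difference derivative of the explicit branch.
h = 1e-6
y_prime_direct = (math.sqrt(1 - (x + h)**2) - math.sqrt(1 - (x - h)**2)) / (2 * h)

assert abs(y_prime_implicit - y_prime_direct) < 1e-6
```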

Chapter 3
Extrema & other applications
In the previous chapter we introduced various notions of differentials for higher
dimensional functions (scalar fields, vector fields, paths, etc.). In this chapter
we explore various applications of these notions and work with some of their
implementations, rather than just the objects. Firstly we will consider certain partial
differential equations which we now have the tools to solve. Then the majority of the
chapter is devoted to searching for extrema (minima / maxima) in various different
scenarios. This extends what we already know for functions in R and we will find that
in higher dimensions many more possibilities and subtleties exist.

3.1 Partial differential equations
There are a huge number of different types of partial differential equations (PDEs)
and here we consider just two types: first order linear PDEs and the 1D wave equation.
We start by considering an example of the first type.
Example. Find all solutions of the PDE 3 ∂f /∂x (x, y) + 2 ∂f /∂y (x, y) = 0.
Solution. The given PDE is equivalent to (3, 2) · ∇f (x, y) = 0. We can also phrase
this in terms of the directional derivative, namely
Dv f (x, y) = 0 where v = (3, 2).
This means that if a function f is a solution to the PDE then it is constant in the
direction (3, 2). This means that all solutions have the form f (x, y) = g(2x − 3y) for
some g : R → R.
The same idea as used for the above example gives the following general result.
Theorem 3.1. Let g : R → R be differentiable, a, b ∈ R, (a, b) ̸= (0, 0). If
f (x, y) = g(bx − ay) then
a ∂f /∂x (x, y) + b ∂f /∂y (x, y) = 0.
Conversely, every f which satisfies this equation is of the form g(bx − ay).

Proof. First we prove (⇒). If f (x, y) = g(bx − ay) then, by the chain rule,
∂x f (x, y) = bg ′ (bx − ay), ∂y f (x, y) = −ag ′ (bx − ay).
Consequently a∂x f (x, y) + b∂y f (x, y) = abg ′ (bx − ay) − abg ′ (bx − ay) = 0.
Now we prove (⇐). It’s convenient to work in coordinates which correspond to the
lines along which the solutions are constant. Let (u, v) = (ax + by, bx − ay). This
means that (x, y) = ((au + bv)/(a² + b²), (bu − av)/(a² + b²)). Let
h(u, v) = f ((au + bv)/(a² + b²), (bu − av)/(a² + b²)). We calculate that
∂u h(u, v) = (1/(a² + b²)) (a∂x f + b∂y f ) ((au + bv)/(a² + b²), (bu − av)/(a² + b²)) = 0.
Namely, h(u, v) is a function of v only and does not depend on u so we take g(v) =
h(u, v) and so f (x, y) = g(bx − ay).
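Theorem 3.1 can be spot-checked numerically for the example case a = 3, b = 2: any f (x, y) = g(2x − 3y) should satisfy 3 ∂f/∂x + 2 ∂f/∂y = 0. The sketch below takes g = sin, an arbitrary choice for illustration.

```python
import math

# Check a ∂f/∂x + b ∂f/∂y = 0 for a = 3, b = 2 and f(x, y) = g(2x - 3y).

f = lambda x, y: math.sin(2 * x - 3 * y)   # g = sin, chosen for the test

x, y, h = 0.4, -0.9, 1e-6
fx = (f(x + h, y) - f(x - h, y)) / (2 * h)   # ∂f/∂x ≈ 2 cos(2x - 3y)
fy = (f(x, y + h) - f(x, y - h)) / (2 * h)   # ∂f/∂y ≈ -3 cos(2x - 3y)

assert abs(3 * fx + 2 * fy) < 1e-6
```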

Now we look at another type of PDE. The 1D wave equation is
∂²f /∂t² (x, t) = c² ∂²f /∂x² (x, t).
Here x represents the position along the string, t is time and f (x, t) is the displacement of
the string from the centre at position x, at time t. The constant c is a fixed parameter
depending on the string.
This partial differential equation is derived from the equation of motion F = ma
where F is the tension in the string, a is the acceleration from horizontal and m is the
mass of a little piece of the string. The equation is valid for small displacement. In this
case the boundary conditions are natural: Are the ends of the string fixed? Is only one
end fixed? At time t = 0, is the string already moving?

Theorem 3.2. Let F be a twice differentiable function and G a differentiable function.
1. The function defined as
f (x, t) = (1/2) (F (x + ct) + F (x − ct)) + (1/(2c)) ∫_{x−ct}^{x+ct} G(s) ds      (3.1)
satisfies ∂²f /∂t² (x, t) = c² ∂²f /∂x² (x, t), f (x, 0) = F (x) and ∂f /∂t (x, 0) = G(x).
2. Conversely, if a solution of
∂²f /∂t² (x, t) = c² ∂²f /∂x² (x, t)
satisfies ∂²f /∂x∂t = ∂²f /∂t∂x, then it has the above form (3.1).

Proof of part 1. Let f (x, t) be as defined in (3.1) in the statement of the theorem. We
calculate the partial derivatives:
∂f /∂x (x, t) = (1/2)(F ′ (x + ct) + F ′ (x − ct)) + (1/(2c))(G(x + ct) − G(x − ct)),
∂²f /∂x² (x, t) = (1/2)(F ′′ (x + ct) + F ′′ (x − ct)) + (1/(2c))(G′ (x + ct) − G′ (x − ct)),
∂f /∂t (x, t) = (c/2)(F ′ (x + ct) − F ′ (x − ct)) + (1/2)(G(x + ct) + G(x − ct)),
∂²f /∂t² (x, t) = (c²/2)(F ′′ (x + ct) + F ′′ (x − ct)) + (c/2)(G′ (x + ct) − G′ (x − ct)).
From this calculation we see that ∂²f /∂t² (x, t) = c² ∂²f /∂x² (x, t). Additionally we have
f (x, 0) = F (x) and ∂f /∂t (x, 0) = G(x).
Proof of part 2. Suppose that f satisfies the 1D wave equation. Introduce u = x + ct,
v = x − ct and observe that x = (u + v)/2, t = (u − v)/(2c). Define
g(u, v) = f ((u + v)/2, (u − v)/(2c)). By the chain rule,
∂g/∂u (u, v) = (1/2) ∂f /∂x ((u + v)/2, (u − v)/(2c)) + (1/(2c)) ∂f /∂t ((u + v)/2, (u − v)/(2c)),
∂²g/∂v∂u (u, v) = (1/4) ∂²f /∂x² − (1/(4c)) ∂²f /∂x∂t + (1/(4c)) ∂²f /∂x∂t − (1/(4c²)) ∂²f /∂t² = 0,
where the second derivatives of f are evaluated at ((u + v)/2, (u − v)/(2c)) and the
final equality uses the equality of the mixed partials and the wave equation.

Since the second derivative is zero we know that ∂g/∂u is constant in v, therefore we can
write ∂g/∂u (u, v) = φ0 (u). In turn this means we can write g(u, v) = φ1 (u) + φ2 (v),
where φ′1 = φ0 . I.e., f (x, t) = φ1 (x + ct) + φ2 (x − ct). Let
F (x) = f (x, 0) = φ1 (x) + φ2 (x).
This means that F ′ (x) = φ′1 (x) + φ′2 (x) and ∂f /∂t (x, t) = cφ′1 (x + ct) − cφ′2 (x − ct).
Let
G(x) = ∂f /∂t (x, 0) = cφ′1 (x) − cφ′2 (x).
Solving for φ′1 and φ′2 and substituting these quantities we see that the required
form (3.1) is satisfied.
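The d'Alembert formula (3.1) can be tested numerically. Taking F = sin, G = cos and c = 2 (an arbitrary illustrative choice), the integral of G evaluates in closed form, and second central differences should confirm the wave equation.

```python
import math

# With F = sin, G = cos, c = 2, formula (3.1) becomes
#   f(x, t) = (sin(x+ct) + sin(x-ct))/2 + (sin(x+ct) - sin(x-ct))/(2c),
# and f should satisfy ∂²f/∂t² = c² ∂²f/∂x² with f(x, 0) = sin x.

c = 2.0

def f(x, t):
    return (0.5 * (math.sin(x + c * t) + math.sin(x - c * t))
            + (math.sin(x + c * t) - math.sin(x - c * t)) / (2 * c))

x, t, h = 0.7, 0.3, 1e-4
fxx = (f(x + h, t) - 2 * f(x, t) + f(x - h, t)) / h**2   # ∂²f/∂x²
ftt = (f(x, t + h) - 2 * f(x, t) + f(x, t - h)) / h**2   # ∂²f/∂t²

assert abs(ftt - c**2 * fxx) < 1e-4          # the wave equation holds
assert abs(f(x, 0.0) - math.sin(x)) < 1e-12  # initial condition f(x,0) = F(x)
```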

3.2 Extrema (minima / maxima / saddle)
Let S ⊂ Rn be open, f : S → R be a scalar field and a ∈ S.
Definition 3.3 (absolute min/max). If f (a) ≤ f (x) (resp. f (a) ≥ f (x)) for all
x ∈ S, then f (a) is said to be the absolute minimum (resp. maximum) of f .
Definition 3.4 (relative min/max). If f (a) ≤ f (x) (resp. f (a) ≥ f (x)) for all
x ∈ B(a, r) for some r > 0, then f (a) is said to be a relative minimum (resp.
maximum) of f .
Collectively we call these points the extrema of the scalar field. In the case of a
scalar field defined on R2 we can visualize the scalar field as a 3D plot like Figure 3.1.
Here we see the extrema as the “flat” places. We sometimes use global as a synonym of
absolute and local as a synonym of relative.
To proceed it is convenient to connect the extrema with the behaviour of the gradient
of the scalar field.

Theorem 3.5. If f : S → R is differentiable and has a relative minimum or maximum
at a, then ∇f (a) = 0.
Proof. Suppose f has a relative minimum at a (or consider −f ). For any unit vector
v let g(u) = f (a + uv). We know that g : R → R has a relative minimum at u = 0
so g ′ (0) = 0. This means that the directional derivative Dv f (a) = 0 for every v.
Consequently this means that ∇f (a) = 0.

Figure 3.1: The graph of a scalar field f (x, y) with several relative extrema.

Observe that here and in the subsequent text, we can always consider the case of
f : R → R, i.e., the case of Rn where n = 1. Everything still holds and reduces to the
arguments and formulae previously developed for functions of one variable.

Definition 3.6 (stationary point). If ∇f (a) = 0 then a is called a stationary point.

As we see in the example of Figure 3.2, the converse of Theorem 3.5 fails in the sense
that a stationary point might not be a minimum or a maximum. This motivates the
following.

Definition 3.7 (saddle point). If ∇f (a) = 0 and a is neither a minimum nor a


maximum then a is said to be a saddle point.

The quintessential saddle has the shape seen in Figure 3.4. However it might be
similar to Figure 3.2 or more complicated using the possibilities available in higher
dimension.

Figure 3.2: ∇f (a) = 0 doesn’t imply a minimum or maximum at a,
even in R, as seen with the function f (x) := x³. In higher dimensions
even more is possible.

Figure 3.3: If f (x, y) = x² + y² then ∇f (x, y) = (2x, 2y) and
∇f (0, 0) = (0, 0). The point (0, 0) is an absolute minimum for f .

Figure 3.4: If f (x, y) = x² − y² then ∇f (x, y) = (2x, −2y) and
∇f (0, 0) = (0, 0). The point (0, 0) is a saddle point for f .

3.3 Hessian matrix

To proceed it is useful to develop the idea of a second order Taylor expansion in


this higher dimensional setting. In particular this will allow us to identify the local
behaviour close to stationary points. The main object for doing this is the Hessian
matrix.

Definition 3.8 (Hessian matrix). Let f : R2 → R be twice differentiable and use the
notation f (x, y). The Hessian matrix at a ∈ R2 is defined as
Hf (a) = [ ∂²f /∂x² (a)    ∂²f /∂x∂y (a) ]
         [ ∂²f /∂y∂x (a)   ∂²f /∂y² (a)  ].

Observe that the Hessian matrix Hf (a) is a symmetric matrix since we know that
∂²f /∂x∂y (a) = ∂²f /∂y∂x (a) for twice differentiable functions (Theorem 2.25). The
Hessian matrix is defined analogously in any dimension as follows. Let f : Rn → R
be twice differentiable. The Hessian matrix at a ∈ Rn is the n × n matrix whose
(j, k) entry is ∂²f /∂xj ∂xk (a).

Observe that the Hessian matrix is a real symmetric matrix in any dimension. If
f : R → R then Hf (a) is a 1 × 1 matrix and coincides with the second derivative of
f . In this sense what we know about extrema in R is just a special case of everything
we do here.
Lemma. If v = (v1 , . . . , vn )t then1 vt Hf (a) v = ∑j,k=1..n ∂j ∂k f (a)vj vk ∈ R.

Proof. Multiplying the matrices we calculate that2
vt Hf (a) v = (v1 · · · vn ) [ ∂1 ∂1 f (a) · · · ∂1 ∂n f (a) ; ... ; ∂n ∂1 f (a) · · · ∂n ∂n f (a) ] (v1 , . . . , vn )t
= ∑j,k=1..n ∂j ∂k f (a)vj vk
as required.
Example. Let f (x, y) = x² − y² (Figure 3.4). The gradient and the Hessian are
respectively
∇f (x, y) = (∂f /∂x (x, y), ∂f /∂y (x, y)) = (2x, −2y),
Hf (x, y) = [ ∂²f /∂x² (x, y)    ∂²f /∂x∂y (x, y) ]   =   [ 2   0 ]
            [ ∂²f /∂y∂x (x, y)   ∂²f /∂y² (x, y)  ]       [ 0  −2 ].
1 The notation vt denotes the transpose of the vector v.
2 For convenience, here and in many other places of this text, we use the notation ∂j ∂k f (a) =
∂²f /∂xj ∂xk (a).
The point (0, 0) is a stationary point since ∇f (0, 0) = (0, 0). In this example Hf
does not depend on (x, y) but in general we can expect dependence and so it gives a
different matrix at different points (x, y).

Second order Taylor formula for scalar fields
First let’s recall the first order Taylor approximation from Theorem 2.18. If f is differentiable
at a then f (x) ≈ f (a) + ∇f (a) · (x − a). If a is a stationary point then
this only tells us that f (x) ≈ f (a), so a natural next question is to search for slightly
more detailed information.

Theorem 3.9 (second order Taylor). Let f be a scalar field twice differentiable on
B(a, r). Then,a for x close to a,
f (x) ≈ f (a) + ∇f (a) · (x − a) + (1/2) (x − a)t Hf (a) (x − a)
in the sense that the error is o(∥x − a∥²).
a We use the convention that (x − a) is a vertical vector, equivalently, an n × 1 matrix.

Proof. Let v = x − a and let g(u) = f (a + uv). The Taylor expansion of g
tells us that g(1) = g(0) + g ′ (0) + (1/2)g ′′ (c) for some c ∈ (0, 1). Since g(u) =
f (a1 + uv1 , . . . , an + uvn ), by the chain rule,
g ′ (u) = ∑j=1..n ∂j f (a + uv)vj = ∇f (a + uv) · v,
g ′′ (u) = ∑j,k=1..n ∂j ∂k f (a + uv)vj vk = vt Hf (a + uv) v.
Consequently f (a + v) = f (a) + ∇f (a) · v + (1/2)vt Hf (a + cv) v. We define the
“error” in the approximation as ϵ(v) = (1/2)vt (Hf (a + cv) − Hf (a))v and estimate
that
|ϵ(v)| ≤ ∑j,k=1..n |vj vk | |∂j ∂k f (a + cv) − ∂j ∂k f (a)| .
Since |vj vk | ≤ ∥v∥² we observe that |ϵ(v)| /∥v∥² → 0 as ∥v∥ → 0 as required.

3.4 Classifying stationary points
In order to classify the stationary points we will take advantage of the Hessian matrix
and therefore we need first to understand the following fact about real symmetric matrices.

Theorem 3.10. Let A be a real symmetric matrix and let Q(v) = vt Av. Then
Q(v) > 0 for all v ̸= 0 ⇐⇒ all eigenvalues of A are positive,
Q(v) < 0 for all v ̸= 0 ⇐⇒ all eigenvalues of A are negative.
Proof. Since A is symmetric it can be diagonalised by a matrix B which is orthogonal
(B t = B −1 ): the diagonal matrix D = B t AB has the eigenvalues λ1 , . . . , λn of A on
the diagonal. This means that Q(v) = vt BDB t v = wt Dw where w = B t v, and
consequently Q(v) = ∑j λj wj². Observe that, if all λj > 0 then ∑j λj wj² > 0 (note
that w ̸= 0 whenever v ̸= 0). In order to prove the other direction in the “if and only
if” statement, observe that Q(Buk ) = λk , where uk is the k th standard basis vector.
This means that, if Q(v) > 0 for all v ̸= 0 then λk > 0 for all k. The second statement
follows by applying the first to −A.

Theorem 3.11 (classification of stationary points). Let f be a scalar field twice differentiable
on B(a, r). Suppose ∇f (a) = 0 and consider the eigenvalues of Hf (a).
Then
All eigenvalues are positive =⇒ relative minimum at a,
All eigenvalues are negative =⇒ relative maximum at a,
Some positive, some negative =⇒ a is a saddle point.
Proof. Let Q(v) = vt Hf (a)v, let w = B t v as in the proof of Theorem 3.10 and, in
the case that all eigenvalues are positive, let Λ := minj λj > 0. Observe that
∥w∥ = ∥v∥ and that Q(v) = ∑j λj wj² ≥ Λ ∑j wj² = Λ ∥v∥². By the second order
Taylor formula,
f (a + v) − f (a) = (1/2)vt Hf (a) v + ϵ(v) ≥ (Λ/2 − |ϵ(v)| /∥v∥²) ∥v∥².
Since |ϵ(v)| /∥v∥² → 0 as ∥v∥ → 0, we have |ϵ(v)| /∥v∥² < Λ/2 when ∥v∥ is small,
and so f (a + v) > f (a) for all small v ̸= 0, i.e., a is a relative minimum. The argument
is analogous for the second part. For the final part consider the eigenvectors vj
corresponding to a positive and a negative eigenvalue λj and apply the argument of
the first or second part along these directions.
3.5 Attaining extreme values
Here we explore the extreme value theorem for continuous scalar fields. The argument
will be in two parts: firstly we show that continuity implies boundedness; secondly
we show that boundedness implies that the maximum and minimum are attained.
We use the following notation for intervals / rectangles / cuboids / tesseracts, etc. If
a = (a1 , . . . , an ) and b = (b1 , . . . , bn ) then we consider the n-dimensional closed
Cartesian product
[a, b] = [a1 , b1 ] × · · · × [an , bn ].
We call this set a rectangle (independent of the dimension). As a first step it is convenient
to know that all sequences in our setting have convergent subsequences.

Theorem 3.12 (Bolzano–Weierstrass). If {xn }n is a sequence in [a, b] there exists a


convergent subsequence {xnj }j .

Proof. In order to prove the theorem we construct the subsequence. Firstly we divide [a, b] into sub-rectangles of half the original size. We then choose a sub-rectangle which contains infinitely many elements of the sequence and choose the first of these elements to be part of the subsequence. We repeat this process, again dividing the chosen sub-rectangle in half and choosing the next element of the subsequence. Iterating gives the full subsequence; since the diameters of the nested sub-rectangles shrink to zero, the subsequence is Cauchy and hence converges.

Theorem 3.13 (boundedness of continuous scalar fields). Suppose that f is a scalar


field continuous at every point in the closed rectangle [a, b]. Then f is bounded on [a, b]
in the sense that there exists C > 0 such that |f (x)| ≤ C for all x ∈ [a, b].

Proof. Suppose the contrary: for all n ∈ N there exists x_n ∈ [a, b] such that |f(x_n)| > n. The Bolzano–Weierstrass theorem means that there exists a subsequence {x_{n_j}}_j which converges to some x ∈ [a, b]. Continuity of f means that f(x_{n_j}) converges to f(x), yet |f(x_{n_j})| > n_j → ∞. This is a contradiction and hence the theorem is proved.

We can now use the above result on the boundedness in order to show that the
extreme values are actually obtained.

Theorem 3.14 (extreme value theorem). Suppose that f is a scalar field continuous at every point in the closed rectangle [a, b]. Then there exist points x, y ∈ [a, b] such that
f (x) = inf f and f (y) = sup f.

Proof. By the boundedness theorem sup f is finite and so there exists a sequence {x_n}_n such that f(x_n) converges to sup f. The Bolzano–Weierstrass theorem implies that there exists a subsequence {x_{n_j}}_j which converges to some y ∈ [a, b]. By continuity f(x_{n_j}) → f(y) = sup f. The argument for inf f is analogous.

3.6 Extrema with constraints (Lagrange multipliers)
We now consider a slightly different problem to the one earlier in this chapter. There
we wished to find the extrema of a given scalar field. Here the general problem is to
minimise or maximise a given scalar field f (x, y) under the constraint g(x, y) = 0.
Subsequently we will also consider the same problem but in higher dimensions. We
can visualize this problem as shown in Figure 3.5. For this graphic representation we
draw the constraint and also various level sets of the function that we want to find the
extrema of. The graphical representation suggests to us that at the “touching point”
the gradient vectors are parallel. In other words, ∇f = λ∇g for some λ ∈ R. The
implementation of this idea is the method of Lagrange multipliers.

Theorem 3.15 (Lagrange multipliers in 2D). Suppose that a differentiable scalar field
f (x, y) has a relative minimum or maximum when it is subject to the constraint
g(x, y) = 0.
Then there exists a scalar λ such that, at the extremum point,
∇f = λ∇g.

In three dimensions a similar result holds.
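To illustrate the 2D theorem, here is a small numerical sketch (our own example, not from the notes): we maximise f(x, y) = xy on the unit circle g(x, y) = x² + y² − 1 = 0 by scanning the constraint curve, then check that ∇f and ∇g are parallel at the maximiser.

```python
import math

# Maximise f(x, y) = x*y subject to g(x, y) = x^2 + y^2 - 1 = 0 by
# scanning the constraint curve, then verify the Lagrange condition
# grad f = lambda * grad g at the maximiser: the 2D cross product of
# the two gradients must vanish where they are parallel.

f = lambda x, y: x * y
grad_f = lambda x, y: (y, x)
grad_g = lambda x, y: (2 * x, 2 * y)

best_t = max((k * 2 * math.pi / 10000 for k in range(10000)),
             key=lambda t: f(math.cos(t), math.sin(t)))
x, y = math.cos(best_t), math.sin(best_t)

fx, fy = grad_f(x, y)
gx, gy = grad_g(x, y)
cross = fx * gy - fy * gx  # zero iff grad f is parallel to grad g
print(round(f(x, y), 3), abs(cross) < 1e-3)
```

The maximum found is f = 1/2, attained at (±1/√2, ±1/√2), exactly where y·2y − x·2x = 2(y² − x²) vanishes, i.e., where ∇f = λ∇g.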

Figure 3.5: Searching for extrema of f under the constraint g(x, y) = 0. The figure shows level sets f = c₁, f = c₂, f = c₃ together with the gradients ∇f and ∇g at the touching point.

Theorem 3.16 (Lagrange multipliers in 3D). Suppose that a differentiable scalar field
f (x, y, z) has a relative minimum or maximum when it is subject to the constraints
g1 (x, y, z) = 0, g2 (x, y, z) = 0
and the ∇gk are linearly independent. Then there exist scalars λ1 , λ2 such that, at the
extremum point,
∇f = λ1 ∇g1 + λ2 ∇g2 .

In higher dimensions and possibly with additional constraints we have the following
general theorem.

Theorem (Lagrange multipliers). Suppose that a differentiable scalar field f(x₁, . . . , x_n) has a relative extremum when it is subject to m constraints
g1 (x1 , . . . , xn ) = 0, . . . , gm (x1 , . . . , xn ) = 0,
where m < n, and the ∇gk are all linearly independent. Then there exist m scalars
λ1 , . . . , λm such that, at each extremum point,
∇f = λ1 ∇g1 + · · · + λm ∇gm .

The Lagrange multiplier method is often stated and far less often proved. Since the proof is rather involved we will follow this tradition here. See, for example, Chapter 14 of “A First Course in Real Analysis” (2012) by Protter & Morrey for a complete proof and further discussion.
Let us consider a particular case of the method when n = 3 and m = 2. More pre-
cisely we consider the following problem: Find the maxima and minima of f (x, y, z)
along the curve C defined as
g1 (x, y, z) = 0, g2 (x, y, z) = 0
where g1 , g2 are differentiable functions. In this particular case we will prove the
Lagrange multiplier method. Suppose that a is some point on the curve. Let α(t) denote a path which lies in the curve C in the sense that α(t) ∈ C for all t ∈ (−1, 1), α′(t) ≠ 0 and α(0) = a. If a is a local minimum for f restricted to C, this means that f(α(t)) ≥ f(α(0)) for all t ∈ (−δ, δ) for some δ > 0. In words, moving away from a along the curve C doesn't cause f to decrease. Let h(t) = f(α(t)) and observe
that h : R → R so we know how to find the extrema. In particular we know that
h′ (0) = 0. By the chain rule h′ (t) = ∇f (α(t)) · α′ (t) and so
∇f (a) · α′ (0) = 0.
Since we know that g1 (α(t)) = 0 and g2 (α(t)) = 0, again by the chain rule,
∇g1 (a) · α′ (0) = 0, ∇g2 (a) · α′ (0) = 0.
To proceed it is convenient to isolate the following result of linear algebra.

Lemma. Consider w, u1 , u2 ∈ R3 and let V = {v : uk · v = 0, k = 1, 2}. If


w · v = 0 for all v ∈ V then w = λ1 u1 + λ2 u2 for some λ1 , λ2 ∈ R.

Proof. We can write w = λ1 u1 + λ2 u2 + v0 where v0 ∈ V because u1 , u2 together


with V must span R3 . Since v0 ∈ V and, by assumption, w · v0 = 0,
0 = w · v0 = (λ1 u1 + λ2 u2 + v0 ) · v0 = v0 · v0 = ∥v0 ∥2 .
This means that v0 = 0 and so w = λ1 u1 + λ2 u2 .
The above lemma also holds in any dimension with any number of vectors with the
analogous proof. Applying this lemma to the vectors ∇f (a), ∇g1 (a) and ∇g2 (a)
recovers exactly the Lagrange multiplier method in this setting.

Chapter 4
Curves & line integrals

Curves have played a part in earlier parts of the course and now we turn our attention to precisely what we mean by this notion. Up until now we relied more on an intuition, an idea of some type of 1D subset of higher dimensional space. We will also define how we can integrate scalar and vector fields along these curves. These types of integrals have a natural and important physical relevance. We will then study some of the properties of these integrals. To start let's recall a random selection of curves we have already seen:

Circle: x² + y² = 4
Semi-circle: x² + y² = 4, x ≥ 0
Ellipse: ¼x² + ⅑y² = 4
Line: y = 5x + 2
Line (in 3D): x + 2y + 3z = 0, x = 4y
Parabola (in 3D): y = x², z = x

In the above list each curve is written as a set of points described by one or more constraints, in some cases in implicit form, in some cases in explicit form. For example, for the circle we formally mean the set {(x, y) : x² + y² = 4}. We have the idea that the curves should be sets which
are single connected pieces and we vaguely have an idea that we need curves that are
sufficiently smooth. To proceed we need a precise definition of the 1D objects we can
work with. As part of the definition we force a structure which really allows us to work
with these objects in a useful way.

4.1 Curves, paths & line integrals
Let α : [a, b] → Rn be continuous. For convenience, in components we write
α(t) = (α1 (t), . . . , αn (t)). We say that α(t) is differentiable if each component
αk (t) is differentiable on [a, b] and αk′ (t) is continuous (Definition 2.16). We say that
α(t) is piecewise differentiable if [a, b] = [a, c1 ] ∪ [c1 , c2 ] ∪ · · · ∪ [cl , b] and α(t) is
differentiable on each of these intervals.

Definition 4.1. If α : [a, b] → Rn is piecewise differentiable then we call it a path.

Note that different functions can trace out the same curve in different ways. Also
note that a path has an inherent direction. We say that this is a parametric representation
of a given curve. We already saw examples of paths in Figure 2.4 and Figure 2.5. A few
examples of paths are as follows.

α(t) = (t, t), t ∈ [0, 1]


α(t) = (cos t, sin t), t ∈ [0, 2π]
α(t) = (cos t, sin t), t ∈ [− π2 , π2 ]
α(t) = (cos t, − sin t), t ∈ [0, 2π]
α(t) = (t, t, t), t ∈ [0, 1]
α(t) = (cos t, sin t, t), t ∈ [−10, 10]

Observe how some of these paths represent the same curve, perhaps traversed in a
different direction.
Let α(t) be a (piecewise differentiable) path on [a, b] and let f : Rⁿ → Rⁿ be a continuous vector field. Recall that we consider α′(t) and f(x) as n-vectors, i.e., in the case n = 2, α′(t) = (α₁′(t), α₂′(t))ᵗ and f(x) = (f₁(x), f₂(x))ᵗ.

Definition 4.2 (line integral of a vector field). The line integral of the vector field f along the path α is defined as
∫ f · dα = ∫ₐᵇ f(α(t)) · α′(t) dt.

Sometimes the same integral is written as ∫_C f · dα to emphasize that the integral is along the curve C. Alternatively the integral is sometimes written as ∫ f₁ dα₁ + · · · + f_n dα_n or ∫ f₁ dx₁ + · · · + f_n dx_n. Each of these different notations is in common usage in different contexts but the underlying quantity is always the same.
Example. Consider the vector field f(x, y) = (√y, x³ + y) and the path α(t) = (t², t³) for t ∈ [0, 1]. Evaluate ∫ f · dα.

Solution. We start by calculating
α′(t) = (2t, 3t²)ᵗ, f(α(t)) = (t^{3/2}, t⁶ + t³)ᵗ.
This means that f(α(t)) · α′(t) = 2t^{5/2} + 3t⁸ + 3t⁵ and so
∫ f · dα = ∫₀¹ (2t^{5/2} + 3t⁸ + 3t⁵) dt = 4/7 + 1/3 + 1/2 = 59/42.
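The value 59/42 in the worked example above can be sanity-checked numerically; the following sketch (our own, not part of the notes) applies the midpoint rule to f(α(t)) · α′(t).

```python
import math

# Numerically check the worked example: f(x, y) = (sqrt(y), x^3 + y)
# along alpha(t) = (t^2, t^3), t in [0, 1]. The integrand is
# f(alpha(t)) . alpha'(t) = 2 t^(5/2) + 3 t^8 + 3 t^5.

def integrand(t):
    x, y = t ** 2, t ** 3               # alpha(t)
    dx, dy = 2 * t, 3 * t ** 2          # alpha'(t)
    fx, fy = math.sqrt(y), x ** 3 + y   # f(alpha(t))
    return fx * dx + fy * dy

n = 20000
h = 1.0 / n
approx = sum(integrand((k + 0.5) * h) for k in range(n)) * h
print(abs(approx - 59 / 42) < 1e-6)  # True
```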

4.2 Basic properties of the line integral

Having defined the line integral, the next step is to clarify its behaviour, in particular the following key properties.
Linearity: Suppose f, g are vector fields and α(t) is a path. For any c, d ∈ R,
∫ (cf + dg) · dα = c ∫ f · dα + d ∫ g · dα.
Joining / splitting paths: Suppose f is a vector field and that
α(t) = α₁(t) for t ∈ [a, c] and α(t) = α₂(t) for t ∈ [c, b]
is a path. Then
∫ f · dα = ∫ f · dα₁ + ∫ f · dα₂.
Alternatively, if we write C, C₁, C₂ for the corresponding curves, then
∫_C f · dα = ∫_{C₁} f · dα + ∫_{C₂} f · dα.

As already mentioned, for a given curve there are many different choices of parametrization. For example, consider the curve C = {(x, y) : x² + y² = 1, y ≥ 0}. This is a semi-circle and two possible parametrizations are α(t) = (−t, √(1 − t²)), t ∈ [−1, 1] and β(t) = (cos t, sin t), t ∈ [0, π]. These are just two possibilities among many. For a given curve, to what extent does the line integral depend on the choice of parametrization?

Definition 4.3 (equivalent paths). We say that two paths α(t) and β(t) are equivalent
if there exists a differentiable function u : [c, d] → [a, b] such that α(u(t)) = β(t).
Furthermore, we say that α(t) and β(t) are
▷ in the same direction if u(c) = a and u(d) = b,
▷ in the opposite direction if u(c) = b and u(d) = a.

With this terminology we can precisely describe the dependence of the integral on
the choice of parametrization.

Theorem 4.4 (change of parametrization). Let f be a continuous vector field and let α, β be equivalent paths. Then
∫ f · dα = ∫ f · dβ if the paths are in the same direction,
∫ f · dα = −∫ f · dβ if the paths are in the opposite direction.

Proof. Suppose that the paths are continuously differentiable (decomposing them into pieces if required). Since α(u(t)) = β(t) the chain rule implies that β′(t) = α′(u(t)) u′(t). In particular
∫ f · dβ = ∫_c^d f(β(t)) · β′(t) dt = ∫_c^d f(α(u(t))) · α′(u(t)) u′(t) dt.
Changing variables to s = u(t), and adding a minus sign if the paths are in the opposite direction because we need to swap the limits of integration, completes the proof.

Gradients & work


Let h(x, y) be a scalar field on R² and recall that the gradient ∇h(x, y) is a vector field. Let α(t), t ∈ [0, 1] be a path. Now let g(t) = h(α(t)), consider the derivative g′(t) = ∇h(α(t)) · α′(t) and evaluate the line integral
∫ ∇h · dα = ∫₀¹ ∇h(α(t)) · α′(t) dt = ∫₀¹ g′(t) dt = g(1) − g(0) = h(α(1)) − h(α(0)).
This equality has the following intuitive interpretation if we suppose for a moment that h denotes altitude: the line integral is the sum of all the infinitesimal altitude changes and equals the total change in altitude.
As a first example of work in physics let's consider gravity. The gravitational field on earth is f(x, y, z) = (0, 0, mg)ᵗ. If we move a particle from a = (a₁, a₂, a₃) to b = (b₁, b₂, b₃) along the path α(t), t ∈ [0, 1], then the work done is defined as ∫ f · dα. We calculate that
∫ f · dα = ∫₀¹ f(α(t)) · α′(t) dt = mg ∫₀¹ α₃′(t) dt = mg [α₃(t)]₀¹ = mg(b₃ − a₃).
This coincides with what we know: the work done depends only on the change in height.
As a second example of work in physics let's consider a particle moving in a force field. Let f be the force field and let x(t) be the position at time t of a particle moving in the field. Let v(t) = x′(t) be the velocity at time t of the particle and define the kinetic energy as (m/2)∥v(t)∥². According to Newton's law f(x(t)) = mx″(t) = mv′(t) and so the work done is
∫ f · dx = ∫₀¹ f(x(t)) · v(t) dt = ∫₀¹ m v′(t) · v(t) dt = ∫₀¹ d/dt [(m/2)∥v(t)∥²] dt = (m/2)∥v(1)∥² − (m/2)∥v(0)∥².
In this case we see, as expected, that the work done on the particle moving in the force field is equal to the change in kinetic energy.

4.3 The second fundamental theorem
Recall that, if φ : R → R is differentiable then ∫ₐᵇ φ′(t) dt = φ(b) − φ(a). This is called the second fundamental theorem of calculus and is one of the ways in which we see that differentiation and integration are opposites. The analogue for line integrals is the following.

Theorem 4.5 (2nd fundamental theorem in Rⁿ). Suppose that φ is a continuously differentiable scalar field on S ⊂ Rⁿ and suppose that α(t), t ∈ [a, b] is a path in S. Let a = α(a), b = α(b). Then
∫ ∇φ · dα = φ(b) − φ(a).

Proof. Suppose that α(t) is differentiable. By the chain rule, d/dt φ(α(t)) = ∇φ(α(t)) · α′(t). Consequently
∫ ∇φ · dα = ∫ₐᵇ ∇φ(α(t)) · α′(t) dt = ∫ₐᵇ d/dt φ(α(t)) dt.
By the 2nd fundamental theorem in R we know that ∫ₐᵇ d/dt φ(α(t)) dt = φ(α(b)) − φ(α(a)).
Example (potential energy). Our earth has mass M with centre at (0, 0, 0). Suppose that there is a small particle close to earth which has mass m. The force field of gravitation and the potential energy are, respectively,
f(x) = −GmM x / ∥x∥³, φ(x) = GmM / ∥x∥.
We can calculate ∇φ(x) and see that it is equal to f(x).
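The identity ∇φ = f can also be checked numerically by central finite differences; in the sketch below (our own, not part of the notes) the constant GmM is set to 1 and the sample point is arbitrary.

```python
# Finite-difference check that the gravitational potential
# phi(x) = GmM/|x| satisfies grad phi = f = -GmM x / |x|^3.

GmM = 1.0

def phi(x, y, z):
    return GmM / (x * x + y * y + z * z) ** 0.5

def f(x, y, z):
    r3 = (x * x + y * y + z * z) ** 1.5
    return (-GmM * x / r3, -GmM * y / r3, -GmM * z / r3)

def grad(fun, p, h=1e-6):
    """Central-difference gradient of a scalar field at point p."""
    out = []
    for i in range(3):
        q_plus = list(p); q_plus[i] += h
        q_minus = list(p); q_minus[i] -= h
        out.append((fun(*q_plus) - fun(*q_minus)) / (2 * h))
    return tuple(out)

p = (1.0, 2.0, -2.0)  # |p| = 3, away from the singularity at the origin
num, exact = grad(phi, p), f(*p)
print(all(abs(a - b) < 1e-8 for a, b in zip(num, exact)))  # True
```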

4.4 The first fundamental theorem
First we need to consider a basic topological property of sets. In particular we want to avoid the possibility of the set being several disconnected pieces; in other words, we want to guarantee that we can get from one point to another in the set without ever leaving the set (see Figure 4.1).

Definition 4.6 (connected). The set S ⊂ Rn is said to be connected if, for every pair
of points a, b ∈ S, there exists a path α(t), t ∈ [a, b] such that
▷ α(t) ∈ S for every t ∈ [a, b],
▷ α(a) = a and α(b) = b.

Sometimes this property is called “path connected” to distinguish between different


notions.

Figure 4.1: A connected set S, with points a, b joined by a path α(t).

Recall that, if f : R → R is continuous and we let φ(x) = ∫ₐˣ f(t) dt, then φ′(x) = f(x). This is called the first fundamental theorem of calculus and is the other way in which we see that differentiation and integration are opposites. Again we have an analogue for the line integral, but here it becomes a little more subtle since there are many different paths along which we can integrate between any two points.

Theorem (1st fundamental theorem in Rⁿ). Let f be a continuous vector field on a connected set S ⊂ Rⁿ. Suppose that, for x, a ∈ S, the line integral ∫ f · dα is equal for every path α such that α(a) = a, α(b) = x. Fix a ∈ S and define φ(x) = ∫ f · dα. Then φ is continuously differentiable and ∇φ = f.

Sketch of proof. As before let e₁ = (1, 0, 0)ᵗ, e₂ = (0, 1, 0)ᵗ, e₃ = (0, 0, 1)ᵗ (written here for n = 3). Observe that, if we define the paths β_k(t) = x + t e_k, t ∈ [0, h], then
φ(x + h e_k) − φ(x) = ∫ f · dβ_k.
Moreover β_k′(t) = e_k. Consequently
∂φ/∂x_k (x) = lim_{h→0} (1/h) (φ(x + h e_k) − φ(x)) = lim_{h→0} (1/h) ∫₀ʰ f(β_k(t)) · e_k dt = f_k(x).
In other words, we have shown that ∇φ(x) = f(x).

Definition 4.7 (closed path). We say a path α(t), t ∈ [a, b] is closed if α(a) = α(b).

Observe that, if α(t), t ∈ [a, b] is a closed path then we can divide it into two paths: let c ∈ [a, b] and consider the two paths α(t), t ∈ [a, c] and α(t), t ∈ [c, b]. On the other hand, suppose α(t), t ∈ [a, b] and β(t), t ∈ [c, d] are two paths starting at a and finishing at b. Then these can be combined to define a closed path (by following one of them backwards).

Definition 4.8 (conservative vector field). A vector field f , continuous on S ⊂ Rn is


conservative if there exists a scalar field φ such that, on S,
f = ∇φ.

Note that some authors call such a vector field a gradient (i.e., the vector field is the gradient of some scalar). If f = ∇φ then the scalar field φ is called the potential (associated to f). Observe that the potential is not unique: ∇φ = ∇(φ + C) for any constant C ∈ R.

Theorem 4.9 (conservative vector fields). Let S ⊂ Rⁿ be connected and consider the vector field f : S → Rⁿ. The following are equivalent:
(i) f is conservative, i.e., f = ∇φ on S for some φ,
(ii) ∫ f · dα does not depend on the path α, as long as α(a) = a, α(b) = b,
(iii) ∫ f · dα = 0 for any closed path α contained in S.

Proof. In the previous theorems (the two fundamental theorems) we proved that (i) is
equivalent to (ii).

Now we prove that (ii) implies (iii): Let α(t) be a closed path from a to a. Split it at an intermediate point c into a path α₁ from a to c and a path α₂ from c to a, and let β be α₂ traversed in the opposite direction, so that β also goes from a to c. Then ∫ f · dα₂ = −∫ f · dβ while, by (ii), ∫ f · dα₁ = ∫ f · dβ. Summing, ∫ f · dα = ∫ f · dα₁ + ∫ f · dα₂ = 0.
It remains to prove that (iii) implies (ii): two paths between a and b can be combined (one of them reversed) to give a closed path, and so by (iii) the two line integrals coincide.

Theorem 4.10 (mixed derivatives in 2D). Suppose that S ⊂ R² and that f : S → R² is a differentiable vector field; write f = (f₁, f₂)ᵗ. If f is conservative then, on S,
∂f₁/∂y = ∂f₂/∂x.

The above result is a special case of the following general statement which holds in
any dimension.

Theorem 4.11 (mixed derivatives). Suppose that f is a differentiable vector fieldᵃ on S ⊂ Rⁿ. If f is conservative then, for each l, k,
∂f_l/∂x_k = ∂f_k/∂x_l.
ᵃ As before f_k(x₁, . . . , x_n) denotes the kᵗʰ component of the vector field f.

Proof. By assumption f = ∇φ and the second order partial derivatives of φ exist, so
∂f_l/∂x_k = ∂²φ/∂x_k∂x_l = ∂²φ/∂x_l∂x_k = ∂f_k/∂x_l.

Example. Consider the vector field
f(x, y) = (−y(x² + y²)⁻¹, x(x² + y²)⁻¹)
on S = R² \ {(0, 0)}. Calculating, we verify that ∂f₁/∂y = ∂f₂/∂x on S. We now evaluate the line integral ∫ f · dα where α(t) = (a cos t, a sin t), t ∈ [0, 2π]. We calculate that α′(t) = (−a sin t, a cos t)ᵗ and f(α(t)) = a⁻²(−a sin t, a cos t)ᵗ. This means that
∫ f · dα = ∫₀²ᵖⁱ (sin² t + cos² t) dt = 2π.

Observe that in the above example S is somehow not a “nice” set because of the
“hole” in the middle. Moreover, observe that the line integral is the same for any circle,
independent of the radius.
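The computation can be repeated numerically for several radii; this sketch (our own, midpoint rule) confirms that the circulation is 2π independent of the radius, so the field cannot be conservative on S even though its mixed partial derivatives agree.

```python
import math

# The vortex field f(x, y) = (-y, x) / (x^2 + y^2) has equal mixed
# partials on R^2 minus the origin, yet its integral around any circle
# centred at the origin is 2*pi, so it is not conservative there.

def circulation(radius, n=20000):
    """Midpoint-rule line integral of the vortex field around a circle."""
    h = 2 * math.pi / n
    total = 0.0
    for k in range(n):
        t = (k + 0.5) * h
        x, y = radius * math.cos(t), radius * math.sin(t)
        dx, dy = -radius * math.sin(t), radius * math.cos(t)
        r2 = x * x + y * y
        total += (-y / r2 * dx + x / r2 * dy) * h
    return total

print(all(abs(circulation(r) - 2 * math.pi) < 1e-6
          for r in (0.5, 1.0, 3.0)))  # True
```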
Theorem 4.11 isn’t really useful in showing that a vector field is conservative because
it is possible for the mixed partial derivatives to all be equal but still the field fail to be
conservative. On the other hand, if a pair of mixed derivatives is not equal then f is
not conservative and so it is useful for proving the negative. Later in this chapter we
will return to this topic.

4.5 Potentials & conservative vector fields
We now turn our attention to the following question: Suppose we are given a vector
field f and we know that f = ∇φ for some φ. How can we find φ? For this we consider
two methods in the following paragraphs. First we describe the method which we call

Figure 4.2: The paths α₁ and α₂ joining a = (a₁, a₂) to x = (x₁, x₂).

constructing a potential by line integral. Suppose that f is a conservative vector field on the rectangle [a₁, b₁] × [a₂, b₂]. We define φ(x) as the line integral ∫ f · dα where α is a path between a = (a₁, a₂) and x. For any x = (x₁, x₂) consider the two paths:
α₁(t) = (t, a₂), t ∈ [a₁, x₁],
α₂(t) = (x₁, t), t ∈ [a₂, x₂].
Let α(t) denote the concatenation of the two paths. We calculate that
∫ f · dα = ∫_{a₁}^{x₁} f(α₁(t)) · α₁′(t) dt + ∫_{a₂}^{x₂} f(α₂(t)) · α₂′(t) dt.
This means that φ(x) = ∫_{a₁}^{x₁} f₁(t, a₂) dt + ∫_{a₂}^{x₂} f₂(x₁, t) dt.
Now we describe a different method which we call constructing a potential by indefinite integrals. Again suppose that f = ∇φ for some scalar field φ(x, y) which we wish to find. Observe that ∂φ/∂x = f₁ and ∂φ/∂y = f₂. This means that
∫ₐˣ f₁(t, y) dt + A(y) = φ(x, y) = ∫_b^y f₂(x, t) dt + B(x)
where A(y), B(x) are constants of integration. Calculating and comparing we can then obtain a formula for φ(x, y).
Example. Find a potential for f(x, y) = (eˣy² + 1, 2eˣy) on R².

Solution. We calculate that
∫ₐˣ f₁(t, y) dt + A(y) = eˣy² + x + A(y) = φ(x, y),
∫_b^y f₂(x, t) dt + B(x) = eˣy² + B(x) = φ(x, y).
From this we see that we can choose A(y) = 0 and B(x) = x to obtain equality of the above quantities. Consequently we obtain the potential φ(x, y) = eˣy² + x.
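The potential found in the example can be verified pointwise with central finite differences; a small sketch (our own, not part of the notes):

```python
import math

# Check, at a few sample points, that phi(x, y) = e^x y^2 + x satisfies
# grad phi = f = (e^x y^2 + 1, 2 e^x y), using central differences.

phi = lambda x, y: math.exp(x) * y * y + x
f = lambda x, y: (math.exp(x) * y * y + 1, 2 * math.exp(x) * y)

def check(x, y, h=1e-6, tol=1e-6):
    """True when the finite-difference gradient of phi matches f."""
    dphi_dx = (phi(x + h, y) - phi(x - h, y)) / (2 * h)
    dphi_dy = (phi(x, y + h) - phi(x, y - h)) / (2 * h)
    f1, f2 = f(x, y)
    return abs(dphi_dx - f1) < tol and abs(dphi_dy - f2) < tol

print(all(check(x, y) for x, y in [(0, 0), (1, 2), (-1, 0.5)]))  # True
```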
Theorem 4.11 concerning conservative fields and the mixed partial derivatives was somewhat less than satisfactory since the converse wasn't available. In order to get a more satisfactory result we need to look at another topological detail of the domain of the vector field. This concept is somewhat suggested by the methods of constructing potentials which were described above.

Definition 4.12 (convex set). A set S ⊂ Rn is said to be convex if for any x, y ∈ S


the segment {tx + (1 − t)y, t ∈ [0, 1]} is contained in S.

Figure 4.3: Convex and non-convex sets: (a) a convex set; (b) a set which is not convex.

This extra property permits the following sufficient condition for a vector field to
be conservative.

Theorem 4.13 (conservative fields on convex sets). Letᵃ f be a differentiable vector field on a convex region S ⊂ Rⁿ. Then f is conservative if and only if
∂f_l/∂x_k = ∂f_k/∂x_l, for each l, k.
ᵃ As usual f_k(x₁, . . . , x_n) denotes the kᵗʰ component of the vector field f.

Sketch of proof. We have already proved that f being conservative implies the equality of partial derivatives (Theorem 4.11) and therefore we need only assume that ∂_k f_l = ∂_l f_k and construct a potential. Let φ(x) = ∫ f · dα where α(t) = tx, t ∈ [0, 1]. Since α′(t) = x, φ(x) = ∫₀¹ f(tx) · x dt. Also (this needs proving, by differentiating under the integral)
∂φ/∂x_k (x) = ∫₀¹ (t ∂_k f(tx) · x + f_k(tx)) dt.
This is equal to ∫₀¹ (t ∇f_k(tx) · x + f_k(tx)) dt because ∂_k f_l = ∂_l f_k; recognising the integrand as the derivative of g(t) = t f_k(tx), the integral equals g(1) − g(0) = f_k(x) as required.

The above gives us a useful tool to check if a given vector field is conservative.
Using the idea of “gluing together” several convex regions this result can be manually

extended to some more general settings. Later, in Theorem 5.7, we will take advantage
of some further ideas in order to significantly extend this result.

Application to exact differential equations

Let S ⊂ R² be simply-connected and open. The differential equation, considered on S,
p(x, y) + q(x, y) y′(x) = 0
is called exact if there exists φ : S → R such that p = ∂φ/∂x and q = ∂φ/∂y. Exact differential equations are closely related to conservative vector fields.

Theorem 4.14. Let S ⊂ R² be connected and open.
▷ Suppose that φ : S → R satisfies ∇φ = (p, q)ᵗ. Then any solution y(x) of the equation p(x, y) + q(x, y) y′(x) = 0 satisfies φ(x, y(x)) = C for some C ∈ R.
▷ Conversely, if φ : S → R is such that φ(x, y(x)) = C defines implicitly a function y(x), then y(x) is a solution to the equation p(x, y) + q(x, y) y′(x) = 0.

Proof. If y(x) satisfies φ(x, y(x)) = C then, by the chain rule and the fact that ∇φ = (p, q)ᵗ, we see that p(x, y(x)) + y′(x) q(x, y(x)) = 0. Conversely, if y(x) is a solution, then d/dx φ(x, y(x)) = p + q y′ = 0 and so φ(x, y(x)) must be constant in x.

Example. Solve y² + 2xy y′ = 0. Let p(x, y) = y², q(x, y) = 2xy and find φ(x, y) = xy² so that ∇φ = (p, q)ᵗ. Solutions satisfy φ(x, y(x)) = x y(x)² = C, i.e., y(x) = √(C/x).
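As a check (our own, taking the branch y > 0 and C = 4), the solution y(x) = √(C/x) can be substituted back into the equation numerically, using a central difference for y′:

```python
import math

# Verify numerically that y(x) = sqrt(C / x) satisfies the exact
# equation y^2 + 2*x*y*y'(x) = 0 from the example above.

def residual(x, C=4.0, h=1e-6):
    """Left-hand side of the ODE with y' replaced by a central difference."""
    y = lambda s: math.sqrt(C / s)
    dy = (y(x + h) - y(x - h)) / (2 * h)
    return y(x) ** 2 + 2 * x * y(x) * dy

print(all(abs(residual(x)) < 1e-6 for x in (0.5, 1.0, 3.0)))  # True
```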

4.6 Line integrals of scalar fields
Up until now this chapter has been devoted to line integrals of vector fields but there
is also the obvious question of defining the line integral for scalar fields. This we do
now. Such a line integral allows us also to define the length of a curve in a meaningful
way. Let α(t), t ∈ [a, b] be a path in Rn and let f : Rn → R.

Definition 4.15 (line integral of a scalar field). The line integral of the scalar field f along the path α is defined as
∫ f dα = ∫ₐᵇ f(α(t)) ∥α′(t)∥ dt.

This integral shares the same basic properties as the line integral of a vector field and the proofs are essentially the same. Namely it is linear and it also respects how a path can be decomposed or joined with other paths without changing the value of the integral. Moreover, the value of the integral along a given path is independent of the choice of parametrization of the curve. In this case, even if the curve is parametrized in the opposite direction, the integral takes the same value. Consequently it makes sense to define the length of a curve as the line integral of the unit scalar field, i.e., the length of a curve parametrized by the path α is ∫ₐᵇ ∥α′(t)∥ dt.
As a simple application, consider that the path represents a wire and the wire has density f(α(t)) at the point α(t). Then the mass of the wire is equal to ∫ f dα.
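A numerical illustration of arc length as a line integral (our own example, not from the notes): for the helix α(t) = (cos t, sin t, t), t ∈ [0, 2π], we have ∥α′(t)∥ = √2, so the length should be 2π√2.

```python
import math

# Length of a curve as the line integral of the unit scalar field:
# approximate the integral of |alpha'(t)| by the midpoint rule.

def curve_length(dalpha, a, b, n=20000):
    """Midpoint-rule approximation of the length of a path on [a, b]."""
    h = (b - a) / n
    return sum(
        math.sqrt(sum(c * c for c in dalpha(a + (k + 0.5) * h)))
        for k in range(n)
    ) * h

# Derivative of the helix alpha(t) = (cos t, sin t, t).
dhelix = lambda t: (-math.sin(t), math.cos(t), 1.0)
length = curve_length(dhelix, 0, 2 * math.pi)
print(abs(length - 2 * math.pi * math.sqrt(2)) < 1e-6)  # True
```

The same routine, with the integrand multiplied by a density f(α(t)), computes the mass of a wire as described above.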

Chapter 5
Multiple integrals

The extension to higher dimensions of differentiation was established in the previous chapters. We then defined line integrals which are, in a sense, one-dimensional integrals which exist in a higher-dimensional setting. We now take the next step and define higher dimensional integrals in the sense of how to integrate a scalar field defined on a subset of Rⁿ. The first step will be to rigorously define which scalar fields are integrable and to define the integral. Then we need to find reasonable ways to evaluate such integrals. Among other applications we will use these multiple integrals to calculate volumes and moments of inertia. In Green's theorem we find a connection between multiple integrals and line integrals. We also develop the important topic of change of variables, which takes advantage of the Jacobian determinant and is often invaluable for actually working with a given problem.

5.1 Definition of the integral


First we need to find a definition of integrability and the integral. Then we will
proceed to study the properties of this higher dimensional integral. Recall that, in the
one-dimensional case integration was defined using the following steps:
1. Define the integral for step functions,
2. Define integral for “integrable functions”,
3. Show that continuous functions are integrable.
For higher dimensions we follow the same logic. We will then show that we can evaluate
higher dimensional integrals by repeated one-dimensional integration.
Definition (partition). Let R = [a₁, b₁] × [a₂, b₂] be a rectangle. Suppose that P₁ = {x₀, . . . , x_m} and P₂ = {y₀, . . . , y_n} are such that a₁ = x₀ < x₁ < · · · < x_m = b₁ and a₂ = y₀ < y₁ < · · · < y_n = b₂. Then P = P₁ × P₂ is said to be a partition of R.

Figure 5.1: A partition of a rectangle R.


Observe that a partition divides R into nm sub-rectangles. If P ⊆ Q then we say
that Q is a finer partition than P . Partitions are constructed in higher dimension,
for Rn , in an analogous way. Before defining integration for general functions it is
convenient to make the definition for a special class of functions called step functions.

Definition (step function). A function f : R → R is said to be a step function if there


is a partition P of R such that f is constant on each sub-rectangle of the partition.
If f and g are step functions and c, d ∈ R, then cf + dg is also a step function. Also note that the area of the sub-rectangle Q_jk := [x_j, x_{j+1}] × [y_k, y_{k+1}] is equal to (x_{j+1} − x_j)(y_{k+1} − y_k).
We can now define the integral of a step function in a reasonable way. The definition
here is for 2D but the analogous definition holds for any dimension.
Definition (integral of a step function). Suppose that f is a step function with value c_jk on the sub-rectangle (x_j, x_{j+1}) × (y_k, y_{k+1}). Then we define the integral as
∬_R f dxdy = Σ_{j=0}^{m−1} Σ_{k=0}^{n−1} c_jk (x_{j+1} − x_j)(y_{k+1} − y_k).

Observe that the value of the integral is independent of the partition, as long as the
function is constant on each sub-rectangle. In this sense the integral is well-defined
(not dependent on the choice of partition used to calculate it).

Figure 5.2: Graph of a step function.

Theorem 5.1 (basic properties of the integral). Let f, g be step functions. Then
∬_R (af + bg) dxdy = a ∬_R f dxdy + b ∬_R g dxdy for all a, b ∈ R,
∬_R f dxdy = ∬_{R₁} f dxdy + ∬_{R₂} f dxdy if R is divided into R₁ and R₂,
∬_R f dxdy ≤ ∬_R g dxdy if f(x, y) ≤ g(x, y).

Proof. All properties follow from the definition by basic calculations.

We are now in the position to define the set of integrable functions. In order
to define integrability we take advantage of “upper” and “lower” integrals which
“sandwich” the function we really want to integrate.

Definition 5.2 (integrability). Let R be a rectangle and let f : R → R be a bounded function. Suppose there is one and only one number I ∈ R such that
∬_R g(x, y) dxdy ≤ I ≤ ∬_R h(x, y) dxdy
for every pair of step functions g, h satisfying, for all (x, y) ∈ R,
g(x, y) ≤ f(x, y) ≤ h(x, y).
Then f is said to be integrable, this number I is called the integral of f on R, and it is denoted ∬_R f(x, y) dxdy.

65
All the basic properties of the integral of step functions, as stated in Theorem 5.1, also hold for the integral of any integrable function. This can be shown by considering the limiting procedure with the upper and lower step-function integrals which are part of the definition of integrability.

5.2 Evaluation of multiple integrals
Now we have a definition we can rigorously work with integrals but it is essential to
also have a way to practically evaluate any given integral.
Theorem (evaluating by repeated integration). Let f be a bounded integrable function on R = [a₁, b₁] × [a₂, b₂]. Suppose that, for every y ∈ [a₂, b₂], the integral A(y) = ∫_{a₁}^{b₁} f(x, y) dx exists. Then ∫_{a₂}^{b₂} A(y) dy exists and
∬_R f dxdy = ∫_{a₂}^{b₂} ( ∫_{a₁}^{b₁} f(x, y) dx ) dy.

Proof. We start by choosing step functions g, h such that g ≤ f ≤ h. By assumption ∫_{a₁}^{b₁} g(x, y) dx ≤ A(y) ≤ ∫_{a₁}^{b₁} h(x, y) dx. We then observe that ∫_{a₁}^{b₁} g(x, y) dx and ∫_{a₁}^{b₁} h(x, y) dx are step functions (in y) and so A(y) is integrable. Moreover,
∫_{a₂}^{b₂} ( ∫_{a₁}^{b₁} g(x, y) dx ) dy ≤ ∫_{a₂}^{b₂} A(y) dy ≤ ∫_{a₂}^{b₂} ( ∫_{a₁}^{b₁} h(x, y) dx ) dy.
This proves both the existence of ∫_{a₂}^{b₂} A(y) dy and the value of the integral.
The conditions of the above theorem aren’t immediately easy to check and so it is
convenient to now investigate the integrability of continuous functions.

Theorem 5.3 (integral of continuous functions). Suppose that f is a continuous function defined on the rectangle R. Then f is integrable and
∬_R f(x, y) dxdy = ∫_{a₂}^{b₂} ( ∫_{a₁}^{b₁} f(x, y) dx ) dy = ∫_{a₁}^{b₁} ( ∫_{a₂}^{b₂} f(x, y) dy ) dx.

Figure 5.3: Set enclosed by the xy-plane and the graph of f(x, y).

Proof. Continuity implies boundedness, so the upper and lower integrals exist. Let ϵ > 0. By uniform continuity there exists δ > 0 such that |f(x) − f(y)| ≤ ϵ whenever ∥x − y∥ ≤ δ. We can choose a partition such that ∥x − y∥ ≤ δ whenever x, y are in the same sub-rectangle Qⱼₖ. We then define the step functions g, h by g(x) = inf_{Qⱼₖ} f, h(x) = sup_{Qⱼₖ} f when x ∈ Qⱼₖ. To finish the proof we observe that sup_{Qⱼₖ} f − inf_{Qⱼₖ} f ≤ ϵ and ϵ > 0 can be made arbitrarily small.

This integral naturally allows us to calculate the volume of a solid. Let f, g be functions defined on the rectangle R ⊂ R² with f(x, y) ≤ g(x, y) and consider the 3D set
$$V = \{(x, y, z) : (x, y) \in R,\ f(x, y) \le z \le g(x, y)\}.$$
The volume of the set V is equal to $\mathrm{Vol}(V) = \iint_R [g(x,y) - f(x,y)]\,dx\,dy$.
Up until now we have considered step functions and continuous functions. Clearly we can permit some discontinuities, and we introduce the following concept in order to control functions with discontinuities sufficiently to guarantee that their integrals are well-defined.

Definition (content zero set). A bounded subset A ⊂ R2 is said to have content zero
if, for every ϵ > 0, there exists a finite set of rectangles whose union includes A and
the sum of the areas of the rectangles is not greater than ϵ.

Examples of content zero sets include: finite sets of points; bounded line segments; graphs of continuous functions (as we prove below).

Figure 5.4: The graph of a continuous function has content zero.

Theorem. Let f be a bounded function on R and suppose that the set of discontinuities A ⊂ R has content zero. Then the double integral $\iint_R f(x,y)\,dx\,dy$ exists.

Proof. Take a cover of A by rectangles with total area not greater than δ > 0. Let P be a partition of R which is finer than the cover of A. We may assume that sup_{Qⱼₖ} f − inf_{Qⱼₖ} f ≤ ϵ on each sub-rectangle of the partition which doesn't contain a discontinuity of f. The contribution to the integral of each bounding step function from the cover of A is bounded by δ sup |f|.

5.3 Regions bounded by functions
A major limitation so far is that we have only integrated over rectangles, whereas we would like to integrate over much more general regions. We develop this now.
Suppose S ⊂ R and f is a bounded function on S. We extend f to R by defining
$$f_R(x, y) = \begin{cases} f(x, y) & \text{if } (x, y) \in S \\ 0 & \text{otherwise.} \end{cases}$$
We use this notation in the following definition.

Definition (integral on general regions). We say that f is integrable on S if f_R is integrable and define
$$\iint_S f(x,y)\,dx\,dy = \iint_R f_R(x,y)\,dx\,dy.$$

Figure 5.5: A region defined by two continuous functions. The “projection” of the region onto the x-axis is the interval [a, b].

Suppose that there are continuous functions φ₁, φ₂ on [a, b] and consider the set (see Figure 5.5)
$$S = \{(x, y) : a \le x \le b,\ \varphi_1(x) \le y \le \varphi_2(x)\} \subset \mathbb{R}^2.$$
Not all sets can be written in this way but many can and such a way of describing a
subset of R2 is convenient for evaluating integrals. Observe that we could also consider
the following set
S = {(x, y) : a ≤ y ≤ b, φ1 (y) ≤ x ≤ φ2 (y)} .
In the first case we could describe the representation as projecting along the y-coordinate
whereas in the second case we are projecting along the x-coordinate. Observe that it
doesn’t make a difference to the integral if we use < or ≤ in the definition of S since the difference would be a content zero set.

Theorem. Suppose that φ is a continuous function on [a, b]. Then the graph {(x, y) : x ∈ [a, b], y = φ(x)} has zero content.

Proof. By uniform continuity, for every ϵ > 0, there exists δ > 0 such that |φ(x) − φ(y)| ≤ ϵ whenever |x − y| ≤ δ. We then take a partition of [a, b] into subintervals of length less than δ. Using this partition we generate a cover of the graph which has area not greater than 2ϵ|b − a|.

Figure 5.6: Upside-down cone of height 5 with tip at the origin. The solid is bounded by the surfaces $z = \sqrt{x^2 + y^2}$ and z = 5. This solid can be “projected” onto the xy-plane.

Theorem 5.4. Let S = {(x, y) : x ∈ [a, b], φ₁(x) ≤ y ≤ φ₂(x)} where φ₁, φ₂ are continuous and let f be a bounded continuous function on S. Then f is integrable on S and
$$\iint_S f(x,y)\,dx\,dy = \int_a^b \left( \int_{\varphi_1(x)}^{\varphi_2(x)} f(x,y)\,dy \right) dx.$$

Proof. The set of discontinuities of f_R is the boundary of S in R = [a, b] × [ã, b̃], which consists of the graphs of φ₁, φ₂. These graphs have zero content as we proved before. For each x, the function y ↦ f_R(x, y) is integrable since it has only two discontinuity points. Additionally $\int_{\tilde a}^{\tilde b} f_R(x,y)\,dy = \int_{\varphi_1(x)}^{\varphi_2(x)} f(x,y)\,dy$.
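To illustrate Theorem 5.4 numerically, the following sketch (helper names and the midpoint rule are our own choices) evaluates an iterated integral with variable limits over the type 1 region between y = x² and y = x. Taking f ≡ 1 recovers the area of the region, which is 1/6.

```python
# Iterated integral over a type 1 region (illustrative helpers, not from the notes).

def integrate_1d(g, a, b, n=500):
    """Composite midpoint rule for a single integral."""
    h = (b - a) / n
    return h * sum(g(a + (k + 0.5) * h) for k in range(n))

def integral_type1(f, a, b, phi1, phi2, n=500):
    """∫_a^b ( ∫_{phi1(x)}^{phi2(x)} f(x,y) dy ) dx."""
    inner = lambda x: integrate_1d(lambda y: f(x, y), phi1(x), phi2(x), n)
    return integrate_1d(inner, a, b, n)

# Region between y = x^2 (lower) and y = x (upper) for x in [0, 1].
area = integral_type1(lambda x, y: 1.0, 0.0, 1.0,
                      lambda x: x * x, lambda x: x)
print(area)  # close to 1/6
```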

A similar result holds for type 2 regions but with x and y swapped. For higher dimensions we also need an understanding of how to represent subsets of Rⁿ. Take for example a 3D solid: we would hope to be able to “project” along one of the coordinate axes and so describe it using the 2D “shadow” and a pair of continuous functions. For example, consider the upside-down cone of Figure 5.6 which has base of radius 5 lying in the plane {z = 5} and has tip at the origin. In order to describe this set it is convenient to imagine how it projects down onto the xy-plane. We then

describe it as
$$V = \{(x, y, z) : (x, y) \in S,\ \gamma_1(x, y) \le z \le \gamma_2(x, y)\}$$
where S ⊂ R² is the “shadow” and the functions represent the control we need in the vertical direction. In this case we must choose S = {(x, y) : x² + y² ≤ 5²} since the base of the cone, at the top of the picture, is the largest part in terms of the shadow. We also must choose $\gamma_1(x, y) = \sqrt{x^2 + y^2}$ and γ₂(x, y) = 5 to correspond to the sloped lower surface and the horizontal upper surface.

5.4 Applications of multiple integrals
Multiple integrals can be used to calculate the area or volume of a given set. Suppose that
$$S = \{(x, y) : x \in [a, b],\ \varphi_1(x) \le y \le \varphi_2(x)\} \subset \mathbb{R}^2$$
where φ₁, φ₂ are continuous functions. Then the area of S is
$$\iint_S dx\,dy = \int_a^b \left( \int_{\varphi_1(x)}^{\varphi_2(x)} dy \right) dx = \int_a^b [\varphi_2(x) - \varphi_1(x)]\,dx.$$

This corresponds to the usual notion of the integral of a function on ℝ determining the area under the curve. The same idea extends to arbitrary dimension. Suppose that γ₁(x, y) ≤ γ₂(x, y) are continuous functions on S and let
$$V = \{(x, y, z) : x \in [a, b],\ \varphi_1(x) \le y \le \varphi_2(x),\ \gamma_1(x, y) \le z \le \gamma_2(x, y)\} \subset \mathbb{R}^3.$$
The volume of V is
$$\iiint_V dx\,dy\,dz = \int_a^b \left( \int_{\varphi_1(x)}^{\varphi_2(x)} \left( \int_{\gamma_1(x,y)}^{\gamma_2(x,y)} dz \right) dy \right) dx = \int_a^b \left( \int_{\varphi_1(x)}^{\varphi_2(x)} [\gamma_2(x,y) - \gamma_1(x,y)]\,dy \right) dx.$$
Multiple integrals also allow us to calculate the mass and centre of mass of solids. Suppose we have several particles¹ each with mass m_k and located at point (x_k, y_k). The total mass would then be $M = \sum_k m_k$ and the centre of mass is the point (p, q) such that
$$pM = \sum_k m_k x_k \quad\text{and}\quad qM = \sum_k m_k y_k.$$

¹ In general, for masses m_k at points x_k, the centre of mass is the point X such that $MX = \sum_k m_k x_k$.
Suppose an object has the shape of a region S and the density of the material is f(x, y) at point (x, y). Then, similar to the discrete case above, the total mass is $M = \iint_S f(x,y)\,dx\,dy$ and the centre of mass is the point (p, q) such that
$$pM = \iint_S x\, f(x,y)\,dx\,dy \quad\text{and}\quad qM = \iint_S y\, f(x,y)\,dx\,dy.$$
By tradition, if the density is constant, then the centre of mass is called the centroid.
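As an illustration of these formulas (an example of our own, with illustrative helper names), the following sketch computes the mass and centroid of the triangle {(x, y) : 0 ≤ x ≤ 1, 0 ≤ y ≤ x} with constant density 1; the exact centroid is (2/3, 1/3).

```python
# Mass and centroid of a triangular region via iterated midpoint rules.
# The region and helper names are a hypothetical example, not from the notes.

def integrate_1d(g, a, b, n=400):
    h = (b - a) / n
    return h * sum(g(a + (k + 0.5) * h) for k in range(n))

def triangle_integral(f, n=400):
    """∬_S f dxdy over S = {(x, y): 0 <= x <= 1, 0 <= y <= x}."""
    inner = lambda x: integrate_1d(lambda y: f(x, y), 0.0, x, n)
    return integrate_1d(inner, 0.0, 1.0, n)

M = triangle_integral(lambda x, y: 1.0)      # total mass with density 1
p = triangle_integral(lambda x, y: x) / M    # x-coordinate of centroid
q = triangle_integral(lambda x, y: y) / M    # y-coordinate of centroid
print(M, p, q)  # close to 1/2, 2/3, 1/3
```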

5.5 Green’s theorem
We can now establish a connection between multiple integrals and the line integrals of
the previous chapter.

Theorem 5.5 (Green’s theorem). Let C ⊂ R² be a piecewise-smooth simple (no intersections) closed curve and α a path that parametrizes C in the counter-clockwise direction. Let S be the region enclosed by C. Suppose that $f(x,y) = \binom{P(x,y)}{Q(x,y)}$ is a vector field continuously differentiable on an open set containing S. Then
$$\iint_S \left( \frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y} \right) dx\,dy = \int_C f \cdot d\alpha.$$

Proof of Green’s theorem. To start we assume that S is a type 1 region and that Q = 0. Since S = {(x, y) : x ∈ [a, b], φ₁(x) ≤ y ≤ φ₂(x)},
$$\iint_S \left( \frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y} \right) dx\,dy = \int_a^b \left( \int_{\varphi_1(x)}^{\varphi_2(x)} \left( -\frac{\partial P}{\partial y} \right) dy \right) dx = \int_a^b \bigl( P(x, \varphi_1(x)) - P(x, \varphi_2(x)) \bigr)\,dx.$$

Figure 5.7: A set is simply-connected if every closed path can be contracted to a point. (a) Simply connected. (b) Not simply connected.

It is then natural to choose four paths α₁(t) = (t, φ₁(t)), α₂(t) = (a, t), α₃(t) = (t, φ₂(t)), α₄(t) = (b, t). We can calculate that
$$\int_C f \cdot d\alpha = \int f \cdot d\alpha_1 - \int f \cdot d\alpha_3 = \int_a^b P(t, \varphi_1(t))\,dt - \int_a^b P(t, \varphi_2(t))\,dt.$$
If S is also a type 2 region then the same argument works for P = 0, and linearity means it works for $f = \binom{P}{0} + \binom{0}{Q}$. More general regions can be formed by “glueing” together simpler regions of the above type to complete the argument.
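Green’s theorem can be checked numerically on a simple case (our own sketch, with illustrative helper names). Taking f = (P, Q) = (−y, x) on the unit square, ∂Q/∂x − ∂P/∂y = 2, so the double integral over the square equals 2; the line integral around the boundary, traversed counter-clockwise, should recover the same value.

```python
# Numerical check of Green's theorem for f = (-y, x) on the unit square.

def line_integral(P, Q, path, dpath, n=2000):
    """Approximate ∫ f · dα along α: [0,1] -> R² (midpoint rule)."""
    h = 1.0 / n
    total = 0.0
    for k in range(n):
        t = (k + 0.5) * h
        x, y = path(t)
        dx, dy = dpath(t)
        total += (P(x, y) * dx + Q(x, y) * dy) * h
    return total

P = lambda x, y: -y
Q = lambda x, y: x

# Counter-clockwise boundary of [0,1]², one side at a time: (path, derivative).
sides = [
    (lambda t: (t, 0.0),       lambda t: (1.0, 0.0)),   # bottom
    (lambda t: (1.0, t),       lambda t: (0.0, 1.0)),   # right
    (lambda t: (1.0 - t, 1.0), lambda t: (-1.0, 0.0)),  # top
    (lambda t: (0.0, 1.0 - t), lambda t: (0.0, -1.0)),  # left
]
circulation = sum(line_integral(P, Q, a, da) for a, da in sides)
print(circulation)  # close to 2 = ∬ (∂Q/∂x - ∂P/∂y) dxdy over the square
```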

The quantity ∂Q/∂x − ∂P/∂y is reminiscent of something we saw with conservative vector fields and we take advantage of this with the following application. We previously introduced the concept of connected sets but now we need a slight refinement of the idea.

Definition 5.6 (simply-connected set). A connected set S ⊂ Rn is said to be simply-


connected if any closed path α, contained within S, can be contracted to a point. (This
is in the sense that there exists a continuous map F : D2 → S, where D2 ⊂ R2
denotes the unit disk, such that F restricted to the unit circle is α.)

The following result extends Theorem 4.13 which was limited to convex sets.

Theorem 5.7 (conservative vector fields on simply connected regions). Let S be a simply connected region and suppose that $f = \binom{P}{Q}$ is a vector field, continuously differentiable on S. Then f is conservative if and only if ∂Q/∂x = ∂P/∂y.

Proof. In Theorem 4.11 we already proved that ∂Q/∂x = ∂P/∂y whenever f is conservative, so we need only prove the other direction of the statement. Suppose that ∂Q/∂x = ∂P/∂y and consider any closed path α in S, parametrizing a curve C and enclosing a region S′ ⊂ S. By Green’s theorem (Theorem 5.5),
$$\int_C f \cdot d\alpha = \iint_{S'} \left( \frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y} \right) dx\,dy = 0.$$
This implies that f is conservative because the line integral around every closed curve is zero (Theorem 4.9).
A crucially important consequence of the above result is that it implies the invariance
of a line integral under deformation of a path when the vector field is conservative.
Observe that the result can be extended to multiply connected regions by adding
additional “cuts” and keeping track of the additional line integrals.

5.6 Change of variables


When we want to identify a point in space it is common, particularly if we are pirates recording the position of treasure, that there are many alternative ways we can describe this point. For example we could write the number of steps north and the number of steps east from the central palm tree. Alternatively we can specify that we stand at the palm tree looking in a specific direction and then walk a particular number of steps. Often it is really convenient to swap from one coordinate system to another, and in this section we show how multiple integrals behave under change of coordinates.
To start, we recall the 1D case. If g : [a, b] → [g(a), g(b)] is onto with continuous derivative and f is continuous then
$$\int_{g(a)}^{g(b)} f(x)\,dx = \int_a^b f(g(u))\,g'(u)\,du.$$

In higher dimensions we obtain a similar result but g′ must be replaced by the appropriate higher dimensional notion of derivative.

74
For the 2D case we have the following result.

Theorem 5.8 (change of variables in 2D). Suppose that (u, v) ↦ (X(u, v), Y(u, v)) maps T to S one-to-one and X, Y are continuously differentiable. Then
$$\iint_S f(x,y)\,dx\,dy = \iint_T f(X(u,v), Y(u,v))\, |J(u,v)|\,du\,dv.$$

Here $J(u,v) = \det \begin{pmatrix} \partial_u X & \partial_u Y \\ \partial_v X & \partial_v Y \end{pmatrix}$ is the Jacobian determinant as used previously. Note that the Jacobian represents the scaling of area in the sense that $\iint_S dx\,dy = \iint_T |J(u,v)|\,du\,dv$.

Polar coordinates

Polar coordinates correspond to the coordinate mapping
$$x = r \cos\theta, \qquad y = r \sin\theta.$$
In this case the Jacobian determinant is
$$|J(r,\theta)| = \left| \det \begin{pmatrix} \cos\theta & \sin\theta \\ -r\sin\theta & r\cos\theta \end{pmatrix} \right| = r(\cos^2\theta + \sin^2\theta) = r.$$
Consequently, the change of variables in the integral gives
$$\iint_S f(x,y)\,dx\,dy = \iint_T f(r\cos\theta, r\sin\theta)\, r \,dr\,d\theta.$$
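As a sketch of polar coordinates in action (an example of our choosing, evaluated with a midpoint rule), consider ∬ e^{−(x²+y²)} dxdy over the disk of radius R: in polar coordinates the integrand becomes r·e^{−r²}, the θ-integral factors out as 2π, and the exact value is π(1 − e^{−R²}).

```python
import math

def disk_gaussian_polar(R, n=400):
    """∬ e^{-(x²+y²)} dxdy over the disk of radius R, via polar coordinates.

    The integrand r·e^{-r²} does not depend on θ, so the θ-integral
    contributes the factor 2π and only a 1D midpoint rule in r is needed.
    """
    hr = R / n
    radial = sum((i + 0.5) * hr * math.exp(-((i + 0.5) * hr) ** 2)
                 for i in range(n)) * hr
    return 2 * math.pi * radial

R = 2.0
approx = disk_gaussian_polar(R)
exact = math.pi * (1 - math.exp(-R * R))
print(approx, exact)  # the two values agree closely
```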

Linear transformations

In this case the coordinate mapping is
$$x = Au + Bv, \qquad y = Cu + Dv$$
where A, B, C, D ∈ R are fixed. The Jacobian determinant is equal to
$$|J(u,v)| = \left| \det \begin{pmatrix} A & B \\ C & D \end{pmatrix} \right| = |AD - BC|.$$
Consequently the change of coordinates for the integral is
$$\iint_S f(x,y)\,dx\,dy = |AD - BC| \iint_T f(Au + Bv, Cu + Dv)\,du\,dv.$$
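The factor |AD − BC| is exactly the area scaling of the linear map. A quick check (the shoelace helper and the coefficient values are our own illustration): map the corners of the unit square and compare the area of the image parallelogram with |AD − BC|.

```python
def shoelace_area(pts):
    """Area of a polygon with vertices listed in order (shoelace formula)."""
    n = len(pts)
    s = 0.0
    for i in range(n):
        x0, y0 = pts[i]
        x1, y1 = pts[(i + 1) % n]
        s += x0 * y1 - x1 * y0
    return abs(s) / 2.0

A, B, C, D = 2.0, 1.0, 0.5, 3.0   # arbitrary illustrative coefficients
# Images of the corners of the unit square under (u, v) -> (Au+Bv, Cu+Dv).
corners = [(0, 0), (1, 0), (1, 1), (0, 1)]
image = [(A * u + B * v, C * u + D * v) for u, v in corners]
print(shoelace_area(image), abs(A * D - B * C))  # both equal |AD - BC| = 5.5
```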

Extension to higher dimensions

The exact analog of Theorem 5.8 holds in any dimension. In particular, in 3D, if we consider the change of variables (u, v, w) ↦ (X(u, v, w), Y(u, v, w), Z(u, v, w)), then $\iiint_S f(x,y,z)\,dx\,dy\,dz$ is equal to
$$\iiint_T f(X(u,v,w), Y(u,v,w), Z(u,v,w))\, |J(u,v,w)|\,du\,dv\,dw$$
where J(u, v, w) is now the determinant of the (3 × 3) Jacobian matrix.

Cylindrical coordinates

Cylindrical coordinates correspond to the mapping (require r > 0, 0 ≤ θ ≤ 2π)
$$x = r \cos\theta, \qquad y = r \sin\theta, \qquad z = z$$
and, in this case, the Jacobian determinant is
$$|J(r,\theta,z)| = \left| \det \begin{pmatrix} \cos\theta & \sin\theta & 0 \\ -r\sin\theta & r\cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix} \right| = r(\cos^2\theta + \sin^2\theta) = r,$$
and so the change of variables in the integral gives
$$\iiint_S f(x,y,z)\,dx\,dy\,dz = \iiint_T F(r,\theta,z)\, r \,dr\,d\theta\,dz,$$
where F(r, θ, z) = f(r cos θ, r sin θ, z). Note that cylindrical coordinates are closely related to polar coordinates in the sense that we don’t touch the z coordinate and use polar coordinates for x and y.

Spherical coordinates

Spherical coordinates correspond to how we use latitude, longitude and altitude to specify a position on earth. It is the coordinate mapping (require ρ > 0, 0 ≤ θ ≤ 2π, 0 ≤ φ < π)
$$x = \rho \cos\theta \sin\varphi, \qquad y = \rho \sin\theta \sin\varphi, \qquad z = \rho \cos\varphi.$$
In this case the Jacobian determinant is
$$|J(\rho,\theta,\varphi)| = \left| \det \begin{pmatrix} \cos\theta\sin\varphi & \sin\theta\sin\varphi & \cos\varphi \\ -\rho\sin\theta\sin\varphi & \rho\cos\theta\sin\varphi & 0 \\ \rho\cos\theta\cos\varphi & \rho\sin\theta\cos\varphi & -\rho\sin\varphi \end{pmatrix} \right| = \left| -\rho^2 \sin\varphi \right| = \rho^2 \sin\varphi.$$
Consequently the change of variables in the integral gives
$$\iiint_S f(x,y,z)\,dx\,dy\,dz = \iiint_T F(\rho,\theta,\varphi)\, \rho^2 \sin\varphi \,d\rho\,d\theta\,d\varphi,$$
where F(ρ, θ, φ) = f(ρ cos θ sin φ, ρ sin θ sin φ, ρ cos φ).
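A standard check of the spherical formula (an example we add here, using a midpoint rule): taking f ≡ 1 over the ball of radius R, the integrand ρ² sin φ separates into a ρ-factor and a φ-factor, and the integral gives the volume 4πR³/3.

```python
import math

def ball_volume_spherical(R, n=200):
    """Volume of the ball of radius R via ∭ ρ² sinφ dρ dφ dθ.

    The integrand separates, so the ρ- and φ-integrals are each evaluated
    with a 1D midpoint rule and the θ-integral contributes the factor 2π.
    """
    hr, hp = R / n, math.pi / n
    I_rho = sum(((i + 0.5) * hr) ** 2 for i in range(n)) * hr     # ≈ R³/3
    I_phi = sum(math.sin((j + 0.5) * hp) for j in range(n)) * hp  # ≈ 2
    return 2 * math.pi * I_rho * I_phi

print(ball_volume_spherical(1.0), 4 * math.pi / 3)  # both close to 4.18879
```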

Chapter 6
Surface integrals

In this section we consider surfaces and how to define integrals of vector fields over
these surfaces. This is similar in many ways to line integrals but is a higher dimensional
these surfaces. This is similar in many ways to line integrals but a higher dimensional
version. Curves (for line integrals) are 1D subsets of higher dimensional space whereas
surfaces are 2D subsets of higher dimensional space. Identically to line integrals, the
first step is to understand a practical way to represent the surfaces, just like with curves
we used paths as the parametric representation of the curve. Once we have clarified
the parametric representation of surface we can define the surface integral (of a vector
field) and show that it satisfies various properties which we would expect, including
that the integral is independent of the choice of parametrization. Similar to how we
were able to use a line integral (of a scalar) to calculate the length of a curve we can use
a surface integral (of a scalar) to calculate the area of a surface.
We then introduce two important operators that act on vector fields, namely curl
and divergence. Using these operators and the surface integral we introduce two
theorems, Gauss’ Theorem and Stokes’ Theorem. These theorems connect line integrals
with surface integrals and with volume integrals.

6.1 Representation of a surface


Before developing parametric representations of surfaces let’s recall an example of
parametric representation of a curve (path). For example, the half circle C = {(x, y) : x² + y² = 1, y ≥ 0} can be parametrized in many ways, including the following two paths:
$$\alpha(x) = (x, \sqrt{1 - x^2}), \quad x \in [-1, 1],$$
$$\alpha(t) = (\cos t, \sin t), \quad t \in [0, \pi].$$

In a similar way, now in 2D we can have a parametric representation of a hemisphere.

Example (hemisphere). The hemisphere S = {(x, y, z) : x² + y² + z² = 1, z ≥ 0} can be represented parametrically in many ways, including
$$r(x, y) = (x, y, \sqrt{1 - x^2 - y^2}), \quad (x, y) \in \{x^2 + y^2 \le 1\},$$
$$r(u, v) = (\cos u \cos v, \sin u \cos v, \sin v), \quad (u, v) \in [0, 2\pi] \times [0, \pi/2].$$
Observe that the second form above can be deduced from spherical coordinates (fixed distance from the origin).

Example (cone). The cone S = {(x, y, z) : z² = x² + y², z ∈ [0, 1]} can be represented parametrically in many ways, including
$$r(x, y) = (x, y, \sqrt{x^2 + y^2}), \quad (x, y) \in \{x^2 + y^2 \le 1\},$$
$$r(u, v) = (v \cos u, v \sin u, v), \quad (u, v) \in [0, 2\pi] \times [0, 1].$$
Observe that the second form can be deduced from spherical coordinates (fixed angle from the z-axis).

Fundamental vector product


A key notion for parametric surfaces, and a natural geometric object, is the fundamental vector product. Consider the parametric surface, denoted r(T), and suppose it has the form
$$r(u, v) = (X(u, v), Y(u, v), Z(u, v)), \quad (u, v) \in T.$$

Definition 6.1 (fundamental vector product). The vector-valued function defined as
$$\frac{\partial r}{\partial u} \times \frac{\partial r}{\partial v} = \begin{pmatrix} \partial_u X \\ \partial_u Y \\ \partial_u Z \end{pmatrix} \times \begin{pmatrix} \partial_v X \\ \partial_v Y \\ \partial_v Z \end{pmatrix}$$
is called the fundamental vector product of the representation r.

By definition, the vector-valued functions ∂r/∂u and ∂r/∂v are tangent to the surface. As such, assuming that they are linearly independent, the fundamental vector product ∂r/∂u × ∂r/∂v is normal to the surface (orthogonal to every curve which passes through the surface). Moreover the norm of this vector represents the local scaling of area (small parallelograms).
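For a concrete check (using the cone parametrization r(u, v) = (v cos u, v sin u, v) from the example above; helper names are our own), we can compute the fundamental vector product at a sample point and verify that it is orthogonal to both tangent vectors.

```python
import math

def cross(a, b):
    """Vector product of two 3D vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Tangent vectors of the cone parametrization r(u, v) = (v cos u, v sin u, v).
def r_u(u, v):  # ∂r/∂u
    return (-v * math.sin(u), v * math.cos(u), 0.0)

def r_v(u, v):  # ∂r/∂v
    return (math.cos(u), math.sin(u), 1.0)

u, v = 0.7, 0.5                    # an arbitrary sample point with v > 0
N = cross(r_u(u, v), r_v(u, v))    # fundamental vector product
print(dot(N, r_u(u, v)), dot(N, r_v(u, v)))  # both 0: N is normal to the surface
```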
As always we need to take some care about smoothness of the objects we work with.

Definition (regular point). If (u, v) is a point in T at which ∂r/∂u and ∂r/∂v are continuous and the fundamental vector product is non-zero then r(u, v) is said to be a regular point for that representation.

Definition (smooth surface representation). A surface r(T ) is said to be smooth if


all its points are regular points.

Just like we saw with paths to represent curves, there are many different ways we can find a parametric representation of a given surface. If the surface S has the form z = f(x, y) (the surface is written in explicit form) then we can use x, y as the parameters and have the representation
$$r(x, y) = (x, y, f(x, y)), \quad (x, y) \in T.$$
The region T is the projection of S onto the xy-plane. For such a surface we compute
$$\frac{\partial r}{\partial x} = \begin{pmatrix} 1 \\ 0 \\ \partial_x f \end{pmatrix}, \qquad \frac{\partial r}{\partial y} = \begin{pmatrix} 0 \\ 1 \\ \partial_y f \end{pmatrix},$$
and consequently
$$\frac{\partial r}{\partial x} \times \frac{\partial r}{\partial y} = \begin{pmatrix} 1 \\ 0 \\ \partial_x f \end{pmatrix} \times \begin{pmatrix} 0 \\ 1 \\ \partial_y f \end{pmatrix} = \begin{pmatrix} -\partial_x f \\ -\partial_y f \\ 1 \end{pmatrix}.$$
An example of such a representation is as follows for the hemisphere.

Example (hemisphere representation 1). Let T = {x² + y² ≤ 1}, and let
$$r(x, y) = (x, y, \sqrt{1 - x^2 - y^2}).$$
The surface r(T) is the unit hemisphere {(x, y, z) : x² + y² + z² = 1, z ≥ 0}. The fundamental vector product of this representation is
$$\frac{\partial r}{\partial x} \times \frac{\partial r}{\partial y}(x, y) = \begin{pmatrix} x(1 - x^2 - y^2)^{-1/2} \\ y(1 - x^2 - y^2)^{-1/2} \\ 1 \end{pmatrix} = z^{-1}\, r(x, y).$$
In this case, all points are regular except the equator.

Example (hemisphere representation 2). Let T = [0, 2π] × [0, π/2] and let
$$r(u, v) = (\cos u \cos v, \sin u \cos v, \sin v).$$
The surface r(T) is the unit hemisphere {(x, y, z) : x² + y² + z² = 1, z ≥ 0}. This is the representation which is connected to spherical coordinates. We calculate that
$$\frac{\partial r}{\partial u}(u, v) = \begin{pmatrix} -\sin u \cos v \\ \cos u \cos v \\ 0 \end{pmatrix}, \qquad \frac{\partial r}{\partial v}(u, v) = \begin{pmatrix} -\cos u \sin v \\ -\sin u \sin v \\ \cos v \end{pmatrix},$$
and so the fundamental vector product of this representation is
$$\frac{\partial r}{\partial u} \times \frac{\partial r}{\partial v}(u, v) = \cos v \; r(u, v).$$
In this case many points map to the north pole (0, 0, 1) and the north pole is not a
regular point. Additionally there are two points which map to each point on the line
between equator and north pole {(x, y, z) ∈ r(T ) : y = 0}.

6.2 Surface integral of a scalar field
Mirroring the process for line integrals we will define surface integrals both for scalar
fields and for vector fields. The surface integral of a scalar field is closely related to the
area of a parametric surface, just like the length of a curve is closely related to the line
integral of a scalar field.

Definition 6.2 (area of a parametric surface). The area of the parametric surface S = r(T) is defined as the double integral
$$\mathrm{Area}(S) = \iint_T \left\| \frac{\partial r}{\partial u} \times \frac{\partial r}{\partial v} \right\| du\,dv.$$
T

Observe that the definition is in terms of a multiple integral over the region T , and
the quantity being integrated is the norm of the fundamental vector product.
Later we will show that Area(S) is independent of the choice of representation, as we require for such a definition; it would be unreasonable if the area of a surface depended on the choice of representation.
We will check that this definition corresponds to a fact that we already know by computing the surface area of a hemisphere. Let, as before, T = [0, 2π] × [0, π/2] and let r(u, v) = (cos u cos v, sin u cos v, sin v). The norm of the fundamental vector product (which we computed earlier) is
$$\left\| \frac{\partial r}{\partial u} \times \frac{\partial r}{\partial v}(u, v) \right\| = \cos v\, \|r(u, v)\| = \cos v.$$
This means, by Definition 6.2 and evaluating the multiple integral, that
$$\mathrm{Area}(S) = \iint_T \cos v \,du\,dv = \int_0^{2\pi} \left( \int_0^{\pi/2} \cos v \,dv \right) du = 2\pi.$$
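The same computation can be checked numerically with a midpoint rule (a sketch of ours, not part of the notes):

```python
import math

# Midpoint-rule check that Area(S) = ∬_T cos v dudv = 2π
# for T = [0, 2π] x [0, π/2] (hemisphere parametrization).
n = 500
hv = (math.pi / 2) / n
# The integrand cos v does not depend on u, so the u-integral gives 2π.
area = 2 * math.pi * sum(math.cos((j + 0.5) * hv) for j in range(n)) * hv
print(area)  # close to 2π ≈ 6.28319
```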

The surface integral of a scalar field is defined in a way similar to the area of a surface.

Definition 6.3 (surface integral). Let S = r(T) be a parametric surface and let f be a scalar field defined on S. The surface integral of f over S is defined as
$$\iint_{r(T)} f \,dS = \iint_T f(r(u, v)) \left\| \frac{\partial r}{\partial u} \times \frac{\partial r}{\partial v}(u, v) \right\| du\,dv$$
whenever the double integral on the right exists.

Observe that, if we choose f ≡ 1, that is we choose the scalar field identically equal
to 1, then we obtain the formula for the area of the surface (Definition 6.2). This is
just the same as the line integral of a scalar and the length of the corresponding curve.
From the point of view of applications, we could take f as the density of a thin material which has the shape of the surface S; then $\iint_S f \,dS$ is the total mass of this piece of material. Extending this idea we could also calculate the centre of mass of this piece of material.

6.3 Change of surface parametrization
In order to validate the definition of a surface integral, and consequently that of the area of a surface, we will now show that the value of the evaluated integral doesn't depend on the choice of representation for any given surface.

Theorem 6.4 (change of surface parametrization). Suppose that q(A) and r(B) are both representations of the same surface, and that r = q ∘ G for some differentiable G : B → A. Then
$$\iint_A (f \circ q) \left\| \frac{\partial q}{\partial s} \times \frac{\partial q}{\partial t} \right\| ds\,dt = \iint_B (f \circ r) \left\| \frac{\partial r}{\partial u} \times \frac{\partial r}{\partial v} \right\| du\,dv.$$
A B

Proof. Writing G(u, v) = (S(u, v), T(u, v)), so that r(u, v) = q(S(u, v), T(u, v)), we calculate (chain rule and vector product) that
$$\frac{\partial r}{\partial u} \times \frac{\partial r}{\partial v}(u, v) = \left( \frac{\partial q}{\partial s} \times \frac{\partial q}{\partial t} \right) \left( \frac{\partial S}{\partial u}\frac{\partial T}{\partial v} - \frac{\partial S}{\partial v}\frac{\partial T}{\partial u} \right)(S(u, v), T(u, v)).$$

Figure 6.1: Two different representations for a given surface.

Observe that ∂S/∂u · ∂T/∂v − ∂S/∂v · ∂T/∂u is the Jacobian determinant associated to the change of variables (u, v) ↦ (S(u, v), T(u, v)). Consequently, by the change of variables theorem,
$$\iint_A (f \circ q) \left\| \frac{\partial q}{\partial s} \times \frac{\partial q}{\partial t} \right\| ds\,dt = \iint_B (f \circ r) \left\| \frac{\partial r}{\partial u} \times \frac{\partial r}{\partial v} \right\| du\,dv$$
as announced in the theorem.

6.4 Surface integral of a vector field
In preparation for defining the surface integral of a vector field we need the notion of
the normal vector of a surface. This is a natural geometric notion, for each point in
the surface it is the unit vector field which is orthogonal to the surface.

Definition 6.5 (normal vector). Let S = r(T) be a parametric surface. At each regular point the two unit normals are
$$n_1 = \frac{\dfrac{\partial r}{\partial u} \times \dfrac{\partial r}{\partial v}}{\left\| \dfrac{\partial r}{\partial u} \times \dfrac{\partial r}{\partial v} \right\|} \quad\text{and}\quad n_2 = -n_1.$$

By definition ∥n1 ∥ = ∥n2 ∥ = 1. That there are two normal vectors is expected
because there are two sides to the surface at each point, one is just the opposite direction
to the other. If f is a vector field then f · n is the component of the flow in direction
of n.

Definition 6.6 (surface integral of a vector field). Let S = r(T) be a parametric surface and f a vector field. The integral
$$\iint_S f \cdot n \,dS$$
is said to be the surface integral of f with respect to the normal n.

For convenience let $N = \frac{\partial r}{\partial u} \times \frac{\partial r}{\partial v}$ and n = N/∥N∥. Observe that
$$\iint_S f \cdot n \,dS = \iint_T (f \circ r) \cdot n \left\| \frac{\partial r}{\partial u} \times \frac{\partial r}{\partial v} \right\| du\,dv = \iint_T (f \circ r) \cdot N \,du\,dv,$$
and so for evaluating the surface integral of a vector field there is typically no need to evaluate the norm of the fundamental vector product. Also note that $\iint_S f \cdot n_1 \,dS = -\iint_S f \cdot n_2 \,dS$ because n₁ = −n₂. This means that choosing one normal or the other simply corresponds to a minus sign in the evaluated integral. This reflects the choice of orientation inherent in a surface. As a tangible example, imagine that the surface has a flow passing through it and this flow is determined by a vector field. Then the surface integral represents the total flow passing through the given surface in a given direction.
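As a worked sketch (our own example), take the radial field f(x, y, z) = (x, y, z) and the hemisphere parametrization r(u, v) = (cos u cos v, sin u cos v, sin v), whose fundamental vector product is N = cos v · r(u, v). Since f · n = 1 on the unit sphere, the flux should equal the area 2π.

```python
import math

# Flux of f(x, y, z) = (x, y, z) through the unit hemisphere,
# using the formula N = cos(v) * r(u, v) derived in the text.
def N(u, v):
    c = math.cos(v)
    return (c * math.cos(u) * math.cos(v),
            c * math.sin(u) * math.cos(v),
            c * math.sin(v))

n = 300
hu, hv = 2 * math.pi / n, (math.pi / 2) / n
flux = 0.0
for i in range(n):
    for j in range(n):
        u, v = (i + 0.5) * hu, (j + 0.5) * hv
        x, y, z = math.cos(u) * math.cos(v), math.sin(u) * math.cos(v), math.sin(v)
        Nx, Ny, Nz = N(u, v)
        flux += (x * Nx + y * Ny + z * Nz) * hu * hv
print(flux)  # close to 2π: f·n = 1 on the sphere, so the flux equals the area
```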

6.5 Curl and divergence


 
Suppose that $f = \begin{pmatrix} f_x \\ f_y \\ f_z \end{pmatrix}$ is a differentiable vector field.
Definition 6.7 (curl). The curl of f is defined as
$$\nabla \times f = \begin{pmatrix} \dfrac{\partial f_z}{\partial y} - \dfrac{\partial f_y}{\partial z} \\ \dfrac{\partial f_x}{\partial z} - \dfrac{\partial f_z}{\partial x} \\ \dfrac{\partial f_y}{\partial x} - \dfrac{\partial f_x}{\partial y} \end{pmatrix}.$$

Definition 6.8 (divergence). The divergence of f is defined as
$$\nabla \cdot f = \frac{\partial f_x}{\partial x} + \frac{\partial f_y}{\partial y} + \frac{\partial f_z}{\partial z}.$$

Often the notation curl f = ∇ × f and div f = ∇ · f is used instead. Note that the symbols “×” and “·” used in the notation for curl and divergence are not truly the vector and scalar product but are more a convenient way to remember the definitions. These quantities satisfy the following basic properties, which can all be proved by direct calculation.
▷ If f = ∇φ then ∇ × f = 0,
▷ ∇ · (∇ × f) = 0,
▷ ∇ × (∇ × f) = ∇(∇ · f) − ∇²f.
The quantity defined as $\nabla^2 \varphi = \nabla \cdot (\nabla \varphi) = \frac{\partial^2 \varphi}{\partial x^2} + \frac{\partial^2 \varphi}{\partial y^2} + \frac{\partial^2 \varphi}{\partial z^2}$ is called the Laplacian and occurs in many applications of physics and mathematics.

Example. If f(x, y, z) = (x, y, z) then ∇ × f = 0, ∇ · f = 3.

Example. If f(x, y, z) = (−y, x, 0) then ∇ × f = (0, 0, 2), ∇ · f = 0.
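These definitions are easy to check numerically with central finite differences (a sketch of ours, with illustrative helper names) for the second example f = (−y, x, 0):

```python
# Central finite-difference check of curl and divergence for f = (-y, x, 0).
h = 1e-5

def f(x, y, z):
    return (-y, x, 0.0)

def partial(i, j, p):
    """∂f_i/∂x_j at the point p, by central differences."""
    q_plus = list(p); q_plus[j] += h
    q_minus = list(p); q_minus[j] -= h
    return (f(*q_plus)[i] - f(*q_minus)[i]) / (2 * h)

p = (0.3, -1.2, 0.7)   # arbitrary sample point
div = sum(partial(i, i, p) for i in range(3))
curl = (partial(2, 1, p) - partial(1, 2, p),   # ∂f_z/∂y - ∂f_y/∂z
        partial(0, 2, p) - partial(2, 0, p),   # ∂f_x/∂z - ∂f_z/∂x
        partial(1, 0, p) - partial(0, 1, p))   # ∂f_y/∂x - ∂f_x/∂y
print(div, curl)  # ≈ 0 and ≈ (0, 0, 2), matching the example above
```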

Theorem 6.9. Let S ⊂ R3 be convex. Then ∇ × f ≡ 0 on S if and only if f is


conservative on S.

The above result implies Theorem 5.7 (the 2D vector fields can be written as 3D
vector fields with a zero component).

6.6 Theorems of Stokes and Gauss
Theorem 6.10 (Stokes). Let S = r(T) be a parametric surface. Suppose that T is simply connected and that the boundary of T is mapped to C, the boundary of S. Let β be a counter-clockwise parametrization of the boundary of T and let α(t) = r(β(t)). Then
$$\iint_S (\nabla \times f) \cdot n \,dS = \int_C f \cdot d\alpha.$$

 
Sketch of proof. Write $f = \begin{pmatrix} f_x \\ f_y \\ f_z \end{pmatrix}$ and suppose that f_y = f_z = 0. This effectively reduces the full problem to the lower dimensional version that we previously considered. As such, we can then apply Green's theorem (Theorem 5.5). Finally we conclude for general f by linearity of the integral.

Just as Green’s Theorem holds for regions which can contain holes, as long as they
are correctly accounted for, we can extend Stokes’ theorem to more general surfaces
with the idea of “cutting and gluing” the surface. In particular this allows the extension
to surfaces with holes, cylinders, spheres, etc. On the other hand the theorem can’t be
extended to the Möbius band because the topology of this surface prevents a similar
process being completed.

Theorem 6.11 (Gauss). Let V ⊂ R³ be a solid whose boundary is the parametric surface S and let n be the outward unit normal vector. If f is a vector field then
$$\iiint_V \nabla \cdot f \,dx\,dy\,dz = \iint_S f \cdot n \,dS.$$

Sketch of proof. We start by writing
$$\iiint_V \left( \frac{\partial f_x}{\partial x} + \frac{\partial f_y}{\partial y} + \frac{\partial f_z}{\partial z} \right) dx\,dy\,dz = \iint_S (f_x n_x + f_y n_y + f_z n_z) \,dS.$$
As such, it suffices to show that $\iiint_V \frac{\partial f_x}{\partial x}\,dx\,dy\,dz = \iint_S f_x n_x \,dS$, and similarly for the other two terms. If we suppose the solid V is projectable along the x-axis then we can explicitly write the integral (later to be extended to general solids). We then use basic calculus to express f_x as the integral of its derivative.
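Both sides of Gauss' theorem can be evaluated numerically in a simple case (our own example, using midpoint rules): for f(x, y, z) = (x, y, z) on the unit ball, ∇ · f = 3 so the volume integral is 3 · Vol = 4π, while on the unit sphere f · n = 1 so the flux equals the surface area, also 4π.

```python
import math

# Numeric check of Gauss' theorem for f(x, y, z) = (x, y, z) on the unit ball.
n = 200

# Volume side: 3 * Vol(ball), slicing the ball into vertical columns of
# height 2*sqrt(1 - x² - y²) over the unit disk.
h = 2.0 / n
vol = 0.0
for i in range(n):
    for j in range(n):
        x, y = -1 + (i + 0.5) * h, -1 + (j + 0.5) * h
        s = 1 - x * x - y * y
        if s > 0:
            vol += 2 * math.sqrt(s) * h * h
left = 3 * vol

# Surface side: since f·n = 1 on the unit sphere, the flux is
# ∬ sinφ dφ dθ in spherical coordinates.
hp = math.pi / n
right = 2 * math.pi * sum(math.sin((j + 0.5) * hp) for j in range(n)) * hp

print(left, right)  # both close to 4π ≈ 12.566
```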

Stokes' Theorem allows us to connect surface integrals (2D) to line integrals (1D). On the other hand Gauss' Theorem allows us to connect volume integrals (3D) to surface integrals (2D). In this way they are similar to each other: in both, the integral decreases in dimension and a derivative is lost. Indeed the fundamental theorem of calculus for line integrals also fits into this same pattern. The branch of mathematics called "differential geometry" provides a framework in which all these results can be described in a unified way by the statement
$$\int_{\partial \Omega} \omega = \int_\Omega d\omega.$$
This result is called the “generalized Stokes theorem”.
Note that Gauss’ Theorem is often called the “divergence theorem”. We can use
this theorem for the following interpretation of divergence as a limit, similar to the
way other versions of derivatives are defined.

Theorem. Let V_t be the ball of radius t > 0 centred at a ∈ R³ and let S_t be its boundary with outgoing unit normal vector n. Then
$$\nabla \cdot f(a) = \lim_{t \to 0} \frac{1}{\mathrm{Vol}(V_t)} \iint_{S_t} f \cdot n \,dS.$$

Proof. This follows by applying Gauss' theorem to V_t and using the continuity of ∇ · f.

Curl can also be written as a similar limit. Given the similarity of all the terms, it is not unexpected that there is a relation between curl and divergence and the Jacobian matrix. Recall that
$$\mathrm{Jac}(f) = \begin{pmatrix} \dfrac{\partial f_x}{\partial x} & \dfrac{\partial f_x}{\partial y} & \dfrac{\partial f_x}{\partial z} \\ \dfrac{\partial f_y}{\partial x} & \dfrac{\partial f_y}{\partial y} & \dfrac{\partial f_y}{\partial z} \\ \dfrac{\partial f_z}{\partial x} & \dfrac{\partial f_z}{\partial y} & \dfrac{\partial f_z}{\partial z} \end{pmatrix}.$$
We can immediately see that divergence is the trace of the Jacobian matrix. In order to see the connection with curl, recall that every real matrix A can be written as the sum of a symmetric matrix ½(A + Aᵀ) and a skew-symmetric matrix ½(A − Aᵀ). In this case we have that
$$\tfrac{1}{2}\bigl(\mathrm{Jac}(f) - \mathrm{Jac}(f)^{T}\bigr) = \tfrac{1}{2}\begin{pmatrix} 0 & \dfrac{\partial f_x}{\partial y} - \dfrac{\partial f_y}{\partial x} & \dfrac{\partial f_x}{\partial z} - \dfrac{\partial f_z}{\partial x} \\ \dfrac{\partial f_y}{\partial x} - \dfrac{\partial f_x}{\partial y} & 0 & \dfrac{\partial f_y}{\partial z} - \dfrac{\partial f_z}{\partial y} \\ \dfrac{\partial f_z}{\partial x} - \dfrac{\partial f_x}{\partial z} & \dfrac{\partial f_z}{\partial y} - \dfrac{\partial f_y}{\partial z} & 0 \end{pmatrix}$$
and can see that the entries of the skew-symmetric part of the matrix are, up to the factor ½, exactly the terms of the curl.
