
Chapter 3

The Contraction Mapping Principle

The notion of a complete space is introduced in Section 1. The importance of complete
metric spaces lies partly in the Contraction Mapping Principle, which is proved in Section
2. Two major applications of the Contraction Mapping Principle are subsequently given:
first, a proof of the Inverse Function Theorem in Section 3 and, second, a proof of the
fundamental existence and uniqueness theorem for the initial value problem of differential
equations in Section 4.

3.1 Complete Metric Space

In Rn a basic property is that every Cauchy sequence converges. This property is called
the completeness of the Euclidean space. The notion of a Cauchy sequence is well-defined
in a metric space. Indeed, a sequence {xn } in (X, d) is a Cauchy sequence if for every
ε > 0, there exists some n0 such that d(xn , xm ) < ε, for all n, m ≥ n0 . A metric space
(X, d) is complete if every Cauchy sequence in it converges. A subset E is complete if
(E, d|E×E) is complete, or, equivalently, every Cauchy sequence in E converges with limit
in E.

Proposition 3.1. Let (X, d) be a metric space.

(a) Every complete set in X is closed.

(b) Every closed set in a complete metric space is complete.

In particular, this proposition shows that a subset of a complete metric space is
complete if and only if it is closed.


Proof. (a) Let E ⊂ X be complete and {xn} a sequence in E converging to some x in X. Since
every convergent sequence is a Cauchy sequence, {xn} must converge to some z in E. By
the uniqueness of limits, we must have x = z ∈ E, so E is closed.
(b) Let (X, d) be complete and E a closed subset of X. Every Cauchy sequence {xn } in
E is also a Cauchy sequence in X. By the completeness of X, there is some x in X to
which {xn } converges. However, as E is closed, x also belongs to E. So every Cauchy
sequence in E has a limit in E.

Example 3.1. In MATH 2050 it was shown that the space R is complete. Consequently,
being closed subsets of R, the intervals [a, b], (−∞, b] and [a, ∞) are all complete sets. In
contrast, the set [a, b), b ∈ R, is not complete. For, simply observe that the sequence
{b − 1/k}, k ≥ k0 , for some large k0 , is a Cauchy sequence in [a, b) and yet it does not have
a limit in [a, b) (the limit is b, which lies outside [a, b)). The set of all rational numbers,
Q, is also not complete. Every irrational number is the limit of some sequence in Q, and
these sequences are Cauchy sequences whose limits lie outside Q.

Example 3.2. In MATH 2060 we learned that every Cauchy sequence in C[a, b] with respect
to the sup-norm converges uniformly, and the uniform limit of continuous functions is again
continuous. Therefore, C[a, b] is a complete space. The subset E = {f : f(x) ≥ 0, ∀x} is also
complete. Indeed, let {fn} be a Cauchy sequence in E; it is also a Cauchy sequence in
C[a, b] and hence there exists some f ∈ C[a, b] such that {fn} converges to f uniformly.
As uniform convergence implies pointwise convergence, f (x) = limn→∞ fn (x) ≥ 0, so f
belongs to E, and E is complete. Next, let P [a, b] be the collection of all polynomials
restricted to [a, b]. It is not complete. For, taking the sequence hn(x) given by

hn(x) = Σ_{k=0}^{n} x^k/k! ,

{hn} is a Cauchy sequence in P[a, b] which converges uniformly to e^x. As e^x is not a
polynomial, P[a, b] is not complete.

To obtain a typical non-complete set, we consider the closed interval [a, b] in R. Take
away one point z from it to form E = [a, b] \ {z}. E is not complete, since every sequence
in E converging to z is a Cauchy sequence which does not converge in E. In general,
you may think of sets with “holes” as non-complete ones. Now, given a non-complete
metric space, can we make it into a complete metric space by filling in all the holes?
The answer turns out to be affirmative. We can always enlarge a non-complete metric space
into a complete one by putting in sufficiently many ideal points.

Theorem 3.2 (Completion Theorem). Every metric space has a completion.



This theorem will be further explained and proved in the appendix.

3.2 The Contraction Mapping Principle

Solving an equation f(x) = 0, where f is a function from R^n to itself, frequently comes
up in applications. This problem can be turned into a problem about fixed points. Literally,
a fixed point of a mapping is a point which is left unchanged by the mapping. By
introducing the function g(x) = f(x) + x, solving the equation f(x) = 0 is equivalent to
finding a fixed point of g. This general observation underlines the importance of finding
fixed points. In this section we prove the Contraction Mapping Principle, one of the oldest
fixed point theorems and perhaps the most well-known one. As we will see, it has a wide
range of applications.

A map T : (X, d) → (X, d) is called a contraction if there is a constant γ ∈ (0, 1)
such that d(T x, T y) ≤ γ d(x, y), ∀x, y ∈ X. A point x is called a fixed point of T if
T x = x. Usually we write T x instead of T(x).

Theorem 3.3 (Contraction Mapping Principle). Every contraction in a complete
metric space admits a unique fixed point.

This theorem is also called Banach’s Fixed Point Theorem.

Proof. Let T be a contraction in the complete metric space (X, d). Pick an arbitrary
x0 ∈ X and define a sequence {xn} by setting xn = T x_{n−1} = T^n x0, ∀n ≥ 1. We claim
that {xn} forms a Cauchy sequence in X. First of all, by iteration we have

d(T^{n+1} x0, T^n x0) ≤ γ d(T^n x0, T^{n−1} x0) ≤ · · · ≤ γ^n d(T x0, x0) .   (3.1)

Next, for n ≥ N where N is to be specified in a moment, by the triangle inequality,

d(xn, xN) = d(T^n x0, T^N x0)
    ≤ d(T^n x0, T^{n−1} x0) + · · · + d(T^{N+1} x0, T^N x0)
    = Σ_{j=0}^{n−N−1} d(T^{N+j+1} x0, T^{N+j} x0) .

Using (3.1), we have

d(xn, xN) ≤ Σ_{j=0}^{n−N−1} γ^{N+j} d(T x0, x0)
    ≤ γ^N d(T x0, x0) Σ_{j=0}^{n−N−1} γ^j
    < γ^N d(T x0, x0) Σ_{j=0}^{∞} γ^j
    = (d(T x0, x0)/(1 − γ)) γ^N .   (3.2)

For ε > 0, choose N so large that d(T x0, x0) γ^N/(1 − γ) < ε/2. Then for n, m ≥ N,

d(xn, xm) ≤ d(xn, xN) + d(xN, xm) < (2 d(T x0, x0)/(1 − γ)) γ^N < ε ,
thus {xn} forms a Cauchy sequence. As X is complete, x = lim_{n→∞} xn exists. By the
continuity of T, lim_{n→∞} T xn = T x. On the other hand, lim_{n→∞} T xn = lim_{n→∞} x_{n+1} =
x. We conclude that T x = x.
Suppose there is another fixed point y ∈ X. From

d(x, y) = d(T x, T y) ≤ γ d(x, y)

and γ ∈ (0, 1), we conclude that d(x, y) = 0, i.e., x = y.

Incidentally, we point out that this proof is constructive. It tells you how to
find the fixed point starting from an arbitrary point. In fact, letting n → ∞ in (3.2)
and then replacing N by n, we obtain an error estimate between the fixed point and the
approximating sequence {xn}:

d(x, xn) ≤ (d(T x0, x0)/(1 − γ)) γ^n ,  n ≥ 1.
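The estimate also yields a stopping rule: once d(T x0, x0)γ^n/(1 − γ) < ε, the current iterate is within ε of the fixed point. Below is a minimal sketch in Python; the contraction T x = cos(x)/2 on R (with γ = 1/2, since |T′(x)| = |sin x|/2 ≤ 1/2) is our own illustrative choice, not from the text.

```python
import math

def fixed_point(T, x0, gamma, eps=1e-12):
    """Iterate x_{n+1} = T(x_n) until the a priori error bound
    d(T x0, x0) * gamma^n / (1 - gamma) drops below eps."""
    c = abs(T(x0) - x0) / (1.0 - gamma)
    x, n = x0, 0
    while c * gamma**n >= eps:
        x, n = T(x), n + 1
    return x, n

# T x = cos(x)/2 is a contraction on R with gamma = 1/2.
x, n = fixed_point(lambda s: 0.5 * math.cos(s), 1.0, 0.5)
print(x, n)    # x solves x = cos(x)/2 up to about 1e-12
```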

The following two examples demonstrate the sharpness of the Contraction Mapping
Principle.

Example 3.3. Consider the map T x = x/2 which maps (0, 1] to itself. It is clearly a
contraction. If T x = x, then x = x/2 which implies x = 0. Thus T does not have a fixed
point in (0, 1]. This example shows that completeness of the underlying space cannot be
removed from the assumption of the theorem.

Example 3.4. Consider the map S from R to itself defined by

Sx = x − log(1 + e^x) .

We have

dS/dx = 1 − e^x/(1 + e^x) = 1/(1 + e^x) ∈ (0, 1) ,  ∀x.

By the Mean-Value Theorem, for some z lying between x and y,

|Sx − Sy| = (1/(1 + e^z))|x − y| < |x − y| .

However, in view of (1 + e^z)^{−1} → 1 as x, y → −∞, it is impossible to find a single γ ∈ (0, 1)
to satisfy

|Sx − Sy| ≤ γ|x − y| ,  ∀x, y .

Also, S admits no fixed point: Sx = x would force log(1 + e^x) = 0, which is impossible.
Therefore, the contraction condition cannot be removed from the assumptions of the theorem.

Example 3.5. Let f : [0, 1] → [0, 1] be a continuously differentiable function satisfying
|f′(x)| < 1 on [0, 1]. We claim that f admits a unique fixed point. For, by the Mean-Value
Theorem, for x, y ∈ [0, 1] there exists some z between x and y such that f(y) − f(x) = f′(z)(y − x).
Therefore,

|f(y) − f(x)| = |f′(z)||y − x| ≤ γ|y − x|,

where γ = sup_{t∈[0,1]} |f′(t)| < 1 (Why?). We see that f is a contraction. By the Contraction
Mapping Principle, it has a unique fixed point.

In fact, by using the intermediate value theorem one can show that every continuous function
from [0, 1] to itself admits at least one fixed point. This is a general fact. More generally,
according to Brouwer's Fixed Point Theorem, every continuous map from a compact
convex set in R^n to itself admits at least one fixed point.

Our applications of the fixed point theorem are mainly to solving equations of a certain
form. Let us first recall what it means to solve an equation. Here are some
examples:

• Solve 2x − 5 = 0.

• Solve x^2 − 3x + 5 = 0.

• Solve x^{12} − 6x^3 + 6x − 127 = 0.



• Solve x − y + 12 = 0, 3x + 5y = 0 .

• Solve x^2 − xy + y^3 − x − 16 = 0, x^3 − 13xy + y^2 − 12x = 12.

• Solve y′ = x^2 y^3 + cos x, y(0) = 0.

All these equations can be formulated as finding the preimage of a point under some
mapping from a metric space to itself. For instance, in the second case we take
T x = x^2 − 3x and X = R. Then the equation becomes T x = −5, and solving the equation
means finding the preimage of −5 under T. In the fourth case we take
S(x, y) = (x − y, 3x + 5y), which maps R^2 to itself. Solving the system means determining
S^{−1}(−12, 0). There are other choices of the map; say, if we let
S1(x, y) = (x − y + 12, 3x + 5y), then we need to find S1^{−1}(0, 0). In the sixth case, first
observe that the problem is equivalent to solving the integral equation

y(x) = ∫_0^x t^2 y^3(t) dt + sin x .

Hence, taking Φ(y)(x) = y(x) − ∫_0^x t^2 y^3(t) dt and X = C[a, b], solving the differential
equation means determining Φ^{−1}(sin x). Here, in addition to the requirement a < 0 < b,
there are also some technical conditions to ensure that Φ really maps X to X.
All in all, we have seen that solving equations, algebraic or differential alike, means
determining the preimage of a given point under a map on some metric space.

Example 3.6. Show that the equation

x = 1/2 + (1/8) cos 5x

has a unique solution. Well, let us define

T x = 1/2 + (1/8) cos 5x .

We claim that it is a contraction on R. Indeed, for x, x′ ∈ R,

|T x − T x′| = (1/8)|cos 5x − cos 5x′|
    = (1/8)|5 sin 5z| |x − x′|
    ≤ (5/8)|x − x′| ,

where z lies between x and x′, by the Mean-Value Theorem. Appealing to the Contraction
Mapping Principle, we conclude that T has a unique fixed point, which is the solution to
the equation.
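As a quick numerical illustration (a sketch, not part of the argument), iterating T from any starting point converges at the geometric rate γ = 5/8:

```python
import math

x = 0.0
for _ in range(60):                        # (5/8)^60 is far below 1e-12
    x = 0.5 + math.cos(5 * x) / 8.0

print(x)                                   # the unique fixed point
print(x - (0.5 + math.cos(5 * x) / 8.0))   # residual, essentially 0
```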

This example is a rather straightforward application of the fixed point theorem. Now
we describe a common situation where the theorem applies. Let (X, ‖ · ‖) be a
normed space and Φ : X → X a map satisfying Φ(x0) = y0. We ask: is the equation
Φ(x) = y locally solvable? That is, for all y sufficiently near y0, is there some x close to
x0 so that Φ(x) = y holds? We have the following result.

Theorem 3.4 (Perturbation of Identity). Let (X, ‖ · ‖) be a Banach space and
Φ : Br(x0) → X satisfy Φ(x0) = y0. Suppose that Φ is of the form I + Ψ, where I is the
identity map and Ψ satisfies

‖Ψ(x2) − Ψ(x1)‖ ≤ γ‖x2 − x1‖ ,  x1, x2 ∈ Br(x0), γ ∈ (0, 1) .

Then for each y ∈ BR(y0), R = (1 − γ)r, there is a unique x ∈ Br(x0) satisfying Φ(x) = y.

The idea of the following proof can be explained in a few words. Taking x0 = y0 = 0
for simplicity, we would like to find x solving x + Ψ(x) = y. This is equivalent to finding
a fixed point of the map T given by T x = y − Ψ(x). By our assumption Ψ is a
contraction, and hence so is T.

Proof. We first shift the points x0 and y0 to 0 by redefining Φ. Indeed, for x ∈ Br(0), let

Φ̃(x) = Φ(x + x0) − Φ(x0) = x + Ψ(x + x0) − Ψ(x0) .

Then Φ̃(0) = 0. Consider the map on Br(0) given by

T x = x − (Φ̃(x) − y) ,  y ∈ BR(0) .

We would like to verify that T is a well-defined contraction on Br(0). First, we claim that
T maps Br(0) into itself. Indeed,

‖T x‖ = ‖x − (Φ̃(x) − y)‖
    = ‖Ψ(x0) − Ψ(x0 + x) + y‖
    ≤ ‖Ψ(x0 + x) − Ψ(x0)‖ + ‖y‖
    ≤ γ‖x‖ + R
    ≤ γr + (1 − γ)r = r.

Next, we claim that T is a contraction. Indeed,

‖T x2 − T x1‖ = ‖Ψ(x1 + x0) − Ψ(x2 + x0)‖ ≤ γ‖x2 − x1‖ .

As Br(0) is a closed subset of the complete space X, it is also complete. The Contraction
Mapping Principle can be applied to conclude that for each y ∈ BR(0) there is a unique
fixed point x of T in Br(0), that is, Φ̃(x) = y for a unique x ∈ Br(0).
The desired conclusion follows after going back to Φ.
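The proof is again constructive: in the shifted coordinates one solves x + Ψ(x) = y by iterating T x = y − Ψ(x). A minimal sketch, with our own illustrative choice Ψ(x) = x^2/4 (so Ψ(0) = 0 and γ = 1/2 on [−1, 1], hence r = 1 and R = 1/2):

```python
def solve_perturbed_identity(Psi, y, x0=0.0, iters=100):
    """Solve x + Psi(x) = y by the contraction iteration x <- y - Psi(x)."""
    x = x0
    for _ in range(iters):
        x = y - Psi(x)
    return x

Psi = lambda x: 0.25 * x**2      # |Psi'(x)| = |x|/2 <= 1/2 on [-1, 1]
y = 0.3                          # |y| < R = (1 - 1/2) * 1 = 1/2
x = solve_perturbed_identity(Psi, y)
print(x, x + Psi(x))             # second value reproduces y = 0.3
```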

Remark 3.1. (a) It suffices to assume that Ψ is a contraction on the open ball Br(x0) in
the theorem. As a contraction is uniformly continuous, it extends to a contraction with
the same contraction constant on the closed ball; see the exercises.

(b) By examining the proof above, one sees that the fixed point x lies in the open ball
Br(0) whenever y lies in the open ball BR(0). Indeed, when ‖y‖ < R,

‖T x‖ = ‖x − (Φ̃(x) − y)‖
    = ‖Ψ(x0) − Ψ(x0 + x) + y‖
    ≤ ‖Ψ(x0 + x) − Ψ(x0)‖ + ‖y‖
    < γ‖x‖ + R
    ≤ r.

It follows that the preimage x, which satisfies T x = x, belongs to Br(0).

(c) The inverse map that sends y ∈ BR(y0) back to x ∈ Br(x0), the fixed point of T, is
well-defined. Denote it by Φ^{−1}. We claim that it is continuous. For, let y1, y2 ∈ BR(y0).
Then xi = Φ^{−1}(yi), i = 1, 2, satisfy xi = yi − Ψ(xi) (in the shifted coordinates), so that

‖Φ^{−1}(y1) − Φ^{−1}(y2)‖ = ‖y1 − Ψ(x1) − (y2 − Ψ(x2))‖
    ≤ ‖y1 − y2‖ + ‖Ψ(x2) − Ψ(x1)‖
    ≤ ‖y1 − y2‖ + γ‖x1 − x2‖
    = ‖y1 − y2‖ + γ‖Φ^{−1}(y1) − Φ^{−1}(y2)‖ ,

which implies

‖Φ^{−1}(y1) − Φ^{−1}(y2)‖ ≤ (1/(1 − γ))‖y1 − y2‖ .

It follows that Φ^{−1} is uniformly continuous (in fact, Lipschitz continuous) in BR(y0).

Obviously, the terminology “perturbation of identity” comes from the expression

Φ̃(x) = Φ(x + x0) − Φ(x0) = x + Ψ(x + x0) − Ψ(x0) ,

which is of the form identity plus a term satisfying the “smallness condition”

‖Ψ(x + x0) − Ψ(x0)‖ ≤ γ‖x‖ ,  γ ∈ (0, 1) .

Example 3.7. Show that the equation 3x^4 − x^2 + x = −0.05 has a real root. We look
for a solution near 0. Let X be R and Φ(x) = x + Ψ(x) where Ψ(x) = 3x^4 − x^2, so
that Φ(0) = 0. According to the theorem, we need to find some r so that Ψ becomes a
contraction on Br(0). For x1, x2 ∈ Br(0), that is, x1, x2 ∈ [−r, r], we have

|Ψ(x1) − Ψ(x2)| = |(3x2^4 − x2^2) − (3x1^4 − x1^2)|
    ≤ (3|x2^3 + x2^2 x1 + x2 x1^2 + x1^3| + |x2 + x1|)|x2 − x1|
    ≤ (12r^3 + 2r)|x2 − x1| ,

so Ψ is a contraction as long as γ = 12r^3 + 2r < 1. Taking r = 1/4, γ = 11/16 < 1
will do the job. Then R = (1 − γ)r = 5/64 ≈ 0.078. We conclude that for every number
b with |b| < 5/64, the equation 3x^4 − x^2 + x = b has a unique root in (−1/4, 1/4). Now, −0.05
falls into this range, so the equation has a root.
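A numerical sketch of this example (illustration only): the iteration x ← b − Ψ(x) from the proof of Theorem 3.4, with b = −0.05, converges to the root.

```python
b, x = -0.05, 0.0
for _ in range(80):
    x = b - (3 * x**4 - x**2)    # x <- b - Psi(x)

print(x)                         # root, approximately -0.0477
print(3 * x**4 - x**2 + x)       # approximately -0.05
```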
Example 3.8. Solve

x − 3 sin^2(x − 1) = 1.01 .

Here we take Φ(x) = x − 3 sin^2(x − 1), x0 = 1 and y0 = Φ(1) = 1, so Ψ(x) = −3 sin^2(x − 1).
Using sin^2(x − 1) − sin^2(x′ − 1) = 2 sin(z − 1) cos(z − 1)(x − x′) = sin 2(z − 1)(x − x′),
where z lies between x and x′, we have

|−3 sin^2(x − 1) + 3 sin^2(x′ − 1)| ≤ 3|sin 2(z − 1)||x − x′| .

When |x − 1|, |x′ − 1| ≤ r, we have |sin 2(z − 1)| ≤ 2|z − 1| ≤ 2r. Therefore,

|−3 sin^2(x − 1) + 3 sin^2(x′ − 1)| ≤ 6r|x − x′| .

We take r = 1/7, so that γ = 6r = 6/7 and R = (1 − 6/7) · (1/7) = 1/49. We conclude that
the equation x − 3 sin^2(x − 1) = y has a unique solution x ∈ [1 − 1/7, 1 + 1/7] whenever
y ∈ [1 − 1/49, 1 + 1/49] ≈ [0.98, 1.02]; in particular this applies to y = 1.01.

The same method can be applied to solving systems of equations in R^n. We formulate
it as a general result.

Proposition 3.5. Let Φ(x) = x + Ψ(x) : U → R^n be C^1, where U is an open set in R^n
containing 0, Ψ(0) = 0 and ∇Ψ(0) = 0. Then there is some r > 0 such that Φ(x) = y has
a unique solution in Br(0) for each y in BR(0), R = r/2.

Proof. It suffices to verify that Ψ is a contraction on Br(0) for sufficiently small r; then
we can apply the theorem on perturbation of identity to obtain the desired result. To
this end, fix x1, x2 ∈ Br(0), where r is to be determined, and consider the function
ϕ(t) = Ψi(x1 + t(x2 − x1)). We have ϕ(0) = Ψi(x1) and ϕ(1) = Ψi(x2). By the mean
value theorem, there is some t* ∈ (0, 1) such that ϕ(1) − ϕ(0) = ϕ′(t*)(1 − 0) = ϕ′(t*).
By the Chain Rule,

ϕ′(t) = (d/dt) Ψi(x1 + t(x2 − x1)) = Σ_{j=1}^{n} (∂Ψi/∂xj)(x1 + t(x2 − x1))(x2j − x1j) .

Setting z = x1 + t*(x2 − x1), we have

Ψi(x2) − Ψi(x1) = Σ_{j=1}^{n} (∂Ψi/∂xj)(z)(x2j − x1j) .

Recall that for y = Ax, where A = (aij) is an n × n matrix and x, y ∈ R^n, the
Cauchy-Schwarz inequality yields

|y| ≤ (Σ_{i,j} aij^2)^{1/2} |x| .

Applying this inequality to our situation, we have

|Ψ(x1) − Ψ(x2)| ≤ M|x1 − x2| ,  M = sup_{|z|≤r} (Σ_{i,j} ((∂Ψi/∂xj)(z))^2)^{1/2} .

Using the assumptions ∇Ψ(0) = 0 and Ψ ∈ C^1, we can find some small r such that
M ≤ 1/2. Applying the theorem on perturbation of identity, the equation Φ(x) = y is
uniquely solvable for y ∈ BR(0), R = (1 − 1/2)r = r/2, with solution x ∈ Br(0).

Theorem 3.4 is also applicable to function spaces. Let us examine the following example.

Example 3.9. Consider the integral equation

y(x) = λg(x) + ∫_0^1 K(x, t) y^2(t) dt ,

where K ∈ C([0, 1]^2) and g ∈ C[0, 1] are given, and λ is a small parameter. We would like
to show that it admits a solution y as long as λ is small in a suitable sense. Our first job is to
formulate this problem as a perturbation of identity. We work on the Banach
space C[0, 1] and let

Φ(y)(x) = y(x) − ∫_0^1 K(x, t) y^2(t) dt ,

that is,

Ψ(y)(x) = − ∫_0^1 K(x, t) y^2(t) dt .

We further choose x0 to be 0, the zero function, so y0 = Φ(0) = 0. Then, for y1, y2 ∈ Br(0)
(r to be specified later),

‖Ψ(y2) − Ψ(y1)‖∞ ≤ sup_{x∈[0,1]} ∫_0^1 |K(x, t)||y2^2(t) − y1^2(t)| dt ≤ 2Mr‖y2 − y1‖∞ ,
M = max{|K(x, t)| : (x, t) ∈ [0, 1]^2} ,

which shows that Ψ is a contraction as long as γ = 2Mr < 1. Under this condition we
may apply Theorem 3.4 to conclude that for all λ such that |λ|‖g‖∞ ≤ R, R = (1 − γ)r, the
integral equation

y(x) − ∫_0^1 K(x, t) y^2(t) dt = λg(x)

has a unique solution y ∈ Br(0). For instance, fix r = 1/(4M) so that 2Mr = 1/2 and
R = 1/(8M). The integral equation is then solvable as long as |λ| < 1/(8M‖g‖∞).
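A discretized sketch of this example, with our own concrete choices K(x, t) = xt, g ≡ 1 and λ = 0.1 < 1/8 (none of which are specified in the text): the contraction iteration y ← λg + ∫K y^2 is performed on a grid, the integral being replaced by the trapezoidal rule.

```python
import numpy as np

n = 201
t = np.linspace(0.0, 1.0, n)
dt = t[1] - t[0]
w = np.full(n, dt); w[0] = w[-1] = dt / 2.0    # trapezoidal weights
K = np.outer(t, t)             # K(x, s) = x s, so M = max |K| = 1
g = np.ones(n)                 # g = 1, ||g|| = 1
lam = 0.1                      # |lam| < 1/(8 M ||g||) = 1/8

y = np.zeros(n)
for _ in range(50):
    y = lam * g + (K * y**2) @ w    # y(x) <- lam g(x) + int K(x,s) y(s)^2 ds

res = y - lam * g - (K * y**2) @ w
print(np.max(np.abs(res)))          # ~ 0: y solves the discretized equation
```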

You should be aware that in these two examples the underlying space is first the Euclidean
space and then the space of continuous functions under the sup-norm. This shows the
power of abstraction: the fixed point theorem applies to all complete metric spaces.

3.3 The Inverse Function Theorem

We start by recalling two old results. First, the general chain rule.
Let F : U → R^m and G : V → R^l, where U is open in R^n, V is open in R^m and
F(U) ⊂ V. Assume the partial derivatives of F and G exist in U and V respectively. The
Chain Rule asserts that the composition H = G ∘ F : U → R^l also has partial derivatives
in U. Moreover, writing F = (F1, · · · , Fm), G = (G1, · · · , Gl) and H = (H1, · · · , Hl), from

Hk(x1, · · · , xn) = Gk(F1(x), · · · , Fm(x)),  k = 1, · · · , l,

we have

∂Hk/∂xj = Σ_{i=1}^{m} (∂Gk/∂yi)(∂Fi/∂xj) .
It is handy to write things in matrix form. Let DF be the Jacobian matrix of F, that is,
the m × n matrix

DF = (∂Fi/∂xj)_{1≤i≤m, 1≤j≤n} ,

whose i-th row consists of ∂Fi/∂x1, · · · , ∂Fi/∂xn, and similarly for DG and DH. Then the
formula above becomes, as a matrix product,

DH(x) = DG(F(x)) DF(x) .

Next, the mean-value theorem in the one-dimensional case reads as f(y) − f(x) = f′(c)(y − x)
for some c lying between x and y. To remove the uncertainty of c, we note the
alternative formula

f(y) = f(x) + ∫_0^1 f′(x + t(y − x)) dt (y − x) ,

which is obtained from

f(y) = f(x) + ∫_0^1 (d/dt) f(x + t(y − x)) dt    (Fundamental Theorem of Calculus)
     = f(x) + ∫_0^1 f′(x + t(y − x)) dt (y − x)    (Chain Rule) .

We will need its n-dimensional version.

Proposition 3.6. Let F : B → R^n be C^1 where B is a ball in R^n. For x1, x2 ∈ B,

F(x2) − F(x1) = ∫_0^1 DF(x1 + t(x2 − x1)) dt · (x2 − x1) .

Here F(x2) − F(x1) and x2 − x1 are viewed as column vectors. Componentwise this means

Fi(x2) − Fi(x1) = Σ_{j=1}^{n} ∫_0^1 (∂Fi/∂xj)(x1 + t(x2 − x1)) dt (x2j − x1j),  i = 1, · · · , n .

Proof. Applying the Chain Rule to each component Fi, we have

Fi(x2) − Fi(x1) = ∫_0^1 (d/dt) Fi(x1 + t(x2 − x1)) dt
    = ∫_0^1 Σ_j (∂Fi/∂xj)(x1 + t(x2 − x1))(x2j − x1j) dt
    = (∫_0^1 DF(x1 + t(x2 − x1)) dt · (x2 − x1))_i .

The Inverse Function Theorem and the Implicit Function Theorem play a fundamental
role in analysis and geometry. They illustrate the principle of linearization, which is
ubiquitous in mathematics. We learned these theorems in advanced calculus but the proofs
were not emphasized. Now we fill the gap.

All is about linearization. Recall that a real-valued function f on an open interval I is
differentiable at some x0 ∈ I if there exists some a ∈ R such that

lim_{x→x0} (f(x) − f(x0) − a(x − x0))/(x − x0) = 0.

In fact, the value a is equal to f′(x0), the derivative of f at x0. We can rewrite the limit
above using the little-o notation:

f(x0 + z) − f(x0) = f′(x0)z + o(z),  as z → 0.

Here o(z) denotes a quantity satisfying lim_{z→0} o(z)/|z| = 0. The same situation carries
over to a real-valued function f in some open set in R^n. A function f is called differentiable
at x0 in this open set if there exists a vector a = (a1, · · · , an) such that

f(x0 + x) − f(x0) = Σ_{j=1}^{n} aj xj + o(|x|)  as x → 0.

Note that here x0 = (x10, · · · , xn0) is a vector. Again one can show that the vector a is
uniquely given by the gradient vector of f at x0,

∇f(x0) = (∂f/∂x1(x0), · · · , ∂f/∂xn(x0)) .

More generally, a map F from an open set in R^n to R^m is called differentiable at a point
x0 in this open set if each component of F = (f^1, · · · , f^m) is differentiable. We can write
the differentiability condition collectively in the following form:

F(x0 + x) − F(x0) = DF(x0)x + o(|x|),   (3.3)

where DF(x0) is the linear map from R^n to R^m given by

(DF(x0)x)_i = Σ_{j=1}^{n} aij(x0) xj ,  i = 1, · · · , m,

where (aij) = (∂f^i/∂xj) is the Jacobian matrix of F. (3.3) shows that near x0, that is,
when x is small, the function F is well approximated by the linear map plus a constant,
F(x0) + DF(x0)x, as long as DF(x0) is invertible (i.e., nonsingular). It suggests that the
local information of a map at a point of differentiability could be retrieved from its
linearization, which is much easier to analyse. This principle, called linearization, is widely
used in analysis. The Inverse Function Theorem is a typical result of linearization. It asserts
that a map is locally invertible if its linearization is invertible. Therefore, local bijectivity of
the map is ensured by the invertibility of its linearization. When DF(x0) is not invertible,
the first term on the right hand side of (3.3) may degenerate in some or even all directions,
so that DF(x0)x cannot control the error term o(|x|). In this case the local behavior of F
may be different from that of its linearization.
Theorem 3.7 (Inverse Function Theorem). Let F : U → R^n be a C^1-map where U
is open in R^n and x0 ∈ U. Suppose that DF(x0) is invertible.

(a) There exist open sets V and W containing x0 and F(x0) respectively such that the
restriction of F to V is a bijection onto W with a C^1-inverse.

(b) The inverse is C^k when F is C^k, 1 ≤ k ≤ ∞, in V.

A map from some open set in R^n to R^m is C^k, 1 ≤ k ≤ ∞, if all its components belong
to C^k. It is called a C^∞-map, or a smooth map, if its components are C^∞. Similarly, a
matrix is C^k or smooth if its entries are C^k or smooth accordingly.
The condition that DF(x0) is invertible, or equivalently that the determinant of the
Jacobian matrix does not vanish, is called the nondegeneracy condition. Without
this condition, the map may or may not be locally invertible; see the examples below.
When the inverse is differentiable, we may apply the chain rule to differentiate the
relation F^{−1}(F(x)) = x to obtain

DF^{−1}(y0) DF(x0) = I ,  y0 = F(x0),

where I is the identity matrix. We conclude that

DF^{−1}(y0) = (DF(x0))^{−1} .

In other words, the matrix of the derivative of the inverse map is precisely the inverse
matrix of the derivative of the map. Thus, although the inverse may exist without the
nondegeneracy condition, this condition is necessary in order to have a differentiable
inverse. We single this out in the following proposition.

Proposition 3.8. Let F : U → R^n be a C^1-map and x0 ∈ U. Suppose for some open V
in U containing x0, F is invertible on V with a differentiable inverse. Then DF(x0) is
nonsingular.

Now we prove Theorem 3.7. At first sight it is not clear how to link this theorem to
the theorem on perturbation of identity. The idea of the proof is as follows. Taking
x0 = y0 = 0, formally we have F(x) = F(0) + DF(0)x + (1/2)D^2F(0)x^2 + · · · . Hence
solving F(x) = y is the same as solving DF(0)x + (1/2)D^2F(0)x^2 + · · · = y. Since DF(0)
is invertible, this is equivalent to solving x + DF(0)^{−1}((1/2)D^2F(0)x^2 + · · ·) = DF(0)^{−1}y,
which is an equation of the form x + Ψ(x) = y.
Now let us turn to the proof. First assume that x0 = y0 = 0 and DF(0) = I, the
identity matrix. We write F(x) = y as x + Ψ(x) = y where Ψ(x) = F(x) − x, and apply
Theorem 3.4. For this purpose we need to verify that Ψ is a contraction. First fix a ball
Br0(0) satisfying Br0(0) ⊂ U; as U is open and 0 ∈ U, this is always possible. For
x1, x2 ∈ Br0(0), we have, by Proposition 3.6,

|Ψ(x1) − Ψ(x2)| = |F(x1) − x1 − (F(x2) − x2)|
    = |F(x1) − F(x2) − (x1 − x2)|
    = |B · (x1 − x2)| ,

where the matrix B is given by

B = ∫_0^1 (DF(x2 + t(x1 − x2)) − DF(0)) dt ,

where we have used the assumption DF(0) = I. The ij-th entry of B = (bij) is given by

bij = ∫_0^1 ((∂Fi/∂xj)(x2 + t(x1 − x2)) − (∂Fi/∂xj)(0)) dt .

By the continuity of ∂Fi/∂xj at 0, given ε > 0, there is some r ≤ r0 such that

|(∂Fi/∂xj)(x) − (∂Fi/∂xj)(0)| < ε ,  ∀x ∈ Br(0) .

As x2 + t(x1 − x2) ∈ Br(0) whenever x1, x2 ∈ Br(0),

|(∂Fi/∂xj)(x2 + t(x1 − x2)) − (∂Fi/∂xj)(0)| < ε ,  x1, x2 ∈ Br(0) .
It follows that

|Ψ(x1) − Ψ(x2)| = |B · (x1 − x2)|
    ≤ (Σ_{i,j} bij^2)^{1/2} |x1 − x2|
    ≤ (n^2 ε^2)^{1/2} |x1 − x2|
    = nε|x1 − x2| .

Now, choosing ε to be 1/(2n), we find some r ≤ r0 such that

|Ψ(x1) − Ψ(x2)| ≤ (1/2)|x2 − x1| ,  ∀x1, x2 ∈ Br(0) ,

that is, Ψ is a contraction with γ = 1/2.
By Theorem 3.4 and Remark 3.1, we conclude that F(x) = y is uniquely solvable
for y ∈ BR(0), R = (1 − 1/2)r = r/2, with x ∈ Br(0). Moreover, the inverse G of F is
continuous from BR(0) back to Br(0), and its image G(BR(0)) = F^{−1}(BR(0)) ∩ Br(0)
is an open set in Br(0). Indeed, from Remark 3.1(c) we have, since 1/(1 − γ) = 2,

|G(y1) − G(y2)| ≤ 2|y1 − y2| ,  y1, y2 ∈ BR(0) .   (3.4)

It remains to establish the differentiability of the inverse map. As DF is by assumption
invertible at 0, we may further restrict r so that DF is invertible in Br(0). We take
W = BR(0) and V = G(BR(0)) and claim that the partial derivatives of G exist in BR(0).

To this end we recall the following fact: the partial derivatives of a function Φ exist at
x0 if there is an n × n matrix A such that

Φ(x0 + x) − Φ(x0) = Ax + R ,  where R = o(|x|) .

Moreover, when this happens, DΦ(x0) = A (see Remark 3.2 below). Here we are
concerned with Φ = G.
Let y ∈ BR(0) and y′ ∈ BR(0) be close to y. Let x and x′ be their respective preimages
in Br(0) under F. We have

F(x′) = F(x) + DF(x)(x′ − x) + o(|x′ − x|) .

Writing it in terms of y,

y′ = y + DF(G(y))(G(y′) − G(y)) + o(|x′ − x|) .

Here o(|x′ − x|) is a quantity which satisfies o(|x′ − x|)/|x′ − x| → 0 as x′ → x. Now, since G
is continuous, as y′ → y, x′ = G(y′) → x = G(y), and, in view of (3.4),

o(|x′ − x|)/|y′ − y| = (o(|x′ − x|)/|x′ − x|) × (|G(y′) − G(y)|/|y′ − y|) → 0 ,  as y′ → y .

So we can write

y′ − y = DF(G(y))(G(y′) − G(y)) + o(|y′ − y|) .

Since DF(x) is invertible in Br(0),

(DF(G(y)))^{−1}(y′ − y) = G(y′) − G(y) + o(|y′ − y|) ,

that is,

G(y′) − G(y) = (DF(G(y)))^{−1}(y′ − y) + o(|y′ − y|) .

We conclude that G is differentiable in BR(0) and DG(y) = (DF(G(y)))^{−1}.
From linear algebra we know that each entry of DG(y) can be expressed as a rational
function of the entries of the matrix DF(G(y)). Consequently, DG(y) is C^k in y if
DF(G(y)) is C^k, for 1 ≤ k ≤ ∞.
So far we have been assuming x0 = y0 = 0 and DF(0) = I. For a general F and x0, y0,
set

F̃(x) = A(F(x + x0) − y0) ,  A = (DF(x0))^{−1} .

Then F̃ is a C^1-map in the open set Ũ ≡ U − x0 and it satisfies F̃(0) = 0 and DF̃(0) = I.
By what has been done, F̃ admits an inverse G̃ from some open set W̃ containing 0 to an
open set Ṽ containing 0 in Ũ. Letting V = Ṽ + x0 and W = A^{−1}W̃ + y0, then V and W
are open sets containing x0 and y0 respectively. Define

G(y) = G̃(A(y − y0)) + x0 ,

which maps W bijectively onto V. We claim that F(G(y)) = y for y ∈ W. For, observe that

F(x) = A^{−1}F̃(x − x0) + y0 ,  x ∈ V .

We have

F(G(y)) = A^{−1}F̃(G(y) − x0) + y0
    = A^{−1}F̃(G̃(A(y − y0)) + x0 − x0) + y0
    = A^{−1}(A(y − y0)) + y0
    = y.

Finally, observe that G is C^k in W as long as G̃ is C^k in W̃. The proof of the Inverse
Function Theorem is complete.
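The proof again contains an algorithm. To solve F(x) = y for y near y0 = F(x0), set A = (DF(x0))^{−1} and iterate x ← x + A(y − F(x)); in the normalized coordinates this is exactly T x = y − Ψ(x). A sketch with our own example F(x1, x2) = (x1 + x2^2, x2 + x1^2) and x0 = 0, where DF(0) = I:

```python
import numpy as np

def F(x):
    return np.array([x[0] + x[1]**2, x[1] + x[0]**2])

A = np.eye(2)                  # A = (DF(0))^{-1} = I for this F
y = np.array([0.05, -0.02])    # a target near F(0) = (0, 0)

x = np.zeros(2)
for _ in range(50):
    x = x + A @ (y - F(x))     # contraction iteration from the proof
print(x, F(x))                 # F(x) reproduces y
```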

Remark 3.2. Recall that, given a function ϕ : U → R where U ⊂ R^n is open and
x0 ∈ U, the partial derivatives of ϕ at x0 exist if there is some α ∈ R^n such that

ϕ(x0 + x) − ϕ(x0) = Σ_{j=1}^{n} αj xj + o(|x|) .

When this happens, ∂ϕ/∂xj(x0) = αj. For Φ : U → R^n, applying this fact to each
component Φi of Φ, we see that the Jacobian matrix DΦ at x0 exists if there is a matrix
A = (αij) such that

Φ(x0 + x) − Φ(x0) = Ax + o(|x|) .

When this happens, ∂Φi/∂xj(x0) = αij.

Example 3.10. The Inverse Function Theorem asserts local invertibility. Even if the
linearization is nonsingular everywhere, we cannot assert global invertibility. Consider

x = e^t cos θ,  y = e^t sin θ .

The map F : R^2 → R^2 given by F(t, θ) = (x, y) is continuously differentiable with
everywhere nonsingular Jacobian matrix. However, F is clearly not bijective; for instance,
all points (t, θ + 2nπ), n ∈ Z, have the same image under F.
Example 3.11. An exceptional case is dimension one, where a global result is available.
Indeed, in MATH 2060 we learned that if f is continuously differentiable on (a, b) with
non-vanishing f′, then it is either strictly increasing or strictly decreasing, so its global
inverse exists and is again continuously differentiable.
Example 3.12. Consider the map F : R^2 → R^2 given by F(x, y) = (x^2, y). Its Jacobian
matrix is singular at (0, 0). In fact, for any point (a, b), a > 0, F(±√a, b) = (a, b). We
cannot find any open set around (0, 0), no matter how small, on which F is injective. On the
other hand, the map H(x, y) = (x^3, y) is bijective with inverse given by J(x, y) = (x^{1/3}, y).
However, as the nondegeneracy condition does not hold at (0, 0), the inverse J is not
differentiable there. In both cases the Jacobian matrix is singular at the origin, so the
nondegeneracy condition fails.

Next we discuss the Implicit Function Theorem. The simplest situation of this general
theorem concerns the zero set (or locus) of a single function f(x, y) in the plane. Namely,
when is the zero set {(x, y) : f(x, y) = 0} a curve? Consider the function x^2 + y^2 − 1,
whose zero set is the unit circle. It is easy to see that it is a curve. Moreover, near any of
its points with (x, y) ≠ (±1, 0) it is a graph over the x-axis, and, near any of its points
with (x, y) ≠ (0, ±1), it is a graph over the y-axis.

Theorem 3.9. Let f be a C^1-function in some open set U in the plane with f(x0, y0) = 0.
Suppose that fy(x0, y0) ≠ 0. Then there are an open interval I containing x0, an open set
V containing (x0, y0) in U, and a C^1-function ϕ on I whose graph lies in V such that
{(x, y) ∈ V : f(x, y) = 0} = {(x, ϕ(x)) : x ∈ I}.

In other words, the zero set of f near (x0, y0) is given by the graph (x, ϕ(x)),
hence it is a curve. Likewise, when fx(x0, y0) ≠ 0, the locus is locally given by the graph
{(ψ(y), y) : y ∈ J} for some interval J containing y0 and some C^1-function ψ on J satisfying
ψ(y0) = x0.

Proof. Define Φ(x, y) = (x, f(x, y)). Then Φ(x0, y0) = (x0, 0) and det DΦ(x0, y0) =
fy(x0, y0) ≠ 0. By the Inverse Function Theorem, Φ has a C^1-inverse Ψ defined on some
open set W containing (x0, 0) and satisfying Φ(Ψ(x, z)) = (x, z) on W. By shrinking W a
bit, we may assume W = I × J for two open intervals I = (a, b) and J = (c, d). Writing
Ψ(x, z) = (Ψ1(x, z), Ψ2(x, z)), we have

Φ(Ψ1(x, z), Ψ2(x, z)) = (x, z).

On the other hand, from the definition of Φ,

Φ(Ψ1(x, z), Ψ2(x, z)) = (Ψ1(x, z), f(Ψ1(x, z), Ψ2(x, z))).

By comparing the two components, we have Ψ1(x, z) = x and f(Ψ1(x, z), Ψ2(x, z)) = z.
It follows that f(x, Ψ2(x, z)) = z. Thus each horizontal line I × {z}, z ∈ J, is mapped
by Ψ to a curve (x, Ψ2(x, z)). By restricting U we may assume fy(x, y) ≠ 0 for all
(x, y) ∈ U. When fy > 0, the horizontal lines I × {c} and I × {d} are mapped to the curves
(x, Ψ2(x, c)) and (x, Ψ2(x, d)) respectively, with Ψ2(x, c) < Ψ2(x, d). (When fy < 0,
Ψ2(x, c) > Ψ2(x, d).) Thus the image of I × J under Ψ is precisely the region bounded by
the vertical lines x = a, x = b and the two curves (x, Ψ2(x, c)) and (x, Ψ2(x, d)). In
particular, at z = 0 we have f(x, Ψ2(x, 0)) = 0. Our desired conclusion follows by taking
ϕ(x) = Ψ2(x, 0).
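A numerical sketch of the theorem (our own example, not from the text): for f(x, y) = x^2 + y^2 − 1 near (x0, y0) = (0, 1), where fy(0, 1) = 2 ≠ 0, the damped iteration y ← y − f(x, y)/fy(x0, y0) is a contraction for x near x0 and recovers ϕ(x) = √(1 − x^2).

```python
import math

def phi(x, y0=1.0, fy0=2.0, iters=60):
    """Solve f(x, y) = x^2 + y^2 - 1 = 0 for y near y0 by iteration."""
    y = y0
    for _ in range(iters):
        y = y - (x**2 + y**2 - 1.0) / fy0
    return y

for x in (0.0, 0.1, 0.3):
    print(x, phi(x), math.sqrt(1.0 - x**2))  # phi(x) matches sqrt(1 - x^2)
```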

Let us look at some examples.

Example 3.13. First, consider the function f1(x, y) = x − y^2 + 3. We have f1(−3, 0) = 0
and f1x(−3, 0) = 1 ≠ 0. By the Implicit Function Theorem, the zero set of f1 can be
described near (−3, 0) by a function x = ϕ(y) near y = 0. Indeed, by solving the equation
f1(x, y) = 0, ϕ(y) = y^2 − 3. On the other hand, f1y(−3, 0) = 0, and from the formula
y = ±√(x + 3) we see that the zero set is not a graph over any open interval containing −3.

Next we consider the function f2(x, y) = x^2 − y^2 at (0, 0). We have f2x(0, 0) =
f2y(0, 0) = 0. Indeed, the zero set of f2 consists of the two straight lines y = x and
y = −x intersecting at the origin. It is impossible to express it as the graph of a single
function near the origin.
Finally, consider the function f3(x, y) = x^2 + y^2 at (0, 0). We have f3x(0, 0) =
f3y(0, 0) = 0. Indeed, the zero set of f3 degenerates into a single point {(0, 0)}, which
cannot be the graph of any function.

Next we state the general Implicit Function Theorem.

Theorem 3.10 (Implicit Function Theorem). Consider a C^1-map F : U → R^m where
U is an open set in R^n × R^m. Suppose that (x0, y0) ∈ U satisfies F(x0, y0) = 0 and
DyF(x0, y0) is invertible in R^m. Then there is an open set G in R^n containing x0, an open
set V containing (x0, y0) in U, and a C^1-map ϕ on G whose graph lies in V such that

{(x, y) ∈ V : F(x, y) = 0} = {(x, ϕ(x)) : x ∈ G} .

The map ϕ belongs to C^k on G when F is C^k, 1 ≤ k ≤ ∞, in U.

In words, this theorem asserts that near (x0, y0) the locus of F(x, y) = 0 is given by the
graph of ϕ.
The notation DyF(x0, y0) stands for the Jacobian matrix (∂Fi/∂yj(x0, y0))_{i,j=1,··· ,m},
where x0 is fixed. In general, a version of the Implicit Function Theorem holds when the
rank of DF at a point is m; in this case we can rearrange the independent variables to
make DyF nonsingular at that point.
The proof of the general case is essentially the same as the proof of the simplest case.

Proof. Consider Φ : U → R^n × R^m given by

Φ(x, y) = (x, F(x, y)).

One readily checks that det DΦ(x, y) = det DyF(x, y), so det DΦ(x0, y0) ≠ 0. By the
Inverse Function Theorem, there exists a C^1-inverse Ψ = (Ψ1, Ψ2) from some open set
W in R^n × R^m containing Φ(x0, y0) = (x0, 0) to an open subset of U. By restricting W
further we may assume it is of the form V1 × V2, where V1 and V2 are rectangles centered
at x0 and 0 respectively. We have

Φ(Ψ1(x, z), Ψ2(x, z)) = (x, z),  (x, z) ∈ V1 × V2 .

On the other hand, the definition of Φ gives Φ(Ψ1(x, z), Ψ2(x, z)) = (Ψ1(x, z), F(Ψ1(x, z), Ψ2(x, z))).
Therefore,

Ψ1(x, z) = x,  and  F(Ψ1(x, z), Ψ2(x, z)) = z.

In other words, F (x, Ψ2 (x, z)) = z holds for (x, z) ∈ V1 × V2 . In particular, taking z = 0
gives
F (x, Ψ2 (x, 0)) = 0, ∀x ∈ V1 ,
so the function ϕ(x) ≡ Ψ2 (x, 0) satisfies our requirement. Here we take G = V1 and
V = Ψ(V1 × V2 ).

A basic fact we pick up from the Implicit Function Theorem is that, keeping the
notation of Theorem 3.10, whenever DF is of full rank on the locus of F(x, y) = 0,
the locus is an “n-dimensional surface” in R^{n+m}. Thinking of the n + m free variables
as being constrained by the m equations F(x, y) = 0, thus leaving n free variables, the
terminology of an n-dimensional surface is easily understood. In general, for a
given smooth F, the level set {(x, y) : F(x, y) = c} may or may not be an n-dimensional
surface. We call c a regular value of F if DF(x, y) is of rank m at every (x, y) satisfying
F(x, y) = c. A theorem of Sard asserts that for a smooth F, regular values are of full
measure. It implies that, in case c is not a regular value, we can always find some regular
value c′ arbitrarily close to c. For instance, 0 is not a regular value of the function
x^2 − y^2, but every a ≠ 0 is a regular value.
It is interesting to note that the Inverse Function Theorem can in turn be deduced from
the Implicit Function Theorem, so the two are equivalent. To see this, keep the notation
used in Theorem 3.7 and define a map Φ : U × R^n → R^n by

Φ(x, y) = F(x) − y.

Then Φ(x0, y0) = 0 and DxΦ(x0, y0) = DF(x0) is invertible. By Theorem
3.10 (with the roles of x and y exchanged), there exists a C^1-function ϕ near y0 satisfying
ϕ(y0) = x0 and Φ(ϕ(y), y) = F(ϕ(y)) − y = 0; hence ϕ is the local inverse of F.

We end this section by providing a justification of the method of Lagrange multipliers
in optimization.

Theorem 3.11. Let f and g1, · · · , gm, 1 ≤ m < n, be C^1-functions in some open set
U in R^n. Suppose that p0 is a local minimum of f subject to the constraints gj(p) =
0, j = 1, · · · , m. Assuming that DG(p0), G = (g1, · · · , gm), is of rank m, there is some
λ = (λ1, · · · , λm) such that

∇f(p0) + Σ_{j=1}^{m} λj ∇gj(p0) = 0 .

Proof. Here we take n = 3, m = 1 and write p0 = (x0, y0, z0). Since ∇g(x0, y0, z0) ≠
(0, 0, 0), without loss of generality we may assume gz(p0) ≠ 0. By the Implicit Function
Theorem, there is some open set V ⊂ R^2 containing (x0, y0) and a C^1-function ϕ on V
such that the locus of g = 0 near p0 is given by the graph of ϕ, that is, g(x, y, ϕ(x, y)) = 0
for (x, y) ∈ V and ϕ(x0, y0) = z0. It follows that (x0, y0) is a local minimum of the
function h(x, y) ≡ f(x, y, ϕ(x, y)) on V. We have

hx(x0, y0) = fx(x0, y0, z0) + fz(x0, y0, z0)ϕx(x0, y0) = ∇f · (1, 0, ϕx) = 0,

and

hy(x0, y0) = fy(x0, y0, z0) + fz(x0, y0, z0)ϕy(x0, y0) = ∇f · (0, 1, ϕy) = 0.

That is, ∇f(p0) is perpendicular to the two-dimensional subspace spanned by (1, 0, ϕx) and
(0, 1, ϕy) at p0. (In fact, this subspace is the tangent space of {g = 0} at p0.) On the other
hand, by differentiating g(x, y, ϕ(x, y)) = 0, we also have

gx + gzϕx = ∇g · (1, 0, ϕx) = 0,  and  gy + gzϕy = ∇g · (0, 1, ϕy) = 0,

evaluated at p0. It follows that ∇g(p0) is also perpendicular to this subspace, and the three
vectors ∇g(p0), (1, 0, ϕx), (0, 1, ϕy) form a basis of R^3. Therefore, ∇f(p0) is parallel to
∇g(p0), that is, ∇f + λ∇g = 0 at p0 for some λ.

3.4 Picard-Lindelöf Theorem for Differential Equations

In this section we discuss the fundamental existence and uniqueness theorem for
differential equations. I assume that you have already learned the skills of solving ordinary
differential equations, so we will focus on the theoretical aspects.
Most differential equations cannot be solved explicitly; in other words, their solutions
cannot be expressed as compositions of elementary functions. Nevertheless, there are two
exceptional classes which come up very often. Let us review them before going into the
theory.
Example 3.14. Consider the equation

dx/dt = a(t)x + b(t),

where a and b are continuous functions defined on some open interval I. This differential
equation is called a linear differential equation because it is linear in x (with coefficients
which are functions of t). The general solution of this linear equation is given by the formula

x(t) = e^{α(t)} (x0 + ∫_{t0}^{t} e^{−α(s)} b(s) ds) ,  α(t) = ∫_{t0}^{t} a(s) ds,

where t0 ∈ I and x0 ∈ R are arbitrary.
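As a sanity check of the formula (a sketch with our own choices a ≡ 1, b(t) = t, t0 = 0 and x0 = 2, where the integral evaluates in closed form, giving x(t) = e^t(x0 + 1) − t − 1):

```python
import math

x0 = 2.0
def x(t):
    # the solution formula with a = 1, b(s) = s, t0 = 0:
    # x(t) = e^t (x0 + int_0^t e^{-s} s ds) = e^t (x0 + 1) - t - 1
    return math.exp(t) * (x0 + 1.0) - t - 1.0

t, h = 0.7, 1e-6
print((x(t + h) - x(t - h)) / (2 * h) - (x(t) + t))  # ~ 0: x' = x + t
print(x(0.0))                                        # 2.0 = x0
```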



Example 3.15. The second class consists of the so-called separable equations

dx/dt = f(t)/g(x),

where f and g ≠ 0 are continuous functions on intervals I and J respectively. The solution
can be obtained by an integration:

∫_{x0}^{x} g(z) dz = ∫_{t0}^{t} f(s) ds,  t0 ∈ I, x0 ∈ J.

The resulting relation, written as G(x) = F(t), can be converted formally into
x = G^{−1}(F(t)), a solution to the equation, as is immediately verified by the chain rule.
For instance, consider the equation

dx/dt = (t + 3)/x .

The solution is given by integrating

∫_{x0}^{x} z dz = ∫_{t0}^{t} (s + 3) ds ,

to get

x^2 = t^2 + 6t + c ,  c ∈ R.

We have

x(t) = ±√(t^2 + 6t + c) .

When x(0) = −2 is specified, we find the constant c = 4, so the solution is given by

x(t) = −√(t^2 + 6t + 4) .

More interesting explicitly solvable equations can be found in texts on ODEs.
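A quick numerical check (a sketch) that the solution just found satisfies the equation and the initial condition, using a centered difference for the derivative:

```python
import math

def x(t):
    return -math.sqrt(t * t + 6.0 * t + 4.0)

t, h = 0.5, 1e-6
lhs = (x(t + h) - x(t - h)) / (2.0 * h)   # numerical x'(t)
rhs = (t + 3.0) / x(t)
print(x(0.0))       # -2.0, the initial condition
print(lhs - rhs)    # ~ 0, so x' = (t + 3)/x
```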


Well, let us consider the general situation. Numerous problems in the natural sciences and
engineering lead to the initial value problem for differential equations. Let f be a function
defined in some set E in R^2 and (t0, x0) an interior point of E. We ask: is there a solution
x = x(t), defined in some interval I containing t0 with (t, x(t)) ∈ E, ∀t ∈ I, satisfying the
differential equation x′ = f(t, x) as well as x(t0) = x0? Since we are looking for a local
solution, we may formulate the problem on a rectangle centered at (t0, x0). In
the following we take E to be the rectangle R = [t0 − a, t0 + a] × [x0 − b, x0 + b] for
some a, b > 0 and consider the initial value problem (IVP) (also called the Cauchy
problem)

dx/dt = f(t, x),  x(t0) = x0 .   (IVP)

(In some books the independent variable t is replaced by x and the dependent variable
x by y. We prefer t as the independent variable since in many cases it is time.) To solve
the initial value problem means to find a function x(t) defined in a perhaps smaller
rectangle, that is, x : [t0 − a′, t0 + a′] → [x0 − b, x0 + b], which is differentiable and satisfies
x(t0) = x0 and x′(t) = f(t, x(t)), ∀t ∈ [t0 − a′, t0 + a′], for some 0 < a′ ≤ a. In general,
no matter how nice f is, we do not expect a solution to exist on the entire interval
[t0 − a, t0 + a]. Let us look at the following example.

Example 3.16. Consider the initial value problem

dx/dt = 1 + x^2,  x(0) = 0.

The function f(t, x) = 1 + x^2 is smooth on [−a, a] × [−b, b] for every a, b > 0. However,
the solution, as one can verify immediately, is x(t) = tan t, which is only defined
on (−π/2, π/2). This shows that even when f is very nice, a′ could be strictly less than a.
Furthermore, replacing the equation by x′ = α(1 + x^2), α > 0, the solution becomes
tan αt, which exists on (−π/(2α), π/(2α)). This indicates that the interval of existence
depends on f.

The Picard-Lindelöf Theorem, sometimes referred to as the fundamental theorem of
existence and uniqueness for differential equations, gives a clean condition on f ensuring
the unique solvability of the initial value problem (IVP). This condition imposes a further
regularity condition on f, reminiscent of what we encountered in the convergence of Fourier
series. Specifically, a function f defined on R satisfies the Lipschitz condition (uniformly
in t) if there exists some L > 0 such that for all (t, xi) ∈ R ≡ [t0 − a, t0 + a] × [x0 − b, x0 + b],
i = 1, 2,

|f(t, x1) − f(t, x2)| ≤ L|x1 − x2| .

Note that this in particular means that for each fixed t, f is Lipschitz continuous in x. The
constant L is called a Lipschitz constant. Obviously, if L is a Lipschitz constant for f, any
number greater than L is also a Lipschitz constant. Not all continuous functions satisfy
the Lipschitz condition; an example is the continuous function f(t, x) = tx^{1/2}.
I let you verify that it does not satisfy the Lipschitz condition on any rectangle containing
the origin.
In applications, most functions satisfying the Lipschitz condition arise in the following
manner. A C^1-function f(t, x) on a closed rectangle automatically satisfies the Lipschitz
condition. For, by the mean-value theorem, for some z lying on the segment between x1
and x2,

f(t, x2) − f(t, x1) = (∂f/∂x)(t, z)(x2 − x1).

Letting

L = max{|(∂f/∂x)(t, x)| : (t, x) ∈ R}

(L is finite because ∂f/∂x is continuous on R and hence bounded), we have
|f(t, x2) − f(t, x1)| ≤ L|x2 − x1|, ∀(t, xi) ∈ R, i = 1, 2.

Theorem 3.12 (Picard-Lindelöf Theorem). Consider (IVP) where f ∈ C(R) satisfies
the Lipschitz condition on R = [t0 − a, t0 + a] × [x0 − b, x0 + b]. There exist a′ ∈ (0, a) and
x ∈ C^1[t0 − a′, t0 + a′] with x0 − b ≤ x(t) ≤ x0 + b for all t ∈ [t0 − a′, t0 + a′], solving (IVP).
Furthermore, x is the unique solution on [t0 − a′, t0 + a′].

From the proof one will see that a′ can be taken to be any number satisfying

0 < a′ < min{a, b/M, 1/L} ,

where M = sup{|f(t, x)| : (t, x) ∈ R}.
To prove the Picard-Lindelöf Theorem, we first convert (IVP) into a single integral
equation.

Proposition 3.13. In the setting of the Picard-Lindelöf Theorem, every solution x of (IVP)
from [t0 − a′, t0 + a′] to [x0 − b, x0 + b] satisfies the equation

x(t) = x0 + ∫_{t0}^{t} f(s, x(s)) ds.   (3.7)

Conversely, every continuous function x(t), t ∈ [t0 − a′, t0 + a′], satisfying (3.7) is
continuously differentiable and solves (IVP).

Proof. When x satisfies x′(t) = f(t, x(t)) and x(t0) = x0, (3.7) is a direct consequence of
the Fundamental Theorem of Calculus (first form). Conversely, when x(t) is continuous
on [t0 − a′, t0 + a′], f(t, x(t)) is also continuous on the same interval. By the Fundamental
Theorem of Calculus (second form), the right hand side of (3.7) is continuously
differentiable, hence so is x, and differentiating both sides shows that x solves (IVP).

Note that in this proposition we do not need the Lipschitz condition; only the
continuity of f is needed.

Proof of Picard-Lindelöf Theorem. Instead of solving (IVP) directly, we look for a solution
of (3.7). We will work on the metric space

X = {x ∈ C[t0 − a′, t0 + a′] : x(t) ∈ [x0 − b, x0 + b], x(t0) = x0} ,

with the uniform metric (the metric induced by the sup-norm). It is easily verified that
X is a closed subset of the complete metric space C[t0 − a′, t0 + a′] and hence complete;
recall that every closed subset of a complete metric space is complete. The number a′
will be specified below.
We are going to define a contraction on X. Indeed, for x ∈ X, define T by

(T x)(t) = x0 + ∫_{t0}^{t} f(s, x(s)) ds.

First of all, for every x ∈ X, f(t, x(t)) is well defined and T x ∈ C[t0 − a′, t0 + a′]. To show
that T x lies in X, we need to verify x0 − b ≤ (T x)(t) ≤ x0 + b for all t ∈ [t0 − a′, t0 + a′].
We claim that this holds if we choose a′ satisfying a′ ≤ b/M, M = sup{|f(t, x)| : (t, x) ∈ R}.
For,

|(T x)(t) − x0| = |∫_{t0}^{t} f(s, x(s)) ds| ≤ M|t − t0| ≤ Ma′ ≤ b.

Next, we claim T is a contraction on X when a′ is further restricted to a′ < 1/L, where
L is the Lipschitz constant of f. For, with I = [t0 − a′, t0 + a′],

|(T x2 − T x1)(t)| = |∫_{t0}^{t} (f(s, x2(s)) − f(s, x1(s))) ds|
    ≤ |∫_{t0}^{t} |f(s, x2(s)) − f(s, x1(s))| ds|
    ≤ L|∫_{t0}^{t} |x2(s) − x1(s)| ds|
    ≤ L sup_{s∈I} |x2(s) − x1(s)| |t − t0|
    ≤ La′‖x2 − x1‖∞ .

It follows that

‖T x2 − T x1‖∞ ≤ γ‖x2 − x1‖∞ ,  γ = a′L < 1 .

Now we can apply the Contraction Mapping Principle to conclude that T x = x for some
x ∈ X, and this x solves (IVP). We have shown that (IVP) admits a solution on
[t0 − a′, t0 + a′], where a′ can be chosen to be any number less than min{a, b/M, 1/L}.
Finally, any solution of (IVP) on this interval belongs to X and is a fixed point of T, so
by uniqueness of the fixed point the (IVP) has a unique
solution on [t0 − a′, t0 + a′].
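The proof is constructive: the Picard iterates x_{k+1}(t) = x0 + ∫_{t0}^t f(s, x_k(s)) ds converge uniformly to the solution. A discretized sketch for Example 3.16 (x′ = 1 + x^2, x(0) = 0) on [0, 0.5], where the exact solution is tan t; the integral is approximated by the cumulative trapezoidal rule.

```python
import numpy as np

n = 501
t = np.linspace(0.0, 0.5, n)
dt = t[1] - t[0]

x = np.zeros(n)                       # x_0 = 0, the zero function
for _ in range(30):                   # Picard iterations
    f = 1.0 + x**2
    # cumulative trapezoidal integral of f from 0 to each grid point t
    x = np.concatenate(([0.0], np.cumsum((f[1:] + f[:-1]) * dt / 2.0)))

print(np.max(np.abs(x - np.tan(t))))  # small; only discretization error
```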

We point out that the existence part of the Picard-Lindelöf Theorem still holds without
the Lipschitz condition; we will prove this in the next chapter. However, the solution
may then fail to be unique.

The uniqueness assertion in this theorem is restricted to the interval [t0 − a′, t0 + a′].
In fact, uniqueness holds regardless of the size of the interval of existence. We have

Proposition 3.14. Consider (IVP) where f ∈ C(D), D ⊂ R^2 an open set, satisfies
the Lipschitz condition. Suppose x1 and x2 are two solutions of this (IVP) over an interval
I whose graphs lie inside D, and suppose that x1(t0) = x2(t0) at some t0 ∈ I. Then
x1 coincides with x2 on I.

Proof. For i = 1, 2, we have

xi(t) = xi(t0) + ∫_{t0}^{t} f(s, xi(s)) ds ,  t ∈ I .

By subtracting, as x1(t0) = x2(t0),

|x1(t) − x2(t)| ≤ |∫_{t0}^{t} |f(s, x1(s)) − f(s, x2(s))| ds| ≤ L|∫_{t0}^{t} |x1(s) − x2(s)| ds| .

Let us take t > t0. (The case t < t0 can be handled similarly.) The function

H(t) ≡ ∫_{t0}^{t} |x1(s) − x2(s)| ds

satisfies the differential inequality

H′(t) ≤ LH(t) ,  t ∈ I^+ ,  I^+ = I ∩ {t > t0}.

It satisfies H(t0) = 0 and is nondecreasing. Moreover, it vanishes on I^+ if and only if
x1 coincides with x2 on I^+. To show that H vanishes, we add an ε > 0 to the right hand
side of this differential inequality to get H′ ≤ L(H + ε). (Adding ε makes H + ε
always positive.) Writing this as (log(H + ε))′ ≤ L and integrating from t0 to t, we get

log(H(t) + ε) − log ε ≤ L(t − t0) ,

or

H(t) + ε ≤ εe^{L(t−t0)} ,  t ∈ I^+ .

Now the desired conclusion follows by letting ε → 0.

Under the assumptions of this proposition, let S be the collection of all pairs (x(t), I)
where x(t) is a solution over the open interval I whose graph passes through (t0, x0), t0 ∈ I.
Letting I* = ∪ Iα, where Iα ranges over all the intervals appearing in S, the function x* on I*
defined by x*(t) = xα(t) whenever t ∈ Iα is a well-defined function which solves (IVP)
over I*. It is called the maximal solution of (IVP).

The Picard-Lindelöf Theorem remains valid for systems of differential equations. Consider
the system

dxj/dt = fj(t, x1, x2, · · · , xn),  xj(t0) = x0j ,

where j = 1, 2, · · · , n. By setting x = (x1, x2, · · · , xn) and f = (f1, f2, · · · , fn), we can
express it as in (IVP), but now both x and f are vector-valued.
Essentially following the same arguments as in the case of a single equation, we have

Theorem 3.15 (Picard-Lindelöf Theorem for Systems). Consider (IVP) where f =
(f1, · · · , fn), fj ∈ C(R), satisfies the Lipschitz condition

‖f(t, x) − f(t, y)‖ ≤ L‖x − y‖ ,

for all (t, x), (t, y) ∈ R = [t0 − a, t0 + a] × [x01 − b, x01 + b] × · · · × [x0n − b, x0n + b]. There
exists a unique solution

x ∈ C^1[t0 − a′, t0 + a′],  x(t) ∈ [x01 − b, x01 + b] × · · · × [x0n − b, x0n + b],

to (IVP), where

0 < a′ < min{a, b/M, 1/L} ,  M = sup{|fj(t, x)| : (t, x) ∈ R, j = 1, · · · , n} .

Here, for x ∈ R^n, ‖x‖ is the Euclidean norm.


Finally, let me remind you that there is a standard way to convert the initial value
problem for a higher order differential equation (m ≥ 2),

x^{(m)} = f(t, x, x′, · · · , x^{(m−1)}),  x(t0) = x0, x′(t0) = x1, · · · , x^{(m−1)}(t0) = x_{m−1},

into a system of first order differential equations, by introducing the new unknowns
u1 = x, u2 = x′, · · · , um = x^{(m−1)}. As a result, we also have a corresponding
Picard-Lindelöf theorem for higher order differential equations. I let you formulate it; a
sketch of the reduction follows.
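For instance, for m = 2 the problem x″ = f(t, x, x′), x(t0) = x0, x′(t0) = x1, becomes u1′ = u2, u2′ = f(t, u1, u2) with u = (x, x′). A sketch (our own example x″ = −x, x(0) = 0, x′(0) = 1, whose solution is sin t, integrated with the plain Euler method purely for illustration):

```python
import math

def f(t, x, xp):
    return -x                    # the equation x'' = -x

u = [0.0, 1.0]                   # u = (x(0), x'(0))
t, h = 0.0, 1e-4
while t < 1.0:
    u = [u[0] + h * u[1], u[1] + h * f(t, u[0], u[1])]
    t += h

print(u[0], math.sin(1.0))       # Euler approximation vs exact sin(1)
```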

3.5 Appendix I: Completion of a Metric Space

A metric space (X, d) is said to be isometrically embedded in (Y, ρ) if there is a mapping
Φ : X → Y such that d(x, y) = ρ(Φ(x), Φ(y)) for all x, y ∈ X. Note that this condition
implies that Φ is injective and continuous. We call the metric space (Y, ρ) a completion
of (X, d) if it is complete, (X, d) is isometrically embedded in (Y, ρ), and the closure of
Φ(X) is Y. The latter condition is a minimality condition: (X, d) is enlarged merely to
accommodate those ideal points needed to make the space complete. When X is
isometrically embedded in Y, we may identify X with its image Φ(X) and d with ρ. Or, we
can imagine X being enlarged to a larger set Y where d is also extended to some ρ on Y
which makes Y complete.

Before the proof of the Completion Theorem we briefly describe the idea behind it.
When (X, d) is not complete, we need to invent ideal points and add them to X to make
it complete. The idea goes back to Cantor's construction of the real numbers from the
rational numbers. Suppose that we have only rational numbers and we want to add the
irrationals. First we identify Q with a proper subset of a larger set as follows. Let C be the
collection of all Cauchy sequences of rational numbers. Every point in C is of the form
(x1, x2, · · ·) where {xn}, xn ∈ Q, forms a Cauchy sequence. A rational number x is
identified with the constant sequence (x, x, x, . . .), or with any Cauchy sequence which
converges to x. For instance, 1 is identified with (1, 1, 1, . . .), (0.9, 0.99, 0.999, . . .) or
(1.01, 1.001, 1.0001, . . .). Clearly, there are Cauchy sequences which cannot be identified
with rational numbers. For instance, there is no rational number corresponding to
(3, 3.1, 3.14, 3.141, 3.1415, . . .); as we know, its correspondent should be the irrational
number π. A similar situation holds for the sequence (1, 1.4, 1.41, 1.414, · · ·), which should
correspond to √2. Since the correspondence is not injective, we make it into one by
introducing an equivalence relation on C. Indeed, {xn} and {yn} are said to be equivalent
if |xn − yn| → 0 as n → ∞. The equivalence relation ∼ yields the quotient C/∼, which is
denoted by C̃. Then x ↦ x̃ sends Q injectively into C̃. It can be shown that C̃ carries the
structure of the real numbers; in particular, those points not in the image of Q are exactly
the irrational numbers.
Now, for a metric space the situation is similar. We let X̃ be the quotient space of all
Cauchy sequences in X under the relation {xn} ∼ {yn} if and only if d(xn, yn) → 0. Define
d̃(x̃, ỹ) = lim_{n→∞} d(xn, yn), for any representatives {xn} of x̃ and {yn} of ỹ. We have the
isometric embedding (X, d) → (X̃, d̃), and we can further show that (X̃, d̃) is a completion
of (X, d).
The following proof is for optional reading. In the exercises we will present a simpler
but less instructive proof.

Proof of Theorem 3.2. Let C be the collection of all Cauchy sequences in (X, d). We
introduce a relation ∼ on C by x ∼ y if and only if d(xn, yn) → 0 as n → ∞. It is
routine to verify that ∼ is an equivalence relation on C. Let X̃ = C/∼ and define a map
d̃ : X̃ × X̃ → [0, ∞) by

d̃(x̃, ỹ) = lim_{n→∞} d(xn, yn),

where x = (x1, x2, x3, · · ·) and y = (y1, y2, y3, · · ·) are respective representatives of x̃ and
ỹ. We note that the limit in the definition always exists: for

d(xn, yn) ≤ d(xn, xm) + d(xm, ym) + d(ym, yn)

and, after switching m and n,

|d(xn, yn) − d(xm, ym)| ≤ d(xn, xm) + d(ym, yn).

As x and y are Cauchy sequences, d(xn, xm), d(ym, yn) → 0 as n, m → ∞, and so
{d(xn, yn)} is a Cauchy sequence of real numbers.

Step 1 (well-definedness of d̃). To show that d̃(x̃, ỹ) is independent of the choice of
representatives, let x ∼ x′ and y ∼ y′. We have

d(xn, yn) ≤ d(xn, x′n) + d(x′n, y′n) + d(y′n, yn).

After switching the roles of x and x′, and of y and y′,

|d(xn, yn) − d(x′n, y′n)| ≤ d(xn, x′n) + d(yn, y′n).

As x ∼ x′ and y ∼ y′, the right hand side of this inequality tends to 0 as n → ∞. Hence
lim_{n→∞} d(xn, yn) = lim_{n→∞} d(x′n, y′n).

Step 2 (d̃ is a metric). Let {xn }, {yn } and {zn } represent x̃, ỹ and z̃ respectively. We
have

d̃(x̃, z̃) = limn→∞ d(xn , zn )
        ≤ limn→∞ (d(xn , yn ) + d(yn , zn ))
        = limn→∞ d(xn , yn ) + limn→∞ d(yn , zn )
        = d̃(x̃, ỹ) + d̃(ỹ, z̃).

Symmetry and non-negativity are clear, and d̃(x̃, ỹ) = 0 holds exactly when d(xn , yn ) → 0,
that is, when x̃ = ỹ. So d̃ is a metric on X̃.

Step 3. We claim that there is a metric preserving map Φ : X → X̃ whose image Φ(X)
is dense in X̃, that is, the closure of Φ(X) equals X̃.

Given any x in X, the “constant sequence” (x, x, x, · · · ) is clearly a Cauchy sequence.
Let x̃ be its equivalence class in C. Then Φx = x̃ defines a map from X to X̃. Clearly

d̃(Φ(x), Φ(y)) = limn→∞ d(xn , yn ) = d(x, y),

since xn = x and yn = y for all n, so Φ is metric preserving and, in particular, injective.

To show that the closure of Φ(X) is X̃, we observe that any x̃ in X̃ is represented by some
Cauchy sequence x = (x1 , x2 , x3 , · · · ). For each n, consider Φ(xn ), the equivalence class
of the constant sequence (xn , xn , xn , · · · ) in Φ(X). We have

d̃(x̃, Φ(xn )) = limm→∞ d(xm , xn ).

Given ε > 0, there exists an n0 such that d(xm , xn ) < ε/2 for all m, n ≥ n0 . Hence
d̃(x̃, Φ(xn )) = limm→∞ d(xm , xn ) ≤ ε/2 < ε for n ≥ n0 . That is, Φ(xn ) → x̃ as n → ∞,
so the closure of Φ(X) is precisely X̃.

Step 4. We claim that (X̃, d̃) is a complete metric space. Let {x̃n } be a Cauchy sequence
in X̃. As the closure of Φ(X) is X̃, for each n we can find some ỹn in Φ(X) such that

d̃(x̃n , ỹn ) < 1/n.

So {ỹn } is also a Cauchy sequence in d̃. Let yn be the point in X such that the constant
sequence (yn , yn , yn , · · · ) represents ỹn . Since Φ is metric preserving and {ỹn } is a Cauchy
sequence in d̃, {yn } is a Cauchy sequence in X. Let ỹ be the equivalence class of
(y1 , y2 , y3 , · · · ) in X̃. We claim that ỹ = limn→∞ x̃n in X̃. For, we have

d̃(x̃n , ỹ) ≤ d̃(x̃n , ỹn ) + d̃(ỹn , ỹ)
         ≤ 1/n + limm→∞ d(yn , ym ) → 0

as n → ∞. We have shown that d̃ is a complete metric on X̃.

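As a numerical sanity check on Step 3, the following Python sketch (ours; all names are illustrative) estimates d̃(x̃, Φ(xn )) = limm→∞ d(xm , xn ) for the Babylonian √2 sequence in Q and watches it shrink as n grows:

    from fractions import Fraction

    def x(n):
        # Babylonian iterates converging to sqrt(2): a Cauchy sequence in Q.
        v = Fraction(1)
        for _ in range(n):
            v = (v + 2 / v) / 2
        return v

    xM = x(12)  # a far-out term standing in for the limit m -> infinity
    for n in range(1, 5):
        # |x_M - x_n| approximates d~(x~, Phi(x_n)).
        print(n, float(abs(xM - x(n))))

The printed values decay roughly quadratically in n, reflecting how fast the constant sequences Φ(xn ) approach x̃ in this example.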
Completion of a metric space is unique once we have clarified the meaning of unique-
ness. Indeed, call two metric spaces (X, d) and (X′, d′) isometric if there exists a metric
preserving map from (X, d) onto (X′, d′). Since a metric preserving map is always one-to-one,
the inverse of this map exists and is a metric preserving map from (X′, d′) to
(X, d). So two spaces are isometric provided there is a metric preserving map from one
onto the other. Two metric spaces will be regarded as the same if they are isometric,
since they cannot be distinguished after identifying a point in X with its image in
X′ under the metric preserving map. With this understanding, the completion of a
metric space is unique in the following sense: if (Y, ρ) and (Y′, ρ′) are two completions of
(X, d), then (Y, ρ) and (Y′, ρ′) are isometric. We will not go into the proof of this fact,
but instead leave it to the interested reader. In any case, it now makes sense to speak of
“the completion” of X rather than “a completion” of X.

3.6 Appendix II: Construction of Real Numbers

After the discovery of calculus by Newton and Leibniz, mathematics developed at an
incredible speed. However, rigor was not a concern for mathematicians of this period.
Toward the end of the nineteenth century, people began to feel the need to clarify various
notions, such as the convergence of series. They soon realized that mathematics should
be built upon the new theory of sets and the number systems. By the effort of many
people, nowadays the grand edifice of mathematics stands on relatively solid ground.
Mathematics is all about deduction. Proceeding from a few axioms, together with
insightful definitions, people deduce results from the simple to the sophisticated. Set theory is
the first step. Consider, for instance, the axioms proposed by Zermelo and Fraenkel, which
carefully tell us how a set may be constructed. A remarkable axiom in this theory is the axiom
of choice, which has many equivalent versions, including Zorn's lemma, commonly used
in analysis. With the notion of a set, one introduces ordered pairs and relations. Among
the many kinds of relations, equivalence relations and mappings are the most useful.
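As one concrete instance of building ordered pairs from sets (the text does not fix a particular definition, so this is our illustrative addition), Kuratowski's definition reads, in LaTeX notation:

    \[
      (a, b) := \{\{a\}, \{a, b\}\},
    \]
    \[
      (a, b) = (c, d) \iff a = c \ \text{and}\ b = d.
    \]

The second line, the characteristic property of ordered pairs, is then a theorem rather than an axiom.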
Next come the numbers. The construction of the number system follows the order:
natural numbers, integers, rational numbers, real numbers, and finally complex numbers.
Natural numbers are introduced by the five axioms of Peano:
A1. Zero is a natural number.
A2. Every natural number has a successor in the natural numbers.
A3. Zero is not the successor of any natural number.
A4. If two natural numbers have the same successor, then they are equal.
A5. If a set contains zero and contains the successor of every number in it, then the set
contains all natural numbers.
With these five axioms one establishes the unique factorization property of the natural
numbers and introduces prime numbers; thus classical number theory is born. After
defining the integers and their arithmetic, one introduces the rational numbers as ordered
pairs (p, q), where p, q are integers and q ≠ 0. A rational number is an equivalence
class of such pairs under the relation (p, q) ∼ (r, s) if and only if ps = qr. The arithmetic and
ordering of the integers extend easily to all rational numbers.
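The following short Python sketch (ours; all names are illustrative) mimics the two constructions just described: natural numbers generated from zero by a successor operation, with addition defined by recursion, and rationals as pairs of integers compared via ps = qr:

    ZERO = ()

    def succ(n):
        # A2: every natural number has a successor; model succ(n) as the tuple (n,).
        return (n,)

    def add(m, n):
        # Addition by recursion: m + 0 = m and m + succ(k) = succ(m + k).
        return m if n == ZERO else succ(add(m, n[0]))

    def to_int(n):
        # Unwrap the nested-tuple representation, for display only.
        return 0 if n == ZERO else 1 + to_int(n[0])

    two = succ(succ(ZERO))
    three = succ(two)
    print(to_int(add(two, three)))  # 5

    def rat_equiv(pq, rs):
        # Rationals as pairs: (p, q) ~ (r, s) iff p*s == q*r, with q, s nonzero.
        (p, q), (r, s) = pq, rs
        return p * s == q * r

    print(rat_equiv((1, 2), (3, 6)))  # True: both pairs represent one half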
There are two popular constructions of the real numbers from the rational numbers: Dedekind
cuts and Cantor's Cauchy sequences. The latter was outlined in class. You may search the
internet to learn more. (This is not within the scope of MATH3060.)
Recall that in Bartle and Sherbert's book, the construction of the real numbers is replaced
by a few additional axioms. For instance, it is assumed that R is an ordered field. A
crucial assumption is the supremum property: every nonempty subset of R which is
bounded from above has a supremum. It is this axiom which enables us to deduce the
Nested Interval Theorem, the Bolzano-Weierstrass Theorem, the Completeness Theorem, etc.

All these postulates become superfluous after the construction of R from Q. One
can prove the supremum property and then deduce all the other theorems; look up
“Cantor construction of real numbers” on the web for details.
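For reference, the supremum property can be stated in LaTeX notation as follows; the second display, our added example, shows why Q fails it:

    \[
      \emptyset \neq S \subset \mathbb{R} \ \text{bounded above}
      \ \Longrightarrow\ \sup S \ \text{exists in}\ \mathbb{R};
    \]
    \[
      S = \{\, q \in \mathbb{Q} : q^2 < 2 \,\} \ \text{is bounded above, yet}\
      \sup S = \sqrt{2} \notin \mathbb{Q}.
    \]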

Comments on Chapter 3. There are two popular constructions of the real number
system, Dedekind cuts and Cantor's Cauchy sequences. Although the number system is
fundamental in mathematics, we did not pay much attention to its rigorous construction.
It is too dry and lengthy to be included in Mathematical Analysis I. Indeed, there are two
sophisticated steps in the construction of the real numbers from scratch, namely, the construc-
tion of the natural numbers by Peano's axioms and the construction of the real numbers from
the rational numbers. The other steps are much easier. Cantor's construction of the irrationals
from the rationals is adapted to construct the completion of a metric space in Theorem
3.2. You may google the key words “Peano's axioms, Cantor's construction of the
real numbers, Dedekind cuts” for more.

The Contraction Mapping Principle, or Banach Fixed Point Theorem, was discovered by the
Polish mathematician S. Banach (1892-1945) in his 1922 doctoral thesis. He is the founder
of functional analysis and operator theory. According to P. Lax, “During the Second
World War, Banach was one of a group of people whose bodies were used by the Nazi
occupiers of Poland to breed lice, in an attempt to extract an anti-typhoid serum. He
died shortly after the conclusion of the war.” The interested reader should look up his
biography at Wiki.
An equally famous fixed point theorem is Brouwer's Fixed Point Theorem. It states
that every continuous map from a closed ball in Rn to itself admits at least one fixed
point. Here it is not the map but the geometry, or more precisely, the topology of the
ball that matters. You will learn it in a course on topology.
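In dimension one, Brouwer's theorem already follows from the intermediate value theorem applied to g(x) = f (x) − x, and a fixed point can even be located by bisection. A minimal Python sketch (our illustration; the function names are ours):

    import math

    def fixed_point(f, a=0.0, b=1.0, tol=1e-12):
        # For continuous f mapping [a, b] into itself, g(x) = f(x) - x has
        # g(a) >= 0 and g(b) <= 0, so bisecting on the sign of g finds a root.
        while b - a > tol:
            m = (a + b) / 2
            if f(m) - m >= 0:
                a = m
            else:
                b = m
        return (a + b) / 2

    print(fixed_point(math.cos))  # ~0.739085, the solution of cos(x) = x on [0, 1]

In higher dimensions no such elementary argument exists, which is why the topology of the ball enters.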

The Inverse and Implicit Function Theorems, which reduce complicated structures to sim-
pler ones via linearization, are among the most frequently used tools in the study of the local
behavior of maps. We learned these theorems and some of their applications in Advanced
Calculus I already. In view of this, we provide detailed proofs here but leave
out many standard applications. You may look up Fitzpatrick, “Advanced Calculus”, to
refresh your memory. By the way, the proof in that book does not use the Contraction
Mapping Principle.

The case of polar coordinates (see Example 3.8) shows that a locally invertible map
may not be globally invertible. A theorem of Hadamard asserts that a continuous, locally
bijective map F is globally bijective under an additional condition, namely, |F (x)| → ∞
whenever |x| → ∞. Incidentally, let us mention the celebrated Jacobian conjecture. Con-
sider a map F : Rn → Rn whose components Fj are polynomials in x1 , · · · , xn . Assume
that its Jacobian determinant is a nonzero constant. The conjecture asserts that this
map is globally bijective with an inverse that is also a polynomial map. Except for some special
cases, this conjecture is still open.
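A concrete instance of the local-versus-global distinction (our illustrative example, in LaTeX notation) is the plane map

    \[
      F(x, y) = (e^{x}\cos y,\; e^{x}\sin y), \qquad
      \det DF(x, y) = e^{2x} > 0,
    \]

which is locally invertible everywhere yet fails to be injective, since F (x, y + 2π) = F (x, y). Note also that |F (x, y)| → 0 as x → −∞, so Hadamard's condition is violated, consistent with his theorem.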

The Picard-Lindelöf Theorem, or the fundamental existence and uniqueness theorem of
differential equations, was mentioned in Ordinary Differential Equations, and now its proof
is discussed in detail. Of course, the contributors also include Cauchy and Lipschitz.
Further results without the Lipschitz condition can be found in Chapter 4. A classic text
on ordinary differential equations is “Theory of Ordinary Differential Equations” by E.A.
Coddington and N. Levinson. V.I. Arnold's “Ordinary Differential Equations” is also a
popular text.
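To recall the mechanism behind the theorem, here is a hedged numerical sketch of Picard iteration (ours; the grid-based scheme and all names are our illustration, not the proof given in Section 4) for y′ = y, y(0) = 1, whose solution is e^t:

    import numpy as np

    def picard(f, y0, T=1.0, N=1000, iters=20):
        # Iterates y_{k+1}(t) = y0 + integral_0^t f(s, y_k(s)) ds on a uniform
        # grid, approximating the integral by the cumulative trapezoid rule.
        t = np.linspace(0.0, T, N + 1)
        y = np.full_like(t, y0)  # the constant initial guess y_0(t) = y0
        dt = T / N
        for _ in range(iters):
            g = f(t, y)
            integral = np.concatenate(([0.0], np.cumsum((g[1:] + g[:-1]) * dt / 2)))
            y = y0 + integral
        return t, y

    t, y = picard(lambda t, y: y, 1.0)
    print(y[-1], np.e)  # ~2.71828 in both entries

Each iteration applies the integral operator whose fixed point is the solution; the Lipschitz condition on f is what makes this operator a contraction on a suitable interval.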