Notes on the inverse function theorem

Math 511, Spring 2018

Theorem 0.1 (Inverse Function Theorem). Suppose Ω ⊂ Rn is an open set
and that F : Ω → Rn is a continuously differentiable function on Ω. Suppose
further that F′(a) ∈ L(Rn) is an invertible transformation for some a ∈ Ω.
Denote b = F(a).

1. There exist open sets U, V ⊂ Rn such that a ∈ U, b ∈ V, F is one-to-one
   on U and F(U) = V. In other words the restriction of F to U, denoted
   F : U → V, defines a bijection.

2. If F −1 : V → U denotes the inverse map of the bijection above, then
   F −1 is continuously differentiable on V and (F −1)′(y) = (F′(x))−1
   where y = F(x).

Proof. We first observe that if A ∈ L(Rn) is invertible and b ∈ Rn is any
vector, then the mapping x ↦ Ax + b defines a bijection from Rn to itself
with inverse given by y ↦ A−1(y − b). Both mappings are the composition
of a linear transformation with a translation and hence continuous. Since
inverse images of open sets under continuous maps are open, we know

U is open ⇔ {Ax + b : x ∈ U} is open.

Consequently, it suffices to assume that a = b = 0, that is F(0) = 0, and that
F′(0) = I is the identity. For if this is not the case, we can apply the theorem
with these additional assumptions to F̃(x) := (F′(a))−1(F(x + a) − F(a)),
which is easily verified to be continuously differentiable on the open set
{x ∈ Rn : x + a ∈ Ω}. It is then an exercise to see that the open sets U, V
furnished by applying the theorem to F̃ yield open sets about a and b for
which the conclusion of the theorem holds for F.
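To see concretely that the reduction is harmless, note (a short verification,
spelled out here for convenience) that F̃ satisfies the normalized hypotheses:
by the chain rule,

F̃(0) = (F′(a))−1(F(a) − F(a)) = 0    and    F̃′(x) = (F′(a))−1 F′(x + a),

so in particular F̃′(0) = (F′(a))−1 F′(a) = I.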
From now on, we assume that a = b = 0 so that F(0) = 0 and that
F′(0) = I with 0 ∈ Ω. Define H : Ω → Rn by H(x) = x − F(x), so that
H(0) = 0 and H is verified to be continuously differentiable with H′(0) = 0.
By continuity, there exists r > 0 such that Nr(0) ⊂ Ω and ‖H′(x)‖ < 1/2 for
x ∈ Nr(0). Since Nr(0) is convex, Theorem 9.19 in Rudin implies that

|H(x) − H(x′)| ≤ (1/2)|x − x′|.                  (0.1)
Next, recalling that F(x) + H(x) = x, we have that

|x − x′| = |F(x) + H(x) − (F(x′) + H(x′))|
         ≤ |F(x) − F(x′)| + |H(x) − H(x′)|
         ≤ |F(x) − F(x′)| + (1/2)|x − x′|,

and by rearranging the inequality, we have that

|x − x′| ≤ 2|F(x) − F(x′)|.

This now shows that restricting F to Nr(0) yields an injective map, for if
F(x) = F(x′) with x, x′ ∈ Nr(0), then x = x′.
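As a concrete illustration (an example added here, not part of the notes),
take n = 1 and F(x) = x + x²/2, so F(0) = 0 and F′(0) = 1. Then
H(x) = x − F(x) = −x²/2 and H′(x) = −x, so ‖H′(x)‖ < 1/2 holds on Nr(0)
with r = 1/2, and the estimate above gives |x − x′| ≤ 2|F(x) − F(x′)| for
x, x′ ∈ N1/2(0).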
Next, we have to show that F is surjective near the origin. We thus
take y ∈ Nr/2 (0) and want to show that there exists x ∈ Nr (0) such that
F (x) = y. To this end, define G(x) := x + y − F (x) = H(x) + y and observe
that G has a fixed point in Nr (0) if and only if F (x) = y has a solution on
this set:

x = G(x) = x + y − F (x) ⇔ 0 = y − F (x) ⇔ F (x) = y.

But G is easily observed to be a contraction, since it is a translation of H:

|G(x) − G(x′)| = |H(x) − H(x′)| ≤ (1/2)|x − x′|.
Thus if we can show that G(N̄r(0)) ⊂ N̄r(0), that is, G : N̄r(0) → N̄r(0)
(where N̄r(0) denotes the closed ball of radius r about 0), then G is a
mapping from a complete metric space to itself, at which point the
contraction mapping fixed point theorem shows that G has a unique fixed
point in N̄r(0). Indeed, it can be verified that a closed subset of a complete
metric space defines a complete metric space on its own, or alternatively
that any compact metric space is complete. Suppose x ∈ N̄r(0); then
applying (0.1) with x′ = 0, we obtain

|G(x)| ≤ |H(x)| + |y| < (1/2)|x| + r/2 ≤ r/2 + r/2 = r,

hence G(N̄r(0)) ⊂ Nr(0), and the second inequality is indeed strict since
y ∈ Nr/2(0). Note that the latter point implies that if x = G(x), then
in fact |x| < r, so the equation y = F(x) is solved by some x ∈ Nr(0).
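The proof is constructive: the fixed point of G can be computed by simple
iteration. The following Python sketch (an illustration added here; the
example F, the point y, and the tolerance are arbitrary choices, not part of
the notes) iterates x ↦ x + y − F(x) to solve F(x) = y near the origin.

import numpy as np

def local_inverse(F, y, tol=1e-12, max_iter=200):
    # Iterate the contraction G(x) = x + y - F(x); assumes F(0) = 0,
    # F'(0) = I, and that y is small enough for G to map a closed ball
    # around 0 into itself with contraction constant 1/2.
    x = np.zeros_like(y)
    for _ in range(max_iter):
        x_new = x + y - F(x)
        if np.linalg.norm(x_new - x) < tol:
            return x_new
        x = x_new
    return x

# One-dimensional example: F(x) = x + x**2/2, so F(0) = 0 and F'(0) = 1.
F = lambda x: x + x**2 / 2
y = np.array([0.1])
x = local_inverse(F, y)
print(x, F(x))   # F(x) agrees with y up to the tolerance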
The first part of the theorem is concluded by setting V = Nr/2 (0) and
U = Nr (0) ∩ F −1 (Nr/2 (0)), which define open sets since F is continuous.
Moreover, the properties established above ensure that F : U → V is a
bijection.
We now prove the second half of the theorem, that F −1 : V → U is
continuously differentiable. Note that since ‖F′(x) − I‖ = ‖H′(x)‖ < 1/2
on U, F′(x) is invertible for x ∈ U by Theorem 9.8(a) in Rudin. Here
it is sufficient to show that if y ∈ V, then (F −1)′(y) exists and is equal
to (F′(x))−1, where y = F(x). Indeed, as soon as we establish that F −1 is
differentiable on V, then we know that x = F −1(y) defines x as a continuous
function of y, so by continuity of inversion (Theorem 9.8(b)), y ↦
(F′(F −1(y)))−1 is a composition of continuous maps, which shows that
F −1 is continuously differentiable. Alternatively, this can be seen by using
matrices: since the entries of (F′(x))−1 are rational functions of the entries
of F′(x) and the partial derivatives Dj fi(x) are continuous functions, again
there is continuous dependence of the entries of (F −1)′(y) on y.
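For example (a standard fact recorded here for concreteness, not taken from
the notes), in the 2 × 2 case the inverse of [ a b ; c d ] is
(ad − bc)−1 [ d −b ; −c a ] (rows separated by semicolons), so each entry of
the inverse is a rational function of a, b, c, d with denominator the
determinant; in general this follows from the cofactor formula
A−1 = (det A)−1 adj(A).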
Recall that if R(h) := F(x + h) − F(x) − F′(x)h, then |R(h)| = o(|h|) as
h → 0. However, here we want to define h as a function of k by the relation

h = F −1(y + k) − F −1(y) = F −1(y + k) − x,

which is a well defined injection for k such that y + k ∈ V. Hence
x + h = F −1(y + k), or equivalently,

F(x + h) = y + k = F(x) + k.

We now return to the function G(x) = x + y − F(x) defined above. Recall
that it is a contraction with constant 1/2. Hence, since

G(x + h) − G(x) = x + h + y − F(x + h) − (x + y − F(x)) = h − k,

we have that

|h − k| = |G(x + h) − G(x)| ≤ (1/2)|h|.

Hence

|h| ≤ |k| + |h − k| ≤ |k| + (1/2)|h|,

or equivalently, |h| ≤ 2|k|. This shows that h → 0 as k → 0 and that, when
h ≠ 0, 1/|k| ≤ 2/|h|.

We now conclude by considering

F −1(y + k) − F −1(y) − (F′(x))−1 k = h − (F′(x))−1 k
                                    = −(F′(x))−1 (k − F′(x)h)
                                    = −(F′(x))−1 (F(x + h) − F(x) − F′(x)h).

Hence

|F −1(y + k) − F −1(y) − (F′(x))−1 k| / |k|
        ≤ 2 ‖(F′(x))−1‖ · |F(x + h) − F(x) − F′(x)h| / |h|,

and since h → 0 as k → 0, the right hand side of this inequality tends to 0
as k → 0. This shows that F −1 is differentiable at y with
(F −1)′(y) = (F′(x))−1, which completes the proof.
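As a sanity check of the formula (F −1)′(y) = (F′(x))−1 (an example added
here, continuing F(x) = x + x²/2 from above): solving y = x + x²/2 for the
root near 0 gives F −1(y) = √(1 + 2y) − 1, and hence

(F −1)′(y) = 1/√(1 + 2y) = 1/(1 + x) = (F′(x))−1,

since 1 + x = √(1 + 2y) when x = F −1(y).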

Theorem 0.2 (Implicit Function Theorem). Suppose Ω ⊂ Rn+m is an open


set and that F : Ω → Rn is continuously differentiable. Let (x, y) denote
coordinates in Rn+m so that x ∈ Rn, y ∈ Rm, and write the Jacobian matrix
F′(x, y) in block form as F′(x, y) = [ ∂F/∂x  ∂F/∂y ], where

∂F/∂x = [ ∂f1/∂x1  · · ·  ∂f1/∂xn ]
        [     ⋮       ⋱        ⋮   ]
        [ ∂fn/∂x1  · · ·  ∂fn/∂xn ]

∂F/∂y = [ ∂f1/∂y1  · · ·  ∂f1/∂ym ]
        [     ⋮       ⋱        ⋮   ]
        [ ∂fn/∂y1  · · ·  ∂fn/∂ym ]

so that ∂F/∂x, ∂F/∂y are n × n and n × m matrices respectively. If ∂F/∂x
defines an invertible transformation in L(Rn) at the point (a, b), where
F(a, b) = 0, then there exist neighborhoods V0, W0 with a ∈ V0 ⊂ Rn and
b ∈ W0 ⊂ Rm and a continuously differentiable mapping G : W0 → V0 with
the property that F(x, y) = 0 for (x, y) ∈ V0 × W0 if and only if x = G(y).
In other words, F −1(0) ∩ (V0 × W0) is the graph of G,

F −1(0) ∩ (V0 × W0) = {(G(y), y) : y ∈ W0}.
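A standard illustration (added here, not part of the notes): take n = m = 1
and F(x, y) = x² + y² − 1 with (a, b) = (1, 0), so that F(a, b) = 0 and
∂F/∂x(a, b) = 2 ≠ 0. The theorem then produces G(y) = √(1 − y²) on a
neighborhood W0 of 0: near (1, 0), the zero set of F is exactly the graph
x = G(y).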

Proof. Define H : Ω → Rn+m by H(x, y) = (F(x, y), y) so that H is
continuously differentiable and the (n + m) × (n + m) Jacobian matrix
H′(a, b) in block form is

H′(a, b) = [ ∂F/∂x(a, b)   ∂F/∂y(a, b) ]
           [ 0m×n          Im×m        ],                  (0.2)

where 0m×n is an m × n matrix of all zeros and Im×m denotes the m × m
identity matrix. Thus by taking determinants, det(H′(a, b)) =
det(∂F/∂x(a, b)) ≠ 0, which shows that H′(a, b) is invertible. Alternatively
we can check that H′(a, b) is invertible by taking any vector (h, k) ∈ Rn+m
in the null space of H′(a, b) and verifying that (h, k) = (0, 0). Indeed,
conflating the matrices ∂F/∂x, ∂F/∂y with the linear transformations they
define, we have

(0, 0) = H′(a, b)(h, k) = ( ∂F/∂x(a, b)h + ∂F/∂y(a, b)k, k ),

and hence k = 0 by matching the Rm entries; inserting this into the Rn
entries then yields 0 = ∂F/∂x(a, b)h and hence h = 0.
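Either argument reflects the general fact (recorded here for convenience,
not in the notes) that a block upper triangular matrix with invertible upper
left block is invertible, with

[ A  B ]−1     [ A−1   −A−1 B ]
[ 0  I ]    =  [  0       I   ],

applied with A = ∂F/∂x(a, b) and B = ∂F/∂y(a, b).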
The inverse function theorem now furnishes neighborhoods U, W with
(a, b) ∈ U and (0, b) ∈ W such that H : U → W is a bijection with
continuously differentiable inverse. Shrinking U if necessary, we may assume it has
the product structure V0 × V1 where V0 ⊂ Rn , V1 ⊂ Rm are open in their
respective spaces.
We now write H −1 (x, y) = (A(x, y), B(x, y)) where A : W → V0 and
B : W → V1 . Hence
(x, y) = H(H −1 (x, y)) = H(A(x, y), B(x, y))
= (F (A(x, y), B(x, y)), B(x, y)) .
Identifying both sides of the Rm identities here, we obtain B(x, y) = y and
inserting this into the Rn identities, we obtain that
x = F(A(x, y), y).
We now simply define G(y) := A(0, y) and W0 := {y ∈ Rm : (0, y) ∈ W }∩V1
so that G : W0 → V0 . It is verified that W0 is open. Thus
(x, y) ∈ F −1 (0) ∩ V0 × W0 ⇔ H(x, y) = (0, y) and (x, y) ∈ V0 × W0
and by the definitions above, the latter is equivalent to (x, y) = H −1 (0, y) =
(G(y), y) when (x, y) ∈ V0 × W0 .
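The construction is again effective: for fixed y near b, the point x = G(y)
can be computed by solving F(x, y) = 0 in x, for instance by Newton's
method using the invertible block ∂F/∂x. A minimal Python sketch (the
example F, the starting point, and the tolerance are choices made here, not
taken from the notes):

import numpy as np

def implicit_G(F, dFdx, y, x0, tol=1e-12, max_iter=50):
    # Solve F(x, y) = 0 for x near x0 with y held fixed, using Newton's
    # method; each step solves a linear system with the block dF/dx.
    x = np.array(x0, dtype=float)
    for _ in range(max_iter):
        step = np.linalg.solve(dFdx(x, y), F(x, y))
        x = x - step
        if np.linalg.norm(step) < tol:
            break
    return x

# Example: F(x, y) = x**2 + y**2 - 1 near (a, b) = (1, 0), where G(y) = sqrt(1 - y**2).
F = lambda x, y: np.array([x[0]**2 + y**2 - 1.0])
dFdx = lambda x, y: np.array([[2.0 * x[0]]])
print(implicit_G(F, dFdx, y=0.3, x0=[1.0]))   # approximately [0.9539]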

Theorem 0.3 (Rank Theorem). Suppose Ω1 ⊂ Rn and Ω2 ⊂ Rm are open


subsets of their respective spaces and that F : Ω1 → Ω2 is a continuously
differentiable map. Suppose further that F′(z) has constant rank k for every
z ∈ Ω1. Then given z0 ∈ Ω1, there exist neighborhoods U, V of z0, F(z0)
respectively and bijections ϕ : U → ϕ(U ), ψ : V → ψ(V ) such that both
ϕ, ψ and their inverses are continuously differentiable with the property that
F (U ) ⊂ V and
ψ ◦ F ◦ ϕ−1 (x1 , . . . , xk , xk+1 , . . . , xn ) = (x1 , . . . , xk , 0, . . . , 0)


where in the last expression, the last m − k entries vanish.

Proof. Similar to the proof of the inverse function theorem, we may assume
that z0 = 0 and F (z0 ) = 0 as any needed translations can be absorbed in
ϕ, ψ. Moreover,

F′(0) = [ D1 f1(0)  · · ·  Dn f1(0) ]
        [     ⋮       ⋱        ⋮   ]
        [ D1 fm(0)  · · ·  Dn fm(0) ]

has some k × k minor with nonvanishing determinant. By permuting coor-


dinates in Rn and Rm, we may assume that this minor is in the upper left
corner, that is,

det [ D1 f1(0)  · · ·  Dk f1(0) ]
    [     ⋮       ⋱        ⋮   ]  ≠ 0.                  (0.3)
    [ D1 fk(0)  · · ·  Dk fk(0) ]
Indeed, as before, such permutations can be absorbed into ϕ, ψ. It is thus
natural to denote coordinates (x, y) ∈ Rn where x ∈ Rk , y ∈ Rn−k and
similarly (v, w) ∈ Rm where v ∈ Rk , w ∈ Rm−k . We now write

F (x, y) = (Q(x, y), R(x, y))

where Q : Ω1 → Rk and R : Ω1 → Rm−k are both continuously differentiable
maps.
Now define ϕ : Ω1 → Rn by ϕ(x, y) = (Q(x, y), y). Using notation similar
to (0.2) in the proof of the implicit function theorem, ϕ′(0, 0) is invertible
since

det ϕ′(0, 0) = det [ ∂Q/∂x(0, 0)   ∂Q/∂y(0, 0)  ]  = det( ∂Q/∂x(0, 0) ) ≠ 0,
                   [ 0(n−k)×k      I(n−k)×(n−k) ]

the last determinant being nonzero because it is exactly (0.3). The inverse
function theorem now furnishes open sets U, Ũ containing 0 such that
ϕ : U → Ũ is a bijection with ϕ, ϕ−1 both continuously differentiable.
Shrinking Ũ if necessary, we may assume that it is convex.
We now write

ϕ−1(x, y) = (A(x, y), B(x, y)),      A : Ũ → Rk ,  B : Ũ → Rn−k ,

with A, B continuously differentiable. Observe that by the definition of ϕ,

(x, y) = ϕ(A(x, y), B(x, y)) = ( Q(A(x, y), B(x, y)), B(x, y) ).

Thus by matching entries, we have that B(x, y) = y which implies that

ϕ−1 (x, y) = (A(x, y), y) and x = Q(A(x, y), y).

We may now write, for some Rm−k -valued function R̃ that is continuously
differentiable on Ũ,

F ◦ ϕ−1(x, y) = F(A(x, y), y) = (Q(A(x, y), y), R(A(x, y), y)) = (x, R̃(x, y)).

Hence

(F ◦ ϕ−1)′(x, y) = [ Ik×k           0k×(n−k)     ]
                   [ ∂R̃/∂x(x, y)   ∂R̃/∂y(x, y) ].                  (0.4)

But by the chain rule, for (x, y) ∈ Ũ, (F ◦ ϕ−1)′(x, y) has rank k. Indeed,

(F ◦ ϕ−1)′(x, y) = F′(ϕ−1(x, y)) (ϕ−1)′(x, y),

and since F′(z) has constant rank k and (ϕ−1)′(x, y) is invertible, this is the
composition of a rank k map with an invertible map, which yields a rank
k linear mapping. But given (0.4), the first k columns of (F ◦ ϕ−1)′(x, y)
are linearly independent, which means that ∂R̃/∂y = 0 on Ũ since otherwise
the matrix would have rank larger than k. But since Ũ is convex, we have
that R̃ is independent of y (cf. Theorem 9.19 and its corollary), that is,
R̃(x, y) = S(x) for some continuously differentiable function S defined on
the open set
Ṽ := {x ∈ Rk : (x, 0) ∈ Ũ }.
This now shows that (F ◦ ϕ−1 )(x, y) = (x, S(x)).
We finally define for v ∈ Ṽ , w ∈ Rm−k , ψ(v, w) = (v, w − S(v)). This de-
fines a continuously differentiable bijection with explicit inverse ψ −1 (s, t) =
(s, t + S(s)), which satisfies

(ψ ◦ F ◦ ϕ−1 )(x, y) = ψ(x, S(x)) = (x, S(x) − S(x)) = (x, 0),

where the latter entries in the last two expressions are in Rm−k . Defining
V to be the open set V := {(v, w) ∈ Rm : v ∈ Ṽ }, the proof is now
concluded.
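A small example (added for illustration, not part of the notes): take
n = m = 2 and F(x1, x2) = (x1, x1²), so that F′(x1, x2) = [ 1 0 ; 2x1 0 ] has
constant rank k = 1. Following the proof, Q(x, y) = x and R(x, y) = x², so
ϕ is the identity, R̃(x, y) = x² = S(x), and ψ(v, w) = (v, w − v²). Indeed,
(ψ ◦ F ◦ ϕ−1)(x1, x2) = ψ(x1, x1²) = (x1, 0), which is the asserted normal
form.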
