Analysis Distribution TH Lectures

This document contains lecture notes on real analysis focusing on functions of several variables. Key points: 1) It defines vector spaces, norms, metrics, and topologies on Rn. Rn is viewed as an n-dimensional vector space and equipped with the standard Euclidean norm and metric. 2) It covers continuity and differentiability of functions of several variables. A function f: Rn → Rm is differentiable at x if its change can be approximated by the linear map Df(x). 3) The chain rule is proved for the composition of differentiable functions. The differential of a composition equals the product of the individual differentials.


v: 2019-01-15

REAL ANALYSIS LECTURE NOTES

RASUL SHAFIKOV

1. Functions of several variables: differentiation


1.1. Vector Space Rn . We view Rn as an n-dimensional vector space over the field of real numbers
with the usual addition of vectors and multiplication by scalars. The scalar or dot product of two
vectors x = (x1 , . . . , xn ) and y = (y1 , . . . , yn ) is defined as

(1) x · y = ∑_{i=1}^n xi yi .

Together with this dot product Rn forms an n-dimensional Euclidean space. The norm of a vector
is then defined as

|x| = √(x · x).
This norm satisfies the following three properties:
(i) |x| ≥ 0, |x| = 0 iff x = 0;
(ii) |cx| = |c||x|, for all x ∈ Rn and c ∈ R;
(iii) |x + y| ≤ |x| + |y|, for all x, y ∈ Rn .
A vector space with a norm satisfying the above three properties is called a normed space. The
norm also induces a metric on Rn given by

d(x, y) = |x − y| = ( ∑_{i=1}^n (xi − yi )² )^{1/2} ,

which is, of course, the standard Euclidean distance in Rn . One can verify that this metric satisfies
all three required properties: (i) d(x, y) ≥ 0, and d(x, y) = 0 iff x = y; (ii) d(x, y) = d(y, x); (iii)
d(x, y) ≤ d(x, z) + d(z, y) for any x, y, z ∈ Rn . To prove property (iii) of the norm and of the
metric in Rn one can use the so-called Minkowski inequality:
( ∑_{i=1}^n (ai + bi )^k )^{1/k} ≤ ( ∑_{i=1}^n ai^k )^{1/k} + ( ∑_{i=1}^n bi^k )^{1/k} ,
where ai , bi ≥ 0, and k > 1. In fact, the Minkowski inequality can be deduced from the Hölder
inequality:

∑_{i=1}^n ai bi ≤ ( ∑_{i=1}^n ai^p )^{1/p} · ( ∑_{i=1}^n bi^q )^{1/q} ,

where ai , bi ≥ 0, p, q > 1, 1/p + 1/q = 1. Note that when p = q = 2 the Hölder inequality can be
written in the form a · b ≤ |a| · |b| (the Cauchy–Schwarz inequality).
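Both inequalities are easy to test numerically. The following Python sketch (the vectors and exponents are chosen arbitrarily for illustration) checks the Hölder inequality and the Minkowski inequality (with k = p) for random nonnegative vectors.

```python
# Numerical sanity check of the Hölder and Minkowski inequalities
# for randomly chosen nonnegative vectors (a sketch, not a proof).
import random

random.seed(0)
n, p = 10, 3.0
q = p / (p - 1)          # conjugate exponent: 1/p + 1/q = 1
a = [random.random() for _ in range(n)]
b = [random.random() for _ in range(n)]

lhs_holder = sum(ai * bi for ai, bi in zip(a, b))
rhs_holder = sum(ai**p for ai in a)**(1/p) * sum(bi**q for bi in b)**(1/q)

lhs_mink = sum((ai + bi)**p for ai, bi in zip(a, b))**(1/p)
rhs_mink = sum(ai**p for ai in a)**(1/p) + sum(bi**p for bi in b)**(1/p)

assert lhs_holder <= rhs_holder
assert lhs_mink <= rhs_mink
```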
The topology on Rn is induced by the metric: a set Ω ⊂ Rn is called open if every point x ∈ Ω
is contained in Ω together with a small ball
B(x, ε) = {y ∈ Rn : |x − y| < ε}, ε > 0.
This topology gives Rn the structure of a complete metric space, i.e., every Cauchy sequence with
respect to the metric converges to an element of the space. Further, (Rn , | · |) is a Banach space,

i.e., a complete normed space. A Banach space is called a Hilbert space if its norm comes from a
scalar product. Thus, Rn is a Hilbert space with the scalar product defined by (1).
A map A : Rn → Rm is called linear if A(ax + by) = aA(x) + bA(y) for all x, y ∈ Rn and a, b ∈ R.
A linear map can be identified with an m × n matrix with real coefficients. We define the norm of a
linear map as

||A|| = sup{ |Ax| : x ∈ Rn , |x| ≤ 1 }.
It follows immediately from the definition that |Ah| ≤ ||A|| · |h| for all h ∈ Rn .
A map f : Rn → Rm is called affine if f (x) = Ax + B, where A is a linear map, and B is a
constant vector.
1.2. Continuity. A domain Ω ⊂ Rn is a connected open set. Given a function f : Ω → R and a
point x0 ∈ Ω, we say that f is continuous at x0 if for any ε > 0 there exists δ > 0 such that
|f (x) − f (x0 )| < ε whenever |x − x0 | < δ.
Theorem 1.1. A function f is continuous at x0 if and only if limj→∞ f (xj ) = f (x0 ) for any
sequence of points (xj ) → x0 .
The proof of ⇒ follows from the definition of continuity. To prove the converse, formulate the
negation of continuity of a function and derive a contradiction with the assumption.
Example 1.1. The function f (x, y) = xy/(x² + y²) does not have a limit as (x, y) → 0, and thus
does not admit a continuous extension to the origin. On the other hand, the function
g(x, y) = x²y/(x² + y²) has limit equal to 0 as (x, y) → 0, which follows from the estimate

|x²y/(x² + y²)| = |xy/(x² + y²)| · |x| ≤ (1/2) |x|.

Hence, g becomes continuous at the origin after setting g(0) = 0. □
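The behaviour of f and g near the origin can also be seen numerically: f takes different values along different lines through 0, while g is small whenever |x| is small. A quick Python check (the approach paths are chosen for illustration):

```python
# f has different limits along different lines through 0 (so no limit),
# while |g| <= |x|/2 forces g -> 0; checked along two approach paths.
def f(x, y): return x * y / (x**2 + y**2)
def g(x, y): return x**2 * y / (x**2 + y**2)

t = 1e-8
along_diag = f(t, t)      # approach along y = x
along_axis = f(t, 0.0)    # approach along y = 0
assert abs(along_diag - 0.5) < 1e-12
assert abs(along_axis) < 1e-12        # two different limiting values: no limit exists
assert abs(g(t, t)) < 1e-6 and abs(g(t, -t)) < 1e-6
```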
Continuity of maps f : Rn → Rm is defined similarly.
1.3. Differentiability. Recall that for n = 1, a function f : R → R is called differentiable at a
point x if the limit

lim_{h→0} (f (x + h) − f (x))/h

exists. This implies that

f (x + h) − f (x) = f ′(x) · h + r(h),

where r(h) = o(h), i.e., r(h)/h → 0 as h → 0. Differentiability in higher dimensions is defined
similarly.
Definition 1.2. Let Ω ⊂ Rn be a domain, f : Ω → Rm be a map, x ∈ Ω. If there exists a linear
map A : Rn → Rm such that
(2) lim_{h→0} |f (x + h) − f (x) − Ah| / |h| = 0, h ∈ Rn ,
then we say that f is differentiable at x and we write Df (x) = f 0 (x) = A. If f is differentiable at
every point of Ω, then we say that f is differentiable on Ω. The map A is called the differential of
f at x, and the corresponding matrix is called the Jacobian matrix of f .
Theorem 1.3. If the above definition holds for A = A1 and A = A2 then A1 = A2 .
Proof. Let B = A1 − A2 . Then

|Bh| ≤ |f (x + h) − f (x) − A1 h| + |f (x + h) − f (x) − A2 h|.

Hence, by differentiability of f , we have |Bh|/|h| → 0 as h → 0. It is a straightforward exercise to
verify that for a linear map B this implies that B ≡ 0. □
Example 1.2. The Jacobian matrix of a linear map A : Rn → Rm coincides with the matrix that
represents the map A, i.e., A′(x) = A for any x ∈ Rn . □
Theorem 1.4 (The Chain Rule). Let Ω ⊂ Rn be a domain, and let f : Ω → Rm be a map
differentiable at a ∈ Ω. Suppose that g : f (Ω) → Rl is a map differentiable at f (a). Then the map
F = g ◦ f = g(f ) is differentiable at a and
F 0 (a) = g 0 (f (a)) · f 0 (a).
Note that the product in the above formula is just the matrix multiplication of the Jacobian
matrices g 0 and f 0 .

Proof. Let b = f (a). We set A = f 0 (a), B = g 0 (b), U (h) = f (a + h) − f (a) − Ah, and V (k) =
g(b + k) − g(b) − Bk, where h ∈ Rn and k ∈ Rm . Then
(3) ν(h) = |U (h)|/|h| → 0 as h → 0, µ(k) = |V (k)|/|k| → 0 as k → 0.
Given a vector h we set k = f (a + h) − f (a). Then
(4) |k| = |Ah + U (h)| ≤ (||A|| + ν(h)) |h|,
and
F (a + h) − F (a) − BAh = g(b + k) − g(b) − BAh = B(k − Ah) + V (k) = BU (h) + V (k).
Hence, (3) and (4) imply that for h ≠ 0,

|F (a + h) − F (a) − BAh| / |h| ≤ ||B|| ν(h) + (||A|| + ν(h)) µ(k).
Letting h → 0 we have ν(h) → 0, and k → 0 by (4), so µ(k) → 0. From this it follows that
F 0 (a) = BA as required. 
Example 1.3. Suppose that f : Rn → Rn is a map differentiable at a ∈ Rn such that in a
neighbourhood of f (a) the map f −1 is defined and differentiable. Then the composition f −1 ◦ f
is a differentiable map whose differential at a, by the Chain Rule, equals
(f −1 ◦ f )0 (a) = (f −1 )0 (f (a)) · f 0 (a).
On the other hand, the differential of the identity map is the identity, and we conclude that the
matrix corresponding to (f −1 )0 (f (a)) is the inverse matrix to that of f 0 (a). 
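This relation can be verified concretely. In the following Python sketch we take the invertible map f (x, y) = (eˣ, x + y), whose inverse f⁻¹(u, v) = (ln u, v − ln u) is explicit, and check that the Jacobian of f⁻¹ at f (a) is the matrix inverse of the Jacobian of f at a (the map and the point a are chosen for illustration only).

```python
# Example 1.3 checked numerically: for f(x, y) = (e^x, x + y) the Jacobian
# of f^{-1} at f(a), computed from the explicit inverse, equals the matrix
# inverse of the Jacobian of f at a.
import math

a = (0.5, 1.2)
ex = math.exp(a[0])

Df = [[ex, 0.0],
      [1.0, 1.0]]                      # Jacobian of f at a
# matrix inverse of the 2x2 matrix Df
det = Df[0][0] * Df[1][1] - Df[0][1] * Df[1][0]
Df_inv = [[ Df[1][1] / det, -Df[0][1] / det],
          [-Df[1][0] / det,  Df[0][0] / det]]

u, v = ex, a[0] + a[1]                 # the point f(a)
# f^{-1}(u, v) = (ln u, v - ln u), so its Jacobian at (u, v) is:
Dfinv_direct = [[ 1.0 / u, 0.0],
                [-1.0 / u, 1.0]]

for i in range(2):
    for j in range(2):
        assert abs(Df_inv[i][j] - Dfinv_direct[i][j]) < 1e-12
```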
Let e1 = (1, 0, . . . , 0), e2 = (0, 1, 0, . . . , 0), ... en = (0, . . . , 0, 1) be the standard basis in Rn , and
denote by {u1 , . . . , um } the standard basis in Rm . For a domain Ω ⊂ Rn the map f : Ω → Rm can
be written in the form

(5) f (x) = ∑_{i=1}^m fi (x) ui = (f1 (x), . . . , fm (x)),
where each fi : Ω → R is a function. For a function f : Ω → R the limit

(Dj f )(x) = Dxj f (x) = fxj (x) = ∂f /∂xj = lim_{t→0} (f (x + tej ) − f (x))/t,

if it exists, is called a partial derivative with respect to the variable xj . Unlike functions of one
variable, existence of partial derivatives does not imply in general that a function is differentiable:
for example, the function f (x, y) in Example 1.1 has partial derivatives everywhere with respect
to the variables x and y, but is not even continuous at the origin. Examples of continuous functions
that have partial derivatives but are not differentiable also exist.
Applying partial derivatives with respect to the variable xj to the components fi of a map
f : Ω → Rm we obtain a matrix (∂fi /∂xj ). As it turns out, if f is differentiable at a point x ∈ Ω,
then all partial derivatives exist.
Theorem 1.5. Suppose f : Ω → Rm is differentiable at x ∈ Ω. Then (Dj fi )(x) exist for all i, j
and
(6) f ′(x) ej = ∑_{i=1}^m (Dj fi )(x) ui = ( ∂f1 /∂xj , ∂f2 /∂xj , . . . , ∂fm /∂xj ).

Proof. Fix j. Since f is differentiable at x, f (x+tej )−f (x) = f 0 (x)(tej )+r(tej ), where r(tej ) = o(t).
Then by the linearity of f 0 (x),
lim_{t→0} (f (x + tej ) − f (x))/t = f ′(x) ej .

If now f is represented component-wise as in (5), then

lim_{t→0} ∑_{i=1}^m (fi (x + tej ) − fi (x))/t · ui = f ′(x) ej .
Thus, each coefficient in front of ui has a limit, which shows existence of the partial derivatives
of f and proves (6). 
It follows from the above theorem that the Jacobian matrix f ′(x) is the m × n matrix

Df (x) = f ′(x) = ( ∂fi /∂xj ), 1 ≤ i ≤ m, 1 ≤ j ≤ n,

whose i-th row is ( ∂fi /∂x1 , . . . , ∂fi /∂xn ).
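The columns of the Jacobian matrix are exactly the limits f ′(x)ej from Theorem 1.5, so they can be approximated by difference quotients. A Python sketch with the sample map f (x, y) = (x²y, sin x + y) (the point and step size are arbitrary):

```python
# The Jacobian matrix of f(x, y) = (x^2 y, sin x + y), approximated
# column by column via the difference quotient (f(x + t e_j) - f(x)) / t.
import math

def f(x, y):
    return (x**2 * y, math.sin(x) + y)

x, y, t = 0.7, -1.3, 1e-6
exact = [[2 * x * y, x**2],
         [math.cos(x), 1.0]]           # the Jacobian computed by hand

f0 = f(x, y)
col1 = [(f(x + t, y)[i] - f0[i]) / t for i in range(2)]  # d/dx column
col2 = [(f(x, y + t)[i] - f0[i]) / t for i in range(2)]  # d/dy column
approx = [[col1[0], col2[0]],
          [col1[1], col2[1]]]

for i in range(2):
    for j in range(2):
        assert abs(approx[i][j] - exact[i][j]) < 1e-4
```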

Example 1.4. Let I ⊂ R be an interval, and let γ : I → Rn be a differentiable map, γ =
(γ1 , . . . , γn ). Its image in Rn is called a (parametrized) curve. Its differential at a point t0 ∈ I is
a column, i.e., an n × 1 matrix, of the form

Dγ(t0 ) = ( dγ1 /dt (t0 ), . . . , dγn /dt (t0 ) )^T .
Note that we used the usual notation for the derivative because each component of γ is a function
of one variable t. (Here T denotes the transposition of a matrix.) The curve γ is called smooth if
the vector Dγ(t) 6= 0 for all t ∈ I.
Let f : Rn → R be a differentiable function. Then its differential is the 1 × n matrix

Df = ( ∂f /∂x1 , . . . , ∂f /∂xn ).
The composition g = f ◦ γ is a usual function of one variable. By the Chain Rule its
derivative can be computed as

(7) dg/dt (t) = ( ∂f /∂x1 , . . . , ∂f /∂xn ) · ( dγ1 /dt , . . . , dγn /dt )^T = ∑_{i=1}^n ∂f /∂xi (γ(t)) · dγi /dt (t).

The above example has an important generalization. For a differentiable function f , define ∇f ,
called the gradient of f , to be the vector given by

∇f (x) = ( ∂f /∂x1 , . . . , ∂f /∂xn ) = ∑_{i=1}^n (Di f )(x) ei .
Then (7) can be written in the form g ′(t) = ∇f (γ(t)) · γ ′(t), where the dot indicates the dot product
in Rn . Let u now be a unit vector, and let γ(t) = x + tu be the line in the direction of u. Then
γ ′(t) = u for all t, and so g ′(0) = ∇f (x) · u. On the other hand, g(t) − g(0) = f (x + tu) − f (x),
hence,

lim_{t→0} (f (x + tu) − f (x))/t = ∇f (x) · u.

This is called the directional derivative of f at x in the direction of the vector u, denoted sometimes
by Du f (x) or ∂f /∂u. For a fixed f and x it is clear that the directional derivative attains its
maximum if u is a positive multiple of ∇f . So ∇f gives the direction of the fastest growth of the
function f .
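The formula Du f (x) = ∇f (x) · u is easy to test numerically; here with the sample function f (x, y) = x² + 3xy at the point (1, 2) and the unit vector u = (3/5, 4/5), both chosen for illustration:

```python
# Directional derivative of f(x, y) = x^2 + 3xy at (1, 2) in the direction
# u = (3/5, 4/5): the difference quotient should approach the dot product
# grad f . u = (8, 3) . (0.6, 0.8) = 7.2.
def f(x, y): return x**2 + 3 * x * y

x0, y0 = 1.0, 2.0
u = (0.6, 0.8)                        # unit vector
grad = (2 * x0 + 3 * y0, 3 * x0)      # gradient (8, 3), computed by hand
expected = grad[0] * u[0] + grad[1] * u[1]

t = 1e-7
quotient = (f(x0 + t * u[0], y0 + t * u[1]) - f(x0, y0)) / t
assert abs(quotient - expected) < 1e-5
assert abs(expected - 7.2) < 1e-12
```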
Theorem 1.6. Let Ω be a domain in Rn , and f : Ω → R. If f has partial derivatives ∂f /∂xj on Ω
which are continuous at a point a, for j = 1, . . . , n, then f is differentiable at a.
Proof. For simplicity of notation we assume that Ω ⊂ R²; the proof in the general case is the same.
We need to show that there exists a linear map A : R² → R such that (2) holds. The clear choice
for A is the matrix (D1 f, D2 f ). Let a = (a1 , a2 ). For a fixed h = (h1 , h2 ) we have
∆f = f (a + h) − f (a) = [f (a + h) − f (a1 , a2 + h2 )] + [f (a1 , a2 + h2 ) − f (a)].
We apply the Mean Value theorem to the two expressions on the right to obtain

∆f = h1 ∂f /∂x1 (a1 + θ1 h1 , a2 + h2 ) + h2 ∂f /∂x2 (a1 , a2 + θ2 h2 )

for some numbers θi ∈ (0, 1). Hence,

∆f = h1 ∂f /∂x1 (a) + h2 ∂f /∂x2 (a) + ε(h),
where

ε(h) = h1 [ ∂f /∂x1 (a1 + θ1 h1 , a2 + h2 ) − ∂f /∂x1 (a) ] + h2 [ ∂f /∂x2 (a1 , a2 + θ2 h2 ) − ∂f /∂x2 (a) ].
By continuity of the partial derivatives we obtain

|∆f − D1 f (a)h1 − D2 f (a)h2 | / |h| = |ε(h)| / |h| → 0 as h → 0,
which is the required statement. 
Definition 1.7. A map f : Ω → Rm is called continuously differentiable, or of class C 1 (Ω), if f 0 (x)
is a continuous function on Ω, i.e., for every ε > 0 there exists δ > 0 such that ||f 0 (y) − f 0 (x)|| < ε
whenever |x − y| < δ.
Theorem 1.8. Let Ω ⊂ Rn be a domain and f : Ω → Rm . Then f ∈ C 1 (Ω) if and only if all
partial derivatives exist and are continuous on Ω.
Proof. Suppose that f ∈ C 1 (Ω). Then

(Dj fi )(x) = [f ′(x) ej ] · ui

for all i, j and x ∈ Ω (we continue to use the notation {ui } for the standard basis in the target
space). Then

(Dj fi )(y) − (Dj fi )(x) = [ (f ′(y) − f ′(x)) ej ] · ui .

Since |ui | = |ej | = 1, we have

|(Dj fi )(y) − (Dj fi )(x)| ≤ ||f ′(y) − f ′(x)||,

which shows continuity of Dj fi .
For the proof of the theorem in the other direction it suffices to consider the case m = 1, i.e.,
when f is a function. This can be proved using Theorem 1.6, which is left as an exercise for the
reader. 
Suppose that f : Ω → R has partial derivatives D1 f, . . . , Dn f . If the functions Dj f are
themselves differentiable, then the second-order partial derivatives of f are defined by

Dij f = Di Dj f = ∂/∂xi ( ∂f /∂xj ) = ∂²f /∂xi ∂xj (i, j = 1, . . . , n).
If all functions Dij f are continuous on Ω, we say that f is of class C 2 (Ω) and write f ∈ C 2 (Ω). In
the same way we define derivatives of any order and functions of class C k (Ω), k = 1, 2, . . . , ∞. A
map f is of class C k (Ω) if every component of f is of class C k (Ω).
It may happen that Dij f 6= Dji f at some point where both derivatives exist. But if both
derivatives are continuous, the following holds.
Theorem 1.9. Suppose f is defined on Ω ⊂ Rn , and D1 f , D2 f , D21 f , and D12 f exist at every
point of Ω, and D21 f, D12 f are continuous at some point a ∈ Ω. Then
(D12 f )(a) = (D21 f )(a).
In particular, D12 f = D21 f for f ∈ C 2 (Ω).
Proof. For simplicity assume that n = 2, as the proof for a general n is the same. Let a = (a1 , a2 ).
Consider the expression

∆ = [ f (a1 + h, a2 + k) − f (a1 + h, a2 ) − f (a1 , a2 + k) + f (a) ] / (hk),

where h, k are nonzero, say positive, and sufficiently small. The auxiliary function

φ(x1 ) = [ f (x1 , a2 + k) − f (x1 , a2 ) ] / k

is, by the assumptions of the theorem, differentiable on the interval [a1 , a1 + h] with

φ′(x1 ) = [ D1 f (x1 , a2 + k) − D1 f (x1 , a2 ) ] / k,
in particular, it is continuous. Then ∆ can be written in the form

∆ = [ φ(a1 + h) − φ(a1 ) ] / h.

By the Mean Value theorem applied to φ(x1 ) on [a1 , a1 + h] we get for some 0 < θ < 1,

∆ = φ′(a1 + θh) = [ D1 f (a1 + θh, a2 + k) − D1 f (a1 + θh, a2 ) ] / k.
Since D21 f exists, we apply the Mean Value theorem to D1 f (a1 + θh, x2 ) on the interval [a2 , a2 + k]
to get for some 0 < θ1 < 1,
(8) ∆ = D21 f (a1 + θh, a2 + θ1 k).
By the symmetry in ∆ we can interchange the variables in the auxiliary function by considering

ψ(x2 ) = [ f (a1 + h, x2 ) − f (a1 , x2 ) ] / h,

and obtain analogously that for some 0 < θ2 , θ3 < 1,
(9) ∆ = D12 f (a1 + θ2 h, a2 + θ3 k).
Comparing (8) and (9) we conclude that
D21 f (a1 + θh, a2 + θ1 k) = D12 f (a1 + θ2 h, a2 + θ3 k).
Letting h, k → 0 and using the continuity of the second-order derivatives, the result follows. □
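For a concrete C² function, the symmetric second difference quotient ∆ from the proof can be computed directly and compared with the common value D12 f = D21 f. A Python sketch with f (x, y) = x³y² (the point and step sizes are arbitrary):

```python
# Symmetry of second mixed partials for the C^2 function f(x, y) = x^3 y^2:
# both D12 f and D21 f equal 6 x^2 y, and the second difference quotient
# Delta from the proof approaches that common value.
def f(x, y): return x**3 * y**2

a1, a2 = 1.1, 0.4
h = k = 1e-4
delta = (f(a1 + h, a2 + k) - f(a1 + h, a2)
         - f(a1, a2 + k) + f(a1, a2)) / (h * k)
exact = 6 * a1**2 * a2                # D12 f = D21 f = 6 x^2 y, by hand
assert abs(delta - exact) < 1e-2
```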
A definition similar to Definition 1.2 also works for general Banach spaces of arbitrary dimension.
We say that a map f : V → W between two Banach spaces is differentiable at a point a ∈ V if there
exists a continuous linear map (operator) A := Df (a) : V → W such that

lim_{h→0} ||f (a + h) − f (a) − Ah|| / ||h|| = 0.
Note that the requirement is that the map A is linear and continuous which is essential for infinite
dimensional spaces. The map Df is also called the Fréchet derivative of the map f . Proofs of
Theorem 1.3 and the Chain Rule given in this section can be adjusted for this more general setting.
If the derivative of f exists at every point of V , then Df becomes the map

Df : V → B(V, W ), x ↦ Df (x).
Here B(V, W ) denotes the space of continuous linear operators from V to W , a Banach space itself.
The map f is called continuously differentiable if Df is continuous. Note that this is not the same
as saying that the map Df (x) is continuous for every x; the latter is part of the definition of
differentiability of f at x. From this, higher order derivatives are defined inductively.
v: 2019-01-17
2. Inverse Function Theorem and Friends.


2.1. Inverse Function theorem.
Lemma 2.1 (Contraction Lemma). Let (X, d) be a complete metric space, and φ : X → X a
contraction, i.e., a map satisfying for some c < 1,
d(φ(x), φ(y)) ≤ c d(x, y), x, y ∈ X.
Then there exists a unique fixed point p of φ, i.e., p ∈ X such that φ(p) = p.
Proof. Pick any x0 ∈ X, and define {xn } inductively by setting xn+1 = φ(xn ), n = 0, 1, . . . . Then
for n > 0 we have
d(xn+1 , xn ) = d(φ(xn ), φ(xn−1 )) ≤ c d(xn , xn−1 ).
This gives the following relation:

d(xn+1 , xn ) ≤ c^n d(x1 , x0 ), n = 0, 1, 2, . . .

If n < m, then

d(xn , xm ) ≤ ∑_{i=n+1}^m d(xi , xi−1 ) ≤ (c^n + c^{n+1} + · · · + c^{m−1} ) d(x1 , x0 ) ≤ (c^n /(1 − c)) d(x1 , x0 ).

Thus, {xn } is a Cauchy sequence, which converges to some point p by completeness of X. Since φ
is a contraction, it is continuous, and φ(p) = limn→∞ φ(xn ) = limn→∞ xn+1 = p.
The uniqueness of p is trivial. 
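The proof is constructive: iterating φ from any starting point converges to the fixed point. A Python illustration with the contraction φ(x) = cos x on [0, 1] (there |φ′(x)| = |sin x| ≤ sin 1 < 1, and cos maps [0, 1] into itself):

```python
# The iteration x_{n+1} = phi(x_n) from the proof, for the contraction
# phi(x) = cos x on [0, 1]; it converges to the unique fixed point p
# with cos p = p (the so-called Dottie number).
import math

x = 0.0
for _ in range(100):
    x = math.cos(x)

assert abs(math.cos(x) - x) < 1e-12   # x is (numerically) a fixed point
assert abs(x - 0.7390851332151607) < 1e-9
```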
Definition 2.2. A map f : Rn → Rm is called Lipschitz continuous on Ω ⊂ Rn if there is a
constant C > 0 such that
|f (x) − f (y)| ≤ C|x − y|, x, y ∈ Ω.
Such C is called a Lipschitz constant for f .
Lemma 2.3. Let Ω ⊂ Rn be a domain and f : Ω → Rm be a map of class C 1 (Ω). Then f is
Lipschitz continuous on any compact convex subset B ⊂ Ω.
Proof. Let M = supx∈B ||Df (x)||. Let a, b ∈ B. Since B is convex, the straight line segment
{x = a + t(b − a), t ∈ [0, 1]}
connecting a and b is contained in B. By the Fundamental Theorem of Calculus and the Chain
Rule we have for each component of f ,

fi (b) − fi (a) = ∫_0^1 d/dt fi (a + t(b − a)) dt = ∫_0^1 Dfi (a + t(b − a)) (b − a) dt.

Hence,

|f (b) − f (a)|² = ∑_{i=1}^m |fi (b) − fi (a)|² ≤ ∑_{i=1}^m ( ∫_0^1 |Dfi (a + t(b − a))| |b − a| dt )² ≤ m (M |b − a|)².

From this the assertion follows. 


A C k -smooth map f : Ω → Ω′ between open sets in Rn is called a (C k -)diffeomorphism if
f −1 : Ω′ → Ω is well defined and C k -smooth. In general, the inverse of a smooth map, if it exists,
is not necessarily smooth (but it is always continuous!). For example, the function f (x) = x³ is
C ∞ -smooth on R and has a continuous inverse f −1 (x) = x^{1/3} . However, f −1 is not differentiable
at the origin (note that f ′(0) = 0). The situation is different if Df is invertible.


Theorem 2.4 (Inverse Function Theorem). Suppose U, V ⊂ Rn are open subsets, f : U → V is
of class C k (U ), and f ′(p) is nonsingular (invertible) for some p ∈ U . Then there exist connected
neighbourhoods U0 ⊂ U of p and V0 ⊂ V of f (p) such that f |U0 : U0 → V0 is a C k -diffeomorphism.
Proof. We may replace f with f1 (x) = f (x+p)−f (p). The map f1 is smooth and satisfies f1 (0) = 0
and Df (p) = Df1 (0). We may further replace f1 with f2 = Df1 (0)−1 ◦ f1 . The map f2 is smooth,
f2 (0) = 0, and Df2 (0) = Id, the identity map. Hence, we may assume that f is defined in a
neighbourhood U of the origin, f (0) = 0 and Df (0) = Id.
Set h(x) = x − f (x). Then Dh(0) = 0, and so for any ε > 0 there exists δ > 0 such that
||Dh(x)|| ≤ ε for x ∈ B(0, δ) = {x ∈ Rn : |x| ≤ δ}. By Lemma 2.3 we may choose δ > 0 such that

(1) |h(x′) − h(x)| ≤ (1/2) |x′ − x|, ∀ x, x′ ∈ B(0, δ).

Then |x′ − x| ≤ |f (x′) − f (x)| + |h(x′) − h(x)| ≤ |f (x′) − f (x)| + (1/2)|x′ − x|, and so

(2) |x′ − x| ≤ 2 |f (x′) − f (x)|, x, x′ ∈ B(0, δ).


This shows, in particular, that f is injective on B(0, δ). For an arbitrary y ∈ B(0, δ/2) we show
that there exists a unique x ∈ B(0, δ) such that f (x) = y. Let g(x) = y + h(x) = y + x − f (x), so
g(x) = x if and only if f (x) = y. If |x| ≤ δ, then

(3) |g(x)| ≤ |y| + |h(x)| ≤ δ/2 + (1/2)|x| ≤ δ,

so g maps B(0, δ) to itself. By (1), |g(x) − g(x′)| = |h(x) − h(x′)| ≤ (1/2)|x − x′|, hence g is a
contraction, and by Lemma 2.1, g has a unique fixed point x ∈ B(0, δ). By (3), |x| = |g(x)| < δ, so
x ∈ B(0, δ) as claimed.
Let U1 = B(0, δ) ∩ f −1 (B(0, δ/2)). Then U1 ⊂ Rn is open, and f : U1 → B(0, δ/2) is bijective,
so f −1 exists. Estimate (2) shows that f −1 is continuous. Let U0 be a connected component of U1
containing the origin, and V0 = f (U0 ). Then f : U0 → V0 is a homeomorphism.
To show that f : U0 → V0 is a diffeomorphism it remains to show that f −1 ∈ C 1 (V0 ). Let
b = f (a) for some a ∈ U0 , b ∈ V0 , and set
R(v) = f (a + v) − f (a) − Df (a)v,
and
S(h) = f −1 (b + h) − f −1 (b) − Df (a)−1 h.
Let
v(h) = f −1 (b + h) − f −1 (b) = f −1 (b + h) − a.
Then h = f (a + v(h)) − f (a), and so
S(h) = v(h) − Df (a)−1 h = Df (a)−1 [Df (a)v(h) + f (a) − f (a + v(h))] = −Df (a)−1 R(v(h)).
If there exist constants C, c > 0 such that
(4) c|h| ≤ |v(h)| ≤ C|h|,
then

|S(h)|/|h| ≤ ||Df (a)−1 || |R(v(h))|/|h| = ||Df (a)−1 || (|R(v(h))|/|v(h)|) (|v(h)|/|h|) ≤ C ||Df (a)−1 || |R(v(h))|/|v(h)|.
The expression on the right converges to zero as h → 0 by differentiability of f . This proves that
f −1 is differentiable at b. It remains to show (4). We have
v(h) = Df (a)−1 Df (a)v(h) = Df (a)−1 [f (a + v(h)) − f (a) − R(v(h))] = Df (a)−1 (h − R(v(h))),
and so
|v(h)| ≤ ||Df (a)−1 || |h| + ||Df (a)−1 || |R(v(h))|.
Since |R(v)|/|v| → 0 as |v| → 0 by differentiability of f , there exists δ1 > 0 such that

(5) |R(v)| ≤ |v|/(2||Df (a)−1 ||) for |v| ≤ δ1 .

By continuity of f −1 , there exists δ2 > 0 such that |h| < δ2 implies |v(h)| ≤ δ1 , and therefore

|v(h)| ≤ 2||Df (a)−1 || |h|

whenever |h| ≤ δ2 , which gives half of (4). For the other half, consider
h = f (a + v(h)) − f (a) = Df (a)v(h) + R(v(h)).
Therefore, in view of (5), for |h| < δ2 ,

|h| ≤ ||Df (a)|| |v(h)| + |R(v(h))| ≤ ( ||Df (a)|| + 1/(2||Df (a)−1 ||) ) |v(h)|.
By Theorem 1.5 the partial derivatives of f −1 are defined at each point y ∈ V0 . Observe that the
formula Df −1 (y) = Df (f −1 (y))−1 implies that the map Df −1 from V0 into the space of invertible
n × n matrices can be written as the composition

V0 −(f −1)→ U0 −(Df )→ GL(n, R) −(ι)→ GL(n, R),
where ι : GL(n, R) → GL(n, R) is the matrix inversion map. It follows from Cramer’s rule that ι is
a smooth map of the matrix components. Thus the partial derivatives of f −1 are continuous, and
so f −1 is of class C 1 . To prove that f −1 ∈ C k (V0 ) assume by induction that we have shown that
f −1 is of class C k−1 . Because Df −1 is a composition of C k−1 -smooth functions, it is itself C k−1 -
smooth, which implies that the partial derivatives of f −1 are of class C k−1 , so f −1 is C k -smooth.
This completes the proof. 
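The fixed-point scheme at the heart of the proof can be run numerically: to solve f (x) = y for a map with f (0) = 0 and Df (0) = Id, iterate g(x) = y + x − f (x). A Python sketch with a sample small perturbation of the identity (map and target point chosen for illustration):

```python
# The iteration g(x) = y + x - f(x) from the proof of the Inverse Function
# theorem, applied to a map with f(0) = 0 and Df(0) = Id; g is a
# contraction on a small ball, so the iterates converge to f^{-1}(y).
import math

def f(x1, x2):
    return (x1 + 0.1 * math.sin(x2), x2 + 0.1 * x1**2)

y = (0.05, 0.03)                      # target value near f(0) = 0
x = (0.0, 0.0)
for _ in range(50):
    v = f(*x)
    x = (y[0] + x[0] - v[0], y[1] + x[1] - v[1])

fx = f(*x)
assert abs(fx[0] - y[0]) < 1e-12 and abs(fx[1] - y[1]) < 1e-12
```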
Example 2.1 (Spherical coordinates). Consider the map f : (ρ, φ, θ) → (x, y, z) given by
x = ρ sin φ cos θ
y = ρ sin φ sin θ
z = ρ cos φ
A computation shows that the determinant of the differential of this map equals ρ² sin φ. Hence,
by the Inverse Function theorem, f is a local diffeomorphism from {ρ > 0, θ ∈ R, 0 < φ < π} to R3 .
By choosing a domain U where f is injective we conclude that the map f : U → f (U ) is a
diffeomorphism.
This choice of coordinates can be generalized to arbitrary dimension. Consider the map
Φ : (r, θ1 , ..., θn−1 ) 7→ (x1 , ..., xn )
defined on the domain
U = (0, ∞) × (0, π) × ... × (0, π) × (0, 2π) ⊂ Rn
by the equations
x1 = r cos θ1 ,
x2 = r sin θ1 cos θ2 ,
x3 = r sin θ1 sin θ2 cos θ3 ,
···
xn−1 = r sin θ1 sin θ2 . . . sin θn−2 cos θn−1 ,
xn = r sin θ1 sin θ2 . . . sin θn−1 .
By the Inverse Function theorem, Φ is a local diffeomorphism, since its differential satisfies

det DΦ = r^{n−1} (sin θ1 )^{n−2} (sin θ2 )^{n−3} · · · sin θn−2 ,

which does not vanish on U ; since Φ is also injective on U , it is a diffeomorphism onto its image.
A diffeomorphism that is used to simplify considerations or calculations is usually called a (local)
change of coordinates. □
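For the three-dimensional spherical coordinates above, the claim det DF = ρ² sin φ can be checked by approximating the columns of DF with difference quotients and computing the determinant (the sample point is arbitrary):

```python
# Jacobian determinant of the spherical-coordinate map, computed in the
# example as rho^2 sin(phi), checked here by finite differences.
import math

def F(rho, phi, theta):
    return (rho * math.sin(phi) * math.cos(theta),
            rho * math.sin(phi) * math.sin(theta),
            rho * math.cos(phi))

p = (2.0, 0.8, 1.1)                   # a sample point with rho > 0, 0 < phi < pi
t = 1e-6
base = F(*p)
cols = []
for j in range(3):                    # j-th column: (F(p + t e_j) - F(p)) / t
    q = list(p)
    q[j] += t
    Fq = F(*q)
    cols.append([(Fq[i] - base[i]) / t for i in range(3)])

M = [[cols[j][i] for j in range(3)] for i in range(3)]   # M[i][j] = dF_i/dp_j
det = (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
       - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
       + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))

assert abs(det - p[0]**2 * math.sin(p[1])) < 1e-4
```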
The rank of a map f : Rn → Rm at a point x is defined as the rank of the differential Df (x)
(viewed as an m × n matrix), which is the same as dim Df (x)(Rn ). The following theorem can be
viewed as a generalization of the Inverse Function theorem.
Theorem 2.5 (Rank theorem). Suppose U ⊂ Rm and V ⊂ Rn are open sets and f : U → V is
a smooth map of constant rank k. For any point p ∈ U , there exist a connected neighbourhood
U1 ⊂ U of p with a change of coordinates (i.e., a diffeomorphism) φ : U1 → U0 , φ(p) = 0, and a
connected neighbourhood V1 ⊂ V of f (p) with a change of coordinates ψ : V1 → V0 , ψ(f (p)) = 0,
such that
ψ ◦ f ◦ φ−1 (x1 , . . . , xk , xk+1 , . . . , xm ) = (x1 , . . . , xk , 0, . . . , 0).
Here U0 and V0 can be assumed to be connected open neighbourhoods of the origin in Rm and Rn
respectively.
Proof. Since Df (p) has rank k, there exists a k × k minor with nonzero determinant. By reordering
the coordinates, we may assume that it is the upper left minor, (∂fi /∂xj ) for i, j = 1, . . . , k. After
a translation we may assume that p = 0 and f (0) = 0. Let (x, y) ∈ Rk × Rm−k and
(v, w) ∈ Rk × Rn−k be the coordinates. If we write f (x, y) = (Q(x, y), R(x, y)) for some smooth
maps Q : U → Rk , R : U → Rn−k , then (∂Qi /∂xj )_{1≤i,j≤k} is nonsingular at the origin. Define
φ(x, y) = (Q(x, y), y). Then

Dφ(0) = ( ∂Qi /∂xj (0)   ∂Qi /∂yj (0)
          0              Im−k )

is nonsingular.
is nonsingular. By the Inverse Function theorem there are connected neighbourhoods U1 and
U0 of the origin in Rm such that φ : U1 → U0 is a diffeomorphism. Writing the inverse map
φ−1 (x, y) = (A(x, y), B(x, y)), A : U0 → Rk , B : U0 → Rm−k , we have
(x, y) = φ(A(x, y), B(x, y)) = (Q(A(x, y), B(x, y)), B(x, y)).
It follows that B(x, y) = y, and so φ−1 (x, y) = (A(x, y), y), Q(A(x, y), y) = x, and therefore,
f ◦ φ−1 (x, y) = (x, R̃(x, y)), R̃(x, y) = R(A(x, y), y).
The Jacobian matrix of this map at an arbitrary point (x, y) ∈ U0 is

D(f ◦ φ−1 )(x, y) = ( Ik           0
                      ∂R̃i /∂xj    ∂R̃i /∂yj ).
Since composing with a diffeomorphism does not change the rank of a map, this matrix has rank
equal to k everywhere on U0 . Since the first k columns are obviously independent, the rank can be
k only if the partial derivatives ∂R̃i /∂yj vanish identically on U0 , which implies that R̃ is
independent of the variables y. Thus, setting S(x) = R̃(x, 0), we have
(6) f ◦ φ−1 (x, y) = (x, S(x)).
Let V1 = {(v, w) ∈ V : (v, 0) ∈ U0 }, which is a neighbourhood of the origin. The map ψ(v, w) =
(v, w − S(v)) is a diffeomorphism from V1 onto its image, which can be seen by observing that
ψ −1 (s, t) = (s, t + S(s)). It follows from (6) that
ψ ◦ f ◦ φ−1 (x, y) = ψ(x, S(x)) = (x, S(x) − S(x)) = (x, 0).

For a domain Ω ⊂ Rn , a smooth map f : Ω → Rm is called an immersion if Df (x) is injective for
all x ∈ Ω (i.e., Df (x) has a trivial kernel for all x), and a submersion if Df (x) is surjective for all
x ∈ Ω. Clearly n ≤ m is a necessary condition for f to be an immersion, while n ≥ m is required
for a submersion. These are important examples of maps of constant rank. The Rank theorem is
a powerful tool for the study of such maps. For example, let us show that if f : Rm → Rn is an
injective map of constant rank, then it is an immersion. Indeed, if f is not an immersion, then the
rank k of f is less than m. By the Rank theorem in a neighbourhood of any point there is a local
change of coordinates such that f becomes
f (x1 , . . . xk , xk+1 , . . . , xm ) = (x1 , . . . , xk , 0, . . . , 0).
It follows that f (0, . . . , 0, ε) = f (0) for ε small, which contradicts injectivity of f .
Another useful consequence of the Inverse Function theorem is the following theorem which gives
conditions under which a level set of a smooth map is locally the graph of a smooth function.
Theorem 2.6 (Implicit Function Theorem). Let U ⊂ Rn × Rk be an open set, and let (x, y) =
(x1 , . . . , xn , y1 , . . . , yk ) denote the standard coordinates on U . Suppose Φ : U → Rk is a smooth
map, (a, b) ∈ U , and c = Φ(a, b). If the k × k matrix

( ∂Φi /∂yj (a, b) ), i, j = 1, . . . , k,
is nonsingular, then there exist neighbourhoods V0 ⊂ Rn of a and W0 ⊂ Rk of b, and a smooth map
f : V0 → W0 such that Φ−1 (c) ∩ (V0 × W0 ) is the graph of f , i.e., Φ(x, y) = c for (x, y) ∈ V0 × W0
if and only if y = f (x).
Proof. Consider the map Ψ : U → Rn × Rk defined by Ψ(x, y) = (x, Φ(x, y)). Its differential at
(a, b) is

DΨ(a, b) = ( In                0
             ∂Φi /∂xj (a, b)   ∂Φi /∂yj (a, b) ),
which is nonsingular by hypothesis. Thus by the Inverse Function theorem there exist connected
open neighbourhoods U0 of (a, b) and Y0 of (a, c) such that Ψ : U0 → Y0 is a diffeomorphism.
Shrinking U0 and Y0 if necessary, we may assume that U0 = V × W is a product neighbourhood.
The inverse map has the form (why?)
Ψ−1 (x, y) = (x, B(x, y))
for some smooth map B : Y0 → W . Let V0 = {x ∈ V : (x, c) ∈ Y0 } and W0 = W , and define
f : V0 → W0 by f (x) = B(x, c). Comparing y components in the relation (x, c) = Ψ ◦ Ψ−1 (x, c)
yields

c = Φ(x, B(x, c)) = Φ(x, f (x))

whenever x ∈ V0 , so the graph of f is contained in Φ−1 (c). Conversely, suppose (x, y) ∈ V0 × W0
and Φ(x, y) = c. Then Ψ(x, y) = (x, Φ(x, y)) = (x, c), so
(x, y) = Ψ−1 (x, c) = (x, B(x, c)) = (x, f (x)),
which implies that y = f (x). 
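On the circle Φ(x, y) = x² + y² = 1 near (a, b) = (3/5, 4/5), where ∂Φ/∂y = 2b ≠ 0, the theorem produces the explicit graph y = f (x) = √(1 − x²); implicit differentiation also gives f ′(a) = −(∂Φ/∂x)/(∂Φ/∂y) = −a/b. A Python check of this standard example:

```python
# Implicit Function theorem on the circle: Phi(x, y) = x^2 + y^2 = 1 near
# (a, b) = (0.6, 0.8), where dPhi/dy = 2b != 0, is solved by the graph
# y = f(x) = sqrt(1 - x^2), and f'(a) = -a/b by implicit differentiation.
import math

def Phi(x, y): return x**2 + y**2

def f(x): return math.sqrt(1.0 - x**2)

a, b = 0.6, 0.8
assert abs(Phi(a, f(a)) - 1.0) < 1e-12 and abs(f(a) - b) < 1e-12

t = 1e-7
fprime = (f(a + t) - f(a)) / t        # difference quotient for f'(a)
assert abs(fprime - (-a / b)) < 1e-5  # -a/b = -0.75
```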
v: 2018-01-30
3. Integration: from Riemann to Lebesgue


3.1. Riemann Integral.
Definition 3.1. For a < b, a partition of an interval [a, b] ⊂ R is a finite collection of points
P = {x0 , . . . , xm }, a = x0 < x1 < · · · < xm−1 < xm = b. A step function s(x) for a partition P is
a function which is constant on each interval (xi , xi+1 ), and arbitrary at all other points.
For a domain B = [a1 , b1 ] × · · · × [an , bn ] ⊂ Rn a partition is a set of the form P = P1 × · · · × Pn ,
where Pi is a partition of [ai , bi ]. For a multi-index I = (i1 , . . . , in ) denote by ∆I the set of the
form (xi1 , xi1 +1 ) × · · · × (xin , xin +1 ) and call it a brick of the partition P . A function s(x) is a step
function for a partition P if it is constant on every brick of P . The volume of a brick ∆I is the
usual Euclidean volume, i.e.,

vol(∆I ) = (xi1 +1 − xi1 ) · . . . · (xin +1 − xin ).
Definition 3.2. A partition Q is a refinement of a partition P if Pi ⊂ Qi for all i = 1, . . . , n.
Lemma 3.3. Any two partitions of a domain B have a common refinement.
Proof. Given partitions P = P1 × · · · × Pn and P 0 = P10 × · · · × Pn0 , the partition (P1 ∪ P10 ) × · · · ×
(Pn ∪ Pn0 ) is a common refinement. 
Given a step function s(x) for a partition P of B ⊂ Rn , we define

I(s, P ) = ∑_{∆I} sI vol(∆I ),

where sI is the value of s(x) on the brick ∆I , and the summation is taken over all bricks in the
partition.
Lemma 3.4. If s(x) is a step function for partitions P and P 0 then I(s, P ) = I(s, P 0 ).
Proof. Obvious. 
It follows from the above lemma that I(s, P ) does not depend on the choice of the partition P
for which s is a step function. Therefore, we simply denote this number by I(s).
Lemma 3.5. If s(x) is a step function for a partition P and t(x) is a step function for P 0 , then
s(x) ≤ t(x) implies I(s) ≤ I(t).
Proof. Pass to a common refinement and use the preceding lemma. 
Definition 3.6. Let B = [a1 , b1 ] × · · · × [an , bn ] ⊂ Rn . A function f : B → R is called (Riemann)
integrable on B if for any ε > 0 there exist step functions s(x) and t(x) such that s(x) ≤ f (x) ≤ t(x)
for all x and I(t) − I(s) < ε. For a function f integrable on a domain B define
∫_B f(x) dx = sup_{s≤f} I(s) = inf_{f≤t} I(t),
where the supremum (resp. infimum) is taken over all step functions s (resp. t) with s ≤ f
(resp. f ≤ t).
Proposition 3.7. Continuous functions on Rn are Riemann integrable on any domain B =
[a1 , b1 ] × · · · × [an , bn ] ⊂ Rn .
Proof. Let ε > 0 be given. Recall that a continuous function on a compact set is uniformly
continuous, i.e., for any ε > 0 there exists δ > 0 such that |f(x) − f(y)| < ε whenever |x − y| < δ.
Thus, there exists δ > 0 such that |x − y| < δ implies |f(x) − f(y)| < ε/vol(B). Select a partition P
sufficiently fine so that the diameter of each brick of P is less than δ. Choose step functions s(x)
and t(x) to be respectively the minimum and the maximum of f on each brick. Then s ≤ f ≤ t,
and

∫_B t(x) dx − ∫_B s(x) dx ≤ (ε/vol(B)) Σ_{I∈P} vol(I) = ε.

□
We remark that what we defined above is, in fact, called the Darboux integral. However, it can
be shown that Darboux’s definition of integral is equivalent to that of Riemann.
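As a numerical complement to Definition 3.6 and Proposition 3.7 (a sketch added for illustration, not part of the original notes; the helper name darboux_sums is made up), the following approximates ∫_[0,1] x² dx by the integrals I(s), I(t) of lower and upper step functions on a uniform partition:

```python
# Darboux sums for f(x) = x^2 on [0, 1] with a uniform partition into m bricks.
# Since f is monotone on [0, 1], the min and max of f on each brick are
# attained at the endpoints, which gives valid step functions s <= f <= t.

def darboux_sums(f, a, b, m):
    """Return (I(s), I(t)) for the uniform partition of [a, b] into m intervals."""
    lower = upper = 0.0
    for i in range(m):
        x0 = a + (b - a) * i / m
        x1 = a + (b - a) * (i + 1) / m
        lo, hi = min(f(x0), f(x1)), max(f(x0), f(x1))
        lower += lo * (x1 - x0)
        upper += hi * (x1 - x0)
    return lower, upper

I_s, I_t = darboux_sums(lambda x: x * x, 0.0, 1.0, 1000)
# I(s) <= 1/3 <= I(t), and I(t) - I(s) -> 0 as the partition is refined.
```

Refining the partition (larger m) drives I(t) − I(s) below any given ε, which is exactly the integrability criterion of Definition 3.6.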
3.2. What is wrong with the Riemann integral? There are several reasons why the Riemann
integral defined in the previous section does not seem to be adequate. It all boils down to the fact
that certain reasonable functions are not Riemann integrable. The following three examples will
illustrate that. We begin with a definition.
Definition 3.8. Given a set S ⊂ Rn, the characteristic function χS of S is defined to be

χS(x) = 1 if x ∈ S,  and  χS(x) = 0 if x ∉ S.
Example 3.1. The so-called Dirichlet function χQ is clearly not Riemann integrable on [0, 1],
since ∫_{[0,1]} s(x) dx ≤ 0 and ∫_{[0,1]} t(x) dx ≥ 1 for any step functions s and t with s ≤ χQ ≤ t. This is
because both the rational numbers Q and the irrational numbers R \ Q are dense in R. □
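The failure of integrability in Example 3.1 can be made tangible with Riemann sums (an illustrative sketch, not from the notes; floats are of course rational, so the "irrational" tags below only stand in for irrational sample points):

```python
from fractions import Fraction
import math

m = 100  # number of subintervals of [0, 1]

# Tags x_i = i/m are rational, so chi_Q equals 1 at every tag.
rational_tags = [Fraction(i, m) for i in range(m)]
# Tags shifted by sqrt(2)/(10m) stay inside each subinterval and model
# irrational sample points, where chi_Q equals 0.
irrational_tags = [i / m + math.sqrt(2) / (10 * m) for i in range(m)]

def chi_Q(x):
    # For this illustration, Fractions play the role of rationals and the
    # shifted floats the role of irrationals.
    return 1 if isinstance(x, Fraction) else 0

S_rational = sum(chi_Q(x) for x in rational_tags) / m      # -> 1
S_irrational = sum(chi_Q(x) for x in irrational_tags) / m  # -> 0
```

Both tag choices are admissible for the same partition, yet the two sums stay at 1 and 0 no matter how fine the partition is, so no single limit exists.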
Proposition 3.9. Every open set U ⊂ R can be written in a unique way as an at most countable
union of disjoint open intervals.
We leave the proof of the proposition as an exercise for the reader. With the help of this
proposition we can make the following definition. Given an open set U ⊂ R we define the Lebesgue
measure of U to be

m(U) = Σ_i |U_i|,

where |U_i| is the length of the interval U_i, and the summation is taken over the disjoint open
intervals U_i whose union is U. It is immediate that the Lebesgue measure of every open interval
is equal to its length.
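The measure of an open set can be computed by first merging it into disjoint intervals, as in Proposition 3.9; a small sketch (not from the notes; the helper name merge_measure is made up):

```python
def merge_measure(intervals):
    """Total length of a finite union of open intervals (a, b)."""
    merged = []
    for a, b in sorted(intervals):
        if merged and a <= merged[-1][1]:
            # overlapping (or touching) intervals: extend the last component
            merged[-1][1] = max(merged[-1][1], b)
        else:
            merged.append([a, b])
    return sum(b - a for a, b in merged)

# (0, 0.5) ∪ (0.25, 0.75) is the single interval (0, 0.75), of measure 0.75.
m_U = merge_measure([(0.0, 0.5), (0.25, 0.75)])
```

Merging intervals that merely touch at an endpoint does not change the total length, since a single point has measure zero.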
While the previous example can be dismissed by declaring χQ to be “too irregular” to be inte-
grable, the next example shows that there exist open sets whose characteristic functions are not
integrable.
Example 3.2. Suppose U ⊂ [0, 1] is an open set with the following properties: U is dense in [0, 1],
and m(U ) < 1. We claim that χU is not Riemann integrable. For the proof of the claim consider
any two step functions s(x) ≤ χU (x) ≤ t(x) for a partition P of [0, 1]. Since U is dense, any brick
[xi, xi+1] will have a nonempty intersection with U, and so ∫_{[0,1]} t(x) dx ≥ 1. On the other hand, (using
the multidimensional notation, although we are in R), let

∫_{[0,1]} s(x) dx = Σ_I s_I vol(I).

Separate the partition P into R ∪ S, where R = {J ∈ P : s_J > 0} and S = {J ∈ P : s_J ≤ 0}. Then
J ∈ R implies 0 < s_J < 1 and J ⊂ U. It follows then that

∫_{[0,1]} s(x) dx = Σ_{J∈S} s_J vol(J) + Σ_{J∈R} s_J vol(J) ≤ Σ_{J∈R} vol(J) ≤ m(U) < 1.

This shows that χU is not Riemann integrable. □
It remains to show that there indeed exist dense open subsets of [0, 1] with Lebesgue measure
less than 1. To construct such a set, enumerate Q ∩ (0, 1) as {r1, r2, . . . }. Suppose that 0 < b < 1.
For every l ∈ N select an open interval Jl such that rl ∈ Jl, Jl ⊂ (0, 1), and the length of Jl equals
b/2^l. Then the union U of all the Jl is an open subset of [0, 1] which is clearly dense in [0, 1]. Let
{Uj} be the disjoint open intervals with ∪_j Uj = U (these exist by Proposition 3.9). Then,

m(U) = Σ_j |U_j| ≤ Σ_l |J_l| = b.
Thus, U has the required properties. 
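The construction above can be carried out numerically (an illustrative sketch, not part of the notes): cover finitely many rationals r_l by intervals J_l of length b/2^l and measure the union.

```python
from fractions import Fraction

b = 0.5
# A finite initial segment of an enumeration of Q ∩ (0, 1).
rationals = []
for q in range(2, 30):
    for p in range(1, q):
        r = Fraction(p, q)
        if r not in rationals:
            rationals.append(r)

# J_l is the interval of length b/2^l centred at r_l (clipped to (0, 1)).
intervals = []
for l, r in enumerate(rationals, start=1):
    half = b / 2 ** l / 2
    intervals.append((max(0.0, float(r) - half), min(1.0, float(r) + half)))

def merge_measure(intervals):
    # total length of the union, obtained by merging sorted intervals
    merged = []
    for a, c in sorted(intervals):
        if merged and a <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], c)
        else:
            merged.append([a, c])
    return sum(c - a for a, c in merged)

m_U = merge_measure(intervals)
# The union contains every listed rational, yet its measure is at most b = 1/2.
```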


Example 3.3 (Cantor-type sets). Let I = [p, q] be an interval in R, and let the length of I be
equal to b > 0. For b > a > 0, write I = [p, r] ∪ (r, s) ∪ [s, q], such that |r − p| = |q − s| = (b − a)/2,
and |s − r| = a. We call [p, r] and [s, q] the remnants of I and (r, s) the middle part of I.
Select positive real numbers a0, a1, . . . such that Σ_{n=0}^∞ 2^n a_n = a. For each n ≥ 1, let

b_n = 2^{−n} (1 − Σ_{k=0}^{n−1} 2^k a_k),

so that b_n > a_n and b_{n+1} = (b_n − a_n)/2 for all n. Let S0 = {I0}, where I0 is the middle a0-part
of [0, 1], and let T1 = {J1, J2} be the corresponding remnants; these have length b1 > a1. Let S1 be
the middle a1-parts of T1, and let T2 be the set of remnants of T1. Their length is b2 > a2. Note
that S1 has 2 elements, while T2 has 4. We continue inductively: S_n consists of the middle
a_n-parts of T_n, while T_{n+1} consists of the remnants of T_n.
Let U = S0 ∪ S1 ∪ . . . . By construction, this union is disjoint, and m(U) = Σ_n 2^n a_n = a. Let
J be an arbitrary subinterval of [0, 1] of length b_n. Then J intersects S0 ∪ · · · ∪ S_n, and hence U.
Indeed, otherwise J is contained in one of the disjoint intervals in T_{n+1}. But the intervals in T_{n+1}
have length b_{n+1} < b_n, a contradiction. Since b_n → 0 as n → ∞, we conclude that any interval of
positive length has a nonempty intersection with U. Thus U is dense in [0, 1]. It follows from
the previous example that χU is not integrable on [0, 1].
For a concrete Cantor-type set, consider a_n = 1/4^{n+1}. Then a = Σ_{n=0}^∞ 2^n/4^{n+1} = 1/2, and thus the
set U obtained for this choice of a_n has a nonintegrable characteristic function. □
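A small check of the bookkeeping in Example 3.3 (a sketch, not from the notes): for a_n = 1/4^{n+1} the total Σ 2^n a_n is 1/2 and the lengths b_n satisfy the stated recursion.

```python
# a_n = 1/4^{n+1}; truncate the series far enough that the tail is negligible.
N = 40
a_seq = [1.0 / 4 ** (n + 1) for n in range(N)]
a = sum(2 ** n * a_seq[n] for n in range(N))        # should be 1/2

def b(n):
    # b_n = 2^{-n} (1 - sum_{k=0}^{n-1} 2^k a_k)
    return 2.0 ** (-n) * (1.0 - sum(2 ** k * a_seq[k] for k in range(n)))

recursion_ok = all(abs(b(n + 1) - (b(n) - a_seq[n]) / 2) < 1e-12 for n in range(20))
lengths_ok = all(b(n) > a_seq[n] for n in range(20))
```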
3.3. Lebesgue Integral. In this subsection we briefly outline the construction of the Lebesgue
integral. We begin with Lebesgue measurable sets.
Let B be a “brick” domain in Rn defined by B = I1 × · · · × In, where the Ij are intervals in R of the
form (aj, bj), (aj, bj], [aj, bj), or [aj, bj], aj ≤ bj. Define a map m : P → [0, +∞) on the set P of all
bricks by setting m(B) = Π_j (bj − aj). Thus m is just the usual Euclidean volume (resp. length,
area) of a brick. We also add the empty set to P and define m(∅) = 0. If a set E is a finite disjoint
union of bricks, i.e.,
union of bricks, i.e.,
(1) E = ∪_{j=0}^k B_j,  B_j ∈ P,  B_i ∩ B_j = ∅ for all i ≠ j,

then clearly

(2) m(E) = Σ_{j=0}^k m(B_j).
It is possible to extend m as a positive function to a wider class of sets still keeping the additivity
property (2). We say that a subset E of Rn is elementary if it admits representation (1). Then we
view (2) as the definition of m(E). Note that this definition is independent of the choice of the Bj in
(1). It is easy to see that if E1 and E2 are two elementary sets, then E1 ∪ E2, E1 ∩ E2, E1 \ E2 are
elementary sets. We denote the class of elementary sets by E. The crucial property of the function
m : E → R+ ∪ {0} is the following: if (Ej) is a finite or countable collection of elementary sets and
E ∈ E satisfies E ⊂ ∪j Ej, then m(E) ≤ Σ_j m(E_j).
Let now A be a subset of Rn. We define its outer measure m∗ by

m∗(A) = inf { Σ_j m(E_j) : A ⊂ ∪_j E_j, E_j ∈ E },

where the infimum is taken over all finite or countable coverings of A by elementary sets. Recall
that the symmetric difference of two sets A and B is defined by A∆B = (A ∪ B) \ (A ∩ B).
Definition 3.10. A set A ⊂ Rn is called Lebesgue measurable if for every ε > 0 there exists
E ∈ E such that m∗ (A∆E) < ε. If A is a measurable set, the Lebesgue measure of A is defined as
m(A) := m∗ (A).
Denote by M the class of all measurable sets in Rn . Clearly, every brick domain is measurable.
One can show that M is closed with respect to finite or countable application of unions, intersections
and differences. Further, one can show that any open or closed subset of Rn is measurable, and
that a set X is measurable if and only if for any ε > 0 there exist an open set G (resp. a closed set F)
with X ⊂ G (resp. F ⊂ X) such that m∗(G \ X) < ε (resp. m∗(X \ F) < ε).
Perhaps the most important property of the Lebesgue measure is its σ-additivity: if (Aj) is a
disjoint sequence of measurable sets and A = ∪j Aj, then m(A) = Σ_j m(A_j). It is also monotone:
if A ⊂ B then m(A) ≤ m(B).
Lemma 3.11. Any countable set S in Rn has measure zero.
Proof. Given ε > 0, enclose every point a_n of S = {a0, a1, . . . } in a brick of volume ε/2^{n+1}. These bricks cover S and have total volume at most Σ_n ε/2^{n+1} = ε, which can be made arbitrarily small. □
Note that the converse to the lemma is false: there exist sets of measure zero which are not
countable. A primary example of such a set is the Cantor set. Following the construction in
Example 3.3 we produce an open set U by taking a_n = 1/3^{n+1} for all n ∈ N. Then the set [0, 1] \ U is
called the Cantor set. It is a compact set of measure zero, and can be shown to have the cardinality of
R. We leave the details to the reader.
We now move from sets to functions. Let X be a measurable subset of Rn. A function f :
X → R is called measurable if the sets f^{−1}((−∞, a)), f^{−1}((−∞, a]), f^{−1}([a, ∞)), f^{−1}((a, ∞))
are measurable for every a ∈ R. In particular, suppose that f admits at most a finite set of
values y0, y1, . . . , yk. Then f is measurable if and only if every set f^{−1}(yj) is measurable. Measurable
functions that admit only finitely many values will be called simple. The Lebesgue integral over X
of a simple function ψ is defined by
(3) ∫_X ψ(x) dx := Σ_j y_j m(ψ^{−1}(y_j)).
Definition 3.12. Let f : X → R be a bounded measurable function defined on X ∈ M with
m(X) < ∞. Then define

(4) ∫_X f(x) dx = sup_{ψ≤f} ∫_X ψ(x) dx,

where the supremum is taken over all simple functions ψ on X satisfying ψ ≤ f.

It can be shown that for a bounded measurable function f : X → R, the Lebesgue integral can also
be defined as ∫_X f(x) dx = inf_{φ≥f} ∫_X φ(x) dx, where the infimum is taken over all simple functions
φ ≥ f. Both definitions agree.
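Formula (3) can be exercised directly (a sketch, not from the notes): take a simple function on X = [0, 1) given by finitely many interval pieces, group the pieces by value, and sum value × measure of the preimage.

```python
from collections import defaultdict

# psi = 1 on [0, 1/3), 5 on [1/3, 1/2), 2 on [1/2, 1): a simple function.
pieces = [((0.0, 1.0 / 3.0), 1.0), ((1.0 / 3.0, 0.5), 5.0), ((0.5, 1.0), 2.0)]

# m(psi^{-1}(y_j)): total length of the intervals on which psi takes value y_j.
measure_of_level = defaultdict(float)
for (a, b), y in pieces:
    measure_of_level[y] += b - a

integral = sum(y * m for y, m in measure_of_level.items())
# 1*(1/3) + 5*(1/6) + 2*(1/2) = 13/6
```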
Proposition 3.13. If f : X → R is Riemann integrable on a brick domain X ⊂ Rn, then the
integral in (4) is well-defined and finite.

Proof. Note that every step function on X is in particular a simple function. Hence, for step
functions s(x) and t(x) satisfying s ≤ f ≤ t, we have

I(s) = ∫_X s(x) dx ≤ sup_{ψ≤f} ∫_X ψ(x) dx ≤ inf_{f≤φ} ∫_X φ(x) dx ≤ ∫_X t(x) dx = I(t).

Since f is Riemann integrable, I(t) − I(s) can be made arbitrarily small, and we conclude that the
function f is Lebesgue integrable. □
If now f ≥ 0 on X ∈ M, we define

(5) ∫_X f(x) dx = sup_{h≤f} ∫_X h(x) dx,

where the supremum is taken over all bounded measurable functions h such that m{x : h(x) ≠ 0} < ∞.
This last assumption ensures that ∫_X h(x) dx on the right-hand side of (5) is well-defined
even if m(X) = ∞. Indeed, we simply have

∫_X h(x) dx = ∫_{{x : h(x) ≠ 0}} h(x) dx.
For a general measurable f : X → R we set f+ = max{f, 0} and f− = max{−f, 0}. Then
f = f+ − f−, and |f| = f+ + f−.

Definition 3.14. For X ∈ M and a measurable f : X → R we define

∫_X f(x) dx = ∫_X f+(x) dx − ∫_X f−(x) dx.

If both integrals on the right are finite, we say that f is (Lebesgue) integrable on X. The class of
integrable functions is denoted by L1(X).
A property of functions defined on a domain in Rn is said to hold almost everywhere (abbreviated
a.e.) if it holds everywhere except on a set of measure zero. For example, f = g a.e. means that the
set of points where f is not equal to g has measure zero. It follows then that ∫ f = ∫ g. Another
example is convergence a.e.: we say lim fn = f a.e. if the set of points x for which lim fn(x) ≠ f(x)
has measure zero.
Using the definition of the integral and properties of measurable sets one can prove basic properties
of integration, such as ∫(af + bg) = a∫f + b∫g for a, b ∈ R; f ≤ g ⇒ ∫f ≤ ∫g;
∫_{A∪B} f = ∫_A f + ∫_B f for disjoint A, B; etc. A more delicate property is taking the limit under
the integral sign. The following two theorems provide sufficient conditions under which the
operations of taking a limit and integration commute.
Theorem 3.15 (Fatou’s lemma). If {fn} is a sequence of nonnegative measurable functions and
fn(x) → f(x) a.e. on X ∈ M, then

(6) ∫_X f(x) dx ≤ lim inf ∫_X fn dx.

Proof. Without loss of generality we may assume fn(x) → f(x) for all x. By the definition of the
Lebesgue integral, it is enough to show that (6) holds if we replace f with any nonnegative simple
function φ ≤ f. Suppose that φ = Σ_{k=1}^m a_k χ_{A_k}, where the Ak are disjoint measurable sets, and ak > 0.
Let 0 < t < 1. Since φ(x) ≤ f(x), we see that ak ≤ lim inf fn(x) for each k and x ∈ Ak. It follows
that for a fixed k the sequence of sets

B_{kn} = {x ∈ A_k : f_p(x) ≥ t a_k for all p ≥ n}

increases to Ak. Consequently, m(B_{kn}) → m(A_k) as n → ∞. The simple function Σ_{k=1}^m t a_k χ_{B_{kn}}
is everywhere less than fn, and so

∫_X fn dx ≥ Σ_{k=1}^m t a_k m(B_{kn}).

Taking lim inf in this inequality yields

lim inf_{n→∞} ∫_X fn dx ≥ Σ_{k=1}^m t a_k m(A_k) = t ∫_X φ dx.

Finally, by letting t → 1 we get (6). □
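The inequality in Fatou's lemma can be strict; the standard example f_n = n·χ_(0,1/n) is easy to check numerically (a sketch, not part of the notes):

```python
# f_n = n on (0, 1/n), 0 elsewhere: f_n -> 0 pointwise on (0, 1), but every
# integral equals n * (1/n) = 1, so ∫ lim f_n = 0 < 1 = liminf ∫ f_n.
def integral_fn(n, samples=100000):
    # midpoint rule on (0, 1)
    h = 1.0 / samples
    return sum((n if (i + 0.5) * h < 1.0 / n else 0.0) for i in range(samples)) * h

integrals = [integral_fn(n) for n in (2, 5, 10)]
# every entry is 1, while the pointwise limit function integrates to 0
```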


Theorem 3.16 (Lebesgue Convergence theorem). Let X ⊂ Rn be a measurable subset and {fn}
be a sequence of measurable functions. Suppose that fn(x) → f(x) for almost every x ∈ X.
Furthermore, assume that there exists a function g ∈ L1(X) such that

|fn(x)| ≤ g(x),  n = 1, 2, . . . .

Then f ∈ L1(X), and

lim_{n→∞} ∫_X fn dx = ∫_X f dx.

Proof. The function g − fn is nonnegative, and so by Fatou’s lemma

∫_X (g − f) dx ≤ lim inf ∫_X (g − fn) dx.

Since |f| ≤ g, f is integrable, and we have

∫_X g dx − ∫_X f dx ≤ ∫_X g dx − lim sup ∫_X fn dx,

from which we conclude that

∫_X f dx ≥ lim sup ∫_X fn dx.

Similarly, considering g + fn we get

∫_X f dx ≤ lim inf ∫_X fn dx,

and the theorem follows. □
Note that Fatou’s lemma has a weaker hypothesis than the Lebesgue Convergence theorem, and
as a result its conclusion is also weaker. The advantage of Fatou’s lemma is that it is applicable
even if f is not known to be integrable and so it is often a good way of showing that f is integrable.
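A numerical illustration of the Lebesgue Convergence theorem (a sketch, not from the notes): f_n(x) = x^n on [0, 1] is dominated by g ≡ 1 and tends to 0 a.e., and the integrals indeed tend to 0.

```python
# ∫_0^1 x^n dx = 1/(n+1) -> 0, matching ∫ lim f_n = 0 under the domination |f_n| <= 1.
def integral(f, samples=100000):
    h = 1.0 / samples
    return sum(f((i + 0.5) * h) for i in range(samples)) * h  # midpoint rule

ns = (1, 5, 50)
approx = [integral(lambda x, n=n: x ** n) for n in ns]
exact = [1.0 / (n + 1) for n in ns]
```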
v: 2019-01-31

REAL ANALYSIS LECTURE NOTES

RASUL SHAFIKOV

4. Lp spaces and their relatives
In this section we consider properties of function spaces, i.e., collections of functions defined
on Euclidean domains that satisfy certain integrability or differentiability conditions. These are
important examples of general spaces studied in functional analysis: Banach spaces, topological
vector spaces, Fréchet spaces, etc. All integrals in this section will be with respect to the Lebesgue
measure.

4.1. Lp spaces.
Definition 4.1. For a domain Ω ⊂ RN and a real number p, 1 ≤ p < ∞, a measurable function f
is said to be of class Lp(Ω) if ∫_Ω |f|^p < ∞.
Since |f + g|^p ≤ 2^p (|f|^p + |g|^p) for all p, the space Lp = Lp(Ω) of all Lp-functions is a vector
space. Define

(1) ||f||_p = ( ∫ |f|^p )^{1/p}.

Clearly, ||cf||_p = |c| ||f||_p for all f ∈ Lp and c ∈ R, and ||f||_p = 0 if and only if f = 0 a.e. on
Ω. In Theorem 4.3 below we will show that ||f + g||_p ≤ ||f||_p + ||g||_p. Thus, (1) defines a norm
on Lp. Note that since ||f||_p = 0 only implies that f vanishes everywhere except on a set of measure
zero, one should understand elements of the space Lp as equivalence classes of functions satisfying
Definition 4.1 with respect to the equivalence relation given by f ∼ g ⇔ f = g a.e.
For p = ∞ we define the space L∞(Ω) of bounded (more precisely, essentially bounded)
functions with the norm

||f||_∞ = ess sup_{x∈Ω} |f(x)| = inf { M ≥ 0 : |f(x)| ≤ M for a.e. x ∈ Ω }.
Theorem 4.2 (Hölder’s inequality). If p, q ≥ 1 satisfy 1/p + 1/q = 1, and f ∈ Lp, g ∈ Lq, then

∫ |fg| ≤ ||f||_p ||g||_q.

(If p = 1, we take q = ∞.)
Proof. We will leave the case p = 1, q = ∞ as an exercise for the reader, and assume that p > 1.
We first establish the so-called Young’s inequality: for a, b > 0 and p and q as in Theorem 4.2, we have

ab ≤ a^p/p + b^q/q.

To see this, let t = 1/p, so 1 − t = 1/q. Then, since log is a strictly concave function,

log(t a^p + (1 − t) b^q) ≥ t log(a^p) + (1 − t) log(b^q) = log a + log b = log(ab),

from which the required inequality follows.
Now for the proof of Hölder’s inequality, we may divide the functions f and g by their norms in
the corresponding spaces, so we may assume that ||f||_p = ||g||_q = 1. Using Young’s inequality, we
have

|f(x)g(x)| ≤ |f(x)|^p/p + |g(x)|^q/q,  x ∈ Ω.

Integrating the above inequality over Ω gives

||fg||_1 ≤ 1/p + 1/q = 1,

which is what we needed to prove. □
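Hölder's inequality is easy to test on a discretized pair of functions (an illustrative sketch, not from the notes; the grid average stands in for the integral over Ω = (0, 1)):

```python
# f(x) = x^2, g(x) = 1 + x on (0, 1), with exponents p = 3, q = 3/2.
N = 10000
xs = [(i + 0.5) / N for i in range(N)]
f = [x * x for x in xs]
g = [1.0 + x for x in xs]

p, q = 3.0, 1.5                                          # 1/p + 1/q = 1
lhs = sum(abs(u * v) for u, v in zip(f, g)) / N          # ~ ∫ |f g|
norm_f = (sum(abs(u) ** p for u in f) / N) ** (1.0 / p)  # ~ ||f||_p
norm_g = (sum(abs(v) ** q for v in g) / N) ** (1.0 / q)  # ~ ||g||_q
```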
The next theorem is essentially a corollary of Hölder’s inequality.
Theorem 4.3 (Minkowski’s inequality). For any p ≥ 1,
||f + g||p ≤ ||f ||p + ||g||p .
Proof. When p = 1 or p = ∞ the inequality is trivial. For 1 < p < ∞ we write

|f(x) + g(x)|^p = |f(x) + g(x)| · |f(x) + g(x)|^{p−1} ≤ |f(x)| · |f(x) + g(x)|^{p−1} + |g(x)| · |f(x) + g(x)|^{p−1}.

Integrating over Ω we obtain

||f + g||_p^p ≤ ∫ |f| · |f + g|^{p−1} + ∫ |g| · |f + g|^{p−1}.

We now apply Hölder’s inequality to both terms on the right. The first one yields

∫ |f| · |f + g|^{p−1} ≤ ||f||_p ( ∫ |f + g|^{(p−1)q} )^{1/q} = ||f||_p · ||f + g||_p^{p−1},

since (p − 1)q = p. Similarly, the second term gives

∫ |g| · |f + g|^{p−1} ≤ ||g||_p · ||f + g||_p^{p−1}.

Combining everything together yields

||f + g||_p^p ≤ ||f||_p · ||f + g||_p^{p−1} + ||g||_p · ||f + g||_p^{p−1} = (||f||_p + ||g||_p) · ||f + g||_p^{p−1},

from which the result follows. □
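The triangle inequality just proved can be checked the same way on a grid (a sketch, not from the notes):

```python
# ||f + g||_p <= ||f||_p + ||g||_p for f(x) = x^2, g(x) = 1 - x on (0, 1), p = 2.5.
N = 10000
xs = [(i + 0.5) / N for i in range(N)]
f = [x * x for x in xs]
g = [1.0 - x for x in xs]
p = 2.5

def norm_p(h):
    # discrete analogue of (∫ |h|^p)^{1/p}
    return (sum(abs(v) ** p for v in h) / N) ** (1.0 / p)

lhs = norm_p([u + v for u, v in zip(f, g)])
rhs = norm_p(f) + norm_p(g)
```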
Another consequence of Hölder’s inequality is the following. Let m(Ω) < +∞ and f ∈ Lp(Ω).
Setting g = 1, we obtain

(2) ||f||_{L1(Ω)} ≤ (m(Ω))^{1/q} ||f||_{Lp(Ω)}.
Definition 4.4. For p ≥ 1, we say that a sequence {fn} ⊂ Lp converges to a function f ∈ Lp in
norm if for every ε > 0 there exists N > 0 such that for all n > N we have ||f − fn||_p < ε.

If a series Σ fn of elements of Lp converges to an Lp-function, we say that the series is summable.
We call Σ fn absolutely summable if Σ_n ||fn|| < ∞. (An absolutely convergent series of real
numbers always converges, but this is not true in general when one considers series of elements of
a normed space.)
Lemma 4.5. A normed space (X, || · ||) is complete if and only if every absolutely summable series
is summable.
We leave the proof of the lemma as an exercise for the reader. Using the lemma we can now
prove the following result.
Theorem 4.6 (Riesz-Fischer theorem). For all 1 ≤ p < ∞, the space Lp equipped with the norm
|| · ||_p is a Banach space.
Proof. We only need to prove that Lp is complete, i.e., that every Cauchy sequence in Lp converges
in norm to an element of Lp. By Lemma 4.5, it suffices to show that every absolutely summable
series is summable. Suppose that {fn} is such that Σ ||fn||_p = M < ∞. Define

g_n(x) = Σ_{k=1}^n |f_k(x)|.

By Minkowski’s inequality we have

||g_n||_p ≤ Σ_{k=1}^n ||f_k||_p ≤ M,

and so ∫ g_n^p ≤ M^p. For each x, the sequence {g_n(x)} is an increasing sequence of extended real
numbers (i.e., including the value ∞), and so it must converge to an extended real number g(x).
Then g(x) is a measurable function, and since g_n ≥ 0, we have ∫ g^p ≤ M^p by Fatou’s lemma. It
follows that g^p is integrable, and so g(x) is finite for a.e. x. For every x such that g(x) < ∞, the
series Σ_{k=1}^∞ f_k(x) converges absolutely, so in particular, it converges to a real number s(x). We set
s(x) = 0 for those x where g(x) = ∞. Thus we have constructed a function s(x) which is the limit
a.e. of the partial sums s_n = Σ_{k=1}^n f_k. It follows that s is measurable, and since |s_n(x)| ≤ g(x), we
have |s(x)| ≤ g(x). Consequently, s ∈ Lp, and

|s_n(x) − s(x)|^p ≤ 2^p [g(x)]^p.

Since 2^p g^p is integrable, and |s_n(x) − s(x)|^p → 0 a.e., we have by the Lebesgue Convergence theorem,

∫ |s_n − s|^p → 0.

Therefore, ||s_n − s||_p → 0, which proves the theorem. □
Let us now describe some natural dense subsets of Lp (Rn ).
Proposition 4.7. Let Ω be a bounded measurable subset of Rn . Then the set of all functions
continuous on Rn is dense in L1 (Ω).
Proof. It follows from the definition of the Lebesgue integral that the space of integrable simple
functions on Ω is dense in L1(Ω). Furthermore, every simple function is a linear combination
of characteristic functions of some measurable subsets of Ω. Hence it suffices to show that the
characteristic function χY of a set Y of finite measure is a limit in L1 of a sequence of continuous
functions. From the definition of the Lebesgue measure, for any given ε > 0 there exists an open
subset G ⊃ Y in Rn such that m(G \ Y) < ε/2. This also implies that there exists a closed subset
F ⊂ Y in Rn such that m(G) − m(F) < ε. Consider now the function

ϕε(x) = d(x, G^c) / (d(x, G^c) + d(x, F)),

where d is the Euclidean distance from x to the set which is the second argument of the function.
The function ϕε is continuous on Rn since the denominator is strictly positive. Furthermore, ϕε
vanishes on G^c, the complement of G, and is identically equal to 1 on F. Hence,

∫_{Rn} |χY − ϕε| ≤ ε,

which proves the proposition. □
This easily implies that L1 (Ω) is a separable space. Indeed, the space of polynomials with rational
coefficients is dense in the space of continuous functions (the Weierstrass theorem). One can also
show that Lp (Ω) is separable for 1 ≤ p < ∞. It is also easy to see that this remains true even if
m(Ω) = +∞.
4.2. Topological vector spaces and their duals.
Definition 4.8. If a vector space X (over the field of reals) is equipped with some topology, we
call X a topological vector space if the map X × X → X corresponding to vector addition in X
and the map R × X → X corresponding to scalar multiplication are both continuous.
Sometimes it is required that the topology on X is Hausdorff. This is always the case if the
topology comes from a metric (in particular, from a norm) on X.
Example 4.1. We give some examples of topological vector spaces.
(i) The space Lp is a topological vector space for any p ≥ 1 (prove it!).
(ii) The space C[0, 1] of continuous functions on the interval [0, 1]. One can show that this is a
Banach space equipped with the norm ||f || = supx∈[0,1] |f (x)|. In fact, any normed space,
complete or not, is a topological vector space.
(iii) The next is an example of a topological vector space which is not a normed space. Consider
the space C∞([0, 1]) of smooth functions on [0, 1]. The topology on C∞([0, 1]) can be
described as follows. For every integer k ≥ 0 we define a semi-norm

||f||_k = sup { |f^{(k)}(x)| : x ∈ [0, 1] }.

Here f^{(k)} is the derivative of f of order k. That || · ||_k is a semi-norm, rather than a norm,
means that ||f||_k = 0 may hold for nonzero functions; for example, any constant c satisfies
||c||_1 = 0. The space C∞([0, 1]) is a complete metric space with the metric given by

(3) d(f, g) = Σ_{k=0}^∞ 2^{−k} ||f − g||_k / (1 + ||f − g||_k),  f, g ∈ C∞([0, 1]).

Topological vector spaces equipped with a complete metric that comes from a countable
collection of semi-norms are called Fréchet spaces.
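For f(x) = e^x every derivative is e^x again, so all semi-norms ||f||_k equal e on [0, 1] and the metric (3) against g = 0 can be summed in closed form (a sketch, not part of the notes):

```python
import math

# d(e^x, 0) = sum_{k>=0} 2^{-k} * e/(1 + e) = 2e/(1 + e); truncating the
# geometric series at 200 terms leaves a negligible tail.
e = math.e
d = sum(2.0 ** (-k) * e / (1.0 + e) for k in range(200))
expected = 2.0 * e / (1.0 + e)
```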

Recall that a functional over a vector space X is a linear map φ : X → R.
Definition 4.9. Given a topological vector space X, the space of continuous linear functionals is
called the dual space of X, and denoted by X ∗ .
One can show that if a topological vector space X is finite-dimensional, then every linear map
on X is continuous. For infinite-dimensional vector spaces continuity of a functional is a nontrivial
condition.
Example 4.2.
(i) For a space Lp , p ≥ 1, choose some g ∈ Lq with 1/p + 1/q = 1. Then the map φg : Lp → R,
given by
(4) φg(f) = ⟨φg, f⟩ = ∫ f g,

is a continuous linear functional on Lp.
(ii) On C[−1, 1] the map I given by I(f) = ∫_{−1}^{1} f dx is a continuous linear functional. Another
example of a continuous linear functional on C[−1, 1] is given by δ0(f) = f(0). This is the
famous Dirac delta-function.
(iii) On C∞[−1, 1] consider the map Φ : C∞[−1, 1] → R given by

⟨Φ, f⟩ = δ0(f) + (∂f/∂x)(0).

Continuity of this functional can be verified from the notion of convergence on the space
C∞[−1, 1] given by the metric in (3).
Lemma 4.10. Let X be a topological vector space, and φ be a linear functional. Then φ ∈ X ∗ if
and only if φ is continuous at some point x ∈ X.
Proof. For the proof in the nontrivial direction, let y ∈ X be arbitrary. For a given ε > 0, choose
a neighbourhood U of x such that

|φ(x) − φ(x′)| < ε  for all x′ ∈ U.

Then the set V = U + (y − x) is a neighbourhood of y. If z ∈ V, then z + x − y ∈ U, and so

|φ(z) − φ(y)| = |φ(z − y + x) − φ(x)| < ε,

which shows continuity of φ at y. □
Thus, it suffices to test continuity of a functional at one point, for example, at the origin. Using
this, one can show that continuity of a functional on a topological vector space is equivalent to its
boundedness in some neighbourhood of the origin.
The dual space is itself a topological vector space. Its topology can be defined as follows. We
say that a sequence φn ∈ X∗ converges to φ ∈ X∗ if for any x ∈ X we have lim φn(x) = φ(x). This
is the so-called weak∗ topology. It is the weakest topology that makes the pairing X × X∗ → R a
continuous operation.
For a normed space X its dual space is also normed. The norm on X∗ is given by

||φ||∗ = sup_{f ∈ X \ {0}} |φ(f)| / ||f||.

For example, if g ∈ Lq, then the functional φg given by (4) has norm equal to that of
g: ||φg||∗ = ||g||_q. Indeed, by Hölder’s inequality,

∫ f g ≤ ∫ |f g| ≤ ||f||_p ||g||_q,

which shows that ||φg||∗ ≤ ||g||_q. On the other hand, if f = |g|^{q/p} sgn g, then |f|^p = |g|^q = f g, and
so ||f||_p = ||g||_q^{q/p}. Therefore,

⟨φg, f⟩ = ∫ f g = ∫ |g|^q = (||g||_q)^q = ||g||_q ||f||_p,

which proves our assertion. In fact, the following holds.


Theorem 4.11 (Riesz Representation theorem). A linear map φ : Lp → R is a continuous linear
functional on Lp if and only if there exists g ∈ Lq, 1/p + 1/q = 1, such that for all f ∈ Lp,

⟨φ, f⟩ = ∫ g f.
We do not give the proof of this theorem. Observe that now we have two topologies on the space
Lq: the normed topology and the weak∗ topology. These two are not the same, weak∗ being
weaker than the normed topology (sometimes called the strong topology), i.e., the weak∗ topology
has fewer open sets. To see this it is enough to construct an example of a sequence in Lq that
converges weakly, but not in norm. Consider the space L2(0, 1), which is the dual of itself, and
consider the sequence

φn(x) = √n for x ∈ (0, 1/n),  φn(x) = 0 for x ∈ [1/n, 1).

Clearly, φn ∈ L2(0, 1), and φn(x) → 0 pointwise on (0, 1) as n → ∞. We use the following fact: if
a sequence φn converges to φ pointwise and converges to a function φ′ in norm, then φ = φ′ (prove
it!). From this it follows that the sequence φn does not converge in norm, since for any n we have

||φn − 0||_2 = ( ∫_0^1 φn^2 )^{1/2} = ( ∫_0^{1/n} n )^{1/2} = 1.

But φn converges to 0 in the weak∗ topology because for any f ∈ L2(0, 1),

⟨φn, f⟩ = ∫_0^1 φn f = ∫_0^{1/n} √n f ≤ ( ∫_0^{1/n} f^2 )^{1/2},

by Hölder’s inequality. As n → ∞ the right-hand side in the above formula converges to zero for
every f ∈ L2, which gives weak convergence φn → 0. On the other hand, by Hölder’s inequality,
convergence in norm always implies convergence in the weak∗ topology.
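The sequence φ_n above can be paired numerically against a fixed f ∈ L²(0, 1) (a sketch, not from the notes; f(x) = 1 + x is an arbitrary choice):

```python
import math

# <phi_n, f> = ∫_0^{1/n} sqrt(n) f(x) dx, approximated by the midpoint rule.
# Each ||phi_n||_2 equals (n * 1/n)^{1/2} = 1, yet the pairings tend to 0.
def pairing(n, f, samples=20000):
    h = 1.0 / n / samples
    return sum(math.sqrt(n) * f((i + 0.5) * h) for i in range(samples)) * h

vals = [pairing(n, lambda x: 1.0 + x) for n in (1, 100, 10000)]
# vals decreases roughly like 1/sqrt(n)
```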
4.3. Product spaces and Fubini’s theorem. Let (RN , m) and (Rn , µ) be the Euclidean spaces
with the corresponding Lebesgue measures. Then on the space RN × Rn = RN +n we may define a
new measure λ as follows: If A ⊂ RN and B ⊂ Rn are measurable, then
λ(A × B) := m(A) · µ(B).
One can show that λ can be extended from the collection of product sets in RN +n to a wide class
of sets in RN +n . In fact, one can prove that the measure λ obtained this way is nothing but the
Lebesgue measure on RN +n .
The following theorem gives sufficient conditions that allows one to replace an integral with
respect to a product measure by an iterated integral. We use dx (resp. dy) to denote integration
with respect to measure m (resp. µ), and dx dy to denote integration with respect to λ.
Theorem 4.12 (Fubini’s theorem). Let X × Y ⊂ (RN, m) × (Rn, µ) be measurable, and f ∈
L1(X × Y, λ). Then
(i) for a.e. x, the function fx(y) = f(x, y) is integrable on Y;
(ii) for a.e. y, the function fy(x) = f(x, y) is integrable on X;
(iii) the function x → ∫_Y fx(y) dy is integrable on X;
(iv) the function y → ∫_X fy(x) dx is integrable on Y;
(v) ∫_X ( ∫_Y fx(y) dy ) dx = ∫_{X×Y} f(x, y) dx dy = ∫_Y ( ∫_X fy(x) dx ) dy.
For the proof of Fubini’s theorem one can reduce the problem to simple functions and then use
the Lebesgue convergence theorem. We omit the details. Below we state a variation of Fubini’s
theorem that does not require integrability of the function f .
Theorem 4.13 (Tonelli’s theorem). Let X × Y ⊂ (RN, m) × (Rn, µ) be measurable, and f : X × Y →
R be a nonnegative measurable function. Then
(i) for a.e. x, the function fx(y) = f(x, y) is measurable on Y;
(ii) for a.e. y, the function fy(x) = f(x, y) is measurable on X;
(iii) the function x → ∫_Y fx(y) dy is measurable on X;
(iv) the function y → ∫_X fy(x) dx is measurable on Y;
(v) ∫_X ( ∫_Y fx(y) dy ) dx = ∫_{X×Y} f(x, y) dx dy = ∫_Y ( ∫_X fy(x) dx ) dy.
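The conclusion (v) can be sanity-checked on a product grid (a sketch, not from the notes): for f(x, y) = x y² on [0, 1]² the double sum and both iterated sums agree and approximate (1/2)(1/3) = 1/6.

```python
# Midpoint-rule approximations of the double and iterated integrals.
N = 200
h = 1.0 / N
pts = [(i + 0.5) * h for i in range(N)]

double = sum(x * y * y for x in pts for y in pts) * h * h
iterated_xy = sum(sum(x * y * y for y in pts) * h for x in pts) * h
iterated_yx = sum(sum(x * y * y for x in pts) * h for y in pts) * h
```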

In the remaining part of the subsection we discuss another property of product spaces: the
possibility of interchanging integration and differentiation.
Lemma 4.14. Let X × Y ⊂ RN × Rn be a product of open subsets, and let m(X) < ∞. Suppose
f(x, y) : X × Y → R is uniformly continuous on the closure of X × Y. Then the function

(5) F(y) = ∫_X f(x, y) dx

is continuous on Y.

Proof. Uniform continuity of f on X × Y means that for any ε > 0 there exists δ > 0 such that

|f(x, y) − f(x′, y′)| < ε whenever |(x, y) − (x′, y′)| < δ.

Then, for |y − y′| < δ,

|F(y) − F(y′)| = | ∫_X (f(x, y) − f(x, y′)) dx | ≤ ∫_X |f(x, y) − f(x, y′)| dx ≤ ε m(X),

which proves continuity of F. □
The next theorem gives a sufficient condition under which we can differentiate under the integral
sign. It can also be interpreted as commutativity of the operations of integration and differentiation
under the given assumptions.
Theorem 4.15. Let X, Y be as in the previous lemma. Assume that the function f(x, y) and the
partial derivatives ∂f/∂yi, 1 ≤ i ≤ n, are uniformly continuous on the closure of X × Y. Then the
function F(y) defined by (5) is of class C1(Y), and

∂F/∂yi (y) = ∫_X ∂f/∂yi (x, y) dx.

Proof. Let y0 ∈ Y be arbitrary. For h = (0, . . . , 0, hi, 0, . . . , 0) ∈ Rn, by the Mean Value theorem
we have

(F(y0 + h) − F(y0))/hi = ∫_X (f(x, y0 + h) − f(x, y0))/hi dx = ∫_X ∂f/∂yi (x, y0 + θh) dx,  θ ∈ (0, 1).

Note that θ depends on x. Since ∂f/∂yi is uniformly continuous on X × Y, for any ε > 0 there
exists δ > 0 such that

| ∂f/∂yi (x, y) − ∂f/∂yi (x, y0) | < ε whenever |y − y0| < δ.

Thus, for |h| = |hi| < δ, we have

| (F(y0 + h) − F(y0))/hi − ∫_X ∂f/∂yi (x, y0) dx | = | ∫_X ( ∂f/∂yi (x, y0 + θh) − ∂f/∂yi (x, y0) ) dx | ≤

∫_X | ∂f/∂yi (x, y0 + θh) − ∂f/∂yi (x, y0) | dx ≤ ε m(X).

This shows that

lim_{hi→0} (F(y0 + h) − F(y0))/hi = ∫_X ∂f/∂yi (x, y0) dx.

Finally, the continuity of ∂F/∂yi follows from the lemma. □
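Theorem 4.15 can be observed numerically (a sketch, not part of the notes): for F(y) = ∫_0^1 sin(xy) dx, a finite-difference derivative of F agrees with the integral of ∂f/∂y.

```python
import math

def F(y, N=2000):
    # midpoint-rule approximation of ∫_0^1 sin(x y) dx
    h = 1.0 / N
    return sum(math.sin((i + 0.5) * h * y) for i in range(N)) * h

def dF_under_integral(y, N=2000):
    # ∫_0^1 x cos(x y) dx, i.e. differentiation carried out under the integral
    h = 1.0 / N
    return sum((i + 0.5) * h * math.cos((i + 0.5) * h * y) for i in range(N)) * h

y0, eps = 0.7, 1e-5
finite_diff = (F(y0 + eps) - F(y0 - eps)) / (2 * eps)
under_integral = dF_under_integral(y0)
```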
v: 2019-02-05

REAL ANALYSIS LECTURE NOTES

RASUL SHAFIKOV

5. Divergence theorem and consequences
5.1. Integration on hypersurfaces. By a hypersurface Γ of class Ck in Rn, k ∈ Z+, we mean
a compact subset of Rn admitting a finite covering by open connected subsets Uj: Γ ⊂ ∪_{j=1}^N Uj,
with the following property: for every j there exists a function ρj ∈ Ck(Uj) such that the gradient
∇ρj(x) = (∂ρj/∂x1, . . . , ∂ρj/∂xn) does not vanish in Uj and Γ ∩ Uj = {x ∈ Uj : ρj(x) = 0}. Such a
function ρj is called a local defining function of Γ. Very often we will deal with the case when
Γ = {x ∈ Rn : ρ(x) = 0, ∇ρ ≠ 0}, where ρ is a Ck-function on Rn.
Another way to define a hypersurface is through parametrization. Let D be an open connected
subset of Rn−1 and Φ = (Φ1 , ..., Φn ) : D → Rn be an injective map of class C k (D). The hypersurface
Γ = Φ(D) is called a parametrized hypersurface.
Example 5.1. On R2 with coordinates (x, y) consider, for some k ∈ Z+,

ρ(x, y) = y for x ≤ 0,  ρ(x, y) = y − x^k for x > 0.

Then Γ ⊂ R2 given by ρ(x, y) = 0 is a hypersurface of class C^{k−1}. It admits a global C^{k−1}-smooth
parametrization Φ : R → Γ given by x ↦ (x, x^k) for x > 0, and x ↦ (x, 0) for x ≤ 0. □
Let now Ω be a bounded domain (an open connected subset) of Rn with the boundary ∂Ω
consisting of a finite number of disjoint hypersurfaces Γk of class C 1 . A local defining function ρj
for Γk as defined above is called a local defining function of Ω if Ω ∩ Uj = {x ∈ Uj : ρj (x) < 0}.
Then the gradient vector ∇ρj (x) defines the outward-pointing normal direction to ∂Ω at a point
x ∈ ∂Ω. We denote by
\[
\vec{n}(x) := \frac{\nabla \rho(x)}{|\nabla \rho(x)|}
\]
the unit vector in the outward-pointing normal direction. Let p = (p1 , ..., pn ) be a boundary point
of Ω and let ρ be a local defining function of ∂Ω near p. Since ∇ρ(p) 6= 0, we have ∂ρ(p)/∂xk 6= 0 for
some 1 ≤ k ≤ n. By the Implicit Function theorem there exist a neighbourhood U of p and a function
ψ of class C 1 such that
(1) ∂Ω ∩ U = {x ∈ U : xk = ψ(x1 , ..., xk−1 , xk+1 , ..., xn )}.
Shrinking U if necessary we may assume that $U = U' \times U''$, where $U'$ is a ball in the space Rn−1
centred at (p1 , ..., pk−1 , pk+1 , ..., pn ) and $U''$ is an interval in R centred at pk . This representation
allows us to view (x1 , ..., xk−1 , xk+1 , ..., xn ) as local coordinates on ∂Ω: the projection
\[
\pi_k : \partial\Omega \cap U \to U', \qquad x \mapsto (x_1, \dots, x_{k-1}, x_{k+1}, \dots, x_n),
\]
is bijective. We point out that
\[
\pi_k^{-1}(x_1, \dots, x_{k-1}, x_{k+1}, \dots, x_n) = (x_1, \dots, x_{k-1}, \psi(x_1, \dots, x_{k-1}, x_{k+1}, \dots, x_n), x_{k+1}, \dots, x_n)
\]
for $(x_1, \dots, x_{k-1}, x_{k+1}, \dots, x_n) \in U'$. The map $\pi_k^{-1} : U' \to \partial\Omega \cap U$ is clearly a local parametrization
of the hypersurface ∂Ω. We call U a coordinate neighbourhood of p.
Example 5.2. The upper hemisphere S + = S 2 ∩ {z > 0} in R3 with coordinates (x, y, z) is the
graph of the function $z = \sqrt{1 - x^2 - y^2}$. The unit normal vector to S + at a point (x, y, z) ∈ S + is
~n = (x, y, z). 
Now let f be a continuous (this assumption can be considerably weakened) function on ∂Ω. Our
goal is to define the integral of f over ∂Ω as a surface integral. If an open set X ⊂ ∂Ω admits a
parametrization Φ : D → Rn , Φ(D) = X ⊂ ∂Ω, then we define
\[
(2) \qquad \int_X f(x)\, dS = \int_D f \circ \Phi(t)\, |\vec{N}(t)|\, dt,
\]
where the vector $\vec{N}$ is determined from the formal determinant $\vec{N} = \det(\nabla \Phi_1, \dots, \nabla \Phi_n, \vec{e}\,)$. Here
$\vec{e} = (\vec{e}_1, \dots, \vec{e}_n)$ is a formal vector whose coordinates are the vectors of the standard basis in Rn .
In fact, one can show that $\vec{N}$ is the normal vector to X ⊂ ∂Ω.
Now if U is a coordinate neighbourhood where ∂Ω admits representation as in (1) and X is an
open subset in ∂Ω ∩ U , then
\[
(3) \qquad \int_X f\, dS = \int_{\pi_k(X)} f \circ \pi_k^{-1}\, \big(1 + \|\nabla\psi\|^2\big)^{1/2}\, dx_1 \dots dx_{k-1}\, dx_{k+1} \dots dx_n .
\]
Both definitions agree because $(1 + \|\nabla\psi\|^2)^{1/2}$ is just the length of the normal vector
\[
(4) \qquad \vec{N} = \left( \frac{\partial \psi}{\partial x_1}, \dots, 1, \dots, \frac{\partial \psi}{\partial x_n} \right),
\]
(here 1 is in the k-th position) corresponding to the local parametrization of ∂Ω. We refer to dS
or the equivalent expression in a local parametrization as the hypersurface area measure (or the
element of the surface area in some literature). Let νk be the angle between $\vec{n} = \vec{N}/\|\vec{N}\|$ and the
vector ~ek (the k-th vector of the standard basis of Rn ). Then
\[
\cos \nu_k = (\vec{e}_k, \vec{n}) = \big(1 + \|\nabla\psi\|^2\big)^{-1/2} .
\]
Thus,
\[
(5) \qquad \int_X f\, dS = \int_{\pi_k(X)} f \circ \pi_k^{-1}\, \frac{1}{\cos \nu_k}\, dx_1 \dots dx_{k-1}\, dx_{k+1} \dots dx_n .
\]
If f ≡ 1, then the integral $\int_X dS$ represents the area of X. This terminology comes from R3 , where
the integral is indeed the area of a surface, while for n > 3, it is actually the (n − 1)-dimensional
volume.
Example 5.3. Consider the surface integral of a continuous function f (x, y, z) over the upper
hemisphere S + = S 2 ∩ {z > 0} ⊂ R3 . First we use the parametrization $z = \psi(x, y) = \sqrt{1 - x^2 - y^2}$
for x2 + y 2 < 1. Then
\[
1 + |\nabla\psi|^2 = 1 + \frac{x^2}{1 - x^2 - y^2} + \frac{y^2}{1 - x^2 - y^2} = \frac{1}{1 - x^2 - y^2} .
\]
Therefore, from (3) we obtain
\[
\int_{S^+} f\, dS = \int_{\{x^2 + y^2 < 1\}} f\big(x, y, \sqrt{1 - x^2 - y^2}\big)\, \frac{dx\, dy}{\sqrt{1 - x^2 - y^2}} .
\]
Now we use the parametrization of S + that comes from the spherical coordinates. Let Φ : R2 → R3
be given by
\[
\Phi(\theta, \phi) = (\sin\theta \cos\phi, \sin\theta \sin\phi, \cos\theta).
\]
Then Φ ((0, π/2) × (0, 2π)) = S + (excluding a set of measure 0). To apply (2) we first compute the
vectors of partial derivatives with respect to θ and φ. We have
\[
\Phi_\theta = (\cos\theta \cos\phi, \cos\theta \sin\phi, -\sin\theta), \qquad \Phi_\phi = (-\sin\theta \sin\phi, \sin\theta \cos\phi, 0).
\]
Then
\[
\vec{N} = \det \begin{pmatrix} \vec{e}_1 & \vec{e}_2 & \vec{e}_3 \\ \cos\theta\cos\phi & \cos\theta\sin\phi & -\sin\theta \\ -\sin\theta\sin\phi & \sin\theta\cos\phi & 0 \end{pmatrix} = \sin\theta\, (\cos\phi \sin\theta, \sin\phi \sin\theta, \cos\theta),
\]
and so $|\vec{N}| = \sin\theta$. We conclude that
\[
\int_{S^+} f\, dS = \int_{(0,\pi/2) \times (0,2\pi)} f(\sin\theta\cos\phi, \sin\theta\sin\phi, \cos\theta)\, \sin\theta\, d\theta\, d\phi.
\]

That both integrals agree can be verified, for example, by calculating the surface area of S + using
these two representations of the surface integral. 
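The comparison suggested in Example 5.3 is easy to carry out numerically. The sketch below (plain Python, an illustration rather than part of the notes) computes the area of the spherical cap {z > 1/2} on the unit sphere with both parametrizations; the cap is chosen instead of the full hemisphere so that the graph integrand stays bounded. Both computations should return the exact value 2π(1 − 1/2) = π.

```python
import math

# (a) spherical coordinates: the cap {z > 1/2} corresponds to
#     theta in (0, pi/3), and the area element is sin(theta) dtheta dphi
n = 2000
dth = (math.pi / 3) / n
cap_sph = 2 * math.pi * sum(math.sin((i + 0.5) * dth) for i in range(n)) * dth

# (b) graph parametrization z = sqrt(1 - x^2 - y^2): by formula (3) we
#     integrate 1/sqrt(1 - x^2 - y^2) over the disc x^2 + y^2 < 3/4
#     (midpoint rule on a square grid)
m = 1000
a = math.sqrt(3.0) / 2            # radius of the disc
h = 2 * a / m
cap_graph = 0.0
for i in range(m):
    x = -a + (i + 0.5) * h
    for j in range(m):
        y = -a + (j + 0.5) * h
        r2 = x * x + y * y
        if r2 < a * a:
            cap_graph += h * h / math.sqrt(1.0 - r2)
```

Both `cap_sph` and `cap_graph` approximate π; the graph computation converges more slowly because of the ragged discretization of the disc boundary.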
Finally, if {Uj } is an open covering of ∂Ω by coordinate neighbourhoods, we set $X_k = (\partial\Omega \cap U_k) \setminus \cup_{j=1}^{k-1} U_j$,
so that ∂Ω = ∪k Xk and the Xk are disjoint. Then we set
\[
\int_{\partial\Omega} f\, dS = \sum_k \int_{X_k} f\, dS.
\]

One can view this as a definition of the surface integral over ∂Ω. It is not difficult to verify
that the integral is well-defined, i.e., it is independent of the choice of the covering by coordinate
neighbourhoods, local defining functions, etc. We leave this verification as an exercise for the
reader.

5.2. Divergence theorem. The following theorem connects the integral over a domain Ω with
the surface integral over its boundary ∂Ω. It was discussed in some form in the work of Lagrange,
Gauss, and most notably Ostrogradski, who gave a proof that would be considered complete by
modern standards. It is sometimes referred to as the Gauss-Ostrogradski theorem.
Recall that a vector field F on a domain Ω ⊂ Rn is simply a map F : Ω → Rn . The geometric
interpretation of a vector field (which becomes nontrivial and important when one considers abstract
manifolds) is that at each point x ∈ Ω the value F (x) is thought of as a vector in Rn originating
at x. For example, given a function f : Ω → R, the gradient ∇f is a vector field on Ω. Another
example is the vector field given by (4), assigning to every boundary point of ∂Ω a normal vector
$\vec{N}$ to ∂Ω. The divergence of a vector field $\vec{F}$ is defined as
\[
\operatorname{div} \vec{F} = \frac{\partial F_1}{\partial x_1} + \dots + \frac{\partial F_n}{\partial x_n} .
\]

Theorem 5.1 (Divergence theorem). Let Ω be a bounded domain in Rn with boundary of class
C 1 , and let $\vec{F} = (F_1, \dots, F_n)$ be a vector field of class $C(\overline{\Omega}) \cap C^1(\Omega)$. Then
\[
(6) \qquad \int_\Omega \operatorname{div} \vec{F}\, dx = \int_{\partial\Omega} (\vec{F}, \vec{n})\, dS,
\]
where (a, b) denotes the usual scalar product of two vectors in Rn and ~n denotes the vector field of
the outward-pointing unit normals to ∂Ω.
For n = 1 the Divergence theorem becomes the Fundamental Theorem of Calculus.
Proof. For simplicity of notation we assume that n = 3; the proof in the general case is completely
analogous. We will assume that Ω = {(x, y, z) : (x, y) ∈ D, ψ1 (x, y) < z < ψ2 (x, y)}, where
D is a domain in R2 , and ψ1 and ψ2 are smooth functions on D. Moreover, we assume that a
similar representation is also valid for projections onto the other two coordinate planes. Such
domains are sometimes called simple. If the domain Ω is not simple in all three directions, then
we may divide it into smaller domains Ωi which are simple. Adding the results for each i gives the
Divergence theorem for Ω and ∂Ω. Indeed, since after splitting Ω the surface integrals over the
newly introduced boundaries occur twice with the opposite normal vectors ~n, their sum is equal to
zero, and we end up with the surface integral over the original ∂Ω.
Denote by Γj the surface
Γj = {(x, y, ψj (x, y)) ∈ R3 : (x, y) ∈ D}, j = 1, 2.
Then, by Fubini’s theorem,
\[
\int_\Omega \frac{\partial F_3(x, y, z)}{\partial z}\, dx\, dy\, dz = \int_D \left( \int_{\psi_1(x,y)}^{\psi_2(x,y)} \frac{\partial F_3(x, y, z)}{\partial z}\, dz \right) dx\, dy =
\int_D F_3(x, y, \psi_2(x, y))\, dx\, dy - \int_D F_3(x, y, \psi_1(x, y))\, dx\, dy = \int_{\Gamma_1 \cup \Gamma_2} F_3(x, y, z) \cos\nu_3\, dS,
\]
where ν3 is the angle between the vector ~e3 and the normals $\big( \frac{\partial \psi_2}{\partial x}, \frac{\partial \psi_2}{\partial y}, 1 \big)$ when (x, y, z) ∈ Γ2 and
$\big( -\frac{\partial \psi_1}{\partial x}, -\frac{\partial \psi_1}{\partial y}, -1 \big)$ when (x, y, z) ∈ Γ1 , respectively. In the last step we used (5).
Let now Γ3 = {(x, y, z) : (x, y) ∈ ∂D, ψ1 (x, y) < z < ψ2 (x, y)} be the “vertical” part of ∂Ω.
Let ν3 still denote the angle between ~e3 and the outward-pointing unit normal vector to Γ3 . Then
ν3 = π/2 and cos ν3 = 0, so that
\[
\int_{\Gamma_3} F_3(x, y, z) \cos\nu_3\, dS = 0.
\]
Since ∂Ω = Γ1 ∪ Γ2 ∪ Γ3 we can write
\[
\int_{\partial\Omega} F_3(x, y, z) \cos\nu_3\, dS = \int_{\Gamma_1 \cup \Gamma_2} F_3(x, y, z) \cos\nu_3\, dS,
\]
and therefore,
\[
(7) \qquad \int_\Omega \frac{\partial F_3(x, y, z)}{\partial z}\, dx\, dy\, dz = \int_{\partial\Omega} F_3(x, y, z) \cos\nu_3\, dS.
\]
Similarly, we establish the formulas
\[
(8) \qquad \int_\Omega \frac{\partial F_1(x, y, z)}{\partial x}\, dx\, dy\, dz = \int_{\partial\Omega} F_1(x, y, z) \cos\nu_1\, dS,
\]
and
\[
(9) \qquad \int_\Omega \frac{\partial F_2(x, y, z)}{\partial y}\, dx\, dy\, dz = \int_{\partial\Omega} F_2(x, y, z) \cos\nu_2\, dS,
\]
where ν1 and ν2 are the angles between the outward-pointing unit normal to ∂Ω and the standard basis
vectors ~e1 and ~e2 . Taking the sum of (7), (8), (9) we obtain
\[
(10) \qquad \int_\Omega \operatorname{div} \vec{F}\, dx = \int_{\partial\Omega} \big( F_1(x, y, z) \cos\nu_1 + F_2(x, y, z) \cos\nu_2 + F_3(x, y, z) \cos\nu_3 \big)\, dS,
\]
which is precisely (6) for dimension 3. 
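As a quick sanity check of Theorem 5.1, the sketch below (an illustration, not part of the notes) verifies (6) on the unit cube Ω = (0, 1)³ for the hypothetical field F~ = (xy, yz, zx), for which div F~ = x + y + z. Both sides equal 3/2: the flux through the three faces x = 0, y = 0, z = 0 vanishes, and each of the opposite faces contributes 1/2.

```python
m = 50
h = 1.0 / m
mid = [(i + 0.5) * h for i in range(m)]   # midpoints of a uniform grid

# left-hand side of (6): volume integral of div F = x + y + z (midpoint rule)
lhs = sum((x + y + z) * h**3 for x in mid for y in mid for z in mid)

# right-hand side of (6): flux through the faces; outward normals are +-e_k,
# and only the faces x = 1, y = 1, z = 1 contribute, since F1 = x*y vanishes
# at x = 0, F2 = y*z at y = 0, and F3 = z*x at z = 0
flux = (sum(y * h**2 for y in mid for z in mid)      # F1 = x*y at x = 1
        + sum(z * h**2 for z in mid for x in mid)    # F2 = y*z at y = 1
        + sum(x * h**2 for x in mid for y in mid))   # F3 = z*x at z = 1
```

Because all the integrands here are affine in each cell, the midpoint rule is exact and both sums agree with 3/2 up to rounding.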
In conclusion we mention some useful consequences of the Divergence theorem. Let f be a function
of class $C(\overline{\Omega}) \cap C^1(\Omega)$. Applying the Divergence theorem to the vector field $\vec{F} = f \vec{e}_k$ we obtain
\[
(11) \qquad \int_\Omega \frac{\partial f}{\partial x_k}(x)\, dx = \int_{\partial\Omega} f(x)\, (\vec{e}_k, \vec{n}(x))\, dS.
\]
Let f = u · v. Then, since
\[
\frac{\partial u}{\partial x_k} v = \frac{\partial (uv)}{\partial x_k} - u \frac{\partial v}{\partial x_k},
\]
formula (11) gives
\[
(12) \qquad \int_\Omega \frac{\partial u}{\partial x_k}(x)\, v(x)\, dx = \int_{\partial\Omega} u(x) v(x)\, (\vec{e}_k, \vec{n}(x))\, dS - \int_\Omega u \frac{\partial v}{\partial x_k}\, dx,
\]
which is just the multidimensional integration by parts formula. Since (~ek , ~n) = cos νk , where νk is
the angle between the vectors ~ek and ~n, the integration by parts formula can be rewritten as
\[
\int_\Omega \frac{\partial u}{\partial x_k}(x)\, v(x)\, dx = \int_{\partial\Omega} u(x) v(x) \cos\nu_k\, dS - \int_\Omega u \frac{\partial v}{\partial x_k}\, dx.
\]
Recall that the Laplacian of a C 2 -smooth function u(x1 , . . . , xn ) is the function
\[
\Delta u = \sum_{j=1}^{n} \frac{\partial^2 u}{\partial x_j^2}(x).
\]
Consider now two functions u and v of class $C^2(\Omega) \cap C^1(\overline{\Omega})$ such that their Laplacians ∆u and ∆v
are integrable in Ω. Clearly,
\[
\Delta u = \operatorname{div}(\nabla u).
\]
Furthermore, for every boundary point x ∈ ∂Ω the scalar product (∇u(x), ~n(x)) coincides with the
directional derivative $\frac{\partial u}{\partial \vec{n}}$. On the other hand,
\[
v \Delta u = v \operatorname{div}(\nabla u) = \operatorname{div}(v \nabla u) - (\nabla u, \nabla v).
\]
Integrating this identity over Ω and applying the Divergence theorem we obtain the first Green’s
formula:
\[
(13) \qquad \int_\Omega v \Delta u\, dx = \int_{\partial\Omega} v \frac{\partial u}{\partial \vec{n}}\, dS - \int_\Omega (\nabla u, \nabla v)\, dx.
\]
Similarly we have
\[
\int_\Omega u \Delta v\, dx = \int_{\partial\Omega} u \frac{\partial v}{\partial \vec{n}}\, dS - \int_\Omega (\nabla v, \nabla u)\, dx.
\]
Subtracting this last equality from (13) we obtain the second Green’s formula
\[
(14) \qquad \int_\Omega (v \Delta u - u \Delta v)\, dx = \int_{\partial\Omega} \left( v \frac{\partial u}{\partial \vec{n}} - u \frac{\partial v}{\partial \vec{n}} \right) dS.
\]
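The Green's formulas are easy to test in the simplest setting. In dimension one, with Ω = (0, 1), the boundary consists of the two points 0 and 1, the outward normal derivative is −d/dx at 0 and +d/dx at 1, and dS is the counting measure. The sketch below (a hypothetical pair u = x², v = x³, chosen only for illustration; not part of the notes) checks the second Green's formula (14) numerically; both sides equal −1.

```python
u, v = lambda x: x**2, lambda x: x**3
du, dv = lambda x: 2 * x, lambda x: 3 * x**2
d2u, d2v = lambda x: 2.0, lambda x: 6 * x     # u'' and v''

# left-hand side: midpoint-rule integral of v*u'' - u*v'' = -4x^3 over (0,1)
n = 100000
h = 1.0 / n
lhs = sum((v(x) * d2u(x) - u(x) * d2v(x)) * h
          for x in ((i + 0.5) * h for i in range(n)))

# right-hand side: boundary term, with outward normal derivative
# +d/dx at x = 1 and -d/dx at x = 0
rhs = (v(1) * du(1) - u(1) * dv(1)) + (-v(0) * du(0) + u(0) * dv(0))
```

Here lhs ≈ ∫₀¹ (−4x³) dx = −1 and rhs = (1·2 − 1·3) + 0 = −1.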
5.3. Change of variables in the integral. The following theorem is the multidimensional version
of the substitution rule in the integral.
Theorem 5.2. Let $\Phi : \overline{\Omega'} \to \overline{\Omega}$ be a C 1 -diffeomorphism between two domains in Rn with C 1 -smooth
boundary, and let f ∈ L1 (Ω). Then
\[
\int_\Omega f(x)\, dx = \int_{\Omega'} f \circ \Phi(y)\, |J_\Phi(y)|\, dy,
\]
where |JΦ | denotes the absolute value of the determinant of the differential (Jacobian matrix) of the map Φ.
Proof. We will give the proof for n = 3. Assume that the coordinates in Ω are (x, y, z) and (u, v, w)
in Ω′ . Suppose ∂Ω is parametrized by a function
\[
\phi : (s, t) \mapsto (x(s, t), y(s, t), z(s, t)), \qquad (s, t) \in D \subset \mathbb{R}^2 .
\]
Then after substituting φ into Φ, the hypersurface ∂Ω′ is given by some function
\[
(s, t) \mapsto (u(s, t), v(s, t), w(s, t)).
\]
Define the function
\[
F(x, y, z) = \int_0^z f(x, y, \xi)\, d\xi.
\]
Then $\frac{\partial F}{\partial z} = f$ in Ω. We will use the notation $\frac{\partial(x, y)}{\partial(s, t)}$ to denote the determinant of the Jacobian
matrix obtained by taking partial derivatives of the functions x(s, t) and y(s, t) with respect to the
variables (s, t). Similar notation will be used for any other collection of functions and variables.
Set
\[
\vec{N} = (N_1, N_2, N_3) = \left( \frac{\partial(y, z)}{\partial(s, t)}, \frac{\partial(z, x)}{\partial(s, t)}, \frac{\partial(x, y)}{\partial(s, t)} \right),
\]
and
\[
\vec{N}' = (N_1', N_2', N_3') = \left( \frac{\partial(v, w)}{\partial(s, t)}, \frac{\partial(w, u)}{\partial(s, t)}, \frac{\partial(u, v)}{\partial(s, t)} \right).
\]
We claim that
\[
(15) \qquad N_3 = \frac{\partial(\Phi_1, \Phi_2)}{\partial(v, w)} N_1' + \frac{\partial(\Phi_1, \Phi_2)}{\partial(w, u)} N_2' + \frac{\partial(\Phi_1, \Phi_2)}{\partial(u, v)} N_3' .
\]
This can be verified by writing
\[
x = \Phi_1(u(s, t), v(s, t), w(s, t)), \qquad y = \Phi_2(u(s, t), v(s, t), w(s, t)),
\]
differentiating these equations with respect to s and t and then substituting into $N_3 = x_s y_t - x_t y_s$.
The vectors $\vec{N}$ and $\vec{N}'$ are normal to ∂Ω and ∂Ω′ respectively; say, $\vec{N}$ is the outward-pointing
normal to ∂Ω, and $\vec{N}'$ is the inward-pointing normal to ∂Ω′ . Then
\[
(16) \qquad \vec{n} = \vec{N}/|\vec{N}| \quad \text{and} \quad \vec{n}' = -\vec{N}'/|\vec{N}'|
\]
are the corresponding unit normal vectors. By the Divergence theorem,
\[
\int_\Omega f\, dx\, dy\, dz = \int_\Omega \frac{\partial F}{\partial z}\, dx\, dy\, dz = \int_{\partial\Omega} F \cos(\vec{e}_3, \vec{n})\, dS = \int_D F N_3\, ds\, dt.
\]
Substitution of N3 from (15) gives
\[
\int_\Omega f\, dx\, dy\, dz = \int_D F \left( \frac{\partial(\Phi_1, \Phi_2)}{\partial(v, w)} N_1' + \frac{\partial(\Phi_1, \Phi_2)}{\partial(w, u)} N_2' + \frac{\partial(\Phi_1, \Phi_2)}{\partial(u, v)} N_3' \right) ds\, dt.
\]
Since the surface measure on ∂Ω′ is given by $dS' = |\vec{N}'|\, ds\, dt$ and since
\[
(N_1', N_2', N_3') = \big( -|\vec{N}'| \cos(\vec{e}_1, \vec{n}'), \; -|\vec{N}'| \cos(\vec{e}_2, \vec{n}'), \; -|\vec{N}'| \cos(\vec{e}_3, \vec{n}') \big),
\]
we get
\[
\int_\Omega f\, dx\, dy\, dz = - \int_{\partial\Omega'} F \left( \frac{\partial(\Phi_1, \Phi_2)}{\partial(v, w)} \cos(\vec{e}_1, \vec{n}') + \frac{\partial(\Phi_1, \Phi_2)}{\partial(w, u)} \cos(\vec{e}_2, \vec{n}') + \frac{\partial(\Phi_1, \Phi_2)}{\partial(u, v)} \cos(\vec{e}_3, \vec{n}') \right) dS'.
\]
Evaluating the last surface integral by the Divergence theorem, and using the relation
\[
\frac{\partial}{\partial u}\left( F \frac{\partial(\Phi_1, \Phi_2)}{\partial(v, w)} \right) + \frac{\partial}{\partial v}\left( F \frac{\partial(\Phi_1, \Phi_2)}{\partial(w, u)} \right) + \frac{\partial}{\partial w}\left( F \frac{\partial(\Phi_1, \Phi_2)}{\partial(u, v)} \right) = f\, \frac{\partial(\Phi_1, \Phi_2, \Phi_3)}{\partial(u, v, w)} = f J_\Phi ,
\]
we finally obtain
\[
\int_\Omega f\, dx\, dy\, dz = - \int_{\Omega'} f J_\Phi\, du\, dv\, dw.
\]
Since the Jacobian does not vanish in Ω′ , it is either positive or negative. Taking f ≡ 1 we see that
it is negative for our choice of the sign of the normal vectors in (16). Therefore, −JΦ = |JΦ |. The
proof for other choices of sign in (16) is similar. 
Example 5.4. Consider again the spherical coordinates in Rn :
\[
\Phi : (r, \theta_1, \dots, \theta_{n-1}) \mapsto (x_1, \dots, x_n), \qquad \Phi : (0, +\infty) \times (0, \pi) \times \dots \times (0, \pi) \times (0, 2\pi) \to \mathbb{R}^n ,
\]
where
\[
\begin{aligned}
x_1 &= r \cos\theta_1 , \\
x_2 &= r \sin\theta_1 \cos\theta_2 , \\
&\;\;\vdots \\
x_{n-1} &= r \sin\theta_1 \sin\theta_2 \dots \sin\theta_{n-2} \cos\theta_{n-1} , \\
x_n &= r \sin\theta_1 \sin\theta_2 \dots \sin\theta_{n-1} .
\end{aligned}
\]
Then, for a domain D ⊂ Rn ,
\[
\int_D f(x)\, dx = \int_{\Phi^{-1}(D)} f \circ \Phi(r, \theta)\, r^{n-1} (\sin\theta_1)^{n-2} \dots \sin\theta_{n-2}\, dr\, d\theta_1 \dots d\theta_{n-1} .
\]

If $\Gamma = R\, S^{n-1} = \{x \in \mathbb{R}^n : |x| = R\}$ is the sphere of radius R centred at the origin, then from (2)
we have
\[
\int_{R S^{n-1}} f(x)\, dS = R^{n-1} \int_{D'} f \circ \Phi(R, \theta)\, (\sin\theta_1)^{n-2} \dots \sin\theta_{n-2}\, d\theta_1 \dots d\theta_{n-1} ,
\]
with $D' = (0, \pi) \times \dots \times (0, \pi) \times (0, 2\pi)$. Suppose that f ∈ L1 (Rn ). Then rewriting the above
integral we obtain
\[
\int_{\mathbb{R}^n} f(x)\, dx = \int_0^\infty \left( \int_{r S^{n-1}} f(x)\, dS \right) dr.
\]
This can be interpreted as integration over a sphere of radius r, and then over all concentric spheres
for 0 < r < ∞. 
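The n = 2 case of Example 5.4 (polar coordinates, Jacobian r) gives the classical evaluation of the Gaussian integral: ∫_{R²} e^{−|x|²} dx = ∫₀^{2π} ∫₀^∞ e^{−r²} r dr dθ = π. A quick numerical sketch (an illustration, not part of the notes):

```python
import math

n = 4000
R = 8.0                       # the integrand is negligible beyond r = 8
dr = R / n
# midpoint rule in r for exp(-r^2) * r; the theta-integral over (0, 2*pi)
# contributes the constant factor 2*pi
val = 2 * math.pi * sum(math.exp(-r * r) * r
                        for r in ((i + 0.5) * dr for i in range(n))) * dr
```

Since ∫₀^∞ e^{−r²} r dr = 1/2, the value `val` approximates π.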
Let A ∈ O(n) be an orthogonal matrix, i.e., $A A^t = \mathrm{Id}$, and let A also denote the corresponding linear
transformation of Rn , A(t) = At. It follows from the formula of the change of variables in the integral
that
\[
\int_{S^{n-1}} f(x)\, dS = \int_{S^{n-1}} f \circ A(t)\, dS = \int_{S^{n-1}} f(At)\, dS.
\]
This property is often useful in computations of spherical integrals.
v: 2019-02-11
6. Differential Equations
6.1. Ordinary Differential Equations. Consider a system of d first order ordinary differential
equations (ODE for short) and an initial condition
\[
(1) \qquad y' = f(t, y), \qquad y(t_0) = y_0 ,
\]
where y = (y1 , . . . , yd ) is a vector of unknown functions of a real variable t, $y' = (dy_1/dt, \dots, dy_d/dt)$,
and $f : \Omega \to \mathbb{R}^d$ is a continuous map on a domain $\Omega \subset \mathbb{R}^{d+1}$. We say that y = y(t) defined on a
t-interval J containing t0 is a solution of the initial value problem (1) if y(t0 ) = y0 , (t, y(t)) ∈ Ω,
y(t) is differentiable, and $y'(t) = f(t, y(t))$ for t ∈ J. These requirements are equivalent to the
following: y(t0 ) = y0 , (t, y(t)) ∈ Ω, y(t) is continuous, and
\[
y(t) = y_0 + \int_{t_0}^t f(s, y(s))\, ds, \qquad t \in J.
\]
Here integration should be understood component-wise.
An initial value problem involving a system of equations of m-th order
\[
(2) \qquad z^{(m)} = F(t, z, z^{(1)}, \dots, z^{(m-1)}), \qquad z^{(j)}(t_0) = z_0^j \;\text{ for } j = 0, \dots, m - 1,
\]
where $z^{(j)} = d^j z / dt^j$, z and F are n-dimensional vectors, and F is defined on an (mn + 1)-dimensional
domain Ω, can be considered as a special case of (1), where y is a d = mn-dimensional vector:
symbolically, $y = (z, z^{(1)}, \dots, z^{(m-1)})$, or more precisely, $y = (z_1, \dots, z_n, z_1', \dots, z_n', z_1^{(2)}, \dots, z_n^{(m-1)})$.
Correspondingly,
\[
f(t, y) = (z^{(1)}, \dots, z^{(m-1)}, F(t, y)), \qquad y_0 = (z_0^0, \dots, z_0^{m-1}).
\]
For example, if n = 1, then z is a scalar, and (2) becomes
\[
y_1' = y_2 , \;\dots,\; y_{m-1}' = y_m , \;\; y_m' = F(t, y_1, \dots, y_m), \qquad y_j(t_0) = z_0^{j-1} \;\text{ for } j = 1, \dots, m,
\]
where $y_1 = z, \; y_2 = z', \; \dots, \; y_m = z^{(m-1)}$.
The most fundamental question concerning ODE (1) is the existence and uniqueness of solutions.
Example 6.1. Consider the initial value problem given by $y' = y^2$ and y(0) = c > 0. It is easy to
see that $y = \frac{c}{1 - ct}$ is a solution, but it exists only on the range −∞ < t < 1/c, which depends on
the initial condition. The initial value problem $y' = |y|^{1/2}$, y(0) = 0 has more than one solution; in
fact, it has a one-parameter family of solutions defined by y(t) = 0 for t ≤ c, y(t) = (t − c)2 /4 for
t ≥ c ≥ 0. 
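The blow-up in Example 6.1 can be seen numerically. The sketch below (a generic RK4 integrator; an illustration, not part of the notes) follows the solution of y′ = y², y(0) = 1 up to t = 0.9 and compares it with the exact solution 1/(1 − t), which already equals 10 there and tends to ∞ as t → 1⁻.

```python
def rk4(f, t0, y0, t1, steps):
    # classical fourth-order Runge-Kutta for a scalar ODE y' = f(t, y)
    h = (t1 - t0) / steps
    t, y = t0, y0
    for _ in range(steps):
        k1 = f(t, y)
        k2 = f(t + h / 2, y + h / 2 * k1)
        k3 = f(t + h / 2, y + h / 2 * k2)
        k4 = f(t + h, y + h * k3)
        y += h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        t += h
    return y

c = 1.0
approx = rk4(lambda t, y: y * y, 0.0, c, 0.9, 10000)  # stop before t = 1/c
exact = c / (1 - c * 0.9)                              # = 10
```

Pushing `t1` closer to 1 makes the numerical solution grow without bound, mirroring the finite interval of existence.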
The following result gives basic conditions when a local solution of an ODE exists and is unique.
Theorem 6.1. Let y, f ∈ Rd , let f (t, y) be continuous on R = {|t − t0 | ≤ a, |y − y0 | ≤ b}, and let f be
uniformly Lipschitz continuous with respect to y. Let M be a bound for |f (t, y)| on R, and set
α = min(a, b/M ). Then the initial value problem (1) has a unique solution y = y(t) on [t0 − α, t0 + α].

Proof. Let $y^0(t) \equiv y_0$. Suppose that $y^k(t)$ has been defined on [t0 − α, t0 + α], is continuous, and
satisfies $|y^k(t) - y_0| \le b$ for k = 0, 1, . . . , n. Put
\[
(3) \qquad y^{n+1}(t) = y_0 + \int_{t_0}^t f(s, y^n(s))\, ds.
\]
Then, since $f(t, y^n(t))$ is defined and continuous on [t0 − α, t0 + α], the same holds for $y^{n+1}(t)$. It
is clear that
\[
|y^{n+1}(t) - y_0| \le \left| \int_{t_0}^t |f(s, y^n(s))|\, ds \right| \le M\alpha \le b.
\]
Hence, $y^0(t), y^1(t), \dots$ are defined and continuous on [t0 − α, t0 + α], and $|y^n(t) - y_0| \le b$ for all n.
It will now be verified by induction that
\[
(4) \qquad |y^{n+1}(t) - y^n(t)| \le \frac{M K^n |t - t_0|^{n+1}}{(n+1)!}, \qquad t_0 - \alpha \le t \le t_0 + \alpha, \; n = 0, 1, \dots,
\]
where K is a Lipschitz constant for f . Clearly, (4) holds for n = 0. Assume that it holds up to
n − 1. Then
\[
y^{n+1}(t) - y^n(t) = \int_{t_0}^t \big[ f(s, y^n(s)) - f(s, y^{n-1}(s)) \big]\, ds, \qquad n \ge 1.
\]
Thus, the definition of K implies that
\[
|y^{n+1}(t) - y^n(t)| \le K \left| \int_{t_0}^t |y^n(s) - y^{n-1}(s)|\, ds \right|
\]
and so, by (4),
\[
|y^{n+1}(t) - y^n(t)| \le \frac{M K^n}{n!} \left| \int_{t_0}^t |s - t_0|^n\, ds \right| = \frac{M K^n |t - t_0|^{n+1}}{(n+1)!}.
\]
This proves (4) for general n. It follows from this inequality that
\[
y_0 + \sum_{n=0}^{\infty} \big[ y^{n+1}(t) - y^n(t) \big] =: y(t)
\]
is uniformly convergent on [t0 − α, t0 + α], i.e., we have a uniform limit
\[
(5) \qquad y(t) = \lim_{n \to \infty} y^n(t).
\]
Since f (t, y) is uniformly continuous on R, $f(t, y^n(t)) \to f(t, y(t))$ as n → ∞ uniformly on [t0 − α, t0 + α].
Thus, taking the limit in the integral in (3) gives
\[
(6) \qquad y(t) = y_0 + \int_{t_0}^t f(s, y(s))\, ds.
\]
Hence, (5) is a solution of (1).
In order to prove uniqueness, let y = z(t) be any solution of (1) on [t0 − α, t0 + α]. Then
\[
z(t) = y_0 + \int_{t_0}^t f(s, z(s))\, ds.
\]
An induction similar to that used above gives, using (3),
\[
|y^n(t) - z(t)| \le \frac{M K^n |t - t_0|^{n+1}}{(n+1)!} \qquad \text{for } t_0 - \alpha \le t \le t_0 + \alpha, \; n = 0, 1, \dots .
\]
Letting n → ∞, it follows from (5) that |y(t) − z(t)| ≤ 0, i.e., y(t) ≡ z(t). This proves the theorem. 
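The successive approximations (3) are easy to run on a computer. For the hypothetical problem y′ = y, y(0) = 1 (chosen only for illustration) the Picard iterates are exactly the Taylor partial sums of e^t, and the sketch below, which discretizes the integral in (3) by the trapezoid rule on a grid, converges to y(1) ≈ e. This is not part of the notes, just an illustration of the construction in the proof.

```python
import math

n = 1000                         # grid points on [0, 1]
h = 1.0 / n
y = [1.0] * (n + 1)              # y^0(t) = y_0 = 1 on the grid

for _ in range(30):              # Picard iterations (3) with f(t, y) = y
    integral = [0.0] * (n + 1)
    for i in range(n):           # cumulative trapezoid rule for int_0^t y(s) ds
        integral[i + 1] = integral[i] + h * (y[i] + y[i + 1]) / 2
    y = [1.0 + v for v in integral]
```

After 30 iterations the grid function `y` agrees with e^t up to the quadrature error, so `y[-1]` is close to e ≈ 2.71828.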
One can show that if the function f in (1) is merely continuous, then the initial value problem
always has a solution, but it may not be unique. This is Peano’s Existence theorem.
Consider now a homogeneous linear system of differential equations of the form
\[
(7) \qquad y' = A(t) y
\]
and the corresponding inhomogeneous system
\[
(8) \qquad y' = A(t) y + f(t),
\]
where A(t) is a d × d matrix of continuous functions in t, and f (t) is a continuous vector function
of size d. It is a consequence of the theorem on existence and uniqueness of solutions of ODEs
that (7) has a unique solution given the initial condition y(t0 ) = y0 . Further, if y(t) is a solution
of (7) and y(t0 ) = 0 for some t0 , then y(t) ≡ 0. The following is immediate.
Proposition 6.2 (Principle of Superposition). If y = y1 (t) and y = y2 (t) are solutions of (7), then any
linear combination y = c1 y1 (t) + c2 y2 (t) with constants c1 , c2 is also a solution. If y = y1 (t) and
y = y0 (t) are solutions of (7) and (8) respectively, then y = y0 (t) + y1 (t) is a solution of (8).
By a fundamental matrix Y (t) of (7) we mean a d × d matrix such that its columns are solutions
of (7) and det Y (t) 6= 0. If Y = Y0 (t) is a fundamental matrix of solutions and C is a constant d × d
matrix, then Y (t) = Y0 (t)C is also a solution, in fact, any solution of (7) can be obtained this way
for some suitable C.
Example 6.2. Let R be a constant d × d matrix with real coefficients. Consider the system of
differential equations
\[
(9) \qquad y' = R\, y.
\]
Let y1 6= 0 be a constant vector, and λ a (complex) number. By substituting $y = y_1 e^{\lambda t}$ into the
equation we see that a necessary and sufficient condition for y to be a solution of (9) is
\[
R y_1 = \lambda y_1 ,
\]
i.e., that λ is an eigenvalue and y1 6= 0 a corresponding eigenvector of R. Thus to each eigenvalue
λ of R there corresponds at least one solution of (9). If R has distinct eigenvalues λ1 , λ2 , . . . , λd
with linearly independent eigenvectors y1 , . . . , yd , then
\[
Y = \left( y_1 e^{\lambda_1 t}, \dots, y_d e^{\lambda_d t} \right)
\]
is a fundamental matrix for (9). 
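A concrete instance of Example 6.2 (hypothetical data, chosen only for illustration): the matrix R = [[1, 1], [0, 2]] has eigenvalues 1 and 2 with eigenvectors (1, 0) and (1, 1), so y(t) = (1, 0)e^t + (1, 1)e^{2t} solves y′ = Ry with y(0) = (2, 1). The sketch below compares this eigenvector solution with a direct RK4 integration of the system; it is not part of the notes.

```python
import math

def exact(t):
    # the eigenvalue-method solution y(t) = (1,0)e^t + (1,1)e^{2t}
    return (math.exp(t) + math.exp(2 * t), math.exp(2 * t))

def f(y):
    # y' = R y with R = [[1, 1], [0, 2]] (autonomous, so no t argument)
    return (y[0] + y[1], 2 * y[1])

h = 1e-4
y = (2.0, 1.0)                           # y(0) = (1,0) + (1,1)
for _ in range(10000):                   # RK4 steps up to t = 1
    k1 = f(y)
    k2 = f((y[0] + h / 2 * k1[0], y[1] + h / 2 * k1[1]))
    k3 = f((y[0] + h / 2 * k2[0], y[1] + h / 2 * k2[1]))
    k4 = f((y[0] + h * k3[0], y[1] + h * k3[1]))
    y = (y[0] + h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]),
         y[1] + h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]))
```

At t = 1 the integrator reproduces the eigenvector solution (e + e², e²) to high accuracy.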
Finally, linear differential equations of higher order can be reduced to a system of first order.
Indeed, let pj (t), j = 0, . . . , d−1, and h(t) be continuous functions. Consider the linear homogeneous
equation of order d
\[
(10) \qquad u^{(d)} + p_{d-1}(t) u^{(d-1)} + \dots + p_1(t) u' + p_0(t) u = 0
\]
and the corresponding inhomogeneous equation
\[
(11) \qquad u^{(d)} + p_{d-1}(t) u^{(d-1)} + \dots + p_1(t) u' + p_0(t) u = h(t).
\]
We let $y = (u, u^{(1)}, \dots, u^{(d-1)})$,
\[
A(t) = \begin{pmatrix}
0 & 1 & 0 & 0 & \cdots & 0 \\
0 & 0 & 1 & 0 & \cdots & 0 \\
\vdots & & & & & \vdots \\
0 & 0 & 0 & 0 & \cdots & 1 \\
-p_0 & -p_1 & -p_2 & -p_3 & \cdots & -p_{d-1}
\end{pmatrix},
\]
and f (t) = (0, . . . , 0, h(t)). With this choice of y, A and f , equation (10) becomes equivalent to (7),
while (11) is equivalent to (8). With this we can apply results available for first order systems; in
particular, given the initial conditions $u(t_0) = u_0^0, \; u'(t_0) = u_0^1, \; \dots, \; u^{(d-1)}(t_0) = u_0^{d-1}$, where the $u_0^j$ are
arbitrary numbers, the corresponding initial value problem has a unique solution. The Principle of
Superposition also holds: let u = u1 (t), u2 (t) be two solutions of (10); then any linear combination
u(t) = c1 u1 (t) + c2 u2 (t) is also a solution, and u(t) = u1 (t) + u0 (t) is a solution of (11) if u0 (t) is.
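To see the reduction at work, take the hypothetical scalar equation u″ + u = 0, u(0) = 1, u′(0) = 0 (solution u = cos t), whose first-order form is y′ = Ay with y = (u, u′) and A = [[0, 1], [−1, 0]]. The sketch below (illustration only, not part of the notes) integrates the reduced system with Heun's method.

```python
import math

h = 1e-4
y = [1.0, 0.0]                        # y = (u, u'), u(0) = 1, u'(0) = 0
for _ in range(10000):                # Heun's method up to t = 1
    k1 = [y[1], -y[0]]                # A y   (i.e. u' = y2, y2' = -u)
    yp = [y[0] + h * k1[0], y[1] + h * k1[1]]
    k2 = [yp[1], -yp[0]]              # A (y + h k1)
    y = [y[0] + h / 2 * (k1[0] + k2[0]),
         y[1] + h / 2 * (k1[1] + k2[1])]
```

At t = 1 the first component recovers cos(1) and the second −sin(1), as the scalar equation predicts.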
6.2. Partial Differential Equations. A partial differential equation (PDE for short) is an equa-
tion that involves an unknown function of two or more independent variables and certain partial
derivatives of the unknown function. More precisely, let u denote a function of n independent
variables x1 , . . . , xn , n ≥ 2. Then a relation of the form
(12) F (x1 , . . . , xn , u, ux1 , . . . , uxn , ux1 x2 , . . . ) = 0,
where F is a function of its arguments, is a partial differential equation in u. The following equations
are some examples of PDEs on R2 with coordinates (x, y):
\[
(13) \quad x u_x + y u_y - 2u = 0, \qquad
(14) \quad y u_x - x u_y = x, \qquad
(15) \quad u_{xx} - u_y - u = 0,
\]
\[
(16) \quad u u_x + y u_y - u = x y^2, \qquad
(17) \quad u_{xx} + x (u_y)^2 + y u = y.
\]
A typical problem in the theory of PDEs is for a given equation to find on some domain of Rn
a solution satisfying certain additional initial or boundary conditions. Analogously to ODEs, the
order of the highest derivative appearing in a PDE is called the order of the equation. Thus, (13), (14),
and (16) are all first-order PDEs, and the remaining two are second-order. If there exists a function
u defined in a domain under consideration, such that u and its derivatives identically satisfy (12),
then u is called a solution of the equation.
A PDE is called linear if it is at most of first degree in u and its derivatives. Equations (13), (14),
and (15) above are linear, while the other two are not.
Example 6.3. The first-order linear ODE of the form
\[
\frac{du}{dx} + u = f(x)
\]
has the general solution
\[
u(x) = \int_0^x e^{-(x-t)} f(t)\, dt + C e^{-x} .
\]
Now if we consider the first order PDE on R2 for the unknown function u = u(x, y),
\[
(18) \qquad \frac{\partial u}{\partial x} + u = f(x),
\]
then its general solution is given by
\[
u(x, y) = \int_0^x e^{-(x-t)} f(t)\, dt + g(y) e^{-x} ,
\]
where g(y) is an arbitrary function of y. It is easy to see that for any choice of g, the function
u satisfies (18). Thus the general solution of a PDE may contain some arbitrary functions, not
necessarily constants. 
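For a hypothetical choice f(x) = x and g(y) = sin y (picked only for illustration), the general solution in Example 6.3 evaluates to u(x, y) = x − 1 + e^{−x} + sin(y)e^{−x}, and the sketch below (not part of the notes) checks u_x + u = f(x) by a central difference.

```python
import math

def u(x, y):
    # u(x, y) = int_0^x e^{-(x-t)} t dt + sin(y) e^{-x}
    #         = x - 1 + e^{-x} + sin(y) e^{-x}
    return x - 1 + math.exp(-x) + math.sin(y) * math.exp(-x)

eps = 1e-6
x, y = 0.7, 1.3
ux = (u(x + eps, y) - u(x - eps, y)) / (2 * eps)    # central difference in x
residual = ux + u(x, y) - x                          # u_x + u - f(x), should vanish
```

The residual is at the level of the finite-difference error for every value of y, reflecting that g(y) is annihilated by the operator ∂/∂x + 1 only through the factor e^{−x}.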
A map between function spaces that involves differentiation is called a differential operator. For
example, the map $L : C^\infty(\mathbb{R}^2) \to C^\infty(\mathbb{R}^2)$ given by $L(u) = \frac{\partial^2 u}{\partial t^2} - \frac{\partial^2 u}{\partial x^2}$ is a second-order differential
operator. A differential operator L is called linear if L(c1 u1 + c2 u2 ) = c1 L(u1 ) + c2 L(u2 ). It is
immediate that L is linear if and only if the PDE L(u) = 0 is linear, and that any finite sum of
linear differential operators is again linear. Given a linear operator L, the PDE Lu = 0 is called
a homogeneous PDE, while Lu = f is inhomogeneous for f 6≡ 0. As for ODEs, the Principle of
Superposition also holds: a linear combination of solutions of a homogeneous equation is again a
solution, and the sum of solutions of a homogeneous and inhomogeneous equation is a solution of
the inhomogeneous one.
We now consider three classical PDEs: the Wave equation, the Heat equation, and the Laplace
equation. For a function u = u(x, t), where t ∈ R, x = (x1 , . . . , xn ) ∈ Rn , the equation of the form
\[
(19) \qquad \frac{\partial^2 u}{\partial t^2} - c^2 \left( \frac{\partial^2 u}{\partial x_1^2} + \dots + \frac{\partial^2 u}{\partial x_n^2} \right) = \frac{\partial^2 u}{\partial t^2} - c^2 \Delta_x u = 0, \qquad c = \mathrm{const},
\]
is called the Wave equation. This equation describes many types of elastic and electromagnetic
waves. In many physical applications the variable t represents time and x represents coordinates in
the Euclidean space where the physical experiment takes place. As an equation of second order, a
typical initial condition for the Wave equation is of the form
\[
u(x, 0) = f(x), \qquad u_t(x, 0) = g(x).
\]
When n = 1 the equation has a surprisingly simple solution. We let ξ = x − ct and η = x + ct. After
this change of variables equation (19) takes the form uξη = 0, which after elementary considerations
admits the solution u(ξ, η) = F (ξ) + G(η), or, after returning to the original variables,
\[
u(x, t) = F(x - ct) + G(x + ct).
\]
The general solution for an arbitrary n can be obtained using the theory of Fourier series.
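d'Alembert's formula can be spot-checked by finite differences. With hypothetical profiles F(s) = e^{−s²}, G(s) = sin s and c = 2 (chosen only for illustration; not part of the notes), u(x, t) = F(x − ct) + G(x + ct) should satisfy u_tt − c²u_xx = 0:

```python
import math

c = 2.0
def u(x, t):
    # d'Alembert solution: a right-moving Gaussian plus a left-moving sine
    return math.exp(-(x - c * t) ** 2) + math.sin(x + c * t)

eps = 1e-4
x, t = 0.3, 0.5
utt = (u(x, t + eps) - 2 * u(x, t) + u(x, t - eps)) / eps**2
uxx = (u(x + eps, t) - 2 * u(x, t) + u(x - eps, t)) / eps**2
residual = utt - c * c * uxx      # should vanish up to discretization error
```

The same check passes at any point (x, t), since F and G are arbitrary C² profiles.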
Consider now the equation of the form
\[
(20) \qquad \frac{\partial u}{\partial t} - k \Delta_x u = 0.
\]
This is the so-called Heat equation. Again t ∈ R is the “time” variable, x = (x1 , . . . , xn ), and
∆x is the Laplacian in the variable x. As a primary physical application, equation (20) describes the
conduction of heat, with the function u usually representing the temperature of a “point” with
coordinates x at time t, but more generally it governs a range of physical phenomena described as
diffusive. Typical initial conditions are
\[
u(x, 0) = f(x), \qquad u(0, t) = 0,
\]
where the first condition can be interpreted as the initial temperature of the system, and the second
one declares that the temperature is fixed at the “end” point. A solution of (20) for n = 1 can be
found, for example, using the separation of variables method: assuming k = 1 we seek a solution of
the form u(x, t) = X(x)T (t). Putting this into (20) gives
\[
\frac{T'(t)}{T(t)} = \frac{X''(x)}{X(x)} .
\]
Since the right-hand side is independent of t and the left-hand side is independent of x, each side
must be a constant, say λ. This gives
\[
T' = \lambda T, \qquad X'' = \lambda X.
\]
Solving these equations gives
\[
u(x, t) = e^{\lambda t} \big( A e^{\sqrt{\lambda}\, x} + B e^{-\sqrt{\lambda}\, x} \big).
\]
Initial conditions will then specify the values of the constants.
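The separated solution can be verified directly. Taking λ = −1 (so that √λ is imaginary and the combination of exponentials is trigonometric) with X(x) = sin x, a hypothetical choice for illustration only, u(x, t) = e^{−t} sin x solves the heat equation with k = 1:

```python
import math

def u(x, t):
    # separated solution with lambda = -1: T(t) = e^{-t}, X(x) = sin x
    return math.exp(-t) * math.sin(x)

eps = 1e-4
x, t = 0.8, 0.4
ut = (u(x, t + eps) - u(x, t - eps)) / (2 * eps)              # du/dt
uxx = (u(x + eps, t) - 2 * u(x, t) + u(x - eps, t)) / eps**2  # d2u/dx2
residual = ut - uxx       # u_t - u_xx, should vanish up to discretization
```

The residual stays at the finite-difference error level at every point, and u also satisfies u(0, t) = 0, matching the second initial condition above.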
Finally, for a function u = u(x1 , . . . , xn ) consider the equation
\[
(21) \qquad \Delta u = \frac{\partial^2 u}{\partial x_1^2} + \dots + \frac{\partial^2 u}{\partial x_n^2} = 0.
\]
This PDE, which is called the Laplace equation, has many applications in gravitation, elastic
membranes, electrostatics, fluid flow, etc., as well as numerous applications in other areas of pure
mathematics. There are two types of boundary conditions on a bounded domain Ω ⊂ Rn for the
Laplace equation that give a well-posed problem: the Dirichlet condition
\[
u(x) = f(x), \qquad x \in \partial\Omega,
\]
and the Neumann condition
\[
\frac{\partial u}{\partial \vec{n}}(x) = f(x), \qquad x \in \partial\Omega,
\]
where $\frac{\partial u}{\partial \vec{n}}(x)$ is the derivative in the normal direction to ∂Ω. Solutions of (21) are called harmonic
functions. For n = 2, solutions can be found again by separation of variables.
Unlike the theory of ODEs, not every PDE has a solution. In 1957 H. Lewy found an example
of a first order linear PDE that has no solution. The corresponding (complex-valued) differential
operator on C ∞ (R3 ) is
\[
L u = -u_x - i u_y + 2i(x + iy) u_z .
\]
There exists a real valued function f (x, y, z) of class C ∞ (R3 ) such that the equation Lu =
f (x, y, z) has no solution of class C 1 (Ω) in any open subset Ω ⊂ R3 . While Lewy’s example is not
explicit, later explicit constructions were also found.
The situation is different, however, if the functions involved in a PDE are real-analytic. Recall
that for a domain Ω ⊂ Rn a function f : Ω → R is real-analytic if near any point in Ω it can
be represented by a convergent power series. More precisely, if a = (a1 , . . . , an ) ∈ Ω, there exists a
polydisc $U(a, r) = \{x \in \mathbb{R}^n : |x_j - a_j| < r, \; j = 1, \dots, n\}$, U ⊂ Ω, such that
\[
f(x) = \sum_{|k|=0}^{\infty} b_k (x - a)^k ,
\]
where k = (k1 , k2 , . . . , kn ) is a multi-index, $|k| = k_1 + k_2 + \dots + k_n$, $b_k \in \mathbb{R}$, and
\[
(x - a)^k = (x_1 - a_1)^{k_1} \cdot \dots \cdot (x_n - a_n)^{k_n} .
\]
The coefficients bk of the power series are, in fact, the Taylor coefficients of f , which can be computed
by the formula
\[
b_k = \frac{1}{k!} D^k f(a) = \frac{1}{k_1! k_2! \dots k_n!} \frac{\partial^{|k|} f}{\partial x_1^{k_1} \partial x_2^{k_2} \cdots \partial x_n^{k_n}}(a).
\]
The space of real-analytic functions is denoted by C ω (Ω). Every real-analytic function is infinitely
differentiable, but the converse is not true; for example, the function h(x) on R given by
\[
h(x) = \begin{cases} 0, & x \le 0, \\ e^{-1/x}, & x > 0, \end{cases}
\]
is of class C ∞ (R), as repeated application of l’Hôpital’s rule shows, but it is not real-analytic
at the origin. The reason is that all derivatives of h at zero vanish, but h is not identically zero
on any neighbourhood of the origin, and so h cannot be represented by its Taylor series. A map
f : Rn → Rm is real-analytic if every component of f is a real-analytic function.
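The flatness of h at the origin is visible numerically: h(x) tends to 0 faster than any power of x as x → 0⁺, which is exactly why every Taylor coefficient of h at 0 vanishes. A small sketch (illustration only, not part of the notes):

```python
import math

def h(x):
    # 0 for x <= 0, exp(-1/x) for x > 0
    return math.exp(-1.0 / x) if x > 0 else 0.0

x = 0.01
# h(0.01) = e^{-100}, roughly 3.7e-44, so even h(x)/x^9 is about 3.7e-26:
ratios = [h(x) / x**k for k in range(10)]
```

Each ratio h(x)/x^k tends to 0 with x, so the k-th derivative of h at 0 is 0 for every k, and the Taylor series of h at the origin is identically zero while h is not.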
Theorem 6.3 (Cauchy-Kovalevskaya theorem). Consider the system of partial differential equations
\[
(22) \qquad \frac{\partial u_i}{\partial x_n} = \sum_{k=1}^{n-1} \sum_{j=1}^{N} a_{ij}^k(x_1, \dots, x_{n-1}, u_1, \dots, u_N)\, \frac{\partial u_j}{\partial x_k} + b_i(x_1, \dots, x_{n-1}, u_1, \dots, u_N), \qquad i = 1, \dots, N,
\]
with the initial condition
\[
(23) \qquad u_i = 0 \;\text{ on } x_n = 0, \qquad i = 1, \dots, N.
\]
Let the functions $a_{ij}^k$ and bi be real-analytic at the origin of RN +n−1 . Then the system (22) with
the initial condition (23) has a unique (among real-analytic functions) system of solutions ui that
is real-analytic at the origin.
The proof of the Cauchy-Kovalevskaya theorem can be found in comprehensive textbooks on
PDEs.
v: 2019-02-28
7. The space of test functions


7.1. Space of test functions. Let Ω be an open subset of Rn . For an integer k ≥ 0 we denote by
C k (Ω) the linear space of functions h : Ω −→ R that have continuous partial derivatives up to order
k on Ω. We define C ∞ (Ω) = ∩∞ k
k=0 C (Ω) to be the space of real functions admitting continuous
partial derivatives of any order. The standard topology on C ∞ (Ω) can be defined by the following
notion of convergence: we say that a sequence {hk } ⊂ C ∞ (Ω) converges as k −→ ∞ to a function
h in C ∞ (Ω) if hk converges to h uniformly along with all partial derivatives of any order on any
compact subset of Ω. If K is a compact subset of Ω, we use the standard norm
\[
\| u \|_{C^k(K)} = \sum_{|\alpha| \le k} \sup_{x \in K} |D^\alpha u(x)|
\]
for u ∈ C k (Ω).
Recall that supp h, the support of a function h ∈ C 0 (Rn ), is defined to be the closure in Rn of
the set {x ∈ Rn : h(x) 6= 0}. Denote by C0∞ (Ω) the subspace of C ∞ (Rn ) which consists of functions
h such that supp h is a compact subset of Ω.
Our main goal is to introduce and to study the space of continuous linear functionals on the
linear vector space C0∞ (Ω). For this we first need to choose some topology on C0∞ (Ω). For our
purposes it will be sufficient simply to define the notion of convergence of a sequence of elements
in C0∞ (Ω). This will allow us to define the continuity of linear functionals. We say that a sequence
of functions (ϕj ) of class C0∞ (Ω) converges to ϕ ∈ C0∞ (Ω) if the following conditions hold:
(i) There exists a compact subset K such that K ⊂ Ω and ϕj = 0 on Rn \K for every
j = 1, 2, 3.... In other words, supp ϕj ⊂ K ⊂ Ω for every j.
(ii) For every α the sequence Dα ϕj converges to Dα ϕ uniformly on K. That is, all partial
derivatives of ϕj of all orders converge uniformly to the corresponding partial derivatives
of ϕ.
Definition 7.1. The space C0∞ (Ω) equipped with the above topology of convergence of sequences
will be denoted by D(Ω). The elements of this space are called test-functions.
We leave as an exercise for the reader to verify that D(Ω) is a topological vector space.
Example 7.1. Denote by |x| = (|x1 |² + · · · + |xn |²)^{1/2} the Euclidean norm on Rn . Set

ωε (x) = Cε e^{−ε²/(ε²−|x|²)} for |x| < ε, and ωε (x) = 0 for |x| ≥ ε.

Here the constant Cε is determined by the condition

∫Rn ωε (x)dx = 1,
1
2 RASUL SHAFIKOV

that is,
Cε = ε^{−n} ( ∫B(0,1) e^{−1/(1−|t|²)} dt )^{−1},
where B(0, 1) denotes the Euclidean unit ball in Rn . This is a model example of a function from
D(Rn ) (the so-called bump function). Observe that
ωε (x) = ε−n ω1 (x/ε).

In what follows we write ω(x) instead of ω1 (x). The bump function allows us to construct
suitable test-functions for an arbitrary domain Ω in Rn . We denote by Ωε := ∪x∈Ω B(x, ε), the
ε-neighbourhood of Ω.
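For readers who like to experiment, the bump function is easy to sketch numerically. The snippet below is our own illustration (the name omega_eps and the quadrature-based computation of Cε are not part of the notes); it checks the normalization and the scaling identity ωε (x) = ε^{−n} ω1 (x/ε) in dimension n = 1:

```python
import numpy as np

def omega_eps(x, eps):
    """1-D bump function: C_eps * exp(-eps^2 / (eps^2 - x^2)) for |x| < eps, else 0."""
    x = np.asarray(x, dtype=float)
    out = np.zeros_like(x)
    m = np.abs(x) < eps
    out[m] = np.exp(-eps**2 / (eps**2 - x[m]**2))
    # Determine C_eps numerically from the normalization: the integral of omega_eps is 1.
    g = np.linspace(-eps, eps, 20001)
    v = np.zeros_like(g)
    gm = np.abs(g) < eps
    v[gm] = np.exp(-eps**2 / (eps**2 - g[gm]**2))
    integral = np.sum(0.5 * (v[1:] + v[:-1]) * np.diff(g))  # trapezoid rule
    return out / integral

grid = np.linspace(-1.0, 1.0, 20001)
vals = omega_eps(grid, 0.5)
total = np.sum(0.5 * (vals[1:] + vals[:-1]) * np.diff(grid))
print(round(total, 3))  # 1.0: the mass of omega_eps is 1
# scaling identity omega_eps(x) = eps^{-1} * omega_1(x/eps) for n = 1:
print(abs(omega_eps(np.array([0.25]), 0.5)[0]
          - 2.0 * omega_eps(np.array([0.5]), 1.0)[0]) < 1e-4)  # True
```

The trapezoid rule converges very quickly here because ωε, together with all its derivatives, vanishes at the endpoints of its support.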
Lemma 7.2. Given a compact subset Ω ⊂ Rn and ε > 0 there exists a function η : Rn −→ [0, 1] of
class C ∞ (Rn ) such that
η(x) = 1, for x ∈ Ωε ,
η(x) = 0, for x ∈ Rn \Ω3ε .
Proof. Let λ(x) = 1 for x ∈ Ω2ε and λ(x) = 0 for x ∈ Rn \Ω2ε , that is, λ is the characteristic function of Ω2ε . Then the function defined by the convolution integral

η(x) = ∫Rn λ(y)ωε (x − y)dy
satisfies the required conditions. 
Corollary 7.3. Let Ω be a bounded domain and Ω′ be a subset whose closure is contained in Ω. Then there
exists a function η ∈ D(Ω), η : Ω −→ [0, 1], such that η(x) = 1 whenever x ∈ Ω′ .
We point out that the topology introduced above on the space D(Ω) is not metrizable. In other
words, one cannot define a distance d on the space of test functions such that convergence with
respect to d is equivalent to the convergence of a sequence of test functions introduced above. To
see this, recall the following elementary assertion.

Claim. Let {u^m_k } be a countable family of sequences in a metric space (X, d). Suppose that for
every m = 0, 1, ... the sequence {u^m_k } admits the limit u^m := lim_{k→∞} u^m_k , and the sequence of these
limits {u^m } also tends to u ∈ X as m → ∞. Then for every m there exists km such that km−1 < km
and the sequence {u^m_{km} } converges to u as m → ∞.

We leave an easy proof as an exercise. Consider now in D(R) the functions


u^m_k (x) = (1/k) e^{−m²/(m²−x²)} for |x| < m, and u^m_k (x) = 0 for |x| ≥ m.

For every m we have lim_{k→∞} u^m_k = 0 in D(R), but for any choice of km the sequence {u^m_{km} } does
not converge in D(R) since the supports of these test functions are not uniformly bounded. This
shows that the topology of D(Rn ) is not metrizable.
It is not difficult to describe the topology on D(Rn ) defined by the convergence introduced above.
Consider for simplicity the case n = 1, as the general case is similar. Given m, consider m + 1 positive
continuous functions γj , j = 0, 1, . . . , m. Define a neighbourhood of the origin in D(R) as the set of test-functions φ
satisfying |φ^{(j)} (x)| ≤ γj (x), x ∈ R, j = 0, 1, . . . , m. Then the convergence in this topology precisely
coincides with the convergence on D(R) introduced above. The reader is encouraged to supply a
proof of this fact.
7.2. Regularization of functions. The convolution f ∗g of two functions f, g ∈ L2 (Rn ) is defined
by the integral
(1) f ∗ g(x) = ∫Rn f (y)g(x − y)dy = ∫Rn f (x − y)g(y)dy.
The convolution has particularly nice properties when the second factor is a test function. Denote by
L1loc (Ω) the space of locally Lebesgue-integrable functions on Ω. Recall that a measurable function
f is in L1loc (Ω) if and only if ∫X |f (x)|dx < ∞ for every compact measurable subset X ⊂ Ω. Let
f ∈ L1loc (Rn ) and ϕ ∈ D(Rn ). Then for every x ∈ Rn the function y 7→ ϕ(x − y) is a test-function
and so the convolution
(2) f ∗ ϕ(x) = ∫Rn f (y)ϕ(x − y)dy
is well-defined point-wise as a usual function on all of Rn .
Proposition 7.4. We have f ∗ ϕ ∈ C ∞ (Rn ) and
(3) Dα (f ∗ ϕ) = f ∗ Dα ϕ.
Proof. (a) Let us show that the convolution f ∗ ϕ is a continuous function. Let xk ∈ Rn be a
sequence converging to x. Then ϕ(xk − y) −→ ϕ(x − y), as k → ∞ everywhere, and
f ∗ ϕ(xk ) = ∫Rn f (y)ϕ(xk − y)dy −→ ∫Rn f (y)ϕ(x − y)dy = (f ∗ ϕ)(x), as k → ∞,

by the Lebesgue Convergence theorem.


(b) Denote by ~ej , j = 1, ..., n, the vectors of the standard basis of Rn . For a fixed x ∈ Rn we
have
(1/t)(ϕ(x + t~ej − y) − ϕ(x − y)) −→ (∂ϕ/∂xj )(x − y), t → 0,
everywhere (this follows from the Mean Value theorem applied to the function ϕ(x + t~ej − y) on
[0, t]). Therefore, by the Lebesgue Convergence theorem,
(1/t)(f ∗ ϕ(x + t~ej ) − f ∗ ϕ(x)) = ∫ f (y) (1/t)(ϕ(x + t~ej − y) − ϕ(x − y))dy −→ ∫ f (y) (∂ϕ/∂xj )(x − y)dy.
t t ∂xj
Hence, the partial derivative of f ∗ ϕ with respect to xj exists and
(∂/∂xj )(f ∗ ϕ) = ∫ f (y) (∂ϕ/∂xj )(x − y)dy = f ∗ (∂ϕ/∂xj ).

Since ∂ϕ/∂xj ∈ D(Rn ), it follows from part (a) that the partial derivative (∂/∂xj )(f ∗ ϕ) is continuous.
Proceeding by induction, we obtain that f ∗ ϕ ∈ C ∞ (Rn ) and satisfies (3). 
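Identity (3) can be tested numerically in dimension one, even for a discontinuous f ∈ L1loc. The sketch below is our own illustration (grid sizes and the Heaviside choice of f are ours, not from the notes); it convolves the step function with the bump ω1 and compares (f ∗ ϕ)′ with f ∗ ϕ′ by finite differences:

```python
import numpy as np

x = np.linspace(-4.0, 4.0, 1601)
dx = x[1] - x[0]

f = np.where(x > 0, 1.0, 0.0)          # Heaviside step: in L^1_loc, not differentiable
phi = np.zeros_like(x)                 # the bump omega_1 as test function
m = np.abs(x) < 1
phi[m] = np.exp(-1.0 / (1.0 - x[m]**2))
dphi = np.gradient(phi, dx)            # phi' by central differences

f_phi   = np.convolve(f, phi,  mode="same") * dx   # f * phi
f_dphi  = np.convolve(f, dphi, mode="same") * dx   # f * phi'
d_f_phi = np.gradient(f_phi, dx)                   # (f * phi)'

inside = np.abs(x) <= 3                # region where the truncated convolution is reliable
print(np.max(np.abs(d_f_phi[inside] - f_dphi[inside])) < 1e-2)  # True
```

Note that f ∗ ϕ is smooth even though f jumps: the convolution inherits all the regularity of the test function, which is exactly the content of Proposition 7.4.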
A special case, important in applications, arises if we take the bump-function ωε as ϕ in (2).
Definition 7.5. The convolution fε := f ∗ωε is called the regularization of a function f ∈ L1loc (Rn ).
Proposition 7.6. We have
(i) fε ∈ C ∞ (Rn );
(ii) if f ∈ C(Rn ), then fε −→ f , as ε → 0+, in C(Ω) for every bounded subset Ω of Rn ;
(iii) if f ∈ Lp (Rn ), 1 ≤ p < ∞, then fε −→ f , as ε −→ 0+, in Lp (Rn ).

Proof. Part (i) follows from Proposition 7.4. To show (ii), assume that f is a continuous function
on Rn . Then
(4) fε (x) = ∫Rn ωε (x − y)f (y)dy = ∫Rn ωε (y)f (x − y)dy = ∫_{|y|≤ε} ωε (y)f (x − y)dy.

Since

∫Rn ωε (x)dx = 1,

we have

| ∫Rn ωε (y)f (x − y)dy − f (x) | = | ∫Rn ωε (y)(f (x − y) − f (x))dy | ≤ Mε ∫_{|y|≤ε} ωε (y)dy = Mε ,
Rn Rn |y|≤ε

where
Mε = sup_{x∈Ω, |y|≤ε} |f (x − y) − f (x)|.

Since f is continuous on the closure of Ωε , which is a compact subset of Rn , it is uniformly continuous there, so that Mε −→ 0 as ε → 0+. This proves (ii).
For the proof of (iii) we need to show that ‖fε − f ‖Lp −→ 0, as ε −→ 0+, where

‖h‖Lp = ‖h‖Lp (Rn ) = ( ∫Rn |h(x)|^p dx )^{1/p}.

Lemma 7.7. For every ε > 0 we have


‖fε ‖Lp ≤ ‖f ‖Lp .

Proof. Consider first the case p = 1. By Fubini’s theorem

‖fε ‖L1 = ∫ |f ∗ ωε (x)|dx ≤ ∫∫ |f (y)|ωε (x − y)dy dx = ∫ |f (y)| ( ∫ ωε (x − y)dx ) dy = ∫ |f (y)|dy = ‖f ‖L1 .

Let now p > 1. For p and q satisfying 1/p + 1/q = 1 we have, by Hölder’s inequality,

|f ∗ ωε |^p ≤ ( ∫ |f (y)|ωε (x − y)dy )^p = ( ∫ |f (y)|ωε (x − y)^{1/p} ωε (x − y)^{1/q} dy )^p
≤ ∫ |f (y)|^p ωε (x − y)dy ( ∫ ωε (x − y)dy )^{p/q} = ∫ |f (y)|^p ωε (x − y)dy.

Therefore, by Fubini’s theorem,


( ∫ |f ∗ ωε (x)|^p dx )^{1/p} ≤ ( ∫∫ |f (y)|^p ωε (x − y)dy dx )^{1/p}
= ( ∫ |f (y)|^p ( ∫ ωε (x − y)dx ) dy )^{1/p} = ( ∫ |f (y)|^p dy )^{1/p},

which proves the lemma. 


Now let f ∈ Lp (Rn ), 1 ≤ p < ∞. From the theory of the Lebesgue integral, given τ > 0 there exist
A > 0 and a function h ∈ C(Rn ) with supp h ⊂ {|x| ≤ A} such that
‖f − h‖Lp ≤ τ.
Then by Lemma 7.7 we have
‖fε − hε ‖Lp = ‖(f − h)ε ‖Lp ≤ ‖f − h‖Lp ≤ τ
for every ε. Furthermore, supp hε ⊂ K = {|x| ≤ A + 1} for ε small enough. Hence, by (ii), hε −→ h
as ε −→ 0+ in C(K). It follows that
‖h − hε ‖Lp ≤ τ
for all ε sufficiently small. Thus,
‖f − fε ‖Lp ≤ ‖f − h‖Lp + ‖h − hε ‖Lp + ‖hε − fε ‖Lp ≤ 3τ,
which concludes the proof of the proposition. 
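Part (ii) of Proposition 7.6 is easy to observe numerically. In the sketch below (a 1-D discretization of our own; the helper name mollify is ours, not from the notes), the continuous function f (x) = |x| is regularized by ωε , and the uniform error on a bounded set shrinks as ε → 0+:

```python
import numpy as np

x = np.linspace(-3.0, 3.0, 6001)
dx = x[1] - x[0]
f = np.abs(x)                          # continuous, so f_eps -> f uniformly on bounded sets

def mollify(f_vals, eps):
    """Discrete regularization f_eps = f * omega_eps on the grid (1-D sketch)."""
    s = np.arange(-eps, eps + dx, dx)
    k = np.zeros_like(s)
    m = np.abs(s) < eps
    k[m] = np.exp(-eps**2 / (eps**2 - s[m]**2))
    k /= k.sum() * dx                  # normalize: total mass of omega_eps is 1
    return np.convolve(f_vals, k, mode="same") * dx

inside = np.abs(x) <= 2                # stay away from the grid boundary
for eps in (0.4, 0.2, 0.1):
    err = np.max(np.abs(mollify(f, eps)[inside] - f[inside]))
    print(eps, round(float(err), 3))
# the sup-error on |x| <= 2 decreases as eps -> 0+ (it is largest at the kink x = 0)
```

The regularized function fε is smooth for every ε > 0 (by Proposition 7.4), so the kink of |x| at 0 is rounded off; it is exactly there that the convergence is slowest.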
7.3. Partition of unity. A family of open subsets (Uα )α∈A in Rn is called a covering of a subset
X ⊂ Rn if X ⊂ ∪α∈A Uα . A covering is called locally finite if every point x ∈ X admits a
neighbourhood V such that the intersection V ∩ Uα is non-empty for only finitely many α in A.
A covering (Vβ )β∈B is called a subcovering of (Uα )α∈A if for every β ∈ B there exists α ∈ A such
that Vβ ⊂ Uα . We denote by X ◦ the interior of X.
Proposition 7.8. Let X be an open subset of Rn and (Uα ) be its covering. Then there exists a
locally finite subcovering (Vβ ) of (Uα ) with the following property: for every β the set Vβ contains
the closure of some ball B(pβ , rβ ), and these open balls B(pβ , rβ ) form a locally finite subcovering
(Wγ )γ∈Γ of (Vβ ).
Proof. We set
Kj = {x ∈ X : |x| ≤ j, dist (x, ∂X) ≥ 1/j}.
Then Kj is a sequence of compact subsets of X satisfying

(i) Kj ⊂ Kj+1
(ii) X = ∪∞j=1 Kj .

Since Ki \Ki−1 is compact, one can choose a finite subfamily Uα_{i1} , ..., Uα_{iki} of (Uα ) with

(Ki \Ki−1 ) ⊂ ∪_{j=1,...,ki} Uα_{ij} .
Set Vji = Uαi j ∩ (Ki \Ki−2 )◦ . Then (Vji ) is a locally finite subcovering. For every p ∈ Vβ consider a
ball B(p, r(p)) contained in Vβ for all β with p ∈ Vβ . Then this family of balls forms a subcovering
of (Vβ ), and we apply the previous construction, extracting a locally finite subcovering by balls. 
Corollary 7.9. Let X be an open subset of Rn and (Uα ) be its covering. Then there exists a locally
finite subcovering (Wγ ) and a family of functions ηγ ∈ C0∞ (Wγ ) with the following properties:
(i) ηγ (x) ≥ 0 for every x ∈ X and every γ.
(ii) Σγ ηγ (x) = 1 for every x ∈ X.

Proof. Consider locally finite subcoverings (Wγ ) and (Vβ ) constructed in the previous proposition.
Fix γ. By construction, there exists a finite number of β such that the closure of the ball Wγ is contained
in Vβ . By Corollary 7.3 there exists a function

φγ ∈ C0∞ ( ∩_{β : W̄γ ⊂ Vβ} Vβ )

with values in [0, 1] that is equal to 1 on Wγ . Hence, φ(x) = Σγ φγ (x) is well-defined for every
x ∈ X and φ(x) > 0. Now set ηγ (x) = φγ (x)/φ(x). 
The family (ηγ ) is called a partition of unity subordinated to the covering (Wγ ).
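The construction of Corollary 7.9 can be imitated in one dimension: take smooth bumps φγ subordinate to a cover by intervals and normalize by their sum. The cover below is a toy example of our own, not one from the notes:

```python
import numpy as np

def bump(x, c, r):
    """Smooth bump phi_gamma supported in the interval (c - r, c + r)."""
    out = np.zeros_like(x)
    m = np.abs(x - c) < r
    out[m] = np.exp(-1.0 / (1.0 - ((x[m] - c) / r) ** 2))
    return out

# Toy cover of X = (0, 4) by the intervals W_gamma = (gamma - 1, gamma + 1),
# gamma = 0, ..., 4, normalized as in the proof of Corollary 7.9:
# eta_gamma = phi_gamma / sum_gamma phi_gamma.
x = np.linspace(0.05, 3.95, 1001)
phis = np.array([bump(x, c, 1.0) for c in range(5)])
total = phis.sum(axis=0)          # > 0 on X because the supports cover X
etas = phis / total

print(bool(np.allclose(etas.sum(axis=0), 1.0)))  # True: the eta_gamma sum to 1 on X
```

Each ηγ is smooth, non-negative, supported in its interval Wγ, and the family sums identically to 1 — the three properties that make partitions of unity the standard tool for gluing local constructions.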
v: 2019-03-05

8. Basic theory of distributions


8.1. Definition and examples of distributions.
Definition 8.1. A linear continuous functional f on the space D(Ω) is called a distribution. The
linear space of all distributions is denoted by D0 (Ω).
Recall that continuity here means that for every sequence (ϕj ) of test-functions converging to ϕ
in D(Ω) we have limj→∞ f (ϕj ) = f (ϕ). By the linearity of f this is equivalent to the continuity
at the zero vector: f is continuous if and only if limj→∞ f (ϕj ) = 0 for every sequence ϕj −→ 0 in
D(Ω). We will often use the notation f (ϕ) = hf, ϕi which has certain advantages. We now consider
some examples.

Example 8.1. Recall that L1loc (Ω) is the space of locally Lebesgue-integrable functions on Ω. Let f
be in L1loc (Ω), i.e., ∫X |f (x)|dx < ∞ for every compact measurable subset X ⊂ Ω. Then f defines

a distribution Tf ∈ D0 (Ω) acting on a test-function ϕ ∈ D(Ω) by


(1) hTf , ϕi = ∫Ω f (x)ϕ(x)dx.

By the linearity of the integral, Tf is a linear functional on D(Ω). The continuity of Tf follows from
the definition of the topology on D(Ω) and the Lebesgue Dominated Convergence Theorem. 
Consider the map
L : L1loc (Ω) −→ D0 (Ω),
given by
L : f 7→ Tf .
By linearity, the injectivity of L is equivalent to the fact that L−1 (0) = {0}. The latter is a
consequence of the following.
Proposition 8.2. Tf = 0 in D0 (Ω) if and only if f ∈ L1loc (Ω) vanishes almost everywhere in Ω.
Proof. We only show that if Tf = 0 in D0 (Ω) then f vanishes almost everywhere in Ω; the converse is
obvious. Let p be an arbitrary point of Ω; fix an r > 0 such that the closed ball B(p, r) is contained
in Ω. Let η be a function of class C ∞ (Rn ) such that η = 1 on B(p, r/2) and supp η ⊂ B(p, r). Then
ηf ∈ L1 (Rn ) and Tηf vanishes in D0 (Rn ). On the other hand, for every ε and every x the function
y 7→ η(y)ωε (x − y) is in D(Rn ), so (ηf )ε (x) = ∫Rn f (y)η(y)ωε (x − y)dy = 0 for every x ∈ Rn . By
(iii) of Proposition 7.6, ‖ηf − (ηf )ε ‖L1 −→ 0, as ε −→ 0+. Hence ηf represents 0 in L1 (Rn ) and so
vanishes almost everywhere. Therefore, f vanishes almost everywhere on B(p, r/2). Since p is an
arbitrary point, the general statement follows. 
Thus, every “usual” function f of class L1loc (Ω) can be identified with a distribution Tf . In what
follows we often drop the T and write hf, ϕi instead of hTf , ϕi viewing usual functions as distribu-
tions. Such distributions (defined by (1)) are called regular. However, the class of distributions is
much larger, and so the space of distributions D0 (Ω) is a far-reaching generalization of the notion

of a function defined point-wise. Distributions which are not regular are called singular. The
following example confirms their existence.

Example 8.2. Consider the distribution δ(x) ∈ D0 (Rn ) (Dirac delta function) defined by
hδ(x), ϕi = ϕ(0)
for ϕ ∈ D(Rn ). Suppose that there exists a function f ∈ L1loc (Rn ) such that δ = Tf . For every ε > 0
consider the function ψε ∈ D(Rn ) defined by ψε (x) = e^{−ε²/(ε²−|x|²)} for |x| < ε and ψε (x) = 0 for |x| ≥ ε.
Then hδ, ψε i = ψε (0) = e−1 . On the other hand,
hTf , ψε i = ∫Rn f (x)ψε (x)dx,

and by the Lebesgue convergence theorem the last integral tends to 0 as ε → 0: a contradiction.
Thus, the δ-function is a singular distribution. Similarly, for every a ∈ Rn one can define the
translated delta function δa :
hδa , ϕi = ϕ(a).

Example 8.3. Another interesting and typical example of a singular distribution is given as follows:
hP (1/x), φi = v.p. ∫R (φ(x)/x) dx := lim_{ε−→0+} ( ∫_{−∞}^{−ε} (φ(x)/x) dx + ∫_{ε}^{+∞} (φ(x)/x) dx ),
where v.p. stands for valeur principale in the sense of Cauchy of the integral. First we show that
P (1/x) is well-defined. Indeed, let φ ∈ D(R) be arbitrary with supp φ ⊂ [−A, A] for some A > 0. Then,
using the Mean Value Theorem for φ on the intervals [−x, 0] and [0, x], we obtain
hP (1/x), φi = lim_{ε−→0+} [ ∫_{−A}^{−ε} (φ(x)/x) dx + ∫_{ε}^{A} (φ(x)/x) dx ]
= lim_{ε−→0+} [ ∫_{−A}^{−ε} (φ(0) + xφ′ (ξ(x)))/x dx + ∫_{ε}^{A} (φ(0) + xφ′ (ξ̃(x)))/x dx ].
The contributions of φ(0)/x over [−A, −ε] and [ε, A] cancel by symmetry, so

|hP (1/x), φi| ≤ ∫_{−A}^{0} |φ′ (ξ(x))| dx + ∫_{0}^{A} |φ′ (ξ̃(x))| dx ≤ 2A sup_{[−A,A]} |φ′ | < ∞.

Clearly, P (1/x) is linear. Let us now show that it is continuous on D(R). Consider a sequence (φj )
converging to 0 in D(R). This means that there exists A > 0 such that φj (x) = 0 for every j and
every |x| ≥ A. Then, applying the Mean Value Theorem as above, we have

|hP (1/x), φj i| = |v.p. ∫R (φj (x)/x) dx| = |v.p. ∫_{−A}^{A} (φj (0) + xφ′j (ξ(x)))/x dx|
≤ ∫_{−A}^{A} |φ′j (ξ(x))| dx ≤ 2A sup_{[−A,A]} |φ′j | −→ 0, as j −→ ∞.
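Numerically, the principal value can be computed from the symmetric combination (φ(x) − φ(−x))/x, which is bounded near 0. The sketch below is our own (the function pv_pairing is a hypothetical helper, and φ(x) = x e^{−x²} is a rapidly decaying stand-in for a compactly supported test function); it checks the result against the exact value v.p.∫ φ(x)/x dx = ∫ e^{−x²} dx = √π:

```python
import numpy as np

def pv_pairing(phi, A=8.0, n=200000):
    """<P(1/x), phi> via the symmetric limit over eps < |x| < A."""
    eps = A / n
    x = np.linspace(eps, A, n)           # right half-line; the left half by symmetry
    dx = x[1] - x[0]
    # phi(x)/x + phi(-x)/(-x) = (phi(x) - phi(-x))/x is bounded near 0,
    # so the symmetric truncated integral converges as eps -> 0+.
    return np.sum((phi(x) - phi(-x)) / x) * dx

val = pv_pairing(lambda x: x * np.exp(-x**2))
print(abs(val - np.sqrt(np.pi)) < 1e-3)  # True
```

Pairing the two half-lines before integrating is exactly the cancellation of the φ(0)/x terms used in the estimate above: only the difference φ(x) − φ(−x), of size O(x), survives near the singularity.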

8.2. Convergence of distributions. Now we define a topology on the space of distributions. For
applications it is sufficient to use the standard notion of weak∗ convergence.
Definition 8.3. A sequence of distributions (fj ) converges to a distribution f in D0 (Ω) if for every
ϕ ∈ D(Ω) one has limj−→∞ hfj , ϕi = hf, ϕi.
The following simple example of convergence is very important.

Proposition 8.4. ωε −→ δ in D0 (Rn ) as ε −→ 0+.


Proof. Given φ ∈ D(Rn ) we need to show that
lim_{ε−→0+} ∫Rn ωε (x)φ(x)dx = φ(0).

For every τ > 0 there exists an ε0 > 0 such that |φ(x) − φ(0)| < τ when |x| < ε0 . Using the
properties of the bump-function we obtain, for every 0 < ε < ε0 ,

| ∫Rn ωε (x)φ(x)dx − φ(0) | ≤ ∫_{|x|≤ε} ωε (x)|φ(x) − φ(0)|dx < τ.
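The convergence ωε −→ δ can be watched numerically: the pairing hωε , φi approaches φ(0) as ε → 0+. A 1-D sketch of our own (the helper name delta_pairing is hypothetical):

```python
import numpy as np

def delta_pairing(phi, eps, n=20001):
    """<omega_eps, phi> = integral of omega_eps(x) phi(x) dx on a 1-D grid."""
    x = np.linspace(-eps, eps, n)
    w = np.zeros_like(x)
    m = np.abs(x) < eps
    w[m] = np.exp(-eps**2 / (eps**2 - x[m]**2))
    dx = x[1] - x[0]
    w /= np.sum(w) * dx                  # normalize so that the total mass is 1
    return np.sum(w * phi(x)) * dx

for eps in (0.5, 0.1, 0.02):
    print(eps, abs(delta_pairing(np.cos, eps) - 1.0))
# the error |<omega_eps, cos> - cos(0)| shrinks as eps -> 0+
```

For smooth φ the error behaves roughly like ε², since ωε is even and the first-order term of the Taylor expansion of φ at 0 integrates to zero against it.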

Another fundamental property of the space D0 (Ω) is its completeness.
Theorem 8.5. Let (fj ) be a sequence in D0 (Ω) such that for every ϕ ∈ D(Ω) the sequence (hfj , ϕi)
converges in R. Consider the map f : D(Ω) −→ R defined by
hf, ϕi := lim_{j−→∞} hfj , ϕi, ϕ ∈ D(Ω).

Then f ∈ D0 (Ω).
Proof. The linearity of f is obvious so we just need to establish its continuity. Let ϕk −→ 0 as
k −→ ∞ in D(Ω). Arguing by contradiction suppose that hf, ϕk i does not converge to 0. Passing
to a subsequence we may assume that there exists an ε > 0 such that |hf, ϕk i| ≥ 2ε for all k.
Since hf, ϕk i = limj−→∞ hfj , ϕk i, for every k there exists jk such that |hfjk , ϕk i| ≥ ε. However, this
contradicts the following statement:
Lemma 8.6. Let (fk ) be a sequence in D0 (Ω) satisfying assumptions of Theorem 8.5 and ϕk −→ 0
in D(Ω). Then hfk , ϕk i −→ 0, as k −→ ∞.
Thus, in order to complete the proof of the theorem it remains to prove the lemma.
Proof of Lemma 8.6. Suppose on the contrary that the statement of the lemma is false. Passing
to a subsequence we may assume that |hfk , ϕk i| ≥ C > 0. Since ϕk −→ 0 in D(Ω), we have:
(a) ϕk = 0 for all k outside a compact subset K ⊂ Ω.
(b) For every α the sequence Dα ϕk converges uniformly to 0.
Passing to a subsequence we can assume that for every k = 0, 1, 2, . . . ,

|Dα ϕk (x)| ≤ 1/4^k , |α| ≤ k.

Set ψk = 2^k ϕk ; then

(2) |Dα ψk (x)| ≤ 1/2^k , |α| ≤ k.

Furthermore, ψk −→ 0 in D(Rn ), and every series of the form Σs ψks (x) converges in D(Ω). On the
other hand,

(3) |hfk , ψk i| = 2^k |hfk , ϕk i| ≥ 2^k C −→ ∞ as k −→ ∞.
To reach a contradiction, we construct by induction suitable subsequences (fks ) and (ψks ) that
satisfy inequalities (7) and (8) below. Choose fk1 and ψk1 such that |hfk1 , ψk1 i| ≥ 2. This is always
possible in view of (3). Suppose that fkj , ψkj , j = 1, ..., s − 1, are already constructed. We wish
to find fks , ψks . Since ψk −→ 0, k −→ ∞ in D(Ω), we have limk→∞ hfkj , ψk i −→ 0, for any
j = 1, ..., s − 1, and so there exists N such that for k ≥ N
(4) |hfkj , ψk i| ≤ 1/2^{s−j} , j = 1, ..., s − 1.
4 RASUL SHAFIKOV

Moreover, since
lim_{k→∞} hfk , ψkj i = hf, ψkj i, j = 1, ..., s − 1,
there exists N1 ≥ N such that for all k ≥ N1
(5) |hfk , ψkj i| ≤ |hf, ψkj i| + 1, j = 1, ..., s − 1.
Finally, in view of (3) we fix ks ≥ N1 such that
(6) |hfks , ψks i| ≥ Σ_{j=1}^{s−1} |hf, ψkj i| + 2s.

Now it follows from (4), (5), (6) that the functions fks and ψks satisfy
(7) |hfkj , ψks i| ≤ 1/2^{s−j} , j = 1, ..., s − 1,
(8) |hfks , ψks i| ≥ Σ_{j=1}^{s−1} |hfks , ψkj i| + s + 1.

This gives the inductive construction of the required subsequences. Set



ψ(x) = Σ_{s=1}^{∞} ψks (x).

By (2) this series converges in D(Ω). Its sum ψ ∈ D(Ω) satisfies



hfks , ψi = hfks , ψks i + Σ_{j≥1, j≠s} hfks , ψkj i.

Therefore, keeping in mind (7), (8) we obtain


|hfks , ψi| ≥ |hfks , ψks i| − Σ_{j=1}^{s−1} |hfks , ψkj i| − Σ_{j=s+1}^{∞} |hfks , ψkj i|
≥ s + 1 − Σ_{j=s+1}^{∞} 1/2^{j−s} = s,

that is, hfks , ψi −→ ∞ as s −→ ∞. This contradicts the condition hfk , ψi −→ hf, ψi, which
completes the proof. 

8.3. Multiplication of distributions. The product of two functions of class L1loc (R) in general
is not in this class (consider, for instance, f (x) = |x|−1/2 and f 2 ). This example shows that it is
impossible in general to define in a natural way even the product of regular distributions. In fact,
one can show that it is impossible to define a multiplication of two distributions which satisfies
the standard algebraic properties (commutativity, associativity,...). However, one can define the
product of a distribution f ∈ D0 (Ω) and a function a ∈ C ∞ (Ω).
First, consider the case when f ∈ L1loc (Ω), i.e., f is a regular distribution. Then the distribution
corresponding to the usual product af acts on a test-function ϕ by
hTaf , ϕi = ∫Ω a(x)f (x)ϕ(x)dx = hTf , aϕi.


In the case of an arbitrary distribution f we take the right-hand side of this equality to be the
definition of the distribution af , i.e., we set
haf, ϕi := hf, aϕi, ϕ ∈ D(Ω).
Observe two immediate properties of the algebraic operation of multiplication of a distribution
by a smooth function a:
(a) Linearity: for every f, g ∈ D0 (Ω) and real λ, µ we have
a(λf + µg) = λ(af ) + µ(ag).
(b) Continuity: if fj −→ f in D0 (Ω) then afj −→ af in D0 (Ω).
Example 8.4. a(x)δ(x) = a(0)δ(x), since
haδ, φi = hδ, aφi = a(0)φ(0) = a(0)hδ, φi.

Example 8.5. xP (1/x) = 1. Indeed, for any φ ∈ D(R), we have

hxP (1/x), φi = hP (1/x), xφi = v.p. ∫R (xφ(x)/x) dx = ∫R φ(x)dx = h1, φi.

8.4. Composition with linear maps. Let f be a function of class L1loc (Rn ) and let u : x 7→ Ax+b
be a bijective affine map of Rn , i.e., det A ≠ 0. Given φ ∈ D(Rn ) consider

hTf ◦u , φi = ∫Rn f (Ay + b)φ(y)dy = | det A|^{−1} ∫Rn f (x)φ(A^{−1} (x − b))dx = | det A|^{−1} hTf , φ(A^{−1} (x − b))i.


For an arbitrary f ∈ D0 (Rn ) we take the last equality as a definition of the distribution f ◦ u =
f (Ay + b), that is,
hf (Ay + b), φi := | det A|−1 hf, φ(A−1 (x − b))i.
The distribution f (x + b) is called the translation of a distribution f by a vector b. In particular,
hδ(y − a), φi = hδ, φ(x + a)i = φ(a).
Recall that we also denoted above this distribution by δa .

8.5. Dependence on a parameter. The continuity of distributions implies their “good” be-
haviour under an action on test-functions depending on a real parameter. We will use this property
and its variations.
Theorem 8.7. Let X and Y be domains in Rn and Rm respectively and ϕ ∈ C ∞ (X × Y ). Suppose
that there exists a compact subset K ⊂ X such that ϕ(x, y) = 0 for every (x, y) with x ∉ K. Then
for every f ∈ D0 (X) the function

F : Y ∋ y 7→ hf (x), ϕ(x, y)i
is of class C ∞ (Y ) and
Dyα hf (x), ϕ(x, y)i = hf (x), Dyα ϕ(x, y)i.

Proof. (a) Let us show that F is a continuous function. Let y k ∈ Rm be a sequence converging to
y ∈ Y . We can assume that the points y k are in a fixed closed ball B ⊂ Y . Then
‖Dxβ ϕ(x, y^k ) − Dxβ ϕ(x, y)‖C(X) ≤ ‖∇Dxβ ϕ‖C(K×B) |y^k − y|.
Since the supports of all functions x 7→ Dxβ ϕ(x, y k ) are contained in K, the sequence ϕ(x, y k )
converges to ϕ(x, y) as k −→ ∞ in D(X) and F (y k ) −→ F (y), k −→ ∞, by continuity of f .
(b) Next we study the partial derivatives of F . For the element ~ej , j = 1, ..., m, of the standard
basis of Rm , and every fixed y ∈ Y we have
(1/t)(ϕ(x, y + t~ej ) − ϕ(x, y)) −→ (∂ϕ/∂yj )(x, y), t −→ 0,
in D(X). Therefore,
(1/t)(F (y + t~ej ) − F (y)) = hf (x), (1/t)(ϕ(x, y + t~ej ) − ϕ(x, y))i −→ hf (x), (∂ϕ/∂yj )(x, y)i.
Hence, the partial derivative of F in yj exists and
(∂/∂yj )hf (x), ϕ(x, y)i = hf (x), (∂ϕ/∂yj )(x, y)i.

Part (a) shows that the partial derivative ∂F/∂yj is continuous. Proceeding by induction, we obtain
that F ∈ C ∞ (Y ) and satisfies the derivation rule stated in the theorem. 
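For a regular distribution f = Tf the rule of Theorem 8.7 is just differentiation under the integral sign, which can be verified numerically. In the sketch below (our own illustration; the Gaussian factor stands in for a test function compactly supported in x), a central finite difference of y 7→ hf, ϕ(·, y)i is compared with hf, ∂ϕ/∂yi:

```python
import numpy as np

x = np.linspace(-6.0, 6.0, 4001)          # e^{-x^2} is negligible beyond |x| = 6
dx = x[1] - x[0]
f = np.abs(x)                             # an L^1_loc function playing the regular f

def pair(vals):
    """The pairing <f, .> computed by quadrature."""
    return np.sum(f * vals) * dx

phi  = lambda y: np.exp(-x**2) * np.sin(y)   # phi(x, y); Gaussian stands in for a bump in x
dphi = lambda y: np.exp(-x**2) * np.cos(y)   # d phi / d y, known in closed form

y, h = 0.7, 1e-5
lhs = (pair(phi(y + h)) - pair(phi(y - h))) / (2 * h)  # d/dy <f, phi(., y)>
rhs = pair(dphi(y))                                    # <f, d phi/d y>
print(abs(lhs - rhs) < 1e-6)  # True
```

The point of the theorem is that the same exchange of limit and pairing is legitimate for an arbitrary distribution f, where no integral representation is available.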
v: 2019-03-05

9. Differentiation of distributions and the structure theorems


We saw in the previous section that the space of distributions is a generalization of the space of
functions defined point-wise. A remarkable consequence of this fact is that all distributions admit
partial derivatives of any order (suitably defined).

9.1. Definition, basic properties, first examples. We begin with some motivation. Suppose
that f is a regular function on a domain Ω in Rn , say, of class C 1 (Ω). Then its partial derivative
(in the usual sense) ∂f /∂xj defines a distribution acting on ϕ ∈ D(Ω) by

hT_{∂f /∂xj} , ϕi = ∫Ω (∂f (x)/∂xj )ϕ(x)dx = − ∫Ω f (x)(∂ϕ(x)/∂xj )dx = −hTf , ∂ϕ/∂xj i,
where the second equality follows from the integration by parts formula. But the last expression is
defined for an arbitrary distribution f ; so it is natural to take it as a definition of the derivative of
a distribution. For f ∈ D0 (Ω) and a multi-index α = (α1 , ..., αn ) we set
hDα f, ϕi := (−1)|α| hf, Dα ϕi,
where we used the usual notation
Dα = ∂^{|α|} /(∂x1^{α1} · · · ∂xn^{αn} ), |α| = α1 + ... + αn .
Derivatives in D0 (Ω) are often called weak derivatives. It is easy to check (do it!) that weak
differentiation is a well-defined operation, that is, Dα f ∈ D0 (Ω). We note some basic properties of
this operation:

(0) If f ∈ C 1 (Ω), then (∂/∂xj )Tf = T_{∂f /∂xj} .

(1) The map Dα : D0 (Ω) −→ D0 (Ω) is linear and continuous. The linearity is obvious. In
order to prove continuity, consider a sequence fj −→ 0 in D0 (Ω) as j → ∞. Then for any
ϕ ∈ D(Ω),
hDα fj , ϕi = (−1)|α| hfj , Dα ϕi −→ 0, as j → ∞.
Thus, if a sequence (fj ) converges to f in D0 (Ω), then all partial derivatives of fj converge
to the corresponding partial derivatives of f .
(2) Every distribution admits partial derivatives of all orders.
(3) For any multi-indices α and β we have
Dα+β f = Dα (Dβ f ) = Dβ (Dα f ).
(4) The Leibniz rule. If f ∈ D0 (Ω) and a ∈ C ∞ (Ω) then

∂(af )/∂xk = a ∂f /∂xk + (∂a/∂xk )f.

Indeed, given ϕ ∈ D(Ω) we have


h∂(af )/∂xk , ϕi = −haf, ∂ϕ/∂xk i = −hf, a ∂ϕ/∂xk i = −hf, ∂(aϕ)/∂xk − (∂a/∂xk )ϕi
= −hf, ∂(aϕ)/∂xk i + hf, (∂a/∂xk )ϕi = h∂f /∂xk , aϕi + h(∂a/∂xk )f, ϕi
= ha ∂f /∂xk , ϕi + h(∂a/∂xk )f, ϕi = h a ∂f /∂xk + (∂a/∂xk )f , ϕi.
We consider several elementary examples in dimension 1.
Example 9.1. Consider the so-called Heaviside function
θ(x) = 1 if x > 0, and θ(x) = 0 if x ≤ 0.

Then

hθ′ , φi = −hθ, φ′ i = − ∫_0^∞ φ′ (x)dx = φ(0) = hδ, φi.

Thus, θ′ = δ. 
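The computation θ′ = δ can be replayed numerically: the pairing −hθ, φ′ i evaluated by quadrature returns φ(0). A short sketch of our own (here φ is the bump ω1 , so φ(0) = e^{−1}):

```python
import numpy as np

x = np.linspace(-2.0, 2.0, 40001)
dx = x[1] - x[0]

phi = np.zeros_like(x)                  # test function: the bump omega_1, supp in (-1, 1)
m = np.abs(x) < 1
phi[m] = np.exp(-1.0 / (1.0 - x[m]**2))
dphi = np.gradient(phi, dx)             # phi' by central differences

theta = np.where(x > 0, 1.0, 0.0)       # Heaviside function

lhs = -np.sum(theta * dphi) * dx        # <theta', phi> = -<theta, phi'>
rhs = np.exp(-1.0)                      # <delta, phi> = phi(0) = e^{-1}
print(abs(lhs - rhs) < 1e-4)  # True
```

The sum on the left telescopes to φ(0) regardless of the grid, which mirrors the exact computation −∫_0^∞ φ′ = φ(0).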
Example 9.2. More generally, let f be a function of class C 1 on (−∞, x0 ] and of class C 1 on
[x0 , ∞). Denote by [f ]x0 := f (x0 + 0) − f (x0 − 0) its “jump” at x0 . Denote also by Tf ′ the regular
distribution defined by the usual derivative f ′ of f (taken away from x0 ). We claim that

f ′ = Tf ′ + [f ]x0 δ(x − x0 ),

where the derivative f ′ on the left is understood in the sense of distributions. For any ϕ ∈ D(R)
we have

hf ′ , ϕi = −hf, ϕ′ i = − ∫R f (x)ϕ′ (x)dx = [f ]x0 ϕ(x0 ) + ∫R f ′ (x)ϕ(x)dx = h[f ]x0 δ(x − x0 ) + Tf ′ , ϕi.

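Example 9.2 can also be checked numerically. Below is a sketch of our own: we take x0 = 0, a function with jump [f ]0 = 1, and compare −hf, φ′ i with hTf ′ + [f ]0 δ, φi on a grid:

```python
import numpy as np

x = np.linspace(-2.0, 2.0, 40001)
dx = x[1] - x[0]

phi = np.zeros_like(x)                        # test function supported in (-1, 1)
m = np.abs(x) < 1
phi[m] = np.exp(-1.0 / (1.0 - x[m]**2))
dphi = np.gradient(phi, dx)

f = np.where(x >= 0, x**2 + 1.0, x**3)        # jump [f]_0 = 1 at x_0 = 0
f_cl = np.where(x >= 0, 2 * x, 3 * x**2)      # the classical derivative away from 0

lhs = -np.sum(f * dphi) * dx                  # <f', phi> = -<f, phi'>
rhs = np.sum(f_cl * phi) * dx + phi[np.argmin(np.abs(x))]  # <T_{f'} + [f]_0 delta, phi>
print(abs(lhs - rhs) < 1e-3)  # True
```

The delta term is exactly what the point-wise derivative misses: dropping `phi[np.argmin(np.abs(x))]` from `rhs` makes the two sides disagree by φ(0).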
Example 9.3. Let f (x) = ln |x|. Then for every ϕ ∈ D(R) we obtain

h(ln |x|)′ , ϕi = −hln |x|, ϕ′ i = − ∫R ln |x| ϕ′ (x)dx
= − lim_{ε−→0+} [ ∫_{−∞}^{−ε} ln |x| ϕ′ (x)dx + ∫_{ε}^{+∞} ln |x| ϕ′ (x)dx ]
= − lim_{ε−→0+} [ ln ε ϕ(−ε) − ∫_{−∞}^{−ε} (ϕ(x)/x) dx − ln ε ϕ(ε) − ∫_{ε}^{∞} (ϕ(x)/x) dx ]
= − lim_{ε−→0+} [ ln ε (ϕ(−ε) − ϕ(ε)) − ∫_{|x|≥ε} (ϕ(x)/x) dx ]
= lim_{ε−→0+} ∫_{|x|≥ε} (ϕ(x)/x) dx = hP (1/x), ϕi,

since ln ε (ϕ(−ε) − ϕ(ε)) −→ 0 as ε −→ 0+ (because ϕ(−ε) − ϕ(ε) = O(ε)). Thus

(ln |x|)′ = P (1/x).


Example 9.4. We have


hδ ′ , φi = −hδ, φ′ i = −φ′ (0).

9.2. Basic differential equations with distributions. We saw in the previous examples that
the usual point-wise derivative does not give a full information about the derivative in the sense
of distributions: the Dirac delta-function appears at the points of discontinuity. The following
important statement shows that this does not happen for derivatives in the sense of distributions.
Theorem 9.1. Let f ∈ D0 ((a, b)) and f ′ = 0 in D0 ((a, b)). Then f is constant, i.e., there exists a
real constant c ∈ R such that f = Tc .
Proof. By hypothesis, for every ϕ ∈ D((a, b)) one has hf, ϕ′ i = 0. Given a function ψ ∈ D((a, b)),
its primitive

ϕ(x) = ∫_{−∞}^{x} ψ(t)dt

is identically constant on the interval [A, ∞), where A < b is the sup of the support of ψ. Hence,
ϕ is in D((a, b)) if and only if

J(ψ) := ∫_{−∞}^{+∞} ψ(t)dt = 0.

Now fix a function τ0 ∈ D((a, b)) such that J(τ0 ) = 1, and given φ ∈ D((a, b)) set ψ = φ − J(φ)τ0 .
Then J(ψ) = 0 and so ψ = ϕ′ for some ϕ ∈ D((a, b)). Therefore, hf, ψi = 0 and hf, φi =
hf, τ0 iJ(φ) = c J(φ) = hTc , φi, with c = hf, τ0 i, for every φ ∈ D((a, b)), which proves the theorem. 
Corollary 9.2. Let f ∈ D0 ((a, b)) and f ′ ∈ C((a, b)). Then f is a regular distribution and f ∈
C 1 ((a, b)).

Proof. The continuous function f ′ admits a primitive f˜ of class C 1 ((a, b)). Then (f − f˜)′ = 0 in
D0 ((a, b)) and Theorem 9.1 can be applied. 

We now extend these results to distributions in several variables.


Theorem 9.3. Let Ω′ be a domain in Rn−1 and I = (a, b) be an interval in R. Assume that a
distribution f ∈ D0 (Ω′ × I) satisfies

∂f /∂xn = 0

in D0 (Ω′ × I). Then there exists a distribution f˜ ∈ D0 (Ω′ ) such that for every ϕ ∈ D(Ω′ × I)

hf, ϕi = ∫R hf˜(x′ ), ϕ(x′ , xn )idxn ,

where x′ = (x1 , . . . , xn−1 ). In this sense the distribution f is independent of the variable xn .
Proof. Fix a function τ0 ∈ D(I) such that ∫R τ0 dt = 1. We lift every φ ∈ D(Ω′ ) to a function
φ̃ ∈ D(Ω′ × I) by setting φ̃(x′ , xn ) = φ(x′ )τ0 (xn ). This allows us to define a distribution f˜ ∈ D0 (Ω′ )
by setting hf˜, φi = hf, φ̃i, φ ∈ D(Ω′ ).
Given ψ ∈ D(Ω′ × I) put

J(ψ)(x′ ) = ∫R ψ(x′ , xn )dxn .
Similarly to the proof of Theorem 9.1, for every ψ ∈ D(Ω′ × I) there exists a function ϕ ∈ D(Ω′ × I)
such that

ψ(x) − J(ψ)(x′ )τ0 (xn ) = ∂ϕ(x)/∂xn .

Then by the assumptions of the theorem, hf, ∂ϕ/∂xn i = 0, and by the definition of the distribution f˜
we have

hf, ψi = hf, J(ψ)(x′ )τ0 (xn )i = hf˜, J(ψ)i = hf˜, ∫R ψ(x′ , xn )dxn i.

It remains to show that

hf˜, ∫R ψ(x′ , xn )dxn i = ∫R hf˜(x′ ), ψ(x′ , xn )idxn .

Fix ψ ∈ D(Ω′ × I) and consider the functions F1 (xn ) = hf˜(x′ ), ∫_{−∞}^{xn} ψ(x′ , t)dti and F2 (xn ) =
∫_{−∞}^{xn} hf˜(x′ ), ψ(x′ , t)idt. Then it follows from Theorem 8.7 that F1′ = F2′ . Since lim_{xn →−∞} Fj = 0,
we obtain F1 ≡ F2 . This concludes the proof. 
Corollary 9.4. Let f ∈ D0 (Ω) satisfy ∂f /∂xj = 0, j = 1, ..., n. Then f is constant.
Finally we establish a weak, but useful analogue of Corollary 9.2.
Theorem 9.5. Let f and g be continuous functions in a domain Ω ⊂ Rn . Suppose that
∂Tf /∂xn = Tg .

Then the usual partial derivative ∂f /∂xn exists at every point x ∈ Ω and is equal to g(x).

Proof. The statement is local, so without loss of generality we assume that Ω = Ω′ × I in the
notation of Theorem 9.3. Fix a point c ∈ I and set

v(x) = ∫_c^{xn} g(x′ , t)dt.

Then ∂(f − v)/∂xn = 0 in D0 (Ω′ × I), and Theorem 9.3 gives the existence of a distribution f˜ ∈ D0 (Ω′ )
such that f − v = f˜. Furthermore, since f − v is continuous, it follows from the construction of f˜
in the proof of Theorem 9.3 that f˜ is a continuous function in x′ (defining a regular distribution).
Then the function f (x) = v(x) + f˜(x′ ) admits a partial derivative in xn which coincides with g.
This proves the theorem. 
9.3. Support of a distribution. Distributions with compact support. Let f ∈ D0 (Ω0 ), and
Ω ⊂ Ω0 be a subdomain. By the restriction of f to Ω we mean a distribution f |Ω acting by
hf |Ω , ϕi := hf, ϕi, ϕ ∈ D(Ω) ⊂ D(Ω0 ),
where a test function ϕ ∈ D(Ω) is viewed as an element of D(Ω0 ) by extending it by zero.
We say that a distribution f ∈ D0 (Rn ) vanishes on an open subset U ⊂ Rn if hf, ϕi = 0 for any
ϕ ∈ D(U ), i.e., its restriction to U vanishes identically. We express this as f |U ≡ 0.
Definition 9.6. The support supp f of a distribution f ∈ D0 (Rn ) is the subset of Rn with the
following property: x ∈ supp f if and only if for every neighbourhood U of x there exists φ ∈ D(U )
(and so supp φ ⊂ U ) such that hf, φi 6= 0, i.e., f does not vanish identically in any neighbourhood
of x.
It follows from the definition of supp f that it is a closed subset of Rn , and so its complement is
an open (but not necessarily connected) subset of Rn . Indeed, the set Rn \supp f is formed by all
points x such that f vanishes identically in some neighbourhood of x and so it is clearly open.
Proposition 9.7. Let X be an open subset of Rn and suppose that f ∈ D0 (Rn ) vanishes identically in a
neighbourhood of every point of X. Then f |X ≡ 0.
Proof. Given a point x ∈ X there exists a neighbourhood Uα of x such that f |Uα ≡ 0. Let φ ∈ D(X).
Consider a neighbourhood U of supp φ such that the closure U is a compact subset of X. Let
(ηγ ) be a partition of unity subordinated to a finite sub-covering of U by the sets (Uα ) (see Section 7). Then
hf, φi = Σγ hf, ηγ φi = 0 since every ηγ φ ∈ D(Uα ) for some α. 
Example 9.5. If f is a regular distribution defined by a function f ∈ C(Rn ) then its support in
the sense of distributions coincides with the support in the usual sense since f vanishes on an open
set U as a distribution if and only if it vanishes as a usual function. 
Example 9.6. supp δ(x) = {0}. 
A remarkable property of distributions with a compact support in Rn is that one can extend them
as linear continuous functionals defined on the space C ∞ (Rn ). Let f ∈ D0 (Rn ) have a compact
support supp f = K in Rn . Fix a function η ∈ C0∞ (Rn ) such that η(x) = 1 in a neighbourhood of
K. Then for every ψ ∈ C ∞ (Rn ) the function ηψ is in D(Rn ) and we set
(1) hf, ψi := hf, ηψi,
since the right-hand side is well-defined. This definition is independent of the choice of η. Indeed,
let η ′ ∈ C0∞ (Rn ) be another function equal to 1 in a neighbourhood of K. Then η − η ′ vanishes in
a neighbourhood of K, and for any ψ ∈ C ∞ (Rn ) the support of the function (η − η ′ )ψ ∈ D(Rn ) is
contained in Rn \K. By Definition 9.6 we have

hf, ηψi − hf, η ′ ψi = hf, (η − η ′ )ψi = 0.
Hence, (1) is independent of the choice of η. The extension of f defined above (still denoted by f )
is, of course, a linear continuous functional on C ∞ (Rn ). Indeed, let a sequence ψ k converge to ψ
in C ∞ (Rn ), i.e., ψ k converges to ψ together with all derivatives uniformly on every compact subset
of Rn . Then ηψ k converges to ηψ in D(Rn ) and
hf, ψ k i = hf, ηψ k i −→ hf, ηψi = hf, ψi.
Example 9.7. For every ψ ∈ C ∞ (Rn ) we have
hδ(x), ψi = hδ(x), ηψi = η(0)ψ(0) = ψ(0)
since η = 1 in a neighbourhood of supp δ(x) = {0}. 
v: 2019-04-01

10. Structure theorems and convolution of distributions


10.1. Structure theorems. We introduce the following property (?).
Definition 10.1. A linear functional f : D(Ω) → R satisfies condition (?) if for every compact
subset K in Ω there exists C = C(K) > 0 and a positive integer k = k(K) such that
(1) |hf, φi| ≤ C||φ||C k (K) , ∀φ ∈ D(Ω), supp φ ⊂ K.
We have the following characterization of distributions.
Theorem 10.2. A linear functional f on the space D(Ω) is a distribution if and only if it satisfies
condition (?).
Proof. If f satisfies (?) then f is clearly continuous, and so f ∈ D0 (Ω). To prove the converse,
assume that f ∈ D0 (Ω). Arguing by contradiction, suppose that f does not satisfy (?). Then there
exists a compact K in Ω such that for every C and k the inequality (1) fails for some φ ∈ D(Ω) with
supp φ ⊂ K. In particular, we can set C = k = j and take a function φj ∈ D(Ω) with supp φj ⊂ K
such that
(2) |⟨f, φj⟩| > j ‖φj‖_{C^j(K)}, j = 0, 1, 2, . . . .
By linearity of the expressions on both sides, this inequality still holds if we replace φj by the
function ψj = φj / ⟨f, φj⟩. Then
1/j > ‖ψj‖_{C^j(K)}, j = 0, 1, 2, . . . .
Fix a positive integer k. Then for j ≥ k we have
‖ψj‖_{C^k(K)} ≤ ‖ψj‖_{C^j(K)} < j⁻¹ → 0, as j → ∞.
Therefore, the sequence (ψj) converges to 0 in D(Ω) but ⟨f, ψj⟩ = 1. This contradiction proves the
theorem. 
Let Ω be a domain in Rn , f ∈ D0 (Ω) and k ≥ 0 be an integer. We say that a distribution f
has the order of singularity ≤ k if there exists a constant C = C(Ω, f ) > 0 such that for every
ϕ ∈ D(Ω) we have
|hf, ϕi| ≤ C k ϕ kC k (Ω) .
Thus, f satisfies condition (1) with the same k for every compact K in Ω, i.e., k can be chosen
independently of K. We say that the order of singularity of f is equal to k if this estimate does
not hold for some k 0 < k.
Example 10.1. Let Tf be the regular distribution defined by a function f ∈ L1(Ω). Then its order
of singularity is 0. 
Example 10.2. The order of singularity of δ (k) (x) is equal to k. 
The following property of distributions is often used.

Theorem 10.3. Let Ω′ be a domain in Rⁿ and Ω be a bounded subdomain such that the closure of Ω
is contained in Ω′. Then for every distribution f ∈ D′(Ω′) its restriction to Ω is a distribution of
finite order of singularity.
Thus, the theorem claims that there exist an integer k ≥ 0 (depending on f and Ω) and a
constant C = C(Ω, f) > 0 such that for every ϕ ∈ D(Ω) we have
|⟨f, ϕ⟩| ≤ C ‖ϕ‖_{C^k(Ω)}.
The proof is similar to the previous one.

Proof. Arguing by contradiction, suppose that there exists a sequence ϕm ∈ D(Ω) such that
|⟨f, ϕm⟩| > m ‖ϕm‖_{C^m(Ω)}
for every m = 1, 2, .... Set ψm = αm ϕm, where αm is a real number. Then by linearity we still have
|⟨f, ψm⟩| > m ‖ψm‖_{C^m(Ω)}.
Let αm = (m ‖ϕm‖_{C^m(Ω)})⁻¹. Then
(3) |⟨f, ψm⟩| > m ‖ψm‖_{C^m(Ω)} = 1.
On the other hand, ‖ψm‖_{C^m(Ω)} = 1/m for every m. Then for every β such that |β| ≤ m, we have
‖D^β ψm‖_{C(Ω)} ≤ 1/m.
Thus, the sequence (ψm) converges to 0 together with all partial derivatives of all orders, and the
supports of ψm are contained in the closure of Ω, a compact subset of Ω′. Then ψm −→ 0 in D(Ω′)
and ⟨f, ψm⟩ −→ 0. This contradicts (3). 

The following is a consequence of Theorem 10.3.


Proposition 10.4. Let f ∈ D′(Rⁿ) satisfy supp f = {0}. Then there exist an integer k ≥ 0 and
constants Cα such that
f = Σ_{|α|≤k} Cα D^α δ(x).

Proof. Let a function η ∈ D(Rⁿ) be equal to 1 in a neighbourhood of 0 and vanish outside
B(0, 1) = {|x| < 1}. Consider a function ϕ ∈ D(Rⁿ). Let Ω be a domain in Rⁿ containing
supp ϕ ∪ B(0, 2). Applying Theorem 10.3 to f in Ω we conclude that there exist an integer k ≥ 0
(depending on Ω) and a constant C = C(Ω, f) > 0 such that for every φ ∈ D(Ω) we have
(4) |⟨f, φ⟩| ≤ C ‖φ‖_{C^k(Ω)}.
Set h(x) = ϕ(x) − Σ_{|α|≤k} (D^α ϕ(0) x^α)/α! and
ψs(x) = h(x)η(sx).
Then for every integer s ≥ 1 we have
(5) ⟨f, ψ1⟩ = ⟨f, ψs⟩.
Indeed, ⟨f, ψ1⟩ − ⟨f, ψs⟩ = ⟨f, (η(x) − η(sx))h⟩ = 0 since (η(x) − η(sx))h = 0 in a neighbourhood
of 0 and supp f = {0}. Since supp ψs ⊂ Ω for every s ≥ 1, we obtain that ψs ∈ D(Ω) and by (4),
|⟨f, ψs⟩| ≤ C ‖ψs‖_{C^k(Ω)}, s ≥ 1.
It follows easily from the definition of ψs that ‖ψs‖_{C^k(Ω)} −→ 0 as s −→ ∞. But then (5) implies
that ⟨f, ψ1⟩ = 0, that is, ⟨f, ϕη⟩ = Σ_{|α|≤k} (⟨f, x^α η⟩/α!) D^α ϕ(0). Since (1 − η)ϕ vanishes in a
neighbourhood of 0 and supp f = {0}, we also have ⟨f, ϕ⟩ = ⟨f, ϕη⟩. Therefore,
⟨f, ϕ⟩ = Σ_{|α|≤k} (⟨f, x^α η⟩/α!) D^α ϕ(0) = Σ_{|α|≤k} Cα ⟨D^α δ(x), ϕ⟩,
where Cα = (−1)^{|α|} ⟨f, x^α η⟩/α! are independent of ϕ. 


Example 10.3. Let a function f ∈ L1loc(Rⁿ \ {0}) satisfy the following condition: there exist a
constant C > 0 and an integer m > 0 such that
(6) |f(x)| ≤ C/|x|^m, ∀x ∈ {x ∈ Rⁿ : |x| ≤ 1}.
We will show that f admits an extension past the origin as a distribution, i.e., there exists f̃ ∈
D′(Rⁿ) such that
⟨f̃, ϕ⟩ = ∫_{Rⁿ} f(x)ϕ(x)dx
for every ϕ ∈ D(Rⁿ \ {0}).
First of all let us recall the general Taylor formula: let a ∈ Rⁿ and let ψ be a function of class C∞
in a neighbourhood of a. Then for every integer k ≥ 0 there exists a neighbourhood U of a
such that for x ∈ U we have
ψ(x) = Σ_{0≤|α|≤k} (1/α!) D^α ψ(a)(x − a)^α + Σ_{|α|=k+1} ((k+1)/α!) (x − a)^α ∫_0^1 (1 − t)^k D^α ψ(tx + (1 − t)a) dt.
As usual we use here the notation α! = α₁!···αₙ! and x^α = x₁^{α₁}···xₙ^{αₙ}. We define the
distribution f̃ by the formula
⟨f̃, ϕ⟩ = I₁ + I₂,
where
I₁ = ∫_{|x|≥1} f(x)ϕ(x)dx,
I₂ = ∫_{|x|≤1} f(x) (ϕ(x) − Σ_{|α|≤m−1} (1/α!) D^α ϕ(0) x^α) dx.

Using the Taylor formula and condition (6) we obtain
|I₂| ≤ C Σ_{|α|=m} sup_{Rⁿ} |D^α ϕ|.
Using the condition supp ϕ ⊂ {x : |x| ≤ M} we also obtain
|I₁| ≤ ∫_{1≤|x|≤M} |f(x)ϕ(x)|dx ≤ C′ sup_{Rⁿ} |ϕ|.
From this and Theorem 10.2 we conclude that f̃ is a well-defined distribution in D′(Rⁿ). 
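The two integrals I₁ and I₂ can be probed numerically. The sketch below treats the one-dimensional case m = 1, f(x) = 1/|x|, pairing f̃ with a standard bump supported in (−1, 1) (so I₁ = 0 and only the Taylor-subtracted term I₂ remains); the particular bump, grid size, and helper names are illustrative assumptions, not part of the notes.

```python
import math

def bump(x):
    # standard bump test function, supported in (-1, 1)
    return math.exp(-1.0 / (1.0 - x * x)) if abs(x) < 1 else 0.0

def pairing(phi, m=1, N=200_000):
    # <f~, phi> for f(x) = 1/|x|^m in 1D with supp(phi) inside (-1, 1):
    # I1 = 0, and I2 subtracts the degree m-1 Taylor polynomial of phi at 0
    # (for m = 1 this is just phi(0)), making the integrand ~ |x| near 0.
    h = 2.0 / N
    total = 0.0
    for k in range(N):
        x = -1.0 + (k + 0.5) * h  # midpoints avoid x = 0 and x = +-1
        total += (phi(x) - phi(0.0)) / abs(x) ** m * h
    return total

val = pairing(bump)
print(val)  # a finite number, even though 1/|x| is not integrable near 0
```

The subtraction is exactly what makes the integral converge: without it the midpoint sums would diverge logarithmically as the grid is refined.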
Finally, consider an example of a distribution of infinite order.
Example 10.4. Consider a linear functional on D(R) defined by
⟨f, ϕ⟩ = Σ_{n=0}^∞ ϕ⁽ⁿ⁾(n).
It follows by Theorem 8.5 that f ∈ D′(R). We leave it to the reader to prove, arguing by
contradiction, that f is not of finite order. 
10.2. Regularization and convolution with test-functions. Recall that the convolution f ∗ g
of two functions f, g ∈ L²(Rⁿ) is defined by
(7) f ∗ g(x) = ∫_{Rⁿ} f(y)g(x − y)dy = ∫_{Rⁿ} f(x − y)g(y)dy.
This makes natural the following general definition.
Definition 10.5. A convolution of a distribution f ∈ D0 (Rn ) and a test function ϕ ∈ D(Rn ) is
defined by
(8) f ∗ ϕ(x) = hf (y), ϕ(x − y)i.
Note the following: for every x ∈ Rn the function y 7→ ϕ(x − y) is a test-function; on the right-
hand side of (8) we apply distribution f to this function which is stressed by the notation f (y).
Thus, f ∗ ϕ is defined as a usual function on Rn .
Proposition 10.6. We have f ∗ φ ∈ C∞(Rⁿ) and
(9) D^α(f ∗ φ) = f ∗ D^α φ = (D^α f) ∗ φ.
Proof. The regularity of f ∗ ϕ and the first equality of (9) follow from Theorem 8.7. Let us prove
the second equality in (9). We have
((∂f/∂xj) ∗ ϕ)(x) = ⟨(∂f/∂yj)(y), ϕ(x − y)⟩ = −⟨f(y), (∂/∂yj)(ϕ(x − y))⟩
= ⟨f(y), (∂ϕ/∂xj)(x − y)⟩ = (f ∗ ∂ϕ/∂xj)(x).
The rest of the proof is done by induction. 
A very important special case arises if we take the bump function ωε as ϕ in the definition of
convolution. This leads to
Definition 10.7. The convolution fε := f ∗ ωε is called the regularization of a distribution f .
Proposition 10.8. We have
(i) fε ∈ C ∞ (Rn ).
(ii) (Dα f )ε = Dα (fε ) .
(iii) If f ∈ C(Rn ) then fε −→ f, ε −→ 0+ in C(Ω) for every bounded subset Ω of Rn .
(iv) If ϕ ∈ D(Rn ) then ϕε ∈ D(Rn ) and ϕε −→ ϕ in D(Rn ) as ε −→ 0+.
(v) if f ∈ D0 (Rn ) then fε −→ f in D0 (Rn ) as ε −→ 0+.
Proof. Parts (i) and (ii) follow from Proposition 10.6. Part (iii) is established in Proposition 7.6,
so it remains to show (iv) and (v). If ϕ ∈ D(Rn ), then its support is compact, say, ϕ(x) = 0 when
|x| ≥ A for some A > 0. Then the formula for the regularization of functions shows that ϕε (x) = 0
for |x| ≥ A + ε so supp ϕε ⊂ K = {x ∈ Rn : |x| ≤ A + 1} for ε < 1. It follows now from (iii) and
(ii) that ϕε converges to ϕ as ε −→ 0+ uniformly on K together with all partial derivatives of any
order. Hence ϕε −→ ϕ, ε −→ 0+ in D(Rn ) and we obtain (iv).
To prove (v), we view fε = ⟨f(y), ωε(x − y)⟩ as a distribution acting on every ψ ∈ D(Rⁿ) by
⟨fε, ψ⟩ = ∫_{Rⁿ} ⟨f(y), ωε(x − y)⟩ψ(x)dx.
It follows from Theorem 8.7 that
(10) ∫_{Rⁿ} ⟨f(y), ωε(x − y)⟩ψ(x)dx = ⟨f(y), ∫_{Rⁿ} ωε(x − y)ψ(x)dx⟩ = ⟨f, ψε⟩.
To see this we consider for simplicity of notation the case n = 1. Let
F(t) = ∫_{−∞}^t ⟨f(y), ωε(x − y)⟩ψ(x)dx,
and
G(t) = ⟨f(y), ∫_{−∞}^t ωε(x − y)ψ(x)dx⟩.
Then by Theorem 8.7, F′(t) = G′(t) = ⟨f(y), ωε(t − y)⟩ψ(t). Since F(−∞) = G(−∞) = 0, we
have F(t) = G(t) for all t, and we pass to the limit as t → +∞. This proves (10). Then by (iv),
⟨fε, ψ⟩ = ⟨f, ψε⟩ −→ ⟨f, ψ⟩ as ε −→ 0+.
This concludes the proof. 
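Properties (iii)–(v) are easy to watch numerically. The sketch below regularizes the Heaviside function θ: since θε(x) = ∫ θ(x − εu)ω(u)du = ∫_{u<x/ε} ω(u)du, the regularization is smooth, equals 0 for x ≤ −ε and 1 for x ≥ ε, and converges to θ away from 0. The normalizing constant of the bump ω is computed numerically; this is an illustrative sketch, not the notes’ construction.

```python
import math

def omega_raw(u):
    # bump supported in (-1, 1), before normalization
    return math.exp(-1.0 / (1.0 - u * u)) if abs(u) < 1 else 0.0

N = 20_000
h = 2.0 / N
mass = sum(omega_raw(-1.0 + (k + 0.5) * h) for k in range(N)) * h

def theta_eps(x, eps):
    # (theta * omega_eps)(x) = integral of normalized omega(u) over u < x/eps
    s = 0.0
    for k in range(N):
        u = -1.0 + (k + 0.5) * h
        if u < x / eps:
            s += omega_raw(u) * h
    return s / mass

eps = 0.1
print(theta_eps(-2 * eps, eps), theta_eps(0.0, eps), theta_eps(2 * eps, eps))
```

The three printed values approximate 0, 1/2 (by the symmetry of ω), and 1: the jump has been smeared over an ε-neighbourhood of the origin, exactly as claimed in (iii).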
As an application we give another proof of Corollary 9.4: if f ∈ D′(Rⁿ) is such that ∂f/∂xj = 0
in D′(Rⁿ) for j = 1, ..., n, then f = const.
Proof of Corollary 9.4. For every ε > 0 we have 0 = (∂f/∂xj)ε = ∂(fε)/∂xj. Since fε is a usual
function of class C∞, we conclude that fε = C(ε). Then,
⟨f, ϕ⟩ = lim_{ε→0+} ⟨fε, ϕ⟩ = lim_{ε→0+} C(ε) ∫ ϕ(x)dx for any ϕ ∈ D(Rⁿ).
In particular, set ϕ = ω(x), so that ∫ ω(x)dx = 1. We obtain that C = lim_{ε→0+} C(ε) exists, and
f = C. 

10.3. Convolution of distributions. Let f and g be functions in L²(Rⁿ). For a moment assume
that f ∗ g is in L1loc(Rⁿ). Then it defines a regular distribution acting on ϕ ∈ D(Rⁿ) by
⟨f ∗ g(x), ϕ(x)⟩ = ∫_{Rⁿ} f ∗ g(x)ϕ(x)dx = ∫ (∫ f(y)g(x − y)dy) ϕ(x)dx.
By Fubini’s theorem,
∫ (∫ f(y)g(x − y)dy) ϕ(x)dx = ∫ f(y) (∫ g(x − y)ϕ(x)dx) dy
= ∫ f(y) (∫ g(t)ϕ(t + y)dt) dy = ∫ f(y)⟨g(t), ϕ(t + y)⟩dy
= ⟨f(y), ⟨g(t), ϕ(t + y)⟩⟩.
Thus,
(11) ⟨f ∗ g(x), ϕ(x)⟩ = ⟨f(y), ⟨g(t), ϕ(t + y)⟩⟩, ϕ ∈ D(Rⁿ).
Therefore, in the general case of arbitrary distributions f, g ∈ D0 (Rn ) it is natural to take equality
(11) as a definition of the convolution f ∗ g. However, the right-hand side of equality (11) is not
defined for arbitrary distributions f and g since the function y 7→ hg(t), ϕ(t + y)i is just of class
C ∞ (Rn ) and in general need not have compact support. The support is clearly compact if the
distribution g itself has compact support. So in this case the convolution is well-defined. Similarly,
if f has compact support then it acts on any function from C ∞ (Rn ) as previously discussed, so the
right-hand side of equality (11) is also well-defined. We summarize this in the following.
Proposition 10.9. The convolution f ∗ g of two distributions f, g ∈ D0 (Rn ) is a distribution
correctly defined by the equality (11) if at least one of the distributions f and g has compact support.
Example 10.5. For any distribution f ∈ D0 (Rn ) we have
hf ∗ δ, ϕi = hf (y), hδ(t), ϕ(t + y)ii = hf (y), ϕ(y)i,
that is f ∗ δ = f . Furthermore
hδ ∗ f, ϕi = hδ(y), hf (t), ϕ(t + y)ii = hf (t), ϕ(t)i,
so that δ ∗ f = f . We obtain the following fundamental identity
f ∗δ =δ∗f =f
for any f ∈ D0 (Rn ). 
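The identity f ∗ δ = δ ∗ f = f has an exact discrete analogue: convolving a finitely supported sequence with the unit impulse returns the sequence unchanged. A small sketch (the helper `conv` is ours, not from the notes):

```python
def conv(a, b):
    # discrete convolution: (a * b)[n] = sum_k a[k] * b[n - k]
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

f = [2.0, -1.0, 3.0, 0.5]
delta = [1.0]          # discrete unit impulse at 0
print(conv(f, delta))  # [2.0, -1.0, 3.0, 0.5]
```

In other words, δ is the identity element for convolution, discretely as well as distributionally.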
We conclude this section with some algebraic properties of convolution.
(1) The map (f, g) ↦ f ∗ g is bilinear. This is obvious.
(2) We have
(12) Dα (f ∗ g) = f ∗ Dα g = Dα f ∗ g.

For the proof we consider ϕ ∈ D(Rⁿ). Then
⟨D^α(f ∗ g), ϕ⟩ = (−1)^{|α|} ⟨f ∗ g, D^α ϕ⟩ = ⟨f(y), ⟨g(t), (−1)^{|α|}(D^α ϕ)(t + y)⟩⟩
= ⟨f(y), ⟨D^α g(t), ϕ(t + y)⟩⟩ = ⟨f ∗ D^α g, ϕ⟩,
which proves the first equality of (12). For the second, we observe that ⟨D^α g(t), ϕ(t + y)⟩ =
(D^α g) ∗ ϕ(−y); hence it follows from (9) that
⟨D^α g(t), ϕ(t + y)⟩ = (−1)^{|α|} D^α ⟨g(t), ϕ(t + y)⟩.
Therefore,
⟨f(y), ⟨D^α g(t), ϕ(t + y)⟩⟩ = (−1)^{|α|} ⟨f(y), D^α ⟨g(t), ϕ(t + y)⟩⟩
= ⟨D^α f(y), ⟨g(t), ϕ(t + y)⟩⟩ = ⟨(D^α f) ∗ g, ϕ⟩.
A simpler proof can be given using Theorem 8.7:
(−1)^{|α|} ⟨f(y), ⟨g(t), (D^α ϕ)(t + y)⟩⟩ = (−1)^{|α|} ⟨f(y), D^α ⟨g(t), ϕ(t + y)⟩⟩
= ⟨D^α f(y), ⟨g(t), ϕ(t + y)⟩⟩.
v: 2019-04-04
11. Fundamental solutions of differential operators


11.1. Fundamental solutions. In this section we study linear differential equations of the form
(1) Σ_{|α|≤m} aα D^α u = f(x), f ∈ D′(Rⁿ),
with constant coefficients aα ∈ R. Define an order m linear differential operator
P(D) = Σ_{|α|≤m} aα D^α, aα ∈ R.
Then the partial differential equation (1) takes the form
(2) P(D)u = f(x), f ∈ D′(Rⁿ).
Let Ω be a domain in Rⁿ. We say that u ∈ D′(Rⁿ) is a generalized solution of (2) in Ω if u
satisfies this equation in Ω, that is,
Σ_{|α|≤m} aα ⟨D^α u, ϕ⟩ = ⟨f(x), ϕ⟩
for every ϕ ∈ D(Ω).


Suppose that f ∈ C(Ω). If a function u ∈ C m (Ω) satisfies (2), we call it a classical solution of
(2). Obviously, if u ∈ C m (Ω) is a generalized solution of (2), then it is a classical solution.
Definition 11.1. A distribution E ∈ D0 (Rn ) is called a fundamental solution of a differential
operator P (D) if
P (D)E = δ(x).
If u is a solution of the homogeneous equation P(D)u = 0, then E + u is also a fundamental
solution of P(D), so in general a fundamental solution is not unique. The importance of this notion
stems from the following statement.
Theorem 11.2. Let f ∈ D0 (Rn ) be a distribution such that the convolution
u=E∗f
exists in D′(Rⁿ). Then u is a solution of equation (2). Moreover, this solution of (2) is unique in
the class of distributions in D′(Rⁿ) admitting the convolution with E.
Proof. Using the properties of convolution we obtain
P(D)(E ∗ f) = Σ_{|α|≤m} aα D^α(E ∗ f) = (Σ_{|α|≤m} aα D^α E) ∗ f
= (P(D)E) ∗ f = δ ∗ f = f.

Thus u = E∗f defines a solution of (2). In order to prove the uniqueness in the class of distributions,
admitting the convolution with E, it suffices to prove that the homogeneous equation
P (D)v = 0
has a unique solution in this class. But this holds since
v = δ ∗ v = (P (D)E) ∗ v = E ∗ (P (D)v) = E ∗ 0 = 0.
This proves the theorem. 
Example 11.1. Let P(D) = d²/dx² on R. To solve the equation
(3) P(D)u = χ_{[0,1]}
we first find a fundamental solution of the operator P(D). If E satisfies d²E/dx² = δ, then by
Example 9.1 we have dE/dx = θ + c₁. For convenience we may take c₁ = −1/2. Then E = |x|/2 + c₂.
Take c₂ = 0; then E = |x|/2 is a fundamental solution. To find a generalized solution of (3) we
compute, according to Theorem 11.2, the convolution of the fundamental solution and the right-hand
side of (3). Since one of the functions has compact support, the convolution is well-defined, so we
have
E ∗ χ_{[0,1]}(x) = ∫_R (1/2)|y| χ_{[0,1]}(x − y)dy = ∫_R (1/2)|x − t| χ_{[0,1]}(t)dt = (1/2) ∫_0^1 |x − t|dt.
This integral is a well-defined C¹-smooth function on R given by
u(x) = −x/2 + 1/4 for x ≤ 0,
u(x) = x²/2 − x/2 + 1/4 for 0 < x < 1,
u(x) = x/2 − 1/4 for x ≥ 1.
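The piecewise formula for u can be sanity-checked numerically: away from the interface points 0 and 1 the second difference quotient of u reproduces χ_{[0,1]} (it is exact here, since u is piecewise quadratic), and u itself matches across x = 0 and x = 1. A quick sketch (the grid step h is an arbitrary choice):

```python
def u(x):
    # generalized solution of u'' = chi_[0,1] from Example 11.1
    if x <= 0:
        return -x / 2 + 0.25
    if x < 1:
        return x * x / 2 - x / 2 + 0.25
    return x / 2 - 0.25

def second_diff(x, h=1e-3):
    # discrete approximation of u''(x)
    return (u(x + h) - 2 * u(x) + u(x - h)) / (h * h)

print(second_diff(-0.5), second_diff(0.5), second_diff(2.0))
```

The three values approximate 0, 1, and 0, i.e. the right-hand side χ_{[0,1]} away from its jump points.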

In the next section we compute fundamental solutions of the classical linear operators in Rn .

11.2. Malgrange-Ehrenpreis theorem. The following fundamental result was obtained
independently by B. Malgrange and L. Ehrenpreis in 1954-55.
Theorem 11.3. A linear differential operator with constant coefficients admits a fundamental
solution in D′(Rⁿ).
We will follow the proof given by J.-P. Rosay (Amer. Math. Monthly, 98 (1991), no. 6, p.
518–523). In what follows it will be convenient to assume that all functions are complex valued.
We denote by ||·|| the L²-norm on Rⁿ, and by
⟨φ, ψ⟩ = ∫_{Rⁿ} φ ψ̄
the corresponding scalar product. If P(D) is a linear differential operator with constant coefficients
of order m, then its adjoint operator P∗(D) is defined by the identity
⟨P(D)φ, ψ⟩ = ⟨φ, P∗(D)ψ⟩ for all φ, ψ ∈ D(Rⁿ).
In particular, if
(4) P(D) = Σ_{|α|≤m} aα ∂^{|α|}/∂x^α, α = (α₁, . . . , αₙ),
then the adjoint operator takes the form
P∗(D) = Σ_{|α|≤m} (−1)^{|α|} aα ∂^{|α|}/∂x^α.

Proposition 11.4 (Hörmander’s inequality). Let P (D) be a nonzero linear differential operator
with constant coefficients of order m given by (4). Then for every bounded domain Ω ⊂ Rn , there
exists a constant C > 0, such that for every φ ∈ D(Ω), we have
||P (D)φ|| ≥ C||φ||.
One can take C = |P|_m K_{m,Ω}, where
|P|_m = max_{|α|=m} |aα|,
and K_{m,Ω} depends only on m and the quantity sup{|x| : x ∈ Ω}.
Proof. To illustrate the idea of the proof first consider the case n = 1, Ω = (0, 1), and P(D) = d/dx.
We need to show that there exists some C > 0 such that ||φ′|| ≥ C||φ|| for all φ ∈ D((0, 1)). We
have
⟨(xφ)′, φ⟩ = ⟨xφ′, φ⟩ + ⟨φ, φ⟩.
Using integration by parts, ⟨(xφ)′, φ⟩ = −⟨xφ, φ′⟩, and so ⟨φ, φ⟩ = −⟨xφ′, φ⟩ − ⟨xφ, φ′⟩. Since
|x| < 1, we get ||φ||² ≤ 2||φ′|| ||φ|| by the Hölder inequality (Thm 4.2). Hence, ||φ′|| ≥ (1/2)||φ||.
The general case is proved by induction on the degree of P . Define a linear differential operator
with constant coefficients Pj (D) by the following identity
P (D)(xj φ) = xj P (D)φ + Pj (D)φ.
The operator Pj (D) is zero iff P (D) does not involve any differentiation with respect to xj . If it is
nonzero, then Pj (D) is of order at most m − 1. Let A = supx∈Ω |x|. By induction on m, we will
show that for every φ ∈ D(Ω),
(5) ||Pj (D)φ|| ≤ 2mA ||P (D)φ||.
Observe that (5) and the definition of Pj yield
(6) ||P (D)(xj φ)|| ≤ (2m + 1)A ||P (D)φ||.
Since differential operators with constant coefficients commute, we have for all φ ∈ D(Ω),
||P (D)φ||2 = hP (D)φ, P (D)φi = hφ, P ∗ (D)P (D)φi = hφ, P (D)P ∗ (D)φi
= hP ∗ (D)φ, P ∗ (D)φi = ||P ∗ (D)φ||2 .
The inequality (5) is trivial for m = 0, since then Pj (D) = 0. Assuming that (5) is verified
for operators of order m − 1, we compute hP (D)(xj φ), Pj (D)φi in two different ways. From the
definition of Pj (D) we have,
hP (D)(xj φ), Pj (D)φi = hxj P (D)φ, Pj (D)φi + ||Pj (D)φ||2 .
By integration by parts (i.e., using the definition of the adjoint) and using commutativity of P ∗ (D)
and Pj (D), we obtain
hP (D)(xj φ), Pj (D)φi = hPj∗ (D)(xj φ), P ∗ (D)φi.
Therefore,
(7) ||Pj (D)φ||2 = hPj∗ (D)(xj φ), P ∗ (D)φi − hxj P (D)φ, Pj (D)φi.
By the induction hypothesis, equation (6) holds for all operators of order m−1, which when applied
to Pj∗ (D) yields
||Pj∗ (D)(xj φ)|| ≤ (2m − 1)A ||Pj (D)φ||.
And, since
|hxj P (D)φ, Pj (D)φi| ≤ A||P (D)φ|| ||Pj (D)φ||,
we obtain from (7) that
||Pj (D)φ||2 ≤ 2mA ||Pj (D)φ|| ||P (D)φ||,
which proves (5). If P (D) is an operator of order m ≥ 1, there exists j ∈ {1, . . . , n} such that
Pj (D) is of order m − 1, and |Pj |m−1 ≥ |P |m . Thus the proposition follows from (5) by induction
on m. 
Corollary 11.5. If Ω is a bounded domain in Rn , then for every g ∈ L2 (Ω) there exists u ∈ L2 (Ω)
such that P (D)u = g.
Proof. This follows from the inequality ||P ∗ (D)φ|| ≥ C||φ||, φ ∈ D(Ω). Indeed, P (D)u = g means
that for all φ ∈ D(Ω),
(8) hg, φi = hu, P ∗ (D)φi.
Let
E = {ψ ∈ D(Ω), ψ = P ∗ (D)φ for some φ ∈ D(Ω)}.
Consider the (anti)linear functional l : E → C given by
l(ψ) = hg, φi, where ψ = P ∗ (D)φ.
Then using Hörmander’s inequality we have
||l|| = sup_{||ψ||=1} |⟨g, φ⟩| ≤ ||g|| sup_{||ψ||=1} ||φ|| ≤ (||g||/C) sup_{||ψ||=1} ||P∗(D)φ|| = ||g||/C.
This shows that l is a bounded linear functional on E with the L²-norm. Therefore, l can be extended
to the closure of E in L²(Ω). Then the Riesz representation theorem (Theorem 4.11) gives the
existence of u in this closure such that l(ψ) = ⟨u, ψ⟩. This implies equation (8). 
We now wish to extend the above result to L2loc (Ω) functions. For this we first prove the following
Proposition 11.6. There exists C′ > 0 such that for all η ∈ R and φ ∈ D(Ω), we have
∫_Ω e^{ηx₁} |P(D)φ|² ≥ C′ ∫_Ω e^{ηx₁} |φ|².
Note that C′ is independent of η.
Proof. Apply Hörmander’s inequality to Ψ = e^{(η/2)x₁} φ and the operator Q(D) defined by
Q(D)(Ψ) = e^{(η/2)x₁} P(D)[e^{−(η/2)x₁} Ψ],
which is indeed a constant-coefficient operator of the same degree m as P(D). 
Corollary 11.7. Let φ ∈ D(Rn ) or more generally φ ∈ L2 (Rn ) with compact support. If P (D)φ is
supported in the ball B(0, r), then so is φ.
Proof. By letting η → +∞ in Proposition 11.6, one can immediately verify that if P (D)φ = 0
in the half-space {x1 > 0}, then φ = 0 there. From this, using translations and rotations, the
corollary can be verified in the case of a smooth φ. In the nonsmooth case, for ε < 1 consider the
regularization φε = φ ∗ ωε ∈ D(Rⁿ). Then P(D)φε = (P(D)φ) ∗ ωε is supported in B(0, r + ε) and
φε → φ in L2 as ε → 0 by Proposition 10.8. This reduces the problem to the smooth case. 
Proposition 11.8. Let 0 < r < r′ < R. If v ∈ L²(B(0, r′)) satisfies P(D)v = 0 on B(0, r′),
then there exists a sequence (vj) ⊂ L²(B(0, R)) such that P(D)vj = 0 on B(0, R) and vj → v in
L²(B(0, r)) as j → ∞.
Proof. After regularization we can assume that v is smooth, possibly shrinking r′ slightly. It suffices
to show that any continuous linear functional that vanishes on the space L2 (B(0, R))∩{α : P (D)α =
0} also vanishes at v. In other words (using the Riesz representation theorem), we have to show
that if g ∈ L2 (B(0, r)) and satisfies hα, giB(0,r) = 0 for all α ∈ L2 (B(0, R)) with P (D)α = 0, then
hv, giB(0,r) = 0.
Claim. There exists w ∈ L2 (B(0, R)) such that for all φ ∈ D(Rn ),
hφ, giB(0,r) = hP (D)φ, wiB(0,R) .

For the proof of the claim, we need to find C > 0 such that
|⟨φ, g⟩_{B(0,r)}| ≤ C||P(D)φ||_{B(0,R)}.
Notice that if P (D)φ = 0, then we have hφ, gi = 0. If P (D)φ 6= 0, then by Corollary 11.5 we
can find Ψ ∈ L2 (B(0, R)) so that P (D)Ψ = P (D)φ and ||Ψ||B(0,R) ≤ C1 ||P (D)φ||B(0,R) for some
C1 > 0. Then
hφ, giB(0,R) = hφ − Ψ, giB(0,R) + hΨ, giB(0,R) = hΨ, giB(0,R) .
Hence, |hφ, giB(0,R) | ≤ C||P (D)φ||B(0,R) with C = C1 ||g||, which proves the claim.
Pick w as given by the claim. Extend g and w on Rn to g̃ and w̃ by setting g̃ = 0 on Rn \ B(0, r)
and w̃ = 0 on Rn \ B(0, R). We then have g̃ = P ∗ (D)w̃. Since w̃ has compact support, and P ∗ (D)w̃
is supported in B(0, r), we conclude from Corollary 11.7 that w = 0 on B(0, R) \ B(0, r).
To complete the proof of the proposition take v as at the beginning of the proof, and extend
it to be a smooth, compactly supported function on Rn (but no longer satisfying P (D)v = 0 off
B(0, r)). One has
hv, giB(0,r) = hP (D)v, wiB(0,R) = hP (D)v, wiB(0,r) = 0.

We now can prove the following
Proposition 11.9. Let P (D) be a nonzero linear differential operator on Rn with constant coeffi-
cients. Then for every g ∈ L2loc (Rn ) there exists u ∈ L2loc (Rn ) such that P (D)u = g.
Proof. By Corollary 11.5 there exists u1 ∈ L2 (B(0, 2)) so that P (D)u1 = g on B(0, 2). Then
inductively, assuming up has been chosen in L2 (B(0, p + 1)) so that P (D)up = g, one chooses
up+1 in L2 (B(0, p + 2)) in the following way. Let w be an arbitrary solution of P (D)w = g, in
L2 (B(0, p + 2)). On B(0, p + 1) one has P (D)(up − w) = 0. By Proposition 11.8 there exists
v ∈ L2 (B(0, p + 2)) such that P (D)v = 0, and ||v − (up − w)||B(0,p) ≤ 1/2p . Set up+1 = v + w.
Then P (D)up+1 = g on B(0, p + 2), and ||up+1 − up ||B(0,p) ≤ 1/2p . The sequence (up ) is obviously
convergent in L2loc (Rn ), and its limit satisfies P (D)u = g. 
Proof of the Malgrange-Ehrenpreis Theorem. Let H be the function (the product of Heaviside
functions on R) defined on Rⁿ by
H(x₁, . . . , xₙ) = 1 if xⱼ > 0 for j = 1, . . . , n, and H(x₁, . . . , xₙ) = 0 otherwise.
Then
∂ⁿH/∂x₁ · · · ∂xₙ = δ₀.
Since H ∈ L²loc(Rⁿ), by the previous proposition there exists u ∈ L²loc(Rⁿ) so that P(D)u = H. Set
E = ∂ⁿu/∂x₁ · · · ∂xₙ.
Then
P(D)E = P(D) ∂ⁿu/∂x₁ · · · ∂xₙ = ∂ⁿ/∂x₁ · · · ∂xₙ (P(D)u) = δ₀.

v: 2019-04-08
12. Fundamental solutions of classical operators.


12.1. More advanced examples. Here we consider examples concerning distributions in Rⁿ,
n > 1. One of the most important examples is given by the Cauchy-Riemann operator
∂/∂z̄ = (1/2)(∂/∂x + i ∂/∂y)
on the complex plane C ≅ R² with the coordinate z = x + iy. This is a differential
operator with constant coefficients of order one.
First of all we adapt the integration by parts (formula (12) in Section 5.2) to the complex
notation. Let Ω be a bounded domain with C¹ boundary in C and f be a complex function of
class C(Ω̄). We suppose that (a connected component of) ∂Ω is positively parametrized by the
map [a, b] ∋ t ↦ x(t) + iy(t) of class C¹. Then
~n = (y′(t), −x′(t)) / √((x′(t))² + (y′(t))²)
is the vector field of the unit outward normal. Then, from the definition of the surface integral (see
Section 5.1) and using the notation dz = dx + idy, we have
∫_{∂Ω} f [(~n, ~e1) + i(~n, ~e2)]dS = ∫_a^b f(x(t), y(t))(y′(t) − ix′(t))dt = −i ∫_{∂Ω} f(z)dz.

Keeping this in mind, we pass to the integration by parts with the Cauchy-Riemann operator. For
two complex-valued functions u, v ∈ C¹(Ω̄) we have
∫_Ω (∂u/∂z̄) v dxdy = (1/2)∫_Ω (∂u/∂x) v dxdy + (i/2)∫_Ω (∂u/∂y) v dxdy
= (1/2)(∫_{∂Ω} uv(~n, e1)dS − ∫_Ω u (∂v/∂x) dxdy) + (i/2)(∫_{∂Ω} uv(~n, e2)dS − ∫_Ω u (∂v/∂y) dxdy)
= (1/2)∫_{∂Ω} uv[(~n, e1) + i(~n, e2)]dS − ∫_Ω u (∂v/∂z̄) dxdy
= (−i/2)∫_{∂Ω} uv dz − ∫_Ω u (∂v/∂z̄) dxdy.
Thus we obtain the following useful integration by parts formula:
(1) ∫_Ω (∂u/∂z̄) v dxdy = (−i/2)∫_{∂Ω} uv dz − ∫_Ω u (∂v/∂z̄) dxdy.
Lemma 12.1. The function 1/(πz) is a fundamental solution of the operator ∂/∂z̄, i.e.,
(2) (∂/∂z̄)(1/z) = πδ(x, y).
Proof. First note that 1/z ∈ L1loc(R²) (use polar coordinates to verify this), and so 1/z defines a
regular distribution. Let ϕ ∈ D(R²) be a (complex-valued) test function with supp ϕ ⊂ B(0, R).

For ε > 0 denote by A(ε, R) the annulus B(0, R)\B(0, ε). Denote also by Cε the circle {|z| = ε}.
Then (∂/∂z̄)(1/z) = 0 on A(ε, R), and using (1) with u = ϕ and v = 1/z we have
⟨(∂/∂z̄)(1/z), ϕ⟩ = −⟨1/z, ∂ϕ/∂z̄⟩ = − lim_{ε→0+} ∫_{A(ε,R)} (1/z)(∂ϕ/∂z̄) dxdy = lim_{ε→0+} (−i/2) ∫_{Cε} (ϕ/z) dz.
Here the integral over the circle Cε is taken with positive orientation with respect to the disc B(0, ε).
Writing
∫_{Cε} (ϕ/z) dz = ∫_{Cε} (ϕ(z) − ϕ(0))/z dz + ϕ(0) ∫_{Cε} dz/z,
we easily see that the first integral tends to 0 (use Taylor’s formula) and the second one tends to
2πiϕ(0). Hence,
lim_{ε→0+} (−i/2) ∫_{Cε} (ϕ/z) dz = πϕ(0),
which concludes the proof. 
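The limit at the heart of the proof, namely that −(i/2)∮_{Cε} ϕ/z dz tends to πϕ(0), can be observed numerically. The sketch below uses ϕ(z) = e^{−|z|²} (smooth, with ϕ(0) = 1; compact support is irrelevant on a small circle) and a midpoint rule in the angle; these choices are ours, not the notes’.

```python
import cmath, math

def phi(z):
    # smooth function with phi(0) = 1, standing in for a test function
    return math.exp(-abs(z) ** 2)

def circle_term(eps, N=4000):
    # -(i/2) * integral over C_eps of phi(z)/z dz, positive orientation
    dtheta = 2 * math.pi / N
    total = 0.0 + 0.0j
    for k in range(N):
        th = (k + 0.5) * dtheta
        z = eps * cmath.exp(1j * th)
        dz = 1j * z * dtheta  # dz = i * eps * e^{i*theta} dtheta
        total += phi(z) / z * dz
    return -0.5j * total

val = circle_term(1e-3)
print(val.real, math.pi)
```

As ε shrinks the value approaches π · ϕ(0) = π, matching the lemma.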

Using Lemma 12.1 we can easily deduce an integral representation involving the Cauchy-Riemann
operator. Fix z ∈ Ω. Denote by Ωε the domain Ω\B(z, ε) and by C(z, ε) the circle {ζ : |ζ − z| = ε}.
Let a complex function f be of class C¹(Ω̄). We set ζ = ξ + iη. Then, using (1) and (2), we have
(1/π) ∫_Ω (∂f/∂ζ̄)(ζ) · 1/(ζ − z) dξdη = lim_{ε→0+} (1/π) ∫_{Ωε} (∂f/∂ζ̄)(ζ) · 1/(ζ − z) dξdη
= (1/π) lim_{ε→0+} ((−i/2) ∫_{∂Ω} f(ζ)/(ζ − z) dζ + (i/2) ∫_{C(z,ε)} f(ζ)/(ζ − z) dζ)
= (−i/2π) ∫_{∂Ω} f(ζ)/(ζ − z) dζ − f(z).
Thus we obtained the so-called Cauchy-Green formula
(3) f(z) = (1/2πi) ∫_{∂Ω} f(ζ)/(ζ − z) dζ − (1/π) ∫_Ω (∂f/∂ζ̄)(ζ) · 1/(ζ − z) dξdη.
In particular, if f is holomorphic, i.e., ∂f/∂ζ̄ = 0 in Ω, we have the classical Cauchy integral formula
(4) f(z) = (1/2πi) ∫_{∂Ω} f(ζ)/(ζ − z) dζ.
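Formula (4) is easy to verify numerically for a polynomial, say f(ζ) = ζ², on the unit circle; the midpoint discretization converges extremely fast for periodic analytic integrands. The choices of f, z, and grid are ours:

```python
import cmath, math

def cauchy(f, z, N=2000):
    # (1/(2*pi*i)) * contour integral over |zeta| = 1 of f(zeta)/(zeta - z)
    dtheta = 2 * math.pi / N
    total = 0.0 + 0.0j
    for k in range(N):
        zeta = cmath.exp(1j * (k + 0.5) * dtheta)
        total += f(zeta) / (zeta - z) * 1j * zeta * dtheta  # dzeta = i*zeta*dtheta
    return total / (2j * math.pi)

z0 = 0.3 + 0.2j                      # a point inside the unit disc
val = cauchy(lambda w: w * w, z0)
print(val)                           # should reproduce z0**2
```

The boundary values of a holomorphic function determine its interior values, which is exactly what the quadrature reproduces.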
It is also easy to deduce now the Cauchy theorem. Let D be a domain in C and γ be a closed
simple path in D homotopic to 0. If f is a holomorphic function in D, then
(5) ∫_γ f(z)dz = 0.
Indeed, consider the domain Ω ⊂ D bounded by γ. Since γ is homotopic to 0, the boundary ∂Ω of
Ω coincides with γ (with suitable orientation). Fix z ∈ Ω. By the Cauchy formula (applied to ζf(ζ)
and to f) we have
∫_γ f(ζ)dζ = ∫_γ (ζ − z)f(ζ)/(ζ − z) dζ = ∫_γ ζf(ζ)/(ζ − z) dζ − z ∫_γ f(ζ)/(ζ − z) dζ
= 2πi·zf(z) − 2πi·zf(z) = 0,
which proves (5).
The next example is a generalization of Example (9.2).
Example 12.1. Let Ω be a bounded domain with C¹ boundary and f ∈ C¹(Rⁿ\Ω) ∩ C¹(Ω̄) (in
particular, all discontinuity points of f belong to ∂Ω). The usual partial derivative ∂f/∂xk is defined
and locally integrable on Rⁿ\∂Ω, so we can consider the regular distribution T_{∂f/∂xk} ∈ D′(Rⁿ). We
also introduce the “jump” of f on ∂Ω:
[f]_∂Ω(x) = f₊(x) − f₋(x) = lim_{Rⁿ\Ω ∋ x′ → x} f(x′) − lim_{Ω ∋ x′ → x} f(x′), x ∈ ∂Ω.

We point out here that if µ is a continuous function on a compact hypersurface Γ ⊂ Rⁿ, then
the distribution µδΓ defined by
⟨µδΓ, ϕ⟩ = ∫_Γ µ ϕ dS, ϕ ∈ D(Rⁿ),
is called the simple potential on the hypersurface Γ with density µ.
For k = 1, 2, . . . , n, consider the distribution [f]_∂Ω (ek, ~n)δ_∂Ω ∈ D′(Rⁿ) defined by
⟨[f]_∂Ω (ek, ~n) δ_∂Ω, ϕ⟩ = ∫_{∂Ω} [f]_∂Ω (ek, ~n) ϕ dS, ϕ ∈ D(Rⁿ).

Let us prove the formula for the partial derivative of f in the sense of distributions:
(6) ∂f/∂xk = T_{∂f/∂xk} + [f]_∂Ω (ek, ~n)δ_∂Ω,
where ∂f/∂xk ∈ D′(Rⁿ). We have
⟨∂f/∂xk, ϕ⟩ = −⟨f, ∂ϕ/∂xk⟩ = − ∫_{Rⁿ} f(x) (∂ϕ(x)/∂xk) dx.
We decompose
∫_{Rⁿ} f(x) (∂ϕ(x)/∂xk) dx = ∫_Ω f(x) (∂ϕ(x)/∂xk) dx + ∫_{Rⁿ\Ω} f(x) (∂ϕ(x)/∂xk) dx
and apply to every integral on the right the integration by parts formula. Then
∫_Ω f(x) (∂ϕ(x)/∂xk) dx = − ∫_Ω (∂f(x)/∂xk) ϕ(x) dx + ∫_{∂Ω} f₋(x)ϕ(x)(ek, ~n)dS,
and
∫_{Rⁿ\Ω} f(x) (∂ϕ(x)/∂xk) dx = − ∫_{Rⁿ\Ω} (∂f(x)/∂xk) ϕ(x) dx − ∫_{∂Ω} f₊(x)ϕ(x)(ek, ~n)dS
(the minus sign before the last integral appears because ~n is the exterior normal for Ω and so it is
the interior normal for Rⁿ\Ω). Therefore,
∫_{Rⁿ} f(x) (∂ϕ(x)/∂xk) dx = − ∫_{Rⁿ} (∂f(x)/∂xk) ϕ(x) dx − ∫_{∂Ω} [f]_∂Ω(x)(ek, ~n)ϕ(x)dS,
and
⟨∂f/∂xk, ϕ⟩ = ∫_{Rⁿ} (∂f(x)/∂xk) ϕ(x) dx + ∫_{∂Ω} [f]_∂Ω(x)(ek, ~n)ϕ(x)dS,
which proves (6). 
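In one dimension formula (6) reduces to the familiar jump relation f′ = {f′} + [f]·δ₀. The check below takes f(x) = θ(x)cos x (jump 1 at the origin) and a bump test function, and compares −∫ f ϕ′ dx with ∫ {f′} ϕ dx + ϕ(0); the particular f, ϕ, and quadrature are illustrative choices:

```python
import math

def phi(x):
    # bump test function supported in (-1, 1)
    return math.exp(-1.0 / (1.0 - x * x)) if abs(x) < 1 else 0.0

def dphi(x):
    # phi'(x) = phi(x) * (-2x) / (1 - x^2)^2 on (-1, 1)
    if abs(x) >= 1:
        return 0.0
    w = 1.0 - x * x
    return phi(x) * (-2.0 * x) / (w * w)

def midpoint(g, a, b, N=100_000):
    h = (b - a) / N
    return sum(g(a + (k + 0.5) * h) for k in range(N)) * h

# f(x) = 0 for x < 0 and cos(x) for x >= 0: classical part {f'} = -sin(x)
# on x > 0, and jump [f] = 1 at the origin.
lhs = -midpoint(lambda x: math.cos(x) * dphi(x), 0.0, 1.0)  # <f', phi> = -<f, phi'>
rhs = midpoint(lambda x: -math.sin(x) * phi(x), 0.0, 1.0) + phi(0.0)
print(lhs, rhs)
```

The two numbers agree: the distributional derivative picks up the classical derivative plus a point mass of weight [f] at the jump.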
12.2. Laplace operator. In this section we construct a fundamental solution of the Laplace
operator
∆ = ∂²/∂x₁² + ... + ∂²/∂xₙ².
(a) First we suppose that n = 2 and prove that
(7) ∆ ln |x| = ∂² ln |x|/∂x₁² + ∂² ln |x|/∂x₂² = 2πδ(x), x ∈ R².
First of all observe that the function ln |x| is of class L1loc (R2 ) (to see this it suffices to pass to the
polar coordinates) and so it can be viewed as a distribution. Let ϕ ∈ D(R2 ). Since supp ϕ is a
compact set, there exists R > 0 such that ϕ(x) = 0 for |x| ≥ R/2. We have
⟨∆ ln |x|, ϕ⟩ = ⟨ln |x|, ∆ϕ⟩ = ∫_{R²} ln |x| ∆ϕ(x)dx = ∫_{|x|≤R} ln |x| ∆ϕ(x)dx.
Denote by A(ε, R) = {x : ε < |x| < R} the annulus, where ε > 0 is small enough. Then by the
Lebesgue convergence theorem,
∫_{|x|≤R} ln |x| ∆ϕ(x)dx = lim_{ε→0+} ∫_{A(ε,R)} ln |x| ∆ϕ(x)dx.

By the Green formula we have
∫_{A(ε,R)} ln |x| ∆ϕ(x)dx = ∫_{A(ε,R)} ∆ ln |x| · ϕ(x)dx + ∫_{∂A(ε,R)} (ln |x| (∂ϕ(x)/∂~n) − ϕ (∂ ln |x|/∂~n)) dS.
An elementary computation (say, in polar coordinates) shows that ∆ ln |x| = 0 for x ≠ 0, so the
first integral on the right vanishes. Furthermore, ∂A(ε, R) = Cε ∪ CR, where Cr = {x : |x| = r}, so
that ∫_{∂A(ε,R)} = ∫_{Cε} + ∫_{CR}. By the choice of R we have ϕ(x) = ∂ϕ(x)/∂~n = 0 for x ∈ CR. Thus,
∫_{A(ε,R)} ln |x| ∆ϕ(x)dx = ∫_{Cε} (ln |x| (∂ϕ(x)/∂~n) − ϕ (∂ ln |x|/∂~n)) dS.
Since ~n is the vector of the unit exterior normal to A(ε, R), for every x ∈ Cε we have ~n = −x/|x|,
and so
∂/∂~n = −(x₁/|x|) ∂/∂x₁ − (x₂/|x|) ∂/∂x₂.
Then,
|∫_{Cε} ln |x| (∂ϕ(x)/∂~n) dS| ≤ const · ε|ln ε| −→ 0, ε −→ 0.
Finally, ∂ ln |x|/∂~n = −1/|x|, so that
− ∫_{Cε} ϕ (∂ ln |x|/∂~n) dS = (1/ε) ∫_{Cε} ϕ dS.
But we have
lim_{ε→0} (1/ε) ∫_{Cε} ϕ(x)dS = lim_{ε→0} ((1/ε) ∫_{Cε} (ϕ(x) − ϕ(0))dS + 2πϕ(0)) = 2πϕ(0).
Thus,
∫_{R²} ln |x| ∆ϕ(x)dx = 2πϕ(0),
which proves (7).
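For a radial test function the pairing collapses to a one-dimensional integral: ∫_{R²} ln|x| ∆ϕ dx = ∫_0^∞ ln r (ϕ″(r) + ϕ′(r)/r) 2πr dr, which by (7) should equal 2πϕ(0). The sketch below checks this with ϕ(r) = e^{−r²} (rapidly decaying rather than compactly supported, which does not affect the identity); the choice of ϕ, cutoff, and grid are ours.

```python
import math

def phi(r):   return math.exp(-r * r)
def dphi(r):  return -2.0 * r * math.exp(-r * r)
def d2phi(r): return (4.0 * r * r - 2.0) * math.exp(-r * r)

def pairing(N=200_000, R=10.0):
    # integral over R^2 of ln|x| * Laplacian(phi) for radial phi:
    # integral_0^R ln(r) * (phi'' + phi'/r) * 2*pi*r dr
    h = R / N
    total = 0.0
    for k in range(N):
        r = (k + 0.5) * h
        total += math.log(r) * (d2phi(r) + dphi(r) / r) * 2.0 * math.pi * r * h
    return total

val = pairing()
print(val, 2 * math.pi)
```

The integrable r·ln r behaviour of the integrand near r = 0 is the 1D shadow of the local integrability of ln|x| in the plane.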
(b) Now we show that
(8) ∆ (1/|x|^{n−2}) = −(n − 2)Sₙ δ(x), n ≥ 3,
where the constant Sₙ is equal to the surface area of the unit sphere in Rⁿ. The proof is quite
similar to part (a). We use the notation r = r(x) = |x|. For a function f : R −→ R of class C² we
have
∂f(r)/∂xⱼ = f′(r) xⱼ/r,
∂²f(r)/∂xⱼ² = f″(r) xⱼ²/r² + f′(r) (r² − xⱼ²)/r³,
∆f(r) = f″(r) + f′(r) (n − 1)/r.
Setting f(r) = r^p, we obtain
∆r^p = p(p + n − 2)r^{p−2}.
Therefore, ∆r^{2−n} = 0 on Rⁿ\{0}. Also note that the function x ↦ r^{2−n} is in L1loc(Rⁿ).
We have, for sufficiently large R > 0, that
⟨∆r^{2−n}, ϕ⟩ = ⟨r^{2−n}, ∆ϕ⟩ = ∫_{Rⁿ} r^{2−n} ∆ϕ(x)dx = ∫_{|x|≤R} r^{2−n} ∆ϕ(x)dx,
and
∫_{|x|≤R} r^{2−n} ∆ϕ(x)dx = lim_{ε→0+} ∫_{A(ε,R)} r^{2−n} ∆ϕ(x)dx.
Again, by the Green formula we have
∫_{A(ε,R)} r^{2−n} ∆ϕ(x)dx = ∫_{A(ε,R)} ∆r^{2−n} · ϕ(x)dx + ∫_{∂A(ε,R)} (r^{2−n} (∂ϕ(x)/∂~n) − ϕ (∂r^{2−n}/∂~n)) dS.
The first integral on the right vanishes, and by the choice of R we have ϕ(x) = ∂ϕ(x)/∂~n = 0 for
x ∈ CR. Thus,
∫_{A(ε,R)} r^{2−n} ∆ϕ(x)dx = ∫_{Cε} (r^{2−n} (∂ϕ(x)/∂~n) − ϕ (∂r^{2−n}/∂~n)) dS.
Since ~n is the vector of the unit exterior normal to A(ε, R), for every x ∈ Cε we have ~n = −x/|x|
and
∂/∂~n = −(x₁/|x|) ∂/∂x₁ − · · · − (xₙ/|x|) ∂/∂xₙ.
Then,
|∫_{Cε} r^{2−n} (∂ϕ(x)/∂~n) dS| ≤ const · ε −→ 0, ε → 0.
Finally, ∂r^{2−n}/∂~n = (n − 2)r^{1−n}, so that
− ∫_{Cε} ϕ (∂r^{2−n}/∂~n) dS = −(n − 2) (1/ε^{n−1}) ∫_{Cε} ϕ dS.
Then

    −(n − 2) lim_{ε→0} (1/ε^{n−1}) ∫_{C_ε} ϕ(x) dS = −(n − 2) lim_{ε→0} ( (1/ε^{n−1}) ∫_{C_ε} (ϕ(x) − ϕ(0)) dS + S_n ϕ(0) ) = −(n − 2) S_n ϕ(0),

which proves (8).
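As with part (a), identity (8) admits a numerical sanity check; the sketch below is our addition. For the radial test function ϕ(x) = e^{−|x|²} (our choice) one has ∆ϕ = (4r² − 2n)e^{−r²}, so ⟨r^{2−n}, ∆ϕ⟩ reduces in spherical coordinates to a one-dimensional integral, and the result should be −(n − 2)S_n ϕ(0), where S_n = 2π^{n/2}/Γ(n/2).

```python
import math

# Sketch: numerical check of <r^{2-n}, Δφ> = -(n-2) S_n φ(0) for the radial
# sample test function φ(x) = exp(-|x|²), for which Δφ = (4r² - 2n) e^{-r²}.
# In spherical coordinates the pairing is  S_n ∫₀^∞ (4r² - 2n) r e^{-r²} dr,
# where S_n = 2 π^{n/2} / Γ(n/2) is the area of the unit sphere in R^n.

def midpoint(f, a, b, n_steps=100_000):
    h = (b - a) / n_steps
    return h * sum(f(a + (k + 0.5) * h) for k in range(n_steps))

for n in (3, 4, 5, 7):
    S_n = 2 * math.pi ** (n / 2) / math.gamma(n / 2)
    pairing = S_n * midpoint(
        lambda r: (4 * r * r - 2 * n) * r * math.exp(-r * r), 0.0, 10.0)
    print(n, pairing, -(n - 2) * S_n)   # the two printed values agree
```

For n = 3 this recovers S_3 = 4π, matching the special case below.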
If n = 3, then S_3 = 4π, so that

    ∆(1/r) = −4πδ(x),  x ∈ R³.
12.3. Heat Equation. Consider the function

    E(x, t) = ( θ(t) / (2a√(πt))^n ) e^{−|x|²/(4a²t)},

where the function θ is the Heaviside function on R. The function E is locally integrable in R^{n+1}. Indeed, E(x, t) = 0 if t < 0 and E(x, t) is positive for t > 0. Furthermore, E is continuous (and vanishes) at every point of the hyperplane {(x, t) : t = 0} other than the origin. Consider a bounded set of R^{n+1} of the form B(0, R) × [0, R], where B(0, R) = {x ∈ R^n : |x| ≤ R}. By Fubini’s theorem we have
B(0, R) × [0, R], where B(0, R) = {x ∈ Rn : |x| ≤ R}. By Fubini’s theorem we have
Z Z Z ! Z Z 
E(x, t)dxdt = E(x, t)dx dt ≤ E(x, t)dx dt.
B(0,R)×[0,R] [0,R] B(0,R) [0,R] Rn

After the change of coordinates x/2a t = y we have
n Z
|x|2
Z Z
1 − 2 1 Y −yj2
E(x, t)dx = √ e 4a t dx = √
n
e dyj = 1.
n ( π)
Rn Rn (2a πt) j=1 R

Thus,

    (9)    ∫_{R^n} E(x, t) dx = 1,  t > 0,

and so

    ∫_{[0,R]} ( ∫_{R^n} E(x, t) dx ) dt ≤ ∫_{[0,R]} dt = R.

This proves the local integrability of E(x, t).
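By the product structure of E, identity (9) reduces to the one-dimensional Gaussian integral, so a quick numerical check (our sketch; the values of a and t are arbitrary sample choices) only needs one factor:

```python
import math

# Sketch: numerical check of (9), ∫_{R^n} E(x,t) dx = 1 for t > 0.
# E factors as a product of n identical one-dimensional Gaussians,
# so it suffices to check a single factor.  a, t are sample values.
a, t = 0.7, 0.3

def gauss_factor(x):
    return math.exp(-x * x / (4 * a * a * t)) / (2 * a * math.sqrt(math.pi * t))

h, L = 1e-3, 20.0    # midpoint-rule step and truncation radius
one_dim = h * sum(gauss_factor(-L + (k + 0.5) * h) for k in range(int(2 * L / h)))
print(one_dim)       # ≈ 1.0, so the n-fold product is ≈ 1 for every n
```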
Let us prove the following identity:

    (10)    ∂E/∂t − a²∆E = δ(x, t).

We first observe that for t > 0 the function E is of class C^∞, and by an elementary computation, which is left for the reader, we have

    (11)    ∂E/∂t (x, t) − a²∆E = 0,  t > 0.
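The elementary computation behind (11) can be spot-checked with finite differences; everything in the sketch below (the dimension, the value of a, the sample point, and the step sizes) is our choice of illustration.

```python
import math

# Sketch: finite-difference check of (11), ∂E/∂t = a² ΔE for t > 0,
# at an arbitrary sample point (n, a, x0, t0 are our choices).
n, a = 2, 0.8

def E(x, t):
    r2 = sum(s * s for s in x)
    return math.exp(-r2 / (4 * a * a * t)) / (2 * a * math.sqrt(math.pi * t)) ** n

x0, t0, h = [0.4, -0.3], 0.5, 1e-4
dEdt = (E(x0, t0 + h) - E(x0, t0 - h)) / (2 * h)       # central difference in t
lap = 0.0
for j in range(n):                                      # central 2nd differences in x
    xp = list(x0); xp[j] += h
    xm = list(x0); xm[j] -= h
    lap += (E(xp, t0) - 2 * E(x0, t0) + E(xm, t0)) / (h * h)
print(dEdt - a * a * lap)   # ≈ 0
```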
Here the derivatives are taken in the usual sense. Now let ϕ ∈ D(R^{n+1}). Then

    ⟨∂E/∂t − a²∆E, ϕ⟩ = −⟨E, ∂ϕ/∂t + a²∆ϕ⟩ = − ∫_0^∞ ∫_{R^n} E(x, t) ( ∂ϕ/∂t + a²∆ϕ ) dx dt.

By the Lebesgue convergence theorem we have

    − ∫_0^∞ ∫_{R^n} E(x, t) ( ∂ϕ/∂t + a²∆ϕ ) dx dt = − lim_{ε→0} ∫_ε^∞ ∫_{R^n} E(x, t) ( ∂ϕ/∂t + a²∆ϕ ) dx dt.
Since supp ϕ is compact, integration by parts yields

    − ∫_ε^∞ ∫_{R^n} E(x, t) ∂ϕ/∂t dx dt = ∫_{R^n} E(x, ε) ϕ(x, ε) dx + ∫_ε^∞ ∫_{R^n} (∂E/∂t) ϕ dx dt.
Fix R > 0 such that ϕ(x, t) = 0 for |x| ≥ R/2. Green’s formula implies

    ∫_{R^n} E(x, t) ∆ϕ(x, t) dx = ∫_{|x|≤R} E(x, t) ∆ϕ(x, t) dx = ∫_{R^n} (∆E(x, t)) ϕ(x, t) dx,

since

    ∫_{|x|=R} ( E ∂ϕ/∂~n − ϕ ∂E/∂~n ) dS = 0

in view of the choice of R. Thus,
in view of the choice of R. Thus,
Z ∞ Z    Z
∂ϕ 2
− lim E(x, t) + a ∆ϕ dx dt = lim E(x, ε)ϕ(x, ε)dx
ε−→0 ε Rn ∂t ε−→0
Z ∞ Z   Z 
∂E 2
+ ( − a ∆E)ϕdx dt = lim E(x, ε)ϕ(x, ε)dx ,
ε Rn ∂t ε−→0

where the last equality follows by (11). We need the following


Claim 1. One has

    lim_{ε→0} ∫_{R^n} E(x, ε) [ϕ(x, ε) − ϕ(x, 0)] dx = 0.
For the proof, fix R > 0 such that supp ϕ ⊂ {|(x, t)| < R}. The function ϕ is Lipschitz continuous and hence uniformly continuous on R^{n+1}. Given α > 0 there exists δ > 0 such that |ϕ(x, ε) − ϕ(x, 0)| < α/2 for all x ∈ R^n whenever 0 < ε < δ. Therefore,

    ∫_{R^n} E(x, ε) [ϕ(x, ε) − ϕ(x, 0)] dx = I + II,
with

    I = ∫_{|x|<δ} E(x, ε) [ϕ(x, ε) − ϕ(x, 0)] dx

and

    II = ∫_{δ≤|x|≤R} E(x, ε) [ϕ(x, ε) − ϕ(x, 0)] dx.

Then

    |I| ≤ (α/2) ∫_{R^n} E(x, ε) dx = α/2.
Set

    M(ε) = (1/(2a√(πε))^n) e^{−δ²/(4a²ε)},

and C = sup |ϕ|. Then sup_{|x|≥δ} E(x, ε) = M(ε), and

    |II| ≤ 2C ∫_{δ≤|x|≤R} E(x, ε) dx ≤ 2C M(ε) vol(B(0, R)) → 0,  ε → 0.
It follows that |II| ≤ α/2 for all ε small enough. This proves the claim.
We conclude that

    lim_{ε→0} ∫_{R^n} E(x, ε) ϕ(x, ε) dx = lim_{ε→0} ∫_{R^n} E(x, ε) ϕ(x, 0) dx.

To finish the proof we need the following
Claim 2. The following holds in D′(R^n):

    lim_{t→0+} E(x, t) = δ(x).
For the proof, let ψ ∈ D(R^n). Since ψ has a compact support, there exists a constant C > 0 such that

    |ψ(x) − ψ(0)| ≤ C|x|,  x ∈ R^n.

We have

    | ∫_{R^n} E(x, t) (ψ(x) − ψ(0)) dx | ≤ ( C / (4πa²t)^{n/2} ) ∫_{R^n} e^{−|x|²/(4a²t)} |x| dx.
Evaluating the last integral in the spherical coordinates (we denote by σ_n the surface area of the unit sphere in R^n) we obtain that the right-hand side is equal to

    ( Cσ_n / (4πa²t)^{n/2} ) ∫_0^∞ e^{−r²/(4a²t)} r^n dr = ( 2Cσ_n a√t / π^{n/2} ) ∫_0^∞ e^{−u²} u^n du = C′√t.
Hence,

    ∫_{R^n} E(x, t) (ψ(x) − ψ(0)) dx −→ 0, as t −→ 0+.

Then, using (9), we have

    ⟨E(x, t), ψ⟩ = ∫_{R^n} E(x, t) ψ(x) dx = ψ(0) ∫_{R^n} E(x, t) dx + ∫_{R^n} E(x, t) (ψ(x) − ψ(0)) dx −→ ψ(0) = ⟨δ(x), ψ⟩.

This proves the claim.
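Claim 2 can also be observed numerically, say in dimension n = 1 with a = 1. In the sketch below ψ(x) = cos x is our sample function; strictly speaking it is not compactly supported, hence not in D(R), but the rapid decay of E makes the illustration behave the same way.

```python
import math

# Sketch: numerical illustration of Claim 2 for n = 1, a = 1:
# ∫ E(x,t) ψ(x) dx → ψ(0) as t → 0+, with the sample function ψ(x) = cos(x).

def E(x, t):
    return math.exp(-x * x / (4 * t)) / (2 * math.sqrt(math.pi * t))

def pairing(t, h=1e-3, L=20.0):
    # Midpoint rule for ∫_{-L}^{L} E(x,t) cos(x) dx.
    m = int(2 * L / h)
    return h * sum(E(-L + (k + 0.5) * h, t) * math.cos(-L + (k + 0.5) * h)
                   for k in range(m))

for t in (1.0, 0.1, 0.01, 0.001):
    print(t, pairing(t))   # the values approach ψ(0) = 1 as t decreases
```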
Let ψ(x) = ϕ(x, 0) ∈ D(R^n). Then

    ⟨∂E/∂t − a²∆E, ϕ⟩ = lim_{ε→0} ∫_{R^n} E(x, ε) ϕ(x, 0) dx = ϕ(0, 0) = ⟨δ(x, t), ϕ⟩.

This concludes the proof of (10).