Analysis Distribution TH Lectures
RASUL SHAFIKOV
which is, of course, the standard Euclidean distance in Rn . One can verify that this metric satisfies
all three required properties: (i) d(x, y) ≥ 0, d(x, y) = 0 iff x = y; (ii) d(x, y) = d(y, x); (iii)
d(x, y) ≤ d(x, z) + d(z, y) for any x, y, z ∈ Rn . To prove properties (iii) of the norm and of the
metric in Rn one can use so-called Minkowski’s inequality:
( Σ_{i=1}^n (a_i + b_i)^k )^{1/k} ≤ ( Σ_{i=1}^n a_i^k )^{1/k} + ( Σ_{i=1}^n b_i^k )^{1/k},
where ai , bi ≥ 0, and k > 1. In fact, Minkowski’s inequality is a special case of the Hölder inequality:
Σ_{i=1}^n a_i b_i ≤ ( Σ_{i=1}^n a_i^p )^{1/p} · ( Σ_{i=1}^n b_i^q )^{1/q},
where ai , bi ≥ 0, p, q > 1, 1/p + 1/q = 1. Note that when p = q = 2 the Hölder inequality can be
written in the form a · b ≤ |a| · |b|.
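As a quick numerical sanity check (a Python sketch, not part of the original notes; p_norm is an ad hoc helper), one can test both inequalities on random nonnegative vectors:

```python
import random

def p_norm(v, p):
    """(sum |v_i|^p)^(1/p) for a finite sequence v."""
    return sum(abs(t) ** p for t in v) ** (1.0 / p)

random.seed(0)
a = [random.random() for _ in range(10)]
b = [random.random() for _ in range(10)]

# Minkowski with k = 3: ||a + b||_k <= ||a||_k + ||b||_k
k = 3.0
lhs = p_norm([x + y for x, y in zip(a, b)], k)
assert lhs <= p_norm(a, k) + p_norm(b, k) + 1e-12

# Hölder with p = 3, q = 3/2 (so 1/p + 1/q = 1)
p, q = 3.0, 1.5
assert sum(x * y for x, y in zip(a, b)) <= p_norm(a, p) * p_norm(b, q) + 1e-12
```

The small additive constants only absorb floating-point rounding; the inequalities themselves are exact.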
The topology on Rn is induced by the metric: a set Ω ⊂ Rn is called open if every point x ∈ Ω
is contained in Ω together with a small ball
B(x, ε) = {y ∈ Rn : |x − y| < ε}, ε > 0.
This topology gives Rn the structure of a complete metric space, i.e., every Cauchy sequence with
respect to the metric converges to an element of the space. Further, (Rn , | · |) is a Banach space,
i.e., a complete normed space. A Banach space is called a Hilbert space if its norm comes from a
scalar product. Thus, Rn is a Hilbert space with the scalar product defined by (1).
A map A : Rn → Rm is called linear if A(ax + by) = aA(x) + bA(y) for all x, y ∈ Rn and a, b ∈ R.
A linear map can be identified with an m × n matrix with real coefficients. We define the norm of a
linear map as
||A|| = sup { |Ax| : x ∈ R^n, |x| ≤ 1 }.
It follows immediately from the definition that |Ah| ≤ ||A|| · |h| for all h ∈ Rn .
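The supremum in this definition can be estimated numerically by sampling unit vectors. The sketch below is illustrative only (op_norm_estimate is a hypothetical helper); for the diagonal map with entries 3 and 1 the exact norm is 3:

```python
import math, random

def op_norm_estimate(A, trials=20000):
    """Estimate ||A|| = sup_{|x| <= 1} |Ax| by sampling random unit vectors."""
    random.seed(1)
    best = 0.0
    n = len(A[0])
    for _ in range(trials):
        x = [random.gauss(0.0, 1.0) for _ in range(n)]
        r = math.sqrt(sum(t * t for t in x))
        u = [t / r for t in x]                                  # |u| = 1
        Au = [sum(row[j] * u[j] for j in range(n)) for row in A]
        best = max(best, math.sqrt(sum(t * t for t in Au)))
    return best

A = [[3.0, 0.0], [0.0, 1.0]]     # diagonal map with exact norm 3
est = op_norm_estimate(A)
assert 2.9 < est <= 3.0 + 1e-9   # approached from below, never exceeded
```

This also illustrates the bound |Ah| ≤ ||A|| · |h|: every sampled value |Au| stays below the true norm.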
A map f : Rn → Rm is called affine if f (x) = Ax + B, where A is a linear map, and B is a
constant vector.
1.2. Continuity. A domain Ω ⊂ Rn is a connected open set. Given a function f : Ω → R and
point x0 ∈ Ω, we say that f is continuous at x0 if for any ε > 0 there exists δ > 0 such that
|f (x) − f (x0 )| < ε whenever |x − x0 | < δ.
Theorem 1.1. A function f is continuous at x0 if and only if limj→∞ f (xj ) = f (x0 ) for any
sequence of points (x_j) → x_0.
The proof of ⇒ follows from the definition of continuity. To prove the converse, formulate the
negation of continuity of a function and derive a contradiction with the assumption.
Example 1.1. The function f(x, y) = xy/(x^2 + y^2) does not have a limit as (x, y) → (0, 0), and thus does not admit a continuous extension to the origin. On the other hand, the function g(x, y) = x^2 y/(x^2 + y^2) has limit equal to 0 as (x, y) → (0, 0), which follows from the estimate
|x^2 y/(x^2 + y^2)| = |xy/(x^2 + y^2)| · |x| ≤ (1/2)|x|.
Hence, g becomes continuous at the origin after setting g(0, 0) = 0.
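The two limits can be probed numerically along different approach paths; the following Python sketch (not part of the notes) mirrors the estimate above:

```python
# f keeps a direction-dependent value near the origin, while |g| <= |x|/2.
f = lambda x, y: x * y / (x ** 2 + y ** 2)
g = lambda x, y: x ** 2 * y / (x ** 2 + y ** 2)

t = 1e-8
assert abs(f(t, t) - 0.5) < 1e-12       # along y = x: value 1/2
assert abs(f(t, 0.0)) < 1e-12           # along the x-axis: value 0
assert abs(g(t, t)) <= 0.5 * t + 1e-20  # the estimate |g| <= |x|/2
```

Since f takes different values along different lines through the origin, no single limit can exist there.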
Continuity of maps f : Rn → Rm is defined similarly.
1.3. Differentiability. Recall that for n = 1, a function f : R → R is called differentiable at a
point x if the limit
lim_{h→0} (f(x + h) − f(x)) / h
exists. This implies that
f (x + h) − f (x) = f 0 (x) · h + r(h),
where r(h) = o(h), i.e., r(h)/h → 0 as h → 0. The definition of differentiability in higher dimensions
is defined similarly.
Definition 1.2. Let Ω ⊂ Rn be a domain, f : Ω → Rm be a map, x ∈ Ω. If there exists a linear
map A : Rn → Rm such that
(2) lim_{h→0} |f(x + h) − f(x) − Ah| / |h| = 0, h ∈ R^n,
then we say that f is differentiable at x and we write Df (x) = f 0 (x) = A. If f is differentiable at
every point of Ω, then we say that f is differentiable on Ω. The map A is called the differential of
f at x, and the corresponding matrix is called the Jacobian matrix of f .
Theorem 1.3. If the above definition holds for A = A1 and A = A2 then A1 = A2 .
REAL ANALYSIS LECTURE NOTES 3
Proof. Let b = f (a). We set A = f 0 (a), B = g 0 (b), U (h) = f (a + h) − f (a) − Ah, and V (k) =
g(b + k) − g(b) − Bk, where h ∈ Rn and k ∈ Rm . Then
(3) ν(h) = |U(h)| / |h| → 0 as h → 0, µ(k) = |V(k)| / |k| → 0 as k → 0.
Given a vector h we set k = f (a + h) − f (a). Then
(4) |k| = |Ah + U (h)| ≤ (||A|| + ν(h)) |h|,
and
F (a + h) − F (a) − BAh = g(b + k) − g(b) − BAh = B(k − Ah) + V (k) = BU (h) + V (k).
Hence, (3) and (4) imply that for h ≠ 0,
|F(a + h) − F(a) − BAh| / |h| ≤ ||B|| ν(h) + (||A|| + ν(h)) µ(k).
Letting h → 0 we have ν(h) → 0, and k → 0 by (4), so µ(k) → 0. From this it follows that
F 0 (a) = BA as required.
Example 1.3. Suppose that f : Rn → Rn is a differentiable map at a ∈ Rn such that in a
neighbourhood of f(a) the map f^{−1} is defined and differentiable. Then the composition f^{−1} ◦ f
is a differentiable map whose differential at a by the Chain Rule equals
(f −1 ◦ f )0 (a) = (f −1 )0 (f (a)) · f 0 (a).
On the other hand, the differential of the identity map is the identity, and we conclude that the
matrix corresponding to (f −1 )0 (f (a)) is the inverse matrix to that of f 0 (a).
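This inverse relation between the Jacobian matrices can be checked with finite differences. The sketch below is only an illustration: the map f and its inverse are chosen ad hoc, and jacobian is a hypothetical helper:

```python
def jacobian(F, x, h=1e-6):
    """Central-difference approximation of DF(x) for F: R^n -> R^n."""
    n = len(x)
    J = [[0.0] * n for _ in range(n)]
    for j in range(n):
        xp, xm = list(x), list(x)
        xp[j] += h
        xm[j] -= h
        Fp, Fm = F(xp), F(xm)
        for i in range(n):
            J[i][j] = (Fp[i] - Fm[i]) / (2 * h)
    return J

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

f = lambda p: [p[0] + p[1] ** 2, p[1]]        # a smooth invertible map
finv = lambda p: [p[0] - p[1] ** 2, p[1]]     # its explicit inverse

a = [0.3, -0.7]
P = matmul(jacobian(finv, f(a)), jacobian(f, a))   # should be the identity
for i in range(2):
    for j in range(2):
        assert abs(P[i][j] - (1.0 if i == j else 0.0)) < 1e-6
```

The product of the two numerical Jacobians is the identity matrix up to discretization error, as the Chain Rule argument predicts.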
Let e_1 = (1, 0, . . . , 0), e_2 = (0, 1, 0, . . . , 0), . . . , e_n = (0, . . . , 0, 1) be the standard basis in R^n, and denote by {u_1, . . . , u_m} the standard basis in R^m. For a domain Ω ⊂ R^n the map f : Ω → R^m can be written in the form
(5) f(x) = Σ_{i=1}^m f_i(x) u_i = (f_1(x), . . . , f_m(x)),
Proof. Fix j. Since f is differentiable at x, f (x+tej )−f (x) = f 0 (x)(tej )+r(tej ), where r(tej ) = o(t).
Then by the linearity of f 0 (x),
lim_{t→0} (f(x + t e_j) − f(x)) / t = f′(x) e_j.
If now f is represented component-wise as in (5), then
lim_{t→0} Σ_{i=1}^m ((f_i(x + t e_j) − f_i(x)) / t) u_i = f′(x) e_j.
Thus, each coefficient in front of ui has a limit, which shows existence of the partial derivatives
of f and proves (6).
It follows from the above theorem that the Jacobian matrix f′(x) is given by
Df(x) = f′(x) = [ ∂f_1/∂x_1 · · · ∂f_1/∂x_n ]
                [   · · ·    · · ·   · · ·   ]
                [ ∂f_m/∂x_1 · · · ∂f_m/∂x_n ],
i.e., the m × n matrix whose (i, j) entry is ∂f_i/∂x_j.
The composition function g = f ◦ γ is a usual function of one variable. By the Chain Rule its
derivative can be computed as
(7) (dg/dt)(t) = (∂f/∂x_1, . . . , ∂f/∂x_n) · (dγ_1/dt, . . . , dγ_n/dt)^T = Σ_{i=1}^n (∂f/∂x_i)(γ(t)) (dγ_i/dt)(t).
The above example has an important generalization. For a differentiable function f , define ∇f ,
called the gradient of f , to be the vector given by
∇f(x) = (∂f/∂x_1, . . . , ∂f/∂x_n) = Σ_{i=1}^n (D_i f)(x) e_i.
Then (7) can be written in the form g′(t) = ∇f(γ(t)) · γ′(t), where the dot indicates the dot product
in Rn . Let u now be a unit vector, and let γ(t) = x + tu be the line in the direction of u. Then
γ′(t) = u for all t, and so g′(0) = ∇f(x) · u. On the other hand, g(t) − g(0) = f(x + tu) − f(x),
hence,
lim_{t→0} (f(x + tu) − f(x)) / t = ∇f(x) · u.
This is called the directional derivative of f at x in the direction of vector u, denoted sometimes by
D_u f(x) or ∂f/∂u. For fixed f and x it is clear that the directional derivative attains its maximum
if u is a positive multiple of ∇f . So ∇f gives the direction of the fastest growth of the function f .
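The identity between the difference quotient and ∇f(x) · u can be verified numerically; the function and point below are chosen for illustration only:

```python
import math

f = lambda x, y: math.sin(x) * math.exp(y)
grad = lambda x, y: (math.cos(x) * math.exp(y), math.sin(x) * math.exp(y))

x0, y0 = 0.4, -0.2
u = (0.6, 0.8)                       # unit vector: 0.6^2 + 0.8^2 = 1
t = 1e-6
# difference quotient (f(x + tu) - f(x)) / t
quotient = (f(x0 + t * u[0], y0 + t * u[1]) - f(x0, y0)) / t
gx, gy = grad(x0, y0)
assert abs(quotient - (gx * u[0] + gy * u[1])) < 1e-5   # matches grad f . u
```

Trying other unit vectors u confirms that the quotient is largest when u points along ∇f(x_0, y_0).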
Theorem 1.6. Let Ω be a domain in R^n, and f : Ω → R. If f has partial derivatives ∂f/∂x_j on Ω, which are continuous at a point a for j = 1, . . . , n, then f is differentiable at a.
Proof. For simplicity of notation we assume that Ω ⊂ R^2; the proof in the general case is the same.
We need to show that there exists a linear map A : R2 → R such that (2) holds. The clear choice
for A is the matrix (D1 f, D2 f ). Let a = (a1 , a2 ). For a fixed h = (h1 , h2 ) we have
∆f = f (a + h) − f (a) = [f (a + h) − f (a1 , a2 + h2 )] + [f (a1 , a2 + h2 ) − f (a)].
We apply the Mean Value theorem to two expressions on the right to obtain
∆f = h_1 (∂f/∂x_1)(a_1 + θ_1 h_1, a_2 + h_2) + h_2 (∂f/∂x_2)(a_1, a_2 + θ_2 h_2)
for some numbers θi ∈ (0, 1). Hence,
∆f = h_1 (∂f/∂x_1)(a) + h_2 (∂f/∂x_2)(a) + ε(h),
where
ε(h) = h_1 [ (∂f/∂x_1)(a_1 + θ_1 h_1, a_2 + h_2) − (∂f/∂x_1)(a) ] + h_2 [ (∂f/∂x_2)(a_1, a_2 + θ_2 h_2) − (∂f/∂x_2)(a) ].
By continuity of partial derivatives we obtain
|∆f − D_1 f(a) h_1 − D_2 f(a) h_2| / |h| = |ε(h)| / |h| → 0 as h → 0,
which is the required statement.
Definition 1.7. A map f : Ω → Rm is called continuously differentiable, or of class C 1 (Ω), if f 0 (x)
is a continuous function on Ω, i.e., for every ε > 0 there exists δ > 0 such that ||f 0 (y) − f 0 (x)|| < ε
whenever |x − y| < δ.
Theorem 1.8. Let Ω ⊂ Rn be a domain and f : Ω → Rm . Then f ∈ C 1 (Ω) if and only if all
partial derivatives exist and are continuous on Ω.
Proof. Suppose that f ∈ C^1(Ω). Then
(D_j f_i)(x) = [f′(x) e_j] · u_i,
the (i, j) entry of the Jacobian matrix, for all i, j and x ∈ Ω (we continue to use the notation {u_i} for the standard basis in the target space). Then
(D_j f_i)(y) − (D_j f_i)(x) = [(f′(y) − f′(x)) e_j] · u_i.
then
|S(h)|/|h| ≤ ||Df(a)^{−1}|| |R(v(h))|/|h| = ||Df(a)^{−1}|| (|R(v(h))|/|v(h)|) (|v(h)|/|h|) ≤ C ||Df(a)^{−1}|| |R(v(h))|/|v(h)|.
The expression on the right converges to zero as h → 0 by differentiability of f . This proves that
f −1 is differentiable at b. It remains to show (4). We have
v(h) = Df (a)−1 Df (a)v(h) = Df (a)−1 [f (a + v(h)) − f (a) − R(v(h))] = Df (a)−1 (h − R(v(h))),
and so
|v(h)| ≤ ||Df (a)−1 || |h| + ||Df (a)−1 || |R(v(h))|.
Since |R(v)|/|v| → 0 as |v| → 0 by differentiability of f , there exists δ1 > 0 such that
(5) |R(v)| ≤ |v| / (2 ||Df(a)^{−1}||) for |v| ≤ δ_1.
By continuity of f −1 , there exists δ2 > 0 such that |h| < δ2 implies |v(h)| ≤ δ1 , and therefore,
|v(h)| ≤ 2||Df (a)−1 || |h|,
whenever |h| ≤ δ2 which gives half of (4). For the other half, consider
h = f (a + v(h)) − f (a) = Df (a)v(h) + R(v(h)).
Therefore, in view of (5) for |h| < δ2 ,
|h| ≤ ||Df(a)|| |v(h)| + |R(v(h))| ≤ ( ||Df(a)|| + 1/(2 ||Df(a)^{−1}||) ) |v(h)|.
By Theorem 1.5 the partial derivatives of f −1 are defined at each point y ∈ V0 . Observe that the
formula Df −1 (y) = Df (f −1 (y))−1 implies that the map Df −1 from V0 into the space of invertible
n × n matrices can be written in the form
V_0 --(f^{−1})--> U_0 --(Df)--> GL(n, R) --(ι)--> GL(n, R),
where ι : GL(n, R) → GL(n, R) is the matrix inversion map. It follows from Cramer’s rule that ι is
a smooth map of the matrix components. Thus the partial derivatives of f −1 are continuous, and
so f −1 is of class C 1 . To prove that f −1 ∈ C k (V0 ) assume by induction that we have shown that
f −1 is of class C k−1 . Because Df −1 is a composition of C k−1 -smooth functions, it is itself C k−1 -
smooth, which implies that the partial derivatives of f −1 are of class C k−1 , so f −1 is C k -smooth.
This completes the proof.
Example 2.1 (Spherical coordinates). Consider the map f : (ρ, φ, θ) → (x, y, z) given by
x = ρ sin φ cos θ,
y = ρ sin φ sin θ,
z = ρ cos φ.
A computation shows that the determinant of the differential of this map equals ρ^2 sin φ, which does not vanish when ρ > 0 and 0 < φ < π. Hence, by the Inverse
Function theorem, f is a local diffeomorphism from {ρ > 0, θ ∈ R, 0 < φ < π} into R^3. By choosing
a domain U where f is injective we conclude that the map f : U → f (U ) is a diffeomorphism.
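The Jacobian determinant ρ^2 sin φ can be confirmed by differentiating f numerically; num_jacobian and det3 below are ad hoc helpers, not part of the notes:

```python
import math

def f(rho, phi, theta):
    return (rho * math.sin(phi) * math.cos(theta),
            rho * math.sin(phi) * math.sin(theta),
            rho * math.cos(phi))

def det3(M):
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
          - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
          + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))

def num_jacobian(p, h=1e-6):
    """Central-difference Jacobian of f at p = (rho, phi, theta)."""
    J = [[0.0] * 3 for _ in range(3)]
    for j in range(3):
        pp, pm = list(p), list(p)
        pp[j] += h
        pm[j] -= h
        fp, fm = f(*pp), f(*pm)
        for i in range(3):
            J[i][j] = (fp[i] - fm[i]) / (2 * h)
    return J

rho, phi, theta = 2.0, 1.0, 0.5
det = det3(num_jacobian([rho, phi, theta]))
assert abs(det - rho ** 2 * math.sin(phi)) < 1e-5
```

At any point with ρ > 0 and 0 < φ < π the determinant is nonzero, which is exactly the hypothesis of the Inverse Function theorem.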
This choice of coordinates can be generalized to arbitrary dimension. Consider the map
Φ : (r, θ1 , ..., θn−1 ) 7→ (x1 , ..., xn )
defined on the domain
U = (0, ∞) × (0, π) × ... × (0, π) × (0, 2π) ⊂ Rn
by the equations
x1 = r cos θ1 ,
x2 = r sin θ1 cos θ2 ,
x3 = r sin θ1 sin θ2 cos θ3 ,
···
xn−1 = r sin θ1 sin θ2 . . . sin θn−2 cos θn−1 ,
xn = r sin θ1 sin θ2 . . . sin θn−1 .
By the Inverse Function theorem, Φ is a diffeomorphism since its differential satisfies
det DΦ = r^{n−1} (sin θ_1)^{n−2} (sin θ_2)^{n−3} · · · sin θ_{n−2},
which does not vanish on U . A diffeomorphism that is used to simplify considerations or calculations
is usually called a (local) change of coordinates.
The rank of a map f : R^n → R^m at a point x is defined as the rank of the differential Df(x)
(viewed as an m × n matrix), which is the same as dim Df(x)(R^n). The following theorem can be
viewed as a generalization of the Inverse Function theorem.
Theorem 2.5 (Rank theorem). Suppose U ⊂ Rm and V ⊂ Rn are open sets and f : U → V is
a smooth map with constant rank k. For any point p ∈ U , there exist a connected neighbourhood
U1 ⊂ U , a change of coordinates (i.e., a diffeomorphism) φ : U1 → U0 , φ(p) = 0 and connected
neighbourhood V1 ⊂ V with a change of coordinates ψ : V1 → V0 , ψ(f (p)) = 0, such that
ψ ◦ f ◦ φ−1 (x1 , . . . , xk , xk+1 , . . . , xm ) = (x1 , . . . , xk , 0, . . . , 0).
Here U0 and V0 can be assumed to be connected open neighbourhoods of the origin in Rm and Rn
respectively.
Proof. Since Df(p) has rank k, there exists a k × k minor with nonzero determinant. By reordering the coordinates, we may assume that it is the upper left minor, (∂f_i/∂x_j) for i, j = 1, . . . , k. After a translation we may assume that p = 0 and f(0) = 0. Let (x, y) ∈ R^k × R^{m−k}, (v, w) ∈ R^k × R^{n−k} be the coordinates. If we write f(x, y) = (Q(x, y), R(x, y)) for some smooth maps Q : U → R^k and R : U → R^{n−k}, then (∂Q_i/∂x_j)_{1≤i,j≤k} is nonsingular at the origin. Define φ(x, y) = (Q(x, y), y). Then
Dφ(0) = [ (∂Q_i/∂x_j)(0)   (∂Q_i/∂y_j)(0) ]
        [        0              I_{m−k}   ]
is nonsingular. By the Inverse Function theorem there are connected neighbourhoods U1 and
U0 of the origin in Rm such that φ : U1 → U0 is a diffeomorphism. Writing the inverse map
φ−1 (x, y) = (A(x, y), B(x, y)), A : U0 → Rk , B : U0 → Rm−k , we have
(x, y) = φ(A(x, y), B(x, y)) = (Q(A(x, y), B(x, y)), B(x, y)).
It follows that B(x, y) = y, and so φ−1 (x, y) = (A(x, y), y), Q(A(x, y), y) = x, and therefore,
f ◦ φ−1 (x, y) = (x, R̃(x, y)), R̃(x, y) = R(A(x, y), y).
The Jacobian matrix of this map at an arbitrary point (x, y) ∈ U_0 is
D(f ◦ φ^{−1})(x, y) = [ I_k           0          ]
                      [ ∂R̃_i/∂x_j   ∂R̃_i/∂y_j ].
Since composing with a diffeomorphism does not change the rank of a map, this matrix has rank
equal to k everywhere on U_0. Since the first k columns are obviously independent, the rank can be
k only if the partial derivatives ∂R̃_i/∂y_j vanish identically on U_0, which implies that R̃ is independent
of the variables y. Thus, setting S(x) = R̃(x, 0), we have
(6) f ◦ φ−1 (x, y) = (x, S(x)).
Let V1 = {(v, w) ∈ V : (v, 0) ∈ U0 }, which is a neighbourhood of the origin. The map ψ(v, w) =
(v, w − S(v)) is a diffeomorphism from V1 onto its image, which can be seen by observing that
ψ −1 (s, t) = (s, t + S(s)). It follows from (6) that
ψ ◦ f ◦ φ−1 (x, y) = ψ(x, S(x)) = (x, S(x) − S(x)) = (x, 0).
For a domain Ω ⊂ Rn , a smooth map f : Ω → Rm is called an immersion if Df (x) is injective for
all x ∈ Ω (i.e., Df (x) has a trivial kernel for all x), and a submersion if Df (x) is surjective for all
x ∈ Ω. Clearly n ≤ m is a necessary condition for f to be an immersion, while n ≥ m is required
for a submersion. These are important examples of maps of constant rank. The Rank theorem is
a powerful tool for the study of such maps. For example, let us show that if f : Rm → Rn is an
injective map of constant rank, then it is an immersion. Indeed, if f is not an immersion, then the
rank k of f is less than m. By the Rank theorem in a neighbourhood of any point there is a local
change of coordinates such that f becomes
f(x_1, . . . , x_k, x_{k+1}, . . . , x_m) = (x_1, . . . , x_k, 0, . . . , 0).
It follows that f (0, . . . , 0, ε) = f (0) for ε small, which contradicts injectivity of f .
Another useful consequence of the Inverse Function theorem is the following theorem which gives
conditions under which a level set of a smooth map is locally the graph of a smooth function.
Theorem 2.6 (Implicit Function Theorem). Let U ⊂ Rn × Rk be an open set, and let (x, y) =
(x1 , . . . , xn , y1 , . . . , yk ) denote the standard coordinates on U . Suppose Φ : U → Rk is a smooth
map, (a, b) ∈ U , and c = Φ(a, b). If the k × k matrix
(∂Φ_i/∂y_j)(a, b)
is nonsingular, then there exist neighbourhoods V0 ⊂ Rn of a and W0 ⊂ Rk of b, and a smooth map
f : V_0 → W_0 such that Φ^{−1}(c) ∩ (V_0 × W_0) is the graph of f, i.e., Φ(x, y) = c for (x, y) ∈ V_0 × W_0 if
and only if y = f (x).
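A concrete instance (an illustration only): for Φ(x, y) = x^2 + y^2 and c = 1, the graph y = √(1 − x^2) solves Φ(x, y) = c near (a, b) = (0.6, 0.8), and its slope matches the implicit-differentiation formula f′(x) = −(∂Φ/∂x)/(∂Φ/∂y):

```python
import math

Phi = lambda x, y: x ** 2 + y ** 2      # the level set Phi = 1 is the unit circle
a, b, c = 0.6, 0.8, 1.0                 # dPhi/dy = 2b != 0 at (a, b)

f = lambda x: math.sqrt(c - x ** 2)     # the graph y = f(x) near (a, b)
assert abs(Phi(a, f(a)) - c) < 1e-12    # the graph stays on the level set

# implicit differentiation: f'(x) = -(dPhi/dx)/(dPhi/dy) = -x/y on the circle
h = 1e-6
slope = (f(a + h) - f(a - h)) / (2 * h)
assert abs(slope - (-a / b)) < 1e-6
```

The nonsingularity hypothesis is exactly what fails at (±1, 0), where the circle is not locally a graph over the x-axis.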
Proof. Consider the map Ψ : U → Rn × Rk defined by Ψ(x, y) = (x, Φ(x, y)). Its differential at
(a, b) is
DΨ(a, b) = [ I_n                 0                 ]
           [ (∂Φ_i/∂x_j)(a, b)   (∂Φ_i/∂y_j)(a, b) ],
which is nonsingular by hypothesis. Thus by the Inverse Function theorem there exist connected
open neighbourhoods U0 of (a, b) and Y0 of (a, c) such that Ψ : U0 → Y0 is a diffeomorphism.
Shrinking U0 and Y0 if necessary, we may assume that U0 = V × W is a product neighbourhood.
The inverse map has the form (why?)
Ψ−1 (x, y) = (x, B(x, y))
where the supremum (resp. infimum) is taken over all step functions s (resp. t) with s ≤ f
(resp. f ≤ t).
Proposition 3.7. Continuous functions on Rn are Riemann integrable on any domain B =
[a1 , b1 ] × · · · × [an , bn ] ⊂ Rn .
Proof. Let ε > 0 be given. Recall that a continuous function on a compact set is uniformly
continuous, i.e., for any ε > 0 there exists δ > 0 such that |f(x) − f(y)| < ε whenever |x − y| < δ.
Thus, there exists δ > 0 such that |x − y| < δ implies |f(x) − f(y)| < ε/vol(B). Select a partition P
sufficiently fine so that the diameter of each brick of P is less than δ. Choose step functions s(x)
and t(x) to be respectively the minimum and the maximum of f on each brick. Then s ≤ f ≤ t,
and
∫_B t(x) − ∫_B s(x) ≤ (ε/vol(B)) Σ_{I∈P} vol(I) = ε.
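The squeezing of lower and upper sums can be seen numerically; the sketch below (not part of the notes) uses f(x) = x^2 on [0, 1], where the exact integral is 1/3:

```python
# Lower and upper Darboux sums for f(x) = x^2 on [0, 1] over a uniform
# partition; since f is increasing, the extrema sit at the brick endpoints.
f = lambda x: x * x
N = 10000
lower = sum(f(i / N) / N for i in range(N))        # minimum at left endpoint
upper = sum(f((i + 1) / N) / N for i in range(N))  # maximum at right endpoint
assert lower <= 1 / 3 <= upper                     # both sums bracket 1/3
assert upper - lower < 1e-3                        # and squeeze as N grows
```

Here upper − lower = (f(1) − f(0))/N, so refining the partition makes the gap arbitrarily small, as the proof requires.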
We remark that what we defined above is, in fact, called the Darboux integral. However, it can
be shown that Darboux’s definition of integral is equivalent to that of Riemann.
3.2. What is wrong with the Riemann integral? There are several reasons why the Riemann
integral defined in the previous section does not seem to be adequate. It all boils down to the fact
that certain reasonable functions are not Riemann integrable. The following three examples will
illustrate that. We begin with a definition.
Definition 3.8. Given a set S ⊂ Rn , the characteristic function χS of S is defined to be
χ_S(x) = { 1, if x ∈ S,
         { 0, if x ∉ S.
Example 3.1. The so-called Dirichlet function χ_Q is clearly not Riemann integrable on [0, 1], since ∫_{[0,1]} s(x) = 0 and ∫_{[0,1]} t(x) = 1 for any step functions s and t with s ≤ χ_Q ≤ t. This is because both the rational numbers Q and the irrational numbers R \ Q are dense in R.
Proposition 3.9. Every open set U ⊂ R can be written in a unique way as an at most countable
union of disjoint open intervals.
We leave the proof of the proposition as an exercise for the reader. With the help of this
proposition we can make the following definition. Given an open set U ⊂ R we define the Lebesgue
measure of U to be
m(U) = Σ_I |U_I|,
where |U_I| is the length of the interval U_I, and the summation is taken over the disjoint
open intervals whose union is U . It is immediate that the Lebesgue measure of every open interval
is equal to its length.
While the previous example can be dismissed by declaring χQ to be “too irregular” to be inte-
grable, the next example shows that there exist open sets whose characteristic functions are not
integrable.
Example 3.2. Suppose U ⊂ [0, 1] is an open set with the following properties: U is dense in [0, 1],
and m(U ) < 1. We claim that χU is not Riemann integrable. For the proof of the claim consider
any two step functions s(x) ≤ χU (x) ≤ t(x) for a partition P of [0, 1]. Since U is dense, any brick
[x_i, x_{i+1}] will have a nonempty intersection with U, and so ∫_{[0,1]} t(x) = 1. On the other hand (using
the multidimensional notation, although we are in R), let
∫_{[0,1]} s(x) = Σ_I s_I vol(I).
area) of a brick. We also add the empty set to P and define m(∅) = 0. If a set E is a finite disjoint
union of bricks, i.e.,
(1) E = ∪_{j=0}^k B_j, B_j ∈ P, B_i ∩ B_j = ∅ for all i ≠ j,
then clearly
(2) m(E) = Σ_{j=0}^k m(B_j).
It is possible to extend m as a positive function to a wider class of sets still keeping the additivity
property (2). We say that a subset E of Rn is elementary if it admits representation (1). Then we
view (2) as the definition of m(E). Note that this definition is independent of the choice of the bricks B_j in
(1). It is easy to see that if E1 and E2 are two elementary sets, then E1 ∪ E2 , E1 ∩ E2 , E1 \E2 are
elementary sets. We denote the class of elementary sets by E. The crucial property of the function
m : E → R_+ ∪ {0} is the following: if (E_j) is a finite or countable collection of elementary sets and
E ∈ E satisfies E ⊂ ∪_j E_j, then m(E) ≤ Σ_j m(E_j).
Let now A be a subset of Rn . We define its outer measure m∗ by
m*(A) = inf { Σ_j m(E_j) : A ⊂ ∪_j E_j, E_j ∈ E },
where the infimum is taken over all finite or countable coverings of A by elementary sets. Recall
that a symmetric difference of two sets A and B is defined by A∆B = (A ∪ B)\(A ∩ B).
Definition 3.10. A set A ⊂ Rn is called Lebesgue measurable if for every ε > 0 there exists
E ∈ E such that m∗ (A∆E) < ε. If A is a measurable set, the Lebesgue measure of A is defined as
m(A) := m∗ (A).
Denote by M the class of all measurable sets in Rn . Clearly, every brick domain is measurable.
One can show that M is closed under finite or countable unions, intersections, and differences. Further, one can show that any open or closed subset of R^n is measurable, and
that a set X is measurable if and only if for any ε > 0 there exists an open set G (resp. closed F )
such that X ⊂ G (resp. F ⊂ X) such that m∗ (G \ X) < ε (resp. (m∗ (X \ F ) < ε).
Perhaps the most important property of the Lebesgue measure is its σ-additivity: if (A_j) is a
disjoint sequence of measurable sets and A = ∪_j A_j, then m(A) = Σ_j m(A_j). It is also monotone:
if A ⊂ B then m(A) ≤ m(B).
Lemma 3.11. Any countable set S in Rn has measure zero.
Proof. Enclose every point a_n of S = {a_0, a_1, . . . } in a brick of volume ε/2^n; the total volume of the bricks is then at most 2ε, which can be made arbitrarily small.
Note that the converse to the lemma is false: there exist sets of measure zero which are not
countable. A primary example of such a set is the Cantor set. Following the construction in
Example 3.3 we produce an open set U by taking a_n = 1/3 for all n ∈ N. Then the set [0, 1] \ U is
called the Cantor set. It is a compact set of measure zero, and can be shown to have cardinality of
R. We leave details to the reader.
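The measure-zero claim can be made concrete by summing the lengths removed in the middle-thirds construction (a small numerical sketch, assuming the standard construction):

```python
# At step n the middle-thirds construction removes 2^(n-1) open intervals of
# length 3^(-n); the removed lengths sum to 1, so the Cantor set remaining
# in [0, 1] has Lebesgue measure 1 - 1 = 0.
removed = sum(2 ** (n - 1) * 3.0 ** (-n) for n in range(1, 60))
assert abs(removed - 1.0) < 1e-8
```

The series is geometric with ratio 2/3, so its sum is (1/3)/(1 − 2/3) = 1 exactly.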
We now move from sets to functions. Let X be a measurable subset of R^n. A function f :
X → R is called measurable if the sets f^{−1}((−∞, a)), f^{−1}((−∞, a]), f^{−1}([a, ∞)), f^{−1}((a, ∞))
are measurable for every a ∈ R. In particular, suppose that f admits at most a finite set of
values y0 , y1 , ..., yk . Then f is measurable if and only if every set f −1 (yj ) is measurable. Measurable
functions that admit only finitely many values will be called simple. The Lebesgue integral over X
of a simple function ψ is defined by
(3) ∫_X ψ(x) dx := Σ_j y_j m(ψ^{−1}(y_j)).
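Formula (3) can be illustrated on a two-valued simple function; the comparison with a Riemann sum below is a sketch, not part of the notes:

```python
# psi takes the value 2 on [0, 1/2) and 5 on [1/2, 1].
psi = lambda x: 2.0 if x < 0.5 else 5.0

# Lebesgue viewpoint: group points by value and weigh each value by the
# measure of its preimage, as in (3): 2 * m([0,1/2)) + 5 * m([1/2,1]).
lebesgue = 2.0 * 0.5 + 5.0 * 0.5

# Riemann viewpoint: sweep along the x-axis; for this tame function both agree.
N = 100000
riemann = sum(psi((i + 0.5) / N) / N for i in range(N))

assert abs(lebesgue - 3.5) < 1e-12
assert abs(riemann - lebesgue) < 1e-9
```

The Lebesgue integral partitions the range rather than the domain, which is what makes it robust for functions like χ_Q where the Riemann approach fails.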
Since f is Riemann integrable, I(t) − I(s) can be made arbitrarily small, and we conclude that the
function f is Lebesgue integrable.
If now f ≥ 0 on X ∈ M, we define
(5) ∫_X f(x) dx = sup_{h≤f} ∫_X h(x) dx,
where the supremum is taken over simple functions h with 0 ≤ h ≤ f.
For a general measurable f : X → R we set f + = max{f, 0}, and f − = max{−f, 0}. Then
f = f + − f − , and |f | = f + + f − .
Definition 3.14. For X ∈ M and a measurable f : X → R we define
∫_X f(x) dx = ∫_X f^+(x) dx − ∫_X f^−(x) dx.
If both integrals on the right are finite we say that f is (Lebesgue) integrable on X. The class of
integrable functions is denoted by L1 (X).
A property of functions defined on a domain in R^n is said to hold almost everywhere if it holds everywhere except possibly on a set of measure zero. The common notation for this is a.e. For example, f = g a.e. means that the set of points where f is not equal to g has measure zero. It follows then that ∫ f = ∫ g. Another example is convergence a.e.: we say lim f_n = f a.e. if the set of points x for which lim f_n(x) ≠ f(x) has measure zero.
Proof. Without loss of generality we may assume fn (x) → f (x) for all x. By the definition of the
Lebesgue integral, it is enough to show that (6) holds if we replace f with any non-negative simple
function φ ≤ f. Suppose that φ = Σ_{k=1}^m a_k χ_{A_k}, where the A_k are disjoint measurable sets and a_k > 0.
Let 0 < t < 1. Since φ(x) ≤ f (x), we see that ak ≤ lim inf fn (x) for each k and x ∈ Ak . It follows
that for a fixed k the sequence of sets
B_k^n = {x ∈ A_k : f_p(x) ≥ t a_k for all p ≥ n}
increases to A_k. Consequently, m(B_k^n) → m(A_k) as n → ∞. The simple function Σ_{k=1}^m t a_k χ_{B_k^n}
is everywhere less than f_n, and so
∫_X f_n dx ≥ Σ_{k=1}^m t a_k m(B_k^n).
Taking lim inf in this inequality yields
lim inf_{n→∞} ∫_X f_n dx ≥ Σ_{k=1}^m t a_k m(A_k) = t ∫_X φ dx.
4.1. Lp spaces.
Definition 4.1. For a domain Ω ⊂ R^N and a real number p, 1 ≤ p < ∞, a measurable function f
is said to be of class L^p(Ω) if ∫_Ω |f|^p < ∞.
Since |f + g|^p ≤ 2^p (|f|^p + |g|^p) for all p, the space L^p = L^p(Ω) of all L^p-functions is a vector
space. Define
(1) ||f||_p = ( ∫_Ω |f|^p )^{1/p}.
Clearly, ||c f||_p = |c| ||f||_p for all f ∈ L^p and c ∈ R, and ||f||_p = 0 if and only if f = 0 a.e. on
Ω. In Theorem 4.3 below we will show that ||f + g||_p ≤ ||f||_p + ||g||_p. Thus, (1) defines a norm
on L^p. Note that since ||f||_p = 0 only implies that f vanishes everywhere except on a set of measure
zero, one should understand elements of the space Lp as equivalence classes of functions satisfying
Definition 4.1 with respect to the equivalence relation given by f ∼ g ⇔ f = g a.e.
For p = ∞ we define the space L^∞(Ω) of bounded (more precisely, essentially bounded) functions with the norm
||f||_∞ = ess sup_{x∈Ω} |f(x)| = sup { r ∈ R : m({x : |f(x) − r| < ε}) > 0 for all ε > 0 }.
Now for the proof of Hölder’s inequality, we may divide the functions f and g by their norms in
the corresponding spaces, so we may assume that ||f ||p = ||g||q = 1. Using Young’s inequality, we
have
|f(x) g(x)| ≤ |f(x)|^p / p + |g(x)|^q / q, x ∈ Ω.
Integrating the above inequality over Ω gives
||f g||_1 ≤ 1/p + 1/q = 1,
which is what we needed to prove.
The next theorem is essentially a corollary of Hölder’s inequality.
Theorem 4.3 (Minkowski’s inequality). For any p ≥ 1,
||f + g||p ≤ ||f ||p + ||g||p .
Proof. When p = 1 or p = ∞ the inequality is trivial. For 1 < p < ∞ we write
|f (x) + g(x)|p = |f (x) + g(x)| · |f (x) + g(x)|p−1 ≤ |f (x)| · |f (x) + g(x)|p−1 + |g(x)| · |f (x) + g(x)|p−1 .
Integrating over Ω we obtain
||f + g||_p^p ≤ ∫ |f| · |f + g|^{p−1} + ∫ |g| · |f + g|^{p−1}.
We now apply Hölder’s inequality to both terms on the right. The first one yields
∫ |f| · |f + g|^{p−1} ≤ ||f||_p ( ∫ |f + g|^{(p−1)q} )^{1/q} = ||f||_p · ||f + g||_p^{p−1}, where q = p/(p−1).
Theorem 4.6 (Riesz-Fischer theorem). For all 1 ≤ p < ∞, the space L^p equipped with the norm
|| · ||p is a Banach space.
Proof. We only need to prove that Lp is complete, i.e., that every Cauchy sequence in Lp converges
in norm to an element of L^p. By Lemma 4.5, it suffices to show that every absolutely summable
series is summable. Suppose that {f_n} is such that Σ_n ||f_n||_p = M < ∞. Define
g_n(x) = Σ_{k=1}^n |f_k(x)|.
By Minkowski’s inequality we have
||g_n||_p ≤ Σ_{k=1}^n ||f_k||_p ≤ M,
and so ∫ g_n^p ≤ M^p. For each x, the sequence {g_n(x)} is an increasing sequence of extended real
numbers (i.e., including the value ∞), and so it must converge to an extended real number g(x).
Then g(x) is a measurable function, and since g_n ≥ 0, we have ∫ g^p ≤ M^p by Fatou's lemma. It follows
that g^p is integrable, and so g(x) is finite for a.e. x. For every x such that g(x) < ∞, the series
Σ_{k=1}^∞ f_k(x) converges absolutely, so in particular it converges to a real number s(x). We set s(x) = 0
for those x where g(x) = ∞. Thus we constructed a function s(x) which is the limit a.e. of the partial
sums s_n = Σ_{k=1}^n f_k. It follows that s is measurable, and since |s_n(x)| ≤ g(x), we have |s(x)| ≤ g(x).
Consequently, s ∈ L^p, and
|s_n(x) − s(x)|^p ≤ 2^p [g(x)]^p.
Since 2^p g^p is integrable and |s_n(x) − s(x)|^p → 0 a.e., we have by the Lebesgue Convergence theorem,
∫ |s_n − s|^p → 0.
This easily implies that L1 (Ω) is a separable space. Indeed, the space of polynomials with rational
coefficients is dense in the space of continuous functions (the Weierstrass theorem). One can also
show that Lp (Ω) is separable for 1 ≤ p < ∞. It is also easy to see that this remains true even if
m(Ω) = +∞.
4.2. Topological vector spaces and their duals.
Definition 4.8. If a vector space X (over the field of reals) is equipped with some topology, we
call X a topological vector space if the map X × X → X corresponding to vector addition in X
and the map R × X → X corresponding to scalar multiplication are both continuous.
Sometimes it is required that the topology on X is Hausdorff. This is always the case if the
topology comes from a metric (in particular, from a norm) on X.
Example 4.1. We give some examples of topological vector spaces.
(i) The space Lp is a topological vector space for any p ≥ 1 (prove it!).
(ii) The space C[0, 1] of continuous functions on the interval [0, 1]. One can show that this is a
Banach space equipped with the norm ||f || = supx∈[0,1] |f (x)|. In fact, any normed space,
complete or not, is a topological vector space.
(iii) The next is an example of a topological vector space which is not a normed space. Consider
the space C ∞ ([0, 1]) of smooth functions on [0, 1]. The topology on C ∞ ([0, 1]) can be
described as follows. For every integer k ≥ 0 we define a semi-norm
kf kk = sup {|f (k) (x)| : x ∈ [0, 1]}.
x∈[0,1]
Here f^{(k)} is the derivative of f of order k. That || · ||_k is a semi-norm, rather than a norm,
means that ||f||_k = 0 may hold for nonzero functions; for example, any constant c satisfies
||c||_1 = 0. The space C^∞([0, 1]) is a complete metric space with the metric given by
(3) d(f, g) = Σ_{k=0}^∞ 2^{−k} ||f − g||_k / (1 + ||f − g||_k), f, g ∈ C^∞([0, 1]).
Topological vector spaces equipped with a complete metric that comes from a countable
collection of semi-norms are called Fréchet spaces.
Recall that a functional over a vector space X is a linear map φ : X → R.
Definition 4.9. Given a topological vector space X, the space of continuous linear functionals is
called the dual space of X, and denoted by X ∗ .
One can show that if a topological vector space X is finite-dimensional, then every linear map
on X is continuous. For infinite-dimensional vector spaces continuity of a functional is a nontrivial
condition.
Example 4.2.
(i) For a space Lp , p ≥ 1, choose some g ∈ Lq with 1/p + 1/q = 1. Then the map φg : Lp → R,
given by
(4) φ_g(f) = ⟨φ_g, f⟩ = ∫ f g,
For example, if g ∈ Lq , then the functional φg given by (4) has the norm which is equal to that of
g: ||φg ||∗ = ||g||q . Indeed, by Hölder’s inequality,
∫ f g ≤ ∫ |f g| ≤ ||f||_p ||g||_q,
which shows that ||φ_g||_* ≤ ||g||_q. On the other hand, if f = |g|^{q/p} sgn g, then |f|^p = |g|^q = f g, and
so ||f||_p = ||g||_q^{q/p}. Therefore,
⟨φ_g, f⟩ = ∫ f g = ∫ |g|^q = (||g||_q)^q = ||g||_q ||f||_p,
We do not give the proof of this theorem. Observe that now we have two topologies on the space
Lq : the normed topology and the weak∗ topology. These two are not the same, with weak∗ being
weaker than the normed topology (sometimes called the strong topology), i.e., the weak* topology has
fewer open sets. To see this it is enough to construct an example of a sequence in L^q that converges
weakly, but not in norm. Consider the space L2 (0, 1) which is the dual of itself, and consider the
sequence
φ_n(x) = { √n, x ∈ (0, 1/n),
         { 0,  x ∈ [1/n, 1).
Clearly, φn ∈ L2 (0, 1), and φn (x) → 0 point-wise on (0, 1) as n → ∞. We use the following fact: if
a sequence φn converges to φ pointwise and converges to a function φ0 in norm, then φ = φ0 (prove
it!). From this it follows that the sequence φn does not converge in norm, since for any n we have
||φ_n − 0||_2 = ( ∫_0^1 φ_n^2 )^{1/2} = ( ∫_0^{1/n} n )^{1/2} = 1.
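The computation above can be replayed numerically on a midpoint grid (an illustration only; the grid-based l2_norm and pair are ad hoc helpers):

```python
import math

N = 100000                              # midpoint grid on (0, 1)
xs = [(i + 0.5) / N for i in range(N)]

def phi(n, x):
    return math.sqrt(n) if x < 1.0 / n else 0.0

def l2_norm(n):
    return math.sqrt(sum(phi(n, x) ** 2 for x in xs) / N)

def pair(n, f):
    # approximates the pairing <phi_n, f> = int_0^1 phi_n f dx
    return sum(phi(n, x) * f(x) for x in xs) / N

f = lambda x: 1.0                       # a fixed element of L^2(0, 1)
assert abs(l2_norm(100) - 1.0) < 1e-9   # ||phi_n||_2 = 1 for every n
assert abs(pair(100, f) - 0.1) < 1e-9   # <phi_n, 1> = 1/sqrt(n) -> 0
assert abs(pair(10000, f) - 0.01) < 1e-9
```

The norms stay fixed at 1 while the pairings with a fixed f shrink, which is exactly weak* convergence without norm convergence.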
by Hölder's inequality. As n → ∞ the right-hand side in the above formula converges to zero for
every f ∈ L2 , which gives weak convergence φn → 0. On the other hand, by Hölder’s inequality,
convergence in norm always implies convergence in weak∗ topology.
4.3. Product spaces and Fubini’s theorem. Let (RN , m) and (Rn , µ) be the Euclidean spaces
with the corresponding Lebesgue measures. Then on the space RN × Rn = RN +n we may define a
new measure λ as follows: If A ⊂ RN and B ⊂ Rn are measurable, then
λ(A × B) := m(A) · µ(B).
One can show that λ can be extended from the collection of product sets in RN +n to a wide class
of sets in RN +n . In fact, one can prove that the measure λ obtained this way is nothing but the
Lebesgue measure on RN +n .
The following theorem gives sufficient conditions that allow one to replace an integral with
respect to a product measure by an iterated integral. We use dx (resp. dy) to denote integration
with respect to measure m (resp. µ), and dx dy to denote integration with respect to λ.
Theorem 4.12 (Fubini’s theorem). Let X × Y ⊂ (RN , m) × (Rn , µ) be measurable, and f ∈
L1 (X × Y, λ). Then
(i) for a.e. x, the function fx (y) = f (x, y) is integrable on Y ;
(ii) for a.e. y, the function fy (x) = f (x, y) is integrable on X;
(iii) the function x ↦ ∫_Y fx (y)dy is integrable on X;
(iv) the function y ↦ ∫_X fy (x)dx is integrable on Y;
(v) ∫_X ( ∫_Y fx (y)dy ) dx = ∫_{X×Y} f (x, y) dx dy = ∫_Y ( ∫_X fy (x)dx ) dy.
For the proof of Fubini’s theorem one can reduce the problem to simple functions and then use
the Lebesgue convergence theorem. We omit the details. Below we state a variation of Fubini’s
theorem that does not require integrability of the function f .
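Fubini's theorem is easy to test symbolically on a concrete integrable function. The sketch below is an illustration, not part of the notes; the choice f(x, y) = xy² on [0, 1] × [0, 2] is arbitrary.

```python
import sympy as sp

# f(x, y) = x*y**2 is integrable on [0, 1] x [0, 2]; both iterated integrals
# must agree with the product-measure integral (here 1/2 * 8/3 = 4/3).
x, y = sp.symbols('x y')
f = x * y**2
int_dy_dx = sp.integrate(sp.integrate(f, (y, 0, 2)), (x, 0, 1))  # dy, then dx
int_dx_dy = sp.integrate(sp.integrate(f, (x, 0, 1)), (y, 0, 2))  # dx, then dy
```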
Theorem 4.13 (Tonelli’s theorem). Let X ×Y ⊂ (RN , m)×(Rn , µ) be measurable, and f : X ×Y →
R be a nonnegative measurable function. Then conclusions (i)–(v) of Theorem 4.12 hold for f, with the convention that the integrals involved may take the value +∞; in particular, the iterated integrals of f are always equal.
In the remaining part of the subsection we discuss another property of product spaces: the possibility of interchanging integration and differentiation.
Lemma 4.14. Let X × Y ⊂ RN × Rn be a product of open subsets, and let m(X) < ∞. Suppose
f (x, y) : X × Y → R is uniformly continuous on the closure of X × Y . Then the function
(5) F (y) = ∫_X f (x, y)dx
is continuous on Y .
Proof. Uniform continuity of f on X × Y means that for any ε > 0 there exists δ > 0 such that
|f (x, y) − f (x′, y′)| < ε whenever |(x, y) − (x′, y′)| < δ.
Then, for |y − y′| < δ,
|F (y) − F (y′)| = | ∫_X (f (x, y) − f (x, y′))dx | ≤ ∫_X |f (x, y) − f (x, y′)|dx ≤ ε m(X),
which proves continuity of F .
The next theorem gives a sufficient condition under which we can differentiate under the integral
sign. It can also be interpreted as commutativity of the operations of integration and differentiation
under the given assumptions.
Theorem 4.15. Let X, Y be as in the previous lemma. Assume that for some 1 ≤ i ≤ n, the functions f (x, y) and ∂f/∂yi are uniformly continuous on the closure of X × Y . Then the function F (y) defined by (5) is of class C1 (Y ), and
∂F/∂yi (y) = ∫_X ∂f/∂yi (x, y)dx.
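Theorem 4.15 can likewise be checked on a concrete example. The sketch below is an illustration, not part of the notes; the polynomial f and the interval X = [0, 1] are arbitrary choices. It compares differentiating F directly with differentiating under the integral sign.

```python
import sympy as sp

# f(x, y) = x**2*y**3 + x*y is smooth on [0, 1] x R, so F(y) = int_0^1 f dx
# may be differentiated under the integral sign.
x, y = sp.symbols('x y')
f = x**2 * y**3 + x * y
F = sp.integrate(f, (x, 0, 1))                       # F(y) = y**3/3 + y/2
dF_direct = sp.diff(F, y)
dF_under = sp.integrate(sp.diff(f, y), (x, 0, 1))    # int_0^1 (3x**2*y**2 + x) dx
```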
for (x1 , ..., xk−1 , xk+1 , ..., xn ) ∈ U′. The map πk−1 : U′ → ∂Ω ∩ U is clearly a local parametrization of the hypersurface ∂Ω. We call U a coordinate neighbourhood of p.
Example 5.2. The upper hemisphere S+ = S² ∩ {z > 0} in R³ with coordinates (x, y, z) is the graph of the function z = √(1 − x² − y²). The unit normal vector to S+ at a point (x, y, z) ∈ S+ is ~n = (x, y, z).
Now let f be a continuous (this assumption can be considerably weakened) function on ∂Ω. Our goal is to define the integral of f over ∂Ω as a surface integral. If an open set X ⊂ ∂Ω admits a parametrization Φ : D → Rn , Φ(D) = X ⊂ ∂Ω, then we define
(2) ∫_X f (x)dS = ∫_D f ∘ Φ(t) |N~| dt,
where the coordinates of the vector N~ are determined from the formal determinant N~ = det(∇Φ1 , ..., ∇Φn , ~e ). Here ~e = (~e1 , . . . , ~en ) is a formal vector whose coordinates are the vectors of the standard basis in Rn . In fact, one can show that N~ is a normal vector to X ⊂ ∂Ω.
Now if U is a coordinate neighbourhood where ∂Ω admits representation as in (1) and X is an
open subset in ∂Ω ∩ U then
(3) ∫_X f dS = ∫_{πk(X)} f ∘ πk−1 (1 + ||∇ψ||²)^{1/2} dx1 ... dxk−1 dxk+1 ... dxn .
Both definitions agree because (1 + ||∇ψ||²)^{1/2} is just the length of the normal vector
(4) N~ = ( ∂ψ/∂x1 , ..., 1, ..., ∂ψ/∂xn ),
(here 1 is in the k-th position) corresponding to the local parametrization of ∂Ω. We refer to dS, or the equivalent expression in a local parametrization, as the hypersurface area measure (or the element of the surface area in some literature). Let νk be the angle between ~n = N~/||N~|| and the vector ~ek (the k-th vector of the standard basis of Rn ). Then
cos νk = (~ek , ~n) = (1 + ||∇ψ||²)^{−1/2}.
Thus,
(5) ∫_X f dS = ∫_{πk(X)} f ∘ πk−1 (1/cos νk) dx1 ... dxk−1 dxk+1 ... dxn .
If f ≡ 1, then the integral ∫_X dS represents the area of X. This terminology comes from R³, where the integral is indeed the area of a surface, while for n > 3 it is actually the (n − 1)-dimensional volume.
Example 5.3. Consider the surface integral of a continuous function f (x, y, z) over the upper hemisphere S+ = S² ∩ {z > 0} ⊂ R³. First we use the parametrization z = ψ(x, y) = √(1 − x² − y²) for x² + y² < 1. Then
1 + |∇ψ|² = 1 + x²/(1 − x² − y²) + y²/(1 − x² − y²) = 1/(1 − x² − y²).
Therefore, from (3) we obtain
∫_{S+} f dS = ∫_{x²+y²<1} f (x, y, √(1 − x² − y²)) dxdy / √(1 − x² − y²).
That both integrals agree can be verified, for example, by calculating the surface area of S + using
these two representations of the surface integral.
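One such verification can be carried out numerically. The sketch below is an illustration, not part of the notes; it computes the hemisphere area from the graph representation (3), reduced to polar coordinates, and compares it with the exact value 2π.

```python
import numpy as np
from scipy.integrate import quad

# Area of S+ from representation (3): the integral of 1/sqrt(1 - x**2 - y**2)
# over the unit disc; in polar coordinates it is 2*pi * int_0^1 r/sqrt(1-r**2) dr.
# quad handles the integrable endpoint singularity at r = 1.
radial, _ = quad(lambda r: r / np.sqrt(1.0 - r**2), 0.0, 1.0)
area = 2.0 * np.pi * radial
exact = 2.0 * np.pi            # area of the unit upper hemisphere
```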
Finally, if (Uj) is an open covering of ∂Ω by coordinate neighbourhoods, we set Xk = Uk \ ∪_{j=1}^{k−1} Uj , so that ∂Ω = ∪ Xk and the Xk are disjoint. Then we set
∫_{∂Ω} f dS = Σ_k ∫_{Xk} f dS.
One can view this as a definition of the surface integral over ∂Ω. It is not difficult to verify
that the integral is well-defined, i.e., it is independent of the choice of the covering by coordinate
neighbourhoods, local defining functions, etc. We leave this verification as an exercise for the
reader.
5.2. Divergence theorem. The following theorem connects the integral over a domain Ω with the surface integral over its boundary ∂Ω. It was discussed in some form in the work of Lagrange, Gauss, and most notably Ostrogradski, who gave a proof that would be considered complete by modern standards. It is sometimes referred to as the Gauss–Ostrogradski theorem.
Recall that a vector field F on a domain Ω ⊂ Rn is simply a map F : Ω → Rn . The geometric
interpretation of a vector field (which becomes nontrivial and important when one considers abstract
manifolds) is that at each point x ∈ Ω the value F (x) is thought of as a vector in Rn originating
at x. For example, given a function f : Ω → R, the gradient ∇f is a vector field on Ω. Another example is the vector field given by (4), assigning to every boundary point of ∂Ω a normal vector N~. The divergence of a vector field F~ is defined as div F~ = ∂F1/∂x1 + ... + ∂Fn/∂xn .
Theorem (Divergence theorem). Let Ω ⊂ Rn be a bounded domain with C1-smooth boundary, and let F~ be a vector field of class C1(Ω) ∩ C(Ω̄). Then
(6) ∫_Ω div F~ dx = ∫_{∂Ω} (F~, ~n) dS,
where ~n is the outward-pointing unit normal vector to ∂Ω.
Proof. For simplicity of notation we assume that n = 3; the proof in the general case is completely
analogous. We will assume that Ω = {(x, y, z) : (x, y) ∈ D, ψ1 (x, y) < z < ψ2 (x, y)}, where D is a domain in R², and ψ1 and ψ2 are smooth functions on D. Moreover, we assume that a
similar representation is also valid for projections onto the other two coordinate planes. Such
domains sometimes are called simple. If the domain Ω is not simple in all three directions, then
we may divide it into smaller domains Ωi which are simple. Adding the results for each i gives the
Divergence theorem for Ω and ∂Ω. Indeed, since after splitting Ω the surface integrals over the
newly introduced boundaries occur twice with the opposite normal vectors ~n, their sum is equal to
zero, and we end up with the surface integral over the original ∂Ω.
Denote by Γj the surface
Γj = {(x, y, ψj (x, y)) ∈ R3 : (x, y) ∈ D}, j = 1, 2.
Then, by Fubini’s theorem,
∫_Ω ∂F3 (x, y, z)/∂z dxdydz = ∫_D ( ∫_{ψ1(x,y)}^{ψ2(x,y)} ∂F3 (x, y, z)/∂z dz ) dxdy =
∫_D F3 (x, y, ψ2 (x, y))dxdy − ∫_D F3 (x, y, ψ1 (x, y))dxdy = ∫_{Γ1∪Γ2} F3 (x, y, z) cos ν3 dS,
where ν3 is the angle between the vector ~e3 and the normals (∂ψ2/∂x, ∂ψ2/∂y, 1) when (x, y, z) ∈ Γ2 and (−∂ψ1/∂x, −∂ψ1/∂y, −1) when (x, y, z) ∈ Γ1 , respectively. In the last step we used (5).
Let now Γ3 = {(x, y, z) : (x, y) ∈ ∂D, ψ1 (x, y) < z < ψ2 (x, y)} be the “vertical” part of ∂Ω. Let ν3 still denote the angle between ~e3 and the outward-pointing unit normal vector to Γ3 . Then ν3 = π/2 and cos ν3 = 0, so that
∫_{Γ3} F3 (x, y, z) cos ν3 dS = 0.
Since ∂Ω = Γ1 ∪ Γ2 ∪ Γ3 we can write
∫_{∂Ω} F3 (x, y, z) cos ν3 dS = ∫_{Γ1∪Γ2} F3 (x, y, z) cos ν3 dS,
and therefore,
(7) ∫_Ω ∂F3 (x, y, z)/∂z dxdydz = ∫_{∂Ω} F3 (x, y, z) cos ν3 dS.
Similarly, we establish the formulas
(8) ∫_Ω ∂F1 (x, y, z)/∂x dxdydz = ∫_{∂Ω} F1 (x, y, z) cos ν1 dS,
and
(9) ∫_Ω ∂F2 (x, y, z)/∂y dxdydz = ∫_{∂Ω} F2 (x, y, z) cos ν2 dS,
where ν1 and ν2 are the angles between the outward-pointing unit normal to ∂Ω and the standard basis vectors ~e1 and ~e2 . Taking the sum of (7), (8), (9) we obtain
(10) ∫_Ω div F~ dx = ∫_{∂Ω} (F1 cos ν1 + F2 cos ν2 + F3 cos ν3) dS,
which is precisely (6) for dimension 3.
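The theorem is easy to verify symbolically on a simple domain. The sketch below is an illustration, not part of the notes; the unit cube and the field F = (x², y², z²) are arbitrary choices.

```python
import sympy as sp

# Divergence theorem on the unit cube [0,1]^3 for F = (x**2, y**2, z**2):
# volume integral of div F versus total outward flux through the six faces.
x, y, z = sp.symbols('x y z')
F = (x**2, y**2, z**2)
div_F = sp.diff(F[0], x) + sp.diff(F[1], y) + sp.diff(F[2], z)
vol = sp.integrate(div_F, (x, 0, 1), (y, 0, 1), (z, 0, 1))
# on opposite faces the outward normals are +-e_k, so the flux pairs up
flux = (
    sp.integrate(F[0].subs(x, 1) - F[0].subs(x, 0), (y, 0, 1), (z, 0, 1))
    + sp.integrate(F[1].subs(y, 1) - F[1].subs(y, 0), (x, 0, 1), (z, 0, 1))
    + sp.integrate(F[2].subs(z, 1) - F[2].subs(z, 0), (x, 0, 1), (y, 0, 1))
)
```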
In conclusion we mention some useful consequences of the Divergence theorem. Let f be a function of class C1(Ω) ∩ C(Ω̄). Applying the Divergence theorem to the vector field F~ = f ~ek we obtain
(11) ∫_Ω ∂f/∂xk (x)dx = ∫_{∂Ω} f (x)(~ek , ~n(x))dS.
Let f = u · v. Then, since
(∂u/∂xk) v = ∂(uv)/∂xk − u ∂v/∂xk ,
formula (11) gives
(12) ∫_Ω ∂u/∂xk (x) v(x)dx = ∫_{∂Ω} u(x)v(x)(~ek , ~n(x))dS − ∫_Ω u ∂v/∂xk dx,
which is just the multidimensional integration by parts formula. Since (~ek , ~n) = cos νk , where νk is the angle between the vectors ~ek and ~n, the integration by parts formula can be rewritten as
∫_Ω ∂u/∂xk (x) v(x)dx = ∫_{∂Ω} u(x)v(x) cos νk dS − ∫_Ω u ∂v/∂xk dx.
Recall that the Laplacian of a C²-smooth function u(x1 , . . . , xn ) is the function ∆u = Σ_{j=1}^n ∂²u/∂xj² (x). Consider now two functions u and v of class C²(Ω) ∩ C1(Ω̄) such that their Laplacians ∆u and ∆v are integrable in Ω. Clearly,
∆u = div(∇u).
Furthermore, for every boundary point x ∈ ∂Ω the scalar product (∇u(x), ~n(x)) coincides with the directional derivative ∂u/∂~n. On the other hand,
v ∆u = v div(∇u) = div(v∇u) − (∇u, ∇v).
Integrating this identity over Ω and applying the Divergence theorem we obtain the first Green’s
formula:
(13) ∫_Ω v ∆u dx = ∫_{∂Ω} v ∂u/∂~n dS − ∫_Ω (∇u, ∇v)dx.
Similarly we have
∫_Ω u ∆v dx = ∫_{∂Ω} u ∂v/∂~n dS − ∫_Ω (∇v, ∇u)dx.
Subtracting this last equality from (13) we obtain the second Green’s formula
(14) ∫_Ω (v ∆u − u ∆v)dx = ∫_{∂Ω} ( v ∂u/∂~n − u ∂v/∂~n ) dS.
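The second Green's formula can be checked symbolically on the unit square. In the sketch below (an illustration, not part of the notes, with the arbitrary choices u = x² + y², v = xy) the boundary integral is assembled edge by edge.

```python
import sympy as sp

# Second Green's formula (14) on the unit square with u = x**2 + y**2, v = x*y:
# here Laplacian(u) = 4 and Laplacian(v) = 0.
x, y = sp.symbols('x y')
u, v = x**2 + y**2, x * y
lap = lambda w: sp.diff(w, x, 2) + sp.diff(w, y, 2)
lhs = sp.integrate(v * lap(u) - u * lap(v), (x, 0, 1), (y, 0, 1))

def dn(w1, w2, var, val, sign):
    # w1 * (normal derivative of w2) on the edge var = val, outward normal sign*e_var
    return sign * (w1 * sp.diff(w2, var)).subs(var, val)

rhs = 0
for var, val, sign in [(x, 1, 1), (x, 0, -1), (y, 1, 1), (y, 0, -1)]:
    other = y if var == x else x
    rhs += sp.integrate(dn(v, u, var, val, sign) - dn(u, v, var, val, sign),
                        (other, 0, 1))
```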
5.3. Change of variables in the integral. The following theorem is the multidimensional version
of the substitution rule in the integral.
Theorem 5.2. Let Φ : Ω̄′ → Ω̄ be a C1-diffeomorphism between two domains in Rn with C1-smooth boundary, and let f ∈ L1 (Ω). Then
∫_Ω f (x)dx = ∫_{Ω′} f ∘ Φ(y) |JΦ (y)|dy,
where JΦ denotes the determinant of the differential (Jacobian matrix) of the map Φ.
Proof. We will give the proof for n = 3. Assume that the coordinates in Ω are (x, y, z) and in Ω′ are (u, v, w). Suppose ∂Ω is parametrized by a function
φ : (s, t) ↦ (x(s, t), y(s, t), z(s, t)), (s, t) ∈ D ⊂ R².
Then the preimage of this parametrization under Φ represents the hypersurface ∂Ω′ by some function
(s, t) ↦ (u(s, t), v(s, t), w(s, t)).
Define the function
F (x, y, z) = ∫_0^z f (x, y, ξ)dξ.
Then ∂F/∂z = f in Ω. We will use the notation ∂(x, y)/∂(s, t) to denote the determinant of the Jacobian matrix obtained by taking partial derivatives of the functions x(s, t) and y(s, t) with respect to the variables (s, t). Similar notation will be used for any other collection of functions and variables.
Set
N~ = (N1 , N2 , N3 ) = ( ∂(y, z)/∂(s, t), ∂(z, x)/∂(s, t), ∂(x, y)/∂(s, t) ),
and
N~′ = (N1′ , N2′ , N3′ ) = ( ∂(v, w)/∂(s, t), ∂(w, u)/∂(s, t), ∂(u, v)/∂(s, t) ).
We claim that
(15) N3 = ∂(Φ1 , Φ2)/∂(v, w) · N1′ + ∂(Φ1 , Φ2)/∂(w, u) · N2′ + ∂(Φ1 , Φ2)/∂(u, v) · N3′ .
This can be verified by writing
x = Φ1 (u(s, t), v(s, t), w(s, t)), y = Φ2 (u(s, t), v(s, t), w(s, t)),
differentiating these equations with respect to s and t and then substituting into N3 = xs yt − xt ys .
The vectors N~ and N~′ are normal to ∂Ω and ∂Ω′ respectively; say, N~ is the outward-pointing normal to ∂Ω, and N~′ is the inward-pointing normal to ∂Ω′. Then
(16) ~n = N~/|N~| and ~n′ = −N~′/|N~′|
are the corresponding unit normal vectors. By the Divergence theorem,
∫_Ω f dx dy dz = ∫_Ω ∂F/∂z dx dy dz = ∫_{∂Ω} F cos(e3 , ~n)dS = ∫_D F N3 ds dt.
Substitution of N3 from (15) gives
∫_Ω f dx dy dz = ∫_D F ( ∂(Φ1 , Φ2)/∂(v, w) N1′ + ∂(Φ1 , Φ2)/∂(w, u) N2′ + ∂(Φ1 , Φ2)/∂(u, v) N3′ ) ds dt.
Since the surface measure on ∂Ω′ is given by dS′ = |N~′| ds dt, and since
(N1′ , N2′ , N3′ ) = ( −|N~′| cos(e1 , ~n′), −|N~′| cos(e2 , ~n′), −|N~′| cos(e3 , ~n′) ),
we get
∫_Ω f dx dy dz = − ∫_{∂Ω′} F ( ∂(Φ1 , Φ2)/∂(v, w) cos(e1 , ~n′) + ∂(Φ1 , Φ2)/∂(w, u) cos(e2 , ~n′) + ∂(Φ1 , Φ2)/∂(u, v) cos(e3 , ~n′) ) dS′.
Evaluating the last surface integral by the Divergence theorem, and using the relation
∂/∂u ( F ∂(Φ1 , Φ2)/∂(v, w) ) + ∂/∂v ( F ∂(Φ1 , Φ2)/∂(w, u) ) + ∂/∂w ( F ∂(Φ1 , Φ2)/∂(u, v) ) = f ∂(Φ1 , Φ2 , Φ3)/∂(u, v, w) = f JΦ ,
we finally obtain
∫_Ω f dx dy dz = − ∫_{Ω′} f JΦ du dv dw.
Since the Jacobian does not vanish in Ω′, it is either positive or negative. Taking f ≡ 1 we see that it is negative for our choice of the sign of the normal vectors in (16). Therefore, −JΦ = |JΦ |. The proof for the other choices of sign in (16) is similar.
Example 5.4. Consider again the spherical coordinates in Rn :
Φ : (r, θ1 , ..., θn−1 ) ↦ (x1 , ..., xn ),
defined on (0, +∞) × (0, π) × ... × (0, π) × (0, 2π),
where
x1 = r cos θ1 ,
x2 = r sin θ1 cos θ2 ,
...
xn−1 = r sin θ1 sin θ2 ... sin θn−2 cos θn−1 ,
xn = r sin θ1 sin θ2 ... sin θn−1 .
Then, for a domain D ⊂ Rn ,
∫_D f (x)dx = ∫_{Φ−1(D)} f ∘ Φ(r, θ) r^{n−1} (sin θ1 )^{n−2} ... sin θn−2 dr dθ1 ... dθn−1 .
If RS^{n−1} = {x ∈ Rn : |x| = R} is the sphere of radius R centred at the origin, then from (2) we have
∫_{RS^{n−1}} f (x)dS = R^{n−1} ∫_{D′} f ∘ Φ(R, θ) (sin θ1 )^{n−2} ... sin θn−2 dθ1 ... dθn−1 ,
with D′ = (0, π) × ... × (0, π) × (0, 2π). Suppose that f ∈ L1 (Rn ). Then rewriting the above integral we obtain
∫_{Rn} f (x)dx = ∫_0^∞ ( ∫_{rS^{n−1}} f (x)dS ) dr.
This can be interpreted as integration over a sphere of radius r, and then over all concentric spheres for 0 < r < ∞.
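This reduction to concentric spheres can be tested numerically on a radial function. The sketch below is an illustration, not part of the notes, with the arbitrary choice f(x) = e^{−|x|²} in R³; it uses the fact that the surface integral of a radial function over rS² is 4πr² times its value.

```python
import numpy as np
from scipy.integrate import quad

# For the radial function f(x) = exp(-|x|**2) on R^3 the surface integral over
# rS^2 equals 4*pi*r**2 * exp(-r**2); integrating over all radii must give the
# Gaussian integral over R^3, which is pi**(3/2).
total, _ = quad(lambda r: 4.0 * np.pi * r**2 * np.exp(-(r**2)), 0.0, np.inf)
exact = np.pi ** 1.5
```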
Let A ∈ O(n) be an orthogonal matrix: AAᵗ = Id, and let A also denote the corresponding linear transformation of Rn , i.e., A(t) = At. It follows from the formula of the change of variables in the integral that
∫_{S^{n−1}} f (x)dS = ∫_{S^{n−1}} f ∘ A(t)dS = ∫_{S^{n−1}} f (At)dS.
This property is often useful in computations of spherical integrals.
v: 2019-02-11
REAL ANALYSIS LECTURE NOTES
RASUL SHAFIKOV
6. Differential Equations
6.1. Ordinary Differential Equations. Consider a system of d first order ordinary differential
equations (ODE for short) and an initial condition
(1) y 0 = f (t, y), y(t0 ) = y0 ,
where y = (y1 , . . . , yd ) is a vector of unknown functions of a real variable t, y 0 = (dy1 /dt, . . . , dyd /dt),
and f : Ω → Rn is a continuous map on a domain Ω ⊂ Rn+1 . We say that y = y(t) defined on a
t-interval J containing t0 is a solution of the initial value problem (1) if y(t0 ) = y0 , (t, y(t)) ∈ Ω,
y(t) is differentiable and y 0 (t) = f (t, y(t)) for t ∈ J. These requirements are equivalent to the
following: y(t0 ) = y0 , (t, y(t)) ∈ Ω, y(t) is continuous and
Z t
y(t) = y0 + f (s, y(s))ds, t ∈ J.
t0
Here integration should be understood component-wise.
An initial value problem involving a system of equations of m-th order
(2) z^{(m)} = F (t, z, z^{(1)}, . . . , z^{(m−1)}), z^{(j)}(t0 ) = z0^j for j = 0, . . . , m − 1,
where z^{(j)} = d^j z/dt^j , z and F are n-dimensional vectors, and F is defined on an (mn + 1)-dimensional domain Ω. This can be considered as a special case of (1), where y is a d-dimensional vector with d = mn: symbolically, y = (z, z^{(1)}, . . . , z^{(m−1)}), or more precisely, y = (z1 , . . . , zn , z1′ , . . . , zn′ , z1^{(2)}, . . . , zn^{(m−1)}). Correspondingly,
f (t, y) = (z^{(1)}, . . . , z^{(m−1)}, F (t, y)), y0 = (z0^0 , . . . , z0^{m−1}).
For example, if n = 1, then z is a scalar, and (2) becomes
y1′ = y2 , . . . , y_{m−1}′ = ym , ym′ = F (t, y1 , . . . , ym ),
yj (t0 ) = z0^{j−1} for j = 1, . . . , m,
where y1 = z, y2 = z′ , . . . , ym = z^{(m−1)}.
The most fundamental question concerning ODE (1) is the existence and uniqueness of solutions.
Example 6.1. Consider the initial value problem given by y′ = y² and y(0) = c > 0. It is easy to see that y = c/(1 − ct) is a solution, but it exists only on the interval −∞ < t < 1/c, which depends on the initial condition. The initial value problem y′ = |y|^{1/2}, y(0) = 0, has more than one solution; in fact, it has a one-parameter family of solutions defined by y(t) = 0 for t ≤ c, y(t) = (t − c)²/4 for t ≥ c ≥ 0.
The following result gives basic conditions when a local solution of an ODE exists and is unique.
Theorem 6.1. Let y, f ∈ Rd , f (t, y) be continuous on R = {|t−t0 | ≤ a, |y−y0 | ≤ b} and uniformly
Lipschitz continuous with respect to y. Let M be a bound for |f (t, y)| on R, α = min(a, b/M ). Then
the initial value problem (1) has a unique solution y = y(t) on [t0 − α, t0 + α].
Proof. The solution is obtained as the limit of successive approximations. Set y^0(t) = y0 and define inductively
(3) y^{n+1}(t) = y0 + ∫_{t0}^t f (s, y^n(s))ds, n = 0, 1, 2, . . . .
Then, since f (t, y^n(t)) is defined and continuous on [t0 − α, t0 + α], the same holds for y^{n+1}(t). It is clear that
|y^{n+1}(t) − y0 | ≤ ∫_{t0}^t |f (s, y^n(s))|ds ≤ M α ≤ b.
Hence, y^0(t), y^1(t), . . . are defined and continuous on [t0 − α, t0 + α], and |y^n(t) − y0 | ≤ b.
It will now be verified by induction that
(4) |y^{n+1}(t) − y^n(t)| ≤ M K^n |t − t0 |^{n+1} / (n + 1)!, for t0 − α ≤ t ≤ t0 + α, n = 0, 1, . . . ,
where K is a Lipschitz constant for f . Clearly, (4) holds for n = 0. Assume that it holds up to n − 1. Then
y^{n+1}(t) − y^n(t) = ∫_{t0}^t ( f (s, y^n(s)) − f (s, y^{n−1}(s)) ) ds, n ≥ 1.
Thus, the definition of K implies that
|y^{n+1}(t) − y^n(t)| ≤ K ∫_{t0}^t |y^n(s) − y^{n−1}(s)| ds,
and so, by (4),
|y^{n+1}(t) − y^n(t)| ≤ (M K^n / n!) ∫_{t0}^t |s − t0 |^n ds = M K^n |t − t0 |^{n+1} / (n + 1)!.
This proves (4) for general n. It follows from this inequality that the series
y0 + Σ_{n=0}^∞ [ y^{n+1}(t) − y^n(t) ] =: y(t)
is uniformly convergent on [t0 − α, t0 + α], i.e., we have a uniform limit
(5) y(t) = lim_{n→∞} y^n(t).
Since f (t, y) is uniformly continuous on R, f (t, y^n(t)) → f (t, y(t)) as n → ∞ on [t0 − α, t0 + α]. Thus, taking the limit in the integral in (3) gives
(6) y(t) = y0 + ∫_{t0}^t f (s, y(s))ds.
Hence, (5) is a solution of (1).
In order to prove uniqueness, let y = z(t) be any solution of (1) on [t0 − α, t0 + α]. Then
z(t) = y0 + ∫_{t0}^t f (s, z(s))ds.
An induction similar to that used above gives, using (3),
|y^n(t) − z(t)| ≤ M K^n |t − t0 |^{n+1} / (n + 1)! for t0 − α ≤ t ≤ t0 + α, n = 0, 1, . . . .
Letting n → ∞, it follows from (5) that |y(t) − z(t)| = 0, i.e., y(t) ≡ z(t). This proves the theorem.
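The successive approximations in the proof are easy to implement. The sketch below is a minimal illustration, not part of the notes; the sample problem y′ = y, y(0) = 1 and the trapezoidal quadrature are arbitrary choices. The iterates converge to e^t.

```python
import numpy as np

# Picard iteration for y' = y, y(0) = 1 on [0, 1]; the exact solution is exp(t).
def picard(f, y0, t_grid, iterations):
    y = np.full_like(t_grid, y0, dtype=float)        # y^0(t) = y0
    for _ in range(iterations):
        g = f(t_grid, y)
        # cumulative trapezoidal rule for int_{t0}^{t} f(s, y^n(s)) ds
        integral = np.concatenate(
            ([0.0], np.cumsum(0.5 * (g[1:] + g[:-1]) * np.diff(t_grid))))
        y = y0 + integral
    return y

t = np.linspace(0.0, 1.0, 2001)
approx = picard(lambda t, y: y, 1.0, t, iterations=25)
max_err = np.max(np.abs(approx - np.exp(t)))
```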
One can show that if the function f in (1) is merely continuous, then the initial value problem
always has a solution, but it may not be unique. This is Peano’s Existence theorem.
Consider now a homogeneous linear system of differential equations of the form
(7) y 0 = A(t)y
and the corresponding inhomogeneous system
(8) y 0 = A(t)y + f (t),
where A(t) is a d × d matrix of continuous functions in t, and f (t) is a continuous vector function
of size d. It is a consequence of the theorem on existence and uniqueness of solutions of ODEs that (7) has a unique solution given the initial condition y(t0 ) = y0 . Further, if y(t) is a solution of (7) and y(t0 ) = 0 for some t0 , then y(t) ≡ 0. The following is immediate.
Proposition 6.2 (Principle of Superposition). Let y = y1 (t), y2 (t) be solutions of (7), then any
linear combination y = c1 y1 (t) + c2 y2 (t) with constants c1 , c2 is also a solution. If y = y1 (t) and
y = y0 (t) are solutions of (7) and (8) respectively, then y = y0 (t) + y1 (t) is a solution of (8).
By a fundamental matrix Y (t) of (7) we mean a d × d matrix such that its columns are solutions
of (7) and det Y (t) 6= 0. If Y = Y0 (t) is a fundamental matrix of solutions and C is a constant d × d
matrix, then Y (t) = Y0 (t)C is also a solution, in fact, any solution of (7) can be obtained this way
for some suitable C.
Example 6.2. Let R be a constant d × d matrix with real coefficients. Consider the system of
differential equations
(9) y 0 = R y.
Let y1 6= 0 be a constant vector, and λ be a (complex) number. By substituting y = y1 eλt into the
equation we see that a necessary and sufficient condition for y to be a solution of (9) is
Ry1 = λy1 ,
i.e., that λ is an eigenvalue and y1 ≠ 0 is a corresponding eigenvector of R. Thus to each eigenvalue λ of R there corresponds at least one solution of (9). If R has distinct eigenvalues λ1 , λ2 , . . . , λd with linearly independent eigenvectors y1 , . . . , yd , then
Y = ( y1 e^{λ1 t}, . . . , yd e^{λd t} )
is a fundamental matrix for (9).
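This construction can be checked numerically against the matrix exponential. In the sketch below (an illustration, not part of the notes; the 2 × 2 matrix R is an arbitrary choice with distinct real eigenvalues) the eigenvector-built fundamental matrix agrees with expm(Rt) applied to the initial data.

```python
import numpy as np
from scipy.linalg import expm

# R has distinct eigenvalues -1 and -2, so the columns y_j * exp(lambda_j * t)
# form a fundamental matrix; it must agree with expm(R*t) @ V, V = (y_1 y_2).
R = np.array([[0.0, 1.0],
              [-2.0, -3.0]])
lam, V = np.linalg.eig(R)              # columns of V are eigenvectors
t = 0.7
Y_t = V @ np.diag(np.exp(lam * t))     # eigenvector-built fundamental matrix
err = np.max(np.abs(Y_t - expm(R * t) @ V))
```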
Finally, linear differential equations of higher order can be reduced to a system of first order.
Indeed, let pj (t), j = 0, . . . , d−1, and h(t) be continuous functions. Consider the linear homogeneous
equation of order d
(10) u(d) + pd−1 (t)u(d−1) + · · · + p1 (t)u0 + p0 (t)u = 0
and the corresponding inhomogeneous equation
(11) u(d) + pd−1 (t)u(d−1) + · · · + p1 (t)u0 + p0 (t)u = h(t).
We let y = (u, u^{(1)}, . . . , u^{(d−1)}) and
A(t) =
[  0    1    0    0   ···   0    ]
[  0    0    1    0   ···   0    ]
[  ···                           ]
[  0    0    0    0   ···   1    ]
[ −p0  −p1  −p2  −p3  ···  −pd−1 ],
and f (t) = (0, . . . , 0, h(t)). With this choice of y, A and f , equation (10) becomes equivalent to (7), while (11) is equivalent to (8). With this we can apply the results available for first order systems; in particular, given the initial conditions u(t0 ) = u0 , u′(t0 ) = u0′ , ..., u^{(d−1)}(t0 ) = u0^{(d−1)}, where the u0^{(j)} are arbitrary numbers, the corresponding initial value problem has a unique solution. The Principle of Superposition also holds: if u = u1 (t), u2 (t) are two solutions of (10), then any linear combination u(t) = c1 u1 (t) + c2 u2 (t) is also a solution, and u(t) = u1 (t) + u0 (t) is a solution of (11) if u0 (t) is.
6.2. Partial Differential Equations. A partial differential equation (PDE for short) is an equa-
tion that involves an unknown function of two or more independent variables and certain partial
derivatives of the unknown function. More precisely, let u denote a function of n independent
variables x1 , . . . , xn , n ≥ 2. Then a relation of the form
(12) F (x1 , . . . , xn , u, ux1 , . . . , uxn , ux1 x2 , . . . ) = 0,
where F is a function of its arguments, is a partial differential equation in u. The following equations
are some examples of PDEs on R2 with coordinates (x, y):
(13) xux + yuy − 2u = 0
(14) yux − xuy = x
(15) uxx − uy − u = 0
(16) uux + yuy − u = xy 2
(17) uxx + x(uy )2 + yu = y
A typical problem in the theory of PDEs is for a given equation to find on some domain of Rn
a solution satisfying certain additional initial or boundary conditions. Analogously to ODEs, the order of the highest derivative appearing in a PDE is called the order of the equation. Thus, (13), (14), and (16) are all first-order PDEs, and the remaining two are second-order. If there exists a function
u defined in a domain under consideration, such that u and its derivatives identically satisfy (12),
then u is called a solution of the equation.
A PDE is called linear if it is of degree at most one in u and its derivatives, that is, if F depends linearly (affinely) on u and its partial derivatives. Equations (13), (14), and (15) above are linear, while the other two are not.
Example 6.3. The first-order linear ODE of the form
du/dx + u = f (x)
has the general solution
u(x) = ∫_0^x e^{−(x−t)} f (t)dt + C e^{−x}.
Now if we consider the first order PDE on R² for the unknown function u = u(x, y),
(18) ∂u/∂x + u = f (x),
then its general solution is given by
u(x, y) = ∫_0^x e^{−(x−t)} f (t)dt + g(y)e^{−x},
where g(y) is an arbitrary function of y. It is easy to see that for any choice of g the function u satisfies (18). Thus the general solution of a PDE may contain arbitrary functions, not necessarily just arbitrary constants.
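The claim of Example 6.3 can be verified symbolically. The sketch below is an illustration, not part of the notes; the sample choices f(x) = x² and g(y) = sin y are arbitrary.

```python
import sympy as sp

# u(x, y) = int_0^x e^{-(x-t)} f(t) dt + g(y) e^{-x} with f(t) = t**2 and
# g(y) = sin(y); the residual u_x + u - f(x) must vanish identically.
x, y, t = sp.symbols('x y t')
u = sp.integrate(sp.exp(-(x - t)) * t**2, (t, 0, x)) + sp.sin(y) * sp.exp(-x)
residual = sp.simplify(sp.diff(u, x) + u - x**2)
```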
REAL ANALYSIS LECTURE NOTES 5
A map between function spaces that involves differentiation is called a differential operator. For
example, the map L : C∞(R²) → C∞(R²) given by L(u) = ∂²u/∂t² − ∂²u/∂x² is a second-order differential operator. A differential operator L is called linear if L(c1 u1 + c2 u2 ) = c1 L(u1 ) + c2 L(u2 ). It is
immediate that L is linear if and only if the PDE L(u) = 0 is linear, and that any finite sum of
linear differential operators is again linear. Given a linear operator L, the PDE Lu = 0 is called
a homogeneous PDE, while Lu = f is inhomogeneous for f ≢ 0. As for ODEs, the Principle of Superposition holds: a linear combination of solutions of a homogeneous equation is again a solution, and the sum of a solution of the homogeneous equation and a solution of the inhomogeneous equation is again a solution of the inhomogeneous one.
We now consider three classical PDEs: the Wave equation, the Heat equation, and the Laplace
equation. For a function u = u(x, t), where t ∈ R, x = (x1 , . . . , xn ) ∈ Rn , the equation of the form
(19) ∂²u/∂t² − c² ( ∂²u/∂x1² + · · · + ∂²u/∂xn² ) = ∂²u/∂t² − c² ∆x u = 0, c = const.,
is called the Wave equation. This equation describes many types of elastic and electromagnetic waves. In many physical applications the variable t represents time and x represents coordinates in the Euclidean space where the physical experiment takes place. As an equation of second order, a
typical initial condition for the Wave equation is of the form
u(x, 0) = f (x),
ut (x, 0) = g(x).
When n = 1 the equation has a surprisingly simple solution. We let ξ = x − ct and η = x + ct. After this change of variables equation (19) takes the form uξη = 0, which after elementary considerations admits the solution u(ξ, η) = F (ξ) + G(η), or, after returning to the original variables,
u(x, t) = F (x − ct) + G(x + ct).
The general solution for an arbitrary n can be obtained using the theory of Fourier series.
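The d'Alembert form of the solution can be verified symbolically for arbitrary (symbolic) F and G. The sketch below is an illustration, not part of the notes.

```python
import sympy as sp

# u = F(x - c*t) + G(x + c*t) with arbitrary (symbolic) F, G must satisfy
# u_tt - c**2 * u_xx = 0.
x, t, c = sp.symbols('x t c')
F, G = sp.Function('F'), sp.Function('G')
u = F(x - c * t) + G(x + c * t)
residual = sp.simplify(sp.diff(u, t, 2) - c**2 * sp.diff(u, x, 2))
```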
Consider now the equation of the form
(20) ∂u/∂t − k ∆x u = 0.
This is the so-called Heat equation. Again t ∈ R is the “time” variable, x = (x1 , . . . xn ), and
∆x is the Laplacian in variable x. As a primary physical application, equation (20) describes the
conduction of heat with the function u usually representing the temperature of a “point” with
coordinates x at time t, but more generally it governs a range of physical phenomena described as
diffusive. Typical initial conditions are
u(x, 0) = f (x),
u(0, t) = 0,
where the first condition can be interpreted as the initial temperature of the system, and the second
one declares that the temperature is fixed at the “end” point. A solution of (20) for n = 1 can be found, for example, using the separation of variables method: assuming k = 1 we seek a solution of the form u(x, t) = X(x)T (t). Substituting this into (20) gives
T ′(t)/T (t) = X ″(x)/X(x).
Since the right-hand side is independent of t and the left-hand side is independent of x, each side must be a constant. This gives
T ′ = λT, X ″ = λX.
Solving these equations gives
u(x, t) = e^{λt} ( A e^{√λ x} + B e^{−√λ x} ).
Initial conditions will then specify the values of the constants.
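The separated solution can be verified symbolically. The sketch below is an illustration, not part of the notes, with λ > 0 an arbitrary symbolic constant; it checks that u solves the heat equation with k = 1.

```python
import sympy as sp

# u = e^{lambda*t} (A e^{sqrt(lambda) x} + B e^{-sqrt(lambda) x}) must satisfy
# u_t - u_xx = 0 (the case k = 1).
x, t = sp.symbols('x t')
lam, A, B = sp.symbols('lambda A B', positive=True)
u = sp.exp(lam * t) * (A * sp.exp(sp.sqrt(lam) * x) + B * sp.exp(-sp.sqrt(lam) * x))
residual = sp.simplify(sp.diff(u, t) - sp.diff(u, x, 2))
```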
Finally, for a function u = u(x1 , . . . , xn ) consider the equation
(21) ∆u = ∂²u/∂x1² + · · · + ∂²u/∂xn² = 0.
This PDE, which is called the Laplace equation, has many applications in gravitation, elastic
membranes, electrostatics, fluid flow, etc, as well as numerous applications in other areas of pure
mathematics. There are two types of boundary conditions on a bounded domain Ω ⊂ Rn for the
Laplace equation that give a well-posed problem: the Dirichlet condition
u(x) = f (x), x ∈ ∂Ω,
and the Neumann condition
∂u/∂~n (x) = f (x), x ∈ ∂Ω,
where ∂u/∂~n (x) is the derivative in the normal direction to ∂Ω. Solutions of (21) are called harmonic functions. For n = 2, solutions can be found again by separation of variables.
Unlike the theory of ODEs, not every PDE has a solution. In 1957 H. Lewy found an example of a first order linear PDE that has no solution. The corresponding (complex-valued) differential operator on C∞(R³) is
Lu = −ux − i uy + 2i(x + iy)uz .
There exists a real valued function f (x, y, z) of class C∞(R³) such that the equation Lu = f (x, y, z) has no solution of class C1(Ω) in any open subset Ω ⊂ R³. While Lewy’s example is not explicit, explicit constructions were later found as well.
The situation is different, however, if the functions involved in a PDE are real-analytic. Recall
that for a domain Ω ⊂ Rn a function f (x) : Ω → R is real-analytic if near any point in Ω it can
be represented by a convergent power series. More precisely, if a = (a1 , . . . , an ) ∈ Ω, there exists a
polydisc U (a, r) = {x ∈ Rn : |xj − aj | < r, j = 1, . . . , n}, U ⊂ Ω, such that
f (x) = Σ_{|k|≥0} bk (x − a)^k,
for u ∈ C k (Ω).
Recall that supp h, the support of a function h ∈ C⁰(Rn ), is defined to be the closure in Rn of the set {x ∈ Rn : h(x) ≠ 0}. Denote by C0∞(Ω) the subspace of C∞(Rn ) which consists of functions h such that supp h is a compact subset of Ω.
Our main goal is to introduce and to study the space of continuous linear functionals on the
linear vector space C0∞ (Ω). For this we first need to choose some topology on C0∞ (Ω). For our
purposes it will be sufficient simply to define the notion of convergence of a sequence of elements
in C0∞ (Ω). This will allow us to define the continuity of linear functionals. We say that a sequence
of functions (ϕj ) of class C0∞ (Ω) converges to ϕ ∈ C0∞ (Ω) if the following conditions hold:
(i) There exists a compact subset K such that K ⊂ Ω and ϕj = 0 on Rn \ K for every j = 1, 2, 3, . . . . In other words, supp ϕj ⊂ K ⊂ Ω for every j.
(ii) For every multi-index α the sequence Dα ϕj converges to Dα ϕ uniformly on K. That is, all partial derivatives of ϕj of all orders converge uniformly to the corresponding partial derivatives of ϕ.
Definition 7.1. The space C0∞ (Ω) equipped with the above topology of convergence of sequences
will be denoted by D(Ω). The elements of this space are called test-functions.
We leave as an exercise for the reader to verify that D(Ω) is a topological vector space.
Example 7.1. Denote by |x| = ( |x1 |² + ... + |xn |² )^{1/2} the Euclidean norm on Rn . Set
ωε (x) = Cε e^{−ε²/(ε² − |x|²)} for |x| < ε, and ωε (x) = 0 for |x| ≥ ε,
where the constant Cε is chosen so that ∫_{Rn} ωε (x)dx = 1,
that is,
Cε = ε^{−n} ( ∫_{B(0,1)} e^{−1/(1 − |t|²)} dt )^{−1},
where B(0, 1) denotes the Euclidean unit ball in Rn . This is a model example of a function from
D(Rn ) (the so-called bump function). Observe that
ωε (x) = ε^{−n} ω1 (x/ε).
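The defining properties of the bump function are easy to test numerically in dimension n = 1. The sketch below is an illustration, not part of the notes; it builds ωε with the normalizing constant Cε and checks that it integrates to 1 and vanishes outside [−ε, ε].

```python
import numpy as np
from scipy.integrate import quad

# One-dimensional bump: C_eps = eps^{-1} * (int_{-1}^{1} e^{-1/(1-s^2)} ds)^{-1}.
def bump(eps):
    c_inv, _ = quad(lambda s: np.exp(-1.0 / (1.0 - s * s)) if abs(s) < 1 else 0.0,
                    -1.0, 1.0)
    C = 1.0 / (eps * c_inv)
    def w(x):
        if abs(x) >= eps:
            return 0.0
        return C * np.exp(-eps**2 / (eps**2 - x**2))
    return w

w = bump(0.5)
mass, _ = quad(w, -0.5, 0.5)           # should equal 1 by the choice of C_eps
```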
In what follows we write ω(x) instead of ω1 (x). The bump function allows us to construct
suitable test-functions for an arbitrary domain Ω in Rn . We denote by Ωε := ∪x∈Ω B(x, ε), the
ε-neighbourhood of Ω.
Lemma 7.2. Given a compact subset Ω ⊂ Rn and ε > 0 there exists a function η : Rn −→ [0, 1] of
class C ∞ (Rn ) such that
η(x) = 1, for x ∈ Ωε ,
η(x) = 0, for x ∈ Rn \Ω3ε .
Proof. Let
λ(x) = 1 for x ∈ Ω2ε , λ(x) = 0 for x ∈ Rn \ Ω2ε ,
be the characteristic function of Ω2ε . Then the function defined by the convolution integral
η(x) = ∫_{Rn} λ(y) ωε (x − y)dy
satisfies the required conditions.
Corollary 7.3. Let Ω be a bounded domain and Ω′ be its subset such that Ω̄′ ⊂ Ω. Then there exists a function η ∈ D(Ω), η : Ω → [0, 1], such that η(x) = 1 whenever x ∈ Ω′.
We point out that the topology introduced above on the space D(Ω) is not metrizable. In other words, one cannot define a distance d on the space of test functions such that convergence with respect to d is equivalent to the convergence of sequences of test functions introduced above. To see this, recall the following elementary assertion.
Claim. Let {u^m_k} be a countable family of sequences in a metric space (X, d). Suppose that for every m = 0, 1, ... the sequence {u^m_k} admits the limit u^m := lim_{k→∞} u^m_k, and the sequence of these limits {u^m} also tends to u ∈ X as m → ∞. Then for every m there exists k_m such that k_{m−1} < k_m and the sequence {u^m_{k_m}} converges to u.
continuous functions γj . Define a neighbourhood of the origin in D(R) as the set of test-functions φ satisfying |φ^{(j)}(x)| ≤ γj (x), x ∈ R, j = 1, 2, . . . , m. Then the convergence in this topology coincides precisely with the convergence on D(R) introduced above. The reader is encouraged to supply a proof of this fact.
7.2. Regularization of functions. The convolution f ∗g of two functions f, g ∈ L2 (Rn ) is defined
by the integral
(1) f ∗ g(x) = ∫_{Rn} f (y) g(x − y)dy = ∫_{Rn} f (x − y) g(y)dy.
The convolution has particularly nice properties when the second factor is a test function. Denote by L1loc(Ω) the space of locally Lebesgue-integrable functions on Ω. Recall that a measurable function f is in L1loc(Ω) if and only if ∫_X |f(x)| dx < ∞ for every compact measurable subset X ⊂ Ω. Let f ∈ L1loc(Rn) and ϕ ∈ D(Rn). Then for every x ∈ Rn the function y ↦ ϕ(x − y) is a test-function, and so the convolution
(2) f ∗ ϕ(x) = ∫_{Rn} f(y) ϕ(x − y) dy
is well-defined point-wise as a usual function on all of Rn.
Proposition 7.4. We have f ∗ ϕ ∈ C∞(Rn) and
(3) Dα(f ∗ ϕ) = f ∗ Dα ϕ.
Proof. (a) Let us show that the convolution f ∗ ϕ is a continuous function. Let x^k ∈ Rn be a sequence converging to x. Then ϕ(x^k − y) −→ ϕ(x − y), as k → ∞, everywhere, and
f ∗ ϕ(x^k) = ∫_{Rn} f(y) ϕ(x^k − y) dy −→ ∫_{Rn} f(y) ϕ(x − y) dy = (f ∗ ϕ)(x), as k → ∞,
(ii) if f ∈ C(Rn), then fε −→ f, as ε → 0+, in C(Ω) for every bounded subset Ω of Rn;
(iii) if f ∈ Lp(Rn), 1 ≤ p < ∞, then fε −→ f, as ε → 0+, in Lp(Rn).
Proof. Part (i) follows from Proposition 7.4. To show (ii), assume that f is a continuous function
on Rn . Then
(4) fε(x) = ∫_{Rn} ωε(x − y) f(y) dy = ∫_{Rn} ωε(y) f(x − y) dy = ∫_{|y|≤ε} ωε(y) f(x − y) dy.
Since
∫_{Rn} ωε(x) dx = 1,
we have
|∫_{Rn} ωε(y) f(x − y) dy − f(x)| = |∫_{Rn} ωε(y)(f(x − y) − f(x)) dy| ≤ Mε ∫_{|y|≤ε} ωε(y) dy = Mε,
where
Mε = sup_{x∈Ω, |y|≤ε} |f(x − y) − f(x)|.
Proof. Consider locally finite subcoverings (Wγ) and (Vβ) constructed in the previous proposition. Fix γ. By construction, there exists a finite number of β such that the closed ball Wγ is contained in Vβ. By Corollary 7.3 there exists a function
φγ ∈ C0∞( ∩_{Wγ⊂Vβ} Vβ )
with values in [0, 1] that is equal to 1 on Wγ. Hence, φ(x) = Σ_γ φγ(x) is well-defined for every x ∈ X and φ(x) > 0. Now set ηγ(x) = φγ(x)/φ(x).
The family (ηγ ) is called a partition of unity subordinated to the covering (Wγ ).
v: 2019-03-05
Example 8.1. Recall that L1loc(Ω) is the space of locally Lebesgue-integrable functions on Ω. Let f be in L1loc(Ω), i.e., ∫_X |f(x)| dx < ∞ for every compact measurable subset X ⊂ Ω. Then f defines a regular distribution Tf acting by ⟨Tf, ϕ⟩ = ∫_Ω f(x)ϕ(x) dx, i.e., by integration against the values of a function defined point-wise. Distributions which are not regular are called singular. The following example confirms their existence.
Example 8.2. Consider the distribution δ(x) ∈ D0(Rn) (the Dirac delta function) defined by
⟨δ(x), ϕ⟩ = ϕ(0)
for ϕ ∈ D(Rn). Suppose that there exists a function f ∈ L1loc(Ω) such that δ = Tf. For every ε > 0 consider the function ψε ∈ D(Ω) defined by ψε(x) = e^{−ε²/(ε²−|x|²)} for |x| < ε, and ψε(x) = 0 for |x| ≥ ε. Then ⟨δ, ψε⟩ = ψε(0) = e^{−1}. On the other hand,
⟨Tf, ψε⟩ = ∫ f(x) ψε(x) dx,
and by the Lebesgue convergence theorem the last integral tends to 0 as ε → 0: a contradiction.
Thus, the δ-function is a singular distribution. Similarly, for every a ∈ Rn one can define the translated delta function δa:
⟨δa, ϕ⟩ = ϕ(a).
Example 8.3. Another interesting and typical example of a singular distribution is given as follows:
⟨P(1/x), φ⟩ = v.p. ∫_R φ(x)/x dx := lim_{ε→0+} ( ∫_{−∞}^{−ε} φ(x)/x dx + ∫_{ε}^{+∞} φ(x)/x dx ),
where v.p. stands for valeur principale, the principal value of the integral in the sense of Cauchy. First we show that P(1/x) is well-defined. Indeed, let φ ∈ D(R) be arbitrary with supp φ ⊂ [−A, A] for some A > 0. Then, using the Mean Value Theorem for φ on the intervals [−x, 0] and [0, x], we obtain
⟨P(1/x), φ⟩ = lim_{ε→0+} ( ∫_{−A}^{−ε} φ(x)/x dx + ∫_{ε}^{A} φ(x)/x dx )
= lim_{ε→0+} ( ∫_{−A}^{−ε} (φ(0) − xφ′(ξ(x)))/x dx + ∫_{ε}^{A} (φ(0) + xφ′(ξ̃(x)))/x dx ).
The terms with φ(0) cancel by symmetry, and therefore
|⟨P(1/x), φ⟩| ≤ ∫_{−A}^{0} |φ′(ξ(x))| dx + ∫_{0}^{A} |φ′(ξ̃(x))| dx ≤ 2A sup_{[−A,A]} |φ′| < ∞.
Clearly, P(1/x) is linear. Let us now show that it is continuous on D(R). Consider a sequence (φj) converging to 0 in D(R). This means that there exists A > 0 such that φj(x) = 0 for every j and every |x| ≥ A. Then, applying the Mean Value Theorem, we have
|⟨P(1/x), φj⟩| = |v.p. ∫_R φj(x)/x dx| = |v.p. ∫_{−A}^{A} (φj(0) + xφj′(ξ(x)))/x dx|
≤ ∫_{−A}^{A} |φj′(ξ(x))| dx ≤ 2A sup_{[−A,A]} |φj′| −→ 0, as j −→ ∞.
8.2. Convergence of distributions. Now we define a topology on the space of distributions. For
applications it is sufficient to use the standard notion of weak∗ convergence.
Definition 8.3. A sequence of distributions (fj) converges to a distribution f in D0(Ω) if for every ϕ ∈ D(Ω) one has lim_{j→∞} ⟨fj, ϕ⟩ = ⟨f, ϕ⟩.
The following simple example of convergence is very important.
For every τ > 0 there exists an ε0 > 0 such that |φ(x) − φ(0)| < τ when |x| < ε0. Using the properties of the bump-function we obtain, for 0 < ε < ε0,
|∫_{Rn} ωε(x)φ(x) dx − φ(0)| ≤ ∫_{|x|≤ε} ωε(x)|φ(x) − φ(0)| dx < τ.
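The convergence ∫ωε(x)φ(x)dx → φ(0) is easy to observe numerically. A sketch (NumPy; the test function φ and the values of ε are our own choices):

```python
import numpy as np

def mollifier_vals(x, eps):
    """Unnormalized bump omega_eps evaluated on the grid x."""
    w = np.zeros_like(x)
    m = np.abs(x) < eps
    w[m] = np.exp(-eps**2 / (eps**2 - x[m]**2))
    return w

phi = lambda x: np.cos(x) + x          # a smooth test function with phi(0) = 1
vals = []
for eps in (0.5, 0.1, 0.02):
    x = np.linspace(-eps, eps, 20001)
    dx = x[1] - x[0]
    w = mollifier_vals(x, eps)
    w /= w.sum() * dx                  # normalize: discrete integral of omega_eps is 1
    vals.append(np.sum(w * phi(x)) * dx)

print(vals)   # the pairings tend to phi(0) = 1 as eps -> 0+
```

The printed values approach 1, i.e. ωε −→ δ in D0(Rn) in the sense of Definition 8.3.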
Another fundamental property of the space D0 (Ω) is its completeness.
Theorem 8.5. Let (fj) be a sequence in D0(Ω) such that for every ϕ ∈ D(Ω) the sequence (⟨fj, ϕ⟩)
converges in R. Consider the map f : D(Ω) −→ R defined by
⟨f, ϕ⟩ := lim_{j→∞} ⟨fj, ϕ⟩, ϕ ∈ D(Ω).
Then f ∈ D0 (Ω).
Proof. The linearity of f is obvious, so we just need to establish its continuity. Let ϕk −→ 0 as k −→ ∞ in D(Ω). Arguing by contradiction, suppose that ⟨f, ϕk⟩ does not converge to 0. Passing to a subsequence we may assume that there exists an ε > 0 such that |⟨f, ϕk⟩| ≥ 2ε for all k. Since ⟨f, ϕk⟩ = lim_{j→∞} ⟨fj, ϕk⟩, for every k there exists jk such that |⟨f_{jk}, ϕk⟩| ≥ ε. However, this contradicts the following statement:
Lemma 8.6. Let (fk) be a sequence in D0(Ω) satisfying the assumptions of Theorem 8.5 and ϕk −→ 0 in D(Ω). Then ⟨fk, ϕk⟩ −→ 0, as k −→ ∞.
Thus, in order to complete the proof of the theorem it remains to prove the lemma.
Proof of Lemma 8.6. Suppose on the contrary that the statement of the lemma is false. Passing to a subsequence we may assume that |⟨fk, ϕk⟩| ≥ C > 0. Since ϕk −→ 0 in D(Ω), we have:
(a) ϕk = 0 for all k outside a compact subset K ⊂ Ω.
(b) For every α the sequence Dα ϕk converges uniformly to 0.
Passing to a subsequence we can assume that for any k = 0, 1, 2, ...,
|Dα ϕk(x)| ≤ 1/4^k, |α| ≤ k.
Set ψk = 2^k ϕk; then
(2) |Dα ψk(x)| ≤ 1/2^k, |α| ≤ k.
Furthermore, ψk −→ 0 in D(Rn) and every series of the form Σ_s ψ_{k_s}(x) converges in D(Ω). On the other hand,
(3) |⟨fk, ψk⟩| = 2^k |⟨fk, ϕk⟩| ≥ 2^k C −→ ∞ as k −→ ∞.
To reach a contradiction, we construct by induction suitable subsequences (f_{k_s}) and (ψ_{k_s}) that satisfy inequalities (7) and (8) below. Choose f_{k_1} and ψ_{k_1} such that |⟨f_{k_1}, ψ_{k_1}⟩| ≥ 2. This is always possible in view of (3). Suppose that f_{k_j}, ψ_{k_j}, j = 1, ..., s − 1, are already constructed. We wish to find f_{k_s}, ψ_{k_s}. Since ψk −→ 0, k −→ ∞, in D(Ω), we have lim_{k→∞} ⟨f_{k_j}, ψk⟩ = 0 for any j = 1, ..., s − 1, and so there exists N such that for k ≥ N
(4) |⟨f_{k_j}, ψk⟩| ≤ 1/2^{s−j}, j = 1, ..., s − 1.
Moreover, since
lim_{k→∞} ⟨fk, ψ_{k_j}⟩ = ⟨f, ψ_{k_j}⟩, j = 1, ..., s − 1,
there exists N1 ≥ N such that for all k ≥ N1
(5) |⟨fk, ψ_{k_j}⟩| ≤ |⟨f, ψ_{k_j}⟩| + 1, j = 1, ..., s − 1.
Finally, in view of (3) we fix k_s ≥ N1 such that
(6) |⟨f_{k_s}, ψ_{k_s}⟩| ≥ Σ_{j=1}^{s−1} |⟨f, ψ_{k_j}⟩| + 2s.
Now it follows from (4), (5), (6) that the functions f_{k_s} and ψ_{k_s} satisfy
(7) |⟨f_{k_j}, ψ_{k_s}⟩| ≤ 1/2^{s−j}, j = 1, ..., s − 1,
(8) |⟨f_{k_s}, ψ_{k_s}⟩| ≥ Σ_{j=1}^{s−1} |⟨f_{k_s}, ψ_{k_j}⟩| + s + 1.
that is, ⟨f_{k_s}, ψ⟩ −→ ∞ as s −→ ∞. This contradicts the condition ⟨fk, ψ⟩ −→ ⟨f, ψ⟩, which completes the proof.
8.3. Multiplication of distributions. The product of two functions of class L1loc(R) in general is not in this class (consider, for instance, f(x) = |x|^{−1/2} and f²). This example shows that it is
impossible in general to define in a natural way even the product of regular distributions. In fact,
one can show that it is impossible to define a multiplication of two distributions which satisfies
the standard algebraic properties (commutativity, associativity,...). However, one can define the
product of a distribution f ∈ D0 (Ω) and a function a ∈ C ∞ (Ω).
First, consider the case when f ∈ L1loc(Ω), i.e., f is a regular distribution. Then the distribution corresponding to the usual product af acts on a test-function ϕ by
⟨T_{af}, ϕ⟩ = ∫_Ω a(x) f(x) ϕ(x) dx = ⟨Tf, aϕ⟩.
In the case of an arbitrary distribution f we take the right-hand side of this equality to be the
definition of the distribution af , i.e., we set
⟨af, ϕ⟩ := ⟨f, aϕ⟩, ϕ ∈ D(Ω).
Observe two immediate properties of the algebraic operation of multiplication of a distribution
by a smooth function a:
(a) Linearity: for every f, g ∈ D0 (Ω) and real λ, µ we have
a(λf + µg) = λ(af ) + µ(ag).
(b) Continuity: if fj −→ f in D0 (Ω) then afj −→ af in D0 (Ω).
Example 8.4. a(x)δ(x) = a(0)δ(x), since
⟨aδ, φ⟩ = ⟨δ, aφ⟩ = a(0)φ(0) = a(0)⟨δ, φ⟩.
Example 8.5. xP(1/x) = 1. Indeed, for any φ ∈ D(R), we have
⟨xP(1/x), φ⟩ = ⟨P(1/x), xφ⟩ = v.p. ∫_R xφ(x)/x dx = ∫_R φ(x) dx = ⟨1, φ⟩.
8.4. Composition with linear maps. Let f be a function of class L1loc(Rn) and let u: x ↦ Ax + b be a bijective affine map of Rn, i.e., det A ≠ 0. Given φ ∈ D(Rn), consider
⟨T_{f∘u}, φ⟩ = ∫_{Rn} f(Ay + b) φ(y) dy = |det A|^{−1} ∫_{Rn} f(x) φ(A^{−1}(x − b)) dx.
8.5. Dependence on a parameter. The continuity of distributions implies their “good” be-
haviour under an action on test-functions depending on a real parameter. We will use this property
and its variations.
Theorem 8.7. Let X and Y be domains in Rn and Rm respectively, and let ϕ ∈ C∞(X × Y). Suppose that there exists a compact subset K ⊂ X such that ϕ(x, y) = 0 for every (x, y) with x ∉ K. Then for every f ∈ D0(X) the function
F: Y ∋ y ↦ ⟨f(x), ϕ(x, y)⟩
is of class C∞(Y) and
D^α_y ⟨f(x), ϕ(x, y)⟩ = ⟨f(x), D^α_y ϕ(x, y)⟩.
Proof. (a) Let us show that F is a continuous function. Let y^k ∈ Rm be a sequence converging to y ∈ Y. We can assume that the points y^k are in a fixed closed ball B ⊂ Y. Then
‖D^β_x ϕ(x, y^k) − D^β_x ϕ(x, y)‖_{C(X)} ≤ ‖∇D^β_x ϕ(x, y)‖_{C(K×B)} |y^k − y|.
Since the supports of all functions x ↦ D^β_x ϕ(x, y^k) are contained in K, the sequence ϕ(x, y^k) converges to ϕ(x, y) as k −→ ∞ in D(X), and F(y^k) −→ F(y), k −→ ∞, by continuity of f.
(b) Next we study the partial derivatives of F. For the element ~e_j, j = 1, ..., m, of the standard basis of Rm, and every fixed y ∈ Y we have
(ϕ(x, y + t~e_j) − ϕ(x, y))/t −→ ∂ϕ(x, y)/∂y_j, t −→ 0,
in D(X). Therefore,
(1/t)(F(y + t~e_j) − F(y)) = ⟨f(x), (1/t)(ϕ(x, y + t~e_j) − ϕ(x, y))⟩ −→ ⟨f(x), ∂ϕ(x, y)/∂y_j⟩.
Hence, the partial derivative of F in y_j exists and
(∂/∂y_j)⟨f(x), ϕ(x, y)⟩ = ⟨f(x), ∂ϕ(x, y)/∂y_j⟩.
Part (a) shows that the partial derivative ∂F/∂y_j is continuous. Proceeding by induction, we obtain that F ∈ C∞(Y) and satisfies the derivation rule stated in the theorem.
9.1. Definition, basic properties, first examples. We begin with some motivation. Suppose that f is a regular function on a domain Ω in Rn, say, of class C1(Ω). Then its partial derivative (in the usual sense) ∂f/∂xj defines a distribution acting on ϕ ∈ D(Ω) by
⟨T_{∂f/∂xj}, ϕ⟩ = ∫_Ω (∂f(x)/∂xj) ϕ(x) dx = −∫_Ω f(x) (∂ϕ(x)/∂xj) dx = −⟨Tf, ∂ϕ/∂xj⟩,
where the second equality follows from the integration by parts formula. But the last expression is
defined for an arbitrary distribution f ; so it is natural to take it as a definition of the derivative of
a distribution. For f ∈ D0(Ω) and a multi-index α = (α1, ..., αn) we set
⟨Dα f, ϕ⟩ := (−1)^{|α|} ⟨f, Dα ϕ⟩,
where we used the usual notation
Dα = ∂^{|α|}/(∂x1^{α1} ··· ∂xn^{αn}), |α| = α1 + ... + αn.
Derivatives in D0 (Ω) are often called weak derivatives. It is easy to check (do it!) that weak
differentiation is a well-defined operation, that is, Dα f ∈ D0 (Ω). We note some basic properties of
this operation:
(0) If f ∈ C1(Ω), then (∂/∂xj) Tf = T_{∂f/∂xj}.
(1) The map Dα : D0 (Ω) −→ D0 (Ω) is linear and continuous. The linearity is obvious. In
order to prove continuity, consider a sequence fj −→ 0 in D0 (Ω) as j → ∞. Then for any
ϕ ∈ D(Ω),
⟨Dα fj, ϕ⟩ = (−1)^{|α|} ⟨fj, Dα ϕ⟩ −→ 0, as j → ∞.
Thus, if a sequence (fj ) converges to f in D0 (Ω), then all partial derivatives of fj converge
to the corresponding partial derivatives of f .
(2) Every distribution admits partial derivatives of all orders.
(3) For any multi-indices α and β we have
Dα+β f = Dα (Dβ f ) = Dβ (Dα f ).
(4) The Leibniz rule. If f ∈ D0(Ω) and a ∈ C∞(Ω), then
∂(af)/∂xk = a ∂f/∂xk + (∂a/∂xk) f.
where x′ = (x1, ..., xn−1). In this sense the distribution f is independent of the variable xn.
Proof. Fix a function τ0 ∈ D(I) such that ∫_R τ0 dt = 1. We lift every φ ∈ D(Ω′) to a function φ̃ ∈ D(Ω′ × I) by setting φ̃(x′, xn) = φ(x′)τ0(xn). This allows us to define a distribution f˜ ∈ D0(Ω′) by setting ⟨f˜, φ⟩ = ⟨f, φ̃⟩, φ ∈ D(Ω′).
Given ψ ∈ D(Ω′ × I) put
J(ψ)(x′) = ∫_R ψ(x′, xn) dxn.
Similarly to the proof of Theorem 9.1, for every ψ ∈ D(Ω′ × I) there exists a function ϕ ∈ D(Ω′ × I) such that
ψ(x) − J(ψ)(x′)τ0(xn) = ∂ϕ(x)/∂xn.
Then by the assumptions of the theorem, ⟨f, ∂ϕ(x)/∂xn⟩ = 0, and by the definition of the distribution f˜ we have
⟨f, ψ⟩ = ⟨f, J(ψ)(x′)τ0(xn)⟩ = ⟨f˜, J(ψ)⟩ = ⟨f˜, ∫_R ψ(x′, xn) dxn⟩.
It remains to show that
⟨f˜, ∫_R ψ(x′, xn) dxn⟩ = ∫_R ⟨f˜(x′), ψ(x′, xn)⟩ dxn.
Fix ψ ∈ D(Ω′ × I) and consider the functions F1(xn) = ⟨f˜(x′), ∫_{−∞}^{xn} ψ(x′, t) dt⟩ and F2(xn) = ∫_{−∞}^{xn} ⟨f˜(x′), ψ(x′, t)⟩ dt. Then it follows from Theorem 8.7 that F1′ = F2′. Since lim_{xn→−∞} Fj = 0, we obtain F1 ≡ F2. This concludes the proof.
Corollary 9.4. Let f ∈ D0(Ω) satisfy ∂f/∂xj = 0, j = 1, ..., n. Then f is constant.
Finally we establish a weak, but useful analogue of Corollary 9.2.
Theorem 9.5. Let f and g be continuous functions in a domain Ω ⊂ Rn. Suppose that
∂Tf/∂xn = Tg.
Then the usual partial derivative ∂f/∂xn exists at every point x ∈ Ω and is equal to g(x).
Proof. The statement is local, so without loss of generality we assume that Ω = Ω′ × I in the notation of Theorem 9.3. Fix a point c ∈ I and set
v(x) = ∫_c^{xn} g(x′, t) dt.
Then ∂(f − v)/∂xn = 0 in D0(Ω′ × I), and Theorem 9.3 gives the existence of a distribution f˜ ∈ D0(Ω′) such that f − v = f˜. Furthermore, since f − v is continuous, it follows from the construction of f˜ in the proof of Theorem 9.3 that f˜ is a continuous function in x′ (defining a regular distribution). Then the function f(x) = v(x) + f˜(x′) admits a partial derivative in xn which coincides with g.
This proves the theorem.
9.3. Support of a distribution. Distributions with compact support. Let f ∈ D0(Ω0), and let Ω ⊂ Ω0 be a subdomain. By the restriction of f to Ω we mean the distribution f|Ω acting by
⟨f|Ω, ϕ⟩ := ⟨f, ϕ⟩, ϕ ∈ D(Ω) ⊂ D(Ω0).
We say that a distribution f ∈ D0(Rn) vanishes on an open subset U ⊂ Rn if ⟨f, ϕ⟩ = 0 for any ϕ ∈ D(U), i.e., its restriction to U vanishes identically. We express this as f|U ≡ 0.
Definition 9.6. The support supp f of a distribution f ∈ D0(Rn) is the subset of Rn with the following property: x ∈ supp f if and only if for every neighbourhood U of x there exists φ ∈ D(U) (and so supp φ ⊂ U) such that ⟨f, φ⟩ ≠ 0, i.e., f does not vanish identically in any neighbourhood of x.
It follows from the definition of supp f that it is a closed subset of Rn , and so its complement is
an open (but not necessarily connected) subset of Rn . Indeed, the set Rn \supp f is formed by all
points x such that f vanishes identically in some neighbourhood of x and so it is clearly open.
Proof. Arguing by contradiction, suppose that there exists a sequence ϕm ∈ D(Ω) such that
|⟨f, ϕm⟩| > m ‖ϕm‖_{C^m(Ω)}
for every m = 1, 2, .... Set ψm = αm ϕm, where αm is a real number. Then by linearity we still have
|⟨f, ψm⟩| > m ‖ψm‖_{C^m(Ω)}.
Let αm = (‖ϕm‖_{C^m(Ω)})^{−1}/m. Then
(3) |⟨f, ψm⟩| > m ‖ψm‖_{C^m(Ω)} = 1.
On the other hand, ‖ψm‖_{C^m(Ω)} = 1/m for every m. Then for every β such that |β| ≤ m, we have
‖D^β ψm‖_{C(Ω)} ≤ 1/m.
Thus, the sequence (ψm) converges to 0 together with all partial derivatives of all orders, and the supports of ψm are contained in the compact set Ω̄ in Ω0. Then ψm −→ 0 in D(Ω0) and ⟨f, ψm⟩ −→ 0. This contradicts (3).
It follows easily from the definition of ψs that ‖ψs‖_{C^k(Ω)} −→ 0 as s −→ ∞. But then (5) implies that ⟨f(x), ψ1⟩ = 0. Therefore,
⟨f, ϕη⟩ = Σ_{|α|≤k} (⟨f, x^α η⟩/α!) D^α ϕ(0) = Σ_{|α|≤k} C_α ⟨D^α δ(x), ϕ⟩,
We will show that f admits an extension past the origin as a distribution, i.e., there exists f˜ ∈ D0(Rn) such that
⟨f˜, ϕ⟩ = ∫_{Rn} f(x) ϕ(x) dx
for every ϕ ∈ D(Rn).
First of all let us recall the general Taylor formula: let a ∈ Rn and let ψ be a function of class C∞ in a neighbourhood of a. Then for every integer k ≥ 0 there exists a neighbourhood U of a such that for x ∈ U we have
ψ(x) = Σ_{0≤|α|≤k} (1/α!) D^α ψ(a) (x − a)^α + Σ_{|α|=k+1} ((k+1)/α!) ∫_0^1 (1 − t)^k D^α ψ(tx + (1 − t)a) (x − a)^α dt.
As usual we use here the notation α! = α1! ··· αn! and x^α = x1^{α1} ··· xn^{αn}. We define the distribution f˜ by the formula
⟨f˜, ϕ⟩ = I1 + I2,
where
I1 = ∫_{|x|≥1} f(x) ϕ(x) dx,
I2 = ∫_{|x|≤1} f(x) ( ϕ(x) − Σ_{|α|≤m−1} (1/α!) D^α ϕ(0) x^α ) dx.
From this and Theorem 10.2 we conclude that f˜ is a well-defined distribution in D0 (Rn ).
Finally, consider an example of a distribution of infinite order.
It follows from Theorem 8.5 that f ∈ D0(R). We leave it to the reader to prove, arguing by contradiction, that f is not of finite order.
10.2. Regularization and convolution with test-functions. Recall that the convolution f ∗ g of two functions f, g ∈ L2(Rn) is defined by
(7) f ∗ g(x) = ∫_{Rn} f(y) g(x − y) dy = ∫_{Rn} f(x − y) g(y) dy.
This makes natural the following general definition.
Definition 10.5. A convolution of a distribution f ∈ D0(Rn) and a test function ϕ ∈ D(Rn) is defined by
(8) f ∗ ϕ(x) = ⟨f(y), ϕ(x − y)⟩.
Note the following: for every x ∈ Rn the function y ↦ ϕ(x − y) is a test-function; on the right-hand side of (8) we apply the distribution f to this function, which is emphasized by the notation f(y). Thus, f ∗ ϕ is defined as a usual function on Rn.
Proposition 10.6. We have f ∗ ϕ ∈ C∞(Rn) and
(9) Dα(f ∗ ϕ) = f ∗ Dα ϕ = (Dα f) ∗ ϕ.
Proof. The regularity of f ∗ ϕ and the first equality of (9) follow from Theorem 8.7. Let us prove the second equality in (9). We have
((∂f/∂xj) ∗ ϕ)(x) = ⟨(∂f/∂yj)(y), ϕ(x − y)⟩ = −⟨f(y), (∂/∂yj)(ϕ(x − y))⟩
= ⟨f(y), (∂ϕ/∂xj)(x − y)⟩ = (∂/∂xj)(f ∗ ϕ).
The rest of the proof is done by induction.
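Identity (9) can be illustrated numerically with f = θ, the Heaviside function: θ ∗ ϕ(x) = ∫_{−∞}^x ϕ(t) dt, and its derivative should reproduce ϕ = δ ∗ ϕ = (Dθ) ∗ ϕ. A sketch (NumPy; the grid and the particular ϕ are our own choices):

```python
import numpy as np

phi = lambda x: np.exp(-x**2)
x = np.linspace(-10.0, 10.0, 200001)
dx = x[1] - x[0]

conv = np.cumsum(phi(x)) * dx        # theta * phi (x) = \int_{-inf}^{x} phi(t) dt
deriv = np.gradient(conv, dx)        # D(theta * phi)

i = np.searchsorted(x, 0.7)          # compare at an interior point
print(deriv[i], phi(x[i]))           # the two values agree closely
```

Differentiating the convolution recovers ϕ itself, in accordance with D(θ ∗ ϕ) = (Dθ) ∗ ϕ = δ ∗ ϕ = ϕ.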
A very important special case arises if we take the bump function ωε as ϕ in the definition of
convolution. This leads to
Definition 10.7. The convolution fε := f ∗ ωε is called the regularization of a distribution f .
Proposition 10.8. We have
(i) fε ∈ C ∞ (Rn ).
(ii) (Dα f )ε = Dα (fε ) .
(iii) If f ∈ C(Rn ) then fε −→ f, ε −→ 0+ in C(Ω) for every bounded subset Ω of Rn .
(iv) If ϕ ∈ D(Rn ) then ϕε ∈ D(Rn ) and ϕε −→ ϕ in D(Rn ) as ε −→ 0+.
(v) if f ∈ D0 (Rn ) then fε −→ f in D0 (Rn ) as ε −→ 0+.
Proof. Parts (i) and (ii) follow from Proposition 10.6. Part (iii) is established in Proposition 7.6,
so it remains to show (iv) and (v). If ϕ ∈ D(Rn ), then its support is compact, say, ϕ(x) = 0 when
|x| ≥ A for some A > 0. Then the formula for the regularization of functions shows that ϕε (x) = 0
for |x| ≥ A + ε so supp ϕε ⊂ K = {x ∈ Rn : |x| ≤ A + 1} for ε < 1. It follows now from (iii) and
(ii) that ϕε converges to ϕ as ε −→ 0+ uniformly on K together with all partial derivatives of any
order. Hence ϕε −→ ϕ, ε −→ 0+ in D(Rn ) and we obtain (iv).
To prove (v), we view fε = ⟨f(y), ωε(x − y)⟩ as a distribution acting on every ψ ∈ D(Rn) by
⟨fε, ψ⟩ = ∫_{Rn} ⟨f(y), ωε(x − y)⟩ ψ(x) dx.
It follows from Theorem 8.7 that
(10) ∫_{Rn} ⟨f(y), ωε(x − y)⟩ ψ(x) dx = ⟨f(y), ∫_{Rn} ωε(x − y) ψ(x) dx⟩ = ⟨f, ψε⟩.
To see this we consider for simplicity of notation the case n = 1. Let
F(t) = ∫_{−∞}^t ⟨f(y), ωε(x − y)⟩ ψ(x) dx,
and
G(t) = ⟨f(y), ∫_{−∞}^t ωε(x − y) ψ(x) dx⟩.
Then by Theorem 8.7, F′(t) = G′(t) = ⟨f(y), ωε(t − y)⟩ψ(t). Since F(−∞) = G(−∞) = 0, we have F(t) = G(t) for all t, and we pass to the limit as t → +∞. This proves (10). Then by (iv),
⟨fε, ψ⟩ = ⟨f, ψε⟩ −→ ⟨f, ψ⟩ as ε −→ 0+.
This concludes the proof.
As an application we give another proof of Corollary 9.4: if f ∈ D0(Rn) is such that ∂f/∂xj = 0 in D0(Rn) for j = 1, ..., n, then f = const.
Proof of Corollary 9.4. For every ε > 0 we have 0 = (∂f/∂xj)ε = ∂(fε)/∂xj. Since fε is a usual function of class C∞, we conclude that fε = C(ε). Then,
⟨f, ϕ⟩ = lim_{ε→0+} ⟨fε, ϕ⟩ = lim_{ε→0+} C(ε) ∫ ϕ(x) dx for any ϕ ∈ D(Rn).
In particular, set ϕ = ω(x), so that ∫ ω(x) dx = 1. We obtain that C = lim_{ε→0+} C(ε) exists, and f = C.
10.3. Convolution of distributions. Let f and g be functions in L2(Rn). For a moment assume that f ∗ g is in L1loc(Rn). Then it defines a regular distribution acting on ϕ ∈ D(Rn) by
⟨f ∗ g(x), ϕ(x)⟩ = ∫_{Rn} f ∗ g(x) ϕ(x) dx = ∫ ( ∫ f(y) g(x − y) dy ) ϕ(x) dx.
By Fubini's theorem,
∫ ( ∫ f(y) g(x − y) dy ) ϕ(x) dx = ∫ f(y) ( ∫ g(x − y) ϕ(x) dx ) dy
= ∫ f(y) ( ∫ g(t) ϕ(t + y) dt ) dy = ∫ f(y) ⟨g(t), ϕ(t + y)⟩ dy,
so that
(11) ⟨f ∗ g, ϕ⟩ = ⟨f(y), ⟨g(t), ϕ(t + y)⟩⟩.
Therefore, in the general case of arbitrary distributions f, g ∈ D0 (Rn ) it is natural to take equality
(11) as a definition of the convolution f ∗ g. However, the right-hand side of equality (11) is not
defined for arbitrary distributions f and g since the function y 7→ hg(t), ϕ(t + y)i is just of class
C ∞ (Rn ) and in general need not have compact support. The support is clearly compact if the
distribution g itself has compact support. So in this case the convolution is well-defined. Similarly,
if f has compact support then it acts on any function from C∞(Rn) as previously discussed, so the right-hand side of equality (11) is also well-defined. We summarize this in the following.
Proposition 10.9. The convolution f ∗ g of two distributions f, g ∈ D0 (Rn ) is a distribution
correctly defined by the equality (11) if at least one of the distributions f and g has compact support.
Example 10.5. For any distribution f ∈ D0(Rn) we have
⟨f ∗ δ, ϕ⟩ = ⟨f(y), ⟨δ(t), ϕ(t + y)⟩⟩ = ⟨f(y), ϕ(y)⟩,
that is, f ∗ δ = f. Furthermore,
⟨δ ∗ f, ϕ⟩ = ⟨δ(y), ⟨f(t), ϕ(t + y)⟩⟩ = ⟨f(t), ϕ(t)⟩,
so that δ ∗ f = f. We obtain the following fundamental identity
f ∗ δ = δ ∗ f = f
for any f ∈ D0(Rn).
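A discrete analogue of this identity is worth keeping in mind: convolving a finite sequence with the Kronecker delta (unit mass at the origin) returns the sequence unchanged. A NumPy sketch with our own toy data:

```python
import numpy as np

f = np.array([3.0, -1.0, 4.0, 1.0, 5.0])
delta = np.array([1.0])              # discrete unit mass at the origin
out = np.convolve(f, delta)          # discrete analogue of f * delta
print(out)                           # [ 3. -1.  4.  1.  5.]
```

The delta thus plays the role of the identity element for convolution, in the discrete setting just as in D0(Rn).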
We conclude this section with some algebraic properties of convolution.
Thus u = E∗f defines a solution of (2). In order to prove the uniqueness in the class of distributions,
admitting the convolution with E, it suffices to prove that the homogeneous equation
P (D)v = 0
has a unique solution in this class. But this holds since
v = δ ∗ v = (P (D)E) ∗ v = E ∗ (P (D)v) = E ∗ 0 = 0.
This proves the theorem.
Example 11.1. Let P(D) = d²/dx² on R. To solve the equation
(3) P(D)u = χ[0,1]
we first find a fundamental solution of the operator P(D). If E satisfies d²E/dx² = δ, then by Example 9.1 we have dE/dx = θ + c1. For convenience we may take c1 = −1/2. Then E = |x|/2 + c2. Take c2 = 0, so that E = |x|/2 is a fundamental solution. To find a generalized solution of (3) we compute, according to Theorem 11.2, the convolution of the fundamental solution and the right-hand side of (3). Since one of the functions has compact support, the convolution is well-defined, so we have
E ∗ χ[0,1](x) = ∫_R (1/2)|y| χ[0,1](x − y) dy = (1/2) ∫_R |x − t| χ[0,1](t) dt = (1/2) ∫_0^1 |x − t| dt.
This integral is a well-defined C1-smooth function on R given by
u(x) = −x/2 + 1/4 if x ≤ 0,
u(x) = x²/2 − x/2 + 1/4 if 0 < x < 1,
u(x) = x/2 − 1/4 if x ≥ 1.
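One can check the piecewise formula for u directly: a centered second difference of u should reproduce χ[0,1] away from the matching points 0 and 1. A sketch (NumPy; the sample points and the step are our own choices):

```python
import numpy as np

def u(x):
    """The C^1 solution of u'' = chi_[0,1] computed in Example 11.1."""
    return np.where(x <= 0, -x/2 + 0.25,
           np.where(x < 1, x**2/2 - x/2 + 0.25, x/2 - 0.25))

h = 1e-3
x = np.array([-0.7, 0.3, 0.6, 1.8])                 # away from the kinks at 0 and 1
d2 = (u(x + h) - 2*u(x) + u(x - h)) / h**2          # centered second difference
print(d2)   # ≈ [0, 1, 1, 0] = chi_[0,1] at the sample points
```

Since u is linear outside [0, 1] and quadratic inside, the centered difference is exact there up to rounding, so the output reproduces the indicator function.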
In the next section we compute fundamental solutions of the classical linear operators in Rn .
Proposition 11.4 (Hörmander's inequality). Let P(D) be a nonzero linear differential operator with constant coefficients of order m given by (4). Then for every bounded domain Ω ⊂ Rn there exists a constant C > 0 such that for every φ ∈ D(Ω) we have
‖P(D)φ‖ ≥ C‖φ‖.
One can take C = |P|_m K_{m,Ω}, where
|P|_m = max_{|α|=m} |aα|,
and K_{m,Ω} depends only on m and the quantity sup{|x| : x ∈ Ω}.
Proof. To illustrate the idea of the proof, first consider the case n = 1, Ω = (0, 1), and P(D) = d/dx. We need to show that there exists some C > 0 such that ‖φ′‖ ≥ C‖φ‖ for all φ ∈ D((0, 1)). We have
⟨(xφ)′, φ⟩ = ⟨xφ′, φ⟩ + ⟨φ, φ⟩.
Using integration by parts, ⟨(xφ)′, φ⟩ = −⟨xφ, φ′⟩, and so ⟨φ, φ⟩ = −⟨xφ′, φ⟩ − ⟨xφ, φ′⟩. Since |x| < 1, we get ‖φ‖² ≤ 2‖φ′‖ ‖φ‖ by the Hölder inequality (Thm 4.2). Hence, ‖φ′‖ ≥ (1/2)‖φ‖.
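The inequality ‖φ′‖ ≥ (1/2)‖φ‖ on (0, 1) can be sanity-checked numerically for a concrete bump. A sketch (NumPy; the particular φ is our own choice, and np.gradient is used as a finite-difference approximation of φ′):

```python
import numpy as np

x = np.linspace(0.0, 1.0, 100001)
dx = x[1] - x[0]

phi = np.zeros_like(x)
m = (x > 0) & (x < 1)
phi[m] = np.exp(-1.0 / (x[m] * (1 - x[m])))   # smooth bump supported in (0, 1)

dphi = np.gradient(phi, dx)                    # finite-difference phi'
norm = np.sqrt(np.sum(phi**2) * dx)            # L^2 norms via Riemann sums
dnorm = np.sqrt(np.sum(dphi**2) * dx)
print(dnorm / norm)                            # comfortably larger than 1/2
```

In fact, for this interval the ratio exceeds π (the sharp Poincaré constant), so the bound 1/2 from the proof holds with plenty of room.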
The general case is proved by induction on the degree of P . Define a linear differential operator
with constant coefficients Pj (D) by the following identity
P (D)(xj φ) = xj P (D)φ + Pj (D)φ.
The operator Pj (D) is zero iff P (D) does not involve any differentiation with respect to xj . If it is
nonzero, then Pj (D) is of order at most m − 1. Let A = supx∈Ω |x|. By induction on m, we will
show that for every φ ∈ D(Ω),
(5) ||Pj (D)φ|| ≤ 2mA ||P (D)φ||.
Observe that (5) and the definition of Pj yield
(6) ||P (D)(xj φ)|| ≤ (2m + 1)A ||P (D)φ||.
Since differential operators with constant coefficients commute, we have for all φ ∈ D(Ω),
‖P(D)φ‖² = ⟨P(D)φ, P(D)φ⟩ = ⟨φ, P*(D)P(D)φ⟩ = ⟨φ, P(D)P*(D)φ⟩ = ⟨P*(D)φ, P*(D)φ⟩ = ‖P*(D)φ‖².
The inequality (5) is trivial for m = 0, since then Pj(D) = 0. Assuming that (5) is verified for operators of order m − 1, we compute ⟨P(D)(xjφ), Pj(D)φ⟩ in two different ways. From the definition of Pj(D) we have
⟨P(D)(xjφ), Pj(D)φ⟩ = ⟨xj P(D)φ, Pj(D)φ⟩ + ‖Pj(D)φ‖².
By integration by parts (i.e., using the definition of the adjoint) and using commutativity of P*(D) and Pj(D), we obtain
⟨P(D)(xjφ), Pj(D)φ⟩ = ⟨Pj*(D)(xjφ), P*(D)φ⟩.
Therefore,
(7) ‖Pj(D)φ‖² = ⟨Pj*(D)(xjφ), P*(D)φ⟩ − ⟨xj P(D)φ, Pj(D)φ⟩.
By the induction hypothesis, equation (6) holds for all operators of order m−1, which when applied
to Pj∗ (D) yields
||Pj∗ (D)(xj φ)|| ≤ (2m − 1)A ||Pj (D)φ||.
And, since
|hxj P (D)φ, Pj (D)φi| ≤ A||P (D)φ|| ||Pj (D)φ||,
we obtain from (7) that
||Pj (D)φ||2 ≤ 2mA ||Pj (D)φ|| ||P (D)φ||,
which proves (5). If P (D) is an operator of order m ≥ 1, there exists j ∈ {1, . . . , n} such that
Pj (D) is of order m − 1, and |Pj |m−1 ≥ |P |m . Thus the proposition follows from (5) by induction
on m.
Corollary 11.5. If Ω is a bounded domain in Rn , then for every g ∈ L2 (Ω) there exists u ∈ L2 (Ω)
such that P (D)u = g.
Proof. This follows from the inequality ‖P*(D)φ‖ ≥ C‖φ‖, φ ∈ D(Ω). Indeed, P(D)u = g means that for all φ ∈ D(Ω),
(8) ⟨g, φ⟩ = ⟨u, P*(D)φ⟩.
Let
E = {ψ ∈ D(Ω) : ψ = P*(D)φ for some φ ∈ D(Ω)}.
Consider the (anti)linear functional l: E → C given by
l(ψ) = ⟨g, φ⟩, where ψ = P*(D)φ.
Then using Hörmander's inequality we have
‖l‖ = sup_{‖ψ‖=1} |⟨g, φ⟩| ≤ ‖g‖ sup_{‖ψ‖=1} ‖φ‖ ≤ (‖g‖/C) sup_{‖ψ‖=1} ‖P*(D)φ‖ = ‖g‖/C.
This shows that l is a bounded linear functional on E with the L²-norm. Therefore, l can be extended to Ē, the closure of E in L²(Ω). Then the Riesz representation theorem (Theorem 4.11) gives the existence of u ∈ Ē such that l(ψ) = ⟨u, ψ⟩. This implies equation (8).
We now wish to extend the above result to L2loc (Ω) functions. For this we first prove the following
Proposition 11.6. There exists C′ > 0 such that for all η ∈ R and φ ∈ D(Ω), we have
∫_Ω e^{ηx1} |P(D)φ|² ≥ C′ ∫_Ω e^{ηx1} |φ|².
Note that C′ is independent of η.
Proof. Apply Hörmander’s inequality to Ψ = e(η/2)x1 φ and operator Q(D) defined by
Q(D)(Ψ) = e(η/2)x1 P (D)[e−(η/2)x1 Ψ],
which is indeed a constant coefficient operator of the same degree m as P (D).
Corollary 11.7. Let φ ∈ D(Rn ) or more generally φ ∈ L2 (Rn ) with compact support. If P (D)φ is
supported in the ball B(0, r), then so is φ.
Proof. By letting η → +∞ in Proposition 11.6, one can immediately verify that if P (D)φ = 0
in the half-space {x1 > 0}, then φ = 0 there. From this, using translations and rotations, the
corollary can be verified in the case of a smooth φ. In the nonsmooth case, for ε < 1 consider the regularization φε = φ ∗ ωε ∈ D(Rn). Then P(D)φε = P(D)φ ∗ ωε is supported in B(0, r + ε) and φε → φ in L² as ε → 0 by Proposition 10.8. This reduces the problem to the smooth case.
Proposition 11.8. Let 0 < r < r′ < R. If v ∈ L²(B(0, r′)) satisfies P(D)v = 0 on B(0, r′), then there exists a sequence (vj) ⊂ L²(B(0, R)) such that P(D)vj = 0 on B(0, R) and vj → v in L²(B(0, r)) as j → ∞.
Proof. After regularization we can assume that v is smooth, possibly shrinking r′ slightly. It suffices to show that any continuous linear functional that vanishes on the space L²(B(0, R)) ∩ {α : P(D)α = 0} also vanishes at v. In other words (using the Riesz representation theorem), we have to show that if g ∈ L²(B(0, r)) satisfies ⟨α, g⟩_{B(0,r)} = 0 for all α ∈ L²(B(0, R)) with P(D)α = 0, then ⟨v, g⟩_{B(0,r)} = 0.
Claim. There exists w ∈ L²(B(0, R)) such that for all φ ∈ D(Rn),
⟨φ, g⟩_{B(0,r)} = ⟨P(D)φ, w⟩_{B(0,R)}.
For the proof of the claim, we need to find C > 0 such that
|⟨φ, g⟩_{B(0,r)}| ≤ C‖P(D)φ‖_{B(0,R)}.
Notice that if P(D)φ = 0, then ⟨φ, g⟩ = 0. If P(D)φ ≠ 0, then by Corollary 11.5 we can find Ψ ∈ L²(B(0, R)) so that P(D)Ψ = P(D)φ and ‖Ψ‖_{B(0,R)} ≤ C1‖P(D)φ‖_{B(0,R)} for some C1 > 0. Then, since P(D)(φ − Ψ) = 0,
⟨φ, g⟩_{B(0,r)} = ⟨φ − Ψ, g⟩_{B(0,r)} + ⟨Ψ, g⟩_{B(0,r)} = ⟨Ψ, g⟩_{B(0,r)}.
Hence, |⟨φ, g⟩_{B(0,r)}| ≤ C‖P(D)φ‖_{B(0,R)} with C = C1‖g‖, which proves the claim.
Pick w as given by the claim. Extend g and w on Rn to g̃ and w̃ by setting g̃ = 0 on Rn \ B(0, r)
and w̃ = 0 on Rn \ B(0, R). We then have g̃ = P ∗ (D)w̃. Since w̃ has compact support, and P ∗ (D)w̃
is supported in B(0, r), we conclude from Corollary 11.7 that w = 0 on B(0, R) \ B(0, r).
To complete the proof of the proposition take v as at the beginning of the proof, and extend it to a smooth, compactly supported function on Rn (but no longer satisfying P(D)v = 0 off B(0, r)). One has
⟨v, g⟩_{B(0,r)} = ⟨P(D)v, w⟩_{B(0,R)} = ⟨P(D)v, w⟩_{B(0,r)} = 0.
We now can prove the following
Proposition 11.9. Let P (D) be a nonzero linear differential operator on Rn with constant coeffi-
cients. Then for every g ∈ L2loc (Rn ) there exists u ∈ L2loc (Rn ) such that P (D)u = g.
Proof. By Corollary 11.5 there exists u1 ∈ L2 (B(0, 2)) so that P (D)u1 = g on B(0, 2). Then
inductively, assuming up has been chosen in L2 (B(0, p + 1)) so that P (D)up = g, one chooses
up+1 in L2 (B(0, p + 2)) in the following way. Let w be an arbitrary solution of P (D)w = g, in
L2 (B(0, p + 2)). On B(0, p + 1) one has P (D)(up − w) = 0. By Proposition 11.8 there exists
v ∈ L2 (B(0, p + 2)) such that P (D)v = 0, and ||v − (up − w)||B(0,p) ≤ 1/2p . Set up+1 = v + w.
Then P (D)up+1 = g on B(0, p + 2), and ||up+1 − up ||B(0,p) ≤ 1/2p . The sequence (up ) is obviously
convergent in L2loc (Rn ), and its limit satisfies P (D)u = g.
Proof of the Malgrange-Ehrenpreis Theorem. Let H be the function (the product of the Heaviside functions on R) defined on Rn by
H(x1, ..., xn) = 1 if xj > 0 for j = 1, ..., n, and H(x1, ..., xn) = 0 otherwise.
Then
∂ⁿH/(∂x1 ··· ∂xn) = δ0.
Since H ∈ L2loc(Rn), by the previous proposition there exists u ∈ L2loc(Rn) so that P(D)u = H. Set
E = ∂ⁿu/(∂x1 ··· ∂xn).
Then
P(D)E = P(D) ∂ⁿu/(∂x1 ··· ∂xn) = ∂ⁿ/(∂x1 ··· ∂xn)(P(D)u) = ∂ⁿH/(∂x1 ··· ∂xn) = δ0.
v: 2019-04-08
Keeping this in mind, we pass to the integration by parts with the Cauchy-Riemann operator. For two complex-valued functions u, v ∈ C1(Ω) we have
∫_Ω (∂u/∂z̄) v dxdy = (1/2)∫_Ω (∂u/∂x) v dxdy + (i/2)∫_Ω (∂u/∂y) v dxdy
= (1/2)∫_∂Ω uv(~n, e1) dS − (1/2)∫_Ω u (∂v/∂x) dxdy + (i/2)∫_∂Ω uv(~n, e2) dS − (i/2)∫_Ω u (∂v/∂y) dxdy
= (1/2)∫_∂Ω uv[(~n, e1) + i(~n, e2)] dS − ∫_Ω u (∂v/∂z̄) dxdy,
so that
(1) ∫_Ω (∂u/∂z̄) v dxdy = (−i/2)∫_∂Ω uv dz − ∫_Ω u (∂v/∂z̄) dxdy.
For ε > 0 denote by A(ε, R) the annulus B(0, R)\B(0, ε). Denote also by Cε the circle {|z| = ε}. Then ∂(1/z)/∂z̄ = 0 on A(ε, R), and using (1) with u = ϕ and v = 1/z we have
⟨(∂/∂z̄)(1/z), ϕ⟩ = −⟨1/z, ∂ϕ/∂z̄⟩ = −lim_{ε→0+} ∫_{Ωε} (1/z)(∂ϕ/∂z̄) dxdy = lim_{ε→0+} (−i/2) ∫_{Cε} (ϕ/z) dz.
Here the integral over the circle Cε is taken with positive orientation with respect to the disc B(0, ε). Writing
∫_{Cε} (ϕ/z) dz = ∫_{Cε} (ϕ(z) − ϕ(0))/z dz + ϕ(0) ∫_{Cε} dz/z,
we easily see that the first integral tends to 0 (use Taylor's formula) and the second one tends to 2πiϕ(0). Hence,
lim_{ε→0+} (−i/2) ∫_{Cε} (ϕ/z) dz = πϕ(0),
that is, (∂/∂z̄)(1/z) = πδ.
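The limit above can be checked numerically by evaluating (−i/2)∮_{Cε} ϕ(z)/z dz on a small circle. A sketch (NumPy; the smooth function ϕ and the radius ε are our own choices):

```python
import numpy as np

phi = lambda z: np.exp(-np.abs(z)**2) * (1 + z.real)   # smooth, with phi(0) = 1

def circle_pairing(eps, n=20000):
    """(-i/2) * \\oint_{C_eps} phi(z)/z dz, parametrized by z = eps*e^{it}."""
    t = np.arange(n) * (2.0 * np.pi / n)
    z = eps * np.exp(1j * t)
    integral = np.sum(phi(z) / z * (1j * z)) * (2.0 * np.pi / n)   # dz = i z dt
    return -0.5j * integral

val = circle_pairing(0.01)
print(val, np.pi)     # val approaches pi * phi(0) = pi as eps -> 0+
```

For periodic integrands the plain Riemann sum over the circle is highly accurate, and the computed pairing matches πϕ(0), consistent with ∂/∂z̄(1/z) = πδ.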
Using Lemma 12.1 we can easily deduce an integral representation involving the Cauchy-Riemann operator. Fix $z \in \Omega$. Denote by $\Omega_\varepsilon$ the domain $\Omega\setminus \overline{B(z, \varepsilon)}$ and by $C(z, \varepsilon)$ the circle $\{\zeta : |\zeta - z| = \varepsilon\}$. Let a complex function $f$ be of class $C^1(\overline{\Omega})$. We set $\zeta = \xi + i\eta$. Then, using (1) and (2), we have
$$\frac1\pi\int_\Omega\frac{\partial f(\zeta)}{\partial\bar\zeta}\,\frac{1}{\zeta - z}\,d\xi d\eta
= \lim_{\varepsilon\to0^+}\frac1\pi\int_{\Omega_\varepsilon}\frac{\partial f(\zeta)}{\partial\bar\zeta}\,\frac{1}{\zeta - z}\,d\xi d\eta$$
$$= \frac1\pi\lim_{\varepsilon\to0^+}\Bigl(-\frac i2\int_{\partial\Omega}\frac{f(\zeta)}{\zeta - z}\,d\zeta + \frac i2\int_{C(z,\varepsilon)}\frac{f(\zeta)}{\zeta - z}\,d\zeta\Bigr)
= \frac{-i}{2\pi}\int_{\partial\Omega}\frac{f(\zeta)}{\zeta - z}\,d\zeta - f(z).$$
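Rearranged, this is the Cauchy-Pompeiu formula: $f(z) = \frac{-i}{2\pi}\int_{\partial\Omega}\frac{f(\zeta)}{\zeta-z}d\zeta - \frac1\pi\int_\Omega\frac{\partial f/\partial\bar\zeta}{\zeta-z}\,d\xi d\eta$. A numerical spot check, under assumptions of my choosing: $\Omega$ the unit disc and $f(\zeta) = \bar\zeta$, so $\partial f/\partial\bar\zeta = 1$. Polar coordinates centred at $z$ remove the integrable singularity, since $1/(\zeta - z) = e^{-i\theta}/\rho$ and $dA = \rho\,d\rho\,d\theta$, so the area integral collapses to a one-dimensional integral in $\theta$:

```python
import cmath
import math

# Cauchy-Pompeiu check for f(zeta) = conj(zeta) on the unit disc:
#   f(z) = (-i/2pi) * boundary integral - (1/pi) * area integral.
def pompeiu_rhs(z, n=4000):
    # boundary term: integral over |zeta| = 1 of conj(zeta)/(zeta - z) dzeta
    b = 0j
    for k in range(n):
        t = 2 * math.pi * (k + 0.5) / n
        zeta = cmath.exp(1j * t)
        dzeta = 1j * zeta * (2 * math.pi / n)
        b += zeta.conjugate() / (zeta - z) * dzeta
    boundary = (-1j / (2 * math.pi)) * b
    # area term in polar coordinates centred at z: the rho-integration is
    # exact, leaving (1/pi) * integral of e^{-i theta} * rho_max(theta)
    a = 0j
    for k in range(n):
        t = 2 * math.pi * (k + 0.5) / n
        c = (z.conjugate() * cmath.exp(1j * t)).real
        rho_max = -c + math.sqrt(c * c + 1 - abs(z) ** 2)  # hits |zeta| = 1
        a += cmath.exp(-1j * t) * rho_max * (2 * math.pi / n)
    area = a / math.pi
    return boundary - area

z = 0.3 - 0.4j
print(pompeiu_rhs(z), z.conjugate())  # the two should agree
```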
Let us prove the formula for the partial derivative of $f$ in the sense of distributions:
$$\frac{\partial f}{\partial x_k} = T_{\partial f/\partial x_k} + [f]_{\partial\Omega}\,(e_k, \vec n)\,\delta_{\partial\Omega},\tag{6}$$
where the derivative on the left is understood in $\mathcal D'(\mathbb{R}^n)$ and $T_{\partial f/\partial x_k}$ is the regular distribution defined by the classical derivative of $f$ on $\mathbb{R}^n\setminus\partial\Omega$. We have
$$\Bigl\langle\frac{\partial f}{\partial x_k}, \varphi\Bigr\rangle = -\Bigl\langle f, \frac{\partial\varphi}{\partial x_k}\Bigr\rangle = -\int_{\mathbb{R}^n} f(x)\frac{\partial\varphi(x)}{\partial x_k}\,dx.$$
We decompose
$$\int_{\mathbb{R}^n} f(x)\frac{\partial\varphi(x)}{\partial x_k}\,dx = \int_\Omega f(x)\frac{\partial\varphi(x)}{\partial x_k}\,dx + \int_{\mathbb{R}^n\setminus\Omega} f(x)\frac{\partial\varphi(x)}{\partial x_k}\,dx
and apply to each integral on the right the integration by parts formula. Then
$$\int_\Omega f(x)\frac{\partial\varphi(x)}{\partial x_k}\,dx = -\int_\Omega \varphi(x)\frac{\partial f(x)}{\partial x_k}\,dx + \int_{\partial\Omega} f_-(x)\varphi(x)(e_k, \vec n)\,dS,$$
and
$$\int_{\mathbb{R}^n\setminus\Omega} f(x)\frac{\partial\varphi(x)}{\partial x_k}\,dx = -\int_{\mathbb{R}^n\setminus\Omega} \varphi(x)\frac{\partial f(x)}{\partial x_k}\,dx - \int_{\partial\Omega} f_+(x)\varphi(x)(e_k, \vec n)\,dS$$
(the minus sign before the last integral appears because $\vec n$ is the exterior normal for $\Omega$ and so it is the interior normal for $\mathbb{R}^n\setminus\Omega$).
Therefore, with $[f]_{\partial\Omega} = f_+ - f_-$,
$$\int_{\mathbb{R}^n} f(x)\frac{\partial\varphi(x)}{\partial x_k}\,dx = -\int_{\mathbb{R}^n} \varphi(x)\frac{\partial f(x)}{\partial x_k}\,dx - \int_{\partial\Omega} [f]_{\partial\Omega}(x)\,(e_k, \vec n)\,\varphi(x)\,dS,$$
and
$$\Bigl\langle\frac{\partial f}{\partial x_k}, \varphi\Bigr\rangle = \int_{\mathbb{R}^n} \varphi(x)\frac{\partial f(x)}{\partial x_k}\,dx + \int_{\partial\Omega} [f]_{\partial\Omega}(x)\,(e_k, \vec n)\,\varphi(x)\,dS,$$
which proves (6).
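The jump formula is easy to test in one dimension. A sketch under assumptions of my choosing: $\Omega = (0, \infty) \subset \mathbb{R}$ and $f(x) = x^2 + H(x)$, whose distributional derivative should be $2x + \delta_0$ (here the jump across the single boundary point $x = 0$ contributes the delta term):

```python
import math

# 1-D illustration of the jump formula: f(x) = x^2 + 1 for x > 0 and
# f(x) = x^2 for x < 0, so f' = 2x + delta_0 in D'(R).
def f(x):
    return x * x + (1.0 if x > 0 else 0.0)

def phi(x):
    return math.exp(-x * x)

def dphi(x):
    return -2.0 * x * math.exp(-x * x)

def pairing_lhs(n=100000, R=8.0):
    # <f', phi> = -integral of f(x) * phi'(x) dx (midpoint rule on (-R, R))
    h = 2 * R / n
    return -sum(f(-R + (i + 0.5) * h) * dphi(-R + (i + 0.5) * h)
                for i in range(n)) * h

def pairing_rhs(n=100000, R=8.0):
    # <T_{2x} + delta_0, phi> = integral of 2x * phi(x) dx + phi(0)
    h = 2 * R / n
    return sum(2 * (-R + (i + 0.5) * h) * phi(-R + (i + 0.5) * h)
               for i in range(n)) * h + phi(0.0)

print(pairing_lhs(), pairing_rhs())  # both ≈ 1.0
```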
12.2. Laplace operator. In this section we construct a fundamental solution of the Laplace operator
$$\Delta = \frac{\partial^2}{\partial x_1^2} + \dots + \frac{\partial^2}{\partial x_n^2}.$$
(a) First we suppose that $n = 2$ and prove that
$$\Delta\ln|x| = \frac{\partial^2\ln|x|}{\partial x_1^2} + \frac{\partial^2\ln|x|}{\partial x_2^2} = 2\pi\delta(x), \qquad x \in \mathbb{R}^2.\tag{7}$$
First of all, observe that the function $\ln|x|$ is of class $L^1_{loc}(\mathbb{R}^2)$ (to see this it suffices to pass to polar coordinates), and so it can be viewed as a distribution. Let $\varphi \in \mathcal D(\mathbb{R}^2)$. Since $\operatorname{supp}\varphi$ is a compact set, there exists $R > 0$ such that $\varphi(x) = 0$ for $|x| \ge R/2$. We have
$$\langle\Delta\ln|x|, \varphi\rangle = \langle\ln|x|, \Delta\varphi\rangle = \int_{\mathbb{R}^2}\ln|x|\,\Delta\varphi(x)\,dx = \int_{|x|\le R}\ln|x|\,\Delta\varphi(x)\,dx.$$
Denote by $A(\varepsilon, R) = \{x : \varepsilon < |x| < R\}$ the annulus, where $\varepsilon > 0$ is small enough. Then by the Lebesgue convergence theorem,
$$\int_{|x|\le R}\ln|x|\,\Delta\varphi(x)\,dx = \lim_{\varepsilon\to0^+}\int_{A(\varepsilon,R)}\ln|x|\,\Delta\varphi(x)\,dx.$$
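The constant $2\pi$ in (7) can be checked numerically: for a radial test function $\varphi(r)$ one has $\langle\ln|x|, \Delta\varphi\rangle = 2\pi\int_0^\infty \ln r\,(\varphi'' + \varphi'/r)\,r\,dr$, which should equal $2\pi\varphi(0)$. A sketch with the (rapidly decaying rather than compactly supported) trial function $\varphi(x) = e^{-|x|^2}$:

```python
import math

# Radial check of <Delta ln|x|, phi> = 2*pi*phi(0) in R^2 for the trial
# function phi(x) = exp(-|x|^2), for which phi(0) = 1.
def integrand(r):
    e = math.exp(-r * r)
    d1 = -2.0 * r * e             # phi'(r)
    d2 = (4.0 * r * r - 2.0) * e  # phi''(r)
    return math.log(r) * (d2 + d1 / r) * r

def pairing(n=200000, R=8.0):
    # midpoint rule; the integrand behaves like -4*r*log(r) near r = 0
    h = R / n
    return 2 * math.pi * sum(integrand((i + 0.5) * h) for i in range(n)) * h

print(pairing(), 2 * math.pi)  # the two agree
```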
(b) Next, for $n \ge 3$, we prove that
$$\Delta\frac{1}{|x|^{n-2}} = -(n-2)S_n\,\delta(x), \qquad n \ge 3,\tag{8}$$
where the constant $S_n$ is equal to the surface area of the unit sphere in $\mathbb{R}^n$. The proof is quite similar to part (a). We use the notation $r = r(x) = |x|$. For a function $f : \mathbb{R} \to \mathbb{R}$ of class $C^2$ we have
$$\frac{\partial}{\partial x_j}f(r) = f'(r)\frac{x_j}{r}, \qquad
\frac{\partial^2}{\partial x_j^2}f(r) = f''(r)\frac{x_j^2}{r^2} + f'(r)\frac{r^2 - x_j^2}{r^3},$$
and hence
$$\Delta f(r) = f''(r) + \frac{n-1}{r}f'(r).$$
Setting $f(r) = r^p$, we obtain
$$\Delta r^p = p(p + n - 2)\,r^{p-2}.$$
Therefore, $\Delta r^{2-n} = 0$ on $\mathbb{R}^n\setminus\{0\}$. Also note that the function $x \mapsto r^{2-n}$ is in $L^1_{loc}(\mathbb{R}^n)$.
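The identity $\Delta r^p = p(p+n-2)r^{p-2}$ is easy to verify by finite differences away from the origin; a sketch in $n = 3$ at an arbitrarily chosen point:

```python
import math

# Finite-difference check of Delta r^p = p*(p + n - 2) * r^(p-2) in R^3.
def u(x, y, z, p):
    return (x * x + y * y + z * z) ** (p / 2.0)

def laplacian(x, y, z, p, h=1e-3):
    # central second differences in each variable
    s = -6.0 * u(x, y, z, p)
    s += u(x + h, y, z, p) + u(x - h, y, z, p)
    s += u(x, y + h, z, p) + u(x, y - h, z, p)
    s += u(x, y, z + h, p) + u(x, y, z - h, p)
    return s / (h * h)

x, y, z, p, n = 0.7, -0.4, 0.5, 2.5, 3
r = math.sqrt(x * x + y * y + z * z)
print(laplacian(x, y, z, p), p * (p + n - 2) * r ** (p - 2))  # agree closely
```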
We have, for sufficiently large $R > 0$, that
$$\langle\Delta r^{2-n}, \varphi\rangle = \langle r^{2-n}, \Delta\varphi\rangle = \int_{\mathbb{R}^n} r^{2-n}\Delta\varphi(x)\,dx = \int_{|x|\le R} r^{2-n}\Delta\varphi(x)\,dx,$$
and
$$\int_{|x|\le R} r^{2-n}\Delta\varphi(x)\,dx = \lim_{\varepsilon\to0^+}\int_{A(\varepsilon,R)} r^{2-n}\Delta\varphi(x)\,dx.$$
Applying Green's formula on $A(\varepsilon, R)$ (the term containing $\Delta r^{2-n}$ vanishes there, the integrals over $|x| = R$ vanish since $\varphi = 0$ near that sphere, and the term with $\partial\varphi/\partial n$ on $C_\varepsilon$ is $O(\varepsilon)$), we are left with
$$-(n-2)\lim_{\varepsilon\to0}\frac{1}{\varepsilon^{n-1}}\int_{C_\varepsilon}\varphi(x)\,dS
= -(n-2)\lim_{\varepsilon\to0}\frac{1}{\varepsilon^{n-1}}\int_{C_\varepsilon}(\varphi(x) - \varphi(0))\,dS - (n-2)S_n\varphi(0)
= -(n-2)S_n\varphi(0),$$
since the area of $C_\varepsilon$ equals $S_n\varepsilon^{n-1}$ and $|\varphi(x) - \varphi(0)| = O(\varepsilon)$ on $C_\varepsilon$, so the remaining limit vanishes. This proves (8).
If $n = 3$, then $S_3 = 4\pi$, so that
$$\Delta\frac1r = -4\pi\delta(x), \qquad x \in \mathbb{R}^3.$$
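The constant $-4\pi$ admits the same kind of radial numerical check as in part (a): for radial $\varphi$, $\langle 1/r, \Delta\varphi\rangle = 4\pi\int_0^\infty (1/r)(\varphi'' + 2\varphi'/r)\,r^2\,dr$, which should equal $-4\pi\varphi(0)$. A sketch with the rapidly decaying trial function $\varphi(x) = e^{-|x|^2}$:

```python
import math

# Radial check of <Delta(1/r), phi> = -4*pi*phi(0) in R^3 with the trial
# function phi(x) = exp(-|x|^2), for which phi(0) = 1.
def integrand(r):
    e = math.exp(-r * r)
    d1 = -2.0 * r * e             # phi'(r)
    d2 = (4.0 * r * r - 2.0) * e  # phi''(r)
    return (d2 + 2.0 * d1 / r) * r  # (1/r) * (phi'' + 2 phi'/r) * r^2

def pairing(n=200000, R=8.0):
    h = R / n
    return 4 * math.pi * sum(integrand((i + 0.5) * h) for i in range(n)) * h

print(pairing(), -4 * math.pi)  # the two agree
```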
12.3. Heat Equation. Consider the function
$$E(x, t) = \frac{\theta(t)}{(2a\sqrt{\pi t})^n}\,e^{-\frac{|x|^2}{4a^2t}},$$
where $\theta$ is the Heaviside function on $\mathbb{R}$. The function $E$ is locally integrable in $\mathbb{R}^{n+1}$. Indeed, $E(x, t) = 0$ if $t < 0$ and $E(x, t)$ is positive for $t > 0$. Furthermore, $E$ is continuous (and vanishes) on the hyperplane $\{(x, t) : t = 0\}$ away from the origin. Consider a bounded set of $\mathbb{R}^{n+1}$ of the form $B(0, R) \times [0, R]$, where $B(0, R) = \{x \in \mathbb{R}^n : |x| \le R\}$. By Fubini's theorem we have
$$\int_{B(0,R)\times[0,R]} E(x, t)\,dx\,dt = \int_{[0,R]}\Bigl(\int_{B(0,R)} E(x, t)\,dx\Bigr)dt \le \int_{[0,R]}\Bigl(\int_{\mathbb{R}^n} E(x, t)\,dx\Bigr)dt.$$
After the change of coordinates $y = x/(2a\sqrt t)$ we have
$$\int_{\mathbb{R}^n} E(x, t)\,dx = \frac{1}{(2a\sqrt{\pi t})^n}\int_{\mathbb{R}^n} e^{-\frac{|x|^2}{4a^2t}}\,dx = \frac{1}{(\sqrt\pi)^n}\prod_{j=1}^n\int_{\mathbb{R}} e^{-y_j^2}\,dy_j = 1.$$
Thus,
$$\int_{\mathbb{R}^n} E(x, t)\,dx = 1, \qquad t > 0,\tag{9}$$
and so
$$\int_{[0,R]}\Bigl(\int_{\mathbb{R}^n} E(x, t)\,dx\Bigr)dt \le \int_{[0,R]} dt = R.$$
This proves the local integrability of $E(x, t)$.
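The normalisation (9) can be verified numerically; a sketch in dimension $n = 1$ for a couple of values of $t$ and of the parameter $a$:

```python
import math

# Numerical check of (9): integral of E(x, t) over R equals 1, n = 1.
def E(x, t, a):
    return math.exp(-x * x / (4 * a * a * t)) / (2 * a * math.sqrt(math.pi * t))

def total_mass(t, a, n=200000):
    R = 50 * a * math.sqrt(t)  # many standard deviations wide
    h = 2 * R / n
    return sum(E(-R + (i + 0.5) * h, t, a) for i in range(n)) * h

for t, a in ((0.1, 1.0), (2.0, 1.0), (0.5, 3.0)):
    print(t, a, total_mass(t, a))  # ≈ 1.0 each time
```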
Let us prove the following identity:
$$\frac{\partial E}{\partial t} - a^2\Delta E = \delta(x, t).\tag{10}$$
We first observe that for $t > 0$ the function $E$ is of class $C^\infty$, and by an elementary computation, which is left for the reader, we have
$$\frac{\partial E}{\partial t}(x, t) - a^2\Delta E(x, t) = 0, \qquad t > 0.\tag{11}$$
Here the derivatives are taken in the usual sense. Now let ϕ ∈ D(Rn+1 ). Then
$$\Bigl\langle\frac{\partial E}{\partial t} - a^2\Delta E, \varphi\Bigr\rangle = -\Bigl\langle E, \frac{\partial\varphi}{\partial t} + a^2\Delta\varphi\Bigr\rangle = -\int_0^\infty\!\!\int_{\mathbb{R}^n} E(x, t)\Bigl(\frac{\partial\varphi}{\partial t} + a^2\Delta\varphi\Bigr)dx\,dt.$$
By the Lebesgue convergence theorem we have
$$-\int_0^\infty\!\!\int_{\mathbb{R}^n} E(x, t)\Bigl(\frac{\partial\varphi}{\partial t} + a^2\Delta\varphi\Bigr)dx\,dt = -\lim_{\varepsilon\to0}\int_\varepsilon^\infty\!\!\int_{\mathbb{R}^n} E(x, t)\Bigl(\frac{\partial\varphi}{\partial t} + a^2\Delta\varphi\Bigr)dx\,dt.$$
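The pointwise identity (11) for $t > 0$ can be spot-checked by finite differences; a sketch in dimension $n = 2$ with an arbitrarily chosen value of $a$ and evaluation point:

```python
import math

# Finite-difference check of (11): dE/dt - a^2 * Delta E = 0 for t > 0,
# in dimension n = 2 (the Heaviside factor is 1 since t > 0 here).
a = 1.3

def E(x, y, t):
    return (math.exp(-(x * x + y * y) / (4 * a * a * t))
            / (2 * a * math.sqrt(math.pi * t)) ** 2)

def heat_residual(x, y, t, h=1e-4):
    dEdt = (E(x, y, t + h) - E(x, y, t - h)) / (2 * h)
    lap = (E(x + h, y, t) + E(x - h, y, t) + E(x, y + h, t) + E(x, y - h, t)
           - 4 * E(x, y, t)) / (h * h)
    return dEdt - a * a * lap

print(heat_residual(0.4, -0.2, 0.7))  # ≈ 0
```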
For the proof, fix $R > 0$ such that $\operatorname{supp}\varphi \subset \{|(x, t)| < R\}$. The function $\varphi$ is Lipschitz continuous and hence uniformly continuous on $\mathbb{R}^{n+1}$. Given $\alpha > 0$ there exists $\delta > 0$ such that $|\varphi(x, \varepsilon) - \varphi(x, 0)| < \alpha/2$ for all $x \in \mathbb{R}^n$, provided $0 < \varepsilon < \delta$. Therefore,
$$\int_{\mathbb{R}^n} E(x, \varepsilon)[\varphi(x, \varepsilon) - \varphi(x, 0)]\,dx = I + II,$$
with
$$I = \int_{|x|<\delta} E(x, \varepsilon)[\varphi(x, \varepsilon) - \varphi(x, 0)]\,dx,$$
and
$$II = \int_{\delta\le|x|\le R} E(x, \varepsilon)[\varphi(x, \varepsilon) - \varphi(x, 0)]\,dx.$$
Then
$$|I| \le \frac\alpha2\int_{\mathbb{R}^n} E(x, \varepsilon)\,dx = \frac\alpha2.$$
Set
$$M(\varepsilon) = \frac{1}{(2a\sqrt{\pi\varepsilon})^n}\,e^{-\frac{\delta^2}{4a^2\varepsilon}},$$
and $C = \sup_{x\in\mathbb{R}^{n+1}}|\varphi(x)|$. Then $\sup_{|x|\ge\delta} E(x, \varepsilon) = M(\varepsilon)$ and
$$|II| \le 2C\int_{\delta\le|x|\le R} E(x, \varepsilon)\,dx \le 2C\,M(\varepsilon)\,|B(0, R)| \to 0, \qquad \varepsilon\to0,$$
where $|B(0, R)|$ denotes the volume of $B(0, R)$. It follows that $|II| \le \alpha/2$ for all $\varepsilon$ small enough. This proves the claim.
We conclude that
$$\lim_{\varepsilon\to0}\int_{\mathbb{R}^n} E(x, \varepsilon)\varphi(x, \varepsilon)\,dx = \lim_{\varepsilon\to0}\int_{\mathbb{R}^n} E(x, \varepsilon)\varphi(x, 0)\,dx.$$
To finish the proof we need the following claim: for every $\psi \in \mathcal D(\mathbb{R}^n)$,
$$\langle E(\cdot, t), \psi\rangle \to \psi(0) = \langle\delta, \psi\rangle \quad \text{as } t\to0^+,$$
i.e., $E(\cdot, t)$ converges to $\delta(x)$ in $\mathcal D'(\mathbb{R}^n)$.
For the proof, let $\psi \in \mathcal D(\mathbb{R}^n)$. Since $\psi$ is smooth and has compact support, there exists a constant $C > 0$ such that
$$|\psi(x) - \psi(0)| \le C|x|, \qquad x \in \mathbb{R}^n.$$
We have
$$\Bigl|\int_{\mathbb{R}^n} E(x, t)(\psi(x) - \psi(0))\,dx\Bigr| \le \frac{C}{(4\pi a^2t)^{n/2}}\int_{\mathbb{R}^n} e^{-\frac{|x|^2}{4a^2t}}|x|\,dx.$$
Evaluating the last integral in spherical coordinates (we denote by $\sigma_n$ the surface area of the unit sphere in $\mathbb{R}^n$) we obtain that the right-hand side is equal to
$$\frac{C\sigma_n}{(4\pi a^2t)^{n/2}}\int_0^\infty e^{-\frac{r^2}{4a^2t}}\,r^n\,dr = \frac{2C\sigma_n a\sqrt t}{\pi^{n/2}}\int_0^\infty e^{-u^2}u^n\,du = C'\sqrt t.$$
Hence,
$$\int_{\mathbb{R}^n} E(x, t)(\psi(x) - \psi(0))\,dx \longrightarrow 0 \quad \text{as } t\to0^+.$$
Then, using (9), we have
$$\langle E(x, t), \psi\rangle = \int_{\mathbb{R}^n} E(x, t)\psi(x)\,dx = \psi(0)\int_{\mathbb{R}^n} E(x, t)\,dx + \int_{\mathbb{R}^n} E(x, t)(\psi(x) - \psi(0))\,dx \longrightarrow \psi(0) = \langle\delta(x), \psi\rangle.$$
This proves the claim.
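The convergence $\langle E(\cdot, t), \psi\rangle \to \psi(0)$ can be watched numerically. For the Gaussian trial function $\psi(x) = e^{-x^2}$ in dimension $n = 1$ with $a = 1$ (my choices, not the notes'), the pairing even has the closed form $1/\sqrt{1 + 4t}$, which tends to $\psi(0) = 1$:

```python
import math

# Pairing <E(., t), psi> for psi(x) = exp(-x^2), n = 1, a = 1; it equals
# 1/sqrt(1 + 4t) in closed form and tends to psi(0) = 1 as t -> 0+.
def E(x, t):
    return math.exp(-x * x / (4 * t)) / (2 * math.sqrt(math.pi * t))

def pairing(t, n=200000, R=10.0):
    h = 2 * R / n
    return sum(E(-R + (i + 0.5) * h, t) * math.exp(-(-R + (i + 0.5) * h) ** 2)
               for i in range(n)) * h

for t in (1.0, 0.1, 0.001):
    print(t, pairing(t), 1 / math.sqrt(1 + 4 * t))
```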
Let $\psi(x) = \varphi(x, 0) \in \mathcal D(\mathbb{R}^n)$. Then
$$\Bigl\langle\frac{\partial E}{\partial t} - a^2\Delta E, \varphi\Bigr\rangle = \lim_{\varepsilon\to0}\int_{\mathbb{R}^n} E(x, \varepsilon)\varphi(x, 0)\,dx = \varphi(0, 0) = \langle\delta(x, t), \varphi\rangle.$$