Solutions To Selected Exercises in Chapter Two
2.1.3. Suppose T : E → F is a linear mapping, and let A := y0 + T where y0 ∈ F . Then for any
λ ∈ R, we have
For the converse, suppose A : E → F is an affine mapping. Let y0 = A(0), T (x) := Ax − y0 and
k ∈ R. Then
Furthermore,
\[
\begin{aligned}
T(x+y) &= A(x+y) - y_0 = A\left(\tfrac12(2x) + \tfrac12(2y)\right) - y_0 \\
&= \tfrac12 A(2x) + \tfrac12 A(2y) - y_0 = \tfrac12[A(2x) - y_0] + \tfrac12[A(2y) - y_0] \\
&= \tfrac12 T(2x) + \tfrac12 T(2y) = T(x) + T(y).
\end{aligned}
\]
f (0) ≤ f (0) + f (0) implies f (0) ≥ 0, and so f (0) = 0. When λ = 0, 0 = f (0) = f (λx) = λf (x),
now for λ > 0,
2.1.5. Suppose f is a convex function. Consider the set F = {x ∈ E : f (x) ≤ α}. In the case F
is not empty, let u, v ∈ F . Then for 0 ≤ λ ≤ 1,
2.1.6. Observe that a pointwise supremum of convex functions is convex, because its epigraph
is an intersection of convex sets, which is convex. Notice that the function need not be proper.
Moreover, the epigraph will be closed if all of the functions are closed.
(a) Suppose f : X → [−∞, +∞] is convex. If f ≡ +∞, then epi f = ∅ is convex. Otherwise, let
(x, t), (y, s) ∈ epi f . Then for 0 ≤ λ ≤ 1 we have
for all t > f (x) and s > f (y). It follows that f is convex. (Hence this confirms that we can define
improper convex functions via their epigraphs as remarked earlier)
(c) Let x, y ∈ E, and 0 ≤ λ ≤ 1. Then
as desired.
For a nice alternative geometric approach and explanation to (d), the reader is encouraged to
consult [255, Proposition 2.2.1]. For the converse, the convexity of g follows by restricting to the
case t = 1.
2.1.7. It is clear that if x0 ∈ int C, then x0 ∈ core C. For the converse, suppose C is convex
and x0 ∈ core C. Then there exist δi > 0 so that x0 + tei ∈ C for all |t| ≤ δi and i = 1, 2, . . . , n,
where {e1, . . . , en} is the usual basis of R^n. Now let δ = min{δ1, δ2, . . . , δn}. Because C is convex, it
follows that x0 + h ∈ C whenever h = a1 e1 + a2 e2 + . . . + an en where |ai| ≤ δ/n for i = 1, 2, . . . , n.
Consequently, x0 ∈ int C ≠ ∅. A conventional example of a nonconvex set F ⊂ R² with (0, 0) ∈
core F \ int F is F = {(x, y) ∈ R² : |y| ≥ x² or y = 0}; see also Figure 2.4 for another example.
2.1.8. Let x ∈ E be a point of continuity of the convex function f. The max formula (2.1.19)
ensures that ∂f(x) ≠ ∅. Moreover, f has Lipschitz constant K on some neighborhood U of x
(Theorem 2.1.10). Because ⟨v, y − x⟩ ≤ f(y) − f(x) for all y ∈ U and v ∈ ∂f(x), it follows that
‖v‖ ≤ K. Thus, ∂f(x) ⊂ K BE. Now suppose (vn) ⊂ ∂f(x) and vn → v. Then
\[ \langle v, y - x\rangle = \lim_{n\to\infty}\langle v_n, y - x\rangle \le f(y) - f(x) \quad\text{for all } y \in E. \]
This shows v ∈ ∂f(x). Therefore, ∂f(x) is a nonempty, closed, convex and bounded subset of
E.
as desired.
2.1.11. (a) For arbitrary a, b ∈ R and 0 < λ < 1, let f be defined by f (x) := a for 0 ≤ x ≤ λ
and f (x) := b for λ < x ≤ 1. Then
\[ g\left(\int_0^1 f(x)\,dx\right) = g(\lambda a + (1-\lambda)b) \]
while
\[ \int_0^1 g(f)\,dx = \lambda g(a) + (1-\lambda)g(b). \]
The original assumption
\[ g\left(\int_0^1 f(x)\,dx\right) \le \int_0^1 g(f)\,dx \]
then implies g(λa + (1 − λ)b) ≤ λg(a) + (1 − λ)g(b). This establishes the convexity of g.
(b) Applying Jensen’s inequality when φ := exp(·) we have
\[ \exp\left(\int_\Omega f\,d\mu\right) \le \int_\Omega \exp(f)\,d\mu. \]
When Ω is the finite set {s1, s2, . . . , sn} with µ({si}) = 1/n and f(si) = yi for i = 1, 2, . . . , n, this
becomes
\[ \exp\left(\frac1n(y_1 + \cdots + y_n)\right) \le \frac1n\left(e^{y_1} + \cdots + e^{y_n}\right), \quad y_i \in \mathbb{R}. \]
When xi = e^{yi} this becomes
\[ (x_1 x_2 \cdots x_n)^{1/n} \le \frac1n(x_1 + x_2 + \cdots + x_n), \]
that is, the arithmetic-geometric mean inequality.
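The reduction from Jensen's inequality to the arithmetic-geometric mean inequality can be checked numerically. The following is a minimal Python sketch and is not part of the original solution; the function names `jensen_exp` and `am_gm` are illustrative assumptions, and Ω is taken to be a finite set with uniform weights 1/n as in the argument above.

```python
import math

def jensen_exp(ys):
    """Return (lhs, rhs) of exp(mean(y)) <= mean(exp(y)) for a finite list ys."""
    n = len(ys)
    lhs = math.exp(sum(ys) / n)
    rhs = sum(math.exp(y) for y in ys) / n
    return lhs, rhs

def am_gm(xs):
    """Return (geometric mean, arithmetic mean) of positive numbers xs."""
    n = len(xs)
    gm = math.exp(sum(math.log(x) for x in xs) / n)
    am = sum(xs) / n
    return gm, am

if __name__ == "__main__":
    xs = [1.0, 2.0, 4.0]
    # Substituting y_i = log(x_i) turns Jensen's inequality into AM-GM.
    print(jensen_exp([math.log(x) for x in xs]))
    print(am_gm(xs))
```

With the substitution yi = log xi, the two functions should return the same pair of numbers up to rounding.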
2.1.12. Suppose f is not identically −∞. Then fix x0 where f is real-valued. We may assume
f(x0) = 0. Suppose f(x0 + h) > 0 for some h ∈ E. Then for t > 1,
\[ 0 < f(x_0 + h) \le (1 - t^{-1}) f(x_0) + t^{-1} f(x_0 + th) = t^{-1} f(x_0 + th). \]
Thus f(x0 + th) ≥ t f(x0 + h) → ∞ as t → ∞, which is a contradiction. Finally, in the case
f(x0 + h) < 0 for some h ∈ E, we would deduce f(x0 − h) > 0, and so this, too, is impossible.
2.1.13. (a) For k ≥ 0, observe that x ∈ λC if and only if kx ∈ kλC. Therefore γC is positively
homogeneous (the definition of γC := inf{λ ≥ 0 : x ∈ λC} ensures γC(0) = 0 even when 0 ∉ C).
Suppose C ⊂ E is convex. Let x, y ∈ E. In the case γC (x) = ∞ or γC (y) = ∞ then it is clear
Suppose γC(x), γC(y) are real-valued and ε > 0. Choose α, β so that γC(x) ≤ α < γC(x) + ε,
γC(y) ≤ β < γC(y) + ε and (1/α)x, (1/β)y ∈ C. So we choose u, v ∈ C so that x = αu and y = βv (the
cases x = 0 or y = 0 are fine). Then for 0 ≤ λ ≤ 1,
\[ \frac{\lambda\alpha u + (1-\lambda)\beta v}{\lambda\alpha + (1-\lambda)\beta} \in C \]
and therefore
\[ \gamma_C(\lambda\alpha u + (1-\lambda)\beta v) \le \lambda\alpha + (1-\lambda)\beta, \]
or in other words, γC(λx + (1 − λ)y) ≤ λγC(x) + (1 − λ)γC(y) + ε. This shows γC is convex when
C is convex; in fact, γC is subadditive because
\[ \gamma_C(x+y) = \gamma_C\left(\tfrac12(2x) + \tfrac12(2y)\right) \le \tfrac12\gamma_C(2x) + \tfrac12\gamma_C(2y) = \gamma_C(x) + \gamma_C(y). \]
(c) Suppose 0 ∈ core C. Then γC is continuous, and therefore {x ∈ E : γC (x) ≤ 1} is closed.
Observe that γC (x) ≤ 1 for all x ∈ C, consequently cl C ⊂ {x ∈ E : γC (x) ≤ 1}.
2.1.14. For example, let f(x, y) = −(xy)^{1/4} when x ≥ 0 and y ≥ 0. The strict convexity assertion
is probably best shown by computing the Hessian (as introduced in Section 2.2) of f at (x, y),
which is
\[ H = \begin{pmatrix} \frac{3}{16}x^{-7/4}y^{1/4} & -\frac{1}{16}x^{-3/4}y^{-3/4} \\ -\frac{1}{16}x^{-3/4}y^{-3/4} & \frac{3}{16}x^{1/4}y^{-7/4} \end{pmatrix}. \]
Then f is strictly convex on the interior of its domain because this matrix is positive definite for
all (x, y) with x > 0 and y > 0 which follows because h11 > 0, and |H| > 0 at all such (x, y).
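The positive definiteness claim can be spot-checked with a few lines of Python. This is only an illustrative sketch: the entries below are the closed-form second partial derivatives of −(xy)^{1/4}, and the helper names `hessian` and `is_positive_definite` are assumptions introduced here.

```python
def hessian(x, y):
    """Closed-form Hessian entries (h11, h12, h22) of f(x, y) = -(x*y)**0.25, x, y > 0."""
    h11 = (3.0 / 16.0) * x ** (-7.0 / 4.0) * y ** (1.0 / 4.0)
    h12 = -(1.0 / 16.0) * x ** (-3.0 / 4.0) * y ** (-3.0 / 4.0)
    h22 = (3.0 / 16.0) * x ** (1.0 / 4.0) * y ** (-7.0 / 4.0)
    return h11, h12, h22

def is_positive_definite(x, y):
    """Sylvester's criterion for a symmetric 2x2 matrix: h11 > 0 and det > 0."""
    h11, h12, h22 = hessian(x, y)
    det = h11 * h22 - h12 * h12
    return h11 > 0 and det > 0

if __name__ == "__main__":
    for point in [(0.5, 0.5), (1.0, 1.0), (2.0, 0.1)]:
        print(point, is_positive_definite(*point))
```

Algebraically the determinant simplifies to (1/32)(xy)^{-3/2}, which is positive whenever x > 0 and y > 0.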
2.1.15. Let (fi ) be a family of proper functions and define f : E → [−∞, +∞] by
\[ f(x) := \inf\left\{ \sum \lambda_i f_i(x_i) : \sum \lambda_i = 1,\ \lambda_i \ge 0,\ \lambda_i \text{ finitely nonzero},\ \sum \lambda_i x_i = x \right\}. \]
Let u, v ∈ dom f and let α and β be any real numbers satisfying f(u) < α and f(v) < β.
Now choose sums as in the definition above so
\[ \sum \alpha_i f_i(u_i) < \alpha, \quad \sum \beta_i f_i(v_i) < \beta \quad\text{where } \sum \alpha_i u_i = u,\ \sum \beta_i v_i = v. \]
Then for any 0 ≤ λ ≤ 1, we have Σ[λαi + (1 − λ)βi] = 1 and Σ(λαi ui + (1 − λ)βi vi) = λu + (1 − λ)v.
Then by the definition of f,
\[ f(\lambda u + (1-\lambda)v) \le \sum \left[\lambda\alpha_i f_i(u_i) + (1-\lambda)\beta_i f_i(v_i)\right] < \lambda\alpha + (1-\lambda)\beta. \]
The convexity of f follows from this. Next we show that f is the largest convex function minorizing
the family. Indeed, suppose h is convex and h minorizes the family (fi). Then for any x such
that x = Σλi xi where Σλi = 1, λi ≥ 0, and only finitely many of the λi are nonzero, by the
convexity of h and the minorization property we have
\[ h\left(\sum \lambda_i x_i\right) \le \sum \lambda_i h(x_i) \le \sum \lambda_i f_i(x_i). \]
2.1.18. Let x be such that f (x) is real-valued. Suppose f (x) = (cl f )(x) and let xn → x. Because
the epigraph of cl f is closed and f ≥ cl f we know
\[ \liminf_{n\to\infty} f(x_n) \ge \liminf_{n\to\infty} (\operatorname{cl} f)(x_n) \ge (\operatorname{cl} f)(x) = f(x) \]
and so f is lower semicontinuous at x. Conversely, suppose f is lower semicontinuous at x. Let
(xn , tn ) ∈ epi f be such that (xn , tn ) → (x, (cl f )(x)) (possibly in the extended sense in the second
coordinate). Then
\[ (\operatorname{cl} f)(x) = \lim_{n\to\infty} t_n \ge \liminf_{n\to\infty} f(x_n) \ge f(x). \]
Because (cl f ) ≤ f , we conclude (cl f )(x) = f (x). According to the lower semicontinuity of f at
x we know f (u) > α > −∞ on some neighborhood of x. It follows easily that f is proper, see
the proof of Lemma 2.3.3.
2.1.19. Suppose lim inf_{‖x‖→∞} f(x)/‖x‖ > β > 0. Let S = {x : f(x) ≤ α} for some α ∈ R. Find
k > 0 so that kβ > α, and f(x)/‖x‖ > β for all ‖x‖ ≥ k. If ‖x‖ ≥ k, then f(x) > β‖x‖ ≥ kβ > α.
Therefore, S ⊂ kBE.
Clearly f : R → R defined by f(t) := √|t| is not convex, and {t : f(t) ≤ α} = [−α², α²] for each
α > 0. However,
\[ \lim_{|t|\to\infty} f(t)/|t| = \lim_{|t|\to\infty} |t|^{-1/2} = 0. \]
2.1.22. (a) For each n ∈ N, let Fn := {x ∈ S : |fk(x)| ≤ n for all k ∈ N}. Then ⋃n Fn = S because
fn(x) → f(x) and f(x) ∈ R. According to the Baire category theorem, there exists N ∈ N such
that FN contains a relatively open (in E) subset U. Then |fk(x)| ≤ N for all k ∈ N, x ∈ U.
(b) By shifting, we may assume x = 0 where x is some given point in int S. Now rBE ⊂ int S
for some r > 0. By part (a), there is some Br1 (y) ⊂ rBE for which fk (x) ≤ M for all k ∈ N,
x ∈ Br1 (y). Replacing M with a larger number as necessary, we may also assume fk (−y) ≤ M for
all k ∈ N. By the convexity of fn, we have that fn ≤ M on conv({−y} ∪ Br1(y)) which contains
(r1/2)BE. Thus fn is uniformly bounded on (r1/2)BE.
Now let K be a compact subset of int S. Suppose by way of contradiction there exist ε > 0 and
a subsequence (xnk) ⊂ K such that
2.1.23. (a) Suppose f does not have Lipschitz constant K ≥ 0 on U. Fix u, v ∈ U such that
f(v) − f(u) > K‖v − u‖, and let φ ∈ ∂f(v). Then
(b) Because dom f = E, f is continuous on E, and the extreme value theorem implies f is
bounded on bounded subsets of E. Exercise 2.1.22 then ensures f is Lipschitz on bounded subsets
of E, and then part (a) of this exercise ensures ∂f maps bounded subsets of E to bounded subsets
of E.
2.2.1. A proof of the convexity of f using the Hessian is sketched in [369, pp. 27–28]. Because
f is convex, we have
\[ f\left(\frac{x+y}{2}\right) \le \frac12 f(x) + \frac12 f(y), \]
which means
\[ -\sqrt[n]{\frac{x_1+y_1}{2}\cdots\frac{x_n+y_n}{2}} \le -\frac12\sqrt[n]{x_1 x_2\cdots x_n} - \frac12\sqrt[n]{y_1 y_2\cdots y_n}. \]
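The midpoint inequality for the negative geometric mean in Exercise 2.2.1 can be tested on sample data. The sketch below is illustrative only; `neg_geo_mean`, `midpoint_gap` and `worst_gap` are hypothetical helper names, and the random test points are an assumption.

```python
import math
import random

def neg_geo_mean(x):
    """f(x) = -(x_1 * ... * x_n)**(1/n) for a list of positive numbers."""
    n = len(x)
    return -(math.prod(x) ** (1.0 / n))

def midpoint_gap(x, y):
    """(f(x) + f(y))/2 - f((x + y)/2); nonnegative when f is midpoint convex."""
    mid = [(a + b) / 2 for a, b in zip(x, y)]
    return 0.5 * neg_geo_mean(x) + 0.5 * neg_geo_mean(y) - neg_geo_mean(mid)

def worst_gap(trials=1000, n=4, seed=0):
    """Smallest midpoint gap observed over random positive vectors."""
    rng = random.Random(seed)
    gaps = []
    for _ in range(trials):
        x = [rng.uniform(0.1, 10.0) for _ in range(n)]
        y = [rng.uniform(0.1, 10.0) for _ in range(n)]
        gaps.append(midpoint_gap(x, y))
    return min(gaps)

if __name__ == "__main__":
    print(midpoint_gap([1.0, 4.0], [4.0, 1.0]))
    print(worst_gap())
```

A negative value of `worst_gap()` beyond rounding error would contradict the convexity of f; none is expected.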
2.2.2. (a) ⇒ (b): Let g := −1/f and φ(t) := − ln(−t) for t ∈ (−∞, 0). Then φ is convex and
increasing, and g is convex. Therefore, ln ◦f = φ ◦ g is convex.
(b) ⇒ (c): g := ln ◦f is convex, therefore, f = exp(ln ◦f ) is convex since exp is convex and
increasing.
2.2.4. We provide details as in [34, Lemma 3.2]. For a function f on I consider the associated
Bregman distance Df defined by Df(x, y) := f(x) − f(y) − f′(y)(x − y). Let g := −1/h so that
g′ = h′/h². (a): 1/h is concave if and only if g is convex if and only if Dg is nonnegative if and
only if
\[ 0 \le -1/h(x) + 1/h(y) - (h'(y)/h^2(y))(x - y) \quad\text{for all } x, y \in I \]
if and only if
\[ 0 \le h(x)h(y) - h^2(y) - h(x)h'(y)(x - y) \quad\text{for all } x, y \in I. \]
Part (b) is similar, noting that Df ≡ 0 if and only if f is affine. Part (c) was shown in
Exercise 2.2.2. Part (d): 1/h is concave if and only if g is convex if and only if
g″ = (h²h″ − 2h(h′)²)/h⁴ ≥ 0 if and only if hh″ ≥ 2(h′)².
2.2.5. First, g is a real-valued convex function on [0, 1] and so Theorem 2.1.2(d) ensures that g
is differentiable except at possibly countably many t ∈ [0, 1]. Then Theorem 2.2.1 implies that
at points of differentiability ∂g(t) = {∇g(t)}. Now let t ∈ (0, 1) be a point of differentiability
of g. Observe that
2.2.8. Suppose f is Fréchet differentiable at x0. Let φ = f′(x0). Given ε > 0 we choose δ > 0 so
that δ‖φ‖ < ε and
\[ |f(x_0 + h) - f(x_0) - \phi(h)| \le \frac{\varepsilon}{2}\|h\| \]
whenever 0 < ‖h‖ < δ. Now suppose ‖x − x0‖ < δ. Then x = x0 + h where ‖h‖ < δ and the
previous inequality then implies
Thus, f is continuous at x0 .
Because SX is compact, we may replace (hn) above with one of its convergent subsequences, so
we suppose hn → h ∈ SX. Then ‖hn − h‖ < ε/3K for n sufficiently large, and for such n we have
which contradicts the Gâteaux differentiability of f . For a slightly different proof of this, see the
last part of the proof of Theorem 2.5.4.
2.2.13. Suppose f is convex on the interval I, and suppose J := [a, b] is a compact subinterval
of I. Then for m affine and 0 ≤ λ ≤ 1, we have
2.2.16. Suppose a < b and M is an affine function through (a, f(a)) and (b, f(b)), and let m be
an affine minorant of f passing through ((a + b)/2, f((a + b)/2)) (using the max formula (2.1.19)).
Then m ≤ f ≤ M on [a, b], and thus
\[ f\left(\frac{a+b}{2}\right) \le \frac{1}{b-a}\int_a^b f(t)\,dt \le \frac{f(a)+f(b)}{2} \]
since these quantities are the averages of m, f and M respectively on [a, b].
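The two-sided estimate in Exercise 2.2.16 can be illustrated numerically. The following Python sketch is not from the original text; it assumes the convex test function exp on [0, 1] and approximates the integral with a composite midpoint rule, and the name `hermite_hadamard` is a hypothetical helper.

```python
import math

def hermite_hadamard(f, a, b, n=10000):
    """Return (f(midpoint), average of f on [a, b], average of the endpoint values).

    The integral of f over [a, b] is approximated by a composite midpoint rule
    with n subintervals.
    """
    h = (b - a) / n
    integral = h * sum(f(a + (i + 0.5) * h) for i in range(n))
    return f((a + b) / 2), integral / (b - a), (f(a) + f(b)) / 2

if __name__ == "__main__":
    lower, average, upper = hermite_hadamard(math.exp, 0.0, 1.0)
    # For a convex f the three values should appear in increasing order.
    print(lower, average, upper)
```

For exp on [0, 1] the exact average is e − 1, which lies between exp(1/2) and (1 + e)/2.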
2.2.20. First, for (x, y) ∈ dom f \ {(0, 0)}, the Hessian of f at (x, y) is
\[ H = \begin{pmatrix} 6xy^{-2} & -6x^2y^{-3} \\ -6x^2y^{-3} & 6x^3y^{-4} \end{pmatrix}. \]
Then H is positive semidefinite for such (x, y) because |H| = 0 and 6xy⁻² ≥ 0. Also, for
(x, y) ∈ dom f, we have
\[ \lambda f(x, y) + (1-\lambda) f(0, 0) = \lambda\frac{x^3}{y^2} = f(\lambda(x, y)) \]
and together we deduce f is convex. It is also closed because: (i) its domain is closed; (ii) x³/y²
is continuous when y ≠ 0; (iii) lim inf_{(x,y)→(0,0)} f(x, y) ≥ f(0, 0). However, f is not continuous at
(0, 0) even when considering that the underlying topological space is the domain. This is because
lim_{(x,x²)→(0,0)} f(x, x²) = +∞.
A further observation is that this type of example cannot occur on R. Indeed, if f : [a, b] → R
is convex and lower semicontinuous, then f is continuous as a function on [a, b]: it
is continuous on (a, b), and lim inf_{x→b−} f(x) ≥ f(b) by lower semicontinuity, while the
convexity of f implies lim sup_{x→b−} f(x) ≤ f(b). Similarly, f is continuous from the right at a.
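The three computations used for Exercise 2.2.20 (vanishing Hessian determinant, homogeneity along rays through the origin, and blow-up along the parabola y = x²) can be reproduced with a short Python sketch. Only the formula x³/y² from the text is used; the exact domain of f is not restated in this section, so the sample points below are assumptions, as are the helper names.

```python
def f(x, y):
    """The formula x**3 / y**2 from the text, evaluated where y != 0."""
    return x ** 3 / y ** 2

def hessian_det(x, y):
    """Determinant of the Hessian of x**3 / y**2, using the entries given in the text."""
    h11 = 6 * x / y ** 2
    h12 = -6 * x ** 2 / y ** 3
    h22 = 6 * x ** 3 / y ** 4
    return h11 * h22 - h12 ** 2

if __name__ == "__main__":
    print(hessian_det(0.5, 0.3))                  # zero up to rounding
    print(f(0.5 * 0.3, 0.5 * 0.2), 0.5 * f(0.3, 0.2))  # f(lam*(x, y)) = lam*f(x, y)
    for t in [1e-1, 1e-2, 1e-3]:
        print(t, f(t, t ** 2))                    # equals 1/t, unbounded as t -> 0+
```

Along the parabola, f(t, t²) = 1/t, which is the source of the discontinuity at the origin.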
2.2.21. Suppose x, y ∈ U, and x* ∈ ∂f(x), y* ∈ ∂f(y) (which are not empty by the max
formula (2.1.19)). Then by the subdifferential inequality
\[ \langle y^* - x^*, y - x\rangle = y^*(y - x) + x^*(x - y) \ge [f(y) - f(x)] + [f(x) - f(y)] = 0. \]
Hence the subdifferential is a monotone mapping. The ‘in particular’ statement follows because
∂f (x) = {∇f (x)} when f is differentiable at x.
2.2.22. (a) Suppose not, then there exist xn → x0 and ε > 0 so that φn ∈ ∂f(xn), but
φn ∉ ∂f(x0) + εBE. Use the local Lipschitz property of f (Theorem 2.1.12) to deduce that
(‖φn‖)n is bounded. Then use compactness to find a convergent subsequence, say φnk → φ. Now
fix y ∈ E. Then
where the last equality follows by the uniform convergence of fnk to f on bounded sets. Thus
φ ∈ ∂f(w0), that is φ = ∇f(w0). By (b), ∇f(wk) → ∇f(w0) = φ, which yields a contradiction
because ‖φnk − ∇f(wnk)‖ > ε.
(d) For example, let fn := max{|·| − 1/n, 0} and f := |·| on R. Then ∂fn(1/n) ⊄ ∂f(1/n) + ½BR
for any n ∈ N. Indeed, ∂fn(1/n) = [0, 1] while ∂f(1/n) + ½BR = [1/2, 3/2]. For the remaining
part, suppose no such N exists. As in (c), choose φnk ∈ ∂fnk(wnk) but φnk ∉ ∂f(w) + εBE for
‖w − wnk‖ < δ and as in (c), wnk → w0 and φnk → φ for some w0 ∈ E and φ ∈ E. Again, as in
(c), one can show that φ ∈ ∂f(w0). However, for ‖wnk − w0‖ < δ, we have φnk ∉ ∂f(w0) + εBE,
which is a contradiction.
2.2.23. (a) We will use the max formula (2.1.19). Suppose x0 ∈ bndy C and take xn ∉ C such
that xn → x0. By the max formula (2.1.19), let φn ∈ ∂dC(xn). Then ‖φn‖ ≤ 1 because dC has
Lipschitz constant 1, and ‖φn‖ ≥ 1, because we choose x̄n ∈ C such that dC(xn) = ‖xn − x̄n‖,
and then ⟨φn, x̄n − xn⟩ ≤ −dC(xn). By the compactness of BE, we know φnk → φ̄ for some φ̄,
and ‖φ̄‖ = 1. Also, for any x ∈ E, we have
Thus φ̄ ∈ ∂dC(x0). Then nφ̄ ∈ ∂(n dC)(x0) and so nφ̄ + φ ∈ ∂f(x0).
(b) Let f(t) := −√t for t ≥ 0 and f(t) := +∞ when t < 0. Then ∂f(0) = ∅. Let g := δ[0,+∞).
Then ∂g(0) = (−∞, 0].
Further notes. The proof of (a) shows that given any nonempty convex set A with x0 ∈ bndy A,
there exists φ̄ ∈ ∂dC(x0) with ‖φ̄‖ = 1, where C is the closure of A. It then follows that
φ̄(x0) = supC φ̄. Had we done this separation theorem earlier, we could have more elegantly
completed the proof of Theorem 2.2.1 and part (a) of this exercise.
2.3.2. Using calculus, one can show that for f := |·|^p/p on R one has f* = |·|^q/q. The
Fenchel–Young inequality (2.3.1) then shows f(x) + f*(y) ≥ xy, that is,
\[ \frac{|x|^p}{p} + \frac{|y|^q}{q} \ge xy, \]
as desired.
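Young's inequality from Exercise 2.3.2 is easy to probe numerically. The sketch below is a hedged illustration, assuming conjugate exponents with 1/p + 1/q = 1 and p > 1; `conjugate_exponent` and `young_gap` are names introduced here.

```python
def conjugate_exponent(p):
    """Return q with 1/p + 1/q = 1, assuming p > 1."""
    return p / (p - 1.0)

def young_gap(x, y, p):
    """|x|**p / p + |y|**q / q - x*y, which Young's inequality says is >= 0."""
    q = conjugate_exponent(p)
    return abs(x) ** p / p + abs(y) ** q / q - x * y

if __name__ == "__main__":
    # Equality is expected when y = |x|**(p-1) * sign(x), e.g. x = 2, p = 3, y = 4.
    print(young_gap(2.0, 4.0, 3.0))
    print(min(young_gap(0.25 * i, 0.25 * j, 3.0)
              for i in range(-20, 21) for j in range(-20, 21)))
```

The gap vanishes exactly when y belongs to the subdifferential of |·|^p/p at x, which is the equality case of the Fenchel–Young inequality.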
2.3.3. Let ‖f‖p = α, ‖g‖q = β where α, β > 0 (if either α = 0 or β = 0, then fg = 0 a.e. and so
the inequality is trivially true). Now we integrate both sides of the Young inequality:
\[ \int_X \frac{f(x)}{\alpha}\,\frac{g(x)}{\beta}\,d\mu \le \int_X \left(\frac1p\frac{|f(x)|^p}{\alpha^p} + \frac1q\frac{|g(x)|^q}{\beta^q}\right) d\mu = \frac1p + \frac1q = 1. \]
2.3.4. (a) To show that x ↦ Σ_{k=1}^N |xk|^p is convex, observe that g := |·|^p is a convex function on R,
and then
\[ x \mapsto \sum_{k=1}^N g(P_k(x)) \quad\text{where } P_k(x) = x_k \]
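(b) Alternatively, one may apply the discrete form of Hölder’s inequality of Exercise 2.3.3 as
follows:
\[
\begin{aligned}
\sum_{k=1}^N |x_k + y_k|^p &\le \sum_{k=1}^N \left(|x_k||x_k + y_k|^{p-1} + |y_k||x_k + y_k|^{p-1}\right) \\
&\le \left(\sum_{k=1}^N |x_k|^p\right)^{1/p}\left(\sum_{k=1}^N |x_k + y_k|^{(p-1)q}\right)^{1/q} + \left(\sum_{k=1}^N |y_k|^p\right)^{1/p}\left(\sum_{k=1}^N |x_k + y_k|^{(p-1)q}\right)^{1/q} \\
&= \left(\sum_{k=1}^N |x_k + y_k|^p\right)^{1/q}\left[\left(\sum_{k=1}^N |x_k|^p\right)^{1/p} + \left(\sum_{k=1}^N |y_k|^p\right)^{1/p}\right]
\end{aligned}
\]
where we used (p − 1)q = p. Now divide both sides by (Σ_{k=1}^N |xk + yk|^p)^{1/q}, using that 1 − 1/q = 1/p,
to obtain
\[ \left(\sum_{k=1}^N |x_k + y_k|^p\right)^{1/p} \le \left(\sum_{k=1}^N |x_k|^p\right)^{1/p} + \left(\sum_{k=1}^N |y_k|^p\right)^{1/p} \]
as desired.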
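The resulting triangle inequality for the p-norm (Minkowski's inequality) can be checked on examples. This Python sketch is illustrative only; `p_norm`, `minkowski_gap` and `smallest_gap` are hypothetical helper names, and the random vectors are an assumption.

```python
import random

def p_norm(x, p):
    """(sum |x_k|**p)**(1/p) for a finite list x and p >= 1."""
    return sum(abs(t) ** p for t in x) ** (1.0 / p)

def minkowski_gap(x, y, p):
    """||x||_p + ||y||_p - ||x + y||_p, which should be nonnegative."""
    s = [a + b for a, b in zip(x, y)]
    return p_norm(x, p) + p_norm(y, p) - p_norm(s, p)

def smallest_gap(p, trials=1000, n=5, seed=0):
    """Smallest Minkowski gap observed over random vectors in [-1, 1]^n."""
    rng = random.Random(seed)
    return min(
        minkowski_gap([rng.uniform(-1, 1) for _ in range(n)],
                      [rng.uniform(-1, 1) for _ in range(n)], p)
        for _ in range(trials)
    )

if __name__ == "__main__":
    print(minkowski_gap([3.0, 0.0], [0.0, 4.0], 2.0))
    print(smallest_gap(1.5), smallest_gap(3.0))
```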
Taking the infimum of the left-hand side over x ∈ E, and then taking the supremum of the
right-hand side over φ ∈ Y establishes the weak duality inequality p ≥ d.
(ii) (This part is for the subdifferential sum rule). Let x ∈ E and suppose φ ∈ ∂f (x) and
Λ ∈ ∂g(Ax). Then for any v ∈ E we have
(iii) Fix u ∈ Y . Then f (x) + g(Ax + u) < ∞ for some x ∈ X if and only if there exist x ∈ dom f
such that Ax+u ∈ dom g if and only if u ∈ dom g−A dom f . Thus dom h = dom g−A dom f .
To check the convexity of h, suppose u, v ∈ dom h and let α, β be any numbers such that
h(u) < α and h(v) < β. Choose x1 ∈ E such that f (x1 ) + g(Ax1 + u) < α and x2 ∈ E such
that f (x2 ) + g(Ax2 + v) < β. Then for any 0 ≤ λ ≤ 1, we have
(iv) Let x0 ∈ dom f be such that Ax0 ∈ cont g. Let y0 = Ax0 . Because g is continuous at y0 ,
this implies y0 + rBY ⊂ dom g for some r > 0. Therefore,
Part (b).
(i) First, inclusion was completed in (a)(ii) above. Conversely suppose φ ∈ ∂(f + g ◦ A)(x̄).
Applying the Fenchel–Young inequality (2.3.1), and then applying the Fenchel duality
theorem (2.3.4), we obtain
\[
\begin{aligned}
f(\bar x) + g(A\bar x) - \langle \phi, \bar x\rangle &= \inf_{x\in E}\{f(x) + g(Ax) - \langle \phi, x\rangle\} = \inf_{x\in E}\{(f - \phi)(x) + g(Ax)\} \\
&= -(f - \phi)^*(A^*\bar\phi) - g^*(-\bar\phi),
\end{aligned}
\]
where φ̄ ∈ Y is a point where d in the Fenchel duality theorem (2.3.4) is attained. Therefore,
\[ (f - \phi)(\bar x) - \langle A^*\bar\phi, \bar x\rangle + g(A\bar x) - \langle -\bar\phi, A\bar x\rangle = -(f - \phi)^*(A^*\bar\phi) - g^*(-\bar\phi) \]
and by the Fenchel–Young inequality (2.3.1), A*φ̄ ∈ ∂(f − φ)(x̄) and −φ̄ ∈ ∂g(Ax̄). The first
inclusion implies A*φ̄ + φ ∈ ∂f(x̄), and using the second inclusion we check
\[ \langle -A^*\bar\phi, u - \bar x\rangle = \langle -\bar\phi, A(u - \bar x)\rangle \le g(Au) - g(A\bar x) \quad\text{for all } u \in E; \]
(ii) The previous part has proved the ‘only if’ assertion, and we can essentially reverse our steps
to deduce the ‘if’ assertion.
2.3.13. Suppose f : E → R has Lipschitz constant k. Suppose φ ∈ E and ‖φ‖ > k. Choose
x0 ∈ E with ‖x0‖ = 1 and φ(x0) > k. Then φ(tx0) − f(tx0) → ∞ as t → ∞. So φ ∉ dom f*.
For the converse, assume dom f* ⊂ kBE is not empty, then f is bounded below by φ − a where
φ ∈ dom f* and a = f*(φ). (Using relative interior properties, one knows that the domain of
the subdifferential of a proper convex function on E is nonempty, and hence the domain of the
conjugate is not empty; see Theorem 2.4.8.) Then if dom f ≠ E, one can find yn ∈ dom f* such
that ‖yn‖ → ∞: for example, letting fn := φ − a + n dC where C := dom f, one has fn ≤ f, but
for x ∉ C, and y ∈ ∂fn(x), one has ‖y‖ ≥ n − ‖φ‖. Since y ∈ dom f*, this yields a contradiction.
Thus dom f = E and thus f is continuous. In the event f is not k-Lipschitz, one can choose
u, v ∈ E such that f(v) − f(u) > k‖u − v‖. Now let y ∈ ∂f(v). Then
\[ \langle y, u - v\rangle < -k\|u - v\| \]
and so ‖y‖ > k, but y ∈ dom f*, which is a contradiction.
This result fails if f is not convex: for example, consider f(x) := √|x| on R. Then dom f* = {0},
(g) Observe that
As in the proof of (f), the convolution has Lipschitz constant 1 because the norm has
Lipschitz constant 1.
Further notes. One may prefer a more explicit argument in (f). Indeed, once we have established
(f □ g) is real-valued, suppose (f □ g)(x̄) = t̄. Given ε > 0, choose y ∈ E such that f(y) + g(x̄ − y) <
t̄ + ε. Then for any h ∈ E,
From here, local/global Lipschitz properties of the convolution then follow directly from the
local/global Lipschitz properties of f (this argument works just as well in any normed linear
space irrespective of the dimension).
2.3.15. For (a) see the proof of Lemma 4.4.15; for (b) see the proof of Lemma 4.4.16; and for
(c), use (a) and (b), cf. Corollary 4.4.17.
(d) Observe that if f and g are closed, then (f* □ g*)* = f** + g** = f + g, where the first
equality follows from (a). Then (f + g)* = (f* □ g*). The result as stated follows because
cl(f + g) = cl f + cl g under the condition dom f ∩ cont g ≠ ∅ as we now sketch.
Clearly, cl f + cl g ≤ f + g and so cl f + cl g ≤ cl(f + g). Let x̄ ∈ dom(cl f + cl g). We will show
cl(f + g)(x̄) ≤ (cl f + cl g)(x̄). Fix v ∈ int dom g ∩ dom f and choose r > 0 so that g is bounded on
v + rBX ⊂ int dom g. Now choose un ∈ dom f with un → x̄ and f(un) → cl f(x̄). For 0 < λ < 1,
λx̄ + (1 − λ)(v + rBX) ⊂ int dom g. Because un → x̄, we fix λn → 1 so that
Now,
2.3.16. Suppose f : C → R is Lipschitz with Lipschitz constant k, and let f̃(x) := inf{f(y) +
k‖x − y‖ : y ∈ C}. For x0 ∈ C, taking y = x0 clearly shows f̃(x0) ≤ f(x0), hence f̃ ≤ f
on C. Also, fix x0 ∈ C. Then f̃(x) ≤ f(x0) + k‖x − x0‖, and so f̃(x) < ∞ for all x ∈ X.
Moreover, f(x) ≥ f(x0) − k‖x − x0‖ for x ∈ C. Now fix x ∈ X, then f(y) + k‖x − y‖ ≥
f(x0) − k‖y − x0‖ + k‖x − y‖ ≥ f(x0) − k‖x − x0‖ for all y ∈ C. This shows f̃(x) > −∞ for all
x ∈ X, i.e. f̃ is real-valued.
Suppose f̃(x0) < f(x0) for some x0 ∈ C. Then there exists y ∈ C such that f(y) + k‖x0 − y‖ <
f(x0). This violates the Lipschitz constant of f on C. Hence f̃(x) = f(x) for all x ∈ C. Similarly,
one can see that f̃ is globally Lipschitz with Lipschitz constant k. Indeed, suppose f̃(u) − f̃(v) >
k‖u − v‖ + ε for some u, v ∈ X and ε > 0. Choose x0 ∈ C such that f(x0) + k‖v − x0‖ ≤ f̃(v) + ε.
Then
\[ \tilde f(u) \le f(x_0) + k\|u - x_0\| \le f(x_0) + k\|v - x_0\| + k\|u - v\| \le \tilde f(v) + \varepsilon + k\|u - v\|, \]
which is a contradiction.
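The extension formula f̃(x) = inf{f(y) + k‖x − y‖ : y ∈ C} is simple to evaluate when C is a finite subset of R. The following Python sketch is an illustration under that assumption (finite C, absolute value as the norm); the name `lipschitz_extension` and the sample data are hypothetical.

```python
def lipschitz_extension(points, values, k):
    """Return f_tilde(x) = min over y in C of f(y) + k*|x - y|.

    `points` lists the elements of a finite set C in R, `values` the
    corresponding f(y), and k is a Lipschitz constant of f on C.
    """
    def f_tilde(x):
        return min(v + k * abs(x - y) for y, v in zip(points, values))
    return f_tilde

if __name__ == "__main__":
    C = [0.0, 1.0, 3.0]
    fC = [0.0, 1.0, 0.5]          # 1-Lipschitz on C
    f_tilde = lipschitz_extension(C, fC, 1.0)
    print([f_tilde(y) for y in C])   # agrees with f on C
    print(f_tilde(2.0))              # a value off C
```

On C the extension reproduces f, and elsewhere it is a minimum of k-Lipschitz functions, hence k-Lipschitz, matching the two claims proved above.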
2.3.17. (a) Suppose x and y are global minimizers of f . Then f (λx + (1 − λ)y) ≥ f (x) =
λf (x) + (1 − λ)f (y) for 0 < λ < 1. Because f is strictly convex, x = y.
(b) It suffices to show this for y = 0, and it suffices to show
\[ (2)\qquad f\left(\frac{u+v}{2}\right) < \frac12 f(u) + \frac12 f(v) \quad\text{for all distinct } u, v \in E. \]
Indeed, if f(λu + (1 − λ)v) = λf(u) + (1 − λ)f(v) for some 0 < λ < 1 and distinct u and v,
then by the convexity of f equality holds for all 0 < λ < 1. We now check that (2) is an easy
consequence of the parallelogram identity:
\[
\begin{aligned}
f\left(\frac{u+v}{2}\right) &= \frac12\left\|\frac{u+v}{2}\right\|^2 = \frac14\|u\|^2 + \frac14\|v\|^2 - \frac18\|u - v\|^2 \\
&< \frac12 f(u) + \frac12 f(v) \quad\text{when } u \ne v.
\end{aligned}
\]
(c) (i) Let y ∈ E and define f by f(x) := ½‖x − y‖² + δC. The strict convexity of f follows
from (b), and f attains its minimum on C because any minimizing sequence is bounded and a
convergent subsequence converges to the unique, by part (a), minimizer.
Now let y ∈ E, and suppose ȳ ∈ C satisfies ⟨y − ȳ, x − ȳ⟩ ≤ 0 for all x ∈ C. Then for x ∈ C
\[
\begin{aligned}
\|y - x_\lambda\|^2 &= \langle y - x_\lambda, y - x_\lambda\rangle \\
&= \langle y - \bar y - \lambda(x - \bar y), y - \bar y - \lambda(x - \bar y)\rangle \\
&= \|y - \bar y\|^2 - 2\lambda\langle y - \bar y, x - \bar y\rangle + \lambda^2\|x - \bar y\|^2 \\
&= \|y - \bar y\|^2 - \lambda\left[2\langle y - \bar y, x - \bar y\rangle - \lambda\|x - \bar y\|^2\right].
\end{aligned}
\]
For λ > 0 sufficiently small, the term in the brackets is positive and then ‖y − ȳ‖² > ‖y − xλ‖²,
and so ȳ ≠ PC(y).
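The characterization ⟨y − ȳ, x − ȳ⟩ ≤ 0 for all x ∈ C of the nearest point ȳ = PC(y) can be illustrated on a set where the projection has a closed form. The sketch below assumes C is the box [0, 1]^n, for which the projection is coordinatewise clipping; the helper names are hypothetical and not taken from the text.

```python
def project_box(y, lo=0.0, hi=1.0):
    """Nearest point to y in the box [lo, hi]^n (coordinatewise clipping)."""
    return [min(max(t, lo), hi) for t in y]

def inner(u, v):
    """Euclidean inner product of two lists of equal length."""
    return sum(a * b for a, b in zip(u, v))

def variational_gap(y, x):
    """<y - P(y), x - P(y)> for the box projection P; should be <= 0 for x in the box."""
    ybar = project_box(y)
    return inner([a - b for a, b in zip(y, ybar)],
                 [a - b for a, b in zip(x, ybar)])

if __name__ == "__main__":
    y = [2.0, -1.0, 0.5]
    print(project_box(y))
    corners = [[a, b, c] for a in (0.0, 1.0) for b in (0.0, 1.0) for c in (0.0, 1.0)]
    # The gap is affine in x, so checking the corners covers the whole box.
    print(max(variational_gap(y, x) for x in corners))
```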
(ii) Let x̄ ∈ C. Then d ∈ NC (x̄) if and only if d ∈ ∂δC (x̄) if and only if
Indeed, expanding and rearranging using the inner product, the left-hand side of (3) is equal to
and by part (i) the last two inner products in (4) are less than or equal to 0 which provides the
desired conclusion.
(d) For example, if S = {−1, 1}, then PS(0) is multi-valued, and lim_{x→0+} PS(x) = {1} while
lim_{x→0−} PS(x) = {−1}, so there is no single-valued selection of PS that is continuous at 0.
2.3.20. (a) Let f(x) := ‖x‖²/2 + δS(x). The proof of Fact 4.5.6 shows
\[ \frac12 d_S^2(y) = \frac{\|y\|^2}{2} - f^*(y), \]
or d_S²(·) = ‖·‖² − 2f*(·) is a difference of convex functions as desired.
(b) dC = ‖·‖ □ δC and thus d_C* = (‖·‖)* + δ_C* = δ_{BE} + σC.
(c) Let x ∈ C. By Fenchel–Young (Proposition 2.3.1), φ ∈ ∂dC(x) if and only if d_C*(φ) =
φ(x) − dC(x) if and only if δ_{BE}(φ) + σC(φ) = φ(x) − dC(x) = φ(x) if and only if φ ∈ BE and
φ(x) = σC(φ) if and only if φ ∈ ∂δC(x) and φ ∈ BE if and only if φ ∈ NC(x) and φ ∈ BE.
(d) Suppose x ∉ C. Then φ ∈ ∂dC(x) if and only if φ ∈ BE and
For x ∉ C, the chain rule implies ∇d_C²(x) = 2dC(x)∇dC(x) = 2(x − PC(x)) (by part (c)). Thus
(e) follows.
2.3.21. Let D := {x ∈ E : Ax = b}. Let x ∈ D, then φ ∈ ∂δD(x) if and only if φ(y − x) ≤ 0 for
all y ∈ D if and only if φ(u) ≤ 0 for all u ∈ ker A if and only if φ(u) = 0 for all u ∈ ker A.
Suppose φ ∈ A*Y, that is φ = A*y for some y ∈ Y. Fix u ∈ D, and let v ∈ D be arbitrary, then
\[ \langle A^*y, \bar x\rangle = \langle y, A(h + \phi(\bar x)x_0)\rangle = \langle y, Ah\rangle + \phi(\bar x)\langle y, Ax_0\rangle = \phi(\bar x). \]
(a) Suppose x̄ is a local minimizer as specified. Then ∇f(x̄) ∈ ∂δD(x̄) and so ∇f(x̄) ∈ A*Y.
(b) Suppose ∇f (x̄) ∈ A∗ Y and f is convex. Then ∇f (x̄) ∈ ∂δD (x̄), and because f is convex, x̄
is a global minimizer of f |D .
2.4.1. Part(a).
(i) As in the proof of the Fenchel duality theorem (2.3.4), let h : Y → [−∞, +∞] be defined by
then h is convex and 0 ∈ core(dom g − A dom f ) = dom h, thus by the max formula (2.1.19)
∂h(0) is not empty, so we let −φ ∈ ∂h(0). Now for all x ∈ E and u ∈ Y with u = Av where
v ∈ E, we have
Define
(ii) With notation as in the Fenchel duality theorem (2.3.4), observe p ≥ 0 because f (x) ≥
−g(Ax), and then the Fenchel duality theorem (2.3.4) says d = p and because the supremum
in d is attained, we choose φ ∈ Y such that
\[
\begin{aligned}
0 \le p &= -f^*(A^*\phi) - g^*(-\phi) \\
&\le [f(x) - \langle \phi, Ax\rangle] + [g(y) + \langle \phi, y\rangle] \quad\text{for all } x \in X,\ y \in Y,
\end{aligned}
\]
where the second inequality is a direct consequence of the definitions of f*(A*φ) and g*(−φ).
Then for any z ∈ E, setting y = Az, in the previous inequality, we deduce a ≤ b where a
and b are as in (5). Now choose r ∈ [a, b] and let α(x) = hA∗ φ, xi + r.
(iii) The inclusion is straightforward (Exercise 2.3.12 (a)(ii)), so we prove the reverse
inclusion. Suppose φ ∈ ∂(f + g ◦ A)(x̄). Because shifting by a constant does not change the
subdifferential, we may assume without loss of generality that
attains its minimum of 0 at x̄. According to the sandwich theorem (2.4.1) there exists an
affine function α := ⟨A*y, ·⟩ + r with −y ∈ ∂g(Ax̄) such that
(iv) Let the notation be as in the Hahn–Banach extension theorem (2.1.18). Then −g ≤ p where
g = −f +δS . Because p is everywhere continuous, we can apply the sandwich theorem (2.4.1)
to find an affine mapping α such that −g ≤ α ≤ p, that is f ≤ α ≤ p. Then α = α(0) + φ
where φ ∈ E. We know α(0) ≥ f (0) = 0 and so φ ≤ p. We claim φ(s) = f (s) for all s in the
linear subspace S. Indeed, if this were not true, then f(x0) − φ(x0) ≠ 0 for some x0 ∈ S.
Then choose k ∈ R so that k(f (x0 ) − φ(x0 )) > α(0). This implies f (kx0 ) > φ(kx0 ) + α(0)
which is impossible. Thus φ|S = f as desired.
Part (b). Several connections were outlined in (a) and earlier, for now we’ll derive a couple
additional easy relations, to sketch one complete circle.
(i) Suppose the subdifferential sum rule is valid, and x0 ∈ core dom f where f : E → (−∞, +∞]
is convex. Then E = ∂(f + δ{x0 } )(x0 ) = ∂f (x0 ) + ∂δ{x0 } (x0 ) and so ∂f (x0 ) is not empty.
(ii) Suppose the subdifferential of a convex function on E at a point of continuity is not empty.
Now consider a linear function f on a subspace Y of E and f |Y ≤ p for some sublinear
function p on E. Consider h = f □ p. Then h ≤ p, h|Y = f, h is continuous with h(0) =
0. Consider φ ∈ ∂h(0). Then φ|Y = f, and φ ≤ h. Thus the Hahn–Banach extension
theorem (2.1.18) follows.
So one of the circles of implications we have sketched is: Hahn–Banach extension ⇒ max formula
⇒ Fenchel duality theorem ⇒ sandwich theorem ⇒ subdifferential sum rule ⇒ nonemptiness of
subdifferential ⇒ Hahn–Banach extension theorem. The proofs of the respective implications are
given in: the proof of the max formula (2.1.19), the proof of the Fenchel duality theorem (2.3.4),
Part (a)(ii), Part (a)(iii), Part (b)(i), Part (b)(ii).
Further notes. (I) The Fenchel duality and the sandwich theorems are most easily visualized and
understood in the classical case Y = E where A is the identity map, and yet still very powerful.
In this case, the primal and dual problems are:
\[ p := \inf_{x\in E}\{f(x) + g(x)\} \quad\text{and}\quad d := \sup_{y\in E}\{-f^*(y) - g^*(-y)\}. \]
When x = 0 and y = 0, (6) implies r ≥ p, and (6) further implies Λ(x) − φ(Ax) ≥ p − r for all
x ∈ E, and so Λ = A∗ φ. Lastly, we rewrite the left inequality of (6) as
and take the infimum over y ∈ Y and then over x ∈ E to deduce −g ∗ (φ) − f ∗ (A∗ φ) ≥ p which
together with the weak duality inequality provides the result.
2.4.3. (a) Fix ε > 0 and let x̄ ∈ cl C. Then there exists a sequence (xn) ⊂ C such that xn → x̄.
Fix n0 such that ‖xn0 − x̄‖ < ε. Then x̄ ∈ xn0 + εBE ⊂ C + εBE.
(b) Let x + y ∈ D + F where x ∈ D, y ∈ F. Because D is open, we choose ε > 0 so that
x + εBE ⊂ D. Then x + y + εBE ⊂ D + F. Therefore, x + y ∈ int(D + F) and so D + F is open.
(c) Let x ∈ int C and choose ε > 0 so that x + εBE ⊂ C. By part (a), for each 0 < λ < 1,
cl C ⊂ C + (ελ/(1 − λ))BE. Therefore,
\[
\begin{aligned}
\lambda x + (1-\lambda)\operatorname{cl} C &\subset \lambda x + (1-\lambda)\left(C + \frac{\varepsilon\lambda}{1-\lambda}B_E\right) \\
&= \lambda(x + \varepsilon B_E) + (1-\lambda)C \subset C,
\end{aligned}
\]
and by part (b), the sum on the left hand side is open, and so
as desired.
(d) Because int C ⊂ cl C, the previous inequality implies (trivially) λx + (1 − λ)y ∈ int C for all
x, y ∈ int C and 0 < λ < 1, and so int C is convex.
(e) For any fixed x ∈ int C, λx + (1 − λ) cl C ⊂ int C. Letting λ → 0+ , we deduce that
cl C ⊂ cl(int C).
Clearly this can fail without convexity. For example, let S := Q ∪ (0, 1) ⊂ R. Then cl S = R,
but cl(int S) = [0, 1].
Further notes: in any normed linear space it is straightforward to show the interior of a convex
set is convex: Let x, y ∈ int C, and choose r > 0 so that x + rBX ⊂ C and y + rBX ⊂ C. Then
for 0 ≤ λ ≤ 1, one has
λ(x + rBX ) + (1 − λ)(y + rBX ) ⊂ C.
Then, λx + (1 − λ)y + rBX ⊂ C and λx + (1 − λ)y ∈ int C as desired. It is also easy to see that
the closure of a convex set is convex (see solution to Exercise 2.4.8). See also [383, Theorem 1.13]
for more.
2.4.4. (a) Let (Ai)i∈I be a collection of affine sets. Let A = ⋂_{i∈I} Ai, and let x, y ∈ A and λ ∈ R.
Then x, y ∈ Ai for each i ∈ I and so λx+(1−λ)y ∈ Ai for each i ∈ I. Therefore, λx+(1−λ)y ∈ A,
and so A is affine.
(b) Suppose A is a nonempty affine subset of E. Fix x0 ∈ A. We claim that Y := A − x0 is
linear. Indeed, for α, β ∈ R and x, y ∈ Y we have x = u − x0 and y = v − x0 where u, v ∈ A,
and
\[ \alpha x + \beta y + x_0 = \alpha u - \alpha x_0 + \beta v - \beta x_0 + 1x_0 \in A \]
because α − α + β − β + 1 = 1 (see part (c)). Therefore, αx + βy ∈ A − x0, and A − x0 is a linear
subspace. Conversely, if Y is linear, and x0 ∈ E, then A := Y + x0 is affine. Indeed, for x, y ∈ Y,
and α + β = 1, we have
\[ \alpha(x + x_0) + \beta(y + x_0) = \alpha x + \beta y + x_0 \in A, \]
which completes the proof of (b). (Alternatively, one can directly deduce this from Lemma 2.4.5.)
(c) This part follows immediately from Lemma 2.4.5, which shows aff D = x + span(D − x) for
any x ∈ D. Indeed, suppose x1, x2, . . . , xm ∈ D. Then for Σ_{i=1}^m λi = 1, we have
\[ \sum_{i=1}^m \lambda_i x_i = x + \sum_{i=1}^m \lambda_i(x_i - x) \in \operatorname{aff} D. \]
2.4.5. (a) Consider C1 = [0, 1] × {0} and C2 = [0, 1] × [0, 1] as subsets of R2 . Then ri C1 =
(0, 1) × {0} while ri C2 = (0, 1) × (0, 1). Thus C1 ⊂ C2, but ri C1 ⊄ ri C2.
(b) By translating C, we may assume that 0 ∈ C, then aff C = Y is a linear space, and so ri C
is the interior of C relative to Y , thus we may apply Exercise 2.4.3 using Y as the overspace to
derive the conclusion.
(c) (i) ⇒ (ii): Let x ∈ ri C. Then there exists r > 0 so that (x + rBE) ∩ aff C ⊂ C. In particular,
for y ∈ C, choosing ε > 0 so that ε‖y − x‖ < r, we have x + ε(x − y) ∈ C.
(ii) ⇒ (iii): Let Y := {λ(c − x) : λ ≥ 0, c ∈ C}. Certainly 0 ∈ Y. Moreover, let λ(c − x) ∈ Y.
Then αλ(c − x) ∈ Y if α ≥ 0. Suppose α < 0, then choose ε > 0 so that ε(x − c) + x ∈ C. Then
\[ \alpha\lambda(c - x) = |\alpha|\lambda(x - c) = \frac{|\alpha|\lambda}{\varepsilon}\,\varepsilon(x - c) = \frac{|\alpha|\lambda}{\varepsilon}\left[(\varepsilon(x - c) + x) - x\right], \]
and so Y is closed under scalar multiplication. We now show Y is closed under addition. Indeed,
for the nontrivial case λ1, λ2 > 0 we have
\[
\begin{aligned}
\lambda_1(c_1 - x) + \lambda_2(c_2 - x) &= (\lambda_1 + \lambda_2)\left(\frac{\lambda_1}{\lambda_1 + \lambda_2}c_1 + \frac{\lambda_2}{\lambda_1 + \lambda_2}c_2 - x\right) \\
&= (\lambda_1 + \lambda_2)(\bar c - x),
\end{aligned}
\]
2.4.6. By shifting f we may assume 0 ∈ dom f, and let Y = span(dom f). Let x ∈ ri(dom f); then
x is in the interior of the domain of f relative to Y. By the max formula (2.1.19), ∂f|Y(x) ≠ ∅,
that is, there exists φ ∈ Y such that
Now write E = Y +Z as a direct sum, and define φ̃ on E by φ̃(y +z) = φ(y). Because dom f ⊂ Y ,
it follows from (7) that φ̃ ∈ ∂f (x).
Notice that this result ensures the subdifferential of any proper convex function on E has
nonempty domain and range. This is because the domain of a convex function is convex, and
every nonempty convex subset of E has nonempty relative interior.
2.4.7. Suppose x0 ∈ dom f and ∂f (x0 ) = ∅. Because cl(ri dom f ) = cl(dom f ), there exists a
sequence (xn ) ⊂ ri(dom f ) converging to x0 , and hence by Exercise 2.4.6 there exist φn ∈ ∂f (xn ).
Suppose by way of contradiction that ‖φn‖ does not tend to ∞, hence it has a bounded subsequence, and then
by compactness a convergent subsequence. Thus we suppose φnj → φ. Then for y ∈ E, we have
\[ \phi(y) - \phi(x_0) = \lim_j \phi_{n_j}(y - x_{n_j}) \le \liminf_j\,[f(y) - f(x_{n_j})] \le f(y) - f(x_0), \]
and so φ ∈ ∂f(x0). This provides our desired contradiction. Thus ‖φn‖ → ∞. Furthermore,
Exercise 2.2.23 shows that ∂f (x0 ) is unbounded whenever it is not empty and x0 is in the boundary
of the domain of f . Thus ∂f is not bounded on any neighborhood of a boundary point of the
domain of f .
Further notes. Closedness is necessary as simple examples illustrate. Indeed, let f (t) := 0 if
t < 1, f (1) := 1 and f (t) := +∞ for t > 1. Then ∂f (1) = ∅ and ∂f (t) = {0} for all t < 1.
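The example in the note above can be checked numerically. The following is only a minimal sketch: the helper names f and violating_point are ours and do not come from the text. A slope s lies in ∂f(1) only if f(t) ≥ f(1) + s(t − 1) for every t, and for each candidate s the code produces a point t < 1 at which this inequality fails.

```python
import math

def f(t):
    """The non-closed convex function of the note: 0 for t < 1, 1 at t = 1, +inf beyond."""
    if t < 1:
        return 0.0
    if t == 1:
        return 1.0
    return math.inf

def violating_point(s):
    """Return some t < 1 with f(t) < f(1) + s*(t - 1), so s is not a subgradient at 1."""
    if s <= 0:
        return 0.0                    # f(0) = 0 < 1 - s
    return 1.0 - 1.0 / (2.0 * s)      # here f(1) + s*(t - 1) = 1/2 > 0 = f(t)

# Every candidate slope is rejected by its witness point.
for s in (-5.0, 0.0, 1.0, 10.0, 1e6):
    t = violating_point(s)
    assert t < 1 and f(t) < f(1) + s * (t - 1)
```

The witness points approach 1 as s grows, which mirrors the blow-up of subgradient norms near the boundary described in 2.4.7.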
2.4.8. Suppose f is proper. Fix x0 ∈ ri(dom f ) and let φ ∈ ∂f (x0 ). Then f (x) ≥ cl f (x) ≥
f (x0 ) + φ(x − x0 ) for all x ∈ X and so cl f is proper. One can verify cl f is convex because
its epigraph is convex as the closure of a convex set; that a closure of a convex set is convex
is elementary to verify. Indeed, suppose D = cl C where C is convex. Suppose x, y ∈ D, and
0 ≤ λ ≤ 1. Choose (xn ), (yn ) ⊂ C so that xn → x and yn → y. Then
2.4.9. The set C is closed by Carathéodory’s theorem (1.2.5) because it is the convex hull of a
compact set. The set of extreme points of C is not closed because (1, 0, 0) is not an extreme point
of C but every other point on the circle {(x, y, z) : x² + y² = 1, z = 0} is an extreme point of C.
2.4.10. (a) If (2.4.12) has a solution, then clearly (2.4.13) does not, so at most one of (2.4.12)
and (2.4.13) has a solution. Let $C := \{x \in E : x = \sum_{i=0}^m \lambda_i x_i,\ \lambda_i \ge 0,\ \sum \lambda_i = 1\}$.
Then C is a closed convex set. In the case 0 ∈ C, (2.4.12) has a solution. In the event 0 ∉ C, we apply
the basic separation theorem (2.1.21) to find x ∈ E so that sup_C x < ⟨x, 0⟩ = 0. In particular,
⟨xi, x⟩ < 0 for i = 0, 1, …, m and so x is a solution to (2.4.13).
(b) Clearly, (2.4.14) and (2.4.15) cannot simultaneously have solutions. Consider the cone
\[ C := \Bigl\{x : x = \sum_{i=1}^m \mu_i x_i,\ \mu_i \ge 0\Bigr\}. \]
Then C is convex and it is a finitely generated cone which is thus closed by Carathéodory’s
theorem (1.2.5). In the event c ∈ C, (2.4.14) has a solution. In the event c ∉ C, we apply the
basic separation theorem (2.1.21) to find x ∈ E so that ⟨x, c⟩ > sup_{u∈C} ⟨x, u⟩ = 0 (note sup_C x = 0
because 0 ∈ C, and if ⟨x, u⟩ > 0 for some u ∈ C, then nu ∈ C and so sup_C x ≥ n⟨x, u⟩ > ⟨x, c⟩ for n
sufficiently large). Therefore, (2.4.15) is satisfied by x because ⟨c, x⟩ > 0 and ⟨xi, x⟩ ≤ sup_C x = 0
for i = 1, …, m.
2.4.11. (a) Suppose $\{a_j\}_{j=1}^m$ is linearly dependent, and $x = \sum_{j=1}^m \mu_j a_j$ where μj ≥ 0 for j =
1, 2, …, m. We will show that x can be written in this form using at most m − 1 elements, from
which the first statement in (a) will follow.
Using the linear dependence we can write
\[ \lambda_1 a_1 + \lambda_2 a_2 + \dots + \lambda_m a_m = 0, \quad\text{where } \sum_{j=1}^m \lambda_j \ge 0 \]
as desired.
For the second statement in (a), let {aj : j ∈ J} be a linearly independent set, let
N := |J|, and define the linear mapping A : R^N → R^N by $A(c_1, c_2, \dots, c_N) = \sum_{i=1}^N c_i a_{j_i}$ where
J = {j1, j2, …, jN}. Then A is an isomorphism and so it maps closed sets onto closed sets; in
particular, $A(\mathbb{R}^N_+) = C_J$ is closed.
(b) A finitely generated cone is thus closed as a union of finitely many closed sets.
(c) This can be proved along the same lines as (a), but with more care: see p. 41–42 in L. D.
Berkovitz, Convexity and Optimization in R^n, Wiley, 2002. We will use the result of (a) to
derive (c). Indeed, suppose A ⊂ R^n, and suppose x ∈ conv A. Then there exist a1, …, am in A
such that $x = \sum_{i=1}^m \lambda_i a_i$ where λi ≥ 0 and $\sum_{i=1}^m \lambda_i = 1$. Now consider the cone C_I ⊂ R^n × R
generated by $\{(a_i, 1)\}_{i=1}^m$. Then $(x, 1) = \sum_{i=1}^m \lambda_i (a_i, 1) \in C_I$, and by part (a), (x, 1) ∈ C_J where
{(aj, 1)}_{j∈J} is linearly independent in R^n × R, and so |J| ≤ n + 1. Now $(x, 1) = \sum_{j\in J} \mu_j (a_j, 1)$
where μj ≥ 0 for j ∈ J. Consequently, $x = \sum_{j\in J} \mu_j a_j$ and $1 = \sum_{j\in J} \mu_j$, which shows the desired
result.
(d) Let A be a compact set in R^n and let f be a function from $\mathbb{R}^{(n+1)^2}$ to R^n defined by
$f(y, x_1, x_2, \dots, x_{n+1}) = \sum_{i=1}^{n+1} y_i x_i$ where y = (y1, y2, …, y_{n+1}) ∈ R^{n+1} and xi ∈ R^n. Then f is a
continuous function and conv A is the image of the compact set ∆ × A × A × … × A under the
mapping f where ∆ is the simplex in R^{n+1}.
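The reduction behind parts (a) and (c) can be sketched in code. This is only an illustration under stated assumptions: the function names null_vector and caratheodory are ours, exact rational arithmetic (fractions.Fraction) is an implementation choice, and the points are lifted to (a_i, 1) as in part (c). While the lifted points are linearly dependent, a dependence relation is used to drive at least one weight to zero.

```python
from fractions import Fraction

def null_vector(matrix):
    """Return a nonzero v with matrix @ v = 0 (Gauss-Jordan), or None if the columns are independent."""
    rows = [row[:] for row in matrix]
    r, c = len(rows), len(rows[0])
    pivots, pr = [], 0
    for col in range(c):
        piv = next((i for i in range(pr, r) if rows[i][col] != 0), None)
        if piv is None:
            continue
        rows[pr], rows[piv] = rows[piv], rows[pr]
        lead = rows[pr][col]
        rows[pr] = [a / lead for a in rows[pr]]
        for i in range(r):
            if i != pr and rows[i][col] != 0:
                factor = rows[i][col]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[pr])]
        pivots.append(col)
        pr += 1
        if pr == r:
            break
    free = [j for j in range(c) if j not in pivots]
    if not free:
        return None
    v = [Fraction(0)] * c
    v[free[0]] = Fraction(1)
    for i, pc in enumerate(pivots):
        v[pc] = -rows[i][free[0]]
    return v

def caratheodory(points, weights):
    """Reduce a convex combination (positive weights summing to 1) to at most n + 1 points."""
    active = [([Fraction(c) for c in p], Fraction(w))
              for p, w in zip(points, weights) if w > 0]
    while True:
        n = len(active[0][0])
        # Columns of this matrix are the lifted points (a_j, 1).
        matrix = [[p[k] for p, _ in active] for k in range(n)]
        matrix.append([Fraction(1)] * len(active))
        v = null_vector(matrix)
        if v is None:
            return [p for p, _ in active], [w for _, w in active]
        # The entries of v sum to 0 and v is nonzero, so some entry is positive.
        t = min(w / vj for (_, w), vj in zip(active, v) if vj > 0)
        active = [(p, w - t * vj) for (p, w), vj in zip(active, v)]
        active = [(p, w) for p, w in active if w > 0]

pts, w = caratheodory([(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)], [Fraction(1, 5)] * 5)
assert len(pts) <= 3 and sum(w) == 1
```

On the sample data in R² the routine keeps at most n + 1 = 3 of the five points, and the reduced combination still represents (1, 1).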
Let $a := \sum_{i\in I_1} a_i$ and let $\bar x := \sum_{i\in I_1} \frac{a_i}{a} x_i$. Then $\sum_{i\in I_1} \frac{a_i}{a} = 1$ and so x̄ ∈ C1, and it follows from
(8) that $\bar x = \sum_{i\in I_2} -\frac{a_i}{a} x_i$ and $\sum_{i\in I_2} -\frac{a_i}{a} = 1$. Consequently x̄ ∈ C2 as well, and we are done.
2.4.13. We first establish the case when I is finite (in this case we need not assume the sets Ci
are closed and bounded). The case |I| ≤ n + 1 is trivial, so we suppose |I| = n + 2 and that the
sets C1, C2, …, C_{n+2} are such that every subcollection of n + 1 or fewer sets has nonempty
intersection. For each 1 ≤ i ≤ n + 2, we fix $\bar x_i \in \bigcap_{j\in I,\, j\ne i} C_j$. In the case x̄_{j1} = x̄_{j2} for some
j1 ≠ j2, then $\bar x_{j_1} \in \bigcap_{i\in I} C_i$ and we are done. So we suppose the x̄i's are all distinct. According
to Radon’s theorem (1.2.3) we can partition I = I1 ∪ I2 so that D1 := conv{x̄i : i ∈ I1} and
D2 := conv{x̄i : i ∈ I2} have nonempty intersection, say x̄ ∈ D1 ∩ D2. Now x̄ ∈ D1 ensures
$\bar x \in \bigcap_{i\in I_2} C_i$ and x̄ ∈ D2 ensures $\bar x \in \bigcap_{i\in I_1} C_i$, and so $\bar x \in \bigcap_{1\le i\le n+2} C_i$ as desired.
Now suppose |I| = k > n + 2, and the assertion is true whenever |I| ≤ k − 1. The argument in
the previous paragraph shows every subcollection of n + 2 sets of {Ci}i∈I has nonempty
intersection. Now consider the collection D1 := C1 ∩ C2 and Di := C_{i+1} for i = 2, …, k − 1. Then
D1, D2, …, D_{k−1} is a collection of closed convex sets such that every subcollection of n + 1
or fewer sets has nonempty intersection. By the induction hypothesis, $\bigcap_{i=1}^{k-1} D_i$ is nonempty,
that is, $\bigcap_{i\in I} C_i$ is not empty as desired. By mathematical induction, the result is
true for every |I| ∈ N.
Now suppose {Ci}i∈I is as in the statement of Helly’s theorem. According to the previous
paragraph, every finite subcollection has nonempty intersection. By the finite intersection property
for compact sets, $\bigcap_{i\in I} C_i$ is not empty.
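The simplest instance of the finite case is n = 1, where the convex sets are intervals and the hypothesis concerns pairs (n + 1 = 2). The sketch below is an illustration only; the function names are ours. If all pairs of closed intervals meet, then the largest left endpoint does not exceed the smallest right endpoint, since otherwise the two intervals realizing those endpoints would be disjoint.

```python
from itertools import combinations

def pairwise_intersecting(intervals):
    """True if every two closed intervals [a, b] in the list have a common point."""
    return all(max(a1, a2) <= min(b1, b2)
               for (a1, b1), (a2, b2) in combinations(intervals, 2))

def common_point(intervals):
    """Return a point lying in every interval, or None if there is none."""
    lo = max(a for a, _ in intervals)
    hi = min(b for _, b in intervals)
    return lo if lo <= hi else None

intervals = [(0, 3), (1, 4), (2, 5), (2.5, 6)]
assert pairwise_intersecting(intervals)
print(common_point(intervals))  # prints 2.5
```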
2.4.20. Let m := inf_C f. Then f ≥ −g where g := δC − m. The conditions imply we can apply
the sandwich theorem (2.4.1) to find an affine function α such that
m − δC ≤ α ≤ f.
Then m ≤ inf_C α ≤ inf_C f as desired.
Suppose x̄ minimizes f on C, and write the affine separating function as α = φ + r. Then
φ ∈ ∂f (x̄) and −φ ∈ ∂δC (x̄) and so 0 = φ − φ ∈ ∂f (x̄) + NC (x̄) as desired.
Conversely, suppose 0 ∈ ∂f(x̄) + NC(x̄). Then 0 ∈ ∂(f + δC)(x̄), and so f + δC attains its
minimum at x̄. Therefore f attains its minimum on C at x̄.
Further notes. This last part could have been completed equally easily using the subdifferential
sum rule because
0 ∈ ∂(f + δC )(x̄) if and only if 0 ∈ ∂f (x̄) + ∂δC (x̄) if and only if 0 ∈ ∂f (x̄) + NC (x̄).
Further, the supremum is attained when finite according to the Fenchel duality theorem (2.3.4),
and so we have
(b) In case (i) when D is bounded, σD has full domain and AC is not empty and so (2.4.17)
holds. In case (ii), when A is surjective and 0 ∈ int C, then 0 ∈ int AC because A is open. Clearly
0 ∈ dom σD , and thus (2.4.17) holds in this case as well.
(c) When D is compact, (2.4.17) holds by part (b)(i). The compactness of D and C then
allows sup and inf to be replaced by max and min.
Therefore, K ⊂ (K⁻)⁻. Also, observe that for any set S, S⁻ is a closed convex set because it is
an intersection of closed half-spaces. In particular, K⁻⁻ is a closed convex set containing R₊K.
Now let C be the closed convex hull of R₊K, and suppose x0 ∉ C. By the basic separation
theorem (2.1.21), we choose φ ∈ E such that φ(x0) > sup_C φ. Observe that sup_C φ = 0 because
0 ∈ C, and if sup_C φ > 0, then there exist t > 0 and x̄ ∈ K such that φ(tx̄) > 0. Then
lim_{n→∞} φ(nx̄) = ∞, and so φ would not be bounded above by φ(x0) on C. Since sup_C φ = 0,
we have φ ∈ K⁻, and so x0 ∉ (K⁻)⁻, and thus K⁻⁻ ⊂ C as desired.
2.4.24. We will prove the assertions for D + C, the other case follows by considering D + (−C).
Suppose d1 + c1 , d2 + c2 ∈ D + C. Then for 0 ≤ λ ≤ 1, using the convexity of D and C we obtain
Now suppose $(d_n + c_n)_{n=1}^\infty$ ⊂ D + C and dn + cn → x̄. Let (c_{n_k}) be a convergent subsequence of
(cn), say, c_{n_k} → c̄ ∈ C. Then d_{n_k} → x̄ − c̄. Thus x̄ − c̄ ∈ D, and so x̄ ∈ D + C as desired.
\[ \limsup_{\|h\|\to 0} \frac{f(x+h) + f(x-h) - 2f(x)}{\|h\|} \ge \]
2.5.2. Let
\[ G_{n,m} := \Bigl\{ x \in U : \sup_{\|h\| \le 1/m} \bigl[f(x+h) + f(x-h) - 2f(x)\bigr] < \frac{1}{nm} \Bigr\}, \]
and $O_n := \bigcup_{m\ge 1} G_{n,m}$ and $G := \bigcap_{n\ge 1} O_n$. Suppose x ∈ G and let ε > 0. Choose n such that
1/n < ε. Now find m such that x ∈ G_{n,m}, and choose δ so that 0 < δ < 1/m. The convexity of
f implies
\[ \sup_{\|h\| = \alpha} \bigl[f(x+h) + f(x-h) - 2f(x)\bigr] < \frac{1}{n}\,\alpha \quad\text{whenever } 0 < \alpha \le \frac{1}{m}. \]
Indeed, for 0 < λ ≤ 1 we have
Exercise 2.5.1 implies that f is differentiable at x. Conversely, suppose f is differentiable at x,
and fix n ∈ N. Choose 0 < ε < 1/n and use Exercise 2.5.1 to find δ > 0 so that
Now we choose α > 0 so that 4Kα < ε and α < 1/m. Now suppose ‖u − x‖ < α; then because f
has Lipschitz constant K on B_{2/m}(x), for ‖h‖ < 1/m we have
\[ f(u+h) + f(u-h) - 2f(u) \le f(x+h) + f(x-h) - 2f(x) + 4K\|x-u\| < \frac{1}{mn}. \]
This shows On is open as desired.
2.5.3. Suppose f : R^n → R^m is locally Lipschitz. Let pj be the j-th coordinate projection from
R^m to R. Define fj = pj ◦ f. Then fj is locally Lipschitz. Let D = {x ∈ R^n : fj is differentiable
at x, j = 1, 2, …, m}. It follows from Rademacher’s theorem that D^c is a union of finitely many
null sets, and thus has measure 0. It remains to show that f is differentiable at each x ∈ D.
Indeed, fix x ∈ D and let A be the m by n matrix whose j-th row is ∇fj(x). It is not hard to
verify that ∇f(x) = A; indeed for ε > 0, choose δ > 0 so that, for i = 1, …, m,
\[ \left| \frac{f_i(x+th) - f_i(x)}{t} - \langle \nabla f_i(x), h\rangle \right| < \frac{\varepsilon}{\sqrt m}
   \quad\text{whenever } h \in S_{\mathbb{R}^n},\ 0 < |t| < \delta. \]
Then for h ∈ S_{R^n} and 0 < |t| < δ, we have
\[ \left\| \frac{f(x+th) - f(x)}{t} - Ah \right\|
   = \sqrt{\sum_{i=1}^m \left| \frac{f_i(x+th) - f_i(x)}{t} - \langle \nabla f_i(x), h\rangle \right|^2}
   < \sqrt{\sum_{i=1}^m \frac{\varepsilon^2}{m}} = \varepsilon. \]
2.5.4. Let ε > 0, and let h ∈ S_X. Let K > 0 be chosen so that f has Lipschitz constant K
in a neighborhood B_r(x̄) of x̄ and ‖y‖ ≤ K. Now fix k ∈ N with ‖h_k − h‖ < ε/(4K). Now choose
0 < δ < r so that
\[ |f(\bar x + t h_k) - f(\bar x) - \langle y, t h_k\rangle| < \frac{\varepsilon}{4}|t| \quad\text{whenever } 0 < |t| < \delta. \tag{10} \]
Now for |t| < δ we have
\[ |f(\bar x + th) - f(\bar x) - \langle y, th\rangle| \le |f(\bar x + t h_k) - f(\bar x) - \langle y, t h_k\rangle| + 2K\|th - t h_k\|
   \le \frac{\varepsilon}{2}|t| + 2K|t|\frac{\varepsilon}{4K} = \varepsilon|t|. \]
This shows f is Gâteaux differentiable at x̄ with ∇f (x̄) = y. So far, we didn’t use that X is finite-
dimensional. However, because X is finite-dimensional, and f is Lipschitz in a neighborhood of
x̄, Exercise 2.2.9 implies f is Fréchet differentiable at x̄ as desired.
Further comments. The reader will notice that the last part of the proof of Rademacher’s theorem
also proves this fact. Observe further that even the assertion that f is Gâteaux differentiable at x̄
may fail for continuous functions (the estimate with the Lipschitz constant above was crucial).
Indeed, on R² let f(x, y) = 0 whenever y ≤ √|x| and f(x, y) = y − √|x| for y ≥ √|x|. The
hypotheses of the exercise are satisfied at x̄ := (0, 0) for every direction h ∈ S_{R²} with y := (0, 0)
except for the direction h := (0, 1); however, f fails to be Gâteaux differentiable at (0, 0).
2.5.7. (a)⇒(b): g is almost everywhere differentiable by Theorem 2.5.1. On the other hand, the
subgradient inequality holds on U . Together, these facts imply (b).
(b)⇒(c): trivial.
(c)⇒(a): Fix u, v in U, u ≠ v, and t ∈ (0, 1). It is not hard to see that there exist sequences (un)
in U, (vn) in U, (tn) in (0, 1) with un → u, vn → v, tn → t, and xn := tn un + (1 − tn)vn ∈ A, for
every n. By assumption, ∇g(xn)(un − xn) ≤ g(un) − g(xn) and ∇g(xn)(vn − xn) ≤ g(vn) − g(xn).
Equivalently, (1 − tn)∇g(xn)(un − vn) ≤ g(un) − g(xn) and tn∇g(xn)(vn − un) ≤ g(vn) − g(xn).
Multiplying the former inequality by tn and the latter by 1 − tn, and adding, we obtain
g(xn ) ≤ tn g(un ) + (1 − tn )g(vn ). Now let n tend to +∞, and deduce that
or in other words
g(tu + (1 − t)v) ≤ tg(u) + (1 − t)g(v). The convexity of g follows and the proof is complete.
2.5.8. Let fn(x) := n[f(x + 1/n) − f(x)]. Let G := {x : f′(x) exists}; then fn(x) → f′(x) for
x ∈ G. Let Fn := {x : |fj(x) − fk(x)| ≤ 1/2 for all j, k ≥ n}. Then Fn is closed because it
is an intersection of closed sets, since fj − fk is continuous for each k, j ∈ N. Clearly (fn(x)) is
convergent for x ∈ G and so ⋃_n Fn ⊃ G. Suppose F_{n0} contains an open interval I for some n0 ∈ N.
Then |f_{n0}(x) − f′(x)| ≤ 1/2 almost everywhere on I. By the Fundamental theorem of calculus,
f′(x) = χ_S − χ_{R\S} almost everywhere, and so f′(x) = 1 on a dense subset of I and f′(x) = −1
on another dense subset of I. We conclude that f_{n0} ≥ 1/2 on a dense subset of I, and f_{n0} ≤ −1/2
on a dense subset of I, which contradicts the continuity of f_{n0}. Hence Fn is nowhere dense for each
n ∈ N and so G is a set of first category.
2.6.1. Using the bilinear property of φ, and then the symmetric property we compute
Therefore,
\[ \phi(x, y) = \tfrac12\bigl[\phi(x+y, x+y) - \phi(x, x) - \phi(y, y)\bigr]. \]
That is, φ is uniquely determined by the values φ(h, h) for h ∈ E.
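The identity can be spot-checked on a concrete form. The sketch below assumes, for illustration only, that φ is given by a matrix M through φ(x, y) = xᵀMy; the names phi and polarization are ours. With a symmetric M the identity holds exactly in integer arithmetic, while a nonsymmetric M shows that the symmetry hypothesis is used.

```python
def phi(M, x, y):
    """Bilinear form phi(x, y) = x^T M y."""
    return sum(x[i] * M[i][j] * y[j] for i in range(len(x)) for j in range(len(y)))

def polarization(M, x, y):
    """phi(x+y, x+y) - phi(x, x) - phi(y, y), i.e. twice the right-hand side of the identity."""
    s = [a + b for a, b in zip(x, y)]
    return phi(M, s, s) - phi(M, x, x) - phi(M, y, y)

x, y = [1, -2], [4, 5]

M = [[2, 1], [1, 3]]          # symmetric: the identity holds
assert 2 * phi(M, x, y) == polarization(M, x, y)

N = [[0, 1], [0, 0]]          # not symmetric: the identity fails
assert 2 * phi(N, x, y) != polarization(N, x, y)
```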
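2.6.3. (a) This part is essentially a restatement of the definitions. Indeed, assume that
\[ h \mapsto \frac{f(x+th) - f(x) - t\langle \nabla f(x), h\rangle}{\tfrac12 t^2} \]
converges uniformly on bounded sets to the function h ↦ ⟨Ah, h⟩ as t → 0 for some matrix A.
Then given ε > 0, choose δ > 0 so that
\[ \left| \frac{f(x+ty) - f(x) - t\langle \nabla f(x), y\rangle}{\tfrac12 t^2} - \langle Ay, y\rangle \right| < \varepsilon
   \quad\text{when } 0 < |t| < \delta,\ y \in S_X. \]
Now letting h := ty where y ∈ S_X and 0 < |t| < δ in the preceding implies
\[ f(x+h) = f(x) + \langle \nabla f(x), h\rangle + \tfrac12\langle Ah, h\rangle + o(\|h\|^2), \quad \|h\| \to 0, \]
as desired. The converse implication follows essentially by reversing the preceding steps. Indeed,
suppose f has a strong second-order Taylor expansion at x. Given ε > 0, choose δ > 0 so that
\[ \left| f(x+h) - f(x) - \langle \nabla f(x), h\rangle - \tfrac12\langle Ah, h\rangle \right| \le \tfrac12\varepsilon\|h\|^2 \quad\text{when } \|h\| < \delta. \]
Then given any r > 0, for y ∈ rB_X and |t| < δ/r, we have
\[ \left| f(x+ty) - f(x) - t\langle \nabla f(x), y\rangle - \tfrac12 t^2\langle Ay, y\rangle \right| \le \tfrac12\varepsilon\|ty\|^2 \le \tfrac12\varepsilon r^2|t|^2. \]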
(b) As in the proof of Theorem 2.6.1, let A be a symmetric matrix. By part (a), qt := ½∆²_t f(x)
converges pointwise to q(h) := ½⟨Ah, h⟩. According to Proposition 2.6.3, the functions qt are
closed and convex. Because they converge pointwise to q, it follows that q is convex. It follows
from Exercise 2.1.22 that the convergence is uniform on bounded sets.
2.6.4. (a) The definition of generalized Fréchet derivative implies φn → φ whenever φn ∈ ∂f (xn )
and xn → x. Then Corollary 2.5.3 implies ∂f (x) = {φ}. Thus f is Fréchet differentiable at x
with ∇f (x) = φ (Theorem 2.2.1).
(b) Suppose by way of contradiction that the definition of the generalized second-order Gâteaux
derivative at x works with distinct matrices A and B. Then we choose h ∈ S_X such that Ah ≠ Bh,
and set ε := ‖(B − A)h‖. Then choose δ > 0 so that
\[ \partial f(x+th) \subset \phi + A(th) + \frac{\varepsilon}{3}|t|B_E \quad\text{and}\quad
   \partial f(x+th) \subset \phi + B(th) + \frac{\varepsilon}{3}|t|B_E \quad\text{for } |t| < \delta. \]
Now for fixed 0 < t < δ, we let Λ ∈ ∂f(x + th) and write
\[ \Lambda = \phi + A(th) + y_1 \quad\text{and}\quad \Lambda = \phi + B(th) + y_2 \quad\text{where } \|y_1\|, \|y_2\| \le \frac{\varepsilon t}{3}. \]
Then ‖(A − B)(th)‖ ≤ 2εt/3, which contradicts ε := ‖(B − A)h‖.
(c) Suppose f has a generalized second-order Fréchet derivative at x. Then given ε > 0, there
exists δ > 0 so that
∂f (x + h) ⊂ ∇f (x) + Ah + o(khk)BE
as desired.
2.6.5. Suppose that for some matrix A : E → E, given any ε > 0 and bounded set W ⊂ E there
exists δ > 0 so that
∂f (x + u) ⊂ ∇f (x) + Au + δBE
and thus f has a generalized second-order Fréchet derivative at x.
Conversely suppose f has a generalized second-order derivative at x. Let W ⊂ E be bounded
and let ε > 0. Choose K > 0 so that W ⊂ KB_E. Using the definition of generalized derivative,
choose η > 0 so that
\[ \partial f(x+h) \subset \nabla f(x) + Ah + \frac{\varepsilon}{K}\|h\|B_E \quad\text{whenever } 0 < \|h\| < \eta. \]
Now set δ := η/K and suppose 0 < t < δ and h ∈ W. Then ‖th‖ < η and using the previous
inclusion we obtain
\[ \partial f(x+th) - \nabla f(x) \subset A(th) + \frac{\varepsilon}{K}\|th\|B_E; \]
and then dividing both sides by t and noting ‖h‖ ≤ K we obtain
as desired.
2.6.6. The subgradient inequality ensures ∆²_t f(x) is nonnegative. The convexity, closedness
and properness of ∆²_t f(x) follow because f possesses those properties. We next verify
$\partial\bigl[\tfrac12\Delta_t^2 f(x)\bigr] = \Delta_t[\partial f](x)$. Indeed, suppose $y \in \partial\bigl[\tfrac12\Delta_t^2 f(x)\bigr](h)$; then for u ∈ E,
\[ \langle y, u\rangle \le \frac{f(x+t(h+u)) - f(x) - t\langle \nabla f(x), h+u\rangle - [f(x+th) - f(x) - t\langle \nabla f(x), h\rangle]}{t^2}
   = \frac{f(x+t(h+u)) - f(x+th) - \langle \nabla f(x), tu\rangle}{t^2}. \]
Multiplying both sides of the previous inequality by t (note: t > 0), we obtain
2.6.8. First, $f'(0) = \lim_{h\to 0} \frac{h^3\cos(1/h) - 0}{h} = 0$ and f′(t) = 3t² cos(1/t) + t sin(1/t) when t ≠ 0, and
so f is continuously differentiable. Moreover,
\[ t^3\cos(1/t) = 0 + 0t + \tfrac12 0t^2 + o(t^2) = f(0) + f'(0)t + \tfrac12 0t^2 + o(t^2) \]
and so f has a second-order Taylor expansion at 0. However, f″ does not exist at 0 since the
difference quotient (f′(t) − f′(0))/t = 3t cos(1/t) + sin(1/t) has no limit as t → 0.
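The failure of f″(0) to exist for f(t) = t³ cos(1/t) can be observed numerically. The following sketch (function names fprime and quotient are ours) evaluates the difference quotient of f′ at 0 along two sequences tending to 0, chosen so that sin(1/t) equals 1 on one and −1 on the other; the two columns of output stay near 1 and −1 respectively, so no limit exists.

```python
import math

def fprime(t):
    """Derivative of f(t) = t^3 cos(1/t), with f'(0) = 0."""
    return 0.0 if t == 0 else 3 * t * t * math.cos(1 / t) + t * math.sin(1 / t)

def quotient(t):
    """Difference quotient (f'(t) - f'(0)) / t for t != 0."""
    return (fprime(t) - fprime(0.0)) / t

for k in (10, 100, 1000):
    t_plus = 1 / (2 * math.pi * k + math.pi / 2)       # sin(1/t) = 1 here
    t_minus = 1 / (2 * math.pi * k + 3 * math.pi / 2)  # sin(1/t) = -1 here
    print(k, quotient(t_plus), quotient(t_minus))
```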
Exercises from Section 2.7
2.7.1. Note that given a nonempty set S, conv(S) is the collection of convex combinations of
elements in S, that is, elements of the form $\sum_{i=1}^m \lambda_i s_i$ where m ∈ N, si ∈ S, $\sum_{i=1}^m \lambda_i = 1$ and
λi ≥ 0 for all 1 ≤ i ≤ m.
By the extreme value theorem, f attains its maximum at some x̄ ∈ C. By Minkowski’s theorem
(2.7.2), we may write $\bar x = \sum_{i=1}^m \lambda_i x_i$ where xi is an extreme point of C, λi ≥ 0 for 1 ≤ i ≤ m
and $\sum_{i=1}^m \lambda_i = 1$. By the convexity of f, $f(\bar x) \le \sum_{i=1}^m \lambda_i f(x_i)$. Because f attains its maximum at
x̄, this implies f(xi) = f(x̄) for each 1 ≤ i ≤ m with λi > 0.
2.7.2.(Exposed Points)
(a) Suppose x0 is an exposed point of a convex set C. Choose φ ∈ E such that ⟨φ, x0⟩ = sup_C φ
and ⟨φ, x⟩ < ⟨φ, x0⟩ for all x ∈ C \ {x0}. Let x, y ∈ C \ {x0}. Then for any 0 ≤ λ ≤ 1,
φ(λx + (1 − λ)y) = λφ(x) + (1 − λ)φ(y) < φ(x0 ).
Thus x0 is not a convex combination of x and y.
(b) Let C be a compact convex subset of E. Suppose x0 ∈ C is an exposed point, exposed by
φ ∈ E. Now suppose (xn) ⊂ C is a sequence such that φ(xn) → φ(x0), but xn ↛ x0. By
the compactness of C, there is a subsequence (x_{n_k}) of (xn) such that x_{n_k} → x̄ where
x̄ ≠ x0 and x̄ ∈ C. Then φ(x̄) = lim φ(x_{n_k}) = φ(x0). This contradicts the fact φ exposes x0
in C.
(c) A proof that the exposed points are dense in the extreme points of a compact convex (or
any closed convex set in E) can be found in [369, Theorem 18.6, p. 167–68], the proof uses
the basic separation theorem (2.1.21) and Carathéodory’s theorem (1.2.5). We will outline
another proof that every compact convex subset of E is the closed convex hull of its strongly
exposed points that mimics techniques that will be used in Section 6.6.
Indeed, let C be a compact convex subset of E. Then σC : E → R is a continuous convex
function. By Theorem 2.5.1, σC is differentiable on a dense subset of E. Now let D be
the closed convex hull of the exposed points of C. Suppose by way of contradiction that
D ≠ C, that is, we fix x̄ ∈ C \ D. According to the basic separation theorem (2.1.21),
we choose y ∈ E such that ⟨y, x̄⟩ > sup_D y. Because D is bounded, and the points of
differentiability of σC are dense in E, we may choose φ ∈ E such that σC is differentiable
at φ, and φ(x̄) > sup_D φ.
Now let x0 = ∇σC(φ) and so ∂σC(φ) = {x0}. It is easy to check that φ(x0) = σC(φ).
Indeed, ⟨x0, 2φ − φ⟩ ≤ σC(2φ) − σC(φ) = σC(φ) and ⟨x0, φ − 0⟩ ≥ σC(φ) − σC(0) = σC(φ).
Thus φ(x0) = σC(φ). Further, x0 ∈ C, for otherwise we would use the basic separation
theorem (2.1.21) to find Λ ∈ E so that Λ(x0) > σC(Λ). This provides the immediate
contradiction ⟨x0, Λ − φ⟩ > σC(Λ) − σC(φ). Finally, if u ∈ C is such that φ(u) = σC(φ), then
⟨u, Λ − φ⟩ ≤ σC(Λ) − σC(φ) for all Λ ∈ E
and so u ∈ ∂σC(φ) = {x0}. Thus φ attains its supremum on C uniquely at x0, and this
yields the contradiction that x0 is an exposed point of C which is not in D.
Further notes. The set C := {(x, y) ∈ R² : −1 ≤ x ≤ 1, −√(1 − x²) ≤ y ≤ 1 + √(1 − x²)} is a
compact convex set that is not the convex hull of its exposed points. Indeed, any exposed point
(x, y) of C satisfies |x| < 1. Thus the closure is needed in (c).