Solutions To Selected Exercises in Chapter Two

This document provides solutions to selected exercises from Chapter 2. It includes solutions for determining subdifferentials, properties of convex and quasi-convex functions, Jensen's inequality, and properties of support functions. The solutions involve proofs using definitions and properties of convex functions, such as subdifferentials, epigraphs, and subgradients.

Uploaded by

EDU CIPANA
Solutions to Selected Exercises in Chapter Two

Exercises from Section 2.1

2.1.1. See [410, Theorem 4.43].

2.1.2. For $x > x_0$, $x \in I$, the three-slope inequality (2.1.1) implies
\[
\frac{f(x) - f(x_0)}{x - x_0} \ge f'_+(x_0).
\]
For $x < x_0$, $x \in I$, the three-slope inequality (2.1.1) implies
\[
\frac{f(x) - f(x_0)}{x - x_0} \le f'_-(x_0).
\]
Hence $f(x) - f(x_0) \ge \lambda(x - x_0)$ for all $x \in I$ whenever $f'_-(x_0) \le \lambda \le f'_+(x_0)$. Considering $x > x_0$, the three-slope inequality (2.1.1) implies $f'_+(x_0) = \sup\{\lambda : \lambda(x - x_0) \le f(x) - f(x_0) \text{ for all } x \in I\}$, and by the above, $f'_+(x_0)$ is the maximum such $\lambda$.

2.1.3. Suppose T : E → F is a linear mapping, and let A := y0 + T where y0 ∈ F . Then for any
λ ∈ R, we have

A(λx + (1 − λ)y) = T (λx + (1 − λ)y) + y0


= λ(T x) + (1 − λ)(T y) + y0
= λ(T x + y0 ) + (1 − λ)(T y + y0 )
= λAx + (1 − λ)Ay

For the converse, suppose A : E → F is an affine mapping. Let y0 = A(0), T (x) := Ax − y0 and
k ∈ R. Then

T (kx) = A(kx) − y0 = A(kx + (1 − k)0) − y0


= kA(x) + (1 − k)A(0) − y0 = kA(x) − ky0 = kT (x).

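Furthermore,
\[
\begin{aligned}
T(x + y) &= A(x + y) - y_0 = A\left(\tfrac{1}{2}(2x) + \tfrac{1}{2}(2y)\right) - y_0 \\
&= \tfrac{1}{2}A(2x) + \tfrac{1}{2}A(2y) - y_0 = \tfrac{1}{2}[A(2x) - y_0] + \tfrac{1}{2}[A(2y) - y_0] \\
&= \tfrac{1}{2}T(2x) + \tfrac{1}{2}T(2y) = T(x) + T(y).
\end{aligned}
\]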

2.1.4. Suppose $f$ is positively homogeneous and subadditive. Then for $x, y \in X$, $\alpha \ge 0$ and $\beta \ge 0$, we have
\[
f(\alpha x + \beta y) \le f(\alpha x) + f(\beta y) = \alpha f(x) + \beta f(y).
\]
This shows $f$ is sublinear. Conversely, suppose $f$ is sublinear. Then clearly it is subadditive by choosing $\alpha = 1$ and $\beta = 1$ in the previous inequality. Now $f(0) \le 0 \cdot f(x) + 0 \cdot f(-x) = 0$, and $f(0) \le f(0) + f(0)$ implies $f(0) \ge 0$, and so $f(0) = 0$. When $\lambda = 0$, we have $0 = f(0) = f(\lambda x) = \lambda f(x)$. For $\lambda > 0$,
\[
f(\lambda x) = f(\lambda x + 0) \le \lambda f(x) + 1 \cdot f(0) = \lambda f(x) = \lambda f(\lambda^{-1}\lambda x) \le \lambda \cdot \lambda^{-1} f(\lambda x),
\]
where the second inequality follows from the first. Hence equality holds throughout, and $f(\lambda x) = \lambda f(x)$.

2.1.5. Suppose f is a convex function. Consider the set F = {x ∈ E : f (x) ≤ α}. In the case F
is not empty, let u, v ∈ F . Then for 0 ≤ λ ≤ 1,

f (λu + (1 − λ)v) ≤ λf (u) + (1 − λ)f (v) ≤ λα + (1 − λ)α = α.

Thus λu + (1 − λ)v ∈ F whenever 0 ≤ λ ≤ 1, and so f is quasi-convex. Analogously, suppose


u, v ∈ dom f . Then we can choose α ∈ R so that max{f (u), f (v)} ≤ α. It follows from the
previous reasoning that λu + (1 − λ)v ∈ dom f whenever 0 ≤ λ ≤ 1 and so dom f is convex.

2.1.6. Observe that a pointwise supremum of convex functions is convex, because its epigraph is the intersection of the epigraphs of those functions, and an intersection of convex sets is convex. Notice that the supremum need not be proper. Moreover, its epigraph will be closed if all of the functions are closed.
(a) Suppose f : X → [−∞, +∞] is convex. If f ≡ +∞, then epi f = ∅ is convex. Otherwise, let
(x, t), (y, s) ∈ epi f . Then for 0 ≤ λ ≤ 1 we have

f (λx + (1 − λ)y) ≤ λf (x) + (1 − λ)f (y) ≤ λt + (1 − λ)s.

Therefore, λ(x, t) + (1 − λ)(y, s) ∈ epi f as desired.


Conversely, suppose epi f is convex. If epi f = ∅, then f ≡ +∞ is convex. Otherwise, suppose
f(x), f(y) < +∞. Then (x, t), (y, s) ∈ epi f whenever f(x) < t and f(y) < s. Thus for 0 ≤ λ ≤ 1,
we have λ(x, t) + (1 − λ)(y, s) ∈ epi f. This implies

f(λx + (1 − λ)y) ≤ λt + (1 − λ)s,

for all t > f(x) and s > f(y). It follows that f is convex. (Hence this confirms that we can define
improper convex functions via their epigraphs, as remarked earlier.)
(c) Let x, y ∈ E, and 0 ≤ λ ≤ 1. Then

m(g(λx + (1 − λ)y)) ≤ m(λg(x) + (1 − λ)g(y)) ≤ λm(g(x)) + (1 − λ)m(g(y)),

as desired.
For a nice alternative geometric approach and explanation to (d), the reader is encouraged to
consult [255, Proposition 2.2.1]. For the converse, the convexity of g follows by restricting to the
case t = 1.

2.1.7. It is clear that if $x_0 \in \operatorname{int} C$, then $x_0 \in \operatorname{core} C$. For the converse, suppose $C$ is convex and $x_0 \in \operatorname{core} C$. Then there exist $\delta_i > 0$ so that $x_0 + te_i \in C$ for all $|t| \le \delta_i$ and $i = 1, 2, \ldots, n$, where $\{e_i\}_{i=1}^n$ is the usual basis of $\mathbb{R}^n$. Now let $\delta = \min\{\delta_1, \delta_2, \ldots, \delta_n\}$. Because $C$ is convex, it follows that $x_0 + h \in C$ whenever $h = a_1e_1 + a_2e_2 + \cdots + a_ne_n$ where $|a_i| \le \delta/n$ for $i = 1, 2, \ldots, n$. Consequently, $x_0 \in \operatorname{int} C \ne \emptyset$. A conventional example of a nonconvex set $F \subset \mathbb{R}^2$ with $(0, 0) \in \operatorname{core} F \setminus \operatorname{int} F$ is $F = \{(x, y) \in \mathbb{R}^2 : |y| \ge x^2 \text{ or } y = 0\}$; see also Figure 2.4 for another example.

2.1.8. Let $x \in E$ be a point of continuity of the convex function $f$. The max formula (2.1.19) ensures that $\partial f(x) \ne \emptyset$. Moreover, $f$ has Lipschitz constant $K$ on some neighborhood $U$ of $x$ (Theorem 2.1.10). Because $\langle v, y - x\rangle \le f(y) - f(x)$ for all $y \in U$ and $v \in \partial f(x)$, it follows that $\|v\| \le K$. Thus, $\partial f(x) \subset KB_E$. Now suppose $(v_n) \subset \partial f(x)$ and $v_n \to v$. Then
\[
\langle v, y - x\rangle = \lim_{n\to\infty} \langle v_n, y - x\rangle \le f(y) - f(x) \quad \text{for all } y \in E.
\]
Therefore, $\partial f(x)$ is closed. Finally, suppose $u, v \in \partial f(x)$ and $0 \le \lambda \le 1$. Writing $w = \lambda u + (1 - \lambda)v$, for each $y \in E$ we have
\[
\begin{aligned}
\langle w, y - x\rangle &= \lambda\langle u, y - x\rangle + (1 - \lambda)\langle v, y - x\rangle \\
&\le \lambda[f(y) - f(x)] + (1 - \lambda)[f(y) - f(x)] = f(y) - f(x).
\end{aligned}
\]
This shows $w \in \partial f(x)$. Therefore, $\partial f(x)$ is a nonempty, closed, convex and bounded subset of $E$.

2.1.9. Let $x \in \operatorname{dom} f$ and $d \in E$. If $x + d \notin \operatorname{dom} f$, then $f(x + d) - f(x) = \infty$, so the inequality is clear. In the case $x + d \in \operatorname{dom} f$, the three-slope inequality implies for $0 < t < 1$,
\[
f'(x; d) \le \frac{f(x + td) - f(x)}{t} \le \frac{f(x + d) - f(x)}{1},
\]
as desired. Thus $f(x + d) \ge f(x)$ for all $d \in E$ if and only if $f'(x; d) \ge 0$ for all $d \in E$. Also, $0 \in \partial f(x)$ if and only if $\langle 0, d\rangle \le f(x + d) - f(x)$ for all $d \in E$ if and only if $f(x + d) \ge f(x)$ for all $d \in E$.
2.1.10. The measurability of $\phi \circ f$ follows because $\phi$ is continuous (see [384]). Let $a := \int f\,d\mu$ and note the bounds on $f$ ensure $a \in I$. Apply Corollary 2.1.3 to obtain $\lambda \in \mathbb{R}$ such that $\phi(t) \ge \phi(a) + \lambda(t - a)$ for all $t \in I$. Then $\phi(f(t)) - \lambda(f(t) - a) - \phi(a) \ge 0$. Integrating we obtain
\[
\int [\phi(f(t)) - \lambda(f(t) - a) - \phi(a)]\,d\mu \ge 0.
\]
Because $\mu(\Omega) = 1$ and because of the choice of $a$, this implies
\[
\int_\Omega \phi(f(t))\,d\mu \ge \lambda(a - a) + \phi\left(\int_\Omega f\,d\mu\right) = \phi\left(\int_\Omega f\,d\mu\right),
\]
as desired.

2.1.11. (a) For arbitrary $a, b \in \mathbb{R}$ and $0 < \lambda < 1$, let $f$ be defined by $f(x) := a$ for $0 \le x \le \lambda$ and $f(x) := b$ for $\lambda < x \le 1$. Then
\[
g\left(\int_0^1 f(x)\,dx\right) = g(\lambda a + (1 - \lambda)b)
\]
while
\[
\int_0^1 g(f)\,dx = \lambda g(a) + (1 - \lambda)g(b).
\]
The original assumption
\[
g\left(\int_0^1 f(x)\,dx\right) \le \int_0^1 g(f)\,dx
\]
then implies $g(\lambda a + (1 - \lambda)b) \le \lambda g(a) + (1 - \lambda)g(b)$. This establishes the convexity of $g$.
(b) Applying Jensen's inequality when $\phi := \exp(\cdot)$ we have
\[
\exp\left(\int_\Omega f\,d\mu\right) \le \int_\Omega \exp(f)\,d\mu.
\]
When $\Omega$ is the finite set $\{s_1, s_2, \ldots, s_n\}$ with $\mu(\{s_i\}) = 1/n$ and $f(s_i) = y_i$ for $i = 1, 2, \ldots, n$, this becomes
\[
\exp\left(\frac{1}{n}(y_1 + \cdots + y_n)\right) \le \frac{1}{n}\left(e^{y_1} + \cdots + e^{y_n}\right), \quad y_i \in \mathbb{R}.
\]
When $x_i = e^{y_i}$ this becomes
\[
(x_1x_2\cdots x_n)^{1/n} \le \frac{1}{n}(x_1 + x_2 + \cdots + x_n),
\]
that is, the arithmetic-geometric mean inequality.
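
As a quick numerical sanity check of part (b), the following sketch compares the geometric and arithmetic means of random positive samples. The helper names `geometric_mean` and `arithmetic_mean`, the sample sizes and the floating-point tolerance are illustrative assumptions and not part of the exercise; a passing check is of course no substitute for the proof.

```python
import math
import random


def arithmetic_mean(xs):
    """Right-hand side of the inequality: (x_1 + ... + x_n) / n."""
    return sum(xs) / len(xs)


def geometric_mean(xs):
    """Left-hand side: exp of the mean of y_i = log(x_i), i.e. (x_1 ... x_n)^(1/n)."""
    return math.exp(sum(math.log(x) for x in xs) / len(xs))


random.seed(0)  # fixed seed so the run is reproducible
samples = [[random.uniform(0.1, 10.0) for _ in range(5)] for _ in range(1000)]

# Collect any sample where the geometric mean exceeds the arithmetic mean
# by more than a small floating-point tolerance.
violations = [xs for xs in samples
              if geometric_mean(xs) > arithmetic_mean(xs) + 1e-12]
print("violations found:", len(violations))
```

Writing the geometric mean as the exponential of an average of logarithms mirrors the substitution $x_i = e^{y_i}$ used in the solution.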

2.1.12. Suppose $f$ is not identically $-\infty$. Then fix $x_0$ where $f$ is real-valued. We may assume $f(x_0) = 0$. Suppose $f(x_0 + h) > 0$ for some $h \in E$. Then for $t > 1$,
\[
0 < f(x_0 + h) \le (1 - t^{-1})f(x_0) + t^{-1}f(x_0 + th) = t^{-1}f(x_0 + th).
\]
Thus $\lim_{t\to\infty} f(x_0 + th) \ge \lim_{t\to\infty} tf(x_0 + h) = \infty$, which is a contradiction. Finally, in the case $f(x_0 + h) < 0$ for some $h \in E$, we would deduce $f(x_0 - h) > 0$, and so this, too, is impossible.

2.1.13. (a) For $k > 0$, observe that $x \in \lambda C$ if and only if $kx \in k\lambda C$. Therefore $\gamma_C$ is positively homogeneous (the definition $\gamma_C(x) := \inf\{\lambda \ge 0 : x \in \lambda C\}$ ensures $\gamma_C(0) = 0$ even when $0 \notin C$).
Suppose C ⊂ E is convex. Let x, y ∈ E. In the case γC (x) = ∞ or γC (y) = ∞ then it is clear

γC (λx + (1 − λ)y) ≤ λγC (x) + (1 − λ)γC (y) when 0 < λ < 1.

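Suppose $\gamma_C(x), \gamma_C(y)$ are real-valued and $\varepsilon > 0$. Choose $\alpha, \beta$ so that $\gamma_C(x) \le \alpha < \gamma_C(x) + \varepsilon$, $\gamma_C(y) \le \beta < \gamma_C(y) + \varepsilon$ and $\frac{1}{\alpha}x, \frac{1}{\beta}y \in C$. So we choose $u, v \in C$ so that $x = \alpha u$ and $y = \beta v$ (the cases $x = 0$ or $y = 0$ are fine). Then
\[
\frac{\lambda\alpha u + (1 - \lambda)\beta v}{\lambda\alpha + (1 - \lambda)\beta} \in C
\]
and therefore
\[
\gamma_C(\lambda\alpha u + (1 - \lambda)\beta v) \le \lambda\alpha + (1 - \lambda)\beta,
\]
or in other words, $\gamma_C(\lambda x + (1 - \lambda)y) \le \lambda\gamma_C(x) + (1 - \lambda)\gamma_C(y) + \varepsilon$. This shows $\gamma_C$ is convex when $C$ is convex; in fact, $\gamma_C$ is subadditive because
\[
\gamma_C(x + y) = \gamma_C\left(\tfrac{1}{2}(2x) + \tfrac{1}{2}(2y)\right) \le \tfrac{1}{2}\gamma_C(2x) + \tfrac{1}{2}\gamma_C(2y) = \gamma_C(x) + \gamma_C(y).
\]
Hence $\gamma_C$ is sublinear when $C$ is convex.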


(b) Suppose $0 \in \operatorname{core} C$. Given $x \in E$, there exists $t > 0$ so that $tx \in C$ and then $x \in \frac{1}{t}C$. Thus $\gamma_C(x) \le 1/t$. Because $\gamma_C$ is convex and everywhere finite on $E$, it is continuous everywhere by Theorem 2.1.12.

(c) Suppose $0 \in \operatorname{core} C$. Then $\gamma_C$ is continuous, and therefore $\{x \in E : \gamma_C(x) \le 1\}$ is closed. Observe that $\gamma_C(x) \le 1$ for all $x \in C$; consequently $\operatorname{cl} C \subset \{x \in E : \gamma_C(x) \le 1\}$.

2.1.14. For example, let $f(x, y) = -\sqrt[4]{xy}$ when $x \ge 0$ and $y \ge 0$. The strict convexity assertion is probably best shown by computing the Hessian (as introduced in Section 2.2) of $f$ at $(x, y)$, which is
\[
H = \begin{pmatrix}
\frac{3}{16}x^{-7/4}y^{1/4} & -\frac{1}{16}x^{-3/4}y^{-3/4} \\
-\frac{1}{16}x^{-3/4}y^{-3/4} & \frac{3}{16}x^{1/4}y^{-7/4}
\end{pmatrix}.
\]
Then $f$ is strictly convex on the interior of its domain because this matrix is positive definite for all $(x, y)$ with $x > 0$ and $y > 0$, which follows because $h_{11} > 0$ and $|H| > 0$ at all such $(x, y)$.
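
The Hessian entries of Exercise 2.1.14 can be cross-checked numerically. The sketch below is only an illustration: the function names, the finite-difference step and the sampling box $[0.5, 2]^2$ are assumptions made here. It compares the closed-form entries with central second differences and tests positive definiteness through the leading principal minors.

```python
import random


def f(x, y):
    """f(x, y) = -(xy)^(1/4) on the open positive quadrant."""
    return -((x * y) ** 0.25)


def hessian_closed_form(x, y):
    """Entries (h11, h12, h22) of the Hessian from Exercise 2.1.14."""
    h11 = (3.0 / 16.0) * x ** (-7.0 / 4.0) * y ** (1.0 / 4.0)
    h12 = -(1.0 / 16.0) * x ** (-3.0 / 4.0) * y ** (-3.0 / 4.0)
    h22 = (3.0 / 16.0) * x ** (1.0 / 4.0) * y ** (-7.0 / 4.0)
    return h11, h12, h22


def hessian_finite_difference(x, y, step=1e-4):
    """Central second differences of f, used only as an independent check."""
    h11 = (f(x + step, y) - 2.0 * f(x, y) + f(x - step, y)) / step ** 2
    h22 = (f(x, y + step) - 2.0 * f(x, y) + f(x, y - step)) / step ** 2
    h12 = (f(x + step, y + step) - f(x + step, y - step)
           - f(x - step, y + step) + f(x - step, y - step)) / (4.0 * step ** 2)
    return h11, h12, h22


def is_positive_definite(h11, h12, h22):
    """Sylvester's criterion for a symmetric 2x2 matrix."""
    return h11 > 0.0 and h11 * h22 - h12 * h12 > 0.0


random.seed(1)
points = [(random.uniform(0.5, 2.0), random.uniform(0.5, 2.0)) for _ in range(200)]
all_pd = all(is_positive_definite(*hessian_closed_form(x, y)) for x, y in points)
max_gap = max(abs(a - b)
              for x, y in points
              for a, b in zip(hessian_closed_form(x, y),
                              hessian_finite_difference(x, y)))
print(all_pd, max_gap)
```

At $(1, 1)$ the closed form gives $h_{11} = 3/16$ and $|H| = 9/256 - 1/256 = 1/32$, which matches the criterion used in the solution.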

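2.1.15. Let $(f_i)$ be a family of proper functions and define $f : E \to [-\infty, +\infty]$ by
\[
f(x) := \inf\left\{\sum \lambda_i f_i(x_i) : \sum \lambda_i = 1,\ \lambda_i \ge 0,\ \lambda_i \text{ finitely nonzero},\ \sum \lambda_i x_i = x\right\}.
\]
Let $u, v \in \operatorname{dom} f$ and let $\alpha$ and $\beta$ be any real numbers satisfying $f(u) < \alpha$ and $f(v) < \beta$. Now choose sums as in the definition above so
\[
\sum \alpha_i f_i(u_i) < \alpha, \quad \sum \beta_i f_i(v_i) < \beta \quad \text{where} \quad \sum \alpha_i u_i = u, \ \sum \beta_i v_i = v.
\]
Then for any $0 \le \lambda \le 1$, we have $\sum [\lambda\alpha_i + (1 - \lambda)\beta_i] = 1$ and $\sum (\lambda\alpha_i u_i + (1 - \lambda)\beta_i v_i) = \lambda u + (1 - \lambda)v$. Then by the definition of $f$,
\[
f(\lambda u + (1 - \lambda)v) \le \sum \left[\lambda\alpha_i f_i(u_i) + (1 - \lambda)\beta_i f_i(v_i)\right] < \lambda\alpha + (1 - \lambda)\beta.
\]
The convexity of $f$ follows from this. Next we show that $f$ is the largest convex function minorizing the family. Indeed, suppose $h$ is convex and $h$ minorizes the family $(f_i)$. Then for any $x$ such that $x = \sum \lambda_i x_i$ where $\sum \lambda_i = 1$, $\lambda_i \ge 0$, and only finitely many of the $\lambda_i$ are nonzero, by the convexity of $h$ and the minorization property we have
\[
h\left(\sum \lambda_i x_i\right) \le \sum \lambda_i h(x_i) \le \sum \lambda_i f_i(x_i).
\]
Taking the infimum over all such sums we see that $h \le f$.

For the example, let $f : \mathbb{R} \to \mathbb{R}$ be defined by $f(t) := t^2$ if $t \ne 0$ and $f(0) = 1$. Notice that $\operatorname{epi} \operatorname{conv} f = \{(x, y) \in \mathbb{R}^2 : y \ge x^2\}$ and $\operatorname{conv} \operatorname{epi} f = \operatorname{epi} \operatorname{conv} f \setminus \{(0, 0)\}$.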

2.1.16. Suppose $T : X \to Y$ is an open mapping. Then $T(U)$ is open where $U = \operatorname{int} B_X$. Then $0 \in \operatorname{int} T(U)$, so we choose $r > 0$ so that $rB_Y \subset T(U)$. For any $y \in Y$, choose $n > 0$ so that $n^{-1}\|y\| < r$. Then $n^{-1}y \in T(U)$, and so we let $x \in X$ be chosen so that $Tx = n^{-1}y$. Then $T(nx) = y$ and so $T$ is onto as desired.
Conversely, suppose T is onto. Let y ∈ T (U ) where U is an open subset of X, and choose x ∈ U
so that T x = y. Let V be an open convex set so that x ∈ V ⊂ U . Now fix h ∈ Y . Because T is
onto, we fix v ∈ X so that T v = h. Because V is open, we choose δ > 0 so that x + tv ∈ V for
all 0 ≤ t ≤ δ. Then T (x + tv) = y + th ∈ T (V ) for all 0 ≤ t ≤ δ. Thus y ∈ core T (V ), and so
y ∈ int T (V ) ⊂ T (U ). Thus T (U ) is open.

2.1.18. Let $x$ be such that $f(x)$ is real-valued. Suppose $f(x) = (\operatorname{cl} f)(x)$ and let $x_n \to x$. Because the epigraph of $\operatorname{cl} f$ is closed and $f \ge \operatorname{cl} f$ we know
\[
\liminf_{n\to\infty} f(x_n) \ge \liminf_{n\to\infty} (\operatorname{cl} f)(x_n) \ge (\operatorname{cl} f)(x) = f(x),
\]
and so $f$ is lower semicontinuous at $x$. Conversely, suppose $f$ is lower semicontinuous at $x$. Let $(x_n, t_n) \in \operatorname{epi} f$ be such that $(x_n, t_n) \to (x, (\operatorname{cl} f)(x))$ (possibly in the extended sense in the second coordinate). Then
\[
(\operatorname{cl} f)(x) = \lim_{n\to\infty} t_n \ge \liminf_{n\to\infty} f(x_n) \ge f(x).
\]
Because $\operatorname{cl} f \le f$, we conclude $(\operatorname{cl} f)(x) = f(x)$. According to the lower semicontinuity of $f$ at $x$ we know $f(u) > \alpha > -\infty$ on some neighborhood of $x$. It follows easily that $f$ is proper; see the proof of Lemma 2.3.3.

2.1.19. Suppose $\liminf_{\|x\|\to\infty} f(x)/\|x\| > \beta > 0$. Let $S = \{x : f(x) \le \alpha\}$ for some $\alpha \in \mathbb{R}$. Find $k > 0$ so that $k\beta > \alpha$, and $f(x)/\|x\| > \beta$ for all $\|x\| \ge k$. If $\|x\| \ge k$, then $f(x) > k\beta > \alpha$. Therefore, $S \subset kB_E$.
Clearly $f : \mathbb{R} \to \mathbb{R}$ defined by $f(t) := \sqrt{|t|}$ is not convex, and $\{t : f(t) \le \alpha\} = [-\alpha^2, \alpha^2]$ for each $\alpha > 0$. However,
\[
\lim_{|t|\to\infty} f(t)/|t| = \lim_{|t|\to\infty} |t|^{-1/2} = 0.
\]

2.1.22. (a) For each $n \in \mathbb{N}$, let $F_n := \{x \in S : |f_k(x)| \le n \text{ for all } k \in \mathbb{N}\}$. Then $\bigcup_n F_n = S$ because $f_k(x) \to f(x)$ and $f(x) \in \mathbb{R}$. According to the Baire category theorem, there exists $N \in \mathbb{N}$ such that $F_N$ contains a relatively open (in $E$) subset $U$. Then $|f_k(x)| \le N$ for all $k \in \mathbb{N}$, $x \in U$.
(b) By shifting, we may assume $x = 0$ where $x$ is some given point in $\operatorname{int} S$. Now $rB_E \subset \operatorname{int} S$ for some $r > 0$. By part (a), there is some $B_{r_1}(y) \subset rB_E$ for which $f_k(x) \le M$ for all $k \in \mathbb{N}$, $x \in B_{r_1}(y)$. Replacing $M$ with a larger number as necessary, we may also assume $f_k(-y) \le M$ for all $k \in \mathbb{N}$. By the convexity of $f_n$, we have that $f_n \le M$ on $\operatorname{conv}(\{-y\} \cup B_{r_1}(y))$, which contains $\frac{r_1}{2}B_E$. Thus $f_n$ is uniformly bounded on $\frac{r_1}{2}B_E$.
Now let $K$ be a compact subset of $\operatorname{int} S$. Suppose by way of contradiction there exist $\varepsilon > 0$ and a subsequence $(x_{n_k}) \subset K$ such that
\[
|f_{n_k}(x_{n_k}) - f(x_{n_k})| > \varepsilon \quad \text{for all } n_k. \tag{1}
\]
By passing to a further subsequence as necessary, we may assume $x_{n_k} \to \bar{x}$ where $\bar{x} \in K$. By the previous paragraph, we find $r > 0$ so that $f_{n_k}$ is uniformly bounded on $B_r(\bar{x})$. The proof of Theorem 2.1.10 shows that $(f_{n_k})$ is equi-Lipschitz on $B_{r/2}(\bar{x})$. Hence $(f_{n_k})$ converges uniformly to $f$ on $B_{r/2}(\bar{x})$. This is a contradiction with (1) because $(x_{n_k})$ is eventually in $B_{r/2}(\bar{x})$.

2.1.23. (a) Suppose $f$ does not have Lipschitz constant $K \ge 0$ on $U$. Fix $u, v \in U$ such that $f(v) - f(u) > K\|v - u\|$, and let $\phi \in \partial f(v)$. Then
\[
\langle \phi, u - v\rangle \le f(u) - f(v) < -K\|u - v\|
\]
and so $\|\phi\| > K$.

Conversely, suppose $f$ has Lipschitz constant $K \ge 0$ on $U$. Let $u \in U$. Then $\partial f(u)$ is not empty because $f$ is continuous. Moreover, let $\phi \in \partial f(u)$. Then
\[
\langle \phi, v - u\rangle \le f(v) - f(u) \le K\|v - u\|
\]
for all $v \in U$. Because $u$ is in the interior of $U$, it follows that $\|\phi\| \le K$.

The "in particular" part follows from the first part because $f$ is continuous on $U$, and therefore locally Lipschitz on $U$.

(b) Because $\operatorname{dom} f = E$, $f$ is continuous on $E$, and the extreme value theorem implies $f$ is bounded on bounded subsets of $E$. Exercise 2.1.22 then ensures $f$ is Lipschitz on bounded subsets of $E$, and then part (a) of this exercise ensures $\partial f$ maps bounded subsets of $E$ to bounded subsets of $E$.

Exercises from Section 2.2

2.2.1. A proof of the convexity of $f$ using the Hessian is sketched in [369, pp. 27–28]. Because $f$ is convex, we have
\[
f\left(\frac{x + y}{2}\right) \le \frac{1}{2}f(x) + \frac{1}{2}f(y),
\]
which means
\[
-\sqrt[n]{\frac{x_1 + y_1}{2} \cdots \frac{x_n + y_n}{2}} \le -\frac{1}{2}\sqrt[n]{x_1x_2\cdots x_n} - \frac{1}{2}\sqrt[n]{y_1y_2\cdots y_n}.
\]
Multiplying both sides of the previous inequality by $-2$ yields the result.

2.2.2. (a) ⇒ (b): Let g := −1/f and φ(t) := − ln(−t) for t ∈ (−∞, 0). Then φ is convex and
increasing, and g is convex. Therefore, ln ◦f = φ ◦ g is convex.
(b) ⇒ (c): g := ln ◦f is convex, therefore, f = exp(ln ◦f ) is convex since exp is convex and
increasing.

2.2.4. We provide details as in [34, Lemma 3.2]. For a function $f$ on $I$ consider the associated Bregman distance $D_f$ defined by $D_f(x, y) := f(x) - f(y) - f'(y)(x - y)$. Let $g := -1/h$ so that $g' = h'/h^2$. (a): $1/h$ is concave if and only if $g$ is convex if and only if $D_g$ is nonnegative if and only if
\[
0 \le -1/h(x) + 1/h(y) - (h'(y)/h^2(y))(x - y) \quad \text{for all } x, y \in I
\]
if and only if
\[
0 \le h(x)h(y) - h^2(y) - h(x)h'(y)(x - y) \quad \text{for all } x, y \in I.
\]
Part (b) is similar, noting that $D_f \equiv 0$ if and only if $f$ is affine. Part (c) was shown in Exercise 2.2.2. Part (d): $1/h$ is concave if and only if $g$ is convex if and only if $g'' = (h^2h'' - 2h(h')^2)/h^4 \ge 0$ if and only if $hh'' \ge 2(h')^2$.

2.2.5. First, $g$ is a real-valued convex function on $[0, 1]$ and so Theorem 2.1.2(d) ensures that $g$ is differentiable except at possibly countably many $t \in [0, 1]$. Then Theorem 2.2.1 implies that at points of differentiability $\partial g(t) = \{\nabla g(t)\}$. Now let $t \in (0, 1)$ be a point of differentiability of $g$. Observe that
\[
\langle \phi_t, sh\rangle \le f(x + (s + t)h) - f(x + th) = g(t + s) - g(t).
\]
Hence $\langle \phi_t, h\rangle \in \partial g(t)$ and we conclude $\nabla g(t) = \langle \phi_t, h\rangle$.

2.2.8. Suppose $f$ is Fréchet differentiable at $x_0$. Let $\phi = f'(x_0)$. Given $\varepsilon > 0$ we choose $\delta \in (0, 1]$ so that $\delta\|\phi\| < \varepsilon/2$ and
\[
|f(x_0 + h) - f(x_0) - \phi(h)| \le \frac{\varepsilon}{2}\|h\|
\]
whenever $0 < \|h\| < \delta$. Now suppose $\|x - x_0\| < \delta$. Then $x = x_0 + h$ where $\|h\| < \delta$ and the previous inequality then implies
\[
|f(x) - f(x_0)| = |f(x_0 + h) - f(x_0)| \le (\|\phi\| + \varepsilon/2)\|h\| < \varepsilon.
\]
Thus, $f$ is continuous at $x_0$.

2.2.9. Suppose $f$ has Lipschitz constant $K$ in a neighborhood $U$ of $x_0$ and is Gâteaux differentiable at $x_0$ with Gâteaux derivative $\phi$. Then $\|\phi\| \le K$. Suppose $f$ is not Fréchet differentiable at $x_0$. Then there exist $\varepsilon > 0$ and $t_n \to 0^+$, $h_n \in S_E$ such that
\[
|f(x_0 + t_nh_n) - f(x_0) - \phi(t_nh_n)| \ge t_n\varepsilon.
\]
Because $S_E$ is compact, we may replace $(h_n)$ above with one of its convergent subsequences, so we suppose $h_n \to h \in S_E$. For $n$ sufficiently large we have $\|h_n - h\| < \varepsilon/(3K)$, and then
\[
\begin{aligned}
|f(x_0 + t_nh) - f(x_0) - \phi(t_nh)| &\ge |f(x_0 + t_nh_n) - f(x_0) - \phi(t_nh_n)| - 2Kt_n\|h_n - h\| \\
&\ge t_n\varepsilon - 2Kt_n(\varepsilon/(3K)) = t_n\varepsilon/3,
\end{aligned}
\]
which contradicts the Gâteaux differentiability of $f$ at $x_0$. For a slightly different proof of this, see the last part of the proof of Theorem 2.5.4.

2.2.13. Suppose f is convex on the interval I, and suppose J := [a, b] is a compact subinterval
of I. Then for m affine and 0 ≤ λ ≤ 1, we have

(f + m)(λa + (1 − λ)b) ≤ λ(f + m)(a) + (1 − λ)(f + m)(b)


≤ max{(f + m)(a), (f + m)(b)}.

Thus the supremum of f + m is attained at one of the endpoints a or b.


Conversely, suppose a, b ∈ I. Now choose an affine function m such that (f +m)(a) = (f +m)(b).
Because (f + m) attains its max on [a, b] at an endpoint, we know it attains its max on [a, b] at
both a and b. Then, for 0 ≤ λ ≤ 1, we have

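\[
\begin{aligned}
(f + m)(\lambda a + (1 - \lambda)b) &= f(\lambda a + (1 - \lambda)b) + \lambda m(a) + (1 - \lambda)m(b) \\
&\le \max_{[a,b]}(f + m) = \lambda(f(a) + m(a)) + (1 - \lambda)(f(b) + m(b)).
\end{aligned}
\]

Consequently, $f(\lambda a + (1 - \lambda)b) \le \lambda f(a) + (1 - \lambda)f(b)$ and so $f$ is convex as desired.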

2.2.16. Suppose $a < b$ and $M$ is an affine function through $(a, f(a))$ and $(b, f(b))$, and let $m$ be an affine minorant of $f$ passing through $((a + b)/2, f((a + b)/2))$ (using the max formula (2.1.19)). Then $m \le f \le M$ on $[a, b]$, and thus
\[
f\left(\frac{a + b}{2}\right) \le \frac{1}{b - a}\int_a^b f(t)\,dt \le \frac{f(a) + f(b)}{2}
\]
since these quantities are the averages of $m$, $f$ and $M$ respectively on $[a, b]$.
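
The two-sided estimate of Exercise 2.2.16 is easy to probe numerically. In the sketch below, the function name `hermite_hadamard_terms`, the three test functions and the number of quadrature cells are illustrative assumptions; the average of $f$ is approximated by a composite midpoint rule rather than computed exactly.

```python
import math


def hermite_hadamard_terms(f, a, b, n=10000):
    """Return (f(midpoint), average of f on [a, b], mean of endpoint values).

    The average is approximated with the composite midpoint rule on n cells.
    """
    width = (b - a) / n
    integral = sum(f(a + (i + 0.5) * width) for i in range(n)) * width
    return f((a + b) / 2.0), integral / (b - a), (f(a) + f(b)) / 2.0


examples = {
    "exp on [0, 1]": (math.exp, 0.0, 1.0),
    "t^2 on [-1, 2]": (lambda t: t * t, -1.0, 2.0),
    "|t| on [-1, 3]": (abs, -1.0, 3.0),
}
results = {name: hermite_hadamard_terms(*args) for name, args in examples.items()}
for name, (mid, avg, ends) in results.items():
    # For a convex f the three numbers should appear in increasing order.
    print(name, mid <= avg <= ends)
```

The three returned numbers are exactly the averages of $m$, $f$ and $M$ over $[a, b]$ from the solution, since the average of an affine function over an interval is its value at the midpoint.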

2.2.20. First, for $(x, y) \in \operatorname{dom} f \setminus \{(0, 0)\}$, the Hessian of $f$ at $(x, y)$ is
\[
H = \begin{pmatrix}
6xy^{-2} & -6x^2y^{-3} \\
-6x^2y^{-3} & 6x^3y^{-4}
\end{pmatrix}.
\]
Then $H$ is positive semidefinite for such $(x, y)$ because $|H| = 0$ and $6xy^{-2} \ge 0$. Also, for $(x, y) \in \operatorname{dom} f$, we have
\[
\lambda f(x, y) + (1 - \lambda)f(0, 0) = \lambda\frac{x^3}{y^2} = f(\lambda(x, y)),
\]
and together we deduce $f$ is convex. It is also closed because: (i) its domain is closed; (ii) $x^3/y^2$ is continuous when $y \ne 0$; (iii) $\liminf_{(x,y)\to(0,0)} f(x, y) \ge f(0, 0)$. However, $f$ is not continuous at $(0, 0)$ even when considering that the underlying topological space is the domain. This is because $\lim_{(x,x^2)\to(0,0)} f(x, x^2) = +\infty$.
A further observation is that this type of example cannot occur on $\mathbb{R}$. Indeed, if $f : [a, b] \to \mathbb{R}$ is convex and lower semicontinuous then $f$ is continuous as a function on $[a, b]$. Indeed, it is continuous on $(a, b)$ and further $\liminf_{x\to b^-} f(x) \ge f(b)$ by lower semicontinuity while the convexity of $f$ implies $\limsup_{x\to b^-} f(x) \le f(b)$. Similarly, $f$ is continuous from the right at $a$.

2.2.21. Suppose $x, y \in U$, and $x^* \in \partial f(x)$, $y^* \in \partial f(y)$ (which are not empty by the max formula (2.1.19)). Then by the subdifferential inequality
\[
\begin{aligned}
\langle y^* - x^*, y - x\rangle &= y^*(y - x) + x^*(x - y) \\
&\ge f(y) - f(x) + f(x) - f(y) = 0.
\end{aligned}
\]
Hence the subdifferential is a monotone mapping. The 'in particular' statement follows because $\partial f(x) = \{\nabla f(x)\}$ when $f$ is differentiable at $x$.

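2.2.22. (a) Suppose not. Then there exist $x_n \to x_0$ and $\varepsilon > 0$ so that $\phi_n \in \partial f(x_n)$, but $\phi_n \notin \partial f(x_0) + \varepsilon B_E$. Use the local Lipschitz property of $f$ (Theorem 2.1.12) to deduce that $(\|\phi_n\|)_n$ is bounded. Then use compactness to find a convergent subsequence, say $\phi_{n_k} \to \phi$. Now fix $y \in E$. Then
\[
\begin{aligned}
\phi(y) - \phi(x_0) &= \phi(y) - \phi(x_{n_k}) + \phi(x_{n_k}) - \phi(x_0) \\
&= \lim_{k\to\infty} \left[\phi_{n_k}(y - x_{n_k}) + \phi_{n_k}(x_{n_k} - x_0)\right] \\
&\le \lim_{k\to\infty} \left[f(y) - f(x_{n_k}) + \phi_{n_k}(x_{n_k} - x_0)\right] = f(y) - f(x_0).
\end{aligned}
\]
Therefore $\phi \in \partial f(x_0)$, which contradicts $\phi_{n_k} \to \phi$ because $\phi_{n_k} \notin \partial f(x_0) + \varepsilon B_E$ for all $k$.

(b) This follows from (a) and the fact $\partial f(x_0) = \{f'(x_0)\}$ (Theorem 2.2.1).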
(c) Suppose not. Then there are a subsequence $(n_k)$ and $\varepsilon > 0$ such that $\phi_{n_k} \in \partial f_{n_k}(w_{n_k})$, $w_{n_k} \in W$, but $\phi_{n_k} \notin \nabla f(w_{n_k}) + \varepsilon B_E$. Because $f_n \to f$ uniformly on bounded sets, it follows that $(f_n)$ is uniformly bounded on bounded sets, and thus $(f_n)$ is eventually uniformly Lipschitz on bounded sets. Hence by passing to a further subsequence, if necessary, we may assume $w_{n_k} \to w_0$ and $\phi_{n_k} \to \phi$ for some $w_0, \phi \in E$. Now let $y \in E$, and observe
\[
\begin{aligned}
\phi(y) - \phi(w_0) &= \phi(y) - \phi(w_{n_k}) + \phi(w_{n_k}) - \phi(w_0) \\
&= \lim_{k\to\infty} \left[\phi_{n_k}(y - w_{n_k}) + \phi_{n_k}(w_{n_k} - w_0)\right] \\
&\le \lim_{k\to\infty} \left[f_{n_k}(y) - f_{n_k}(w_{n_k}) + \phi_{n_k}(w_{n_k} - w_0)\right] = f(y) - f(w_0),
\end{aligned}
\]
where the last equality follows by the uniform convergence of $f_{n_k}$ to $f$ on bounded sets. Thus $\phi \in \partial f(w_0)$, that is, $\phi = \nabla f(w_0)$. By (b), $\nabla f(w_{n_k}) \to \nabla f(w_0) = \phi$, which yields a contradiction because $\|\phi_{n_k} - \nabla f(w_{n_k})\| > \varepsilon$.

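(d) For example, let $f_n := \max\{|\cdot| - 1/n, 0\}$ and $f := |\cdot|$ on $\mathbb{R}$. Then $\partial f_n(1/n) \not\subset \partial f(1/n) + \frac{1}{2}B_{\mathbb{R}}$ for any $n \in \mathbb{N}$. Indeed, $\partial f_n(1/n) = [0, 1]$ while $\partial f(1/n) + \frac{1}{2}B_{\mathbb{R}} = [1/2, 3/2]$. For the remaining part, suppose no such $N$ exists. As in (c), choose $\phi_{n_k} \in \partial f_{n_k}(w_{n_k})$ but $\phi_{n_k} \notin \partial f(w) + \varepsilon B_E$ for $\|w - w_{n_k}\| < \delta$ and, as in (c), $w_{n_k} \to w_0$ and $\phi_{n_k} \to \phi$ for some $w_0 \in E$ and $\phi \in E$. Again, as in (c), one can show that $\phi \in \partial f(w_0)$. However, for $\|w_{n_k} - w_0\| < \delta$, we have $\phi_{n_k} \notin \partial f(w_0) + \varepsilon B_E$, which is a contradiction.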

2.2.23. (a) We will use the max formula (2.1.19). Suppose $x_0 \in \operatorname{bndy} C$ and take $x_n \notin C$ such that $x_n \to x_0$. By the max formula (2.1.19), let $\phi_n \in \partial d_C(x_n)$. Then $\|\phi_n\| \le 1$ because $d_C$ has Lipschitz constant 1, and $\|\phi_n\| \ge 1$, because we choose $\bar{x}_n \in C$ such that $d_C(x_n) = \|x_n - \bar{x}_n\|$, and then $\langle \phi_n, \bar{x}_n - x_n\rangle \le -d_C(x_n)$. By the compactness of $B_E$, we know $\phi_{n_k} \to \bar{\phi}$ for some subsequence and some $\bar{\phi}$, and $\|\bar{\phi}\| = 1$. Also, for any $x \in E$, we have
\[
\langle \bar{\phi}, x - x_0\rangle = \lim_{k\to\infty} \langle \phi_{n_k}, x - x_{n_k}\rangle \le \lim_{k\to\infty} \left(d_C(x) - d_C(x_{n_k})\right) = d_C(x) - d_C(x_0).
\]
Thus $\bar{\phi} \in \partial d_C(x_0)$. Then $n\bar{\phi} \in \partial(nd_C)(x_0)$ and so $n\bar{\phi} + \phi \in \partial f(x_0)$.
(b) Let $f(t) := -\sqrt{t}$ for $t \ge 0$ and $f(t) := +\infty$ when $t < 0$. Then $\partial f(0) = \emptyset$. Let $g := \delta_{[0,+\infty)}$. Then $\partial g(0) = (-\infty, 0]$.
Further notes. The proof of (a) shows that, given any nonempty convex set $A$ with $x_0 \in \operatorname{bndy} A$, there exists $\bar{\phi} \in \partial d_C(x_0)$ with $\|\bar{\phi}\| = 1$, where $C$ is the closure of $A$. It then follows that $\bar{\phi}(x_0) = \sup_C \bar{\phi}$. Had we done this separation theorem earlier, we could have more elegantly completed the proof of Theorem 2.2.1 and part (a) of this exercise.

Exercises from Section 2.3

2.3.2. Using calculus, one can show that for $f := |\cdot|^p/p$ on $\mathbb{R}$ one has $f^* = |\cdot|^q/q$. The Fenchel–Young inequality (2.3.1) then shows $f(x) + f^*(y) \ge xy$, that is,
\[
|x|^p/p + |y|^q/q \ge xy \quad \text{for all real } x \text{ and } y,
\]
as desired.
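
The conjugate formula in Exercise 2.3.2 can be checked numerically. The sketch below is illustrative only and rests on assumptions made here: $p = 3$ (so $q = 3/2$, with $1/p + 1/q = 1$), a uniform grid on $[-10, 10]$ as a stand-in for the supremum over $\mathbb{R}$, and the helper name `conjugate_on_grid`.

```python
def conjugate_on_grid(p, y, radius=10.0, steps=20001):
    """Approximate f*(y) = sup_x (x*y - |x|^p / p) by a search over a uniform grid."""
    best = float("-inf")
    for i in range(steps):
        x = -radius + 2.0 * radius * i / (steps - 1)
        best = max(best, x * y - abs(x) ** p / p)
    return best


p = 3.0
q = p / (p - 1.0)  # conjugate exponent, so that 1/p + 1/q = 1

# Compare the grid approximation of f* with the closed form |y|^q / q.
conjugate_gaps = [abs(conjugate_on_grid(p, y) - abs(y) ** q / q)
                  for y in (-2.0, -0.5, 0.0, 1.0, 1.5)]

# Check Young's inequality |x|^p/p + |y|^q/q >= x*y on a small grid.
grid = [k / 4.0 for k in range(-12, 13)]
young_holds = all(abs(x) ** p / p + abs(y) ** q / q >= x * y - 1e-12
                  for x in grid for y in grid)
print(max(conjugate_gaps), young_holds)
```

The grid search only bounds the supremum from below, so small positive gaps are expected; they shrink as the grid is refined.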

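2.3.3. Let $\|f\|_p = \alpha$, $\|g\|_q = \beta$ where $\alpha, \beta > 0$ (if either $\alpha = 0$ or $\beta = 0$, then $fg = 0$ a.e. and so the inequality is trivially true). Now we integrate both sides of the Young inequality:
\[
\int_X \frac{|f(x)|}{\alpha}\,\frac{|g(x)|}{\beta}\,d\mu \le \int_X \left(\frac{1}{p}\frac{|f(x)|^p}{\alpha^p} + \frac{1}{q}\frac{|g(x)|^q}{\beta^q}\right) d\mu = \frac{1}{p} + \frac{1}{q} = 1.
\]
Multiplying both sides by $\alpha\beta$ yields the result.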

2.3.4. (a) To show that $x \mapsto \sum_{k=1}^N |x_k|^p$ is convex, observe that $g := |\cdot|^p$ is a convex function on $\mathbb{R}$, and then
\[
x \mapsto \sum_{k=1}^N g(P_k(x)) \quad \text{where } P_k(x) = x_k
\]
is a sum of convex functions since $g \circ P_k$ is a convex function for each $k$, as it is a composition of a convex function with a linear function. Now use the gauge construction as suggested in the hint.

(b) Alternatively, one may apply the discrete form of Hölder's inequality of Exercise 2.3.3 as follows:
\[
\begin{aligned}
\sum_{k=1}^N |x_k + y_k|^p &\le \sum_{k=1}^N \left(|x_k||x_k + y_k|^{p-1} + |y_k||x_k + y_k|^{p-1}\right) \\
&\le \left(\sum_{k=1}^N |x_k|^p\right)^{1/p} \left(\sum_{k=1}^N |x_k + y_k|^{(p-1)q}\right)^{1/q} + \left(\sum_{k=1}^N |y_k|^p\right)^{1/p} \left(\sum_{k=1}^N |x_k + y_k|^{(p-1)q}\right)^{1/q} \\
&= \left(\sum_{k=1}^N |x_k + y_k|^p\right)^{1/q} \left[\left(\sum_{k=1}^N |x_k|^p\right)^{1/p} + \left(\sum_{k=1}^N |y_k|^p\right)^{1/p}\right],
\end{aligned}
\]
where we used $(p - 1)q = p$. Now divide both sides by $\left(\sum_{k=1}^N |x_k + y_k|^p\right)^{1/q}$, using that $1 - 1/q = 1/p$, to obtain
\[
\left(\sum_{k=1}^N |x_k + y_k|^p\right)^{1/p} \le \left(\sum_{k=1}^N |x_k|^p\right)^{1/p} + \left(\sum_{k=1}^N |y_k|^p\right)^{1/p}
\]
as desired.
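
Both inequalities used in part (b) can be spot-checked on random vectors. This is a hedged sketch: the helper names `p_norm`, `minkowski_gap` and `holder_gap`, the sampling ranges and the tolerance are choices made here for illustration.

```python
import random


def p_norm(xs, p):
    """(sum_k |x_k|^p)^(1/p)."""
    return sum(abs(x) ** p for x in xs) ** (1.0 / p)


def minkowski_gap(xs, ys, p):
    """||x||_p + ||y||_p - ||x + y||_p, which should be nonnegative for p >= 1."""
    sums = [x + y for x, y in zip(xs, ys)]
    return p_norm(xs, p) + p_norm(ys, p) - p_norm(sums, p)


def holder_gap(xs, ys, p):
    """||x||_p ||y||_q - sum_k |x_k y_k|, with 1/p + 1/q = 1 and p > 1."""
    q = p / (p - 1.0)
    return p_norm(xs, p) * p_norm(ys, q) - sum(abs(x * y) for x, y in zip(xs, ys))


random.seed(2)
worst_minkowski = float("inf")
worst_holder = float("inf")
for _ in range(2000):
    n = random.randint(1, 6)
    p = random.uniform(1.1, 5.0)
    xs = [random.uniform(-3.0, 3.0) for _ in range(n)]
    ys = [random.uniform(-3.0, 3.0) for _ in range(n)]
    worst_minkowski = min(worst_minkowski, minkowski_gap(xs, ys, p))
    worst_holder = min(worst_holder, holder_gap(xs, ys, p))
# Both worst-case gaps should be nonnegative up to rounding error.
print(worst_minkowski, worst_holder)
```

For example, with $x = (3, 0)$, $y = (0, 4)$ and $p = 2$ the Minkowski gap is $3 + 4 - 5 = 2$.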

2.3.12. Part (a).

(i) According to the Fenchel–Young inequality (2.3.1) we have
\[
\begin{aligned}
f(x) + g(Ax) &\ge \langle A^*\phi, x\rangle - f^*(A^*\phi) + \langle -\phi, Ax\rangle - g^*(-\phi) \\
&= \langle \phi, Ax\rangle - f^*(A^*\phi) - \langle \phi, Ax\rangle - g^*(-\phi) \\
&= -f^*(A^*\phi) - g^*(-\phi).
\end{aligned}
\]
Taking the infimum of the left-hand side over $x \in E$, and then taking the supremum of the right-hand side over $\phi \in Y$, establishes the weak duality inequality $p \ge d$.

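(ii) (This part is for the subdifferential sum rule.) Let $x \in E$ and suppose $\phi \in \partial f(x)$ and $\Lambda \in \partial g(Ax)$. Then for any $v \in E$ we have
\[
\begin{aligned}
\langle \phi + A^*\Lambda, v - x\rangle &= \langle \phi, v - x\rangle + \langle \Lambda, A(v - x)\rangle \\
&\le f(v) - f(x) + g(Av) - g(Ax).
\end{aligned}
\]
Thus $\phi + A^*\Lambda \in \partial(f + g \circ A)(x)$, from which the inclusion follows.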

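(iii) Fix $u \in Y$. Then $f(x) + g(Ax + u) < \infty$ for some $x \in X$ if and only if there exists $x \in \operatorname{dom} f$ such that $Ax + u \in \operatorname{dom} g$ if and only if $u \in \operatorname{dom} g - A\operatorname{dom} f$. Thus $\operatorname{dom} h = \operatorname{dom} g - A\operatorname{dom} f$. To check the convexity of $h$, suppose $u, v \in \operatorname{dom} h$ and let $\alpha, \beta$ be any numbers such that $h(u) < \alpha$ and $h(v) < \beta$. Choose $x_1 \in E$ such that $f(x_1) + g(Ax_1 + u) < \alpha$ and $x_2 \in E$ such that $f(x_2) + g(Ax_2 + v) < \beta$. Then for any $0 \le \lambda \le 1$, we have
\[
\begin{aligned}
h(\lambda u + (1 - \lambda)v) &= \inf_{x \in E}\{f(x) + g(Ax + \lambda u + (1 - \lambda)v)\} \\
&\le f(\lambda x_1 + (1 - \lambda)x_2) + g(A(\lambda x_1 + (1 - \lambda)x_2) + \lambda u + (1 - \lambda)v) \\
&\le \lambda f(x_1) + (1 - \lambda)f(x_2) + \lambda g(Ax_1 + u) + (1 - \lambda)g(Ax_2 + v) \\
&< \lambda\alpha + (1 - \lambda)\beta.
\end{aligned}
\]
Thus, $h(\lambda u + (1 - \lambda)v) \le \lambda h(u) + (1 - \lambda)h(v)$ as desired.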

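(iv) Let $x_0 \in \operatorname{dom} f$ be such that $Ax_0 \in \operatorname{cont} g$. Let $y_0 = Ax_0$. Because $g$ is continuous at $y_0$, this implies $y_0 + rB_Y \subset \operatorname{dom} g$ for some $r > 0$. Therefore,
\[
rB_Y = (y_0 + rB_Y) - Ax_0 \subset \operatorname{dom} g - A\operatorname{dom} f,
\]
which implies $0 \in \operatorname{core}(\operatorname{dom} g - A\operatorname{dom} f)$ as desired.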

Part (b).

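(i) First, inclusion was completed in (a)(ii) above. Conversely, suppose $\phi \in \partial(f + g \circ A)(\bar{x})$. Applying the Fenchel–Young inequality (2.3.1), and then applying the Fenchel duality theorem (2.3.4), we obtain
\[
\begin{aligned}
f(\bar{x}) + g(A\bar{x}) - \langle \phi, \bar{x}\rangle &= \inf_{x \in E}\{f(x) + g(Ax) - \langle \phi, x\rangle\} = \inf_{x \in E}\{(f - \phi)(x) + g(Ax)\} \\
&= -(f - \phi)^*(A^*\bar{\phi}) - g^*(-\bar{\phi}),
\end{aligned}
\]
where $\bar{\phi} \in Y$ is a point where $d$ in the Fenchel duality theorem (2.3.4) is attained. Therefore,
\[
(f - \phi)(\bar{x}) - \langle A^*\bar{\phi}, \bar{x}\rangle + g(A\bar{x}) - \langle -\bar{\phi}, A\bar{x}\rangle = -(f - \phi)^*(A^*\bar{\phi}) - g^*(-\bar{\phi})
\]
and by the Fenchel–Young inequality (2.3.1), $A^*\bar{\phi} \in \partial(f - \phi)(\bar{x})$ and $-\bar{\phi} \in \partial g(A\bar{x})$. The first inclusion implies $A^*\bar{\phi} + \phi \in \partial f(\bar{x})$, and using the second inclusion we check
\[
\langle -A^*\bar{\phi}, u - \bar{x}\rangle = \langle -\bar{\phi}, A(u - \bar{x})\rangle \le g(Au) - g(A\bar{x}) \quad \text{for all } u \in E;
\]
thus $-A^*\bar{\phi} \in A^*\partial g(A\bar{x})$ and $\phi = (A^*\bar{\phi} + \phi) - A^*\bar{\phi} \in \partial f(\bar{x}) + A^*\partial g(A\bar{x})$; consequently equality holds in the sum formula.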

(ii) The previous part has proved the ‘only if’ assertion, and we can essentially reverse our steps
to deduce the ‘if’ assertion.

2.3.13. Suppose $f : E \to \mathbb{R}$ has Lipschitz constant $k$. Suppose $\phi \in E$ and $\|\phi\| > k$. Choose $x_0 \in E$ with $\|x_0\| = 1$ and $\phi(x_0) > k$. Then $\phi(tx_0) - f(tx_0) \to \infty$ as $t \to \infty$. So $\phi \notin \operatorname{dom} f^*$.
For the converse, assume $\operatorname{dom} f^* \subset kB_E$ is not empty; then $f$ is bounded below by $\phi - a$ where $\phi \in \operatorname{dom} f^*$ and $a = f^*(\phi)$. (Using relative interior properties, one knows that the domain of the subdifferential of a proper convex function on $E$ is nonempty, and hence the domain of the conjugate is not empty; see Theorem 2.4.8.) Then if $\operatorname{dom} f \ne E$, one can find $y_n \in \operatorname{dom} f^*$ such that $\|y_n\| \to \infty$: for example, letting $f_n := \phi - a + nd_C$ where $C := \operatorname{dom} f$, one has $f_n \le f$, but for $x \notin C$ and $y \in \partial f_n(x)$, one has $\|y\| \ge n - \|\phi\|$. Since $y \in \operatorname{dom} f^*$, this yields a contradiction.
Thus $\operatorname{dom} f = E$ and thus $f$ is continuous. In the event $f$ is not $k$-Lipschitz, one can choose $u, v \in E$ such that $f(v) - f(u) > k\|u - v\|$. Now let $y \in \partial f(v)$. Then
\[
\langle y, u - v\rangle \le f(u) - f(v) < -k\|u - v\|
\]
and so $\|y\| > k$, but $y \in \operatorname{dom} f^*$, which is a contradiction.
This result fails if $f$ is not convex: for example, consider $f(x) := \sqrt{|x|}$ on $\mathbb{R}$. Then $\operatorname{dom} f^* = \{0\}$, but $f$ is not Lipschitz.

2.3.14. Basic facts about infimal convolutions.


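(a) Let $(x, s) \in \operatorname{epi} f$ and $(y, t) \in \operatorname{epi} g$. Then
\[
(f \,\square\, g)(x + y) \le f(x) + g(x + y - x) = f(x) + g(y) \le s + t
\]
and so $(x + y, s + t) \in \operatorname{epi}(f \,\square\, g)$. That is, $\operatorname{epi} f + \operatorname{epi} g \subset \operatorname{epi}(f \,\square\, g)$. Now suppose $h$ is a function such that there exists $\bar{x} \in E$ with $h(\bar{x}) > (f \,\square\, g)(\bar{x})$. Choose $t \in \mathbb{R}$ such that $h(\bar{x}) > t > (f \,\square\, g)(\bar{x})$. Then we choose $y \in E$ such that $f(y) + g(\bar{x} - y) < t$. Then $(\bar{x}, t) \notin \operatorname{epi} h$, but $(\bar{x}, t) \in \operatorname{epi} f + \operatorname{epi} g$. Thus $f \,\square\, g$ is the largest function whose epigraph contains $\operatorname{epi} f + \operatorname{epi} g$.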
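(b) As suggested, let $f(x) := e^x$ and $g(x) := 0$. Then $f$ and $g$ are continuous and convex, but $\operatorname{epi} f + \operatorname{epi} g = \{(x, y) \in \mathbb{R}^2 : y > 0\}$.
(c) As suggested, let $f(x) := x$ and $g(x) := 0$. For any $u \in \mathbb{R}$,
\[
(f \,\square\, g)(u) \le f(-e^n) + g(u + e^n) = -e^n \quad \text{for all } n \in \mathbb{N}.
\]
Thus $(f \,\square\, g)(u) = -\infty$ for all $u \in \mathbb{R}$.
(d) As suggested, let $C := \{(x, y) : y \ge e^x\}$ and $D := \{(x, y) : y \ge 0\}$. Then $\delta_C \,\square\, \delta_D = \delta_{\{(x,y) : y > 0\}}$, which is not closed.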
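(e) Suppose $f$ and $g$ are convex functions. Let $u, v \in \operatorname{dom}(f \,\square\, g)$. Let $\alpha$ and $\beta$ be any real numbers satisfying $(f \,\square\, g)(u) < \alpha$ and $(f \,\square\, g)(v) < \beta$. Now choose $x_1, x_2 \in E$ so that $f(x_1) + g(u - x_1) < \alpha$ and $f(x_2) + g(v - x_2) < \beta$. Then
\[
\begin{aligned}
(f \,\square\, g)(\lambda u + (1 - \lambda)v) &\le f(\lambda x_1 + (1 - \lambda)x_2) + g(\lambda(u - x_1) + (1 - \lambda)(v - x_2)) \\
&\le \lambda f(x_1) + (1 - \lambda)f(x_2) + \lambda g(u - x_1) + (1 - \lambda)g(v - x_2) \\
&< \lambda\alpha + (1 - \lambda)\beta.
\end{aligned}
\]
It follows that $f \,\square\, g$ is convex.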
(f) Notice that (c) already shows this may fail if one of the functions is not bounded below, and we need to explicitly assume $g$ is proper and let $x_0 \in \operatorname{dom} g$. Then
\[
\inf_E f + \inf_E g \le (f \,\square\, g)(x) \le g(x_0) + f(x - x_0).
\]
When $f$ is continuous, this implies $f \,\square\, g$ is real-valued and hence continuous. When $f$ is bounded on bounded sets, so is $f \,\square\, g$. When $f$ is Lipschitz with Lipschitz constant $k \ge 0$, then $f \le k\|\cdot\| + f(0)$ and so $f \,\square\, g \le k\|\cdot\| + b$ where $b := g(x_0) + f(0) + k\|x_0\|$, which implies $f \,\square\, g$ is Lipschitz with Lipschitz constant $k$ (see Exercise 4.1.28). See also the note following (g).

(g) Observe that
\[
(\|\cdot\| \,\square\, \delta_C)(x) = \inf_{y \in X}\left\{\|x - y\| + \delta_C(y)\right\} = \inf_{y \in C} \|x - y\| = d_C(x).
\]
As in the proof of (f), the convolution has Lipschitz constant 1 because the norm has Lipschitz constant 1.

Further notes. One may prefer a more explicit argument in (f). Indeed, once we have established that $f \,\square\, g$ is real-valued, suppose $(f \,\square\, g)(\bar{x}) = \bar{t}$. Given $\varepsilon > 0$, choose $y \in E$ such that $f(y) + g(\bar{x} - y) < \bar{t} + \varepsilon$. Then for any $h \in E$,
\[
\begin{aligned}
(f \,\square\, g)(h + \bar{x}) &\le f(y + h) + g(\bar{x} - y) \le |f(y + h) - f(y)| + f(y) + g(\bar{x} - y) \\
&< (f \,\square\, g)(\bar{x}) + |f(y + h) - f(y)| + \varepsilon.
\end{aligned}
\]
From here, local/global Lipschitz properties of the convolution then follow directly from the local/global Lipschitz properties of $f$ (this argument works just as well in any normed linear space irrespective of the dimension).
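
Part (g) of Exercise 2.3.14 can be illustrated on the real line. The following sketch assumes $E = \mathbb{R}$ and $C = [1, 3]$, and replaces the infimum over $C$ by a minimum over a uniform grid of $C$; the function names and grid size are hypothetical choices made for this illustration.

```python
def distance_to_interval(x, lo, hi):
    """Exact distance d_C(x) from x to the interval C = [lo, hi]."""
    return max(lo - x, 0.0, x - hi)


def infimal_convolution_with_indicator(x, lo, hi, cells=2000):
    """Approximate (|.| box delta_C)(x) = inf_{y in C} |x - y| on a grid of C.

    The indicator delta_C is +infinity off C, so only points of C compete.
    """
    return min(abs(x - (lo + (hi - lo) * i / cells)) for i in range(cells + 1))


lo, hi = 1.0, 3.0
test_points = [k / 10.0 for k in range(-30, 71)]  # points in [-3, 7]
max_error = max(abs(infimal_convolution_with_indicator(x, lo, hi)
                    - distance_to_interval(x, lo, hi))
                for x in test_points)
# The error is at most half a grid cell, here (hi - lo) / (2 * cells).
print(max_error)
```

Outside $C$ the minimum is attained at an endpoint of the grid, so the two functions agree there up to rounding; inside $C$ the grid value is within half a cell of the true value $0$.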

2.3.15. For (a) see the proof of Lemma 4.4.15; for (b) see the proof of Lemma 4.4.16; and for
(c), use (a) and (b), c.f. Corollary 4.4.17.
(d) Observe that if f and g are closed, then (f ∗ g ∗ )∗ = f ∗∗ + g ∗∗ = f + g, where the first
equality follows from (a). Then (f + g)∗ = (f ∗ g ∗ ). The result as stated follows because
cl(f + g) = cl f + cl g under the condition dom f ∩ cont g 6= ∅ as we now sketch.
Clearly, cl f + cl g ≤ f + g and so cl f + cl g ≤ cl(f + g). Let x̄ ∈ dom(cl f + cl g). We will show cl(f + g)(x̄) ≤ (cl f + cl g)(x̄). Fix v̄ ∈ int dom g ∩ dom f and choose r > 0 so that g is bounded on v̄ + rBX ⊂ int dom g. Now choose un ∈ dom f with un → x̄ and f(un) → cl f(x̄). For 0 < λ < 1, λx̄ + (1 − λ)(v̄ + rBX) ⊂ int dom g. Because un → x̄, we fix λn → 1 so that

λn un + (1 − λn)v̄ ∈ λn x̄ + (1 − λn)(v̄ + rBX) ⊂ int dom g.

Now,

g(λn un + (1 − λn)v̄) = cl g(λn un + (1 − λn)v̄)
= cl g(λn x̄ + (1 − λn)vn) where vn ∈ v̄ + rBX
≤ λn cl g(x̄) + (1 − λn) cl g(vn) → cl g(x̄),

where the limit uses that cl g = g is bounded on v̄ + rBX. Similarly, f(λn un + (1 − λn)v̄) ≤ λn f(un) + (1 − λn)f(v̄) → cl f(x̄). Altogether, we conclude

cl(f + g)(x̄) ≤ cl f(x̄) + cl g(x̄).

2.3.16. Suppose f : C → R is Lipschitz with Lipschitz constant k, and let f̃(x) := inf{f(y) + k‖x − y‖ : y ∈ C}. For x0 ∈ C, taking y = x0 clearly shows f̃(x0) ≤ f(x0), hence f̃ ≤ f on C. Also, fix x0 ∈ C. Then f̃(x) ≤ f(x0) + k‖x − x0‖, and so f̃(x) < ∞ for all x ∈ X. Moreover, f(y) ≥ f(x0) − k‖y − x0‖ for y ∈ C. Now fix x ∈ X, then f(y) + k‖x − y‖ ≥ f(x0) − k‖y − x0‖ + k‖x − y‖ ≥ f(x0) − k‖x − x0‖ for all y ∈ C. This shows f̃(x) > −∞ for all x ∈ X, i.e. f̃ is real-valued.
Suppose f̃(x0) < f(x0) for some x0 ∈ C. Then there exists y ∈ C such that f(y) + k‖x0 − y‖ < f(x0). This violates the Lipschitz constant of f on C. Hence f̃(x) = f(x) for all x ∈ C. Similarly, one can see that f̃ is globally Lipschitz with Lipschitz constant k. Indeed, suppose f̃(u) − f̃(v) > k‖u − v‖ + ε for some u, v ∈ X and ε > 0. Choose x0 ∈ C such that f(x0) + k‖v − x0‖ ≤ f̃(v) + ε. Then
f̃(u) ≤ f(x0) + k‖u − x0‖ ≤ f(x0) + k‖v − x0‖ + k‖u − v‖ ≤ f̃(v) + k‖u − v‖ + ε,
which is a contradiction.
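The extension formula of Exercise 2.3.16 can be tried out on a small example. The sketch below is only an illustration under stated assumptions: C is taken to be a finite subset of the real line, and the data `C`, `f_on_C`, `k` and the helper name `lipschitz_extension` are hypothetical choices, not part of the original solution.

```python
def lipschitz_extension(points, values, k):
    """Return x -> min over sample points y of f(y) + k*|x - y|."""
    def f_tilde(x):
        return min(v + k * abs(x - y) for y, v in zip(points, values))
    return f_tilde

# Assumed example data: f is 1-Lipschitz on the finite set C.
C = [0.0, 1.0, 3.0]
f_on_C = [0.0, 1.0, 0.5]
k = 1.0

f_tilde = lipschitz_extension(C, f_on_C, k)

# The extension agrees with f on C ...
agrees_on_C = all(f_tilde(y) == v for y, v in zip(C, f_on_C))

# ... and keeps the Lipschitz constant k on a test grid in [-5, 8].
grid = [i / 10 for i in range(-50, 81)]
is_k_lipschitz = all(abs(f_tilde(u) - f_tilde(v)) <= k * abs(u - v) + 1e-12
                     for u in grid for v in grid)
```

Both checks mirror the two claims proved above: equality on C and the global Lipschitz bound.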

2.3.17. (a) Suppose x and y are global minimizers of f . Then f (λx + (1 − λ)y) ≥ f (x) =
λf (x) + (1 − λ)f (y) for 0 < λ < 1. Because f is strictly convex, x = y.
(b) It suffices to show this for y = 0, and it suffices to show

(2) f((u + v)/2) < (1/2)f(u) + (1/2)f(v) for all distinct u, v ∈ E.

Indeed, if f(λu + (1 − λ)v) = λf(u) + (1 − λ)f(v) for some 0 < λ < 1 and distinct u and v, then by the convexity of f equality holds for all 0 < λ < 1. We now check that (2) is an easy consequence of the parallelogram identity:

f((u + v)/2) = (1/2)‖(u + v)/2‖² = (1/2)[(1/2)‖u‖² + (1/2)‖v‖² − (1/4)‖u − v‖²]
< (1/2)f(u) + (1/2)f(v) when u ≠ v.
(c) (i) Let y ∈ E and define f by f(x) := (1/2)‖x − y‖² + δC(x). The strict convexity of f follows from (b), and f attains its minimum on C because any minimizing sequence is bounded and a convergent subsequence converges to the minimizer, which is unique by part (a).
Now let y ∈ E, and suppose ȳ ∈ C satisfies ⟨y − ȳ, x − ȳ⟩ ≤ 0 for all x ∈ C. Then for x ∈ C

‖y − ȳ‖² = ⟨y − ȳ, y − ȳ⟩ = ⟨y − ȳ, y − x⟩ + ⟨y − ȳ, x − ȳ⟩
≤ ⟨y − ȳ, y − x⟩ ≤ ‖y − ȳ‖‖y − x‖

and so ‖y − ȳ‖ ≤ ‖y − x‖ for all x ∈ C, thus ȳ = PC(y).


Conversely, suppose ȳ ∈ C satisfies ⟨y − ȳ, x − ȳ⟩ > 0 for some x ∈ C. Then for each 0 < λ < 1, the convexity of C implies the point xλ := λx + (1 − λ)ȳ is in C. Now compute

‖y − xλ‖² = ⟨y − xλ, y − xλ⟩
= ⟨y − ȳ − λ(x − ȳ), y − ȳ − λ(x − ȳ)⟩
= ‖y − ȳ‖² − 2λ⟨y − ȳ, x − ȳ⟩ + λ²‖x − ȳ‖²
= ‖y − ȳ‖² − λ[2⟨y − ȳ, x − ȳ⟩ − λ‖x − ȳ‖²].

For λ > 0 sufficiently small, the term in the brackets is positive and then ‖y − ȳ‖² > ‖y − xλ‖², and so ȳ ≠ PC(y).
(ii) Let x̄ ∈ C. Then d ∈ NC(x̄) if and only if d ∈ ∂δC(x̄) if and only if

⟨x̄ + d − x̄, x − x̄⟩ ≤ δC(x) − δC(x̄) = 0 for all x ∈ C

if and only if PC(x̄ + d) = x̄ (by part (i)).


(iii) In fact one can show

(3) ‖PC(x) − PC(y)‖² + ‖x − PC(x) − (y − PC(y))‖² ≤ ‖x − y‖² for all x, y ∈ E.

Indeed, expanding and rearranging using the inner product, the left-hand side of (3) is equal to

(4) ⟨x − y, x − y⟩ + 2⟨y − PC(y), PC(x) − PC(y)⟩ + 2⟨x − PC(x), PC(y) − PC(x)⟩

and by part (i) the last two inner products in (4) are less than or equal to 0 which provides the desired conclusion.
(d) For example, if S = {−1, 1}, then PS(0) is multi-valued, and lim_{x→0+} PS(x) = {1} while lim_{x→0−} PS(x) = {−1}, so there is no single-valued selection of PS that is continuous at 0.
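The characterization of the projection in part (c)(i) and the inequality (3) in part (c)(iii) can be spot-checked numerically. The following is a minimal sketch, assuming C is the unit box [0, 1]³ (for which the projection is coordinatewise clipping); the helper names and the random sampling are illustrative choices, not something prescribed by the text.

```python
import random

def project_box(y, lo=0.0, hi=1.0):
    """Projection onto the box [lo, hi]^n: clip each coordinate."""
    return [min(max(t, lo), hi) for t in y]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def sub(u, v):
    return [a - b for a, b in zip(u, v)]

def rand_vec(n, a, b):
    return [random.uniform(a, b) for _ in range(n)]

random.seed(0)
worst_vi = float("-inf")   # largest value of <y - P(y), x - P(y)> seen, x in C
worst_gap = float("-inf")  # largest value of (LHS of (3)) - ||y - z||^2 seen

for _ in range(200):
    y, z = rand_vec(3, -2.0, 3.0), rand_vec(3, -2.0, 3.0)
    x = rand_vec(3, 0.0, 1.0)              # an arbitrary point of C
    py, pz = project_box(y), project_box(z)
    worst_vi = max(worst_vi, dot(sub(y, py), sub(x, py)))
    r = sub(sub(y, py), sub(z, pz))
    lhs = dot(sub(py, pz), sub(py, pz)) + dot(r, r)
    worst_gap = max(worst_gap, lhs - dot(sub(y, z), sub(y, z)))
```

If the characterization and (3) hold, both recorded quantities should be nonpositive up to rounding error.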

2.3.20. (a) Let f(x) := ‖x‖²/2 + δS(x). The proof of Fact 4.5.6 shows

(1/2)dS²(y) = ‖y‖²/2 − f*(y)

or dS²(·) = ‖·‖² − 2f*(·) is a difference of convex functions as desired.
(b) dC = ‖·‖ □ δC and thus dC* = (‖·‖)* + δC* = δ_{BE} + σC.

(c) Let x ∈ C. By Fenchel–Young (Proposition 2.3.1), φ ∈ ∂dC(x) if and only if dC*(φ) = φ(x) − dC(x) if and only if δ_{BE}(φ) + σC(φ) = φ(x) − dC(x) = φ(x) if and only if φ ∈ BE and φ(x) = σC(φ) if and only if φ ∈ ∂δC(x) and φ ∈ BE if and only if φ ∈ NC(x) and φ ∈ BE.
(d) Suppose x ∉ C. Then φ ∈ ∂dC(x) if and only if φ ∈ BE and

⟨φ, x − PC(x)⟩ ≥ dC(x) − dC(PC(x)) = dC(x) = ‖x − PC(x)‖.

Therefore, φ = (1/‖x − PC(x)‖)(x − PC(x)) = (1/dC(x))(x − PC(x)).
(e) For x ∈ C, we obtain that ∇dC²(x) = 0 because

|dC²(x + th) − dC²(x) − ⟨0, th⟩| ≤ t²‖h‖² for all t,

so the difference quotient tends to 0 as t → 0. For x ∉ C, the chain rule implies ∇dC²(x) = 2dC(x)∇dC(x) = 2(x − PC(x)) (by part (d)). Thus (e) follows.

2.3.21. Let D := {x ∈ E : Ax = b}. Let x ∈ D, then φ ∈ ∂δD(x) if and only if φ(y − x) ≤ 0 for all y ∈ D if and only if φ(u) ≤ 0 for all u ∈ ker A if and only if φ(u) = 0 for all u ∈ ker A.
Suppose φ ∈ A*Y, that is φ = A*y for some y ∈ Y. Fix u ∈ D, and let v ∈ D be arbitrary, then

⟨A*y, v − u⟩ = ⟨y, A(v − u)⟩ = ⟨y, b − b⟩ = 0.

Therefore φ ∈ ∂δD(u), that is φ ∈ ND(u).

Conversely, suppose φ ∈ ∂δD(x). Suppose φ ≠ 0, then fix x0 ∈ E such that φ(x0) = 1. We now express E as the direct sum ker φ ⊕ Rx0. Observe that Ax0 ∉ A(ker φ). Indeed, suppose Ax0 = Ax1 for some x1 ∈ ker φ. Then A(x0 − x1) = 0 and so by the first paragraph φ(x0 − x1) = 0, which contradicts φ(x0) = 1 and φ(x1) = 0. Thus, by the basic separation theorem (2.1.21), we choose y ∈ Y such that y(Ax0) = 1 and y(A(ker φ)) = {0}. Given x̄ ∈ E, we write x̄ = h + φ(x̄)x0 where h ∈ ker φ. Then

⟨A*y, x̄⟩ = ⟨y, A(h + φ(x̄)x0)⟩ = ⟨y, Ah⟩ + φ(x̄)⟨y, Ax0⟩ = φ(x̄).

Because x̄ ∈ E was arbitrary, we have φ = A*y, and ∂δD(x) ⊂ A*Y as desired.

(a) Suppose x̄ is a local minimizer as specified. Then ∇f(x̄) ∈ ∂δD(x̄) and so ∇f(x̄) ∈ A*Y.
(b) Suppose ∇f(x̄) ∈ A*Y and f is convex. Then ∇f(x̄) ∈ ∂δD(x̄), and because f is convex, x̄ is a global minimizer of f|D.

Exercises from Section 2.4

2.4.1. Part(a).

(i) As in the proof of the Fenchel duality theorem (2.3.4), let h : Y → [−∞, +∞] be defined by

h(u) := inf_{x∈E} {f(x) + g(Ax + u)},

then h is convex and 0 ∈ core(dom g − A dom f) = dom h, thus by the max formula (2.1.19) ∂h(0) is not empty, so we let −φ ∈ ∂h(0). Now for all x ∈ E and u ∈ Y with u = Av where v ∈ E, we have

0 ≤ h(0) ≤ h(u) + ⟨φ, u⟩
≤ f(x) + g(A(x + v)) + ⟨φ, Av⟩
= [f(x) − ⟨A*φ, x⟩] − [−g(A(x + v)) − ⟨φ, A(x + v)⟩].

Define

(5) b := inf_{x∈E} {f(x) − ⟨A*φ, x⟩}, a := sup_{z∈E} {−g(Az) − ⟨A*φ, z⟩}.

Then a ≤ b and thus for any r ∈ [a, b] we have

f(x) ≥ r + ⟨A*φ, x⟩ ≥ −(g ◦ A)(x) for all x ∈ E.

(ii) With notation as in the Fenchel duality theorem (2.3.4), observe p ≥ 0 because f(x) ≥ −g(Ax), and then the Fenchel duality theorem (2.3.4) says d = p and because the supremum in d is attained, we choose φ ∈ Y such that

0 ≤ p = −f*(A*φ) − g*(−φ)
≤ [f(x) − ⟨φ, Ax⟩] + [g(y) + ⟨φ, y⟩] for all x ∈ E, y ∈ Y,

where the second inequality is a direct consequence of the definitions of f*(A*φ) and g*(−φ). Then for any z ∈ E, setting y = Az in the previous inequality, we deduce a ≤ b where a and b are as in (5). Now choose r ∈ [a, b] and let α(x) = ⟨A*φ, x⟩ + r.
(iii) The inclusion is straightforward (Exercise 2.3.12 (a)(ii)), so we prove the reverse inclusion. Suppose φ ∈ ∂(f + g ◦ A)(x̄). Because shifting by a constant does not change the subdifferential, we may assume without loss of generality that

x ↦ f(x) + g(Ax) − φ(x)

attains its minimum of 0 at x̄. According to the sandwich theorem (2.4.1) there exists an affine function α := ⟨A*y, ·⟩ + r with −y ∈ ∂g(Ax̄) such that

f(x) − φ(x) ≥ α(x) ≥ −g(Ax) for all x ∈ E, with equality when x = x̄.

Then f(x) ≥ ⟨φ + A*y, x⟩ + r and f(x̄) = ⟨φ + A*y, x̄⟩ + r. Thus φ + A*y ∈ ∂f(x̄), and as a consequence, we have φ ∈ ∂f(x̄) + A*∂g(Ax̄) as desired.

(iv) Let the notation be as in the Hahn–Banach extension theorem (2.1.18). Then −g ≤ p where
g = −f +δS . Because p is everywhere continuous, we can apply the sandwich theorem (2.4.1)
to find an affine mapping α such that −g ≤ α ≤ p, that is f ≤ α ≤ p. Then α = α(0) + φ
where φ ∈ E. We know α(0) ≥ f (0) = 0 and so φ ≤ p. We claim φ(s) = f (s) for all s in the
linear subspace S. Indeed, if this were not true, then f(x0) − φ(x0) ≠ 0 for some x0 ∈ S.
Then choose k ∈ R so that k(f (x0 ) − φ(x0 )) > α(0). This implies f (kx0 ) > φ(kx0 ) + α(0)
which is impossible. Thus φ|S = f as desired.

Part (b). Several connections were outlined in (a) and earlier, for now we’ll derive a couple
additional easy relations, to sketch one complete circle.

(i) Suppose the subdifferential sum rule is valid, and x0 ∈ core dom f where f : E → (−∞, +∞]
is convex. Then E = ∂(f + δ{x0 } )(x0 ) = ∂f (x0 ) + ∂δ{x0 } (x0 ) and so ∂f (x0 ) is not empty.
(ii) Suppose the subdifferential of a convex function on E at a point of continuity is not empty.
Now consider a linear function f on a subspace Y of E with f ≤ p on Y for some sublinear function p on E. Consider h = f □ p. Then h ≤ p, h|Y = f, h is continuous with h(0) = 0. Consider φ ∈ ∂h(0). Then φ|Y = f, and φ ≤ h. Thus the Hahn–Banach extension theorem (2.1.18) follows.

So one of the circles of implications we have sketched is: Hahn–Banach extension theorem ⇒ max formula ⇒ Fenchel duality theorem ⇒ sandwich theorem ⇒ subdifferential sum rule ⇒ nonemptiness of subdifferential ⇒ Hahn–Banach extension theorem. The proofs of the respective implications are given in: proof of max formula (2.1.19), proof of the Fenchel duality theorem (2.3.4), Part (a)(ii), Part (a)(iii), Part (b)(i), Part (b)(ii).
Further notes. (I) The Fenchel duality and the sandwich theorems are most easily visualized and
understood in the classical case Y = E where A is the identity map, and yet still very powerful.
In this case, the primal and dual problems are:
p := inf_{x∈E} {f(x) + g(x)} and d := sup_{y∈E} {−f*(y) − g*(−y)}.

As before, p ≥ d by the Fenchel–Young inequality (2.3.1). We derive p = d using the sandwich theorem (2.4.1) when 0 ∈ core(dom g − dom f). Indeed, when p > −∞, we know f(x) ≥ p − g(x) for all x ∈ E, and thus there is an affine function α := φ + r such that
f(x) ≥ ⟨φ, x⟩ + r ≥ −g(x) + p for all x ∈ E.
Then ⟨φ, x⟩ − f(x) ≤ −r for all x ∈ E and ⟨−φ, x⟩ − g(x) ≤ r − p for all x ∈ E. Thus f*(φ) ≤ −r and g*(−φ) ≤ r − p. In other words, p ≥ d ≥ −f*(φ) − g*(−φ) ≥ r + p − r = p and so p = d and the sup is attained at φ as desired.
(II) Of course, one can derive the Fenchel duality theorem (2.3.4) from the sandwich theorem (2.4.1) by slightly modifying the proof as presented in the text. Indeed, let h be as defined in the proof of the Fenchel duality theorem (2.3.4), and observe h ≥ p − δ{0}, and h(0) = p. By the sandwich theorem (2.4.1), there is an affine function, say α := p − φ such that h ≥ α ≥ p − δ{0}, and thus −φ ∈ ∂h(0). Now proceed as in the proof for the Fenchel duality theorem (2.3.4).
(III) A more direct derivation of the Fenchel duality theorem (2.3.4) from the sandwich theorem (2.4.1) is as follows. Let h : E × Y → (−∞, +∞] be defined by h(x, y) := f(x) + g(y). Then h ≥ p − δ_{G(A)}, where G(A) := {(x, y) : y = Ax} is the graph of A : E → Y, and we apply the sandwich theorem (2.4.1) to obtain Λ ∈ E, φ ∈ Y and r ∈ R such that
(6) g(y) + f(x) ≥ Λ(x) − φ(y) + r ≥ p − δ_{G(A)}(x, y) for all x ∈ E, y ∈ Y.

When x = 0 and y = 0, (6) implies r ≥ p, and (6) further implies Λ(x) − φ(Ax) ≥ p − r for all x ∈ E, and so Λ = A*φ. Lastly, we rewrite the left inequality of (6) as

−[−φ(y) − g(y)] − [A*φ(x) − f(x)] ≥ r (≥ p)

and take the infimum over y ∈ Y and then over x ∈ E to deduce −g*(−φ) − f*(A*φ) ≥ p which together with the weak duality inequality provides the result.

2.4.3. (a) Fix ε > 0 and let x̄ ∈ cl C. Then there exists a sequence (xn) ⊂ C such that xn → x̄. Fix n0 such that ‖x_{n0} − x̄‖ < ε. Then x̄ ∈ x_{n0} + εBE ⊂ C + εBE.
(b) Let x + y ∈ D + F where x ∈ D, y ∈ F. Because D is open, we choose ε > 0 so that x + εBE ⊂ D. Then x + y + εBE ⊂ D + F. Therefore, x + y ∈ int(D + F) and so D + F is open.
(c) Let x ∈ int C and choose ε > 0 so that x + εBE ⊂ C. By part (a), for each 0 < λ < 1, cl C ⊂ C + (ελ/(1 − λ))BE. Therefore,

λx + (1 − λ) cl C ⊂ λx + (1 − λ)[C + (ελ/(1 − λ))BE]
= λx + λεBE + (1 − λ)C
= λ(x + εBE) + (1 − λ)C ⊂ C.

Because x ∈ int C was arbitrary it follows that

λ int C + (1 − λ) cl C ⊂ C, for each 0 < λ ≤ 1,

and by part (b), the sum on the left hand side is open, and so

λ int C + (1 − λ) cl C ⊂ int C, for each 0 < λ ≤ 1,

as desired.
(d) Because int C ⊂ cl C, the previous inclusion implies (trivially) λx + (1 − λ)y ∈ int C for all x, y ∈ int C and 0 < λ < 1, and so int C is convex.
(e) For any fixed x ∈ int C, λx + (1 − λ) cl C ⊂ int C. Letting λ → 0+, we deduce that cl C ⊂ cl(int C).
Clearly this can fail without convexity. For example, let S := Q ∪ (0, 1) ⊂ R. Then cl S = R,
but cl(int S) = [0, 1].
Further notes: in any normed linear space it is straightforward to show the interior of a convex set is convex: Let x, y ∈ int C, and choose r > 0 so that x + rBX ⊂ C and y + rBX ⊂ C. Then for 0 ≤ λ ≤ 1, one has
λ(x + rBX) + (1 − λ)(y + rBX) ⊂ C.
Then, λx + (1 − λ)y + rBX ⊂ C and λx + (1 − λ)y ∈ int C as desired. It is also easy to see that the closure of a convex set is convex (see solution to Exercise 2.4.8). See also [383, Theorem 1.13] for more.
2.4.4. (a) Let (Ai)_{i∈I} be a collection of affine sets. Let A = ∩_{i∈I} Ai, and let x, y ∈ A and λ ∈ R. Then x, y ∈ Ai for each i ∈ I and so λx + (1 − λ)y ∈ Ai for each i ∈ I. Therefore, λx + (1 − λ)y ∈ A, and so A is affine.
(b) Suppose A is a nonempty affine subset of E. Fix x0 ∈ A. We claim that Y := A − x0 is linear. Indeed, for α, β ∈ R and x, y ∈ Y we have x = u − x0 and y = v − x0 where u, v ∈ A, and
αx + βy + x0 = αu − αx0 + βv − βx0 + 1x0 ∈ A

because α − α + β − β + 1 = 1 (see part (c)). Therefore, αx + βy ∈ A − x0, and A − x0 is a linear subspace. Conversely, if Y is linear, and x0 ∈ E, then A := Y + x0 is affine. Indeed, for x, y ∈ Y, and α + β = 1, we have

α(x + x0) + β(y + x0) = αx + βy + x0 ∈ A,

which completes the proof of (b). (Alternatively, one can directly deduce this from Lemma 2.4.5.)
(c) This part follows immediately from Lemma 2.4.5 which shows aff D = x + span(D − x) for any x ∈ D. Indeed, suppose x1, x2, ..., xm ∈ D. Then for Σ_{i=1}^m λi = 1, we have

Σ_{i=1}^m λi xi = x + Σ_{i=1}^m λi(xi − x) ∈ aff D.

Conversely, suppose u ∈ aff D. Then u ∈ x + span(D − x) and so

u = x + Σ_{i=1}^m αi(xi − x) = Σ_{i=1}^m αi xi + (1 − Σ_{i=1}^m αi)x.

Thus u is an affine combination of elements of D, and this proves (c).


(d) Suppose D is nonempty. Linear subspaces of E are closed, therefore aff D = x + span(D − x)
is closed for any x ∈ D. Consequently, cl D ⊂ aff D. It then follows that aff(cl D) ⊂ aff D.
Because the reverse inclusion is clear, we deduce aff(cl D) = aff D.

2.4.5. (a) Consider C1 = [0, 1] × {0} and C2 = [0, 1] × [0, 1] as subsets of R2 . Then ri C1 =
(0, 1) × {0} while ri C2 = (0, 1) × (0, 1). Thus C1 ⊂ C2, but ri C1 ⊄ ri C2.
(b) By translating C, we may assume that 0 ∈ C, then aff C = Y is a linear space, and so ri C
is the interior of C relative to Y , thus we may apply Exercise 2.4.3 using Y as the overspace to
derive the conclusion.
(c) (i) ⇒ (ii): Let x ∈ ri C, then there exists r > 0 so that (x + rBE) ∩ aff C ⊂ C. In particular, for y ∈ C, choosing ε > 0 so that ε‖y − x‖ < r, we have x + ε(x − y) ∈ C.
(ii) ⇒ (iii): Let Y := {λ(c − x) : λ ≥ 0, c ∈ C}. Certainly 0 ∈ Y. Moreover, let λ(c − x) ∈ Y. Then αλ(c − x) ∈ Y if α ≥ 0. Suppose α < 0, then choose ε > 0 so that ε(x − c) + x ∈ C. Then

αλ(c − x) = |α|λ(x − c) = (|α|λ/ε) ε(x − c)
= (|α|λ/ε)[(ε(x − c) + x) − x],

and so Y is closed under scalar multiplication. We now show Y is closed under addition. Indeed, for the nontrivial case λ1, λ2 > 0 we have

λ1(c1 − x) + λ2(c2 − x) = (λ1 + λ2)[(λ1/(λ1 + λ2))c1 + (λ2/(λ1 + λ2))c2 − x]
= (λ1 + λ2)(c̄ − x),

where c̄ ∈ C by the convexity of C as desired. Thus Y is a linear subspace.

(iii) ⇒ (i): Let Y = {λ(c − x) : λ ≥ 0, c ∈ C}. It follows from Lemma 2.4.5 that aff C = x + Y. For a basis {yi} of Y, each of yi and −yi can be written as λ(c − x) for some λ > 0 and c ∈ C, from which it follows that x is in the interior of C relative to aff C.
(d) In fact, ri(T C) = T (ri C); see [369, Theorem 6.6].

2.4.6. By shifting f we may assume 0 ∈ dom f, and let Y = span(dom f). Let x ∈ ri(dom f), then x is in the interior of the domain of f relative to Y. By the max formula (2.1.19), ∂f|Y(x) ≠ ∅, that is, there exists φ ∈ Y such that

(7) ⟨φ, y − x⟩ ≤ f(y) − f(x) for all y ∈ Y.

Now write E = Y + Z as a direct sum, and define φ̃ on E by φ̃(y + z) = φ(y). Because dom f ⊂ Y, it follows from (7) that φ̃ ∈ ∂f(x).
Notice that this result ensures the subdifferential of any proper convex function on E has
nonempty domain and range. This is because the domain of a convex function is convex, and
every nonempty convex subset of E has nonempty relative interior.

2.4.7. Suppose x0 ∈ dom f and ∂f(x0) = ∅. Because cl(ri dom f) = cl(dom f), there exists a sequence (xn) ⊂ ri(dom f) converging to x0, and hence by Exercise 2.4.6 there exist φn ∈ ∂f(xn). Suppose by way of contradiction that ‖φn‖ does not tend to ∞, hence (φn) has a bounded subsequence, and then by compactness a convergent subsequence. Thus we suppose (φ_{nj}) → φ. Then for y ∈ E, we have

φ(y) − φ(x0) = lim_j φ_{nj}(y − x_{nj}) ≤ lim inf_j [f(y) − f(x_{nj})] ≤ f(y) − f(x0),

and so φ ∈ ∂f(x0). This provides our desired contradiction. Thus ‖φn‖ → ∞. Furthermore, Exercise 2.2.23 shows that ∂f(x0) is unbounded whenever it is not empty and x0 is in the boundary of the domain of f. Thus ∂f is not bounded on any neighborhood of a boundary point of the domain of f.
Further notes. Closedness is necessary as simple examples illustrate. Indeed, let f (t) := 0 if
t < 1, f (1) := 1 and f (t) := +∞ for t > 1. Then ∂f (1) = ∅ and ∂f (t) = {0} for all t < 1.

2.4.8. Suppose f is proper. Fix x0 ∈ ri(dom f) and let φ ∈ ∂f(x0). Then f(x) ≥ cl f(x) ≥ f(x0) + φ(x − x0) for all x ∈ X and so cl f is proper. One can verify cl f is convex because its epigraph is convex as the closure of a convex set; that a closure of a convex set is convex is elementary to verify. Indeed, suppose D = cl C where C is convex. Suppose x, y ∈ D, and 0 ≤ λ ≤ 1. Choose (xn), (yn) ⊂ C so that xn → x and yn → y. Then

λx + (1 − λ)y = lim_n (λxn + (1 − λ)yn).

Hence λx + (1 − λ)y ∈ D as it is a limit of elements from C.

Further notes. Hence, one can use the Hessian to check convexity of continuous functions on closed domains. For example suppose f : C → R is continuous and int C ≠ ∅. Suppose f is twice Gâteaux differentiable on int C with Hessian positive semidefinite at all x ∈ int C. Then f is convex, because f|int C is convex, and f is the closure of f|int C.

2.4.9. The set C is closed by Carathéodory’s theorem (1.2.5) because it is the convex hull of a
compact set. The set of extreme points of C is not closed because (1, 0, 0) is not an extreme point
of C but every other point on the circle {(x, y, z) : x² + y² = 1, z = 0} is an extreme point of C.

2.4.10. (a) If (2.4.12) has a solution, then clearly (2.4.13) does not, so at most one of (2.4.12) and (2.4.13) has a solution. Let C := {x ∈ E : x = Σ_{i=0}^m λi xi, λi ≥ 0, Σ_{i=0}^m λi = 1}. Then C is a closed convex set. In the case 0 ∈ C, then (2.4.12) has a solution. In the event 0 ∉ C, we apply the basic separation theorem (2.1.21) to find x ∈ E so that sup_C x < ⟨x, 0⟩ = 0. In particular, ⟨xi, x⟩ < 0 for i = 0, 1, ..., m and so x is a solution to (2.4.13).
(b) Clearly, (2.4.14) and (2.4.15) cannot simultaneously have solutions. Consider the cone

C := {x : x = Σ_{i=1}^m μi xi, μi ≥ 0}.

Then C is convex and it is a finitely generated cone which is thus closed by Carathéodory's theorem (1.2.5). In the event c ∈ C, then (2.4.14) has a solution. In the event c ∉ C, we apply the basic separation theorem (2.1.21) to find x ∈ E so that ⟨x, c⟩ > sup_{u∈C} ⟨x, u⟩ = 0 (note sup_C x = 0 because 0 ∈ C, and if ⟨x, u⟩ > 0 for some u ∈ C, then nu ∈ C and so sup_C x ≥ n⟨x, u⟩ > ⟨x, c⟩ for n sufficiently large). Therefore, (2.4.15) is satisfied by x because ⟨c, x⟩ > 0 and ⟨xi, x⟩ ≤ sup_C x = 0 for i = 1, ..., m.

2.4.11. (a) Suppose {aj}_{j=1}^m is linearly dependent, and x = Σ_{j=1}^m μj aj where μj ≥ 0 for j = 1, 2, ..., m. We will show that x can be written in this form using at most m − 1 elements, from which the first statement in (a) will follow.
Using the linear dependence we can write

λ1 a1 + λ2 a2 + ... + λm am = 0, where Σ_{j=1}^m λj ≥ 0

and not all λj are 0. Now for any t ∈ R we have

x = Σ_{j=1}^m (μj − tλj)aj.

Let J+ := {j : λj > 0}. Then J+ ≠ ∅ because Σ_{j=1}^m λj ≥ 0 and not all λj = 0. Let j0 denote an index in J+ such that

μ_{j0}/λ_{j0} = min{μj/λj : j ∈ J+}.

Set t̄ := μ_{j0}/λ_{j0}. Then t̄ ≥ 0 and for j ∈ J+ one has

μj − t̄λj = λj(μj/λj − μ_{j0}/λ_{j0}) ≥ 0,

with equality when j = j0. When j ∉ J+, then λj ≤ 0 and so μj − t̄λj ≥ 0. Therefore, μj − t̄λj ≥ 0 for j = 1, 2, ..., m with equality when j = j0, and so we write

x = Σ_{1≤j≤m, j≠j0} (μj − t̄λj)aj

as desired.
For the second statement in (a), let {aj : j ∈ J} be a linearly independent set, and let N := |J| and define the linear mapping A on R^N by A(c1, c2, ..., cN) = Σ_{i=1}^N ci a_{ji} where J = {j1, j2, ..., jN}. Then A is an isomorphism onto its range and so it maps closed sets onto closed sets, in particular, A(R^N_+) = CJ is closed.
(b) A finitely generated cone is thus closed as a union of finitely many closed sets.

(c) This can be proved along the same lines as (a), but with more care: see p. 41–42 in L.D. Berkovitz, Convexity and Optimization in R^n, Wiley, 2002. We will use the result of (a) to derive (c). Indeed, suppose A ⊂ R^n, and suppose x ∈ conv A. Then there exist a1, ..., am in A such that x = Σ_{i=1}^m λi ai where λi ≥ 0 and Σ_{i=1}^m λi = 1. Now consider the cone CI ⊂ R^n × R generated by {(ai, 1)}_{i=1}^m. Then (x, 1) = Σ_{i=1}^m λi(ai, 1) ∈ CI, and by part (a), (x, 1) ∈ CJ where {(aj, 1)}_{j∈J} is linearly independent in R^n × R, and so |J| ≤ n + 1. Now (x, 1) = Σ_{j∈J} μj(aj, 1) where μj ≥ 0 for j ∈ J. Consequently, x = Σ_{j∈J} μj aj and 1 = Σ_{j∈J} μj which shows the desired result.
(d) Let A be a compact set in R^n and let f be a function from R^{(n+1)²} to R^n defined by f(y, x1, x2, ..., x_{n+1}) = Σ_{i=1}^{n+1} yi xi where y = (y1, y2, ..., y_{n+1}) ∈ R^{n+1} and xi ∈ R^n. Then f is a continuous function and conv A is the image of the compact set ∆ × A × A × ... × A under the mapping f where ∆ is the simplex in R^{n+1}.
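The reduction step used in part (a) of Exercise 2.4.11 is easy to carry out by hand or in code. The sketch below performs one such step with exact rational arithmetic; the function names `reduce_once` and `combine`, and the three vectors in R² with the dependence a1 + a2 − a3 = 0, are assumptions chosen for illustration and do not come from the text.

```python
from fractions import Fraction

def reduce_once(mu, lam):
    """One reduction step: given x = sum mu[j]*a_j with mu >= 0 and a
    dependence sum lam[j]*a_j = 0 having some lam[j] > 0, return new
    nonnegative coefficients for the same x with at least one zero entry."""
    J_plus = [j for j, l in enumerate(lam) if l > 0]
    if not J_plus:
        raise ValueError("need some lam[j] > 0; replace lam by -lam if necessary")
    t_bar = min(mu[j] / lam[j] for j in J_plus)
    return [m - t_bar * l for m, l in zip(mu, lam)]

def combine(coeffs, vectors):
    """Linear combination sum coeffs[j] * vectors[j], computed coordinatewise."""
    dim = len(vectors[0])
    return tuple(sum(c * v[d] for c, v in zip(coeffs, vectors)) for d in range(dim))

# Assumed example: three linearly dependent vectors in R^2.
a = [(1, 0), (0, 1), (1, 1)]
mu = [Fraction(2), Fraction(3), Fraction(1)]     # x = 2*a1 + 3*a2 + 1*a3
lam = [Fraction(1), Fraction(1), Fraction(-1)]   # a1 + a2 - a3 = 0

new_mu = reduce_once(mu, lam)
```

Here the minimum ratio is attained at the first index, so the new representation no longer uses a1 while the combined vector is unchanged.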

2.4.12. Let {x1, x2, ..., x_{n+2}} ⊂ R^n. The collection {xi − x1}_{i=2}^{n+2} is linearly dependent in R^n, hence we find {ai}_{i=2}^{n+2} not all 0 so that Σ_{i=2}^{n+2} ai(xi − x1) = 0. Now set a1 := −Σ_{i=2}^{n+2} ai. Then

(8) Σ_{i=1}^{n+2} ai xi = 0 and Σ_{i=1}^{n+2} ai = 0.

Let I1 := {i : ai > 0} and I2 := {i : ai ≤ 0}, and let

C1 := conv{xi : i ∈ I1} and C2 := conv{xi : i ∈ I2}.

Let a := Σ_{i∈I1} ai and let x̄ := Σ_{i∈I1} (ai/a)xi. Then Σ_{i∈I1} ai/a = 1 and so x̄ ∈ C1, and it follows from (8) that x̄ = Σ_{i∈I2} −(ai/a)xi and Σ_{i∈I2} −ai/a = 1. Consequently x̄ ∈ C2 as well, and we are done.
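The construction in Exercise 2.4.12 is constructive, so it can be sketched in code. The following Python example finds a nonzero solution of (8) by exact Gauss-Jordan elimination and then forms the partition and the common point. The helper names `null_vector` and `radon_point`, the 0-based indexing, and the four sample points in R² are assumptions made for this illustration.

```python
from fractions import Fraction

def null_vector(rows, ncols):
    """Return a nonzero exact solution a of (rows) a = 0.
    Assumes there are more columns than the rank of the system."""
    M = [[Fraction(v) for v in r] for r in rows]
    pivots = []                      # pivot column of each reduced row
    r = 0
    for c in range(ncols):
        p = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if p is None:
            continue
        M[r], M[p] = M[p], M[r]
        piv = M[r][c]
        M[r] = [v / piv for v in M[r]]
        for i in range(len(M)):
            if i != r and M[i][c] != 0:
                factor = M[i][c]
                M[i] = [vi - factor * vr for vi, vr in zip(M[i], M[r])]
        pivots.append(c)
        r += 1
        if r == len(M):
            break
    free = next(c for c in range(ncols) if c not in pivots)
    a = [Fraction(0)] * ncols
    a[free] = Fraction(1)            # set one free variable to 1, the rest to 0
    for row, c in zip(M, pivots):
        a[c] = -row[free]
    return a

def radon_point(points):
    """Given n + 2 points in R^n, return (I1, I2, x_bar) as in the solution."""
    n, m = len(points[0]), len(points)
    # Rows encode: sum a_i x_i = 0 (one row per coordinate) and sum a_i = 0.
    rows = [[p[d] for p in points] for d in range(n)] + [[1] * m]
    a = null_vector(rows, m)
    I1 = [i for i in range(m) if a[i] > 0]
    I2 = [i for i in range(m) if a[i] <= 0]
    total = sum(a[i] for i in I1)
    x_bar = [sum(a[i] / total * points[i][d] for i in I1) for d in range(n)]
    from_I2 = [sum(-a[i] / total * points[i][d] for i in I2) for d in range(n)]
    return I1, I2, x_bar, from_I2

# Assumed example: the point (1, 1) lies inside the triangle of the other three.
pts = [(0, 0), (4, 0), (0, 4), (1, 1)]
I1, I2, x_bar, from_I2 = radon_point(pts)
```

The two expressions for x̄, one as a convex combination over I1 and one over I2, are computed separately so they can be compared.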

2.4.13. We first establish the case when I is finite (in this case we need not assume the sets Ci are closed and bounded). The case |I| ≤ n + 1 is trivial, so we suppose |I| = n + 2 and that the sets C1, C2, ..., C_{n+2} are such that every subcollection of n + 1 or fewer sets has nonempty intersection. For each 1 ≤ i ≤ n + 2, we fix x̄i ∈ ∩_{j∈I, j≠i} Cj. In the case x̄_{j1} = x̄_{j2} for some j1 ≠ j2, then x̄_{j1} ∈ ∩_{i∈I} Ci and we are done. So we suppose the x̄i are all distinct. According to Radon's theorem (1.2.3) we can partition I = I1 ∪ I2 so that D1 := conv{x̄i : i ∈ I1} and D2 := conv{x̄i : i ∈ I2} have nonempty intersection, say x̄ ∈ D1 ∩ D2. Now x̄ ∈ D1 ensures x̄ ∈ ∩_{i∈I2} Ci and x̄ ∈ D2 ensures x̄ ∈ ∩_{i∈I1} Ci and so x̄ ∈ ∩_{1≤i≤n+2} Ci as desired.
Now suppose |I| = k > n + 2, and the assertion is true whenever |I| ≤ k − 1. The argument in the previous paragraph shows every subcollection of n + 2 sets from {Ci}_{i∈I} has nonempty intersection. Now consider the collection D1 := C1 ∩ C2 and Di := C_{i+1} for i = 2, ..., k − 1. Then D1, D2, ..., D_{k−1} is a collection of convex sets such that every subcollection of n + 1 or fewer sets has nonempty intersection. By the induction hypothesis, ∩_{i=1}^{k−1} Di is nonempty, that is ∩_{i∈I} Ci is not empty as desired. By mathematical induction, the result is true for every |I| ∈ N.
Now suppose {Ci}_{i∈I} is as in the statement of Helly's theorem. According to the previous paragraph, every finite subcollection has nonempty intersection. By the finite intersection property for compact sets, ∩_{i∈I} Ci is not empty.
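In dimension n = 1 the finite case of Helly's theorem says that closed intervals which intersect pairwise share a common point. A minimal Python sketch of this special case follows; the interval data and the helper names `pairwise_intersect` and `common_point` are illustrative assumptions, not part of the solution above.

```python
def pairwise_intersect(intervals):
    """True if every two closed intervals [a, b] in the list intersect."""
    return all(max(a1, a2) <= min(b1, b2)
               for (a1, b1) in intervals for (a2, b2) in intervals)

def common_point(intervals):
    """Return a point lying in all intervals, or None if there is none."""
    lo = max(a for a, _ in intervals)   # largest left endpoint
    hi = min(b for _, b in intervals)   # smallest right endpoint
    return lo if lo <= hi else None

# Assumed example: three pairwise intersecting intervals on the real line.
family = [(0, 3), (1, 4), (2, 5)]
point = common_point(family)

# A family that is not pairwise intersecting has no common point.
disjoint = [(0, 1), (2, 3)]
```

The largest left endpoint serves as the common point whenever it does not exceed the smallest right endpoint, which is exactly what pairwise intersection guarantees on the line.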

2.4.20. Let m := inf_C f. Then f ≥ −g where g := δC − m. The conditions imply we can apply the sandwich theorem (2.4.1) to find an affine function α such that

m − δC ≤ α ≤ f.

Then m ≤ inf_C α ≤ inf_C f as desired.
Suppose x̄ minimizes f on C, and write the affine separating function as α = φ + r. Then φ ∈ ∂f(x̄) and −φ ∈ ∂δC(x̄) and so 0 = φ − φ ∈ ∂f(x̄) + NC(x̄) as desired.
Conversely, suppose 0 ∈ ∂f(x̄) + NC(x̄). Then 0 ∈ ∂(f + δC)(x̄), and so f + δC attains its minimum at x̄. Therefore f attains its minimum on C at x̄.
Further notes. This last part could have been completed equally easily using the subdifferential
sum rule because

0 ∈ ∂(f + δC )(x̄) if and only if 0 ∈ ∂f (x̄) + ∂δC (x̄) if and only if 0 ∈ ∂f (x̄) + NC (x̄).

2.4.21. (a) Applying the Fenchel duality theorem (2.3.4) we obtain

inf_{x∈E} {δC(x) + σD(Ax)} = sup_{φ∈Y} {−δC*(A*φ) − σD*(−φ)}
= sup_{φ∈Y} {−σC(A*φ) − δD(−φ)}
= sup_{φ∈Y} {−sup_{x∈C} ⟨φ, Ax⟩ − δD(−φ)}
= sup_{φ∈Y} {inf_{x∈C} ⟨−φ, Ax⟩ − δD(−φ)}
= sup_{y∈D} inf_{x∈C} ⟨y, Ax⟩.

Further, the supremum is attained when finite according to the Fenchel duality theorem (2.3.4), and so we have

(9) inf_{x∈C} sup_{y∈D} ⟨y, Ax⟩ = max_{y∈D} inf_{x∈C} ⟨y, Ax⟩.

(b) In case (i) when D is bounded, σD has full domain and AC is not empty and so (2.4.17)
holds. In case (ii), when A is surjective and 0 ∈ int C, then 0 ∈ int AC because A is open. Clearly
0 ∈ dom σD , and thus (2.4.17) holds in this case as well.
(c) When D is compact, (2.4.17) holds by part (b)(i). The compactness of D and C then allows sup and inf to be replaced with max and min.

2.4.23. Let K be a nonempty subset of E. Then

K⁻ := {φ ∈ E : ⟨φ, x⟩ ≤ 0 for all x ∈ K}.

Given any x ∈ K, t ≥ 0 and φ ∈ K⁻ we have

⟨φ, tx⟩ = t⟨φ, x⟩ ≤ 0.

Therefore, R+K ⊂ (K⁻)⁻. Also, observe that for any set S, S⁻ is a closed convex set because it is an intersection of closed half-spaces. In particular, K⁻⁻ is a closed convex set containing R+K.
Now let C be the closed convex hull of R+K, and suppose x0 ∉ C. By the basic separation theorem (2.1.21), we choose φ ∈ E such that φ(x0) > sup_C φ. Observe that sup_C φ = 0 because 0 ∈ C, and if sup_C φ > 0, then there exist t > 0 and x̄ ∈ K such that φ(tx̄) > 0. Then lim_{n→∞} φ(nx̄) = ∞, and so φ would not be bounded above by φ(x0) on C. Since sup_C φ = 0, then φ ∈ K⁻, and so x0 ∉ (K⁻)⁻, and thus K⁻⁻ ⊂ C as desired.

2.4.24. We will prove the assertions for D + C, the other case follows by considering D + (−C).
Suppose d1 + c1 , d2 + c2 ∈ D + C. Then for 0 ≤ λ ≤ 1, using the convexity of D and C we obtain

λ(d1 + c1 ) + (1 − λ)(d2 + c2 ) = λd1 + (1 − λ)d2 + λc1 + (1 − λ)c2 ∈ D + C.

Now suppose (dn + cn)_{n=1}^∞ ⊂ D + C and dn + cn → x̄. Let (c_{nk}) be a convergent subsequence of (cn), say, c_{nk} → c̄ ∈ C. Then d_{nk} → x̄ − c̄. Thus x̄ − c̄ ∈ D, and so x̄ ∈ D + C as desired.

Exercises from Section 2.5

2.5.1. Suppose f is differentiable at x (by Theorem 2.2.1 f is automatically Fréchet differentiable), so given ε > 0 we choose δ > 0 so that

|f(x + h) − f(x) − ⟨f′(x), h⟩| ≤ (ε/2)‖h‖ whenever 0 ≤ ‖h‖ < δ.

Therefore, for ‖h‖ < δ, using the triangle inequality we obtain

|f(x + h) + f(x − h) − 2f(x)| = |f(x + h) − f(x) − ⟨f′(x), h⟩ + f(x − h) − f(x) − ⟨f′(x), −h⟩|
≤ (ε/2)‖h‖ + (ε/2)‖h‖ = ε‖h‖.

This implies lim_{‖h‖→0} [f(x + h) + f(x − h) − 2f(x)]/‖h‖ = 0.
Conversely, suppose f is not differentiable at x. Because f is continuous at x, the max formula (2.1.19) ensures ∂f(x) ≠ ∅, thus we use Theorem 2.2.1 to deduce that there are distinct φ, Λ ∈ ∂f(x). Thus we choose h0 ∈ SE so that (φ − Λ)(h0) > 0. Now let ε = (φ − Λ)(h0). By the subdifferential inequality

f(x + th0) − f(x) − φ(th0) + f(x − th0) − f(x) − Λ(−th0) ≥ 0, for all t.

In particular, f(x + th0) + f(x − th0) − 2f(x) ≥ (φ − Λ)(th0) = εt for all t > 0 and so

lim sup_{‖h‖→0} [f(x + h) + f(x − h) − 2f(x)]/‖h‖ ≥ ε

which establishes the ‘if’ assertion.
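The criterion of Exercise 2.5.1 can be illustrated numerically on the real line. In the sketch below, the symmetric quotient is evaluated at 0 for the differentiable function t ↦ t² and for the nondifferentiable function t ↦ |t|; the two test functions, the step sizes, and the helper name `sym_quotient` are assumptions chosen for the example.

```python
def sym_quotient(f, x, h):
    """Symmetric second-difference quotient (f(x+h) + f(x-h) - 2f(x)) / |h|."""
    return (f(x + h) + f(x - h) - 2 * f(x)) / abs(h)

steps = [10.0 ** (-k) for k in range(1, 7)]

# Differentiable at 0: the quotient behaves like 2|h| and tends to 0.
smooth = [sym_quotient(lambda t: t * t, 0.0, h) for h in steps]

# Not differentiable at 0: the quotient stays at 2 for every step size.
kinked = [sym_quotient(abs, 0.0, h) for h in steps]
```

For the absolute value, the subgradients φ = 1 and Λ = −1 at 0 give (φ − Λ)(h0) = 2 with h0 = 1, which matches the lower bound obtained in the converse argument.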

2.5.2. Let

G_{n,m} := {x ∈ U : sup_{‖h‖≤1/m} [f(x + h) + f(x − h) − 2f(x)] < 1/(nm)},

and On := ∪_{m≥1} G_{n,m} and G := ∩_{n≥1} On. Suppose x ∈ G and let ε > 0. Choose n such that 1/n < ε. Now find m such that x ∈ G_{n,m}, and choose δ so that 0 < δ < 1/m. The convexity of f implies

sup_{‖h‖=α} [f(x + h) + f(x − h) − 2f(x)] < (1/n)α whenever 0 < α ≤ 1/m.

Indeed, for 0 < λ ≤ 1 we have

f(x + λh) + f(x − λh) − 2f(x) ≤ λf(x + h) + (1 − λ)f(x) + λf(x − h) + (1 − λ)f(x) − 2f(x)
= λ[f(x + h) + f(x − h) − 2f(x)].

Exercise 2.5.1 implies that f is differentiable at x. Conversely, suppose f is differentiable at x, and fix n ∈ N. Choose 0 < ε < 1/n and use Exercise 2.5.1 to find δ > 0 so that

f(x + h) + f(x − h) − 2f(x) ≤ ε‖h‖ whenever ‖h‖ ≤ δ.

Then x ∈ G_{n,m} for all m > 1/δ. It follows that x ∈ G as desired.

It remains to verify that On is open for each n. Indeed, fix n and suppose x ∈ On. Then for some m0 ∈ N, x ∈ G_{n,m0} and, as above, the convexity of f implies that x ∈ G_{n,m} for all m ≥ m0. Now fix m > m0 sufficiently large so that f has Lipschitz constant, say K > 0, on B_{2/m}(x). Then choose ε > 0 so that

sup_{‖h‖≤1/m} [f(x + h) + f(x − h) − 2f(x)] < 1/(mn) − ε.

Now we choose α > 0 so that 4Kα < ε and α < 1/m. Now suppose ‖u − x‖ < α, then because f has Lipschitz constant K on B_{2/m}(x), for ‖h‖ ≤ 1/m we have

f(u + h) + f(u − h) − 2f(u) ≤ f(x + h) + f(x − h) − 2f(x) + 4K‖x − u‖ < 1/(mn) − ε + 4Kα.

Since 4Kα < ε, this shows u ∈ G_{n,m}, and so On is open as desired.

2.5.3. Suppose f : Rn → Rm is locally Lipschitz. Let pj be the j-th coordinate projection from
Rm to R. Define fj = pj ◦ f . Then fj is locally Lipschitz. Let D = {x ∈ Rn : fj is differentiable
at x, j = 1, 2, . . . , m}. It follows from Rademacher’s theorem, that Dc is a union of finitely many
null sets, and thus has measure 0. It remains to show that f is differentiable at each x ∈ D.
Indeed, fix x ∈ D and let A be the m by n matrix whose j-th row is ∇fj (x). It is not hard to
verify that ∇f(x) = A; indeed, for ε > 0, choose δ > 0 so that

|[fi(x + th) − fi(x)]/t − ⟨∇fi(x), h⟩| < ε/√m whenever h ∈ S_{R^n}, 0 < |t| < δ, i = 1, 2, ..., m.

Then for h ∈ S_{R^n} and 0 < |t| < δ, we have

‖[f(x + th) − f(x)]/t − Ah‖ = (Σ_{i=1}^m |[fi(x + th) − fi(x)]/t − ⟨∇fi(x), h⟩|²)^{1/2}
< (Σ_{i=1}^m ε²/m)^{1/2} = ε.

Hence ∇f(x) = A as desired.

2.5.4. Let  > 0, and let h ∈ SX . Let K > 0 be chosen so that f satisfies Lipschitz constant K
in a neighborhood δBr (x̄) of x̄ and kyk ≤ K. Now fix k ∈ N with khk − hk < /4K. Now choose
0 < δ < r so that

(10) |f (x̄ + thk ) − f (x̄) − hy, thk i| < t whenever |t| < δ.
4
Now for |t| < δ we have

|f (x̄ + th) − f (x̄) − hy, thi| ≤ |f (x̄ + thk ) − f (x̄) − hy, thk i| + 2Kkth − thk k
 
≤ |t| + 2K|t| = |t|.
2 4K

26
This shows f is Gâteaux differentiable at x̄ with ∇f(x̄) = y. So far, we did not use that X is finite-
dimensional. However, because X is finite-dimensional, and f is Lipschitz in a neighborhood of
x̄, Exercise 2.2.9 implies f is Fréchet differentiable at x̄ as desired.
Further comments. The reader will notice that the last part of the proof of Rademacher’s theorem
also proves this fact. Observe further that even the assertion that f is Gâteaux differentiable at x̄
may fail for continuous functions (the estimate with the Lipschitz constant above was crucial).
Indeed, on R^2 let f(x, y) = 0 whenever y ≤ √|x| and f(x, y) = y − √|x| for y ≥ √|x|. The
hypotheses of the exercise are satisfied at x̄ := (0, 0) for every direction h ∈ S_{R^2} with y := (0, 0)
except for the direction h := (0, 1); however, f fails to be Gâteaux differentiable at (0, 0).
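
The behaviour of this counterexample can be observed numerically. The sketch below assumes the reading f(x, y) = y − √|x| for y ≥ √|x| and 0 otherwise; the sample direction and step sizes are arbitrary illustrative choices. It evaluates difference quotients at the origin.

```python
import math

def f(x, y):
    # f(x, y) = 0 for y <= sqrt(|x|), and y - sqrt(|x|) for y >= sqrt(|x|).
    r = math.sqrt(abs(x))
    return y - r if y >= r else 0.0

def quotient(h, t):
    # Difference quotient (f(t h) - f(0, 0)) / t at the origin.
    return (f(t * h[0], t * h[1]) - f(0.0, 0.0)) / t

# Step sizes of both signs, shrinking toward 0.
steps = [s * 10.0 ** (-k) for k in range(1, 7) for s in (1.0, -1.0)]

# A direction with nonzero first coordinate: the quotients vanish for small |t|.
oblique = (math.cos(1.0), math.sin(1.0))
oblique_quotients = [quotient(oblique, t) for t in steps]

# The direction (0, 1): the one-sided quotients disagree.
vertical = (0.0, 1.0)
right = [quotient(vertical, t) for t in steps if t > 0]
left = [quotient(vertical, t) for t in steps if t < 0]
print(oblique_quotients, right, left)
```

Along the oblique direction the quotients are zero, matching y = (0, 0), while along (0, 1) the quotients from the right and from the left differ, so no two-sided directional derivative exists there.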

2.5.5. Let x be a boundary point of C. Because C has nonempty interior, we choose φ ∈ S_{X*} so
that φ(x) = sup_C φ and φ(x) > φ(y) whenever y ∈ int C. It follows that φ ∈ ∂d_C(x). Indeed, if
y ∈ C, then φ(y − x) ≤ 0 = d_C(y) − d_C(x). If y ∉ C, then φ(y − x) ≤ inf{‖y − u‖ : φ(u) ≤ φ(x)} ≤
d_C(y) = d_C(y) − d_C(x). Because 0 ∈ ∂d_C(x) as well (d_C attains its minimum at x), it follows
that ∂d_C(x) is not a singleton, so d_C is not differentiable at x. Thus d_C is a convex function that
is not differentiable at the boundary points of C, and consequently, the boundary of C is both first
category and Lebesgue-null.

2.5.7. (a)⇒(b): g is almost everywhere differentiable by Theorem 2.5.1. On the other hand, the
subgradient inequality holds on U. Together, these facts imply (b).
(b)⇒(c): trivial.
(c)⇒(a): Fix u, v in U, u ≠ v, and t ∈ (0, 1). It is not hard to see that there exist sequences (u_n)
in U, (v_n) in U, (t_n) in (0, 1) with u_n → u, v_n → v, t_n → t, and x_n := t_n u_n + (1 − t_n)v_n ∈ A, for
every n. By assumption, ∇g(x_n)(u_n − x_n) ≤ g(u_n) − g(x_n) and ∇g(x_n)(v_n − x_n) ≤ g(v_n) − g(x_n).
Since u_n − x_n = (1 − t_n)(u_n − v_n) and v_n − x_n = t_n(v_n − u_n), these inequalities read
(1 − t_n)∇g(x_n)(u_n − v_n) ≤ g(u_n) − g(x_n) and t_n∇g(x_n)(v_n − u_n) ≤ g(v_n) − g(x_n).
Multiplying the former inequality by t_n, the latter by 1 − t_n, and adding, we obtain

0 ≤ t_n g(u_n) − t_n g(x_n) + (1 − t_n)g(v_n) − (1 − t_n)g(x_n),

or in other words g(x_n) ≤ t_n g(u_n) + (1 − t_n)g(v_n). Now let n tend to +∞, and deduce that
g(tu + (1 − t)v) ≤ tg(u) + (1 − t)g(v). The convexity of g follows and the proof is complete.

2.5.8. Let f_n(x) := n[f(x + 1/n) − f(x)]. Let G := {x : f′(x) exists}; then f_n(x) → f′(x) for
x ∈ G. Let F_n := {x : |f_j(x) − f_k(x)| ≤ 1/2 for all j, k ≥ n}. Then F_n is closed because it
is an intersection of closed sets, since f_j − f_k is continuous for each k, j ∈ N. Clearly (f_n(x)) is
convergent for x ∈ G and so ⋃_n F_n ⊃ G. Suppose F_{n_0} contains an open interval I for some n_0 ∈ N.
Then |f_{n_0}(x) − f′(x)| ≤ 1/2 almost everywhere on I. By the fundamental theorem of calculus,
f′(x) = χ_S − χ_{R\S} almost everywhere, and so f′(x) = 1 on a dense subset of I and f′(x) = −1
on another dense subset of I. We conclude that f_{n_0} ≥ 1/2 on a dense subset of I, and f_{n_0} ≤ −1/2
on a dense subset of I, contradicting the continuity of f_{n_0}. Hence F_n is nowhere dense for each
n ∈ N and so G is a set of first category.

Exercises from Section 2.6

2.6.1. Using the bilinear property of φ, and then the symmetric property we compute

φ(x + y, x + y) = φ(x, x + y) + φ(y, x + y)
                = φ(x, x) + φ(x, y) + φ(y, x) + φ(y, y)
                = φ(x, x) + 2φ(x, y) + φ(y, y).

Therefore,
\[ \phi(x, y) = \frac{1}{2}\bigl[\phi(x+y, x+y) - \phi(x, x) - \phi(y, y)\bigr]. \]
That is, φ is uniquely determined by the values φ(h, h) for h ∈ E.
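
The polarization identity above is easy to test numerically. The following is a small sketch under the assumption that φ(x, y) = x^T A y for a symmetric matrix A; the matrix, the vectors, and the helper names `phi` and `polarize` are hypothetical choices for illustration.

```python
import random

def phi(A, x, y):
    # Bilinear form phi(x, y) = x^T A y for a matrix A given as nested lists.
    n = len(x)
    return sum(x[i] * A[i][j] * y[j] for i in range(n) for j in range(n))

def polarize(q, x, y):
    # Recover phi(x, y) from the quadratic form q(h) = phi(h, h).
    s = [a + b for a, b in zip(x, y)]
    return 0.5 * (q(s) - q(x) - q(y))

random.seed(0)
n = 3
M = [[random.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
# Symmetrize so that phi(x, y) = phi(y, x), as the identity requires.
A = [[0.5 * (M[i][j] + M[j][i]) for j in range(n)] for i in range(n)]
x = [random.uniform(-1.0, 1.0) for _ in range(n)]
y = [random.uniform(-1.0, 1.0) for _ in range(n)]

direct = phi(A, x, y)
recovered = polarize(lambda h: phi(A, h, h), x, y)
print(direct, recovered)
```

The two printed values should agree up to rounding. Without the symmetrization step the identity recovers only the symmetric part of the form.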

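2.6.3. (a) This part is essentially a restatement of the definitions. Indeed, assume that
\[ \Delta_t^2 f(x) : h \mapsto \frac{f(x+th) - f(x) - t\langle \nabla f(x), h \rangle}{\frac{1}{2}t^2} \]
converges uniformly on bounded sets to the function h ↦ ⟨Ah, h⟩ as t → 0 for some matrix A.
Then given ε > 0, choose δ > 0 so that
\[ \left| \frac{f(x+ty) - f(x) - t\langle \nabla f(x), y \rangle}{\frac{1}{2}t^2} - \langle Ay, y \rangle \right| < \varepsilon \quad \text{when } 0 < |t| < \delta,\ y \in S_X. \]
Now letting h := ty where y ∈ S_X and 0 < |t| < δ in the preceding implies
\[ f(x+h) = f(x) + \langle \nabla f(x), h \rangle + \frac{1}{2}\langle Ah, h \rangle + o(\|h\|^2), \quad \|h\| \to 0, \]
as desired. The converse implication follows essentially by reversing the preceding steps. Indeed,
suppose f has a strong second-order Taylor expansion at x. Given ε > 0, choose δ > 0 so that
\[ \left| f(x+h) - \Bigl( f(x) + \langle \nabla f(x), h \rangle + \frac{1}{2}\langle Ah, h \rangle \Bigr) \right| \le \frac{1}{2}\varepsilon\|h\|^2 \quad \text{when } \|h\| < \delta. \]
Then given any r > 0, for y ∈ rB_X and |t| < δ/r, we have ‖ty‖ < δ and hence
\[ \left| f(x+ty) - \Bigl( f(x) + t\langle \nabla f(x), y \rangle + \frac{1}{2}t^2\langle Ay, y \rangle \Bigr) \right| \le \frac{1}{2}\varepsilon\|ty\|^2 \le \frac{1}{2}\varepsilon r^2 t^2. \]
Dividing both sides by ½t² when t ≠ 0, we obtain
\[ \left| \frac{f(x+ty) - f(x) - t\langle \nabla f(x), y \rangle}{\frac{1}{2}t^2} - \langle Ay, y \rangle \right| \le \varepsilon r^2. \]
Because r is fixed and ε > 0 is arbitrary, Δ²_t f(x) converges to h ↦ ⟨Ah, h⟩ uniformly on bounded sets.

The argument for pointwise convergence is analogous.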

(b) As in the proof of Theorem 2.6.1, let A be a symmetric matrix. By part (a), q_t := ½Δ²_t f(x)
converges pointwise to q(h) := ½⟨Ah, h⟩. According to Proposition 2.6.3, the functions q_t are
closed and convex. Because they converge pointwise to q, it follows that q is convex. It follows
from Exercise 2.1.22 that the convergence is uniform on bounded sets.

2.6.4. (a) The definition of generalized Fréchet derivative implies φn → φ whenever φn ∈ ∂f (xn )
and xn → x. Then Corollary 2.5.3 implies ∂f (x) = {φ}. Thus f is Fréchet differentiable at x
with ∇f (x) = φ (Theorem 2.2.1).

(b) Suppose by way of contradiction that the definition of the generalized second-order Gâteaux
derivative at x works with distinct matrices A and B. Then we choose h ∈ S_X such that Ah ≠ Bh,
and set ε := ‖(B − A)h‖ > 0. Then choose δ > 0 so that
\[ \partial f(x+th) \subset \phi + A(th) + \frac{\varepsilon}{3}|t|B_E \quad \text{and} \quad \partial f(x+th) \subset \phi + B(th) + \frac{\varepsilon}{3}|t|B_E \quad \text{for } |t| < \delta. \]
Now for fixed 0 < t < δ, we let Λ ∈ ∂f(x + th) and write
\[ \Lambda = \phi + A(th) + y_1 \quad \text{and} \quad \Lambda = \phi + B(th) + y_2 \quad \text{where } \|y_1\|, \|y_2\| \le \frac{\varepsilon t}{3}. \]
Then ‖(A − B)(th)‖ ≤ 2εt/3, which contradicts ‖(A − B)(th)‖ = t‖(B − A)h‖ = εt.
(c) Suppose f has a generalized second-order Fréchet derivative at x. Then given ε > 0, there
exists δ > 0 so that
\[ \partial f(x+h) \subset \nabla f(x) + Ah + \varepsilon\|h\|B_E \quad \text{whenever } 0 < \|h\| \le \delta. \]
Therefore, if 0 < |t| < δ, h ∈ B_E and φ_t ∈ ∂f(x + th) we have
\[ \|\phi_t - \nabla f(x) - A(th)\| \le \varepsilon|t|, \quad \text{and so} \quad \left\| \frac{\phi_t - \nabla f(x)}{t} - Ah \right\| \le \varepsilon. \]
Therefore, lim_{t→0} (φ_t − ∇f(x))/t = Ah uniformly for h ∈ B_E as desired.
Conversely, suppose lim_{t→0} (φ_t − ∇f(x))/t = Ah uniformly for h in B_E. Given ε > 0, there exists
δ > 0 such that
\[ \left\| \frac{\phi_t - \nabla f(x)}{t} - Ah \right\| < \varepsilon \quad \text{whenever } 0 < |t| \le \delta,\ \|h\| = 1,\ \phi_t \in \partial f(x+th). \]
Thus ‖φ_t − ∇f(x) − A(th)‖ < ε|t|, or in other words,
\[ \|\phi - \nabla f(x) - Ah\| < \varepsilon\|h\| \quad \text{whenever } \phi \in \partial f(x+h),\ 0 < \|h\| \le \delta. \]
Because ε > 0 was arbitrary, this implies
\[ \partial f(x+h) \subset \nabla f(x) + Ah + o(\|h\|)B_E \]
as desired.

2.6.5. Suppose that for some matrix A : E → E, given any ε > 0 and bounded set W ⊂ E there
exists δ > 0 so that
\[ \Delta_t[\partial f](x)(h) - Ah \subset \varepsilon B_E \quad \text{for all } h \in W,\ t \in (0, \delta). \]
Applying this with W = S_E and arbitrary ε > 0 we find δ > 0 so that
\[ \frac{\partial f(x+th) - \nabla f(x)}{t} - Ah \subset \varepsilon B_E \quad \text{for all } h \in S_E,\ t \in (0, \delta). \]
In particular, when 0 < ‖u‖ < δ we write u = th where t = ‖u‖ and h ∈ S_E, and multiplying both
sides of the previous inclusion by t we obtain
\[ \partial f(x+u) \subset \nabla f(x) + Au + \varepsilon\|u\|B_E \]

and thus f has a generalized second-order Fréchet derivative at x.
Conversely, suppose f has a generalized second-order derivative at x. Let W ⊂ E be bounded
and let ε > 0. Choose K > 0 so that W ⊂ KB_E. Using the definition of generalized derivative,
choose η > 0 so that
\[ \partial f(x+h) \subset \nabla f(x) + Ah + \frac{\varepsilon}{K}\|h\|B_E \quad \text{whenever } 0 < \|h\| < \eta. \]
Now set δ := η/K and suppose 0 < t < δ and h ∈ W. Then ‖th‖ < η and using the previous
inclusion we obtain
\[ \partial f(x+th) - \nabla f(x) \subset A(th) + \frac{\varepsilon}{K}\|th\|B_E; \]
dividing both sides by t and noting ‖h‖ ≤ K, we obtain
\[ \Delta_t[\partial f](x)(h) - Ah \subset \varepsilon B_E \quad \text{for all } h \in W,\ t \in (0, \delta), \]
as desired.

2.6.6. The subgradient inequality ensures Δ²_t f(x) is nonnegative. The convexity, closedness and
properness of Δ²_t f(x) follow because f possesses those properties. We next verify
∂(½Δ²_t f(x)) = Δ_t[∂f](x). Indeed, suppose y ∈ ∂(½Δ²_t f(x))(h); then for u ∈ E,
\begin{align*}
\langle y, u \rangle &\le \frac{f(x+t(h+u)) - f(x) - t\langle \nabla f(x), h+u \rangle - [f(x+th) - f(x) - t\langle \nabla f(x), h \rangle]}{t^2} \\
&= \frac{f(x+t(h+u)) - f(x+th) - \langle \nabla f(x), tu \rangle}{t^2}.
\end{align*}
Multiplying both sides of the previous inequality by t (note: t > 0), we obtain
\[ \langle y, tu \rangle + \frac{1}{t}\langle \nabla f(x), tu \rangle \le \frac{f(x+t(h+u)) - f(x+th)}{t}. \]
Thus y + (1/t)∇f(x) ∈ (1/t)∂f(x + th), that is, y ∈ (∂f(x + th) − ∇f(x))/t. Therefore,
\[ \partial\Bigl(\frac{1}{2}\Delta_t^2 f(x)\Bigr) \subset \Delta_t[\partial f](x). \]

The reverse inclusion follows by roughly tracing the steps backwards.

2.6.8. First, f′(0) = lim_{h→0} (h³ cos(1/h) − 0)/h = 0 and f′(t) = 3t² cos(1/t) + t sin(1/t) when t ≠ 0, and
so f is continuously differentiable. Moreover,
\[ t^3\cos(1/t) = 0 + 0t + \frac{1}{2}0t^2 + o(t^2) = f(0) + f'(0)t + \frac{1}{2}0t^2 + o(t^2) \]
and so f has a second-order Taylor expansion at 0. However, f″ does not exist at 0 since
\[ \lim_{h \to 0} \frac{f'(h) - f'(0)}{h} = \lim_{h \to 0} \frac{3h^2\cos(1/h) + h\sin(1/h) - 0}{h} \]
does not exist.
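
Both claims can be illustrated numerically. The sketch below is only an illustration: the sample points are arbitrary choices, picked so that sin(1/h) is close to +1 or −1. It checks that f(t)/t² tends to 0 while the difference quotient of f′ at 0 has different limits along two sequences.

```python
import math

def f(t):
    # f(t) = t^3 cos(1/t) for t != 0, with f(0) = 0.
    return t ** 3 * math.cos(1.0 / t) if t != 0.0 else 0.0

def fprime(t):
    # f'(t) = 3 t^2 cos(1/t) + t sin(1/t) for t != 0, with f'(0) = 0.
    if t == 0.0:
        return 0.0
    return 3.0 * t * t * math.cos(1.0 / t) + t * math.sin(1.0 / t)

# Taylor remainder: |f(t)| / t^2 = |t cos(1/t)| <= |t|, so it tends to 0.
ts = [1e-1, 1e-2, 1e-3, 1e-4]
ratios = [abs(f(t)) / t ** 2 for t in ts]

def dq(h):
    # Difference quotient (f'(h) - f'(0)) / h.
    return (fprime(h) - fprime(0.0)) / h

# Along h_k with 1/h_k = 2*pi*k + pi/2 the quotient is close to +1 ...
h_plus = [1.0 / (2.0 * math.pi * k + math.pi / 2.0) for k in (10, 100, 1000)]
# ... and along 1/h_k = 2*pi*k + 3*pi/2 it is close to -1.
h_minus = [1.0 / (2.0 * math.pi * k + 3.0 * math.pi / 2.0) for k in (10, 100, 1000)]
print(ratios, [dq(h) for h in h_plus], [dq(h) for h in h_minus])
```

Since both sequences tend to 0 and the quotients stay near +1 and −1 respectively, the limit defining f″(0) cannot exist.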

Exercises from Section 2.7

2.7.1. Note that given a nonempty set S, conv(S) is the collection of convex combinations of
elements in S, that is, elements of the form Σ_{i=1}^m λ_i s_i where m ∈ N, s_i ∈ S, Σ_{i=1}^m λ_i = 1 and
λ_i ≥ 0 for all 1 ≤ i ≤ m.
By the extreme value theorem, f attains its maximum at some x̄ ∈ C. By Minkowski’s theorem
(2.7.2), we may write x̄ = Σ_{i=1}^m λ_i x_i where each x_i is an extreme point of C, λ_i ≥ 0 for 1 ≤ i ≤ m
and Σ_{i=1}^m λ_i = 1. By the convexity of f, f(x̄) ≤ Σ_{i=1}^m λ_i f(x_i). Because f attains its maximum at
x̄, this implies f(x_i) = f(x̄) for each 1 ≤ i ≤ m with λ_i > 0.
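
As a quick numerical illustration of this conclusion, the sketch below compares the maximum of a convex function over a grid on the square [−1, 1]² with its maximum over the four vertices, which are the extreme points of the square. The function f and the grid size are hypothetical choices made only for this example.

```python
def f(p):
    # A convex function on R^2 (a sum of convex terms); hypothetical example.
    x, y = p
    return (x - 0.3) ** 2 + 2.0 * (y + 0.2) ** 2 + abs(x + y)

# Extreme points of the square C = [-1, 1]^2.
vertices = [(-1.0, -1.0), (-1.0, 1.0), (1.0, -1.0), (1.0, 1.0)]

# A grid on C that contains the four vertices.
n = 40
grid = [(-1.0 + 2.0 * i / n, -1.0 + 2.0 * j / n)
        for i in range(n + 1) for j in range(n + 1)]

vertex_max = max(f(v) for v in vertices)
grid_max = max(f(p) for p in grid)
best_vertex = max(vertices, key=f)
print(best_vertex, vertex_max, grid_max)
```

No grid point should beat the best vertex, in line with the statement that a convex function on a compact convex set attains its maximum at an extreme point.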

2.7.2.(Exposed Points)
(a) Suppose x_0 is an exposed point of a convex set C. Choose φ ∈ E such that ⟨φ, x_0⟩ = sup_C φ
    and ⟨φ, x⟩ < ⟨φ, x_0⟩ for all x ∈ C \ {x_0}. Let x, y ∈ C \ {x_0}. Then for any 0 ≤ λ ≤ 1,
    φ(λx + (1 − λ)y) = λφ(x) + (1 − λ)φ(y) < φ(x_0).
    Thus x_0 is not a convex combination of x and y.
(b) Let C be a compact convex subset of E. Suppose x_0 ∈ C is an exposed point, exposed by
    φ ∈ E. Now suppose (x_n) ⊂ C is a sequence such that φ(x_n) → φ(x_0), but x_n ↛ x_0. By
    the compactness of C, there is a subsequence (x_{n_k}) converging to some x̄ ∈ C with
    x̄ ≠ x_0. Then φ(x̄) = lim φ(x_{n_k}) = φ(x_0). This contradicts the fact that φ exposes x_0
    in C.
(c) A proof that the exposed points are dense in the extreme points of a compact convex set (or
    any closed convex set in E) can be found in [369, Theorem 18.6, p. 167–68]; the proof uses
    the basic separation theorem (2.1.21) and Carathéodory’s theorem (1.2.5). We will outline
    another proof that every compact convex subset of E is the closed convex hull of its strongly
    exposed points that mimics techniques that will be used in Section 6.6.
    Indeed, let C be a compact convex subset of E. Then σ_C : E → R is a continuous convex
    function. By Theorem 2.5.1, σ_C is differentiable on a dense subset of E. Now let D be
    the closed convex hull of the exposed points of C. Suppose by way of contradiction that
    D ≠ C, that is, we fix x̄ ∈ C \ D. According to the basic separation theorem (2.1.21),
    we choose y ∈ E such that ⟨y, x̄⟩ > sup_D y. Because D is bounded, and the points of
    differentiability of σ_C are dense in E, we may choose φ ∈ E such that σ_C is differentiable
    at φ, and φ(x̄) > sup_D φ.
    Now let x_0 = ∇σ_C(φ) and so ∂σ_C(φ) = {x_0}. It is easy to check that φ(x_0) = σ_C(φ).
    Indeed, ⟨x_0, 2φ − φ⟩ ≤ σ_C(2φ) − σ_C(φ) = σ_C(φ) and ⟨x_0, 0 − φ⟩ ≤ σ_C(0) − σ_C(φ) = −σ_C(φ).
    Thus φ(x_0) = σ_C(φ). Further, x_0 ∈ C, for otherwise we would use the basic separation
    theorem (2.1.21) to find Λ ∈ E so that Λ(x_0) > σ_C(Λ). This provides the immediate
    contradiction ⟨x_0, Λ − φ⟩ > σ_C(Λ) − σ_C(φ). Finally, if u ∈ C is such that φ(u) = σ_C(φ), then
    ⟨u, Λ − φ⟩ ≤ σ_C(Λ) − σ_C(φ) for all Λ ∈ E
    and so u ∈ ∂σ_C(φ) = {x_0}. Thus φ attains its supremum on C uniquely at x_0, so x_0 is an
    exposed point of C. Since φ(x_0) = σ_C(φ) ≥ φ(x̄) > sup_D φ, this yields the contradiction
    that x_0 is an exposed point of C which is not in D.
Further notes. The set C := {(x, y) ∈ R^2 : −1 ≤ x ≤ 1, −√(1 − x^2) ≤ y ≤ 1 + √(1 − x^2)} is a
compact convex set that is not the convex hull of its exposed points. Indeed, any exposed point
(x, y) of C satisfies |x| < 1. Thus the closure is needed in (c).
