Lecture Notes "Convex Analysis": Univ.-Prof. Dr. Radu Ioan Bot
“Convex Analysis”
Summer Semester 2021
Univ.-Prof. Dr. Radu Ioan Boţ
Preface
“... in fact, the great watershed in optimization isn’t between linearity and non-
linearity, but convexity and nonconvexity.” (R. Tyrrell Rockafellar, 1993)
Contents
I Convex sets
1 Preliminary notions and results
2 The relative interior of a convex set in finite-dimensional spaces
II Convex functions
3 Algebraic properties of convex functions
4 Topological properties of convex functions
IV Duality theory
8 A general perturbation approach
9 Closedness-type regularity conditions
10 Fenchel duality
11 Lagrange duality
Bibliography
Chapter I
Convex sets
In this chapter we will discuss some algebraic and topological properties of convex
sets.
Throughout this lecture X will denote a nontrivial real normed space with
norm ‖ · ‖. For C, D ⊆ X, we denote by C + D := {x + y | x ∈ C, y ∈ D} the
Minkowski sum of C and D. For λ ∈ R, let λC := {λx | x ∈ C}.
Every affine set is convex. The empty set is affine and, therefore, convex and
it is a cone. Every linear subspace of X is an affine set. Every affine set which
contains the origin is a linear subspace. A cone C ⊆ X is convex if and only if
C + C ⊆ C.
Let C ⊆ X be a nonempty set. We have
For C an affine set and x ∈ C, one has C − x = C − y for all y ∈ C. The set
C − x is called the linear subspace parallel to C and the dimension of the affine
set C is defined as the dimension of C − x.
By induction one can prove that a set C ⊆ X is
(a) affine if and only if for all k ∈ N, $x_i \in C$ and $\lambda_i \in \mathbb{R}$, i = 1, ..., k, with
$\sum_{i=1}^{k} \lambda_i = 1$, it holds $\sum_{i=1}^{k} \lambda_i x_i \in C$;
(b) convex if and only if for all k ∈ N, $x_i \in C$ and $\lambda_i \ge 0$, i = 1, ..., k, with
$\sum_{i=1}^{k} \lambda_i = 1$, it holds $\sum_{i=1}^{k} \lambda_i x_i \in C$.
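The characterization of convexity through finite convex combinations can be tried out numerically. The following is a minimal sketch in Python, assuming the closed unit disc in R^2 as the convex set; all function names are hypothetical and chosen only for this illustration.

```python
import random

def in_unit_disc(p, tol=1e-12):
    """Membership test for the closed Euclidean unit disc in R^2 (a convex set)."""
    return p[0] ** 2 + p[1] ** 2 <= 1.0 + tol

def convex_combination(points, weights):
    """Return sum_i weights[i] * points[i]; the weights must be >= 0 and sum to 1."""
    assert all(w >= 0 for w in weights)
    assert abs(sum(weights) - 1.0) < 1e-9
    x = sum(w * p[0] for w, p in zip(weights, points))
    y = sum(w * p[1] for w, p in zip(weights, points))
    return (x, y)

def random_point_in_disc(rng):
    """Rejection sampling of a point of the unit disc."""
    while True:
        p = (rng.uniform(-1, 1), rng.uniform(-1, 1))
        if in_unit_disc(p):
            return p

def random_weights(rng, k):
    """Random nonnegative weights that sum to 1."""
    raw = [rng.random() for _ in range(k)]
    total = sum(raw)
    return [r / total for r in raw]

rng = random.Random(0)
points = [random_point_in_disc(rng) for _ in range(5)]
weights = random_weights(rng, 5)
combo = convex_combination(points, weights)   # stays in the disc

# An affine combination with a negative weight may leave the disc:
# 2 * (1, 0) + (-1) * (-1, 0) = (3, 0).
outside = (2 * 1.0 - 1 * (-1.0), 0.0)
```

The disc is convex but not affine, which is why the combination with weights 2 and −1 (summing to 1) is allowed to leave it.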
The intersection of an arbitrary family of convex sets (cones, affine sets, lin-
ear subspaces, respectively) is a convex set (cone, affine set, linear subspace,
respectively).
Definition 1.2 (linear hull, affine hull, convex hull, conical hull) Let C ⊆ X.
The set
(c) $\operatorname{co} C = \left\{ \sum_{i=1}^{k} \lambda_i x_i \;\middle|\; k \in \mathbb{N},\ x_i \in C,\ \lambda_i \ge 0,\ i = 1, ..., k,\ \sum_{i=1}^{k} \lambda_i = 1 \right\}$.
Proof. We will prove only statement (c). The other statements can be proven
in an analogous way.
Let $D := \left\{ \sum_{i=1}^{k} \lambda_i x_i \;\middle|\; k \in \mathbb{N},\ x_i \in C,\ \lambda_i \ge 0,\ i = 1, ..., k,\ \sum_{i=1}^{k} \lambda_i = 1 \right\}$. It is
obvious that C ⊆ D. Choosing an arbitrary element x ∈ D, this can be represented
as $x = \sum_{i=1}^{k} \lambda_i x_i$, for k ∈ N, $x_i \in C$ and $\lambda_i \ge 0$, i = 1, ..., k, such that
$\sum_{i=1}^{k} \lambda_i = 1$. Since $x_i \in \operatorname{co} C$, i = 1, ..., k, it holds that $x = \sum_{i=1}^{k} \lambda_i x_i \in \operatorname{co} C$.
This proves that C ⊆ D ⊆ co C.
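The conclusion will follow by proving that D is a convex set. Let x, y ∈ D
and t ∈ [0, 1]. Then $x = \sum_{i=1}^{k} \lambda_i x_i$, for k ∈ N, $x_i \in C$ and $\lambda_i \ge 0$, i = 1, ..., k, such
that $\sum_{i=1}^{k} \lambda_i = 1$, and $y = \sum_{j=1}^{l} \mu_j y_j$, for l ∈ N, $y_j \in C$ and $\mu_j \ge 0$, j = 1, ..., l, such
that $\sum_{j=1}^{l} \mu_j = 1$. It holds
$$tx + (1 - t)y = \sum_{i=1}^{k} t\lambda_i x_i + \sum_{j=1}^{l} (1 - t)\mu_j y_j \in D,$$
since $\sum_{i=1}^{k} t\lambda_i + \sum_{j=1}^{l} (1 - t)\mu_j = t + 1 - t = 1$.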
(a) linear, if T (λx + µy) = λT (x) + µT (y) for all x, y ∈ X and all λ, µ ∈ R;
(b) affine, if T (λx + (1 − λ)y) = λT (x) + (1 − λ)T (y) for all x, y ∈ X and all
λ ∈ R.
The next theorem shows that the interior and the closure of a convex set are
also convex sets.
Theorem 1.6 Let C ⊆ X be a convex set. The following statements are true:
Proof. (a) Let x, y ∈ cl C and λ ∈ [0, 1]. Let $(x^k)_{k \in \mathbb{N}}, (y^k)_{k \in \mathbb{N}} \subseteq C$ such that
$x^k \to x$ and $y^k \to y$ as k → +∞. Then $\lambda x^k + (1 - \lambda) y^k \in C$ for all k ∈ N,
consequently, λx + (1 − λ)y ∈ cl C.
(b) Let x ∈ int C, y ∈ cl C and λ ∈ (0, 1]. Then there exists ε > 0 such that
$x + B\left(0, \frac{2-\lambda}{\lambda}\varepsilon\right) = B\left(x, \frac{2-\lambda}{\lambda}\varepsilon\right) = \left\{u \in X \mid \|u - x\| < \frac{2-\lambda}{\lambda}\varepsilon\right\} \subseteq C$. On the other
hand, B(y, ε) ∩ C ≠ ∅ or, equivalently, y ∈ B(0, ε) + C. From here we obtain
Remark 1.7 If C ⊆ X is a given set, then cl(co C) is a convex set and it holds
The second is in general strict, as it can be seen for the set $C = (\mathbb{R} \times \{0\}) \cup \{(0, 1)\} \subseteq \mathbb{R}^2$.
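A short worked computation for this example, under the assumption that the second inclusion of the remark is co(cl C) ⊆ cl(co C). Every point of co C lies on a segment joining (0, 1) to a point (s, 0) of the line, which gives:

```latex
\begin{align*}
C &= (\mathbb{R}\times\{0\}) \cup \{(0,1)\}, \qquad \operatorname{cl} C = C,\\
\operatorname{co}(\operatorname{cl} C) = \operatorname{co} C
  &= \{((1-t)s,\; t) \mid s \in \mathbb{R},\ t \in [0,1]\}
   = (\mathbb{R}\times[0,1)) \cup \{(0,1)\},\\
\operatorname{cl}(\operatorname{co} C) &= \mathbb{R}\times[0,1].
\end{align*}
```

For instance, the point (1, 1) belongs to cl(co C) but not to co(cl C), so the inclusion is strict for this set.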
In the last part of this section we will propose some refinements of the concept
of the interior.
One always has that int C ⊆ core C ⊆ C. Indeed, let x ∈ int C and ε > 0
such that B(x, ε) ⊆ C. Let u ∈ X. Then there exists δ > 0 such that x + δu ∈
B(x, ε) ⊆ C. Consequently, for all λ ∈ [0, δ], it holds x + λu ∈ B(x, ε) ⊆ C, thus,
x ∈ core C.
Proof. It is easy to see that x ∈ core C implies ∀u ∈ X ∃δ > 0 such that x+δu ∈
C and also that this statement implies cone(C − x) = X.
Assume that cone(C − x) = X and let u ∈ X. Then there exists δ > 0 such
that $u \in \frac{1}{\delta}(C - x)$ or, equivalently, x + δu ∈ C. Let λ ∈ [0, δ]. Then
$$x + \lambda u = \left(1 - \frac{\lambda}{\delta}\right) x + \frac{\lambda}{\delta}(x + \delta u) \in C,$$
The following example introduces a convex set with nonempty algebraic inte-
rior and empty interior.
The following theorem provides conditions under which the algebraic interior of a
convex set coincides with the interior of the set.
(b) X is finite-dimensional;
thus, x ∈ int C.
(c) Assume that X is a Banach space and C is closed. Let x ∈ core C. By
the definition of the algebraic interior we have that $X = \bigcup_{n \in \mathbb{N}} n(C - x)$. By the
Baire Category Theorem (see, for instance, [4, Corollary 10.4]), we have that X
is a set of the second category, which means that there exists $n_0 \in \mathbb{N}$ such that
$\emptyset \neq \operatorname{int}(\operatorname{cl}(n_0(C - x))) = \operatorname{int}(n_0(C - x))$. Consequently, ∅ ≠ int(C − x), which means
that there exist u ∈ C − x and ε > 0 such that x + u + B(0, ε) ⊆ C. On the other
hand, since x ∈ core C, there exists t > 0 such that x − tu ∈ C. The convexity
of C yields
$$x + B\left(0, \frac{\varepsilon t}{t+1}\right) = \frac{t}{t+1}\,(x + u + B(0, \varepsilon)) + \frac{1}{t+1}\,(x - tu) \subseteq C,$$
thus x ∈ int C.
Definition 1.12 (algebraic relative interior/intrinsic core) Let C ⊆ X. The
algebraic relative interior or intrinsic core of the set C is defined as
icr C:={x ∈ X | ∀y ∈ aff C ∃δ > 0 such that ∀λ ∈ [−δ, δ] it holds x+λ(y−x) ∈ C}.
Proposition 1.13 Let C ⊆ X. Then x ∈ core C if and only if aff C = X and
x ∈ icr C.
Proof. “⇐” It follows from the definitions of the intrinsic core and of the core.
“⇒” Let x ∈ core C. It is clear that x ∈ icr C. We will prove that aff C = X.
Let u ∈ X. Then there exists λ > 0 such that x + λ(u − x) ∈ C. From here,
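we get $u \in \frac{1}{\lambda} C + \left(1 - \frac{1}{\lambda}\right) x \subseteq \frac{1}{\lambda} \operatorname{aff} C + \left(1 - \frac{1}{\lambda}\right) \operatorname{aff} C \subseteq \operatorname{aff} C$. This proves that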
aff C = X.
(a) lin(C − x) = cone(C − C), thus, cone(C − C) is the linear subspace parallel
to aff C;
(b)
(b) Let x ∈ icr C. From the definition of the intrinsic core it follows that
∀y ∈ C ∃δ > 0 such that (1 + δ)x − δy ∈ C.
Assuming that this property holds, we have that x − C ⊆ cone(C − x). Consequently,
C − C = C − x + x − C ⊆ cone(C − x) + cone(C − x) ⊆ cone(C − x)
and, further, cone(C − C) ⊆ cone(C − x). Since the opposite inclusion is always
true, it holds cone(C − x) = cone(C − C).
Assuming that cone(C − x) = cone(C − C), according to (a), we have that
cone(C − x) is a linear subspace.
Assume now that cone(C − x) is a linear subspace. We will prove that x ∈
icr C. Since C − x ⊆ cone(C − x) ⊆ cone(C − C) ⊆ lin(C − x), we have that
cone(C − x) = cone(C − C) = lin(C − x) = aff C − x. Let y ∈ aff C, y ≠ x. Then
there exist α, β > 0 such that y − x ∈ α(C − x) and x − y ∈ β(C − x). Let δ > 0
such that $-\frac{1}{\beta} < -\delta < \delta < \frac{1}{\alpha}$ and λ ∈ [−δ, δ]. If λ ∈ [0, δ], then x + λ(y − x) ∈
λαC + (1 − λα)x ⊆ C. If λ ∈ [−δ, 0), then x + λ(y − x) ∈ (−λβ)C + (1 + λβ)x ⊆ C.
This proves that x ∈ icr C.
Remark 1.16 (quasi interior and quasi-relative interior) In the literature one
can find two further notions which generalize the interior of a convex set C ⊆ X.
These are its quasi interior
qi C := {x ∈ C | cl(cone(C − x)) = X}
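One has
$$\operatorname{int} C \subseteq \operatorname{core} C \subseteq \operatorname{sqri} C \subseteq \operatorname{icr} C \subseteq \operatorname{qri} C \subseteq C \quad \text{and} \quad \operatorname{core} C \subseteq \operatorname{qi} C \subseteq \operatorname{qri} C,$$
all the inclusions being in general strict. If int C ≠ ∅, then all interior notions in
the above scheme become equal to int C.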
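For p ∈ [1, +∞), let $\ell^p$ be the real Banach space of real sequences $(t_n)_{n \in \mathbb{N}}$ such
that $\sum_{n=1}^{+\infty} |t_n|^p < +\infty$, equipped with the norm $\|\cdot\| : \ell^p \to \mathbb{R}$, $\|x\| = \left(\sum_{n=1}^{+\infty} |t_n|^p\right)^{1/p}$
for all $x = (t_n)_{n \in \mathbb{N}} \in \ell^p$. For $\ell^p_+ = \{(t_n)_{n \in \mathbb{N}} \in \ell^p : t_n \ge 0 \ \forall n \in \mathbb{N}\}$, the positive
cone of $\ell^p$, it holds
while
$$\operatorname{qi}(\ell^p_+) = \operatorname{qri}(\ell^p_+) = \{(t_n)_{n \in \mathbb{N}} \in \ell^p : t_n > 0 \ \forall n \in \mathbb{N}\}.$$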
and it represents the interior of the set C relative to its affine hull. If int C ≠ ∅,
then aff C = $\mathbb{R}^n$ and ri C = int C.
(c) ri(ri C) = ri C.
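Proof. (a) “⇐” It follows from the definition of the relative interior.
“⇒” As x ∈ ri C, it holds x ∈ C and there exists ε > 0 such that B(x, ε) ∩
aff C ⊆ C. We will prove that $B(x, \frac{\varepsilon}{2}) \cap \operatorname{aff} C \subseteq \operatorname{ri} C$. Let $y \in B(x, \frac{\varepsilon}{2}) \cap \operatorname{aff} C$.
Then y ∈ C and, since $B(y, \frac{\varepsilon}{2}) \subseteq B(x, \varepsilon)$, it holds $B(y, \frac{\varepsilon}{2}) \cap \operatorname{aff} C \subseteq C$. This
proves that y belongs to ri C.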
(b) The second equality follows according to Remark 2.3. Let x ∈ C and
y ∈ ri C ⊆ C. According to (a), there exists ε > 0 such that B(y, ε)∩aff C ⊆ ri C.
Since [y, x] := {(1 − λ)y + λx | λ ∈ [0, 1]} ⊆ aff C, it holds B(y, ε) ∩ [y, x] ⊆ ri C.
Consequently, there exists λ ∈ (0, 1) such that z := (1 − λ)y + λx ∈ ri C. Therefore
$x = \frac{1}{\lambda} z + \left(1 - \frac{1}{\lambda}\right) y \in \operatorname{aff}(\operatorname{ri} C)$, which proves that C ⊆ aff(ri C). From here it
follows that aff C ⊆ aff(ri C) ⊆ aff C.
(c) The statement is obviously true for ri C = ∅. Assume that ri C ≠ ∅.
According to (a) and (b) we have
The following theorem explains the prominent role the concept of relative
interior plays for convex sets in finite-dimensional spaces.
$\frac{1}{m+1} + \mu_i(y - x) \ge 0$ for all i = 0, ..., m, and $\sum_{i=0}^{m} \left(\frac{1}{m+1} + \mu_i(y - x)\right) = 1$, which
implies that $y \in \operatorname{co}\{x_0, x_1, ..., x_m\} = S$. This proves (2.1), consequently, x ∈ ri S.
We will see in the following that in finite-dimensional spaces the intrinsic core,
the strong quasi-relative interior and the relative interior of a convex set coincide.
icr C = sqri C = ri C.
Proof. The first equality follows from Proposition 1.14 (b), therefore, we will
only prove that icr C = ri C.
Let x ∈ ri C. Then x ∈ C and there exists ε > 0 such that B(x, ε)∩aff C ⊆ C.
Let y ∈ C. Then there exists δ > 0 such that x + δ(x − y) ∈ B(x, ε) and, since
x + δ(x − y) ∈ aff C, it holds x + δ(x − y) ∈ C. According to Proposition 1.14
(b), we have x ∈ icr C.
Let now x ∈ icr C. Then x ∈ C and for every u ∈ aff C − x = lin(C − x)
there exists δ > 0 such that for all λ ∈ [−δ, δ] it holds x + λu ∈ C. Let
m := dim(lin(C −x)) and {e1 , ..., em } a basis of lin(C −x) such that x±ei ∈ C for
all i = 1, ..., m. As in the proof of Theorem 1.11 (b), one can find a neighbourhood
of zero in the topology induced by the norm topology on lin(C − x), which can be
represented as B(0, ε)∩lin(C −x), for ε > 0, such that x+B(0, ε)∩lin(C −x) ⊆ C
or, equivalently, B(x, ε) ∩ aff C ⊆ C. This proves that x ∈ ri C.
Proof. The statements can be proved along the lines of Theorem 1.6 (b)-(d) by
taking into account that ri C ≠ ∅ and aff(ri C) = aff C = aff(cl C).
(ii) ri C1 = ri C2 ;
(iii) ri C1 ⊆ C2 ⊆ cl C1 .
In the following we will give a useful characterization of the relative interior
of a convex set.
Theorem 2.9 Let C ⊆ Rn be a nonempty convex set. It holds
x ∈ ri C ⇔ ∀y ∈ C ∃λ > 1 such that (1 − λ)y + λx ∈ C. (2.2)
Proof. “⇒” Let x ∈ ri C and ε > 0 such that B(x, ε) ∩ aff C ⊆ C. Let y ∈ C.
Then (1 − λ)y + λx ∈ aff C for all λ ∈ R and there exists λ̄ ∈ R, λ̄ > 1, such that
(1 − λ̄)y + λ̄x = x + (λ̄ − 1)(x − y) ∈ B(x, ε). This implies (1 − λ̄)y + λ̄x ∈ C.
“⇐” Since ri C ≠ ∅, we can choose an element y in this set. Consequently,
there exists λ > 1 such that z := (1 − λ)y + λx ∈ C. Then $\frac{1}{\lambda} \in (0, 1)$ and,
according to Theorem 2.7 (a), $x = \frac{1}{\lambda} z + \left(1 - \frac{1}{\lambda}\right) y \in \operatorname{ri} C$.
Remark 2.10 (a) Theorem 2.9 is actually saying that, for all x ∈ ri C and all
y ∈ C, the segment [x, y] can be extended beyond its endpoint x without leaving
the set C. This means not only that there exists λ > 1 with (1 − λ)y + λx ∈ C,
but also that (1 − µ)y + µx ∈ C holds for all µ ∈ [1, λ].
(b) If C ⊆ Rn is a nonempty convex set, then one also has
x ∈ ri C ⇔ ∀y ∈ cl C ∃λ > 1 such that (1 − λ)y + λx ∈ C
⇔ ∀y ∈ C ∃λ > 1 such that (1 − λ)y + λx ∈ ri C.
Let x ∈ ri C and y ∈ cl C. Let ε > 0 such that B(x, ε) ∩ aff C ⊆ C and
λ̄ ∈ R, λ̄ > 1, such that (1 − λ̄)y + λ̄x = x + (λ̄ − 1)(x − y) ∈ B(x, ε). Since
(1 − λ̄)y + λ̄x ∈ aff(cl C) = aff C, it follows (1 − λ̄)y + λ̄x ∈ C. This proves the
first equivalence.
The second equivalence follows from
x ∈ ri C ⇔ x ∈ ri(ri C)
⇔ ∀y ∈ cl(ri C) = cl C ∃λ > 1 such that (1 − λ)y + λx ∈ ri C.
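Criterion (2.2) lends itself to a quick numerical experiment. The sketch below, in Python with hypothetical helper names, takes C = [0, 1] × {0} ⊆ R^2, a set with empty interior whose relative interior is (0, 1) × {0}, and tests the criterion on finitely many points y and a few sampled values λ > 1. It is only a heuristic check: a finite sample cannot prove membership in ri C, and the fixed values of λ may be too large for points very close to the relative boundary.

```python
def in_C(p, tol=1e-12):
    """Membership in C = [0, 1] x {0}, a segment in R^2 with empty interior."""
    return abs(p[1]) <= tol and -tol <= p[0] <= 1.0 + tol

def can_extend_beyond(x, y, lambdas=(1.001, 1.01, 1.1)):
    """Check whether (1 - lam) * y + lam * x lies in C for some sampled lam > 1."""
    for lam in lambdas:
        p = ((1 - lam) * y[0] + lam * x[0], (1 - lam) * y[1] + lam * x[1])
        if in_C(p):
            return True
    return False

def passes_ri_test(x, samples=101):
    """Apply criterion (2.2) to x for a finite sample of points y in C."""
    ys = [(i / (samples - 1), 0.0) for i in range(samples)]
    return in_C(x) and all(can_extend_beyond(x, y) for y in ys)

midpoint_ok = passes_ri_test((0.5, 0.0))   # relative interior point
endpoint_ok = passes_ri_test((0.0, 0.0))   # relative boundary point
```

For the endpoint (0, 0) the segment from y = (1, 0) cannot be prolonged beyond (0, 0) inside C, so the test fails there, in line with the remark.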
The next theorem provides formulas for the closure and the relative interior
of the intersection of a family of convex sets.
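Theorem 2.11 Let $(C_i)_{i \in I} \subseteq \mathbb{R}^n$ be a family of convex sets with $\bigcap_{i \in I} \operatorname{ri} C_i \neq \emptyset$. It
holds
$$\operatorname{cl}\left(\bigcap_{i \in I} C_i\right) = \bigcap_{i \in I} \operatorname{cl} C_i. \tag{2.3}$$
In addition, if I is a finite index set, it holds
$$\operatorname{ri}\left(\bigcap_{i \in I} C_i\right) = \bigcap_{i \in I} \operatorname{ri} C_i. \tag{2.4}$$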
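Proof. Let $x \in \bigcap_{i \in I} \operatorname{ri} C_i$ and $y \in \bigcap_{i \in I} \operatorname{cl} C_i$. According to Theorem 2.9, for all
i ∈ I and all λ ∈ (0, 1] it holds $\lambda x + (1 - \lambda) y \in \operatorname{ri} C_i \subseteq C_i$. Consequently, for all
λ ∈ (0, 1], $y + \lambda(x - y) \in \bigcap_{i \in I} \operatorname{ri} C_i \subseteq \bigcap_{i \in I} C_i$. We let λ converge to zero and obtain
$y \in \operatorname{cl}\left(\bigcap_{i \in I} \operatorname{ri} C_i\right) \subseteq \operatorname{cl}\left(\bigcap_{i \in I} C_i\right)$. This proves that $\bigcap_{i \in I} \operatorname{cl} C_i \subseteq \operatorname{cl}\left(\bigcap_{i \in I} C_i\right)$. Since
the opposite inclusion is always true, relation (2.3) is proved.
We have shown above that
$$\operatorname{cl}\left(\bigcap_{i \in I} C_i\right) = \bigcap_{i \in I} \operatorname{cl} C_i \subseteq \operatorname{cl}\left(\bigcap_{i \in I} \operatorname{ri} C_i\right) \subseteq \operatorname{cl}\left(\bigcap_{i \in I} C_i\right),$$
consequently, $\operatorname{cl}\left(\bigcap_{i \in I} \operatorname{ri} C_i\right) = \operatorname{cl}\left(\bigcap_{i \in I} C_i\right)$. The sets $\bigcap_{i \in I} \operatorname{ri}(C_i)$ and $\bigcap_{i \in I} C_i$ are convex,
therefore, according to Corollary 2.8, $\operatorname{ri}\left(\bigcap_{i \in I} \operatorname{ri} C_i\right) = \operatorname{ri}\left(\bigcap_{i \in I} C_i\right)$, which implies
that $\operatorname{ri}\left(\bigcap_{i \in I} C_i\right) \subseteq \bigcap_{i \in I} \operatorname{ri} C_i$.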
In the following we characterize the relative interior and the closure of the
image and of the preimage of a convex set under an affine operator.
Proof. The first statement follows by the continuity of T and does not require
C to be convex. The second statement is obviously true if C is empty. We assume
that C ≠ ∅. We have
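Remark 2.15 For a finite family of sets $C_i \subseteq \mathbb{R}^{n_i}$, i = 1, ..., m, one has $\operatorname{cl}(C_1 \times ... \times C_m) = \operatorname{cl} C_1 \times ... \times \operatorname{cl} C_m$ and $\operatorname{aff}(C_1 \times ... \times C_m) = \operatorname{aff} C_1 \times ... \times \operatorname{aff} C_m$, which,
according to the definition of the relative interior, leads to
Considering the convex sets $C_i \subseteq \mathbb{R}^n$, i = 1, ..., m, and the affine operator
$T : \mathbb{R}^n \times ... \times \mathbb{R}^n \to \mathbb{R}^n$, $T(x^1, ..., x^m) = \sum_{i=1}^{m} \alpha_i x^i$, where $\alpha_i \in \mathbb{R}$, i = 1, ..., m, it
holds $T(C_1 \times ... \times C_m) = \sum_{i=1}^{m} \alpha_i C_i$ and, according to Theorem 2.13,
$$\sum_{i=1}^{m} \alpha_i \operatorname{cl} C_i \subseteq \operatorname{cl}\left(\sum_{i=1}^{m} \alpha_i C_i\right)$$
and
$$\sum_{i=1}^{m} \alpha_i \operatorname{ri} C_i = \operatorname{ri}\left(\sum_{i=1}^{m} \alpha_i C_i\right).$$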
cl(U ∩ V ) = cl U ∩ cl V = cl U ∩ V
and
ri(U ∩ V ) = ri U ∩ ri V = ri U ∩ V.
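Let $\operatorname{Pr}_{\mathbb{R}^n} : \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}^n$, $\operatorname{Pr}_{\mathbb{R}^n}(x, y) = x$, be the projection operator of
$\mathbb{R}^n \times \mathbb{R}^m$ onto $\mathbb{R}^n$. $\operatorname{Pr}_{\mathbb{R}^n}$ is a linear operator which fulfils $\operatorname{Pr}_{\mathbb{R}^n}(U \cap V) = \{x \in \mathbb{R}^n :$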
and
In the last part of this section we will formulate some separation theorems
in finite-dimensional spaces which refine the classical Hahn-Banach Separation
Theorems in real normed spaces, that we recall below.
Throughout this lecture $X^*$ will denote the dual space of X. We will use the
notation $\langle x^*, x \rangle := x^*(x)$ for the evaluation of $x^* \in X^*$ at x ∈ X.
Theorem 2.19 (Hahn-Banach Separation Theorem; see [4, Theorem 7.6]) Let
X be a nontrivial real normed space and C, D ⊆ X two convex sets such that C
is open and C ∩ D = ∅. Then there exists $x^* \in X^*$, $x^* \neq 0$, such that
$$\langle x^*, c \rangle > \langle x^*, d \rangle \quad \forall c \in C \ \forall d \in D.$$
Theorem 2.20 (Hahn-Banach Strong Separation Theorem; see [4, Theorem
7.7]) Let X be a nontrivial real normed space and C, D ⊆ X two convex sets such
that C is compact, D is closed and C ∩ D = ∅. Then there exists $x^* \in X^*$, $x^* \neq 0$,
such that
$$\inf_{c \in C} \langle x^*, c \rangle > \sup_{d \in D} \langle x^*, d \rangle.$$
We start with the following strong separation result which makes the choice
of the separating hyperplane more precise.
Theorem 2.21 Let $C \subseteq \mathbb{R}^n$ be a nonempty convex set and $d \notin \operatorname{cl} C$. Then
there exists $a \in \mathbb{R}^n$, $a \neq 0$, such that
$$\inf\{a^T x \mid x \in C\} = \inf\{a^T x \mid x \in \operatorname{cl} C\} > a^T d.$$
Moreover, one can choose a in the set cone(cl C − d) such that ‖a‖ = 1.
Proof. The set cl C is convex and closed. Let
$$f := \operatorname*{argmin}_{e \in \operatorname{cl} C} \|d - e\|$$
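To illustrate Theorem 2.21, the following sketch takes cl C to be the closed unit disc in R^2 and builds a from the nearest point f of cl C to d as a = (f − d)/‖f − d‖. This normalised choice of a is an assumption of the sketch (the standard projection argument), and the function names are hypothetical.

```python
import math

def project_onto_unit_disc(d):
    """Nearest point of the closed unit disc to d (the disc plays the role of cl C)."""
    norm = math.hypot(d[0], d[1])
    if norm <= 1.0:
        return d
    return (d[0] / norm, d[1] / norm)

def separating_direction(d):
    """Unit vector a = (f - d) / ||f - d||, with f the projection of d onto the disc."""
    f = project_onto_unit_disc(d)
    diff = (f[0] - d[0], f[1] - d[1])
    length = math.hypot(diff[0], diff[1])
    if length == 0.0:
        raise ValueError("d must lie outside the disc")
    return (diff[0] / length, diff[1] / length)

d = (2.0, 1.0)
a = separating_direction(d)
# For the unit disc, the infimum of a^T x over the disc equals -||a||.
inf_over_disc = -math.hypot(a[0], a[1])
a_dot_d = a[0] * d[0] + a[1] * d[1]
```

Here `inf_over_disc` is strictly larger than `a_dot_d`, and a is a positive multiple of f − d with f ∈ cl C, so it lies in cone(cl C − d) and has norm one.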
Theorem 2.22 and its consequence, Theorem 2.23, which we will formulate
below, extend the separation result in Theorem 2.19 to convex sets for which we
do not necessarily assume to have nonempty interiors.
Theorem 2.22 Let $C \subseteq \mathbb{R}^n$ be a nonempty convex set and $d \notin \operatorname{ri} C$. Then there
exists $a \in \mathbb{R}^n$, $a \neq 0$, such that
$$\inf\{a^T x \mid x \in C\} \ge a^T d$$
and
$$\sup\{a^T x \mid x \in C\} > a^T d.$$
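Proof. If $d \notin \operatorname{cl} C$, then the statement follows from Theorem 2.21. Assume that
d ∈ cl C \ ri C. According to Theorem 2.7 it holds $d \notin \operatorname{ri}(\operatorname{cl} C)$. This means that
for all k ∈ N we must have
$$B\left(d, \tfrac{1}{k}\right) \cap \operatorname{aff}(\operatorname{cl} C) \nsubseteq \operatorname{cl} C.$$
Therefore, for all k ∈ N there exists an element $d^k \in B\left(d, \tfrac{1}{k}\right) \cap \operatorname{aff}(\operatorname{cl} C) = B\left(d, \tfrac{1}{k}\right) \cap \operatorname{aff} C$ such that $d^k \notin \operatorname{cl} C$. According to Theorem 2.21, for all k ∈ N
there exists $a^k \in \mathbb{R}^n$ fulfilling $\|a^k\| = 1$, $a^k \in \operatorname{cone}(\operatorname{cl} C - d^k)$ and
$$(a^k)^T x > (a^k)^T d^k \quad \forall x \in \operatorname{cl} C. \tag{2.6}$$
Since $d^k \in \operatorname{aff}(\operatorname{cl} C) = \operatorname{aff} C$, it holds, according to Proposition 1.4,
$$\operatorname{cl} C - d^k \subseteq \operatorname{aff} C - d^k = \operatorname{lin}(C - d^k) =: U(\operatorname{aff} C) \quad \forall k \in \mathbb{N},$$
where U(aff C) denotes the linear subspace parallel to aff C. Consequently, $a^k \in \operatorname{cone}(\operatorname{cl} C - d^k) \subseteq U(\operatorname{aff} C)$ for all k ∈ N.
On the other hand, as $\|a^k\| = 1$ for all k ∈ N, there exist a subsequence
$(a^{k_j})_{j \in \mathbb{N}} \subseteq \mathbb{R}^n$ and an element $a \in \mathbb{R}^n$ such that $\lim_{j \to +\infty} a^{k_j} = a$. Therefore,
a ∈ U(aff C) and ‖a‖ = 1. From (2.6) it follows that
$$a^T x = \lim_{j \to +\infty} (a^{k_j})^T x \ge \lim_{j \to +\infty} (a^{k_j})^T d^{k_j} = a^T d \quad \forall x \in \operatorname{cl} C,$$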
$$\inf\{a^T c \mid c \in C_1\} \ge \sup\{a^T d \mid d \in C_2\}$$
and
$$\sup\{a^T c \mid c \in C_1\} > \inf\{a^T d \mid d \in C_2\}.$$
and
Chapter II
Convex functions
Proof. “(a) ⇒ (c)” Let (x, r), (y, s) ∈ $\operatorname{epi}_S f$ and λ ∈ [0, 1]. Then f(λx + (1 −
λ)y) ≤ λf(x) + (1 − λ)f(y) < λr + (1 − λ)s, thus
“(b) ⇒ (a)” Let x, y ∈ X and λ ∈ [0, 1]. If f (x) = +∞ or f (y) = +∞, then
(3.1) is obviously fulfilled. Assume that f (x) < +∞ and f (y) < +∞ and choose
arbitrary r, s ∈ R such that f (x) < r and f (y) < s. Then (x, r), (y, s) ∈ epi f
and, according to (b), λ(x, r)+(1−λ)(y, s) = (λx+(1−λ)y, λr +(1−λ)s) ∈ epi f
or, equivalently, f (λx + (1 − λ)y) ≤ λr + (1 − λ)s. We let r and s converge to
f (x) and f (y), respectively, and obtain, also in this case, (3.1).
are convex. The opposite statement is not true, as it can be seen for $f : \mathbb{R} \to \mathbb{R}$, $f(x) = x^3$.
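The counterexample can be checked directly. The sketch below assumes that the sets in question are the sublevel sets {x | f(x) ≤ λ}; it verifies on a grid that these are intervals for f(x) = x^3 and exhibits one triple (x, y, λ) at which the convexity inequality fails. The helper names are hypothetical.

```python
def f(x):
    return x ** 3

def sublevel_is_interval(level, grid):
    """On a sorted grid, check that {x : f(x) <= level} has no gaps."""
    idx = [i for i, x in enumerate(grid) if f(x) <= level]
    return not idx or idx[-1] - idx[0] + 1 == len(idx)

grid = [i / 10 for i in range(-30, 31)]          # sorted grid on [-3, 3]
levels_ok = all(sublevel_is_interval(l, grid) for l in (-8, -1, 0, 1, 8))

# Convexity would require f(lam*x + (1-lam)*y) <= lam*f(x) + (1-lam)*f(y).
x, y, lam = -2.0, 0.0, 0.5
lhs = f(lam * x + (1 - lam) * y)                 # f(-1)
rhs = lam * f(x) + (1 - lam) * f(y)              # 0.5 * f(-2) + 0.5 * f(0)
```

Since `lhs` exceeds `rhs`, the function is not convex, although each sampled sublevel set is an interval.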
Proof. Let z ∈ icr dom f . Since dom f is a convex set and x ∈ dom f , according
to Proposition 1.14 (b), there exists δ > 0 such that y := (1 + δ)z − δx ∈ dom f .
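It holds $z = \frac{1}{1+\delta} y + \frac{\delta}{1+\delta} x$, consequently,
$$f(z) = f\left(\frac{1}{1+\delta} y + \frac{\delta}{1+\delta} x\right) \le \frac{1}{1+\delta} f(y) + \frac{\delta}{1+\delta} f(x) = -\infty.$$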
The following notion will allow us to avoid some pathological classes of ex-
tended real-valued functions.
(b) positively homogeneous, namely, f (0) = 0 and f (λx) = λf (x) for all λ > 0
and all x ∈ X.
is sublinear.
(b) A convex function is sublinear if and only if it is positively homogeneous.
(c) Let C ⊆ X be a nonempty convex set. The Minkowski functional of C
Proof. The result follows from epi f = ∩i∈I epi fi and Proposition 3.5.
In the following we will assume that Y is another nontrivial real normed space.
is convex.
which shows that $\operatorname{epi}_S h = \operatorname{Pr}_{Y \times \mathbb{R}}(\operatorname{epi}_S \Phi)$. The convexity of h follows via Proposition 3.5.
Example 3.15 (distance function to a convex set) Let C ⊆ X be a convex set.
Then $\delta_C$ is a convex function, thus $\delta_C \,\Box\, \|\cdot\|$ is a convex function, too. Since for
all x ∈ X
$$(\delta_C \,\Box\, \|\cdot\|)(x) = \inf\{\|x - y\| + \delta_C(y) \mid y \in X\} = \inf\{\|x - y\| \mid y \in C\} = d_C(x),$$
the distance function to the convex set C, $d_C : X \to \mathbb{R}$, $d_C(x) = \inf_{y \in C} \|x - y\|$, is
convex.
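As a sanity check of this example, the sketch below evaluates the convexity inequality for the distance function to the interval C = [0, 1] ⊆ R on a grid. The choice of C, the grid and the function names are assumptions made for the illustration only.

```python
def dist_to_interval(x, lo=0.0, hi=1.0):
    """d_C(x) for C = [lo, hi] in R: zero inside C, distance to the nearest endpoint outside."""
    return max(lo - x, 0.0, x - hi)

def convexity_gap(func, x, y, lam):
    """lam*func(x) + (1-lam)*func(y) - func(lam*x + (1-lam)*y); nonnegative if func is convex."""
    return lam * func(x) + (1 - lam) * func(y) - func(lam * x + (1 - lam) * y)

points = [i / 4 for i in range(-12, 13)]         # grid on [-3, 3]
lams = [j / 10 for j in range(11)]               # lambda in {0, 0.1, ..., 1}
worst_gap = min(convexity_gap(dist_to_interval, x, y, lam)
                for x in points for y in points for lam in lams)
```

A value of `worst_gap` that is nonnegative up to rounding is consistent with the convexity of the distance function; it does not replace the proof above.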
Proof. “(a) ⇒ (b)” Assume that there exists (x, r) ∈ cl(epi f) \ epi f. In other
words, r < f(x). Let ε > 0 such that r < r + ε < f(x). Since f is lower
semicontinuous at x, there exists δ > 0 such that $\inf_{y \in B(x, \delta)} f(y) > r + \varepsilon$, which
implies that (B(x, δ) × (r − ε, r + ε)) ∩ epi f = ∅. Contradiction to (x, r) ∈ cl(epi f)!
“(b) ⇒ (c)” Let λ ∈ R be fixed and assume that there exists an element
x ∈ cl({z ∈ X | f(z) ≤ λ}) such that f(x) > λ. This means that (x, λ) ∉ epi f,
consequently, there exist ε > 0 and δ > 0 such that (B(x, δ) × (λ − ε, λ + ε)) ∩ epi f =
∅. This implies that f(z) > λ for all z ∈ B(x, δ), which contradicts the fact that
x ∈ cl({z ∈ X | f(z) ≤ λ}).
”(c) ⇒ (a)” Let x ∈ X. We will prove that (4.1) is fulfilled. If f (x) = −∞,
then there is nothing to be proved.
If f(x) = +∞, then, for all n ∈ N, x ∈ {z ∈ X | f(z) > n}, which is an open
set. Consequently, for all n ∈ N, there exists $\delta_n > 0$ such that $B(x, \delta_n) \subseteq \{z \in X \mid f(z) > n\}$. Thus, $\inf_{y \in B(x, \delta_n)} f(y) \ge n$. This yields $\sup_{\delta > 0} \inf_{y \in B(x, \delta)} f(y) = +\infty = f(x)$.
Finally, assume that f(x) ∈ R and choose an arbitrary ε > 0. The set
{z ∈ X | f(z) > f(x) − ε} is open and it contains x. Consequently, there exists
$\delta_\varepsilon > 0$ such that $B(x, \delta_\varepsilon) \subseteq \{z \in X \mid f(z) > f(x) - \varepsilon\}$, which means that
$\inf_{y \in B(x, \delta_\varepsilon)} f(y) \ge f(x) - \varepsilon$. Consequently,
Taking into account that ε > 0 was chosen arbitrarily, (4.1) follows.
The notion which we will introduce as follows is the largest lower semicontin-
uous function majorized by a given function.
Assume that f (x) < lim inf y→x f (y). Then there exists ε ∈ R such that
$$f(x) < \varepsilon < \liminf_{y \to x} f(y) = \sup_{\delta > 0} \inf_{y \in B(x, \delta)} f(y).$$
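From here it follows that there exists $\delta_\varepsilon > 0$ such that $\inf_{y \in B(x, \delta_\varepsilon)} f(y) > \varepsilon$. In
other words, for all $y \in B(x, \delta_\varepsilon)$, f(y) > ε or, equivalently, $(B(x, \delta_\varepsilon) \times (-\infty, \varepsilon)) \cap \operatorname{epi} f = \emptyset$. Let r ∈ R such that f(x) < r < ε. Then (x, r) ∈ cl(epi f) and,
since $(x, r) \in B(x, \delta_\varepsilon) \times (-\infty, \varepsilon)$, it follows that $(B(x, \delta_\varepsilon) \times (-\infty, \varepsilon)) \cap \operatorname{epi} f \neq \emptyset$.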
Contradiction!
Next we will recall some basic facts related to the weak topology on X induced
by X ∗ . The collection of sets
where
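$$V_{x_1^*, ..., x_n^*, \varepsilon}(0) := \{y \in X \mid |\langle x_i^*, y \rangle| < \varepsilon \ \forall i = 1, ..., n\},$$
form a basis of neighbourhoods of 0 for the weak topology (every neighbourhood
of 0 contains an element $V \in \mathcal{V}_w(0)$). Consequently, for x ∈ X, the collection of
sets
$$\mathcal{V}_w(x) := \left\{ V_{x_1^*, ..., x_n^*, \varepsilon}(x) \mid n \in \mathbb{N},\ x_i^* \in X^*,\ i = 1, ..., n,\ \varepsilon > 0 \right\},$$
where
$$V_{x_1^*, ..., x_n^*, \varepsilon}(x) := x + V_{x_1^*, ..., x_n^*, \varepsilon}(0) = \{y \in X \mid |\langle x_i^*, y - x \rangle| < \varepsilon \ \forall i = 1, ..., n\},$$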
C ⊆ cl C ⊆ clw C,
where clw C denotes the weak closure of C, which is the closure of C in the weak
topology. Indeed, let x ∈ cl C and n ∈ N, $x_i^* \in X^*$, i = 1, ..., n, and ε > 0. Let
$\delta := \frac{\varepsilon}{\max\{\|x_i^*\| \mid i = 1, ..., n\} + 1}$. Then there exists y ∈ C such that y ∈ B(x, δ). From
here we get that for every i = 1, ..., n, it holds $|\langle x_i^*, y - x \rangle| \le \|x_i^*\| \|y - x\| < \varepsilon$,
thus $y \in V_{x_1^*, ..., x_n^*, \varepsilon}(x)$. This proves that $x \in \operatorname{cl}_w C$.
The following theorem shows that the (norm) closure and the weak closure of
a convex set coincide.
cl C = clw C.
therefore, there exists ε > 0 such that $\inf_{c \in C} \langle x^*, c - x \rangle > \varepsilon$. This shows that
$V_{x^*, \varepsilon}(x) \cap C = \emptyset$, which contradicts $x \in \operatorname{cl}_w C$.
Definition 4.10 (weak lower semicontinuity) Let f : X → R and x ∈ X. We
say that f is weakly lower semicontinuous at x if
$$\liminf_{y \xrightarrow{w} x} f(y) = \sup_{V \in \mathcal{V}_w(x)} \inf_{y \in V} f(y) \ge f(x). \tag{4.3}$$
The following proposition shows that convex and lower semicontinuous func-
tions which take the value −∞ cannot take finite values.
Proof. Let z ∈ X such that f(z) = t ∈ R. We have (z, t) ∈ epi f and (x, t − n) ∈
epi f for all n ∈ N. Consequently, for all n ∈ N
$$\frac{1}{n}(x, t - n) + \left(1 - \frac{1}{n}\right)(z, t) = \left(\frac{1}{n} x + \left(1 - \frac{1}{n}\right) z,\ t - 1\right) \in \operatorname{epi} f,$$
thus (z, t − 1) ∈ cl(epi f) = epi f. This implies that f(z) ≤ f(z) − 1. Contradiction!
Proof. “⇐” The statement follows from Proposition 3.11 and Proposition 4.5.
“⇒” First we prove that the set
$$\{(x^*, c) \in X^* \times \mathbb{R} \mid \langle x^*, y \rangle + c \le f(y) \ \forall y \in X\}$$
is nonempty.
If f is identically +∞, then this is obviously the case. Let now z ∈ X such
that f(z) ∈ R. Since (z, f(z) − 1) ∉ epi f, according to Theorem 2.20, there exists
$(x^*, \alpha) \in X^* \times \mathbb{R}$, $(x^*, \alpha) \neq (0, 0)$, such that
$$\langle x^*, x \rangle + \alpha r > \langle x^*, z \rangle + \alpha(f(z) - 1) \quad \forall (x, r) \in \operatorname{epi} f. \tag{4.5}$$
Since (z, f(z)) ∈ epi f, it holds α > 0, thus, (4.5) can be equivalently written as
$$r > \frac{1}{\alpha} \langle x^*, z - x \rangle + f(z) - 1 \quad \forall (x, r) \in \operatorname{epi} f.$$
For all x ∈ dom f we have (x, f(x)) ∈ epi f, consequently,
$$f(x) > \frac{1}{\alpha} \langle x^*, z - x \rangle + f(z) - 1.$$
As this inequality holds also for x ∉ dom f,
$$x \mapsto -\frac{1}{\alpha} \langle x^*, x \rangle + \frac{1}{\alpha} \langle x^*, z \rangle + f(z) - 1$$
is an affine-continuous minorant of f.
It obviously holds
$$f(x) \ge \sup\{\langle x^*, x \rangle + c \mid (x^*, c) \in X^* \times \mathbb{R},\ \langle x^*, y \rangle + c \le f(y) \ \forall y \in X\} \quad \forall x \in X. \tag{4.6}$$
We will prove that (4.6) is fulfilled as equality. If f is identically +∞, then this
is obviously the case. Assume that f is not identically +∞ and that there exist
x̄ ∈ X and r̄ ∈ R such that
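$$f(\bar{x}) > \bar{r} > \sup\{\langle x^*, \bar{x} \rangle + c \mid (x^*, c) \in X^* \times \mathbb{R},\ \langle x^*, y \rangle + c \le f(y) \ \forall y \in X\}. \tag{4.7}$$
It holds $(\bar{x}, \bar{r}) \notin \operatorname{epi} f$ and so, applying again Theorem 2.20, we obtain an element
$(\bar{x}^*, \bar{\alpha}) \in X^* \times \mathbb{R}$, $(\bar{x}^*, \bar{\alpha}) \neq (0, 0)$, and ε > 0 such that
This means that the mapping $x \mapsto -\frac{1}{\bar{\alpha}} \langle \bar{x}^*, x \rangle + \frac{1}{\bar{\alpha}} \langle \bar{x}^*, \bar{x} \rangle + \bar{r}$ is an affine-continuous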
$$f(y) - f(x) \le \frac{M}{\delta} \|y - x\| \quad \forall y \in B(x, \delta).$$
On the other hand, for every y ∈ B(x, δ) we have 2x − y ∈ B(x, δ), thus, by using
again the convexity of f,
$$f(x) - f(y) \le f(2x - y) - f(x) \le \frac{M}{\delta} \|y - x\|.$$
This provides the conclusion.
Now we can prove the following result which characterizes the continuity of
convex functions.
Proof. Let x ∈ dom f , δ > 0 and M ≥ 0 such that f (y) ≤ M for all y ∈ B(x, δ).
Then B(x, δ) ⊆ dom f , and so x ∈ int(dom f ).
If f takes the value −∞, then, according to Proposition 3.7 and Proposition
1.13, f (z) = −∞ for all z ∈ icr(dom f ) = int(dom f ).
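Assume now that f is proper and let z ∈ int(dom f). Then there exists γ > 0
such that u := z + γ(z − x) ∈ dom f. For $\lambda := \frac{1}{1+\gamma} \in (0, 1)$ it yields
$$\lambda u + (1 - \lambda) B(x, \delta) = z - \frac{\gamma}{1+\gamma} x + \frac{\gamma}{1+\gamma} x + B\left(0, \frac{\gamma}{1+\gamma} \delta\right) = B\left(z, \frac{\gamma}{1+\gamma} \delta\right),$$
thus, for every $y \in B\left(z, \frac{\gamma}{1+\gamma} \delta\right)$, y := λu + (1 − λ)v, with v ∈ B(x, δ), and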
where
The following result shows that a proper and convex function which is bounded
above on a neighbourhood of a point of its domain is even locally Lipschitz
continuous on the interior of its domain.
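$$f(y) - f(z) \le \frac{\alpha}{\alpha+\delta} f(u) + \frac{\delta}{\alpha+\delta} f(z) - f(z) = \frac{\alpha}{\alpha+\delta} (f(u) - f(z)) < \frac{\alpha}{\alpha+\delta}.$$
Switching the roles of y and z allows us to conclude that
$$|f(y) - f(z)| < \frac{\alpha}{\alpha+\delta} < \frac{\alpha}{\delta} = \frac{1}{\delta} \|y - z\|,$$
thus the conclusion holds with $L := \frac{1}{\delta} > 0$.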
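Proof. Let x ∈ int(dom f) and 0 < γ ≤ 1 such that $x \pm \gamma e_i \in \operatorname{dom} f$ for
all i = 1, ..., n, where $e_i$, i = 1, ..., n, denotes the canonical basis of $\mathbb{R}^n$. Then
there exists $0 < \delta < \frac{\gamma}{\sqrt{n}}$ such that B(x, δ) ⊆ dom f. Let y ∈ B(x, δ). Then
$\sum_{i=1}^{n} |y_i - x_i| \le \sqrt{n} \sqrt{\sum_{i=1}^{n} |y_i - x_i|^2} \le \sqrt{n}\,\delta < \gamma$. We denote $\alpha_i := \frac{y_i - x_i}{\gamma}$ for all
i = 1, ..., n. Then $\sum_{i=1}^{n} |\alpha_i| < 1$ and
$$y = x + \sum_{i=1}^{n} \alpha_i \gamma e_i = \sum_{i : \alpha_i > 0} \alpha_i (x + \gamma e_i) + \sum_{i : \alpha_i < 0} (-\alpha_i)(x - \gamma e_i) + \left(1 - \sum_{i=1}^{n} |\alpha_i|\right) x.$$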
By denoting
$$M := \max\{f(x), f(x \pm \gamma e_1), ..., f(x \pm \gamma e_n)\},$$
it holds
$$f(y) \le \sum_{i : \alpha_i > 0} \alpha_i f(x + \gamma e_i) + \sum_{i : \alpha_i < 0} (-\alpha_i) f(x - \gamma e_i) + \left(1 - \sum_{i=1}^{n} |\alpha_i|\right) f(x) \le M,$$
which proves that f is bounded above on B(x, δ). The conclusion follows from
Theorem 4.18 and Theorem 4.19.
We close the section with a result which relates the lower semicontinuity of a
convex function defined on Banach spaces with their continuity.
Chapter III
Conjugate functions and subdifferentiability
In this chapter we will introduce the notions of conjugate function and (convex)
subdifferential of a (convex) function and investigate their properties.
5 Conjugate functions
Definition 5.1 (conjugate function) Let f : X → R be a given function. The
function
$$f^* : X^* \to \mathbb{R}, \quad f^*(x^*) := \sup_{x \in X} \{\langle x^*, x \rangle - f(x)\},$$
Remark 5.2 The conjugate function of f is convex and weakly* lower semicontinuous
(therefore, norm (strongly) lower semicontinuous on $X^*$). Indeed, if f is
not proper, then
• either f is identically +∞, which means that $f^*$ is identically −∞;
• or there exists $x_0 \in X$ such that $f(x_0) = -\infty$, which means that $f^*$ is
identically +∞.
If f is proper, then dom f ≠ ∅ and
$$f^*(x^*) = \sup_{x \in \operatorname{dom} f} \{\langle x^*, x \rangle - f(x)\} \quad \forall x^* \in X^*.$$
Since for all x ∈ dom f the mapping $x^* \mapsto \langle x^*, x \rangle - f(x)$ is affine and weakly*
lower semicontinuous, according to Proposition 3.11 and Proposition 4.5, $f^*$ is
convex and weakly* lower semicontinuous.
(c) if f ≤ g, then f ∗ ≥ g ∗ ;
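(d) $(\sup_{i \in I} f_i)^* \le \inf_{i \in I} f_i^*$ and $(\inf_{i \in I} f_i)^* = \sup_{i \in I} f_i^*$;
(h) if, for $z^* \in X^*$, $f_{z^*} : X \to \mathbb{R}$, $f_{z^*}(x) = f(x) + \langle z^*, x \rangle$, then $(f_{z^*})^*(x^*) = f^*(x^* - z^*)$ $\forall x^* \in X^*$;
(j) $(\lambda f + (1 - \lambda) g)^*(x^*) \le \lambda f^*(x^*) + (1 - \lambda) g^*(x^*)$ $\forall x^* \in X^*$ $\forall \lambda \in (0, 1)$.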
Another property which follows directly from the definition of the conjugate
functions is stated in the following proposition.
Then
$$f^* : X_1^* \times ... \times X_m^* \to \mathbb{R}, \quad f^*(x_1^*, ..., x_m^*) = \sum_{i=1}^{m} f_i^*(x_i^*).$$
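(c) For
$$f : \mathbb{R} \to \mathbb{R}, \quad f(x) = \begin{cases} x(\ln x - 1), & \text{if } x > 0, \\ 0, & \text{if } x = 0, \\ +\infty, & \text{if } x < 0, \end{cases}$$
it holds $f^* : \mathbb{R} \to \mathbb{R}$, $f^*(x^*) = e^{x^*}$.
(d) For $z^* \in X^*$ and c ∈ R, let f : X → R, $f(x) = \langle z^*, x \rangle + c$. For all $x^* \in X^*$
it holds
$$f^*(x^*) = \begin{cases} -c, & \text{if } x^* = z^*, \\ +\infty, & \text{otherwise.} \end{cases}$$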
(e) Let C ⊆ X be a given set. For all x∗ ∈ X ∗ it holds
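$$\|\cdot\|^*(x^*) = \delta_{B^*(0,1)}(x^*) = \begin{cases} 0, & \text{if } \|x^*\|_* \le 1, \\ +\infty, & \text{otherwise.} \end{cases}$$
(h) For 1 < p < ∞, let f : X → R, $f(x) = \frac{1}{p} \|x\|^p$. For all $x^* \in X^*$ it holds
$f^*(x^*) = \frac{1}{q} \|x^*\|_*^q$, where $\frac{1}{p} + \frac{1}{q} = 1$.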
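The closed-form conjugates in these examples can be compared with a brute-force evaluation of the supremum in the definition. The sketch below does this for the function x(ln x − 1) from example (c), whose conjugate is stated to be e^{x*}. The grid bounds and the function names are assumptions of the illustration; the grid must contain the maximiser, which here is e^{x*}.

```python
import math

def f(x):
    """Function from example (c): x(ln x - 1) for x > 0, 0 at x = 0, +inf for x < 0."""
    if x > 0:
        return x * (math.log(x) - 1.0)
    if x == 0:
        return 0.0
    return math.inf

def conjugate_on_grid(x_star, upper=20.0, steps=100000):
    """Approximate f*(x_star) = sup_x {x_star * x - f(x)} by a maximum over a uniform grid on [0, upper]."""
    return max(x_star * (upper * i / steps) - f(upper * i / steps)
               for i in range(steps + 1))

for x_star in (-1.0, 0.0, 1.0, 2.0):
    # The two printed values should agree to several decimal places.
    print(x_star, conjugate_on_grid(x_star), math.exp(x_star))
```

The grid maximum is always a lower bound for the true supremum, so agreement with e^{x*} only confirms the formula up to the grid resolution.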
$$A^* : Y^* \to X^*, \quad \langle A^* y^*, x \rangle = \langle y^*, Ax \rangle \quad \forall y^* \in Y^* \ \forall x \in X,$$
Proof. (a) By the definition of the conjugate function we have for all y ∗ ∈ Y ∗
(b) We define as in the proof of Proposition 3.14 the function f : X × ... × X →
R, $f(x_1, ..., x_m) = \sum_{i=1}^{m} f_i(x_i)$, and the continuous linear operator A : X × ... ×
X → X, $A(x_1, ..., x_m) = \sum_{i=1}^{m} x_i$, and recall that $Af = f_1 \,\Box\, ... \,\Box\, f_m$. According to
statement (a) we have
The conclusion follows by using Proposition 5.4 and the fact that the adjoint
operator of A is defined as A∗ x∗ = (x∗ , ..., x∗ ) for all x∗ ∈ X ∗ .
The following result shows that an arbitrary function and its lower semicon-
tinuous hull have the same conjugate function.
Remark 5.11 It is natural to ask whether it makes sense to continue the process
of defining conjugate functions of higher order for an arbitrary function f : X →
R. We will see that it does not make sense, since for
We will close this section by providing formulas for the conjugate of the com-
position of a proper, convex and lower semicontinuous function with a continuous
linear operator and for the sum of finitely many proper, convex and lower semicon-
tinuous functions, which we will obtain as a consequence of the Fenchel-Moreau
Theorem.
To this end we will denote in the following by
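$$\overline{g}^{w^*} : X^* \to \mathbb{R}, \quad \overline{g}^{w^*}(x^*) = \inf\{t \mid (x^*, t) \in \operatorname{cl}_{w^*}(\operatorname{epi} g)\},$$
$$(f_1 + ... + f_m)^* = \overline{f_1^* \,\Box\, \dots \,\Box\, f_m^*}^{\,w^*}.$$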
Next we will prove that $\overline{A^* f^*}^{\,w^*}$ is proper. If it is identically +∞, then, according
to (5.2), f ◦ A is identically −∞, which contradicts the properness of f. If there
exists $z^* \in X^*$ such that $\overline{A^* f^*}^{\,w^*}(z^*) = -\infty$, then f ◦ A is identically +∞, which
is a contradiction to the fact that $A^{-1}(\operatorname{dom} f) \neq \emptyset$.
Using again Theorem 5.10, it yields
$$\overline{A^* f^*}^{\,w^*}(x^*) = \left(\overline{A^* f^*}^{\,w^*}\right)^{**}(x^*) = (f \circ A)^*(x^*) \quad \forall x^* \in X^*.$$
6 Convex subdifferential
Definition 6.1 ((convex) subdifferential) Let f : X → R be a given function
and x ∈ X be such that f (x) ∈ R. The set defined as
and that for all y ∈ dom f the set $\{x^* \in X^* \mid \langle x^*, y \rangle - f(y) \le \langle x^*, x \rangle - f(x)\}$
is convex and weakly* closed, being either the empty set (if f(y) = −∞) or the
lower level set of the affine and weakly* continuous mapping $x^* \mapsto \langle x^*, y \rangle - f(y)$.
The function
$$f : \mathbb{R} \to \mathbb{R}, \quad f(x) = \begin{cases} -\sqrt{x}, & \text{if } x \ge 0, \\ +\infty, & \text{otherwise,} \end{cases}$$
$$-\sqrt{y} \ge x^* y \quad \text{or, equivalently,} \quad -\frac{1}{\sqrt{y}} \ge x^*.$$
Contradiction!
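The contradiction can be made concrete: for every candidate slope one can name a point y > 0 at which the subgradient inequality at 0 fails. The sketch below, with hypothetical function names, assumes that the point under consideration is x = 0 and returns such a y for any real slope.

```python
import math

def f(x):
    """f(x) = -sqrt(x) for x >= 0 and +inf otherwise."""
    return -math.sqrt(x) if x >= 0 else math.inf

def violating_point(slope):
    """Return y > 0 with f(y) < f(0) + slope * y, so slope is not a subgradient of f at 0."""
    if slope >= 0:
        return 1.0                       # f(1) = -1 < 0 <= slope * 1
    # For slope < 0: -sqrt(y) = -1/(2|slope|) < -1/(4|slope|) = slope * y.
    return 1.0 / (4.0 * slope ** 2)

examples = {s: violating_point(s) for s in (-1000.0, -1.0, -0.01, 0.0, 5.0)}
```

The more negative the slope, the closer to 0 the violating point has to be, which mirrors the fact that −1/√y is unbounded below as y tends to 0.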
The following proposition shows that the Young-Fenchel inequality for a func-
tion f : X → R is fulfilled as equality only for pairs (x, x∗ ) lying on the graph of
its (convex) subdifferential, namely, fulfilling x∗ ∈ ∂f (x).
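A one-dimensional illustration of this statement, assuming the standard pair f(x) = x²/2 on R with conjugate f*(x*) = (x*)²/2 and ∂f(x) = {x}. The helper names are hypothetical.

```python
def f(x):
    return 0.5 * x * x

def f_conj(x_star):
    """Conjugate of f(x) = x^2 / 2 on R, namely f*(x*) = (x*)^2 / 2."""
    return 0.5 * x_star * x_star

def young_fenchel_gap(x, x_star):
    """f(x) + f*(x_star) - x_star * x, which is >= 0 by the Young-Fenchel inequality."""
    return f(x) + f_conj(x_star) - x_star * x

gap_on_graph = young_fenchel_gap(1.5, 1.5)    # x* = f'(x): the pair lies on the graph of the subdifferential
gap_off_graph = young_fenchel_gap(1.5, 2.0)   # x* is not a subgradient of f at x
```

For this pair the gap equals (x − x*)²/2, so it vanishes exactly when x* = x, that is, when x* ∈ ∂f(x).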
$$h \le f^{**} \le \bar{f} \le f. \tag{6.1}$$
Taking into consideration that h(x) = f(x), we deduce that $h(x) = f^{**}(x) = \bar{f}(x) = f(x)$, which implies that f is lower semicontinuous at x. The properness
of f, $\bar{f}$ and $f^{**}$ follows from (6.1).
(b) It follows by using Proposition 6.3, statement (a) and that (see Proposition
5.7 and Remark 5.11)
$\bar{f} = (\bar{f})^{**} = ((\bar{f})^*)^* = f^{**}.$
(d) if, for $z^* \in X^*$, $f_{z^*} : X \to \overline{\mathbb{R}}$, $f_{z^*}(x) = f(x) + \langle z^*, x\rangle$, then $\partial f_{z^*}(x) = \partial f(x) + z^*$ $\forall x \in X$.
Then
∂f (x1 , ..., xm ) = ∂f1 (x1 ) × ... × ∂fm (xm ) ∀(x1 , ..., xm ) ∈ X1 × ... × Xm .
Example 6.7 (a) Let C ⊆ X be a given set. Then the (convex) subdifferential
of its indicator function δC reads
(b) It holds
The next result displays some connections between the (convex) subdifferen-
tial of a given function f and the one of its conjugate.
The next theorem shows that proper and convex functions defined on a
normed space have nonempty (convex) subdifferentials at their points of con-
tinuity.
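Proof. Let $\delta > 0$ such that $f(y) < f(x) + 1$ for all $y \in B(x, \delta)$. Then $B(x, \delta) \times [f(x) + 1, +\infty) \subseteq \operatorname{epi} f$, which means that $\operatorname{int}(\operatorname{epi} f) \neq \emptyset$. We have that $(x, f(x)) \notin \operatorname{int}(\operatorname{epi} f)$, since, otherwise, there would exist $\eta > 0$ such that $(x, f(x) - \eta) \in \operatorname{epi} f$, which is impossible. Taking into account that $\operatorname{int}(\operatorname{epi} f)$ is convex, by the Hahn-Banach Separation Theorem (Theorem 2.19) there exists $(x^*, \alpha) \in X^* \times \mathbb{R}$, $(x^*, \alpha) \neq (0, 0)$, such that
$\langle x^*, y\rangle + \alpha r \ge \langle x^*, x\rangle + \alpha f(x) \quad \forall (y, r) \in \operatorname{cl}(\operatorname{epi} f) = \operatorname{cl}(\operatorname{int}(\operatorname{epi} f)). \quad (6.2)$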
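Choosing $(y, r) := (x, f(x) + 1) \in \operatorname{epi} f$ in (6.2), it yields $\alpha \ge 0$. If $\alpha = 0$, then, from (6.2) it follows that
$\langle x^*, y - x\rangle \ge 0 \quad \forall y \in \operatorname{dom} f,$
therefore,
$\langle x^*, u\rangle \ge 0 \quad \forall u \in B(0, \delta).$
This further implies that $x^* = 0$, which leads to a contradiction. Consequently, $\alpha > 0$. We divide (6.2) by $\alpha$ and, using that for all $y \in \operatorname{dom} f$ it holds $(y, f(y)) \in \operatorname{epi} f$, we get
$f(y) \ge \left\langle -\frac{1}{\alpha} x^*, y - x \right\rangle + f(x) \quad \forall y \in \operatorname{dom} f,$
which gives
$f(y) \ge \left\langle -\frac{1}{\alpha} x^*, y - x \right\rangle + f(x) \quad \forall y \in X,$
thus $-\frac{1}{\alpha} x^* \in \partial f(x)$.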
In order to prove the second statement of the theorem we consider an arbitrary element $x^* \in \partial f(x)$. Then, for every $u \in B(0, \delta)$, it holds $\langle x^*, u\rangle \le f(x + u) - f(x) < 1$. This means that
$\delta\|x^*\|_* = \delta \sup_{\|v\| < 1} |\langle x^*, v\rangle| = \sup_{\|v\| < 1} |\langle x^*, \delta v\rangle| = \sup_{\|u\| < \delta} |\langle x^*, u\rangle| = \sup_{\|u\| < \delta} \langle x^*, u\rangle \le 1,$
which proves that $x^* \in B^*(0, \frac{1}{\delta})$. We have shown that $\partial f(x) \subseteq \frac{1}{\delta} B^*(0, 1)$, in other words, that $\partial f(x)$ is norm bounded.
According to the Theorem of Banach-Alaoglu (see, for instance, [8]), the closed
unit ball of X ∗ is weakly∗ compact and, since ∂f (x) is weakly∗ closed, it is weakly∗
compact, too.
which means that $(x, f(x)) \notin \operatorname{ri}(\operatorname{epi} f)$. By Theorem 2.22, there exists $(a, \alpha) \in \mathbb{R}^n \times \mathbb{R}$, $(a, \alpha) \neq (0, 0)$, such that
and
$\sup\{a^T y + \alpha r \mid (y, r) \in \operatorname{epi} f\} > a^T x + \alpha f(x). \quad (6.4)$
As in the proof of Theorem 6.9, from (6.3) it yields that α ≥ 0. If α = 0, then
the two above statements become
and, respectively,
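$\sup\{a^T y \mid y \in \operatorname{dom} f\} > a^T x. \quad (6.6)$
We will prove that $a^T z = a^T x$ for all $z \in \operatorname{dom} f$, which will then contradict (6.6). Indeed, since $x \in \operatorname{ri}(\operatorname{dom} f)$, there exists $\varepsilon > 0$ such that $B(x, \varepsilon) \cap \operatorname{aff}(\operatorname{dom} f) \subseteq \operatorname{dom} f$. Let $z \in \operatorname{dom} f$ be arbitrarily chosen. Then there exists $\lambda < 0$ such that $x + \lambda(z - x) \in B(x, \varepsilon)$ and, since $x + \lambda(z - x) \in \operatorname{aff}(\operatorname{dom} f)$, it yields $x + \lambda(z - x) \in \operatorname{dom} f$. According to (6.5) it holds $a^T(x + \lambda(z - x)) \ge a^T x$ or, equivalently, $a^T z \le a^T x$. Since the opposite inequality is also true, we get $a^T z = a^T x$.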
This proves that α > 0. The conclusion follows as in the proof of Theorem
6.9 after dividing (6.3) by α.
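Proof. "$\subseteq$" Let $x^* \in \partial f(x)$. For all $u \in X$ and all $t > 0$ it holds
$\frac{f(x + tu) - f(x)}{t} \ge \frac{1}{t}\langle x^*, x + tu - x\rangle = \langle x^*, u\rangle,$
which implies that $f'(x; u) \ge \langle x^*, u\rangle$.
"$\supseteq$" Let $x^* \in X^*$ such that $f'(x; u) \ge \langle x^*, u\rangle$ for all $u \in X$. Then for all $u \in X$ it holds
$\langle x^*, u\rangle \le f'(x; u) = \lim_{t \downarrow 0,\, t \le 1} \frac{f((1 - t)x + t(x + u)) - f(x)}{t} \le \lim_{t \downarrow 0,\, t \le 1} \frac{t(f(x + u) - f(x))}{t} = f(x + u) - f(x).$
Consequently,
$f(y) - f(x) \ge \langle x^*, y - x\rangle \quad \forall y \in X,$
which implies $x^* \in \partial f(x)$.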
Proof. Let $\delta > 0$ such that $f(y) - f(x) < 1$ for all $y \in B(x, \delta)$. Then $B(x, \delta) \subseteq \operatorname{dom} f$, thus $x \in \operatorname{int}(\operatorname{dom} f) = \operatorname{core}(\operatorname{dom} f)$. According to Theorem 7.1 it holds $f'(x; u) \in \mathbb{R}$ for all $u \in X$. On the other hand, for all $u \in B(0, \delta)$ we have
$f'(x; u) = \inf_{t > 0} \frac{f(x + tu) - f(x)}{t} \le f(x + u) - f(x) < 1,$
in other words, $f'(x; \cdot)$ is bounded above on $B(0, \delta)$. According to Theorem 4.18, $f'(x; \cdot)$ is continuous on $\operatorname{int}(\operatorname{dom} f'(x; \cdot)) = X$, which proves the first statement of the theorem.
For all $x^* \in X^*$ we have
$(f'(x; \cdot))^*(x^*) = \sup_{u \in X} \left\{ \langle x^*, u\rangle - \inf_{t > 0} \frac{f(x + tu) - f(x)}{t} \right\} = \sup_{u \in X,\, t > 0} \frac{\langle x^*, tu\rangle - f(x + tu) + f(x)}{t}.$
$f'(x; u) = (f'(x; \cdot))^{**}(u) = (\delta_{\partial f(x)})^*(u) = \sigma_{\partial f(x)}(u) = \sup\{\langle x^*, u\rangle \mid x^* \in \partial f(x)\}.$
It follows from the fact that ∂f (x) is weakly∗ compact (see Theorem 6.9) that
the supremum is attained.
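The identity between the directional derivative and the support function of the subdifferential can be checked on a concrete function. The sketch below is an illustration under stated assumptions, not part of the notes: it takes $f = \|\cdot\|_1$ on $\mathbb{R}^3$, whose subdifferential at $x$ is the box with $s_i = \operatorname{sign}(x_i)$ for $x_i \neq 0$ and $s_i \in [-1, 1]$ for $x_i = 0$, and compares the support function of this box with a one-sided difference quotient. All function names are hypothetical.

```python
def l1_norm(x):
    """f(x) = |x_1| + ... + |x_n|."""
    return sum(abs(xi) for xi in x)


def support_of_subdifferential(x, u):
    """Support function of the subdifferential of the l1 norm at x, evaluated at u.

    Coordinates with x_i != 0 contribute sign(x_i) * u_i; coordinates with
    x_i == 0 contribute max over s in [-1, 1] of s * u_i, which is |u_i|.
    """
    total = 0.0
    for xi, ui in zip(x, u):
        if xi > 0:
            total += ui
        elif xi < 0:
            total -= ui
        else:
            total += abs(ui)
    return total


def difference_quotient(x, u, t):
    """(f(x + t u) - f(x)) / t for a step size t > 0."""
    shifted = [xi + t * ui for xi, ui in zip(x, u)]
    return (l1_norm(shifted) - l1_norm(x)) / t


x = [1.0, 0.0, -2.0]
u = [0.5, -3.0, 1.0]
quotient = difference_quotient(x, u, 1e-3)
support = support_of_subdifferential(x, u)
assert abs(quotient - support) < 1e-9
```

Since the $\ell_1$ norm is piecewise linear, the difference quotient already equals the directional derivative for every sufficiently small $t$, so no limit has to be approximated here.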
Theorem 7.2 and Theorem 7.3 have important consequences when applied to
convex and Gâteaux differentiable functions. We recall that a proper function
is continuous and Gâteaux differentiable at (0, 0), and it is not Fréchet differen-
tiable at (0, 0).
(c) If f : X → R is proper and Gâteaux differentiable on B(x, δ) ⊆ dom f , for
δ > 0, and the Gâteaux derivative ∇f : B(x, δ) → X ∗ is continuous at x, then f
is Fréchet differentiable at x.
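In order to show this, we consider an arbitrary $u \in B(0, \delta)$, $u \neq 0$. Then there exists $\varepsilon > 0$ such that $x + tu \in B(x, \delta)$ for all $t \in (-\varepsilon, 1 + \varepsilon)$. We define $\phi : (-\varepsilon, 1 + \varepsilon) \to \mathbb{R}$, $\phi(t) = f(x + tu) - f(x) - t\langle \nabla f(x), u\rangle$. For all $t \in (-\varepsilon, 1 + \varepsilon)$ it holds
$\phi'(t) = \lim_{s \to 0} \frac{\phi(t + s) - \phi(t)}{s} = \lim_{s \to 0} \frac{f(x + (t + s)u) - f(x + tu)}{s} - \langle \nabla f(x), u\rangle = \langle \nabla f(x + tu) - \nabla f(x), u\rangle,$
consequently,
$\frac{|f(x + u) - f(x) - \langle \nabla f(x), u\rangle|}{\|u\|} \le \sup_{t \in [0,1]} \|\nabla f(x + tu) - \nabla f(x)\|_*.$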
7 Directional derivative and differentiability 59
Proof. Let $\partial f(x) = \{x^*\}$, where $x^* \in X^*$. By Theorem 7.3 it holds for all $u \in X$
$f'(x; u) = \lim_{t \downarrow 0} \frac{f(x + tu) - f(x)}{t} = \langle x^*, u\rangle.$
From here it yields for all u ∈ X that
$f'(0; u) = \lim_{t \downarrow 0} \frac{t\|u\|}{t} = \|u\| \quad \forall u \in H,$
(i) f is convex on C;
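$\nabla^2 f(x)(u, u) \ge 0 \quad \forall x \in C \ \forall u \in X,$
where
$\nabla^2 f(x)(u, v) := \lim_{t \to 0} \frac{\langle \nabla f(x + tv), u\rangle - \langle \nabla f(x), u\rangle}{t}$
denotes the second Gâteaux differential (derivative) of $f$ at $x \in C$ in the directions $u \in X$ and $v \in X$.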
and
$f(y) - f(\lambda x + (1 - \lambda)y) \ge \langle \nabla f(\lambda x + (1 - \lambda)y), \lambda(y - x)\rangle.$
We multiply the first inequality by λ, the second one by 1 − λ and add the
resulting inequalities. This yields
λf (x) + (1 − λ)f (y) − f (λx + (1 − λ)y) ≥ 0.
Consequently, f is convex on C.
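"(ii) $\Rightarrow$ (iii)" For all $x, y \in C$ we have
$f(y) - f(x) \ge \langle \nabla f(x), y - x\rangle$
and
$f(x) - f(y) \ge \langle \nabla f(y), x - y\rangle,$
which by summation gives $0 \ge \langle \nabla f(x) - \nabla f(y), y - x\rangle$ or, equivalently, $\langle \nabla f(y) - \nabla f(x), y - x\rangle \ge 0$.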
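The gradient inequality and the monotonicity of the gradient can be tested on random pairs of points. The following sketch assumes, purely for illustration, the convex and differentiable log-sum-exp function on $\mathbb{R}^3$ (this example is not taken from the notes); `check_pair` is a hypothetical helper that evaluates both inequalities up to a small floating-point tolerance.

```python
import math
import random


def f(x):
    """Log-sum-exp, a convex and differentiable function on R^n."""
    m = max(x)
    return m + math.log(sum(math.exp(xi - m) for xi in x))


def grad_f(x):
    """Gradient of log-sum-exp (the softmax vector)."""
    m = max(x)
    weights = [math.exp(xi - m) for xi in x]
    total = sum(weights)
    return [w / total for w in weights]


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def check_pair(x, y, tol=1e-9):
    """Check f(y) - f(x) >= <grad f(x), y - x> and
    <grad f(y) - grad f(x), y - x> >= 0, up to the tolerance tol."""
    gx, gy = grad_f(x), grad_f(y)
    diff = [yi - xi for xi, yi in zip(x, y)]
    gradient_inequality = f(y) - f(x) >= dot(gx, diff) - tol
    monotone_gradient = dot([b - a for a, b in zip(gx, gy)], diff) >= -tol
    return gradient_inequality and monotone_gradient


rng = random.Random(0)
all_ok = all(
    check_pair([rng.uniform(-5, 5) for _ in range(3)],
               [rng.uniform(-5, 5) for _ in range(3)])
    for _ in range(1000)
)
assert all_ok
```

A random test of this kind can only refute convexity, never prove it; it is meant as a sanity check of the two inequalities, not as a replacement for the argument above.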
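"(iii) $\Rightarrow$ (ii)" Let $x, y \in C$ and $\varepsilon > 0$ such that $(1 - t)x + ty \in C$ for all $t \in (-\varepsilon, 1 + \varepsilon)$. We define $\phi : (-\varepsilon, 1 + \varepsilon) \to \mathbb{R}$, $\phi(t) = f(x + t(y - x))$. For all $t \in (-\varepsilon, 1 + \varepsilon)$ it holds
$\phi'(t) = \lim_{s \to 0} \frac{\phi(t + s) - \phi(t)}{s} = \lim_{s \to 0} \frac{f(x + (t + s)(y - x)) - f(x + t(y - x))}{s} = \langle \nabla f(x + t(y - x)), y - x\rangle,$
which also means that $\phi$ is differentiable on $(-\varepsilon, 1 + \varepsilon)$ and therefore continuous on $[0, 1]$. By the Mean Value Theorem there exists $\lambda \in (0, 1)$ such that
$f(y) - f(x) = \phi(1) - \phi(0) = \phi'(\lambda) = \langle \nabla f(x + \lambda(y - x)), y - x\rangle.$
On the other hand,
$\langle \nabla f(x + \lambda(y - x)) - \nabla f(x), y - x\rangle = \frac{1}{\lambda}\langle \nabla f(x + \lambda(y - x)) - \nabla f(x), \lambda(y - x)\rangle \ge 0,$
which implies that $f(y) - f(x) \ge \langle \nabla f(x), y - x\rangle$.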
“(iii) ⇒ (iv)” Let x ∈ C and u ∈ X. Since x ∈ core C, there exists δ > 0
such that y := x + δu ∈ C. Then
h∇f (x + tu), ui − h∇f (x), ui
∇2 f (x)(u, u) = lim
t→0 t
h∇f (x + δ (y − x)) − ∇f (x), δt (y − x)i
t
= δ lim
t→0,t≤δ t
≥ 0.
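"(iv) $\Rightarrow$ (ii)" Let $x, y \in C$ and $\varepsilon > 0$ such that $(1 - t)x + ty \in C$ for all $t \in (-\varepsilon, 1 + \varepsilon)$, and $\phi : (-\varepsilon, 1 + \varepsilon) \to \mathbb{R}$, $\phi(t) = f(x + t(y - x))$. We have seen above that for all $t \in (-\varepsilon, 1 + \varepsilon)$ it holds
$\phi'(t) = \langle \nabla f(x + t(y - x)), y - x\rangle.$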
$\phi''(t) = \lim_{s \to 0} \frac{\phi'(t + s) - \phi'(t)}{s} = \lim_{s \to 0} \frac{\langle \nabla f(x + (t + s)(y - x)), y - x\rangle - \langle \nabla f(x + t(y - x)), y - x\rangle}{s} = \nabla^2 f(x + t(y - x))(y - x, y - x) \ge 0.$
Chapter IV
Duality theory
In this chapter we will discuss in detail the theory of conjugate duality in convex
analysis and convex optimization, and some of its important consequences.
$(PG) \quad \inf_{x \in X} F(x).$
Next we explore the concept of strong duality, which is the situation when
v(P G) = v(DG) and the dual has an optimal solution. An important role in the
characterization of strong duality will be played by the infimal value function of
Φ, h : Y → R, h(y) = inf x∈X Φ(x, y). One can notice that v(P G) = h(0). The
following result shows that the optimal objective value of (DG) can be expressed
in terms of the biconjugate of h.
Since h∗∗ ≤ h, it holds h∗∗ (0) ≤ h(0), which is nothing else than the weak duality
statement expressed via the infimal value function h.
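The inequality $h^{**} \le h$, and the possibility of a strict gap at $0$, can be made concrete with a discrete Legendre transform. The sketch below is only an illustration under stated assumptions: it uses a one-dimensional nonconvex double-well function in place of the infimal value function $h$ (which in the notes lives on a normed space $Y$), and replaces all suprema by maxima over finite grids. The names `conjugate`, `ys` and `ss` are hypothetical.

```python
def h(y):
    """A nonconvex double-well function with h(0) = 1 and h(-1) = h(1) = 0."""
    return min((y - 1.0) ** 2, (y + 1.0) ** 2)


ys = [i / 10 for i in range(-30, 31)]    # grid for the primal variable y
ss = [i / 10 for i in range(-100, 101)]  # grid for the dual variable (slopes)


def conjugate(values, grid, slopes):
    """Discrete Legendre transform: s -> max over the grid of (s * y - value(y))."""
    return [max(s * y - v for y, v in zip(grid, values)) for s in slopes]


h_values = [h(y) for y in ys]
h_star = conjugate(h_values, ys, ss)     # approximates h^*
h_bistar = conjugate(h_star, ss, ys)     # approximates h^{**} on the grid ys

# Weak duality on the grid: h^{**} <= h everywhere.
assert all(b <= v + 1e-9 for b, v in zip(h_bistar, h_values))

zero_index = ys.index(0.0)
duality_gap_at_zero = h_values[zero_index] - h_bistar[zero_index]
```

For this choice the biconjugate vanishes on $[-1, 1]$, so `duality_gap_at_zero` equals $1$: the analogue of $v(DG) < v(PG)$ for a function that is not convex.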
and so $h^{**}(0) = \bar{h}(0) = h(0) \in \mathbb{R}$. Since $v(PG) = h(0)$ and $v(DG) = h^{**}(0)$, statement (ii) is shown.
"(ii) $\Rightarrow$ (i)" Statement (ii) can be equivalently written as $h^{**}(0) = h(0) \in \mathbb{R}$. This implies $\bar{h}(0) = h(0) \in \mathbb{R}$, which actually means that $(PG)$ is normal.
The following notion will be used to characterize the strong duality for (P G)
and (DG).
Proof. "(i) $\Rightarrow$ (ii)" Assume that $h(0) \in \mathbb{R}$ and $\partial h(0) \neq \emptyset$. Thus, according to Proposition 6.4, $h(0) = h^{**}(0)$, which is nothing else than $v(PG) = v(DG) \in \mathbb{R}$. By Proposition 8.4 it follows that $(PG)$ is normal. Further, consider an element $\bar{y}^* \in \partial h(0)$. We have $h(0) + h^*(\bar{y}^*) = 0$ or, equivalently, $v(PG) = h(0) = -h^*(\bar{y}^*) = -\Phi^*(0, \bar{y}^*)$, which, according to the weak duality theorem, implies that $\bar{y}^*$ is an optimal solution of $(DG)$.
"(ii) $\Rightarrow$ (i)" Assume that $(PG)$ is normal and that its dual $(DG)$ has an optimal solution $\bar{y}^* \in Y^*$. Thus $h(0) = v(PG) = v(DG) = -\Phi^*(0, \bar{y}^*) = -h^*(\bar{y}^*) \in \mathbb{R}$, which is the same as $h(0) + h^*(\bar{y}^*) = 0$ or, equivalently, $\bar{y}^* \in \partial h(0)$. Since the set $\partial h(0)$ is nonempty, the stability of $(PG)$ is proved.
and use “max” instead of “sup” in order to emphasize that the dual problem has
an optimal solution.
Proof. The infimal value function h : Y → R, h(y) = inf x∈X Φ(x, y), is convex
and it fulfills h(0) = v(P G). If h(0) = −∞, then, according to the weak duality
theorem, v(P G) = v(DG) = −∞ and every element y ∗ ∈ Y ∗ is an optimal
solution of the dual problem (DG).
Let h(0) > −∞. Since h(0) ≤ Φ(x0 , 0) < +∞, it holds h(0) ∈ R. According
to (RC1Φ ), there exists δ > 0 such that
One can notice in the proof of the above theorem that a consequence of
the required regularity condition was that 0 is an interior point of dom h =
PrY (dom Φ). We will see in the next theorem that one can prove strong duality
for the primal-dual pair (P G) − (DG) by assuming “only” that 0 ∈ core(dom h),
however, in the setting of complete normed spaces and by additionally requiring
that the perturbation function Φ is also lower semicontinuous.
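$\tilde{h}(y) \ge 0 \quad \forall y \in Y.$
Let $x_0 \in X$ such that $\Phi(x_0, 0) \in \mathbb{R}$. Then $\tilde{h}(0) < +\infty$, which proves that $\tilde{h}$ is proper. Since $\operatorname{dom} \Phi = \operatorname{dom} \tilde{\Phi}$, it holds $\operatorname{dom} h = \operatorname{Pr}_Y(\operatorname{dom} \Phi) = \operatorname{Pr}_Y(\operatorname{dom} \tilde{\Phi}) = \operatorname{dom} \tilde{h}$, and so $0 \in \operatorname{core}(\operatorname{dom} h) = \operatorname{core}(\operatorname{dom} \tilde{h})$.
Let $\varepsilon := \max\{\tilde{h}(0), 0\} + 1 \in \mathbb{R}$ and $W := \{y \in Y \mid \tilde{h}(y) < \varepsilon\}$. Then $0 \in \operatorname{core} W$. Indeed, let $y \in Y$ and $\lambda > 0$ such that $\lambda y \in \operatorname{dom} \tilde{h}$. We choose $t \in (0, 1]$ such that $t(\tilde{h}(\lambda y) - \varepsilon + 1) < 1$. Then
$\tilde{h}(t\lambda y) \le t\tilde{h}(\lambda y) + (1 - t)\tilde{h}(0) \le t\tilde{h}(\lambda y) + (1 - t)(\varepsilon - 1) = t(\tilde{h}(\lambda y) - \varepsilon + 1) + \varepsilon - 1 < \varepsilon,$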
which shows that tλy ∈ W and, consequently, proves that 0 ∈ core W . From here
we immediately have that 0 ∈ core(cl W ).
According to Theorem 1.11, for the convex and closed set cl W we have 0 ∈
core(cl W ) = int(cl W ), which means that there exists δ > 0 such that B(0, 2δ) ⊆
cl W .
Next we will prove that
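or, equivalently,
$\left\| y - \frac{1}{2} y_1 - \dots - \frac{1}{2^k} y_k \right\| < \frac{1}{2^k}\delta \quad \forall k \ge 1,$
which proves that $\sum_{k=1}^{+\infty} \frac{1}{2^k} y_k = y$.
For all $k \in \mathbb{N}$, since $y_k \in W$, we can choose $x_k \in X$ such that $\tilde{\Phi}(x_k, y_k) < \varepsilon$, which implies $\|x_k\| < \varepsilon$. Since $X$ is a Banach space and $\sum_{k=1}^{+\infty} \frac{1}{2^k}\|x_k\| < +\infty$, there exists $x \in X$ such that $\sum_{k=1}^{+\infty} \frac{1}{2^k} x_k = x$ (see, for instance, [4, Lemma 1.12 (ii)]). This proves that
$\sum_{k=1}^{+\infty} \frac{1}{2^k}(x_k, y_k) = (x, y).$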
$\tilde{h}(y) \le \tilde{\Phi}(x, y) \le \varepsilon.$
thus 0 ∈ int(dom h). At this point we can argue as in the proof of Theorem 8.7.
If h(0) = −∞, then strong duality follows automatically. If h(0) > −∞, then
h(0) ≤ Φ(x0 , 0) < +∞, in other words, h(0) ∈ R. By Theorem 4.18 we obtain
that $h$ is proper and continuous on $\operatorname{int}(\operatorname{dom} h)$, while Theorem 6.9 further gives that $\partial h(0) \neq \emptyset$. This provides the desired conclusion.
Proof. Since 0 ∈ sqri(PrY (dom Φ)), it holds that Z := cone(PrY (dom Φ)) is a
closed linear subspace of the Banach space Y , consequently, it is complete. In
addition, dom h = PrY (dom Φ) ⊆ Z.
We assume first that Z = {0}, which corresponds to the case when dom h =
{0}. Then v(P G) = h(0) and for all y ∗ ∈ Y ∗ it holds h∗ (y ∗ ) = supy∈Y {hy ∗ , yi −
8 A general perturbation approach 69
$\tilde{h} : Z \to \overline{\mathbb{R}}, \quad \tilde{h}(z) = \inf_{x \in X} \tilde{\Phi}(x, z) = \inf_{x \in X} \Phi(x, j(z)) = h(j(z)).$
$\inf_{x \in X} \tilde{\Phi}(x, 0) = \max_{z^* \in Z^*} \{-\tilde{\Phi}^*(0, z^*)\}$
or, equivalently,
$h(0) = \tilde{h}(0) = \max_{z^* \in Z^*} \{-\tilde{h}^*(z^*)\}. \quad (8.3)$
$\tilde{h}^*(j^*(y^*)) = \sup_{z \in Z}\{\langle j^*(y^*), z\rangle - \tilde{h}(z)\} = \sup_{z \in Z}\{\langle y^*, j(z)\rangle - h(j(z))\} = \sup_{z \in Z}\{\langle y^*, z\rangle - h(z)\} = \sup_{z \in Y}\{\langle y^*, z\rangle - h(z)\} = h^*(y^*).$
$h(0) = \max_{y^* \in Y^*} \{-h^*(y^*)\},$
Regularity conditions and strong duality lie at the basis of exact conjugate
and subdifferential calculus.
Furthermore, the function $\Phi_{x^*}$ is proper and convex with $\operatorname{dom} \Phi_{x^*} = \operatorname{dom} \Phi$, therefore, one of the regularity conditions $(RC_i^{\Phi_{x^*}})$, $i \in \{1, 2, 2', 3, 4\}$, is fulfilled. According to the theorems 8.7, 8.8, 8.10 and 8.11 we have
$(\Phi(\cdot, 0))^*(x^*) = -\inf_{x \in X} \Phi_{x^*}(x, 0) = -\max_{y^* \in Y^*}\{-\Phi_{x^*}^*(0, y^*)\} = \min_{y^* \in Y^*} \Phi^*(x^*, y^*),$
Proof. Since Φ is proper, convex and lower semicontinuous, for the conjugate
of inf y∗ ∈Y ∗ Φ∗ (·, y ∗ ) it holds for all x ∈ X
$\left( \inf_{y^* \in Y^*} \Phi^*(\cdot, y^*) \right)^*(x) = \sup_{x^* \in X^*} \left\{ \langle x^*, x\rangle - \inf_{y^* \in Y^*} \Phi^*(x^*, y^*) \right\}$
This also implies that the function $\overline{\left(\inf_{y^* \in Y^*} \Phi^*(\cdot, y^*)\right)}^{w^*}$ is proper. Assuming that it takes everywhere the value $+\infty$, it yields that its conjugate, which coincides with $\left(\inf_{y^* \in Y^*} \Phi^*(\cdot, y^*)\right)^*$, is identically $-\infty$. This contradicts the properness of $\Phi$. On the other hand, assuming that $\overline{\left(\inf_{y^* \in Y^*} \Phi^*(\cdot, y^*)\right)}^{w^*}$ takes somewhere the value $-\infty$, by the same argument as above, one would have that $\Phi(x, 0)$ is equal to $+\infty$ for all $x \in X$. Since this contradicts the feasibility assumption $0 \in \operatorname{Pr}_Y(\operatorname{dom} \Phi)$, the function $\overline{\left(\inf_{y^* \in Y^*} \Phi^*(\cdot, y^*)\right)}^{w^*}$ is everywhere greater than $-\infty$.
By the Fenchel-Moreau Theorem we obtain
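$(\Phi(\cdot, 0))^* = \left( \inf_{y^* \in Y^*} \Phi^*(\cdot, y^*) \right)^{**} = \left( \overline{\inf_{y^* \in Y^*} \Phi^*(\cdot, y^*)}^{w^*} \right)^{**} = \overline{\inf_{y^* \in Y^*} \Phi^*(\cdot, y^*)}^{w^*},$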
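such that $\Phi^*(x^*, y^*) \le r + \varepsilon$. This means that for all $\varepsilon > 0$ we have $(x^*, r + \varepsilon) \in \bigcup_{y^* \in Y^*} \operatorname{epi}(\Phi^*(\cdot, y^*)) = \operatorname{Pr}_{X^* \times \mathbb{R}}(\operatorname{epi} \Phi^*)$, and so $(x^*, r) \in \operatorname{cl}_{w^*}(\operatorname{Pr}_{X^* \times \mathbb{R}}(\operatorname{epi} \Phi^*))$.
The conclusion follows from Theorem 9.1, which yields
$\operatorname{epi}(\Phi(\cdot, 0))^* = \operatorname{cl}_{w^*}\left( \operatorname{epi}\left( \inf_{y^* \in Y^*} \Phi^*(\cdot, y^*) \right) \right),$
and (9.1).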
Theorem 9.1 and Theorem 9.2 lead to the following result.
Theorem 9.3 Let Φ : X × Y → R be a proper, convex and lower semicontinuous
function such that 0 ∈ PrY (dom Φ). Then PrX ∗ ×R (epi Φ∗ ) is weakly∗ closed if and
only if
$(\Phi(\cdot, 0))^*(x^*) = \min_{y^* \in Y^*} \Phi^*(x^*, y^*) \quad \forall x^* \in X^*.$
Proof. The infimal value function inf y∗ ∈Y ∗ Φ∗ (·, y ∗ ) is proper (see the proof of
Theorem 9.1).
“⇒” If PrX ∗ ×R (epi Φ∗ ) is weakly∗ closed, then, according to Theorem 9.2 and
(9.1), epi(Φ(·, 0))∗ = PrX ∗ ×R (epi Φ∗ ) = epi (inf y∗ ∈Y ∗ Φ∗ (·, y ∗ )). This proves that
(Φ(·, 0))∗ = inf y∗ ∈Y ∗ Φ∗ (·, y ∗ ). Let x∗ ∈ X ∗ . If (Φ(·, 0))∗ (x∗ ) = +∞, then the
infimum in the infimal value function of Φ∗ is attained at every y ∗ ∈ Y ∗ . If
(Φ(·, 0))∗ (x∗ ) ∈ R, then
(x∗ , (Φ(·, 0))∗ (x∗ )) ∈ epi(Φ(·, 0))∗ = PrX ∗ ×R (epi Φ∗ ) = ∪y∗ ∈Y ∗ epi(Φ∗ (·, y ∗ )),
consequently, there exists ȳ ∗ ∈ Y ∗ such that
$\Phi^*(x^*, \bar{y}^*) \le (\Phi(\cdot, 0))^*(x^*) = \inf_{y^* \in Y^*} \Phi^*(x^*, y^*).$
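"$\Leftarrow$" The fact that $(\Phi(\cdot, 0))^* = \inf_{y^* \in Y^*} \Phi^*(\cdot, y^*)$ implies that $\inf_{y^* \in Y^*} \Phi^*(\cdot, y^*)$ is weakly$^*$ lower semicontinuous, consequently, from (9.1)
$\operatorname{cl}_{w^*}(\operatorname{Pr}_{X^* \times \mathbb{R}}(\operatorname{epi} \Phi^*)) \subseteq \operatorname{epi}\left( \overline{\inf_{y^* \in Y^*} \Phi^*(\cdot, y^*)}^{w^*} \right) = \operatorname{epi}\left( \inf_{y^* \in Y^*} \Phi^*(\cdot, y^*) \right).$
On the other hand, since for all $x^* \in X^*$ the infimum in the infimal value function of $\Phi^*$ is attained, we have
$\operatorname{epi}\left( \inf_{y^* \in Y^*} \Phi^*(\cdot, y^*) \right) \subseteq \bigcup_{y^* \in Y^*} \operatorname{epi}(\Phi^*(\cdot, y^*)) = \operatorname{Pr}_{X^* \times \mathbb{R}}(\operatorname{epi} \Phi^*).$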
10 Fenchel duality
In this section, by using the general perturbation approach we develop a duality
framework for the optimization problem
Obviously, ΦA (·, 0) = f + g ◦ A.
In Banach spaces we obtain from $(RC_2^{\Phi})$ and $(RC_{2'}^{\Phi})$ the following regularity conditions
$(RC_2^A)$ $X$ and $Y$ are Banach spaces, $f$ and $g$ are lower semicontinuous and $0 \in \operatorname{core}(\operatorname{dom} g - A(\operatorname{dom} f))$
and
$(RC_{2'}^A)$ $X$ and $Y$ are Banach spaces, $f$ and $g$ are lower semicontinuous and $0 \in \operatorname{int}(\operatorname{dom} g - A(\operatorname{dom} f))$,
reads
$(\Phi^F)^*(x^*, y^*) = f^*(x^* - y^*) + g^*(y^*) \quad \forall (x^*, y^*) \in X^* \times X^*$
and gives rise to the following so-called Fenchel dual problem to (P F )
and
(RC3F ) X is a Banach space, f and g are lower semicontinuous
and 0 ∈ sqri(dom g − dom f ),
and in case X is finite-dimensional
Example 10.3 (a) Let $X$ and $Y$ be nontrivial real Banach spaces and $f = \delta_{X \times \{0\}}$, $g = \delta_{\{0\} \times Y} : X \times Y \to \overline{\mathbb{R}}$ proper, convex and lower semicontinuous functions. It holds $\operatorname{dom} f \cap \operatorname{dom} g = \{(0, 0)\}$ and $v(P^F) = \inf_{(x,y) \in X \times Y}\{f(x, y) + g(x, y)\} = 0$. Since $f^* = \delta_{\{0\} \times Y^*}$ and $g^* = \delta_{X^* \times \{0\}}$, it holds $v(D^F) = 0$ and $(\bar{x}^*, \bar{y}^*) = (0, 0)$ is an optimal solution of the dual, which means that for $(P^F)$ and $(D^F)$ strong duality holds. Since neither $f$ nor $g$ is continuous at $(0, 0)$, the regularity condition $(RC_1^F)$ is not fulfilled. On the other hand, since $\operatorname{dom} g - \operatorname{dom} f = X \times Y$, we have $0 \in \operatorname{core}(\operatorname{dom} g - \operatorname{dom} f)$, thus the regularity condition $(RC_2^F)$ is fulfilled.
(b) Let $X$ be an infinite-dimensional real Banach space and $x^\sharp : X \to \mathbb{R}$ a non-continuous linear functional. The functions $f = x^\sharp$ and $g = -x^\sharp$ are proper and convex and have full domain, which means that $0 \in \operatorname{core}(\operatorname{dom} f - \operatorname{dom} g)$. Their conjugate functions are identically $+\infty$, thus $v(D^F) = -\infty$. On the other hand, $v(P^F) = 0$. This shows that for the regularity conditions $(RC_i^F)$, $i \in \{2, 2', 3\}$, the assumption that the functions $f$ and $g$ are lower semicontinuous cannot be omitted.
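$f^* : \mathbb{R}^2 \to \overline{\mathbb{R}}, \quad f^*(x_1^*, x_2^*) = \begin{cases} -x_1^*, & \text{if } x_1^* \le 0, \\ +\infty, & \text{if } x_1^* > 0, \end{cases} \quad (10.1)$
and
$g^* : \mathbb{R}^2 \to \overline{\mathbb{R}}, \quad g^*(x_1^*, x_2^*) = \begin{cases} x_1^*, & \text{if } x_1^* \ge 0, \\ +\infty, & \text{if } x_1^* < 0, \end{cases} \quad (10.2)$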
thus
and every (0, x∗2 ), with x∗2 ∈ R, is an optimal solution of the dual. This shows that
for (P F ) and (DF ) strong duality holds. Since cone(dom g − dom f ) = R × {0},
neither (RC1F ) nor (RC2F ) is fulfilled. However, the regularity condition (RC3F )
is fulfilled.
(b) The functions $f = \delta_{(-1,+\infty) \times \{0\}}$, $g = \delta_{(-\infty,1) \times \{0\}} : \mathbb{R}^2 \to \overline{\mathbb{R}}$ are proper, convex and not lower semicontinuous, $(0, 0) \in \operatorname{dom} f \cap \operatorname{dom} g$ and it holds $v(P^F) = \inf_{x \in \mathbb{R}^2}\{f(x) + g(x)\} = 0$. Their conjugate functions are given by (10.1) and (10.2), respectively, thus for $(P^F)$ and $(D^F)$ strong duality holds. None of the regularity conditions $(RC_i^F)$, $i \in \{1, 2, 2', 3\}$, is fulfilled; however, since $\operatorname{ri}(\operatorname{dom} g) \cap \operatorname{ri}(\operatorname{dom} f) = (-1, 1) \times \{0\}$, the regularity condition $(RC_4^F)$ holds.
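Example 10.5 Consider $\ell^2 = \left\{ (x_n)_{n \in \mathbb{N}} \subseteq \mathbb{R} \mid \sum_{n \in \mathbb{N}} |x_n|^2 < +\infty \right\}$, the convex and closed sets $C = \{(x_n)_{n \in \mathbb{N}} \in \ell^2 \mid x_{2n-1} + x_{2n} = 0 \ \forall n \in \mathbb{N}\}$ and $D = \{(x_n)_{n \in \mathbb{N}} \in \ell^2 \mid x_{2n} + x_{2n+1} = 0 \ \forall n \in \mathbb{N}\}$, and the proper, convex and lower semicontinuous functions $f, g : \ell^2 \to \overline{\mathbb{R}}$, $f = \delta_C$ and, for $x = (x_n)_{n \in \mathbb{N}}$, $g(x) = x_1 + \delta_D(x)$. It holds $v(P^F) = \inf_{x \in \ell^2}\{f(x) + g(x)\} = 0$ and $v(D^F) = \sup_{x^* \in \ell^2}\{-f^*(x^*) - g^*(x^*)\} = -\infty$, while $0 \notin \operatorname{sqri}(\operatorname{dom} g - \operatorname{dom} f)$ and $0 \in \operatorname{icr}(\operatorname{dom} g - \operatorname{dom} f)$. This shows that the condition $0 \in \operatorname{sqri}(\operatorname{dom} g - \operatorname{dom} f)$ in $(RC_3^F)$ cannot be replaced by the weaker condition $0 \in \operatorname{icr}(\operatorname{dom} g - \operatorname{dom} f)$.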
Example 10.7 For $C = \mathbb{R}_+ \times \mathbb{R}$ and $D = \{(x_1, x_2) \in \mathbb{R}^2 \mid 2x_1 + x_2^2 \le 0\}$, the functions $f = \delta_C$, $g = \delta_D : \mathbb{R}^2 \to \overline{\mathbb{R}}$ are proper, convex and lower semicontinuous and fulfill $\operatorname{dom} f \cap \operatorname{dom} g = \{(0, 0)\}$ and $v(P^F) = \inf_{(x_1, x_2) \in \mathbb{R}^2}\{f(x_1, x_2) + g(x_1, x_2)\} = 0$. It holds
$f^* = \delta_{\mathbb{R}_- \times \{0\}}$ and $g^*(x_1^*, x_2^*) = \begin{cases} \frac{(x_2^*)^2}{2x_1^*}, & \text{if } x_1^* > 0, \\ 0, & \text{if } x_1^* = x_2^* = 0, \\ +\infty, & \text{otherwise}, \end{cases}$
and $v(D^F) = \sup_{(x_1^*, x_2^*) \in \mathbb{R}^2}\{-f^*(-x_1^*, -x_2^*) - g^*(x_1^*, x_2^*)\} = 0$, and $(\bar{x}_1^*, \bar{x}_2^*) = (0, 0)$ is an optimal solution of the dual. This means that, despite the fact that none of the regularity conditions $(RC_i^F)$, $i \in \{1, 2, 2', 3, 4, 5\}$, is fulfilled, for $(P^F)$ and $(D^F)$ strong duality holds.
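To complement the examples, the primal and Fenchel dual values can be compared numerically for a pair of functions where strong duality holds. The sketch below is an illustration under stated assumptions: it takes $X = \mathbb{R}$, $f(x) = \frac{1}{2}x^2$ and $g(x) = |x - 1|$ (chosen here, not taken from the notes), uses the hand-computed conjugates $f^*(s) = \frac{1}{2}s^2$ and $g^*(s) = s + \delta_{[-1,1]}(s)$, and evaluates the dual objective in the form $-f^*(-x^*) - g^*(x^*)$ used in Example 10.7 on a finite grid.

```python
import math


def f(x):
    return 0.5 * x * x


def g(x):
    return abs(x - 1.0)


def f_conj(s):
    """Conjugate of f(x) = x^2 / 2 (computed by hand)."""
    return 0.5 * s * s


def g_conj(s):
    """Conjugate of g(x) = |x - 1|: equal to s on [-1, 1] and +infinity elsewhere."""
    return s if abs(s) <= 1.0 else math.inf


grid = [i / 1000 for i in range(-3000, 3001)]

primal_value = min(f(x) + g(x) for x in grid)
dual_value = max(-f_conj(-s) - g_conj(s) for s in grid)

assert abs(primal_value - dual_value) < 1e-12
```

Both values equal $\frac{1}{2}$, attained at $x = 1$ and $x^* = -1$. Both functions are continuous on $\mathbb{R}$, so this is the well-behaved situation, in contrast to the borderline examples above.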
$(P^{A_g}) \quad \inf_{x \in X} (g \circ A)(x).$
$(\Phi^{A_g})^*(x^*, y^*) = f^*(x^* - A^* y^*) + g^*(y^*) = \begin{cases} g^*(y^*), & \text{if } x^* = A^* y^*, \\ +\infty, & \text{otherwise}, \end{cases}$
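$(RC_{2'}^{A_g})$ $X$ and $Y$ are Banach spaces, $g$ is lower semicontinuous and $0 \in \operatorname{int}(\operatorname{dom} g - \operatorname{ran} A)$,
and
$(RC_3^{A_g})$ $X$ and $Y$ are Banach spaces, $g$ is lower semicontinuous and $0 \in \operatorname{sqri}(\operatorname{dom} g - \operatorname{ran} A)$,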
For all $(x_1^*, ..., x_m^*) \in (X^*)^m$, $g^*(x_1^*, ..., x_m^*) = \sum_{i=1}^m f_i^*(x_i^*)$ and $A^*(x_1^*, ..., x_m^*) = \sum_{i=1}^m x_i^*$, thus the dual $(D^{A_g})$ is nothing else than
$(D^\Sigma) \quad \sup_{x_i^* \in X^*,\ i = 1, ..., m,\ \sum_{i=1}^m x_i^* = 0} \left\{ -\sum_{i=1}^m f_i^*(x_i^*) \right\}.$
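For $(P^\Sigma)$ and $(D^\Sigma)$ weak duality always holds.
As $\operatorname{dom} g = \prod_{i=1}^m \operatorname{dom} f_i$ and $A^{-1}(\operatorname{dom} g) = \bigcap_{i=1}^m \operatorname{dom} f_i$, the regularity condition $(RC_1^{A_g})$ reads
$(RC_1^\Sigma)$ $\exists x_0 \in \bigcap_{i=1}^m \operatorname{dom} f_i$ such that $f_i$ is continuous at $x_0$, $i = 1, ..., m$.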
$(RC_{2'}^\Sigma)$ $X$ is a Banach space, $f_i$, $i = 1, ..., m$, are lower semicontinuous and $0 \in \operatorname{int}\left( \prod_{i=1}^m \operatorname{dom} f_i - \Delta_{X^m} \right)$,
and
$(RC_4^\Sigma)$ $X$ is finite-dimensional and $\bigcap_{i=1}^m \operatorname{ri}(\operatorname{dom} f_i) \neq \emptyset$,
respectively.
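We notice that
$(x^*, r) \in (A^* \times \operatorname{Id}_{\mathbb{R}})(\operatorname{epi} g^*) \Leftrightarrow \exists (x_1^*, ..., x_m^*) \in (X^*)^m$ such that $g^*(x_1^*, ..., x_m^*) \le r$ and $A^*(x_1^*, ..., x_m^*) = x^*$
$\Leftrightarrow \exists (x_1^*, ..., x_m^*) \in (X^*)^m$ such that $\sum_{i=1}^m f_i^*(x_i^*) \le r$ and $\sum_{i=1}^m x_i^* = x^* \Leftrightarrow (x^*, r) \in \sum_{i=1}^m \operatorname{epi} f_i^*,$
thus the closedness-type regularity condition $(RC_5^{A_g})$ reads
$(RC_5^\Sigma)$ $f_i$, $i = 1, ..., m$, are lower semicontinuous and $\sum_{i=1}^m \operatorname{epi} f_i^*$ is weakly$^*$ closed.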
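(a) $\left( \sum_{i=1}^m f_i \right)^*(x^*) = \min\left\{ \sum_{i=1}^m f_i^*(x_i^*) \mid \sum_{i=1}^m x_i^* = x^* \right\} = (f_1^* \,\Box\, \dots \,\Box\, f_m^*)(x^*)$ for all $x^* \in X^*$;
(b) $\partial\left( \sum_{i=1}^m f_i \right)(x) = \sum_{i=1}^m \partial f_i(x)$ for all $x \in X$.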
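Statement (a) can be checked numerically in the simplest setting. The sketch below assumes, for illustration only, $X = \mathbb{R}$, $m = 2$, $f_1(x) = \frac{1}{2}x^2$ and $f_2(x) = x^2$, with the hand-computed conjugates $f_1^*(s) = \frac{1}{2}s^2$ and $f_2^*(s) = \frac{1}{4}s^2$; the supremum and the infimum are replaced by a maximum and a minimum over a finite grid, and all names are hypothetical.

```python
grid = [i / 1000 for i in range(-5000, 5001)]


def f1(x):
    return 0.5 * x * x


def f2(x):
    return x * x


def f1_conj(s):
    """Conjugate of f1 (computed by hand)."""
    return 0.5 * s * s


def f2_conj(s):
    """Conjugate of f2 (computed by hand)."""
    return 0.25 * s * s


def conjugate_of_sum(s):
    """(f1 + f2)^*(s), with the supremum taken over the grid."""
    return max(s * x - f1(x) - f2(x) for x in grid)


def infimal_convolution(s):
    """Infimal convolution of f1^* and f2^* at s, with the infimum taken over the grid."""
    return min(f1_conj(u) + f2_conj(s - u) for u in grid)


for s in (-3.0, 0.0, 1.5, 3.0):
    assert abs(conjugate_of_sum(s) - infimal_convolution(s)) < 1e-9
```

For these quadratics both sides equal $\frac{1}{6}s^2$, and the minimum in the infimal convolution is attained at $u = s/3$, in line with the "min" in statement (a).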
11 Lagrange duality
In this section, by using the general perturbation approach, we develop a duality
framework for the constrained optimization problem
$(P^L) \quad \inf f(x)$ such that $x \in S$, $g(x) \leqq_C 0$.
or, equivalently,
$(D^L) \quad \sup_{z^* \in C^*} \inf_{x \in S} \{f(x) + \langle z^*, g(x)\rangle\}.$
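(a) $\inf_{x \in S,\ g(x) \leqq_C 0}\{f(x) - \langle x^*, x\rangle\} = \max_{z^* \in C^*} \inf_{x \in S}\{f(x) + \langle z^*, g(x)\rangle - \langle x^*, x\rangle\}$ for all $x^* \in X^*$;
(b) $\partial\left( f + \delta_{S \cap g^{-1}(-C)} \right)(x) = \bigcup_{z^* \in C^*,\ \langle z^*, g(x)\rangle = 0} \partial\left( f + \langle z^*, g(\cdot)\rangle + \delta_S \right)(x)$ for all $x \in S \cap g^{-1}(-C)$.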
In particular, for (P L ) and (DL ) strong duality holds.
Proof. The two statements follow from the formula of the conjugate function of $\Phi^L$ and the fact that
$\operatorname{Pr}_{X^*}(\partial \Phi^L(x, 0)) = \bigcup_{z^* \in C^*,\ \langle z^*, g(x)\rangle = 0} \partial\left( f + \langle z^*, g(\cdot)\rangle + \delta_S \right)(x) \quad \forall x \in S \cap g^{-1}(-C).$
To see that the last statement is true, let $x \in S \cap g^{-1}(-C)$ be fixed. Then $x^* \in \operatorname{Pr}_{X^*}(\partial \Phi^L(x, 0))$ if and only if there exists $-z^* \in Z^*$ such that $(x^*, -z^*) \in \partial \Phi^L(x, 0)$ or, equivalently, $\Phi^L(x, 0) + (\Phi^L)^*(x^*, -z^*) = \langle x^*, x\rangle$. This is nothing else than $z^* \in C^*$ and
$f(x) + \left( f + \langle z^*, g(\cdot)\rangle + \delta_S \right)^*(x^*) = \langle x^*, x\rangle,$
which, by taking into account the Young-Fenchel inequality, is further equivalent to $z^* \in C^*$, $\langle z^*, g(x)\rangle = 0$ and
$f(x) + \langle z^*, g(x)\rangle + \delta_S(x) + \left( f + \langle z^*, g(\cdot)\rangle + \delta_S \right)^*(x^*) = \langle x^*, x\rangle.$
This shows that $x^* \in \operatorname{Pr}_{X^*}(\partial \Phi^L(x, 0))$ if and only if there exists $z^* \in C^*$ such that $\langle z^*, g(x)\rangle = 0$ and $x^* \in \partial\left( f + \langle z^*, g(\cdot)\rangle + \delta_S \right)(x)$, which concludes the proof.
In case $Z = \mathbb{R}^{m+k}$, $C = \mathbb{R}^m_+ \times \{0_{\mathbb{R}^k}\}$, which is a proper and convex cone, and $g : X \to \mathbb{R}^{m+k}$, $g(x) = (g_1(x), ..., g_m(x), h_1(x), ..., h_k(x))$, $(P^L)$ becomes the following optimization problem with inequality and equality constraints
$(P^L) \quad \inf f(x)$ such that $x \in S$, $g_i(x) \le 0$, $i = 1, ..., m$, $h_j(x) = 0$, $j = 1, ..., k$.
Since $C^* = \mathbb{R}^m_+ \times \mathbb{R}^k$, its Lagrange dual problem reads
$(D^L) \quad \sup_{\lambda_i \ge 0,\ i = 1, ..., m,\ \mu_j \in \mathbb{R},\ j = 1, ..., k} \ \inf_{x \in S} \left\{ f(x) + \sum_{i=1}^m \lambda_i g_i(x) + \sum_{j=1}^k \mu_j h_j(x) \right\}.$
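A one-dimensional instance makes the Lagrange dual tangible. The sketch below assumes, for illustration only, $S = X = \mathbb{R}$, $f(x) = x^2$, a single inequality constraint $g_1(x) = 1 - x \le 0$ and no equality constraints; the infimum over $x$ and the supremum over $\lambda$ are replaced by a minimum and a maximum over finite grids, and all names are hypothetical.

```python
def objective(x):
    return x * x


def constraint(x):
    """g_1(x) = 1 - x, so the feasible set is {x : x >= 1}."""
    return 1.0 - x


xs = [i / 100 for i in range(-500, 501)]    # grid for x in [-5, 5]
lambdas = [i / 100 for i in range(0, 501)]  # grid for the multiplier in [0, 5]


def dual_function(lam):
    """Grid version of inf over x of f(x) + lam * g_1(x)."""
    return min(objective(x) + lam * constraint(x) for x in xs)


primal_value = min(objective(x) for x in xs if constraint(x) <= 0.0)
dual_value = max(dual_function(lam) for lam in lambdas)

assert dual_value <= primal_value + 1e-12       # weak duality
assert abs(primal_value - dual_value) < 1e-9    # no duality gap in this instance
```

Here the exact dual function is $\lambda - \frac{1}{4}\lambda^2$, maximized at $\lambda = 2$ with value $1$, which coincides with the primal optimal value attained at $x = 1$.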
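then for $(P^L)$ and $(D^L)$ strong duality holds and for all $x \in S \cap g^{-1}(-\mathbb{R}^m_+ \times \{0_{\mathbb{R}^k}\})$ we have
$\partial\left( f + \delta_{S \cap g^{-1}(-\mathbb{R}^m_+ \times \{0_{\mathbb{R}^k}\})} \right)(x) = \bigcup_{\lambda_i \ge 0,\ \lambda_i g_i(x) = 0,\ i = 1, ..., m,\ \mu_j \in \mathbb{R},\ j = 1, ..., k} \partial\left( f + \sum_{i=1}^m \lambda_i g_i + \sum_{j=1}^k \mu_j h_j + \delta_S \right)(x).$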
is true.
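If, additionally, $X = \mathbb{R}^n$, let $E = \{(x, z) \in \mathbb{R}^n \times \mathbb{R}^{m+k} \mid x \in \operatorname{dom} f \cap S,\ z - g(x) \in \mathbb{R}^m_+ \times \{0_{\mathbb{R}^k}\}\}$. For all $x \in \mathbb{R}^n$, let $E(x) = \{z \in \mathbb{R}^{m+k} \mid (x, z) \in E\}$. Then $E(x) = \emptyset$
$(x, z) \in \operatorname{ri} E \Leftrightarrow x \in \operatorname{ri}(\operatorname{dom} f \cap S)$ and $z \in \operatorname{ri}(g(x) + \mathbb{R}^m_+ \times \{0_{\mathbb{R}^k}\}) = g(x) + \operatorname{int} \mathbb{R}^m_+ \times \{0_{\mathbb{R}^k}\}.$
Consequently,
$\operatorname{ri}\left( g(\operatorname{dom} f \cap S) + \mathbb{R}^m_+ \times \{0_{\mathbb{R}^k}\} \right) = \operatorname{ri} \operatorname{Pr}_{\mathbb{R}^{m+k}}(E) = \operatorname{Pr}_{\mathbb{R}^{m+k}}(\operatorname{ri} E) = g(\operatorname{ri}(\operatorname{dom} f \cap S)) + \operatorname{int} \mathbb{R}^m_+ \times \{0_{\mathbb{R}^k}\}.$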
This proves that, in this case, the regularity condition (11.1) is nothing else than the weak Slater constraint qualification
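$\mathbb{R}^2$, $g(x_1, x_2) = (x_1, x_2 - x_1)$, an $\mathbb{R}^2_+$-convex and continuous function. Then $\operatorname{dom} f \cap S \cap g^{-1}(-\mathbb{R}^2_+) = \{(0, 0)\}$ and $v(P^L) = 0$. It holds
$v(D^L) = \sup_{\lambda_1, \lambda_2 \ge 0} \ \inf_{x_1, x_2 \ge 0} \left\{ \frac{1}{2}x_1^2 + x_2 + \lambda_1 x_1 + \lambda_2(x_2 - x_1) \right\} = 0$
and $(\lambda_1, \lambda_2) = (0, 0)$ is an optimal solution of the dual problem. One can easily see that none of the regularity conditions $(RC_i^L)$, $i \in \{1, 2, 2', 3, 4\}$, is fulfilled. On the other hand, $\bigcup_{(\lambda_1, \lambda_2) \in \mathbb{R}^2_+} \operatorname{epi}\left( f + \langle (\lambda_1, \lambda_2), g(\cdot)\rangle + \delta_{\mathbb{R}^2_+} \right)^* = \mathbb{R}^2 \times \mathbb{R}_+$, which means that the closedness-type regularity condition $(RC_5^L)$ is verified.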
Chapter V
Maximal monotone operators
In this chapter we will discuss the most significant notions and results of the
theory of maximal monotone operators.
88 V Maximal monotone operators
(d) Let $(H, \langle \cdot, \cdot\rangle)$ be a real Hilbert space, $C \subseteq H$ a nonempty set and $S : C \to H$ a nonexpansive operator, which means that $\|S(x) - S(y)\| \le \|x - y\|$ for all $x, y \in C$. Then the operator $T : H \rightrightarrows H$, defined as $T(x) = x - S(x)$ for all $x \in C$ and $T(x) = \emptyset$, otherwise, is monotone. Indeed, for all $x, y \in C$ it holds
$\langle T(y) - T(x), y - x\rangle = \langle y - x - S(y) + S(x), y - x\rangle = \|y - x\|^2 - \langle S(y) - S(x), y - x\rangle \ge \|y - x\|^2 - \|S(y) - S(x)\|\|y - x\| = \|y - x\|\left( \|y - x\| - \|S(y) - S(x)\| \right) \ge 0.$
Notice that $0 \in R(T)$ if and only if $S$ has a fixed point in $C$.
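The monotonicity of $T = \operatorname{Id} - S$ can be observed on random samples. The sketch below assumes, for illustration only, $H = C = \mathbb{R}^3$ and takes for $S$ the projection onto the box $[-1, 1]^3$, which is nonexpansive; the names `clip`, `T` and `monotonicity_gap` are hypothetical.

```python
import random


def clip(x):
    """Projection onto the box [-1, 1]^n, used here as the nonexpansive operator S."""
    return [max(-1.0, min(1.0, xi)) for xi in x]


def T(x):
    """T(x) = x - S(x)."""
    return [xi - si for xi, si in zip(x, clip(x))]


def monotonicity_gap(x, y):
    """<T(y) - T(x), y - x>, which monotonicity requires to be nonnegative."""
    tx, ty = T(x), T(y)
    return sum((b - a) * (yi - xi) for a, b, xi, yi in zip(tx, ty, x, y))


rng = random.Random(1)
gaps = [
    monotonicity_gap([rng.uniform(-3, 3) for _ in range(3)],
                     [rng.uniform(-3, 3) for _ in range(3)])
    for _ in range(1000)
]
smallest_gap = min(gaps)
assert smallest_gap >= -1e-12
```

In this instance the fixed points of $S$ are exactly the points of the box, and these are the zeros of $T$, in agreement with the remark that $0 \in R(T)$ if and only if $S$ has a fixed point.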
(e) Let (H, h·, ·i) be a real Hilbert space and C ⊆ H a nonempty subset. The
(set-valued) projection operator onto the set C
$P_C : H \rightrightarrows H, \quad P_C(x) = \left\{ u \in C \mid \|x - u\| = \inf_{c \in C} \|x - c\| \right\},$
Proof. Let
S1 ≤ S2 ⇔ G(S1 ) ⊆ G(S2 ).
$\lambda\langle Sx - x^*, u\rangle + \lambda^2\langle Su, u\rangle \ge 0$
or, equivalently,
$\langle Sx - x^*, u\rangle + \lambda\langle Su, u\rangle \ge 0.$
We let $\lambda$ converge to zero and obtain $\langle Sx - x^*, u\rangle \ge 0$ for all $u \in X$, consequently, $\langle Sx - x^*, u\rangle = 0$ for all $u \in X$, in other words, $Sx = x^*$. This shows that (12.1) holds, which proves that $S$ is maximally monotone.
and proves that T (x) is the intersection of a family of convex and weakly∗ closed
sets. Consequently, T (x) is convex and weakly∗ closed.
In real Hilbert spaces we have the following criterion for the maximality of a
monotone operator.
Proposition 12.8 Let (H, h·, ·i) be a real Hilbert space and T : H → H a single-
valued monotone and continuous operator. Then T is maximally monotone.
Proof. Let x0 ∈ dom f and ε > 0 be fixed. For every x ∈ X we consider the
set F (x) := {y ∈ X | f (y) + εd(x, y) ≤ f (x)}. We will prove that there exists
xε ∈ F (x0 ) such that F (xε ) = {xε }, which is nothing else than the conclusion of
the theorem.
Since y 7→ f (y) + εd(x, y) is lower semicontinuous, the set F (x) is closed for
every x ∈ X. For every x ∈ dom f it holds x ∈ F (x) ⊆ dom f , while for every
x ∈ X \ dom f it holds F (x) = X.
We also note that $F(y) \subseteq F(x)$ for every $y \in F(x)$. This is obvious for $x \notin \operatorname{dom} f$. Let $x \in \operatorname{dom} f$, $y \in F(x)$ and $z \in F(y)$. Then
which shows that (xn )n≥0 is a Cauchy sequence, thus it converges to some element
xε ∈ X.
Let x ∈ X. It holds ∂0 f (x) = ∂f (x) and, for 0 ≤ ε1 ≤ ε2 , ∂ε1 f (x) ⊆ ∂ε2 f (x).
Let ε ≥ 0. Then ∂ε f (x) is a convex and weakly∗ closed set,
and
x∗ ∈ ∂ε f (x) ⇔ f ∗ (x∗ ) + f (x) ≤ hx∗ , xi + ε ⇒ x ∈ ∂ε f ∗ (x∗ ).
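The characterization of the $\varepsilon$-subdifferential through the conjugate can be compared with the defining inequality on a simple function. The sketch below is an illustration under stated assumptions: it takes $f(x) = \frac{1}{2}x^2$ on $\mathbb{R}$ with $f^*(s) = \frac{1}{2}s^2$, and uses the standard definition $x^* \in \partial_\varepsilon f(x) \Leftrightarrow f(y) - f(x) \ge \langle x^*, y - x\rangle - \varepsilon$ for all $y$, tested on a finite grid only. The function names are hypothetical.

```python
def f(x):
    return 0.5 * x * x


def f_conj(s):
    """Conjugate of f(x) = x^2 / 2 (computed by hand)."""
    return 0.5 * s * s


def in_eps_subdifferential(x, s, eps):
    """Conjugate-based test: f^*(s) + f(x) <= s * x + eps."""
    return f_conj(s) + f(x) <= s * x + eps


def satisfies_definition(x, s, eps, grid):
    """Definition-based test on a grid: f(y) >= f(x) + s * (y - x) - eps for all y."""
    return all(f(y) >= f(x) + s * (y - x) - eps - 1e-12 for y in grid)


grid = [i / 100 for i in range(-1000, 1001)]
x, eps = 1.0, 0.5

for s in (0.1, 1.0, 1.9, 2.5, -0.5):
    assert in_eps_subdifferential(x, s, eps) == satisfies_definition(x, s, eps, grid)
```

For this $f$ the conjugate-based test reduces to $\frac{1}{2}(s - x)^2 \le \varepsilon$, so $\partial_\varepsilon f(x)$ is the interval $[x - \sqrt{2\varepsilon}, x + \sqrt{2\varepsilon}]$, which shrinks to $\{x\} = \partial f(x)$ as $\varepsilon \downarrow 0$.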
Next we formulate a theorem, which is a consequence of Ekeland's variational principle and relates the graph of the (convex) ε-subdifferential of a proper, convex and lower semicontinuous function defined on a Banach space to the graph of its (convex) subdifferential.
Proof. Let (x, x∗ ) ∈ X ×X ∗ such that hy ∗ −x∗ , y−xi ≥ 0 for all (y, y ∗ ) ∈ G(∂f ).
We will prove that x∗ ∈ ∂f (x), which will mean that (12.1) is fulfilled and will,
consequently, lead to the desired conclusion.
The following example shows that there are maximal monotone operators
which are not (convex) subdifferentials.
Example 13.6 (maximal monotone operators which are not (convex) subdif-
ferentials) A single-valued linear operator S : X → X ∗ is said to be skew if
hSx, xi = 0 for all x ∈ X or, equivalently, if
hx∗ , xi ≤ ϕ∂f (x, x∗ ) ≤ f (x) + f ∗ (x∗ ) ≤ ϕ∗∂f (x∗ , x) ≤ hx∗ , xi + δG(∂f ) (x, x∗ ).
(b) Let H be a real Hilbert space and C ⊆ H a nonempty, convex and closed
set. The Fitzpatrick function of the normal cone operator NC : X ⇒ X ∗ reads
ϕNC (x, x∗ ) = δC (x) + δC∗ (x∗ ).
We will denote by $J : X \rightrightarrows X^*$,
$J(x) = \partial\left( \frac{1}{2}\|\cdot\|^2 \right)(x) = \{x^* \in X^* \mid \|x^*\|_* = \|x\| \text{ and } \langle x^*, x\rangle = \|x\|\|x^*\|_*\},$
$\Delta(x, x^*) = \frac{1}{2}\|x\|^2 + \langle x^*, x\rangle + \frac{1}{2}\|x^*\|_*^2 \ge \frac{1}{2}\|x\|^2 - \|x\|\|x^*\|_* + \frac{1}{2}\|x^*\|_*^2 = \frac{1}{2}\left( \|x\| - \|x^*\|_* \right)^2 \ge 0.$
Then
In reflexive Banach spaces, one can formulate in terms of the duality mapping
the so-called “−J” criterion for the maximality of a monotone operator. In
this setting, we will identify the dual of X × X ∗ with X ∗ × X and consider as
duality product h(y ∗ , y), (x, x∗ )i = hy ∗ , xi + hx∗ , yi for (x, x∗ ) ∈ X × X ∗ and
(y ∗ , y) ∈ X ∗ × X. Consequently, the conjugate of a function h : X × X ∗ → R
reads
where $g : X \times X^* \to \mathbb{R}$,
$g(z, z^*) = \Delta(x - z, x^* - z^*) - \langle z^*, z\rangle = \frac{1}{2}\|x - z\|^2 + \frac{1}{2}\|x^* - z^*\|_*^2 - \langle x^*, z\rangle - \langle z^*, x\rangle + \langle x^*, x\rangle,$
is a convex and continuous function. According to strong Fenchel duality, there
exists (u∗ , u) ∈ X ∗ × X such that
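We have
$g^*(-u^*, -u) = \sup_{(z, z^*) \in X \times X^*} \left\{ \langle x^* - u^*, z\rangle + \langle z^*, x - u\rangle - \frac{1}{2}\|x - z\|^2 - \frac{1}{2}\|x^* - z^*\|_*^2 \right\} - \langle x^*, x\rangle$
$= \sup_{t \in X} \left\{ \langle x^* - u^*, x - t\rangle - \frac{1}{2}\|t\|^2 \right\} + \sup_{t^* \in X^*} \left\{ \langle x^* - t^*, x - u\rangle - \frac{1}{2}\|t^*\|_*^2 \right\} - \langle x^*, x\rangle$
$= \langle x^* - u^*, x\rangle + \frac{1}{2}\|x^* - u^*\|_*^2 + \langle x^*, x - u\rangle + \frac{1}{2}\|x - u\|^2 - \langle x^*, x\rangle$
$= \Delta(x - u, x^* - u^*) - \langle u^*, u\rangle,$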
From here it yields ϕ∗S (u∗ , u) = hu∗ , ui and ∆(x − u, x∗ − u∗ ) = 0 or, equivalently,
(u, u∗ ) ∈ G(S) and x∗ − u∗ ∈ J(u − x). Consequently,
R(S + J) = X ∗ .
Remark 14.7 In Hilbert spaces, by combining Theorem 14.5 and Remark 14.6,
we rediscover the celebrated Minty’s theorem. This says that if H is a real Hilbert
space and S : H ⇒ H a monotone operator, then S is maximally monotone if
and only if R(S + IdH ) = H.
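Minty's theorem can be illustrated on the real line. The sketch below assumes, for illustration only, $H = \mathbb{R}$ and $S = \partial|\cdot|$, which is maximally monotone as the subdifferential of a proper, convex and lower semicontinuous function; for every $y \in \mathbb{R}$ it produces an $x$ with $y \in x + S(x)$, which is the surjectivity $R(S + \operatorname{Id}_H) = H$ in this instance. The function names are hypothetical.

```python
def subdifferential_abs(x):
    """Subdifferential of |.| at x, returned as a closed interval (lo, hi)."""
    if x > 0:
        return (1.0, 1.0)
    if x < 0:
        return (-1.0, -1.0)
    return (-1.0, 1.0)


def resolvent(y):
    """Solve y in x + subdifferential_abs(x) for x (soft-thresholding with threshold 1)."""
    if y > 1.0:
        return y - 1.0
    if y < -1.0:
        return y + 1.0
    return 0.0


def solves_inclusion(x, y, tol=1e-12):
    """Check that y - x lies in the subdifferential of |.| at x."""
    lo, hi = subdifferential_abs(x)
    return lo - tol <= y - x <= hi + tol


targets = [i / 10 for i in range(-50, 51)]
assert all(solves_inclusion(resolvent(y), y) for y in targets)
```

If the value $S(0) = [-1, 1]$ were replaced by a single point, the operator would stay monotone, but the targets $y \in [-1, 1]$ not matching that point could no longer be reached, which is how the loss of maximality shows up in the range condition.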
If a Banach space is uniformly smooth and uniformly convex, as are for in-
stance Lp -spaces, for 1 < p < ∞, then both J and J −1 are single-valued.
In the following we show that in reflexive Banach spaces one can build on
the approach introduced in Proposition 14.8 to construct maximal monotone
operators.
14 Monotone operators and their representative functions 101
Proof. Let
Since 0 ∈ PrX dom ϕS −PrX dom ϕT , there exist x ∈ X and s∗ , t∗ ∈ X ∗ , such that
(x, s∗ ) ∈ dom ϕS and (x, t∗ ) ∈ dom ϕT . Consequently, h(x, s∗ + t∗ ) ≤ ϕS (x, s∗ ) +
ϕT (x, t∗ ) < +∞, which proves that h is proper.
Let (x, x∗ ) ∈ X × X ∗ be fixed. It holds
where
The functions F and G are proper, convex and lower semicontinuous and fulfil
and
dom G = {(u, s∗ , t∗ ) ∈ X × X ∗ × X ∗ | (u, t∗ ) ∈ dom ϕT }.
It holds
thus, according to (15.1), 0 ∈ sqri(dom F − dom G). From Theorem 10.2 (a) and
(15.2) we have
and
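$G^*(x^* - u^*, x - s, x - t) = \sup_{u \in X,\ s^*, t^* \in X^*} \{\langle x^* - u^*, u\rangle + \langle s^*, x - s\rangle + \langle t^*, x - t\rangle - \varphi_T(u, t^*)\} = \begin{cases} \varphi_T^*(x^* - u^*, x - t), & \text{if } x = s, \\ +\infty, & \text{otherwise}, \end{cases}$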
consequently,
$h^*(x^*, x) = \min_{u^* \in X^*} \{\varphi_S^*(u^*, x) + \varphi_T^*(x^* - u^*, x)\}.$
This proves that the hypotheses of Proposition 14.9 are fulfilled, consequently,
the set-valued operator having as graph {(x, x∗ ) ∈ X × X ∗ | h∗ (x∗ , x) = hx∗ , xi}
is maximally monotone.
We notice that for (x, x∗ ) ∈ X × X ∗ it holds
h∗ (x∗ , x) = hx∗ , xi
⇔ ∃u∗ ∈ X ∗ such that
hx∗ , xi = ϕ∗S (u∗ , x) + ϕ∗T (x∗ − u∗ , x) ≥ hu∗ , xi + hx∗ − u∗ , xi = hx∗ , xi
⇔ ∃u∗ ∈ X ∗ such that ϕ∗S (u∗ , x) = hu∗ , xi and ϕ∗T (x∗ − u∗ , x) = hx∗ − u∗ , xi
⇔ ∃u∗ ∈ X ∗ such that (x, u∗ ) ∈ G(S) and (x, x∗ − u∗ ) ∈ G(T )
⇔ ∃u∗ ∈ X ∗ such that x∗ = u∗ + (x∗ − u∗ ) ∈ S(x) + T (x)
⇔ (x, x∗ ) ∈ G(S + T ).
if and only if
A sufficient condition for these conditions to hold and, consequently, for the
maximal monotonicity of S +T is the celebrated Rockafellar’s regularity condition
$\operatorname{int}(D(S)) \cap D(T) \neq \emptyset.$
0 ∈ int(D(S) − D(T ))
Remark 15.5 Since F and G in the proof of Theorem 15.3 are proper, convex and lower semicontinuous functions, one can obtain relation (15.3), according to Theorem 10.2 (a), also if the closedness-type regularity condition (here we also take into account that convex closed sets are weakly closed and that X is reflexive)
is fulfilled.
In other words, if X is a reflexive Banach space and S, T : X ⇒ X ∗ are two
maximal monotone operators such that
= 0, if x ≥ 0, x + x∗ ≤ 0,
+∞, otherwise,
and, for all (y, y ∗ ) ∈ R2 , by Example 14.3 (b), ϕT (y, y ∗ ) = δR− (y) + δR+ (y ∗ ).
Therefore, for all (x, x∗ ) ∈ R2 ,
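$\varphi_S^*(x^*, x) = \max\left\{ \sup_{z \ge 0,\ z + z^* \le 0} \{x^* z + x z^*\},\ \sup_{z \ge 0,\ z + z^* \ge 0} \left\{ x^* z + x z^* - \frac{1}{4}(z + z^*)^2 \right\} \right\} = \begin{cases} x^2, & \text{if } x \ge 0,\ x \ge x^*, \\ +\infty, & \text{otherwise}, \end{cases}$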
[1] H.H. Bauschke, P.L. Combettes: Convex Analysis and Monotone Operator
Theory in Hilbert Spaces, Springer New York Dordrecht Heidelberg London,
2017
[3] R.I. Boţ: Conjugate Duality in Convex Optimization, Lecture Notes in Eco-
nomics and Mathematical Systems, Vol. 637, Springer Berlin Heidelberg, 2010