
Lecture Notes "Convex Analysis": Univ.-Prof. Dr. Radu Ioan Bot

This document contains lecture notes on convex analysis. It covers topics such as convex sets, convex functions, conjugate functions, subdifferentiability, and duality theory. The notes begin with definitions of basic concepts in convex sets like affine and convex sets. It defines the linear hull, affine hull, convex hull, and conical hull of a set. It also covers properties of convex sets and functions, such as the fact that the intersection of convex sets is convex.

Lecture Notes

“Convex Analysis”
Summer Semester 2021
Univ.-Prof. Dr. Radu Ioan Boţ
Preface

“... in fact, the great watershed in optimization isn’t between linearity and nonlinearity, but convexity and nonconvexity.” (R. Tyrrell Rockafellar, 1993)

Contents

I Convex sets
1 Preliminary notions and results
2 The relative interior of a convex set in finite-dimensional spaces

II Convex functions
3 Algebraic properties of convex functions
4 Topological properties of convex functions

III Conjugate functions and subdifferentiability
5 Conjugate functions
6 Convex subdifferential
7 Directional derivative and differentiability

IV Duality theory
8 A general perturbation approach
9 Closedness-type regularity conditions
10 Fenchel duality
11 Lagrange duality

V Maximal monotone operators
12 Monotone set-valued operators
13 The maximal monotonicity of the convex subdifferential
14 A representation of monotone operators by convex functions
15 The maximal monotonicity of the sum of two maximal monotone operators

Bibliography

Chapter I

Convex sets

In this chapter we will discuss some algebraic and topological properties of convex
sets.
Throughout this lecture X will denote a nontrivial real normed space with
norm ‖ · ‖. For C, D ⊆ X, we denote by C + D := {x + y | x ∈ C, y ∈ D} the
Minkowski sum of C and D. For λ ∈ R, let λC := {λx | x ∈ C}.

1 Preliminary notions and results


Definition 1.1 (affine set and convex set) A set C ⊆ X is said to be
(a) affine, if λx + (1 − λ)y ∈ C for all x, y ∈ C and all λ ∈ R;
(b) convex, if λx + (1 − λ)y ∈ C for all x, y ∈ C and all λ ∈ [0, 1];
(c) a cone, if λx ∈ C for all x ∈ C and all λ ≥ 0.

Every affine set is convex. The empty set is affine and, therefore, convex and
it is a cone. Every linear subspace of X is an affine set. Every affine set which
contains the origin is a linear subspace. A cone C ⊆ X is convex if and only if
C + C ⊆ C.
Let C ⊆ X be a nonempty set. We have

C is affine ⇔ ∀x ∈ C the set C − x is a linear subspace


⇔ ∃x ∈ C such that C − x is a linear subspace.

For C an affine set and x ∈ C, one has C − x = C − y for all y ∈ C. The set
C − x is called the linear subspace parallel to C and the dimension of the affine
set C is defined as being equal to the dimension of C − x.
By induction one can prove that a set C ⊆ X is

(a) affine if and only if for all $k \in \mathbb{N}$, $x_i \in C$ and $\lambda_i \in \mathbb{R}$, $i = 1, \ldots, k$, with $\sum_{i=1}^k \lambda_i = 1$, it holds $\sum_{i=1}^k \lambda_i x_i \in C$;

(b) convex if and only if for all $k \in \mathbb{N}$, $x_i \in C$ and $\lambda_i \geq 0$, $i = 1, \ldots, k$, with $\sum_{i=1}^k \lambda_i = 1$, it holds $\sum_{i=1}^k \lambda_i x_i \in C$.

The intersection of an arbitrary family of convex sets (cones, affine sets, linear subspaces, respectively) is a convex set (cone, affine set, linear subspace, respectively).

Definition 1.2 (linear hull, affine hull, convex hull, conical hull) Let C ⊆ X.
The set

(a) lin C := ∩{D ⊆ X | C ⊆ D, D is a linear subspace} is called the linear hull of C.

(b) aff C := ∩{D ⊆ X | C ⊆ D, D is an affine set} is called the affine hull of C.

(c) co C := ∩{D ⊆ X | C ⊆ D, D is a convex set} is called the convex hull of C.

(d) cone C := ∩{D ⊆ X | C ⊆ D, D is a cone} is called the conical hull of C.

The following result provides a useful characterization of these sets.

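Proposition 1.3 Let $C \subseteq X$. It holds:

(a) $\operatorname{lin} C = \left\{ \sum_{i=1}^k \lambda_i x_i \mid k \in \mathbb{N},\ x_i \in C,\ \lambda_i \in \mathbb{R},\ i = 1, \ldots, k \right\}$.

(b) $\operatorname{aff} C = \left\{ \sum_{i=1}^k \lambda_i x_i \mid k \in \mathbb{N},\ x_i \in C,\ \lambda_i \in \mathbb{R},\ i = 1, \ldots, k,\ \sum_{i=1}^k \lambda_i = 1 \right\}$.

(c) $\operatorname{co} C = \left\{ \sum_{i=1}^k \lambda_i x_i \mid k \in \mathbb{N},\ x_i \in C,\ \lambda_i \geq 0,\ i = 1, \ldots, k,\ \sum_{i=1}^k \lambda_i = 1 \right\}$.

(d) $\operatorname{cone} C = \{\lambda x \mid x \in C,\ \lambda \geq 0\}$.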

Proof. We will prove only statement (c). The other statements can be proven in an analogous way.
Let $D := \left\{ \sum_{i=1}^k \lambda_i x_i \mid k \in \mathbb{N},\ x_i \in C,\ \lambda_i \geq 0,\ i = 1, \ldots, k,\ \sum_{i=1}^k \lambda_i = 1 \right\}$. It is obvious that $C \subseteq D$. An arbitrary element $x \in D$ can be represented as $x = \sum_{i=1}^k \lambda_i x_i$, for $k \in \mathbb{N}$, $x_i \in C$ and $\lambda_i \geq 0$, $i = 1, \ldots, k$, such that $\sum_{i=1}^k \lambda_i = 1$. Since $x_i \in \operatorname{co} C$, $i = 1, \ldots, k$, and $\operatorname{co} C$ is convex, it holds that $x = \sum_{i=1}^k \lambda_i x_i \in \operatorname{co} C$. This proves that $C \subseteq D \subseteq \operatorname{co} C$.
The conclusion will follow by proving that $D$ is a convex set. Let $x, y \in D$ and $t \in [0, 1]$. Then $x = \sum_{i=1}^k \lambda_i x_i$, for $k \in \mathbb{N}$, $x_i \in C$ and $\lambda_i \geq 0$, $i = 1, \ldots, k$, such that $\sum_{i=1}^k \lambda_i = 1$, and $y = \sum_{j=1}^l \mu_j y_j$, for $l \in \mathbb{N}$, $y_j \in C$ and $\mu_j \geq 0$, $j = 1, \ldots, l$, such that $\sum_{j=1}^l \mu_j = 1$. Then it holds
\[ t x + (1 - t) y = \sum_{i=1}^k t \lambda_i x_i + \sum_{j=1}^l (1 - t) \mu_j y_j \in D, \]
since $\sum_{i=1}^k t \lambda_i + \sum_{j=1}^l (1 - t) \mu_j = t + 1 - t = 1$. □

If C ⊆ X is a convex set, then λC + µC = (λ + µ)C for all λ, µ ≥ 0. From here it follows that, if C is a convex set, then cone C is also a convex set.
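The identity λC + µC = (λ + µ)C genuinely depends on convexity. As a minimal sketch (the helper names `minkowski_sum` and `scale` are illustrative and not part of these notes), the following Python snippet evaluates both sides for the nonconvex two-point set C = {0, 1} ⊆ R with λ = µ = 1. Only the inclusion (λ + µ)C ⊆ λC + µC, which holds for every set, survives.

```python
def minkowski_sum(C, D):
    """Return C + D = {x + y | x in C, y in D} for finite sets of numbers."""
    return {x + y for x in C for y in D}


def scale(lam, C):
    """Return lam * C = {lam * x | x in C} for a finite set of numbers."""
    return {lam * x for x in C}


# C = {0, 1} is not convex: the midpoint 1/2 is missing.
C = {0, 1}
lam, mu = 1, 1

lhs = minkowski_sum(scale(lam, C), scale(mu, C))  # lam*C + mu*C
rhs = scale(lam + mu, C)                          # (lam + mu)*C

print("lam*C + mu*C  =", sorted(lhs))
print("(lam + mu)*C  =", sorted(rhs))
print("equal:", lhs == rhs, "| (lam+mu)C subset of lam*C + mu*C:", rhs <= lhs)
```

For a convex set such as an interval the two sides coincide, which cannot be checked on finite sets like this and is therefore left out of the sketch.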
An operator T : X → Y , where Y is another nontrivial real normed space, is
called

(a) linear, if T (λx + µy) = λT (x) + µT (y) for all x, y ∈ X and all λ, µ ∈ R;

(b) affine, if T (λx + (1 − λ)y) = λT (x) + (1 − λ)T (y) for all x, y ∈ X and all
λ ∈ R.

It is easy to see that $T$ is affine if and only if $x \mapsto T(x) - T(0)$ is linear. If $T : X \to Y$ is an affine operator and $C \subseteq X$ and $D \subseteq Y$ are convex sets, then the sets $T^{-1}(D) = \{x \in X \mid T(x) \in D\} \subseteq X$ and $T(C) := \{T(x) \mid x \in C\} \subseteq Y$ are convex, too.
If $C_i \subseteq X$ are convex sets and $\lambda_i \in \mathbb{R}$, $i = 1, \ldots, k$, then $T : X \times \ldots \times X \to X$, $T(x_1, \ldots, x_k) = \sum_{i=1}^k \lambda_i x_i$, is a linear operator. The set $C_1 \times \ldots \times C_k$ is a convex subset of the product space $X \times \ldots \times X$, which implies that $T(C_1 \times \ldots \times C_k) = \sum_{i=1}^k \lambda_i C_i$ is a convex set.

Proposition 1.4 Let C ⊆ X be a given set and x ∈ aff C. It holds

aff C = x + lin(C − x).

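Proof. “⊇” We have that $C - x \subseteq \operatorname{aff} C - x$ and $\operatorname{aff} C - x$ is a linear subspace, therefore $\operatorname{lin}(C - x) \subseteq \operatorname{aff} C - x$.
“⊆” Let $y \in \operatorname{aff} C$. Then $y = \sum_{i=1}^k \lambda_i x_i$, for $k \in \mathbb{N}$, $x_i \in C$, $\lambda_i \in \mathbb{R}$, $i = 1, \ldots, k$, such that $\sum_{i=1}^k \lambda_i = 1$. It holds $y - x = \sum_{i=1}^k \lambda_i x_i - x = \sum_{i=1}^k \lambda_i (x_i - x) \in \operatorname{lin}(C - x)$, which proves that $\operatorname{aff} C - x \subseteq \operatorname{lin}(C - x)$. □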

The dimension of a given set $C \subseteq X$ is defined as being the dimension of its affine hull. The following theorem provides a more precise representation for the convex hull of a set of finite dimension. Before formulating it, we want to recall that the vectors $x_i \in X$, $i = 0, 1, \ldots, m$, are said to be affinely independent if the dimension of the set $\operatorname{aff}(\{x_0, x_1, \ldots, x_m\})$ is equal to $m$. Otherwise they are said to be affinely dependent.

According to Proposition 1.4 it holds
\[ \operatorname{aff}(\{x_0, x_1, \ldots, x_m\}) = x_0 + \operatorname{lin}(\{x_1 - x_0, \ldots, x_m - x_0\}), \]
consequently, $x_0, x_1, \ldots, x_m$ are affinely independent if and only if $x_1 - x_0, \ldots, x_m - x_0$ are linearly independent. This is further equivalent to
\[ \sum_{i=0}^m \lambda_i x_i = 0, \quad \sum_{i=0}^m \lambda_i = 0 \ \Rightarrow\ (\lambda_0, \lambda_1, \ldots, \lambda_m) = 0. \tag{1.1} \]
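In R^n the criterion above can be tested mechanically: the points are affinely independent exactly when the difference vectors x_1 − x_0, ..., x_m − x_0 have rank m. The following Python sketch does this with exact rational arithmetic; the function names `rank` and `affinely_independent` are illustrative assumptions, not notation from these notes.

```python
from fractions import Fraction


def rank(rows):
    """Rank of a matrix (list of rows) via exact Gaussian elimination."""
    m = [[Fraction(v) for v in row] for row in rows]
    ncols = len(m[0]) if m else 0
    r = 0  # number of pivots found so far
    for c in range(ncols):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        # Eliminate column c below the pivot row.
        for i in range(r + 1, len(m)):
            f = m[i][c] / m[r][c]
            m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r


def affinely_independent(points):
    """True iff x_1 - x_0, ..., x_m - x_0 are linearly independent."""
    x0, rest = points[0], points[1:]
    diffs = [[a - b for a, b in zip(x, x0)] for x in rest]
    return rank(diffs) == len(diffs)


print(affinely_independent([(0, 0), (1, 0), (0, 1)]))  # vertices of a triangle
print(affinely_independent([(0, 0), (1, 1), (2, 2)]))  # three collinear points
```

Exact fractions are used so that the rank decision does not depend on a floating-point tolerance.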

Theorem 1.5 (Theorem of Carathéodory) Let $C \subseteq X$ be a nonempty set with $\dim C = k$. Then $x \in \operatorname{co} C$ if and only if $x$ can be represented as a convex combination of $k + 1$ elements of $C$, namely, there exist $x_i \in C$ and $\lambda_i \geq 0$, $i = 1, \ldots, k + 1$, with $\sum_{i=1}^{k+1} \lambda_i = 1$, such that $x = \sum_{i=1}^{k+1} \lambda_i x_i$.

Proof. “⇐” It follows by Proposition 1.3 (c).
“⇒” Let $x \in \operatorname{co} C$. Then there exist $m \in \mathbb{N}$, $x_i \in C$, $\lambda_i \geq 0$, $i = 1, \ldots, m$, with $\sum_{i=1}^m \lambda_i = 1$, such that $x = \sum_{i=1}^m \lambda_i x_i$. If $m \leq k + 1$, then there is nothing to be proven.
Assume that $m > k + 1$. As $\dim \operatorname{aff} C = \dim C = k$ and $x_i \in \operatorname{aff} C$, $i = 1, \ldots, m$, it follows that $x_1, \ldots, x_m$ are affinely dependent, which means that there exists $(\mu_1, \ldots, \mu_m) \neq 0$ such that $\sum_{i=1}^m \mu_i x_i = 0$ and $\sum_{i=1}^m \mu_i = 0$. We define
\[ \bar{s} := \min \left\{ \frac{\lambda_i}{\mu_i} \ \Big|\ \mu_i > 0,\ i = 1, \ldots, m \right\} \quad \text{and} \quad \bar{\lambda}_i := \lambda_i - \bar{s} \mu_i,\ i = 1, \ldots, m. \]
Since there exists at least one positive $\mu_i$, $i = 1, \ldots, m$, $\bar{s}$ is well-defined. Then $\bar{\lambda}_i \geq 0$ for all $i = 1, \ldots, m$, $\sum_{i=1}^m \bar{\lambda}_i = \sum_{i=1}^m \lambda_i - \bar{s} \sum_{i=1}^m \mu_i = 1$ and $\sum_{i=1}^m \bar{\lambda}_i x_i = \sum_{i=1}^m \lambda_i x_i - \bar{s} \sum_{i=1}^m \mu_i x_i = \sum_{i=1}^m \lambda_i x_i = x$. Since at least one of the $\bar{\lambda}_i$, $i = 1, \ldots, m$, is equal to zero, this means that $x$ can be represented as a convex combination of $m - 1$ elements in $C$.
If $m - 1 = k + 1$, then the proof is done. Otherwise, we can repeat the procedure until we obtain a representation for $x$ as a convex combination of at most $k + 1$ elements in $C$. □
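The reduction step of this proof is constructive and can be sketched in Python for points in R^n. The code below is only an illustration under stated assumptions: the names `null_vector` and `caratheodory_reduce` are hypothetical, exact rational arithmetic is used, and the loop runs until the remaining points are affinely independent, which gives at most dim C + 1 points. Each pass finds (µ_1, ..., µ_m) ≠ 0 with Σ µ_i x_i = 0 and Σ µ_i = 0, forms s̄ = min{λ_i/µ_i | µ_i > 0}, and replaces λ_i by λ_i − s̄µ_i.

```python
from fractions import Fraction


def null_vector(A):
    """Return a nonzero exact solution v of A v = 0, or None if only v = 0 works."""
    rows = [[Fraction(v) for v in row] for row in A]
    ncols = len(rows[0])
    pivots = []  # pivot column of each reduced row, in order
    r = 0
    for c in range(ncols):
        p = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if p is None:
            continue
        rows[r], rows[p] = rows[p], rows[r]
        pv = rows[r][c]
        rows[r] = [a / pv for a in rows[r]]
        # Clear column c in every other row (reduced row echelon form).
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                f = rows[i][c]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        pivots.append(c)
        r += 1
    free = [c for c in range(ncols) if c not in pivots]
    if not free:
        return None
    # Set one free variable to 1, the others to 0, and solve for the pivots.
    v = [Fraction(0)] * ncols
    v[free[0]] = Fraction(1)
    for row, c in zip(rows, pivots):
        v[c] = -row[free[0]]
    return v


def caratheodory_reduce(points, weights):
    """Shrink a convex combination until its points are affinely independent."""
    pts = [tuple(Fraction(c) for c in p) for p in points]
    lam = [Fraction(w) for w in weights]
    while True:
        # Drop points whose weight is zero.
        keep = [i for i, l in enumerate(lam) if l != 0]
        pts, lam = [pts[i] for i in keep], [lam[i] for i in keep]
        # Rows encode sum(mu_i * x_i) = 0 coordinatewise and sum(mu_i) = 0.
        n = len(pts[0])
        A = [[p[j] for p in pts] for j in range(n)] + [[1] * len(pts)]
        mu = null_vector(A)
        if mu is None:  # affinely independent: nothing left to remove
            return pts, lam
        s = min(l / m for l, m in zip(lam, mu) if m > 0)
        lam = [l - s * m for l, m in zip(lam, mu)]


# The centre of a square written with its four vertices (dim C = 2).
square = [(0, 0), (2, 0), (2, 2), (0, 2)]
pts, lam = caratheodory_reduce(square, [Fraction(1, 4)] * 4)
print(pts, lam)
```

The sketch takes a single null vector per pass, as in the proof; which points survive depends on that choice, so the resulting representation is not unique.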

The next theorem shows that the interior and the closure of a convex set are
also convex sets.

Theorem 1.6 Let C ⊆ X be a convex set. The following statements are true:

(a) cl C is a convex set;

(b) for all x ∈ int C, y ∈ cl C and λ ∈ (0, 1] it holds λx + (1 − λ)y ∈ int C;

(c) int C is a convex set;


(d) if int C ≠ ∅, then cl(int C) = cl C and int(cl C) = int C.

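Proof. (a) Let $x, y \in \operatorname{cl} C$ and $\lambda \in [0, 1]$. Let $(x^k)_{k \in \mathbb{N}}, (y^k)_{k \in \mathbb{N}} \subseteq C$ be such that $x^k \to x$ and $y^k \to y$ as $k \to +\infty$. Then $\lambda x^k + (1 - \lambda) y^k \in C$ for all $k \in \mathbb{N}$, consequently, $\lambda x + (1 - \lambda) y \in \operatorname{cl} C$.
(b) Let $x \in \operatorname{int} C$, $y \in \operatorname{cl} C$ and $\lambda \in (0, 1]$. Then there exists $\varepsilon > 0$ such that $x + B\left(0, \frac{2 - \lambda}{\lambda} \varepsilon\right) = B\left(x, \frac{2 - \lambda}{\lambda} \varepsilon\right) = \left\{ u \in X \mid \|u - x\| < \frac{2 - \lambda}{\lambda} \varepsilon \right\} \subseteq C$. On the other hand, $B(y, \varepsilon) \cap C \neq \emptyset$ or, equivalently, $y \in B(0, \varepsilon) + C$. From here we obtain
\begin{align*}
B(\lambda x + (1 - \lambda) y, \varepsilon) &= \lambda x + (1 - \lambda) y + B(0, \varepsilon) \\
&\subseteq \lambda x + (1 - \lambda) B(0, \varepsilon) + (1 - \lambda) C + B(0, \varepsilon) \\
&= \lambda x + (2 - \lambda) B(0, \varepsilon) + (1 - \lambda) C \\
&= \lambda \left( x + \frac{2 - \lambda}{\lambda} B(0, \varepsilon) \right) + (1 - \lambda) C \subseteq \lambda C + (1 - \lambda) C = C.
\end{align*}
Consequently, $\lambda x + (1 - \lambda) y \in \operatorname{int} C$.
(c) This is a direct consequence of (b).
(d) Let $x \in \operatorname{int} C$. It holds $\operatorname{cl}(\operatorname{int} C) \subseteq \operatorname{cl} C$. Let $y \in \operatorname{cl} C$. According to (b), $\frac{1}{k} x + \left(1 - \frac{1}{k}\right) y \in \operatorname{int} C$ for all $k \in \mathbb{N}$. We let $k \to +\infty$ and obtain $y \in \operatorname{cl}(\operatorname{int} C)$, thus $\operatorname{cl}(\operatorname{int} C) = \operatorname{cl} C$.
It holds $\operatorname{int} C \subseteq \operatorname{int}(\operatorname{cl} C)$. Let $y \in \operatorname{int}(\operatorname{cl} C)$. Then there exists $\varepsilon > 0$ such that $B(y, \varepsilon) \subseteq \operatorname{cl} C$. Let $\bar{\lambda} < 0$ be such that $u := y + \bar{\lambda}(x - y) \in B(y, \varepsilon) \subseteq \operatorname{cl} C$. It follows from (b) that $y = \frac{-\bar{\lambda}}{1 - \bar{\lambda}} x + \frac{1}{1 - \bar{\lambda}} u \in \operatorname{int} C$, which proves that $\operatorname{int}(\operatorname{cl} C) \subseteq \operatorname{int} C$, thus $\operatorname{int}(\operatorname{cl} C) = \operatorname{int} C$. □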

Remark 1.7 If C ⊆ X is a given set, then cl(co C) is a convex set and it holds

co C ⊆ co(cl C) ⊆ cl(co C).

The second inclusion is in general strict, as can be seen for the set C = (R × {0}) ∪ {(0, 1)} ⊆ R^2.
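The strictness claimed in Remark 1.7 can be checked by hand. The following worked computation is a sketch added for illustration; it uses only the fact that the convex hull of the union of two convex sets consists of the convex combinations of one point from each.

```latex
\begin{align*}
C &= (\mathbb{R} \times \{0\}) \cup \{(0,1)\}, \qquad \operatorname{cl} C = C, \\
\operatorname{co} C = \operatorname{co}(\operatorname{cl} C)
  &= \{ (1-t)(s,0) + t(0,1) \mid s \in \mathbb{R},\ t \in [0,1] \}
   = (\mathbb{R} \times [0,1)) \cup \{(0,1)\}, \\
\operatorname{cl}(\operatorname{co} C) &= \mathbb{R} \times [0,1].
\end{align*}
```

In particular, the point (1, 1) belongs to cl(co C) but not to co(cl C), while the first inclusion co C ⊆ co(cl C) is an equality for this set.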

In the last part of this section we will propose some refinements of the concept
of the interior.

Definition 1.8 (algebraic interior/core) Let C ⊆ X. The algebraic interior or core of the set C is defined as

core C := {x ∈ X | ∀u ∈ X ∃δ > 0 such that ∀λ ∈ [0, δ] it holds x + λu ∈ C}.

One always has that int C ⊆ core C ⊆ C. Indeed, let x ∈ int C and ε > 0
such that B(x, ε) ⊆ C. Let u ∈ X. Then there exists δ > 0 such that x + δu ∈
B(x, ε) ⊆ C. Consequently, for all λ ∈ [0, δ], it holds x + λu ∈ B(x, ε) ⊆ C, thus,
x ∈ core C.

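Proposition 1.9 Let $C \subseteq X$ be a nonempty convex set. For every $x \in C$ it holds:
\[ x \in \operatorname{core} C \ \Leftrightarrow\ \forall u \in X\ \exists \delta > 0 \text{ such that } x + \delta u \in C \ \Leftrightarrow\ \operatorname{cone}(C - x) = X. \]
Proof. It is easy to see that $x \in \operatorname{core} C$ implies $\forall u \in X\ \exists \delta > 0$ such that $x + \delta u \in C$ and also that this statement implies $\operatorname{cone}(C - x) = X$.
Assume that $\operatorname{cone}(C - x) = X$ and let $u \in X$. Then there exists $\delta > 0$ such that $u \in \frac{1}{\delta}(C - x)$ or, equivalently, $x + \delta u \in C$. Let $\lambda \in [0, \delta]$. Then
\[ x + \lambda u = \left(1 - \frac{\lambda}{\delta}\right) x + \frac{\lambda}{\delta}(x + \delta u) \in C, \]
which proves that $x \in \operatorname{core} C$. □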

The following example introduces a convex set with nonempty algebraic interior and empty interior.

Example 1.10 Let $X$ be an infinite-dimensional normed space and $x^\sharp : X \to \mathbb{R}$ a non-continuous linear functional (one can use Hamel bases to construct such a functional). Let $C := \{x \in X \mid |x^\sharp(x)| < 1\}$, which is obviously a convex set.
We prove that $\operatorname{core} C = C$. Let $x \in C$ and $\alpha := x^\sharp(x) \in \mathbb{R}$ with $|\alpha| < 1$. Let $u \in X$. Then
\[ \left| x^\sharp\left( x + \frac{1 - |\alpha|}{2(|x^\sharp(u)| + 1)}\, u \right) \right| \leq |x^\sharp(x)| + \frac{1 - |\alpha|}{2} < 1, \]
thus $u \in \frac{2(|x^\sharp(u)| + 1)}{1 - |\alpha|}(C - x) \subseteq \operatorname{cone}(C - x)$. This proves that $x \in \operatorname{core} C$.
On the other hand, $\operatorname{int} C = \emptyset$. Indeed, assume that there exists $x \in \operatorname{int} C$. Then there exists $\delta > 0$ such that $B(x, \delta) \subseteq C$. This means that for every $u \in X$, $\|u\| \leq 1$, it holds $x + \frac{\delta}{2} u \in C$, thus $x^\sharp(x) + \frac{\delta}{2} x^\sharp(u) < 1$ and, from here, $|x^\sharp(u)| < \frac{2}{\delta}(1 + |x^\sharp(x)|)$. Since this actually means that $x^\sharp$ is continuous, we get a contradiction.

The following theorem provides conditions for the algebraic interior of a convex set to be equal to the interior of the set.

Theorem 1.11 Let C ⊆ X be a convex set. If one of the following conditions is fulfilled

(a) int C is nonempty;

(b) X is finite-dimensional;

(c) X is a Banach space and C is closed;

then int C = core C.


Proof. (a) Assume that $\operatorname{int} C$ is nonempty and choose $y \in \operatorname{int} C$. Let $x \in \operatorname{core} C$. Then there exists $\delta > 0$ such that $z := x + \delta(x - y) = (1 + \delta) x - \delta y \in C$. Let $t := \frac{\delta}{\delta + 1} \in (0, 1)$. According to Theorem 1.6 (b) we have $t y + (1 - t) z \in \operatorname{int} C$. However, $t y + (1 - t) z = \frac{\delta}{\delta + 1} y + \frac{1}{\delta + 1}((1 + \delta) x - \delta y) = x$, which proves that $x \in \operatorname{int} C$.
(b) Assume that $\dim X = n$. Let $x \in \operatorname{core} C$. Then there exists a basis $\{e_1, \ldots, e_n\}$ of $X$ such that $x \pm e_i \in C$ for all $i = 1, \ldots, n$. The function $\|\cdot\|_1 : X \to \mathbb{R}$, $\left\| \sum_{i=1}^n \alpha_i e_i \right\|_1 = \sum_{i=1}^n |\alpha_i|$, defines a norm on $X$. Since any two norms on finite-dimensional spaces are equivalent, the set $B_{\|\cdot\|_1}(0, 1)$ is a neighbourhood of zero in the norm topology of $X$.
For every $y \in x + B_{\|\cdot\|_1}(0, 1)$ we have that $y - x = \sum_{i=1}^n \alpha_i e_i$, where $\sum_{i=1}^n |\alpha_i| < 1$. The convexity of $C$ gives
\[ y = x + \sum_{i=1}^n \alpha_i e_i = \sum_{i : \alpha_i > 0} \alpha_i (x + e_i) + \sum_{i : \alpha_i < 0} (-\alpha_i)(x - e_i) + \left( 1 - \sum_{i=1}^n |\alpha_i| \right) x \in C, \]
thus, $x \in \operatorname{int} C$.
(c) Assume that $X$ is a Banach space and $C$ is closed. Let $x \in \operatorname{core} C$. By the definition of the algebraic interior we have that $X = \bigcup_{n \in \mathbb{N}} n(C - x)$. By the Baire Category Theorem (see, for instance, [4, Corollary 10.4]), we have that $X$ is a set of the second category, which means that there exists $n_0 \in \mathbb{N}$ such that $\emptyset \neq \operatorname{int}(\operatorname{cl}(n_0(C - x))) = \operatorname{int}(n_0(C - x))$. Consequently, $\emptyset \neq \operatorname{int}(C - x)$, which means that there exist $u \in C - x$ and $\varepsilon > 0$ such that $x + u + B(0, \varepsilon) \subseteq C$. On the other hand, since $x \in \operatorname{core} C$, there exists $t > 0$ such that $x - t u \in C$. The convexity of $C$ yields
\[ x + B\left(0, \frac{\varepsilon t}{t + 1}\right) = \frac{t}{t + 1}(x + u + B(0, \varepsilon)) + \frac{1}{t + 1}(x - t u) \subseteq C, \]
thus $x \in \operatorname{int} C$. □

Definition 1.12 (algebraic relative interior/intrinsic core) Let C ⊆ X. The
algebraic relative interior or intrinsic core of the set C is defined as
icr C:={x ∈ X | ∀y ∈ aff C ∃δ > 0 such that ∀λ ∈ [−δ, δ] it holds x+λ(y−x) ∈ C}.
Proposition 1.13 Let C ⊆ X. Then x ∈ core C if and only if aff C = X and
x ∈ icr C.
Proof. “⇐” It follows from the definitions of the intrinsic core and of the core.
“⇒” Let x ∈ core C. It is clear that x ∈ icr C. We will prove that aff C = X.
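Let $u \in X$. Then there exists $\lambda > 0$ such that $x + \lambda(u - x) \in C$. From here, we get $u \in \frac{1}{\lambda} C + \left(1 - \frac{1}{\lambda}\right) x \subseteq \frac{1}{\lambda} \operatorname{aff} C + \left(1 - \frac{1}{\lambda}\right) \operatorname{aff} C \subseteq \operatorname{aff} C$. This proves that $\operatorname{aff} C = X$. □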

As a direct consequence of Proposition 1.13 it follows that, if C ⊆ X is a set with nonempty interior, then aff C = X.
The following proposition provides a useful geometric characterization for the intrinsic core of a convex set.

Proposition 1.14 Let C ⊆ X be a nonempty convex set. For every x ∈ C it holds:

(a) lin(C − x) = cone(C − C), thus, cone(C − C) is the linear subspace parallel
to aff C;

(b)

x ∈ icr C ⇔ ∀y ∈ C ∃δ > 0 such that (1 + δ)x − δy ∈ C


⇔ cone(C − x) = cone(C − C)
⇔ cone(C − x) is a linear subspace.

Proof. (a) We have C − C = C − x + x − C ⊆ lin(C − x), thus C − x ⊆ C − C ⊆ cone(C − C) ⊆ lin(C − x). The conclusion will follow after proving that cone(C − C) is a linear subspace.
Let u ∈ cone(C − C). Then u = λ(x − y) for λ ≥ 0 and x, y ∈ C. Then αu ∈ cone(C − C) for all α ∈ R.
Let u, v ∈ cone(C − C). Then u = λ_u(x_u − y_u) for λ_u ≥ 0 and x_u, y_u ∈ C, and v = λ_v(x_v − y_v) for λ_v ≥ 0 and x_v, y_v ∈ C. Then

u + v = λ_u x_u + λ_v x_v − λ_u y_u − λ_v y_v ∈ (λ_u + λ_v)(C − C) ⊆ cone(C − C).

(b) Let x ∈ icr C. From the definition of the intrinsic core it follows that ∀y ∈ C ∃δ > 0 such that (1 + δ)x − δy ∈ C.
Assuming that this property holds, we have that x − C ⊆ cone(C − x). Consequently, C − C = C − x + x − C ⊆ cone(C − x) + cone(C − x) ⊆ cone(C − x) and, further, cone(C − C) ⊆ cone(C − x). Since the opposite inclusion is always true, it holds cone(C − x) = cone(C − C).
Assuming that cone(C − x) = cone(C − C), according to (a), we have that cone(C − x) is a linear subspace.
Assume now that cone(C − x) is a linear subspace. We will prove that x ∈ icr C. Since C − x ⊆ cone(C − x) ⊆ cone(C − C) ⊆ lin(C − x), we have that cone(C − x) = cone(C − C) = lin(C − x) = aff C − x. Let y ∈ aff C, y ≠ x. Then there exist α, β > 0 such that y − x ∈ α(C − x) and x − y ∈ β(C − x). Let δ > 0 such that −1/β < −δ < δ < 1/α and λ ∈ [−δ, δ]. If λ ∈ [0, δ], then x + λ(y − x) ∈ λαC + (1 − λα)x ⊆ C. If λ ∈ [−δ, 0), then x + λ(y − x) ∈ (−λβ)C + (1 + λβ)x ⊆ C. This proves that x ∈ icr C. □

The following notion will be used in the formulation of regularity conditions for strong duality. We will address it in detail in the next section when X is a finite-dimensional space.

Definition 1.15 (strong quasi-relative interior) Let C ⊆ X be a nonempty convex set. The strong quasi-relative interior of the set C is defined as

sqri C := {x ∈ C | cone(C − x) is a closed linear subspace}.

If C is convex, then it holds

int C ⊆ core C ⊆ sqri C ⊆ icr C ⊆ C,

all inclusions being in general strict.

Remark 1.16 (quasi interior and quasi-relative interior) In the literature one
can find two further notions which generalize the interior of a convex set C ⊆ X.
These are its quasi interior

qi C := {x ∈ C | cl(cone(C − x)) = X}

and its quasi-relative interior

qri C := {x ∈ C | cl(cone(C − x)) is a linear subspace}.

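One has
\[ \operatorname{int} C \subseteq \operatorname{core} C \subseteq \begin{array}{c} \operatorname{sqri} C \subseteq \operatorname{icr} C \\ \operatorname{qi} C \end{array} \subseteq \operatorname{qri} C \subseteq C, \]
all the inclusions being in general strict. If $\operatorname{int} C \neq \emptyset$, then all interior notions in the above scheme become equal to $\operatorname{int} C$.
For $p \in [1, +\infty)$, let $\ell^p$ be the real Banach space of real sequences $(t_n)_{n \in \mathbb{N}}$ such that $\sum_{n=1}^{+\infty} |t_n|^p < +\infty$, equipped with the norm $\|\cdot\| : \ell^p \to \mathbb{R}$, $\|x\| = \left( \sum_{n=1}^{+\infty} |t_n|^p \right)^{1/p}$ for all $x = (t_n)_{n \in \mathbb{N}} \in \ell^p$. For $\ell^p_+ = \{(t_n)_{n \in \mathbb{N}} \in \ell^p : t_n \geq 0\ \forall n \in \mathbb{N}\}$, the positive cone of $\ell^p$, it holds
\[ \operatorname{int}(\ell^p_+) = \operatorname{core}(\ell^p_+) = \operatorname{sqri}(\ell^p_+) = \operatorname{icr}(\ell^p_+) = \emptyset, \]
while
\[ \operatorname{qi}(\ell^p_+) = \operatorname{qri}(\ell^p_+) = \{(t_n)_{n \in \mathbb{N}} \in \ell^p : t_n > 0\ \forall n \in \mathbb{N}\}. \]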

2 The relative interior of a convex set in finite-dimensional spaces

Throughout this section we assume that X = R^n and that it is endowed with the Euclidean topology.

Definition 2.1 (relative interior) Let C ⊆ R^n be a nonempty convex set. The relative interior of the set C is defined as

ri C := {x ∈ C | ∃ε > 0 such that B(x, ε) ∩ aff C ⊆ C}

and it represents the interior of the set C relative to its affine hull. If int C ≠ ∅, then aff C = R^n and ri C = int C.

Example 2.2 For C = [0, 1] × {0} ⊆ R^2 we have int C = ∅, while ri C = (0, 1) × {0}.

Remark 2.3 (a) Let C ⊆ R^n. Since aff C is closed, it holds cl C ⊆ cl(aff C) = aff C. This shows that it makes no sense to consider the closure of the set C relative to its affine hull. We have

aff C ⊆ aff(cl C) ⊆ aff(aff C) = aff C,

thus aff C = aff(cl C).

(b) For C1 ⊆ C2 ⊆ R^n, it holds aff C1 ⊆ aff C2, int C1 ⊆ int C2 and cl C1 ⊆ cl C2, however, not necessarily ri C1 ⊆ ri C2. Take, for instance, C1 = [0, 1] × {0} and C2 = [0, 1] × [0, 1].
If, in addition, aff C1 = aff C2, then, obviously, ri C1 ⊆ ri C2.

Proposition 2.4 Let C ⊆ R^n. The following statements are true:

(a) x ∈ ri C ⇔ x ∈ C and there exists ε > 0 such that B(x, ε) ∩ aff C ⊆ ri C;

(b) if ri C ≠ ∅, then aff(ri C) = aff C = aff(cl C);

(c) ri(ri C) = ri C.

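Proof. (a) “⇐” It follows from the definition of the relative interior.
“⇒” As $x \in \operatorname{ri} C$, it holds $x \in C$ and there exists $\varepsilon > 0$ such that $B(x, \varepsilon) \cap \operatorname{aff} C \subseteq C$. We will prove that $B\left(x, \frac{\varepsilon}{2}\right) \cap \operatorname{aff} C \subseteq \operatorname{ri} C$. Let $y \in B\left(x, \frac{\varepsilon}{2}\right) \cap \operatorname{aff} C$. Then $y \in C$ and, since $B\left(y, \frac{\varepsilon}{2}\right) \subseteq B(x, \varepsilon)$, it holds $B\left(y, \frac{\varepsilon}{2}\right) \cap \operatorname{aff} C \subseteq C$. This proves that $y$ belongs to $\operatorname{ri} C$.
(b) The second equality follows according to Remark 2.3. Let $x \in C$ and $y \in \operatorname{ri} C \subseteq C$. According to (a), there exists $\varepsilon > 0$ such that $B(y, \varepsilon) \cap \operatorname{aff} C \subseteq \operatorname{ri} C$. Since $[y, x] := \{(1 - \lambda) y + \lambda x \mid \lambda \in [0, 1]\} \subseteq \operatorname{aff} C$, it holds $B(y, \varepsilon) \cap [y, x] \subseteq \operatorname{ri} C$. Consequently, there exists $\lambda \in (0, 1)$ such that $z := (1 - \lambda) y + \lambda x \in \operatorname{ri} C$. Therefore $x = \frac{1}{\lambda} z + \left(1 - \frac{1}{\lambda}\right) y \in \operatorname{aff}(\operatorname{ri} C)$, which proves that $C \subseteq \operatorname{aff}(\operatorname{ri} C)$. From here it follows that $\operatorname{aff} C \subseteq \operatorname{aff}(\operatorname{ri} C) \subseteq \operatorname{aff} C$.
(c) The statement is obviously true for $\operatorname{ri} C = \emptyset$. Assume that $\operatorname{ri} C \neq \emptyset$. According to (a) and (b) we have
\begin{align*}
x \in \operatorname{ri} C &\Leftrightarrow x \in C \text{ and } \exists \varepsilon > 0 \text{ such that } B(x, \varepsilon) \cap \operatorname{aff} C \subseteq \operatorname{ri} C \\
&\Leftrightarrow x \in \operatorname{ri} C \text{ and } \exists \varepsilon > 0 \text{ such that } B(x, \varepsilon) \cap \operatorname{aff}(\operatorname{ri} C) \subseteq \operatorname{ri} C \Leftrightarrow x \in \operatorname{ri}(\operatorname{ri} C).
\end{align*}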

The following theorem explains the prominent role the concept of relative
interior plays for convex sets in finite-dimensional spaces.

Theorem 2.5 Let C ⊆ R^n be a nonempty convex set. Then it holds ri C ≠ ∅.

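Proof. Let $m := \dim C$ and $x_0 \in C$. Then $\dim(\operatorname{lin}(C - x_0)) = \dim(\operatorname{aff} C) = m$ and there exist $x_1, \ldots, x_m \in C$ such that $x_1 - x_0, \ldots, x_m - x_0$ form a basis for $\operatorname{lin}(C - x_0)$.
We consider the $m$-dimensional simplex $S := \operatorname{co}\{x_0, x_1, \ldots, x_m\}$. Since $C$ is a convex set, $S \subseteq C$ and $\operatorname{aff} S \subseteq \operatorname{aff} C$. On the other hand, according to Proposition 1.4, for $z \in \operatorname{aff} C$, it holds $z - x_0 \in \operatorname{lin}(C - x_0)$ and, therefore, there exist $\lambda_1, \ldots, \lambda_m \in \mathbb{R}$ such that
\[ z - x_0 = \sum_{i=1}^m \lambda_i (x_i - x_0) \ \Leftrightarrow\ z = \left( 1 - \sum_{i=1}^m \lambda_i \right) x_0 + \sum_{i=1}^m \lambda_i x_i \in \operatorname{aff} S. \]
This shows that $\operatorname{aff} C = \operatorname{aff} S$ and, consequently, $\operatorname{ri} S \subseteq \operatorname{ri} C$.
Next we will prove that $x := \frac{1}{m + 1} \sum_{i=0}^m x_i \in \operatorname{ri} S$, which will provide the desired conclusion.
Obviously, $x \in \operatorname{aff} S$. For every $u \in \operatorname{aff} S - x$ there exist $\mu_i(u)$, $i = 0, \ldots, m$, such that $u = \sum_{i=0}^m \mu_i(u) x_i$ and $\sum_{i=0}^m \mu_i(u) = 0$. Since $x_0, x_1, \ldots, x_m$ are affinely independent, the coefficients $\mu_i(u)$, $i = 0, \ldots, m$, are uniquely defined. The mapping $u \mapsto (\mu_0(u), \mu_1(u), \ldots, \mu_m(u))$ from $\operatorname{aff} S - x$ to $\mathbb{R}^{m+1}$ is linear and, therefore, also continuous. This implies that there exists $\varepsilon > 0$ such that for all $u \in \operatorname{aff} S - x$ with $\|u\| < \varepsilon$ it holds $|\mu_i(u)| \leq \frac{1}{m + 1}$ for all $i = 0, \ldots, m$.
We will prove that
\[ B(x, \varepsilon) \cap \operatorname{aff} S \subseteq S. \tag{2.1} \]
Let $y \in B(x, \varepsilon) \cap \operatorname{aff} S$. Then $y - x \in \operatorname{aff} S - x$ and $\|y - x\| < \varepsilon$. This means that $|\mu_i(y - x)| \leq \frac{1}{m + 1}$ for all $i = 0, \ldots, m$. On the other hand, it holds
\[ y = x + (y - x) = x + \sum_{i=0}^m \mu_i(y - x) x_i = \sum_{i=0}^m \left( \frac{1}{m + 1} + \mu_i(y - x) \right) x_i, \]
$\frac{1}{m + 1} + \mu_i(y - x) \geq 0$ for all $i = 0, \ldots, m$, and $\sum_{i=0}^m \left( \frac{1}{m + 1} + \mu_i(y - x) \right) = 1$, which implies that $y \in \operatorname{co}\{x_0, x_1, \ldots, x_m\} = S$. This proves (2.1), consequently, $x \in \operatorname{ri} S$.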
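The proof locates the barycentre of a simplex in its relative interior by controlling the coefficients 1/(m+1) + µ_i. For a nondegenerate triangle in R^2 these coefficients are the barycentric coordinates, and a point lies in ri S exactly when all three are strictly positive. The Python sketch below computes them by Cramer's rule; the function names `barycentric` and `in_relative_interior` are illustrative assumptions and the code is restricted to triangles in the plane.

```python
from fractions import Fraction


def barycentric(p, a, b, c):
    """Coordinates (l_a, l_b, l_c) with p = l_a*a + l_b*b + l_c*c and sum 1."""
    p, a, b, c = ([Fraction(t) for t in v] for v in (p, a, b, c))
    # Solve p - a = l_b * (b - a) + l_c * (c - a) by Cramer's rule.
    det = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    if det == 0:
        raise ValueError("a, b, c are affinely dependent")
    l_b = ((p[0] - a[0]) * (c[1] - a[1]) - (p[1] - a[1]) * (c[0] - a[0])) / det
    l_c = ((b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])) / det
    return (1 - l_b - l_c, l_b, l_c)


def in_relative_interior(p, a, b, c):
    """True iff p lies in ri co{a, b, c} for a nondegenerate triangle."""
    return all(l > 0 for l in barycentric(p, a, b, c))


a, b, c = (0, 0), (3, 0), (0, 3)
centroid = (1, 1)  # (a + b + c) / 3, the point used in the proof

print(barycentric(centroid, a, b, c))
print(in_relative_interior(centroid, a, b, c))
print(in_relative_interior(a, a, b, c))  # a vertex is not in ri S
```

A point with a zero coordinate lies on the relative boundary and a point with a negative coordinate lies outside S, so the strict inequality is what separates ri S from the rest of aff S.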


We will see in the following that in finite-dimensional spaces the intrinsic core,
the strong quasi-relative interior and the relative interior of a convex set coincide.

Theorem 2.6 Let C ⊆ R^n be a convex set. It holds

icr C = sqri C = ri C.

Proof. The first equality follows from Proposition 1.14 (b), therefore, we will
only prove that icr C = ri C.
Let x ∈ ri C. Then x ∈ C and there exists ε > 0 such that B(x, ε)∩aff C ⊆ C.
Let y ∈ C. Then there exists δ > 0 such that x + δ(x − y) ∈ B(x, ε) and, since
x + δ(x − y) ∈ aff C, it holds x + δ(x − y) ∈ C. According to Proposition 1.14
(b), we have x ∈ icr C.
Let now x ∈ icr C. Then x ∈ C and for every u ∈ aff C − x = lin(C − x)
there exists δ > 0 such that for all λ ∈ [−δ, δ] it holds x + λu ∈ C. Let
m := dim(lin(C −x)) and {e1 , ..., em } a basis of lin(C −x) such that x±ei ∈ C for
all i = 1, ..., m. As in the proof of Theorem 1.11 (b), one can find a neighbourhood
of zero in the topology induced by the norm topology on lin(C − x), which can be
represented as B(0, ε)∩lin(C −x), for ε > 0, such that x+B(0, ε)∩lin(C −x) ⊆ C
or, equivalently, B(x, ε) ∩ aff C ⊆ C. This proves that x ∈ ri C. 

In the following we will discuss some properties of the relative interior of a convex set.

Theorem 2.7 Let C ⊆ R^n be a nonempty convex set. The following statements are true:

(a) for all x ∈ ri C, y ∈ cl C and λ ∈ (0, 1] it holds λx + (1 − λ)y ∈ ri C;

(b) ri C is a convex set;

(c) cl(ri C) = cl C and ri(cl C) = ri C.

Proof. The statements can be proved along the lines of Theorem 1.6 (b)-(d) by taking into account that ri C ≠ ∅ and aff(ri C) = aff C = aff(cl C). □

The following corollary is a direct consequence of Theorem 2.7.

Corollary 2.8 Let C1, C2 ⊆ R^n be given convex sets. The following statements are equivalent:

(i) cl C1 = cl C2;

(ii) ri C1 = ri C2;

(iii) ri C1 ⊆ C2 ⊆ cl C1.

In the following we will give a useful characterization of the relative interior of a convex set.

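Theorem 2.9 Let $C \subseteq \mathbb{R}^n$ be a nonempty convex set. It holds
\[ x \in \operatorname{ri} C \ \Leftrightarrow\ \forall y \in C\ \exists \lambda > 1 \text{ such that } (1 - \lambda) y + \lambda x \in C. \tag{2.2} \]
Proof. “⇒” Let $x \in \operatorname{ri} C$ and $\varepsilon > 0$ such that $B(x, \varepsilon) \cap \operatorname{aff} C \subseteq C$. Let $y \in C$. Then $(1 - \lambda) y + \lambda x \in \operatorname{aff} C$ for all $\lambda \in \mathbb{R}$ and there exists $\bar{\lambda} \in \mathbb{R}$, $\bar{\lambda} > 1$, such that $(1 - \bar{\lambda}) y + \bar{\lambda} x = x + (\bar{\lambda} - 1)(x - y) \in B(x, \varepsilon)$. This implies $(1 - \bar{\lambda}) y + \bar{\lambda} x \in C$.
“⇐” Since $\operatorname{ri} C \neq \emptyset$, we can choose an element $y$ in this set. Consequently, there exists $\lambda > 1$ such that $z := (1 - \lambda) y + \lambda x \in C$. Then $\frac{1}{\lambda} \in (0, 1)$ and, according to Theorem 2.7 (a), $x = \frac{1}{\lambda} z + \left(1 - \frac{1}{\lambda}\right) y \in \operatorname{ri} C$. □
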
Remark 2.10 (a) Theorem 2.9 is actually saying that, for all x ∈ ri C and all
y ∈ C, the segment [x, y] can be extended beyond its endpoint x without leaving
the set C. This means not only that there exists λ > 1 with (1 − λ)y + λx ∈ C,
but also that (1 − µ)y + µx ∈ C holds for all µ ∈ [1, λ].
(b) If C ⊆ R^n is a nonempty convex set, then one also has
x ∈ ri C ⇔ ∀y ∈ cl C ∃λ > 1 such that (1 − λ)y + λx ∈ C
⇔ ∀y ∈ C ∃λ > 1 such that (1 − λ)y + λx ∈ ri C.
Let x ∈ ri C and y ∈ cl C. Let ε > 0 such that B(x, ε) ∩ aff C ⊆ C and
λ̄ ∈ R, λ̄ > 1, such that (1 − λ̄)y + λ̄x = x + (λ̄ − 1)(x − y) ∈ B(x, ε). Since
(1 − λ̄)y + λ̄x ∈ aff(cl C) = aff C, it follows (1 − λ̄)y + λ̄x ∈ C. This proves the
first equivalence.
The second equivalence follows from
x ∈ ri C ⇔ x ∈ ri(ri C)
⇔ ∀y ∈ cl(ri C) = cl C ∃λ > 1 such that (1 − λ)y + λx ∈ ri C.
The next theorem provides formulas for the closure and the relative interior
of the intersection of a family of convex sets.
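Theorem 2.11 Let $(C_i)_{i \in I} \subseteq \mathbb{R}^n$ be a family of convex sets with $\bigcap_{i \in I} \operatorname{ri} C_i \neq \emptyset$. It holds
\[ \operatorname{cl}\left( \bigcap_{i \in I} C_i \right) = \bigcap_{i \in I} \operatorname{cl} C_i. \tag{2.3} \]
In addition, if $I$ is a finite index set, it holds
\[ \operatorname{ri}\left( \bigcap_{i \in I} C_i \right) = \bigcap_{i \in I} \operatorname{ri} C_i. \tag{2.4} \]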
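
Proof. Let $x \in \bigcap_{i \in I} \operatorname{ri} C_i$ and $y \in \bigcap_{i \in I} \operatorname{cl} C_i$. According to Theorem 2.7 (a), for all $i \in I$ and all $\lambda \in (0, 1]$ it holds $\lambda x + (1 - \lambda) y \in \operatorname{ri} C_i \subseteq C_i$. Consequently, for all $\lambda \in (0, 1]$, $y + \lambda(x - y) \in \bigcap_{i \in I} \operatorname{ri} C_i \subseteq \bigcap_{i \in I} C_i$. We let $\lambda$ converge to zero and obtain $y \in \operatorname{cl}\left( \bigcap_{i \in I} \operatorname{ri} C_i \right) \subseteq \operatorname{cl}\left( \bigcap_{i \in I} C_i \right)$. This proves that $\bigcap_{i \in I} \operatorname{cl} C_i \subseteq \operatorname{cl}\left( \bigcap_{i \in I} C_i \right)$. Since the opposite inclusion is always true, relation (2.3) is proved.
We have shown above that
\[ \operatorname{cl}\left( \bigcap_{i \in I} C_i \right) = \bigcap_{i \in I} \operatorname{cl} C_i \subseteq \operatorname{cl}\left( \bigcap_{i \in I} \operatorname{ri} C_i \right) \subseteq \operatorname{cl}\left( \bigcap_{i \in I} C_i \right), \]
consequently, $\operatorname{cl}\left( \bigcap_{i \in I} \operatorname{ri} C_i \right) = \operatorname{cl}\left( \bigcap_{i \in I} C_i \right)$. The sets $\bigcap_{i \in I} \operatorname{ri} C_i$ and $\bigcap_{i \in I} C_i$ are convex, therefore, according to Corollary 2.8, $\operatorname{ri}\left( \bigcap_{i \in I} \operatorname{ri} C_i \right) = \operatorname{ri}\left( \bigcap_{i \in I} C_i \right)$, which implies that $\operatorname{ri}\left( \bigcap_{i \in I} C_i \right) \subseteq \bigcap_{i \in I} \operatorname{ri} C_i$.
We assume now that $I$ is a finite set and choose an arbitrary element $x \in \bigcap_{i \in I} \operatorname{ri} C_i$. Let $y \in \bigcap_{i \in I} C_i$. According to Theorem 2.9, for all $i \in I$, there exists $\lambda_i > 1$ such that $(1 - \lambda_i) y + \lambda_i x \in C_i$. Moreover, as seen in Remark 2.10 (a), for all $i \in I$ and all $\mu_i \in [1, \lambda_i]$ it holds $(1 - \mu_i) y + \mu_i x \in C_i$. We define $\lambda := \min\{\lambda_i : i \in I\} > 1$. For all $i \in I$ it holds $(1 - \lambda) y + \lambda x \in C_i$ or, equivalently, $(1 - \lambda) y + \lambda x \in \bigcap_{i \in I} C_i$. By applying again Theorem 2.9, we obtain $x \in \operatorname{ri}\left( \bigcap_{i \in I} C_i \right)$, which proves that (2.4) holds. □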
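
Example 2.12 (a) The following example shows that the assumption $\bigcap_{i \in I} \operatorname{ri} C_i \neq \emptyset$ cannot be omitted to obtain (2.3) and (2.4). Let $C_1 = ((0, +\infty) \times (0, +\infty)) \cup \{(0, 0)\}$ and $C_2 = \mathbb{R} \times \{0\}$. Then $\operatorname{ri} C_1 = (0, +\infty) \times (0, +\infty)$, $\operatorname{ri} C_2 = C_2$, $\operatorname{ri} C_1 \cap \operatorname{ri} C_2 = \emptyset$, while $\operatorname{ri}(C_1 \cap C_2) = \{(0, 0)\}$.
Moreover, $\operatorname{cl}(C_1 \cap C_2) = \{(0, 0)\}$ and $\operatorname{cl} C_1 \cap \operatorname{cl} C_2 = \mathbb{R}_+ \times \{0\}$.
(b) The following example shows that, even if $\bigcap_{i \in I} \operatorname{ri} C_i \neq \emptyset$, (2.4) might not be fulfilled if $I$ is not finite. Let $I = \mathbb{N}$ and $C_n = \left(0, 1 + \frac{1}{n}\right)$, thus $\operatorname{ri} C_n = \left(0, 1 + \frac{1}{n}\right)$, for all $n \in \mathbb{N}$. It holds $\bigcap_{n \in \mathbb{N}} \operatorname{ri} C_n = (0, 1]$, however, $\operatorname{ri}\left( \bigcap_{n \in \mathbb{N}} C_n \right) = \operatorname{ri}((0, 1]) = (0, 1)$.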

In the following we characterize the relative interior and the closure of the
image and of the preimage of a convex set under an affine operator.

Theorem 2.13 Let C ⊆ R^n be a convex set and T : R^n → R^m an affine operator. It holds T(cl C) ⊆ cl T(C) and T(ri C) = ri T(C).

Proof. The first statement follows by the continuity of T and does not require C to be convex. The second statement is obviously true if C is empty. We assume that C ≠ ∅. We have

T(ri C) ⊆ T(C) ⊆ T(cl C) = T(cl(ri C)) ⊆ cl T(ri C),

which implies that cl T(ri C) = cl T(C). According to Corollary 2.8, ri T(C) = ri T(ri C) ⊆ T(ri C).
In order to prove the opposite inclusion we choose x ∈ T(ri C) and an arbitrary element y in T(C). Then there exist x′ ∈ ri C and y′ ∈ C such that x = T(x′) and y = T(y′). According to Theorem 2.9, there exists λ > 1 such that (1 − λ)y′ + λx′ ∈ C. This implies that (1 − λ)y + λx = T((1 − λ)y′ + λx′) ∈ T(C) and, since y was chosen arbitrarily in T(C), it holds x ∈ ri T(C). □

Example 2.14 For the convex closed set $C = \left\{ (x_1, x_2) \in \mathbb{R}^2 \mid x_1 > 0,\ x_2 \geq \frac{1}{x_1} \right\}$ and the affine operator $T : \mathbb{R}^2 \to \mathbb{R}$, $T(x_1, x_2) = x_1$, one can notice that
\[ T(\operatorname{cl} C) = T(C) = (0, +\infty) \subseteq \operatorname{cl} T(C), \]
and that the inclusion is strict.

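Remark 2.15 For a finite family of sets $C_i \subseteq \mathbb{R}^{n_i}$, $i = 1, \ldots, m$, one has $\operatorname{cl}(C_1 \times \ldots \times C_m) = \operatorname{cl} C_1 \times \ldots \times \operatorname{cl} C_m$ and $\operatorname{aff}(C_1 \times \ldots \times C_m) = \operatorname{aff} C_1 \times \ldots \times \operatorname{aff} C_m$, which, according to the definition of the relative interior, leads to
\[ \operatorname{ri}(C_1 \times \ldots \times C_m) = \operatorname{ri} C_1 \times \ldots \times \operatorname{ri} C_m. \]
Considering the convex sets $C_i \subseteq \mathbb{R}^n$, $i = 1, \ldots, m$, and the affine operator $T : \mathbb{R}^n \times \ldots \times \mathbb{R}^n \to \mathbb{R}^n$, $T(x^1, \ldots, x^m) = \sum_{i=1}^m \alpha_i x^i$, where $\alpha_i \in \mathbb{R}$, $i = 1, \ldots, m$, it holds $T(C_1 \times \ldots \times C_m) = \sum_{i=1}^m \alpha_i C_i$ and, according to Theorem 2.13,
\[ \sum_{i=1}^m \alpha_i \operatorname{cl} C_i \subseteq \operatorname{cl}\left( \sum_{i=1}^m \alpha_i C_i \right) \]
and
\[ \sum_{i=1}^m \alpha_i \operatorname{ri} C_i = \operatorname{ri}\left( \sum_{i=1}^m \alpha_i C_i \right). \]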

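Theorem 2.16 Let $D \subseteq \mathbb{R}^m$ be a convex set and $T : \mathbb{R}^n \to \mathbb{R}^m$ an affine operator such that $T^{-1}(\operatorname{ri} D) \neq \emptyset$. It holds $T^{-1}(\operatorname{cl} D) = \operatorname{cl} T^{-1}(D)$ and $T^{-1}(\operatorname{ri} D) = \operatorname{ri} T^{-1}(D)$.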

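Proof. Let $U := \mathbb{R}^n \times D \subseteq \mathbb{R}^n \times \mathbb{R}^m$ and $V := \{(x, T(x)) : x \in \mathbb{R}^n\} \subseteq \mathbb{R}^n \times \mathbb{R}^m$. $V$ is an affine set, therefore, $\operatorname{ri} V = V = \operatorname{cl} V$. Since $T^{-1}(\operatorname{ri} D) \neq \emptyset$, there exists $x \in \mathbb{R}^n$ such that $T(x) \in \operatorname{ri} D$. This implies that $\operatorname{ri} U \cap \operatorname{ri} V = (\mathbb{R}^n \times \operatorname{ri} D) \cap V \neq \emptyset$. According to Theorem 2.11 it holds
\[ \operatorname{cl}(U \cap V) = \operatorname{cl} U \cap \operatorname{cl} V = \operatorname{cl} U \cap V \]
and
\[ \operatorname{ri}(U \cap V) = \operatorname{ri} U \cap \operatorname{ri} V = \operatorname{ri} U \cap V. \]
Let $\operatorname{Pr}_{\mathbb{R}^n} : \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}^n$, $\operatorname{Pr}_{\mathbb{R}^n}(x, y) = x$, be the projection operator of $\mathbb{R}^n \times \mathbb{R}^m$ onto $\mathbb{R}^n$. $\operatorname{Pr}_{\mathbb{R}^n}$ is a linear operator which fulfils $\operatorname{Pr}_{\mathbb{R}^n}(U \cap V) = \{x \in \mathbb{R}^n : T(x) \in D\}$. According to Theorem 2.13 it holds
\[ T^{-1}(\operatorname{ri} D) = \operatorname{Pr}_{\mathbb{R}^n}(\operatorname{ri} U \cap V) = \operatorname{Pr}_{\mathbb{R}^n}(\operatorname{ri}(U \cap V)) = \operatorname{ri} \operatorname{Pr}_{\mathbb{R}^n}(U \cap V) = \operatorname{ri} T^{-1}(D) \]
and
\[ T^{-1}(\operatorname{cl} D) = \operatorname{Pr}_{\mathbb{R}^n}(\operatorname{cl} U \cap V) = \operatorname{Pr}_{\mathbb{R}^n}(\operatorname{cl}(U \cap V)) \subseteq \operatorname{cl} \operatorname{Pr}_{\mathbb{R}^n}(U \cap V) = \operatorname{cl} T^{-1}(D). \]
The inclusion $\operatorname{cl} T^{-1}(D) \subseteq T^{-1}(\operatorname{cl} D)$ follows from the continuity of $T$ and it is fulfilled for arbitrary sets $D \subseteq \mathbb{R}^m$. □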

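Example 2.17 Let $T : \mathbb{R} \to \mathbb{R}$, $T(x) = 0$ for all $x \in \mathbb{R}$. It holds
$\operatorname{ri}(0, 1] = \operatorname{ri}[0, 1) = (0, 1)$ and $T^{-1}(\operatorname{ri}(0, 1]) = T^{-1}(\operatorname{ri}[0, 1)) = \emptyset$. Moreover,
$\operatorname{cl} T^{-1}((0, 1]) = \emptyset \neq \mathbb{R} = T^{-1}(\operatorname{cl}(0, 1])$ and $T^{-1}(\operatorname{ri}[0, 1)) = \emptyset \neq \mathbb{R} = \operatorname{ri} T^{-1}([0, 1))$.
This shows that the condition $T^{-1}(\operatorname{ri} D) \neq \emptyset$ in Theorem 2.16 cannot be omitted.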

The following theorem is another important consequence of Theorem 2.13.

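Theorem 2.18 Let $E \subseteq \mathbb{R}^n \times \mathbb{R}^m$ be a convex set and define
$E(x) := \{y \in \mathbb{R}^m : (x, y) \in E\}$ for all $x \in \mathbb{R}^n$. Further, let $C := \{x \in \mathbb{R}^n : E(x) \neq \emptyset\}$. It holds

\[ (x, y) \in \operatorname{ri} E \iff x \in \operatorname{ri} C \text{ and } y \in \operatorname{ri} E(x). \]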

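Proof. Let $\operatorname{Pr}_{\mathbb{R}^n} : \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}^n$, $\operatorname{Pr}_{\mathbb{R}^n}(x, y) = x$, and
$\operatorname{Pr}_{\mathbb{R}^m} : \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}^m$, $\operatorname{Pr}_{\mathbb{R}^m}(x, y) = y$, be the projection operators of
$\mathbb{R}^n \times \mathbb{R}^m$ onto $\mathbb{R}^n$ and $\mathbb{R}^m$, respectively. Obviously,

\[ (x, y) \in \operatorname{ri} E \iff x \in \operatorname{Pr}_{\mathbb{R}^n}(\operatorname{ri} E) \text{ and } y \in \operatorname{Pr}_{\mathbb{R}^m}((\{x\} \times \mathbb{R}^m) \cap \operatorname{ri} E). \]

According to Theorem 2.13, we have $\operatorname{Pr}_{\mathbb{R}^n}(\operatorname{ri} E) = \operatorname{ri} \operatorname{Pr}_{\mathbb{R}^n}(E) = \operatorname{ri} C$. On the
other hand, Theorem 2.11 and Theorem 2.13 give

\[ \operatorname{Pr}_{\mathbb{R}^m}((\{x\} \times \mathbb{R}^m) \cap \operatorname{ri} E) = \operatorname{Pr}_{\mathbb{R}^m}(\operatorname{ri}((\{x\} \times \mathbb{R}^m) \cap E)) = \operatorname{ri} \operatorname{Pr}_{\mathbb{R}^m}((\{x\} \times \mathbb{R}^m) \cap E) = \operatorname{ri} E(x). \]
□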



In the last part of this section we will formulate some separation theorems
in finite-dimensional spaces which refine the classical Hahn-Banach Separation
Theorems in real normed spaces, that we recall below.
Throughout this lecture $X^*$ will denote the dual space of X. We will use the
notation $\langle x^*, x \rangle := x^*(x)$ for the evaluation of $x^* \in X^*$ at $x \in X$.

Theorem 2.19 (Hahn-Banach Separation Theorem; see [4, Theorem 7.6]) Let
X be a nontrivial real normed space and C, D ⊆ X two convex sets such that C
is open and C ∩ D = ∅. Then there exists $x^* \in X^*$, $x^* \neq 0$, such that

\[ \langle x^*, c \rangle > \langle x^*, d \rangle \quad \forall c \in C \ \forall d \in D. \]

Theorem 2.20 (Hahn-Banach Strong Separation Theorem; see [4, Theorem
7.7]) Let X be a nontrivial real normed space and C, D ⊆ X two convex sets such
that C is compact, D is closed and C ∩ D = ∅. Then there exists $x^* \in X^*$, $x^* \neq 0$,
such that

\[ \inf_{c \in C} \langle x^*, c \rangle > \sup_{d \in D} \langle x^*, d \rangle. \]

We start with the following strong separation result which makes the choice
of the separating hyperplane more precise.
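Theorem 2.21 Let $C \subseteq \mathbb{R}^n$ be a nonempty convex set and $d \notin \operatorname{cl} C$. Then
there exists $a \in \mathbb{R}^n$, $a \neq 0$, such that

\[ \inf\{a^T x \mid x \in C\} = \inf\{a^T x \mid x \in \operatorname{cl} C\} > a^T d. \]

Moreover, one can choose a in the set $\operatorname{cone}(\operatorname{cl} C - d)$ such that $\|a\| = 1$.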
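Proof. The set cl C is convex and closed. Let

\[ f := \operatorname*{argmin}_{e \in \operatorname{cl} C} \|d - e\| \]

be the projection of d on cl C (see, for instance, [4, Theorem 16.2]). Since $d \notin \operatorname{cl} C$,
it holds $\delta := \|d - f\|^2 > 0$.
For every $x \in \operatorname{cl} C$ it holds (see, for instance, [4, Lemma 16.3])

\begin{align*}
(f - d)^T (x - f) \geq 0 &\iff (f - d)^T x \geq (f - d)^T f \\
&\iff (f - d)^T x \geq (f - d)^T (f - d) + (f - d)^T d = \|f - d\|^2 + (f - d)^T d \\
&\iff (f - d)^T x \geq \delta + (f - d)^T d.
\end{align*}

Therefore, for $b := f - d \neq 0$, it holds

\[ \inf\{b^T x : x \in \operatorname{cl} C\} \geq \delta + b^T d > b^T d. \tag{2.5} \]

Let $a := \frac{1}{\|b\|} b \in \mathbb{R}^n$. Then $\|a\| = 1$, $a = \frac{1}{\|b\|}(f - d) \in \frac{1}{\|b\|}(\operatorname{cl} C - d) \subseteq \operatorname{cone}(\operatorname{cl} C - d)$
and

\[ \inf\{a^T x \mid x \in C\} = \inf\{a^T x \mid x \in \operatorname{cl} C\} > a^T d. \]
□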
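The construction in the proof of Theorem 2.21 can be traced numerically. The sketch below is an illustration under stated assumptions, not part of the notes: it takes C to be a box in R^n (so that the projection is a coordinatewise clamp and the infimum of a linear function is available in closed form), and the helper names are hypothetical.

```python
import math

def project_onto_box(d, lo, hi):
    """Euclidean projection of d onto the box [lo_1, hi_1] x ... x [lo_n, hi_n]."""
    return [min(max(di, l), h) for di, l, h in zip(d, lo, hi)]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def separating_direction(d, lo, hi):
    """Return a = (f - d)/||f - d||, where f is the projection of d onto the box."""
    f = project_onto_box(d, lo, hi)
    b = [fi - di for fi, di in zip(f, d)]
    norm_b = math.sqrt(dot(b, b))
    if norm_b == 0.0:
        raise ValueError("d belongs to the box, so it cannot be strongly separated")
    return [bi / norm_b for bi in b]

def inf_over_box(a, lo, hi):
    """inf{a^T x : x in box}; a linear function is minimized coordinatewise."""
    return sum(min(ai * l, ai * h) for ai, l, h in zip(a, lo, hi))

lo, hi = [0.0, 0.0], [1.0, 1.0]
d = [3.0, 0.5]                               # a point outside the box
a = separating_direction(d, lo, hi)          # unit vector
gap = inf_over_box(a, lo, hi) - dot(a, d)    # inf a^T x - a^T d, strictly positive
```

Here the projection of d is f = (1, 0.5), so b = f - d = (-2, 0) and the gap equals ||f - d|| = 2, in agreement with (2.5) after normalization.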


Theorem 2.22 and its consequence, Theorem 2.23, which we will formulate
below, extend the separation result in Theorem 2.19 to convex sets for which we
do not necessarily assume to have nonempty interiors.
Theorem 2.22 Let $C \subseteq \mathbb{R}^n$ be a nonempty convex set and $d \notin \operatorname{ri} C$. Then there
exists $a \in \mathbb{R}^n$, $a \neq 0$, such that

\[ \inf\{a^T x \mid x \in C\} \geq a^T d \]

and

\[ \sup\{a^T x \mid x \in C\} > a^T d. \]
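Proof. If $d \notin \operatorname{cl} C$, then the statement follows from Theorem 2.21. Assume that
$d \in \operatorname{cl} C \setminus \operatorname{ri} C$. According to Theorem 2.7 it holds $d \notin \operatorname{ri}(\operatorname{cl} C)$. This means that
for all $k \in \mathbb{N}$ we must have

\[ B\left(d, \tfrac{1}{k}\right) \cap \operatorname{aff}(\operatorname{cl} C) \nsubseteq \operatorname{cl} C. \]

Therefore, for all $k \in \mathbb{N}$ there exists an element $d^k \in B\left(d, \tfrac{1}{k}\right) \cap \operatorname{aff}(\operatorname{cl} C) = B\left(d, \tfrac{1}{k}\right) \cap \operatorname{aff} C$
such that $d^k \notin \operatorname{cl} C$. According to Theorem 2.21, for all $k \in \mathbb{N}$
there exists $a^k \in \mathbb{R}^n$ fulfilling $\|a^k\| = 1$, $a^k \in \operatorname{cone}(\operatorname{cl} C - d^k)$ and

\[ (a^k)^T x > (a^k)^T d^k \quad \forall x \in \operatorname{cl} C. \tag{2.6} \]

Since $d^k \in \operatorname{aff}(\operatorname{cl} C) = \operatorname{aff} C$, it holds, according to Proposition 1.4,

\[ \operatorname{cl} C - d^k \subseteq \operatorname{aff} C - d^k = \operatorname{lin}(C - d^k) =: U(\operatorname{aff} C) \quad \forall k \in \mathbb{N}, \]

where $U(\operatorname{aff} C)$ denotes the linear subspace parallel to aff C. Consequently,
$a^k \in \operatorname{cone}(\operatorname{cl} C - d^k) \subseteq U(\operatorname{aff} C)$ for all $k \in \mathbb{N}$.
On the other hand, as $\|a^k\| = 1$ for all $k \in \mathbb{N}$, there exist a subsequence
$(a^{k_j})_{j \in \mathbb{N}} \subseteq \mathbb{R}^n$ and an element $a \in \mathbb{R}^n$ such that $\lim_{j \to +\infty} a^{k_j} = a$. Therefore,
$a \in U(\operatorname{aff} C)$ and $\|a\| = 1$. From (2.6) and $d^{k_j} \to d$ it follows that

\[ a^T x = \lim_{j \to +\infty} (a^{k_j})^T x \geq \lim_{j \to +\infty} (a^{k_j})^T d^{k_j} = a^T d \quad \forall x \in \operatorname{cl} C, \]

consequently, $\inf\{a^T x \mid x \in \operatorname{cl} C\} = \inf\{a^T x \mid x \in C\} \geq a^T d$.

It remains to show that there exists $z \in C$ such that $a^T z > a^T d$. To this end
we assume the contrary, namely that

\[ a^T x = a^T d \quad \forall x \in C. \tag{2.7} \]

Let $u \in U(\operatorname{aff} C)$ be arbitrarily chosen. Since $d \in \operatorname{aff} C$, we have $u + d \in \operatorname{aff} C$,
consequently, there exist $m \geq 1$, $x_i \in C$, $\lambda_i \in \mathbb{R}$, $i = 1, \dots, m$, such that $\sum_{i=1}^m \lambda_i = 1$
and $u + d = \sum_{i=1}^m \lambda_i x_i$. According to (2.7) it holds

\[ a^T u = a^T \left( \sum_{i=1}^m \lambda_i x_i - d \right) = \sum_{i=1}^m \lambda_i a^T (x_i - d) = 0. \]

Since a is an element of the linear subspace $U(\operatorname{aff} C)$, we have $\|a\|^2 = a^T a = 0$,
which is a contradiction to $\|a\| = 1$. Consequently, $\sup\{a^T x \mid x \in C\} > a^T d$. □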

Theorem 2.23 (Proper Separation Theorem) Let $C_1, C_2 \subseteq \mathbb{R}^n$ be two nonempty
convex sets. Then $\operatorname{ri} C_1 \cap \operatorname{ri} C_2 = \emptyset$ if and only if $C_1$ and $C_2$ are properly separable,
which means that there exists $a \in \mathbb{R}^n$, $a \neq 0$, such that

\[ \inf\{a^T c \mid c \in C_1\} \geq \sup\{a^T d \mid d \in C_2\} \]

and

\[ \sup\{a^T c \mid c \in C_1\} > \inf\{a^T d \mid d \in C_2\}. \]

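Proof. "⇒" The set $C_1 - C_2$ is convex and, since $\operatorname{ri}(C_1 - C_2) = \operatorname{ri} C_1 - \operatorname{ri} C_2$,
it holds $0 \notin \operatorname{ri}(C_1 - C_2)$. According to Theorem 2.22 there exists $a \in \mathbb{R}^n$, $a \neq 0$,
such that

\[ \inf\{a^T (c - d) \mid c \in C_1, d \in C_2\} \geq 0 \iff \inf\{a^T c \mid c \in C_1\} \geq \sup\{a^T d \mid d \in C_2\} \]

and

\[ \sup\{a^T (c - d) \mid c \in C_1, d \in C_2\} > 0 \iff \sup\{a^T c \mid c \in C_1\} > \inf\{a^T d \mid d \in C_2\}. \]

"⇐" Let $C := C_1 - C_2$. It holds

\[ \inf\{a^T x \mid x \in C\} \geq 0 \quad \text{and} \quad \sup\{a^T x \mid x \in C\} > 0. \]

For $Z := \{x \in \mathbb{R}^n \mid a^T x \geq 0\}$, we have $C \subseteq Z$.
Assuming that $a^T x = 0$ for all $x \in \operatorname{ri} C$, we would have that $a^T x = 0$ for all
$x \in \operatorname{cl}(\operatorname{ri} C) = \operatorname{cl} C$, which would contradict $\sup\{a^T x : x \in C\} > 0$. Consequently,
the set $\operatorname{ri} C \cap \operatorname{ri} Z = \operatorname{ri} C \cap \{x \in \mathbb{R}^n : a^T x > 0\}$ is nonempty and, according to
Theorem 2.11, $\operatorname{ri} C \cap \operatorname{ri} Z = \operatorname{ri}(C \cap Z) = \operatorname{ri} C$, which implies that $\operatorname{ri} C \subseteq \operatorname{ri} Z$.
Assuming that $\operatorname{ri} C_1 \cap \operatorname{ri} C_2 \neq \emptyset$, which is equivalent to $0 \in \operatorname{ri} C$, one would have
$0 \in \operatorname{ri} Z = \operatorname{int} Z = \{x \in \mathbb{R}^n : a^T x > 0\}$. Contradiction! □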
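A simple instance of proper separation, given here only as an illustration (the two segments and the vector a are chosen for this sketch and do not come from the notes): C1 = [0,1] × {0} and C2 = [1,2] × {0} share the point (1, 0), but their relative interiors are disjoint, and a = (-1, 0) separates them properly.

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def extremes_over_segment(a, p, q):
    """(inf, sup) of x -> a^T x over the segment [p, q]; both are attained at endpoints."""
    values = (dot(a, p), dot(a, q))
    return min(values), max(values)

# C1 = [0,1] x {0} and C2 = [1,2] x {0} intersect in (1, 0),
# but ri C1 = (0,1) x {0} and ri C2 = (1,2) x {0} are disjoint.
a = (-1.0, 0.0)
inf1, sup1 = extremes_over_segment(a, (0.0, 0.0), (1.0, 0.0))
inf2, sup2 = extremes_over_segment(a, (1.0, 0.0), (2.0, 0.0))

# The two conditions of Theorem 2.23.
properly_separated = inf1 >= sup2 and sup1 > inf2
```

Note that strong separation in the sense of Theorem 2.20 is impossible here, since the sets intersect; only the weaker, proper form holds.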
Chapter II

Convex functions

In this chapter we will discuss algebraic and topological properties of convex


functions defined on a nontrivial real normed space X and taking values in the
extended real line R̄ := R ∪ {±∞}. The addition and the multiplication are
extended in a natural way from R to R̄. Additionally, we make the following
conventions:
(+∞) + (−∞) = (−∞) + (+∞) = +∞,
0(+∞) = (+∞)0 = +∞, 0(−∞) = (−∞)0 = 0.
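These conventions differ from IEEE floating-point arithmetic, where `inf - inf` and `0 * inf` evaluate to NaN. A minimal Python sketch of the extended operations follows; the function names `ext_add` and `ext_mul` are assumptions made for this illustration.

```python
import math

INF = math.inf

def ext_add(a, b):
    """Addition on the extended real line with (+inf) + (-inf) = +inf."""
    if (a == INF and b == -INF) or (a == -INF and b == INF):
        return INF
    return a + b

def ext_mul(a, b):
    """Multiplication with 0 * (+inf) = +inf and 0 * (-inf) = 0."""
    if a == 0 or b == 0:
        other = b if a == 0 else a
        if other == INF:
            return INF
        return 0.0
    return a * b
```

With these conventions the right-hand side of the convexity inequality is always well defined, and it equals +∞ whenever one of its terms is +∞ with a positive coefficient.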

3 Algebraic properties of convex functions


Definition 3.1 (convex function) A function f : X → R is called convex if for
all x, y ∈ X and all λ ∈ [0, 1] it holds
f (λx + (1 − λ)y) ≤ λf (x) + (1 − λ)f (y). (3.1)
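Remark 3.2 (a) Given a nonempty convex set $C \subseteq X$ and $f : C \to \overline{\mathbb{R}}$, we say
that f is convex on C if

\[ f(\lambda x + (1 - \lambda) y) \leq \lambda f(x) + (1 - \lambda) f(y) \quad \forall x, y \in C \ \forall \lambda \in [0, 1]. \]

Then C is convex and f is convex on C if and only if the function

\[ \tilde{f} : X \to \overline{\mathbb{R}}, \quad \tilde{f}(x) = \begin{cases} f(x), & \text{if } x \in C, \\ +\infty, & \text{otherwise,} \end{cases} \]

is convex (in the sense of Definition 3.1).
(b) It can be proved by induction that a function $f : X \to \overline{\mathbb{R}}$ is convex if and
only if for all $k \in \mathbb{N}$ and all $x_i \in X$ and $\lambda_i \geq 0$, $i = 1, \dots, k$, with $\sum_{i=1}^k \lambda_i = 1$ it
holds

\[ f\left( \sum_{i=1}^k \lambda_i x_i \right) \leq \sum_{i=1}^k \lambda_i f(x_i). \]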
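The inequality in Remark 3.2 (b) can be evaluated exactly for a concrete convex function. The sketch below uses f(x) = x² on R and rational arithmetic; the points and weights are arbitrary choices made for this illustration.

```python
from fractions import Fraction

def f(x):
    """A convex function on R."""
    return x * x

def jensen_gap(points, weights):
    """Return sum_i w_i f(x_i) - f(sum_i w_i x_i) for convex weights w."""
    assert all(w >= 0 for w in weights) and sum(weights) == 1
    mean = sum(w * x for w, x in zip(weights, points))
    return sum(w * f(x) for w, x in zip(weights, points)) - f(mean)

points = [Fraction(-2), Fraction(0), Fraction(3)]
weights = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
gap = jensen_gap(points, weights)   # nonnegative for a convex f
```

Here the convex combination is -1/2, so the gap equals 7/2 - 1/4 = 13/4.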


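Example 3.3 (a) The indicator function of a set $C \subseteq X$ is defined as

\[ \delta_C : X \to \overline{\mathbb{R}}, \quad \delta_C(x) = \begin{cases} 0, & \text{if } x \in C, \\ +\infty, & \text{otherwise.} \end{cases} \]

Then C is a convex set if and only if $\delta_C$ is a convex function.
(b) The function $x \mapsto \|x\|$ is convex.
(c) Let $(H, \langle \cdot, \cdot \rangle)$ be a real Hilbert space and $A : H \to H$ a positive (or
positive semidefinite) operator, that is, $\langle x, Ax \rangle \geq 0$ for all $x \in H$. Then $x \mapsto \frac{1}{2} \langle x, Ax \rangle$
is a convex function.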

The following sets play a central role in convex analysis.

Definition 3.4 For a given function f : X → R we denote by


(a) dom f = {x ∈ X | f (x) < +∞} its effective domain;

(b) epi f = {(x, r) ∈ X × R | f (x) ≤ r} its epigraph;

(c) epiS f = {(x, r) ∈ X × R | f (x) < r} its strict epigraph.

The following proposition relates the convexity of a function to the convexity


of its epigraph.

Proposition 3.5 Let f : X → R be a given function. The following statements


are equivalent:
(a) f is convex;

(b) epi f is a convex set;

(c) epiS f is a convex set.

Proof. “(a) ⇒ (c)” Let (x, r), (y, s) ∈ epiS f and λ ∈ [0, 1]. Then f (λx + (1 −
λ)y) ≤ λf (x) + (1 − λ)f (y) < λr + (1 − λ)s, thus

λ(x, r) + (1 − λ)(y, s) = (λx + (1 − λ)y, λr + (1 − λ)s) ∈ epiS f,

which proves that epiS f is a convex set.


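“(c) ⇒ (b)” Let $(x, r), (y, s) \in \operatorname{epi} f$ and $\lambda \in [0, 1]$. Then, for all $k \in \mathbb{N}$,
$(x, r + \frac{1}{k}), (y, s + \frac{1}{k}) \in \operatorname{epi}_S f$, and, according to (c),
$(\lambda x + (1 - \lambda) y, \lambda r + (1 - \lambda) s + \frac{1}{k}) \in \operatorname{epi}_S f$. In other words,

\[ f(\lambda x + (1 - \lambda) y) < \lambda r + (1 - \lambda) s + \frac{1}{k} \quad \forall k \in \mathbb{N}, \]

which implies $f(\lambda x + (1 - \lambda) y) \leq \lambda r + (1 - \lambda) s$ or, equivalently,
$\lambda (x, r) + (1 - \lambda)(y, s) \in \operatorname{epi} f$.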

“(b) ⇒ (a)” Let x, y ∈ X and λ ∈ [0, 1]. If f (x) = +∞ or f (y) = +∞, then
(3.1) is obviously fulfilled. Assume that f (x) < +∞ and f (y) < +∞ and choose
arbitrary r, s ∈ R such that f (x) < r and f (y) < s. Then (x, r), (y, s) ∈ epi f
and, according to (b), λ(x, r)+(1−λ)(y, s) = (λx+(1−λ)y, λr +(1−λ)s) ∈ epi f
or, equivalently, f (λx + (1 − λ)y) ≤ λr + (1 − λ)s. We let r and s converge to
f (x) and f (y), respectively, and obtain, also in this case, (3.1). □

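Remark 3.6 (a) If $f : X \to \overline{\mathbb{R}}$ is a convex function, then, since
$\operatorname{Pr}_X : X \times \mathbb{R} \to X$, $\operatorname{Pr}_X(x, r) = x$, is a linear operator, $\operatorname{dom} f = \operatorname{Pr}_X(\operatorname{epi} f)$ is a convex set.
(b) If $f : X \to \overline{\mathbb{R}}$ is a convex function, then all its lower level sets

\[ \{x \in X \mid f(x) \leq \lambda\}, \quad \text{where } \lambda \in \mathbb{R}, \]

are convex. The opposite statement is not true, as it can be seen for
$f : \mathbb{R} \to \mathbb{R}$, $f(x) = x^3$.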
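The counterexample in Remark 3.6 (b) can be verified directly. In the sketch below (helper names are chosen for this illustration), every lower level set of f(x) = x³ is a half-line (-∞, λ^{1/3}], hence convex, while inequality (3.1) fails for x = -2, y = 0 and λ = 1/2.

```python
def f(x):
    return x ** 3

def sublevel_endpoint(lam):
    """Right endpoint of {x : x^3 <= lam} = (-inf, lam^(1/3)], a convex set."""
    return abs(lam) ** (1.0 / 3.0) * (1 if lam >= 0 else -1)

# Inequality (3.1) fails for x = -2, y = 0, lambda = 1/2.
x, y, lam = -2.0, 0.0, 0.5
lhs = f(lam * x + (1 - lam) * y)       # f(-1) = -1
rhs = lam * f(x) + (1 - lam) * f(y)    # 0.5 * (-8) + 0.5 * 0 = -4
violates_convexity = lhs > rhs
```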

The next proposition provides a characterization of convex functions which


take the value −∞ at a point of the intrinsic core of their effective domain.

Proposition 3.7 Let f : X → R be a convex function and x ∈ X such that


f (x) = −∞. Then f (z) = −∞ for all z ∈ icr(dom f ).

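Proof. Let $z \in \operatorname{icr}(\operatorname{dom} f)$. Since dom f is a convex set and $x \in \operatorname{dom} f$, according
to Proposition 1.14 (b), there exists $\delta > 0$ such that $y := (1 + \delta) z - \delta x \in \operatorname{dom} f$.
It holds $z = \frac{1}{1 + \delta} y + \frac{\delta}{1 + \delta} x$, consequently,

\[ f(z) = f\left( \frac{1}{1 + \delta} y + \frac{\delta}{1 + \delta} x \right) \leq \frac{1}{1 + \delta} f(y) + \frac{\delta}{1 + \delta} f(x) = -\infty, \]

where the last equality uses $f(y) < +\infty$ and $f(x) = -\infty$. □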


The following notion will allow us to avoid some pathological classes of ex-
tended real-valued functions.

Definition 3.8 (proper function) A function f : X → R̄ is said to be proper if
dom f ≠ ∅ and f (x) > −∞ for all x ∈ X.

Sublinear functions form an important subclass of the class of convex functions.

Definition 3.9 (sublinear function) A function f : X → R is called sublinear if


it is
(a) subadditive, namely, f (x + y) ≤ f (x) + f (y) for all x, y ∈ X;

(b) positively homogeneous, namely, f (0) = 0 and f (λx) = λf (x) for all λ > 0
and all x ∈ X.

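Example 3.10 (a) The support function of a set $C \subseteq X$, defined as

\[ \sigma_C : X^* \to \overline{\mathbb{R}}, \quad \sigma_C(x^*) = \sup_{x \in C} \langle x^*, x \rangle, \]

is sublinear.
(b) A convex function is sublinear if and only if it is positively homogeneous.
(c) Let $C \subseteq X$ be a nonempty convex set. The Minkowski functional of C

\[ p_C : X \to \overline{\mathbb{R}}, \quad p_C(x) = \inf\{\lambda \geq 0 \mid x \in \lambda C\}, \]

is sublinear. It holds $p_{B(0,1)} = p_{\overline{B}(0,1)} = \| \cdot \|$, where
$\overline{B}(0, 1) := \operatorname{cl} B(0, 1) = \{x \in X \mid \|x\| \leq 1\}$.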
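For a concrete feeling for Example 3.10 (a), the support function of a box in R^n has a closed form. The sketch below assumes X = R^n with the dual identified with R^n via the Euclidean inner product; the box and the test vectors are arbitrary choices for this illustration.

```python
def support_box(a, lo, hi):
    """sigma_C(a) = sup{a^T x : x in C} for the box C = [lo_1, hi_1] x ... x [lo_n, hi_n]."""
    return sum(max(ai * l, ai * h) for ai, l, h in zip(a, lo, hi))

lo, hi = [-1, 0], [2, 3]
a, b = [1, -2], [-3, 1]
a_plus_b = [ai + bi for ai, bi in zip(a, b)]
scaled_a = [5 * ai for ai in a]

# Subadditivity and positive homogeneity, the two parts of Definition 3.9.
subadditive = support_box(a_plus_b, lo, hi) <= support_box(a, lo, hi) + support_box(b, lo, hi)
homogeneous = support_box(scaled_a, lo, hi) == 5 * support_box(a, lo, hi)
```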

In the following we will present some operations that allow us to construct


convex functions from others.
We will start by noticing that, if f : X → R is a convex function and λ ≥ 0,
then λf is also convex. If f, g : X → R are two convex functions, then f + g is
also convex.

Proposition 3.11 Let fi : X → R, i ∈ I, be a family of convex functions. Then


its pointwise supremum f : X → R, f (x) = supi∈I fi (x), is also a convex function.

Proof. The result follows from epi f = ∩i∈I epi fi and Proposition 3.5. 

In the following we will assume that Y is another nontrivial real normed space.

Proposition 3.12 (infimal value function) Let Φ : X × Y → R be a convex


function. Then the infimal value function of Φ

\[ h : Y \to \overline{\mathbb{R}}, \quad h(y) = \inf_{x \in X} \Phi(x, y), \]

is convex.

Proof. Let PrY ×R : X × Y × R → Y × R, PrY ×R (x, y, r) = (y, r). One has

(y, s) ∈ epiS h ⇔ h(y) < s ⇔ ∃x ∈ X such that Φ(x, y) < s


⇔ ∃x ∈ X such that (x, y, s) ∈ epiS Φ ⇔ (y, s) ∈ PrY ×R (epiS Φ),

which shows that epiS h = PrY ×R (epiS Φ). The convexity of h follows via Propo-
sition 3.5. 

If f : Y → R is a convex function and T : X → Y is an affine operator, then,


obviously f ◦ T : X → R is also convex.

Proposition 3.13 (infimal function through an operator) Let f : X → R be a


convex function and T : X → Y an affine operator. The infimal function of f
through T
T f : Y → R, (T f )(y) = inf{f (x) | T x = y},
is convex and it fulfills dom T f = T (dom f ).
Proof. The graph of the operator T , gr T := {(x, y) | T x = y}, is a convex set,
thus Φ : X × Y → R, Φ(x, y) = f (x) + δgr T (x, y) is a convex function. By Propo-
sition 3.12 it follows that (T f )(y) = inf x∈X Φ(x, y) is also convex. Moreover,
y ∈ dom T f ⇔ (T f )(y) < +∞ ⇔ ∃x ∈ X with T x = y and f (x) < +∞
⇔ ∃x ∈ dom f with T x = y ⇔ y ∈ T (dom f ).

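Proposition 3.14 (infimal convolution) Let $f_i : X \to \overline{\mathbb{R}}$, $i = 1, \dots, m$, be convex
functions. The infimal convolution of the functions $f_1, \dots, f_m$

\[ f_1 \Box \dots \Box f_m : X \to \overline{\mathbb{R}}, \quad (f_1 \Box \dots \Box f_m)(x) = \inf\left\{ \sum_{i=1}^m f_i(x_i) \ \Big|\ \sum_{i=1}^m x_i = x \right\}, \]

is convex and it fulfills $\operatorname{dom}(f_1 \Box \dots \Box f_m) = \sum_{i=1}^m \operatorname{dom} f_i$.

Proof. Let $f : X \times \dots \times X \to \overline{\mathbb{R}}$, $f(x_1, \dots, x_m) = \sum_{i=1}^m f_i(x_i)$, and
$A : X \times \dots \times X \to X$, $A(x_1, \dots, x_m) = \sum_{i=1}^m x_i$. Then f is a convex function and A is a
linear operator. According to the previous proposition, the infimal function of f
through A, defined for all $x \in X$ by

\[ (Af)(x) = \inf\{ f(x_1, \dots, x_m) \mid A(x_1, \dots, x_m) = x \} = \inf\left\{ \sum_{i=1}^m f_i(x_i) \ \Big|\ \sum_{i=1}^m x_i = x \right\} = (f_1 \Box \dots \Box f_m)(x), \]

is also a convex function, and

\[ \operatorname{dom}(f_1 \Box \dots \Box f_m) = \operatorname{dom} Af = A(\operatorname{dom} f) = A(\operatorname{dom} f_1 \times \dots \times \operatorname{dom} f_m) = \sum_{i=1}^m \operatorname{dom} f_i. \]
□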
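As an illustration of the definition (not taken from the notes), the infimal convolution of f1(x) = f2(x) = x² on R equals x²/2, the infimum being attained at x1 = x2 = x/2. The sketch below restricts the infimum to a finite grid, which in general only yields an upper bound; here the grid is chosen so that it contains the minimizer and the value is exact.

```python
from fractions import Fraction

def inf_conv_on_grid(f1, f2, x, grid):
    """Upper bound for inf{f1(x1) + f2(x - x1)} obtained by restricting x1 to a grid."""
    return min(f1(x1) + f2(x - x1) for x1 in grid)

def square(t):
    return t * t

# Grid with step 1/2 on [-10, 10]; it contains x/2 for every integer x in [-4, 4].
grid = [Fraction(k, 2) for k in range(-20, 21)]
values = {x: inf_conv_on_grid(square, square, Fraction(x), grid) for x in range(-4, 5)}
```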


Example 3.15 (distance function to a convex set) Let $C \subseteq X$ be a convex set.
Then $\delta_C$ is a convex function, thus $\delta_C \Box \| \cdot \|$ is a convex function, too. Since for
all $x \in X$

\[ (\delta_C \Box \| \cdot \|)(x) = \inf\{ \|x - y\| + \delta_C(y) \mid y \in X \} = \inf\{ \|x - y\| \mid y \in C \} = d_C(x), \]

the distance function to the convex set C, $d_C : X \to \overline{\mathbb{R}}$, $d_C(x) = \inf_{y \in C} \|x - y\|$, is
convex.
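The identity in Example 3.15 can be checked for an interval C = [a, b] in R, where the distance has the closed form max(a - x, 0, x - b). The following sketch is an illustration with hypothetical helper names; the infimum over y is restricted to a grid that contains the endpoints of C and all test points, so that the grid value is exact.

```python
from fractions import Fraction

def indicator(y, a, b):
    """delta_C(y) for C = [a, b]: 0 on C and +inf outside."""
    return 0 if a <= y <= b else float("inf")

def dist_via_inf_conv(x, a, b, grid):
    """inf over grid points y of |x - y| + delta_C(y)."""
    return min(abs(x - y) + indicator(y, a, b) for y in grid)

def dist_closed_form(x, a, b):
    """Distance from x to the interval [a, b]."""
    return max(a - x, 0, x - b)

a, b = Fraction(1), Fraction(3)
grid = [Fraction(k, 4) for k in range(-20, 41)]        # step 1/4 on [-5, 10]
test_points = [Fraction(k, 2) for k in range(-6, 15)]  # step 1/2 on [-3, 7]
agree = all(dist_via_inf_conv(x, a, b, grid) == dist_closed_form(x, a, b)
            for x in test_points)
```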

4 Topological properties of convex functions


This section is dedicated to the study of the most important topological properties
of convex functions.

Definition 4.1 (lower and upper semicontinuity) Let f : X → R and x ∈ X.


We say that f is:
(a) lower semicontinuous at x if

\[ \liminf_{y \to x} f(y) = \sup_{\delta > 0} \inf_{y \in B(x, \delta)} f(y) \geq f(x); \tag{4.1} \]

(b) upper semicontinuous at x if (−f ) is lower semicontinuous at x;

(c) continuous at x if f (x) ∈ R and f is lower semicontinuous and upper semi-


continuous at x;

(d) lower (upper) semicontinuous if it is lower (upper) semicontinuous at every


x ∈ X.

Remark 4.2 Let f : X → R and x ∈ X.


(a) It is easy to see that the function f is lower semicontinuous at $x \in X$ if
and only if actually

\[ \liminf_{y \to x} f(y) = f(x). \]

(b) The function f is lower semicontinuous at $x \in X$ if and only if for every
sequence $(x_k)_{k \in \mathbb{N}} \subseteq X$ with $x_k \to x$ $(k \to +\infty)$ it holds

\[ \liminf_{k \to +\infty} f(x_k) = \sup_{n \geq 1} \inf_{k \geq n} f(x_k) \geq f(x). \]

(c) If f (x) ∈ R, then f is lower semicontinuous at x ∈ X if and only if

∀ε > 0 ∃δ > 0 such that f (y) ≥ f (x) − ε ∀y ∈ B(x, δ).
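The role of the function value at a jump can be illustrated with two step functions on R. The sketch below approximates the quantity in (4.1) by sampling finitely many balls; this is a heuristic illustration with names chosen here, not a proof, and it happens to be exact for these two piecewise constant functions.

```python
def f(x):
    """Lower semicontinuous at 0: at the jump the lower value is taken."""
    return 0.0 if x <= 0 else 1.0

def g(x):
    """Not lower semicontinuous at 0: at the jump the upper value is taken."""
    return 0.0 if x < 0 else 1.0

def sampled_liminf(h, x, radii=(1.0, 0.1, 0.01, 0.001), samples=200):
    """Approximate sup_{delta > 0} inf_{y in B(x, delta)} h(y) by sampling each ball."""
    infima = []
    for delta in radii:
        ys = [x + delta * (2 * k / samples - 1) * 0.999 for k in range(samples + 1)]
        infima.append(min(h(y) for y in ys))
    return max(infima)

f_is_lsc_at_0 = sampled_liminf(f, 0.0) >= f(0.0)
g_is_lsc_at_0 = sampled_liminf(g, 0.0) >= g(0.0)
```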

The following proposition characterizes the lower semicontinuity of a function
in terms of the closedness of its epigraph and of the closedness of its lower level
sets (see also Proposition 3.5 and Remark 3.6 (b)).

Proposition 4.3 Let f : X → R be a given function. The following statements


are equivalent:
(a) f is lower semicontinuous;

(b) epi f is a closed set in X × R;

(c) for every λ ∈ R the lower level set {x ∈ X | f (x) ≤ λ} is closed.



Proof. ”(a) ⇒ (b)” Assume that there exists (x, r) ∈ cl(epi f ) \ epi f . In other
words, r < f (x). Let ε > 0 such that r < r + ε < f (x). Since f is lower
semicontinuous at x, there exists δ > 0 such that inf y∈B(x,δ) f (y) > r + ε, which
implies that B(x, δ) × (r − ε, r + ε) ∩ epi f = ∅. Contradiction to (x, r) ∈ cl(epi f )!
”(b) ⇒ (c)” Let λ ∈ R be fixed and assume that there exists an element
x ∈ cl ({z ∈ X | f (z) ≤ λ}) such that f (x) > λ. This means that (x, λ) ∉ epi f ,
consequently, there exist ε > 0 and δ > 0 such that B(x, δ)×(λ−ε, λ+ε)∩epi f =
∅. This implies that f (z) > λ for all z ∈ B(x, δ), which contradicts the fact that
x ∈ cl ({z ∈ X | f (z) ≤ λ}).
”(c) ⇒ (a)” Let x ∈ X. We will prove that (4.1) is fulfilled. If f (x) = −∞,
then there is nothing to be proved.
If f (x) = +∞, then, for all n ∈ N, x ∈ {z ∈ X | f (z) > n}, which is an open
set. Consequently, for all n ∈ N, there exists δn > 0 such that B(x, δn ) ⊆ {z ∈
X | f (z) > n}. Thus, inf y∈B(x,δn ) f (y) ≥ n. This yields supδ>0 inf y∈B(x,δ) f (y) =
+∞ = f (x).
Finally, assume that f (x) ∈ R and choose an arbitrary ε > 0. The set
{z ∈ X | f (z) > f (x) − ε} is open and it contains x. Consequently, there exists
δε > 0 such that B(x, δε ) ⊆ {z ∈ X | f (z) > f (x) − ε}, which means that
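$\inf_{y \in B(x, \delta_\varepsilon)} f(y) \geq f(x) - \varepsilon$. Consequently,

\[ \sup_{\delta > 0} \inf_{y \in B(x, \delta)} f(y) \geq \inf_{y \in B(x, \delta_\varepsilon)} f(y) \geq f(x) - \varepsilon. \]

Taking into account that $\varepsilon > 0$ was chosen arbitrarily, (4.1) follows. □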

Example 4.4 Let C ⊆ X be a given set. Since epi δC = C × R+ , δC is a lower


semicontinuous function if and only if C is a closed set.

The following proposition is the topological counterpart of Proposition 3.11


and it follows from epi f = ∩i∈I epi fi and Proposition 4.3.

Proposition 4.5 Let fi : X → R, i ∈ I, be a family of lower semicontinuous


functions. Then its pointwise supremum f : X → R, f (x) = supi∈I fi (x), is also
a lower semicontinuous function.

The function which we introduce next is the largest lower semicontinuous
function majorized by a given function.

Definition 4.6 (lower semicontinuous hull) Let f : X → R be a given function.


The function

\[ \bar{f} : X \to \overline{\mathbb{R}}, \quad \bar{f}(x) = \inf\{t \mid (x, t) \in \operatorname{cl}(\operatorname{epi} f)\}, \]

is called the lower semicontinuous hull of f .



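Theorem 4.7 Let $f : X \to \overline{\mathbb{R}}$ be a given function and $\bar{f}$ its lower semicontinuous hull.
The following statements are true:
(a) $\operatorname{epi} \bar{f} = \operatorname{cl}(\operatorname{epi} f)$, consequently, $\bar{f}$ is lower semicontinuous;
(b) $\operatorname{dom} f \subseteq \operatorname{dom} \bar{f} \subseteq \operatorname{cl}(\operatorname{dom} f)$;
(c) $\bar{f}(x) = \liminf_{y \to x} f(y)$ for all $x \in X$.

Proof. We have $\operatorname{epi} f \subseteq \operatorname{cl}(\operatorname{epi} f) \subseteq \operatorname{epi} \bar{f}$ and

\[ \bar{f}(x) = \inf\{t \mid (x, t) \in \operatorname{cl}(\operatorname{epi} f)\} \leq \inf\{t \mid (x, t) \in \operatorname{epi} f\} = f(x) \quad \forall x \in X. \]

(a) Let $(x, r) \in \operatorname{epi} \bar{f}$. Then $\bar{f}(x) = \inf\{t : (x, t) \in \operatorname{cl}(\operatorname{epi} f)\} \leq r$ and for all $n \in \mathbb{N}$
there exists $t_n \in \mathbb{R}$ such that $(x, t_n) \in \operatorname{cl}(\operatorname{epi} f)$ and $t_n < r + \frac{1}{n}$. Consequently,
$(x, r + \frac{1}{n}) \in \operatorname{cl}(\operatorname{epi} f)$ for all $n \in \mathbb{N}$, which implies that $(x, r) \in \operatorname{cl}(\operatorname{epi} f)$.
(b) The fact that $\operatorname{dom} f \subseteq \operatorname{dom} \bar{f}$ follows from $\bar{f} \leq f$. In order to prove the
second inclusion we choose an arbitrary element $x \in \operatorname{dom} \bar{f}$. Then there exists
$r \in \mathbb{R}$ such that $\bar{f}(x) \leq r$ or, equivalently, $(x, r) \in \operatorname{epi} \bar{f} = \operatorname{cl}(\operatorname{epi} f)$. Let $\delta > 0$.
Since $(B(x, \delta) \times (r - 1, r + 1)) \cap \operatorname{epi} f \neq \emptyset$, we have that $B(x, \delta) \cap \operatorname{dom} f \neq \emptyset$. This
proves that $x \in \operatorname{cl}(\operatorname{dom} f)$.
(c) Let $x \in X$. Since $\bar{f} \leq f$ and $\bar{f}$ is lower semicontinuous, it holds

\[ \bar{f}(x) = \liminf_{y \to x} \bar{f}(y) \leq \liminf_{y \to x} f(y). \]

Assume that $\bar{f}(x) < \liminf_{y \to x} f(y)$. Then there exists $\varepsilon \in \mathbb{R}$ such that

\[ \bar{f}(x) < \varepsilon < \liminf_{y \to x} f(y) = \sup_{\delta > 0} \inf_{y \in B(x, \delta)} f(y). \]

From here it follows that there exists $\delta_\varepsilon > 0$ such that $\inf_{y \in B(x, \delta_\varepsilon)} f(y) > \varepsilon$. In
other words, for all $y \in B(x, \delta_\varepsilon)$, $f(y) > \varepsilon$ or, equivalently,
$(B(x, \delta_\varepsilon) \times (-\infty, \varepsilon)) \cap \operatorname{epi} f = \emptyset$. Let $r \in \mathbb{R}$ such that $\bar{f}(x) < r < \varepsilon$. Then $(x, r) \in \operatorname{cl}(\operatorname{epi} f)$ and,
since $(x, r) \in B(x, \delta_\varepsilon) \times (-\infty, \varepsilon)$, it follows that $(B(x, \delta_\varepsilon) \times (-\infty, \varepsilon)) \cap \operatorname{epi} f \neq \emptyset$.
Contradiction! □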
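As a small worked illustration (added here, not part of the original notes), statement (a) of Theorem 4.7 can be combined with the identity epi δ_C = C × R_+ from Example 4.4 to compute the lower semicontinuous hull of an indicator function. For an arbitrary set C ⊆ X one obtains:

```latex
\[
\operatorname{epi} \overline{\delta_C}
  = \operatorname{cl}(\operatorname{epi} \delta_C)
  = \operatorname{cl}(C \times \mathbb{R}_+)
  = \operatorname{cl} C \times \mathbb{R}_+
  = \operatorname{epi} \delta_{\operatorname{cl} C},
\qquad \text{hence} \qquad
\overline{\delta_C} = \delta_{\operatorname{cl} C}.
\]
```

For instance, for C = (0, 1) ⊆ R the hull of δ_C is the indicator function of [0, 1]. In this case dom δ_C = (0, 1) and the domain of the hull is [0, 1] = cl(dom δ_C), so the second inclusion in statement (b) holds with equality while the first one is strict.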

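Remark 4.8 The function $\sup\{g \mid g \leq f \text{ and } g \text{ is lower semicontinuous}\}$ is,
according to Proposition 4.5, lower semicontinuous. Moreover, it coincides with the
lower semicontinuous hull of f:

\[ \bar{f} = \sup\{g \mid g \leq f \text{ and } g \text{ is lower semicontinuous}\}. \tag{4.2} \]

Indeed, since $\bar{f} \leq f$ and $\bar{f}$ is lower semicontinuous, we have

\[ \bar{f} \leq \sup\{g \mid g \leq f \text{ and } g \text{ is lower semicontinuous}\}. \]

Let $h : X \to \overline{\mathbb{R}}$ be an arbitrary lower semicontinuous function with $h \leq f$. It
holds $\operatorname{epi} \bar{f} = \operatorname{cl}(\operatorname{epi} f) \subseteq \operatorname{cl}(\operatorname{epi} h) = \operatorname{epi} h$, which yields $h \leq \bar{f}$. This shows that
(4.2) is true.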

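Next we will recall some basic facts related to the weak topology on X induced
by $X^*$. The collection of sets

\[ \mathcal{V}_w(0) := \left\{ V_{x_1^*, \dots, x_n^*, \varepsilon}(0) \mid n \in \mathbb{N},\ x_i^* \in X^*,\ i = 1, \dots, n,\ \varepsilon > 0 \right\}, \]

where

\[ V_{x_1^*, \dots, x_n^*, \varepsilon}(0) := \{ y \in X \mid |\langle x_i^*, y \rangle| < \varepsilon \ \forall i = 1, \dots, n \}, \]

forms a basis of neighbourhoods of 0 for the weak topology (every neighbourhood
of 0 contains an element $V \in \mathcal{V}_w(0)$). Consequently, for $x \in X$, the collection of
sets

\[ \mathcal{V}_w(x) := \left\{ V_{x_1^*, \dots, x_n^*, \varepsilon}(x) \mid n \in \mathbb{N},\ x_i^* \in X^*,\ i = 1, \dots, n,\ \varepsilon > 0 \right\}, \]

where

\[ V_{x_1^*, \dots, x_n^*, \varepsilon}(x) := x + V_{x_1^*, \dots, x_n^*, \varepsilon}(0) = \{ y \in X \mid |\langle x_i^*, y - x \rangle| < \varepsilon \ \forall i = 1, \dots, n \}, \]

forms a basis of neighbourhoods of x (every neighbourhood of x contains an
element $V \in \mathcal{V}_w(x)$).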
The weak topology on X is the weakest topology on X making all mappings
$\langle x^*, \cdot \rangle : X \to \mathbb{R}$ continuous for all $x^* \in X^*$. A sequence $(x_n)_{n \in \mathbb{N}} \subseteq X$ converges
in the weak topology to $x \in X$ (we write $x_n \xrightarrow{w} x$) as $n \to +\infty$ if and only if
for all $x^* \in X^*$ the sequence $(\langle x^*, x_n \rangle)_{n \in \mathbb{N}}$ converges to $\langle x^*, x \rangle$ as $n \to +\infty$. If a
sequence $(x_n)_{n \in \mathbb{N}} \subseteq X$ converges in the norm topology to $x \in X$ (we write, when
necessary, $x_n \xrightarrow{\| \cdot \|} x$) as $n \to +\infty$, then it converges also in the weak topology to
$x \in X$ as $n \to +\infty$. The opposite statement is in general not true, as it can be
seen for the canonical basis sequence $(e_n)_{n \in \mathbb{N}}$ in $\ell^p$, $1 < p < \infty$, or in $c_0$, which
converges weakly to 0, while $\|e_n\| = 1$ for all $n \in \mathbb{N}$.
For an arbitrary set C ⊆ X we have

C ⊆ cl C ⊆ clw C,

where clw C denotes the weak closure of C, which is the closure of C in the weak
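topology. Indeed, let $x \in \operatorname{cl} C$ and $n \in \mathbb{N}$, $x_i^* \in X^*$, $i = 1, \dots, n$, and $\varepsilon > 0$. Let
$\delta := \frac{\varepsilon}{\max\{\|x_i^*\| \mid i = 1, \dots, n\} + 1}$. Then there exists $y \in C$ such that $y \in B(x, \delta)$. From
here we get that for every $i = 1, \dots, n$, it holds $|\langle x_i^*, y - x \rangle| \leq \|x_i^*\| \|y - x\| < \varepsilon$,
thus $y \in V_{x_1^*, \dots, x_n^*, \varepsilon}(x)$. This proves that $x \in \operatorname{cl}_w C$.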
The following theorem shows that the (norm) closure and the weak closure of
a convex set coincide.

Theorem 4.9 Let C ⊆ X be a convex set. Then it holds

cl C = clw C.

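Proof. Since $\operatorname{cl} C \subseteq \operatorname{cl}_w C$ holds for arbitrary sets, it remains to prove the opposite
inclusion. Let $x \in \operatorname{cl}_w C$. We will prove that $x \in \operatorname{cl} C$. Assuming the contrary,
by the Hahn-Banach Strong Separation Theorem (see Theorem 2.20) there exists
$x^* \in X^*$, $x^* \neq 0$, such that

\[ \inf_{c \in C} \langle x^*, c \rangle > \langle x^*, x \rangle, \]

therefore, there exists $\varepsilon > 0$ such that $\inf_{c \in C} \langle x^*, c - x \rangle > \varepsilon$. This shows that
$V_{x^*, \varepsilon}(x) \cap C = \emptyset$, which contradicts $x \in \operatorname{cl}_w C$. □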
Definition 4.10 (weak lower semicontinuity) Let $f : X \to \overline{\mathbb{R}}$ and $x \in X$. We
say that f is weakly lower semicontinuous at x if

\[ \liminf_{y \xrightarrow{w} x} f(y) = \sup_{V \in \mathcal{V}_w(x)} \inf_{y \in V} f(y) \geq f(x). \tag{4.3} \]

We say that f is weakly lower semicontinuous if it is weakly lower semicontinuous
at every $x \in X$.
Weakly lower semicontinuous functions can be characterized in terms of the weak
closedness of their epigraphs and also of the weak closedness of all their lower
level sets as in Proposition 4.3. This fact, together with Theorem 4.9, leads to the
following result.
Theorem 4.11 Let f : X → R be a convex function. Then f is lower semicon-
tinuous if and only if it is weakly lower semicontinuous.
Let us introduce now the weak∗ topology on $X^*$ induced by X. The collection
of sets

\[ \mathcal{V}_{w^*}(0) := \{ V_{x_1, \dots, x_n, \varepsilon}(0) \mid n \in \mathbb{N},\ x_i \in X,\ i = 1, \dots, n,\ \varepsilon > 0 \}, \]

where

\[ V_{x_1, \dots, x_n, \varepsilon}(0) := \{ y^* \in X^* \mid |\langle y^*, x_i \rangle| < \varepsilon \ \forall i = 1, \dots, n \}, \]

forms a basis of neighbourhoods of 0 for the weak∗ topology (every neighbourhood
of 0 contains an element $V \in \mathcal{V}_{w^*}(0)$). Consequently, for $x^* \in X^*$, the collection
of sets

\[ \mathcal{V}_{w^*}(x^*) := \{ V_{x_1, \dots, x_n, \varepsilon}(x^*) \mid n \in \mathbb{N},\ x_i \in X,\ i = 1, \dots, n,\ \varepsilon > 0 \}, \]

where

\[ V_{x_1, \dots, x_n, \varepsilon}(x^*) := x^* + V_{x_1, \dots, x_n, \varepsilon}(0) = \{ y^* \in X^* \mid |\langle y^* - x^*, x_i \rangle| < \varepsilon \ \forall i = 1, \dots, n \}, \]

forms a basis of neighbourhoods of $x^*$ (every neighbourhood of $x^*$ contains an
element $V \in \mathcal{V}_{w^*}(x^*)$).
The weak∗ topology on $X^*$ is the weakest topology on $X^*$ making all mappings
$\langle \cdot, x \rangle : X^* \to \mathbb{R}$ continuous for all $x \in X$. A sequence $(x_n^*)_{n \in \mathbb{N}} \subseteq X^*$ converges in
the weak∗ topology to $x^* \in X^*$ (we write $x_n^* \xrightarrow{w^*} x^*$) as $n \to +\infty$ if and only if
for all $x \in X$ the sequence $(\langle x_n^*, x \rangle)_{n \in \mathbb{N}}$ converges to $\langle x^*, x \rangle$ as $n \to +\infty$.

Definition 4.12 (weak∗ lower semicontinuity) Let $f : X^* \to \overline{\mathbb{R}}$ and $x^* \in X^*$.
We say that f is weakly∗ lower semicontinuous at $x^*$ if

\[ \liminf_{y^* \xrightarrow{w^*} x^*} f(y^*) = \sup_{V \in \mathcal{V}_{w^*}(x^*)} \inf_{y^* \in V} f(y^*) \geq f(x^*). \tag{4.4} \]

We say that f is weakly∗ lower semicontinuous if it is weakly∗ lower semicontinuous
at every $x^* \in X^*$.

Example 4.13 Let $C \subseteq X$ be a given set. According to Proposition 4.5, the
support function of C, $\sigma_C : X^* \to \overline{\mathbb{R}}$, $\sigma_C(x^*) = \sup_{x \in C} \langle x^*, x \rangle$, is weakly∗ lower
semicontinuous.

The following proposition shows that convex and lower semicontinuous func-
tions which take the value −∞ cannot take finite values.

Proposition 4.14 Let f : X → R be convex and lower semicontinuous. If


f (x) = −∞, then f (z) = −∞ for all z ∈ dom f . Consequently, if f is sublinear
and lower semicontinuous, then f is proper.

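Proof. Let $z \in X$ such that $f(z) = t \in \mathbb{R}$. We have $(z, t) \in \operatorname{epi} f$ and $(x, t - n) \in \operatorname{epi} f$
for all $n \in \mathbb{N}$. Consequently, for all $n \in \mathbb{N}$

\[ \frac{1}{n}(x, t - n) + \left(1 - \frac{1}{n}\right)(z, t) = \left( \frac{1}{n} x + \left(1 - \frac{1}{n}\right) z,\ t - 1 \right) \in \operatorname{epi} f, \]

thus $(z, t - 1) \in \operatorname{cl}(\operatorname{epi} f) = \operatorname{epi} f$. This implies that $f(z) \leq f(z) - 1$. Contradiction! □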

Next we provide an important characterization of functions which are both


convex and lower semicontinuous. To this end we will introduce first the following
notion.

Definition 4.15 (affine-continuous minorant) Let $f : X \to \overline{\mathbb{R}}$ be a given function.
We say that $x \mapsto \langle x^*, x \rangle + c$, where $(x^*, c) \in X^* \times \mathbb{R}$, is an affine-continuous
minorant of f if $\langle x^*, x \rangle + c \leq f(x)$ for all $x \in X$.

Theorem 4.16 Let $f : X \to \overline{\mathbb{R}}$ be a given function. Then f is convex and
lower semicontinuous with $f(x) > -\infty$ for all $x \in X$ if and only if the set of its
affine-continuous minorants is nonempty and

\[ f(x) = \sup\{ \langle x^*, x \rangle + c \mid (x^*, c) \in X^* \times \mathbb{R},\ \langle x^*, y \rangle + c \leq f(y) \ \forall y \in X \} \quad \forall x \in X. \]
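Theorem 4.16 can be made concrete for f(x) = x² on R, whose tangents x ↦ 2tx - t² are affine-continuous minorants because x² - (2tx - t²) = (x - t)² ≥ 0. The sketch below is an illustration only: it uses a finite family of tangents at integer points t, so the supremum reproduces f exactly only at those points.

```python
def affine_minorant(t):
    """Tangent of f(x) = x^2 at t: x -> 2 t x - t^2, a minorant since (x - t)^2 >= 0."""
    return lambda x: 2 * t * x - t * t

# A finite family of affine minorants (tangents at the integers -10, ..., 10).
minorants = [affine_minorant(t) for t in range(-10, 11)]

def sup_of_minorants(x):
    """Pointwise supremum of the finite family; equals x^2 at the tangency points."""
    return max(m(x) for m in minorants)
```

Between tangency points the finite supremum lies strictly below x²; taking all real t recovers f everywhere, in line with the theorem.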



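Proof. "⇐" The statement follows from Proposition 3.11 and Proposition 4.5.
"⇒" First we prove that the set

\[ \{ (x^*, c) \in X^* \times \mathbb{R} \mid \langle x^*, y \rangle + c \leq f(y) \ \forall y \in X \} \]

is nonempty.
If f is identically $+\infty$, then this is obviously the case. Let now $z \in X$ such
that $f(z) \in \mathbb{R}$. Since $(z, f(z) - 1) \notin \operatorname{epi} f$, according to Theorem 2.20, there exists
$(x^*, \alpha) \in X^* \times \mathbb{R}$, $(x^*, \alpha) \neq (0, 0)$, such that

\[ \langle x^*, x \rangle + \alpha r > \langle x^*, z \rangle + \alpha (f(z) - 1) \quad \forall (x, r) \in \operatorname{epi} f. \tag{4.5} \]

Since $(z, f(z)) \in \operatorname{epi} f$, it holds $\alpha > 0$, thus, (4.5) can be equivalently written as

\[ r > \frac{1}{\alpha} \langle x^*, z - x \rangle + f(z) - 1 \quad \forall (x, r) \in \operatorname{epi} f. \]

For all $x \in \operatorname{dom} f$ we have $(x, f(x)) \in \operatorname{epi} f$, consequently,

\[ f(x) > \frac{1}{\alpha} \langle x^*, z - x \rangle + f(z) - 1. \]

As this inequality holds also for $x \notin \operatorname{dom} f$,

\[ x \mapsto \left\langle -\frac{1}{\alpha} x^*, x \right\rangle + \frac{1}{\alpha} \langle x^*, z \rangle + f(z) - 1 \]

is an affine-continuous minorant of f.
It obviously holds

\[ f(x) \geq \sup\{ \langle x^*, x \rangle + c \mid (x^*, c) \in X^* \times \mathbb{R},\ \langle x^*, y \rangle + c \leq f(y) \ \forall y \in X \} \quad \forall x \in X. \tag{4.6} \]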
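We will prove that (4.6) is fulfilled as equality. If f is identically $+\infty$, then this
is obviously the case. Assume that f is not identically $+\infty$ and that there exist
$\bar{x} \in X$ and $\bar{r} \in \mathbb{R}$ such that

\[ f(\bar{x}) > \bar{r} > \sup\{ \langle x^*, \bar{x} \rangle + c \mid (x^*, c) \in X^* \times \mathbb{R},\ \langle x^*, y \rangle + c \leq f(y) \ \forall y \in X \}. \tag{4.7} \]

It holds $(\bar{x}, \bar{r}) \notin \operatorname{epi} f$ and so, applying again Theorem 2.20, we obtain an element
$(\bar{x}^*, \bar{\alpha}) \in X^* \times \mathbb{R}$, $(\bar{x}^*, \bar{\alpha}) \neq (0, 0)$, and $\varepsilon > 0$ such that

\[ \langle \bar{x}^*, x \rangle + \bar{\alpha} r > \langle \bar{x}^*, \bar{x} \rangle + \bar{\alpha} \bar{r} + \varepsilon \quad \forall (x, r) \in \operatorname{epi} f. \tag{4.8} \]

Since f is not identically $+\infty$, there exists $(y, s) \in \operatorname{epi} f$. This means that
$(y, s + t) \in \operatorname{epi} f$ for all $t > 0$, which yields $\bar{\alpha} \geq 0$.
Assume that $f(\bar{x}) < +\infty$. The fact that $(\bar{x}, f(\bar{x})) \in \operatorname{epi} f$ and (4.8) imply
$\bar{\alpha}(f(\bar{x}) - \bar{r}) > \varepsilon$, therefore $\bar{\alpha} > 0$. After dividing (4.8) by $\bar{\alpha}$, we obtain

\[ f(x) > \frac{1}{\bar{\alpha}} \langle \bar{x}^*, \bar{x} - x \rangle + \bar{r} + \frac{\varepsilon}{\bar{\alpha}} > \frac{1}{\bar{\alpha}} \langle \bar{x}^*, \bar{x} - x \rangle + \bar{r} \quad \forall x \in \operatorname{dom} f. \]

This means that the mapping $x \mapsto \left\langle -\frac{1}{\bar{\alpha}} \bar{x}^*, x \right\rangle + \frac{1}{\bar{\alpha}} \langle \bar{x}^*, \bar{x} \rangle + \bar{r}$ is an affine-continuous
minorant of f which takes the value $\bar{r}$ at $\bar{x}$. This contradicts (4.7).
Assume that $f(\bar{x}) = +\infty$. If $\bar{\alpha} > 0$, then, by arguing as above, we get a
contradiction to (4.7). Assuming that $\bar{\alpha} = 0$, from (4.8) we obtain

\[ \langle -\bar{x}^*, x - \bar{x} \rangle + \varepsilon < 0 \quad \forall x \in \operatorname{dom} f. \]

Using that the set of affine-continuous minorants of f is nonempty, there exists
$(x^*, \beta) \in X^* \times \mathbb{R}$ such that

\[ \langle x^*, x \rangle + \beta \leq f(x) \quad \forall x \in X. \]

Set $\gamma := \frac{\bar{r} - (\langle x^*, \bar{x} \rangle + \beta)}{\varepsilon} > 0$. The mapping

\[ x \mapsto \langle x^* - \gamma \bar{x}^*, x \rangle + \gamma \langle \bar{x}^*, \bar{x} \rangle + \beta + \gamma \varepsilon \]

is affine-continuous and it fulfills for all $x \in \operatorname{dom} f$

\begin{align*}
\langle x^* - \gamma \bar{x}^*, x \rangle + \gamma \langle \bar{x}^*, \bar{x} \rangle + \beta + \gamma \varepsilon &= \langle x^*, x \rangle + \beta + \gamma \left( \langle -\bar{x}^*, x - \bar{x} \rangle + \varepsilon \right) \\
&< \langle x^*, x \rangle + \beta \leq f(x).
\end{align*}

Consequently, it is an affine-continuous minorant of f which takes at $x := \bar{x}$ the
value

\[ \langle x^* - \gamma \bar{x}^*, \bar{x} \rangle + \gamma \langle \bar{x}^*, \bar{x} \rangle + \beta + \gamma \varepsilon = \langle x^*, \bar{x} \rangle + \beta + \gamma \varepsilon = \bar{r}. \]

We obtain also in this case a contradiction to (4.7). This concludes the proof. □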
In the remainder of this section we will address the continuity of convex
functions. The following proposition will play an important role in these investi-
gations.
Proposition 4.17 Let f : X → R be a proper and convex function and x ∈
dom f . Assume that there exist δ > 0 and M ≥ 0 such that
f (y) ≤ f (x) + M ∀y ∈ B(x, δ).
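Then it holds

\[ |f(y) - f(x)| \leq \frac{M}{\delta} \|y - x\| \quad \forall y \in B(x, \delta), \]

which means that f is continuous at x.

Proof. Let $y \in B(x, \delta)$, $y \neq x$. Then $x + \frac{\delta}{\|y - x\|}(y - x) \in B(x, \delta)$ and, since f is
convex and $0 < \frac{\|y - x\|}{\delta} \leq 1$,

\begin{align*}
f(y) - f(x) &= f\left( \frac{\|y - x\|}{\delta} \left( x + \frac{\delta}{\|y - x\|}(y - x) \right) + \left( 1 - \frac{\|y - x\|}{\delta} \right) x \right) - f(x) \\
&\leq \frac{\|y - x\|}{\delta} \left( f\left( x + \frac{\delta}{\|y - x\|}(y - x) \right) - f(x) \right) \leq \frac{M}{\delta} \|y - x\|.
\end{align*}

This proves that

\[ f(y) - f(x) \leq \frac{M}{\delta} \|y - x\| \quad \forall y \in B(x, \delta). \]

On the other hand, for every $y \in B(x, \delta)$ we have $2x - y \in B(x, \delta)$, thus, by using
again the convexity of f,

\[ f(x) - f(y) \leq f(2x - y) - f(x) \leq \frac{M}{\delta} \|y - x\|. \]

This provides the conclusion. □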
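The estimate of Proposition 4.17 can be checked numerically for f(y) = y² on R with x = 1 and δ = 1. The constant M = 3 used below is an assumption made for this sketch; it works because y² < 4 = f(1) + 3 on B(1, 1) = (0, 2).

```python
def f(y):
    return y * y

x, delta = 1.0, 1.0
M = 3.0  # f(y) <= f(x) + M on B(x, delta) = (0, 2), since y^2 < 4 = 1 + 3

# Sample points of the open ball B(x, delta).
ys = [x + delta * (k / 1000.0) for k in range(-999, 1000)]

upper_bound_holds = all(f(y) <= f(x) + M for y in ys)
# The Lipschitz-type estimate |f(y) - f(x)| <= (M / delta) |y - x|,
# with a small tolerance for floating-point rounding.
estimate_holds = all(abs(f(y) - f(x)) <= (M / delta) * abs(y - x) + 1e-12 for y in ys)
```

In this example the estimate can also be seen directly from |y² - 1| = |y - 1| |y + 1| and |y + 1| < 3 on (0, 2).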

Now we can prove the following result which characterizes the continuity of
convex functions.

Theorem 4.18 (continuity of convex functions) Let f : X → R be a convex


function with the property that f is bounded above on a neighbourhood of a point
of its domain. If f is not proper, then f is identically −∞ on int(dom f ). If f
is proper, then f is continuous on int(dom f ).

Proof. Let x ∈ dom f , δ > 0 and M ≥ 0 such that f (y) ≤ M for all y ∈ B(x, δ).
Then B(x, δ) ⊆ dom f , and so x ∈ int(dom f ).
If f takes the value −∞, then, according to Proposition 3.7 and Proposition
1.13, f (z) = −∞ for all z ∈ icr(dom f ) = int(dom f ).
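Assume now that f is proper and let $z \in \operatorname{int}(\operatorname{dom} f)$. Then there exists $\gamma > 0$
such that $u := z + \gamma (z - x) \in \operatorname{dom} f$. For $\lambda := \frac{1}{1 + \gamma} \in (0, 1)$ it yields

\[ \lambda u + (1 - \lambda) B(x, \delta) = z - \frac{\gamma}{1 + \gamma} x + \frac{\gamma}{1 + \gamma} x + B\left( 0, \frac{\gamma}{1 + \gamma} \delta \right) = B\left( z, \frac{\gamma}{1 + \gamma} \delta \right), \]

thus, every $y \in B\left( z, \frac{\gamma}{1 + \gamma} \delta \right)$ can be written as $y = \lambda u + (1 - \lambda) v$, with $v \in B(x, \delta)$, and

\[ f(y) \leq \lambda f(u) + (1 - \lambda) f(v) \leq \lambda f(u) + (1 - \lambda) M = f(z) + M_1, \]

where

\[ M_1 := \lambda f(u) + (1 - \lambda) M - f(z) \geq \lambda f(u) + (1 - \lambda) f(x) - f(\lambda u + (1 - \lambda) x) \geq 0. \]

The continuity of f at z follows from Proposition 4.17. □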

The following result shows that a proper and convex function which is bounded
above on a neighbourhood of a point of its domain is even locally Lipschitz
continuous on the interior of its domain.

Theorem 4.19 (local Lipschitz continuity of convex functions) Let f : X → R
be a proper and convex function with the property that f is bounded above on a
neighbourhood of a point of its domain. Then f is locally Lipschitz continuous on
int(dom f ), in other words, for every x ∈ int(dom f ) there exist δ > 0 and L > 0
such that B(x, δ) ⊆ dom f and
\[ |f(y) - f(z)| \le L\|y - z\| \quad \forall y, z \in B(x,\delta). \]

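Proof. Let $x \in \operatorname{int}(\operatorname{dom} f)$. According to Theorem 4.18, $f$ is continuous at
$x$, which means that there exists $\delta > 0$ such that $B(x, 2\delta) \subseteq \operatorname{dom} f$ and for all
$u, v \in B(x, 2\delta)$ it holds
\[ |f(u) - f(v)| \le |f(u) - f(x)| + |f(x) - f(v)| < \frac{1}{2} + \frac{1}{2} = 1. \]
Let $y, z \in B(x,\delta)$, $y \neq z$, and $\alpha := \|y - z\| > 0$. We define $u := y + \frac{\delta}{\alpha}(y - z)$ and
notice that $\|u - y\| = \frac{\delta}{\alpha}\|y - z\| = \delta$, which means that $u \in B(x, 2\delta)$. Using that
$f$ is convex and noticing that $y = \frac{\alpha}{\alpha+\delta}u + \frac{\delta}{\alpha+\delta}z$, it yields
\[ f(y) - f(z) \le \frac{\alpha}{\alpha+\delta}f(u) + \frac{\delta}{\alpha+\delta}f(z) - f(z) = \frac{\alpha}{\alpha+\delta}\big(f(u) - f(z)\big) < \frac{\alpha}{\alpha+\delta}. \]
Switching the roles of $y$ and $z$ allows us to conclude that
\[ |f(y) - f(z)| < \frac{\alpha}{\alpha+\delta} < \frac{\alpha}{\delta} = \frac{1}{\delta}\|y - z\|, \]
thus the conclusion holds with $L := \frac{1}{\delta} > 0$.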

In finite-dimensional spaces we have a stronger formulation of the last two


theorems.

Theorem 4.20 Let f : R^n → R be a proper and convex function. Then f is locally
Lipschitz continuous on int(dom f ), and therefore continuous on int(dom f ).

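Proof. Let $x \in \operatorname{int}(\operatorname{dom} f)$ and $0 < \gamma \le 1$ such that $x \pm \gamma e_i \in \operatorname{dom} f$ for
all $i = 1, ..., n$, where $e_i$, $i = 1, ..., n$, denotes the canonical basis of $\mathbb{R}^n$. Then
there exists $0 < \delta < \frac{\gamma}{\sqrt{n}}$ such that $B(x,\delta) \subseteq \operatorname{dom} f$. Let $y \in B(x,\delta)$. Then
\[ \sum_{i=1}^n |y_i - x_i| \le \sqrt{n}\,\sqrt{\sum_{i=1}^n |y_i - x_i|^2} \le \sqrt{n}\,\delta < \gamma. \]
We denote $\alpha_i := \frac{y_i - x_i}{\gamma}$ for all $i = 1, ..., n$. Then $\sum_{i=1}^n |\alpha_i| < 1$ and
\[ y = x + \sum_{i=1}^n \alpha_i\gamma e_i = \sum_{i:\alpha_i>0}\alpha_i(x + \gamma e_i) + \sum_{i:\alpha_i<0}(-\alpha_i)(x - \gamma e_i) + \left(1 - \sum_{i=1}^n|\alpha_i|\right)x. \]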

By denoting
M := max{f (x), f (x ± γe1 ), ..., f (x ± γen )},

it holds
\[ f(y) \le \sum_{i:\alpha_i>0}\alpha_i f(x + \gamma e_i) + \sum_{i:\alpha_i<0}(-\alpha_i)f(x - \gamma e_i) + \left(1 - \sum_{i=1}^n|\alpha_i|\right)f(x) \le M, \]

which proves that f is bounded above on B(x, δ). The conclusion follows from
Theorem 4.18 and Theorem 4.19. 

An important consequence of the previous theorem is that convex functions


with full domains defined on finite-dimensional spaces are continuous.

Corollary 4.21 A convex function f : Rn → R is locally Lipschitz continuous


everywhere, and therefore continuous everywhere.

We close the section with a result which relates the lower semicontinuity of
convex functions defined on Banach spaces to their continuity.

Theorem 4.22 Let X be a Banach space and f : X → R a convex and lower


semicontinuous function having a finite value at at least one point. Then f is
continuous on int(dom f ) = core(dom f ).

Proof. According to Proposition 4.14, f is a proper function. The inclusion


int(dom f ) ⊆ core(dom f ) holds in general, thus, if core(dom f ) = ∅, then the
statement of the theorem is true.
Let x ∈ core(dom f ) and r ∈ R such that f (x) < r. The set C := {y ∈
X | f (y) ≤ r} is convex and closed. We will prove that x ∈ core C. To this
end, let u ∈ X be fixed. Since x ∈ core(dom f ), there exists δ > 0 such that
x+λu ∈ dom f for all λ ∈ [−δ, δ]. Define the function g : R → R, g(t) = f (x+tu).
Then g is a proper and convex function (being the composition of a convex function
with an affine mapping) with [−δ, δ] ⊆ dom g. According to Theorem 4.20, g is
continuous at 0 ∈ (−δ, δ) ⊆ int(dom g). Since g(0) = f (x) < r, there exists t > 0
such that g(t) = f (x + tu) < r, which means that x + tu ∈ C. Since u was
arbitrarily chosen in X, according to Proposition 1.9 x ∈ core C.
Taking into account that X is a Banach space and C is a convex and closed set,
Theorem 1.11 implies that x ∈ core C = int C ⊆ int(dom f ). This proves that
int(dom f ) = core(dom f ).
In addition, since x ∈ int C, there exists a neighbourhood of x which is con-
tained in C and, consequently, on which f is bounded above by r. From here it
follows according to Theorem 4.18 that f is continuous at x. 

The following corollary is a direct consequence of Theorem 4.22.

Corollary 4.23 A convex and lower semicontinuous function f : X → R defined


on a Banach space X is continuous everywhere.
Chapter III

Conjugate functions and subdifferentiability

In this chapter we will introduce the notions of conjugate function and (convex)
subdifferential of a (convex) function and investigate their properties.

5 Conjugate functions
Definition 5.1 (conjugate function) Let f : X → R be a given function. The
function
\[ f^* : X^* \to \mathbb{R}, \quad f^*(x^*) := \sup_{x \in X}\{\langle x^*, x\rangle - f(x)\}, \]

is called the (Fenchel) conjugate function of f .

Remark 5.2 The conjugate function of f is convex and weakly∗ lower semicon-
tinuous (therefore, norm (strongly) lower semicontinuous on X ∗ ). Indeed, if f is
not proper, then
• either f is identically +∞, which means that f ∗ is identically −∞;
• or there exists x0 ∈ X such that f (x0 ) = −∞, which means that f ∗ is
identically +∞.
If $f$ is proper, then $\operatorname{dom} f \neq \emptyset$ and
\[ f^*(x^*) = \sup_{x \in \operatorname{dom} f}\{\langle x^*, x\rangle - f(x)\} \quad \forall x^* \in X^*. \]

Since for all $x \in \operatorname{dom} f$ the mapping $x^* \mapsto \langle x^*, x\rangle - f(x)$ is affine and weakly$^*$
lower semicontinuous, according to Proposition 3.11 and Proposition 4.5, $f^*$ is
convex and weakly$^*$ lower semicontinuous.

In the following result we collect some elementary properties of conjugate


functions. For their verification one just has to use Definition 5.1.


Proposition 5.3 Let $f, g, f_i : X \to \mathbb{R}$, $i \in I$, be given functions. The following
statements are true:

(a) $f(x) + f^*(x^*) \ge \langle x^*, x\rangle$ $\forall x \in X$ $\forall x^* \in X^*$ (Young-Fenchel inequality);

(b) $\inf_{x \in X} f(x) = -f^*(0)$;

(c) if $f \le g$, then $f^* \ge g^*$;

(d) $(\sup_{i \in I} f_i)^* \le \inf_{i \in I} f_i^*$ and $(\inf_{i \in I} f_i)^* = \sup_{i \in I} f_i^*$;

(e) $(\lambda f)^*(x^*) = \lambda f^*\left(\frac{1}{\lambda}x^*\right)$ $\forall x^* \in X^*$ $\forall \lambda > 0$;

(f) $(f + c)^*(x^*) = f^*(x^*) - c$ $\forall x^* \in X^*$ $\forall c \in \mathbb{R}$;

(g) if, for $z \in X$, $f_z : X \to \mathbb{R}$, $f_z(x) = f(x - z)$, then $(f_z)^*(x^*) = f^*(x^*) + \langle x^*, z\rangle$ $\forall x^* \in X^*$;

(h) if, for $z^* \in X^*$, $f_{z^*} : X \to \mathbb{R}$, $f_{z^*}(x) = f(x) + \langle z^*, x\rangle$, then $(f_{z^*})^*(x^*) = f^*(x^* - z^*)$ $\forall x^* \in X^*$;

(i) $(f + g)^*(x^* + y^*) \le f^*(x^*) + g^*(y^*)$ $\forall x^*, y^* \in X^*$;

(j) $(\lambda f + (1-\lambda)g)^*(x^*) \le \lambda f^*(x^*) + (1-\lambda)g^*(x^*)$ $\forall x^* \in X^*$ $\forall \lambda \in (0,1)$.

Another property which follows directly from the definition of the conjugate
functions is stated in the following proposition.

Proposition 5.4 Let $X_i$, $i = 1, ..., m$, be nontrivial real normed spaces, $f_i : X_i \to \mathbb{R}$,
$i = 1, ..., m$, given functions and
\[ f : X_1 \times ... \times X_m \to \mathbb{R}, \quad f(x_1, ..., x_m) = \sum_{i=1}^m f_i(x_i). \]
Then
\[ f^* : X_1^* \times ... \times X_m^* \to \mathbb{R}, \quad f^*(x_1^*, ..., x_m^*) = \sum_{i=1}^m f_i^*(x_i^*). \]

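Example 5.5 (a) For $f : \mathbb{R} \to \mathbb{R}$, $f(x) = \frac{1}{2}x^2$, it holds
\[ f^* : \mathbb{R} \to \mathbb{R}, \quad f^*(x^*) = \frac{1}{2}(x^*)^2. \]
(b) For $f : \mathbb{R} \to \mathbb{R}$, $f(x) = e^x$, it holds
\[ f^* : \mathbb{R} \to \mathbb{R}, \quad f^*(x^*) = \begin{cases} x^*(\ln x^* - 1), & \text{if } x^* > 0, \\ 0, & \text{if } x^* = 0, \\ +\infty, & \text{if } x^* < 0. \end{cases} \]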


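(c) For
\[ f : \mathbb{R} \to \mathbb{R}, \quad f(x) = \begin{cases} x(\ln x - 1), & \text{if } x > 0, \\ 0, & \text{if } x = 0, \\ +\infty, & \text{if } x < 0, \end{cases} \]
it holds $f^* : \mathbb{R} \to \mathbb{R}$, $f^*(x^*) = e^{x^*}$.
(d) For $z^* \in X^*$ and $c \in \mathbb{R}$, let $f : X \to \mathbb{R}$, $f(x) = \langle z^*, x\rangle + c$. For all $x^* \in X^*$
it holds
\[ f^*(x^*) = \begin{cases} -c, & \text{if } x^* = z^*, \\ +\infty, & \text{otherwise.} \end{cases} \]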
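(e) Let $C \subseteq X$ be a given set. For all $x^* \in X^*$ it holds
\[ \delta_C^*(x^*) = \sup_{x \in X}\{\langle x^*, x\rangle - \delta_C(x)\} = \sup_{x \in C}\langle x^*, x\rangle = \sigma_C(x^*). \]
(f) Let $C \subseteq X$ be a nonempty set. The conjugate of the Minkowski functional of
$C$ at $x^* \in X^*$ reads
\begin{align*}
p_C^*(x^*) &= \sup_{x \in X}\big\{\langle x^*, x\rangle - \inf\{\lambda \ge 0 : x \in \lambda C\}\big\} \\
&= \sup_{x \in X}\Big\{\langle x^*, x\rangle + \sup_{\lambda \ge 0,\, x \in \lambda C}\{-\lambda\}\Big\} = \sup_{\lambda \ge 0}\Big\{-\lambda + \sup_{z \in C}\langle x^*, \lambda z\rangle\Big\} \\
&= \sup_{\lambda \ge 0}\Big\{\lambda\Big(\sup_{z \in C}\langle x^*, z\rangle - 1\Big)\Big\} = \begin{cases} 0, & \text{if } \sigma_C(x^*) \le 1, \\ +\infty, & \text{otherwise.} \end{cases}
\end{align*}
(g) Since $p_{B(0,1)} = \|\cdot\|$ and $\sigma_{B(0,1)} = \|\cdot\|_*$, it holds for all $x^* \in X^*$
\[ \|\cdot\|^*(x^*) = \delta_{B^*(0,1)}(x^*) = \begin{cases} 0, & \text{if } \|x^*\|_* \le 1, \\ +\infty, & \text{otherwise.} \end{cases} \]
(h) For $1 < p < \infty$, let $f : X \to \mathbb{R}$, $f(x) = \frac{1}{p}\|x\|^p$. For all $x^* \in X^*$ it holds
$f^*(x^*) = \frac{1}{q}\|x^*\|_*^q$, where $\frac{1}{p} + \frac{1}{q} = 1$.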
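
The closed forms in Example 5.5 (a) and (b) can be compared with a brute-force approximation of the supremum in Definition 5.1. The following is a minimal sketch, assuming X = R and replacing the supremum over R by a maximum over a finite grid; the grid bounds, the step size and the names `conjugate_on_grid`, `check_quadratic` and `check_exponential` are hypothetical and chosen only for illustration.

```python
import math


def conjugate_on_grid(f, xs, slope):
    """Approximate f*(slope) = sup_x {slope * x - f(x)} by a maximum over the grid xs."""
    return max(slope * x - f(x) for x in xs)


# Hypothetical grid on [-10, 10] with step 0.001 (an assumption of this sketch).
GRID = [i / 1000.0 for i in range(-10000, 10001)]


def check_quadratic(slope):
    """Example 5.5 (a): f(x) = x^2 / 2 has the conjugate (x*)^2 / 2."""
    approx = conjugate_on_grid(lambda x: 0.5 * x * x, GRID, slope)
    exact = 0.5 * slope * slope
    return approx, exact


def check_exponential(slope):
    """Example 5.5 (b): f(x) = e^x has the conjugate x*(ln x* - 1) for x* > 0."""
    approx = conjugate_on_grid(math.exp, GRID, slope)
    exact = slope * (math.log(slope) - 1.0)
    return approx, exact
```

The grid maximum can only underestimate the true supremum, so agreement is expected up to a discretization error, and only for slopes whose maximizer lies inside the chosen interval.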

In the following proposition we calculate the conjugate functions of the infimal


function of a given function through a continuous linear operator and of the
infimal convolution of a finite family of functions, respectively.

Proposition 5.6 (a) For f : X → R a given function and A : X → Y a


continuous linear operator it holds (Af )∗ = f ∗ ◦ A∗ , where

A∗ : Y ∗ → X ∗ , hA∗ y ∗ , xi = hy ∗ , Axi ∀y ∗ ∈ Y ∗ ∀x ∈ X,

denotes the adjoint operator of A.


(b) For $f_i : X \to \mathbb{R}$, $i = 1, \ldots, m$, given functions it holds
\[ (f_1 \,\square\, \cdots \,\square\, f_m)^* = \sum_{i=1}^m f_i^*. \]

Proof. (a) By the definition of the conjugate function we have for all $y^* \in Y^*$
\begin{align*}
(Af)^*(y^*) &= \sup_{y \in Y}\{\langle y^*, y\rangle - (Af)(y)\} = \sup_{y \in Y}\Big\{\langle y^*, y\rangle - \inf_{x \in X,\, Ax = y} f(x)\Big\} \\
&= \sup_{x \in X}\{\langle y^*, Ax\rangle - f(x)\} = \sup_{x \in X}\{\langle A^*y^*, x\rangle - f(x)\} = (f^* \circ A^*)(y^*).
\end{align*}
(b) We define as in the proof of Proposition 3.14 the function $f : X \times ... \times X \to \mathbb{R}$,
$f(x_1, ..., x_m) = \sum_{i=1}^m f_i(x_i)$, and the continuous linear operator
$A : X \times ... \times X \to X$, $A(x_1, ..., x_m) = \sum_{i=1}^m x_i$, and recall that
$Af = f_1 \,\square\, \cdots \,\square\, f_m$. According to statement (a) we have
\[ (f_1 \,\square\, \cdots \,\square\, f_m)^* = (Af)^* = f^* \circ A^*. \]
The conclusion follows by using Proposition 5.4 and the fact that the adjoint
operator of $A$ is defined as $A^*x^* = (x^*, ..., x^*)$ for all $x^* \in X^*$.

The following result shows that an arbitrary function and its lower semicontinuous
hull have the same conjugate function.

Proposition 5.7 For $f : X \to \mathbb{R}$ a given function it holds $f^* = (\bar{f})^*$.

Proof. Since $\bar{f} \le f$, by Proposition 5.3 (c) we have $(\bar{f})^* \ge f^*$. We assume that
there exists $x^* \in X^*$ such that $(\bar{f})^*(x^*) > f^*(x^*)$. Then there exists $c \in \mathbb{R}$ such
that $(\bar{f})^*(x^*) > c > f^*(x^*)$. This means that $\langle x^*, x\rangle - f(x) \le c$ for all $x \in X$
or, equivalently, $\langle x^*, x\rangle - c \le f(x)$ for all $x \in X$. Since $x \mapsto \langle x^*, x\rangle - c$ is a
continuous minorant of $f$ and $\bar{f}$ is the largest lower semicontinuous minorant of $f$,
this implies that $\langle x^*, x\rangle - c \le \bar{f}(x)$ for all $x \in X$ or, equivalently,
$(\bar{f})^*(x^*) = \sup_{x \in X}\{\langle x^*, x\rangle - \bar{f}(x)\} \le c$.
Contradiction!

The following definition introduces the biconjugate function of a given func-


tion.

Definition 5.8 (biconjugate function) Let f : X → R be a given function. The


function
\[ f^{**} : X \to \mathbb{R}, \quad f^{**}(x) = \sup_{x^* \in X^*}\{\langle x^*, x\rangle - f^*(x^*)\}, \]

is called the biconjugate function of f .

If X is a reflexive Banach space, then f ∗∗ = (f ∗ )∗ can be seen as the conjugate


of the conjugate function of f .
By arguing as in Remark 5.2, one can see that f ∗∗ is convex and weakly lower
semicontinuous, therefore, norm lower semicontinuous on X.

According to the Young-Fenchel inequality we have for all $x \in X$ that
\[ \langle x^*, x\rangle - f^*(x^*) \le f(x) \quad \forall x^* \in X^*, \]
thus
\[ f^{**}(x) \le f(x). \]
Since $f^{**}$ is lower semicontinuous, in view of Remark 4.8, it holds
\[ f^{**}(x) \le \bar{f}(x) \le f(x) \quad \forall x \in X. \tag{5.1} \]
The following theorem shows that for proper, convex and lower semicontinuous
functions the above inequality holds as equality.
Theorem 5.9 (Fenchel-Moreau Theorem) If f : X → R is proper, convex and
lower semicontinuous, then f ∗ : X ∗ → R is proper and f = f ∗∗ .
Proof. We prove first that f ∗ is proper. If there exists z ∗ ∈ X ∗ such that
f ∗ (z ∗ ) = −∞, then f ∗∗ is identically +∞, thus, by (5.1), f is identically +∞.
Contradiction to f proper!
As f is proper, convex and lower semicontinuous, by Theorem 4.16 it follows
that there exists (x∗ , c) ∈ X ∗ × R such that hx∗ , xi + c ≤ f (x) for all x ∈ X. This
is equivalent to f ∗ (x∗ ) ≤ −c, which implies that f ∗ is not identically +∞. This
proves the properness of f ∗ .
The fact that f = f ∗∗ follows also from Theorem 4.16, namely, for all x ∈ X
it holds
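\begin{align*}
f^{**}(x) &= \sup_{x^* \in X^*}\{\langle x^*, x\rangle - f^*(x^*)\} = \sup_{x^* \in X^*,\, c \in \mathbb{R},\, f^*(x^*) \le -c}\{\langle x^*, x\rangle + c\} \\
&= \sup\{\langle x^*, x\rangle + c \mid (x^*, c) \in X^* \times \mathbb{R},\ \langle x^*, y\rangle + c \le f(y)\ \forall y \in X\} = f(x).
\end{align*}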
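
The identity f = f∗∗ of the Fenchel-Moreau Theorem, and its failure without convexity, can be observed on a grid. The sketch below is an illustration under assumptions not made in the notes: X = R, both suprema are replaced by maxima over finite grids, and the functions `abs` and `double_well` as well as the names `conjugate`, `biconjugate_at` and `biconjugate_gap` are hypothetical.

```python
def conjugate(values, xs, slopes):
    """Discrete conjugate: for each slope s, the maximum over the grid of s * x - f(x)."""
    return [max(s * x - fx for x, fx in zip(xs, values)) for s in slopes]


def biconjugate_at(x, slopes, conj_values):
    """Discrete biconjugate at x: the maximum over the slope grid of s * x - f*(s)."""
    return max(s * x - c for s, c in zip(slopes, conj_values))


XS = [i / 100.0 for i in range(-300, 301)]       # primal grid on [-3, 3]
SLOPES = [i / 100.0 for i in range(-500, 501)]   # dual grid on [-5, 5]


def biconjugate_gap(f, x):
    """Return f(x) - f**(x) for the grid approximation of f**."""
    values = [f(t) for t in XS]
    conj = conjugate(values, XS, SLOPES)
    return f(x) - biconjugate_at(x, SLOPES, conj)


def double_well(t):
    """A nonconvex function whose convex envelope vanishes on [-1, 1]."""
    return min((t - 1.0) ** 2, (t + 1.0) ** 2)
```

For the convex function `abs` the gap at a grid point is zero up to rounding, whereas for `double_well` the gap at 0 equals f(0) = 1, in line with the inequality f∗∗ ≤ f being strict where f is not convex.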



Having a nonempty convex and weakly$^*$ closed set $C \subseteq X^*$ and $x^* \notin C$,
according to the Hahn-Banach Strong Separation Theorem (in locally convex
spaces), there exists $x \in X$, $x \neq 0$, such that
\[ \inf_{c^* \in C}\langle c^*, x\rangle > \langle x^*, x\rangle. \]

As in the proof of Theorem 4.16, one can show that a function $g : X^* \to \mathbb{R}$ is
convex and weakly$^*$ lower semicontinuous with $g(x^*) > -\infty$ for all $x^* \in X^*$ if
and only if there exists $(x_0, c_0) \in X \times \mathbb{R}$ such that $g(x^*) \ge \langle x^*, x_0\rangle + c_0$ for all
$x^* \in X^*$ and
\[ g(x^*) = \sup\{\langle x^*, x\rangle + c \mid (x, c) \in X \times \mathbb{R},\ \langle y^*, x\rangle + c \le g(y^*)\ \forall y^* \in X^*\} \quad \forall x^* \in X^*. \]
This leads to the following version of the Fenchel-Moreau Theorem for convex
and weakly$^*$ lower semicontinuous functions defined on $X^*$.

Theorem 5.10 If g : X ∗ → R is proper, convex and weakly∗ lower semicontinu-


ous, then g ∗ : X → R is proper and g = g ∗∗ .

Remark 5.11 It is natural to ask whether it makes sense to continue the process
of defining conjugate functions of higher order for an arbitrary function f : X →
R. We will see that it does not make sense, since for

\[ f^{***} : X^* \to \mathbb{R}, \quad f^{***}(x^*) = \sup_{x \in X}\{\langle x^*, x\rangle - f^{**}(x)\}, \]
we always have $f^* = f^{***}$.


If f ∗ is proper, then the statement follows from Theorem 5.10. If f ∗ is identi-
cally +∞, then f ∗∗ is identically −∞ and f ∗∗∗ is identically +∞, thus f ∗ = f ∗∗∗ .
If there exists z ∗ ∈ X ∗ such that f ∗ (z ∗ ) = −∞, then f ∗∗ is identically +∞, which
yields that f ∗∗∗ is identically −∞. On the other hand, from (5.1) it follows that
f is identically +∞, and so f ∗ is identically −∞, thus f ∗ = f ∗∗∗ .

We will close this section by providing formulas for the conjugate of the com-
position of a proper, convex and lower semicontinuous function with a continuous
linear operator and for the sum of finitely many proper, convex and lower semicon-
tinuous functions, which we will obtain as a consequence of the Fenchel-Moreau
Theorem.
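To this end we will denote in the following by
\[ \overline{g}^{w^*} : X^* \to \mathbb{R}, \quad \overline{g}^{w^*}(x^*) = \inf\{t \mid (x^*, t) \in \operatorname{cl}_{w^*}(\operatorname{epi} g)\}, \]
the weakly$^*$ lower semicontinuous hull of a function $g : X^* \to \mathbb{R}$.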

Theorem 5.12 (Moreau-Rockafellar Theorem)

(a) For $f : Y \to \mathbb{R}$ a proper, convex and lower semicontinuous function and
$A : X \to Y$ a continuous linear operator fulfilling $A^{-1}(\operatorname{dom} f) \neq \emptyset$ it holds
\[ (f \circ A)^* = \overline{A^*f^*}^{w^*}. \]

(b) For $f_i : X \to \mathbb{R}$, $i = 1, \ldots, m$, proper, convex and lower semicontinuous
functions fulfilling $\cap_{i=1}^m \operatorname{dom} f_i \neq \emptyset$ it holds
\[ (f_1 + ... + f_m)^* = \overline{f_1^* \,\square\, \cdots \,\square\, f_m^*}^{w^*}. \]

Proof. (a) According to Proposition 3.13, the function $A^*f^* : X^* \to \mathbb{R}$ is convex,
thus $\overline{A^*f^*}^{w^*} : X^* \to \mathbb{R}$ is convex and weakly$^*$ lower semicontinuous. According
to Proposition 5.6, the counterpart of Proposition 5.7 for the weakly$^*$ lower
semicontinuous hull, and Theorem 5.10 it holds
\[ \left(\overline{A^*f^*}^{w^*}\right)^*(x) = (A^*f^*)^*(x) = (f^{**} \circ A)(x) = (f \circ A)(x) \quad \forall x \in X. \tag{5.2} \]
Next we will prove that $\overline{A^*f^*}^{w^*}$ is proper. If it is identically $+\infty$, then, according
to (5.2), $f \circ A$ is identically $-\infty$, which is a contradiction to $f$ proper. If there
exists $z^* \in X^*$ such that $\overline{A^*f^*}^{w^*}(z^*) = -\infty$, then $f \circ A$ is identically $+\infty$, which
is a contradiction to the fact that $A^{-1}(\operatorname{dom} f) \neq \emptyset$.
Using again Theorem 5.10, it yields
\[ \overline{A^*f^*}^{w^*}(x^*) = \left(\overline{A^*f^*}^{w^*}\right)^{**}(x^*) = (f \circ A)^*(x^*) \quad \forall x^* \in X^*. \]
(b) We consider the proper, convex and lower semicontinuous function
$f : X \times ... \times X \to \mathbb{R}$, $f(x_1, ..., x_m) = \sum_{i=1}^m f_i(x_i)$, and the continuous linear operator
$A : X \to X \times ... \times X$, $Ax = (x, ..., x)$. Then $A^{-1}(\operatorname{dom} f) \neq \emptyset$ and, according to
(a),
\[ (f_1 + ... + f_m)^* = (f \circ A)^* = \overline{A^*f^*}^{w^*} = \overline{f_1^* \,\square\, \cdots \,\square\, f_m^*}^{w^*}. \]



6 Convex subdifferential
Definition 6.1 ((convex) subdifferential) Let f : X → R be a given function
and x ∈ X be such that f (x) ∈ R. The set defined as

\[ \partial f(x) := \{x^* \in X^* \mid f(y) - f(x) \ge \langle x^*, y - x\rangle\ \forall y \in X\} \quad \text{for } f(x) \in \mathbb{R}, \]
and $\partial f(x) := \emptyset$ for $f(x) \notin \mathbb{R}$, is called the (convex) subdifferential of $f$ at $x$.
The elements of $\partial f(x)$ are called (convex) subgradients of $f$ at $x$. Therefore, the
(convex) subdifferential can be seen as a set-valued operator $\partial f : X \rightrightarrows X^*$ from
$X$ to $X^*$.

Remark 6.2 The (convex) subdifferential of a function f : X → R at x ∈ X


is a convex and weakly∗ closed subset of X ∗ , however, it can be empty even if
f (x) ∈ R.
For $f(x) \in \mathbb{R}$ we have
\[ \partial f(x) = \bigcap_{y \in \operatorname{dom} f}\{x^* \in X^* \mid \langle x^*, y\rangle - f(y) \le \langle x^*, x\rangle - f(x)\} \]
and that for all $y \in \operatorname{dom} f$ the set $\{x^* \in X^* \mid \langle x^*, y\rangle - f(y) \le \langle x^*, x\rangle - f(x)\}$
is convex and weakly$^*$ closed, being either the empty set (if $f(y) = -\infty$) or the
lower level set of the affine and weakly$^*$ continuous mapping $x^* \mapsto \langle x^*, y\rangle - f(y)$.
The function
\[ f : \mathbb{R} \to \mathbb{R}, \quad f(x) = \begin{cases} -\sqrt{x}, & \text{if } x \ge 0, \\ +\infty, & \text{otherwise,} \end{cases} \]
is proper, convex and lower semicontinuous, $f(0) = 0$, however, $\partial f(0) = \emptyset$.
Indeed, assuming that there exists $x^* \in \partial f(0)$, then one has that for all $y > 0$
\[ -\sqrt{y} \ge x^* y \quad \text{or, equivalently,} \quad -\frac{1}{\sqrt{y}} \ge x^*. \]
Contradiction!
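
The contradiction above can be made concrete: for every candidate slope one can exhibit a point y > 0 at which the subgradient inequality at 0 fails. The following sketch only illustrates this argument; the helper name `violating_point` and the particular choice of y are assumptions made for the example.

```python
import math


def f(x):
    """The function from Remark 6.2: -sqrt(x) on [0, +inf), +inf elsewhere."""
    return -math.sqrt(x) if x >= 0 else math.inf


def violating_point(slope):
    """Return some y > 0 with f(y) - f(0) < slope * (y - 0).

    Such a y certifies that `slope` is not a subgradient of f at 0.
    For slope < 0 the inequality fails whenever y < 1 / slope**2;
    y = 1 / (4 * slope**2) is one hypothetical choice.
    """
    y = 1.0 if slope >= 0 else 1.0 / (4.0 * slope * slope)
    assert f(y) - f(0.0) < slope * y
    return y
```

Since a witness exists for every real slope, no real number can belong to ∂f(0), which matches the claim ∂f(0) = ∅.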

The following proposition shows that the Young-Fenchel inequality for a func-
tion f : X → R is fulfilled as equality only for pairs (x, x∗ ) lying on the graph of
its (convex) subdifferential, namely, fulfilling x∗ ∈ ∂f (x).

Proposition 6.3 Let $f : X \to \mathbb{R}$ be a given function, $x \in X$ and $x^* \in X^*$. Then
\[ x^* \in \partial f(x) \Leftrightarrow f^*(x^*) + f(x) = \langle x^*, x\rangle. \]

Proof. We have that $x^* \in \partial f(x)$ if and only if $f(x) \in \mathbb{R}$ and
$f^*(x^*) = \sup_{y \in X}\{\langle x^*, y\rangle - f(y)\} \le \langle x^*, x\rangle - f(x)$, which is further equivalent to $f(x) \in \mathbb{R}$
and $f^*(x^*) + f(x) \le \langle x^*, x\rangle$. In view of the Young-Fenchel inequality (Proposition
5.3 (a)), this is further equivalent to $f(x) \in \mathbb{R}$ and $f^*(x^*) + f(x) = \langle x^*, x\rangle$ and
finally to $f^*(x^*) + f(x) = \langle x^*, x\rangle$.

Next we present some consequences of the nonemptiness of the (convex) sub-


differential of an arbitrary function at a given point.

Proposition 6.4 Let $f : X \to \mathbb{R}$ be a given function and $x \in X$ be such that
$\partial f(x) \neq \emptyset$. The following statements are true:

(a) $f^{**}(x) = \bar{f}(x) = f(x)$, $f$ is lower semicontinuous at $x$, and $f$, $\bar{f}$ and $f^{**}$ are
proper;

(b) $\partial f^{**}(x) = \partial\bar{f}(x) = \partial f(x)$;

(c) if $f$ is convex, then $\bar{f} = f^{**}$.

Proof. (a) Let $x^* \in \partial f(x)$. Then $f(x) \in \mathbb{R}$ and the function $h : X \to \mathbb{R}$,
$h(y) = \langle x^*, y\rangle + f(x) - \langle x^*, x\rangle$, is an affine-continuous minorant of $f$. According
to the proof of Theorem 5.9 we have that $f^{**}$ is the pointwise supremum of the
family of affine-continuous minorants of $f$, which, in combination with (5.1), gives
\[ h \le f^{**} \le \bar{f} \le f. \tag{6.1} \]
Taking into consideration that $h(x) = f(x)$, we deduce that $h(x) = f^{**}(x) =
\bar{f}(x) = f(x)$, which implies that $f$ is lower semicontinuous at $x$. The properness
of $f$, $\bar{f}$ and $f^{**}$ follows from (6.1).
(b) It follows by using Proposition 6.3, statement (a) and that (see Proposition
5.7 and Remark 5.11)
\[ f^*(x^*) = (\bar{f})^*(x^*) = f^{***}(x^*) \quad \forall x^* \in X^*. \]
(c) If $f$ is convex, then $\bar{f}$ is proper, convex and lower semicontinuous, thus,
according to Theorem 5.9, it holds
\[ \bar{f} = (\bar{f})^{**} = ((\bar{f})^*)^* = f^{**}. \]

In the following proposition we collect some properties of the (convex)
subdifferential.

Proposition 6.5 Let f : X → R be a given function. The following statements


are true:
(a) 0 ∈ ∂f (x) ⇔ f (x) ∈ R and f (x) = miny∈X f (y);

(b) ∂(λf )(x) = λ∂f (x) ∀x ∈ X ∀λ > 0;

(c) if, for z ∈ X, fz : X → R, fz (x) = f (x + z), then ∂fz (x) = ∂f (x + z)


∀x ∈ X;

(d) if, for z ∗ ∈ X ∗ , fz∗ : X → R, fz∗ (x) = f (x) + hz ∗ , xi, then ∂fz∗ (x) =
∂f (x) + z ∗ ∀x ∈ X.

The following proposition provides the formula of the (convex) subdifferential


of a separable function.

Proposition 6.6 Let $X_i$, $i = 1, ..., m$, be nontrivial real normed spaces, $f_i : X_i \to \mathbb{R}$,
$i = 1, ..., m$, given functions and
\[ f : X_1 \times ... \times X_m \to \mathbb{R}, \quad f(x_1, ..., x_m) = \sum_{i=1}^m f_i(x_i). \]
Then
\[ \partial f(x_1, ..., x_m) = \partial f_1(x_1) \times ... \times \partial f_m(x_m) \quad \forall (x_1, ..., x_m) \in X_1 \times ... \times X_m. \]

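Example 6.7 (a) Let $C \subseteq X$ be a given set. Then the (convex) subdifferential
of its indicator function $\delta_C$ reads
\[ \partial\delta_C(x) = \{x^* \in X^* \mid \langle x^*, y - x\rangle \le 0\ \forall y \in C\} = N_C(x) \quad \text{for } x \in C \]
and $\partial\delta_C(x) = \emptyset$ for $x \notin C$.
(b) It holds
\[ \partial\|\cdot\|(x) = \begin{cases} \{x^* \in X^* \mid \|x^*\|_* \le 1\}, & \text{for } x = 0, \\ \{x^* \in X^* \mid \|x^*\|_* = 1,\ \langle x^*, x\rangle = \|x\|\}, & \text{for } x \neq 0. \end{cases} \]
According to Proposition 6.3 and Example 5.5 (g) we have
\[ x^* \in \partial\|\cdot\|(x) \Leftrightarrow \|\cdot\|^*(x^*) + \|x\| = \langle x^*, x\rangle \Leftrightarrow \|x^*\|_* \le 1 \text{ and } \langle x^*, x\rangle = \|x\|. \]
If $x = 0$, then everything is clear. If $x \neq 0$, then the statement follows by taking
into account that $\|x\| = \langle x^*, x\rangle \le \|x^*\|_*\|x\|$, thus $1 \le \|x^*\|_*$.
(c) It holds
\[ \partial\left(\frac{1}{2}\|\cdot\|^2\right)(x) = \{x^* \in X^* \mid \|x^*\|_* = \|x\|,\ \langle x^*, x\rangle = \|x\|\|x^*\|_*\} \quad \forall x \in X. \]
Let $x \in X$. According to Proposition 6.3 and Example 5.5 (h) we have
\begin{align*}
x^* \in \partial\left(\frac{1}{2}\|\cdot\|^2\right)(x) &\Leftrightarrow \frac{1}{2}\|x^*\|_*^2 + \frac{1}{2}\|x\|^2 = \langle x^*, x\rangle \le \|x\|\|x^*\|_* \le \frac{1}{2}\|x^*\|_*^2 + \frac{1}{2}\|x\|^2 \\
&\Leftrightarrow \langle x^*, x\rangle = \|x\|\|x^*\|_* \text{ and } \|x\| = \|x^*\|_*.
\end{align*}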
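
The description of the subdifferential of the norm in Example 6.7 (b) can be probed numerically in a simple special case. The sketch below assumes X = R^2 with the Euclidean norm (so that the dual norm is again Euclidean) and tests the subgradient inequality only on a finite sample of points, which gives a necessary condition, not a proof; the names `is_subgradient_of_norm` and `SAMPLES` are hypothetical.

```python
import math


def norm(v):
    """Euclidean norm on R^2 (an assumption of this sketch)."""
    return math.hypot(v[0], v[1])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1]


# Finite sample of test points y; it contains (0, 0), (1, 0) and (3, 4).
SAMPLES = [(i / 2.0, j / 2.0) for i in range(-10, 11) for j in range(-10, 11)]


def is_subgradient_of_norm(g, x, samples=SAMPLES):
    """Check ||y|| - ||x|| >= <g, y - x> for all sampled y (necessary condition only)."""
    return all(
        norm(y) - norm(x) >= dot(g, (y[0] - x[0], y[1] - x[1])) - 1e-12
        for y in samples
    )
```

At x = (3, 4) the vector x/‖x‖ = (0.6, 0.8) passes the sampled test while (1, 0) fails it, and at x = 0 vectors of norm at most 1 pass while (1.2, 0) fails, in agreement with the two cases of the formula.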

The next result displays some connections between the (convex) subdifferen-
tial of a given function f and the one of its conjugate.

Theorem 6.8 Let f : X → R be a given function and x ∈ X. The following


statements are true:

(a) if x∗ ∈ ∂f (x), then x ∈ ∂f ∗ (x∗ );

(b) if f (x) = f ∗∗ (x), then x∗ ∈ ∂f (x) if and only if x ∈ ∂f ∗ (x∗ ).

(c) if f is proper, convex and lower semicontinuous, then x∗ ∈ ∂f (x) if and


only if x ∈ ∂f ∗ (x∗ ).

Proof. (a) Since $x^* \in \partial f(x)$, according to Proposition 6.3 we have
$f(x) + f^*(x^*) = \langle x^*, x\rangle$. But $f^{**}(x) \le f(x)$, and thus $f^{**}(x) + f^*(x^*) \le \langle x^*, x\rangle$. As
the reverse inequality is always fulfilled, using once more Proposition 6.3, we get
$x \in \partial f^*(x^*)$.
(b) Because of (a) only the sufficiency must be proven. For any $x \in \partial f^*(x^*)$,
again by Proposition 6.3, it holds $\langle x^*, x\rangle = f^*(x^*) + f^{**}(x) = f^*(x^*) + f(x)$ and
therefore $x^* \in \partial f(x)$.
(c) Theorem 5.9 yields $f = f^{**}$ and the equivalence follows from (b).

The next theorem shows that proper and convex functions defined on a
normed space have nonempty (convex) subdifferentials at their points of con-
tinuity.

Theorem 6.9 Let f : X → R be a proper and convex function and x ∈ dom f


such that f is continuous at x. Then ∂f (x) 6= ∅ and ∂f (x) is norm bounded, thus
weakly∗ compact.

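Proof. Let $\delta > 0$ such that $f(y) < f(x) + 1$ for all $y \in B(x,\delta)$. Then
$B(x,\delta) \times [f(x)+1, +\infty) \subseteq \operatorname{epi} f$, which means that $\operatorname{int}(\operatorname{epi} f) \neq \emptyset$. We have that
$(x, f(x)) \notin \operatorname{int}(\operatorname{epi} f)$, since, otherwise, there would exist $\eta > 0$ such that
$(x, f(x) - \eta) \in \operatorname{epi} f$, which is impossible. Taking into account that $\operatorname{int}(\operatorname{epi} f)$ is convex, by
the Hahn-Banach Separation Theorem (Theorem 2.19) there exists
$(x^*, \alpha) \in X^* \times \mathbb{R}$, $(x^*, \alpha) \neq (0, 0)$, such that
\[ \langle x^*, y\rangle + \alpha r \ge \langle x^*, x\rangle + \alpha f(x) \quad \forall (y, r) \in \operatorname{cl}(\operatorname{epi} f) = \operatorname{cl}(\operatorname{int}(\operatorname{epi} f)). \tag{6.2} \]
Choosing $(y, r) := (x, f(x) + 1) \in \operatorname{epi} f$ in (6.2), it yields $\alpha \ge 0$. If $\alpha = 0$, then
from (6.2) it follows that
\[ \langle x^*, y - x\rangle \ge 0 \quad \forall y \in \operatorname{dom} f, \]
therefore,
\[ \langle x^*, u\rangle \ge 0 \quad \forall u \in B(0,\delta). \]
This further implies that $x^* = 0$, which leads to a contradiction. Consequently,
$\alpha > 0$. We divide (6.2) by $\alpha$ and, using that for all $y \in \operatorname{dom} f$ it holds
$(y, f(y)) \in \operatorname{epi} f$, we get
\[ f(y) \ge \left\langle -\frac{1}{\alpha}x^*, y - x\right\rangle + f(x) \quad \forall y \in \operatorname{dom} f, \]
which gives
\[ f(y) \ge \left\langle -\frac{1}{\alpha}x^*, y - x\right\rangle + f(x) \quad \forall y \in X, \]
thus $-\frac{1}{\alpha}x^* \in \partial f(x)$.
In order to prove the second statement of the theorem we consider an arbitrary
element $x^* \in \partial f(x)$. Then, for every $u \in B(0,\delta)$, it holds
$\langle x^*, u\rangle \le f(x + u) - f(x) < 1$. This means that
\[ \delta\|x^*\|_* = \delta\sup_{\|v\|<1}|\langle x^*, v\rangle| = \sup_{\|v\|<1}|\langle x^*, \delta v\rangle| = \sup_{\|u\|<\delta}|\langle x^*, u\rangle| = \sup_{\|u\|<\delta}\langle x^*, u\rangle \le 1, \]
which proves that $x^* \in B^*\left(0, \frac{1}{\delta}\right)$. We have shown that $\partial f(x) \subseteq \frac{1}{\delta}B^*(0,1)$, in
other words, that $\partial f(x)$ is norm bounded.
According to the Theorem of Banach-Alaoglu (see, for instance, [8]), the closed
unit ball of $X^*$ is weakly$^*$ compact and, since $\partial f(x)$ is weakly$^*$ closed, it is weakly$^*$
compact, too.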

We recall that, if X is a Banach space, then, according to the Uniform Bound-


edness Principle, every weakly∗ compact subset of X ∗ is norm bounded and, of
course, weakly∗ closed.
In finite-dimensional spaces the (convex) subdifferential of a proper and convex
function is nonempty even at points in the relative interior of its effective
domain (at which f need not be continuous).

Theorem 6.10 If f : Rn → R is a proper and convex function and x ∈ ri(dom f ),


then ∂f (x) 6= ∅.

Proof. Since f is a proper and convex function, its epigraph is a nonempty


convex set. According to Theorem 2.18 (see also Exercise 20) we have

ri(epi f ) = {(x, r) ∈ Rn × R | x ∈ ri(dom f ), f (x) < r},

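which means that $(x, f(x)) \notin \operatorname{ri}(\operatorname{epi} f)$. By Theorem 2.22, there exists
$(a, \alpha) \in \mathbb{R}^n \times \mathbb{R}$, $(a, \alpha) \neq (0, 0)$, such that
\[ \inf\{a^T y + \alpha r \mid (y, r) \in \operatorname{epi} f\} \ge a^T x + \alpha f(x) \tag{6.3} \]
and
\[ \sup\{a^T y + \alpha r \mid (y, r) \in \operatorname{epi} f\} > a^T x + \alpha f(x). \tag{6.4} \]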
As in the proof of Theorem 6.9, from (6.3) it yields that α ≥ 0. If α = 0, then
the two above statements become

inf{aT y | y ∈ dom f } ≥ aT x (6.5)

and, respectively,
sup{aT y | y ∈ dom f } > aT x. (6.6)
We will prove that aT z = aT x for all z ∈ dom f , which will then contradict (6.6).
Indeed, since x ∈ ri(dom f ), there exists ε > 0 such that B(x, ε) ∩ aff(dom f ) ⊆
dom f . Let z ∈ dom f be arbitrarily chosen. Then there exists λ < 0 such
that x + λ(z − x) ∈ B(x, ε) and, since x + λ(z − x) ∈ aff(dom f ), it yields
x + λ(z − x) ∈ dom f . According to (6.5) it holds aT (x + λ(z − x)) ≥ aT x
or, equivalently, aT z ≤ aT x. Since the opposite inequality is also true, we get
aT z = aT x.
This proves that α > 0. The conclusion follows as in the proof of Theorem
6.9 after dividing (6.3) by α. 

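Example 6.11 The (convex) subdifferential of a proper and convex function
$f : \mathbb{R}^n \to \mathbb{R}$ at an element $x \in \operatorname{ri}(\operatorname{dom} f)$ is not necessarily bounded. Indeed,
the function $\delta_{\mathbb{R} \times \{0\}} : \mathbb{R}^2 \to \mathbb{R}$ is proper and convex and it holds
$\operatorname{ri}\left(\operatorname{dom}\delta_{\mathbb{R} \times \{0\}}\right) = \mathbb{R} \times \{0\}$. For every $(x, 0) \in \mathbb{R} \times \{0\}$ it holds
\[ \partial\delta_{\mathbb{R} \times \{0\}}(x, 0) = N_{\mathbb{R} \times \{0\}}(x, 0) = \{0\} \times \mathbb{R}. \]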



7 Directional derivative and differentiability


The main aim of this section is to introduce a directional derivative notion for
convex functions and to characterize the (convex) subdifferential in terms of it.
Theorem 7.1 (directional derivative) Let $f : X \to \mathbb{R}$ be a proper and convex
function and $x \in \operatorname{dom} f$. Then for every $u \in X$ there exists
\[ f'(x; u) := \lim_{t \downarrow 0}\frac{f(x + tu) - f(x)}{t} = \inf_{t > 0}\frac{f(x + tu) - f(x)}{t} \in \overline{\mathbb{R}}, \]
which is called the directional derivative of $f$ at $x$ in direction $u$. The function
$u \mapsto f'(x; u)$ is sublinear and $\operatorname{dom} f'(x; \cdot) = \operatorname{cone}(\operatorname{dom} f - x)$. If $x \in \operatorname{core}(\operatorname{dom} f)$,
then $f'(x; u) \in \mathbb{R}$ for all $u \in X$.
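Proof. Let $u \in X$ and define
\[ \varphi : (0, +\infty) \to \overline{\mathbb{R}}, \quad \varphi(t) = \frac{f(x + tu) - f(x)}{t}. \]
We will show that $\varphi$ is nondecreasing, in other words, that for $0 < t_1 < t_2$ it
holds $\varphi(t_1) \le \varphi(t_2)$. This will imply that $f'(x; u) = \lim_{t \downarrow 0}\varphi(t) = \inf_{t > 0}\varphi(t) \in \overline{\mathbb{R}}$.
Indeed, let $0 < t_1 < t_2$. Then, using the convexity of $f$, it yields
\[ f(x + t_1 u) = f\left(\left(1 - \frac{t_1}{t_2}\right)x + \frac{t_1}{t_2}(x + t_2 u)\right) \le \left(1 - \frac{t_1}{t_2}\right)f(x) + \frac{t_1}{t_2}f(x + t_2 u) \]
and, from here,
\[ \varphi(t_1) = \frac{f(x + t_1 u) - f(x)}{t_1} \le \frac{f(x + t_2 u) - f(x)}{t_2} = \varphi(t_2). \]
Next we will show that $u \mapsto f'(x; u)$ is sublinear. Obviously, $f'(x; 0) = 0$. For
$\lambda > 0$ and $u \in X$ it holds
\begin{align*}
f'(x; \lambda u) &= \lim_{t \downarrow 0}\frac{f(x + t\lambda u) - f(x)}{t} = \lambda\lim_{t \downarrow 0}\frac{f(x + \lambda t u) - f(x)}{\lambda t} \\
&= \lambda\lim_{s \downarrow 0}\frac{f(x + su) - f(x)}{s} = \lambda f'(x; u),
\end{align*}
which proves that $u \mapsto f'(x; u)$ is positively homogeneous.
Moreover, for $u_1, u_2 \in X$ it holds, by using again that $f$ is convex,
\begin{align*}
f'(x; u_1 + u_2) &= \lim_{t \downarrow 0}\frac{f(x + t(u_1 + u_2)) - f(x)}{t} \\
&= \lim_{t \downarrow 0}\frac{f\left(\frac{1}{2}(x + 2tu_1) + \frac{1}{2}(x + 2tu_2)\right) - f(x)}{t} \\
&\le \lim_{t \downarrow 0}\frac{f(x + 2tu_1) - f(x)}{2t} + \lim_{t \downarrow 0}\frac{f(x + 2tu_2) - f(x)}{2t} \\
&= f'(x; u_1) + f'(x; u_2),
\end{align*}
which proves that $u \mapsto f'(x; u)$ is subadditive.
Since
\begin{align*}
u \in \operatorname{dom} f'(x; \cdot) &\Leftrightarrow f'(x; u) < +\infty \Leftrightarrow \exists t > 0 \text{ such that } \frac{f(x + tu) - f(x)}{t} < +\infty \\
&\Leftrightarrow \exists t > 0 \text{ such that } x + tu \in \operatorname{dom} f \\
&\Leftrightarrow \exists t > 0 \text{ such that } u \in \frac{1}{t}(\operatorname{dom} f - x),
\end{align*}
it follows that $\operatorname{dom} f'(x; \cdot) = \operatorname{cone}(\operatorname{dom} f - x)$.
Assume now that $x \in \operatorname{core}(\operatorname{dom} f)$. Since $\operatorname{dom} f$ is a convex set, it yields
$\operatorname{dom} f'(x; \cdot) = \operatorname{cone}(\operatorname{dom} f - x) = X$. In other words, $f'(x; u) < +\infty$ for all
$u \in X$. Assuming that there exists $\tilde{u} \in X$ such that $f'(x; \tilde{u}) = -\infty$, according to
Proposition 3.7 it holds $f'(x; u) = -\infty$ for all $u \in \operatorname{icr}(\operatorname{dom} f'(x; \cdot)) = X$, which
is a contradiction to $f'(x; 0) = 0$. This proves that $f'(x; u) \in \mathbb{R}$ for all $u \in X$.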
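
The monotonicity of the difference quotient used in the proof of Theorem 7.1 is easy to observe numerically. The sketch below assumes X = R and uses the hypothetical sample function `sample_f`(x) = x² + |x|, which is convex but not differentiable at 0; the step sizes in `TS` and the helper names are likewise illustrative choices.

```python
def difference_quotient(f, x, u, t):
    """The quotient (f(x + t*u) - f(x)) / t whose limit for t -> 0+ is f'(x; u)."""
    return (f(x + t * u) - f(x)) / t


def sample_f(x):
    """Illustrative convex function with a kink at 0 (an assumption of this sketch)."""
    return x * x + abs(x)


# Decreasing step sizes 1, 1/2, 1/4, ...; powers of two keep the arithmetic exact here.
TS = [2.0 ** (-k) for k in range(20)]


def quotients(f, x, u):
    return [difference_quotient(f, x, u, t) for t in TS]
```

For `sample_f` at x = 0 the quotient equals t + 1 in both directions u = 1 and u = −1, so it decreases monotonically towards 1 as t decreases; in particular f'(0; 1) = f'(0; −1) = 1, and f'(0; ·) is sublinear but not linear.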

The next theorem provides a formulation for the (convex) subdifferential of a


convex function in terms of its directional derivative.

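Theorem 7.2 Let $f : X \to \mathbb{R}$ be a proper and convex function and $x \in \operatorname{dom} f$.
Then
\[ \partial f(x) = \{x^* \in X^* \mid f'(x; u) \ge \langle x^*, u\rangle\ \forall u \in X\}. \]

Proof. "$\subseteq$" Let $x^* \in \partial f(x)$. For all $u \in X$ and all $t > 0$ it holds
\[ \frac{f(x + tu) - f(x)}{t} \ge \frac{1}{t}\langle x^*, x + tu - x\rangle = \langle x^*, u\rangle, \]
which implies that $f'(x; u) \ge \langle x^*, u\rangle$.
"$\supseteq$" Let $x^* \in X^*$ such that $f'(x; u) \ge \langle x^*, u\rangle$ for all $u \in X$. Then for all
$u \in X$ it holds
\begin{align*}
\langle x^*, u\rangle \le f'(x; u) &= \lim_{t \downarrow 0,\, t \le 1}\frac{f((1 - t)x + t(x + u)) - f(x)}{t} \\
&\le \lim_{t \downarrow 0,\, t \le 1}\frac{t(f(x + u) - f(x))}{t} = f(x + u) - f(x).
\end{align*}
Consequently,
\[ f(y) - f(x) \ge \langle x^*, y - x\rangle \quad \forall y \in X, \]
which implies $x^* \in \partial f(x)$.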

On the other hand, we have a formulation for the directional derivative of a


function in terms of its (convex) subdifferential.

Theorem 7.3 Let $f : X \to \mathbb{R}$ be a proper and convex function such that $f$ is
continuous at $x \in \operatorname{dom} f$. Then $f'(x; \cdot)$ is continuous and
\[ f'(x; u) = \max\{\langle x^*, u\rangle \mid x^* \in \partial f(x)\} \quad \forall u \in X. \]

Proof. Let $\delta > 0$ such that $f(y) - f(x) < 1$ for all $y \in B(x,\delta)$. Then
$B(x,\delta) \subseteq \operatorname{dom} f$, thus $x \in \operatorname{int}(\operatorname{dom} f) = \operatorname{core}(\operatorname{dom} f)$. According to Theorem 7.1 it holds
$f'(x; u) \in \mathbb{R}$ for all $u \in X$. On the other hand, for all $u \in B(0,\delta)$ we have
\[ f'(x; u) = \inf_{t > 0}\frac{f(x + tu) - f(x)}{t} \le f(x + u) - f(x) < 1, \]
in other words, $f'(x; \cdot)$ is bounded above on $B(0,\delta)$. According to Theorem 4.18,
$f'(x; \cdot)$ is continuous on $\operatorname{int}(\operatorname{dom} f'(x; \cdot)) = X$, which proves the first statement
of the theorem.
For all $x^* \in X^*$ we have
\begin{align*}
(f'(x; \cdot))^*(x^*) &= \sup_{u \in X}\left\{\langle x^*, u\rangle - \inf_{t > 0}\frac{f(x + tu) - f(x)}{t}\right\} \\
&= \sup_{u \in X,\, t > 0}\left\{\frac{\langle x^*, tu\rangle - f(x + tu) + f(x)}{t}\right\}.
\end{align*}
If $x^* \in \partial f(x)$, then for all $u \in X$ and all $t > 0$ it holds
\[ \frac{\langle x^*, tu\rangle - f(x + tu) + f(x)}{t} \le 0. \]
Since for $u := 0$ and $t := 1$ the fraction is equal to $0$, it yields $(f'(x; \cdot))^*(x^*) = 0$.
If $x^* \notin \partial f(x)$, then there exists $y \in X$ such that $\langle x^*, y - x\rangle - f(y) + f(x) > 0$,
thus, for all $t > 0$ (by taking $u := \frac{y - x}{t}$) it holds
\[ (f'(x; \cdot))^*(x^*) \ge \frac{\langle x^*, y - x\rangle - f(y) + f(x)}{t}. \]
We let $t \to 0$ and this implies $(f'(x; \cdot))^*(x^*) = +\infty$. This proves that
$(f'(x; \cdot))^* = \delta_{\partial f(x)}$.
By the Fenchel-Moreau Theorem we have for all $u \in X$
\[ f'(x; u) = (f'(x; \cdot))^{**}(u) = (\delta_{\partial f(x)})^*(u) = \sigma_{\partial f(x)}(u) = \sup\{\langle x^*, u\rangle \mid x^* \in \partial f(x)\}. \]
It follows from the fact that $\partial f(x)$ is weakly$^*$ compact (see Theorem 6.9) that
the supremum is attained.
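
The max formula of Theorem 7.3 can be checked on a small finite-dimensional example. The sketch below assumes X = R^2 and the hypothetical test function f = ‖·‖₁ at the point x = (1, 0), where the subdifferential is {1} × [−1, 1]; since a linear functional attains its maximum over this segment at an endpoint, only the two endpoints listed in `VERTICES` are evaluated. The one-sided difference quotient with a fixed small step stands in for the directional derivative.

```python
def l1_norm(v):
    """The l1 norm on R^2 (an illustrative choice, not taken from the notes)."""
    return abs(v[0]) + abs(v[1])


def directional_derivative(f, x, u, t=1e-6):
    """One-sided difference quotient as a numerical stand-in for f'(x; u)."""
    shifted = (x[0] + t * u[0], x[1] + t * u[1])
    return (f(shifted) - f(x)) / t


# Endpoints of the subdifferential {1} x [-1, 1] of the l1 norm at x = (1, 0).
VERTICES = [(1.0, -1.0), (1.0, 1.0)]


def support_value(u):
    """max{<g, u> : g in the subdifferential}, evaluated at the endpoints."""
    return max(g[0] * u[0] + g[1] * u[1] for g in VERTICES)
```

For directions such as u = (−1, 2) both quantities equal u₁ + |u₂|, which is the value predicted by the theorem for this particular function.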

Theorem 7.2 and Theorem 7.3 have important consequences when applied to
convex and Gâteaux differentiable functions. We recall that a proper function
$f : X \to \mathbb{R}$ is said to be Gâteaux differentiable at $x \in \operatorname{dom} f$ if there exists
$x^* \in X^*$ such that
\[ \langle x^*, u\rangle = \lim_{t \to 0}\frac{f(x + tu) - f(x)}{t} \quad \forall u \in X. \]
The functional $x^*$ is called the Gâteaux differential (derivative) of $f$ at $x$, and it is
denoted by $\nabla f(x)$.

Remark 7.4 (a) We recall that a proper function $f : X \to \mathbb{R}$ is said to be
Fréchet differentiable at $x \in \operatorname{int}(\operatorname{dom} f)$ if there exists $x^* \in X^*$ such that
\[ \lim_{u \to 0}\frac{f(x + u) - f(x) - \langle x^*, u\rangle}{\|u\|} = 0. \]
In this case $f$ is continuous and Gâteaux differentiable at $x$. The functional $x^*$ is
called the Fréchet differential (derivative) of $f$ at $x$, and is consequently also denoted
by $\nabla f(x)$.
(b) The function
\[ f : \mathbb{R}^2 \to \mathbb{R}, \quad f(x_1, x_2) = \begin{cases} \dfrac{x_1^3 x_2}{x_1^4 + x_2^2}, & \text{if } (x_1, x_2) \neq (0, 0), \\ 0, & \text{otherwise,} \end{cases} \]
is continuous and Gâteaux differentiable at $(0, 0)$, but it is not Fréchet
differentiable at $(0, 0)$.
(c) If f : X → R is proper and Gâteaux differentiable on B(x, δ) ⊆ dom f , for
δ > 0, and the Gâteaux derivative ∇f : B(x, δ) → X ∗ is continuous at x, then f
is Fréchet differentiable at x.
In order to show this, we consider an arbitrary u ∈ B(0, δ), u ≠ 0. Then
there exists ε > 0 such that x + tu ∈ B(x, δ) for all t ∈ (−ε, 1 + ε). We define
φ : (−ε, 1 + ε) → R, φ(t) = f(x + tu) − f(x) − t⟨∇f(x), u⟩. For all t ∈ (−ε, 1 + ε)
it holds

\[ \varphi'(t) = \lim_{s \to 0} \frac{\varphi(t+s) - \varphi(t)}{s} = \lim_{s \to 0} \frac{f(x + (t+s)u) - f(x + tu)}{s} - \langle \nabla f(x), u\rangle = \langle \nabla f(x + tu) - \nabla f(x), u\rangle, \]

which also means that φ is differentiable on (−ε, 1 + ε) and therefore continuous
on [0, 1]. By the Mean Value Theorem there exists λ ∈ (0, 1) such that

\[ |f(x + u) - f(x) - \langle \nabla f(x), u\rangle| = |\varphi(1)| = |\varphi(1) - \varphi(0)| = |\varphi'(\lambda)| \le \|\nabla f(x + \lambda u) - \nabla f(x)\|_* \|u\|, \]

consequently,

\[ \frac{|f(x + u) - f(x) - \langle \nabla f(x), u\rangle|}{\|u\|} \le \sup_{t \in [0,1]} \|\nabla f(x + tu) - \nabla f(x)\|_*. \]

The Fréchet differentiability of f at x follows from the fact that

\[ \lim_{u \to 0} \sup_{t \in [0,1]} \|\nabla f(x + tu) - \nabla f(x)\|_* = 0. \]
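
The difference between the two notions of differentiability in Remark 7.4 (b) can be made visible numerically. The following is a minimal sketch: the function is the one from the remark, while the helper names and the choice of the parabola (s, s²) as approach path are assumptions made for this illustration.

```python
import math

def f(x1, x2):
    """The function from Remark 7.4 (b)."""
    if x1 == 0.0 and x2 == 0.0:
        return 0.0
    return x1**3 * x2 / (x1**4 + x2**2)

def gateaux_quotient(u, t):
    """(f(0 + t u) - f(0)) / t; tends to 0 for every fixed direction u."""
    return f(t * u[0], t * u[1]) / t

def frechet_quotient(s):
    """|f(h) - f(0) - <0, h>| / ||h|| along the parabola h = (s, s^2)."""
    return abs(f(s, s * s)) / math.hypot(s, s * s)

for t in (1e-1, 1e-3, 1e-5):
    # First value shrinks with t, second value stays close to 1/2.
    print(t, gateaux_quotient((1.0, 1.0), t), frechet_quotient(t))
```

Along every straight line through the origin the quotient vanishes, so the only candidate for the derivative is 0, but along the parabola the Fréchet quotient equals 1/(2√(1 + s²)) and does not tend to 0.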

Corollary 7.5 If f : X → R is proper, convex and Gâteaux differentiable at
x ∈ dom f, then ∂f(x) = {∇f(x)}.

Proof. Since for all u ∈ X it holds f′(x; u) = lim_{t→0} (f(x + tu) − f(x))/t = ⟨∇f(x), u⟩,
according to Theorem 7.2 we have

\[ \partial f(x) = \{x^* \in X^* \mid \langle \nabla f(x) - x^*, u\rangle \ge 0 \ \forall u \in X\} = \{x^* \in X^* \mid \langle \nabla f(x) - x^*, u\rangle = 0 \ \forall u \in X\} = \{\nabla f(x)\}. \]

Corollary 7.6 If f : X → R is proper, convex and continuous at x ∈ dom f
such that ∂f(x) is a singleton, then f is Gâteaux differentiable at x.

Proof. Let ∂f(x) = {x∗}, where x∗ ∈ X∗. By Theorem 7.3 it holds for all
u ∈ X

\[ f'(x;u) = \lim_{t \downarrow 0} \frac{f(x + tu) - f(x)}{t} = \langle x^*, u\rangle. \]

From here it follows for all u ∈ X that

\[ \lim_{t \uparrow 0} \frac{f(x + tu) - f(x)}{t} = -\lim_{s \downarrow 0} \frac{f(x + s(-u)) - f(x)}{s} = -\langle x^*, -u\rangle = \langle x^*, u\rangle. \]

In conclusion,

\[ \lim_{t \to 0} \frac{f(x + tu) - f(x)}{t} = \langle x^*, u\rangle, \]

which proves that f is Gâteaux differentiable at x and ∇f(x) = x∗. □

Example 7.7 Let (H, ⟨·, ·⟩) be a real Hilbert space.

(a) The function f : H → R, f(x) = ½‖x‖², is Fréchet differentiable on H and
for all x ∈ H it holds ∇f(x) = x. Indeed,

\[ \lim_{u \to 0} \frac{f(x + u) - f(x) - \langle x, u\rangle}{\|u\|} = \lim_{u \to 0} \frac{\|x + u\|^2 - \|x\|^2 - 2\langle x, u\rangle}{2\|u\|} = \lim_{u \to 0} \frac{\|u\|}{2} = 0. \]

(b) For f : H → R, f(x) = ‖x‖, it holds

\[ f'(0;u) = \lim_{t \downarrow 0} \frac{t\|u\|}{t} = \|u\| \quad \forall u \in H, \]

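which shows that f is not Gâteaux differentiable at 0, since u ↦ ‖u‖ is not linear.

On the other hand, for x ≠ 0 we have

\[ f'(x;u) = \lim_{t \downarrow 0} \frac{\|x + tu\| - \|x\|}{t} = \lim_{t \downarrow 0} \frac{\|x + tu\|^2 - \|x\|^2}{t(\|x + tu\| + \|x\|)} = \lim_{t \downarrow 0} \frac{t\|u\|^2 + 2\langle x, u\rangle}{\|x + tu\| + \|x\|} = \frac{\langle x, u\rangle}{\|x\|} = \left\langle \frac{x}{\|x\|}, u \right\rangle \quad \forall u \in H. \]

Consequently, ∂f(x) = {x/‖x‖}, which shows that f is Gâteaux differentiable at x
with ∇f(x) = x/‖x‖.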
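
Both parts of Example 7.7 (b) can be observed with finite differences. The sketch below assumes H = R³ with the Euclidean inner product; the sample point, the direction and the step size are arbitrary choices for illustration.

```python
import math

def norm(x):
    return math.sqrt(sum(c * c for c in x))

def quotient(x, u, t):
    """(||x + t u|| - ||x||) / t for a nonzero step t of either sign."""
    return (norm([a + t * b for a, b in zip(x, u)]) - norm(x)) / t

u = [1.0, -2.0, 0.5]

# At x = 0 the right and left limits are ||u|| and -||u||, so they differ.
zero = [0.0, 0.0, 0.0]
right, left = quotient(zero, u, 1e-7), quotient(zero, u, -1e-7)

# At x != 0 both one-sided quotients approach <x / ||x||, u>.
x = [3.0, 0.0, 4.0]
inner = sum(a * b for a, b in zip(x, u)) / norm(x)
print(right, left, quotient(x, u, 1e-7), quotient(x, u, -1e-7), inner)
```

The mismatch of the one-sided limits at the origin is exactly the failure of linearity of u ↦ f′(0; u), while away from the origin the two-sided limit exists and is linear in u.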

We close the section by a result which provides a useful characterization of


the convexity of a function on a convex and open set by means of its Gâteaux
derivative.

Theorem 7.8 Let C ⊆ X be a nonempty, convex and open set and f : C → R a


Gâteaux differentiable function on C. The following statements are equivalent:

(i) f is convex on C;

(ii) f(y) − f(x) ≥ ⟨∇f(x), y − x⟩ ∀x, y ∈ C;

(iii) ⟨∇f(y) − ∇f(x), y − x⟩ ≥ 0 ∀x, y ∈ C;

(iv) if f is twice Gâteaux differentiable on C, then

∇²f(x)(u, u) ≥ 0 ∀x ∈ C ∀u ∈ X,

where

\[ \nabla^2 f(x)(u, v) := \lim_{t \to 0} \frac{\langle \nabla f(x + tv), u\rangle - \langle \nabla f(x), u\rangle}{t} \]

denotes the second Gâteaux differential (derivative) of f at x ∈ C in the
directions u ∈ X and v ∈ X.

Proof. “(i) ⇒ (ii)” Let x, y ∈ C. The function f + δ_C : X → R is proper,
convex and Gâteaux differentiable at x ∈ dom(f + δ_C) and ∇(f + δ_C)(x) = ∇f(x).
According to Corollary 7.5 it holds ∇f(x) ∈ ∂(f + δ_C)(x), thus ⟨∇f(x), y − x⟩ ≤
(f + δ_C)(y) − (f + δ_C)(x) = f(y) − f(x).
“(ii) ⇒ (i)” Let x, y ∈ C and λ ∈ [0, 1]. Then λx + (1 − λ)y ∈ C and therefore

\[ f(x) - f(\lambda x + (1-\lambda) y) \ge \langle \nabla f(\lambda x + (1-\lambda) y), (1-\lambda)(x - y)\rangle \]

and

\[ f(y) - f(\lambda x + (1-\lambda) y) \ge \langle \nabla f(\lambda x + (1-\lambda) y), \lambda(y - x)\rangle. \]

We multiply the first inequality by λ, the second one by 1 − λ and add the
resulting inequalities. This yields

\[ \lambda f(x) + (1-\lambda) f(y) - f(\lambda x + (1-\lambda) y) \ge 0. \]

Consequently, f is convex on C.
“(ii) ⇒ (iii)” For all x, y ∈ C we have

\[ f(y) - f(x) \ge \langle \nabla f(x), y - x\rangle \]

and

\[ f(x) - f(y) \ge \langle \nabla f(y), x - y\rangle, \]

which by summation gives 0 ≥ ⟨∇f(x) − ∇f(y), y − x⟩ or, equivalently, ⟨∇f(y) −
∇f(x), y − x⟩ ≥ 0.
“(iii) ⇒ (ii)” Let x, y ∈ C and ε > 0 such that (1 − t)x + ty ∈ C for all
t ∈ (−ε, 1 + ε). We define φ : (−ε, 1 + ε) → R, φ(t) = f(x + t(y − x)). For all
t ∈ (−ε, 1 + ε) it holds

\[ \varphi'(t) = \lim_{s \to 0} \frac{\varphi(t+s) - \varphi(t)}{s} = \lim_{s \to 0} \frac{f(x + (t+s)(y - x)) - f(x + t(y - x))}{s} = \langle \nabla f(x + t(y - x)), y - x\rangle, \]

which also means that φ is differentiable on (−ε, 1 + ε) and therefore continuous
on [0, 1]. By the Mean Value Theorem there exists λ ∈ (0, 1) such that

\[ f(y) - f(x) = \varphi(1) - \varphi(0) = \varphi'(\lambda) = \langle \nabla f(x + \lambda(y - x)), y - x\rangle. \]

On the other hand,

\[ \langle \nabla f(x + \lambda(y - x)) - \nabla f(x), y - x\rangle = \frac{1}{\lambda} \langle \nabla f(x + \lambda(y - x)) - \nabla f(x), \lambda(y - x)\rangle \ge 0, \]

which implies that f(y) − f(x) ≥ ⟨∇f(x), y − x⟩.
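“(iii) ⇒ (iv)” Let x ∈ C and u ∈ X. Since x ∈ core C, there exists δ > 0
such that y := x + δu ∈ C. Then x + tu = x + (t/δ)(y − x) ∈ C for all t sufficiently
close to 0, and by (iii)

\[ \nabla^2 f(x)(u, u) = \lim_{t \to 0} \frac{\langle \nabla f(x + tu), u\rangle - \langle \nabla f(x), u\rangle}{t} = \lim_{t \to 0} \frac{\left\langle \nabla f\left(x + \frac{t}{\delta}(y - x)\right) - \nabla f(x), \frac{t}{\delta}(y - x) \right\rangle}{t^2} \ge 0. \]

“(iv) ⇒ (ii)” Let x, y ∈ C and ε > 0 such that (1 − t)x + ty ∈ C for all
t ∈ (−ε, 1 + ε), and φ : (−ε, 1 + ε) → R, φ(t) = f(x + t(y − x)). We have seen
above that for all t ∈ (−ε, 1 + ε) it holds

\[ \varphi'(t) = \langle \nabla f(x + t(y - x)), y - x\rangle. \]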

In addition, for all t ∈ (−ε, 1 + ε) it holds

\[ \varphi''(t) = \lim_{s \to 0} \frac{\varphi'(t+s) - \varphi'(t)}{s} = \lim_{s \to 0} \frac{\langle \nabla f(x + (t+s)(y - x)), y - x\rangle - \langle \nabla f(x + t(y - x)), y - x\rangle}{s} = \nabla^2 f(x + t(y - x))(y - x, y - x) \ge 0. \]

According to Taylor’s Theorem, there exists λ ∈ (0, 1) such that

\[ \varphi(1) = \varphi(0) + \varphi'(0) + \frac{1}{2}\varphi''(\lambda) \ge \varphi(0) + \varphi'(0), \]

in other words,

\[ f(y) \ge f(x) + \langle \nabla f(x), y - x\rangle. \]
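
Statement (iii) of Theorem 7.8 lends itself to a quick sampling test. The sketch below assumes C = X = R² and two hand-picked functions, a convex quadratic and a saddle; a sampling test can only refute monotonicity of the gradient, it cannot prove it, so the positive outcome is an illustration and not a verification.

```python
import random

def grad_convex(x):
    """Gradient of f(x) = x1^2 + x1*x2 + x2^2 (Hessian [[2, 1], [1, 2]] is positive definite)."""
    return (2 * x[0] + x[1], x[0] + 2 * x[1])

def grad_saddle(x):
    """Gradient of f(x) = x1^2 - x2^2, which is not convex."""
    return (2 * x[0], -2 * x[1])

def is_monotone_on_samples(grad, samples):
    """Check <grad(y) - grad(x), y - x> >= 0 on all sampled pairs, i.e. statement (iii)."""
    for x in samples:
        for y in samples:
            gx, gy = grad(x), grad(y)
            inner = (gy[0] - gx[0]) * (y[0] - x[0]) + (gy[1] - gx[1]) * (y[1] - x[1])
            if inner < -1e-12:
                return False
    return True

random.seed(0)
samples = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(50)]
# Two deterministic points that expose the saddle: the pair gives inner = -2.
samples += [(0.0, 0.0), (0.0, 1.0)]

print(is_monotone_on_samples(grad_convex, samples))
print(is_monotone_on_samples(grad_saddle, samples))
```

For the quadratic the inner product equals 2d₁² + 2d₁d₂ + 2d₂² with d = y − x, which is nonnegative, whereas for the saddle it equals 2d₁² − 2d₂² and becomes negative as soon as |d₂| > |d₁|.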

Chapter IV

Duality theory

In this chapter we will discuss in detail the theory of conjugate duality in convex
analysis and convex optimization, and some of its important consequences.

8 A general perturbation approach


In this section we will present a so-called perturbation approach which allows one to
assign to a given minimization problem a conjugate dual problem.
Let X be a nontrivial real normed space, F : X → R a proper function and
consider the following so-called primal optimization problem

\[ (PG) \qquad \inf_{x \in X} F(x). \]

Let Y be another nontrivial real normed space and Φ : X × Y → R a function,
called perturbation function, fulfilling Φ(x, 0) = F(x) for all x ∈ X. The
optimization problem (PG) is nothing else than

\[ (PG) \qquad \inf_{x \in X} \Phi(x, 0). \]

The conjugate dual problem to (PG) is defined as the maximization problem

\[ (DG) \qquad \sup_{y^* \in Y^*} \{-\Phi^*(0, y^*)\}, \]

where Φ∗ : X ∗ ×Y ∗ → R is the conjugate function of Φ. We denote by v(P G) and


v(DG) the optimal objective values of the problems (P G) and (DG), respectively.
The next result shows that for (P G) and (DG) weak duality always holds, namely,
the optimal objective value of the primal problem is always greater than or equal
to the optimal objective value of the dual problem.

Theorem 8.1 (weak duality) It holds

−∞ ≤ v(DG) ≤ v(P G) ≤ +∞.


Proof. For all x ∈ X and all y∗ ∈ Y∗ we have

\[ -\Phi^*(0, y^*) = \inf_{(z,y) \in X \times Y} \{-\langle y^*, y\rangle + \Phi(z, y)\} \le \Phi(x, 0), \]

which implies that v(PG) ≥ v(DG). □
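
The inequality in this proof survives discretization, which makes weak duality easy to observe on a toy problem. The sketch below assumes X = Y = R and the hypothetical perturbation function Φ(x, y) = x² + (x + y − 1)², and replaces every infimum by a minimum over a finite grid that contains 0.

```python
def phi(x, y):
    """Hypothetical perturbation function; phi(x, 0) = x^2 + (x - 1)^2 is the primal objective."""
    return x * x + (x + y - 1.0) ** 2

# Symmetric grid on [-4, 4] with step 0.05; it contains the point 0.
grid = [k * 0.05 for k in range(-80, 81)]

def neg_conjugate_at(y_star):
    """Grid version of -phi^*(0, y*) = inf over (z, y) of { phi(z, y) - y* * y }."""
    return min(phi(z, y) - y_star * y for z in grid for y in grid)

primal_value = min(phi(x, 0.0) for x in grid)
dual_values = {y_star: neg_conjugate_at(y_star) for y_star in (-2.0, -1.5, -1.0, -0.5, 0.0)}
dual_value = max(dual_values.values())
print(primal_value, dual_values)
```

Because the grid pairs (z, 0) are among the candidates of the inner minimum, every dual value is bounded above by the primal value, exactly as in the proof. For this particular Φ one can compute by hand that −Φ∗(0, y∗) = −y∗ − (y∗)²/2, so the dual maximum is reached at y∗ = −1 and coincides with the primal value.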

Next we explore the concept of strong duality, which is the situation when
v(P G) = v(DG) and the dual has an optimal solution. An important role in the
characterization of strong duality will be played by the infimal value function of
Φ, h : Y → R, h(y) = inf x∈X Φ(x, y). One can notice that v(P G) = h(0). The
following result shows that the optimal objective value of (DG) can be expressed
in terms of the biconjugate of h.

Proposition 8.2 It holds v(DG) = h∗∗ (0).

Proof. For all y∗ ∈ Y∗ we have

\[ h^*(y^*) = \sup_{y \in Y} \{\langle y^*, y\rangle - h(y)\} = \sup_{x \in X,\, y \in Y} \{\langle y^*, y\rangle - \Phi(x, y)\} = \Phi^*(0, y^*) \]

and, from here,

\[ h^{**}(0) = \sup_{y^* \in Y^*} \{-h^*(y^*)\} = \sup_{y^* \in Y^*} \{-\Phi^*(0, y^*)\} = v(DG). \]

Since h∗∗ ≤ h, it holds h∗∗ (0) ≤ h(0), which is nothing else than the weak duality
statement expressed via the infimal value function h.

Definition 8.3 (normal problem) We say that the problem (P G) is normal if


h(0) ∈ R and h is lower semicontinuous at 0.

The next result provides a characterization of the normality of (P G) in case the


perturbation function Φ is convex.

Proposition 8.4 Let Φ : X × Y → R be a proper and convex function. The


following statements are equivalent:

(i) the problem (P G) is normal;

(ii) v(P G) = v(DG) ∈ R.



Proof. “(i) ⇒ (ii)” We have h∗∗ ≤ h̄ ≤ h, where h̄ denotes the lower semicontinuous
hull of h. Since Φ is convex, h is also convex,
and this implies that h̄ is convex, too. The problem (PG) being normal, we have
h̄(0) = h(0) ∈ R. Since h̄ is a convex and lower semicontinuous function, we have
according to Proposition 4.14 that h̄(y) > −∞ for all y ∈ Y. This guarantees
the properness of h̄. According to the Fenchel-Moreau Theorem we obtain

\[ \bar{h} = (\bar{h})^{**} = ((\bar{h})^*)^* = (h^*)^* = h^{**} \]

and so h∗∗(0) = h̄(0) = h(0) ∈ R. Since v(PG) = h(0) and v(DG) = h∗∗(0),
statement (ii) is shown.
“(ii) ⇒ (i)” Statement (ii) can be equivalently written as h∗∗(0) = h(0) ∈ R.
Since h∗∗ ≤ h̄ ≤ h, this implies h̄(0) = h(0) ∈ R, which actually means that (PG) is normal. □

The following notion will be used to characterize the strong duality for (P G)
and (DG).

Definition 8.5 (stable problem) We say that the problem (PG) is stable if
h(0) ∈ R and ∂h(0) ≠ ∅.

Proposition 8.6 Let Φ : X × Y → R be a proper and convex function. The


following statements are equivalent:
(i) the problem (P G) is stable;

(ii) the primal problem (P G) is normal, namely, v(P G) = v(DG) ∈ R, and


the dual (DG) has an optimal solution. In this situation the set of optimal
solutions of (DG) is equal to ∂h(0).

Proof. “(i) ⇒ (ii)” Assume that h(0) ∈ R and ∂h(0) ≠ ∅. Thus, according to
Proposition 6.4, h(0) = h∗∗(0), which is nothing else than v(PG) = v(DG) ∈ R.
By Proposition 8.4 it follows that (PG) is normal. Further, consider an element
ȳ∗ ∈ ∂h(0). We have h(0) + h∗(ȳ∗) = 0 or, equivalently, v(PG) = h(0) =
−h∗(ȳ∗) = −Φ∗(0, ȳ∗), which, according to the weak duality theorem, implies
that ȳ∗ is an optimal solution of (DG).
“(ii) ⇒ (i)” Assume that (PG) is normal and that its dual (DG) has an optimal
solution ȳ∗ ∈ Y∗. Thus h(0) = v(PG) = v(DG) = −Φ∗(0, ȳ∗) = −h∗(ȳ∗) ∈
R, which is the same as h(0) + h∗(ȳ∗) = 0 or, equivalently, ȳ∗ ∈ ∂h(0). Since
the set ∂h(0) is nonempty, the stability of (PG) is proved. □
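
The identification of the dual optimal solutions with ∂h(0) can be traced on a one-dimensional example. The sketch assumes the hypothetical perturbation function Φ(x, y) = x² + (x + y − 1)² on R × R, for which the infimal value function and its conjugate have the closed forms stated in the comments (both obtained by elementary calculus, not taken from the notes).

```python
def h(y):
    """Infimal value function of phi(x, y) = x^2 + (x + y - 1)^2.

    Minimizing over x gives x = (1 - y) / 2, hence h(y) = (y - 1)^2 / 2.
    """
    return (y - 1.0) ** 2 / 2.0

def dual_objective(y_star):
    """-h^*(y*) = -phi^*(0, y*) = -y* - (y*)^2 / 2 for this example."""
    return -y_star - y_star ** 2 / 2.0

def is_subgradient_at_zero(y_star, test_points):
    """Check the subgradient inequality h(y) - h(0) >= y* * y on the sample points."""
    return all(h(y) - h(0.0) >= y_star * y - 1e-12 for y in test_points)

points = [k * 0.01 for k in range(-300, 301)]
candidates = [k * 0.01 for k in range(-300, 301)]
best = max(candidates, key=dual_objective)
print(best, dual_objective(best), h(0.0), is_subgradient_at_zero(best, points))
```

Here h is differentiable with h′(0) = −1, so ∂h(0) = {−1}; the dual objective is maximized at the same point and its maximal value equals h(0), which is the situation described in Proposition 8.6.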

When strong duality holds for (PG) and (DG), we write

\[ \inf_{x \in X} \Phi(x, 0) = \max_{y^* \in Y^*} \{-\Phi^*(0, y^*)\}, \]

and use “max” instead of “sup” in order to emphasize that the dual problem has
an optimal solution.

As we have seen above, the stability completely characterizes the existence of


strong duality for a given convex optimization problem and its conjugate dual.
One of the main challenges in convex analysis is to formulate sufficient conditions,
called regularity conditions, which ensure that the primal problem is stable. In
the following we will formulate several such generalized interior point regularity
conditions in terms of the perturbation function Φ, and also prove the corre-
sponding strong duality theorem.

Theorem 8.7 Let Φ : X × Y → R be a proper and convex function such that


0 ∈ Pr_Y(dom Φ). If the regularity condition
(RC1^Φ) ∃x′ ∈ X such that (x′, 0) ∈ dom Φ and Φ(x′, ·) is continuous at 0
is fulfilled, then for (PG) and (DG) strong duality holds.

Proof. The infimal value function h : Y → R, h(y) = inf x∈X Φ(x, y), is convex
and it fulfills h(0) = v(P G). If h(0) = −∞, then, according to the weak duality
theorem, v(P G) = v(DG) = −∞ and every element y ∗ ∈ Y ∗ is an optimal
solution of the dual problem (DG).
Let h(0) > −∞. Since h(0) ≤ Φ(x′, 0) < +∞, it holds h(0) ∈ R. According
to (RC1^Φ), there exists δ > 0 such that

\[ h(y) \le \Phi(x', y) < \Phi(x', 0) + 1 \quad \forall y \in B(0, \delta), \tag{8.1} \]

which implies that B(0, δ) ⊆ dom h. In conclusion, 0 ∈ int(dom h).
Since h is bounded above on B(0, δ), we are in the setting of Theorem 4.18.
If h were not proper, then it would be equal to −∞ on int(dom h), which
would contradict h(0) ∈ R. This shows that h is a proper function
and, consequently, continuous on int(dom h). According to Theorem 6.9, it holds
∂h(0) ≠ ∅. In other words, the problem (PG) is stable, which leads to the desired
conclusion. □

One can notice in the proof of the above theorem that a consequence of
the required regularity condition was that 0 is an interior point of dom h =
PrY (dom Φ). We will see in the next theorem that one can prove strong duality
for the primal-dual pair (P G) − (DG) by assuming “only” that 0 ∈ core(dom h),
however, in the setting of complete normed spaces and by additionally requiring
that the perturbation function Φ is also lower semicontinuous.

Theorem 8.8 Let Φ : X × Y → R be a proper and convex function such that


0 ∈ PrY (dom Φ). If the regularity condition
(RC2Φ ) X and Y are Banach spaces, Φ is lower semicontinuous
and 0 ∈ core(PrY (dom Φ))
is fulfilled, then for (P G) and (DG) strong duality holds.

Proof. We define Φ̃ : X × Y → R, Φ̃(x, y) = max{Φ(x, y), ‖x‖}, and h̃ : Y →
R, h̃(y) = inf_{x∈X} Φ̃(x, y). Then Φ̃ is a proper, convex and lower semicontinuous
function, h̃ is a convex function, Φ ≤ Φ̃ and h ≤ h̃. In addition,

\[ \tilde{h}(y) \ge 0 \quad \forall y \in Y. \]

Let x′ ∈ X such that Φ(x′, 0) ∈ R. Then h̃(0) < +∞, which proves that h̃ is
proper. Since dom Φ = dom Φ̃, it holds dom h = Pr_Y(dom Φ) = Pr_Y(dom Φ̃) =
dom h̃, and so 0 ∈ core(dom h) = core(dom h̃).
Let ε := max{h̃(0), 0} + 1 ∈ R and W := {y ∈ Y | h̃(y) < ε}. Then
0 ∈ core W. Indeed, let y ∈ Y and λ > 0 such that λy ∈ dom h̃. We choose
t ∈ (0, 1] such that t(h̃(λy) − ε + 1) < 1. Then

\[ \tilde{h}(t\lambda y) \le t\tilde{h}(\lambda y) + (1 - t)\tilde{h}(0) \le t\tilde{h}(\lambda y) + (1 - t)(\varepsilon - 1) = t(\tilde{h}(\lambda y) - \varepsilon + 1) + \varepsilon - 1 < \varepsilon, \]

which shows that tλy ∈ W and, consequently, proves that 0 ∈ core W. From here
we immediately have that 0 ∈ core(cl W).
According to Theorem 1.11, for the convex and closed set cl W we have 0 ∈
core(cl W) = int(cl W), which means that there exists δ > 0 such that B(0, 2δ) ⊆
cl W.
Next we will prove that

\[ \tilde{h}(y) \le \varepsilon \quad \forall y \in B(0, \delta). \tag{8.2} \]

Let y ∈ B(0, δ) be arbitrarily chosen. Then 2y ∈ cl W and, consequently, there
exists y_1 ∈ W such that ‖2y − y_1‖ < δ. This means that 4y − 2y_1 = 2(2y − y_1) ∈
cl W and, consequently, there exists y_2 ∈ W such that ‖4y − 2y_1 − y_2‖ < δ.
Repeating this procedure we obtain a sequence (y_k)_{k∈N} ⊆ W such that

\[ \|2^k y - 2^{k-1} y_1 - \ldots - y_k\| < \delta \quad \forall k \ge 1 \]

or, equivalently,

\[ \left\| y - \frac{1}{2} y_1 - \ldots - \frac{1}{2^k} y_k \right\| < \frac{1}{2^k}\delta \quad \forall k \ge 1, \]

which proves that \sum_{k=1}^{+\infty} \frac{1}{2^k} y_k = y.
For all k ∈ N, since y_k ∈ W, we can choose x_k ∈ X such that Φ̃(x_k, y_k) < ε,
which implies ‖x_k‖ < ε. Since X is a Banach space and \sum_{k=1}^{+\infty} \frac{1}{2^k}\|x_k\| < +∞,
there exists x ∈ X such that \sum_{k=1}^{+\infty} \frac{1}{2^k} x_k = x (see, for instance, [4, Lemma 1.12
(ii)]). This proves that

\[ \sum_{k=1}^{+\infty} \frac{1}{2^k}(x_k, y_k) = (x, y). \]

Since Φ̃(x′, 0) ∈ R, by using the convexity of Φ̃ (see Remark 3.2 (a)), we obtain
for all n ∈ N

\[ \tilde{\Phi}\left( \sum_{k=1}^{n} \frac{1}{2^k}(x_k, y_k) + \frac{1}{2^n}(x', 0) \right) \le \sum_{k=1}^{n} \frac{1}{2^k}\tilde{\Phi}(x_k, y_k) + \frac{1}{2^n}\tilde{\Phi}(x', 0) \le \sum_{k=1}^{n} \frac{1}{2^k}\varepsilon + \frac{1}{2^n}\tilde{\Phi}(x', 0). \]

For n → +∞ and by using that Φ̃ is lower semicontinuous, this yields

\[ \tilde{h}(y) \le \tilde{\Phi}(x, y) \le \varepsilon. \]

This proves relation (8.2), which further implies

\[ h(y) \le \varepsilon \quad \forall y \in B(0, \delta), \]

thus 0 ∈ int(dom h). At this point we can argue as in the proof of Theorem 8.7.
If h(0) = −∞, then strong duality follows automatically. If h(0) > −∞, then
h(0) ≤ Φ(x′, 0) < +∞, in other words, h(0) ∈ R. By Theorem 4.18 we obtain
that h is proper and continuous on int(dom h), while Theorem 6.9 further gives
that ∂h(0) ≠ ∅. This provides the desired conclusion. □

Remark 8.9 As it follows from the proof of Theorem 8.8, if Φ : X × Y → R is a
proper and convex function such that 0 ∈ Pr_Y(dom Φ), then (RC2^Φ) is equivalent
to
(RC2′^Φ) X and Y are Banach spaces, Φ is lower semicontinuous
and 0 ∈ int(Pr_Y(dom Φ)).

The following theorem introduces a regularity condition which is “weaker”
than (RC2^Φ) and (RC2′^Φ).

Theorem 8.10 Let Φ : X × Y → R be a proper and convex function such that


0 ∈ PrY (dom Φ). If the regularity condition
(RC3Φ ) X and Y are Banach spaces, Φ is lower semicontinuous
and 0 ∈ sqri(PrY (dom Φ))
is fulfilled, then for (P G) and (DG) strong duality holds.

Proof. Since 0 ∈ sqri(PrY (dom Φ)), it holds that Z := cone(PrY (dom Φ)) is a
closed linear subspace of the Banach space Y , consequently, it is complete. In
addition, dom h = PrY (dom Φ) ⊆ Z.
We assume first that Z = {0}, which corresponds to the case when dom h =
{0}. Then v(PG) = h(0) and for all y∗ ∈ Y∗ it holds h∗(y∗) = sup_{y∈Y}{⟨y∗, y⟩ −
h(y)} = −h(0), consequently, v(DG) = sup_{y∗∈Y∗}{−h∗(y∗)} = h(0) = v(PG) and
every y∗ ∈ Y∗ is an optimal solution of the dual problem.
Next we address the case when Z is a nontrivial Banach space. Let j :
Z → Y, j(z) = z, be the canonical embedding of Z into Y and define Φ̃ :
X × Z → R, Φ̃(x, z) = Φ(x, j(z)). The function Φ̃ is proper, convex and lower
semicontinuous and its infimal value function is defined as

\[ \tilde{h} : Z \to \overline{\mathbb{R}}, \quad \tilde{h}(z) = \inf_{x \in X} \tilde{\Phi}(x, z) = \inf_{x \in X} \Phi(x, j(z)) = h(j(z)). \]

Since dom h ⊆ Z, it holds dom h̃ = j^{−1}(dom h) = dom h, which means that
0 ∈ core(Pr_Z(dom Φ̃)). Then, according to Theorem 8.8, it holds

\[ \inf_{x \in X} \tilde{\Phi}(x, 0) = \max_{z^* \in Z^*} \{-\tilde{\Phi}^*(0, z^*)\} \]

or, equivalently,

\[ h(0) = \tilde{h}(0) = \max_{z^* \in Z^*} \{-\tilde{h}^*(z^*)\}. \tag{8.3} \]

Denoting by j∗ : Y∗ → Z∗ the adjoint operator of j, one can easily notice that
j∗(y∗) = y∗|_Z for all y∗ ∈ Y∗. For all y∗ ∈ Y∗, by taking into account that
dom h ⊆ Z, it holds

\[ \tilde{h}^*(j^*(y^*)) = \sup_{z \in Z} \{\langle j^*(y^*), z\rangle - \tilde{h}(z)\} = \sup_{z \in Z} \{\langle y^*, j(z)\rangle - h(j(z))\} = \sup_{z \in Z} \{\langle y^*, z\rangle - h(z)\} = \sup_{z \in Y} \{\langle y^*, z\rangle - h(z)\} = h^*(y^*). \]

According to (8.3), there exists z∗ ∈ Z∗ such that h(0) = −h̃∗(z∗). According
to the Hahn-Banach Theorem (see, for instance, [4, Theorem 6.6]), there exists
y∗ ∈ Y∗ such that j∗(y∗) = y∗|_Z = z∗. From here we have h(0) = −h̃∗(j∗(y∗)) =
−h∗(y∗). Since, due to weak duality, h(0) ≥ −h∗(y∗) for all y∗ ∈ Y∗, it holds

\[ h(0) = \max_{y^* \in Y^*} \{-h^*(y^*)\}, \]

in other words, for (PG) and (DG) strong duality holds. □

In finite-dimensional spaces one can formulate a regularity condition for strong


duality in terms of the relative interior.

Theorem 8.11 Let Φ : X × Y → R be a proper and convex function such that


0 ∈ PrY (dom Φ). If the regularity condition

(RC4Φ ) Y is finite-dimensional and 0 ∈ ri(PrY (dom Φ))

is fulfilled, then for (P G) and (DG) strong duality holds.



Proof. The function h : Y → R is convex and 0 ∈ ri(dom h). If h(0) = −∞,
then the statement follows from the weak duality theorem. If h(0) ∈ R, then, by
Theorem 2.6 and Proposition 3.7, h is proper. Thus, according to Theorem 6.10,
∂h(0) ≠ ∅, and the conclusion follows from Proposition 8.6. □

Regularity conditions and strong duality lie at the basis of exact conjugate
and subdifferential calculus.

Theorem 8.12 (exact conjugate and subdifferential formulas) Let Φ : X × Y →
R be a proper and convex function such that 0 ∈ Pr_Y(dom Φ). If one of the
regularity conditions (RCi^Φ), i ∈ {1, 2, 2′, 3, 4}, is fulfilled, then the following
statements are true:
(a) (Φ(·, 0))∗(x∗) = min_{y∗∈Y∗} Φ∗(x∗, y∗) for all x∗ ∈ X∗;
(b) ∂(Φ(·, 0))(x) = Pr_{X∗}(∂Φ(x, 0)) for all x ∈ X.

Proof. (a) Let x∗ ∈ X∗. We define Φ_{x∗} : X × Y → R, Φ_{x∗}(x, y) = Φ(x, y) −
⟨x∗, x⟩. By weak duality we always have

\[ (\Phi(\cdot, 0))^*(x^*) = \sup_{x \in X} \{\langle x^*, x\rangle - \Phi(x, 0)\} = -\inf_{x \in X} \Phi_{x^*}(x, 0) \le -\sup_{y^* \in Y^*} \{-\Phi_{x^*}^*(0, y^*)\} = \inf_{y^* \in Y^*} \Phi_{x^*}^*(0, y^*) = \inf_{y^* \in Y^*} \Phi^*(x^*, y^*). \]

Furthermore, the function Φ_{x∗} is proper and convex with dom Φ_{x∗} = dom Φ,
therefore, one of the regularity conditions (RCi^{Φ_{x∗}}), i ∈ {1, 2, 2′, 3, 4}, is fulfilled.
According to Theorems 8.7, 8.8, 8.10 and 8.11 we have

\[ (\Phi(\cdot, 0))^*(x^*) = -\inf_{x \in X} \Phi_{x^*}(x, 0) = -\max_{y^* \in Y^*} \{-\Phi_{x^*}^*(0, y^*)\} = \min_{y^* \in Y^*} \Phi^*(x^*, y^*), \]

which proves statement (a).

(b) Let x ∈ X.
“⊇” Let x∗ ∈ Pr_{X∗}(∂Φ(x, 0)). Then there exists y∗ ∈ Y∗ such that (x∗, y∗) ∈
∂Φ(x, 0). In other words,

\[ \Phi(u, v) - \Phi(x, 0) \ge \langle x^*, u - x\rangle + \langle y^*, v\rangle \quad \forall (u, v) \in X \times Y. \]

From here it follows that Φ(u, 0) − Φ(x, 0) ≥ ⟨x∗, u − x⟩ for all u ∈ X, which
means that x∗ ∈ ∂(Φ(·, 0))(x). This proves that the inclusion

\[ \mathrm{Pr}_{X^*}(\partial \Phi(x, 0)) \subseteq \partial(\Phi(\cdot, 0))(x) \]

always holds.
“⊆” Let x∗ ∈ ∂(Φ(·, 0))(x). Then Φ(x, 0) + (Φ(·, 0))∗(x∗) = ⟨x∗, x⟩. According
to (a), there exists ȳ∗ ∈ Y∗ such that (Φ(·, 0))∗(x∗) = Φ∗(x∗, ȳ∗), consequently,
Φ(x, 0) + Φ∗(x∗, ȳ∗) = ⟨x∗, x⟩ or, equivalently, (x∗, ȳ∗) ∈ ∂Φ(x, 0). This proves
that x∗ ∈ Pr_{X∗}(∂Φ(x, 0)). □

9 Closedness-type regularity conditions


In this section we will introduce a new class of regularity conditions, called
closedness-type regularity conditions, that guarantee strong duality for the general
optimization problem (PG) and its conjugate dual problem (DG). The
conditions will be formulated in terms of the conjugate function of the perturbation
function Φ : X × Y → R. Notice that, since Φ∗ : X∗ × Y∗ → R is a convex
function, its infimal value function inf_{y∗∈Y∗} Φ∗(·, y∗) : X∗ → R is also a convex
function.
Theorem 9.1 Let Φ : X × Y → R be a proper, convex and lower semicontinuous
function such that 0 ∈ Pr_Y(dom Φ). Then it holds

\[ (\Phi(\cdot, 0))^*(x^*) = \overline{\inf_{y^* \in Y^*} \Phi^*(\cdot, y^*)}^{\,w^*}(x^*) \quad \forall x^* \in X^*. \]

Proof. Since Φ is proper, convex and lower semicontinuous, for the conjugate
of inf_{y∗∈Y∗} Φ∗(·, y∗) it holds for all x ∈ X

\[ \left( \inf_{y^* \in Y^*} \Phi^*(\cdot, y^*) \right)^*(x) = \sup_{x^* \in X^*} \left\{ \langle x^*, x\rangle - \inf_{y^* \in Y^*} \Phi^*(x^*, y^*) \right\} = \sup_{x^* \in X^*,\, y^* \in Y^*} \{\langle x^*, x\rangle - \Phi^*(x^*, y^*)\} = \Phi^{**}(x, 0) = \Phi(x, 0). \]

This also implies that the weak∗ lower semicontinuous hull \overline{\inf_{y^* \in Y^*} \Phi^*(\cdot, y^*)}^{\,w^*} is proper. Assuming that
it takes everywhere the value +∞, it yields that its conjugate, which coincides
with (inf_{y∗∈Y∗} Φ∗(·, y∗))∗, is identically −∞. This contradicts the properness
of Φ. On the other hand, assuming that \overline{\inf_{y^* \in Y^*} \Phi^*(\cdot, y^*)}^{\,w^*} takes somewhere
the value −∞, by the same argument as above, one would have that Φ(x, 0) is
equal to +∞ for all x ∈ X. Since this contradicts the feasibility assumption
0 ∈ Pr_Y(dom Φ), the function \overline{\inf_{y^* \in Y^*} \Phi^*(\cdot, y^*)}^{\,w^*} is everywhere greater than
−∞.
By the Fenchel-Moreau Theorem we obtain

\[ (\Phi(\cdot, 0))^* = \left( \inf_{y^* \in Y^*} \Phi^*(\cdot, y^*) \right)^{**} = \left( \overline{\inf_{y^* \in Y^*} \Phi^*(\cdot, y^*)}^{\,w^*} \right)^{**} = \overline{\inf_{y^* \in Y^*} \Phi^*(\cdot, y^*)}^{\,w^*}, \]

which concludes the proof. □


A first consequence of this result follows.
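Theorem 9.2 Let Φ : X × Y → R be a proper, convex and lower semicontinuous
function with 0 ∈ Pr_Y(dom Φ). Then

\[ \operatorname{epi}(\Phi(\cdot, 0))^* = \operatorname{cl}_{\omega^*}\left( \operatorname{epi}\left( \inf_{y^* \in Y^*} \Phi^*(\cdot, y^*) \right) \right) = \operatorname{cl}_{\omega^*}\left( \mathrm{Pr}_{X^* \times \mathbb{R}}(\operatorname{epi} \Phi^*) \right). \]

Proof. We will prove that

\[ \mathrm{Pr}_{X^* \times \mathbb{R}}(\operatorname{epi} \Phi^*) \subseteq \operatorname{epi}\left( \inf_{y^* \in Y^*} \Phi^*(\cdot, y^*) \right) \subseteq \operatorname{cl}_{\omega^*}\left( \mathrm{Pr}_{X^* \times \mathbb{R}}(\operatorname{epi} \Phi^*) \right). \tag{9.1} \]

If (x∗, r) ∈ Pr_{X∗×R}(epi Φ∗), then it is clear that inf_{y∗∈Y∗} Φ∗(x∗, y∗) ≤ r, thus
(x∗, r) ∈ epi(inf_{y∗∈Y∗} Φ∗(·, y∗)).
If (x∗, r) ∈ epi(inf_{y∗∈Y∗} Φ∗(·, y∗)), then for each ε > 0 there is a y∗ ∈ Y∗
such that Φ∗(x∗, y∗) ≤ r + ε. This means that for all ε > 0 we have (x∗, r + ε) ∈
∪_{y∗∈Y∗} epi(Φ∗(·, y∗)) = Pr_{X∗×R}(epi Φ∗), and so (x∗, r) ∈ cl_{ω∗}(Pr_{X∗×R}(epi Φ∗)).
The conclusion follows from Theorem 9.1, which yields

\[ \operatorname{epi}(\Phi(\cdot, 0))^* = \operatorname{cl}_{\omega^*}\left( \operatorname{epi}\left( \inf_{y^* \in Y^*} \Phi^*(\cdot, y^*) \right) \right), \]

and (9.1). □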
Theorem 9.1 and Theorem 9.2 lead to the following result.
Theorem 9.3 Let Φ : X × Y → R be a proper, convex and lower semicontinuous
function such that 0 ∈ Pr_Y(dom Φ). Then Pr_{X∗×R}(epi Φ∗) is weakly∗ closed if and
only if

\[ (\Phi(\cdot, 0))^*(x^*) = \min_{y^* \in Y^*} \Phi^*(x^*, y^*) \quad \forall x^* \in X^*. \]

Proof. The infimal value function inf_{y∗∈Y∗} Φ∗(·, y∗) is proper (see the proof of
Theorem 9.1).
“⇒” If Pr_{X∗×R}(epi Φ∗) is weakly∗ closed, then, according to Theorem 9.2 and
(9.1), epi(Φ(·, 0))∗ = Pr_{X∗×R}(epi Φ∗) = epi(inf_{y∗∈Y∗} Φ∗(·, y∗)). This proves that
(Φ(·, 0))∗ = inf_{y∗∈Y∗} Φ∗(·, y∗). Let x∗ ∈ X∗. If (Φ(·, 0))∗(x∗) = +∞, then the
infimum in the infimal value function of Φ∗ is attained at every y∗ ∈ Y∗. If
(Φ(·, 0))∗(x∗) ∈ R, then

\[ (x^*, (\Phi(\cdot, 0))^*(x^*)) \in \operatorname{epi}(\Phi(\cdot, 0))^* = \mathrm{Pr}_{X^* \times \mathbb{R}}(\operatorname{epi} \Phi^*) = \bigcup_{y^* \in Y^*} \operatorname{epi}(\Phi^*(\cdot, y^*)), \]

consequently, there exists ȳ∗ ∈ Y∗ such that

\[ \Phi^*(x^*, \bar{y}^*) \le (\Phi(\cdot, 0))^*(x^*) = \inf_{y^* \in Y^*} \Phi^*(x^*, y^*). \]

“⇐” The fact that (Φ(·, 0))∗ = inf_{y∗∈Y∗} Φ∗(·, y∗) implies that inf_{y∗∈Y∗} Φ∗(·, y∗)
is weakly∗ lower semicontinuous, consequently, from (9.1)

\[ \operatorname{cl}_{\omega^*}\left( \mathrm{Pr}_{X^* \times \mathbb{R}}(\operatorname{epi} \Phi^*) \right) \subseteq \operatorname{epi}\left( \overline{\inf_{y^* \in Y^*} \Phi^*(\cdot, y^*)}^{\,w^*} \right) = \operatorname{epi}\left( \inf_{y^* \in Y^*} \Phi^*(\cdot, y^*) \right). \]

On the other hand, since for all x∗ ∈ X∗ the infimum in the infimal value function
of Φ∗ is attained, we have

\[ \operatorname{epi}\left( \inf_{y^* \in Y^*} \Phi^*(\cdot, y^*) \right) \subseteq \bigcup_{y^* \in Y^*} \operatorname{epi}(\Phi^*(\cdot, y^*)) = \mathrm{Pr}_{X^* \times \mathbb{R}}(\operatorname{epi} \Phi^*). \]

This proves that Pr_{X∗×R}(epi Φ∗) is weakly∗ closed. □



In the light of Theorem 9.3 we can formulate a new so-called closedness-


type regularity condition, which is sufficient not only for strong duality for the
optimization problems (P G) and (DG), but also for the exact conjugate and
subdifferential formulas in Theorem 8.12.

Theorem 9.4 Let Φ : X × Y → R be a proper and convex function such that


0 ∈ PrY (dom Φ). If the regularity condition

(RC5Φ ) Φ is lower semicontinuous and PrX ∗ ×R (epi Φ∗ ) is weakly∗ closed

is fulfilled, then the following statements are true:

(a) (Φ(·, 0))∗ (x∗ ) = miny∗ ∈Y ∗ Φ∗ (x∗ , y ∗ ) for all x∗ ∈ X ∗ ;

(b) ∂ (Φ(·, 0)) (x) = PrX ∗ (∂Φ(x, 0)) for all x ∈ X.

In particular, for (P G) and (DG) strong duality holds.

Remark 9.5 If Φ : X × Y → R is a proper, convex and lower semicontinuous
function such that 0 ∈ Pr_Y(dom Φ) and one of the regularity conditions
(RCi^Φ), i ∈ {1, 2, 2′, 3, 4}, is fulfilled, then, according to Theorem 8.12 and
Theorem 9.3, the closedness-type regularity condition (RC5^Φ) is also fulfilled. In the
next sections we will provide examples of primal-dual pairs of optimization
problems for which the closedness-type regularity conditions are satisfied, but the
generalized interior point ones fail.

10 Fenchel duality
In this section, by using the general perturbation approach we develop a duality
framework for the optimization problem

\[ (P^A) \qquad \inf_{x \in X} \{f(x) + (g \circ A)(x)\} \]

and some of its particular instances.
Here, X and Y are nontrivial real normed spaces, A : X → Y is a continuous
linear operator, and f : X → R and g : Y → R are proper functions fulfilling
A(dom f) ∩ dom g ≠ ∅.
We will consider as perturbation function to this problem

Φ^A : X × Y → R, Φ^A(x, y) = f(x) + g(Ax + y).

Obviously, Φ^A(·, 0) = f + g ∘ A.

Its conjugate function (Φ^A)∗ : X∗ × Y∗ → R reads for all (x∗, y∗) ∈ X∗ × Y∗

\[ \begin{aligned} (\Phi^A)^*(x^*, y^*) &= \sup_{x \in X,\, y \in Y} \{\langle x^*, x\rangle + \langle y^*, y\rangle - f(x) - g(Ax + y)\} \\ &= \sup_{x \in X,\, r \in Y} \{\langle x^*, x\rangle + \langle y^*, r - Ax\rangle - f(x) - g(r)\} \\ &= \sup_{x \in X,\, r \in Y} \{\langle x^* - A^* y^*, x\rangle + \langle y^*, r\rangle - f(x) - g(r)\} \\ &= f^*(x^* - A^* y^*) + g^*(y^*) \end{aligned} \]

and gives rise to the following dual problem to (P^A)

\[ (D^A) \qquad \sup_{y^* \in Y^*} \{-f^*(-A^* y^*) - g^*(y^*)\}. \]

By the weak duality theorem we have v(D^A) ≤ v(P^A).


If f and g are convex functions, then Φ^A is convex, too; if f and g are lower
semicontinuous, then Φ^A is lower semicontinuous, too. The feasibility condition
guarantees that

\[ 0 \in \mathrm{Pr}_Y(\operatorname{dom} \Phi^A) = \{y \in Y \mid \exists x \in \operatorname{dom} f \text{ such that } y \in \operatorname{dom} g - Ax\} = \operatorname{dom} g - A(\operatorname{dom} f). \]

Thus, the regularity condition (RC1^Φ) becomes

(RC1^A) ∃x′ ∈ dom f ∩ A^{−1}(dom g) such that g is continuous at Ax′.

In Banach spaces we obtain from (RC2^Φ) and (RC2′^Φ) the following regularity
conditions
(RC2^A) X and Y are Banach spaces, f and g are lower semicontinuous
and 0 ∈ core(dom g − A(dom f))

and
(RC2′^A) X and Y are Banach spaces, f and g are lower semicontinuous
and 0 ∈ int(dom g − A(dom f)),

respectively, while the regularity condition (RC3^Φ) leads to

(RC3^A) X and Y are Banach spaces, f and g are lower semicontinuous
and 0 ∈ sqri(dom g − A(dom f)).

The regularity condition (RC4^Φ) becomes

(RC4^A) Y is finite-dimensional and ri(dom g) ∩ ri(A(dom f)) ≠ ∅.



In order to formulate a closedness-type regularity condition for (P A ) − (DA ),


we will need to calculate PrX ∗ ×R (epi(ΦA )∗ ). We have
(x∗ , r) ∈ PrX ∗ ×R (epi(ΦA )∗ ) ⇔ ∃y ∗ ∈ Y ∗ such that f ∗ (x∗ − A∗ y ∗ ) + g ∗ (y ∗ ) ≤ r
⇔ ∃y ∗ ∈ Y ∗ such that (x∗ − A∗ y ∗ , r − g ∗ (y ∗ )) ∈ epi f ∗
⇔ ∃y ∗ ∈ Y ∗ such that (x∗ , r) ∈ epi f ∗ + (A∗ y ∗ , g ∗ (y ∗ ))
⇔ (x∗ , r) ∈ epi f ∗ + (A∗ × IdR )(epi g ∗ ),
where IdR denotes the identity operator on R and A∗ × IdR : Y ∗ × R → X ∗ × R is
defined as (A∗ × IdR )(y ∗ , r) = (A∗ y ∗ , r). Consequently, the regularity condition
(RC5Φ ) leads to
(RC5A ) f and g are lower semicontinuous and
epi f ∗ + (A∗ × IdR )(epi g ∗ ) is weakly∗ closed.
By combining Theorem 8.12 and Theorem 9.4, we obtain the following result.
Theorem 10.1 Let f : X → R and g : Y → R be proper and convex functions
and A : X → Y a continuous linear operator such that A(dom f) ∩ dom g ≠ ∅. If
one of the regularity conditions (RCi^A), i ∈ {1, 2, 2′, 3, 4, 5}, is fulfilled, then the
following statements are true:
(a) (f + g ∘ A)∗(x∗) = min_{y∗∈Y∗} {f∗(x∗ − A∗y∗) + g∗(y∗)} for all x∗ ∈ X∗;
(b) ∂(f + g ∘ A)(x) = ∂f(x) + A∗∂g(Ax) for all x ∈ X.
In particular, for (P^A) and (D^A) strong duality holds.
Proof. The two statements follow from the formula of the conjugate function of
Φ^A and the fact that Pr_{X∗}(∂Φ^A(x, 0)) = ∂f(x) + A∗∂g(Ax) for all x ∈ X. To see
that the last statement is true, let x ∈ X be fixed. Then x∗ ∈ Pr_{X∗}(∂Φ^A(x, 0))
if and only if there exists y∗ ∈ Y∗ such that (x∗, y∗) ∈ ∂Φ^A(x, 0) or, equivalently,
Φ^A(x, 0) + (Φ^A)∗(x∗, y∗) = ⟨x∗, x⟩. In other words,

\[ f(x) + g(Ax) + f^*(x^* - A^* y^*) + g^*(y^*) = \langle x^*, x\rangle, \]

which is further equivalent to

\[ f(x) + f^*(x^* - A^* y^*) - \langle x^* - A^* y^*, x\rangle + g(Ax) + g^*(y^*) - \langle y^*, Ax\rangle = 0. \]

By taking into account the Young-Fenchel inequality, this holds if and only if

\[ f(x) + f^*(x^* - A^* y^*) - \langle x^* - A^* y^*, x\rangle = 0 \quad \text{and} \quad g(Ax) + g^*(y^*) - \langle y^*, Ax\rangle = 0 \]

or, equivalently,

\[ x^* - A^* y^* \in \partial f(x) \quad \text{and} \quad y^* \in \partial g(Ax). \]

This shows that x∗ ∈ Pr_{X∗}(∂Φ^A(x, 0)) if and only if x∗ ∈ ∂f(x) + A∗∂g(Ax) and
concludes the proof. □

In case X = Y and A = Id_X, the identity operator on X, the primal
optimization problem (P^A) becomes

\[ (P^F) \qquad \inf_{x \in X} \{f(x) + g(x)\}. \]

The conjugate of the perturbation function

Φ^F : X × X → R, Φ^F(x, y) = f(x) + g(x + y),

reads

\[ (\Phi^F)^*(x^*, y^*) = f^*(x^* - y^*) + g^*(y^*) \quad \forall (x^*, y^*) \in X^* \times X^* \]

and gives rise to the following so-called Fenchel dual problem to (P^F)

\[ (D^F) \qquad \sup_{y^* \in X^*} \{-f^*(-y^*) - g^*(y^*)\}. \]

For (P^F) and (D^F) weak duality always holds.
The generalized interior point regularity conditions stated for the primal-dual
pair (P^A) − (D^A) become

(RC1^F) ∃x′ ∈ dom f ∩ dom g such that g (or f) is continuous at x′,

in case X is a Banach space

(RC2^F) X is a Banach space, f and g are lower semicontinuous
and 0 ∈ core(dom g − dom f),

(RC2′^F) X is a Banach space, f and g are lower semicontinuous
and 0 ∈ int(dom g − dom f),

and
(RC3^F) X is a Banach space, f and g are lower semicontinuous
and 0 ∈ sqri(dom g − dom f),
and in case X is finite-dimensional

(RC4^F) X is finite-dimensional and ri(dom g) ∩ ri(dom f) ≠ ∅,

respectively. The regularity condition (RC3^F) was introduced by Attouch and
Brézis and bears their names, while (RC2^F) and (RC2′^F) are called the
Moreau-Rockafellar regularity conditions.
The closedness-type regularity condition (RC5^A) reads in case of the primal-dual
pair (P^F) − (D^F)

(RC5^F) f and g are lower semicontinuous
and epi f∗ + epi g∗ is weakly∗ closed.

Theorem 10.1 gives rise to the following result.

Theorem 10.2 Let $f, g : X \to \overline{\mathbb{R}}$ be proper and convex functions such that $\operatorname{dom} f \cap \operatorname{dom} g \neq \emptyset$. If one of the regularity conditions $(RC_i^F)$, $i \in \{1, 2, 2', 3, 4, 5\}$, is fulfilled, then the following statements are true:
(a) $(f+g)^*(x^*) = \min_{y^* \in X^*} \{ f^*(x^* - y^*) + g^*(y^*) \} = (f^* \,\square\, g^*)(x^*)$ for all $x^* \in X^*$;

(b) $\partial (f+g)(x) = \partial f(x) + \partial g(x)$ for all $x \in X$.


In particular, for (P F ) and (DF ) strong duality holds.
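As a hedged numerical illustration of statement (a), the following Python sketch compares both sides of the conjugate formula on a grid. The choice $f(x) = g(x) = \frac12 x^2$ on $X = \mathbb{R}$ is a hypothetical example, not one taken from these notes; both functions are continuous, so $(RC_1^F)$ holds, and both sides should equal $(x^*)^2/4$. The helper names and the grid are assumptions made only for this illustration.

```python
# Grid check of (f + g)^* = f^* [] g^* for the assumed example f(x) = g(x) = x^2 / 2.

def f(x):
    return 0.5 * x * x

g = f  # same function, chosen only for illustration


def f_conj(s):
    # The conjugate of x^2 / 2 is s^2 / 2.
    return 0.5 * s * s

g_conj = f_conj

# Finite grid on [-10, 10]; sup and inf are only approximated on it.
GRID = [i / 100.0 for i in range(-1000, 1001)]


def conjugate_on_grid(func, slope, grid=GRID):
    """Approximate func^*(slope) = sup_x { slope * x - func(x) } on the grid."""
    return max(slope * x - func(x) for x in grid)


def inf_convolution_on_grid(phi, psi, slope, grid=GRID):
    """Approximate (phi [] psi)(slope) = inf_z { phi(slope - z) + psi(z) } on the grid."""
    return min(phi(slope - z) + psi(z) for z in grid)


def compare(slope):
    """Return the pair ((f + g)^*(slope), (f^* [] g^*)(slope)), both on the grid."""
    lhs = conjugate_on_grid(lambda x: f(x) + g(x), slope)
    rhs = inf_convolution_on_grid(f_conj, g_conj, slope)
    return lhs, rhs
```

For slopes whose optimizer lies on the grid, for example `compare(1.0)`, both entries agree with the closed form up to rounding.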

Example 10.3 (a) Let $X$ and $Y$ be nontrivial real Banach spaces and $f = \delta_{X \times \{0\}}$, $g = \delta_{\{0\} \times Y} : X \times Y \to \overline{\mathbb{R}}$, which are proper, convex and lower semicontinuous functions. It holds $\operatorname{dom} f \cap \operatorname{dom} g = \{(0,0)\}$ and $v(P^F) = \inf_{(x,y) \in X \times Y} \{ f(x,y) + g(x,y) \} = 0$. Since $f^* = \delta_{\{0\} \times Y^*}$ and $g^* = \delta_{X^* \times \{0\}}$, it holds $v(D^F) = 0$ and $(\bar x^*, \bar y^*) = (0,0)$ is an optimal solution of the dual, which means that for $(P^F)$ and $(D^F)$ strong duality holds. Since neither $f$ nor $g$ is continuous at $(0,0)$, the regularity condition $(RC_1^F)$ is not fulfilled. On the other hand, since $\operatorname{dom} g - \operatorname{dom} f = X \times Y$, we have $0 \in \operatorname{core}(\operatorname{dom} g - \operatorname{dom} f)$, thus the regularity condition $(RC_2^F)$ is fulfilled.
(b) Let $X$ be an infinite-dimensional real Banach space and $x^\sharp : X \to \mathbb{R}$ a noncontinuous linear functional. The functions $f = x^\sharp$ and $g = -x^\sharp$ are proper and convex and have full domain, which means that $0 \in \operatorname{core}(\operatorname{dom} f - \operatorname{dom} g)$. Their conjugate functions are identically $+\infty$, thus $v(D^F) = -\infty$. On the other hand, $v(P^F) = 0$. This shows that for the regularity conditions $(RC_i^F)$, $i \in \{2, 2', 3\}$, the assumption that the functions $f$ and $g$ are lower semicontinuous cannot be omitted.

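Example 10.4 (a) The functions $f = \delta_{[-1,+\infty) \times \{0\}}$, $g = \delta_{(-\infty,1] \times \{0\}} : \mathbb{R}^2 \to \overline{\mathbb{R}}$ are proper, convex and lower semicontinuous, $(0,0) \in \operatorname{dom} f \cap \operatorname{dom} g$ and it holds $v(P^F) = \inf_{x \in \mathbb{R}^2} \{ f(x) + g(x) \} = 0$. We have

\[ f^* : \mathbb{R}^2 \to \overline{\mathbb{R}}, \quad f^*(x_1^*, x_2^*) = \begin{cases} -x_1^*, & \text{if } x_1^* \le 0, \\ +\infty, & \text{if } x_1^* > 0, \end{cases} \tag{10.1} \]

and

\[ g^* : \mathbb{R}^2 \to \overline{\mathbb{R}}, \quad g^*(x_1^*, x_2^*) = \begin{cases} x_1^*, & \text{if } x_1^* \ge 0, \\ +\infty, & \text{if } x_1^* < 0, \end{cases} \tag{10.2} \]

thus

\[ v(D^F) = \sup_{(x_1^*, x_2^*) \in \mathbb{R}^2} \{ -f^*(-x_1^*, -x_2^*) - g^*(x_1^*, x_2^*) \} = \sup_{(x_1^*, x_2^*) \in \mathbb{R}_+ \times \mathbb{R}} \{ -2x_1^* \} = 0 \]

and every $(0, x_2^*)$, with $x_2^* \in \mathbb{R}$, is an optimal solution of the dual. This shows that for $(P^F)$ and $(D^F)$ strong duality holds. Since $\operatorname{cone}(\operatorname{dom} g - \operatorname{dom} f) = \mathbb{R} \times \{0\}$,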

neither $(RC_1^F)$ nor $(RC_2^F)$ is fulfilled. However, the regularity condition $(RC_3^F)$ is fulfilled.
(b) The functions $f = \delta_{(-1,+\infty) \times \{0\}}$, $g = \delta_{(-\infty,1) \times \{0\}} : \mathbb{R}^2 \to \overline{\mathbb{R}}$ are proper, convex and not lower semicontinuous, $(0,0) \in \operatorname{dom} f \cap \operatorname{dom} g$ and it holds $v(P^F) = \inf_{x \in \mathbb{R}^2} \{ f(x) + g(x) \} = 0$. Their conjugate functions are given by (10.1) and (10.2), respectively, thus for $(P^F)$ and $(D^F)$ strong duality holds. None of the regularity conditions $(RC_i^F)$, $i \in \{1, 2, 2', 3\}$, is fulfilled; however, since $\operatorname{ri}(\operatorname{dom} g) \cap \operatorname{ri}(\operatorname{dom} f) = (-1,1) \times \{0\}$, the regularity condition $(RC_4^F)$ holds.

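Example 10.5 Let $\ell^2 = \left\{ (x_n)_{n \in \mathbb{N}} \subseteq \mathbb{R} \mid \sum_{n \in \mathbb{N}} |x_n|^2 < +\infty \right\}$, consider the convex and closed sets $C = \{ (x_n)_{n \in \mathbb{N}} \in \ell^2 \mid x_{2n-1} + x_{2n} = 0 \ \forall n \in \mathbb{N} \}$ and $D = \{ (x_n)_{n \in \mathbb{N}} \in \ell^2 \mid x_{2n} + x_{2n+1} = 0 \ \forall n \in \mathbb{N} \}$, and the proper, convex and lower semicontinuous functions $f, g : \ell^2 \to \overline{\mathbb{R}}$, $f = \delta_C$ and, for $x = (x_n)_{n \in \mathbb{N}}$, $g(x) = x_1 + \delta_D(x)$. It holds $v(P^F) = \inf_{x \in \ell^2} \{ f(x) + g(x) \} = 0$ and $v(D^F) = \sup_{x^* \in \ell^2} \{ -f^*(-x^*) - g^*(x^*) \} = -\infty$, while $0 \notin \operatorname{sqri}(\operatorname{dom} g - \operatorname{dom} f)$ and $0 \in \operatorname{icr}(\operatorname{dom} g - \operatorname{dom} f)$. This shows that the condition $0 \in \operatorname{sqri}(\operatorname{dom} g - \operatorname{dom} f)$ in $(RC_3^F)$ cannot be replaced by the weaker condition $0 \in \operatorname{icr}(\operatorname{dom} g - \operatorname{dom} f)$.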

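Example 10.6 (closedness-type versus generalized interior point regularity conditions for Fenchel duality) Let $f, g : \mathbb{R} \to \overline{\mathbb{R}}$, $f(x) = \frac12 x^2 + \delta_{\mathbb{R}_+}(x)$ and $g = \delta_{\mathbb{R}_-}$, be proper, convex and lower semicontinuous functions. Then $\operatorname{dom} f \cap \operatorname{dom} g = \{0\}$ and $v(P^F) = 0$. It holds

\[ f^*(x^*) = \begin{cases} \frac12 (x^*)^2, & \text{if } x^* \ge 0, \\ 0, & \text{if } x^* < 0, \end{cases} \quad \text{and} \quad g^* = \delta_{\mathbb{R}_+}, \]

thus $v(D^F) = 0$ and every $x^* \ge 0$ is an optimal solution of the dual. Since $\operatorname{dom} g - \operatorname{dom} f = \mathbb{R}_-$, none of the regularity conditions $(RC_i^F)$, $i \in \{1, 2, 2', 3, 4\}$, is fulfilled. On the other hand, $\operatorname{epi} f^* + \operatorname{epi} g^* = \mathbb{R} \times \mathbb{R}_+$, which means that the closedness-type regularity condition $(RC_5^F)$ is verified.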
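The values in Example 10.6 can be reproduced with a small Python sketch. It hard-codes the functions and conjugates stated in the example and evaluates the primal and dual objectives on a finite grid; the grid and the function names are assumptions made for illustration, and the grid search only approximates the infimum and the supremum.

```python
import math

INF = math.inf


def f(x):
    # f(x) = x^2 / 2 + indicator of R_+
    return 0.5 * x * x if x >= 0 else INF


def g(x):
    # g = indicator of R_-
    return 0.0 if x <= 0 else INF


def f_conj(s):
    # Conjugate of f as stated in Example 10.6
    return 0.5 * s * s if s >= 0 else 0.0


def g_conj(s):
    # Conjugate of g: indicator of R_+
    return 0.0 if s >= 0 else INF


def dual_objective(s):
    """Objective of the Fenchel dual: -f^*(-s) - g^*(s)."""
    return -f_conj(-s) - g_conj(s)


GRID = [i / 10.0 for i in range(-50, 51)]  # contains 0 exactly


def primal_value(grid=GRID):
    return min(f(x) + g(x) for x in grid)


def dual_value(grid=GRID):
    return max(dual_objective(s) for s in grid)
```

Both `primal_value()` and `dual_value()` evaluate to zero, and `dual_objective(s)` is zero for every `s >= 0`, in line with the statement that every nonnegative dual variable is optimal.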

Example 10.7 For $C = \mathbb{R}_+ \times \mathbb{R}$ and $D = \{ (x_1, x_2) \in \mathbb{R}^2 \mid 2x_1 + x_2^2 \le 0 \}$, the functions $f = \delta_C$, $g = \delta_D : \mathbb{R}^2 \to \overline{\mathbb{R}}$ are proper, convex and lower semicontinuous and fulfill $\operatorname{dom} f \cap \operatorname{dom} g = \{(0,0)\}$ and $v(P^F) = \inf_{(x_1,x_2) \in \mathbb{R}^2} \{ f(x_1,x_2) + g(x_1,x_2) \} = 0$. It holds

\[ f^* = \delta_{\mathbb{R}_- \times \{0\}} \quad \text{and} \quad g^*(x_1^*, x_2^*) = \begin{cases} \dfrac{(x_2^*)^2}{2x_1^*}, & \text{if } x_1^* > 0, \\ 0, & \text{if } x_1^* = x_2^* = 0, \\ +\infty, & \text{otherwise}, \end{cases} \]

and $v(D^F) = \sup_{(x_1^*,x_2^*) \in \mathbb{R}^2} \{ -f^*(-x_1^*,-x_2^*) - g^*(x_1^*,x_2^*) \} = 0$, and $(\bar x_1^*, \bar x_2^*) = (0,0)$ is an optimal solution of the dual. This means that, despite the fact that none of the regularity conditions $(RC_i^F)$, $i \in \{1, 2, 2', 3, 4, 5\}$, is fulfilled, for $(P^F)$ and $(D^F)$ strong duality holds.

In case $f(x) = 0$ for all $x \in X$, the primal optimization problem $(P^A)$ becomes

\[ (P^{A_g}) \quad \inf_{x \in X} (g \circ A)(x). \]

The conjugate of the perturbation function

\[ \Phi^{A_g} : X \times Y \to \overline{\mathbb{R}}, \quad \Phi^{A_g}(x,y) = g(Ax + y), \]

reads for all $(x^*, y^*) \in X^* \times Y^*$

\[ (\Phi^{A_g})^*(x^*, y^*) = f^*(x^* - A^*y^*) + g^*(y^*) = \begin{cases} g^*(y^*), & \text{if } x^* = A^*y^*, \\ +\infty, & \text{otherwise}, \end{cases} \]

and gives rise to the following dual problem to $(P^{A_g})$

\[ (D^{A_g}) \quad \sup_{y^* \in Y^*, \; A^*y^* = 0} \{ -g^*(y^*) \}. \]

For (P Ag ) and (DAg ) weak duality always holds.


The generalized interior point regularity conditions stated for the primal-dual pair $(P^A) - (D^A)$ become

$(RC_1^{A_g})$  $\exists x_0 \in A^{-1}(\operatorname{dom} g)$ such that $g$ is continuous at $Ax_0$,

in case $X$ and $Y$ are Banach spaces

$(RC_2^{A_g})$  $X$ and $Y$ are Banach spaces, $g$ is lower semicontinuous and $0 \in \operatorname{core}(\operatorname{dom} g - \operatorname{ran} A)$,

$(RC_{2'}^{A_g})$  $X$ and $Y$ are Banach spaces, $g$ is lower semicontinuous and $0 \in \operatorname{int}(\operatorname{dom} g - \operatorname{ran} A)$,

and

$(RC_3^{A_g})$  $X$ and $Y$ are Banach spaces, $g$ is lower semicontinuous and $0 \in \operatorname{sqri}(\operatorname{dom} g - \operatorname{ran} A)$,

and in case $Y$ is finite-dimensional

$(RC_4^{A_g})$  $Y$ is finite-dimensional and $\operatorname{ri}(\operatorname{dom} g) \cap \operatorname{ran} A \neq \emptyset$,

respectively, while the closedness-type regularity condition reads

$(RC_5^{A_g})$  $g$ is lower semicontinuous and $(A^* \times \mathrm{Id}_{\mathbb{R}})(\operatorname{epi} g^*)$ is weakly$^*$ closed.

Theorem 10.1 leads to the following result.



Theorem 10.8 Let $g : Y \to \overline{\mathbb{R}}$ be a proper and convex function and $A : X \to Y$ a continuous linear operator such that $\operatorname{ran} A \cap \operatorname{dom} g \neq \emptyset$. If one of the regularity conditions $(RC_i^{A_g})$, $i \in \{1, 2, 2', 3, 4, 5\}$, is fulfilled, then the following statements are true:
(a) $(g \circ A)^*(x^*) = \min_{y^* \in Y^*, \, A^*y^* = x^*} \{ g^*(y^*) \}$ for all $x^* \in X^*$;
(b) $\partial (g \circ A)(x) = A^* \partial g(Ax)$ for all $x \in X$.
In particular, for $(P^{A_g})$ and $(D^{A_g})$ strong duality holds.
We close this section by considering a primal-dual pair of optimization problems which is introduced as a special instance of $(P^{A_g}) - (D^{A_g})$. For $m \ge 2$, let $f_i : X \to \overline{\mathbb{R}}$, $i = 1, ..., m$, be proper functions such that $\cap_{i=1}^m \operatorname{dom} f_i \neq \emptyset$. We denote $X^m := \underbrace{X \times ... \times X}_{m}$. Then $g : X^m \to \overline{\mathbb{R}}$, $g(x_1, ..., x_m) = \sum_{i=1}^m f_i(x_i)$, is a proper function and $A : X \to X^m$, $Ax = (x, ..., x)$, is a continuous linear operator such that $\operatorname{ran} A \cap \operatorname{dom} g \neq \emptyset$.
For this choice, the primal optimization problem $(P^{A_g})$ becomes

\[ (P^\Sigma) \quad \inf_{x \in X} \left\{ \sum_{i=1}^m f_i(x) \right\}. \]

For all $(x_1^*, ..., x_m^*) \in (X^*)^m$, $g^*(x_1^*, ..., x_m^*) = \sum_{i=1}^m f_i^*(x_i^*)$ and $A^*(x_1^*, ..., x_m^*) = \sum_{i=1}^m x_i^*$, thus the dual $(D^{A_g})$ is nothing else than

\[ (D^\Sigma) \quad \sup_{\substack{x_i^* \in X^*, \, i = 1, ..., m, \\ \sum_{i=1}^m x_i^* = 0}} \left\{ -\sum_{i=1}^m f_i^*(x_i^*) \right\}. \]

For $(P^\Sigma)$ and $(D^\Sigma)$ weak duality always holds.
As $\operatorname{dom} g = \prod_{i=1}^m \operatorname{dom} f_i$ and $A^{-1}(\operatorname{dom} g) = \cap_{i=1}^m \operatorname{dom} f_i$, the regularity condition $(RC_1^{A_g})$ reads

$(RC_1^\Sigma)$  $\exists x_0 \in \bigcap_{i=1}^m \operatorname{dom} f_i$ such that $f_i$ is continuous at $x_0$, $i = 1, ..., m$.

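We have $\operatorname{ran} A = \{ (x, ..., x) \mid x \in X \} = \Delta_{X^m}$, which denotes the diagonal set of $X^m$. The regularity conditions $(RC_i^{A_g})$, $i \in \{2, 2', 3, 4\}$, lead to

$(RC_2^\Sigma)$  $X$ is a Banach space, $f_i$, $i = 1, ..., m$, are lower semicontinuous and $0 \in \operatorname{core}\left( \prod_{i=1}^m \operatorname{dom} f_i - \Delta_{X^m} \right)$,

$(RC_{2'}^\Sigma)$  $X$ is a Banach space, $f_i$, $i = 1, ..., m$, are lower semicontinuous and $0 \in \operatorname{int}\left( \prod_{i=1}^m \operatorname{dom} f_i - \Delta_{X^m} \right)$,

$(RC_3^\Sigma)$  $X$ is a Banach space, $f_i$, $i = 1, ..., m$, are lower semicontinuous and $0 \in \operatorname{sqri}\left( \prod_{i=1}^m \operatorname{dom} f_i - \Delta_{X^m} \right)$,

and

$(RC_4^\Sigma)$  $X$ is finite-dimensional and $\bigcap_{i=1}^m \operatorname{ri}(\operatorname{dom} f_i) \neq \emptyset$,

respectively.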
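We notice that

\[ \begin{aligned} (x^*, r) \in (A^* \times \mathrm{Id}_{\mathbb{R}})(\operatorname{epi} g^*) &\Leftrightarrow \exists (x_1^*, ..., x_m^*) \in (X^*)^m \text{ such that } g^*(x_1^*, ..., x_m^*) \le r \text{ and } A^*(x_1^*, ..., x_m^*) = x^* \\ &\Leftrightarrow \exists (x_1^*, ..., x_m^*) \in (X^*)^m \text{ such that } \sum_{i=1}^m f_i^*(x_i^*) \le r \text{ and } \sum_{i=1}^m x_i^* = x^* \\ &\Leftrightarrow (x^*, r) \in \sum_{i=1}^m \operatorname{epi} f_i^*, \end{aligned} \]

thus the closedness-type regularity condition $(RC_5^{A_g})$ reads

$(RC_5^\Sigma)$  $f_i$, $i = 1, ..., m$, are lower semicontinuous and $\sum_{i=1}^m \operatorname{epi} f_i^*$ is weakly$^*$ closed.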

Theorem 10.8 leads to the following result.

Theorem 10.9 Let $f_i : X \to \overline{\mathbb{R}}$, $i = 1, ..., m$, be proper and convex functions such that $\cap_{i=1}^m \operatorname{dom} f_i \neq \emptyset$. If one of the regularity conditions $(RC_i^\Sigma)$, $i \in \{1, 2, 2', 3, 4, 5\}$, is fulfilled, then the following statements are true:

(a) $\left( \sum_{i=1}^m f_i \right)^*(x^*) = \min \left\{ \sum_{i=1}^m f_i^*(x_i^*) \mid \sum_{i=1}^m x_i^* = x^* \right\} = (f_1^* \,\square\, ... \,\square\, f_m^*)(x^*)$ for all $x^* \in X^*$;

(b) $\partial \left( \sum_{i=1}^m f_i \right)(x) = \sum_{i=1}^m \partial f_i(x)$ for all $x \in X$.

In particular, for (P Σ ) and (DΣ ) strong duality holds.

11 Lagrange duality
In this section, by using the general perturbation approach, we develop a duality
framework for the constrained optimization problem

\[ (P^L) \quad \inf_{\substack{x \in S, \\ g(x) \leqq_C 0}} f(x). \]

Here, $X$ and $Z$ are nontrivial real normed spaces, $C \subseteq Z$ is a nonempty proper ($C \cap -C = \{0\}$) convex cone, $S \subseteq X$ is a nonempty set, $f : X \to \overline{\mathbb{R}}$ is a proper function and $g : X \to Z$ is a vector function such that $\operatorname{dom} f \cap S \cap g^{-1}(-C) \neq \emptyset$. By "$\leqq_C$" we denote the partial order on $Z$ defined for all $u, v \in Z$ by
\[ u \leqq_C v \Leftrightarrow v - u \in C. \]
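We will consider as perturbation function to this problem

\[ \Phi^L : X \times Z \to \overline{\mathbb{R}}, \quad \Phi^L(x, z) = \begin{cases} f(x), & \text{if } x \in S, \ g(x) \leqq_C z, \\ +\infty, & \text{otherwise}. \end{cases} \]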
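Its conjugate function $(\Phi^L)^* : X^* \times Z^* \to \overline{\mathbb{R}}$ reads for all $(x^*, z^*) \in X^* \times Z^*$

\[ \begin{aligned} (\Phi^L)^*(x^*, z^*) &= \sup_{\substack{x \in S, \, z \in Z, \\ g(x) - z \in -C}} \{ \langle x^*, x \rangle + \langle z^*, z \rangle - f(x) \} \\ &= \sup_{s \in -C} \langle -z^*, s \rangle + \sup_{x \in S} \{ \langle x^*, x \rangle + \langle z^*, g(x) \rangle - f(x) \} \\ &= \begin{cases} \sup_{x \in S} \{ \langle x^*, x \rangle + \langle z^*, g(x) \rangle - f(x) \}, & \text{if } z^* \in -C^*, \\ +\infty, & \text{otherwise}, \end{cases} \end{aligned} \]

where $C^* = \{ z^* \in Z^* \mid \langle z^*, c \rangle \ge 0 \ \forall c \in C \}$ denotes the dual cone of $C$, and gives rise to the following so-called Lagrange dual problem to $(P^L)$

\[ (D^L) \quad \sup_{z^* \in -C^*} \inf_{x \in S} \{ f(x) - \langle z^*, g(x) \rangle \}, \]

or, equivalently,

\[ (D^L) \quad \sup_{z^* \in C^*} \inf_{x \in S} \{ f(x) + \langle z^*, g(x) \rangle \}. \]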

By the weak duality theorem we have v(DL ) ≤ v(P L ).


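The perturbation function $\Phi^L$ is proper. If $S$ is a convex set, $f$ a convex function and $g$ a $C$-convex function, namely,
\[ g(\lambda x + (1 - \lambda) y) \leqq_C \lambda g(x) + (1 - \lambda) g(y) \quad \forall x, y \in X \ \forall \lambda \in [0, 1] \]
or, equivalently, its $C$-epigraph $\operatorname{epi}_C g = \{ (x, z) \in X \times Z \mid g(x) \leqq_C z \}$ is a convex set, then $\Phi^L$ is convex. This can be easily noticed from the fact that
\[ \operatorname{epi} \Phi^L = \{ (x, z, r) \in X \times Z \times \mathbb{R} \mid (x, r) \in \operatorname{epi} f \} \cap (S \times Z \times \mathbb{R}) \cap (\operatorname{epi}_C g \times \mathbb{R}). \]
The feasibility condition guarantees that
\[ 0 \in \operatorname{Pr}_Z(\operatorname{dom} \Phi^L) = \{ z \in Z \mid \exists x \in \operatorname{dom} f \cap S \text{ such that } z \in g(x) + C \} = g(\operatorname{dom} f \cap S) + C. \]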
The regularity condition $(RC_1^\Phi)$ states in this particular case that there exists $x_0 \in \operatorname{dom} f \cap S \cap g^{-1}(-C)$ such that the function $z \mapsto f(x_0) + \delta_S(x_0) + \delta_{g(x_0) + C}(z)$ is continuous at $0$, which is the same as asking that there exists $x_0 \in \operatorname{dom} f \cap S \cap g^{-1}(-C)$ such that $0 \in \operatorname{int}(g(x_0) + C)$ or, equivalently, that

$(RC_1^L)$  $\exists x_0 \in \operatorname{dom} f \cap S$ such that $g(x_0) \in -\operatorname{int} C$.


This is nothing else than the classical Slater constraint qualification.
If S is a closed set, f is a lower semicontinuous function and g is a C-epi closed
function, which means that its C-epigraph is a closed set, then epi ΦL is a closed
set, in other words, ΦL is lower semicontinuous. If C is closed and hz ∗ , g(·)i is
lower semicontinuous for all z ∗ ∈ C ∗ , then g is C-epi closed.
In Banach spaces we obtain from $(RC_2^\Phi)$ and $(RC_{2'}^\Phi)$ the following regularity conditions

$(RC_2^L)$  $X$ and $Z$ are Banach spaces, $S$ is closed, $f$ is lower semicontinuous, $g$ is $C$-epi closed and $0 \in \operatorname{core}(g(\operatorname{dom} f \cap S) + C)$

and

$(RC_{2'}^L)$  $X$ and $Z$ are Banach spaces, $S$ is closed, $f$ is lower semicontinuous, $g$ is $C$-epi closed and $0 \in \operatorname{int}(g(\operatorname{dom} f \cap S) + C)$,

respectively, while the regularity condition $(RC_3^\Phi)$ leads to

$(RC_3^L)$  $X$ and $Z$ are Banach spaces, $S$ is closed, $f$ is lower semicontinuous, $g$ is $C$-epi closed and $0 \in \operatorname{sqri}(g(\operatorname{dom} f \cap S) + C)$.

The regularity condition $(RC_4^\Phi)$ becomes

$(RC_4^L)$  $Z$ is finite-dimensional and $0 \in \operatorname{ri}(g(\operatorname{dom} f \cap S) + C)$.
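In order to formulate a closedness-type regularity condition for $(P^L) - (D^L)$, we will need to calculate $\operatorname{Pr}_{X^* \times \mathbb{R}}(\operatorname{epi}(\Phi^L)^*)$. We have

\[ \begin{aligned} (x^*, r) \in \operatorname{Pr}_{X^* \times \mathbb{R}}(\operatorname{epi}(\Phi^L)^*) &\Leftrightarrow \exists z^* \in -C^* \text{ such that } \sup_{x \in S} \{ \langle x^*, x \rangle + \langle z^*, g(x) \rangle - f(x) \} \le r \\ &\Leftrightarrow \exists z^* \in C^* \text{ such that } \sup_{x \in X} \{ \langle x^*, x \rangle - \langle z^*, g(x) \rangle - f(x) - \delta_S(x) \} \le r \\ &\Leftrightarrow \exists z^* \in C^* \text{ such that } (f + \langle z^*, g(\cdot) \rangle + \delta_S)^*(x^*) \le r \\ &\Leftrightarrow (x^*, r) \in \cup_{z^* \in C^*} \operatorname{epi}(f + \langle z^*, g(\cdot) \rangle + \delta_S)^*. \end{aligned} \]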


Consequently, the regularity condition (RC5Φ ) leads to
(RC5L ) S is closed, f is lower semicontinuous, g is C-epi closed and
∪z∗ ∈C ∗ epi(f + hz ∗ , g(·)i + δS )∗ is weakly∗ closed.
By combining Theorem 8.12 and Theorem 9.4, we obtain the following result.
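Theorem 11.1 Let $S \subseteq X$ be a nonempty convex set, $f : X \to \overline{\mathbb{R}}$ a proper and convex function and $g : X \to Z$ a $C$-convex function such that $\operatorname{dom} f \cap S \cap g^{-1}(-C) \neq \emptyset$. If one of the regularity conditions $(RC_i^L)$, $i \in \{1, 2, 2', 3, 4, 5\}$, is fulfilled, then the following statements are true:

(a) $\inf_{x \in S, \, g(x) \leqq_C 0} \{ f(x) - \langle x^*, x \rangle \} = \max_{z^* \in C^*} \inf_{x \in S} \{ f(x) + \langle z^*, g(x) \rangle - \langle x^*, x \rangle \}$ for all $x^* \in X^*$;
(b) $\partial \left( f + \delta_{S \cap g^{-1}(-C)} \right)(x) = \bigcup_{z^* \in C^*, \, \langle z^*, g(x) \rangle = 0} \partial \left( f + \langle z^*, g(\cdot) \rangle + \delta_S \right)(x)$ for all $x \in S \cap g^{-1}(-C)$.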
In particular, for (P L ) and (DL ) strong duality holds.
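Proof. The two statements follow from the formula of the conjugate function of $\Phi^L$ and the fact that
\[ \operatorname{Pr}_{X^*}(\partial \Phi^L(x, 0)) = \bigcup_{z^* \in C^*, \, \langle z^*, g(x) \rangle = 0} \partial \left( f + \langle z^*, g(\cdot) \rangle + \delta_S \right)(x) \quad \forall x \in S \cap g^{-1}(-C). \]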

To see that the last statement is true, let $x \in S \cap g^{-1}(-C)$ be fixed. Then $x^* \in \operatorname{Pr}_{X^*}(\partial \Phi^L(x, 0))$ if and only if there exists $-z^* \in Z^*$ such that $(x^*, -z^*) \in \partial \Phi^L(x, 0)$ or, equivalently, $\Phi^L(x, 0) + (\Phi^L)^*(x^*, -z^*) = \langle x^*, x \rangle$. This is nothing else than $z^* \in C^*$ and
\[ f(x) + (f + \langle z^*, g(\cdot) \rangle + \delta_S)^*(x^*) = \langle x^*, x \rangle, \]
which, by taking into account the Young-Fenchel inequality, is further equivalent to $z^* \in C^*$, $\langle z^*, g(x) \rangle = 0$ and
\[ f(x) + \langle z^*, g(x) \rangle + \delta_S(x) + (f + \langle z^*, g(\cdot) \rangle + \delta_S)^*(x^*) = \langle x^*, x \rangle. \]
This shows that $x^* \in \operatorname{Pr}_{X^*}(\partial \Phi^L(x, 0))$ if and only if there exists $z^* \in C^*$ such that $\langle z^*, g(x) \rangle = 0$ and $x^* \in \partial (f + \langle z^*, g(\cdot) \rangle + \delta_S)(x)$, which concludes the proof. □
In case $Z = \mathbb{R}^{m+k}$, $C = \mathbb{R}^m_+ \times \{0_{\mathbb{R}^k}\}$, which is a proper and convex cone, and $g : X \to \mathbb{R}^{m+k}$, $g(x) = (g_1(x), ..., g_m(x), h_1(x), ..., h_k(x))$, $(P^L)$ becomes the following optimization problem with inequality and equality constraints

\[ (P^L) \quad \inf_{\substack{x \in S, \\ g_i(x) \le 0, \ i = 1, ..., m, \\ h_j(x) = 0, \ j = 1, ..., k}} f(x). \]

Since $C^* = \mathbb{R}^m_+ \times \mathbb{R}^k$, its Lagrange dual problem reads

\[ (D^L) \quad \sup_{\substack{\lambda_i \ge 0, \ i = 1, ..., m, \\ \mu_j \in \mathbb{R}, \ j = 1, ..., k}} \ \inf_{x \in S} \left\{ f(x) + \sum_{i=1}^m \lambda_i g_i(x) + \sum_{j=1}^k \mu_j h_j(x) \right\}. \]
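If $g_i : X \to \mathbb{R}$, $i = 1, ..., m$, are convex and $h_j : X \to \mathbb{R}$, $j = 1, ..., k$, are affine, then $g : X \to \mathbb{R}^{m+k}$, $g(x) = (g_1(x), ..., g_m(x), h_1(x), ..., h_k(x))$, is $C$-convex. One can easily notice that $(RC_1^L)$ is not fulfilled. However, if

\[ 0 \in \operatorname{ri}\left( g(\operatorname{dom} f \cap S) + \mathbb{R}^m_+ \times \{0_{\mathbb{R}^k}\} \right), \tag{11.1} \]

then for $(P^L)$ and $(D^L)$ strong duality holds and for all $x \in S \cap g^{-1}(-\mathbb{R}^m_+ \times \{0_{\mathbb{R}^k}\})$ we have

\[ \partial \left( f + \delta_{S \cap g^{-1}(-\mathbb{R}^m_+ \times \{0_{\mathbb{R}^k}\})} \right)(x) = \bigcup_{\substack{\lambda_i \ge 0, \ \lambda_i g_i(x) = 0, \ i = 1, ..., m, \\ \mu_j \in \mathbb{R}, \ j = 1, ..., k}} \partial \left( f + \sum_{i=1}^m \lambda_i g_i + \sum_{j=1}^k \mu_j h_j + \delta_S \right)(x). \]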

This means that, if (11.1) is fulfilled, then $\bar x$ is an optimal solution of $(P^L)$ if and only if there exist so-called Lagrange multipliers $\bar\lambda_i \ge 0$, $i = 1, ..., m$, and $\bar\mu_j \in \mathbb{R}$, $j = 1, ..., k$, such that the so-called Karush-Kuhn-Tucker system of optimality conditions

\[ \begin{cases} \bar x \in S, \ g_i(\bar x) \le 0, \ \bar\lambda_i g_i(\bar x) = 0, \ i = 1, ..., m, \\ h_j(\bar x) = 0, \ j = 1, ..., k, \\ 0 \in \partial \left( f + \sum_{i=1}^m \bar\lambda_i g_i + \sum_{j=1}^k \bar\mu_j h_j + \delta_S \right)(\bar x) \end{cases} \]

is true.
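To make the Karush-Kuhn-Tucker system concrete, here is a minimal Python sketch for a hypothetical smooth instance that is not taken from these notes: $X = S = \mathbb{R}^2$, $f(x) = x_1^2 + x_2^2$, one inequality constraint $g_1(x) = 1 - x_1 - x_2 \le 0$ and one equality constraint $h_1(x) = x_1 - x_2 = 0$. Since all functions are differentiable and $S$ is the whole space, the subdifferential inclusion is assumed to reduce to the gradient equation $\nabla f(\bar x) + \bar\lambda_1 \nabla g_1(\bar x) + \bar\mu_1 \nabla h_1(\bar x) = 0$. The point $x_0 = (1, 1)$ satisfies the weak Slater condition for this instance.

```python
def kkt_residuals(x, lam, mu):
    """Residuals of the KKT system for the assumed instance
    f(x) = x1^2 + x2^2, g1(x) = 1 - x1 - x2 <= 0, h1(x) = x1 - x2 = 0."""
    x1, x2 = x
    g1 = 1.0 - x1 - x2
    h1 = x1 - x2
    # grad f + lam * grad g1 + mu * grad h1, with grad g1 = (-1, -1), grad h1 = (1, -1)
    stationarity = (2.0 * x1 - lam + mu, 2.0 * x2 - lam - mu)
    return {
        "primal_inequality": max(g1, 0.0),
        "primal_equality": abs(h1),
        "dual_feasibility": max(-lam, 0.0),
        "complementarity": abs(lam * g1),
        "stationarity": max(abs(stationarity[0]), abs(stationarity[1])),
    }


def is_kkt_point(x, lam, mu, tol=1e-12):
    """True if every residual of the KKT system is at most tol."""
    return all(value <= tol for value in kkt_residuals(x, lam, mu).values())
```

For this instance, `is_kkt_point((0.5, 0.5), 1.0, 0.0)` holds, while the feasible point `(1.0, 1.0)` with zero multipliers fails the stationarity condition.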
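If, additionally, $X = \mathbb{R}^n$, let $E = \{ (x, z) \in \mathbb{R}^n \times \mathbb{R}^{m+k} \mid x \in \operatorname{dom} f \cap S, \ z - g(x) \in \mathbb{R}^m_+ \times \{0_{\mathbb{R}^k}\} \}$. For all $x \in \mathbb{R}^n$, let $E(x) = \{ z \in \mathbb{R}^{m+k} \mid (x, z) \in E \}$. Then $E(x) = \emptyset$ for $x \notin \operatorname{dom} f \cap S$ and $E(x) = g(x) + \mathbb{R}^m_+ \times \{0_{\mathbb{R}^k}\}$ for $x \in \operatorname{dom} f \cap S$. According to Theorem 2.18 it holds

\[ (x, z) \in \operatorname{ri} E \Leftrightarrow x \in \operatorname{ri}(\operatorname{dom} f \cap S) \text{ and } z \in \operatorname{ri}(g(x) + \mathbb{R}^m_+ \times \{0_{\mathbb{R}^k}\}) = g(x) + \operatorname{int} \mathbb{R}^m_+ \times \{0_{\mathbb{R}^k}\}. \]

Consequently,

\[ \operatorname{ri} E = \{ (x, z) \in \mathbb{R}^n \times \mathbb{R}^{m+k} \mid x \in \operatorname{ri}(\operatorname{dom} f \cap S), \ z - g(x) \in \operatorname{int} \mathbb{R}^m_+ \times \{0_{\mathbb{R}^k}\} \} \]

and, from here,

\[ \operatorname{ri}\left( g(\operatorname{dom} f \cap S) + \mathbb{R}^m_+ \times \{0_{\mathbb{R}^k}\} \right) = \operatorname{ri} \operatorname{Pr}_{\mathbb{R}^{m+k}}(E) = \operatorname{Pr}_{\mathbb{R}^{m+k}}(\operatorname{ri} E) = g(\operatorname{ri}(\operatorname{dom} f \cap S)) + \operatorname{int} \mathbb{R}^m_+ \times \{0_{\mathbb{R}^k}\}. \]

This proves that, in this case, the regularity condition (11.1) is nothing else than the weak Slater constraint qualification

\[ \exists x_0 \in \operatorname{ri}(\operatorname{dom} f \cap S) \text{ such that } g_i(x_0) < 0, \ i = 1, ..., m, \ h_j(x_0) = 0, \ j = 1, ..., k. \tag{11.2} \]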

Example 11.2 (closedness-type versus generalized interior point regularity conditions for Lagrange duality) Let $X = Z = \mathbb{R}^2$, $C = \mathbb{R}^2_+$, $S = \mathbb{R}^2_+$, $f : \mathbb{R}^2 \to \mathbb{R}$, $f(x_1, x_2) = \frac12 x_1^2 + x_2$, a convex and continuous function, and $g : \mathbb{R}^2 \to \mathbb{R}^2$, $g(x_1, x_2) = (x_1, x_2 - x_1)$, an $\mathbb{R}^2_+$-convex and continuous function. Then $\operatorname{dom} f \cap S \cap g^{-1}(-\mathbb{R}^2_+) = \{(0,0)\}$ and $v(P^L) = 0$. It holds

\[ v(D^L) = \sup_{\lambda_1, \lambda_2 \ge 0} \ \inf_{x_1, x_2 \ge 0} \left\{ \frac12 x_1^2 + x_2 + \lambda_1 x_1 + \lambda_2 (x_2 - x_1) \right\} = 0 \]

and $(\lambda_1, \lambda_2) = (0, 0)$ is an optimal solution of the dual problem. One can easily see that none of the regularity conditions $(RC_i^L)$, $i \in \{1, 2, 2', 3, 4\}$, is fulfilled. On the other hand, $\cup_{(\lambda_1, \lambda_2) \in \mathbb{R}^2_+} \operatorname{epi}(f + \langle (\lambda_1, \lambda_2), g(\cdot) \rangle + \delta_{\mathbb{R}^2_+})^* = \mathbb{R}^2 \times \mathbb{R}_+$, which means that the closedness-type regularity condition $(RC_5^L)$ is verified.
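The dual value in Example 11.2 can be checked with a short Python sketch. Minimizing separately over $x_2 \ge 0$ and $x_1 \ge 0$ gives the inner infimum in closed form, $-\frac12 \max\{\lambda_2 - \lambda_1, 0\}^2$ for $\lambda_1, \lambda_2 \ge 0$; this closed form is a derivation added here, not a formula from the notes, and the sketch compares it with a brute-force grid minimization. The grid and the function names are assumptions made for illustration.

```python
def dual_function(lam1, lam2):
    """Closed form of inf over x1, x2 >= 0 of
    x1^2/2 + x2 + lam1*x1 + lam2*(x2 - x1), valid for lam1, lam2 >= 0."""
    return -0.5 * max(lam2 - lam1, 0.0) ** 2


# Grid on [0, 5] for both primal variables; it only approximates the infimum.
X_GRID = [i / 20.0 for i in range(0, 101)]


def dual_function_on_grid(lam1, lam2, grid=X_GRID):
    """Brute-force minimization of the Lagrangian over the grid."""
    return min(
        0.5 * x1 * x1 + x2 + lam1 * x1 + lam2 * (x2 - x1)
        for x1 in grid
        for x2 in grid
    )


def dual_value_on_grid(multipliers):
    """Largest value of the dual function over a finite set of multipliers."""
    return max(dual_function(l1, l2) for l1 in multipliers for l2 in multipliers)
```

The dual function is never positive and vanishes at $(\lambda_1, \lambda_2) = (0, 0)$, which agrees with $v(D^L) = 0$.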
Chapter V

Maximal monotone operators

In this chapter we will discuss the most significant notions and results of the
theory of maximal monotone operators.

12 Monotone set-valued operators


Definition 12.1 (monotone operator) A set-valued operator T : X ⇒ X ∗ is said
to be monotone if

hy ∗ − x∗ , y − xi ≥ 0 ∀x, y ∈ X ∀x∗ ∈ T (x) ∀y ∗ ∈ T (y).

Let T : X ⇒ X ∗ be a set-valued operator. The set


• D(T ) := {x ∈ X | T (x) 6= ∅} denotes the domain of T ;

• R(T ) := ∪x∈X T (x) denotes the range of T ;

• G(T ) := {(x, x∗ ) ∈ X × X ∗ | x∗ ∈ T (x)} denotes the graph of T .

Example 12.2 (a) For a function f : X → R, its convex subdifferential ∂f :


X ⇒ X ∗ is a monotone operator. Indeed, let x, y ∈ X, x∗ ∈ ∂f (x) and y ∗ ∈
∂f (y). Then f (x), f (y) ∈ R,

f (y) − f (x) ≥ hx∗ , y − xi and f (x) − f (y) ≥ hy ∗ , x − yi,

which gives 0 ≥ hx∗ − y ∗ , y − xi or, equivalently, hy ∗ − x∗ , y − xi ≥ 0.


If $f : X \to \mathbb{R}$ is a convex and Gâteaux-differentiable function, then $\nabla f : X \to X^*$ is a single-valued monotone operator (see also Theorem 7.8).
(b) Let S : X → X ∗ be a single-valued linear operator. Then S is monotone
if and only if hS(y − x), y − xi = hSy − Sx, y − xi ≥ 0 for all x, y ∈ X, in other
words, $\langle Sz, z \rangle \ge 0$ for all $z \in X$, which means precisely that $S$ is positive.
(c) A function ϕ : R → R is a monotone operator if and only if it is nonde-
creasing, in other words, for every t1 , t2 ∈ R with t1 < t2 it holds ϕ(t1 ) ≤ ϕ(t2 ).


(d) Let $(H, \langle \cdot, \cdot \rangle)$ be a real Hilbert space, $C \subseteq H$ a nonempty set and $S : C \to H$ a nonexpansive operator, which means that $\|S(x) - S(y)\| \le \|x - y\|$ for all $x, y \in C$. Then the operator $T : H \rightrightarrows H$, defined as $T(x) = x - S(x)$ for all $x \in C$ and $T(x) = \emptyset$, otherwise, is monotone. Indeed, for all $x, y \in C$ it holds
\[ \begin{aligned} \langle T(y) - T(x), y - x \rangle &= \langle y - x - S(y) + S(x), y - x \rangle \\ &= \|y - x\|^2 - \langle S(y) - S(x), y - x \rangle \\ &\ge \|y - x\|^2 - \|S(y) - S(x)\| \|y - x\| \\ &= \|y - x\| \left( \|y - x\| - \|S(y) - S(x)\| \right) \ge 0. \end{aligned} \]
Notice that $0 \in R(T)$ if and only if $S$ has a fixed point in $C$.
(e) Let $(H, \langle \cdot, \cdot \rangle)$ be a real Hilbert space and $C \subseteq H$ a nonempty subset. The (set-valued) projection operator onto the set $C$
\[ P_C : H \rightrightarrows H, \quad P_C(x) = \left\{ u \in C \mid \|x - u\| = \inf_{c \in C} \|x - c\| \right\}, \]
is monotone. Indeed, let $x, y \in H$, $u \in P_C(x)$ and $v \in P_C(y)$. Then $\|x - u\| \le \|x - v\|$ and $\|y - v\| \le \|y - u\|$, which implies
\[ \|x - u\|^2 + \|y - v\|^2 \le \|x - v\|^2 + \|y - u\|^2 \]
or, equivalently,
\[ \langle x, u \rangle + \langle y, v \rangle \ge \langle x, v \rangle + \langle y, u \rangle. \]
This is nothing else than $\langle y - x, v - u \rangle \ge 0$.
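Part (e) does not require $C$ to be convex, so the projection can be genuinely set-valued. The following Python sketch illustrates this on the real line with a hypothetical finite set $C$ (an assumption made for this illustration): it computes all nearest points and checks the monotonicity inequality for every pair of selections, using integers so that ties are detected exactly.

```python
# Hypothetical nonconvex set C on the real line, chosen only for illustration.
C = (-3, -1, 1, 4)


def projection(x, points=C):
    """Set-valued projection: all points of C at minimal distance from x."""
    best = min(abs(x - c) for c in points)
    return [c for c in points if abs(x - c) == best]


def is_monotone_on(sample, points=C):
    """Check (y - x) * (v - u) >= 0 for all x, y in sample,
    u in P_C(x) and v in P_C(y)."""
    return all(
        (y - x) * (v - u) >= 0
        for x in sample
        for y in sample
        for u in projection(x, points)
        for v in projection(y, points)
    )
```

Here `projection(0)` returns both `-1` and `1`, and `is_monotone_on(range(-6, 7))` confirms the inequality on the sampled integers.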
The following statement shows that the monotonicity is preserved by multipli-
cation with a positive scalar, addition and composition with a continuous linear
operator.
Proposition 12.3 Let S : X ⇒ X ∗ , T : Y ⇒ Y ∗ be monotone operators,
A : X → Y a continuous linear operator and γ > 0. Then the operators
γS : X ⇒ X ∗ , (γS)(x) = γS(x),
and
S + A∗ T A : X ⇒ X ∗ , (S + A∗ T A)(x) = S(x) + A∗ (T (Ax)),
are monotone.
Proof. That γS is monotone is clear. Let u, v ∈ X, u∗ ∈ S(u) + A∗ (T (Au)) and
v ∗ ∈ S(v) + A∗ (T (Av)). Then u∗ = x∗1 + A∗ y1∗ , with x∗1 ∈ S(u) and y1∗ ∈ T (Au),
and v ∗ = x∗2 + A∗ y2∗ , with x∗2 ∈ S(v) and y2∗ ∈ T (Av). It holds
hv ∗ − u∗ , v − ui =hx∗2 − x∗1 , v − ui + hA∗ y2∗ − A∗ y1∗ , v − ui
=hx∗2 − x∗1 , v − ui + hy2∗ − y1∗ , Av − Aui ≥ 0,
which proves that S + A∗ T A is monotone. 

Definition 12.4 (maximal monotone operator) A monotone set-valued operator $T : X \rightrightarrows X^*$ is said to be maximally monotone if there exists no monotone operator $T' : X \rightrightarrows X^*$ such that $G(T) \subsetneq G(T')$.

A monotone operator $T : X \rightrightarrows X^*$ is maximally monotone if and only if

\[ (x, x^*) \in X \times X^* \text{ and } \langle y^* - x^*, y - x \rangle \ge 0 \ \forall (y, y^*) \in G(T) \ \Rightarrow \ (x, x^*) \in G(T). \tag{12.1} \]

Assume that $T : X \rightrightarrows X^*$ is maximally monotone and that, for $(x, x^*) \in X \times X^*$, $\langle y^* - x^*, y - x \rangle \ge 0$ for all $(y, y^*) \in G(T)$. Then $(x, x^*) \in G(T)$, since, otherwise, the operator $T' : X \rightrightarrows X^*$ defined such that $G(T') = G(T) \cup \{(x, x^*)\}$ is a proper monotone extension of $T$.
On the other hand, assume that (12.1) holds and $T : X \rightrightarrows X^*$ is not maximally monotone. Then there exists a monotone operator $T' : X \rightrightarrows X^*$ such that $G(T) \subsetneq G(T')$. Let $(x, x^*) \in G(T') \setminus G(T)$. Since $T'$ is monotone, for all $(y, y^*) \in G(T)$ it holds $\langle y^* - x^*, y - x \rangle \ge 0$, which, according to (12.1), implies $(x, x^*) \in G(T)$. Contradiction!
The next theorem shows that every monotone operator admits a maximal
monotone extension.

Theorem 12.5 Let $T : X \rightrightarrows X^*$ be a monotone operator. Then there exists a maximal monotone extension of $T$, namely, a maximal monotone operator $T' : X \rightrightarrows X^*$ such that $G(T) \subseteq G(T')$.

Proof. Let

M := {S : X ⇒ X ∗ | S is monotone and G(T ) ⊆ G(S)},

which is obviously a nonempty set. We define on M the partial order

S1 ≤ S2 ⇔ G(S1 ) ⊆ G(S2 ).

It is easy to see that “≤” is reflexive, transitive and antisymmetric.


Let $(S_i)_{i \in I}$ be a totally ordered subset of $\mathcal{M}$, which means that for all $i, j \in I$ either $S_i \le S_j$ or $S_j \le S_i$. Then the operator whose graph is $\cup_{i \in I} G(S_i)$ belongs to $\mathcal{M}$ and is an upper bound of this chain. According to the Zorn Lemma (see, for instance, [4, page 44]), there exists a maximal element $T'$ in $\mathcal{M}$. This means that $T'$ is monotone, $G(T) \subseteq G(T')$ and there exists no $S \in \mathcal{M}$, $S \neq T'$, such that $G(T') \subseteq G(S)$. In other words, $T'$ is maximally monotone. □

Example 12.6 If S : X → X ∗ is a single-valued linear and positive operator,


then S is maximally monotone. According to Example 12.2 (b), S is monotone.
Let $(x, x^*) \in X \times X^*$ be such that $\langle Sy - x^*, y - x \rangle \ge 0$ for all $y \in X$. For all $u \in X$ and all $\lambda > 0$, by taking $y := x + \lambda u$, it holds

\[ \lambda \langle Sx - x^*, u \rangle + \lambda^2 \langle Su, u \rangle \ge 0 \]

or, equivalently,
\[ \langle Sx - x^*, u \rangle + \lambda \langle Su, u \rangle \ge 0. \]
We let $\lambda$ converge to zero and obtain $\langle Sx - x^*, u \rangle \ge 0$ for all $u \in X$, consequently (replace $u$ by $-u$), $\langle Sx - x^*, u \rangle = 0$ for all $u \in X$, in other words, $Sx = x^*$. This shows that (12.1) holds, which proves that $S$ is maximally monotone.

Proposition 12.7 Let T : X ⇒ X ∗ be a maximally monotone operator. Then


for all x ∈ X the set T (x) is convex and weakly∗ closed.

Proof. Let x ∈ X be fixed. Then x∗ ∈ T (x) if and only if hy ∗ − x∗ , y − xi ≥ 0


for all (y, y ∗ ) ∈ G(T ). This means that

T (x) = ∩(y,y∗ )∈G(T ) {x∗ ∈ X ∗ | hx∗ , y − xi ≤ hy ∗ , y − xi}

and proves that T (x) is the intersection of a family of convex and weakly∗ closed
sets. Consequently, T (x) is convex and weakly∗ closed. 

In real Hilbert spaces we have the following criterion for the maximality of a
monotone operator.

Proposition 12.8 Let (H, h·, ·i) be a real Hilbert space and T : H → H a single-
valued monotone and continuous operator. Then T is maximally monotone.

Proof. Let $(x, u) \in H \times H$ be such that $\langle T(y) - u, y - x \rangle \ge 0$ for all $y \in H$. For all $n \in \mathbb{N}$ and $y_n := x + \frac{1}{n}(u - T(x))$, we have $\langle T(y_n) - u, u - T(x) \rangle \ge 0$. Since $T(y_n) \to T(x)$ as $n \to +\infty$, this yields $0 \le \langle T(x) - u, u - T(x) \rangle = -\|T(x) - u\|^2$, thus $T(x) = u$. This shows that (12.1) holds, which proves that $T$ is maximally monotone. □

Example 12.9 As a consequence of Proposition 12.8 it follows that the projec-


tion operator PC : H → H onto a nonempty, convex and closed subset C of a
real Hilbert space H is maximally monotone.

13 The maximal monotonicity of the convex subdifferential
This section is dedicated to the proof of the fact that the convex subdifferential
of a proper, convex and lower semicontinuous function defined on a nontrivial
real Banach space is maximally monotone. To this end we will prove a number
of important intermediary results. We will start with a very important result in
nonlinear analysis, the so-called Ekeland variational principle.

Theorem 13.1 (Ekeland variational principle) Let (X, d) be a complete metric


space and f : X → R a proper, lower semicontinuous and bounded from below
function. Then for every x0 ∈ dom f and every ε > 0 there exists xε ∈ X such
that
f (xε ) ≤ f (x0 ) − εd(x0 , xε )
and
f (xε ) < f (x) + εd(xε , x) ∀x ∈ X \ {xε }.

Proof. Let x0 ∈ dom f and ε > 0 be fixed. For every x ∈ X we consider the
set F (x) := {y ∈ X | f (y) + εd(x, y) ≤ f (x)}. We will prove that there exists
xε ∈ F (x0 ) such that F (xε ) = {xε }, which is nothing else than the conclusion of
the theorem.
Since y 7→ f (y) + εd(x, y) is lower semicontinuous, the set F (x) is closed for
every x ∈ X. For every x ∈ dom f it holds x ∈ F (x) ⊆ dom f , while for every
x ∈ X \ dom f it holds F (x) = X.
We also note that $F(y) \subseteq F(x)$ for every $y \in F(x)$. This is obvious for $x \notin \operatorname{dom} f$. Let $x \in \operatorname{dom} f$, $y \in F(x)$ and $z \in F(y)$. Then

f (z) + εd(y, z) ≤ f (y) and f (y) + εd(x, y) ≤ f (x).

By the triangle inequality we get

f (z) + εd(x, z) ≤ f (y) + ε(d(x, z) − d(y, z)) ≤ f (y) + εd(x, y) ≤ f (x),

which proves that z ∈ F (x).


Since $f$ is bounded from below, $v_0 := \inf\{ f(x) \mid x \in F(x_0) \} \in \mathbb{R}$. Then there exists $x_1 \in F(x_0)$ such that $f(x_1) < v_0 + \frac{1}{2}$. We have $v_1 := \inf\{ f(x) \mid x \in F(x_1) \} \in \mathbb{R}$ and there exists $x_2 \in F(x_1)$ such that $f(x_2) < v_1 + \frac{1}{2^2}$. Continuing in this way we obtain the sequences $(v_n)_{n \ge 0}$ and $(x_n)_{n \ge 0}$ such that for all $n \ge 0$
\[ v_n := \inf\{ f(x) \mid x \in F(x_n) \} \in \mathbb{R}, \quad x_{n+1} \in F(x_n) \quad \text{and} \quad f(x_{n+1}) < v_n + \frac{1}{2^{n+1}}. \]
From $F(x_{n+1}) \subseteq F(x_n)$ it follows that $v_{n+1} \ge v_n$ for all $n \ge 0$. In addition, as $x_{n+1} \in F(x_n)$, for all $n \ge 1$ it holds
\[ \varepsilon d(x_{n+1}, x_n) \le f(x_n) - f(x_{n+1}) \le f(x_n) - v_n \le f(x_n) - v_{n-1} < \frac{1}{2^n}. \]
Consequently, for $n, p \ge 1$ it holds
\[ d(x_{n+p}, x_n) \le \sum_{j=n}^{n+p-1} d(x_{j+1}, x_j) < \frac{1}{\varepsilon} \sum_{j=n}^{n+p-1} \frac{1}{2^j} < \frac{1}{\varepsilon 2^{n-1}}, \]
which shows that $(x_n)_{n \ge 0}$ is a Cauchy sequence, thus it converges to some element $x_\varepsilon \in X$.

Let $n \ge 0$. For every $m \ge n$ it holds $x_m \in F(x_m) \subseteq F(x_n)$ and, since $F(x_n)$ is closed, we have $x_\varepsilon \in F(x_n)$. In particular, $x_\varepsilon \in F(x_0)$, thus $x_\varepsilon \in \operatorname{dom} f$.
Let $x \in F(x_\varepsilon)$. Then, for all $n \ge 1$, $x \in F(x_n)$, in other words, $\varepsilon d(x, x_n) \le f(x_n) - f(x) \le f(x_n) - v_n < \frac{1}{2^n}$. This yields that $x_n \to x$ as $n \to +\infty$, thus $x = x_\varepsilon$. Since, obviously, $x_\varepsilon \in F(x_\varepsilon)$, we get $F(x_\varepsilon) = \{x_\varepsilon\}$ and the proof is complete. □
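The role of the sets $F(x)$ can be illustrated on a finite metric space, where every infimum is attained. This is a simplified sketch under that assumption, not the general construction of the proof: on a finite set one may take $x_\varepsilon$ to be a minimizer of $f$ over $F(x_0)$, and the inclusion $F(x_\varepsilon) \subseteq F(x_0)$ then forces $F(x_\varepsilon) = \{x_\varepsilon\}$ in a single step. The point set, the function `double_well` and the parameter values are hypothetical; integers and fractions are used so that all comparisons are exact.

```python
from fractions import Fraction


def ekeland_point(points, f, x0, eps):
    """Return a minimizer of f over F(x0) = {y : f(y) + eps*|x0 - y| <= f(x0)}.

    On a finite subset of the real line with d(x, y) = |x - y| this point
    satisfies both conclusions of the Ekeland variational principle."""
    feasible = [y for y in points if f(y) + eps * abs(x0 - y) <= f(x0)]
    return min(feasible, key=f)


def satisfies_ekeland(points, f, x0, eps, x_eps):
    """Check both inequalities of the Ekeland variational principle."""
    first = f(x_eps) <= f(x0) - eps * abs(x0 - x_eps)
    second = all(
        f(x_eps) < f(x) + eps * abs(x_eps - x) for x in points if x != x_eps
    )
    return first and second


POINTS = list(range(-10, 11))  # finite, hence complete, metric space


def double_well(x):
    # Hypothetical lower bounded function with minimizers at -3 and 3
    return abs(x * x - 9)
```

For a small $\varepsilon$ the selected point is a global minimizer far from $x_0$, while for a large $\varepsilon$ the set $F(x_0)$ can shrink to $\{x_0\}$, so that $x_\varepsilon = x_0$.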

The following variant of the Ekeland variational principle is often used in


applications.

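Corollary 13.2 Let $(X, d)$ be a complete metric space and $f : X \to \overline{\mathbb{R}}$ a proper, lower semicontinuous and bounded from below function. Let $\varepsilon > 0$ and $x_0 \in \operatorname{dom} f$ such that $f(x_0) \le \inf_{x \in X} f(x) + \varepsilon$. Then for every $\lambda > 0$ there exists $x_\lambda \in X$ such that
\[ f(x_\lambda) \le f(x_0), \quad d(x_0, x_\lambda) \le \lambda \]
and
\[ f(x_\lambda) < f(x) + \frac{\varepsilon}{\lambda} d(x, x_\lambda) \quad \forall x \in X \setminus \{x_\lambda\}. \]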

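Proof. According to Theorem 13.1, there exists $x_\lambda \in X$ such that
\[ f(x_\lambda) \le f(x_0) - \frac{\varepsilon}{\lambda} d(x_0, x_\lambda) \]
and
\[ f(x_\lambda) < f(x) + \frac{\varepsilon}{\lambda} d(x, x_\lambda) \quad \forall x \in X \setminus \{x_\lambda\}. \]
Obviously, $f(x_\lambda) \le f(x_0)$. On the other hand,
\[ f(x_\lambda) + \frac{\varepsilon}{\lambda} d(x_0, x_\lambda) \le f(x_0) \le \inf_{x \in X} f(x) + \varepsilon \le f(x_\lambda) + \varepsilon, \]
which proves that $d(x_0, x_\lambda) \le \lambda$. □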

The following extension of the (convex) subdifferential of a function will play


an important role in the subsequent analysis.

Definition 13.3 ((convex) ε-subdifferential) Let $f : X \to \overline{\mathbb{R}}$ be a given function, $x \in X$ and $\varepsilon \ge 0$. The set defined as

\[ \partial_\varepsilon f(x) := \{ x^* \in X^* \mid f(y) - f(x) \ge \langle x^*, y - x \rangle - \varepsilon \ \forall y \in X \} \quad \text{for } f(x) \in \mathbb{R}, \]

and $\partial_\varepsilon f(x) := \emptyset$ for $f(x) \notin \mathbb{R}$, is called the (convex) ε-subdifferential of $f$ at $x$.
The elements of $\partial_\varepsilon f(x)$ are called (convex) ε-subgradients of $f$ at $x$. Therefore, the (convex) ε-subdifferential is a set-valued operator $\partial_\varepsilon f : X \rightrightarrows X^*$.

Let x ∈ X. It holds ∂0 f (x) = ∂f (x) and, for 0 ≤ ε1 ≤ ε2 , ∂ε1 f (x) ⊆ ∂ε2 f (x).
Let ε ≥ 0. Then ∂ε f (x) is a convex and weakly∗ closed set,

∂ε f (x) = ∩η>ε ∂η f (x),

and
x∗ ∈ ∂ε f (x) ⇔ f ∗ (x∗ ) + f (x) ≤ hx∗ , xi + ε ⇒ x ∈ ∂ε f ∗ (x∗ ).
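The characterization via the conjugate gives a convenient membership test. As a hedged illustration, take the hypothetical example $f(x) = \frac12 x^2$ on $X = \mathbb{R}$ (not from these notes), for which $f^* = f$ and the inequality $f^*(x^*) + f(x) \le x^* x + \varepsilon$ becomes $\frac12 (x^* - x)^2 \le \varepsilon$, that is, $\partial_\varepsilon f(x) = [x - \sqrt{2\varepsilon}, x + \sqrt{2\varepsilon}]$. The function names below are assumptions made for this sketch.

```python
import math


def f(x):
    return 0.5 * x * x


def f_conj(s):
    # The conjugate of x^2 / 2 is s^2 / 2.
    return 0.5 * s * s


def in_eps_subdifferential(slope, x, eps):
    """Membership test based on f^*(slope) + f(x) <= slope * x + eps."""
    return f_conj(slope) + f(x) <= slope * x + eps


def eps_subdifferential_interval(x, eps):
    """Closed form for this example: [x - sqrt(2 eps), x + sqrt(2 eps)]."""
    radius = math.sqrt(2.0 * eps)
    return (x - radius, x + radius)
```

For $\varepsilon = 0$ the interval collapses to the single point $x$, the usual derivative, and it grows with $\varepsilon$, in line with $\partial_{\varepsilon_1} f(x) \subseteq \partial_{\varepsilon_2} f(x)$ for $\varepsilon_1 \le \varepsilon_2$.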
Next we formulate a theorem which is a consequence of Ekeland's variational principle and relates the graph of the (convex) ε-subdifferential of a proper, convex and lower semicontinuous function defined on a Banach space to the graph of its (convex) subdifferential.

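Theorem 13.4 (Brøndsted-Rockafellar Theorem) Let $X$ be a Banach space and $f : X \to \overline{\mathbb{R}}$ a proper, convex and lower semicontinuous function. Then for every $x_0 \in \operatorname{dom} f$, $\varepsilon > 0$, $\lambda > 0$ and $x_0^* \in \partial_\varepsilon f(x_0)$, there exists $(x_\lambda, x_\lambda^*) \in X \times X^*$ such that
\[ x_\lambda^* \in \partial f(x_\lambda), \quad \|x_\lambda - x_0\| \le \lambda \quad \text{and} \quad \|x_\lambda^* - x_0^*\|_* \le \frac{\varepsilon}{\lambda}. \]
In particular,

\[ D(\partial f) \subseteq \operatorname{dom} f \subseteq \operatorname{cl}(D(\partial f)) \quad \text{and} \quad R(\partial f) \subseteq \operatorname{dom} f^* \subseteq \operatorname{cl}(R(\partial f)). \]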

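Proof. Consider the proper, convex and lower semicontinuous function $g : X \to \overline{\mathbb{R}}$, $g(x) = f(x) - \langle x_0^*, x \rangle$. It holds $g(x_0) \le \inf_{x \in X} g(x) + \varepsilon$. By Corollary 13.2, there exists $x_\lambda \in X$ such that $\|x_\lambda - x_0\| \le \lambda$ and $g(x_\lambda) \le g(x) + \frac{\varepsilon}{\lambda} \|x - x_\lambda\|$ for all $x \in X$. Consequently, from Theorem 10.2 (b) and Proposition 6.5 (c),
\[ \begin{aligned} 0 \in \partial \left( g + \frac{\varepsilon}{\lambda} \|\cdot - x_\lambda\| \right)(x_\lambda) &= \partial g(x_\lambda) + \frac{\varepsilon}{\lambda} \partial \|\cdot - x_\lambda\|(x_\lambda) = \partial g(x_\lambda) + \frac{\varepsilon}{\lambda} \partial \|\cdot\|(0) \\ &= \partial f(x_\lambda) - x_0^* + \frac{\varepsilon}{\lambda} \partial \|\cdot\|(0), \end{aligned} \]
which means that there exists $x_\lambda^* \in \partial f(x_\lambda)$ such that $\frac{\lambda}{\varepsilon}(x_0^* - x_\lambda^*) \in \partial \|\cdot\|(0)$ or, equivalently, $\|x_\lambda^* - x_0^*\|_* \le \frac{\varepsilon}{\lambda}$. The last two statements of the theorem follow immediately. □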

Now we can prove the main result of this section.

Theorem 13.5 Let $X$ be a Banach space and $f : X \to \overline{\mathbb{R}}$ a proper, convex and lower semicontinuous function. Then $\partial f : X \rightrightarrows X^*$ is a maximal monotone operator.

Proof. Let (x, x∗ ) ∈ X ×X ∗ such that hy ∗ −x∗ , y−xi ≥ 0 for all (y, y ∗ ) ∈ G(∂f ).
We will prove that x∗ ∈ ∂f (x), which will mean that (12.1) is fulfilled and will,
consequently, lead to the desired conclusion.

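We define $g : X \to \overline{\mathbb{R}}$, $g(y) = f(y + x) - \langle x^*, y\rangle$, which is a proper, convex and lower semicontinuous function. By the Fenchel-Moreau Theorem, there exists $u^* \in X^*$ such that
\[
\inf_{y \in X}\left\{ g(y) + \frac{1}{2}\|y\|^2 \right\} = -g^*(u^*) - \frac{1}{2}\|u^*\|_*^2 \in \mathbb{R}.
\]
That this infimum is finite is an easy consequence of the fact that $g$ is proper, convex and lower semicontinuous.
It follows that there exists a sequence $(y_n)_{n \in \mathbb{N}} \subseteq X$ such that for all $n \in \mathbb{N}$
\[
\begin{aligned}
\frac{1}{n^2} &\ge g(y_n) + \frac{1}{2}\|y_n\|^2 + g^*(u^*) + \frac{1}{2}\|u^*\|_*^2 \\
&\ge \langle u^*, y_n\rangle + \frac{1}{2}\|y_n\|^2 + \frac{1}{2}\|u^*\|_*^2 \\
&\ge \frac{1}{2}\left(\|y_n\| - \|u^*\|_*\right)^2,
\end{aligned}
\tag{13.1}
\]
where the second inequality follows from the Young-Fenchel inequality. This yields, on the one hand,
\[
\|y_n\| \to \|u^*\|_* \quad \text{and} \quad \langle u^*, y_n\rangle \to -\|u^*\|_*^2 \quad \text{as } n \to +\infty.
\]
On the other hand, for all $n \ge 1$ it holds
\[
g(y_n) + g^*(u^*) \le \frac{1}{n^2} + \langle u^*, y_n\rangle
\]
or, equivalently, $u^* \in \partial_{1/n^2}\, g(y_n)$. According to Theorem 13.4 there exist sequences $(z_n)_{n \in \mathbb{N}} \subseteq X$ and $(z_n^*)_{n \in \mathbb{N}} \subseteq X^*$ such that for all $n \in \mathbb{N}$ it holds
\[
z_n^* \in \partial g(z_n), \quad \|z_n - y_n\| \le \frac{1}{n} \quad \text{and} \quad \|z_n^* - u^*\|_* \le \frac{1}{n}.
\]
We notice that for all $n \in \mathbb{N}$ we have $z_n^* \in \partial g(z_n) = \partial f(z_n + x) - x^*$ or, equivalently, $(z_n + x, z_n^* + x^*) \in G(\partial f)$, which, according to our assumption, yields
\[
\langle z_n^*, z_n\rangle \ge 0.
\]
For all $n \in \mathbb{N}$ we have
\[
\begin{aligned}
0 \le \langle z_n^*, z_n\rangle &= \langle z_n^*, z_n - y_n\rangle + \langle z_n^*, y_n\rangle = \langle z_n^*, z_n - y_n\rangle + \langle z_n^* - u^*, y_n\rangle + \langle u^*, y_n\rangle \\
&\le \left(\frac{1}{n} + \|u^*\|_*\right)\|z_n - y_n\| + \|z_n^* - u^*\|_*\,\|y_n\| + \langle u^*, y_n\rangle .
\end{aligned}
\]
Letting $n \to +\infty$ in this inequality yields $0 \le -\|u^*\|_*^2$, thus $u^* = 0$, which also proves that $y_n \to 0$ as $n \to +\infty$. Using that $g$ is lower semicontinuous, (13.1) yields for $n \to +\infty$
\[
g(0) + g^*(0) \le 0,
\]
which, combined with the Young-Fenchel inequality, gives $g(0) + g^*(0) = 0$ or, equivalently, $0 \in \partial g(0) = \partial f(x) - x^*$, in other words, $(x, x^*) \in G(\partial f)$. This provides the desired conclusion. $\square$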
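As a quick illustration of what maximality means for a subdifferential, the following worked example tests the defining property directly. The choices $X = \mathbb{R}$ and $f = |\cdot|$ are assumptions made here for illustration and are not taken from the notes.

```latex
% Illustration under the assumptions X = \mathbb{R}, f(x) = |x|.
\[
\partial f(x) =
\begin{cases}
\{-1\}, & \text{if } x < 0,\\
[-1,1], & \text{if } x = 0,\\
\{1\}, & \text{if } x > 0.
\end{cases}
\]
% Take a point outside the graph, for instance (x, x^*) = (0, 2),
% and test it against (y, y^*) = (1, 1) \in G(\partial f):
\[
\langle y^* - x^*, y - x \rangle = (1 - 2)(1 - 0) = -1 < 0 .
\]
% Hence (0, 2) is not monotonically related to G(\partial f),
% so it cannot be added to the graph without destroying monotonicity.
```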

The following example shows that there are maximal monotone operators
which are not (convex) subdifferentials.

Example 13.6 (maximal monotone operators which are not (convex) subdifferentials) A single-valued linear operator $S : X \to X^*$ is said to be skew if $\langle Sx, x\rangle = 0$ for all $x \in X$ or, equivalently, if
\[
\langle Sy, x\rangle = -\langle Sx, y\rangle \quad \forall x, y \in X.
\]
As seen in Example 12.6, skew operators are maximally monotone.

We will show that nonzero single-valued linear and skew operators are not (convex) subdifferentials. This means that for every Banach space $X$ of dimension strictly greater than 1 there exist maximal monotone operators defined on $X$ which are not (convex) subdifferentials.
Let $S : X \to X^*$ be a nonzero single-valued linear and skew operator and assume that there exists $f : X \to \overline{\mathbb{R}}$ such that $\partial f(x) = \{Sx\}$ for all $x \in X$. This implies that $\operatorname{dom} f = X$ and for all $x, y \in X$ it holds
\[
f(y) - f(x) \ge \langle Sx, y - x\rangle \quad \text{and} \quad f(x) - f(y) \ge \langle Sy, x - y\rangle.
\]
Since $\langle Sx, x\rangle = \langle Sy, y\rangle = 0$, we get
\[
\langle Sx, y\rangle = -\langle Sy, x\rangle \ge f(y) - f(x) \ge \langle Sx, y\rangle \quad \forall x, y \in X,
\]
consequently, $f(y) = f(x) + \langle Sx, y\rangle$ for all $x, y \in X$. Taking $x = 0$ gives $f(y) = f(0)$ for all $y \in X$, which further implies that $S$ is the zero operator. Contradiction!
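A concrete skew operator may help here. The sketch below assumes $X = \mathbb{R}^2$ (with $X^*$ identified with $\mathbb{R}^2$) and takes $S$ to be the rotation by $90^\circ$. Both choices are illustrative assumptions. For a twice continuously differentiable candidate $f$, the symmetry of second derivatives already rules out $S = \nabla f$.

```latex
% Assumption: X = \mathbb{R}^2, S is the rotation by 90 degrees.
\[
S(x_1, x_2) = (-x_2, x_1), \qquad
\langle Sx, x\rangle = -x_2 x_1 + x_1 x_2 = 0 \quad \forall x \in \mathbb{R}^2 .
\]
% If S = \nabla f held for some twice continuously differentiable f, then
\[
\frac{\partial f}{\partial x_1} = -x_2, \quad
\frac{\partial f}{\partial x_2} = x_1
\quad \Longrightarrow \quad
\frac{\partial^2 f}{\partial x_2\, \partial x_1} = -1 \neq 1 = \frac{\partial^2 f}{\partial x_1\, \partial x_2},
\]
% which contradicts the symmetry of the second derivatives.
```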

14 A representation of monotone operators by convex functions
Throughout this section we will assume that X is a nontrivial real Banach space.
We will discuss some basic results related to the representation of monotone
operators by convex functions. The central role in our investigations will be
played by the so-called Fitzpatrick function.

Definition 14.1 (Fitzpatrick function) Let $S : X \rightrightarrows X^*$ be a monotone operator. The Fitzpatrick function associated to $S$ is defined by
\[
\varphi_S : X \times X^* \to \overline{\mathbb{R}}, \quad \varphi_S(x, x^*) = \sup_{(s, s^*) \in G(S)} \{\langle x^*, s\rangle + \langle s^*, x\rangle - \langle s^*, s\rangle\}.
\]

The most important properties of the Fitzpatrick function associated to a maximal monotone operator are collected in the following result.

Theorem 14.2 Let $S : X \rightrightarrows X^*$ be a maximal monotone operator. The Fitzpatrick function $\varphi_S : X \times X^* \to \overline{\mathbb{R}}$ associated to $S$ has the following properties:
(a) $\varphi_S(x, x^*) \ge \langle x^*, x\rangle$ for all $(x, x^*) \in X \times X^*$;
(b) $\varphi_S$ is proper, convex and weak$\times$weak$^*$ lower semicontinuous;
(c) $G(S) = \{(x, x^*) \in X \times X^* \mid \varphi_S(x, x^*) = \langle x^*, x\rangle\} = \{(x, x^*) \in X \times X^* \mid \varphi_S^*(x^*, x) = \langle x^*, x\rangle\}$.

Proof. (a) Let $(x, x^*) \in X \times X^*$. If $(x, x^*) \in G(S)$, then, obviously, $\varphi_S(x, x^*) \ge \langle x^*, x\rangle$. If $(x, x^*) \notin G(S)$, then there exists $(s, s^*) \in G(S)$ such that $\langle s^* - x^*, s - x\rangle < 0$ or, equivalently, $\langle x^*, x\rangle < \langle x^*, s\rangle + \langle s^*, x\rangle - \langle s^*, s\rangle \le \varphi_S(x, x^*)$.
(b) The function $\varphi_S$ is the pointwise supremum of a family of affine and weak$\times$weak$^*$ continuous functions, consequently, it is convex and weak$\times$weak$^*$ lower semicontinuous. The properness of $\varphi_S$ follows from (c) by taking into account that $G(S) \neq \emptyset$.
(c) We have seen that for $(x, x^*) \notin G(S)$ it holds $\varphi_S(x, x^*) > \langle x^*, x\rangle$, which proves that $\{(x, x^*) \in X \times X^* \mid \varphi_S(x, x^*) = \langle x^*, x\rangle\} \subseteq G(S)$. For the opposite inclusion, let $(x, x^*) \in G(S)$. By using the monotonicity of $S$, for all $(s, s^*) \in G(S)$ it holds $\langle s^* - x^*, s - x\rangle \ge 0$ or, equivalently, $\langle x^*, x\rangle \ge \langle x^*, s\rangle + \langle s^*, x\rangle - \langle s^*, s\rangle$. From here we get that $\varphi_S(x, x^*) \le \langle x^*, x\rangle$, which, in combination with (a), proves that $\varphi_S(x, x^*) = \langle x^*, x\rangle$.
For all $(x, x^*) \in X \times X^*$ we have
\[
\begin{aligned}
\varphi_S(x, x^*) &= \sup_{(s, s^*) \in G(S)} \{\langle x^*, s\rangle + \langle s^*, x\rangle - \langle s^*, s\rangle\} \\
&= \sup_{(s, s^*) \in G(S)} \{\langle x^*, s\rangle + \langle s^*, x\rangle - \varphi_S(s, s^*)\} \\
&\le \sup_{(s, s^*) \in X \times X^*} \{\langle (x^*, x), (s, s^*)\rangle - \varphi_S(s, s^*)\} \\
&= \varphi_S^*(x^*, x),
\end{aligned}
\]
which proves that
\[
\varphi_S^*(x^*, x) \ge \varphi_S(x, x^*) \ge \langle x^*, x\rangle \quad \forall (x, x^*) \in X \times X^*. \tag{14.1}
\]
If for $(x, x^*) \in X \times X^*$ it holds $\varphi_S^*(x^*, x) = \langle x^*, x\rangle$, then $\varphi_S(x, x^*) = \langle x^*, x\rangle$, which implies that $(x, x^*) \in G(S)$. On the other hand, let $(x, x^*) \in G(S)$. For all $(y, y^*) \in X \times X^*$ it holds
\[
\varphi_S(y, y^*) \ge \langle x^*, y\rangle + \langle y^*, x\rangle - \langle x^*, x\rangle = \langle (x^*, x), (y, y^*)\rangle - \langle x^*, x\rangle,
\]
which implies that
\[
\langle x^*, x\rangle \ge \sup_{(y, y^*) \in X \times X^*} \{\langle (x^*, x), (y, y^*)\rangle - \varphi_S(y, y^*)\} = \varphi_S^*(x^*, x)
\]
and gives, in combination with (14.1), $\varphi_S^*(x^*, x) = \langle x^*, x\rangle$. $\square$
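To see the Fitzpatrick function in a case where it can be computed by hand, the following sketch takes $S$ to be the identity on a real Hilbert space $H$ (with $H^*$ identified with $H$). This setting is an assumption chosen for illustration and does not come from the notes.

```latex
% Assumption: X = H is a real Hilbert space, S = \mathrm{Id}_H, so G(S) = \{(s, s) \mid s \in H\}.
\[
\varphi_{\mathrm{Id}}(x, x^*) = \sup_{s \in H} \{\langle x^* + x, s\rangle - \|s\|^2\}
= \frac{1}{4}\|x + x^*\|^2 ,
\]
% the supremum being attained at s = (x + x^*)/2. Consequently
\[
\varphi_{\mathrm{Id}}(x, x^*) - \langle x^*, x\rangle
= \frac{1}{4}\|x + x^*\|^2 - \langle x^*, x\rangle
= \frac{1}{4}\|x - x^*\|^2 \ge 0 ,
\]
% with equality if and only if x^* = x, that is, (x, x^*) \in G(\mathrm{Id}).
```

This agrees with statements (a) and (c) of Theorem 14.2 in this particular setting.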



Example 14.3 (a) Let $f : X \to \overline{\mathbb{R}}$ be a proper, convex and lower semicontinuous function. For all $(x, x^*) \in X \times X^*$ it holds
\[
\langle x^*, x\rangle \le \varphi_{\partial f}(x, x^*) \le f(x) + f^*(x^*) \le \varphi_{\partial f}^*(x^*, x) \le \langle x^*, x\rangle + \delta_{G(\partial f)}(x, x^*).
\]
(b) Let $H$ be a real Hilbert space and $C \subseteq H$ a nonempty, convex and closed set. The Fitzpatrick function of the normal cone operator $N_C : H \rightrightarrows H$ reads
\[
\varphi_{N_C}(x, x^*) = \delta_C(x) + \delta_C^*(x^*).
\]

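We will denote by $J : X \rightrightarrows X^*$,
\[
J(x) = \partial\left(\frac{1}{2}\|\cdot\|^2\right)(x) = \{x^* \in X^* \mid \|x^*\|_* = \|x\| \text{ and } \langle x^*, x\rangle = \|x\|\,\|x^*\|_*\},
\]
the so-called duality mapping of the space $X$. By Theorem 13.5, $J$ is a maximal monotone operator.
For $(x, x^*) \in X \times X^*$, let
\[
\begin{aligned}
\Delta(x, x^*) &= \frac{1}{2}\|x\|^2 + \langle x^*, x\rangle + \frac{1}{2}\|x^*\|_*^2 \\
&\ge \frac{1}{2}\|x\|^2 - \|x\|\,\|x^*\|_* + \frac{1}{2}\|x^*\|_*^2 = \frac{1}{2}\left(\|x\| - \|x^*\|_*\right)^2 \ge 0.
\end{aligned}
\]
Then
\[
\Delta(x, x^*) = 0 \ \Leftrightarrow \ \|x\| = \|x^*\|_* \text{ and } \langle -x^*, x\rangle = \|x\|\,\|x^*\|_* = \|x\|^2 \ \Leftrightarrow \ -x^* \in J(x).
\]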
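In a Hilbert space the function $\Delta$ collapses to a single squared norm, which makes the characterization above easy to check. The sketch below assumes $X = H$ is a real Hilbert space with $H^*$ identified with $H$. This specialization is added here for illustration.

```latex
% Assumption: X = H real Hilbert space, H^* identified with H, so that J = \mathrm{Id}_H.
\[
\Delta(x, x^*) = \frac{1}{2}\|x\|^2 + \langle x^*, x\rangle + \frac{1}{2}\|x^*\|^2
= \frac{1}{2}\|x + x^*\|^2 ,
\]
% hence
\[
\Delta(x, x^*) = 0 \ \Leftrightarrow \ x^* = -x \ \Leftrightarrow \ -x^* = x = J(x).
\]
```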

In reflexive Banach spaces, one can formulate in terms of the duality mapping the so-called “$-J$” criterion for the maximality of a monotone operator. In this setting, we will identify the dual of $X \times X^*$ with $X^* \times X$ and consider as duality product $\langle (y^*, y), (x, x^*)\rangle = \langle y^*, x\rangle + \langle x^*, y\rangle$ for $(x, x^*) \in X \times X^*$ and $(y^*, y) \in X^* \times X$. Consequently, the conjugate of a function $h : X \times X^* \to \overline{\mathbb{R}}$ reads
\[
h^* : X^* \times X \to \overline{\mathbb{R}}, \quad h^*(y^*, y) = \sup_{(x, x^*) \in X \times X^*} \{\langle y^*, x\rangle + \langle x^*, y\rangle - h(x, x^*)\}.
\]

Theorem 14.4 (“$-J$” criterion for maximality) Let $X$ be a reflexive Banach space and $S : X \rightrightarrows X^*$ a monotone operator. The following statements are equivalent:

(i) $S$ is maximally monotone;

(ii) $G(S) + G(-J) = X \times X^*$;

(iii) $R(S(x + \cdot) + J(\cdot)) = X^*$ for all $x \in X$.



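Proof. “(ii) $\Rightarrow$ (i)” Let $(x, x^*) \in X \times X^*$ be such that $\langle y^* - x^*, y - x\rangle \ge 0$ for all $(y, y^*) \in G(S)$. Then there exists $(s, s^*) \in G(S)$ such that $(x - s, x^* - s^*) \in G(-J)$ or, equivalently, $\Delta(x - s, x^* - s^*) = 0$. Since
\[
\begin{aligned}
0 = \Delta(x - s, x^* - s^*) &= \frac{1}{2}\|x - s\|^2 + \langle x^* - s^*, x - s\rangle + \frac{1}{2}\|x^* - s^*\|_*^2 \\
&\ge \frac{1}{2}\|x - s\|^2 + \frac{1}{2}\|x^* - s^*\|_*^2 \ge 0,
\end{aligned}
\]
we obtain $(x, x^*) = (s, s^*) \in G(S)$. This proves that $S$ is maximally monotone.
“(i) $\Rightarrow$ (iii)” Let $x \in X$ and $x^* \in X^*$. Then $\varphi_S(z, z^*) - \langle z^*, z\rangle + \Delta(x - z, x^* - z^*) \ge 0$ for all $(z, z^*) \in X \times X^*$ or, equivalently,
\[
\inf_{(z, z^*) \in X \times X^*} \{\varphi_S(z, z^*) + g(z, z^*)\} \ge 0,
\]
where $g : X \times X^* \to \mathbb{R}$,
\[
\begin{aligned}
g(z, z^*) &= \Delta(x - z, x^* - z^*) - \langle z^*, z\rangle \\
&= \frac{1}{2}\|x - z\|^2 + \frac{1}{2}\|x^* - z^*\|_*^2 - \langle x^*, z\rangle - \langle z^*, x\rangle + \langle x^*, x\rangle,
\end{aligned}
\]
is a convex and continuous function. According to strong Fenchel duality, there exists $(u^*, u) \in X^* \times X$ such that
\[
\varphi_S^*(u^*, u) + g^*(-u^*, -u) \le 0.
\]
We have
\[
\begin{aligned}
g^*(-u^*, -u) &= \sup_{(z, z^*) \in X \times X^*} \left\{\langle x^* - u^*, z\rangle + \langle z^*, x - u\rangle - \frac{1}{2}\|x - z\|^2 - \frac{1}{2}\|x^* - z^*\|_*^2\right\} - \langle x^*, x\rangle \\
&= \sup_{t \in X} \left\{\langle x^* - u^*, x - t\rangle - \frac{1}{2}\|t\|^2\right\} + \sup_{t^* \in X^*} \left\{\langle x^* - t^*, x - u\rangle - \frac{1}{2}\|t^*\|_*^2\right\} - \langle x^*, x\rangle \\
&= \langle x^* - u^*, x\rangle + \frac{1}{2}\|x^* - u^*\|_*^2 + \langle x^*, x - u\rangle + \frac{1}{2}\|x - u\|^2 - \langle x^*, x\rangle \\
&= \Delta(x - u, x^* - u^*) - \langle u^*, u\rangle,
\end{aligned}
\]
consequently, also by using (14.1),
\[
0 \le \varphi_S^*(u^*, u) - \langle u^*, u\rangle + \Delta(x - u, x^* - u^*) \le 0.
\]
It follows that $\varphi_S^*(u^*, u) = \langle u^*, u\rangle$ and $\Delta(x - u, x^* - u^*) = 0$ or, equivalently, $(u, u^*) \in G(S)$ and $x^* - u^* \in J(u - x)$. Consequently,
\[
x^* = u^* + x^* - u^* \in S(x + (u - x)) + J(u - x) \subseteq R(S(x + \cdot) + J(\cdot)).
\]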



“(iii) $\Rightarrow$ (ii)” Let $(x, x^*) \in X \times X^*$. Then there exists $u \in X$ such that $x^* \in S(x + u) + J(u)$, which means that there exists $-u^* \in J(u)$ with $x^* + u^* \in S(x + u)$. One can easily notice that $u^* \in J(-u)$ or, equivalently, $-u^* \in (-J)(-u)$. This proves that
\[
(x, x^*) = (x + u, x^* + u^*) + (-u, -u^*) \in G(S) + G(-J). \qquad \square
\]

An immediate consequence of Theorem 14.4 is the celebrated Rockafellar surjectivity theorem.

Theorem 14.5 (Rockafellar's surjectivity theorem) Let $X$ be a nontrivial reflexive Banach space and $S : X \rightrightarrows X^*$ a maximal monotone operator. Then
\[
R(S + J) = X^*.
\]

Remark 14.6 If $X$ is a reflexive Banach space, $S : X \rightrightarrows X^*$ a monotone operator and $J$ and $J^{-1}$ are single-valued, then $R(S + J) = X^*$ implies that $S$ is maximally monotone.
Indeed, let $(x, x^*) \in X \times X^*$ be such that $\langle y^* - x^*, y - x\rangle \ge 0$ for all $(y, y^*) \in G(S)$. Since $x^* + J(x) \in R(S + J)$, there exists $(s, s^*) \in G(S)$ such that $x^* + J(x) = s^* + J(s)$ or, equivalently, $J(s) - J(x) = x^* - s^*$. It holds
\[
\begin{aligned}
\frac{1}{2}\|s\|^2 + \frac{1}{2}\|J(s)\|_*^2 + \frac{1}{2}\|x\|^2 + \frac{1}{2}\|J(x)\|_*^2 &= \langle J(s), s\rangle + \langle J(x), x\rangle \\
&= \langle J(x), s\rangle + \langle J(s), x\rangle + \langle J(s) - J(x), s - x\rangle \\
&= \langle J(x), s\rangle + \langle J(s), x\rangle + \langle x^* - s^*, s - x\rangle \\
&\le \langle J(x), s\rangle + \langle J(s), x\rangle \\
&\le \frac{1}{2}\|s\|^2 + \frac{1}{2}\|J(s)\|_*^2 + \frac{1}{2}\|x\|^2 + \frac{1}{2}\|J(x)\|_*^2.
\end{aligned}
\]
Consequently, $\frac{1}{2}\|s\|^2 + \frac{1}{2}\|J(x)\|_*^2 = \langle J(x), s\rangle$ and $\frac{1}{2}\|x\|^2 + \frac{1}{2}\|J(s)\|_*^2 = \langle J(s), x\rangle$, which is further equivalent to $(s, J(x)), (x, J(s)) \in G(J)$. Since $J$ is single-valued, it holds $J(x) = J(s)$, thus, $x^* = s^*$ and, using in addition that $J^{-1}$ is single-valued, we have $x = s$. In conclusion, $(x, x^*) = (s, s^*) \in G(S)$, which proves that $S$ is maximally monotone.

Remark 14.7 In Hilbert spaces, by combining Theorem 14.5 and Remark 14.6, we rediscover the celebrated Minty theorem. This says that if $H$ is a real Hilbert space and $S : H \rightrightarrows H$ a monotone operator, then $S$ is maximally monotone if and only if $R(S + \mathrm{Id}_H) = H$.
If a Banach space is uniformly smooth and uniformly convex, as are for instance $L^p$-spaces, for $1 < p < \infty$, then both $J$ and $J^{-1}$ are single-valued.
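Minty's range condition can be checked by hand on the real line. The sketch below assumes $H = \mathbb{R}$ and compares $S = \partial|\cdot|$ with a truncated operator $\widetilde{S}$. Both operators and the name $\widetilde{S}$ are illustrative choices made here, not objects from the notes.

```latex
% Assumption: H = \mathbb{R}, S = \partial |\cdot|.
\[
(S + \mathrm{Id})(x) =
\begin{cases}
\{x - 1\}, & \text{if } x < 0,\\
[-1, 1], & \text{if } x = 0,\\
\{x + 1\}, & \text{if } x > 0,
\end{cases}
\qquad
R(S + \mathrm{Id}) = (-\infty, -1) \cup [-1, 1] \cup (1, +\infty) = \mathbb{R}.
\]
% Hypothetical truncation: \widetilde{S}(x) = S(x) for x \neq 0 and \widetilde{S}(0) = \{0\}.
% Its graph is contained in G(S), so \widetilde{S} is monotone, but
\[
R(\widetilde{S} + \mathrm{Id}) = (-\infty, -1) \cup \{0\} \cup (1, +\infty) \neq \mathbb{R},
\]
% so \widetilde{S} is not maximally monotone, in agreement with the range condition.
```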

The following proposition gives an approach to constructing monotone operators inspired by the concept of the Fitzpatrick function.

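Proposition 14.8 Let $h : X \times X^* \to \overline{\mathbb{R}}$ be a convex function such that
\[
h(x, x^*) \ge \langle x^*, x\rangle \quad \forall (x, x^*) \in X \times X^*,
\]
and $M_h : X \rightrightarrows X^*$, $G(M_h) = \{(x, x^*) \in X \times X^* \mid h(x, x^*) = \langle x^*, x\rangle\}$. The following statements are true:
(a) $M_h$ is a monotone operator;
(b) if $(x, x^*), (y, y^*) \in X \times X^*$ are such that
\[
h(x, x^*) - \langle x^*, x\rangle + \Delta(y - x, y^* - x^*) \le 0,
\]
then $(x, x^*) \in G(M_h)$;
(c) if $S : X \rightrightarrows X^*$ is such that $G(S) \subseteq G(M_h)$ and for all $(y, y^*) \in X \times X^*$ there exists $(x, x^*) \in G(S)$ such that
\[
h(x, x^*) - \langle x^*, x\rangle + \Delta(y - x, y^* - x^*) \le 0,
\]
then $S$ is maximally monotone and $S = M_h$.

Proof. (a) For all $(x, x^*), (y, y^*) \in G(M_h)$ it holds
\[
\begin{aligned}
\frac{1}{2}\langle x^*, x\rangle + \frac{1}{2}\langle y^*, y\rangle = \frac{1}{2}h(x, x^*) + \frac{1}{2}h(y, y^*) &\ge h\left(\frac{1}{2}x + \frac{1}{2}y, \frac{1}{2}x^* + \frac{1}{2}y^*\right) \\
&\ge \left\langle \frac{1}{2}x^* + \frac{1}{2}y^*, \frac{1}{2}x + \frac{1}{2}y\right\rangle,
\end{aligned}
\]
consequently, $\langle y^* - x^*, y - x\rangle \ge 0$.
(b) Since $\Delta(y - x, y^* - x^*) \ge 0$, it follows that $h(x, x^*) \le \langle x^*, x\rangle$, thus $h(x, x^*) = \langle x^*, x\rangle$.
(c) From $G(S) \subseteq G(M_h)$ and (a) we have that $S$ is monotone. Let $(x, x^*) \in X \times X^*$ be such that $\langle y^* - x^*, y - x\rangle \ge 0$ for all $(y, y^*) \in G(S)$. By assumption, there exists $(z, z^*) \in G(S)$ such that $h(z, z^*) - \langle z^*, z\rangle + \Delta(x - z, x^* - z^*) \le 0$. Since $h(z, z^*) - \langle z^*, z\rangle = 0$, it follows that
\[
\begin{aligned}
0 \ge \Delta(x - z, x^* - z^*) &= \frac{1}{2}\|x - z\|^2 + \langle x^* - z^*, x - z\rangle + \frac{1}{2}\|x^* - z^*\|_*^2 \\
&\ge \frac{1}{2}\|x - z\|^2 + \frac{1}{2}\|x^* - z^*\|_*^2 \ge 0,
\end{aligned}
\]
which implies that $(x, x^*) = (z, z^*) \in G(S)$. This proves that $S$ is maximally monotone. Since $M_h$ is monotone and $G(S) \subseteq G(M_h)$, the maximality of $S$ gives $S = M_h$, so $M_h$ is also maximally monotone. $\square$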

In the following we show that in reflexive Banach spaces one can build on
the approach introduced in Proposition 14.8 to construct maximal monotone
operators.

Proposition 14.9 Let $X$ be a nontrivial reflexive Banach space and $h : X \times X^* \to \overline{\mathbb{R}}$ a proper and convex function such that
\[
h(x, x^*) \ge \langle x^*, x\rangle \quad \text{and} \quad h^*(x^*, x) \ge \langle x^*, x\rangle \quad \forall (x, x^*) \in X \times X^*.
\]
Then the set-valued operator having as graph $\{(x, x^*) \in X \times X^* \mid h^*(x^*, x) = \langle x^*, x\rangle\}$ is maximally monotone.

Proof. We define $k : X \times X^* \to \overline{\mathbb{R}}$, $k(x, x^*) = h^*(x^*, x)$, which is a convex function that fulfils $k(x, x^*) \ge \langle x^*, x\rangle$ for all $(x, x^*) \in X \times X^*$.
Let $(y, y^*) \in X \times X^*$. Then $h(z, z^*) - \langle z^*, z\rangle + \Delta(y - z, y^* - z^*) \ge 0$ for all $(z, z^*) \in X \times X^*$ or, equivalently,
\[
\inf_{(z, z^*) \in X \times X^*} \{h(z, z^*) + g(z, z^*)\} \ge 0,
\]
where $g : X \times X^* \to \mathbb{R}$, $g(z, z^*) = \Delta(y - z, y^* - z^*) - \langle z^*, z\rangle$, is a convex and continuous function. According to strong Fenchel duality, there exists $(x^*, x) \in X^* \times X$ such that
\[
k(x, x^*) + g^*(-x^*, -x) = h^*(x^*, x) + g^*(-x^*, -x) \le 0,
\]
which, by taking into account the formula for $g^*$ derived in the proof of Theorem 14.4, is nothing else than
\[
k(x, x^*) - \langle x^*, x\rangle + \Delta(y - x, y^* - x^*) \le 0.
\]
This gives $k(x, x^*) = \langle x^*, x\rangle$, namely, $(x, x^*) \in G(M_k)$, and implies, according to Proposition 14.8 (c), that $M_k$ is maximally monotone. $\square$

Remark 14.10 If $X$ is a reflexive Banach space and $f : X \to \overline{\mathbb{R}}$ a proper, convex and lower semicontinuous function, then $h : X \times X^* \to \overline{\mathbb{R}}$, $h(x, x^*) = f(x) + f^*(x^*)$, is proper and convex and it fulfils for all $(x, x^*) \in X \times X^*$
\[
h(x, x^*) = f(x) + f^*(x^*) \ge \langle x^*, x\rangle \quad \text{and} \quad h^*(x^*, x) = f^*(x^*) + f(x) \ge \langle x^*, x\rangle.
\]
Proposition 14.9 allows us to conclude that, since $G(\partial f) = \{(x, x^*) \in X \times X^* \mid h^*(x^*, x) = \langle x^*, x\rangle\} = \{(x, x^*) \in X \times X^* \mid f^*(x^*) + f(x) = \langle x^*, x\rangle\}$, the (convex) subdifferential of $f$ is maximally monotone.
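The construction of Remark 14.10 can be traced explicitly for the simplest quadratic. The sketch below assumes $X = H$ is a real Hilbert space (identified with its dual) and $f = \frac{1}{2}\|\cdot\|^2$. These choices are illustrative assumptions.

```latex
% Assumption: X = H real Hilbert space, f = \tfrac{1}{2}\|\cdot\|^2, so f^* = \tfrac{1}{2}\|\cdot\|^2.
\[
h(x, x^*) = \frac{1}{2}\|x\|^2 + \frac{1}{2}\|x^*\|^2 ,
\qquad
h(x, x^*) - \langle x^*, x\rangle = \frac{1}{2}\|x - x^*\|^2 \ge 0 .
\]
% Since f^{**} = f, the conjugate satisfies h^*(x^*, x) = f^*(x^*) + f(x) = h(x, x^*), and therefore
\[
\{(x, x^*) \in H \times H \mid h^*(x^*, x) = \langle x^*, x\rangle\}
= \{(x, x) \mid x \in H\} = G(\mathrm{Id}_H) = G(\partial f).
\]
```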

Theorem 14.11 Let $X$ be a reflexive Banach space and $h : X \times X^* \to \overline{\mathbb{R}}$ a proper, convex and lower semicontinuous function such that
\[
h(x, x^*) \ge \langle x^*, x\rangle \quad \forall (x, x^*) \in X \times X^*.
\]
Then the set-valued operator $M_h$ having as graph $\{(x, x^*) \in X \times X^* \mid h(x, x^*) = \langle x^*, x\rangle\}$ is maximally monotone if and only if
\[
h^*(x^*, x) \ge \langle x^*, x\rangle \quad \forall (x, x^*) \in X \times X^*.
\]

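Proof. “$\Rightarrow$” For all $(x, x^*) \in X \times X^*$ we have
\[
\begin{aligned}
h^*(x^*, x) &= \sup_{(s, s^*) \in X \times X^*} \{\langle (x^*, x), (s, s^*)\rangle - h(s, s^*)\} \\
&\ge \sup_{(s, s^*) \in G(M_h)} \{\langle x^*, s\rangle + \langle s^*, x\rangle - h(s, s^*)\} \\
&= \sup_{(s, s^*) \in G(M_h)} \{\langle x^*, s\rangle + \langle s^*, x\rangle - \langle s^*, s\rangle\} \\
&= \varphi_{M_h}(x, x^*) \ge \langle x^*, x\rangle.
\end{aligned}
\]
“$\Leftarrow$” By the Fenchel-Moreau Theorem, the function $k : X \times X^* \to \overline{\mathbb{R}}$, $k(x, x^*) = h^*(x^*, x)$, is proper and convex. It fulfils $k(x, x^*) \ge \langle x^*, x\rangle$ for all $(x, x^*) \in X \times X^*$. In addition, again via the Fenchel-Moreau Theorem,
\[
k^*(x^*, x) = h^{**}(x, x^*) = h(x, x^*) \ge \langle x^*, x\rangle \quad \forall (x, x^*) \in X \times X^*.
\]
According to Proposition 14.9, applied to $k$, the operator $M_h$ with graph $G(M_h) = \{(x, x^*) \in X \times X^* \mid k^*(x^*, x) = \langle x^*, x\rangle\} = \{(x, x^*) \in X \times X^* \mid h(x, x^*) = \langle x^*, x\rangle\}$ is maximally monotone. $\square$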

15 The maximal monotonicity of the sum of two maximal monotone operators
Let X be a nontrivial real Banach space. We will address the question when
the sum of two maximal monotone operators is maximally monotone. We have
seen in Proposition 12.3 that the sum is monotone, however, it is in general not
maximally monotone.

Example 15.1 The sets $C = \{(x_1, x_2) \in \mathbb{R}^2 \mid (x_1 - 1)^2 + x_2^2 \le 1\}$ and $D = \{(x_1, x_2) \in \mathbb{R}^2 \mid (x_1 + 1)^2 + x_2^2 \le 1\}$ are both nonempty, convex and closed, thus, according to Theorem 13.5, the normal cone operators $N_C, N_D : \mathbb{R}^2 \rightrightarrows \mathbb{R}^2$ are maximally monotone. For every $x \neq (0, 0)$ it holds $N_C(x) + N_D(x) = \emptyset$, while $N_C(0, 0) + N_D(0, 0) = \mathbb{R}_- \times \{0\} + \mathbb{R}_+ \times \{0\} = \mathbb{R} \times \{0\}$. One can easily see that $N_C + N_D$ is not maximally monotone, since $G(N_C + N_D) = \{(0, 0)\} \times (\mathbb{R} \times \{0\}) \subsetneq G(N_{C \cap D}) = \{(0, 0)\} \times \mathbb{R}^2$.

Remark 15.2 If $f, g : X \to \overline{\mathbb{R}}$ are proper, convex and lower semicontinuous functions, then the (convex) subdifferential operators $\partial f, \partial g : X \rightrightarrows X^*$ are maximally monotone. We have seen in Theorem 10.2 (b) that if one of the regularity conditions $(RC_i^F)$, $i \in \{1, 2, 2', 3, 4, 5\}$, is fulfilled, then $\partial f(x) + \partial g(x) = \partial(f + g)(x)$ for all $x \in X$, which means that $\partial f + \partial g$ is maximally monotone.

In the following theorem we will provide a weak sufficient regularity condition for the maximal monotonicity of the sum of two maximal monotone operators in reflexive Banach spaces formulated in terms of their Fitzpatrick functions.

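Theorem 15.3 Let $X$ be a reflexive Banach space and $S, T : X \rightrightarrows X^*$ be two maximal monotone operators such that
\[
0 \in \operatorname{sqri}\left(\operatorname{Pr}_X \operatorname{dom} \varphi_S - \operatorname{Pr}_X \operatorname{dom} \varphi_T\right). \tag{15.1}
\]
Then $S + T$ is maximally monotone.

Proof. Let
\[
h : X \times X^* \to \overline{\mathbb{R}}, \quad h(x, x^*) = \inf_{s^* \in X^*} \{\varphi_S(x, s^*) + \varphi_T(x, x^* - s^*)\}.
\]
Since $h$ is the infimal value function of a convex function, it is convex.
For all $(x, x^*) \in X \times X^*$ it holds
\[
\begin{aligned}
h(x, x^*) = \inf_{s^* \in X^*} \{\varphi_S(x, s^*) + \varphi_T(x, x^* - s^*)\} &\ge \inf_{s^* \in X^*} \{\langle s^*, x\rangle + \langle x^* - s^*, x\rangle\} \\
&\ge \langle x^*, x\rangle > -\infty.
\end{aligned}
\]
Since $0 \in \operatorname{Pr}_X \operatorname{dom} \varphi_S - \operatorname{Pr}_X \operatorname{dom} \varphi_T$, there exist $x \in X$ and $s^*, t^* \in X^*$ such that $(x, s^*) \in \operatorname{dom} \varphi_S$ and $(x, t^*) \in \operatorname{dom} \varphi_T$. Consequently, $h(x, s^* + t^*) \le \varphi_S(x, s^*) + \varphi_T(x, t^*) < +\infty$, which proves that $h$ is proper.
Let $(x, x^*) \in X \times X^*$ be fixed. It holds
\[
\begin{aligned}
h^*(x^*, x) &= \sup_{(z, z^*) \in X \times X^*} \{\langle x^*, z\rangle + \langle z^*, x\rangle - h(z, z^*)\} \\
&= \sup_{z \in X,\, z^*, s^* \in X^*} \{\langle x^*, z\rangle + \langle z^*, x\rangle - \varphi_S(z, s^*) - \varphi_T(z, z^* - s^*)\} \\
&= \sup_{z \in X,\, s^*, t^* \in X^*} \{\langle x^*, z\rangle + \langle s^* + t^*, x\rangle - \varphi_S(z, s^*) - \varphi_T(z, t^*)\} \\
&= (F + G)^*(x^*, x, x),
\end{aligned}
\tag{15.2}
\]
where
\[
F, G : X \times X^* \times X^* \to \overline{\mathbb{R}}, \quad F(u, s^*, t^*) = \varphi_S(u, s^*) \quad \text{and} \quad G(u, s^*, t^*) = \varphi_T(u, t^*).
\]
The functions $F$ and $G$ are proper, convex and lower semicontinuous and fulfil
\[
\operatorname{dom} F = \{(u, s^*, t^*) \in X \times X^* \times X^* \mid (u, s^*) \in \operatorname{dom} \varphi_S\}
\]
and
\[
\operatorname{dom} G = \{(u, s^*, t^*) \in X \times X^* \times X^* \mid (u, t^*) \in \operatorname{dom} \varphi_T\}.
\]
It holds
\[
\operatorname{dom} F - \operatorname{dom} G = \left(\operatorname{Pr}_X \operatorname{dom} \varphi_S - \operatorname{Pr}_X \operatorname{dom} \varphi_T\right) \times X^* \times X^*,
\]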



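thus, according to (15.1), $0 \in \operatorname{sqri}(\operatorname{dom} F - \operatorname{dom} G)$. From Theorem 10.2 (a) and (15.2) we have
\[
\begin{aligned}
h^*(x^*, x) = (F + G)^*(x^*, x, x) &= (F^* \mathbin{\square} G^*)(x^*, x, x) \\
&= \min_{(u^*, s, t) \in X^* \times X \times X} \{F^*(u^*, s, t) + G^*(x^* - u^*, x - s, x - t)\}.
\end{aligned}
\tag{15.3}
\]
For all $(u^*, s, t) \in X^* \times X \times X$ it holds
\[
F^*(u^*, s, t) = \sup_{u \in X,\, s^*, t^* \in X^*} \{\langle u^*, u\rangle + \langle s^*, s\rangle + \langle t^*, t\rangle - \varphi_S(u, s^*)\}
= \begin{cases}
\varphi_S^*(u^*, s), & \text{if } t = 0,\\
+\infty, & \text{otherwise},
\end{cases}
\]
and
\[
\begin{aligned}
G^*(x^* - u^*, x - s, x - t) &= \sup_{u \in X,\, s^*, t^* \in X^*} \{\langle x^* - u^*, u\rangle + \langle s^*, x - s\rangle + \langle t^*, x - t\rangle - \varphi_T(u, t^*)\} \\
&= \begin{cases}
\varphi_T^*(x^* - u^*, x - t), & \text{if } x = s,\\
+\infty, & \text{otherwise},
\end{cases}
\end{aligned}
\]
consequently,
\[
h^*(x^*, x) = \min_{u^* \in X^*} \{\varphi_S^*(u^*, x) + \varphi_T^*(x^* - u^*, x)\}.
\]
From here, by using (14.1), we get for all $(x, x^*) \in X \times X^*$
\[
h^*(x^*, x) \ge \inf_{u^* \in X^*} \{\langle u^*, x\rangle + \langle x^* - u^*, x\rangle\} = \langle x^*, x\rangle.
\]
This proves that the hypotheses of Proposition 14.9 are fulfilled, consequently, the set-valued operator having as graph $\{(x, x^*) \in X \times X^* \mid h^*(x^*, x) = \langle x^*, x\rangle\}$ is maximally monotone.
We notice that for $(x, x^*) \in X \times X^*$ it holds
\[
\begin{aligned}
h^*(x^*, x) = \langle x^*, x\rangle
&\Leftrightarrow \exists u^* \in X^* \text{ such that } \langle x^*, x\rangle = \varphi_S^*(u^*, x) + \varphi_T^*(x^* - u^*, x) \ge \langle u^*, x\rangle + \langle x^* - u^*, x\rangle = \langle x^*, x\rangle \\
&\Leftrightarrow \exists u^* \in X^* \text{ such that } \varphi_S^*(u^*, x) = \langle u^*, x\rangle \text{ and } \varphi_T^*(x^* - u^*, x) = \langle x^* - u^*, x\rangle \\
&\Leftrightarrow \exists u^* \in X^* \text{ such that } (x, u^*) \in G(S) \text{ and } (x, x^* - u^*) \in G(T) \\
&\Leftrightarrow \exists u^* \in X^* \text{ such that } x^* = u^* + (x^* - u^*) \in S(x) + T(x) \\
&\Leftrightarrow (x, x^*) \in G(S + T).
\end{aligned}
\]
This proves that $S + T$ is maximally monotone. $\square$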
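A small one-dimensional case shows how condition (15.1) can be checked in practice. The sketch below assumes $X = \mathbb{R}$, $S = N_{[0,+\infty)}$ and $T = N_{(-\infty,1]}$. These operators are illustrative choices and do not appear in the notes. It uses the formula for the Fitzpatrick function of a normal cone operator from Example 14.3 (b) and the usual characterization of $0 \in \operatorname{sqri}(A)$ through $\operatorname{cone}(A)$ being a closed linear subspace.

```latex
% Assumptions: X = \mathbb{R}, S = N_{[0,+\infty)}, T = N_{(-\infty,1]}.
% By Example 14.3 (b), \varphi_{N_C}(x, x^*) = \delta_C(x) + \delta_C^*(x^*), and \delta_C^*(0) = 0, hence
\[
\operatorname{Pr}_X \operatorname{dom} \varphi_S = [0, +\infty), \qquad
\operatorname{Pr}_X \operatorname{dom} \varphi_T = (-\infty, 1].
\]
% The difference of the two projections is
\[
[0, +\infty) - (-\infty, 1] = [-1, +\infty), \qquad
\operatorname{cone}\big([-1, +\infty)\big) = \mathbb{R},
\]
% a closed linear subspace, so (15.1) holds. The sum is indeed maximally monotone:
\[
(S + T)(x) =
\begin{cases}
\mathbb{R}_-, & \text{if } x = 0,\\
\{0\}, & \text{if } 0 < x < 1,\\
\mathbb{R}_+, & \text{if } x = 1,\\
\emptyset, & \text{otherwise},
\end{cases}
\qquad \text{that is,} \quad S + T = N_{[0,1]}.
\]
```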



Remark 15.4 If $X$ is a reflexive Banach space and $S, T : X \rightrightarrows X^*$ are two maximal monotone operators, then one can prove that (see [9])
\[
0 \in \operatorname{sqri}\left(\operatorname{Pr}_X \operatorname{dom} \varphi_S - \operatorname{Pr}_X \operatorname{dom} \varphi_T\right)
\]
if and only if
\[
\operatorname{cone}(D(S) - D(T)) \text{ is a closed linear subspace.}
\]
A sufficient condition for these conditions to hold and, consequently, for the maximal monotonicity of $S + T$ is the celebrated Rockafellar regularity condition
\[
\operatorname{int}(D(S)) \cap D(T) \neq \emptyset.
\]
Indeed, one has from here
\[
0 \in \operatorname{int}(D(S) - D(T))
\]
and, consequently, $\operatorname{cone}(D(S) - D(T)) = X$.

Notice that for the maximal monotone operators in Example 15.1 one has
\[
\operatorname{cone}(D(N_C) - D(N_D)) = \operatorname{cone}(C - D) = \operatorname{cone}(C) = \big((0, +\infty) \times \mathbb{R}\big) \cup \{(0, 0)\},
\]
which is obviously not a linear subspace.
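The cone generated by $C - D$ in Example 15.1 can be worked out explicitly. The sketch below writes $B$ for the closed unit ball of $\mathbb{R}^2$, a symbol introduced here only for this computation.

```latex
% Notation (assumed here): B is the closed unit ball of \mathbb{R}^2,
% so that C = (1, 0) + B and D = (-1, 0) + B.
\[
C - D = (2, 0) + B - B = (2, 0) + 2B
= \{(x_1, x_2) \in \mathbb{R}^2 \mid (x_1 - 2)^2 + x_2^2 \le 4\}.
\]
% For \lambda > 0, a point x = (x_1, x_2) lies in \lambda (C - D) if and only if
\[
(x_1 - 2\lambda)^2 + x_2^2 \le 4\lambda^2
\ \Leftrightarrow \
x_1^2 + x_2^2 \le 4 \lambda x_1 ,
\]
% which is solvable for some \lambda > 0 exactly when x_1 > 0 or x = (0, 0).
% In particular (1, 0) \in \operatorname{cone}(C - D) but (-1, 0) \notin \operatorname{cone}(C - D),
% so \operatorname{cone}(C - D) is not a linear subspace.
```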

Remark 15.5 Since $F$ and $G$ in the proof of Theorem 15.3 are proper, convex and lower semicontinuous functions, one can obtain relation (15.3), according to Theorem 10.2 (a), also if the closedness-type regularity condition (here we also take into account that convex closed sets are weakly closed and that $X$ is reflexive)
\[
\operatorname{epi} F^* + \operatorname{epi} G^* \text{ is closed in } X^* \times X \times X \times \mathbb{R}
\]
is fulfilled.
In other words, if $X$ is a reflexive Banach space and $S, T : X \rightrightarrows X^*$ are two maximal monotone operators such that
\[
\{(x^* + y^*, x, y, r) \in X^* \times X \times X \times \mathbb{R} \mid \varphi_S^*(x^*, x) + \varphi_T^*(y^*, y) \le r\} \text{ is closed,} \tag{15.4}
\]
then $S + T$ is maximally monotone.
We will provide in the following an example of a pair of maximal monotone operators for which the interiority-type regularity conditions are not fulfilled, while the closedness-type one holds.
For $f, g : \mathbb{R} \to \overline{\mathbb{R}}$, $f(x) = \frac{1}{2}x^2 + \delta_{\mathbb{R}_+}(x)$ and $g(x) = \delta_{\mathbb{R}_-}(x)$, the operators $S = \partial f$ and $T = \partial g$ are maximally monotone. For all $x \in \mathbb{R}$ we have
\[
S(x) = x + N_{\mathbb{R}_+}(x) =
\begin{cases}
\{x\}, & \text{if } x > 0,\\
\mathbb{R}_-, & \text{if } x = 0,\\
\emptyset, & \text{otherwise},
\end{cases}
\]

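$T(x) = N_{\mathbb{R}_-}(x)$, and
\[
S(x) + T(x) =
\begin{cases}
\mathbb{R}, & \text{if } x = 0,\\
\emptyset, & \text{otherwise},
\end{cases}
\]
which means that $S + T$ is maximally monotone. It holds $\operatorname{cone}(D(S) - D(T)) = \mathbb{R}_+$, thus, the interiority-type regularity conditions are not fulfilled. For all $(x, x^*) \in \mathbb{R}^2$ it holds
\[
\begin{aligned}
\varphi_S(x, x^*) &= \max\left\{\sup_{y^* \le 0} x y^*, \ \sup_{y > 0} \{(x^* + x) y - y^2\}\right\} \\
&= \begin{cases}
\frac{1}{4}(x + x^*)^2, & \text{if } x \ge 0, \ x + x^* \ge 0,\\
0, & \text{if } x \ge 0, \ x + x^* \le 0,\\
+\infty, & \text{otherwise},
\end{cases}
\end{aligned}
\]
and, for all $(y, y^*) \in \mathbb{R}^2$, by Example 14.3 (b), $\varphi_T(y, y^*) = \delta_{\mathbb{R}_-}(y) + \delta_{\mathbb{R}_+}(y^*)$. Therefore, for all $(x, x^*) \in \mathbb{R}^2$,
\[
\begin{aligned}
\varphi_S^*(x^*, x) &= \max\left\{\sup_{z \ge 0,\, z + z^* \le 0} \{x^* z + x z^*\}, \ \sup_{z \ge 0,\, z + z^* \ge 0} \left\{x^* z + x z^* - \frac{1}{4}(z + z^*)^2\right\}\right\} \\
&= \begin{cases}
x^2, & \text{if } x \ge 0, \ x \ge x^*,\\
+\infty, & \text{otherwise},
\end{cases}
\end{aligned}
\]
and, for all $(y, y^*) \in \mathbb{R}^2$, $\varphi_T^*(y^*, y) = \delta_{\mathbb{R}_+}(y^*) + \delta_{\mathbb{R}_-}(y)$. Consequently,
\[
\{(x^* + y^*, x, y, r) \in \mathbb{R} \times \mathbb{R} \times \mathbb{R} \times \mathbb{R} \mid \varphi_S^*(x^*, x) + \varphi_T^*(y^*, y) \le r\}
= \mathbb{R} \times \bigcup_{x \ge 0} \left(\{x\} \times \mathbb{R}_- \times [x^2, +\infty)\right),
\]
which is a closed set, thus, the closedness-type regularity condition (15.4) is fulfilled.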

Remark 15.6 (Rockafellar's conjecture) R.T. Rockafellar conjectured that if $X$ is a general Banach space and $S, T : X \rightrightarrows X^*$ are two maximal monotone operators such that $\operatorname{int}(D(S)) \cap D(T) \neq \emptyset$, then $S + T$ is maximally monotone. The conjecture is not solved yet.

Bibliography

[1] H.H. Bauschke, P.L. Combettes: Convex Analysis and Monotone Operator Theory in Hilbert Spaces, Springer New York Dordrecht Heidelberg London, 2017

[2] J.M. Borwein, J.D. Vanderwerff: Convex Functions, Cambridge University Press, 2010

[3] R.I. Boţ: Conjugate Duality in Convex Optimization, Lecture Notes in Economics and Mathematical Systems, Vol. 637, Springer Berlin Heidelberg, 2010

[4] R.I. Boţ: Skript zur Bachelor-Vorlesung “Funktionalanalysis” (SS2020), www.mat.univie.ac.at/~rabot/tutorials/st20/skriptfunktionalanalysis.pdf

[5] I. Ekeland, R. Témam: Convex Analysis and Variational Problems, SIAM, 1976

[6] R.R. Phelps: Convex Functions, Monotone Operators and Differentiability, Lecture Notes in Mathematics, Vol. 1364, Springer Berlin Heidelberg New York, 1993

[7] R.T. Rockafellar: Convex Analysis, Princeton University Press, 1970

[8] W. Rudin: Functional Analysis, McGraw-Hill, 1991

[9] S. Simons: From Hahn-Banach to Monotonicity, Lecture Notes in Mathematics, Vol. 1693, Springer New York, 2008

[10] C. Zălinescu: Convex Analysis in General Vector Spaces, World Scientific, 2002
