
BASICS OF CONVEX ANALYSIS

MARKUS GRASMAIR

1. Main Definitions
We start by providing the central definitions of convex functions and convex sets.
Definition 1. A function f : Rn → R ∪ {+∞} is called convex, if
(1) f (λx + (1 − λ)y) ≤ λf (x) + (1 − λ)f (y)
for all x, y ∈ Rn and all 0 < λ < 1.
It is called strictly convex, if it is convex and
f (λx + (1 − λ)y) < λf (x) + (1 − λ)f (y)
whenever x ≠ y, 0 < λ < 1, and f(x) < ∞, f(y) < ∞.
It is important to note here that we allow the functions we consider to take the
value +∞. This sometimes leads to complications in the notation, but will be very
helpful later on. Indeed, convex functions taking the value +∞ appear naturally
as soon as we start looking at duality, which is possibly the central topic in convex
analysis. Note, however, that the functions are not allowed to take the value −∞.
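As a quick illustration, the following sketch checks inequality (1) numerically on random triples; it assumes NumPy, and the test function f(x) = ‖x‖₂² as well as the sample sizes are arbitrary choices.

```python
# Minimal numerical sanity check of the convexity inequality (1),
# here for the convex test function f(x) = ||x||_2^2 (an arbitrary choice).
import numpy as np

def f(x):
    return np.dot(x, x)

rng = np.random.default_rng(0)
for _ in range(1000):
    x, y = rng.normal(size=3), rng.normal(size=3)
    lam = rng.uniform(0.0, 1.0)
    lhs = f(lam * x + (1 - lam) * y)
    rhs = lam * f(x) + (1 - lam) * f(y)
    assert lhs <= rhs + 1e-12, "inequality (1) violated"
print("inequality (1) holds on all sampled triples (x, y, lambda)")
```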
Definition 2. A set K ⊂ Rn is convex, if
λx + (1 − λ)y ∈ K
whenever x, y ∈ K, and 0 < λ < 1.
We note that there is a very close connection between the convexity of functions
and the convexity of sets. Given a set K ⊂ Rn , we denote by χK : Rn → R ∪ {+∞}
its characteristic function, given by
χK(x) = 0 if x ∈ K,   and   χK(x) = +∞ if x ∉ K.
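In code, such a function can be modelled with floating-point infinity. The following sketch (assuming NumPy; the closed unit ball is an arbitrary choice of convex set K) illustrates this.

```python
# Sketch of the characteristic function chi_K, with K the closed unit ball
# {x : ||x||_2 <= 1} (an arbitrary choice); +infinity is encoded as np.inf.
import numpy as np

def chi_K(x):
    return 0.0 if np.linalg.norm(x) <= 1.0 else np.inf

print(chi_K(np.array([0.5, 0.0])))  # 0.0, since the point lies in K
print(chi_K(np.array([2.0, 0.0])))  # inf, since the point lies outside K
```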
Lemma 3. A set K is convex, if and only if its characteristic function χK is
convex.
Proof. Assume that K is convex, and let x, y ∈ Rn and 0 < λ < 1. If either x ∉ K
or y ∉ K, then λχK(x) + (1 − λ)χK(y) = ∞, and therefore (1) (with f = χK) is
automatically satisfied. On the other hand, if x, y ∈ K, then also λx+(1−λ)y ∈ K,
and therefore
χK (λx + (1 − λ)y) = 0 = λχK (x) + (1 − λ)χK (y).
Thus χK is convex.
Now assume that χK is convex, and let x, y ∈ K, and 0 < λ < 1. The convexity
of χK implies that
χK (λx + (1 − λ)y) ≤ λχK (x) + (1 − λ)χK (y) = 0.
Since χK only takes the values 0 and +∞, this shows that, actually,
χK(λx + (1 − λ)y) = 0, and therefore λx + (1 − λ)y ∈ K. Thus K is convex. □
Date: March 2015.



Conversely, given a function f : Rn → R ∪ {+∞}, define its epigraph¹ as
epi(f) := {(x, t) ∈ Rn × R : t ≥ f(x)}.
Lemma 4. The function f : Rn → R ∪ {+∞} is convex, if and only if its epigraph
epi(f) ⊂ Rn+1 is convex.
Proof. Assume first that the epigraph of f is convex and let x, y ∈ Rn be such that
f(x), f(y) < ∞, and let 0 < λ < 1. Since (x, f(x)), (y, f(y)) ∈ epi(f) it follows
that
λ(x, f(x)) + (1 − λ)(y, f(y)) ∈ epi(f),
which means that
λf(x) + (1 − λ)f(y) ≥ f(λx + (1 − λ)y).
Since this holds for every x, y with f(x), f(y) < ∞ and every 0 < λ < 1, and since
(1) holds trivially whenever f(x) = ∞ or f(y) = ∞, the function f is convex.
Now assume that the function f is convex and let (x, t), (y, s) ∈ epi(f ), and
0 < λ < 1. Then
λt + (1 − λ)s ≥ λf (x) + (1 − λ)f (y) ≥ f (λx + (1 − λ)y),
which implies that λ(x, t) + (1 − λ)(y, s) ∈ epi(f). Thus epi(f) is convex. □
2. Continuity
A convex function need not be continuous, as can easily be seen from the example
of the characteristic function of a convex set, which is only ever continuous
in the trivial situations K = ∅ and K = Rn.
An even more worrying situation (at least from an optimization point of view)
occurs for instance in the situation of the function f : R → R ∪ {+∞} defined by
f(x) = +∞  if x < 0,
f(x) = 1   if x = 0,
f(x) = x²  if x > 0.
This function can easily be seen to be convex, and it is obviously not continuous at 0.
The problem is that it is not even lower semi-continuous: if you approach 0
from the right, the function values tend to 0, but f(0) = 1. Thus this is a convex
(and obviously coercive) function which does not attain its minimum.
Fortunately, problems with continuity and lower semi-continuity only occur near
points where a convex function takes the value +∞, as the following result shows.
In the following we will always denote by
dom(f) := {x ∈ Rn : f(x) < ∞}
the domain of the function f . We will say that a function is proper, if its domain
is non-empty; that is, the function is not identically equal to +∞.
Proposition 5. Assume that f : Rn → R∪{+∞} is convex and that x is contained
in the interior of dom(f ). Then f is locally Lipschitz continuous at x.
In other words, discontinuities of convex functions can only appear at the boundary
of their domain.

Proof. See [3, Thm. 10.4]. □
¹The word epigraph is composed of the Greek preposition epi, meaning "above" or "over", and
the word graph. Thus it simply means "above the graph."

3. Differentiation
Assume now that f : Rn → R ∪ {+∞} is any function and that x ∈ dom(f ).
Given d ∈ Rn , we define the directional derivative of f at x in direction d as
Df(x; d) := lim_{t→0, t>0} (f(x + td) − f(x))/t ∈ R ∪ {±∞},
provided that the limit exists (in the extended real line R ∪ {±∞}).
For general functions f, the directional derivative need not exist (take
for instance the function x ↦ x sin(1/x), extended by 0 at x = 0, which can be
shown to have no directional derivatives at 0 in directions d ≠ 0). For convex
functions, however, the situation
is different.
Lemma 6. Assume that f is convex, x ∈ dom(f ), and d ∈ Rn . Then Df (x; d)
exists. If x is an element of the interior of dom(f ), then Df (x; d) < ∞.
Proof. Let 0 < t1 < t2. Then
x + t1d = (t1/t2)(x + t2d) + (1 − t1/t2)x,
and thus the convexity of f implies that
f(x + t1d) ≤ (1 − t1/t2)f(x) + (t1/t2)f(x + t2d),
which can be rewritten as
(f(x + t1d) − f(x))/t1 ≤ (f(x + t2d) − f(x))/t2.
Thus the function
t ↦ (f(x + td) − f(x))/t
is increasing. As a consequence, its limit as t → 0 from above (which is exactly
the directional derivative Df(x; d)) exists. In addition, if x is in the interior of
the domain of f, then f(x + td) is finite for t sufficiently small, and therefore
Df(x; d) < ∞. □
Using the same argumentation as in the proof of the previous statement, one sees
that
f(x + td) ≥ f(x) + tDf(x; d)
for all t ≥ 0. Thus a sufficient (and also necessary) condition for x ∈ dom(f ) to be
a global minimizer of f is that
Df (x; d) ≥ 0 for all d ∈ Rn .
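The monotonicity of the difference quotient from the proof of Lemma 6 can also be observed numerically. The following sketch assumes NumPy and uses the test function f(x) = ‖x‖₂², an arbitrary choice, for which Df(x; d) = 2xᵀd.

```python
# For convex f, the difference quotient t -> (f(x + t d) - f(x)) / t is
# increasing in t, so it decreases towards Df(x; d) as t -> 0 from above.
# Test case (an arbitrary choice): f(x) = ||x||_2^2, with Df(x; d) = 2 x^T d.
import numpy as np

def f(x):
    return np.dot(x, x)

x = np.array([1.0, 0.0])
d = np.array([1.0, 2.0])

for t in [1.0, 0.5, 0.1, 0.01, 0.001]:
    print(t, (f(x + t * d) - f(x)) / t)   # decreases towards 2 x^T d = 2
print("exact Df(x; d):", 2 * np.dot(x, d))
```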
Next, we will introduce another notion of derivatives that is tailored to the needs
of convex functions.
Definition 7. Let f : Rn → R ∪ {+∞} and x ∈ dom(f). An element ξ ∈ Rn with
the property that
f(y) ≥ f(x) + ξᵀ(y − x) for all y ∈ Rn
is called a subgradient of f at x.
The set of all subgradients of f at x is called the subdifferential of f at x, and
is denoted by ∂f (x).
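The defining inequality can be tested directly on sample points. The following sketch (assuming NumPy; the one-dimensional test function f(x) = |x| at x = 0 is an arbitrary choice, anticipating the computations below) checks candidate subgradients this way.

```python
# Check of the subgradient inequality from Definition 7 on a grid of trial
# points, for f(x) = |x| at x = 0, where the subgradients are exactly [-1, 1].
import numpy as np

def is_subgradient(f, x, xi, trial_points):
    return all(f(y) >= f(x) + xi * (y - x) - 1e-12 for y in trial_points)

ys = np.linspace(-5.0, 5.0, 1001)
print(is_subgradient(abs, 0.0, 0.5, ys))   # True:  0.5 lies in [-1, 1]
print(is_subgradient(abs, 0.0, 1.5, ys))   # False: 1.5 does not
```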
An immediate consequence of the definition of the subdifferential is the following
result:

Lemma 8. Assume that f : Rn → R ∪ {+∞} is convex and proper. Then x ∈ Rn
is a global minimizer of f, if and only if
0 ∈ ∂f(x).
Proof. Exercise. □
The subdifferential is connected to the usual derivative and the directional
derivative in the following way:
Proposition 9. Assume that f : Rn → R ∪ {+∞} is convex and that x ∈ dom(f ).
• If f is differentiable at x, then ∂f (x) = {∇f (x)}.
• If ∂f(x) contains a single element, say ∂f(x) = {ξ}, then f is differentiable
at x and ξ = ∇f(x).
Proof. See [3, Thm. 25.1]. □
Proposition 10. Assume that f : Rn → R ∪ {+∞} is convex and that x ∈ dom(f).
Then ξ ∈ ∂f(x), if and only if
ξᵀd ≤ Df(x; d) for all d ∈ Rn.
Proof. See [3, Thm. 23.2]. □
Next we will compute the subdifferential in some particular (and important)
cases:
• We consider first the function f : Rn → R,
f(x) = ‖x‖₂ = (Σ_{i=1}^n x_i²)^{1/2}.
If x ≠ 0, then f is differentiable at x with derivative ∇f(x) = x/‖x‖₂, and
therefore ∂f(x) = {x/‖x‖₂} for x ≠ 0.
Now consider the case x = 0. We have ξ ∈ ∂f(0), if and only if
ξᵀy ≤ f(y) − f(0) = ‖y‖₂
for all y ∈ Rn, which is equivalent to the condition that ‖ξ‖₂ ≤ 1.
Thus we have
∂f(x) = {x/‖x‖₂} if x ≠ 0,   and   ∂f(0) = {ξ ∈ Rn : ‖ξ‖₂ ≤ 1}.

• Consider now the function f : Rn → R ∪ {+∞},
f(x) = 0 if ‖x‖₂ ≤ 1,   and   f(x) = +∞ if ‖x‖₂ > 1.
If x ∈ Rn satisfies ‖x‖₂ < 1, then f is differentiable at x with gradient
equal to 0, and therefore ∂f(x) = {0}. If ‖x‖₂ > 1, then f(x) = +∞, and
therefore ∂f(x) = ∅. Finally, if ‖x‖₂ = 1, then ξ ∈ ∂f(x), if and only if
ξᵀ(y − x) ≤ f(y) − f(x) for all y ∈ Rn,
which is equivalent to the condition
ξᵀ(y − x) ≤ 0 for all y ∈ Rn with ‖y‖₂ ≤ 1.
The latter condition can in turn be shown to be equivalent to stating that
ξ = λx for some λ ≥ 0.

Thus we obtain that
∂f(x) = ∅ if ‖x‖₂ > 1,
∂f(x) = {λx : λ ≥ 0} if ‖x‖₂ = 1,
∂f(x) = {0} if ‖x‖₂ < 1.
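Both subdifferential formulas can be cross-checked against the definition of the subgradient. The following sketch (assuming NumPy; tolerances and sample sizes are arbitrary choices, and checking the inequality on random points is only a heuristic) does this for the two examples above.

```python
# Heuristic check of the two subdifferential formulas above: test the
# subgradient inequality f(y) >= f(x) + xi^T (y - x) on random points y.
import numpy as np

rng = np.random.default_rng(1)
ys = rng.normal(size=(2000, 2))

def is_subgradient(f, x, xi):
    return all(f(y) >= f(x) + xi @ (y - x) - 1e-9 for y in ys)

# f = ||.||_2: at x = 0 every xi with ||xi||_2 <= 1 is a subgradient.
f = np.linalg.norm
print(is_subgradient(f, np.zeros(2), np.array([0.6, 0.6])))   # True
print(is_subgradient(f, np.zeros(2), np.array([1.1, 0.0])))   # False

# Indicator of the unit ball: at x with ||x||_2 = 1 the subgradients are
# the outward normals xi = lam * x with lam >= 0.
def g(y):
    return 0.0 if np.linalg.norm(y) <= 1.0 else np.inf

x = np.array([1.0, 0.0])
print(is_subgradient(g, x, 3.0 * x))               # True: xi = 3x
print(is_subgradient(g, x, np.array([0.0, 1.0])))  # False: tangential xi
```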

Now assume that f is convex, that x, y ∈ Rn, and that ξ ∈ ∂f(x) and
η ∈ ∂f(y). Then the definition of the subdifferential implies that
f(y) − f(x) − ξᵀ(y − x) ≥ 0,
f(x) − f(y) − ηᵀ(x − y) ≥ 0.
Summing up these inequalities, we see that
(η − ξ)ᵀ(y − x) ≥ 0
whenever ξ ∈ ∂f(x) and η ∈ ∂f(y).
If f is differentiable, this inequality becomes
(∇f(y) − ∇f(x))ᵀ(y − x) ≥ 0 for all x, y ∈ Rn.
Now consider the case of a one-dimensional function. Then this further simplifies
to
(f′(y) − f′(x))(y − x) ≥ 0 for all x, y ∈ R,
which can be restated as
f′(y) ≥ f′(x) whenever y ≥ x.
In other words, the derivative of a one-dimensional convex function is monotonically
increasing. Even more, the converse is also true:
Proposition 11. Assume that f : R → R is convex and differentiable. Then f′ is
monotonically increasing. Conversely, if g : R → R is monotonically increasing
and continuous, then there exists a convex function f such that f′ = g.
Proof. The necessity of the monotonicity of f′ has already been shown above. Conversely,
if g is monotonically increasing, we can define
f(x) := ∫_0^x g(y) dy.
Then we have for y > x and 0 < λ < 1 that
f(λx + (1 − λ)y) = λf(x) + (1 − λ)f(y) + λ ∫_x^{λx+(1−λ)y} g(z) dz − (1 − λ) ∫_{λx+(1−λ)y}^y g(z) dz
≤ λf(x) + (1 − λ)f(y) + λ(1 − λ)(y − x)g(λx + (1 − λ)y) − (1 − λ)λ(y − x)g(λx + (1 − λ)y)
= λf(x) + (1 − λ)f(y),
where the inequality uses that g is increasing, so that g(z) ≤ g(λx + (1 − λ)y) on
the first interval of integration and g(z) ≥ g(λx + (1 − λ)y) on the second. This
shows that f is convex. □
In higher dimensions, the situation is slightly more complicated. Still, one can
show the following result:
Proposition 12. Assume that f : Rn → R is differentiable. Then f is convex, if
and only if ∇f is increasing in the sense that
(∇f(y) − ∇f(x))ᵀ(y − x) ≥ 0 for all x, y ∈ Rn.
Proof. Apply Proposition 11 to the one-dimensional functions t ↦ f(x + t(y − x)). □

Note the difference between the two statements: In the first proposition
(in one dimension), the monotonicity of g implies that it is the derivative of some
convex function. In the second proposition (in higher dimensions), we already
know that the object we are dealing with (∇f) is the derivative of some function;
monotonicity of the gradient implies that f is convex, and we do not have to worry
about the existence of f.
Finally, it is important to note that there is a close connection between properties
of the Hessian of a function and its convexity:
Proposition 13. Assume that f : Rn → R ∪ {+∞} is twice differentiable on its
domain, and that dom(f) is convex. Then f is convex, if and only if ∇²f(x) is
positive semi-definite for all x ∈ dom(f).
In addition, if ∇²f(x) is positive definite for all but a countable number of points
x ∈ dom(f), then f is strictly convex.
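For functions with an explicitly known Hessian, the criterion of Proposition 13 is easy to apply. The following sketch (assuming NumPy; the quadratic test function and the matrix A are arbitrary choices) checks positive semi-definiteness via the eigenvalues.

```python
# Hessian test of Proposition 13 for f(x) = ||Ax - b||_2^2, whose Hessian
# is the constant matrix 2 A^T A: it is always positive semi-definite, and
# positive definite exactly when A has full column rank.
import numpy as np

A = np.array([[1.0, 2.0], [0.0, 1.0], [1.0, 0.0]])
hessian = 2.0 * A.T @ A
eigenvalues = np.linalg.eigvalsh(hessian)
print(eigenvalues)                                # all >= 0, hence convex
print("positive definite:", bool(np.all(eigenvalues > 0)))
```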

4. Transformations of convex functions


Next we will discuss which operations preserve the convexity of functions, and
how these operations affect the subdifferential.
The first thing to note is that convex functions behave well under multiplication
with positive scalars: If f : Rn → R ∪ {+∞} is convex and λ > 0, then also λf is
convex and
∂(λf )(x) = λ∂f (x) for all x ∈ Rn .
Note that this result does not hold for negative scalars: If f is convex and λ < 0,
then λf will usually not be convex. Also, for λ = 0 some problems occur if f takes the
value +∞. First, it is not clear how to define the product 0 · (+∞) in a meaningful
way, and, second, the equality ∂(0·f )(x) = 0·∂f (x) need not necessarily be satisfied
for all x.
Next we consider a function that is defined as a supremum of convex functions:
Lemma 14. Assume that I is any index set (not necessarily finite or even countable)
and that fi : Rn → R ∪ {+∞}, i ∈ I, are convex. Define
g(x) := sup_{i∈I} fi(x).

Then g is convex.
In addition, if x ∈ Rn and g(x) = fj (x) for some j ∈ I, then
∂fj (x) ⊂ ∂g(x).
Proof. If x, y ∈ Rn and 0 < λ < 1, then
g(λx + (1 − λ)y) = sup_{i∈I} fi(λx + (1 − λ)y)
≤ sup_{i∈I} (λfi(x) + (1 − λ)fi(y))
≤ λ sup_{i∈I} fi(x) + (1 − λ) sup_{i∈I} fi(y)
= λg(x) + (1 − λ)g(y).
Thus g is convex.
In addition, if g(x) = fj(x) and ξ ∈ ∂fj(x), then
g(y) ≥ fj(y) ≥ fj(x) + ξᵀ(y − x) = g(x) + ξᵀ(y − x),
showing that ξ ∈ ∂g(x). □

There are two important things to note here: First, the result states that the
supremum of any number of convex functions is convex; there is no need to restrict
oneself to only finitely many functions. Second, even the minimum of two convex
functions is usually not convex any more.
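The second point is easy to observe numerically; in the following sketch (the pair of parabolas is an arbitrary choice) the midpoint condition fails for the minimum but not for the maximum.

```python
# The pointwise maximum of two convex functions stays convex, while the
# pointwise minimum need not: test the inequality from Definition 1 at a
# single well-chosen triple (x, y, lambda).
f1 = lambda x: (x - 1.0) ** 2
f2 = lambda x: (x + 1.0) ** 2
g_max = lambda x: max(f1(x), f2(x))
g_min = lambda x: min(f1(x), f2(x))

def violates_convexity(g, x, y, lam):
    return g(lam * x + (1 - lam) * y) > lam * g(x) + (1 - lam) * g(y) + 1e-12

print(violates_convexity(g_max, -1.0, 1.0, 0.5))  # False: max stays convex
print(violates_convexity(g_min, -1.0, 1.0, 0.5))  # True:  min is not convex
```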
Next we discuss sums of convex functions:
Proposition 15. Assume that f, g : Rn → R ∪ {+∞} are convex. Then also the
function f + g is convex and
∂f(x) + ∂g(x) ⊂ ∂(f + g)(x) for all x ∈ Rn.
If in addition f and g are lower semi-continuous and there exists some y ∈
dom(f ) ∩ dom(g) such that either one of the functions f and g is continuous at y,
then
∂(f + g)(x) = ∂f (x) + ∂g(x) for all x ∈ Rn .
Proof. The first part of the proposition is straightforward:
(f + g)(λx + (1 − λ)y) = f (λx + (1 − λ)y) + g(λx + (1 − λ)y)
≤ λf (x) + (1 − λ)f (y) + λg(x) + (1 − λ)g(y)
= λ(f + g)(x) + (1 − λ)(f + g)(y).
Also, if ξ ∈ ∂f(x) and η ∈ ∂g(x), then
(f + g)(y) ≥ f(x) + ξᵀ(y − x) + g(x) + ηᵀ(y − x) = (f + g)(x) + (ξ + η)ᵀ(y − x),
showing that ∂f(x) + ∂g(x) ⊂ ∂(f + g)(x).
The second part is tricky. A proof can for instance be found in [2, Chap. I,
Prop. 5.6]. □
In the above proposition, the sum of subdifferentials is defined as
∂f(x) + ∂g(x) = {ξ + η ∈ Rn : ξ ∈ ∂f(x) and η ∈ ∂g(x)}.

Example 16. Define the functions f, g : R² → R ∪ {+∞},
f(x) = 0 if ‖x‖₂ ≤ 1,   and   f(x) = +∞ if ‖x‖₂ > 1,
g(x) = 0 if ‖x − (2, 0)‖₂ ≤ 1,   and   g(x) = +∞ if ‖x − (2, 0)‖₂ > 1.
Both of these functions are convex. Moreover,
(f + g)(x) = 0 if x = (1, 0),   and   (f + g)(x) = +∞ else.
Now consider the subdifferentials of the different functions at (1, 0). We have
∂f(1, 0) = {λ(1, 0) : λ ≥ 0}
and
∂g(1, 0) = {µ(−1, 0) : µ ≥ 0}.
Thus
∂f(1, 0) + ∂g(1, 0) = R(1, 0).
On the other hand,
∂(f + g)(1, 0) = R².
Thus in this case we have
∂f(1, 0) + ∂g(1, 0) ⊊ ∂(f + g)(1, 0).
Finally, we study the composition of convex functions and linear transformations:

Proposition 17. Assume that f : Rn → R ∪ {+∞} is convex and that A ∈ Rn×m.
Then the function g : Rm → R ∪ {+∞} defined by
g(x) = f (Ax)
is convex.
If in addition there exists y ∈ Ran A such that f is continuous at y, then
∂g(x) = Aᵀ∂f(Ax)
for all x ∈ Rm.
Proof. Proving the convexity of g is straightforward. The proof of the representation
of the subdifferential can for instance be found in [2, Chap. I, Prop. 5.7]. □
Note that the second part of the previous result is just a kind of chain rule for
the composition of convex and linear functions. In the particular case where f is
differentiable, the result reads as
∇(f ◦ A)(x) = Aᵀ∇f(Ax),
which is exactly the usual chain rule. Differentiability, however, is not required in
the convex setting.
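For differentiable f, the chain rule above can be verified by finite differences. The following sketch assumes NumPy; the data A, x and the smooth test function are arbitrary choices.

```python
# Finite-difference check of grad(f o A)(x) = A^T grad f(Ax) for the smooth
# test function f(y) = ||y||_2^2, so that grad g(x) = 2 A^T A x.
import numpy as np

rng = np.random.default_rng(3)
A = rng.normal(size=(4, 3))
x = rng.normal(size=3)

def g(x):
    return np.sum((A @ x) ** 2)

eps = 1e-6
grad_fd = np.array([(g(x + eps * e) - g(x)) / eps for e in np.eye(3)])
grad_exact = 2.0 * A.T @ (A @ x)
print(np.allclose(grad_fd, grad_exact, atol=1e-4))   # True
```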

5. Duality
Assume that f : Rn → R ∪ {+∞} is convex and lower semi-continuous and
consider any linear function x ↦ ξᵀx (for some ξ ∈ Rn). One can then ask for
constants β ∈ R such that
f(x) ≥ ξᵀx − β for all x ∈ Rn
(for at least some ξ such a β exists; this follows basically from the definition and
the local Lipschitz continuity of convex functions). In other words,
β ≥ ξᵀx − f(x) for all x ∈ Rn.
Now consider the smallest such number β*, that is,
β* := inf{β ∈ R : β ≥ ξᵀx − f(x) for all x ∈ Rn}.
It is easy to see that β* > −∞ unless the function f is everywhere equal to +∞.
Thus one can equivalently write
β* = sup_{x∈Rn} (ξᵀx − f(x)).

The mapping that maps ξ to β* plays a central role in convex analysis.
Definition 18. Let f : Rn → R ∪ {+∞} be proper. Then its conjugate f* : Rn →
R ∪ {+∞} is defined as
f*(ξ) := sup_{x∈Rn} (ξᵀx − f(x)).

Lemma 19. Let f : Rn → R ∪ {+∞} be proper. Then f* is convex and lower
semi-continuous.
Proof. The function f* is the supremum of the collection of the affine (and thus in
particular convex and lower semi-continuous) functions hx(ξ) := ξᵀx − f(x),
x ∈ dom(f). Thus also f* is convex and lower semi-continuous. □

Example 20. Let
f(x) = (1/p)‖x‖_p^p = (1/p) Σ_{i=1}^n |x_i|^p
with 1 < p < +∞. In order to compute f*(ξ), we have to maximize the function
x ↦ hξ(x) := ξᵀx − f(x).
Since hξ is concave (that is, −hξ is convex) and differentiable, maximization is
equivalent to solving the equation ∇hξ(x) = 0. Now note that ∇hξ(x) = ξ − ∇f(x),
and
∇f(x) = (x_1|x_1|^{p−2}, …, x_n|x_n|^{p−2})ᵀ.
Thus the maximizer x* satisfies
x_i*|x_i*|^{p−2} = ξ_i
for all i. Solving these equations for x_i*, we obtain
x_i* = ξ_i|ξ_i|^{1/(p−1)−1}.
Thus
f*(ξ) = hξ(x*) = Σ_{i=1}^n |ξ_i|^{1/(p−1)+1} − (1/p) Σ_{i=1}^n |ξ_i|^{p/(p−1)}.
Defining
p* := p/(p − 1),
this can be written as
f*(ξ) = (1/p*) Σ_{i=1}^n |ξ_i|^{p*} = (1/p*)‖ξ‖_{p*}^{p*}.
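A one-dimensional numerical cross-check of this formula is straightforward: approximate the supremum in the definition of f* over a fine grid. The sketch below assumes NumPy; the value p = 3, the grid, and the test points are arbitrary choices.

```python
# Numerical cross-check of Example 20 in one dimension: for f(x) = |x|^p / p
# the conjugate should be f*(xi) = |xi|^{p*} / p* with p* = p / (p - 1).
import numpy as np

p = 3.0
p_star = p / (p - 1.0)
xs = np.linspace(-50.0, 50.0, 200001)   # grid approximating the supremum

def conjugate_numeric(xi):
    return np.max(xi * xs - np.abs(xs) ** p / p)

for xi in [0.5, 1.0, 2.0]:
    exact = abs(xi) ** p_star / p_star
    print(xi, conjugate_numeric(xi), exact)   # the two values agree closely
```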

Example 21. Let
f(x) = ‖x‖₁ = Σ_{i=1}^n |x_i|.
Then
f*(ξ) = sup_{x∈Rn} (ξᵀx − Σ_{i=1}^n |x_i|).
Now assume that |ξ_j| > 1 for some 1 ≤ j ≤ n, and consider only vectors x in
the supremum above whose jth entry is λ sgn ξ_j and all of whose other entries
are zero. Then we see that
f*(ξ) ≥ sup_{λ>0} (ξ_j λ sgn ξ_j − λ) = sup_{λ>0} λ(|ξ_j| − 1) = +∞.
On the other hand, if |ξ_j| ≤ 1 for all j (or, in other words, ‖ξ‖∞ ≤ 1), then
f*(ξ) = sup_{x∈Rn} (ξᵀx − Σ_{i=1}^n |x_i|) = sup_{x∈Rn} Σ_{i=1}^n (ξ_i x_i − |x_i|) = 0,
since all the terms in the sum are non-positive. Thus we see that
f*(ξ) = 0 if ‖ξ‖∞ ≤ 1,   and   f*(ξ) = +∞ if ‖ξ‖∞ > 1.
A trivial consequence of the definition of the conjugate of a function is the
so-called Fenchel inequality:

Lemma 22. If f : Rn → R ∪ {+∞} is proper, then
f(x) + f*(ξ) ≥ ξᵀx for all x, ξ ∈ Rn.
Proof. This follows immediately from the definition of f*. □
Proposition 23. Assume that f : Rn → R ∪ {+∞} is proper, convex and lower
semi-continuous and that x, ξ ∈ Rn. Then the following are equivalent:
ξᵀx = f(x) + f*(ξ),
ξ ∈ ∂f(x),
x ∈ ∂f*(ξ).
Proof. See [3, Thm. 23.5]. □
As a next step, it is also possible to compute the conjugate of the conjugate,
f**(x) := (f*)*(x) = sup_{ξ∈Rn} (ξᵀx − f*(ξ)).

Theorem 24. Assume that f : Rn → R ∪ {+∞} is proper, convex, and lower
semi-continuous. Then f** = f.
Sketch of Proof. First recall that f** is by construction convex and lower
semi-continuous. Moreover we have that
f**(x) = sup_{ξ∈Rn} (ξᵀx − f*(ξ)) = sup_{ξ∈Rn} (ξᵀx − sup_{y∈Rn} (ξᵀy − f(y)))
= sup_{ξ∈Rn} inf_{y∈Rn} (ξᵀ(x − y) + f(y)) ≤ f(x)
for every x ∈ Rn (for the last inequality, we simply choose y = x). Thus f** is a
convex and lower semicontinuous function that is smaller than or equal to f.
Now assume that there exists some x ∈ Rn such that f**(x) < f(x). One can
show² that, because of the convexity and lower semicontinuity of f, in this case
there exist δ > 0 and ξ ∈ Rn such that
f(y) ≥ f**(x) + ξᵀ(y − x) + δ for all y ∈ Rn.
In other words, the function f and the point f**(x) can be separated strictly by
the (shifted) hyperplane defined by ξ. Therefore
f**(x) = sup_{ζ∈Rn} inf_{y∈Rn} (ζᵀ(x − y) + f(y))
≥ inf_{y∈Rn} (ξᵀ(x − y) + f(y)) ≥ f**(x) + δ,
which is obviously a contradiction. Thus f**(x) ≥ f(x) for every x ∈ Rn. Since
we have already shown that f**(x) ≤ f(x) for all x, this implies that, actually,
f** = f. □
Remark 25. Note that we have actually shown in the proof that for all proper
functions f (not necessarily convex or lower semicontinuous), the function f** is
the largest convex and lower semicontinuous function below f. That is, the function
f** is the convex and lower semicontinuous hull of f.

²Basically, the proof of this statement relies on so-called separation theorems, which state
that it is possible to separate a closed convex set from a point outside said set by means of a
hyperplane. Applying this result to the epigraph of f yields the claimed result. A large collection
of such separation theorems can be found in [3, Section 11].

The following list, taken from [1, p. 50], gives a short overview of some important
functions and their conjugates. Note that all these functions are lower semi-continuous
and convex, and thus we have f** = f; that is, the functions on the left-hand
side are also the conjugates of the functions on the right-hand side:
f(x) = 0   ⟺   f*(ξ) = 0 if ξ = 0, +∞ else;
f(x) = 0 if x ≥ 0, +∞ else   ⟺   f*(ξ) = 0 if ξ ≤ 0, +∞ else;
f(x) = 0 if |x| ≤ 1, +∞ else   ⟺   f*(ξ) = |ξ|;
f(x) = |x|^p/p   ⟺   f*(ξ) = |ξ|^{p*}/p*;
f(x) = √(1 + x²)   ⟺   f*(ξ) = −√(1 − ξ²) if |ξ| ≤ 1, +∞ else;
f(x) = eˣ   ⟺   f*(ξ) = ξ log ξ − ξ if ξ ≥ 0, +∞ else.
In addition, the next list provides a few basic rules for the computation of conjugates:
f(x) = h(tx),   f*(ξ) = h*(ξ/t),   t ≠ 0;
f(x) = h(x + y),   f*(ξ) = h*(ξ) − ξᵀy,   y ∈ Rn;
f(x) = λh(x),   f*(ξ) = λh*(ξ/λ),   λ > 0.
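These rules can likewise be spot-checked numerically; the following sketch (assuming NumPy; the function h(x) = x²/2, which satisfies h* = h, and the parameter values are arbitrary choices) verifies the scaling rule.

```python
# Spot check of the scaling rule: for h(x) = x^2 / 2 one has h* = h, so
# f = lam * h should have conjugate f*(xi) = lam * h*(xi/lam) = xi^2 / (2 lam).
import numpy as np

lam = 2.5
xs = np.linspace(-100.0, 100.0, 400001)   # grid approximating the supremum

def conj(fun, xi):
    return np.max(xi * xs - fun(xs))

f = lambda x: lam * x**2 / 2.0
for xi in [-1.0, 0.5, 3.0]:
    print(conj(f, xi), xi**2 / (2.0 * lam))   # the two values agree closely
```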

6. Primal and dual problem


We now consider the special (but important) case of an optimization problem
on Rn of the form
f (x) + g(Ax) → min,
where f : Rn → R ∪ {+∞} is convex, A ∈ Rm×n, and g : Rm → R ∪ {+∞} is convex.

Its dual problem is the optimization problem on Rm given by
f*(Aᵀξ) + g*(−ξ) → min.
One of the central theorems in convex analysis is the following one, which shows
that, under certain non-restrictive assumptions, these two problems are equivalent
in some sense.
Theorem 26. Assume that f : Rn → R ∪ {+∞} is convex, A ∈ Rm×n, and
g : Rm → R ∪ {+∞} is convex. Assume moreover that there exists x ∈ dom(f)
such that g is continuous at Ax, and let x* ∈ Rn and ξ* ∈ Rm.
Then the following are equivalent:
(1) The vector x* minimizes the function x ↦ f(x) + g(Ax) and ξ* minimizes
the function ξ ↦ f*(Aᵀξ) + g*(−ξ).
(2) We have
f(x*) + g(Ax*) + f*(Aᵀξ*) + g*(−ξ*) = 0.
(3) We have
x* ∈ ∂f*(Aᵀξ*) and Ax* ∈ ∂g*(−ξ*).
(4) We have
Aᵀξ* ∈ ∂f(x*) and −ξ* ∈ ∂g(Ax*).

Proof. See [3, Thms. 31.2, 31.3]. □

Basically, this result allows one to switch freely between the primal (original)
problem and the dual problem and to try to solve the one that appears to be
easier. For instance, if n ≫ m, the primal problem may be much more complicated
than the dual problem, because the number of variables is much larger. Thus,
instead of solving the n-dimensional primal problem, one can try to solve the
m-dimensional dual problem. If in addition the function f* is differentiable (which is
the case if f is strictly convex), then x* can easily be recovered from the condition
x* ∈ ∂f*(Aᵀξ*), since the subdifferential of f* then contains only a single element.
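As an illustration of Theorem 26, the following sketch (assuming NumPy; the quadratic pair and the data A, b are arbitrary choices) solves both the primal and the dual problem in closed form and verifies conditions (2) and (4).

```python
# Illustration of Theorem 26 for f(x) = ||x||^2 / 2 and g(z) = ||z - b||^2 / 2,
# with conjugates f*(xi) = ||xi||^2 / 2 and g*(xi) = ||xi||^2 / 2 + xi^T b.
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(3, 2))
b = rng.normal(size=3)
n, m = 2, 3

# primal: min_x ||x||^2/2 + ||Ax - b||^2/2   =>   (I + A^T A) x* = A^T b
x_star = np.linalg.solve(np.eye(n) + A.T @ A, A.T @ b)
primal = 0.5 * x_star @ x_star + 0.5 * np.sum((A @ x_star - b) ** 2)

# dual: min_xi ||A^T xi||^2/2 + ||xi||^2/2 - xi^T b   =>   (A A^T + I) xi* = b
xi_star = np.linalg.solve(A @ A.T + np.eye(m), b)
dual = (0.5 * np.sum((A.T @ xi_star) ** 2)
        + 0.5 * xi_star @ xi_star - xi_star @ b)

print(primal + dual)                        # approximately 0, condition (2)
print(np.allclose(A.T @ xi_star, x_star))   # A^T xi* = grad f(x*) = x*, (4)
```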

References
[1] Jonathan M. Borwein and Adrian S. Lewis. Convex Analysis and Nonlinear Optimization: Theory and Examples. CMS Books in Mathematics/Ouvrages de Mathématiques de la SMC, 3. Springer-Verlag, New York, 2000.
[2] I. Ekeland and R. Temam. Convex Analysis and Variational Problems. North-Holland, Amsterdam, 1976.
[3] R. T. Rockafellar. Convex Analysis, volume 28 of Princeton Mathematical Series. Princeton
University Press, Princeton, 1970.

Department of Mathematics, Norwegian University of Science and Technology, 7491 Trondheim, Norway
E-mail address: [email protected]
