MARKUS GRASMAIR
1. Main Definitions
We start with providing the central definitions of convex functions and convex
sets.
Definition 1. A function f : R^n → R ∪ {+∞} is called convex, if
(1) f(λx + (1 − λ)y) ≤ λf(x) + (1 − λ)f(y)
for all x, y ∈ R^n and all 0 < λ < 1.
It is called strictly convex, if it is convex and
f (λx + (1 − λ)y) < λf (x) + (1 − λ)f (y)
whenever x ≠ y, 0 < λ < 1, and f(x) < ∞, f(y) < ∞.
It is important to note here that we allow the functions we consider to take the
value +∞. This leads sometimes to complications in the notation, but will be very
helpful later on. Indeed, convex functions taking the value +∞ appear naturally
as soon as we start looking at duality, which is possibly the central topic in convex
analysis. Note, however, that the functions may not take the value −∞.
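As a small illustration of Definition 1 with extended real values, the following sketch tests inequality (1) on a finite sample. The helper name `satisfies_convexity` and the test function are assumptions made for this example only; a finite sample can refute convexity but cannot prove it.

```python
import math

def satisfies_convexity(f, x, y, lam, tol=1e-12):
    """Check inequality (1) for one triple (x, y, lam) with 0 < lam < 1."""
    lhs = f(lam * x + (1 - lam) * y)
    # math.inf propagates through the right-hand side: lam * inf == inf for lam > 0.
    rhs = lam * f(x) + (1 - lam) * f(y)
    return lhs <= rhs + tol

def f(x):
    """Hypothetical example: x**2 on [0, inf) and +inf elsewhere."""
    return x * x if x >= 0 else math.inf

points = [-2.0, -0.5, 0.0, 0.3, 1.0, 4.0]
lams = [0.1, 0.5, 0.9]
all_ok = all(
    satisfies_convexity(f, x, y, lam)
    for x in points for y in points for lam in lams
)
```

Whenever one of the two points lies outside the domain, the right-hand side is +∞ and the inequality holds trivially, which is exactly why allowing the value +∞ does not disturb the definition.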
Definition 2. A set K ⊂ Rn is convex, if
λx + (1 − λ)y ∈ K
whenever x, y ∈ K, and 0 < λ < 1.
We note that there is a very close connection between the convexity of functions
and the convexity of sets. Given a set K ⊂ R^n, we denote by χ_K : R^n → R ∪ {+∞}
its characteristic function, given by
\[
\chi_K(x) =
\begin{cases}
0 & \text{if } x \in K, \\
+\infty & \text{if } x \notin K.
\end{cases}
\]
Lemma 3. A set K is convex, if and only if its characteristic function χK is
convex.
Proof. Assume that K is convex, and let x, y ∈ R^n, and 0 < λ < 1. If either x ∉ K
or y ∉ K, then λχ_K(x) + (1 − λ)χ_K(y) = ∞, and therefore (1) (with f = χ_K) is
automatically satisfied. On the other hand, if x, y ∈ K, then also λx+(1−λ)y ∈ K,
and therefore
χK (λx + (1 − λ)y) = 0 = λχK (x) + (1 − λ)χK (y).
Thus χK is convex.
Now assume that χK is convex, and let x, y ∈ K, and 0 < λ < 1. The convexity
of χK implies that
χK (λx + (1 − λ)y) ≤ λχK (x) + (1 − λ)χK (y) = 0.
Since χK only takes the values 0 and +∞, this shows that, actually, χK (λx + (1 −
λ)y) = 0, and therefore λx + (1 − λ)y ∈ K. Thus K is convex.
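Lemma 3 can be illustrated with a short sketch in one dimension. The sets and the helper names `chi` and `violates` are assumptions chosen for this example: the interval [0, 1] is convex, while the union [0, 1] ∪ [2, 3] is not.

```python
import math

def chi(contains):
    """Return the characteristic function of the set described by `contains`."""
    return lambda x: 0.0 if contains(x) else math.inf

chi_interval = chi(lambda x: 0.0 <= x <= 1.0)                        # convex set
chi_two_pieces = chi(lambda x: 0.0 <= x <= 1.0 or 2.0 <= x <= 3.0)   # not convex

def violates(f, x, y, lam):
    """True if inequality (1) fails for the triple (x, y, lam)."""
    return f(lam * x + (1 - lam) * y) > lam * f(x) + (1 - lam) * f(y)

# Both endpoints lie in the set; the midpoint 0.5 does as well.
violation_convex = violates(chi_interval, 0.0, 1.0, 0.5)
# Both endpoints lie in the set, but the midpoint 1.5 does not.
violation_nonconvex = violates(chi_two_pieces, 1.0, 2.0, 0.5)
```

In the second case the left-hand side of (1) is +∞ while the right-hand side is 0, so the characteristic function detects the gap in the set.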
2. Continuity
A convex function need not be continuous—as can be easily seen from the ex-
ample of the characteristic function of a convex set, which is only ever continuous
in the trivial situations K = ∅ and K = Rn .
An even more worrying situation (at least from an optimization point of view)
occurs for instance in the situation of the function f : R → R ∪ {+∞} defined by
\[
f(x) =
\begin{cases}
+\infty, & \text{if } x < 0, \\
1, & \text{if } x = 0, \\
x^2, & \text{if } x > 0.
\end{cases}
\]
This function can easily be seen to be convex, and it is obviously not continuous at 0.
The problem is that it is not even lower semi-continuous: If you approach the value
0 from the right, the function values tend to 0, but f (0) = 1. Thus this is a convex
(and obviously coercive) function, which does not attain its minimum. Fortunately,
problems with continuity and lower semi-continuity only occur near points where
a convex function takes the value +∞, as the following result shows.
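The failure of lower semi-continuity in the example above can be made concrete with a few function evaluations. This is only a sketch; the sample points 10^(-k) are an arbitrary choice.

```python
import math

def f(x):
    """The example function: +inf for x < 0, 1 at x = 0, x**2 for x > 0."""
    if x < 0:
        return math.inf
    if x == 0:
        return 1.0
    return x * x

# Function values along a sequence approaching 0 from the right.
values = [f(10.0 ** (-k)) for k in range(1, 8)]
```

The values decrease towards 0 and stay strictly positive, while f(0) = 1, so the infimum 0 is approached but never attained.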
In the following we will always denote by
dom(f) := {x ∈ R^n : f(x) < ∞}
the domain of the function f . We will say that a function is proper, if its domain
is non-empty; that is, the function is not identically equal to +∞.
Proposition 5. Assume that f : Rn → R∪{+∞} is convex and that x is contained
in the interior of dom(f ). Then f is locally Lipschitz continuous at x.
In other words, discontinuities of convex functions can only appear at the bound-
ary of their domain.
¹ The word epigraph is composed of the Greek preposition epi, meaning above or over, and the
word graph. Thus it simply means “above the graph.”
BASICS OF CONVEX ANALYSIS 3
3. Differentiation
Assume now that f : Rn → R ∪ {+∞} is any function and that x ∈ dom(f ).
Given d ∈ Rn , we define the directional derivative of f at x in direction d as
\[
Df(x; d) := \lim_{\substack{t \to 0 \\ t > 0}} \frac{f(x + td) - f(x)}{t} \in \mathbb{R} \cup \{\pm\infty\},
\]
provided that the limit exists (in the extended real line R ∪ {±∞}).
For general functions f, the directional derivative need not necessarily exist (take
for instance the function x ↦ x sin(1/x), which can be shown to have no directional
derivatives at 0 in directions d ≠ 0). For convex functions, however, the situation
is different.
Lemma 6. Assume that f is convex, x ∈ dom(f ), and d ∈ Rn . Then Df (x; d)
exists. If x is an element of the interior of dom(f ), then Df (x; d) < ∞.
Proof. Let 0 < t1 < t2. Then
\[
x + t_1 d = \frac{t_1}{t_2}(x + t_2 d) + \Bigl(1 - \frac{t_1}{t_2}\Bigr) x,
\]
and thus the convexity of f implies that
\[
f(x + t_1 d) \le \Bigl(1 - \frac{t_1}{t_2}\Bigr) f(x) + \frac{t_1}{t_2} f(x + t_2 d),
\]
which can be rewritten as
\[
\frac{f(x + t_1 d) - f(x)}{t_1} \le \frac{f(x + t_2 d) - f(x)}{t_2}.
\]
Thus the function
\[
t \mapsto \frac{f(x + td) - f(x)}{t}
\]
is increasing. As a consequence, its limit as t → 0 from above—which is exactly
the directional derivative Df (x; d)—exists. In addition, if x is in the interior of
the domain of f , then f (x + td) is finite for t sufficiently small, and therefore
Df (x; d) < ∞.
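The monotonicity of the difference quotient used in the proof can be observed numerically. The sketch below assumes the test function f = exp at x = 0 in direction d = 1, for which Df(0; 1) = 1; both the function and the step sizes are illustrative choices.

```python
import math

def difference_quotient(f, x, d, t):
    """(f(x + t*d) - f(x)) / t for a scalar point x, direction d, and t > 0."""
    return (f(x + t * d) - f(x)) / t

f = math.exp          # convex test function (an assumption for this example)
x, d = 0.0, 1.0
ts = [1.0, 0.5, 0.1, 0.01, 0.001]    # decreasing step sizes
quotients = [difference_quotient(f, x, d, t) for t in ts]
```

As t decreases, the quotients decrease and remain bounded below by the directional derivative, in line with the inequality f(x + td) ≥ f(x) + tDf(x; d).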
Using the same argumentation as in the proof of the previous statement, one sees
that
f (x + td) ≥ f (x) + tDf (x; d)
for all t ≥ 0. Thus a sufficient (and also necessary) condition for x ∈ dom(f ) to be
a global minimizer of f is that
Df (x; d) ≥ 0 for all d ∈ Rn .
Next, we will introduce another notion of derivatives that is tailored to the needs
of convex functions.
Definition 7. Let f : Rn → R ∪ {+∞} and x ∈ dom(f ). An element ξ ∈ Rn with
the property that
f(y) ≥ f(x) + ξ^T (y − x) for all y ∈ R^n
is called a subgradient of f at x.
The set of all subgradients of f at x is called the subdifferential of f at x, and
is denoted by ∂f (x).
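To make Definition 7 concrete, the following sketch tests the subgradient inequality for f(x) = |x| at x = 0, where the subdifferential is known to be the interval [−1, 1]. The helper `is_subgradient` is a hypothetical name, and the check runs only over a finite sample of points y, so it can refute a candidate but not certify it.

```python
def is_subgradient(f, x, xi, test_points, tol=1e-12):
    """Check f(y) >= f(x) + xi * (y - x) on a finite sample of scalar points y."""
    return all(f(y) >= f(x) + xi * (y - x) - tol for y in test_points)

ys = [k / 10.0 for k in range(-50, 51)]        # sample of [-5, 5]
inside = is_subgradient(abs, 0.0, 0.7, ys)     # 0.7 lies in [-1, 1]
outside = is_subgradient(abs, 0.0, 1.3, ys)    # 1.3 does not
```

The slope 1.3 fails already at y = 1, since |1| = 1 < 1.3.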
An immediate consequence of the definition of the subdifferential is the following
result:
The subdifferential is connected to the usual derivative and the directional de-
rivative in the following way:
Proposition 9. Assume that f : Rn → R ∪ {+∞} is convex and that x ∈ dom(f ).
• If f is differentiable at x, then ∂f (x) = {∇f (x)}.
• If ∂f (x) contains a single element, e.g. ∂f (x) = {ξ}, then f is differentiable
at x and ξ = ∇f (x).
Proof. See [3, Thm. 25.1].
Now assume that f is convex and that x, y ∈ Rn , and that ξ ∈ ∂f (x) and
η ∈ ∂f (y). Then the definition of the subdifferential implies that
f(y) − f(x) − ξ^T (y − x) ≥ 0,
f(x) − f(y) − η^T (x − y) ≥ 0.
Summing up these inequalities, we see that
(η − ξ)^T (y − x) ≥ 0
whenever ξ ∈ ∂f(x) and η ∈ ∂f(y).
In case f is differentiable, this inequality becomes
(∇f(y) − ∇f(x))^T (y − x) ≥ 0 for all x, y ∈ R^n.
Now consider the case of a one-dimensional function. Then this further simplifies
to
(f′(y) − f′(x))(y − x) ≥ 0 for all x, y ∈ R,
which can be restated as
f′(y) ≥ f′(x) whenever y ≥ x.
In other words, the derivative of a one-dimensional convex function is monotonically increasing. Even more, the converse is also true:
Proposition 11. Assume that f : R → R is convex and differentiable. Then f′ is
monotonically increasing. Conversely, if g : R → R is monotonically increasing
and continuous, then there exists a convex function f such that f′ = g.
Proof. The necessity of the monotonicity of f′ has already been shown above. Conversely, if g is monotonically increasing, we can define
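\[
f(x) := \int_0^x g(y)\,dy.
\]
Then, we have for y > x and 0 < λ < 1 that
\[
\begin{aligned}
f(\lambda x + (1-\lambda) y) &= \lambda f(x) + (1-\lambda) f(y) \\
&\quad + \lambda \int_x^{\lambda x + (1-\lambda) y} g(z)\,dz - (1-\lambda) \int_{\lambda x + (1-\lambda) y}^{y} g(z)\,dz \\
&\le \lambda f(x) + (1-\lambda) f(y) + \lambda(1-\lambda)(y-x)\, g(\lambda x + (1-\lambda) y) \\
&\quad - (1-\lambda)\lambda (y-x)\, g(\lambda x + (1-\lambda) y) \\
&= \lambda f(x) + (1-\lambda) f(y),
\end{aligned}
\]
showing that f is convex. Here the inequality uses the monotonicity of g: the first integrand is bounded above by g(λx + (1 − λ)y) on an interval of length (1 − λ)(y − x), and the second is bounded below by the same value on an interval of length λ(y − x).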
In higher dimensions, the situation is slightly more complicated. Still, one can
show the following result:
Proposition 12. Assume that f : R^n → R is differentiable. Then f is convex, if
and only if ∇f is increasing in the sense that
(∇f(y) − ∇f(x))^T (y − x) ≥ 0 for all x, y ∈ R^n.
Proof. Apply Proposition 11 to the one-dimensional functions t ↦ f(x + t(y − x)).
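The monotonicity condition of Proposition 12 can be tested on a grid. The sketch below uses two illustrative functions on R^2 that are assumptions of this example: the convex quadratic x1^2 + x1*x2 + x2^2 and the saddle x1^2 − x2^2. A grid check can only detect violations; it does not prove monotonicity.

```python
import itertools

def grad_convex(x):
    """Gradient of x1^2 + x1*x2 + x2^2 (convex)."""
    return (2 * x[0] + x[1], x[0] + 2 * x[1])

def grad_saddle(x):
    """Gradient of x1^2 - x2^2 (not convex)."""
    return (2 * x[0], -2 * x[1])

def monotone_on(grad, points, tol=1e-12):
    """Check (grad(y) - grad(x))^T (y - x) >= 0 for all pairs of sample points."""
    for x, y in itertools.product(points, repeat=2):
        gx, gy = grad(x), grad(y)
        inner = sum((b - a) * (v - u) for a, b, u, v in zip(gx, gy, x, y))
        if inner < -tol:
            return False
    return True

grid = [(i * 0.5, j * 0.5) for i in range(-4, 5) for j in range(-4, 5)]
convex_is_monotone = monotone_on(grad_convex, grid)
saddle_is_monotone = monotone_on(grad_saddle, grid)
```

For the saddle, the pair x = (0, 0), y = (0, 0.5) already gives a negative inner product, so the gradient is not monotone.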
Note the difference between the two different statements: In the first proposition
(in one dimension), the monotonicity of g implies that it is the derivative of some
convex function. In the second proposition (in higher dimensions), we already
know that the object we are dealing with (∇f ) is the derivative of some function;
monotonicity of the gradient implies that f is convex, but we do not have to care
about the existence of f .
Finally, it is important to note that there is a close connection between properties
of the Hessian of a function and its convexity:
Proposition 13. Assume that f : R^n → R ∪ {+∞} is twice differentiable on its
domain, and that dom(f) is convex. Then f is convex, if and only if ∇²f(x) is
positive semi-definite for all x ∈ dom(f).
In addition, if ∇²f(x) is positive definite for all but a countable number of points
x ∈ dom(f), then f is strictly convex.
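The Hessian criterion can be sketched for functions on R^2, where a symmetric 2x2 matrix is positive semi-definite exactly when its trace and determinant are both nonnegative. The two test functions and all helper names below are assumptions of this example, and the check covers only a finite grid.

```python
def is_psd_2x2(h, tol=1e-12):
    """A symmetric 2x2 matrix is PSD iff its trace and determinant are nonnegative."""
    (a, b), (c, d) = h
    return a + d >= -tol and a * d - b * c >= -tol

def hessian_convex(x):
    """Hessian of x1^4 + x2^2."""
    return ((12 * x[0] ** 2, 0.0), (0.0, 2.0))

def hessian_mixed(x):
    """Hessian of x1^4 + x1*x2 + x2^2."""
    return ((12 * x[0] ** 2, 1.0), (1.0, 2.0))

grid = [(i * 0.25, j * 0.25) for i in range(-8, 9) for j in range(-8, 9)]
psd_everywhere = all(is_psd_2x2(hessian_convex(x)) for x in grid)
psd_mixed = all(is_psd_2x2(hessian_mixed(x)) for x in grid)
```

The second Hessian has determinant −1 along the line x1 = 0, so x1^4 + x1*x2 + x2^2 is not convex even though each of its pure terms is.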
Then g is convex.
In addition, if x ∈ Rn and g(x) = fj (x) for some j ∈ I, then
∂fj (x) ⊂ ∂g(x).
Proof. If x, y ∈ Rn and 0 < λ < 1, then
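\[
\begin{aligned}
g(\lambda x + (1-\lambda) y) &= \sup_{i \in I} f_i(\lambda x + (1-\lambda) y) \\
&\le \sup_{i \in I} \bigl(\lambda f_i(x) + (1-\lambda) f_i(y)\bigr) \\
&\le \lambda \sup_{i \in I} f_i(x) + (1-\lambda) \sup_{i \in I} f_i(y) \\
&= \lambda g(x) + (1-\lambda) g(y).
\end{aligned}
\]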
Thus g is convex.
In addition, if g(x) = f_j(x) and ξ ∈ ∂f_j(x), then
g(y) ≥ f_j(y) ≥ f_j(x) + ξ^T (y − x) = g(x) + ξ^T (y − x),
showing that ξ ∈ ∂g(x).
There are two important things to note here: First, the result states that the
supremum of any number of convex functions is convex; there is no need to restrict
oneself to only finitely many functions. Second, the minimum even of two convex
functions is usually not convex any more.
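Both remarks can be checked on small one-dimensional examples. In the sketch below, the three affine functions and the two absolute-value functions are arbitrary illustrative choices, and only midpoint convexity on a finite sample is tested.

```python
def g(x):
    """Pointwise maximum of three affine (hence convex) functions."""
    return max(-x - 1.0, 0.5 * x, 2.0 * x - 3.0)

def m(x):
    """Pointwise minimum of the convex functions |x - 1| and |x + 1|."""
    return min(abs(x - 1.0), abs(x + 1.0))

def midpoint_ok(f, x, y, tol=1e-12):
    """Check inequality (1) with lambda = 1/2."""
    return f(0.5 * (x + y)) <= 0.5 * f(x) + 0.5 * f(y) + tol

pts = [k / 4.0 for k in range(-20, 21)]          # sample of [-5, 5]
g_ok = all(midpoint_ok(g, x, y) for x in pts for y in pts)
m_fails = not midpoint_ok(m, -1.0, 1.0)          # m(0) = 1 but m(-1) = m(1) = 0

# At x = 4 the active function is 2x - 3, so its slope 2 is a subgradient of g there.
slope_is_subgradient = all(g(y) >= g(4.0) + 2.0 * (y - 4.0) - 1e-12 for y in pts)
```

The last check mirrors the inclusion ∂f_j(x) ⊂ ∂g(x) for the index j at which the supremum is attained.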
Next we discuss sums of convex functions:
Proposition 15. Assume that f, g : R^n → R ∪ {+∞} are convex. Then also the
function f + g is convex and
∂f(x) + ∂g(x) ⊂ ∂(f + g)(x) for all x ∈ R^n.
If in addition f and g are lower semi-continuous and there exists some y ∈
dom(f ) ∩ dom(g) such that either one of the functions f and g is continuous at y,
then
∂(f + g)(x) = ∂f (x) + ∂g(x) for all x ∈ Rn .
Proof. The first part of the proposition is straightforward:
(f + g)(λx + (1 − λ)y) = f (λx + (1 − λ)y) + g(λx + (1 − λ)y)
≤ λf (x) + (1 − λ)f (y) + λg(x) + (1 − λ)g(y)
= λ(f + g)(x) + (1 − λ)(f + g)(y).
Also, if ξ ∈ ∂f(x) and η ∈ ∂g(x), then
(f + g)(y) ≥ f(x) + ξ^T (y − x) + g(x) + η^T (y − x) = (f + g)(x) + (ξ + η)^T (y − x),
showing that ∂f(x) + ∂g(x) ⊂ ∂(f + g)(x).
The second part is tricky. A proof can for instance be found in [2, Chap. I,
Prop. 5.6].
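A one-dimensional sketch of the sum rule, under the assumption f(x) = |x| and g(x) = x^2/2: at x = 0 one has ∂f(0) = [−1, 1] and ∂g(0) = {0}, and g is continuous everywhere, so the subdifferential of the sum at 0 should again be [−1, 1]. The helper `is_subgradient` is a hypothetical name and checks only a finite sample.

```python
def is_subgradient(f, x, xi, test_points, tol=1e-12):
    """Check f(y) >= f(x) + xi * (y - x) on a finite sample of scalar points y."""
    return all(f(y) >= f(x) + xi * (y - x) - tol for y in test_points)

f = abs                              # subdifferential at 0 is [-1, 1]
g = lambda x: 0.5 * x * x            # differentiable, derivative 0 at x = 0
s = lambda x: f(x) + g(x)            # the sum f + g

ys = [k / 100.0 for k in range(-300, 301)]     # sample of [-3, 3]
xi, eta = 0.8, 0.0                             # xi in the subdifferential of f, eta of g
sum_is_subgradient = is_subgradient(s, 0.0, xi + eta, ys)
too_steep = is_subgradient(s, 0.0, 1.2, ys)    # 1.2 is not in [-1, 1] + {0}
```

The slope 1.2 is rejected by sample points close to 0 (for instance y = 0.1), which is consistent with equality of the two sets in this example.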
Note that the second part of the previous result is just a kind of chain rule for
the composition of convex and linear functions. In the particular case where f is
differentiable, the result reads as
∇(f ◦ A)(x) = A^T ∇f(Ax),
which is exactly the usual chain rule. Differentiability, however, is not required in
the convex setting.
5. Duality
Assume that f : R^n → R ∪ {+∞} is convex and lower semi-continuous and
consider any linear function x ↦ ξ^T x (for some ξ ∈ R^n). Then it is easy to
convince oneself that there exists some β ∈ R such that
f(x) ≥ ξ^T x − β for all x ∈ R^n
(this follows basically from the definition and the local Lipschitz continuity of convex functions). In other words,
β ≥ ξ^T x − f(x) for all x ∈ R^n.
Now consider the smallest such number β*, that is,
β* := inf{β ∈ R : β ≥ ξ^T x − f(x) for all x ∈ R^n}.
It is easy to see that β* > −∞ unless the function f is everywhere equal to +∞.
Thus one can equivalently write
\[
\beta^* = \sup_{x \in \mathbb{R}^n} \bigl(\xi^T x - f(x)\bigr).
\]
The mapping that maps ξ to β* plays a central role in convex analysis.
Definition 18. Let f : R^n → R ∪ {+∞} be proper. Then its conjugate f* : R^n →
R ∪ {+∞} is defined as
\[
f^*(\xi) := \sup_{x \in \mathbb{R}^n} \bigl(\xi^T x - f(x)\bigr).
\]
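The supremum in Definition 18 can be approximated in one dimension by a maximum over a finite grid. The sketch below is only an approximation under stated assumptions: the helper name `conjugate_on_grid`, the grid [−10, 10], and the test function f(x) = x^2/2 (whose conjugate is ξ^2/2) are all choices made for this example. Restricting to a grid can only underestimate the true supremum.

```python
def conjugate_on_grid(f, xi, grid):
    """Approximate f*(xi) = sup_x (xi * x - f(x)) by a maximum over a finite grid."""
    return max(xi * x - f(x) for x in grid)

grid = [k / 1000.0 for k in range(-10000, 10001)]   # [-10, 10] with step 0.001

def f(x):
    """Test function x^2 / 2 (an assumption for this example)."""
    return 0.5 * x * x

approx = conjugate_on_grid(f, 1.5, grid)   # maximizer x = 1.5 lies on the grid
exact = 0.5 * 1.5 ** 2
```

For values of ξ whose maximizer falls outside the grid, the approximation would be too small, so the grid has to be chosen with the expected maximizer in mind.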
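\[
\nabla f(x) = \begin{pmatrix} x_1 |x_1|^{p-2} \\ \vdots \\ x_n |x_n|^{p-2} \end{pmatrix}.
\]
Thus the maximum x* satisfies
\[
x_i^* |x_i^*|^{p-2} = \xi_i
\]
for all i. Solving these equations for x_i*, we obtain
\[
x_i^* = \xi_i |\xi_i|^{\frac{1}{p-1} - 1}.
\]
Thus
\[
f^*(\xi) = h_\xi(x^*) = \sum_{i=1}^n |\xi_i|^{\frac{1}{p-1} + 1} - \frac{1}{p} \sum_{i=1}^n |\xi_i|^{\frac{p}{p-1}}.
\]
Defining
\[
p^* := \frac{p}{p-1},
\]
this can be written as
\[
f^*(\xi) = \frac{1}{p^*} \sum_{i=1}^n |\xi_i|^{p^*} = \frac{1}{p^*} \|\xi\|_{p^*}^{p^*}.
\]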
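The closed form can be sanity-checked in one dimension, where f(x) = |x|^p / p. The sketch below assumes p > 1 and uses the illustrative values p = 3 and ξ = 2; the function names are hypothetical.

```python
def f(x, p):
    """f(x) = |x|^p / p in one dimension."""
    return abs(x) ** p / p

def maximizer(xi, p):
    """x* = xi * |xi|^(1/(p-1) - 1), with x* = 0 for xi = 0."""
    return 0.0 if xi == 0 else xi * abs(xi) ** (1.0 / (p - 1.0) - 1.0)

def conjugate_closed_form(xi, p):
    """f*(xi) = |xi|^(p*) / p* with p* = p / (p - 1)."""
    p_star = p / (p - 1.0)
    return abs(xi) ** p_star / p_star

def h(x, xi, p):
    """The function x -> xi * x - f(x) whose supremum defines f*(xi)."""
    return xi * x - f(x, p)

p, xi = 3.0, 2.0
x_star = maximizer(xi, p)
value_at_maximizer = h(x_star, xi, p)
closed_form = conjugate_closed_form(xi, p)
```

Evaluating h slightly to the left and right of x* gives smaller values, as expected at a maximizer of a concave function.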
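On the other hand, if |ξ_j| ≤ 1 for all j (or in other words ‖ξ‖_∞ ≤ 1), then
\[
f^*(\xi) = \sup_{x \in \mathbb{R}^n} \Bigl(\xi^T x - \sum_{i=1}^n |x_i|\Bigr) = \sup_{x \in \mathbb{R}^n} \sum_{i=1}^n \bigl(\xi_i x_i - |x_i|\bigr) = 0,
\]
since all the entries in the sum are non-positive. Thus we see that
\[
f^*(\xi) =
\begin{cases}
0, & \text{if } \|\xi\|_\infty \le 1, \\
+\infty, & \text{if } \|\xi\|_\infty > 1.
\end{cases}
\]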
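The dichotomy above can be observed numerically for the 1-norm on R^2 by maximizing over boxes of growing radius. This is a sketch with hypothetical helper names; the grid maximum only approximates the supremum, but it shows the qualitative behaviour.

```python
def h(xi, x):
    """xi^T x minus the 1-norm of x."""
    return sum(a * b for a, b in zip(xi, x)) - sum(abs(b) for b in x)

def sup_on_box(xi, radius, steps=20):
    """Maximum of h(xi, .) over a uniform grid on [-radius, radius]^2."""
    ticks = [radius * (2 * k / steps - 1) for k in range(steps + 1)]
    return max(h(xi, (a, b)) for a in ticks for b in ticks)

bounded = sup_on_box((0.5, -1.0), 10.0)         # max-norm of xi is 1
unbounded_small = sup_on_box((1.5, 0.0), 10.0)  # max-norm of xi is 1.5
unbounded_large = sup_on_box((1.5, 0.0), 100.0)
```

For ‖ξ‖_∞ ≤ 1 the maximum stays at 0 regardless of the radius, while for ‖ξ‖_∞ > 1 it grows linearly with the radius, reflecting the value +∞ of the conjugate.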
A trivial consequence of the definition of the conjugate of a function is the
so-called Fenchel inequality:
for every x ∈ R^n (for the last inequality, we simply choose y = x). Thus f** is a
convex and lower semicontinuous function that is smaller than or equal to f.
Now assume that there exists some x ∈ R^n such that f**(x) < f(x). One can
show² that, because of the convexity and lower semicontinuity of f, in this case
there exist δ > 0 and ξ ∈ R^n such that
f(y) ≥ f**(x) + ξ^T (y − x) + δ for all y ∈ R^n.
In other words, the function f and the point f**(x) can be separated strictly by
the (shifted) hyperplane defined by ξ. Therefore
Remark 25. Note that we have actually shown in the proof that for all proper
functions f (not necessarily convex or lower semicontinuous), the function f** is
the largest convex and lower semicontinuous function below f. That is, the function
f** is the convex and lower semicontinuous hull of f.
² Basically, the proof of this statement relies on so-called separation theorems, which state
that it is possible to separate a closed convex set from a point outside said set by means of a
hyperplane. Applying this result to the epigraph of f yields the claimed result. A large collection
of such separation theorems can be found in [3, Section 11].
The following list taken from [1, p. 50] gives a short overview of some important
functions and their conjugates. Note that all these functions are lower semi-continuous
and convex, and thus we have f** = f; that is, the functions on the left-hand
side are also the conjugates of the functions on the right-hand side:
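\[
\begin{array}{lcl}
f(x) = 0 & \Longleftrightarrow & f^*(\xi) = \begin{cases} 0, & \text{if } \xi = 0, \\ +\infty & \text{else,} \end{cases} \\[2ex]
f(x) = \begin{cases} 0 & \text{if } x \ge 0, \\ +\infty & \text{else,} \end{cases} & \Longleftrightarrow & f^*(\xi) = \begin{cases} 0 & \text{if } \xi \le 0, \\ +\infty & \text{else,} \end{cases} \\[2ex]
f(x) = \begin{cases} 0 & \text{if } |x| \le 1, \\ +\infty & \text{else,} \end{cases} & \Longleftrightarrow & f^*(\xi) = |\xi|, \\[2ex]
f(x) = |x|^p / p & \Longleftrightarrow & f^*(\xi) = |\xi|^{p^*} / p^*, \\[2ex]
f(x) = \sqrt{1 + x^2} & \Longleftrightarrow & f^*(\xi) = \begin{cases} -\sqrt{1 - \xi^2}, & \text{if } |\xi| \le 1, \\ +\infty, & \text{else,} \end{cases} \\[2ex]
f(x) = e^x & \Longleftrightarrow & f^*(\xi) = \begin{cases} \xi \log \xi - \xi, & \text{if } \xi \ge 0, \\ +\infty & \text{else.} \end{cases}
\end{array}
\]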
In addition, the next list provides a few basic rules for the computation of conjugates:
\[
\begin{array}{lll}
f(x) = h(tx), & f^*(\xi) = h^*(\xi/t), & t \ne 0, \\
f(x) = h(x + y), & f^*(\xi) = h^*(\xi) - \xi^T y, & y \in \mathbb{R}^n, \\
f(x) = \lambda h(x), & f^*(\xi) = \lambda h^*(\xi/\lambda), & \lambda > 0.
\end{array}
\]
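The three rules can be checked numerically in one dimension against the table entry for h(x) = e^x. The sketch below is an approximation under stated assumptions: the helper names, the grid [−5, 5], and the parameter values t = 2, y = 1, λ = 3 are illustrative choices, and the grid maximum stands in for the supremum.

```python
import math

def conjugate_on_grid(f, xi, grid):
    """Grid approximation of f*(xi) = sup_x (xi * x - f(x)) for scalar x."""
    return max(xi * x - f(x) for x in grid)

def exp_conjugate(xi):
    """Table entry for h = exp: xi*log(xi) - xi for xi > 0, 0 at xi = 0, +inf otherwise."""
    if xi < 0:
        return math.inf
    return 0.0 if xi == 0 else xi * math.log(xi) - xi

grid = [k / 1000.0 for k in range(-5000, 5001)]     # [-5, 5] with step 0.001

# Rule 1: f(x) = h(t x) gives f*(xi) = h*(xi / t); here t = 2 and xi = 2.
scaled_arg = conjugate_on_grid(lambda x: math.exp(2.0 * x), 2.0, grid)
# Rule 2: f(x) = h(x + y) gives f*(xi) = h*(xi) - xi * y; here y = 1 and xi = 1.
shifted = conjugate_on_grid(lambda x: math.exp(x + 1.0), 1.0, grid)
# Rule 3: f(x) = lam * h(x) gives f*(xi) = lam * h*(xi / lam); here lam = 3 and xi = 3.
scaled_val = conjugate_on_grid(lambda x: 3.0 * math.exp(x), 3.0, grid)
```

The parameters are chosen so that each maximizer (x = 0, x = −1, and x = 0) lies exactly on the grid, which keeps the comparison with the closed forms free of discretization error.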
References
[1] Jonathan M. Borwein and Adrian S. Lewis. Convex Analysis and Nonlinear Optimization: Theory and Examples. CMS Books in Mathematics/Ouvrages de Mathématiques de la SMC, 3. Springer-Verlag, New York, 2000.
[2] I. Ekeland and R. Temam. Convex Analysis and Variational Problems. North-Holland, Amsterdam, 1976.
[3] R. T. Rockafellar. Convex Analysis, volume 28 of Princeton Mathematical Series. Princeton University Press, Princeton, 1970.