Mathematics Essentials for Convex Optimization
x = [x1 ; . . . ; xn ]
(pay attention to the semicolon ";"). For example, the column vector with entries 1, 2, 3 is written as [1; 2; 3].
More generally,
— if A1 , . . . , Am are matrices with the same number of columns, we write
[A1 ; . . . ; Am ] to denote the matrix obtained by writing A2 beneath A1 , A3
beneath A2 , and so on.
— if A1 , . . . , Am are matrices with the same number of rows, then [A1 , . . . , Am ]
stands for the matrix obtained by writing A2 to the right of A1 , A3 to the right
of A2 , and so on.
Examples:
• A1 = [1, 2, 3; 4, 5, 6], A2 = [7, 8, 9] =⇒ [A1 ; A2 ] = [1, 2, 3; 4, 5, 6; 7, 8, 9]
• A1 = [1, 2; 3, 4], A2 = [7; 8] =⇒ [A1 , A2 ] = [1, 2, 7; 3, 4, 8]
• [1, 2, 3, 4] = [1; 2; 3; 4]⊤
• [[1, 2; 3, 4], [5, 6; 7, 8]] = [1, 2, 5, 6; 3, 4, 7, 8]
• We follow the standard convention that the sum of vectors over an empty set of indexes, i.e., Σ_{i=1}^{0} xi with xi ∈ Rn , has a value – it is the origin in Rn .
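For readers who like to experiment, here is a minimal numpy sketch of ours mirroring this bracket notation; it is only an illustration, not part of the book's conventions.

import numpy as np

# "[A1; A2]": stack matrices with the same number of columns vertically.
A1 = np.array([[1, 2, 3],
               [4, 5, 6]])
A2 = np.array([[7, 8, 9]])
print(np.vstack([A1, A2]))     # the 3 x 3 matrix [1, 2, 3; 4, 5, 6; 7, 8, 9]

# "[B1, B2]": stack matrices with the same number of rows horizontally.
B1 = np.array([[1, 2],
               [3, 4]])
B2 = np.array([[7],
               [8]])
print(np.hstack([B1, B2]))     # the 2 x 3 matrix [1, 2, 7; 3, 4, 8]

# "[1; 2; 3]" is a column vector, i.e. the transpose of the row [1, 2, 3].
print(np.array([[1, 2, 3]]).T)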
The definition of a segment [x; y] is in full accordance with our “real-life experience” in 2D or 3D: when λ ∈ [0, 1], the point x(λ) = λx + (1 − λ)y = x + (1 − λ)(y − x) is the point where you arrive when traveling from x directly towards y after you have covered the fraction (1 − λ) of the entire distance from x to y, and these points compose the “real world segment” with endpoints x = x(1) and y = x(0).
Note that the empty set is convex in the exact sense of the definition: for the empty set, one cannot present a counterexample showing that it is not convex.
A closed ray given by a direction 0 ̸= d ∈ Rn is also convex:
R+ (d) := {t d ∈ Rn : t ≥ 0} .
Note also that the open ray given by {t d ∈ Rn : t > 0} is convex as well.
Proposition I.1.2 The solution set of an arbitrary (finite or infinite) system of nonstrict linear inequalities, i.e., the set
S := {x ∈ Rn : aα⊤ x ≤ bα , α ∈ A} ,
where A is an index set, aα ∈ Rn , and bα ∈ R for all α ∈ A, is convex.
Proof. Consider any x′ , x′′ ∈ S and any λ ∈ [0, 1]. As x′ , x′′ ∈ S, we have aα⊤ x′ ≤ bα and aα⊤ x′′ ≤ bα for any α ∈ A. Then, for every α ∈ A, multiplying the inequality aα⊤ x′ ≤ bα by λ, and the inequality aα⊤ x′′ ≤ bα by 1 − λ, respectively, and summing up the resulting inequalities, we get aα⊤ [λx′ + (1 − λ)x′′ ] ≤ bα . Thus, we deduce that λx′ + (1 − λ)x′′ ∈ S.
Note that this verification of convexity of S works also in the case when, in the definition of S, some of the nonstrict inequalities aα⊤ x ≤ bα are replaced with their strict versions aα⊤ x < bα .
Recall that linear and affine subspaces can be represented as the solution sets
of systems of linear equations (Proposition A.47). Consequently, from Proposi-
tion I.1.2 we deduce that such sets are convex.
Example I.1.1 All linear subspaces and all affine subspaces of Rn are convex.
♢
Another important special case of Proposition I.1.2 is the one when we have
a finite system of nonstrict linear inequalities. Such sets have a special name as
they are frequently encountered and studied.
Remark I.1.5 Replacing some of the nonstrict linear inequalities aα⊤ x ≤ bα in system (1.1) with their strict versions aα⊤ x < bα preserves, as we have already mentioned, convexity of the solution set, but can destroy its closedness. ■
These indeed are norms (which is not clear in advance; for a proof, see page 156, and for more details, page 215). When p = 2, we get the usual Euclidean norm. When p = 1, we get
∥x∥1 = Σ_{i=1}^n |xi |,
and when p = ∞, we get
∥x∥∞ = max_{1≤i≤n} |xi |.
For the ∞-norm, for instance, the triangle inequality is immediate:
∥x + y∥∞ = max_i |xi + yi | ≤ max_i {|xi | + |yi |} ≤ max_{i,j} {|xi | + |yj |} = ∥x∥∞ + ∥y∥∞ .
Fact I.1.8 Unit balls of norms on Rn are exactly the same as convex sets
V in Rn satisfying the following three properties:
(i) V is symmetric with respect to the origin: x ∈ V =⇒ −x ∈ V ;
(ii) V is bounded and closed;
(iii) V contains a neighborhood of the origin, i.e., there exists r > 0 such that the Euclidean ball of radius r centered at the origin – the set {x ∈ Rn : ∥x∥2 ≤ r} – is contained in V .
Any set V satisfying the outlined properties is indeed the unit ball of a
particular norm given by
∥x∥V := inf {t : t−1 x ∈ V, t > 0} .   (1.2)
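As a side illustration of (1.2) (ours, not from the text), the gauge of a set V satisfying (i)–(iii) can be approximated by bisection, given only a membership oracle for V; the tolerance and the example set are arbitrary choices.

import numpy as np

def gauge(x, in_V, t_max=1e6, tol=1e-9):
    """Approximate ||x||_V = inf{t > 0 : x/t in V} by bisection, assuming V
    is convex, closed, bounded, symmetric, and contains a ball around 0."""
    x = np.asarray(x, dtype=float)
    if np.allclose(x, 0):
        return 0.0
    lo, hi = 0.0, t_max              # for t = t_max we assume x/t lies in V
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if in_V(x / mid):
            hi = mid                 # x/mid in V, hence ||x||_V <= mid
        else:
            lo = mid
    return hi

# Example: V = unit ball of the l1-norm, so the gauge should return ||x||_1.
in_l1_ball = lambda z: np.abs(z).sum() <= 1.0
print(gauge([0.3, -0.4], in_l1_ball))     # approximately 0.7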
1.1.3 Ellipsoids
Example I.1.3 [Ellipsoid] Let Q be an n × n symmetric positive definite matrix, c ∈ Rn , and r > 0. Then the ellipsoid
{x ∈ Rn : (x − c)⊤ Q(x − c) ≤ r2 }
is convex. ♢
Justification of Example I.1.3 is left as an exercise at the end of this Part (see
Exercise I.6).
is also convex.
By Linear Algebra, the linear span of a set M – the smallest (w.r.t. inclusion) linear subspace containing M – can be described in terms of linear combinations: it is the set of all linear combinations of points from M . An analogous result holds for the affine span of a (nonempty) set and affine combinations of points from the set. We have a similar description of convex hulls via convex combinations:
We will see in section 9.3 that when M is a finite set in Rn , Conv(M ) is a bounded
polyhedral set. Bounded polyhedral sets are also called polytopes.
We next continue with a number of important families of convex sets.
1.2.3 Simplex
1.2.4 Cones
We next examine a very important class of convex sets.
A nonempty set K ⊆ Rn is called conic if it contains, along with every point
x ∈ K, the entire ray R+ (x) = {tx : t ≥ 0} spanned by the point:
x∈K =⇒ tx ∈ K, ∀t ≥ 0.
Note that based on our definition, any conic set is nonempty and it always con-
tains the origin.
Example I.1.4 The solution set of an arbitrary (finite or infinite) system of homogeneous linear inequalities aα⊤ x ≥ 0, α ∈ A, in the variable x ∈ Rn is a cone.
In particular, the solution set of a finite system composed of m homogeneous linear inequalities
Ax ≥ 0
(A is an m × n matrix) is a cone. A cone of this latter type is called polyhedral 1 . Specifically, the nonnegative orthant Rm+ := {x ∈ Rm : x ≥ 0} is a polyhedral cone. ♢
Note that the cones given by systems of linear homogeneous nonstrict inequal-
ities are obviously closed. From the Separation Theorem (see Theorem II.7.3) we will
deduce the reverse as well, i.e., every closed cone is the solution set to such a
system. Thus, Example I.1.4 is the generic example of a closed convex cone.
We already know that a norm ∥·∥ on Rn gives rise to specific convex sets in Rn ,
namely, balls of this norm. In fact, a norm also gives rise to another important
convex set.
Proposition I.1.18 For any norm ∥ · ∥ on Rn , its epigraph, i.e., the set
K := {[x; t] ∈ Rn+1 : t ≥ ∥x∥}
is a closed cone. When ∥ · ∥ is the Euclidean norm ∥ · ∥2 , this is the second-order (or Lorentz, or ice cream) cone (see Figure I.3), and it plays a significant role in convex optimization.
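As a quick illustration of ours (not from the text), for the Euclidean norm the membership test for this cone is a one-liner, and closedness of K under nonnegative scaling and addition can be sanity-checked numerically:

import numpy as np

def in_norm_cone(z):
    """Membership in K = {[x; t] : t >= ||x||_2}, the second-order cone."""
    z = np.asarray(z, dtype=float)
    return z[-1] >= np.linalg.norm(z[:-1])

u = np.array([3.0, 4.0, 5.0])   # ||(3, 4)||_2 = 5, so u lies on the boundary of K
v = np.array([0.0, 1.0, 2.0])
# K is a cone and convex, so nonnegative multiples and sums stay in K:
print(in_norm_cone(u), in_norm_cone(2.5 * u), in_norm_cone(u + v))   # True True True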
Definition I.1.19 [Conic hull] For any K ⊆ Rn , the conic hull of K [nota-
tion: Cone(K)] is the intersection of all cones containing K. Thus, Cone(K)
is the smallest (w.r.t. inclusion) cone containing K.
• We can describe the conic hull of a set K ⊆ Rn in terms of its conic combina-
tions:
Fact I.1.20 [Conic hull via conic combinations] The conic hull Cone(K) of
a set K ⊆ Rn is the set of all conic combinations (i.e., linear combinations
with nonnegative coefficients) of vectors from K:
Cone(K) = {x ∈ Rn : ∃N ≥ 0, λi ≥ 0, xi ∈ K, i ≤ N : x = Σ_{i=1}^N λi xi } .
Note that here we use the standard convention: the sum of vectors over an empty set of indexes, like Σ_{i=1}^{0} z i , has a value – it is the origin of the space where the vectors live. In particular, the set of conic combinations of vectors from the empty set is {0}, in full accordance with Definition I.1.19.
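To make Fact I.1.20 concrete for a finite set K (our own illustration, not from the text), membership in Cone(K) can be tested by a small linear feasibility problem; scipy's linprog is just one possible solver.

import numpy as np
from scipy.optimize import linprog

def in_conic_hull(x, K):
    """Check whether x is a conic combination of the columns of K, i.e.
    whether there exists lam >= 0 with K @ lam = x (a feasibility LP)."""
    n, N = K.shape
    res = linprog(c=np.zeros(N), A_eq=K, b_eq=np.asarray(x, float),
                  bounds=[(0, None)] * N, method="highs")
    return res.success

K = np.array([[1.0, 0.0],
              [0.0, 1.0]])                        # Cone(K) = nonnegative orthant in R^2
print(in_conic_hull(np.array([2.0, 3.0]), K))     # True
print(in_conic_hull(np.array([-1.0, 0.5]), K))    # False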
3. Taking linear combinations: if Mi ⊆ Rn are convex sets and λi are reals, i = 1, . . . , k, then the set
λ1 M1 + . . . + λk Mk := {λ1 x1 + . . . + λk xk : xi ∈ Mi , i = 1, . . . , k}
is convex.
Warning: “Linear combination λ1 M1 + . . . + λk Mk of sets” as defined above is just a notation. When operating with these “linear combinations of sets,” one should be careful. For example, while it is true that M1 + M2 = M2 + M1 , that M1 + (M2 + M3 ) = (M1 + M2 ) + M3 , and even that λ(M1 + M2 ) = λM1 + λM2 , it is, in general, not true that (λ1 + λ2 )M = λ1 M + λ2 M (e.g., for M = {0, 1} ⊂ R we have 2M = {0, 2}, while M + M = {0, 1, 2}).
4. Taking image under an affine mapping: if M ⊆ Rn is a convex set and x 7→
A(x) ≡ Ax + b is an affine mapping from Rn into Rm (where A ∈ Rm×n
and b ∈ Rm ), then the image of M under the mapping A(·), i.e., the set
A(M ) := {A(x) : x ∈ M } ,
is convex.
5. Taking inverse image under affine mapping: if M ⊆ Rn is a convex set
and y 7→ A(y) = Ay + b is an affine mapping from Rm to Rn (where
A ∈ Rn×m and b ∈ Rn ), then the inverse image of M under the mapping
A(·), i.e., the set
A−1 (M ) := {y ∈ Rm : A(y) ∈ M } ,
is convex.
The (completely straightforward) verification of this proposition is left to the
reader.
of sets, M1 and M2 . Assuming both sets are nonempty and closed and M1
is bounded, we should prove that if a sequence {xi + y i }i with xi ∈ M1 and
y i ∈ M2 converges as i → ∞, the limit lim_{i→∞} (xi + y i ) belongs to M1 + M2 .
Since M1 , and thus the sequence {xi }i , is bounded, passing to a subsequence
we may assume that the sequence {xi }i converges, as i → ∞, to some x. Since
the sequence {xi + y i }i converges as well, the sequence {y i }i also converges to
some y. As M1 and M2 are closed, we have x ∈ M1 , y ∈ M2 , and therefore
lim_{i→∞} (xi + y i ) = x + y ∈ M1 + M2 , as required.
4. Multiplication by a real: For a nonempty closed convex set M and a real λ,
the set λM is closed and convex (why?).
5. Image under an affine mapping of a closed convex set M is convex, but not
necessarily closed; it is definitely closed when M is bounded.
As an example of closed convex set with a non-closed affine image consider
the set {[x; y] ∈ R2 : x, y ≥ 0, xy ≥ 1} (i.e., a branch of hyperbola) and its
projection onto the x-axis. This set is convex and closed, but its projection
onto the x-axis is the positive ray {x > 0} which is not closed. Closedness of
the affine image of a closed and bounded set is the special case of the general
fact:
the image of a closed and bounded set under a mapping that is continuous
on this set is closed and bounded as well (why?).
6. Inverse image under affine mapping: if M ⊆ Rn is convex and closed and
y 7→ A(y) = Ay + b is an affine mapping from Rm to Rn , then the set
A−1 (M ) := {y ∈ Rm : A(y) ∈ M }
is a closed convex set in Rm . Indeed, the convexity of A−1 (M ) is given by the
calculus of convexity, and its closedness is due to the following standard fact:
the inverse image of a closed set in Rn under continuous mapping from
Rm to Rn is closed (why?).
We see that the “calculus of closed convex sets” is somewhat weaker than the calculus of convexity per se. Nevertheless, we will see that these difficulties disappear when the operands of our operations are restricted to be polyhedral, and not just closed and convex.
Since the intersection of an arbitrary family of closed sets is closed, for every subset M of Rn there exists the smallest (w.r.t. inclusion) closed set containing M (namely, the intersection of all closed sets containing M ). This leads us to the following definition.
From Real Analysis, we have the following inner description of the closure of
a set in a metric space (and, in particular, in Rn ).
Fact I.1.23 The closure of a set M ⊆ Rn is exactly the set composed of the
limits of all converging sequences of elements from M .
Example I.1.5 Based on Fact I.1.23, it is easy to prove that, e.g., the closure
of the open Euclidean ball
{x ∈ Rn : ∥x − a∥2 < r} [where r > 0]
is the closed Euclidean ball {x ∈ Rn : ∥x − a∥2 ≤ r}.
Another useful application example is the closure of a set defined by strict linear inequalities, i.e.,
M := {x ∈ Rn : aα⊤ x < bα , α ∈ A} .
Whenever such a set M is nonempty, its closure is given by the nonstrict versions of the same inequalities:
cl M = {x ∈ Rn : aα⊤ x ≤ bα , α ∈ A} .
Note here that nonemptiness of M in this last example is essential. To see this,
consider the set M = {x ∈ R : x < 0, − x < 0} . Clearly, M is empty, so that its
closure also is the empty set. On the other hand, if we ignore the nonemptiness
requirement on M and apply formally the above rule, we would incorrectly claim
that cl M = {x ∈ R : x ≤ 0, − x ≤ 0} = {0} . ♢
Definition I.1.24 [Interior] The set of all interior points of a given set
M ⊆ Rn is called the interior of M [notation: int M or int(M )] (see Defini-
tion B.10).
Example I.1.6 We have the following sets and their corresponding interiors:
Fact I.1.25 For any set M in Rn , its interior, int M , is always open, and
int M is the largest (with respect to the inclusion) open set contained in M .
The interior of a set is, of course, contained in the set, which, in turn, is
contained in its closure:
int M ⊆ M ⊆ cl M. (1.3)
The boundary points of M are exactly the points from Rn which can be approx-
imated to whatever high accuracy both by points from M and by points from
outside of M (check it!).
Given a set M ⊆ Rn , it is important to note that the boundary points do not necessarily belong to M , since M = cl M need not hold in general. In fact, all boundary points belong to M if and only if M = cl M , i.e., if and only if M is closed.
The boundary of a set M ⊆ Rn is clearly closed as bd M = cl M ∩ (Rn \ int M )
and both sets cl M and Rn \ int M are closed (note that the set Rn \ int M is
closed since it is the complement of an open set). In addition, from the definition
of the boundary, we have
M ⊆ (int M ∪ bd M ) = cl M.
Therefore, any point from M is either an interior or a boundary point of M .
Example I.1.7 We have the following sets and their corresponding relative
interiors:
• The relative interior of a singleton is the singleton itself (since a point in the
0-dimensional space is the same as a ball of a positive radius).
• More generally, the relative interior of an affine subspace is the subspace itself.
• Given two distinct points x ̸= y in Rn , the interior of the segment [x, y] is empty whenever n > 1. In contrast, the relative interior of this set is always (independently of n) nonempty and is precisely the interval (x, y), i.e., the segment without its endpoints. ♢
Note that for any M ⊆ Rn , rbd M is a closed set contained in Aff(M ), and, as for the “actual” interior and boundary, we have
rint M ⊆ M ⊆ cl M = rint M ∪ rbd M.
Of course, if Aff(M ) = Rn , then the relative interior becomes the usual interior,
and similarly for boundary. Note that Aff(M ) = Rn for sure is the case when
int M ̸= ∅ (since then M contains a ball B, and therefore the affine hull of M is
the entire Rn , which is the affine hull of B).
Lemma I.1.30 Let M be a convex set in Rn . Then, for any x ∈ rint M and
y ∈ cl M , we have
[x, y) := {(1 − λ)x + λy : 0 ≤ λ < 1} ⊆ rint M.
also belongs to M . Note that the set ∆ is the image of the standard full-dimensional simplex
{µ ∈ Rn : µ ≥ 0, Σ_{i=1}^n µi ≤ 1}
under the linear transformation µ 7→ Aµ, where A is the matrix with the columns
a1 , . . . , an . Recall from Example I.1.6 that the standard simplex has a nonempty
interior. Since A is nonsingular (due to the linear independence of a1 , . . . , an ),
multiplication by A maps open sets onto open ones, so that ∆ has a nonempty
interior. Since ∆ ⊆ M , the interior of M is nonempty.
(iii): The statement is evidently true when M is empty, so we assume that
In general, this inclusion can be “loose” – the right hand side set in (1.6) can
be much larger than the left hand side one, even when all Mk are convex. For
example, when K = 2, M1 = {x ∈ R2 : x2 = 0} is the x1 -axis, and M2 = {x ∈
R2 : x2 > 0} ∪ {[0; 0]}, both sets are convex, their intersection is the singleton
{0}, so that cl(M1 ∩ M2 ) = cl{0} = {0}, while the intersection of cl M1 and cl M2
is the entire x1 -axis, which is simply M1 . In this example the right hand side
in (1.6) is “incomparably larger” than the left hand side one. However, under
suitable assumptions we can also achieve equality in (1.6).
Proposition I.1.33 Consider convex sets Mk ⊆ Rn , k ≤ K.
(i) If ∩_{k≤K} rint Mk ̸= ∅, then cl(∩k≤K Mk ) = ∩k≤K cl Mk , i.e., (1.6) holds as an equality.
(ii) Moreover, if MK ∩ int M1 ∩ int M2 ∩ . . . ∩ int MK−1 ̸= ∅, then we have ∩_{k≤K} rint Mk ̸= ∅, i.e., the premise (and thus the conclusion) in (i) holds true, so that cl(∩k≤K Mk ) = ∩k≤K cl Mk .
Proof. (i): To prove that under the premise of (i) inclusion (1.6) is in fact an equality is the same as to verify that, given x ∈ ∩k cl Mk , one has x ∈ cl (∩k Mk ). Indeed, under the premise of (i) there exists x̄ ∈ ∩k rint Mk . Then, for every k we have x̄ ∈ rint Mk and x ∈ cl Mk , implying by Lemma I.1.30 that the set ∆ := [x̄, x) = {(1 − λ)x̄ + λx : 0 ≤ λ < 1} is contained in Mk . Since ∆ ⊆ Mk for all k, we have ∆ ⊆ ∩k Mk , and thus cl ∆ ⊆ cl (∩k Mk ). It remains to note that x ∈ cl ∆.
(ii): Let x̄ ∈ MK ∩ int M1 ∩ . . . ∩ int MK−1 . As x̄ ∈ int Mk for all k < K, there exists an open set U ⊂ ∩_{k<K} Mk such that x̄ ∈ U . As x̄ ∈ MK ⊆ cl MK , by Theorem I.1.29, x̄ is the limit of a sequence of points from rint MK , so that there exists x̂ ∈ U ∩ rint MK . By the construction of U , we have x̂ ∈ rint Mk for all k ≤ K, so that the premise of (i) indeed takes place.
We will call this the conic transform of X, see Figure I.4. Note that this set
is indeed a cone. Moreover, all vectors [x; t] from this cone have t ≥ 0, and,
importantly, the only vector with t = 0 in the cone ConeT(X) is the origin in
Rn+1 (this is what you get when taking trivial – with all coefficients zero – conic
combinations of vectors from X + ).
All nonzero vectors [x; t] from ConeT(X) have t > 0 and form a convex set which
we call the perspective transform Persp(X) of X:
Persp(X) := {[x; t] ∈ ConeT(X) : t > 0} = ConeT(X) \ {0n+1 }.
The name of this set is motivated by the following immediate observation (Proposition I.1.34): we have the representation
Persp(X) = {[x; t] ∈ Rn × R : t > 0, x/t ∈ X} .   (1.7)
In other words, to get Persp(X), we pass from X to X + (i.e., lift X to
Rn+1 ) and then take the union of all rays {[sx; s] ∈ Rn × R : s > 0, x ∈ X}
emanating from the origin (with origin excluded) and passing through the
points of X + .
Proof. Let X̂ := {[x; t] ∈ Rn × R : t > 0, x/t ∈ X}, so that the claim in the proposition is Persp(X) = X̂. Consider a point [x; t] ∈ X̂. Then, t > 0 and y := x/t ∈ X, and thus we have [x; t] = t[y; 1], so that the point [x; t] from X̂ is a single-term conic combination – just a positive multiple – of the point [y; 1] ∈ X + . As this holds for every point [x; t] ∈ X̂, we conclude X̂ ⊆ Persp(X). To verify the opposite inclusion, recall that every point [x; t] ∈ Persp(X) is of the form [Σi λi xi ; Σi λi ] with xi ∈ X, λi ≥ 0, and t = Σi λi > 0. Then,
[Σi λi xi ; Σi λi ] = t [Σi (λi /t)xi ; 1] = t[y; 1],
where y := Σi (λi /t)xi . Note that y ∈ X, as it is a convex combination of points from X and X is convex. Thus, [x; t] is such that t > 0 and y = x/t ∈ X, that is, X̂ ⊇ Persp(X), as desired.
As a byproduct of Proposition I.1.34, we conclude that the right hand side set
in (1.7) is convex whenever X is convex and nonempty – a fact not so evident
“from scratch.”
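Here is a small numerical illustration of representation (1.7) (ours, not from the text), with X taken to be the closed unit Euclidean ball.

import numpy as np

def in_persp_of_ball(z):
    """Membership in Persp(X) for X = closed unit Euclidean ball in R^n,
    using (1.7): [x; t] belongs to Persp(X) iff t > 0 and x/t lies in X."""
    z = np.asarray(z, dtype=float)
    x, t = z[:-1], z[-1]
    return t > 0 and np.linalg.norm(x / t) <= 1.0

print(in_persp_of_ball([0.5, 0.5, 1.0]))   # True: (0.5, 0.5) is in the ball
print(in_persp_of_ball([3.0, 0.0, 1.0]))   # False: (3, 0) lies outside the ball
print(in_persp_of_ball([0.0, 0.0, 0.0]))   # False: points with t = 0 are excluded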
Note that X + is geometrically the same as X, and moreover we can view X +
as simply the intersection of ConeT(X) (or Persp(X)) with the hyperplane t = 1
in Rn × R.
Example I.1.8
1. ConeT(Rn ) = {[x; t] ∈ Rn+1 : t > 0} ∪ {0n+1 }, and Persp(Rn ) = {[x; t] ∈ Rn+1 : t > 0}.
2. ConeT(Rn+ ) = {[x; t] : x ∈ Rn+ , t > 0} ∪ {0n+1 }, and Persp(Rn+ ) = {[x; t] : x ∈ Rn+ , t > 0}.
3. Given any norm ∥ · ∥ on Rn , let B be its unit ball. Then, we have ConeT(B) = {[x; t] ∈ Rn+1 : t ≥ ∥x∥}, and Persp(B) = {[x; t] ∈ Rn+1 : t ≥ ∥x∥, t > 0}.
♢
Note that in all three cases in Example I.1.8, the set X of which we are taking the conic and perspective transforms is not just convex, but also closed. However, in the first two examples the conic transform is a non-closed cone, while in the third example the conic transform is closed; in all three cases, though, the intersection of ConeT(X) with the half-space {[x; t] ∈ Rn+1 : t ≥ α} is closed, provided α > 0. There is indeed a general fact underlying this phenomenon.
For a nonempty convex set X, let us also consider the closure of ConeT(X), i.e., the set
cl ConeT(X) = cl {[x; t] ∈ Rn × R : t > 0, x/t ∈ X} .
Clearly, cl ConeT(X) is a closed cone in Rn+1 containing X + . Moreover, it is immediately seen that cl ConeT(X) is the smallest (w.r.t. inclusion) closed cone in Rn+1 which contains X + , and that this cone remains intact when X is extended to its closure. We will refer to cl ConeT(X) as the closed conic transform of X. In some cases, the closed conic transform admits a simple characterization. An immediate illustration of this is as follows:
For useful additional facts on closed conic transforms, see Exercise III.12.1-3.
2 Theorems of Caratheodory, Radon, and Helly
We next examine three theorems from Convex Analysis that have important
consequences in Optimization.
Remark I.2.2 Note that some subsets of Rn fall within the scope of several of the definitions of dimension. Specifically, a linear subspace is also an affine subspace, and an affine subspace is, in particular, a nonempty set. It is immediately seen that if a set is in the scope of more than one definition of dimension, all applicable definitions assign the set the same value of the dimension. ■
As an informal introduction to what follows, draw several points (“red points”) on the 2D plane and take a point (“blue point”) in their convex hull. You will observe that whatever your selection of red points and of the blue point in their convex hull, the blue point will belong to a properly selected triangle with red vertices. The general fact is as follows.
Let x ∈ Conv(M ). By Fact I.1.14 on the structure of the convex hull, there exist x1 , . . . , xN from M and convex combination weights λ1 , . . . , λN such that
x = Σ_{i=1}^N λi xi , where λi ≥ 0, ∀i = 1, . . . , N, and Σ_{i=1}^N λi = 1.
For ease of reference, let us define λi (t) := λi + tδi for all i and for all t ∈ R. Note
that for any t ∈ R, by the definition of λi and δi , we always have
Σ_{i=1}^N λi (t) = Σ_{i=1}^N (λi + tδi ) = Σ_{i=1}^N λi + t Σ_{i=1}^N δi = 1.
Moreover, when t = 0, we have λi (0) = λi for all i, and thus this is a convex combination, as λi (0) ≥ 0 for all i. On the other hand, from the selection of the δi , we know that Σ_{i=1}^N δi = 0 and [δ1 ; . . . ; δN ] ̸= 0, and thus at least one entry in δ must be negative. Therefore, when t is large, some of the coefficients λi (t) will be negative.
There exists, of course, the largest t = t∗ for which λi (t) ≥ 0 for all i = 1, . . . , N holds, and for t = t∗ at least one of the λi (t) is zero. Specifically, when setting
I − := {i : δi < 0},  i∗ ∈ argmin_{i∈I −} λi /|δi | ,  and t∗ := min_{i∈I −} λi /|δi | ,
we have λi (t∗ ) ≥ 0 for all i and λi∗ (t∗ ) = 0. This then implies that we have represented x as a convex combination of at most N − 1 of the points x1 , . . . , xN .
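The argument above is constructive. The following sketch of ours (not from the book) applies this reduction step repeatedly until at most n + 1 points carry nonzero weight; function names and tolerances are our own choices.

import numpy as np

def caratheodory_reduce(points, lam, tol=1e-12):
    """Given x = sum_i lam[i] * points[i] with lam >= 0, sum(lam) = 1, return
    weights supported on at most n + 1 points representing the same x.
    points: (N, n) array; lam: (N,) array."""
    points = np.asarray(points, float)
    lam = np.asarray(lam, float)
    while np.count_nonzero(lam > tol) > points.shape[1] + 1:
        idx = np.flatnonzero(lam > tol)
        P = points[idx]
        # Affine dependence: delta != 0 with sum_i delta_i = 0 and
        # sum_i delta_i * P_i = 0 (a null vector of the lifted matrix).
        A = np.vstack([P.T, np.ones(len(idx))])
        _, _, Vt = np.linalg.svd(A)
        delta = Vt[-1]
        neg = delta < -tol
        t_star = np.min(lam[idx][neg] / np.abs(delta[neg]))
        lam_new = lam.copy()
        lam_new[idx] = lam[idx] + t_star * delta   # drives some weight to zero
        lam_new[lam_new < tol] = 0.0
        lam = lam_new
    return lam

# Example: 5 points in R^2 reduce to a combination of at most 3 of them.
pts = np.array([[0, 0], [1, 0], [0, 1], [1, 1], [0.5, 0.5]], float)
w = np.full(5, 0.2)
x = w @ pts
w_red = caratheodory_reduce(pts, w)
print(np.count_nonzero(w_red), np.allclose(w_red @ pts, x))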
µ1 , . . . , µN such that
Σ_{i=1}^N µi xi = 0 and Σ_{i=1}^N µi = 0.
Then, by setting
αi := λi /a, for i ∈ I, and βj := −λj /a, for j ∈ J,
we get
αi ≥ 0, ∀i ∈ I, βj ≥ 0, ∀j ∈ J, Σ_{i∈I} αi = 1, Σ_{j∈J} βj = 1.
Based on these definitions, Fd is precisely the set of vectors f such that their cost a⊤ f is at most $11 and the associated polyhedral set S[d, f ] is nonempty, that is, resource f allows one to meet demand d. Note that the set Fd is convex, as it is the linear image (in fact, just the projection) of the convex set
{[f ; x] : f ∈ R10 , x ∈ Rn , a⊤ f ≤ 11, x ∈ S[d, f ]} .
The punchline in this illustration is that every 11 sets of the form Fd have a common point. Indeed, suppose that we are given 11 scenarios d1 , . . . , d11 from D. Then, we can meet demand scenario di by investing $1 in a properly selected vector of resources fdi ≥ 0. As we proceeded in cases 1–3, by investing $11 in the single vector of resources f = fd1 + . . . + fd11 , we can meet every one of the 11 scenarios d1 , . . . , d11 , whence f ∈ Fd1 ∩ . . . ∩ Fd11 . Since every 11 of the 100,000 convex sets Fd ⊆ R10 , d ∈ D, have a point in common, by Helly's Theorem all these sets have a common point, say f∗ . That is, f∗ ∈ Fd for all d ∈ D, and thus, by the definition of Fd , every one of the sets S[d, f∗ ], d ∈ D, is nonempty; that is, the vector of resources f∗ (which costs at most $11) allows us to satisfy every demand scenario d ∈ D.
with 11 variables x1 , . . . , x11 and convex constraints, i.e., every one of the sets
Xi := {x ∈ R11 : gi (x) ≤ 0} , i = 1, . . . , 1000,
is convex. Suppose also that the problem is solvable with optimal value Opt∗ = 0.
Clearly, when dropping one or more constraints, the optimal value can only de-
crease or remain the same.
Is it possible to find a constraint such that even if we drop it, we preserve the
optimal value? Two constraints which can be dropped simultaneously with no
has a feasible solution with the objective value < −ϵ. Besides this, such an 11-
constraint relaxation of the original problem has also a feasible solution with the
objective equal to 0 (namely, the optimal solution of the original problem), and
since its feasible set is convex (as the intersection of the convex feasible sets of
the participating constraints), the 11-constraint relaxation has a feasible solution
x with c⊤ x = −ϵ. In other words, every 11 of the 1000 convex sets
Yi := {x ∈ R11 : c⊤ x = −ϵ, gi (x) ≤ 0} , i = 1, . . . , 1000
(i) for every collection of at most n + 1 sets from the family, the sets from
the collection have a point in common;
and
(ii) every set in the family is closed, and the intersection of the sets from a
certain finite subfamily of the family is bounded (e.g., one of the sets in the
family is bounded).
Then, all the sets from the family have a point in common.
Proof. By (i), Theorem I.2.10 implies that all finite subfamilies of F have
nonempty intersections, and also these intersections are convex (since intersec-
tion of a family of convex sets is convex by Proposition I.1.12); in view of (ii)
these intersections are also closed. Adding to F intersections of sets from finite
subfamilies of F, we get a larger family F ′ composed of closed convex sets, and
sets from a finite subfamily of this larger family again have a nonempty intersec-
tion. Moreover, from (ii) it follows that this new family contains a bounded set
Q. Since all the sets are closed, the family of sets
{Q ∩ Q′ : Q′ ∈ F}
forms a nested family of compact sets (i.e., a family of compact sets with nonempty
intersection of sets from every finite subfamily). Then, by a well-known theorem
from Real Analysis such a family has a nonempty intersection2) .
2 Here is the proof of this Real Analysis theorem: assume for contradiction that the intersection of the
compact sets Qα , α ∈ A, is empty. Choose a set Qα∗ from the family; for every x ∈ Qα∗ there is a
set Qx in the family which does not contain x (otherwise x would be a common point of all our
sets). Since Qx is closed, there is an open ball Vx centered at x which does not intersect Qx . The
balls Vx , x ∈ Qα∗ , form an open covering of the compact set Qα∗ . Since Qα∗ is compact, there
exists a finite subcovering Vx1 , . . . , VxN of Qα∗ by the balls from the covering, see Theorem B.19.
Since Qxi does not intersect Vxi , we conclude that the intersection of the finite subfamily
Qα∗ , Qx1 , . . . , QxN is empty, which is a contradiction.
X = {x ∈ Rn : Ax ≤ b} = {x ∈ Rn : ai⊤ x ≤ bi , 1 ≤ i ≤ m} .
Geometrically, a polyhedral representation of a set X ⊆ Rn is its representation as the projection
X = {x ∈ Rn : ∃u ∈ Rk : [x; u] ∈ Y }
of a polyhedral set
Y = {(x, u) ∈ Rn × Rk : Ax + Bu ≤ c} .
Here, Y lives in the space of n + k variables x ∈ Rn and u ∈ Rk , and the polyhedral representation of X is obtained by applying the linear mapping (the projection) [x; u] 7→ x : Rn+k → Rn of the (n + k)-dimensional space of (x, u)-variables (the space where Y lives) to the n-dimensional space of x-variables where X lives.
Note that the set X in question can be described by a system of linear inequalities in the x-variables only, namely, as
X = {x ∈ Rn : Σ_{i=1}^n ϵi xi ≤ 1, ∀(ϵi ∈ {−1, +1}, 1 ≤ i ≤ n)} .
In other words, the original description of X is nothing but its polyhedral repre-
sentation (in slight disguise), with λi ’s in the role of extra variables. ♢
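To see the two descriptions side by side (our own sketch, not from the text), the code below tests membership in the unit ℓ1 ball both via the 2^n "direct" inequalities and via the lifted description with extra variables λ, the latter checked as a small LP feasibility problem; scipy is an arbitrary choice of solver.

import itertools
import numpy as np
from scipy.optimize import linprog

def in_ball_direct(x):
    """2^n inequalities: sum_i eps_i x_i <= 1 for all sign patterns eps."""
    x = np.asarray(x, float)
    return all(np.dot(eps, x) <= 1 + 1e-9
               for eps in itertools.product((-1.0, 1.0), repeat=len(x)))

def in_ball_lifted(x):
    """Polyhedral representation: exists lam with -lam <= x <= lam, sum(lam) <= 1."""
    x = np.asarray(x, float)
    n = len(x)
    # Constraints on lam: lam_i >= x_i, lam_i >= -x_i, sum_i lam_i <= 1.
    A_ub = np.vstack([-np.eye(n), -np.eye(n), np.ones((1, n))])
    b_ub = np.concatenate([-x, x, [1.0]])
    res = linprog(c=np.zeros(n), A_ub=A_ub, b_ub=b_ub,
                  bounds=[(None, None)] * n, method="highs")
    return res.success

for pt in ([0.3, -0.4, 0.2], [0.7, 0.7, 0.0]):
    print(in_ball_direct(pt), in_ball_lifted(pt))   # the two answers agree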
Note that it suffices to prove this claim in the case of exactly one extra variable
since the projection which reduces the dimension by k — “eliminates” k extra
variables — is the result of k subsequent projections, every one reducing the
dimension by 1, “eliminating” the extra variables one by one.
Thus, consider a polyhedral set with variables x ∈ Rn and u ∈ R, i.e.,
Y := {[x; u] ∈ Rn+1 : ai⊤ x + bi u ≤ ci , 1 ≤ i ≤ m} .
We want to prove that the projection of Y onto the space of x-variables, i.e.,
X := {x ∈ Rn : ∃u ∈ R: Ax + bu ≤ c} ,
is polyhedral. To see this, let us split the indices of the inequalities defining Y
into three groups (some of these groups can be empty):
• inequalities with bi = 0: I0 := {i : bi = 0}. These inequalities with index i ∈ I0
do not involve u at all;
of the LP and to find its optimal value. When the optimal value is finite (case 3 above), we can use Fourier-Motzkin elimination backward, starting with t = a ∈ T and extending this value to a pair (t, x) with t = a = c⊤ x and Ax ≤ b; that is, we can augment the optimal value by an optimal solution. Thus, we can say that Fourier-Motzkin elimination is a finite real-arithmetic algorithm which allows one to check whether an LP is feasible and bounded and, when this is the case, to find the optimal value and an optimal solution.
On the other hand, Fourier-Motzkin elimination is completely impractical, since the elimination process can blow up the number of inequalities exponentially. Indeed, from the description of the process it is clear that if a polyhedral set is given by m linear inequalities, then eliminating one variable we can end up with as many as m2/4 inequalities (this is what happens if there are m/2 indices in I+ , m/2 indices in I− , and I0 = ∅). Eliminating the next variable, we again can “nearly square” the number of inequalities, and so on. Thus, the number of inequalities in the description of T can become astronomically large even when the dimension of x is something like 10.
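For illustration only, here is a compact sketch of ours of one Fourier-Motzkin elimination step, following the grouping of inequalities into I0, I+, I− described above; it is not an implementation from the book.

import numpy as np

def eliminate_last_variable(A, b):
    """One Fourier-Motzkin step: from {z : A z <= b} with z = [x; u],
    return (A', b') describing the projection {x : exists u with A z <= b}."""
    a, coef = A[:, :-1], A[:, -1]
    I0 = np.flatnonzero(coef == 0)       # inequalities not involving u
    Ip = np.flatnonzero(coef > 0)        # give upper bounds on u
    Im = np.flatnonzero(coef < 0)        # give lower bounds on u
    rows, rhs = [a[i] for i in I0], [b[i] for i in I0]
    # Pair every lower bound on u with every upper bound on u.
    for i in Ip:
        for j in Im:
            rows.append(coef[i] * a[j] - coef[j] * a[i])
            rhs.append(coef[i] * b[j] - coef[j] * b[i])
    if not rows:
        return np.zeros((0, A.shape[1] - 1)), np.zeros(0)
    return np.array(rows), np.array(rhs)

# Example: the triangle {x1 >= 0, x2 >= 0, x1 + x2 <= 1}; eliminating x2
# leaves (up to scaling) the segment 0 <= x1 <= 1.
A = np.array([[-1.0, 0.0], [0.0, -1.0], [1.0, 1.0]])
b = np.array([0.0, 0.0, 1.0])
print(eliminate_last_variable(A, b))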
The actual importance of Fourier-Motzkin elimination is of theoretical nature.
For example, the Linear Programming (LP)-related reasoning we have just carried
out shows that
every feasible and bounded LP problem is solvable, i.e., it has an optimal
solution.
(We will revisit this result in more detail in section 9.3.1). This is a fundamental
fact for LP, and the above reasoning (even with the justification of the elimina-
tion “charged” to it) is, to the best of our knowledge, the shortest and most
transparent way to prove this fundamental fact. Another application of the fact
that polyhedrally representable sets are polyhedral is the Homogeneous Farkas
Lemma to be stated and proved in section 4.1; this lemma will be instrumental
in numerous subsequent theoretical developments.
Note that the rules for intersection, taking direct products and taking inverse
images, as applied to polyhedral descriptions of operands, lead to polyhedral de-
scriptions of the results. In contrast to this, the rules for taking sums with coeffi-
cients and images under affine mappings heavily exploit the notion of polyhedral
representation: even when the operands in these rules are given by polyhedral
descriptions, there are no simple ways to point out polyhedral descriptions of the
results.
The absolutely straightforward justification of the above calculus rules is the subject of Exercise I.27.
Finally, we note that the problem of minimizing a linear form c⊤ x over a set
M given by its polyhedral representation, i.e.,
M = {x ∈ Rn : ∃u ∈ Rk : Ax + Bu ≤ c} ,
then every vector h that has nonnegative inner products with all ai ’s should also have nonnegative inner product with a:
a = Σ_{i=1}^N λi ai , with λi ≥ 0, ∀i, and h⊤ ai ≥ 0, ∀i =⇒ h⊤ a ≥ 0.
In fact, this evident necessary condition is also sufficient. This is given by the
Homogeneous Farkas Lemma.
Proof. The necessity – the “only if” part of the statement – was proved before
the Homogeneous Farkas Lemma was formulated. Let us prove the “if” part of
the lemma. Thus, we assume that h⊤ a ≥ 0 is a consequence of the homogeneous
you convince everyone that your answer is correct? What can be an “evident for
everybody” validity certificate for your answer?
If your claim is that (S) is feasible, a certificate can be just to point out a
solution x∗ to (S). Given this certificate, one can substitute x∗ into the system
and check whether x∗ is indeed a solution.
Suppose now that your claim is that (S) has no solutions. What can be a
“simple certificate” of this claim? How can one certify a negative statement? This
is a highly nontrivial problem not just for mathematics; for example, in criminal law, how should someone accused of a murder prove his innocence? The “real life” answer to the question “how to certify a negative statement” is discouraging: such a statement normally cannot be certified1 . In mathematics, the standard way to justify a negative statement A, like “there is no solution to such and such system of constraints” (e.g., “there are no solutions to the equation x5 + y 5 = z 5 with positive integer variables x, y, z”), is to lead the statement opposite to A, i.e., ¬A (in our example, “a solution exists”), to a contradiction. That is, we assume that ¬A is true and derive consequences until a clearly false statement is obtained; when this happens, we know that ¬A is false (since legitimate consequences of a true statement must be true), and therefore A must be true. In general, there is no recipe for leading to a contradiction something which in fact is false; this is why certifying negative statements usually is difficult.
Fortunately, finite systems of linear inequalities are simple enough to allow
for a recipe for certifying their infeasibility: we start with the assumption that
a solution exists and then demonstrate a contradiction in a very specific way
– by taking weighted sum of the inequalities in the system using nonnegative
aggregation weights to produce a contradictory inequality.
Let us start with a simple illustration: we would like to certify infeasibility of the following system of inequalities in variables u, v, w:
5u − 6v − 4w > 2
4v − 2w ≥ −1
−5u + 7w ≥ 1
Let us assign these inequalities the “aggregation weights” 2, 3, 2, multiply the inequalities by the respective weights, and sum up the resulting inequalities:
2 × (5u − 6v − 4w > 2)
+ 3 × (4v − 2w ≥ −1)
+ 2 × (−5u + 7w ≥ 1)
(∗)  0 · u + 0 · v + 0 · w > 3
The resulting aggregated inequality (∗) is contradictory – it has no solutions at all.
1 This is where the court rule “a person is presumed innocent until proven guilty” comes from –
instead of requesting from the accused to certify the negative statement “I did not commit the
crime,” the court requests from the prosecution to certify the positive statement “The accused did
commit the crime.”
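A two-line numerical check of this aggregation (ours): with the weights λ = (2, 3, 2), the weighted sum of the left-hand sides vanishes identically while the weighted sum of the right-hand sides equals 3, so the aggregated inequality reads 0 > 3.

import numpy as np

# Rows: coefficients of u, v, w in the three inequalities; right-hand sides as above.
A = np.array([[5.0, -6.0, -4.0],
              [0.0,  4.0, -2.0],
              [-5.0, 0.0,  7.0]])
rhs = np.array([2.0, -1.0, 1.0])
lam = np.array([2.0, 3.0, 2.0])      # nonnegative aggregation weights

print(lam @ A)    # [0. 0. 0.]  -> left-hand side of (*) is identically zero
print(lam @ rhs)  # 3.0         -> (*) reads 0 > 3, a contradiction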
where Ω is “>” whenever λi > 0 for at least one i with Ωi = “ > ”, and Ω is “≥”
otherwise. Now, when can a linear inequality
d⊤ x Ω e
be contradictory? Of course, it can happen only when d = 0. Furthermore, in
this case, whether the inequality is contradictory depends on the relation Ω and
the value of e: if Ω = “ > ”, then the inequality is contradictory if and only if
e ≥ 0, and if Ω = “ ≥ ”, then it is contradictory if and only if e > 0. We have
established the following simple result:
with unknowns λ ∈ Rm :
TI :  (a) λ ≥ 0,  (b) Σ_{i=1}^m λi ai = 0,  (cI ) Σ_{i=1}^m λi bi ≥ 0,  (dI ) Σ_{i=1}^{ms} λi > 0;
TII : (a) λ ≥ 0,  (b) Σ_{i=1}^m λi ai = 0,  (cII ) Σ_{i=1}^m λi bi > 0.
If at least one of the systems TI , TII is feasible, then the system (S) is infeasible.
First, we claim that in every solution to (4.2), one has ϵ ≤ 0. Indeed, assuming
that (4.2) has a solution x, τ, ϵ with ϵ > 0, we conclude from (4.2.a) that τ > 0.
Then, from (4.2.b − c) it will follow that τ −1 x is a solution to (S), while we
assumed (S) is infeasible. Therefore, we must have ϵ ≤ 0 in every solution to
(4.2).
Now, we have that the homogeneous linear inequality
−ϵ ≥ 0 (4.3)
We would like to emphasize that the preceding principles are highly nontrivial
and very deep. Consider, e.g., the following system of 4 linear inequalities in two
variables u, v:
−1 ≤ u ≤ 1,
−1 ≤ v ≤ 1.
These inequalities clearly imply that
u2 + v 2 ≤ 2, (!)
which in turn implies, by the Cauchy-Schwarz inequality, the linear inequality
u + v ≤ 2:
u + v = 1 × u + 1 × v ≤ √(12 + 12 ) √(u2 + v 2 ) ≤ (√2)2 = 2.   (!!)
The concluding inequality u + v ≤ 2 is linear and is a consequence of the original
feasible system, and so we could have simply relied on Principle B to derive it.
On the other hand, in the preceding demonstration of this linear consequence
inequality both steps (!) and (!!) are “highly nonlinear.” It is absolutely unclear
a priori why the same consequence inequality can, as it is stated by Principle B, be
derived from the system in a “linear” manner as well (of course it can – it suffices
just to sum up two inequalities u ≤ 1 and v ≤ 1). In contrast, Inhomogeneous
Farkas Lemma predicts that hundreds of pages of whatever complicated (but
correct!) demonstration that such and such linear inequality is a consequence
of such and such feasible finite system of linear inequalities can be replaced by
simply demonstrating weights of prescribed signs such that the target inequality
is the weighted sum, with these weights, of the inequalities from the system and
the identically true linear inequality. One shall appreciate the elegance and depth
of such a result!
Note that the General Theorem on Alternative and its corollaries A and B
heavily exploit the fact that we are speaking about linear inequalities. For exam-
ple, consider the following system of two quadratic and two linear inequalities in
two variables:
(a) u2 ≥ 1,
(b) v 2 ≥ 1,
(c) u ≥ 0,
(d) v ≥ 0,
along with the quadratic inequality
(e) uv ≥ 1.
The inequality (e) is clearly a consequence of (a) – (d). However, if we extend the system of inequalities (a) – (d) by all “trivial” (i.e., identically true) linear and quadratic inequalities in 2 variables, like 0 > −1, u2 + v 2 ≥ 0, u2 + 2uv + v 2 ≥ 0, u2 − 2uv + v 2 ≥ 0, etc., and ask whether (e) can be derived in a linear fashion
from the inequalities of the extended system, the answer will be negative. Thus,
Principle B fails to be true already for quadratic inequalities (which is a great
sorrow – otherwise there would be no difficult problems at all!).
where
• [domain] X is called the domain of the problem,
• [objective] f is called the objective (function) of the problem,
• [constraints] gi , i = 1, . . . , m, are called the (functional) inequality constraints,
and hj , j = 1, . . . , k, are called the equality constraints 2) .
We always assume that X ̸= ∅ and that the objective and the constraints are
well-defined on X. Moreover, we typically skip indicating X when X = Rn .
We use the following standard terminology related to (4.5)
2 Rigorously speaking, the constraints are not the functions gi , hj , but the relations gi (x) ≤ 0,
hj (x) = 0. We will use the word “constraints” in both of these senses, and it will always be clear
what is meant. For example, we will say that “x satisfies the constraints” to refer to the relations,
and we will say that “the constraints are differentiable” to refer to the underlying functions.
– [below boundedness] the problem is called below bounded, if its optimal value
is > −∞, i.e., if the objective is bounded from below on the feasible set.
• [optimal solution] a point x ∈ Rn is called an optimal solution to (4.5), if x is
feasible and f (x) ≤ f (x′ ) for any other feasible solution x′ , i.e., if
x ∈ Argmin_{x′} {f (x′ ) : x′ ∈ X, g(x′ ) ≤ 0, h(x′ ) = 0} .
• the optimal value is the supremum of the values of the objective at feasible
solutions, and is, by definition, −∞ for infeasible problems, and
• boundedness means boundedness of the objective from above on the feasible
set (or, which is the same, the fact that the optimal value is < +∞),
• optimal solution is a feasible solution such that the objective value at this
solution is greater than or equal to the objective value at every feasible solution.
Note that in principle we could allow for linear equality constraints hj (x) := aj⊤ x + bj = 0. However, a constraint of this type can be equivalently represented by a pair of opposite linear inequalities aj⊤ x + bj ≤ 0, −aj⊤ x − bj ≤ 0. To save space and words (and, as we have just explained, with no loss in generality), in the sequel we will focus on inequality constrained linear programming problems.
Opt = min_x {c⊤ x : Ax − b ≥ 0} ,  where A = [a1⊤ ; a2⊤ ; . . . ; am⊤ ] ∈ Rm×n .   (LP)
The motivation for constructing the problem dual to an LP problem is the desire to generate, in a systematic way, lower bounds on the optimal value Opt of (LP).
An evident way to bound from below a given function f (x) in the domain given
by a system of inequalities
gi (x) ≥ bi , i = 1, . . . , m, (4.6)
is offered by what is called the Lagrange duality. We will discuss Lagrange Duality
in full detail for general functions in Part IV. Here, let us do a brief precursor
and examine the special case when we are dealing with linear functions only.
Lagrange Duality:
• Let us look at all inequalities which can be obtained from (4.6) by
linear aggregation, i.e., the inequalities of the form
Σ_{i=1}^m yi gi (x) ≥ Σ_{i=1}^m yi bi   (4.7)
with the “aggregation weights” yi ≥ 0 for all i. Note that the inequality
(4.7), due to its origin, is valid on the entire set X of feasible solutions
of (4.6).
• Depending on the choice of aggregation weights, it may happen that
the left hand side in (4.7) is ≤ f (x) for all x ∈ Rn . Whenever this is the case, the right hand side Σ_{i=1}^m yi bi of (4.7) is a lower bound on
f (x) for any x ∈ X . It follows that
• The optimal value of the problem
max_y { Σ_{i=1}^m yi bi : (a) y ≥ 0, (b) Σ_{i=1}^m yi gi (x) ≤ f (x) ∀x ∈ Rn }   (4.8)
is a lower bound on the values of f on the set of feasible solutions to
the system (4.6).
Let us now examine what happens with the Lagrange duality when f and the gi are homogeneous linear functions, i.e., f (x) = c⊤ x and gi (x) = ai⊤ x for all i = 1, . . . , m. In this case, the requirement (4.8.b) merely says that c = Σ_{i=1}^m yi ai (or, which is the same, A⊤ y = c, due to the origin of the matrix A). Thus, problem (4.8) becomes the Linear Programming problem
max_y {b⊤ y : A⊤ y = c, y ≥ 0} ,   (LP∗ )
(*) (Sa ) has no solutions if and only if at least one of the following two systems of linear inequalities in m + 1 unknowns has a solution:
TI :  (a) λ = [λ0 ; λ1 ; . . . ; λm ] ≥ 0,  (b) −λ0 c + Σ_{i=1}^m λi ai = 0,  (cI ) −λ0 a + Σ_{i=1}^m λi bi ≥ 0,  (dI ) λ0 > 0;
or
TII : (a) λ = [λ0 ; λ1 ; . . . ; λm ] ≥ 0,  (b) −λ0 c + Σ_{i=1}^m λi ai = 0,  (cII ) −λ0 a + Σ_{i=1}^m λi bi > 0.
Now assume that (LP) is feasible. We first claim that under this assumption
(Sa ) has no solutions if and only if TI has a solution. The implication “TI has a
solution =⇒ (Sa ) has no solution” is readily given by the preceding remarks. To
verify the inverse implication, assume that (Sa ) has no solution and the system
Ax ≥ b has a solution, and let us prove that then TI has a solution. If TI has no
solution, then by (*), TII must have a solution. Moreover, since any solution to TII with λ0 > 0 is also a solution to TI , we must have λ0 = 0 for every solution to TII . But the fact that TII has a solution λ with λ0 = 0 is independent of the values of c and a; if this were the case, it would mean, by the same General Theorem on Alternative, that, e.g., the following instance of (Sa ):
0⊤ x ≥ −1, Ax ≥ b
has no solution as well. But then the system Ax ≥ b would have no solution – a contradiction to the assumption that (LP) is feasible.
Now, if TI has a solution, this system has a solution with λ0 = 1 as well (to see
this, pass from a solution λ to the one λ/λ0 ; this construction is well-defined, since
λ0 > 0 for every solution to TI ). Now, an (m + 1)-dimensional vector λ = [1; y]
is a solution to TI if and only if the m-dimensional vector y solves the following
system of linear inequalities and equations
y ≥ 0,  A⊤ y ≡ Σ_{i=1}^m yi ai = c,  b⊤ y ≥ a.   (D)
We summarize these observations below.
We see that the entity responsible for lower bounds on the optimal value of
(LP) is the system (D): every solution to the latter system induces a bound of
this type, and in the case when (LP) is feasible, all lower bounds can be obtained
from solutions to (D). Now note that if (y, a) is a solution to (D), then the pair
(y, b⊤ y) also is a solution to the same system, and the lower bound b⊤ y on Opt
is not worse than the lower bound a. Thus, as far as lower bounds on Opt are
concerned, we lose nothing by restricting ourselves to the solutions (y, a) of (D)
with a = b⊤ y. The best lower bound on Opt given by (D) is therefore the optimal
value of the problem
max_y {b⊤ y : A⊤ y = c, y ≥ 0} ,
which is nothing but the dual to (LP) problem given by (LP∗ ). Note that (LP∗ )
is also a Linear Programming problem.
All we know about the dual problem so far is the following:
Theorem I.4.9 [LP Duality Theorem] Consider the primal problem (LP) along with its dual problem (LP∗ ). Then,
1) [Primal-dual symmetry] The dual problem is an LP program, and its
dual is equivalent to the primal problem;
2) [Weak duality] The value of the dual objective at every dual feasible
solution is less than or equal to the value of the primal objective at every
primal feasible solution, so that the dual optimal value is less than or equal
to the primal one;
3) [Strong duality] The following 5 properties are equivalent to each other:
(i) The primal is feasible and bounded below.
(ii) The dual is feasible and bounded above.
(iii) The primal is solvable.
(iv) The dual is solvable.
(v) Both primal and dual are feasible.
Moreover, if any one of these properties (and then, by the equivalence just
stated, every one of them) holds, then the optimal values of the primal and
the dual problems are equal to each other.
Finally, if at least one of the problems in the primal-dual pair is feasible,
then the optimal values in both problems are the same, i.e., either both are
finite and equal to each other, or both are +∞ (i.e., primal is infeasible and
dual is not bounded above), or both are −∞ (i.e., primal is unbounded below
and dual is infeasible).
There is one last remark we should make to complete the story of primal
and dual objective values given in Theorem I.4.9: in fact it is possible to have
both primal and dual problems infeasible simultaneously (see Exercise I.38). This
is the only case when the primal and the dual optimal values (+∞ and −∞,
respectively) differ from each other.
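As a numerical companion to the theorem (our own sketch, with arbitrary data and scipy as one possible solver), the script below solves a small primal-dual pair of the form (LP), (LP∗) and checks that the optimal values coincide and that complementary slackness holds.

import numpy as np
from scipy.optimize import linprog

# Primal: min c^T x s.t. Ax >= b.   Dual: max b^T y s.t. A^T y = c, y >= 0.
A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
b = np.array([1.0, 1.0, 3.0])
c = np.array([1.0, 2.0])

# Primal (linprog uses A_ub x <= b_ub, so flip the signs of Ax >= b).
primal = linprog(c, A_ub=-A, b_ub=-b, bounds=[(None, None)] * 2, method="highs")
# Dual (maximization turned into minimization of -b^T y).
dual = linprog(-b, A_eq=A.T, b_eq=c, bounds=[(0, None)] * 3, method="highs")

x, y = primal.x, dual.x
print("primal optimal value:", c @ x)    # equals the dual optimal value (strong duality)
print("dual optimal value:  ", b @ y)
print("complementary slackness:", np.allclose(y * (A @ x - b), 0.0))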
Proof. 1) This part is quite straightforward: writing the dual problem (LP∗ ) in our standard form, we get
min_y {−b⊤ y : [Im ; A⊤ ; −A⊤ ] y − [0; c; −c] ≥ 0} ,
is to prove the equivalence between (i)–(iv) and (v). This is immediate: (i)–(iv),
of course, imply (v); vice versa, in the case of (v) the primal is not only feasible,
but also bounded below (this is an immediate consequence of the feasibility of
the dual problem, see part 2)), and (i) follows.
It remains to verify that if one problem in the primal-dual pair is feasible, then
the primal and the dual optimal values are equal to each other. By primal-dual
symmetry it suffices to consider the case when the primal problem is feasible. If
also the primal is bounded from below, then by what has already been proved
the dual problem is feasible and the primal and dual optimal values coincide
with each other. If the primal problem is unbounded from below, then the primal
optimal value is −∞ and by Weak Duality the dual problem is infeasible, so that
the dual optimal value is −∞.
An immediate corollary of the LP Duality Theorem is the following necessary
and sufficient optimality condition in LP.
Proof. Indeed, the “zero duality gap” optimality condition is an immediate consequence of the fact that the value of the primal objective at every primal feasible solution is greater than or equal to the value of the dual objective at every dual feasible solution, while the optimal values in the primal and the dual are equal to each other whenever one of the problems is feasible, see Theorem I.4.9. The
equivalence between the “zero duality gap” and the “complementary slackness”
optimality conditions is given by the following computation: whenever x is primal
feasible and y is dual feasible, we have
y ⊤ (Ax − b) = (A⊤ y)⊤ x − b⊤ y = c⊤ x − b⊤ y,
where the second equality follows from dual feasibility (i.e., A⊤ y = c). Thus, for a
primal-dual feasible pair (x, y), the duality gap vanishes if and only if y ⊤ (Ax−b) =
0, and the latter, due to y ≥ 0 and Ax−b ≥ 0, happens if and only if yi [Ax−b]i = 0
for all i, that is, if and only if the complementary slackness takes place.
Geometry of a primal-dual pair of LP problems. Consider the primal-dual pair of LP problems
min_{x∈Rn} {c⊤ x : Ax − b ≥ 0}   (LP)
max_{y∈Rm} {b⊤ y : A⊤ y = c, y ≥ 0}   (LP∗ )
as presented in section 4.5.2, and assume that the system of equality constraints
3 for derivations, see Exercise IV.7 addressing Conic Duality, of which LP duality is a special case.
5 Exercises for Part I
5.1 Elementaries
Exercise I.1 Mark in the following list the sets which are convex:
1. {x ∈ R2 : x1 + i2 x2 ≤ 1, i = 1, . . . , 10}
7. {x ∈ R2 : exp{x1 } ≥ x2 }
8. {x ∈ Rn : Σ_{i=1}^n xi2 = 1}
9. {x ∈ Rn : Σ_{i=1}^n xi2 ≤ 1}
10. {x ∈ Rn : Σ_{i=1}^n xi2 ≥ 1}
11. {x ∈ Rn : max_{i=1,...,n} xi ≤ 1}
12. {x ∈ Rn : max_{i=1,...,n} xi ≥ 1}
13. {x ∈ Rn : max_{i=1,...,n} xi = 1}
14. {x ∈ Rn : min_{i=1,...,n} xi ≤ 1}
15. {x ∈ Rn : min_{i=1,...,n} xi ≥ 1}
16. {x ∈ Rn : min_{i=1,...,n} xi = 1}
Exercise I.2 Mark by T those of the following claims which always are true.
1. The linear image Y = {Ax : x ∈ X} of a linear subspace X is a linear subspace.
2. The linear image Y = {Ax : x ∈ X} of an affine subspace X is an affine subspace.
3. The linear image Y = {Ax : x ∈ X} of a convex set X is convex.
4. The affine image Y = {Ax + b : x ∈ X} of a linear subspace X is a linear subspace.
5. The affine image Y = {Ax + b : x ∈ X} of an affine subspace X is an affine subspace.
6. The affine image Y = {Ax + b : x ∈ X} of a convex set X is convex.
7. The intersection of two linear subspaces in Rn is always nonempty.
8. The intersection of two linear subspaces in Rn is a linear subspace.
9. The intersection of two affine subspaces in Rn is an affine subspace.
10. The intersection of two affine subspaces in Rn , when nonempty, is an affine subspace.
11. The intersection of two convex sets in Rn is a convex set.
12. The intersection of two convex sets in Rn , when nonempty, is a convex set.
Exercise I.3 ▲ Prove that the relative interior of a simplex with vertices y 0 , . . . , y m is exactly the set
{Σ_{i=0}^m λi y i : λi > 0, Σ_{i=0}^m λi = 1} .
is convex.
Exercise I.7 Which of the following claims are always true? Explain why/why not.
1. The convex hull of a bounded set in Rn is bounded.
2. The convex hull of a closed set in Rn is closed.
3. The convex hull of a closed convex set in Rn is closed.
4. The convex hull of a closed and bounded set in Rn is closed and bounded.
5. The convex hull of an open set in Rn is open.
Exercise I.8 ▲ [This exercise together with its follow-up, i.e., Exercise II.9, and Exercise I.9
are the most boring exercises ever designed by the authors. Our excuse is that “There is no royal road to geometry” (Euclid of Alexandria, c. 300 BC)]
Let A, B be nonempty subsets of Rn . Consider the following claims. If the claim is always
(i.e., for every data satisfying premise of the claim) true, give a proof; otherwise, give a counter
example.
1. If A ⊆ B, then Conv(A) ⊆ Conv(B).
2. If Conv(A) ⊆ Conv(B), then A ⊆ B.
3. Conv(A ∩ B) = Conv(A) ∩ Conv(B).
4. Conv(A ∩ B) ⊆ Conv(A) ∩ Conv(B).
5. Conv(A ∪ B) ⊆ Conv(A) ∪ Conv(B).
6. Conv(A ∪ B) ⊇ Conv(A) ∪ Conv(B).
7. If A is closed, so is Conv(A).
8. If A is closed and bounded, so is Conv(A).
9. If Conv(A) is closed and bounded, so is A.
Exercise I.9 ▲ Let A, B, C be nonempty subsets of Rn and D be a nonempty subset of Rm .
Consider the following claims. If the claim is always (i.e., for every data satisfying premise of
the claim) true, give a proof; otherwise, give a counter example.
1. Conv(A ∪ B) = Conv(Conv(A) ∪ B).
2. Conv(A ∪ B) = Conv(Conv(A) ∪ Conv(B)).
3. Conv(A ∪ B ∪ C) = Conv(Conv(A ∪ B) ∪ C).
4. Conv(A × D) = Conv(A) × Conv(D).
5. When A is convex, the set Conv(A ∪ B) (which is always the set of convex combinations
of several points from A and several points from B), can be obtained by taking convex
combinations of points with at most one of them taken from A, and the rest taken from B.
Similarly, if A and B are both convex, to get Conv(A ∪ B), it suffices to add to A ∪ B all
convex combinations of pairs of points, one from A and one from B.
5.2 Around ellipsoids
Exercise I.12 Let C1 , C2 be two nonempty conic sets in Rn , i.e., for each i = 1, 2, for any
x ∈ Ci and t ≥ 0, we have t · x ∈ Ci as well. Note that C1 , C2 are not necessarily convex. Prove
that
1. C1 + C2 ̸= Conv(C1 ∪ C2 ) may happen if either C1 or C2 (or both) is nonconvex.
2. C1 + C2 = Conv(C1 ∪ C2 ) always holds if C1 , C2 are both convex.
3. C1 ∩ C2 = ∪_{α∈[0,1]} (αC1 ∩ (1 − α)C2 ) always holds if C1 , C2 are both convex.
Exercise I.13 ▲ Let X ⊆ Rn be a convex set with int X ̸= ∅, and consider the following set
Vol(E) = |Det(D)|,
When a truss is subjected to an external load – a collection of forces acting at the nodes – it starts to deform, so that the nodes move a little bit, leading to elongations/shortenings of the bars, which, in turn, result in reaction forces. At the equilibrium, the reaction forces compensate the external ones, and the truss capacitates certain potential energy, called compliance. Mechanics models this story as follows.
• The nodes form a finite set p1 , . . . , pK of distinct points in physical space Rd (d = 2 for
planar, and d = 3 for spatial constructions). Virtual displacements of the nodes under the
load are somehow restricted by “support conditions;” we will focus on the case when some of
the nodes “are fixed” – cannot move at all (think about them as being in the wall), and the
remaining “are free” – their virtual displacements form the entire Rd . A virtual displacement
v of the nodal set can be identified with a vector of dimension M = dm, where m is the
number of free nodes; v is a block vector with m d-dimensional blocks, indexed by the free nodes, representing the physical displacements of these nodes.
• There are N bars, i-th of them linking the nodes with indexes αi and βi (with at least one
of these nodes free) and with volume (3D or 2D, depending on whether the truss is spatial
or planar) ti .
• An external load is a collection of physical forces – vectors from Rd – acting at the free nodes
(forces acting at the fixed nodes are of no interest – they are suppressed by the supports).
Thus, an external load f can be identified with block vector of the same structure as a virtual
displacement – blocks are indexed by free nodes and represent the external forces acting at
these nodes. Thus, displacements v of the nodal set and external loads f are vectors from
the space V of virtual displacements – M -dimensional block vectors with m d-dimensional
blocks.
• The bars and the nodes together specify the symmetric positive semidefinite M × M stiffness
matrix A of the truss. The role of this matrix is as follows. A displacement v ∈ V of the nodal set results in reaction forces at the free nodes (those at fixed nodes are of no interest – they are compensated by the supports); assembling these forces into an M -dimensional block vector, we get a reaction, and this reaction is −Av. In other words, the potential energy capacitated in the truss under displacement v ∈ V of the nodes is ½ v ⊤ Av, and the reaction, as it should be, is the minus gradient of the potential energy as a function of v 1 . At the equilibrium under external load f , the total of the reaction and the load should be zero, that is, the equilibrium displacement satisfies
Av = f   (5.1)
Note that (5.1) may be unsolvable, meaning that the truss is crushed by the load in question. Assuming the equilibrium displacement v exists, the truss at equilibrium capacitates potential energy ½ v ⊤ Av; this energy is called the compliance of the truss w.r.t. the load. Compliance is a convenient measure of rigidity of the truss with respect to the load: the less the compliance, the better the truss withstands the load.
Let us build the stiffness matrix of a truss. As we have mentioned, the reaction forces originate
from elongations/shortenings of bars under displacement of nodes. Consider i-th bar linking
nodes with initial – prior to the external load being applied – positions ai = pαi and bi = pβi ,
and let us set
di = ∥bi − ai ∥2 , ei = [bi − ai ]/di .
Under displacement v ∈ V of the nodal set,
• the positions of the nodes linked by the bar become $a_i + \underbrace{v^{\alpha_i}}_{=:da}$ and $b_i + \underbrace{v^{\beta_i}}_{=:db}$, where $v^\gamma$ is the γ-th block in v – the displacement of the γ-th node;
• as a result, the elongation of the bar becomes, in the first-order in v approximation, $e_i^\top[db - da]$, and the reaction forces caused by this elongation are, by Hooke's Law²,
$$d_i^{-1} S_i e_i e_i^\top [db - da] \ \text{ at node } \#\alpha_i, \qquad -\,d_i^{-1} S_i e_i e_i^\top [db - da] \ \text{ at node } \#\beta_i, \qquad 0 \ \text{ at all remaining nodes},$$
where $S_i = t_i/d_i$ is the cross-sectional size of the i-th bar. It follows that when both nodes linked
by the i-th bar are free, the contribution of the i-th bar to the reaction is
$$-t_i\, b_i b_i^\top v,$$
1 This is called the linearly elastic model; it is the linearized-in-displacements approximation of the actual
behavior of a loaded truss. The model works better the smaller the nodal displacements are
compared to the inter-nodal distances, and it is accurate enough to be used in typical real-life
applications.
2 Hooke’s Law says that the magnitude of the reaction force caused by elongation/shortening of a bar
is proportional to $S d^{-1}\delta$, where S is the bar’s cross-sectional size (area for a spatial, and thickness for
a planar truss), d is the bar’s (pre-deformation) length, and δ is the elongation. With units of length
properly adjusted to the bars’ material, the proportionality coefficient becomes 1, and this is what we
assume from now on.
The bottom line is that the stiffness matrix of a truss composed of N bars with volumes $t_i$,
$1 \le i \le N$, is
$$A = A(t) := \sum_i t_i\, b_i b_i^\top,$$
where the vectors $b_i \in V = \mathbb{R}^M$ are readily given by the geometry of the nodal set and the indexes of the nodes
linked by bar i.
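As an illustration, here is a short NumPy sketch assembling A(t). The explicit block structure of the vectors $b_i$ used below (blocks $\pm e_i/d_i$ at the free endpoints of bar i, zero elsewhere) is inferred from the elongation/reaction analysis above and should be treated as an assumption of the sketch; all names are ours.

```python
import numpy as np

def bar_vectors(points, free, bars, dim=2):
    """Return the vectors b_i as rows of an (N, M) array, M = dim * len(free).
    points: array of node coordinates; free: indices of free nodes;
    bars: list of pairs (alpha_i, beta_i) of node indices."""
    blk = {node: k for k, node in enumerate(free)}   # free node -> block position
    M = dim * len(free)
    rows = []
    for a, b in bars:
        d = np.linalg.norm(points[b] - points[a])    # bar length d_i
        e = (points[b] - points[a]) / d              # unit direction e_i
        bi = np.zeros(M)
        if a in blk:
            bi[dim*blk[a]: dim*blk[a] + dim] = -e / d
        if b in blk:
            bi[dim*blk[b]: dim*blk[b] + dim] = e / d
        rows.append(bi)
    return np.array(rows)

def stiffness(b_rows, t):
    """A(t) = sum_i t_i b_i b_i^T = B Diag(t) B^T with B = [b_1, ..., b_N]."""
    B = b_rows.T
    return B @ np.diag(t) @ B.T
```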
Truss Topology Design problem. In the simplest Truss Topology Design (TTD) problem,
one is given
• a finite set of tentative nodes in 2D or 3D along with support conditions indicating which
of the nodes are fixed and which are free, and thus specifying the linear space V = RM of
virtual displacements of the nodal set,
• the set of N tentative bars – unordered pairs of (distinct from each other) nodes which are
allowed to be linked by bars, and the total volume W > 0 of the truss,
• An external load f ∈ V.
These data specify, as explained above, vectors $b_i \in \mathbb{R}^M$, $i = 1, \dots, N$, and the stiffness matrix
$$A(t) = \sum_{i=1}^N t_i\, b_i b_i^\top = B\,\mathrm{Diag}\{t_1,\dots,t_N\}\,B^\top \in \mathbf{S}^M, \qquad B = [b_1,\dots,b_N].$$
When applying the TTD model, one starts with a dense grid of tentative nodes and a broad list
of tentative bars (e.g., allowing a bar between every pair of distinct nodes with at least one node
of the pair free). At the optimal truss yielded by the optimal solution to the TTD problem, many
tentative bars (usually the vast majority of them) get zero volumes, and a significant part of the
tentative nodes become unused. Thus, the TTD problem in fact is not about sizing – it recovers
the optimal structure of the construction, and this is where “Topology Design” comes from.
To illustrate this point, here is a toy example (it will be our guinea pig in the entire series of
TTD exercises):
Console design: We want to design a 2D truss as follows:
• The set of tentative nodes is the 9 × 9 grid {[p; q] ∈ R2 : p, q ∈ {0, 1, . . . , 8}}, with the 9 leftmost nodes fixed and the remaining 72 nodes free, resulting in an M = 144-dimensional space V of virtual displacements.
• The external load f ∈ V = R144 is a single-force one, with the only nonzero force [0; −1] applied at the 5-th node of the rightmost column of nodes.
• We allow all pairwise connections between distinct nodes with at least one of the two nodes free, resulting in N = 3204 tentative bars.
• The total volume of the truss is W = 1000.
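The data of this toy example are easy to generate programmatically; the following small sketch (names are ours) reproduces the counts M = 144 and N = 3204 quoted above:

```python
import numpy as np
from itertools import combinations

nodes = np.array([[p, q] for p in range(9) for q in range(9)], dtype=float)
fixed = {i for i, (p, q) in enumerate(nodes) if p == 0}        # 9 leftmost nodes
free = [i for i in range(len(nodes)) if i not in fixed]
M = 2 * len(free)                                              # 144

bars = [(i, j) for i, j in combinations(range(len(nodes)), 2)
        if not (i in fixed and j in fixed)]
N = len(bars)                                                  # 3204
print(M, N)
```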
Important: From now on, speaking about the TTD problem, we always make the following assumption:
$$\sum_{i=1}^N b_i b_i^\top \succ 0.$$
$$A(t)v = f,$$
$$A = \begin{bmatrix} B\,\mathrm{Diag}\{t\}B^\top & f \\ f^\top & 2\tau \end{bmatrix}, \qquad B = [b_1,\dots,b_N]$$
is positive semidefinite. As a result, pose the TTD problem as the optimization problem
$$\mathrm{Opt} = \min_{\tau,\,t}\left\{\tau:\ \begin{bmatrix} B\,\mathrm{Diag}\{t\}B^\top & f \\ f^\top & 2\tau\end{bmatrix} \succeq 0,\ t \ge 0,\ \sum_i t_i = W\right\} \tag{5.2}$$
and compare the resulting design and compliance to those in the previous item.
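Problem (5.2) is a semidefinite program and can be fed to any SDP modeling tool. The following is a hedged sketch using the cvxpy package (assumed to be installed together with an SDP solver such as SCS); B is the M × N matrix with columns $b_i$ and f is the load, e.g., as built in the sketches above. It only illustrates the formulation and is not a tuned solver for the console example.

```python
import cvxpy as cp
import numpy as np

def solve_ttd(B, f, W):
    """Sketch of (5.2): min tau s.t. [[B Diag(t) B^T, f], [f^T, 2 tau]] PSD,
    t >= 0, sum(t) = W."""
    M, N = B.shape
    t = cp.Variable(N, nonneg=True)
    tau = cp.Variable()
    block = cp.bmat([[B @ cp.diag(t) @ B.T, f.reshape(M, 1)],
                     [f.reshape(1, M),      cp.reshape(2 * tau, (1, 1))]])
    prob = cp.Problem(cp.Minimize(tau), [block >> 0, cp.sum(t) == W])
    prob.solve()
    return tau.value, t.value
```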
Comment: Note that the claims above are refinements, albeit minor ones, of the Caratheodory
Theorem (plain and conic, respectively). Indeed, when M := Aff(X) and m is the dimension
of M , every affinely independent collection of points from X contains at most m + 1 points
(Proposition A.44), so that the first claim is equivalent to stating that if x ∈ Conv(X), then x is a
convex combination of at most m + 1 points from X. However, the vectors participating in such
a convex combination are not necessarily affinely independent, so that the first claim provides
a bit more information than the plain Caratheodory’s Theorem. Similarly, if L := Lin(X) and
m := dim L, then every linearly independent collection of vectors from X contains at most
m ≤ n points, that is, the second claim implies the Caratheodory’s Theorem in conic form, and
provides a bit more information than the latter theorem.
Exercise I.18 ♦ 3 Consider TTD problem, and let N be the number of tentative bars, M be
the dimension of the corresponding space of virtual displacements V, and f be an external load.
Prove that if a truss t ≥ 0 can withstand the load f with compliance ≤ τ for some given real number
τ, then there exists a truss t′ of the same total volume as t, with compliance w.r.t. f at most τ, and
with at most M + 1 bars of positive volume.
Exercise I.19 ♦ [Shapley-Folkman Theorem]
1. Prove that if a system of linear equations Ax = b with n variables and m equations has a
nonnegative solution, it has a nonnegative solution with at most m positive entries.
2. Let $V_1,\dots,V_n$ be n nonempty sets in $\mathbb{R}^m$, and define
$$V := \mathrm{Conv}(V_1 + V_2 + \dots + V_n).$$
1. Prove that
$$V = \mathrm{Conv}(V_1) + \dots + \mathrm{Conv}(V_n).$$
Hint: Assume, on the contrary, that the convex hulls of X and Y intersect, so that
$$\sum_{i=1}^k \lambda_i x^i = \sum_{j=1}^m \mu_j y^j$$
$$\mathrm{Conv}(X) = \left\{ x = \sum_{k=1}^K \lambda_k x^k : \lambda_k \ge 0,\ x^k \in X_k\ \forall k \le K,\ \sum_{k=1}^K \lambda_k = 1 \right\}.$$
2. Let $X_k$, $k \le K$, be nonempty bounded polyhedral sets in $\mathbb{R}^n$ given by polyhedral representations:
$$X_k = \left\{x \in \mathbb{R}^n : \exists u^k \in \mathbb{R}^{n_k} : P_k x + Q_k u^k \le r_k\right\}.$$
Define $X := \bigcup_{k\le K} X_k$. Prove that the set Conv(X) is a polyhedral set given by the polyhedral representation
$$\mathrm{Conv}(X) = \left\{ x\in\mathbb{R}^n :\ \exists x^k \in \mathbb{R}^n,\ u^k \in \mathbb{R}^{n_k},\ \lambda_k \in \mathbb{R},\ \forall k \le K:\ \begin{array}{lr} P_k x^k + Q_k u^k - \lambda_k r_k \le 0,\ k\le K & (a)\\ \lambda_k \ge 0,\ \ \sum_{k=1}^K \lambda_k = 1 & (b)\\ x = \sum_{k=1}^K x^k & (c)\end{array}\right\}. \tag{$*$}$$
Does the claim remain true when the assumption of boundedness of the sets Xk s is lifted?
After two preliminary items above, let us pass to the essence of the matter. Consider the situation
as follows. We are given n nonempty and bounded polyhedral sets Xj ⊂ Rr , j = 1, . . . , n. We
will think of Xj as the “resource set” of the j-th production unit: entries in x ∈ Xj are amounts
of various resources, and Xj describes the set of vectors of resources available, in principle, for
j-th unit. Each production unit j can possibly use any one of its Kj < ∞ different production
plans. For each j = 1, . . . , n, the vector yj ∈ Rp representing the production of the j-th unit
depends on the vector xj of resources consumed by the unit and also on the production plan
utilized in the unit. In particular, the production vector yj ∈ Rp stemming from resources xj
under the k-th plan can be picked by us, at our will, from the set
$$Y_j^k[x_j] := \left\{y_j \in \mathbb{R}^p : z_j := [x_j; -y_j] \in V_j^k\right\},$$
where Vjk , k ≤ Kj , are given bounded polyhedral “technological sets” of the units with projec-
tions onto the xj -plane equal to Xj , so that for every k ≤ Kj it holds
xj ∈ Xj ⇐⇒ ∃yj : [xj ; −yj ] ∈ Vjk . (5.3)
We assume that all the sets $V_j^k$ are given by polyhedral representations, and we define
$$V_j := \bigcup_{k\le K_j} V_j^k.$$
Let $R \in \mathbb{R}^r$ be the vector of total resources available to all n units and let $P \in \mathbb{R}^p$ be the
vector of total demands for the products. For $j \le n$, we want to select $x_j \in X_j$, $k_j \le K_j$, and
$y_j \in Y_j^{k_j}[x_j]$ in such a way that
$$\sum_j x_j \le R \quad\text{and}\quad \sum_j y_j \ge P.$$
That is, we would like to find $z_j = [x_j; v_j] \in V_j$, $j \le n$, in such a way that $\sum_j z_j \le [R; -P]$.
Note that the presence of “combinatorial part” in our decision – selection of production plans
in finite sets – makes the problem difficult.
3. Apply Shapley-Folkman Theorem (Exercise I.19) to overcome, to some extent, the above
difficulty and come up with a good and approximately feasible solution.
where e ̸= 0 is the vector (“inner normal to the cutting hyperplane”) chosen by John. Prove
that for every ϵ > 0, Jill can guarantee to herself at least $\frac{300}{n+1} - \epsilon$ g of raisins, but in general
cannot guarantee to herself $\frac{300}{n+1} + \epsilon$ g.
Remarks:
1. With some minor effort, you can prove that Jill can find a point which guarantees her $\frac{300}{n+1}$
g of raisins, and not $\frac{300}{n+1} - \epsilon$ g.
2. If, instead of dividing raisins, John and Jill would divide in the same fashion a uniform and
convex cake (that is, a closed and bounded convex body X with a nonempty interior in Rn,
the reward being the n-dimensional volume of the part a person gets), the results would
change dramatically: choosing as the point the center of masses of the cake
$$\bar x := \frac{\int_X x\,dx}{\int_X dx},$$
Jill would guarantee herself at least $\left(\frac{n}{n+1}\right)^n \approx \frac{1}{e}$ part of the cake. This is a not so easy
corollary of the following extremely important and deep result:
corollary of the following extremely important and deep result:
Brunn-Minkowski Symmetrization Theorem: Let X be as above, and let [a, b]
be the projection of X on an axis ℓ, say, on the last coordinate axis. Consider the “
symmetrization” Y of X, i.e., Y is the set with the same projection [a, b] on ℓ and
for every hyperplane orthogonal to the axis ℓ and crossing [a, b], the intersection of Y
with this hyperplane is an (n − 1)-dimensional ball centered at the axis with precisely
the same (n − 1)-dimensional volume as the one of the intersection of X with the same
hyperplane:
$$\left\{z \in \mathbb{R}^{n-1} : [z; c] \in Y\right\} = \left\{z \in \mathbb{R}^{n-1} : \|z\|_2 \le \rho(c)\right\}, \quad \forall c \in [a, b], \text{ and}$$
This problem has n + 1 variables and (n + 1) linear inequality constraints, and let us solve
it by applying the Fourier-Motzkin elimination to project the feasible set of the problem onto
the axis of the t-variable, that is, to build a finite system S of univariate linear inequalities
specifying this projection.
How many inequalities do you think there will be in S when n = 1, 2, 3, 4? Check your intuition
by implementing and running the F-M elimination, assuming, for the sake of definiteness, that
cij = 1 for all i, j.
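Since the exercise explicitly invites an implementation, here is a compact sketch of one Fourier-Motzkin elimination step for a system Ax ≤ b (our own code; applying it repeatedly to all variables except t yields the univariate system S, whose inequalities can then be counted):

```python
import numpy as np
from itertools import product

def fm_step(A, b, j):
    """Eliminate variable j from A x <= b, returning an equivalent system
    (A', b') in the remaining variables (one Fourier-Motzkin step)."""
    pos = [i for i in range(len(b)) if A[i, j] > 0]
    neg = [i for i in range(len(b)) if A[i, j] < 0]
    zer = [i for i in range(len(b)) if A[i, j] == 0]
    rows = [np.delete(A[i], j) for i in zer]
    rhs = [b[i] for i in zer]
    for p, n in product(pos, neg):
        # normalize the two inequalities so the coefficients of x_j cancel
        row = A[p] / A[p, j] - A[n] / A[n, j]
        rows.append(np.delete(row, j))
        rhs.append(b[p] / A[p, j] - b[n] / A[n, j])
    return np.array(rows), np.array(rhs)
```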
Exercise I.32 For the systems of constraints to follow, write them down equivalently in the
standard form Ax < b, Cx ≤ d and point out their feasibility status (“feasible – infeasible”) along
with the corresponding certificates (certificate for feasibility is a feasible solution to the system;
certificate for infeasibility is a collection of weights of constraints which leads to a contradictory
consequence inequality, as explained in GTA).
1. $x \le 0$  ($x \in \mathbb{R}^n$)
2. $x \le 0$, and $\sum_{i=1}^n x_i > 0$  ($x \in \mathbb{R}^n$)
3. $-1 \le x_i \le 1$, $1 \le i \le n$, $\sum_{i=1}^n x_i \ge n$  ($x \in \mathbb{R}^n$)
4. $-1 \le x_i \le 1$, $1 \le i \le n$, $\sum_{i=1}^n x_i > n$  ($x \in \mathbb{R}^n$)
5. $-1 \le x_i \le 1$, $1 \le i \le n$, $\sum_{i=1}^n i x_i \ge \frac{n(n+1)}{2}$  ($x \in \mathbb{R}^n$)
6. $-1 \le x_i \le 1$, $1 \le i \le n$, $\sum_{i=1}^n i x_i > \frac{n(n+1)}{2}$  ($x \in \mathbb{R}^n$)
7. $x \in \mathbb{R}^2$, $|x_1| + x_2 \le 1$, $x_2 \ge 0$, $x_1 + x_2 = 1$
8. $x \in \mathbb{R}^2$, $|x_1| + x_2 \le 1$, $x_2 \ge 0$, $x_1 + x_2 > 1$
Write down the dual problem and check whether the optimal values are equal to each other.
Exercise I.39 Write down the problems dual to the following linear programs:
1. $\max_{x\in\mathbb{R}^3}\left\{x_1 + 2x_2 + 3x_3 :\ x_1 - x_2 + x_3 = 0,\ x_1 + x_2 - x_3 \ge 100,\ x_1 \le 0,\ x_2 \ge 0,\ x_3 \ge 0\right\}$
2. $\max_{x\in\mathbb{R}^n}\left\{c^\top x : Ax = b,\ x \ge 0\right\}$
3. $\max_{x\in\mathbb{R}^n}\left\{c^\top x : Ax = b,\ \underline{u} \le x \le \overline{u}\right\}$
4. $\max_{x,y}\left\{c^\top x : Ax + By \le b,\ x \le 0,\ y \ge 0\right\}$
Prove that the feasible set of at least one of these problems is unbounded.
Exercise I.41 ▲ Consider the following linear program
$$\mathrm{Opt} = \min_{\{x_{ij}\}_{1\le i<j\le 4}} \left\{ 2\sum_{1\le i<j\le 4} x_{ij} :\ x_{ij} \ge 0\ \ \forall\, 1\le i<j\le 4,\ \ \sum_{j>i} x_{ij} + \sum_{j<i} x_{ji} \ge i,\ 1\le i\le 4 \right\}.$$
Exercise I.44 ▲ Let w ∈ Rn , and let A ∈ Rn×n be a skew-symmetric matrix, i.e., A⊤ = −A.
Consider the following linear program
$$\mathrm{Opt}(P) = \min_{x\in\mathbb{R}^n}\left\{w^\top x : Ax \ge -w,\ x \ge 0\right\}.$$
Suppose that the problem is solvable. Provide a closed analytical form expression for Opt(P ).
Exercise I.45 ▲ [Separation Theorem, polyhedral version] Let P and Q be two nonempty
polyhedral sets in Rn such that P ∩ Q = ∅. Suppose that the polyhedral descriptions of these
sets are given as
P := {x ∈ Rn : Ax ≤ b} and Q := {x ∈ Rn : Dx ≥ d} .
Using LP duality show that there exists a vector c ∈ Rn such that
c⊤ x < c⊤ y for all x ∈ P and y ∈ Q.
Exercise I.46 ▲ Suppose we are given the following linear program
$$\min_x \left\{c^\top x : Ax = b,\ x \ge 0\right\} \tag{P}$$
Now, let us consider the following “game”: Player 1 chooses some x ≥ 0, and player 2 chooses
some λ simultaneously; then, player 1 pays to player 2 the amount L(x, λ). In this game, player
1 would like to minimize L(x, λ) and player 2 would like to maximize L(x, λ).
A pair (x∗ , λ∗ ) with x∗ ≥ 0, is called an equilibrium point (or saddle point or Nash equilibrium)
if
L(x∗ , λ) ≤ L(x∗ , λ∗ ) ≤ L(x, λ∗ ), ∀x ≥ 0 and ∀λ. (∗)
(That is, we have an equilibrium if no player is able to improve her performance by unilaterally
modifying her choice.)
Show that a pair (x∗ , λ∗ ) is an equilibrium point if and only if x∗ and λ∗ are
optimal solutions to the problem (P ) and its dual, respectively.
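Although the exercise asks for a proof, the equivalence is easy to observe numerically: solving (P) and its dual max{b⊤λ : A⊤λ ≤ c} (this dual form is standard LP duality, not anything specific to the exercise) yields a pair (x∗, λ∗) with equal objective values. A small sketch with SciPy, on arbitrarily chosen data:

```python
import numpy as np
from scipy.optimize import linprog

A = np.array([[1.0, 1.0, 1.0],
              [1.0, 2.0, 0.0]])
b = np.array([4.0, 5.0])
c = np.array([1.0, 2.0, 3.0])

primal = linprog(c, A_eq=A, b_eq=b)                         # x >= 0 is the default bound
dual = linprog(-b, A_ub=A.T, b_ub=c, bounds=(None, None))   # max b^T y written as min -b^T y
print(primal.fun, -dual.fun)    # both equal 5.0: the optimal values coincide
```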
Exercise I.47 ▲ Given a polyhedral set $X = \left\{x \in \mathbb{R}^n : a_i^\top x \le b_i,\ \forall i = 1,\dots,m\right\}$, consider
the associated optimization problem
$$\max_{x,t}\ \left\{t : B_1(x,t) \subseteq X\right\},$$
where $B_1(x,t) := \{y \in \mathbb{R}^n : \|y - x\|_\infty \le t\}$. Is it possible to pose this optimization problem as
a linear program with polynomial in m, n number of variables and constraints? If it is possible,
give such a representation explicitly. If not, argue why.
Exercise I.48 ▲ Consider the following optimization problem
$$\min_{x\in\mathbb{R}^n}\left\{c^\top x :\ \tilde a_i^\top x \le b_i \text{ for some } \tilde a_i \in A_i,\ i = 1,\dots,m,\ x \ge 0\right\}, \tag{$*$}$$
where Ai = {āi + ϵi : ∥ϵi ∥∞ ≤ ρ} for i = 1, . . . , m and ∥u∥∞ := maxj=1,...,n {|uj |}. In this prob-
lem, we basically mean that the constraint coefficient ãij (j-th component of the i-th constraint
vector ãi ) belongs to the interval uncertainty set [āij − ρ, āij + ρ], where āij is its nominal
value. That is, in (∗), we are seeking a solution x such that each constraint is satisfied for some
coefficient vector from the corresponding uncertainty set.
Note that in its current form (∗), this problem is not a linear program (LP). Prove that it
can be written as an explicit linear program and give the corresponding LP formulation.
Exercise I.49 ♦ Let S = {a1 , a2 , . . . , an } be a finite set composed of n distinct from each
other elements, and let f be a real-valued function defined on the set of all subsets of S. We say
that f is submodular if for every X, Y ⊆ S, the following inequality holds
f (X) + f (Y ) ≥ f (X ∪ Y ) + f (X ∩ Y ).
1. Give an example of a submodular function f .
2. Let $f : 2^S \to \mathbb{Z}$ be an integer-valued submodular function such that $f(\emptyset) = 0$. Consider the polyhedron
$$P_f := \left\{x \in \mathbb{R}^{|S|} : \sum_{t\in T} x_t \le f(T),\ \forall T \subseteq S\right\},$$
Consider
$$\bar x_{a_k} := f(\{a_1,\dots,a_k\}) - f(\{a_1,\dots,a_{k-1}\}), \quad k = 1,\dots,n.$$
Show that $\bar x$ is feasible to $P_f$.
3. Consider the following optimization problem associated with $P_f$:
$$\max_x \left\{c^\top x : x \in P_f\right\}.$$
Here, the first inequality follows from Triangle inequality, and the second equality follows from
homogeneity of norms, and the last inequality is due to x′ , x′′ ∈ Q. Thus, from ∥λx′ + (1 −
λ)x′′ − a∥ ≤ r, we conclude that λx′ + (1 − λ)x′′ ∈ Q as desired.
Fact I.1.8 Unit balls of norms on Rn are exactly the same as convex sets V in
Rn satisfying the following three properties:
Proof. First, let V be the unit ball of a norm ∥ · ∥, and let us verify the three stated properties.
Note that V = −V due to ∥x∥ = ∥ − x∥. V is bounded and contains a neighborhood of the
origin due to equivalence between ∥ · ∥ and ∥ · ∥2 (Proposition B.3). Moreover, V is closed. To
see this, note that ∥ · ∥ is Lipschitz continuous with constant 1 with respect to itself: by the
Triangle inequality and due to ∥x − y∥ = ∥y − x∥ we have
$$\big|\|x\| - \|y\|\big| \le \|x - y\| \quad \forall x, y,$$
which implies by Proposition B.3 that there exists $L_{\|\cdot\|} < \infty$ such that
$$\big|\|x\| - \|y\|\big| \le L_{\|\cdot\|}\,\|x - y\|_2 \quad \forall x, y,$$
that is, ∥ · ∥ is Lipschitz continuous (and thus continuous). And of course for any a ∈ R, the
sublevel set {x ∈ Rn : f (x) ≤ a} of a continuous function f is closed.
For the reverse direction, consider any V possessing properties (i –iii). Then, as V is bounded
and contains a neighborhood of the origin, the function ∥·∥V is well defined, it is positive outside
of the origin and vanishes at the origin. Moreover, ∥ · ∥V is homogeneous – when the argument
is multiplied by a real number λ, the value of the function is multiplied by |λ| (by construction
and since V = −V ).
Now, let us show that the relation V = {y ∈ Rn : ∥y∥V ≤ 1} holds. Indeed, the inclusion
V ⊆ {y : ∥y∥V ≤ 1} is evident. So, we will verify that ∥y∥V ≤ 1 implies y ∈ V . Consider
any y such that ∥y∥V ≤ 1 and let t̄ := ∥y∥V (note that t̄ ∈ [0, 1]). There is nothing to prove
when t̄ = 0, which due to the boundedness of V implies that y = 0 and V contains the origin.
When $\bar t > 0$, then, by definition of $\|\cdot\|_V$, there exists a sequence of positive numbers $\{t_i\}$ that
converges to $\bar t$ as $i \to \infty$ such that $y^i := t_i^{-1} y \in V$. Then, as V is closed, $\bar y := \bar t^{-1} y \in V$. And
since $0 < \bar t \le 1$, $y = \bar t\,\bar y$ is a convex combination of the origin and $\bar y$. As both $0 \in V$ and $\bar y \in V$
and V is convex, we conclude $y \in V$.
Let us now check that ∥ · ∥V satisfies the Triangle inequality. As ∥ · ∥V is nonnegative, all we have
to check is that ∥x + y∥V ≤ ∥x∥V + ∥y∥V when x ̸= 0, y ̸= 0. Setting x̄ := x/∥x∥V , ȳ := y/∥y∥V ,
we have by homogeneity ∥x̄∥V = ∥ȳ∥V = 1. Then, from the relation V = {y ∈ Rn : ∥y∥V ≤ 1}
we deduce x̄ ∈ V and ȳ ∈ V . Now, as V is convex and x̄, ȳ ∈ V , we have
$$\frac{1}{\|x\|_V + \|y\|_V}(x+y) = \frac{\|x\|_V}{\|x\|_V + \|y\|_V}\,\bar x + \frac{\|y\|_V}{\|x\|_V + \|y\|_V}\,\bar y \in V.$$
That is, $\left\|\frac{1}{\|x\|_V + \|y\|_V}(x + y)\right\|_V \le 1$. Then, once again by homogeneity of $\|\cdot\|_V$ we conclude that
$\|x + y\|_V \le \|x\|_V + \|y\|_V$.
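When V is given explicitly as a bounded symmetric polytope {x : Cx ≤ d} with d > 0, the function ∥·∥_V constructed in this proof can be evaluated in closed form: ∥y∥_V = max_i (Cy)_i / d_i (and 0 for y = 0). A tiny sketch of our own illustrating this:

```python
import numpy as np

def gauge(C, d, y):
    """Minkowski function ||y||_V of V = {x : Cx <= d}, d > 0, V symmetric
    and bounded: the smallest t >= 0 with Cy <= t*d."""
    return max(0.0, np.max(C @ y / d))

# V = unit ball of the l_infinity norm on R^2
C = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
d = np.ones(4)
print(gauge(C, d, np.array([0.5, -2.0])))   # 2.0, the l_infinity norm of [0.5, -2]
```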
$$\left\{x \in \mathbb{R}^n : (x - a)^\top Q(x - a) \le r^2\right\}$$
is convex.
Proof. Note that since Q is positive definite, the matrix P := Q1/2 (see section D.1.5 for the
definition of the matrix square root) is well-defined and positive definite. Then, we have P is
nonsingular and symmetric, and
$$\left\{x \in \mathbb{R}^n : (x - a)^\top Q(x - a) \le r^2\right\} = \left\{x \in \mathbb{R}^n : (x - a)^\top P^\top P (x - a) \le r^2\right\} = \left\{x \in \mathbb{R}^n : \|P(x - a)\|_2 \le r\right\}.$$
Now, note that whenever ∥ · ∥ is a norm on Rn and P is a nonsingular n × n matrix, the function
$x \mapsto \|Px\|$ is a norm itself (why?). Thus, the function $\|x\|_Q := \sqrt{x^\top Q x} = \|Q^{1/2}x\|_2$ is a norm,
and the ellipsoid in question clearly is just the $\|\cdot\|_Q$-ball of radius r centered at a.
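A quick numerical check of the identity $\|x\|_Q = \sqrt{x^\top Q x} = \|Q^{1/2}x\|_2$ (arbitrary illustrative data; the square root is computed via the spectral decomposition, cf. section D.1.5):

```python
import numpy as np

Q = np.array([[2.0, 0.5],
              [0.5, 1.0]])                 # positive definite
w, U = np.linalg.eigh(Q)
P = U @ np.diag(np.sqrt(w)) @ U.T          # the matrix square root Q^(1/2)

x = np.array([1.0, -1.0])
print(np.sqrt(x @ Q @ x), np.linalg.norm(P @ x))   # both equal sqrt(2)
```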
Fact I.1.11 A set M ⊆ Rn is convex if and only if it is closed with respect to
taking all convex combinations of its elements. That is, M is convex if and only
Define $\bar x := \sum_{i=2}^N \frac{\lambda_i}{1-\lambda_1} x^i$. As $\sum_{i=2}^N \lambda_i = 1 - \lambda_1$, we see that $\bar x$ is an (N − 1)-term convex combination
of points from M and thus belongs to M by the inductive hypothesis. Hence, $x = \lambda_1 x^1 + (1-\lambda_1)\bar x$
is a convex combination of $x^1, \bar x \in M$, and as M is convex we conclude x ∈ M. This completes
the inductive step.
Fact I.1.14 [Convex hull via convex combinations] For a set M ⊆ Rn ,
Conv(M ) = {the set of all convex combinations of vectors from M } .
Proof. Define $\widehat M$ := {the set of all convex combinations of vectors from M}. Recall that a convex
set is closed with respect to taking convex combinations of its members (Fact I.1.11); thus
any convex set containing M also contains $\widehat M$. As by definition Conv(M) is the intersection of
all convex sets containing M, we have $\mathrm{Conv}(M) \supseteq \widehat M$. It remains to prove that $\mathrm{Conv}(M) \subseteq \widehat M$.
We start with the claim that $\widehat M$ is convex. By Fact I.1.11, $\widehat M$ is convex if and only if every convex
combination of points from $\widehat M$ is also in $\widehat M$. Indeed, this criterion holds for $\widehat M$: let $\bar x^i \in \widehat M$ for
$i = 1,\dots,N$ and consider a convex combination of these points, i.e.,
$$\hat x := \sum_{i=1}^N \lambda_i \bar x^i,$$
where $\lambda_i \ge 0$ and $\sum_{i=1}^N \lambda_i = 1$. For each $i = 1,\dots,N$, as $\bar x^i \in \widehat M$, by definition of $\widehat M$ we have
$\bar x^i = \sum_{j=1}^{N_i} \mu_{i,j} x^{i,j}$, where $x^{i,j} \in M$, $\mu_{i,j} \ge 0$ and $\sum_{j=1}^{N_i} \mu_{i,j} = 1$. Then, we arrive at
$$\hat x = \sum_{i=1}^N \lambda_i \bar x^i = \sum_{i=1}^N \lambda_i\Big(\sum_{j=1}^{N_i}\mu_{i,j}x^{i,j}\Big) = \sum_{i=1}^N\sum_{j=1}^{N_i}(\lambda_i\mu_{i,j})\,x^{i,j}.$$
Fact I.1.20 [Conic hull via conic combinations] The conic hull Cone(K) of a
set K ⊆ Rn is the set of all conic combinations (i.e., linear combinations with
nonnegative coefficients) of vectors from K:
$$\mathrm{Cone}(K) = \left\{x \in \mathbb{R}^n : \exists N \ge 0,\ \lambda_i \ge 0,\ x^i \in K,\ i \le N : x = \sum_{i=1}^N \lambda_i x^i\right\}.$$
Proof. The case of K = ∅ is trivial, see the comment on the value of an empty sum after the
statement of Fact I.1.20. When K ̸= ∅, this fact is an immediate corollary of Fact I.1.17.
Fact I.1.23 The closure of a set M ⊆ Rn is exactly the set composed of the
limits of all converging sequences of elements from M .
Proof. Let $\widehat M$ be the set of the limits of all converging sequences of elements from M.
We need to show that $\mathrm{cl}\, M = \widehat M$. Let us first prove $\mathrm{cl}\, M \supseteq \widehat M$. Suppose $x \in \widehat M$. Then, x is the
limit of a converging sequence of points $\{x^i\} \subseteq M \subseteq \mathrm{cl}\, M$. Since cl M is a closed set, we arrive
at $x \in \mathrm{cl}\, M$.
For the reverse direction, note that by definition cl M is the smallest (w.r.t. inclusion) closed set
that contains M, so it suffices to prove that $\widehat M$ is a closed set satisfying $\widehat M \supseteq M$. It is easy to see
that $\widehat M \supseteq M$ holds, as for any $x \in M$ the sequence $\{x^i\}$ with $x^i = x$ is a converging sequence
of points from M with limit x, and thus by definition of $\widehat M$ we deduce $x \in \widehat M$. Now,
consider a converging sequence of points $\{x^i\} \subseteq \widehat M$, and let us prove that the limit $\bar x$ of this
sequence belongs to $\widehat M$. For every i, since the point $x^i \in \widehat M$ is the limit of a sequence of points
from M, we can find a point $y^i \in M$ such that $\|x^i - y^i\|_2 \le 1/i$. The sequence $\{y^i\}$ is composed
of points from M and clearly has the same limit as the sequence $\{x^i\}$, so that the latter limit is
the limit of a sequence of points from M and as such belongs to $\widehat M$.
where the last equality follows from K being conic. So, $\mathrm{Persp}(X) \subseteq \widehat X$, and by taking the
closures of both sides we arrive at $\mathrm{ConeT}(X) = \mathrm{cl}(\mathrm{Persp}(X)) \subseteq \mathrm{cl}(\widehat X) = \widehat X$, where the last
equality follows as $\widehat X$ clearly is closed. Hence, $\mathrm{ConeT}(X) \subseteq \widehat X$. To verify the opposite inclusion,
consider $[x; t] \in \widehat X$ and let us prove that $[x; t] \in \mathrm{ConeT}(X)$. Let $\bar x \in X$ (recall that X is
nonempty). Then, $[\bar x; 1] \in \widehat X$. Moreover, as $\widehat X$ is a cone and the points $[x; t]$ and $[\bar x; 1]$ belong to
$\widehat X$, we have $z_\epsilon := [x + \epsilon\bar x;\ t + \epsilon] \in \widehat X$ for all $\epsilon > 0$. Also, for $\epsilon > 0$, we have $t + \epsilon > 0$, and so
$z_\epsilon \in \widehat X$ implies $\frac{1}{t+\epsilon}(x + \epsilon\bar x) \in X$, which is equivalent to $z_\epsilon = [x + \epsilon\bar x;\ t + \epsilon] \in \mathrm{Persp}(X)$. Finally,
as $[x; t] = \lim_{\epsilon\to+0} z_\epsilon$, we have $[x; t] \in \mathrm{cl}(\mathrm{Persp}(X)) = \mathrm{ConeT}(X)$, as desired.
7
Separation Theorem
We can associate with the hyperplane M or, better to say, with the associated
pair a, b (defined by the hyperplane up to multiplication of a, b by nonzero real
number) the following sets:
• “upper” and “lower” open half-spaces
$$M^{++} := \left\{x \in \mathbb{R}^n : a^\top x > b\right\}, \quad\text{and}\quad M^{--} := \left\{x \in \mathbb{R}^n : a^\top x < b\right\}.$$
These sets clearly are convex, and since a linear form is continuous, and the
sets are given by strict inequalities on the value of a continuous function, they
indeed are open.
These open half-spaces are uniquely defined by the hyperplane, up to swapping
the “upper” and the “lower” ones (this is what happens when passing from a
particular pair a, b specifying M to a negative multiple of this pair).
• “upper” and “lower” closed half-spaces
$$M^{+} := \left\{x \in \mathbb{R}^n : a^\top x \ge b\right\}, \quad\text{and}\quad M^{-} := \left\{x \in \mathbb{R}^n : a^\top x \le b\right\}.$$
These are also convex sets. Moreover, these two sets are polyhedral and thus
closed. It is easily seen that the closed upper/lower half-space is the closure of
the corresponding open half-space, and M itself is the common boundary of
all four half-spaces.
Also, note that our half-spaces and M itself partition Rn , i.e.,
Rn = M −− ∪ M ∪ M ++
separates) S and T .
• We say that S and T can be (strongly) separated, if there exists a hyper-
plane which (strongly) separates S and T .
Figure II.1. Separation.
1) The hyperplane {x ∈ R2 : x2 − x1 = 1} strongly separates the polyhedral sets
S = {x ∈ R2 : x2 = 0, x1 ≥ −1} and T = {x ∈ R2 : 0 ≤ x1 ≤ 1, 3 ≤ x2 ≤ 5}.
2) The hyperplane {x ∈ R : x = 1} separates (but not strongly separates) the
convex sets S = {x ∈ R : x ≤ 1} and T = {x ∈ R : x ≥ 1}.
3) The hyperplane {x ∈ R2 : x1 = 0} separates (but not strongly separates) the
convex sets S = {x ∈ R2 : x1 < 0, x2 ≥ −1/x1 } and T = {x ∈ R2 : x1 >
0, x2 > 1/x1 }.
We will use the following simple and important lemma in the proof of the
separation theorem.
Proof. “If” part is evident. To prove the “only if” part, let x̄ ∈ rint Q be, say, a
minimizer of f over Q, then for any y ∈ Q we need to prove that f (x̄) = f (y).
There is nothing to prove if y = x̄, so let us assume that y ̸= x̄. Since Q is convex
and x̄, y ∈ Q, the segment [x̄, y] belongs to Q. Moreover, as x̄ ∈ rint Q we can
extend this segment a little further away from x̄ and still remain in Q. That is,
there exists z ∈ Q such that x̄ = (1 − λ)y + λz with certain λ ∈ [0, 1). As y ̸= x̄,
we have in fact λ ∈ (0, 1). Since f is linear, we deduce
f (x̄) = (1 − λ)f (y) + λf (z).
Because x̄ is a minimizer of f over Q and y, z ∈ Q, we have min{f (y), f (z)} ≥
f (x̄) = (1 − λ)f (y) + λf (z). Then, from λ ∈ (0, 1) we conclude that this relation
can be satisfied only when f (x̄) = f (y) = f (z).
Proof of Theorem II.7.3. We will prove the separation theorem in several
steps. We will first focus on the usual separation, i.e., case (i) of the theorem.
(i) Necessity. Assume that S, T can be separated. Then, for certain a ̸= 0 we
have
$$\sup_{x\in S} a^\top x \le \inf_{y\in T} a^\top y, \quad\text{and}\quad \inf_{x\in S} a^\top x < \sup_{y\in T} a^\top y. \tag{7.1}$$
Assume for contradiction that rint S and rint T have a common point x̄. Then,
from the first inequality in (7.1) and x̄ ∈ S ∩ T , we deduce
$$a^\top \bar x \le \sup_{x\in S} a^\top x \le \inf_{y\in T} a^\top y \le a^\top \bar x.$$
convex set and T = {x} is a singleton outside S (here the difference with Step 1
is that now S is not assumed to be a polytope).
Without loss of generality we may assume that S contains 0 (if it is not the
case, by taking any p ∈ S, we may translate S and T to the sets S 7→ −p + S,
T 7→ −p + T ; clearly, a linear form which separates the shifted sets, can be shifted
to separate the original ones as well). Let L be the linear span of S.
If x ̸∈ L, the separation is easy: we can write x = e + f , where e ∈ L and f is
from the subspace orthogonal to L, and thus
$$f^\top x = f^\top f > 0 = \max_{y\in S} f^\top y,$$
Assume for contradiction that no such f exists. Then, for every h ∈ B there exists
yh ∈ S such that
h⊤ yh > h⊤ x.
Since the inequality is strict, it immediately follows that there exists an open
neighborhood Uh of the vector h such that
(h′ )⊤ yh > (h′ )⊤ x, ∀h′ ∈ Uh . (7.3)
Note that the family of open sets {Uh }h∈B covers B. As B is compact, we can find
a finite subfamily Uh1 , . . . , UhN of this family which still covers B. Let us take the
corresponding points $y^1 := y_{h_1},\ y^2 := y_{h_2},\ \dots,\ y^N := y_{h_N}$ and define the polytope
$\widehat S := \mathrm{Conv}\{y^1,\dots,y^N\}$. Due to the origin of the $y^i$, all of these points are in S and
thus $S \supseteq \widehat S$ (recall that S is convex). Since $x \notin S$, we deduce $x \notin \widehat S$. Then, by
Step 1, x can be strongly separated from $\widehat S$, i.e., there exists $a \ne 0$ such that
In other words,
$$0 \ge \sup_{x\in S,\,y\in T}\left[f^\top x - f^\top y\right] \quad\text{and}\quad 0 > \inf_{x\in S,\,y\in T}\left[f^\top x - f^\top y\right],$$
that is,
$$\inf_{x\in T} f^\top x \ge \sup_{y\in S} f^\top y.$$
Proof. To see (i) consider any x ∈ rbd M . Then, x ̸∈ rint M , and therefore
the point {x} and rint M can be separated by the Separation Theorem. The
associated separating hyperplane is exactly the desired hyperplane supporting to
M at x.
To prove (ii), note that if $\Pi = \left\{y \in \mathbb{R}^n : a^\top y = a^\top x\right\}$ is supporting to M at
1 For dimension of a subset in Rn , see Definition I.2.1 or/and section A.4.3. We have used the
following immediate observation: If M ⊂ M ′ are two affine planes, then dim M ≤ dim M ′ , with
equality implying that M = M ′ . The readers are encouraged to prove this fact on their own.
In the case of polyhedral sets, extreme points are also referred to as vertices.
Example II.8.2
• The extreme points of a segment $[x, y] \subset \mathbb{R}^n$ are exactly its endpoints $\{x, y\}$.
• The extreme points of a triangle are its vertices.
• The extreme points of a (closed) circle on the 2-dimensional plane are the
points of the circumference.
• The convex set $M := \{x \in \mathbb{R}^2_+ : x_1 > 0,\ x_2 > 0\}$ does not have any extreme
points.
• The only extreme point of the convex set $M := \{[0; 0]\} \cup \{x \in \mathbb{R}^2_+ : x_1 > 0,\ x_2 > 0\}$
Fact II.8.5 All extreme points of the convex hull Conv(Q) of a set Q belong
to Q:
Ext(Conv(Q)) ⊆ Q.
convex compact set, it possesses a quite representative set of extreme points, i.e.,
their convex hull is the entire M .
Proof. With $\bar x$, h, and $\widehat x$ as above, for every t > 0 the point $x_t := \widehat x + 2th$ belongs to
cl M by Lemma II.8.8. Taking into account that $\widehat x \in \mathrm{rint}\,M$ and invoking Lemma
I.1.30, we conclude that $\widehat x + th = \frac{1}{2}[\widehat x + x_t] \in \mathrm{rint}\,M$. Thus, $\widehat x + th \in \mathrm{rint}\,M$ for
all t > 0. Finally, this inclusion $\widehat x + th \in \mathrm{rint}\,M$ holds true for t = 0 as well due
to $\widehat x \in \mathrm{rint}\,M$.
Our last ingredient for the proof of Theorem II.8.6 is a lemma stating a nice
transitive property of extreme points: that is, the extreme points of subsets of
nonempty closed convex sets obtained from the intersection with a supporting
hyperplane of the set are also extreme for the original set.
Proof. The first statement, i.e., that Π ∩ M is nonempty, closed, and convex, follows from
Proposition II.8.2(ii). Moreover, by Proposition II.8.2(ii) we have x̄ ∈ Π ∩ M.
Next, let $a \in \mathbb{R}^n$ be the linear form associated with Π, i.e.,
$$\Pi = \left\{y \in \mathbb{R}^n : a^\top y = a^\top \bar x\right\},$$
so that
$$\inf_{x\in M} a^\top x < \sup_{x\in M} a^\top x = a^\top \bar x. \tag{8.2}$$
Proof of Theorem II.8.6. Let us start with (i). The “only if” part for (i)
follows from Lemma II.8.8. Indeed, for the “only if” part we need to prove that
if M possesses extreme points, then M does not contain lines. That is, we need
to prove that if M contains lines, then it has no extreme points. But, this is
indeed immediate: if M contains a line, then, by Lemma II.8.8, there is a line in
M passing through every given point of M , so that no point can be extreme.
Now let us prove the “if” part of (i). Thus, from now on we assume that M
does not contain lines, and our goal is to prove that then M possesses extreme
points. Equipped with Lemma II.8.10 and Proposition II.8.2, we will prove this
by induction on dim(M ).
There is nothing to do if dim(M ) = 0, i.e., if M is a single point – then, of
course, M = Ext(M ). Now, for the induction hypothesis, for some integer k > 0,
we assume that all nonempty closed convex sets T that do not contain lines and
have dim(T ) = k satisfy Ext(T ) ̸= ∅. To complete the induction, we will show
that this statement is valid for such sets of dimension k + 1 as well. Let M be a
nonempty, closed, convex set that does not contain lines and has dim(M ) = k +1.
Since M does not contain lines and dim(M ) > 0, we have M ̸= Aff(M ). We claim
that M possesses a relative boundary point x̄. To see this, note that there exists
z ∈ Aff(M ) \ M , and thus for any fixed x ∈ M the point
xλ := x + λ(z − x)
does not belong to M for some λ > 0 (and then, by convexity of M , for all larger
values of λ), while x0 = x belongs to M . The set of those λ ≥ 0 for which xλ ∈ M
is therefore nonempty and bounded from above; this set clearly is closed (since
M is closed). Thus, there exists the largest λ = λ∗ for which xλ ∈ M . We claim
that xλ∗ ∈ rbd M . Indeed, by construction xλ∗ ∈ M . If xλ∗ were to be in rint M ,
then all the points xλ with λ values greater than λ∗ yet close to λ∗ would also
belong to M , which contradicts the origin of λ∗ .
Thus, there exists x̄ ∈ rbd M . Then, by Proposition II.8.2(i), there exists a
hyperplane $\Pi = \left\{x \in \mathbb{R}^n : a^\top x = a^\top \bar x\right\}$ which is supporting to M at x̄:
i.e., when M is a single point, is trivial. Assume that the statement holds for all
k-dimensional closed convex and bounded sets. Let M be a closed convex and
bounded set with dim(M ) = k + 1. Consider any x ∈ M . To represent x as a
convex combination of points from Ext(M ), let us pass through x an arbitrary
line ℓ = {x + λh : λ ∈ R} (where h ̸= 0) in the affine span Aff(M ) of M . Moving
along this line from x in each of the two possible directions, we eventually leave
M (since M is bounded). Then, there exist nonnegative λ+ and λ− such that the
points
x̄+ := x + λ+ h, x̄− := x − λ− h
both belong to rbd M . We claim that x̄± admit convex combination representa-
tion using points from Ext(M ) (this will complete the proof, since x clearly is a
convex combination of the two points x̄± ). Indeed, by Proposition II.8.2(i) there
exists a hyperplane Π supporting to M at x̄+ , and by Proposition II.8.2(ii) the
set Π ∩ M is nonempty, closed and convex with dim(Π ∩ M ) < dim(M ) = k + 1.
Moreover, as M is bounded Π ∩ M is bounded as well. Then, by the inductive
hypothesis, x̄+ ∈ Conv(Ext(Π ∩ M )). Moreover, since by Lemma II.8.10 we have
Ext(Π ∩ M ) ⊆ Ext(M ), we conclude x̄+ ∈ Conv(Ext(M )). Analogous reasoning
is valid for x̄− as well.
$\sum_{i=1}^n h_i = 0\}$.
• Consider the set $M := \{x \in \mathbb{R}^n_+ : \sum_{i=1}^n x_i = 1\}$; then $\mathrm{Rec}(M) = \{0\}$.
• Consider the set $M := \{x \in \mathbb{R}^n_+ : \sum_{i=1}^n x_i \ge 1\}$; then $\mathrm{Rec}(M) = \mathbb{R}^n_+$.
• Consider the set $M := \{x \in \mathbb{R}^2_+ : x_1 x_2 \ge 1\}$; then $\mathrm{Rec}(M) = \mathbb{R}^2_+$.
Fact II.8.14 Let M ⊆ Rn be a nonempty closed convex set. Recall its closed
conic transform is given by
ConeT(M ) = cl {[x; t] ∈ Rn × R : t > 0, x/t ∈ M } ,
(see section 1.5). Then,
$$\mathrm{Rec}(M) = \left\{h \in \mathbb{R}^n : [h; 0] \in \mathrm{ConeT}(M)\right\}.$$
Finally, the recessive cones of nonempty polyhedral sets in fact admit a much
simpler characterization.
Proof.
(i): By Theorem II.8.6(i) we already know that any nonempty closed convex set
that does not contain lines must possess extreme points. We will prove the rest
of Part (i) by induction on dim(M ). There is nothing to prove when dim(M ) =
0, that is, M is a singleton. So, suppose that the claim holds true for all sets
of dimension k. Let M be any nonempty closed convex set that does not contain
lines and has dim(M ) = k + 1. To complete the induction step, we will show
that M satisfies the relation (8.4). Consider x ∈ M and let e be a nonzero
direction parallel to Aff(M ) (such a direction exists, since dim(M ) = k + 1 ≥
1). Recalling that M does not contain lines and replacing, if necessary, e with
−e, we can assume that −e is not a recessive direction of M . Same as in the
proof of Theorem II.8.6, x admits a representation x = x− + t− e with t− ≥ 0
and x− ∈ rbd(M ). Define M− to be the intersection of M with the plane Π−
supporting to M at x− . Then, M− is a nonempty closed convex subset of M
and dim(M− ) ≤ k. Also, M− does not contain lines as M− ⊂ M and M does
not contain lines. Thus, by inductive hypothesis, x− is the sum of a point from
the nonempty set Conv(Ext(M− )) and a recessive direction h− of M− . As in the
proof of Theorem II.8.6, Ext(M− ) ⊆ Ext(M ), and of course h− ∈ Rec(M ) due to
Rec(M− ) ⊆ Rec(M ) (why?). Thus, x = v− + h− + t− e with v− ∈ Conv(Ext(X))
and h− ∈ Rec(M ). Now, there are two possibilities: e ∈ Rec(M ) and e ̸∈ Rec(M ).
In the first case, x = v− + h with h = h− + t− e ∈ Rec(M ) (recall h− ∈ Rec(M )
and in this case we also have e ∈ Rec(M )), that is, x ∈ Conv(Ext(M )) + Rec(M ).
In the second case, we can apply the above construction to the vector −e in the
role of e, ending up with a representation of x of the form x = v+ + h+ − t+ e
where v+ ∈ Conv(Ext(M )), h+ ∈ Rec(M ) and t+ ≥ 0. Taking appropriate convex
combination of the resulting pair of representations of x, we can cancel the terms
with e and arrive at x = λv− + (1 − λ)v+ + λh− + (1 − λ)h+ , resulting in x ∈
Conv(Ext(M )) + Rec(M ). This reasoning holds true for every x ∈ M , hence
we deduce M ⊆ Conv(Ext(M )) + Rec(M ). The opposite inclusion is given by
(8.3) due to Conv(Ext(M )) ⊆ M . This then completes the proof of the inductive
hypothesis, and thus Part (i) is proved.
(ii): Now assume that M , in addition to being nonempty closed and convex,
is represented as M = V + K, where K is a closed cone and V is a nonempty
bounded set, and let us prove that K = Rec(M ) and V ⊇ Ext(M ). Indeed, every
vector from K clearly is a recessive direction of V + K, so that K ⊆ Rec(M ). To
prove the opposite inclusion K ⊇ Rec(M ), consider any h ∈ Rec(M ), and let us
prove that h ∈ K. Fix any point v ∈ M . The vectors v +ih, i = 1, 2, . . ., belong to
M and therefore v + ih = v i + hi for some v i ∈ V and hi ∈ K due to M = V + K.
It follows that h = i−1 [v i − v] + i−1 hi for i = 1, 2, . . .. Thus, h = limi→∞ i−1 hi
(recall that V is bounded). As hi ∈ K and K is a cone, i−1 hi ∈ K and so h
is the limit of a sequence of points in K. Since K is closed, we deduce h ∈ K,
as claimed. Thus, K = Rec(M ). It remains to prove that Ext(M ) ⊆ V . This is
immediate: consider any w ∈ Ext(M ), then as M = V + K = V + Rec(M ) and
w ∈ M , we have w = v + e with some v ∈ V ⊆ M and e ∈ Rec(M ), implying
that w − e = v ∈ M . Besides this, w + e ∈ M as w ∈ M and e ∈ Rec(M ). Thus,
w ± e ∈ M . Since w is an extreme point of M , we conclude that e = 0, that is,
w =v ∈V.
Finally, let us consider what happens to the recessive directions after the pro-
jection operation.
is closed. Then,
[hx ; hu ] ∈ Rec(M + ) =⇒ hx ∈ Rec(M ).
Proof. Consider any recessive direction [hx ; hu ] ∈ Rec(M + ). Then, for any
[x̄; ū] ∈ M + , the ray {[x̄; ū] + t[hx ; hu ] : t ≥ 0} is contained in M + . The pro-
jection of this ray on the x-plane is given by the ray {x̄ + thx : t ≥ 0}, which is
contained in M . Thus, hx ∈ Rec(M ).
While Proposition II.8.17 states that [hx ; hu ] ∈ Rec(M + ) =⇒ hx ∈ Rec(M ),
in general, Rec(M ) can be much larger than the projection of Rec(M + ) onto
x-plane. Our next example illustrates this.
Example II.8.4 Consider the sets $M^+ = \{[x; u] \in \mathbb{R}^2 : u \ge x^2\}$ and $M = \{x \in \mathbb{R} : \exists u \in \mathbb{R} : [x; u] \in M^+\}$. Then, M is the entire x-axis and Rec(M ) = M
is the entire x-axis. On the other hand, $\mathrm{Rec}(M^+) = \{[0; h_u] : h_u \ge 0\}$ and the
projection of Rec(M + ) onto the x-axis is just the origin. ♢
In fact, the pathology highlighted in Example II.8.4 can be eliminated when we
have that the set of extreme points of the convex representation M + of a convex
set M is bounded and the projection of Rec(M + ) is closed.
that is, W is the projection of V onto the x-space. As V is a closed and bounded
(therefore compact) set, its projection W is compact as well (recall that the
image of a compact set under a continuous mapping is compact). Note that M is
nonempty and it satisfies M = W + K (why?). Then, M is the sum of a compact
set W and a closed set K, and thus M is closed itself (why?). Besides this, M is
convex (recall that the projection of a convex set is convex). Thus, the nonempty
closed convex set M satisfies M = W + K with nonempty bounded W and closed
cone K, implying by Theorem II.8.16 that K = Rec(M ).
Recall that we have investigated the relation between the recessive directions of
a closed convex set M ∈ Rnx and its closed convex representation M + ∈ Rnx × Rku
in Proposition II.8.17. In particular, we observed that while [hx ; hu ] ∈ Rec(M + )
implies hx ∈ Rec(M ), the recessive direction of M “stemming” from those of M +
can form a small part of Rec(M ), as seen in Example II.8.4.
A surprising (and not completely trivial) fact is that for polyhedral sets M ,
the projection of Rec(M + ) onto the x-plane is Rec(M ).
Then,
$$\mathrm{Rec}(M) = \left\{h_x : \exists h_u : [h_x; h_u] \in \mathrm{Rec}(M^+)\right\}$$
Definition II.8.20 [Dual cone] Let M ⊆ Rn be a cone. The set of all vectors
which have nonnegative inner products with all vectors from M , i.e., the set
$$M_* := \left\{a \in \mathbb{R}^n : a^\top x \ge 0,\ \forall x \in M\right\}, \tag{8.6}$$
is called the cone dual to M .
From its definition, it is clear that the dual cone M∗ of any cone M is a closed
cone.
Example II.8.5 The cone dual to the nonnegative orthant Rn+ is composed of
all n-dimensional vectors y making nonnegative inner products with all entrywise
nonnegative n-dimensional vectors x. As is immediately seen the vectors y with
this property are exactly entrywise nonnegative vectors: [Rn+ ]∗ = Rn+ . ♢
Note that in the preceding example, Rn+ is given by finitely many homogeneous
linear inequalities:
$$\mathbb{R}^n_+ = \left\{x \in \mathbb{R}^n : e_i^\top x \ge 0,\ i = 1,\dots,n\right\},$$
where the $e_i$ are the basic orths; and we observe that the dual cone is the conic hull
of these basic orths. This is indeed a special case of the following general fact:
Because $\widehat F$ is a closed cone (and so $0 \in \widehat F$), the right hand side infimum, being
finite, must be 0. Then, $g^\top f \ge 0$ for all $f \in \widehat F$ and $g^\top z < 0$. Since $f^\top g \ge 0$ for all
$f \in \widehat F$ and also $\widehat F \supseteq F$, we deduce $f^\top g \ge 0$ for all $f \in F$, that is, $g \in M$ by the
definition of M. But then the inclusion $g \in M$ together with $z \in M_*$ contradicts
the relation $z^\top g < 0$. Finally, we clearly have $f^\top x \ge 0$ for all $x \in F$ if and only
if $f^\top x \ge 0$ for all $x \in \mathrm{cl}\,\mathrm{Cone}(F)$.
we did not need to take the closure. This is because the conic hull of a finite set
F is polyhedrally representable and is therefore a polyhedral cone (by Theorem
I.3.2), and as such it is automatically closed.
This fact (i.e., no need to take the closure in Proposition II.8.21) holds true
for the dual of any polyhedral cone: consider the set $\{x \in \mathbb{R}^n : a_i^\top x \ge 0,\ i = 1,\dots,I\}$ given by finitely many homogeneous nonstrict linear inequalities. This
set is clearly a polyhedral cone, and its dual is the conic hull of the $a_i$'s, i.e., $\mathrm{Cone}(\{a_i : i = 1,\dots,I\}) = \left\{\sum_{i=1}^I \lambda_i a_i : \lambda \ge 0\right\}$. Moreover, this dual cone clearly is also
polyhedrally representable as
$$\mathrm{Cone}\{a_1,\dots,a_I\} = \left\{x \in \mathbb{R}^n : \exists \lambda \ge 0 : x = \sum_{i=1}^I \lambda_i a_i\right\},$$
R2+ , so K∗ = R2+ as well. Now observe that K∗ = R2+ is, as it should be by Propo-
sition II.8.21, the closure of Cone(F ), nevertheless K∗ = cl Cone(F ) is larger than
Cone(F ) as Cone(F ) is not closed! Note that Cone(F ) is precisely the set obtained
from R2+ by eliminating all nonzero points on the boundary of R2+ . ♢
Fact II.8.23 Let M be a closed cone in Rn , and let M∗ be the cone dual to
M . Then
(i) Duality does not distinguish between a cone and its closure: whenever
M = cl M ′ for a cone M ′ , we have M∗ = M∗′ .
(ii) Duality is symmetric: the cone dual to M∗ is M .
Proposition II.8.25 Let $M^1,\dots,M^k$ be cones in $\mathbb{R}^n$. Define $M := \bigcap_{i=1}^k M^i$,
and let $M_*$ be the dual cone of M. Let $M_*^i$ denote the dual cone of $M^i$, for
$i = 1,\dots,k$, and define $\widetilde M := M_*^1 + \dots + M_*^k$. Then, $M_* \supseteq \widetilde M$.
Moreover, if all the cones $M^1,\dots,M^k$ are closed, then $M_* = \mathrm{cl}\,\widetilde M$. In
particular, for closed cones $M^1,\dots,M^k$, $M_* = \widetilde M$ holds if and only if $\widetilde M$ is
closed.
Remark II.8.26 Note that in general $\widetilde M$ can be non-closed even when all the
cones $M^1,\dots,M^k$ are closed. Indeed, take $k = 2$, and let $M^1 = M_*^1$ be the
second-order cone $\{(x,y,z) \in \mathbb{R}^3 : z \ge \sqrt{x^2+y^2}\}$, and $M_*^2$ be the following ray
in $\mathbb{R}^3$:
$$\{(x,y,z) \in \mathbb{R}^3 : x = z,\ y = 0,\ x \le 0\}.$$
Observe that the points from $\widetilde M \equiv M_*^1 + M_*^2$ are exactly the points of the form
$(x-t,\ y,\ z-t)$ with $t \ge 0$ and $z \ge \sqrt{x^2+y^2}$. In particular, for any $\alpha > 0$, the points
$(0,\ 1,\ \sqrt{\alpha^2+1}-\alpha) = (\alpha-\alpha,\ 1,\ \sqrt{\alpha^2+1}-\alpha)$ belong to $\widetilde M$. As $\alpha \to \infty$, these points
converge to $\xi := (0, 1, 0)$, and thus $\xi \in \mathrm{cl}\,\widetilde M$. On the other hand, we clearly cannot
find $x, y, z, t$ with $t \ge 0$ and $z \ge \sqrt{x^2+y^2}$ such that $(x-t,\ y,\ z-t) = (0, 1, 0)$,
that is, $\xi \notin \widetilde M$. ■
Dubovitski-Milutin Lemma presents a simple sufficient condition for $\widetilde M$ to be
closed and thus to coincide with $M_*$:
Fact II.8.28 Let M ⊆ Rn be a cone and M∗ be its dual cone. Then, for any
x ∈ int M , there exists a properly selected Cx < ∞ such that
∥f ∥2 ≤ Cx f ⊤ x, ∀f ∈ M∗ .
Now, as explained in the beginning of Part (ii) of the above proof of Proposition II.8.27, we can assume without loss of generality that the cones $M^1,\dots,M^k$
satisfying the premise of the proposition are closed, and all we need to prove is
that the cone $M_*^1 + \dots + M_*^k$ is closed. The latter is the same as to verify that
whenever vectors $f_t^i \in M_*^i$, $i \le k$, $t = 1, 2, \dots$, are such that $f_t := \sum_{i=1}^k f_t^i \to h$
as $t \to \infty$, it holds that $h \in M_*^1 + \dots + M_*^k$. Indeed, in the situation in question,
selecting $\bar x \in M^k \cap \mathrm{int}\,M^1 \cap \dots \cap \mathrm{int}\,M^{k-1}$ (by the premise this intersection is
nonempty!) we have $\bar x^\top f_t^i \ge 0$ for all $i, t$ and $\sum_{i=1}^k \bar x^\top f_t^i \to \bar x^\top h$ as $t \to \infty$,
implying that for all $i \le k$ the sequences $\{\bar x^\top f_t^i\}_{t=1,2,\dots}$ are bounded. Moreover,
for any $i < k$ we have $\bar x \in \mathrm{int}\,M^i$ and $f_t^i \in M_*^i$, and so Fact II.8.28 guarantees
that the sequence $\{f_t^i\}_{t=1,2,\dots}$ is bounded. Thus, as the sequences $\{f_t^i\}_{t=1,2,\dots}$
are bounded for any $i < k$ and the sequence $\sum_{i=1}^k f_t^i$ has a limit as $t \to \infty$,
we conclude that the sequence $\{f_t^k\}_{t=1,2,\dots}$ is bounded as well. Hence, all k sequences
$\{f_t^i\}_{t=1,2,\dots}$ are bounded, so that, passing to a subsequence $t_1 < t_2 < \dots$,
we can assume that $f^i := \lim_{j\to\infty} f_{t_j}^i$ is well defined for every $i \le k$. Since
$f_t^i \in M_*^i$ and the cones $M_*^i$ are closed, we have $f^i \in M_*^i$ for all $i \le k$. Finally, as
$h = \lim_{t\to\infty}\sum_i f_t^i = \lim_{j\to\infty}\sum_i f_{t_j}^i = \sum_i f^i$, we conclude that $h \in M_*^1 + \dots + M_*^k$,
as claimed.
Example II.8.10 The simplest example of nontrivial closed and pointed cone
is the nonnegative orthant Rn+ . Based on our extreme direction definition, the
extreme directions of Rn+ should be the nonzero n-dimensional entrywise non-
negative vectors d such that whenever d = d1 + d2 with d1 ≥ 0 and d2 ≥ 0, both
d1 and d2 must be nonnegative multiples of d. Such a vector d has at all entries
nonnegative and at least one of them positive. In fact, the number of positive
entries in d is exactly one, since if there were at least two entries, say, d1 and
d2 , positive, we would have d = [d1 ; 0; . . . ; 0] + [0; d2 ; d3 ; . . . ; dn ] and both of the
vectors in the right hand side would be nonzero and not proportional to d. Thus,
any extreme direction of Rn+ must be a positive multiple of a basic orth. It is
immediately seen that every vector of the latter type is an extreme direction of
Rn+ . Hence, the extreme directions of Rn+ are positive multiples of the basic orths,
and the extreme rays of Rn+ are the nonnegative parts of the coordinate axes. ♢
We next introduce the concept of a base which is an important type of the
cross section of a nontrivial closed pointed cone. Moreover, we will see that a
base will be a compact convex set and will establish a direct connection between
the extreme rays of the underlying cone and extreme points of its base.
Figure II.2. Cone and its base (grey pentagon). Extreme rays of the cone are
OA, OB,. . . ,OE intersecting the base at its extreme points A, B, . . . , E.
Definition II.8.36 [Polar of a convex set] For any nonempty convex set
M ⊆ Rn , we define its polar [notation: Polar (M )] to be the set of all vectors
a ∈ Rn such that a⊤ x ≤ 1 for all x ∈ M , i.e.,
$$\mathrm{Polar}(M) := \left\{a \in \mathbb{R}^n : a^\top x \le 1,\ \forall x \in M\right\}.$$
For any nonempty convex set M , the following properties of its polar are evi-
dent:
1. 0 ∈ Polar (M );
2. Polar (M ) is convex;
3. Polar (M ) is closed.
Proof. Based on the evident properties of polars, all we need is to prove that if
M is closed and convex and 0 ∈ M , then M = Polar (Polar (M )). By definition,
for all x ∈ M and y ∈ Polar (M ), we have
y ⊤ x ≤ 1.
Thus, M ⊆ Polar (Polar (M )).
To prove that this inclusion is in fact equality, we assume for contradiction that
there exists x̄ ∈ Polar (Polar (M )) \ M . Since M is a nonempty closed convex set
and x̄ ̸∈ M , the point x̄ can be strongly separated from M (Separation Theorem
(ii)). Thus, there exists b ∈ Rn such that
$$b^\top\bar x > \sup_{x\in M} b^\top x.$$
As 0 ∈ M, we deduce $b^\top\bar x > 0$. Passing from b to a proportional vector a = λb
with appropriately chosen positive λ, we may ensure that
$$a^\top\bar x > 1 \ge \sup_{x\in M} a^\top x.$$
From the relation $1 \ge \sup_{x\in M} a^\top x$ we conclude that $a \in \mathrm{Polar}(M)$. But then the
relation $a^\top\bar x > 1$ contradicts the assumption that $\bar x \in \mathrm{Polar}(\mathrm{Polar}(M))$. Hence,
we conclude that indeed M = Polar (Polar (M )).
We close this section with a few important properties of the polars.
The “only if” part: let x be an extreme point of M, and define the sets $I := \{i : a_i^\top x = b_i\}$ as the set of indices of active constraints and $F := \{a_i : i \in I\}$ as
the set of vectors of active constraints. We will prove that the set F contains n
linearly independent vectors, i.e., Lin(F ) = Rn . Assume for contradiction that
this is not the case. Then, as dim(F ⊥ ) = n − dim(Lin(F )), we deduce dim(F ⊥ ) >
0 and so there exists a nonzero vector d ∈ F ⊥ . Consider the segment ∆ϵ :=
[x − ϵd, x + ϵd], where ϵ > 0 will be the parameter of our construction. Since
d is orthogonal to the “active” vectors ai (those with i ∈ I), all points y ∈ ∆ϵ
satisfy the relations $a_i^\top y = a_i^\top x = b_i$. Now, if i is a “nonactive” index (one with
$a_i^\top x < b_i$), then $a_i^\top y \le b_i$ for all $y \in \Delta_\epsilon$, provided that ϵ is small enough. Since
there are finitely many nonactive indices, we can choose ϵ > 0 in such a way that
all $y \in \Delta_\epsilon$ will satisfy all “nonactive” inequalities $a_i^\top y \le b_i$, $i \notin I$, as well. So,
we conclude that for the above choice of ϵ > 0 we get ∆ϵ ⊆ M . But, this is a
contradiction to x being an extreme point of M as we have expressed x as the
midpoint of a nontrivial segment ∆ϵ (recall that ϵ > 0 and d ̸= 0).
To prove the “if” part, we assume that x ∈ M is such that among the in-
equalities a⊤i x ≤ bi which are active at x there are n linearly independent ones.
Without loss of generality, we assume that the indices of these linearly indepen-
dent equations are 1, . . . , n. Given this, we will prove that x is an extreme point
of M. Assume for contradiction that x is not an extreme point. Then, there exists
a vector $d \ne 0$ such that $x \pm d \in M$. In other words, for $i = 1,\dots,n$ we would
have $b_i \ge a_i^\top(x \pm d) \equiv b_i \pm a_i^\top d$ (where the last equivalence follows from $a_i^\top x = b_i$
for all $i \in I = \{1,\dots,n\}$), which is possible only if $a_i^\top d = 0$, $i = 1,\dots,n$. But
the only vector which is orthogonal to n linearly independent vectors in $\mathbb{R}^n$ is
the zero vector (why?), and so we get d = 0, which contradicts the assumption
$d \ne 0$.
Theorem II.9.1 states that at every extreme point of a polyhedral set M =
{x ∈ Rn : Ax ≤ b} we must have n linearly independent constraints from Ax ≤ b
holding as equalities. Since a system of n linearly independent equality constraints
in n unknowns has a unique solution, such a system can specify at most one
extreme point of M (exactly one, when the (unique!) solution to the system
satisfies the remaining constraints in the system Ax ≤ b). Moreover, when M
is defined by m inequality constraints, the number of such systems, and thus
the number of extreme points of M , does not exceed the number $\binom{m}{n}$ of n × n
submatrices of the matrix A ∈ Rm×n . Hence, we arrive at the following corollary.
Corollary II.9.2 Every polyhedral set has finitely many extreme points.
Recall that there are nonempty polyhedral sets which do not have any extreme
points; these are precisely the ones that contain lines.
Note that $\binom{m}{n}$ is nothing but an upper (and typically very conservative) bound
on the number of extreme points of a polyhedral set in Rn defined by m inequality
constraints. This is because some n × n submatrices of A can be singular, and
what is more important, the majority of the nonsingular ones typically produce
“candidate” points which do not satisfy the remaining inequalities defining M .
Remark II.9.3 Historically, Theorem II.9.1 has been instrumental in developing
an algorithm to solve linear programs, namely the Simplex method. Let us consider
an LP in standard form
minn c⊤ x : P x = p, x ≥ 0 ,
x∈R
where P ∈ Rk×n . Note that we can convert any given LP to this form by adding
a small number of new variables and constraints if needed. In the context of this
LP, Theorem II.9.1 states that the extreme points of the feasible set are exactly
the basic feasible solutions of the system P x = p, i.e., nonnegative vectors x such
that P x = p and the set of columns of P associated with positive entries of x is
linearly independent. As the feasible set of an LP in standard form clearly does
not contain lines (note the constraints x ≥ 0 which restricts the standard form LP
domain to be subset of the pointed cone Rn+ ), among its optimal solutions (if they
exist) at least one is an extreme point of the feasible set (Theorem II.9.12(ii)).
This then suggests a simple algorithm to solve a solvable LP in standard form: go
through the finite set of all extreme points of the feasible set (or equivalently all
basic feasible solutions) and choose the one with the best objective value. This
algorithm allows one to find an optimal solution in finitely many arithmetic operations,
provided that the LP is solvable, and underlies the basic idea for the Sim-
plex method. As one will immediately recognize, the number of extreme points,
although finite, may be quite large. The Simplex method operates in a smarter
way and examines only a subset of the basic feasible solutions in an organized
way and can handle other issues such as infeasibility and unboundedness.
Another useful consequence of Theorem II.9.1 is that if all the data in an LP
are rational, then every one of its extreme points is a vector with rational entries.
Thus, a solvable standard form LP with rational data has at least one rational
optimal solution. ■
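The brute-force procedure sketched in the remark is easy to implement for small instances; the following Python sketch (ours, not an account of the actual Simplex method) lists the candidate points of Theorem II.9.1 and keeps those that are feasible, i.e., the extreme points of {x : Ax ≤ b}:

```python
import numpy as np
from itertools import combinations

def extreme_points(A, b, tol=1e-9):
    """Enumerate extreme points of {x : Ax <= b} by solving every nonsingular
    n x n subsystem of constraints and keeping the feasible solutions."""
    m, n = A.shape
    pts = []
    for rows in combinations(range(m), n):
        sub = A[list(rows)]
        if abs(np.linalg.det(sub)) < tol:
            continue                       # singular subsystem: no candidate point
        x = np.linalg.solve(sub, b[list(rows)])
        if np.all(A @ x <= b + tol):
            pts.append(x)
    return pts

# the unit square 0 <= x_1, x_2 <= 1 has exactly its four vertices as extreme points
A = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
b = np.array([1.0, 0.0, 1.0, 0.0])
print(extreme_points(A, b))
```

Minimizing c⊤x over the returned points is exactly the naive finite algorithm described above; it is exponential in n and only practical for tiny instances.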
Theorem II.9.1 has further important consequences in terms of sizes of extreme
points of polyhedral sets as well.
To this end, let us first recall a simple fact from Linear Algebra:
$$\|x(b)\|_2 = \|\widehat x(\widehat b)\|_2 = \|\widehat A^{-1}\widehat b\|_2 \le \|\widehat A^{-1}\|\,\|\widehat b\|_2 \le \|\widehat A^{-1}\|\,\|b\|_2,$$
where $\|\widehat A^{-1}\|$ is the spectral norm of $\widehat A^{-1}$. Then, setting $C(A) = \|\widehat A^{-1}\|$ concludes
the proof.
Surprisingly, a similar result holds for the solutions of systems of linear inequal-
ities as well.
Proof. This proof is quite similar to the one for Proposition II.9.4. Let r :=
rank(A). The case of r = 0 is trivial – in this case A = 0, and whenever the system Ax ≤ b
is feasible, it has the solution x = 0. When r > 0, we can assume without loss of
generality that the first r columns in A are linearly independent. Let $\widehat A \in \mathbb{R}^{m\times r}$
be the submatrix of A obtained from the first r columns of A. As $\widehat A$ has all the
linearly independent columns of A, the image spaces of A and $\widehat A$ are the same.
Thus, the system Ax ≤ b is feasible if and only if the system $\widehat A u \le b$ is feasible.
Moreover, given any feasible solution $u \in \mathbb{R}^r$ to $\widehat A u \le b$, we can generate a feasible
solution $x := [u; 0] \in \mathbb{R}^n$ by adding n − r zeros at the end and still preserve the
norm of the solution. Hence, without loss of generality we can assume that the
columns of A are linearly independent and r = n.
As A ∈ Rm×n has n linearly independent columns and each column is a vector
in Rm , we deduce that m ≥ n and {u : Au = 0} = {0}. Thus, we conclude
that the polyhedral set {x : Ax ≤ b} does not contain lines. Therefore, when
nonempty, by Krein-Milman Theorem this polyhedral set has an extreme point.
Let us take this point as x(b). Then, by Theorem II.9.1, at least n of the inequality
constraints from the system Ax ≤ b will be active at x(b) and out of these active
constraints there will be n vectors ai (corresponding to rows of the matrix A)
that are linearly independent. That is, $\widehat A x(b) = \widehat b$ holds for a certain nonsingular
n × n submatrix $\widehat A$ of A (with $\widehat b$ the corresponding subvector of b). So, we conclude $\|x(b)\|_2 \le \|\widehat A^{-1}\|\,\|b\|_2$. Since the number
of r × r nonsingular submatrices in A is finite, the maximum C(A) of the spectral
norms of the inverses of these submatrices is finite as well, and, as we have seen,
for every b for which the system Ax ≤ b is feasible, it has a solution x(b) with
∥x(b)∥2 ≤ C(A)∥b∥2 , as claimed.
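For small matrices, the constant C(A) appearing in this proof can be computed literally as defined, by brute force over the nonsingular n × n submatrices (a sketch of ours, for illustration only):

```python
import numpy as np
from itertools import combinations

def C_of_A(A, tol=1e-12):
    """Max spectral norm of the inverses of the nonsingular n x n submatrices of A."""
    m, n = A.shape
    best = 0.0
    for rows in combinations(range(m), n):
        sub = A[list(rows)]
        if abs(np.linalg.det(sub)) > tol:
            best = max(best, np.linalg.norm(np.linalg.inv(sub), 2))
    return best
```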
The set of extreme points of X is precisely the set of vectors with entries 0 and
1 which have exactly k entries equal to 1. That is,
$$\mathrm{Ext}(X) = \left\{x \in \{0,1\}^n : \sum_{i=1}^n x_i = k\right\}.$$
In particular, the extreme points of the “flat (a.k.a. probabilistic) simplex”
$\{x \in \mathbb{R}^n_+ : \sum_{i=1}^n x_i = 1\}$ are the basic orths (see Figure II.3 for an illustration
The set of extreme points of X is precisely the set of vectors with entries 0 and
1 which have at most k entries equal to 1. That is,
$$\mathrm{Ext}(X) = \left\{x \in \{0,1\}^n : \sum_{i=1}^n x_i \le k\right\}.$$
Justification of this example follows the one of Example II.9.1 and is left as an
exercise to the reader. ♢
Example II.9.3 Suppose k, n are integers satisfying 0 ≤ k ≤ n. Consider the
polytope
X = {x ∈ Rn : |xi | ≤ 1, ∀i ≤ n, Σ_{i=1}^n |xi | ≤ k} .
Extreme points of X are exactly the vectors with entries 0, 1, −1 which have
exactly k nonzero entries. That is,
Ext(X) = {x ∈ {−1, 0, 1}n : Σ_{i=1}^n |xi | = k} .
The set Πn is also called the Assignment (or balanced matching) polytope. As Πn
is a polytope, by Krein-Milman Theorem, Πn is the convex hull of its extreme
points. What are these extreme points? The answer is given by the following
fundamental result.
the case. So, the set {f } ∪ {ai : i ∈ I} must have n linearly independent vectors.
Then, by Theorem II.9.1 xd must be an extreme point of B. Once again, using
Fact II.8.33(iv) we conclude that d must be an extreme direction of M .
Analogous to Corollary II.9.2, we have the following immediate corollary of
this theorem.
the points from R. The set M (V, R) clearly is convex as it is the arithmetic sum
of two convex sets Conv(V ) and Cone(R) (recall our convention that Cone(∅) =
{0}, see Fact I.1.20). We are now ready to present the promised inner description
of a polyhedral set.
Remark II.9.11 We will see in section 9.3.3 that the inner characterization of
the polyhedral sets given in Theorem II.9.10 can be made much more precise.
Suppose that we are given a nonempty polyhedral set M . Then, we can select an
inner characterization of it in the form of M = Conv(V ) + Cone(R) with finite V
and finite R, where the “conic” part Cone(R) (not the set R itself!) is uniquely
defined by M ; in fact it will always hold that Cone(R) = Rec(M ), i.e., R can
be taken as the generators of the recessive cone of M (see Comment to Lemma
II.8.8). Moreover, if M does not contain lines, then V can be chosen as the set of
all extreme points of M . ■
We will prove Theorem II.9.10 in section 9.3.3. Before proceeding with its proof,
let us understand why this theorem is so important, i.e., why it is so nice to know
both inner and outer descriptions of a polyhedral set.
Consider the following natural questions:
• A. Is it true that the inverse image of a polyhedral set M ⊂ Rn under an affine
mapping y 7→ P(y) = P y + p : Rm → Rn , i.e., the set
P −1 (M ) = {y ∈ Rm : P y + p ∈ M }
is polyhedral?
• B. Is it true that the image of a polyhedral set M ⊂ Rn under an affine mapping
x 7→ y = P(x) = P x + p : Rn → Rm – the set
P(M ) = {P x + p : x ∈ M }
is polyhedral?
• C. Is it true that the intersection of two polyhedral sets is again a polyhedral
set?
• D. Is it true that the arithmetic sum of two polyhedral sets is again a polyhedral
set?
The answers to all these questions are positive; one way to see it is to use calculus
of polyhedral representations along with the fact that polyhedrally representable
sets are exactly the same as polyhedral sets (see chapter 3). Another very in-
structive way is to use the just outlined results on the structure of polyhedral
sets, which we will do now.
and the right hand side expression is nothing but a convex combination of points
from V + V ′ .
We see that Opt(P ) is finite if and only if inf r {c⊤ r : r ∈ Cone(R)} > −∞, and
the latter clearly is the case if and only if c⊤ r ≥ 0 for all r ∈ R. Then, in
such a case inf r {c⊤ r : r ∈ Cone(R)} = 0, and also minv {c⊤ v : v ∈ Conv(V )} =
minv {c⊤ v : v ∈ V }.
(ii): The first claim in (ii) is an immediate byproduct of the proof of (i). The
second claim follows from the fact that when M does not contain lines, we can
take V = Ext(M ), see Remark II.9.11.
In this section, we will prove Theorem II.9.10. To simplify our language let
us call VR-sets (“V” from “vertex” and “R” from “rays”) the sets of the form
M (V, R), and P-sets the nonempty polyhedral sets, i.e., the nonempty sets defined
by finitely many nonstrict linear inequalities. We need to prove that every P-set
is a VR-set, and vice versa.
VR =⇒ P: This is immediate: a VR-set is nonempty and polyhedrally repre-
sentable (why?) and thus is a nonempty P-set by Theorem I.3.2.
P =⇒ VR:
Let M ̸= ∅ be a P-set, so that M is the set of all solutions to a feasible system
of linear inequalities:
M = {x ∈ Rn : Ax ≤ b} , (9.2)
where A ∈ Rm×n .
We will first study the case of P-sets that do not contain lines, and then reduce
the general case to this one.
Proof. As M is a nonempty closed convex set that does not contain lines, by
Theorem II.8.6(i) we know Ext(M ) ̸= ∅, and by Theorem II.8.16 we have M =
Conv(Ext(M )) + Rec(M ). Moreover, by Corollary II.9.2, we have Ext(M ) is a
finite set.
If M is bounded, then Rec(M ) = {0}, and thus the result follows. Suppose M
is unbounded. Then, Rec(M ) is nontrivial. Also, Rec(M ) is pointed: since M does
not contain lines, Rec(M ) does not contain lines either. Moreover,
from Fact II.8.15, we deduce that Rec(M ) = {h ∈ Rn : Ah ≤ 0} and thus is
a polyhedral cone. Then, by Corollary II.9.9 we have that Rec(M ) has finitely
many extreme rays and Rec(M ) is the sum of its extreme rays.
Next, we study the case when M contains a line. We start with the following
observation.
t ∈ R) if and only if h ∈ KerA. That is, the nonzero vectors from KerA are
exactly the directions of lines contained in M .
First, note that as M ̸= ∅ we have M ′ ̸= ∅, and also the set M ′ clearly does
not contain lines. This is because if h ̸= 0 is the direction of a line satisfying
x + th ∈ M ′ for all t ∈ R and some x ∈ M ′ , by definition of M ′ we must have
x + th ∈ L⊥ for all t and thus h ∈ L⊥ . On the other hand, by Lemma II.9.14,
we must also have h ∈ KerA = L. Then, h ∈ L ∩ L⊥ implies h = 0, which is a
contradiction.
Now, note that M ′ ̸= ∅ satisfies M = M ′ + L. Indeed, M ′ contains the or-
thogonal projections of all points from M onto L⊥ (since to project a point onto
L⊥ , you should move from this point along a certain line with a direction from
L, and all these movements, started in M , keep you in M by Lemma II.9.14) and
therefore is nonempty, first, and is such that M ′ + L ⊇ M , second. On the other
hand, M ′ ⊆ M and M + L = M (by Lemma II.9.14), and so M ′ + L ⊆ M . Thus,
M′ + L = M.
Finally, it is clear that M ′ is a polyhedral set as the inclusion x ∈ L⊥ can be
represented by dim(L) linear equations (i.e., by 2 dim(L) nonstrict linear inequal-
ities). To this end, all we need is a set of vectors ξ1 , . . . , ξdim(L) forming a basis in
L, and then L⊥ := {x ∈ Rn : ξi⊤ x = 0, ∀i = 1, . . . , dim(L)}.
Therefore, with these steps, given an arbitrary nonempty P-set M , we have
represented it as the sum of a P-set M ′ which does not contain lines and a linear
subspace L. Then, as M ′ does not contain lines, by Theorem II.9.13, we have
M ′ = M (V ′ , R′ ) where V ′ is the nonempty set of extreme points of M ′ and R′
is the set of extreme rays of Rec(M ′ ). Let us define R′′ to be the finite set of
generators for L, i.e., L = Cone(R′′ ). Then, we arrive at
M = M′ + L
= [Conv(V ′ ) + Cone(R′ )] + Cone(R′′ )
= Conv(V ′ ) + [Cone(R′ ) + Cone(R′′ )]
= Conv(V ′ ) + Cone(R′ ∪ R′′ )
= M (V ′ , R′ ∪ R′′ ).
Thus, this proves that a P-set is indeed a VR-set, as desired.
Finally, let us justify Remark II.9.11. Suppose we are given M = Conv(V ) +
Cone(R) with finite sets V, R and V ̸= ∅. Justifying the first claim in this re-
mark requires us to show that Cone(R) = Rec(M ). To see this, from M =
9.4 ⋆ Majorization
In this section we will introduce and study the Majorization Principle, which
describes the convex hull of permutations of a given vector.
For any x ∈ Rn , we define X[x] to be the set of all convex combinations of n!
vectors obtained from x by permuting its coordinates. That is,
X[x] := Conv ({P x : P is an n × n permutation matrix})
= {Dx : D ∈ Πn } ,
where Πn is the set of all n × n doubly stochastic matrices. Here, the equality is
due to the Birkhoff-von Neumann Theorem. Note that X[x] is a permutationally
symmetric set, that is, given any vector from the set, the vector obtained by
permuting its entries is also in the set.
For any k, define Ik to be the family of all k-element subsets of the index set
{1, 2, . . . , n}, and so
sk (y) = max_{I∈Ik} Σ_{i∈I} yi .   (9.4)
xσ := [xσ(1) ; . . . ; xσ(n) ].
where the inequality is due to (9.4). Maximizing both sides of this inequality over
I ∈ Ik and invoking (9.4) once again, we get sk (y) ≤ sk (x) for all k ≤ n. In
addition,
Σ_{i=1}^n yi = Σ_{i=1}^n Σ_σ λσ xσ(i) = Σ_σ λσ Σ_{i=1}^n xσ(i) = Σ_σ λσ Σ_{i=1}^n xi = Σ_{i=1}^n xi ,
c⊤ y > max_{z∈X[x]} c⊤ z.
As the set X[x] is permutationally symmetric and the vector y is ordered, without
loss of generality we can select the vector c to be ordered as well. This is because
permuting the entries of c preserves max_{z∈X[x]} c⊤ z, while arranging the entries
of c in non-increasing order does not decrease c⊤ y: assuming, say, that c1 < c2 ,
swapping c1 and c2 does not decrease c⊤ y, since [c2 y1 + c1 y2 ] − [c1 y1 + c2 y2 ] =
[c2 − c1 ][y1 − y2 ] ≥ 0. Next, by Abel’s formula (the discrete analog of integration by parts)
we have
c⊤ y = Σ_{i=1}^n ci yi = Σ_{i=1}^{n−1} (ci − ci+1 ) Σ_{j=1}^i yj + cn Σ_{j=1}^n yj
     = Σ_{i=1}^{n−1} (ci − ci+1 ) si (y) + cn sn (y)
     ≤ Σ_{i=1}^{n−1} (ci − ci+1 ) si (x) + cn sn (x) = Σ_{i=1}^n ci xi = c⊤ x.
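The necessity part established above (if y ∈ X[x], then the total sums of entries of x and y coincide and sk (y) ≤ sk (x) for all k) can be checked numerically. The following minimal sketch assumes Python with NumPy; the doubly stochastic matrix is built as a convex combination of random permutation matrices, and all data are placeholders.

import numpy as np

def s(z, k):
    """Sum of the k largest entries of z."""
    return np.sort(z)[::-1][:k].sum()

rng = np.random.default_rng(2)
n = 6
x = rng.standard_normal(n)
lam = rng.random(10); lam /= lam.sum()                     # convex combination weights
D = sum(l * np.eye(n)[rng.permutation(n)] for l in lam)    # doubly stochastic matrix
y = D @ x

assert abs(x.sum() - y.sum()) < 1e-9
assert all(s(y, k) <= s(x, k) + 1e-9 for k in range(1, n + 1))
print("y = Dx satisfies s_k(y) <= s_k(x) for all k and has the same total sum as x")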
10.1 Separation
Exercise II.1 Mark by ”Y” those of the below listed cases where the linear form f ⊤ x separates
the sets S and T :
• S = {0} ⊂ R, T = {0} ⊂ R, f ⊤ x = x
• S = {0} ⊂ R, T = [0, 1] ⊂ R, f ⊤ x = x
• S = {0} ⊂ R, T = [−1, 1] ⊂ R, f ⊤ x = x
• S = {x ∈ R3 : x1 = x2 = x3 }, T = {x ∈ R3 : x3 ≥ √(x1² + x2²)}, f ⊤ x = x1 − x2
• S = {x ∈ R3 : x1 = x2 = x3 }, T = {x ∈ R3 : x3 ≥ √(x1² + x2²)}, f ⊤ x = x3 − x2
S = {x ∈ R3 :
Is it possible to separate this set from the set {x1 = x2 = . . . = x2004 ≤ 0}? If yes, what could
be a separating plane?
Exercise II.3 Can the sets S = {x ∈ R2 : x1 > 0, x2 ≥ 1/x1 } and T = {x ∈ R2 : x1 < 0, x2 ≥
−1/x1 } be separated? Can they be strongly separated?
Exercise II.4 ♦ Let M ⊂ Rn be a nonempty closed convex set. The metric projection
ProjM (x) of a point x ∈ Rn onto M is the ∥ · ∥2 -closest to x point of M , so that
1. Prove that for every x ∈ Rn the minimum in the right hand side of (∗) is achieved, and x+
is a minimizer if and only if
Derive from the latter fact that the minimum in (∗) is achieved at a unique point, the bottom
line being that ProjM (·) is well defined.
2. Prove that when passing from a point x ∈ Rn to its metric projection x+ = ProjM (x), the
distance to any point of M does not increase, specifically,
3. Let x ̸∈ M , so that, denoting x+ = ProjM (x), the vector e = (x − x+ )/∥x − x+ ∥2 is well defined. Prove
that the linear form e⊤ z strongly separates x and M , specifically,
∀y ∈ M : e⊤ y ≤ e⊤ x − dist(x, M ).
Note: The fact just outlined underlies an alternative proof of Separation Theorem, where
the first step is to prove that a point outside a nonempty closed convex set can be strongly
separated from the set. In our proof, the first step was similar, but with M restricted to be
polyhedral, rather than merely convex and closed.
4. Prove that the mapping x 7→ ProjM (x) : Rn → M is a contraction in ∥ · ∥2 :
Exercise II.11 ♦ Looking at the sets of extreme points of closed convex sets like the unit
Euclidean ball, a polytope, the paraboloid {[x; t] : t ≥ x⊤ x}, etc., we see that these sets are
closed. Do you think this always is the case? Is it true that the set Ext(M ) of extreme points of
a closed convex set M always is closed ?
Exercise II.12 ▲ Derive representation (∗) in Exercise I.29 from Example II.9.1 in section
9.1.1.
Exercise II.13 ♦ By Birkhoff Theorem, the extreme points of the polytope Πn = {[xij ] ∈
Rn×n : xij ≥ 0, Σi xij = 1 ∀j, Σj xij = 1 ∀i} are exactly the Boolean (i.e., with entries 0
and 1) matrices from this set. Prove that the same holds true for the “polytope of sub-doubly
stochastic” matrices Πm,n = {[xij ] ∈ Rm×n : xij ≥ 0, Σi xij ≤ 1 ∀j, Σj xij ≤ 1 ∀i}.
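The Birkhoff Theorem cited in Exercise II.13 can also be observed experimentally: maximizing a generic linear form over Πn returns an extreme point, i.e., a permutation matrix. Here is a minimal sketch assuming Python with NumPy and SciPy; n and the objective are placeholders.

import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(3)
n = 5
C = rng.standard_normal((n, n))          # generic objective: maximize sum_ij C_ij x_ij
A_eq = np.zeros((2 * n, n * n))          # row sums and column sums of X equal 1
for i in range(n):
    A_eq[i, i * n:(i + 1) * n] = 1       # i-th row sum
    A_eq[n + i, i::n] = 1                # i-th column sum
b_eq = np.ones(2 * n)
res = linprog(-C.ravel(), A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * (n * n), method="highs")
X = res.x.reshape(n, n)
assert np.all((X < 1e-6) | (X > 1 - 1e-6))   # a 0/1 matrix, i.e., a permutation matrix
print(np.round(X).astype(int))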
Exercise II.14 ♦ [Follow-up to Exercise II.13] Let m, n be two positive integers with m ≤ n,
and Xm,n be the set of m × n matrices [xij ] with Σi |xij | ≤ 1 for all j ≤ n and Σj |xij | ≤ 1
for all i ≤ m. Describe the set Ext(Xm,n ). To get an educated guess, look at the matrices
[1, 0, 0; 0, 0, −1] ,  [0, 0, 0; 0, 0, −1] ,  [0.5, −0.5, 0; −0.5, 0.5, 0]
from X2,3 .
2. Is the converse of 1) true, i.e., is it true that every extreme point x̄ of M is the unique
maximizer, over x ∈ M , of a properly selected linear form?
Exercise II.19 Identify and justify the correct claims in the following list:
1. Let X ⊂ Rn be a nonempty closed convex set, P be an m × n matrix, Y = P X := {P x :
x ∈ X} ⊂ Rm , and Ȳ be the closure of Y . Then
• For every x ∈ Ext(X), P x ∈ Ext(Y )
• Every extreme point of Ȳ which happens to belong to Y is P x for some x ∈ Ext(X)
• When X does not contain lines, then every extreme point of Ȳ which happens to belong
to Y is P x for some x ∈ Ext(X)
2. Let X, Y be nonempty closed convex sets in Rn , and let Z = X + Y , Z̄ = cl Z. Then
• If w ∈ Ext(Z̄) ∩ Z, then w = x + y for some x ∈ Ext(X) and y ∈ Ext(Y ).
• If x ∈ Ext(X), y ∈ Ext(Y ), then x + y ∈ Ext(Z).
Exercise II.20 ♦ [faces of polyhedral set] Let X = {x ∈ Rn : a⊤ i x ≤ bi , i ≤ m} be a nonempty
polyhedral set and f ⊤ x be a linear form of x ∈ Rn which is bounded above on X:
Prove that
1. Opt(f ) is achieved – the set Argmax_{x∈X} f ⊤ x := {x ∈ X : f ⊤ x = Opt(f )} is nonempty.
2. The set Argmax_{x∈X} f ⊤ x is as follows: there exists an index set I ⊂ {1, 2, . . . , m}, perhaps empty,
such that
Argmax_{x∈X} f ⊤ x = XI := {x : a⊤i x ≤ bi ∀i, a⊤i x = bi ∀i ∈ I}
The extreme points of Πd,n are exactly the block matrices [X ij ]i,j≤n as follows: for a certain n × n
permutation matrix P and unit vectors eij ∈ Rd , one has
X ij = Pij eij e⊤ij ∀i, j.
Exercise II.25 ▲ Let k, n be positive integers with k ≤ n, and let sk (λ) for λ ∈ Rn be
the sum of the k largest entries in λ. From the description of the extreme points of the polytope
X = {x ∈ Rn : 0 ≤ xi ≤ 1, i ≤ n, Σ_{i=1}^n xi ≤ k}, see Example II.9.2 in section 9.1.1, it follows
that when λ ∈ Rn+ , then
max_{x∈X} Σ_{i=1}^n λi xi = sk (λ).
(!) Let X ⊂ RN be a nonempty closed convex set such that X ⊂ V + Rec(X) for some
bounded and closed set V , let x 7→ A(x) = Ax + b : RN → Rn be an affine mapping, and let
Y = A(X) := {y : ∃x ∈ X : y = A(x)} be the image of X under this mapping. Let also
K = {h ∈ Rn : ∃g ∈ Rec(X) : h = Ag}.
Then the recessive cone of the closure Ȳ of Y is the closure K̄ of K. In particular, when K is
closed (as definitely is the case when Rec(X) is polyhedral), it holds Rec(Ȳ ) = K.
Exercise II.34 ♦ [follow-up to Exercise II.33]
1. Let K1 ⊂ Rn , K2 ⊂ Rn be closed cones, and let K = K1 + K2 .
Exercise II.42 Prove the easy part of Theorem II.9.7, specifically, that every n×n permutation
matrix is an extreme point of the polytope Πn of n × n doubly stochastic matrices.
Exercise II.43 ♦ [robust LP] Consider an uncertain Linear Programming problem – a family
min_{x∈Rn} { c⊤ x : [A + Σ_{ν=1}^N ζν ∆ν ]x ≤ b + Σ_{ν=1}^N ζν δν } : ζ ∈ Z   (10.3)
(RC) is not an LP program – it has finitely many decision variables and an infinite (when Z is
”massive”) system of linear constraints on these variables. Optimization problems of this type
are called semi-infinite and are, in general, difficult to solve. However, the RC of an uncertain
LP is easy, provided that Z is a “computation-friendly” set, for example, a nonempty set given
by a polyhedral representation:
Z = {ζ : ∃u : P ζ + Qu ≤ r} (10.4)
a⊤ x ≤ b   (1)
with uncertain data a ∈ Rn (b is certain) varying in the set
U = {a : |ai − a∗i |/δi ≤ 1, 1 ≤ i ≤ n, Σ_{i=1}^n |ai − a∗i |/δi ≤ k}   (2)
where a∗i are given “nominal data,” δi > 0 are given quantities, and k ≤ n is an integer (in
literature, this is called “budgeted uncertainty”). Rewrite the Robust Counterpart
a⊤ x ≤ b ∀a ∈ U (RC)
in a tractable LO form (that is, write down an explicit system (S) of linear inequalities in
variables x and additional variables such that x satisfies (RC) if and only if x can be extended
to a feasible solution of (S)).
where
• t is time, ω is the frequency,
• r is the distance from A to the origin O, d is the distance from P to the origin, ϕ ∈ [0, π] is
the angle between the directions OP and OA,
• α and θ are responsible for how the oscillator is actuated.
The difference between the left and the right hand sides in (∗) is of order of r−2 and in all our
subsequent considerations can be completely ignored.
It is convenient to assemble α and θ into the actuation weight – the complex number w = αeıθ
(ı is the imaginary unit); with this convention, we have
where ℜ[·] stands for the real part of a complex number. The complex-valued function DP (ϕ) :
[0, π] → C, called the diagram of the oscillator, is responsible for the directional density of
the energy emitted by the oscillator: when evaluated at a certain 3D direction ⃗e, this density
is proportional to |DP (ϕ)|2 , where ϕ is the angle between the direction ⃗e and the direction
OP . Physics says that when our transmitting antenna is composed of K harmonic oscillators
located at points P1 , . . . , PK and actuated with weights w1 , . . . , wK , the directional density of the
energy transmitted by the resulting antenna array, as evaluated at a direction ⃗e, is proportional
to |Σk wk Dk (ϕk (⃗e))|2 , where ϕk (⃗e) is the angle between the directions ⃗e and OPk .
Consider the design problem as follows. We are given linear array of K oscillators placed
at the points Pk = (k − 1)δe, k ≤ K, where e is the first basic orth (that is, the unit vector
“looking” along the positive direction of the x-axis), and δ > 0 is a given distance between
consecutive oscillators. Our goal is to specify actuation weights wk , k ≤ K, in order to send as
much of total energy as possible along the directions which make at most a given angle γ with
e. To this end, we intend to act as follows:
We want to select actuation weights wk , k ≤ K, in such a way that the magnitude |Dw (ϕ)| of
the complex-valued function
Dw (ϕ) = Σ_{k=1}^K wk e^{2πı(k−1)δ cos(ϕ)}
over w.
To get a computation-friendly version of this problem, we replace the full range [0, π] of values
of ϕ with M -point equidistant grid
Γ = {ϕℓ = ℓπ/(M − 1) : 0 ≤ ℓ ≤ M − 1},
thus converting our design problem into the optimization problem
– up to discretization of ϕ, this is the ratio of the energy emitted in the “cone of interest”
(i.e., along the directions making angle at most γ with e) to the total emitted energy. Factors
sin(ϕℓ ) reflect the fact that when computing the energy emitted in a spatial cone, we should
integrate |D(·)|2 over the part of the unit sphere in 3D cut off from the sphere by the cone.
2. Now note that “in reality” the optimal weights wkn , k ≤ K are used to actuate physical
devices and as such cannot be implemented with the same 16-digit accuracy with which they
are computed; they definitely will be subject to small implementation errors. We can model
these errors by assuming that the “real life” diagram is
D(ϕ) = Σ_{k=1}^K wkn (1 + ρξk ) e^{2πı(k−1)δ cos(ϕ)}
where ρ ≥ 0 is some (perhaps small) perturbation level and ξk ∈ C are “primitive” per-
turbations responsible for the implementation errors and running through the unit disk
{ξ : |ξ| ≤ 1}. It is not a great sin to assume that ξk are independent across k random
variables uniformly distributed on the unit circumference in C. Now the diagram becomes
random and can violate the constraints of (P ) , unless ρ = 0; in the latter case, the diagram
is the “nominal” one given by the optimal weights wn , so that it satisfies the constraints of
(P ) with t set to Optn .
Now, what happens when ρ > 0? In this case, the diagram D(·) and its deviation v from
the prescribed value 1 at the origin, its sidelobe level l = maxℓ:ϕℓ >γ |D(ϕℓ )|, and energy
concentration become random. A crucial “real life” question is how large are “typical values”
of these quantities. To get an impression of what happens, you are asked to carry out the
numerical experiment as follows:
• select perturbation level ρ ∈ {10−ℓ , 1 ≤ ℓ ≤ 6}
• for the selected ρ, simulate and plot 100 realizations of the modulus of the actual diagram,
and find the empirical averages v̄ of v, l̄ of l, and C̄ of C.
3. Apply Robust Optimization methodology from Exercise II.43 to build “immunized against
implementation errors” solution to (P ), compute these solutions for perturbation levels 10−ℓ ,
1 ≤ ℓ ≤ 6, and subject the resulting designs to numerical study similar to the one outlined
in the previous item.
Note: (P ) is not a Linear Programming program, so that you cannot formally apply the
results stated in Exercise II.43; what you can apply, is the Robust Optimization “philosophy.”
Exercise II.46 ♦ Prove the statement “symmetric” to the Dubovitski-Milutin Lemma:
The cone M∗ dual to the arithmetic sum of k (closed or not) cones M i ⊂ Rn , i ≤ k, is the
intersection of the k cones M∗i dual to M i .
Exercise II.47 ♦ Prove the following polyhedral version of the Dubovitski-Milutin Lemma:
Let M 1 , . . . , M k be polyhedral cones in Rn , and let M = ∩i M i . The cone M∗ dual to M is the
sum of the cones M∗i , i ≤ k, dual to M i , so that a linear form e⊤ x is nonnegative on M if and only
if it can be represented as the sum of linear forms e⊤i x nonnegative on the respective cones M i .
Exercise II.48 ♦ [follow-up to Exercise II.47] Let A ∈ Rm×n be a matrix with trivial kernel,
e ∈ Rn , and let the set
X = {x : Ax ≥ 0, e⊤ x = 1} (∗)
be nonempty and bounded. Prove that there exists λ ∈ Rm such that λ > 0 and A⊤ λ = e.
Prove “partial inverse” of this statement: if KerA = {0} and e = A⊤ λ for some λ > 0, the
set (∗) is bounded.
Exercise II.49 ♦ Let E be a linear subspace in Rn , K be a closed cone in Rn , and ℓ(x) :
E → R be a linear (linear, not affine!) function which is nonnegative on K ∩ E. Which of the
following claims are always true:
1. ℓ(·) can be extended from E onto the entire Rn to yield a linear function which is nonnegative
on K
2. Assuming int K ∩ E ̸= ∅, ℓ(·) can be extended from E onto the entire Rn to yield a linear
function which is nonnegative on K.
3. Assuming, in addition to ℓ(x) ≥ 0 for x ∈ K ∩ E, that K = {x : P x ≤ 0} is a polyhedral
cone, ℓ(·) can be extended from E onto the entire Rn to yield a linear function which is
nonnegative on K.
Exercise II.50 Let n > 1. Is the unit ∥ · ∥2 -ball Bn = {x ∈ Rn : ∥x∥2 ≤ 1} a polyhedral set?
Justify your answer.
N 2 4 8 16 32 64 128
U
M
L
where U is the maximal, M is the mean, and L is the minimal # of extreme points observed
when processing 100 samples ω N of a given cardinality
where F is the number of feasible systems, and U is the number of feasible systems with
bounded solution sets.
Intermezzo: related theoretical results originating from [Nem24, Exercise 2.23] are as follows.
Given positive integers m, n with n ≥ 2, consider a homogeneous system Ax ≤ 0 of m inequal-
ities with n variables. We call this system regular, if its matrix A is regular, regularity of a
matrix B meaning that all square submatrices of B are nonsingular. Clearly, the entries of a
regular matrix are nonzero, and when a p × q matrix B is drawn at random from a probabil-
ity distribution on Rp×q which has a density w.r.t the Lebesgue measure, B is regular with
probability 1.
Given a regular m × n homogeneous system of inequalities Ax ≤ 0, let gi (x) = Σ_{j=1}^n Aij xj ,
i ≤ m, so that the gi are nonconstant linear functions. Setting Πi = {x : gi (x) = 0}, we get
a collection of m hyperplanes in Rn passing through the origin. For a point x ∈ Rn , the
signature of x is, by definition, the m-dimensional vector σ(x) of signs of the reals gi (x),
1 ≤ i ≤ m. Denoting by Σ the set of all m-dimensional vectors with entries ±1, for σ ∈ Σ
the set Cσ = {x : σ(x) = σ} is either empty, or is a nonempty open convex set; when it is
nonempty, let us call it a cell associated with A, and the corresponding σ – an A-feasible
signature. Clearly, for regular system, Rn is the union of all hyperplanes Πi and all cells
associated with A. It turns out that
The number N (m, n) of cells associated with a regular homogeneous m × n system
Ax ≤ 0 is independent of the system and is given by a simple recurrence:
N (1, 2) = 2
m ≥ 2, n ≥ 2 =⇒ N (m, n) = N (m − 1, n) + N (m − 1, n − 1) [N (m, 1) = 2, m ≥ 1].
Next, when A is drawn at random from a probability distribution P on Rm×n which possesses a
symmetric density p, that is, such that p([a⊤1 ; a⊤2 ; . . . ; a⊤m ]) = p([ϵ1 a⊤1 ; ϵ2 a⊤2 ; . . . ; ϵm a⊤m ]) for
all A = [a⊤1 ; a⊤2 ; . . . ; a⊤m ] and all ϵi = ±1, then the probability for a vector σ ∈ Σ to be an
A-feasible signature is
π(m, n) = N (m, n)/2^m .
In particular, the probability for the system Ax ≤ 0 to have a solution set with a nonempty
interior (this is nothing but A-feasibility of the signature [−1; . . . ; −1]) is π(m, n).
The inhomogeneous version of these results is as follows. An m × n system of linear inequalities
Ax ≤ b is called regular, if the matrix [A, −b] is regular. Setting gi (x) = Σ_{j=1}^n Aij xj − bi ,
i ≤ m, the [A, b]-signature of x is, as above, the vector of signs of the reals gi (x). For σ ∈ Σ, the
set Cσ = {x : σ(x) = σ} is either empty, or is a nonempty open convex set; in the latter case,
we call Cσ an [A, b]-cell, and call σ an [A, b]-feasible signature. Setting Πi = {x : gi (x) = 0},
we get m hyperplanes in Rn , and the entire Rn is the union of those hyperplanes and all
[A, b]-cells. It turns out that
The number N (m, n) of cells associated with a regular m × n system Ax ≤ b is independent
of the system and is equal to (1/2)N (m + 1, n + 1).
In addition, when the m × (n + 1) matrix [A, b] is drawn at random from a probability distribution
on Rm×(n+1) possessing a symmetric density w.r.t. the Lebesgue measure, the probability
for every σ ∈ Σ to be an [A, b]-feasible signature is
π(m, n) = N (m + 1, n + 1)/2^{m+1} .
In particular, the probability for the system Ax ≤ b to be strictly feasible is π(m, n).
2. Accompanying exercise: Prove that if A is m × n regular matrix, then the system Ax ≤ 0
has a nonzero solution if and only if the system Ax < 0 is feasible. Derive from this fact that
if [A, b] is regular, then the system Ax ≤ b is feasible if and only if it is strictly feasible, and
that when the system Ax ≤ 0 has a nonzero solution, the system Ax ≤ b is strictly feasible
for every b.
3. Use the results from Intermezzo to compute the expected values of F and B, see item 1.
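To make item 3 concrete, here is a minimal sketch, assuming Python, of the recurrence for N (m, n) stated in the Intermezzo and of the resulting strict-feasibility probability N (m + 1, n + 1)/2^{m+1} for inhomogeneous systems (recall that, by item 2, for regular data feasibility and strict feasibility coincide). The base case N (1, n) = 2 for every n ≥ 2 reflects the fact that a single homogeneous hyperplane splits Rn into two open cells; the function names are ours.

from functools import lru_cache

@lru_cache(maxsize=None)
def N(m, n):
    # N(m, 1) = 2 for m >= 1, N(1, n) = 2 for n >= 2,
    # N(m, n) = N(m-1, n) + N(m-1, n-1) for m, n >= 2
    if m == 1 or n == 1:
        return 2
    return N(m - 1, n) + N(m - 1, n - 1)

def prob_feasible(m, n):
    # probability that a random system Ax <= b (m inequalities, n variables,
    # symmetric density) is (strictly) feasible: N(m+1, n+1) / 2^(m+1)
    return N(m + 1, n + 1) / 2 ** (m + 1)

print(N(1, 2))                               # 2, matching the stated base case
for n in (2, 4, 8, 16):
    print(n, 100 * prob_feasible(2 * n, n))  # expected F for the m = 2n experiment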
Exercise II.54 ▲ [computational study]
1. For ν = 1, 2, . . . , 6, generate 100 systems of linear inequalities Ax ≤ b with n = 2ν variables
and m = 2n inequalities, the entries in A, b being drawn, independently of each other, from
N (0, 1). Fill the following table:
n 2 4 8 16 32 64
F
E{F }
B
F : # of feasible systems in sample;
B: # of feasible systems with bounded solution sets
To compute the expected value of F , use the results from [Nem24, Exercise 2.23] cited in
item 2 of Exercise II.53.
2. Carry out an experiment similar to the one in item 1, but with m = n + 1 rather than m = 2n (a small numerical sketch of these experiments follows after this exercise).
n 2 4 8 16 32 64
F
E{F }
B
E{B}
F : # of feasible systems in sample;
B: # of feasible systems with bounded solution sets
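Here is a minimal sketch of the experiments of Exercise II.54, assuming Python with NumPy and SciPy; the value of n is a placeholder. Feasibility of Ax ≤ b is tested by an LP with zero objective, and boundedness of a nonempty solution set is tested through its recessive cone {h : Ah ≤ 0} (cf. Fact II.8.15): the solution set is bounded exactly when this cone is trivial. The empirical F can then be compared with 100 · N (m + 1, n + 1)/2^{m+1}, computed, e.g., by the preceding sketch.

import numpy as np
from scipy.optimize import linprog

def is_feasible(A, b):
    n = A.shape[1]
    res = linprog(np.zeros(n), A_ub=A, b_ub=b, bounds=[(None, None)] * n, method="highs")
    return res.status == 0

def has_bounded_solution_set(A):
    # {h : Ah <= 0} = {0}  iff  max of +/- h_i over {Ah <= 0, -1 <= h <= 1} is 0 for all i
    m, n = A.shape
    for i in range(n):
        for sign in (1.0, -1.0):
            c = np.zeros(n); c[i] = -sign                      # maximize sign * h_i
            res = linprog(c, A_ub=A, b_ub=np.zeros(m), bounds=[(-1, 1)] * n, method="highs")
            if res.status == 0 and -res.fun > 1e-7:
                return False
    return True

rng = np.random.default_rng(0)
n = 4; m = 2 * n
F = B = 0
for _ in range(100):
    A = rng.standard_normal((m, n)); b = rng.standard_normal(m)
    if is_feasible(A, b):
        F += 1
        B += has_bounded_solution_set(A)
print("F =", F, " B =", B)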
11
Proofs of Facts from Part II
Proof. When a linear form a⊤ x separates S and T , (a) holds true. Given (a), (b) could be
violated if and only if inf x∈S a⊤ x = supy∈T a⊤ y. But, together with (a), this can happen only if a⊤ x is
constant on S ∪ T , which is not the case as a⊤ x separates S and T . The above reasoning clearly
can be reversed: given (b), we have a ̸= 0, and given (a), both supx∈S a⊤ x and inf y∈T a⊤ y are
real numbers. Selecting b in-between these real numbers, the hyperplane a⊤ x = b clearly
separates S and T . The “strong separation” claim is evident.
Fact II.8.4 Let M be a nonempty convex set and let x ∈ M . Then, x is an extreme
point of M if and only if any (and then all) of the following holds:
(i) the only vector h such that x ± h ∈ M is the zero vector;
(ii) in every representation x = Σ_{i=1}^m λi xi of x as a convex combination, with positive
coefficients, of points xi ∈ M , i ≤ m, one has x1 = . . . = xm = x;
(iii) the set M \ {x} is convex.
Proof.
(i): If x is an extreme point and x ± h ∈ M , then h = 0, since otherwise x = (1/2)(x + h) + (1/2)(x − h),
implying that x is an interior point of a nontrivial segment [x − h, x + h], which is impossible.
For the other direction, assume for contradiction that x ± h ∈ M implies h = 0 and that x
is not an extreme point of M . Then, as x ̸∈ Ext(M ), there exist u, v ∈ M where both u, v
are not equal to x and λ ∈ (0, 1) such that x = λu + (1 − λ)v. As u ̸= x and v ̸= x while
x = λu+(1−λ)v, we conclude that u ̸= v. Now, consider any δ > 0 such that δ < min{λ, 1−λ}
and define h := δ(u − v). Note that h ̸= 0 and x + h = (λ + δ)u + (1 − λ − δ)v ∈ M and
x − h = (λ − δ)u + (1 − λ + δ)v ∈ M due to λ ± δ ∈ (0, 1), u, v ∈ M and convexity of M .
This then leads to the desired contradiction with our assumption that x ± h ∈ M implies
that h = 0.
As a byproduct of our reasoning, we see that if x ∈ M can be represented as x = λu+(1−λ)v
with u, v ∈ M , λ ∈ (0, 1], and u ̸= x, then x is not an extreme point of M .
(ii): In one direction, when x is not an extreme point of M , there exists h ̸= 0 such that x ± h ∈ M
so that x = (1/2)(x + h) + (1/2)(x − h) is a convex combination with positive coefficients and using
two points x ± h that are both in M and are distinct from x. To prove the opposite direction,
let x be an extreme point of M and suppose x = Σ_{i=1}^m λi xi with λi > 0, Σi λi = 1, and
let us prove that x1 = . . . = xm = x. Indeed, assume for contradiction that at least one
of the xi , say, x1 , differs from x, and m > 1. Since λ2 > 0, we have 0 < λ1 < 1. Then, the
point v := (1 − λ1 )−1 Σ_{i=2}^m λi xi is well defined. Moreover, as Σ_{i=2}^m λi = 1 − λ1 , v is a convex
combination of x2 , . . . , xm and therefore v ∈ M . Then, x = λ1 x1 + (1 − λ1 )v with x, x1 , v ∈ M ,
λ1 ∈ (0, 1], and x1 ̸= x, which, by the concluding comment in item (i) of the proof, implies
that x ̸∈ Ext(M ); this is the desired contradiction.
(iii): In one direction, let x be an extreme point of M ; let us prove that the set M ′ := M \ {x} is
convex. Assume for contradiction that this is not the case. Then, there exist u, v ∈ M ′ and
λ ∈ [0, 1] such that x̄ := λu + (1 − λ)v ̸∈ M ′ , implying that 0 < λ < 1 (since u, v ∈ M ′ ). As
M is convex, we have x̄ ∈ M , and since x̄ ̸∈ M ′ and M \ M ′ = {x}, we conclude that x̄ = x.
Thus, x is a convex combination, with positive coefficients, of two distinct from x points from
M , contradicting, by already proved item (ii), the fact that x is an extreme point of M . For
the other direction, suppose that M \ {x} is convex and we will prove that x must be an
extreme point of M . Assume for contradiction that x ̸∈ Ext(M ). Then, there exists h ̸= 0
such that x ± h ∈ M . As h ̸= 0, both x + h and x − h are distinct from x, thus x ± h ∈ M \ {x}.
We see that x ± h ∈ M \ {x}, x = (1/2)(x + h) + (1/2)(x − h), and x ̸∈ M \ {x}, contradicting the
convexity of M \ {x}.
Fact II.8.5 All extreme points of the convex hull Conv(Q) of a set Q belong to Q:
Ext(Conv(Q)) ⊆ Q.
Proof. Assume for contradiction that x ∈ Ext(Conv(Q)) and x ̸∈ Q. As x ∈ Ext(Conv(Q)),
by Fact II.8.4.(iii) the set Conv(Q) \ {x} is convex and contains Q, contradicting the fact that
Conv(Q) is the smallest convex set containing Q.
Fact II.8.14 Let M ⊆ Rn be a nonempty closed convex set. Recall its closed conic
transform is given by
ConeT(M ) = cl {[x; t] ∈ Rn × R : t > 0, x/t ∈ M } ,
(see section 1.5). Then,
Rec(M ) = {h ∈ Rn : [h; 0] ∈ ConeT(M )} .
Proof. Let h be such that [h; 0] ∈ ConeT(M ), and let us prove that h ∈ Rec(M ). There is nothing
to prove when h = 0, thus assume that h ̸= 0. Since the vectors g such that [g; 0] ∈ ConeT form a
closed cone and both ConeT(M ) and Rec(M ) are cones as well, we lose nothing when assuming,
in addition to [h; 0] ∈ ConeT(M ) and h ̸= 0, that h is a unit vector. Since [h; 0] ∈ ConeT(M ), by
definition of the latter set there exists a sequence [ui ; ti ] → [h; 0], i → ∞, such that ti > 0 and
xi := ui /ti ∈ M for all i. Then, this together with ui → h and ∥h∥2 = 1 imply that ∥xi ∥2 → ∞
and ti ∥xi ∥2 → 1 as i → ∞. As a result, limi→∞ xi /∥xi ∥2 = limi→∞ ui = h. By Fact II.8.13(ii),
we see that h ∈ Rec(M ).
we see that h ∈ Rec(M ).
For the reverse direction, consider any h ∈ Rec(M ), and let us prove that [h; 0] ∈ ConeT(M ).
There is nothing to prove when h = 0, so we assume h ̸= 0. Consider any x̄ ∈ M and define
xi := x̄ + ih, i = 1, 2, . . .. As h ∈ Rec(M ), we have xi ∈ M for all i. Moreover, ∥xi ∥2 → ∞ as
i → ∞ due to h ̸= 0. We clearly have limi→∞ [xi /∥xi ∥2 ; 1/∥xi ∥2 ] = [h/∥h∥2 ; 0], and the vectors
[y i ; ti ] := [xi /∥xi ∥2 , 1/∥xi ∥2 ] for all large enough i satisfy the requirement ti > 0, y i /ti ∈ M , so
[y i ; ti ] ∈ ConeT(M ) for all large enough i. As ConeT(M ) is closed and [y i ; ti ] → [h/∥h∥2 ; 0] as
i → ∞, we deduce [h/∥h∥2 ; 0] ∈ ConeT(M ). Finally, ConeT(M ) is a cone, so [h; 0] ∈ ConeT(M )
as well.
Fact II.8.15 For any nonempty polyhedral set M = {x ∈ Rn : Ax ≤ b}, its recessive
cone is given by
Rec(M ) = {h ∈ Rn : Ah ≤ 0} ,
i.e., Rec(M ) is given by homogeneous version of linear constraints specifying M .
Proof. Consider any h such that Ah ≤ 0. Then, for any x̄ ∈ M , and t ≥ 0, we have A(x̄ + th) =
Ax̄ + tAh ≤ Ax̄ ≤ b, so x̄ + th ∈ M for all t ≥ 0. Hence, h ∈ Rec(M ). For the reverse direction,
suppose h ∈ Rec(M ) and x̄ ∈ M . Then, for all t ≥ 0 we have A(x̄ + th) ≤ b. This is equivalent
to Ah ≤ t−1 (b − Ax̄) for all t > 0, which implies that Ah ≤ 0.
Fact II.8.23 Let M be a closed cone in Rn , and let M∗ be the cone dual to M .
Then
(i) Duality does not distinguish between a cone and its closure: whenever M = cl M ′
for a cone M ′ , we have M∗ = M∗′ .
(ii) Duality is symmetric: the cone dual to M∗ is M .
(iii) One has
int M∗ = {y ∈ Rn : y ⊤ x > 0, ∀x ∈ M \ {0}} ,
(iv) The cone dual to the direct product M1 × . . . × Mm of cones Mi is the direct
product of their duals: [M1 × . . . × Mm ]∗ = [M1 ]∗ × . . . × [Mm ]∗ .
Proof.
(i): This is evident.
(ii): By definition, any x ∈ M satisfies x⊤ y ≥ 0 for all y ∈ M∗ , hence M ⊆ [M∗ ]∗ . To prove
M = [M∗ ]∗ , assume for contradiction that there exists x̄ ∈ [M∗ ]∗ \M . By Separation Theorem,
{x̄} can be strongly separated from M , i.e., there exists y such that
y ⊤ x̄ < inf x∈M y ⊤ x.
As M is a conic set and the right hand side infimum is finite, this infimum must be 0. Thus,
y ⊤ x̄ < 0 while y ⊤ x ≥ 0 for all x ∈ M , implying y ∈ M∗ . But then this contradicts
x̄ ∈ [M∗ ]∗ .
(iii): Let us prove that int M∗ ̸= ∅ if and only if M is pointed. If M is not pointed, then ±h ∈ M
for some h ̸= 0, implying that y ⊤ [±h] ≥ 0 for all y ∈ M∗ , that is, y ⊤ h = 0 for all y ∈ M∗ .
Thus, when M is not pointed, M∗ belongs to a proper (smaller than the entire Rn ) linear
subspace of Rn and thus int M∗ = ∅. This reasoning can be reversed: when int M∗ = ∅,
the affine hull Aff(M∗ ) of M∗ cannot be the entire Rn (since int M∗ = ∅ and rint M∗ ̸= ∅);
taking into account that 0 ∈ M∗ , we have Aff(M∗ ) = Lin(M∗ ), so that Lin(M∗ ) ⫋ Rn , and
therefore there exists a nonzero h orthogonal to Lin(M∗ ). We have y ⊤ [±h] = 0 for all y ∈ M∗ ,
implying that h and −h belong to cone dual to M∗ , that is, to M (due to the already verified
item (ii)). Thus, for some nonzero h it holds ±h ∈ M , that is, M is not pointed.
Now let us prove that y ∈ int M∗ if and only if y ⊤ x > 0 for every x ∈ M \ {0}. In one
direction: assume that y ∈ int M∗ , so that for some r > 0 it holds y + δ ∈ M∗ for all δ with
∥δ∥2 ≤ r. If now x ∈ M , we have 0 ≤ minδ:∥δ∥2 ≤r [y + δ]⊤ x = y ⊤ x − r∥x∥2 . Thus,
y ∈ int M∗ =⇒ ∥x∥2 ≤ (1/r) y ⊤ x, ∀x ∈ M, (*)
implying that y ⊤ x > 0 for all x ∈ M \ {0}, as required. In the opposite direction: assume that
y ⊤ x > 0 for all x ∈ M \ {0}, and let us prove that y ∈ int M∗ . There is nothing to prove when
M = {0} (and therefore M∗ = Rn ). Assuming M ̸= {0}, let M̄ = {x ∈ M : ∥x∥2 = 1}. This
set is nonempty (since M ̸= {0}), is closed (as M is closed), and is clearly bounded, and thus
is compact. We are in the situation when y ⊤ x > 0 for x ∈ M̄ , implying that minx∈M̄ y ⊤ x
(this minimum is achieved since M̄ is a nonempty compact set) is strictly positive. Thus,
y ⊤ x ≥ r > 0 for all x ∈ M̄ , whence [y + δ]⊤ x ≥ 0 for all x ∈ M̄ and all δ with ∥δ∥2 ≤ r. Due
to the origin of M̄ , the inequality [y + δ]⊤ x ≥ 0 for all x ∈ M̄ implies that [y + δ]⊤ x ≥ 0 for
all x ∈ M . The bottom line is that the Euclidean ball of radius r centered at y belongs to
M∗ , and therefore y ∈ int M∗ , as claimed.
Now let us prove the “Moreover” part of item (iii). Thus, let the cone M be closed, pointed,
and nontrivial. Consider any y ∈ int M∗ , then the set My , first, contains some positive
multiple of every nonzero vector from M and thus is nonempty (since M ̸= {0}) and, second,
is bounded (by (*)). Since My is closed (as M is closed), we conclude that My is a nonempty
compact set. Thus, the left hand side set in (8.7) is contained in the right hand side one. To
prove the opposite inclusion, let y ∈ Rn be such that My is a nonempty compact set, and let
us prove that y ∈ int M∗ . By the already proved part of item (iii), all we need is to verify that
if x ̸= 0 and x ∈ M , then y ⊤ x > 0. Assume for contradiction that there exists x̄ ∈ M \ {0}
such that α := −y ⊤ x̄ ≥ 0. Then, by selecting any x̂ ∈ My (My is nonempty!) and setting
e = αx̂ + x̄, we get e ∈ M and y ⊤ e = 0. Note that e ̸= 0; indeed, e = 0 means that the nonzero
vector x̄ ∈ M is such that −x̄ = αx̂ ∈ M , contradicting pointedness of M . The bottom line
desired contradiction as My is compact!
(iv): This is evident.
Fact II.8.28 Let M ⊆ Rn be a cone and M∗ be its dual cone. Then, for any
x ∈ int M , there exists a properly selected Cx < ∞ such that
∥f ∥2 ≤ Cx f ⊤ x, ∀f ∈ M∗ .
Proof. Since x ∈ int M , there exists ρ > 0 such that x − δ ∈ M whenever ∥δ∥2 ≤ ρ. Then, as
f ∈ M∗ , we have f ⊤ (x − δ) ≥ 0 for any ∥δ∥2 ≤ ρ , i.e., f ⊤ x ≥ supδ {f ⊤ δ : ∥δ∥2 ≤ ρ} = ρ∥f ∥2 .
Taking Cx := 1/ρ (note that Cx < ∞ as ρ > 0) gives us the desired relation.
Fact II.8.33. Let M ⊆ Rn be a nontrivial closed cone, and M∗ be its dual cone.
(i) M is pointed
(i.1) if and only if M does not contain straight lines,
(i.2) if and only if M∗ has a nonempty interior, and
(i.3) if and only if M has a base.
(ii) Set (8.9) is a base of M
(ii.1) if and only if f ⊤ x > 0 for all x ∈ M \ {0},
(ii.2) if and only if f ∈ int M∗ .
In particular, f ∈ int M∗ if and only if f ⊤ x > 0 whenever x ∈ M \ {0}.
(iii) Every base of M is nonempty, closed, and bounded. Moreover, whenever M is
pointed, for any f ∈ M∗ such that the set (8.9) is nonempty (note that this set is
always closed for any f ), this set is bounded if and only if f ∈ int M∗ , in which case
(8.9) is a base of M .
(iv) M has extreme rays if and only if M is pointed. Furthermore, when M is pointed,
there is one-to-one correspondence between extreme rays of M and extreme points
of a base B of M : specifically, the ray R := R+ (d), d ∈ M \ {0} is extreme if and
only if R ∩ B is an extreme point of B.
Proof. (i.1): Since M is closed, convex, and contains the origin, M contains a line if and only
if M contains a line passing through the origin, and since M is conic, the latter happens if and
only if M is not pointed.
(i.2): This is precisely Fact II.8.23(iii).
(i.3): As we have seen, (8.9) is a base of M if and only if f ⊤ x > 0 for all x ∈ M \ {0}, which,
by Fact II.8.23(iii), holds if and only if f ∈ int M∗ .
(ii.1): This was explained when defining a base.
(ii.2): This is given by Fact II.8.23(iii).
(iii): Suppose B is a base of M . Then, B is nonempty since B intersects all nontrivial rays
in M emanating from the origin, and the set of these rays is nonempty since M is nontrivial.
Closedness of B is evident. To prove that B is bounded, note that by (ii.2) f ∈ int M∗ . Thus,
there exists r > 0 such that f − e ∈ M∗ , for all ∥e∥2 ≤ r. Hence, [f − e]⊤ x ≥ 0 for all x ∈ M
and all e with ∥e∥2 ≤ r, implying f ⊤ x ≥ r∥x∥2 for all x ∈ M , and therefore ∥x∥2 ≤ r−1 for all
x ∈ B.
Next, let M be pointed and f ∈ M∗ be such that the set (8.9) is nonempty. Closedness of
this set is evident. Let us show that this set is bounded if and only if f ∈ int M∗ . Indeed, when
f ∈ int M∗ , B is a base of M by (ii.2) and therefore, as we have just seen, B is bounded. For
the other direction, suppose that f ̸∈ int M∗ . Then, by Fact II.8.23(iii), there exists x̄ ∈ M \ {0}
such that f ⊤ x̄ = 0. Also, as the set (8.9) is nonempty, there exists x̂ ∈ M such that f ⊤ x̂ = 1.
Now, observe that for any λ ∈ [0, 1) the vector (1 − λ)−1 [(1 − λ)x̂ + λx̄] belongs to B and the
norm of this vector goes to +∞ as λ → 1. But, then this implies that B is unbounded, and so
the proof of (iii) is completed.
(iv): Suppose M is not pointed. Then, there exists a direction e ̸= 0 such that M contains
the line generated by e; in particular ±e ∈ M . Assume for contradiction that d is an extreme
direction of M . Then, as M is a closed convex cone, d ± te ∈ M for all t ∈ R. Thus, as M is a
cone, we have d± (t) := (1/2)[d ± te] ∈ M for all t. Let us first suppose that e is not collinear to d,
then for any t ̸= 0, the vector d± (t) is not a nonnegative multiple of d, but then this contradicts
d being an extreme direction of M . So, we now suppose that e is collinear to d. But, in this case,
for large enough t, one of the vectors d± (t), while being a multiple of d, is not a nonnegative
multiple of d, which again is impossible. Thus, when M is not pointed, M does not have extreme
rays.
Now let M be pointed, and let the set B given by (8.9) be a base of M (a base does exist
by (i.3)). B is a nonempty closed and bounded convex set by (iii). Let us verify that the rays
R+ (d) spanned by the extreme points of B are exactly the extreme rays of M . First, suppose
d ∈ Ext(B), and let us prove that d is an extreme direction of M . Indeed, let d = d1 + d2 for
some d1 , d2 ∈ M ; we should prove that d1 , d2 are nonnegative multiples of d. There is nothing
to prove when one of the vectors d1 , d2 is zero, so we assume that both d1 , d2 are nonzero. Then,
since B is a base, by (ii.1) we have αi := f ⊤ di > 0, i = 1, 2. Moreover, α1 + α2 = f ⊤ d = 1.
Setting d̄i := αi−1 di , i = 1, 2, we have d̄i ∈ B, i = 1, 2, and α1 d̄1 + α2 d̄2 = d1 + d2 = d. Recalling
that d is an extreme point of B and αi > 0, i = 1, 2, we conclude that d̄1 = d̄2 = d, that is,
d1 and d2 are positive multiples of d, as claimed. For the reverse direction, let d be an extreme
direction of M . We need to prove that the intersection of the ray R+ (d) and B (this intersection
is nonempty since d ∈ M \ {0}) is an extreme point of B. Passing from extreme direction d to
its positive multiple, we can assume that d ∈ B. To prove that d ∈ Ext(B), assume that there
exists h such that d ± h ∈ B and let us verify that h = 0. Indeed, as d ∈ B we have f ⊤ d = 1,
while from d ± h ∈ B we conclude that f ⊤ h = 0. Therefore, when h ̸= 0, h is not a multiple of
d, whence the vectors d ± h are not multiples of d. On the other hand, both of the vectors d ± h
belong to M and d is their average, which contradicts the fact that d is an extreme direction of
M . Thus, h = 0, as claimed.
we claim that C < ∞. Taking this claim for granted, observe that C < ∞ implies, by
homogeneity, that supf ∈−ϵB,z∈M f ⊤ z ≤ ϵC for all ϵ > 0, hence for properly selected small
positive ϵ the ball −ϵB is contained in Polar (M ), implying int(Polar (M )) ̸= ∅, which is a
desired contradiction.
It remains to justify the above claim. To this end assume that C = +∞, and let us lead this
assumption to a contradiction. When C = +∞, there exists a sequence fi ∈ −B and zi ∈ M
such that fi⊤ zi → +∞ as i → ∞, implying, due to fi ∈ −B, that ∥zi ∥2 → ∞ as i → ∞.
Passing to a subsequence, we can assume that zi /∥zi ∥2 → h as i → ∞. Then, by its origin,
h is an asymptotic direction of M and therefore is a unit vector from K (Fact II.8.13(ii)).
Assuming w.l.o.g. zi ̸= 0 for all i, we have
fi⊤ zi = ∥zi ∥2 (αi + βi ), where αi := fi⊤ h and βi := fi⊤ (zi /∥zi ∥2 − h). (!)
12
Convex Functions
x ln x;
• functions convex on the positive ray:
1/xp , where p > 0;
− ln x.
At the moment it is not clear why these functions are convex. We will soon
derive a simple analytic criterion for detecting convexity which will immediately
demonstrate that the above functions indeed are convex. ♢
A very convenient equivalent definition of a convex function is in terms of its
epigraph. Given a real-valued function f defined on a subset Q of Rn , we define
its epigraph as the set
epi{f } := {[x; t] ∈ Rn+1 : x ∈ Q, t ≥ f (x)} .
Geometrically, to define the epigraph, we plot the graph of the function, i.e., the
surface {(x, t) ∈ Rn+1 : x ∈ Q, t = f (x)} in Rn+1 , and add to this surface
all points which are “above” it. Epigraph allows us to give an equivalent, more
geometrical, definition of a convex function as follows.
the ℓ∞ -norm ∥x∥∞ = maxi |xi |. It was also claimed (although not proved) that
these are three members from an infinite family of norms
∥x∥p := ( Σ_{i=1}^n |xi |p )^{1/p} , where 1 ≤ p ≤ ∞
(the right hand side of the latter relation for p = ∞ is, by definition, maxi |xi |).
We say that a function f : Rn → R is positively homogeneous of degree 1 if it
satisfies
f (tx) = tf (x), ∀x ∈ Rn , t ≥ 0.
Also, we say that the function f : Rn → R is subadditive if it satisfies
f (x + y) ≤ f (x) + f (y), ∀x, y ∈ Rn .
Note that every norm is positively homogeneous of degree 1 and subadditive.
We are about to prove that all such functions (in particular, all norms) are con-
vex:
Proof. Note that the points [xi ; f (xi )] belong to the epigraph of f . As f is convex,
its epigraph is a convex set. Then, for any λ ∈ RN+ satisfying Σ_{i=1}^N λi = 1, we
have that the corresponding convex combination of the points given by
Σ_{i=1}^N λi [xi ; f (xi )] = [ Σ_{i=1}^N λi xi ; Σ_{i=1}^N λi f (xi ) ]
also belongs to epi{f }. By definition of the epigraph, this means exactly that
Σ_{i=1}^N λi f (xi ) ≥ f ( Σ_{i=1}^N λi xi ).
Note that the definition of convexity of a function f is exactly the requirement
on f to satisfy the Jensen inequality for the case of N = 2. We see that to satisfy
this inequality for N = 2 is the same as to satisfy it for all N ≥ 2.
Remark III.12.5 An instructive interpretation of Jensen’s inequality is as fol-
lows: Given a convex function f , consider a discrete random variable x taking
values xi ∈ Dom f , i ≤ N , with probabilities λi . Then,
f (E[x]) ≤ E[f (x)],
where E[·] stands for the expectation operator. The resulting inequality, under
mild regularity conditions, holds true for general type random vectors x taking
values in Dom f with probability 1. ■
The simplest function with a given domain Q is identically zero on Q and iden-
tically +∞ outside of Q. This function, called the characteristic (a.k.a. indicator )
function of Q 1 is convex if and only if Q is a convex set.
It is convenient to think of a convex function as of something which is defined
everywhere, since it saves a lot of words. For example, with this convention we
can write f + g (f and g are convex functions on Rn ), and everybody will under-
stand what is meant. Without this convention, we were supposed to add to this
expression the following explanation as well: “f + g is a function with the domain
being the intersection of those of f and g, and in this intersection it is defined as
(f + g)(x) = f (x) + g(x).”
1 This terminology is standard for Convex Analysis; in other areas of Math, characteristic, a.k.a.
indicator, function of a set Q ⊂ Rn is defined as the function equal to 1 on the set and to 0 outside
of it.
13
How to detect convexity
In an optimization problem
min_x {f (x) : gj (x) ≤ 0, j = 1, . . . , m}
Imagine how many extra words would be necessary here if there were no con-
vention on the value of a convex function outside its domain!
In the Convex Monotone superposition rule, monotone nondecreasing property
of F is crucial. (Look what happens when n = K = 1, f1 (x) = x2 , F (z) = −z).
This rule, however, admits the following two useful variants where the mono-
tonicity requirement is somehow relaxed (the justifications of these variants are
left to the reader):
inequalities, we get
λg(x) + (1 − λ)g(x′ ) + ϵ ≥ λf (x, yϵ ) + (1 − λ)f (x′ , yϵ′ )
≥ f (λx + (1 − λ)x′ , λyϵ + (1 − λ)yϵ′ )
= f (x′′ , λyϵ + (1 − λ)yϵ′ ),
where the last inequality follows from the convexity of f . By definition of g(x′′ )
we have f (x′′ , λyϵ + (1 − λ)yϵ′ ) ≥ g(x′′ ), and thus we get λg(x) + (1 − λ)g(x′ ) +
ϵ ≥ g(x′′ ). In particular, x′′ ∈ Dom g (recall that x, x′ ∈ Dom(g) and thus
g(x), g(x′ ) ∈ R). Moreover, since the resulting inequality is valid for all ϵ > 0,
we come to g(x′′ ) ≤ λg(x) + (1 − λ)g(x′ ), as required.
• Perspective transform of a convex function: Given a convex function f on Rn ,
we define the function g(x, y) := yf (x/y) with the domain {[x; y] ∈ Rn+1 : y >
0, x/y ∈ Dom f } to be its perspective function. The perspective function of a
convex function is convex.
Let us first examine a direct justification of this. Consider any [x′ ; y ′ ] and
[x′′ ; y ′′ ] from Dom g and any λ ∈ [0, 1]. Define x := λx′ + (1 − λ)x′′ , y :=
λy ′ + (1 − λ)y ′′ . Then, y > 0. We also define λ′ := λy ′ /y and λ′′ := (1 − λ)y ′′ /y,
so that λ′ , λ′′ ≥ 0 and λ′ + λ′′ = 1. As f is convex, we deduce x/y = λx′ /y +
(1 − λ)x′′ /y = λ′ x′ /y ′ + λ′′ x′′ /y ′′ = λ′ x′ /y ′ + (1 − λ′ )x′′ /y ′′ ∈ Dom f and
f (x/y) ≤ λ′ f (x′ /y ′ ) + (1 − λ′ )f (x′′ /y ′′ ). Thus, as y > 0, we arrive at yf (x/y) ≤
yλ′ f (x′ /y ′ ) + y(1 − λ′ )f (x′′ /y ′′ ) = λ[y ′ f (x′ /y ′ )] + (1 − λ)[y ′′ f (x′′ /y ′′ )], that is,
g(x, y) ≤ λg(x′ , y ′ ) + (1 − λ)g(x′′ , y ′′ ).
Here is an alternative smarter justification. There is nothing to prove when
Dom f = ∅. So, suppose that Dom f ̸= ∅. Consider the epigraph epi(f ) =
{[x; s] : s ≥ f (x)} and the perspective transform of this nonempty convex set
which is given by (see section 1.5)
Persp(epi{f }) := {[[x; s]; t] ∈ Rn+2 : t > 0, [x/t; s/t] ∈ epi{f }}
where the second from last equality follows from the fact that by definition of
g(x, t), whenever t > 0, the inclusion x/t ∈ Dom f takes place if and only if
[x; t] ∈ Dom g. Thus, we observe that Persp(epi{f }) is nothing but the image
of epi{g} under the one-to-one linear transformation [x; t; s] 7→ [x; s; t]. As
Persp(epi{f }) is a convex set (recall from section 1.5 that the perspective
transform of a nonempty convex set is convex), we conclude that g is convex.
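As a quick numerical companion to this rule (not a substitute for either justification), the sketch below, assuming Python with NumPy, checks the convexity inequality for the perspective g(x, y) = yf (x/y) = x⊤x/y of the convex function f (x) = x⊤x on random data with y > 0.

import numpy as np

rng = np.random.default_rng(5)
g = lambda x, y: (x @ x) / y                # perspective of f(x) = x^T x
n = 3
for _ in range(1000):
    x1, x2 = rng.standard_normal(n), rng.standard_normal(n)
    y1, y2 = rng.random() + 0.1, rng.random() + 0.1       # keep y > 0
    lam = rng.random()
    lhs = g(lam * x1 + (1 - lam) * x2, lam * y1 + (1 - lam) * y2)
    rhs = lam * g(x1, y1) + (1 - lam) * g(x2, y2)
    assert lhs <= rhs + 1e-10
print("the perspective of f(x) = x^T x passed all sampled convexity checks")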
Now that we know what the basic operations preserving convexity of a function
are, let us look at the standard convex functions these operations can be applied
to. We have already seen several examples in Example III.12.2; but we still do
not know why these functions are convex. The usual way to check convexity of a
Figure III.1. Univariate convex function f : [x, y] → R. The average rate of change of f
on the entire segment [x, y] is in-between the average rates of change “at the beginning,”
i.e., when passing from x to z, and “at the end,” i.e., when passing from z to y.
When λ ∈ (0, 1), the pair (1, λ) is a positive multiple of the pair (y − x, zλ − x), thus
(13.2) is equivalent to (y − x)(f (zλ ) − f (x)) ≤ (zλ − x)(f (y) − f (x)). Note that this
inequality is the same as (f (zλ ) − f (x))/(zλ − x) ≤ (f (y) − f (x))/(y − x). When λ runs through the interval
(0, 1) the point zλ runs through the entire set {z : x < z < y}, and so we conclude
that f is convex if and only if for every triple x < z < y with x, y ∈ Dom f the first
inequality in (13.1) holds true. As every one of inequalities in (13.1) implies the
other two, this justifies our “average rate of change” characterization of univariate
convexity.
In the case of multivariate convex functions, we have the following immediate
consequence of the preceding observations.
Lemma III.13.1 Let x, x′ , x′′ be three distinct points in Rn with x′ ∈ [x, x′′ ].
Then, for any convex function f that is finite on [x, x′′ ], we have
(f (x′ ) − f (x))/∥x′ − x∥2 ≤ (f (x′′ ) − f (x))/∥x′′ − x∥2 . (13.3)
Proof. Under the premise of the lemma, define ϕ(t) := f (x + t(x′′ − x)) and let
λ ∈ R be such that x′ = x + λ(x′′ − x). Note that λ ∈ (0, 1) as x′ ∈ [x, x′′ ] and
the points x, x′ , x′′ are all distinct from each other. As it was explained at the
beginning of this section, the univariate function ϕ is convex along with f , and
0, 1, λ ∈ Dom ϕ. Applying the first inequality in (13.1) to ϕ in the role of f and
the triple (0, λ, 1) in the role of the triple (x, z, y), we get (f (x′ ) − f (x))/λ ≤ f (x′′ ) − f (x),
which, due to λ = ∥x′ − x∥2 /∥x′′ − x∥2 , is nothing but (13.3).
To sum up, to detect convexity of a function, in principle, it suffices to know
how to detect convexity of functions of a single variable. Moreover, this latter
question can be resolved by the standard Calculus tools.
Proof.
(i): We start by proving the necessity of the stated condition. Suppose that f
is differentiable and convex on (a, b). We will prove that then f ′ is monotonically
nondecreasing. Let x < y be two points from the interval (a, b), and let us prove
that f ′ (x) ≤ f ′ (y). Consider any z ∈ (x, y). Invoking convexity of f and applying
(13.1), we have
(f (z) − f (x))/(z − x) ≤ (f (y) − f (z))/(y − z).
Passing to limit as z → x + 0, we get
f ′ (x) ≤ (f (y) − f (x))/(y − x),
and passing to limit in the same inequality as z → y − 0, we arrive at
(f (y) − f (x))/(y − x) ≤ f ′ (y),
and so f ′ (x) ≤ f ′ (y), as claimed.
Let us now prove the sufficiency of the condition in (i). Thus, we assume that
f ′ exists and is nondecreasing on (a, b), and we will verify that f is convex on
(a, b). By “average rate of change” description of the convexity of a univariate
function, all we need is to verify that if x < z < y and x, y ∈ (a, b), then
(f (z) − f (x))/(z − x) ≤ (f (y) − f (z))/(y − z).
This is indeed evident: by the Lagrange Mean Value Theorem, the left hand side
ratio is f ′ (u) for some u ∈ (x, z), and the right hand side one is f ′ (v) for some
v ∈ (z, y). Since v > u and f ′ is nondecreasing on (a, b), we conclude that the
left hand side ratio is indeed less than or equal to the right hand side one.
(ii): This part is an immediate consequence of (i) as we know from Calculus
that a differentiable function — in our case now this is the function f ′ — is mono-
tonically nondecreasing on an interval if and only if its derivative is nonnegative
on this interval.
Proposition III.13.2 immediately allows us to verify the convexity of functions
listed in Example III.12.2. To this end, the only difficulty which we may encounter
is that some of these functions (e.g., xp with p ≥ 1, and −xp with 0 ≤ p ≤ 1)
are claimed to be convex on the half-interval [0, +∞), while Proposition III.13.2
talks about convexity of functions on open intervals. This difficulty can be ad-
dressed with the following simple result which allows us to extend the convexity
of continuous functions beyond open sets.
Proof. The “only if” part is evident: if f is convex on Q and x ∈ int Q, then for
any fixed direction h ∈ Rn the function g : R → R ∪ {+∞} defined as
g(t) := f (x + th)
is convex in a certain neighborhood of the point t = 0 on the axis (recall that affine
substitutions of argument preserve convexity). Since f is twice differentiable in
a neighborhood of x, the function g is twice differentiable in a neighborhood of
t = 0, as well. Thus, by Proposition III.13.2, we have 0 ≤ g ′′ (0) = h⊤ f ′′ (x)h.
In order to prove the “if” part we need to show that every function f : Q →
R∪{+∞} that is continuous on Q and that satisfies h⊤ f ′′ (x)h ≥ 0 for all x ∈ int Q
and all h ∈ Rn is convex on Q.
Let us first prove that f is convex on int Q. By Theorem I.1.29, int Q is a
convex set. Since the convexity of a function on a convex set is a one-dimensional
property, all we need to prove is that for any x, y ∈ int Q the univariate function
g : [0, 1] → R ∪ {+∞} given by
g(t) := f (x + t(y − x))
is convex on the segment [0, 1]. As f is twice differentiable on int Q, g is continuous
and twice differentiable on the segment [0, 1] and its second derivative is given by
g ′′ (t) = (y − x)⊤ f ′′ (x + t(y − x))(y − x) ≥ 0,
where the inequality follows from the premise on f . Then, by Propositions III.13.2(ii)
and III.13.3, g is convex on [0, 1]. Thus, f is convex on int Q. As f is convex on
int Q and is continuous on Q, by Proposition III.13.3 we conclude that f is convex
on Q.
Proof. Consider any x, y ∈ Q with x ̸= y and any λ ∈ (0, 1). We need to show
that f (λx + (1 − λ)y) < λf (x) + (1 − λ)f (y). Consider the function ϕ : [0, 1] → R
given by ϕ(t) := f (tx+(1−t)y). Then, as f is twice differentiable on Q, ϕ is twice
differentiable on [0, 1]. Moreover, based on the premise on f , we have ϕ′′ (t) > 0
for all t ∈ [0, 1]. Note that our target inequality is simply the relation ϕ(λ) <
λϕ(1) + (1 − λ)ϕ(0). Since 0 < λ < 1, we can rewrite this target inequality as

(ϕ(λ) − ϕ(0))/λ < (ϕ(1) − ϕ(λ))/(1 − λ).

Finally, by the Mean Value Theorem and strict monotonicity of ϕ′ we conclude
that the desired target inequality holds.
We conclude this section by highlighting that convexity of many "complicated"
functions can be proved easily by applying a combination of "calculus of
convexity" rules to simple functions which pass the "infinitesimal" convexity tests.
Example III.13.2 Consider the following exponential posynomial function f :
Rn → R, given by
f(x) = ∑_{i=1}^{N} c_i exp(a_i^⊤ x),
where the coefficients ci are positive (this is why the function is called posynomial).
This function is in fact convex on Rn . How can we prove this?
An immediate proof is as follows:
1. The function exp(t) is convex on R as its second order derivative is positive as
required by the infinitesimal convexity test for smooth univariate functions.
2. Thus, by stability of convexity under affine substitutions of argument, we
deduce that all the functions exp(a_i^⊤ x) are convex on Rn.
3. Finally, by stability of convexity under taking linear combinations with non-
negative coefficients, we conclude that f is convex on Rn .
And what if we were asked to prove that the maximum of three exponential
posynomials is convex? Then, all we need is to add to our three steps above
a fourth one, which refers to the stability of convexity under taking the pointwise
supremum. ♢
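The same reasoning lends itself to a direct numerical probe (a hedged sketch of ours, assuming NumPy; the random coefficients and vectors below are for illustration only): the convexity inequality f(λx + (1 − λ)y) ≤ λf(x) + (1 − λ)f(y) can be tested for an exponential posynomial with randomly drawn positive c_i and vectors a_i.

    import numpy as np

    rng = np.random.default_rng(1)
    n, N = 4, 6
    A = rng.normal(size=(N, n))                  # rows play the role of the vectors a_i
    c = rng.uniform(0.5, 2.0, size=N)            # positive coefficients c_i

    f = lambda x: float(c @ np.exp(A @ x))       # f(x) = sum_i c_i exp(a_i^T x)

    for _ in range(1000):
        x, y = rng.normal(size=n), rng.normal(size=n)
        lam = rng.uniform()
        assert f(lam * x + (1 - lam) * y) <= lam * f(x) + (1 - lam) * f(y) + 1e-8
    print("convexity inequality verified on random pairs")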
where x[i] denotes the i-th largest entry in the vector x. That is, for every vector
x ∈ Rn, we have x_[1] ≥ x_[2] ≥ . . . ≥ x_[n]. By definition, s_k(x) is simply the sum of
the k largest elements of x. We claim that s_k(x) is a convex function of x. Given any
index set I, the function ℓ_I(x) := ∑_{i∈I} x_i is a linear function of x and thus it is
convex. Now, s_k(x) is clearly the maximum of the linear functions ℓ_I(x) over all
index sets I ⊆ {1, . . . , n} with exactly k elements, and as such is convex.
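For illustration, the two descriptions of s_k(x), the sum of the k largest entries and the maximum of ℓ_I(x) over k-element index sets, can be compared directly; the following is an informal sketch of ours, assuming NumPy, and is not needed for the argument.

    import itertools
    import numpy as np

    rng = np.random.default_rng(2)
    n, k = 6, 3
    x = rng.normal(size=n)

    s_sorted = np.sort(x)[::-1][:k].sum()        # sum of the k largest entries of x
    s_max = max(x[list(I)].sum()                 # max of l_I(x) over all k-element index sets I
                for I in itertools.combinations(range(n), k))

    assert abs(s_sorted - s_max) < 1e-12
    print(s_sorted, s_max)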
Note also that s_k(x) is a permutation symmetric function of x, that is, the value
of the function s_k(x) remains the same when permuting entries of its argument
x. Taken together, convexity and permutation symmetry of s_k(x) will be very useful later.
f(x) := ln(∑_{i=1}^{n} exp(x_i))

is convex.
Let us first verify the convexity of this function via direct computation using
Corollary III.13.4. To this end, we define p_i := exp(x_i)/∑_j exp(x_j). Then, the
second-order directional derivative of f along the direction h ∈ Rn is given by

ω := d²/dt²|_{t=0} f(x + th) = ∑_i p_i h_i² − (∑_i p_i h_i)².
Observing that p_i > 0 and ∑_i p_i = 1, we see that ω is the variance (the
expectation of the square minus the squared expectation) of a discrete random
variable taking values h_i with probabilities p_i, and it is well known that the
variance of any random variable is always nonnegative. Here is a direct
verification of this fact:

(∑_i p_i h_i)² = (∑_i √p_i (√p_i h_i))² ≤ (∑_i p_i)(∑_i p_i h_i²) = ∑_i p_i h_i²,
and the concluding set is convex (as a sublevel set of a convex function). ♢
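The variance interpretation of ω can also be checked numerically; the following is a small sketch of ours, assuming NumPy and a finite-difference step of our choosing: for random x and h, the quantity ∑_i p_i h_i² − (∑_i p_i h_i)² is nonnegative and agrees with a central finite-difference approximation of d²/dt²|_{t=0} ln(∑_i exp(x_i + th_i)).

    import numpy as np

    rng = np.random.default_rng(3)
    x, h = rng.normal(size=5), rng.normal(size=5)

    p = np.exp(x) / np.exp(x).sum()              # the weights p_i from the computation above
    omega = p @ h**2 - (p @ h) ** 2              # variance of the discrete random variable

    g = lambda t: np.log(np.exp(x + t * h).sum())
    eps = 1e-4
    omega_fd = (g(eps) - 2 * g(0.0) + g(-eps)) / eps**2

    assert omega >= 0 and abs(omega - omega_fd) < 1e-5
    print(omega, omega_fd)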
Example III.13.5 The function f : Rn_+ → R given by

f(x) = −∏_{i=1}^{n} x_i^{α_i},

where α_i > 0 for all 1 ≤ i ≤ n and ∑_i α_i ≤ 1, is convex.
To prove convexity of f via Corollary III.13.4 all we need is to verify that for
any x ∈ Rn satisfying x > 0 and for any h ∈ Rn, we have d²/dt²|_{t=0} f(x + th) ≥ 0.
Let η_i := h_i/x_i; then direct computation shows that

d²/dt²|_{t=0} f(x + th) = [(∑_i α_i η_i)² − ∑_i α_i η_i²] f(x),

and this quantity is indeed nonnegative: by the Cauchy–Schwarz inequality and ∑_i α_i ≤ 1
we have (∑_i α_i η_i)² ≤ (∑_i α_i)(∑_i α_i η_i²) ≤ ∑_i α_i η_i², so the bracketed factor is
nonpositive, while f(x) ≤ 0.
is convex on int Rn_+, as the function g(y) = −ln y is convex on the positive ray. It
remains to note that taking the exponent preserves convexity by the Convex Monotone
superposition rule. ♢
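Under the sign convention used in Example III.13.5 (f = −∏_i x_i^{α_i}), the convexity claim can also be probed numerically; the random exponents summing to at most 1 and the NumPy tooling below are our own illustration.

    import numpy as np

    rng = np.random.default_rng(4)
    n = 4
    alpha = rng.uniform(size=n)
    alpha = 0.9 * alpha / alpha.sum()            # alpha_i > 0 with sum(alpha) <= 1

    f = lambda x: -np.prod(x ** alpha)           # the function of Example III.13.5

    for _ in range(1000):
        x, y = rng.uniform(0.1, 5.0, size=n), rng.uniform(0.1, 5.0, size=n)
        lam = rng.uniform()
        assert f(lam * x + (1 - lam) * y) <= lam * f(x) + (1 - lam) * f(y) + 1e-10
    print("convexity inequality holds on random positive pairs")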
Proof. Let y ∈ Q. There is nothing to prove if y ̸∈ Dom f (since then the left
hand side of the gradient inequality is +∞). Similarly, there is nothing to prove
when y = x. Thus, we can assume that y ̸= x and y ∈ Dom f . Let us set
yτ := x + τ (y − x), where 0 < τ ≤ 1,
so that y0 = x, y1 = y and yτ is an interior point of the segment [x, y] for
0 < τ < 1. Applying Lemma III.13.1 to the triple (x, x′ , x′′ ) taken as (x, yτ , y),
we get
(f(x + τ(y − x)) − f(x))/(τ∥y − x∥₂) ≤ (f(y) − f(x))/∥y − x∥₂;

as τ → +0, the left hand side in this inequality, by the definition of the gradient,
tends to (y − x)^⊤∇f(x)/∥y − x∥₂, and so we get

(y − x)^⊤∇f(x)/∥y − x∥₂ ≤ (f(y) − f(x))/∥y − x∥₂,
and as ∥y − x∥2 > 0 this is equivalent to
(y − x)⊤ ∇f (x) ≤ f (y) − f (x).
Note that this inequality is exactly the same as (13.4).
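As an aside, the gradient inequality (13.4) is easy to test numerically for a concrete smooth convex function; the sketch below (ours, assuming NumPy, with log-sum-exp as the test function) checks it at randomly drawn pairs of points.

    import numpy as np

    def f(x):
        return np.log(np.exp(x).sum())

    def grad_f(x):
        return np.exp(x) / np.exp(x).sum()

    rng = np.random.default_rng(5)
    for _ in range(1000):
        x, y = rng.normal(size=6), rng.normal(size=6)
        assert f(y) >= f(x) + grad_f(x) @ (y - x) - 1e-9
    print("gradient inequality (13.4) verified on random pairs")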
and only if the gradient inequality (13.4) is valid for every pair x ∈ int Q
and y ∈ Q.
Proof. Indeed, the “only if” part, i.e., the convexity of f on Q implying the
gradient inequality for all x ∈ int Q and all y ∈ Q, is given by Proposition III.13.7.
Let us prove the “if” part, i.e., establish the reverse implication. Suppose that
f satisfies the gradient inequality for all x ∈ int Q and all y ∈ Q, and let us
verify that f is convex on Q. As f is continuous on Q and Q is convex, by
Proposition III.13.3 it suffices to prove that f is convex on int Q. Recall also that
by Theorem I.1.29 int Q is convex. Moreover, due to the gradient inequality, on
int Q the function f is the supremum of a family of affine (and therefore convex)
functions: for all y ∈ int Q we have

f(y) = sup_{x∈int Q} f_x(y), where f_x(y) := f(x) + (y − x)^⊤∇f(x).

As affine functions are convex, by stability of convexity under taking pointwise
suprema we conclude that f is convex on int Q.
We shall prove this Theorem later in this section, after some preliminary effort.
Remark III.13.10 In Theorem III.13.9, all three assumptions on K, (1) closed-
ness, (2) boundedness, and (3) K ⊆ rint (Dom f ), are essential. The following
three examples illustrate their importance:
• Suppose f (x) = 1/x, then Dom f = (0, +∞). Consider K = (0, 1]. We have
assumptions (2) and (3) satisfied, but not (1). Note that f is neither bounded
nor Lipschitz continuous on K.
• Suppose f (x) = x2 , then Dom f = R. Consider K = R. We have (1) and (3)
satisfied, but not (2). Note that f is neither bounded nor Lipschitz continuous
on K.
• Suppose f(x) = −√x, then Dom f = [0, +∞). Consider K = [0, 1]. We have
(1) and (2) satisfied, but not (3). Note that f is not Lipschitz continuous
on K (indeed, we have lim_{t→+0} (f(0) − f(t))/t = lim_{t→+0} t^{−1/2} = +∞, while for a Lipschitz
continuous f the ratios t^{−1}(f(0) − f(t)) should remain bounded). On the other
hand, f is bounded on K. With a properly chosen convex function f of two
variables and a non-polyhedral compact domain (e.g., with Dom f being the unit
circle), we can also demonstrate that lack of (3), even in the presence of (1) and
(2), may cause unboundedness of f on K as well.
■
Theorem III.13.9 says that a convex function f is bounded on every compact
(i.e., closed and bounded) subset of rint(Dom f). In fact, regarding below boundedness,
in the case of convex functions we can make a much stronger statement: any convex
function f is bounded from below on any bounded subset of Rn!
Proof.
(i): 1⁰. We start with proving the above boundedness of f in a neighborhood of
x̄. This is immediate: by the premise of the proposition we have x̄ ∈ rint (Dom f ),
so there exists r̄ > 0 such that the neighborhood Ur̄ (x̄) is contained in Dom f .
Now, we can find a small simplex ∆ of the dimension m := dim(Aff(Dom f )) with
the vertices x0 , . . . , xm in Ur̄ (x̄) in such a way that x̄ will be a convex combination
of the vectors xi with positive coefficients, even with the coefficients 1/(m + 1),
i.e.,
x̄ = (1/(m + 1)) ∑_{i=0}^{m} x^i.
Here is the justification of this claim that such a simplex ∆ exists: First, when
Dom f is a singleton, the claim is evident. So, we assume that dim(Dom f ) =
m ≥ 1. Without loss of generality, we may assume that x̄ = 0, so that 0 ∈ Dom f
and therefore Aff(Dom f ) = Lin(Dom f ). Then, by Linear Algebra, we can find
m vectors y 1 , . . . , y m in Dom f which form a basis of Lin(Dom f ) = Aff(Dom f ).
Setting y^0 := −∑_{i=1}^{m} y^i and taking into account that 0 = x̄ ∈ rint (Dom f), we can
find ϵ > 0 such that the vectors x^i := ϵy^i, i = 0, . . . , m, belong to U_r̄(x̄). By
construction, x̄ = 0 = (1/(m + 1)) ∑_{i=0}^{m} x^i.
Note that x̄ ∈ rint (∆) (see Exercise I.3). Since ∆ spans the same affine sub-
space as Dom f , we can find a sufficiently small r > 0 such that r ≤ r̄ and
Ur (x̄) ⊆ ∆. Now, by definition,
∆ = {∑_{i=0}^{m} λ_i x^i : λ_i ≥ 0 ∀i, ∑_{i=0}^{m} λ_i = 1},
Proof. If f is the convex function that is identical to +∞, Dom f = ∅ and there
is nothing to prove. So, we assume that Dom f ̸= ∅.
We will first show that any local minimizer of f is also a global minimizer
of f . Let x∗ be a local minimizer of f on Q. Consider any y ∈ Q such that
y ̸= x∗ . We need to prove that f (y) ≥ f (x∗ ). If f (y) = +∞, this relation is
automatically satisfied. So, we assume that y ∈ Dom f . Note that by definition
of a local minimizer, we also have x∗ ∈ Dom f for sure. Now, for any τ ∈ (0, 1),
by Lemma III.13.1 we have
(f(x∗ + τ(y − x∗)) − f(x∗))/(τ∥y − x∗∥₂) ≤ (f(y) − f(x∗))/∥y − x∗∥₂.
Since x∗ is a local minimizer of f, the left hand side in this inequality is nonnegative
for all small enough values of τ > 0. Thus, we conclude that the right hand
side is nonnegative as well, i.e., f(y) ≥ f(x∗).
Note that Argmin_Q f is nothing but the sublevel set lev_α(f) of f associated with
α taken as the minimal value min_Q f of f on Q. Recall by Proposition III.12.6 any
sublevel set of a convex function is convex, so this sublevel set Argmin_Q f is convex.
Finally, let us prove that the set Argmin_Q f associated with a strictly convex f
is, if nonempty, a singleton. Assume for contradiction that there are two distinct
minimizers x′, x′′ in Argmin_Q f. Then, from strict convexity of f, we would have

f((1/2)x′ + (1/2)x′′) < (1/2)(f(x′) + f(x′′)) = min_Q f,

where the equality follows from x′, x′′ ∈ Argmin_Q f. But this strict inequality is
impossible since (1/2)x′ + (1/2)x′′ ∈ Q as Q is convex, and by definition of min_Q f we
cannot have a point in Q with objective value strictly smaller than min_Q f.
Proof. The necessity of the condition ∇f (x∗ ) = 0 for local optimality is due to
Calculus, and so it has nothing to do with convexity. The essence of the matter
is, of course, the sufficiency of the condition ∇f (x∗ ) = 0 for global optimality of
x∗ in the case of convex function f . In fact, this sufficiency is readily given by the
gradient inequality (13.4). In particular, when ∇f (x∗ ) = 0 holds, (13.4) becomes
f(y) ≥ f(x∗) + (y − x∗)^⊤∇f(x∗) = f(x∗), ∀y ∈ Q.
the set
TQ (x∗ ) := {h ∈ Rn : x∗ + th ∈ Q, ∀ small enough t > 0} .
Geometrically, this is the set of all directions “looking” from x∗ towards Q, so
that a small enough positive step from x∗ along the direction, i.e., adding to x∗
a small enough positive multiple of the direction, keeps the point in Q. That
is, T_Q(x∗) is the set of all "feasible" directions at x∗: starting from x∗, we
can go a positive distance along any such direction and remain in Q. From the convexity of Q it
immediately follows that the radial cone indeed is a cone (not necessarily closed).
For example, when x∗ ∈ int Q, we have T_Q(x∗) = Rn. Let us examine a more
interesting example, e.g., the polyhedral set

Q = {x ∈ Rn : a_i^⊤ x ≤ b_i, i = 1, . . . , m},    (14.3)

and its radial cone. For any x∗ ∈ Q, we define I(x∗) := {i : a_i^⊤ x∗ = b_i} as the
set of indices of constraints that are active at x∗ (i.e., those satisfied at x∗ as
equalities).
Proof. The necessity of this condition is an evident fact which has nothing to
do with convexity. Suppose that x∗ is a local minimizer of f on Q. Assume for
contradiction that there exists h ∈ T_Q(x∗) such that h^⊤∇f(x∗) < 0. Then, by
the definition of the radial cone, x∗ + th ∈ Q for all small enough t > 0, while, since
the directional derivative h^⊤∇f(x∗) is negative, f(x∗ + th) < f(x∗)
for all small enough positive t.
in every neighborhood of x∗ there are points x from Q with values f (x) strictly
smaller than f (x∗ ). This clearly contradicts the assumption that x∗ is a local
minimizer of f on Q.
Once again, the sufficiency of this condition is given by the gradient inequality,
exactly as in the case when x∗ ∈ int Q discussed in the proof of Theorem III.14.2.
Proposition III.14.3 states that under its premise the necessary and sufficient
condition for x∗ to minimize f on Q is the inclusion ∇f (x∗ ) ∈ −NQ (x∗ ). What
does this condition actually mean? The answer depends on what the normal cone
is: whenever we have an explicit description of it, we have an explicit form of the
optimality condition. For example,
• Consider the case of TQ (x∗ ) = Rn , i.e., x∗ ∈ int Q. Then, the normal cone
NQ (x∗ ) is the cone of all the vectors h that have nonpositive inner products with
every vector in Rn , i.e., NQ (x∗ ) = {0}. Consequently, in this case the necessary
and sufficient optimality condition of Proposition III.14.3 becomes the Fermat
rule ∇f (x∗ ) = 0, which we already know.
• When Q is an affine plane given by linear equalities Ax = b, A ∈ Rm×n , the
radial cone at every point x ∈ Q is the linear subspace {d : Ad = 0}, the normal
cone is the orthogonal complement {u = A⊤ v : v ∈ Rm } to this linear subspace,
and the optimality condition reads
• When Q is the polyhedral set (14.3), the radial cone is the polyhedral cone
(14.4), i.e., it is the set of all directions which have nonpositive inner products with
all a_i for i ∈ I(x∗) (recall that these a_i come from the constraints a_i^⊤ x ≤ b_i
specifying Q that are satisfied as equalities at x∗). The corresponding normal
cone is thus the set of all vectors which have nonpositive inner products with all
these directions in T_Q(x∗), i.e., of vectors a such that the inequality h^⊤a ≤ 0 is
a consequence of the inequalities h^⊤a_i ≤ 0, i ∈ I(x∗) = {i : a_i^⊤ x∗ = b_i}. From
the Homogeneous Farkas Lemma we conclude that the normal cone is simply
the conic hull of the vectors ai , i ∈ I(x∗ ). Thus, in this case our necessary and
sufficient optimality condition becomes:
Given Q = {x ∈ Rn : a_i^⊤ x ≤ b_i, i = 1, . . . , m}, a point x∗ ∈ Q, and a
function f that is convex on Q and differentiable at x∗, x∗ is a minimizer
of f on Q if and only if there exist nonnegative reals λ_i∗ ("Lagrange
multipliers") associated with the "active" indices i (those from I(x∗)) such
that

∇f(x∗) + ∑_{i∈I(x∗)} λ_i∗ a_i = 0,
Figure III.2. Physical illustration of the KKT optimality conditions for the optimization problem
min_{x∈R²} {f(x) : a_i(x) ≤ 0, i = 1, 2, 3}.
The white area represents the feasible domain Q, while the ellipses A, B, C represent the sets
a_1(x) ≤ 0, a_2(x) ≤ 0, a_3(x) ≤ 0. The point x is a candidate feasible solution located at
the intersection {u ∈ R² : a_1(u) = a_2(u) = 0} of the boundaries of A and B. g = −∇f(x)
is the external force acting at a particle located at x, and p and q are the reaction forces created by
the obstacles A and B. The condition for x to be at equilibrium reduces to g + p + q = 0,
as in the picture. The equilibrium condition g + p + q = 0 translates to the KKT equation

∇f(x) + λ_1∇a_1(x) + λ_2∇a_2(x) = 0

holding for some nonnegative λ_1, λ_2.
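For a concrete instance of this optimality condition, here is a small sketch of ours (assuming NumPy; the particular objective and the single-constraint set are chosen only for simplicity): with f(x) = ∥x − c∥₂² and Q a half-space {x : a^⊤x ≤ b}, the minimizer is the projection of c onto Q, and the stationarity equation ∇f(x∗) + λa = 0 holds with a nonnegative multiplier λ on the active constraint.

    import numpy as np

    rng = np.random.default_rng(6)
    n = 3
    a, c = rng.normal(size=n), rng.normal(size=n)
    b = a @ c - 1.0                              # forces the constraint a^T x <= b to be active at the optimum

    t = max(0.0, (a @ c - b) / (a @ a))          # projection step length
    x_star = c - t * a                           # projection of c onto {x : a^T x <= b}

    grad = 2.0 * (x_star - c)                    # gradient of f(x) = ||x - c||_2^2 at x*
    lam = 2.0 * t                                # candidate Lagrange multiplier

    assert lam >= 0.0 and np.allclose(grad + lam * a, 0.0)
    assert a @ x_star <= b + 1e-12
    print("KKT stationarity holds with lambda =", lam)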
In particular, Theorem III.14.6 states that given a convex function f the only
way for a point x∗ ∈ rint (Dom f ) to be a global maximizer of f is if the function
f is constant over its domain.
Next, we provide further information on maxima of convex functions.
Since the preceding inequality holds for any x̄ ∈ Conv E, we conclude that
sup_{x∈Conv E} f(x) ≤ sup_{x∈E} f(x) holds as well, as desired.
To prove (14.7), note that when S is a nonempty convex compact set, by Krein-
Milman Theorem (Theorem II.8.6) we have S = Conv(Ext(S)). Then, (14.7)
follows immediately from (14.6).
Our last theorem on maxima of convex functions is as follows.
Subgradients
is a convex set. Thus, there is no essential difference between convex functions and
convex sets: a convex function generates a convex set, i.e., its epigraph, which of
course remembers everything about the function. And the only specific property
of the epigraph as a convex set is that it always possesses a very specific recessive
direction, namely h = [0; 1]. That is, the ray {z + th : t ≥ 0} directed by h
belongs to the epigraph set whenever the starting point z of the ray is in the set.
Whenever a convex set possesses a nonzero recessive direction h such that −h is
not a recessive direction, the set in appropriate coordinates becomes the epigraph
of a convex function. Thus, a convex function is, basically, nothing but a way to
look, in the literal meaning of the latter verb, at a convex set.
Now, we know that the convex sets that are “actually nice” are the closed ones:
they possess a lot of important properties (e.g., admit a good outer description)
which are not shared by arbitrary convex sets. Therefore, among convex functions
there also are “actually nice” ones, namely those with closed epigraphs. Closed-
ness of the epigraph of a function can be “translated” to the functional language
and there it becomes a special kind of continuity, namely lower semicontinuity.
Before formally defining lower semicontinuity, let us do a brief preamble on con-
vergence of sequences on the extended real line. In the sequel, we will oper-
ate with limits of sequences {ai }i≥1 with terms ai from the extended real line
R̄ := R ∪ {+∞} ∪ {−∞}. These limits are defined in the natural way: the relation

lim_{i→∞} a_i = a ∈ R̄
With this "encoding," R̄ becomes the segment [−1, 1], and the relation a =
lim_{i→∞} a_i as defined above is the same as θ(a) = lim_{i→∞} θ(a_i), that is, this
relation stands for the usual convergence, as i → ∞, of the reals θ(a_i) to the real
θ(a). Note also that for a, b ∈ R̄ the relation a ≤ b (a < b) is exactly the same as
the usual arithmetic inequality θ(a) ≤ θ(b) (θ(a) < θ(b), respectively).
With convergence and limits of sequences {a_i}_i ⊆ R̄ already defined, we can
speak about upper (lower) limits of these sequences. For example, we can define
lim inf_{i→∞} a_i as the a ∈ R̄ uniquely specified by the relation θ(a) = lim inf_{i→∞} θ(a_i).
Same as with lower limits of sequences of reals, lim inf_{i→∞} a_i is the smallest (in
terms of the relation ≤ on R̄!) of the limits of converging (in R̄!) subsequences
of the sequence {a_i}_i.
It is time to come back to lower semicontinuity.
Proof. First, suppose epi{f } is closed, and let us prove that f is lsc. Consider
a sequence {xi }i such that xi → x as i → ∞, and let us prove that f (x) ≤ a :=
lim inf i→∞ f (xi ). There is nothing to prove when a = +∞. Assuming a < +∞, by
the definition of lim inf there exists a sequence i1 < i2 < . . . such that f (xij ) → a
as j → ∞. Let us assume that a > −∞ (we will verify later on that this is in fact
the case). Then, as the points [xij ; f (xij )] ∈ epi{f } converge to [x; a] and epi{f }
is closed, we see that [x; a] ∈ epi{f }, that is, f (x) ≤ a, as claimed. It remains
to verify that a > −∞. Indeed, assuming a = −∞, we conclude that for every
t ∈ R the points [xij ; t] belong to epi{f } for all large enough values of j, which,
as above, implies that [x; t] ∈ epi{f }, that is, t ≥ f (x). The latter inequality
cannot hold true for all real t, since f does not take value −∞; thus, a = −∞ is
impossible.
Now, for the opposite direction, let f be lsc, and let us prove that epi{f } is
closed. So, we should prove that if [xi ; ti ] → [x; t] as i → ∞ and [xi ; ti ] ∈ epi{f },
that is, ti ≥ f (xi ) for all i, then [x; t] ∈ epi{f }, that is, t ≥ f (x). Indeed, since f is
lsc and f (xi ) ≤ ti , we have f (x) ≤ lim inf i→∞ f (xi ) ≤ lim inf i→∞ ti = limi→∞ ti =
t.
An immediate consequence of Proposition III.15.2 is as follows:
is lower semicontinuous.
Proof. The epigraph of the function f is the intersection of the epigraphs of all
functions fα , and the intersection of closed sets is always closed.
Now let us look at convex, proper, and lower semicontinuous functions, that
is, functions Rn → R ∪ {+∞} with closed convex and nonempty epigraphs. To
save words, let us call these functions regular.
What we are about to do is to translate to the functional language several con-
structions and results related to convex sets. In the usual life, a translation (e.g.,
of poetry) typically results in something less rich than the original. In contrast
to this, in mathematics this is a powerful source of new ideas and constructions.
“Outer description” of a proper lower semicontinuous convex function.
We know that any closed convex set is the intersection of closed half-spaces. What
does this fact imply when the set is the epigraph of a regular function f ? First
of all, note that the epigraph is not a completely arbitrary convex set in Rn+1 : it
has the recessive direction e := [0n ; 1], i.e., the basic orth of the t-axis in the space
of variables x ∈ Rn , t ∈ R where the epigraph lives. This direction, of course,
should be recessive for every closed half-space
Π = {[x; t] ∈ Rn+1 : αt ≥ d^⊤x − a}, where |α| + ∥d∥₂ > 0,    (*)
containing epi{f }. Note that in (*) we are adopting a specific form of the nonstrict
linear inequality describing the closed half-space Π among many possible forms
in the space where the epigraph lives; this form is the most convenient for us
now. Thus, e should be a recessive direction of Π ⊇ epi{f }, and the recessiveness
of e for Π means exactly that α ≥ 0. Thus, speaking about closed half-spaces
containing epi{f }, we in fact are considering some of the half-spaces (*) with
α ≥ 0.
Now, there are two essentially different possibilities for α to be nonnegative:
(A) α > 0, and (B) α = 0. In the case of (B) the boundary hyperplane of Π is
“vertical,” i.e., it is parallel to e, and in fact it “bounds” only x. And, in such
cases, Π is the set of all vectors [x; t] with x belonging to certain half-space in the
x-subspace and t being an arbitrary real number. These “vertical” half-spaces
will be of no interest to us.
The half-spaces which indeed are of interest to us are the “nonvertical” ones:
those given by the case (A), i.e., with α > 0. For a non-vertical half-space Π,
we can always divide the inequality defining Π by α and make α = 1. Thus,
a “nonvertical” candidate eligible for the role of a closed half-space containing
epi{f } can always be written as
Π = {[x; t] ∈ Rn+1 : t ≥ d^⊤x − a}.    (**)
That is, a “nonvertical” closed half-space containing epi{f } can be represented
as the epigraph of an affine function of x.
Now, when is such a candidate indeed a half-space containing epi{f}? It is
clear that the answer is yes if and only if the affine function d^⊤x − a is less than
or equal to f(x) for all x ∈ Rn. This is precisely what we mean by saying that "d^⊤x − a
is an affine minorant of f." In fact, we have a very nice characterization of proper
lsc convex functions through their affine minorants!
there exists an affine function fx̄ (x) such that fx̄ (x) ≤ f (x) for all x ∈ Rn
and fx̄ (x̄) = f (x̄).
Proof. I. We will first prove that at every x̄ ∈ rint (Dom f ) there exists an affine
function fx̄ (x) such that fx̄ (x) ≤ f (x) for all x ∈ Rn and fx̄ (x̄) = f (x̄).
I.1⁰. First of all, we can easily reduce the situation to the one when Dom f is
full-dimensional. Indeed, by shifting f we can make Aff(Dom f ) to be a linear
subspace L in Rn ; restricting f onto this linear subspace, we clearly get a proper
function on L. If we believe that our statement is true for the case when Dom f
is full-dimensional, we can conclude that there exists an affine function on L, i.e.,
d⊤ x − a [where x ∈ L]
impossible, since in such a case the t-coordinate of the new endpoint would be
< f (x̄) while the x-component of it still would be x̄. Thus, ȳ ∈ rbd(epi{f }).
Next, we claim that ȳ ′ is an interior point of epi{f }. This is immediate: we know
from Theorem III.13.9 that f is continuous at x̄ (recall that x̄ ∈ int(Dom f )), so
that there exists a neighborhood U of x̄ in Aff(Dom f ) = Rn such that f (x) ≤
f (x̄) + 0.5 whenever x ∈ U , or, in other words, the set
V := {[x; t] : x ∈ U, t > f (x̄) + 0.5}
is contained in epi{f }; but this set clearly contains a neighborhood of ȳ ′ in Rn+1 .
We see that epi{f } is full-dimensional, so that rint(epi{f }) = int(epi f ) and
rbd(epi{f }) = bd(epi f ).
Now let us look at a hyperplane Π supporting cl(epi{f }) at the point ȳ ∈
rbd(epi{f }). W.l.o.g., we can represent this hyperplane via a nontrivial (i.e.,
with |α| + ∥d∥2 > 0) linear inequality
αt ≥ d⊤ x − a. (15.2)
satisfied everywhere on cl(epi{f }), specifically, as the hyperplane where this in-
equality holds true as equality. Now, inequality (15.2) is satisfied everywhere on
epi{f }, and therefore at the point ȳ ′ := [x̄; f (x̄) + 1] ∈ epi{f } as well, and is
satisfied as equality at ȳ = [x̄; f (x̄)] (since ȳ ∈ Π). These two observations clearly
imply that α ≥ 0. We claim that α > 0. Indeed, inequality (15.2) says that the
linear form h⊤ [x; t] := αt − d⊤ x attains its minimum over y ∈ cl(epi{f }), equal
to −a, at the point ȳ. Were α = 0, we would have h⊤ ȳ = h⊤ ȳ ′ , implying that
the set of minimizers of the linear form h⊤ y on the set cl(epi{f }) contains an
interior point (namely, ȳ ′ ) of the set. This is possible only when h = 0, that is,
α = 0, d = 0, which is not the case.
Now, as α > 0, by dividing both sides of (15.2) by α, we get a new inequality
of the form
t ≥ d⊤ x − a, (15.3)
(here we keep the same notation for the right hand side coefficients as we will
never come back to the old coefficients) which is valid on epi{f } and is equality
at ȳ = [x̄; f (x̄)]. Its validity on epi{f } implies that for all [x; t] with x ∈ Dom f
and t = f (x), we have
f (x) ≥ d⊤ x − a, ∀x ∈ Dom f. (15.4)
Thus, we conclude that the function d⊤ x − a is an affine minorant of f on Dom f
and therefore on Rn (f = +∞ outside Dom f !). Finally, note that the inequality
(15.4) becomes an equality at x̄, since (15.3) holds as equality at ȳ. The affine
minorant we have just built justifies the validity of the first claim of the propo-
sition.
II. Let F be the set of all affine functions which are minorants of f , and define
the function
f̄(x) := sup_{ϕ∈F} ϕ(x).
We have proved that f¯(x) is equal to f on rint (Dom f ) (and at any x ∈ rint (Dom f )
in fact sup in the right hand side can be replaced with max). To complete
the proof of the proposition, we should prove that f¯ is equal to f outside of
cl(Dom f ) as well. Note that this is the same as proving that f¯(x) = +∞ for all
x ∈ Rn \ cl(Dom f ). To see this, consider any x̄ ∈ Rn \ cl(Dom f ). As cl(Dom f )
is a closed convex set, x̄ can be strongly separated from Dom f , see Separation
Theorem (ii) (Theorem II.7.3). Thus, there exist ζ > 0 and z ∈ Rn such that
z ⊤ x̄ ≥ z ⊤ x + ζ, ∀x ∈ Dom f. (15.5)
In addition, we already know that there exists at least one affine minorant of f ,
i.e., there exist a and d such that
f (x) ≥ d⊤ x − a, ∀x ∈ Dom f. (15.6)
Multiplying both sides of (15.5) by a positive weight λ and then adding it to
(15.6), we get
f(x) ≥ (d + λz)^⊤x + [λζ − a − λz^⊤x̄] =: ϕ_λ(x), ∀x ∈ Dom f.
This inequality clearly says that ϕλ (·) is an affine minorant of f on Rn for every
λ > 0. The value of this minorant at x = x̄ is equal to d⊤ x̄ − a + λζ and therefore
it goes to +∞ as λ → +∞. We see that the supremum of affine minorants of f at
x̄ indeed is +∞, as claimed. This concludes the proof of Proposition III.15.5.
Let us now prove Proposition III.15.4.
Proof of Proposition III.15.4. Under the premise of the proposition, f is a
proper lsc convex function. Let F be the set of all affine functions which are
minorants of f , and let
f̄(x) := sup_{ϕ∈F} ϕ(x).
(as it is the supremum of a family of affine and thus convex functions), we have
f¯(xi ) ≤ (1 − λi )f¯(x̄) + λi f¯(x′ ).
Noting that f¯(x′ ) = f (x′ ) (recall x′ ∈ rint (Dom f ) and apply Proposition III.15.5)
as well and putting things together, we get
f (xi ) ≤ (1 − λi )f¯(x̄) + λi f (x′ ).
Moreover, as i → ∞, we have λi → +0 and so the right hand side in our inequality
converges to f¯(x̄) = c. In addition, as i → ∞, we have xi → x̄ and since f is
lower semicontinuous, we get f (x̄) ≤ c.
We see why “translation of mathematical facts from one mathematical language
to another” – in our case, from the language of convex sets to the language of
convex functions – may be fruitful: because we invest a lot into the process rather
than run it mechanically.
Closure of a convex function. Proposition III.15.4 presents a nice result on
the outer description of a proper lower semicontinuous convex function: it is the
supremum of a family of affine functions. Note that the reverse is also true: the
supremum of any family of affine functions is a proper lsc convex function,
provided that this supremum is finite at least at one point. This is because we
know from Section 13.1 that the supremum of any family of convex functions is
convex, and from Corollary III.15.3 that the supremum of lsc functions, e.g., affine
ones (which are in fact even continuous), is lower semicontinuous.
Now, what to do with a convex function which is not lower semicontinuous?
There is a similar question about convex sets: what to do with a convex set which
is not closed? We can resolve this question very simply by passing from the set to
its closure and thus getting a “much easier to handle” object which is very “close”
to the original one: the “main part” of the original set – its relative interior –
remains unchanged, and the “correction” adds to the set something relatively
small – (part of) its relative boundary. The same approach works for convex
functions as well: if a proper convex function f is not lower semicontinuous (i.e.,
its epigraph is convex and nonempty, but is not closed), we can “correct” the
function by replacing it with a new function with the epigraph being the closure
of epi{f }. To justify this approach, we, of course, should be sure that the closure
of the epigraph of a convex function is also an epigraph of such a function. This
indeed is the case, and to see it, it suffices to note that a set G in Rn+1 is the
epigraph of a function taking values in R ∪ {+∞} if and only if the intersection
of G with every vertical line {x = const, t ∈ R} is either empty, or is a closed
ray of the form {x = const, t ≥ t̄ > −∞}. Now, it is absolutely evident that if
G = cl(epi{f }), then the intersection of G with a vertical line is either empty, or
is a closed ray, or is the entire line (the last case indeed can take place – look at
the closure of the epigraph of the function equal to −1/x for x > 0 and +∞ for
x ≤ 0). We see that in order to justify our idea of “proper correction” of a convex
function we should prove that if f is convex, then the last of the indicated three
cases, i.e., the intersection of cl(epi{f }) with a vertical line is the entire line,
never occurs. However, we know from Proposition III.13.11 that every convex
function f is bounded from below on every compact set. Thus, cl(epi{f }) indeed
cannot contain an entire vertical line. Therefore, we conclude that the closure of
the epigraph of a convex function f is the epigraph of a certain function called
the closure of f [notation: cl f ] defined as:
cl(epi{f }) = epi{cl f }.
Of course, the function cl f is convex (its epigraph is convex as it is cl(epi{f })
and epi{f } itself is convex). Moreover, since the epigraph of cl f is closed, cl f is
lsc. And of course we have the following immediate observation.
Proof. Indeed, when f is convex, epi{f } is convex, and when f is lsc, epi{f }
is closed by Proposition III.15.2. Hence, under the premise of this observation,
epi{f } is convex and closed and thus, by definition of cl f , is the same as epi{cl f },
implying that f = cl f .
The following statement gives an instructive alternative description of cl f in
terms of f .
Also, for every x ∈ rint (Dom(cl f )) = rint (Dom f ), we can replace sup in
the right hand side of (15.7) with max.
Moreover,
(a) f (x) ≥ cl f (x), ∀x ∈ Rn ,
(b) f (x) = cl f (x), ∀x ∈ rint (Dom f ), (15.8)
(c) f (x) = cl f (x), ∀x ̸∈ cl(Dom f ).
Thus, the “correction” f 7→ cl f may vary f only at the points from rbd(Dom f ),
implying that
Dom f ⊆ Dom(cl f ) ⊆ cl(Dom f ),
hence rint (Dom f ) = rint (Dom(cl f )).
In addition, cl f is the supremum of all convex lower semicontinuous mi-
norants of f .
(ii) For all x ∈ Rn, we have

cl f(x) = lim_{r→+0} inf_{x′: ∥x′−x∥₂≤r} f(x′).
Dom f = Dom cl f = ∅, and all claims are trivially satisfied. Thus, assume from
now on that f is proper.
(i): We will prove this part in several simple steps.
1⁰. By construction, epi{cl f} = cl(epi{f}) ⊇ epi{f}, which implies (15.8.a).
Also, as cl(epi{f}) ⊆ [cl(Dom f)] × R, we arrive at (15.8.c).
2⁰. Note that from (15.8.a) we deduce that every affine minorant of cl f is an
affine minorant of f as well. Moreover, the reverse is also true. Indeed, let g(x) be
an affine minorant of f . Then, we clearly have epi{g} ⊇ epi{f }, and as epi{g} is
closed, we also get epi{g} ⊇ cl(epi{f }) = epi{cl f }. Note that epi{g} ⊇ epi{cl f }
is simply the same as saying that g is an affine minorant of cl f . Thus, affine
minorants of f and of cl f indeed are the same. Then, as cl f is lsc and proper
(since cl f ≤ f and f is proper), by applying Proposition III.15.4 to cl f and also
applying Proposition III.15.5 to f , we deduce (15.8.b) and (15.7).
Finally, if g is a convex lsc minorant of f , then g is definitely proper, and
thus by Proposition III.15.4 it is the supremum of all its affine minorants. These
minorants of g are affine minorants of f as well, and thus also affine minorants
of cl f. The bottom line is that g is ≤ the supremum of all affine minorants of f,
which, as we already know, is cl f. Thus, every convex lsc minorant of f is a minorant
of cl f, implying that the supremum f̃ of these lsc convex minorants of f satisfies
f̃ ≤ cl f. The latter inequality is in fact an equality since cl f itself is a convex
lsc minorant of f by (15.8.a). This completes the proof of part (i).
3⁰. To verify (ii) we need to prove the following two facts:
To prove (ii-1), note that under the premise of this claim we have s ̸= −∞ since
f is below bounded on bounded subsets of Rn (Proposition III.13.11). There is
nothing to verify when s = +∞. So, suppose s ∈ R. Then, the point [x̄; s] is in
cl(epi{f }) = epi{cl f }, and thus cl f (x̄) ≤ s, as claimed.
To prove (ii-2), note that the claim is trivially true when cl f (x̄) = +∞. Indeed,
in this case f (x̄) = +∞ as well due to (15.8.a), and for all i = 1, 2, . . . we can
take xi = x̄. Now, consider a point x̄ such that cl f (x̄) < ∞. Then, we have
[x̄; cl f (x̄)] ∈ epi{cl f } = cl(epi{f }). Thus, there exists a sequence [xi ; ti ] ∈ epi{f }
such that [xi ; ti ] → [x̄; cl f (x̄)] as i → ∞. Passing to a subsequence, we can assume
that f (xi ) have a limit, finite or infinite, as i → ∞. Hence, limi→∞ xi = x̄ and
limi→∞ f (xi ) = lim inf i→∞ f (xi ) ≤ limi→∞ ti = cl f (x̄). Recall also that from
(ii-1) we have limi→∞ f (xi ) ≥ cl f (x̄), so we conclude limi→∞ f (xi ) = cl f (x̄) as
desired.
15.2 Subgradients
Let f : Rn → R∪{+∞} be a convex function, and let x ∈ Dom f . Recall from our
discussion in the preceding section that f may admit an affine minorant d⊤ x − a
which coincides with f at x, i.e.,
f (y) ≥ d⊤ y − a, ∀y ∈ Rn , and f (x) = d⊤ x − a.
The equality relation above is equivalent to a = d⊤ x − f (x), and substituting this
representation of a into the first inequality, we get
f (y) ≥ f (x) + d⊤ (y − x), ∀y ∈ Rn . (15.9)
Thus, if f admits an affine minorant which is exact at x, then there exists d ∈ Rn
which gives rise to the inequality (15.9). In fact the reverse is also true: if d is
such that (15.9) holds, then the right hand side of (15.9), regarded as a function
of y, is an affine minorant of f which coincides with f at x.
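For instance (an informal numerical sketch of ours, assuming NumPy), for the nonsmooth convex function f(y) = ∥y∥₁ the vector d = sign(x) satisfies (15.9) at any point x, which can be probed on random points y:

    import numpy as np

    rng = np.random.default_rng(7)
    n = 5
    x = rng.normal(size=n)

    f = lambda z: np.abs(z).sum()                # f(z) = ||z||_1
    d = np.sign(x)                               # a subgradient of ||.||_1 at x

    for _ in range(1000):
        y = rng.normal(size=n)
        assert f(y) >= f(x) + d @ (y - x) - 1e-12
    print("the inequality (15.9) holds for ||.||_1 with d = sign(x)")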
Now note that (15.9) expresses a specific property of a vector d, and it leads to
the following very important definition, which generalizes the notion of gradient
from smooth convex functions to nonsmooth convex functions.
Subgradients of convex functions play an important role in the theory and nu-
merical methods for Convex Optimization – they are quite reasonable surrogates
of gradients in the cases when the latter do not exist. Let us present a simple and
instructive illustration of this. Recall that Theorem III.14.2 states that
A necessary and sufficient condition for a convex function f : Rn →
R ∪ {+∞} to attain its minimum at a point x∗ ∈ int(Dom f ) where f is
differentiable is that ∇f (x∗ ) = 0.
The “nonsmooth” version of this statement is as follows.
Proof. (i): Closedness and convexity of ∂f(x) are evident from the definition, as
(15.9) is an infinite system of nonstrict linear inequalities, indexed by y ∈ Dom f,
in the variable d.
When x ∈ rint (Dom f ), Proposition III.15.5 provides us with an affine function
which underestimates f everywhere and coincides with f at x. The slope of this
affine function is clearly a subgradient of f at x, and thus ∂f (x) ̸= ∅.
Boundedness of ∂f (x) when x ∈ int(Dom f ) is an immediate consequence of
item (iv) to be proved soon.
(ii): Suppose x ∈ int(Dom f ) and f is differentiable at x. Then, by the gradient
inequality we have ∇f (x) ∈ ∂f (x). Let us prove that in this case, ∇f (x) is the
only subgradient of f at x. Consider any d ∈ ∂f (x). Then, by the definition of
subgradient, we have f(y) ≥ f(x) + d^⊤(y − x) for all y ∈ Rn.
Now, consider any fixed direction h ∈ Rn and any real number t > 0. By substi-
tuting y = x + th in the preceding inequality and then dividing both sides of the
resulting inequality by t, we obtain
(f(x + th) − f(x))/t ≥ d^⊤h.
Taking the limit of both sides of this inequality as t → +0, we get
h⊤ ∇f (x) ≥ h⊤ d.
Since h was an arbitrary direction, this inequality is valid for all h ∈ Rn , which
is possible if and only if d = ∇f (x).
(iii): Under the premise of this part, for every y ∈ Rn and for all i = 1, 2, . . .,
we have
♢
It is important to note that at the points from the relative boundary of the
domain of a convex function, even a “good” one, we may not have any subgradi-
ents. That is, it is possible to have ∂f (x) = ∅ for a convex function f at a point
x ∈ rbd(Dom f ). We give an example of this next.
Example III.15.2 Consider the function
f(x) = −√x, if x ≥ 0, and f(x) = +∞, if x < 0.
Convexity of this function follows from convexity of its domain and Example III.12.2.
Consider the point [0; f (0)] ∈ rbd(epi{f }). It is clear that at this point [0; f (0)]
there is no non-vertical supporting line to the set epi{f }, and, consequently, there
is no affine minorant of the function which is exact at x = 0. ♢
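The failure of the subgradient inequality at x = 0 can also be seen numerically; the following sketch is ours and assumes NumPy: for every candidate slope d, there is a small t > 0 at which −√t drops below the affine candidate f(0) + dt (nonnegative slopes fail at every t > 0, and a negative slope d fails for t < 1/d²).

    import numpy as np

    f = lambda x: -np.sqrt(x)                    # the function of Example III.15.2 on [0, +infinity)

    for d in [-1.0, -10.0, -1e3, -1e6]:
        t = 0.25 / d ** 2                        # a point where the candidate affine minorant overshoots f
        assert f(t) < f(0.0) + d * t
    print("no tested slope d is a subgradient of f at x = 0")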
= Tr(X[sign(λ_X) e_X e_X^⊤]),

where the third equality follows from the choice of e_X, and the last equality is
due to Tr(X[e_X e_X^⊤]) = λ_X. Then, using item 2 in our subgradient calculus rules,
we conclude that the symmetric matrix E(X) := sign(λ_X) e_X e_X^⊤ is a subgradient
of the function ∥ · ∥ at X. That is,

∥Y∥ ≥ Tr(Y E(X)) = ∥X∥ + Tr(E(X)(Y − X)), ∀Y ∈ Sn,
where the equality holds due to the choice of E(X) guaranteeing ∥X∥ = Tr(XE(X)).
To see that the above is indeed a subgradient inequality, recall that the inner
product on Sn is the Frobenius inner product ⟨A, B⟩ = Tr(AB).
Let us close this example by discussing smoothness properties of ∥X∥. Recall
that every norm ∥y∥ in Rm is nonsmooth at the origin y = 0. However, ∥X∥ is
a nonsmooth function of X even at points other than X = 0. In fact, ∥ · ∥ is
continuously differentiable in a neighborhood of every point X ∈ Sn where the
Proof of Theorem III.15.12. In this proof we will show that Theorem III.15.13
implies Theorem III.15.12.
Consider any d ∈ ∂f(x). Then, by Lemma III.15.11, d is in the subdifferential
of Df(x)[·] at the origin, i.e., Df(x)[h] ≥ d^⊤h for all h. Thus, we conclude

Df(x)[h] ≥ max_d {d^⊤h : d ∈ ∂f(x)}.
To prove the opposite inequality, let us fix h ∈ Rn , and let us verify that
g := Df (x)[h] ≤ maxd {d⊤ h : d ∈ ∂f (x)}. There is nothing to prove when
h = 0, so let h ̸= 0. Setting ϕ(t) := Df (x)[th], we get a convex (since, as we
already know, Df (x)[·] is convex) univariate function such that ϕ(t) = gt for
t ≥ 0. Then, this together with the convexity of ϕ implies that ϕ(t) ≥ gt for all
t ∈ R. By applying Hahn-Banach Theorem to the function D(z) := Df (x)[z] (we
already know that this function satisfies the premise of Hahn-Banach Theorem),
E := R(h) and the linear form e⊤ (th) = gt, t ∈ R, on E, we conclude that there
exists e ∈ Rn such that e⊤ u ≤ Df (x)[u] for all u ∈ Rn and e⊤ h = g = Df (x)[h].
Thus, the right hand side in (15.11) is greater than or equal to the left hand side.
The opposite inequality has already been proved, so that (15.11) is an equality.
Remark III.15.14 The reasoning in the proof of Theorem III.15.12 implies the
following fact:
Let f be a convex function. Consider any x ∈ int(Dom f) and any affine
plane M such that x ∈ M. Then, "every subgradient, taken at x, of the
restriction f|_M of f onto M can be obtained from a subgradient of f."
That is, if e is such that f(y) ≥ f(x) + e^⊤(y − x) for all y ∈ M, then
there exists e′ ∈ ∂f(x) such that e^⊤(y − x) = (e′)^⊤(y − x) for all y ∈ M.
■
greater than or equal to every possible value of the right hand side in (b), that is,
that D(g + h) − e⊤ h ≥ −D(−g + h′ ) + e⊤ h′ whenever h, h′ ∈ E. By rearranging
the terms, we thus need to show that
where the first inequality follows from convexity of the function D, the second
equality is due to D being positively homogeneous of degree 1, and the last
inequality is due to the facts that h + h′ ∈ E and e⊤ z is majorized by D(z) on
E. Hence, (15.12) is proved.
Remark III.15.15 The advantage of the preceding proof of the Hahn-Banach Theorem
in the finite dimensional case is that it combines straightforwardly with what is
called transfinite induction to yield the Hahn-Banach Theorem in the case when Rn
is replaced with an arbitrary, perhaps infinite dimensional, linear space: a linear
functional defined on a linear subspace and majorized there by a given convex,
positively homogeneous of degree 1, function can be extended to the entire space
with the majorization preserved.
In the finite-dimensional case, alternatively, we can prove Hahn-Banach The-
orem, via Separation Theorem as follows: without loss of generality we can as-
sume that E ̸= Rn . Define the sets T := {[x; t] : x ∈ Rn , t ≥ D(x)} and
S := {[h; t] : h ∈ E, t = e⊤ h}. Thus, we get two nonempty convex sets with
non-intersecting relative interiors (as E ̸= Rn and D(h) majorizes e⊤ h on E).
Then, by Separation Theorem there exists a nontrivial (r ̸= 0) linear functional
r⊤ [x; t] ≡ d⊤ x+at, which separates S and T , i.e., inf y∈T r⊤ y ≥ supy∈S r⊤ y. More-
over, since S is a linear subspace, we deduce that supy∈S r⊤ y is either 0 or +∞.
Also, as T ̸= ∅, we conclude +∞ > inf y∈T r⊤ y ≥ supy∈S r⊤ y, and thus we must
have supy∈S r⊤ y = 0. In addition, we claim that a > 0. Indeed,
As the right hand side value must be strictly greater than −∞, we see a ≥ 0.
Also, if a = 0 were to hold, then from r ̸= 0 we must have d ̸= 0. Moreover,
when a = 0 and d ̸= 0, since D(·) is a finite valued function we have
inf_{x∈Rn, t∈R} {d^⊤x + at : t ≥ D(x)} = −∞. But this contradicts the infimum
being bounded below by 0. Now that a > 0, by multiplying r by a^{−1}, we get a
= sup_{h∈E} {(−e′)^⊤h + e^⊤h},

and so we conclude that (e′)^⊤h = e^⊤h holds for all h ∈ E. Note also that the relation
0 = sup_{[h;t]∈S} r^⊤[h; t] ≤ inf_{[x;t]∈T} r^⊤[x; t] is nothing but (e′)^⊤x ≤ D(x) for all
x ∈ Rn. ■
The Hahn-Banach Theorem is extremely important in its own right, and our way
of proving Theorem III.15.12 was motivated by the desire to acquaint the reader
with it. If justification of Theorem III.15.12 were our sole goal, we could have
achieved it in a much broader setting and at a cheaper cost; see the solution to
Exercise IV.29.D.4.
16
⋆ Legendre transform
f(x) ≥ d^⊤x − a, ∀x ∈ Rn,

a ≥ d^⊤x − f(x), ∀x ∈ Rn,

a ≥ sup_{x∈Rn} {d^⊤x − f(x)}.
The supremum in the right hand side of this inequality is a certain function of d,
and we arrive at the following important definition.
Let us see some examples of simple functions and their Legendre transforms.
Example III.16.1
1. Given a ∈ R, consider the constant function f(x) ≡ a. Its Legendre transform is given by

f∗(d) = sup_{x∈Rn} {d^⊤x − f(x)} = sup_{x∈Rn} {d^⊤x − a} = −a if d = 0, and +∞ otherwise.

2. Consider the affine function f(x) = c^⊤x + a, ∀x ∈ Rn. Its Legendre transform is given by

f∗(d) = sup_{x∈Rn} {d^⊤x − f(x)} = sup_{x∈Rn} {d^⊤x − (c^⊤x + a)} = −a if d = c, and +∞ otherwise.

3. Consider the strictly convex quadratic function

f(x) = (1/2) x^⊤Ax,

where A ∈ Sn is positive definite. Its Legendre transform is given by

f∗(d) = sup_{x∈Rn} {d^⊤x − f(x)} = sup_{x∈Rn} {d^⊤x − (1/2) x^⊤Ax} = (1/2) d^⊤A^{−1}d,

where the final equality holds by examining the first-order necessary and sufficient
optimality condition (for a maximization-type objective) of differentiable concave functions.

4. Consider the function f : R → R given by f(x) = |x|^p/p, where p ∈ (1, ∞).
Then, using the first-order optimality conditions, we see that the Legendre transform of f is given by

f∗(d) = sup_{x∈R} {dx − |x|^p/p} = |d|^q/q,

where q satisfies 1/p + 1/q = 1.

5. Suppose f is a proper convex function and the function g is defined by g(x) = f(x − a).
Then, the Legendre transform of g satisfies

g∗(d) = sup_{x∈Rn} {d^⊤x − g(x)} = sup_{x∈Rn} {d^⊤x − f(x − a)} = d^⊤a + f∗(d).
♢
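Item 3 of the example is easy to double-check numerically; the sketch below is ours, assumes NumPy, and generates a random positive definite matrix purely for illustration: the supremum of d^⊤x − (1/2)x^⊤Ax is attained at x = A^{−1}d and equals (1/2)d^⊤A^{−1}d.

    import numpy as np

    rng = np.random.default_rng(8)
    n = 4
    M = rng.normal(size=(n, n))
    A = M @ M.T + n * np.eye(n)                  # a positive definite matrix

    f = lambda x: 0.5 * x @ A @ x
    d = rng.normal(size=n)

    x_star = np.linalg.solve(A, d)               # maximizer of d^T x - f(x), from the first-order condition
    value_at_x_star = d @ x_star - f(x_star)
    closed_form = 0.5 * d @ np.linalg.solve(A, d)

    assert abs(value_at_x_star - closed_form) < 1e-10
    print(value_at_x_star, closed_form)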
Proof. First, by Fact III.16.2 f ∗ is a proper lsc convex function, so that f ∗∗ , once
again by Fact III.16.2, is a proper lsc convex function as well. Next, by definition,
f∗∗(x) = (f∗)∗(x) = sup_{d∈Rn} {x^⊤d − f∗(d)} = sup_{d∈Rn, a≥f∗(d)} {d^⊤x − a}.

Now, recall from the origin of the Legendre transform that a ≥ f∗(d) if and only
if the affine function d^⊤x − a is a minorant of f. Thus, sup_{d∈Rn, a≥f∗(d)} {d^⊤x − a}
is exactly the supremum of all affine minorants of f , and this supremum, by
Proposition III.15.7 is nothing but the closure of f . Finally, when f is proper
convex and lsc, f = cl f by Observation III.15.6, that is f ∗∗ = cl f is the same as
f ∗∗ = f .
The Legendre transform is a very powerful descriptive tool: it is a "global"
transformation, so that local properties of f∗ correspond to global properties
of f. Below we give a number of important consequences of the Legendre transform
highlighting this.
Let f be a proper convex lsc function.
A. By Proposition III.16.3, the Legendre transform f∗(d) = sup_x {x^⊤d − f(x)} is
a proper convex lsc function and f(x) = sup_d {x^⊤d − f∗(d)}. Since f∗(d) ≥
d^⊤x − f(x) for all x, we have

x^⊤d ≤ f(x) + f∗(d), ∀x, d ∈ Rn. (16.1)
Moreover, inequality in (16.1) becomes equality if and only if x ∈ Dom f and
d ∈ ∂f (x), same as if and only if d ∈ Dom f ∗ and x ∈ ∂f ∗ (d).
Proof. All we need is to justify the “moreover” part of the claim. Let x, d ∈
Rn , and let us prove that d⊤ x = f (x) + f ∗ (d) if and only if d ∈ ∂f (x). In one
To justify this claim, note that when x = 0 we can select any y with ∥y∥_q = 1.
When x ≠ 0 and p < ∞ we can set

y_i := ∥x∥_p^{1−p} |x_i|^{p−1} sign(x_i), ∀i = 1, . . . , n,

where we set 0^{p−1} = 0 when p = 1. Finally, when p = ∞, that is, q = 1, we can
find an index i∗ of the largest in magnitude entry of x and set y_{i∗} = sign(x_{i∗})
and y_i = 0 for all i ≠ i∗.
These observations altogether lead us to an extremely important, although
simple, fact:

∥x∥_p = max_y {y^⊤x : ∥y∥_q ≤ 1}, where 1/p + 1/q = 1.    (16.4)

Based on this, we, in particular, deduce that ∥x∥_p is convex (as the maximum
of a family of linear functions). Hence, by its convexity we deduce that for any
x′, x′′ we have

∥x′ + x′′∥_p = 2∥(1/2)x′ + (1/2)x′′∥_p ≤ 2(∥x′∥_p/2 + ∥x′′∥_p/2) = ∥x′∥_p + ∥x′′∥_p,

which is nothing but the triangle inequality. Thus, ∥x∥_p satisfies the triangle
inequality; it clearly possesses the other two characteristic properties of a norm,
namely positivity and homogeneity, as well. Consequently, ∥ · ∥_p is a norm—a fact
that we announced twice and already proved (see Example III.13.1).
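The maximizer y described above is also convenient for a quick numerical check of (16.4); this is a sketch of ours, assuming NumPy, with a particular exponent p chosen for illustration: y has ∥y∥_q = 1 and y^⊤x = ∥x∥_p.

    import numpy as np

    rng = np.random.default_rng(9)
    x = rng.normal(size=6)
    p = 3.0
    q = p / (p - 1.0)                            # conjugate exponent, 1/p + 1/q = 1

    # the maximizer from the text: y_i = ||x||_p^{1-p} |x_i|^{p-1} sign(x_i)
    y = np.linalg.norm(x, p) ** (1.0 - p) * np.abs(x) ** (p - 1.0) * np.sign(x)

    assert abs(np.linalg.norm(y, q) - 1.0) < 1e-10
    assert abs(y @ x - np.linalg.norm(x, p)) < 1e-10
    print(np.linalg.norm(x, p), y @ x)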
Proof. Let ρ, σ ∈ (0, 1] be such that ρ ̸= σ. Consider any λ ∈ (0, 1) and set
π := λρ+(1−λ)σ. By defining θ := λρ/π, we get 0 < θ < 1 and 1−θ = (1−λ)σ/π.
Let r := 1/ρ, s := 1/σ, and p := 1/π, so we have p = θr + (1 − θ)s. Then,
∑_{i=1}^{n} |a_i|^p = ∑_{i=1}^{n} |a_i|^{θr} |a_i|^{(1−θ)s} ≤ (∑_{i=1}^{n} |a_i|^r)^θ (∑_{i=1}^{n} |a_i|^s)^{1−θ},

where the inequality follows from Hölder's inequality. Raising both sides of the
resulting inequality to the power 1/p we arrive at

∥a∥_p ≤ ∥a∥_r^{rθ/p} ∥a∥_s^{s(1−θ)/p} = ∥a∥_{1/ρ}^{λ} ∥a∥_{1/σ}^{1−λ}.
As its name implies, one can indeed show that this function ∥d∥∗ is a norm.
For example, when p ∈ [1, ∞], (16.4) says that the norm conjugate to ∥ · ∥_p is
∥ · ∥_q where 1/p + 1/q = 1.
We also have the following characterization of the Legendre transform of norms.
where the inequality follows from the convexity of f and the second equality is
due to the fact that f is permutation symmetric.
Our developments will also rely on the following fundamental fact.
= Tr(e_i^⊤ (U Diag{λ_1(X), . . . , λ_n(X)} U^⊤) e_i)
= Tr(U^⊤ e_i e_i^⊤ U Diag{λ_1(X), . . . , λ_n(X)})
= ∑_{j=1}^{n} u_{ij}² λ_j(X) = [Πλ(X)]_i.
Let us denote by On the set of all n × n orthogonal matrices. Lemmas III.17.1
and III.17.2 together give us the following very useful relation.
Proof. The first claim immediately follows from Lemmas III.17.1 and III.17.2.
To see the second claim, consider any V ∈ On . Note that the matrix V ⊤ XV has
the same eigenvalues as X. Then, as f is a convex and permutation symmetric
function, applying the first claim of this proposition to the matrix V ⊤ XV , we
conclude
f (Dg{V ⊤ XV }) ≤ f (λ(V ⊤ XV )) = f (λ(X)).
Taking the supremum over V ∈ On of both sides of this relation gives us
f(λ(X)) ≥ sup_{V∈On} f(Dg{V^⊤XV}).
Note that ∥x − y∥2 is a convex function of x and y over the convex domain
{[x; y] ∈ Rn ×Rn : y ∈ Q}, and as convexity is preserved by partial minimization,
f (x) is a convex real-valued function. Permutation symmetry of Q and ∥·∥2 clearly
implies permutation symmetry of f . Then, by Proposition III.17.3 the function
F (X) := f (λ(X)) is a convex function of X. From the definition of the function
f , we have z ∈ Q if and only if f (z) = 0, which holds if and only if f (z) ≤ 0.
Thus,
Q = {X ∈ Sn : f (λ(X)) ≤ 0} = {X ∈ Sn : F (X) ≤ 0} ,
that is Q is a sublevel set of a convex function F (X) of X, and is hence convex.
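Here is a small numerical probe of this mechanism (our own sketch, assuming NumPy): taking f as the sum of the k largest entries, a convex and permutation symmetric function, the induced matrix function F(X) = f(λ(X)) passes the convexity inequality on random symmetric matrices.

    import numpy as np

    def random_sym(rng, n):
        M = rng.normal(size=(n, n))
        return (M + M.T) / 2.0

    def F(X, k=2):
        # f(lambda(X)) with f = sum of the k largest entries
        return np.sort(np.linalg.eigvalsh(X))[::-1][:k].sum()

    rng = np.random.default_rng(10)
    n = 5
    for _ in range(200):
        X, Y = random_sym(rng, n), random_sym(rng, n)
        lam = rng.uniform()
        assert F(lam * X + (1 - lam) * Y) <= lam * F(X) + (1 - lam) * F(Y) + 1e-9
    print("F(X) = sum of the 2 largest eigenvalues passes the convexity check")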
Consider a univariate real-valued function g defined on some set Dom g ⊆ R.
In section D.1.5 we have associated with the function g(·) the matrix-valued
map X 7→ g(X) : Sn → Sn as follows: the domain of this map is composed
of all matrices X ∈ Sn with the spectrum σ(X) (subset of R composed of all
eigenvalues of X) contained in Dom g, and for such a matrix X, we set
g(X) := U Diag{g(λ1 ), . . . , g(λn )}U ⊤ ,
where X = U Diag{λ1 , . . . , λn }U ⊤ is an eigenvalue decomposition of X.
The following fact is quite important:
function

F(X) := { Tr(g(X)), if σ(X) ⊆ Dom g; +∞, otherwise } : Sn → R ∪ {+∞}

is convex.
We close this chapter with the following very useful fact from majorization.
is convex.
3. Let A_i, i ≤ I, be as in item 2. Is it true that the function

g(x) = ln Det((∑_i x_i A_i)^{−1}) : {x ∈ R^I : x > 0} → R

is convex?
4. Let B_i, i ≤ I, be m_i × n matrices such that ∑_i B_i^⊤ B_i ≻ 0, and let
is convex.
5. Let B_i, i ≤ I, and Λ be as in the previous item. Prove that the matrix-valued function

F(λ) = [∑_i B_i^⊤ λ_i^{−1} B_i]^{−1} : Λ → int S^n_+
where X ⊂ Rn is a nonempty convex set, f(·) : X → R is concave, and G(x) = [g_1(x); . . . ; g_m(x)] :
X → Rm is a vector-function with convex components, and let R be the set of those r for which
(P[r]) is feasible. Prove that
1. R is a convex set with nonempty interior and this set is monotone, meaning that when r ∈ R
and r′ ≥ r, one has r′ ∈ R.
2. The function Opt(r) : R → R ∪ {+∞} satisfies the concavity inequality:
∀(r, r′ ∈ R, λ ∈ [0, 1]) : Opt(λr + (1 − λ)r′ ) ≥ λOpt(r) + (1 − λ)Opt(r′ ). (!)
3. If Opt(r) is finite at some point r̄ ∈ int R, then Opt(r) is real-valued everywhere on R.
Moreover, when X = Rn , and f and the components of G are affine, so that (P [r]) is an
LP program, we can replace in the above claim the inclusion r ∈ int R with the inclusion
r ∈ R: in the LP case, the function Opt(r) is either identically +∞ everywhere on R, or is
real-valued at every point of R.
Comment. Think about problem (P[r]) as a problem where r is the vector of resources you
have, and f(·) is your profit, so that the problem is to maximize your profit given your
resources and the "technological constraints" x ∈ X. Now let r̄ ∈ R and e be a nonnegative vector,
and let us look at what happens when you select your vector of resources on the ray R = r̄ + R₊e,
assuming that Opt(r) on this ray is real-valued. Restricted to this ray, your best profit becomes
a function ϕ(t) of the nonnegative variable t:

ϕ(t) = Opt(r̄ + te).

Since e ≥ 0, this function is nondecreasing, as it should be: the larger t, the more resources you
have, and the larger is your profit. A not-so-nice piece of news is that ϕ(t) is concave in t, meaning that
the slope of this function does not increase as t grows. In other words, if it costs you $1 to pass
from resources r̄ + te to resources r̄ + (t + 1)e, the return ϕ(t + 1) − ϕ(t) on one extra dollar of
your investment goes down (or at least does not go up) as t grows. This is called The Law of
Diminishing Marginal Returns.
Exercise III.5 ▲ [follow-up to Exercise III.4] There are n goods j with per-unit prices
c_j > 0, per-unit utilities v_j > 0, and maximum available amounts x̄_j, j ≤ n. Given a budget
R ≥ 0, you want to decide on the amounts x_j of goods to be purchased so as to maximize the total utility
of the purchased goods, while respecting the budget and the availability constraints. Pose the
problem as an optimization problem and verify that the optimal value Opt(R) is a piecewise linear function of R. What
are the breakpoints of this function? What are the slopes between breakpoints?
where, as always, s_k(x) = ∑_{i=1}^{k} x_{(i)}. As we know from Exercise I.29, the functions s_k(x), k < n,
are polyhedrally representable:

t ≥ s_k(x) ⇐⇒ ∃z ≥ 0, s : x_i ≤ z_i + s, i ≤ n, ∑_i z_i + ks ≤ t,

This polyhedral representation has 2n² − n linear inequalities and n² + n − 2 extra variables.
Now goes the exercise:
1. Find an alternative polyhedral representation of f with n² + 1 linear inequalities and 2n
extra variables.
2. [computational study] Generate at random an orthogonal n × n matrix U and a vector β with
nonincreasing entries, and solve numerically the problem

min_x { f(x) := ∑_k β_k x_{(k)} : ∥Ux∥_∞ ≤ 1 }

utilising the above polyhedral representations of f. For n = 8, 16, 32, . . . , 1024, compare the
running times corresponding to the two representations in question.
Exercise III.7 ♦ Let a ∈ Rn be a nonzero vector, and let f(ρ) = ln(∥a∥_{1/ρ}), ρ ∈ [0, 1].
The Moment inequality, see Section 16.3.3, states that f is convex. Prove that the function is also
nonincreasing and Lipschitz continuous, with Lipschitz constant ln n, or, which is the same, that

1 ≤ p ≤ p′ ≤ ∞ =⇒ ∥a∥_p ≥ ∥a∥_{p′} ≥ n^{1/p′ − 1/p} ∥a∥_p.
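Before attempting a proof, the claimed inequalities are easy to sanity-check numerically; the sketch below is our own, assumes NumPy, and is of course not a substitute for the proof requested in the exercise.

    import numpy as np

    rng = np.random.default_rng(11)
    n = 7
    a = rng.normal(size=n)

    for _ in range(200):
        p = rng.uniform(1.0, 20.0)
        pp = rng.uniform(p, 40.0)                # exponents with p <= p'
        lhs, rhs = np.linalg.norm(a, p), np.linalg.norm(a, pp)
        assert lhs >= rhs - 1e-9
        assert rhs >= n ** (1.0 / pp - 1.0 / p) * lhs - 1e-9
    print("norm monotonicity and the n^(1/p'-1/p) bound hold on random exponents")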
Exercise III.8 ▲ This Exercise demonstrates the power of the Symmetry Principle. Consider the
situation as follows: you are given noisy observations
ω = Ax + ξ, A = Diag{αi , i ≤ n}
of unknown signal x known to belong to the unit ball B = {x ∈ Rn : ∥x∥2 ≤ 1}; here αi > 0
are given, and ξ is the standard (zero mean, unit covariance) Gaussian observation noise. Your
goal is to recover from this observation the vector y = Bx, B = Diag{βi , i ≤ n} being given.
You intend to recover y by a linear estimate ŷ_H(ω) = Hω, where H is an n × n matrix you are
allowed to choose. For example, selecting H = BA^{−1} = Diag{β_i α_i^{−1}}, you get an unbiased
estimate:

E{ŷ_H(Ax + ξ) − y} = 0.
so that Err²_x(H) is the expected squared ∥ · ∥₂-distance between the estimate and the estimated
quantity,
— on the entire set B of possible signals – by the risk Risk[H] = max_{x∈B} Err_x(H).
1. Find closed form expressions for Err_x(H) and Risk[H].
2. Formulate the problem of finding the linear estimate with minimal risk as the problem of minimizing a convex function, and prove that the problem is solvable and admits an optimal solution H∗ which is diagonal: H∗ = Diag{η_i, i ≤ n}.
3. Reduce the problem yielded by item 2 to minimizing an easy-to-compute convex univariate function. Consider the case when β_i = i^{−1} and α_i = [σi²]^{−1}, 1 ≤ i ≤ n, set n = 10000, and fill the following table:
where H∗ is the minimum risk linear estimate yielded by the solution to the univariate problem you end up with, and Risk[BA^{−1}] is the risk of the unbiased linear estimate.
You should see from your numerical results that the minimal risk of linear estimation is much smaller than the risk of the unbiased linear estimate. Explain on a qualitative level why allowing for bias reduces the risk.
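If you want to cross-check a candidate closed-form expression for Err_x(H) against simulation, here is a minimal Monte Carlo sketch (the particular α_i, β_i, signal x, and the unbiased choice of H are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 6
alpha = rng.uniform(0.5, 2.0, n)
beta = rng.uniform(0.5, 2.0, n)
A, B = np.diag(alpha), np.diag(beta)

x = rng.standard_normal(n)
x /= np.linalg.norm(x) * 2.0          # some signal inside the unit ball
H = B @ np.linalg.inv(A)              # the unbiased choice H = B A^{-1}

# Monte Carlo estimate of Err_x^2(H) = E ||H(Ax + xi) - Bx||_2^2
xi = rng.standard_normal((200_000, n))
err2 = np.mean(np.sum((xi @ H.T + (H @ A - B) @ x) ** 2, axis=1))
print(err2, np.trace(H @ H.T))        # for H = B A^{-1} the bias term vanishes
```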
Exercise III.9 ♦¹ Given the sets of d-dimensional tentative nodes (d = 2 or d = 3) and of tentative bars of a TTD problem satisfying assumption R, let V = R^M be the space of virtual displacements of the nodes, N be the number of tentative bars, and W > 0 be the allowed total bar volume, see Exercise I.16. Let, next, C(t, f) : R^N_+ × V → R ∪ {+∞} be the compliance of truss t ≥ 0 w.r.t. load f (we identify trusses with the corresponding vectors t of bar volumes).
Prove that
1. C(t, f) is a convex lsc function of [t; f], positively homogeneous of homogeneity degree 1, with R^N_{++} × V ⊂ Dom C, where R^N_{++} = int R^N_+ = {t ∈ R^N : t > 0}. This function is positively homogeneous, of degree −1, in t when f is fixed, and positively homogeneous, of degree 2, in f when t is fixed. Besides this, C(t, f) is nonincreasing in t ≥ 0: if 0 ≤ t′ ≤ t, then C(t, f) ≤ C(t′, f) for every f.
2. The function Opt(W, f) = inf_t { C(t, f) : t ≥ 0, Σ_i t_i = W } – the optimal value in the TTD problem (5.2) – with W restricted to reside in R_{++} = {W > 0} is a convex continuous function with the domain R_{++} × V. This function is positively homogeneous, of degree −1, in W > 0 and homogeneous, of homogeneity degree 2, in f:

Opt(λW, μf) = λ^{−1}μ² Opt(W, f)  for all λ > 0, μ ∈ R, W > 0, f ∈ V.
To formulate the next two tasks, let us associate with a free node p the set F^p of all single-force loads stemming from forces g of magnitude ∥g∥_2 not exceeding 1 and acting at node p. For a set S of free nodes, F^S is the set of all loads with nonzero forces acting solely at the nodes from S and with the sum of ∥·∥_2-magnitudes of the forces not exceeding 1, so that

F^S = Conv(∪_{p∈S} F^p)

(why?)
¹ Preceding exercises in the TTD series are I.16, I.18.
4. Let S = {p_1, . . . , p_K} be a K-element collection of free nodes from the nodal set. Assume that for every node p from S and every load f ∈ F^p there exists a truss of a given total weight W such that its compliance w.r.t. f does not exceed 1. Which, if any, of the following statements
(i) For every load f ∈ F^S, there exists a truss of total volume W with compliance w.r.t. f not exceeding 1
(ii) There exists a truss of total volume W with compliance w.r.t. every load from F^S not exceeding 1
(iii) For properly selected γ depending solely on d, there exists a truss of total volume γKW with compliance w.r.t. every load from F^S not exceeding 1
is true?
⋆5. Prove the following statement:
In the situation of item 4 above, let γ = 4 when d = 2 and γ = 7 when d = 3. For every k ≤ K there exists a truss t̂_k of total volume γW such that the compliance of t̂_k w.r.t. every load from F^{p_k} does not exceed 1. As a result, there exists a truss t̃ of total volume γKW with compliance w.r.t. every load from F^S not exceeding 1.
As is immediately seen, this function is convex and proper. The Legendre transform of this function is called the support function ϕ_X(·) of X:

ϕ_X(y) = sup_{x∈X} x^⊤y.

1. Prove that χ_X is lower semicontinuous (lsc) if and only if X is closed, and that the support functions of X and cl X are the same.
In the remaining part of the Exercise, we are interested in properties of support functions, and in view of item 1, it makes sense to assume from now on that X, on top of being nonempty and convex, is also closed.
Prove the following facts:
2. ϕ_X(·) is a proper lsc convex function which is positively homogeneous of degree 1:

(y ∈ Dom ϕ_X, λ ≥ 0) =⇒ ϕ_X(λy) = λϕ_X(y).

In particular, the domain of ϕ_X is a cone. Demonstrate by example that this cone is not necessarily closed (look at the support function of the closed convex set {[v; w] ∈ R² : v > 0, w ≤ ln v}).
3. Vice versa, every proper convex lsc function ϕ which is positively homogeneous of degree 1,

(x ∈ Dom ϕ, λ ≥ 0) =⇒ ϕ(λx) = λϕ(x),

is the support function of a nonempty closed convex set, specifically, of its subdifferential ∂ϕ(0) taken at the origin. In particular, ϕ_X(·) "remembers" X: if X, Y are nonempty closed convex sets, then ϕ_X(·) ≡ ϕ_Y(·) if and only if X = Y.
4. Let X, Y be two nonempty closed convex sets. Then ϕX (·) ≥ ϕY (·) if and only if Y ⊂ X.
5. Dom ϕX = Rn if and only if X is bounded.
6. Let X be the unit ball of some norm ∥·∥. Then ϕ_X is nothing but the norm ∥·∥_∗ conjugate to ∥·∥. In particular, when p ∈ [1, ∞] and X = {x ∈ R^n : ∥x∥_p ≤ 1}, we have ϕ_X(x) ≡ ∥x∥_q, where 1/q + 1/p = 1.
7. Let x ↦ Ax + b : R^n → R^m be an affine mapping, and let Y = AX + b = {Ax + b : x ∈ X}. Then

ϕ_Y(v) = ϕ_X(A^⊤v) + b^⊤v.
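A throwaway numerical sanity check of items 6 and 7 for the Euclidean ball (it relies on the item-6 fact ϕ_X(u) = ∥u∥_2 for X the unit 2-ball; the sampling-based supremum is only approximate):

```python
import numpy as np

rng = np.random.default_rng(4)
n, m = 4, 3
A, b = rng.standard_normal((m, n)), rng.standard_normal(m)
v = rng.standard_normal(m)

# Sample points of X = unit Euclidean ball and approximate phi_Y(v) = sup_{x in X} v^T(Ax + b).
xs = rng.standard_normal((100_000, n))
xs /= np.maximum(np.linalg.norm(xs, axis=1, keepdims=True), 1.0)   # push samples into the ball
approx = np.max(xs @ (A.T @ v)) + b @ v

exact = np.linalg.norm(A.T @ v) + b @ v      # phi_X(A^T v) + b^T v with phi_X = ||.||_2
print(approx, exact)                          # approx <= exact, and close to it
```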
Exercise III.11 ♦ [Minkowski functions of convex sets] The goal of this Exercise is to acquaint the reader with an important special family of convex functions – the Minkowski functions of convex sets.
Consider a proper nonnegative lower semicontinuous function f : Rn → R ∪ {+∞} which is
positively homogeneous of degree 1, meaning that
x ∈ Dom f, t ≥ 0 =⇒ tx ∈ Dom f & f (tx) = tf (x).
Note that from the latter property of f and its properness it follows that 0 ∈ Dom f and
f (0) = 0.
We can associate with f its basic sublevel set
X = {x ∈ Rn : f (x) ≤ 1}.
Note that X "remembers" f, specifically

∀t > 0 : f(x) ≤ t ⟺ f(t^{−1}x) ≤ 1 ⟺ t^{−1}x ∈ X,

whence also

∀x ∈ R^n : f(x) = inf{t : t > 0, t^{−1}x ∈ X}     (18.1)

[with the standard convention that the infimum over the empty set is +∞].
Note that the basic sublevel set of our f cannot be arbitrary: it is convex and closed (since f is
convex lsc) and contains the origin (since f (0) = 0).
Now, given a closed convex set X ⊂ R^n containing the origin, we can associate with it a function f : R^n → R ∪ {+∞} by the construction from (18.1), specifically, as

f(x) = inf{t : t > 0, t^{−1}x ∈ X}.     (18.2)

This function is called the Minkowski function (M.f.) of X.
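A small illustration of definition (18.2), assuming only a membership oracle for X (here X is the unit ∥·∥_1-ball, whose Minkowski function is ∥·∥_1 itself, so the bisection below can be checked against that value):

```python
import numpy as np

def minkowski(x, in_X, t_hi=1e6, tol=1e-10):
    """f(x) = inf{t > 0 : x/t in X}, computed by bisection in t (X closed convex, 0 in X)."""
    if np.all(x == 0):
        return 0.0
    if not in_X(x / t_hi):
        return np.inf                       # x/t stays outside X for all t <= t_hi
    lo, hi = 0.0, t_hi                      # monotone: x/t in X stays true as t grows
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if in_X(x / mid):
            hi = mid
        else:
            lo = mid
    return hi

in_l1_ball = lambda z: np.sum(np.abs(z)) <= 1.0
x = np.array([0.3, -1.2, 0.5])
print(minkowski(x, in_l1_ball), np.sum(np.abs(x)))   # both equal ||x||_1 = 2.0
```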
Here goes your first task:
1. Prove that when X ⊂ R^n is convex, closed, bounded, and contains the origin, the function f given by (18.2) is a proper, nonnegative, convex lsc function, positively homogeneous of degree 1, and X is the basic sublevel set of f. Moreover, f is nothing but the support function ϕ_{X∗} of the polar X∗ of X.
Your next tasks are as follows:
2. What are the Minkowski functions of
• the singleton {0} ?
• a linear subspace ?
• a closed cone K ?
4. Let X = ∩_{k≤K} X_k, where X_1, . . . , X_K are closed convex sets in R^n such that X_K ∩ int X_1 ∩ int X_2 ∩ . . . ∩ int X_{K−1} ≠ ∅. Prove that ϕ_X(y) ≤ a if and only if there exist y_k, k ≤ K, such that

y = Σ_k y_k  &  Σ_k ϕ_{X_k}(y_k) ≤ a.     (∗)
In words: In the situation in question, the supremum of a linear form on ∩k Xk does not
exceed some a if and only if the form can be decomposed into the sum of K forms with the
sum of their suprema over the respective sets Xk not exceeding a.
5. Prove the following polyhedral version of the claim in item 4:
Let Xk = {x ∈ Rn : Ak x ≤ bk }, k ≤ K, be polyhedral sets with nonempty intersection X.
A linear form does not exceed some a ∈ R everywhere on X if and only if the form can be
decomposed into the sum of K linear forms with the sum of their maxima on the respective
sets Xk not exceeding a.
Exercise III.13 ▲ Let X ⊂ R^n be a nonempty polyhedral set given by the polyhedral representation
X = {x : ∃u : Ax + Bu ≤ r}.
Build a polyhedral representation of the epigraph of the support function of X. For a non-polyhedral extension, see Exercise IV.36.
Exercise III.14 ▲ Compute in closed analytic form the support functions of the following sets:
1. The ellipsoid {x ∈ R^n : (x − c)^⊤C(x − c) ≤ 1} with C ≻ 0
2. The probabilistic simplex {x ∈ R^n_+ : Σ_i x_i = 1}
3. The nonnegative part of the unit ∥·∥_p-ball: X = {x ∈ R^n_+ : ∥x∥_p ≤ 1}, p ∈ [1, ∞]
4. The positive semidefinite part of the unit ∥·∥_{p,Sh}-ball: X = {x ∈ S^n_+ : ∥x∥_{p,Sh} ≤ 1}
2. Verify that ∥X∥_{2,Sh} is nothing but the Frobenius norm of X, and ∥X∥_{∞,Sh} is the same as the spectral norm of X.
Exercise III.18 ♦ [chain rule for subdifferentials] Let Y ⊂ R^m and X ⊂ R^n be nonempty convex sets, ȳ ∈ Y, x̄ ∈ X, f(·) : Y → R be a convex function, and A(·) : X → Y be a mapping with A(x̄) = ȳ. Let, further, K be a closed cone in R^m. The function f is called K-monotone on Y if for y, y′ ∈ Y such that y′ − y ∈ K it holds that f(y′) ≥ f(y), and A is called K-convex on X if for all x, x′ ∈ X and λ ∈ [0, 1] it holds that λA(x) + (1 − λ)A(x′) − A(λx + (1 − λ)x′) ∈ K.²
Prove that
1. A is K-convex on X if and only if for every ϕ ∈ K∗ the real-valued function ϕ^⊤A(x) is convex on X.
2. Let A be K-convex on X and differentiable at x̄. Prove that

∀x ∈ X : A(x) − [A(x̄) + A′(x̄)[x − x̄]] ∈ K.     (∗)

3. Let f be K-monotone on Y and A be K-convex on X. Prove that the real-valued function f∘A(x) = f(A(x)) is convex on X.
4. Let f be K-monotone on Y. Prove that ∂f(ȳ) ⊂ K∗ provided ȳ ∈ int Y.
5. [chain rule] Let ȳ ∈ int Y, x̄ ∈ int X, let f be K-monotone on Y, and let A be K-convex on X and differentiable at x̄. Prove that

∂f∘A(x̄) = [A′(x̄)]^⊤∂f(ȳ) = {[A′(x̄)]^⊤g : g ∈ ∂f(ȳ)}.     (!)
Exercise III.19 ♦ Recall that the sum S_k(X) of the k ≤ n largest eigenvalues of X ∈ S^n is a convex function of X, see Remark III.17.4. Point out a subgradient of S_k(·) at a point X ∈ S^n.
² We shall study cone-monotonicity and cone-convexity in more detail in Part IV.
where ξ ∈ R^n, τ ∈ R are parameters. Is the problem convex?³ What is the domain in the space of parameters where the problem is solvable? What is the optimal value? Is it convex in the parameters?
Exercise III.23 ♦ Consider the optimization problem
where a, b ∈ R are parameters. Is the problem convex? What is the domain in the space of parameters where the problem is solvable? What is the optimal value? Is it convex in the parameters?
Exercise III.24 ▲ Compute the Legendre transforms of the following functions:
• [“geometric mean”] f(x) = −∏_{i≤n} x_i^{π_i} : R^n_+ → R, where the π_i > 0 sum up to 1 and n > 1.
|x|_p = ( Σ_{j=1}^n q_j |x_j|^p )^{1/p}.

This clearly is a norm which becomes the standard norm ∥·∥_p when q_j = 1, j ≤ n. Same as ∥x∥_p, the quantity |x|_p has a limit, namely ∥x∥_∞, as p → ∞, and we define |·|_∞ as this limit.
Now let p_i, i ≤ k, be positive reals such that

Σ_{i=1}^k 1/p_i = 1.
³ A maximization problem with objective f(·) and certain constraints and domain is called convex if the equivalent minimization problem with the objective (−f) and the original constraints and domain is convex.
Prove that

|x_1 x_2 · · · x_k|_1 ≤ Σ_{i=1}^k |x_i|_{p_i}^{p_i} / p_i.     (∗)
provided some measurability conditions are satisfied. In this textbook we, however, do not touch
infinite-dimensional spaces of functions and related norms.
Exercise III.26 ♦ [Muirhead’s inequality] For any u ∈ R^n and z ∈ R^n_{++} := {z ∈ R^n : z > 0} define

f_z(u) = (1/n!) Σ_σ z_{σ(1)}^{u_1} · · · z_{σ(n)}^{u_n},

where the sum is over all permutations σ of {1, . . . , n}. Show that if P is a doubly stochastic n × n matrix, then

f_z(Pu) ≤ f_z(u)   ∀(u ∈ R^n, z ∈ R^n_{++}).
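A brute-force numerical check for small n (the random doubly stochastic P below is built as a convex combination of permutation matrices, one convenient construction among many; this checks the claim, it does not prove it):

```python
import numpy as np
from itertools import permutations

rng = np.random.default_rng(5)
n = 4
z = rng.uniform(0.5, 2.0, n)
u = rng.standard_normal(n)

def f(z, u):
    # (1/n!) * sum over permutations sigma of prod_i z_{sigma(i)}^{u_i}
    vals = [np.prod(z[list(s)] ** u) for s in permutations(range(n))]
    return np.mean(vals)

# Doubly stochastic P: convex combination of permutation matrices (Birkhoff's theorem).
perms = [rng.permutation(n) for _ in range(6)]
weights = rng.dirichlet(np.ones(6))
P = sum(w * np.eye(n)[p] for w, p in zip(weights, perms))

print(f(z, P @ u), "<=", f(z, u))
```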
Exercise III.27 ♦ Prove that a convex lsc function f with polyhedral domain is continuous
on its domain. Does the conclusion remain true when lifting either one of the assumptions that
(a) convex f is lsc, and (b) Dom f is polyhedral?
Exercise III.28 ▲ Let a_1, . . . , a_n > 0 and α, β > 0. Solve the optimization problem

min_x { Σ_{i=1}^n a_i / x_i^α : x > 0, Σ_i x_i^β ≤ 1 }.
Exercise III.29 ▲ [computational study] Consider the following situation: there are K "radars", the k-th of them capable of locating targets within the ellipsoid E_k = {x ∈ R^n : (x − c_k)^⊤C_k(x − c_k) ≤ 1} (C_k ≻ 0); the measured position of the target is

y_k = x + σ_k ζ_k,

where x is the actual position of the target, and ζ_k is the standard (zero mean, unit covariance) Gaussian observation noise; the ζ_k are independent across k. Given measurements y_1, . . . , y_K of the target’s location x known to belong to the “common field of view” E = ∩_k E_k of the radars, which we assume to possess a nonempty interior, we want to estimate a given linear form e^⊤x of x by using the linear estimate

x̂ = Σ_k h_k^⊤ y_k + h.

We are interested in finding the estimate (i.e., the parameters h_1, . . . , h_K, h) minimizing the risk

Risk_2 = max_{x∈E} sqrt( E{ [ e^⊤x − Σ_k h_k^⊤[x + σ_k ζ_k] − h ]² } ).
Fact III.16.4 Let ∥ · ∥ be a norm on Rn . Then, its dual norm ∥ · ∥∗ is indeed a norm.
Moreover, the norm dual to ∥ · ∥∗ is the original norm ∥ · ∥, and the unit balls of norms conjugate to each other are polars of each other.
Proof. From the definition of ∥ · ∥∗ it immediately follows that ∥ · ∥∗ satisfies all three conditions
specifying a norm. To justify that the norm dual to ∥ · ∥∗ is ∥ · ∥, note that the unit ball of the
dual norm is, by the definition of this norm, the polar of the unit ball of ∥ · ∥, and the latter
set, as the unit ball of any norm, is closed, convex, and contains the origin. As a result, the unit
balls of ∥ · ∥, ∥ · ∥∗ are polars of each other (Proposition II.8.37), and the norm dual to dual is
the original one – its unit ball is the polar of the unit ball of ∥ · ∥∗ .
That is, the Legendre transform of ∥ · ∥ is the characteristic function of the unit ball
of the conjugate norm.
Proof. Consider any fixed d ∈ R^n. By the definition of the Legendre transform we have

f∗(d) = sup_{x∈R^n} { d^⊤x − f(x) } = sup_{x∈R^n} { d^⊤x − ∥x∥ }.
Now, consider the function g_d(x) := d^⊤x − ∥x∥, so that f∗(d) = sup_{x∈R^n} g_d(x). The function g_d(x) is positively homogeneous, of degree 1, in x, so that its supremum over the entire space is either 0 (this happens when the function is nonpositive everywhere), or +∞. By the same homogeneity, the function g_d(x) is nonpositive everywhere if and only if it is nonpositive when ∥x∥ = 1, that is, if and only if d^⊤x ≤ 1 whenever ∥x∥ = 1, or, which is the same, when d^⊤x ≤ 1 whenever ∥x∥ ≤ 1. The bottom line is that f∗(d) = sup_x g_d(x) is either 0, or +∞, with the first
option taking place if and only if d⊤ x ≤ 1 whenever ∥x∥ ≤ 1, that is, if and only if ∥d∥∗ ≤ 1.
x_↓^⊤ y_↓ ≥ x^⊤ y ≥ x_↓^⊤ y_↑.     (*)
Indeed, by continuity, it suffices to verify this relation when all entries of x, same as all entries of y, are distinct from each other. In such a case, observe that the inequalities to be proved remain intact when we simultaneously reorder, in the same order, the entries in x and in y, so that we can assume without loss of generality that x_1 ≥ x_2 ≥ . . . ≥ x_n. Taking into account that ac + bd − [ad + bc] = [a − b][c − d], we see that if i < j and ȳ is obtained from y by swapping its i-th and j-th entries, we have x^⊤ȳ ≤ x^⊤y when y_i > y_j and x^⊤ȳ ≥ x^⊤y otherwise. Thus, the minimum (the maximum) of the inner products x^⊤z over the set of vectors z obtained by reordering the entries in y is achieved when z = y_↑ (respectively, z = y_↓), as claimed.
In the situation of (i), by Birkhoff Theorem, P y is a convex combination of vectors obtained
from y by reordering entries, and so the relation in (i) is immediately implied by (∗).
(ii): Let A = U Diag{λ(A)}U^⊤ be the eigenvalue decomposition of A. Then

Tr(AB) = Tr(Diag{λ(A)} U^⊤BU) = (λ(A))^⊤ μ,

where μ is the diagonal of the matrix U^⊤BU, i.e., μ = Dg{U^⊤BU}. Then, by Lemma III.17.2 we deduce that there exists a doubly stochastic matrix P such that

μ = Pλ(B).

Thus, Tr(AB) = (λ(A))^⊤μ = (λ(A))^⊤Pλ(B). The desired inequality then follows from applying part (i) to the vectors x := λ(A) and y := λ(B).
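The chain just derived, Tr(AB) = (λ(A))^⊤Pλ(B), combined with (∗), sandwiches Tr(AB) between λ(A)_↓^⊤λ(B)_↑ and λ(A)_↓^⊤λ(B)_↓. A quick random-data check of both (∗) and this sandwich (throwaway sketch):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 6
x, y = rng.standard_normal(n), rng.standard_normal(n)
down = lambda v: np.sort(v)[::-1]        # entries in nonincreasing order
up = lambda v: np.sort(v)                # entries in nondecreasing order
assert down(x) @ down(y) >= x @ y >= down(x) @ up(y)        # relation (*)

A = rng.standard_normal((n, n)); A = (A + A.T) / 2
B = rng.standard_normal((n, n)); B = (B + B.T) / 2
lA, lB = np.linalg.eigvalsh(A), np.linalg.eigvalsh(B)       # eigenvalues (ascending)
t = np.trace(A @ B)
assert down(lA) @ down(lB) >= t - 1e-9 and t >= down(lA) @ up(lB) - 1e-9
print("Tr(AB) lies between lambda(A)_down.lambda(B)_up and lambda(A)_down.lambda(B)_down")
```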
Part IV
20 Convex Programming problems and Convex Theorem on Alternative
where
– [below boundedness] the problem is called below bounded, if its optimal value
is > −∞, i.e., if the objective is bounded from below on the feasible set.
• [optimal solution] a point x ∈ Rn is called an optimal solution to (20.1), if x
is feasible and f (x) ≤ f (x′ ) for any other feasible solution x′ , i.e., if
x ∈ Argmin {f (x′ ) : x′ ∈ X, g(x′ ) ≤ 0, h(x′ ) = 0} .
Now, what does it mean that (20.4) has no solutions in the domain X? A
necessary and sufficient condition for this is that the infimum of the left hand
side of (20.4) over the domain x ∈ X is greater than or equal to c. Thus, we
arrive at the following evident result.
Let us stress that Proposition IV.20.1 is completely general; it does not require
any assumptions (not even convexity) on the entities involved.
That said, Proposition IV.20.1, unfortunately, is not so helpful: the actual
power of the Theorem on Alternative (and the key fact utilized in the proof of the
Linear Programming Duality Theorem) is not the sufficiency of the condition of
Proposition for infeasibility of (I), but the necessity of this condition. Justification
of necessity of the condition in question has nothing to do with the evident
reasoning that established its sufficiency. In the linear case (X = Rn , f , g1 , . . . , gm
are linear), we established the necessity via the Homogeneous Farkas Lemma. We
will next prove the necessity of the condition for the convex case. At this step, we
already need some additional, although minor, assumptions; and in the general
nonconvex case the sufficient condition stated in Proposition IV.20.1 simply is
not necessary for the infeasibility of (I). This, of course, is very bad-yet-expected
news – this is the reason why there are difficult optimization problems that we
do not know how to solve efficiently.
The just presented “preface” outlines our action plan. Let us carry out our
plan by formally defining the aforementioned “minor regularity assumptions.”
In the case where some of the constraints are linear, we rely on a slightly relaxed
regularity condition.
Clearly, the validity of the Slater condition implies the validity of the Relaxed Slater
condition (why?). We are about to establish the following fundamental fact.
In our developments, we will frequently examine the dual cones as well. There-
fore, we introduce the following elementary fact on the regularity of dual cones.
Fact IV.20.6 (i) A cone K ⊆ Rn is regular if and only if its dual cone
K∗ = {y ∈ Rn : y ⊤ x ≥ 0, ∀x ∈ K} is regular.
(ii) Given regular cones K1 , . . . , Km , their direct product K1 × . . . × Km is
also regular.
There are a number of “magic cones” that are regular and play a crucial role
in Convex Optimization. In particular, many convex optimization problems from
practice can be posed as optimization problems involving domains expressed using
these cones as the basic building blocks.
Fact IV.20.7 The following cones (see the Examples discussed in section 1.2.4) are regular:
(i) The nonnegative ray, R_+.
(ii) The Lorentz (a.k.a. second-order, or ice-cream) cone, L^n = {x ∈ R^n : x_n ≥ sqrt(x_1² + . . . + x_{n−1}²)} (L¹ := R_+).
(iii) The positive semidefinite cone, S^n_+ = {X ∈ S^n : a^⊤Xa ≥ 0, ∀a ∈ R^n}.
See chapter 25 for other instructive examples of K-convex functions and their
“calculus.”
Indeed, K-convexity can be expressed in terms of the usual convexity due to
the following immediate observation.
a ≥_K b ⟺ b ≤_K a ⟺ a − b ∈ K.
the one of the usual ≥ and > inequalities. Verification of the claims made in
Fact IV.20.11 is immediate and is left to the reader.
In the standard approach to nonlinear convex optimization, the Mathematical Programming problem that is convex has the following form

min_{x∈X} { f(x) : g(x) := [g_1(x); . . . ; g_m(x)] ≤ 0, [h_1(x); . . . ; h_k(x)] = 0 },
system (I). We call system (ConI) convex, if, in addition to the already assumed convexity of X, the function f is convex on X, and the map ĝ is K-convex on X.
Denoting by K∗ the cone dual to K, a sufficient condition for the infeasibility of (ConI) is the feasibility of the following system of constraints

inf_{x∈X} [ f(x) + λ^⊤g(x) + λ̂^⊤ĝ(x) ] ≥ c,
λ ≥ 0,                                                  (ConII)
λ̂ ≥_{K∗} 0   [⟺ λ̂ ∈ K∗]
Note: In some cases (ConI) may have no affine (i.e., polyhedral) part g(x) := Ax − b ≤ 0 and/or no “general part” ĝ(x) ≤_K 0; the absence of one or both of these parts leads to self-evident modifications in (ConII). To unify our forthcoming considerations, it is convenient to assume that both of these parts are present. This assumption is for free: it is immediately seen that in our context, in the absence of one or both of the g-constraints in (ConI) we lose nothing when adding the artificial affine part g(x) := 0^⊤x − 1 ≤ 0 instead of the missing affine part, and/or the artificial general part ĝ(x) := 0^⊤x − 1 ≤_K 0 with K = R_+ instead of the missing general part. Thus, we lose nothing when assuming from the very beginning that both polyhedral and general parts are present.
It is immediately seen that Convex Theorem on Alternative (Theorem IV.20.4)
is a special case of Theorem IV.20.13 corresponding to the case when K is a
nonnegative orthant.
In the proof of Theorem IV.20.13, we will use Lemma IV.20.14, a generalization
of the Inhomogeneous Farkas Lemma. We will present the proof of this result after
the proof of Theorem IV.20.13.
Proof of Theorem IV.20.13. The first part of the statement – “if (ConII) has
a solution, then (ConI) has no solutions” – has been already verified. What we
need is to prove the reverse statement. Thus, let us assume that (ConI) has no
solutions, and let us prove that then (ConII) has a solution.
0⁰. Without loss of generality we may assume that X is full-dimensional: rint X = int X (indeed, otherwise we can replace our “universe” R^n with the affine span of X). Besides this, if needed shifting f by a constant, we can assume that c = 0. Thus, we are in the case where

f(x) < 0,
g(x) := Ax − b ≤ 0,                                     (ConI)
ĝ(x) ≤_K 0,   [⟺ ĝ(x) ∈ −K]
x ∈ X;
inf_{x∈X} [ f(x) + λ^⊤g(x) + λ̂^⊤ĝ(x) ] ≥ 0,
λ ≥ 0,                                                  (ConII)
λ̂ ≥_{K∗} 0   [⟺ λ̂ ∈ K∗].
2⁰. We now claim that α_0 > 0. Note that the point t̄ := [t̄_0; t̄_1] with the components t̄_0 := f(0) and t̄_1 := ĝ(0) belongs to T (since 0 ∈ Y by (20.7)), thus by (20.9) it holds that α_0 f(0) + α_1^⊤ĝ(0) ≥ 0. Assume for contradiction that α_0 = 0. Then, we deduce α_1^⊤ĝ(0) ≥ 0, which due to −ĝ(0) ∈ int K (see (20.7))
As W is nonempty, we have inf [x;τ ]∈W [e⊤ x + βτ ] < +∞. Then, by taking into
account the definition of Q, we deduce β ≥ 0 (since otherwise the left hand side
in the preceding inequality would be +∞). With β ≥ 0 in mind and considering
the definitions of Q and W , the preceding inequality reads
sup_x { e^⊤x : x ∈ X, g(x) ≤ 0 } ≤ inf_x { e^⊤x + βh(x) : x ∈ X }.     (20.11)
Since (−f ) ∈ (M1 ∩ M2 )∗ , there exist ψ ∈ (M1 )∗ and ϕ ∈ (M2 )∗ such that
f = [d; δ] = −ϕ − ψ. The inclusion ϕ ∈ (M2 )∗ means that the homogeneous
linear inequality ϕ⊤ y ≥ 0 in variables y ∈ Rn+1 is a consequence of the system
of homogeneous linear inequalities given by [A, a]y ≤ 0. Hence, by Homogeneous
Farkas Lemma (Lemma I.4.1) −ϕ is a conic combination of the transposes of the
rows of the matrix [A, a], so that ϕ⊤ [x; 1] = −µ⊤ g(x) for some nonnegative µ
and all x ∈ Rn . Thus, for all x ∈ Rn , we deduce
d⊤ x + δ = [d; δ]⊤ [x; 1] = f ⊤ [x; 1] = −ϕ⊤ [x; 1] − ψ ⊤ [x; 1] = µ⊤ g(x) − ψ ⊤ [x; 1].
Finally, note that [x; 1] ∈ M_1 whenever x ∈ X. Then, as ψ ∈ (M_1)∗, we have ψ^⊤[x; 1] ≥ 0 for all x ∈ X. Thus, for all x ∈ X, we have 0 ≤ ψ^⊤[x; 1] = μ^⊤g(x) − [d^⊤x + δ], and so μ satisfies precisely the requirements stated in the lemma.
To complete the story about Convex Theorem on Alternative, let us present
an example which demonstrates that the relaxed Slater condition is crucial for
the validity of Theorem IV.20.13.
Example IV.20.1 Consider the following special case of (ConI):

f(x) ≡ x < 0,  g(x) ≡ 0 ≤ 0,  ĝ(x) ≡ x² ≤ 0     (ConI)

(here the embedding space is R, X = R, c = 0, and K = R_+, that is, this is just a system of scalar convex constraints). System (ConII) here is the system of constraints

inf_{x∈R} [ x + λ·0 + λ̂x² ] ≥ 0,  λ ≥ 0,  λ̂ ≥ 0     (ConII)

in the variables λ, λ̂ ∈ R.
System (ConI) clearly is infeasible. System (ConII) is infeasible as well – it is immediately seen that whenever λ and λ̂ are nonnegative, the quantity x + λ·0 + λ̂x² is negative for all x < 0 small in magnitude, that is, the first inequality in (ConII) is incompatible with the remaining two inequalities of the system.
Note that in this example the only missing component of the premise in Theorem IV.20.13 is the relaxed Slater condition. Let us now examine what happens when we replace the constraint ĝ(x) ≡ x² ≤ 0 with ĝ(x) ≡ x² − 2ϵx ≤ 0, where ϵ > 0. In this case, we keep (ConI) infeasible, and gain the validity of the relaxed (relaxed, not plain!) Slater condition. Then, as all the conditions of the Convex Theorem on Alternative are now met, we deduce that (ConII), which now reads

inf_x [ x + λ·0 + λ̂(x² − 2ϵx) ] ≥ 0,  λ ≥ 0,  λ̂ ≥ 0,

must be feasible. In fact, λ = 0, λ̂ = 1/(2ϵ) is a feasible solution to (ConII). ♢
21 Lagrange Function and Lagrange Duality
from which L(λ) originates. The aggregate function in (21.2) is called the La-
grange (or Lagrangian) function of the inequality constrained optimization pro-
gram
min { f(x) : g_j(x) ≤ 0, j = 1, . . . , m, x ∈ X }     (IC)
[where f, g_1, . . . , g_m are real-valued functions on X].
The Lagrange function L(x, λ) of an optimization problem is a very important
entity as most of the optimality conditions are expressed in terms of this function.
Let us start with translating our developments from section 20.2 to the language
of the Lagrange function.
Then,
(i) [Weak Duality] For every λ ≥ 0, the infimum of the Lagrange function in
x ∈ X, that is,

L(λ) := inf_{x∈X} L(x, λ),

is a lower bound on the optimal value of (IC), so that the optimal value of the optimization problem

sup_{λ≥0} L(λ)     (IC∗)
L(x, λ) = f(x) + Σ_{j=1}^m λ_j g_j(x) ≤ f(x),
where the inequality follows from the facts that λ ∈ R^m_+ and the feasibility of x
for (IC) implies that gj (x) ≤ 0 for all j = 1, . . . , m. Then, we immediately arrive
at
as desired.
(ii): This part is an immediate consequence of the Convex Theorem on Alter-
native. Note that the system
has no solutions in X, and by Theorem IV.20.4, the system (II) associated with
c = c∗ has a solution, i.e., there exists λ∗ ≥ 0 such that L(λ∗ ) ≥ c∗ . But, we
know from part (i) that the strict inequality here is impossible and, moreover,
that L(λ) ≤ c∗ for every λ ≥ 0. Thus, L(λ∗ ) = c∗ and λ∗ is a maximizer of L over
λ ≥ 0.
Here, the variables λ of the dual problem are called the Lagrange multipliers of
the primal problem. Theorem IV.21.1 states that the optimal value of the dual
problem is at most that of the primal, and under some favorable circumstances
(i.e., when the primal problem is convex, below bounded, and satisfies the Relaxed
Slater condition) the optimal values in this pair of problems are equal to each
other.
In our formulation there may seem to be some asymmetry between the primal
and the dual problems. In fact, both of these problems are related to the Lagrange
function in a quite symmetric way. Indeed, consider the problem
min_{x∈X} L̄(x),  where  L̄(x) := sup_{λ≥0} L(x, λ).

By the definition of the Lagrange function L(x, λ), the function L̄(x) is clearly +∞ at every point x ∈ X which is not feasible for (IC) and is f(x) on the feasible set of (IC), so that this problem is equivalent to (IC). We see that both the primal and
the dual problems originate from the Lagrange function: in the primal problem,
we minimize over X the result of maximization of L(x, λ) in λ ≥ 0, i.e., the primal
problem is
min_{x∈X} sup_{λ∈R^m_+} L(x, λ),
This then leads to the following easily demonstrable fact (do it by yourself or
look at Theorem IV.27.2).
The results from sections 21.1 and 21.2 related to convex optimization problems
in the standard MP format admit instructive extensions to the case of convex
problems in cone-constrained form. We next present these extensions.
Example IV.22.1 Recall that the positive semidefinite cone S^n_+ and the notation A ⪰ B, B ⪯ A, A ≻ B, B ≺ A for the associated non-strict and strict conic inequalities were introduced in section D.2.2. As we know from Fact IV.20.7 and Example II.8.9, the cone S^n_+ is regular and self-dual. Recall from Lemma IV.20.9 that the function from S^n to S^n given by ĝ(x) = xx^⊤ = xx = x² is ⪰-convex. As a result, the problem

Opt(P) = min_{x=(t,y)∈R×S^n} { t : Tr(y) ≤ t, y² ⪯ B }
       = min_{x=(t,y)∈R×S^n} { t : ⟨y, I_n⟩ − t ≤ 0, y² ⪯ B },     (22.1)

where B is a positive definite matrix and ⟨·, ·⟩ is the Frobenius inner product, is a convex program in cone-constrained form. ♢
The cone-constrained Lagrange dual of (P) is the optimization problem
Note that the only nontrivial part (ii) of Theorem IV.21.1 is nothing but the
special case of Theorem IV.22.1 where K is a nonnegative orthant.
Proof of Theorem IV.22.1. This proof is immediate. Under the premise of the theorem, c := Opt(P) is a real, and the system of constraints (ConI) associated with this c has no solutions. The Relaxed Slater condition along with the Convex Theorem on Alternative in cone-constrained form (Theorem IV.20.13) imply the feasibility of (ConII), i.e., the existence of λ∗ = [λ∗; λ̂∗] ∈ Λ such that

L(λ∗) = inf_{x∈X} { f(x) + λ∗^⊤g(x) + λ̂∗^⊤ĝ(x) } ≥ c = Opt(P).
Thus, we deduce that (D) has a feasible solution with objective value ≥ Opt(P). By Weak Duality, this value is exactly Opt(P), the solution in question is optimal for (D), and Opt(P) = Opt(D).
Example IV.22.4 (continued from Example IV.22.1) Problem (22.1) is clearly below bounded and satisfies the Slater condition (since B ≻ 0). By Theorem IV.22.1 the dual problem (22.3) is solvable and has the same optimal value as (22.1). The solution of the (convex!) dual problem (22.3) can be found by applying the Fermat rule. To this end, note also that for a positive definite n × n matrix y and h ∈ S^n it holds that

d/dt |_{t=0} (y + th)^{−1} = −y^{−1} h y^{−1}

(why?). Then, the Fermat rule says that the optimal solution to (22.3) is

λ∗ = 1,  λ̂∗ = ½ B^{−1/2},

and Opt(P) = Opt(D) = −Tr(B^{1/2}). ♢
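Two quick numerical sanity checks for this example (random positive definite B; both the finite-difference check of the derivative formula and the value attained at the feasible point y = −B^{1/2} are throwaway illustrations):

```python
import numpy as np
from scipy.linalg import sqrtm

rng = np.random.default_rng(7)
n = 5
M = rng.standard_normal((n, n))
B = M @ M.T + np.eye(n)                     # a positive definite matrix

# d/dt|_{t=0} (y + t h)^{-1} = -y^{-1} h y^{-1}, checked by finite differences
y = M.T @ M + np.eye(n)                     # some positive definite y
h = rng.standard_normal((n, n)); h = (h + h.T) / 2
t = 1e-6
fd = (np.linalg.inv(y + t * h) - np.linalg.inv(y)) / t
print(np.max(np.abs(fd + np.linalg.inv(y) @ h @ np.linalg.inv(y))))  # small finite-difference error

# y = -B^{1/2} is feasible for (22.1) (y^2 = B, hence y^2 <= B) with objective Tr(y) = -Tr(B^{1/2})
y_star = -np.real(sqrtm(B))
print(np.min(np.linalg.eigvalsh(B - y_star @ y_star)) >= -1e-8)      # feasibility holds
print(np.trace(y_star))                                              # equals -Tr(B^{1/2}) = Opt(P)
```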
Since L_Δ(x, λ) = L_0(x, λ) − λ^⊤δ − λ̂^⊤δ̂, we deduce L_Δ(λ) = L_0(λ) − λ^⊤δ − λ̂^⊤δ̂, where by definition L_0(λ) = inf_{x∈X} { f(x) + λ^⊤g(x) + λ̂^⊤ĝ(x) }. Thus,

Opt(D_Δ) = max_{λ∈Λ} { L_Δ(λ) } = max_{λ∈Λ} { L_0(λ) − λ^⊤δ − λ̂^⊤δ̂ }.
We have the following nice and instructive fact that provides further insight into the sensitivity of the optimal value in these parametric families of problems.
The premises in (i) and (ii) definitely take place when (P0 ) satisfies the
Relaxed Slater condition and is below bounded.
Example IV.22.5 (continued from Example IV.22.1) Problem (22.1) can be embedded into the parametric family of problems

Opt(P_R) := min_{x=(t,y)∈R×S^n} { t : Tr(y) ≤ t, y² ⪯ B + R }     (P[R])

with R varying through S^n. Taking into account all we have established so far for this problem and also considering Fact IV.22.2, we arrive at

Tr((B + R)^{1/2}) = −Opt(P_R) ≤ Tr(B^{1/2}) + ½ Tr(B^{−1/2}R),  ∀(R ≻ −B).     (22.4)

Note that this is nothing but the Gradient inequality for the concave (see Fact III.17.6) function Tr(X^{1/2}) : S^n_+ → R, see Fact D.24. ♢
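A quick numerical check of (22.4) on random data (throwaway sketch; any B ≻ 0 and symmetric R ≻ −B will do):

```python
import numpy as np
from scipy.linalg import sqrtm

rng = np.random.default_rng(8)
n = 5
M = rng.standard_normal((n, n))
B = M @ M.T + np.eye(n)                          # B > 0
R = rng.standard_normal((n, n)); R = (R + R.T) / 2
while np.min(np.linalg.eigvalsh(B + R)) <= 0:    # shrink R until R > -B
    R *= 0.5

Bh = np.real(sqrtm(B))
lhs = np.trace(np.real(sqrtm(B + R)))
rhs = np.trace(Bh) + 0.5 * np.trace(np.linalg.inv(Bh) @ R)
print(lhs, "<=", rhs)                            # the gradient inequality (22.4)
```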
Recall that by Fact IV.20.6.i the cone dual to a regular cone also is a regular
cone. As a result, the problem (D), called the conic dual of the conic problem
(22.5), also is a conic problem. An immediate computation (utilizing the fact that
(K∗ )∗ = K for every regular cone K) shows that conic duality is symmetric.
Fact IV.22.5 Conic duality is symmetric, i.e., the conic dual to conic prob-
lem (D) is (equivalent to) conic problem (22.5).
Proof. This proof is immediate. Weak duality has already been verified. To
verify the second claim, note that by primal-dual symmetry we can assume that
the bounded problem satisfying Relaxed Slater condition is (P ). But, then the
claim in question is given by Theorem IV.22.1. Finally, if both problems satisfy
Relaxed Slater condition (and in particular are feasible), by Weak Duality, both
are bounded, and therefore solvable with equal optimal values by the preceding
claim.
Application example: S-Lemma. The S-Lemma is an extremely useful fact that has applications in optimization, engineering, and control.
Note that the S-Lemma is a statement of the same flavor as the Homogeneous Farkas Lemma: the latter states that a homogeneous linear inequality b^⊤x ≥ 0 is a consequence of a system of homogeneous linear inequalities a_i^⊤x ≥ 0, 1 ≤ i ≤ k,
if and only if the target inequality can be obtained from the inequalities of the
system by taking weighted sum with nonnegative weights; we could add to “taking
weighted sum” also “and adding identically true homogeneous linear inequality”
– by the simple reason that there exists only one inequality of the latter type,
0⊤ x ≥ 0. Similarly, S-Lemma says that (whenever (22.6) holds) homogeneous
quadratic inequality x⊤ Bx ≥ 0 is a consequence of (single-inequality) system of
homogeneous quadratic inequalities x⊤ Ax ≥ 0 if and only if the target inequality
can be obtained by taking weighted sum, with nonnegative weights, of inequalities
of the system (that is, by taking a nonnegative multiple of the inequality x⊤ Ax ≥
0) and adding an identically true homogeneous quadratic inequality (there are
plenty of them, these are inequalities x⊤ Cx ≥ 0 with C ⪰ 0).
Note that the possibility for the target inequality to be obtained by summing
up, with nonnegative weights, inequalities from certain system and adding an
identically true inequality is clearly a sufficient condition for the target inequal-
ity to be a consequence of the system. The actual power of Homogeneous Farkas
Lemma and S-Lemma is in the fact that this evident sufficient condition is also
necessary for the conclusion in question to be valid (in the case of linear inequal-
ities – whenever the system is finite, in the case of S-Lemma – when the system
is a single-inequality one and (22.6) takes place). The fact that in the quadratic case, to guarantee the necessity, the system should be a single-inequality one, however unpleasant, is a must. In fact, a straightforward “quadratic version” of the Homogeneous Farkas Lemma fails, in general, to be true already when there are just two quadratic inequalities in the system. This being said, even being that poor, as compared to its linear-inequality analog, the S-Lemma is extremely useful. . .
In preparation to S-Lemma, we will first prove the following weaker statement.
Lemma IV.22.8 Let A, B ∈ Sn . Suppose ∃x̄ satisfying x̄⊤ Ax̄ > 0. Then,
the implication
{X ⪰ 0, Tr(AX) ≥ 0} =⇒ Tr(BX) ≥ 0 (22.9)
holds if and only if B ⪰ λA for some λ ≥ 0.
Proof. The “if” part of this lemma is evident. To prove the “only if” part,
consider the conic problem
Opt(P ) = min {Tr(BX) : Tr(AX) ≥ 0, X ⪰ 0} (P )
X
(derive the conic dual of (P ) yourself by utilizing the fact that Sn+ is self-dual).
Note that from the premise of (22.6), we deduce that for large enough nonnegative
t, the solution X̄ := In + tx̄x̄⊤ will ensure that the Slater condition holds true.
Moreover, under the premise of (22.9) Opt(P ) is bounded from below by 0 as
well. Then, by Conic Duality Theorem, the dual (D) is solvable, implying that
B ⪰ λA for some λ ≥ 0, as required in this lemma and completing the proof.
Note that (22.7) is nothing but (22.9) with X restricted to be of rank ≤ 1.
Indeed, X ⪰ 0 is of rank ≤ 1 if and only if X = xx⊤ for some vector x, and
holds true: the homogeneous inequality in the premise of (∗∗) is strictly feasible
along with (A), so that by the homogeneous S-Lemma (∗∗) holds true if and only
if
∃λ ≥ 0 :  [B, b; b^⊤, β] ⪰ λ·[A, a; a^⊤, α].     (22.10)
Thus, if we knew that in the case of strictly feasible (A) the validity of implication
(∗) is the same as the validity of implication (∗∗), we could be sure that the first
of these implications takes place if and only if (22.10) takes place. The above “if” indeed is true.
Proof. Suppose the premise holds, i.e., x̄⊤ Ax̄ + 2a⊤ x̄ + α > 0 for some x̄. Based
on the discussion preceding this lemma all we need to verify is that the validity
of (∗) is exactly the same as the validity of (∗∗). Clearly, the validity of (∗∗)
implies the validity of (∗), so our task boils down to demonstrating that under
the premise of the lemma, the validity of (∗) implies the validity of (∗∗). Thus,
assume that (∗) is valid, and let us prove that (∗∗) is valid as well. All we need
to prove is that y ⊤ Ay ≥ 0 implies y ⊤ By ≥ 0. Thus, assume that y is such that
y^⊤Ay ≥ 0, and let us prove that y^⊤By ≥ 0 as well. Define x_t := tx̄ + (1 − t)y, and consider the univariate quadratic functions

q_a(t) := x_t^⊤Ax_t + 2ta^⊤x_t + αt²,   q_b(t) := x_t^⊤Bx_t + 2tb^⊤x_t + βt².

We have so far seen that
(a) for all t ̸= 0, qa (t) ≥ 0 =⇒ qb (t) ≥ 0,
(b) qa (1) > 0 and qa (0) ≥ 0,
and we would like to show that qb (0) ≥ 0. Note that qa and qb are linear or
quadratic functions of t and thus they are continuous in t. Now, consider the
following cases (draw qa in these cases!):
• If qa (0) > 0, by continuity of qa we have qa (t) > 0 for all small enough nonzero
t, and so in such a case, by (a) we also get qb (t) ≥ 0 for all small enough in
magnitude nonzero t, implying, by continuity, that qb (0) ≥ 0.
• If qa (0) = 0, the reasoning goes as follows. When t varies from 0 to 1, the linear
or quadratic function qa (t) varies from 0 to something positive. It follows that
– either qa (t) ≥ 0, 0 ≤ t ≤ 1, implying by (a) that qb (t) ≥ 0 for t ∈ (0, 1], and
so qb (0) ≥ 0 holds by continuity of qb (t) at t = 0,
– or qa (t̄) < 0 holds for some t̄ ∈ (0, 1). Assuming that this is the case, the
linear or quadratic function q_a(t) is zero at t = 0, negative somewhere on (0, 1), and positive at t = 1. Therefore, q_a is a quadratic, and not linear, function of t which has exactly one root in the interval (0, 1). Let this root in
(0, 1) be t1 . Recall that the other root of the quadratic function qa is t = 0,
thus we must have qa (t) = c(t − 0)(t − t1 ) for some c ∈ R. From t1 < 1 and
qa (1) > 0 it follows that c > 0; this, in turn, combines with t1 > 0 to imply
that qa (t) > 0 when t < 0. By (a) it follows that qb (t) ≥ 0 for t < 0, whence
by continuity qb (0) ≥ 0.
Thus, the dual is precisely the problem of maximizing under the outlined restric-
tion the right hand side of the aggregated inequality.
Since problem (22.11) satisfies the Slater condition (as B ≻ 0) and is below
bounded (why?), the dual problem is solvable and Opt(P ) = Opt(D). Moreover,
the dual problem also satisfies the Relaxed Slater condition (why?), so that both
the primal and the dual problems are solvable. ♢
23 Optimality Conditions in Convex Programming
Using our results on convex optimization duality, we next derive optimality con-
ditions for convex programs.
Let x∗ ∈ X. Then,
(i) A sufficient condition for x∗ to be an optimal solution to (IC) is the
existence of the vector of Lagrange multipliers λ∗ ≥ 0 such that (x∗ , λ∗ ) is
a saddle point of the Lagrange function L(x, λ), i.e., a point where L(x, λ)
attains its minimum as a function of x ∈ X and attains its maximum as a
function of λ ≥ 0:
L(x, λ∗ ) ≥ L(x∗ , λ∗ ) ≥ L(x∗ , λ) ∀x ∈ X, ∀λ ≥ 0. (23.1)
(ii) Furthermore, if the problem (IC) is convex and satisfies the Relaxed
Slater condition, then the above condition is necessary for optimality of x∗ : if
x∗ is optimal for (IC), then there exists λ∗ ≥ 0 such that (x∗ , λ∗ ) is a saddle
point of the Lagrange function.
Proof. (i): Assume that for a given x∗ ∈ X there exists λ∗ ≥ 0 such that (23.1)
is satisfied, and let us prove that then x∗ is optimal for (IC). First, we claim
that x∗ is feasible. Assume for contradiction that g_j(x∗) > 0 for some j. Then, of course, sup_{λ≥0} L(x∗, λ) = +∞ (look what happens when all λ's, except λ_j, are fixed,
Recall that for any x feasible for (IC), we have g_j(x) ≤ 0 for all j. Together with λ∗ ≥ 0, we then deduce that for any x feasible for (IC), we have f(x) ≥ f(x) + Σ_{j=1}^m λ_j^∗ g_j(x). But, then the above relation immediately implies that x∗ is optimal.
(ii): Assume that (IC) is a convex program, x∗ is its optimal solution and
the problem satisfies the Relaxed Slater condition. We will prove that then there
exists λ∗ ≥ 0 such that (x∗ , λ∗ ) is a saddle point of the Lagrange function, i.e., that
(23.1) is satisfied. As we know from the Convex Programming Duality Theorem
(Theorem IV.21.1.ii), the dual problem (IC∗ ) has a solution λ∗ ≥ 0 and the
optimal value of the dual problem is equal to the optimal value of the primal one,
i.e., to f (x∗ ):
f(x∗) = L(λ∗) ≡ inf_{x∈X} L(x, λ∗).     (23.2)
the terms in the summation expression in the right hand side are nonpositive
(since x∗ is feasible for (IC) and λ∗ ≥ 0), and the sum itself is nonnegative
due to our inequality. Note that this is possible if and only if all the terms in the
summation expression are zero, and this is precisely the complementary slackness.
From the complementary slackness we immediately conclude that f (x∗ ) =
L(x∗ , λ∗ ), so that (23.2) results in
L(x∗, λ∗) = f(x∗) = inf_{x∈X} L(x, λ∗).
On the other hand, since x∗ is feasible for (IC), from the definition of the Lagrangian function we deduce that L(x∗, λ) ≤ f(x∗) whenever λ ≥ 0. Combining our observations, we conclude that

L(x∗, λ) ≤ L(x∗, λ∗) ≤ L(x, λ∗)
(D)
Suppose that (P) is bounded and satisfies the Relaxed Slater condition. Then, a point x∗ ∈ X is an optimal solution to (P) if and only if x∗ can be augmented by a properly selected λ∗ ∈ Λ := R^k_+ × [K∗] to be a saddle point of the cone-constrained Lagrange function

L(x; [λ; λ̂]) := f(x) + λ^⊤g(x) + λ̂^⊤ĝ(x)

on X × Λ.
Proof. The proof basically repeats the one of Theorem IV.23.1. In one direction: assume that x∗ ∈ X can be augmented by λ∗ = [λ∗; λ̂∗] ∈ Λ to form a saddle point of L(x; λ) on X × Λ, and let us prove that x∗ is an optimal solution to (P). Observe, first, that from

L(x∗; λ∗) = sup_{[λ;λ̂]∈Λ} L(x∗; [λ; λ̂]) = f(x∗) + sup_{[λ;λ̂]∈Λ} [ λ^⊤g(x∗) + λ̂^⊤ĝ(x∗) ]     (23.3)

it follows that the linear form λ̂^⊤ĝ(x∗) of λ̂ is bounded from above on the cone K∗, implying that −ĝ(x∗) ∈ [K∗]∗ = K. Similarly, (23.3) says that the linear form λ^⊤g(x∗) of λ is bounded from above on the cone R^k_+, implying that −g(x∗) belongs to the dual of this cone, that is, to R^k_+. Thus, x∗ is feasible for (P).
As x∗ is feasible for (P ), the right hand side of the second equality in (23.3)
is f (x∗ ), and thus (23.3) says that L(x∗ ; λ∗ ) = f (x∗ ). With this in mind, the
relation L(x; λ∗ ) ≥ L(x∗ ; λ∗ ) (which is satisfied for all x ∈ X, since (x∗ , λ∗ ) is
a saddle point of L) reads L(x; λ∗ ) ≥ f (x∗ ). This combines with the relation
f (x) ≥ L(x; λ∗ ) (which, due to λ∗ ∈ Λ, holds true for all x feasible for (P )) to
imply that Opt(P ) ≥ f (x∗ ). The bottom line is that x∗ is a feasible solution to
(P ) satisfying Opt(P ) ≥ f (x∗ ), thus, x∗ is an optimal solution to (P ), as claimed.
In the opposite direction: let x∗ be an optimal solution to (P), and let us verify that x∗ is the first component of a saddle point of L(x; λ) on X × Λ. Indeed, (P) is a convex essentially strictly feasible cone-constrained problem; being solvable, it is below bounded. Applying the Convex Programming Duality Theorem in cone-constrained form (Theorem IV.22.1), the dual problem (D) is solvable with optimal value Opt(D) = Opt(P). Denoting by λ∗ = [λ∗; λ̂∗] an optimal solution to (D), we have

f(x∗) = Opt(P) = Opt(D) = L(λ∗) = inf_{x∈X} L(x; λ∗),     (23.4)

whence f(x∗) ≤ L(x∗; λ∗) = f(x∗) + [λ∗]^⊤g(x∗) + [λ̂∗]^⊤ĝ(x∗), that is, [λ∗]^⊤g(x∗) + [λ̂∗]^⊤ĝ(x∗) ≥ 0. Both terms in the latter sum are nonpositive (as x∗ is feasible for (P) and λ∗ ∈ Λ), while their sum is nonnegative, so that [λ∗]^⊤g(x∗) = 0 and [λ̂∗]^⊤ĝ(x∗) = 0. We conclude that the inequality f(x∗) ≤ L(x∗; λ∗) is in fact an equality, so that (23.4) reads inf_{x∈X} L(x; λ∗) = L(x∗; λ∗). Next, L(x∗; λ) ≤ f(x∗) for λ ∈ Λ due to the feasibility of x∗ for (P), which combines with the already proved equality L(x∗; λ∗) = f(x∗) to imply that sup_{λ∈Λ} L(x∗; λ) = L(x∗; λ∗). Thus, (x∗, λ∗) is the desired saddle point of L.
λ_j^∗ g_j(x∗) = 0,  j = 1, . . . , m,     [complementary slackness]     (23.5)

and

∇f(x∗) + Σ_{j=1}^m λ_j^∗ ∇g_j(x∗) ∈ −N_X(x∗),     [KKT equation]     (23.6)

where N_X(x∗) is the normal cone of X at x∗.
We are now ready to state “more explicit” optimality conditions for convex
programs based on KKT points.
Proof. (i): Suppose x∗ is a KKT point of problem (IC), and let us prove that x∗ is an optimal solution to (IC). By Theorem IV.23.1, it suffices to demonstrate that augmenting x∗ by properly selected λ ≥ 0, we get a saddle point (x∗, λ) of the Lagrange function on X × R^m_+. Let λ∗ be the Lagrange multipliers associated with x∗ according to the definition of a KKT point. We claim that (x∗, λ∗) is a saddle point of the Lagrange function. Indeed, complementary slackness says that L(x∗, λ∗) = f(x∗), while due to feasibility of x∗ we have sup_{λ≥0} Σ_{j=1}^m λ_j g_j(x∗) = 0. Taken together, these observations say that L(x∗, λ∗) = sup_{λ≥0} L(x∗, λ). Moreover,
x∗, then x∗ is a KKT point. Indeed, let x∗ and (IC) satisfy the above “if.” By Theorem IV.23.1(ii), x∗ can be augmented by some λ∗ ≥ 0 to yield a saddle point (x∗, λ∗) of L(x, λ) on X × R^m_+. Then, the saddle point inequalities (23.1) give us

f(x∗) + Σ_{j=1}^m λ_j^∗ g_j(x∗) = L(x∗, λ∗) ≥ sup_{λ≥0} L(x∗, λ) = f(x∗) + sup_{λ≥0} Σ_{j=1}^m λ_j g_j(x∗).     (23.7)
Moreover, as x∗ is feasible for (IC), we have g_j(x∗) ≤ 0 for all j, whence

sup_{λ≥0} Σ_j λ_j g_j(x∗) = 0.

Therefore (23.7) implies that Σ_j λ_j^∗ g_j(x∗) ≥ 0. This inequality, in view of λ_j^∗ ≥ 0
and g_j(x∗) ≤ 0 for all j, implies that λ_j^∗ g_j(x∗) = 0 for all j, i.e., the complementary slackness condition (23.5) holds. The relation L(x, λ∗) ≥ L(x∗, λ∗) for all x ∈ X
implies that the function ϕ(x) := L(x, λ∗ ) attains its minimum on X at x = x∗ .
Note also that ϕ(x) is convex on X and differentiable at x∗ , thus, by Proposition
III.14.3, we deduce that the KKT equation (23.6) also holds.
Note that the optimality conditions stated in Theorem III.14.2 and Proposition
III.14.3 are particular cases of Theorem IV.23.4 corresponding to m = 0.
Remark IV.23.5 A standard special case of Theorem IV.23.4 that is worth
discussing explicitly is when x∗ is in the (relative) interior of X.
When x∗ ∈ int X, we have N_X(x∗) = {0}, so that (23.6) reads

∇f(x∗) + Σ_{j=1}^m λ_j^∗ ∇g_j(x∗) = 0.
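For a concrete illustration of (23.5)–(23.6) in this interior-point form, here is a tiny sketch on a problem with a known closed-form solution (projection of a point onto a halfspace; the specific data are illustrative assumptions):

```python
import numpy as np

# min_x f(x) = 0.5*||x - d||^2  s.t.  g(x) = sum(x) <= 0,  X = R^n (so N_X(x*) = {0})
rng = np.random.default_rng(9)
n = 4
d = rng.standard_normal(n)

lam = max(0.0, np.sum(d)) / n          # the known multiplier for this projection problem
x_star = d - lam * np.ones(n)          # the known optimal solution

grad_f = x_star - d                    # gradient of f at x*
grad_g = np.ones(n)                    # gradient of g
print(np.allclose(grad_f + lam * grad_g, 0.0))     # KKT equation (23.6) with N_X = {0}
print(abs(lam * np.sum(x_star)) < 1e-12)           # complementary slackness (23.5)
print(np.sum(x_star) <= 1e-12 and lam >= 0)        # feasibility of x* and of lambda*
```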
min_x { f(x) : g(x) := Ax − b ≤ 0, ĝ(x) ≤_K 0, x ∈ X },     (ConeC)

where X ⊆ R^n, f : X → R, g : R^n → R^k, ĝ : X → R^ν, and K ⊂ R^ν is a regular cone.
The proof of this theorem follows verbatim the proof of Theorem IV.23.4, with Theorem IV.23.2 in the role of Theorem IV.23.1.
Application: Optimal value in a parametric convex cone-constrained problem. What follows is a far-reaching extension of the subgradient interpretation of Lagrange multipliers presented in section 22.3.A. Consider a parametric family of convex cone-constrained problems defined by a parameter p ∈ P:

Opt(p) := min_{x∈X} { f(x, p) : g(x, p) ≤_M 0 },     (P[p])

where
(a) X ⊆ Rn and P ⊆ Rµ are nonempty convex sets,
(b) M is a regular cone in some Rν .
(c) f : X × P → R is convex, and g : X × P → R^ν is M-convex.¹
¹ In what follows, splitting the constraint in a cone-constrained problem into a system of scalar linear inequalities and a conic inequality does not play any role, and in order to save notation, (P[p]) uses the “single-constraint” format of cone-constrained problem. Of course, the two-constraint format g(x) := Ax − b ≤ 0, ĝ(x) ≤_K 0 reduces to the single-constraint one by setting g(x) = [g(x); ĝ(x)] and M = R^k_+ × K.
is a subgradient of Opt(·) at p̄:

Opt(p) ≥ Opt(p̄) + [p − p̄]^⊤[F_p + G_p^⊤μ],  ∀p ∈ P.     (23.11)

≥ Opt(p̄) + (F_p + G_p^⊤μ)^⊤[p − p̄],

where the first inequality follows from μ ∈ M∗ and g(x, p) ≤_M 0, the second is due to (23.12) and (23.13), the equality holds by recalling that x̄ is optimal for (P[p̄]), and the last inequality is due to (23.10). As the resulting inequality holds true for all x feasible for (P[p]), it justifies (23.11).
To complete the proof, we need to verify the convexity of Opt(·). By the relation
in (23.11), Opt(p) for p ∈ P is either a real, or +∞, as is required for a convex
function. For any p′ , p′′ ∈ P ∩ Dom(Opt(·)) and λ ∈ [0, 1], we need to check
Opt(λp′ + (1 − λ)p′′ ) ≤ λOpt(p′ ) + (1 − λ)Opt(p′′ ). (23.14)
This is immediate (cf. proof of Fact IV.22.2): given ϵ > 0, we can find x′ , x′′ ∈ X
such that
g(x′ , p′ ) ≤M 0, g(x′′ , p′′ ) ≤M 0, f (x′ , p′ ) ≤ Opt(p′ ) + ϵ, f (x′′ , p′′ ) ≤ Opt(p′′ ) + ϵ.
Setting p := λp′ + (1 − λ)p′′ , x := λx′ + (1 − λ)x′′ and invoking convexity of f
and M-convexity of g, we get

g(x, p) ≤_M 0,  f(x, p) ≤ [λOpt(p′) + (1 − λ)Opt(p′′)] + ϵ.
Finally, since ϵ > 0 is arbitrary, we arrive at (23.14).
Note that the result of section 22.3.A is nothing but what Proposition IV.23.8 states in the case of f independent of p, M = R^k_+ × K, p = [δ; δ̂] ∈ R^k × R^ν, and g(x, p) = [g(x) − δ; ĝ(x) − δ̂].
Suppose that both problems satisfy the Relaxed Slater condition. Then, a pair of feasible solutions x∗ to (P) and λ∗ := [λ∗; λ̂∗] to (D) is optimal for the respective problems if and only if

DualityGap(x∗; λ∗) := c^⊤x∗ − [−b^⊤λ∗ − p^⊤λ̂∗] = 0,     [Zero Duality Gap]
Remark IV.23.10 Under the premise of Theorem IV.23.9, from the feasibility of x∗ and λ∗ for their respective problems it follows that b − Ax∗ ≥ 0, p − Px∗ ∈ K, λ∗ ≥ 0, and λ̂∗ ∈ K∗. Therefore, Complementary Slackness (which requires that the sum of two inner products, each of a vector from a regular cone with a vector from the dual of this cone, and as such automatically nonnegative, be zero) is a really strong restriction. This comment is applicable to the relation [λ̂∗]^⊤ĝ(x∗) = 0 in (23.8). ■
Proof of Theorem IV.23.9. By the Conic Duality Theorem (Theorem IV.22.6) we are in the case when Opt(P) = Opt(D) ∈ R, and therefore for any x and λ := [λ; λ̂], we have
Now, when x is feasible for (P), the primal optimality gap c^⊤x − Opt(P) is nonnegative and is zero if and only if x is optimal for (P). Similarly, when λ = [λ; λ̂] is feasible for (D), the dual optimality gap Opt(D) − (−b^⊤λ − p^⊤λ̂) is nonnegative and is zero if and only if λ is optimal for (D). We conclude that whenever x is feasible for (P) and λ is feasible for (D), the duality gap DualityGap(x; λ) (which, as we have seen, is the sum of the corresponding optimality gaps) is nonnegative and is zero if and only if both these optimality gaps are zero, that is, if and only if x is optimal for (P) and λ is optimal for (D).
It remains to note that the Complementary Slackness condition is equivalent to the Zero Duality Gap one. To this end, note that since x∗ and λ∗ are feasible for their respective problems, we have

DualityGap(x∗; λ∗) = c^⊤x∗ + b^⊤λ∗ + p^⊤λ̂∗
                   = −[A^⊤λ∗ + P^⊤λ̂∗]^⊤x∗ + b^⊤λ∗ + p^⊤λ̂∗
                   = λ∗^⊤[b − Ax∗] + λ̂∗^⊤[p − Px∗].

Therefore, Complementary Slackness, for the solutions x∗ and λ∗ that are feasible for the respective problems, is exactly the same as Zero Duality Gap.
Example IV.23.1 (continued from Example IV.22.1) Consider the primal-dual pair of conic problems (22.11) and (22.12). We claim that the primal solution y = −B^{1/2}, t = −Tr(B^{1/2}) and the dual solution λ = 1, U = ½B^{−1/2}, V = ½I_n, W = ½B^{1/2} are optimal for the respective problems. Indeed, it is immediately seen that these solutions are feasible for the respective problems (to check feasibility of the dual solution, use the Schur Complement Lemma). Moreover, the objective value of the primal solution equals the objective value of the dual solution, and both these quantities are equal to −Tr(B^{1/2}). Thus, zero duality gap indeed holds true. ♢
24 Duality in Linear and Convex Quadratic Programming
The fundamental role of the Lagrange function and Lagrange Duality in Op-
timization is clear already from the Optimality Conditions given by Theorem
IV.23.1, but this role is not restricted to this theorem only. There are several
cases when we can explicitly write down the Lagrange dual, and whenever it is
the case, we get a pair of explicitly formulated and closely related to each other
optimization programs – the primal-dual pair; analyzing the problems simultaneously, we get more information about their properties (and the possibility to solve the problems numerically in a more efficient way) than is possible when we restrict ourselves to only one problem of the pair. The detailed investigation
of Duality in “well-structured” Convex Programming deals with cone-constrained
Lagrange duality and conic problems. This being said, there are cases where al-
ready “plain” Lagrange duality is quite appropriate. Let us look at two of these
particular cases.
on X × R^m_+: equalities (23.5) taken together with feasibility of x∗ state that L(x∗, λ) attains its maximum in λ ≥ 0 at λ∗, and (23.6) states that when λ is fixed at λ∗ the function L(x, λ∗) attains its minimum in x ∈ X at x = x∗.
Now consider the particular case of (IC) where X = Rn is the entire space, the
objective f is convex and everywhere differentiable and the constraints g1 , . . . , gm
are linear. In this case the Relaxed Slater Condition holds whenever there is a fea-
sible solution to (IC), and when that is the case, Theorem IV.23.4 states that the
KKT (Karush-Kuhn-Tucker) condition is necessary and sufficient for optimality
of x∗ ; as we just have explained, this is the same as to say that the necessary
and sufficient condition of optimality for x∗ is that x∗ along with certain λ∗ ≥ 0
form a saddle point of the Lagrange function. Combining these observations with
Proposition IV.21.2, we get the following simple result.
Let us look at what Proposition IV.24.1 says in the Linear Programming case, i.e., when (IC) is the problem given by

min_x { f(x) := c^⊤x : g_j(x) := b_j − a_j^⊤x ≤ 0, j = 1, . . . , m }.     (P)
In order to get to the Lagrange dual, we should form the Lagrange function of (IC) given by

L(x, λ) = f(x) + Σ_{j=1}^m λ_j g_j(x) = [ c − Σ_{j=1}^m λ_j a_j ]^⊤ x + Σ_{j=1}^m λ_j b_j,
and minimize it in x ∈ R^n; this will give us the dual objective. In our case the minimization in x is immediate: the minimal value is equal to −∞ if c − Σ_{j=1}^m λ_j a_j ≠ 0, and it is Σ_{j=1}^m λ_j b_j otherwise. Hence, we see that the Lagrange dual is given by

max_λ { b^⊤λ : Σ_{j=1}^m λ_j a_j = c, λ ≥ 0 }.     (D)
Therefore, the Lagrange dual problem is precisely the usual LP dual to (P ), and
Proposition IV.24.1 is one of the equivalent forms of the Linear Programming
Duality Theorem (Theorem I.4.9) which we already know.
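A throwaway numerical illustration of this primal-dual pair with scipy (the random data are generated so that both problems are feasible, hence both are solvable with equal optimal values):

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(10)
m, n = 8, 4
A = rng.standard_normal((m, n))                 # rows a_j^T
x0 = rng.standard_normal(n)
b = A @ x0 - rng.uniform(0.1, 1.0, m)           # b_j <= a_j^T x0  =>  (P) is feasible
lam0 = rng.uniform(0.1, 1.0, m)
c = A.T @ lam0                                  # c = sum_j lam0_j a_j  =>  (D) is feasible

# (P): min c^T x  s.t.  A x >= b   (written as -A x <= -b), x free
primal = linprog(c, A_ub=-A, b_ub=-b, bounds=[(None, None)] * n)
# (D): max b^T lam  s.t.  A^T lam = c, lam >= 0   (linprog minimizes -b^T lam)
dual = linprog(-b, A_eq=A.T, b_eq=c, bounds=[(0, None)] * m)

print(primal.fun, -dual.fun)                    # equal optimal values (LP duality)
```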
where the objective is a strictly convex quadratic form, so that the matrix Q = Q⊤
is positive definite, i.e., x⊤ Qx > 0 whenever x ̸= 0. It is convenient to rewrite the
constraints in the vector-matrix form using the notation

g(x) = b − Ax ≤ 0,  where b := [b_1; . . . ; b_m],  A := [a_1^⊤; . . . ; a_m^⊤].
In order to form the Lagrange dual of the program (P), we write down the Lagrange function

L(x, λ) = f(x) + Σ_{j=1}^m λ_j g_j(x) = ½ x^⊤Qx + c^⊤x + λ^⊤(b − Ax) = ½ x^⊤Qx − (A^⊤λ − c)^⊤x + b^⊤λ,
and minimize it in x. Since the function is convex and differentiable in x, the
minimum, if exists, is given by the Fermat rule
∇x L(x, λ) = 0,
which in our situation becomes
Qx = A⊤ λ − c.
Since Q is positive definite, it is nonsingular, so that the Fermat equation has a
unique solution which is the minimizer of L(·, λ). This solution is
x(λ) := Q−1 (A⊤ λ − c).
Substituting the expression for x(λ) into the expression for the Lagrange function, we get the dual objective

L(λ) = −½ (A^⊤λ − c)^⊤Q^{−1}(A^⊤λ − c) + b^⊤λ.
Thus, the dual problem is to maximize this objective over the nonnegative orthant. Let us rewrite this dual problem equivalently by introducing additional variables

t := −Q^{−1}(A^⊤λ − c)  ⟹  (A^⊤λ − c)^⊤Q^{−1}(A^⊤λ − c) = t^⊤Qt.

With this substitution, the dual problem becomes

max_{λ,t} { −½ t^⊤Qt + b^⊤λ : A^⊤λ + Qt = c, λ ≥ 0 }.     (D)
We see that the dual problem also turns out to be a linearly constrained convex quadratic program.
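A small numerical check of the derivation (random positive definite Q; the check confirms that x(λ) minimizes L(·, λ) and that plugging it in reproduces the dual objective above):

```python
import numpy as np

rng = np.random.default_rng(11)
m, n = 6, 4
M = rng.standard_normal((n, n))
Q = M @ M.T + np.eye(n)                         # Q positive definite
A = rng.standard_normal((m, n))
b, c = rng.standard_normal(m), rng.standard_normal(n)
lam = rng.uniform(0.0, 1.0, m)                  # any lambda >= 0

L = lambda x: 0.5 * x @ Q @ x + c @ x + lam @ (b - A @ x)
x_lam = np.linalg.solve(Q, A.T @ lam - c)       # x(lambda) = Q^{-1}(A^T lambda - c)

# x(lambda) beats random points, and L(x(lambda), lambda) equals the dual objective
assert all(L(x_lam) <= L(x_lam + rng.standard_normal(n)) for _ in range(100))
dual_obj = -0.5 * (A.T @ lam - c) @ np.linalg.solve(Q, A.T @ lam - c) + b @ lam
print(np.isclose(L(x_lam), dual_obj))
```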
Remark IV.24.2 Note also that a feasible quadratic program in the form of (P) with a positive definite matrix Q is automatically solvable. This relies on the following simple general fact:
Proof. (i): Proposition IV.24.1 implies that the optimal value in minimization
problem (P ) is equal to the optimal value in the maximization problem (D).
It follows that the value of the primal objective at any primal feasible solution
is ≥ the value of the dual objective at any dual feasible solution, and equality
is possible if and only if these values coincide with the optimal values in the
problems, as claimed in (i).
(ii): Let ∆(x, (λ, t)) be the difference between the primal objective value of the primal feasible solution x and the dual objective value of the dual feasible solution (λ, t):

∆(x, (λ, t)) := (c^⊤x + ½ x^⊤Qx) − (b^⊤λ − ½ t^⊤Qt)
             = (A^⊤λ + Qt)^⊤x + ½ x^⊤Qx + ½ t^⊤Qt − b^⊤λ
             = λ^⊤(Ax − b) + ½ (x + t)^⊤Q(x + t),
where the second equation follows since A⊤ λ + Qt = c. Whenever x is primal
feasible, we have Ax − b ≥ 0, and similarly dual feasibility of (λ, t) implies that
λ ≥ 0. Since Q is positive definite as well, we then deduce that the first and
the second terms in the above representation of ∆(x, (λ, t)) are nonnegative for
every pair (x; (λ, t)) of primal and dual feasible solutions. Thus, for such a pair
∆(x, (λ, t)) = 0 holds if and only if λ⊤ (Ax − b) = 0 and (x + t)⊤ Q(x + t) = 0. The
first of these equalities, due to λ ≥ 0 and Ax ≥ b, is equivalent to λj (Ax − b)j = 0
25.2.1 Cone-monotonicity
Let us start with a new (for us) notion which will play an important role in
“calculus of cone-convexity.”