
SOLUTIONS TO EXERCISES IN AN

INTRODUCTION TO CONVEXITY


ØYVIND RYAN

JUNE 2020

∗ University of Oslo, Centre of Mathematics for Applications, P.O. Box 1053, Blindern, 0316 Oslo, Norway ([email protected])
Contents

1 The basic concepts
2 Convex hulls and Carathéodory’s theorem
3 Projection and separation
4 Representation of convex sets
5 Convex functions
6 Nonlinear and convex optimization
Chapter 1

The basic concepts

Exercise 1.1. Let x1 , x2 , y1 , y2 ∈ IRn and assume that x1 ≤ x2 and y1 ≤ y2 .


Verify that the inequality x1 + y1 ≤ x2 + y2 also holds. Let now λ be a nonnegative
real number. Explain why λx1 ≤ λx2 holds. What happens if λ is negative?
Solution: We have that x1 + y1 ≤ x2 + y1 ≤ x2 + y2, since adding the same
vector to both sides of a componentwise inequality does not alter it. If λ ≥ 0,
multiplying each component inequality of x1 ≤ x2 by λ preserves it, so that
λx1 ≤ λx2. If λ < 0 we obtain that λx1 ≥ λx2, since multiplication with a
negative number changes the direction of the inequality.
Exercise 1.2. Think about the question in Exercise 1.1 again, now in light of
the properties explained in Example 1.2.1.
Solution: The exercise adds, to the observation that IRn+ is closed under addition
and under multiplication with nonnegative scalars, the fact that these two operations
also respect the ordering on IRn defined by x ≤ y ⇐⇒ xi ≤ yi for all i.
Exercise 1.3. Let a ∈ IRn+ and assume that x ≤ y. Show that aT x ≤ aT y. What
happens if we do not require a to be nonnegative here?
Solution: If a ∈ IRn+, then since xi ≤ yi we have that ai xi ≤ ai yi. We obtain that

aT x = ∑_{i=1}^n ai xi ≤ ∑_{i=1}^n ai yi = aT y,

and the result follows. If a has some negative components, the corresponding term
inequalities are reversed, so the conclusion may fail. If all entries of a are
nonpositive, the sum inequality above is reversed: aT x ≥ aT y.
Exercise 1.4. Show that every ball B(a, r) := {x ∈ IRn : kx − ak ≤ r} is convex
(where a ∈ IRn and r ≥ 0).
Solution: Assume that x ∈ B(a, r), y ∈ B(a, r), so that kx − ak ≤ r, ky − ak ≤ r, and let 0 ≤ λ ≤ 1.
We have that
k((1 − λ)x + λy) − ak = k(1 − λ)(x − a) + λ(y − a)k
≤ (1 − λ)kx − ak + λky − ak ≤ (1 − λ)r + λr = r,


so that also (1 − λ)x + λy ∈ B(a, r). It follows that B(a, r) is convex.


Exercise 1.5. Explain how you can write the LP problem max {cT x : Ax ≤ b}
in the form max {cT x : Ax = b, x ≥ O}.
Solution: Ax ≤ b is equivalent to solving Ax + z = b, z ≥ O, for x and the slack
vector z. Now, write x = x1 − x2 where x1, x2 ≥ O. We obtain the system

A(x1 − x2) + z = Ax1 − Ax2 + z = [A  −A  I] (x1, x2, z) = b,

so that A is replaced with the block matrix [A  −A  I], the variable vector with the
nonnegative vector (x1, x2, z), and c with (c, −c, O), since cT x = cT x1 − cT x2.
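As an illustration, here is a small Python/numpy sketch of this rewriting (the helper name to_standard_form and the stacking order (x1, x2, z) are choices made here, not taken from the text):

import numpy as np

def to_standard_form(A, b, c):
    # Rewrite max{c^T x : Ax <= b} as max{c'^T y : A'y = b, y >= O},
    # where y stacks (x1, x2, z) and x = x1 - x2.
    m, n = A.shape
    A_std = np.hstack([A, -A, np.eye(m)])         # the block matrix [A  -A  I]
    c_std = np.concatenate([c, -c, np.zeros(m)])  # the new objective (c, -c, O)
    return A_std, b, c_std

Given an optimal y for the rewritten problem, the original solution is recovered as x = y[:n] - y[n:2*n].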
Exercise 1.6. Make a drawing of the standard simplices S1, S2 and S3. Verify
that each unit vector ej lies in Sn (ej has a one in position j, all other components
are zero). Each x ∈ Sn may be written as a linear combination x = ∑_{j=1}^n λj ej
where each λj is nonnegative and ∑_{j=1}^n λj = 1. How? Can this be done in several
ways?
Solution: ej lies in Sn since its components are nonnegative and sum to one. In
x = ∑_{j=1}^n λj ej, we simply have that λj = xj (the j’th component of x). This
decomposition is unique, due to linear independence of the standard basis.
Exercise 1.7. Show that each convex cone is indeed a convex set.
Solution: Let C be a convex cone, and let x1 ∈ C, x2 ∈ C. Then (1−λ)x1 +λx2 ∈
C for 0 ≤ λ ≤ 1, since λ, 1 − λ ≥ 0. It follows that C also is a convex set.
Exercise 1.8. Let A ∈ IRm,n and consider the set C = {x ∈ IRn : Ax ≤ O}.
Prove that C is a convex cone.
Solution: Let x1 , x2 ∈ C, and λ1 , λ2 ≥ 0. Then we have that

A(λ1 x1 + λ2 x2 ) = λ1 Ax1 + λ2 Ax2 ≤ 0

since Axi ≤ O. It follows that λ1 x1 + λ2 x2 ∈ C, so that C is a convex cone.


Exercise 1.9. Prove that C(x1 , . . . , xt ) is a convex cone.
Solution: Let y, z ∈ C(x1 , . . . , xt ), so that

y = a1 x1 + . . . + at xt and z = b1 x1 + . . . + bt xt,

where a1 , b1 , . . . , at , bt ≥ 0. Let λ1 , λ2 ≥ 0. We have that

λ1 y + λ2 z = (λ1 a1 + λ2 b1 )x1 + . . . + (λ1 at + λ2 bt )xt ,

where all λ1 ai + λ2 bi ≥ 0 (since all variables are ≥ 0). It follows that λ1 y + λ2 z ∈


C(x1 , . . . , xt ), so that C(x1 , . . . , xt ) is a convex cone.
3

Exercise 1.10. Let S = {(x, y, z) : z ≥ x² + y²} ⊂ IR3. Sketch the set and verify
that it is a convex set. Is S a finitely generated cone?
Solution: Assume that a1 = (x1, y1, z1), a2 = (x2, y2, z2) are both in S, so that
z1 ≥ x1² + y1², z2 ≥ x2² + y2². Convexity of S is the same as showing, for 0 ≤ λ ≤ 1,
that

(1 − λ)a1 + λa2 = ((1 − λ)x1 + λx2 , (1 − λ)y1 + λy2 , (1 − λ)z1 + λz2 ) ∈ S,

i.e., that

((1 − λ)x1 + λx2)² + ((1 − λ)y1 + λy2)² ≤ (1 − λ)z1 + λz2.

We have that

((1 − λ)x1 + λx2)² ≤ (1 − λ)x1² + λx2²

(since the difference can be reorganized to λ(1 − λ)(x1² + x2² − 2x1x2) = λ(1 − λ)(x1 − x2)² ≥ 0;
convexity of the function f(x) = x² is really what is at play here, and we will return
to this later), and similarly

((1 − λ)y1 + λy2)² ≤ (1 − λ)y1² + λy2².

It follows that

((1 − λ)x1 + λx2)² + ((1 − λ)y1 + λy2)²
≤ (1 − λ)x1² + λx2² + (1 − λ)y1² + λy2²
= (1 − λ)(x1² + y1²) + λ(x2² + y2²) ≤ (1 − λ)z1 + λz2,

and the result follows. Note finally that S is not a cone: (1, 0, 1) ∈ S, but
2 · (1, 0, 1) = (2, 0, 2) ∉ S since 2 < 2² + 0². In particular S is not a finitely
generated cone.


Exercise 1.11. Consider the linear system 0 ≤ xi ≤ 1 for i = 1, . . . , n and let
P denote the solution set. Explain how to solve a linear programming problem

max{cT x : x ∈ P }.

What if the linear system was ai ≤ xi ≤ bi for i = 1, . . . , n. Here we assume


ai ≤ bi for each i.
Solution: Since cT x = ∑_{i=1}^n ci xi, we see that the maximum can be obtained by
maximizing each term ci xi separately. If ci > 0 this maximum is ci (attained at
xi = 1); if ci ≤ 0 it is 0 (attained at xi = 0). It follows that the maximum is
∑_{i: ci>0} ci. For the system ai ≤ xi ≤ bi the same argument gives xi = bi when
ci > 0 and xi = ai otherwise, so the maximum is ∑_{i: ci>0} ci bi + ∑_{i: ci≤0} ci ai.
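A minimal numpy sketch of this closed-form rule (the function name is hypothetical):

import numpy as np

def box_lp_max(c, a, b):
    # Maximize c^T x over a <= x <= b by maximizing each term separately:
    # x_i = b_i when c_i > 0, and x_i = a_i otherwise.
    x = np.where(c > 0, b, a)
    return c @ x, x

c = np.array([2.0, -1.0, 0.5])
print(box_lp_max(c, np.zeros(3), np.ones(3)))  # (2.5, array([1., 0., 1.]))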
Exercise 1.12. Is the union of two convex sets again convex?
Solution: No. Take for instance the union of the two intervals (−∞, 1) and
(1, ∞).

Exercise 1.13. Determine the sum A + B in each of the following cases:


(i) A = {(x, y) : x2 + y 2 ≤ 1}, B = {(3, 4)};
(ii) A = {(x, y) : x2 + y 2 ≤ 1}, B = [0, 1] × {0};
(iii) A = {(x, y) : x + 2y = 5}, B = {(x, y) : x = y, 0 ≤ x ≤ 1};
(iv) A = [0, 1] × [1, 2], B = [0, 2] × [0, 2].

Solution:
(i): The disk with center (3, 4) and radius 1.
(ii): The left half of the disk with center (0, 0) and radius 1, combined with the
rectangle with corners (0, 1), (0, −1), (1, 1), (1, −1), combined with the disk with
center (1, 0) and radius 1.
     
(iii): We can write A = {(5, 0) + y(−2, 1) : y ∈ IR} and B = {x(1, 1) : 0 ≤ x ≤ 1}.
If we set x = 1 we obtain the line

(5, 0) + y(−2, 1) + (1, 1) = (6, 1) + y(−2, 1).

It follows that A + B is the region between (and including) the parallel lines

(5, 0) + y(−2, 1) and (6, 1) + y(−2, 1).

(iv): The rectangle with vertices (0, 1), (0, 4), (3, 1), (3, 4).
Exercise 1.14. (i) Prove that, for every λ ∈ IR and A, B ⊆ IRn , it holds that
λ(A + B) = λA + λB.
(ii) Is it true that (λ + µ)A = λA + µA for every λ, µ ∈ IR and A ⊆ IRn ? If
not, find a counterexample.
(iii) Show that, if λ, µ ≥ 0 and A ⊆ IRn is convex, then (λ + µ)A = λA + µA.
Solution:
(i): A general element in A + B is of the form a + b for a ∈ A, b ∈ B. Since
λ(a + b) = λa + λb ∈ λA + λB, it follows that λ(A + B) ⊆ λA + λB. The reverse
inclusion follows in the same way.
(ii): Not in general. If λ = −µ ≠ 0 and A is nonempty, the set on the left is {O},
while the set on the right can be larger: for A = {0, 1}, λ = 1, µ = −1 we get
(λ + µ)A = {0} but λA + µA = {−1, 0, 1}.
(iii): Let A be convex, and let a1 , a2 ∈ A. Then
 
λa1 + µa2 = (λ + µ)((λ/(λ + µ)) a1 + (µ/(λ + µ)) a2) ∈ (λ + µ)A

(assuming λ + µ > 0; the case λ = µ = 0 is trivial)

since A is convex. It follows that λA + µA ⊆ (λ + µ)A. On the other hand, if


a ∈ A,
(λ + µ)a = λa + µa ∈ λA + µA,
so that (λ + µ)A ⊆ λA + µA. It follows that (λ + µ)A = λA + µA.
Exercise 1.15. Show that if C1, . . . , Ct ⊆ IRn are all convex sets, then C1 ∩ . . . ∩
Ct is convex. Do the same when all sets are affine (or linear subspaces, or convex
cones). In fact, a similar result holds for the intersection of any family of convex
sets. Explain this.
Solution: Assume that x, y ∈ C1 ∩ . . . ∩ Ct . Then, in particular x, y ∈ Ci . Since
Ci is convex, (1 − λ)x + λy ∈ Ci . But then also (1 − λ)x + λy ∈ C1 ∩ . . . ∩ Ct , so
that C1 ∩ . . . ∩ Ct is also convex. For affine sets, linear subspaces and convex
cones the same argument applies, with affine, linear, and nonnegative combinations,
respectively, in place of convex combinations. Note also that the proof applies to
any family of sets, regardless of cardinality, so the result holds for arbitrary
intersections.
Exercise 1.16. Consider a family (possibly infinite) of linear inequalities
aTi x ≤ bi, i ∈ I, and let C be its solution set, i.e., C is the set of points
satisfying all the inequalities. Prove that C is a convex set.
Solution: Each set {x : aTi x ≤ bi} is convex (it is a half-space). C is the
intersection of all these sets, so it is also convex, by the previous exercise.
Exercise 1.17. Consider the unit disc S = {(x1 , x2 ) ∈ IR2 : x21 + x22 ≤ 1} in IR2 .
Find a family of linear inequalities as in the previous problem with solution set
S.
Solution: For any a with kak2 = 1, use the linear inequality aT x ≤ 1. This
describes a half-plane with normal vector a whose boundary line supports the unit
disc at a. The intersection of all these half-planes thus contains the entire unit
disc. If x is a point outside the unit disc, then clearly the half-plane obtained
by setting a = x/kxk2 does not contain x, so that the intersection is exactly S.
Exercise 1.18. Is the unit ball B = {x ∈ IRn : kxk2 ≤ 1} a polyhedron?
Solution: Assume first that n ≥ 2 (for n = 1 the claim fails: B = [−1, 1] is a
polyhedron). Suppose that B equals the intersection of finitely many sets of the
form aTi x ≤ bi, i = 1, . . . , m. Clearly we can assume that the ai are distinct,
that all kai k2 = 1, and that all bi = 1. The plane aTi x = 1 is then a tangent
plane to the unit ball at ai, and B is contained in {x : aTi x ≤ bi}.
Let x be such that kxk2 = 1 and x ≠ ai for all i (possible since there are only
finitely many ai). Then aTi x < bi for all i, since for each i, x = ai is the
unique point in B at which equality holds. By continuity there exists a point
x0 ∉ B so that aTi x0 < bi for all i. Thus, the intersection of all those
half-spaces must contain more than B. It follows that B is not a polyhedron.

Exercise 1.19. Show that the unit ball B∞ = {x ∈ IRn : kxk∞ ≤ 1} is convex.
Here kxk∞ = maxj |xj | is the max norm of x. Show that B∞ is a polyhedron.
Illustrate when n = 2.
Solution: It is straightforward to show that k · k∞ is a norm. Convexity thus
follows from Exercise 1.4, since the proof therein applies for any norm. That B∞ is
a polyhedron follows from writing kxk∞ ≤ 1 equivalently as the 2n inequalities
xi ≤ 1 and −xi ≤ 1 for i = 1, . . . , n.
For n = 2 we obtain the square with vertices (1, 1), (−1, 1), (1, −1), (−1, −1).
Exercise 1.20. Show that the unit ball B1 = {x ∈ IRn : kxk1 ≤ 1} is convex.
Here kxk1 = ∑_{j=1}^n |xj| is the absolute norm of x. Show that B1 is a polyhedron.
Illustrate when n = 2.
Solution: It is straightforward to show that k · k1 is a norm. Convexity thus
follows as above. That B1 is a polyhedron follows from writing kxk1 = ∑_{j=1}^n |xj| ≤ 1
equivalently as the system of inequalities ∑_{j=1}^n ±xj ≤ 1, where the signs
traverse all 2^n possible combinations. For n = 2 there are four possible sign
choices, leading to the polyhedron defined by x + y ≤ 1, x − y ≤ 1, −x + y ≤ 1,
−x − y ≤ 1. This gives the square with vertices (1, 0), (0, 1), (−1, 0), (0, −1).
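The 2^n sign combinations can be generated directly; a small sketch (names chosen here, not from the text):

import itertools
import numpy as np

def l1_ball_constraints(n):
    # Each row s gives one inequality s @ x <= 1; together the 2^n
    # inequalities describe exactly the l1 unit ball in IR^n.
    return np.array(list(itertools.product([1.0, -1.0], repeat=n)))

S = l1_ball_constraints(2)              # rows (1,1), (1,-1), (-1,1), (-1,-1)
x = np.array([0.3, -0.6])
print(bool(np.all(S @ x <= 1)), float(np.abs(x).sum()) <= 1)  # True True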
Exercise 1.21. Prove Proposition 1.5.1.
Solution: Let x0 ∈ C (C is assumed nonempty), and let L = C − x0 = {x − x0 :
x ∈ C} (so that C = L + x0). Assume that C is affine. For x ∈ C and λ ∈ IR we
have that

λ(x − x0) = λx + (1 − λ)x0 − x0 ∈ L,

since λx + (1 − λ)x0 ∈ C. Hence L is closed under scalar multiplication. Now let
x1, x2 ∈ C and fix some λ with 0 < λ < 1, say λ = 1/2. By the scaling property
just shown we can write x1 − x0 = (1 − λ)(x3 − x0) and x2 − x0 = λ(x4 − x0) for
some x3, x4 ∈ C. Thus

x1 − x0 + x2 − x0 = (1 − λ)(x3 − x0) + λ(x4 − x0) = (1 − λ)x3 + λx4 − x0 ∈ L,

since (1 − λ)x3 + λx4 ∈ C. It follows that L is a vector space. From

C − x1 = (C − x0) − (x1 − x0) = L − (x1 − x0) = L

for any x1 ∈ C (using that x1 − x0 ∈ L and that L is a subspace), it follows that
this vector space is uniquely defined (i.e., independent of the choice of x0).
Let now A be a matrix, and let C be the solution set of Ax = b. If Ax1 = b,
Ax2 = b, we get

A((1 − λ)x1 + λx2 ) = (1 − λ)Ax1 + λAx2 = (1 − λ)b + λb = b,

so that C is affine. The other way, if L is a linear subspace of IRn , and x0 ∈ IRn ,
we can find a matrix A with null space L. If Ax0 = b, the solution set of Ax = b
is then the affine set C = L + x0 .

Exercise 1.22. Let C be a nonempty affine set in IRn . Define L = C − C. Show


that L is a subspace and that C = L + x0 for some vector x0 .
Solution: From the previous exercise we can write C = L + x0 (so that L =
C − x0 ) for a unique subspace L (it does not depend on the choice of x0 ∈ C).
Let x1 , x2 ∈ C, so that x1 = l1 + x0 , x2 = l2 + x0 for some l1 , l2 ∈ L. We get that

x1 − x2 = l1 − l2 ∈ L,

so that C − C ⊆ L. The other way, if l ∈ L, we can write l = x − x0 for some


x ∈ C, so that l ∈ C − C. Thus L ⊆ C − C, so that L = C − C.
Chapter 2

Convex hulls and Carathéodory’s theorem

Exercise 2.1. Illustrate some combinations (linear, convex, nonnegative) of two


vectors in IR2 .
Exercise 2.2. Choose your favorite three points x1 , x2 , x3 in IR2 , but make sure
that they do not all lie on the same line. Thus, the three points form the corners
of a triangle C. Describe those points that are convex combinations of two of the
three points. What about the interior of the triangle C, i.e., those points that lie
in C but not on the boundary (the three sides): can these points be written as
convex combinations of x1 , x2 and x3 ? If so, how?
Solution: Take for instance the three points x1 = (0, 1), x2 = (1, 1), x3 = (1, 0).
The convex combinations of two points lie on the line between those points.
Convex combinations of (0, 1) and (1, 1), as well as convex combinations of (1, 0)
and (1, 1), and convex combinations of (0, 1) and (1, 0), constitute the three edges
which together form the boundary. Clearly, the interior points lie on a line through
x1 and a point on the edge connecting x2 and x3 . The latter can be written as
(1 − λ)x2 + λx3 . An interior point is thus a convex combination of this and x1 ,
which can be written as

(1 − µ)x1 + µ ((1 − λ)x2 + λx3 ) = (1 − µ)x1 + µ(1 − λ)x2 + µλx3 .

This is a convex combination of the three points, since the coefficients sum to
1 − µ + µ(1 − λ + λ) = 1.
Exercise 2.3. Show that conv(S) is convex for all S ⊆ IRn. (Hint: look at two
convex combinations ∑_j λj xj and ∑_j µj yj, and note that both these points may
be written as convex combinations of the same set of vectors.)
Solution: Let x = ∑_{j=1}^t λj xj and y = ∑_{j=1}^s µj yj be convex combinations of
points in S. By taking the union of the points xj and yj, both x and y can be
written as convex combinations of the same set of points {zj}_{j=1}^r (some of the
coefficients may now be zero). We obtain

(1 − λ)x + λy = (1 − λ) ∑_{j=1}^r λj zj + λ ∑_{j=1}^r µj zj = ∑_{j=1}^r ((1 − λ)λj + λµj) zj.

Clearly the coefficients (1 − λ)λj + λµj sum to one, so that this is also a convex
combination of points in S. It follows that conv(S) is convex.
Exercise 2.4. Give an example of two distinct sets S and T having the same
convex hull. It makes sense to look for a smallest possible subset S0 of a set S
such that S = conv(S0 ). We study this question later.
Solution: We can set S = {−1, 0, 1}, and T = {−1, 1}. Both have [−1, 1] as
convex hull.
Exercise 2.5. Prove that if S ⊆ T , then conv(S) ⊆ conv(T ).
Solution: This follows from the fact that a convex combination of points in S
then also is a convex combination of points in T .
Exercise 2.6. If S is convex, then conv(S) = S. Show this!
Solution: This is a compulsory exercise. We already know that S ⊆ conv(S).
The converse statement is also shown in the text, but let us repeat it. Assume
that we have shown that any convex combination of t − 1 points in S lies in S.
We can write

∑_{j=1}^t λj xj = (∑_{j=1}^{t−1} λj) ∑_{j=1}^{t−1} (λj / ∑_{k=1}^{t−1} λk) xj + λt xt

(assuming λt < 1; if λt = 1 the point is xt itself). By the induction hypothesis
∑_{j=1}^{t−1} (λj / ∑_{k=1}^{t−1} λk) xj ∈ S. But then also ∑_{j=1}^t λj xj ∈ S
by convexity of S.
Exercise 2.7. Let S = {x ∈ IR2 : kxk2 = 1}, this is the unit circle in IR2 .
Determine conv(S) and cone(S).
Solution: conv(S) must contain any point inside the unit circle. This can be seen
if you take any line through this point. This line will intersect the unit circle at
two points, so that the point is a convex combination of two points on the unit
circle. Therefore D ⊆ conv(S), where D = {x ∈ IR2 : kxk2 ≤ 1}. Since conv(S)
is the smallest convex set that contains S and since D is convex and contains S,
we obtain that conv(S) ⊆ D. It follows that conv(S) = D.
If x ∈ IR2, x ≠ 0, then x = kxk2 u with u = x/kxk2 ∈ S. It follows that
x ∈ cone(S), so that cone(S) = IR2 (the origin lies in cone(S) as well, since
O = 0 · u for any u ∈ S).

Exercise 2.8. Does affine independence imply linear independence? Does linear
independence imply affine independence? Prove or disprove!
Solution: Linear independence (of the columns of A) is the same as

Ax = 0 ⇒ x = 0.

Affine independence (of the columns of A) is the same as

[A; 1···1] x = 0 ⇒ x = 0,

where [A; 1···1] denotes A with a row of ones appended, i.e., it is the same as
linear independence of the columns of A after a last component equal to one is
added to each. Clearly then linear independence implies affine independence
(since equality in the first n components already implies that the coefficients
must be zero). Affine independence does not imply linear independence, however:
if A has n rows, [A; 1···1] can have rank n + 1, but A can have rank at most n. In
particular, there can be n + 1 linearly independent column vectors in [A; 1···1],
but only n in A.
Exercise 2.9. Let x1 , . . . , xt ∈ IRn be affinely independent and let w ∈ IRn . Show
that x1 + w, . . . , xt + w are also affinely independent.
Solution: The relation

∑_i λi (xi + w) = 0 and ∑_i λi = 0

is equivalent to

∑_i λi xi = 0 and ∑_i λi = 0,

since ∑_i λi w = 0 when ∑_i λi = 0. The result follows.
Exercise 2.10. Let L be a linear subspace of dimension (in the usual linear
algebra sense) t. Check that this coincides with our new definition of dimension
above. (Hint: add O to a “suitable” set of vectors).
Solution: Let x1 , . . . , xt be a basis for L. {0, x1 , . . . , xt } are affinely independent
since x1 − 0, . . . , xt − 0 are linearly independent. Therefore the affine dimension of
L is ≥ t. If the affine dimension of L was larger than t, we could find at least t + 2
affinely independent points x1 , . . . , xt+2 in L, so that x2 − x1 , . . . , xt+2 − x1 are
linearly independent. There are t + 1 vectors here, all in L, so that the dimension
of L is ≥ t + 1. This is a contradiction. It follows that the affine dimension equals
the dimension.
Exercise 2.11. Prove the last statements in the previous paragraph.

Solution: With A being the set of all ∑_{j=1}^t λj xj with ∑_{j=1}^t λj = 1, we have that

(1 − λ) ∑_j λj xj + λ ∑_j µj xj = ∑_j ((1 − λ)λj + λµj) xj ∈ A,

since the coefficients sum to one (by expansion we can clearly assume the same
base set xj for the two combinations). It follows that A is affine. Choosing one
λj equal to one and the others zero, we see that x1, . . . , xd+1 are in A. Then A
must in particular contain all the convex combinations of these points. Since
conv(C) = C, A contains C. Any affine set that contains C must also contain all
these affine combinations, so A is the smallest affine set containing C.
Exercise 2.12. Construct a set which is neither open nor closed.
Solution: An example is the half-open interval A = [0, 1).
Exercise 2.13. Show that xk → x if and only if xkj → xj for j = 1, . . . , n. Thus,
convergence of a point sequence simply means that all the component sequences
are convergent.
Solution: If xk → x we can find an N so that kxk − xk2 ≤ ε for k ≥ N. But
since |xkj − xj| ≤ kxk − xk2, it follows that also |xkj − xj| ≤ ε for k ≥ N, so that
xkj → xj for j = 1, . . . , n.
The other way, if xkj → xj for j = 1, . . . , n, we can find an N so that
|xkj − xj| ≤ ε/√n for all k ≥ N and j = 1, . . . , n. But then, for k ≥ N,

kxk − xk2 = √(∑_{j=1}^n |xkj − xj|²) ≤ √(∑_{j=1}^n ε²/n) = ε,

so that xk → x. The result follows.


Exercise 2.14. Show that every simplex cone is closed.
Solution: Let X be the matrix with columns xj (which define the simplex cone).
A point x is in the simplex cone if and only if it can be written on the form
∑_{j=1}^k λj xj for some λj ≥ 0, i.e., x = Xλ (λ is the column vector with
components λj). Since the xj are linearly independent, X has linearly independent
columns. Assume that XT Xv = 0. This implies that Xv is in the null space of XT,
which is orthogonal to the range of X. This implies that Xv = 0, so that v = 0 by
linear independence of the columns of X. It follows that XT X is invertible, so
that we can define X† := (XT X)−1 XT.
Multiplying x = Xλ with X† on the left we obtain λ = X†x. The simplex cone can
now be written as the set of x with X†x ≥ O and x − XX†x = 0 (the latter says
that x lies in the column space of X), i.e., as the intersection of the inverse
images of the closed sets {λ : λj ≥ 0 for all j} and {0} under the continuous
maps x ↦ X†x and x ↦ x − XX†x. It follows that the simplex cone is closed.
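A quick numerical illustration of this membership test via X† (the generators below are chosen arbitrarily for the example):

import numpy as np

X = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])            # linearly independent generators as columns
X_dag = np.linalg.inv(X.T @ X) @ X.T  # X† = (X^T X)^{-1} X^T

x = X @ np.array([2.0, 3.0])          # a point of the simplex cone
lam = X_dag @ x                       # recovers the coefficients
print(lam, bool(np.all(lam >= 0) and np.allclose(X @ lam, x)))  # [2. 3.] True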

Exercise 2.15. Prove that x ∈ bd(S) if and only if each ball with center x
intersects both S and the complement of S.
Solution: x ∈ bd(S) is the same as x ∈ cl(S) and x ∉ int(S). Here x ∉ int(S) is
the same as: every ball with center x intersects the complement of S. And
x ∈ cl(S) is the same as: every ball with center x intersects S. The result follows.
Exercise 2.16. Consider again the set C = {(x1, x2, 0) ∈ IR3 : x1² + x2² ≤ 1}.
Verify that
(i) C is closed,
(ii) dim(C) = 2,
(iii) int(C) = ∅,
(iv) bd(C) = C,
(v) rint(C) = {(x1, x2, 0) ∈ IR3 : x1² + x2² < 1} and
(vi) rbd(C) = {(x1, x2, 0) ∈ IR3 : x1² + x2² = 1}.
Solution:
(i): Assume that xk → x, and that xk = (xk1, xk2, 0) ∈ C. Since xk1² + xk2² ≤ 1
for all k, the limit satisfies x1² + x2² ≤ 1, and since the third component of
every xk is 0, the same holds for the limit. It follows that x ∈ C, so that C is
closed.
(ii): It is possible to find only 3 affinely independent points in C, since the
third component is always zero. The dimension is thus ≤ 2. To see that the
dimension is exactly 2, choose for instance the three points (0, 0, 0), (1, 0, 0),
and (0, 1, 0).
(iii): Any ball around a point in C will contain points with nonzero third compo-
nent, so that no ball can be entirely contained in C. It follows that int(C) = ∅.
(iv): bd(C) = cl(C) \ int(C) = C \ ∅ = C.
(v): Clearly aff(C) is the entire x1x2-plane. If x1² + x2² < 1, then in IR2 we can
find an open ball B°((x1, x2), r) contained in B°((0, 0), 1). We then obtain

B°(x, r) ∩ aff(C) = {(z1, z2, 0) ∈ IR3 : (z1 − x1)² + (z2 − x2)² < r²} ⊆ C.

This shows that {(x1, x2, 0) ∈ IR3 : x1² + x2² < 1} ⊆ rint(C).
If x1² + x2² = 1, any ball around such a point contains points (z1, z2, 0) with
z1² + z2² > 1, so that for such points we can't have B°(x, r) ∩ aff(C) ⊆ C.
(vi):

rbd(C) = cl(C) \ rint(C) = C \ {(x1, x2, 0) ∈ IR3 : x1² + x2² < 1}
= {(x1, x2, 0) ∈ IR3 : x1² + x2² = 1}.

Exercise 2.17. Show that every polytope in IRn is bounded. (Hint: use the prop-
erties of the norm: kx + yk ≤ kxk + kyk and kλxk = λkxk when λ ≥ 0).
Solution: Let C = conv(x1, . . . , xt), and let x = ∑_{j=1}^t λj xj with the λj
nonnegative and summing to 1. Using the triangle inequality we obtain

kxk = k∑_{j=1}^t λj xjk ≤ ∑_{j=1}^t λj kxjk ≤ (max_{1≤j≤t} kxjk) ∑_{j=1}^t λj = max_{1≤j≤t} kxjk.

This proves that the polytope is bounded.


Exercise 2.18. Consider the standard simplex St . Show that it is compact, i.e.,
closed and bounded.
Solution: Since a simplex is in particular a polytope, it is bounded because of the
previous exercise. It is closed since it is the intersection of the inverse image of
the closed set {1} under the continuous mapping f(x) = x1 + . . . + xn with the
closed first “quadrant” IRn+.
Exercise 2.19. Give an example of a convex cone which is not closed.
Solution:
S = {(0, 0)} ∪ {x ∈ IR2 : x1 , x2 > 0}.
Exercise 2.20. Let S ⊆ IRn and let W be the set of all convex combinations of
points in S. Prove that W is convex.
Solution: See Exercise 2.3.
Exercise 2.21. Prove the second statement of Proposition 2.1.1.
Solution: Assume that C is a convex cone. By definition and induction it must
contain all nonnegative combinations of its points.
Assume that C contains all nonnegative combinations of its points. That it is a
convex cone follows by restricting this to any two points.
Exercise 2.22. Give a geometrical interpretation of the induction step in the
proof of Proposition 2.1.1.
Exercise 2.23. Let S = {(0, 0), (1, 0), (0, 1)}. Show that conv(S) = {(x1 , x2 ) ∈
IR2 : x1 ≥ 0, x2 ≥ 0, x1 + x2 ≤ 1}.
Solution: With a1 = (0, 0), a2 = (1, 0), a3 = (0, 1), a convex combination
λ1 a1 + λ2 a2 + λ3 a3 can be written as (λ2, λ3), and the only requirement on
λ2, λ3 is that they are nonnegative and sum to something less than or equal to 1
(since λ1 = 1 − λ2 − λ3 ≥ 0). This shows that the stated set coincides with the
set of convex combinations.
Exercise 2.24. Let S consist of the points (0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1),
(1, 1, 0), (1, 0, 1), (0, 1, 1) and (1, 1, 1). Show that conv(S) = {(x1 , x2 , x3 ) ∈ IR3 :
0 ≤ xi ≤ 1 for i = 1, 2, 3}. Also determine conv(S \ {(1, 1, 1)}) as the solution set
of a system of linear inequalities. Illustrate all these cases geometrically.

Solution: Any convex combination of values between 0 and 1 lies between 0 and
1. Since the coordinates of all the points lie between 0 and 1, conv(S) must consist
of points with coordinates between 0 and 1 as well, i.e., conv(S) ⊆ {(x1 , x2 , x3 ) ∈
IR3 : 0 ≤ xi ≤ 1 for i = 1, 2, 3}. The other way, since the points constitute all
vertices of the unit cube, conv(S) will contain all the edges of the unit cube (by
taking convex combinations of adjacent vertices). By taking convex combinations
of points on edges lying on different sides of a face, we see that all faces of the
cube are also contained in conv(S). By taking convex combinations of points on
different faces we obtain the entire cube, so that {(x1, x2, x3) ∈ IR3 : 0 ≤ xi ≤ 1
for i = 1, 2, 3} ⊆ conv(S). Equality now follows.
Now, let us exclude the point (1, 1, 1). In all remaining seven points the
coordinates sum to at most 2, so this applies to conv(S \ {(1, 1, 1)}) as well,
which is thus contained in the half-space x1 + x2 + x3 ≤ 2. Consider the
polyhedron described by 0 ≤ x1, x2, x3 ≤ 1 and x1 + x2 + x3 ≤ 2. It is easily
verified that the vertices of this set are exactly the points of S \ {(1, 1, 1)}.
Since the polyhedron is bounded, and since the vertices equal the extreme points
(see Chapter 4), it follows that conv(S \ {(1, 1, 1)}) equals this polyhedron.
Exercise 2.25. Let A, B ⊆ IRn. Prove that conv(A + B) = conv(A) + conv(B).
Hint: it is useful to consider sums ∑_{j,k} λj µk (aj + bk) where aj ∈ A, bk ∈ B,
λj ≥ 0, µk ≥ 0, ∑_j λj = 1 and ∑_k µk = 1.
Solution: We have that

∑_{j,k} λj µk (aj + bk) = (∑_k µk)(∑_j λj aj) + (∑_j λj)(∑_k µk bk) = ∑_j λj aj + ∑_k µk bk.

The right hand side is a general element in conv(A) + conv(B). Since the left
hand side is a convex combination of elements in A + B, it follows that conv(A) +
conv(B) ⊆ conv(A + B). The other way, conv(A) + conv(B) is clearly convex
(the sum of two convex sets is always convex), so that it equals its convex hull.
Therefore
conv(A + B) ⊆ conv(conv(A) + conv(B)) = conv(A) + conv(B),
and the result follows.
Exercise 2.26. When S ⊂ IRn is a finite set, say S = {x1 , . . . , xt }, then we have
conv(S) = {∑_{j=1}^t λj xj : λj ≥ 0 for each j, ∑_j λj = 1}.

Thus, every point in conv(S) is a convex combination of the points x1 , . . . , xt .


What happens if, instead, S has an infinite number of elements? Then it may not
be possible to give a fixed, finite subset S0 of S such that every point in conv(S) is
a convex combination of elements in S0 . Give an example which illustrates this.

Solution: Let S be the integers. Clearly conv(S) = IR, but, for any finite subset
S0 , conv(S0 ) = [min(S0 ), max(S0 )]. This means that we can’t use any finite subset
to describe conv(S).
Exercise 2.27. Let x0 ∈ IRn and let C ⊆ IRn be a convex set. Show that

conv(C ∪ {x0 }) = {(1 − λ)x0 + λx : x ∈ C, λ ∈ [0, 1]}.

Solution: By definition it is clear that

{(1 − λ)x0 + λx : x ∈ C, λ ∈ [0, 1]} ⊆ conv(C ∪ {x0 }).

The other way, a general element in conv(C ∪ {x0}) can be written as

(1 − λ)x0 + ∑_{i=1}^n λi xi,

where xi ∈ C, λi ≥ 0, and ∑_{i=1}^n λi = λ ∈ [0, 1]. If λ = 0 this element is x0
itself. If λ > 0 we rewrite it as

(1 − λ)x0 + λ ∑_{i=1}^n (λi/λ) xi = (1 − λ)x0 + λx,

where x = ∑_{i=1}^n (λi/λ) xi ∈ C since C is convex. It follows that

conv(C ∪ {x0}) ⊆ {(1 − λ)x0 + λx : x ∈ C, λ ∈ [0, 1]}.

The result follows.


Exercise 2.28. Prove Proposition 2.2.2.
Solution: Let W be the intersection of all convex cones containing S. If T is
a convex cone containing S, then clearly T also contains cone(S) by definition.
It follows that cone(S) ⊆ W. The other way, cone(S) is also a convex cone
containing S, so that W ⊆ cone(S). Thus W = cone(S), and the result follows.
Exercise 2.29. Confer Exercise 2.9. Give an example showing that a similar
property for linear independence does not hold. Hint: consider the vectors (1, 0)
and (0, 1) and choose some w.
Solution: Set x1 = (1, 0), x2 = (0, 1), and w = −x1 . Then x1 and x2 are linearly
independent, but x1 + w and x2 + w are not, since x1 + w = 0.
Exercise 2.30. If x = ∑_{j=1}^t λj xj and ∑_{j=1}^t λj = 1 we say that x is an affine
combination of x1, . . . , xt. Show that x1, . . . , xt are affinely independent if and only
if none of these vectors may be written as an affine combination of the remaining
ones.

Solution: Affine independence of x1, . . . , xt is the same as linear independence of
the columns of [X; 1···1], where X has columns x1, . . . , xt. That xi can be written
as an affine combination of the remaining ones means that column i of [X; 1···1]
can be written as a linear combination of the other columns. The result now
follows from the fact that a set of vectors is linearly independent if and only if
none of the vectors can be written as a linear combination of the others.
Exercise 2.31. Prove that x1 , . . . , xt ∈ IRn are affinely independent if and only
if the vectors (x1 , 1), . . . , (xt , 1) ∈ IRn+1 are linearly independent.
Solution: See Exercise 2.8.
Exercise 2.32. Prove Proposition 2.3.2.
Solution: Assume that x = ∑_{j=1}^t λj xj where ∑_j λj = 1. Then

[X; 1···1] λ = [x; 1],

where X is the matrix with columns x1, . . . , xt and [X; 1···1] again denotes X with
a row of ones appended. The convex hull consists of the vectors x for which this
system has a nonnegative solution λ. Affine independence means that the columns
of the left hand side matrix are linearly independent, which implies that the
solution λ is unique when it exists. This in turn implies that x has a unique
representation.
Exercise 2.33. Prove that cl(A1 ∪ . . . ∪ At ) = cl(A1 ) ∪ . . . ∪ cl(At ) holds whenever
A1 , . . . , At ⊆ IRn .
Solution: cl(A1 ) ∪ . . . ∪ cl(At ) is a closed set containing A1 ∪ . . . ∪ At . Since
cl(A1 ∪ . . . ∪ At ) is the smallest such set, it follows that
cl(A1 ∪ . . . ∪ At ) ⊆ cl(A1 ) ∪ . . . ∪ cl(At ).
The other way, since Ai ⊆ A1 ∪ . . . ∪ At , cl(Ai ) ⊆ cl(A1 ∪ . . . ∪ At ) (since a closed
set S containing A1 ∪ · · · ∪ At also must contain Ai ). But then also
cl(A1 ) ∪ . . . ∪ cl(At ) ⊆ cl(A1 ∪ . . . ∪ At ),
and the result follows.
Exercise 2.34. Prove that every bounded point sequence in IRn has a convergent
subsequence.
Solution: For n > 1 the result follows from the case n = 1 applied coordinate-wise:
first pass to a subsequence whose first coordinates converge, then to a further
subsequence whose second coordinates converge, and so on (any subsequence of a
convergent sequence is convergent with the same limit). It is thus enough to prove
the result for n = 1. Due to boundedness there exists an M so that |xi| ≤ M for
all i. Partition the interval [−M, M] into two parts of equal length. One of these
parts must contain an infinite number of the xi. Choose the first xi1 taking value
in this part, split the interval in two again, pick an xi2 with i2 > i1 from a half
that still contains infinitely many terms, and so on. The new sequence y defined
by yk = xik is clearly a Cauchy sequence (all terms from step k on lie in a common
interval of length 2M/2^k), so that it is convergent.
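The bisection argument can be mimicked numerically on a finite sample standing in for the sequence (so “contains infinitely many terms” becomes “contains at least as many terms”); a rough sketch with invented names:

import numpy as np

def bisection_subsequence(xs, depth=10):
    # Repeatedly halve an interval, keep the fuller half, and record
    # one new index per level, mirroring the proof above.
    lo, hi = float(np.min(xs)), float(np.max(xs))
    last, picked = -1, []
    for _ in range(depth):
        mid = (lo + hi) / 2
        left = [i for i in range(len(xs)) if i > last and lo <= xs[i] <= mid]
        right = [i for i in range(len(xs)) if i > last and mid < xs[i] <= hi]
        chosen = left if len(left) >= len(right) else right
        if not chosen:
            break
        lo, hi = (lo, mid) if chosen is left else (mid, hi)
        picked.append(chosen[0])
        last = chosen[0]
    return picked

idx = bisection_subsequence(np.sin(np.arange(200)))
print(idx)  # the values at these indices cluster as the intervals shrink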

Exercise 2.35. Find an infinite set of closed intervals whose union is the open
interval (0, 1). This proves that the union of an infinite set of closed intervals
may not be a closed set.
Solution: We have that (0, 1) = ∪_{n=3}^∞ [1/n, 1 − 1/n].
Exercise 2.36. Let S be a bounded set in IRn . Prove that cl(S) is compact.
Solution: Since cl(S) is closed, we only have to prove that cl(S) is bounded.
Since S is bounded, we can find a (closed) ball B(0, r) so that S ⊆ B(0, r). Since
B(0, r) is a closed set containing S, it follows that cl(S) ⊆ B(0, r). Since B(0, r)
is bounded, so is cl(S).
Exercise 2.37. Let S ⊆ IRn . Show that either int(S) = rint(S) or int(S) = ∅.
Solution: We know that we can write aff(S) = L + x0, where L is a vector space,
and where x0 can be chosen to be any x0 ∈ aff(S) (in particular we can choose
any x0 ∈ S). If L has dimension n, then aff(S) = IRn. In this case clearly the
relative interior equals the interior (their definitions coincide).
Assume now that L has dimension m < n. Let x0, . . . , xm be a maximal collection
of affinely independent points in S (i.e., the xi − x0 are linearly independent).
Find a vector x so that x − x0 cannot be written as a linear combination of the
{xi − x0}_{i=1}^m. We claim that, for any choice of λi and any ε ≠ 0, the point
x0 + ∑_{i=1}^m λi(xi − x0) + ε(x − x0) cannot be in aff(S). Otherwise we would
have that

x0 + ∑_{i=1}^m λi (xi − x0) + ε(x − x0) = x0 + ∑_{i=1}^m µi (xi − x0)

for some µi (since x0 + ∑_{i=1}^m µi (xi − x0) describes a general element in
aff(S)). From this one could write x − x0 as a linear combination of the xi − x0,
which is a contradiction. Since ε was arbitrary, no ball around a point of aff(S)
is contained in aff(S), so int(aff(S)) = ∅. But then also int(S) = ∅.
Exercise 2.38. Prove Theorem 2.4.3 Hint: To prove that rint(C) is convex, use
Theorem 2.4.2. Concerning int(C), use Exercise 2.37. Finally, to show that cl(C)
is convex, let x, y ∈ cl(C) and consider two point sequences that converge to x
and y, respectively. Then look at a convex combination of x and y and construct
a suitable sequence!
Solution: This is a compulsory exercise. Let C be convex.
Let x, y ∈ rint(C) and 0 < λ < 1. Since x ∈ rint(C) and y ∈ rint(C) ⊆ C ⊆ cl(C),
it follows from Theorem 2.4.2 that (1 − λ)x + λy ∈ rint(C). It follows that
rint(C) is convex.
Exercise 2.37 says that int(C) is either ∅ or equal to rint(C). In either case
int(C) is convex.
Let x, y ∈ cl(C), and let {xk }, {yk } be sequences from C which converge to x and
y, respectively. Then (1 − λ)xk + λyk is a sequence from C (due to convexity),

and converges to (1 − λ)x + λy, so that (1 − λ)x + λy ∈ cl(C). It follows that


cl(C) is convex.
Exercise 2.39. Prove Theorem 2.5.2.
Solution: Since x ∈ cone(S) there are nonnegative numbers λ1, . . . , λt and vectors
x1, . . . , xt ∈ S such that x = ∑_j λj xj. In fact, we may assume that each λj is
positive, otherwise we could omit some xj from the representation. If x1, . . . , xt are
linearly independent, we are done, so assume that they are not. Then there are
numbers µ1, . . . , µt, not all zero, such that ∑_{j=1}^t µj xj = O. Since the µj s are
not all zero, pick a nonzero one, say µ1 ≠ 0. We now multiply the equation
∑_j µj xj = O by a number α and subtract the resulting equation from the
equation x = ∑_j λj xj. This gives

x = ∑_j (λj − αµj) xj.

Note that, for small α, this is still a nonnegative linear combination, and when
α = 0 it is just the original representation of x. But now we gradually increase or
decrease α from zero until one of the coefficients λj − αµj becomes zero, say this
happens for α = α0 . Recall here that each λj is positive and that µ1 6= 0. Then
each coefficient λj − α0 µj is nonnegative and at least one of them is zero. But this
means that we have found a new representation of x as a nonnegative combination
of t − 1 vectors from S. Clearly, this reduction process may be continued until
we have x written as a nonnegative combination of, say, m linearly independent
points in S. Finally, there are at most n linearly independent points in IRn , so
m ≤ n.

Alternative proof for Theorem 2.4.2


To me it is not completely clear that y ∈ aff(C) (third last line in the proof),
so I have put together a more constructive proof, which I think is clearer. The
following prerequisite for the new proof is useful (see page 8 in Rockafellar’s book
on convex analysis): Let {b0 , . . . , bm } and {c0 , . . . , cm } be two sets of points in IRn ,
both affinely independent. Let A be an invertible linear transformation from IRn
to itself so that A(bi − b0 ) = ci − c0 for i = 1, . . . , m (such a linear transformation
exists due to linear independence of the bi − b0 , ci − c0 ). The affine transformation
T defined by T x = Ax + c0 − Ab0 satisfies T bi = ci for i = 0, . . . , m. For i = 0
this is immediate. For i ≥ 1 we obtain

T bi = Abi + c0 − Ab0 = A(bi − b0 ) + Ab0 + c0 − Ab0 = ci − c0 + Ab0 + c0 − Ab0 = ci .



Clearly this affine transformation is unique if m = n (its extension to the affine
hull of the vectors is also unique). Since T is invertible it preserves open and
closed sets, and also affine hulls (since the line through x and y is sent to the
line through T x and T y). It follows that T also preserves relative interiors. Now,
let {c0, c1, . . . , cm} = {0, e1, . . . , em}. T then embeds a convex set of dimension m
into IRm, where IRm is obtained from IRn by setting the last n − m coordinates
to 0. Also, aff(S) can clearly be identified with IRm as a subset of IRn in the
same way.
Now for the proof itself, which also can be found on page 45 in Rockafellar’s book:
Since we can assume that the last n−m coordinates are zero, we can assume that
we are in IRm , that the relative interior equals the interior, and that the affine
hull equals the entire IRm .
Let w = (1 − λ)x1 + λx2 with x1 ∈ rint(C) = int(C), x2 ∈ cl(C), and 0 ≤ λ < 1.
Let B be the unit ball of IRm. We need to show that (1 − λ)x1 + λx2 + εB ⊆ C
for some ε > 0. Since x2 ∈ C + εB for every ε > 0 (since x2 ∈ cl(C)), we obtain

(1 − λ)x1 + λx2 + εB ⊆ (1 − λ)x1 + λ(C + εB) + εB
= (1 − λ)x1 + λC + λεB + εB
= (1 − λ)(x1 + ε((1 + λ)/(1 − λ))B) + λC,

where the equality between the first and second line results from Exercise 1.14(i),
and the equality between the second and third line results from convexity of B
combined with (iii) in the same exercise. For ε sufficiently small we have that
x1 + ε((1 + λ)/(1 − λ))B ⊆ C, so that the above is contained in (1 − λ)C + λC ⊆ C,
due to convexity. It follows that (1 − λ)x1 + λx2 + εB ⊆ C, so that
(1 − λ)x1 + λx2 ∈ int(C) = rint(C). The proof follows.
Note that this proof simplifies in the sense that the case x2 ∈ C needs not be
handled separately first.
Chapter 3

Projection and separation

Exercise 3.1. Give an example where the nearest point is unique, and one where
it is not. Find a point x and a set S such that every point of S is a nearest point
to x!
Solution: Let S be the unit disk in IR2 , and let x = (2, 0). Then clearly s = (1, 0)
is the unique nearest point.
Let S be the unit circle in IR2 , and let x = (0, 0). Then there is no unique nearest
point, since every point in S has the same distance to x, so that every point in
S is a nearest point.
Exercise 3.2. Let a ∈ IRn \ {O} and x0 ∈ IRn . Then there is a unique hyperplane
H that contains x0 and has normal vector a. Verify this and find the value of the
constant α (see above).
Solution: The hyperplane is aT x = aT x0 . This hyperplane is unique since any
other value of α will exclude x0 from the set.
Exercise 3.3. Give an example of two disjoint sets S and T that cannot be
separated by a hyperplane.
Solution: Let S be the circle in IR2 consisting of points with norm 1, and T the
circle consisting of points with norm 2. These sets are disjoint, but any line with
S entirely on one side either intersects T or has T entirely on the same side as S,
so no hyperplane separates them.
Exercise 3.4. In view of the previous remark, what about the separation of S
and a point p ∉ aff(S)? Is there an easy way to find a separating hyperplane?
Solution: Yes. Let q be the nearest point to p in the closed set aff(S), and set
a = p − q ≠ O. Then a is orthogonal to aff(S), so that aT x = aT q for all
x ∈ aff(S), while aT p = aT q + kak². The hyperplane aT x = aT q + kak²/2
therefore separates S and p (even strongly).
Exercise 3.5. Let C ⊆ IRn be convex. Recall that if a point x0 ∈ C satisfies ( 3.2)
for any y ∈ C, then x0 is the (unique) nearest point to x in C. Now, let C be the
unit ball in IRn and let x ∈ IRn satisfy kxk > 1. Find the nearest point to x in C.
What if kxk ≤ 1?
Solution: If kxk > 1 then clearly x0 = x/kxk is the nearest point. If kxk ≤ 1 then
x itself is the nearest point.


Exercise 3.6. Let L be a line in IRn . Find the nearest point in L to a point x ∈
IRn . Use your result to find the nearest point on the line L = {(x, y) : x + 3y = 5}
to the point (1, 2).
Solution: This is a compulsory exercise. Let the line be written on the form
x0 + ta (a being the direction vector of the line). The closest point to x in the
subspace spanned by a is (hx, ai/ha, ai)a. It follows that the closest point on the
line to x is

(hx − x0, ai/ha, ai) a + x0.

The line L can be parametrized as (5 − 3y, y) = (5, 0) + y(−3, 1). We get that
x − x0 = (1, 2) − (5, 0) = (−4, 2), so that hx − x0, ai = 12 + 2 = 14 and
ha, ai = 10. We then get

(hx − x0, ai/ha, ai) a + x0 = (14/10)(−3, 1) + (5, 0) = (−21/5, 7/5) + (5, 0) = (4/5, 7/5).
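A small numpy check of this formula and the example (the helper name is not from the text):

import numpy as np

def nearest_on_line(x, x0, a):
    # Nearest point to x on the line {x0 + t a : t in IR}.
    t = (x - x0) @ a / (a @ a)
    return x0 + t * a

print(nearest_on_line(np.array([1.0, 2.0]),
                      np.array([5.0, 0.0]),
                      np.array([-3.0, 1.0])))  # [0.8 1.4], i.e. (4/5, 7/5)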
Exercise 3.7. Let H be a hyperplane in IRn . Find the nearest point in H to a
point x ∈ IRn . In particular, find the nearest point to each of the points (0, 0, 0)
and (1, 2, 2) in the hyperplane H = {(x1 , x2 , x3 ) : x1 + x2 + x3 = 1}.
Solution: A general hyperplane in IRn can be written on the form x0 + L, where
L is an (n − 1)-dimensional subspace. Let a span the orthogonal complement of L.
We subtract x0 and remove the component along a to obtain the nearest point

x0 + (x − x0) − (hx − x0, ai/ha, ai) a = x − (hx − x0, ai/ha, ai) a

(we here projected onto the orthogonal complement instead of onto L itself).
For the hyperplane in question we have a = (1, 1, 1), so that ha, ai = 3, and we
can use x0 = (1, 0, 0). For the point x = (0, 0, 0) we obtain

x − (hx − x0, ai/ha, ai) a = (0, 0, 0) − (−1/3)(1, 1, 1) = (1/3, 1/3, 1/3),

while for the point x = (1, 2, 2) we obtain

x − (hx − x0, ai/ha, ai) a = (1, 2, 2) − (4/3)(1, 1, 1) = (−1/3, 2/3, 2/3).
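The same kind of check for the hyperplane formula, written in terms of a and α = aT x0 (helper name chosen here):

import numpy as np

def nearest_in_hyperplane(x, a, alpha):
    # Nearest point to x in the hyperplane {y : a^T y = alpha}.
    return x - ((a @ x - alpha) / (a @ a)) * a

a = np.array([1.0, 1.0, 1.0])
print(nearest_in_hyperplane(np.zeros(3), a, 1.0))                # [1/3 1/3 1/3]
print(nearest_in_hyperplane(np.array([1.0, 2.0, 2.0]), a, 1.0))  # [-1/3 2/3 2/3]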
Exercise 3.8. Let L be a linear subspace in IRn and let q1, . . . , qt be an orthonormal
basis for L. Thus, q1, . . . , qt span L, qiT qj = 0 when i ≠ j, and kqjk = 1 for each j.
Let Q be the n × t-matrix whose jth column is qj, for j = 1, . . . , t. Define the
associated matrix P = QQT. Show that P x is the nearest point in L to x. (The
matrix P is called an orthogonal projector, or projection matrix.) Thus, performing
the projection is simply to apply the linear transformation given by P.
Let L⊥ be the orthogonal complement of L. Explain why (I − P)x is the nearest
point in L⊥ to x.

Solution: The orthogonal decomposition theorem states that the closest point is

∑_{j=1}^t hx, qji qj = ∑_{j=1}^t (QT x)j qj = QQT x = P x,

by the definition of matrix multiplication. Similarly, if qt+1, . . . , qn is an
orthonormal basis for L⊥, and Q̃ is the matrix with these n − t columns, then the
nearest point in L⊥ is

∑_{j=t+1}^n hx, qji qj = Q̃Q̃T x =: P̃ x.

That P + P̃ = I (so that P̃ = I − P) now follows since, for x ∈ L, P x = x and
P̃ x = 0, while for x ∈ L⊥, P x = 0 and P̃ x = x; since IRn = L ⊕ L⊥, this
determines P + P̃ on all of IRn. This shows that I − P is the projector onto the
orthogonal complement.
Exercise 3.9. Let L ⊂ IR3 be the subspace spanned by the vectors (1, 0, 1) and
(0, 1, 0). Find the nearest point to (1, 2, 3) in L using the results of the previous
exercise.
Solution: The spanning vectors are orthogonal but not normalized: (1, 0, 1) has
norm √2. An orthonormal basis for L is therefore q1 = (1/√2)(1, 0, 1),
q2 = (0, 1, 0). With Q the matrix with these columns (cf. Exercise 3.8) we get

P = QQT =
  [1/2  0  1/2]
  [ 0   1   0 ]
  [1/2  0  1/2],

and the nearest point is P (1, 2, 3) = (2, 2, 2). (Check: (1, 2, 3) − (2, 2, 2) =
(−1, 0, 1) is orthogonal to both (1, 0, 1) and (0, 1, 0).)
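A numpy verification, orthonormalizing the spanning vectors via QR first (variable names chosen here):

import numpy as np

V = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 0.0]])            # columns (1, 0, 1) and (0, 1, 0) span L
Q, _ = np.linalg.qr(V)                # orthonormal basis for L
P = Q @ Q.T                           # orthogonal projector onto L
print(P @ np.array([1.0, 2.0, 3.0]))  # [2. 2. 2.]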
Exercise 3.10. Show that the nearest point in IRn+ to x ∈ IRn is the point x+
defined by x+j = max{xj, 0} for each j.
Solution: Let y ∈ IRn+. We have that

kx − yk = √((x1 − y1)² + · · · + (xn − yn)²).

Now, if xj is negative, the smallest value we can obtain for (xj − yj)² is by choosing
yj = 0 (we require that yj ≥ 0). If xj is nonnegative, the smallest value we can
obtain for (xj − yj)² is by choosing yj = xj. Setting yj = max(0, xj) = x+j captures
both these possibilities.
Exercise 3.11. Find a set S ⊂ IRn and a point x ∈ IRn with the property that
every point of S is nearest to x in S!
Solution: See Exercise 3.1.
Exercise 3.12. Verify that every hyperplane in IRn has dimension n − 1.
Solution: Let the hyperplane be aT x = α, let x1 be a point in the hyperplane,
and let y2, . . . , yn be a basis for the orthogonal complement of a. Set xj = x1 + yj
for j = 2, . . . , n. Then x1, . . . , xn are affinely independent since the xj − x1 = yj
are linearly independent, so the dimension is at least n − 1. The dimension can't be
larger, since the orthogonal complement of a does not have dimension larger than
n − 1.

Exercise 3.13. Let C = [0, 1] × [0, 1] ⊂ IR2 and let a = (2, 2). Find all hyperplanes
that separate C and a.
Solution: Consider first non-vertical lines y = sx + t. For 0 ≤ s ≤ 1/2 the line
separates the two sets if t ≥ 1 and 2s + t ≤ 2 (C lies below the line, and a on or
above it). For s ≤ 0 we have separation if s + t ≥ 1 and 2s + t ≤ 2. For s ≥ 2 we
have separation if s + t ≤ 0 and 2s + t ≥ 2 (now C lies above the line and a below
it). For slopes strictly between 1/2 and 2 there are no separating lines. In
addition, every vertical line x = c with 1 ≤ c ≤ 2 separates C and a.
Exercise 3.14. Let C be the unit ball in IRn and let a 6∈ C. Find a hyperplane
that separates C and a.
Solution: Set x0 = (1/2)(a + a/kak). We can use the hyperplane aT x = aT x0 = α.
Exercise 3.15. Find an example in IR2 of two sets that have a unique separating
hyperplane.
Solution: The left and right half planes give an example.
Exercise 3.16. Let S, T ⊆ IRn . Explain the following fact: there exists a hyper-
plane that separates S and T if and only if there is a linear function l : IRn → IR
such that l(s) ≤ l(t) for all s ∈ S and t ∈ T . Is there a similar equivalence for
the notion of strong separation?
Solution: Separation by a hyperplane aT x = α (with a ≠ O) means that aT x ≤ α
on S and aT x ≥ α on T. Setting l(x) = aT x, which is a (nonzero) linear function,
this means that l(x) ≤ α on S and l(x) ≥ α on T; this gives one direction. The
other way, write the linear function as l(x) = aT x and choose α to be any number
in the interval [sup_{s∈S} l(s), inf_{t∈T} l(t)], which is nonempty since l(s) ≤ l(t)
for all s ∈ S, t ∈ T; the hyperplane aT x = α then separates S and T.
For strong separation the corresponding statement is: there is a linear function l
and an ε > 0 so that l(x) ≤ α − ε on S and l(x) ≥ α + ε on T.
Exercise 3.17. Let C be a nonempty closed convex set in IRn . Then the associated
projection operator pC is Lipschitz continuous with Lipschitz constant 1, i.e.,

kpC (x) − pC (y)k ≤ kx − yk for all x, y ∈ IRn .

(Such an operator is called nonexpansive). You are asked to prove this using
the following procedure. Define a = x − pC(x) and b = y − pC(y). Verify that
(a − b)T (pC(x) − pC(y)) ≥ 0. (Show first that aT (pC(y) − pC(x)) ≤ 0 and
bT (pC(x) − pC(y)) ≤ 0 using (3.2). Then consider kx − yk2 = k(a − b) + (pC(x) − pC(y))k2
and do some calculations.)
Solution: From (3.2) it follows that (x − pC(x))T (z − pC(x)) = aT (z − pC(x)) ≤ 0
for all z ∈ C. If we in particular set z = pC(y) we obtain aT (pC(y) − pC(x)) ≤ 0.
Similarly, (y − pC(y))T (z − pC(y)) = bT (z − pC(y)) ≤ 0 for all z ∈ C, and setting
z = pC(x) we obtain bT (pC(x) − pC(y)) ≤ 0.

We now obtain

kx − yk2 = k(a − b) + (pC(x) − pC(y))k2
= ka − bk2 + 2(a − b)T (pC(x) − pC(y)) + kpC(x) − pC(y)k2
≥ ka − bk2 + kpC(x) − pC(y)k2 ≥ kpC(x) − pC(y)k2,

since (a − b)T (pC(x) − pC(y)) = −aT (pC(y) − pC(x)) − bT (pC(x) − pC(y)) ≥ 0.
The result follows after taking square roots.
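Nonexpansiveness is easy to test numerically for a concrete closed convex set, e.g. the unit ball (a minimal sketch, using the projection from Exercise 3.5):

import numpy as np

def proj_ball(x):
    # Projection onto the closed unit ball.
    n = np.linalg.norm(x)
    return x if n <= 1 else x / n

rng = np.random.default_rng(0)
for _ in range(5):
    x, y = rng.normal(size=3), rng.normal(size=3)
    assert np.linalg.norm(proj_ball(x) - proj_ball(y)) <= np.linalg.norm(x - y)
print("nonexpansive on all samples")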


Exercise 3.18. Consider the outer description of closed convex sets given in
Corollary 3.2.4. What is this description for each of the following sets:
(i) C1 = {x ∈ IRn : kxk ≤ 1},
(ii) C2 = conv({0, 1}n ),
(iii) C3 = conv({−1, 1}2 )
(iv) C4 = conv({−1, 1}n ), n > 2.
Solution: This is a compulsory exercise.
(i) The supporting half-spaces are aT x ≤ 1 for kak = 1, i.e., those defined by the
tangent hyperplanes at the points of the unit sphere.
(ii) This is the unit cube in IRn. Here one can use the half-spaces eTi x ≥ 0 and
eTi x ≤ 1 for i = 1, . . . , n.
(iii) The supporting half-planes are x ≤ 1, −x ≤ 1, y ≤ 1, −y ≤ 1.
(iv) The supporting half-spaces are eTi x ≤ 1 and −eTi x ≤ 1 for i = 1, . . . , n.
Chapter 4

Representation of convex sets

Exercise 4.1. Consider the polytope P ⊂ IR2 being the convex hull of the points
(0, 0), (1, 0) and (0, 1) (so P is a simplex in IR2 ).
(i) Find the unique face of P that contains the point (1/3, 1/2).
(ii) Find all the faces of P that contain the point (1/3, 2/3).
(iii) Determine all the faces of P .
Solution:
(i) The face is the entire polytope.
(ii) The edge from (1, 0) to (0, 1), together with the polytope itself.
(iii) The three vertices, the three edges, and the polytope itself.
Exercise 4.2. Explain why an equivalent definition of face is obtained using the
condition: if whenever x1 , x2 ∈ C and (1/2)(x1 + x2 ) ∈ F , then x1 , x2 ∈ F .
Solution: The old condition clearly implies that the new condition holds (simply
choose λ = 1/2). The other way, assume that the new condition holds, and let
x1, x2 ∈ C be so that x = (1 − λ)x1 + λx2 ∈ F for some 0 < λ < 1. If λ ≤ 1/2,
then x is the midpoint of x1 and (1 − 2λ)x1 + 2λx2, and since both these points
lie on the segment between x1 and x2 (and hence in C), it follows from the new
condition that x1 and (1 − 2λ)x1 + 2λx2 lie in F. This can be repeated until we
find a point of F lying on the second half of the segment between x1 and x2, from
which it follows in the same way that x2 ∈ F. If instead λ ≥ 1/2, the symmetric
argument applies. Combining these proves the result.
Exercise 4.3. Prove this proposition!
Solution: Let x1 , x2 ∈ C, and let (1 − λ)x1 + λx2 ∈ F2 . Since F2 ⊆ F1 and F1
is a face of C, it follows that x1 , x2 ∈ F1 . Since F2 is a face of F1 it follows that
x1 , x2 ∈ F2 as well. It follows that F2 is a face of C.
Exercise 4.4. Define C = {(x1 , x2 ) ∈ IR2 : x1 ≥ 0, x2 ≥ 0, x1 + x2 ≤ 1}. Why
does C not have any extreme halfline? Find all the extreme points of C.
Solution: C is bounded. Existence of an extreme halfline would imply C to be
unbounded. The extreme points of C are clearly (0, 0), (1, 0), and (0, 1).


Exercise 4.5. Consider a polytope P ⊂ IRn , say P = conv({x1 , . . . , xt }). Show


that if x is an extreme point of P , then x ∈ {x1 , . . . , xt }. Is every xj necessarily
an extreme point?
Solution: Write an extreme point as x = ∑_{j=1}^t λj xj ∈ P with λj ≥ 0 and
∑_{j=1}^t λj = 1. If some λj equals 1 then x = xj and we are done, so assume
(after reordering) that 0 < λ1 < 1. We can rewrite

x = λ1 x1 + (1 − λ1) ∑_{j=2}^t (λj/(1 − λ1)) xj.

Since x1 ∈ P and ∑_{j=2}^t (λj/(1 − λ1)) xj ∈ P, the extreme point property
implies that x = x1.
Not every xj needs to be an extreme point: simply add an xj which already is in
the convex hull of the previous ones to see this.
Exercise 4.6. Show that rec(C, x) is a closed convex cone. First, verify that
z ∈ rec(C, x) implies that µz ∈ rec(C, x) for each µ ≥ 0. Next, in order to verify
convexity you may show that
rec(C, x) = ∩_{λ>0} (1/λ)(C − x),

where (1/λ)(C − x) is the set of all vectors of the form (1/λ)(y − x) where y ∈ C.


Solution: If z ∈ rec(C, x) and µ > 0, then x + λ(µz) = x + (λµ)z ∈ C for all
λ ≥ 0, so that µz ∈ rec(C, x). This applies also when µ = 0, since trivially
O ∈ rec(C, x).
Further, z ∈ ∩_{λ>0} (1/λ)(C − x) means that for every λ > 0 we can write
z = (1/λ)(c − x) for some c ∈ C, i.e., x + λz = c ∈ C. This is clearly the same as
z ∈ rec(C, x). rec(C, x) can thus be written as an intersection of closed convex
sets, and is thus also closed and convex.
Another way to show convexity is as follows. Let z1, z2 ∈ rec(C, x), 0 ≤ λ ≤ 1,
and λ0 ≥ 0. We have that

x + λ0((1 − λ)z1 + λz2) = (1 − λ)(x + λ0 z1) + λ(x + λ0 z2).

z1, z2 ∈ rec(C, x) implies x + λ0 z1 ∈ C and x + λ0 z2 ∈ C. By convexity of C it
follows that the left hand side also is in C, so that (1 − λ)z1 + λz2 ∈ rec(C, x).
Convexity of rec(C, x) follows.
Exercise 4.7. Show that a closed convex set C is bounded if and only if rec(C) =
{O}.

Solution: Clearly C is unbounded if the recession cone is nontrivial. The other
way, when C is unbounded, we will construct a nonzero vector in rec(C). The
proof is long and involved, and builds on results stated below which are not
proved here (see the book of Rockafellar).
Assume that C ⊆ IRn. Consider the convex cone K ⊆ IRn+1 generated by the
vectors (1, x) with x ∈ C. Clearly (0, O) is the only point in K with first
component zero. We will attempt to extend K to a cone K′ = K ∪ K0, where
K′ ⊆ {(λ, x) : λ ≥ 0} and K0 ⊆ {(λ, x) : λ = 0}. K0 is thus required to be a cone
such that K0 + K ⊆ K. In particular we must have that, for any (0, x) ∈ K0 and
any z ∈ C,

(0, x) + (1, z) = ∑_i λi (1, xi)

for some xi ∈ C, λi ≥ 0. The λi must sum to one, so that the right hand side can
be written as (1, w) for some w ∈ C (by convexity). It follows that x + z = w,
i.e., adding any element from C to x gives another element in C. We prove the
following statement, which implies that x ∈ rec(C):

x ∈ rec(C) ⇐⇒ C + x ⊆ C.

⇒ is obvious. The other way, if C + x ⊆ C, then C + 2x ⊆ C, and in general
C + mx ⊆ C for any positive integer m. Letting λ ≥ 0 be any real number and
y ∈ C, the point y + λx lies on the segment between y and y + mx, where m is
some integer larger than λ. By convexity y + λx ∈ C (it lies on the line between
y and y + mx, which both are in C). It follows that x ∈ rec(C).
Thus, K0 ⊆ {0} × rec(C). Since {0} × rec(C) itself is a cone, it follows that
K0 = {0} × rec(C) is the largest cone we can choose for K0. This choice also
satisfies K0 + K ⊆ K, so that K0 ∪ K is a cone. To see this, let
Cλ = {x : (λ, x) ∈ K} for λ > 0. Clearly λC ⊆ Cλ. The other way, if x ∈ Cλ,
then (λ, x) ∈ K so that

(λ, x) = ∑_i λi (1, xi) = (∑_i λi, ∑_i λi xi)

with λi ≥ 0. It follows that

x = ∑_i λi xi = λ ∑_i (λi/λ) xi ∈ λC,

since ∑_i λi = λ and due to convexity of C. It follows that λC = Cλ. From this,
since

(0, x) + (λ, z) = (0, x) + (λ, λy) = (λ, λ((1/λ)x + y)) ∈ K

for x ∈ rec(C) and (λ, z) ∈ K with λ > 0 and z = λy, y ∈ C (using that
(1/λ)x ∈ rec(C), so that (1/λ)x + y ∈ C), K0 + K ⊆ K follows.


Below we will prove that cl(K) ⊆ K′ (actually one can prove equality here
as well with a bit more work). Let us see how we can use this to construct a
nonzero vector in rec(C) when C is unbounded, and thereby complete the proof.
If C is unbounded we can find a sequence xn from C so that kxn k → ∞. The
elements (1/kxn k, xn /kxn k) are in K. One can find a convergent subsequence so
that xnk /kxnk k → y, where kyk = 1. But then (1/kxnk k, xnk /kxnk k) → (0, y).
Since cl(K) ⊆ K′ it follows that y ∈ rec(C).
Proving that cl(K) ⊆ K′ requires two additional results, both taken from the
book of Rockafellar. Set Mλ to be the affine set {(λ, x) : x ∈ IRn}:
1. (Theorem 6.6). If K is convex and A is a linear transformation, then
rint(AK) = A rint(K). We will use this as follows: Consider the projection
P : (λ, x) ↦ λ (from IRn+1 to IR). P is linear, so that P(rint(K)) = rint(P(K)) =
rint(IR+) = (0, ∞). It follows that rint(K) ∩ Mλ ≠ ∅ for any λ > 0.
2. (Corollary 6.5.1). If K is convex and M is affine and contains a point in rint(K),
then

rint(M ∩ K) = M ∩ rint(K),
cl(M ∩ K) = M ∩ cl(K).

We will use this as follows, using M = Mλ (due to 1., rint(K) ∩ Mλ ≠ ∅):

Mλ ∩ rint(K) = rint(Mλ ∩ K) = rint({λ} × Cλ) = rint({λ} × λC) = λ(1, rint(C)).

This proves that

rint(K) = ∪_{λ>0} (Mλ ∩ rint(K)) = ∪_{λ>0} λ(1, rint(C)).

Since rint(K) ∩ M1 ≠ ∅ in particular, cl(M1 ∩ K) = M1 ∩ cl(K) (second part of
Corollary 6.5.1). Since M1 ∩ K = {(1, x) : x ∈ C}, which is closed (C is closed),
we have that M1 ∩ cl(K) = {(1, x) : x ∈ C}. The same argument applies to each
Mλ with λ > 0 (Cλ = λC is closed), so cl(K) and K are equal on λ > 0. By
maximality of K′ it follows that cl(K) ⊆ K′.
Exercise 4.8. Consider a hyperplane H. Determine its recession cone and lin-
eality space.
Solution: Let the hyperplane be H = {x : aT x = α}. rec(H, x) consists of all z
so that x + λz ∈ H for all λ ≥ 0, i.e., α = aT (x + λz) = α + λaT z for all λ ≥ 0.
This holds exactly when z is orthogonal to a, i.e., rec(H) = {a}⊥. Since the
recession cone here is a vector space, it equals the lineality space.
Exercise 4.9. What is rec(P ) and lin(P ) when P is a polytope?

Solution: Since polytopes are bounded, we have that rec(P ) = lin(P ) = {0}.
Exercise 4.10. Let C be a closed convex cone in IRn . Show that rec(C) = C.
Solution: If x ∈ C then (1 + λ)x = x + λx ∈ C for any λ ≥ 0, so that x ∈ rec(C).
The other way, if x ∈ rec(C) then 0 + 1 · x = x ∈ C since 0 ∈ C. The result
follows.
Exercise 4.11. Prove that lin(C) is a linear subspace of IRn .
Solution: If x ∈ lin(C) = rec(C) ∩ (−rec(C)), then −x ∈ −rec(C) and
−x ∈ rec(C), so that −x ∈ lin(C); lin(C) is thus closed under multiplication with
−1. Since rec(C) and −rec(C) are cones, lin(C) is also closed under multiplication
with nonnegative scalars, and hence with all scalars. Since both rec(C) and
−rec(C) are closed under addition, so is lin(C), and the result follows.
Exercise 4.12. Show that rec({x : Ax ≤ b}) = {x : Ax ≤ O}.
Solution: Let C = {x : Ax ≤ b}, and assume that z ∈ rec(C, x). Then Ax ≤ b
and A(x + λz) ≤ b for all λ ≥ 0. Since A(x + λz) = Ax + λAz, this is clearly only
possible if Az ≤ 0. This implies the ⊆-direction.
The other way, if Az ≤ 0, then, for any x ∈ C, λ ≥ 0, we have that A(x + λz) ≤
b + 0 = b, so that x + λz ∈ C. It follows that z ∈ rec(C). This proves the other
direction.
Exercise 4.13. Let C be a line-free closed convex set and let F be an extreme
halfline of C. Show that then there is an x ∈ C and a z ∈ rec(C) such that
F = x + cone({z}).
Solution: An extreme halfline can be written on the form {x + tz}t≥0 for some
x ∈ C, and some vector z. Since the extreme halfline is in C, it follows that
z ∈ rec(C, x) = rec(C).
Exercise 4.14. Decide if the following statement is true: if z ∈ rec(C) then
x + cone({z}) is an extreme halfline of C.
Solution: No. Let C = IRn . Then rec(C) = IRn , and C has no extreme half lines.
Exercise 4.15. Consider again the set C = {(x1 , x2 , 0) ∈ IR3 : x21 + x22 ≤ 1}
from Exercise 2.16. Convince yourself that C equals the convex hull of its relative
boundary. Note that we here have bd(C) = C so the fact that C is the convex
hull of its boundary is not very impressive!
Solution: Its relative boundary has been shown to be {(x1 , x2 , 0) : x21 + x22 = 1},
and the convex hull of these is clearly C.
Exercise 4.16. Let H be a hyperplane in IRn . Prove that H 6= conv(rbd(H)).
Solution: The relative boundary of a hyperplane is ∅, so that conv(rbd(H)) = ∅ ≠ H.
Exercise 4.17. Consider a polyhedral cone C = {x ∈ IRn : Ax ≤ O} (where, as
usual, A is a real m × n-matrix). Show that O is the unique vertex of C.

Solution: This is a compulsory exercise. If a subsystem of n equations from Ax = O
has a unique solution, then that solution must be the zero vector (since zero
is always a solution). Since the rank of A is n, some subsystem gives an
invertible matrix, so that O is the unique solution, and hence the unique vertex.
Exercise 4.18. Let F be a face of a convex set C in IRn . Show that every extreme
point of F is also an extreme point of C.
Solution: This follows from Exercise 4.3.
Exercise 4.19. Find all the faces of the unit ball in IR2 . What about the unit
ball in IRn ?
Solution: Every point on the boundary is a face (an extreme point), and these, together with ∅ and the ball itself, are all the faces. The same holds for the unit ball in IRn .
Exercise 4.20. Let F be a nontrivial face of a convex set C in IRn . Show that
F ⊆ bd(C) (recall that bd(C) is the boundary of C). Is the stronger statement
F ⊆ rbd(C) also true? Find an example where F = bd(C).
Solution: Assume that F contains an interior point x of C, and let y be any other
point in C. Since a small ball around x lies in C, the segment from y through x can
be extended a little beyond x to another point z ∈ C. Then x is a proper convex
combination of y and z, so the face property implies that y ∈ F . Hence F = C,
contradicting that F is nontrivial. This shows that any nontrivial face is contained
in the boundary. F = bd(C) is possible: take C to be a closed halfspace; then the
bounding hyperplane is a face equal to bd(C).
The stronger statement F ⊆ rbd(C) is also true. Assume that F contains a relative
interior point x of C. Then B°(x, r) ∩ aff(C) ⊆ C for some ball, and the same
argument as above, carried out within aff(C), gives F = C again.
Exercise 4.21. Consider the convex set C = B + ([0, 1] × {0}) where B is the
unit ball (of the Euclidean norm) in IR2 . Find a point on the boundary of C which
is a face of C, but not an exposed face.
Solution: Choose the point (0, 1), for instance. The only supporting hyperplane at this
point is the line y = 1, which gives the segment from (0, 1) to (1, 1) as an exposed face.
The point itself is a face (an extreme point) of C, but it is not an exposed face.
Exercise 4.22. Let P ⊂ IR2 be the polyhedron being the solution set of the linear
system
x − y ≤ 0;
−x + y ≤ 1;
2y ≥ 5;
8x − y ≤ 16;
x + y ≥ 4.

Find all the extreme points of P .



Solution: This is a compulsory exercise. We can write the system as Ax ≤ b, where
the rows of A are (1, −1), (−1, 1), (0, −2), (8, −1), (−1, −1), and b = (0, 1, −5, 16, −4).
These inequalities give (5 choose 2) = 10 possible 2 × 2 subsystems, of which 9 are
invertible (the first two rows are parallel).
For four of the systems we have that 2y = 5 (where there is equality in the third
equation) so that y = 5/2. For these we get
• The first equation gives equality: We get the point (5/2, 5/2). This violates
the fourth inequality, however.
• The second equation gives equality: We get the point (3/2, 5/2). This is
feasible.
• The fourth equation gives equality: We get the point (37/16, 5/2). This is
feasible.
• The fifth equation gives equality: We get the point (3/2, 5/2), which was
obtained above.
So, so far we have obtained two extreme points. We also get two systems where
the first inequality gives equality (i.e. x = y):
• The fourth equation gives equality: We get the point (16/7, 16/7). This
violates the third inequality, however.
• The fifth equation gives equality: We get the point (2, 2). This also violates
the third inequality.
We also get two systems where the second inequality gives equality (i.e. y = x+1):
• The fourth equation gives equality: We get the point (17/7, 24/7). This is
feasible.
• The fifth equation gives equality: We get the point (3/2, 5/2), which was
obtained above.
Finally, if there is equality in the last two inequalities, we obtain the point
(20/9, 16/9), but this violates the third inequality.
The extreme points are thus
• (3/2, 5/2) (equality in the second, third, fifth inequalities)
• (37/16, 5/2) (equality in the third, fourth inequalities)
• (17/7, 24/7) (equality in the second, fourth inequalities)
When sketching this region, one sees that the first and fifth constraints are redundant.
The extreme points are the intersections between the remaining constraints
(numbers 2, 3, and 4), which the enumeration below confirms.
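
The candidate enumeration above is easy to mechanize. The following is a small numeric sketch (in Python, assuming numpy is available; the data is just the system above) which solves every invertible 2 × 2 subsystem and keeps the solutions satisfying all five inequalities:

import itertools
import numpy as np

# Rows of A and entries of b as derived above (Ax <= b).
A = np.array([[1, -1], [-1, 1], [0, -2], [8, -1], [-1, -1]], dtype=float)
b = np.array([0, 1, -5, 16, -4], dtype=float)

vertices = []
for i, j in itertools.combinations(range(5), 2):
    sub_A, sub_b = A[[i, j]], b[[i, j]]
    if abs(np.linalg.det(sub_A)) < 1e-12:
        continue  # rows 1 and 2 are parallel, so that pair is singular
    x = np.linalg.solve(sub_A, sub_b)
    if np.all(A @ x <= b + 1e-9):  # keep only the feasible candidates
        vertices.append(x)

# Duplicates arise since three constraints meet at (3/2, 5/2).
print(np.unique(np.round(np.array(vertices), 9), axis=0))
# expected output: (1.5, 2.5), (2.3125, 2.5), (17/7, 24/7)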
Exercise 4.23. Find all the extreme halflines of the cone IRn+ .

Solution: Clearly any halfline along a coordinate axis, i.e., one where all components
except one are zero, is an extreme halfline. Conversely, assume that an extreme
halfline F contains a point with two nonzero components, say (x1 , x2 , 0, . . . , 0) with
x1 , x2 > 0. Writing this point as ½(2x1 , 2x2 , 0, . . . , 0) + ½ · 0 and as
½(2x1 , 0, 0, . . . , 0) + ½(0, 2x2 , 0, . . . , 0), the face property forces 0,
(2x1 , 0, 0, . . . , 0) and (0, 2x2 , 0, . . . , 0) all to lie in F . These three points are not
collinear, so F cannot be a halfline.
Exercise 4.24. Determine the recession cone of the set {(x1 , x2 ) ∈ IR2 : x1 >
0, x2 ≥ 1/x1 }. What are the extreme points?
Solution: Clearly the recession cone is IR2+ . All points in the first quadrant on
the graph y = 1/x are extreme points.
Exercise 4.25. Let B be the unit ball in IRn (in the Euclidean norm). Show that
every point in B can be written as a convex combination of two of the extreme
points of B.
Solution: The extreme points of B are the boundary points of B. Any interior point
of B lies on a chord of B, and is thus a convex combination of two boundary points;
a boundary point is itself extreme, and is a trivial convex combination of two equal
extreme points.
Exercise 4.26. Let C be a compact convex set in IRn and let f : C → IR be a
function satisfying
f (Σ_{j=1}^t λj xj ) ≤ Σ_{j=1}^t λj f (xj )

whenever x1 , . . . , xt ∈ C, λ1 , . . . , λt ≥ 0 and Σ_{j=1}^t λj = 1. Such a function is
called convex, see Chapter 5. Show that f attains its maximum over C in an
extreme point. Hint: Minkowski’s theorem.
Solution: According to Minkowski’s theorem, C is the convex hull of its extreme
points. Write x ∈ C as x = Σj λj xj , where the xj are extreme points. But then

f (x) ≤ Σj λj f (xj ) ≤ maxj f (xj ).

This shows that if the supremum of f over the extreme points is attained at some
extreme point, then f attains its maximum over C at that extreme point. The
maximum need not be attained, however: let C = {x ∈ IR2 : kxk ≤ 1}, and consider
the function f defined to be zero on the interior of C, and defined on the boundary
so that it is both ≥ 0 and unbounded there. Since every boundary point of C is an
extreme point, only trivial convex combinations produce boundary points, so such an
f satisfies the convexity inequality; but its maximum is not attained, due to the
unboundedness on the boundary.
Exercise 4.27. Prove Corollary 4.4.5 using the Main theorem for polyhedra.
Solution: Clearly a polytope is a bounded polyhedron. Assume now that P =
conv(V ) + cone(Z) is a bounded polyhedron, where V and Z are finite sets.
Boundedness implies that Z is empty, so that P = conv(V ). It follows that P is
a polytope.

Exercise 4.28. Let S ⊆ {0, 1}n , i.e., S is a set of (0, 1)-vectors. Define the
polytope P = conv(S). Show that x is a vertex of P if and only if x ∈ S.
Solution: For polytopes, vertices and extreme points coincide.
Clearly any x ∈ S is an extreme point: along any line through x, at least one
component increases linearly in one direction and decreases in the other, so since
every component of x is 0 or 1, one endpoint of any segment having x in its relative
interior has a component outside [0, 1]. But any component of a vector in P must
lie between 0 and 1. It follows that the points of S are extreme points.
Conversely, assume that x is an extreme point of P with some fractional component,
say 0 < x1 < 1. Every point of P is a convex combination of points in S, and since
x ∉ S this combination is nontrivial, so that x is a proper convex combination of two
different points in P , contradicting that it is an extreme point. Therefore all
components of an extreme point are either 0 or 1, and a (0, 1)-vector in P = conv(S)
must itself lie in S, since the convex combination representing it must be trivial.
Exercise 4.29. Let S ⊆ {0, 1}3 consist of the points (0, 0, 0), (1, 1, 1), (0, 1, 0)
and (1, 0, 1). Consider the polytope P = conv(S) and find a linear system defining
it.
Solution: Since all four points have equal first and third coordinates, we can
eliminate the third component by requiring x3 = x1 . The convex hull of the four
points (0, 0), (0, 1), (1, 0), and (1, 1) in IR2 is clearly described by 0 ≤ x1 , x2 ≤ 1.
A possible linear system defining the polytope is thus

x1 , x2 , x3 ≥ 0
x1 , x2 ≤ 1
x3 − x1 = 0

Exercise 4.30. Let P1 and P2 be two polytopes in IRn . Prove that P1 ∩ P2 is a


polytope.
Solution: Both P1 and P2 are bounded polyhedra. But then P1 ∩ P2 is also a
bounded polyhedron, hence a polytope.
Exercise 4.31. Is the sum of polytopes again a polytope? The sum of two poly-
topes P1 and P2 in IRn is the set P1 + P2 = {p1 + p2 : p1 ∈ P1 , p2 ∈ P2 }.
Solution: Yes. Let P1 = conv(A), P2 = conv(B), where A and B are finite sets.
By Exercise 2.25, P1 + P2 = conv(A) + conv(B) = conv(A + B). Since A + B is
also a finite set, conv(A + B) is a polytope. It follows that P1 + P2 is a polytope.
Exercise 4.32. Let L = span({b1 , . . . , bk }) be a linear subspace of IRn . Define
b0 = −Σ_{j=1}^k bj . Show that L = cone({b0 , b1 , . . . , bk }). Thus, every linear subspace
is a finitely generated cone, and we know how to find a set of generators for L
(i.e., a finite set with conical hull being L).

Solution: Clearly cone({b0 , b1 , . . . , bk }) ⊆ L, since also b0 ∈ L. The other way,
let x = Σ_{j=1}^k cj bj ∈ L. If all cj ≥ 0, then x ∈ cone({b0 , b1 , . . . , bk }). Assume
thus that some cj are negative, and let cl be the smallest (most negative) of them. Write

x = −cl (−Σ_{j=1}^k bj ) + Σ_{j≠l} (cj − cl )bj = (−cl )b0 + Σ_{j≠l} (cj − cl )bj .

Since cj ≥ cl for all j and cl < 0, this is a nonnegative combination, so that
x ∈ cone({b0 , b1 , . . . , bk }).
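
A small numeric illustration of this construction (in Python, assuming numpy; the vectors and coefficients are arbitrary examples): given b1 , . . . , bk and a coefficient vector with negative entries, the formula above produces nonnegative weights for b0 , b1 , . . . , bk .

import numpy as np

B = np.array([[1.0, 0.0], [0.0, 1.0]])  # rows are b1 and b2
b0 = -B.sum(axis=0)                     # b0 = -(b1 + ... + bk)
c = np.array([2.0, -3.0])               # x = 2*b1 - 3*b2 has a negative coefficient
l = np.argmin(c)                        # the smallest (most negative) coefficient

# Weights (-c_l) for b0 and (c_j - c_l) for each b_j, as in the formula above.
weights = np.concatenate([[-c[l]], c - c[l]])
print(weights)                           # [3. 5. 0.], all nonnegative
print(weights[0] * b0 + weights[1:] @ B) # recovers x = (2, -3)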
Exercise 4.33. Let P = conv({v1 , . . . , vk }) + cone({z1 , . . . , zm }) ⊆ IRn . Define
new vectors in IRn+1 by adding a component which is 1 for all the v. -vectors and
a component which is 0 for all the z. -vectors, and let C be the cone spanned by
these new vectors. Thus,

C = cone({(v1 , 1), . . . , (vk , 1), (z1 , 0), . . . , (zm , 0)})

Prove that x ∈ P if and only if (x, 1) ∈ C. The cone C is said to be obtained


by homogenization of the polyhedron P . This is sometimes a useful technique for
translating results that are known for cones into similar results for polyhedra, as
in the proof of Theorem 4.4.4.
Solution: Let x = Σj λj vj + Σk µk zk ∈ P , where Σj λj = 1 and all λj , µk ≥ 0. We
have that

(x, 1) = Σj λj (vj , 1) + Σk µk (zk , 0) ∈ C.

The other way, suppose that (x, 1) ∈ C, so that we can write

(x, 1) = Σj λj (vj , 1) + Σk µk (zk , 0)

with all λj , µk ≥ 0. Comparing last components forces the λj to sum to one, and also
x = Σj λj vj + Σk µk zk , which thus lies in conv({v1 , . . . , vk }) + cone({z1 , . . . , zm }) = P .
Exercise 4.34. Show that the sum of valid inequalities for a set P is another
valid inequality for P . What about weighted sums? What can you say about the
properties of the set

{(a, α) ∈ IRn+1 : aT x ≤ α is a valid inequality for P }.

Solution: If cT1 x ≤ α1 and cT2 x ≤ α2 for x ∈ P , then clearly

(c1 + c2 )T x = cT1 x + cT2 x ≤ α1 + α2 ,



for x ∈ P as well, so that valid inequalities can be summed to obtain new valid
inequalities. For weighted sums,

(w1 c1 + w2 c2 )T x = w1 cT1 x + w2 cT2 x ≤ w1 α1 + w2 α2 ,

so that we get new valid inequalities here as well, as long as the weights are
nonnegative. It follows that the stated set is closed under addition and under
multiplication with nonnegative scalars, i.e., it is a convex cone.
Chapter 5

Convex functions

Exercise 5.1. Prove this lemma.


Solution: That Px2 lies below the line segment Px1 Px3 (i.e., (i)) is equivalent to

f (x2 ) ≤ [(f (x3 ) − f (x1 ))/(x3 − x1 )](x2 − x1 ) + f (x1 ).

Reorganizing this gives

(f (x2 ) − f (x1 ))/(x2 − x1 ) ≤ (f (x3 ) − f (x1 ))/(x3 − x1 ),

which states that slope(Px1 , Px2 ) ≤ slope(Px1 , Px3 ) (i.e., (ii)). This proves that (i)
and (ii) are equivalent.
(i) is also equivalent to

f (x2 ) ≤ [(f (x3 ) − f (x1 ))/(x3 − x1 )](x2 − x3 ) + f (x3 ).

Reorganizing this gives

(f (x3 ) − f (x1 ))/(x3 − x1 ) ≤ (f (x3 ) − f (x2 ))/(x3 − x2 ),

which states that slope(Px1 , Px3 ) ≤ slope(Px2 , Px3 ) (i.e., (iii)).
Exercise 5.2. Show that the sum of convex functions is a convex function, and
that λf is convex if f is convex and λ ≥ 0 (here λf is the function given by
(λf )(x) = λf (x)).
Solution: We have that

Σk fk ((1 − λ)x + λy) ≤ Σk ((1 − λ)fk (x) + λfk (y)) = (1 − λ) Σk fk (x) + λ Σk fk (y),

which shows that Σk fk also is convex. For µ ≥ 0 we also have that

µf ((1 − λ)x + λy) ≤ µ((1 − λ)f (x) + λf (y)) = (1 − λ)µf (x) + λµf (y),

which shows that µf is convex.


Exercise 5.3. Prove that the following functions are convex:
(i) f (x) = x2 ,
(ii) f (x) = |x|,
(iii) f (x) = xp where p ≥ 1,
(iv) f (x) = ex ,
(v) f (x) = − ln(x) defined on IR+ .
Solution:
(i) f (x) = x2 has a positive second derivative, so that it must be convex.
(ii) f (x) = |x| is convex due to the triangle inequality: |(1 − λ)x + λy| ≤
(1 − λ)|x| + λ|y|.
(iii) f (x) = xp (defined on IR+ ) has second derivative p(p − 1)xp−2 , which is
nonnegative when p ≥ 1.
(iv) f (x) = ex has positive second derivative ex , so that it is convex.
(v) f (x) = − ln x has positive second derivative 1/x2 , so that it is convex.
Exercise 5.4. Consider Example 5.1.2 again. Use the same technique as in the
proof of arithmetic-geometric inequality except that you consider general weights
λ1 , . . . , λr (nonnegative with sum one). Which inequality do you obtain? It in-
volves the so-called weighted arithmetic mean and the weighted geometric mean.
Solution: By convexity of f (x) = − ln x we obtain

− ln(Σ_{j=1}^r λj xj ) ≤ −Σ_{j=1}^r λj ln(xj ) = − ln(Π_{j=1}^r xj^{λj} ),

which leads to

Π_{j=1}^r xj^{λj} ≤ Σ_{j=1}^r λj xj .
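
A quick numeric illustration of this weighted arithmetic–geometric inequality (in Python, assuming numpy; the data is random):

import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0.1, 10.0, size=5)   # positive numbers
lam = rng.uniform(size=5)
lam /= lam.sum()                     # nonnegative weights summing to one

geometric = np.prod(x ** lam)        # weighted geometric mean
arithmetic = lam @ x                 # weighted arithmetic mean
print(geometric <= arithmetic)       # True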
Exercise 5.5. Repeat Exercise 5.2, but now for convex functions defined on some
convex set in IRn .
Solution: The proof goes in the same way.
Exercise 5.6. Verify that every linear function from IRn to IR is convex.
Solution: If f is linear we have that

f ((1 − λ)x + λy) = (1 − λ)f (x) + λf (y),

so that f is also convex and with equality holding in the definition of convexity.

Exercise 5.7. Prove Proposition 5.2.1.


Solution: We have that

f (h((1 − λ)x + λy)) = f ((1 − λ)h(x) + λh(y)) ≤ (1 − λ)f (h(x)) + λf (h(y)),

where we used that h is affine and that f is convex. It follows that f ◦ h also is
convex.
Exercise 5.8. Let f : C → IR be convex and let w ∈ IRn . Show that the function
x → f (x + w) is convex.
Solution: This follows from the previous exercise since x → x + w is affine.
Exercise 5.9. Prove Theorem 5.2.3 (just apply the definitions).
Solution: Assume that f is convex. Let (x, s) ∈ epi(f ), (y, t) ∈ epi(f ). We have
that
f ((1 − λ)x + λy) ≤ (1 − λ)f (x) + λf (y) ≤ (1 − λ)s + λt.
It follows that (1 − λ)(x, s) + λ(y, t) = ((1 − λ)x + λy, (1 − λ)s + λt) ∈ epi(f ), so
that epi(f ) is convex.
Assume now that epi(f ) is convex. Since (x, f (x)) and (y, f (y)) are in epi(f ), also
((1−λ)x+λy, (1−λ)f (x)+λf (y)) ∈ epi(f ). This implies that f ((1−λ)x+λy) ≤
(1 − λ)f (x) + λf (y), so that f is convex.
Exercise 5.10. By the result above we have that if f and g are convex functions,
then the function max{f, g} is also convex. Prove this result directly from the
definition of a convex function.
Solution: This is a compulsory exercise. We have that

f ((1 − λ)x + λy) ≤ (1 − λ)f (x) + λf (y) ≤ (1 − λ) max{f, g}(x) + λ max{f, g}(y)
g((1 − λ)x + λy) ≤ (1 − λ)g(x) + λg(y) ≤ (1 − λ) max{f, g}(x) + λ max{f, g}(y).

It follows that also

max{f, g}((1 − λ)x + λy) ≤ (1 − λ) max{f, g}(x) + λ max{f, g}(y),

so that max{f, g} is also convex.


Exercise 5.11. Let f : IRn → IR be a convex function and let α ∈ IR. Show that
the set {x ∈ IRn : f (x) ≤ α} is a convex set. Each such set is called a sublevel
set.
Solution: Assume that f (x) ≤ α, f (y) ≤ α. Then

f ((1 − λ)x + λy) ≤ (1 − λ)f (x) + λf (y) ≤ (1 − λ)α + λα = α.

It follows that sublevel sets are convex.


39

Exercise 5.12. Verify that the function x → kxkp is positively homogeneous.


Solution: For λ ≥ 0 we have that

kλxkp = (|λx1 |p + . . . + |λxn |p )1/p = λ(|x1 |p + . . . + |xn |p )1/p = λkxkp ,

so that the p-norm is positively homogeneous.


Exercise 5.13. Consider the support function of an optimization problem with
a linear objective function, i.e., let f (c) := max{cT x : x ∈ S} where S ⊆ IRn is
a given nonempty set. Show that f is positively homogeneous. Therefore (due to
Example 5.2.2), the support function is convex and positively homogeneous when
S is a compact convex set.
Solution: For λ ≥ 0 we have that

f (λc) = max{λcT x : x ∈ S} = λ max{cT x : x ∈ S} = λf (c),

so that f is positively homogeneous.


Exercise 5.14. Let f (x) = xT x = kxk2 for x ∈ IRn . Show that the directional
derivative f 0 (x0 ; z) exists for all x0 and nonzero z and that f 0 (x0 ; z) = 2z T x0 .
Solution: We have that

f ′(x0 ; z) = lim_{t→0} ((x0 + tz)T (x0 + tz) − x0T x0 )/t
           = lim_{t→0} (2t x0T z + t2 z T z)/t
           = lim_{t→0} (2x0T z + t z T z) = 2z T x0 .
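
A numeric check of this computation (in Python, assuming numpy; the vectors are arbitrary examples): the difference quotients approach 2z T x0 as t → 0.

import numpy as np

x0 = np.array([1.0, -2.0, 0.5])
z = np.array([0.3, 0.4, -1.0])
f = lambda x: x @ x

for t in [1e-2, 1e-4, 1e-6]:
    print((f(x0 + t * z) - f(x0)) / t)  # approaches 2 * z @ x0
print(2 * z @ x0)                       # exact value: -2.0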

Exercise 5.15. A quadratic function is a function of the form

f (x) = xT Ax + cT x + α

for some (symmetric) matrix A ∈ IRn×n , a vector c ∈ IRn and a scalar α ∈ IR.
Discuss whether f is convex.
Solution: The Hessian of f is 2A, so f is convex if and only if A is positive
semidefinite.
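
The criterion is easy to test numerically. A minimal sketch (in Python, assuming numpy; the matrix is an arbitrary example) checks positive semidefiniteness via the smallest eigenvalue of the symmetric matrix A:

import numpy as np

A = np.array([[2.0, 1.0], [1.0, 2.0]])   # eigenvalues 1 and 3
print(np.linalg.eigvalsh(A).min() >= 0)  # True, so f is convex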
Exercise 5.16. Assume that f and g are convex functions defined on an inter-
val I. Determine which of the following functions are convex or concave:
(i) λf where λ ∈ IR,
(ii) min{f, g},
(iii) |f |.
Solution:

(i) λf is convex when λ ≥ 0, otherwise it is concave.


(ii) min{f, g} need be neither convex nor concave. As an example, consider a
convex parabola and a line which intersects it at two points.
(iii) |f | can be either convex or concave. If f (x) = x2 , then |f | = f is convex. If
f (x) = x2 − 4, defined on (−2, 2), then |f | = 4 − x2 is concave.
Exercise 5.17. Let f, g : I → IR where I is an interval. Assume that f and
f + g both are convex. Does this imply that g is convex? Or concave? What if
f + g is convex and f concave?
Solution: Assume that f is convex. If g = −f , then f + g = 0 is clearly convex, but
g is concave (and not convex unless f is affine). So g need not be convex; nor need
it be concave, since g = f also works when f is convex.
If h = f + g is convex and f is concave, then g = h − f = h + (−f ) is a sum of two
convex functions, so that it is convex.
Exercise 5.18. Let f : [a, b] → IR be a convex function. Show that
max{f (x) : x ∈ [a, b]} = max{f (a), f (b)}
i.e., a convex function defined on a closed real interval attains its maximum in one
of the endpoints.
Solution: Any point in [a, b] can be written on the form (1 − λ)a + λb, for some
0 ≤ λ ≤ 1. We have that
f (x) = f ((1 − λ)a + λb) ≤ (1 − λ)f (a) + λf (b)
≤ (1 − λ) max{f (a), f (b)} + λ max{f (a), f (b)} = max{f (a), f (b)}.
The result follows.
Exercise 5.19. Let f : I → IR be a convex function defined on a bounded interval
I. Prove that f must be bounded below (i.e., there is a number L such that f (x) ≥
L for all x ∈ I). Is f also bounded above?
Solution: If I is closed, say I = [a, b], then by the previous exercise f is bounded
above by max{f (a), f (b)}. On an interval that is not closed, f need not be bounded
above: f (x) = 1/x is convex and unbounded above on (0, 1).
For the lower bound, fix z in the interior of I. Since f is convex, its slope functions
are increasing, and the one-sided derivatives f−′ (z) and f+′ (z) exist. For x < z < y
in I we have

(f (z) − f (x))/(z − x) ≤ f−′ (z) ≤ f+′ (z) ≤ (f (y) − f (z))/(y − z),

so that

f (x) ≥ −f−′ (z)(z − x) + f (z) and f (y) ≥ f+′ (z)(y − z) + f (z).

It follows that f is bounded below on both sides of z by linear functions, and since
I is bounded these linear functions are bounded below on I, so that f in total is
bounded below.

Exercise 5.20. Let f, g : IR → IR be convex functions and assume that f is


increasing. Prove that the composition f ◦ g is convex.
Solution: Since g((1 − λ)x + λy) ≤ (1 − λ)g(x) + λg(y) and f is increasing, we
have that
f (g((1 − λ)x + λy)) ≤ f ((1 − λ)g(x) + λg(y)) ≤ (1 − λ)f (g(x)) + λf (g(y)),
which proves that f ◦ g is convex.
Exercise 5.21. Find the optimal solutions of the problem min{f (x) : a ≤ x ≤
b} where a < b and f : IR → IR is a differentiable convex function.
Solution: Since f is convex, f ′ is increasing on [a, b]. If f ′ (a) ≥ 0, then f is
nondecreasing and the minimum is attained at x = a; if f ′ (b) ≤ 0, then f is
nonincreasing and the minimum is attained at x = b. Otherwise the optimal solutions
are the points where f ′ (x) = 0 (such points exist, since the derivative of a
differentiable convex function is continuous and changes sign).
Exercise 5.22. Let f : h0, ∞i → IR and define the function g : h0, ∞i → IR by
g(x) = xf (1/x). Prove that f is convex if and only if g is convex. Hint: Prove
that
(g(x) − g(x0 ))/(x − x0 ) = f (1/x0 ) − (1/x0 ) · (f (1/x) − f (1/x0 ))/(1/x − 1/x0 )
and use Proposition 5.1.2. Why is the function x → xe1/x convex?
Solution: This is a compulsory exercise. We have that

(g(x) − g(x0 ))/(x − x0 ) = (xf (1/x) − x0 f (1/x0 ))/(x − x0 )
= f (1/x0 ) + (xf (1/x) − xf (1/x0 ))/(x − x0 )
= f (1/x0 ) − (1/x0 ) · (f (1/x) − f (1/x0 ))/(1/x − 1/x0 ).

Now, as x increases, 1/x decreases, so that the fraction on the right decreases, since
the slope function of f is increasing (f is convex). Due to the minus sign the entire
right hand side increases, so that the slope function of g increases. It follows that g
is convex. The same argument with the roles of f and g interchanged (note that
f (x) = xg(1/x)) gives the converse.
With f (x) = ex we get that g(x) = xe1/x , so this function is convex since ex is.
Exercise 5.23. Prove Theorem 5.1.9 as follows. Consider the function
g(x) = f (x) − f (a) − [(f (b) − f (a))/(b − a)](x − a).
Explain why g is convex and that it has a minimum point at some c ∈ ha, bi (note
that g(a) = g(b) = 0 and g is not constant). Then verify that
∂g(c) = ∂f (c) − (f (b) − f (a))/(b − a)
and use Corollary 5.1.8.

Solution: g is convex since it is a sum of a convex and an affine function. We know
from the exercises then that it has a minimum point at some c ∈ (a, b). We first compute

g(x) − g(c) = f (x) − f (c) − [(f (b) − f (a))/(b − a)](x − c).

We then obtain

lim_{x→c+} (g(x) − g(c))/(x − c)
  = lim_{x→c+} [(f (x) − f (c))/(x − c) − (f (b) − f (a))/(b − a)]
  = f+′ (c) − (f (b) − f (a))/(b − a),
lim_{x→c−} (g(x) − g(c))/(x − c)
  = lim_{x→c−} [(f (x) − f (c))/(x − c) − (f (b) − f (a))/(b − a)]
  = f−′ (c) − (f (b) − f (a))/(b − a),

so that

∂g(c) = [f−′ (c), f+′ (c)] − (f (b) − f (a))/(b − a) = ∂f (c) − (f (b) − f (a))/(b − a).

By Corollary 5.1.8 it follows that 0 ∈ ∂g(c), so that 0 ∈ ∂f (c) − (f (b) − f (a))/(b − a),
i.e., (f (b) − f (a))/(b − a) ∈ ∂f (c). This proves Theorem 5.1.9.
Exercise 5.24. Let f : IR → IR be an increasing convex function and let g :
C → IR be a convex function defined on a convex set C in IRn . Prove that the
composition f ◦ g (defined on C) is convex.
Solution: The proof given for Exercise 5.20 carries over unchanged.
Exercise 5.25. Prove that the function given by h(x) = e^{xT Ax} is convex when
A is positive definite.
Solution: We know that g(x) = xT Ax is convex, and that f (x) = ex is both
convex and increasing. The result thus follows from the previous exercise.
Exercise 5.26. Let f : C → IR be a convex function defined on a compact convex
set C ⊆ IRn . Show that f attains its maximum in an extreme point. Hint: use
Minkowski’s theorem (Corollary 4.3.4).
Solution: Corollary 4.3.4 says that C is the convex hull of its extreme points.
Denote them by xj . Writing x ∈ C as a convex combination x = Σj λj xj of extreme
points, we have that

f (Σj λj xj ) ≤ Σj λj f (xj ) ≤ Σj λj maxj f (xj ) = maxj f (xj ).

It follows that any maximum value must be attained in an extreme point.



Exercise 5.27. Let C ⊆ IRn be a convex set and consider the distance function
dC defined in ( 3.1), i.e., dC (x) = inf{kx − ck : c ∈ C}. Show that dC is a convex
function.
Solution: Let x, y be given, let ε > 0, and find x1 , y1 ∈ C so that kx − x1 k ≤ dC (x) + ε,
ky − y1 k ≤ dC (y) + ε. Since C is convex, (1 − λ)x1 + λy1 ∈ C. We have that

dC ((1 − λ)x + λy) = inf{k(1 − λ)x + λy − ck : c ∈ C}
≤ k(1 − λ)x + λy − ((1 − λ)x1 + λy1 )k
≤ (1 − λ)kx − x1 k + λky − y1 k
≤ (1 − λ)(dC (x) + ε) + λ(dC (y) + ε)
= (1 − λ)dC (x) + λdC (y) + ε.

Since this applies for all ε > 0 it follows that dC ((1 − λ)x + λy) ≤ (1 − λ)dC (x) + λdC (y)
as well, so that dC is convex.
Exercise 5.28. Prove Corollary 6.1.1 using Theorem 5.3.5.
Solution: Let x∗ be a local minimum. Since f is convex, Theorem 5.3.5 says that
f (x) ≥ f (x∗ ) + ∇f (x∗ )T (x − x∗ ) for all x ∈ C. If ∇f (x∗ ) = 0 this says that
f (x) ≥ f (x∗ ), i.e., x∗ is a global minimum. Therefore (iii) implies (ii), and (ii)
obviously implies (i).
Assume finally that x∗ is a local minimum, and assume for contradiction that
∇f (x∗ ) ≠ 0. Since the gradient is continuous, ∇f (y)T ∇f (x∗ ) > 0 for y in some
neighbourhood of x∗ . We use the mean value form of Taylor’s formula,

f (x) = f (x∗ ) + ∇f (x∗ + t(x − x∗ ))T (x − x∗ ),

for some 0 < t < 1. By choosing x = x∗ − α∇f (x∗ ) with α > 0 small enough, we get

f (x) = f (x∗ ) − α∇f (x∗ + t(x − x∗ ))T ∇f (x∗ ) < f (x∗ ),

which contradicts that we have a local minimum. This proves that (i) implies (iii),
and the proof is complete.
Exercise 5.29. Compare the notion of support for a convex function to the notion
of supporting hyperplane of a convex set (see section 3.2). Have in mind that f
is convex if and only if epi(f ) is a convex set. Let f : IRn → IR be convex and
consider a supporting hyperplane of epi(f ). Interpret the hyperplane in terms of
functions, and derive a result saying that every convex function has a support at
every point.

Solution: Let aT x = α be a supporting hyperplane of epi(f ) at (y, f (y)) (at
points (y, t) with t > f (y) the epigraph cannot have a supporting hyperplane).
Assume that a is scaled so that its last component is 1 (this is always possible
when the last component of a is nonzero; the case of a zero last component is not
interesting, since such a hyperplane rather constrains the domain of f ). This means
that the hyperplane can be written as

aT x = (a1 , . . . , an )T (x1 , . . . , xn ) + t = (a1 , . . . , an )T (y1 , . . . , yn ) + f (y),

so that
t = −(a1 , . . . , an )T (x1 − y1 , . . . , xn − yn ) + f (y).
Denote this affine function by h(x). Since aT x ≥ α on epi(f ), from the above it
follows that

(a1 , . . . , an )T (x1 , . . . , xn ) + f (x) ≥ (a1 , . . . , an )T (y1 , . . . , yn ) + f (y),


so that

f (x) ≥ −(a1 , . . . , an )T (x1 − y1 , . . . , xn − yn ) + f (y) = h(x).

All this can be more compactly explained in terms of graphs: The graph of the
hyperplane lies below the graph of f . The supporting hyperplane is viewed as the
tangent plane of the graph.
Chapter 6

Nonlinear and convex optimization

Exercise 6.1. Consider the least squares problem minimize kAx − bk over all
x ∈ IRn . From linear algebra we know that the optimal solutions to this problem
are precisely the solutions to the linear system (called the normal equations)

AT Ax = AT b.

Show this using optimization theory by considering the function f (x) = kAx−bk2 .
Solution: The gradient of f (x) = kAx−bk2 = xT AT Ax−2bT Ax+bT b is ∇f (x) =
2AT Ax − 2AT b. We see that the gradient is zero if and only if AT Ax = AT b, and
since f is convex these points are exactly the minimizers.
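
A numeric check (in Python, assuming numpy; the data is random): solving the normal equations gives the same minimizer as numpy’s built-in least squares routine.

import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 3))
b = rng.standard_normal(6)

x_normal = np.linalg.solve(A.T @ A, A.T @ b)     # normal equations
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)  # direct least squares
print(np.allclose(x_normal, x_lstsq))            # True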
Exercise 6.2. Prove that the optimality condition is correct in Example 6.2.1.
Solution: Assume that x∗i = 0. Let x = x∗ + εei ∈ C. Then ∇f (x∗ )T (x − x∗ ) =
ε ∂f /∂xi (x∗ ), which must be ≥ 0 by the optimality condition, so that ∂f /∂xi (x∗ ) ≥ 0.
Assume that x∗i > 0. Then x = x∗ ± εei ∈ C for ε small enough. These two choices
give ±ε ∂f /∂xi (x∗ ) as values for ∇f (x∗ )T (x − x∗ ). If both of these are ≥ 0, then
clearly ∂f /∂xi (x∗ ) = 0.
Exercise 6.3. Consider the problem to minimize a (continuously differentiable)
convex function f subject to x ∈ C = {x ∈ IRn : O ≤ x ≤ p} where p is some
nonnegative vector. Find the optimality conditions for this problem. Suggest a
numerical algorithm for solving this problem.
Solution: The constraints can be written as −xi ≤ 0 and xi − pi ≤ 0, which have
gradients −ei and ei , respectively. Go through all possibilities of active constraints.
If 0 < xi < pi , the gradient equation says that ∂f /∂xi = 0. If xi = 0, we add −µi ei
to the gradient equation, which is the same as ∂f /∂xi = µi ≥ 0. If xi = pi , we add
µi ei to the gradient equation, which is the same as ∂f /∂xi = −µi ≤ 0.
We thus arrive at the following optimality conditions: ∂f /∂xi = 0 if 0 < xi < pi ,
∂f /∂xi ≥ 0 if xi = 0, and ∂f /∂xi ≤ 0 if xi = pi . A natural numerical algorithm
is the projected gradient method: take a gradient step, then project back onto the
box by clipping each coordinate to [0, pi ]; a sketch follows below.
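
A minimal sketch of the projected gradient method (in Python, assuming numpy; the objective f (x) = kx − yk2 is a hypothetical example whose unconstrained minimizer y lies outside the box):

import numpy as np

p = np.array([1.0, 2.0, 3.0])   # the box is 0 <= x <= p
y = np.array([-1.0, 1.0, 5.0])  # unconstrained minimizer, infeasible
grad = lambda x: 2 * (x - y)    # gradient of f(x) = ||x - y||^2

x = np.zeros(3)
for _ in range(200):
    x = np.clip(x - 0.1 * grad(x), 0.0, p)  # gradient step, then project
print(x)  # converges to (0, 1, 3), which satisfies the conditions above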


Exercise 6.4. Consider the optimization problem minimize f (x) subject to x ≥


O, where f : IRn → IR is a differentiable convex function. Show that the KKT
conditions for this problem are

x ≥ O, ∇f (x) ≥ O, and xk · ∂f (x)/∂xk = 0 for k = 1, . . . , n.

Discuss the consequences of these conditions for optimal solutions.


Solution: If xi = 0 then the ith component of the gradient equation is ∂f /∂xi − µi = 0,
so that ∂f /∂xi = µi ≥ 0. Otherwise ∂f /∂xi = 0. In either case xk · ∂f (x)/∂xk = 0
and ∂f /∂xk ≥ 0, so that ∇f (x) ≥ O. The consequence for an optimal solution is a
complementarity property: in each coordinate, either xk = 0 or the partial derivative
∂f /∂xk vanishes, and a strictly positive partial derivative forces xk = 0.
Exercise 6.5. Solve the problem: minimize (x + 2y − 3)2 for (x, y) ∈ IR2 and the
problem minimize (x + 2y − 3)2 subject to (x − 2)2 + (y − 1)2 ≤ 1.
Solution: The minimum of the first problem is clearly 0, with the minimum attained
at any point on the line x + 2y = 3. For the second problem the gradient equation is

(2(x + 2y − 3), 4(x + 2y − 3)) + µ(2(x − 2), 2(y − 1)) = (0, 0).

Note first that the gradient of the constraint is zero if x = 2, y = 1. This gives
the candidate f (2, 1) = 1 (this point also satisfies the constraint).
If the constraint is not active we must have x + 2y − 3 = 0. Any point on
this line with (x − 2)2 + (y − 1)2 < 1 is a candidate for the minimum; such points
exist, since the distance from the center (2, 1) to the line is |2 + 2 − 3|/√5 = 1/√5 < 1,
so that the line intersects the interior of the disk. At all these points f is zero, so
that the unconstrained global minimum is attained, and we can stop here. Nevertheless,
let us also see what happens when the constraint is active. We then have that
2(y − 1) = 4(x − 2), i.e., y = 2x − 3. Inserting this in (x − 2)2 + (y − 1)2 = 1 gives

(x − 2)2 + 4(x − 2)2 = 5(x − 2)2 = 1,

so that x = 2 ± 1/√5, y = 1 ± 2/√5. For these points x + 2y − 3 = 1 ± √5, so the two
new candidates have values (1 ± √5)2 , the smaller being (1 − √5)2 = 6 − 2√5 ≈ 1.53;
but the minimum was already found to be 0 above.
The global minimum is attained in this exercise, since we minimize a continuous
function over a closed and bounded region.
Exercise 6.6. Solve the problem: minimize x2 +y 2 −14x−6y subject to x+y ≤ 2,
x + 2y ≤ 3.
Solution: The gradient equation is

(2x − 14, 2y − 6) + µ1 (1, 1) + µ2 (1, 2) = (0, 0).

Note first that the constraint gradients are linearly independent.
Assume that both constraints are active. This gives x = y = 1 and f (1, 1) = −18.
The multipliers solve µ1 + µ2 = 12 and µ1 + 2µ2 = 4, so µ2 = −8 < 0, and the
KKT conditions fail at (1, 1).
Assume that only the first constraint is active. We get 2x − 14 = 2y − 6 and
x + y = 2, so that x = 3, y = −1, and f (3, −1) = −26. This point is feasible
(3 + 2 · (−1) = 1 ≤ 3), and µ1 = 14 − 2x = 8 ≥ 0, so it is a KKT point.
Assume that only the second constraint is active. We get y = 2x − 11 and x + 2y = 3,
so that x = 5, y = −1. But then x + y = 4 > 2, so this point violates the first
constraint and must be discarded.
Assume finally that no constraints are active. Then x = 7, y = 3, but x + y = 10 > 2,
so this point is also infeasible.
The only KKT point is thus (3, −1). Since f is convex (its Hessian is 2I) and the
constraints are linear, this KKT point is the global minimum: f (3, −1) = −26. Note
also that f (x, y) → ∞ as k(x, y)k → ∞, so a global minimum over the closed feasible
set indeed exists.
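
A numeric check of this active-set analysis (in Python, assuming numpy): enumerate the active sets, solve the stationarity system, and keep feasible candidates with nonnegative multipliers.

import itertools
import numpy as np

c = np.array([14.0, 6.0])                # f(x) = x^T x - c^T x
G = np.array([[1.0, 1.0], [1.0, 2.0]])   # constraints G x <= h
h = np.array([2.0, 3.0])
f = lambda x: x @ x - c @ x

best = None
for m in range(3):
    for act in itertools.combinations(range(2), m):
        rows = list(act)
        # Stationarity 2x + G[rows]^T mu = c, together with G[rows] x = h[rows].
        M = np.zeros((2 + m, 2 + m))
        M[:2, :2] = 2 * np.eye(2)
        if m:
            M[:2, 2:] = G[rows].T
            M[2:, :2] = G[rows]
        sol = np.linalg.solve(M, np.concatenate([c, h[rows]]))
        x, mu = sol[:2], sol[2:]
        if np.all(G @ x <= h + 1e-9) and np.all(mu >= -1e-9):
            if best is None or f(x) < best[1]:
                best = (x, f(x))
print(best)  # x = (3, -1) with f = -26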
Exercise 6.7. Solve the problem: minimize x2 − y subject to y − x ≥ −2, y 2 ≤ x,
y ≥ 0.
Solution: The gradient equation is

(2x, −1) + µ1 (1, −1) + µ2 (−1, 2y) + µ3 (0, −1) = (0, 0).

We see that x ≥ 0, since y 2 ≤ x.
From the second component, −1 − µ1 + 2yµ2 − µ3 = 0, so that 2yµ2 = 1 + µ1 + µ3 > 0.
Hence µ2 > 0, i.e., the second constraint must be active, and y > 0. The third
constraint can thus never be active. We are left with two possibilities: the first
constraint may or may not be active.
Assume first that the first constraint is active. We then have that y = x − 2 and
y 2 = x, so that y 2 − y − 2 = 0. This gives that y = 2 or y = −1. y = −1 can be
discarded, so that we obtain the candidate (4, 2). We have that f (4, 2) = 14.
Finally assume that the first constraint is not active. The gradient equation can
now be written as

(2x, −1) = µ2 (1, −2y).
Eliminating µ2 (we get µ2 = 2x = 1/(2y)) we see that 4xy = 1 (and both x and y must
be > 0). Inserting this in y 2 = x gives that 4y 3 = 1, so that x = 4−2/3 , y = 4−1/3 , and

f (4−2/3 , 4−1/3 ) = 4−4/3 − 4−1/3 = −(3/4) · 4−1/3 < 0.
It follows that this is the constrained minimum.
We should also comment on the possibility of having linearly dependent active
constraint gradients. The only problematic part here can be when the first two
constraints both are active, but this is covered by the calculations above, which
lead to f (4, 2) = 14.
We should also comment that the area we minimize over is bounded, so that there
actually exists a global minimum, which is the one we have found.
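
A brute-force numeric sanity check (in Python, assuming numpy): sample the bounded feasible region on a grid and compare with the KKT value found above.

import numpy as np

xs = np.linspace(0.0, 4.0, 2001)
ys = np.linspace(0.0, 2.0, 1001)
X, Y = np.meshgrid(xs, ys)
feasible = (Y - X >= -2) & (Y**2 <= X)  # y >= 0 holds on the grid already
F = np.where(feasible, X**2 - Y, np.inf)
print(F.min())                  # approximately -(3/4) * 4**(-1/3) = -0.4724...
print(-0.75 * 4 ** (-1 / 3))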
More on the proof of Theorem 1.2 in [2]

F (G) ⊆ Q follows from Lemma 1.1: Since (1.4) holds for all incidence vectors
of forests, it also holds for their convex hull, i.e., F (G) ⊆ Q. The other way,
as Q is compact and convex, Minkowski’s theorem yields that Q is the convex
hull of its extreme points. According to Chapter 4 in [1], as Q is a polyhedron,
vertices and extreme points coincide, and faces and exposed faces are the same.
It follows that it is enough to show that any unique optimal solution to a problem
of the form max{cT x : x ∈ Q} is also in F (G). We will actually show that such
a unique optimal solution must be an incidence vector for a forest, which is an
even stronger statement.
It is smart to consult Section 1.4 in [2] here, where one learns the following
greedy algorithm for constructing a maximum weight forest: if you at any stage
in the algorithm have the components U1 , . . . , Uk , join two of the components
with an edge of maximum overall weight (there may be more than one such),
and terminate when there are no edges left with positive weight. Since the
later edges added were also candidates at previous iterations, the weights are
decreasing: c(e1 ) ≥ c(e2 ) ≥ · · · ≥ c(er ) > 0 (the edges ei are the ones found by the
algorithm). This is a crucial point, which is not commented on in the proof. A
runnable sketch of the greedy algorithm is given below.
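
A sketch of this greedy algorithm (in Python; the graph format, a list of (weight, u, v) triples, and the union-find bookkeeping are our own illustrative choices, not taken from [2]):

def max_weight_forest(n, edges):
    # Greedily build a maximum-weight forest on vertices 0, ..., n-1.
    parent = list(range(n))

    def find(u):  # union-find with path halving
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    forest = []
    # Visit edges by decreasing weight, so c(e1) >= c(e2) >= ... as noted above.
    for w, u, v in sorted(edges, reverse=True):
        if w <= 0:
            break  # no positive-weight edges left: terminate
        ru, rv = find(u), find(v)
        if ru != rv:  # the edge joins two different components
            parent[ru] = rv
            forest.append((w, u, v))
    return forest

# Example: on a 4-cycle with weights 3, 2, 2, 1, the three heaviest edges are chosen.
print(max_weight_forest(4, [(3, 0, 1), (2, 1, 2), (2, 2, 3), (1, 3, 0)]))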
Let us comment on why the upper triangular system

yVj + Σ_{i : ej ∈ E(Vi )} yVi = c(ej )      (6.1)

yields a dual feasible y (recall yS = 0 for S different from the Vj ).


After all iterations of the greedy algorithm, we end up with a forest where the
nodes in any connected component equal a Vj which is maximal in the sense
that it is not a subset of another Vi (this means that I(j) = ∅). If Vj is such,
then yVj = c(ej ) > 0 due to (6.1). Now, each Vj is constructed by the algorithm
by adding edges, providing a subsystem of (6.1), where only subsets of Vj occur
in the summations. If we eliminate yVj = c(ej ) > 0 (back-substitution) in this
subsystem, we get a system where the right hand side is c(ek ) − c(ej ) ≥ 0 (and
still decreasing), and where ej joins Vj1 and Vj2 to Vj . The sketched procedure


can now be repeated to show that yVj1 ≥ 0, yVj2 ≥ 0, and so on, proving that all
yVj ≥ 0.
For dual feasibility we also need to explain why

Σ_{i : e ∈ E(Vi )} yVi ≥ c(e)

when e is different from the ei added by the algorithm. If e joins two different
components of the forest we end up with, we must have that c(e) ≤ 0 (otherwise the
algorithm would not have terminated), and the inequality follows since the left hand
side is an empty sum. Assume thus that e joins two vertices in the same component
of the forest. At some point in the algorithm, the two end nodes of e are joined
into the same component by means of one edge ek . As e was also a candidate for
this join, we must have that c(ek ) ≥ c(e) by maximality of c(ek ) in the algorithm.
But then

Σ_{i : e ∈ E(Vi )} yVi = Σ_{i : ek ∈ E(Vi )} yVi = c(ek ) ≥ c(e).
Dual feasibility follows.


Let us also explain the complementary slackness further. By construction the dual
slack variables corresponding to e1 , . . . , er are zero. As x0 is the incidence vector
of e1 , . . . , er we have complementary slackness between the primal variables and
the dual slack variables.
To secure complementary slackness between the dual variables and the primal
slack variables, since yS = 0 for all S ≠ Vi , we only need to show that x(E[Vi ]) =
|Vi | − 1 for all i. But this follows from the fact that the connected components
constructed by the greedy algorithm are trees.
From complementary slackness it now follows that x0 is optimal for the primal
problem. As x̄ also is optimal, they must be equal due to the uniqueness assump-
tion. As x0 is the incidence vector of a forest, the result follows.
Chapter 2 in [2]

Regarding the final statement in Proposition 2.2, I believe it should be that,
if P is a pointed, rational, nonempty polyhedron, then P is integral if and
only if each vertex is integral. To see why: When it is pointed, we can write
P = conv(ext(P )) + cone(Z) according to Proposition 4.3.3. Since P also is
assumed rational, we can choose integral generators for Z as in Theorem 2.1.
⇒: If P is integral, P = conv(P ∩ Z n ). According to Exercise 4.5, all vertices
of P are contained in P ∩ Z n , so that they are integral (this direction does not
require the polyhedron to be rational).
⇐: If each vertex is integral, then conv(ext(P )) ⊆ PI . Since P and PI have the
same characteristic cone according to Theorem 2.1, we have that

P = conv(ext(P )) + cone(Z) ⊆ PI + cone(Z) = PI .

Thus P ⊆ PI , so that P is integral.


Add details in Proposition 2.3: Note that the vertices are rational since the poly-
hedron is rational. Since the polyhedron is not integral, one of the vertices is not
integral. Since it is rational, it must be fractional. That there is a c̄ as described is
not explained. Note that each vertex is the unique intersection of n hyperplanes.
Consider a supporting hyperplane at the optimal solution. We can slightly perturb
its normal vector c̄ to a rational vector (and therefore, after scaling, also an integral
vector), while maintaining the supporting hyperplane property: c̄ must be changed so
that its inner products with the normal vectors of the hyperplanes do not change in
sign. For the same reason we can find c′.

Chapter 4 in [2]

The deductions on the bottom of page 54


Let us go carefully through the deductions on the bottom of page 54. The first
two lines follow by definitions, while the third line follows from the fact that the
maximum of a convex function over a polytope is attained in one of the vertices
of the polytope. The fourth line, introducing η, is a simple rewriting, while the
fifth line simply moves the terms from side to side. The fifth line is an LP in
the variables η and λ. The coefficient vector in the objective is (1, 0, . . . , 0). With
X = [x1 · · · x|K| ] and 1 denoting the column vector of all ones, the inequalities
can be rephrased as

[1 (A2 X − (b2 · · · b2 ))T ] (η, λ)T ≥ X T c.

The dual LP is thus (with variables denoted by µk ) to maximize (X T c)T µ =
cT Σ_{k∈K} µk xk over µ ≥ 0 subject to

[1 (A2 X − (b2 · · · b2 ))T ]T µ ≤ (1, 0, . . . , 0)T ,

where there is equality in the first inequality since there was no requirement on
η to be nonnegative. This first equality can be rephrased as Σ_{k∈K} µk = 1. The
other inequalities can be rephrased as

(A2 X − (b2 · · · b2 )) µ ≤ 0,

i.e.,

A2 Σ_{k∈K} µk xk ≤ b2 Σ_{k∈K} µk = b2 .

This is the statement on the sixth line. Since Σ_{k∈K} µk xk is a general element in
PI1 , the last line follows.


The details at the top of page 65


Let us also clarify the details at the top of page 65. By adding the degree
constraints we have that Σ_{v∈H} x(δ(v)) = 2|H|. Now, each edge in E(H) contributes
twice in this sum, while each e ∈ δ(H) contributes once, so that 2x(E(H)) +
x(δ(H)) = 2|H|, i.e., x(E(H)) + (1/2)x(δ(H)) = |H|. Adding that −(1/2)xe ≤ 0 for
e ∈ δ(H) \ (∪i E(Ti )) we obtain

x(E(H)) + (1/2) Σ_{i=1}^k x(E(Ti ) ∩ δ(H)) ≤ |H|,      (6.2)

where we used that we have the disjoint union

δ(H) = (δ(H) \ ∪i E(Ti )) ∪ (∪i (E(Ti ) ∩ δ(H))).

Now note that we also have the disjoint union

E(Ti ) = E(Ti ∩ H) ∪ E(Ti \ H) ∪ (δ(H) ∩ E(Ti )),

and that
1. An edge in E(Ti ∩ H) contributes both in (ii) and (iv).
2. An edge in E(Ti \ H) contributes both in (ii) and (iii).
3. An edge in δ(H) ∩ E(Ti ) contributes both in (ii) and (6.2).
Therefore, if (ii), (iii), and (iv) are scaled by 1/2, and these are added for all i
together with (6.2), the left hand side will become x(E(H)) + Σ_{i=1}^k x(E(Ti )). On
the right side we obtain

|H| + (1/2) Σ_{i=1}^k (|Ti | − 1 + |Ti \ H| − 1 + |Ti ∩ H| − 1)
= |H| + (1/2) Σ_{i=1}^k (2|Ti | − 3)
= |H| + Σ_{i=1}^k (|Ti | − 1) − k/2.

The use of Farkas lemma


Let us also comment on how Farkas’ lemma is used in (4.17). Farkas’ lemma
(Theorem 3.2.5) states that Ax = b has a solution x ≥ 0 if and only if for each
y, y T A ≥ 0 implies y T b ≥ 0. Alternatively (replacing y by −y), this says that
Ax = b has a solution x ≥ 0 if and only if for each y, y T A ≤ 0 implies y T b ≤ 0.
Applied to the matrix [A; e] (A with a row e of all ones appended) and the vector
(x̄, 1), the first statement is equivalent to x̄ ∈ conv(T ). Thus x̄ ∉ conv(T ) is
equivalent to the existence of a y so that y T [A; e] ≤ 0 and y T (x̄, 1) > 0. Writing
y = (a, −b), this says that aT A − be ≤ 0 and aT x̄ − b > 0, which is the statement
in the book.
The matrix A here has nonnegative entries, as does x̄. The reason for using −b
instead of b is that the equations then secure equivalence between positivity of a
and that of b.
Exercises from [2]

Exercise 4.1: Assume the indexing is such that c1 /a1 ≥ c2 /a2 ≥ · · · ≥ cn /an . The
coordinate change xj = (b/aj )yj turns the problem into maximizing b Σ_{j=1}^n (cj /aj )yj
under the constraints Σ_{j=1}^n yj ≤ 1 and 0 ≤ yj ≤ aj /b. Clearly we must choose
y1 = min(1, a1 /b), i.e., x1 = min(1, b/a1 ).
If a1 ≥ b we are done and must choose x2 = · · · = xn = 0. Otherwise x1 = 1 and
the problem can be rephrased as maximizing Σ_{j=2}^n cj xj subject to Σ_{j=2}^n aj xj ≤ b − a1 .
If a2 ≥ b − a1 we are again done. Otherwise x2 = 1, and we get the system
Σ_{j=3}^n aj xj ≤ b − a1 − a2 . We continue this procedure until for some k,

ak ≥ b − a1 − · · · − ak−1 ,      (6.3)

i.e., Σ_{i=1}^{k−1} ai < b ≤ Σ_{i=1}^k ai . We obtain the optimal solution

(1, . . . , 1, (b − Σ_{i=1}^{k−1} ai )/ak , 0, . . . , 0),      (6.4)

and the optimal value is c1 + · · · + ck−1 + ck (b − Σ_{i=1}^{k−1} ai )/ak . A runnable
sketch of this greedy procedure is given below.
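
Here is the sketch (in Python; the example data is hypothetical and assumes the ratio ordering above):

def fractional_knapsack(a, c, b):
    # Greedy: set x_j = 1 while the budget allows, then take a fraction.
    x, used = [0.0] * len(a), 0.0
    for j in range(len(a)):
        if used + a[j] <= b:
            x[j] = 1.0
            used += a[j]
        else:
            x[j] = (b - used) / a[j]  # the fractional entry in (6.4)
            break
    return x, sum(cj * xj for cj, xj in zip(c, x))

# a = (2, 3, 4), c = (4, 3, 2): ratios 2 >= 1 >= 1/2, budget b = 4.
print(fractional_knapsack([2, 3, 4], [4, 3, 2], 4))  # x = (1, 2/3, 0), value 6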

The dual problem is: minimize by + y1 + · · · + yn subject to ai y + yi ≥ ci and
y, yi ≥ 0. The constraints can also be written as yi ≥ ci − ai y.
With y ≥ 0 fixed, clearly the minimum is obtained by choosing yj = max(cj − aj y, 0).
Let now y be chosen so that ci /ai ≥ y ≥ ci+1 /ai+1 . Since cj − aj y = aj (cj /aj − y)
we have that

max(cj − aj y, 0) = cj − aj y for j ≤ i, and 0 for j > i.

The minimum is thus

by + Σ_{j=1}^i (cj − aj y) = (b − Σ_{j=1}^i aj )y + Σ_{j=1}^i cj .


This states that the minimization problem can be stated as finding the minimum of
the piecewise linear function defined as (b − Σ_{j=1}^i aj )y + Σ_{j=1}^i cj on
[ci+1 /ai+1 , ci /ai ]. The slope b − Σ_{j=1}^i aj changes sign precisely at the k from
(6.3), where Σ_{j=1}^{k−1} aj ≤ b ≤ Σ_{j=1}^k aj , so the minimum occurs at y = ck /ak ,
which gives

(b − Σ_{j=1}^{k−1} aj ) ck /ak + Σ_{j=1}^{k−1} cj ,

which is the same expression we found when solving the primal problem.
The vertices are obtained by collecting all possible maxima when the ci are varied
(this gives all exposed faces, which constitute the vertices for polyhedra). By
permuting the ci in particular we see that all vectors obtained by permuting the
entries in (6.4) also are vertices. Let us take a look at how many such permutations
there are. The actual number depends on the k we found in (6.3). If the middle
number in (6.4) lies strictly in (0, 1), the number of such vertices is n · C(n − 1, k − 1):
there are n ways to place the fractional entry, and the binomial coefficient counts
the ways to place the n − k zeros among the remaining n − 1 positions.
Exercise 4.2: No. That v(R(u)) > v(Q) implies that the LP relaxation has
an optimal value which is not integral. The optimal node in the enumeration tree
may still be below u, so that we cannot prune.
Exercise 4.3: Assume that x represents a Hamilton tour. Then clearly (i) holds.
Also, if W is as described in (ii), along the Hamilton tour we must pass at least
twice between an element in W and an element outside W , so that x(δ(W )) ≥ 2.
On the other hand, suppose (i) and (ii) are fulfilled. (i) secures that x passes
through each vertex twice, so that the only possibility is to have one tour, or
several subtours. Assume the latter, and let W be the node set of one of those
subtours. Then no edges are entering or leaving W , so that x(δ(W )) = 0, contra-
dicting (ii). It follows that x represents a Hamilton tour.
Exercise 4.4:
Exercise 4.5: The constraints x(E[S]) ≤ |S| − 1 (for all S ⊆ V , S ≠ ∅, S ≠ V ) and
xe ≥ 0 force x to be the incidence vector of a forest.
We should now add the degree constraints x(δ(v)) ≤ bv (bv is the constrained
degree at v), for all v ∈ V .
Finally we should add constraints enforcing a tree. For this we can add the
constraints x(δ(W )) ≥ 1 (for all W ⊆ V , W ≠ ∅, W ≠ V ).
Exercise 4.6:

Exercise 4.7:
Exercise 4.8:
Exercises from [3]

Exercise 1: We have that

Σ_{v∈V} divx (v) = Σ_{v∈V} (Σ_{e∈δ+ (v)} x(e) − Σ_{e∈δ− (v)} x(e)) = Σ_{e=(u,v)∈E} (x(e) − x(e)) = 0,

since each e = (u, v) ∈ E satisfies (u, v) ∈ δ+ (u) and (u, v) ∈ δ− (v), and thus
appears exactly once with each sign.


Exercise 2: As stated this does not make sense, since one sums edge sets. I
suppose what is meant is to show that x(δ− (S)) = x(δ+ (S)), i.e., that the total
inflow to S equals the total outflow from S. The flow balance equations say that
Σ_{e∈δ− (v)} x(e) = Σ_{e∈δ+ (v)} x(e) for any v. Summing over v ∈ S, every edge with
both endpoints in S contributes once to each side, and these contributions cancel,
leaving

Σ_{e∈δ− (S)} x(e) = Σ_{e∈δ+ (S)} x(e),

which is the desired equality.

Exercise 3: Clearly x satisfies the bounds l ≤ x ≤ u if and only if 0 ≤ x′ ≤ u − l.
If divx (v) = 0, then

divx′ (v) = Σ_{e∈δ+ (v)} x′ (e) − Σ_{e∈δ− (v)} x′ (e)
= Σ_{e∈δ+ (v)} x(e) − Σ_{e∈δ− (v)} x(e) − Σ_{e∈δ+ (v)} l(e) + Σ_{e∈δ− (v)} l(e)
= −Σ_{e∈δ+ (v)} l(e) + Σ_{e∈δ− (v)} l(e),


which gives the required divergence of x′ .


Exercise 4: We have that Σ_{v∈V} divx (v) = 0. Also divx (v) = 0 for v ≠ s, t. It
follows that divx (s) = −divx (t). If one assumes (as one usually does) that there
is no edge entering s and no edge leaving t, we obtain

divx (s) = Σ_{e∈δ+ (s)} x(e) − Σ_{e∈δ− (s)} x(e) = Σ_{e∈δ+ (s)} x(e) = val(x),
−divx (t) = Σ_{e∈δ− (t)} x(e) − Σ_{e∈δ+ (t)} x(e) = Σ_{e∈δ− (t)} x(e),

and the result follows.


Exercise 5: The value of a flow is a continuous (in fact linear) function. The
constraints 0 ≤ x ≤ c give a closed and bounded (i.e., compact) set. It follows
from the extreme value theorem that there is a maximum flow. More generally, any
linear programming problem with box constraints has a maximum.
Exercise 6: Let x be a maximum flow. Dx then contains no x-augmenting path.
As in the proof of Theorem 1.5 we define S(x) as the set of all vertices v to which
we can find an augmenting sv-path in Dx , and we define the cut K = δ+ (S(x)).
The proof shows that the capacity of this cut equals the value of the flow, and
the result follows from Lemma 1.3.
Exercise 7:
Exercise 8:
(a) The outgoing edges from s are the edges (s, v) with b(v) > 0, and their capacities
are b(v). It follows that val(x) = Σ_{v∈V+} x(s, v) ≤ Σ_{v∈V+} b(v).
(b) Assume that a flow x in D satisfies divx = b and 0 ≤ x ≤ c. Define a flow x′
in D′ by expanding x so that x′ (s, v) = b(v) for v ∈ V+ , and x′ (v, t) = −b(v) for
v ∈ V− . The value of x′ is Σ_{v∈V+} b(v) = M , so that the value of the maximum
flow is M . x′ also satisfies the balancing equations for v ∈ V+ and for v ∈ V− ,
and thus for all v.
The other way, if the value of a maximum st-flow in D′ is M , all the edges (s, v)
must be at capacity c′ (s, v) = b(v) for v ∈ V+ . From the flow balance equations
it follows that divx (v) = b(v) for all v, and the result follows.
(c) One simply restricts x′ from E′ to E.
Exercise 9: The Ford–Fulkerson algorithm starts with the zero flow, which is
integral. Due to the integral capacities, the augmentation amount ε found by the
algorithm will at each step be integral, so that each step produces a new integral
flow. After all steps we thus end up with an integral maximum flow. A minimal
sketch is given below.
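
The sketch (in Python; an Edmonds–Karp variant of Ford–Fulkerson, with a graph given as a hypothetical list of (u, v, capacity) triples): with integer capacities, every augmentation amount is an integer, so the final flow is integral.

from collections import defaultdict, deque

def max_flow(edges, s, t):
    cap = defaultdict(int)   # residual capacities
    adj = defaultdict(set)
    for u, v, c in edges:
        cap[(u, v)] += c
        adj[u].add(v)
        adj[v].add(u)        # backward (residual) arc, initially 0
    value = 0
    while True:
        pred, queue = {s: None}, deque([s])
        while queue and t not in pred:   # BFS for an augmenting path
            u = queue.popleft()
            for v in adj[u]:
                if v not in pred and cap[(u, v)] > 0:
                    pred[v] = u
                    queue.append(v)
        if t not in pred:
            return value
        path, v = [], t
        while pred[v] is not None:
            path.append((pred[v], v))
            v = pred[v]
        eps = min(cap[e] for e in path)  # an integer, so the flow stays integral
        for u, v in path:
            cap[(u, v)] -= eps
            cap[(v, u)] += eps
        value += eps

edges = [('s', 'a', 2), ('s', 'b', 1), ('a', 't', 1), ('b', 't', 2), ('a', 'b', 1)]
print(max_flow(edges, 's', 't'))  # 3, attained by an integral flow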

Exercise 10: Since the capacities are integral, there exists a maximum flow which
is integral (Theorem 1.6). This implies that all edges have either unit or zero
flow. Due to flow conservation, each vertex has the same number of incoming
unit flow edges as outgoing unit flow edges. Start by following edges with unit
flow from s, all the way to t (if this were impossible, we would have a contradiction
to flow balancing). If one removes this st-path from D, one still has a flow of the
same type. One can in this way take out one edge-disjoint st-path at a time, until
there are no edges left.
Exercise 11: From the previous exercise it is clear that the maximum number of
edge-disjoint st-paths equals the maximum flow, which again equals the capacity
of the minimum cut, which is Σ_{e∈K} c(e) = |K| when all capacities are one.
Exercise 12: The upper and lower bounds should be defined as follows:
• For e = (ui , vj ), we set l(e) = ⌊aij ⌋, r(e) = ⌈aij ⌉.
• For e = (s, ui ), we set l(e) = ⌊ri ⌋, r(e) = ⌈ri ⌉.
• For e = (vj , t), we set l(e) = ⌊sj ⌋, r(e) = ⌈sj ⌉.
Hoffman’s circulation theorem ensures that, if a circulation exists in this graph, an
integral circulation also exists. Such an integral circulation represents a solution
to the matrix rounding problem.
Exercise 13: See the proof of Exercise 4.28.
Exercise 14: That every permutation matrix is a vertex follows directly from the
previous exercise. Integral matrices in Ωn must have exactly one 1 in each row
and column and zeros elsewhere (in order for each row and column to sum to one).
But this is equivalent to being a permutation matrix.

You might also like