
MATHEMATICAL ESSENTIALS
FOR
CONVEX OPTIMIZATION

Fatma Kılınç-Karzan, Tepper School of Business, Carnegie Mellon University

Arkadi Nemirovski, H. Milton Stewart School of Industrial and Systems Engineering, Georgia Institute of Technology
Contents

Preface x

Main Notational Conventions xiv

Part I Convex sets in Rn – From First Acquaintance to Linear Programming Duality 1

1 First acquaintance with convex sets 3


1.1 Definition and examples 3
1.1.1 Affine subspaces and polyhedral sets 4
1.1.2 Unit balls of norms 5
1.1.3 Ellipsoids 7
1.1.4 Neighborhood of a convex set 7
1.2 Inner description of convex sets: convex combinations and convex hull 7
1.2.1 Convex combinations 7
1.2.2 Convex hull 8
1.2.3 Simplex 9
1.2.4 Cones 10
1.3 Calculus of convex sets 12
1.3.1 Calculus of closed convex sets 13
1.4 Topological properties of convex sets 14
1.4.1 The closure 14
1.4.2 The interior 15
1.4.3 The relative interior 17
1.4.4 Nice topological properties of convex sets 18
1.5 ⋆ Conic and perspective transforms of a convex set 22

2 Theorems of Caratheodory, Radon, and Helly 26


2.1 Caratheodory Theorem 26
2.1.1 Caratheodory Theorem, Illustration 28
2.2 Radon Theorem 29
2.3 Helly Theorem 30
2.3.1 Helly Theorem, Illustration A 32
2.3.2 Helly Theorem, Illustration B 33
2.3.3 ⋆ Helly Theorem for infinite families of convex sets 34

3 Polyhedral representations and Fourier-Motzkin elimination 36


3.1 Polyhedral representations 36


3.2 Every polyhedrally representable set is polyhedral (Fourier-Motzkin
elimination) 38
3.2.1 Some applications 39
3.3 Calculus of polyhedral representations 40

4 General Theorem on Alternative and Linear Programming Duality 43


4.1 Homogeneous Farkas Lemma 43
4.2 Certificates for feasibility and infeasibility 44
4.3 General Theorem on Alternative 47
4.4 Corollaries of GTA 48
4.5 Application: Linear Programming Duality 50
4.5.1 Preliminaries: Mathematical and Linear Programming problems 50
4.5.2 Dual to an LP problem: the origin 52
4.5.3 Linear Programming Duality Theorem 55

5 Exercises for Part I 59


5.1 Elementaries 59
5.2 Around ellipsoids 61
5.3 Truss Topology Design 62
5.4 Around Caratheodory Theorem 66
5.5 Around Helly Theorem 69
5.6 Around Polyhedral Representations 70
5.7 Around General Theorem on Alternative 71
5.8 Around Linear Programming Duality 72

6 Proofs of Facts from Part I 76

Part II Separation Theorem, Extreme Points, Recessive Directions, and Geometry of Polyhedral Sets 81

7 Separation Theorem 83
7.1 Separation: definition 83
7.2 Separation Theorem 85

8 Consequences of Separation Theorem 90


8.1 Supporting hyperplanes 90
8.2 Extreme points and Krein-Milman Theorem 91
8.2.1 Extreme points: definition 91
8.2.2 Krein-Milman Theorem 92
8.3 Recessive directions and recessive cone 96
8.4 Dual cone 100
8.5 ⋆ Dubovitski-Milutin Lemma 104
8.6 Extreme rays and conic Krein-Milman Theorem 107
8.7 ⋆ Polar of a convex set 110

9 Geometry of polyhedral sets 112



9.1 Extreme points of polyhedral sets 112


9.1.1 Important polyhedral sets and their extreme points 115
9.2 Extreme rays of polyhedral cones 120
9.3 Geometry of polyhedral sets 121
9.3.1 Application: Descriptive theory of Linear Programming 124
9.3.2 Application: Solvability of a Linear Programming problem 124
9.3.3 Proof of Theorem II.9.10 126
9.4 ⋆ Majorization 128

10 Exercises for Part II 131


10.1 Separation 131
10.2 Extreme points 132
10.3 Cones and extreme rays 135
10.4 Recessive cone 136
10.5 Around majorization 137
10.6 Around polars 137
10.7 Miscellaneous exercises 138

11 Proofs of Facts from Part II 145

Part III Convex Functions 153

12 First acquaintance with convex functions 155


12.1 Definition and examples 155
12.2 Jensen’s inequality 157
12.3 Convexity of sublevel sets 158
12.4 Value of a convex function outside its domain 159

13 How to detect convexity 161


13.1 Operations preserving convexity of functions 161
13.2 Criteria of convexity 165
13.2.1 Differential criteria of convexity 166
13.3 Important multivariate convex functions 170
13.4 Gradient inequality 173
13.5 Boundedness and Lipschitz continuity of a convex function 174

14 Minima and maxima of convex functions 179


14.1 Minima of convex functions 179
14.1.1 Unimodality 179
14.1.2 Necessary and sufficient optimality conditions 180
14.1.3 ⋆ Symmetry Principle 183
14.2 Maxima of convex functions 185

15 Subgradients 188
15.1 Proper lower semicontinuous convex functions and their representation 188
15.2 Subgradients 198
15.3 Subdifferentials and directional derivatives of convex functions 204

16 ⋆ Legendre transform 209


16.1 Legendre transform : Definition and examples 209
16.2 Legendre transform : Main properties 211
16.3 Young, Hölder, and Moment inequalities 213
16.3.1 Young’s inequality 213
16.3.2 Hölder’s inequality 214
16.3.3 Moment inequality 215

17 ⋆ Functions of eigenvalues of symmetric matrices 217

18 Exercises for Part III 222


18.1 Around convex functions 222
18.2 Around support, characteristic, and Minkowski functions 226
18.3 Around subdifferentials 229
18.4 Around Legendre transform 230
18.5 Miscellaneous exercises 230

19 Proofs of Facts from Part III 233

Part IV Convex Programming, Lagrange Duality, Saddle Points 237

20 Convex Programming problems and Convex Theorem on Alternative 239
20.1 Mathematical Programming and Convex Programming problems 239
20.1.1 Convex Programming problem 240
20.2 Convex Theorem on Alternative 240
20.3 ⋆ Convex Theorem on Alternative – cone-constrained form 243

21 Lagrange Function and Lagrange Duality 253


21.1 Lagrange function 253
21.2 Convex Programming Duality Theorem 253
21.3 Lagrange duality and saddle points 255

22 ⋆ Convex Programming in cone-constrained form 257


22.1 Convex problem in cone-constrained form 257
22.2 Cone-constrained Lagrange function 257
22.3 Convex Programming Duality Theorem in cone-constrained form 258
22.4 Conic Programming and Conic Duality Theorem 260

23 Optimality Conditions in Convex Programming 268


23.1 Saddle point form of optimality conditions 268
23.2 Karush-Kuhn-Tucker form of optimality conditions 271
23.3 Cone-constrained KKT optimality conditions 273
23.4 Optimality conditions in Conic Programming 276

24 Duality in Linear and Convex Quadratic Programming 278



24.1 Linear Programming Duality 278


24.2 Quadratic Programming Duality 279

25 ⋆ Cone-convex functions: elementary calculus and examples 283


25.1 Epigraph characterization of cone-convexity 283
25.2 Testing cone-convexity and cone-monotonicity 284
25.2.1 Cone-monotonicity 284
25.2.2 Differential criteria for cone-convexity and cone-monotonicity 284
25.3 Elementary calculus of cone-convexity 287

26 ⋆ Mathematical Programming Optimality Conditions 290


26.1 Formulating Optimality conditions 291
26.2 Justifying Optimality conditions 292
26.2.1 Main tool: Implicit Function Theorem 293
26.2.2 Strategy 293
26.2.3 Justifying optimality conditions for (26.4) 294
26.2.4 Justifying Propositions IV.26.3, IV.26.4 298
Justifying Proposition IV.26.3 299
Justifying Proposition IV.26.4 299
26.3 Concluding remarks 300
26.3.1 Illustration: S-Lemma revisited 302

27 Saddle points 304


27.1 Definition and Game Theory interpretation 304
27.2 Existence of Saddle Points: Sion-Kakutani Theorem 307
27.3 Proof of Sion-Kakutani Theorem 309
27.3.1 Proof of Minimax Lemma 309
27.3.2 From Minimax Lemma to the proof of Sion-Kakutani Theorem 310
27.4 Sion-Kakutani Theorem: A refinement 311

28 Exercises for Part IV 314


28.1 Around Conic Duality 314
28.1.1 ⋆ Geometry of primal-dual pair of conic problems 317
28.2 Around S-Lemma 319
28.3 Miscellaneous exercises 321
28.4 Around convex cone-constrained and conic problems 325
28.5 ⋆ Cone-convexity 331
28.6 ⋆ Around conic representations of sets and functions 334
28.6.1 Conic representations: definitions 334
28.6.2 Conic representability: elementary calculus 335
28.6.3 R/L/S hierarchy 337
28.6.4 More calculus 338
28.6.5 Raw materials 338

29 Proofs of Facts from Part IV 343

Appendix A Prerequisites from Linear Algebra 346


A.1 Space Rn : algebraic structure 346
A.1.1 A point in Rn 346

A.1.2 Linear operations 346


A.2 Linear subspaces 347
A.2.1 Examples of linear subspaces 347
A.2.2 Sums and intersections of linear subspaces 349
A.2.3 Linear independence, bases, dimensions 349
A.2.4 Linear mappings and matrices 352
A.2.5 Determinant and rank 353
A.2.6 Determinant 353
A.2.7 Rank 354
A.3 Space Rn : Euclidean structure 355
A.3.1 Euclidean structure 355
A.3.2 Inner product representation of linear forms on Rn 356
A.3.3 Orthogonal complement 357
A.3.4 Orthonormal bases 358
A.4 Affine subspaces in Rn 361
A.4.1 Affine subspaces and affine hulls 361
A.4.2 Intersections of affine subspaces, affine combinations, and affine hulls 362
A.4.3 Affinely spanning sets, affinely independent sets, affine dimension 363
A.4.4 Dual description of linear subspaces and affine subspaces 367
A.4.5 Affine subspaces and systems of linear equations 367
A.4.6 Structure of the simplest affine subspaces 369
A.5 Exercises 370
A.6 Proofs of Facts 371

Appendix B Prerequisites from Real Analysis 372


B.1 Space Rn : metric structure and topology 372
B.1.1 Euclidean norm and distances 372
B.1.2 Convergence 375
B.1.3 Closed and open sets 377
B.1.4 Local compactness of Rn 379
B.2 Continuous functions on Rn 381
B.2.1 Continuity of a function 381
B.2.2 Elementary continuity-preserving operations 382
B.2.3 Basic properties of continuous functions on Rn 383
B.3 Semicontinuity 385
B.3.1 Functions with values in the extended real axis 385
B.3.2 Epigraph of a function 386
B.3.3 Lower semicontinuity 386
B.3.4 Hypograph and upper semicontinuity 388
B.4 Exercises 388
B.5 Proofs of Facts 389

Appendix C Prerequisites from Calculus 391


C.1 Differentiable functions on Rn 391
C.1.1 The derivative 391
C.1.2 Derivative and directional derivatives 393
C.1.3 Representations of the derivative 395
C.1.4 Existence of the derivative 396
C.1.5 Calculus of derivatives 397
C.1.6 Computing the derivative 398

C.2 Higher order derivatives 400


C.2.1 Calculus of Ck mappings 404
C.2.2 Examples of higher-order derivatives 405
C.2.3 Taylor expansion 406

Appendix D Prerequisites: Symmetric Matrices and Positive Semidefinite Cone 408
D.1 Symmetric matrices 408
D.1.1 Main facts on symmetric matrices 409
D.1.2 Variational characterization of eigenvalues 412
D.1.3 Corollaries of the VCE 414
D.1.4 Spectral norm and Lipschitz continuity of vector of eigenvalues 416
D.1.5 Functions of symmetric matrices 417
D.2 Positive semidefinite matrices and positive semidefinite cone 420
D.2.1 Positive semidefinite matrices. 420
D.2.2 The positive semidefinite cone 422
D.3 Exercises 426
D.4 Proofs of Facts 427
References 433
Index 434

Solutions to Selected Exercises 1


Exercises from Part I 2
Exercises from Part II 37
Exercises from Part III 72
Exercises from Part IV 102
Exercises from Appendix A 157
Exercises from Appendix B 160
Exercises from Appendix D 162
Preface

Convex optimization serves as a cornerstone in various fields of science, engineering, and mathematics, offering powerful tools for solving a wide range of practical problems. With the latest advancements in data sciences and engineering, convex optimization has flourished into a vibrant and rapidly evolving field.
This textbook aims to introduce fundamental theory necessary to establish a
robust foundation for doing research on convex optimization. In particular, we
have selected to cover both the indispensable basics suited for beginners – rooted
in centuries-old research on convexity – as well as modern facets of convex op-
timization, e.g., cone-constrained conic programming, targeting more advanced
readers.
Our emphasis is on foundations and mathematical prerequisites that underpin
(primarily Convex) Optimization Theory, and not operational aspects like Mod-
eling and Algorithms. This deliberate choice stems from our desire to emphasize “timeless and essential classics” and to provide an accessible, self-contained,
concise, rigorous, yet practical mathematical toolkit. Our goal is to illuminate
the entrance to the field of convex optimization, offering readers the background
necessary to engage with and comprehend the state-of-the-art “operational” op-
timization literature, such as the excellent books [BV04, Nes18]. While applications and
algorithms naturally evolve with changing trends and advancements, they will
always rely on these timeless foundational blocks. Overall, we view the primary
purpose of this book as learning and teaching, as opposed to an extensive reference to be kept on the shelf by experts.
To an expert in the field: the primary focus of this book is Convex Analysis
and the basic theory of convex optimization. Convex Analysis boasts a rich and
profound theoretical framework, chronicled in classical treatises such as [Roc70,
IT79, HUL93]. However, our approach and presentation style here are tailored
to meet the needs of those new to the subject. While we have condensed the
scope compared to these classical resources, we have strived to maintain rigor
and cover the fundamental descriptive mathematical groundwork that we believe
is necessary for state-of-the-art research in Convex Optimization models and
algorithms. With regard to convex optimization theory, once again our emphasis
is on timeless building blocks such as duality and optimality conditions.
Our intended audience consists of students, practitioners, and researchers with backgrounds in mathematics, operations research, engineering, computer science, data sciences, statistics, and economics.
As prerequisites all we assume is basic knowledge of Linear Algebra and Calcu-
lus. In fact, we do not anticipate a deep, “ready-to-use” mastery of these subjects
either; rather, we expect a basic mathematical culture. To clarify our expecta-
tions, consider the following: asserting that 2×2 = 5 does not necessarily indicate
a deficiency in mathematical culture; it may simply be a miscalculation. In con-
trast, claiming that 2 × 2 is a triangle or a violin (occasionally encountered, albeit
perhaps figuratively rather than literally, in our teaching experience) does signify
a lack of mathematical culture: under any circumstance, the product of two real
numbers should invariably yield another real number and cannot be a triangle or
a violin.
Our choice of material is driven by years of experience teaching graduate-level
courses on Nonlinear and Convex Optimization. We have organized this material
into four main parts:
• basics on convex sets – instructive examples, “calculus” (convexity-preserving
operations), main theorem on convex sets such as Caratheodory and Helly
theorems, topology of convex sets, “descriptive basics” of Linear Programming
(General Theorem on Alternative, Linear Programming duality),
• separation theorem and its applications – extreme points, extreme rays, reces-
sive directions, and (finite-dimensional) Krein-Milman Theorem, structure of
polyhedral sets,
• basics on convex functions – instructive examples, detecting convexity, “cal-
culus,” subgradient inequality and preliminaries on subgradients, maxima and
minima, Legendre transformation,
• basic theory of Convex Optimization – Lagrange Duality and Lagrange Duality
Theorem for problems in standard form and in cone-constrained form, Conic
Programming and Conic Duality Theorem, optimality conditions in Mathe-
matical Programming, and convex-concave saddle points and Sion-Kakutani
Theorem.
We envision that in a semester-long course on convex optimization, this material
may cover about 40% to 60% of the course for building the foundational blocks
before moving to other parts (modeling and/or algorithms) of convex optimiza-
tion. For a course more geared towards linear and/or combinatorial optimization,
an instructor may opt to use specific material from the first two parts.
For the reader’s convenience, the elementary facts from Linear Algebra, Calculus,
Real Analysis, and Matrix Analysis are summarized in the appendices of the
textbook (reproducing, courtesy of World Scientific Publishing Co., appendices
A – C in [Nem24]). In contrast to the main body of the textbook, these appen-
dices usually do not feature accompanying proofs, which are readily available
in standard undergraduate textbooks covering the respective subjects (see e.g.,
[Axl15, Edw12, Gel89, Pas22, Rud13, Str06]). A well-prepared reader may opt for a “fast-forward” approach by initially reviewing these appendices before delving into the main body of the book. Alternatively, they may commence their reading journey from Part I, referring back to the appendices as necessary.
Certain sections in our text, with titles starting with ⋆, delve into more advanced
and specialized topics, such as Conic, Perspective, and Legendre transforms,
Majorization, Cone-convexity, Cone-monotonicity, among others. Although these
starred sections hold significance in their respective domains, they are designed
to be optional and can be skipped over depending on the goals of the reader.
Our exposition adheres to the usual standards of rigor needed to present math-
ematical subjects. Accordingly, we provide complete formal proofs for all of the
theorems, propositions, lemmas, and the like. In addition to these, we include
formal statements of similar nature labeled as “Facts” scattered throughout a
chapter, but without accompanying proofs. The claims made in these “Facts”
are also compulsory part of our exposition, and their knowledge is as manda-
tory for mastering the material as the knowledge of theorems, propositions, etc.
Nonetheless, the statements within “Facts” are sufficiently elementary to be eas-
ily verified by a diligent reader. In essence, “Facts” serve as embedded exercises,
and we firmly believe that engaging with these exercises as they appear in the
text provides valuable hands-on practice for effective learning. This active partic-
ipation is an indispensable facet of mastering the presented material and honing
mathematical skills. At the same time, we recognize the importance of providing
access to detailed self-contained proofs of “Facts.” To this end, nearly all “Facts” (the few exceptions are absolutely straightforward and presented as Facts solely for the sake of further reference) are repeated, this time with accompanying proofs, at the end of the respective parts.
The exercises are crafted to align with our educational objective of fostering
hands-on learning and providing ample practice opportunities at various diffi-
culty levels. In particular, they are categorized into three types. The traditional
“Test Yourself” exercises enable readers to evaluate their grasp of the material
presented in the main body of the textbook. In addition to these, we also offer
“Try Yourself” exercises, which typically require readers to prove some-
thing, aimed at fostering and assessing their creative skills. Finally, our “Educate
Yourself” exercises address topics that we deem significant, extending beyond the
core material covered in the textbook’s main body. An illustrative example of this type of exercise is the investigation of conic representations of convex sets
and functions, along with the calculus associated with these representations. This
plays a crucial role in formulating and solving “well-structured” convex optimiza-
tion problems, particularly those involving Second Order Conic and Semidefinite
programs. We provide a separate solution manual for “Try Yourself” and “Edu-
cate Yourself” exercises.
Acknowledgements. The main body of this textbook existed for about 25 years, in a more restricted form, as appendices to the graduate course on Modern Convex Optimization taught by the second author first at TU Delft (1998) and then
at Georgia Institute of Technology (since 2003). The first author was fortunate to
have been a student at Georgia Tech thoroughly enjoying this material, and later
on she adopted this material and has been teaching a similar course at Carnegie
Mellon University (since 2012). These appendices originate from the descriptive
part of the graduate course “Optimization II” designed in the 1980s by Prof. Aharon
Ben-Tal and for over 20 years taught by him at the Technion – Israel Institute
of Technology. This course was “inherited” and taught, in re-designed form, by
the second author first at the Technion, and then at Georgia Institute of Tech-
nology. It is our pleasure to acknowledge hereby, with the deepest gratitude, the
instrumental role played by Prof. Ben-Tal in selecting and structuring significant
part of the material to follow. Besides this, we are greatly indebted to Dr. Sergei
Gelfand for the idea to convert these appendices into a “standalone” textbook.

Fatma Kılınç-Karzan, Tepper School of Business, Carnegie Mellon University


Arkadi Nemirovski, H. Milton Stewart School of Industrial and Systems
Engineering, Georgia Institute of Technology
February 2024

Main Notational Conventions

N, Z, Q, R, C stand for the sets of, respectively, nonnegative integers, integers, rational numbers, real numbers, and complex numbers.
Vectors and matrices. By default, all vectors are column vectors.

• The space of all n-dimensional vectors with real entries is denoted by Rn, and the set of all m × n matrices with real entries is denoted by Rm×n; notation Nn, . . . , Cm×n is interpreted similarly, where we restrict the entries to belong to the respective number domains. The set of symmetric n × n matrices is denoted by Sn. By default, all vectors and matrices have real entries, and when speaking about Rn and Sn (Rm×n), n (m and n) are positive integers.
By default, notation like xi (or yk) refers to the i-th entry of a context-specified vector x (or the k-th entry of a context-specified vector y). Similarly, notation like xij (or Ykℓ) refers to the (i, j)-th entry of a context-specified matrix x (or the (k, ℓ)-th entry of a context-specified matrix Y).
• Sometimes, “MATLAB notation” is used to save space: a vector with coordinates x1, . . . , xn is written down as

    x = [x1; . . . ; xn]

(pay attention to the semicolon “;”). For example, the 3-dimensional column vector with entries 1, 2, 3 is written as [1; 2; 3].
More generally,
— if A1, . . . , Am are matrices with the same number of columns, we write [A1; . . . ; Am] to denote the matrix obtained by writing A2 beneath A1, A3 beneath A2, and so on;
— if A1, . . . , Am are matrices with the same number of rows, then [A1, . . . , Am] stands for the matrix obtained by writing A2 to the right of A1, A3 to the right of A2, and so on.
Examples (a small NumPy sketch of these conventions is given right after this list):
• A1 = [1, 2, 3; 4, 5, 6], A2 = [7, 8, 9] =⇒ [A1; A2] = [1, 2, 3; 4, 5, 6; 7, 8, 9]
• A1 = [1, 2; 3, 4], A2 = [7; 8] =⇒ [A1, A2] = [1, 2, 7; 3, 4, 8]
• [1, 2, 3, 4] = [1; 2; 3; 4]⊤
• [[1, 2; 3, 4], [5, 6; 7, 8]] = [1, 2, 5, 6; 3, 4, 7, 8]
• We follow the standard convention that the sum of vectors over an empty set of indexes, i.e., ∑_{i=1}^{0} xi, where xi ∈ Rn, has a value – it is the origin in Rn.
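For readers who like to double-check such conventions experimentally, the stacking rules above have direct NumPy counterparts (a minimal illustrative sketch; NumPy is not used anywhere in the book itself):

```python
import numpy as np

# [A1; A2]: matrices with the same number of columns are stacked vertically
A1 = np.array([[1, 2, 3],
               [4, 5, 6]])
A2 = np.array([[7, 8, 9]])
vert = np.vstack([A1, A2])      # [1, 2, 3; 4, 5, 6; 7, 8, 9]

# [A1, A2]: matrices with the same number of rows are placed side by side
B1 = np.array([[1, 2],
               [3, 4]])
B2 = np.array([[7],
               [8]])
horiz = np.hstack([B1, B2])     # [1, 2, 7; 3, 4, 8]

# [1, 2, 3, 4] = [1; 2; 3; 4]^T: a row vector is the transpose of a column vector
row = np.array([[1, 2, 3, 4]])
col = np.array([[1], [2], [3], [4]])
assert np.array_equal(row, col.T)
```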

Intervals in Rn. Given two vectors x, y ∈ Rn, we use the notation [x, y] to denote the segment in Rn that connects x and y, where both endpoints are included, i.e., [x, y] := {λx + (1 − λ)y : 0 ≤ λ ≤ 1}. Similarly, we define the open segment (x, y) := {λx + (1 − λ)y : 0 < λ < 1} in Rn without the endpoints.
Semidefinite order. Relations A ⪰ B, B ⪯ A, A − B ⪰ 0, B − A ⪯ 0 all
mean the same, namely, that A, B are real symmetric matrices of the same size
such that the difference A − B is positive semidefinite. Positive definiteness of
the difference of A − B is expressed by every one of the relations A ≻ B, B ≺ A,
A − B ≻ 0, B − A ≺ 0.
Diag and Dg. For x ∈ Rn, Diag{x} stands for the diagonal n × n matrix with the entries of x on the diagonal. For a collection X1, . . . , XK of rectangular matrices, Diag{X1, . . . , XK} stands for the block-diagonal matrix with diagonal blocks X1, . . . , XK. For an n × n matrix X, Dg{X} stands for the n-dimensional vector composed of the diagonal entries of X.
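Purely as an illustration (again in NumPy/SciPy terms), Diag{·} and Dg{·} correspond to np.diag and scipy.linalg.block_diag:

```python
import numpy as np
from scipy.linalg import block_diag

x = np.array([1.0, 2.0, 3.0])
D = np.diag(x)                   # Diag{x}: diagonal matrix with the entries of x on the diagonal

X1 = np.ones((2, 2))
X2 = np.array([[5.0]])
B = block_diag(X1, X2)           # Diag{X1, X2}: block-diagonal matrix with blocks X1, X2

X = np.arange(9.0).reshape(3, 3)
d = np.diag(X)                   # Dg{X}: vector of the diagonal entries of X
```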

Extended real axis. We follow the standard conventions on operations of summation, multiplication, and comparison in the “extended real line” R ∪ {+∞} ∪ {−∞}. These conventions are as follows:
• Operations with real numbers are understood in their usual sense.
• The sum of +∞ and a real number, same as the sum of +∞ and +∞ is +∞.
Similarly, the sum of a real number and −∞, same as the sum of −∞ and −∞
is −∞. The sum of +∞ and −∞ is undefined.
• The product of a real number and +∞ is +∞, 0 or −∞, depending on whether
the real number is positive, zero or negative, and similarly for the product of
a real number and −∞. The product of two “infinities” is again infinity, with
the usual rule for assigning the sign to the product.
• Finally, any real number is < +∞ and > −∞, and of course −∞ < ∞.

Abbreviations. From time to time we use the following abbreviations:


a.k.a. for “also known as”
iff for “if and only if”
w.l.o.g. for “without loss of generality”
w.r.t. for “with respect to”

Symbols ▲ and ♦ mark, respectively, “try yourself” and “educate yourself” exercises.
Part I

Convex sets in Rn – From First Acquaintance to Linear Programming Duality

1
First acquaintance with convex sets

1.1 Definition and examples


In school geometry, a figure is called convex if it contains, along with every
pair of its points x, y, also the entire segment [x, y] linking the points. This is
exactly the definition of a convex set in the multidimensional case; all we need
is to say what “the segment [x, y] linking the points x, y ∈ Rn ” is. We state this
formally in the following definition.

Definition I.1.1 [Convex set]


1) Let x, y be two points in Rn . The set
[x, y] := {λx + (1 − λ)y : 0 ≤ λ ≤ 1}
is called a segment with the endpoints x, y.
2) A subset M of Rn is called convex, if it contains, along with every pair
of its points x, y, also the entire segment [x, y]:
x, y ∈ M, 0 ≤ λ ≤ 1 =⇒ λx + (1 − λ)y ∈ M.

The definition of the segment [x, y] is in full accordance with our “real life ex-
perience” in 2D or 3D: when λ ∈ [0, 1], the point x(λ) = λx + (1 − λ)y =
x + (1 − λ)(y − x) is the point where you arrive when traveling from x directly
towards y after you have covered the fraction (1 − λ) of the entire distance from x
to y, and these points compose the “real world segment” with endpoints x = x(1)
and y = x(0).
Note that an empty set is convex by the exact sense of the definition: for the
empty set, you cannot present a counterexample to show that it is not convex.
A closed ray given by a direction 0 ̸= d ∈ Rn is also convex:

R+ (d) := {t d ∈ Rn : t ≥ 0} .

Note also that the open ray given by {t d ∈ Rn : t > 0} is convex as well.

Figure I.1. a – d): convex sets; e – h): nonconvex sets


We next continue with a number of examples of convex sets.

1.1.1 Affine subspaces and polyhedral sets


We start with a simple and important fact.

Proposition I.1.2 The solution set of an arbitrary (possibly, infinite) system

    aα⊤ x ≤ bα, α ∈ A    (1.1)

of nonstrict linear inequalities with n unknowns x, i.e., the set

    S := {x ∈ Rn : aα⊤ x ≤ bα, α ∈ A},

is convex.
Proof. Consider any x′, x′′ ∈ S and any λ ∈ [0, 1]. As x′, x′′ ∈ S, we have aα⊤ x′ ≤ bα and aα⊤ x′′ ≤ bα for any α ∈ A. Then, for every α ∈ A, multiplying the inequality aα⊤ x′ ≤ bα by λ, and the inequality aα⊤ x′′ ≤ bα by 1 − λ, respectively, and summing up the resulting inequalities, we get aα⊤ [λx′ + (1 − λ)x′′] ≤ bα. Thus, we deduce that λx′ + (1 − λ)x′′ ∈ S.
Note that this verification of convexity of S works also in the case when, in the definition of S, some of the nonstrict inequalities aα⊤ x ≤ bα are replaced with their strict versions aα⊤ x < bα.
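The argument in the proof is easy to sanity-check numerically. The following minimal sketch (the system A, b and the two feasible points are made up for illustration) verifies that a convex combination of two solutions of a finite system Ax ≤ b solves the system as well:

```python
import numpy as np

# a concrete finite system A x <= b in R^2: the unit box cut by the halfspace x1 + x2 <= 1.5
A = np.array([[ 1.0,  0.0],
              [-1.0,  0.0],
              [ 0.0,  1.0],
              [ 0.0, -1.0],
              [ 1.0,  1.0]])
b = np.array([1.0, 1.0, 1.0, 1.0, 1.5])

def feasible(x, tol=1e-12):
    return bool(np.all(A @ x <= b + tol))

x1 = np.array([ 1.0, 0.5])   # a solution of the system
x2 = np.array([-1.0, 1.0])   # another solution
assert feasible(x1) and feasible(x2)

# every convex combination lam*x1 + (1 - lam)*x2, 0 <= lam <= 1, is again a solution
for lam in np.linspace(0.0, 1.0, 11):
    assert feasible(lam * x1 + (1 - lam) * x2)
```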
Recall that linear and affine subspaces can be represented as the solution sets
of systems of linear equations (Proposition A.47). Consequently, from Proposi-
tion I.1.2 we deduce that such sets are convex.
Example I.1.1 All linear subspaces and all affine subspaces of Rn are convex.

Another important special case of Proposition I.1.2 is the one when we have
a finite system of nonstrict linear inequalities. Such sets have a special name as
they are frequently encountered and studied.

Definition I.1.3 [Polyhedral set] A set in Rn is called polyhedral if it is the solution set of a finite system

    Ax ≤ b

of m nonstrict linear inequalities with n variables (i.e., A is an m × n matrix) for some nonnegative integer m.

Based on this definition and as an immediate consequence of Proposition I.1.2,


we arrive at our second generic example of convex sets.
Example I.1.2 Any polyhedral set in Rn is convex. ♢
Remark I.1.4 Note that every set given by Proposition I.1.2 is not only convex,
but also closed (why?). In fact, Separation Theorem (see Theorem II.7.3) implies
the following:
Every closed convex set in Rn is the solution set of an infinite system ai⊤ x ≤ bi, i = 1, 2, . . ., of nonstrict linear inequalities.

Remark I.1.5 Replacing some of the nonstrict linear inequalities aα⊤ x ≤ bα in system (1.1) with their strict versions aα⊤ x < bα preserves, as we have already mentioned, convexity of the solution set, but can destroy its closedness. ■

1.1.2 Unit balls of norms


Let ∥ · ∥ be a norm on Rn, i.e., a real-valued function on Rn satisfying the three
characteristic properties of a norm (section B.1.1), specifically:
1. Positivity: ∥x∥ ≥ 0 for all x ∈ Rn , and ∥x∥ = 0 if and only if x = 0;
2. Homogeneity: For x ∈ Rn and λ ∈ R, we have ∥λx∥ = |λ|∥x∥;
3. Triangle inequality: For all x, y ∈ Rn , we have ∥x + y∥ ≤ ∥x∥ + ∥y∥.

Fact I.1.6 The unit ball of a norm ∥ · ∥, i.e., the set


{x ∈ Rn : ∥x∥ ≤ 1} ,
same as every other ∥ · ∥-ball
Br (a) := {x ∈ Rn : ∥x − a∥ ≤ r} ,
(here a ∈ Rn and r ≥ 0 are fixed) is convex.
In particular, Euclidean balls (∥ · ∥-balls associated with the standard Euclidean norm ∥x∥2 := √(x⊤x)) are convex.

The standard examples of norms on Rn are the ℓp-norms

    ∥x∥p := ( ∑_{i=1}^{n} |xi|^p )^{1/p} for 1 ≤ p < ∞,   and   ∥x∥∞ := max_{1≤i≤n} |xi|.

These indeed are norms (which is not clear in advance; for proof, see page 156,
and for more details – page 215). When p = 2, we get the usual Euclidean norm.
When p = 1, we get

    ∥x∥1 = ∑_{i=1}^{n} |xi|,

and its unit ball is the hyperoctahedron

    V = {x ∈ Rn : ∑_{i=1}^{n} |xi| ≤ 1}.

When p = ∞, we get

    ∥x∥∞ = max_{1≤i≤n} |xi|,

and its unit ball is the hypercube

    V = {x ∈ Rn : −1 ≤ xi ≤ 1, 1 ≤ i ≤ n},

see Figure I.2.

Figure I.2. ∥ · ∥p -balls in 2D, p = 1 (diamond), p = 2 (circle), p = ∞ (box).


Remark I.1.7 As we have already mentioned, the fact that the ℓp norms,
1 ≤ p ≤ ∞, indeed are norms, is not completely trivial and will be proved in
full generality later. What is evident is that ∥ · ∥p does possess the properties of positivity and homogeneity; what requires effort is the triangle inequality. There are, however, two special cases, p = 1 and p = ∞, where this inequality is easy. Indeed, from high school you know that for reals a, b it always holds that |a + b| ≤ |a| + |b|. It follows that

    ∥x + y∥1 = ∑_i |xi + yi| ≤ ∑_i (|xi| + |yi|) = ∑_i |xi| + ∑_i |yi| = ∥x∥1 + ∥y∥1,

and

    ∥x + y∥∞ = max_i |xi + yi| ≤ max_i {|xi| + |yi|} ≤ max_{i,j} {|xi| + |yj|} = max_i |xi| + max_j |yj| = ∥x∥∞ + ∥y∥∞.

The triangle inequality for the Euclidean norm ∥ · ∥2 should be already known to the reader; it is an immediate consequence of the Cauchy-Schwarz inequality |x⊤y| ≤ ∥x∥2 ∥y∥2, see section B.1.1. ■
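As a quick numerical companion to the remark (a spot check on random vectors, of course not a proof), the triangle inequality ∥x + y∥p ≤ ∥x∥p + ∥y∥p can be tested for p = 1, 2, ∞ with numpy.linalg.norm:

```python
import numpy as np

rng = np.random.default_rng(1)
for p in (1, 2, np.inf):
    for _ in range(1000):
        x, y = rng.normal(size=10), rng.normal(size=10)
        lhs = np.linalg.norm(x + y, ord=p)
        rhs = np.linalg.norm(x, ord=p) + np.linalg.norm(y, ord=p)
        assert lhs <= rhs + 1e-12    # ||x + y||_p <= ||x||_p + ||y||_p
```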

Fact I.1.8 Unit balls of norms on Rn are exactly the same as convex sets
V in Rn satisfying the following three properties:
(i) V is symmetric with respect to the origin: x ∈ V =⇒ −x ∈ V ;
(ii) V is bounded and closed;
(iii) V contains a neighborhood of the origin, i.e., there exists r > 0 such that the Euclidean ball of radius r centered at the origin, i.e., the set {x ∈ Rn : ∥x∥2 ≤ r}, is contained in V.
Any set V satisfying the outlined properties is indeed the unit ball of a particular norm, given by

    ∥x∥V := inf_{t>0} {t : t−1 x ∈ V}.    (1.2)
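Formula (1.2) also suggests a simple computational recipe: if V is available through a membership oracle, then, by properties (i)–(iii), the predicate “t−1x ∈ V” is monotone in t > 0, so ∥x∥V can be approximated by bisection. A rough sketch, assuming a user-supplied oracle in_V (here V is taken to be the unit ℓ1-ball, so the result should be ∥x∥1):

```python
import numpy as np

def in_V(x):
    # membership oracle for V; here V = {x : sum_i |x_i| <= 1}, the unit l1-ball
    return float(np.sum(np.abs(x))) <= 1.0

def gauge(x, in_V, t_hi=1e6, iters=80):
    """Approximate ||x||_V = inf{t > 0 : x/t in V} by bisection on t."""
    if np.all(x == 0):
        return 0.0
    t_lo = 0.0
    for _ in range(iters):
        t = 0.5 * (t_lo + t_hi)
        if in_V(x / t):
            t_hi = t     # x/t in V: the infimum is <= t
        else:
            t_lo = t     # x/t not in V: the infimum is > t
    return t_hi

x = np.array([1.0, -2.0, 0.5])
print(gauge(x, in_V))    # ~3.5 = ||x||_1 for this particular V
```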

1.1.3 Ellipsoids

Fact I.1.9 Let Q be an n × n matrix which is symmetric (i.e., Q = Q⊤) and positive definite (i.e., x⊤Qx > 0 for all x ̸= 0). Then, for every nonnegative r, the Q-ellipsoid of radius r centered at a, i.e., the set

    {x ∈ Rn : (x − a)⊤ Q(x − a) ≤ r²},

is convex.

1.1.4 Neighborhood of a convex set


Example I.1.3 Let M be a nonempty convex set in Rn , and let ϵ > 0. Then,
for every norm ∥ · ∥ on Rn, the ϵ-neighborhood of M, i.e., the set

    Mϵ := {y ∈ Rn : inf_{x∈M} ∥y − x∥ ≤ ϵ}

is convex. ♢
Justification of Example I.1.3 is left as an exercise at the end of this Part (see
Exercise I.6).

1.2 Inner description of convex sets: convex combinations and convex hull

1.2.1 Convex combinations

Recall the notion of a linear combination x of vectors x1, . . . , xm; this is a vector represented as

    x = ∑_{i=1}^{m} λi xi,

where λi ∈ R are the coefficients. By including a specific restriction on which


coefficients can be used in this definition, we arrive at important special types of
linear combinations. For example, an affine combination is a linear combination
where the sum of the coefficients is equal to 1. Given a nonempty set X, the
smallest (w.r.t. inclusion) affine plane containing X is composed of all affine
combinations of the points of X, see section A.4.2. Another beast in this genre is
convex combination.

Definition I.1.10 [Convex combination] A convex combination of vectors


x1, . . . , xm is a linear combination

    x := ∑_{i=1}^{m} λi xi,

with nonnegative coefficients summing up to 1:

    λi ≥ 0 ∀i = 1, . . . , m,   ∑_{i=1}^{m} λi = 1.

Equivalently, a convex combination is an affine combination with nonnegative coefficients.
By Linear Algebra, a nonempty set X ⊆ Rn is a linear (or an affine) sub-
space if and only if X is closed with respect to taking all linear, respectively, all
affine combinations of its elements. Convex combinations play a similar role when
speaking about convex sets.

Fact I.1.11 A set M ⊆ Rn is convex if and only if it is closed with respect


to taking all convex combinations of its elements. That is, M is convex if and
only if every convex combination of vectors from M is again a vector from
M.
Hint: Note that assuming λ1, . . . , λm > 0, one has

    ∑_{i=1}^{m} λi xi = λ1 x1 + (λ2 + λ3 + . . . + λm) ∑_{i=2}^{m} µi xi,   where µi := λi / (λ2 + λ3 + . . . + λm)

(cf. Corollary A.39).

1.2.2 Convex hull


Recall that taking the intersection of linear subspaces results in another linear
subspace. The same property holds true for convex sets as well (why?).

Proposition I.1.12 Let {Mα}α be an arbitrary family of convex subsets of Rn. Then, their intersection

    M = ⋂_α Mα

is also convex.

As an immediate consequence of Proposition I.1.12, we come to the notion of


convex hull Conv(M ) of a subset M ⊆ Rn (cf. the notions of linear/affine span):

Definition I.1.13 [Convex hull] For any M ⊆ Rn, the convex hull of M [notation: Conv(M)] is the intersection of all convex sets containing M (and thus, by Proposition I.1.12, Conv(M) is the smallest (w.r.t. inclusion) convex set containing M).

By Linear Algebra, the linear span of a set M – the smallest (w.r.t. inclusion)
linear subspace containing M – can be described in terms of linear combinations:
this is the set of all linear combinations of points from M . Analogous results hold
for affine span of (nonempty) set and affine combinations of points from the set as
well. We have an analogous description of convex hulls via convex combinations
as well:

Fact I.1.14 [Convex hull via convex combinations] For a set M ⊆ Rn ,


Conv(M ) = {the set of all convex combinations of vectors from M } .

We will see in section 9.3 that when M is a finite set in Rn , Conv(M ) is a bounded
polyhedral set. Bounded polyhedral sets are also called polytopes.
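For a finite set M in low dimension the convex hull can be computed explicitly; for instance, scipy.spatial.ConvexHull returns both the vertices and the linear inequalities describing the polytope Conv(M) (a purely illustrative sketch, with made-up points):

```python
import numpy as np
from scipy.spatial import ConvexHull

# five points in the plane; the middle point is not a vertex of the hull
M = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]])
hull = ConvexHull(M)

print(hull.vertices)    # indices of the points of M that are vertices of Conv(M)
print(hull.equations)   # rows [a, b] of facet inequalities a^T x + b <= 0 describing Conv(M)
```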
We next continue with a number of important families of convex sets.

1.2.3 Simplex

Definition I.1.15 [Simplex] The convex hull of m + 1 affinely independent points x0, . . . , xm is called the m-dimensional simplex with the vertices x0, . . . , xm. (See section A.4.3 for affine independence.)

Consider an m-dimensional simplex with vertices x0 , . . . , xm . Then, based on


section A.4.3, every point x from this simplex admits exactly one representation
as a convex combination of these vertices. The coefficients λi , i = 0, . . . , m, used
in the convex combination representation of x form the unique solution to the
system of linear equations given by
    ∑_{i=0}^{m} λi xi = x,   ∑_{i=0}^{m} λi = 1.

This system in variables λi is feasible if and only if x ∈ M = Aff({x0, . . . , xm}), and the components of the solution (the barycentric coordinates of x) are affine functions of x ∈ Aff(M). The simplex itself is composed of points from M with nonnegative barycentric coordinates.
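Numerically, the barycentric coordinates of a point x are found by stacking the normalization constraint under the equations of the system above and solving the resulting linear system (a sketch with made-up vertices):

```python
import numpy as np

# vertices x0, x1, x2 of a 2-dimensional simplex in R^2 (affinely independent)
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]).T   # columns are the vertices
x = np.array([0.2, 0.3])                               # a point of the simplex

# solve  sum_i lam_i x_i = x,  sum_i lam_i = 1
A = np.vstack([X, np.ones((1, X.shape[1]))])
rhs = np.append(x, 1.0)
lam = np.linalg.solve(A, rhs)

print(lam)                      # barycentric coordinates, here [0.5, 0.2, 0.3]
assert np.all(lam >= -1e-12)    # nonnegative, since x lies in the simplex
```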

1.2.4 Cones
We next examine a very important class of convex sets.
A nonempty set K ⊆ Rn is called conic if it contains, along with every point
x ∈ K, the entire ray R+ (x) = {tx : t ≥ 0} spanned by the point:
x∈K =⇒ tx ∈ K, ∀t ≥ 0.
Note that based on our definition, any conic set is nonempty and it always con-
tains the origin.

Definition I.1.16 [Cone] A cone is a nonempty, convex, and conic set.

Fact I.1.17 A set K ⊆ Rn is a cone if and only if it is nonempty and


• is conic, i.e., x ∈ K, t ≥ 0 =⇒ tx ∈ K; and
• contains sums of its elements, i.e., x, y ∈ K =⇒ x + y ∈ K.

Example I.1.4 The solution set of an arbitrary (possibly, infinite) system of homogeneous linear inequalities with n unknowns x, i.e., the set

    K = {x ∈ Rn : aα⊤ x ≥ 0, ∀α ∈ A},

is a cone.
In particular, the solution set of a finite system composed of m homogeneous
linear inequalities
Ax ≥ 0
(A is an m × n matrix) is a cone. A cone of this latter type is called polyhedral¹. Specifically, the nonnegative orthant Rm+ := {x ∈ Rm : x ≥ 0} is a polyhedral cone. ♢
Note that the cones given by systems of linear homogeneous nonstrict inequal-
ities are obviously closed. From Separation Theorem (see Theorem II.7.3) we will
deduce the reverse as well, i.e., every closed cone is the solution set to such a
system. Thus, Example I.1.4 is the generic example of a closed convex cone.
We already know that a norm ∥·∥ on Rn gives rise to specific convex sets in Rn ,
namely, balls of this norm. In fact, a norm also gives rise to another important
convex set.

Proposition I.1.18 For any norm ∥ · ∥ on Rn , its epigraph, i.e., the set
    K := {[x; t] ∈ Rn+1 : t ≥ ∥x∥}

is a closed cone in Rn+1.


1 The “literal” interpretation of the words “polyhedral cone” should be “a set of the form {x : Ax ≤ b}
which is a cone;” this is not exactly the terminology just introduced. Luckily, there is no collision: a
polyhedral set X = {x : Ax ≤ b} is a cone if and only if X = {x : Ax ≤ 0}, see Exercise II.32.

Proof. Obviously, K is nonempty as [x; t] = [0; 0] is in K. Also, K is a conic


set as any norm ∥ · ∥ is positively homogeneous. Moreover, the closedness of K
with respect to summation is readily given by the Triangle inequality: consider
two points [x; t] ∈ K and [x′ ; t′ ] ∈ K. Then, t ≥ ∥x∥ and t′ ≥ ∥x′ ∥ which imply
t+t′ ≥ ∥x∥+∥x′ ∥ ≥ ∥x+x′ ∥. Thus, [x+x′ ; t+t′ ] ∈ K. Invoking Fact I.1.17, we see
that K is a cone. In order to see that K is closed recall that ∥·∥ is continuous (see
Fact B.23). Thus, for any sequence of points [xi ; ti ] ∈ K converging to a point [x; t]
as i → ∞, we have [∥xi ∥; ti ] → [∥x∥; t] and therefore t ≥ ∥x∥. This establishes
that the limit of any converging sequence from K belongs to K, proving that K
is closed.
A particular case of Proposition I.1.18 states that the epigraph of Euclidean
norm, i.e.,
    Ln+1 := {[x; t] ∈ Rn+1 : t ≥ ∥x∥2},

is a closed cone. This is the second-order (or Lorentz, or ice cream) cone (see Figure I.3), and it plays a significant role in convex optimization.

Figure I.3. [Boundary of] 3D Lorentz cone L3


To complete our first acquaintance with cones, we also mention the semidefinite cone Sm+ “living” in the space Sm of real symmetric m × m matrices and composed of positive semidefinite matrices from Sm, i.e.,

    Sm+ := {X ∈ Rm×m : X = X⊤, a⊤Xa ≥ 0 ∀a ∈ Rm};

see section D.2.2.
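Membership in the two cones just introduced is easy to test numerically (an illustrative sketch, not part of the exposition): [x; t] belongs to the Lorentz cone iff t ≥ ∥x∥2, and a symmetric matrix X belongs to Sm+ iff all its eigenvalues are nonnegative.

```python
import numpy as np

def in_lorentz(x, t):
    # second-order (Lorentz) cone: t >= ||x||_2
    return t >= np.linalg.norm(x)

def in_psd_cone(X, tol=1e-10):
    # semidefinite cone S^m_+: X symmetric with all eigenvalues >= 0
    return np.allclose(X, X.T) and np.min(np.linalg.eigvalsh(X)) >= -tol

print(in_lorentz(np.array([3.0, 4.0]), 5.0))     # True: 5 >= ||(3, 4)||_2 = 5
print(in_psd_cone(np.array([[2.0, 1.0],
                            [1.0, 2.0]])))       # True: eigenvalues are 1 and 3
```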


Cones form a very important family of convex sets, and one can develop a theory of cones absolutely similar (and in a sense, equivalent) to that of all convex sets. For example, by introducing the notion of a conic combination of vectors x1, . . . , xk as a linear combination of these vectors with nonnegative coefficients, we can easily prove the following statements, completely similar to those for general convex sets, with conic combinations playing the role of convex ones:
• A set is a cone if and only if it is nonempty and is closed with respect to taking
conic combinations of its elements;
• Intersection of a family of cones is again a cone; in particular, for every set
K ⊆ Rn there exists the smallest (w.r.t. inclusion) cone containing K, called the
conic hull of K:

Definition I.1.19 [Conic hull] For any K ⊆ Rn , the conic hull of K [nota-
tion: Cone(K)] is the intersection of all cones containing K. Thus, Cone(K)
is the smallest (w.r.t. inclusion) cone containing K.

• We can describe the conic hull of a set K ⊆ Rn in terms of its conic combina-
tions:

Fact I.1.20 [Conic hull via conic combinations] The conic hull Cone(K) of
a set K ⊆ Rn is the set of all conic combinations (i.e., linear combinations
with nonnegative coefficients) of vectors from K:
    Cone(K) = {x ∈ Rn : ∃N ≥ 0, λi ≥ 0, xi ∈ K, i ≤ N : x = ∑_{i=1}^{N} λi xi}.

Note that here we use the standard convention: the sum of vectors over an empty set of indexes, like ∑_{i=1}^{0} zi, has a value – it is the origin of the space where the vectors live. In particular, the set of conic combinations of vectors from the empty set is {0}, in full accordance with Definition I.1.19.

1.3 Calculus of convex sets


Calculus of convex sets is, in a nutshell, the list of operations which preserve
convexity.

Proposition I.1.21 The following operations preserve convexity of sets:


1. Taking intersection: if Mα, α ∈ A, are convex sets, so is their intersection ⋂_α Mα.
2. Taking direct product: if M1 ⊆ Rn1 and M2 ⊆ Rn2 are convex sets, so is
their direct product, i.e., the set
M1 × M2 := {x = [x1; x2] ∈ Rn1 × Rn2 = Rn1+n2 : x1 ∈ M1, x2 ∈ M2}.


3. Arithmetic summation and multiplication by reals: if M1, . . . , Mk are nonempty convex sets in Rn and λ1, . . . , λk are arbitrary reals, then the set
    λ1 M1 + . . . + λk Mk := {∑_{i=1}^{k} λi xi : xi ∈ Mi, i = 1, . . . , k}

is convex.
Warning: “Linear combination λ1 M1 + . . . + λk Mk of sets” as defined above is just a notation. When operating with these “linear combinations of sets,” one should be careful. For example, while it is true that M1 + M2 = M2 + M1, that M1 + (M2 + M3) = (M1 + M2) + M3, and even that λ(M1 + M2) = λM1 + λM2, it is, in general, not true that (λ1 + λ2)M = λ1 M + λ2 M (a concrete example is sketched right after this proposition).
4. Taking image under an affine mapping: if M ⊆ Rn is a convex set and x 7→
A(x) ≡ Ax + b is an affine mapping from Rn into Rm (where A ∈ Rm×n
and b ∈ Rm ), then the image of M under the mapping A(·), i.e., the set
A(M ) := {A(x) : x ∈ M } ,
is convex.
5. Taking inverse image under affine mapping: if M ⊆ Rn is a convex set
and y 7→ A(y) = Ay + b is an affine mapping from Rm to Rn (where
A ∈ Rn×m and b ∈ Rn ), then the inverse image of M under the mapping
A(·), i.e., the set
A−1 (M ) := {y ∈ Rm : A(y) ∈ M } ,
is convex.
The (completely straightforward) verification of this proposition is left to the
reader.
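To see the point of the warning in item 3, take M = [0, 1] ⊂ R, λ1 = 1, λ2 = −1: then (λ1 + λ2)M = {0}, while λ1 M + λ2 M = [−1, 1]. A discretized sketch:

```python
import numpy as np

M = np.linspace(0.0, 1.0, 101)     # the segment [0, 1], discretized
lam1, lam2 = 1.0, -1.0

lhs = (lam1 + lam2) * M                                   # (lam1 + lam2) M = {0}
rhs = (lam1 * M[:, None] + lam2 * M[None, :]).ravel()     # lam1 M + lam2 M = [-1, 1]

print(lhs.min(), lhs.max())        # 0.0 0.0
print(rhs.min(), rhs.max())        # -1.0 1.0
```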

1.3.1 Calculus of closed convex sets


Numerous important convexity-related results require not just convexity, but also
closedness of the participating sets. Therefore, it makes sense to ask to what extent the “calculus of convexity” as presented in Proposition I.1.21 is preserved
when passing from general convex sets to closed convex sets. Here are the answers:
1. Taking intersection: if Mα, α ∈ A, are closed convex sets, so is the set ⋂_α Mα.
2. Taking direct product: if M1 ⊆ Rn1 and M2 ⊆ Rn2 are closed convex sets, so
is the set
M1 × M2 = {x = [x1; x2] ∈ Rn1 × Rn2 = Rn1+n2 : x1 ∈ M1, x2 ∈ M2}.


3. Arithmetic summation of nonempty closed convex sets Mi , 1 ≤ i ≤ k, preserves


convexity, but does not necessarily preserve closedness. However, it does preserve
closedness when at most one of the sets is unbounded.
An example of a pair of closed convex sets with non-closed sum is M1 = {x ∈
R2 : x1 > 0, x2 ≥ 1/x1 }, M2 = {x ∈ R2 : x2 = 0}. The sum of these two closed
sets clearly is the open upper halfplane {x ∈ R2 : x2 > 0} (why?) and is not
closed.
Let us verify that if at most one of the nonempty closed convex sets is unbounded,
then the sum of the sets is convex (this we already know from calculus of
convexity) and closed. Closedness is given by the following observation:
the sum of nonempty closed sets, convex or not, with at most one of the
sets unbounded, is closed.
To justify this observation, it clearly suffices to verify its validity for a pair of sets, M1 and M2. Assuming both sets are nonempty and closed and M1
is bounded, we should prove that if a sequence {xi + yi}i with xi ∈ M1 and yi ∈ M2 converges as i → ∞, the limit lim_{i→∞}(xi + yi) belongs to M1 + M2. Since M1, and thus the sequence {xi}i, is bounded, passing to a subsequence we may assume that the sequence {xi}i converges, as i → ∞, to some x. Since the sequence {xi + yi}i converges as well, the sequence {yi}i also converges to some y. As M1 and M2 are closed, we have x ∈ M1, y ∈ M2, and therefore lim_{i→∞}(xi + yi) = x + y ∈ M1 + M2.
4. Multiplication by a real: For a nonempty closed convex set M and a real λ,
the set λM is closed and convex (why?).
5. Image under an affine mapping of a closed convex set M is convex, but not
necessarily closed; it is definitely closed when M is bounded.
As an example of a closed convex set with a non-closed affine image, consider the set {[x; y] ∈ R2 : x, y ≥ 0, xy ≥ 1} (i.e., a branch of a hyperbola) and its projection onto the x-axis. This set is convex and closed, but its projection onto the x-axis is the positive ray {x > 0}, which is not closed. Closedness of the affine image of a closed and bounded set is a special case of the general fact:
the image of a closed and bounded set under a mapping that is continuous
on this set is closed and bounded as well (why?).
6. Inverse image under affine mapping: if M ⊆ Rn is convex and closed and
y 7→ A(y) = Ay + b is an affine mapping from Rm to Rn , then the set
A−1 (M ) := {y ∈ Rm : A(y) ∈ M }
is a closed convex set in Rm . Indeed, the convexity of A−1 (M ) is given by the
calculus of convexity, and its closedness is due to the following standard fact:
the inverse image of a closed set in Rn under a continuous mapping from Rm to Rn is closed (why?).

We see that the “calculus of closed convex sets” is somewhat weaker than the
calculus of convexity per se. Nevertheless, we will see that these difficulties dis-
appear when restricting the operands of our operations to be polyhedral, and not
just closed and convex.

1.4 Topological properties of convex sets


Convex sets and closely related objects, convex functions, play the central role in Optimization. To play this role properly, convexity alone is not sufficient; we need convexity and closedness.

1.4.1 The closure


It is clear from the definition of a closed set that the intersection of a family of closed sets in Rn is also closed (see Fact B.13). From this fact it follows that for every subset M of Rn there exists the smallest (w.r.t. inclusion) closed set containing M. This leads us to the following definition.

Definition I.1.22 [Closure] Given a set M ⊆ Rn, the closure of M [notation: cl M or cl(M)] is the smallest (w.r.t. inclusion) closed set (i.e., the intersection of all closed sets) containing M.

From Real Analysis, we have the following inner description of the closure of
a set in a metric space (and, in particular, in Rn ).

Fact I.1.23 The closure of a set M ⊆ Rn is exactly the set composed of the
limits of all converging sequences of elements from M .

Example I.1.5 Based on Fact I.1.23, it is easy to prove that, e.g., the closure
of the open Euclidean ball
{x ∈ Rn : ∥x − a∥2 < r} [where r > 0]
is the closed Euclidean ball {x ∈ Rn : ∥x − a∥2 ≤ r}.
Another useful application example is the closure of a set defined by strict
linear inequalities, i.e.,
    M := {x ∈ Rn : aα⊤ x < bα, α ∈ A}.

Whenever such a set M is nonempty, its closure is given by the nonstrict versions of the same inequalities:

    cl M = {x ∈ Rn : aα⊤ x ≤ bα, α ∈ A}.

Note here that nonemptiness of M in this last example is essential. To see this,
consider the set M = {x ∈ R : x < 0, − x < 0} . Clearly, M is empty, so that its
closure also is the empty set. On the other hand, if we ignore the nonemptiness
requirement on M and apply formally the above rule, we would incorrectly claim
that cl M = {x ∈ R : x ≤ 0, − x ≤ 0} = {0} . ♢

1.4.2 The interior


Consider a set M ⊆ Rn. Recall from Definition B.9 that a point x ∈ M is an interior point of M if some neighborhood of the point is contained in M, i.e., if
there exists a ball of positive radius centered at x which is contained in M :
∃r > 0 : Br (x) := {y ∈ Rn : ∥y − x∥2 ≤ r} ⊆ M.

Definition I.1.24 [Interior] The set of all interior points of a given set
M ⊆ Rn is called the interior of M [notation: int M or int(M )] (see Defini-
tion B.10).

Example I.1.6 We have the following sets and their corresponding interiors:

• The interior of an open set is the set itself.


• The interior of the closed ball {x ∈ Rn : ∥x − a∥2 ≤ r} (r > 0, n ≥ 1) is the
open ball {x ∈ Rn : ∥x − a∥2 < r} (why?).
• The interior of the standard full-dimensional simplex

    {µ ∈ Rn : µ ≥ 0, ∑_{i=1}^{n} µi ≤ 1}

is composed of all vectors µ with µi > 0 for all i = 1, . . . , n and ∑_{i=1}^{n} µi < 1 (why?).
• The interior of a polyhedral set {x ∈ Rn : Ax ≤ b} with matrix A not contain-
ing zero rows is the set {x ∈ Rn : Ax < b} (why?).
Note that here the requirement that the set is polyhedral, i.e., defined by a finite system of linear inequalities, is critical. In particular, this statement is
not, generally speaking, true for solution sets of infinite systems of linear in-
equalities. For example, the following set defined by an infinite system of linear
inequalities,

    M := {x ∈ R : x ≤ 1/n, n = 1, 2, . . .},

is nothing but the nonpositive ray R− = {x ∈ R : x ≤ 0}, i.e., M = R−. Thus, int M = {x ∈ R : x < 0}, i.e., the negative ray. On the other hand, the following set defined by the strict versions of these inequalities,

    M′ := {x ∈ R : x < 1/n, n = 1, 2, . . .},

defines the same nonpositive ray, i.e., M′ = {x ∈ R : x ≤ 0}. Hence, M′ ̸= int M for this set M defined by an infinite system of inequalities. ♢
The following observation is evident:

Fact I.1.25 For any set M in Rn , its interior, int M , is always open, and
int M is the largest (with respect to the inclusion) open set contained in M .

The interior of a set is, of course, contained in the set, which, in turn, is
contained in its closure:
int M ⊆ M ⊆ cl M. (1.3)

Definition I.1.26 [Boundary] For any M ⊆ Rn, the boundary of M is the set

    bd M = cl M \ int M,

and the points on the boundary are called boundary points of M.

The boundary points of M are exactly the points from Rn which can be approx-
imated to whatever high accuracy both by points from M and by points from
outside of M (check it!).
Given a set M ⊆ Rn, it is important to note that the boundary points do not necessarily belong to M, since M = cl M need not hold in general. In
fact, all boundary points belong to M if and only if M = cl M , i.e., if and only
if M is closed.
The boundary of a set M ⊆ Rn is clearly closed as bd M = cl M ∩ (Rn \ int M )
and both sets cl M and Rn \ int M are closed (note that the set Rn \ int M is
closed since it is the complement of an open set). In addition, from the definition
of the boundary, we have
M ⊆ (int M ∪ bd M ) = cl M.
Therefore, any point from M is either an interior or a boundary point of M .

1.4.3 The relative interior


Many of the constructions in Optimization possess nice properties in the interior
of the set the construction is related to and may lose these nice properties at the
boundary points of the set. This is why in many cases we are especially interested
in interior points of sets and want the set of these interior points to be “sufficiently
dense.” What should we do if it is not the case, for example if there are no interior
points at all (e.g., if we are looking at a segment in the plane)? It turns out that
in these cases we can use a good surrogate of the “normal” interior, namely the
relative interior defined as follows.

Definition I.1.27 [Relative interior] Let M ⊆ Rn be nonempty. We say that a point x ∈ M is a relative interior point of M if M contains the intersection of a small enough ball centered at x with Aff(M), i.e., if there exists r > 0 such that

    (Br(x) ∩ Aff(M)) := {y ∈ Rn : y ∈ Aff(M), ∥y − x∥2 ≤ r} ⊆ M.
The relative interior of M [notation: rint M ] refers to the set of all relative
interior points of M .
By definition, the relative interior of empty set is empty.

Example I.1.7 We have the following sets and their corresponding relative
interiors:
• The relative interior of a singleton is the singleton itself (since a point in the
0-dimensional space is the same as a ball of a positive radius).
• More generally, the relative interior of an affine subspace is the subspace itself.
• Given two distinct point x ̸= y in Rn , the interior of a segment [x, y] is empty
whenever n > 1. In contrast to this, the relative interior of this set is always
(independent of n) nonempty and it is precisely the interval (x, y), i.e., the
segment without the endpoints. ♢

Geometrically speaking, the relative interior is the interior we get when we


treat M ⊆ Rn as a subset of its affine hull (the latter, geometrically, is nothing
but Rk , k being the affine dimension of Aff(M )).
We can play with the notion of the relative interior in basically the same way as with that of the interior. Namely, for any M ⊆ Rn, since Aff(M) is closed
and contains M , it contains also the smallest closed set containing M , i.e., cl M .
Therefore, we have the following analogies of inclusions, cf. (1.3):
rint M ⊆ M ⊆ cl M [⊆ Aff(M )]. (1.4)
We can also define the relative boundary.

Definition I.1.28 [Relative boundary] For any M ⊆ Rn, its relative boundary [notation: rbd M] is defined as the set rbd M = cl M \ rint M.

Note that for any M ⊆ Rn, we naturally have that rbd M is a closed set contained
in Aff(M ), and, as for the “actual” interior and boundary, we have
rint M ⊆ M ⊆ cl M = rint M ∪ rbd M.
Of course, if Aff(M ) = Rn , then the relative interior becomes the usual interior,
and similarly for the boundary. Note that Aff(M) = Rn is certainly the case when
int M ̸= ∅ (since then M contains a ball B, and therefore the affine hull of M is
the entire Rn , which is the affine hull of B).

1.4.4 Nice topological properties of convex sets


An arbitrary set M ⊆ Rn may possess very pathological topology. In particular,
both inclusions in the chain
rint M ⊆ M ⊆ cl M
can be very “loose.” For example, let M be the set of rational numbers in the
segment [0, 1] ⊂ R. Then, rint M = int M = ∅ since every neighborhood of every
rational number contains irrational numbers. On the other hand, cl M = [0, 1].
Thus, rint M is “incomparably smaller” than M , cl M is “incomparably larger”
than M , and M is contained in its relative boundary (by the way, what is this
relative boundary?).
The following theorem demonstrates that the topology of a convex set M is
much better than what it might be for an arbitrary set.

Theorem I.1.29 Let M be a convex set in Rn . Then,


(i) The interior int M , the closure cl M and the relative interior rint M are
convex.
(ii) If M is nonempty, then its relative interior rint M is nonempty.
(iii) The closure of M is the same as the closure of its relative interior,

i.e., cl M = cl(rint M ). (In particular, every point of cl M is the limit of a


sequence of points from rint M .)
(iv) The relative interior remains unchanged when we replace M with its
closure, i.e., rint M = rint (cl M ).
Moreover, (iii) and (iv) imply that
(v) The relative boundary remains unchanged when we replace M with its
closure.
We will use the following basic result to prove this theorem (we will present
the proof of this lemma after the proof of the theorem).

Lemma I.1.30 Let M be a convex set in Rn . Then, for any x ∈ rint M and
y ∈ cl M , we have
[x, y) := {(1 − λ)x + λy : 0 ≤ λ < 1} ⊆ rint M.

Proof of Theorem I.1.29. (i): Prove yourself!


(ii): Let M be a nonempty convex set, and let us prove that rint M ̸= ∅.
By translation, we may assume that 0 ∈ M . Furthermore, we may assume that
the linear span of M , i.e., Lin(M ), is the entire Rn . Indeed, as far as linear
operations and the Euclidean structure are concerned, Lin(M ), as every other
linear subspace in Rn , is equivalent to Rk for a certain k. Since the notion of
relative interior deals only with linear and Euclidean structures, we lose nothing
thinking of Lin(M ) as of Rk and taking it as our universe instead of the original
universe Rn . Thus, in the rest of the proof of (ii), we assume that 0 ∈ M and
Lin(M ) = Rn ; what we need to prove is that the interior of M (which in our
case is the same as relative interior of M ) is nonempty. Note that since 0 ∈ M ,
we have Aff(M ) = Lin(M ) = Rn .
As Lin(M ) = Rn , we can find n linearly independent vectors a1 , . . . , an in M .
Let us also set a0 := 0. The n + 1 vectors a0 , . . . , an belong to M . Since M is
convex, the convex hull of these vectors, i.e.,
( n n
) ( n n
)
X X X X
∆ := x = λi ai : λ ≥ 0, λi = 1 = x = µi ai : µ ≥ 0, µi ≤ 1
i=0 i=0 i=1 i=1

also belongs to M . Note that the set ∆ is the image of the standard full-
dimensional simplex
( n
)
X
n
µ ∈ R : µ ≥ 0, µi ≤ 1
i=1

under the linear transformation µ 7→ Aµ, where A is the matrix with the columns
a1 , . . . , an . Recall from Example I.1.6 that the standard simplex has a nonempty
interior. Since A is nonsingular (due to the linear independence of a1 , . . . , an ),
multiplication by A maps open sets onto open ones, so that ∆ has a nonempty
interior. Since ∆ ⊆ M , the interior of M is nonempty.
(iii): The statement is evidently true when M is empty, so we assume that

M ̸= ∅. We clearly have cl(rint M ) ⊆ cl M due to rint M ⊆ M . Thus, all we


need to complete the proof of (iii) is to verify that every y ∈ cl M is the limit of
a sequence of points y i ∈ rint M . Indeed, pick x ∈ rint M (recall that from part
(ii) we have rint M ̸= ∅) and set y i := (1 − 1/i)y + (1/i)x. By Lemma I.1.30, we
have y^i ∈ rint M, and clearly y = lim_{i→∞} y^i, completing the verification of (iii).
(iv): The statement is obviously true when M is empty, so we assume that
M ̸= ∅. Since M ⊆ cl M , we always have rint M ⊆ rint (cl M ). To prove the
reverse inclusion, consider any z ∈ rint (cl M ), and let us prove that z ∈ rint M .
Let x ∈ rint M (from part (ii), we already know that rint M ̸= ∅). As x and z
are in Aff(M ), for any t ∈ R, the vectors z t := x + t(z − x) belong to Aff(M ), and
when t approaches 1, z t approaches z. Since z ∈ rint (cl M ), it follows that there
exists ϵ > 0 such that y := z^{1+ϵ} ∈ cl M. It remains to note that z = (1 − λ)y + λx with λ = ϵ/(1+ϵ) ∈ (0, 1), and therefore z = (1 − λ)y + λx ∈ rint M by Lemma I.1.30
(recall that x ∈ rint M , y ∈ cl M ).
Remark I.1.31 We see from the proof of Theorem I.1.29(iii) that to get the
closure of a (nonempty) convex set, it suffices to take its “radial” closure, i.e., to
take a point x ∈ rint M , take all rays in Aff(M ) starting at x and look at the
intersection of such a ray ℓ with M ; such an intersection will be a convex set on
the line which contains a one-sided neighborhood of x, i.e., is either a segment
[x, y ℓ ], or the entire ray ℓ, or a half-interval [x, y ℓ ). In the first two cases we do
not need to do anything; in the third case, we need to add y ℓ to M . After all
rays are looked through and all “missed” endpoints y ℓ are added to M , we obtain
cl M . To understand the role of convexity in this result, look at the nonconvex set
of rational numbers from [0, 1]. The interior (≡ relative interior) of this ”highly
percolated” set is empty, the closure is [0, 1], and there is no way to restore the
closure in terms of the interior. ■
Proof of Lemma I.1.30. Given that x ∈ M , let us denote Aff(M ) = x + L,
where L is the linear subspace parallel to Aff(M ). Then,
M ⊆ Aff(M ) = x + L.
Let B be the unit Euclidean ball in L, i.e., B = {h ∈ L : ∥h∥2 ≤ 1} . Since
x ∈ rint M , there exists a positive radius r such that
x + rB ⊆ M. (1.5)
Now consider any λ ∈ [0, 1), and let z := (1 − λ)x + λy. As y ∈ cl M , we have
y = lim_{i→∞} y^i for a certain sequence of points y^i from M. By setting z^i := (1 − λ)x + λy^i, we get z^i → z as i → ∞. Then, from (1.5) and the convexity of M, it follows
that the sets Zi := {(1 − λ)x′ + λy i : x′ ∈ x + rB} are contained in M . Clearly,
Zi is exactly the set z i + r′ B, where r′ := (1 − λ)r > 0. Thus, z is the limit of the
sequence z i , and r′ -neighborhood (in Aff(M )) of every one of the points z i belongs
to M . For every 0 < r′′ < r′ and for all i such that z i is close enough to z, the
r′ -neighborhood of z i contains the r′′ -neighborhood of z; thus, a neighborhood
(in Aff(M )) of z belongs to M , hence z ∈ rint M .
A useful byproduct of Lemma I.1.30 is as follows:

Corollary I.1.32 Let M ⊆ Rn be convex. Then, every convex combination Σ_i λi x^i of points x^i ∈ cl M such that at least one term with positive coefficient is associated with a point x^i ∈ rint M is in fact a point from rint M.

Another useful byproduct of Lemma I.1.30 is as follows. Let Mk, k ≤ K, be a finite collection of subsets of Rn. The closure of the union of these sets is the union of their closures: cl(∪_{k≤K} Mk) = ∪_{k≤K} cl Mk (why?). Now let us ask a similar question about intersections: what is the relation between cl(∩_{k≤K} Mk) and ∩_{k≤K} cl Mk? The set ∩_{k≤K} cl Mk is closed and clearly contains ∩_{k≤K} Mk, and thus always contains the closure of the latter set:
cl(∩_{k≤K} Mk) ⊆ ∩_{k≤K} cl Mk.    (1.6)

In general, this inclusion can be “loose” – the right hand side set in (1.6) can
be much larger than the left hand side one, even when all Mk are convex. For
example, when K = 2, M1 = {x ∈ R2 : x2 = 0} is the x1 -axis, and M2 = {x ∈
R2 : x2 > 0} ∪ {[0; 0]}, both sets are convex, their intersection is the singleton
{0}, so that cl(M1 ∩ M2 ) = cl{0} = {0}, while the intersection of cl M1 and cl M2
is the entire x1 -axis, which is simply M1 . In this example the right hand side
in (1.6) is “incomparably larger” than the left hand side one. However, under
suitable assumptions we can also achieve equality in (1.6).

Proposition I.1.33 Consider convex sets Mk ⊆ Rn, k ≤ K.
(i) If ∩_{k≤K} rint Mk ≠ ∅, then cl(∩_{k≤K} Mk) = ∩_{k≤K} cl Mk, i.e., (1.6) holds as an equality.
(ii) Moreover, if MK ∩ int M1 ∩ int M2 ∩ . . . ∩ int MK−1 ≠ ∅, then ∩_{k≤K} rint Mk ≠ ∅, i.e., the premise (and thus the conclusion) of (i) holds true, so that cl(∩_{k≤K} Mk) = ∩_{k≤K} cl Mk.

Proof. (i): To prove that under the premise of (i) inclusion (1.6) is an equality is the same as to verify that, given x ∈ ∩k cl Mk, one has x ∈ cl(∩k Mk). Indeed, under the premise of (i) there exists x̄ ∈ ∩k rint Mk. Then, for every k we have x̄ ∈ rint Mk and x ∈ cl Mk, implying by Lemma I.1.30 that the set ∆ := [x̄, x) = {(1 − λ)x̄ + λx : 0 ≤ λ < 1} is contained in Mk. Since ∆ ⊆ Mk for all k, we have ∆ ⊆ ∩k Mk, and thus cl ∆ ⊆ cl(∩k Mk). It remains to note that x ∈ cl ∆.
(ii): Let x̄ ∈ MK ∩ int M1 ∩ . . . ∩ int MK−1. As x̄ ∈ int Mk for all k < K, there exists an open set U ⊂ ∩k<K Mk such that x̄ ∈ U. As x̄ ∈ MK ⊆ cl MK, by Theorem I.1.29, x̄ is the limit of a sequence of points from rint MK, so that there exists x̂ ∈ U ∩ rint MK. Due to the origin of U, we have x̂ ∈ rint Mk for all k ≤ K, so that the premise of (i) indeed takes place.

1.5 ⋆ Conic and perspective transforms of a convex set


Let X ⊆ Rn be a nonempty convex set. We can “lift” it to Rn+1 by passing to
the set
X + := {[x; 1] ∈ Rn × R : x ∈ X} .
Now let us look at the conic hull of X + , given by
ConeT(X) := Cone(X+) = { [x; t] ∈ Rn × R+ : ∃(I, λi ≥ 0, x^i ∈ X, ∀i ≤ I) : x = Σ_{i≤I} λi x^i, t = Σ_{i≤I} λi }.

We will call this the conic transform of X, see Figure I.4. Note that this set
is indeed a cone. Moreover, all vectors [x; t] from this cone have t ≥ 0, and,
importantly, the only vector with t = 0 in the cone ConeT(X) is the origin in
Rn+1 (this is what you get when taking trivial – with all coefficients zero – conic
combinations of vectors from X + ).

Figure I.4. Conic transform: (a) the conic transform of a segment X is the angle AOB; (b) the conic transform of a ray X is the angle AOB with the relative interior of the ray OB excluded.

All nonzero vectors [x; t] from ConeT(X) have t > 0 and form a convex set which
we call the perspective transform Persp(X) of X:
Persp(X) := {[x; t] ∈ ConeT(X) : t > 0} = ConeT(X) \ {0n+1 }.
The name of this set is motivated by the following immediate observation:

Proposition I.1.34 [Perspective transform of a nonempty convex set] Let


X be a nonempty convex set in Rn . Then, its perspective transform admits

the representation
Persp(X) = {[x; t] ∈ Rn × R : t > 0, x/t ∈ X} . (1.7)
In other words, to get Persp(X), we pass from X to X + (i.e., lift X to
Rn+1 ) and then take the union of all rays {[sx; s] ∈ Rn × R : s > 0, x ∈ X}
emanating from the origin (with origin excluded) and passing through the
points of X + .

Proof. Let X̂ := {[x; t] ∈ Rn × R : t > 0, x/t ∈ X}, so that the claim in the proposition is Persp(X) = X̂. Consider a point [x; t] ∈ X̂. Then, t > 0 and y := x/t ∈ X, and thus we have [x; t] = t[y; 1], so that the point [x; t] from X̂ is a single-term conic combination – just a positive multiple – of the point [y; 1] ∈ X+. As this holds for every point [x; t] ∈ X̂, we conclude X̂ ⊆ Persp(X). To verify the opposite inclusion, recall that every point [x; t] ∈ Persp(X) is of the form [Σ_i λi x^i; Σ_i λi] with x^i ∈ X, λi ≥ 0, and t = Σ_i λi > 0. Then,
[Σ_i λi x^i; Σ_i λi] = t [Σ_i (λi/t) x^i; 1] = t[y; 1],
where y := Σ_i (λi/t) x^i. Note that y ∈ X, as it is a convex combination of points from X and X is convex. Thus, [x; t] is such that t > 0 and y = x/t ∈ X, that is, X̂ ⊇ Persp(X), as desired.
As a byproduct of Proposition I.1.34, we conclude that the right hand side set
in (1.7) is convex whenever X is convex and nonempty – a fact not so evident
“from scratch.”
Note that X + is geometrically the same as X, and moreover we can view X +
as simply the intersection of ConeT(X) (or Persp(X)) with the hyperplane t = 1
in Rn × R.
Example I.1.8
1. ConeT(Rn) = {[x; t] ∈ Rn+1 : t > 0} ∪ {0n+1}, and Persp(Rn) = {[x; t] ∈ Rn+1 : t > 0}.
2. ConeT(Rn+) = {[x; t] ∈ Rn × R : x ≥ 0, t > 0} ∪ {0n+1}, and Persp(Rn+) = {[x; t] ∈ Rn × R : x ≥ 0, t > 0}.
3. Given any norm ∥ · ∥ on Rn, let B be its unit ball. Then, we have ConeT(B) = {[x; t] ∈ Rn+1 : t ≥ ∥x∥}, and Persp(B) = {[x; t] ∈ Rn+1 : t ≥ ∥x∥, t > 0}.
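As a quick numerical companion to item 3 above (again only an illustration of ours, with the Euclidean norm chosen for concreteness and numpy assumed), membership in Persp(B) and ConeT(B) can be tested directly from the representation (1.7):

```python
import numpy as np

def in_persp_ball(x, t):
    """[x; t] lies in Persp(B), B the Euclidean unit ball, iff t > 0 and
    x/t is in B, i.e., iff t > 0 and ||x||_2 <= t."""
    return t > 0 and np.linalg.norm(x) <= t

def in_conet_ball(x, t):
    """ConeT(B) is Persp(B) together with the origin of R^{n+1}; for the
    unit ball this is just ||x||_2 <= t (the t >= 0 check is redundant)."""
    return t >= 0 and np.linalg.norm(x) <= t

print(in_persp_ball(np.array([0.3, 0.4]), 1.0))   # True: ||x||_2 = 0.5 <= 1
print(in_persp_ball(np.array([0.0, 0.0]), 0.0))   # False: t must be positive
print(in_conet_ball(np.array([0.0, 0.0]), 0.0))   # True: the origin is in the cone
```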

Note that in all three examples in Example I.1.8, the set X of which we are
taking conic and perspective transforms is not just convex, but also closed. How-
ever, in the first two examples the conic transform is a non-closed cone, while in the third example it is closed; nevertheless, in all three cases the intersection of ConeT(X) with the half-space {[x; t] ∈ Rn+1 : t ≥ α} is closed, provided α > 0. There is indeed a general fact underlying this phenomenon.

Proposition I.1.35 Let X ⊂ Rn be a nonempty convex set. Then, we have


the following:
(i) For α > 0, define Hα := {[x; t] ∈ Rn+1 : t ≥ α}. When X is closed,
ConeT(X) ∩ Hα = Persp(X) ∩ Hα and this intersection is closed for any
α > 0.
(ii) Moreover, the cone ConeT(X) is closed if and only if X is closed and
bounded. In fact, ConeT(X) is closed if and only if cl (Persp(X)) = ConeT(X).

Proof. (i): When α > 0, we clearly have ConeT(X) ∩ Hα = Persp(X) ∩ Hα .


To see that these intersections are closed whenever X is closed, invoking (1.7) it
suffices to prove that when {[xi ; ti ]}i≥1 is a converging sequence such that ti ≥ α
and xi /ti ∈ X, then the limit [x; t] of this sequence satisfies x/t ∈ X and t ≥ α.
Since ti → t as i → ∞ and ti ≥ α holds for all i, we clearly have t ≥ α. Moreover, since xi → x and ti → t ≥ α > 0, the sequence y^i := xi/ti of points of X converges to x/t as i → ∞, and therefore x/t ∈ X since X is closed.
(ii): First, we assume that nonempty convex set X is closed and bounded, and
we will prove that ConeT(X) is closed, that is, whenever a sequence {[xi ; ti ]}i≥1 of
points from ConeT(X) converges, the limit of the sequence belongs to ConeT(X).
Indeed, consider such a sequence along with its limit [x; t]. When t > 0, all but
finitely many terms of the sequence belong to the half-space Ht/2 , and as by part
(i) ConeT(X) ∩ Ht/2 is closed, we have [x; t] ∈ ConeT(X). When t = 0, then
either (a) ti = 0 for infinitely many values of i, or (b) ti > 0 for all but finitely
many values of i. In the case of (a) infinitely many terms in our sequence are of
the form [0n ; 0] (since whenever [y; 0] ∈ ConeT(X) we must have y = 0 as well),
so that [x; t] = 0n+1 ∈ ConeT(X). In the case of (b) for all large enough i we
have ti > 0 and xi /ti ∈ X, and since ti → 0 as i → ∞ and X is bounded we
deduce xi → 0 as i → ∞. Then, this together with ti → 0 as i → ∞ implies that [x; t] = 0n+1, and we again have [x; t] ∈ ConeT(X). Thus, whenever X is a
nonempty closed and bounded set, ConeT(X) is closed.
Now assume that ConeT(X) is closed, and let us prove that X is closed and
bounded. Clearly, X is closed if and only if X + is closed, and since X + is the
intersection of the closed set ConeT(X) with the hyperplane t = 1 in Rn ×R, X +
is indeed closed. It remains to prove that X is bounded. Assume for contradiction
that X is unbounded. Then, we can find a sequence xi ∈ X, i ≥ 1, with ∥xi ∥2 →
∞ as i → ∞. Passing to a subsequence, we can assume that the ∥ · ∥2 -unit vectors
ξ i := xi /∥xi ∥2 converge to some unit vector ξ. Setting ti := 1/∥xi ∥2 , we have
ti > 0, ti → 0 as i → ∞, and ξ i /ti = xi ∈ X, so that [ξ i ; ti ] ∈ ConeT(X). Since
ConeT(X) is closed and by construction [ξ i ; ti ] → [ξ; 0] as i → ∞, we should have
[ξ; 0] ∈ ConeT(X), which is impossible as ∥ξ∥2 = 1.
The final claim, i.e., ConeT(X) is closed if and only if cl (Persp(X)) = ConeT(X),
follows immediately as well. Indeed, whenever X is nonempty and convex, we have
Persp(X) = ConeT(X) \ {0n+1 } and clearly 0n+1 ∈ cl(Persp(X)), implying that
cl(ConeT(X)) = cl(Persp(X)). As a result, whenever X is nonempty and convex,
ConeT(X) is closed if and only if ConeT(X) = cl(Persp(X)).

For a nonempty convex set X, let us also consider the closure of ConeT(X), i.e., the set
cl ConeT(X) = cl {[x; t] ∈ Rn × R : t > 0, x/t ∈ X} .
Clearly, cl ConeT(X) is a closed cone in Rn+1 containing X+. Moreover, it is immediately seen that cl ConeT(X) is the smallest (w.r.t. inclusion) closed cone in Rn+1 which contains X+, and that this cone remains intact when X is replaced with its closure. We will refer to cl ConeT(X) as the closed conic transform of X. In some cases, the closed conic transform admits a simple characterization. An immediate illustration of this is as follows:

Fact I.1.36 Let K be a closed cone and let the set


X := {x ∈ Rn : Ax − b ∈ K}
be nonempty. Then, cl ConeT(X) = {[x; t] ∈ Rn × R : Ax − bt ∈ K, t ≥ 0}.

For useful additional facts on closed conic transforms, see Exercise III.12.1-3.
2 Theorems of Caratheodory, Radon, and Helly

We next examine three theorems from Convex Analysis that have important
consequences in Optimization.

2.1 Caratheodory Theorem


Let us recall the notion of dimension from Linear Algebra. First of all, we define the dimension of a linear subspace: this is precisely the number of vectors in a basis of the subspace, that is, the number of linearly independent vectors spanning it. In the case of an affine subspace, we talk about its affine dimension, which is precisely the dimension of the linear subspace that is underlying (parallel to) the given affine subspace. Based on these notions, we are now ready to define the dimension of a nonempty set M.

Definition I.2.1 [Dimension of a nonempty set] Given a nonempty set M ⊆


Rn, its dimension (also referred to as its affine dimension) [notation: dim(M)]
is defined as the affine dimension of Aff(M ), or, which is the same, linear
dimension of the linear subspace parallel to Aff(M ).

Remark I.2.2 Note that some subsets of Rn fall within the scope of several of these definitions of dimension. Specifically, a linear subspace is also an affine subspace, and similarly, an affine subspace is a nonempty set as well. It is immediately seen that if a set is in the scope of more than one definition of dimension, all applicable definitions assign the set the same value of the dimension. ■
As an informal introduction to what follows, draw several points (“red points”)
on the 2D plane and take a point (“blue point”) in their convex hull. You will
observe that whatever your selection of red points and the blue point in their
convex hull, this point will belong to a properly selected triangle with red vertices.
The general fact is as follows.

Theorem I.2.3 [Caratheodory Theorem] Consider a nonempty set M ⊆


Rn , and let m := dim(M ). Then, every point x ∈ Conv(M ) is a convex
combination of at most m + 1 points from M .

Proof. Let E := Aff(M ). Then, dim(E) = m. Replacing, if necessary, the em-


bedding space Rn of M with E (the latter is, geometrically, just Rm ), we can
assume without loss of generality that m = n.

Let x ∈ Conv(M). By Fact I.1.14 on the structure of the convex hull, there exist x^1, . . . , x^N from M and convex combination weights λ1, . . . , λN such that
x = Σ_{i=1}^N λi x^i,  where λi ≥ 0, ∀i = 1, . . . , N, and Σ_{i=1}^N λi = 1.

Among all such possible representations of x as a convex combination of points


from M , let us choose one with the smallest possible N , i.e., involving fewest
number of points from M . Let this representation of x be the above convex
combination. We claim that N ≤ m + 1; proving this claim is all we need to
complete the proof of Caratheodory Theorem.
Let us assume for contradiction that N > m + 1. Now, consider the following
system in N variables µ1 , . . . , µN :
Σ_{i=1}^N µi x^i = 0,   Σ_{i=1}^N µi = 0.

This is a system of m + 1 scalar homogeneous linear equations (recall that we are


in the case of m = n, that is, xi ∈ Rm ). As N > m + 1, the number of variables
in this system is strictly greater than the number of equations. Therefore, this
system has a nontrivial solution, say δ1 , . . . , δN , i.e.,
Σ_{i=1}^N δi x^i = 0,   Σ_{i=1}^N δi = 0,   and [δ1; . . . ; δN] ≠ 0.

Then, for every t ∈ R, we have the following representation of x as a linear


combination of the points x1 , . . . , xN :
Σ_{i=1}^N (λi + tδi) x^i = x.

For ease of reference, let us define λi (t) := λi + tδi for all i and for all t ∈ R. Note
that for any t ∈ R, by the definition of λi and δi , we always have
Σ_{i=1}^N λi(t) = Σ_{i=1}^N (λi + tδi) = Σ_{i=1}^N λi + t Σ_{i=1}^N δi = 1.

Moreover, when t = 0, we have λi(0) = λi ≥ 0 for all i, so that the λi(0) are convex combination weights. On the other hand, from the selection of the δi, we know that Σ_{i=1}^N δi = 0 and [δ1; . . . ; δN] ≠ 0, and thus at least one entry of δ must be negative. Therefore, when t is large enough, some of the coefficients λi(t) become negative. There exists, of course, the largest t = t∗ for which λi(t) ≥ 0 holds for all i = 1, . . . , N, and for t = t∗ at least one of the λi(t) is zero. Specifically, setting
I− := {i : δi < 0},   i∗ ∈ argmin_{i∈I−} λi/|δi|,   t∗ := min_{i∈I−} λi/|δi|,
we have λi(t∗) ≥ 0 for all i and λ_{i∗}(t∗) = 0. This then implies that we have

represented x as a convex combination of fewer than N points from M, which


contradicts the definition of N (being the smallest number of points xi needed in
the convex combination representation of x).
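The proof above is constructive, and it may help to read it as an algorithm: as long as more than m + 1 points carry positive weight, a nontrivial solution δ of the homogeneous system is used to shift the weights until one of them vanishes. Below is a minimal numerical sketch of this reduction (our own illustration; the function name and the SVD-based null-space computation are implementation choices, not part of the text):

```python
import numpy as np

def caratheodory_reduce(points, lam, tol=1e-10):
    """Given points x^1,...,x^N (rows of `points`) and convex weights lam,
    return at most n+1 points and weights representing the same convex
    combination, following the reduction step in the proof."""
    pts, lam = np.asarray(points, float), np.asarray(lam, float)
    while len(lam) > pts.shape[1] + 1:
        # Nontrivial solution of sum_i delta_i x^i = 0, sum_i delta_i = 0:
        # a null-space vector of the (n+1) x N matrix [X^T; 1 ... 1].
        A = np.vstack([pts.T, np.ones(len(lam))])
        delta = np.linalg.svd(A)[2][-1]
        if delta.max() <= 0:                 # ensure some entry is negative
            delta = -delta
        neg = delta < -tol
        t_star = np.min(lam[neg] / (-delta[neg]))
        lam = lam + t_star * delta           # some weight hits (numerical) zero
        keep = lam > tol
        pts, lam = pts[keep], lam[keep]
        lam = lam / lam.sum()                # guard against round-off drift
    return pts, lam

rng = np.random.default_rng(0)
X = rng.standard_normal((10, 2))             # 10 points in the plane
w = rng.dirichlet(np.ones(10))               # random convex weights
x = w @ X
Xr, wr = caratheodory_reduce(X, w)
print(len(wr), np.allclose(wr @ Xr, x))      # at most 3 points, same x
```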
Remark I.2.4 Caratheodory Theorem is sharp: for every positive integer n,
there is a set of n + 1 affinely independent points in Rn (e.g., the origin and the n
standard basic orths) such that certain convex combination of these n + 1 points
(specifically, their average) cannot be represented as a convex combination using
strictly less than n + 1 points from the set. ■
Let us see an instructive corollary witnessing the power of Caratheodory The-
orem.

Corollary I.2.5 Let X ⊆ Rn be a closed and bounded set. Then, Conv(X)


is closed and bounded as well.
Proof. There is nothing to prove when X is empty. Now let X be nonempty, closed, and bounded, and define Y := Conv(X). Boundedness of Y is evident. In order to verify that Y is closed, let {x^t}_{t≥1} be a converging sequence of points from Y. By Caratheodory Theorem, every one of the vectors x^t, being a convex combination of vectors from X, has a representation of the form x^t = Σ_{i=1}^{n+1} λ^t_i x^t_i with at most n + 1 vectors x^t_i from X, i ≤ n + 1, where the λ^t_i are the corresponding convex combination weights. Since X is bounded, passing to a subsequence t1 < t2 < . . ., we can assume that the sequences {x^{t_s}_i}_{s≥1} and {λ^{t_s}_i}_{s≥1} converge as s → ∞ for every i ≤ n + 1, the limits being, respectively, vectors x_i and reals λi. We clearly have lim_{t→∞} x^t = lim_{s→∞} Σ_{i=1}^{n+1} λ^{t_s}_i x^{t_s}_i = Σ_{i=1}^{n+1} λi x_i. In addition, as X is closed, x_i = lim_{s→∞} x^{t_s}_i ∈ X. Moreover, clearly λ ≥ 0 and Σ_{i=1}^{n+1} λi = 1. The bottom line is that lim_{t→∞} x^t = Σ_{i=1}^{n+1} λi x_i is a convex combination of points from X and thus belongs to Y = Conv(X).
Remark I.2.6 Note that the convex hull of a closed unbounded set is not always closed. For example, consider the set X = {[0; 0]} ∪ {[u; v] ∈ R2+ : uv = 1}, which is closed and unbounded, and we have Conv(X) = {[u; v] ∈ R2+ : u > 0, v > 0} ∪ {[0; 0]}, which is not closed. ■
Let us see a concrete illustration, taken from [Nem24], of Caratheodory Theo-
rem.

2.1.1 Caratheodory Theorem, Illustration


Suppose that a supermarket sells 99 different market blend herbal teas, and every
herbal tea is a certain blend of 26 herbs A,. . . ,Z. In spite of such a variety of
marketed blends, John is not satisfied with any one of them; the only herbal tea
he likes is their mixture, in the proportion
1 : 2 : 3 : . . . : 98 : 99.
Once it occurred to John that in order to prepare his favorite tea, there is no
necessity to buy all 99 market blends; a smaller number of them will do. With

some arithmetic, John found a combination of 66 marketed blends which still allows him to prepare his tea. Do you believe John’s result can be improved?
Answer: In fact, just 26 properly selected market blends are enough.
Explanation: Let us represent a blend by its unit weight portion, say, 1 g. Such a portion can be identified with a 26-dimensional vector x = [x1; . . . ; x26], where xi is the weight, in grams, of herb #i in the portion. Clearly, we have x ∈ R26+ and Σ_{i=1}^{26} xi = 1. When mixing market blends x^1, x^2, . . . , x^99 to get a unit weight portion x of the mixture, we take λj ≥ 0 grams of market blend x^j, j = 1, . . . , 99, and mix them together, that is, x = Σ_{j=1}^{99} λj x^j. Since the weight of the mixture represented by x is 1 gram and the λj correspond to the weights (in grams) of the market blends x^j used in x, we get Σ_{j=1}^{99} λj = 1. The bottom line is that blend x can be obtained by mixing market blends x^1, . . . , x^99 if and only if x ∈ Conv{x^1, . . . , x^99}.
Then, by Caratheodory Theorem, every blend which can be obtained by mixing market blends can be obtained by mixing at most m + 1 of them, where m is the affine dimension of the affine span of x^1, . . . , x^99. In our case, this span belongs to the 25-dimensional affine plane {x ∈ R26 : Σ_{i=1}^{26} xi = 1}, that is, m ≤ 25, so that at most 26 properly selected market blends suffice.
Caratheodory Theorem admits a “conic analogy” as follows:

Fact I.2.7 [Caratheodory Theorem in conic form] Let a ∈ Rm be a conic


combination (linear combination with nonnegative coefficients) of N vectors
a1 , . . . , aN . Then, a is a conic combination of at most m vectors from the
collection a1 , . . . , aN .

2.2 Radon Theorem


As an informal introduction to what follows, draw 4 arbitrary pairwise distinct points on the plane and try to color some of them red and the remaining ones blue in such a way that the convex hull of the red points intersects the convex hull of the blue ones. Experimentation will suggest that this is always possible. The general fact is as follows.

Theorem I.2.8 [Radon Theorem] Let xi ∈ Rn , i ≤ N , where N ≥ n + 2.


Then, there exists a partition I ∪ J = {1, . . . , N } of the index set {1, . . . , N }
into two nonempty disjoint (I ∩ J = ∅) sets I and J such that
Conv({x^i : i ∈ I}) ∩ Conv({x^j : j ∈ J}) ≠ ∅.

Proof. Consider the following system of homogeneous equations in N variables



µ1, . . . , µN:
Σ_{i=1}^N µi x^i = 0,   Σ_{i=1}^N µi = 0.

Note that as xi ∈ Rn , this system has n + 1 scalar linear equations. Moreover, as


the premise of the theorem states that N > n + 1, we deduce that this system of
equations has a nontrivial solution λ1 , . . . , λN :
Σ_{i=1}^N λi x^i = 0,   Σ_{i=1}^N λi = 0,   and [λ1; . . . ; λN] ≠ 0.

Let I := {i : λi ≥ 0} and J := {i : λi < 0}. Then, I and J are nonempty and


form a partition of {1, . . . , N } (since the sum of all λi ’s is zero and not all λi ’s
are zero). Moreover, we have
a := Σ_{i∈I} λi = Σ_{j∈J} (−λj) > 0.
Then, by setting
αi := λi/a for i ∈ I, and βj := −λj/a for j ∈ J,
we get
αi ≥ 0 ∀i ∈ I,   βj ≥ 0 ∀j ∈ J,   Σ_{i∈I} αi = 1,   Σ_{j∈J} βj = 1.

In addition, we also have
Σ_{i∈I} αi x^i − Σ_{j∈J} βj x^j = (1/a) [ Σ_{i∈I} λi x^i − Σ_{j∈J} (−λj) x^j ] = (1/a) Σ_{i=1}^N λi x^i = 0.
We conclude that the vector Σ_{i∈I} αi x^i = Σ_{j∈J} βj x^j is the desired common point of Conv({x^i : i ∈ I}) and Conv({x^j : j ∈ J}).
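The proof of Radon Theorem is constructive as well, and translating it into a few lines makes the construction transparent (a sketch of ours, with numpy assumed; the SVD-based null-space computation is again just an implementation choice):

```python
import numpy as np

def radon_partition(points):
    """Given N >= n+2 points in R^n (rows of `points`), return index sets
    I, J and a point common to Conv{x^i : i in I} and Conv{x^j : j in J},
    following the construction in the proof of Radon Theorem."""
    X = np.asarray(points, float)
    N, n = X.shape
    assert N >= n + 2
    A = np.vstack([X.T, np.ones(N)])     # encodes sum mu_i x^i = 0, sum mu_i = 0
    lam = np.linalg.svd(A)[2][-1]        # a nontrivial (near-)null vector
    I = np.where(lam >= 0)[0]
    J = np.where(lam < 0)[0]
    a = lam[I].sum()
    alpha = lam[I] / a                   # convex weights on {x^i : i in I}
    common = alpha @ X[I]                # equals sum_j beta_j x^j as well
    return I, J, common

pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
I, J, z = radon_partition(pts)
print(I, J, z)    # the two diagonals of the unit square meet at (0.5, 0.5)
```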
Remark I.2.9 Same as in Caratheodory Theorem, the bound in Radon Theorem
is sharp: for every positive integer n, there exist n+1 points in Rn (e.g., the origin
and the n standard basic orths) which cannot be split into two disjoint subsets
with intersecting convex hulls. ■

2.3 Helly Theorem


What follows is a multidimensional extension of the nearly evident fact:
Given finitely many segments [ai , bi ] on the line such that every two of
the segments intersect, we can always find a point common to all the
segments.
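Before passing to the multidimensional statement, note that the one-dimensional fact is easy to verify directly: pairwise intersection of the segments means ai ≤ bj for all i, j, i.e., max_i ai ≤ min_j bj, and then any point of [max_i ai, min_j bj] is common to all segments. A tiny sanity check of this reasoning (our own illustration):

```python
# Segments [a_i, b_i] on the line; pairwise intersection means a_i <= b_j for all i, j,
# which is the same as max_i a_i <= min_j b_j, producing a common point.
segments = [(0.0, 2.0), (1.0, 3.0), (1.5, 2.5)]
assert all(max(a1, a2) <= min(b1, b2)            # every two segments intersect
           for (a1, b1) in segments for (a2, b2) in segments)
a_max = max(a for a, b in segments)
b_min = min(b for a, b in segments)
print(a_max <= b_min, a_max)                     # True, and 1.5 is a common point
```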

Multidimensional extension of this fact is as follows.

Theorem I.2.10 [Helly Theorem, I] Let F := {S1 , . . . , SN } be a finite family


of convex sets in Rn . Suppose that for every collection of at most n + 1 sets
from this family, the sets from the collection have a point in common. Then,
all of the sets Si , i ≤ N , have a common point.

Proof. We will prove the theorem by induction on the number N of sets in


the family. The case of N ≤ n + 1 holds immediately due to the premise of
the theorem. So, suppose that the statement holds for all families with certain
number N ≥ n + 1 of sets, and let S1 , . . . , SN , SN +1 be a family of N + 1 convex
sets which satisfies the premise of Helly Theorem. We need to prove that the
intersection of the sets S1 , . . . , SN , SN +1 is nonempty.
For each i ≤ N + 1, we construct the following N -set families
F i := {S1 , S2 , . . . , Si−1 , Si+1 , . . . , SN +1 } , ∀i ≤ N + 1,
where the N -set family Fi is obtained by deleting from our N + 1-set family the
set Si . Note that each of these N -set families F i satisfies the premise of the Helly
Theorem, and thus, by the inductive hypothesis, the intersection of the members
of F i is nonempty:
T i := S1 ∩ S2 ∩ . . . ∩ Si−1 ∩ Si+1 ∩ . . . ∩ SN +1 ̸= ∅, ∀i ≤ N + 1.
For each i ≤ N + 1, choose a point xi ∈ T i (recall that T i is nonempty!). Then,
we have N + 1 points xi ∈ Rn . As N ≥ n + 1, we have N + 1 ≥ n + 2 and by
Radon Theorem, we can partition the index set {1, . . . , N +1} into two nonempty
disjoint subsets I and J in such a way that certain convex combination x of the
points xi , i ∈ I, is a convex combination of the points xj , j ∈ J, as well. Let
us verify that x belongs to all the sets S1 , . . . , SN +1 , which will complete the
inductive step. Indeed, select any index i∗ ≤ N + 1 and let us prove that x ∈ Si∗ .
We have either i∗ ∈ I or i∗ ∈ J. Suppose first that i∗ ∈ I. Then, all the sets
T j , j ∈ J, are contained in Si∗ (since Si∗ participates in all intersections which
give T i with i ̸= i∗ ). Consequently, all the points xj , j ∈ J, belong to Si∗ , and
therefore x, which is a convex combination of these points, also belongs to Si∗
(recall that all Si are convex!), as required. Suppose now that i∗ ∈ J. In this
case, a similar reasoning shows that all the points xi , i ∈ I, belong to Si∗ , and
therefore x, which is a convex combination of these points xi , i ∈ I, belongs to
Si∗ . The induction and the proof are complete.
For an alternative proof of Theorem I.2.10 which does not utilize Radon The-
orem, see Exercise I.23.
Remark I.2.11 Helly Theorem admits a small and immediate refinement as
follows:
Let F := {S1, . . . , SN} be a family of N convex sets in Rn, and let m be the dimension of ∪_{i≤N} Si. Assume that for every collection of at most m + 1 sets from the family, the sets from the collection have a point in common. Then, all sets from the family have a point in common.

The justification of this claim follows from viewing the sets Si as subsets of E := Aff(∪_{i≤N} Si) (which, geometrically, is just Rm) rather than of Rn and applying the standard Helly Theorem. ■
Remark I.2.12 Same as Caratheodory and Radon Theorems, Helly Theorem
is sharp: for every positive integer n, there exists a finite family of convex sets
in Rn such that every n of them have a common point, but all of them have no
common point. Indeed, take n + 1 affinely independent points x1 , . . . , xn+1 in Rn
(say, the origin and the n basic orths in Rn ) and n + 1 convex sets S1 ,. . . ,Sn+1
with Si being the convex hull of points x1 , . . . , xi−1 , xi+1 , . . . , xn+1 . Every n of
these sets have a point in common (e.g., the common point of S1 , S3 , S4 , . . . , Sn+1
is x2 ), but there is no point common to all n + 1 sets (why?). ■

2.3.1 Helly Theorem, Illustration A


Suppose1 that we need to design a factory which, mathematically, is described
by the following set of constraints in variables x ∈ Rn :

Ax ≥ d [d1 , . . . , d1000 : demands] 
Bx ≤ f [f1 ≥ 0, . . . , f10 ≥ 0: amounts of resources of various types] (F )
Cx ≤ c, [other constraints]

where Ax ≥ d with a vector d ∈ R1000 represents the demand constraints, Bx ≤ f


with a vector f ∈ R10 + corresponds to availability of various resources, and the
constraint Cx ≤ c represents various additional restrictions. The data A, B, C, c
are given in advance, but d is unknown and we are asked to determine f . In
particular, we are asked to buy in advance the resources fi ≥ 0, i = 1, . . . , 10, in
such a way that the factory will be able to satisfy all demand scenarios d from a
given finite set D, that is, (F ) should be feasible for every d ∈ D. The unit cost
of resource i is given to us as ai > 0, so that the amount fi of resource i costs us ai fi.
It is known that for every single demand scenario d ∈ D proper (depending on
the scenario d) investment of at most $1 in resources suffices to meet the demand
d.
How large should the investment in resources be in the following cases when D
contains
1. just one scenario?
2. 3 scenarios?
3. 10 scenarios?
4. 100,000 scenarios?
Answer: We claim that in these scenarios the following investment amounts will
suffice:
1. $1 investment is enough.
2. $3 investment is enough.
1 The illustration to follow is taken from [Nem24].

3. $10 investment is enough.


4. $11 investment is enough.
Explanation:
Cases 1 — 3: In these cases, we know that every scenario d from D can be met
by a vector of resources fd ≥ 0 incurring a cost of at most $1. Thus, when we are
given scenarios d1 , . . . , dk from D, we can meet every one of them with the vector
of resources f := fd1 + . . . + fdk , since f ≥ fdi for every i = 1, . . . , k. Then, due to
the structure of our model, f meets demand scenario di using the resources fdi .
Case 4: We claim that $11 investment is always enough no matter what the
cardinality of D is. To see this, for every d ∈ D, we define
S[d, f] := {x ∈ Rn : Ax ≥ d, Bx ≤ f, Cx ≤ c},
Fd := {f ∈ R10 : a⊤f ≤ 11, S[d, f] ≠ ∅}.

Based on these definitions, Fd is precisely the set of vectors f such that their cost a⊤f is at most $11 and the associated polyhedral set S[d, f] is nonempty, that is, the resource vector f allows us to meet demand d. Note that the set Fd is convex, as it is the linear image (in fact just the projection) of the convex set
{[f; x] ∈ R10 × Rn : a⊤f ≤ 11, x ∈ S[d, f]}.

The punchline in this illustration is that every 11 sets of the form Fd have a
common point. Suppose that we are given 11 scenarios d1 , . . . , d11 from D. Then,
we can meet demand scenario di by investing $1 in properly selected vector of
resources fdi ≥ 0. As we proceeded in the cases 1—3, by investing $11 in the single
vector of resources f = fd1 + . . . + fd11 , we can meet every one of 11 scenarios
d1, . . . , d11, whence f ∈ Fd1 ∩ . . . ∩ Fd11. Since every 11 of the 100,000 convex sets Fd ⊆ R10, d ∈ D, have a point in common, by Helly Theorem all these sets have a common point,
say f∗ . That is, f∗ ∈ Fd for all d ∈ D, and thus by definition of Fd , we deduce that
every one of the sets S[d, f∗ ], d ∈ D, is nonempty, that is, vector of resources f∗
(which costs at most $11) allows us to satisfy every demand scenario d ∈ D.

2.3.2 Helly Theorem, Illustration B


Consider an optimization problem
Opt∗ := min_{x∈R11} { c⊤x : gi(x) ≤ 0, i = 1, . . . , 1000 }

with 11 variables x1 , . . . , x11 and convex constraints, i.e., every one of the sets
Xi := {x ∈ R11 : gi(x) ≤ 0}, i = 1, . . . , 1000,

is convex. Suppose also that the problem is solvable with optimal value Opt∗ = 0.
Clearly, when dropping one or more constraints, the optimal value can only de-
crease or remain the same.
Is it possible to find a constraint such that even if we drop it, we preserve the
optimal value? Two constraints which can be dropped simultaneously with no

effect on the optimal value? Three of them?


Answer: We can in fact drop as many as 1000 − 11 = 989 appropriately chosen
constraints without changing the optimal value!
Explanation: The case of c = 0 is trivial - here one can drop all 1000 constraints
without varying the optimal value! Therefore, from now on we assume c ̸= 0.
Assume for contradiction that every 11-constraint relaxation of the original
problem has negative optimal value. Since there are finitely many such relax-
ations, there exists ϵ > 0 such that every problem of the form
min_x { c⊤x : gi1(x) ≤ 0, . . . , gi11(x) ≤ 0 }

has a feasible solution with the objective value < −ϵ. Besides this, such an 11-
constraint relaxation of the original problem has also a feasible solution with the
objective equal to 0 (namely, the optimal solution of the original problem), and
since its feasible set is convex (as the intersection of the convex feasible sets of
the participating constraints), the 11-constraint relaxation has a feasible solution
x with c⊤ x = −ϵ. In other words, every 11 of the 1000 convex sets
Yi := {x ∈ R11 : c⊤x = −ϵ, gi(x) ≤ 0}, i = 1, . . . , 1000

have a point in common. Now, consider the hyperplane H := {x ∈ R11 : c⊤x = −ϵ}. Note that Yi ⊆ H for all i. Moreover, as c ≠ 0, dim(H) = 10 and thus dim(∪_{i=1}^{1000} Yi) ≤ dim(H) = 10. Since every 11 of these sets Yi have a nonempty intersection and dim(∪_{i=1}^{1000} Yi) ≤ 10, by the refinement of Helly Theorem in Remark I.2.11 all of them have a point in common. In other words, the original problem should have a feasible solution with negative objective value, which is impossible as Opt∗ = 0.

2.3.3 ⋆ Helly Theorem for infinite families of convex sets


In Helly Theorem as presented above we dealt with a family of finitely many
convex sets. To extend the statement to the case of infinite families, we need to
slightly strengthen the assumptions, essentially, to avoid complications stemming
from the following two situations:
• lack of closedness: every two (and in fact – finitely many) of convex sets Ai =
{x ∈ R : 0 < x ≤ 1/i}, i = 1, 2, . . ., have a point in common, while all the sets
have no common point;
• “intersection at ∞”: every two (and in fact – finitely many) of the closed convex
sets Ai = {x ∈ R : x ≥ i}, i = 1, 2, . . ., have a point in common, while all the
sets have no common point.
Resulting refined statement of Helly Theorem for a family of (possibly) in-
finitely many sets is as follows:

Theorem I.2.13 [Helly Theorem, II] Let F be an arbitrary family of convex


sets in Rn . Assume that

(i) for every collection of at most n + 1 sets from the family, the sets from
the collection have a point in common;
and
(ii) every set in the family is closed, and the intersection of the sets from a
certain finite subfamily of the family is bounded (e.g., one of the sets in the
family is bounded).
Then, all the sets from the family have a point in common.

Proof. By (i), Theorem I.2.10 implies that all finite subfamilies of F have
nonempty intersections, and also these intersections are convex (since intersec-
tion of a family of convex sets is convex by Proposition I.1.12); in view of (ii)
these intersections are also closed. Adding to F intersections of sets from finite
subfamilies of F, we get a larger family F ′ composed of closed convex sets, and
sets from a finite subfamily of this larger family again have a nonempty intersec-
tion. Moreover, from (ii) it follows that this new family contains a bounded set
Q. Since all the sets are closed, the family of sets
{Q ∩ Q′ : Q′ ∈ F}
consists of compact sets and has the property that the sets from every finite subfamily have a nonempty intersection. Then, by a well-known theorem from Real Analysis, such a family has a nonempty intersection2).

2 Here is the proof of this Real Analysis theorem: assume for contradiction that the intersection of the
compact sets Qα , α ∈ A, is empty. Choose a set Qα∗ from the family; for every x ∈ Qα∗ there is a
set Qx in the family which does not contain x (otherwise x would be a common point of all our
sets). Since Qx is closed, there is an open ball Vx centered at x which does not intersect Qx . The
balls Vx , x ∈ Qα∗ , form an open covering of the compact set Qα∗ . Since Qα∗ is compact, there
exists a finite subcovering Vx1 , . . . , VxN of Qα∗ by the balls from the covering, see Theorem B.19.
Since Qxi does not intersect Vxi , we conclude that the intersection of the finite subfamily
Qα∗ , Qx1 , . . . , QxN is empty, which is a contradiction.
3 Polyhedral representations and Fourier-Motzkin elimination

3.1 Polyhedral representations


Recall that by definition a polyhedral set X in Rn is the solution set of a finite
system of nonstrict linear inequalities in variables x ∈ Rn :

X = {x ∈ Rn : Ax ≤ b} = {x ∈ Rn : a_i⊤x ≤ bi, 1 ≤ i ≤ m}.

We call such a representation of X its polyhedral description. A polyhedral set


is always convex and closed (Proposition I.1.2). We next introduce the notion of
polyhedral representation of a set X ⊆ Rn .

Definition I.3.1 A set X ⊆ Rn is called polyhedrally representable if it


admits a representation of the form
X = x ∈ Rn : ∃u ∈ Rk : Ax + Bu ≤ c ,

(3.1)
where A, B are m × n and m × k matrices and c ∈ Rm . A representation
of X of the form of (3.1) is called a polyhedral representation of X, and the
variables u ∈ Rk in such a representation are called extra variables.

Geometrically, a polyhedral representation of a set X ⊆ Rn is its representation as the projection {x ∈ Rn : ∃u ∈ Rk : [x; u] ∈ Y} of a polyhedral set Y = {[x; u] ∈ Rn × Rk : Ax + Bu ≤ c}. Here, Y lives in the space of n + k vari-
ables x ∈ Rn and u ∈ Rk , and the polyhedral representation of X is obtained
by applying the linear mapping (the projection) [x; u] 7→ x : Rn+k → Rn of the
(n + k)-dimensional space of (x, u)-variables (the space where Y lives) to the
n-dimensional space of x-variables where X lives.

Figure I.5. Polyhedral representation of a hexagon in the xy-plane as the projection of a rotated 3D cube onto the plane.
Note that every polyhedrally representable set is the image under a linear
mapping (even a projection) of a polyhedral, and thus convex, set. It follows that
a polyhedrally representable set is definitely convex (Proposition I.1.21).
Example I.3.1 Every polyhedral set X = {x ∈ Rn : Ax ≤ b} is polyhedrally
representable: a polyhedral description of X is nothing but a polyhedral repre-
sentation with no extra variables (k = 0). Vice versa, a polyhedral representation
of a set X with no extra variables (k = 0) clearly is a polyhedral description of
the set (which therefore is polyhedral). ♢
n
Pn
Example I.3.2 Consider the set X = {x ∈ R : i=1 |xi | ≤ 1}. Note that this
initial description of X is not of the form {x ∈ Rn : Ax ≤ b}. Thus, from this
description of X, we cannot immediately say whether it is polyhedral or not.
However, X admits a polyhedral representation, e.g., the following representation
 

X = { x ∈ Rn : ∃u ∈ Rn : −ui ≤ xi ≤ ui (⇐⇒ |xi| ≤ ui), 1 ≤ i ≤ n, Σ_{i=1}^n ui ≤ 1 }.    (3.2)

Note that the set X in question can be described by a system of linear inequalities
in x-variables only, namely, as
X = { x ∈ Rn : Σ_{i=1}^n ϵi xi ≤ 1, ∀(ϵi ∈ {−1, +1}, 1 ≤ i ≤ n) },

thus, X is polyhedral. However, the above polyhedral description of X (which in


fact is minimal in terms of the number of inequalities involved) requires 2n in-
equalities — an astronomically large number when n is just few tens. In contrast,
the polyhedral representation of the same set given in (3.2) requires only n extra
variables u and 2n + 1 linear inequalities on x, u and so the “complexity” of this
representation is just linear in n. ♢
Example I.3.3 Given a finite set of vectors a^1, . . . , a^m ∈ Rn, consider their conic hull Cone{a^1, . . . , a^m} = { Σ_{i=1}^m λi a^i : λ ≥ 0 } (see section 1.2.4). From this

definition, it is absolutely unclear whether this set is polyhedral. In contrast to


this, its polyhedral representation is immediate:
Cone{a^1, . . . , a^m} = { x ∈ Rn : ∃λ ∈ Rm+ : x = Σ_{i=1}^m λi a^i }
= { x ∈ Rn : ∃λ ∈ Rm : −λ ≤ 0, x − Σ_{i=1}^m λi a^i ≤ 0, −x + Σ_{i=1}^m λi a^i ≤ 0 }.

In other words, the original description of X is nothing but its polyhedral repre-
sentation (in slight disguise), with λi ’s in the role of extra variables. ♢

3.2 Every polyhedrally representable set is polyhedral


(Fourier-Motzkin elimination)
A surprising and deep fact is that the situation in Example I.3.2 above is quite
general.

Theorem I.3.2 Every polyhedrally representable set is polyhedral.

Proof: Fourier-Motzkin Elimination. Recalling the definition of a polyhe-


drally representable set, our claim can be rephrased equivalently as follows:
The projection of a polyhedral set Y in the space Rn+k of (x, u)-variables onto the subspace Rn of x-variables is a polyhedral set in Rn.

Note that it suffices to prove this claim in the case of exactly one extra variable
since the projection which reduces the dimension by k — “eliminates” k extra
variables — is the result of k subsequent projections, every one reducing the
dimension by 1, “eliminating” the extra variables one by one.
Thus, consider a polyhedral set with variables x ∈ Rn and u ∈ R, i.e.,
Y := {[x; u] ∈ Rn+1 : a_i⊤x + bi u ≤ ci, 1 ≤ i ≤ m}.

We want to prove that the projection of Y onto the space of x-variables, i.e.,
X := {x ∈ Rn : ∃u ∈ R: Ax + bu ≤ c} ,
is polyhedral. To see this, let us split the indices of the inequalities defining Y
into three groups (some of these groups can be empty):
• inequalities with bi = 0: I0 := {i : bi = 0}. These inequalities with index i ∈ I0
do not involve u at all;

• inequalities with bi > 0: I+ := {i : bi > 0}. An inequality with index i ∈ I+ can be rewritten equivalently as u ≤ b_i^{−1}[ci − a_i⊤x], and it imposes a (depending on x) upper bound on u;

• inequalities with bi < 0: I− := {i : bi < 0}. An inequality with index i ∈ I− can be rewritten equivalently as u ≥ b_i^{−1}[ci − a_i⊤x], and it imposes a (depending on x) lower bound on u.
We can now clearly answer the question of when x can be in X, that is, when
x can be extended, by some u, to a point (x, u) from Y : this is the case if and
only if, first, x satisfies all inequalities with i ∈ I0 , and, second, the inequalities
with i ∈ I+ giving the upper bounds on u specified by x are compatible with the
inequalities with i ∈ I− giving the lower bounds on u specified by x, meaning
that every lower bound is less than or equal to every upper bound (the latter is
necessary and sufficient to be able to find a value of u which is greater than or
equal to all lower bounds and less than or equal to all upper bounds). Thus,
X = { x ∈ Rn : a_i⊤x ≤ ci ∀i ∈ I0;  b_j^{−1}(cj − a_j⊤x) ≤ b_k^{−1}(ck − a_k⊤x) ∀j ∈ I−, ∀k ∈ I+ }.

We see that X is given by finitely many nonstrict linear inequalities in x-variables


only, as claimed.
The outlined procedure for building polyhedral descriptions (i.e., polyhedral
representations not involving extra variables) for projections of polyhedral sets is
called Fourier-Motzkin elimination.
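The elimination step described in the proof is short enough to write down directly. The following sketch (our own illustration; the function name and the dense-matrix representation of the system are our choices) eliminates the last variable of a system D z ≤ e and returns a system in the remaining variables with the same projection:

```python
import numpy as np

def fm_eliminate_last(D, e):
    """One Fourier-Motzkin step: given {[x; u] : D [x; u] <= e} with a scalar
    last variable u, return (D', e') describing the projection onto x."""
    D, e = np.asarray(D, float), np.asarray(e, float)
    A, b = D[:, :-1], D[:, -1]               # row i reads a_i^T x + b_i u <= e_i
    I0, Ip, Im = b == 0, b > 0, b < 0
    rows, rhs = list(A[I0]), list(e[I0])     # inequalities not involving u
    # For every lower bound (j in I-) and upper bound (k in I+), require
    # b_j^{-1}(e_j - a_j^T x) <= b_k^{-1}(e_k - a_k^T x), i.e.,
    # (a_k/b_k - a_j/b_j)^T x <= e_k/b_k - e_j/b_j.
    for aj, bj, ej in zip(A[Im], b[Im], e[Im]):
        for ak, bk, ek in zip(A[Ip], b[Ip], e[Ip]):
            rows.append(ak / bk - aj / bj)
            rhs.append(ek / bk - ej / bj)
    return np.array(rows), np.array(rhs)

# Example: project the square {-1 <= x <= 1, -1 <= u <= 1} onto the x-axis.
D = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
e = np.array([1.0, 1.0, 1.0, 1.0])
print(fm_eliminate_last(D, e))   # x <= 1, -x <= 1, plus one trivial row 0 <= 2
```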

3.2.1 Some applications


As an immediate application of Fourier-Motzkin elimination, let us take a linear program min_x {c⊤x : Ax ≤ b} and look at the set T of possible objective values of all its feasible solutions (if any):
T := {t ∈ R : ∃x ∈ Rn : c⊤x = t, Ax ≤ b}.

Rewriting the linear equality c⊤ x = t as a pair of opposite inequalities, we see


that T is polyhedrally representable, and the above definition of T is nothing
but a polyhedral representation of this set, with x in the role of the vector of
extra variables. By Fourier-Motzkin elimination, T is polyhedral – this set is
given by a finite system of nonstrict linear inequalities in variable t only. Thus,
we immediately see that T is
1. either empty (meaning that the LP in question is infeasible),
2. or is a nonempty set unbounded from below, of the form {t ∈ R : −∞ < t ≤ b} with
b ∈ R ∪ {+∞} (meaning that the LP is feasible and unbounded),
3. or is a below bounded nonempty set of the form {t ∈ R : a ≤ t ≤ b} with a ∈ R
and +∞ ≥ b ≥ a. In this case, the LP is feasible and bounded, and its optimal
value is a.
Note that given the list of linear inequalities defining T (this list can be built
algorithmically by Fourier-Motzkin elimination as applied to the original poly-
hedral representation of T ), we can easily detect which one of the above cases
indeed takes place, i.e., we can identify the feasibility and boundedness status

of the LP and to find its optimal value. When it is finite (case 3. above), we
can use the Fourier-Motzkin elimination backward, starting with t = a ∈ T and
extending this value to a pair (t, x) with t = a = c⊤ x and Ax ≤ b, that is, we
can augment the optimal value by an optimal solution. Thus, we can say that
Fourier-Motzkin elimination is a finite Real Arithmetics algorithm which allows
to check whether an LP is feasible and bounded, and when it is the case, allows
to find the optimal value and an optimal solution.
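To make the preceding discussion concrete, here is how the elimination sketch from section 3.2 can be used, on a toy instance, to read off the optimal value of an LP (again only an illustration of ours: we order the variables as (t, x1, x2), eliminate the x-variables one by one, and then inspect the lower bounds imposed on t):

```python
import numpy as np
# Assumes fm_eliminate_last from the sketch in section 3.2 is in scope.
# LP: min x1 + x2  s.t.  x1 >= 0, x2 >= 0, x1 + x2 >= 1   (optimal value 1).
# The equality t = x1 + x2 is written as a pair of opposite inequalities.
D = np.array([[ 1.0, -1.0, -1.0],    #  t - x1 - x2 <= 0
              [-1.0,  1.0,  1.0],    # -t + x1 + x2 <= 0
              [ 0.0, -1.0,  0.0],    # -x1 <= 0
              [ 0.0,  0.0, -1.0],    # -x2 <= 0
              [ 0.0, -1.0, -1.0]])   # -x1 - x2 <= -1
e = np.array([0.0, 0.0, 0.0, 0.0, -1.0])
for _ in range(2):                   # eliminate x2, then x1
    D, e = fm_eliminate_last(D, e)
coef = D[:, 0]                       # remaining inequalities: coef * t <= e
lower = [ei / ci for ci, ei in zip(coef, e) if ci < 0]   # c*t <= e with c < 0 means t >= e/c
print(max(lower))                    # 1.0, the optimal value of the LP
```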
On the other hand, Fourier-Motzkin elimination is completely impractical since
the elimination process can blow up exponentially the number of inequalities.
Indeed, from the description of the process it is clear that if a polyhedral set
is given by m linear inequalities, then eliminating one variable, we can end up with as many as m^2/4 inequalities (this is what happens if there are m/2 indices
in I+ , m/2 indices in I− and I0 = ∅). Eliminating the next variable, we again
can “nearly square” the number of inequalities, and so on. Thus, the number of
inequalities in the description of T can become astronomically large even when
the dimension of x is something like 10.
The actual importance of Fourier-Motzkin elimination is of theoretical nature.
For example, the Linear Programming (LP)-related reasoning we have just carried
out shows that
every feasible and bounded LP problem is solvable, i.e., it has an optimal
solution.
(We will revisit this result in more detail in section 9.3.1). This is a fundamental
fact for LP, and the above reasoning (even with the justification of the elimina-
tion “charged” to it) is, to the best of our knowledge, the shortest and most
transparent way to prove this fundamental fact. Another application of the fact
that polyhedrally representable sets are polyhedral is the Homogeneous Farkas
Lemma to be stated and proved in section 4.1; this lemma will be instrumental
in numerous subsequent theoretical developments.

3.3 Calculus of polyhedral representations


The fact that polyhedral sets are exactly the same as polyhedrally representable
ones does not nullify the notion of a polyhedral representation. The point is
that a set can admit “quite compact” polyhedral representation involving extra
variables and require astronomically large, completely meaningless for any prac-
tical purpose, number of inequalities in its polyhedral description (think about
Example I.3.2 and the associated set (3.2) when n = 100). Moreover, polyhe-
dral representations admit a kind of “fully algorithmic calculus.” Specifically, it
turns out that all basic convexity-preserving operations (cf. Proposition I.1.21)
as applied to polyhedral operands preserve polyhedrality. Moreover, polyhedral
representations of the results are readily given by polyhedral representations of
the operands. Here is the “algorithmic polyhedral analogy” of Proposition I.1.21:

1. Taking finite intersection: Let Mi , 1 ≤ i ≤ m, be polyhedral sets in Rn given



by their polyhedral representations
Mi = {x ∈ Rn : ∃ui ∈ Rki : Ai x + Bi ui ≤ ci}, 1 ≤ i ≤ m.
Then, the intersection of the sets Mi is polyhedral with an explicit polyhedral representation, i.e.,
∩_{i=1}^m Mi = {x ∈ Rn : ∃u = [u1; . . . ; um] ∈ Rk1+...+km : Ai x + Bi ui ≤ ci, 1 ≤ i ≤ m}.
2. Taking direct product: Let Mi ⊆ Rni , 1 ≤ i ≤ m, be polyhedral sets given by
polyhedral representations
Mi = {xi ∈ Rni : ∃ui ∈ Rki : Ai xi + Bi ui ≤ ci}, 1 ≤ i ≤ m.
Then, the direct product
M1 × . . . × Mm := {x = [x1; . . . ; xm] : xi ∈ Mi, 1 ≤ i ≤ m}
of the sets is a polyhedral set with explicit polyhedral representation, i.e.,
M1 × . . . × Mm = {x = [x1; . . . ; xm] ∈ Rn1+...+nm : ∃u = [u1; . . . ; um] ∈ Rk1+...+km : Ai xi + Bi ui ≤ ci, 1 ≤ i ≤ m}.
3. Arithmetic summation and multiplication by reals: Let Mi ⊆ Rn , 1 ≤ i ≤ m,
be polyhedral sets given by polyhedral representations
Mi = {x ∈ Rn : ∃ui ∈ Rki : Ai x + Bi ui ≤ ci}, 1 ≤ i ≤ m,
and let λ1, . . . , λm be reals. Then, the set λ1M1 + . . . + λmMm := {x = λ1x1 + . . . + λmxm : xi ∈ Mi, 1 ≤ i ≤ m} is polyhedral with explicit polyhedral representation, specifically,
λ1M1 + . . . + λmMm = {x ∈ Rn : ∃(xi ∈ Rn, ui ∈ Rki, 1 ≤ i ≤ m) : x ≤ Σ_i λi xi, x ≥ Σ_i λi xi, Ai xi + Bi ui ≤ ci, 1 ≤ i ≤ m}.
4. Taking the image under an affine mapping: Let M ⊆ Rn be a polyhedral set
given by polyhedral representation
M = {x ∈ Rn : ∃u ∈ Rk : Ax + Bu ≤ c},
and let P(x) = Px + p : Rn → Rm be an affine mapping. Then, the image of M under this mapping, i.e., P(M) := {Px + p : x ∈ M}, is a polyhedral set with explicit polyhedral representation given by
P(M) = {y ∈ Rm : ∃(x ∈ Rn, u ∈ Rk) : y ≤ Px + p, y ≥ Px + p, Ax + Bu ≤ c}.

5. Taking the inverse image under affine mapping: Let M ⊆ Rn be polyhedral


set given by polyhedral representation
M = {x ∈ Rn : ∃u ∈ Rk : Ax + Bu ≤ c},


and let P(y) = P y + p : Rm → Rn be an affine mapping. Then, the inverse


image of M under this mapping, i.e., P −1 (M ) := {y ∈ Rm : P y + p ∈ M }, is
a polyhedral set with explicit polyhedral representation given by
P^{−1}(M) = {y ∈ Rm : ∃u ∈ Rk : A(Py + p) + Bu ≤ c}.

Note that the rules for intersection, taking direct products and taking inverse
images, as applied to polyhedral descriptions of operands, lead to polyhedral de-
scriptions of the results. In contrast to this, the rules for taking sums with coeffi-
cients and images under affine mappings heavily exploit the notion of polyhedral
representation: even when the operands in these rules are given by polyhedral
descriptions, there are no simple ways to point out polyhedral descriptions of the
results.
Absolutely straightforward justification of the above calculus rules is the sub-
ject of Exercise I.27.
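As a small illustration of how mechanical these rules are (a sketch of ours; the tuple convention (A, B, c) for a representation {x : ∃u : Ax + Bu ≤ c} and the function name are our choices), here is rule 4, the image of a polyhedrally represented set under an affine map y = Px + p, assembled as block matrices:

```python
import numpy as np

def image_representation(A, B, c, P, p):
    """Given M = {x : exists u, A x + B u <= c} and the map x -> P x + p,
    return (A', B', c') with P(M) = {y : exists w, A' y + B' w <= c'},
    where w = (x, u) are the extra variables, following calculus rule 4:
        y <= P x + p,   y >= P x + p,   A x + B u <= c."""
    m, n = A.shape
    k = B.shape[1]
    q = P.shape[0]                       # dimension of the image space
    Anew = np.vstack([np.eye(q), -np.eye(q), np.zeros((m, q))])
    Bnew = np.vstack([np.hstack([-P, np.zeros((q, k))]),
                      np.hstack([ P, np.zeros((q, k))]),
                      np.hstack([ A, B])])
    cnew = np.concatenate([p, -p, c])
    return Anew, Bnew, cnew

# Example: M is the unit square in R^2 (no extra variables), mapped by the
# summation y = x1 + x2; the image is the segment [0, 2].
A = np.vstack([np.eye(2), -np.eye(2)])
B = np.zeros((4, 0))
c = np.array([1.0, 1.0, 0.0, 0.0])
P = np.array([[1.0, 1.0]])
p = np.array([0.0])
Anew, Bnew, cnew = image_representation(A, B, c, P, p)
print(Anew.shape, Bnew.shape, cnew.shape)   # (6, 1) (6, 2) (6,)
```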
Finally, we note that the problem of minimizing a linear form c⊤ x over a set
M given by its polyhedral representation, i.e.,
M = {x ∈ Rn : ∃u ∈ Rk : Ax + Bu ≤ c},


can be immediately reduced to an explicit LP program, namely,


min_{x,u} { c⊤x : Ax + Bu ≤ c }.
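For instance, minimizing c⊤x over the unit ℓ1-ball of Example I.3.2 becomes, via the representation (3.2), an ordinary LP in the variables (x, u). A brief sketch using an off-the-shelf LP solver (our own illustration; scipy is assumed to be available):

```python
import numpy as np
from scipy.optimize import linprog

# Minimize c^T x over X = {x : ||x||_1 <= 1}, using representation (3.2):
# -u_i <= x_i <= u_i, sum_i u_i <= 1, with LP variables z = (x, u).
n = 3
c = np.array([1.0, -2.0, 0.5])
obj = np.concatenate([c, np.zeros(n)])                # objective ignores u
A_ub = np.vstack([
    np.hstack([ np.eye(n), -np.eye(n)]),              #  x_i - u_i <= 0
    np.hstack([-np.eye(n), -np.eye(n)]),              # -x_i - u_i <= 0
    np.hstack([np.zeros((1, n)), np.ones((1, n))]),   #  sum_i u_i <= 1
])
b_ub = np.concatenate([np.zeros(2 * n), [1.0]])
res = linprog(obj, A_ub=A_ub, b_ub=b_ub, bounds=(None, None))
print(res.x[:n], res.fun)    # minimizer close to (0, 1, 0), optimal value -2
```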

A reader with some experience in Linear Programming definitely used a lot of


the above “calculus of polyhedral representations” when building LPs (perhaps
without a clear understanding of what in fact is going on, same as Molière’s
Monsieur Jourdain all his life has been speaking prose without knowing it).
4 General Theorem on Alternative and Linear Programming Duality

4.1 Homogeneous Farkas Lemma


Let a1 , . . . , aN be vectors from Rn , and let a be another vector from Rn . Here,
we address the question: when does a belong to the cone spanned by the vectors
a1 , . . . , aN , i.e., when can a be represented as a linear combination of ai with
nonnegative coefficients? We immediately observe the following evident necessary
condition:
if a = Σ_{i=1}^N λi ai [where λi ≥ 0, i = 1, . . . , N],
then every vector h that has nonnegative inner products with all ai's should also have nonnegative inner product with a:
{ a = Σ_{i=1}^N λi ai with λi ≥ 0 ∀i, and h⊤ai ≥ 0 ∀i } =⇒ h⊤a ≥ 0.

In fact, this evident necessary condition is also sufficient. This is given by the
Homogeneous Farkas Lemma.

Lemma I.4.1 [Homogeneous Farkas Lemma (HFL)] Let a, a1 , . . . , aN be


vectors from Rn . The vector a is a conic combination of the vectors ai (linear
combination with nonnegative coefficients), i.e., a ∈ Cone {a1 , . . . , aN }, if and
only if every vector h satisfying h⊤ ai ≥ 0, i = 1, . . . , N , satisfies also h⊤ a ≥ 0.
In other words, a homogeneous linear inequality
a⊤ h ≥ 0
in variable h is a consequence of the system
a_i⊤ h ≥ 0,   1 ≤ i ≤ N
of homogeneous linear inequalities if and only if it can be obtained from the
inequalities of the system by “admissible linear aggregation” – taking their
weighted sum with nonnegative weights.

Proof. The necessity – the “only if” part of the statement – was proved before
the Homogeneous Farkas Lemma was formulated. Let us prove the “if” part of
the lemma. Thus, we assume that h⊤ a ≥ 0 is a consequence of the homogeneous

system h⊤ ai ≥ 0 ∀i, i.e., every vector h satisfying h⊤ ai ≥ 0 ∀i satisfies also


h⊤ a ≥ 0, and let us prove that a is a conic combination of the vectors ai .
An “intelligent” proof goes as follows. The set Cone {a1 , . . . , aN } of all conic
combinations of a1 , . . . , aN is polyhedrally representable (see Example I.3.3) and
as such is polyhedral (Theorem I.3.2). Hence, we have
Cone{a1, . . . , aN} = {x ∈ Rn : p_j⊤x ≥ bj, 1 ≤ j ≤ J}.    (4.1)
Now, observe that 0 ∈ Cone{a1, . . . , aN}, and thus we conclude that bj ≤ 0 for all j ≤ J. Moreover, since λai ∈ Cone{a1, . . . , aN} for every i and every λ ≥ 0, we deduce λ p_j⊤ai ≥ bj for all i, j and all λ ≥ 0, whence p_j⊤ai ≥ 0 for all i and j. For every j, the relation p_j⊤ai ≥ 0 for all i implies, by the premise of the statement we want to prove, that p_j⊤a ≥ 0. Then, as 0 ≥ bj for all j, we see that p_j⊤a ≥ bj for all j, meaning that a indeed belongs to Cone{a1, . . . , aN} due to (4.1).
This very short and elegant proof of Homogeneous Farkas Lemma is a nice
illustration of the power of Fourier-Motzkin elimination.
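As a computational aside (a sketch of ours, not from the text; the data and the scipy-based formulations are assumptions for illustration), both sides of the Homogeneous Farkas Lemma can be explored numerically: membership a ∈ Cone{a1, . . . , aN} is a linear feasibility problem in the weights λ, and non-membership is certified by a vector h with h⊤ai ≥ 0 for all i and h⊤a < 0, which can be searched for by another LP.

# A small sketch: testing whether a is in Cone{a_1,...,a_N} and, if not, producing a
# certificate h with h^T a_i >= 0 for all i and h^T a < 0. Data are made up.
import numpy as np
from scipy.optimize import linprog

A = np.array([[1.0, 0.0], [1.0, 1.0]]).T     # columns a_1 = (1,0), a_2 = (1,1)
a = np.array([0.0, 1.0])                     # query vector

# (1) try to write a as a conic combination of the a_i's
memb = linprog(np.zeros(A.shape[1]), A_eq=A, b_eq=a, bounds=[(0, None)] * A.shape[1])
print("a in Cone{a_i}:", memb.status == 0)

# (2) look for a certificate h: minimize a^T h over {h : A^T h >= 0, |h_j| <= 1}
cert = linprog(a, A_ub=-A.T, b_ub=np.zeros(A.shape[1]), bounds=[(-1, 1)] * a.size)
if cert.status == 0 and cert.fun < -1e-9:
    print("certificate h =", cert.x)         # h^T a_i >= 0 for all i, yet h^T a < 0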

4.2 Certificates for feasibility and infeasibility


Consider a (finite) system of scalar inequalities with n unknowns. To be as general
as possible, we do not assume for the time being the inequalities to be linear, and
we allow for both non-strict and strict inequalities in the system, as well as for
equalities. Since an equality can be represented by a pair of non-strict inequalities,
our system can always be written as
fi (x) Ωi 0, i = 1, . . . , m, (S)
where every Ωi is either the relation “>” or the relation “≥”, and we assume
m ≥ 1, which is the only case of interest here.
The most basic question about (S) is
(Q) Does (S) have a solution, i.e., is (S) feasible?
Knowing how to answer the question (Q) enables us to answer many other
questions. For example, verifying whether a given real number a is a lower bound
on the optimal value Opt∗ of a linear program
min_x { c⊤x : Ax ≥ b }        (LP)

is the same as verifying whether the system


−c⊤ x + a > 0
Ax − b ≥ 0
has no solutions.
The general question (Q) above is too difficult, and it makes sense to pass from
it to a seemingly simpler one:
(Q′ ) How do we certify that (S) has, or does not have, a solution?
Imagine that you are very smart and know the correct answer to (Q) ; how can
you convince everyone that your answer is correct? What can be an “evident for
everybody” validity certificate for your answer?
If your claim is that (S) is feasible, a certificate can be just to point out a
solution x∗ to (S). Given this certificate, one can substitute x∗ into the system
and check whether x∗ is indeed a solution.
Suppose now that your claim is that (S) has no solutions. What can be a "simple certificate" of this claim? How can one certify a negative statement? This is a highly nontrivial problem not just for mathematics; for example, in criminal law, how should someone accused of murder prove his innocence? The "real life" answer to the question "how to certify a negative statement" is discouraging: such a statement normally cannot be certified1. In mathematics, the standard way to justify a negative statement A, like "there is no solution to such and such system of constraints" (e.g., "there are no solutions to the equation x5 + y 5 = z 5 with positive integer variables x, y, z"), is to lead the statement opposite to A (in our example, "the solution exists") to a contradiction. That is, assume that the opposite statement is true and derive consequences until a clearly false statement is obtained; when this happens, we know that the opposite statement is false (since legitimate consequences of a true statement must be true), and therefore A must be true. In general, there is no recipe for deriving a contradiction from a statement which in fact is false; this is why certifying negative statements usually is difficult.
Fortunately, finite systems of linear inequalities are simple enough to allow
for a recipe for certifying their infeasibility: we start with the assumption that
a solution exists and then demonstrate a contradiction in a very specific way
– by taking a weighted sum of the inequalities in the system with nonnegative aggregation weights to produce a contradictory inequality.
Let us start with a simple illustration: we would like to certify infeasibility of
the following system of inequalities in variables u, v, w:
5u −6v −4w > 2
+4v −2w ≥ −1
−5u +7w ≥ 1
Let us assign to these inequalities the "aggregation weights" 2, 3, 2, multiply the inequalities by the respective weights, and sum up the resulting inequalities:
2× 5u −6v −4w > 2
+
3× +4v −2w ≥ −1
+
2× −5u +7w ≥ 1
(∗) 0 · u +0 · v +0 · w > 3
The resulting aggregated inequality (∗) is contradictory – it has no solutions at all.
1 This is where the court rule “a person is presumed innocent until proven guilty” comes from –
instead of requesting from the accused to certify the negative statement “I did not commit the
crime,” the court requests from the prosecution to certify the positive statement “The accused did
commit the crime.”
At the same time, (∗) is a consequence of our system – by construction of (∗), every solution to the original system of three inequalities is also feasible for (∗). Taken together, these two observations say that the system has no solutions, and the vector [2; 3; 2] of our aggregation weights can be seen as an infeasibility certificate – taking the weighted sum of the inequalities from the system with the corresponding nonnegative weights leads the system to a contradiction.
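Checking such a certificate is a purely mechanical computation; the following throwaway sketch (ours, not from the text) verifies the aggregation above numerically.

# Verifying the certificate from the example above. Rows of (C, rhs) encode
# 5u - 6v - 4w > 2, 4v - 2w >= -1, -5u + 7w >= 1; the first inequality is strict.
import numpy as np

C = np.array([[ 5.0, -6.0, -4.0],
              [ 0.0,  4.0, -2.0],
              [-5.0,  0.0,  7.0]])
rhs = np.array([2.0, -1.0, 1.0])
weights = np.array([2.0, 3.0, 2.0])           # nonnegative aggregation weights

lhs = weights @ C                             # coefficients of the aggregated inequality
b   = weights @ rhs                           # its right hand side
print(lhs, b)                                 # -> [0. 0. 0.] and 3.0: the aggregate reads 0 > 3,
                                              #    a contradictory inequality, so the system is infeasible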
As applied to a general system of inequalities (S), a similar approach to certify-
ing infeasibility would be to assign the inequalities with nonnegative aggregation
weights, multiply them by these weights and sum up the resulting inequalities,
arriving at an aggregated inequality, which, due to its origin, is a consequence
of system (S), meaning that every solution to the system solves the aggregated
inequality as well. It follows that when the aggregated inequality is contradictory,
i.e., it has no solutions at all, the original system (S) must be infeasible as well.
When this happens, the collection of weights used to generate the contradictory
consequence inequality can be viewed as an infeasibility certificate for (S).
Let us look at what the outlined approach means when (S) is composed of finitely many linear inequalities:

a⊤i x Ωi bi, i = 1, . . . , m   [where Ωi is either ">" or "≥"].        (S)
In this case the “aggregated inequality” is linear as well:
( Σ_{i=1}^{m} λi ai )⊤ x  Ω  Σ_{i=1}^{m} λi bi,        (Comb(λ))

where Ω is “>” whenever λi > 0 for at least one i with Ωi = “ > ”, and Ω is “≥”
otherwise. Now, when can a linear inequality
d⊤ x Ω e
be contradictory? Of course, it can happen only when d = 0. Furthermore, in
this case, whether the inequality is contradictory depends on the relation Ω and
the value of e: if Ω = “ > ”, then the inequality is contradictory if and only if
e ≥ 0, and if Ω = “ ≥ ”, then it is contradictory if and only if e > 0. We have
established the following simple result:

Proposition I.4.2 Consider a system of linear inequalities in unknowns x ∈ Rn:

a⊤i x > bi, i = 1, . . . , ms,
a⊤i x ≥ bi, i = ms + 1, . . . , m.        (S)
Let us associate with (S) two systems of linear inequalities and equations
with unknowns λ ∈ Rm:

 TI :   (a)  λ ≥ 0,
        (b)  Σ_{i=1}^{m} λi ai = 0,
        (cI) Σ_{i=1}^{m} λi bi ≥ 0,
        (dI) Σ_{i=1}^{ms} λi > 0;

 TII :  (a)   λ ≥ 0,
        (b)   Σ_{i=1}^{m} λi ai = 0,
        (cII) Σ_{i=1}^{m} λi bi > 0.
If at least one of the systems TI, TII is feasible, then the system (S) is infeasible.
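Certificates of the type described in Proposition I.4.2 can also be searched for algorithmically. The sketch below is an illustration of ours, not from the text: the instance is made up, and for simplicity only non-strict inequalities a⊤i x ≥ bi are considered, so that only a TII-type certificate is relevant; we look for λ ≥ 0 with Σi λi ai = 0 and Σi λi bi > 0, normalizing the latter sum to 1.

# Searching for a T_II-type infeasibility certificate of a system a_i^T x >= b_i.
import numpy as np
from scipy.optimize import linprog

A = np.array([[ 1.0,  0.0],     # x1       >= 1
              [ 0.0,  1.0],     # x2       >= 1
              [-1.0, -1.0]])    # -x1 - x2 >= -1   (i.e. x1 + x2 <= 1)
b = np.array([1.0, 1.0, -1.0])

m = A.shape[0]
A_eq = np.vstack([A.T, b])                    # sum_i lambda_i a_i = 0, sum_i lambda_i b_i = 1
b_eq = np.concatenate([np.zeros(A.shape[1]), [1.0]])
res = linprog(np.zeros(m), A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * m)
print("certificate lambda:", res.x if res.status == 0 else None)   # e.g. [1, 1, 1]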

4.3 General Theorem on Alternative


Proposition I.4.2 states that in some cases it is easy to certify infeasibility of
a linear system of inequalities: a “simple certificate” is a solution to another
system of linear inequalities. Note, however, that the existence of a certificate of
this latter type so far is only a sufficient, but not a necessary, condition for the
infeasibility of (S). A fundamental result in the theory of linear inequalities is
that this sufficient condition is in fact also necessary:

Theorem I.4.3 [General Theorem on Alternative (GTA)] Consider the notation and setting of Proposition I.4.2. System (S) has no solutions if and only if at least one of the systems TI or TII is feasible.

Proof. GTA is a more or less straightforward corollary of the Homogeneous Farkas Lemma. Indeed, in view of Proposition I.4.2, all we need to prove is that if (S) has no solution, then at least one of the systems TI or TII is feasible.
Thus, assume that (S) has no solutions, and let us look at the consequences. Let
us associate with (S) the following system of homogeneous linear inequalities in
variables x, τ , ϵ:
(a) τ − ϵ ≥ 0,
(b) a⊤i x − bi τ − ϵ ≥ 0, i = 1, . . . , ms,        (4.2)
(c) a⊤i x − bi τ ≥ 0, i = ms + 1, . . . , m.

First, we claim that in every solution to (4.2), one has ϵ ≤ 0. Indeed, assuming that (4.2) has a solution x, τ, ϵ with ϵ > 0, we conclude from (4.2.a) that τ > 0. Then, from (4.2.b–c) it follows that τ⁻¹x is a solution to (S), while we assumed (S) is infeasible. Therefore, we must have ϵ ≤ 0 in every solution to (4.2).
Now, we have that the homogeneous linear inequality

−ϵ ≥ 0 (4.3)

is a consequence of the system of homogeneous linear inequalities (4.2). Then, by Homogeneous Farkas Lemma, there exist nonnegative weights ν, λi, i = 1, . . . , m, such that the aggregation of the inequalities of (4.2) with these weights results in
precisely the consequence inequality (4.3), i.e.,


(a) Σ_{i=1}^{m} λi ai = 0,
(b) −Σ_{i=1}^{m} λi bi + ν = 0,        (4.4)
(c) −Σ_{i=1}^{ms} λi − ν = −1.
Recall that by their origin, ν and all λi are nonnegative. Now, it may happen that λ1, . . . , λms are all zero. In this case ν = 1 by (4.4.c), and relations (4.4.a–b) say that λ1, . . . , λm is a solution to TII. In the remaining case (that is, when not all λ1, . . . , λms are zero, or, which is the same, when Σ_{i=1}^{ms} λi = 1 − ν > 0), the same relations (4.4.a–b) say that λ1, . . . , λm is a solution to TI.
Remark I.4.4 We have derived GTA from Homogeneous Farkas Lemma (HFL).
Note that HFL is nothing but a special case of GTA. Indeed, identifying when a
linear inequality a⊤x ≤ b is a consequence of the system a⊤i x ≤ bi, 1 ≤ i ≤ m (this is the question answered by HFL in the case of b = b1 = . . . = bm = 0) is exactly the same as identifying when the system of inequalities

a⊤x > b,  a⊤i x ≤ bi, i = 1, . . . , m        (∗)
in variables x is infeasible, and what in the latter case is said by GTA, is exactly
what HFL states: when b = b1 = . . . = bm = 0, the system (∗) is infeasible if and
only if the vector a is a conic combination of the vectors a1 , . . . , am . Thus, it is
completely sensible that GTA, in full generality, was derived from its indepen-
dently justified special case, HFL. ■

4.4 Corollaries of GTA


Let us explicitly state two very useful principles derived from the General Theo-
rem on Alternative:
A. A system of finitely many linear inequalities
a⊤i x Ωi bi, i = 1, . . . , m   [where Ωi ∈ {"≥", ">"}]
has no solutions if and only if one can aggregate the inequalities of the system
in a linear fashion (i.e., multiplying the inequalities by nonnegative weights,
summing the resulting inequalities and passing, if necessary, from an inequality
a⊤ x > b to the inequality a⊤ x ≥ b) to get a contradictory inequality, namely,
either the inequality 0⊤ x ≥ 1, or the inequality 0⊤ x > 0.
B. A linear inequality
a⊤0 x Ω0 b0   [where Ω0 ∈ {"≥", ">"}]
in variables x is a consequence of a feasible system of linear inequalities
a⊤i x Ωi bi, i = 1, . . . , m,   [where Ωi ∈ {"≥", ">"}]
if and only if it can be obtained by linear aggregation with nonnegative weights
from the inequalities of the system and the trivial identically true inequality
0⊤ x > −1.
In fact, when all Ωi in B are non-strict, B can be reformulated equivalently as follows.

Proposition I.4.5 [Inhomogeneous Farkas Lemma] A linear inequality

a⊤x ≤ b

is a consequence of the feasible system of linear inequalities

a⊤i x ≤ bi, 1 ≤ i ≤ m

if and only if there exist nonnegative aggregation weights λi, i = 1, . . . , m, such that

a = Σ_{i=1}^{m} λi ai   and   b ≥ Σ_{i=1}^{m} λi bi.

We would like to emphasize that the preceding principles are highly nontrivial
and very deep. Consider, e.g., the following system of 4 linear inequalities in two
variables u, v:
−1 ≤ u ≤ 1,
−1 ≤ v ≤ 1.
These inequalities clearly imply that

u² + v² ≤ 2,        (!)

which in turn implies, by the Cauchy-Schwarz inequality, the linear inequality u + v ≤ 2:

u + v = 1 × u + 1 × v ≤ √(1² + 1²) · √(u² + v²) ≤ (√2)² = 2.        (!!)
The concluding inequality u + v ≤ 2 is linear and is a consequence of the original
feasible system, and so we could have simply relied on Principle B to derive it.
On the other hand, in the preceding demonstration of this linear consequence
inequality both steps (!) and (!!) are “highly nonlinear.” It is absolutely unclear
a priori why the same consequence inequality can, as it is stated by Principle B, be
derived from the system in a “linear” manner as well (of course it can – it suffices
just to sum up the two inequalities u ≤ 1 and v ≤ 1). In fact, the Inhomogeneous Farkas Lemma guarantees that hundreds of pages of an arbitrarily complicated (but correct!) demonstration that such and such linear inequality is a consequence of such and such feasible finite system of linear inequalities can be replaced by simply exhibiting nonnegative weights such that the target inequality is the weighted sum, with these weights, of the inequalities from the system and the identically true linear inequality. One should appreciate the elegance and depth of such a result!
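For the toy system above, the weights promised by the Inhomogeneous Farkas Lemma can be found mechanically. The following sketch (ours, not from the text; the LP formulation via scipy is an assumption for illustration) searches for nonnegative λ with Σi λi ai = [1; 1] and minimal Σi λi bi.

# Finding a linear-aggregation certificate for u + v <= 2 from -1 <= u, v <= 1,
# written as a_i^T x <= b_i: find lambda >= 0 with sum_i lambda_i a_i = (1, 1)
# and sum_i lambda_i b_i as small as possible (it should come out <= 2).
import numpy as np
from scipy.optimize import linprog

A = np.array([[ 1.0,  0.0],    #  u <= 1
              [-1.0,  0.0],    # -u <= 1
              [ 0.0,  1.0],    #  v <= 1
              [ 0.0, -1.0]])   # -v <= 1
b = np.array([1.0, 1.0, 1.0, 1.0])
a_target, b_target = np.array([1.0, 1.0]), 2.0

res = linprog(b, A_eq=A.T, b_eq=a_target, bounds=[(0, None)] * len(b))
print("weights:", res.x, "aggregated rhs:", res.fun, "<=", b_target)
# expected: weights ~ [1, 0, 1, 0], i.e. simply sum u <= 1 and v <= 1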
Note that the General Theorem on Alternative and its corollaries A and B
heavily exploit the fact that we are speaking about linear inequalities. For exam-
ple, consider the following system of two quadratic and two linear inequalities in
two variables:
(a) u² ≥ 1,
(b) v² ≥ 1,
(c) u ≥ 0,
(d) v ≥ 0,
along with the quadratic inequality
(e) uv ≥ 1.
The inequality (e) is clearly a consequence of (a) – (d). However, if we extend the system of inequalities (a) – (d) by all "trivial" (i.e., identically true) linear and quadratic inequalities in 2 variables, like 0 > −1, u² + v² ≥ 0, u² + 2uv + v² ≥ 0, u² − 2uv + v² ≥ 0, etc., and ask whether (e) can be derived in a linear fashion
from the inequalities of the extended system, the answer will be negative. Thus,
Principle B fails to be true already for quadratic inequalities (which is a great
sorrow – otherwise there would be no difficult problems at all!).

4.5 Application: Linear Programming Duality


We are about to use the General Theorem on Alternative to obtain the basic re-
sults of the Linear Programming (LP) duality theory. To do so, we first introduce
some basic terminology about mathematical programming problems.

4.5.1 Preliminaries: Mathematical and Linear Programming problems
A (constrained) Mathematical Programming problem has the following form:
 
(P)    min_x { f(x) : x ∈ X, g(x) ≡ [g1(x); . . . ; gm(x)] ≤ 0, h(x) ≡ [h1(x); . . . ; hk(x)] = 0 },        (4.5)

where
• [domain] X is called the domain of the problem,
• [objective] f is called the objective (function) of the problem,
• [constraints] gi , i = 1, . . . , m, are called the (functional) inequality constraints,
and hj , j = 1, . . . , k, are called the equality constraints 2) .
We always assume that X ̸= ∅ and that the objective and the constraints are
well-defined on X. Moreover, we typically skip indicating X when X = Rn .
We use the following standard terminology related to (4.5)
2 Rigorously speaking, the constraints are not the functions gi , hj , but the relations gi (x) ≤ 0,
hj (x) = 0. We will use the word “constraints” in both of these senses, and it will always be clear
what is meant. For example, we will say that “x satisfies the constraints” to refer to the relations,
and we will say that “the constraints are differentiable” to refer to the underlying functions.
• [feasible solution] a point x ∈ Rn is called a feasible solution to (4.5), if x ∈ X,


gi (x) ≤ 0, i = 1, . . . , m, and hj (x) = 0, j = 1, . . . , k, i.e., if x satisfies all
restrictions imposed by the formulation of the problem.
– [feasible set] the set of all feasible solutions is called the feasible set of the
problem.
– [feasible problem] a problem with a nonempty feasible set (i.e., the one which
admits feasible solutions) is called feasible (or consistent).
• [optimal value] the optimal value of the problem refers to the quantity
Opt := inf_x { f(x) : x ∈ X, g(x) ≤ 0, h(x) = 0 } if the problem is feasible, and Opt := +∞ if the problem is infeasible.

– [below boundedness] the problem is called below bounded, if its optimal value
is > −∞, i.e., if the objective is bounded from below on the feasible set.
• [optimal solution] a point x ∈ Rn is called an optimal solution to (4.5), if x is
feasible and f (x) ≤ f (x′ ) for any other feasible solution x′ , i.e., if
x ∈ Argmin_{x′} { f(x′) : x′ ∈ X, g(x′) ≤ 0, h(x′) = 0 }.

– [solvable problem] a problem is called solvable, if it admits optimal solutions.


– [optimal set] the set of all optimal solutions to a problem is called its optimal
set.
Remark I.4.6 In the above description of a Mathematical Programming prob-
lem and related basic notions, like feasibility, solvability, boundedness, etc., we
“standardize” the situation by assuming that the objective should be minimized,
and the inequality constraints are of the form gi (x) ≤ 0. Needless to say, we
can also speak about problems where the objective should be maximized and/or
some of the inequality constraints are of the form gi(x) ≥ 0. There is no difficulty in reducing these "more general" forms of optimization problems to our standard form: maximizing f(x) is the same as minimizing −f(x), and the constraint of the form gi(x) ≥ 0 is the same as the constraint −gi(x) ≤ 0. While this standardization is always possible, from time to time we take the liberty to speak about maximization problems and/or ≥-type constraints. With this in mind, it is worth mentioning that when working with maximization problems, we should update the notions of optimal value, problem's boundedness, and optimal solution. For
a maximization problem,

• the optimal value is the supremum of the values of the objective at feasible
solutions, and is, by definition, −∞ for infeasible problems, and
• boundedness means boundedness of the objective from above on the feasible
set (or, which is the same, the fact that the optimal value is < +∞),
• optimal solution is a feasible solution such that the objective value at this
solution is greater than or equal to the objective value at every feasible solution.
Needless to say, when "standardizing" a maximization problem, i.e., replacing maximization of f(x) with minimization of −f(x), boundedness and optimal solutions remain intact while the optimal value "is negated," i.e., a real number a becomes −a, and ±∞ becomes ∓∞. ■
Linear Programming problems. A Mathematical Programming problem
(P) is called Linear Programming (LP) problem, if

• X = Rn is the entire space,


• f, g1 , . . . , gm are real-valued affine functions on Rn , that is, functions of the form a⊤x + b, and
• there are no equality constraints at all.

Note that in principle we could allow for linear equality constraints hj(x) := a⊤j x + bj = 0. However, a constraint of this type can be equivalently represented by a pair of opposite linear inequalities a⊤j x + bj ≤ 0 and −a⊤j x − bj ≤ 0. To save space and words (and, as we have just explained, with no loss of generality), in the sequel we will focus on inequality constrained linear programming problems.

4.5.2 Dual to an LP problem: the origin


Consider an LP problem

Opt = min_x { c⊤x : Ax − b ≥ 0 }        (LP)

where A ∈ Rm×n is the matrix with rows a⊤1, a⊤2, . . . , a⊤m.

The motivation for constructing the problem dual to an LP problem is the desire
to generate, in a systematic way, lower bounds on the optimal value Opt of (LP).
Consider the problem

min_x { f(x) : gi(x) ≥ bi, i = 1, . . . , m }.

An evident way to bound from below a given function f (x) in the domain given
by a system of inequalities

gi (x) ≥ bi , i = 1, . . . , m, (4.6)

is offered by what is called Lagrange duality. We will discuss Lagrange duality in full detail for general functions in Part IV. Here, as a brief precursor, let us examine the special case when we are dealing with linear functions only.
Lagrange Duality:
• Let us look at all inequalities which can be obtained from (4.6) by
linear aggregation, i.e., the inequalities of the form
Σ_{i=1}^{m} yi gi(x) ≥ Σ_{i=1}^{m} yi bi        (4.7)

with the “aggregation weights” yi ≥ 0 for all i. Note that the inequality
(4.7), due to its origin, is valid on the entire set X of feasible solutions
of (4.6).
• Depending on the choice of aggregation weights, it may happen that the left hand side in (4.7) is ≤ f(x) for all x ∈ Rn. Whenever this is the case, the right hand side Σ_{i=1}^{m} yi bi of (4.7) is a lower bound on f(x) for any x ∈ X. It follows that
• The optimal value of the problem
max_y { Σ_{i=1}^{m} yi bi : (a) y ≥ 0, (b) Σ_{i=1}^{m} yi gi(x) ≤ f(x) ∀x ∈ Rn }        (4.8)
is a lower bound on the values of f on the set of feasible solutions to
the system (4.6).
Let us now examine what happens with the Lagrange duality when f and gi are homogeneous linear functions, i.e., f(x) = c⊤x and gi(x) = a⊤i x for all i = 1, . . . , m. In this case, the requirement (4.8.b) merely says that c = Σ_{i=1}^{m} yi ai (or, which is the same, A⊤y = c due to the origin of the matrix A). Thus, problem (4.8) becomes the Linear Programming problem

max_y { b⊤y : A⊤y = c, y ≥ 0 },        (LP∗)

which is nothing but the LP dual of (LP).


By the construction of the dual problem (LP∗ ), we immediately have
[Weak Duality] The optimal value in (LP∗ ) is less than or equal to the
optimal value in (LP).
In fact, the "less than or equal to" in the latter statement is "equal to," provided that the optimal value Opt in (LP) is a number (i.e., (LP) is feasible and below bounded, in which case Fourier-Motzkin elimination guarantees that Opt is a real number). To see that this indeed is the case, note that a real number a is a lower bound on Opt if and only if c⊤x ≥ a holds for all x satisfying Ax ≥ b, or, which is the same, if and only if the system of linear inequalities

−c⊤x > −a,  Ax ≥ b        (Sa)

has no solution. By the General Theorem on Alternative, this happens if and only if at least one of a certain pair of systems of linear inequalities does have a solution. More precisely,
(*) (Sa) has no solutions if and only if at least one of the following two systems of linear inequalities in m + 1 unknowns has a solution:

 TI :   (a)  λ = [λ0; λ1; . . . ; λm] ≥ 0,
        (b)  −λ0 c + Σ_{i=1}^{m} λi ai = 0,
        (cI) −λ0 a + Σ_{i=1}^{m} λi bi ≥ 0,
        (dI) λ0 > 0;

or

 TII :  (a)   λ = [λ0; λ1; . . . ; λm] ≥ 0,
        (b)   −λ0 c + Σ_{i=1}^{m} λi ai = 0,
        (cII) −λ0 a + Σ_{i=1}^{m} λi bi > 0.

Now assume that (LP) is feasible. We first claim that under this assumption
(Sa ) has no solutions if and only if TI has a solution. The implication “TI has a
solution =⇒ (Sa ) has no solution” is readily given by the preceding remarks. To
verify the inverse implication, assume that (Sa ) has no solution and the system
Ax ≥ b has a solution, and let us prove that then TI has a solution. If TI has no
solution, then by (*), TII must have a solution. Moreover, since any solution to
TII with λ0 > 0 is also a solution to TI, we must have λ0 = 0 for every solution to TII. But the fact that TII has a solution λ with λ0 = 0 is independent of the values of c and a; if this were the case, it would mean, by the same General Theorem on Alternative, that, e.g., the following instance of (Sa):

0⊤x > −1,  Ax ≥ b

has no solution as well. But then the system Ax ≥ b would have no solution – a contradiction to the assumption that (LP) is feasible.
Now, if TI has a solution, this system has a solution with λ0 = 1 as well (to see
this, pass from a solution λ to the one λ/λ0 ; this construction is well-defined, since
λ0 > 0 for every solution to TI ). Now, an (m + 1)-dimensional vector λ = [1; y]
is a solution to TI if and only if the m-dimensional vector y solves the following
system of linear inequalities and equations

y ≥ 0,
A⊤y ≡ Σ_{i=1}^{m} yi ai = c,        (D)
b⊤y ≥ a.
We summarize these observations below.

Proposition I.4.7 If system (D) in unknowns y, a associated with the LP program (LP) has a solution (ȳ, ā), then ā is a lower bound on the optimal value in (LP). Vice versa, if (LP) is feasible and ā is a lower bound on the optimal value of (LP), then ā can be extended by a properly chosen m-dimensional vector ȳ to a solution to (D).

We see that the entity responsible for lower bounds on the optimal value of
(LP) is the system (D): every solution to the latter system induces a bound of
this type, and in the case when (LP) is feasible, all lower bounds can be obtained
from solutions to (D). Now note that if (y, a) is a solution to (D), then the pair
(y, b⊤ y) also is a solution to the same system, and the lower bound b⊤ y on Opt
is not worse than the lower bound a. Thus, as far as lower bounds on Opt are
concerned, we lose nothing by restricting ourselves to the solutions (y, a) of (D)
with a = b⊤ y. The best lower bound on Opt given by (D) is therefore the optimal
value of the problem
max_y { b⊤y : A⊤y = c, y ≥ 0 },

which is nothing but the dual problem (LP∗) of (LP). Note that (LP∗) is itself a Linear Programming problem.
All we know about the dual problem so far is the following:

Proposition I.4.8 Whenever y is a feasible solution to (LP∗), the corresponding value of the dual objective b⊤y is a lower bound on the optimal value Opt in (LP). If (LP) is feasible, then for every a ≤ Opt there exists a feasible solution y of (LP∗) with b⊤y ≥ a.

4.5.3 Linear Programming Duality Theorem


Proposition I.4.8 is in fact equivalent to the following complete statement of LP
Duality Theorem.

Theorem I.4.9 [Duality Theorem in Linear Programming] Consider a linear programming problem

min_x { c⊤x : Ax ≥ b }        (LP)

along with its dual

max_y { b⊤y : A⊤y = c, y ≥ 0 }.        (LP∗)

Then,
1) [Primal-dual symmetry] The dual problem is an LP program, and its
dual is equivalent to the primal problem;
2) [Weak duality] The value of the dual objective at every dual feasible
solution is less than or equal to the value of the primal objective at every
primal feasible solution, so that the dual optimal value is less than or equal
to the primal one;
3) [Strong duality] The following 5 properties are equivalent to each other:
(i) The primal is feasible and bounded below.
(ii) The dual is feasible and bounded above.
(iii) The primal is solvable.
(iv) The dual is solvable.
(v) Both primal and dual are feasible.
Moreover, if any one of these properties (and then, by the equivalence just
stated, every one of them) holds, then the optimal values of the primal and
the dual problems are equal to each other.
Finally, if at least one of the problems in the primal-dual pair is feasible,
then the optimal values in both problems are the same, i.e., either both are
finite and equal to each other, or both are +∞ (i.e., primal is infeasible and
dual is not bounded above), or both are −∞ (i.e., primal is unbounded below
and dual is infeasible).

There is one last remark we should make to complete the story of primal
and dual objective values given in Theorem I.4.9: in fact it is possible to have
both primal and dual problems infeasible simultaneously (see Exercise I.38). This
is the only case when the primal and the dual optimal values (+∞ and −∞,
respectively) differ from each other.
Proof. 1) This part is quite straightforward: writing the dual problem (LP∗ ) in
our standard form, we get

min_y { −b⊤y : [Im; A⊤; −A⊤] y − [0; c; −c] ≥ 0 },
where Im is the m × m identity matrix. Applying the duality transformation to the latter problem, we come to the problem

max_{ξ,η,ζ} { 0⊤ξ + c⊤η + (−c)⊤ζ : ξ ≥ 0, η ≥ 0, ζ ≥ 0, ξ + Aη − Aζ = −b },
which is clearly equivalent to (LP) (after we set x = ζ − η and eliminate ξ).


2) This part follows from the origin of the dual and is thus immediately given
by Proposition I.4.8.
3) We prove the following implications.
(i) =⇒ (iv): If the primal is feasible and bounded below, its optimal value Opt
(which of course is a lower bound on itself) can, by Proposition I.4.8, be (non-
strictly) majorized by a quantity b⊤ y ∗ , where y ∗ is a feasible solution to (LP∗ ).
Then, of course, b⊤y∗ = Opt by the already proven statement of item 2). On the
other hand, by Proposition I.4.8, the optimal value in the dual is less than or
equal to Opt. Thus, we conclude that the optimal value in the dual is attained
and is equal to the optimal value in the primal.
(iv) =⇒ (ii): This is evident by the definition of solvability.
(ii) =⇒ (iii): This implication, in view of the primal-dual symmetry, follows from
the already justified implication (i) =⇒ (iv).
(iii) =⇒ (i): This is evident by the definition of solvability.
We have shown that (i)≡(ii)≡(iii)≡(iv) and that the first (and consequently
each) of these four equivalent properties implies that the optimal value in the
primal problem is equal to the optimal value in the dual one. All that remains
is to prove the equivalence between (i)–(iv) and (v). This is immediate: (i)–(iv),
of course, imply (v); vice versa, in the case of (v) the primal is not only feasible,
but also bounded below (this is an immediate consequence of the feasibility of
the dual problem, see part 2)), and (i) follows.
It remains to verify that if one problem in the primal-dual pair is feasible, then
the primal and the dual optimal values are equal to each other. By primal-dual
symmetry it suffices to consider the case when the primal problem is feasible. If
also the primal is bounded from below, then by what has already been proved
the dual problem is feasible and the primal and dual optimal values coincide
with each other. If the primal problem is unbounded from below, then the primal
optimal value is −∞ and by Weak Duality the dual problem is infeasible, so that
the dual optimal value is −∞.
An immediate corollary of the LP Duality Theorem is the following necessary
and sufficient optimality condition in LP.

Theorem I.4.10 [Necessary and sufficient optimality conditions in Linear Programming] Consider an LP program (LP) along with its dual (LP∗) as in Theorem I.4.9. A pair (x, y) of primal and dual feasible solutions is composed of optimal solutions to the respective problems if and only if we have

yi [Ax − b]i = 0, i = 1, . . . , m,        [complementary slackness]

or equivalently, if and only if

c⊤x − b⊤y = 0.        [zero duality gap]

Proof. Indeed, the "zero duality gap" optimality condition is an immediate consequence of the fact that the value of the primal objective at every primal feasible solution is greater than or equal to the value of the dual objective at every dual feasible solution, while the optimal values in the primal and the dual are equal to each other whenever one of the problems is feasible, see Theorem I.4.9. The
equivalence between the “zero duality gap” and the “complementary slackness”
optimality conditions is given by the following computation: whenever x is primal
feasible and y is dual feasible, we have
y ⊤ (Ax − b) = (A⊤ y)⊤ x − b⊤ y = c⊤ x − b⊤ y,
where the second equality follows from dual feasibility (i.e., A⊤ y = c). Thus, for a
primal-dual feasible pair (x, y), the duality gap vanishes if and only if y ⊤ (Ax−b) =
0, and the latter, due to y ≥ 0 and Ax−b ≥ 0, happens if and only if yi [Ax−b]i = 0
for all i, that is, if and only if the complementary slackness takes place.
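As a numerical sanity check of the Duality Theorem and of Theorem I.4.10 (a sketch of ours with made-up data, not part of the text), one can solve a small primal-dual pair and inspect the duality gap and the products yi[Ax − b]i.

# Solving a small primal-dual pair: (LP) min{c^T x : A x >= b}, (LP*) max{b^T y : A^T y = c, y >= 0}.
import numpy as np
from scipy.optimize import linprog

A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
b = np.array([1.0, 1.0, 3.0])
c = np.array([1.0, 2.0])

# primal: min c^T x s.t. A x >= b (rewritten as -A x <= -b for linprog)
prim = linprog(c, A_ub=-A, b_ub=-b, bounds=[(None, None)] * c.size)
# dual: max b^T y s.t. A^T y = c, y >= 0 (linprog minimizes, so negate the objective)
dual = linprog(-b, A_eq=A.T, b_eq=c, bounds=[(0, None)] * b.size)

x, y = prim.x, dual.x
print("primal opt:", c @ x, " dual opt:", b @ y)       # equal, as the Duality Theorem predicts
print("complementary slackness:", y * (A @ x - b))     # ~0 componentwise (Theorem I.4.10)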
Geometry of primal-dual pair of LP problems. Consider the primal-dual pair of LP problems

min_{x∈Rn} { c⊤x : Ax − b ≥ 0 }        (LP)
max_{y∈Rm} { b⊤y : A⊤y = c, y ≥ 0 }        (LP∗)
as presented in section 4.5.2 and assume that the system of equality constraints
in the dual problem is feasible, so that there exists β ∈ Rm such that A⊤β = c.

[Figure I.6. Geometry of a primal-dual LP pair, m = 3. Q: the triangle △ABC; Q∗: the ray DD′; ξ: point A; ξ∗: point D. Pay attention to the orthogonality of the ray and the plane of the triangle, and to the orthogonality of the vectors OA and OD.]


It turns out3 that the pair (LP), (LP∗ ) possesses nice and transparent geometry.
Specifically, the data of the pair give rise to the following geometric entities:
• a pair of linear subspaces L, L∗ in Rm which are orthogonal complements to
each other
[L = Im(A), L∗ = Ker(A⊤ )],
• a pair of shift vectors d, d∗ ∈ Rm
[d = b, d∗ = −β],
which in turn give rise to
• the pair of convex sets Q = [L − d] ∩ Rm_+, Q∗ = [L∗ − d∗] ∩ Rm_+
[where Rm_+ = {u ∈ Rm : u ≥ 0}].
To solve (LP), (LP∗ ) to optimality is exactly the same as to find orthogonal to
each other vectors ξ ∈ Q and ξ∗ ∈ Q∗ . Such a pair ξ, ξ∗ gives rise to primal-dual
optimal pair(s) (x∗ , y ∗ ) (one can take as x∗ any x such that Ax − b = ξ and
set y ∗ = ξ∗ ), and every primal-dual optimal pair (x∗ , y ∗ ) can be obtained in this
manner from a pair of orthogonal to each other vectors ξ ∈ Q, ξ∗ ∈ Q∗ . The
required pairs ξ, ξ∗ exist if and only if both the sets Q and Q∗ are nonempty.
For illustration, see Figure I.6.

3 for derivations, see Exercise IV.7 addressing Conic Duality, of which LP duality is a special case.
5

Exercises for Part I

5.1 Elementaries
Exercise I.1 Mark in the following list the sets which are convex:
1. {x ∈ R2 : x1 + i²x2 ≤ 1, i = 1, . . . , 10}
2. {x ∈ R2 : x1² + 2i x1x2 + i²x2² ≤ 1, i = 1, . . . , 10}
3. {x ∈ R2 : x1² + i x1x2 + i²x2² ≤ 1, i = 1, . . . , 10}
4. {x ∈ R2 : x1² + 5x1x2 + 4x2² ≤ 1}
5. {x ∈ R10 : x1² + 2x2² + 3x3² + . . . + 10x10² ≤ 1000x1 − 999x2 + 998x3 − . . . + 992x9 − 991x10}
6. {x ∈ R2 : exp{x1} ≤ x2}
7. {x ∈ R2 : exp{x1} ≥ x2}
8. {x ∈ Rn : Σ_{i=1}^{n} xi² = 1}
9. {x ∈ Rn : Σ_{i=1}^{n} xi² ≤ 1}
10. {x ∈ Rn : Σ_{i=1}^{n} xi² ≥ 1}
11. {x ∈ Rn : max_{i=1,...,n} xi ≤ 1}
12. {x ∈ Rn : max_{i=1,...,n} xi ≥ 1}
13. {x ∈ Rn : max_{i=1,...,n} xi = 1}
14. {x ∈ Rn : min_{i=1,...,n} xi ≤ 1}
15. {x ∈ Rn : min_{i=1,...,n} xi ≥ 1}
16. {x ∈ Rn : min_{i=1,...,n} xi = 1}

Exercise I.2 Mark by T those of the following claims which always are true.
1. The linear image Y = {Ax : x ∈ X} of a linear subspace X is a linear subspace.
2. The linear image Y = {Ax : x ∈ X} of an affine subspace X is an affine subspace.
3. The linear image Y = {Ax : x ∈ X} of a convex set X is convex.
4. The affine image Y = {Ax + b : x ∈ X} of a linear subspace X is a linear subspace.
5. The affine image Y = {Ax + b : x ∈ X} of an affine subspace X is an affine subspace.
6. The affine image Y = {Ax + b : x ∈ X} of a convex set X is convex.
7. The intersection of two linear subspaces in Rn is always nonempty.
8. The intersection of two linear subspaces in Rn is a linear subspace.
9. The intersection of two affine subspaces in Rn is an affine subspace.
10. The intersection of two affine subspaces in Rn , when nonempty, is an affine subspace.
11. The intersection of two convex sets in Rn is a convex set.
12. The intersection of two convex sets in Rn , when nonempty, is a convex set.

Exercise I.3 ▲ Prove that the relative interior of a simplex with vertices y^0, . . . , y^m is exactly the set

{ Σ_{i=0}^{m} λi y^i : λi > 0, Σ_{i=0}^{m} λi = 1 }.

Exercise I.4 Which of the following claims is true:


1. The set X = {x : Ax ≤ b} is a cone if and only if X = {x : Ax ≤ 0}.
2. The set X = {x : Ax ≤ b} is a cone if and only if b = 0.
Exercise I.5 Suppose K is a closed cone. Prove that the set X = {x : Ax − b ∈ K} is a cone
if and only if X = {x : Ax ∈ K}.
Exercise I.6 ▲ Prove that if M is a nonempty convex set in Rn and ϵ > 0, then for every
norm ∥ · ∥ on Rn , the ϵ-neighborhood of M , i.e., the set
 
Mϵ = { y ∈ Rn : inf_{x∈M} ∥y − x∥ ≤ ϵ },

is convex.
Exercise I.7 Which of the following claims are always true? Explain why/why not.
1. The convex hull of a bounded set in Rn is bounded.
2. The convex hull of a closed set in Rn is closed.
3. The convex hull of a closed convex set in Rn is closed.
4. The convex hull of a closed and bounded set in Rn is closed and bounded.
5. The convex hull of an open set in Rn is open.
Exercise I.8 ▲ [This exercise together with its follow-up, i.e., Exercise II.9, and Exercise I.9
are the most boring exercises ever designed by the authors. Our excuse is that There is no royal
road to geometry (Euclid of Alexandria, c. 300 BC)]
Let A, B be nonempty subsets of Rn . Consider the following claims. If the claim is always
(i.e., for every data satisfying premise of the claim) true, give a proof; otherwise, give a counter
example.
1. If A ⊆ B, then Conv(A) ⊆ Conv(B).
2. If Conv(A) ⊆ Conv(B), then A ⊆ B.
3. Conv(A ∩ B) = Conv(A) ∩ Conv(B).
4. Conv(A ∩ B) ⊆ Conv(A) ∩ Conv(B).
5. Conv(A ∪ B) ⊆ Conv(A) ∪ Conv(B).
6. Conv(A ∪ B) ⊇ Conv(A) ∪ Conv(B).
7. If A is closed, so is Conv(A).
8. If A is closed and bounded, so is Conv(A).
9. If Conv(A) is closed and bounded, so is A.
Exercise I.9 ▲ Let A, B, C be nonempty subsets of Rn and D be a nonempty subset of Rm .
Consider the following claims. If the claim is always (i.e., for every data satisfying premise of
the claim) true, give a proof; otherwise, give a counter example.
1. Conv(A ∪ B) = Conv(Conv(A) ∪ B).
2. Conv(A ∪ B) = Conv(Conv(A) ∪ Conv(B)).
3. Conv(A ∪ B ∪ C) = Conv(Conv(A ∪ B) ∪ C).
4. Conv(A × D) = Conv(A) × Conv(D).
5. When A is convex, the set Conv(A ∪ B) (which is always the set of convex combinations
of several points from A and several points from B), can be obtained by taking convex
combinations of points with at most one of them taken from A, and the rest taken from B.
Similarly, if A and B are both convex, to get Conv(A ∪ B), it suffices to add to A ∪ B all
convex combinations of pairs of points, one from A and one from B.
5.2 Around ellipsoids 61

6. Suppose A is a set in Rn . Consider the affine mapping x 7→ P x + p : Rn → Rm , and


the image of A under this mapping, i.e., the set P A + p := {P x + p : x ∈ A}. Then,
Conv(P A + p) = P Conv(A) + p.
7. Consider an affine mapping y 7→ P (y) : Rm → Rn where P (y) := P y + p. Recall that
given a set X ⊆ Rn , its inverse image under the mapping P (·) is given by P −1 (X) := {y ∈
Rm : P (y) ∈ X}. Then, Conv(P −1 (A)) = P −1 (Conv(A)).
8. Consider an affine mapping y 7→ P (y) : Rm → Rn where P (y) := P y+p. Then, Conv(P −1 (A)) ⊆
P −1 (Conv(A)).
Exercise I.10 ▲ Let X1 , X2 ⊆ Rn be two nonempty sets, and define Y := X1 ∪ X2 and
Z := Conv(Y ). Consider the following claims. If the claim is always (i.e., for every data satisfying
premise of the claim) true, give a proof; otherwise, give a counter example.
1. Whenever X1 and X2 are both convex, so is Y .
2. Whenever X1 and X2 are both convex, so is Z.
3. Whenever X1 and X2 are both bounded, so is Y .
4. Whenever X1 and X2 are both bounded, so is Z.
5. Whenever X1 and X2 are both closed, so is Y .
6. Whenever X1 and X2 are both closed, so is Z.
7. Whenever X1 and X2 are both compact, so is Y .
8. Whenever X1 and X2 are both compact, so is Z.
9. Whenever X1 and X2 are both polyhedral, so is Y .
10. Whenever X1 and X2 are both polyhedral, so is Z.
11. Whenever X1 and X2 are both polyhedral and bounded, so is Y .
12. Whenever X1 and X2 are both polyhedral and bounded, so is Z.
Exercise I.11 Consider two families of convex sets given by {Fi }i∈I and {Gj }j∈J . Prove that
the following relation holds:
Conv( ∪_{i∈I, j∈J} (Fi ∩ Gj) ) ⊆ Conv( ∪_{j∈J} [Gj ∩ Conv(∪_{i∈I} Fi)] ).

Exercise I.12 Let C1 , C2 be two nonempty conic sets in Rn , i.e., for each i = 1, 2, for any
x ∈ Ci and t ≥ 0, we have t · x ∈ Ci as well. Note that C1 , C2 are not necessarily convex. Prove
that
1. C1 + C2 ̸= Conv(C1 ∪ C2 ) may happen if either C1 or C2 (or both) is nonconvex.
2. C1 + C2 = Conv(C1 ∪ C2 ) always holds if C1 , C2 are both convex.
3. C1 ∩ C2 = ∪_{α∈[0,1]} (αC1 ∩ (1 − α)C2) always holds if C1 , C2 are both convex.
Exercise I.13 ▲ Let X ⊆ Rn be a convex set with int X ̸= ∅, and consider the following set

K := cl {[x; t] : t > 0, x/t ∈ X} .

Prove that the set K is a closed cone with a nonempty interior.

5.2 Around ellipsoids


Exercise I.14 Verify each of the following statements:
1. Any ellipsoid E ⊂ Rn is the image of the unit Euclidean ball Bn = {x ∈ Rn : ∥x∥2 ≤ 1}
under a one-to-one affine mapping. That is, E ⊂ Rn can be represented as E = {x :
(x − c)⊤ C(x − c) ≤ 1} with C ≻ 0 and c ∈ Rn if and only if it can be represented as
E = {c + Du : u ∈ Bn } with nonsingular D, and in the latter representation D can be
selected to be symmetric positive definite.
2. Given C ≻ 0, D ≻ 0 and c, d ∈ Rn, the ellipsoid EC := {x : (x − c)⊤C(x − c) ≤ 1} is contained in the ellipsoid ED := {x : (x − c)⊤D(x − c) ≤ 1} if and only if C ⪰ D. If the ellipsoid EC is contained in the ellipsoid ED = {x : (x − d)⊤D(x − d) ≤ 1}, then C ⪰ D.
n
3. For a set U ⊂ R , let Vol(U ) be the ratio of the n-dimensional volume of U and the n-
dimensional volume of the unit ball Bn . Then, for an n-dimensional ellipsoid E represented
as {x = c + Du : ∥u∥2 ≤ 1} with nonsingular D we have

Vol(E) = |Det(D)|,

and when E is represented as {x : (x − c)⊤ C(x − c) ≤ 1} with C ≻ 0, we have

Vol(E) = Det^{−1/2}(C).

Exercise I.15 Given C ≻ 0, an ellipsoid {x : (x − a)⊤C(x − a) ≤ 1} is the solution set of the quadratic inequality x⊤Cx − 2(Ca)⊤x + (a⊤Ca − 1) ≤ 0. Prove that the solution set E of any quadratic inequality f(x) := x⊤Cx − c⊤x + σ ≤ 0 with positive semidefinite matrix C is convex.

5.3 Truss Topology Design


Exercise I.16 ♦ [First acquaintance with Truss Topology Design]
Preamble. What follows is the first exercise in a "Truss Topology Design" (TTD) series (other exercises in it are I.18, III.9, IV.11, IV.28). The underlying "real life" mechanical story is simple
enough to be told and rich enough to illustrate numerous constructions and results presented in
the main body of our textbook – ranging from Caratheodory Theorem to semidefinite duality,
demonstrating on a real life example how the theory works.
Trusses. Truss is a mechanical construction, like railroad bridge, electric mast, or Eiffel Tower,
composed of thin elastic bars linked with each other at nodes – points from physical space (3D
space for spatial, and 2D space for planar trusses).

[Figure I.7. Pratt Truss Bridge. Source: https://grabcad.com/library/pratt-truss-bridge-2]

When truss is subject to external load – collection of forces acting at the nodes – it starts to
deform, so that the nodes move a little bit, leading to elongations/shortenings of bars, which, in
turn, result in reaction forces. At the equilibrium, the reaction forces compensate the external
ones, and the truss capacitates certain potential energy, called compliance. Mechanics models
this story as follows.
• The nodes form a finite set p1 , . . . , pK of distinct points in physical space Rd (d = 2 for
planar, and d = 3 for spatial constructions). Virtual displacements of the nodes under the
load are somehow restricted by “support conditions;” we will focus on the case when some of
the nodes “are fixed” – cannot move at all (think about them as being in the wall), and the
remaining “are free” – their virtual displacements form the entire Rd . A virtual displacement
v of the nodal set can be identified with a vector of dimension M = dm, where m is the
number of free nodes; v is block vector with m d-dimensional blocks, indexed by the free
nodes, representing physical displacements of these nodes.
• There are N bars, i-th of them linking the nodes with indexes αi and βi (with at least one
of these nodes free) and with volume (3D or 2D, depending on whether the truss is spatial
or planar) ti .
5.3 Truss Topology Design 63

• An external load is a collection of physical forces – vectors from Rd – acting at the free nodes
(forces acting at the fixed nodes are of no interest – they are suppressed by the supports).
Thus, an external load f can be identified with block vector of the same structure as a virtual
displacement – blocks are indexed by free nodes and represent the external forces acting at
these nodes. Thus, displacements v of the nodal set and external loads f are vectors from
the space V of virtual displacements – M -dimensional block vectors with m d-dimensional
blocks.
• The bars and the nodes together specify the symmetric positive semidefinite M × M stiffness
matrix A of the truss. The role of this matrix is as follows. A displacement v ∈ V of the nodal
set results in reaction forces at free nodes (those at fixed nodes are of no interest – they are
compensated by supports); assembling these forces into M -dimensional block-vector, we get
a reaction, and this reaction is −Av. In other words, the potential energy capacitated in truss
under displacement v ∈ V of nodes is (1/2)v⊤Av, and reaction, as it should be, is the minus
gradient of the potential energy as a function of v 1 . At the equilibrium under external load
f , the total of the reaction and the load should be zero, that is, the equilibrium displacement
satisfies
Av = f (5.1)
Note that (5.1) may be unsolvable, meaning that the truss is crushed by the load in question.
Assuming the equilibrium displacement v exists, the truss at equilibrium capacitates potential
energy (1/2)v⊤Av; this energy is called the compliance of the truss w.r.t. the load. Compliance is a convenient measure of rigidity of the truss with respect to the load: the less the compliance, the better the truss withstands the load.
Let us build the stiffness matrix of a truss. As we have mentioned, the reaction forces originate
from elongations/shortenings of bars under displacement of nodes. Consider i-th bar linking
nodes with initial – prior to the external load being applied – positions ai = pαi and bi = pβi ,
and let us set
di = ∥bi − ai ∥2 , ei = [bi − ai ]/di .
Under displacement v ∈ V of the nodal set,
• positions of the nodes linked by the bar become ai + da and bi + db, where da := v^{αi}, db := v^{βi}, and v^γ is the γ-th block in v – the displacement of the γ-th node;
• as a result, elongation of the bar becomes, in the first-order in v approximation, e⊤i [db − da], and the reaction forces caused by this elongation are, by Hooke's Law2,

di⁻¹ Si ei e⊤i [db − da]    at node # αi,
−di⁻¹ Si ei e⊤i [db − da]   at node # βi,
0                           at all remaining nodes,
where Si = ti /di is the cross-sectional size of i-th bar. It follows that when both nodes linked
by i-th bar are free, the contribution of i-th bar to the reaction is
−ti bi b⊤i v,

1 This is called linearly elastic model; it is the linearized in displacements approximation of the actual
behavior of a loaded truss. This model works the better the smaller are the nodal displacements as
compared to the inter-nodal distances, and is accurate enough to be used in typical real-life
applications.
2 Hooke’s Law says that the magnitude of the reaction force caused by elongation/shortening of a bar
is proportional to Sd−1 δ, where S is bar’s cross-sectional size (area for spatial, and thickness for
planar truss), d is bar’s (pre-deformation) length, and δ is the elongation. With units of length
properly adjusted to bars’ material, the proportionality coefficient becomes 1, and this is what we
assume from now on.
where bi ∈ V is the vector with just two nonzero blocks:


— the block with index αi – this block is ei /di = [bi − ai ]/∥bi − ai ∥22 , and
— the block with index βi – this block is −ei /di = −[bi − ai ]/∥bi − ai ∥22 .
It is immediately seen that when just one of the nodes linked by i-th bar is free, the contri-
bution of i-th bar to the reaction is given by similar relations, but with one, rather than 2,
blocks in bi – the one corresponding to the free among the nodes linked by the bar.

The bottom line is that the stiffness matrix of a truss composed of N bars with volumes ti, 1 ≤ i ≤ N, is

A = A(t) := Σ_i ti bi b⊤i,

where bi ∈ V = RM are readily given by the geometry of nodal set and the indexes of nodes
linked by bar i.

Truss Topology Design problem. In the simplest Truss Topology Design (TTD) problem,
one is given

• a finite set of tentative nodes in 2D or 3D along with support conditions indicating which
of the nodes are fixed and which are free, and thus specifying the linear space V = RM of
virtual displacements of the nodal set,

• the set of N tentative bars – unordered pairs of (distinct from each other) nodes which are
allowed to be linked by bars, and the total volume W > 0 of the truss,

• An external load f ∈ V.

These data specify, as explained above, vectors bi ∈ RM, i = 1, . . . , N, and the stiffness matrix

A(t) = Σ_{i=1}^{N} ti bi b⊤i = B Diag{t1, . . . , tN} B⊤ ∈ SM        [B = [b1, . . . , bN]]

of the truss, which under the circumstances can be identified with the vector t ∈ RN_+ of bar volumes.
What we want is to find the truss of given volume capable to “withstand best of all” the given
load, that is, the one that minimizes the corresponding compliance.

When applying the TTD model, one starts with dense grid of tentative nodes and broad list
of tentative bars (e.g., by allowing to link by a bar every pair of distinct from each other nodes,
with at least one of the nodes in the pair free). At the optimal truss yielded by the optimal
solution to the TTD problem, many tentative bars (usually vast majority of them) get zero
volumes, and significant part of the tentative nodes become unused. Thus, TTD problem in fact
is not about sizing – it allows one to recover the optimal structure of the construction; this is where "Topology Design" comes from.

To illustrate this point, here is a toy example (it will be our guinea pig in the entire series of
TTD exercises):
Console design: We want to design a 2D truss as follows:
• The set of tentative nodes is the 9 × 9 grid {[p; q] ∈ R2 : p, q ∈ {0, 1, . . . , 8}}, with the 9 leftmost nodes fixed and the remaining 72 nodes free, resulting in the M = 144-dimensional space V of virtual displacements.
• The external load f ∈ V = R144 is a single-force one, with the only nonzero force [0; −1] applied at the 5-th node of the rightmost column of nodes.
• We allow for all pairwise connections of pairs of distinct from each other nodes,
with at least one of these nodes free, resulting in N = 3204 tentative bars
• The total volume of truss is W = 1000.

[Figure I.8. Console. Panels: the 9 × 9 nodal grid (•: fixed nodes); the 3204 tentative bars; the optimal truss (38 bars, compliance 0.1914); the displacement under the load of interest. Bars and nodes' positions before (crosses) and after (gray dots) deformation; the gray segment starting at the rightmost node shows the external force.]

Important: From now on, speaking about the TTD problem, we always make the following assumption:

R:  Σ_{i=1}^{N} bi b⊤i ≻ 0.

Under this assumption, the stiffness matrix A(t) = Σ_i ti bi b⊤i associated with a truss t > 0 is positive definite, so that such a truss can withstand whatever load f. You can verify numerically that this is the case in Console design as stated above.
After this lengthy preamble (to justify its length, note that it is an investment in a series of exercises, rather than just one of them), let us pass to the exercise per se. Consider a TTD problem.
1. Prove that truss t ≥ 0 (recall that we identify truss with the corresponding vector of bar
volumes) is capable to carry load f if and only if the quadratic function
F(v) = f⊤v − (1/2) v⊤A(t)v
is bounded from above, and that whenever this takes place,
• the maximum of F over V is achieved

• the maximizers of F are exactly the equilibrium displacements v – those with

A(t)v = f,

and for such a displacement, one has


[max F =] F(v) = (1/2) v⊤A(t)v = (1/2) v⊤f
• the maximum value of F is exactly the compliance of the truss w.r.t. the load f
2. Prove that a real τ is an upper bound on the compliance of truss t ≥ 0 w.r.t. load f if and
only if the symmetric matrix

A = [B Diag{t}B⊤, f; f⊤, 2τ],        B = [b1, . . . , bN]

is positive semidefinite. As a result, pose the TTD problem as the optimization problem
Opt = min_{τ,t} { τ : [B Diag{t}B⊤, f; f⊤, 2τ] ⪰ 0, t ≥ 0, Σ_i ti = W }        (5.2)
Prove that the problem is solvable.


3. [computational study]
3.1. Solve the Console problem numerically and reproduce the numerical results presented
above.
3.2. Resolve the problem with the set of all possible tentative bars reduced to the subset of
“short” bars connecting neighboring nodes only:

Figure I.9. 262 ”short” tentative bars

and compare the resulting design and compliance to those in the previous item.

5.4 Around Caratheodory Theorem


Exercise I.17 ♦ Prove the following statements:
Let X ⊂ Rn be nonempty. Then,
1. if a point x can be represented as a convex combination of a collection of vectors from X,
then the collection can be selected to be affinely independent.
2. if a point x can be represented as a conic combination of a collection of vectors from X, then
the collection can be selected to be linearly independent.
Comment: Note that the claims above are refinements, albeit minor ones, of the Caratheodory
Theorem (plain and conic, respectively). Indeed, when M := Aff(X) and m is the dimension
of M , every affinely independent collection of points from X contains at most m + 1 points
(Proposition A.44), so that the first claim is equivalent to stating that if x ∈ Conv(X), then x is a
convex combination of at most m + 1 points from X. However, the vectors participating in such
a convex combination are not necessarily affinely independent, so that the first claim provides
a bit more information than the plain Caratheodory’s Theorem. Similarly, if L := Lin(X) and
m := dim L, then every linearly independent collection of vectors from X contains at most
m ≤ n points, that is, the second claim implies the Caratheodory’s Theorem in conic form, and
provides a bit more information than the latter theorem.
Exercise I.18 ♦ 3 Consider TTD problem, and let N be the number of tentative bars, M be
the dimension of the corresponding space of virtual displacements V, and f be an external load.
Prove that if truss t ≥ 0 can withstand load f with compliance ≤ τ for some given real number
τ , then there exists a truss t′ of the same total volume as t, with compliance w.r.t. f at most τ and with at most M + 1 bars of positive volume.
Exercise I.19 ♦ [Shapley-Folkman Theorem]
1. Prove that if a system of linear equations Ax = b with n variables and m equations has a
nonnegative solution, it has a nonnegative solution with at most m positive entries.
2. Let V1 , . . . , Vn be n nonempty sets in Rm , and define

V := Conv(V1 + V2 + . . . + Vn ).

1. Prove that
V = Conv(V1 ) + . . . + Conv(Vn ).

2. Prove Shapley-Folkman Theorem:


Let x ∈ V . Then, there exists a representation of x such that
x = x1 + . . . + xn , xi ∈ Conv(Vi ),
in which at least n − m of xi ’s belong to the respective sets Vi .
Comment: Shapley-Folkman Theorem says, informally, that when n ≫ m, summing up
n nonempty sets in Rm possesses certain “convexification property” – every point from
the convex hull V of the sum of our sets is the sum of points xi with all but m of them
belonging to Vi rather than to Conv(Vi ), and only ≤ m of the points belonging to Vi
“fractionally,” that is, belonging to Conv(Vi ), but not to Vi . This nice fact has numerous
useful applications.
Exercise I.20 ♦ Caratheodory’s Theorem in its plain and its conic forms are “existence”
statements: if a point x ∈ Rm is a convex, respectively conic, combination of points x1 , . . . , xN ,
then there exists a representation of x of the same type which involves at most (m + 1), respec-
tively, m, terms. Extract from the proofs of the theorems algorithms for finding these “short”
representations at the cost of solving at most N solvable systems of linear equations with at
most N variables and m equations each.

Exercise I.21 ♦ Prove Kirchberger’s Theorem:


Consider two sets of finitely many points X = {x^1, . . . , x^k} and Y = {y^1, . . . , y^m} in Rn such that k + m ≥ n + 2 and all the points x^1, . . . , x^k, y^1, . . . , y^m are distinct.


Assume that for any subset S ⊆ X ∪ Y composed of n + 2 points the convex hulls of
the sets X ∩ S and Y ∩ S do not intersect, i.e., Conv(X ∩ S) ∩ Conv(Y ∩ S) = ∅. Then,
the convex hulls of X and Y also do not intersect, i.e., Conv(X) ∩ Conv(Y ) = ∅.
3 Preceding exercise in the TTD series is I.16.
Hint: Assume, on the contrary, that the convex hulls of X and Y intersect, so that

Σ_{i=1}^{k} λi x^i = Σ_{j=1}^{m} µj y^j

for certain nonnegative λi with Σ_{i=1}^{k} λi = 1 and certain nonnegative µj with Σ_{j=1}^{m} µj = 1, and look at the expression of this type with the minimum possible total number of nonzero coefficients λi, µj.
Exercise I.22 ♦ [Follow-up to Shapley-Folkman Theorem]
1. Let X1, . . . , XK be nonempty convex sets in Rn and X = ∪_{k≤K} Xk. Prove that
   Conv(X) = { x = ∑_{k=1}^{K} λk xk : λk ≥ 0, xk ∈ Xk, ∀k ≤ K, ∑_{k=1}^{K} λk = 1 }.
2. Let Xk, k ≤ K, be nonempty bounded polyhedral sets in Rn given by polyhedral representations:
   Xk = { x ∈ Rn : ∃uk ∈ Rnk : Pk x + Qk uk ≤ rk }.
   Define X := ∪_{k≤K} Xk. Prove that the set Conv(X) is a polyhedral set given by the polyhedral representation
   Conv(X) = { x ∈ Rn : ∃xk ∈ Rn, uk ∈ Rnk, λk ∈ R, ∀k ≤ K :
      (a) Pk xk + Qk uk − λk rk ≤ 0, k ≤ K,
      (b) λk ≥ 0, ∑_{k=1}^{K} λk = 1,
      (c) x = ∑_{k=1}^{K} xk }.     (∗)
   Does the claim remain true when the assumption of boundedness of the sets Xk is lifted?
After two preliminary items above, let us pass to the essence of the matter. Consider the situation
as follows. We are given n nonempty and bounded polyhedral sets Xj ⊂ Rr , j = 1, . . . , n. We
will think of Xj as the “resource set” of the j-th production unit: entries in x ∈ Xj are amounts
of various resources, and Xj describes the set of vectors of resources available, in principle, for
j-th unit. Each production unit j can possibly use any one of its Kj < ∞ different production
plans. For each j = 1, . . . , n, the vector yj ∈ Rp representing the production of the j-th unit
depends on the vector xj of resources consumed by the unit and also on the production plan
utilized in the unit. In particular, the production vector yj ∈ Rp stemming from resources xj
under k-th plan can be picked by us, at our will, from the set
   Yjk[xj] := { yj ∈ Rp : zj := [xj; −yj] ∈ Vjk },
where Vjk, k ≤ Kj, are given bounded polyhedral "technological sets" of the units with projections onto the xj-plane equal to Xj, so that for every k ≤ Kj it holds
   xj ∈ Xj ⇐⇒ ∃yj : [xj; −yj] ∈ Vjk.     (5.3)
We assume that all the sets Vjk are given by polyhedral representations, and we define
   Vj := ∪_{k≤Kj} Vjk.

Let R ∈ Rr be the vector of total resources available to all n units and let P ∈ Rp be the vector of total demands for the products. For j ≤ n, we want to select xj ∈ Xj, kj ≤ Kj, and yj ∈ Yjkj[xj] in such a way that
   ∑_j xj ≤ R and ∑_j yj ≥ P.
That is, we would like to find zj = [xj; vj] ∈ Vj, j ≤ n, in such a way that ∑_j zj ≤ [R; −P].
Note that the presence of “combinatorial part” in our decision – selection of production plans
in finite sets – makes the problem difficult.
3. Apply Shapley-Folkman Theorem (Exercise I.19) to overcome, to some extent, the above
difficulty and come up with a good and approximately feasible solution.

5.5 Around Helly Theorem


Exercise I.23 ▲ [Alternative proof of Helly Theorem] The goal of this exercise is to build an
alternative proof of Helly’s Theorem, without using Radon’s Theorem.
1. Consider a system ai⊤x ≤ bi, i ≤ N, of N linear inequalities in variables x ∈ Rn. Helly's Theorem applied to the sets Ai := {x ∈ Rn : ai⊤x ≤ bi} gives us that
   (!) If a system ai⊤x ≤ bi, i ≤ N, of linear inequalities in variables x ∈ Rn is infeasible, so is a properly selected sub-system composed of at most n + 1 inequalities from the system.
Find an alternative proof of (!) without relying on Helly’s or Radon’s Theorems.
2. Extract from item 1 Helly’s Theorem for polyhedral sets: If A1 , . . . , AN , N ≥ n + 1, are
polyhedral sets in Rn and every n + 1 of these sets have a point in common, then all the sets
have a point in common.
3. Extract from item 2 Helly’s Theorem (Theorem I.2.10).
Exercise I.24 ▲ A0 , A1 , . . . , Am , m = 2025, are nonempty convex subsets of R2000 , and A0 is
a triangle (convex hull of 3 affinely independent vectors). Which of the claims below are always
(that is, for any A0 , . . . , Am satisfying the above assumptions) true:
1. If every 3 among the sets A0 , . . . , Am have a point in common, all m + 1 sets have a point
in common.
2. If every 4 among the sets A0 , . . . , Am have a point in common, all m + 1 sets have a point
in common.
3. If every 2001 among the sets A0 , . . . , Am have a point in common, all m + 1 sets have a point
in common.
Exercise I.25 ▲ Let Pi := {x ∈ Rn : Ai x ≤ bi } for i ∈ {1, . . . , m} and C := {x ∈ Rn : Dx ≥
d} be nonempty polyhedral sets. Suppose that for any n + 1 sets, Pi1 , . . . , Pin+1 , there is a
translate of C, i.e., the set C + u for some u ∈ Rn , which is contained in all Pi1 , . . . , Pin+1 .
Prove that there is a translate of C, which is contained in all of the sets P1 , . . . , Pm .
Exercise I.26 ▲ A cake contains 300 g of raisins (you may think of every one of them as a
3D ball of positive radius). John and Jill are about to divide the cake according to the following
rules:
• first, Jill chooses a point a in the cake;
• second, John makes a cut through a, that is, chooses a 2D plane Π passing through a and
takes the part of the cake on one side of the plane (both Π and the side are up to John, with
the only restriction that the plane should pass through a); all the rest goes to Jill.
1. Prove that it may happen that Jill cannot guarantee herself 76 g of the raisins.
2. Prove that Jill always can choose a in a way which guarantees her at least 74 g of the raisins.
3. Consider n-dimensional version of the problem, where the raisins are n-dimensional balls,
the cake is a domain in Rn , and “a cut” taken by John is defined as the part of the cake
contained in the half-space
n o
x ∈ Rn : e⊤ (x − a) ≥ 0 ,

where e ̸= 0 is the vector (“inner normal to the cutting hyperplane”) chosen by John. Prove
that for every ϵ > 0, Jill can guarantee to herself at least 300/(n+1) − ϵ g of raisins, but in general cannot guarantee to herself 300/(n+1) + ϵ g.
Remarks:
1. With some minor effort, you can prove that Jill can find a point which guarantees her 300/(n+1) g of raisins, and not just 300/(n+1) − ϵ g.
2. If, instead of dividing raisins, John and Jill were to divide in the same fashion a uniform convex cake (that is, a closed and bounded convex body X with a nonempty interior in Rn, the reward being the n-dimensional volume of the part a person gets), the results would change dramatically: choosing as the point the center of masses of the cake
   x̄ := (∫_X x dx) / (∫_X dx),
Jill would guarantee herself at least (n/(n+1))^n ≈ 1/e part of the cake. This is a not so easy corollary of the following extremely important and deep result:
Brunn-Minkowski Symmetrization Theorem: Let X be as above, and let [a, b] be the projection of X on an axis ℓ, say, on the last coordinate axis. Consider the "symmetrization" Y of X, i.e., Y is the set with the same projection [a, b] on ℓ such that for every hyperplane orthogonal to the axis ℓ and crossing [a, b], the intersection of Y with this hyperplane is an (n − 1)-dimensional ball centered at the axis with precisely the same (n − 1)-dimensional volume as the one of the intersection of X with the same hyperplane:
   {z ∈ Rn−1 : [z; c] ∈ Y} = {z ∈ Rn−1 : ∥z∥2 ≤ ρ(c)}, ∀c ∈ [a, b], and
   Voln−1({z ∈ Rn−1 : [z; c] ∈ Y}) = Voln−1({z ∈ Rn−1 : [z; c] ∈ X}), ∀c ∈ [a, b].
Then, Y is a closed convex set.

5.6 Around Polyhedral Representations


Exercise I.27 ▲ Justify calculus rules for polyhedral representations presented in section 3.3.
Exercise I.28 Given two sets U, V ⊆ Rm , we define
U + V = {x ∈ Rm : ∃u ∈ U, ∃v ∈ V : x = u + v} .
Let D := {x ∈ Rn : Ax + b + Qs ⊆ P, ∀s ∈ S} where the set ∅ ̸= P ⊂ Rm admits polyhedral
representation, the set ∅ ̸= S ⊂ Rk is given but arbitrary, and the sets ∅ ̸= Qs ⊂ Rm are
indexed by s ∈ S.
1. Suppose that S is a finite set and for each s ∈ S we have Qs = {qs}, i.e., Qs is a single point.
Then, will the set D be polyhedrally representable?
2. State sufficient conditions on the structure of sets Qs and S that will guarantee that the
resulting set D is polyhedral. Here, the goal is to have conditions as general as possible.
Among your sufficient conditions, can you identify at least some of those that are necessary?
Exercise I.29 ♦ For x ∈ Rn and an integer k, 1 ≤ k ≤ n, let sk(x) be the sum of the k largest entries in x. For example, s1(x) = maxi{xi}, sn(x) = ∑_{i=1}^{n} xi, and s3([3; 1; 2; 2]) = 3 + 2 + 2 = 7. For any integer k = 1, . . . , n, define
   Xk,n := {[x; t] ∈ Rn × R : sk(x) ≤ t}.
Observe that Xk,n is a polyhedral set. Indeed, sk(x) ≤ t holds if and only if for every k indices i1 < i2 < . . . < ik from {1, 2, . . . , n} we have xi1 + xi2 + . . . + xik ≤ t, which is nothing but a

linear inequality in variables x, t. Since there are (n choose k) possible ways of selecting k indices from {1, 2, . . . , n}, the number of linear inequalities describing Xk,n is (n choose k), and these linear inequalities give the polyhedral description of Xk,n. The point of this exercise is to demonstrate that Xk,n admits a "short" polyhedral representation, specifically,
   Xk,n = { [x; t] ∈ Rn × R : ∃z ∈ Rn, ∃s ∈ R : xi ≤ zi + s ∀i, z ≥ 0, ∑_{i=1}^{n} zi + ks ≤ t }.
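The following check is a minimal numerical sketch (assuming numpy and scipy are available; it is not the proof the exercise asks for). It relies on LP duality: the minimum of ∑i zi + ks over z ≥ 0, zi + s ≥ xi equals sk(x), so feasibility of the system in the "short" representation for a given t is exactly the condition sk(x) ≤ t.

```python
# Numerical sanity check of the "short" representation (assumes numpy/scipy).
import numpy as np
from scipy.optimize import linprog

def s_k(x, k):
    return float(np.sort(x)[-k:].sum())              # sum of the k largest entries

def s_k_via_short_representation(x, k):
    # min sum(z) + k*s  s.t.  z >= 0,  x_i <= z_i + s   (s free)
    n = len(x)
    c = np.r_[np.ones(n), k]
    A_ub = np.hstack([-np.eye(n), -np.ones((n, 1))])  # x_i - z_i - s <= 0
    b_ub = -np.asarray(x, dtype=float)
    bounds = [(0, None)] * n + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    return res.fun

rng = np.random.default_rng(0)
x = rng.normal(size=7)
for k in range(1, 8):
    assert abs(s_k(x, k) - s_k_via_short_representation(x, k)) < 1e-7
print("short representation of s_k confirmed on a random vector")
```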

Exercise I.30 ▲ [Computational study: Fourier-Motzkin elimination as an LP algorithm] It was mentioned in section 3.2.1 that Fourier-Motzkin elimination provides us with an algorithm
that terminates in finitely many steps for solving LP problems. This algorithm, however, is of
no computational value due to the potential rapid growth of the number of inequalities one may
need to handle when eliminating more and more variables. The goal of this exercise is to get an
impression of this phenomenon.
Our "guinea pig" will be the transportation problem with n unit-capacity suppliers and n unit-demand customers:
   min_{x,t} { t : t ≥ ∑_{i=1}^{n} ∑_{j=1}^{n} cij xij, ∑_i xij ≥ 1 ∀j, ∑_j xij ≤ 1 ∀i, xij ≥ 0 ∀i, j }.
This problem has n² + 1 variables and (n + 1)² linear inequality constraints. Let us solve it by applying Fourier-Motzkin elimination to project the feasible set of the problem onto the axis of the t-variable, that is, to build a finite system S of univariate linear inequalities specifying this projection.
How many inequalities do you think there will be in S when n = 1, 2, 3, 4? Check your intuition
by implementing and running the F-M elimination, assuming, for the sake of definiteness, that
cij = 1 for all i, j.
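A minimal sketch of such an experiment (assuming a standard Python interpreter; the helper functions below are illustrative, not part of the exercise statement). It uses exact rational arithmetic and performs no redundancy removal, which is exactly why the inequality count can explode.

```python
from fractions import Fraction
from itertools import product

def fm_eliminate(rows, j):
    """Eliminate variable j from a system of pairs (a, b) encoding a . x <= b."""
    pos = [r for r in rows if r[0][j] > 0]
    neg = [r for r in rows if r[0][j] < 0]
    new_rows = [r for r in rows if r[0][j] == 0]
    for (ap, bp), (an, bn) in product(pos, neg):
        lam, mu = -an[j], ap[j]                         # positive multipliers
        a = [lam * u + mu * v for u, v in zip(ap, an)]  # coefficient at j becomes 0
        new_rows.append((a, lam * bp + mu * bn))
    return new_rows

def transportation_system(n):
    """Constraints of the problem above with c_ij = 1; variables [t, x_11, ..., x_nn]."""
    N = n * n + 1
    idx = lambda i, j: 1 + i * n + j
    rows = []
    a = [Fraction(0)] * N
    a[0] = Fraction(-1)
    for i in range(n):
        for j in range(n):
            a[idx(i, j)] = Fraction(1)                  # sum_ij x_ij - t <= 0
    rows.append((a, Fraction(0)))
    for j in range(n):                                  # sum_i x_ij >= 1, each j
        a = [Fraction(0)] * N
        for i in range(n):
            a[idx(i, j)] = Fraction(-1)
        rows.append((a, Fraction(-1)))
    for i in range(n):                                  # sum_j x_ij <= 1, each i
        a = [Fraction(0)] * N
        for j in range(n):
            a[idx(i, j)] = Fraction(1)
        rows.append((a, Fraction(1)))
    for i in range(n):                                  # x_ij >= 0
        for j in range(n):
            a = [Fraction(0)] * N
            a[idx(i, j)] = Fraction(-1)
            rows.append((a, Fraction(0)))
    return rows

# Try n = 3, 4 yourself; without pruning duplicate/redundant rows the count grows fast.
for n in (1, 2):
    rows = transportation_system(n)
    for j in range(n * n, 0, -1):      # eliminate all x-variables, keep only t
        rows = fm_eliminate(rows, j)
    print(n, "->", len(rows), "inequalities in t alone")
```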

5.7 Around General Theorem on Alternative


Exercise I.31 1. Prove Gordan’s Theorem on Alternative:
A system of strict homogeneous linear inequalities Ax < 0 in variables x has a solution
if and only if the system A⊤ λ = 0, λ ≥ 0 in variables λ has only the trivial solution
λ = 0.
2. Prove Motzkin’s Theorem on Alternative:
A system Ax < 0, Bx ≤ 0 of strict and nonstrict homogeneous linear inequalities has
a solution if and only if the system A⊤ λ + B ⊤ µ = 0, λ ≥ 0, µ ≥ 0 in variables λ, µ
has no solution with λ ̸= 0.
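As a side illustration (assuming numpy/scipy; this is a numerical experiment, not a proof), one can check for a concrete matrix A which of the two alternatives in Gordan's theorem holds, using two auxiliary LPs: Ax < 0 is solvable iff the optimal value of max{t : Ax + t·1 ≤ 0, t ≤ 1} is positive, and otherwise a normalized certificate λ can be found from a feasibility LP.

```python
import numpy as np
from scipy.optimize import linprog

def gordan_certificate(A):
    m, n = A.shape
    # (1) try to find x with Ax < 0: maximize t subject to Ax + t*1 <= 0, t <= 1
    c = np.r_[np.zeros(n), -1.0]
    A_ub = np.hstack([A, np.ones((m, 1))])
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(m),
                  bounds=[(None, None)] * n + [(None, 1.0)], method="highs")
    if res.status == 0 and -res.fun > 1e-9:
        return "Ax < 0 is solvable", res.x[:n]
    # (2) otherwise find lambda >= 0, lambda != 0, with A^T lambda = 0
    #     (normalized so that sum(lambda) = 1)
    A_eq = np.vstack([A.T, np.ones((1, m))])
    b_eq = np.r_[np.zeros(n), 1.0]
    res = linprog(np.zeros(m), A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, None)] * m, method="highs")
    return "nontrivial lambda >= 0 with A^T lambda = 0", res.x

print(gordan_certificate(np.array([[1.0, 0.0], [0.0, 1.0]])))    # first alternative
print(gordan_certificate(np.array([[1.0, 0.0], [-1.0, 0.0]])))   # second alternative
```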

Exercise I.32 For the systems of constraints to follow, write them down equivalently in the
standard form Ax < b, Cx ≤ d and point out their feasibility status (“feasible – infeasible”) along
with the corresponding certificates (certificate for feasibility is a feasible solution to the system;
certificate for infeasibility is a collection of weights of constraints which leads to a contradictory
consequence inequality, as explained in GTA).
1. x ≤ 0 (x ∈ Rn)
2. x ≤ 0 and ∑_{i=1}^{n} xi > 0 (x ∈ Rn)
3. −1 ≤ xi ≤ 1, 1 ≤ i ≤ n, ∑_{i=1}^{n} xi ≥ n (x ∈ Rn)
4. −1 ≤ xi ≤ 1, 1 ≤ i ≤ n, ∑_{i=1}^{n} xi > n (x ∈ Rn)
5. −1 ≤ xi ≤ 1, 1 ≤ i ≤ n, ∑_{i=1}^{n} i xi ≥ n(n+1)/2 (x ∈ Rn)
6. −1 ≤ xi ≤ 1, 1 ≤ i ≤ n, ∑_{i=1}^{n} i xi > n(n+1)/2 (x ∈ Rn)
7. x ∈ R2, |x1| + x2 ≤ 1, x2 ≥ 0, x1 + x2 = 1
8. x ∈ R2, |x1| + x2 ≤ 1, x2 ≥ 0, x1 + x2 > 1

9. x ∈ R4 , x ≥ 0, the sum of two largest entries in x does not exceed 2, and x1 + x2 + x3 ≥ 3


10. x ∈ R4 , x ≥ 0, the sum of two largest entries in x does not exceed 2, and x1 + x2 + x3 > 3
Exercise I.33 Let (S) be the following system of linear inequalities in variables x ∈ R3
x1 ≤ 1, x1 + x2 ≤ 1, x1 + x2 + x3 ≤ 1 (S)
In the following list, point out which inequalities are/are not consequences of this system, and
certify your claims. To certify that a given inequality is a consequence of the given system, you
need to provide nonnegative aggregation weights λ ∈ R3+ for the inequalities in (S) such that the
resulting consequence inequality implies the given inequality. To certify that a given inequality
is not a consequence of the given system (S), you need to find a point x ∈ R3 that satisfies the
given system but violates the given inequality.
1. 3x1 + 2x2 + x3 ≤ 4
2. 3x1 + 2x2 + x3 ≤ 2
3. 3x1 + 2x2 ≤ 3
4. 3x1 + 2x2 ≤ 2
5. 3x1 + 3x2 + x3 ≤ 3
6. 3x1 + 3x2 + x3 ≤ 2
Make a generalization: prove that a linear inequality px1 + qx2 + rx3 ≤ s is a consequence of
(S) if and only if s ≥ p ≥ q ≥ r ≥ 0.
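A generic LP-based check of such claims can be sketched as follows (assuming numpy/scipy; this merely tests instances numerically and does not replace the requested proof): for a feasible system Ax ≤ b, the inequality c⊤x ≤ s is a consequence of the system iff sup{c⊤x : Ax ≤ b} ≤ s.

```python
import numpy as np
from scipy.optimize import linprog

def is_consequence(A, b, c, s):
    """Is c^T x <= s implied by the (assumed feasible) system Ax <= b?"""
    res = linprog(-np.asarray(c, dtype=float), A_ub=A, b_ub=b,
                  bounds=[(None, None)] * A.shape[1], method="highs")
    if res.status == 3:                 # objective unbounded above: not a consequence
        return False
    return res.status == 0 and -res.fun <= s + 1e-9

A = np.array([[1.0, 0, 0], [1, 1, 0], [1, 1, 1]])    # system (S)
b = np.ones(3)
print(is_consequence(A, b, [3, 2, 1], 4))    # item 1: True
print(is_consequence(A, b, [3, 2, 1], 2))    # item 2: False
print(is_consequence(A, b, [3, 3, 1], 3))    # item 5: True
```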
Exercise I.34 Is the inequality x1 + x2 ≤ 1 a consequence of the system x1 ≤ 1, x1 ≥ 2? If
yes, can it be obtained by taking a legitimate weighted sum of inequalities from the system and
the always true inequality 0⊤ x ≤ 1, as it is suggested by the Inhomogeneous Farkas Lemma?
Exercise I.35 Certify the correct statements in the following list:
1. The polyhedral set X = {x ∈ R3 : x ≥ [1/3; 1/3; 1/3], ∑_{i=1}^{3} xi ≤ 1} is nonempty.
2. The polyhedral set X = {x ∈ R3 : x ≥ [1/3; 1/3; 1/3], ∑_{i=1}^{3} xi ≤ 0.99} is empty.
3. The linear inequality x1 + x2 + x3 ≥ 2 is violated somewhere on the polyhedral set X = {x ∈ R3 : x ≥ [1/3; 1/3; 1/3], ∑_{i=1}^{3} xi ≤ 1}.
4. The linear inequality x1 + x2 + x3 ≥ 2 is violated somewhere on the polyhedral set X = {x ∈ R3 : x ≥ [1/3; 1/3; 1/3], ∑_{i=1}^{3} xi ≤ 0.99}.
5. The linear inequality x1 + x2 ≤ 3/4 is satisfied everywhere on the polyhedral set X = {x ∈ R3 : x ≥ [1/3; 1/3; 1/3], ∑_{i=1}^{3} xi ≤ 1.05}.
6. The polyhedral set Y = {x ∈ R3 : x1 ≥ 1/3, x2 ≥ 1/3, x3 ≥ 1/3} is not contained in the polyhedral set X = {x ∈ R3 : x ≥ [1/3; 1/3; 1/3], ∑_{i=1}^{3} xi ≤ 1}.
7. The polyhedral set Y = {x ∈ R3 : x ≥ [1/3; 1/3; 1/3], ∑_{i=1}^{3} xi ≤ 1} is contained in the polyhedral set X = {x ∈ R3 : x1 + x2 ≤ 2/3, x2 + x3 ≤ 2/3, x1 + x3 ≤ 2/3}.

5.8 Around Linear Programming Duality


Exercise I.36 ♦ Let the polyhedral set P = {x ∈ Rn : Ax ≤ b}, where A = [a1⊤; . . . ; am⊤], be nonempty. Prove that P is bounded if and only if every vector from Rn can be represented as a
linear combination of the vectors ai with nonnegative coefficients where at most n coefficients are
positive. As a result, given A, all nonempty sets of the form {x ∈ Rn : Ax ≤ b} simultaneously
are/are not bounded.
Exercise I.37 Consider the linear program
   Opt = max_{x∈R2} { x1 : x1 ≥ 0, x2 ≥ 0, ax1 + bx2 ≤ c }     (P)

where a, b, c are parameters. Answer the following questions:


1. Let c = 1. Is the problem feasible?
2. Let a = b = 1, c = −1. Is the problem feasible?

3. Let a = b = 1, c = −1. Is the problem bounded4 ?


4. Let a = b = c = 1. Is the problem bounded?
5. Let a = 1, b = −1, c = 1. Is the problem bounded?
6. Let a = b = c = 1. Is it true that Opt ≥ 0.5?
7. Let a = b = 1, c = −1. Is it true that Opt ≤ 1?
8. Let a = b = c = 1. Is it true that Opt ≤ 1?
9. Let a = b = c = 1. Is it true that x∗ = [1; 1] is an optimal solution of (P )?
10. Let a = b = c = 1. Is it true that x∗ = [1/2; 1/2] is an optimal solution of (P )?
11. Let a = b = c = 1. Is it true that x∗ = [1; 0] is an optimal solution of (P )?
Exercise I.38 Consider the LP program
   max_{x1,x2} { −x2 : x1 ≤ 0, −x1 ≤ −1, x2 ≤ 1 }.
Write down the dual problem and check whether the optimal values are equal to each other.
Exercise I.39 Write down the problems dual to the following linear programs:
1. max_{x∈R3} { x1 + 2x2 + 3x3 : x1 − x2 + x3 = 0, x1 + x2 − x3 ≥ 100, x1 ≤ 0, x2 ≥ 0, x3 ≥ 0 }
2. max_{x∈Rn} { c⊤x : Ax = b, x ≥ 0 }
3. max_{x∈Rn} { c⊤x : Ax = b, u ≤ x ≤ ū }
4. max_{x,y} { c⊤x : Ax + By ≤ b, x ≤ 0, y ≥ 0 }

Exercise I.40 ▲ Consider a primal-dual pair of LO programs
   Opt(P) = min_x { c⊤x : Ax ≥ b }     (P)
   Opt(D) = max_y { b⊤y : y ≥ 0, A⊤y = c }     (D)

Prove that the feasible set of at least one of these problems is unbounded.
Exercise I.41 ▲ Consider the following linear program
   Opt = min_{ {xij}_{1≤i<j≤4} } { 2 ∑_{1≤i<j≤4} xij : xij ≥ 0 ∀ 1 ≤ i < j ≤ 4, ∑_{j>i} xij + ∑_{j<i} xji ≥ i, 1 ≤ i ≤ 4 }.
1. Show that the optimum objective value is at most 20, i.e., Opt ≤ 20.
2. Show that the optimum objective value is at least 10, i.e., Opt ≥ 10.
Exercise I.42 ♦ We say that an n×n matrix P is stochastic if all of its entries are nonnegative
and the sum of the entries of each row is equal to 1. Show that if P is a stochastic matrix, then
there is a nonzero vector a ∈ Rn such that a⊤ P = a⊤ and a ≥ 0.
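The sketch below (assuming numpy/scipy) only illustrates the statement numerically on a random stochastic matrix: it looks for such a vector a by solving a feasibility LP with the normalization ∑i ai = 1; that LP is feasible precisely because of the claim to be proven.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(1)
P = rng.random((5, 5))
P /= P.sum(axis=1, keepdims=True)          # rows sum to 1, entries >= 0: stochastic

n = P.shape[0]
A_eq = np.vstack([(P - np.eye(n)).T,       # a^T (P - I) = 0
                  np.ones((1, n))])        # sum(a) = 1 (rules out a = 0)
b_eq = np.r_[np.zeros(n), 1.0]
res = linprog(np.zeros(n), A_eq=A_eq, b_eq=b_eq,
              bounds=[(0, None)] * n, method="highs")
a = res.x
print("a >= 0:", bool(np.all(a >= -1e-12)), " residual:", float(np.max(np.abs(a @ P - a))))
```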
Exercise I.43 ▲ Let A ∈ Rn×n be a symmetric matrix. Consider the linear optimization
problem
   min_x { c⊤x : Ax ≥ c, x ≥ 0 }.

Prove that if x̄ satisfies Ax̄ = c and x̄ ≥ 0, then x̄ is optimal.


4 Recall that a maximization problem is called bounded, if the objective is bounded from above on the
feasible set, which is the same as its optimal value being < ∞

Exercise I.44 ▲ Let w ∈ Rn , and let A ∈ Rn×n be a skew-symmetric matrix, i.e., A⊤ = −A.
Consider the following linear program
   Opt(P) = min_{x∈Rn} { w⊤x : Ax ≥ −w, x ≥ 0 }.

Suppose that the problem is solvable. Provide a closed analytical form expression for Opt(P ).
Exercise I.45 ▲ [Separation Theorem, polyhedral version] Let P and Q be two nonempty
polyhedral sets in Rn such that P ∩ Q = ∅. Suppose that the polyhedral descriptions of these
sets are given as
P := {x ∈ Rn : Ax ≤ b} and Q := {x ∈ Rn : Dx ≥ d} .
Using LP duality show that there exists a vector c ∈ Rn such that
c⊤ x < c⊤ y for all x ∈ P and y ∈ Q.
Exercise I.46 ▲ Suppose we are given the following linear program
   min_x { c⊤x : Ax = b, x ≥ 0 }     (P)

and its associated Lagrangian function given by


L(x, λ) := c⊤ x + λ⊤ (b − Ax).
The LP dual to (P) is (replace Ax = b with Ax ≥ b, −Ax ≥ −b)
   Opt(D) = max_{λ±, µ} { b⊤[λ+ − λ−] : A⊤[λ+ − λ−] + µ = c, λ± ≥ 0, µ ≥ 0 },
or, after eliminating µ and setting λ = λ+ − λ−,
   Opt(D) = max_λ { b⊤λ : A⊤λ ≤ c }.     (D)

Now, let us consider the following “game”: Player 1 chooses some x ≥ 0, and player 2 chooses
some λ simultaneously; then, player 1 pays to player 2 the amount L(x, λ). In this game, player
1 would like to minimize L(x, λ) and player 2 would like to maximize L(x, λ).
A pair (x∗ , λ∗ ) with x∗ ≥ 0, is called an equilibrium point (or saddle point or Nash equilibrium)
if
L(x∗ , λ) ≤ L(x∗ , λ∗ ) ≤ L(x, λ∗ ), ∀x ≥ 0 and ∀λ. (∗)
(That is, we have an equilibrium if no player is able to improve her performance by unilaterally
modifying her choice.)
Show that a pair (x∗, λ∗) is an equilibrium point if and only if x∗ and λ∗ are optimal solutions to the problem (P) and its dual (D), respectively.
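A small numerical illustration (a sketch assuming numpy/scipy, with arbitrary toy data; not the requested proof): solve (P) and (D) for one instance and verify the saddle-point inequalities (∗) on random samples of x ≥ 0 and λ.

```python
import numpy as np
from scipy.optimize import linprog

A = np.array([[1.0, 1, 0], [0, 1, 1]])
b = np.array([1.0, 1.0])
c = np.array([1.0, 2.0, 3.0])

primal = linprog(c, A_eq=A, b_eq=b, bounds=[(0, None)] * 3, method="highs")      # (P)
dual = linprog(-b, A_ub=A.T, b_ub=c, bounds=[(None, None)] * 2, method="highs")  # (D)
x_star, lam_star = primal.x, dual.x

L = lambda x, lam: c @ x + lam @ (b - A @ x)
rng = np.random.default_rng(0)
ok = True
for _ in range(1000):
    x = rng.random(3) * 5            # arbitrary x >= 0
    lam = rng.normal(size=2) * 5     # arbitrary lam
    ok = ok and (L(x_star, lam) <= L(x_star, lam_star) + 1e-8) \
            and (L(x_star, lam_star) <= L(x, lam_star) + 1e-8)
print("saddle point verified on samples:", ok,
      " Opt(P) =", primal.fun, " Opt(D) =", -dual.fun)
```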
Exercise I.47 ▲ Given a polyhedral set X = {x ∈ Rn : ai⊤x ≤ bi, ∀i = 1, . . . , m}, consider the associated optimization problem
   max_{x,t} { t : B1(x, t) ⊆ X },
where B1(x, t) := {y ∈ Rn : ∥y − x∥∞ ≤ t}. Is it possible to pose this optimization problem as a linear program with a number of variables and constraints polynomial in m and n? If it is possible, give such a representation explicitly. If not, argue why.
Exercise I.48 ▲ Consider the following optimization problem
   min_{x∈Rn} { c⊤x : ãi⊤x ≤ bi for some ãi ∈ Ai, i = 1, . . . , m, x ≥ 0 },     (∗)
where Ai = {āi + ϵi : ∥ϵi∥∞ ≤ ρ} for i = 1, . . . , m and ∥u∥∞ := maxj=1,...,n {|uj|}. In this problem, we basically mean that the constraint coefficient ãij (the j-th component of the i-th constraint
vector ãi ) belongs to the interval uncertainty set [āij − ρ, āij + ρ], where āij is its nominal
value. That is, in (∗), we are seeking a solution x such that each constraint is satisfied for some
coefficient vector from the corresponding uncertainty set.
Note that in its current form (∗), this problem is not a linear program (LP). Prove that it
can be written as an explicit linear program and give the corresponding LP formulation.
Exercise I.49 ♦ Let S = {a1, a2, . . . , an} be a finite set composed of n pairwise distinct elements, and let f be a real-valued function defined on the set of all subsets of S. We say
that f is submodular if for every X, Y ⊆ S, the following inequality holds
f (X) + f (Y ) ≥ f (X ∪ Y ) + f (X ∩ Y ).
1. Give an example of a submodular function f .
2. Let f : 2^S → Z be an integer-valued submodular function such that f(∅) = 0. Consider the polyhedron
   Pf := { x ∈ R|S| : ∑_{t∈T} xt ≤ f(T), ∀T ⊆ S },
and consider
   x̄ak := f({a1, . . . , ak}) − f({a1, . . . , ak−1}), k = 1, . . . , n.
Show that x̄ is feasible to Pf.
3. Consider the following optimization problem associated with Pf:
   max_x { c⊤x : x ∈ Pf }.

Write down the dual of this LP.


4. Assume without loss of generality that ca1 ≥ ca2 ≥ . . . ≥ can . Identify a dual feasible solution
and using LP Duality Theorem show that the solution x̄ specified in part 2 is optimal to the
primal maximization problem associated with Pf .
Remark: Note that when the submodular function f is integer-valued, we immediately see from
the characterization of the optimal primal solution x̄ that for all integer vectors c ∈ Zn such that
there exists an optimum solution to the primal problem, there exists an optimum solution (e.g.
x̄) where all variables take integer values. A system of linear inequalities Ax ≤ b with b ∈ Zm
and A ∈ Qm×n satisfying such a property (i.e., whenever c ∈ Zn is such that there is an optimal
solution to maxx {c⊤ x : Ax ≤ b} then there is an integer optimum solution) is called totally
dual integral (TDI). Thus, we conclude that the polyhedron Pf associated with an integer-
valued submodular function f is TDI. The TDI property is a well-known sufficient condition that guarantees that every extreme point (see section 8.2) of the associated polyhedron is integral. In particular, the TDI property generalizes total unimodularity (TU), another well-known sufficient condition for integrality of a polyhedron, which plays a key role in network-flow based optimization.
6

Proofs of Facts from Part I

Fact I.1.6 The unit ball of a norm ∥ · ∥, i.e., the set


{x ∈ Rn : ∥x∥ ≤ 1} ,
same as every other ∥ · ∥-ball
Br (a) := {x ∈ Rn : ∥x − a∥ ≤ r} ,
(here a ∈ Rn and r ≥ 0 are fixed) is convex.
In particular, Euclidean balls (∥·∥-balls associated with the standard Euclidean norm ∥x∥2 := √(x⊤x)) are convex.
Proof. Let us prove that the set Q := {x ∈ Rn : ∥x − a∥ ≤ r} is convex. For any x′ , x′′ ∈ Q
and λ ∈ [0, 1], we have

∥λx′ + (1 − λ)x′′ − a∥ = ∥λ(x′ − a) + (1 − λ)(x′′ − a)∥


≤ ∥λ(x′ − a)∥ + ∥(1 − λ)(x′′ − a)∥
= λ∥x′ − a∥ + (1 − λ)∥x′′ − a∥ ≤ λr + (1 − λ)r = r.

Here, the first inequality follows from the Triangle inequality, the second equality follows from the homogeneity of norms, and the last inequality is due to x′, x′′ ∈ Q. Thus, from ∥λx′ + (1 − λ)x′′ − a∥ ≤ r, we conclude that λx′ + (1 − λ)x′′ ∈ Q, as desired.

Fact I.1.8 Unit balls of norms on Rn are exactly the same as convex sets V in
Rn satisfying the following three properties:

(i) V is symmetric with respect to the origin: x ∈ V =⇒ −x ∈ V ;


(ii) V is bounded and closed;
(iii) V contains a neighborhood of the origin, i.e., there exists r > 0 such that the Euclidean ball of radius r centered at the origin – the set {x ∈ Rn : ∥x∥2 ≤ r} – is contained in V.
Any set V satisfying the outlined properties is indeed the unit ball of a particular norm, given by
   ∥x∥V = inf { t ≥ 0 : t−1 x ∈ V }.     (1.2)

Proof. First, let V be the unit ball of a norm ∥ · ∥, and let us verify the three stated properties.
Note that V = −V due to ∥x∥ = ∥ − x∥. V is bounded and contains a neighborhood of the
origin due to equivalence between ∥ · ∥ and ∥ · ∥2 (Proposition B.3). Moreover, V is closed. To


see this note that ∥ · ∥ is Lipschitz continuous with constant 1 with respect to itself since by
Triangle inequality and due to ∥x − y∥ = ∥y − x∥ we have

|∥x∥ − ∥y∥| ≤ ∥x − y∥, ∀x, y ∈ Rn ,

which implies by Proposition B.3 that there exists L∥·∥ < ∞ such that

|∥x∥ − ∥y∥| ≤ L∥·∥ ∥x − y∥2 , ∀x, y ∈ Rn ,

that is, ∥ · ∥ is Lipschitz continuous (and thus continuous). And, of course, for any a ∈ R, the sublevel set {x ∈ Rn : ∥x∥ ≤ a} of the continuous function ∥ · ∥ is closed; in particular, V = {x ∈ Rn : ∥x∥ ≤ 1} is closed.
For the reverse direction, consider any V possessing properties (i –iii). Then, as V is bounded
and contains a neighborhood of the origin, the function ∥·∥V is well defined, it is positive outside
of the origin and vanishes at the origin. Moreover, ∥ · ∥V is homogeneous – when the argument
is multiplied by a real number λ, the value of the function is multiplied by |λ| (by construction
and since V = −V ).
Now, let us show that the relation V = {y ∈ Rn : ∥y∥V ≤ 1} holds. Indeed, the inclusion
V ⊆ {y : ∥y∥V ≤ 1} is evident. So, we will verify that ∥y∥V ≤ 1 implies y ∈ V . Consider
any y such that ∥y∥V ≤ 1 and let t̄ := ∥y∥V (note that t̄ ∈ [0, 1]). When t̄ = 0, the boundedness of V implies that y = 0, and V contains the origin (by (iii)), so there is nothing to prove. When t̄ > 0, then, by definition of ∥ · ∥V, there exists a sequence of positive numbers {ti} converging to t̄ as i → ∞ such that yi := y/ti ∈ V. Then, as V is closed, ȳ := y/t̄ ∈ V. And since 0 < t̄ ≤ 1, y = t̄ȳ is a convex combination of the origin and ȳ. As both 0 ∈ V and ȳ ∈ V and V is convex, we conclude y ∈ V.
Let us now check that ∥ · ∥V satisfies the Triangle inequality. As ∥ · ∥V is nonnegative, all we have
to check is that ∥x + y∥V ≤ ∥x∥V + ∥y∥V when x ̸= 0, y ̸= 0. Setting x̄ := x/∥x∥V , ȳ := y/∥y∥V ,
we have by homogeneity ∥x̄∥V = ∥ȳ∥V = 1. Then, from the relation V = {y ∈ Rn : ∥y∥V ≤ 1}
we deduce x̄ ∈ V and ȳ ∈ V . Now, as V is convex and x̄, ȳ ∈ V , we have

   (1/(∥x∥V + ∥y∥V)) (x + y) = (∥x∥V/(∥x∥V + ∥y∥V)) x̄ + (∥y∥V/(∥x∥V + ∥y∥V)) ȳ ∈ V.
That is, ∥ (1/(∥x∥V + ∥y∥V)) (x + y) ∥V ≤ 1. Then, once again by homogeneity of ∥ · ∥V we conclude that ∥x + y∥V ≤ ∥x∥V + ∥y∥V.
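As an aside, the gauge (1.2) can be computed numerically from nothing but a membership oracle for V, using the properties established above (0 ∈ V, V convex, bounded, containing a neighborhood of the origin, so that x/t ∈ V for every t ≥ ∥x∥V). A minimal sketch assuming numpy, with the cross-polytope – whose gauge is the ℓ1-norm – as the example body:

```python
import numpy as np

def gauge(x, in_V, tol=1e-10):
    """inf{t >= 0 : x/t in V}, computed by doubling + bisection on t."""
    x = np.asarray(x, dtype=float)
    if np.all(x == 0):
        return 0.0
    hi = 1.0
    while not in_V(x / hi):          # grow hi until x/hi lands inside V
        hi *= 2.0
    lo = 0.0                         # x/t is outside V for small enough t > 0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if in_V(x / mid):
            hi = mid
        else:
            lo = mid
    return hi

in_cross_polytope = lambda y: np.sum(np.abs(y)) <= 1.0   # V = {y : ||y||_1 <= 1}
x = np.array([1.0, -2.0, 0.5])
print(gauge(x, in_cross_polytope), np.sum(np.abs(x)))    # both are (about) 3.5
```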

Fact I.1.9 Let Q be an n × n matrix which is symmetric (i.e., Q = Q⊤ ) and


positive definite (i.e., x⊤ Qx > 0 for all x ̸= 0). Then, for every nonnegative r,
the Q-ellipsoid of radius r centered at a, i.e., the set

   {x ∈ Rn : (x − a)⊤Q(x − a) ≤ r²}
is convex.
Proof. Note that since Q is positive definite, the matrix P := Q1/2 (see section D.1.5 for the
definition of the matrix square root) is well-defined and positive definite. Then, we have P is
nonsingular and symmetric, and
   {x ∈ Rn : (x − a)⊤Q(x − a) ≤ r²} = {x ∈ Rn : (x − a)⊤P⊤P(x − a) ≤ r²} = {x ∈ Rn : ∥P(x − a)∥2 ≤ r}.
Now, note that whenever ∥ · ∥ is a norm on Rn and P is a nonsingular n × n matrix, the function x ↦ ∥Px∥ is a norm itself (why?). Thus, the function ∥x∥Q := √(x⊤Qx) = ∥Q1/2 x∥2 is a norm, and the ellipsoid in question clearly is just the ∥ · ∥Q-ball of radius r centered at a.
Fact I.1.11 A set M ⊆ Rn is convex if and only if it is closed with respect to
taking all convex combinations of its elements. That is, M is convex if and only

if every convex combination of vectors from M is again a vector from M .


Hint: Note that assuming λ1, . . . , λm > 0, one has
   ∑_{i=1}^{m} λi xi = λ1 x1 + (λ2 + λ3 + . . . + λm) ∑_{i=2}^{m} µi xi, where µi := λi/(λ2 + λ3 + . . . + λm).

Proof. There is nothing to prove when M is empty, so we assume M ̸= ∅. If M is closed with


respect to taking arbitrary convex combinations of its points, it is closed with respect to taking
2-point combinations, which is exactly the same as to say that M is convex. For the reverse
direction, let M be convex. We will prove that a point given as a convex combination of N points
from M is itself in M by induction on N . The claim is clearly true when N = 1 (independent
of what M is) and is true when N = 2 (since M is convex). Suppose now that the claim is true
for some N ≥ 2. Consider an (N + 1)-term convex combination x = ∑_{i=1}^{N+1} λi xi of points xi from M. If λ1 = 1, we have x = x1 ∈ M. When λ1 < 1, we have
   x = λ1 x1 + (1 − λ1) ∑_{i=2}^{N+1} (λi/(1 − λ1)) xi.
Define x̄ := ∑_{i=2}^{N+1} (λi/(1 − λ1)) xi. As ∑_{i=2}^{N+1} λi = 1 − λ1, we see that x̄ is an N-term convex combination of points from M and thus belongs to M by the inductive hypothesis. Hence, x = λ1 x1 + (1 − λ1) x̄ is a convex combination of x1, x̄ ∈ M, and as M is convex we conclude x ∈ M. This completes the inductive step.
Fact I.1.14 [Convex hull via convex combinations] For a set M ⊆ Rn ,
Conv(M ) = {the set of all convex combinations of vectors from M } .
Proof. Define M̂ := {the set of all convex combinations of vectors from M}. Recall that a convex set is closed with respect to taking convex combinations of its members (Fact I.1.11); thus any convex set containing M also contains M̂. As by definition Conv(M) is the intersection of all convex sets containing M, we have Conv(M) ⊇ M̂. It remains to prove that Conv(M) ⊆ M̂.
We start with the claim that M̂ is convex. By Fact I.1.11, M̂ is convex if and only if every convex combination of points from M̂ is also in M̂. Indeed, this criterion holds for M̂: let x̄i ∈ M̂ for i = 1, . . . , N and consider a convex combination of these points, i.e.,
   x̂ := ∑_{i=1}^{N} λi x̄i,
where λi ≥ 0 and ∑_{i=1}^{N} λi = 1. For each i = 1, . . . , N, as x̄i ∈ M̂, by definition of M̂, we have x̄i = ∑_{j=1}^{Ni} µi,j xi,j, where xi,j ∈ M, µi,j ≥ 0 and ∑_{j=1}^{Ni} µi,j = 1. Then, we arrive at
   x̂ = ∑_{i=1}^{N} λi x̄i = ∑_{i=1}^{N} λi ( ∑_{j=1}^{Ni} µi,j xi,j ) = ∑_{i=1}^{N} ∑_{j=1}^{Ni} (λi µi,j) xi,j.
Clearly, γi,j := λi µi,j is nonnegative for all i, j. Moreover,
   ∑_{i=1}^{N} ∑_{j=1}^{Ni} γi,j = ∑_{i=1}^{N} λi ∑_{j=1}^{Ni} µi,j = ∑_{i=1}^{N} λi = 1,
where the last two equalities follow from ∑_{j=1}^{Ni} µi,j = 1 for all i and ∑_{i=1}^{N} λi = 1, respectively. Therefore, x̂ = ∑_{i=1}^{N} ∑_{j=1}^{Ni} γi,j xi,j is nothing but a convex combination of points from M, and thus x̂ ∈ M̂, proving that M̂ is convex. Clearly, we also have M̂ ⊇ M, and so by definition of Conv(M), we deduce M̂ ⊇ Conv(M), as desired.

Fact I.1.17 A set K ⊆ Rn is a cone if and only if it is nonempty and



• is conic, i.e., x ∈ K, t ≥ 0 =⇒ tx ∈ K; and


• contains sums of its elements, i.e., x, y ∈ K =⇒ x + y ∈ K.
Proof. Suppose K is nonempty and possesses the above properties. Then, the first property
already states that K is conic, so we will show that K is convex. For any x, y ∈ K and λ ∈ [0, 1],
we have
λx + (1 − λ)y = x̄ + ȳ,
where x̄ := λx and ȳ := (1 − λ)y. As λ ∈ [0, 1], x, y ∈ K and K is conic, we have x̄, ȳ ∈ K.
Moreover, since K contains sum of its elements, we conclude λx + (1 − λ)y = (x̄ + ȳ) ∈ K, i.e.,
K is convex.
For the reverse direction, if K is a cone, then by definition K is nonempty, conic and convex.
Then, for any x, y ∈ K, as K is convex we have (1/2)x + (1/2)y ∈ K, and as K is conic we arrive at x + y = 2((1/2)x + (1/2)y) ∈ K.


Fact I.1.20 [Conic hull via conic combinations] The conic hull Cone(K) of a
set K ⊆ Rn is the set of all conic combinations (i.e., linear combinations with
nonnegative coefficients) of vectors from K:
   Cone(K) = { x ∈ Rn : ∃N ≥ 0, λi ≥ 0, xi ∈ K, i ≤ N : x = ∑_{i=1}^{N} λi xi }.

Proof. The case of K = ∅ is trivial, see the comment on the value of an empty sum after the
statement of Fact I.1.20. When K ̸= ∅, this fact is an immediate corollary of Fact I.1.17.
Fact I.1.23 The closure of a set M ⊆ Rn is exactly the set composed of the
limits of all converging sequences of elements from M .
Proof. Let M̄ be the set of the limits of all converging sequences of elements from M. We need to show that cl M = M̄. Let us first prove cl M ⊇ M̄. Suppose x ∈ M̄. Then, x is the limit of a converging sequence of points {xi} ⊆ M ⊆ cl M. Since cl M is a closed set, we arrive at x ∈ cl M.
For the reverse direction, note that by definition cl M is the smallest (w.r.t. inclusion) closed set that contains M, so it suffices to prove that M̄ is a closed set satisfying M̄ ⊇ M. It is easy to see that M̄ ⊇ M holds: for any x ∈ M, the sequence {xi} with xi = x is a converging sequence of points from M with limit x, and thus by definition of M̄ we deduce x ∈ M̄. Now, consider a converging sequence of points {xi} ⊆ M̄, and let us prove that the limit x̄ of this sequence belongs to M̄. For every i, since the point xi ∈ M̄ is the limit of a sequence of points from M, we can find a point yi ∈ M such that ∥xi − yi∥2 ≤ 1/i. The sequence {yi} is composed of points from M and clearly has the same limit as the sequence {xi}, so that the latter limit is the limit of a sequence of points from M and as such belongs to M̄.

Fact I.1.36 Let K be a closed cone, and let the set


X := {x ∈ Rn : Ax − b ∈ K}
be nonempty. Then, ConeT(X) = {[x; t] ∈ Rn × R : Ax − bt ∈ K, t ≥ 0}.
Proof. Define X̂ := {[x; t] ∈ Rn × R : Ax − bt ∈ K, t ≥ 0}; so, we should prove that X̂ = ConeT(X). Recall that ConeT(M) := cl{[x; t] ∈ Rn × R : t > 0, x/t ∈ M} for any nonempty convex set M. Note that for the given set X, its perspective transform is
   Persp(X) = {[x; t] ∈ Rn × R : t > 0, A(x/t) − b ∈ K} = {[x; t] ∈ Rn × R : t > 0, Ax − bt ∈ K},

where the last equality follows from K being conic. So, Persp(X) ⊆ X, b and by taking the
closures of both sides we arrive at ConeT(X) = cl(Persp(X)) ⊆ cl(X) = X, b b where the last
equality follows as X clearly is closed. Hence, ConeT(X) ⊆ X. To verify the opposite inclusion,
b b
consider [x; t] ∈ X b and let us prove that [x; t] ∈ ConeT(X). Let x̄ ∈ X (recall that X is
nonempty). Then, [x̄; 1] ∈ X.b Moreover, as X b is a cone and the points [x; t] and [x̄; 1] belong to
X, we have zϵ := [x + ϵx̄; t + ϵ] ∈ X for all ϵ > 0. Also, for ϵ > 0, we have t + ϵ > 0 and so
b b
zϵ ∈ Xb implies 1 (x + ϵx̄) ∈ X, which is equivalent to zϵ = [x + ϵx̄; t + ϵ] ∈ Persp(X). Finally,
(t+ϵ)
as [x; t] = limϵ→+0 zϵ , we have [x; t] ∈ cl(Persp(X)) = ConeT(X), as desired.

Fact I.2.7 [Caratheodory Theorem in conic form] Let a ∈ Rm be a conic combi-


nation (linear combination with nonnegative coefficients) of N vectors a1 , . . . , aN .
Then, a is a conic combination of at most m vectors from the collection a1 , . . . , aN .
Proof. The proof follows the same lines as the proof of the "plain" Caratheodory Theorem. Consider the minimal, in terms of the number of positive coefficients, representation of a as a conic combination of a1, . . . , aN; w.l.o.g., we can assume that this is the representation a = ∑_{i=1}^{K} λi ai, λi > 0, i ≤ K. We need to prove that K ≤ m. Assume for contradiction that K > m. Consider the system of m scalar linear equations ∑_{i=1}^{K} δi ai = 0 in variables δ. The number K of unknowns in this system is larger than the number m of equations. Thus, this system has a nontrivial solution δ̄. Passing, if necessary, from δ̄ to −δ̄, we may further assume that some of the δ̄i are strictly negative. Define λi(t) := λi + tδ̄i for all i. Note that for all t ≥ 0 we have a = ∑_{i=1}^{K} λi(t) ai. Let t∗ be the largest t ≥ 0 for which all λi(t) are nonnegative (t∗ is well defined, since for large t some of the λi(t), i.e., those corresponding to δ̄i < 0, become negative). Then a = ∑_{i=1}^{K} λi(t∗) ai with all coefficients λi(t∗) nonnegative and at least one of them zero, contradicting the origin of K.
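The argument above is constructive, and the sketch below (assuming numpy; a nullspace vector δ̄ of the active columns is obtained via an SVD) implements the resulting reduction procedure. Up to the way the linear system for δ̄ is solved, this is the algorithm Exercise I.20 asks for in the conic case.

```python
import numpy as np

def conic_caratheodory(A, lam, tol=1e-12):
    """A: m x N matrix with columns a^i, lam >= 0.  Returns mu >= 0 with
    A @ mu = A @ lam and at most m positive entries (up to numerical tolerance)."""
    m, N = A.shape
    lam = np.array(lam, dtype=float)
    while True:
        active = np.flatnonzero(lam > tol)
        if active.size <= m:
            return lam
        # nontrivial delta with sum_i delta_i a^i = 0 over the active columns
        _, _, Vt = np.linalg.svd(A[:, active])
        delta = Vt[-1]
        if np.all(delta > -tol):             # make sure some entry is negative
            delta = -delta
        neg = delta < -tol
        t = np.min(-lam[active][neg] / delta[neg])   # largest feasible step
        lam[active] = np.maximum(lam[active] + t * delta, 0.0)

# tiny example: a conic combination of 6 vectors in R^2 reduced to <= 2 terms
rng = np.random.default_rng(3)
A = rng.random((2, 6))
lam = rng.random(6)
mu = conic_caratheodory(A, lam)
print(int(np.count_nonzero(mu > 1e-9)), float(np.max(np.abs(A @ mu - A @ lam))))
```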
Part II

Separation Theorem, Extreme Points,


Recessive Directions, and Geometry of
Polyhedral Sets

7

Separation Theorem

We next investigate the Separation Theorem, which is as indispensable when studying general convex sets as the General Theorem on Alternative is when investigating properties of polyhedral sets.

7.1 Separation: definition


Recall that a hyperplane M in Rn is, by definition, an affine subspace of dimension
n − 1. Then, by Proposition A.47, hyperplanes are precisely the level sets of
nontrivial linear forms. That is,
   M ⊂ Rn is a hyperplane ⇐⇒ ∃a ∈ Rn, a ≠ 0, ∃b ∈ R such that M = {x ∈ Rn : a⊤x = b}.


We can associate with the hyperplane M or, better to say, with the associated
pair a, b (defined by the hyperplane up to multiplication of a, b by nonzero real
number) the following sets:
• “upper” and “lower” open half-spaces
   M++ := {x ∈ Rn : a⊤x > b}, and M−− := {x ∈ Rn : a⊤x < b}.

These sets clearly are convex, and since a linear form is continuous, and the
sets are given by strict inequalities on the value of a continuous function, they
indeed are open.
These open half-spaces are uniquely defined by the hyperplane, up to swapping
the “upper” and the “lower” ones (this is what happens when passing from a
particular pair a, b specifying M to a negative multiple of this pair).
• “upper” and “lower” closed half-spaces
   M+ := {x ∈ Rn : a⊤x ≥ b}, and M− := {x ∈ Rn : a⊤x ≤ b}.

These are also convex sets. Moreover, these two sets are polyhedral and thus
closed. It is easily seen that the closed upper/lower half-space is the closure of
the corresponding open half-space, and M itself is the common boundary of
all four half-spaces.
Also, note that our half-spaces and M itself partition Rn , i.e.,
Rn = M −− ∪ M ∪ M ++

(partitioning by disjoint sets), and


Rn = M − ∪ M +
(where M is the intersection of the sets M − and M + ).
We are now ready to define the basic notion of separation of two convex sets
T and S by a hyperplane.

Definition II.7.1 [Separation] Let S, T be two nonempty convex sets in


Rn .
• A hyperplane
      M = {x ∈ Rn : a⊤x = b}     [where a ≠ 0]
is said to separate S and T , if it satisfies both of the following properties:
– S ⊆ {x ∈ Rn : a⊤x ≤ b}, T ⊆ {x ∈ Rn : a⊤x ≥ b} (i.e., S and T belong to the opposite closed half-spaces into which M splits Rn), and,


– at least one of the sets S, T is not contained in M itself, i.e.,
S ∪ T ̸⊆ M.
• The separation is called strong, if there exist b′ , b′′ ∈ R satisfying b′ < b <
b′′ , such that
   S ⊆ {x ∈ Rn : a⊤x ≤ b′}, T ⊆ {x ∈ Rn : a⊤x ≥ b′′}.
• A linear form a ≠ 0 is said to separate (strongly separate) S and T, if for properly chosen b the hyperplane {x ∈ Rn : a⊤x = b} separates (strongly separates) S and T.
• We say that S and T can be (strongly) separated, if there exists a hyper-
plane which (strongly) separates S and T .

Let us examine the separation concept on a few simple examples.


Example II.7.1

Figure II.1. Separation.
1) The hyperplane {x ∈ R2 : x2 − x1 = 1} strongly separates the polyhedral sets
S = {x ∈ R2 : x2 = 0, x1 ≥ −1} and T = {x ∈ R2 : 0 ≤ x1 ≤ 1, 3 ≤ x2 ≤ 5}.
2) The hyperplane {x ∈ R : x = 1} separates (but does not strongly separate) the convex sets S = {x ∈ R : x ≤ 1} and T = {x ∈ R : x ≥ 1}.
3) The hyperplane {x ∈ R2 : x1 = 0} separates (but does not strongly separate) the convex sets S = {x ∈ R2 : x1 < 0, x2 ≥ −1/x1} and T = {x ∈ R2 : x1 > 0, x2 > 1/x1}.

4) The hyperplane {x ∈ R2 : x2 − x1 = 1} does not separate the convex sets


S = {x ∈ R2 : x2 ≥ 1} and T = {x ∈ R2 : x2 = 0}.
5) The hyperplane {x ∈ R2 : x2 = 0} does not separate the polyhedral sets S =
{x ∈ R2 : x2 = 0, x1 ≤ −1} and T = {x ∈ R2 : x2 = 0, x1 ≥ 1}. ♢
The following equivalent description of separation is used often as well.

Fact II.7.2 Let S, T be nonempty convex sets in Rn . A linear form a⊤ x


separates S and T if and only if
   (a) sup_{x∈S} a⊤x ≤ inf_{y∈T} a⊤y, and
   (b) inf_{x∈S} a⊤x < sup_{y∈T} a⊤y.
This separation is strong if and only if (a) holds as a strict inequality:
   sup_{x∈S} a⊤x < inf_{y∈T} a⊤y.
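For two polytopes given as convex hulls of finite point sets, a form satisfying the strong version of these conditions (when one exists) can be found by linear programming: the sketch below (assuming numpy/scipy) looks for (a, b) with a⊤x ≤ b − 1 on the first point set and a⊤y ≥ b + 1 on the second, a system that is feasible exactly when the two convex hulls can be strongly separated.

```python
import numpy as np
from scipy.optimize import linprog

def strong_separator(S_pts, T_pts):
    """Return (a, b) with a^T x <= b - 1 on S_pts and a^T y >= b + 1 on T_pts,
    or None if no such pair exists (i.e., the hulls cannot be strongly separated)."""
    S_pts, T_pts = np.atleast_2d(S_pts), np.atleast_2d(T_pts)
    n = S_pts.shape[1]
    A_ub = np.vstack([np.hstack([S_pts, -np.ones((len(S_pts), 1))]),   #  a^T x - b <= -1
                      np.hstack([-T_pts, np.ones((len(T_pts), 1))])])  # -a^T y + b <= -1
    b_ub = -np.ones(len(S_pts) + len(T_pts))
    res = linprog(np.zeros(n + 1), A_ub=A_ub, b_ub=b_ub,
                  bounds=[(None, None)] * (n + 1), method="highs")
    return (res.x[:n], res.x[n]) if res.status == 0 else None

S_pts = np.array([[0.0, 0], [1, 0], [0, 1]])      # a triangle
T_pts = np.array([[3.0, 3], [4, 3], [3, 4]])      # a disjoint triangle
print(strong_separator(S_pts, T_pts))             # some (a, b) with a != 0
```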

7.2 Separation Theorem


One of the most fundamental results in convex analysis is the following separation
theorem.

Theorem II.7.3 [Separation Theorem] Let S and T be nonempty convex


sets in Rn .
(i) S and T can be separated if and only if their relative interiors do not
intersect, i.e., rint S ∩ rint T = ∅.
(ii) S and T can be strongly separated if and only if the sets are at a
positive distance from each other, i.e.,
dist(S, T ) := inf {∥x − y∥2 : x ∈ S, y ∈ T } > 0.
In particular, if S and T are nonempty non-intersecting closed convex sets
and one of these sets is compact, then S and T can be strongly separated.

We will use the following simple and important lemma in the proof of the
separation theorem.

Lemma II.7.4 A point x̄ ∈ rint Q of a convex set Q can be a minimizer (or maximizer) of a linear function f(x) = a⊤x over Q if and only if the function is constant on Q.

Proof. “If” part is evident. To prove the “only if” part, let x̄ ∈ rint Q be, say, a
minimizer of f over Q, then for any y ∈ Q we need to prove that f (x̄) = f (y).
There is nothing to prove if y = x̄, so let us assume that y ̸= x̄. Since Q is convex
and x̄, y ∈ Q, the segment [x̄, y] belongs to Q. Moreover, as x̄ ∈ rint Q we can
extend this segment a little further away from x̄ and still remain in Q. That is,

there exists z ∈ Q such that x̄ = (1 − λ)y + λz with certain λ ∈ [0, 1). As y ̸= x̄,
we have in fact λ ∈ (0, 1). Since f is linear, we deduce
f (x̄) = (1 − λ)f (y) + λf (z).
Because x̄ is a minimizer of f over Q and y, z ∈ Q, we have min{f (y), f (z)} ≥
f (x̄) = (1 − λ)f (y) + λf (z). Then, from λ ∈ (0, 1) we conclude that this relation
can be satisfied only when f (x̄) = f (y) = f (z).
Proof of Theorem II.7.3. We will prove the separation theorem in several
steps. We will first focus on the usual separation, i.e., case (i) of the theorem.
(i) Necessity. Assume that S, T can be separated. Then, for certain a ̸= 0 we
have
   sup_{x∈S} a⊤x ≤ inf_{y∈T} a⊤y, and inf_{x∈S} a⊤x < sup_{y∈T} a⊤y.     (7.1)

Assume for contradiction that rint S and rint T have a common point x̄. Then,
from the first inequality in (7.1) and x̄ ∈ S ∩ T , we deduce
   a⊤x̄ ≤ sup_{x∈S} a⊤x ≤ inf_{y∈T} a⊤y ≤ a⊤x̄.

Thus, x̄ maximizes the linear function f (x) = a⊤ x on S and simultaneously


minimizes this function on T . Then, as x̄ ∈ rint S and also x̄ ∈ rint T using
Lemma II.7.4, we conclude f (x) = f (x̄) on S and on T , so that f (·) is constant
on S ∪ T . This then yields the desired contradiction to the second inequality in
(7.1).
(i) Sufficiency. The proof of sufficiency part of the Separation Theorem is much
more instructive. There are several ways to prove it. Below, we present a proof
based on Theorem I.3.2.
(i) Sufficiency, Step 1: Separation of a nonempty polytope and a point
outside the polytope. We start with seemingly a very particular case of the
Separation Theorem – the one where S = Conv {x1 , . . . , xN } and T is a singleton
T = {x} which does not belong to S. We will prove that in this case there exists
a linear form which strongly separates T = {x} and S.
The set S = Conv{x1 , . . . , xN } is given by the polyhedral representation
   S = { z ∈ Rn : ∃λ such that λ ≥ 0, ∑_{i=1}^{N} λi = 1, z = ∑_{i=1}^{N} λi xi },

and thus S is polyhedral (Theorem I.3.2). Therefore, for a properly selected k,


a1 , . . . , ak , and b1 , . . . , bk we have:
   S = {z ∈ Rn : ai⊤z ≤ bi, i ≤ k}.

Since x ∉ S, there exists i ≤ k such that ai⊤x > bi, and thus the corresponding ai clearly strongly separates our S and T = {x}.
(i) Sufficiency, Step 2: Separation of a nonempty convex set and a point
outside of the set. Now consider the case when S is an arbitrary nonempty

convex set and T = {x} is a singleton outside S (here the difference with Step 1
is that now S is not assumed to be a polytope).
Without loss of generality we may assume that S contains 0 (if it is not the
case, by taking any p ∈ S, we may translate S and T to the sets S 7→ −p + S,
T 7→ −p + T ; clearly, a linear form which separates the shifted sets, can be shifted
to separate the original ones as well). Let L be the linear span of S.
If x ̸∈ L, the separation is easy: we can write x = e + f , where e ∈ L and f is
from the subspace orthogonal to L, and thus
   f⊤x = f⊤f > 0 = max_{y∈S} f⊤y,

so that f strongly separates S and T = {x}.


Now, we consider the case when x ∈ L. Since x ∈ L, and x ̸∈ S as well as
∅ ̸= S ⊆ L, we deduce that L contains at least two points and so L ̸= {0}.
Without loss of generality, we can assume that L = Rn .
Let B := {h ∈ Rn : ∥h∥2 = 1} be the unit sphere in Rn . This is a closed and
bounded set in Rn (boundedness is evident, and closedness follows from the fact
that ∥ · ∥2 is continuous). Thus, B is a compact set. Let us prove that there exists
f ∈ B that separates x and S in the sense that
   f⊤x ≥ sup_{y∈S} f⊤y.     (7.2)

Assume for contradiction that no such f exists. Then, for every h ∈ B there exists
yh ∈ S such that
h⊤ yh > h⊤ x.
Since the inequality is strict, it immediately follows that there exists an open
neighborhood Uh of the vector h such that
(h′ )⊤ yh > (h′ )⊤ x, ∀h′ ∈ Uh . (7.3)
Note that the family of open sets {Uh }h∈B covers B. As B is compact, we can find
a finite subfamily Uh1 , . . . , UhN of this family which still covers B. Let us take the
corresponding points y1 := yh1, y2 := yh2, . . . , yN := yhN and define the polytope Ŝ := Conv{y1, . . . , yN}. Due to the origin of the yi, all of these points are in S, and thus S ⊇ Ŝ (recall that S is convex). Since x ∉ S, we deduce x ∉ Ŝ. Then, by Step 1, x can be strongly separated from Ŝ, i.e., there exists a ≠ 0 such that
   a⊤x > sup_{y∈Ŝ} a⊤y = max{a⊤yi : 1 ≤ i ≤ N}.     (7.4)
By normalization, we may also assume that ∥a∥2 = 1, so that a ∈ B. Recall that Uh1, . . . , UhN form a covering of B, and as a ∈ B, we have that a belongs to a certain Uhi. By construction of Uhi (see (7.3)), we have
   a⊤yi ≡ a⊤yhi > a⊤x,
which contradicts (7.4) as yi ∈ Ŝ.
Thus, we conclude that there exists f ∈ B satisfying (7.2). We claim that

f separates S and {x}. Given that we already established (7.2), all we need to


verify for establishing f indeed separates S and {x} is to show that the linear
form f (y) = f ⊤ y is non-constant on S ∪ T . This is evident as we are in the
situation when 0 ∈ S and L = Lin(S) = Rn and f ̸= 0, so that f (y) is non-
constant already on S (indeed, otherwise we would have f ⊤ y = 0 for y ∈ S due
to 0 ∈ S, whence f ⊤ y = 0 for y ∈ Lin(S) = Rn , contradicting f ̸= 0).
(i) Sufficiency, Step 3: Separation of two nonempty and non-intersecting
convex sets. Now we are ready to prove that two nonempty and non-intersecting
convex sets S and T can be separated. To this end consider the arithmetic dif-
ference of the sets S and T , i.e.,
∆ := S − T = {x − y : x ∈ S, y ∈ T } .
As S and T are nonempty and convex, ∆ is nonempty and convex (by Proposition
I.1.21.3). Also, as S ∩ T = ∅, we have 0 ̸∈ ∆. Then, by Step 2, we can separate
∆ and {0}, i.e., there exists f ≠ 0 such that
   f⊤0 = 0 ≥ sup_{z∈∆} f⊤z and f⊤0 > inf_{z∈∆} f⊤z.
In other words,
   0 ≥ sup_{x∈S, y∈T} [f⊤x − f⊤y] and 0 > inf_{x∈S, y∈T} [f⊤x − f⊤y],

which clearly means that f separates S and T .


(i) Sufficiency, Step 4: Separation of nonempty convex sets with non-
intersecting relative interiors. Now we are ready to complete the proof of the
“if” part of part (i) of the Separation Theorem. Let S and T be two nonempty
convex sets such that rint S ∩ rint T = ∅, then we will prove that S and T can be
separated. Recall from Theorem I.1.29 that the sets S ′ := rint S and T ′ := rint T
are nonempty and convex. Moreover, we are given that S ′ and T ′ do not intersect,
thus they can be separated by Step 3. That is, there exists f such that
   inf_{x∈T′} f⊤x ≥ sup_{y∈S′} f⊤y and sup_{x∈T′} f⊤x > inf_{y∈S′} f⊤y.     (7.5)

It is immediately seen that in fact f separates S and T . Indeed, the quantities


in the left and the right hand sides of the first inequality in (7.5) clearly remain
unchanged when we replace S ′ with cl S ′ and T ′ with cl T ′ . Moreover, by Theorem
I.1.29, cl S′ = cl S ⊇ S and cl T′ = cl T ⊇ T, and we get inf_{x∈T} f⊤x = inf_{x∈T′} f⊤x, and similarly sup_{y∈S} f⊤y = sup_{y∈S′} f⊤y. Thus, we get from (7.5)

   inf_{x∈T} f⊤x ≥ sup_{y∈S} f⊤y.

It remains to note that T′ ⊆ T, S′ ⊆ S, so that the second inequality in (7.5) implies that
   sup_{x∈T} f⊤x > inf_{y∈S} f⊤y.

(ii) Necessity: Prove yourself.


(ii) Sufficiency: Define ρ := dist(S, T) = inf{∥x − y∥2 : x ∈ S, y ∈ T}. In case (ii), we are given that ρ > 0. Consider the set Ŝ := {x ∈ Rn : inf_{y∈S} ∥x − y∥2 ≤ ρ/2}.

As S is convex, the set Ŝ is convex (recall Example I.1.3). Moreover, Ŝ ∩ T = ∅ (why?). Then, by part (i), Ŝ and T can be separated. Let f be any linear form that separates Ŝ and T. Then, the same form strongly separates S and T (why?).
The last statement of (ii), i.e., “in particular” part, readily follows from the just
proved statement due to the fact that if two closed nonempty sets in Rn do not
intersect and one of them is compact, then the sets are at positive distance from
each other (why?).
Remark II.7.5 In Theorem II.7.3, a careful reader would notice that the con-
siderations in the proof (i) Sufficiency, Step 1, i.e., separation of a polytope and a
point not in the polytope, are based on solely Theorem I.3.2, and this is a “purely
arithmetic” statement: when proving it, we never used things like convergence,
compactness, square roots, etc., just rational arithmetic. Therefore, the result stated at Step 1 remains valid if we replace our universe Rn with the space Qn of n-dimensional rational vectors (those with rational coordinates; of course, the multiplication by reals in this space should be restricted to multiplication by rationals). The possibility to separate a rational vector from a "rational" polytope by a rational linear form, which is the "rational" version of the result of Step 1, is definitely of interest (e.g., for Integer Programming). In fact, all results in Part I derived from Fourier-Motzkin elimination and Theorem I.3.2 – the existence of optimal solution(s) to feasible and bounded LP programs, Farkas Lemmas, the General Theorem on Alternative, the plain and conic Caratheodory and (finite family version of) Helly theorems, etc. – remain valid when Rn is replaced with Qn (provided, of course, that the related data are rational). In particular, any feasible and bounded LP program with rational data admits a rational optimal solution, which definitely is worthy of knowing.
In contrast to these “purely arithmetic” considerations at Step 1, at Step 2, i.e.,
for the separation of a closed convex set and a point outside of the set, we used
compactness, which heavily exploits the fact that our universe is Rn and not, say,
Qn (in the latter space, bounded and closed sets are not necessarily compact).
In fact, we could not avoid things like compactness arguments at Step 2, since
the very fact we are proving is true in Rn but not in Qn . Indeed, consider the
“rational plane,” i.e., the universe composed of all 2-dimensional vectors with
rational entries, and let S be the half-plane in this rational plane given by the
linear inequality
x1 + αx2 ≤ 0,
where α is irrational. S clearly is a “convex set” in Q2 . But, it is immediately
seen that a point outside this set cannot be separated from S by a rational linear
form. ■
8

Consequences of Separation Theorem

Separation Theorem admits a number of important consequences. In this chapter,


we will discuss these.

8.1 Supporting hyperplanes


By Separation Theorem, we immediately deduce that a nonempty closed convex
set M is precisely the intersection of all closed half-spaces containing M . Among
these half-spaces, the most interesting are the “extreme” ones, i.e., those with
boundary hyperplanes touching M . Such extreme hyperplanes are called sup-
porting hyperplanes. While this notion of extreme makes sense for an arbitrary
(not necessarily closed) convex set, we will use it for closed convex sets only, and
include the requirement of closedness in the definition:

Definition II.8.1 [Supporting hyperplane] Let M be a convex closed set in


Rn , and let x ∈ rbd M . A hyperplane
   Π := {y ∈ Rn : a⊤y = a⊤x}     [where a ≠ 0]
is called supporting to M at x, if it separates M and {x}, i.e., if
   a⊤x ≥ sup_{y∈M} a⊤y and a⊤x > inf_{y∈M} a⊤y.     (8.1)

Independent of whether M is or is not closed, a point x ∈ rbd M is a limit of


points from M , and thus the first inequality in (8.1) cannot be strict. As a result,
we arrive at an equivalent definition of a supporting hyperplane for convex sets
as follows.
Given a closed convex set M ⊆ Rn and a point x ∈ rbd M, a hyperplane
   {y ∈ Rn : a⊤y = a⊤x}
is supporting to M at x if and only if the linear form a(y) := a⊤y attains its maximum on M at the point x and is non-constant on M.

Example II.8.1 The hyperplane {x ∈ Rn : x1 = 1} clearly is supporting to the


unit Euclidean ball {x ∈ Rn : ∥x∥2 ≤ 1} at the point x = e1 = [1; 0; . . . ; 0]. ♢
The most important property of a supporting hyperplane is its existence:


Proposition II.8.2 [Existence of supporting hyperplanes] Let M be a closed


convex set in Rn , and let x ∈ rbd M . Then,
(i) there exists at least one hyperplane which is supporting to M at x;
(ii) if a hyperplane Π is supporting to M at x, then Π ∩ M is a nonempty
closed convex set, x ∈ Π ∩ M , and dim(Π ∩ M ) < dim(M ).

Proof. To see (i) consider any x ∈ rbd M . Then, x ̸∈ rint M , and therefore
the point {x} and rint M can be separated by the Separation Theorem. The
associated separating hyperplane is exactly the desired hyperplane supporting to
M at x.
To prove (ii), note that if Π = {y ∈ Rn : a⊤y = a⊤x} is supporting to M at x ∈ rbd M, then the set M′ := M ∩ Π is nonempty (as it contains x) and is closed


and convex as both Π and M are. Moreover, the linear form a⊤ y is constant on
M ′ and therefore (why?) on Aff(M ′ ). At the same time, this form is non-constant
on M by definition of a supporting plane. Thus, Aff(M ′ ) is a proper (less than the
entire Aff(M )) subset of Aff(M ), and therefore the affine dimension of Aff(M ′ ),
which is by definition nothing but dim(M ′ ), is less than the affine dimension of
Aff(M ), which is precisely dim(M ) 1 .

8.2 Extreme points and Krein-Milman Theorem


Supporting hyperplanes are useful in proving the existence of extreme points of
convex sets. Geometrically, an extreme point of a convex set is a point in the set
which cannot be written as a convex combination of other points from the set.
The importance of this notion originates from the following fact which we will
soon prove: any “good enough” (in fact just nonempty compact) convex set M is
nothing but the convex hull of its extreme points, and the set of extreme points
of such a set M is the smallest set whose convex hull is equal to M . That is,
every extreme point of a nonempty compact convex set M is essential.

8.2.1 Extreme points: definition


The exact definition of an extreme point is as follows:

Definition II.8.3 [Extreme points] Let M be a nonempty convex set in Rn .


A point x ∈ M is called an extreme point of M if there is no nontrivial (of positive length) segment [u, v] ⊆ M for which x is an interior point. That is,
x is an extreme point of M if the relation
x = λu + (1 − λ)v

1 For dimension of a subset in Rn , see Definition I.2.1 or/and section A.4.3. We have used the
following immediate observation: If M ⊂ M ′ are two affine planes, then dim M ≤ dim M ′ , with
equality implying that M = M ′ . The readers are encouraged to prove this fact on their own.

with λ ∈ (0, 1) and u, v ∈ M holds if and only if


u = v = x.
The set of all extreme points of M is denoted by Ext(M ).

In the case of polyhedral sets, extreme points are also referred to as vertices.
Example II.8.2
• The extreme points of a segment [x, y] ⊂ Rn are exactly its endpoints {x, y}.
• The extreme points of a triangle are its vertices.
• The extreme points of a (closed) circle on the 2-dimensional plane are the points of the circumference.
• The convex set M := {x ∈ R2+ : x1 > 0, x2 > 0} does not have any extreme points.
• The only extreme point of the convex set M := {[0; 0]} ∪ {x ∈ R2+ : x1 > 0, x2 > 0} is the point [0; 0].


• The closed convex set {x ∈ R2 : x1 = 0} does not have any extreme points. ♢
An equivalent definition of an extreme point is as follows:

Fact II.8.4 Let M be a nonempty convex set and let x ∈ M . Then, x is an


extreme point of M if and only if any (and then all) of the following holds:
(i) the only vector h such that x ± h ∈ M is the zero vector;
(ii) in every representation x = ∑_{i=1}^{m} λi xi of x as a convex combination, with positive coefficients, of points xi ∈ M, i ≤ m, one has x1 = . . . = xm = x;
(iii) the set M \ {x} is convex.

Fact II.8.4.(iii) also admits the following immediate corollary.

Fact II.8.5 All extreme points of the convex hull Conv(Q) of a set Q belong
to Q:
Ext(Conv(Q)) ⊆ Q.

8.2.2 Krein-Milman Theorem


There are convex sets that do not necessarily possess extreme points; as an ex-
ample you may take the open unit ball in Rn . This example is not so interesting
as the set in question is not closed, and when we replace it with its closure the
resulting set is the closed unit ball with plenty of extreme points, i.e., all points
of the boundary. There are, however, closed convex sets which do not possess
extreme points. Consider for example, a line or an affine subspace of larger di-
mension as the convex set. Indeed, a nonempty closed convex set will have no
extreme points only when it contains a line.
We will next prove that any nonempty closed convex set M that does not
contain lines for sure possesses extreme points. Furthermore, if M is a nonempty
8.2 Extreme points and Krein-Milman Theorem 93

convex compact set, it possesses a quite representative set of extreme points, i.e.,
their convex hull is the entire M .

Theorem II.8.6 Let M be a nonempty closed convex set in Rn . Then,


(i) the set of extreme points of M , i.e., Ext(M ), is nonempty if and only
if M does not contain lines;
(ii) if M is bounded, then M = Conv(Ext(M )), i.e., every point of M is a
convex combination of the points of Ext(M ).

Remark II.8.7 Part (ii) of this theorem is the finite-dimensional version of


the famous Krein-Milman Theorem (1940). In fact, Hermann Minkowski (1911)
established Part (ii) of this theorem for the case n = 3, and Ernst Steinitz (1916)
showed it for any (finite) n. ■
We will use a number of lemmas in the proof of Theorem II.8.6. The first one
states that in the case of a closed convex set, we can add any “recessive” direction
to any point in the set and still remain in the set.

Lemma II.8.8 Let M ∈ Rn be a nonempty closed convex set. Then, when-


ever M contains a ray
{x̄ + th : t ≥ 0}
starting at x̄ ∈ M with the direction h ∈ Rn , M also contains all parallel
rays starting at the points of M , i.e., for all x ∈ M
{x + th : t ≥ 0} ⊆ M.
As a consequence, if M contains a certain line, then it contains also all parallel
lines passing through the points of M .

Proof. Suppose x̄ + th ∈ M for all t ≥ 0. Consider any point x ∈ M . Since M is


convex, for any fixed τ ≥ 0 we have
 τ 
ϵ x̄ + h + (1 − ϵ)x ∈ M, ∀ϵ ∈ (0, 1).
ϵ
By taking the limit as ϵ → +0 and noting that M is closed, we deduce that
x + τ h ∈ M for every τ ≥ 0.
Note that Lemma II.8.8 admits a corollary as follows:

Lemma II.8.9 Let M ∈ Rn be a nonempty convex set, not necessarily


closed. Suppose cl M contains a ray
{x̄ + th : t ≥ 0}
starting at x̄ ∈ cl M with the direction h ∈ Rn , and let x
b ∈ rint M . Then
rint M contains the ray
{x
b + th : t ≥ 0} .
94 Consequences of Separation Theorem

In particular, cl M contains a ray (a straight line) if and only if M contains


a ray (resp., a straight line) with the same direction.

Proof. With x̄, h, and x b as above, for every t > 0 the point xt := x
b+2th belongs to
cl M by Lemma II.8.8. Taking into account that x b ∈ rint M and invoking Lemma
I.1.30, we conclude that x b + th = 21 [x
b + xt ] ∈ rint M. Thus, xb + th ∈ rint M for
b + th ∈ rint M holds true for t = 0 as well due
all t > 0. Finally, this inclusion x
to xb ∈ rint M .
Our last ingredient for the proof of Theorem II.8.6 is a lemma stating a nice
transitive property of extreme points: that is, the extreme points of subsets of
nonempty closed convex sets obtained from the intersection with a supporting
hyperplane of the set are also extreme for the original set.

Lemma II.8.10 Let M ⊂ Rn be a nonempty closed convex set. Then, for


any x̄ ∈ rbd M and any hyperplane Π that is supporting to M at x̄, we
have that the set Π ∩ M is nonempty closed and convex, and Ext(Π ∩ M ) ⊆
Ext(M ).

Proof. First statement, i.e., Π ∩ M is nonempty closed and convex, follows from
Proposition II.8.2(ii). Moreover, by Proposition II.8.2(ii) we have x̄ ∈ Π ∩ M .
Next, let a ∈ Rn be the linear form associated with Π, i.e.,
Π = y ∈ Rn : a⊤ y = a⊤ x̄ ,


so that
inf a⊤ x < sup a⊤ x = a⊤ x̄ (8.2)
x∈M x∈M

(see Proposition II.8.2). Consider any extreme point y of Π ∩ M . Assume for


contradiction that y ̸∈ Ext(M ). Then, there exists two distinct points u, v ∈ M
and λ ∈ (0, 1) such that
y = λu + (1 − λ)v.
As y ∈ Π ∩ M we have a⊤ y = a⊤ x̄ and also as u, v ∈ M , from (8.2) we deduce
that
a⊤ y = a⊤ x̄ ≥ max a⊤ u, a⊤ v .


On the other hand, from the relation y = λu + (1 − λ)v we have


a⊤ y = λa⊤ u + (1 − λ)a⊤ v.
Combining these last two observations and taking into account that λ ∈ (0, 1),
we conclude that
a⊤ y = a⊤ u = a⊤ v.
Then, by the definition of Π, these equalities imply that u, v ∈ Π. As u, v ∈ M as
well, this contradicts that y ∈ Ext(Π ∩ M ) as we have written y = λu + (1 − λ)v
using distinct points u, v ∈ Π ∩ M and some λ ∈ (0, 1).
We are now ready to prove Theorem II.8.6.
8.2 Extreme points and Krein-Milman Theorem 95

Proof of Theorem II.8.6. Let us start with (i). The “only if” part for (i)
follows from Lemma II.8.8. Indeed, for the “only if” part we need to prove that
if M possesses extreme points, then M does not contain lines. That is, we need
to prove that if M contains lines, then it has no extreme points. But, this is
indeed immediate: if M contains a line, then, by Lemma II.8.8, there is a line in
M passing through every given point of M , so that no point can be extreme.
Now let us prove the “if” part of (i). Thus, from now on we assume that M
does not contain lines, and our goal is to prove that then M possesses extreme
points. Equipped with Lemma II.8.10 and Proposition II.8.2, we will prove this
by induction on dim(M ).
There is nothing to do if dim(M ) = 0, i.e., if M is a single point – then, of
course, M = Ext(M ). Now, for the induction hypothesis, for some integer k > 0,
we assume that all nonempty closed convex sets T that do not contain lines and
have dim(T ) = k satisfy Ext(T ) ̸= ∅. To complete the induction, we will show
that this statement is valid for such sets of dimension k + 1 as well. Let M be a
nonempty, closed, convex set that does not contain lines and has dim(M ) = k +1.
Since M does not contain lines and dim(M ) > 0, we have M ̸= Aff(M ). We claim
that M possesses a relative boundary point x̄. To see this, note that there exists
z ∈ Aff(M ) \ M , and thus for any fixed x ∈ M the point
xλ := x + λ(z − x)
does not belong to M for some λ > 0 (and then, by convexity of M , for all larger
values of λ), while x0 = x belongs to M . The set of those λ ≥ 0 for which xλ ∈ M
is therefore nonempty and bounded from above; this set clearly is closed (since
M is closed). Thus, there exists the largest λ = λ∗ for which xλ ∈ M . We claim
that xλ∗ ∈ rbd M . Indeed, by construction xλ∗ ∈ M . If xλ∗ were to be in rint M ,
then all the points xλ with λ values greater than λ∗ yet close to λ∗ would also
belong to M , which contradicts the origin of λ∗ .
Thus, there exists x̄ ∈ rbd M . Then, by Proposition II.8.2(i), there exists a
hyperplane Π = x ∈ Rn : a⊤ x = a⊤ x̄ which is supporting to M at x̄:


inf a⊤ x < max a⊤ x = a⊤ x̄.


x∈M x∈M

Moreover, by Proposition II.8.2(ii), the set T := Π ∩ M is nonempty closed


and convex and it satisfies dim(T ) < dim(M ), i.e., dim(T ) ≤ k. As M does
not contain lines, T ⊂ M clearly does not contain lines either. Then, by the
inductive hypothesis, T possesses extreme points, i.e., Ext(T ) ̸= ∅. Moreover,
by Lemma II.8.10 Ext(M ) ⊇ Ext(Π ∩ M ) = Ext(T ) ̸= ∅. This completes the
inductive step, and hence (i) is proved.
Now let us prove (ii). Let M be nonempty, closed, convex, and bounded. We
need to prove that
M = Conv(Ext(M )).
As M is convex, we immediately observe that M ⊇ Conv(Ext(M )). Thus, all
we need is to prove that every x ∈ M is a convex combination of points from
Ext(M ). Here, we again use induction on dim(M ). The case of dim(M ) = 0,
96 Consequences of Separation Theorem

i.e., when M is a single point, is trivial. Assume that the statement holds for all
k-dimensional closed convex and bounded sets. Let M be a closed convex and
bounded set with dim(M ) = k + 1. Consider any x ∈ M . To represent x as a
convex combination of points from Ext(M ), let us pass through x an arbitrary
line ℓ = {x + λh : λ ∈ R} (where h ̸= 0) in the affine span Aff(M ) of M . Moving
along this line from x in each of the two possible directions, we eventually leave
M (since M is bounded). Then, there exist nonnegative λ+ and λ− such that the
points
x̄+ := x + λ+ h, x̄− := x − λ− h
both belong to rbd M . We claim that x̄± admit convex combination representa-
tion using points from Ext(M ) (this will complete the proof, since x clearly is a
convex combination of the two points x̄± ). Indeed, by Proposition II.8.2(i) there
exists a hyperplane Π supporting to M at x̄+ , and by Proposition II.8.2(ii) the
set Π ∩ M is nonempty, closed and convex with dim(Π ∩ M ) < dim(M ) = k + 1.
Moreover, as M is bounded Π ∩ M is bounded as well. Then, by the inductive
hypothesis, x̄+ ∈ Conv(Ext(Π ∩ M )). Moreover, since by Lemma II.8.10 we have
Ext(Π ∩ M ) ⊆ Ext(M ), we conclude x̄+ ∈ Conv(Ext(M )). Analogous reasoning
is valid for x̄− as well.

8.3 Recessive directions and recessive cone


Lemma II.8.8 states that if M is a nonempty closed convex set, then the set of
all directions h such that x + th ∈ M for some x and all t ≥ 0 is exactly the same
as the set of all directions h such that x + th ∈ M for all x ∈ M and all t ≥ 0.
Directions of this type play an important role in the theory of convex sets, and
consequently they have a name – they are called recessive directions of M .

Definition II.8.11 [Recessive directions and recessive cone] Given a nonempty


closed convex set M ⊆ Rn , a direction h ∈ Rn is called a recessive direction
of M if we have x + th ∈ M for any x ∈ M and any t ≥ 0.
The set of all recessive directions is called the recessive cone of M [notation:
Rec(M )].

Remark II.8.12 Given a closed convex set M , we immediately deduce that


Rec(M ) indeed is a closed cone (prove it!) and that
M + Rec(M ) = M. (8.3)

Let us see some examples of recessive cones of sets.
Example II.8.3
• The recessive cone of Rn+ is itself. In fact, the recessive cone of any closed cone
is itself.
8.3 Recessive directions and recessive cone 97
Pn
• Consider
Pn the set M := {x ∈ Rn : i=1 xi = 1}; then Rec(M ) = {h ∈ R :
n

i=1 hi = 0}. Pn
• Consider the set M := {x ∈ Rn+ : Pi=1 xi = 1}; then Rec(M ) = {0}.
n
• Consider the set M := {x ∈ Rn+ : i=1 xi ≥ 1}; then Rec(M ) = Rn+ .
• Consider the set M := {x ∈ R+ : x1 x2 ≥ 1}; then Rec(M ) = R2+ .
2

• Consider the set M := {x ∈ Rn : xn −an ≥ ∥(x1 , . . . , xn−1 )−(a1 , . . . , an−1 )∥2 },


where a = (a1 , . . . , an ) is a given point. Then, Rec(M ) = Ln .

Fact II.8.13 Let M be a nonempty closed convex set in Rn . Then,


(i) Rec(M ) ̸= {0} if and only if M is unbounded.
(ii) If M is unbounded, then all nonzero recessive directions of M are
positive multiples of recessive directions of unit Euclidean length, and the
latter are asymptotic directions of M , i.e., a unit vector h ∈ Rn is a recessive
direction of M if and only if there exists a sequence {xi ∈ M }i≥1 such that
∥xi ∥2 → ∞ as i → ∞ and h = limi→∞ xi /∥xi ∥2 .
(iii) M does not contain lines if and only if the cone Rec(M ) does not
contain lines.
Here is how we can “visualize” (or compute) the recessive cone of a nonempty
closed convex set:

Fact II.8.14 Let M ⊆ Rn be a nonempty closed convex set. Recall its closed
conic transform is given by
ConeT(M ) = cl {[x; t] ∈ Rn × R : t > 0, x/t ∈ M } ,
(see section 1.5). Then,
Rec(M ) = h ∈ Rn : [h; 0] ∈ ConeT(M ) .


Finally, the recessive cones of nonempty polyhedral sets in fact admit a much
simpler characterization.

Fact II.8.15 For any nonempty polyhedral set M = {x ∈ Rn : Ax ≤ b}, its


recessive cone is given by
Rec(M ) = {h ∈ Rn : Ah ≤ 0} ,
i.e., Rec(M ) is given by homogeneous version of linear constraints specifying
M.
We have seen in Theorem II.8.6 that if M is a nonempty convex compact set,
it possesses a quite representative set of extreme points, i.e., their convex hull is
the entire M . We close this section by extending this result as follows.
98 Consequences of Separation Theorem

Theorem II.8.16 Let M ⊂ Rn be a nonempty closed convex set.


(i) If M does not contain lines, then the set Ext(M ) of extreme points of
M is nonempty, and
M = Conv(Ext(M )) + Rec(M ). (8.4)
(ii) In every representation, if any, of M as M = V + K with a nonempty
bounded set V and a closed cone K, the cone K is Rec(M ) and V contains
Ext(M ).

Proof.
(i): By Theorem II.8.6(i) we already know that any nonempty closed convex
that does not contain lines must possess extreme points. We will prove the rest
of Part (i) by induction on dim(M ). There is nothing to prove when dim(M ) =
0, that is, M is a singleton. So, suppose that the claim holds true for all sets
of dimension k. Let M be any nonempty closed convex that does not contain
lines and has dim(M ) = k + 1. To complete the induction step, we will show
that M satisfies the relation (8.4). Consider x ∈ M and let e be a nonzero
direction parallel to Aff(M ) (such a direction exists, since dim(M ) = k + 1 ≥
1). Recalling that M does not contain lines and replacing, if necessary, e with
−e, we can assume that −e is not a recessive direction of M . Same as in the
proof of Theorem II.8.6, x admits a representation x = x− + t− e with t− ≥ 0
and x− ∈ rbd(M ). Define M− to be the intersection of M with the plane Π−
supporting to M at x− . Then, M− is a nonempty closed convex subset of M
and dim(M− ) ≤ k. Also, M− does not contain lines as M− ⊂ M and M does
not contain lines. Thus, by inductive hypothesis, x− is the sum of a point from
the nonempty set Conv(Ext(M− )) and a recessive direction h− of M− . As in the
proof of Theorem II.8.6, Ext(M− ) ⊆ Ext(M ), and of course h− ∈ Rec(M ) due to
Rec(M− ) ⊆ Rec(M ) (why?). Thus, x = v− + h− + t− e with v− ∈ Conv(Ext(X))
and h− ∈ Rec(M ). Now, there are two possibilities: e ∈ Rec(M ) and e ̸∈ Rec(M ).
In the first case, x = v− + h with h = h− + t− e ∈ Rec(M ) (recall h− ∈ Rec(M )
and in this case we also have e ∈ Rec(M )), that is, x ∈ Conv(Ext(M )) + Rec(M ).
In the second case, we can apply the above construction to the vector −e in the
role of e, ending up with a representation of x of the form x = v+ + h+ − t+ e
where v+ ∈ Conv(Ext(M )), h+ ∈ Rec(M ) and t+ ≥ 0. Taking appropriate convex
combination of the resulting pair of representations of x, we can cancel the terms
with e and arrive at x = λv− + (1 − λ)v+ + λh− + (1 − λ)h+ , resulting in x ∈
Conv(Ext(M )) + Rec(M ). This reasoning holds true for every x ∈ M , hence
we deduce M ⊆ Conv(Ext(M )) + Rec(M ). The opposite inclusion is given by
(8.3) due to Conv(Ext(M )) ⊆ M . This then completes the proof of the inductive
hypothesis, and thus Part (i) is proved.
(ii): Now assume that M , in addition to being nonempty closed and convex,
is represented as M = V + K, where K is a closed cone and V is a nonempty
bounded set, and let us prove that K = Rec(M ) and V ⊇ Ext(M ). Indeed, every
vector from K clearly is a recessive direction of V + K, so that K ⊆ Rec(M ). To
8.3 Recessive directions and recessive cone 99

prove the opposite inclusion K ⊇ Rec(M ), consider any h ∈ Rec(M ), and let us
prove that h ∈ K. Fix any point v ∈ M . The vectors v +ih, i = 1, 2, . . ., belong to
M and therefore v + ih = v i + hi for some v i ∈ V and hi ∈ K due to M = V + K.
It follows that h = i−1 [v i − v] + i−1 hi for i = 1, 2, . . .. Thus, h = limi→∞ i−1 hi
(recall that V is bounded). As hi ∈ K and K is a cone, i−1 hi ∈ K and so h
is the limit of a sequence of points in K. Since K is closed, we deduce h ∈ K,
as claimed. Thus, K = Rec(M ). It remains to prove that Ext(M ) ⊆ V . This is
immediate: consider any w ∈ Ext(M ), then as M = V + K = V + Rec(M ) and
w ∈ M , we have w = v + e with some v ∈ V ⊆ M and e ∈ Rec(M ), implying
that w − e = v ∈ M . Besides this, w + e ∈ M as w ∈ M and e ∈ Rec(M ). Thus,
w ± e ∈ M . Since w is an extreme point of M , we conclude that e = 0, that is,
w =v ∈V.
Finally, let us consider what happens to the recessive directions after the pro-
jection operation.

Proposition II.8.17 Let M + ∈ Rnx × Rku be a nonempty closed convex set


such that its projection
M = x ∈ Rn : ∃u: [x; u] ∈ M +


is closed. Then,
[hx ; hu ] ∈ Rec(M + ) =⇒ hx ∈ Rec(M ).

Proof. Consider any recessive direction [hx ; hu ] ∈ Rec(M + ). Then, for any
[x̄; ū] ∈ M + , the ray {[x̄; ū] + t[hx ; hu ] : t ≥ 0} is contained in M + . The pro-
jection of this ray on the x-plane is given by the ray {x̄ + thx : t ≥ 0}, which is
contained in M . Thus, hx ∈ Rec(M ).
While Proposition II.8.17 states that [hx ; hu ] ∈ Rec(M + ) =⇒ hx ∈ Rec(M ),
in general, Rec(M ) can be much larger than the projection of Rec(M + ) onto
x-plane. Our next example illustrates this.
Example II.8.4 Consider the sets M + = {[x; u] ∈ R2 : u ≥ x2 } and M = {x ∈
R2 : ∃u ∈ R: [x; u] ∈ M + }. Then, M is the entire x-axis and Rec(M ) = M
is the entire x-axis. On the other hand, Rec(M + ) = {[0; hu ] : hu ≥ 0} and the
projection of Rec(M + ) onto the x-axis is just the origin. ♢
In fact, the pathology highlighted in Example II.8.4 can be eliminated when we
have that the set of extreme points of the convex representation M + of a convex
set M is bounded and the projection of Rec(M + ) is closed.

Proposition II.8.18 Let M + ⊂ Rnx × Rku be a nonempty closed convex set


such that M + = V + Rec(M + ) for some bounded and closed set V . Let M
be the projection of M + onto the x-plane, i.e.,
M = x ∈ Rnx : ∃u ∈ Rku : [x; u] ∈ M + .

100 Consequences of Separation Theorem

Assume that the cone


K = hx ∈ Rnx : ∃hu ∈ Rku : [hx ; hu ] ∈ Rec(M + )


is closed. Then, M is closed and K = Rec(M ).

Proof. Let M + satisfy the premise of the proposition. Define


W := x ∈ Rnx : ∃u ∈ Rku : [x; u] ∈ V ,


that is, W is the projection of V onto the x-space. As V is a closed and bounded
(therefore compact) set, its projection W is compact as well (recall that the
image of a compact set under a continuous mapping is compact). Note that M is
nonempty and it satisfies M = W + K (why?). Then, M is the sum of a compact
set W and a closed set K, and thus M is closed itself (why?). Besides this, M is
convex (recall that the projection of a convex set is convex). Thus, the nonempty
closed convex set M satisfies M = W + K with nonempty bounded W and closed
cone K, implying by Theorem II.8.16 that K = Rec(M ).
Recall that we have investigated the relation between the recessive directions of
a closed convex set M ∈ Rnx and its closed convex representation M + ∈ Rnx × Rku
in Proposition II.8.17. In particular, we observed that while [hx ; hu ] ∈ Rec(M + )
implies hx ∈ Rec(M ), the recessive direction of M “stemming” from those of M +
can form a small part of Rec(M ), as seen in Example II.8.4.
A surprising (and not completely trivial) fact is that for polyhedral sets M ,
the projection of Rec(M + ) onto the x-plane is Rec(M ).

Proposition II.8.19 Let M ∈ Rnx be a nonempty set admitting a polyhedral


representation M + ∈ Rnx × Rku , i.e.,
M + := [x; u] ∈ Rnx × Rku : Ax + Bu ≤ c , and


M := x ∈ Rnx : ∃u ∈ Rku : [x; u] ∈ M + .




Then,
Rec(M ) = hx : ∃hu : [hx ; hu ] ∈ Rec(M + )


= {hx : ∃hu : Ahx + Bhu ≤ 0} . (8.5)


That is, polyhedral representation of M naturally induces a polyhedral rep-
resentation of Rec(M ).

Proposition II.8.19 is an immediate consequence of Proposition II.8.18. To


derive Proposition II.8.19 from Proposition II.8.18, it suffices to note that a
nonempty polyhedral set is the sum of the convex hull of a finite set and a
polyhedral cone, see section 9.3.

8.4 Dual cone


We start with the definition of dual cone.
8.4 Dual cone 101

Definition II.8.20 [Dual cone] Let M ⊆ Rn be a cone. The set of all vectors
which have nonnegative inner products with all vectors from M , i.e., the set
M∗ := a ∈ Rn : a⊤ x ≥ 0, ∀x ∈ M ,

(8.6)
is called the cone dual to M .

From its definition, it is clear that the dual cone M∗ of any cone M is a closed
cone.
Example II.8.5 The cone dual to the nonnegative orthant Rn+ is composed of
all n-dimensional vectors y making nonnegative inner products with all entrywise
nonnegative n-dimensional vectors x. As is immediately seen the vectors y with
this property are exactly entrywise nonnegative vectors: [Rn+ ]∗ = Rn+ . ♢
Note that in the preceding example, Rn+ is given by finitely many homogeneous
linear inequalities:

Rn+ = x ∈ Rn : e⊤

i x ≥ 0, i = 1, . . . , n ,

where ei are the basic orths; and we observe that the dual cone is the conic hull
of these basic orth. This is indeed a special case of the following general fact:

Proposition II.8.21 For any F ⊆ Rn , the set


M := x ∈ Rn : f ⊤ x ≥ 0, ∀f ∈ F


is a closed cone, and its dual cone is


M∗ = cl Cone(F ),
where Cone(F ), as always, is the conic hull of F , see Definition I.1.19. In
addition, M remains intact when F is extended to its closed conic hull:
M := x ∈ Rn : f ⊤ x ≥ 0, ∀f ∈ F = x ∈ Rn : f ⊤ x ≥ 0, ∀f ∈ cl Cone(F ) .
 

Proof. The inclusion Cone(F ) ⊆ M∗ is evident, and since M∗ is closed, we have


also cl Cone(F ) ⊆ M∗ . Let us define F := cl Cone(F ), so now we need to prove
the inclusion F ⊇ M∗ . Consider z ∈ M∗ , and assume for contradiction that z ̸∈ F .
Note that F is convex, nonempty, and closed, so that by Separation Theorem (ii)
there exists g such that
g ⊤ z < inf g ⊤ f.
f ∈F

Because F is a closed cone (and so 0 ∈ F ), the right hand side infimum, being
finite, must be 0. Then, g ⊤ f ≥ 0 for all f ∈ F and g ⊤ z < 0. Since f ⊤ g ≥ 0 for all
f ∈ F and also F ⊇ F , we deduce f ⊤ g ≥ 0 for all f ∈ F , that is, g ∈ M by the
definition of M . But, then the inclusion g ∈ M together with z ∈ M∗ contradicts
the relation z ⊤ g < 0. Finally, we clearly have f ⊤ x ≥ 0 for all x ∈ F if and only
if f ⊤ x ≥ 0 for all x ∈ cl Cone(F ).
102 Consequences of Separation Theorem

Remark II.8.22 Note that, in contrast to Proposition II.8.21, in the concluding


expression of the chain
[Rn+ ]∗ = x ∈ Rn : e⊤

i x ≥ 0, i = 1, . . . , n ∗ = Cone({ei : i = 1, . . . , n})

we did not need to take the closure. This is because the conic hull of a finite set
F is polyhedrally representable and is therefore a polyhedral cone (by Theorem
I.3.2), and as such it is automatically closed.
This fact (i.e., no need to take the closure in Proposition II.8.21) holds true
for the dual of any polyhedral cone: consider the set {x ∈ Rn : a⊤ i x ≥ 0, i =
1, . . . , I} given by finitely many homogeneous nonstrict linear inequalities. This
set is clearly a polyhedral
nP cone, and itsodual is the conic hull of ai ’s, i.e., Cone({ai :
I
i = 1, . . . , I}) = i=1 λi ai : λ ≥ 0 . Moreover, this dual cone clearly is also
polyhedrally representable as
( I
)
X
Cone {a1 , . . . , aI } = x ∈ Rn : ∃λ ≥ 0: x = λi ai ,
i=1

and thus Cone {a1 , . . . , aI } is polyhedral as well. ■



 n
In the case of the cones of the form x ∈ R : f x ≥ 0, ∀f ∈ F stemming
from infinite sets F (in fact, every closed cone in Rn can be represented in this
way using a properly selected countable set F = {fi : i = 1, 2, . . .}) (why?), the
closure operation in the computation of the dual cone, in general, cannot be
omitted. This is so even when the set F itself is closed convex and bounded
(why? Hint: recall Proposition II.8.21).
Let us illustrate this in the next example.
x ∈ R2 : f ⊤ x ≥ 0, ∀f ∈ F ,

Example II.8.6 Consider the cone given by K :=
where F = f = [u; v] ∈ R2+ : v ≥ u2 , u ≤ 1, v ≤ 1 . Note that F is a compact
convex set contained in R2+ . Moreover, every vector [u; v] ∈ R2 with positive en-
tries is a positive multiple of a vector from F (draw a picture!). Thus, the set of
vectors that have nonnegative inner products with all vectors from F , i.e., K, is
exactly the same as the set of vectors that have nonnegative inner products with
all vectors from R2+ . Hence, we arrive at K = x ∈ R2 : f ⊤ x ≥ 0, ∀f ∈ F =


R2+ , so K∗ = R2+ as well. Now observe that K∗ = R2+ is, as it should be by Propo-
sition II.8.21, the closure of Cone(F ), nevertheless K∗ = cl Cone(F ) is larger than
Cone(F ) as Cone(F ) is not closed! Note that Cone(F ) is precisely the set obtained
from R2+ by eliminating all nonzero points on the boundary of R2+ . ♢

Fact II.8.23 Let M be a closed cone in Rn , and let M∗ be the cone dual to
M . Then
(i) Duality does not distinguish between a cone and its closure: whenever
M = cl M ′ for a cone M ′ , we have M∗ = M∗′ .
(ii) Duality is symmetric: the cone dual to M∗ is M .
8.4 Dual cone 103

(iii) One has


int M∗ = y ∈ Rn : y ⊤ x > 0, ∀x ∈ M \ {0} ,


and int M∗ is nonempty if and only if M is pointed (i.e., M ∩[−M ] = {0}).


Moreover, when M , in addition to being closed, is pointed and nontrivial
(M ̸= {0}), one has
int M∗ = y ∈ Rn : My := {x ∈ M : x⊤ y = 1} is nonempty and compact .

(8.7)

(iv) The cone dual to the direct product M1 × . . . × Mm of cones Mi is the


direct product of their duals: [M1 × . . . × Mm ]∗ = [M1 ]∗ × . . . × [Mm ]∗ .

Let us see some examples of dual cones.


Example II.8.7 Consider the epigraph of ∥ · ∥∞ on Rn given by
K∞ := [x; t] ∈ Rn+1 : t ≥ ∥x∥∞ .


Note that K∞ is a polyhedral cone (why?). The cone dual to K∞ is K1 =


{[x; t] ∈ Rn+1 : t ≥ ∥x∥1 }, which is nothing but the epigraph of ∥ · ∥1 :
[K∞ ]∗ = [g; s] ∈ Rn+1 : st + g ⊤ x ≥ 0, ∀([x; t] : ∥x∥∞ ≤ t)

 
n+1 ⊤
= [g; s] ∈ R : s + min g x ≥ 0
x:∥x∥∞ ≤1
n+1

= [g; s] ∈ R : s ≥ ∥g∥1 . ♢

Definition II.8.24 [Self-dual cone] A cone K ⊂ Rn is called a self-dual cone


if its dual cone is equal to itself, i.e., K∗ = K.

We next examine a number of very important self-dual cones.


Example II.8.8 Let us compute the dual of the Lorentz cone
Ln := [x; t] ∈ Rn−1 × R : t ≥ ∥x∥2 .


When n = 1, L1 is the nonnegative ray, and thus L1 = R+ and therefore [L1 ]∗ =


R+ = L1 . When n ≥ 2, we have
[Ln ]∗ = [g; s] ∈ Rn−1 × R : g ⊤ x + ts ≥ 0, ∀([x; t] : ∥x∥2 ≤ t)


= [g; s] ∈ Rn−1 × R : g ⊤ x + s ≥ 0, ∀(x : ∥x∥2 ≤ 1)



 
n−1 ⊤
= [g; s] ∈ R × R : s + min g x ≥ 0
x:∥x∥2 ≤1

= [g; s] ∈ Rn−1 × R : s ≥ ∥g∥2 ,




where the concluding equality is due to Cauchy-Schwarz inequality, see Theorem


B.1. Thus, [Ln ]∗ = Ln . ♢
n
Example II.8.9 The cone dual to the semidefinite cone S+ , by Theorem D.32,
is itself:
[Sn+ ]∗ := y ∈ Sn : ⟨y, x⟩ := Tr(xy) ≥ 0, ∀x ∈ Sn+ = Sn+ .


104 Consequences of Separation Theorem

Based on these examples, we have arrived at the following conclusion.


The cones R+ , Ln , and Sn+ are self-dual.
By Fact II.8.23 the direct product of finitely many self-dual cones is self-dual, im-
plying that finite direct products of nonnegative orthants, Lorentz, and semidef-
inite cones are self-dual.

8.5 ⋆ Dubovitski-Milutin Lemma


In this section, we deal with the following basic yet important question: Let
M 1 , . . . , M k be cones (not necessarily closed) in Rn , and M be their intersection.
Of course, M also is a cone. But, how can we compute M∗ , i.e., the cone dual
to M ? To this end, we first examine the relationship between M∗ and the cone
f that is defined as the sum of the dual cones M i .
M ∗

Tk
Proposition II.8.25 Let M 1 , . . . , M k be cones in Rn . Define M := i=1 M i ,
and let M∗ be the dual cone of M . Let M∗i denote the dual cone of M i , for
f := M 1 + . . . + M k . Then, M∗ ⊇ M
i = 1, . . . , k, and define M f.
∗ ∗
1 k
Moreover, if all the cones M , . . . , M are closed, then M∗ = cl M f. In
1 k
particular, for closed cones M , . . . , M , M∗ = M f holds if and only if M
f is
closed.

Proof. For any i = 1, . . . , k, any ai ∈ M∗i and any x ∈ M , we have a⊤ i x ≥ 0, and


hence (a1 + . . . + ak )⊤ x ≥ 0. Since the latter relation is valid for all x ∈ M , we
conclude that a1 + . . . + ak ∈ M∗ . Thus, M f ⊆ M∗ .
1 k
Now assume that the cones M , . . . , M are closed, and let us define M c := cl M
f
so that we need to prove M∗ = M . Recall that we have already seen M ⊆ M∗ ,
c f
and as M∗ is closed we deduce M c = cl Mf ⊆ M∗ . Thus, all we need to prove is
that if a ∈ M∗ , then a ∈ M c as well. Assume for contradiction that there exists
a ∈ M∗ \ M . As M is clearly a cone, its closure M
c f c is a closed cone. Then, as
a ̸∈ M
c, by Separation Theorem (ii), a can be strongly separated from M c and
thus also from M ⊆ M . Therefore, for some x ̸= 0 we have
f c
k
X
a⊤ x < inf b⊤ x = inf
i
(a1 + . . . + ak )⊤ x = inf a⊤
i x. (8.8)
b∈M
f ai ∈M∗ ,i=1,...,k ai ∈M∗i
i=1

As a⊤ x is a finite number, this inequality implies that inf i a⊤


i x > −∞ holds for
ai ∈M∗
all i = 1, . . . , k. Since M∗i is a cone, this is possible if and only if inf i a⊤
i x = 0.
ai ∈M∗
Then, we deduce that x ∈ [M∗i ]∗ = M i , where the last equality follows from the
fact that each cone M i is closed and using Fact II.8.23.(ii). Thus, x ∈ M i for all i,
k k
inf i a⊤ M i , and (8.8) reduces to a⊤ x < 0.
P T
and i x = 0. Therefore, x ∈ M =
i=1 ai ∈M∗ i=1
But, this then contradicts the inclusion a ∈ M∗ .
8.5 ⋆ Dubovitski-Milutin Lemma 105

Remark II.8.26 Note that in general M f can be non-closed even when all the
1 k
cones M , . . . , M are closed. Indeed, take
√ 2 k =2 2, and let M 1 = M∗1 be the
3 2
second-order cone (x, y, z) ∈ R : z ≥ x + y , and M∗ be the following ray
in R3
(x, y, z) ∈ R3 : x = z, y = 0, x ≤ 0 .


Observe that the points from M f ≡ M 1 + M 2 are exactly the points of the form
√ ∗ ∗
(x−t, y, z−t) with t ≥ 0 and z ≥ x2 + y 2 . In particular, for any α > 0, the points
√ √
(0, 1, α2 + 1−α) = (α−α, 1, α2 + 1−α) belong to M f. As α → ∞, these points
ξ ∈ cl M
converge to ξ := (0, 1, 0), and thus √ f. On the other hand, we clearly cannot
find x, y, z, t with t ≥ 0 and z ≥ x2 + y 2 such that (x − t, y, z − t) = (0, 1, 0),
that is, ξ ̸∈ Mf. ■
Dubovitski-Milutin Lemma presents a simple sufficient condition for M f to be
closed and thus to coincide with M∗ :

Proposition II.8.27 [Dubovitski-Milutin Lemma in finite dimensions] Let


M 1 , . . . , M k be cones such that
M k ∩ int M 1 ∩ int M 2 ∩ . . . ∩ int M k−1 ̸= ∅.
k
M i . Let also M∗i be the cones dual to M i . Then,
T
Define M :=
i=1
k
cl M i ; and
T
(i) cl M =
i=1
f := M 1 +. . .+M k is closed, and thus by Proposition II.8.25
(ii) the cone M ∗ ∗
M∗ = M∗ + . . . + M∗k . In other words, every linear form which is nonnegative
1

on M can be represented as a sum of k linear forms which are nonnegative


on the respective cones M 1 , . . . , M k .

Proof. (i): This is given by Proposition I.1.33(ii).


(ii): First, we claim that under the premise of the proposition, without loss of
generality we can assume that M 1 , . . . , M k are closed cones. This is because when
replacing the cones M 1 , . . . , M k with their closures, we preserve the premise of the
proposition, and also M f = M 1 + . . . + M k = [cl M 1 ]∗ + . . . + [cl M k ]∗ holds (recall
∗ ∗
that by definition of the dual cone M∗i = [cl M i ]∗ ), as well as M∗ = [cl M ]∗ =
k
cl M i where the last equality holds by Part (i).
T
i=1
To prove Part (ii) of the proposition all we need is to show that given closed
cones M 1 , . . . , M k we have the cone M
f := M 1 + . . . + M k is closed. To this end,
∗ ∗
we will use induction on k ≥ 2.
Base case: Suppose k = 2. Consider a sequence {ft + gt }∞ t=1 with ft ∈ M∗ ,
1
2
gt ∈ M∗ and (ft + gt ) → h as t → ∞. We need to prove that h = f + g for some
appropriate f ∈ M∗1 and g ∈ M∗2 . To this end, it suffices to verify that for an
appropriate subsequence tj of indices there exists f := lim ftj . Indeed, if this is
j→∞
106 Consequences of Separation Theorem

the case, then g = lim gtj also exists since ft + gt → h as t → ∞ and f + g = h,


j→∞
and also in this case we will have f ∈ M∗1 and g ∈ M∗2 (recall that M∗1 and M∗2
are closed cones). Let us verify the existence of the desired subsequence. Assume
for contradiction that ∥ft ∥2 → ∞ as t → ∞. Passing to a subsequence, we may
assume that the unit vectors ϕt := ft /∥ft ∥2 have a limit ϕ as t → ∞. Since M∗1
is a closed cone, ϕ is a unit vector from M∗1 . Now, since ft + gt → h as t → ∞,
we have ϕ = lim ft /∥ft ∥2 = − lim gt /∥ft ∥2 (recall that ∥ft ∥2 → ∞ as t → ∞,
t→∞ t→∞
whence h/∥ft ∥2 → 0 as t → ∞). Then, the vector (−ϕ) ∈ M∗2 as well (recall that
M∗2 is a closed cone). Now, consider any x̄ ∈ M 2 ∩ int M 1 (by the premise of the
proposition this set is non-empty). We have ϕ⊤ x̄ ≥ 0 (since x̄ ∈ M 1 and ϕ ∈ M∗1 )
and ϕ⊤ x̄ ≤ 0 (since −ϕ ∈ M∗2 and x̄ ∈ M 2 ). We conclude that ϕ⊤ x̄ = 0, which
contradicts the facts that 0 ̸= ϕ (as ∥ϕ∥2 = 1), ϕ ∈ M∗1 and x̄ ∈ int M 1 (see Fact
II.8.23.(iii)).
Inductive step: Assume that the statement is valid for k − 1 ≥ 2 cones, and let
M 1 ,. . . ,M k be k cones satisfying the premise of the proposition. By this premise,
the cone M1 := M 1 ∩ . . . ∩ M k−1 has a nonempty interior, and M k intersects
this interior. Applying to the pair of cones M1 , M k the already proved 2-cone
version of the lemma, we see that the set [M1 ]∗ + M∗k is closed; here [M1 ]∗ is
the cone dual to M1 . Moreover, the cones M 1 , . . . , M k−1 satisfy the premise of
the (k − 1)-cone version of the lemma. Then, by inductive hypothesis, the set
M∗1 + . . . + M∗k−1 is closed. Then, as M1 := M 1 ∩ . . . ∩ M k−1 , Proposition II.8.25
implies that [M1 ]∗ = M∗1 + . . . + M∗k−1 , and so M∗1 + . . . + M∗k = [M1 ]∗ + M∗k . As
[M1 ]∗ + M∗k is closed, we deduce that M∗1 + . . . + M∗k is closed, as desired. This
concludes the induction step.
Alternative to proof to Proposition II.8.27. Here, we present an alternative
proof of Proposition II.8.27 Part (ii) without relying on induction.
Let us start with the following fact that is important by its own right.

Fact II.8.28 Let M ⊆ Rn be a cone and M∗ be its dual cone. Then, for any
x ∈ int M , there exists a properly selected Cx < ∞ such that
∥f ∥2 ≤ Cx f ⊤ x, ∀f ∈ M∗ .

Now, as explained in the beginning of Part (ii) of the above proof of Proposi-
tion II.8.27, we can assume without loss of generality that the cones M 1 , . . . , M k
satisfying the premise of the proposition are closed, and all we need to prove is
that the cone M∗1 + . . . + M∗k is closed. The latter is the same as to verify that
Pk
whenever vectors fti ∈ M∗i , i ≤ k, t = 1, 2, . . . are such that ft := i=1 fti → h
as i → ∞, it holds h ∈ M∗1 + . . . + M∗k . Indeed, in the situation in question,
selecting x̄ ∈ M k ∩ int M 1 ∩ . . . ∩ int M k−1 (by the promise this intersection is
Pk
nonempty!) we have x̄⊤ fti ≥ 0 for all i, t and i=1 x̄⊤ fti → x̄⊤ h as t → ∞,
implying that for all i ≤ k the sequences {x̄⊤ fti }t=1,2,... are bounded. Moreover,
for any i < k we have x̄ ∈ int M i and fti ∈ M∗i , and so Fact II.8.28 guaran-
tees that the sequence {fti }t=1,2,... is bounded. Thus, as the sequences {fti }t=1,2,...
8.6 Extreme rays and conic Krein-Milman Theorem 107
Pk
are bounded for any i < k and the sequence i=1 fti has a limit as t → ∞,
we conclude that the sequence {ftk }t=1,2,... is bounded as well. Hence, all k se-
quences {fti }t=1,2,... are bounded, so that passing to a subsequence t1 < t2 < . . .
we can assume that f i := limj→∞ ftij is well defined for every i ≤ k. Since
fti ∈ M∗i andPthe cones M∗i are P closed,
i i
Pwe ihave f ∈ M∗ for all i ≤ k. Finally, as
h = limt→∞ i ft = limj→∞ i ftj = i f , we conclude that h ∈ M∗1 + . . . + M∗k ,
i i

as claimed.

8.6 Extreme rays and conic Krein-Milman Theorem


The story about extreme points of closed convex sets has a conic analogy, with
nontrivial closed pointed cones playing the role of nonempty closed convex sets
that do not contain straight lines and extreme rays of these cones in the role of
extreme points.
Once again, in order to define extreme rays of nontrivial closed cones, we need
to work with cones that do not contain straight lines. In the case of closed cones,
this notion of not containing straight lines has a special name, i.e., pointed.

Definition II.8.29 [Pointed cone] A closed cone M ⊆ Rn is called pointed


if M ∩ [−M ] = {0}, i.e., the zero vector is the only vector x that satisfies
x ∈ M and −x ∈ M .

Remark II.8.30 Note that a closed cone M ⊆ Rn is pointed if and only if


M does not contain a straight line passing through the origin. Invoking Lemma
II.8.8, we see that a closed cone is pointed if and only if it does not contain
straight lines. ■
In our discussion, we will focus on nontrivial cones M , i.e., M ̸= {0}. For
nontrivial closed pointed cones, let us formally introduce the definition of extreme
directions and extreme rays.

Definition II.8.31 [Extreme directions and extreme rays] Let M ⊂ Rn be


a nontrivial closed pointed cone. A direction d ∈ Rn is called an extreme
direction of M if it possesses the following two properties:
• d ∈ M \ {0}, and
• in every representation of d as the sum of two vectors from M , i.e., d =
d1 + d2 with d1 , d2 ∈ M , both d1 and d2 are nonnegative multiples of d.
It is clear that when d is an extreme direction of M , so are all positive
multiples of d, i.e., all nonzero vectors on the ray R+ (d) generated by d are
also extreme directions of M . A ray generated by an extreme direction of M
is called an extreme ray of M .
The set of all extreme directions and extreme rays of M are denoted by
ExtD(M ) and ExtR(M ), respectively.
108 Consequences of Separation Theorem

Example II.8.10 The simplest example of nontrivial closed and pointed cone
is the nonnegative orthant Rn+ . Based on our extreme direction definition, the
extreme directions of Rn+ should be the nonzero n-dimensional entrywise non-
negative vectors d such that whenever d = d1 + d2 with d1 ≥ 0 and d2 ≥ 0, both
d1 and d2 must be nonnegative multiples of d. Such a vector d has at all entries
nonnegative and at least one of them positive. In fact, the number of positive
entries in d is exactly one, since if there were at least two entries, say, d1 and
d2 , positive, we would have d = [d1 ; 0; . . . ; 0] + [0; d2 ; d3 ; . . . ; dn ] and both of the
vectors in the right hand side would be nonzero and not proportional to d. Thus,
any extreme direction of Rn+ must be a positive multiple of a basic orth. It is
immediately seen that every vector of the latter type is an extreme direction of
Rn+ . Hence, the extreme directions of Rn+ are positive multiples of the basic orths,
and the extreme rays of Rn+ are the nonnegative parts of the coordinate axes. ♢
We next introduce the concept of a base which is an important type of the
cross section of a nontrivial closed pointed cone. Moreover, we will see that a
base will be a compact convex set and will establish a direct connection between
the extreme rays of the underlying cone and extreme points of its base.

Definition II.8.32 [Base of a cone] Let M ⊆ Rn be a nontrivial closed


pointed cone, and M∗ be its dual cone. A set of the form
B := x ∈ M : f ⊤ x = 1

(8.9)
is called a base of M if this set intersects every emanating from the origin
nontrivial ray in M . Equivalently, a set B of the form (8.9) is a base of M if
for every x ∈ M \{0} there exists t > 0 such that f ⊤ (tx) = 1, or equivalently,
if f ⊤ x > 0 whenever x ∈ M \ {0}.

Fact II.8.33 Let M ⊆ Rn be a nontrivial closed cone, and M∗ be its dual


cone.
(i) M is pointed
(i.1) if and only if M does not contain straight lines,
(i.2) if and only if M∗ has a nonempty interior, and
(i.3) if and only if M has a base.
(ii) Set (8.9) is a base of M
(ii.1) if and only if f ⊤ x > 0 for all x ∈ M \ {0},
(ii.2) if and only if f ∈ int M∗ .
In particular, f ∈ int M∗ if and only if f ⊤ x > 0 whenever x ∈ M \ {0}.
(iii) Every base of M is nonempty, closed, and bounded. Moreover, whenever
M is pointed, for any f ∈ M∗ such that the set (8.9) is nonempty (note
that this set is always closed for any f ), this set is bounded if and only if
f ∈ int M∗ , in which case (8.9) is a base of M .
(iv) M has extreme rays if and only if M is pointed. Furthermore, when
M is pointed, there is one-to-one correspondence between extreme rays of
8.6 Extreme rays and conic Krein-Milman Theorem 109

Figure II.2. Cone and its base (grey pentagon). Extreme rays of the cone are
OA, OB,. . . ,OE intersecting the base at its extreme points A, B, . . . , E.

M and extreme points of a base B of M : specifically, the ray R := R+ (d),


d ∈ M \ {0} is extreme if and only if R ∩ B is an extreme point of B.

See Figure II.2 for an illustration of Fact II.8.33(iv).


In Example II.8.10 we have observed that every vector from Rn+ is the sum
of finitely many extreme directions of Rn+ . This observation is indeed the special
case of the following result.

Theorem II.8.34 [Krein-Milman Theorem, conic form] Let M ⊂ Rn be


a nontrivial closed and pointed cone. Then, its set of extreme directions,
ExtD(M ), is nonempty, and every vector d ∈ M is the sum of finitely many
extreme directions of M .
Proof. Under the premise of the theorem, Fact II.8.33 implies that M has a base
B which is a nonempty convex compact set. Then, by Krein-Milman Theorem
(Theorem II.8.6, B = Conv(Ext(B)). Invoking Fact II.8.33(iv), we conclude that
M has extreme rays; each extreme ray of M is generated by precisely one extreme
point of B. Since vectors from M are exactly the nonnegative multiples of vectors
from B and each point in B is the convex combination of finitely many points
from Ext(B) (by Caratheodory’s Theorem, see Theorem I.2.3), we conclude that
every point of M belongs to the sum of finitely many properly selected extreme
rays of M .
Krein-Milman Theorem in conic form states that a nontrivial closed pointed
cone has extreme rays – even enough of them to make their arithmetic sum the
entire cone. Recall that when a cone M is trivial, we have M = {0} and thus
M does not contain nonzero vectors and therefore has no extreme directions
(the latter, by definition, are nonzero vectors from the cone satisfying certain
additional requirements). Then, by Fact II.8.33(iv) and Krein-Milman Theorem,
we arrive at the following complete characterization of when a closed cone M
possesses extreme rays.
110 Consequences of Separation Theorem

Corollary II.8.35 A closed cone M ⊆ Rn possesses extreme rays if and


only if M is nontrivial and pointed.

8.7 ⋆ Polar of a convex set


We next study the polars of convex sets, a concept closely related to the duals of
cones.

Definition II.8.36 [Polar of a convex set] For any nonempty convex set
M ⊆ Rn , we define its polar [notation: Polar (M )] to be the set of all vectors
a ∈ Rn such that a⊤ x ≤ 1 for all x ∈ M , i.e.,
Polar (M ) := a ∈ Rn : a⊤ x ≤ 1, ∀x ∈ M .


Let us see some basic examples.


Example II.8.11

1. Polar (Rn ) = {0}.


2. Polar ({0}) = Rn .
3. Given a linear subspace in L ⊆ Rn , we have Polar (L) = L⊥ (why?).
4. Let B be the unit Euclidean ball, i.e., B := {x ∈ Rn : ∥x∥2 ≤ 1}. Then,
Polar (B) = B (by Cauchy-Schwarz inequality).
5. Let X ⊂ Rn be nonempty and D be a nonsingular n × n matrix. Then,
Polar (DX) = D−⊤ Polar (X).
6. Let E be an n-dimensional ellipsoid centered at the origin, i.e., E := {x :
x⊤ Cx ≤ 1} where C ≻ 0. Then, Polar (E) = {x : x⊤ C −1 x ≤ 1}, i.e., its polar
is another n-dimensional ellipsoid centered at the origin.
7. Finally, note that passing to polars reverts inclusions: when ∅ ̸= X ⊂ Y ⊂ Rn ,
we have Polar (Y ) ⊂ Polar (X). ♢

For any nonempty convex set M , the following properties of its polar are evi-
dent:

1. 0 ∈ Polar (M );
2. Polar (M ) is convex;
3. Polar (M ) is closed.

It turns out that these properties characterize polars completely:

Proposition II.8.37 Every closed convex set M containing the origin is a


polar set. Specifically, such a set is the polar of its polar:
M is closed, convex, and 0 ∈ M ⇐⇒ M = Polar (Polar (M )).
8.7 ⋆ Polar of a convex set 111

Proof. Based on the evident properties of polars, all we need is to prove that if
M is closed and convex and 0 ∈ M , then M = Polar (Polar (M )). By definition,
for all x ∈ M and y ∈ Polar (M ), we have
y ⊤ x ≤ 1.
Thus, M ⊆ Polar (Polar (M )).
To prove that this inclusion is in fact equality, we assume for contradiction that
there exists x̄ ∈ Polar (Polar (M )) \ M . Since M is a nonempty closed convex set
and x̄ ̸∈ M , the point x̄ can be strongly separated from M (Separation Theorem
(ii)). Thus, there exists b ∈ Rn such that
b⊤ x̄ > sup b⊤ x.
x∈M

As 0 ∈ M , we deduce b x̄ > 0. Passing from b to a proportional vector a = λb
with appropriately chosen positive λ, we may ensure that
a⊤ x̄ > 1 ≥ sup a⊤ x.
x∈M

From the relation 1 ≥ sup a⊤ x we conclude that a ∈ Polar (M ). But, then the
x∈M
relation a⊤ x̄ > 1 contradicts the assumption that x̄ ∈ Polar (Polar (M )). Hence,
we conclude that indeed M = Polar (Polar (M )).
We close this section with a few important properties of the polars.

Fact II.8.38 Let M ⊆ Rn be a convex set containing the origin. Then,


(i) Polar (M ) = Polar (cl M );
(ii) M is bounded if and only if 0 ∈ int(Polar (M ));
(iii) int(Polar (M )) ̸= ∅ if and only if M does not contain straight lines;
(iv) If M is a cone (not necessarily closed), then
Polar (M ) = a ∈ Rn : a⊤ x ≤ 0, ∀x ∈ M = −M∗ .

(8.10)
Assume that M is closed. Then, M is a closed cone if and only if Polar (M )
is a closed cone.
For more information on polars, see Exercise II.38.
9

Geometry of polyhedral sets

9.1 Extreme points of polyhedral sets


Consider a polyhedral set
M = {x ∈ Rn : Ax ≤ b} ,
where A is an m × n matrix and b ∈ Rm . We have seen a geometric character-
ization of extreme points for general convex sets in section 8.2.1. In the case of
polyhedral sets M , we can also give an algebraic characterization of the extreme
points as follows.

Theorem II.9.1 [Characterization of extreme points of polyhedral sets] Let


M = {x ∈ Rn : Ax ≤ b}. A point x ∈ M is an extreme point of M if and only
if there are n linearly independent (i.e., with linearly independent vectors of
coefficients) inequalities of the system Ax ≤ b that are active (i.e., hold as
equalities) at x.

Proof. Let a⊤ i , i = 1, . . . , m, be the rows of A.

 The⊤ “only if” part: let x be an extreme point of M , and define the sets I :=
i : ai x = bi as the set of indices of active constraints and F := {ai : i ∈ I} as
the set of vectors of active constraints. We will prove that the set F contains n
linearly independent vectors, i.e., Lin(F ) = Rn . Assume for contradiction that
this is not the case. Then, as dim(F ⊥ ) = n − dim(Lin(F )), we deduce dim(F ⊥ ) >
0 and so there exists a nonzero vector d ∈ F ⊥ . Consider the segment ∆ϵ :=
[x − ϵd, x + ϵd], where ϵ > 0 will be the parameter of our construction. Since
d is orthogonal to the “active” vectors ai (those with i ∈ I), all points y ∈ ∆ϵ
satisfy the relations a⊤ ⊤
i y = ai x = bi . Now, if i is a “nonactive” index (one with
⊤ ⊤
ai x < bi ), then ai y ≤ bi for all y ∈ ∆ϵ , provided that ϵ is small enough. Since
there are finitely many nonactive indices, we can choose ϵ > 0 in such a way that
all y ∈ ∆ϵ will satisfy all “nonactive” inequalities a⊤ i x ≤ bi , i ̸∈ I, as well. So,
we conclude that for the above choice of ϵ > 0 we get ∆ϵ ⊆ M . But, this is a
contradiction to x being an extreme point of M as we have expressed x as the
midpoint of a nontrivial segment ∆ϵ (recall that ϵ > 0 and d ̸= 0).
To prove the “if” part, we assume that x ∈ M is such that among the in-
equalities a⊤i x ≤ bi which are active at x there are n linearly independent ones.
Without loss of generality, we assume that the indices of these linearly indepen-
dent equations are 1, . . . , n. Given this, we will prove that x is an extreme point
112
9.1 Extreme points of polyhedral sets 113

of M . Assume for contradiction that x is not an extreme point. Then, there exists
a vector d ̸= 0 such that x ± d ∈ M . In other words, for i = 1, . . . , n we would
have bi ≥ a⊤ ⊤ ⊤
i (x ± d) ≡ bi ± ai d (where the last equivalence follows from ai x = bi

for all i ∈ I = {1, . . . , n}), which is possible only if ai d = 0, i = 1, . . . , n. But
the only vector which is orthogonal to n linearly independent vectors in Rn is
the zero vector (why?), and so we get d = 0, which contradicts to the assumption
d ̸= 0.
Theorem II.9.1 states that at every extreme point of a polyhedral set M =
{x ∈ Rn : Ax ≤ b} we must have n linearly independent constraints from Ax ≤ b
holding as equalities. Since a system of n linearly independent equality constraints
in n unknowns has a unique solution, such a system can specify at most one
extreme point of M (exactly one, when the (unique!) solution to the system
satisfies the remaining constraints in the system Ax ≤ b). Moreover, when M
is defined by m inequality constraints, the number of such systems, and thus
the number of extreme points of M , does not exceed the number Cnm of n × n
submatrices of the matrix A ∈ Rm×n . Hence, we arrive at the following corollary.

Corollary II.9.2 Every polyhedral set has finitely many extreme points.

Recall that there are nonempty polyhedral sets which do not have any extreme
points; these are precisely the ones that contain lines.
Note that Cnm is nothing but an upper (and typically very conservative) bound
on the number of extreme points of a polyhedral set in Rn defined by m inequality
constraints. This is because some n × n submatrices of A can be singular, and
what is more important, the majority of the nonsingular ones typically produce
“candidate” points which do not satisfy the remaining inequalities defining M .
Remark II.9.3 Historically, Theorem II.9.1 has been instrumental in developing
an algorithm to solve linear programs, namely the Simplex method. Let us consider
an LP in standard form
minn c⊤ x : P x = p, x ≥ 0 ,

x∈R

where P ∈ Rk×n . Note that we can convert any given LP to this form by adding
a small number of new variables and constraints if needed. In the context of this
LP, Theorem II.9.1 states that the extreme points of the feasible set are exactly
the basic feasible solutions of the system P x = p, i.e., nonnegative vectors x such
that P x = p and the set of columns of P associated with positive entries of x is
linearly independent. As the feasible set of an LP in standard form clearly does
not contain lines (note the constraints x ≥ 0 which restricts the standard form LP
domain to be subset of the pointed cone Rn+ ), among its optimal solutions (if they
exist) at least one is an extreme point of the feasible set (Theorem II.9.12(ii)).
This then suggests a simple algorithm to solve a solvable LP in standard form: go
through the finite set of all extreme points of the feasible set (or equivalently all
basic feasible solutions) and choose the one with the best objective value. This
algorithm allows to find an optimal solution in finitely many arithmetic opera-
114 Geometry of polyhedral sets

tions, provided that the LP is solvable, and underlies the basic idea for the Sim-
plex method. As one will immediately recognize, the number of extreme points,
although finite, may be quite large. The Simplex method operates in a smarter
way and examines only a subset of the basic feasible solutions in an organized
way and can handle other issues such as infeasibility and unboundedness.
Another useful consequence of Theorem II.9.1 is that if all the data in an LP
are rational, then every one of its extreme points is a vector with rational entries.
Thus, a solvable standard form LP with rational data has at least one rational
optimal solution. ■
Theorem II.9.1 has further important consequences in terms of sizes of extreme
points of polyhedral sets as well.
To this end, let us first recall a simple fact from Linear Algebra:

Proposition II.9.4 If a system of linear equations Ax = b (with A ∈ Rm×n )


is feasible, then it has a solution x(b) of “magnitude of order of the magnitude
of b;” that is, ∥x(b)∥2 ≤ C(A)∥b∥2 with parameter C(A) < ∞ which depends
solely on A and but not on b.

Proof. Let r := rank(A). There is nothing to prove when r = 0 as in this case


A is zero, and if the system Ax = b has a solution, the vector zero is one of its
solutions as well. When r > 0, we can assume without loss of generality that
the first r columns of A are linearly independent, and the remaining columns are
linear combinations of these r columns. Then, if there is a solution to Ax = b,
then there must be a solution where xi = 0 for all i > r. We will take such a
solution as x(b). As the first r columns of A are linearly independent, A has an
r × r submatrix A b composed of the first r columns of A and r properly selected
rows of A. Let b be the subvector of b corresponding to the indices of rows selected
b
for A.
b Define the vector x b(b) ∈ Rr be the vector obtained from the first r entries
in x(b). Since the entries in x(b) with indexes greater than r are zero, we have
x(b) = [xb(b); 0] and so A
bxb(b) = b
b. As A
b is a nonsingular matrix, we deduce that

∥x(b)∥2 = ∥x b−1b
b(b)∥2 = ∥A b−1 ∥∥b
b∥2 ≤ ∥A b−1 ∥∥b∥2 ,
b∥2 ≤ ∥A
where ∥A b−1 ∥ is the spectral norm of A b−1 . Then, setting C(A) = ∥A b−1 ∥ concludes
the proof.
Surprisingly, a similar result holds for the solutions of systems of linear inequal-
ities as well.

Proposition II.9.5 Consider a system of linear inequalities Ax ≤ b where


A ∈ Rm×n . Whenever b results in a feasible system Ax ≤ b, then there exists
C(A) < ∞ depending solely on A, but not on b, such that this system has a
solution x(b) with ∥x(b)∥2 ≤ C(A)∥b∥2 .

Proof. This proof is quite similar to the one for Proposition II.9.4. Let r :=
rank(A). The case of r = 0 is trivial – in this case A = 0, and the system Ax ≤ b
is feasible, it has the solution x = 0. When r > 0, we can assume without loss of
9.1 Extreme points of polyhedral sets 115

generality that the first r columns in A are linearly independent. Let A b ∈ Rm×r
be the submatrix of A obtained from the first r columns of A. As A has all the
b
linearly independent columns of A, the image spaces of A and A b are the same.
Thus, the system Ax ≤ b is feasible if and only if the system Au b ≤ b is feasible.
r
Moreover, given any feasible solution u ∈ R to Au ≤ b, we can generate a feasible
b
solution x := [u; 0] ∈ Rn by adding n − r zeros at the end and still preserve the
norm of the solution. Hence, without loss of generality we can assume that the
columns of A are linearly independent and r = n.
As A ∈ Rm×n has n linearly independent columns and each column is a vector
in Rm , we deduce that m ≥ n and {u : Au = 0} = {0}. Thus, we conclude
that the polyhedral set {x : Ax ≤ b} does not contain lines. Therefore, when
nonempty, by Krein-Milman Theorem this polyhedral set has an extreme point.
Let us take this point as x(b). Then, by Theorem II.9.1, at least n of the inequality
constraints from the system Ax ≤ b will be active at x(b) and out of these active
constraints there will be n vectors ai (corresponding to rows of the matrix A)
that are linearly independent. That is, Ab x(b) = b holds for a certain nonsingular
n×n submatrix Ab of A. So, we conclude ∥x(b)∥2 ≤ ∥A−1 b ∥∥b∥2 . Since the number
of r × r nonsingular submatrices in A is finite, the maximum C(A) of the spectral
norms of the inverses of these submatrices is finite as well, and, as we have seen,
for every b for which the system Ax ≤ b is feasible, it has a solution x(b) with
∥x(b)∥2 ≤ C(A)∥b∥2 , as claimed.

9.1.1 Important polyhedral sets and their extreme points


In this section, we will examine a number of important polyhedral sets and their
extreme points. Throughout this section, we suppose that k and n are positive
integers with k ≤ n.

Example II.9.1 Suppose k, n are integers satisfying 0 ≤ k ≤ n. Consider the


polytope
( n
)
X
X := x ∈ Rn : 0 ≤ xi ≤ 1, ∀i ≤ n, xi = k .
i=1

The set of extreme points of X is precisely the set of vectors with entries 0 and
1 which have exactly k entries equal to 1. That is,
( n
)
X
Ext(X) = x ∈ {0, 1}n : xi = k .
i=1

In particular,
Pn the extreme points of the “flat (a.k.a. probabilistic) simplex”
{x ∈ Rn+ : i=1 xi = 1} are the basic orths (see the Figure II.3 for an illustration
116 Geometry of polyhedral sets

of this set with k = 1).

Figure II.3. Example II.9.1, n = 3, k = 1.

Let us justifyPnthe claim of this example. For convenience, we define Y :=


{x ∈ {0, 1}n : i=1 xi = k}; then we need to show that Ext(X) = Y . Clearly,
Y ⊆ X. Moreover, for any y ∈ Y , for each coordinate i = 1, . . . , n we have either
yi = 0 or yi = 1, thus we have n bound constraints active. Since the vectors
of coefficients of these constraints provide us n linearly independent vectors, we
conclude by Theorem II.9.1 that Y ⊆ Ext(X). Now, consider any w ∈ Ext(X).
Then, by Theorem II.9.1 among the constraints specifying X, n constraints with
linearly independent vectors of coefficients should be active at w. Thus, we must
have at least n − 1 of the bound constraints 0 ≤ xi ≤ 1 active at w, i.e., at least
n − 1 of the entries of w must be {0, 1}. Let Pni∗ be the index of the remaining
entry of w. As w P ∈ X, it must also satisfy i=1 wi = k. As k is an integer, we
deduce wi∗ = k − i̸=i∗ wi must be an integer as well. But, then as we also have
0 ≤ wi∗ ≤ 1, we deduce that wi∗ ∈ {0, 1}. Thus, w ∈ Y holds, as desired. ♢

Example II.9.2 Suppose k, n are integers satisfying 0 ≤ k ≤ n. Consider the polytope

X = {x ∈ Rn : 0 ≤ xi ≤ 1 ∀i ≤ n, x1 + . . . + xn ≤ k} .

The set of extreme points of X is precisely the set of vectors with entries 0 and 1 which have at most k entries equal to 1. That is,

Ext(X) = {x ∈ {0, 1}n : x1 + . . . + xn ≤ k} .

In particular, the extreme points of the “full-dimensional simplex” {x ∈ Rn+ : x1 + . . . + xn ≤ 1} are the basic orths and the origin (see Figure II.4 for an illustration of this set with k = 1).

Figure II.4. Example II.9.2, n = 3, k = 1.

Justification of this example follows the one of Example II.9.1 and is left as an
exercise to the reader. ♢
Example II.9.3 Suppose k, n are integers satisfying 0 ≤ k ≤ n. Consider the polytope

X = {x ∈ Rn : |xi | ≤ 1 ∀i ≤ n, |x1 | + . . . + |xn | ≤ k} .

Extreme points of X are exactly the vectors with entries 0, 1, −1 which have exactly k nonzero entries. That is,

Ext(X) = {x ∈ {−1, 0, 1}n : |x1 | + . . . + |xn | = k} .

In particular, extreme points of the unit ∥ · ∥1 -ball {x ∈ Rn : ∥x∥1 ≤ 1} = {x ∈ Rn : |x1 | + . . . + |xn | ≤ 1} are exactly the vectors {±ei : i = 1, . . . , n}, where ei is the i-th basic orth (see Figure II.5 for an illustration of this set with k = 1).

Figure II.5. Example II.9.3, n = 3, k = 1.

Similarly, extreme points of the unit ∥ · ∥∞ -ball {x ∈ Rn : ∥x∥∞ ≤ 1} = {x ∈ Rn : |xi | ≤ 1, ∀i = 1, . . . , n} are the 2ⁿ vectors with ±1 entries (see Figure II.6 for an illustration of this set).

Figure II.6. Extreme points of the box {x ∈ R3 : ∥x∥∞ ≤ 1}.


Here is the justification of the claim of this example. For convenience, we define Y := {x ∈ {−1, 0, 1}n : |x1 | + . . . + |xn | = k}; we need to show that Ext(X) = Y . Clearly, Y ⊆ X. Consider any y ∈ Y . Without loss of generality, suppose that the nonzero entries of y are the first k. Now, consider any h such that y ± h ∈ X. Since yi ∈ {−1, +1} for i = 1, . . . , k and y ± h ∈ X, we must have hi = 0 for all i = 1, . . . , k. Also, y + h ∈ X implies that k ≥ |y1 + h1 | + . . . + |yn + hn | = (|y1 | + . . . + |yk |) + (|hk+1 | + . . . + |hn |) = k + |hk+1 | + . . . + |hn |, and thus hi = 0 for all i = k + 1, . . . , n.
This proves that h = 0 and thus y must indeed be an extreme point of X. So,
Y ⊆ Ext(X). Now, consider any w ∈ Ext(X), we will show that w ∈ Y . Note
that X has certain symmetry: for any x ∈ X and any d ∈ {−1, +1}n , we have
Diag(d)x ∈ X as well. In particular, any d ∈ {−1, +1}n maps X onto itself
and therefore maps its extreme points Ext(X) onto Ext(X). As a result, we can
assume without loss of generality that w ≥ 0. Then, all we need to prove is
that w has k entries equal to 1 and all remaining entries equal to 0. The set X+ := {x ∈ X : x ≥ 0} = {x ∈ Rn : 0 ≤ xi ≤ 1 ∀i ≤ n, x1 + . . . + xn ≤ k} is contained in X and w ∈ X+ . As w ∈ Ext(X) and w ∈ X+ ⊂ X, we must have that w is an extreme point of X+ as well.
In the preceding reasoning, we have used the following evident fact:
Suppose P ⊂ Q are convex sets. If x̄ ∈ P is an extreme point of Q, then
it is an extreme point of P as well.
(Otherwise x̄ would be the midpoint of a nontrivial segment contained in P and
therefore contained in Q).
Then, noting that X+ is precisely the set from Example II.9.2 and w ∈ Ext(X+ ), we conclude that w has only 0 and 1 entries, with at most k entries equal to 1. It remains to verify that the number of nonzero entries in w is equal to k. Indeed, if w were to have fewer than k nonzero entries, then w would have a zero entry, say, w1 = 0, and |w1 | + . . . + |wn | < k, implying that there exists ϵ ∈ (0, 1) such that the vector h = [ϵ; 0; . . . ; 0] satisfies (w ± h) ∈ X, which is impossible since w ∈ Ext(X). ♢
As our last example we next discuss the so-called Assignment polytope, which is closely connected to the very important concept of doubly stochastic matrices and the Birkhoff Theorem.
Definition II.9.6 [Doubly stochastic matrix] A matrix X = [xij ]ni,j=1 ∈ Rn×n is called doubly stochastic if xij ≥ 0 for all i, j ∈ {1, . . . , n}, Σi xij = 1 for all j ∈ {1, . . . , n} (i.e., all column sums are equal to 1), and Σj xij = 1 for all i ∈ {1, . . . , n} (i.e., all row sums are equal to 1).

The set of all doubly stochastic matrices (treated as elements of Rn² = Rn×n ) forms the following bounded polyhedral set:

Πn := {X = [xij ]ni,j=1 : xij ≥ 0 ∀i, j ∈ {1, . . . , n}; Σi xij = 1 ∀j ∈ {1, . . . , n}; Σj xij = 1 ∀i ∈ {1, . . . , n}} .

The set Πn is also called the Assignment (or balanced matching) polytope. As Πn
is a polytope, by Krein-Milman Theorem, Πn is the convex hull of its extreme
points. What are these extreme points? The answer is given by the following
fundamental result.

Theorem II.9.7 [Birkhoff–von Neumann Theorem] Extreme points of Πn are exactly the permutation matrices of order n, which are n × n Boolean (i.e., with 0/1 entries) matrices with exactly one nonzero element (equal to 1) in every row and every column.

Proof. It is indeed easy to see that every n × n permutation matrix is an extreme point of Πn ; we give this as Exercise II.42.
We now prove the difficult part, that is we will show that every extreme point
of Πn is a permutation matrix. First, note that the 2n linear equations in the
definition of Πn , those saying that all row and column sums are equal to 1, are
linearly dependent (observe that the sum of the first group of equalities is exactly
the same as the sum of the second group of equalities). Thus, we lose nothing
when assuming that there are just 2n − 1 equality constraints in the description
of Πn . Now, let us prove the claim by induction on n. The base n = 1 is trivial.
As the induction hypothesis suppose that the statement holds for Πn−1 . Let X
be an extreme point of Πn . By Theorem II.9.1, among the constraints defining
Πn (i.e., 2n − 1 equalities and n2 inequalities xij ≥ 0) there should be n2 linearly
independent constraints which are satisfied at X as equations. Thus, at least
n2 − (2n − 1) = (n − 1)2 entries in X should be zeros. It follows that at least one
of the columns of X contains ≤ 1 nonzero entries (since otherwise the number of
zero entries in X would be at most n(n − 2) < (n − 1)2 ). Thus, there exists at
least one column with at most 1 nonzero entry; since the sum of entries in this
column is 1, this nonzero entry, let it be xīj̄ , is equal to 1. As the entries in row ī
are nonnegative, sum up to 1, and xīj̄ = 1, we see that xīj̄ is the only nonzero entry in its row and in its column. Eliminating from X the row ī and the column j̄, we get an (n − 1) × (n − 1) doubly stochastic matrix. By the inductive hypothesis, this matrix is a convex combination of (n − 1) × (n − 1) permutation matrices. Augmenting every one of these matrices by the column and the row we have eliminated, we get a representation of X as a convex combination of n × n permutation matrices: X = Σℓ λℓ P ℓ with nonnegative λℓ summing up to 1. Since P ℓ ∈ Πn and X is an extreme point of Πn , in this representation all terms λℓ P ℓ with nonzero coefficients λℓ must be equal to λℓ X, so that X is one of the permutation matrices P ℓ and as such is a permutation matrix.
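The inductive argument above is essentially an algorithm: repeatedly find a permutation matrix supported on the positive entries of the current matrix (such a permutation exists by the theorem itself) and subtract as large a multiple of it as possible. Below is a small Python sketch of this greedy decomposition, added here as an illustration (it is not part of the original text); it uses SciPy's assignment solver to locate a permutation inside the support, and the 3 × 3 matrix is arbitrary test data.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def birkhoff_decompose(X, tol=1e-9):
    """Write a doubly stochastic X as sum_l lam_l * P_l with permutation matrices P_l."""
    X = np.array(X, dtype=float)
    terms = []
    while X.max() > tol:
        # find a permutation using only positive entries of X by penalizing zeros
        cost = np.where(X > tol, 0.0, 1.0)
        rows, cols = linear_sum_assignment(cost)
        lam = X[rows, cols].min()            # largest multiple we can subtract
        P = np.zeros_like(X)
        P[rows, cols] = 1.0
        terms.append((lam, P))
        X = X - lam * P                      # at least one more entry becomes zero
    return terms

X = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.5, 0.3],
              [0.3, 0.2, 0.5]])
terms = birkhoff_decompose(X)
print(sum(lam for lam, _ in terms))                           # weights sum to 1
print(np.allclose(sum(lam * P for lam, P in terms), X))       # reconstructs X
```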

9.2 Extreme rays of polyhedral cones


Recall that for nontrivial closed pointed cones, we have defined the concepts of
extreme directions and extreme rays in section 8.6. In the case of polyhedral
cones, we can also give an algebraic characterization of its extreme directions
analogous to Theorem II.9.1.

Theorem II.9.8 [Characterization of extreme directions of polyhedral cones] Consider a polyhedral cone M = {d ∈ Rn : a⊤i d ≤ 0, i = 1, . . . , m}. Suppose that M is nontrivial and pointed. A direction d ∈ M \ {0} is an extreme direction of M if and only if there are n − 1 linearly independent (i.e., with linearly independent vectors ai ) constraints which are active at d (i.e., such that a⊤i d = 0).

Proof. As M is a nonempty closed (recall that it is polyhedral!) pointed cone, from Fact II.8.23(iii) we deduce that its dual cone M∗ has a nonempty interior. Let f ∈ int M∗ . Consider the set

B := M ∩ {d ∈ Rn : f ⊤ d = 1} .

Then, by Fact II.8.33(ii), B is a base of M . Thus, B is nonempty and compact (see Fact II.8.33(iii)). As M is polyhedral, by definition of B, we have that B is polyhedral
as well. Since B is a nonempty bounded polyhedral set, from Theorem II.8.6(ii)
we have B = Conv(Ext(B)). Recall from Fact II.8.33(iv) that there is a one-to-
one correspondence between extreme rays of M and extreme points of a base B
of M ; specifically, the ray R := R+ (d), d ∈ M \ {0} is extreme if and only if
R ∩ B is an extreme point of B.
Consider an extreme point xd of B. Applying Theorem II.9.1, we deduce that
at every extreme point of B we must have at least n active constraints from
the set of linear (in)equalities f ⊤ xd = 1 and a⊤i xd ≤ 0, i = 1, . . . , m, where the
corresponding vectors of active constraints must contain n linearly independent
vectors. Considering the definition of B, at every extreme point xd of B, we have
the constraint f ⊤ xd = 1 is active. Let I be the set of indices i ∈ {1, . . . , m}
such that a⊤ i xd = 0. Then, the vectors generating the active constraints at xd
are given by {f } ∪ {ai : i ∈ I}. Since among the active constraints at xd , there
are n of them with linearly independent vectors, we conclude that there must be
n − 1 linearly independent vectors in {ai : i ∈ I}. Then, as xd and d generate the same extreme direction of M , we conclude that d satisfies the desired algebraic characterization.
To establish the reverse direction, suppose that a direction d ∈ M \{0} satisfies a⊤i d = 0 for i ∈ I ⊆ {1, . . . , m}, where there are n − 1 linearly independent vectors in {ai : i ∈ I}. Let xd := R+ (d) ∩ B. We claim that xd ∈ Ext(B). Note that xd satisfies a⊤i xd = 0 for i ∈ I and f ⊤ xd = 1. If f ∈ Lin({ai : i ∈ I}), then from the constraints a⊤i xd = 0 for all i ∈ I we would have deduced f ⊤ xd = 0, which is not the case. So, the set {f } ∪ {ai : i ∈ I} must have n linearly independent vectors. Then, by Theorem II.9.1, xd must be an extreme point of B. Once again, using Fact II.8.33(iv), we conclude that d must be an extreme direction of M .
Analogous to Corollary II.9.2, we have the following immediate corollary of
this theorem.

Corollary II.9.9 Every nontrivial pointed polyhedral cone M has finitely many extreme rays. Moreover, the sum of these extreme rays is the entire M .
Proof. Let M be a nontrivial pointed polyhedral cone. As any polyhedral cone
is defined by finitely many linear inequalities, using the algebraic characterization
of the extreme directions given in Theorem II.9.8, we deduce that M has finitely
many extreme rays. The last claim of the corollary is justified by Theorem II.8.34,
i.e., Krein-Milman Theorem in conic form, as this theorem states that M has
extreme rays, and their sum is the entire M .

9.3 Geometry of polyhedral sets


By definition, a polyhedral set M is the set of all solutions to a finite system of
nonstrict linear inequalities:
M := {x ∈ Rn : Ax ≤ b} , (9.1)
where A ∈ Rm×n and b ∈ Rm . This is an “outer” description of a polyhedral
set, that is, it explains what we should delete from Rn to get M (cf: “to create
a sculpture, take a big stone and delete everything that is redundant”). We are
about to establish an important result on the equivalent “inner” representation
of a polyhedral set, that is, one explaining how to build the set starting with
simple “building blocks.”
Consider the following construction. Let us take two finite sets of vectors V
(“vertices;” this set should be nonempty) and R (“rays;” this set can be empty)
and build the set
M (V, R) := Conv(V ) + Cone(R)
          = {Σv∈V λv v + Σr∈R µr r : λv ≥ 0 ∀v ∈ V, Σv∈V λv = 1, µr ≥ 0 ∀r ∈ R} .
Thus, in the construction of M (V, R) we take all vectors which can be represented
as sums of convex combinations of the points from V and conic combinations of
the points from R. The set M (V, R) clearly is convex as it is the arithmetic sum
of two convex sets Conv(V ) and Cone(R) (recall our convention that Cone(∅) =
{0}, see Fact I.1.20). We are now ready to present the promised inner description
of a polyhedral set.

Theorem II.9.10 [Inner description of a polyhedral set] The sets of the form M (V, R), where V, R are finite sets of vectors and V ̸= ∅, are exactly the nonempty polyhedral sets: M (V, R) is polyhedral, and every nonempty polyhedral set M is M (V, R) for properly chosen finite sets V ̸= ∅ and R. The sets of the type M ({0}, R) are exactly the polyhedral cones (sets given by finitely many nonstrict homogeneous linear inequalities).

Remark II.9.11 We will see in section 9.3.3 that the inner characterization of
the polyhedral sets given in Theorem II.9.10 can be made much more precise.
Suppose that we are given a nonempty polyhedral set M . Then, we can select an
inner characterization of it in the form of M = Conv(V ) + Cone(R) with finite V
and finite R, where the “conic” part Cone(R) (not the set R itself!) is uniquely
defined by M ; in fact it will always hold that Cone(R) = Rec(M ), i.e., R can
be taken as the generators of the recessive cone of M (see Comment to Lemma
II.8.8). Moreover, if M does not contain lines, then V can be chosen as the set of
all extreme points of M . ■
We will prove Theorem II.9.10 in section 9.3.3. Before proceeding with its proof,
let us understand why this theorem is so important, i.e., why it is so nice to know
both inner and outer descriptions of a polyhedral set.
Consider the following natural questions:
• A. Is it true that the inverse image of a polyhedral set M ⊂ Rn under an affine
mapping y 7→ P(y) = P y + p : Rm → Rn , i.e., the set
P −1 (M ) = {y ∈ Rm : P y + p ∈ M }
is polyhedral?
• B. Is it true that the image of a polyhedral set M ⊂ Rn under an affine mapping
x 7→ y = P(x) = P x + p : Rn → Rm – the set
P(M ) = {P x + p : x ∈ M }
is polyhedral?
• C. Is it true that the intersection of two polyhedral sets is again a polyhedral
set?
• D. Is it true that the arithmetic sum of two polyhedral sets is again a polyhedral
set?
The answers to all these questions are positive; one way to see it is to use calculus
of polyhedral representations along with the fact that polyhedrally representable
sets are exactly the same as polyhedral sets (see chapter 3). Another very in-
structive way is to use the just outlined results on the structure of polyhedral
sets, which we will do now.
It is very easy to answer affirmatively to A, starting from the original “outer” definition of a polyhedral set: if M = {x : Ax ≤ b}, then, of course,
P −1 (M ) = {y : A(P y + p) ≤ b} = {y : (AP )y ≤ b − Ap}
and therefore P −1 (M ) is a polyhedral set.
An attempt to answer affirmatively to B via the same “outer” definition fails
– there is no easy way to convert the linear inequalities defining a polyhedral
set into those defining its image, and it is absolutely unclear why the image in
fact is given by finitely many linear inequalities. Note, however, that there is no
difficulty to answer affirmatively to B with the inner description of a nonempty
polyhedral set: if M = M (V, R), then, evidently,
P(M ) = M (P(V ), P R),
where P R := {P r : r ∈ R} is the image of R under the action of the homogeneous
part of P.
A positive answer to C becomes evident when we use the outer description of
a polyhedral set: taking intersection of the solution sets to two systems of finitely
many nonstrict linear inequalities, we, of course, again get the solution set to a
system of this type – you simply should put together all inequalities from both
of the original systems.
On the other hand, it is very unclear how to give the affirmative answer to D
using the outer description of a polyhedral set – what happens to the inequalities
when we add the solution sets? In contrast to this, the inner description gives the
answer immediately:
M (V, R) + M (V ′ , R′ ) = Conv(V ) + Cone(R) + Conv(V ′ ) + Cone(R′ )
= [Conv(V ) + Conv(V ′ )] + [Cone(R) + Cone(R′ )]
= Conv(V + V ′ ) + Cone(R ∪ R′ )
= M (V + V ′ , R ∪ R′ ).
Note that in this computation we used two rules which should be justified:
Conv(V ) + Conv(V ′ ) = Conv(V + V ′ ) and Cone(R) + Cone(R′ ) = Cone(R ∪ R′ ).
The second is evident from the definition of the conic hull, and the first one
follows from a very simple reasoning. To see it, note that Conv(V ) + Conv(V ′ )
is a convex set which by its definition contains V + V ′ and thus also contains
Conv(V + V ′ ). The inverse inclusion is proved as follows: if
x = Σi λi vi ,    y = Σj λ′j vj′

are convex combinations of points from V , respectively, V ′ , then, as it is immediately seen (please check!),

x + y = Σi,j λi λ′j (vi + vj′ )

and the right hand side expression is nothing but a convex combination of points
from V + V ′ .
To conclude, it is extremely useful to keep in mind both descriptions of polyhedral sets – what is difficult to see with one of them is absolutely clear with the other.
As a seemingly “more important” application of the developed theory, let us
look at Linear Programming.

9.3.1 Application: Descriptive theory of Linear Programming


A general linear program is the problem of minimizing a linear objective function
over a polyhedral set:
minx {c⊤ x : x ∈ M } , where M := {x ∈ Rn : Ax ≤ b} .      (P )

Here, c ∈ Rn is the objective, A ∈ Rm×n is the constraint matrix, and b ∈ Rm is the right hand side vector. Note that (P ) is called a “Linear Programming problem in the canonical form;” there are other equivalent forms of this problem as well.

9.3.2 Application: Solvability of a Linear Programming problem


According to the Linear Programming terminology discussed in section 4.5.1, (P )
is called
• feasible, if it admits a feasible solution, i.e., the system Ax ≤ b is feasible, and
infeasible otherwise;
• bounded, if its objective is below bounded on the feasible set (e.g., due to the
fact that the latter is empty), and unbounded otherwise;
• solvable, if it is feasible and the optimal solution exists, i.e., the objective
function attains its minimum on the feasible set.
Whenever (P ) is feasible, the infimum of the values of the objective function at
feasible solutions is called the optimal value Opt(P ) of (P ). Opt(P ) is finite when
the problem (P ) is bounded from below and −∞ when (P ) is unbounded. In the
case of a minimization type problem, it is convenient to assign the optimal value
of +∞ whenever the problem is infeasible.
Note that our terminology is aimed to deal with minimization problems; if the
problem is to maximize the objective, the terminology is updated in the natural
way: when defining bounded/unbounded programs, we should speak about above
boundedness rather than about the below boundedness of the objective on the
feasible set, etc. As a result, the optimal value of an LP problem
• in the case of a minimization problem is the infimum of the objective over the
feasible set, provided the latter is nonempty, and +∞ when the problem is
infeasible;
• in the case of a maximization problem is the supremum of the objective over
the feasible set, provided the latter is nonempty, and −∞ when the problem is
infeasible.
This terminology is consistent with the usual way of converting a minimization
problem into an equivalent maximization one by replacing the original objective c
with −c: the properties of feasibility, boundedness, solvability remain unchanged,
and the optimal value in all cases changes its sign.
When talking about the possible outcomes of solving an LP problem, we talk
about three possibilities: (i) infeasible LP problem, (ii) unbounded and feasible
LP problem, and (iii) solvable LP problem. In particular, it seems that in the case of a “bounded and feasible” LP problem, we are jumping straight to the conclusion that the corresponding optimization problem will be “solvable.” This claim, namely that a bounded and feasible LP program is always solvable, is indeed true, although it is absolutely unclear in advance why (note that this statement absolutely does not hold for general convex programming problems without further assumptions). We have already established this fact, even twice — via Fourier-Motzkin elimination (section 3.2) and via the LP Duality Theorem (see Theorem I.4.9). In fact yet another proof of
this fundamental fact of Linear Programming follows immediately from the inner
characterization of polyhedral sets as shown next.

Theorem II.9.12 Suppose we are given a feasible minimization type LP problem (P ) via an inner representation of its feasible set M :

M = Conv(V ) + Cone(R),

where V = {vi : i = 1, . . . , I} and R = {rj : j = 1, . . . , J} are finite and nonempty sets (cf. Theorem II.9.10). Then,
(i) (P ) is solvable if and only if it is below bounded, which is the case if and only if c⊤ rj ≥ 0 for all 1 ≤ j ≤ J. In particular, the set C of objectives c for which (P ) is below bounded is a polyhedral cone.
(ii) When (P ) is below bounded, its optimal value is equal to

Opt(P ) = minv∈V c⊤ v.

Thus, Opt(P ) is a concave function of the objective vector c, and there is an optimal solution which is the best, in terms of its objective value, among the points in V . In addition, when the feasible set of (P ) does not contain lines and (P ) is below bounded, there is at least one optimal solution of (P ) which is an extreme point of M .

Proof. (i): By Theorem II.9.10, we clearly have

Opt(P ) = minv {c⊤ v : v ∈ Conv(V )} + inf r {c⊤ r : r ∈ Cone(R)} .

We see that Opt(P ) is finite if and only if inf r {c⊤ r : r ∈ Cone(R)} > −∞, and the latter clearly is the case if and only if c⊤ r ≥ 0 for all r ∈ R. Then, in such a case inf r {c⊤ r : r ∈ Cone(R)} = 0, and also minv {c⊤ v : v ∈ Conv(V )} = minv {c⊤ v : v ∈ V }.
(ii): The first claim in (ii) is an immediate byproduct of the proof of (i). The second claim follows from the fact that when M does not contain lines, we can take V = Ext(M ), see Remark II.9.11.
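As a tiny numerical illustration of Theorem II.9.12 (a sketch added here, with made-up data), take M = {x ∈ R² : x ≥ 0, x1 + x2 ≥ 1}, whose inner description has V = {(1, 0), (0, 1)} and R = {(1, 0), (0, 1)}. The criterion c⊤ rj ≥ 0 and the formula Opt(P ) = minv∈V c⊤ v can be cross-checked against an ordinary LP solver applied to the outer description of the same set.

```python
import numpy as np
from scipy.optimize import linprog

# inner description M = Conv(V) + Cone(R) of M = {x in R^2 : x >= 0, x1 + x2 >= 1}
V = np.array([[1.0, 0.0], [0.0, 1.0]])
R = np.array([[1.0, 0.0], [0.0, 1.0]])
c = np.array([2.0, 3.0])

below_bounded = bool(np.all(R @ c >= 0))           # criterion of Theorem II.9.12(i)
opt_inner = float((V @ c).min()) if below_bounded else -np.inf

# the same set in outer form {x : -x1 - x2 <= -1, x >= 0}, solved as a plain LP
res = linprog(c, A_ub=[[-1.0, -1.0]], b_ub=[-1.0], bounds=[(0, None), (0, None)])
print(below_bounded, opt_inner, res.fun)           # True 2.0 2.0
```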

9.3.3 Proof of Theorem II.9.10

In this section, we will prove Theorem II.9.10. To simplify our language let
us call VR-sets (“V” from “vertex” and “R” from “rays”) the sets of the form
M (V, R), and P-sets the nonempty polyhedral sets, i.e., defined by finitely many
linear non-strict inequalities. We should prove that every P-set is a VR-set, and
vice versa.
VR =⇒ P: This is immediate: a VR-set is nonempty and polyhedrally repre-
sentable (why?) and thus is a nonempty P-set by Theorem I.3.2.
P =⇒ VR:
Let M ̸= ∅ be a P-set, so that M is the set of all solutions to a feasible system
of linear inequalities:
M = {x ∈ Rn : Ax ≤ b} , (9.2)
where A ∈ Rm×n .
We will first study the case of P-sets that do not contain lines, and then reduce
the general case to this one.

Theorem II.9.13 [Structure of a polyhedral set with no lines] A nonempty polyhedral set M = {x ∈ Rn : Ax ≤ b} which does not contain lines admits a VR-representation given by M = M (V, R) = Conv(V ) + Cone(R), where V is the set of extreme points of M , and R = {0} if M is bounded and R is the set of extreme rays of Rec(M ) if M is unbounded.

Proof. As M is a nonempty closed convex set that does not contain lines, by
Theorem II.8.6(i) we know Ext(M ) ̸= ∅, and by Theorem II.8.16 we have M =
Conv(Ext(M )) + Rec(M ). Moreover, by Corollary II.9.2, we have Ext(M ) is a
finite set.
If M is bounded, then Rec(M ) = {0}, and thus the result follows. Suppose M
is unbounded. Then, Rec(M ) is nontrivial. Also, Rec(M ) is pointed, since M not containing lines implies that Rec(M ) does not contain lines either. Moreover,
from Fact II.8.15, we deduce that Rec(M ) = {h ∈ Rn : Ah ≤ 0} and thus is
a polyhedral cone. Then, by Corollary II.9.9 we have that Rec(M ) has finitely
many extreme rays and Rec(M ) is the sum of its extreme rays.
Next, we study the case when M contains a line. We start with the following
observation.

Lemma II.9.14 A nonempty polyhedral set M = {x ∈ Rn : Ax ≤ b} contains lines if and only if KerA ̸= {0}. Moreover, given a vector h ̸= 0, the set M contains a line with direction h (i.e., x + th ∈ M for all x ∈ M and t ∈ R) if and only if h ∈ KerA. That is, the nonzero vectors from KerA are exactly the directions of lines contained in M .

Proof. If h ̸= 0 is the direction of a line in M , then A(x + th) ≤ b for some x ∈ M and all t ∈ R, which is possible if and only if Ah = 0. Vice versa, if h ̸= 0 is from the kernel of A, i.e., if Ah = 0, then the line x + R(h) with x ∈ M is clearly contained in M .
Given a nonempty set M as in (9.2), define L := KerA, let L⊥ be the orthogonal
complement to L, and let M ′ be the intersection of M and L⊥ :
M ′ := {x ∈ L⊥ : Ax ≤ b} .
First, note that as M ̸= ∅ we have M ′ ̸= ∅, and also the set M ′ clearly does
not contain lines. This is because if h ̸= 0 is the direction of a line satisfying
x + th ∈ M ′ for all t ∈ R and some x ∈ M ′ , by definition of M ′ we must have
x + th ∈ L⊥ for all t and thus h ∈ L⊥ . On the other hand, by Lemma II.9.14,
we must also have h ∈ KerA = L. Then, h ∈ L ∩ L⊥ implies h = 0, which is a
contradiction.
Now, note that M ′ ̸= ∅ satisfies M = M ′ + L. Indeed, M ′ contains the or-
thogonal projections of all points from M onto L⊥ (since to project a point onto
L⊥ , you should move from this point along a certain line with a direction from
L, and all these movements, started in M , keep you in M by Lemma II.9.14) and
therefore is nonempty, first, and is such that M ′ + L ⊇ M , second. On the other
hand, M ′ ⊆ M and M + L = M (by Lemma II.9.14), and so M ′ + L ⊆ M . Thus,
M′ + L = M.
Finally, it is clear that M ′ is a polyhedral set as the inclusion x ∈ L⊥ can be
represented by dim(L) linear equations (i.e., by 2 dim(L) nonstrict linear inequal-
ities). To this end, all we need is a set of vectors ξ1 , . . . , ξdim(L) forming a basis in
L, and then L⊥ := {x ∈ Rn : ξi⊤ x = 0, ∀i = 1, . . . , dim(L)}.
Therefore, with these steps, given an arbitrary nonempty P-set M , we have
represented it as the sum of a P-set M ′ which does not contain lines and a linear
subspace L. Then, as M ′ does not contain lines, by Theorem II.9.13, we have
M ′ = M (V ′ , R′ ) where V ′ is the nonempty set of extreme points of M ′ and R′
is the set of extreme rays of Rec(M ′ ). Let us define R′′ to be the finite set of
generators for L, i.e., L = Cone(R′′ ). Then, we arrive at
M = M′ + L
= [Conv(V ′ ) + Cone(R′ )] + Cone(R′′ )
= Conv(V ′ ) + [Cone(R′ ) + Cone(R′′ )]
= Conv(V ′ ) + Cone(R′ ∪ R′′ )
= M (V ′ , R′ ∪ R′′ ).
Thus, this proves that a P-set is indeed a VR-set, as desired.
Finally, let us justify Remark II.9.11. Suppose we are given M = Conv(V ) +
Cone(R) with finite sets V, R and V ̸= ∅. Justifying the first claim in this re-
mark requires us to show that Cone(R) = Rec(M ). To see this, from M = Conv(V ) + Cone(R) it clearly follows Cone(R) ⊆ Rec(M ). To prove the reverse inclusion, consider any r ∈ Rec(M ). Then, by definition of the recessive cone, for
any v ∈ V ⊆ M , we have v + tr ∈ M for all t > 0. In addition, as v + tr ∈ M for
all t > 0, from M = Conv(V ) + Cone(R) we deduce that for all t > 0
∃vt ∈ Conv(V ) and ∃rt ∈ Cone(R): v + tr = vt + rt .
Since V is a finite set, Conv(V ) is bounded, and so r = limt→∞ t−1 rt . Moreover,
this limit (and thus r) belongs to Cone(R) as Cone(R) is polyhedral and thus
closed. The last claim of Remark II.9.11 states that when a nonempty P-set M
does not contain lines, in every representation M = M (V, R) with finite V and R
one has Ext(M ) ⊆ V . To see this, suppose x is an extreme point of M . We will first
show that x ∈ Conv(V ). Assume for contradiction that x ∈ M \ Conv(V ). Then,
from x ∈ M = Conv(V ) + Cone(R), we deduce that x = x̄ + r with x̄ ∈ Conv(V )
and 0 ̸= r ∈ Cone(R) (why r ̸= 0?). But, this would imply x̄ = x − r ∈ M and
x + r ∈ M as x ∈ M , r ∈ Cone(R) = Rec(M ) and M is convex. Thus, we arrive
at x ± r ∈ M with r ̸= 0, contradicting the fact that x ∈ Ext(M ). Therefore,
x ∈ Conv(V ), and hence x ∈ V by Fact II.8.5.

9.4 ⋆ Majorization
In this section we will introduce and study the Majorization Principle, which
describes the convex hull of permutations of a given vector.
For any x ∈ Rn , we define X[x] to be the set of all convex combinations of n!
vectors obtained from x by permuting its coordinates. That is,
X[x] := Conv ({P x : P is an n × n permutation matrix})
= {Dx : D ∈ Πn } ,
where Πn is the set of all n × n doubly stochastic matrices. Here, the equality is
due to the Birkhoff-von Neumann Theorem. Note that X[x] is a permutationally
symmetric set, that is, given any vector from the set, the vector obtained by permuting its entries is also in the set.

Theorem II.9.15 [Majorization Principle] Given two vectors x, y ∈ Rn , we have y ∈ X[x] if and only if y satisfies

sj (y) ≤ sj (x), j = 1, . . . , n − 1,
y1 + . . . + yn = x1 + . . . + xn ,          (9.3)

where sj (y) is the sum of the j largest entries of the vector y.

Proof: To ease our notation, let us define the set

Y := {y ∈ Rn : sj (y) ≤ sj (x), j = 1, . . . , n − 1; sn (y) = y1 + . . . + yn = x1 + . . . + xn = sn (x)} .
Then, we need to show that Y = X[x].
For any k, define Ik to be the family of all k-element subsets of the index set {1, 2, . . . , n}, and so

sk (y) = maxI∈Ik Σi∈I yi .          (9.4)

We first prove that Y ⊇ X[x]. Consider any y ∈ X[x]. By the definition of X[x], y can be represented as a convex combination

y = Σσ λσ xσ ,

where the sum is taken over all permutations σ of indices 1, 2, . . . , n, and xσ is obtained from x by permutation σ of the entries, i.e.,

xσ := [xσ(1) ; . . . ; xσ(n) ].

Consequently, for every I ∈ Ik we have

Σi∈I yi = Σi∈I Σσ λσ xσ(i) = Σσ λσ Σi∈I xσ(i) ≤ Σσ λσ sk (x) = sk (x),

where the inequality is due to (9.4). Maximizing both sides of this inequality over
I ∈ Ik and invoking (9.4) once again, we get sk (y) ≤ sk (x) for all k ≤ n. In
addition,

Σi yi = Σi Σσ λσ xσ(i) = Σσ λσ Σi xσ(i) = Σσ λσ Σi xi = Σi xi ,

that is, sn (y) = sn (x). Thus, y ∈ Y , whence Y ⊇ X[x].


We will now prove the difficult part of Majorization Principle which states that
Y ⊆ X[x]. Consider any y ∈ Y and let us prove that y ∈ X[x]. By symmetry,
we may assume without loss of generality that the vectors x and y are ordered:
x1 ≥ x2 ≥ . . . ≥ xn and y1 ≥ y2 ≥ . . . ≥ yn . Assume for contradiction that
y ̸∈ X[x]. Since X[x] clearly is a convex compact set and y ̸∈ X[x], there exists
a linear form c⊤ z which strongly separates y and X[x], i.e.,

c⊤ y > maxz∈X[x] c⊤ z.

As the set X[x] is permutationally symmetric and the vector y is ordered, without loss of generality we can select the vector c to be ordered as well. This is because when permuting the entries of c, we preserve maxz∈X[x] c⊤ z, and arranging the entries of c in non-increasing order, we do not decrease c⊤ y: assuming, say, that c1 < c2 , swapping c1 and c2 we do not decrease c⊤ y, since [c2 y1 + c1 y2 ] − [c1 y1 + c2 y2 ] = [c2 − c1 ][y1 − y2 ] ≥ 0. Next, by Abel’s formula (the discrete analog of integration by parts)
we have

c⊤ y = Σi ci yi = Σi≤n−1 (ci − ci+1 )(y1 + . . . + yi ) + cn (y1 + . . . + yn )
     = Σi≤n−1 (ci − ci+1 )si (y) + cn sn (y)
     ≤ Σi≤n−1 (ci − ci+1 )si (x) + cn sn (x) = Σi ci xi = c⊤ x,

where the inequality follows from the “orderedness” of the entries of c and from y ∈ Y . Thus, we conclude c⊤ y ≤ c⊤ x, which, since x ∈ X[x], contradicts c⊤ y > maxz∈X[x] c⊤ z. This completes the proof.
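The Majorization Principle is easy to test numerically. The sketch below is an illustration added here (not part of the original text): it checks conditions (9.3) via sorting, and independently verifies y ∈ X[x] by searching, through a linear-programming feasibility problem, for a doubly stochastic D with Dx = y. The vectors x, y are arbitrary test data.

```python
import numpy as np
from scipy.optimize import linprog

def s(z, j):                                    # sum of the j largest entries of z
    return np.sort(z)[::-1][:j].sum()

def majorized(y, x, tol=1e-9):                  # conditions (9.3)
    n = len(x)
    return all(s(y, j) <= s(x, j) + tol for j in range(1, n)) and abs(x.sum() - y.sum()) <= tol

def in_Xx(y, x):                                # is there a doubly stochastic D with D x = y ?
    n = len(x)
    rows, rhs = [], []
    for i in range(n):                          # row sums of D equal 1
        a = np.zeros((n, n)); a[i, :] = 1; rows.append(a.ravel()); rhs.append(1.0)
    for j in range(n):                          # column sums of D equal 1
        a = np.zeros((n, n)); a[:, j] = 1; rows.append(a.ravel()); rhs.append(1.0)
    for i in range(n):                          # (D x)_i = y_i
        a = np.zeros((n, n)); a[i, :] = x; rows.append(a.ravel()); rhs.append(y[i])
    res = linprog(np.zeros(n * n), A_eq=np.array(rows), b_eq=np.array(rhs), bounds=(0, None))
    return res.status == 0

x = np.array([3.0, 1.0, 0.0])
y = np.array([2.0, 1.0, 1.0])
print(majorized(y, x), in_Xx(y, x))             # True True
```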
10 Exercises for Part II

10.1 Separation
Exercise II.1 Mark by “Y” those of the below listed cases where the linear form f ⊤ x separates the sets S and T :
• S = {0} ⊂ R, T = {0} ⊂ R, f ⊤ x = x
• S = {0} ⊂ R, T = [0, 1] ⊂ R, f ⊤ x = x
• S = {0} ⊂ R, T = [−1, 1] ⊂ R, f ⊤ x = x
• S = {x ∈ R3 : x1 = x2 = x3 }, T = {x ∈ R3 : x3 ≥ √(x1² + x2²)}, f ⊤ x = x1 − x2
• S = {x ∈ R3 : x1 = x2 = x3 }, T = {x ∈ R3 : x3 ≥ √(x1² + x2²)}, f ⊤ x = x3 − x2
• S = {x ∈ R3 : −1 ≤ x1 ≤ 1}, T = {x ∈ R3 : x1² ≥ 4}, f ⊤ x = x1
• S = {x ∈ R2 : x2 ≥ x1², x1 ≥ 0}, T = {x ∈ R2 : x2 = 0}, f ⊤ x = −x2
Exercise II.2 Consider the set

M = {x ∈ R2004 : x1 + x2 + . . . + x2004 ≥ 1,
                 x1 + 2x2 + 3x3 + . . . + 2004x2004 ≥ 10,
                 x1 + 2²x2 + 3²x3 + . . . + 2004²x2004 ≥ 10²,
                 . . . . . . ,
                 x1 + 2²⁰⁰²x2 + 3²⁰⁰²x3 + . . . + 2004²⁰⁰²x2004 ≥ 10²⁰⁰²}
Is it possible to separate this set from the set {x1 = x2 = . . . = x2004 ≤ 0}? If yes, what could
be a separating plane?
Exercise II.3 Can the sets S = {x ∈ R2 : x1 > 0, x2 ≥ 1/x1 } and T = {x ∈ R2 : x1 < 0, x2 ≥
−1/x1 } be separated? Can they be strongly separated?
Exercise II.4 ♦ Let M ⊂ Rn be a nonempty closed convex set. The metric projection
ProjM (x) of a point x ∈ Rn onto M is the ∥ · ∥2 -closest to x point of M , so that

ProjM (x) ∈ M  &  ∥x − ProjM (x)∥2² = miny∈M ∥x − y∥2² .      (∗)

1. Prove that for every x ∈ Rn the minimum in the right hand side of (∗) is achieved, and x+
is a minimizer if and only if

x+ ∈ M & ∀y ∈ M : [x − x+ ]⊤ [x+ − y] ≥ 0. (10.1)

Derive from the latter fact that the minimum in (∗) is achieved at a unique point, the bottom
line being that ProjM (·) is well defined.
2. Prove that when passing from a point x ∈ Rn to its metric projection x+ = ProjM (x), the
distance to any point of M does not increase, specifically,

∀y ∈ M : ∥x+ − y∥2² ≤ ∥x − y∥2² − dist²(x, M ),      (10.2)
dist(x, M ) := minu∈M ∥x − u∥2 = ∥x − x+ ∥2 .

3. Let x ̸∈ M , so that, denoting x+ = ProjM (x), the vector e = (x − x+ )/∥x − x+ ∥2 is well defined. Prove
is well defined. Prove

that the linear form e z strongly separates x and M , specifically,

∀y ∈ M : e⊤ y ≤ e⊤ x − dist(x, M ).

Note: The fact just outlined underlies an alternative proof of Separation Theorem, where
the first step is to prove that a point outside a nonempty closed convex set can be strongly
separated from the set. In our proof, the first step was similar, but with M restricted to be
polyhedral, rather than merely convex and closed.
4. Prove that the mapping x 7→ ProjM (x) : Rn → M is contraction in ∥ · ∥2 :

∀u, u′ ∈ Rn : ∥ ProjM (u) − ProjM (u′ )∥2 ≤ ∥u − u′ ∥2 .

5. Let M be the probabilistic simplex: M = {x ∈ Rn : x ≥ 0, x1 + . . . + xn = 1}. Justify the following recipe for computing ProjM (x):
   Let ψ(t) = Σi [xi − t]+ , where [s]+ = max[s, 0]. ψ is a piecewise linear, continuous function of t ∈ R with breakpoints x1 , x2 , . . . , xn . ψ(t) → +∞ as t → −∞, and ψ(t) → 0 as t → +∞. Consequently, there exists (and can be easily computed due to piecewise linearity of ψ) t ∈ R such that Σi [xi − t]+ = 1. The metric projection of x onto M is nothing but the vector x+ with coordinates [xi − t]+ , 1 ≤ i ≤ n (a computational sketch of this recipe follows the exercise).
   What is the metric projection of the point x = [1; 2; 2.5] onto the 3-dimensional probabilistic simplex?
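A minimal Python sketch of the recipe from item 5, added here for convenience (the bisection tolerance and the test point are choices of this sketch, not part of the original text); running it on x = [1; 2; 2.5] answers the last question of the item.

```python
import numpy as np

def proj_simplex(x):
    """Metric projection onto {z >= 0 : z1 + ... + zn = 1} via the recipe of item 5."""
    x = np.asarray(x, dtype=float)
    lo, hi = x.min() - 1.0, x.max()             # psi(lo) >= n >= 1 and psi(hi) = 0
    for _ in range(100):                        # bisection on the monotone function psi
        t = 0.5 * (lo + hi)
        if np.maximum(x - t, 0.0).sum() > 1.0:
            lo = t
        else:
            hi = t
    return np.maximum(x - 0.5 * (lo + hi), 0.0)

print(proj_simplex([1.0, 2.0, 2.5]))            # projection of [1; 2; 2.5]
```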
Exercise II.5 ♦ [Follow-up to Exercise II.4] Let p(z) = zⁿ + pn−1 zⁿ⁻¹ + . . . + p1 z + p0 , n ≥ 1, be a polynomial of a complex variable z. By the Fundamental Theorem of Algebra, p has n roots λ1 , . . . , λn . Treating complex numbers as 2D real vectors, prove that all roots of the derivative p′ (z) = nzⁿ⁻¹ + (n − 1)pn−1 zⁿ⁻² + . . . + p1 belong to the convex hull of λ1 , . . . , λn .
Exercise II.6 ▲ Derive the statement in Remark I.1.4 from the Separation Theorem.

10.2 Extreme points


Exercise II.7 Find extreme points of the following sets:
3
1. X = {x ∈ R : x1 + x2 ≤ 1, x2 + x3 ≤ 1, x3 + x1 ≤ 1}
2. X = {x ∈ R4 : x1 + x2 ≤ 1, x2 + x3 ≤ 1, x3 + x4 ≤ 1, x4 + x1 ≤ 1}
Exercise II.8 ♦ Let M ⊂ Rn be a nonempty closed convex set not containing lines, and let f ⊤ x be a linear function of x ∈ Rn achieving its maximum over M . Prove that among the maximizers of this function on M there are extreme points of M .
Exercise II.9 [Follow-up to Exercise I.8] Let A, B be subsets of Rn . Mark by T those of the
below claims which always (i.e., for every data satisfying premise of the claim) are true:
1. If Conv(A) = Conv(B) , then A = B
2. If Conv(A) = Conv(B) is nonempty and A, B, Conv(A) are closed, then A ∩ B ̸= ∅.
3. If Conv(A) = Conv(B) is nonempty and bounded, A ∩ B ̸= ∅.
4. If Conv(A) = Conv(B) is nonempty, closed and bounded, then A ∩ B ̸= ∅.
Exercise II.10 As is immediately seen, the only extreme point of the nonnegative orthant
Rn+ = R+ × R+ × . . . × R+ is the origin, that is, the vector from {0} × {0} × . . . × {0}; as
we know, the extreme points of n-dimensional unit box {x ∈ Rn : 0 ≤ xi ≤ 1, i ≤ n} =
[0, 1] × [0, 1] × . . . × [0, 1] are zero/one vectors, that is, vectors from {0, 1} × {0, 1} × . . . × {0, 1}.
Prove the following generalization of these observations:
Let Xi ⊂ Rni , 1 ≤ i ≤ K, be closed convex sets. The set of extreme points of the direct product
X = X1 × . . . × XK of these sets is the direct product of the sets of extreme points of Xi .
Exercise II.11 ♦ Looking at the sets of extreme points of closed convex sets like the unit
Euclidean ball, a polytope, the paraboloid {[x; t] : t ≥ x⊤ x}, etc., we see that these sets are
closed. Do you think this always is the case? Is it true that the set Ext(M ) of extreme points of
a closed convex set M always is closed ?
Exercise II.12 ▲ Derive representation (∗) in Exercise I.29 from Example II.9.1 in section
9.1.1.
Exercise II.13 ♦ By Birkhoff Theorem, the extreme points of the polytope Πn = {[xij ] ∈ Rn×n : xij ≥ 0, Σi xij = 1 ∀j, Σj xij = 1 ∀i} are exactly the Boolean (i.e., with entries 0 and 1) matrices from this set. Prove that the same holds true for the “polytope of sub-doubly stochastic” matrices Πm,n = {[xij ] ∈ Rm×n : xij ≥ 0, Σi xij ≤ 1 ∀j, Σj xij ≤ 1 ∀i}.
Exercise II.14 ♦ [Follow-up to Exercise II.13] Let m, n be two positive integers with m ≤ n, and let Xm,n be the set of m × n matrices [xij ] with Σi |xij | ≤ 1 for all j ≤ n and Σj |xij | ≤ 1 for all i ≤ m.
Describe the set Ext(Xm,n ). To get an educated guess, look at the matrices

[ 1 0  0 ]   [ 0 0  0 ]   [  0.5 −0.5 0 ]
[ 0 0 −1 ] , [ 0 0 −1 ] , [ −0.5  0.5 0 ]

from X2,3 .

Exercise II.15 ♦ [follow-up to Exercise II.13] Let x be an n × n entrywise nonnegative matrix with all row and all column sums ≤ 1. Is it true that for some doubly stochastic matrix x̄, the matrix x̄ − x is entrywise nonnegative?
Exercise II.16 ♦ [Assignment problem] Consider the problem as follows:
There are n jobs and n workers. When worker j is assigned with job i, we get profit cij . We want
to assign every worker with a job in such a way that every worker is assigned with exactly one
job and every job is assigned to exactly one worker. Under this restriction, we want to maximize
the total profit.
1. Pose the Assignment problem as a Boolean (i.e., with the decision variables restricted to be
zeros and ones) Linear Programming problem.
2. Think how to solve the problem from item 1 via plain Linear Programming
3. [computational study] Consider the special case of Assignment problem where all profits cij are zeros or ones; you can interpret cij = 1/0 as the fact that worker j knows/does not know how to execute job i. In this situation Assignment problem requires from us to find an assignment which maximizes the total number of executed jobs. Assume now that the matrix C = [cij ] is generated at random, with entries taking, independently of each other, value 1 with probability ϵ ∈ (0, 1) and value 0 with probability 1 − ϵ. For n ∈ {4, 8, 16, 32, 64, 128, 256} and ϵ ∈ {1/2, 1/4, 1/8, 1/16}, run 100 simulations per pair (n, ϵ) to find the empirical mean of the ratio “number of executed jobs in optimal assignment”/n and look at the results (a possible simulation loop is sketched after this exercise).
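One possible simulation loop for item 3 (a sketch added here, not part of the original text; the parameter grid is truncated to keep the run short). It uses SciPy's Hungarian-algorithm solver, which returns an optimal assignment; by the Birkhoff–von Neumann Theorem, the same optimal value is obtained by the plain LP relaxation mentioned in item 2.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(0)
for n in [4, 8, 16, 32]:                        # extend to 64, 128, 256 for the full study
    for eps in [1/2, 1/4, 1/8, 1/16]:
        ratios = []
        for _ in range(100):
            C = (rng.random((n, n)) < eps).astype(float)   # c_ij = 1 w.p. eps, else 0
            rows, cols = linear_sum_assignment(-C)          # maximize total profit
            ratios.append(C[rows, cols].sum() / n)
        print(f"n={n:3d}  eps={eps:6.4f}  mean ratio={np.mean(ratios):.3f}")
```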
Exercise II.17 ▲ Let ν = (ν1 , . . . , νK ) with positive integer νi , and let Sν = Sν1 × . . . × SνK
be the space of block-diagonal, with K diagonal blocks of sizes νi × νi , i ≤ K, symmetric
matrices, let Sν+ be the cone composed of positive semidefinite matrices from Sν , and let E be
an m-dimensional affine plane in Sν which intersects Sν+ . The intersection X = E ∩ Sν+ is a
closed nonempty convex set not containing lines and thus possessing extreme points. Let W be
such a point, W ii be the diagonal blocks of W , and ri be the ranks of νi × νi matrices W ii .
Prove that
Σi≤K ri (ri + 1) ≤ Σi≤K νi (νi + 1) − 2m.

What happens in the diagonal case ν1 = . . . = νK = 1 ?


Exercise II.18 ♦ Let M be a closed convex set in Rn and x̄ be a point of M .
1. Prove that if there exists a linear form a⊤ x such that x̄ is the unique maximizer of the form
on M , then x̄ is an extreme point of M .
2. Is the inverse of 1) true, i.e., is it true that every extreme point x̄ of M is the unique
maximizer, over x ∈ M , of a properly selected linear form?
Exercise II.19 Identify and justify the correct claims in the following list:
1. Let X ⊂ Rn be a nonempty closed convex set, P be an m × n matrix, Y = P X := {P x : x ∈ X} ⊂ Rm , and Ȳ be the closure of Y . Then
• For every x ∈ Ext(X), P x ∈ Ext(Y )
• Every extreme point of Ȳ which happens to belong to Y is P x for some x ∈ Ext(X)
• When X does not contain lines, then every extreme point of Ȳ which happens to belong to Y is P x for some x ∈ Ext(X)
2. Let X, Y be nonempty closed convex sets in Rn , and let Z = X + Y , Z̄ = cl Z. Then
• If w ∈ Ext(Z̄) ∩ Z, then w = x + y for some x ∈ Ext(X) and y ∈ Ext(Y ).
• If x ∈ Ext(X), y ∈ Ext(Y ), then x + y ∈ Ext(Z).
Exercise II.20 ♦ [faces of polyhedral set] Let X = {x ∈ Rn : a⊤i x ≤ bi , i ≤ m} be a nonempty polyhedral set and f ⊤ x be a linear form of x ∈ Rn which is bounded above on X:

Opt(f ) = supx∈X f ⊤ x < ∞

Prove that
1. Opt(f ) is achieved – the set Argmaxx∈X f ⊤ x := {x ∈ X : f ⊤ x = Opt(f )} is nonempty.
2. The set Argmaxx∈X f ⊤ x is as follows: there exists an index set I ⊂ {1, 2, . . . , m}, perhaps empty, such that

Argmaxx∈X f ⊤ x = XI := {x : a⊤i x ≤ bi ∀i, a⊤i x = bi ∀i ∈ I}

3. Vice versa, if I ⊂ {1, . . . , m} is such that the set XI = {x : a⊤i x ≤ bi ∀i, a⊤i x = bi ∀i ∈ I} is nonempty, then XI = X∗ := Argmaxx∈X f ⊤ x for properly selected f .
Note: Nonempty sets of the form XI , I ⊂ {1, . . . , m}, are called faces of the polyhedral set
X. This definition is not geometric – according to it, whether a given set Y is or is not a
face in X, may depend not on X per se, but on its representation as the solution set of a
finite system of linear inequalities. Facts 2—3, taken together, state that in fact being a face
of a polyhedral set is a geometric property – faces are exactly the sets Argmaxx∈X f ⊤ x of all maximizers of linear forms bounded from above on X.
4. Extreme points of a face of X are extreme points of X.
5. Extreme points of X, if any, are exactly the faces of X which are singletons.
Note: As a corollary of 1—3, 5, we see that extreme points of polyhedral set X are exactly
the maximizers of those linear forms which achieve their maximum on X at a unique point.
Exercise II.21 ♦ [Follow-up to Exercise II.20]
1. Let X ⊂ Y be nonempty closed convex sets in Rn . Is it true that Ext(Y ) ∩ X ⊂ Ext(X) ?
2. Let X be a nonempty closed convex set contained in the polyhedral set {x : Ax ≤ b}. Assuming that the set X̄ = X ∩ {x : Ax = b} is nonempty, is it true that Ext(X̄) = Ext(X) ∩ X̄ ?
3. By the result of Exercise II.13, the extreme points of the polytope Πm,n = {[xij ] ∈ Rm×n : xij ≥ 0, Σi xij ≤ 1 ∀j, Σj xij ≤ 1 ∀i} are exactly the Boolean matrices from this polytope. Now let Π̂m,n be the part of Πm,n cut off Πm,n by imposing on prescribed rows and columns of an m × n matrix x ∈ Πm,n the requirement that their sums be equal to 1, rather than ≤ 1. Assuming Π̂m,n nonempty, prove that the extreme points of this polytope are exactly the Boolean matrices contained in it.
Exercise II.22 Let X ⊂ Rn be a nonempty polyhedral set, x 7→ P x + p : Rn → Rm be an affine mapping, and Y be the image of X under this mapping. Mark by T the statements in the below list which are always (i.e., for all X, P, p compatible with the above assumptions) true:
1. Y is a nonempty polyhedral set.
2. If X does not contain lines, then neither does Y .
3. If X does contain lines, then so does Y .
4. If v is an extreme point of X, then P v + p is an extreme point of Y .
5. If z is an extreme point of Y , then z = P v + p for a certain extreme point v of X.
6. If z is an extreme point of Y and X does not contain lines, then z = P v + p for a certain extreme point v of X.
Exercise II.23 ▲ Find extreme points of the following closed convex sets:
1. The set Sn = {X ∈ Sn : −In ⪯ X ⪯ In }
2. The set Sn+ = {X ∈ Sn : 0 ⪯ X ⪯ In }
3. The set Dk,n = {X ∈ Sn : In ⪰ X ⪰ 0, Tr(X) = k}, where k is a positive integer ≤ n.
4. The set Mn = {X ∈ Rn×n : ∥X∥2,2 ≤ 1} (∥ · ∥2,2 is the spectral norm)
Exercise II.24 ▲ Prove the following fact (which can be considered as a matrix extension of
Birkhoff Theorem):
For positive integers d, n, let Πd,n be the set of all n × n block matrices with d × d symmetric
blocks X ij satisfying
X X
X ij ⪰ 0, Tr(X ij ) = 1∀i, Tr(X ij ) = 1∀j.
j i

The extreme points of Πd,n are exactly the block matrices [X ij ]i,j≤n as follows: for certain n × n
permutation matrix P and unit vectors eij ∈ Rd , one has
X ij = Pij eij e⊤ij ∀i, j.

Exercise II.25 ▲ Let k, n be positive integers with k ≤ n, and let sk (λ) for λ ∈ Rn be the sum of the k largest entries in λ. From the description of the extreme points of the polytope X = {x ∈ Rn : 0 ≤ xi ≤ 1, i ≤ n, Σi xi ≤ k}, see Example II.9.2 in section 9.1.1, it follows that when λ ∈ Rn+ , then

maxx∈X Σi λi xi = sk (λ).

Prove the following matrix analogy of this fact:
For k, n as above, let X = {(X1 , . . . , Xn ) : Xi ∈ Sd , 0 ⪯ Xi ⪯ Id , i ≤ n, Σi Xi ⪯ kId }. Then for λ ∈ Rn+ one has

(X1 , . . . , Xn ) ∈ X =⇒ Σi λi Xi ⪯ sk (λ)Id ,

with the concluding ⪯ being = for properly selected (X1 , . . . , Xn ) ∈ X .

10.3 Cones and extreme rays


Exercise II.26 ▲ Let X be a nonempty closed and bounded set in Rn . Which of the following
statements are true?
1. Conv(X) is closed convex set.
2. Cone(X) is a closed cone.
3. When X is convex, Cone(X) is closed cone.
4. When 0 ̸∈ X, Cone(X) is a closed cone.
136 Exercises for Part II

5. When 0 ̸∈ X and X is convex, Cone(X) is closed cone.


6. When X is polyhedral, Cone(X) is a closed cone.
Exercise II.27 ♦ Let X ⊂ Rn be a nonempty polyhedral set given by polyhedral representa-
tion:
X = {x : ∃u : Ax + Bu ≤ r}

and let K = Cone(X) be the conic hull of X.


1. Is it true that K is a closed cone?
2. Prove that K̄ := cl K is a polyhedral cone and find a polyhedral representation of K̄.
3. Assume that X is given by a plain – no extra variables – polyhedral representation: X = {x : Ax ≤ b}. Build a plain polyhedral representation of K̄ := cl Cone(X).
Exercise II.28 As we know, the extreme directions of the nonnegative orthant Rn + = R+ ×
R+ × . . . × R+ are the vectors with single positive entry and remaining entries equal to 0. Prove
the following generalization of this observation:
Let Xi ⊂ Rni , 1 ≤ i ≤ K, be closed and pointed cones. The extreme directions of the direct
product X = X1 × . . . × XK of these cones are the block-vectors d = [d1 ; . . . ; dK ] with di ∈ Rni
of the following structure: all but one blocks in d are zero, and the only nonzero block is an
extreme direction of the corresponding factor Xi .
Exercise II.29 Describe all extreme rays of
1. positive semidefinite cone Sn+
2. Lorentz cone Ln

10.4 Recessive cone


Exercise II.30 ▲ Let M be a convex set, and let x̄ and h be such that Rx̄ := {x̄ + th : t ≥
0} ⊂ M .
1. Is it always true that whenever x ∈ M , the set Rx = {x + th, t ≥ 0} is contained in M ?
2. Let h be a recessive direction of M̄ = cl M , and let x̄ be a point from the relative interior of M . Is it always true that the set Rx̄ = {x̄ + th : t ≥ 0} is contained in M ?
Exercise II.31 ▲ Let M ⊂ Rn be a cone, not necessary closed; recall that pointedness of a
cone M means that the only vector x such that x ∈ M and −x ∈ M is the zero vector. Which
of the following statements are always true:
1. M is pointed if and only if the only representation of 0 as the sum of k ≥ 1 vectors xi ∈ M is
the representation with xi = 0, i ≤ k.
2. M is pointed if and only if M does not contain straight lines (one-dimensional affine planes)
passing through the origin.
3. Assuming M closed, M is pointed if and only if M does not contain straight lines.
4. M is pointed cone if and only if the closure of M is so.
5. The closure of M is a pointed cone if and only if M does not contain straight lines.
Exercise II.32 Literal interpretation of the words “polyhedral cone” is: a polyhedral set {x :
Ax ≤ b} which is a cone. An immediate example is the solution set {x : Ax ≤ 0} of homogeneous
system of linear inequalities. Prove that this example is generic: whenever a polyhedral set
K = {x : Ax ≤ b} is a cone, one has K = {x : Ax ≤ 0}.
Exercise II.33 ♦ Prove the following modification of Proposition II.8.18:
(!) Let X ⊂ RN be a nonempty closed convex set such that X ⊂ V + Rec(X) for some
bounded and closed set V , let x 7→ A(x) = Ax + b : RN → Rn be an affine mapping, and let
Y = A(X) := {y : ∃x ∈ X : y = A(x)} be the image of X under this mapping. Let also

K = {h ∈ Rn : ∃g ∈ Rec(X) : h = Ag}.

Then the recessive cone of the closure Y of Y is the closure K of K. In particular, when K is
closed (as definitely is the case when Rec(X) is polyhedral), it holds Rec(Y ) = K.
Exercise II.34 ♦ [follow-up to Exercise II.33]
1. Let K1 ⊂ Rn , K2 ⊂ Rn be closed cones, and let K = K1 + K2 .
n

• Is it always true that K is a cone?


• Is it always true that K is closed?
• Let K2 be polyhedral. Is it always true that K is closed?
• Let both K1 and K2 be polyhedral. Is it always true that K is closed?
2. Let Xi , i = 1, . . . , I, be closed convex sets in Rn with nonempty intersection. Is it true that
∩i Rec(Xi ) = Rec(∩i Xi )?
3. Let X1 , X2 be nonempty closed convex sets in Rn , let K1 = Rec(X1 ), K2 = Rec(X2 ),
X = cl(X1 + X2 ), K = cl(K1 + K2 ).
• Is it always true that K ⊂ Rec(X) ?
• Is it always true that K = Rec(X) ?
• Assume that Xi ⊂ Vi + Ki for properly selected closed and bounded sets Vi , i = 1, 2. Is it true that K = Rec(X) ?
Exercise II.35 ▲ Let f (x) = x⊤ Cx − c⊤ x + σ be quadratic form with C ⪰ 0. By Exercise I.15,
the set E = {x : f (x) ≤ 0} is convex (and of course closed). Assuming E ̸= ∅, describe Rec(E).

10.5 Around majorization


Exercise II.36 ♦ Let x ∈ Rm , let X[x] be the convex hull of all permutations of x, and let
X+ [x] be the set of all vectors x′ dominated by a vector from X[x]:

X+ [x] = {y : ∃z ∈ X[x] : y ≤ z}.

1) Prove that X+ [x] is a closed convex set.


2) Prove the following characterization of X+ [x]: X+ [x] is exactly the set of solutions of the
system of inequalities sj (y) ≤ sj (x), j = 1, . . . , m, in variables y, where, as always sj (z) is the
sum of the j largest entries in vector z.

10.6 Around polars


Exercise II.37 Justify the last three claims in Example II.8.11.
Exercise II.38 ♦ [more on polars]
1. Recall that for U ⊂ Rn , Vol(U ) stands for the ratio of the n-dimensional volume of U and
the volume of the n-dimensional unit Euclidean ball. Check that for a centered at the origin
ellipsoid E = {x : x⊤ Cx ≤ 1} (C ≻ 0) we have Vol(E)Vol(Polar (E)) = 1.
2. Let C ≻ 0 and let ellipsoid E = {x : (x − c)⊤ C(x − c) ≤ 1} contain the origin. Compute
Polar (E).
3. Let Xk , k ≤ K, be closed convex sets in Rn containing the origin. Prove that
Polar (Conv(∪k Xk )) = ∩k Polar (Xk ) (a)
Polar (∩k Xk ) = cl Conv(∪k Polar (Xk )) (b)
Exercise II.39 ♦ Let X ⊂ Rn be a cone given by polyhedral representation


X = {x ∈ Rn : ∃u : Ax + Bu ≤ r}
Is the dual to X cone X∗ polyhedral? If yes, build a polyhedral representation of X∗ .
Exercise II.40 ♦
1. Let X ⊂ Rn be a nonempty polyhedral set given by polyhedral representation
X = {x ∈ Rn : ∃u : Ax + Bu ≤ r}
Is the polar Polar (X) of X polyhedral? If yes, point out a polyhedral representation of
Polar (X). For non-polyhedral extension, see Exercise IV.36.
2. Compute the polars of
1. probabilistic simplex ∆ = {x ∈ Rn : x ≥ 0, Σi xi = 1}
2. convex hull of nonempty finite set of points a1 , . . . , aN from Rn
3. the set {x ∈ Rn : x ≤ b}
Solution: Polar ({x : x ≤ b}) = {y : y ≥ 0, y ⊤ b ≤ 1}

10.7 Miscellaneous exercises


Exercise II.41 ▲ Let X = {x ∈ Rn : Ax ≤ b} be a nonempty polyhedral set.
1. Prove that X is bounded if and only if every one of the vectors ±ei (ei , 1 ≤ i ≤ n, are the basic orths) can be represented as a conic combination of the columns of A⊤ .
2. Certify the correct statements in the following list:
• The polyhedral set X = {x ∈ R3 : x ≥ [1/3; 1/3; 1/3], x1 + x2 + x3 ≤ 1} is bounded.
• The polyhedral set X = {x ∈ R3 : x1 ≥ 1/3, x2 ≥ 1/3, x1 + x2 + x3 ≤ 1} is unbounded.

Exercise II.42 Prove the easy part of Theorem II.9.7, specifically, that every n×n permutation
matrix is an extreme point of the polytope Πn of n × n doubly stochastic matrices.
Exercise II.43 ♦ [robust LP] Consider an uncertain Linear Programming problem – a family

{ minx∈Rn {c⊤ x : [A + Σν ζν ∆ν ]x ≤ b + Σν ζν δν } : ζ ∈ Z }      (10.3)

of LP instances of common sizes (n variables, m constraints). The associated story is as follows:


we want to solve an LP program with the data not known exactly when the problem is being
solved; what we know at this time, is that the “true problem” belongs to the parametric family
given, according to (10.3), by the “nominal data” c, A, b, “basic perturbations ∆ν , δν ” and the
perturbation set Z through which run the data perturbations ζ specifying particular instances
in the family. In this situation (quite typical for real life applications of LP, where partial data
uncertainty is a rule rather than an exception), one way to “immunize” decisions against data
uncertainty is to look for robust solutions – those remaining feasible for all perturbations of
the data from the perturbation set – by solving the Robust Counterpart (RC) of our uncertain
problem – the optimization problem
minx { c⊤ x : [A + Σν ζν ∆ν ]x ≤ b + Σν ζν δν ∀(ζ ∈ Z) }      (RC)

(RC) is not an LP program – it has finitely many decision variables and infinite (when Z is
”massive”) system of linear constraints on these variables. Optimization problems of this type
are called semi-infinite and are, in general, difficult to solve. However, the RC of an uncertain
LP is easy, provided that Z is a “computation-friendly” set, for example, nonempty set given
by polyhedral representation:
Z = {ζ : ∃u : P ζ + Qu ≤ r} (10.4)
Now goes the exercise per se:


Use LP duality to reformulate (RC), (10.4) as an explicit LP program.
Exercise II.44 ▲ Consider the scalar linear constraint

a⊤ x ≤ b      (1)

with uncertain data a ∈ Rn (b is certain) varying in the set

U = {a : |ai − a∗i |/δi ≤ 1, 1 ≤ i ≤ n, Σi |ai − a∗i |/δi ≤ k}      (2)

where a∗i are given “nominal data,” δi > 0 are given quantities, and k ≤ n is an integer (in
literature, this is called “budgeted uncertainty”). Rewrite the Robust Counterpart

a⊤ x ≤ b ∀a ∈ U (RC)

in a tractable LO form (that is, write down an explicit system (S) of linear inequalities in
variables x and additional variables such that x satisfies (RC) if and only if x can be extended
to a feasible solution of (S)).

Exercise II.45 ▲ [computational study, follow-up to Exercise II.43]


Preliminaries. Consider oscillator transmitting harmonic wave with unit wavelength and placed
at some point P in 3D. Physics says that the electric field generated by the oscillator, when
measured at a remote point A, is

eA (t) ≈ r−1 α cos(ωt − 2πr + θ + 2πd cos(ϕ)) = r−1 EA (t)      (∗)

where
• t is time, ω is the frequency,
• r is the distance from A to the origin O, d is the distance from P to the origin, ϕ ∈ [0, π] is the angle between the directions OP and OA,
• α and θ are responsible for how the oscillator is actuated.
The difference between the left and the right hand sides in (∗) is of order of r−2 and in all our
subsequent considerations can be completely ignored.
It is convenient to assemble α and θ into the actuation weight – the complex number w = αeıθ
(ı is the imaginary unit); with this convention, we have

EA (t) = ℜ[wDP (ϕ)eı(ωt−2πr) ],    DP (ϕ) = e2πıd cos(ϕ) ,
where ℜ[·] stands for the real part of a complex number. The complex-valued function DP (ϕ) :
[0, π] → C, called the diagram of the oscillator, is responsible for the directional density of
the energy emitted by the oscillator: when evaluated at certain 3D direction ⃗e, this density
is proportional to |DP (ϕ)|² , where ϕ is the angle between the direction ⃗e and the direction OP . Physics says that when our transmitting antenna is composed of K harmonic oscillators
located at points P1 , . . . , PK and actuated with weights w1 , . . . , wK , the directional density of the energy transmitted by the resulting antenna array, as evaluated at a direction ⃗e, is proportional to |Σk wk Dk (ϕk (⃗e))|² , where ϕk (⃗e) is the angle between the directions ⃗e and OPk .

Consider the design problem as follows. We are given a linear array of K oscillators placed
at the points P_k = (k − 1)δe, k ≤ K, where e is the first basic orth (that is, the unit vector
“looking” along the positive direction of the x-axis), and δ > 0 is a given distance between
consecutive oscillators. Our goal is to specify actuation weights w_k, k ≤ K, in order to send as
much of the total energy as possible along the directions which make at most a given angle γ with
e. To this end, we intend to act as follows:

We want to select actuation weights w_k, k ≤ K, in such a way that the magnitude |D_w(ϕ)| of
the complex-valued function

D_w(ϕ) = Σ_{k=1}^K w_k e^{2πı(k−1)δ cos(ϕ)}

of ϕ ∈ [0, π] is “concentrated” on the segment 0 ≤ ϕ ≤ γ. Let us normalize the weights by the


requirement
Dw (0) = 1

and minimize under this restriction the “sidelobe level”

max_{γ≤ϕ≤π} |D_w(ϕ)|

over w.
To get a computation-friendly version of this problem, we replace the full range [0, π] of values
of ϕ with M -point equidistant grid
Γ = {ϕ_ℓ = ℓπ/(M − 1) : 0 ≤ ℓ ≤ M − 1},
thus converting our design problem into the optimization problem

| K wk e2πı(k−1)δ cos(ϕℓ ) | ≤ t ∀(ℓ : ϕℓ > γ)


 P 
Opt = min t : P K
k=1
2πı(k−1)δ , wk ∈ C, k ≤ K (P )
k=1 wk e =1
t,w

which is a convex problem in 2k real variables – real and imaginary parts of w1 , . . . , wK .
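For task 1 below, one way to process (P) numerically is to pass it, essentially verbatim, to a convex optimization modeling tool; a minimal sketch, assuming the cvxpy and numpy packages are available (the names and the solver choice are illustrative, not prescribed by the text):

import numpy as np
import cvxpy as cp

K, M = 24, 512                       # recommended setup from task 1
delta, gamma = 0.125, np.pi / 12
phi = np.linspace(0.0, np.pi, M)     # equidistant grid: phi_l = l*pi/(M-1), l = 0,...,M-1
# A[l, k] = exp(2*pi*i*k*delta*cos(phi_l)), with k = 0,...,K-1 playing the role of (k-1)
A = np.exp(2j * np.pi * delta * np.outer(np.cos(phi), np.arange(K)))

w = cp.Variable(K, complex=True)     # actuation weights
t = cp.Variable()                    # sidelobe level
sidelobe = phi > gamma
prob = cp.Problem(cp.Minimize(t),
                  [cp.abs(A[sidelobe] @ w) <= t,   # |D_w(phi_l)| <= t for phi_l > gamma
                   A[0] @ w == 1])                 # normalization D_w(0) = 1
prob.solve()
print("Opt^n ~", prob.value)
D = A @ w.value                      # nominal diagram on the grid, for plotting |D|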


Your tasks are as follows:
1. Process problem (P) numerically and find the optimal design w^n = {w_k^n, k ≤ K} along with
the optimal value Opt^n. Here and in what follows, the recommended setup is
• number of oscillators K = 24, distance between consecutive oscillators δ = 0.125
• γ = π/12
• cardinality M of the equidistant grid Γ is 512
Draw the plot of the modulus of the resulting diagram
D^n(ϕ) = Σ_{k=1}^K w_k^n e^{2πı(k−1)δ cos(ϕ)}

and compute the corresponding “energy concentration” C^n, with the concentration of a diagram
D(·) defined as

C = [Σ_{ℓ: ϕ_ℓ ≤ γ} sin(ϕ_ℓ)|D(ϕ_ℓ)|²] / [Σ_{ℓ=0}^{M−1} sin(ϕ_ℓ)|D(ϕ_ℓ)|²]

– up to discretization of ϕ, this is the ratio of the energy emitted in the “cone of interest”
(i.e., along the directions making angle at most γ with e) to the total emitted energy. The factors
sin(ϕ_ℓ) reflect the fact that when computing the energy emitted in a spatial cone, we should
integrate |D(·)|² over the part of the unit sphere in 3D cut off from the sphere by the cone.
2. Now note that “in reality” the optimal weights w_k^n, k ≤ K, are used to actuate physical
devices and as such cannot be implemented with the same 16-digit accuracy with which they
are computed; they definitely will be subject to small implementation errors. We can model
these errors by assuming that the “real life” diagram is

D(ϕ) = Σ_{k=1}^K w_k^n (1 + ρξ_k) e^{2πı(k−1)δ cos(ϕ)}

where ρ ≥ 0 is some (perhaps small) perturbation level and ξ_k ∈ C are “primitive” perturbations
responsible for the implementation errors and running through the unit disk
{ξ : |ξ| ≤ 1}. It is not a great sin to assume that the ξ_k are random variables, independent
across k, uniformly distributed on the unit circumference in C. Now the diagram becomes
random and can violate the constraints of (P), unless ρ = 0; in the latter case, the diagram
is the “nominal” one given by the optimal weights w^n, so that it satisfies the constraints of
(P) with t set to Opt^n.
Now, what happens when ρ > 0? In this case, the diagram D(·) and its deviation v from
the prescribed value 1 at the origin, its sidelobe level l = max_{ℓ:ϕ_ℓ>γ} |D(ϕ_ℓ)|, and its energy
concentration C become random. A crucial “real life” question is how large the “typical values”
of these quantities are. To get an impression of what happens, you are asked to carry out the
numerical experiment as follows (a simulation sketch is given after this exercise):
• select perturbation level ρ ∈ {10^{−ℓ}, 1 ≤ ℓ ≤ 6}
• for the selected ρ, simulate and plot 100 realizations of the modulus of the actual diagram,
and find the empirical averages of v, l, and C.
3. Apply the Robust Optimization methodology from Exercise II.43 to build an “immunized against
implementation errors” solution to (P), compute these solutions for perturbation levels 10^{−ℓ},
1 ≤ ℓ ≤ 6, and subject the resulting designs to a numerical study similar to the one outlined
in the previous item.
Note: (P) is not a Linear Programming program, so that you cannot formally apply the
results stated in Exercise II.43; what you can apply is the Robust Optimization “philosophy.”
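A minimal sketch of the simulation in item 2, reusing the grid matrix A, the grid phi, and the optimal weights from the previous sketch (assuming numpy; the names are ours):

import numpy as np

def simulate(w_opt, A, phi, gamma, rho, n_trials=100, seed=0):
    """Monte-Carlo averages of the deviation v, sidelobe level l, and concentration C
    under multiplicative implementation errors w_k -> w_k*(1 + rho*xi_k), |xi_k| = 1."""
    rng = np.random.default_rng(seed)
    K = len(w_opt)
    sidelobe, cone = phi > gamma, phi <= gamma
    stats = []
    for _ in range(n_trials):
        xi = np.exp(2j * np.pi * rng.random(K))          # uniform on the unit circle
        D = A @ (w_opt * (1 + rho * xi))                  # perturbed diagram on the grid
        v = abs(D[0] - 1)                                 # deviation from D(0) = 1
        l = np.max(np.abs(D[sidelobe]))                   # sidelobe level
        energy = np.sin(phi) * np.abs(D) ** 2
        C = energy[cone].sum() / energy.sum()             # energy concentration
        stats.append((v, l, C))
    return np.mean(stats, axis=0)                         # empirical averages of v, l, C

# e.g.: for rho in [10.0**(-j) for j in range(1, 7)]:
#           print(rho, simulate(w.value, A, phi, gamma, rho))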
Exercise II.46 ♦ Prove the statement “symmetric” to the Dubovitski-Milutin Lemma:
The cone M_∗ dual to the arithmetic sum of k (closed or not) cones M^i ⊂ R^n, i ≤ k, is the
intersection of the k cones M^i_∗ dual to M^i.
Exercise II.47 ♦ Prove the following polyhedral version of the Dubovitski-Milutin Lemma:
Let M^1, . . . , M^k be polyhedral cones in R^n, and let M = ∩_i M^i. The cone M_∗ dual to M is the
sum of the cones M^i_∗, i ≤ k, dual to M^i, so that a linear form e⊤x is nonnegative on M if and only
if it can be represented as the sum of linear forms e_i⊤x nonnegative on the respective cones M^i.

Exercise II.48 ♦ [follow-up to Exercise II.47] Let A ∈ Rm×n be a matrix with trivial kernel,
e ∈ Rn , and let the set
X = {x : Ax ≥ 0, e⊤ x = 1} (∗)

be nonempty and bounded. Prove that there exists λ ∈ Rm such that λ > 0 and A⊤ λ = e.
Prove the “partial inverse” of this statement: if Ker A = {0} and e = A⊤λ for some λ > 0, then the
set (∗) is bounded.
Exercise II.49 ♦ Let E be a linear subspace in Rn , K be a closed cone in Rn , and ℓ(x) :
E → R be a linear (linear, not affine!) function which is nonnegative on K ∩ E. Which of the
following claims are always true:
1. ℓ(·) can be extended from E onto the entire Rn to yield a linear function which is nonnegative
on K
2. Assuming int K ∩ E ̸= ∅, ℓ(·) can be extended from E onto the entire Rn to yield a linear
function which is nonnegative on K.
3. Assuming, in addition to ℓ(x) ≥ 0 for x ∈ K ∩ E, that K = {x : P x ≤ 0} is a polyhedral
cone, ℓ(·) can be extended from E onto the entire Rn to yield a linear function which is
nonnegative on K.
Exercise II.50 Let n > 1. Is the unit ∥ · ∥2 -ball Bn = {x ∈ Rn : ∥x∥2 ≤ 1} a polyhedral set?
Justify your answer.

Exercise II.51 ▲ The unit box {x ∈ R^n : −1 ≤ x_i ≤ 1, i ≤ n} is cut off from R^n by a system
of m = 2n linear inequalities and is a nonempty and bounded polyhedral set. However, when
we eliminate any inequality from this system, the solution set of the resulting system becomes
unbounded. To see that this situation is in a sense extreme, prove the following claim:
Consider the solution set of a system of m linear inequalities in n variables x, i.e., the
set
X := {x ∈ Rn : Ax ≤ b} ,
where A = [a_1⊤; a_2⊤; . . . ; a_m⊤]. Suppose that X is nonempty and bounded. Then, whenever
m > 2n, one can drop from this system a properly selected inequality in such a
way that the solution set of the resulting subsystem remains bounded.
A provocative follow-up: Is it possible to cut off from R^{1000} a bounded set by using only a
single linear inequality?
Exercise II.52 ▲ [computational study] Let ω^N = (ω_1, . . . , ω_N) be an N-element i.i.d. sample
drawn from the standard Gaussian distribution (zero mean, unit covariance) on R^d. How many
extreme points are there in the convex hull of the points from the sample?
1. Consider the planar case d = 2 and think how to list the extreme points of Conv{ω_1, . . . , ω_N}
(a possible computational sketch is given after this exercise). Fill the following table:

N 2 4 8 16 32 64 128
U
M
L

where U is the maximal, M is the mean, and L is the minimal # of extreme points observed
when processing 100 samples ω^N of a given cardinality.

2. Think how to upper-bound the expected number of extreme points in W = Conv(ω^N).
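For item 1, in the planar case the extreme points of the convex hull of a finite sample are exactly the vertices reported by a convex hull routine; a minimal sketch of the experiment, assuming numpy and scipy (the names are illustrative):

import numpy as np
from scipy.spatial import ConvexHull

def extreme_point_counts(N, n_samples=100, seed=0):
    """U/M/L statistics of the number of extreme points of Conv{omega_1,...,omega_N}, d = 2."""
    rng = np.random.default_rng(seed)
    counts = []
    for _ in range(n_samples):
        pts = rng.standard_normal((N, 2))
        if N <= 3:
            k = N                                   # with probability 1 all points are extreme
        else:
            k = len(ConvexHull(pts).vertices)       # hull vertices = extreme points
        counts.append(k)
    return max(counts), float(np.mean(counts)), min(counts)

for N in (2, 4, 8, 16, 32, 64, 128):
    print(N, extreme_point_counts(N))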


Exercise II.53 ▲ [computational study] Given positive integers m, n with n ≥ 2, consider
a randomly generated system Ax ≤ b of m linear inequalities with n variables. We assume that
A, b are generated by drawing the entries, independently of each other, from N(0, 1).
1. Consider the planar case n = 2. For m = 2, 4, 8, 16, generate 100 samples of m × 2 systems
and fill the following table:

m 2 4 8 16
F
B

where F is the number of feasible systems, and B is the number of feasible systems with
bounded solution sets.
Intermezzo: related theoretical results originating from [Nem24, Exercise 2.23] are as follows.
Given positive integers m, n with n ≥ 2, consider a homogeneous system Ax ≤ 0 of m inequalities
with n variables. We call this system regular if its matrix A is regular, regularity of a
matrix B meaning that all square submatrices of B are nonsingular. Clearly, the entries of a
regular matrix are nonzero, and when a p × q matrix B is drawn at random from a probability
distribution on R^{p×q} which has a density w.r.t. the Lebesgue measure, B is regular with
probability 1.
Given a regular m × n homogeneous system of inequalities Ax ≤ 0, let g_i(x) = Σ_{j=1}^n A_ij x_j,
i ≤ m, so that the g_i are nonconstant linear functions. Setting Π_i = {x : g_i(x) = 0}, we get
a collection of m hyperplanes in R^n passing through the origin. For a point x ∈ R^n, the
signature of x is, by definition, the m-dimensional vector σ(x) of signs of the reals g_i(x),
1 ≤ i ≤ m. Denoting by Σ the set of all m-dimensional vectors with entries ±1, for σ ∈ Σ

the set C_σ = {x : σ(x) = σ} is either empty or is a nonempty open convex set; when it is
nonempty, let us call it a cell associated with A, and the corresponding σ – an A-feasible
signature. Clearly, for a regular system, R^n is the union of all hyperplanes Π_i and all cells
associated with A. It turns out that
The number N(m, n) of cells associated with a regular homogeneous m × n system
Ax ≤ 0 is independent of the system and is given by a simple recurrence:
N(1, 2) = 2;  N(m, n) = N(m − 1, n) + N(m − 1, n − 1) for m ≥ 2, n ≥ 2  [N(m, 1) = 2, m ≥ 1].
Next, when A is drawn at random from a probability distribution P on R^{m×n} which possesses
a symmetric density p, that is, such that p([a_1⊤; a_2⊤; . . . ; a_m⊤]) = p([ϵ_1 a_1⊤; ϵ_2 a_2⊤; . . . ; ϵ_m a_m⊤]) for
all A = [a_1⊤; a_2⊤; . . . ; a_m⊤] and all ϵ_i = ±1, then the probability for a vector σ ∈ Σ to be an
A-feasible signature is
π(m, n) = N(m, n)/2^m.
In particular, the probability for the system Ax ≤ 0 to have a solution set with a nonempty
interior (this is nothing but A-feasibility of the signature [−1; . . . ; −1]) is π(m, n).
The inhomogeneous version of these results is as follows. An m × n system of linear inequalities
Ax ≤ b is called regular if the matrix [A, −b] is regular. Setting g_i(x) = Σ_{j=1}^n A_ij x_j − b_i,
i ≤ m, the [A, b]-signature of x is, as above, the vector of signs of the reals g_i(x). For σ ∈ Σ, the
set C_σ = {x : σ(x) = σ} is either empty or is a nonempty open convex set; in the latter case,
we call C_σ an [A, b]-cell, and call σ an [A, b]-feasible signature. Setting Π_i = {x : g_i(x) = 0},
we get m hyperplanes in R^n, and the entire R^n is the union of those hyperplanes and all
[A, b]-cells. It turns out that
The number of cells associated with a regular m × n system Ax ≤ b is independent
of the system and is equal to ½N(m + 1, n + 1).
In addition, when the m × (n + 1) matrix [A, b] is drawn at random from a probability distribution
on R^{m×(n+1)} possessing a symmetric density w.r.t. the Lebesgue measure, the probability
for every σ ∈ Σ to be an [A, b]-feasible signature is
π(m, n) = N(m + 1, n + 1)/2^{m+1}.
In particular, the probability for the system Ax ≤ b to be strictly feasible is π(m, n).
2. Accompanying exercise: Prove that if A is an m × n regular matrix, then the system Ax ≤ 0
has a nonzero solution if and only if the system Ax < 0 is feasible. Derive from this fact that
if [A, b] is regular, then the system Ax ≤ b is feasible if and only if it is strictly feasible, and
that when the system Ax ≤ 0 has a nonzero solution, the system Ax ≤ b is strictly feasible
for every b.
3. Use the results from the Intermezzo to compute the expected values of F and B, see item 1
(a small computational sketch follows).
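A minimal sketch of how the Intermezzo quantities can be computed (a Python sketch of ours; in addition to the stated base cases it uses N(1, n) = 2 for n ≥ 2, i.e., a single nontrivial homogeneous inequality defines two cells):

from functools import lru_cache

@lru_cache(maxsize=None)
def N(m, n):
    """Number of cells of a regular homogeneous m x n system Ax <= 0."""
    if n == 1 or m == 1:
        return 2                       # N(m,1) = 2; N(1,n) = 2 (one hyperplane, two cells)
    return N(m - 1, n) + N(m - 1, n - 1)

def pi_hom(m, n):
    """Probability that a fixed signature is A-feasible (homogeneous case, symmetric density)."""
    return N(m, n) / 2 ** m

def pi_inhom(m, n):
    """Probability that a fixed signature is [A,b]-feasible; in particular the probability
    that Ax <= b is (strictly) feasible when [A,b] has a symmetric density."""
    return N(m + 1, n + 1) / 2 ** (m + 1)

# expected number of feasible systems among 100 samples, planar case of item 1:
for m in (2, 4, 8, 16):
    print(m, 100 * pi_inhom(m, 2))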
Exercise II.54 ▲ [computational study]
1. For ν = 1, 2, . . . , 6, generate 100 systems of linear inequalities Ax ≤ b with n = 2ν variables
and m = 2n inequalities, the entries in A, b being drawn, independently of each other, from
N (0, 1). Fill the following table:

n 2 4 8 16 32 64
F
E{F }
B
F : # of feasible systems in sample;
B: # of feasible systems with bounded solution sets
To compute the expected value of F , use the results from [Nem24, Exercise 2.23] cited in
item 2 of Exercise II.53.
144 Exercises for Part II

2. Carry out an experiment similar to the one in item 1, but with m = n + 1 rather than m = 2n.

n 2 4 8 16 32 64
F
E{F }
B
E{B}
F : # of feasible systems in sample;
B: # of feasible systems with bounded solution sets
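One way to carry out the experiments of this exercise: feasibility of Ax ≤ b can be tested by an LP solver directly, and (for a feasible system) boundedness of the solution set by checking that every coordinate is bounded above and below on it. A minimal sketch, assuming numpy and scipy (2n auxiliary LPs per system; certainly not the most economical test):

import numpy as np
from scipy.optimize import linprog

def feasible_and_bounded(A, b):
    """Return (is_feasible, is_bounded) for the polyhedral set {x : Ax <= b}."""
    n = A.shape[1]
    free = [(None, None)] * n
    res = linprog(np.zeros(n), A_ub=A, b_ub=b, bounds=free)
    if res.status == 2:                       # infeasible
        return False, False
    for i in range(n):                        # bounded iff every +-x_i is bounded above
        for sign in (+1.0, -1.0):
            c = np.zeros(n); c[i] = -sign     # maximize sign*x_i  <=>  minimize -sign*x_i
            if linprog(c, A_ub=A, b_ub=b, bounds=free).status == 3:
                return True, False            # unbounded in this direction
    return True, True

rng = np.random.default_rng(0)
for n in (2, 4, 8, 16):                       # (32 and 64 work too, just slower)
    m = 2 * n
    F = B = 0
    for _ in range(100):
        A, b = rng.standard_normal((m, n)), rng.standard_normal(m)
        f, bd = feasible_and_bounded(A, b)
        F += f; B += bd
    print(n, F, B)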
11 Proofs of Facts from Part II

Fact II.7.2 Let S, T be nonempty convex sets in Rn . A linear form a⊤ x separates S


and T if and only if
(a) sup_{x∈S} a⊤x ≤ inf_{y∈T} a⊤y, and
(b) inf_{x∈S} a⊤x < sup_{y∈T} a⊤y.
This separation is strong if and only if (a) holds as a strict inequality:
sup_{x∈S} a⊤x < inf_{y∈T} a⊤y.

Proof. When a linear form a⊤x separates S and T, (a) holds true. Given (a), (b) could be
violated if and only if inf_{x∈S} a⊤x = sup_{y∈T} a⊤y. But, together with (a), this can happen only if a⊤x is
constant on S ∪ T, which is not the case as a⊤x separates S and T. The above reasoning clearly
can be reversed: given (b), we have a ̸= 0, and given (a), both sup_{x∈S} a⊤x and inf_{y∈T} a⊤y are
real numbers. Selecting b in-between these real numbers, the hyperplane a⊤x = b clearly
separates S and T. The “strong separation” claim is evident.

Fact II.8.4 Let M be a nonempty convex set and let x ∈ M . Then, x is an extreme
point of M if and only if any (and then all) of the following holds:
(i) the only vector h such that x ± h ∈ M is the zero vector;
(ii) in every representation x = Σ_{i=1}^m λ_i x^i of x as a convex combination, with positive
coefficients, of points x^i ∈ M, i ≤ m, one has x^1 = . . . = x^m = x;
(iii) the set M \ {x} is convex.
Proof.
(i): If x is an extreme point and x ± h ∈ M, then h = 0, since otherwise x = ½(x + h) + ½(x − h),
implying that x is an interior point of a nontrivial segment [x − h, x + h], which is impossible.
For the other direction, assume for contradiction that x ± h ∈ M implies h = 0 and that x
is not an extreme point of M. Then, as x ̸∈ Ext(M), there exist u, v ∈ M where both u, v
are not equal to x and λ ∈ (0, 1) such that x = λu + (1 − λ)v. As u ̸= x and v ̸= x while
x = λu+(1−λ)v, we conclude that u ̸= v. Now, consider any δ > 0 such that δ < min{λ, 1−λ}
and define h := δ(u − v). Note that h ̸= 0 and x + h = (λ + δ)u + (1 − λ − δ)v ∈ M and
x − h = (λ − δ)u + (1 − λ + δ)v ∈ M due to λ ± δ ∈ (0, 1), u, v ∈ M and convexity of M .
This then leads to the desired contradiction with our assumption that x ± h ∈ M implies
that h = 0.
As a byproduct of our reasoning, we see that if x ∈ M can be represented as x = λu+(1−λ)v
with u, v ∈ M , λ ∈ (0, 1], and u ̸= x, then x is not an extreme point of M .


(ii): In one direction, when x is not an extreme point of M, there exists h ̸= 0 such that x ± h ∈ M,
so that x = ½(x + h) + ½(x − h) is a convex combination, with positive coefficients, of two
points x ± h that are both in M and are distinct from x. To prove the opposite direction,
let x be an extreme point of M and suppose x = Σ_{i=1}^m λ_i x^i with λ_i > 0, Σ_i λ_i = 1, and
let us prove that x^1 = . . . = x^m = x. Indeed, assume for contradiction that at least one
of the x^i, say x^1, differs from x, and m > 1. Since λ_2 > 0, we have 0 < λ_1 < 1. Then, the
point v := (1 − λ_1)^{−1} Σ_{i=2}^m λ_i x^i is well defined. Moreover, as Σ_{i=2}^m λ_i = 1 − λ_1, v is a convex
combination of x^2, . . . , x^m and therefore v ∈ M. Then, x = λ_1 x^1 + (1 − λ_1)v with x, x^1, v ∈ M,
λ_1 ∈ (0, 1], and x^1 ̸= x, which, by the concluding comment in item (i) of the proof, implies
that x ̸∈ Ext(M); this is the desired contradiction.
(iii): In one direction, let x be an extreme point of M ; let us prove that the set M ′ := M \ {x} is
convex. Assume for contradiction that this is not the case. Then, there exist u, v ∈ M ′ and
λ ∈ [0, 1] such that x̄ := λu + (1 − λ)v ̸∈ M ′ , implying that 0 < λ < 1 (since u, v ∈ M ′ ). As
M is convex, we have x̄ ∈ M , and since x̄ ̸∈ M ′ and M \ M ′ = {x}, we conclude that x̄ = x.
Thus, x is a convex combination, with positive coefficients, of two points of M distinct from x,
contradicting, by the already proved item (ii), the fact that x is an extreme point of M. For
the other direction, suppose that M \ {x} is convex; we will prove that x must be an
extreme point of M. Assume for contradiction that x ̸∈ Ext(M). Then, there exists h ̸= 0
such that x ± h ∈ M. As h ̸= 0, both x + h and x − h are distinct from x, thus x ± h ∈ M \ {x}.
We see that x ± h ∈ M \ {x}, x = ½(x + h) + ½(x − h), and x ̸∈ M \ {x}, contradicting the
convexity of M \ {x}.

Fact II.8.5 All extreme points of the convex hull Conv(Q) of a set Q belong to Q:
Ext(Conv(Q)) ⊆ Q.
Proof. Assume for contradiction that x ∈ Ext(Conv(Q)) and x ̸∈ Q. As x ∈ Ext(Conv(Q)),
by Fact II.8.4.(iii) the set Conv(Q) \ {x} is convex and contains Q, contradicting the fact that
Conv(Q) is the smallest convex set containing Q.

Fact II.8.13 Let M be a nonempty closed convex set in Rn . Then


(i) Rec(M ) ̸= {0} if and only if M is unbounded.
(ii) If M is unbounded, then all nonzero recessive directions of M are positive
multiples of recessive directions of unit Euclidean length, and the latter are asymptotic
directions of M , i.e., a unit vector h ∈ Rn is a recessive direction of M if and only
if there exists a sequence {xi ∈ M }i≥1 such that ∥xi ∥2 → ∞ as i → ∞ and
h = limi→∞ xi /∥xi ∥2 .
(iii) M does not contain lines if and only if the cone Rec(M ) does not contain
lines.
Proof.
(i): If Rec(M ) ̸= {0}, then M contains a ray and therefore M is unbounded. For the reverse
direction, suppose M is unbounded and let us prove that Rec(M ) ̸= {0}. As M is unbounded,
there exists a sequence of points xi ∈ M such that ∥xi ∥2 > i, for all i = 1, 2, . . .. Then, for
sufficiently large i, the vectors hi := (xi − x1 )/∥xi − x1 ∥2 are well defined unit vectors. Passing
to a subsequence, we can assume that hi → h as i → ∞ (Theorem B.15), so that h is a unit
vector as well. For every t ≥ 0, the points x1 + thi , for all i with ∥xi − x1 ∥2 > t, are convex
combinations of points x1 and xi and both x1 , xi ∈ M . Then, as M is convex, x1 + thi ∈ M for
all large enough i. As i → ∞, the points x1 + thi converge to x1 + th, and since M is closed,
we conclude x1 + th ∈ M . Because this holds for every t ≥ 0, the vector h ̸= 0 is a recessive
direction of M .
(ii): Suppose M is unbounded and consider any h ∈ Rec(M ) such that h is a unit vector.
Pick any x^0 ∈ M and define x^i := x^0 + ih, i = 1, 2, . . .. Then, we get a sequence of points from
M diverging to infinity, i.e., ∥x^i∥_2 → ∞ as i → ∞, and also satisfying h = lim_{i→∞} ∥x^i∥_2^{−1} x^i.

Thus, h is an asymptotic direction of M, as claimed. To prove the reverse direction, if x^i ∈ M
are such that ∥x^i∥_2 → ∞ and h := lim_{i→∞} ∥x^i∥_2^{−1} x^i exists, then h is a recessive direction of M
by the same reasoning used in the proof of item (i), and of course h is a unit vector.
(iii): Suppose M contains a line with direction h ̸= 0, i.e., for some x ∈ M and for all t ∈ R,
we have x + th ∈ M . Then, by the definition of recessive direction, both h and −h are recessive
directions of M , so ±h ∈ Rec(M ), and thus Rec(M ) contains a line with the direction h. For
the reverse direction suppose h ̸= 0 and Rec(M ) contains a line with direction h. Since Rec(M )
is a closed convex cone, the line with the same direction passing through the origin is contained
in Rec(M ) (by Lemma II.8.8). Thus, ±h ∈ Rec(M ). Then, by the definition of Rec(M ), for any
x ∈ M it holds that x + th ∈ M for all t ∈ R. Hence, M contains a line with the direction h.

Fact II.8.14 Let M ⊆ Rn be a nonempty closed convex set. Recall its closed conic
transform is given by
ConeT(M ) = cl {[x; t] ∈ Rn × R : t > 0, x/t ∈ M } ,
(see section 1.5). Then,
Rec(M) = {h ∈ R^n : [h; 0] ∈ ConeT(M)}.

Proof. Let h be such that [h; 0] ∈ ConeT(M ), and let us prove that h ∈ Rec(M ). There is nothing
to prove when h = 0, thus assume that h ̸= 0. Since the vectors g such that [g; 0] ∈ ConeT(M) form a
closed cone and both ConeT(M) and Rec(M) are cones as well, we lose nothing when assuming,
in addition to [h; 0] ∈ ConeT(M ) and h ̸= 0, that h is a unit vector. Since [h; 0] ∈ ConeT(M ), by
definition of the latter set there exists a sequence [ui ; ti ] → [h; 0], i → ∞, such that ti > 0 and
xi := ui /ti ∈ M for all i. Then, this together with ui → h and ∥h∥2 = 1 imply that ∥xi ∥2 → ∞
and ti ∥xi ∥2 → 1 as i → ∞. As a result, limi→∞ ∥xi ∥−1 i i
2 x = limi→∞ u = h. By Fact II.8.13(ii),
we see that h ∈ Rec(M ).
For the reverse direction, consider any h ∈ Rec(M ), and let us prove that [h; 0] ∈ ConeT(M ).
There is nothing to prove when h = 0, so we assume h ̸= 0. Consider any x̄ ∈ M and define
xi := x̄ + ih, i = 1, 2, . . .. As h ∈ Rec(M ), we have xi ∈ M for all i. Moreover, ∥xi ∥2 → ∞ as
i → ∞ due to h ̸= 0. We clearly have limi→∞ [xi /∥xi ∥2 ; 1/∥xi ∥2 ] = [h/∥h∥2 ; 0], and the vectors
[y i ; ti ] := [xi /∥xi ∥2 , 1/∥xi ∥2 ] for all large enough i satisfy the requirement ti > 0, y i /ti ∈ M , so
[y i ; ti ] ∈ ConeT(M ) for all large enough i. As ConeT(M ) is closed and [y i ; ti ] → [h/∥h∥2 ; 0] as
i → ∞, we deduce [h/∥h∥2 ; 0] ∈ ConeT(M ). Finally, ConeT(M ) is a cone, so [h; 0] ∈ ConeT(M )
as well.

Fact II.8.15 For any nonempty polyhedral set M = {x ∈ Rn : Ax ≤ b}, its recessive
cone is given by
Rec(M ) = {h ∈ Rn : Ah ≤ 0} ,
i.e., Rec(M ) is given by homogeneous version of linear constraints specifying M .
Proof. Consider any h such that Ah ≤ 0. Then, for any x̄ ∈ M , and t ≥ 0, we have A(x̄ + th) =
Ax̄ + tAh ≤ Ax̄ ≤ b, so x̄ + th ∈ M for all t ≥ 0. Hence, h ∈ Rec(M ). For the reverse direction,
suppose h ∈ Rec(M ) and x̄ ∈ M . Then, for all t ≥ 0 we have A(x̄ + th) ≤ b. This is equivalent
to Ah ≤ t−1 (b − Ax̄) for all t > 0, which implies that Ah ≤ 0.

Fact II.8.23 Let M be a closed cone in Rn , and let M∗ be the cone dual to M .
Then

(i) Duality does not distinguish between a cone and its closure: whenever M = cl M ′
for a cone M ′ , we have M∗ = M∗′ .
(ii) Duality is symmetric: the cone dual to M∗ is M .
(iii) One has
int M∗ = {y ∈ R^n : y⊤x > 0, ∀x ∈ M \ {0}},
and int M∗ is nonempty if and only if M is pointed (i.e., M ∩ [−M] = {0}).
Moreover, when M, in addition to being closed, is pointed and nontrivial (M ̸= {0}), one has
int M∗ = {y ∈ R^n : M_y := {x ∈ M : x⊤y = 1} is nonempty and compact}.        (8.7)

(iv) The cone dual to the direct product M1 × . . . × Mm of cones Mi is the direct
product of their duals: [M1 × . . . × Mm ]∗ = [M1 ]∗ × . . . × [Mm ]∗ .
Proof.
(i): This is evident.
(ii): By definition, any x ∈ M satisfies x⊤ y ≥ 0 for all y ∈ M∗ , hence M ⊆ [M∗ ]∗ . To prove
M = [M∗ ]∗ , assume for contradiction that there exists x̄ ∈ [M∗ ]∗ \M . By Separation Theorem,
{x̄} can be strongly separated from M, i.e., there exists y such that

y⊤x̄ < inf_{x∈M} y⊤x.

As M is a conic set and the right hand side infimum is finite, this infimum must be 0. Thus,
y⊤x̄ < 0 while y⊤x ≥ 0 for all x ∈ M, implying y ∈ M∗. But then this contradicts
x̄ ∈ [M∗]∗.
(iii): Let us prove that int M∗ ̸= ∅ if and only if M is pointed. If M is not pointed, then ±h ∈ M
for some h ̸= 0, implying that y ⊤ [±h] ≥ 0 for all y ∈ M∗ , that is, y ⊤ h = 0 for all y ∈ M∗ .
Thus, when M is not pointed, M∗ belongs to a proper (smaller than the entire Rn ) linear
subspace of Rn and thus int M∗ = ∅. This reasoning can be reversed: when int M∗ = ∅,
the affine hull Aff(M∗ ) of M∗ cannot be the entire Rn (since int M∗ = ∅ and rint M∗ ̸= ∅);
taking into account that 0 ∈ M∗ , we have Aff(M∗ ) = Lin(M∗ ), so that Lin(M∗ ) ⫋ Rn , and
therefore there exists a nonzero h orthogonal to Lin(M∗ ). We have y ⊤ [±h] = 0 for all y ∈ M∗ ,
implying that h and −h belong to cone dual to M∗ , that is, to M (due to the already verified
item (ii)). Thus, for some nonzero h it holds ±h ∈ M , that is, M is not pointed.
Now let us prove that y ∈ int M∗ if and only if y ⊤ x > 0 for every x ∈ M \ {0}. In one
direction: assume that y ∈ int M∗, so that for some r > 0 it holds that y + δ ∈ M∗ for all δ with
∥δ∥_2 ≤ r. If now x ∈ M, we have 0 ≤ min_{δ:∥δ∥_2≤r} [y + δ]⊤x = y⊤x − r∥x∥_2. Thus,

y ∈ int M∗ =⇒ ∥x∥_2 ≤ (1/r) y⊤x, ∀x ∈ M,        (*)
implying that y ⊤ x > 0 for all x ∈ M \ {0}, as required. In the opposite direction: assume that
y ⊤ x > 0 for all x ∈ M \ {0}, and let us prove that y ∈ int M∗ . There is nothing to prove when
M = {0} (and therefore M∗ = Rn ). Assuming M ̸= {0}, let M̄ = {x ∈ M : ∥x∥2 = 1}. This
set is nonempty (since M ̸= {0}), is closed (as M is closed), and is clearly bounded, and thus
is compact. We are in the situation when y ⊤ x > 0 for x ∈ M̄ , implying that minx∈M̄ y ⊤ x
(this minimum is achieved since M̄ is a nonempty compact set) is strictly positive. Thus,
y ⊤ x ≥ r > 0 for all x ∈ M̄ , whence [y + δ]⊤ x ≥ 0 for all x ∈ M̄ and all δ with ∥δ∥2 ≤ r. Due
to the origin of M̄ , the inequality [y + δ]⊤ x ≥ 0 for all x ∈ M̄ implies that [y + δ]⊤ x ≥ 0 for
all x ∈ M . The bottom line is that the Euclidean ball of radius r centered at y belongs to
M∗ , and therefore y ∈ int M∗ , as claimed.
Now let us prove the “Moreover” part of item (iii). Thus, let the cone M be closed, pointed,
and nontrivial. Consider any y ∈ int M∗ , then the set My , first, contains some positive
multiple of every nonzero vector from M and thus is nonempty (since M ̸= {0}) and, second,

is bounded (by (*)). Since My is closed (as M is closed), we conclude that My is a nonempty
compact set. Thus, the left hand side set in (8.7) is contained in the right hand side one. To
prove the opposite inclusion, let y ∈ Rn be such that My is a nonempty compact set, and let
us prove that y ∈ int M∗ . By the already proved part of item (iii), all we need is to verify that
if x ̸= 0 and x ∈ M , then y ⊤ x > 0. Assume for contradiction that there exists x̄ ∈ M \ {0}
such that α := −y⊤x̄ ≥ 0. Then, by selecting any x̂ ∈ M_y (M_y is nonempty!) and setting
e = αx̂ + x̄, we get e ∈ M and y⊤e = 0. Note that e ̸= 0; indeed, e = 0 means that the nonzero
vector x̄ ∈ M is such that −x̄ = αx̂ ∈ M, contradicting the pointedness of M. The bottom line
is that e ∈ M \ {0} and y⊤e = 0, whence e is a nonzero recessive direction of M_y. This is the
desired contradiction, as M_y is compact!
(iv): This is evident.

Fact II.8.28 Let M ⊆ Rn be a cone and M∗ be its dual cone. Then, for any
x ∈ int M , there exists a properly selected Cx < ∞ such that
∥f ∥2 ≤ Cx f ⊤ x, ∀f ∈ M∗ .
Proof. Since x ∈ int M , there exists ρ > 0 such that x − δ ∈ M whenever ∥δ∥2 ≤ ρ. Then, as
f ∈ M∗ , we have f ⊤ (x − δ) ≥ 0 for any ∥δ∥2 ≤ ρ , i.e., f ⊤ x ≥ supδ {f ⊤ δ : ∥δ∥2 ≤ ρ} = ρ∥f ∥2 .
Taking Cx := 1/ρ (note that Cx < ∞ as ρ > 0) gives us the desired relation.

Fact II.8.33. Let M ⊆ Rn be a nontrivial closed cone, and M∗ be its dual cone.
(i) M is pointed
(i.1) if and only if M does not contain straight lines,
(i.2) if and only if M∗ has a nonempty interior, and
(i.3) if and only if M has a base.
(ii) Set (8.9) is a base of M
(ii.1) if and only if f ⊤ x > 0 for all x ∈ M \ {0},
(ii.2) if and only if f ∈ int M∗ .
In particular, f ∈ int M∗ if and only if f ⊤ x > 0 whenever x ∈ M \ {0}.
(iii) Every base of M is nonempty, closed, and bounded. Moreover, whenever M is
pointed, for any f ∈ M∗ such that the set (8.9) is nonempty (note that this set is
always closed for any f ), this set is bounded if and only if f ∈ int M∗ , in which case
(8.9) is a base of M .
(iv) M has extreme rays if and only if M is pointed. Furthermore, when M is pointed,
there is one-to-one correspondence between extreme rays of M and extreme points
of a base B of M : specifically, the ray R := R+ (d), d ∈ M \ {0} is extreme if and
only if R ∩ B is an extreme point of B.
Proof. (i.1): Since M is closed, convex, and contains the origin, M contains a line if and only
if M contains a line passing through the origin, and since M is conic, the latter happens if and
only if M is not pointed.
(i.2): This is precisely Fact II.8.23(iii).
(i.3): As we have seen, (8.9) is a base of M if and only if f ⊤ x > 0 for all x ∈ M \ {0}, which,
by Fact II.8.23(iii), holds if and only if f ∈ int M∗ .
(ii.1): This was explained when defining a base.
(ii.2): This is given by Fact II.8.23(iii).
(iii): Suppose B is a base of M . Then, B is nonempty since B intersects all nontrivial rays
in M emanating from the origin, and the set of these rays is nonempty since M is nontrivial.
Closedness of B is evident. To prove that B is bounded, note that by (ii.2) f ∈ int M∗ . Thus,
there exists r > 0 such that f − e ∈ M∗ , for all ∥e∥2 ≤ r. Hence, [f − e]⊤ x ≥ 0 for all x ∈ M

and all e with ∥e∥_2 ≤ r, implying f⊤x ≥ r∥x∥_2 for all x ∈ M, and therefore ∥x∥_2 ≤ r^{−1} for all
x ∈ B.
Next, let M be pointed and f ∈ M∗ be such that the set (8.9) is nonempty. Closedness of
this set is evident. Let us show that this set is bounded if and only if f ∈ int M∗ . Indeed, when
f ∈ int M∗ , B is a base of M by (ii.2) and therefore, as we have just seen, B is bounded. For
the other direction, suppose that f ̸∈ int M∗ . Then, by Fact II.8.23(iii), there exists x̄ ∈ M \ {0}
such that f⊤x̄ = 0. Also, as the set (8.9) is nonempty, there exists x̂ ∈ M such that f⊤x̂ = 1.
Now, observe that for any λ ∈ [0, 1) the vector (1 − λ)^{−1}[(1 − λ)x̂ + λx̄] belongs to B and the
norm of this vector goes to +∞ as λ → 1. But then this implies that B is unbounded, and so
the proof of (iii) is completed.
(iv): Suppose M is not pointed. Then, there exists a direction e ̸= 0 such that M contains
the line generated by e; in particular ±e ∈ M . Assume for contradiction that d is an extreme
direction of M . Then, as M is a closed convex cone, d ± te ∈ M for all t ∈ R. Thus, as M is a
cone, we have d±(t) := ½[d ± te] ∈ M for all t. Let us first suppose that e is not collinear to d,
then for any t ̸= 0, the vector d± (t) is not a nonnegative multiple of d, but then this contradicts
d being an extreme direction of M . So, we now suppose that e is collinear to d. But, in this case,
for large enough t, one of the vectors d± (t), while being a multiple of d, is not a nonnegative
multiple of d, which again is impossible. Thus, when M is not pointed, M does not have extreme
rays.
Now let M be pointed, and let the set B given by (8.9) be a base of M (a base does exist
by (i.3)). B is a nonempty closed and bounded convex set by (iii). Let us verify that the rays
R+ (d) spanned by the extreme points of B are exactly the extreme rays of M . First, suppose
d ∈ Ext(B), and let us prove that d is an extreme direction of M . Indeed, let d = d1 + d2 for
some d1 , d2 ∈ M ; we should prove that d1 , d2 are nonnegative multiples of d. There is nothing
to prove when one of the vectors d1 , d2 is zero, so we assume that both d1 , d2 are nonzero. Then,
since B is a base, by (ii.1) we have α_i := f⊤d_i > 0, i = 1, 2. Moreover, α_1 + α_2 = f⊤d = 1.
Setting d̄_i := α_i^{−1} d_i, i = 1, 2, we have d̄_i ∈ B, i = 1, 2, and α_1 d̄_1 + α_2 d̄_2 = d_1 + d_2 = d. Recalling
that d is an extreme point of B and α_i > 0, i = 1, 2, we conclude that d̄_1 = d̄_2 = d, that is,
d_1 and d_2 are positive multiples of d, as claimed. For the reverse direction, let d be an extreme
direction of M . We need to prove that the intersection of the ray R+ (d) and B (this intersection
is nonempty since d ∈ M \ {0}) is an extreme point of B. Passing from extreme direction d to
its positive multiple, we can assume that d ∈ B. To prove that d ∈ Ext(B), assume that there
exists h such that d ± h ∈ B and let us verify that h = 0. Indeed, as d ∈ B we have f ⊤ d = 1,
while from d ± h ∈ B we conclude that f ⊤ h = 0. Therefore, when h ̸= 0, h is not a multiple of
d, whence the vectors d ± h are not multiples of d. On the other hand, both of the vectors d ± h
belong to M and d is their average, which contradicts the fact that d is an extreme direction of
M . Thus, h = 0, as claimed.

Fact II.8.38 Let M be a convex set in Rn containing the origin. Then,


(i) Polar (M ) = Polar (cl M );
(ii) M is bounded if and only if 0 ∈ int(Polar (M ));
(iii) int(Polar (M )) ̸= ∅ if and only if M does not contain straight lines;
(iv) If M is a cone (not necessarily closed), then
Polar (M) = {a ∈ R^n : a⊤x ≤ 0, ∀x ∈ M} = −M∗.        (8.10)
Assume that M is closed. Then, M is a closed cone if and only if Polar (M ) is a
closed cone.
Proof.

(i): This follows immediately from sup_{x∈M} a⊤x = sup_{x∈cl M} a⊤x.


(ii): Suppose M is bounded. Then, by Cauchy-Schwarz inequality all vectors y with small enough
norms satisfy y ∈ Polar (M ) and so 0 ∈ int(Polar (M )). To see the reverse direction, sup-
pose 0 ∈ int(Polar (M )). Note that cl M is a closed convex set and by item (i), we have
Polar (M ) = Polar (cl M ), so 0 ∈ int(Polar (cl M )), i.e., Polar (cl M ) contains a ball cen-
tered at the origin with some radius ρ > 0. Then, by Proposition II.8.37, we have cl M =
Polar (Polar (cl M)) = Polar (Polar (M)), which implies that x ∈ cl M = Polar (Polar (M))
only if 1 ≥ sup_{y∈R^n} {y⊤x : ∥y∥_2 ≤ ρ} = ρ∥x∥_2. Thus, for all x ∈ cl M we have ∥x∥_2 ≤ 1/ρ.
(iii): By item (i), the polar remains intact when passing from M to cl M ; by Lemma II.8.9, a
nonempty convex set M contains a straight line if and only if cl M does so. Thus, we lose
nothing when assuming in the rest of the proof that M is closed.
Assume, first, that M contains a straight line, and let us prove that int(Polar (M )) = ∅.
Indeed, when the closed convex set M contains a line, then, as 0 ∈ M, by Lemma II.8.8 M contains
a parallel line ℓ passing through 0. Thus, Polar (M) ⊆ Polar (ℓ). Since ℓ is a one-
dimensional linear subspace of Rn , Polar (ℓ) is the orthogonal complement ℓ⊥ of ℓ, so that
int(Polar (ℓ)) = int(ℓ⊥ ) = ∅, hence int(Polar (M )) = ∅ as well.
Now let int(Polar (M )) = ∅, and let us prove that M contains a straight line. Assume that it
is not the case, and let us lead this assumption to a contradiction. Since M does not contain
lines, the closed cone K := Rec(M ) is pointed, so that its dual cone K∗ has a nonempty
interior (Fact II.8.33(i.2)). Thus, there exist r > 0 and f̄ ∈ K∗ such that the ball of radius
2r centered at f̄ is contained in K∗. Then, z⊤(f + e) ≥ 0 whenever z ∈ K, ∥e∥_2 ≤ r, and
∥f − f̄∥_2 ≤ r. As a result,

f⊤z ≥ r∥z∥_2, ∀(z ∈ K, f ∈ B := {f : ∥f − f̄∥_2 ≤ r}).        (∗)


Now let C := sup_{f∈−B, z∈M} f⊤z;

we claim that C < ∞. Taking this claim for granted, observe that C < ∞ implies, by
homogeneity, that sup_{f∈−ϵB, z∈M} f⊤z ≤ ϵC for all ϵ > 0, hence for a properly selected small
positive ϵ the ball −ϵB is contained in Polar (M), implying int(Polar (M)) ̸= ∅, which is the
desired contradiction.
It remains to justify the above claim. To this end assume that C = +∞, and let us lead this
assumption to a contradiction. When C = +∞, there exists a sequence fi ∈ −B and zi ∈ M
such that fi⊤ zi → +∞ as i → ∞, implying, due to fi ∈ −B, that ∥zi ∥2 → ∞ as i → ∞.
Passing to a subsequence, we can assume that zi /∥zi ∥2 → h as i → ∞. Then, by its origin,
h is an asymptotic direction of M and therefore is a unit vector from K (Fact II.8.13(ii)).
Assuming w.l.o.g. z_i ̸= 0 for all i, we have

f_i⊤z_i = ∥z_i∥_2 [α_i + β_i],  where α_i := f_i⊤h and β_i := f_i⊤(z_i/∥z_i∥_2 − h).        (!)

As i → ∞, fi ∈ −B remain bounded and (zi /∥zi ∥2 − h) → 0, implying that βi → 0 as


i → ∞, while (∗) together with h ∈ K, ∥h∥2 = 1, and fi ∈ −B imply that αi ≤ −r < 0.
Thus, αi + βi ≤ −r/2 for large enough values of i, so that (!) taken together with ∥zi ∥2 → ∞
as i → ∞ says that fi⊤ zi → −∞ as i → ∞, contradicting the origin of fi and zi . Thus,
C < ∞, as claimed. This completes the verification of item (iii).
(iv): Clearly, when M is a nonempty conic set, the relation y ⊤ x ≤ 1 for all x ∈ M is exactly the
same as y ⊤ x ≤ 0 for all x ∈ M . Hence, when M is a cone, its polar is a closed cone given
by (8.10). On the other hand, when M contains the origin and is convex and closed, it is the
polar of its polar, so that when this polar is a cone, M itself is a closed cone (by the just
proved part of item (iv) as applied to Polar (M ) in the role of M ).
Part III

Convex Functions

12 First acquaintance with convex functions

12.1 Definition and examples

Definition III.12.1 [Convex function] A function f : Q → R defined on a


subset Q of Rn and taking real values is called convex, if
• the domain Q of the function is convex, and
• for every x, y ∈ Q and every λ ∈ [0, 1] one has
f (λx + (1 − λ)y) ≤ λf (x) + (1 − λ)f (y). (12.1)
The function f is called strictly convex whenever the above inequality holds
strictly for every x ̸= y and 0 < λ < 1.

A function f such that −f is convex is called concave. In particular, the domain


Q of a concave function f should be convex, and the function f itself should
satisfy the inequality opposite to (12.1), i.e.,
f (λx + (1 − λ)y) ≥ λf (x) + (1 − λ)f (y), ∀x, y ∈ Q and ∀λ ∈ [0, 1].
Example III.12.1 The simplest example of a convex function is an affine func-
tion
f (x) = a⊤ x + b,
which is simply the sum of a linear form and a constant. This function is clearly
convex on the entire space, and in this case “convexity inequality” holds as an
equality everywhere. In fact, an affine function is both convex and concave. More-
over, it is easily seen that a function which is both convex and concave on the
entire space must be affine. ♢
Example III.12.2 Here are several elementary examples of “nonlinear” convex
functions of one variable:
• functions convex on the entire axis:
x^{2p}, where p is a positive integer;
exp(x);
exp(−x);
• functions convex on the nonnegative ray:
x^p, where p ≥ 1;
−x^p, where 0 ≤ p ≤ 1;
x ln x;
• functions convex on the positive ray:
1/x^p, where p > 0;
− ln x.
At the moment it is not clear why these functions are convex. We will soon
derive a simple analytic criterion for detecting convexity which will immediately
demonstrate that the above functions indeed are convex. ♢
A very convenient equivalent definition of a convex function is in terms of its
epigraph. Given a real-valued function f defined on a subset Q of Rn , we define
its epigraph as the set
epi{f} := {[x; t] ∈ R^{n+1} : x ∈ Q, t ≥ f(x)}.


Geometrically, to define the epigraph, we plot the graph of the function, i.e., the
surface {(x, t) ∈ Rn+1 : x ∈ Q, t = f (x)} in Rn+1 , and add to this surface
all points which are “above” it. Epigraph allows us to give an equivalent, more
geometrical, definition of a convex function as follows.

Proposition III.12.2 [Epigraph based definition of convex functions] A


function f : Q → R defined on a subset Q of Rn is convex if and only if its
epigraph is a convex set in Rn+1 .

Proof. Let f : Q → R be convex; we will show that epi{f} is convex. Consider
any two points [x′; t′], [x″; t″] from epi{f} and any λ ∈ [0, 1]. Then, x′, x″ ∈ Q
and
λ[x′; t′] + (1 − λ)[x″; t″] = [λx′ + (1 − λ)x″; λt′ + (1 − λ)t″] =: [x; t],

and since f is convex, we see that Q is convex and x ∈ Q. Moreover,
t = λt′ + (1 − λ)t″ ≥ λf(x′) + (1 − λ)f(x″) ≥ f(x), where the first inequality uses
t′ ≥ f(x′) and t″ ≥ f(x″), and the second follows from the convexity of
f. Then, [x; t] ∈ epi{f}. Thus, epi{f} is convex.
For the other direction, suppose that epi{f } is convex. Consider any x′ , x′′ ∈ Q
and λ ∈ [0, 1]. Then, we have [x′ ; f (x′ )] ∈ epi{f } and [x′′ ; f (x′′ )] ∈ epi{f } by
definition of epi{f }. Since epi{f } is convex, the point
λ[x′ ; f (x′ )] + (1 − λ)[x′′ ; f (x′′ )] = [λx′ + (1 − λ)x′′ ; λf (x′ ) + (1 − λ)f (x′′ )]
is in epi{f } as well. This, by definition of epi{f }, implies the relations λx′ + (1 −
λ)x′′ ∈ Q and f (λx′ + (1 − λ)x′′ ) ≤ λf (x′ ) + (1 − λ)f (x′′ ), so that f is convex.
More examples of convex functions: norms. Equipped with Proposition
III.12.2, we can extend our initial list of convex functions (affine functions and
several one-dimensional functions) with more examples, namely norms. Let ∥x∥
be a norm on R^n (see section 1.1.2). So far, we encountered three examples of
norms: the Euclidean (ℓ_2-) norm ∥x∥_2 = √(x⊤x), the ℓ_1-norm ∥x∥_1 = Σ_i |x_i|, and
the ℓ_∞-norm ∥x∥_∞ = max_i |x_i|. It was also claimed (although not proved) that
these are three members from an infinite family of norms

∥x∥_p := (Σ_{i=1}^n |x_i|^p)^{1/p}, where 1 ≤ p ≤ ∞

(the right hand side of the latter relation for p = ∞ is, by definition, max_i |x_i|).
We say that a function f : Rn → R is positively homogeneous of degree 1 if it
satisfies
f (tx) = tf (x), ∀x ∈ Rn , t ≥ 0.
Also, we say that the function f : Rn → R is subadditive if it satisfies
f (x + y) ≤ f (x) + f (y), ∀x, y ∈ Rn .
Note that every norm is positively homogeneous of degree 1 and subadditive.
We are about to prove that all such functions (in particular, all norms) are con-
vex:

Proposition III.12.3 Let π(x) be a real-valued function on Rn which is


positively homogeneous of degree 1. Then, π is convex if and only if it is
subadditive.

Proof. Note that the epigraph of a positively homogeneous of degree 1 function


π is a conic set since for any λ ≥ 0 we have [x; t] ∈ epi{π} implies λ[x; t] ∈ epi{π}.
Moreover, by Proposition III.12.2 π is convex if and only if epi{π} is convex. It
is clear that a conic set is convex if and only if it contains the sum of every pair
of its elements (why ?). This latter property is satisfied for the epigraph of a
real-valued function if and only if the function is subadditive (evident).

12.2 Jensen’s inequality


The following basic observation is, we believe, one of the most useful observations
ever made.

Proposition III.12.4 [Jensen’s inequality] Let f : Q → R be convex. Then,


for every convex combination Σ_{i=1}^N λ_i x^i of points x^i from Q (i.e., λ ∈ R^N_+
and Σ_{i=1}^N λ_i = 1), we have

f(Σ_{i=1}^N λ_i x^i) ≤ Σ_{i=1}^N λ_i f(x^i).

Proof. Note that the points [x^i; f(x^i)] belong to the epigraph of f. As f is convex,
its epigraph is a convex set. Then, for any λ ∈ R^N_+ satisfying Σ_{i=1}^N λ_i = 1, we
have that the corresponding convex combination of the points, given by

Σ_{i=1}^N λ_i [x^i; f(x^i)] = [Σ_{i=1}^N λ_i x^i; Σ_{i=1}^N λ_i f(x^i)],

also belongs to epi{f}. By definition of the epigraph, this means exactly that
Σ_{i=1}^N λ_i f(x^i) ≥ f(Σ_{i=1}^N λ_i x^i).
Note that the definition of convexity of a function f is exactly the requirement
on f to satisfy the Jensen inequality for the case of N = 2. We see that to satisfy
this inequality for N = 2 is the same as to satisfy it for all N ≥ 2.
Remark III.12.5 An instructive interpretation of Jensen’s inequality is as fol-
lows: Given a convex function f , consider a discrete random variable x taking
values xi ∈ Dom f , i ≤ N , with probabilities λi . Then,
f (E[x]) ≤ E[f (x)],
where E[·] stands for the expectation operator. The resulting inequality, under
mild regularity conditions, holds true for general type random vectors x taking
values in Dom f with probability 1. ■
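As a quick numerical illustration of this probabilistic reading of Jensen's inequality (a sketch, assuming numpy; the function and the distribution are chosen arbitrarily for the demonstration):

import numpy as np

rng = np.random.default_rng(0)
f = np.exp                                   # a convex function on the axis
x = rng.normal(size=10**6)                   # a random variable with values in Dom f = R
print(f(x.mean()), f(x).mean())              # f(E[x]) <= E[f(x)]: here ~e^0 = 1 vs ~e^{1/2}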

12.3 Convexity of sublevel sets


The sublevel set of a function f : Q → R given by α ∈ R is defined as
levα (f ) := {x ∈ Q : f (x) ≤ α} .
We have the following simple yet useful observation on the sublevel sets of convex
functions.

Proposition III.12.6 [Convexity of sublevel sets] Let f : Q → R be convex.


Then, for every α ∈ R, the sublevel set of f given by α ∈ R, i.e., levα (f ), is
convex.

Proof. Suppose x, y ∈ levα (f ) and λ ∈ [0, 1]. Then, from convexity of f , we


have f (λx + (1 − λ)y) ≤ λf (x) + (1 − λ)f (y) ≤ λα + (1 − λ)α = α, so that
λx + (1 − λ)y ∈ levα (f ).
It is important to note that the convexity of sublevel sets does not characterize
convex functions; there are nonconvex functions which possess this property (e.g.,
every monotone function on the axis has all of its sublevel sets convex). Thus,
convexity of sublevel sets specifies a wider family of functions, the so called qua-
siconvex ones. The “proper” characterization of convex functions in terms of
convex sets is given by Proposition III.12.2 – convex functions are exactly the
functions with convex epigraphs.

12.4 Value of a convex function outside its domain


In its literal meaning, a function is not defined outside its domain and thus
does not have any associated “value” outside its domain. Nevertheless, when
speaking of convex functions, it is extremely convenient to think that the function
outside its domain also has a value, namely, it takes the value of +∞. With this
convention, we revise our definition of convex functions as follows.
A convex function f on Rn is a function taking values in the extended
real axis R ∪ {+∞} and such that for all x, y ∈ Rn and all λ ∈ [0, 1] one
has
f (λx + (1 − λ)y) ≤ λf (x) + (1 − λ)f (y). (12.2)
When f is taking values in the extended real axis, some terms in the inequality
(12.2) may involve infinities. In such a case, the left- and right-hand side values of
this inequality as well as its validity status is determined according to the stan-
dard conventions on operations of summation, multiplication, and comparison in
the “extended real line” R ∪ {+∞} ∪ {−∞}. These conventions are as follows:

• Operations with real numbers are understood in their usual sense.


• The sum of +∞ and a real number, same as the sum of +∞ and +∞ is +∞.
Similarly, the sum of a real number and −∞, same as the sum of −∞ and −∞
is −∞. The sum of +∞ and −∞ is undefined.
• The product of a real number and +∞ is +∞, 0 or −∞, depending on whether
the real number is positive, zero or negative, and similarly for the product of
a real number and −∞. The product of two “infinities” is again infinity, with
the usual rule for assigning the sign to the product.
• Finally, any real number is < +∞ and > −∞, and of course −∞ < ∞.
The set where a function f taking values in the extended real axis is finite is
called the domain of f and is denoted by Dom f . Based on our revised definition
of convex functions on the extended real axis, the function f that is defined to
be identically equal to +∞ is a legitimate convex function. A convex function
with nonempty domain (that is, a convex function which is not identically +∞)
is called proper.
When f takes all its values in R ∪ {+∞}, the inequality (12.2) is automatically
valid when λ = 0 or λ = 1; when 0 < λ < 1, it is automatically valid when x = y,
same as when at least one of the points x, y is not in Dom f . Thus, we arrive at
the following equivalent definition of convex functions:
Function f : Rn → R ∪ {+∞} is convex if and only if the inequality
(12.2) holds for every x ̸= y, x, y ∈ Dom f and for every λ ∈ (0, 1).
Note that our initial definition of convex function included the requirement for
the domain of the function to be convex; our new, equivalent, definition of convex
function does not include such a requirement —after f is extended outside of its
domain by +∞, inequality (12.2) automatically takes care of the convexity of
Dom f .

The simplest function with a given domain Q is identically zero on Q and iden-
tically +∞ outside of Q. This function, called the characteristic (a.k.a. indicator )
function of Q 1 is convex if and only if Q is a convex set.
It is convenient to think of a convex function as of something which is defined
everywhere, since it saves a lot of words. For example, with this convention we
can write f + g (f and g are convex functions on Rn ), and everybody will under-
stand what is meant. Without this convention, we were supposed to add to this
expression the following explanation as well: “f + g is a function with the domain
being the intersection of those of f and g, and in this intersection it is defined as
(f + g)(x) = f (x) + g(x).”

1 This terminology is standard for Convex Analysis; in other areas of Math, characteristic, a.k.a.
indicator, function of a set Q ⊂ Rn is defined as the function equal to 1 on the set and to 0 outside
of it.
13 How to detect convexity

In an optimization problem
min_x {f(x) : g_j(x) ≤ 0, j = 1, . . . , m}

convexity of the objective function f and the constraint functions g_j is crucial.


Indeed, convex problems — those with convex f and gj — possess nice theoretical
properties with important practical implications. For example, the local neces-
sary optimality conditions for these problems are sufficient for global optimality.
Moreover, much more importantly, convex problems can be efficiently (both in
theoretical and, to some extent, in the practical meaning of the word) solved,
which is not, unfortunately, the case for general nonconvex problems. This is why
it is so important to know how to detect convexity of a given function.
The scheme of our investigation is typical for mathematics. Let us start with
the example which you know from Analysis. How do you detect continuity of a
function? Of course, there is a definition of continuity in terms of ϵ and δ, but it
would be an actual disaster if each time we need to prove continuity of a func-
tion, we were supposed to write down the proof that “for every positive ϵ there
exists positive δ such that . . . ”. In fact, we use another approach: we list once
and for all a number of standard operations which preserve continuity (like addition,
multiplication, taking superpositions, etc.) and point out a number of standard
examples of continuous functions (like the power function, the exponent, etc.).
Note that both steps, proving that the operations in the list preserve continuity
and proving that the standard functions are continuous, take a certain effort and
indeed are done in ϵ–δ terms. But, after investing in this effort once, typically
proving continuity of a given function becomes a much simpler task: it suffices
to demonstrate that the function can be obtained, in finitely many steps, from
our “raw materials” (the standard functions which are known to be continuous)
by applying our “machinery” (the combination rules which preserve continuity).
Normally, this demonstration is given by a single word “evident” or is even un-
derstood by default.
This is exactly the case with convexity. We will next point out the list of
operations which preserve convexity and a number of standard convex functions.

13.1 Operations preserving convexity of functions


We start with the following basic operations preserving convexity of functions:

• Stability under taking nonnegative weighted sums: if f, g are convex functions


on Rn , then their linear combination λf + µg with nonnegative coefficients
λ, µ ∈ R+ is also convex.
[This can be verified straightforwardly using the convex function definition.]
• Stability under affine substitutions of the argument: given a convex function f
on Rn and an affine mapping x 7→ Ax + b from Rm into Rn , the superposition
f (Ax + b) is convex.
[This can be proved directly by verifying the convex function definition or
by noting that the epigraph of the superposition is the inverse image of the
epigraph of f under an affine mapping.]
• Stability under taking pointwise supremum: given any nonempty (and possibly
infinite!) family of convex functions {f_α(·)}_{α∈A} on R^n, their supremum
sup_{α∈A} f_α(·) is convex.
[Note that the epigraph of the supremum is clearly the intersection of the
epigraphs of the functions from the family. Then, recall that the intersection
of every family of convex sets is convex.]
• “Convex Monotone superposition”: Let f (x) := [f1 (x), . . . , fK (x)] be a vector-
map with convex component functions fi : Rn → R ∪ {+∞} , and let F
be a convex function on RK . Suppose F is monotone nondecreasing, i.e., for
any z, z ′ ∈ RK satisfying z ≤ z ′ we always have F (z) ≤ F (z ′ ). Then, the
superposition function given by
ϕ(x) := F (f (x)) = F (f1 (x), . . . , fK (x)) : Rn → R ∪ {+∞}
is convex.
Here, note that the expression F (f1 (x), . . . , fK (x)) makes no evident sense at a
point x where some of the f_i's take the value of +∞. At such a point, by definition,
we assign the value of +∞ to the superposition function.
Let us now justify the convex monotone composition rule. Consider any
x, x′ ∈ Dom ϕ. Then, z := f (x) and z ′ := f (x′ ) are vectors from RK which
belong to Dom F . Due to the convexity of the components of f , for any λ ∈
(0, 1) we have the vector inequality
f (λx + (1 − λ)x′ ) ≤ λz + (1 − λ)z ′ .
In particular, the left hand side in this inequality is a vector from RK , i.e.,
it has no “infinite entries,” and we may further use the monotonicity of F to
arrive at
ϕ(λx + (1 − λ)x′ ) = F (f (λx + (1 − λ)x′ )) ≤ F (λz + (1 − λ)z ′ ).
Moreover, using the convexity of F we deduce
F (λz + (1 − λ)z ′ ) ≤ λF (z) + (1 − λ)F (z ′ ).
Then, combining these two inequalities and noting that F (z) = F (f (x)) = ϕ(x)
and F (z ′ ) = F (f (x′ )) = ϕ(x′ ), we arrive at the desired convexity relation
ϕ(λx + (1 − λ)x′ ) ≤ λϕ(x) + (1 − λ)ϕ(x′ ).

Imagine how many extra words would be necessary here if there were no con-
vention on the value of a convex function outside its domain!
In the Convex Monotone superposition rule, the monotone nondecreasing property
of F is crucial. (Look what happens when n = K = 1, f_1(x) = x², F(z) = −z.)
This rule, however, admits the following two useful variants where the mono-
tonicity requirement is somehow relaxed (the justifications of these variants are
left to the reader):

• “Convex Affine superposition”: Let F (z) be a convex function on RK , and


let the functions fi (x), i ≤ K, be convex functions on Rn . Suppose that
for some k ≤ K the functions f1 , . . . , fk are affine, and the function F (z)
is nondecreasing in the entries zs of z with indices s > k. Then, the function
F (f1 (x), . . . , fK (x)) is convex.
• Let F (z) be a convex function on RK , and let the functions fi (x), i ≤ K, be
convex functions on Rn . Define f (x) := [f1 (x); . . . ; fK (x)]. Let Y be a convex
set in RK such that f (x) ∈ Y whenever all entries in f (x) are finite. Suppose
that for some k ≤ K the functions f1 , . . . , fk are affine. Assume, next, that F (z)
is nondecreasing in only the entries zs , s > k, of z on Y , i.e., F (z ′ ) ≥ F (z)
whenever z ′ , z are such that z ′ , z ∈ Y and zs′ = zs for all s ≤ k and zs′ ≥ zs for
all s > k. Then, F (f (x)) is convex on Rn .
Example: Let f_i(x) be convex, i ≤ K, and F(z) := ∥z∥_1. Note that in general
the function F(f(x)) is not necessarily convex (look what happens when n =
K = 1 and f_1(x) = x² − 1). However, F(f(x)) is convex provided that the f_i(x)
are not just convex, but are nonnegative as well (set Y = R^K_+). More generally,
the functions ∥f(x)∥_p, 1 ≤ p ≤ ∞, are convex provided that each component
f_i : R^n → R ∪ {+∞}, i ≤ K, of the vector function f(x) = [f_1(x); . . . ; f_K(x)]
is convex and nonnegative.

We close this section with two more convexity preserving operations:

• Stability under partial minimization: if f(x, y) : R^n_x × R^m_y → R ∪ {+∞} is
convex (as a function of z = [x; y]; this is called joint convexity) and the
function

g(x) := inf_y f(x, y)

is greater than −∞ everywhere, then g is convex.


The justification of this is as follows. First, the only values convex functions
can take are real numbers and +∞, and in the case of g this is assumed. Now,
consider any x, x′ and any λ ∈ [0, 1]. Define x″ := λx + (1 − λ)x′. We
need to show that g(x″) ≤ λg(x) + (1 − λ)g(x′). There is clearly nothing to
prove when λ = 0 or λ = 1, same as when 0 < λ < 1 and either x, or x′, or
both do not belong to Dom g. Thus, we assume that x, x′ ∈ Dom g. For any
positive ϵ, we can find yϵ and yϵ′ such that [x; yϵ ] ∈ Dom f , [x′ ; yϵ′ ] ∈ Dom f and
g(x) + ϵ ≥ f (x, yϵ ), g(x′ ) + ϵ ≥ f (x′ , yϵ′ ). Taking weighted sum of these two

inequalities, we get
λg(x) + (1 − λ)g(x′ ) + ϵ ≥ λf (x, yϵ ) + (1 − λ)f (x′ , yϵ′ )
≥ f (λx + (1 − λ)x′ , λyϵ + (1 − λ)yϵ′ )
= f (x′′ , λyϵ + (1 − λ)yϵ′ ),
where the last inequality follows from the convexity of f . By definition of g(x′′ )
we have f (x′′ , λyϵ + (1 − λ)yϵ′ ) ≥ g(x′′ ), and thus we get λg(x) + (1 − λ)g(x′ ) +
ϵ ≥ g(x′′ ). In particular, x′′ ∈ Dom g (recall that x, x′ ∈ Dom(g) and thus
g(x), g(x′ ) ∈ R). Moreover, since the resulting inequality is valid for all ϵ > 0,
we come to g(x′′ ) ≤ λg(x) + (1 − λ)g(x′ ), as required.
• Perspective transform of a convex function: Given a convex function f on Rn ,
we define the function g(x, y) := yf (x/y) with the domain {[x; y] ∈ Rn+1 : y >
0, x/y ∈ Dom f } to be its perspective function. The perspective function of a
convex function is convex.
Let us first examine a direct justification of this. Consider any [x′ ; y ′ ] and
[x′′ ; y ′′ ] from Dom g and any λ ∈ [0, 1]. Define x := λx′ + (1 − λ)x′′ , y :=
λy ′ + (1 − λ)y ′′ . Then, y > 0. We also define λ′ := λy ′ /y and λ′′ := (1 − λ)y ′′ /y,
so that λ′ , λ′′ ≥ 0 and λ′ + λ′′ = 1. As f is convex, we deduce x/y = λx′ /y +
(1 − λ)x′′ /y = λ′ x′ /y ′ + λ′′ x′′ /y ′′ = λ′ x′ /y ′ + (1 − λ′ )x′′ /y ′′ ∈ Dom f and
f (x/y) ≤ λ′ f (x′ /y ′ ) + (1 − λ′ )f (x′′ /y ′′ ). Thus, as y > 0, we arrive at yf (x/y) ≤
yλ′ f (x′ /y ′ ) + y(1 − λ′ )f (x′′ /y ′′ ) = λ[y ′ f (x′ /y ′ )] + (1 − λ)[y ′′ f (x′′ /y ′′ )], that is,
g(x, y) ≤ λg(x′ , y ′ ) + (1 − λ)g(x′′ , y ′′ ).
Here is an alternative smarter justification. There is nothing to prove when
Dom f = ∅. So, suppose that Dom f ̸= ∅. Consider the epigraph epi(f ) =
{[x; s] : s ≥ f (x)} and the perspective transform of this nonempty convex set
which is given by (see section 1.5)
Persp(epi{f}) := {[[x; s]; t] ∈ Rn+2 : t > 0, [x/t; s/t] ∈ epi{f}}
= {[x; s; t] ∈ Rn+2 : t > 0, s/t ≥ f(x/t)}
= {[x; s; t] ∈ Rn+2 : t > 0, s ≥ tf(x/t)}
= {[x; s; t] ∈ Rn+2 : t > 0, x/t ∈ Dom f, s ≥ tf(x/t)}
= {[x; s; t] ∈ Rn+2 : t > 0, [x; t] ∈ Dom g, s ≥ g(x, t)}
= {[x; s; t] ∈ Rn+2 : [x; t; s] ∈ epi{g}},
where the second from last equality follows from the fact that by definition of
g(x, t), whenever t > 0, the inclusion x/t ∈ Dom f takes place if and only if
[x; t] ∈ Dom g. Thus, we observe that Persp(epi{f }) is nothing but the image
of epi{g} under the one-to-one linear transformation [x; t; s] 7→ [x; s; t]. As
Persp(epi{f }) is a convex set (recall from section 1.5 that the perspective
transform of a nonempty convex set is convex), we conclude that g is convex.
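As a computational aside (not part of the original development; the functions f0(x, y) = x² + y² + xy and f(x) = exp(x) below are our own illustrative choices), the following Python/NumPy sketch spot-checks both rules: it approximates the partial minimum g(x) = inf_y f0(x, y) on a fine grid, evaluates the perspective g(x, y) = y f(x/y), and verifies the midpoint-convexity inequality for each on random data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Partial minimization: f0(x, y) = x**2 + y**2 + x*y is jointly convex;
# its partial minimum over y is g(x) = min_y f0(x, y) (= 0.75*x**2 analytically).
ys = np.linspace(-10.0, 10.0, 20001)
def g_partial(x):
    return np.min(x**2 + ys**2 + x * ys)

# Perspective of f(x) = exp(x): g(x, y) = y * exp(x / y), defined for y > 0.
def g_persp(x, y):
    return y * np.exp(x / y)

for _ in range(1000):
    # midpoint convexity of the partial minimum (tolerance covers the grid error)
    a, b = rng.uniform(-5.0, 5.0, size=2)
    assert g_partial((a + b) / 2) <= (g_partial(a) + g_partial(b)) / 2 + 1e-6
    # midpoint convexity of the perspective function (y kept positive)
    x1, x2 = rng.uniform(-3.0, 3.0, size=2)
    y1, y2 = rng.uniform(0.1, 3.0, size=2)
    lhs = g_persp((x1 + x2) / 2, (y1 + y2) / 2)
    rhs = (g_persp(x1, y1) + g_persp(x2, y2)) / 2
    assert lhs <= rhs + 1e-9 * (1.0 + abs(rhs))
print("midpoint-convexity checks passed for both constructions")
```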
Now that we know what the basic operations preserving convexity of a function
are, let us look at the standard convex functions these operations can be applied
to. We have already seen several examples in Example III.12.2; but we still do
not know why these functions are convex. The usual way to check convexity of a
“simple” (i.e., given by a simple formula) function is based on differential criteria


of convexity, which we will examine in the next section.

13.2 Criteria of convexity


The definition of convexity of a function immediately reveals that convexity is a
one-dimensional property: a function f on Rn taking values in R∪{+∞} is convex
if and only if its restriction on every line, i.e., every function g : R → R ∪ {+∞}
of the type g(t) := f (x + th) with x, h ∈ Rn is convex.

Figure III.1. Univariate convex function f : [x, y] → R. The average rate of change of f
on the entire segment [x, y] is in-between the average rates of change “at the beginning,”
i.e., when passing from x to z and “at the end,” i.e., when passing from z to y.

We are about to show that any univariate function f : R → R ∪ {+∞} is convex
if and only if for every three real numbers x < z < y such that x, y ∈ Dom f we
have z ∈ Dom f and the average rate (f(y) − f(x))/(y − x) at which f varies when
moving from x to y is in-between the average rate (f(z) − f(x))/(z − x) at which
f changes "at the beginning," i.e., when moving from x to z, and the average rate
(f(y) − f(z))/(y − z) at which f changes "at the end," i.e., when moving from z
to y, see Figure III.1:
(f(z) − f(x))/(z − x) ≤ (f(y) − f(x))/(y − x) ≤ (f(y) − f(z))/(y − z),
so that
(f(z) − f(x))/(z − x) ≤ (f(y) − f(x))/(y − x),
(f(y) − f(x))/(y − x) ≤ (f(y) − f(z))/(y − z),    (13.1)
(f(z) − f(x))/(z − x) ≤ (f(y) − f(z))/(y − z).
As is immediately seen, every one of the three inequalities in (13.1) implies the
other two.
Here is the justification of the above characterization of convexity of a uni-
variate function f . Note that this convexity is nothing but the requirement
that for any real numbers x, y ∈ Dom f with x < y and every λ ∈ (0, 1), for
zλ := (1 − λ)x + λy it holds f (zλ ) ≤ (1 − λ)f (x) + λf (y), or, which is the same,
f (zλ ) − f (x) ≤ λ(f (y) − f (x)). (13.2)
When λ ∈ (0, 1), the pair (1, λ) is a positive multiple of the pair (y − x, zλ − x), thus
(13.2) is equivalent to (y − x)(f(zλ) − f(x)) ≤ (zλ − x)(f(y) − f(x)). Note that this
inequality is the same as (f(zλ) − f(x))/(zλ − x) ≤ (f(y) − f(x))/(y − x). When λ runs through the interval
(0, 1) the point zλ runs through the entire set {z : x < z < y}, and so we conclude
that f is convex if and only if for every triple x < z < y with x, y ∈ Dom f the first
inequality in (13.1) holds true. As every one of inequalities in (13.1) implies the
other two, this justifies our “average rate of change” characterization of univariate
convexity.
In the case of multivariate convex functions, we have the following immediate
consequence of the preceding observations.

Lemma III.13.1 Let x, x′ , x′′ be three distinct points in Rn with x′ ∈ [x, x′′ ].
Then, for any convex function f that is finite on [x, x′′ ], we have
(f(x′) − f(x))/∥x′ − x∥2 ≤ (f(x′′) − f(x))/∥x′′ − x∥2. (13.3)

Proof. Under the premise of the lemma, define ϕ(t) := f (x + t(x′′ − x)) and let
λ ∈ R be such that x′ = x + λ(x′′ − x). Note that λ ∈ (0, 1) as x′ ∈ [x, x′′ ] and
the points x, x′ , x′′ are all distinct from each other. As it was explained at the
beginning of this section, the univariate function ϕ is convex along with f , and
0, 1, λ ∈ Dom ϕ. Applying the first inequality in (13.1) to ϕ in the role of f and

the triple (0, λ, 1) in the role of the triple (x, z, y), we get (f(x′) − f(x))/λ ≤ f(x′′) − f(x),
which, due to λ = ∥x′ − x∥2 /∥x′′ − x∥2 , is nothing but (13.3).
To sum up, to detect convexity of a function, in principle, it suffices to know
how to detect convexity of functions of a single variable. Moreover, this latter
question can be resolved by the standard Calculus tools.

13.2.1 Differential criteria of convexity


We have the following simple and complete characterization of convexity of smooth
univariate functions from Calculus.

Proposition III.13.2 [Necessary and sufficient condition for convexity of


smooth univariate functions] Let (a, b) be an interval on the axis (where the
cases of a = −∞ and/or b = +∞ are also possible). Then,
(i) A function f that is differentiable everywhere on (a, b) is convex on
(a, b) if and only if its derivative f ′ is monotonically nondecreasing on (a, b);
(ii) A function f that is twice differentiable everywhere on (a, b) is convex
on (a, b) if and only if its second derivative f ′′ is nonnegative everywhere on
(a, b).

Proof.
(i): We start by proving the necessity of the stated condition. Suppose that f
is differentiable and convex on (a, b). We will prove that then f ′ is monotonically
nondecreasing. Let x < y be two points from the interval (a, b), and let us prove
that f ′ (x) ≤ f ′ (y). Consider any z ∈ (x, y). Invoking convexity of f and applying
(13.1), we have
(f(z) − f(x))/(z − x) ≤ (f(y) − f(z))/(y − z).
Passing to limit as z → x + 0, we get
f′(x) ≤ (f(y) − f(x))/(y − x),
and passing to limit in the same inequality as z → y − 0, we arrive at
(f(y) − f(x))/(y − x) ≤ f′(y),
and so f ′ (x) ≤ f ′ (y), as claimed.
Let us now prove the sufficiency of the condition in (i). Thus, we assume that
f ′ exists and is nondecreasing on (a, b), and we will verify that f is convex on
(a, b). By “average rate of change” description of the convexity of a univariate
function, all we need is to verify that if x < z < y and x, y ∈ (a, b), then
(f(z) − f(x))/(z − x) ≤ (f(y) − f(z))/(y − z).
This is indeed evident: by the Lagrange Mean Value Theorem, the left hand side
ratio is f ′ (u) for some u ∈ (x, z), and the right hand side one is f ′ (v) for some
v ∈ (z, y). Since v > u and f ′ is nondecreasing on (a, b), we conclude that the
left hand side ratio is indeed less than or equal to the right hand side one.
(ii): This part is an immediate consequence of (i) as we know from Calculus
that a differentiable function — in our case now this is the function f ′ — is mono-
tonically nondecreasing on an interval if and only if its derivative is nonnegative
on this interval.
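As a quick illustration of item (ii) (our own example, not one from the text), f(t) = t ln t is convex on (0, ∞) because f′′(t) = 1/t > 0 there; the finite-difference sketch below is consistent with both parts of the proposition.

```python
import numpy as np

# Illustration (not from the text): f(t) = t*log(t) on (0, +inf).
# By Proposition III.13.2, convexity is equivalent to f' nondecreasing,
# and to f''(t) = 1/t being nonnegative on the interval.
t = np.linspace(0.05, 10.0, 2000)
f = t * np.log(t)

fp = np.gradient(f, t)     # numerical f', approximately log(t) + 1
fpp = np.gradient(fp, t)   # numerical f'', approximately 1/t

assert np.all(np.diff(fp) >= 0.0)   # f' is nondecreasing on the grid
assert np.all(fpp >= -1e-8)         # f'' nonnegative up to discretization error
print("t*log(t): derivative checks consistent with convexity on (0, 10]")
```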
Proposition III.13.2 immediately allows us to verify the convexity of functions
listed in Example III.12.2. To this end, the only difficulty which we may encounter
is that some of these functions (e.g., xp with p ≥ 1, and −xp with 0 ≤ p ≤ 1)
are claimed to be convex on the half-interval [0, +∞), while Proposition III.13.2
talks about convexity of functions on open intervals. This difficulty can be ad-
dressed with the following simple result which allows us to extend the convexity
of continuous functions beyond open sets.

Proposition III.13.3 Let M be a convex set and let f be a function with


Dom f = M . Suppose that f is convex on rint M and is continuous on M ,
i.e.,
f (xi ) → f (x), as i → ∞,
whenever xi , x ∈ M and xi → x as i → ∞. Then, f is convex on M .
Proof. Consider any x, y ∈ M and any λ ∈ [0, 1]. Define z := λx + (1 − λ)y. We


need to prove that
f (z) ≤ λf (x) + (1 − λ)f (y).
As x, y ∈ M and M is convex, by Theorem I.1.29(iii), there exist sequences
{xi }i≥1 ∈ rint M and {yi }i≥1 ∈ rint M converging to x and to y, respectively.
Then, the sequence zi := λxi + (1 − λ)yi is in rint M and it converges to z as
i → ∞. Since f is convex on rint M , for all i ≥ 1 we have
f (zi ) ≤ λf (xi ) + (1 − λ)f (yi ).
By taking the limits of both sides of this inequality, noting that f is continuous
on M , and as i → ∞ the sequences xi , yi , zi converge to x, y, z ∈ M , respectively,
we obtain the desired inequality.
Example III.13.1 We are now able to justify the claim, made as early as in
section 1.1.2, that the functions ∥ · ∥p , 1 ≤ p ≤ ∞, are norms.
So far, we have verified it only for p = 1, 2, ∞ in Remark I.1.7. Now consider
the case of p ∈ (1, ∞). In order to prove that ∥ · ∥p is indeed a norm, using
Fact I.1.8, all we need is to show that the set V := {x ∈ Rn : ∥x∥p ≤ 1} is
closed, bounded, symmetric with respect to origin, contains a neighborhood of
the origin and is convex. All these except the convexity are evident. So, we will
prove that V is indeed convex. Define the function f(x) := ∥x∥_p^p = ∑_{i=1}^n |xi|^p.
Then, V = {x ∈ Rn : f (x) ≤ 1}. By Proposition III.13.2, the univariate function
|x|p is convex (recall that p ∈ (1, ∞)). Moreover, by “calculus of convexity”
(section 13.1), f is convex (as it is the sum of convex functions). Thus, V is the
sublevel set of a convex function, and by Proposition III.12.6, it is convex. ♢
In the preceding illustration of convexity of norms we have started to examine
multivariate functions. Let us continue our discussion of multivariate functions
by examining a differential criterion for their convexity. Propositions III.13.2(ii)
and III.13.3 give us the following convenient necessary and sufficient condition
for convexity of smooth multivariate functions.

Corollary III.13.4 [Convexity criterion for smooth multivariate functions]


Let f : Rn → R ∪ {+∞} be a function where its domain Q := Dom f is a
convex set with a nonempty interior. Suppose that f is
• continuous on Q, and
• twice differentiable on int Q.
Then, f is convex on Q if and only if for all x ∈ int Q its Hessian matrix
f ′′ (x) is positive semidefinite, i.e.,
h⊤ f ′′ (x)h ≥ 0, ∀h ∈ Rn .
That is, f is convex on Q if and only if the second order directional derivative
of f taken at any point x ∈ int Q along any direction h ∈ Rn is nonnegative,
i.e., for all x ∈ int Q and h ∈ Rn we have
h⊤ f′′(x)h = d²/dt²|_{t=0} f(x + th) ≥ 0.

Proof. The “only if” part is evident: if f is convex on Q and x ∈ int Q, then for
any fixed direction h ∈ Rn the function g : R → R ∪ {+∞} defined as
g(t) := f (x + th)
is convex in a certain neighborhood of the point t = 0 on the axis (recall that affine
substitutions of argument preserves convexity). Since f is twice differentiable in
a neighborhood of x, the function g is twice differentiable in a neighborhood of
t = 0, as well. Thus, by Proposition III.13.2, we have 0 ≤ g ′′ (0) = h⊤ f ′′ (x)h.
In order to prove the “if” part we need to show that every function f : Q →
R∪{+∞} that is continuous on Q and that satisfies h⊤ f ′′ (x)h ≥ 0 for all x ∈ int Q
and all h ∈ Rn is convex on Q.
Let us first prove that f is convex on int Q. By Theorem I.1.29, int Q is a
convex set. Since the convexity of a function on a convex set is a one-dimensional
property, all we need to prove is that for any x, y ∈ int Q the univariate function
g : [0, 1] → R ∪ {+∞} given by
g(t) := f (x + t(y − x))
is convex on the segment [0, 1]. As f is twice differentiable on int Q, g is continuous
and twice differentiable on the segment [0, 1] and its second derivative is given by
g ′′ (t) = (y − x)⊤ f ′′ (x + t(y − x))(y − x) ≥ 0,
where the inequality follows from the premise on f . Then, by Propositions III.13.2(ii)
and III.13.3, g is convex on [0, 1]. Thus, f is convex on int Q. As f is convex on
int Q and is continuous on Q, by Proposition III.13.3 we conclude that f is convex
on Q.
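A hedged numerical companion to this criterion (the function f(x) = √(1 + ∥x∥²) and the finite-difference Hessian below are our own choices, not the text's): at a few random points, the smallest eigenvalue of an approximate Hessian of this convex function should be nonnegative up to discretization error.

```python
import numpy as np

# Illustration of Corollary III.13.4 with f(x) = sqrt(1 + ||x||^2), convex on R^n.
def f(x):
    return np.sqrt(1.0 + np.dot(x, x))

def numerical_hessian(fun, x, h=1e-4):
    n = x.size
    H = np.empty((n, n))
    I = np.eye(n)
    for i in range(n):
        for j in range(n):
            # central-difference approximation of the (i, j) second partial derivative
            H[i, j] = (fun(x + h*I[i] + h*I[j]) - fun(x + h*I[i] - h*I[j])
                       - fun(x - h*I[i] + h*I[j]) + fun(x - h*I[i] - h*I[j])) / (4.0 * h * h)
    return H

rng = np.random.default_rng(1)
for _ in range(5):
    x = rng.normal(size=4)
    H = numerical_hessian(f, x)
    lam_min = np.linalg.eigvalsh((H + H.T) / 2.0).min()
    assert lam_min > -1e-6          # positive semidefinite up to rounding
print("approximate Hessians are positive semidefinite at all sampled points")
```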

Corollary III.13.5 [Sufficient condition for strict convexity of smooth func-


tions] Consider the setting of Corollary III.13.4. Suppose, in addition, that
Q := Dom f is open and the Hessian of f is positive definite on Q, i.e.,
d²/dt²|_{t=0} f(x + th) > 0, ∀x ∈ Q and ∀h ≠ 0.
Then, f is strictly convex.

Proof. Consider any x, y ∈ Q with x ̸= y and any λ ∈ (0, 1). We need to show
that f (λx + (1 − λ)y) < λf (x) + (1 − λ)f (y). Consider the function ϕ : [0, 1] → R
given by ϕ(t) := f (tx+(1−t)y). Then, as f is twice differentiable on Q, ϕ is twice
differentiable on [0, 1]. Moreover, based on the premise on f , we have ϕ′′ (t) > 0
for all t ∈ [0, 1]. Note that our target inequality is simply the relation ϕ(λ) <
λϕ(1) + (1 − λ)ϕ(0). Since 0 < λ < 1, we can rewrite this target inequality as
(ϕ(λ) − ϕ(0))/λ < (ϕ(1) − ϕ(λ))/(1 − λ). Finally, by the Mean Value Theorem and the
strict monotonicity of ϕ′ we conclude that the desired target inequality holds.
We conclude this section by highlighting that convexity of many "complicated"
functions can be proved easily by applying a combination of "calculus of
convexity" rules to simple functions which pass the "infinitesimal" convexity tests.
Example III.13.2 Consider the following exponential posynomial function f :
Rn → R, given by
f(x) = ∑_{i=1}^N ci exp(ai⊤ x),

where the coefficients ci are positive (this is why the function is called posynomial).
This function is in fact convex on Rn . How can we prove this?
An immediate proof is as follows:
1. The function exp(t) is convex on R as its second order derivative is positive as
required by the infinitesimal convexity test for smooth univariate functions.
2. Thus, by stability of convexity under affine substitutions of argument, we
deduce that all functions exp(ai⊤ x) are convex on Rn.
3. Finally, by stability of convexity under taking linear combinations with non-
negative coefficients, we conclude that f is convex on Rn .
And if we were supposed to prove that the maximum of three exponential
posynomials is convex? Then, all we need is to add to our three steps above
the fourth one, which refers to the stability of convexity under taking pointwise
supremum. ♢
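The same three-step argument is easy to spot-check numerically. In the sketch below (randomly generated coefficients and exponents, purely our own illustration), the convexity inequality is verified for an exponential posynomial at random pairs of points.

```python
import numpy as np

rng = np.random.default_rng(2)
n, N = 3, 5
c = rng.uniform(0.5, 2.0, size=N)   # positive coefficients ("posynomial")
A = rng.normal(size=(N, n))         # rows play the role of the vectors a_i

def f(x):
    return float(c @ np.exp(A @ x))

for _ in range(1000):
    x, y = rng.normal(size=(2, n))
    lam = rng.uniform()
    assert f(lam*x + (1 - lam)*y) <= lam*f(x) + (1 - lam)*f(y) + 1e-9 * (1 + f(x) + f(y))
print("convexity inequality holds for the sampled exponential posynomial")
```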

13.3 Important multivariate convex functions


Let us start with some simple yet important multivariate convex functions that
can be detected solely based on “calculus of convexity” presented in section 13.1.
Example III.13.3 Let k, n be two positive integers such that k ≤ n. Consider
the function sk : Rn → R given by
sk(x) := ∑_{i=1}^k x[i],

where x[i] denotes the i-th largest entry in the vector x. That is, for every vector
x ∈ Rn , we have x[1] ≥ x[2] ≥ . . . ≥ x[n] . By definition, sk (x) is simply the sum of
k largest elements in x. We claim that sk(x) is a convex function of x. Given any
index set I, the function ℓI(x) := ∑_{i∈I} xi is a linear function of x and thus it is
convex. Now, sk (x) is clearly the maximum of the linear functions ℓI (x) over all
index sets I with exactly k elements from {1, . . . , n}, and as such is convex.
Note also that sk (x) is a permutation symmetric function of x, that is, the value
of the function sk (x) remains the same when permuting entries in its argument
x. Taking together convexity and permutation symmetry of sk (x) will be very
useful in our developments for functions of eigenvalues of symmetric matrices in


chapter 17. ♢
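Here is a small computational rendering of the "maximum of linear functions" viewpoint above (our own illustration): s_k(x) computed by sorting agrees with the maximum of ℓ_I(x) over all k-element index sets I, and midpoint convexity can be spot-checked on random data.

```python
import numpy as np
from itertools import combinations

def s_k(x, k):
    return float(np.sort(x)[-k:].sum())              # sum of the k largest entries

def s_k_as_max(x, k):
    # maximum of the linear functions l_I(x) = sum_{i in I} x_i over all |I| = k
    return max(sum(x[i] for i in I) for I in combinations(range(len(x)), k))

rng = np.random.default_rng(3)
n, k = 6, 3
for _ in range(200):
    x, y = rng.normal(size=(2, n))
    assert abs(s_k(x, k) - s_k_as_max(x, k)) < 1e-12
    lam = rng.uniform()
    assert s_k(lam*x + (1-lam)*y, k) <= lam*s_k(x, k) + (1-lam)*s_k(y, k) + 1e-12
print("s_k equals the max of its linear pieces and passes the convexity spot-check")
```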
Remark III.13.6 An immediate implication of Example III.13.3 is that given
a vector x ∈ Rn , the function maxi {xi }, i.e., the value of its largest element, is
convex in x. Thus, the function corresponding to the value of minimum element
in x, i.e., mini {xi } = − maxi {−xi } is concave in x. That said, for 1 < k < n, the
“intermediate element” function, i.e., the function given by x[k] , which stands for
the k-th largest element in x, is neither a convex nor a concave function of x. ■
While “calculus of convexity” presented in section 13.1 is sufficient and quite
practical in proving convexity of many functions, there are still several multi-
variate functions for which convexity seemingly cannot be extracted from this
calculus and should be verified via Corollary III.13.4. Here are some important
examples.
Example III.13.4 The function f : Rn → R given by

f(x) := ln(∑_{i=1}^n exp(xi))

is convex.
Let us first verify the convexity of this function via direct computation using
Corollary III.13.4. To this end, we define pi := exp(xi)/∑_j exp(xj). Then, the second-order
directional derivative of f along the direction h ∈ Rn is given by
ω := d²/dt²|_{t=0} f(x + th) = ∑_i pi hi² − (∑_i pi hi)².

Observing that pi > 0 and ∑_i pi = 1, we see that ω is the variance (the ex-
pectation of the square minus the squared expectation) of a discrete random variable
taking values hi with probabilities pi, and it is well known that the variance of
any random variable is always nonnegative. Here is a direct verification of this
fact:
(∑_i pi hi)² = (∑_i √pi (√pi hi))² ≤ (∑_i pi)(∑_i pi hi²) = ∑_i pi hi²,

where the inequality follows from the Cauchy-Schwarz inequality and the last equality
holds since ∑_i pi = 1.
In fact, in this example we can prove convexity of f via calculus of convexity
as well. Recall that ln(z) = min_{s∈R} {z exp(s) − s − 1}. Thus, ln(∑_i exp(xi)) =
min_{s∈R} {(∑_i exp(xi + s)) − s − 1}. Then, by stability of convexity under partial
minimization we conclude that f is convex.
An even simpler proof is as follows:


epi{ln(∑_i exp(xi))} = {[x; t] : t ≥ ln(∑_i exp(xi))}
= {[x; t] : exp(t) ≥ ∑_i exp(xi)}
= {[x; t] : ∑_i exp(xi − t) ≤ 1},

and the concluding set is convex (as a sublevel set of a convex function). ♢
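The variance computation above is easy to reproduce numerically (a sketch with randomly chosen points and directions of our own): a central second difference of f(x) = ln(∑_i exp(x_i)) along h should match ∑_i p_i h_i² − (∑_i p_i h_i)², and this quantity should be nonnegative.

```python
import numpy as np

def f(x):
    return np.log(np.sum(np.exp(x)))

rng = np.random.default_rng(4)
for _ in range(100):
    x = rng.normal(size=5)
    h = rng.normal(size=5)
    p = np.exp(x) / np.exp(x).sum()
    omega = p @ h**2 - (p @ h)**2                      # variance formula from the text
    t = 1e-4
    fd = (f(x + t*h) - 2.0*f(x) + f(x - t*h)) / t**2   # central second difference
    assert omega >= -1e-12                             # nonnegative (it is a variance)
    assert abs(fd - omega) <= 1e-4 * (1.0 + abs(omega))
print("second-order directional derivative matches the variance formula")
```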
Example III.13.5 The function f : Rn+ → R given by
f(x) = −∏_{i=1}^n xi^{αi},
where αi > 0 for all 1 ≤ i ≤ n and satisfy ∑_i αi ≤ 1, is convex.
To prove convexity of f via Corollary III.13.4 all we need is to verify that for
any x ∈ Rn satisfying x > 0 and for any h ∈ Rn, we have d²/dt²|_{t=0} f(x + th) ≥ 0.
Let ηi := hi/xi; then direct computation shows that
d²/dt²|_{t=0} f(x + th) = [(∑_i αi ηi)² − ∑_i αi ηi²] f(x),

and as we have seen in Example III.13.4, the premise on α implies that


(∑_i αi ηi)² = (∑_i √αi (√αi ηi))² ≤ (∑_i αi)(∑_i αi ηi²) ≤ ∑_i αi ηi²,
where the first inequality follows from the Cauchy-Schwarz inequality and the last
inequality holds since ∑_i αi ≤ 1. Since the bracketed factor above is therefore nonpositive
and f(x) ≤ 0 whenever x > 0, the second-order directional derivative is nonnegative, which completes
the proof. ♢
Example III.13.5 admits the following immediate extension:
Example III.13.6 The function f : int Rn+ → R given by
f(x) = ∏_{i=1}^n xi^{−αi},

where αi > 0 for all 1 ≤ i ≤ n, is convex.


Here “calculus of convexity” already works: the function
ln(f(x)) = −∑_{i=1}^n αi ln(xi)

is convex on int Rn+ as the function g(y) = −ln y is convex on the positive ray. It
remains to note that taking exponent preserves convexity by Convex Monotone
superposition rule. ♢
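Both examples can be spot-checked numerically as well. The sketch below (exponents drawn at random so that ∑_i αi ≤ 1, an illustration of our own) verifies the convexity inequality for f(x) = −∏_i x_i^{α_i} and for f(x) = ∏_i x_i^{−α_i} at random positive points.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 4
alpha = rng.uniform(0.1, 0.2, size=n)    # alpha_i > 0 with sum(alpha) <= 1 here

def f_neg_prod(x):    # Example III.13.5: -prod_i x_i**alpha_i, convex on x > 0
    return -np.prod(x ** alpha)

def f_inv_prod(x):    # Example III.13.6: prod_i x_i**(-alpha_i), convex on x > 0
    return np.prod(x ** (-alpha))

for _ in range(1000):
    x, y = rng.uniform(0.1, 5.0, size=(2, n))
    lam = rng.uniform()
    z = lam*x + (1 - lam)*y
    for g in (f_neg_prod, f_inv_prod):
        assert g(z) <= lam*g(x) + (1 - lam)*g(y) + 1e-9
print("convexity inequality holds for both product-type examples on the samples")
```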
13.4 Gradient inequality


We next present an extremely important property of convex functions:

Proposition III.13.7 [Gradient inequality] Let f : Rn → R ∪ {+∞}, x ∈


int(Dom f ), and let Q be a convex set containing x. Suppose that
• f is convex on Q, and
• f is differentiable at x.
Let ∇f (x) be the gradient of the function f at x. Then, the following in-
equality holds:
f (y) ≥ f (x) + (y − x)⊤ ∇f (x), ∀y ∈ Q. (13.4)
Geometrically, this relation states that the graph of the function f restricted
onto the set Q, i.e.,
{(y, t) ∈ Rn+1 : y ∈ Dom f ∩ Q, t = f(y)}
is above the graph
{(y, t) ∈ Rn+1 : t = f(x) + (y − x)⊤ ∇f(x)}
of the linear form tangent to f at x.

Proof. Let y ∈ Q. There is nothing to prove if y ̸∈ Dom f (since then the left
hand side of the gradient inequality is +∞). Similarly, there is nothing to prove
when y = x. Thus, we can assume that y ̸= x and y ∈ Dom f . Let us set
yτ := x + τ (y − x), where 0 < τ ≤ 1,
so that y0 = x, y1 = y and yτ is an interior point of the segment [x, y] for
0 < τ < 1. Applying Lemma III.13.1 to the triple (x, x′ , x′′ ) taken as (x, yτ , y),
we get
(f(x + τ(y − x)) − f(x))/(τ∥y − x∥2) ≤ (f(y) − f(x))/∥y − x∥2;
as τ → +0, the left hand side in this inequality, by the definition of the gradient,
tends to (y − x)⊤ ∇f(x)/∥y − x∥2, and so we get
(y − x)⊤ ∇f(x)/∥y − x∥2 ≤ (f(y) − f(x))/∥y − x∥2,
and as ∥y − x∥2 > 0 this is equivalent to
(y − x)⊤ ∇f (x) ≤ f (y) − f (x).
Note that this inequality is exactly the same as (13.4).
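A short numerical illustration of (13.4), with a convex function of our own choosing: f(x) = ln(1 + e^{x1}) + x2² is convex and differentiable on R², and the sketch checks f(y) ≥ f(x) + (y − x)⊤∇f(x) on random pairs.

```python
import numpy as np

# Convex, differentiable test function (our own choice): f(x) = log(1 + exp(x1)) + x2**2.
def f(x):
    return np.log1p(np.exp(x[0])) + x[1]**2

def grad_f(x):
    return np.array([1.0 / (1.0 + np.exp(-x[0])), 2.0 * x[1]])

rng = np.random.default_rng(6)
for _ in range(1000):
    x, y = rng.normal(scale=2.0, size=(2, 2))
    assert f(y) >= f(x) + grad_f(x) @ (y - x) - 1e-9
print("gradient inequality (13.4) holds at all sampled pairs")
```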

Corollary III.13.8 Suppose Q is a convex set with a nonempty interior and


f is continuous on Q and differentiable on int Q. Then, f is convex on Q if
and only if the gradient inequality (13.4) is valid for every pair x ∈ int Q
and y ∈ Q.

Proof. Indeed, the “only if” part, i.e., the convexity of f on Q implying the
gradient inequality for all x ∈ int Q and all y ∈ Q, is given by Proposition III.13.7.
Let us prove the “if” part, i.e., establish the reverse implication. Suppose that
f satisfies the gradient inequality for all x ∈ int Q and all y ∈ Q, and let us
verify that f is convex on Q. As f is continuous on Q and Q is convex, by
Proposition III.13.3 it suffices to prove that f is convex on int Q. Recall also that
by Theorem I.1.29 int Q is convex. Moreover, due to the gradient inequality, on
int Q function f is the supremum of the family of affine (and therefore convex)
functions, i.e., for all y ∈ int Q we have
f(y) = sup_{x∈int Q} fx(y), where fx(y) := f(x) + (y − x)⊤ ∇f(x).

As affine functions are convex and by stability of convexity under taking pointwise
supremums, we conclude f is convex on int Q.

13.5 Boundedness and Lipschitz continuity of a convex function


In this section we will discuss some local properties of convex functions such as
boundedness and Lipschitz continuity. Recall that given a function f and a set
Q ⊆ Dom f , we say that f is Lipschitz continuous on Q if there exists a constant
L > 0 (referred to as the Lipschitz constant of f on Q) such that
|f (x) − f (y)| ≤ L∥x − y∥2 , ∀x, y ∈ Q.

Theorem III.13.9 [Local boundedness and Lipschitz continuity of convex


functions] Let f be a convex function and let K be a closed and bounded set
contained in rint (Dom f ). Then, f is Lipschitz continuous on K, i.e.,
|f (x) − f (y)| ≤ L∥x − y∥2 , ∀x, y ∈ K, (13.5)
for properly selected L < ∞. In particular, f is bounded on K.

We shall prove this Theorem later in this section, after some preliminary effort.
Remark III.13.10 In Theorem III.13.9, all three assumptions on K, (1) closed-
ness, (2) boundedness, and (3) K ⊆ rint (Dom f ), are essential. The following
three examples illustrate their importance:
• Suppose f (x) = 1/x, then Dom f = (0, +∞). Consider K = (0, 1]. We have
assumptions (2) and (3) satisfied, but not (1). Note that f is neither bounded
nor Lipschitz continuous on K.
• Suppose f (x) = x2 , then Dom f = R. Consider K = R. We have (1) and (3)
satisfied, but not (2). Note that f is neither bounded nor Lipschitz continuous
on K.
• Suppose f(x) = −√x, then Dom f = [0, +∞). Consider K = [0, 1]. We have
(1) and (2) satisfied, but not (3). Note that f is not Lipschitz continuous
on K (indeed, we have lim_{t→+0} (f(0) − f(t))/t = lim_{t→+0} t^{−1/2} = +∞, while for a Lipschitz
continuous f the ratios t^{−1}(f(0) − f(t)) should be bounded). On the other
continuous f the ratios t−1 (f (0) − f (t)) should be bounded). On the other
hand, f is bounded on K. With a properly chosen convex function f of two
variables and non-polyhedral compact domain (e.g., with Dom f being the unit
circle), we can demonstrate also that lack of (3), even in presence of (1) and
(2), may cause unboundedness of f on K as well.

Theorem III.13.9 says that a convex function f is bounded on every compact
(i.e., closed and bounded) subset of rint(Dom f ). In fact, in the case of convex
functions we can make a much stronger statement regarding boundedness from
below: any convex function f is bounded from below on any bounded subset of Rn !

Proposition III.13.11 A convex function f is bounded from below on every


bounded subset of Rn .
Proof. It is clear that f is bounded from below at any point outside of the
domain of f . Thus, without loss of generality we may assume that Dom f is
full-dimensional and that 0 ∈ int(Dom f ). By Theorem III.13.9, there exists a
neighborhood U of the origin – which can be thought of to be a centered at the
origin ball of some radius r > 0 – where f is bounded from above by some C.
Now, consider an arbitrary R > 0 and an arbitrary point x satisfying ∥x∥2 ≤ R.
Then, the point
y := −(r/R) x
is in U , and we have
0 = (r/(r + R)) x + (R/(r + R)) y.
As f is convex and ∥y∥2 ≤ r implying f (y) ≤ C, we conclude that
f(0) ≤ (r/(r + R)) f(x) + (R/(r + R)) f(y) ≤ (r/(r + R)) f(x) + (R/(r + R)) C.
By reorganizing the terms in this inequality, we get the lower bound
f(x) ≥ ((r + R)/r) f(0) − (R/r) C.
Thus, f is bounded below for any x in the ball that is centered at 0 and has
radius R. As R > 0 is arbitrary, for any bounded set, by selecting R > 0 large
enough we can find such a ball with radius R covering it.
Our proof of Theorem III.13.9 relies on the following “local” version of the
theorem.
Proposition III.13.12 Let f be a convex function. Then, for any x̄ ∈


rint (Dom f ), we have
(i) f is bounded at x̄, i.e., there exists r > 0 such that f is bounded in the
r-neighborhood Ur (x̄) of x̄ in the affine span of Dom f :
∃r > 0 and C such that |f (x)| ≤ C, ∀x ∈ Ur (x̄),
where Ur (x̄) := {x ∈ Aff(Dom f ) : ∥x − x̄∥2 ≤ r} ;
(ii) f is Lipschitz continuous at x̄, i.e., there exist constants ρ > 0 and L
such that
|f (x) − f (x′ )| ≤ L∥x − x′ ∥2 , ∀x, x′ ∈ Uρ (x̄).

Proof.
(i): 1°. We start with proving the above boundedness of f in a neighborhood of
x̄. This is immediate: by the premise of the proposition we have x̄ ∈ rint (Dom f ),
so there exists r̄ > 0 such that the neighborhood Ur̄ (x̄) is contained in Dom f .
Now, we can find a small simplex ∆ of the dimension m := dim(Aff(Dom f )) with
the vertices x0 , . . . , xm in Ur̄ (x̄) in such a way that x̄ will be a convex combination
of the vectors xi with positive coefficients, even with the coefficients 1/(m + 1),
i.e.,
x̄ = ∑_{i=0}^m (1/(m + 1)) xi.

Here is the justification of this claim that such a simplex ∆ exists: First, when
Dom f is a singleton, the claim is evident. So, we assume that dim(Dom f ) =
m ≥ 1. Without loss of generality, we may assume that x̄ = 0, so that 0 ∈ Dom f
and therefore Aff(Dom f ) = Lin(Dom f ). Then, by Linear Algebra, we can find
m vectors y 1 , . . . , y m in Dom f which form a basis of Lin(Dom f ) = Aff(Dom f ).
Setting y0 := −∑_{i=1}^m yi and taking into account that 0 = x̄ ∈ rint (Dom f ), we can
find ϵ > 0 such that the vectors xi := ϵy i , i = 0, . . . , m, belong to Ur̄ (x̄). By
construction, x̄ = 0 = (1/(m + 1)) ∑_{i=0}^m xi.
Note that x̄ ∈ rint (∆) (see Exercise I.3). Since ∆ spans the same affine sub-
space as Dom f , we can find a sufficiently small r > 0 such that r ≤ r̄ and
Ur (x̄) ⊆ ∆. Now, by definition,
∆ = {∑_{i=0}^m λi xi : λi ≥ 0 ∀i, ∑_{i=0}^m λi = 1},

so that by Jensen’s inequality, for all x ∈ ∆ we have


f(∑_{i=0}^m λi xi) ≤ ∑_{i=0}^m λi f(xi) ≤ max_{i=0,...,m} f(xi) =: Cu.

Thus, as ∆ ⊇ Ur (x̄), we conclude that f (x) ≤ Cu for all x ∈ Ur (x̄) as well.


2°. Now let us prove that if f is above bounded, by some Cu, in Ur := Ur(x̄),


then in fact it is also below bounded in this neighborhood (and, consequently, f is
bounded in Ur ). Consider any x ∈ Ur , so that x ∈ Aff(Dom f ) and ∥x − x̄∥2 ≤ r.
Define x′ := x̄ − [x − x̄] = 2x̄ − x. Then, x′ ∈ Aff(Dom f ) and ∥x′ − x̄∥2 =
∥x − x̄∥2 ≤ r, and so x′ ∈ Ur . As x̄ = 21 [x + x′ ] and f is convex, we have
2f (x̄) ≤ f (x) + f (x′ ).
Note that this inequality holds for all x ∈ Ur , hence
f(x) ≥ 2f(x̄) − f(x′) ≥ 2f(x̄) − Cu =: Cℓ, ∀x ∈ Ur(x̄).
Thus, f is indeed bounded below by Cℓ in Ur .
Setting C := max {|Cu|, |Cℓ|} then concludes the proof of part (i).
(ii): Part (ii) is indeed an immediate consequence of part (i) and Lemma III.13.1.
From part (i) we already know that there exist positive constants r, C such that
|f (x)| ≤ C for all x ∈ Ur (x̄). We will prove that f is Lipschitz continuous in
the neighborhood Ur/2 (x̄). Consider any x, x′ ∈ Ur/2 (x̄) such that x ̸= x′ . Let us
extend the segment [x, x′ ] through the point x′ until it reaches to a certain point
x′′ on the (relative) boundary of Ur (x̄). Then,
x′ ∈ (x, x′′ ) and ∥x′′ − x̄∥2 = r.
Then, by (13.3) we have
f(x′) − f(x) ≤ [(f(x′′) − f(x))/∥x′′ − x∥2] ∥x′ − x∥2.
As |f (y)| ≤ C for any y ∈ Ur (x̄) and x, x′′ ∈ Ur (x̄), we observe that |f (x′′ ) −
f (x)| ≤ 2C. Moreover, by the triangle inequality we have ∥x′′ − x∥2 ≥ ∥x′′ − x̄∥2 −
∥x̄ − x∥2 ≥ r − (r/2) = r/2, where the last inequality holds from ∥x′′ − x̄∥2 = r
and ∥x̄ − x∥2 ≤ r/2 (as x ∈ Ur/2(x̄)). Hence, the term (f(x′′) − f(x))/∥x′′ − x∥2 is bounded above
is bounded above
by the quantity (2C)/(r/2) = 4C/r. Thus, we have
4C ′
f (x′ ) − f (x) ≤ ∥x − x∥2 , ∀x, x′ ∈ Ur/2 .
r
By swapping the roles of x and x′ , we arrive at
4C ′
f (x) − f (x′ ) ≤ ∥x − x∥2 ,
r
and hence
4C
|f (x) − f (x′ )| ≤ ∥x − x′ ∥2 , ∀x, x′ ∈ Ur/2 ,
r
as required in part (ii).
We are now ready to prove Theorem III.13.9.
Proof of Theorem III.13.9. We can extract Theorem III.13.9 from Propo-
sition III.13.12 by the standard Analysis reasoning. All we need is to prove that
if K is a bounded and closed (i.e., a compact) subset of rint (Dom f ), then f is
Lipschitz continuous on K (the boundedness of f on K is an evident consequence
of its Lipschitz continuity on K and boundedness of K). Assume for contradiction


that f is not Lipschitz continuous on K. Then, for every integer i there exists a
pair of distinct points xi , y i ∈ K such that
f (xi ) − f (y i ) ≥ i∥xi − y i ∥2 . (13.6)
Since K is compact, passing to a subsequence we can ensure that xi → x ∈ K
and y i → y ∈ K.
First, we claim that it is not possible to have x = y. To see this recall that by
Proposition III.13.12 f is Lipschitz continuous in a neighborhood B of x. If x = y
were to hold, then since xi → x, y i → y, this neighborhood B would contain all
xi and y i with large enough indices i; but then, from the Lipschitz continuity of
f in B, the ratios (f (xi ) − f (y i ))/∥xi − y i ∥2 would form a bounded sequence,
which we know is not the case. Thus, the case of x = y is impossible.
Next, we claim that the case of x ̸= y is not possible, either. Proposition III.13.12
implies that f is continuous on Dom f at both the points x and y (note that Lip-
schitz continuity at a point clearly implies the usual continuity at it), so that
f (xi ) → f (x) and f (y i ) → f (y) as i → ∞. Thus, the left hand side in (13.6)
remains bounded as i → ∞. On the other hand, as i → ∞ the right hand side
in (13.6) tends to ∞, since i tends to ∞ and ∥xi − yi∥2 tends to a nonzero
limit ∥x − y∥2 (recall x ≠ y in this case).
This is the desired contradiction since precisely one of the cases x = y and
x ̸= y must hold.
14

Minima and maxima of convex functions

14.1 Minima of convex functions


14.1.1 Unimodality
As it was already mentioned, optimization problems involving convex functions
possess nice theoretical properties. One of the most important of these properties
is given by the following result which states that any local minimizer of a convex
function is also its global minimizer.

Theorem III.14.1 [Unimodality] Let f : Rn → R ∪ {+∞} be a convex


function and let Q ⊆ Rn be a nonempty convex set. Suppose x∗ ∈ Q ∩ Dom f
is a local minimizer of f on Q, i.e.,
∃r > 0 such that f (y) ≥ f (x∗ ) ∀y ∈ Q satisfying ∥y − x∗ ∥2 < r. (14.1)
Then, x∗ is a global minimizer of f on Q, i.e.,
f(x∗) ≤ min_y {f(y) : y ∈ Q}. (14.2)
Moreover, the set Argmin_Q f of all local (≡ global) minimizers of f on Q is
convex.
If f is strictly convex, then the set of its global minimizers Argmin_Q f is
either empty or a singleton.

Proof. If f is the convex function that is identical to +∞, Dom f = ∅ and there
is nothing to prove. So, we assume that Dom f ̸= ∅.
We will first show that any local minimizer of f is also a global minimizer
of f . Let x∗ be a local minimizer of f on Q. Consider any y ∈ Q such that
y ̸= x∗ . We need to prove that f (y) ≥ f (x∗ ). If f (y) = +∞, this relation is
automatically satisfied. So, we assume that y ∈ Dom f . Note that by definition
of a local minimizer, we also have x∗ ∈ Dom f for sure. Now, for any τ ∈ (0, 1),
by Lemma III.13.1 we have
(f(x∗ + τ(y − x∗)) − f(x∗))/(τ∥y − x∗∥2) ≤ (f(y) − f(x∗))/∥y − x∗∥2.
Since x∗ is a local minimizer of f , the left hand side in this inequality is nonneg-
ative for all small enough values of τ > 0. Thus, we conclude that the right hand
side is nonnegative as well, i.e., f (y) ≥ f (x∗ ).
Note that Argmin_Q f is nothing but the sublevel set levα(f) of f associated with
α taken as the minimal value min_Q f of f on Q. Recall by Proposition III.12.6 any
sublevel set of a convex function is convex, so this sublevel set Argmin_Q f is convex.
Finally, let us prove that the set Argmin_Q f associated with a strictly convex f
is, if nonempty, a singleton. Assume for contradiction that there are two distinct
minimizers x′, x′′ in Argmin_Q f. Then, from strict convexity of f, we would have
f((1/2)x′ + (1/2)x′′) < (1/2)(f(x′) + f(x′′)) = min_Q f,
where the equality follows from x′, x′′ ∈ Argmin_Q f. But, this strict inequality is
impossible since (1/2)x′ + (1/2)x′′ ∈ Q as Q is convex and by definition of min_Q f we
cannot have a point in Q with objective value strictly smaller than min_Q f.

14.1.2 Necessary and sufficient optimality conditions


Another pleasant fact for differentiable convex functions is that the Calculus-
based necessary optimality condition (a.k.a., the Fermat rule) is sufficient for
global optimality.

Theorem III.14.2 [Necessary and sufficient optimality condition for differ-


entiable convex functions] Let f : Rn → R ∪ {+∞} be a convex function and
let Q ⊆ Rn be a nonempty convex set. Consider any x∗ ∈ int Q. Suppose
that f is differentiable at x∗ . Then, x∗ is a minimizer of f on Q if and only if
∇f (x∗ ) = 0.

Proof. The necessity of the condition ∇f (x∗ ) = 0 for local optimality is due to
Calculus, and so it has nothing to do with convexity. The essence of the matter
is, of course, the sufficiency of the condition ∇f (x∗ ) = 0 for global optimality of
x∗ in the case of convex function f . In fact, this sufficiency is readily given by the
gradient inequality (13.4). In particular, when ∇f (x∗ ) = 0 holds, (13.4) becomes
f(y) ≥ f(x∗) + (y − x∗)⊤ ∇f(x∗) = f(x∗), ∀y ∈ Q.

A natural question is what happens if x∗ in Theorem III.14.2 is not necessarily


an interior point of Q ⊆ Rn . Let us now assume that x∗ is an arbitrary point of a
convex set Q and that f is convex on Q and differentiable at x∗ (the latter means
exactly that Dom f contains a neighborhood of x∗ and f possesses the first order
derivative at x∗ ). Under these assumptions, our goal is to characterize when x∗
will be a minimizer of f on Q.
In order to give such a characterization, given a convex set Q and a point


x∗ ∈ Q, we will look at the radial cone of Q at x∗ [notation: TQ(x∗)] defined as

the set
TQ (x∗ ) := {h ∈ Rn : x∗ + th ∈ Q, ∀ small enough t > 0} .
Geometrically, this is the set of all directions “looking” from x∗ towards Q, so
that a small enough positive step from x∗ along the direction, i.e., adding to x∗
a small enough positive multiple of the direction, keeps the point in Q. That
is, TQ (x∗ ) is the set of all “feasible” directions at x∗ that starting from x∗ we
can go a positive distance along and remain in Q. From the convexity of Q it
immediately follows that the radial cone indeed is a cone (not necessary closed).
For example, when x∗ ∈ int Q, we have TQ (x∗ ) = Rn . Let us examine a more
interesting example, e.g., the polyhedral set
Q = {x ∈ Rn : ai⊤ x ≤ bi, i = 1, . . . , m}, (14.3)
and its radial cone. For any x∗ ∈ Q, we define I(x∗) := {i : ai⊤ x∗ = bi} as the
set of indices of constraints that are active at x∗ (i.e., those satisfied at x∗ as
equalities rather than as strict inequalities). Then,
TQ(x∗) = {h ∈ Rn : ai⊤ h ≤ 0, ∀i ∈ I(x∗)}. (14.4)
The radial cone TQ (x∗ ) of a convex set Q taken at x∗ ∈ Q gives rise to another
cone, namely the normal cone [notation: NQ (x∗ )]. By definition, the normal cone
is the negative of the dual of the radial cone TQ (x∗ ), i.e.,
NQ(x∗) := {h ∈ Rn : h⊤(x′ − x∗) ≤ 0, ∀x′ ∈ Q}. (14.5)
Now, we are ready to present the necessary and sufficient condition for x∗ to
be a minimizer of a convex function f on Q.

Proposition III.14.3 Let Q be a convex set and x∗ ∈ Q, and suppose f


is a convex function on Q which is differentiable at x∗ . The necessary and
sufficient condition for x∗ to be a minimizer of f on Q is that the derivative
of f taken at x∗ along every direction from TQ (x∗ ) should be nonnegative,
i.e.,
h⊤ ∇f (x∗ ) ≥ 0, ∀h ∈ TQ (x∗ ).
By the duality relation between TQ (x∗ ) and NQ (x∗ ), this is precisely the same
condition as the inclusion
∇f (x∗ ) ∈ −NQ (x∗ ).
Thus, with Q, x∗ and f as above, x∗ minimizes f over Q if and only if
−∇f (x∗ ) belongs to the normal cone NQ (x∗ ) of Q at x∗ .

Proof. The necessity of this condition is an evident fact which has nothing to
do with convexity. Suppose that x∗ is a local minimizer of f on Q. Assume for
contradiction that there exists h ∈ TQ (x∗ ) such that h⊤ ∇f (x∗ ) < 0. Then, by
h ∈ TQ (x∗ ), we get x∗ + th ∈ Q for all small enough positive t. Moreover, as


h⊤ ∇f (x∗ ) < 0, we would also get

f (x∗ + th) < f (x∗ ),

for all small enough positive t. These two observations together thus imply that
in every neighborhood of x∗ there are points x from Q with values f (x) strictly
smaller than f (x∗ ). This clearly contradicts the assumption that x∗ is a local
minimizer of f on Q.
Once again, the sufficiency of this condition is given by the gradient inequality,
exactly as in the case when x∗ ∈ int Q discussed in the proof of Theorem III.14.2.

Proposition III.14.3 states that under its premise the necessary and sufficient
condition for x∗ to minimize f on Q is the inclusion ∇f (x∗ ) ∈ −NQ (x∗ ). What
does this condition actually mean? The answer depends on what the normal cone
is: whenever we have an explicit description of it, we have an explicit form of the
optimality condition. For example,
• Consider the case of TQ (x∗ ) = Rn , i.e., x∗ ∈ int Q. Then, the normal cone
NQ (x∗ ) is the cone of all the vectors h that have nonpositive inner products with
every vector in Rn , i.e., NQ (x∗ ) = {0}. Consequently, in this case the necessary
and sufficient optimality condition of Proposition III.14.3 becomes the Fermat
rule ∇f (x∗ ) = 0, which we already know.
• When Q is an affine plane given by linear equalities Ax = b, A ∈ Rm×n , the
radial cone at every point x ∈ Q is the linear subspace {d : Ad = 0}, the normal
cone is the orthogonal complement {u = A⊤ v : v ∈ Rm } to this linear subspace,
and the optimality condition reads

Given Q = {x : Ax = b}, a point x∗ ∈ Q, and a function f that is convex


on Q and differentiable at x∗ , x∗ is a minimizer of f on Q if and only if
∃v ∈ Rm : ∇f (x∗ ) + A⊤ v = 0.

• When Q is the polyhedral set (14.3), the radial cone is the polyhedral cone
(14.4), i.e., it is the set of all directions which have nonpositive inner products with
all ai for i ∈ I(x∗ ) (recall that these ai are coming from the constraints a⊤ i x ≤ bi
specifying Q that are satisfied as equalities at x∗ ). The corresponding normal
cone is thus the set of all vectors which have nonpositive inner products with all
these directions in TQ (x∗ ), i.e., of vectors a such that the inequality h⊤ a ≤ 0 is
a consequence of the inequalities h⊤ ai ≤ 0, i ∈ I(x∗) = {i : ai⊤ x∗ = bi}. From
the Homogeneous Farkas Lemma we conclude that the normal cone is simply
the conic hull of the vectors ai , i ∈ I(x∗ ). Thus, in this case our necessary and
sufficient optimality condition becomes:
Given Q = {x ∈ Rn : ai⊤ x ≤ bi, i = 1, . . . , m}, a point x∗ ∈ Q, and a
function f that is convex on Q and differentiable at x∗, x∗ is a minimizer
of f on Q if and only if there exist nonnegative reals λ∗i ("Lagrange
multipliers") associated with "active" indices i (those from I(x∗)) such
that
∇f(x∗) + ∑_{i∈I(x∗)} λ∗i ai = 0,
or, equivalently, there exist nonnegative λ∗i, 1 ≤ i ≤ m, such that
∇f(x∗) + ∑_{i=1}^m λ∗i ai = 0, (Karush-Kuhn-Tucker equation)
and λ∗i [ai⊤ x∗ − bi] = 0, 1 ≤ i ≤ m. (complementary slackness)
This is our first acquaintance with the famous Karush-Kuhn-Tucker (KKT)
optimality conditions. We will eventually extend them to the case where Q is given
by convex, rather than just linear, constraints (see Theorem IV.23.4). Indeed,
KKT conditions are also necessary for local optimality in the case of nonconvex
Mathematical Programming problems.
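To make the boxed condition concrete, here is a small hand-checkable instance that is entirely our own illustration (not from the text): minimize f(x) = ∥x − c∥² over the box written as {x : −xi ≤ 0, xi − 1 ≤ 0}. The minimizer is the projection of c onto the box, and the sketch recovers nonnegative multipliers satisfying the KKT equation and complementary slackness.

```python
import numpy as np

# Tiny instance (our own illustration): minimize f(x) = ||x - c||^2 over the box
#   Q = {x in R^2 : -x_i <= 0, x_i - 1 <= 0}, written as a_i^T x <= b_i.
A = np.array([[-1., 0.], [0., -1.], [1., 0.], [0., 1.]])   # rows are the a_i
b = np.array([0., 0., 1., 1.])
c = np.array([2.0, -0.5])

x_star = np.clip(c, 0.0, 1.0)          # the minimizer: projection of c onto the box
grad = 2.0 * (x_star - c)              # gradient of f at x_star

active = np.isclose(A @ x_star, b)     # indices i with a_i^T x* = b_i
# Solve grad + sum_{i active} lam_i * a_i = 0 for the active multipliers.
lam_active, *_ = np.linalg.lstsq(A[active].T, -grad, rcond=None)

lam = np.zeros(len(b))
lam[active] = lam_active               # inactive constraints get zero multipliers
assert np.all(lam >= -1e-9)                              # nonnegativity
assert np.allclose(grad + A.T @ lam, 0.0)                # KKT equation
assert np.allclose(lam * (A @ x_star - b), 0.0)          # complementary slackness
print("x* =", x_star, " multipliers =", np.round(lam, 6))
```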
Remark III.14.4 Let us give an informal explanation of the preceding results on
first-order optimality conditions through Physics (see Figure III.2 for a graphical
illustration). Consider the optimization problem given by
min {f (x) : ai (x) ≤ 0, i = 1, . . . , m} .
x∈Rn

This problem can be interpreted as locating the equilibrium position of a particle


that is moving through Rn while being affected by an external force (like gravity)
with potential f , meaning that when the position of the particle is x ∈ Rn , the
force acting at the particle is −∇f (x). The domain in which the particle can
actually travel is Q := {x ∈ Rn : ai (x) ≤ 0, i ≤ m}; think about areas ai (x) > 0
as rigid obstacles that the particle cannot penetrate into. When the particle
touches i-th obstacle (i.e., is in position x with ai (x) = 0), the obstacle produces
a reaction force directed along the inward normal −∇ai (x) to the boundary of
the obstacle, so that the reaction force is −λi ∇ai (x); here λi ≥ 0 depends on the
pressure on the obstacle exerted by the particle. At an equilibrium x∗ (which,
by Physics, should minimize, at least locally, the potential f over Q), the total
of the forces acting at the particle should be zero, that is, for properly selected
λi ≥ 0 one should have −∇f(x∗) − ∑_{i:ai(x∗)=0} λi ∇ai(x∗) = 0, which is exactly
what is said by our Karush-Kuhn-Tucker (KKT) optimality condition as applied
to the problem where the functions ai (x) = a⊤ i x − bi are affine. ■

14.1.3 ⋆ Symmetry Principle


We close this section by discussing a simple characterization of minimizers of
convex functions admitting certain symmetry. When applicable, this characteri-
Figure III.2. Physical illustration of KKT optimality conditions for the optimization
problem min_{x∈R2} {f(x) : ai(x) ≤ 0, i = 1, 2, 3}.
White area represents the feasible domain Q, while ellipses A, B, C represent the sets
a1(x) ≤ 0, a2(x) ≤ 0, a3(x) ≤ 0. The point x is a candidate feasible solution located at
the intersection {u ∈ R2 : a1(u) = a2(u) = 0} of the boundaries of A and B. g = −∇f(x)
is the external force acting at the particle located at x, while p and q are reaction forces
created by obstacles A and B. The condition for x to be at equilibrium reduces to
g + p + q = 0, as in the picture. This equilibrium condition translates to the KKT equation
∇f(x) + λ1 ∇a1(x) + λ2 ∇a2(x) = 0
holding for some nonnegative λ1, λ2.

zation is extremely useful. This result is indeed an almost immediate consequence


of Proposition III.12.6.

Proposition III.14.5 Let f be a convex function with domain Q := Dom f ⊆


Rn . Suppose that f and Q admit a group of symmetries. That is, there ex-
ists a finite collection G = {G0 , . . . , Gm } of distinct from each other n × n
nonsingular matrices such that
(i) G is a finite group, i.e., G ∈ G implies G−1 ∈ G as well, and for any
G′ , G′′ ∈ G, we also have G′ G′′ ∈ G;
(ii) all matrices G ∈ G are symmetries of Q and f , i.e., for any G ∈ G, we
have Gx ∈ Q and f (x) = f (Gx) for all x ∈ Q.
Suppose also that the set of minimizers of f on Q, i.e., Q∗ := Argminx∈Q f (x),
is nonempty. Then, f admits a G-symmetric minimizer, that is, there exists
some x∗ ∈ Q∗ such that Gx∗ = x∗ for all G ∈ G.

Proof. Consider any fixed x ∈ Q∗ . Since matrices G ∈ G are symmetries of


f and Q, they are symmetries of Q∗ , i.e., Gx ∈ Q∗ for all G ∈ G (why?) as
well. Now, because f is convex, Proposition III.12.6 implies that the set Q∗ =
Argmin_{u∈Q} f(u) = {u ∈ Q : f(u) ≤ min_Q f} is convex and thus contains, along
with the point x, the point x∗ := (1/(m + 1)) ∑_{G∈G} Gx. It remains to show that Hx∗ = x∗
for all H ∈ G. Indeed, Hx∗ = (1/(m + 1)) ∑_{G∈G} HGx, and the terms in the latter sum
form a permutation of the terms in the sum specifying x∗ due to the fact that G
is a group. Thus, x∗ is the desired G-symmetric minimizer of f .

Illustration: Geometric-Arithmetic Mean inequality. Let us use Symmetry Prin-


ciple to justify the following classical inequality:


For any a1, . . . , am ∈ R+, we have (a1 . . . am)^{1/m} ≤ (a1 + a2 + . . . + am)/m.
To prove this inequality, we define a := a1 + . . . + am, Q := {x ∈ Rm+ : ∑_{i=1}^m xi = a},
and
f(x) := (x1 . . . xm)^{1/m} if x ∈ Q, and f(x) := +∞ otherwise.
Note that Q is nonempty and compact, and f (x) is continuous and concave (see
the end of section 13.2). Thus, by Theorem B.31 the set of maximizers of f on
Q, i.e., Q∗ , is nonempty. Clearly, the m! permutation matrices of size m × m
form a group of symmetries of f and Q, so that Q∗ contains a permutationally
symmetric point x∗ (apply Proposition III.14.5 to minimize the convex function
−f over Q). Since the sum of all entries in a point from Q is a, Q contains exactly
one permutationally symmetric point (1/m)[a; a; . . . ; a]. Then, as (1/m)[a; a; . . . ; a] is a
maximizer of f over Q, we conclude that for every [a1; . . . ; am] ∈ Q we have
(a1 . . . am)^{1/m} = f([a1; . . . ; am]) ≤ f((1/m)[a; . . . ; a]) = a/m = (a1 + a2 + . . . + am)/m.
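A quick numerical spot check of the inequality just derived (our own short illustration):

```python
import numpy as np

rng = np.random.default_rng(7)
a = rng.uniform(0.0, 10.0, size=(10000, 5))   # random nonnegative 5-tuples
geom = a.prod(axis=1) ** (1.0 / 5.0)          # geometric means
arith = a.mean(axis=1)                        # arithmetic means
assert np.all(geom <= arith + 1e-9)
print("geometric mean <= arithmetic mean on all sampled tuples")
```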

14.2 Maxima of convex functions


So far we have seen that the fact that a point x∗ ∈ Dom f is a global minimizer
of a convex function f depends only on the local behavior of f at x∗ . This is not
the case with maximizers of a convex function. First of all, in all nontrivial cases,
such a maximizer, if one exists at all, must belong to the relative boundary of
the domain of the function.

Theorem III.14.6 Let f be a convex function. If a point x∗ ∈ rint (Dom f )


is a maximizer of f over Dom f , then f is constant on Dom f .

Proof. Define Q := Dom f , and consider any y ∈ Q. We need to prove that


f (y) = f (x∗ ). There is nothing to prove if y = x∗ , so we assume that y ̸= x∗ .
Since, by assumption, x∗ ∈ rint Q, there exists a point y ′ ∈ Q such that x∗ is an
interior point of the segment [y ′ , y], i.e., there exists λ ∈ (0, 1) such that
x∗ = λy ′ + (1 − λ)y.
Then, as f is convex, we get
f (x∗ ) ≤ λf (y ′ ) + (1 − λ)f (y).
As x∗ is a maximizer of f on Q, we have f (x∗ ) ≥ f (y) and f (x∗ ) ≥ f (y ′ ).
Combining this with the preceding inequality leads to λf (y ′ )+(1−λ)f (y) = f (x∗ ).
Since λ ∈ (0, 1) and max{f (y), f (y ′ )} ≤ f (x∗ ), this equation can hold as equality
only if f (y ′ ) = f (y) = f (x∗ ).
In particular, Theorem III.14.6 states that given a convex function f the only
way for a point x∗ ∈ rint (Dom f ) to be a global maximizer of f is if the function
f is constant over its domain.
Next, we provide further information on maxima of convex functions.

Theorem III.14.7 Let f be a convex function on Rn and E ⊆ Rn be a


nonempty set. Then,
sup_{x∈Conv E} f(x) = sup_{x∈E} f(x). (14.6)
In particular, if S ⊂ Rn is a nonempty convex and compact set, then the
supremum of f on S is equal to the supremum of f on the set of extreme
points of S, i.e.,
sup_{x∈S} f(x) = sup_{x∈Ext(S)} f(x). (14.7)

Proof. We will first prove (14.6). As Conv E ⊇ E, we have sup_{x∈Conv E} f(x) ≥
sup_{x∈E} f(x). To prove the reverse direction, consider any x̄ ∈ Conv E. Then, there
exist xi ∈ E and convex combination weights λi ≥ 0 satisfying ∑_i λi = 1 and
x̄ = ∑_i λi xi.

Applying Jensen’s inequality (Proposition III.12.4), we get


f(x̄) ≤ ∑_i λi f(xi) ≤ ∑_i λi sup_{x∈E} f(x) = sup_{x∈E} f(x).
Since the preceding inequality holds for any x̄ ∈ Conv E, we conclude sup_{x∈Conv E} f(x) ≤
sup_{x∈E} f(x) holds as well, as desired.
To prove (14.7), note that when S is a nonempty convex compact set, by Krein-
Milman Theorem (Theorem II.8.6) we have S = Conv(Ext(S)). Then, (14.7)
follows immediately from (14.6).
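For a concrete (self-made) illustration of (14.7): maximize the convex function f(x) = ∥x − c∥₁ over S = [0, 1]³, whose extreme points are the 8 vertices. The best vertex value is never beaten by any point of a dense grid over the box.

```python
import numpy as np
from itertools import product

# Illustration of (14.7) for S = [0,1]^3 (its extreme points are the 8 vertices)
# and the convex function f(x) = ||x - c||_1, with c chosen arbitrarily.
c = np.array([0.3, 0.7, 0.45])
f = lambda x: np.abs(np.asarray(x) - c).sum()

best_vertex = max(f(v) for v in product([0.0, 1.0], repeat=3))

grid = np.linspace(0.0, 1.0, 21)
best_grid = max(f(x) for x in product(grid, repeat=3))

assert best_grid <= best_vertex + 1e-12   # no interior grid point beats the vertices
print("max over vertices:", best_vertex, " max over dense grid:", best_grid)
```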
Our last theorem on maxima of convex functions is as follows.

Theorem III.14.8 [Maxima of convex functions] Let f be a proper convex


function, and let Q := Dom f . Then,
(i) If Q is closed and does not contain lines and the set of global maximizers
of f , i.e.,
Argmax_Q f := {x ∈ Q : f(x) ≥ f(y), ∀y ∈ Q},
is nonempty, then Argmax_Q f ∩ Ext(Q) ≠ ∅, i.e., at least one of the maximizers
of f is an extreme point of Q.
(ii) If the set Q is polyhedral and f is above bounded on Q, then the


maximum of f on Q is achieved, i.e., Argmax_Q f ≠ ∅.

Proof. As f is a proper convex function, we have Q ̸= ∅.


Let us start with the following immediate observation:
(!) Let a convex function f be bounded from above on a ray ℓ = x̄ +
R+ (e). Then, the function does not increase along the ray, i.e., for any
x ∈ ℓ and s ≥ 0, we have f (x + se) ≤ f (x).
Indeed, assuming for contradiction that there exists x ∈ ℓ and x′ = x + se, s ≥ 0,
with f (x′ ) > f (x), we conclude that s > 0 and that the average rate of change

κ := f (x )−f
s
(x)
when moving from x to x′ is positive: κ > 0. Since f is convex,
the average rate f (x+te)−f
r
(x)
when moving from x to x + te with t ≥ s is at least
κ > 0 (section 13.2), implying that f (x + te) → +∞ as t → ∞, which is the
desired contradiction, since f is bounded from above on ℓ.
As an immediate corollary of (!), we conclude that
(!!) If Q is a convex set represented as V + R, where V is nonempty, and
R is a cone, and f is a convex bounded from above function on Q, then
f attains its maximum on Q if and only if it attains its maximum on V ,
the maxima being equal to each other.
Indeed, assume that f attains its maximum on Q at some point x̄. As Q = V +R,
we have x̄ = v̄ + e for some v̄ ∈ V and e ∈ R. By (!), we have f (v̄) ≥ f (x̄) which
combines with V ⊆ Q to imply that v̄ is a maximizer of f on both V and Q.
Vice versa, assume that f attains its maximum on V at a certain point v̄. Every
x ∈ Q can be represented as v + e with v ∈ V and e ∈ R, so that by (!) we have
f (x) ≤ f (v) and thus f (x) ≤ f (v̄), and we again conclude that v̄ maximizes f
both on V and on Q.
Theorem III.14.8 is an immediate corollary of (!) and (!!). Indeed, in the case
of (i), as Q is a nonempty closed convex set that does not contain lines, by
Theorem II.8.16 we deduce Ext(Q) ̸= ∅ and Q admits a representation as
Q = V + R, where V := Conv(W), (14.8)

W = Ext(Q), R = Rec(Q). In the situation of (i), f attains its maximum on Q,


and therefore, by (!!), f has a maximizer v̄ belonging to V. Due to the origin of
V, v̄ = ∑_{i∈I} λi vi with nonempty finite set I, λi ≥ 0, ∑_i λi = 1, and vi ∈ Ext(Q).
By convexity of f, f(v̄) ≤ ∑_{i∈I} λi f(vi) ≤ max_{i∈I} f(vi) =: f(vi∗), implying that
the extreme point vi∗ of Q maximizes f on Q.
In the case of (ii), the polyhedral set Q by Theorem II.9.10 admits representa-
tion (14.8) with nonempty finite W , implying that f , which is convex and real-
valued on V = Conv(W ) ⊆ Q = Dom f , attains its maximum on V , e.g., at its
maximizer on the nonempty and finite set W . By (!!), this maximizer maximizes
f on Q as well, so that f achieves its maximum on Q, as claimed in (ii).
15

Subgradients

15.1 Proper lower semicontinuous convex functions and their representation
Recall that an equivalent definition of a convex function f on Rn is that f is a
function taking values in R ∪ {+∞} such that its epigraph
epi{f} = {[x; t] ∈ Rn+1 : t ≥ f(x)}


is a convex set. Thus, there is no essential difference between convex functions and
convex sets: a convex function generates a convex set, i.e., its epigraph, which of
course remembers everything about the function. And the only specific property
of the epigraph as a convex set is that it always possesses a very specific recessive
direction, namely h = [0; 1]. That is, the ray {z + th : t ≥ 0} directed by h
belongs to the epigraph set whenever the starting point z of the ray is in the set.
Whenever a convex set possesses a nonzero recessive direction h such that −h is
not a recessive direction, the set in appropriate coordinates becomes the epigraph
of a convex function. Thus, a convex function is, basically, nothing but a way to
look, in the literal meaning of the latter verb, at a convex set.
Now, we know that the convex sets that are “actually nice” are the closed ones:
they possess a lot of important properties (e.g., admit a good outer description)
which are not shared by arbitrary convex sets. Therefore, among convex functions
there also are “actually nice” ones, namely those with closed epigraphs. Closed-
ness of the epigraph of a function can be “translated” to the functional language
and there it becomes a special kind of continuity, namely lower semicontinuity.
Before formally defining lower semicontinuity, let us do a brief preamble on con-
vergence of sequences on the extended real line. In the sequel, we will oper-
ate with limits of sequences {ai }i≥1 with terms ai from the extended real line
R := R ∪ {+∞} ∪ {−∞}. These limits are defined in the natural way: the rela-
tion
lim ai = a ∈ R
i→∞

means that for every a′ ∈ R


• ai < a′ for all but finitely many values of i, when a′ > a, and
• ai > a′ for all but finitely many values of i, when a′ < a.
Equivalent way to treat convergence of sequences {ai }i ⊆ R is as follows: let us
fix somehow a strictly monotone continuous function θ on R which maps the


axis onto the interval (−1, 1) (that is, lims→−∞ θ(s) = −1, lims→∞ θ(s) = 1), e.g.,
θ(s) = (2/π) atan(s), and extend it from R to R by setting θ(−∞) := −1 and θ(+∞) := 1
(keeping θ(a) unchanged for a ∈ R).

With this “encoding,” R becomes the segment [−1, 1], and the relation a =
limi→∞ ai as defined above is the same as θ(a) = limi→∞ θ(ai ), that is, this
relation stands for the usual convergence, as i → ∞, of reals θ(ai ) to the real
θ(a). Note also that for a, b ∈ R the relation a ≤ b (a < b) is exactly the same as
the usual arithmetic inequality θ(a) ≤ θ(b) (θ(a) < θ(b), respectively).
With convergence and limits of sequences {ai }i ⊆ R already defined, we can
speak about upper (lower) limits of these sequences. For example, we can define
lim inf i→∞ ai as a ∈ R uniquely specified by the relation θ(a) = lim inf i→∞ θ(ai ).
Same as with lower limits of sequences of reals, lim inf i→∞ ai is the smallest (in
terms of the relation ≤ on R!) of the limits of converging (in R!) subsequences
of the sequence {ai }i .
It is time to come back to lower semicontinuity.

Definition III.15.1 [Lower semicontinuity] Let f : Rn → R ∪ {+∞} be a


function (not necessarily convex). f is called lower semicontinuous (lsc for
short) at a point x̄, if for every sequence of points {xi }i converging to x̄ one
has
f (x̄) ≤ lim inf f (xi ).
i→∞

A function f is called lower semicontinuous, if it is lower semicontinuous


at every point.

A trivial example of an lsc function is a continuous one. Note, however, that an


lsc function need not be continuous; what it is needed for lower semicontinuity is
that the function can make only “jump downs.” For example, the function
f(x) = 0 if x ≠ 0, and f(x) = a if x = 0,

is lsc if a ≤ 0 (the function can “jump down at x = 0 or no jump at all”), and is


not lsc if a > 0 (“jump up”). For more illustrations, see Figure III.3.
Here is the connection between lower semicontinuity of a function and the
geometry of its epigraph.

Proposition III.15.2 A function f : Rn → R ∪ {+∞} is lower semicontin-


uous if and only if its epigraph is closed.

Figure III.3. Continuity and lower semicontinuity


a) continuous function f (x), Dom f = [0, 1]
b-d) functions on [0, 1] discontinuous at x = 0.5. Given their common restriction on
{0 ≤ x ≤ 1, x ̸= 0.5} and setting q = limϵ→+0 inf |x−0.5|≤ϵ f (x), to be lsc,
x̸=0.5
we should have f (0.5) ≤ q, as in b-c). Function d) is not lsc.

Proof. First, suppose epi{f } is closed, and let us prove that f is lsc. Consider
a sequence {xi }i such that xi → x as i → ∞, and let us prove that f (x) ≤ a :=
lim inf i→∞ f (xi ). There is nothing to prove when a = +∞. Assuming a < +∞, by
the definition of lim inf there exists a sequence i1 < i2 < . . . such that f (xij ) → a
as j → ∞. Let us assume that a > −∞ (we will verify later on that this is in fact
the case). Then, as the points [xij ; f (xij )] ∈ epi{f } converge to [x; a] and epi{f }
is closed, we see that [x; a] ∈ epi{f }, that is, f (x) ≤ a, as claimed. It remains
to verify that a > −∞. Indeed, assuming a = −∞, we conclude that for every
t ∈ R the points [xij ; t] belong to epi{f } for all large enough values of j, which,
as above, implies that [x; t] ∈ epi{f }, that is, t ≥ f (x). The latter inequality
cannot hold true for all real t, since f does not take value −∞; thus, a = −∞ is
impossible.
Now, for the opposite direction, let f be lsc, and let us prove that epi{f } is
closed. So, we should prove that if [xi ; ti ] → [x; t] as i → ∞ and [xi ; ti ] ∈ epi{f },
that is, ti ≥ f (xi ) for all i, then [x; t] ∈ epi{f }, that is, t ≥ f (x). Indeed, since f is
lsc and f (xi ) ≤ ti , we have f (x) ≤ lim inf i→∞ f (xi ) ≤ lim inf i→∞ ti = limi→∞ ti =
t.
An immediate consequence of Proposition III.15.2 is as follows:

Corollary III.15.3 Given an arbitrary family of lsc functions fα : Rn →


R ∪ {+∞}, α ∈ A, their supremum given by
f(x) := sup_{α∈A} fα(x)

is lower semicontinuous.
Proof. The epigraph of the function f is the intersection of the epigraphs of all
functions fα , and the intersection of closed sets is always closed.
Now let us look at convex, proper, and lower semicontinuous functions, that
is, functions Rn → R ∪ {+∞} with closed convex and nonempty epigraphs. To
save words, let us call these functions regular.
What we are about to do is to translate to the functional language several con-
structions and results related to convex sets. In everyday life, a translation (e.g.,
of poetry) typically results in something less rich than the original. In mathematics,
by contrast, such a translation is a powerful source of new ideas and constructions.
“Outer description” of a proper lower semicontinuous convex function.
We know that any closed convex set is the intersection of closed half-spaces. What
does this fact imply when the set is the epigraph of a regular function f ? First
of all, note that the epigraph is not a completely arbitrary convex set in Rn+1 : it
has the recessive direction e := [0n ; 1], i.e., the basic orth of the t-axis in the space
of variables x ∈ Rn , t ∈ R where the epigraph lives. This direction, of course,
should be recessive for every closed half-space
Π = {[x; t] ∈ Rn+1 : αt ≥ d⊤x − a}, where |α| + ∥d∥2 > 0,        (*)
containing epi{f }. Note that in (*) we are adopting a specific form of the nonstrict
linear inequality describing the closed half-space Π among many possible forms
in the space where the epigraph lives; this form is the most convenient for us
now. Thus, e should be a recessive direction of Π ⊇ epi{f }, and the recessiveness
of e for Π means exactly that α ≥ 0. Thus, speaking about closed half-spaces
containing epi{f }, we in fact are considering some of the half-spaces (*) with
α ≥ 0.
Now, there are two essentially different possibilities for α to be nonnegative:
(A) α > 0, and (B) α = 0. In the case of (B), the boundary hyperplane of Π is
"vertical," i.e., parallel to e, and in fact it "bounds" only x: in this case, Π is
the set of all vectors [x; t] with x belonging to a certain half-space in the
x-subspace and t an arbitrary real number. These "vertical" half-spaces
will be of no interest to us.
The half-spaces which indeed are of interest to us are the “nonvertical” ones:
those given by the case (A), i.e., with α > 0. For a non-vertical half-space Π,
we can always divide the inequality defining Π by α and make α = 1. Thus,
a “nonvertical” candidate eligible for the role of a closed half-space containing
epi{f } can always be written as
Π = {[x; t] ∈ Rn+1 : t ≥ d⊤x − a}.        (**)
That is, a “nonvertical” closed half-space containing epi{f } can be represented
as the epigraph of an affine function of x.
Now, when is such a candidate indeed a half-space containing epi{f }? It is
clear that the answer is yes if and only if the affine function d⊤ x − a is less than
or equal to f(x) for all x ∈ Rn. This is precisely what we mean when we say that "d⊤x − a
is an affine minorant of f .” In fact, we have a very nice characterization of proper
lsc convex functions through their affine minorants!

Proposition III.15.4 A proper lower semicontinuous convex function f is


the pointwise supremum of all its affine minorants. Moreover, at every point
x̄ ∈ rint(Dom f), the function f is not merely the supremum, but in fact the
maximum of its affine minorants. That is, at every point x̄ ∈ rint(Dom f),
there exists an affine function fx̄ (x) such that fx̄ (x) ≤ f (x) for all x ∈ Rn
and fx̄ (x̄) = f (x̄).

Note that Proposition III.15.4 essentially gives us an outer description of a


proper lsc convex function. This outer description is in fact instrumental in de-
veloping algorithms for convex optimization. Before proceeding with the proof of
Proposition III.15.4, we will first prove an intermediate result on proper convex
(but not necessarily lower semicontinuous) functions. This result indeed forms
the most important step in the proof of Proposition III.15.4.

Proposition III.15.5 Let f be a proper convex function. By definition


of affine minorant, we immediately have that the supremum of all affine
minorants of f is less than or equal to f everywhere. Let x̄ ∈ rint (Dom f ).
Then, there exists an affine minorant d⊤ x − a of f which coincides with f at
x̄, i.e.,
f (x) ≥ d⊤ x − a, ∀x ∈ Rn , and d⊤ x̄ − a = f (x̄). (15.1)
Moreover, this supremum is +∞ outside of cl(Dom f ). Thus, the supremum
of all affine minorants of f is equal to f everywhere except, perhaps, some
subset of rbd(Dom f ).

Proof. I. We will first prove that at every x̄ ∈ rint (Dom f ) there exists an affine
function fx̄ (x) such that fx̄ (x) ≤ f (x) for all x ∈ Rn and fx̄ (x̄) = f (x̄).
I.1°. First of all, we can easily reduce the situation to the one when Dom f is
full-dimensional. Indeed, by shifting f we can make Aff(Dom f ) to be a linear
subspace L in Rn ; restricting f onto this linear subspace, we clearly get a proper
function on L. If we believe that our statement is true for the case when Dom f
is full-dimensional, we can conclude that there exists an affine function on L, i.e.,

d⊤ x − a [where x ∈ L]

(where d ∈ L) such that

f (x) ≥ d⊤ x − a, ∀x ∈ L, and f (x̄) = d⊤ x̄ − a.

This affine function d⊤ x − a on L clearly can be extended, by the same formula,


from L to the entire Rn and is a minorant of f on the entire Rn (note that
outside of L ⊇ Dom f , the function f simply is +∞!). This affine minorant on
Rn is exactly what we need.
I.2°. Now let us prove that our statement is valid when Dom f is full-dimensional.
In such a case, x̄ ∈ int(Dom f ). Consider the point ȳ := [x̄; f (x̄)]. Note that
ȳ ∈ epi{f }, and in fact we claim further that ȳ ∈ rbd(epi{f }). Assume for con-
tradiction that ȳ ∈ rint (epi{f }). Recall that e := [0n ; 1] is the special recessive
direction of epi{f }, thus ȳ ′ := ȳ + e satisfies ȳ ′ ∈ epi{f } and so the segment
[ȳ ′ , ȳ] is contained in epi{f }. Since ȳ ∈ rint (epi{f }), we can extend this segment
a little more through its endpoint ȳ, without leaving epi{f }. But, this is clearly
impossible, since in such a case the t-coordinate of the new endpoint would be
< f (x̄) while the x-component of it still would be x̄. Thus, ȳ ∈ rbd(epi{f }).
Next, we claim that ȳ ′ is an interior point of epi{f }. This is immediate: we know
from Theorem III.13.9 that f is continuous at x̄ (recall that x̄ ∈ int(Dom f )), so
that there exists a neighborhood U of x̄ in Aff(Dom f ) = Rn such that f (x) ≤
f (x̄) + 0.5 whenever x ∈ U , or, in other words, the set
V := {[x; t] : x ∈ U, t > f (x̄) + 0.5}
is contained in epi{f }; but this set clearly contains a neighborhood of ȳ ′ in Rn+1 .
We see that epi{f } is full-dimensional, so that rint(epi{f }) = int(epi f ) and
rbd(epi{f }) = bd(epi f ).
Now let us look at a hyperplane Π supporting cl(epi{f }) at the point ȳ ∈
rbd(epi{f }). W.l.o.g., we can represent this hyperplane via a nontrivial (i.e.,
with |α| + ∥d∥2 > 0) linear inequality
αt ≥ d⊤ x − a. (15.2)
satisfied everywhere on cl(epi{f }), specifically, as the hyperplane where this in-
equality holds true as equality. Now, inequality (15.2) is satisfied everywhere on
epi{f }, and therefore at the point ȳ ′ := [x̄; f (x̄) + 1] ∈ epi{f } as well, and is
satisfied as equality at ȳ = [x̄; f (x̄)] (since ȳ ∈ Π). These two observations clearly
imply that α ≥ 0. We claim that α > 0. Indeed, inequality (15.2) says that the
linear form h⊤ [x; t] := αt − d⊤ x attains its minimum over y ∈ cl(epi{f }), equal
to −a, at the point ȳ. Were α = 0, we would have h⊤ ȳ = h⊤ ȳ ′ , implying that
the set of minimizers of the linear form h⊤ y on the set cl(epi{f }) contains an
interior point (namely, ȳ ′ ) of the set. This is possible only when h = 0, that is,
α = 0, d = 0, which is not the case.
Now, as α > 0, by dividing both sides of (15.2) by α, we get a new inequality
of the form
t ≥ d⊤ x − a, (15.3)
(here we keep the same notation for the right hand side coefficients as we will
never come back to the old coefficients) which is valid on epi{f } and is equality
at ȳ = [x̄; f (x̄)]. Its validity on epi{f } implies that for all [x; t] with x ∈ Dom f
and t = f (x), we have
f (x) ≥ d⊤ x − a, ∀x ∈ Dom f. (15.4)
Thus, we conclude that the function d⊤ x − a is an affine minorant of f on Dom f
and therefore on Rn (f = +∞ outside Dom f !). Finally, note that the inequality
(15.4) becomes an equality at x̄, since (15.3) holds as equality at ȳ. The affine
minorant we have just built justifies the validity of the first claim of the propo-
sition.
II. Let F be the set of all affine functions which are minorants of f , and define
the function
f̄(x) := sup_{ϕ∈F} ϕ(x).
We have proved that f¯(x) is equal to f on rint (Dom f ) (and at any x ∈ rint (Dom f )
in fact sup in the right hand side can be replaced with max). To complete
the proof of the proposition, we should prove that f¯ is equal to f outside of
cl(Dom f ) as well. Note that this is the same as proving that f¯(x) = +∞ for all
x ∈ Rn \ cl(Dom f ). To see this, consider any x̄ ∈ Rn \ cl(Dom f ). As cl(Dom f )
is a closed convex set, x̄ can be strongly separated from Dom f , see Separation
Theorem (ii) (Theorem II.7.3). Thus, there exist ζ > 0 and z ∈ Rn such that
z ⊤ x̄ ≥ z ⊤ x + ζ, ∀x ∈ Dom f. (15.5)
In addition, we already know that there exists at least one affine minorant of f ,
i.e., there exist a and d such that
f (x) ≥ d⊤ x − a, ∀x ∈ Dom f. (15.6)
Multiplying both sides of (15.5) by a positive weight λ and then adding it to
(15.6), we get
f(x) ≥ (d + λz)⊤x + [λζ − a − λz⊤x̄] =: ϕλ(x),   ∀x ∈ Dom f.

This inequality clearly says that ϕλ (·) is an affine minorant of f on Rn for every
λ > 0. The value of this minorant at x = x̄ is equal to d⊤ x̄ − a + λζ and therefore
it goes to +∞ as λ → +∞. We see that the supremum of affine minorants of f at
x̄ indeed is +∞, as claimed. This concludes the proof of Proposition III.15.5.
Let us now prove Proposition III.15.4.
Proof of Proposition III.15.4. Under the premise of the proposition, f is a
proper lsc convex function. Let F be the set of all affine functions which are
minorants of f , and let
f̄(x) := sup_{ϕ∈F} ϕ(x)

be the supremum of all affine minorants of f . Then, by Proposition III.15.5, we


know that f¯ is equal to f everywhere on rint (Dom f ) and everywhere outside of
cl(Dom f ). Thus, all we need to prove is that when f is also lsc, f¯ is equal to f
everywhere on rbd(Dom f ) as well.
Consider any x̄ ∈ rbd(Dom f ). Recall that by construction f¯ is everywhere ≤ f .
So, there is nothing to prove if f¯(x̄) = +∞. Thus, we assume that f¯(x̄) = c < ∞
holds, and we will prove that in this case f (x̄) = c holds as well. Since f¯ ≤ f
everywhere, proving f (x̄) = c is the same as proving f (x̄) ≤ c. In fact f (x̄) ≤ c
holds due to the lower semicontinuity of f : pick any x′ ∈ rint (Dom f ) and consider
a sequence of points xi ∈ [x′ , x̄) converging to x̄. For all i, by Lemma I.1.30, we
have xi ∈ rint (Dom f ) and thus by Proposition III.15.5 we conclude
f (xi ) = f¯(xi ).
Also, since xi ∈ [x′ , x̄), there exists λi ∈ (0, 1] such that xi = (1 − λi )x̄ + λi x′ .
Note that as i → ∞, we have xi → x̄ and so λi → +0. Since f¯ is clearly convex
(as it is the supremum of a family of affine and thus convex functions), we have
f¯(xi ) ≤ (1 − λi )f¯(x̄) + λi f¯(x′ ).
Noting that f¯(x′ ) = f (x′ ) (recall x′ ∈ rint (Dom f ) and apply Proposition III.15.5)
as well and putting things together, we get
f (xi ) ≤ (1 − λi )f¯(x̄) + λi f (x′ ).
Moreover, as i → ∞, we have λi → +0 and so the right hand side in our inequality
converges to f¯(x̄) = c. In addition, as i → ∞, we have xi → x̄ and since f is
lower semicontinuous, we get f (x̄) ≤ c.
We see why “translation of mathematical facts from one mathematical language
to another” – in our case, from the language of convex sets to the language of
convex functions – may be fruitful: because we invest a lot into the process rather
than run it mechanically.
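To make the "outer description" of Proposition III.15.4 more tangible, here is a small numerical sketch (assuming NumPy; the function and the sampled minorants are our own choices): for the convex function f(x) = x², the tangent lines ℓ_j(x) = f(x_j) + f′(x_j)(x − x_j) are affine minorants that are exact at the points x_j, and their pointwise maximum is a lower model of f that improves as more tangents are collected — the mechanism exploited by cutting-plane type methods.

import numpy as np

f  = lambda x: x ** 2
df = lambda x: 2.0 * x                       # slope of the affine minorant exact at x

xs      = np.linspace(-2.0, 2.0, 401)        # evaluation grid
anchors = np.linspace(-2.0, 2.0, 9)          # points x_j at which tangents are taken
# Pointwise maximum of the tangents l_j(x) = f(x_j) + f'(x_j) * (x - x_j).
model = np.max(f(anchors)[:, None] + df(anchors)[:, None] * (xs[None, :] - anchors[:, None]),
               axis=0)

print("model is a minorant of f:", bool(np.all(model <= f(xs) + 1e-12)))
print("largest gap f - model   :", float(np.max(f(xs) - model)))   # shrinks as anchors are added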
Closure of a convex function. Proposition III.15.4 presents a nice result on
the outer description of a proper lower semicontinuous convex function: it is the
supremum of a family of affine functions. Note that the reverse is also true: the
supremum of every family of affine functions is a proper lsc convex function,
provided that this supremum is finite at least at one point. This is because we
know from Section 13.1 that the supremum of every family of convex functions is
convex, and from Corollary III.15.3 that the supremum of lsc functions, e.g., affine
ones (these are in fact even continuous), is lower semicontinuous.
Now, what to do with a convex function which is not lower semicontinuous?
There is a similar question about convex sets: what to do with a convex set which
is not closed? We can resolve this question very simply by passing from the set to
its closure and thus getting a “much easier to handle” object which is very “close”
to the original one: the “main part” of the original set – its relative interior –
remains unchanged, and the “correction” adds to the set something relatively
small – (part of) its relative boundary. The same approach works for convex
functions as well: if a proper convex function f is not lower semicontinuous (i.e.,
its epigraph is convex and nonempty, but is not closed), we can “correct” the
function by replacing it with a new function with the epigraph being the closure
of epi{f }. To justify this approach, we, of course, should be sure that the closure
of the epigraph of a convex function is also an epigraph of such a function. This
indeed is the case, and to see it, it suffices to note that a set G in Rn+1 is the
epigraph of a function taking values in R ∪ {+∞} if and only if the intersection
of G with every vertical line {x = const, t ∈ R} is either empty, or is a closed
ray of the form {x = const, t ≥ t̄ > −∞}. Now, it is absolutely evident that if
G = cl(epi{f }), then the intersection of G with a vertical line is either empty, or
is a closed ray, or is the entire line (the last case indeed can take place – look at
the closure of the epigraph of the function equal to −1/x for x > 0 and +∞ for
x ≤ 0). We see that in order to justify our idea of “proper correction” of a convex
function we should prove that if f is convex, then the last of the indicated three
cases, i.e., the intersection of cl(epi{f }) with a vertical line is the entire line,
never occurs. However, we know from Proposition III.13.11 that every convex
function f is bounded from below on every compact set. Thus, cl(epi{f }) indeed
cannot contain an entire vertical line. Therefore, we conclude that the closure of
the epigraph of a convex function f is the epigraph of a certain function called
the closure of f [notation: cl f ] defined as:
cl(epi{f }) = epi{cl f }.
Of course, the function cl f is convex (its epigraph is convex as it is cl(epi{f })
and epi{f } itself is convex). Moreover, since the epigraph of cl f is closed, cl f is
lsc. And of course we have the following immediate observation.

Observation III.15.6 The closure of an lsc convex function f is f itself.

Proof. Indeed, when f is convex, epi{f } is convex, and when f is lsc, epi{f }
is closed by Proposition III.15.2. Hence, under the premise of this observation,
epi{f } is convex and closed and thus, by definition of cl f , is the same as epi{cl f },
implying that f = cl f .
The following statement gives an instructive alternative description of cl f in
terms of f .

Proposition III.15.7 Let f : Rn → R ∪ {+∞} be a convex function. Then,


its closure cl f satisfies the following:
(i) Affine minorants of cl f are exactly the same as affine minorants of f ,
and cl f is the supremum of these minorants, i.e., for all x ∈ Rn we have
cl f(x) = sup_ϕ {ϕ(x) : ϕ is an affine minorant of f}.        (15.7)

Also, for every x ∈ rint (Dom(cl f )) = rint (Dom f ), we can replace sup in
the right hand side of (15.7) with max.
Moreover,
(a) f (x) ≥ cl f (x), ∀x ∈ Rn ,
(b) f (x) = cl f (x), ∀x ∈ rint (Dom f ), (15.8)
(c) f (x) = cl f (x), ∀x ̸∈ cl(Dom f ).
Thus, the “correction” f 7→ cl f may vary f only at the points from rbd(Dom f ),
implying that
Dom f ⊆ Dom(cl f ) ⊆ cl(Dom f ),
hence rint (Dom f ) = rint (Dom(cl f )).
In addition, cl f is the supremum of all convex lower semicontinuous mi-
norants of f .
(ii) For all x ∈ Rn , we have
cl f(x) = lim_{r→+0} inf_{x′: ∥x′−x∥2 ≤ r} f(x′).

Proof. There is nothing to prove when f ≡ +∞. In this case cl f ≡ +∞ as well,


Dom f = Dom cl f = ∅, and all claims are trivially satisfied. Thus, assume from
now on that f is proper.
(i): We will prove this part in several simple steps.
1°. By construction, epi{cl f} = cl(epi{f}) ⊇ epi{f}, which implies (15.8.a).
Also, as cl(epi{f }) ⊆ [cl(Dom f )] × R, we arrive at (15.8.c).
2°. Note that from (15.8.a) we deduce that every affine minorant of cl f is an
affine minorant of f as well. Moreover, the reverse is also true. Indeed, let g(x) be
an affine minorant of f . Then, we clearly have epi{g} ⊇ epi{f }, and as epi{g} is
closed, we also get epi{g} ⊇ cl(epi{f }) = epi{cl f }. Note that epi{g} ⊇ epi{cl f }
is simply the same as saying that g is an affine minorant of cl f . Thus, affine
minorants of f and of cl f indeed are the same. Then, as cl f is lsc and proper
(since cl f ≤ f and f is proper), by applying Proposition III.15.4 to cl f and also
applying Proposition III.15.5 to f , we deduce (15.8.b) and (15.7).
Finally, if g is a convex lsc minorant of f , then g is definitely proper, and
thus by Proposition III.15.4 it is the supremum of all its affine minorants. These
minorants of g are affine minorants of f as well, and thus also affine minorants
of cl f . The bottom line is that g is ≤ the supremum of all affine minorants of f ,
which, as we already know, is cl f. Thus, every convex lsc minorant of f is a minorant
of cl f, implying that the supremum f̃ of these lsc convex minorants of f satisfies
f̃ ≤ cl f. The latter inequality is in fact an equality since cl f itself is a convex
lsc minorant of f by (15.8.a). This completes the proof of part (i).
3°. To verify (ii), we need to prove the following two facts:

(ii-1) For every sequence {xi } such that as i → ∞ we have xi → x̄


and f (xi ) → s (where s may be a finite number or infinity), we have
s ≥ cl f (x̄).
(ii-2) For every x̄, there exists a sequence xi → x̄ such that
limi→∞ f (xi ) = cl f (x̄).

To prove (ii-1), note that under the premise of this claim we have s ̸= −∞ since
f is below bounded on bounded subsets of Rn (Proposition III.13.11). There is
nothing to verify when s = +∞. So, suppose s ∈ R. Then, the point [x̄; s] is in
cl(epi{f }) = epi{cl f }, and thus cl f (x̄) ≤ s, as claimed.
To prove (ii-2), note that the claim is trivially true when cl f (x̄) = +∞. Indeed,
in this case f (x̄) = +∞ as well due to (15.8.a), and for all i = 1, 2, . . . we can
take xi = x̄. Now, consider a point x̄ such that cl f (x̄) < ∞. Then, we have
[x̄; cl f (x̄)] ∈ epi{cl f } = cl(epi{f }). Thus, there exists a sequence [xi ; ti ] ∈ epi{f }
such that [xi ; ti ] → [x̄; cl f (x̄)] as i → ∞. Passing to a subsequence, we can assume
that f (xi ) have a limit, finite or infinite, as i → ∞. Hence, limi→∞ xi = x̄ and
limi→∞ f (xi ) = lim inf i→∞ f (xi ) ≤ limi→∞ ti = cl f (x̄). Recall also that from
(ii-1) we have limi→∞ f (xi ) ≥ cl f (x̄), so we conclude limi→∞ f (xi ) = cl f (x̄) as
desired.
Figure III.4. Subdifferentials of a univariate convex function


dotted: convex function with domain [a, d]
point a: at the boundary point a of the domain, the subgradients of f are
the slopes of lines like AP and AQ, and ∂f (a) = {g : g ≤ slope(AP )}
point b: at point b, subgradients are the slopes of lines like RR, EE, SS, and
∂f (b) = {g : slope(RR) ≤ g ≤ slope(SS)}
point c: just one subgradient – the slope of the tangent line, taken at point T ,
to the graph of f
point d: similar to a, ∂f (d) = {g : g ≥ slope(U B)}

15.2 Subgradients
Let f : Rn → R∪{+∞} be a convex function, and let x ∈ Dom f . Recall from our
discussion in the preceding section that f may admit an affine minorant d⊤ x − a
which coincides with f at x, i.e.,
f (y) ≥ d⊤ y − a, ∀y ∈ Rn , and f (x) = d⊤ x − a.
The equality relation above is equivalent to a = d⊤ x − f (x), and substituting this
representation of a into the first inequality, we get
f (y) ≥ f (x) + d⊤ (y − x), ∀y ∈ Rn . (15.9)
Thus, if f admits an affine minorant which is exact at x, then there exists d ∈ Rn
which gives rise to the inequality (15.9). In fact the reverse is also true: if d is
such that (15.9) holds, then the right hand side of (15.9), regarded as a function
of y, is an affine minorant of f which coincides with f at x.
Now note that (15.9) expresses a specific property of a vector d and leads to
the following very important definition which generalizes the notion of gradient
for smooth convex functions to nonsmooth convex functions.

Definition III.15.8 [Subgradient of a convex function] Let f : Rn → R ∪


{+∞} be a convex function. Given x̄ ∈ Rn , a vector d ∈ Rn is called a
subgradient of f at x̄ if d is the slope of an affine minorant of f which is
exact at x̄. That is, d is a subgradient of f at x̄ if d satisfies
f (y) ≥ f (x̄) + d⊤ (y − x̄), ∀y ∈ Rn .
The set of all subgradients of f at a point x̄ is called the subdifferential of f at
x̄ [notation: ∂f (x̄)].

Figure III.4 illustrates the notions of subgradient and subdifferential.


Subgradients of convex functions play an important role in the theory and nu-
merical methods for Convex Optimization – they are quite reasonable surrogates
of gradients in the cases when the latter do not exist. Let us present a simple and
instructive illustration of this. Recall that Theorem III.14.2 states that
A necessary and sufficient condition for a convex function f : Rn →
R ∪ {+∞} to attain its minimum at a point x∗ ∈ int(Dom f ) where f is
differentiable is that ∇f (x∗ ) = 0.
The “nonsmooth” version of this statement is as follows.

Proposition III.15.9 [Necessary and sufficient optimality condition for


nonsmooth convex functions] A necessary and sufficient condition for a con-
vex function f : Rn → R ∪ {+∞} to attain its minimum at a point x∗ ∈
Dom f is the inclusion 0 ∈ ∂f (x∗ ).

Proof. Given x∗ ∈ Dom f , by definition of the subdifferential, 0 ∈ ∂f (x∗ ) if


and only if the vector 0 is a subgradient of f at x∗ , which (by the definition of
subgradient) holds if and only if
f (y) ≥ f (x∗ ) + 0⊤ (y − x∗ ), ∀y ∈ Rn ,
which simply states that x∗ is a minimizer of f .

In fact, if a convex function f is differentiable at x̄ ∈ int(Dom f ), then ∂f (x̄) is


the singleton {∇f (x̄)} (see Proposition III.15.10 below). Thus, Proposition III.15.9
indeed is an extension of Theorem III.14.2 to the case when f is possibly non-
differentiable.
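Before discussing why this matters, here is a tiny illustration of how subgradients substitute for gradients in algorithms (a sketch assuming NumPy; the objective, step sizes, and names are our own choices and not part of the text's development): the basic subgradient method applied to the nonsmooth convex function f(x) = ∥x − c∥1.

import numpy as np

c = np.array([1.0, -2.0, 0.5])
f = lambda x: np.sum(np.abs(x - c))        # nonsmooth convex; minimized exactly at x = c
subgrad = lambda x: np.sign(x - c)         # one valid subgradient of f at x (0 is used where x_i = c_i)

x = np.zeros(3)
for k in range(1, 201):
    x = x - (1.0 / k) * subgrad(x)         # subgradient steps with diminishing step sizes 1/k
print("f(x) after 200 steps:", f(x))       # approaches the optimal value 0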
Looking at the (absolutely trivial!) proof of Proposition III.15.9, one can ask: how
can such a trivial necessary and sufficient optimality condition be useful? Why is it
more informative than the tautological necessary and sufficient optimality condition
"f(x∗) ≤ f(x) for all x"? Well, the usefulness of the Fermat
optimality condition stems not from the condition per se, but from our knowledge
on gradients, including (but not reduced to) our ability to compute, to the extent
given by the standard Calculus, the gradients. Similarly, the usefulness of Propo-
sition III.15.9 stems from our knowledge on subgradients, including the ability,
to the extent given by calculus of subgradients, to compute subdifferentials of
convex functions. Let us become acquainted with the basics of this calculus.
Here is a summary of the most elementary properties of the subgradients.

Proposition III.15.10 Let f : Rn → R ∪ {+∞} be a convex function.


Then,
(i) ∂f (x) is a closed convex set for any x ∈ Dom f , ∂f (x) ̸= ∅ whenever
x ∈ rint (Dom f ) (this is the most important fact about subgradients), and
∂f (x) is bounded whenever x ∈ int(Dom f ).
(ii) If x ∈ int(Dom f ) and f is differentiable at x, then ∇f (x) is the only
subgradient of f at x: ∂f (x) = {∇f (x)}.
(iii) “closedness of the subdifferential mapping:” Let gi ∈ ∂f (xi ) and


[xi ; gi ] → [x; g] as i → ∞. Assume also that x ∈ Dom f and f is contin-
uous at x. Then, g ∈ ∂f (x).
(iv) Let Y be a nonempty convex compact subset of int(Dom f ). Then,
the set G = {[x; g] ∈ Rn × Rn : x ∈ Y, g ∈ ∂f (x)} is compact.

Proof. (i): Closedness and convexity of ∂f (x) are evident from their definition as
(15.9) is an infinite system of nonstrict linear inequalities, indexed by y ∈ Dom f ,
on variable d.
When x ∈ rint (Dom f ), Proposition III.15.5 provides us with an affine function
which underestimates f everywhere and coincides with f at x. The slope of this
affine function is clearly a subgradient of f at x, and thus ∂f (x) ̸= ∅.
Boundedness of ∂f (x) when x ∈ int(Dom f ) is an immediate consequence of
item (iv) to be proved soon.
(ii): Suppose x ∈ int(Dom f ) and f is differentiable at x. Then, by the gradient
inequality we have ∇f (x) ∈ ∂f (x). Let us prove that in this case, ∇f (x) is the
only subgradient of f at x. Consider any d ∈ ∂f (x). Then, by the definition of
subgradient, we have

f (y) − f (x) ≥ d⊤ (y − x), ∀y ∈ Rn .

Now, consider any fixed direction h ∈ Rn and any real number t > 0. By substi-
tuting y = x + th in the preceding inequality and then dividing both sides of the
resulting inequality by t, we obtain

[f(x + th) − f(x)] / t ≥ d⊤h.
Taking the limit of both sides of this inequality as t → +0, we get

h⊤ ∇f (x) ≥ h⊤ d.

Since h was an arbitrary direction, this inequality is valid for all h ∈ Rn , which
is possible if and only if d = ∇f (x).
(iii): Under the premise of this part, for every y ∈ Rn and for all i = 1, 2, . . .,
we have

f (y) ≥ f (xi ) + gi⊤ (y − xi ).

Passing to limit as i → ∞, we get

f (y) ≥ f (x) + g ⊤ (y − x), ∀y ∈ Rn ,

and thus g ∈ ∂f (x).


(iv): Let us start with the following observation:
Let f : Rn → R ∪ {+∞} be convex, and consider x̄ ∈ int(Dom f ).


Suppose f is Lipschitz continuous, with constant L with respect to ∥ · ∥2 ,
in a neighborhood V of x̄, that is, |f (x) − f (y)| ≤ L∥x − y∥2 for all
x, y ∈ V . Then,
∥g∥2 ≤ L, ∀g ∈ ∂f (x̄).
Indeed, when g ∈ ∂f(x̄), by the subgradient inequality and Lipschitz
continuity of f we have g⊤(y − x̄) ≤ f(y) − f(x̄) ≤ L∥y − x̄∥2 for all y
close to x̄, that is, g⊤h ≤ L∥h∥2 for all small enough h ∈ Rn and hence,
by homogeneity, for all h ∈ Rn, implying that ∥g∥2 = max_h {g⊤h : ∥h∥2 ≤ 1} ≤ L.
Now, under the premise of (iv), we can find a compact set Y ′ ⊂ int(Dom f ) such
that Y ⊂ int Y ′ . By Theorem III.13.9, f is Lipschitz continuous, with certain
constant L, on Y ′ , implying by the preceding observation that ∥g∥2 ≤ L for all
g ∈ ∂f (x) with x ∈ Y . Then, as Y is bounded, the set G = {[x; g] ∈ Rn × Rn :
x ∈ Y, g ∈ ∂f (x)} is bounded. Moreover, this set is closed by (iii) (recall that Y
is compact and f is continuous on Y ), so that G is compact.
Proposition III.15.10 sheds light onto why subgradients are good surrogates of
gradients: at a point where gradient exists, the gradient is the unique subgradient;
but, in contrast to the gradient, a subgradient exists basically everywhere (for
sure in the relative interior of the domain of the function).
Let us examine a simple function and its subgradients.
Example III.15.1 Consider the function f : R → R given by
f (x) = |x| = max {x, −x} .
This function f is, of course, convex (as maximum of two linear forms x and −x).
Whenever x ̸= 0, f is differentiable at x with the derivative +1 for x > 0 and −1
for x < 0. At the point x = 0, f is not differentiable; nevertheless, it must have
subgradients at this point (since 0 ∈ int(Dom f )). And indeed, it is immediately
seen that the subgradients of f at x = 0 are exactly the real numbers from the
segment [−1, 1]. Thus,

{−1},
 if x < 0,
∂|x| = [−1, 1], if x = 0,

{+1}, if x > 0.
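As a quick numerical illustration of this formula (a sketch assuming NumPy; entirely our own check), one can test the subgradient inequality |y| ≥ |0| + d(y − 0) for several candidate slopes d:

import numpy as np

ys = np.linspace(-2.0, 2.0, 401)
for d in (-1.5, -1.0, -0.3, 0.0, 0.7, 1.0, 1.2):
    ok = bool(np.all(np.abs(ys) >= d * ys))     # |y| >= |0| + d*(y - 0) for all test points y?
    print(f"d = {d:+.2f}: valid subgradient of |.| at 0? {ok}")
# Slopes with |d| <= 1 pass, slopes with |d| > 1 fail, matching the formula above.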


It is important to note that at the points from the relative boundary of the
domain of a convex function, even a “good” one, we may not have any subgradi-
ents. That is, it is possible to have ∂f (x) = ∅ for a convex function f at a point
x ∈ rbd(Dom f ). We give an example of this next.
Example III.15.2 Consider the function
f(x) = −√x for x ≥ 0,   f(x) = +∞ for x < 0.
Convexity of this function follows from convexity of its domain and Example III.12.2.
Consider the point [0; f(0)] ∈ rbd(epi{f}). At this point there is no non-vertical
supporting line to the set epi{f}: an affine minorant of f which is exact at x = 0 would
mean −√x ≥ dx for all x ≥ 0, i.e., d ≤ −1/√x for all x > 0, which is impossible.
Consequently, there is no affine minorant of f which is exact at x = 0, and ∂f(0) = ∅. ♢

A significant – and important – part of Convex Analysis deals with subgradient


calculus, which is the set of rules for computing subgradients of “composite” func-
tions, like sums, superpositions, maxima, etc., given subgradients of the operands.
These rules extend the standard Calculus rules to nonsmooth convex functions,
and they are quite nice and useful. Here, we list several “self-evident” versions of
these rules:

1. Subdifferential of nonnegative weighted sum: Let f, g be convex functions on


Rn . Consider x ∈ Dom f ∩ Dom g and λ, µ ∈ R+ . If d ∈ ∂f (x) and e ∈ ∂g(x),
then λd + µe ∈ ∂[λf + µg](x).
2. Subdifferential of pointwise supremum: Let {fα (·)}α∈A be a family of convex
functions on Rn . Consider the convex function f (x) := supα∈A fα (x) and x̄ ∈
Dom f . Suppose that ᾱ is such that f (x̄) = fᾱ (x̄). Then, for any d¯ ∈ ∂fᾱ (x̄),
we have d¯ ∈ ∂f (x̄).
Here is the justification: f (x) ≥ fᾱ (x) ≥ fᾱ (x̄) + d¯⊤ (x − x̄) for all x ∈ Rn , that
is, f (x) ≥ fᾱ (x̄) + d¯⊤ (x − x̄) for all x, and this inequality becomes equality at
x = x̄ as f (x̄) = fᾱ (x̄).
3. Subdifferential of convex monotone/affine superposition [chain rule] :
Let f1 (x), . . . , fK (x) be convex functions on Rn , and let F be a convex function
on RK . Suppose for some 0 ≤ k ≤ K the functions f1 , . . . , fk are affine, and
also F (y) is nondecreasing in yk+1 , . . . , yK . Recall that the superposition
g(x) := F(f1(x), . . . , fK(x)) if fk(x) < +∞ for all k ≤ K,   g(x) := +∞ otherwise,
is a convex function of x. Consider x̄ ∈ ∩_{i=1}^{K} Dom(fi) such that

ȳ := [f1 (x̄); . . . ; fK (x̄)] ∈ Dom F.


Let g^i ∈ ∂fi(x̄) for all i ≤ K and e ∈ ∂F(ȳ). Then, the vector d := Σ_{i=1}^{K} e_i g^i
satisfies d ∈ ∂[F (f1 , . . . , fK )](x̄).
The justification of this is as follows. Let h ∈ Rn be an arbitrary direction,
and consider x = x̄ + h and y = ȳ + [(g 1 )⊤ h; (g 2 )⊤ h; . . . ; (g K )⊤ h]. Then, for all
i ≤ K, we have

yi = ȳi + (g i )⊤ h = fi (x̄) + (g i )⊤ (x − x̄).

Moreover, for i ≤ k as f1 , . . . , fk are affine we have yi = fi (x); and for i > k, as


g i ∈ ∂fi (x̄), we have fi (x) ≥ yi . Consequently, using the partial monotonicity
of F , we conclude F (f1 (x), . . . , fK (x)) ≥ F (y). Now, e ∈ ∂F (ȳ), which implies
the first inequality in the following chain:


F(y) ≥ F(ȳ) + e⊤(y − ȳ) = F(f1(x̄), . . . , fK(x̄)) + (Σ_i e_i g^i)⊤ h
     = F(f1(x̄), . . . , fK(x̄)) + d⊤(x − x̄).
Here, the first equality follows from yi = ȳi + (g i )⊤ h for all i, and the sec-
ond equality is due to x = x̄ + h. Finally, this inequality combines with
F (f1 (x), . . . , fK (x)) ≥ F (y) to imply that

F(f1(x), . . . , fK(x)) ≥ F(f1(x̄), . . . , fK(x̄)) + d⊤(x − x̄),

as desired.
Advanced versions of these subgradient calculus rules, under appropriate as-
sumptions, describe how the entire subdifferentials of the resulting functions are
obtained from those of operands; the related considerations, however, are beyond
our scope.
We close this section by providing an illustration of the second rule.
Example III.15.3 In this example, we will examine the subgradients of spectral
norm of a symmetric matrix. For X ∈ Sn , its spectral norm ∥X∥ is given by
∥X∥ := max_{e∈Rn} {|e⊤Xe| : ∥e∥2 ≤ 1}.

From this definition, it is clear that ∥X∥ is a convex function of X.


Given X, we can compute a maximizer of the optimization problem defining
∥X∥ by finding an eigenvalue-eigenvector pair (λX , e∗ ) of X such that λX is
the largest in magnitude of the eigenvalues of X. Let eX be the unit length
normalization of e∗ . Thus,
∥X∥ = max_{e:∥e∥2≤1} |e⊤Xe| = max_{e:∥e∥2≤1} |Tr(X[ee⊤])| = |Tr(X[eX eX⊤])| = Tr(X[sign(λX) eX eX⊤]),

where the third equality follows from the choice of eX, and the last equality is
due to Tr(X[eX eX⊤]) = λX. Then, using item 2 in our subgradient calculus rules,
we conclude that the symmetric matrix E(X) := sign(λX) eX eX⊤ is a subgradient
of the function ∥ · ∥ at X. That is,
∥Y ∥ ≥ Tr(Y E(X)) = ∥X∥ + Tr(E(X)(Y − X)), ∀Y ∈ Sn ,
where the equality holds due to the choice of E(X) guaranteeing ∥X∥ = Tr(XE(X)).
To see that the above is indeed a subgradient inequality, recall that the inner
product on Sn is the Frobenius inner product ⟨A, B⟩ = Tr(AB).
Let us close this example by discussing smoothness properties of ∥X∥. Recall
that every norm ∥y∥ in Rm is nonsmooth at the origin y = 0. However, ∥X∥ is
a nonsmooth function of X even at points other than X = 0. In fact, ∥ · ∥ is
continuously differentiable in a neighborhood of every point X ∈ Sn where the
maximum magnitude eigenvalue is unique and is of multiplicity 1, and can lose


smoothness at other points. ♢
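Before leaving this example, here is a small numerical sanity check of the subgradient inequality derived above (a sketch assuming NumPy; the function names and random data are our own):

import numpy as np

rng = np.random.default_rng(0)

def spec_norm_and_subgrad(X):
    # Spectral norm of a symmetric X and the subgradient E(X) built in Example III.15.3.
    w, V = np.linalg.eigh(X)
    i = np.argmax(np.abs(w))                  # index of the largest-in-magnitude eigenvalue
    e = V[:, i]
    return np.abs(w[i]), np.sign(w[i]) * np.outer(e, e)

n = 5
A = rng.standard_normal((n, n)); X = (A + A.T) / 2
nX, E = spec_norm_and_subgrad(X)
for _ in range(5):
    B = rng.standard_normal((n, n)); Y = (B + B.T) / 2
    lhs = np.abs(np.linalg.eigvalsh(Y)).max()          # ||Y||
    rhs = nX + np.trace(E @ (Y - X))                   # ||X|| + <E(X), Y - X>
    print(f"||Y|| = {lhs:.4f} >= {rhs:.4f} : {lhs >= rhs - 1e-12}")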

15.3 Subdifferentials and directional derivatives of convex functions


Let f be a convex function on Rn and consider x ∈ int(Dom f ) (in fact our
construction to follow admits an immediate generalization for the case when
x ∈ rint (Dom f ) as well). Consider any direction h ∈ Rn and the univariate
function
ϕ(t) := f (x + th)
associated with h. Note that ϕ(t) is a real-valued convex function of t in a neigh-
borhood of 0, thus for all small enough positive s and t we have
[ϕ(0) − ϕ(−s)]/s ≤ [ϕ(t) − ϕ(0)]/t.
Moreover, the right hand side of this inequality is nondecreasing in t and, by the
inequality itself, bounded from below; hence it has a limit as t → +0. This means
the existence of the directional derivative of f taken at x along the direction h,
i.e., the quantity

Df(x)[h] = lim_{t→+0} [f(x + th) − f(x)]/t.
As a function of h, the function Df (x)[h] is clearly positively homogeneous of
degree 1, i.e.,
Df (x)[λh] = λDf (x)[h], ∀λ ≥ 0.
In addition, Df(x)[h] is a convex function of h. To see this, let r > 0 be such
that the Euclidean ball of radius r centered at x is contained in Dom f, and set
B := {h ∈ Rn : ∥h∥2 ≤ r}. Then, the functions

f_s(h) := [f(x + sh) − f(x)]/s

are convex in h on B as long as 0 < s ≤ 1. Moreover, as s → +0,
on B they pointwise converge to Df (x)[h], which clearly implies the convexity of
Df (x)[h] as a function of h ∈ B. Finally, since as a function of h, Df (x)[h] is pos-
itively homogeneous of degree 1, its convexity on B clearly implies its convexity
on the entire Rn . Note that convexity of f implies that
f (x + h) ≥ f (x) + Df (x)[h], ∀(x, h : x ∈ int(Dom f ), x + h ∈ Dom f ). (15.10)
We have arrived at the following result.

Lemma III.15.11 Let f be a convex function on Rn . For any x ∈ int(Dom f ),


the subdifferential ∂f (x) of f at x is exactly the same as the subdifferential
of Df (x)[·] at the origin.
Proof. Suppose d is a subgradient of f at x, and thus f (x + th) − f (x) ≥


td⊤ h holds for all h ∈ Rn and all small enough t > 0. This then implies that
Df (x)[h] ≥ d⊤ h, that is, d is a subgradient of Df (x)[h] at h = 0. For the reverse
direction, suppose that d is a subgradient of Df (x)[·] at h = 0. Then, we have
Df (x)[h] ≥ d⊤ h for all h, implying, by (15.10), that f (x + h) ≥ f (x) + d⊤ h
whenever x + h ∈ Dom f .
Our goal is to demonstrate that when x ∈ int(Dom f ), the subdifferential ∂f (x)
is large enough to fully define Df (x)[·]:

Theorem III.15.12 Let f be a convex function on Rn and x ∈ int(Dom f ).


Then,
Df(x)[h] = max_d {d⊤h : d ∈ ∂f(x)}.        (15.11)

Note that for a convex function f , at x ∈ int(Dom f ), by Proposition III.15.10


we know that ∂f (x) is bounded and thus in (15.11) the use of max as opposed to
sup is justified. We are about to prove Theorem III.15.12 using the fundamental
Hahn-Banach Theorem (finite-dimensional version) given below.

Theorem III.15.13 [Hahn-Banach Theorem, finite-dimensional version] Let


D(·) be a real-valued convex positively homogeneous, of degree 1, function
on Rn , e ∈ Rn and E be a linear subspace of Rn such that
e⊤ z ≤ D(z), ∀z ∈ E.
Then, there exists e′ ∈ Rn such that [e′ ]⊤ z ≡ e⊤ z for all z ∈ E and [e′ ]⊤ z ≤
D(z) for all z ∈ Rn . In other words, a linear functional defined on a linear
subspace of Rn and majorized on this subspace by D(·), can be extended
from the subspace to a linear functional on the entire space in such a way
that the extension is majorized by D(·) everywhere.

Proof of Theorem III.15.12. In this proof we will show that Theorem III.15.13
implies Theorem III.15.12.
Consider any d ∈ ∂f(x). Then, by Lemma III.15.11, d is in the subdifferential
of Df(x)[·] at the origin, i.e., Df(x)[h] ≥ d⊤h for all h. Thus, we conclude
Df(x)[h] ≥ max_d {d⊤h : d ∈ ∂f(x)}.

To prove the opposite inequality, let us fix h ∈ Rn , and let us verify that
g := Df (x)[h] ≤ maxd {d⊤ h : d ∈ ∂f (x)}. There is nothing to prove when
h = 0, so let h ̸= 0. Setting ϕ(t) := Df (x)[th], we get a convex (since, as we
already know, Df (x)[·] is convex) univariate function such that ϕ(t) = gt for
t ≥ 0. Then, this together with the convexity of ϕ implies that ϕ(t) ≥ gt for all
t ∈ R. By applying Hahn-Banach Theorem to the function D(z) := Df (x)[z] (we
already know that this function satisfies the premise of Hahn-Banach Theorem),
E := R(h) and the linear form e⊤ (th) = gt, t ∈ R, on E, we conclude that there
exists e ∈ Rn such that e⊤ u ≤ Df (x)[u] for all u ∈ Rn and e⊤ h = g = Df (x)[h].
By the first relation, e is a subgradient of Df (x)[·] at the origin and thus, by


Lemma III.15.11, e ∈ ∂f (x), so that
max d⊤ h ≥ e⊤ h = Df (x)[h].
d∈∂f (x)

Thus, the right hand side in (15.11) is greater than or equal to the left hand side.
The opposite inequality has already been proved, so that (15.11) is an equality.
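As a numerical illustration of (15.11) (a sketch assuming NumPy; the random data are our own): for f(x) = max_i a_i⊤x, all pieces are active at x = 0, ∂f(0) is the convex hull of the slopes a_i (item 2 of the subgradient calculus above), and the directional derivative along h is max_i a_i⊤h.

import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 4))                      # slopes a_i of the pieces
f = lambda x: np.max(A @ x)                          # f(x) = max_i a_i^T x, so f(0) = 0

x = np.zeros(4)                                      # all pieces are active at x = 0
for _ in range(3):
    h = rng.standard_normal(4)
    finite_diff = (f(x + 1e-8 * h) - f(x)) / 1e-8    # numerical Df(0)[h]
    predicted = (A @ h).max()                        # max of d^T h over d in conv{a_1, ..., a_6}
    print(f"Df(0)[h] ~ {finite_diff:.6f}, max over subgradients = {predicted:.6f}")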

Remark III.15.14 The reasoning in the proof of Theorem III.15.12 implies the
following fact:
Let f be a convex function. Consider any x ∈ int(Dom f) and any affine
plane M such that x ∈ M. Then, "every subgradient, taken at x, of the
restriction f|_M of f onto M can be obtained from a subgradient of f."
That is, if e is such that f(y) ≥ f(x) + e⊤(y − x) for all y ∈ M, then
there exists e′ ∈ ∂f(x) such that e⊤(y − x) = (e′)⊤(y − x) for all y ∈ M.

For completeness, we also present a proof of the finite-dimensional version of the
Hahn-Banach Theorem (Theorem III.15.13).
Proof of Theorem III.15.13. We are given a linear functional defined on a
linear subspace E of Rn and this linear functional is majorized on this subspace
by D(·); we want to prove that this linear functional can be extended from E to a
linear functional on the entire space such that it is majorized by D(·) everywhere.
To this end, it clearly suffices to prove this fact when E is of dimension n − 1
(as given this fact for E satisfying dim(E) = n − 1, we can build the desired
extension in the general case by iterating extensions “increasing the dimension
by 1”). Thus, suppose Rn = R(g) + E, where g ̸∈ E. Note that in order to specify
the desired vector e′ all we need is to determine what the value of α := (e′ )⊤ g
should be, since then e′ will be uniquely defined by the relation
(e′ )⊤ (λg + h) = λα + e⊤ h, ∀(λ ∈ R, h ∈ E).
Therefore, we wish to find α ∈ R such that
λα + e⊤ h ≤ D(λg + h), ∀(λ ∈ R, h ∈ E).
Note that when λ = 0, the preceding inequality is automatically satisfied due to
the premise of the theorem on e. Moreover, as D is positively homogeneous of
degree 1 and E is a linear subspace, all we need is to ensure that the preceding
inequality is valid when λ = ±1, that is, to ensure that
α ≤ D(g + h) − e⊤ h, ∀h ∈ E (a)
α ≥ −D(−g + h) + e⊤ h, ∀h ∈ E (b)
Now, to justify that α ∈ R satisfying the relations (a) and (b) above indeed
exists, we need to show that every possible value of the right hand side in (a) is
greater than or equal to every possible value of the right hand side in (b), that is,
that D(g + h) − e⊤ h ≥ −D(−g + h′ ) + e⊤ h′ whenever h, h′ ∈ E. By rearranging
the terms, we thus need to show that

e⊤ [h + h′ ] ≤ D(h + g) + D(−g + h′ ), ∀h, h′ ∈ E. (15.12)

Now, note that


 
D(h + g) + D(−g + h′) = 2 [ (1/2) D(h + g) + (1/2) D(−g + h′) ]
                      ≥ 2 D( (1/2)(h + g) + (1/2)(−g + h′) )
                      = D(h + h′)
                      ≥ e⊤[h + h′],

where the first inequality follows from convexity of the function D, the second
equality is due to D being positively homogeneous of degree 1, and the last
inequality is due to the facts that h + h′ ∈ E and e⊤ z is majorized by D(z) on
E. Hence, (15.12) is proved.
Remark III.15.15 The advantage of the preceding proof of the Hahn-Banach Theorem in
the finite-dimensional case is that it combines straightforwardly with what is called
transfinite induction to yield the Hahn-Banach Theorem in the case when Rn is replaced
with an arbitrary, perhaps infinite-dimensional, linear space: a linear functional
defined on a linear subspace can be extended to a linear functional on the entire space
in a way that preserves majorization by a given convex and positively homogeneous, of
degree 1, function on the space.
In the finite-dimensional case, we can alternatively prove the Hahn-Banach The-
orem via the Separation Theorem as follows: without loss of generality we can as-
sume that E ̸= Rn . Define the sets T := {[x; t] : x ∈ Rn , t ≥ D(x)} and
S := {[h; t] : h ∈ E, t = e⊤ h}. Thus, we get two nonempty convex sets with
non-intersecting relative interiors (as E ̸= Rn and D(h) majorizes e⊤ h on E).
Then, by Separation Theorem there exists a nontrivial (r ̸= 0) linear functional
r⊤ [x; t] ≡ d⊤ x+at, which separates S and T , i.e., inf y∈T r⊤ y ≥ supy∈S r⊤ y. More-
over, since S is a linear subspace, we deduce that supy∈S r⊤ y is either 0 or +∞.
Also, as T ̸= ∅, we conclude +∞ > inf y∈T r⊤ y ≥ supy∈S r⊤ y, and thus we must
have supy∈S r⊤ y = 0. In addition, we claim that a > 0. Indeed,

0 = sup_{y∈S} r⊤y ≤ inf_{y∈T} r⊤y = inf_{x∈Rn, t∈R} {d⊤x + at : t ≥ D(x)}.

As the right hand side value must be strictly greater than −∞, we see a ≥ 0.
Also, if a = 0 were to hold, then from r ̸= 0 we must have d ̸= 0. More-
over, when a = 0 and d ̸= 0, since D(·) is a finite valued function we have
inf_{x∈Rn, t∈R} {d⊤x + at : t ≥ D(x)} = −∞. But this contradicts the infimum
being bounded below by 0. Now that a > 0, by multiplying r by a−1 , we get a
separator of the form r = [−e′ ; 1]. Then,


0 = sup_{y∈S} r⊤y = sup_{[h;t]∈S} r⊤[h; t] = sup_{[h;t]∈S} {(−e′)⊤h + t}
  = sup_{h∈E, t=e⊤h} {(−e′)⊤h + t} = sup_{h∈E} {(−e′)⊤h + e⊤h},

and so we conclude that (e′)⊤h = e⊤h holds for all h ∈ E. Note also that the relation
0 = sup_{[h;t]∈S} r⊤[h; t] ≤ inf_{[x;t]∈T} r⊤[x; t] is nothing but (e′)⊤x ≤ D(x) for all
x ∈ Rn. ■
The Hahn-Banach Theorem is extremely important in its own right, and our way
of proving Theorem III.15.12 was motivated by the desire to acquaint the reader
with Hahn-Banach Theorem. If justification of Theorem III.15.12 were to be our
sole goal, we could have achieved this goal in a much broader setting and at a
cheaper cost, see solution to Exercise IV.29.D.4.
16   ⋆ Legendre transform

16.1 Legendre transform: Definition and examples


Let f : Rn → R ∪ {+∞} be a proper convex function. We know that f is
“basically” the supremum of all its affine minorants. In fact, this is exactly the
case when f is lower semicontinuous in addition to being convex and proper;
otherwise (i.e., if it is not lower semicontinuous) the corresponding equality takes
place everywhere except, perhaps, some points from rbd(Dom f ). Now, let us look
into the question of when an affine function d⊤ x − a is an affine minorant of f .
This is the case if and only if we have

f (x) ≥ d⊤ x − a, ∀x ∈ Rn ,

which holds if and only if we have

a ≥ d⊤ x − f (x), ∀x ∈ Rn .

Thus, we see that if the slope d of an affine function d⊤ x − a is fixed, then in


order for the function to be a minorant of f , it needs to satisfy

a ≥ sup_{x∈Rn} {d⊤x − f(x)}.

The supremum in the right hand side of this inequality is a certain function of d,
and we arrive at the following important definition.

Definition III.16.1 [Legendre transform (Fenchel dual) of a convex func-


tion] Given a convex function f : Rn → R ∪ {+∞}, its Legendre transform
(also called the Fenchel conjugate or Fenchel dual ) [notation: f ∗ ] is the func-
tion
f*(d) := sup_{x∈Rn} {d⊤x − f(x)} : Rn → R ∪ {+∞}.

Geometrically, the Legendre transform answers the following question: given a


slope d of an affine function, i.e., given the hyperplane t = d⊤ x in Rn+1 , what is
the minimal “shift down” of this hyperplane so that it can be placed below the
graph of f ?
The definition of Legendre transform immediately leads to a simple yet useful
observation.
Fact III.16.2 Given a proper convex function f : Rn → R ∪ {+∞}, its


Legendre transform f ∗ is a proper convex lower semicontinuous function.

Let us see some examples of simple functions and their Legendre transforms.
Example III.16.1
1. Given a ∈ R, consider the constant function
f (x) ≡ a.
Its Legendre transform is given by
f*(d) = sup_{x∈Rn} {d⊤x − f(x)} = sup_{x∈Rn} {d⊤x − a} = −a if d = 0,   +∞ otherwise.
2. Consider the affine function
f (x) = c⊤ x + a, ∀x ∈ Rn .
Its Legendre transform is given by
f*(d) = sup_{x∈Rn} {d⊤x − f(x)} = sup_{x∈Rn} {d⊤x − (c⊤x + a)} = −a if d = c,   +∞ otherwise.
3. Consider the strictly convex quadratic function
f(x) = (1/2) x⊤Ax,

where A ∈ Sn is positive definite. Its Legendre transform is given by

f*(d) = sup_{x∈Rn} {d⊤x − f(x)} = sup_{x∈Rn} {d⊤x − (1/2) x⊤Ax} = (1/2) d⊤A−1d,
where the final equality holds by examining the first-order necessary and suf-
ficient optimality condition (for maximization type objective) of differentiable
concave functions.
4. Consider the function f : R → R given by f (x) = |x|p /p, where p ∈ (1, ∞).
Then, using the first-order optimality conditions we see that the Legendre
transform of f is given by
f*(d) = sup_{x∈R} {dx − |x|^p/p} = |d|^q/q,

where q satisfies 1/p + 1/q = 1.
5. Suppose f is a proper convex function and the function g is defined to be
g(x) = f (x − a). Then, the Legendre transform of g satisfies
g*(d) = sup_{x∈Rn} {d⊤x − g(x)} = sup_{x∈Rn} {d⊤x − f(x − a)}
      = sup_{x′∈Rn} {d⊤(a + x′) − f(x′)} = d⊤a + f*(d).
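Item 3 of this example admits a quick numerical sanity check (a sketch assuming NumPy; the data and names are our own): the supremum defining f*(d) for f(x) = (1/2)x⊤Ax is attained at x = A^{-1}d and equals (1/2)d⊤A^{-1}d, so a crude sampled maximum of d⊤x − f(x) should approach, and never exceed, the closed form.

import numpy as np

rng = np.random.default_rng(2)
n = 3
M = rng.standard_normal((n, n)); A = M @ M.T + n * np.eye(n)   # a positive definite A
d = rng.standard_normal(n)

# Closed form from item 3: f*(d) = (1/2) d^T A^{-1} d, attained at x* = A^{-1} d.
closed_form = 0.5 * d @ np.linalg.solve(A, d)

# Crude sampled supremum of d^T x - (1/2) x^T A x near the maximizer (a sanity check only).
xs = np.linalg.solve(A, d) + 0.1 * rng.standard_normal((2000, n))
numeric = np.max(xs @ d - 0.5 * np.einsum('ij,jk,ik->i', xs, A, xs))

print(f"closed form f*(d) = {closed_form:.6f}, sampled sup ~ {numeric:.6f} (<= closed form)")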


16.2 Legendre transform: Main properties


Given a function f , we define its biconjugate [notation: f ∗∗ ] as the Legendre
transform of the function f ∗ , that is,
f ∗∗ := (f ∗ )∗ .
The most elementary (and the most fundamental) fact about the Legendre trans-
form is its involutive property (“symmetry”) which we discuss next. In particular,
this symmetry property of Legendre transform gives us an alternative represen-
tation of f in terms of its affine minorants.

Proposition III.16.3 Let f be a proper convex function. Then, its bicon-


jugate f ∗∗ is exactly the closure of f , i.e.,
f ∗∗ = cl f.
In particular, when f is proper, convex, and also lower semicontinuous, then
it is precisely the Legendre transform of its Legendre transform, and thus
f(x) = f**(x) = sup_d {x⊤d − f*(d)}.

Proof. First, by Fact III.16.2 f ∗ is a proper lsc convex function, so that f ∗∗ , once
again by Fact III.16.2, is a proper lsc convex function as well. Next, by definition,
f**(x) = (f*)*(x) = sup_{d∈Rn} {x⊤d − f*(d)} = sup_{d∈Rn, a≥f*(d)} {d⊤x − a}.

Now, recall from the origin of the Legendre transform that a ≥ f ∗ (d) if and only
if the affine function d⊤x − a is a minorant of f. Thus, sup_{d∈Rn, a≥f*(d)} {d⊤x − a}
is exactly the supremum of all affine minorants of f, and this supremum, by
Proposition III.15.7 is nothing but the closure of f . Finally, when f is proper
convex and lsc, f = cl f by Observation III.15.6, that is f ∗∗ = cl f is the same as
f ∗∗ = f .
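Here is a tiny numerical sketch of this symmetry (assuming NumPy; the grid discretization is our own crude device, not an exact computation): for f(x) = |x|, the conjugate f* vanishes on [−1, 1] and is +∞ outside, and conjugating once more recovers |x|.

import numpy as np

xs = np.linspace(-3, 3, 601)          # grid in x
ds = np.linspace(-3, 3, 601)          # grid in d
f = np.abs(xs)

# Discrete Legendre transform on the grid: f*(d) ~ max_x (d*x - f(x)).
f_star = np.max(ds[:, None] * xs[None, :] - f[None, :], axis=1)
# Conjugate once more: f**(x) ~ max_d (x*d - f*(d)).
f_bistar = np.max(xs[:, None] * ds[None, :] - f_star[None, :], axis=1)

print("max |f**(x) - |x|| on the grid:", np.max(np.abs(f_bistar - np.abs(xs))))
print("f*(d) on |d| <= 1 is ~ 0     :", np.max(np.abs(f_star[np.abs(ds) <= 1])))
print("f*(d) grows for |d| > 1 (grid surrogate for +infinity):", f_star[-1])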
The Legendre transform is a very powerful descriptive tool, i.e., it is a “global”
transformation, so that local properties of f ∗ correspond to global properties
of f. Below we give a number of important consequences of the Legendre transform
highlighting this.
Let f be a proper convex lsc function.
A. By Fact III.16.2 and Proposition III.16.3, the Legendre transform f*(d) = sup_x {x⊤d − f(x)}
   is a proper convex lsc function and f(x) = sup_d {x⊤d − f*(d)}. Since f*(d) ≥
d⊤ x − f (x) for all x, we have
x⊤ d ≤ f (x) + f ∗ (d), ∀x, d ∈ Rn . (16.1)
Moreover, the inequality in (16.1) becomes an equality if and only if x ∈ Dom f and
d ∈ ∂f(x), or, equivalently, if and only if d ∈ Dom f* and x ∈ ∂f*(d).
Proof. All we need is to justify the “moreover” part of the claim. Let x, d ∈
Rn , and let us prove that d⊤ x = f (x) + f ∗ (d) if and only if d ∈ ∂f (x). In one
direction: when d ∈ ∂f (x), we have x ∈ Dom f and f (z) ≥ f (x) + d⊤ (z − x)


for all z ∈ Rn , so
f*(d) = sup_{z∈Rn} {d⊤z − f(z)} ≤ sup_{z∈Rn} {d⊤z − f(x) − d⊤(z − x)} = d⊤x − f(x).

Thus, f*(d) ≤ d⊤x − f(x); since by (16.1) strict inequality here is impossible,


we conclude that when d ∈ ∂f (x), the inequality in (16.1) is equality. In the
opposite direction: let d, x be such that the inequality in (16.1) is equality, and
let us prove that d ∈ ∂f (x). Indeed, in this case x ∈ Dom f and d⊤ x − f (x) =
f ∗ (d) ≥ d⊤ z − f (z) for all z ∈ Rn (recall the definition of Legendre transform
f ∗ (d)), that is, f (z) ≥ f (x) + d⊤ (z − x) for all z, implying that d ∈ ∂f (x).
We have seen that when f is proper convex and lsc, then d ∈ ∂f (x) if and
only if d⊤ x = f (x) + f ∗ (d). Hence, when f is proper convex and lsc, by Fact
III.16.2 and Proposition III.16.3, we can swap the roles of f and f ∗ , so that
the inequality in (16.1) is equality if and only if x ∈ ∂f ∗ (d).
B. We always have inf x f (x) = −f ∗ (0). Thus, f is bounded from below if and only
if 0 ∈ Dom f ∗ . Moreover, f attains its minimum if and only if ∂f ∗ (0) ̸= ∅, in
which case Argmin f = ∂f ∗ (0).
Proof. By definition of the Legendre transform we have f ∗ (0) = − inf x f (x).
Thus, f is below bounded if and only if 0 ∈ Dom f ∗ . To see the rest of the claim,
suppose now that 0 ∈ Dom f ∗ and so f ∗ (0) ∈ R. As inf x f (x) = −f ∗ (0) implies
f (x) + f ∗ (0) ≥ 0 for all x ∈ Rn , we deduce that the equality f (x) + f ∗ (0) = 0
holds exactly for x’s that are the minimizers of f . Also, by part A, we conclude
that when 0 ∈ Dom f ∗ , inequality in (16.1) with d = 0 holds as equality if and
only if x ∈ ∂f ∗ (0). Combining the last two statements, we conclude that when
f is below bounded (or, equivalently, when 0 ∈ Dom f ∗ ), the set of minimizers
of f is exactly ∂f ∗ (0).
C. By part A, we have that (i) d̄ ∈ ∂f(x̄) if and only if (ii) x̄ ∈ ∂f*(d̄); moreover,
   both (i) and (ii) hold simultaneously if and only if (iii) the inequality in (16.1)
   with x = x̄, d = d̄ holds as an equality.
C′ . Here is a nice special case of C: Let f be a proper convex lsc function on Rn .
Suppose that the domains of f and f ∗ are open, and that these functions are
continuously differentiable and strictly convex on their domains. Then, the map
x 7→ ∇f (x) : Dom f → Rn is a one-to-one mapping of Dom f onto Dom f ∗ ,
and the inverse of this map is given by y 7→ ∇f ∗ (y) : Dom f ∗ → Rn .
Proof. First, we claim that under the given premise, ∇f (x) is an embedding
of Dom f into Rn , i.e., ∇f (x) = ∇f (x′ ) implies x = x′ . Assume for contradic-
tion that this is not the case. Consider the function g(u) := f (u) − u⊤ ∇f (x).
This function is strictly convex, since f is so. However, when ∇f (x) = ∇f (x′ ),
both x and x′ are minimizers of g(u), which is not possible as g(u) is strictly
convex and by Theorem III.14.1 its minimizer, if any exists, must be unique.
Thus, x 7→ ∇f (x) is an embedding of Dom f into Rn . Next, when x ∈ Dom f
and d = ∇f (x), we have d ∈ ∂f (x), so that d ∈ Dom f ∗ and x ∈ ∂f ∗ (d) by
C. As Dom f ∗ is open and f ∗ is continuously differentiable at its domain,


the relation x ∈ ∂f ∗ (d) is the same as x = ∇f ∗ (d). Thus, x 7→ ∇f (x)
is an embedding of Dom f into Dom f ∗ and its left inverse is the mapping
d 7→ ∇f ∗ (d) : Dom f ∗ → Rn . Recalling that f is proper lsc convex, so is f ∗
(Fact III.16.2), and (f*)* = f (Proposition III.16.3). Since our assumptions are
symmetric in f and f*, the previous reasoning applied to f* in the role of f
demonstrates that the mapping d 7→ ∇f*(d) is an embedding of Dom f* into Dom f
with left inverse x 7→ ∇f(x). Taken together, these observations yield C′.
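For the quadratic f(x) = (1/2)x⊤Ax of Example III.16.1 (item 3), C′ can be checked directly: ∇f(x) = Ax, f*(d) = (1/2)d⊤A^{-1}d, ∇f*(d) = A^{-1}d, and the two gradient maps invert each other. A minimal sketch (assuming NumPy; the names are ours):

import numpy as np

rng = np.random.default_rng(3)
n = 4
M = rng.standard_normal((n, n)); A = M @ M.T + n * np.eye(n)   # positive definite A

grad_f      = lambda x: A @ x                      # gradient of f(x) = (1/2) x^T A x
grad_f_star = lambda d: np.linalg.solve(A, d)      # gradient of f*(d) = (1/2) d^T A^{-1} d

x = rng.standard_normal(n)
print("||grad_f_star(grad_f(x)) - x|| =", np.linalg.norm(grad_f_star(grad_f(x)) - x))
d = rng.standard_normal(n)
print("||grad_f(grad_f_star(d)) - d|| =", np.linalg.norm(grad_f(grad_f_star(d)) - d))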
D. Dom f ∗ = Rn if and only if f (x) “grows at infinity faster than ∥x∥2 ”, that
is, if and only if the function F(s) := inf_{x: ∥x∥2 ≥ s} f(x)/∥x∥2 blows up to ∞ as
s → ∞.
Proof. Suppose F (s) → ∞ as s → ∞. Consider any fixed d ∈ Rn . Then,
there exists s̄ such that f (x) ≥ ∥d∥2 ∥x∥2 whenever ∥x∥2 ≥ s̄. Thus, whenever
∥x∥2 ≥ s̄ we have d⊤ x−f (x) ≤ ∥d∥2 ∥x∥2 −f (x) ≤ 0. Also, as a convex function
f is below bounded on the ball ∥x∥2 ≤ s̄, (in fact on any bounded set), the
function d⊤ x − f (x) of x is bounded from above and we conclude d ∈ Dom f ∗ .
As this conclusion holds for any d, we conclude that Dom f ∗ = Rn . Now, to
see the reverse direction, suppose that F (s) does not blow up to ∞ as s → ∞.
Then, we can find a sequence {xi } and L ∈ R such that ∥xi ∥2 → ∞ as i → ∞
and f (xi ) ≤ L∥xi ∥2 for all i. Passing to a subsequence, we can assume that as
i → ∞ we have xi /∥xi ∥2 → ξ, where ∥ξ∥2 = 1. Now, select d := 2Lξ, then for
all large enough i we have f (xi ) ≤ L∥xi ∥2 ≤ 23 d⊤ xi . In addition, d⊤ xi → ∞ as
i → ∞. Consequently, d⊤ xi − f (xi ) ≥ d⊤ xi − L∥xi ∥2 ≥ 13 d⊤ xi for large enough
i. Hence, d⊤ xi − f (xi ) → +∞ as i → ∞, that is, d ̸∈ Dom f ∗ .
The bottom line is that by investigating the Legendre transform of a convex
function, we get a lot of “global” information on the function. This being said,
detailed investigation of the properties of Legendre transform is beyond our scope.
We close this chapter by listing several simple yet important consequences of
Legendre transform.

16.3 Young, Hölder, and Moment inequalities


The Legendre transform leads to several important inequalities. Recall that from
the definition of the Legendre transform we have

f(x) + f*(d) ≥ x⊤d,   ∀x, d ∈ Rn.

We will see that specific choices of f in this inequality lead to several well-known
inequalities.

16.3.1 Young’s inequality


Young’s inequality reads as follows:
Let p and q be positive real numbers such that 1/p + 1/q = 1. Then,

xd ≤ |x|^p/p + |d|^q/q,   ∀x, d ∈ R.
Proof. Recall from Example III.16.1 (item 4) that the Legendre transform of the
function |x|^p/p is |d|^q/q, so the claim is just the inequality f(x) + f*(d) ≥ xd
specialized to f(x) = |x|^p/p.

16.3.2 Hölder’s inequality


The admittedly simple-looking Young’s inequality gives rise to the very nice and
useful Hölder’s inequality.
Let 1 ≤ p ≤ ∞ and let q = p/(p − 1) (where 1/0 = +∞), so that 1/p + 1/q = 1. Then,
Σ_{i=1}^n |xi yi | ≤ ∥x∥p ∥y∥q , ∀x, y ∈ Rn . (16.2)

Proof. When p = ∞, we have q = 1 and (16.2) becomes the obvious relation


Σ_{i=1}^n |xi yi | ≤ ( max_i |xi | ) ( Σ_{i=1}^n |yi | ), ∀x, y ∈ Rn .

By symmetry, we also see that when p = 1 we have q = ∞ and (16.2) is evident.


Now, let 1 < p < ∞, so that also 1 < q < ∞. In this case we should prove that
Σ_{i=1}^n |xi yi | ≤ ( Σ_{i=1}^n |xi |^p )^{1/p} ( Σ_{i=1}^n |yi |^q )^{1/q} .

When x = 0 or y = 0, this inequality is evident. So, we assume that x ̸= 0 and


y ̸= 0. As both sides of this inequality are positively homogeneous of degree 1 with
respect to x, and similarly with respect to y, without loss of generality we can
assume that ∥x∥p = ∥y∥q = 1. Now, under this normalization, we should prove
that Σ_{i=1}^n |xi yi | ≤ 1. Recall that by Young’s inequality we get |xi yi | ≤ |xi |^p /p + |yi |^q /q
for all i, and so
Σ_{i=1}^n |xi yi | ≤ Σ_{i=1}^n ( |xi |^p /p + |yi |^q /q ) = (1/p) ∥x∥p^p + (1/q) ∥y∥q^q = 1/p + 1/q = 1,
as desired.
Note that for p, q ∈ [1, ∞] such that 1/p + 1/q = 1, Hölder’s inequality gives us
|x⊤ y| ≤ ∥x∥p ∥y∥q . (16.3)


When p = q = 2, this is the well-known Cauchy–Schwarz inequality. Moreover,
for every p ∈ [1, ∞] the inequality (16.3) is tight in the sense that for every x
there exists y with ∥y∥q = 1 such that
x⊤ y = ∥x∥p [= ∥x∥p ∥y∥q as ∥y∥q = 1].
16.3 Young, Hölder, and Moment inequalities 215

To justify this claim, note that when x = 0 we can select any y with ∥y∥q = 1.
When x ̸= 0 and p < ∞ we can set y to be
yi := ∥x∥p^{1−p} |xi |^{p−1} sign(xi ), ∀i = 1, . . . , n,
where we set 0^{p−1} := 0 when p = 1. Finally, when p = ∞, that is q = 1, we can
find an index i∗ of the largest in magnitude entry of x and set yi = sign(xi∗ ) for i = i∗
and yi = 0 for i ̸= i∗ .
These observations altogether lead us to an extremely important, although
simple, fact:
∥x∥p = max_y { y ⊤ x : ∥y∥q ≤ 1 }, where 1/p + 1/q = 1. (16.4)
Based on this, we, in particular, deduce that ∥x∥p is convex (as the pointwise supremum
of a family of linear functions). Hence, by its convexity we deduce that for any
x′ , x′′ we have
∥x′ + x′′ ∥p = 2 ∥ (1/2) x′ + (1/2) x′′ ∥p ≤ 2 ( ∥x′ ∥p /2 + ∥x′′ ∥p /2 ) = ∥x′ ∥p + ∥x′′ ∥p ,

which is nothing but the triangle inequality. Thus, ∥x∥p satisfies the triangle
inequality; it clearly possesses the other two characteristic properties of a norm,
namely positivity and homogeneity, as well. Consequently, ∥ · ∥p is a norm—a fact
that we announced twice and already proved (see Example III.13.1).
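For concreteness, here is a small numerical sketch we add (Python with numpy; the random vector is ours): the vector y constructed above indeed satisfies ∥y∥q = 1 and y⊤x = ∥x∥p, so the maximum in (16.4) is attained.

import numpy as np

# Tightness of Hölder's inequality / attainment in (16.4): for x != 0 and p < infinity,
# y_i = ||x||_p^{1-p} |x_i|^{p-1} sign(x_i) satisfies ||y||_q = 1 and y^T x = ||x||_p.
rng = np.random.default_rng(1)
x = rng.normal(size=6)
for p in [1.0, 1.5, 2.0, 3.0]:
    q = np.inf if p == 1.0 else p / (p - 1.0)
    xnorm = np.linalg.norm(x, p)
    y = xnorm ** (1 - p) * np.abs(x) ** (p - 1) * np.sign(x)
    print(f"p={p:3.1f}: ||y||_q={np.linalg.norm(y, q):.6f}  y@x={y @ x:.6f}  ||x||_p={xnorm:.6f}")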

16.3.3 Moment inequality


A useful application of Hölder’s inequality gives us another well-known inequality
as follows. Moment inequality reads:
For any 0 ̸= a ∈ Rn , the function
f (π) := ln(∥a∥1/π ) : [0, 1] → R
is a convex function of π ∈ [0, 1]. That is, by letting p = 1/π, and so
1 ≤ p ≤ ∞, we have the following inequality
1 ≤ r < s ≤ ∞, p ∈ [r, s] =⇒ ∥a∥p ≤ ∥a∥r^λ ∥a∥s^{1−λ} ,
[ λ = r(s − p)/(p(s − r)) ⇐⇒ λ ∈ [0, 1] & λ/r + (1 − λ)/s = 1/p ] (16.5)

Proof. Let ρ, σ ∈ (0, 1] be such that ρ ̸= σ. Consider any λ ∈ (0, 1) and set
π := λρ+(1−λ)σ. By defining θ := λρ/π, we get 0 < θ < 1 and 1−θ = (1−λ)σ/π.
Let r := 1/ρ, s := 1/σ, and p := 1/π, so we have p = θr + (1 − θ)s. Then,
Σ_{i=1}^n |ai |^p = Σ_{i=1}^n |ai |^{θr} |ai |^{(1−θ)s} ≤ ( Σ_{i=1}^n |ai |^r )^θ ( Σ_{i=1}^n |ai |^s )^{1−θ} ,
where the inequality follows from Hölder’s inequality. Raising both sides of the
resulting inequality to the power 1/p we arrive at
∥a∥p ≤ ∥a∥r^{rθ/p} ∥a∥s^{s(1−θ)/p} = ∥a∥_{1/ρ}^λ ∥a∥_{1/σ}^{1−λ} .

By recalling that p = θr + (1 − θ)s and ln(·) is a monotone increasing function,


we conclude that for any λ ∈ (0, 1) the function f (π) = ln(∥a∥1/π ) satisfies the
inequality
f (λρ + (1 − λ)σ) ≤ λf (ρ) + (1 − λ)f (σ), ∀ (ρ, σ : ρ, σ ∈ (0, 1], ρ ̸= σ) .
Since this function is continuous on [0, 1], it is convex, as claimed.
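A numerical spot-check of (16.5), added for illustration (Python with numpy, random data of our choosing):

import numpy as np

# Moment inequality (16.5): ||a||_p <= ||a||_r^lam * ||a||_s^(1-lam)
# whenever 1/p = lam/r + (1-lam)/s with lam in [0,1].
rng = np.random.default_rng(2)
a = rng.normal(size=7)
r, s = 1.5, 6.0
for lam in [0.0, 0.25, 0.5, 0.75, 1.0]:
    p = 1.0 / (lam / r + (1.0 - lam) / s)
    lhs = np.linalg.norm(a, p)
    rhs = np.linalg.norm(a, r) ** lam * np.linalg.norm(a, s) ** (1.0 - lam)
    print(f"lam={lam:4.2f}  p={p:5.2f}  ||a||_p={lhs:8.4f}  bound={rhs:8.4f}  holds={lhs <= rhs + 1e-10}")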
We close this section by discussing the dual norm of a norm. Let ∥ · ∥ be a norm
on Rn . We define its dual (a.k.a. conjugate) norm as the function
∥d∥∗ := sup_x { d⊤ x : ∥x∥ ≤ 1 }.

As its name implies one can indeed show that this function ∥d∥∗ is a norm.

Fact III.16.4 Let ∥ · ∥ be a norm on Rn . Then, its dual norm ∥ · ∥∗ is indeed


a norm. Moreover, the norm dual to ∥ · ∥∗ is the original norm ∥ · ∥, and the
unit balls of conjugate to each other norms are polars of each other.

For example, when p ∈ [1, ∞], (16.4) says that the norm conjugate to ∥ · ∥p is
∥ · ∥q where 1/p + 1/q = 1.
We also have the following characterization of the Legendre transform of norms.

Fact III.16.5 Let f (x) = ∥x∥ be a norm on Rn . Then,


f ∗ (d) = 0 if ∥d∥∗ ≤ 1, and f ∗ (d) = +∞ otherwise.
That is, the Legendre transform of ∥ · ∥ is the characteristic function of the
unit ball of the conjugate norm.
17
⋆ Functions of eigenvalues of symmetric matrices
One may think that the calculus of convexity-preserving operations presented in
section 13.1 does not look really deep. On the other hand, these “simple” rules are
extremely useful and allow us to detect convexity and offer nice characterizations
for a particular class of functions of symmetric matrices, which we will examine
in this chapter.
Let X ∈ Sn be an n × n symmetric matrix, and let λ(X) denote the vector of
eigenvalues of X taken with their multiplicities and arranged in the non-ascending
order, see section D.1.1.C. In this chapter, we present a really deep result stating
that whenever f : Rn → R ∪ {+∞} is convex and permutation symmetric, then
the function F (X) := f (λ(X)) is a convex function of the matrix X ∈ Sn .
Recall that a function f : Rn → R ∪ {+∞} is called permutation symmetric if
f (x) = f (P x) for every n × n permutation matrix P .
That is, a function is permutation symmetric if and only if its value remains
unchanged when we permute the coordinates in its argument.
We start with the following observation.

Lemma III.17.1 Let f : Rn → R ∪ {+∞} be a convex permutation sym-


metric function. Then, for any x ∈ Dom f and n×n doubly stochastic matrix
Π, we have f (Πx) ≤ f (x).

Proof. Recall that by Birkhoff’s Theorem (Theorem II.9.7), Π is a convex combi-


nation of permutation matrices P i , i.e., there exist convex combination weights
λi ≥ 0 and permutation matrices P i such that Π = Σ_i λi P i and Σ_i λi = 1. Then,
as f is convex, we have
f (Πx) = f ( Σ_i λi P i x ) ≤ Σ_i λi f (P i x) = Σ_i λi f (x) = f (x),

where the inequality follows from the convexity of f and the second equality is
due to the fact that f is permutation symmetric.
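The following small sketch, which we add for illustration (Python with numpy; the choice of f and of the random data is ours), builds a doubly stochastic Π exactly as in Birkhoff's Theorem and checks the conclusion of the lemma for f (x) = s2 (x), the sum of the two largest entries of x:

import numpy as np

# Illustration of Lemma III.17.1: Pi is a convex combination of permutation
# matrices (hence doubly stochastic), f(x) = sum of the 2 largest entries of x
# is convex and permutation symmetric, and indeed f(Pi x) <= f(x).
rng = np.random.default_rng(3)
n = 5
f = lambda v: np.sum(np.sort(v)[-2:])              # s_2(v)

weights = rng.dirichlet(np.ones(4))                # convex combination weights
perms = [np.eye(n)[rng.permutation(n)] for _ in weights]
Pi = sum(w * P for w, P in zip(weights, perms))    # doubly stochastic by construction

x = rng.normal(size=n)
print(f"f(Pi x) = {f(Pi @ x):.4f}  <=  f(x) = {f(x):.4f}")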
Our developments will also rely on the following fundamental fact.

Lemma III.17.2 For any X ∈ Sn , the diagonal Dg{X} of the matrix X is


the image of the vector λ(X) of the eigenvalues of X under multiplication by

a doubly stochastic matrix. That is, there exists an n × n doubly stochastic


matrix Π such that
Dg{X} = Πλ(X).

Proof. Consider the spectral decomposition of X, i.e.,


X = U ⊤ Diag {λ1 (X), . . . , λn (X)} U
where U = [uij ]ni,j=1 is an orthogonal matrix. Define the matrix Π := [u2ji ]ni,j=1 .
As U is an orthogonal matrix, we have that Π is indeed doubly stochastic (verify
yourself!). Moreover, by denoting the i-th basic orth with ei , we get
Xii = ei⊤ X ei = ei⊤ (U ⊤ Diag {λ1 (X), . . . , λn (X)} U ) ei
    = Tr( ei⊤ (U ⊤ Diag {λ1 (X), . . . , λn (X)} U ) ei )
    = Tr( U ei ei⊤ U ⊤ Diag {λ1 (X), . . . , λn (X)} )
    = Σ_{j=1}^n u_{ji}^2 λj (X) = [Πλ(X)]i .
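Here is a small numerical sketch of the construction in the proof, added for illustration (Python with numpy): the matrix built from the squared entries of the eigenvector matrix is doubly stochastic and maps λ(X) onto the diagonal of X.

import numpy as np

# Illustration of Lemma III.17.2: write X = V Diag(lam) V^T (columns of V are
# eigenvectors); then Pi = V**2 (entrywise squares) is doubly stochastic and
# Dg{X} = Pi lam.
rng = np.random.default_rng(4)
n = 4
A = rng.normal(size=(n, n))
X = (A + A.T) / 2                       # random symmetric matrix
lam, V = np.linalg.eigh(X)              # X = V @ diag(lam) @ V.T
Pi = V ** 2
print("row sums:", Pi.sum(axis=1))      # all ones
print("col sums:", Pi.sum(axis=0))      # all ones
print("diag(X): ", np.diag(X))
print("Pi @ lam:", Pi @ lam)            # coincides with diag(X)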

Let us denote with On the set of all n×n orthogonal matrices. Lemmas III.17.1
and III.17.2 together give us the following very useful relation.

Proposition III.17.3 Let f : Rn → R ∪ {+∞} be a convex permutation


symmetric function. Then, for every X ∈ Sn , we have
f (λ(X)) ≥ f (Dg{X}).
Furthermore, we also have
f (λ(X)) = max_{V ∈On} f (Dg{V ⊤ XV }). (17.1)

In particular, the function F (X) := f (λ(X)) is a convex function of X.

Proof. The first claim immediately follows from Lemmas III.17.1 and III.17.2.
To see the second claim, consider any V ∈ On . Note that the matrix V ⊤ XV has
the same eigenvalues as X. Then, as f is a convex and permutation symmetric
function, applying the first claim of this proposition to the matrix V ⊤ XV , we
conclude
f (Dg{V ⊤ XV }) ≤ f (λ(V ⊤ XV )) = f (λ(X)).
Taking the supremum over V ∈ On of both sides of this relation gives us
f (λ(X)) ≥ sup f (Dg{V ⊤ XV }).
V ∈On

Note that for a properly chosen V ∈ On we have Dg{V ⊤ XV } = λ(X). Thus,


the preceding inequality holds as equality and the right hand side supremum is
achieved.
For the final claim, note that for any V ∈ On the function FV (X) := f (Dg{V ⊤ XV })
is convex in X (as it is the composition of a convex function f and an affine map
X 7→ Dg{V ⊤ XV }). Then, the final claim of the proposition follows
from its second claim as F (X) is the pointwise supremum of convex functions
{FV (X)}V ∈On .
Given a symmetric n × n matrix X, as a corollary of Proposition III.17.3,
we arrive at the following immediate relations between eigenvalues and diagonal
entries of X.
1. For all p ≥ 1, we have Σ_{i=1}^n |Xii |^p ≤ Σ_{i=1}^n |λi (X)|^p .
[Consider the function f (x) = Σ_{i=1}^n |xi |^p .]
2. Whenever X is positive semidefinite, we have ∏_{i=1}^n Xii ≥ Det(X).
[Consider f (x) = − Σ_{i=1}^n ln(xi ) over the domain where xi > 0 for all i.]
3. Define the function sk (x) : Rn → R to be the sum of k largest entries of x
(i.e., the sum of the first k entries in the vector obtained from x by writing
down the coordinates of x in the non-ascending order). Then, the function
Sk (X) := sk (λ(X)) is convex, and
sk (Dg{X}) ≤ Sk (X). (17.2)
[Recall from Example III.13.3 that sk (x) is a convex permutation symmetric
function.]
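A quick numerical illustration of items 1 and 2, which we add for concreteness (Python with numpy, random matrices of our choosing):

import numpy as np

# Item 1: sum_i |X_ii|^p <= sum_i |lambda_i(X)|^p for p >= 1 (random symmetric X).
# Item 2 (Hadamard): for positive semidefinite P, prod_i P_ii >= Det(P).
rng = np.random.default_rng(5)
n = 5
A = rng.normal(size=(n, n))
X = (A + A.T) / 2
lam = np.linalg.eigvalsh(X)
for p in [1, 2, 3]:
    print(f"p={p}: {np.sum(np.abs(np.diag(X)) ** p):.4f} <= {np.sum(np.abs(lam) ** p):.4f}")

P = A @ A.T                              # positive semidefinite
print(f"prod diag = {np.prod(np.diag(P)):.4f} >= det = {np.linalg.det(P):.4f}")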
Remark III.17.4 Let us examine the convexity status of eigenvalues of sym-
metric matrices. Completely analogously to our discussion in Remark III.13.6 for
the vector case, we have, by Proposition III.17.3, that the largest eigenvalue λ1 (X) of
a symmetric n × n matrix X is a convex function of X. Therefore, the smallest
eigenvalue λn (X) = −λ1 (−X) is concave in X. On the other hand, “intermediate”
eigenvalues λk (X), 1 < k < n, of X, are neither convex, nor concave functions
of X. What is convex in X, is the sum Sk (X) of k ≤ n largest eigenvalues of X,
which we have just seen. This clearly implies that the sum of k smallest eigen-
values of X is concave in X. The sum of magnitudes (absolute values) of the k
largest in magnitude eigenvalues of X is convex in X, since the function
x 7→ sk ([|x1 |; . . . ; |xn |]) = max { |x_{i1}| + . . . + |x_{ik}| : 1 ≤ i1 < i2 < . . . < ik ≤ n }

is permutation symmetric and convex on Rn . ■
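To make the claim about “intermediate” eigenvalues tangible, here is a tiny counterexample sketch we add (Python with numpy): already for diagonal 3 × 3 matrices, λ2 violates the midpoint convexity inequality at one pair of points and the midpoint concavity inequality at another.

import numpy as np

# lambda_2 (middle eigenvalue) of a symmetric 3x3 matrix is neither convex nor concave.
lam2 = lambda M: np.sort(np.linalg.eigvalsh(M))[::-1][1]   # second largest eigenvalue

# convexity fails: lam2((A+B)/2) = 1 > 0 = (lam2(A)+lam2(B))/2
A, B = np.diag([2.0, 0.0, 0.0]), np.diag([0.0, 2.0, 0.0])
print(lam2((A + B) / 2), (lam2(A) + lam2(B)) / 2)

# concavity fails: lam2((C+D)/2) = 0 < 0.5 = (lam2(C)+lam2(D))/2
C, D = np.diag([1.0, 1.0, 0.0]), np.diag([1.0, -1.0, 0.0])
print(lam2((C + D) / 2), (lam2(C) + lam2(D)) / 2)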


Example III.17.1 Proposition III.17.3 implies convexity of the following func-
tions of X ∈ Sn :
• F (X) = −Det^q (X) in the domain X ⪰ 0, whenever 0 < q ≤ 1/n.
[Consider f (x1 , . . . , xn ) = −(x1 . . . xn )q : Rn+ → R.]
• F (X) = − ln Det(X) in the domain X ≻ 0.
Pn
[Consider f (x1 , . . . , xn ) = − i=1 ln xi : int(Rn+ ) → R.]
• F (X) = Det−q (X) in the domain X ≻ 0, whenever q ∈ R is positive.
[Consider f (x1 , . . . , xn ) = (x1 . . . xn )−q : int(Rn+ ) → R.]
• ∥X∥p = ( Σ_{i=1}^n |λi (X)|^p )^{1/p} , where p ≥ 1.
[Consider f (x1 , . . . , xn ) = ∥x∥p .]
• ∥X+ ∥p = ( Σ_{i=1}^n (max[λi (X), 0])^p )^{1/p} , where p ≥ 1.
[Consider f (x1 , . . . , xn ) = ∥x+ ∥p , where x+ := [max{0, x1 }; . . . ; max{0, xn }].]

We say that a set Q ⊆ Rn is permutation symmetric if for all x ∈ Q and for all
n × n permutation matrices P , we have P x ∈ Q as well. Let us also mention the
following useful corollary of Proposition III.17.3.

Corollary III.17.5 Let Q be a nonempty closed convex and permutation


symmetric set in Rn . Then, the set
Q := {X ∈ Sn : λ(X) ∈ Q}
is closed and convex.
Proof. Recall from Fact D.21 that λ(X) is continuous in X, thus Q is closed.
To prove that Q is convex, consider the function
f (x) := min ∥x − y∥2 .
y∈Q

Note that ∥x − y∥2 is a convex function of x and y over the convex domain
{[x; y] ∈ Rn ×Rn : y ∈ Q}, and as convexity is preserved by partial minimization,
f (x) is a convex real-valued function. Permutation symmetry of Q and ∥·∥2 clearly
implies permutation symmetry of f . Then, by Proposition III.17.3 the function
F (X) := f (λ(X)) is a convex function of X. From the definition of the function
f , we have z ∈ Q if and only if f (z) = 0, which holds if and only if f (z) ≤ 0.
Thus,
Q = {X ∈ Sn : f (λ(X)) ≤ 0} = {X ∈ Sn : F (X) ≤ 0} ,
that is Q is a sublevel set of a convex function F (X) of X, and is hence convex.
Consider a univariate real-valued function g defined on some set Dom g ⊆ R.
In section D.1.5 we have associated with the function g(·) the matrix-valued
map X 7→ g(X) : Sn → Sn as follows: the domain of this map is composed
of all matrices X ∈ Sn with the spectrum σ(X) (subset of R composed of all
eigenvalues of X) contained in Dom g, and for such a matrix X, we set
g(X) := U Diag{g(λ1 ), . . . , g(λn )}U ⊤ ,
where X = U Diag{λ1 , . . . , λn }U ⊤ is an eigenvalue decomposition of X.
The following fact is quite important:

Fact III.17.6 Let g : R → R ∪ {+∞} be a convex function. Then, the


function
F (X) := { Tr(g(X)), if σ(X) ⊆ Dom g; +∞, otherwise } : Sn → R ∪ {+∞}
is convex.
We close this chapter with the following very useful fact from majorization.

Fact III.17.7 For x ∈ Rn , let x↑ and x↓ be the vectors obtained by re-


ordering the entries of x in the non-decreasing and non-increasing orders,
respectively. For example, [1; 3; 2; 1]↑ = [1; 1; 2; 3] and [1; 3; 2; 1]↓ = [3; 2; 1; 1].
(i) For every x, y ∈ Rn and every n × n doubly stochastic matrix P , we have
[x↑ ]⊤ y↑ ≥ x⊤ P y ≥ [x↑ ]⊤ y↓ .
As a result,
(ii) [Trace inequality] For every A, B ∈ Sn , we have
λ⊤ (A)λ(B) ≥ Tr(AB) ≥ λ⊤ (A)[λ(B)]↑ .
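A numerical spot-check of the trace inequality, added for illustration (Python with numpy, random symmetric matrices):

import numpy as np

# Fact III.17.7(ii): lambda(A)^T lambda(B) >= Tr(AB) >= lambda(A)^T [lambda(B)]^(increasing),
# with the eigenvalue vectors sorted in non-ascending order.
rng = np.random.default_rng(6)
n = 4
A = rng.normal(size=(n, n)); A = (A + A.T) / 2
B = rng.normal(size=(n, n)); B = (B + B.T) / 2
lamA = np.sort(np.linalg.eigvalsh(A))[::-1]
lamB = np.sort(np.linalg.eigvalsh(B))[::-1]
print(f"{lamA @ lamB:.4f} >= {np.trace(A @ B):.4f} >= {lamA @ lamB[::-1]:.4f}")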
18

Exercises for Part III

18.1 Around convex functions


Exercise III.1 Which of the following functions are convex on the indicated domains:
• f (x) ≡ 1 on R
• f (x) = x on R
• f (x) = |x| on R
• f (x) = −|x| on R
• f (x) = −|x| on R+ = {x ∈ R : x ≥ 0}
• f (x) = |2x − 3| on R
• f (x) = |2x2 − 3| on R
• exp{x} on R
• exp{x2 } on R
• exp{−x2 } on R
• exp{−x2 } on {x ∈ R : x ≥ 100}
• ln(x) on {x ∈ R : x > 0}
• − ln(x) on {x ∈ R : x > 0}
Exercise III.2 ▲
1. Prove the following fact:
For every Ci ∈ Sm_+ , i ≤ I, satisfying Σ_{i∈I} Ci = Im and for every λi ∈ R, we have
Tr( ( Σ_{i∈I} λi Ci )^2 ) ≤ Tr( Σ_{i∈I} λi^2 Ci ).
2. Recall from Example III.13.4 in section 13.2 that for ai ≥ 0, Σ_i ai > 0, the function
ln( Σ_i ai exp(λi ) ) is a convex function of λ. Prove the following matrix analogy of this fact:
For every Ai ∈ Sm_+ , 1 ≤ i ≤ I, such that Σ_i Ai ≻ 0, the function
f (λ) = ln Det( Σ_i exp(λi ) Ai ) : RI → R

is convex.
3. Let Ai , i ≤ I, be as in item 2. Is it true that the function
g(x) = ln Det( ( Σ_i xi Ai )^{−1} ) : {x ∈ RI : x > 0} → R
is convex?
4. Let Bi , i ≤ I, be mi × n matrices such that Σ_i Bi⊤ Bi ≻ 0, and let
Λ = {λ := (λ1 , . . . , λI ) : λi ∈ S^{mi} , λi ≻ 0, i ≤ I}.
Prove that the function
h(λ) = ln Det( Σ_i Bi⊤ λi^{−1} Bi ) : Λ → R

is convex.


5. Let Bi , i ≤ I, and Λ be as in the previous item. Prove that the matrix-valued function
F (λ) = [ Σ_i Bi⊤ λi^{−1} Bi ]^{−1} : Λ → int Sn_+

is ⪰-concave, that is, the ⪰-hypograph


{(λ, Y ) : λ ∈ Λ, Y ⪯ F (λ)}
of the function is convex.
Exercise III.3 ♦ A function f defined on a convex set Q is called log-convex on Q, if it takes
real positive values on Q and the function ln f is convex on Q. Prove that
• a log-convex on Q function is convex on Q
• the sum (more generally, linear combination with positive coefficients) of two log-convex
functions on Q also is log-convex on the set.
Exercise III.4 ♦ [Law of Diminishing Marginal Returns] Consider optimization problem
Opt(r) = max {f (x) : G(x) ≤ r & x ∈ X} (P [r])
x

where X ⊂ Rn is nonempty convex set, f (·) : X → R is concave, and G(x) = [g1 (x); . . . ; gm (x)] :
X → Rm is vector-function with convex components, and let R be the set of those r for which
(P [r]) is feasible. Prove that
1. R is a convex set with nonempty interior and this set is monotone, meaning that when r ∈ R
and r′ ≥ r, one has r′ ∈ R.
2. The function Opt(r) : R → R ∪ {+∞} satisfies the concavity inequality:
∀(r, r′ ∈ R, λ ∈ [0, 1]) : Opt(λr + (1 − λ)r′ ) ≥ λOpt(r) + (1 − λ)Opt(r′ ). (!)
3. If Opt(r) is finite at some point r̄ ∈ int R, then Opt(r) is real-valued everywhere on R.
Moreover, when X = Rn , and f and the components of G are affine, so that (P [r]) is an
LP program, we can replace in the above claim the inclusion r ∈ int R with the inclusion
r ∈ R: in the LP case, the function Opt(r) is either identically +∞ everywhere on R, or is
real-valued at every point of R.

Comment. Think about problem (P [r]) as about problem where r is the vector of resources
you create, and f (·) is your profit, so that the problem is to maximize your profit given your
resources and “technological constraints” x ∈ X. Now let r̄ ∈ R and e be a nonnegative vector,
and let us look what happens when you select your vector of resources on the ray R = r̄ + R+ e,
assuming that Opt(r) on this ray is real-valued. Restricted on this ray, your best profit becomes
a function ϕ(t) of nonnegative variable t:
ϕ(t) = Opt(r̄ + te).
Since e ≥ 0, this function is nondecreasing, as it should be: the larger t, the more resources you
have, and the larger is your profit. A not so nice news is that ϕ(t) is concave in t, meaning that
the slope of this function does not increase as t grows. In other words, if it costs you $1 to pass
from resources r̄ + te to resources r̄ + (t + 1)e, the return ϕ(t + 1) − ϕ(t) on one extra dollar of
your investment goes down (or at least does not go up) as t grows. This is called The Law of
Diminishing Marginal Returns.
Exercise III.5 ▲ [follow-up to Exercise III.4] There are n goods j with per-unit prices
cj > 0, per-unit utilities vj > 0, and the maximum available amounts x̄j , j ≤ n. Given budget
R ≥ 0, you want to decide on amounts xj of goods to be purchased to maximize the total utility
of the purchased goods, while respecting the budget and the availability constraints. Pose the
problem as an optimization program and verify that the optimal value Opt(R) is a piecewise
linear function of R. What are the breakpoints of this function? What are the slopes between breakpoints?
Exercise III.6 ▲ Let β ∈ Rn be such that β1 ≥ β2 ≥ . . . ≥ βn . For x ∈ Rn , let x(k) be the


k-th largest entry in x. Consider the function
f (x) = Σ_k βk x(k) = [β1 − β2 ]s1 (x) + [β2 − β3 ]s2 (x) + . . . + [βn−1 − βn ]sn−1 (x) + βn sn (x),

where, as always, sk (x) = Σ_{i=1}^k x(i) . As we know from Exercise I.29, the functions sk (x), k < n,
are polyhedrally representable:
t ≥ sk (x) ⇐⇒ ∃z ≥ 0, s : xi ≤ zi + s, i ≤ n, Σ_i zi + ks ≤ t,

and sn (x) is just linear:


sn (x) = Σ_i xi .

As a result, f admits the polyhedral representation


t ≥ f (x) ⇐⇒ ∃Z = [zik ] ∈ R^{n×(n−1)} , sk , tk , k < n :
∀(i ≤ n, k < n) : zik ≥ 0, xi ≤ zik + sk ,
∀k < n : tk ≥ Σ_i zik + k sk ,
t ≥ Σ_{k=1}^{n−1} [βk − βk+1 ] tk + βn Σ_{i=1}^n xi .

This polyhedral representation has 2n^2 − n linear inequalities and n^2 + n − 2 extra variables.
Now goes the exercise:
1. Find an alternative polyhedral representation of f with n^2 + 1 linear inequalities and 2n
extra variables.
2. [computational study] Generate at random orthogonal n × n matrix U and vector β with
nonincreasing entries and solve numerically the problem
min_x { f (x) := Σ_k βk x(k) : ∥U x∥∞ ≤ 1 }

utilising the above polyhedral representations of f . For n = 8, 16, 32, . . . , 1024, compare the
running times corresponding to the 2 representations in question.
Exercise III.7 ♦ Let a ∈ Rn be a nonzero vector, and let f (ρ) = ln(∥a∥1/ρ ), ρ ∈ [0, 1].
Moment inequality, see section 16.3.3, states that f is convex. Prove that the function is also
nondecreasing and Lipschitz continuous, with Lipschitz constant ln n, or, which is the same, that
1 ≤ p ≤ p′ ≤ ∞ =⇒ ∥a∥p ≥ ∥a∥p′ ≥ n^{1/p′ − 1/p} ∥a∥p .

Exercise III.8 ▲ This Exercise demonstrates power of Symmetry Principle. Consider the
situation as follows: you are given noisy observations

ω = Ax + ξ, A = Diag{αi , i ≤ n}

of unknown signal x known to belong to the unit ball B = {x ∈ Rn : ∥x∥2 ≤ 1}; here αi > 0
are given, and ξ is the standard (zero mean, unit covariance) Gaussian observation noise. Your
goal is to recover from this observation the vector y = Bx, B = Diag{βi , i ≤ n} being given.
You intend to recover y by linear estimate

ybH (ω) = Hω,

where H is an n × n matrix you are allowed to choose. For example, selecting H = BA−1 =
Diag{βi αi−1 }, you get an unbiased estimate:

E{ŷH (Ax + ξ) − y} = 0.
Let us quantify the quality of a candidate linear estimate ŷH
— at a particular signal x ∈ B – by the quantity
Errx (H) = ( E{ ∥ŷH (Ax + ξ) − Bx∥2^2 } )^{1/2} ,

so that Err2x (H) is the expected squared ∥ · ∥2 -distance between the estimate and the estimated
quantity,
— on the entire set B of possible signals – by risk Risk[H] = maxx∈B Errx (H).
1. Find closed form expressions for Errx (H) and Risk(H).
2. Formulate the problem of finding the linear estimate with minimal risk as the problem of
minimizing a convex function and prove that the problem is solvable, and admits an optimal
solution H ∗ which is diagonal: H ∗ = Diag{ηi , i ≤ n}.
3. Reduce the problem yielding by item 2 to the problem of minimizing easy-to-compute convex
univariate function. Consider the case when βi = i−1 and αi = [σi2 ]−1 , 1 ≤ i ≤ n, set
n = 10000 and fill the following table:

σ 1.0 0.1 0.01 0.001 0.0001 0.00001 0.000001


Risk[H ∗ ]
Risk[BA−1 ]

where H ∗ is the minimum risk linear estimate as yielded by the solution to univariate problem
you end up with, and Risk[BA−1 ] is the risk of unbiased linear estimate.
You should see from your numerical results that minimal risk of linear estimation is much
smaller than the risk of the unbiased linear estimate. Explain on qualitative level why allowing
for bias reduces the risk.
Exercise III.9 ♦ 1 Given the sets of d-dimensional tentative nodes (d = 2 or d = 3) and of
tentative bars of a TTD problem satisfying assumption R, let V = RM be the space of virtual
displacements of the nodes, N be the number of tentative bars, and W > 0 be the allowed total
bar volume, see Exercise I.16. Let, next, C(t, f ) : RN
+ × V → R ∪ {+∞} be the compliance of
truss t ≥ 0 w.r.t. load f (we identify trusses with the corresponding vectors t of bar volumes).
Prove that
1. C(t, f ) is a convex lsc function, positively homogeneous of homogeneity degree 1, of [t; f ] with
R^N_{++} × V ⊂ Dom C, where R^N_{++} = int R^N_+ = {t ∈ R^N : t > 0}. This function is positively
homogeneous, with degree -1, in t, when f is fixed, and positively homogeneous, of degree
2, in f when t is fixed. Besides this, C(t, f ) is nonincreasing in t ≥ 0: if 0 ≤ t′ ≤ t, then
C(t, f ) ≤ C(t′ , f ) for every f .
 P
2. The function Opt(W, f ) = inf_t { C(t, f ) : t ≥ 0, Σ_i ti = W } – the optimal value in the TTD
problem (5.2) – with W restricted to reside in R++ = {W > 0}, is a convex continuous function
with the domain R++ × V. This function is positively homogeneous, of degree -1, in W > 0
and homogeneous, of homogeneity degree 2, in f :

∀(λ > 0, µ ≥ 0) : Opt(λW, µf ) = λ−1 µ2 Opt(W, f ), ∀(W, f ) ∈ R++ × V.


 P
Moreover, the infimum in inf_t { C(t, f ) : t ≥ 0, Σ_i ti = W } is achieved whenever W > 0.
3. When on a certain bridge there is just one car, of unit weight, the compliance of the bridge
does not exceed 1, whatever the position of the car. How large could the compliance of
the bridge be when there are 100 cars of total weight 70 on it?

To formulate the next two tasks, let us associate with a free node p the set F p of all single-
force loads stemming from forces g of magnitude ∥g∥2 not exceeding 1 and acting at node p. For
1 Preceding exercises in the TTD series are I.16, I.18.
a set S of free nodes, F S is the set of all loads with nonzero forces acting solely at the nodes
from S and with the sum of ∥ · ∥2 -magnitudes of the forces not exceeding 1, so that

F S = Conv(∪p∈S F p )

(why?)
4. Let S = {p1 , . . . , pK } be a K-element collection of free nodes from the nodal set. Assume
that for every node p from S and every load f ∈ F p there exists a truss of a given total
weight W such that its compliance w.r.t. f does not exceed 1. Which, if any, of the following
statements
(i) For every load f ∈ F S , there exists a truss of total volume W with compliance w.r.t. f
not exceeding 1
(ii) There exists a truss of total volume W with compliance w.r.t. every load from F S not
exceeding 1
(iii) For properly selected γ depending solely on d, there exists a truss of total volume γKW
with compliance w.r.t. every load from F S not exceeding 1
is true?
⋆5. Prove the following statement:
In the situation of item 4 above, let γ = 4 when d = 2 and γ = 7 when d = 3. For
every k ≤ K there exists a truss t̂k of total volume γW such that the compliance of t̂k
w.r.t. every load from F^{pk} does not exceed 1. As a result, there exists a truss t̃ of total
volume γKW with compliance w.r.t. every load from F^S not exceeding 1.

18.2 Around support, characteristic, and Minkowski functions


Exercise III.10 ♦ [characteristic and support functions of convex sets] Let X ⊂ Rn be
a nonempty convex set. Characteristic (a.k.a. indicator ) function of X is, by definition, the
function

χX (x) = 0 for x ∈ X, and χX (x) = +∞ for x ̸∈ X.

As is immediately seen, this function is convex and proper. The Legendre transform of this
function is called the support function ϕX (x) of X:

ϕX (x) = sup[x⊤ u − χX (u)] = sup x⊤ u.


u u∈X

1. Prove that χX is lower semicontinuous (lsc) if and only if X is closed, and that the support
functions of X and cl X are the same.
In the remaining part of the Exercise, we are interested in properties of support functions, and in
view of item 1, it makes sense to assume from now on that X, on the top of being nonempty
and convex, is also closed.
Prove the following facts:
2. ϕX (·) is proper lsc convex function which is positively homogeneous of degree 1:

∀(x ∈ Dom ϕX , λ ≥ 0) : ϕX (λx) = λϕX (x).

In particular, the domain of ϕX is a cone. Demonstrate by example that this cone not
necessarily is closed (look at the support function of the closed convex set {[v; w] ∈ R2 : v >
0, w ≤ ln v}).
3. Vice versa, every proper convex lsc function ϕ which is positively homogeneous of degree 1,
(x ∈ Dom ϕ, λ ≥ 0) =⇒ ϕ(λx) = λϕ(x)
is the support function of a nonempty closed convex set, specifically, its subdifferential ∂ϕ(0)
taken at the origin. In particular, ϕX (·) “remembers” X: if X, Y are nonempty closed convex
sets, then ϕX (·) ≡ ϕY (·) if and only if X = Y .
4. Let X, Y be two nonempty closed convex sets. Then ϕX (·) ≥ ϕY (·) if and only if Y ⊂ X.
5. Dom ϕX = Rn if and only if X is bounded.
6. Let X be the unit ball of some norm ∥ · ∥. Then ϕX is nothing but the norm ∥ · ∥∗ conjugate
to ∥ · ∥. In particular, when p ∈ [1, ∞] and X = {x ∈ Rn : ∥x∥p ≤ 1}, we have ϕX (x) ≡ ∥x∥q ,
where 1/p + 1/q = 1.
7. Let x 7→ Ax + b : Rn → Rm be an affine mapping, and let Y = AX + b = {Ax + b : x ∈ X}.
Then
ϕY (v) = ϕX (A⊤ v) + b⊤ v.
Exercise III.11 ♦ [Minkowski functions of convex sets] The goal of this Exercise is to acquaint
the reader with important special family of convex functions – Minkowski functions of convex
sets.
Consider a proper nonnegative lower semicontinuous function f : Rn → R ∪ {+∞} which is
positively homogeneous of degree 1, meaning that
x ∈ Dom f, t ≥ 0 =⇒ tx ∈ Dom f & f (tx) = tf (x).
Note that from the latter property of f and its properness it follows that 0 ∈ Dom f and
f (0) = 0.
We can associate with f its basic sublevel set
X = {x ∈ Rn : f (x) ≤ 1}.
Note that X ”remembers” f , specifically
∀t > 0 : f (x) ≤ t ⇐⇒ f (t−1 x) ≤ 1 ⇐⇒ t−1 x ∈ X,
whence also
∀x ∈ Rn : f (x) = inf { t : t > 0, t−1 x ∈ X } (18.1)
[with the convention that the infimum over the empty set is +∞]
Note that the basic sublevel set of our f cannot be arbitrary: it is convex and closed (since f is
convex lsc) and contains the origin (since f (0) = 0).
Now, given a closed convex set X ⊂ Rn containing the origin, we can associate with it a
function f : Rn → R ∪ {+∞} by construction from (18.1), specifically, as
f (x) = inf { t : t > 0, t−1 x ∈ X } (18.2)
This function is called the Minkowski function (M.f.) of X.
Here goes your first task:
1. Prove that when X ⊂ Rn is convex, closed, bounded, and contains the origin, function f
given by (18.2) is proper, nonnegative, convex lsc function positively homogeneous of degree
1, and X is the basic sublevel set of f . Moreover, f is nothing but the support function ϕX∗
of the polar X∗ of X.
Your next tasks are as follows:
2. What are the Minkowski functions of
• the singleton {0} ?
• a linear subspace ?
• a closed cone K ?
• the unit ball of a norm ∥ · ∥ ?


3. Prove that the Minkowski functions fX , fY of closed convex sets X, Y containing the origin
are linked by the relation fX ≥ fY if and only if X ⊂ Y .
4. When does the Minkowski function of a set X (convex, closed, bounded, and containing the origin)
not take the value +∞?
5. What is the set of zeros of the Minkowski function of a set X (convex, closed, bounded, and
containing the origin)?
6. What is the M.f. of the intersection ∩k≤K Xk of closed convex sets containing the origin?
Exercise III.12 ♦
1. Recall that the closed conic transform
ConeT(X) = cl {[x; t] ∈ Rn × R : t > 0, x/t ∈ X} ,
of a nonempty convex set X ⊂ Rn (see section 1.5) is a closed cone such that
cl(X) = {x : [x; 1] ∈ ConeT(X)}.
What is the cone dual to ConeT(X) ?
2. Let X ⊂ Rn be a nonempty closed convex set and X + = ConeT(X). Prove that
Xt+ := {x : [x; t] ∈ X + } = tX for t > 0 (a), = Rec(X) for t = 0 (b), and = ∅ for t < 0 (c).
3. Let X1 , . . . , XK be closed convex sets in Rn with nonempty intersection X. Prove that
ConeT(X) = ∩k ConeT(Xk ).

4. Let X = ∩k≤K Xk , where X1 , . . . , XK are closed convex sets in Rn such that XK ∩ int X1 ∩
int X2 . . . ∩ int XK−1 ̸= ∅. Prove that ϕX (y) ≤ a if and only if there exist yk , k ≤ K, such
that
y = Σ_k yk  &  Σ_k ϕXk (yk ) ≤ a. (∗)

In words: In the situation in question, the supremum of a linear form on ∩k Xk does not
exceed some a if and only if the form can be decomposed into the sum of K forms with the
sum of their suprema over the respective sets Xk not exceeding a.
5. Prove the following polyhedral version of the claim in item 4:
Let Xk = {x ∈ Rn : Ak x ≤ bk }, k ≤ K, be polyhedral sets with nonempty intersection X.
A linear form does not exceed some a ∈ R everywhere on X if and only if the form can be
decomposed into the sum of K linear forms with the sum of their maxima on the respective
sets Xk not exceeding a.
Exercise III.13 ▲ Let X ⊂ Rn be a nonempty polyhedral set given by polyhedral represen-
tation
X = {x : ∃u : Ax + Bu ≤ r}.
Build polyhedral representation of the epigraph of the support function of X. For non-polyhedral
extension, see Exercise IV.36.
Exercise III.14 ▲ Compute in closed analytic form the support functions of the following
sets:
1. The ellipsoid {x ∈ Rn : (x − c)⊤ C(x − c) ≤ 1} with C ≻ 0
2. The probabilistic simplex {x ∈ Rn_+ : Σ_i xi = 1}
3. The nonnegative part of the unit ∥ · ∥p -ball: X = {x ∈ Rn + : ∥x∥p ≤ 1}, p ∈ [1, ∞]
4. The positive semidefinite part of the unit ball of the ∥ · ∥p,Sh norm: X = {x ∈ Sn_+ : ∥x∥p,Sh ≤ 1}
18.3 Around subdifferentials


Exercise III.15 ♦ Let f be a convex function and x̄ ∈ Dom f ⊂ Rn . Prove that the property
of g ∈ Rn to be a subgradient of f at x̄ is local: the inequality
f (x) ≥ f (x̄) + g ⊤ (x − x̄) (∗)
holds true for all x ∈ Rn iff it holds true for all x in a neighborhood of x̄.
Exercise III.16 ♦ [subdifferentials of norms] Let ∥ · ∥ be a norm on Rn , and ∥ · ∥∗ be its
conjugate (see Fact III.16.4). Prove that
1. The subdifferential of ∥ · ∥ taken at the origin is the unit ball B∗ of ∥ · ∥∗ , or, which is the
same, the polar
{u : u⊤ x ≤ 1 ∀(x : ∥x∥ ≤ 1)}
of the unit ball B of the norm ∥ · ∥.
2. When x ̸= 0, the subdifferential of ∥ · ∥ taken at x is the set {u ∈ B∗ : u⊤ x = ∥x∥}. In
particular, the subdifferential of ∥ · ∥ remains intact when replacing x with tx, t > 0, and is
reflected with respect to the origin when x is replaced with tx, t < 0.
Exercise III.17 ♦ [Schatten norms] Let p ∈ [1, ∞]. The space Sn of symmetric n × n matrices
can be equipped with Schatten p-norms – matrix analogies of the standard ∥ · ∥p -norms on Rn .
Specifically, the Schatten p-norm ∥ · ∥p,Sh of a symmetric matrix X is defined as
∥X∥p,Sh = ∥λ(X)∥p ,
where λ(X), as always, is the vector of eigenvalues of X.
1. Prove that Schatten norms indeed are norms, and the norm conjugate to ∥ · ∥p,Sh is ∥ · ∥q,Sh ,
1/p + 1/q = 1:
∥X∥q,Sh = max_Y {Tr(XY ) : ∥Y ∥p,Sh ≤ 1} (18.3)

2. Verify that ∥ · ∥2,Sh is nothing but the Frobenius norm of X, and ∥X∥∞,Sh is the same as the
spectral norm of X.
Exercise III.18 ♦ [chain rule for subdifferentials] Let Y ⊆ Rm and X ⊆ Rn be nonempty
convex sets, ȳ ∈ Y , x̄ ∈ X, f (·) : Y → R be a convex function, and A(·) : X → Y with A(x̄) = ȳ.
Let, further, K be a closed cone in Rm . Function f is called K-monotone on Y , if for y, y ′ ∈ Y
such that y ′ − y ∈ K it holds f (y ′ ) ≥ f (y), and A is called K-convex on X if for all x, x′ ∈ X
and λ ∈ [0, 1] it holds λA(x) + (1 − λ)A(x′ ) − A(λx + (1 − λ)x′ ) ∈ K. 2
Prove that
1. A is K-convex on X if and only if for every ϕ ∈ K∗ the real-valued function ϕ⊤ A(x) is convex
on X.
2. Let A be K-convex on X and differentiable at x̄. Prove that
∀x ∈ X : A(x) − [A(x̄) + A′ (x̄)[x − x̄]] ∈ K. (∗)
3. Let f be K-monotone on Y and A be K-convex on X. Prove that the real-valued on X
function f◦A (x) = f (A(x)) is convex.
4. Let f be K-monotone on Y . Prove that ∂f (ȳ) ⊂ K∗ provided ȳ ∈ int Y .
5. [chain rule] Let ȳ ∈ int Y , x̄ ∈ int X, let f be K-monotone on Y , A be K-convex on X and
differentiable at x̄. Prove that
∂f◦A (x̄) = [A′ (x̄)]⊤ ∂f (ȳ) = {[A′ (x̄)]⊤ g : g ∈ ∂f (ȳ)} (!)
Exercise III.19 ♦ Recall that the sum Sk (X) of the k ≤ n largest eigenvalues of X ∈ Sn is a
convex function of X, see Remark III.17.4. Point out a subgradient of Sk (·) at a point X ∈ Sn .
2 We shall study cone-monotonicity and cone-convexity in more details in Part IV.
18.4 Around Legendre transform


Exercise III.20 ▲ Compute Legendre transforms of the following univariate functions:
1. f (x) = − ln x, Dom f = (0, ∞)
2. f (x) = ex , Dom f = R.
3. f (x) = x ln x, Dom f = [0, ∞) (0 ln 0 = 0 by definition).
4. f (x) = xp /p, Dom f = [0, ∞); here p > 1.
Exercise III.21 ▲ Compute Legendre transforms of the following functions:
• [log-barrier for the nonnegative orthant Rn_+ ] f (x) = − Σ_{i=1}^n ln xi : int Rn_+ → R
• [log-det barrier for the semidefinite cone Sn_+ ] f (x) = − ln Det(x) : int Sn_+ → R (start with proving
convexity of f ).
Exercise III.22 ♦ [computing Legendre transform of the log-barrier − ln(xn^2 − x1^2 − . . . − x_{n−1}^2 )
for the Lorentz cone] Consider the optimization problem
max_{x,t} { ξ ⊤ x + τ t + ln(t^2 − x⊤ x) : (t, x) ∈ X = {t > √(x⊤ x)} }

where ξ ∈ Rn , τ ∈ R are parameters. Is the problem convex3) ? What is the domain in the
space of parameters where the problem is solvable? What is the optimal value? Is it convex in
the parameters?
Exercise III.23 ♦ Consider the optimization problem

max {f (x, y) = ax + by + ln(ln y − x) + ln(y) : (x, y) ∈ X = {y > exp{x}}} ,


x,y

where a, b ∈ R are parameters. Is the problem convex? What is the domain in space of parame-
ters where the problem is solvable? What is the optimal value? Is it convex in the parameters?
Exercise III.24 ▲ Compute Legendre transforms of the following functions:
• [“geometric mean”] f (x) = − ∏_{i≤n} xi^{πi} : Rn_+ → R, where πi > 0 sum up to 1 and n > 1.
• [“inverse geometric mean”] f (x) = ∏_{i≤n} xi^{−πi} : int Rn_+ → R, where πi > 0.

18.5 Miscellaneous exercises


Exercise III.25 ♦ [multi-factor Hölder inequality] Given positive reals q1 , . . . , qn and p ∈
[1, ∞), we define the weighted p-norm of a vector x ∈ Rn as

|x|p = ( Σ_{j=1}^n qj |xj |^p )^{1/p}

This clearly is a norm which becomes the standard norm ∥ · ∥p when qj = 1, j ≤ n. Same as
∥x∥p , the quantity |x|p has limit, namely, ∥x∥∞ , as p → ∞, and we define | · |∞ as this limit.
Now let pi , i ≤ k, be positive reals such that
Σ_{i=1}^k 1/pi = 1.

3 A maximization problem with objective f (·) and certain constraints and domain is called convex if
the equivalent minimization problem with the objective (−f ) and the original constraints and
domain is convex.
1. Prove that for nonnegative reals a1 , . . . , ak one has


a1 a2 . . . ak ≤ a1^{p1} /p1 + . . . + ak^{pk} /pk ,
or, equivalently (set bi = ai^{pi} ),
∀b ≥ 0 : b1^{1/p1} b2^{1/p2} . . . bk^{1/pk} ≤ b1 /p1 + b2 /p2 + . . . + bk /pk .
Note: the special case pi = k, i ≤ k, of this inequality is the inequality between the geometric
and the arithmetic means.
2. Let x1 , . . . , xk ∈ Rn , and let x1 x2 . . . xk be the entrywise product of x1 , . . . , xk :

[x1 x2 . . . xk ]j = x1j x2j · · · xkj , 1 ≤ j ≤ n.

Prove that
|x1 x2 . . . xk |1 ≤ Σ_{i=1}^k |xi |_{pi}^{pi} /pi . (∗)

3. Prove multi-factor Hölder inequality: for vectors xi ∈ Rn , i ≤ k, one has

|x1 x2 . . . xk |1 ≤ |x1 |p1 |x2 |p2 · · · |xk |pk (#)


Note: (#) was stated for positive reals p1 , . . . , pk with Σ_i 1/pi = 1. It is immediately seen
that (#) remains true when pi = ∞ for some i (and, of course, 1/pi is set to 0 for these i).
Note: (#) is the general form of Hölder inequality which in the main text was proved for k = 2
and | · |pi = ∥ · ∥pi . Needless to say, this inequality extends to the case when the xi are functions
xi (ω) on a space with measure q(·), and the finite sums Σ_{j=1}^n qj fj are replaced with integrals,
resulting in
∫ | ∏_{i=1}^k xi (ω) | q(dω) ≤ ∏_{i=1}^k ( ∫ |xi (ω)|^{pi} q(dω) )^{1/pi}

provided some measurability conditions are satisfied. In this textbook we, however, do not touch
infinite-dimensional spaces of functions and related norms.
Exercise III.26 ♦ [Muirhead’s inequality] For any u ∈ Rn and z ∈ Rn_{++} := {z ∈ Rn : z > 0}
define
fz (u) = (1/n!) Σ_σ z_{σ(1)}^{u1} · · · z_{σ(n)}^{un} ,
where the sum is over all permutations σ of {1, . . . , n}. Show that if P is a doubly stochastic
n × n matrix, then
fz (P u) ≤ fz (u) ∀(u ∈ Rn , z ∈ Rn_{++} ).
++ ).

Exercise III.27 ♦ Prove that a convex lsc function f with polyhedral domain is continuous
on its domain. Does the conclusion remain true when lifting either one of the assumptions that
(a) convex f is lsc, and (b) Dom f is polyhedral?
Exercise III.28 ▲ Let a1 , . . . , an > 0, α, β > 0. Solve the optimization problem
min_x { Σ_{i=1}^n ai /xi^α : x > 0, Σ_i xi^β ≤ 1 }
Exercise III.29 ▲ [computational study] Consider the following situation: there are K ”radars”
with k-th of them capable to locate targets within ellipsoid Ek = {x ∈ Rn : (x−ck )⊤ Ck (x−ck ) ≤
1} (Ck ≻ 0); the measured position of target is
yk = x + σk ζk ,
where x is the actual position of the target, and ζk is the standard (zero mean, unit covariance)
Gaussian observation noise; ζk are independent across k. Given measurements y1 , . . . , yK of
target’s location x known to belong to the “common field of view” E = ∩k Ek of the radars,
which we assume to possess a nonempty interior, we want to estimate a given linear form e⊤ x
of x by using linear estimate
x̂ = Σ_k hk⊤ yk + h.

We are interested in finding the estimate (i.e., the parameters h1 , . . . , hK , h) minimizing the
risk
Risk2 = max_{x∈E} ( E{ [ e⊤ x − Σ_k hk⊤ [x + σk ζk ] − h ]^2 } )^{1/2}

1. Pose the problem as convex optimization program


2. Process the problem numerically and look at the results.
Recommended setup:
 
• K = 3, n = 2, [c1 , c2 , c3 ] = [1.000, −0.500, −0.500; 0, 0.866, −0.866],
C1 = [0.2500, 0; 0, 1.5000], C2 = [1.1875, 0.5413; 0.5413, 0.5625], C3 = [1.1875, −0.5413; −0.5413, 0.5625]
• σ1 = 0.1, σ2 = 0.2, σ3 = 0.3
• e = [1; 1]/√2.

Figure III.5. 3 radars and their common field of view (dotted)


Exercise III.30 ♦ For any k ≤ m and X ∈ Sm , recall that Sk(X) denotes the sum of k
largest eigenvalues of the matrix X. Given X ∈ Sm , define R[X] := {V ⊤ XV : V ∈ Om } where
Om = {V ∈ Rm×m : V V ⊤ = Im } is the set of all m × m orthogonal matrices. Prove that for
any two symmetric matrices X, Y ∈ Sm , we have
Y ∈ Conv(R[X]) if and only if Sk (Y ) ≤ Sk (X) for all k < m and Tr(Y ) = Tr(X).
19

Proofs of Facts from Part III

Fact III.16.2 Given a proper convex function f : Rn → R ∪ {+∞}, its Legendre


transform f ∗ is a proper convex lower semicontinuous function.
Proof. This fact immediately follows from the definition of the Legendre transform f ∗ : indeed,
we lose nothing when replacing sup_{x∈Rn} [ d⊤ x − f (x) ] with sup_{x∈Dom f} [ d⊤ x − f (x) ], so that the
Legendre transform is the supremum of a nonempty (as f is proper) family of affine functions
and as such is convex and lower semicontinuous. Since this supremum is finite at least at one
point (namely, at every d which is the slope of an affine minorant of f and we know that such
a minorant exists), f ∗ is a proper convex lsc function, as claimed.

Fact III.16.4 Let ∥ · ∥ be a norm on Rn . Then, its dual norm ∥ · ∥∗ is indeed a norm.
Moreover, the norm dual to ∥ · ∥∗ is the original norm ∥ · ∥, and the unit balls of
conjugate to each other norms are polars of each other.
Proof. From definition of ∥ · ∥∗ it immediately follows that ∥ · ∥∗ satisfies all three conditions
specifying a norm. To justify that the norm dual to ∥ · ∥∗ is ∥ · ∥, note that the unit ball of the
dual norm is, by the definition of this norm, the polar of the unit ball of ∥ · ∥, and the latter
set, as the unit ball of any norm, is closed, convex, and contains the origin. As a result, the unit
balls of ∥ · ∥, ∥ · ∥∗ are polars of each other (Proposition II.8.37), and the norm dual to dual is
the original one – its unit ball is the polar of the unit ball of ∥ · ∥∗ .

Fact III.16.5 Let f (x) = ∥x∥ be a norm on Rn . Then,


f ∗ (d) = 0 if ∥d∥∗ ≤ 1, and f ∗ (d) = +∞ otherwise.

That is, the Legendre transform of ∥ · ∥ is the characteristic function of the unit ball
of the conjugate norm.
Proof. Consider any fixed d ∈ Rn . By the definition of Legendre transform we have
f ∗ (d) = sup_{x∈Rn} { d⊤ x − f (x) } = sup_{x∈Rn} { d⊤ x − ∥x∥ }.

Now, consider the function gd (x) := d⊤ x − ∥x∥ so that f∗ (d) = supx∈Rn gd (x). The function
gd (x) is positively homogeneous, of degree 1, in x, so that its supremum over the entire space
is either 0 (this happens when the function is nonpositive everywhere), or +∞. By the same
homogeneity, the function gd (x) is nonpositive everywhere if and only if it is nonpositive when
∥x∥ = 1, that is, if and only if d⊤ x ≤ 1 whenever ∥x∥ = 1, or, which is the same, when d⊤ x ≤ 1
whenever ∥x∥ ≤ 1. The bottom line is that f ∗ (d) = supx gd (x) is either 0, or +∞, with the first

option taking place if and only if d⊤ x ≤ 1 whenever ∥x∥ ≤ 1, that is, if and only if ∥d∥∗ ≤ 1.

Fact III.17.6 Let g : R → R ∪ {+∞} be a convex function. Then, the function


(
Tr(g(X)), if σ(X) ⊆ Dom g,
F (X) := : Sn → R ∪ {+∞}
+∞, otherwise,
is convex.
Proof. Define ĝ(x) := Σ_{i=1}^n g(xi ), so that ĝ is a permutation symmetric and convex (since
g is convex) function on Rn . Then, by Proposition III.17.3 the function F̂ (X) := ĝ(λ(X)) is
a convex function of X ∈ Sn . Now, Dom F̂ = {X ∈ Sn : σ(X) ⊆ Dom g}, so Dom F̂ =
Dom F . Consider any X ∈ Dom F̂ along with its eigenvalue decomposition given by X =
U Diag{λ(X)}U ⊤ . By definition of g(X), we have g(X) = U Diag{g(λ1 (X)), . . . , g(λn (X))}U ⊤
and F (X) = Tr(g(X)) = Σ_{i=1}^n g(λi (X)) = ĝ(λ(X)) = F̂ (X). We conclude that F is nothing
but the convex function F̂ .

Fact III.17.7. For x ∈ Rn , let x↑ and x↓ be the vectors obtained by reordering


the entries of x in the non-decreasing and non-increasing orders, respectively. For
example, [1; 3; 2; 1]↑ = [1; 1; 2; 3] and [1; 3; 2; 1]↓ = [3; 2; 1; 1].
(i) For every x, y ∈ Rn and every n × n doubly stochastic matrix P , we have
[x↑ ]⊤ y↑ ≥ x⊤ P y ≥ [x↑ ]⊤ y↓ .
As a result,
(ii) [Trace inequality] For every A, B ∈ Sn , we have
λ⊤ (A)λ(B) ≥ Tr(AB) ≥ λ⊤ (A)[λ(B)]↑ .
Proof.
(i): First, we claim that for all x, y ∈ Rn we have

x↓⊤ y↓ ≥ x⊤ y ≥ x↓⊤ y↑ . (*)
Indeed, by continuity, it suffices to verify this relation when all entries of x, same as all entries
in y, are distinct from each other. In such a case, observe that the inequalities to be proved
remain intact when we simultaneously reorder, in the same order, entries in x and in y, so
that we can assume without loss of generality that x1 ≥ x2 ≥ . . . ≥ xn . Taking into account
that ac + bd − [ad + bc] = [a − b][c − d], we see that if i < j and ȳ is obtained from y by swapping
its i-th and j-th entries, we have x⊤ ȳ ≤ x⊤ y when yi > yj and x⊤ ȳ ≥ x⊤ y otherwise. Thus,
the minimum (the maximum) of inner products x⊤ z over the set of vectors z obtained by
reordering entries in y is achieved when z = y↑ (respectively, z = y↓ ), as claimed.
In the situation of (i), by Birkhoff Theorem, P y is a convex combination of vectors obtained
from y by reordering entries, and so the relation in (i) is immediately implied by (∗).
(ii): Let A = U Diag{λ(A)}U ⊤ be the eigenvalue decomposition of A. Then,

Tr(AB) = Tr(U Diag{λ(A)}U ⊤ B) = Tr(Diag{λ(A)}(U ⊤ BU )) = (λ(A))⊤ µ,

where µ is the diagonal of the matrix U ⊤ BU , i.e., µ = Dg{U ⊤ BU }. Then, by Lemma III.17.2
we deduce that there exists a doubly stochastic matrix P such that

µ = Dg{U ⊤ BU } = P λ(U ⊤ BU ) = P λ(B).


Thus, Tr(AB) = (λ(A))⊤ µ = (λ(A))⊤ P λ(B). The desired inequality then follows from ap-
plying part (i) to the vectors x := λ(A) and y := λ(B).
Part IV
Convex Programming, Lagrange Duality, Saddle Points

20
Convex Programming problems and Convex Theorem on Alternative

20.1 Mathematical Programming and Convex Programming problems
For reader’s convenience, we start with reproducing the basic optimization ter-
minology presented in section 4.5.1.
A (constrained) Mathematical Programming problem has the following form:
(P) min_x { f (x) : x ∈ X, g(x) ≡ [g1 (x); . . . ; gm (x)] ≤ 0, h(x) ≡ [h1 (x); . . . ; hk (x)] = 0 } , (20.1)
where

• [domain] X ⊆ Rn is called the domain of the problem.


• [objective] f is called the objective (function) of the problem,
• [constraints] gi , i = 1, . . . , m, are called the (functional) inequality constraints,
and hj , j = 1, . . . , k, are called the equality constraints 1) .
We always assume that X ̸= ∅ and that the objective and the constraints are
well-defined on X. Moreover, we typically skip indicating X when X = Rn . Thus,
in the sequel, unless the domain is explicitly present in the formulation, it is the
entire Rn .
We use the following standard terminology related to (20.1)

• [feasible solution] a point x ∈ Rn is called a feasible solution to (20.1), if


x ∈ X, gi (x) ≤ 0, i = 1, . . . , m, and hj (x) = 0, j = 1, . . . , k, i.e., if x satisfies
all restrictions imposed by the formulation of the problem.
– [feasible set] the set of all feasible solutions is called the feasible set of the
problem.
– [feasible problem] a problem with a nonempty feasible set (i.e., the one which
admits feasible solutions) is called feasible (or consistent).
– [active constraint] an inequality constraint gi (·) ≤ 0 is called active at a given
1 Rigorously speaking, the constraints are not the functions gi , hj , but the relations gi (x) ≤ 0,
hj (x) = 0. We will use the word “constraints” in both of these senses, and it will always be clear
what is meant. For example, we will say that “x satisfies the constraints” to refer to the relations,
and we will say that “the constraints are differentiable” to refer to the underlying functions.

feasible solution x, if this constraint is satisfied at the point as an equality


rather than strict inequality, i.e., if
gi (x) = 0.
Each equality constraint hj (x) = 0 by definition is active at every feasible
solution x.
• [optimal value] the optimal value of the problem refers to the quantity

f ∗ := inf x {f (x) : x ∈ X, g(x) ≤ 0, h(x) = 0} if the problem is feasible, and f ∗ := +∞ if the problem is infeasible.

– [below boundedness] the problem is called below bounded, if its optimal value
is > −∞, i.e., if the objective is bounded from below on the feasible set.
• [optimal solution] a point x ∈ Rn is called an optimal solution to (20.1), if x
is feasible and f (x) ≤ f (x′ ) for any other feasible solution x′ , i.e., if
x ∈ Argmin {f (x′ ) : x′ ∈ X, g(x′ ) ≤ 0, h(x′ ) = 0} .

– [solvable problem] a problem is called solvable, if it admits optimal solutions.


– [optimal set] the set of all optimal solutions to a problem is called its optimal
set.
The terminology above is for minimization problems; for its “maximization mod-
ifications,” see section 4.5.1.

20.1.1 Convex Programming problem


A Mathematical Programming problem (P) is called convex (or Convex Program-
ming problem), if
• X is a convex subset of Rn ,
• f, g1 , . . . , gm are real-valued convex functions on X, and
• there are no equality constraints at all.
Note that instead of saying that there are no equality constraints, we could say
that there are constraints of this type, but only linear (affine) ones; this latter case
can be immediately reduced to the one without equality constraints by replacing
Rn with the affine subspace given by the (linear) equality constraints.

20.2 Convex Theorem on Alternative


The simplest case of a convex problem is, of course, a Linear Programming prob-
lem – the one where X = Rn and the objective and all the constraints are linear.
The main descriptive components of LP are LP duality and optimality condi-
tions; our primary goal in this and forthcoming chapters is to extend duality and
optimality conditions from Linear to Convex programming.
The origin of our developments is based on the following simple observation:
the fact that a point x∗ is an optimal solution can be expressed in terms of


feasibility/infeasibility of certain systems of constraints. These systems in our
current setup of convex optimization problems are given by
x ∈ X, f (x) ≤ c, gj (x) ≤ 0, j = 1, . . . , m (20.2)
and
x ∈ X, f (x) < c, gj (x) ≤ 0, j = 1, . . . , m; (20.3)
here c is a parameter. Optimality of x∗ for the problem means precisely that
for appropriately chosen c (this choice, of course, is c = f (x∗ )) the first of these
systems is feasible and x∗ is its feasible solution, while the second system is
infeasible. Next, in the case of LP, we converted the “negative” part of this
simple observation –the claim that (20.3) is infeasible– into a positive statement,
using the General Theorem on Alternative (Theorem I.4.3), and this gave us the
LP Duality Theorem (Theorem I.4.9).
We will follow the same approach for convex optimization problems. To this
end, we need a “convex analogy” to the Theorem on Alternative – something like
the latter statement, but for the case when the inequalities in question are given
by convex functions rather than the linear ones (and, besides, we now have to
handle a “convex inclusion” x ∈ X).
Indeed, it is easy to guess the result we need. How did we come to the formu-
lation of the Theorem on Alternative? The main question, basically, boiled down
to how to express in an affirmative manner the fact that a system of linear in-
equalities has no solutions. To this end, we observed that if we can combine, in a
linear fashion, the inequalities of the system and get an obviously false inequality
like 0 ≤ −1, then the system is infeasible. Note that this condition is nothing but
a certain affirmative statement with respect to the weights with which we are
combining the original inequalities.
Now, the scheme of the above reasoning has nothing tied to linearity (and even
convexity) of the inequalities in question. Indeed, consider an arbitrary system
of constraints of the type (20.3):
f (x) < c
gj (x) ≤ 0, j = 1, . . . , m (I)
x ∈ X.
Here, all we assume is that X is a nonempty subset in Rn and f, g1 , . . . , gm are
real-valued functions on X. Then, it is absolutely evident that
if there exist nonnegative weights λ1 , . . . , λm such that the inequality
f (x) + Σ_{j=1}^m λj gj (x) < c (20.4)

has no solutions in X, then (I) also has no solutions.


Indeed, a solution to (I) is clearly a solution to (20.4) – the latter inequality is
nothing but a combination of the inequalities from (I) with the weights 1 (for the
first inequality) and λj (for the remaining ones).
Now, what does it mean that (20.4) has no solutions in the domain X? A
necessary and sufficient condition for this is that the infimum of the left hand
side of (20.4) over the domain x ∈ X is greater than or equal to c. Thus, we
arrive at the following evident result.

Proposition IV.20.1 [Sufficient condition for infeasibility of (I)] Consider


a system (I) with arbitrary data and assume that the system
inf_{x∈X} [ f (x) + Σ_{j=1}^m λj gj (x) ] ≥ c,
λj ≥ 0, j = 1, . . . , m (II)
with unknowns λ1 , . . . , λm has a solution. Then, (I) is infeasible.
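For illustration, here is a toy example we add for concreteness. Take X = R, m = 1, f (x) = x, g1 (x) = 1 − x, and c = 1, so that (I) reads: x < 1, 1 − x ≤ 0. With λ1 = 1 the left hand side of (20.4) becomes x + (1 − x) ≡ 1, whence inf x∈X [f (x) + λ1 g1 (x)] = 1 ≥ c. Thus (II) is solvable, and Proposition IV.20.1 certifies that (I) has no solutions; in this toy case this is also evident directly, since (I) requires simultaneously x < 1 and x ≥ 1.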

Let us stress that Proposition IV.20.1 is completely general; it does not require
any assumptions (not even convexity) on the entities involved.
That said, Proposition IV.20.1, unfortunately, is not so helpful: the actual
power of the Theorem on Alternative (and the key fact utilized in the proof of the
Linear Programming Duality Theorem) is not the sufficiency of the condition of
Proposition for infeasibility of (I), but the necessity of this condition. Justification
of necessity of the condition in question has nothing to do with the evident
reasoning that established its sufficiency. In the linear case (X = Rn , f , g1 , . . . , gm
are linear), we established the necessity via the Homogeneous Farkas Lemma. We
will next prove the necessity of the condition for the convex case. At this step, we
already need some additional, although minor, assumptions; and in the general
nonconvex case the sufficient condition stated in Proposition IV.20.1 simply is
not necessary for the infeasibility of (I). This, of course, is very bad-yet-expected
news – this is the reason why there are difficult optimization problems that we
do not know how to solve efficiently.
The just presented “preface” outlines our action plan. Let us carry out our
plan by formally defining the aforementioned “minor regularity assumptions.”

Definition IV.20.2 [Slater Condition] Let X ⊆ Rn and let g1 , . . . , gm be


real-valued functions on X. We say that these functions satisfy the Slater
condition on X, if there exists a strictly feasible solution x, that is, x ∈ rint X
such that gj (x) < 0, j = 1, . . . , m.
We say that an inequality constrained problem
minx {f (x) : gj (x) ≤ 0, j = 1, . . . , m, x ∈ X}
(IC)
[where f, g1 , . . . , gm are real-valued functions on X]
satisfies the Slater condition (synonym: is strictly feasible), if g1 , . . . , gm sat-
isfy this condition on X.

In the case where some of the constraints are linear, we rely on a slightly relaxed
regularity condition.
Definition IV.20.3 [Relaxed Slater Condition] Let X ⊆ Rn , and let g1 , . . . , gm


be real-valued functions on X. We say that g1 , . . . , gm satisfy the Relaxed
Slater condition on X, if there exists x ∈ rint X such that gj (x) ≤ 0 for all
1 ≤ j ≤ m, and gj (x) < 0 for all j with non-affine gj .
An inequality constrained problem (IC) is said to satisfy the Relaxed Slater
condition (synonym: is essentially strictly feasible), if g1 , . . . , gm satisfy this
condition on X.
A system of equality and inequality constraints
gj (x) ≤ 0, j = 1, . . . , m, hi (x) = 0, i = 1, . . . , k, x ∈ X
(C)
[where g1 , . . . , gm , h1 , . . . , hk are real-valued functions on X]
is said to satisfy Relaxed Slater Condition (synonym: is essentially strictly
feasible), if all hi are affine functions, and there exists an essentially strictly
feasible solution, that is, a feasible solution x ∈ rint X where all inequality
constraints gj (x) ≤ 0 with non-affine gj are satisfied as strict inequalities.
An optimization problem of minimizing a (real-valued on X) objective f
under constraints (C) is called essentially strictly feasible, if the system of
constraints is so.
Note: (C) is essentially strictly feasible if and only if the equivalent inequality
reformulation
gj (x) ≤ 0, j = 1, . . . , m, ± hi (x) = 0, i = 1, . . . , k, x ∈ X
of (C) is essentially strictly feasible.

Clearly, the validity of Slater condition implies the validity of the Relaxed Slater
condition (why?). We are about to establish the following fundamental fact.

Theorem IV.20.4 [Convex Theorem on Alternative] Let X ⊆ Rn be con-


vex, let f, g1 , . . . , gm be real-valued convex functions on X, and let g1 , . . . , gm
satisfy the Relaxed Slater condition on X. Then, system (I) is feasible if and
only if system (II) is infeasible.

Theorem IV.20.4 is a special case of Theorem IV.20.13 to be formulated and


proved in the next section.

20.3 ⋆ Convex Theorem on Alternative – cone-constrained form


We will indeed present and prove a form of Theorem IV.20.4 that will be stronger.
To this end, we need a few definitions and concepts related to cones.

Definition IV.20.5 [Regular cone] A cone K ⊂ Rn is called a regular cone


if K is closed, convex, full dimensional (i.e., possesses a nonempty interior),
and is pointed (i.e., K ∩ (−K) = {0}).

In our developments, we will frequently examine the dual cones as well. There-
fore, we introduce the following elementary fact on the regularity of dual cones.

Fact IV.20.6 (i) A cone K ⊆ Rn is regular if and only if its dual cone
K∗ = {y ∈ Rn : y ⊤ x ≥ 0, ∀x ∈ K} is regular.
(ii) Given regular cones K1 , . . . , Km , their direct product K1 × . . . × Km is
also regular.

There are a number of “magic cones” that are regular and play a crucial role
in Convex Optimization. In particular, many convex optimization problems from
practice can be posed as optimization problems involving domains expressed using
these cones as the basic building blocks.

Fact IV.20.7 The following cones (see Examples discussed in section 1.2.4)
are regular:
(i) Nonnegative ray, R+ .
(ii) Lorentz (a.k.a. second-order, or ice-cream) cone, L^n = { x ∈ R^n : x_n ≥ √(x_1² + . . . + x_{n−1}²) }  (L¹ := R₊).
(iii) Positive semidefinite cone, Sn+ = {X ∈ Sn : a⊤ Xa ≥ 0, ∀a ∈ Rn }.
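The membership tests for these three cones are easy to run numerically. Below is a small sketch (assuming numpy is available; the test data are arbitrary choices) that checks a point against each cone up to a small tolerance.

import numpy as np

def in_nonneg_ray(t, tol=1e-9):
    # R_+ : a scalar belongs to the cone iff it is nonnegative
    return t >= -tol

def in_lorentz(x, tol=1e-9):
    # L^n : x_n >= sqrt(x_1^2 + ... + x_{n-1}^2)
    x = np.asarray(x, dtype=float)
    return x[-1] >= np.linalg.norm(x[:-1]) - tol

def in_psd(X, tol=1e-9):
    # S^n_+ : symmetric matrix with nonnegative eigenvalues
    X = np.asarray(X, dtype=float)
    return np.allclose(X, X.T) and np.min(np.linalg.eigvalsh(X)) >= -tol

print(in_nonneg_ray(0.3))                              # True
print(in_lorentz([3.0, 4.0, 5.0]))                     # True: 5 >= sqrt(9 + 16)
print(in_psd(np.array([[2.0, 1.0], [1.0, 2.0]])))      # True: eigenvalues 1 and 3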

Our developments will be based on an important concept that we introduce


now.

Definition IV.20.8 [Cone-convexity] Let K ⊂ Rν be a regular cone. A map


h(·) : Dom h → Rν is called K-convex if Dom h is a convex set in some Rn
and for every x, y ∈ Dom h and λ ∈ [0, 1] we have
λh(x) + (1 − λ)h(y) − h(λx + (1 − λ)y) ∈ K. (20.5)

Note that in the simplest case of K = Rν+ (nonnegative orthant is a regular


cone!) K-convexity of a map h means exactly that the components of h are convex
functions with common domain.
An instructive example of a “genuine cone-convex” function is as follows:

Lemma IV.20.9 Let K = S^m_+, and consider h : R^{m×n} → S^m defined as h(x) = xx⊤. Then, h is K-convex.

Proof. Indeed, for any x, y ∈ R^{m×n} and λ ∈ [0, 1], we have
(λx + (1 − λ)y)(λx + (1 − λ)y)⊤ = λxx⊤ + (1 − λ)yy⊤ − λ(1 − λ)(x − y)(x − y)⊤.
Therefore,
λh(x) + (1 − λ)h(y) − h(λx + (1 − λ)y) = λ(1 − λ)(x − y)(x − y)⊤ ⪰ 0,
since λ(1 − λ) ≥ 0 and (x − y)(x − y)⊤ ⪰ 0.
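A quick numerical sanity check of this computation (a sketch assuming numpy; the sizes m, n and the test points are arbitrary): the convexity gap λh(x) + (1 − λ)h(y) − h(λx + (1 − λ)y) should always be a positive semidefinite matrix.

import numpy as np

rng = np.random.default_rng(0)
m, n, lam = 3, 4, 0.3
x, y = rng.standard_normal((m, n)), rng.standard_normal((m, n))

h = lambda z: z @ z.T                      # h(z) = z z^T, mapping R^{m x n} into S^m
gap = lam * h(x) + (1 - lam) * h(y) - h(lam * x + (1 - lam) * y)

# by the identity in the proof, gap = lam*(1-lam)*(x-y)(x-y)^T, hence PSD
print(np.allclose(gap, lam * (1 - lam) * (x - y) @ (x - y).T))   # True
print(np.min(np.linalg.eigvalsh(gap)) >= -1e-12)                 # True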

See chapter 25 for other instructive examples of K-convex functions and their
“calculus.”
Indeed, K-convexity can be expressed in terms of the usual convexity due to
the following immediate observation.

Fact IV.20.10 Let K be a regular cone in Rν , Z ⊆ Rn be a nonempty


convex set and h : Z → Rν be a mapping with dom h = Z. Then, h is
K-convex if and only if for all µ ∈ K∗ (where K∗ is the cone dual to K) we
have that the real valued functions µ⊤ h(·) are convex on Z.

Given a regular cone K ⊂ Rν , we can associate with it K-inequality between


vectors of Rν : we say that a ∈ Rν is K-greater than or equal to b ∈ Rν (notation:
a ≥K b, or, equivalently, b ≤K a) when a − b ∈ K:

a≥K b ⇐⇒ b≤K a ⇐⇒ a − b ∈ K.

For example, when K = Rν+ is nonnegative orthant, ≥K is the standard coordinate-


wise vector inequality ≥. That is, a ≥ b means that every entry of a is greater
than or equal to, in the standard arithmetic sense, the corresponding entry in b.
K-vector inequality possesses all algebraic properties of ≥.

Fact IV.20.11 Let K ⊂ Rν be a regular cone. Then, any K-inequality


a ≥K b satisfies all of the following properties:
(i) It is a partial order on Rν , i.e., the relation a ≥K b is
• reflexive: a ≥K a for all a;
• anti-symmetric: a ≥K b and b ≥K a if and only if a = b;
• transitive: if a ≥K b and b ≥K c, then a ≥K c.
(ii) It is compatible with linear operations, i.e., ≥K -inequalities can be
• summed up: if a≥K b and c≥K d, then a + c≥K b + d;
• multiplied by nonnegative reals: if a≥K b and λ is a nonnegative real,
then λa≥K λb.
(iii) It is compatible with convergence, i.e., one can pass to sidewise limits in
≥K -inequality:
• if at ≥K bt , t = 1, 2, . . ., and at → a and bt → b as t → ∞, then a≥K b.
(iv) It gives rise to strict version >K of ≥K -inequality a >K b (equivalently:
b <K a) meaning that a − b ∈ int K. The strict K-inequality possesses the
basic properties of the coordinate-wise >, specifically,
• >K is stable: if a >K b and a′ , b′ are close enough to a, b respectively,
then a′ >K b′ ;
• if a >K b, λ is a positive real, and c≥K d, then λa >K λb and a + c >K
b + d.

In summary, the arithmetic of ≥_K and >_K inequalities is completely similar to that of the usual ≥ and > inequalities. Verification of the claims made in
Fact IV.20.11 is immediate and is left to the reader.
In the standard approach to nonlinear convex optimization, the Mathematical
Programming problem that is convex has the following form
min {f (x) : g(x) := [g1 (x); . . . ; gm (x)] ≤ 0, [h1 (x); . . . ; hk (x)] = 0} ,
x∈X

where X is a convex set, f (x), gi (x) : X → R, 1 ≤ i ≤ m are convex functions


and hj (x), 1 ≤ j ≤ k are affine functions. In this form, the nonlinearity “sits”
in the functions gi and/or non-polyhedrality of the set X. Since 1990s, it was
realized that, along with this form, it is extremely convenient to consider the
conic form where the nonlinearity “sits” in the inequality relations ≤. That is,
the usual coordinate-wise ≤ is replaced with ≤K , where K is a regular cone. The
resulting convex program in cone-constrained form reads
 

 

min_{x∈X} { f(x) : g(x) := Ax − b ≤ 0, ĝ(x) ≤_K 0 },   (20.6)
[here ĝ(x) ≤_K 0 ⟺ ĝ(x) ∈ −K]

where X ⊂ Rn is a convex set, f : X → R is a convex function, K ⊂ Rν is a


regular cone, A ∈ Rk×n , and gb : X → Rν is K-convex. Note that K-convexity of
gb in our new notation is simply equivalent to the requirement
ĝ(λx + (1 − λ)y) ≤_K λĝ(x) + (1 − λ)ĝ(y), ∀x, y ∈ X, ∀λ ∈ [0, 1].
Indeed, when K is the nonnegative orthant, (20.6) recovers the Mathematical
Programming form (20.1) of a convex problem.
It turns out that with the “cone-constrained approach” to Convex Programming, we lose nothing when restricting ourselves to X = R^n, linear f(x), and affine ĝ(x); this specific version of (20.6) is called a “conic problem” (to be considered in more detail later). That said, it makes sense to speak about a “less extreme”
form of a convex program, specifically, one presented in (20.6); we call problems
of this form “convex problems in cone-constrained form,” reserving the words
“conic problems” for problems (20.6) with X = Rn , linear f and affine gb, see
section 22.4.
The developments from section 20.2 can be naturally extended to the cone-
constrained case as follows. Let X ⊆ Rn be a nonempty convex set, K ⊂ Rν be
a regular cone, f be a real-valued function on X, and gb(·) be a mapping from
X into Rν . Instead of feasibility/infeasibility of system (I) we can speak about
feasibility/infeasibility of system of constraints
f (x) < c
g(x) := Ax − b ≤ 0
(ConI)
gb(x) ≤K 0 [ ⇐⇒ gb(x) ∈ −K]
x ∈ X
in variables x, i.e., this is the cone-constrained analogy of inequality-constrained

system (I). We call system (ConI) convex, if, in addition to already assumed
convexity of X, the function f is convex on X, and the map gb is K-convex on X.
Denoting by K∗ the cone dual to K, a sufficient condition for the infeasibility of (ConI) is the feasibility of the following system of constraints
inf_{x∈X} [ f(x) + λ⊤g(x) + λ̂⊤ĝ(x) ] ≥ c
λ ≥ 0   (ConII)
λ̂ ≥_{K∗} 0  [ ⟺ λ̂ ∈ K∗ ]
in variables λ = (λ, λ̂).

Indeed, given a feasible solution (λ, λ̂) to (ConII) and “aggregating” the constraints in (ConI) with weights 1, λ, λ̂ (i.e., taking sidewise inner products of the constraints in (ConI) based on these aggregation weights and summing up the results), we arrive at the inequality
f(x) + λ⊤g(x) + λ̂⊤ĝ(x) < c,
which due to λ ≥ 0 and λ̂ ∈ K∗ is a consequence of (ConI) – it must be satisfied at every feasible solution to (ConI). On the other hand, the aggregated inequality contradicts the first constraint in (ConII), and we conclude that under the circumstances (ConI) is infeasible.
“Cone-constrained” version of Slater/Relaxed Slater condition is as follows:

Definition IV.20.12 [Cone-constrained Slater/Relaxed Slater Condition]


We say that the system of constraints
g(x) := Ax − b ≤ 0, gb(x)≤K 0 (S)
in variables x satisfies
— Slater condition on X, if there exists x̄ ∈ rint X such that g(x̄) < 0 and
gb(x̄)<K 0 (i.e., gb(x̄) ∈ − int K),
— Relaxed Slater condition on X, if there exists x̄ ∈ rint X such that g(x̄) ≤ 0
and gb(x̄)<K 0.
We say that optimization problem in cone-constrained form, i.e., problem of
the form
min {f (x) : g(x) := Ax − b ≤ 0, gb(x)≤K 0} (ConIC)
x∈X

satisfies Slater/Relaxed Slater condition, if the system of its constraints sat-


isfies this condition on X.
Note that in the case of K = Rν+ , (ConI) and (ConII) become, respectively,
(I) and (II), and the cone-constrained versions of Slater/Relaxed Slater condition
become the usual ones.
We are now ready to state Convex Theorem on Alternative in cone-constrained
form dealing with convex cone-constrained system (ConI), and obtain Theorem
IV.20.4 as a particular case of this result.

Theorem IV.20.13 [Convex Theorem on Alternative in cone-constrained


form] Let K ⊂ Rν be a regular cone, let X ⊆ Rn be nonempty and convex,
let f be real-valued convex function on X, g(x) = Ax − b be affine, and gb(x) :
X → Rν be K-convex. Suppose that system (S) satisfies the cone-constrained
Relaxed Slater condition on X. Then, the system (ConI) is feasible if and
only if the system (ConII) is infeasible.

Note: In some cases (ConI) may have no affine (i.e., polyhedral) part g(x) :=
Ax − b ≤ 0 and/or no “general part” gb(x)≤K 0; absence of one or both of these
parts leads to self-evident modifications in (ConII). To unify our forthcoming
considerations, it is convenient to assume that both of these parts are present.
This assumption is for free: it is immediately seen that in our context, in absence
of one or both of g-constraints in (ConI) we lose nothing when adding artificial
affine part g(x) := 0⊤ x − 1 ≤ 0 instead of missing affine part, and/or artifi-
cial general part gb(x) := 0⊤ x − 1≤K 0 with K = R+ instead of missing general
part. Thus, we lose nothing when assuming from the very beginning that both
polyhedral and general parts are present.
It is immediately seen that Convex Theorem on Alternative (Theorem IV.20.4)
is a special case of Theorem IV.20.13 corresponding to the case when K is a
nonnegative orthant.
In the proof of Theorem IV.20.13, we will use Lemma IV.20.14, a generalization
of the Inhomogeneous Farkas Lemma. We will present the proof of this result after
the proof of Theorem IV.20.13.

Lemma IV.20.14 Let X ⊆ Rn be a convex set with 0 ∈ int X. Let g(x) =


Ax + a : Rn → Rk and d⊤ x + δ : Rn → R be affine functions such that
g(0) ≤ 0 and
x ∈ X, g(x) ≤ 0 =⇒ d⊤ x + δ ≤ 0.
Then, there exists a vector µ ≥ 0 such that
d⊤ x + δ ≤ µ⊤ g(x), ∀x ∈ X.

Proof of Theorem IV.20.13. The first part of the statement – “if (ConII) has
a solution, then (ConI) has no solutions” – has been already verified. What we
need is to prove the reverse statement. Thus, let us assume that (ConI) has no
solutions, and let us prove that then (ConII) has a solution.
00 . Without loss of generality we may assume that X is full-dimensional:
rint X = int X (indeed, otherwise we can replace our “universe” Rn with the
affine span of X). Besides this, if needed shifting f by a constant, we can assume
that c = 0. Thus, we are in the case where

f(x) < 0,
g(x) := Ax − b ≤ 0,
ĝ(x) ≤_K 0  [ ⟺ ĝ(x) ∈ −K],
x ∈ X;   (ConI)

inf_{x∈X} [ f(x) + λ⊤g(x) + λ̂⊤ĝ(x) ] ≥ 0,
λ ≥ 0,   (ConII)
λ̂ ≥_{K∗} 0  [ ⟺ λ̂ ∈ K∗ ].

Moreover, because the system satisfies the cone-constrained version of Relaxed


Slater condition on X, there exists some x̄ ∈ rint X = int X such that g(x̄) ≤ 0
and gb(x̄) ∈ − int K. If needed, by shifting X (that is, passing from variables x to
variables x − x̄; this clearly does not affect the statement we need to prove) we
can assume that x̄ is just the origin, so that we have
0 ∈ int X, g(0) ≤ 0, gb(0)<K 0. (20.7)
Recall that we are in the situation when (ConI) is infeasible, that is, the opti-
mization problem
Opt(P ) := min {f (x) : x ∈ X, g(x) ≤ 0, gb(x)≤K 0} (P )
x

satisfies Opt(P ) ≥ 0, and our goal is to show that (ConII) is feasible.


10 . Define
Y := {x ∈ X : g(x) ≤ 0} = {x ∈ X : Ax − b ≤ 0} ,
along with
S := {t = [t0 ; t1 ] ∈ R × Rν : t0 < 0, t1 ≤K 0} ,
T := {t = [t0 ; t1 ] ∈ R × Rν : ∃x ∈ Y s.t. f (x) ≤ t0 , gb(x)≤K t1 } ,
so that both sets S and T are nonempty (as (P ) is feasible by (20.7)) and convex
(since X, Y , and f are convex, and gb(x) is K-convex on X). Moreover, S and
T do not intersect since Opt(P ) ≥ 0. Then, by Separation Theorem (Theorem
II.7.3) S and T can be separated by an appropriately chosen linear form α. Thus,
sup_{t∈S} α⊤t ≤ inf_{t∈T} α⊤t   (20.8)

and the linear form α⊤ t is non-constant on S ∪ T , implying that α ̸= 0. Denote


α = [α0 ; α1 ] with α0 ∈ R and α1 ∈ Rν . We claim that α0 ≥ 0 and α1 ∈ K∗ .
Suppose not, then either α0 < 0 or α1 ∈ / K∗ (i.e., there exists τ̄ ∈ K such that
α1⊤ τ̄ < 0) or both. But, in any of these cases we would have supt∈S α⊤ t = +∞
(look at what happens with α⊤ t on the ray {t = [t0 ; t1 ] : t0 ≤ −1, t1 = 0} ⊂ S
when α0 < 0, and on the ray {[t0 = −1, t1 = −sτ̄ ], s ≥ 0} ⊂ S when α1 ̸∈ K∗
and τ̄ ∈ K is such that α1⊤ τ̄ < 0) and this would contradict (20.8) combined with
nonemptiness of T . Then, taking into account that α0 ≥ 0 and α1 ∈ K∗ , we have
supt∈S α⊤ t = 0, and (20.8) reads
0 ≤ inf_{t∈T} α⊤t.   (20.9)

20 . We now claim that α0 > 0. Note that the point t̄ := [t̄0 ; t̄1 ] with the
components t̄0 := f (0) and t̄1 := gb(0) belongs to T (since 0 ∈ Y by (20.7)),
thus by (20.9) it holds that α0 f (0) + α1⊤ gb(0) ≥ 0. Assume for contradiction that
α0 = 0. Then, we deduce α1⊤ gb(0) ≥ 0, which due to −gb(0) ∈ int K (see (20.7))

and α1 ∈ K∗ is possible only when α1 = 0, see Fact II.8.23(iii). Thus, we conclude


that α1 = 0 on the top of α0 = 0, which is impossible, since α ̸= 0.
Given that we have proved α0 > 0, in the sequel, we set ᾱ1 := α1 /α0 , and
h(x) := f (x) + ᾱ1⊤ gb(x).
Note that h(·) is convex on X due to convexity of X and f , K-convexity of gb,
and the inclusion ᾱ1 ∈ K∗ , see Fact IV.20.10.
Observe that (20.9) remains valid when we replace α with ᾱ := α/α0 . Moreover,
when x ∈ Y , we have [f (x); gb(x)] ∈ T and thus from (20.9) and the definition of
h(x) we deduce that
x∈Y =⇒ h(x) ≥ 0. (20.10)
30 .i. Consider the convex sets
Q := {[x; τ ] ∈ Rn × R : x ∈ X, g(x) := Ax − b ≤ 0, τ < 0} ,
W := {[x; τ ] ∈ Rn × R : x ∈ X, τ ≥ h(x)} .
These sets clearly are nonempty and do not intersect (since the x-component x
of a point from Q ∩ W would satisfy the premise but violate the conclusion in
(20.10)). By Separation Theorem, there exists [e; β] ̸= 0 such that
sup_{[x;τ]∈Q} [ e⊤x + βτ ] ≤ inf_{[x;τ]∈W} [ e⊤x + βτ ].

As W is nonempty, we have inf [x;τ ]∈W [e⊤ x + βτ ] < +∞. Then, by taking into
account the definition of Q, we deduce β ≥ 0 (since otherwise the left hand side
in the preceding inequality would be +∞). With β ≥ 0 in mind and considering
the definitions of Q and W , the preceding inequality reads
sup_x { e⊤x : x ∈ X, g(x) ≤ 0 } ≤ inf_x { e⊤x + βh(x) : x ∈ X }.   (20.11)

Let us define a := sup_x { e⊤x : x ∈ X, g(x) ≤ 0 }. Note that in (20.11), sup_x and inf_x are taken over nonempty sets, implying that a ∈ R.


30 .ii. Recall that we have seen that β ≥ 0 in (20.11). We claim that in fact
β > 0. Assume for contradiction that β = 0. Then, using the definition of a and
β = 0, (20.11) implies
[e⊤ x − a ≤ 0, ∀(x ∈ X : g(x) ≤ 0)] & [e⊤ x ≥ a, ∀x ∈ X]. (20.12)
Taking into account that 0 ∈ X and g(0) ≤ 0 by (20.7), the first relation in
(20.12) says that a ≥ 0. Then, as a ≥ 0, from the second relation in (20.12) we
deduce that e⊤ x ≥ 0 for x ∈ X. As 0 ∈ int X, there exists a small enough ε > 0
such that (0 − εe) ∈ X. Thus, from e⊤ (−εe) ≥ 0, we deduce that e = 0. But,
e = 0 is impossible, since we are in the case when [e; β] ̸= 0 and β = 0.
30 .iii. Thus, in (20.11) we must have β > 0. Then, by replacing e with β −1 e,
we can
 assume that (20.11) holds true with β = 1. Once again recalling a =
supx e⊤ x : x ∈ X, g(x) ≤ 0 , the inequality (20.11) becomes
a ≤ h(x) + e⊤ x, ∀x ∈ X. (20.13)

By the definition of a, we have also


d⊤ x + δ := e⊤ x − a ≤ 0, ∀(x ∈ X : g(x) ≤ 0)
so that the data X, g(x) := Ax − b, augmented with the just defined affine function d⊤x + δ, satisfy the premise in Lemma IV.20.14 (recall that (20.7) holds true).
Applying Lemma IV.20.14, we conclude that there exists µ ≥ 0 such that
e⊤ x − a ≤ µ⊤ g(x), ∀x ∈ X.
Combining this relation and (20.13), we conclude that
h(x) + µ⊤ g(x) ≥ 0, ∀x ∈ X. (20.14)
Recalling that h(x) = f(x) + ᾱ1⊤ĝ(x) with ᾱ1 ∈ K∗ and setting λ = µ, λ̂ = ᾱ1, we get λ ≥ 0, λ̂ ∈ K∗, while by (20.14) it holds that
f(x) + λ⊤g(x) + λ̂⊤ĝ(x) ≥ 0  ∀x ∈ X,
meaning that (λ, λ̂) solve (ConII) (recall that we are in the case of c = 0).
Proof of Lemma IV.20.14. Recall g(x) = Ax + a. Let us define the following
cones
M1 := cl {[x; t] ∈ Rn × R : t > 0, x/t ∈ X} ,
M2 := {y = [x; t] ∈ Rn × R : Ax + ta ≤ 0} .
M2 is a polyhedral cone, and M1 is a closed cone (in fact it is the closed conic
transform ConeT(X) of X, see section 1.5) with a nonempty interior (since the
point [0; . . . ; 0; 1] ∈ Rn+1 belongs to int M1 due to 0 ∈ int X). Moreover, (int M1 )∩
M2 ̸= ∅ as the point [0; . . . ; 0; 1] ∈ int M1 belongs to M2 as g(0) ≤ 0.
Let f := [d; δ] ∈ Rn × R. We claim that the linear form f ⊤ [x; t] = d⊤ x + tδ is
nonpositive on M1 ∩ M2 . Indeed, consider any y := [z; t] ∈ M1 ∩ M2 , and define
ys = [zs ; ts ] = (1 − s)y + s[0; . . . ; 0; 1]. Then, for s = 0 we have ys = y and it is
in M1 ∩ M2 , and when s = 1 we have ys = [0; . . . ; 0; 1] ∈ M1 ∩ M2 . As M1 ∩ M2
is convex, we observe that ys ∈ M1 ∩ M2 for 0 ≤ s ≤ 1. Now, consider the case
when 0 < s ≤ 1. Then, we have ts > 0 (since s > 0 and also t ≥ 0 due to y ∈ M1 ),
and ys ∈ M1 implies that ws := zs /ts ∈ cl X (why?), while ys ∈ M2 along with
ts > 0 implies that g(ws ) ≤ 0. Since 0 ∈ int X, ws ∈ cl X implies that θws ∈ X
for all θ ∈ [0, 1), while g(0) ≤ 0 along with g(ws ) ≤ 0 implies that g(θws ) ≤ 0 for
θ ∈ [0, 1). Then, by invoking the premise of the lemma, we conclude that
d⊤ (θws ) + δ ≤ 0, ∀θ ∈ [0, 1).
Hence, whenever 0 < s ≤ 1, we have d⊤ ws + δ ≤ 0, or, equivalently, f ⊤ ys ≤ 0. As
s → +0, we have f ⊤ ys → f ⊤ y = f ⊤ [z; t], implying that f ⊤ [z; t] ≤ 0, as claimed.
Therefore, we have shown that f ⊤ [x; t] = d⊤ x + tδ ≤ 0 for every [x; t] ∈
(M1 ∩ M2 ). That is, (−f ) ∈ (M1 ∩ M2 )∗ , where, as usual, M∗ is the cone dual to
the cone M . To summarize, we have M1 , M2 are closed cones such that (int M1 )∩
M2 ̸= ∅ and (−f ) ∈ (M1 ∩ M2 )∗ . Applying to M1 , M2 the Dubovitski-Milutin
Lemma (Proposition II.8.27), we conclude that (M1 ∩ M2 )∗ = (M1 )∗ + (M2 )∗ .

Since (−f ) ∈ (M1 ∩ M2 )∗ , there exist ψ ∈ (M1 )∗ and ϕ ∈ (M2 )∗ such that
f = [d; δ] = −ϕ − ψ. The inclusion ϕ ∈ (M2 )∗ means that the homogeneous
linear inequality ϕ⊤ y ≥ 0 in variables y ∈ Rn+1 is a consequence of the system
of homogeneous linear inequalities given by [A, a]y ≤ 0. Hence, by Homogeneous
Farkas Lemma (Lemma I.4.1) −ϕ is a conic combination of the transposes of the
rows of the matrix [A, a], so that ϕ⊤ [x; 1] = −µ⊤ g(x) for some nonnegative µ
and all x ∈ Rn . Thus, for all x ∈ Rn , we deduce
d⊤ x + δ = [d; δ]⊤ [x; 1] = f ⊤ [x; 1] = −ϕ⊤ [x; 1] − ψ ⊤ [x; 1] = µ⊤ g(x) − ψ ⊤ [x; 1].
Finally, note that [x; 1] ∈ M1 whenever x ∈ X. Then, as ψ ∈ (M1 )∗ , we have
ψ⊤[x; 1] ≥ 0 for all x ∈ X. Thus, for all x ∈ X, we have 0 ≤ ψ⊤[x; 1] = µ⊤g(x) − (d⊤x + δ), and so µ satisfies precisely the requirements stated in the lemma.
To complete the story about Convex Theorem on Alternative, let us present
an example which demonstrates that the relaxed Slater condition is crucial for
the validity of Theorem IV.20.13.
Example IV.20.1 Consider the following special case of (ConI):
f(x) ≡ x < 0,  g(x) ≡ 0 ≤ 0,  ĝ(x) ≡ x² ≤ 0   (ConI)
(here the embedding space is R, X = R, c = 0, and K = R+ , that is, this is
just a system of scalar convex constraints). System (ConII) here is the system of
constraints
inf_{x∈R} [ x + λ · 0 + λ̂x² ] ≥ 0,  λ ≥ 0,  λ̂ ≥ 0   (ConII)
on variables λ, λ̂ ∈ R.
System (ConI) clearly is infeasible. System (ConII) is infeasible as well – it is immediately seen that whenever λ and λ̂ are nonnegative, the quantity x + λ·0 + λ̂x² is negative for all small in magnitude x < 0, that is, the first inequality in (ConII) is incompatible with the remaining two inequalities of the system.
Note that in this example the only missing component of the premise in Theo-
rem IV.20.13 is the relaxed Slater condition. Let us now examine what happens
when we replace the constraint gb(x) ≡ x2 ≤ 0 with gb(x) ≡ x2 − 2ϵx ≤ 0 where
ϵ > 0. In this case, we keep (ConI) infeasible, and gain the validity of the re-
laxed (relaxed, not plain!) Slater condition. Then, as all the conditions of Convex
Theorem on Alternative are now met, we deduce that (ConII) which now reads
inf_x [ x + λ · 0 + λ̂(x² − 2ϵx) ] ≥ 0,  λ ≥ 0,  λ̂ ≥ 0,
must be feasible. In fact, λ = 0, λ̂ = 1/(2ϵ) is a feasible solution to (ConII). ♢
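A small numerical illustration of the two situations (a sketch assuming numpy; the grid of test points and the value of ϵ are arbitrary choices): without the Slater point, the quantity inf_x [x + λ̂x²] dips below 0 for every λ̂ ≥ 0, while after the 2ϵx-modification the certificate λ = 0, λ̂ = 1/(2ϵ) makes the infimum exactly 0.

import numpy as np

xs = np.linspace(-1.0, 1.0, 200001)          # fine grid around the origin

# original constraint g_hat(x) = x^2: for any lam_hat >= 0 the minimum over the grid is < 0
for lam_hat in [0.0, 0.5, 1.0, 10.0, 100.0]:
    print(lam_hat, np.min(xs + lam_hat * xs**2))   # always strictly negative

# modified constraint g_hat(x) = x^2 - 2*eps*x with the certificate lam_hat = 1/(2*eps)
eps = 0.1
lam_hat = 1.0 / (2.0 * eps)
print(np.min(xs + lam_hat * (xs**2 - 2.0 * eps * xs)))   # ~0, since x + lam_hat*(x^2-2*eps*x) = x^2/(2*eps) >= 0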
21

Lagrange Function and Lagrange Duality

21.1 Lagrange function


Convex Theorem on Alternative brings to our attention the function
L(λ) := inf_{x∈X} [ f(x) + Σ_{j=1}^m λ_j g_j(x) ],   (21.1)
and the related aggregate function
L(x, λ) := f(x) + Σ_{j=1}^m λ_j g_j(x)   (21.2)

from which L(λ) originates. The aggregate function in (21.2) is called the La-
grange (or Lagrangian) function of the inequality constrained optimization pro-
gram
min {f (x) : gj (x) ≤ 0, j = 1, . . . , m, x ∈ X}
(IC)
[where f, g1 , . . . , gm are real-valued functions on X] .
The Lagrange function L(x, λ) of an optimization problem is a very important
entity as most of the optimality conditions are expressed in terms of this function.
Let us start with translating our developments from section 20.2 to the language
of the Lagrange function.

21.2 Convex Programming Duality Theorem


We start by developing the duality theorem for convex optimization problems.

Theorem IV.21.1 Consider an arbitrary inequality constrained optimiza-


tion program
min {f (x) : gj (x) ≤ 0, j = 1, . . . , m, x ∈ X}
(IC)
[where f, g1 , . . . , gm are real-valued functions on X]
along with its Lagrange function
L(x, λ) := f(x) + Σ_{i=1}^m λ_i g_i(x) : X × R^m_+ → R.

Then,
(i) [Weak Duality] For every λ ≥ 0, the infimum of the Lagrange function in


x ∈ X, that is,
L(λ) := inf L(x, λ)
x∈X

is a lower bound on the optimal value of (IC), so that the optimal value
of the optimization problem
sup L(λ) (IC∗ )
λ≥0

also is a lower bound for the optimal value in (IC);


(ii) [Convex Duality Theorem] If the problem (IC)
• is convex,
• is below bounded, and
• satisfies the Relaxed Slater condition,
then the optimal value of (IC∗ ) is attained and is equal to the optimal
value in (IC).

Proof. Let c∗ be the optimal value of (IC).


(i): This part is nothing but Proposition IV.20.1 (why?). It makes sense, how-
ever, to repeat here the corresponding one-line reasoning:
Consider any λ ∈ Rm + . Then, by definition of the Lagrange function, for any x
that is feasible for (IC) we have

Xm
L(x, λ) = f (x) + λj gj (x) ≤ f (x),
j=1

where the inequality follows from the facts that λ ∈ Rm + and the feasibility of x
for (IC) implies that gj (x) ≤ 0 for all j = 1, . . . , m. Then, we immediately arrive
at

L(λ) = inf_{x∈X} L(x, λ) ≤ inf_{x∈X: g(x)≤0} L(x, λ) ≤ inf_{x∈X: g(x)≤0} f(x) = c∗,

as desired.
(ii): This part is an immediate consequence of the Convex Theorem on Alter-
native. Note that the system

f (x) < c∗ , gj (x) ≤ 0, j = 1, . . . , m

has no solutions in X, and by Theorem IV.20.4, the system (II) associated with
c = c∗ has a solution, i.e., there exists λ∗ ≥ 0 such that L(λ∗ ) ≥ c∗ . But, we
know from part (i) that the strict inequality here is impossible and, moreover,
that L(λ) ≤ c∗ for every λ ≥ 0. Thus, L(λ∗ ) = c∗ and λ∗ is a maximizer of L over
λ ≥ 0.
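The following toy computation (a sketch assuming numpy; the specific problem min{x² : 1 − x ≤ 0} on X = R is an illustrative choice, not taken from the text) shows the dual function staying below the primal optimal value 1 and touching it at λ = 2, in line with both parts of the theorem: the problem is convex, below bounded, and strictly feasible (e.g. at x = 2).

import numpy as np

# primal: min { f(x) = x^2 : g(x) = 1 - x <= 0, x in X = R };  optimal value 1, attained at x = 1
xs = np.linspace(-10.0, 10.0, 400001)

def dual(lam):
    # L(lam) = inf_x [ x^2 + lam*(1 - x) ]; the exact value is lam - lam^2/4
    return np.min(xs**2 + lam * (1.0 - xs))

for lam in [0.0, 0.5, 1.0, 2.0, 3.0, 4.0]:
    print(lam, dual(lam), lam - lam**2 / 4.0)   # dual(lam) <= 1 for all lam >= 0, with equality at lam = 2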

21.3 Lagrange duality and saddle points


Theorem IV.21.1 establishes a certain connection between two optimization prob-
lems, i.e., the “primal” problem
minx {f (x) : gj (x) ≤ 0, j = 1, . . . , m, x ∈ X}
(IC)
[where f, g1 , . . . , gm are real-valued functions on X] ,
and its Lagrange dual problem

max_λ { L(λ) : λ ∈ R^m_+ }   (IC∗)
[where L(λ) = inf_{x∈X} L(x, λ) and L(x, λ) = f(x) + Σ_{i=1}^m λ_i g_i(x)].

Here, the variables λ of the dual problem are called the Lagrange multipliers of
the primal problem. Theorem IV.21.1 states that the optimal value of the dual
problem is at most that of the primal, and under some favorable circumstances
(i.e., when the primal problem is convex, below bounded, and satisfies the Relaxed
Slater condition) the optimal values in this pair of problems are equal to each
other.
In our formulation there may seem to be some asymmetry between the primal
and the dual problems. In fact, both of these problems are related to the Lagrange
function in a quite symmetric way. Indeed, consider the problem
min_{x∈X} L(x), where L(x) := sup_{λ≥0} L(x, λ).

By definition of the Lagrange function L(x, λ), the function L(x) is clearly +∞ at
every point x ∈ X which is not feasible for (IC) and is f (x) on the feasible set of
(IC), so that this problem is equivalent to (IC). We see that both the primal and
the dual problems originate from the Lagrange function: in the primal problem,
we minimize over X the result of maximization of L(x, λ) in λ ≥ 0, i.e., the primal
problem is
min_{x∈X} sup_{λ∈R^m_+} L(x, λ),

and in the dual program we maximize over λ ≥ 0 the result of minimization of


L(x, λ) in x ∈ X, i.e., the dual problem is
max_{λ∈R^m_+} inf_{x∈X} L(x, λ).

This is a particular (and the most important) example of a two-person zero-sum


game which we will explore later.
We have seen that under certain convexity and regularity assumptions the optimal values in (IC) and (IC∗) are equal to each other. There is also another way
to say when these optimal values are equal – this is always the case when the
Lagrange function possesses a saddle point, i.e., there exists a pair x∗ ∈ X, λ∗ ≥ 0
such that at the pair the function L(x, λ) attains its minimum as a function of
x ∈ X and attains its maximum as a function of λ ≥ 0:
L(x, λ∗ ) ≥ L(x∗ , λ∗ ) ≥ L(x∗ , λ), ∀x ∈ X, ∀λ ≥ 0.

This then leads to the following easily demonstrable fact (do it by yourself or
look at Theorem IV.27.2).

Proposition IV.21.2 The primal-dual pair of solutions (x∗, λ∗) ∈ X × R^m_+ is a saddle point of the Lagrange function L of (IC) if and only if x∗ is an optimal solution to (IC), λ∗ is an optimal solution to (IC∗), and the optimal values in the indicated problems are equal to each other.
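Continuing with the toy problem min{x² : 1 − x ≤ 0} used after Theorem IV.21.1 (again a sketch assuming numpy; the grids are arbitrary), one can check numerically that (x∗, λ∗) = (1, 2) is a saddle point of L(x, λ) = x² + λ(1 − x), in agreement with the proposition: x∗ solves the primal, λ∗ solves the dual, and the two optimal values coincide (both equal 1).

import numpy as np

L = lambda x, lam: x**2 + lam * (1.0 - x)       # Lagrange function of min{x^2 : 1 - x <= 0}
x_star, lam_star = 1.0, 2.0

xs = np.linspace(-10.0, 10.0, 200001)
lams = np.linspace(0.0, 50.0, 200001)

print(np.all(L(xs, lam_star) >= L(x_star, lam_star) - 1e-12))    # True: minimum over x attained at x*
print(np.all(L(x_star, lams) <= L(x_star, lam_star) + 1e-12))    # True: maximum over lam >= 0 attained at lam*
print(L(x_star, lam_star))                                        # 1.0 = common optimal value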
22

⋆ Convex Programming in cone-constrained form

The results from sections 21.1 and 21.2 related to convex optimization problems
in the standard MP format admit instructive extensions to the case of convex
problems in cone-constrained form. We next present these extensions.

22.1 Convex problem in cone-constrained form


Convex problem in cone-constrained form is an optimization problem of the form
Opt(P ) = min {f (x) : g(x) ≤ 0, gb(x) ≤K 0} , (P )
x∈X

where X ⊆ Rn is a nonempty convex set, f : X → R is a convex function,


g(x) := Ax − b is an affine function from Rn to Rk , K ⊂ Rν is a regular cone,
and gb(·) : X → Rν is K-convex.

Example IV.22.1 Recall that the positive semidefinite cone Sn+ and the notation
A ⪰ B, B ⪯ A, A ≻ B, B ≺ A for the associated non-strict and strict conic
inequalities were introduced in section D.2.2. As we know from Fact IV.20.7 and
Example II.8.9, the cone Sn+ is regular and self-dual. Recall from Lemma IV.20.9
that the function from Sn to Sn given by gb(x) = xx⊤ = xx = x2 is ⪰-convex. As
a result, the problem
Opt(P) = min_{x=(t,y)∈R×S^n} { t : Tr(y) ≤ t, y² ⪯ B }
       = min_{x=(t,y)∈R×S^n} { t : ⟨y, I_n⟩ − t ≤ 0, y² ⪯ B },   (22.1)

where B is a positive definite matrix and ⟨·, ·⟩ is the Frobenius inner product, is
a convex program in cone-constrained form. ♢

22.2 Cone-constrained Lagrange function


Consider (P ) with a convex objective f , a convex domain X, an affine map
g(·) : Rn → Rk , a regular cone K ⊂ Rν , and a K-convex function gb(·) : X → Rν .
Let Λ := R^k_+ × K∗, where K∗ is the cone dual to K, and consider λ := [λ; λ̂] ∈ Λ. Then, the cone-constrained Lagrange function of (P) is defined as
L(x; λ) := f(x) + λ⊤g(x) + λ̂⊤ĝ(x) : X × Λ → R.

By construction, for any λ ∈ Λ, we have that L(x; λ) as a function of x underestimates f(x) everywhere on the feasible domain of (P).
Example IV.22.2 (continued from Example IV.22.1) We see that the cone-constrained Lagrange function of (22.1) is given by
L(t, y; λ, λ̂) = t + λ[Tr(y) − t] + Tr(λ̂(y² − B)) : [R × S^n] × [R₊ × S^n_+] → R.


The cone-constrained Lagrange dual of (P) is the optimization problem
Opt(D) := max { L(λ) : λ ∈ Λ },  [where L(λ) := inf_{x∈X} L(x; λ)],   (D)
where Λ := R^k_+ × K∗. From L(x; λ) ≤ f(x) for all x ∈ X and λ ∈ Λ, we clearly extract that
Opt(D) ≤ Opt(P).   [Weak Duality]

Note that Weak Duality is independent of any assumptions of convexity on f , X


and on K-convexity of gb.
Example IV.22.3 (continued from Example IV.22.1) It is immediately seen (check it, or look at the solution to Exercise IV.22) that for the cone-constrained Lagrange function of (22.1) we have Dom(L) = {[λ; λ̂] : λ = 1, λ̂ ≻ 0} and
L(λ, λ̂) = −(1/4)Tr(λ̂⁻¹) − Tr(λ̂B) if [λ; λ̂] ∈ Dom(L), and L(λ, λ̂) = −∞ otherwise.   (22.2)
Then, the cone-constrained Lagrange dual of (22.1) is the problem
Opt(D) = max_{λ̂≻0, λ≥0} { −(1/4)Tr(λ̂⁻¹) − Tr(λ̂B) : λ = 1 }
       = max_{λ̂≻0} { −(1/4)Tr(λ̂⁻¹) − Tr(λ̂B) }.   (22.3)

22.3 Convex Programming Duality Theorem in cone-constrained form
For cone-constrained problems, we have the following strong duality theorem.

Theorem IV.22.1 [Convex Programming Duality Theorem in cone-con-


strained form] Consider convex cone-constrained problem (P ), that is, X is
convex, f is real-valued and convex on X, and gb(·) is well defined and K-
convex on X. Assume that the problem is below bounded and satisfies the
Relaxed Slater condition. Then, (D) is solvable and Opt(P ) = Opt(D).

Note that the only nontrivial part (ii) of Theorem IV.21.1 is nothing but the
special case of Theorem IV.22.1 where K is a nonnegative orthant.
Proof of Theorem IV.22.1. This proof is immediate. Under the premise of the
theorem, c := Opt(P ) is a real, and the system of constraints (ConI) associated
with this c has no solutions. Relaxed Slater Condition along with Convex Theorem
on Alternative in cone-constrained form (Theorem IV.20.13) imply the feasibility
of (ConII), i.e., the existence of λ∗ = [λ∗; λ̂∗] ∈ Λ such that
L(λ∗) = inf_{x∈X} { f(x) + λ∗⊤g(x) + λ̂∗⊤ĝ(x) } ≥ c = Opt(P).

Thus, we deduce that (D) has a feasible solution with objective value ≥ Opt(P).
By Weak Duality, this value is exactly Opt(P ), the solution in question is optimal
for (D), and Opt(P ) = Opt(D).
Example IV.22.4 (continued from Example IV.22.1) Problem (22.1) is clearly
below bounded and satisfies Slater condition (since B ≻ 0). By Theorem IV.22.1
the dual problem (22.3) is solvable and has the same optimal value as (22.1).
The solution for the (convex!) dual problem (22.3) can be found by applying the
Fermat rule. To this end, note also that for a positive definite n × n matrix y and
h ∈ Sn it holds that
d/dt|_{t=0} (y + th)⁻¹ = −y⁻¹hy⁻¹
(why?). Then, the Fermat rule says that the optimal solution to (22.3) is
λ∗ = 1,  λ̂∗ = (1/2)B^{−1/2},
and Opt(P) = Opt(D) = −Tr(B^{1/2}). ♢
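One can verify this answer numerically (a sketch assuming numpy; the random positive definite B is an arbitrary test instance): y = −B^{1/2} is primal feasible with value −Tr(B^{1/2}), and λ = 1, λ̂ = (1/2)B^{−1/2} gives the dual objective the same value, so by Weak Duality both are optimal.

import numpy as np

rng = np.random.default_rng(1)
n = 4
G = rng.standard_normal((n, n))
B = G @ G.T + n * np.eye(n)                     # a random positive definite matrix

w, U = np.linalg.eigh(B)
B_half = U @ np.diag(np.sqrt(w)) @ U.T          # B^{1/2}
B_mhalf = U @ np.diag(1.0 / np.sqrt(w)) @ U.T   # B^{-1/2}

# primal candidate: y = -B^{1/2}, t = Tr(y); feasible since y^2 = B <= B
primal_val = np.trace(-B_half)

# dual candidate: lam = 1, lam_hat = (1/2)*B^{-1/2}; objective -(1/4)*Tr(lam_hat^{-1}) - Tr(lam_hat*B)
lam_hat = 0.5 * B_mhalf
dual_val = -0.25 * np.trace(np.linalg.inv(lam_hat)) - np.trace(lam_hat @ B)

print(np.isclose(primal_val, -np.trace(B_half)))    # True
print(np.isclose(dual_val, primal_val))             # True: Opt(P) = Opt(D) = -Tr(B^{1/2})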

22.3.A “Subgradient interpretation” of Lagrange multipliers. Consider any ∆ := [δ; δ̂] ∈ R^k × R^ν, and define the maps g_∆(x) := Ax − b − δ = g(x) − δ and ĝ_∆(x) := ĝ(x) − δ̂, along with the parametric family of convex cone-constrained problems (P_∆) given by

Opt(P_∆) := min_{x∈X} { f(x) : g_∆(x) ≤ 0, ĝ_∆(x) ≤_K 0 }
          = min_{x∈X} { f(x) : g(x) − δ ≤ 0, ĝ(x) − δ̂ ≤_K 0 }.

Notice that (P ) is part of this family, as it is precisely (P0 ).


The cone-constrained Lagrange duals of these problems (P_∆) also form a parametric family. Specifically, by setting λ = [λ; λ̂] and Λ = R^k_+ × K∗, we arrive at the cone-constrained Lagrange function of (P_∆) as
L_∆(x, λ) := f(x) + λ⊤(g(x) − δ) + λ̂⊤(ĝ(x) − δ̂),

the resulting dual family of problems (D_∆) given by
Opt(D_∆) := max_{λ∈Λ} { L_∆(λ) },
where L_∆(λ) := inf_{x∈X} { L_∆(x, λ) } = inf_{x∈X} { f(x) + λ⊤(g(x) − δ) + λ̂⊤(ĝ(x) − δ̂) }.
Since L_∆(x, λ) = L_0(x, λ) − λ⊤δ − λ̂⊤δ̂, we deduce L_∆(λ) = L_0(λ) − λ⊤δ − λ̂⊤δ̂, where by definition L_0(λ) = inf_{x∈X} { f(x) + λ⊤g(x) + λ̂⊤ĝ(x) }. Thus,
Opt(D_∆) = max_{λ∈Λ} { L_∆(λ) } = max_{λ∈Λ} { L_0(λ) − λ⊤δ − λ̂⊤δ̂ }.

We have the following nice and instructive fact that provides further insight into the sensitivity of the optimal values of these parametric families of problems.

Fact IV.22.2 Consider the parametric family (P_∆) of convex cone-constrained problems along with the family (D_∆) of their cone-constrained Lagrange duals. Then,
(i) If L_0(µ) > −∞ for some µ = [µ; µ̂] ∈ Λ, then the primal optimal value Opt(P_∆) takes values in R ∪ {+∞} and is a convex function of ∆.
(ii) If (D_0) is solvable with optimal solution λ∗ = [λ∗; λ̂∗] and Opt(D_0) = Opt(P_0), then −λ∗ is a subgradient of Opt(P_∆) at the point ∆ = 0, i.e.,
Opt(P_∆) ≥ Opt(P_0) − λ∗⊤δ − λ̂∗⊤δ̂,  ∀(∆ = [δ; δ̂]).
The premises in (i) and (ii) definitely take place when (P_0) satisfies the Relaxed Slater condition and is below bounded.
Example IV.22.5 (continued from Example IV.22.1) Problem (22.1) can be
embedded into the parametric family of problems
Opt(P_R) := min_{x=(t,y)∈R×S^n} { t : Tr(y) ≤ t, y² ⪯ B + R }   (P[R])

with R varying through Sn . Taking into account all we have established so far
for this problem and also considering Fact IV.22.2, we arrive at
Tr((B + R)^{1/2}) = −Opt(P_R) ≤ Tr(B^{1/2}) + (1/2)Tr(B^{−1/2}R),  ∀(R ≻ −B).   (22.4)
Note that this is nothing but the Gradient inequality for the concave (see Fact
III.17.6) function Tr(X 1/2 ) : Sn+ → R, see Fact D.24. ♢
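A numerical check of (22.4) is straightforward (a sketch assuming numpy; B and R are random test matrices chosen so that B ≻ 0 and B + R ≻ 0, hence R ≻ −B):

import numpy as np

def sqrtm_psd(M):
    # symmetric PSD square root via the eigenvalue decomposition
    w, U = np.linalg.eigh(M)
    return U @ np.diag(np.sqrt(np.maximum(w, 0.0))) @ U.T

rng = np.random.default_rng(2)
n = 5
G = rng.standard_normal((n, n))
B = G @ G.T + np.eye(n)                  # B > 0
W = rng.standard_normal((n, n))
R = W @ W.T - 0.5 * B                    # symmetric, and B + R = 0.5*B + W W^T > 0 by construction

B_half = sqrtm_psd(B)
B_mhalf = np.linalg.inv(B_half)

lhs = np.trace(sqrtm_psd(B + R))
rhs = np.trace(B_half) + 0.5 * np.trace(B_mhalf @ R)
print(lhs <= rhs + 1e-9)                 # True: the gradient inequality (22.4)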

22.4 Conic Programming and Conic Duality Theorem


In this section, we will consider a special case of convex problem in cone-cons-
trained form, namely Conic programs in today’s optimization terminology. In
conic programs, f (x) is linear, X is the entire space, and gb(x) = P x − p is
affine. Note that affine mapping is K-convex whatever be the cone K. Thus,
conic problem automatically satisfies convexity restrictions from cone-constrained

Convex Programming Duality Theorem (Theorem IV.22.1) and is an optimization problem of the form
Opt(P) = min_{x∈R^n} { c⊤x : Ax − b ≤ 0, Px − p ≤_K 0 },   (22.5)
where K is a regular cone in certain R^ν.


The simplest example of a conic problem is an LP problem, where K is a
nonnegative orthant. Another instructive example is the conic reformulation of
convex quadratic quadratically constrained problem. This example relies on the
following useful observation.

Fact IV.22.3 Consider the convex quadratic constraint x⊤A⊤Ax ≤ b⊤x + c, where A ∈ R^{d×n}, b ∈ R^n, and c ∈ R. This constraint can be equivalently rewritten as a conic constraint involving the Lorentz cone, i.e.,
x⊤A⊤Ax ≤ b⊤x + c
⟺ [2Ax; b⊤x + c − 1; b⊤x + c + 1] ∈ L^{d+2}
⟺ 4x⊤A⊤Ax + (b⊤x + c − 1)² ≤ (b⊤x + c + 1)² and b⊤x + c + 1 ≥ 0.
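The equivalence is easy to test numerically (a sketch assuming numpy; A, b, c and the sample points are random test data):

import numpy as np

def in_lorentz(v, tol=1e-9):
    return v[-1] >= np.linalg.norm(v[:-1]) - tol

rng = np.random.default_rng(3)
d, n = 3, 4
A = rng.standard_normal((d, n))
b = rng.standard_normal(n)
c = 0.5

ok = True
for _ in range(1000):
    x = rng.standard_normal(n)
    quad_ok = x @ A.T @ A @ x <= b @ x + c + 1e-9
    v = np.concatenate([2.0 * A @ x, [b @ x + c - 1.0, b @ x + c + 1.0]])
    ok = ok and (quad_ok == in_lorentz(v))
print(ok)    # True: the quadratic constraint and the Lorentz-cone membership agree on every sample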

Fact IV.22.3 immediately leads to the following useful result.

Fact IV.22.4 Given A_j ∈ R^{d_j×n}, b_j ∈ R^n, and c_j ∈ R, for 0 ≤ j ≤ m, the convex quadratic quadratically constrained optimization problem
min_x { x⊤A_0⊤A_0x + b_0⊤x + c_0 : x⊤A_j⊤A_jx + b_j⊤x + c_j ≤ 0, 1 ≤ j ≤ m }
is equivalent to the conic problem on the product of m + 1 Lorentz cones, specifically, it admits the conic formulation given by
min_{x,t} { t : A[x; t] + b := [α_0(x, t); α_1(x); . . . ; α_m(x)] ∈ L^{d_0+2} × L^{d_1+2} × . . . × L^{d_m+2} },
where α_0(x, t) := [2A_0x; t − b_0⊤x − c_0 − 1; t − b_0⊤x − c_0 + 1] and α_j(x) := [2A_jx; −b_j⊤x − c_j − 1; −b_j⊤x − c_j + 1] for all 1 ≤ j ≤ m.

The cone-constrained Lagrange dual problem of problem (22.5) reads
max_{λ,λ̂} [ inf_x { c⊤x + λ⊤(Ax − b) + λ̂⊤(Px − p) } : λ ≥ 0, λ̂ ∈ K∗ ],
which, as it is immediately seen, is nothing but the problem
Opt(D) = max_{λ,λ̂} { −b⊤λ − p⊤λ̂ : A⊤λ + P⊤λ̂ + c = 0, λ ≥ 0, λ̂ ∈ K∗ }.   (D)

Recall that by Fact IV.20.6.i the cone dual to a regular cone also is a regular
cone. As a result, the problem (D), called the conic dual of the conic problem
(22.5), also is a conic problem. An immediate computation (utilizing the fact that
(K∗ )∗ = K for every regular cone K) shows that conic duality is symmetric.

Fact IV.22.5 Conic duality is symmetric, i.e., the conic dual to conic prob-
lem (D) is (equivalent to) conic problem (22.5).

In view of primal-dual symmetry, Convex Duality Theorem in cone-constrained


form (Theorem IV.22.1) in the Conic Programming case takes the following nice
form.

Theorem IV.22.6 [Conic Duality Theorem] Consider a primal-dual pair of conic problems
Opt(P) := min_{x∈R^n} { c⊤x : Ax − b ≤ 0, Px − p ≤_K 0 },   (P)
Opt(D) := max_{λ,λ̂} { −b⊤λ − p⊤λ̂ : A⊤λ + P⊤λ̂ + c = 0, λ ≥ 0, λ̂ ∈ K∗ }.   (D)

Then, we always have Opt(D) ≤ Opt(P ). Moreover, if one of the problems


in the pair is bounded and satisfies the Relaxed Slater condition, then the
other problem in the pair is solvable, and Opt(P ) = Opt(D). Finally, if both
of the problems satisfy Relaxed Slater condition, then both are solvable with
equal optimal values.

Proof. This proof is immediate. Weak duality has already been verified. To
verify the second claim, note that by primal-dual symmetry we can assume that
the bounded problem satisfying Relaxed Slater condition is (P ). But, then the
claim in question is given by Theorem IV.22.1. Finally, if both problems satisfy
Relaxed Slater condition (and in particular are feasible), by Weak Duality, both
are bounded, and therefore solvable with equal optimal values by the preceding
claim.
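When K is the nonnegative orthant, the primal-dual pair above is a pair of ordinary LPs, so the theorem can be sanity-checked with any LP solver. Below is a sketch assuming scipy is available; the tiny box-constrained instance is an arbitrary choice.

import numpy as np
from scipy.optimize import linprog

# primal (P) with K = R^2_+:  min c^T x  s.t.  Ax - b <= 0,  Px - p <= 0
c = np.array([-1.0, -2.0])
A, b = -np.eye(2), np.zeros(2)          # encodes x >= 0
P, p = np.eye(2), np.ones(2)            # encodes x <= 1
res_p = linprog(c, A_ub=np.vstack([A, P]), b_ub=np.concatenate([b, p]),
                bounds=[(None, None)] * 2)

# dual (D):  max -b^T lam - p^T lamh  s.t.  A^T lam + P^T lamh + c = 0, lam >= 0, lamh >= 0
obj = np.concatenate([b, p])            # minimizing b^T lam + p^T lamh = -(dual objective)
res_d = linprog(obj, A_eq=np.hstack([A.T, P.T]), b_eq=-c, bounds=[(0, None)] * 4)

print(res_p.fun, -res_d.fun)            # both -3.0: Opt(P) = Opt(D), as the theorem predicts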
Application example: S-Lemma. The S-Lemma is an extremely useful fact that has applications in optimization, engineering, and control.

Lemma IV.22.7 (S-Lemma) Let A, B ∈ Sn be such that


∃x̄ : x̄⊤ Ax̄ > 0. (22.6)
Then, the implication
x⊤ Ax ≥ 0 =⇒ x⊤ Bx ≥ 0 (22.7)
holds if and only if
∃λ ≥ 0: B ⪰ λA. (22.8)

Note that S-Lemma is the statement of the same flavor as Homogeneous Farkas
Lemma: the latter states that a homogeneous linear inequality b⊤ x ≥ 0 is a
consequence of a system of homogeneous linear inequalities a⊤i x ≥ 0, 1 ≤ i ≤ k,
if and only if the target inequality can be obtained from the inequalities of the
system by taking weighted sum with nonnegative weights; we could add to “taking
weighted sum” also “and adding identically true homogeneous linear inequality”

– by the simple reason that there exists only one inequality of the latter type,
0⊤ x ≥ 0. Similarly, S-Lemma says that (whenever (22.6) holds) homogeneous
quadratic inequality x⊤ Bx ≥ 0 is a consequence of (single-inequality) system of
homogeneous quadratic inequalities x⊤ Ax ≥ 0 if and only if the target inequality
can be obtained by taking weighted sum, with nonnegative weights, of inequalities
of the system (that is, by taking a nonnegative multiple of the inequality x⊤ Ax ≥
0) and adding an identically true homogeneous quadratic inequality (there are
plenty of them, these are inequalities x⊤ Cx ≥ 0 with C ⪰ 0).
Note that the possibility for the target inequality to be obtained by summing
up, with nonnegative weights, inequalities from certain system and adding an
identically true inequality is clearly a sufficient condition for the target inequal-
ity to be a consequence of the system. The actual power of Homogeneous Farkas
Lemma and S-Lemma is in the fact that this evident sufficient condition is also
necessary for the conclusion in question to be valid (in the case of linear inequal-
ities – whenever the system is finite, in the case of S-Lemma – when the system
is a single-inequality one and (22.6) takes place). The fact that in the quadratic
case to guarantee the necessity, the system should be a single-inequality one,
whatever unpleasant, is a must. In fact, a straightforward “quadratic version” of
Homogeneous Farkas Lemma fails, in general, to be true already when there are
just two quadratic inequalities in the system. This being said, even that poor, as
compared to its linear inequalities analogy, S-Lemma is extremely useful. . .
In preparation to S-Lemma, we will first prove the following weaker statement.

Lemma IV.22.8 Let A, B ∈ Sn . Suppose ∃x̄ satisfying x̄⊤ Ax̄ > 0. Then,
the implication
{X ⪰ 0, Tr(AX) ≥ 0} =⇒ Tr(BX) ≥ 0 (22.9)
holds if and only if B ⪰ λA for some λ ≥ 0.

Proof. The “if” part of this lemma is evident. To prove the “only if” part,
consider the conic problem
Opt(P ) = min {Tr(BX) : Tr(AX) ≥ 0, X ⪰ 0} (P )
X

along with its conic dual, which is given by


Opt(D) = max{0 · λ + Tr(0n×n Y ) : λ ≥ 0, Y ⪰ 0, Y + λA = B} (D)
λ,Y

(derive the conic dual of (P ) yourself by utilizing the fact that Sn+ is self-dual).
Note that from the premise of (22.6), we deduce that for large enough nonnegative
t, the solution X̄ := In + tx̄x̄⊤ will ensure that the Slater condition holds true.
Moreover, under the premise of (22.9) Opt(P ) is bounded from below by 0 as
well. Then, by Conic Duality Theorem, the dual (D) is solvable, implying that
B ⪰ λA for some λ ≥ 0, as required in this lemma and completing the proof.
Note that (22.7) is nothing but (22.9) with X restricted to be of rank ≤ 1.
Indeed, X ⪰ 0 is of rank ≤ 1 if and only if X = xx⊤ for some vector x, and

in this case Tr(P X) = x⊤ P x for every symmetric P of appropriate size. We are


now ready to complete the proof of S-Lemma.
Proof of Lemma IV.22.7 (S-Lemma). The “if” part is evident. To prove the
“only if” part, assume that implication (22.7) holds true, and let us verify that
B ⪰ λA for some λ ≥ 0. By Lemma IV.22.8, all we need to this end is to show
that the validity implication (22.7) implies the validity of implication (22.9). Thus,
assume that x⊤ Ax ≥ 0 does imply that x⊤ Bx ≥ 0, and let X ⪰ 0 be such that
Tr(AX) ≥ 0; all we need is to prove that in this case Tr(BX) ≥ 0 holds as well.
To this end, let X 1/2 AX 1/2 = U Diag{µ}U ⊤ be the eigenvalue decomposition of
X 1/2 AX 1/2 . By defining µ := Tr(Diag{µ}) and using the relation X 1/2 AX 1/2 =
U Diag{µ}U ⊤ , we arrive at
µ = Tr(Diag{µ}) = Tr(U ⊤ X 1/2 AX 1/2 U ) = Tr(X 1/2 AX 1/2 ) = Tr(AX) ≥ 0.
Now, consider an n-dimensional Rademacher random vector ζ, i.e., n-dimensio-
nal vector with entries which, independently of each other, take values ±1 with
probabilities 1/2. By setting ξ := X 1/2 U ζ, we get
ξ ⊤ Aξ = ζ ⊤ (U ⊤ X 1/2 AX 1/2 U )ζ = ζ ⊤ Diag{µ}ζ
= Tr(Diag{µ}ζζ ⊤ ) = Tr(Diag{µ}) = µ ≥ 0.
Thus, ξ ⊤ Aξ ≥ 0 for all realizations of ξ. Recalling that we are in the case when
(22.7) holds, we conclude that ξ ⊤ Bξ ≥ 0 for all realizations of ξ, or, which is the
same, ζ ⊤ (U ⊤ X 1/2 BX 1/2 U )ζ ≥ 0 for all realizations of ζ. Passing to expectations
and recalling that ζ is Rademacher random vector, we get
0 ≤ E_ζ [ ζ⊤(U⊤X^{1/2}BX^{1/2}U)ζ ] = Tr(U⊤X^{1/2}BX^{1/2}U) = Tr(X^{1/2}BX^{1/2}) = Tr(BX),
that is, Tr(BX) ≥ 0, so that (22.9) does hold true.
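The key probabilistic step in this proof is the identity E_ζ[ζ⊤Mζ] = Tr(M) for a Rademacher vector ζ (applied here to M = U⊤X^{1/2}BX^{1/2}U). For small n the expectation can be computed exactly by enumerating all sign vectors; a sketch assuming numpy:

import numpy as np
from itertools import product

rng = np.random.default_rng(4)
n = 4
M = rng.standard_normal((n, n))
M = 0.5 * (M + M.T)                     # any symmetric matrix

# exact expectation over the 2^n equiprobable Rademacher sign vectors
vals = [np.array(s) @ M @ np.array(s) for s in product([-1.0, 1.0], repeat=n)]
print(np.isclose(np.mean(vals), np.trace(M)))    # True: E[zeta^T M zeta] = Tr(M)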
Inhomogeneous S-Lemma. S-Lemma provides a necessary and sufficient con-
dition for a homogeneous quadratic inequality x⊤ Bx ≥ 0 to be a consequence of
strictly feasible homogeneous quadratic inequality x⊤ Ax ≥ 0. What about the
inhomogeneous case? When is an inhomogeneous quadratic inequality
x⊤Bx + 2b⊤x + β ≥ 0   (B)
a consequence of a strictly feasible inhomogeneous quadratic inequality
x⊤Ax + 2a⊤x + α ≥ 0?   (A)
The answer is easy to guess. The implication (A) =⇒ (B) is nothing but the
implication
∀(t ̸= 0, x) : x⊤ Ax + 2ta⊤ x + αt2 ≥ 0 =⇒ x⊤ Bx + 2tb⊤ x + βt2 ≥ 0 (∗)
(plug x/t instead of x into (A) and look what happens with (B)). We understand
when a bit stronger implication
∀(t, x) : x⊤ Ax + 2ta⊤ x + αt2 ≥ 0 =⇒ x⊤ Bx + 2tb⊤ x + βt2 ≥ 0 (∗∗)

holds true: the homogeneous inequality in the premise of (∗∗) is strictly feasible
along with (A), so that by the homogeneous S-Lemma (∗∗) holds true if and only
if
   
∃λ ≥ 0 :  [B, b; b⊤, β] ⪰ λ [A, a; a⊤, α].   (22.10)
Thus, if we knew that in the case of strictly feasible (A) the validity of implication
(∗) is the same as the validity of implication (∗∗), we could be sure that the first
of these implications takes place if and only if (22.10) takes place. The above ”if”
indeed is true.

Lemma IV.22.9 [Inhomogeneous S-Lemma] Let A, B ∈ Sn . Suppose there


exists x̄ such that x̄⊤ Ax̄ + 2a⊤ x̄ + α > 0. Then, the implication (A) =⇒ (B)
takes place if and only if (22.10) takes place.

Proof. Suppose the premise holds, i.e., x̄⊤ Ax̄ + 2a⊤ x̄ + α > 0 for some x̄. Based
on the discussion preceding this lemma all we need to verify is that the validity
of (∗) is exactly the same as the validity of (∗∗). Clearly, the validity of (∗∗)
implies the validity of (∗), so our task boils down to demonstrating that under
the premise of the lemma, the validity of (∗) implies the validity of (∗∗). Thus,
assume that (∗) is valid, and let us prove that (∗∗) is valid as well. All we need
to prove is that y ⊤ Ay ≥ 0 implies y ⊤ By ≥ 0. Thus, assume that y is such that
y ⊤ Ay ≥ 0, and let us prove that y ⊤ By ≥ 0 as well. Define xt := tx̄ + (1 − t)y,
and consider the univariate quadratic functions q_a(t) := x_t⊤Ax_t + 2ta⊤x_t + αt² and q_b(t) := x_t⊤Bx_t + 2tb⊤x_t + βt². We have so far seen that
(a) for all t ̸= 0, qa (t) ≥ 0 =⇒ qb (t) ≥ 0,
(b) qa (1) > 0 and qa (0) ≥ 0,
and we would like to show that qb (0) ≥ 0. Note that qa and qb are linear or
quadratic functions of t and thus they are continuous in t. Now, consider the
following cases (draw qa in these cases!):

• If qa (0) > 0, by continuity of qa we have qa (t) > 0 for all small enough nonzero
t, and so in such a case, by (a) we also get qb (t) ≥ 0 for all small enough in
magnitude nonzero t, implying, by continuity, that qb (0) ≥ 0.
• If qa (0) = 0, the reasoning goes as follows. When t varies from 0 to 1, the linear
or quadratic function qa (t) varies from 0 to something positive. It follows that
– either qa (t) ≥ 0, 0 ≤ t ≤ 1, implying by (a) that qb (t) ≥ 0 for t ∈ (0, 1], and
so qb (0) ≥ 0 holds by continuity of qb (t) at t = 0,
– or qa (t̄) < 0 holds for some t̄ ∈ (0, 1). Assuming that this is the case, the
linear or quadratic function qa (t) is zero at t = 0, negative somewhere on
(0, 1), and positive at t = 1. Therefore, qa is quadratic, and not linear,
function of t which has exactly one root in the interval (0, 1). Let this root in
(0, 1) be t1 . Recall that the other root of the quadratic function qa is t = 0,
thus we must have qa (t) = c(t − 0)(t − t1 ) for some c ∈ R. From t1 < 1 and
qa (1) > 0 it follows that c > 0; this, in turn, combines with t1 > 0 to imply

that qa (t) > 0 when t < 0. By (a) it follows that qb (t) ≥ 0 for t < 0, whence
by continuity qb (0) ≥ 0.

As an important consequence of Inhomogeneous S-Lemma we arrive at the


following observation that states that the semidefinite programming relaxation
of the quadratically constrained quadratic program with a single strictly feasible
quadratic constraint is exact.

Corollary IV.22.10 Let A, B ∈ S^n. Suppose there exists x̄ such that x̄⊤Ax̄ + 2a⊤x̄ + α > 0. Then,
β∗ := inf_x { x⊤Bx + 2b⊤x : x⊤Ax + 2a⊤x + α ≥ 0 }
    = max_{β,λ} { β : λ ≥ 0, [B − λA, b − λa; b⊤ − λa⊤, −λα − β] ⪰ 0 }.
(Here, as always, the optimal value of an infeasible maximization problem is −∞.)

Proof. Define the quadratic functions q(x) := x⊤ Ax + 2a⊤ x + α and qβ (x) :=


x⊤ Bx + 2b⊤ x − β. Note that β ∗ is the supremum of all β’s satisfying β ≤
inf x {x⊤ Bx + 2b⊤ x : x⊤ Ax + 2a⊤ x + α ≥ 0}, that is, those β for which the
implication
q(x) ≥ 0 =⇒ qβ (x) ≥ 0
holds true. By the premise of the corollary, there exists x̄ satisfying q(x̄) > 0,
so that by Inhomogeneous S-lemma (see Lemma IV.22.9) β ∗ is the supremum of
those β which can be augmented by appropriate λ to yield feasible solutions to
the maximization problem in the corollary’s formulation, or, which is the same,
β ∗ is the optimal value in the latter problem.
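As an illustration (a sketch assuming numpy; the instance, the λ-grid, and the sampling of the ball are arbitrary choices), take the unit-ball constraint in the corollary, i.e. A = −I, a = 0, α = 1: the left-hand side is the minimum of x⊤Bx + 2b⊤x over ‖x‖ ≤ 1, while, by the Schur Complement Lemma, the right-hand side reduces to the maximum over λ ≥ 0 with B + λI ≻ 0 of −λ − b⊤(B + λI)^{−1}b.

import numpy as np

rng = np.random.default_rng(5)
n = 2
G = rng.standard_normal((n, n))
B = 0.5 * (G + G.T)                     # a generic symmetric (possibly indefinite) matrix
b = rng.standard_normal(n)
# constraint data: A = -I, a = 0, alpha = 1, i.e. the unit ball ||x|| <= 1

# left-hand side: min of x^T B x + 2 b^T x over the unit ball, via a dense grid
ts = np.linspace(-1.0, 1.0, 801)
X, Y = np.meshgrid(ts, ts)
pts = np.column_stack([X.ravel(), Y.ravel()])
pts = pts[np.sum(pts**2, axis=1) <= 1.0]
lhs = np.min(np.einsum('ij,jk,ik->i', pts, B, pts) + 2.0 * pts @ b)

# right-hand side: beta <= -lam - b^T (B + lam*I)^{-1} b whenever B + lam*I > 0, lam >= 0
lam_lo = max(0.0, -np.min(np.linalg.eigvalsh(B))) + 1e-6
lams = np.linspace(lam_lo, lam_lo + 50.0, 20001)
rhs = max(-lam - b @ np.linalg.solve(B + lam * np.eye(n), b) for lam in lams)

print(lhs, rhs)                         # approximately equal, up to the grid resolution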
Example IV.22.6 (continued from Example IV.22.1) Invoking Schur Comple-
ment Lemma (Proposition D.33), we can rewrite (22.1) equivalently as the conic
problem
   
Opt(P) = min_{t∈R, y∈S^n} { t : Tr(y) ≤ t, [B, y; y, I_n] ⪰ 0 }.   (22.11)
The conic dual of (22.11) can be obtained as follows: we equip the scalar inequality t − Tr(y) ≥ 0 with a Lagrange multiplier λ ≥ 0, and the inequality [B, y; y, I_n] ⪰ 0 with a Lagrange multiplier [U, V; V⊤, W] ⪰ 0 (recall that the semidefinite cone is self-dual, so that legitimate Lagrange multipliers for ⪰-constraints are ⪰-nonnegative), and sum up the termwise inner products of the constraints of (22.11), thus arriving at the aggregated inequality
λ(t − Tr(y)) + 2Tr(yV⊤) ≥ −Tr(BU) − Tr(W),

which by construction is a consequence of the constraints in (22.11). We then


impose on the Lagrange multipliers, in addition to the above conic constraints,
the restriction that the left hand side in the aggregated inequality is identically
in t ∈ R, y ∈ Sn equal to the objective in (22.11), that is, the restrictions
λ = 1, V + V ⊤ = In .
Then, the dual problem is given by
Opt(D) = max_{λ∈R, U,W∈S^n, V∈R^{n×n}} { −(Tr(BU) + Tr(W)) : λ = 1, V + V⊤ = I_n, [U, V; V⊤, W] ⪰ 0 }.   (22.12)

Thus, the dual is precisely the problem of maximizing under the outlined restric-
tion the right hand side of the aggregated inequality.
Since problem (22.11) satisfies the Slater condition (as B ≻ 0) and is below
bounded (why?), the dual problem is solvable and Opt(P ) = Opt(D). Moreover,
the dual problem also satisfies the Relaxed Slater condition (why?), so that both
the primal and the dual problems are solvable. ♢
23

Optimality Conditions in Convex Programming

Using our results on convex optimization duality, we next derive optimality con-
ditions for convex programs.

23.1 Saddle point form of optimality conditions

Theorem IV.23.1 [Saddle Point formulation of Optimality Conditions in


Convex Programming]
Consider optimization problem
minx {f (x) : gj (x) ≤ 0, j = 1, . . . , m, x ∈ X}
(IC)
[where f, g1 , . . . , gm are real-valued functions on X] ,
along with its Lagrange dual problem

max_λ { L(λ) : λ ∈ R^m_+ }   (IC∗)
[where L(λ) = inf_{x∈X} L(x, λ) and L(x, λ) = f(x) + Σ_{i=1}^m λ_i g_i(x)].


Let x∗ ∈ X. Then,
(i) A sufficient condition for x∗ to be an optimal solution to (IC) is the
existence of the vector of Lagrange multipliers λ∗ ≥ 0 such that (x∗ , λ∗ ) is
a saddle point of the Lagrange function L(x, λ), i.e., a point where L(x, λ)
attains its minimum as a function of x ∈ X and attains its maximum as a
function of λ ≥ 0:
L(x, λ∗ ) ≥ L(x∗ , λ∗ ) ≥ L(x∗ , λ) ∀x ∈ X, ∀λ ≥ 0. (23.1)
(ii) Furthermore, if the problem (IC) is convex and satisfies the Relaxed
Slater condition, then the above condition is necessary for optimality of x∗ : if
x∗ is optimal for (IC), then there exists λ∗ ≥ 0 such that (x∗ , λ∗ ) is a saddle
point of the Lagrange function.

Proof. (i): Assume that for a given x∗ ∈ X there exists λ∗ ≥ 0 such that (23.1)
is satisfied, and let us prove that then x∗ is optimal for (IC). First, we claim
that x∗ is feasible. Assume for contradiction that gj (x∗ ) > 0 for some j. Then, of
course, sup_{λ≥0} L(x∗, λ) = +∞ (look what happens when all λ’s, except λ_j, are fixed, and λ_j → +∞). But, sup_{λ≥0} L(x∗, λ) = +∞ is forbidden by the second inequality in
(23.1). Thus, x∗ must be feasible to (IC).
Since x∗ is feasible, sup_{λ≥0} L(x∗, λ) = f(x∗), and we conclude from the second inequality in (23.1) that L(x∗, λ∗) = sup_{λ≥0} L(x∗, λ) = f(x∗). Finally, let us examine
the first inequality in (23.1). This relation now reads
f(x) + Σ_{j=1}^m λ_j∗ g_j(x) ≥ f(x∗),  ∀x ∈ X.

Recall that for any x feasible for (IC), we have g_j(x) ≤ 0 for all j. Together with λ∗ ≥ 0, we then deduce that for any x feasible to (IC), we have f(x) ≥ f(x) + Σ_{j=1}^m λ_j∗ g_j(x). But, then the above relation immediately implies that x∗ is optimal.
(ii): Assume that (IC) is a convex program, x∗ is its optimal solution and
the problem satisfies the Relaxed Slater condition. We will prove that then there
exists λ∗ ≥ 0 such that (x∗ , λ∗ ) is a saddle point of the Lagrange function, i.e., that
(23.1) is satisfied. As we know from the Convex Programming Duality Theorem
(Theorem IV.21.1.ii), the dual problem (IC∗ ) has a solution λ∗ ≥ 0 and the
optimal value of the dual problem is equal to the optimal value of the primal one,
i.e., to f (x∗ ):
f (x∗ ) = L(λ∗ ) ≡ inf L(x, λ∗ ). (23.2)
x∈X

Then, we immediately conclude that


λ∗j > 0 =⇒ gj (x∗ ) = 0
(this is called complementary slackness: positive Lagrange multipliers can be
associated only with active (satisfied at x∗ as equalities) constraints). Indeed,
from (23.2) it for sure follows that
f(x∗) ≤ L(x∗, λ∗) = f(x∗) + Σ_{j=1}^m λ_j∗ g_j(x∗);

the terms in the summation expression in the right hand side are nonpositive
(since x∗ is feasible for (IC) and λ∗ ≥ 0), and the sum itself is nonnegative
due to our inequality. Note that this is possible if and only if all the terms in the
summation expression are zero, and this is precisely the complementary slackness.
From the complementary slackness we immediately conclude that f (x∗ ) =
L(x∗ , λ∗ ), so that (23.2) results in
L(x∗ , λ∗ ) = f (x∗ ) = inf L(x, λ∗ ).
x∈X

On the other hand, since x∗ is feasible for (IC), from the definition of the La-
grangian function we deduce that L(x∗ , λ) ≤ f (x∗ ) whenever λ ≥ 0. Combining
our observations, we conclude that
L(x∗ , λ) ≤ L(x∗ , λ∗ ) ≤ L(x, λ∗ )

for all x ∈ X and all λ ≥ 0.


Note that Theorem IV.23.1(i) is valid for an arbitrary inequality constrained
optimization problem, not necessarily a convex one. However, in the noncon-
vex case the sufficient condition for optimality given by Theorem IV.23.1(i) is
extremely far from being necessary and is “almost never” satisfied. In contrast
to this, in the convex case the condition in question is not only sufficient, but
also “nearly necessary” – it for sure is necessary when (IC) is a convex program
satisfying the Relaxed Slater condition.
Theorem IV.23.1 presents Saddle Point form of optimality conditions for con-
vex problems in the standard Mathematical Programming form (that is, with
constraints represented by scalar convex inequalities). Similar results can be ob-
tained for convex cone-constrained problems as follows.

Theorem IV.23.2 [Saddle Point formulation of Optimality Conditions in


Convex Cone-constrained Programming] Consider a convex cone-constrained
problem
Opt(P ) = min {f (x) : g(x) := Ax − b ≤ 0, gb(x)≤K 0} , (P )
x∈X

(X is convex, f : X → R is convex, g(·) : Rn → Rk is affine, K is a regular


cone, and gb is K-convex on X) along with its Cone-constrained Lagrange
dual problem
 

Opt(D) = max_{λ:=[λ;λ̂]} { L(λ) := inf_{x∈X} [ f(x) + λ⊤g(x) + λ̂⊤ĝ(x) ] : λ ≥ 0, λ̂ ∈ K∗ }.   (D)
Suppose that (P ) is bounded and satisfies Relaxed Slater condition. Then, a
point x∗ ∈ X is an optimal solution to (P ) if and only if x∗ can be augmented
by properly selected λ∗ ∈ Λ := Rk+ × [K∗ ] to be a saddle point of the cone-
constrained Lagrange function
L(x; [λ; λ̂]) := f(x) + λ⊤g(x) + λ̂⊤ĝ(x)

on X × Λ.

Proof. The proof basically repeats the one of Theorem IV.23.1. In one direction: assume that x∗ ∈ X can be augmented by λ∗ = [λ∗; λ̂∗] ∈ Λ to form a saddle point of L(x; λ) on X × Λ, and let us prove that x∗ is an optimal solution to (P). Observe, first, that from
L(x∗; λ∗) = sup_{λ∈Λ} L(x∗; [λ; λ̂]) = f(x∗) + sup_{λ∈Λ} [ λ⊤g(x∗) + λ̂⊤ĝ(x∗) ]   (23.3)

it follows that the linear form λ̂⊤ĝ(x∗) of λ̂ is bounded from above on the cone K∗, implying that −ĝ(x∗) ∈ [K∗]∗ = K. Similarly, (23.3) says that the linear form λ⊤g(x∗) of λ is bounded from above on the cone R^k_+, implying that −g(x∗) belongs to the dual of this cone, that is, to R^k_+. Thus, x∗ is feasible for (P).
As x∗ is feasible for (P ), the right hand side of the second equality in (23.3)
is f (x∗ ), and thus (23.3) says that L(x∗ ; λ∗ ) = f (x∗ ). With this in mind, the
relation L(x; λ∗ ) ≥ L(x∗ ; λ∗ ) (which is satisfied for all x ∈ X, since (x∗ , λ∗ ) is
a saddle point of L) reads L(x; λ∗ ) ≥ f (x∗ ). This combines with the relation
f (x) ≥ L(x; λ∗ ) (which, due to λ∗ ∈ Λ, holds true for all x feasible for (P )) to
imply that Opt(P ) ≥ f (x∗ ). The bottom line is that x∗ is a feasible solution to
(P ) satisfying Opt(P ) ≥ f (x∗ ), thus, x∗ is an optimal solution to (P ), as claimed.
In the opposite direction: let x∗ be an optimal solution to (P ), and let us
verify that x∗ is the first component of a saddle point of L(x; λ) on X × Λ.
Indeed, (P) is a convex, essentially strictly feasible cone-constrained problem; being solvable, it is bounded below. Applying the Convex Programming Duality Theorem in cone-constrained form (Theorem IV.22.1), the dual problem (D) is solvable with optimal value Opt(D) = Opt(P). Denoting by λ∗ = [λ∗; λ̂∗] an optimal solution to (D), we have
f(x∗) = Opt(P) = Opt(D) = L(λ∗) = inf_{x∈X} L(x; λ∗),   (23.4)

whence f(x∗) ≤ L(x∗; λ∗) = f(x∗) + [λ∗]⊤g(x∗) + [λ̂∗]⊤ĝ(x∗), that is, [λ∗]⊤g(x∗) + [λ̂∗]⊤ĝ(x∗) ≥ 0. Both terms in the latter sum are nonpositive (as x∗ is feasible for (P) and λ∗ ∈ Λ), while their sum is nonnegative, so that [λ∗]⊤g(x∗) = 0 and [λ̂∗]⊤ĝ(x∗) = 0. We conclude that the inequality f(x∗) ≤ L(x∗; λ∗) is in fact an equality, so that (23.4) reads inf_{x∈X} L(x; λ∗) = L(x∗; λ∗). Next, L(x∗; λ) ≤ f(x∗) for λ ∈ Λ due to feasibility of x∗ for (P), which combines with the already proved equality L(x∗; λ∗) = f(x∗) to imply that sup_{λ∈Λ} L(x∗; λ) = L(x∗; λ∗). Thus, (x∗, λ∗) is the desired saddle point of L.

23.2 Karush-Kuhn-Tucker form of optimality conditions


Theorem IV.23.1 provides, basically, the strongest known optimality conditions for a Convex Programming problem. These conditions, however, are "implicit": they are expressed in terms of a saddle point of the Lagrange function, and it is unclear how to verify whether a given solution gives rise to such a saddle point. Fortunately, the proof of Theorem IV.23.1 yields more or less explicit optimality conditions.
Recall that the normal cone N_X(x) of a set X ⊆ R^n at a point x ∈ X, as defined by (14.5), is
N_X(x) = { h ∈ R^n : h⊤(x′ − x) ≤ 0, ∀x′ ∈ X }.
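As a quick illustration of this notion, here is a minimal NumPy sketch (the box X = [0, 1]^n, the point x, and the tolerances are illustrative choices only): for a box, the normal cone at x consists of the vectors whose components are nonnegative on coordinates at the upper bound, nonpositive on coordinates at the lower bound, and zero elsewhere, and the sketch checks the defining inequality h⊤(x′ − x) ≤ 0 on random points x′ of the box.

import numpy as np

rng = np.random.default_rng(0)
n = 4
x = np.array([1.0, 0.0, 0.3, 1.0])            # a point of the box X = [0, 1]^n

# Assemble a vector h from the normal cone of the box at x:
h = np.zeros(n)
h[x == 1.0] = rng.uniform(0.0, 1.0, size=int((x == 1.0).sum()))    # h_i >= 0 where the upper bound is active
h[x == 0.0] = -rng.uniform(0.0, 1.0, size=int((x == 0.0).sum()))   # h_i <= 0 where the lower bound is active
# h_i = 0 on coordinates with 0 < x_i < 1

# Check the defining inequality of N_X(x) on a sample of points x' from the box:
X_prime = rng.uniform(0.0, 1.0, size=(10000, n))
assert np.all(X_prime @ h - x @ h <= 1e-12)
print("h =", h, "satisfies h^T (x' - x) <= 0 for all sampled x' in the box")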


Now let us define the notion of a Karush-Kuhn-Tucker point of an inequality constrained optimization problem.

Definition IV.23.3 [Karush-Kuhn-Tucker point of inequality constrained



Mathematical Programming problem] Consider the optimization problem
min_x { f(x) : g_j(x) ≤ 0, j = 1, . . . , m, x ∈ X }   (IC)
[where f, g_1, . . . , g_m are real-valued functions on X].
A point x∗ ∈ R^n is called a Karush-Kuhn-Tucker (KKT) point of (IC) if x∗ is feasible, f, g_1, . . . , g_m are differentiable at x∗, and there exist nonnegative Lagrange multipliers λ∗_j, j = 1, . . . , m, such that
λ∗_j g_j(x∗) = 0, j = 1, . . . , m,   [complementary slackness]   (23.5)
and
∇f(x∗) + ∑_{j=1}^{m} λ∗_j ∇g_j(x∗) ∈ −N_X(x∗),   [KKT equation]   (23.6)
where N_X(x∗) is the normal cone of X at x∗.

We are now ready to state “more explicit” optimality conditions for convex
programs based on KKT points.

Theorem IV.23.4 [Karush-Kuhn-Tucker Optimality Conditions in Convex


Programming] Let (IC) be a convex program (i.e., X is nonempty and convex,
and f, g1 , . . . , gm are convex on X). Let x∗ ∈ X, and let the functions f ,
g1 ,. . . ,gm be differentiable at x∗ . Then,
(i) [Sufficiency] If x∗ is a KKT point of (IC), then x∗ is an optimal solution
to the problem.
(ii) [Necessity and sufficiency] If, in addition to the premise, the Relaxed
Slater condition holds, x∗ is an optimal solution to (IC) if and only if x∗ is a
KKT point of the problem.

Proof. (i): Suppose x∗ is a KKT point of problem (IC), and let us prove that
x∗ is an optimal solution to (IC). By Theorem IV.23.1, it suffices to demonstrate
that augmenting x∗ by properly selected λ ≥ 0, we get a saddle point (x∗ , λ) of

the Lagrange function on X × R^m_+. Let λ∗ be the Lagrange multipliers associated with x∗ according to the definition of a KKT point. We claim that (x∗, λ∗) is a saddle point of the Lagrange function. Indeed, complementary slackness says that L(x∗, λ∗) = f(x∗), while due to feasibility of x∗ we have sup_{λ≥0} ∑_{j=1}^{m} λ_j g_j(x∗) = 0. Taken together, these observations say that L(x∗, λ∗) = sup_{λ≥0} L(x∗, λ). Moreover, the function ϕ(x) := L(x, λ∗) : X → R is convex and differentiable at x∗,


and by the KKT equation we have ∇ϕ(x∗ ) ∈ −NX (x∗ ). Invoking Proposition
III.14.3, we conclude that x∗ minimizes ϕ on X, that is, L(x, λ∗ ) ≥ L(x∗ , λ∗ ) for
all x ∈ X. Thus, (x∗ , λ∗ ) is a saddle point of the Lagrange function.
(ii): In view of (i), all we need to prove (ii) is to demonstrate that if x∗ is
an optimal solution to (IC), (IC) is a convex program that satisfies the Relaxed
Slater condition, and the objective and constraints of (IC) are differentiable at
x∗ , then x∗ is a KKT point. Indeed, let x∗ and (IC) satisfy the above “if.” By
Theorem IV.23.1(ii), x∗ can be augmented by some λ∗ ≥ 0 to yield a saddle point
(x∗, λ∗) of L(x, λ) on X × R^m_+. Then, the saddle point inequalities (23.1) give us
f(x∗) + ∑_{j=1}^{m} λ∗_j g_j(x∗) = L(x∗, λ∗) ≥ sup_{λ≥0} L(x∗, λ) = f(x∗) + sup_{λ≥0} ∑_{j=1}^{m} λ_j g_j(x∗).   (23.7)
Moreover, as x∗ is feasible for (IC), we have g_j(x∗) ≤ 0 for all j, whence
sup_{λ≥0} ∑_j λ_j g_j(x∗) = 0.
Therefore (23.7) implies that ∑_j λ∗_j g_j(x∗) ≥ 0. This inequality, in view of λ∗_j ≥ 0 and g_j(x∗) ≤ 0 for all j, implies that λ∗_j g_j(x∗) = 0 for all j, i.e., the complementary slackness condition (23.5) holds. The relation L(x, λ∗) ≥ L(x∗, λ∗) for all x ∈ X implies that the function ϕ(x) := L(x, λ∗) attains its minimum on X at x = x∗.
Note also that ϕ(x) is convex on X and differentiable at x∗ , thus, by Proposition
III.14.3, we deduce that the KKT equation (23.6) also holds.
Note that the optimality conditions stated in Theorem III.14.2 and Proposition
III.14.3 are particular cases of Theorem IV.23.4 corresponding to m = 0.
Remark IV.23.5 A standard special case of Theorem IV.23.4 that is worth
discussing explicitly is when x∗ is in the (relative) interior of X.
When x∗ ∈ int X, we have NX (x∗ ) = {0}, so that (23.6) reads
∇f(x∗) + ∑_{j=1}^{m} λ∗_j ∇g_j(x∗) = 0.

When x∗ ∈ rint X, N_X(x∗) is the orthogonal complement to the linear subspace L to which Aff(X) is parallel, so that (23.6) reads:
∇f(x∗) + ∑_{j=1}^{m} λ∗_j ∇g_j(x∗) is orthogonal to L := Lin(X − x∗).
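To see the KKT conditions at work, here is a minimal NumPy sketch for the toy convex program min {(x_1 − 1)² + (x_2 − 2)² : x_1 + x_2 − 1 ≤ 0} with X = R², so that N_X(x∗) = {0} and (23.6) is the plain stationarity equation just discussed; the problem data, the candidate point, and the multiplier are chosen purely for illustration.

import numpy as np

# Toy convex program: min (x1 - 1)**2 + (x2 - 2)**2  s.t.  g(x) = x1 + x2 - 1 <= 0, X = R^2.
x_star = np.array([0.0, 1.0])        # candidate: projection of (1, 2) onto the half-plane
lam_star = 2.0                       # candidate Lagrange multiplier

grad_f = 2.0 * (x_star - np.array([1.0, 2.0]))   # gradient of the objective at x*
g_val = x_star[0] + x_star[1] - 1.0              # constraint value at x*
grad_g = np.array([1.0, 1.0])                    # gradient of the constraint

assert g_val <= 1e-12                                       # x* is feasible
assert lam_star >= 0 and abs(lam_star * g_val) <= 1e-12     # complementary slackness (23.5)
assert np.allclose(grad_f + lam_star * grad_g, 0.0)         # KKT equation (23.6) with N_X(x*) = {0}
print("x* =", x_star, "with lambda* =", lam_star, "is a KKT point, hence optimal by Theorem IV.23.4(i)")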

23.3 Cone-constrained KKT optimality conditions


The cone-constrained version of the notion of a KKT point is defined as follows:

Definition IV.23.6 [Karush-Kuhn-Tucker point of a cone-constrained prob-


lem] Consider the cone-constrained optimization problem
min_x { f(x) : g(x) := Ax − b ≤ 0, ĝ(x) ≤_K 0, x ∈ X },   (ConeC)
where X ⊆ R^n, f : X → R, g : R^n → R^k, ĝ : X → R^ν, and K ⊂ R^ν is a regular cone.
A point x∗ ∈ R^n is called a Karush-Kuhn-Tucker (KKT) point of (ConeC) if x∗ is feasible, f and ĝ are differentiable at x∗, and there exist Lagrange multipliers λ = [λ_1; . . . ; λ_k] ≥ 0 and λ̂ ∈ K∗ such that
λ_j [g(x∗)]_j = 0, ∀j ≤ k, & λ̂⊤ĝ(x∗) = 0,   [complementary slackness]   (23.8)
and
∇_x [ f(x) + λ⊤g(x) + λ̂⊤ĝ(x) ] |_{x=x∗} ∈ −N_X(x∗),   [KKT equation]   (23.9)
where N_X(x∗) is the normal cone of X at x∗, see (14.5).
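For a concrete illustrative instance of this definition with no scalar linear constraints, take X = R², f(x) = x_1 + x_2, ĝ(x) = [−x_1; −x_2; −1], and K the 3-dimensional Lorentz (second-order) cone, so that ĝ(x) ≤_K 0 is just ∥(x_1, x_2)∥_2 ≤ 1; the minimal NumPy sketch below checks (23.8) and (23.9) for the candidate x∗ = −(1, 1)/√2 with λ̂ = (1, 1, √2). The instance and its candidate solution are ours, chosen so that everything is verifiable by hand (recall that the Lorentz cone is self-dual, so K∗ = K).

import numpy as np

# min x1 + x2  s.t.  ghat(x) = (-x1, -x2, -1) <=_K 0, K = Lorentz cone in R^3, X = R^2.
x_star = np.array([-1.0, -1.0]) / np.sqrt(2.0)
lam_hat = np.array([1.0, 1.0, np.sqrt(2.0)])          # candidate multiplier, should lie in K* = K

ghat = np.array([-x_star[0], -x_star[1], -1.0])       # ghat(x*)
G = np.array([[-1.0, 0.0], [0.0, -1.0], [0.0, 0.0]])  # Jacobian of the affine mapping ghat
grad_f = np.array([1.0, 1.0])

def in_lorentz(y):                                    # membership in the Lorentz cone (last coordinate on top)
    return y[-1] >= np.linalg.norm(y[:-1]) - 1e-12

assert in_lorentz(-ghat)                              # feasibility: -ghat(x*) lies in K
assert in_lorentz(lam_hat)                            # lam_hat lies in K* = K
assert abs(lam_hat @ ghat) <= 1e-12                   # complementary slackness (23.8)
assert np.allclose(grad_f + G.T @ lam_hat, 0.0)       # KKT equation (23.9), N_X(x*) = {0} since X = R^2
print("x* =", x_star, "passes the cone-constrained KKT test with lam_hat =", lam_hat)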

Based on this definition, the cone-constrained version of Theorem IV.23.4 is as follows.

Theorem IV.23.7 [Karush-Kuhn-Tucker Optimality Conditions in Cone-


constrained Convex Programming] Consider a convex cone-constrained prob-
lem
Opt(P) = min_{x∈X} { f(x) : g(x) := Ax − b ≤ 0, ĝ(x) ≤_K 0 }   (P)
(X is convex, f : X → R is convex, A ∈ R^{k×n}, K is a regular cone, and ĝ is


K-convex on X). Suppose x∗ ∈ X is a feasible solution to the problem, and let f and ĝ be differentiable at x∗.
(i) If x∗ is a KKT point (as defined by Definition IV.23.6) of (P), then x∗ is an optimal solution to (P).
(ii) If x∗ is an optimal solution to (P) and, in addition to the above premise, (P) satisfies the cone-constrained Relaxed Slater condition, then x∗ is a KKT point, as defined by Definition IV.23.6, of (P).

The proof of this theorem repeats verbatim the proof of Theorem IV.23.4, with Theorem IV.23.2 in the role of Theorem IV.23.1.
Application: Optimal value in a parametric convex cone-constrained problem. What follows is a far-reaching extension of the subgradient interpretation of Lagrange multipliers presented in section 22.3.A. Consider a parametric family of convex cone-constrained problems defined by a parameter p ∈ P:
Opt(p) := min_{x∈X} { f(x, p) : g(x, p) ≤_M 0 },   (P[p])

where
(a) X ⊆ Rn and P ⊆ Rµ are nonempty convex sets,
(b) M is a regular cone in some Rν .
(c) f : X × P → R is convex, and g : X × P → Rν is M-convex. 1

1 In what follows, splitting the constraint in a cone-constrained problem into a system of scalar linear inequalities and a conic inequality does not play any role, and in order to save notation, (P[p]) uses the "single-constraint" format of cone-constrained problems. Of course, the two-constraint format g(x) := Ax − b ≤ 0, ĝ(x) ≤_K 0 reduces to the single-constraint one by setting g(x) = [g(x); ĝ(x)] and M = R^k_+ × K.
Next, we make the following assumption:
(d) Suppose x̄ ∈ X and p̄ ∈ P are such that
– x̄ is a KKT point, as defined by Definition IV.23.6, of the convex cone-constrained problem (P[p̄]) (and therefore, by Theorem IV.23.7, is an optimal solution to the problem), and
– f(x, p) and g(x, p) are differentiable at the point [x̄; p̄], the derivatives being
Df([x̄; p̄])[[dx; dp]] = F_x⊤ dx + F_p⊤ dp,   Dg([x̄; p̄])[[dx; dp]] = G_x dx + G_p dp,
and let µ ∈ M∗ be the Lagrange multiplier associated with x̄ and (P[p̄]), so that
µ⊤g(x̄, p̄) = 0 and [x − x̄]⊤[F_x + G_x⊤µ] ≥ 0, ∀x ∈ X   (23.10)
(cf. Definition IV.23.6).

Proposition IV.23.8 Under Assumptions (a–d), Opt(·) is a convex function on P taking values in R ∪ {+∞} and finite at p̄, and the vector F_p + G_p⊤µ is a subgradient of Opt(·) at p̄:
Opt(p) ≥ Opt(p̄) + [p − p̄]⊤[F_p + G_p⊤µ], ∀p ∈ P.   (23.11)

Proof. Observe, first, that as f is convex, by the Gradient inequality we have
f(x, p) ≥ f(x̄, p̄) + F_x⊤[x − x̄] + F_p⊤[p − p̄], ∀(x ∈ X, p ∈ P).   (23.12)
Also, from µ ∈ M∗ and M-convexity of g, by Fact IV.20.10 it follows that the function µ⊤g(x, p) is convex on X × P, so that by the Gradient inequality we have
µ⊤g(x, p) ≥ µ⊤g(x̄, p̄) + µ⊤G_x[x − x̄] + µ⊤G_p[p − p̄], ∀(x ∈ X, p ∈ P),   (23.13)
where µ⊤g(x̄, p̄) = 0 by (23.10).
Now let p ∈ P and x be feasible for (P[p]). Then,
f(x, p) ≥ f(x, p) + µ⊤g(x, p)
  ≥ f(x̄, p̄) + F_x⊤[x − x̄] + F_p⊤[p − p̄] + µ⊤G_x[x − x̄] + µ⊤G_p[p − p̄]
  = Opt(p̄) + (F_x + G_x⊤µ)⊤[x − x̄] + (F_p + G_p⊤µ)⊤[p − p̄]
  ≥ Opt(p̄) + (F_p + G_p⊤µ)⊤[p − p̄],
where the first inequality follows from µ ∈ M∗ and g(x, p) ≤_M 0, the second is due to (23.12) and (23.13), the equality holds by recalling that x̄ is optimal for (P[p̄]), and the last inequality is due to (23.10). As the resulting inequality holds true for all x feasible for (P[p]), it justifies (23.11).
To complete the proof, we need to verify the convexity of Opt(·). By the relation in (23.11), Opt(p) for p ∈ P is either a real number or +∞, as is required for a convex function. For any p′, p′′ ∈ P ∩ Dom(Opt(·)) and λ ∈ [0, 1], we need to check
Opt(λp′ + (1 − λ)p′′) ≤ λOpt(p′) + (1 − λ)Opt(p′′).   (23.14)
This is immediate (cf. the proof of Fact IV.22.2): given ϵ > 0, we can find x′, x′′ ∈ X such that
g(x′, p′) ≤_M 0, g(x′′, p′′) ≤_M 0, f(x′, p′) ≤ Opt(p′) + ϵ, f(x′′, p′′) ≤ Opt(p′′) + ϵ.
Setting p := λp′ + (1 − λ)p′′, x := λx′ + (1 − λ)x′′ and invoking convexity of f and M-convexity of g, we get
g(x, p) ≤_M 0, f(x, p) ≤ [λOpt(p′) + (1 − λ)Opt(p′′)] + ϵ,
so that Opt(p) ≤ λOpt(p′) + (1 − λ)Opt(p′′) + ϵ. Finally, since ϵ > 0 is arbitrary, we arrive at (23.14).
Note that the result of section 22.3.A is nothing but what Proposition IV.23.8 states in the case of f independent of p, M = R^k_+ × K, p = [δ; δ̂] ∈ R^k × R^ν, and g(x, p) = [g(x) − δ; ĝ(x) − δ̂].
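Here is a minimal one-dimensional illustration of Proposition IV.23.8 (the family below is ours, chosen only so that all quantities are available in closed form): for Opt(p) = min_x {x² : p − x ≤ 0} with X = R, M = R_+, the point x̄ = p̄ > 0 is a KKT point of (P[p̄]) with multiplier µ = 2p̄, and since F_p = 0 and G_p = 1, the proposition predicts that F_p + G_p⊤µ = 2p̄ is a subgradient of Opt(p) = max(p, 0)² at p̄; the NumPy sketch checks the subgradient inequality (23.11) on a grid.

import numpy as np

def Opt(p):
    # Optimal value of (P[p]): min_x { x**2 : p - x <= 0 } = max(p, 0)**2
    return max(p, 0.0) ** 2

p_bar = 1.5
x_bar = p_bar                         # KKT point of (P[p_bar]): stationarity 2*x_bar - mu = 0
mu = 2.0 * p_bar                      # Lagrange multiplier: mu >= 0 and mu * (p_bar - x_bar) = 0
subgrad = 0.0 + 1.0 * mu              # F_p + G_p * mu  with  F_p = 0, G_p = 1

# Verify the subgradient inequality (23.11) on a grid of parameter values:
for p in np.linspace(-3.0, 3.0, 601):
    assert Opt(p) >= Opt(p_bar) + subgrad * (p - p_bar) - 1e-10
print("(23.11) holds: Opt(p) >= Opt(p_bar) + 2*p_bar*(p - p_bar) for all sampled p")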

23.4 Optimality conditions in Conic Programming


We continue by discussing the case of conic programming.

Theorem IV.23.9 [Optimality Conditions in Conic Programming] Consider


a primal-dual pair of conic problems (cf. section 22.4)
Opt(P) = min_{x∈R^n} { c⊤x : Ax − b ≤ 0, Px − p ≤_K 0 }   (P)
Opt(D) = max_{λ,λ̂} { −b⊤λ − p⊤λ̂ : A⊤λ + P⊤λ̂ + c = 0, λ ≥ 0, λ̂ ∈ K∗ }.   (D)

Suppose that both problems satisfy the Relaxed Slater condition. Then, a pair of feasible solutions x∗ to (P) and λ∗ := [λ∗; λ̂∗] to (D) is optimal for the respective problems if and only if
DualityGap(x∗; λ∗) := c⊤x∗ − [−b⊤λ∗ − p⊤λ̂∗] = 0,   [Zero Duality Gap]
which holds if and only if
[λ∗]⊤[b − Ax∗] + [λ̂∗]⊤[p − Px∗] = 0.   [Complementary Slackness]

Remark IV.23.10 Under the premise of Theorem IV.23.9, from the feasibility of x∗ and λ∗ for their respective problems it follows that b − Ax∗ ≥ 0, p − Px∗ ∈ K, λ∗ ≥ 0, and λ̂∗ ∈ K∗. Therefore, the Complementary Slackness condition (which states that a sum of two inner products, each of a vector from a regular cone with a vector from the dual of this cone, and as such automatically nonnegative, equals zero) is a really strong restriction. This comment applies equally to the relation [λ̂∗]⊤ĝ(x∗) = 0 in (23.8). ■
in (23.8). ■
Proof of Theorem IV.23.9. By the Conic Duality Theorem (Theorem IV.22.6) we are in the case when Opt(P) = Opt(D) ∈ R, and therefore for any x and λ := [λ; λ̂] we have
DualityGap(x; λ) = [c⊤x − Opt(P)] + [Opt(D) − (−b⊤λ − p⊤λ̂)].
Now, when x is feasible for (P), the primal optimality gap c⊤x − Opt(P) is nonnegative and is zero if and only if x is optimal for (P). Similarly, when λ = [λ; λ̂] is feasible for (D), the dual optimality gap Opt(D) − (−b⊤λ − p⊤λ̂) is nonnegative and is zero if and only if λ is optimal for (D). We conclude that whenever x is feasible for (P) and λ is feasible for (D), the duality gap DualityGap(x; λ) (which, as we have seen, is the sum of the corresponding optimality gaps) is nonnegative and is zero if and only if both these optimality gaps are zero, that is, if and only if x is optimal for (P) and λ is optimal for (D).
It remains to note that the Complementary Slackness condition is equivalent to the Zero Duality Gap one. To this end, note that since x∗ and λ∗ are feasible for their respective problems, we have
DualityGap(x∗; λ∗) = c⊤x∗ + b⊤λ∗ + p⊤λ̂∗
  = −[A⊤λ∗ + P⊤λ̂∗]⊤x∗ + b⊤λ∗ + p⊤λ̂∗
  = [λ∗]⊤[b − Ax∗] + [λ̂∗]⊤[p − Px∗].
Therefore, Complementary Slackness, for solutions x∗ and λ∗ that are feasible for the respective problems, is exactly the same as Zero Duality Gap.
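A minimal numerical sanity check of Theorem IV.23.9 (the tiny instance is ours, chosen so that everything can be verified by hand): with n = 1, c = 1, A = [−1], b = [0], P = [−1], p = [−1] and K = R_+, problem (P) reads min {x : x ≥ 0, x ≥ 1} with x∗ = 1, the dual constraint is −λ − λ̂ + 1 = 0, and the NumPy sketch below verifies both Zero Duality Gap and Complementary Slackness for the pair x∗ = 1, (λ∗, λ̂∗) = (0, 1).

import numpy as np

# (P): min c^T x  s.t.  A x - b <= 0,  P x - p <=_K 0  with K = K* = R_+.
c, A, b = np.array([1.0]), np.array([[-1.0]]), np.array([0.0])
P, p = np.array([[-1.0]]), np.array([-1.0])

x_star = np.array([1.0])                                   # optimal primal solution
lam_star, lamhat_star = np.array([0.0]), np.array([1.0])   # feasible dual solution

assert np.allclose(A.T @ lam_star + P.T @ lamhat_star + c, 0.0)      # dual equality constraint
assert np.all(A @ x_star - b <= 0) and np.all(P @ x_star - p <= 0)   # primal feasibility
assert np.all(lam_star >= 0) and np.all(lamhat_star >= 0)            # dual conic feasibility

gap = c @ x_star - (-b @ lam_star - p @ lamhat_star)                 # DualityGap(x*, lambda*)
slack = lam_star @ (b - A @ x_star) + lamhat_star @ (p - P @ x_star) # complementary-slackness sum
print("duality gap =", gap, "  complementary-slackness sum =", slack)
assert abs(gap) <= 1e-12 and abs(slack) <= 1e-12                     # both vanish, as the theorem asserts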
Example IV.23.1 (continued from Example IV.22.1) Consider the primal-dual pair of conic problems (22.11) and (22.12). We claim that the primal solution y = −B^{1/2}, t = −Tr(B^{1/2}) and the dual solution λ = 1, U = (1/2)B^{−1/2}, V = (1/2)I_n, W = (1/2)B^{1/2} are optimal for the respective problems. Indeed, it is immediately seen that these solutions are feasible for the respective problems (to check feasibility of the dual solution, use the Schur Complement Lemma). Moreover, the objective value of the primal solution equals the objective value of the dual solution, and both these quantities are equal to −Tr(B^{1/2}). Thus, the zero duality gap property indeed holds true. ♢
24

Duality in Linear and Convex Quadratic


Programming

The fundamental role of the Lagrange function and Lagrange Duality in Optimization is clear already from the Optimality Conditions given by Theorem IV.23.1, but this role is not restricted to this theorem alone. There are several cases when we can explicitly write down the Lagrange dual, and whenever this is the case, we get a pair of explicitly formulated and closely related optimization programs, the primal-dual pair; analyzing the problems simultaneously, we learn more about their properties (and gain the possibility of solving the problems numerically in a more efficient way) than is possible when we restrict ourselves to only one problem of the pair. The detailed investigation of Duality in "well-structured" Convex Programming deals with cone-constrained Lagrange duality and conic problems. This being said, there are cases where already "plain" Lagrange duality is quite appropriate. Let us look at two of these particular cases.

24.1 Linear Programming Duality


Let us start with a general observation. Note that the Karush-Kuhn-Tucker condition under the assumptions of Theorem IV.23.4 (i.e., the problem
min_x { f(x) : g_j(x) ≤ 0, j = 1, . . . , m, x ∈ X }   (IC)
is convex, x∗ is a feasible solution to the problem, and f, g_1, . . . , g_m are differentiable at x∗) is exactly the condition that (x∗, λ∗ := (λ∗_1, . . . , λ∗_m)) is a saddle point of the Lagrange function
L(x, λ) = f(x) + ∑_{j=1}^{m} λ_j g_j(x)   (24.1)


on X × R^m_+: equalities (23.5) taken together with feasibility of x∗ state that L(x∗, λ) attains its maximum over λ ≥ 0 at λ∗, and (23.6) states that when λ is fixed at λ∗, the function L(x, λ∗) attains its minimum in x ∈ X at x = x∗.
Now consider the particular case of (IC) where X = Rn is the entire space, the
objective f is convex and everywhere differentiable and the constraints g1 , . . . , gm
are linear. In this case the Relaxed Slater Condition holds whenever there is a fea-
sible solution to (IC), and when that is the case, Theorem IV.23.4 states that the
KKT (Karush-Kuhn-Tucker) condition is necessary and sufficient for optimality

of x∗; as we have just explained, this is the same as saying that the necessary and sufficient condition for optimality of x∗ is that x∗, along with a certain λ∗ ≥ 0, forms a saddle point of the Lagrange function. Combining these observations with Proposition IV.21.2, we get the following simple result.

Proposition IV.24.1 Let (IC) be a convex program with X = Rn , every-


where differentiable objective f and linear constraints g1 , . . . , gm .
Then, x∗ is an optimal solution to (IC) if and only if there exists λ∗ ≥ 0
such that (x∗ , λ∗ ) is a saddle point of the Lagrange function (24.1) (regarded
as a function of x ∈ Rn and λ ≥ 0). In particular, (IC) is solvable if and only
if L has saddle points, and if this is the case, then both (IC) and its Lagrange dual
max_{λ≥0} { L(λ) := inf_x L(x, λ) }   (IC∗)

are solvable with equal optimal objective values.

Let us look at what Proposition IV.24.1 says in the Linear Programming case, i.e., when (IC) is the problem given by
min_x { f(x) := c⊤x : g_j(x) := b_j − a_j⊤x ≤ 0, j = 1, . . . , m }.   (P)

In order to get to the Lagrange dual, we should form the Lagrange function of (IC) given by
L(x, λ) = f(x) + ∑_{j=1}^{m} λ_j g_j(x) = ( c − ∑_{j=1}^{m} λ_j a_j )⊤ x + ∑_{j=1}^{m} λ_j b_j,

and minimize it in x ∈ R^n; this will give us the dual objective. In our case the minimization in x is immediate: the minimal value is equal to −∞ if c − ∑_{j=1}^{m} λ_j a_j ≠ 0, and it is ∑_{j=1}^{m} λ_j b_j otherwise. Hence, we see that the Lagrange dual is given by
max_λ { b⊤λ : ∑_{j=1}^{m} λ_j a_j = c, λ ≥ 0 }.   (D)

Therefore, the Lagrange dual problem is precisely the usual LP dual to (P ), and
Proposition IV.24.1 is one of the equivalent forms of the Linear Programming
Duality Theorem (Theorem I.4.9) which we already know.
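For a concrete sanity check of this correspondence (assuming SciPy is available; the instance below is ours and is chosen only for illustration), the sketch solves a small instance of (P) and of (D) with scipy.optimize.linprog and confirms that the optimal values coincide, as the Linear Programming Duality Theorem asserts.

import numpy as np
from scipy.optimize import linprog

# (P): min c^T x  s.t.  a_j^T x >= b_j  (i.e., b_j - a_j^T x <= 0), x free.
A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])   # rows are a_j^T
b = np.array([0.0, 0.0, 1.0])
c = np.array([1.0, 2.0])

primal = linprog(c, A_ub=-A, b_ub=-b, bounds=[(None, None)] * 2)   # -A x <= -b encodes A x >= b

# (D): max b^T lam  s.t.  sum_j lam_j a_j = c, lam >= 0  (linprog minimizes, so negate the objective).
dual = linprog(-b, A_eq=A.T, b_eq=c, bounds=[(0, None)] * 3)

print("primal optimal value:", primal.fun)       # expected 1.0, attained at x = (1, 0)
print("dual optimal value:  ", -dual.fun)        # expected 1.0, attained at lam = (0, 1, 1)
assert abs(primal.fun - (-dual.fun)) <= 1e-7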

24.2 Quadratic Programming Duality


Now consider the case when the original problem is a linearly constrained convex quadratic program given by
min_x { f(x) := (1/2)x⊤Qx + c⊤x : g_j(x) := b_j − a_j⊤x ≤ 0, j = 1, . . . , m },   (P)
where the objective is a strictly convex quadratic form, so that the matrix Q = Q⊤ is positive definite, i.e., x⊤Qx > 0 whenever x ≠ 0. It is convenient to rewrite the constraints in vector-matrix form using the notation
g(x) = b − Ax ≤ 0, where b := [b_1; . . . ; b_m], A := [a_1⊤; . . . ; a_m⊤].

In order to form the Lagrange dual of (P), we write down the Lagrange function
L(x, λ) = f(x) + ∑_{j=1}^{m} λ_j g_j(x) = (1/2)x⊤Qx + c⊤x + λ⊤(b − Ax) = (1/2)x⊤Qx − (A⊤λ − c)⊤x + b⊤λ,
and minimize it in x. Since the function is convex and differentiable in x, the minimum, if it exists, is given by the Fermat rule
∇x L(x, λ) = 0,
which in our situation becomes
Qx = A⊤ λ − c.
Since Q is positive definite, it is nonsingular, so that the Fermat equation has a
unique solution which is the minimizer of L(·, λ). This solution is
x(λ) := Q−1 (A⊤ λ − c).
Substituting the expression of x(λ) into the expression for the Lagrange function,
we get the dual objective
L(λ) = −(1/2)(A⊤λ − c)⊤Q^{−1}(A⊤λ − c) + b⊤λ.
Thus, the dual problem is to maximize this objective over the nonnegative or-
thant. Let us rewrite this dual problem equivalently by introducing additional
variables
t := −Q−1 (A⊤ λ − c) =⇒ (A⊤ λ − c)⊤ Q−1 (A⊤ λ − c) = t⊤ Qt.
With this substitution, the dual problem becomes
max_{λ,t} { −(1/2)t⊤Qt + b⊤λ : A⊤λ + Qt = c, λ ≥ 0 }.   (D)
We see that the dual problem also turns out to be a linearly constrained convex quadratic program.
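As a minimal numerical illustration (the data Q = I_2, c = 0, and the single constraint x_1 + x_2 ≥ 1 are ours, chosen so that the optimum is known in closed form), the NumPy sketch below evaluates the dual objective L(λ) on a grid of λ ≥ 0 and confirms that its maximum equals the primal optimal value 1/4, attained with t(λ∗) = −x∗.

import numpy as np

# (P): min (1/2) x^T Q x + c^T x  s.t.  b - A x <= 0, with Q = I_2, c = 0 and the constraint x1 + x2 >= 1.
Q = np.eye(2)
c = np.zeros(2)
A = np.array([[1.0, 1.0]])
b = np.array([1.0])

x_opt = np.array([0.5, 0.5])                     # primal optimum (projection of 0 onto the half-plane)
primal_val = 0.5 * x_opt @ Q @ x_opt + c @ x_opt

def dual_obj(lam):
    # L(lam) = -(1/2)(A^T lam - c)^T Q^{-1} (A^T lam - c) + b^T lam
    r = A.T @ lam - c
    return -0.5 * r @ np.linalg.solve(Q, r) + b @ lam

grid = np.linspace(0.0, 2.0, 2001)               # maximize the dual over lam >= 0 on a grid
vals = np.array([dual_obj(np.array([l])) for l in grid])
lam_opt = np.array([grid[vals.argmax()]])

print("primal value:", primal_val, " dual value:", vals.max())   # both 0.25, attained at lam = 0.5
t_opt = -np.linalg.solve(Q, A.T @ lam_opt - c)
print("t(lam*) =", t_opt, "which equals -x*, in line with the substitution t = -Q^{-1}(A^T lam - c)")
assert abs(primal_val - vals.max()) <= 1e-9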
Remark IV.24.2 Note also that a feasible quadratic program in the form of (P) with a positive definite matrix Q is automatically solvable. This relies on the following simple general fact:
Let (IC) be a feasible program with closed domain X, with objective and constraints continuous on X, and such that f(x) → ∞ as x ∈ X "goes to infinity" (i.e., ∥x∥_2 → ∞). Then (IC) is solvable.
In the case of our quadratic program (P ), as Q is positive definite, we have
f (x) → ∞ as ∥x∥2 → ∞. Then, solvability of (P ) follows from the simple fact
stated above. You are encouraged to prove this simple fact on your own. ■
Based on this remark, Proposition IV.24.1 leads to the following result.

Theorem IV.24.3 [Duality Theorem in Quadratic Programming]


Let (P ) be a feasible quadratic program with positive definite symmetric
matrix Q in the objective. Then, both (P ) and (D) are solvable, and the
optimal values in these problems are equal to each other.
A pair of primal and dual feasible solutions, say (x; (λ, t)), to these problems is optimal
(i) if and only if the "zero duality gap" optimality condition holds, i.e., the primal objective value at x is equal to the dual objective value at (λ, t), or, equivalently,
(ii) if and only if the following holds:
λ_i (Ax − b)_i = 0, i = 1, . . . , m, and t = −x.   (24.2)

Proof. (i): Proposition IV.24.1 implies that the optimal value in minimization
problem (P ) is equal to the optimal value in the maximization problem (D).
It follows that the value of the primal objective at any primal feasible solution
is ≥ the value of the dual objective at any dual feasible solution, and equality
is possible if and only if these values coincide with the optimal values in the
problems, as claimed in (i).
(ii): Let ∆(x, (λ, t)) be the difference between the primal objective value of the primal feasible solution x and the dual objective value of the dual feasible solution (λ, t):
∆(x, (λ, t)) := (c⊤x + (1/2)x⊤Qx) − (b⊤λ − (1/2)t⊤Qt)
  = (A⊤λ + Qt)⊤x + (1/2)x⊤Qx + (1/2)t⊤Qt − b⊤λ
  = λ⊤(Ax − b) + (1/2)(x + t)⊤Q(x + t),
where the second equation follows since A⊤ λ + Qt = c. Whenever x is primal
feasible, we have Ax − b ≥ 0, and similarly dual feasibility of (λ, t) implies that
λ ≥ 0. Since Q is positive definite as well, we then deduce that the first and
the second terms in the above representation of ∆(x, (λ, t)) are nonnegative for
every pair (x; (λ, t)) of primal and dual feasible solutions. Thus, for such a pair
∆(x, (λ, t)) = 0 holds if and only if λ⊤ (Ax − b) = 0 and (x + t)⊤ Q(x + t) = 0. The
first of these equalities, due to λ ≥ 0 and Ax ≥ b, is equivalent to λj (Ax − b)j = 0
for all j = 1, . . . , m; the second equality, due to positive definiteness of Q, is


equivalent to x + t = 0.
25

⋆ Cone-convex functions: elementary


calculus and examples

So far, speaking about K-convex functions, we assumed the cone K to be regular,


that is, pointed, closed, and with a nonempty interior. In the considerations of this
chapter, nonemptiness of int K is of no importance, so that within this chapter we
specify a K-convex mapping f, K being a closed pointed cone in some embedding Euclidean space F, as a mapping f defined on a convex domain Dom f ⊆ R^n and taking values in F such that
f(λx + (1 − λ)y) ≤_K λf(x) + (1 − λ)f(y), ∀(x, y ∈ Dom f, λ ∈ [0, 1]),
where, as always, a ≤_K b means that b − a ∈ K. In particular, we allow for K = {0}, in which case a ≤_K b is equivalent to a = b; in this extreme case, K-convexity of f means that f is affine on Dom f.
Note that Fact IV.20.10, as is clear from its proof, remains true when the
regularity of cone K in its premise is relaxed to closedness and pointedness.
The calculus presented below resembles the usual calculus of real-valued mono-
tone and convex functions, and almost all claims to follow are nearly self-evident,
and therefore not all of them are accompanied by proofs. Absence of a proof in
the text means that providing the proof is an exercise for the reader.1

25.1 Epigraph characterization of cone-convexity


Let f (x) : Dom f → Rν be a mapping with convex domain Dom f ⊆ Rn , and let
K ⊂ Rν be a closed pointed cone. The following claim is evident.

Fact IV.25.1 A function f is K-convex if and only if its K-epigraph


epiK (f ) := {(x, y) ∈ Dom f × Rν : y ≥K f (x)}
is convex.
Let us examine some K-convex functions. In what follows, we use ⪰-convexity as a synonym of S^m_+-convexity, with context-specified m.
Example IV.25.1 Consider the “convex matrix-valued quadratic function” of
x ∈ Rp×q given by
f (x) := [AxB][AxB]⊤ + [CxD + D⊤ x⊤ C ⊤ ] + H,
1 This being said, all missing proofs can be found in solutions to Exercise IV.29.


where A, C ∈ R^{m×p}, B ∈ R^{q×n}, D ∈ R^{q×m}, H ∈ S^m. Note that f : R^{p×q} → S^m. We claim that f is ⪰-convex. Indeed, by the Schur Complement Lemma we have
epi_{S^m_+}{f} = { (x, y) : [ y − (CxD + D⊤x⊤C⊤) − H , AxB ; B⊤x⊤A⊤ , I_n ] ⪰ 0 },
and the right hand side set is clearly convex (as it is the inverse image of S^{m+n}_+ under an affine mapping). ♢
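A minimal numerical spot-check of this claim (the dimensions, the random data, and the sample size below are arbitrary illustrative choices): for any x, y and λ ∈ [0, 1], the matrix λf(x) + (1 − λ)f(y) − f(λx + (1 − λ)y) must be positive semidefinite, and the NumPy sketch verifies this on random samples.

import numpy as np

rng = np.random.default_rng(1)
m, p, q, n = 3, 2, 2, 4
A, C = rng.standard_normal((m, p)), rng.standard_normal((m, p))
B, D = rng.standard_normal((q, n)), rng.standard_normal((q, m))
H = rng.standard_normal((m, m)); H = H + H.T                  # any symmetric H

def f(x):
    # The matrix-valued quadratic function of Example IV.25.1; f(x) takes values in S^m.
    return (A @ x @ B) @ (A @ x @ B).T + (C @ x @ D + D.T @ x.T @ C.T) + H

for _ in range(1000):
    x, y = rng.standard_normal((p, q)), rng.standard_normal((p, q))
    lam = rng.uniform()
    gap = lam * f(x) + (1 - lam) * f(y) - f(lam * x + (1 - lam) * y)
    assert np.linalg.eigvalsh(gap).min() >= -1e-9             # the convexity gap is positive semidefinite
print("the convex combination inequality was verified on 1000 random samples")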
Example IV.25.2 Consider the matrix-valued fractional-quadratic function
f(u, v) := u⊤v^{−1}u
with the domain Dom f = {(u, v) : u ∈ R^{p×q}, v ∈ int S^p_+}. We claim that f is ⪰-convex. Indeed, by the Schur Complement Lemma we have
epi_{S^q_+}{f} = { (u, v, y) : [ y , u⊤ ; u , v ] ⪰ 0, v ≻ 0 },
and the right hand side set is convex (for the same reason as in Example IV.25.1). ♢

25.2 Testing cone-convexity and cone-monotonicity

25.2.1 Cone-monotonicity
Let us start with a new (for us) notion which will play an important role in
“calculus of cone-convexity.”

Definition IV.25.2 [(U, K)-monotonicity] Let E and F be Euclidean spaces


equipped with closed pointed cones U and K, Q be a nonempty convex subset
of E, and f (x) : Q → F be a mapping. We say that this mapping is (U, K)-
monotone on Q, if f (x) ≤K f (x′ ) holds for every x, x′ ∈ Q satisfying x ≤U x′ .

For example, when U and K are nonnegative orthants, e.g., E = R^n and F = R^m, (U, K)-monotonicity of f on Q means that whenever x ≤ x′ and x, x′ ∈ Q, one has f(x) ≤ f(x′). An instructive extreme example is the one where U = {0}; in this case, every mapping f : Q → F is (U, K)-monotone.

25.2.2 Differential criteria for cone-convexity and cone-monotonicity


We next present a differential characterization of cone-convexity and cone-mono-
tonicity. The following claim is nearly evident.

Proposition IV.25.3 Let E, F be Euclidean spaces equipped with closed
