x = [x1; . . . ; xn]
(pay attention to the semicolon ";"). For example, the column vector with entries 1, 2, 3 is written as [1; 2; 3].
More generally,
— if A1 , . . . , Am are matrices with the same number of columns, we write
[A1 ; . . . ; Am ] to denote the matrix obtained by writing A2 beneath A1 , A3
beneath A2 , and so on.
— if A1 , . . . , Am are matrices with the same number of rows, then [A1 , . . . , Am ]
stands for the matrix obtained by writing A2 to the right of A1 , A3 to the right
of A2 , and so on.
Examples:
• A1 = [1, 2, 3; 4, 5, 6], A2 = [7, 8, 9] =⇒ [A1; A2] = [1, 2, 3; 4, 5, 6; 7, 8, 9]
• A1 = [1, 2; 3, 4], A2 = [7; 8] =⇒ [A1, A2] = [1, 2, 7; 3, 4, 8]
• [1, 2, 3, 4] = [1; 2; 3; 4]⊤
• [[1, 2; 3, 4], [5, 6; 7, 8]] = [1, 2, 5, 6; 3, 4, 7, 8]
• We follow the standard convention that the sum of vectors over an empty set of indexes, i.e., Σ_{i=1}^0 xi, where xi ∈ Rn, has a value – it is the origin in Rn.
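For readers who want to experiment numerically, the bracket notation above corresponds directly to vertical and horizontal stacking of arrays. The following small Python/NumPy sketch is purely illustrative (NumPy is not used anywhere in the text itself):

import numpy as np

A1 = np.array([[1, 2, 3],
               [4, 5, 6]])
A2 = np.array([[7, 8, 9]])

# [A1; A2]: write A2 beneath A1 (same number of columns)
vert = np.vstack([A1, A2])          # the 3 x 3 matrix [1, 2, 3; 4, 5, 6; 7, 8, 9]

B1 = np.array([[1, 2],
               [3, 4]])
B2 = np.array([[7],
               [8]])

# [B1, B2]: write B2 to the right of B1 (same number of rows)
horiz = np.hstack([B1, B2])         # the 2 x 3 matrix [1, 2, 7; 3, 4, 8]

# [1, 2, 3, 4] is a row vector, i.e., the transpose of the column [1; 2; 3; 4]
row = np.array([[1, 2, 3, 4]])
col = np.array([[1], [2], [3], [4]])
assert (row == col.T).all()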
The definition of a segment [x; y] is in full accordance with our “real life ex-
perience” in 2D or 3D: when λ ∈ [0, 1], the point x(λ) = λx + (1 − λ)y =
x + (1 − λ)(y − x) is the point where you arrive when traveling from x directly
towards y after you have covered the fraction (1 − λ) of the entire distance from x
to y, and these points compose the “real world segment” with endpoints x = x(1)
and y = x(0).
Note that an empty set is convex by the exact sense of the definition: for the
empty set, you cannot present a counterexample to show that it is not convex.
A closed ray given by a direction 0 ̸= d ∈ Rn is also convex:
R+ (d) := {t d ∈ Rn : t ≥ 0} .
Note also that the open ray given by {t d ∈ Rn : t > 0} is convex as well.
Proposition I.1.2 The solution set S := {x ∈ Rn : aα⊤ x ≤ bα, α ∈ A} of an arbitrary (finite or infinite) system of nonstrict linear inequalities is convex.
Proof. Consider any x′, x′′ ∈ S and any λ ∈ [0, 1]. As x′, x′′ ∈ S, we have aα⊤ x′ ≤ bα and aα⊤ x′′ ≤ bα for any α ∈ A. Then, for every α ∈ A, multiplying the inequality aα⊤ x′ ≤ bα by λ, and the inequality aα⊤ x′′ ≤ bα by 1 − λ, respectively, and summing up the resulting inequalities, we get aα⊤[λx′ + (1 − λ)x′′] ≤ bα. Thus, we deduce that λx′ + (1 − λ)x′′ ∈ S.
Note that this verification of convexity of S works also in the case when, in the definition of S, some of the nonstrict inequalities aα⊤ x ≤ bα are replaced with their strict versions aα⊤ x < bα.
Recall that linear and affine subspaces can be represented as the solution sets
of systems of linear equations (Proposition A.47). Consequently, from Proposi-
tion I.1.2 we deduce that such sets are convex.
Example I.1.1 All linear subspaces and all affine subspaces of Rn are convex.
♢
Another important special case of Proposition I.1.2 is the one when we have
a finite system of nonstrict linear inequalities. Such sets have a special name as
they are frequently encountered and studied.
Remark I.1.5 Replacing some of the nonstrict linear inequalities aα⊤ x ≤ bα in system (1.1) with their strict versions aα⊤ x < bα preserves, as we have already mentioned, convexity of the solution set, but can destroy its closedness. ■
These indeed are norms (which is not clear in advance; for proof, see page 156,
and for more details – page 215). When p = 2, we get the usual Euclidean norm.
When p = 1, we get
∥x∥1 = Σ_{i=1}^n |xi|.
When p = ∞, we get
∥x∥∞ = max_{1≤i≤n} |xi|,
and
∥x + y∥∞ = max_i |xi + yi| ≤ max_i {|xi| + |yi|} ≤ max_{i,j} {|xi| + |yj|} = ∥x∥∞ + ∥y∥∞.
Fact I.1.8 Unit balls of norms on Rn are exactly the same as convex sets
V in Rn satisfying the following three properties:
(i) V is symmetric with respect to the origin: x ∈ V =⇒ −x ∈ V ;
(ii) V is bounded and closed;
(iii) V contains a neighborhood of the origin, i.e., there exists r > 0 such that the Euclidean ball of radius r centered at the origin – the set {x ∈ Rn : ∥x∥2 ≤ r} – is contained in V.
Any set V satisfying the outlined properties is indeed the unit ball of a
particular norm given by
∥x∥V := inf_t { t : t⁻¹x ∈ V, t > 0 }.   (1.2)
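To make formula (1.2) concrete, here is a small numerical sketch in Python (the function name and the bisection scheme are assumptions made for this illustration only): given a membership oracle for a set V with the three properties above, ∥x∥V can be computed by bisection on t, and for V the unit ball of ∥·∥1 the result should agree with the ℓ1-norm.

import numpy as np

def norm_from_unit_ball(x, in_V, t_hi=1e6, tol=1e-10):
    """Compute ||x||_V = inf{t > 0 : x/t in V} by bisection on t.

    in_V is a membership oracle for a set V assumed convex, symmetric,
    closed and bounded, and containing a ball around the origin; under
    these assumptions, membership of x/t in V is monotone in t.
    """
    if not np.any(x):
        return 0.0
    t_lo = 0.0                       # x/t leaves V as t -> 0+ (V is bounded)
    while not in_V(x / t_hi):        # make sure the upper end is feasible
        t_hi *= 2.0
    while t_hi - t_lo > tol:
        t = 0.5 * (t_lo + t_hi)
        if in_V(x / t):
            t_hi = t                 # t is feasible, try smaller
        else:
            t_lo = t
    return t_hi

# Sanity check: for V = {x : ||x||_1 <= 1}, ||x||_V is the l1-norm.
x = np.array([1.0, -2.0, 0.5])
val = norm_from_unit_ball(x, lambda z: np.abs(z).sum() <= 1.0)
assert abs(val - np.abs(x).sum()) < 1e-6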
1.1.3 Ellipsoids
is convex.
is convex. ♢
Justification of Example I.1.3 is left as an exercise at the end of this Part (see
Exercise I.6).
is also convex.
By Linear Algebra, the linear span of a set M – the smallest (w.r.t. inclusion)
linear subspace containing M – can be described in terms of linear combinations:
this is the set of all linear combinations of points from M. Analogous results hold for the affine span of a (nonempty) set and affine combinations of points from the set. We have an analogous description of convex hulls via convex combinations as well:
We will see in section 9.3 that when M is a finite set in Rn , Conv(M ) is a bounded
polyhedral set. Bounded polyhedral sets are also called polytopes.
We next continue with a number of important families of convex sets.
1.2.3 Simplex
1.2.4 Cones
We next examine a very important class of convex sets.
A nonempty set K ⊆ Rn is called conic if it contains, along with every point
x ∈ K, the entire ray R+ (x) = {tx : t ≥ 0} spanned by the point:
x∈K =⇒ tx ∈ K, ∀t ≥ 0.
Note that based on our definition, any conic set is nonempty and it always con-
tains the origin.
Example I.1.4 The solution set {x ∈ Rn : aα⊤ x ≥ 0, α ∈ A} of an arbitrary (finite or infinite) system of homogeneous nonstrict linear inequalities is a cone.
In particular, the solution set of a finite system composed of m homogeneous
linear inequalities
Ax ≥ 0
(A is an m × n matrix) is a cone. A cone of this latter type is called polyhedral. Specifically, the nonnegative orthant Rm+ := {x ∈ Rm : x ≥ 0} is a polyhedral cone. ♢
Note that the cones given by systems of linear homogeneous nonstrict inequal-
ities are obviously closed. From Separation Theorem (see Theorem II.7.3) we will
deduce the reverse as well, i.e., every closed cone is the solution set to such a
system. Thus, Example I.1.4 is the generic example of a closed convex cone.
We already know that a norm ∥·∥ on Rn gives rise to specific convex sets in Rn ,
namely, balls of this norm. In fact, a norm also gives rise to another important
convex set.
Proposition I.1.18 For any norm ∥ · ∥ on Rn, its epigraph, i.e., the set
K := {[x; t] ∈ Rn+1 : t ≥ ∥x∥}
is a closed cone. For the Euclidean norm ∥ · ∥2, this cone is the second-order (or Lorentz, or ice cream) cone (see Figure I.3), and it plays a significant role in convex optimization.
Definition I.1.19 [Conic hull] For any K ⊆ Rn , the conic hull of K [nota-
tion: Cone(K)] is the intersection of all cones containing K. Thus, Cone(K)
is the smallest (w.r.t. inclusion) cone containing K.
• We can describe the conic hull of a set K ⊆ Rn in terms of its conic combina-
tions:
Fact I.1.20 [Conic hull via conic combinations] The conic hull Cone(K) of
a set K ⊆ Rn is the set of all conic combinations (i.e., linear combinations
with nonnegative coefficients) of vectors from K:
Cone(K) = { x ∈ Rn : ∃N ≥ 0, λi ≥ 0, xi ∈ K, i ≤ N : x = Σ_{i=1}^N λi xi }.
Note that here we use the standard convention: the sum of vectors over an empty set of indexes, like Σ_{i=1}^0 zi, has a value – it is the origin of the space where the vectors live. In particular, the set of conic combinations of vectors from the empty set is {0}, in full accordance with Definition I.1.19.
is convex.
Warning: "Linear combination λ1M1 + . . . + λkMk of sets" as defined above is just a notation. When operating with these "linear combinations of sets," one should be careful. For example, while it is true that M1 + M2 = M2 + M1 and that M1 + (M2 + M3) = (M1 + M2) + M3, and even that λ(M1 + M2) = λM1 + λM2, it is, in general, not true that (λ1 + λ2)M = λ1M + λ2M.
4. Taking image under an affine mapping: if M ⊆ Rn is a convex set and x 7→ A(x) ≡ Ax + b is an affine mapping from Rn into Rm (where A ∈ Rm×n
and b ∈ Rm ), then the image of M under the mapping A(·), i.e., the set
A(M ) := {A(x) : x ∈ M } ,
is convex.
5. Taking inverse image under affine mapping: if M ⊆ Rn is a convex set
and y 7→ A(y) = Ay + b is an affine mapping from Rm to Rn (where
A ∈ Rn×m and b ∈ Rn ), then the inverse image of M under the mapping
A(·), i.e., the set
A−1 (M ) := {y ∈ Rm : A(y) ∈ M } ,
is convex.
The (completely straightforward) verification of this proposition is left to the
reader.
of sets, M1 and M2. Assuming both sets are nonempty and closed and M1 is bounded, we should prove that if a sequence {xi + yi}i with xi ∈ M1 and yi ∈ M2 converges as i → ∞, then the limit lim_{i→∞}(xi + yi) belongs to M1 + M2.
Since M1, and thus the sequence {xi}i, is bounded, passing to a subsequence we may assume that the sequence {xi}i converges, as i → ∞, to some x. Since the sequence {xi + yi}i converges as well, the sequence {yi}i also converges to some y. As M1 and M2 are closed, we have x ∈ M1, y ∈ M2, and therefore
lim_{i→∞}(xi + yi) = x + y ∈ M1 + M2,
4. Multiplication by a real: For a nonempty closed convex set M and a real λ,
the set λM is closed and convex (why?).
5. Image under an affine mapping of a closed convex set M is convex, but not
necessarily closed; it is definitely closed when M is bounded.
As an example of a closed convex set with a non-closed affine image, consider the set {[x; y] ∈ R2 : x, y ≥ 0, xy ≥ 1} (i.e., a branch of a hyperbola) and its
projection onto the x-axis. This set is convex and closed, but its projection
onto the x-axis is the positive ray {x > 0} which is not closed. Closedness of
the affine image of a closed and bounded set is the special case of the general
fact:
the image of a closed and bounded set under a mapping that is continuous
on this set is closed and bounded as well (why?).
6. Inverse image under affine mapping: if M ⊆ Rn is convex and closed and
y 7→ A(y) = Ay + b is an affine mapping from Rm to Rn , then the set
A−1 (M ) := {y ∈ Rm : A(y) ∈ M }
is a closed convex set in Rm . Indeed, the convexity of A−1 (M ) is given by the
calculus of convexity, and its closedness is due to the following standard fact:
the inverse image of a closed set in Rn under continuous mapping from
Rm to Rn is closed (why?).
We see that the "calculus of closed convex sets" is somewhat weaker than the calculus of convexity per se. Nevertheless, we will see that these difficulties disappear when we restrict the operands of our operations to be polyhedral, and not just closed and convex.
For every subset M of Rn there exists the smallest (w.r.t. inclusion) closed set containing M. This leads us to the following definition.
From Real Analysis, we have the following inner description of the closure of
a set in a metric space (and, in particular, in Rn ).
Fact I.1.23 The closure of a set M ⊆ Rn is exactly the set composed of the
limits of all converging sequences of elements from M .
Example I.1.5 Based on Fact I.1.23, it is easy to prove that, e.g., the closure
of the open Euclidean ball
{x ∈ Rn : ∥x − a∥2 < r} [where r > 0]
is the closed Euclidean ball {x ∈ Rn : ∥x − a∥2 ≤ r}.
Another useful application example is the closure of a set defined by strict
linear inequalities, i.e.,
M := {x ∈ Rn : aα⊤ x < bα, α ∈ A}.
Whenever such a set M is nonempty, then its closure is given by the nonstrict
versions of the same inequalities:
cl M = {x ∈ Rn : aα⊤ x ≤ bα, α ∈ A}.
Note here that nonemptiness of M in this last example is essential. To see this,
consider the set M = {x ∈ R : x < 0, − x < 0} . Clearly, M is empty, so that its
closure also is the empty set. On the other hand, if we ignore the nonemptiness
requirement on M and apply formally the above rule, we would incorrectly claim
that cl M = {x ∈ R : x ≤ 0, − x ≤ 0} = {0} . ♢
Definition I.1.24 [Interior] The set of all interior points of a given set
M ⊆ Rn is called the interior of M [notation: int M or int(M )] (see Defini-
tion B.10).
Example I.1.6 We have the following sets and their corresponding interiors:
Fact I.1.25 For any set M in Rn , its interior, int M , is always open, and
int M is the largest (with respect to the inclusion) open set contained in M .
The interior of a set is, of course, contained in the set, which, in turn, is
contained in its closure:
int M ⊆ M ⊆ cl M. (1.3)
The boundary points of M are exactly the points from Rn which can be approximated to arbitrarily high accuracy both by points from M and by points from outside of M (check it!).
Given a set M ⊆ Rn , it is important to note that the boundary points not
necessarily belong to M , since M = cl M need not necessarily hold in general. In
fact, all boundary points belong to M if and only if M = cl M , i.e., if and only
if M is closed.
The boundary of a set M ⊆ Rn is clearly closed as bd M = cl M ∩ (Rn \ int M )
and both sets cl M and Rn \ int M are closed (note that the set Rn \ int M is
closed since it is the complement of an open set). In addition, from the definition
of the boundary, we have
M ⊆ (int M ∪ bd M ) = cl M.
Therefore, any point from M is either an interior or a boundary point of M .
Example I.1.7 We have the following sets and their corresponding relative
interiors:
• The relative interior of a singleton is the singleton itself (since a point in the
0-dimensional space is the same as a ball of a positive radius).
• More generally, the relative interior of an affine subspace is the subspace itself.
• Given two distinct points x ̸= y in Rn, the interior of a segment [x, y] is empty
whenever n > 1. In contrast to this, the relative interior of this set is always
(independent of n) nonempty and it is precisely the interval (x, y), i.e., the
segment without the endpoints. ♢
Note that for any M ⊆ Rn, we naturally have that rbd M is a closed set contained in Aff(M), and, as for the "actual" interior and boundary, we have
rint M ⊆ M ⊆ cl M = rint M ∪ rbd M.
Of course, if Aff(M ) = Rn , then the relative interior becomes the usual interior,
and similarly for boundary. Note that Aff(M ) = Rn for sure is the case when
int M ̸= ∅ (since then M contains a ball B, and therefore the affine hull of M is
the entire Rn , which is the affine hull of B).
Lemma I.1.30 Let M be a convex set in Rn . Then, for any x ∈ rint M and
y ∈ cl M , we have
[x, y) := {(1 − λ)x + λy : 0 ≤ λ < 1} ⊆ rint M.
also belongs to M . Note that the set ∆ is the image of the standard full-
dimensional simplex
{µ ∈ Rn : µ ≥ 0, Σ_{i=1}^n µi ≤ 1}
under the linear transformation µ 7→ Aµ, where A is the matrix with the columns
a1 , . . . , an . Recall from Example I.1.6 that the standard simplex has a nonempty
interior. Since A is nonsingular (due to the linear independence of a1 , . . . , an ),
multiplication by A maps open sets onto open ones, so that ∆ has a nonempty
interior. Since ∆ ⊆ M , the interior of M is nonempty.
(iii): The statement is evidently true when M is empty, so we assume that
In general, this inclusion can be “loose” – the right hand side set in (1.6) can
be much larger than the left hand side one, even when all Mk are convex. For
example, when K = 2, M1 = {x ∈ R2 : x2 = 0} is the x1 -axis, and M2 = {x ∈
R2 : x2 > 0} ∪ {[0; 0]}, both sets are convex, their intersection is the singleton
{0}, so that cl(M1 ∩ M2 ) = cl{0} = {0}, while the intersection of cl M1 and cl M2
is the entire x1 -axis, which is simply M1 . In this example the right hand side
in (1.6) is “incomparably larger” than the left hand side one. However, under
suitable assumptions we can also achieve equality in (1.6).
Proposition I.1.33 Consider convex sets Mk ⊆ Rn, k ≤ K.
(i) If ∩_{k≤K} rint Mk ̸= ∅, then cl(∩_{k≤K} Mk) = ∩_{k≤K} cl Mk, i.e., (1.6) holds as an equality.
(ii) Moreover, if MK ∩ int M1 ∩ int M2 ∩ . . . ∩ int MK−1 ̸= ∅, then ∩_{k≤K} rint Mk ̸= ∅, i.e., the premise (and thus the conclusion) in (i) holds true, so that cl(∩_{k≤K} Mk) = ∩_{k≤K} cl Mk.
Proof. (i): To prove that under the premise of (i) inclusion (1.6) is equality is
the same as to verify that under the circumstances given x ∈ ∩k cl Mk , one has
x ∈ cl (∩k Mk ). Indeed, under the premise of (i) there exists x̄ ∈ ∩k rint Mk . Then,
for every k we have x̄ ∈ rint Mk and x ∈ cl Mk , implying by Lemma I.1.30 that
the set ∆ := [x̄, x) = {(1 − λ)x̄ + λx : 0 ≤ λ < 1} is contained in Mk. Since ∆ ⊆ Mk for all k, we have ∆ ⊆ ∩k Mk, and thus cl ∆ ⊆ cl(∩k Mk). It remains to note that x ∈ cl ∆.
(ii): Let x̄ ∈ MK ∩ int M1 ∩ . . . ∩ int MK−1. As x̄ ∈ int Mk for all k < K, there exists an open set U ⊂ ∩_{k<K} Mk such that x̄ ∈ U. As x̄ ∈ MK ⊆ cl MK, by Theorem I.1.29, x̄ is the limit of a sequence of points from rint MK, so that there exists x̂ ∈ U ∩ rint MK. Due to the origin of U, we have x̂ ∈ rint Mk for all k ≤ K, so that the premise of (i) indeed takes place.
We will call this the conic transform of X, see Figure I.4. Note that this set
is indeed a cone. Moreover, all vectors [x; t] from this cone have t ≥ 0, and,
importantly, the only vector with t = 0 in the cone ConeT(X) is the origin in
Rn+1 (this is what you get when taking trivial – with all coefficients zero – conic
combinations of vectors from X + ).
All nonzero vectors [x; t] from ConeT(X) have t > 0 and form a convex set which
we call the perspective transform Persp(X) of X:
Persp(X) := {[x; t] ∈ ConeT(X) : t > 0} = ConeT(X) \ {0n+1 }.
The name of this set is motivated by the following immediate observation (Proposition I.1.34): for a nonempty convex set X, we have the representation
Persp(X) = {[x; t] ∈ Rn × R : t > 0, x/t ∈ X}.   (1.7)
In other words, to get Persp(X), we pass from X to X + (i.e., lift X to
Rn+1 ) and then take the union of all rays {[sx; s] ∈ Rn × R : s > 0, x ∈ X}
emanating from the origin (with origin excluded) and passing through the
points of X + .
Proof. Let X̂ := {[x; t] ∈ Rn × R : t > 0, x/t ∈ X}, so that the claim in the proposition is Persp(X) = X̂. Consider a point [x; t] ∈ X̂. Then, t > 0 and y := x/t ∈ X, and thus we have [x; t] = t[y; 1], so that the point [x; t] from X̂ is a single-term conic combination – just a positive multiple – of the point [y; 1] ∈ X+. As this holds for every point [x; t] ∈ X̂, we conclude X̂ ⊆ Persp(X). To verify the opposite inclusion, recall that every point [x; t] ∈ Persp(X) is of the form [Σi λi xi; Σi λi] with xi ∈ X, λi ≥ 0, and t = Σi λi > 0. Then,
[Σi λi xi; Σi λi] = t [Σi (λi/t) xi; 1] = t[y; 1],
where y := Σi (λi/t) xi. Note that y ∈ X as it is a convex combination of points from X and X is convex. Thus, [x; t] is such that t > 0 and y = x/t ∈ X, that is, X̂ ⊇ Persp(X), as desired.
As a byproduct of Proposition I.1.34, we conclude that the right hand side set
in (1.7) is convex whenever X is convex and nonempty – a fact not so evident
“from scratch.”
Note that X + is geometrically the same as X, and moreover we can view X +
as simply the intersection of ConeT(X) (or Persp(X)) with the hyperplane t = 1
in Rn × R.
Example I.1.8
1. ConeT(Rn) = {[x; t] ∈ Rn+1 : t > 0} ∪ {0n+1}, and Persp(Rn) = {[x; t] ∈ Rn+1 : t > 0}.
2. ConeT(Rn+) = {[x; t] ∈ Rn+1+ : t > 0} ∪ {0n+1}, and Persp(Rn+) = {[x; t] ∈ Rn+1+ : t > 0}.
3. Given any norm ∥ · ∥ on Rn, let B be its unit ball. Then, we have ConeT(B) = {[x; t] ∈ Rn+1 : t ≥ ∥x∥}, and Persp(B) = {[x; t] ∈ Rn+1 : t ≥ ∥x∥, t > 0}.
♢
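As a quick numerical illustration of representation (1.7) and of item 3 of Example I.1.8, one can check in Python that, for B the Euclidean unit ball, a point [x; t] with t > 0 belongs to Persp(B) exactly when t ≥ ∥x∥2, i.e., exactly when x/t ∈ B. This is a sketch written for this illustration only (the function name is an assumption, not notation from the text):

import numpy as np

def in_persp_of_ball(x, t):
    """Membership in Persp(B), B the Euclidean unit ball, via (1.7):
    [x; t] in Persp(B)  iff  t > 0 and x/t in B."""
    return t > 0 and np.linalg.norm(x / t) <= 1.0

rng = np.random.default_rng(0)
for _ in range(1000):
    x = rng.normal(size=3)
    t = rng.uniform(0.1, 3.0)
    # Example I.1.8(3): ConeT(B) = {[x; t] : t >= ||x||_2}, so for t > 0
    # membership should coincide with t >= ||x||_2.
    assert in_persp_of_ball(x, t) == (t >= np.linalg.norm(x))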
Note that in all three examples in Example I.1.8, the set X of which we are taking conic and perspective transforms is not just convex, but also closed. However, in the first two examples the conic transform is a non-closed cone, while in the third example the conic transform is closed, albeit in all three cases the intersection of ConeT(X) with the half-space {[x; t] ∈ Rn+1 : t ≥ α} is closed, provided α > 0. There is indeed a general fact underlying this phenomenon.
For a nonempty convex set X, let us also define the closure of ConeT(X), i.e., the set
cl ConeT(X) = cl {[x; t] ∈ Rn × R : t > 0, x/t ∈ X}.
Clearly, cl ConeT(X) is a closed cone in Rn+1 containing X+. Moreover, it is immediately seen that cl ConeT(X) is the smallest (w.r.t. inclusion) closed cone in Rn+1 which contains X+, and that this cone remains intact when X is replaced with its closure. We will refer to cl ConeT(X) as the closed conic transform of X. In some cases, the closed conic transform admits a simple characterization. An immediate illustration of this is as follows:
For useful additional facts on closed conic transforms, see Exercise III.12.1-3.
2 Theorems of Caratheodory, Radon, and Helly
We next examine three theorems from Convex Analysis that have important
consequences in Optimization.
Remark I.2.2 Note that some subsets of Rn are in the scope of several definitions of dimension. Specifically, a linear subspace is also an affine subspace, and similarly, an affine subspace is a nonempty set as well. It is immediately seen that if a set is in the scope of more than one definition of dimension, all applicable definitions assign the set the same value of dimension. ■
As an informal introduction to what follows, draw several points ("red points") on the 2D plane and take a point ("blue point") in their convex hull. You will observe that whatever your selection of red points and of the blue point in their convex hull, the blue point will belong to a properly selected triangle with red vertices. The general fact is as follows.
Let x ∈ Conv(M). By Fact I.1.14 on the structure of the convex hull, there exist x1, . . . , xN from M and convex combination weights λ1, . . . , λN such that
x = Σ_{i=1}^N λi xi, where λi ≥ 0, ∀i = 1, . . . , N, and Σ_{i=1}^N λi = 1.
For ease of reference, let us define λi (t) := λi + tδi for all i and for all t ∈ R. Note
that for any t ∈ R, by the definition of λi and δi , we always have
Σ_{i=1}^N λi(t) = Σ_{i=1}^N (λi + tδi) = Σ_{i=1}^N λi + t Σ_{i=1}^N δi = 1.
Moreover, when t = 0, λi (0) = λi for all i, and thus this is a convex combination
as all λi (0) ≥ 0 for all i. On the other hand, from the selection of δi , we know
that Σ_{i=1}^N δi = 0 and [δ1; . . . ; δN] ̸= 0, and thus at least one entry in δ must be
negative. Therefore, when t is large, some of the coefficients λi (t) will be negative.
There exists, of course, the largest t = t∗ for which λi (t) ≥ 0 for all i = 1, . . . , N
holds, and for t = t∗ at least some of λi (t) are zero. Specifically, when setting
I− := {i : δi < 0},  i∗ ∈ argmin_{i∈I−} λi/|δi|,  and  t∗ := min_{i∈I−} λi/|δi|,
we have λi(t∗) ≥ 0 for all i and λi∗(t∗) = 0, so that x = Σ_{i=1}^N λi(t∗) xi is a representation of x as a convex combination of at most N − 1 of the points x1, . . . , xN.
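The support-reduction step just described is easy to turn into code. The Python sketch below is purely illustrative (the function name and the toy data are assumptions): it takes points x1, . . . , xN, convex weights λ, and a nonzero vector δ with Σi δi = 0 and Σi δi xi = 0, and returns the new weights λ(t∗), which drop the point with index i∗ while still representing the same x.

import numpy as np

def caratheodory_step(points, lam, delta):
    """One reduction step of the Caratheodory argument.

    points : (N, n) array, rows are x^1, ..., x^N
    lam    : (N,) convex combination weights (lam >= 0, sum = 1)
    delta  : (N,) nonzero vector with sum(delta) = 0 and delta @ points = 0
    Returns lam(t*) = lam + t* * delta, which has at least one zero entry.
    """
    neg = np.where(delta < 0)[0]                  # the index set I^-
    ratios = lam[neg] / np.abs(delta[neg])
    t_star = ratios.min()
    i_star = neg[ratios.argmin()]
    new_lam = lam + t_star * delta
    new_lam[i_star] = 0.0                         # zero by construction of t*
    # the represented point is unchanged, since delta @ points = 0
    assert np.allclose(new_lam @ points, lam @ points)
    assert np.isclose(new_lam.sum(), 1.0) and (new_lam >= -1e-12).all()
    return new_lam

# Toy check in the plane: four points, with an obvious dependency delta.
pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
lam = np.array([0.25, 0.25, 0.25, 0.25])
delta = np.array([1.0, -1.0, -1.0, 1.0])          # sum 0 and delta @ pts = 0
print(caratheodory_step(pts, lam, delta))          # one weight becomes zero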
µ1, . . . , µN
Σ_{i=1}^N µi xi = 0,
Σ_{i=1}^N µi = 0.
Then, by setting
αi := λi/a, for i ∈ I, and βj := −λj/a, for j ∈ J,
we get
αi ≥ 0, ∀i,  βj ≥ 0, ∀j,  Σ_{i∈I} αi = 1,  Σ_{j∈J} βj = 1.
Based on these definitions, Fd is precisely the set of vectors f such that their cost a⊤f is at most $11 and the associated polyhedral set S[d, f] is nonempty, that is, the resource vector f allows us to meet demand d. Note that the set Fd is convex as it is the linear image (in fact just the projection) of the convex set
{[f; x] : f ∈ R10, x ∈ Rn, a⊤f ≤ 11, x ∈ S[d, f]}.
The punchline in this illustration is that every 11 sets of the form Fd have a
common point. Suppose that we are given 11 scenarios d1 , . . . , d11 from D. Then,
we can meet demand scenario di by investing $1 in a properly selected vector of resources fdi ≥ 0. As in cases 1–3 above, by investing $11 in the single vector of resources f = fd1 + . . . + fd11, we can meet every one of the 11 scenarios d1, . . . , d11, whence f ∈ Fd1 ∩ . . . ∩ Fd11. Since every 11 of the 100,000 convex sets
Fd ⊆ R10 , d ∈ D, have a point in common, all these sets have a common point,
say f∗ . That is, f∗ ∈ Fd for all d ∈ D, and thus by definition of Fd , we deduce that
every one of the sets S[d, f∗ ], d ∈ D, is nonempty, that is, vector of resources f∗
(which costs at most $11) allows us to satisfy every demand scenario d ∈ D.
with 11 variables x1 , . . . , x11 and convex constraints, i.e., every one of the sets
Xi := {x ∈ R11 : gi(x) ≤ 0}, i = 1, . . . , 1000,
is convex. Suppose also that the problem is solvable with optimal value Opt∗ = 0.
Clearly, when dropping one or more constraints, the optimal value can only de-
crease or remain the same.
Is it possible to find a constraint such that even if we drop it, we preserve the
optimal value? Two constraints which can be dropped simultaneously with no
has a feasible solution with the objective value < −ϵ. Besides this, such an 11-
constraint relaxation of the original problem has also a feasible solution with the
objective equal to 0 (namely, the optimal solution of the original problem), and
since its feasible set is convex (as the intersection of the convex feasible sets of
the participating constraints), the 11-constraint relaxation has a feasible solution
x with c⊤x = −ϵ. In other words, every 11 of the 1000 convex sets
Yi := {x ∈ R11 : c⊤x = −ϵ, gi(x) ≤ 0}, i = 1, . . . , 1000
(i) for every collection of at most n + 1 sets from the family, the sets from
the collection have a point in common;
and
(ii) every set in the family is closed, and the intersection of the sets from a
certain finite subfamily of the family is bounded (e.g., one of the sets in the
family is bounded).
Then, all the sets from the family have a point in common.
Proof. By (i), Theorem I.2.10 implies that all finite subfamilies of F have
nonempty intersections, and also these intersections are convex (since intersec-
tion of a family of convex sets is convex by Proposition I.1.12); in view of (ii)
these intersections are also closed. Adding to F intersections of sets from finite
subfamilies of F, we get a larger family F ′ composed of closed convex sets, and
sets from a finite subfamily of this larger family again have a nonempty intersec-
tion. Moreover, from (ii) it follows that this new family contains a bounded set
Q. Since all the sets are closed, the family of sets
{Q ∩ Q′ : Q′ ∈ F}
forms a nested family of compact sets (i.e., a family of compact sets with nonempty
intersection of sets from every finite subfamily). Then, by a well-known theorem
from Real Analysis such a family has a nonempty intersection2) .
2 Here is the proof of this Real Analysis theorem: assume for contradiction that the intersection of the
compact sets Qα , α ∈ A, is empty. Choose a set Qα∗ from the family; for every x ∈ Qα∗ there is a
set Qx in the family which does not contain x (otherwise x would be a common point of all our
sets). Since Qx is closed, there is an open ball Vx centered at x which does not intersect Qx . The
balls Vx , x ∈ Qα∗ , form an open covering of the compact set Qα∗ . Since Qα∗ is compact, there
exists a finite subcovering Vx1 , . . . , VxN of Qα∗ by the balls from the covering, see Theorem B.19.
Since Qxi does not intersect Vxi , we conclude that the intersection of the finite subfamily
Qα∗ , Qx1 , . . . , QxN is empty, which is a contradiction.
X = {x ∈ Rn : Ax ≤ b} = {x ∈ Rn : ai⊤ x ≤ bi, 1 ≤ i ≤ m}.
Geometrically, a polyhedral representation of a set X ⊆ Rn is its representation as the projection {x ∈ Rn : ∃u ∈ Rk : [x; u] ∈ Y} of a polyhedral set Y = {[x; u] ∈ Rn × Rk : Ax + Bu ≤ c}. Here, Y lives in the space of n + k variables x ∈ Rn and u ∈ Rk, and the polyhedral representation of X is obtained by applying the linear mapping (the projection) [x; u] 7→ x : Rn+k → Rn of the (n + k)-dimensional space of (x, u)-variables (the space where Y lives) to the n-dimensional space of x-variables where X lives.
Note that the set X in question can be described by a system of linear inequalities
in x-variables only, namely, as
X = {x ∈ Rn : Σ_{i=1}^n ϵi xi ≤ 1, ∀(ϵi ∈ {−1, +1}, 1 ≤ i ≤ n)}.
In other words, the original description of X is nothing but its polyhedral repre-
sentation (in slight disguise), with λi ’s in the role of extra variables. ♢
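For a numerical illustration of the two descriptions of this set X (the unit ℓ1-ball): the 2^n inequalities Σi ϵi xi ≤ 1 can be checked directly, while a lifted polyhedral representation with extra variables u uses the system −ui ≤ xi ≤ ui, Σi ui ≤ 1. The Python sketch below is illustrative only; the u-based lifting is one standard choice of extra variables, not necessarily the λ-based one alluded to above.

import itertools
import numpy as np

def in_X_direct(x):
    """Membership via the 2^n inequalities sum_i eps_i * x_i <= 1."""
    n = len(x)
    return all(np.dot(eps, x) <= 1.0 + 1e-12
               for eps in itertools.product([-1.0, 1.0], repeat=n))

def in_X_lifted(x):
    """Membership via a lifted representation: exists u with
    -u_i <= x_i <= u_i and sum_i u_i <= 1; the best u is u_i = |x_i|."""
    return np.abs(x).sum() <= 1.0 + 1e-12

rng = np.random.default_rng(1)
for _ in range(200):
    x = rng.uniform(-1.0, 1.0, size=4)
    assert in_X_direct(x) == in_X_lifted(x)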
Note that it suffices to prove this claim in the case of exactly one extra variable
since the projection which reduces the dimension by k — “eliminates” k extra
variables — is the result of k subsequent projections, every one reducing the
dimension by 1, “eliminating” the extra variables one by one.
Thus, consider a polyhedral set with variables x ∈ Rn and u ∈ R, i.e.,
Y := (x, u) ∈ Rn+1 : a⊤
i x + bi u ≤ ci , 1 ≤ i ≤ m .
We want to prove that the projection of Y onto the space of x-variables, i.e.,
X := {x ∈ Rn : ∃u ∈ R: Ax + bu ≤ c} ,
is polyhedral. To see this, let us split the indices of the inequalities defining Y
into three groups (some of these groups can be empty):
• inequalities with bi = 0: I0 := {i : bi = 0}. These inequalities with index i ∈ I0
do not involve u at all;
of the LP and to find its optimal value. When it is finite (case 3 above), we can use the Fourier-Motzkin elimination backward, starting with t = a ∈ T and extending this value to a pair (t, x) with t = a = c⊤x and Ax ≤ b; that is, we can augment the optimal value by an optimal solution. Thus, we can say that Fourier-Motzkin elimination is a finite Real Arithmetic algorithm which allows one to check whether an LP is feasible and bounded, and when this is the case, allows one to find the optimal value and an optimal solution.
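Below is a small, self-contained Python sketch of the one-variable elimination step discussed above (an illustration of the standard Fourier-Motzkin step, with index sets I0, I+, I− as in the text; the function name and data layout are assumptions made here). It eliminates u from a system a_i⊤x + b_i u ≤ c_i and returns an inequality system in x only.

import numpy as np

def eliminate_variable(A, b, c):
    """Fourier-Motzkin elimination of u from the system  A x + b u <= c.

    A : (m, n) array, b : (m,) array, c : (m,) array.
    Returns (A_new, c_new) describing {x : exists u with A x + b u <= c}.
    """
    I0 = np.where(b == 0)[0]          # inequalities not involving u
    Ip = np.where(b > 0)[0]           # give upper bounds on u
    Im = np.where(b < 0)[0]           # give lower bounds on u
    rows, rhs = [], []
    for i in I0:
        rows.append(A[i]); rhs.append(c[i])
    # u <= (c_i - a_i x)/b_i for i in I+, u >= (c_j - a_j x)/b_j for j in I-;
    # a suitable u exists iff every lower bound is <= every upper bound.
    for i in Ip:
        for j in Im:
            rows.append(A[i] / b[i] - A[j] / b[j])
            rhs.append(c[i] / b[i] - c[j] / b[j])
    if not rows:                      # no constraints on x remain
        return np.zeros((0, A.shape[1])), np.zeros(0)
    return np.array(rows), np.array(rhs)

# Toy check: project {(x, u) : u >= x, u >= -x, u <= 1} onto the x-axis;
# the projection is {x : -1 <= x <= 1}.
A = np.array([[1.0], [-1.0], [0.0]])
b = np.array([-1.0, -1.0, 1.0])
c = np.array([0.0, 0.0, 1.0])
print(eliminate_variable(A, b, c))    # rows encode x <= 1 and -x <= 1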
On the other hand, Fourier-Motzkin elimination is completely impractical since
the elimination process can blow up exponentially the number of inequalities.
Indeed, from the description of the process it is clear that if a polyhedral set
is given by m linear inequalities, then after eliminating one variable we can end up with as many as m2/4 inequalities (this is what happens if there are m/2 indices in I+, m/2 indices in I−, and I0 = ∅). Eliminating the next variable, we again
can “nearly square” the number of inequalities, and so on. Thus, the number of
inequalities in the description of T can become astronomically large even when
the dimension of x is something like 10.
The actual importance of Fourier-Motzkin elimination is of theoretical nature.
For example, the Linear Programming (LP)-related reasoning we have just carried
out shows that
every feasible and bounded LP problem is solvable, i.e., it has an optimal
solution.
(We will revisit this result in more detail in section 9.3.1.) This is a fundamental
fact for LP, and the above reasoning (even with the justification of the elimina-
tion “charged” to it) is, to the best of our knowledge, the shortest and most
transparent way to prove this fundamental fact. Another application of the fact
that polyhedrally representable sets are polyhedral is the Homogeneous Farkas
Lemma to be stated and proved in section 4.1; this lemma will be instrumental
in numerous subsequent theoretical developments.
Note that the rules for intersection, taking direct products and taking inverse
images, as applied to polyhedral descriptions of operands, lead to polyhedral de-
scriptions of the results. In contrast to this, the rules for taking sums with coeffi-
cients and images under affine mappings heavily exploit the notion of polyhedral
representation: even when the operands in these rules are given by polyhedral
descriptions, there are no simple ways to point out polyhedral descriptions of the
results.
The absolutely straightforward justification of the above calculus rules is the subject of Exercise I.27.
Finally, we note that the problem of minimizing a linear form c⊤ x over a set
M given by its polyhedral representation, i.e.,
M = {x ∈ Rn : ∃u ∈ Rk : Ax + Bu ≤ c},
then every vector h that has nonnegative inner products with all ai ’s
should also have nonnegative inner product with a:
a = Σ_{i=1}^N λi ai, with λi ≥ 0, ∀i, and h⊤ai ≥ 0, ∀i  =⇒  h⊤a ≥ 0.
In fact, this evident necessary condition is also sufficient. This is given by the
Homogeneous Farkas Lemma.
Proof. The necessity – the “only if” part of the statement – was proved before
the Homogeneous Farkas Lemma was formulated. Let us prove the “if” part of
the lemma. Thus, we assume that h⊤ a ≥ 0 is a consequence of the homogeneous
you convince everyone that your answer is correct? What can be an “evident for
everybody” validity certificate for your answer?
If your claim is that (S) is feasible, a certificate can be just to point out a
solution x∗ to (S). Given this certificate, one can substitute x∗ into the system
and check whether x∗ is indeed a solution.
Suppose now that your claim is that (S) has no solutions. What can be a
“simple certificate” of this claim? How can one certify a negative statement? This
is a highly nontrivial problem not just for mathematics; for example, in criminal law, how should someone accused of a murder prove his innocence? The "real life"
answer to the question “how to certify a negative statement” is discouraging: such
a statement normally cannot be certified1 . In mathematics, the standard way to
justify a negative statement A, like “there is no solution to such and such system
of constraints” (e.g., “there is no solutions to the equation x5 + y 5 = z 5 with
positive integer variables x, y, z”) is to lead the opposite to A statement, i.e.,
A (in our example, “the solution exists”), to a contradiction. That is, assume
that A is true and derive consequences until a clearly false statement is obtained;
when this happens, we know that A is false (since legitimate consequences of a
true statement must be true), and therefore A must be true. In general, there is
no recipe for leading to contradiction something which in fact is false; this is why
certifying negative statements usually is difficult.
Fortunately, finite systems of linear inequalities are simple enough to allow
for a recipe for certifying their infeasibility: we start with the assumption that
a solution exists and then demonstrate a contradiction in a very specific way
– by taking weighted sum of the inequalities in the system using nonnegative
aggregation weights to produce a contradictory inequality.
Let us start with a simple illustration: we would like to certify infeasibility of
the following system of inequalities in variables u, v, w:
5u −6v −4w > 2
+4v −2w ≥ −1
−5u +7w ≥ 1
Let us assign to these inequalities the "aggregation weights" 2, 3, 2, multiply the inequalities by the respective weights, and sum up the resulting inequalities:
2× 5u −6v −4w > 2
+
3× +4v −2w ≥ −1
+
2× −5u +7w ≥ 1
(∗) 0 · u +0 · v +0 · w > 3
The resulting aggregated inequality (∗) is contradictory: it has no solutions at all.
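The aggregation just performed is easy to verify mechanically. The following Python sketch (illustrative only) recomputes the weighted sum of the three inequalities with the weights 2, 3, 2 and confirms that all variable coefficients cancel while the right hand side stays positive, which is exactly the contradiction.

import numpy as np

# The system in (u, v, w): row_i . [u, v, w]  (strict or nonstrict)  rhs_i
lhs = np.array([[ 5.0, -6.0, -4.0],    #  5u - 6v - 4w >   2   (strict)
                [ 0.0,  4.0, -2.0],    #       4v - 2w >= -1
                [-5.0,  0.0,  7.0]])   # -5u      + 7w >=  1
rhs = np.array([2.0, -1.0, 1.0])
strict = np.array([True, False, False])

weights = np.array([2.0, 3.0, 2.0])    # the nonnegative aggregation weights

agg_lhs = weights @ lhs                # coefficients of u, v, w in the sum
agg_rhs = weights @ rhs
agg_strict = bool(np.any(strict & (weights > 0)))

print(agg_lhs, agg_rhs, agg_strict)    # [0. 0. 0.] 3.0 True
# All coefficients vanish and the aggregated inequality reads 0 > 3:
assert np.allclose(agg_lhs, 0.0) and agg_rhs > 0 and agg_strict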
1 This is where the court rule “a person is presumed innocent until proven guilty” comes from –
instead of requesting from the accused to certify the negative statement “I did not commit the
crime,” the court requests from the prosecution to certify the positive statement “The accused did
commit the crime.”
where Ω is “>” whenever λi > 0 for at least one i with Ωi = “ > ”, and Ω is “≥”
otherwise. Now, when can a linear inequality
d⊤ x Ω e
be contradictory? Of course, it can happen only when d = 0. Furthermore, in
this case, whether the inequality is contradictory depends on the relation Ω and
the value of e: if Ω = “ > ”, then the inequality is contradictory if and only if
e ≥ 0, and if Ω = “ ≥ ”, then it is contradictory if and only if e > 0. We have
established the following simple result:
with unknowns λ ∈ Rm:
TI :  (a) λ ≥ 0,  (b) Σ_{i=1}^m λi ai = 0,  (cI) Σ_{i=1}^m λi bi ≥ 0,  (dI) Σ_{i=1}^{ms} λi > 0;
TII :  (a) λ ≥ 0,  (b) Σ_{i=1}^m λi ai = 0,  (cII) Σ_{i=1}^m λi bi > 0.
If at least one of the systems TI , TII is feasible, then the system (S) is infea-
sible.
First, we claim that in every solution to (4.2), one has ϵ ≤ 0. Indeed, assuming
that (4.2) has a solution x, τ, ϵ with ϵ > 0, we conclude from (4.2.a) that τ > 0.
Then, from (4.2.b − c) it will follow that τ −1 x is a solution to (S), while we
assumed (S) is infeasible. Therefore, we must have ϵ ≤ 0 in every solution to
(4.2).
Now, we have that the homogeneous linear inequality
−ϵ ≥ 0 (4.3)
We would like to emphasize that the preceding principles are highly nontrivial
and very deep. Consider, e.g., the following system of 4 linear inequalities in two
variables u, v:
−1 ≤ u ≤ 1,
−1 ≤ v ≤ 1.
These inequalities clearly imply that
u2 + v2 ≤ 2,   (!)
which in turn implies, by the Cauchy-Schwarz inequality, the linear inequality
u + v ≤ 2:
u + v = 1 × u + 1 × v ≤ √(1² + 1²) · √(u² + v²) ≤ (√2)² = 2.   (!!)
The concluding inequality u + v ≤ 2 is linear and is a consequence of the original
feasible system, and so we could have simply relied on Principle B to derive it.
On the other hand, in the preceding demonstration of this linear consequence
inequality both steps (!) and (!!) are “highly nonlinear.” It is absolutely unclear
a priori why the same consequence inequality can, as it is stated by Principle B, be
derived from the system in a “linear” manner as well (of course it can – it suffices
just to sum up two inequalities u ≤ 1 and v ≤ 1). In contrast, Inhomogeneous
Farkas Lemma predicts that hundreds of pages of whatever complicated (but
correct!) demonstration that such and such linear inequality is a consequence
of such and such feasible finite system of linear inequalities can be replaced by
simply demonstrating weights of prescribed signs such that the target inequality
is the weighted sum, with these weights, of the inequalities from the system and
the identically true linear inequality. One shall appreciate the elegance and depth
of such a result!
Note that the General Theorem on Alternative and its corollaries A and B
heavily exploit the fact that we are speaking about linear inequalities. For exam-
ple, consider the following system of two quadratic and two linear inequalities in
two variables:
(a) u2 ≥ 1,
(b) v2 ≥ 1,
(c) u ≥ 0,
(d) v ≥ 0,
along with the quadratic inequality
(e) uv ≥ 1.
The inequality (e) is clearly a consequence of (a) – (d). However, if we extend the system of inequalities (a) – (d) by all "trivial" (i.e., identically true) linear and quadratic inequalities in 2 variables, like 0 > −1, u2 + v2 ≥ 0, u2 + 2uv + v2 ≥ 0, u2 − 2uv + v2 ≥ 0, etc., and ask whether (e) can be derived in a linear fashion
from the inequalities of the extended system, the answer will be negative. Thus,
Principle B fails to be true already for quadratic inequalities (which is a great
sorrow – otherwise there would be no difficult problems at all!).
where
• [domain] X is called the domain of the problem,
• [objective] f is called the objective (function) of the problem,
• [constraints] gi , i = 1, . . . , m, are called the (functional) inequality constraints,
and hj , j = 1, . . . , k, are called the equality constraints 2) .
We always assume that X ̸= ∅ and that the objective and the constraints are
well-defined on X. Moreover, we typically skip indicating X when X = Rn .
We use the following standard terminology related to (4.5)
2 Rigorously speaking, the constraints are not the functions gi , hj , but the relations gi (x) ≤ 0,
hj (x) = 0. We will use the word “constraints” in both of these senses, and it will always be clear
what is meant. For example, we will say that “x satisfies the constraints” to refer to the relations,
and we will say that “the constraints are differentiable” to refer to the underlying functions.
– [below boundedness] the problem is called below bounded, if its optimal value
is > −∞, i.e., if the objective is bounded from below on the feasible set.
• [optimal solution] a point x ∈ Rn is called an optimal solution to (4.5), if x is
feasible and f (x) ≤ f (x′ ) for any other feasible solution x′ , i.e., if
x ∈ Argmin_{x′} {f(x′) : x′ ∈ X, g(x′) ≤ 0, h(x′) = 0}.
• the optimal value is the supremum of the values of the objective at feasible
solutions, and is, by definition, −∞ for infeasible problems, and
• boundedness means boundedness of the objective from above on the feasible
set (or, which is the same, the fact that the optimal value is < +∞),
• optimal solution is a feasible solution such that the objective value at this
solution is greater than or equal to the objective value at every feasible solution.
Note that in principle we could allow for linear equality constraints hj(x) := aj⊤ x + bj = 0. However, a constraint of this type can be equivalently represented by a pair of opposite linear inequalities aj⊤ x + bj ≤ 0 and −aj⊤ x − bj ≤ 0. To save space and words (and, as we have just explained, with no loss in generality), in the sequel we will focus on inequality constrained linear programming problems.
Opt = min_x { c⊤x : Ax − b ≥ 0 },  where A = [a1⊤; a2⊤; . . . ; am⊤] ∈ Rm×n.   (LP)
The motivation for constructing the problem dual to an LP problem is the desire
to generate, in a systematic way, lower bounds on the optimal value Opt of (LP).
Consider the problem
An evident way to bound from below a given function f (x) in the domain given
by a system of inequalities
gi (x) ≥ bi , i = 1, . . . , m, (4.6)
is offered by what is called the Lagrange duality. We will discuss Lagrange Duality
in full detail for general functions in Part IV. Here, let us do a brief precursor
and examine the special case when we are dealing with linear functions only.
Lagrange Duality:
• Let us look at all inequalities which can be obtained from (4.6) by
linear aggregation, i.e., the inequalities of the form
Σ_{i=1}^m yi gi(x) ≥ Σ_{i=1}^m yi bi   (4.7)
with the "aggregation weights" yi ≥ 0 for all i. Note that the inequality (4.7), due to its origin, is valid on the entire set X of feasible solutions of (4.6).
• Depending on the choice of aggregation weights, it may happen that the left hand side in (4.7) is ≤ f(x) for all x ∈ Rn. Whenever this is the case, the right hand side Σ_{i=1}^m yi bi of (4.7) is a lower bound on f(x) for any x ∈ X. It follows that
• The optimal value of the problem
max_y { Σ_{i=1}^m yi bi : y ≥ 0 (a), Σ_{i=1}^m yi gi(x) ≤ f(x) ∀x ∈ Rn (b) }   (4.8)
is a lower bound on the values of f on the set of feasible solutions to the system (4.6).
Let us now examine what happens with the Lagrange duality when f and gi are homogeneous linear functions, i.e., f(x) = c⊤x and gi(x) = ai⊤x for all i = 1, . . . , m. In this case, the requirement (4.8.b) merely says that c = Σ_{i=1}^m yi ai (or, which is the same, A⊤y = c due to the origin of the matrix A). Thus, problem (4.8) becomes the Linear Programming problem
max_y { b⊤y : A⊤y = c, y ≥ 0 },   (LP∗)
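A quick numerical sanity check of the primal-dual pair (LP)/(LP∗) can be done with SciPy's linprog (an external tool used here only for illustration; the small instance below is made up). We solve min{c⊤x : Ax ≥ b} and max{b⊤y : A⊤y = c, y ≥ 0} and observe that the dual optimal value does not exceed, and for this instance in fact equals, the primal one.

import numpy as np
from scipy.optimize import linprog

# A small instance of (LP): min c^T x  s.t.  A x - b >= 0
c = np.array([1.0, 1.0])
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
b = np.array([1.0, 2.0, 4.0])

# Primal: Ax >= b  <=>  -Ax <= -b, with x free.
primal = linprog(c, A_ub=-A, b_ub=-b, bounds=[(None, None)] * 2)

# Dual (LP*): max b^T y  s.t.  A^T y = c, y >= 0  (solved as min of -b^T y).
dual = linprog(-b, A_eq=A.T, b_eq=c, bounds=[(0, None)] * 3)

print(primal.fun, -dual.fun)            # both equal 4.0 for this instance
assert primal.success and dual.success
assert abs(primal.fun - (-dual.fun)) < 1e-7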
(*) (Sa) has no solutions if and only if at least one of the following two systems of linear inequalities in m + 1 unknowns has a solution:
TI :  (a) λ = [λ0; λ1; . . . ; λm] ≥ 0,  (b) −λ0 c + Σ_{i=1}^m λi ai = 0,  (cI) −λ0 a + Σ_{i=1}^m λi bi ≥ 0,  (dI) λ0 > 0;
or
TII :  (a) λ = [λ0; λ1; . . . ; λm] ≥ 0,  (b) −λ0 c + Σ_{i=1}^m λi ai = 0,  (cII) −λ0 a + Σ_{i=1}^m λi bi > 0.
Now assume that (LP) is feasible. We first claim that under this assumption
(Sa ) has no solutions if and only if TI has a solution. The implication “TI has a
solution =⇒ (Sa ) has no solution” is readily given by the preceding remarks. To
verify the inverse implication, assume that (Sa ) has no solution and the system
Ax ≥ b has a solution, and let us prove that then TI has a solution. If TI has no
solution, then by (*), TII must have a solution. Moreover, since any solution to
TII where λ0 > 0 is also a solution to TI as well, we must have λ0 = 0 for (every)
solution to TII . But, the fact that TII has a solution λ with λ0 = 0 is independent
of the values of c and a; if this fact would take place, it would mean, by the same
General Theorem on Alternative, that, e.g., the following instance of (Sa ):
0⊤ x ≥ −1, Ax ≥ b
has no solution as well. But, then we must have the system Ax ≥ b has no solution
– a contradiction to the assumption that (LP) is feasible.
Now, if TI has a solution, this system has a solution with λ0 = 1 as well (to see
this, pass from a solution λ to the one λ/λ0 ; this construction is well-defined, since
λ0 > 0 for every solution to TI ). Now, an (m + 1)-dimensional vector λ = [1; y]
is a solution to TI if and only if the m-dimensional vector y solves the following
system of linear inequalities and equations
y ≥ 0,
A⊤y ≡ Σ_{i=1}^m yi ai = c,   (D)
b⊤y ≥ a.
We summarize these observations below.
We see that the entity responsible for lower bounds on the optimal value of
(LP) is the system (D): every solution to the latter system induces a bound of
this type, and in the case when (LP) is feasible, all lower bounds can be obtained
from solutions to (D). Now note that if (y, a) is a solution to (D), then the pair
(y, b⊤ y) also is a solution to the same system, and the lower bound b⊤ y on Opt
is not worse than the lower bound a. Thus, as far as lower bounds on Opt are
concerned, we lose nothing by restricting ourselves to the solutions (y, a) of (D)
with a = b⊤ y. The best lower bound on Opt given by (D) is therefore the optimal
value of the problem
max_y { b⊤y : A⊤y = c, y ≥ 0 },
which is nothing but the dual to (LP) problem given by (LP∗ ). Note that (LP∗ )
is also a Linear Programming problem.
All we know about the dual problem so far is the following:
Then,
1) [Primal-dual symmetry] The dual problem is an LP program, and its
dual is equivalent to the primal problem;
2) [Weak duality] The value of the dual objective at every dual feasible
solution is less than or equal to the value of the primal objective at every
primal feasible solution, so that the dual optimal value is less than or equal
to the primal one;
3) [Strong duality] The following 5 properties are equivalent to each other:
(i) The primal is feasible and bounded below.
(ii) The dual is feasible and bounded above.
(iii) The primal is solvable.
(iv) The dual is solvable.
(v) Both primal and dual are feasible.
Moreover, if any one of these properties (and then, by the equivalence just
stated, every one of them) holds, then the optimal values of the primal and
the dual problems are equal to each other.
Finally, if at least one of the problems in the primal-dual pair is feasible,
then the optimal values in both problems are the same, i.e., either both are
finite and equal to each other, or both are +∞ (i.e., primal is infeasible and
dual is not bounded above), or both are −∞ (i.e., primal is unbounded below
and dual is infeasible).
There is one last remark we should make to complete the story of primal
and dual objective values given in Theorem I.4.9: in fact it is possible to have
both primal and dual problems infeasible simultaneously (see Exercise I.38). This
is the only case when the primal and the dual optimal values (+∞ and −∞,
respectively) differ from each other.
Proof. 1) This part is quite straightforward: writing the dual problem (LP∗) in our standard form, we get
min_y { −b⊤y : [Im; A⊤; −A⊤] y − [0; c; −c] ≥ 0 },
is to prove the equivalence between (i)–(iv) and (v). This is immediate: (i)–(iv),
of course, imply (v); vice versa, in the case of (v) the primal is not only feasible,
but also bounded below (this is an immediate consequence of the feasibility of
the dual problem, see part 2)), and (i) follows.
It remains to verify that if one problem in the primal-dual pair is feasible, then
the primal and the dual optimal values are equal to each other. By primal-dual
symmetry it suffices to consider the case when the primal problem is feasible. If
also the primal is bounded from below, then by what has already been proved
the dual problem is feasible and the primal and dual optimal values coincide
with each other. If the primal problem is unbounded from below, then the primal
optimal value is −∞ and by Weak Duality the dual problem is infeasible, so that
the dual optimal value is −∞.
An immediate corollary of the LP Duality Theorem is the following necessary
and sufficient optimality condition in LP.
Proof. Indeed, the "zero duality gap" optimality condition is an immediate consequence of the fact that the value of the primal objective at every primal feasible solution is greater than or equal to the value of the dual objective at every dual feasible solution, while the optimal values in the primal and the dual are equal to each other whenever one of the problems is feasible, see Theorem I.4.9. The
equivalence between the “zero duality gap” and the “complementary slackness”
optimality conditions is given by the following computation: whenever x is primal
feasible and y is dual feasible, we have
y ⊤ (Ax − b) = (A⊤ y)⊤ x − b⊤ y = c⊤ x − b⊤ y,
where the second equality follows from dual feasibility (i.e., A⊤ y = c). Thus, for a
primal-dual feasible pair (x, y), the duality gap vanishes if and only if y ⊤ (Ax−b) =
0, and the latter, due to y ≥ 0 and Ax−b ≥ 0, happens if and only if yi [Ax−b]i = 0
for all i, that is, if and only if the complementary slackness takes place.
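The computation in the proof is easy to verify numerically: for a primal-dual feasible pair (x, y) one has y⊤(Ax − b) = c⊤x − b⊤y, and the pair is optimal exactly when these quantities vanish. The Python sketch below reuses the same small made-up instance as before (illustrative only; the particular x and y are chosen by hand).

import numpy as np

# Data for min{c^T x : Ax >= b} and max{b^T y : A^T y = c, y >= 0}.
c = np.array([1.0, 1.0])
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
b = np.array([1.0, 2.0, 4.0])

x = np.array([2.0, 2.0])       # primal feasible (in fact optimal here)
y = np.array([0.0, 0.0, 1.0])  # dual feasible (in fact optimal here)

slack = A @ x - b
gap = c @ x - b @ y

# Identity from the proof: y^T (Ax - b) = c^T x - b^T y for dual feasible y.
assert np.isclose(y @ slack, gap)
# Zero duality gap  <=>  complementary slackness y_i * [Ax - b]_i = 0 for all i.
assert np.isclose(gap, 0.0) and np.allclose(y * slack, 0.0)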
Geometry of primal-dual pair of LP problems. Consider primal-dual pair
of LP problems
min_{x∈Rn} { c⊤x : Ax − b ≥ 0 }   (LP)
max_{y∈Rm} { b⊤y : A⊤y = c, y ≥ 0 }   (LP∗)
as presented in section 4.5.2 and assume that the system of equality constraints
3 for derivations, see Exercise IV.7 addressing Conic Duality, of which LP duality is a special case.
5 Exercises for Part I
5.1 Elementaries
Exercise I.1 Mark in the following list the sets which are convex:
1. {x ∈ R2 : x1 + i^2 x2 ≤ 1, i = 1, . . . , 10}
7. {x ∈ R2 : exp{x1} ≥ x2}
8. {x ∈ Rn : Σ_{i=1}^n xi^2 = 1}
9. {x ∈ Rn : Σ_{i=1}^n xi^2 ≤ 1}
10. {x ∈ Rn : Σ_{i=1}^n xi^2 ≥ 1}
11. {x ∈ Rn : max_{i=1,...,n} xi ≤ 1}
12. {x ∈ Rn : max_{i=1,...,n} xi ≥ 1}
13. {x ∈ Rn : max_{i=1,...,n} xi = 1}
14. {x ∈ Rn : min_{i=1,...,n} xi ≤ 1}
15. {x ∈ Rn : min_{i=1,...,n} xi ≥ 1}
16. {x ∈ Rn : min_{i=1,...,n} xi = 1}
Exercise I.2 Mark by T those of the following claims which always are true.
1. The linear image Y = {Ax : x ∈ X} of a linear subspace X is a linear subspace.
2. The linear image Y = {Ax : x ∈ X} of an affine subspace X is an affine subspace.
3. The linear image Y = {Ax : x ∈ X} of a convex set X is convex.
4. The affine image Y = {Ax + b : x ∈ X} of a linear subspace X is a linear subspace.
5. The affine image Y = {Ax + b : x ∈ X} of an affine subspace X is an affine subspace.
6. The affine image Y = {Ax + b : x ∈ X} of a convex set X is convex.
7. The intersection of two linear subspaces in Rn is always nonempty.
8. The intersection of two linear subspaces in Rn is a linear subspace.
9. The intersection of two affine subspaces in Rn is an affine subspace.
10. The intersection of two affine subspaces in Rn , when nonempty, is an affine subspace.
11. The intersection of two convex sets in Rn is a convex set.
12. The intersection of two convex sets in Rn , when nonempty, is a convex set.
Exercise I.3 ▲ Prove that the relative interior of a simplex with vertices y0, . . . , ym is exactly the set
{ Σ_{i=0}^m λi yi : λi > 0, Σ_{i=0}^m λi = 1 }.
is convex.
Exercise I.7 Which of the following claims are always true? Explain why/why not.
1. The convex hull of a bounded set in Rn is bounded.
2. The convex hull of a closed set in Rn is closed.
3. The convex hull of a closed convex set in Rn is closed.
4. The convex hull of a closed and bounded set in Rn is closed and bounded.
5. The convex hull of an open set in Rn is open.
Exercise I.8 ▲ [This exercise together with its follow-up, i.e., Exercise II.9, and Exercise I.9
are the most boring exercises ever designed by the authors. Our excuse is that There is no royal
road to geometry (Euclid of Alexandria, c. 300 BC)]
Let A, B be nonempty subsets of Rn . Consider the following claims. If the claim is always
(i.e., for every data satisfying premise of the claim) true, give a proof; otherwise, give a counter
example.
1. If A ⊆ B, then Conv(A) ⊆ Conv(B).
2. If Conv(A) ⊆ Conv(B), then A ⊆ B.
3. Conv(A ∩ B) = Conv(A) ∩ Conv(B).
4. Conv(A ∩ B) ⊆ Conv(A) ∩ Conv(B).
5. Conv(A ∪ B) ⊆ Conv(A) ∪ Conv(B).
6. Conv(A ∪ B) ⊇ Conv(A) ∪ Conv(B).
7. If A is closed, so is Conv(A).
8. If A is closed and bounded, so is Conv(A).
9. If Conv(A) is closed and bounded, so is A.
Exercise I.9 ▲ Let A, B, C be nonempty subsets of Rn and D be a nonempty subset of Rm .
Consider the following claims. If the claim is always (i.e., for every data satisfying premise of
the claim) true, give a proof; otherwise, give a counter example.
1. Conv(A ∪ B) = Conv(Conv(A) ∪ B).
2. Conv(A ∪ B) = Conv(Conv(A) ∪ Conv(B)).
3. Conv(A ∪ B ∪ C) = Conv(Conv(A ∪ B) ∪ C).
4. Conv(A × D) = Conv(A) × Conv(D).
5. When A is convex, the set Conv(A ∪ B) (which is always the set of convex combinations
of several points from A and several points from B), can be obtained by taking convex
combinations of points with at most one of them taken from A, and the rest taken from B.
Similarly, if A and B are both convex, to get Conv(A ∪ B), it suffices to add to A ∪ B all
convex combinations of pairs of points, one from A and one from B.
Exercise I.12 Let C1 , C2 be two nonempty conic sets in Rn , i.e., for each i = 1, 2, for any
x ∈ Ci and t ≥ 0, we have t · x ∈ Ci as well. Note that C1 , C2 are not necessarily convex. Prove
that
1. C1 + C2 ̸= Conv(C1 ∪ C2 ) may happen if either C1 or C2 (or both) is nonconvex.
2. C1 + C2 = Conv(C1 ∪ C2 ) always holds if C1 , C2 are both convex.
3. C1 ∩ C2 = ∪_{α∈[0,1]} (αC1 ∩ (1 − α)C2) always holds if C1, C2 are both convex.
Exercise I.13 ▲ Let X ⊆ Rn be a convex set with int X ̸= ∅, and consider the following set
Vol(E) = |Det(D)|,
When a truss is subject to an external load – a collection of forces acting at the nodes – it starts to deform, so that the nodes move a little bit, leading to elongations/shortenings of bars, which, in turn, result in reaction forces. At the equilibrium, the reaction forces compensate the external ones, and the truss capacitates certain potential energy, called compliance. Mechanics models this story as follows.
• The nodes form a finite set p1 , . . . , pK of distinct points in physical space Rd (d = 2 for
planar, and d = 3 for spatial constructions). Virtual displacements of the nodes under the
load are somehow restricted by “support conditions;” we will focus on the case when some of
the nodes “are fixed” – cannot move at all (think about them as being in the wall), and the
remaining “are free” – their virtual displacements form the entire Rd . A virtual displacement
v of the nodal set can be identified with a vector of dimension M = dm, where m is the number of free nodes; v is a block vector with m d-dimensional blocks, indexed by the free nodes, representing physical displacements of these nodes.
• There are N bars, i-th of them linking the nodes with indexes αi and βi (with at least one
of these nodes free) and with volume (3D or 2D, depending on whether the truss is spatial
or planar) ti .
• An external load is a collection of physical forces – vectors from Rd – acting at the free nodes
(forces acting at the fixed nodes are of no interest – they are suppressed by the supports).
Thus, an external load f can be identified with block vector of the same structure as a virtual
displacement – blocks are indexed by free nodes and represent the external forces acting at
these nodes. Thus, displacements v of the nodal set and external loads f are vectors from
the space V of virtual displacements – M -dimensional block vectors with m d-dimensional
blocks.
• The bars and the nodes together specify the symmetric positive semidefinite M × M stiffness
matrix A of the truss. The role of this matrix is as follows. A displacement v ∈ V of the nodal
set results in reaction forces at free nodes (those at fixed nodes are of no interest – they are
compensated by supports); assembling these forces into M -dimensional block-vector, we get
a reaction, and this reaction is −Av. In other words, the potential energy capacitated in truss
under displacement v ∈ V of nodes is 12 v ⊤ Av, and reaction, as it should be, is the minus
gradient of the potential energy as a function of v 1 . At the equilibrium under external load
f , the total of the reaction and the load should be zero, that is, the equilibrium displacement
satisfies
Av = f (5.1)
Note that (5.1) may be unsolvable, meaning that the truss is crushed by the load in question.
Assuming the equilibrium displacement v exists, the truss at equilibrium capacitates potential energy (1/2)v⊤Av; this energy is called the compliance of the truss w.r.t. the load. Compliance is a convenient measure of rigidity of the truss with respect to the load: the smaller the compliance, the better the truss withstands the load.
Let us build the stiffness matrix of a truss. As we have mentioned, the reaction forces originate
from elongations/shortenings of bars under displacement of nodes. Consider i-th bar linking
nodes with initial – prior to the external load being applied – positions ai = pαi and bi = pβi ,
and let us set
di = ∥bi − ai ∥2 , ei = [bi − ai ]/di .
Under displacement v ∈ V of the nodal set,
• the positions of the nodes linked by the bar become a_i + da and b_i + db, where da := v^{α_i}, db := v^{β_i}, and v^γ is the γ-th block in v – the displacement of the γ-th node;
• as a result, the elongation of the bar becomes, in the first-order in v approximation, e_i⊤[db − da], and the reaction forces caused by this elongation by Hooke's Law² are
  d_i^{-1} S_i e_i e_i⊤[db − da]  at node # α_i,
  −d_i^{-1} S_i e_i e_i⊤[db − da]  at node # β_i,
  0  at all remaining nodes,
where S_i = t_i/d_i is the cross-sectional size of the i-th bar. It follows that when both nodes linked
by the i-th bar are free, the contribution of the i-th bar to the reaction is
−t_i b_i b_i⊤ v,
1 This is called the linearly elastic model; it is the linearized-in-displacements approximation of the actual
behavior of a loaded truss. The model works the better, the smaller the nodal displacements are as
compared to the inter-nodal distances, and it is accurate enough to be used in typical real-life
applications.
2 Hooke’s Law says that the magnitude of the reaction force caused by elongation/shortening of a bar
is proportional to S d⁻¹ δ, where S is the bar’s cross-sectional size (area for a spatial, and thickness for
a planar truss), d is the bar’s (pre-deformation) length, and δ is the elongation. With units of length
properly adjusted to the bars’ material, the proportionality coefficient becomes 1, and this is what we
assume from now on.
The bottom line is as follows: The stiffness matrix of a truss composed of N bars with volumes t_i,
1 ≤ i ≤ N, is
A = A(t) := Σ_{i=1}^N t_i b_i b_i⊤,
where the vectors b_i ∈ V = R^M are readily given by the geometry of the nodal set and the indexes of the nodes
linked by bar i.
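In code, this assembly is one line, and evaluating the compliance of a given truss amounts to solving (5.1). A minimal NumPy sketch (the helper names are ours; the vectors b_i are assumed to be stacked as the columns of a matrix B):

```python
import numpy as np

def stiffness_matrix(B, t):
    """A(t) = sum_i t_i b_i b_i^T = B Diag{t} B^T, the b_i being the columns of B."""
    return B @ np.diag(t) @ B.T

def compliance(B, t, f):
    """Solve the equilibrium equation A(t) v = f and return 1/2 f^T v;
    +inf is returned when (5.1) is unsolvable, i.e. the truss is crushed by f."""
    A = stiffness_matrix(B, t)
    v, *_ = np.linalg.lstsq(A, f, rcond=None)      # least-squares solve: A may be singular
    if not np.allclose(A @ v, f, atol=1e-7):
        return np.inf
    return 0.5 * f @ v
```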
Truss Topology Design problem. In the simplest Truss Topology Design (TTD) problem,
one is given
• a finite set of tentative nodes in 2D or 3D along with support conditions indicating which
of the nodes are fixed and which are free, and thus specifying the linear space V = RM of
virtual displacements of the nodal set,
• the set of N tentative bars – unordered pairs of (distinct from each other) nodes which are
allowed to be linked by bars, and the total volume W > 0 of the truss,
• An external load f ∈ V.
These data specify, as explained above, vectors bi ∈ RM , i = 1, . . . , N , and the stiffness matrix
A(t) = Σ_{i=1}^N t_i b_i b_i⊤ = B Diag{t_1, . . . , t_N} B⊤ ∈ S^M   [B = [b_1, . . . , b_N]]
When applying the TTD model, one starts with a dense grid of tentative nodes and a broad list
of tentative bars (e.g., by allowing to link by a bar every pair of distinct nodes,
with at least one of the nodes in the pair free). In the optimal truss yielded by the optimal
solution to the TTD problem, many tentative bars (usually the vast majority of them) get zero
volumes, and a significant part of the tentative nodes becomes unused. Thus, the TTD problem is in fact
not merely about sizing – it allows one to recover the optimal structure of the construction; this is where
“Topology Design” comes from.
To illustrate this point, here is a toy example (it will be our guinea pig in the entire series of
TTD exercises):
Console design: We want to design a 2D truss as follows:
• The set of tentative nodes is the 9 × 9 grid {[p; q] ∈ R² : p, q ∈ {0, 1, . . . , 8}}, with
the 9 leftmost nodes fixed and the remaining 72 nodes free, resulting in the M = 144-
dimensional space V of virtual displacements.
• The external load f ∈ V = R¹⁴⁴ is a single-force one, with the only nonzero force
[0; −1] applied at the 5-th node of the rightmost column of nodes.
• We allow for all pairwise connections of distinct nodes with at least one of the two nodes
free, resulting in N = 3204 tentative bars.
• The total volume of the truss is W = 1000 (a small script generating these data is sketched below).
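Here is one possible way to generate these data in Python (NumPy assumed). The node ordering, the orientation of the grid, and the construction of the vectors b_i below are our own choices, made consistent with the derivation above; they are not prescribed by the text.

```python
import numpy as np
from itertools import combinations

d = 2
nodes = [(p, q) for p in range(9) for q in range(9)]        # the 9x9 grid of tentative nodes
free  = [i for i, (p, q) in enumerate(nodes) if p > 0]       # the 9 leftmost nodes (p = 0) are fixed
dof   = {i: (d * k, d * k + 1) for k, i in enumerate(free)}  # block of each free node inside v
M     = d * len(free)                                        # 144

# tentative bars: all pairs of distinct nodes with at least one node free
bars = [(i, j) for i, j in combinations(range(len(nodes)), 2) if i in dof or j in dof]
N = len(bars)                                                # 3204

# one natural way to form the vectors b_i from the geometry (the overall sign is immaterial)
B = np.zeros((M, N))
for k, (i, j) in enumerate(bars):
    a, b = np.array(nodes[i], float), np.array(nodes[j], float)
    length = np.linalg.norm(b - a)                           # pre-deformation length d_i
    e = (b - a) / length                                     # unit direction e_i of the bar
    if i in dof:
        B[list(dof[i]), k] -= e / length
    if j in dof:
        B[list(dof[j]), k] += e / length

# single-force load [0; -1] at the 5th node (counting from the bottom) of the rightmost column p = 8
f = np.zeros(M)
right_col = [i for i, (p, q) in enumerate(nodes) if p == 8]
f[list(dof[right_col[4]])] = [0.0, -1.0]

print(M, N)                                                  # 144 3204
```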
Important: From now on, speaking about the TTD problem, we always make the following assumption:
Σ_{i=1}^N b_i b_i⊤ ≻ 0.
A(t)v = f,
A = [B Diag{t}B⊤, f ; f⊤, 2τ],   B = [b_1, . . . , b_N]
is positive semidefinite. As a result, pose the TTD problem as the optimization problem
Opt = min_{τ,t} { τ : [B Diag{t}B⊤, f ; f⊤, 2τ] ⪰ 0, t ≥ 0, Σ_i t_i = W }   (5.2)
and compare the resulting design and compliance to those in the previous item.
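Problem (5.2) is a semidefinite program and can be prototyped directly, e.g., with CVXPY. The sketch below assumes B and f have been built as in the scripts above and that an SDP-capable solver (such as SCS) is installed; for the full console instance a more powerful SDP solver and some care with scaling may be needed.

```python
import numpy as np
import cvxpy as cp

def solve_ttd_sdp(B, f, W):
    """Sketch of (5.2): minimize the compliance bound tau over bar volumes t >= 0, sum t_i = W."""
    M, N = B.shape
    t = cp.Variable(N, nonneg=True)
    tau = cp.Variable()
    Z = cp.bmat([[B @ cp.diag(t) @ B.T, f.reshape(-1, 1)],
                 [f.reshape(1, -1),     cp.reshape(2 * tau, (1, 1))]])
    constraints = [0.5 * (Z + Z.T) >> 0,        # symmetrized block matrix is PSD
                   cp.sum(t) == W]
    cp.Problem(cp.Minimize(tau), constraints).solve(solver=cp.SCS)
    # at the optimum, tau equals the compliance of the optimal truss w.r.t. f
    return tau.value, t.value
```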
Comment: Note that the claims above are refinements, albeit minor ones, of the Caratheodory
Theorem (plain and conic, respectively). Indeed, when M := Aff(X) and m is the dimension
of M , every affinely independent collection of points from X contains at most m + 1 points
(Proposition A.44), so that the first claim is equivalent to stating that if x ∈ Conv(X), then x is a
convex combination of at most m + 1 points from X. However, the vectors participating in such
a convex combination are not necessarily affinely independent, so that the first claim provides
a bit more information than the plain Caratheodory’s Theorem. Similarly, if L := Lin(X) and
m := dim L, then every linearly independent collection of vectors from X contains at most
m ≤ n points, that is, the second claim implies the Caratheodory’s Theorem in conic form, and
provides a bit more information than the latter theorem.
Exercise I.18 ♦ 3 Consider TTD problem, and let N be the number of tentative bars, M be
the dimension of the corresponding space of virtual displacements V, and f be an external load.
Prove that if truss t ≥ 0 can withstand load f with compliance ≤ τ for some given real number
τ, then there exists a truss t′ of the same total volume as t, with compliance w.r.t. f at most τ and
with at most M + 1 bars of positive volume.
Exercise I.19 ♦ [Shapley-Folkman Theorem]
1. Prove that if a system of linear equations Ax = b with n variables and m equations has a
nonnegative solution, it has a nonnegative solution with at most m positive entries.
2. Let V1 , . . . , Vn be n nonempty sets in Rm , and define
V := Conv(V1 + V2 + . . . + Vn ).
1. Prove that
V = Conv(V1 ) + . . . + Conv(Vn ).
Hint: Assume, on the contrary, that the convex hulls of X and Y intersect, so that
Σ_{i=1}^k λ_i x_i = Σ_{j=1}^m µ_j y_j
Conv(X) = { x = Σ_{k=1}^K λ_k x_k : λ_k ≥ 0, x_k ∈ X_k, ∀k ≤ K, Σ_{k=1}^K λ_k = 1 }.
2. Let X_k, k ≤ K, be nonempty bounded polyhedral sets in Rⁿ given by polyhedral representations:
X_k = { x ∈ Rⁿ : ∃u_k ∈ R^{n_k} : P_k x + Q_k u_k ≤ r_k }.
Define X := ⋃_{k≤K} X_k. Prove that the set Conv(X) is a polyhedral set given by the polyhedral
representation
Conv(X) = { x ∈ Rⁿ : ∃ x_k ∈ Rⁿ, u_k ∈ R^{n_k}, λ_k ∈ R, ∀k ≤ K :
  (a) P_k x_k + Q_k u_k − λ_k r_k ≤ 0, k ≤ K,
  (b) λ_k ≥ 0, Σ_{k=1}^K λ_k = 1,
  (c) x = Σ_{k=1}^K x_k }.   (∗)
Does the claim remain true when the assumption of boundedness of the sets X_k is lifted?
After two preliminary items above, let us pass to the essence of the matter. Consider the situation
as follows. We are given n nonempty and bounded polyhedral sets Xj ⊂ Rr , j = 1, . . . , n. We
will think of Xj as the “resource set” of the j-th production unit: entries in x ∈ Xj are amounts
of various resources, and Xj describes the set of vectors of resources available, in principle, for
j-th unit. Each production unit j can possibly use any one of its Kj < ∞ different production
plans. For each j = 1, . . . , n, the vector yj ∈ Rp representing the production of the j-th unit
depends on the vector xj of resources consumed by the unit and also on the production plan
utilized in the unit. In particular, the production vector y_j ∈ Rᵖ stemming from resources x_j
under the k-th plan can be picked by us, at our will, from the set
Y_j^k[x_j] := { y_j ∈ Rᵖ : z_j := [x_j; −y_j] ∈ V_j^k },
where Vjk , k ≤ Kj , are given bounded polyhedral “technological sets” of the units with projec-
tions onto the xj -plane equal to Xj , so that for every k ≤ Kj it holds
xj ∈ Xj ⇐⇒ ∃yj : [xj ; −yj ] ∈ Vjk . (5.3)
We assume that all the sets V_j^k are given by polyhedral representations, and we define
V_j := ⋃_{k≤K_j} V_j^k.
Let R ∈ Rʳ be the vector of total resources available to all n units and let P ∈ Rᵖ be the
vector of total demands for the products. For j ≤ n, we want to select x_j ∈ X_j, k_j ≤ K_j, and
y_j ∈ Y_j^{k_j}[x_j] in such a way that
Σ_j x_j ≤ R  and  Σ_j y_j ≥ P.
That is, we would like to find z_j = [x_j; v_j] ∈ V_j, j ≤ n, in such a way that Σ_j z_j ≤ [R; −P].
Note that the presence of a “combinatorial part” in our decision – the selection of production plans
from finite sets – makes the problem difficult.
3. Apply Shapley-Folkman Theorem (Exercise I.19) to overcome, to some extent, the above
difficulty and come up with a good and approximately feasible solution.
where e ̸= 0 is the vector (“inner normal to the cutting hyperplane”) chosen by John. Prove
that for every ϵ > 0, Jill can guarantee to herself at least 300/(n + 1) − ϵ g of raisins, but in general
cannot guarantee to herself 300/(n + 1) + ϵ g.
Remarks:
1. With some minor effort, you can prove that Jill can find a point which guarantees her 300/(n + 1)
g of raisins, and not 300/(n + 1) − ϵ g.
2. If, instead of dividing raisins, John and Jill were to divide in the same fashion a uniform
convex cake (that is, a closed and bounded convex body X with a nonempty interior in Rⁿ,
the reward being the n-dimensional volume of the part a person gets), the results would
change dramatically: choosing as the point the center of mass of the cake,
x̄ := (∫_X x dx) / (∫_X dx),
Jill would guarantee herself at least (n/(n + 1))ⁿ ≈ 1/e part of the cake. This is a not-so-easy
corollary of the following extremely important and deep result:
Brunn-Minkowski Symmetrization Theorem: Let X be as above, and let [a, b]
be the projection of X on an axis ℓ, say, on the last coordinate axis. Consider the “
symmetrization” Y of X, i.e., Y is the set with the same projection [a, b] on ℓ and
for every hyperplane orthogonal to the axis ℓ and crossing [a, b], the intersection of Y
with this hyperplane is an (n − 1)-dimensional ball centered at the axis with precisely
the same (n − 1)-dimensional volume as the one of the intersection of X with the same
hyperplane:
{z ∈ Rⁿ⁻¹ : [z; c] ∈ Y} = {z ∈ Rⁿ⁻¹ : ∥z∥₂ ≤ ρ(c)}, ∀c ∈ [a, b], and
This problem has n + 1 variables and (n + 1) linear inequality constraints, and let us solve
it by applying the Fourier-Motzkin elimination to project the feasible set of the problem onto
the axis of the t-variable, that is, to build a finite system S of univariate linear inequalities
specifying this projection.
How many inequalities do you think there will be in S when n = 1, 2, 3, 4? Check your intuition
by implementing and running the F-M elimination, assuming, for the sake of definiteness, that
cij = 1 for all i, j.
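For the experiment one needs a routine that eliminates a single variable from a system Ax ≤ b; repeatedly eliminating all variables except t then yields the univariate system S. A minimal NumPy sketch of one elimination step (no removal of redundant inequalities is attempted):

```python
import numpy as np

def fm_eliminate(A, b, j):
    """One Fourier-Motzkin step: project {x : Ax <= b} onto the coordinates != j.
    Returns (A', b') with the j-th column removed."""
    pos = [i for i in range(A.shape[0]) if A[i, j] > 0]
    neg = [i for i in range(A.shape[0]) if A[i, j] < 0]
    zer = [i for i in range(A.shape[0]) if A[i, j] == 0]
    rows, rhs = [], []
    for i in zer:                                   # constraints not involving x_j survive unchanged
        rows.append(np.delete(A[i], j)); rhs.append(b[i])
    for p in pos:                                   # combine every upper bound on x_j with every lower bound
        for q in neg:
            # the positive combination (-A[q,j])*row_p + A[p,j]*row_q cancels x_j
            rows.append(np.delete(A[p, j] * A[q] - A[q, j] * A[p], j))
            rhs.append(A[p, j] * b[q] - A[q, j] * b[p])
    m = len(rows)
    return np.array(rows).reshape(m, A.shape[1] - 1), np.array(rhs)
```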
Exercise I.32 For the systems of constraints to follow, write them down equivalently in the
standard form Ax < b, Cx ≤ d and point out their feasibility status (“feasible – infeasible”) along
with the corresponding certificates (certificate for feasibility is a feasible solution to the system;
certificate for infeasibility is a collection of weights of constraints which leads to a contradictory
consequence inequality, as explained in GTA).
1. x ≤ 0 (x ∈ Rⁿ)
2. x ≤ 0, and Σ_{i=1}^n x_i > 0 (x ∈ Rⁿ)
3. −1 ≤ x_i ≤ 1, 1 ≤ i ≤ n, Σ_{i=1}^n x_i ≥ n (x ∈ Rⁿ)
4. −1 ≤ x_i ≤ 1, 1 ≤ i ≤ n, Σ_{i=1}^n x_i > n (x ∈ Rⁿ)
5. −1 ≤ x_i ≤ 1, 1 ≤ i ≤ n, Σ_{i=1}^n i x_i ≥ n(n+1)/2 (x ∈ Rⁿ)
6. −1 ≤ x_i ≤ 1, 1 ≤ i ≤ n, Σ_{i=1}^n i x_i > n(n+1)/2 (x ∈ Rⁿ)
7. x ∈ R², |x₁| + x₂ ≤ 1, x₂ ≥ 0, x₁ + x₂ = 1
8. x ∈ R², |x₁| + x₂ ≤ 1, x₂ ≥ 0, x₁ + x₂ > 1
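For the nonstrict systems, feasibility can be double-checked numerically by solving an LP with zero objective (SciPy assumed); note that the strict systems cannot be fed to an LP solver directly, which is one more reason to look for certificates.

```python
import numpy as np
from scipy.optimize import linprog

def feasible(A_ub, b_ub, bounds):
    """Feasibility of {x : A_ub x <= b_ub, bounds} via an LP with zero objective."""
    res = linprog(np.zeros(A_ub.shape[1]), A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    return res.status == 0          # status 0 means an optimal (hence feasible) point was found

n = 4
# system 3: -1 <= x_i <= 1, sum_i x_i >= n    (rewritten as  -sum_i x_i <= -n)
print(feasible(-np.ones((1, n)), np.array([-float(n)]), [(-1, 1)] * n))
# system 5: -1 <= x_i <= 1, sum_i i*x_i >= n(n+1)/2
print(feasible(-np.arange(1., n + 1).reshape(1, -1), np.array([-n * (n + 1) / 2.]), [(-1, 1)] * n))
```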
Write down the dual problem and check whether the optimal values are equal to each other.
Exercise I.39 Write down the problems dual to the following linear programs:
1. max_{x∈R³} { x₁ + 2x₂ + 3x₃ : x₁ − x₂ + x₃ = 0, x₁ + x₂ − x₃ ≥ 100, x₁ ≤ 0, x₂ ≥ 0, x₃ ≥ 0 }
2. max_{x∈Rⁿ} { c⊤x : Ax = b, x ≥ 0 }
3. max_{x∈Rⁿ} { c⊤x : Ax = b, u̲ ≤ x ≤ ū }
4. max_{x,y} { c⊤x : Ax + By ≤ b, x ≤ 0, y ≥ 0 }
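For item 2 one can sanity-check the expected answer numerically: the dual of max{c⊤x : Ax = b, x ≥ 0} is min{b⊤λ : A⊤λ ≥ c}, and on an instance constructed so that both problems are feasible the two optimal values must coincide. A sketch assuming SciPy:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
m, n = 3, 6
A = rng.standard_normal((m, n))
b = A @ rng.random(n)                               # primal feasible by construction
c = A.T @ rng.standard_normal(m) - rng.random(n)    # dual feasible by construction

# primal: max c^T x s.t. Ax = b, x >= 0   (linprog minimizes, so we negate c)
primal = linprog(-c, A_eq=A, b_eq=b, bounds=[(0, None)] * n, method="highs")
# dual:   min b^T lam s.t. A^T lam >= c   (written as -A^T lam <= -c), lam free
dual = linprog(b, A_ub=-A.T, b_ub=-c, bounds=[(None, None)] * m, method="highs")

print(-primal.fun, dual.fun)                        # equal, by LP duality
```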
Prove that the feasible set of at least one of these problems is unbounded.
Exercise I.41 ▲ Consider the following linear program
Opt = min_{{x_ij}_{1≤i<j≤4}} { 2 Σ_{1≤i<j≤4} x_ij : x_ij ≥ 0 ∀ 1 ≤ i < j ≤ 4, Σ_{j>i} x_ij + Σ_{j<i} x_ji ≥ i, 1 ≤ i ≤ 4 }.
Exercise I.44 ▲ Let w ∈ Rn , and let A ∈ Rn×n be a skew-symmetric matrix, i.e., A⊤ = −A.
Consider the following linear program
Opt(P) = min_{x∈Rⁿ} { w⊤x : Ax ≥ −w, x ≥ 0 }.
Suppose that the problem is solvable. Provide a closed-form analytical expression for Opt(P).
Exercise I.45 ▲ [Separation Theorem, polyhedral version] Let P and Q be two nonempty
polyhedral sets in Rn such that P ∩ Q = ∅. Suppose that the polyhedral descriptions of these
sets are given as
P := {x ∈ Rn : Ax ≤ b} and Q := {x ∈ Rn : Dx ≥ d} .
Using LP duality show that there exists a vector c ∈ Rn such that
c⊤ x < c⊤ y for all x ∈ P and y ∈ Q.
Exercise I.46 ▲ Suppose we are given the following linear program
min_x { c⊤x : Ax = b, x ≥ 0 }   (P)
Now, let us consider the following “game”: Player 1 chooses some x ≥ 0, and player 2 chooses
some λ simultaneously; then, player 1 pays to player 2 the amount L(x, λ). In this game, player
1 would like to minimize L(x, λ) and player 2 would like to maximize L(x, λ).
A pair (x∗ , λ∗ ) with x∗ ≥ 0, is called an equilibrium point (or saddle point or Nash equilibrium)
if
L(x∗ , λ) ≤ L(x∗ , λ∗ ) ≤ L(x, λ∗ ), ∀x ≥ 0 and ∀λ. (∗)
(That is, we have an equilibrium if no player is able to improve her performance by unilaterally
modifying her choice.)
Show that a pair (x∗, λ∗) is an equilibrium point if and only if x∗ and λ∗ are optimal
solutions to the problem (P) and its dual, respectively.
Exercise I.47 ▲ Given a polyhedral set X = {x ∈ Rⁿ : a_i⊤x ≤ b_i, ∀i = 1, . . . , m}, consider
the associated optimization problem
max_{x,t} { t : B₁(x, t) ⊆ X },
where B₁(x, t) := {y ∈ Rⁿ : ∥y − x∥∞ ≤ t}. Is it possible to pose this optimization problem as
a linear program with polynomial in m, n number of variables and constraints? If it is possible,
give such a representation explicitly. If not, argue why.
Exercise I.48 ▲ Consider the following optimization problem
min_{x∈Rⁿ} { c⊤x : ã_i⊤x ≤ b_i for some ã_i ∈ A_i, i = 1, . . . , m, x ≥ 0 },   (∗)
where Ai = {āi + ϵi : ∥ϵi ∥∞ ≤ ρ} for i = 1, . . . , m and ∥u∥∞ := maxj=1,...,n {|uj |}. In this prob-
lem, we basically mean that the constraint coefficient ãij (j-th component of the i-th constraint
vector ãi ) belongs to the interval uncertainty set [āij − ρ, āij + ρ], where āij is its nominal
value. That is, in (∗), we are seeking a solution x such that each constraint is satisfied for some
coefficient vector from the corresponding uncertainty set.
Note that in its current form (∗), this problem is not a linear program (LP). Prove that it
can be written as an explicit linear program and give the corresponding LP formulation.
Exercise I.49 ♦ Let S = {a1 , a2 , . . . , an } be a finite set composed of n distinct from each
other elements, and let f be a real-valued function defined on the set of all subsets of S. We say
that f is submodular if for every X, Y ⊆ S, the following inequality holds
f (X) + f (Y ) ≥ f (X ∪ Y ) + f (X ∩ Y ).
1. Give an example of a submodular function f .
2. Let f : 2S → Z be an integer-valued submodular function such that f (∅) = 0. Consider the
polyhedron
P_f := { x ∈ R^{|S|} : Σ_{t∈T} x_t ≤ f(T), ∀T ⊆ S },
Consider
x̄ak := f ({a1 , . . . , ak }) − f ({a1 , . . . , ak−1 }), k = 1, . . . , n.
Show that x̄ is feasible to Pf .
3. Consider the following optimization problem associated with Pf
max_x { c⊤x : x ∈ P_f }.
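A quick way to build intuition for item 2 is to test the construction on a small submodular function. The coverage function used below is our own toy choice (it is integer-valued, submodular, and has f(∅) = 0), not part of the exercise:

```python
import itertools

cover = {"a1": {1, 2}, "a2": {2, 3}, "a3": {3, 4, 5}, "a4": {1, 5}}   # toy ground-set data
S = list(cover)

def f(T):
    """Coverage function f(T) = |union of the subsets indexed by T|; f(empty) = 0."""
    return len(set().union(*(cover[a] for a in T))) if T else 0

# the point x-bar of item 2
xbar = {S[k]: f(S[:k + 1]) - f(S[:k]) for k in range(len(S))}

# brute-force check of feasibility in P_f: sum_{t in T} xbar_t <= f(T) for every T subset of S
ok = all(sum(xbar[t] for t in T) <= f(T)
         for r in range(len(S) + 1) for T in itertools.combinations(S, r))
print(xbar, ok)                                                       # ok should be True
```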
Here, the first inequality follows from the Triangle inequality, the second equality follows from
the homogeneity of norms, and the last inequality is due to x′, x′′ ∈ Q. Thus, from ∥λx′ + (1 −
λ)x′′ − a∥ ≤ r, we conclude that λx′ + (1 − λ)x′′ ∈ Q, as desired.
Fact I.1.8 Unit balls of norms on Rn are exactly the same as convex sets V in
Rn satisfying the following three properties:
Proof. First, let V be the unit ball of a norm ∥ · ∥, and let us verify the three stated properties.
Note that V = −V due to ∥x∥ = ∥ − x∥. V is bounded and contains a neighborhood of the
origin due to equivalence between ∥ · ∥ and ∥ · ∥2 (Proposition B.3). Moreover, V is closed. To
see this, note that ∥·∥ is Lipschitz continuous with constant 1 with respect to itself: by the
Triangle inequality and due to ∥x − y∥ = ∥y − x∥ we have |∥x∥ − ∥y∥| ≤ ∥x − y∥ for all x, y,
which implies by Proposition B.3 that there exists L_∥·∥ < ∞ such that |∥x∥ − ∥y∥| ≤ L_∥·∥ ∥x − y∥₂ for all x, y,
that is, ∥·∥ is Lipschitz continuous (and thus continuous). And of course for any a ∈ R, the
sublevel set {x ∈ Rⁿ : f(x) ≤ a} of a continuous function f is closed.
For the reverse direction, consider any V possessing properties (i –iii). Then, as V is bounded
and contains a neighborhood of the origin, the function ∥·∥V is well defined, it is positive outside
of the origin and vanishes at the origin. Moreover, ∥ · ∥V is homogeneous – when the argument
is multiplied by a real number λ, the value of the function is multiplied by |λ| (by construction
and since V = −V ).
Now, let us show that the relation V = {y ∈ Rn : ∥y∥V ≤ 1} holds. Indeed, the inclusion
V ⊆ {y : ∥y∥V ≤ 1} is evident. So, we will verify that ∥y∥V ≤ 1 implies y ∈ V . Consider
any y such that ∥y∥_V ≤ 1, and let t̄ := ∥y∥_V (note that t̄ ∈ [0, 1]). There is nothing to prove
when t̄ = 0: in this case, the boundedness of V implies that y = 0, and y = 0 ∈ V since V contains the origin.
When t̄ > 0, by definition of ∥·∥_V there exists a sequence of positive reals {t_i} converging to t̄
as i → ∞ such that y^i := t_i^{-1} y ∈ V. Then, as V is closed, ȳ := t̄^{-1} y ∈ V. Since
0 < t̄ ≤ 1, y = t̄ ȳ is a convex combination of the origin and ȳ. As both 0 ∈ V and ȳ ∈ V
and V is convex, we conclude y ∈ V.
Let us now check that ∥ · ∥V satisfies the Triangle inequality. As ∥ · ∥V is nonnegative, all we have
to check is that ∥x + y∥V ≤ ∥x∥V + ∥y∥V when x ̸= 0, y ̸= 0. Setting x̄ := x/∥x∥V , ȳ := y/∥y∥V ,
we have by homogeneity ∥x̄∥V = ∥ȳ∥V = 1. Then, from the relation V = {y ∈ Rn : ∥y∥V ≤ 1}
we deduce x̄ ∈ V and ȳ ∈ V . Now, as V is convex and x̄, ȳ ∈ V , we have
(1/(∥x∥_V + ∥y∥_V)) (x + y) = (∥x∥_V/(∥x∥_V + ∥y∥_V)) x̄ + (∥y∥_V/(∥x∥_V + ∥y∥_V)) ȳ ∈ V.
That is, ∥(1/(∥x∥_V + ∥y∥_V))(x + y)∥_V ≤ 1. Then, once again by homogeneity of ∥·∥_V we conclude that
∥x + y∥_V ≤ ∥x∥_V + ∥y∥_V.
{x ∈ Rⁿ : (x − a)⊤Q(x − a) ≤ r²}
is convex.
Proof. Note that since Q is positive definite, the matrix P := Q^{1/2} (see section D.1.5 for the
definition of the matrix square root) is well-defined and positive definite. Then, P is
nonsingular and symmetric, and
{x ∈ Rⁿ : (x − a)⊤Q(x − a) ≤ r²} = {x ∈ Rⁿ : (x − a)⊤P⊤P(x − a) ≤ r²}
= {x ∈ Rⁿ : ∥P(x − a)∥₂ ≤ r}.
Now, note that whenever ∥·∥ is a norm on Rⁿ and P is a nonsingular n × n matrix, the function
x ↦ ∥Px∥ is a norm itself (why?). Thus, the function ∥x∥_Q := √(x⊤Qx) = ∥Q^{1/2}x∥₂ is a norm,
and the ellipsoid in question clearly is just the ∥·∥_Q-ball of radius r centered at a.
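A quick numerical illustration of the identity ∥x∥_Q = ∥Q^{1/2}x∥₂ (NumPy and SciPy assumed; the instance is random and of no special meaning):

```python
import numpy as np
from scipy.linalg import sqrtm

rng = np.random.default_rng(1)
n = 4
G = rng.standard_normal((n, n))
Q = G @ G.T + n * np.eye(n)                 # a positive definite Q
P = np.real(sqrtm(Q))                       # the matrix square root Q^{1/2}

x = rng.standard_normal(n)
print(np.isclose(np.sqrt(x @ Q @ x),        # ||x||_Q
                 np.linalg.norm(P @ x)))    # ||Q^{1/2} x||_2  -- True up to rounding
```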
Fact I.1.11 A set M ⊆ Rn is convex if and only if it is closed with respect to
taking all convex combinations of its elements. That is, M is convex if and only if every convex combination of elements of M belongs to M.
Define x̄ := Σ_{i=2}^N (λ_i/(1 − λ_1)) x^i. As Σ_{i=2}^N λ_i = 1 − λ_1, we see that x̄ is an (N − 1)-term convex combination
of points from M and thus belongs to M by the inductive hypothesis. Hence, x = λ_1 x^1 + (1 − λ_1)x̄
is a convex combination of x^1, x̄ ∈ M, and as M is convex we conclude x ∈ M. This completes
the inductive step.
Fact I.1.14 [Convex hull via convex combinations] For a set M ⊆ Rn ,
Conv(M ) = {the set of all convex combinations of vectors from M } .
Proof. Define M̂ := {the set of all convex combinations of vectors from M}. Recall that a convex set is closed with respect to taking convex combinations of its members (Fact I.1.11); thus
any convex set containing M also contains M̂. As by definition Conv(M) is the intersection of
all convex sets containing M, we have Conv(M) ⊇ M̂. It remains to prove that Conv(M) ⊆ M̂.
We start with the claim that M̂ is convex. By Fact I.1.11, M̂ is convex if and only if every convex
combination of points from M̂ is also in M̂. Indeed, this criterion holds for M̂: let x̄^i ∈ M̂ for
i = 1, . . . , N and consider a convex combination of these points, i.e.,
x̂ := Σ_{i=1}^N λ_i x̄^i,
where λ_i ≥ 0 and Σ_{i=1}^N λ_i = 1. For each i = 1, . . . , N, as x̄^i ∈ M̂, by definition of M̂ we have
x̄^i = Σ_{j=1}^{N_i} µ_{i,j} x^{i,j}, where x^{i,j} ∈ M, µ_{i,j} ≥ 0 and Σ_{j=1}^{N_i} µ_{i,j} = 1. Then, we arrive at
x̂ := Σ_{i=1}^N λ_i x̄^i = Σ_{i=1}^N λ_i ( Σ_{j=1}^{N_i} µ_{i,j} x^{i,j} ) = Σ_{i=1}^N Σ_{j=1}^{N_i} (λ_i µ_{i,j}) x^{i,j}.
Fact I.1.20 [Conic hull via conic combinations] The conic hull Cone(K) of a
set K ⊆ Rn is the set of all conic combinations (i.e., linear combinations with
nonnegative coefficients) of vectors from K:
Cone(K) = { x ∈ Rⁿ : ∃N ≥ 0, λ_i ≥ 0, x_i ∈ K, i ≤ N : x = Σ_{i=1}^N λ_i x_i }.
Proof. The case of K = ∅ is trivial, see the comment on the value of an empty sum after the
statement of Fact I.1.20. When K ̸= ∅, this fact is an immediate corollary of Fact I.1.17.
Fact I.1.23 The closure of a set M ⊆ Rn is exactly the set composed of the
limits of all converging sequences of elements from M .
Proof. Let M̄ be the set of the limits of all converging sequences of elements from M.
We need to show that cl M = M̄. Let us first prove cl M ⊇ M̄. Suppose x ∈ M̄. Then, x is the
limit of a converging sequence of points {x^i} ⊆ M ⊆ cl M. Since cl M is a closed set, we arrive
at x ∈ cl M.
For the reverse direction, note that by definition cl M is the smallest (w.r.t. inclusion) closed set
that contains M, so it suffices to prove that M̄ is a closed set satisfying M̄ ⊇ M. It is easy to see
that M̄ ⊇ M holds: for any x ∈ M, the sequence {x^i} with x^i = x is a converging sequence
of points from M with limit x, and thus by definition of M̄ we deduce x ∈ M̄. Now,
consider a converging sequence of points {x^i} ⊆ M̄, and let us prove that the limit x̄ of this
sequence belongs to M̄. For every i, since the point x^i ∈ M̄ is the limit of a sequence of points
from M, we can find a point y^i ∈ M such that ∥x^i − y^i∥₂ ≤ 1/i. The sequence {y^i} is composed
of points from M and clearly has the same limit as the sequence {x^i}, so that the latter limit is
the limit of a sequence of points from M and as such belongs to M̄.
where the last equality follows from K being conic. So, Persp(X) ⊆ X̂, and by taking the
closures of both sides we arrive at ConeT(X) = cl(Persp(X)) ⊆ cl(X̂) = X̂, where the last
equality follows as X̂ clearly is closed. Hence, ConeT(X) ⊆ X̂. To verify the opposite inclusion,
consider [x; t] ∈ X̂ and let us prove that [x; t] ∈ ConeT(X). Let x̄ ∈ X (recall that X is
nonempty). Then, [x̄; 1] ∈ X̂. Moreover, as X̂ is a cone and the points [x; t] and [x̄; 1] belong to
X̂, we have z_ϵ := [x + ϵx̄; t + ϵ] ∈ X̂ for all ϵ > 0. Also, for ϵ > 0, we have t + ϵ > 0, and so
z_ϵ ∈ X̂ implies (1/(t+ϵ))(x + ϵx̄) ∈ X, which is equivalent to z_ϵ = [x + ϵx̄; t + ϵ] ∈ Persp(X). Finally,
as [x; t] = lim_{ϵ→+0} z_ϵ, we have [x; t] ∈ cl(Persp(X)) = ConeT(X), as desired.
7
Separation Theorem
We can associate with the hyperplane M or, better to say, with the associated
pair a, b (defined by the hyperplane up to multiplication of a, b by nonzero real
number) the following sets:
• “upper” and “lower” open half-spaces
M⁺⁺ := {x ∈ Rⁿ : a⊤x > b},  and  M⁻⁻ := {x ∈ Rⁿ : a⊤x < b}.
These sets clearly are convex, and since a linear form is continuous, and the
sets are given by strict inequalities on the value of a continuous function, they
indeed are open.
These open half-spaces are uniquely defined by the hyperplane, up to swapping
the “upper” and the “lower” ones (this is what happens when passing from a
particular pair a, b specifying M to a negative multiple of this pair).
• “upper” and “lower” closed half-spaces
M⁺ := {x ∈ Rⁿ : a⊤x ≥ b},  and  M⁻ := {x ∈ Rⁿ : a⊤x ≤ b}.
These are also convex sets. Moreover, these two sets are polyhedral and thus
closed. It is easily seen that the closed upper/lower half-space is the closure of
the corresponding open half-space, and M itself is the common boundary of
all four half-spaces.
Also, note that our half-spaces and M itself partition Rn , i.e.,
Rn = M −− ∪ M ∪ M ++
separates) S and T .
• We say that S and T can be (strongly) separated, if there exists a hyper-
plane which (strongly) separates S and T .
Figure II.1. Separation.
1) The hyperplane {x ∈ R2 : x2 − x1 = 1} strongly separates the polyhedral sets
S = {x ∈ R2 : x2 = 0, x1 ≥ −1} and T = {x ∈ R2 : 0 ≤ x1 ≤ 1, 3 ≤ x2 ≤ 5}.
2) The hyperplane {x ∈ R : x = 1} separates (but not strongly separates) the
convex sets S = {x ∈ R : x ≤ 1} and T = {x ∈ R : x ≥ 1}.
3) The hyperplane {x ∈ R2 : x1 = 0} separates (but not strongly separates) the
convex sets S = {x ∈ R2 : x1 < 0, x2 ≥ −1/x1 } and T = {x ∈ R2 : x1 >
0, x2 > 1/x1 }.
We will use the following simple and important lemma in the proof of the
separation theorem.
Proof. “If” part is evident. To prove the “only if” part, let x̄ ∈ rint Q be, say, a
minimizer of f over Q, then for any y ∈ Q we need to prove that f (x̄) = f (y).
There is nothing to prove if y = x̄, so let us assume that y ̸= x̄. Since Q is convex
and x̄, y ∈ Q, the segment [x̄, y] belongs to Q. Moreover, as x̄ ∈ rint Q we can
extend this segment a little further away from x̄ and still remain in Q. That is,
there exists z ∈ Q such that x̄ = (1 − λ)y + λz with certain λ ∈ [0, 1). As y ̸= x̄,
we have in fact λ ∈ (0, 1). Since f is linear, we deduce
f (x̄) = (1 − λ)f (y) + λf (z).
Because x̄ is a minimizer of f over Q and y, z ∈ Q, we have min{f (y), f (z)} ≥
f (x̄) = (1 − λ)f (y) + λf (z). Then, from λ ∈ (0, 1) we conclude that this relation
can be satisfied only when f (x̄) = f (y) = f (z).
Proof of Theorem II.7.3. We will prove the separation theorem in several
steps. We will first focus on the usual separation, i.e., case (i) of the theorem.
(i) Necessity. Assume that S, T can be separated. Then, for certain a ̸= 0 we
have
sup_{x∈S} a⊤x ≤ inf_{y∈T} a⊤y,  and  inf_{x∈S} a⊤x < sup_{y∈T} a⊤y.   (7.1)
Assume for contradiction that rint S and rint T have a common point x̄. Then,
from the first inequality in (7.1) and x̄ ∈ S ∩ T , we deduce
a⊤x̄ ≤ sup_{x∈S} a⊤x ≤ inf_{y∈T} a⊤y ≤ a⊤x̄.
convex set and T = {x} is a singleton outside S (here the difference with Step 1
is that now S is not assumed to be a polytope).
Without loss of generality we may assume that S contains 0 (if it is not the
case, by taking any p ∈ S, we may translate S and T to the sets S 7→ −p + S,
T 7→ −p + T ; clearly, a linear form which separates the shifted sets, can be shifted
to separate the original ones as well). Let L be the linear span of S.
If x ̸∈ L, the separation is easy: we can write x = e + f , where e ∈ L and f is
from the subspace orthogonal to L, and thus
f⊤x = f⊤f > 0 = max_{y∈S} f⊤y,
Assume for contradiction that no such f exists. Then, for every h ∈ B there exists
yh ∈ S such that
h⊤ yh > h⊤ x.
Since the inequality is strict, it immediately follows that there exists an open
neighborhood Uh of the vector h such that
(h′ )⊤ yh > (h′ )⊤ x, ∀h′ ∈ Uh . (7.3)
Note that the family of open sets {Uh }h∈B covers B. As B is compact, we can find
a finite subfamily Uh1 , . . . , UhN of this family which still covers B. Let us take the
corresponding points y¹ := y_{h₁}, y² := y_{h₂}, . . . , y^N := y_{h_N} and define the polytope
Ŝ := Conv{y¹, . . . , y^N}. Due to the origin of the y^i, all of these points are in S and
thus S ⊇ Ŝ (recall that S is convex). Since x ̸∈ S, we deduce x ̸∈ Ŝ. Then, by
Step 1, x can be strongly separated from Ŝ, i.e., there exists a ̸= 0 such that
In other words,
0 ≥ sup_{x∈S, y∈T} (f⊤x − f⊤y)  and  0 > inf_{x∈S, y∈T} (f⊤x − f⊤y),
that is,
inf_{x∈T} f⊤x ≥ sup_{y∈S} f⊤y.
Proof. To see (i) consider any x ∈ rbd M . Then, x ̸∈ rint M , and therefore
the point {x} and rint M can be separated by the Separation Theorem. The
associated separating hyperplane is exactly the desired hyperplane supporting to
M at x.
To prove (ii), note that if Π = {y ∈ Rⁿ : a⊤y = a⊤x} is supporting to M at
1 For dimension of a subset in Rn , see Definition I.2.1 or/and section A.4.3. We have used the
following immediate observation: If M ⊂ M ′ are two affine planes, then dim M ≤ dim M ′ , with
equality implying that M = M ′ . The readers are encouraged to prove this fact on their own.
In the case of polyhedral sets, extreme points are also referred to as vertices.
Example II.8.2
• The extreme points of a segment [x, y] ⊂ Rⁿ are exactly its endpoints {x, y}.
• The extreme points of a triangle are its vertices.
• The extreme points of a (closed) circle on the 2-dimensional plane are the
points of the circumference.
• The convex set M := {x ∈ R²₊ : x₁ > 0, x₂ > 0} does not have any extreme
points.
• The only extreme point of the convex set M := {[0; 0]} ∪ {x ∈ R²₊ : x₁ > 0, x₂ > 0} is the origin [0; 0].
Fact II.8.5 All extreme points of the convex hull Conv(Q) of a set Q belong
to Q:
Ext(Conv(Q)) ⊆ Q.
convex compact set, it possesses a quite representative set of extreme points, i.e.,
their convex hull is the entire M .
Proof. With x̄, h, and x̂ as above, for every t > 0 the point x_t := x̂ + 2th belongs to
cl M by Lemma II.8.8. Taking into account that x̂ ∈ rint M and invoking Lemma
I.1.30, we conclude that x̂ + th = ½[x̂ + x_t] ∈ rint M. Thus, x̂ + th ∈ rint M for
all t > 0. Finally, this inclusion x̂ + th ∈ rint M holds true for t = 0 as well due
to x̂ ∈ rint M.
Our last ingredient for the proof of Theorem II.8.6 is a lemma stating a nice
transitive property of extreme points: the extreme points of the subset of a
nonempty closed convex set obtained by intersecting it with a supporting
hyperplane are also extreme points of the original set.
Proof. The first statement, i.e., that Π ∩ M is nonempty, closed, and convex, follows from
Proposition II.8.2(ii). Moreover, by Proposition II.8.2(ii) we have x̄ ∈ Π ∩ M.
Next, let a ∈ Rⁿ be the linear form associated with Π, i.e., Π = {y ∈ Rⁿ : a⊤y = a⊤x̄},
so that
inf_{x∈M} a⊤x < sup_{x∈M} a⊤x = a⊤x̄   (8.2)
Proof of Theorem II.8.6. Let us start with (i). The “only if” part for (i)
follows from Lemma II.8.8. Indeed, for the “only if” part we need to prove that
if M possesses extreme points, then M does not contain lines. That is, we need
to prove that if M contains lines, then it has no extreme points. But, this is
indeed immediate: if M contains a line, then, by Lemma II.8.8, there is a line in
M passing through every given point of M , so that no point can be extreme.
Now let us prove the “if” part of (i). Thus, from now on we assume that M
does not contain lines, and our goal is to prove that then M possesses extreme
points. Equipped with Lemma II.8.10 and Proposition II.8.2, we will prove this
by induction on dim(M ).
There is nothing to do if dim(M ) = 0, i.e., if M is a single point – then, of
course, M = Ext(M ). Now, for the induction hypothesis, for some integer k > 0,
we assume that all nonempty closed convex sets T that do not contain lines and
have dim(T ) = k satisfy Ext(T ) ̸= ∅. To complete the induction, we will show
that this statement is valid for such sets of dimension k + 1 as well. Let M be a
nonempty, closed, convex set that does not contain lines and has dim(M ) = k +1.
Since M does not contain lines and dim(M ) > 0, we have M ̸= Aff(M ). We claim
that M possesses a relative boundary point x̄. To see this, note that there exists
z ∈ Aff(M ) \ M , and thus for any fixed x ∈ M the point
xλ := x + λ(z − x)
does not belong to M for some λ > 0 (and then, by convexity of M , for all larger
values of λ), while x0 = x belongs to M . The set of those λ ≥ 0 for which xλ ∈ M
is therefore nonempty and bounded from above; this set clearly is closed (since
M is closed). Thus, there exists the largest λ = λ∗ for which xλ ∈ M . We claim
that xλ∗ ∈ rbd M . Indeed, by construction xλ∗ ∈ M . If xλ∗ were to be in rint M ,
then all the points xλ with λ values greater than λ∗ yet close to λ∗ would also
belong to M , which contradicts the origin of λ∗ .
Thus, there exists x̄ ∈ rbd M . Then, by Proposition II.8.2(i), there exists a
hyperplane Π = {x ∈ Rⁿ : a⊤x = a⊤x̄} which is supporting to M at x̄:
i.e., when M is a single point, is trivial. Assume that the statement holds for all
k-dimensional closed convex and bounded sets. Let M be a closed convex and
bounded set with dim(M ) = k + 1. Consider any x ∈ M . To represent x as a
convex combination of points from Ext(M ), let us pass through x an arbitrary
line ℓ = {x + λh : λ ∈ R} (where h ̸= 0) in the affine span Aff(M ) of M . Moving
along this line from x in each of the two possible directions, we eventually leave
M (since M is bounded). Then, there exist nonnegative λ+ and λ− such that the
points
x̄+ := x + λ+ h, x̄− := x − λ− h
both belong to rbd M . We claim that x̄± admit convex combination representa-
tion using points from Ext(M ) (this will complete the proof, since x clearly is a
convex combination of the two points x̄± ). Indeed, by Proposition II.8.2(i) there
exists a hyperplane Π supporting to M at x̄+ , and by Proposition II.8.2(ii) the
set Π ∩ M is nonempty, closed and convex with dim(Π ∩ M ) < dim(M ) = k + 1.
Moreover, as M is bounded Π ∩ M is bounded as well. Then, by the inductive
hypothesis, x̄+ ∈ Conv(Ext(Π ∩ M )). Moreover, since by Lemma II.8.10 we have
Ext(Π ∩ M ) ⊆ Ext(M ), we conclude x̄+ ∈ Conv(Ext(M )). Analogous reasoning
is valid for x̄− as well.
Σ_{i=1}^n h_i = 0}.
• Consider the set M := {x ∈ Rⁿ₊ : Σ_{i=1}^n x_i = 1}; then Rec(M) = {0}.
• Consider the set M := {x ∈ Rⁿ₊ : Σ_{i=1}^n x_i ≥ 1}; then Rec(M) = Rⁿ₊.
• Consider the set M := {x ∈ R²₊ : x₁x₂ ≥ 1}; then Rec(M) = R²₊.
Fact II.8.14 Let M ⊆ Rn be a nonempty closed convex set. Recall its closed
conic transform is given by
ConeT(M ) = cl {[x; t] ∈ Rn × R : t > 0, x/t ∈ M } ,
(see section 1.5). Then,
Rec(M) = {h ∈ Rⁿ : [h; 0] ∈ ConeT(M)}.
Finally, the recessive cones of nonempty polyhedral sets in fact admit a much
simpler characterization.
Proof.
(i): By Theorem II.8.6(i) we already know that any nonempty closed convex set
that does not contain lines must possess extreme points. We will prove the rest
of Part (i) by induction on dim(M ). There is nothing to prove when dim(M ) =
0, that is, when M is a singleton. So, suppose that the claim holds true for all sets
of dimension at most k. Let M be any nonempty closed convex set that does not contain
lines and has dim(M) = k + 1. To complete the induction step, we will show
that M satisfies the relation (8.4). Consider x ∈ M and let e be a nonzero
direction parallel to Aff(M ) (such a direction exists, since dim(M ) = k + 1 ≥
1). Recalling that M does not contain lines and replacing, if necessary, e with
−e, we can assume that −e is not a recessive direction of M . Same as in the
proof of Theorem II.8.6, x admits a representation x = x− + t− e with t− ≥ 0
and x− ∈ rbd(M ). Define M− to be the intersection of M with the plane Π−
supporting to M at x− . Then, M− is a nonempty closed convex subset of M
and dim(M− ) ≤ k. Also, M− does not contain lines as M− ⊂ M and M does
not contain lines. Thus, by inductive hypothesis, x− is the sum of a point from
the nonempty set Conv(Ext(M− )) and a recessive direction h− of M− . As in the
proof of Theorem II.8.6, Ext(M− ) ⊆ Ext(M ), and of course h− ∈ Rec(M ) due to
Rec(M− ) ⊆ Rec(M ) (why?). Thus, x = v− + h− + t− e with v− ∈ Conv(Ext(M ))
and h− ∈ Rec(M ). Now, there are two possibilities: e ∈ Rec(M ) and e ̸∈ Rec(M ).
In the first case, x = v− + h with h = h− + t− e ∈ Rec(M ) (recall h− ∈ Rec(M )
and in this case we also have e ∈ Rec(M )), that is, x ∈ Conv(Ext(M )) + Rec(M ).
In the second case, we can apply the above construction to the vector −e in the
role of e, ending up with a representation of x of the form x = v+ + h+ − t+ e
where v+ ∈ Conv(Ext(M )), h+ ∈ Rec(M ) and t+ ≥ 0. Taking appropriate convex
combination of the resulting pair of representations of x, we can cancel the terms
with e and arrive at x = λv− + (1 − λ)v+ + λh− + (1 − λ)h+ , resulting in x ∈
Conv(Ext(M )) + Rec(M ). This reasoning holds true for every x ∈ M , hence
we deduce M ⊆ Conv(Ext(M )) + Rec(M ). The opposite inclusion is given by
(8.3) due to Conv(Ext(M )) ⊆ M . This then completes the proof of the inductive
hypothesis, and thus Part (i) is proved.
(ii): Now assume that M , in addition to being nonempty closed and convex,
is represented as M = V + K, where K is a closed cone and V is a nonempty
bounded set, and let us prove that K = Rec(M ) and V ⊇ Ext(M ). Indeed, every
vector from K clearly is a recessive direction of V + K, so that K ⊆ Rec(M ). To
prove the opposite inclusion K ⊇ Rec(M ), consider any h ∈ Rec(M ), and let us
prove that h ∈ K. Fix any point v ∈ M. The vectors v + ih, i = 1, 2, . . ., belong to
M and therefore v + ih = v^i + h^i for some v^i ∈ V and h^i ∈ K due to M = V + K.
It follows that h = i⁻¹[v^i − v] + i⁻¹h^i for i = 1, 2, . . .. Thus, h = lim_{i→∞} i⁻¹h^i
(recall that V is bounded). As h^i ∈ K and K is a cone, i⁻¹h^i ∈ K and so h
as claimed. Thus, K = Rec(M ). It remains to prove that Ext(M ) ⊆ V . This is
immediate: consider any w ∈ Ext(M ), then as M = V + K = V + Rec(M ) and
w ∈ M , we have w = v + e with some v ∈ V ⊆ M and e ∈ Rec(M ), implying
that w − e = v ∈ M . Besides this, w + e ∈ M as w ∈ M and e ∈ Rec(M ). Thus,
w ± e ∈ M . Since w is an extreme point of M , we conclude that e = 0, that is,
w =v ∈V.
Finally, let us consider what happens to the recessive directions after the pro-
jection operation.
is closed. Then,
[hx ; hu ] ∈ Rec(M + ) =⇒ hx ∈ Rec(M ).
Proof. Consider any recessive direction [hx ; hu ] ∈ Rec(M + ). Then, for any
[x̄; ū] ∈ M + , the ray {[x̄; ū] + t[hx ; hu ] : t ≥ 0} is contained in M + . The pro-
jection of this ray on the x-plane is given by the ray {x̄ + thx : t ≥ 0}, which is
contained in M . Thus, hx ∈ Rec(M ).
While Proposition II.8.17 states that [hx ; hu ] ∈ Rec(M + ) =⇒ hx ∈ Rec(M ),
in general, Rec(M ) can be much larger than the projection of Rec(M + ) onto
x-plane. Our next example illustrates this.
Example II.8.4 Consider the sets M⁺ = {[x; u] ∈ R² : u ≥ x²} and M = {x ∈
R : ∃u ∈ R : [x; u] ∈ M⁺}. Then, M is the entire x-axis and Rec(M) = M
is the entire x-axis. On the other hand, Rec(M⁺) = {[0; h_u] : h_u ≥ 0} and the
projection of Rec(M⁺) onto the x-axis is just the origin. ♢
In fact, the pathology highlighted in Example II.8.4 can be eliminated when we
have that the set of extreme points of the convex representation M + of a convex
set M is bounded and the projection of Rec(M + ) is closed.
that is, W is the projection of V onto the x-space. As V is a closed and bounded
(therefore compact) set, its projection W is compact as well (recall that the
image of a compact set under a continuous mapping is compact). Note that M is
nonempty and it satisfies M = W + K (why?). Then, M is the sum of a compact
set W and a closed set K, and thus M is closed itself (why?). Besides this, M is
convex (recall that the projection of a convex set is convex). Thus, the nonempty
closed convex set M satisfies M = W + K with nonempty bounded W and closed
cone K, implying by Theorem II.8.16 that K = Rec(M ).
Recall that we have investigated the relation between the recessive directions of
a closed convex set M ⊆ R^{n_x} and its closed convex representation M⁺ ⊆ R^{n_x} × R^{k_u}
in Proposition II.8.17. In particular, we observed that while [h_x; h_u] ∈ Rec(M⁺)
implies h_x ∈ Rec(M), the recessive directions of M “stemming” from those of M⁺
can form a small part of Rec(M), as seen in Example II.8.4.
A surprising (and not completely trivial) fact is that for polyhedral sets M ,
the projection of Rec(M + ) onto the x-plane is Rec(M ).
Then,
Rec(M) = {h_x : ∃h_u : [h_x; h_u] ∈ Rec(M⁺)}.
Definition II.8.20 [Dual cone] Let M ⊆ Rn be a cone. The set of all vectors
which have nonnegative inner products with all vectors from M , i.e., the set
M∗ := {a ∈ Rⁿ : a⊤x ≥ 0, ∀x ∈ M},   (8.6)
is called the cone dual to M .
From its definition, it is clear that the dual cone M∗ of any cone M is a closed
cone.
Example II.8.5 The cone dual to the nonnegative orthant Rn+ is composed of
all n-dimensional vectors y making nonnegative inner products with all entrywise
nonnegative n-dimensional vectors x. As is immediately seen the vectors y with
this property are exactly entrywise nonnegative vectors: [Rn+ ]∗ = Rn+ . ♢
Note that in the preceding example, Rn+ is given by finitely many homogeneous
linear inequalities:
Rⁿ₊ = {x ∈ Rⁿ : e_i⊤x ≥ 0, i = 1, . . . , n},
where the e_i are the basic orths; and we observe that the dual cone is the conic hull
of these basic orths. This is indeed a special case of the following general fact:
Because cl Cone(F) is a closed cone (and so 0 ∈ cl Cone(F)), the right hand side infimum, being
finite, must be 0. Then, g⊤f ≥ 0 for all f ∈ cl Cone(F) and g⊤z < 0. Since f⊤g ≥ 0 for all
f ∈ cl Cone(F) and cl Cone(F) ⊇ F, we deduce f⊤g ≥ 0 for all f ∈ F, that is, g ∈ M by the
definition of M. But then the inclusion g ∈ M together with z ∈ M∗ contradicts
the relation z⊤g < 0. Finally, we clearly have f⊤x ≥ 0 for all x ∈ F if and only
if f⊤x ≥ 0 for all x ∈ cl Cone(F).
we did not need to take the closure. This is because the conic hull of a finite set
F is polyhedrally representable and is therefore a polyhedral cone (by Theorem
I.3.2), and as such it is automatically closed.
This fact (i.e., no need to take the closure in Proposition II.8.21) holds true
for the dual of any polyhedral cone: consider the set {x ∈ Rⁿ : a_i⊤x ≥ 0, i =
1, . . . , I} given by finitely many homogeneous nonstrict linear inequalities. This
set is clearly a polyhedral cone, and its dual is the conic hull of the a_i's, i.e., Cone({a_i :
i = 1, . . . , I}) = { Σ_{i=1}^I λ_i a_i : λ ≥ 0 }. Moreover, this dual cone clearly is also
polyhedrally representable as
Cone({a_1, . . . , a_I}) = { x ∈ Rⁿ : ∃λ ≥ 0 : x = Σ_{i=1}^I λ_i a_i },
R2+ , so K∗ = R2+ as well. Now observe that K∗ = R2+ is, as it should be by Propo-
sition II.8.21, the closure of Cone(F ), nevertheless K∗ = cl Cone(F ) is larger than
Cone(F ) as Cone(F ) is not closed! Note that Cone(F ) is precisely the set obtained
from R2+ by eliminating all nonzero points on the boundary of R2+ . ♢
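As a side remark, the polyhedral representation of Cone({a_1, . . . , a_I}) displayed above also yields a practical membership test: x belongs to the cone if and only if the linear system x = Σ_i λ_i a_i, λ ≥ 0 is feasible, which any LP solver can check. A minimal sketch assuming SciPy:

```python
import numpy as np
from scipy.optimize import linprog

def in_cone(x, A):
    """Is x in the conic hull of the columns of A?  Feasibility of x = A @ lam, lam >= 0."""
    I = A.shape[1]
    res = linprog(np.zeros(I), A_eq=A, b_eq=x, bounds=[(0, None)] * I, method="highs")
    return res.status == 0

A = np.array([[1.0, 1.0], [0.0, 1.0]])              # columns a_1 = (1,0), a_2 = (1,1)
print(in_cone(np.array([2.0, 1.0]), A))             # True
print(in_cone(np.array([0.0, 1.0]), A))             # False
```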
Fact II.8.23 Let M be a closed cone in Rn , and let M∗ be the cone dual to
M . Then
(i) Duality does not distinguish between a cone and its closure: whenever
M = cl M ′ for a cone M ′ , we have M∗ = M∗′ .
(ii) Duality is symmetric: the cone dual to M∗ is M .
Proposition II.8.25 Let M^1, . . . , M^k be cones in Rⁿ. Define M := ⋂_{i=1}^k M^i,
and let M∗ be the dual cone of M. Let M_∗^i denote the dual cone of M^i, for
i = 1, . . . , k, and define M̃ := M_∗^1 + . . . + M_∗^k. Then, M∗ ⊇ M̃.
Moreover, if all the cones M^1, . . . , M^k are closed, then M∗ = cl M̃. In
particular, for closed cones M^1, . . . , M^k, M∗ = M̃ holds if and only if M̃ is
closed.
Remark II.8.26 Note that in general M̃ can be non-closed even when all the
cones M^1, . . . , M^k are closed. Indeed, take k = 2, and let M^1 = M_∗^1 be the
second-order cone {(x, y, z) ∈ R³ : z ≥ √(x² + y²)}, and M_∗^2 be the following ray
in R³:
{(x, y, z) ∈ R³ : x = z, y = 0, x ≤ 0}.
Observe that the points from M̃ ≡ M_∗^1 + M_∗^2 are exactly the points of the form
(x − t, y, z − t) with t ≥ 0 and z ≥ √(x² + y²). In particular, for any α > 0, the points
(0, 1, √(α² + 1) − α) = (α − α, 1, √(α² + 1) − α) belong to M̃. As α → ∞, these points
converge to ξ := (0, 1, 0), and thus ξ ∈ cl M̃. On the other hand, we clearly cannot
find x, y, z, t with t ≥ 0 and z ≥ √(x² + y²) such that (x − t, y, z − t) = (0, 1, 0),
that is, ξ ̸∈ M̃. ■
The Dubovitski-Milutin Lemma presents a simple sufficient condition for M̃ to be
closed and thus to coincide with M∗:
Fact II.8.28 Let M ⊆ Rn be a cone and M∗ be its dual cone. Then, for any
x ∈ int M , there exists a properly selected Cx < ∞ such that
∥f ∥2 ≤ Cx f ⊤ x, ∀f ∈ M∗ .
Now, as explained in the beginning of Part (ii) of the above proof of Proposi-
tion II.8.27, we can assume without loss of generality that the cones M 1 , . . . , M k
satisfying the premise of the proposition are closed, and all we need to prove is
that the cone M∗1 + . . . + M∗k is closed. The latter is the same as to verify that
whenever vectors f_t^i ∈ M_∗^i, i ≤ k, t = 1, 2, . . ., are such that f_t := Σ_{i=1}^k f_t^i → h
as t → ∞, it holds h ∈ M_∗^1 + . . . + M_∗^k. Indeed, in the situation in question,
selecting x̄ ∈ M^k ∩ int M^1 ∩ . . . ∩ int M^{k−1} (by the premise this intersection is
nonempty!) we have x̄⊤f_t^i ≥ 0 for all i, t and Σ_{i=1}^k x̄⊤f_t^i → x̄⊤h as t → ∞,
implying that for all i ≤ k the sequences {x̄⊤f_t^i}_{t=1,2,...} are bounded. Moreover,
for any i < k we have x̄ ∈ int M^i and f_t^i ∈ M_∗^i, and so Fact II.8.28 guarantees
that the sequence {f_t^i}_{t=1,2,...} is bounded. Thus, as the sequences {f_t^i}_{t=1,2,...}
are bounded for any i < k and the sequence Σ_{i=1}^k f_t^i has a limit as t → ∞,
we conclude that the sequence {f_t^k}_{t=1,2,...} is bounded as well. Hence, all k
sequences {f_t^i}_{t=1,2,...} are bounded, so that passing to a subsequence t_1 < t_2 < . . .
we can assume that f^i := lim_{j→∞} f_{t_j}^i is well defined for every i ≤ k. Since
f_t^i ∈ M_∗^i and the cones M_∗^i are closed, we have f^i ∈ M_∗^i for all i ≤ k. Finally, as
h = lim_{t→∞} Σ_i f_t^i = lim_{j→∞} Σ_i f_{t_j}^i = Σ_i f^i, we conclude that h ∈ M_∗^1 + . . . + M_∗^k,
as claimed.
Example II.8.10 The simplest example of nontrivial closed and pointed cone
is the nonnegative orthant Rn+ . Based on our extreme direction definition, the
extreme directions of Rn+ should be the nonzero n-dimensional entrywise non-
negative vectors d such that whenever d = d1 + d2 with d1 ≥ 0 and d2 ≥ 0, both
d1 and d2 must be nonnegative multiples of d. Such a vector d has all entries
nonnegative and at least one of them positive. In fact, the number of positive
entries in d is exactly one, since if there were at least two entries, say, d1 and
d2 , positive, we would have d = [d1 ; 0; . . . ; 0] + [0; d2 ; d3 ; . . . ; dn ] and both of the
vectors in the right hand side would be nonzero and not proportional to d. Thus,
any extreme direction of Rn+ must be a positive multiple of a basic orth. It is
immediately seen that every vector of the latter type is an extreme direction of
Rn+ . Hence, the extreme directions of Rn+ are positive multiples of the basic orths,
and the extreme rays of Rn+ are the nonnegative parts of the coordinate axes. ♢
We next introduce the concept of a base which is an important type of the
cross section of a nontrivial closed pointed cone. Moreover, we will see that a
base will be a compact convex set and will establish a direct connection between
the extreme rays of the underlying cone and extreme points of its base.
Figure II.2. Cone and its base (grey pentagon). Extreme rays of the cone are
OA, OB,. . . ,OE intersecting the base at its extreme points A, B, . . . , E.
Definition II.8.36 [Polar of a convex set] For any nonempty convex set
M ⊆ Rn , we define its polar [notation: Polar (M )] to be the set of all vectors
a ∈ Rn such that a⊤ x ≤ 1 for all x ∈ M , i.e.,
Polar(M) := {a ∈ Rⁿ : a⊤x ≤ 1, ∀x ∈ M}.
For any nonempty convex set M , the following properties of its polar are evi-
dent:
1. 0 ∈ Polar (M );
2. Polar (M ) is convex;
3. Polar (M ) is closed.
Proof. Based on the evident properties of polars, all we need is to prove that if
M is closed and convex and 0 ∈ M , then M = Polar (Polar (M )). By definition,
for all x ∈ M and y ∈ Polar (M ), we have
y ⊤ x ≤ 1.
Thus, M ⊆ Polar (Polar (M )).
To prove that this inclusion is in fact equality, we assume for contradiction that
there exists x̄ ∈ Polar (Polar (M )) \ M . Since M is a nonempty closed convex set
and x̄ ̸∈ M , the point x̄ can be strongly separated from M (Separation Theorem
(ii)). Thus, there exists b ∈ Rn such that
b⊤x̄ > sup_{x∈M} b⊤x.
As 0 ∈ M, we deduce b⊤x̄ > 0. Passing from b to a proportional vector a = λb
with appropriately chosen positive λ, we may ensure that
a⊤x̄ > 1 ≥ sup_{x∈M} a⊤x.
From the relation 1 ≥ sup_{x∈M} a⊤x we conclude that a ∈ Polar(M). But then the
relation a⊤ x̄ > 1 contradicts the assumption that x̄ ∈ Polar (Polar (M )). Hence,
we conclude that indeed M = Polar (Polar (M )).
We close this section with a few important properties of the polars.
The “only if” part: let x be an extreme point of M, and define the sets I :=
{i : a_i⊤x = b_i} as the set of indices of active constraints and F := {a_i : i ∈ I} as
the set of vectors of active constraints. We will prove that the set F contains n
linearly independent vectors, i.e., Lin(F ) = Rn . Assume for contradiction that
this is not the case. Then, as dim(F ⊥ ) = n − dim(Lin(F )), we deduce dim(F ⊥ ) >
0 and so there exists a nonzero vector d ∈ F ⊥ . Consider the segment ∆ϵ :=
[x − ϵd, x + ϵd], where ϵ > 0 will be the parameter of our construction. Since
d is orthogonal to the “active” vectors ai (those with i ∈ I), all points y ∈ ∆ϵ
satisfy the relations a_i⊤y = a_i⊤x = b_i. Now, if i is a “nonactive” index (one with
a_i⊤x < b_i), then a_i⊤y ≤ b_i for all y ∈ ∆ϵ, provided that ϵ is small enough. Since
there are finitely many nonactive indices, we can choose ϵ > 0 in such a way that
all y ∈ ∆ϵ will satisfy all “nonactive” inequalities a_i⊤y ≤ b_i, i ̸∈ I, as well. So,
we conclude that for the above choice of ϵ > 0 we get ∆ϵ ⊆ M . But, this is a
contradiction to x being an extreme point of M as we have expressed x as the
midpoint of a nontrivial segment ∆ϵ (recall that ϵ > 0 and d ̸= 0).
To prove the “if” part, we assume that x ∈ M is such that among the in-
equalities a⊤i x ≤ bi which are active at x there are n linearly independent ones.
Without loss of generality, we assume that the indices of these linearly indepen-
dent equations are 1, . . . , n. Given this, we will prove that x is an extreme point
of M . Assume for contradiction that x is not an extreme point. Then, there exists
a vector d ̸= 0 such that x ± d ∈ M . In other words, for i = 1, . . . , n we would
have b_i ≥ a_i⊤(x ± d) ≡ b_i ± a_i⊤d (where the last equivalence follows from a_i⊤x = b_i
for all i ∈ I = {1, . . . , n}), which is possible only if a_i⊤d = 0, i = 1, . . . , n. But
the only vector which is orthogonal to n linearly independent vectors in Rn is
the zero vector (why?), and so we get d = 0, which contradicts the assumption
d ̸= 0.
Theorem II.9.1 states that at every extreme point of a polyhedral set M =
{x ∈ Rn : Ax ≤ b} we must have n linearly independent constraints from Ax ≤ b
holding as equalities. Since a system of n linearly independent equality constraints
in n unknowns has a unique solution, such a system can specify at most one
extreme point of M (exactly one, when the (unique!) solution to the system
satisfies the remaining constraints in the system Ax ≤ b). Moreover, when M
is defined by m inequality constraints, the number of such systems, and thus
the number of extreme points of M, does not exceed the number C_m^n of n × n
submatrices of the matrix A ∈ Rm×n . Hence, we arrive at the following corollary.
Corollary II.9.2 Every polyhedral set has finitely many extreme points.
Recall that there are nonempty polyhedral sets which do not have any extreme
points; these are precisely the ones that contain lines.
Note that C_m^n is nothing but an upper (and typically very conservative) bound
on the number of extreme points of a polyhedral set in Rn defined by m inequality
constraints. This is because some n × n submatrices of A can be singular, and
what is more important, the majority of the nonsingular ones typically produce
“candidate” points which do not satisfy the remaining inequalities defining M .
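The counting argument above can be turned into a (purely illustrative, exponential-time) enumeration procedure: scan all n × n subsystems, solve each nonsingular one, and keep the solutions satisfying the remaining inequalities. A small Python sketch:

```python
import itertools
import numpy as np

def extreme_points(A, b, tol=1e-9):
    """Extreme points of {x : Ax <= b} (A is m x n) via Theorem II.9.1.  For small m, n only."""
    m, n = A.shape
    vertices = []
    for rows in itertools.combinations(range(m), n):
        Asub, bsub = A[list(rows)], b[list(rows)]
        if abs(np.linalg.det(Asub)) < tol:            # singular subsystem: no candidate point
            continue
        x = np.linalg.solve(Asub, bsub)
        if np.all(A @ x <= b + tol):                  # keep candidates satisfying the rest of Ax <= b
            vertices.append(x)
    return np.unique(np.round(vertices, 8), axis=0)

# the unit square {0 <= x_1, x_2 <= 1}: the four corners are returned
A = np.array([[1., 0.], [-1., 0.], [0., 1.], [0., -1.]])
b = np.array([1., 0., 1., 0.])
print(extreme_points(A, b))
```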
Remark II.9.3 Historically, Theorem II.9.1 has been instrumental in developing
an algorithm to solve linear programs, namely the Simplex method. Let us consider
an LP in standard form
min_{x∈Rⁿ} { c⊤x : Px = p, x ≥ 0 },
where P ∈ Rk×n . Note that we can convert any given LP to this form by adding
a small number of new variables and constraints if needed. In the context of this
LP, Theorem II.9.1 states that the extreme points of the feasible set are exactly
the basic feasible solutions of the system P x = p, i.e., nonnegative vectors x such
that P x = p and the set of columns of P associated with positive entries of x is
linearly independent. As the feasible set of an LP in standard form clearly does
not contain lines (note the constraints x ≥ 0, which restrict the standard-form LP
domain to be a subset of the pointed cone Rⁿ₊), among its optimal solutions (if they
exist) at least one is an extreme point of the feasible set (Theorem II.9.12(ii)).
This then suggests a simple algorithm to solve a solvable LP in standard form: go
through the finite set of all extreme points of the feasible set (or equivalently all
basic feasible solutions) and choose the one with the best objective value. This
algorithm allows to find an optimal solution in finitely many arithmetic opera-
tions, provided that the LP is solvable, and underlies the basic idea for the Sim-
plex method. As one will immediately recognize, the number of extreme points,
although finite, may be quite large. The Simplex method operates in a smarter
way and examines only a subset of the basic feasible solutions in an organized
way and can handle other issues such as infeasibility and unboundedness.
Another useful consequence of Theorem II.9.1 is that if all the data in an LP
are rational, then every one of its extreme points is a vector with rational entries.
Thus, a solvable standard form LP with rational data has at least one rational
optimal solution. ■
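A literal implementation of the brute-force idea of the remark (for small instances only; it assumes P has full row rank and that the LP is solvable) might look as follows:

```python
import itertools
import numpy as np

def lp_by_basis_enumeration(c, P, p, tol=1e-9):
    """min{c^T x : Px = p, x >= 0} by scanning all basic feasible solutions."""
    k, n = P.shape
    best_x, best_val = None, np.inf
    for cols in itertools.combinations(range(n), k):
        Basis = P[:, cols]
        if abs(np.linalg.det(Basis)) < tol:
            continue
        xB = np.linalg.solve(Basis, p)
        if np.any(xB < -tol):                       # basic solution is not feasible
            continue
        x = np.zeros(n)
        x[list(cols)] = xB
        if c @ x < best_val:
            best_x, best_val = x, c @ x
    return best_x, best_val
```

The number of bases grows combinatorially with n, which is precisely why the Simplex method visits them in an organized, selective way.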
Theorem II.9.1 has further important consequences in terms of sizes of extreme
points of polyhedral sets as well.
To this end, let us first recall a simple fact from Linear Algebra:
∥x(b)∥₂ = ∥x̂(b)∥₂ = ∥Â⁻¹b̂∥₂ ≤ ∥Â⁻¹∥∥b̂∥₂ ≤ ∥Â⁻¹∥∥b∥₂,
where ∥Â⁻¹∥ is the spectral norm of Â⁻¹. Then, setting C(A) = ∥Â⁻¹∥ concludes
the proof.
Surprisingly, a similar result holds for the solutions of systems of linear inequal-
ities as well.
Proof. This proof is quite similar to the one for Proposition II.9.4. Let r :=
rank(A). The case of r = 0 is trivial – in this case A = 0, and whenever the system Ax ≤ b
is feasible, it has the solution x = 0. When r > 0, we can assume without loss of
generality that the first r columns in A are linearly independent. Let Â ∈ R^{m×r}
be the submatrix of A obtained from the first r columns of A. As Â has all the
linearly independent columns of A, the image spaces of A and Â are the same.
Thus, the system Ax ≤ b is feasible if and only if the system Âu ≤ b is feasible.
Moreover, given any feasible solution u ∈ Rʳ to Âu ≤ b, we can generate a feasible
solution x := [u; 0] ∈ Rⁿ by adding n − r zeros at the end and still preserve the
norm of the solution. Hence, without loss of generality we can assume that the
columns of A are linearly independent and r = n.
As A ∈ Rm×n has n linearly independent columns and each column is a vector
in Rm , we deduce that m ≥ n and {u : Au = 0} = {0}. Thus, we conclude
that the polyhedral set {x : Ax ≤ b} does not contain lines. Therefore, when
nonempty, by Krein-Milman Theorem this polyhedral set has an extreme point.
Let us take this point as x(b). Then, by Theorem II.9.1, at least n of the inequality
constraints from the system Ax ≤ b will be active at x(b) and out of these active
constraints there will be n vectors ai (corresponding to rows of the matrix A)
that are linearly independent. That is, Âx(b) = b̂ holds for a certain nonsingular
n×n submatrix Â of A and the corresponding subvector b̂ of b. So, we conclude ∥x(b)∥₂ ≤ ∥Â⁻¹∥∥b∥₂. Since the number
of r × r nonsingular submatrices in A is finite, the maximum C(A) of the spectral
norms of the inverses of these submatrices is finite as well, and, as we have seen,
for every b for which the system Ax ≤ b is feasible, it has a solution x(b) with
∥x(b)∥2 ≤ C(A)∥b∥2 , as claimed.
The set of extreme points of X is precisely the set of vectors with entries 0 and
1 which have exactly k entries equal to 1. That is,
Ext(X) = { x ∈ {0, 1}ⁿ : Σ_{i=1}^n x_i = k }.
In particular, the extreme points of the “flat (a.k.a. probabilistic) simplex”
{x ∈ Rⁿ₊ : Σ_{i=1}^n x_i = 1} are the basic orths (see Figure II.3 for an illustration
The set of extreme points of X is precisely the set of vectors with entries 0 and
1 which have at most k entries equal to 1. That is,
Ext(X) = { x ∈ {0, 1}ⁿ : Σ_{i=1}^n x_i ≤ k }.
Justification of this example follows the one of Example II.9.1 and is left as an
exercise to the reader. ♢
Example II.9.3 Suppose k, n are integers satisfying 0 ≤ k ≤ n. Consider the
polytope
X = { x ∈ Rⁿ : |x_i| ≤ 1 ∀i ≤ n, Σ_{i=1}^n |x_i| ≤ k }.
Extreme points of X are exactly the vectors with entries 0, 1, −1 which have
exactly k nonzero entries. That is,
( n
)
X
n
Ext(X) = x ∈ {−1, 0, 1} : |xi | = k .
i=1
The set Πn is also called the Assignment (or balanced matching) polytope. As Πn
is a polytope, by Krein-Milman Theorem, Πn is the convex hull of its extreme
points. What are these extreme points? The answer is given by the following
fundamental result.
the case. So, the set {f } ∪ {ai : i ∈ I} must have n linearly independent vectors.
Then, by Theorem II.9.1 xd must be an extreme point of B. Once again, using
Fact II.8.33(iv) we conclude that d must be an extreme direction of M .
Analogous to Corollary II.9.2, we have the following immediate corollary of
this theorem.
the points from R. The set M (V, R) clearly is convex as it is the arithmetic sum
of two convex sets Conv(V ) and Cone(R) (recall our convention that Cone(∅) =
{0}, see Fact I.1.20). We are now ready to present the promised inner description
of a polyhedral set.
Remark II.9.11 We will see in section 9.3.3 that the inner characterization of
the polyhedral sets given in Theorem II.9.10 can be made much more precise.
Suppose that we are given a nonempty polyhedral set M . Then, we can select an
inner characterization of it in the form of M = Conv(V ) + Cone(R) with finite V
and finite R, where the “conic” part Cone(R) (not the set R itself!) is uniquely
defined by M ; in fact it will always hold that Cone(R) = Rec(M ), i.e., R can
be taken as the generators of the recessive cone of M (see Comment to Lemma
II.8.8). Moreover, if M does not contain lines, then V can be chosen as the set of
all extreme points of M . ■
We will prove Theorem II.9.10 in section 9.3.3. Before proceeding with its proof,
let us understand why this theorem is so important, i.e., why it is so nice to know
both inner and outer descriptions of a polyhedral set.
Consider the following natural questions:
• A. Is it true that the inverse image of a polyhedral set M ⊂ Rn under an affine
mapping y 7→ P(y) = P y + p : Rm → Rn , i.e., the set
P −1 (M ) = {y ∈ Rm : P y + p ∈ M }
is polyhedral?
• B. Is it true that the image of a polyhedral set M ⊂ Rn under an affine mapping
x 7→ y = P(x) = P x + p : Rn → Rm – the set
P(M ) = {P x + p : x ∈ M }
is polyhedral?
• C. Is it true that the intersection of two polyhedral sets is again a polyhedral
set?
• D. Is it true that the arithmetic sum of two polyhedral sets is again a polyhedral
set?
The answers to all these questions are positive; one way to see it is to use calculus
of polyhedral representations along with the fact that polyhedrally representable
sets are exactly the same as polyhedral sets (see chapter 3). Another very in-
structive way is to use the just outlined results on the structure of polyhedral
sets, which we will do now.
and the right hand side expression is nothing but a convex combination of points
from V + V ′ .
We see that Opt(P ) is finite if and only if inf r {c⊤ r : r ∈ Cone(R)} > −∞, and
the latter clearly is the case if and only if c⊤ r ≥ 0 for all r ∈ R. Then, in
such a case inf r {c⊤ r : r ∈ Cone(R)} = 0, and also minv {c⊤ v : v ∈ Conv(V )} =
minv {c⊤ v : v ∈ V }.
(ii): The first claim in (ii) is an immediate byproduct of the proof of (i). The
second claim follows from the fact that when M does not contain lines, we can
take V = Ext(M ), see Remark II.9.11.
In this section, we will prove Theorem II.9.10. To simplify our language let
us call VR-sets (“V” from “vertex” and “R” from “rays”) the sets of the form
M (V, R), and P-sets the nonempty polyhedral sets, i.e., defined by finitely many
linear non-strict inequalities. We should prove that every P-set is a VR-set, and
vice versa.
VR =⇒ P: This is immediate: a VR-set is nonempty and polyhedrally repre-
sentable (why?) and thus is a nonempty P-set by Theorem I.3.2.
P =⇒ VR:
Let M ̸= ∅ be a P-set, so that M is the set of all solutions to a feasible system
of linear inequalities:
M = {x ∈ Rn : Ax ≤ b} , (9.2)
where A ∈ Rm×n .
We will first study the case of P-sets that do not contain lines, and then reduce
the general case to this one.
Proof. As M is a nonempty closed convex set that does not contain lines, by
Theorem II.8.6(i) we know Ext(M ) ̸= ∅, and by Theorem II.8.16 we have M =
Conv(Ext(M )) + Rec(M ). Moreover, by Corollary II.9.2, we have Ext(M ) is a
finite set.
If M is bounded, then Rec(M ) = {0}, and thus the result follows. Suppose M
is unbounded. Then, Rec(M ) is nontrivial. Also, Rec(M ) is pointed as M does
not contain lines implies that Rec(M ) does not contain lines either. Moreover,
from Fact II.8.15, we deduce that Rec(M ) = {h ∈ Rn : Ah ≤ 0} and thus is
a polyhedral cone. Then, by Corollary II.9.9 we have that Rec(M ) has finitely
many extreme rays and Rec(M ) is the sum of its extreme rays.
Next, we study the case when M contains a line. We start with the following
observation.
t ∈ R) if and only if h ∈ KerA. That is, the nonzero vectors from KerA are
exactly the directions of lines contained in M .
First, note that as M ̸= ∅ we have M ′ ̸= ∅, and also the set M ′ clearly does
not contain lines. This is because if h ̸= 0 is the direction of a line satisfying
x + th ∈ M ′ for all t ∈ R and some x ∈ M ′ , by definition of M ′ we must have
x + th ∈ L⊥ for all t and thus h ∈ L⊥ . On the other hand, by Lemma II.9.14,
we must also have h ∈ KerA = L. Then, h ∈ L ∩ L⊥ implies h = 0, which is a
contradiction.
Now, note that M ′ ̸= ∅ satisfies M = M ′ + L. Indeed, M ′ contains the or-
thogonal projections of all points from M onto L⊥ (since to project a point onto
L⊥ , you should move from this point along a certain line with a direction from
L, and all these movements, started in M , keep you in M by Lemma II.9.14) and
therefore is nonempty, first, and is such that M ′ + L ⊇ M , second. On the other
hand, M ′ ⊆ M and M + L = M (by Lemma II.9.14), and so M ′ + L ⊆ M . Thus,
M′ + L = M.
Finally, it is clear that M ′ is a polyhedral set as the inclusion x ∈ L⊥ can be
represented by dim(L) linear equations (i.e., by 2 dim(L) nonstrict linear inequal-
ities). To this end, all we need is a set of vectors ξ1 , . . . , ξdim(L) forming a basis in
L, and then L⊥ := {x ∈ Rn : ξi⊤ x = 0, ∀i = 1, . . . , dim(L)}.
Therefore, with these steps, given an arbitrary nonempty P-set M , we have
represented it as the sum of a P-set M ′ which does not contain lines and a linear
subspace L. Then, as M ′ does not contain lines, by Theorem II.9.13, we have
M ′ = M (V ′ , R′ ) where V ′ is the nonempty set of extreme points of M ′ and R′
is the set of extreme rays of Rec(M ′ ). Let us define R′′ to be the finite set of
generators for L, i.e., L = Cone(R′′ ). Then, we arrive at
M = M′ + L
= [Conv(V ′ ) + Cone(R′ )] + Cone(R′′ )
= Conv(V ′ ) + [Cone(R′ ) + Cone(R′′ )]
= Conv(V ′ ) + Cone(R′ ∪ R′′ )
= M (V ′ , R′ ∪ R′′ ).
Thus, this proves that a P-set is indeed a VR-set, as desired.
Finally, let us justify Remark II.9.11. Suppose we are given M = Conv(V ) +
Cone(R) with finite sets V, R and V ̸= ∅. Justifying the first claim in this re-
mark requires us to show that Cone(R) = Rec(M ). To see this, from M =
9.4 ⋆ Majorization
In this section we will introduce and study the Majorization Principle, which
describes the convex hull of permutations of a given vector.
For any x ∈ Rn , we define X[x] to be the set of all convex combinations of n!
vectors obtained from x by permuting its coordinates. That is,
X[x] := Conv ({P x : P is an n × n permutation matrix})
= {Dx : D ∈ Πn } ,
where Πn is the set of all n × n doubly stochastic matrices. Here, the equality is
due to the Birkhoff-von Neumann Theorem. Note that X[x] is a permutationally
symmetric set, that is, given any vector from the set, every vector obtained by
permuting its entries is also in the set.
For any k, define Ik to be the family of all k-element subsets of the index set
{1, 2, . . . , n}, and so
sk (y) = max_{I∈Ik} Σ_{i∈I} yi .    (9.4)
xσ := [xσ(1) ; . . . ; xσ(n) ].
where the inequality is due to (9.4). Maximizing both sides of this inequality over
I ∈ Ik and invoking (9.4) once again, we get sk (y) ≤ sk (x) for all k ≤ n. In
addition,
Σ_{i=1}^n yi = Σ_{i=1}^n Σ_σ λσ xσ(i) = Σ_σ λσ Σ_{i=1}^n xσ(i) = Σ_σ λσ Σ_{i=1}^n xi = Σ_{i=1}^n xi ,
c⊤ y > max_{z∈X[x]} c⊤ z .
As the set X[x] is permutationally symmetric and the vector y is ordered, without
loss of generality we can select the vector c to be ordered as well. This is because
when permuting the entries of c, we preserve max_{z∈X[x]} c⊤ z, and arranging the entries
of c in non-increasing order, we do not decrease c⊤ y: assuming, say, that c1 < c2 ,
swapping c1 and c2 we do not decrease c⊤ y: [c2 y1 + c1 y2 ] − [c1 y1 + c2 y2 ] = [c2 −
c1 ][y1 − y2 ] ≥ 0. Next, by Abel’s formula (the discrete analog of integration by parts)
we have
c⊤ y = Σ_{i=1}^n ci yi = Σ_{i=1}^{n−1} (ci − ci+1 ) Σ_{j=1}^i yj + cn Σ_{j=1}^n yj
     = Σ_{i=1}^{n−1} (ci − ci+1 ) si (y) + cn sn (y)
     ≤ Σ_{i=1}^{n−1} (ci − ci+1 ) si (x) + cn sn (x) = Σ_{i=1}^n ci xi = c⊤ x.
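The two ingredients used in this argument – preservation of the total sum and the inequalities sk (y) ≤ sk (x) for y ∈ X[x] – are easy to check numerically. Below is a small sketch (numpy; the random doubly stochastic matrix is built as a convex combination of permutation matrices, and the data are arbitrary).

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)
n = 5
x = rng.normal(size=n)

def s(k, v):                                  # s_k(v): sum of the k largest entries of v
    return np.sort(v)[::-1][:k].sum()

# a random point of the Birkhoff polytope: convex combination of permutation matrices
perms = list(itertools.permutations(range(n)))
weights = rng.dirichlet(np.ones(len(perms)))
D = sum(w * np.eye(n)[list(p)] for w, p in zip(weights, perms))
y = D @ x                                     # a point of X[x]

assert np.isclose(y.sum(), x.sum())           # s_n(y) = s_n(x)
assert all(s(k, y) <= s(k, x) + 1e-9 for k in range(1, n))
print("y = Dx is majorized by x, as claimed")
```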
10.1 Separation
Exercise II.1 Mark by ”Y” those of the below listed cases where the linear form f ⊤ x separates
the sets S and T :
• S = {0} ⊂ R, T = {0} ⊂ R, f ⊤ x = x
• S = {0} ⊂ R, T = [0, 1] ⊂ R, f ⊤ x = x
• S = {0} ⊂ R, T = [−1, 1] ⊂ R, f ⊤ x = x
• S = {x ∈ R3 : x1 = x2 = x3 }, T = {x ∈ R3 : x3 ≥ √(x1² + x2²)}, f ⊤ x = x1 − x2
• S = {x ∈ R3 : x1 = x2 = x3 }, T = {x ∈ R3 : x3 ≥ √(x1² + x2²)}, f ⊤ x = x3 − x2
S = {x ∈ R3 :
Is it possible to separate this set from the set {x1 = x2 = . . . = x2004 ≤ 0}? If yes, what could
be a separating plane?
Exercise II.3 Can the sets S = {x ∈ R2 : x1 > 0, x2 ≥ 1/x1 } and T = {x ∈ R2 : x1 < 0, x2 ≥
−1/x1 } be separated? Can they be strongly separated?
Exercise II.4 ♦ Let M ⊂ Rn be a nonempty closed convex set. The metric projection
ProjM (x) of a point x ∈ Rn onto M is the ∥ · ∥2 -closest to x point of M , so that
1. Prove that for every x ∈ Rn the minimum in the right hand side of (∗) is achieved, and x+
is a minimizer if and only if
Derive from the latter fact that the minimum in (∗) is achieved at a unique point, the bottom
line being that ProjM (·) is well defined.
2. Prove that when passing from a point x ∈ Rn to its metric projection x+ = ProjM (x), the
distance to any point of M does not increase, specifically,
3. Let x ̸∈ M , so that, denoting x+ = ProjM (x), the vector e = (x − x+ )/∥x − x+ ∥2 is well defined. Prove
that the linear form e⊤ z strongly separates x and M , specifically,
∀y ∈ M : e⊤ y ≤ e⊤ x − dist(x, M ).
Note: The fact just outlined underlies an alternative proof of Separation Theorem, where
the first step is to prove that a point outside a nonempty closed convex set can be strongly
separated from the set. In our proof, the first step was similar, but with M restricted to be
polyhedral, rather than merely convex and closed.
4. Prove that the mapping x 7→ ProjM (x) : Rn → M is a contraction in ∥ · ∥2 , specifically,
∥ProjM (x) − ProjM (y)∥2 ≤ ∥x − y∥2 for all x, y ∈ Rn .
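For a set M with an explicit projection the claims of items 3 and 4 are easy to sanity-check numerically. The sketch below (our own; numpy assumed) takes M to be the unit box, where ProjM is coordinatewise clipping.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 6
proj = lambda z: np.clip(z, 0.0, 1.0)          # metric projection onto M = [0,1]^n

x, y = rng.normal(scale=3.0, size=n), rng.normal(scale=3.0, size=n)
# item 4: the projection does not increase Euclidean distances
assert np.linalg.norm(proj(x) - proj(y)) <= np.linalg.norm(x - y) + 1e-12

# item 3: for x outside M, e = (x - x+)/||x - x+||_2 strongly separates x and M
xp = proj(x)
if np.linalg.norm(x - xp) > 1e-9:              # x happens to lie outside M
    e = (x - xp) / np.linalg.norm(x - xp)
    dist = np.linalg.norm(x - xp)
    points_of_M = proj(rng.normal(scale=3.0, size=(1000, n)))
    assert np.all(points_of_M @ e <= e @ x - dist + 1e-9)
print("projection onto the box passes both checks")
```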
Exercise II.11 ♦ Looking at the sets of extreme points of closed convex sets like the unit
Euclidean ball, a polytope, the paraboloid {[x; t] : t ≥ x⊤ x}, etc., we see that these sets are
closed. Do you think this always is the case? Is it true that the set Ext(M ) of extreme points of
a closed convex set M always is closed ?
Exercise II.12 ▲ Derive representation (∗) in Exercise I.29 from Example II.9.1 in section
9.1.1.
Exercise II.13 ♦ By Birkhoff Theorem, the extreme points of the polytope Πn = {[xij ] ∈
Rn×n : xij ≥ 0, Σi xij = 1 ∀j, Σj xij = 1 ∀i} are exactly the Boolean (i.e., with entries 0
and 1) matrices from this set. Prove that the same holds true for the “polytope of sub-doubly
stochastic” matrices Πm,n = {[xij ] ∈ Rm×n : xij ≥ 0, Σi xij ≤ 1 ∀j, Σj xij ≤ 1 ∀i}.
Exercise II.14 ♦ [Follow-up to Exercise II.13] Let m, n be two positive integers with m ≤ n,
and Xm,n be the set of m × n matrices [xij ] with Σi |xij | ≤ 1 for all j ≤ n and Σj |xij | ≤ 1
for all i ≤ m. Describe the set Ext(Xm,n ). To get an educated guess, look at the matrices
[1, 0, 0; 0, 0, −1] ,  [0, 0, 0; 0, 0, −1] ,  [0.5, −0.5, 0; −0.5, 0.5, 0]
from X2,3 .
2. Is the converse of 1) true, i.e., is it true that every extreme point x̄ of M is the unique
maximizer, over x ∈ M , of a properly selected linear form?
Exercise II.19 Identify and justify the correct claims in the following list:
1. Let X ⊂ Rn be a nonempty closed convex set, P be an m × n matrix, Y = P X := {P x :
x ∈ X} ⊂ Rm , and Ȳ be the closure of Y . Then
• For every x ∈ Ext(X), P x ∈ Ext(Y )
• Every extreme point of Ȳ which happens to belong to Y is P x for some x ∈ Ext(X)
• When X does not contain lines, then every extreme point of Ȳ which happens to belong
to Y is P x for some x ∈ Ext(X)
2. Let X, Y be nonempty closed convex sets in Rn , and let Z = X + Y , Z̄ = cl Z. Then
• If w ∈ Ext(Z̄) ∩ Z, then w = x + y for some x ∈ Ext(X) and y ∈ Ext(Y ).
• If x ∈ Ext(X), y ∈ Ext(Y ), then x + y ∈ Ext(Z).
Exercise II.20 ♦ [faces of polyhedral set] Let X = {x ∈ Rn : ai⊤ x ≤ bi , i ≤ m} be a nonempty
polyhedral set and f ⊤ x be a linear form of x ∈ Rn which is bounded above on X:
Prove that
1. Opt(f ) is achieved – the set Argmax_{x∈X} f ⊤ x := {x ∈ X : f ⊤ x = Opt(f )} is nonempty.
2. The set Argmax_{x∈X} f ⊤ x is as follows: there exists an index set I ⊂ {1, 2, . . . , m}, perhaps empty,
such that
Argmax_{x∈X} f ⊤ x = XI := {x : ai⊤ x ≤ bi ∀i, ai⊤ x = bi ∀i ∈ I}
The extreme points of Πd,n are exactly the block matrices [X ij ]i,j≤n as follows: for certain n × n
permutation matrix P and unit vectors eij ∈ Rd , one has
X ij = Pij eij eij⊤ , ∀i, j.
Exercise II.25 ▲ Let k, n be positive integers with k ≤ n, and let sk (λ) for λ ∈ Rn be
the sum of k largest entries in λ. From the description of the extreme points of the polytope
X = {x ∈ Rn : 0 ≤ xi ≤ 1, i ≤ n, Σ_{i=1}^n xi ≤ k}, see Example II.9.2 in section 9.1.1, it follows
that when λ ∈ Rn+ , then
max_{x∈X} Σ_{i=1}^n λi xi = sk (λ).
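This identity is easy to probe numerically; the sketch below (our own, using scipy's LP solver on arbitrary sample data) compares the LP optimum over X with the sum of the k largest entries of λ.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(3)
n, k = 7, 3
lam = rng.uniform(size=n)                       # a nonnegative vector lambda

# maximize lam @ x over X  <=>  minimize -lam @ x subject to 0 <= x <= 1, sum(x) <= k
res = linprog(-lam, A_ub=np.ones((1, n)), b_ub=[k], bounds=[(0, 1)] * n)
lp_value = -res.fun
sk_value = np.sort(lam)[::-1][:k].sum()          # s_k(lambda)
assert np.isclose(lp_value, sk_value)
print("LP optimum", round(lp_value, 6), "equals s_k(lambda)", round(sk_value, 6))
```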
(!) Let X ⊂ RN be a nonempty closed convex set such that X ⊂ V + Rec(X) for some
bounded and closed set V , let x 7→ A(x) = Ax + b : RN → Rn be an affine mapping, and let
Y = A(X) := {y : ∃x ∈ X : y = A(x)} be the image of X under this mapping. Let also
K = {h ∈ Rn : ∃g ∈ Rec(X) : h = Ag}.
Then the recessive cone of the closure Ȳ of Y is the closure K̄ of K. In particular, when K is
closed (as definitely is the case when Rec(X) is polyhedral), it holds Rec(Ȳ ) = K.
Exercise II.34 ♦ [follow-up to Exercise II.33]
1. Let K1 ⊂ Rn , K2 ⊂ Rn be closed cones, and let K = K1 + K2 .
Exercise II.42 Prove the easy part of Theorem II.9.7, specifically, that every n×n permutation
matrix is an extreme point of the polytope Πn of n × n doubly stochastic matrices.
Exercise II.43 ♦ [robust LP] Consider uncertain Linear Programming problem – a family
min_{x∈Rn} { c⊤ x : [A + Σ_{ν=1}^N ζν ∆ν ]x ≤ b + Σ_{ν=1}^N ζν δν } : ζ ∈ Z    (10.3)
(RC) is not an LP program – it has finitely many decision variables and infinite (when Z is
”massive”) system of linear constraints on these variables. Optimization problems of this type
are called semi-infinite and are, in general, difficult to solve. However, the RC of an uncertain
LP is easy, provided that Z is a “computation-friendly” set, for example, nonempty set given
by polyhedral representation:
Z = {ζ : ∃u : P ζ + Qu ≤ r} (10.4)
a⊤ x ≤ b (1)
with uncertain data a ∈ Rn (b is certain) varying in the set
U = {a : |ai − a∗i |/δi ≤ 1, 1 ≤ i ≤ n, Σ_{i=1}^n |ai − a∗i |/δi ≤ k}    (2)
where a∗i are given “nominal data,” δi > 0 are given quantities, and k ≤ n is an integer (in
literature, this is called “budgeted uncertainty”). Rewrite the Robust Counterpart
a⊤ x ≤ b ∀a ∈ U (RC)
in a tractable LO form (that is, write down an explicit system (S) of linear inequalities in
variables x and additional variables such that x satisfies (RC) if and only if x can be extended
to a feasible solution of (S)).
where
• t is time, ω is the frequency,
• r is the distance from A to the origin O, d is the distance from P to the origin, ϕ ∈ [0, π] is
the angle between the directions OP and OA,
• α and θ are responsible for how the oscillator is actuated.
The difference between the left and the right hand sides in (∗) is of order of r⁻² and in all our
subsequent considerations can be completely ignored.
It is convenient to assemble α and θ into the actuation weight – the complex number w = αeıθ
(ı is the imaginary unit); with this convention, we have
where ℜ[·] stands for the real part of a complex number. The complex-valued function DP (ϕ) :
[0, π] → C, called the diagram of the oscillator, is responsible for the directional density of
the energy emitted by the oscillator: when evaluated at certain 3D direction ⃗e, this density
is proportional to |DP (ϕ)|² , where ϕ is the angle between the direction ⃗e and the direction
OP . Physics says that when our transmitting antenna is composed of K harmonic oscillators
located at points P1 , . . . , PK and actuated with weights w1 , . . . , wK , the directional density of the
energy transmitted by the resulting antenna array, as evaluated at a direction ⃗e, is proportional
to |Σk wk Dk (ϕk (⃗e))|² , where ϕk (⃗e) is the angle between the directions ⃗e and OPk .
Consider the design problem as follows. We are given linear array of K oscillators placed
at the points Pk = (k − 1)δe, k ≤ K, where e is the first basic orth (that is, the unit vector
“looking” along the positive direction of the x-axis), and δ > 0 is a given distance between
consecutive oscillators. Our goal is to specify actuation weights wk , k ≤ K, in order to send as
much of total energy as possible along the directions which make at most a given angle γ with
e. To this end, we intend to act as follows:
We want to select actuation weights wk , k ≤ K, in such a way that the magnitude |Dw (ϕ)| of
the complex-valued function
Dw (ϕ) = Σ_{k=1}^K wk e^{2πı(k−1)δ cos(ϕ)}
over w.
To get a computation-friendly version of this problem, we replace the full range [0, π] of values
of ϕ with M -point equidistant grid
Γ = {ϕℓ = ℓπ/(M − 1) : 0 ≤ ℓ ≤ M − 1},
thus converting our design problem into the optimization problem
– up to discretization of ϕ, this is the ratio of the energy emitted in the “cone of interest”
(i.e., along the directions making angle at most γ with e) to the total emitted energy. Factors
sin(ϕℓ ) reflect the fact that when computing the energy emitted in a spatial cone, we should
integrate |D(·)|2 over the part of the unit sphere in 3D cut off the sphere by the cone.
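A sketch of how the discretized diagram and the energy concentration ratio just described can be evaluated for a given weight vector is given below (our own illustration; numpy assumed, and K, δ, γ, M as well as the uniform weights are placeholder choices, not data from the exercise).

```python
import numpy as np

K, delta, gamma, M = 8, 0.5, np.pi / 6, 181      # placeholder problem data
phi = np.arange(M) * np.pi / (M - 1)             # the grid Gamma
w = np.ones(K, dtype=complex) / K                # placeholder actuation weights

# D_w(phi_l) = sum_k w_k exp(2*pi*i*(k-1)*delta*cos(phi_l))
k = np.arange(K)
D = np.exp(2j * np.pi * delta * np.outer(np.cos(phi), k)) @ w

# share of the (sin-weighted) energy emitted within angle gamma of the direction e
energy = np.sin(phi) * np.abs(D) ** 2
concentration = energy[phi <= gamma].sum() / energy.sum()
print("energy concentration for uniform weights:", round(concentration, 4))
```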
2. Now note that “in reality” the optimal weights wkn , k ≤ K are used to actuate physical
devices and as such cannot be implemented with the same 16-digit accuracy with which they
are computed; they definitely will be subject to small implementation errors. We can model
these errors by assuming that the “real life” diagram is
D(ϕ) = Σ_{k=1}^K wkn (1 + ρξk ) e^{2πı(k−1)δ cos(ϕ)}
where ρ ≥ 0 is some (perhaps small) perturbation level and ξk ∈ C are “primitive” per-
turbations responsible for the implementation errors and running through the unit disk
{ξ : |ξ| ≤ 1}. It is not a great sin to assume that ξk are independent across k random
variables uniformly distributed on the unit circumference in C. Now the diagram becomes
random and can violate the constraints of (P ) , unless ρ = 0; in the latter case, the diagram
is the “nominal” one given by the optimal weights wn , so that it satisfies the constraints of
(P ) with t set to Optn .
Now, what happens when ρ > 0? In this case, the diagram D(·) and its deviation v from
the prescribed value 1 at the origin, its sidelobe level l = maxℓ:ϕℓ >γ |D(ϕℓ )|, and energy
concentration become random. A crucial “real life” question is how large are “typical values”
of these quantities. To get impression of what happens, you are asked to carry out the
numerical experiment as follows:
• select perturbation level ρ ∈ {10−ℓ , 1 ≤ ℓ ≤ 6}
• for selected ρ, simulate and plot 100 realizations of the modulus of the actual diagram,
and find the empirical averages v̄ of v, l̄ of l, and C̄ of C.
3. Apply Robust Optimization methodology from Exercise II.43 to build “immunized against
implementation errors” solution to (P ), compute these solutions for perturbation levels 10−ℓ ,
1 ≤ ℓ ≤ 6, and subject the resulting designs to numerical study similar to the one outlined
in the previous item.
Note: (P ) is not a Linear Programming program, so that you cannot formally apply the
results stated in Exercise II.43; what you can apply, is the Robust Optimization “philosophy.”
Exercise II.46 ♦ Prove the statement “symmetric” to the Dubovitski-Milutin Lemma:
The cone M∗ dual to the arithmetic sum of k (closed or not) cones M i ⊂ Rn , i ≤ k, is the
intersection of the k cones M∗i dual to M i .
Exercise II.47 ♦ Prove the following polyhedral version of the Dubovitski-Milutin Lemma:
Let M 1 , . . . , M k be polyhedral cones in Rn , and let M = ∩i M i . The cone M∗ dual to M is the
sum of the cones M∗i , i ≤ k, dual to M i , so that a linear form e⊤ x is nonnegative on M if and only
if it can be represented as the sum of linear forms ei⊤ x nonnegative on the respective cones M i .
Exercise II.48 ♦ [follow-up to Exercise II.47] Let A ∈ Rm×n be a matrix with trivial kernel,
e ∈ Rn , and let the set
X = {x : Ax ≥ 0, e⊤ x = 1} (∗)
be nonempty and bounded. Prove that there exists λ ∈ Rm such that λ > 0 and A⊤ λ = e.
Prove “partial inverse” of this statement: if KerA = {0} and e = A⊤ λ for some λ > 0, the
set (∗) is bounded.
Exercise II.49 ♦ Let E be a linear subspace in Rn , K be a closed cone in Rn , and ℓ(x) :
E → R be a linear (linear, not affine!) function which is nonnegative on K ∩ E. Which of the
following claims are always true:
1. ℓ(·) can be extended from E onto the entire Rn to yield a linear function which is nonnegative
on K
2. Assuming int K ∩ E ̸= ∅, ℓ(·) can be extended from E onto the entire Rn to yield a linear
function which is nonnegative on K.
3. Assuming, in addition to ℓ(x) ≥ 0 for x ∈ K ∩ E, that K = {x : P x ≤ 0} is a polyhedral
cone, ℓ(·) can be extended from E onto the entire Rn to yield a linear function which is
nonnegative on K.
Exercise II.50 Let n > 1. Is the unit ∥ · ∥2 -ball Bn = {x ∈ Rn : ∥x∥2 ≤ 1} a polyhedral set?
Justify your answer.
N 2 4 8 16 32 64 128
U
M
L
where U is the maximal, M is the mean, and L is the minimal # of extreme points observed
when processing 100 samples ω N of a given cardinality
where F is the number of feasible systems, and U is the number of feasible systems with
bounded solution sets.
Intermezzo: related theoretical results originating from [Nem24, Exercise 2.23] are as follows.
Given positive integers m, n with n ≥ 2, consider homogenous system Ax ≤ 0 of m inequal-
ities with n variables. We call this system regular, if its matrix A is regular, regularity of a
matrix B meaning that all square submatrices of B are nonsingular. Clearly, the entries of a
regular matrix are nonzero, and when a p × q matrix B is drawn at random from a probabil-
ity distribution on Rp×q which has a density w.r.t the Lebesgue measure, B is regular with
probability 1.
Given regular m × n homogeneous system of inequalities Ax ≤ 0, let gi (x) = Σ_{j=1}^n Aij xj ,
i ≤ m, so that the gi are nonconstant linear functions. Setting Πi = {x : gi (x) = 0}, we get
a collection of m hyperplanes in Rn passing through the origin. For a point x ∈ Rn , the
signature of x is, by definition, the m-dimensional vector σ(x) of signs of the reals gi (x),
1 ≤ i ≤ m. Denoting by Σ the set of all m-dimensional vectors with entries ±1, for σ ∈ Σ
the set Cσ = {x : σ(x) = σ} is either empty, or is a nonempty open convex set; when it is
nonempty, let us call it a cell associated with A, and the corresponding σ – an A-feasible
signature. Clearly, for regular system, Rn is the union of all hyperplanes Πi and all cells
associated with A. It turns out that
The number N (m, n) of cells associated with a regular homogeneous m × n system
Ax ≤ 0 is independent of the system and is given by a simple recurrence:
N (1, 2) = 2
m ≥ 2, n ≥ 2 =⇒ N (m, n) = N (m − 1, n) + N (m − 1, n − 1) [N (m, 1) = 2, m ≥ 1].
Next, when A is drawn at random from probability distribution P on Rm×n which possesses
symmetric density p, that is, such that p([a1⊤ ; a2⊤ ; . . . ; am⊤ ]) = p([ϵ1 a1⊤ ; ϵ2 a2⊤ ; . . . ; ϵm am⊤ ]) for
all A = [a1⊤ ; a2⊤ ; . . . ; am⊤ ] and all ϵi = ±1, then the probability for a vector σ ∈ Σ to be an
A-feasible signature is
π(m, n) = N (m, n)/2^m .
In particular, the probability for the system Ax ≤ 0 to have a solution set with a nonempty
interior (this is nothing but A-feasibility of the signature [−1; . . . ; −1]) is π(m, n).
The inhomogeneous version of these results is as follows. An m×n system of linear inequalities
Ax ≤ b is called regular, if the matrix [A, −b] is regular. Setting gi (x) = Σ_{j=1}^n Aij xj − bi ,
i ≤ m, the [A, b]-signature of x is, as above, the vector of signs of the reals gi (x). For σ ∈ Σ, the
set Cσ = {x : σ(x) = σ} is either empty, or is a nonempty open convex set; in the latter case,
we call Cσ an [A, b]-cell, and call σ an [A, b]-feasible signature. Setting Πi = {x : gi (x) = 0},
we get m hyperplanes in Rn , and the entire Rn is the union of those hyperplanes and all
[A, b]-cells. It turns out that
The number N (m, n) of cells associated with a regular m × n system Ax ≤ b is independent
of the system and is equal to (1/2)N (m + 1, n + 1).
In addition, when m×(n+1) matrix [A, b] is drawn at random from a probability distribution
on Rm×(n+1) possessing a symmetric density w.r.t. the Lebesgue measure, the probability
for every σ ∈ Σ to be an [A, b]-feasible signature is
π(m, n) = N (m + 1, n + 1)/2^{m+1} .
In particular, the probability for the system Ax ≤ b to be strictly feasible is π(m, n).
2. Accompanying exercise: Prove that if A is m × n regular matrix, then the system Ax ≤ 0
has a nonzero solution if and only if the system Ax < 0 is feasible. Derive from this fact that
if [A, b] is regular, then the system Ax ≤ b is feasible if and only if it is strictly feasible, and
that when the system Ax ≤ 0 has a nonzero solution, the system Ax ≤ b is strictly feasible
for every b.
3. Use the results from Intermezzo to compute the expected values of F and B, see item 1.
Exercise II.54 ▲ [computational study]
1. For ν = 1, 2, . . . , 6, generate 100 systems of linear inequalities Ax ≤ b with n = 2^ν variables
and m = 2n inequalities, the entries in A, b being drawn, independently of each other, from
N (0, 1). Fill the following table:
n 2 4 8 16 32 64
F
E{F }
B
F : # of feasible systems in sample;
B: # of feasible systems with bounded solution sets
To compute the expected value of F , use the results from [Nem24, Exercise 2.23] cited in
item 2 of Exercise II.53.
2. Carry out experiment similar to the one in item 1, but with m = n + 1 rather than m = 2n.
n 2 4 8 16 32 64
F
E{F }
B
E{B}
F : # of feasible systems in sample;
B: # of feasible systems with bounded solution sets
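A possible starting point for these experiments is sketched below (our own, with scipy.optimize.linprog used both for the feasibility test and for checking that the recessive cone {h : Ah ≤ 0} is trivial; only small n are run here to keep the experiment fast).

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(4)

def is_feasible(A, b):
    res = linprog(np.zeros(A.shape[1]), A_ub=A, b_ub=b, bounds=[(None, None)] * A.shape[1])
    return res.status == 0

def is_bounded(A):
    # a nonempty set {x : Ax <= b} is bounded iff {h : Ah <= 0} = {0};
    # we test this by maximizing +/- h_i over {Ah <= 0, -1 <= h <= 1}
    n = A.shape[1]
    for i in range(n):
        for sign in (1.0, -1.0):
            c = np.zeros(n)
            c[i] = -sign                        # maximize sign * h_i
            res = linprog(c, A_ub=A, b_ub=np.zeros(A.shape[0]), bounds=[(-1, 1)] * n)
            if -res.fun > 1e-7:
                return False
    return True

for n in (2, 4, 8):
    F = B = 0
    for _ in range(100):
        A, b = rng.normal(size=(2 * n, n)), rng.normal(size=2 * n)
        if is_feasible(A, b):
            F += 1
            B += is_bounded(A)
    print(f"n = {n}: feasible {F}/100, of which bounded {B}")
```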
11 Proofs of Facts from Part II
Proof. When a linear form a⊤ x separates S and T , (a) holds true. Given (a), (b) could be
violated if and only if inf_{x∈S} a⊤ x = sup_{y∈T} a⊤ y. But, together with (a), this can happen only if a⊤ x is
constant on S ∪ T , which is not the case as a⊤ x separates S and T . The above reasoning clearly
can be reversed: given (b), we have a ̸= 0, and given (a), both sup_{x∈S} a⊤ x and inf_{y∈T} a⊤ y are
real numbers. Selecting b in-between these real numbers, the hyperplane a⊤ x = b clearly
separates S and T . The “strong separation” claim is evident.
Fact II.8.4 Let M be a nonempty convex set and let x ∈ M . Then, x is an extreme
point of M if and only if any (and then all) of the following holds:
(i) the only vector h such that x ± h ∈ M is the zero vector;
(ii) in every representation x = Σ_{i=1}^m λi xi of x as a convex combination, with positive
coefficients, of points xi ∈ M , i ≤ m, one has x1 = . . . = xm = x;
(iii) the set M \ {x} is convex.
Proof.
(i): If x is an extreme point and x ± h ∈ M , then h = 0, since otherwise x = (1/2)(x + h) + (1/2)(x − h),
implying that x is an interior point of a nontrivial segment [x − h, x + h], which is impossible.
For the other direction, assume for contradiction that x ± h ∈ M implies h = 0 and that x
is not an extreme point of M . Then, as x ̸∈ Ext(M ), there exist u, v ∈ M where both u, v
are not equal to x and λ ∈ (0, 1) such that x = λu + (1 − λ)v. As u ̸= x and v ̸= x while
x = λu+(1−λ)v, we conclude that u ̸= v. Now, consider any δ > 0 such that δ < min{λ, 1−λ}
and define h := δ(u − v). Note that h ̸= 0 and x + h = (λ + δ)u + (1 − λ − δ)v ∈ M and
x − h = (λ − δ)u + (1 − λ + δ)v ∈ M due to λ ± δ ∈ (0, 1), u, v ∈ M and convexity of M .
This then leads to the desired contradiction with our assumption that x ± h ∈ M implies
that h = 0.
As a byproduct of our reasoning, we see that if x ∈ M can be represented as x = λu+(1−λ)v
with u, v ∈ M , λ ∈ (0, 1], and u ̸= x, then x is not an extreme point of M .
(ii): In one direction, when x is not an extreme point of M , there exists h ̸= 0 such that x±h ∈ M
so that x = (1/2)(x + h) + (1/2)(x − h) is a convex combination with positive coefficients and using
two points x ± h that are both in M and are distinct from x. To prove the opposite direction,
let x be an extreme point of M and suppose x = Σ_{i=1}^m λi xi with λi > 0, Σi λi = 1, and
let us prove that x1 = . . . = xm = x. Indeed, assume for contradiction that at least one
of the xi , say, x1 , differs from x, and m > 1. Since λ2 > 0, we have 0 < λ1 < 1. Then, the
point v := (1 − λ1 )−1 Σ_{i=2}^m λi xi is well defined. Moreover, as Σ_{i=2}^m λi = 1 − λ1 , v is a convex
combination of x2 , . . . , xm and therefore v ∈ M . Then, x = λ1 x1 + (1 − λ1 )v with x, x1 , v ∈ M ,
λ1 ∈ (0, 1], and x1 ̸= x, which, by the concluding comment in item (i) of the proof, implies
that x ̸∈ Ext(M ); this is the desired contradiction.
(iii): In one direction, let x be an extreme point of M ; let us prove that the set M ′ := M \ {x} is
convex. Assume for contradiction that this is not the case. Then, there exist u, v ∈ M ′ and
λ ∈ [0, 1] such that x̄ := λu + (1 − λ)v ̸∈ M ′ , implying that 0 < λ < 1 (since u, v ∈ M ′ ). As
M is convex, we have x̄ ∈ M , and since x̄ ̸∈ M ′ and M \ M ′ = {x}, we conclude that x̄ = x.
Thus, x is a convex combination, with positive coefficients, of two distinct from x points from
M , contradicting, by already proved item (ii), the fact that x is an extreme point of M . For
the other direction, suppose that M \ {x} is convex and we will prove that x must be an
extreme point of M . Assume for contradiction that x ̸∈ Ext(M ). Then, there exists h ̸= 0
such that x ± h ∈ M . As h ̸= 0, both x + h and x − h are distinct from x, thus x ± h ∈ M \ {x}.
We see that x ± h ∈ M \ {x}, x = (1/2)(x + h) + (1/2)(x − h), and x ∉ M \ {x}, contradicting the
convexity of M \ {x}.
Fact II.8.5 All extreme points of the convex hull Conv(Q) of a set Q belong to Q:
Ext(Conv(Q)) ⊆ Q.
Proof. Assume for contradiction that x ∈ Ext(Conv(Q)) and x ̸∈ Q. As x ∈ Ext(Conv(Q)),
by Fact II.8.4.(iii) the set Conv(Q) \ {x} is convex and contains Q, contradicting the fact that
Conv(Q) is the smallest convex set containing Q.
Fact II.8.14 Let M ⊆ Rn be a nonempty closed convex set. Recall its closed conic
transform is given by
ConeT(M ) = cl {[x; t] ∈ Rn × R : t > 0, x/t ∈ M } ,
(see section 1.5). Then,
Rec(M ) = {h ∈ Rn : [h; 0] ∈ ConeT(M )} .
Proof. Let h be such that [h; 0] ∈ ConeT(M ), and let us prove that h ∈ Rec(M ). There is nothing
to prove when h = 0, thus assume that h ̸= 0. Since the vectors g such that [g; 0] ∈ ConeT(M ) form a
closed cone and both ConeT(M ) and Rec(M ) are cones as well, we lose nothing when assuming,
in addition to [h; 0] ∈ ConeT(M ) and h ̸= 0, that h is a unit vector. Since [h; 0] ∈ ConeT(M ), by
definition of the latter set there exists a sequence [ui ; ti ] → [h; 0], i → ∞, such that ti > 0 and
xi := ui /ti ∈ M for all i. Then, this together with ui → h and ∥h∥2 = 1 imply that ∥xi ∥2 → ∞
and ti ∥xi ∥2 → 1 as i → ∞. As a result, limi→∞ xi /∥xi ∥2 = limi→∞ ui = h. By Fact II.8.13(ii),
we see that h ∈ Rec(M ).
For the reverse direction, consider any h ∈ Rec(M ), and let us prove that [h; 0] ∈ ConeT(M ).
There is nothing to prove when h = 0, so we assume h ̸= 0. Consider any x̄ ∈ M and define
xi := x̄ + ih, i = 1, 2, . . .. As h ∈ Rec(M ), we have xi ∈ M for all i. Moreover, ∥xi ∥2 → ∞ as
i → ∞ due to h ̸= 0. We clearly have limi→∞ [xi /∥xi ∥2 ; 1/∥xi ∥2 ] = [h/∥h∥2 ; 0], and the vectors
[y i ; ti ] := [xi /∥xi ∥2 , 1/∥xi ∥2 ] for all large enough i satisfy the requirement ti > 0, y i /ti ∈ M , so
[y i ; ti ] ∈ ConeT(M ) for all large enough i. As ConeT(M ) is closed and [y i ; ti ] → [h/∥h∥2 ; 0] as
i → ∞, we deduce [h/∥h∥2 ; 0] ∈ ConeT(M ). Finally, ConeT(M ) is a cone, so [h; 0] ∈ ConeT(M )
as well.
Fact II.8.15 For any nonempty polyhedral set M = {x ∈ Rn : Ax ≤ b}, its recessive
cone is given by
Rec(M ) = {h ∈ Rn : Ah ≤ 0} ,
i.e., Rec(M ) is given by homogeneous version of linear constraints specifying M .
Proof. Consider any h such that Ah ≤ 0. Then, for any x̄ ∈ M , and t ≥ 0, we have A(x̄ + th) =
Ax̄ + tAh ≤ Ax̄ ≤ b, so x̄ + th ∈ M for all t ≥ 0. Hence, h ∈ Rec(M ). For the reverse direction,
suppose h ∈ Rec(M ) and x̄ ∈ M . Then, for all t ≥ 0 we have A(x̄ + th) ≤ b. This is equivalent
to Ah ≤ t−1 (b − Ax̄) for all t > 0, which implies that Ah ≤ 0.
Fact II.8.23 Let M be a closed cone in Rn , and let M∗ be the cone dual to M .
Then
(i) Duality does not distinguish between a cone and its closure: whenever M = cl M ′
for a cone M ′ , we have M∗ = M∗′ .
(ii) Duality is symmetric: the cone dual to M∗ is M .
(iii) One has
int M∗ = {y ∈ Rn : y ⊤ x > 0, ∀x ∈ M \ {0}} ,
(iv) The cone dual to the direct product M1 × . . . × Mm of cones Mi is the direct
product of their duals: [M1 × . . . × Mm ]∗ = [M1 ]∗ × . . . × [Mm ]∗ .
Proof.
(i): This is evident.
(ii): By definition, any x ∈ M satisfies x⊤ y ≥ 0 for all y ∈ M∗ , hence M ⊆ [M∗ ]∗ . To prove
M = [M∗ ]∗ , assume for contradiction that there exists x̄ ∈ [M∗ ]∗ \M . By Separation Theorem,
{x̄} can be strongly separated from M , i.e., there exists y such that
y ⊤ x̄ < inf y ⊤ x.
x∈M
As M is a conic set and the right hand side infimum is finite, this infimum must be 0. Thus,
y ⊤ x̄ < 0 while y ⊤ x ≥ 0 for all x ∈ M , implying y ∈ M∗ . But then this contradicts
x̄ ∈ [M∗ ]∗ .
(iii): Let us prove that int M∗ ̸= ∅ if and only if M is pointed. If M is not pointed, then ±h ∈ M
for some h ̸= 0, implying that y ⊤ [±h] ≥ 0 for all y ∈ M∗ , that is, y ⊤ h = 0 for all y ∈ M∗ .
Thus, when M is not pointed, M∗ belongs to a proper (smaller than the entire Rn ) linear
subspace of Rn and thus int M∗ = ∅. This reasoning can be reversed: when int M∗ = ∅,
the affine hull Aff(M∗ ) of M∗ cannot be the entire Rn (since int M∗ = ∅ and rint M∗ ̸= ∅);
taking into account that 0 ∈ M∗ , we have Aff(M∗ ) = Lin(M∗ ), so that Lin(M∗ ) ⫋ Rn , and
therefore there exists a nonzero h orthogonal to Lin(M∗ ). We have y ⊤ [±h] = 0 for all y ∈ M∗ ,
implying that h and −h belong to cone dual to M∗ , that is, to M (due to the already verified
item (ii)). Thus, for some nonzero h it holds ±h ∈ M , that is, M is not pointed.
Now let us prove that y ∈ int M∗ if and only if y ⊤ x > 0 for every x ∈ M \ {0}. In one
direction: assume that y ∈ int M∗ , so that for some r > 0 it holds y + δ ∈ M∗ for all δ with
∥δ∥2 ≤ r. If now x ∈ M , we have 0 ≤ minδ:∥δ∥2 ≤r [y + δ]⊤ x = y ⊤ x − r∥x∥2 . Thus,
y ∈ int M∗ =⇒ ∥x∥2 ≤ (1/r) y ⊤ x, ∀x ∈ M, (*)
implying that y ⊤ x > 0 for all x ∈ M \ {0}, as required. In the opposite direction: assume that
y ⊤ x > 0 for all x ∈ M \ {0}, and let us prove that y ∈ int M∗ . There is nothing to prove when
M = {0} (and therefore M∗ = Rn ). Assuming M ̸= {0}, let M̄ = {x ∈ M : ∥x∥2 = 1}. This
set is nonempty (since M ̸= {0}), is closed (as M is closed), and is clearly bounded, and thus
is compact. We are in the situation when y ⊤ x > 0 for x ∈ M̄ , implying that minx∈M̄ y ⊤ x
(this minimum is achieved since M̄ is a nonempty compact set) is strictly positive. Thus,
y ⊤ x ≥ r > 0 for all x ∈ M̄ , whence [y + δ]⊤ x ≥ 0 for all x ∈ M̄ and all δ with ∥δ∥2 ≤ r. Due
to the origin of M̄ , the inequality [y + δ]⊤ x ≥ 0 for all x ∈ M̄ implies that [y + δ]⊤ x ≥ 0 for
all x ∈ M . The bottom line is that the Euclidean ball of radius r centered at y belongs to
M∗ , and therefore y ∈ int M∗ , as claimed.
Now let us prove the “Moreover” part of item (iii). Thus, let the cone M be closed, pointed,
and nontrivial. Consider any y ∈ int M∗ , then the set My , first, contains some positive
multiple of every nonzero vector from M and thus is nonempty (since M ̸= {0}) and, second,
is bounded (by (*)). Since My is closed (as M is closed), we conclude that My is a nonempty
compact set. Thus, the left hand side set in (8.7) is contained in the right hand side one. To
prove the opposite inclusion, let y ∈ Rn be such that My is a nonempty compact set, and let
us prove that y ∈ int M∗ . By the already proved part of item (iii), all we need is to verify that
if x ̸= 0 and x ∈ M , then y ⊤ x > 0. Assume for contradiction that there exists x̄ ∈ M \ {0}
such that α := −y ⊤ x̄ ≥ 0. Then, by selecting any x̂ ∈ My (My is nonempty!) and setting
e = αx̂ + x̄, we get e ∈ M and y ⊤ e = 0. Note that e ̸= 0; indeed, e = 0 means that the nonzero
vector x̄ ∈ M is such that −x̄ = αx̂ ∈ M , contradicting pointedness of M . The bottom line
is that e ∈ M \ {0} and y ⊤ e = 0, whence e is a nonzero recessive direction of My . This is the
desired contradiction as My is compact!
(iv): This is evident.
Fact II.8.28 Let M ⊆ Rn be a cone and M∗ be its dual cone. Then, for any
x ∈ int M , there exists a properly selected Cx < ∞ such that
∥f ∥2 ≤ Cx f ⊤ x, ∀f ∈ M∗ .
Proof. Since x ∈ int M , there exists ρ > 0 such that x − δ ∈ M whenever ∥δ∥2 ≤ ρ. Then, as
f ∈ M∗ , we have f ⊤ (x − δ) ≥ 0 for any ∥δ∥2 ≤ ρ , i.e., f ⊤ x ≥ supδ {f ⊤ δ : ∥δ∥2 ≤ ρ} = ρ∥f ∥2 .
Taking Cx := 1/ρ (note that Cx < ∞ as ρ > 0) gives us the desired relation.
Fact II.8.33. Let M ⊆ Rn be a nontrivial closed cone, and M∗ be its dual cone.
(i) M is pointed
(i.1) if and only if M does not contain straight lines,
(i.2) if and only if M∗ has a nonempty interior, and
(i.3) if and only if M has a base.
(ii) Set (8.9) is a base of M
(ii.1) if and only if f ⊤ x > 0 for all x ∈ M \ {0},
(ii.2) if and only if f ∈ int M∗ .
In particular, f ∈ int M∗ if and only if f ⊤ x > 0 whenever x ∈ M \ {0}.
(iii) Every base of M is nonempty, closed, and bounded. Moreover, whenever M is
pointed, for any f ∈ M∗ such that the set (8.9) is nonempty (note that this set is
always closed for any f ), this set is bounded if and only if f ∈ int M∗ , in which case
(8.9) is a base of M .
(iv) M has extreme rays if and only if M is pointed. Furthermore, when M is pointed,
there is one-to-one correspondence between extreme rays of M and extreme points
of a base B of M : specifically, the ray R := R+ (d), d ∈ M \ {0} is extreme if and
only if R ∩ B is an extreme point of B.
Proof. (i.1): Since M is closed, convex, and contains the origin, M contains a line if and only
if M contains a line passing through the origin, and since M is conic, the latter happens if and
only if M is not pointed.
(i.2): This is precisely Fact II.8.23(iii).
(i.3): As we have seen, (8.9) is a base of M if and only if f ⊤ x > 0 for all x ∈ M \ {0}, which,
by Fact II.8.23(iii), holds if and only if f ∈ int M∗ .
(ii.1): This was explained when defining a base.
(ii.2): This is given by Fact II.8.23(iii).
(iii): Suppose B is a base of M . Then, B is nonempty since B intersects all nontrivial rays
in M emanating from the origin, and the set of these rays is nonempty since M is nontrivial.
Closedness of B is evident. To prove that B is bounded, note that by (ii.2) f ∈ int M∗ . Thus,
there exists r > 0 such that f − e ∈ M∗ , for all ∥e∥2 ≤ r. Hence, [f − e]⊤ x ≥ 0 for all x ∈ M
and all e with ∥e∥2 ≤ r, implying f ⊤ x ≥ r∥x∥2 for all x ∈ M , and therefore ∥x∥2 ≤ r−1 for all
x ∈ B.
Next, let M be pointed and f ∈ M∗ be such that the set (8.9) is nonempty. Closedness of
this set is evident. Let us show that this set is bounded if and only if f ∈ int M∗ . Indeed, when
f ∈ int M∗ , B is a base of M by (ii.2) and therefore, as we have just seen, B is bounded. For
the other direction, suppose that f ̸∈ int M∗ . Then, by Fact II.8.23(iii), there exists x̄ ∈ M \ {0}
such that f ⊤ x̄ = 0. Also, as the set (8.9) is nonempty, there exists x̂ ∈ M such that f ⊤ x̂ = 1.
Now, observe that for any λ ∈ [0, 1) the vector (1 − λ)−1 [(1 − λ)x̂ + λx̄] belongs to B and the
norm of this vector goes to +∞ as λ → 1. But, then this implies that B is unbounded, and so
the proof of (iii) is completed.
(iv): Suppose M is not pointed. Then, there exists a direction e ̸= 0 such that M contains
the line generated by e; in particular ±e ∈ M . Assume for contradiction that d is an extreme
direction of M . Then, as M is a closed convex cone, d ± te ∈ M for all t ∈ R. Thus, as M is a
cone, we have d± (t) := (1/2)[d ± te] ∈ M for all t. Let us first suppose that e is not collinear to d,
then for any t ̸= 0, the vector d± (t) is not a nonnegative multiple of d, but then this contradicts
d being an extreme direction of M . So, we now suppose that e is collinear to d. But, in this case,
for large enough t, one of the vectors d± (t), while being a multiple of d, is not a nonnegative
multiple of d, which again is impossible. Thus, when M is not pointed, M does not have extreme
rays.
Now let M be pointed, and let the set B given by (8.9) be a base of M (a base does exist
by (i.3)). B is a nonempty closed and bounded convex set by (iii). Let us verify that the rays
R+ (d) spanned by the extreme points of B are exactly the extreme rays of M . First, suppose
d ∈ Ext(B), and let us prove that d is an extreme direction of M . Indeed, let d = d1 + d2 for
some d1 , d2 ∈ M ; we should prove that d1 , d2 are nonnegative multiples of d. There is nothing
to prove when one of the vectors d1 , d2 is zero, so we assume that both d1 , d2 are nonzero. Then,
since B is a base, by (ii.1) we have αi := f ⊤ di > 0, i = 1, 2. Moreover, α1 + α2 = f ⊤ d = 1.
Setting d̄i := αi−1 di , i = 1, 2, we have d̄i ∈ B, i = 1, 2, and α1 d̄1 + α2 d̄2 = d1 + d2 = d. Recalling
that d is an extreme point of B and αi > 0, i = 1, 2, we conclude that d̄1 = d̄2 = d, that is,
d1 and d2 are positive multiples of d, as claimed. For the reverse direction, let d be an extreme
direction of M . We need to prove that the intersection of the ray R+ (d) and B (this intersection
is nonempty since d ∈ M \ {0}) is an extreme point of B. Passing from extreme direction d to
its positive multiple, we can assume that d ∈ B. To prove that d ∈ Ext(B), assume that there
exists h such that d ± h ∈ B and let us verify that h = 0. Indeed, as d ∈ B we have f ⊤ d = 1,
while from d ± h ∈ B we conclude that f ⊤ h = 0. Therefore, when h ̸= 0, h is not a multiple of
d, whence the vectors d ± h are not multiples of d. On the other hand, both of the vectors d ± h
belong to M and d is their average, which contradicts the fact that d is an extreme direction of
M . Thus, h = 0, as claimed.
we claim that C < ∞. Taking this claim for granted, observe that C < ∞ implies, by
homogeneity, that supf ∈−ϵB,z∈M f ⊤ z ≤ ϵC for all ϵ > 0, hence for properly selected small
positive ϵ the ball −ϵB is contained in Polar (M ), implying int(Polar (M )) ̸= ∅, which is a
desired contradiction.
It remains to justify the above claim. To this end assume that C = +∞, and let us lead this
assumption to a contradiction. When C = +∞, there exists a sequence fi ∈ −B and zi ∈ M
such that fi⊤ zi → +∞ as i → ∞, implying, due to fi ∈ −B, that ∥zi ∥2 → ∞ as i → ∞.
Passing to a subsequence, we can assume that zi /∥zi ∥2 → h as i → ∞. Then, by its origin,
h is an asymptotic direction of M and therefore is a unit vector from K (Fact II.8.13(ii)).
Assuming w.l.o.g. zi ̸= 0 for all i, we have
fi⊤ zi = ∥zi ∥2 [fi⊤ h + fi⊤ (zi /∥zi ∥2 − h)] , where we set αi := fi⊤ h and βi := fi⊤ (zi /∥zi ∥2 − h). (!)
Convex Functions
12
x ln x;
• functions convex on the positive ray:
1/xp , where p > 0;
− ln x.
At the moment it is not clear why these functions are convex. We will soon
derive a simple analytic criterion for detecting convexity which will immediately
demonstrate that the above functions indeed are convex. ♢
A very convenient equivalent definition of a convex function is in terms of its
epigraph. Given a real-valued function f defined on a subset Q of Rn , we define
its epigraph as the set
epi{f } := {[x; t] ∈ Rn+1 : x ∈ Q, t ≥ f (x)} .
Geometrically, to define the epigraph, we plot the graph of the function, i.e., the
surface {(x, t) ∈ Rn+1 : x ∈ Q, t = f (x)} in Rn+1 , and add to this surface
all points which are “above” it. Epigraph allows us to give an equivalent, more
geometrical, definition of a convex function as follows.
the ℓ∞ -norm ∥x∥∞ = maxi |xi |. It was also claimed (although not proved) that
these are three members from an infinite family of norms
∥x∥p := (Σ_{i=1}^n |xi |p )^{1/p} , where 1 ≤ p ≤ ∞
(the right hand side of the latter relation for p = ∞ is, by definition, maxi |xi |).
We say that a function f : Rn → R is positively homogeneous of degree 1 if it
satisfies
f (tx) = tf (x), ∀x ∈ Rn , t ≥ 0.
Also, we say that the function f : Rn → R is subadditive if it satisfies
f (x + y) ≤ f (x) + f (y), ∀x, y ∈ Rn .
Note that every norm is positively homogeneous of degree 1 and subadditive.
We are about to prove that all such functions (in particular, all norms) are con-
vex:
Proof. Note that the points [xi ; f (xi )] belong to the epigraph of f . As f is convex,
its epigraph is a convex set. Then, for any λ ∈ RN+ satisfying Σ_{i=1}^N λi = 1, we
have that the corresponding convex combination of the points given by
Σ_{i=1}^N λi [xi ; f (xi )] = [ Σ_{i=1}^N λi xi ; Σ_{i=1}^N λi f (xi ) ]
also belongs to epi{f }. By definition of the epigraph, this means exactly that
Σ_{i=1}^N λi f (xi ) ≥ f ( Σ_{i=1}^N λi xi ).
Note that the definition of convexity of a function f is exactly the requirement
on f to satisfy the Jensen inequality for the case of N = 2. We see that to satisfy
this inequality for N = 2 is the same as to satisfy it for all N ≥ 2.
Remark III.12.5 An instructive interpretation of Jensen’s inequality is as fol-
lows: Given a convex function f , consider a discrete random variable x taking
values xi ∈ Dom f , i ≤ N , with probabilities λi . Then,
f (E[x]) ≤ E[f (x)],
where E[·] stands for the expectation operator. The resulting inequality, under
mild regularity conditions, holds true for general type random vectors x taking
values in Dom f with probability 1. ■
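A quick numerical illustration of this form of Jensen's inequality (a sketch only; the convex function, here log-sum-exp, and the discrete distribution are arbitrary choices, and numpy is assumed):

```python
import numpy as np

rng = np.random.default_rng(5)
f = lambda z: np.log(np.exp(z).sum())           # a convex function on R^n (log-sum-exp)
N, n = 10, 4
points = rng.normal(size=(N, n))                # the values x_1, ..., x_N
lam = rng.dirichlet(np.ones(N))                 # the probabilities lambda_i

lhs = f(lam @ points)                           # f(E[x])
rhs = lam @ np.array([f(p) for p in points])    # E[f(x)]
assert lhs <= rhs + 1e-12
print(f"f(E[x]) = {lhs:.4f} <= E[f(x)] = {rhs:.4f}")
```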
The simplest function with a given domain Q is identically zero on Q and iden-
tically +∞ outside of Q. This function, called the characteristic (a.k.a. indicator )
function of Q 1 is convex if and only if Q is a convex set.
It is convenient to think of a convex function as of something which is defined
everywhere, since it saves a lot of words. For example, with this convention we
can write f + g (f and g are convex functions on Rn ), and everybody will under-
stand what is meant. Without this convention, we would have to add to this
expression the following explanation as well: “f + g is a function with the domain
being the intersection of those of f and g, and in this intersection it is defined as
(f + g)(x) = f (x) + g(x).”
1 This terminology is standard for Convex Analysis; in other areas of Math, characteristic, a.k.a.
indicator, function of a set Q ⊂ Rn is defined as the function equal to 1 on the set and to 0 outside
of it.
13 How to detect convexity
In an optimization problem
min_x {f (x) : gj (x) ≤ 0, j = 1, . . . , m}
Imagine how many extra words would be necessary here if there were no con-
vention on the value of a convex function outside its domain!
In the Convex Monotone superposition rule, the monotone nondecreasing property
of F is crucial. (Look what happens when n = K = 1, f1 (x) = x², F (z) = −z).
This rule, however, admits the following two useful variants where the mono-
tonicity requirement is somehow relaxed (the justifications of these variants are
left to the reader):
inequalities, we get
λg(x) + (1 − λ)g(x′ ) + ϵ ≥ λf (x, yϵ ) + (1 − λ)f (x′ , yϵ′ )
≥ f (λx + (1 − λ)x′ , λyϵ + (1 − λ)yϵ′ )
= f (x′′ , λyϵ + (1 − λ)yϵ′ ),
where the last inequality follows from the convexity of f . By definition of g(x′′ )
we have f (x′′ , λyϵ + (1 − λ)yϵ′ ) ≥ g(x′′ ), and thus we get λg(x) + (1 − λ)g(x′ ) +
ϵ ≥ g(x′′ ). In particular, x′′ ∈ Dom g (recall that x, x′ ∈ Dom(g) and thus
g(x), g(x′ ) ∈ R). Moreover, since the resulting inequality is valid for all ϵ > 0,
we come to g(x′′ ) ≤ λg(x) + (1 − λ)g(x′ ), as required.
• Perspective transform of a convex function: Given a convex function f on Rn ,
we define the function g(x, y) := yf (x/y) with the domain {[x; y] ∈ Rn+1 : y >
0, x/y ∈ Dom f } to be its perspective function. The perspective function of a
convex function is convex.
Let us first examine a direct justification of this. Consider any [x′ ; y ′ ] and
[x′′ ; y ′′ ] from Dom g and any λ ∈ [0, 1]. Define x := λx′ + (1 − λ)x′′ , y :=
λy ′ + (1 − λ)y ′′ . Then, y > 0. We also define λ′ := λy ′ /y and λ′′ := (1 − λ)y ′′ /y,
so that λ′ , λ′′ ≥ 0 and λ′ + λ′′ = 1. As f is convex, we deduce x/y = λx′ /y +
(1 − λ)x′′ /y = λ′ x′ /y ′ + λ′′ x′′ /y ′′ = λ′ x′ /y ′ + (1 − λ′ )x′′ /y ′′ ∈ Dom f and
f (x/y) ≤ λ′ f (x′ /y ′ ) + (1 − λ′ )f (x′′ /y ′′ ). Thus, as y > 0, we arrive at yf (x/y) ≤
yλ′ f (x′ /y ′ ) + y(1 − λ′ )f (x′′ /y ′′ ) = λ[y ′ f (x′ /y ′ )] + (1 − λ)[y ′′ f (x′′ /y ′′ )], that is,
g(x, y) ≤ λg(x′ , y ′ ) + (1 − λ)g(x′′ , y ′′ ).
Here is an alternative smarter justification. There is nothing to prove when
Dom f = ∅. So, suppose that Dom f ̸= ∅. Consider the epigraph epi(f ) =
{[x; s] : s ≥ f (x)} and the perspective transform of this nonempty convex set
which is given by (see section 1.5)
Persp(epi{f }) := {[[x; s]; t] ∈ Rn+2 : t > 0, [x/t; s/t] ∈ epi{f }}
where the second from last equality follows from the fact that by definition of
g(x, t), whenever t > 0, the inclusion x/t ∈ Dom f takes place if and only if
[x; t] ∈ Dom g. Thus, we observe that Persp(epi{f }) is nothing but the image
of epi{g} under the one-to-one linear transformation [x; t; s] 7→ [x; s; t]. As
Persp(epi{f }) is a convex set (recall from section 1.5 that the perspective
transform of a nonempty convex set is convex), we conclude that g is convex.
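Here is a small numerical probe of the convexity of a perspective function (a sketch only: f (x) = ∥x∥2² is used, so that g(x, y) = ∥x∥2²/y, and the two-point convexity inequality is tested on random data with numpy).

```python
import numpy as np

rng = np.random.default_rng(6)
n = 3
g = lambda x, y: (x @ x) / y                    # perspective of f(x) = ||x||_2^2, for y > 0

for _ in range(1000):
    x1, x2 = rng.normal(size=n), rng.normal(size=n)
    y1, y2 = rng.uniform(0.1, 2.0), rng.uniform(0.1, 2.0)
    lam = rng.uniform()
    x, y = lam * x1 + (1 - lam) * x2, lam * y1 + (1 - lam) * y2
    assert g(x, y) <= lam * g(x1, y1) + (1 - lam) * g(x2, y2) + 1e-9
print("the perspective of ||x||_2^2 passes the two-point convexity test on all sampled pairs")
```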
Now that we know what the basic operations preserving convexity of a function
are, let us look at the standard convex functions these operations can be applied
to. We have already seen several examples in Example III.12.2; but we still do
not know why these functions are convex. The usual way to check convexity of a
Figure III.1. Univariate convex function f : [x, y] → R. The average rate of change of f
on the entire segment [x, y] is in-between the average rates of change “at the beginning,”
i.e., when passing from x to z and “at the end,” i.e., when passing from z to y.
When λ ∈ (0, 1), the pair (1, λ) is a positive multiple of the pair (y−x, zλ −x), thus
(13.2) is equivalent to (y − x)(f (zλ ) − f (x)) ≤ (zλ − x)(f (y) − f (x)). Note that this
inequality is the same as (f (zλ ) − f (x))/(zλ − x) ≤ (f (y) − f (x))/(y − x). When λ runs through the interval
(0, 1) the point zλ runs through the entire set {z : x < z < y}, and so we conclude
that f is convex if and only if for every triple x < z < y with x, y ∈ Dom f the first
inequality in (13.1) holds true. As every one of inequalities in (13.1) implies the
other two, this justifies our “average rate of change” characterization of univariate
convexity.
In the case of multivariate convex functions, we have the following immediate
consequence of the preceding observations.
Lemma III.13.1 Let x, x′ , x′′ be three distinct points in Rn with x′ ∈ [x, x′′ ].
Then, for any convex function f that is finite on [x, x′′ ], we have
(f (x′ ) − f (x))/∥x′ − x∥2 ≤ (f (x′′ ) − f (x))/∥x′′ − x∥2 . (13.3)
Proof. Under the premise of the lemma, define ϕ(t) := f (x + t(x′′ − x)) and let
λ ∈ R be such that x′ = x + λ(x′′ − x). Note that λ ∈ (0, 1) as x′ ∈ [x, x′′ ] and
the points x, x′ , x′′ are all distinct from each other. As it was explained at the
beginning of this section, the univariate function ϕ is convex along with f , and
0, 1, λ ∈ Dom ϕ. Applying the first inequality in (13.1) to ϕ in the role of f and
the triple (0, λ, 1) in the role of the triple (x, z, y), we get (f (x′ ) − f (x))/λ ≤ f (x′′ ) − f (x),
which, due to λ = ∥x′ − x∥2 /∥x′′ − x∥2 , is nothing but (13.3).
To sum up, to detect convexity of a function, in principle, it suffices to know
how to detect convexity of functions of a single variable. Moreover, this latter
question can be resolved by the standard Calculus tools.
Proof.
(i): We start by proving the necessity of the stated condition. Suppose that f
is differentiable and convex on (a, b). We will prove that then f ′ is monotonically
nondecreasing. Let x < y be two points from the interval (a, b), and let us prove
that f ′ (x) ≤ f ′ (y). Consider any z ∈ (x, y). Invoking convexity of f and applying
(13.1), we have
(f (z) − f (x))/(z − x) ≤ (f (y) − f (z))/(y − z).
Passing to limit as z → x + 0, we get
f ′ (x) ≤ (f (y) − f (x))/(y − x),
and passing to limit in the same inequality as z → y − 0, we arrive at
(f (y) − f (x))/(y − x) ≤ f ′ (y),
and so f ′ (x) ≤ f ′ (y), as claimed.
Let us now prove the sufficiency of the condition in (i). Thus, we assume that
f ′ exists and is nondecreasing on (a, b), and we will verify that f is convex on
(a, b). By “average rate of change” description of the convexity of a univariate
function, all we need is to verify that if x < z < y and x, y ∈ (a, b), then
(f (z) − f (x))/(z − x) ≤ (f (y) − f (z))/(y − z).
This is indeed evident: by the Lagrange Mean Value Theorem, the left hand side
ratio is f ′ (u) for some u ∈ (x, z), and the right hand side one is f ′ (v) for some
v ∈ (z, y). Since v > u and f ′ is nondecreasing on (a, b), we conclude that the
left hand side ratio is indeed less than or equal to the right hand side one.
(ii): This part is an immediate consequence of (i) as we know from Calculus
that a differentiable function — in our case now this is the function f ′ — is mono-
tonically nondecreasing on an interval if and only if its derivative is nonnegative
on this interval.
Proposition III.13.2 immediately allows us to verify the convexity of functions
listed in Example III.12.2. To this end, the only difficulty which we may encounter
is that some of these functions (e.g., xp with p ≥ 1, and −xp with 0 ≤ p ≤ 1)
are claimed to be convex on the half-interval [0, +∞), while Proposition III.13.2
talks about convexity of functions on open intervals. This difficulty can be ad-
dressed with the following simple result which allows us to extend the convexity
of continuous functions beyond open sets.
Proof. The “only if” part is evident: if f is convex on Q and x ∈ int Q, then for
any fixed direction h ∈ Rn the function g : R → R ∪ {+∞} defined as
g(t) := f (x + th)
is convex in a certain neighborhood of the point t = 0 on the axis (recall that affine
substitutions of argument preserve convexity). Since f is twice differentiable in
a neighborhood of x, the function g is twice differentiable in a neighborhood of
t = 0, as well. Thus, by Proposition III.13.2, we have 0 ≤ g ′′ (0) = h⊤ f ′′ (x)h.
In order to prove the “if” part we need to show that every function f : Q →
R∪{+∞} that is continuous on Q and that satisfies h⊤ f ′′ (x)h ≥ 0 for all x ∈ int Q
and all h ∈ Rn is convex on Q.
Let us first prove that f is convex on int Q. By Theorem I.1.29, int Q is a
convex set. Since the convexity of a function on a convex set is a one-dimensional
property, all we need to prove is that for any x, y ∈ int Q the univariate function
g : [0, 1] → R ∪ {+∞} given by
g(t) := f (x + t(y − x))
is convex on the segment [0, 1]. As f is twice differentiable on int Q, g is continuous
and twice differentiable on the segment [0, 1] and its second derivative is given by
g ′′ (t) = (y − x)⊤ f ′′ (x + t(y − x))(y − x) ≥ 0,
where the inequality follows from the premise on f . Then, by Propositions III.13.2(ii)
and III.13.3, g is convex on [0, 1]. Thus, f is convex on int Q. As f is convex on
int Q and is continuous on Q, by Proposition III.13.3 we conclude that f is convex
on Q.
Proof. Consider any x, y ∈ Q with x ̸= y and any λ ∈ (0, 1). We need to show
that f (λx + (1 − λ)y) < λf (x) + (1 − λ)f (y). Consider the function ϕ : [0, 1] → R
given by ϕ(t) := f (tx+(1−t)y). Then, as f is twice differentiable on Q, ϕ is twice
differentiable on [0, 1]. Moreover, based on the premise on f , we have ϕ′′ (t) > 0
for all t ∈ [0, 1]. Note that our target inequality is simply the relation ϕ(λ) <
λϕ(1) + (1 − λ)ϕ(0). Since 0 < λ < 1, we can rewrite this target inequality as
(ϕ(λ) − ϕ(0))/λ < (ϕ(1) − ϕ(λ))/(1 − λ).
Finally, by the Mean Value Theorem and the strict monotonicity of ϕ′ we conclude that the desired target inequality holds.
We conclude this section by highlighting that convexity of many “complicated”
functions can be proved easily by applying a combination of “calculus of
convexity” rules to simple functions which pass the “infinitesimal” convexity tests.
Example III.13.2 Consider the following exponential posynomial function f : Rⁿ → R, given by
f(x) = Σ_{i=1}^{N} c_i exp(a_i⊤x),
where the coefficients c_i are positive (this is why the function is called posynomial).
This function is in fact convex on Rⁿ. How can we prove this?
An immediate proof is as follows:
1. The function exp(t) is convex on R as its second order derivative is positive as
required by the infinitesimal convexity test for smooth univariate functions.
2. Thus, by stability of convexity under affine substitutions of argument, we
deduce that all functions exp(a_i⊤x) are convex on Rⁿ.
3. Finally, by stability of convexity under taking linear combinations with non-
negative coefficients, we conclude that f is convex on Rn .
And if we were supposed to prove that the maximum of three exponential
posynomials is convex? Then, all we need is to add to our three steps above
the fourth one, which refers to the stability of convexity under taking pointwise
supremum. ♢
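For readers who like to experiment, here is a small numerical sketch (not part of the original text; it assumes NumPy, and the randomly generated data are my own choice) confirming that the Hessian of an exponential posynomial is positive semidefinite, which is the smooth-function test behind the three steps above:

# Illustration only: the Hessian of f(x) = sum_i c_i * exp(a_i^T x) is
# sum_i c_i*exp(a_i^T x) * a_i a_i^T, which is positive semidefinite whenever
# all c_i > 0; we verify this numerically at random points.
import numpy as np

rng = np.random.default_rng(0)
n, N = 4, 6
A = rng.normal(size=(N, n))          # rows are the vectors a_i
c = rng.uniform(0.1, 2.0, size=N)    # positive coefficients c_i

def hessian(x):
    w = c * np.exp(A @ x)            # weights c_i * exp(a_i^T x)
    return (A.T * w) @ A             # sum_i w_i * a_i a_i^T

for _ in range(100):
    x = rng.normal(size=n)
    H = hessian(x)
    eigmin = np.linalg.eigvalsh(H).min()
    assert eigmin >= -1e-8 * (1.0 + np.abs(H).max())
print("Hessian of the posynomial was PSD at all sampled points")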
For 1 ≤ k ≤ n, consider the function s_k(x) := x[1] + x[2] + . . . + x[k], where x[i] denotes the i-th largest entry in the vector x. That is, for every vector x ∈ Rⁿ, we have x[1] ≥ x[2] ≥ . . . ≥ x[n]. By definition, s_k(x) is simply the sum of the k largest elements in x. We claim that s_k(x) is a convex function of x. Given any index set I, the function ℓ_I(x) := Σ_{i∈I} x_i is a linear function of x and thus it is convex. Now, s_k(x) is clearly the maximum of the linear functions ℓ_I(x) over all index sets I with exactly k elements from {1, . . . , n}, and as such is convex.
Note also that s_k(x) is a permutation symmetric function of x, that is, the value of the function s_k(x) remains the same when permuting entries in its argument x. Taken together, convexity and permutation symmetry of s_k(x) will be very useful later.
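A minimal numerical sketch (an illustration of mine, not from the text; it assumes NumPy) comparing the two descriptions of s_k(x), the sorted sum and the maximum over k-element index sets:

# Illustration only: s_k(x), the sum of the k largest entries of x, equals the
# maximum of the linear functions sum_{i in I} x_i over all index sets I of
# cardinality k -- the representation used above to show convexity.
import itertools
import numpy as np

rng = np.random.default_rng(1)
n, k = 6, 3
for _ in range(200):
    x = rng.normal(size=n)
    s_k = np.sort(x)[::-1][:k].sum()                   # sum of k largest entries
    best_I = max(x[list(I)].sum() for I in itertools.combinations(range(n), k))
    assert abs(s_k - best_I) < 1e-12
print("s_k(x) matched the max over k-element index sets on all samples")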
Another important example: the function
f(x) := ln(Σ_{i=1}^n exp(x_i))
is convex.
Let us first verify the convexity of this function via direct computation using
Corollary III.13.4. To this end, we define pi := Pexp(x i)
. Then, the second-order
j exp(xj )
n
directional derivative of f along the direction h ∈ R is given by
!2
d2 X X
ω := 2 f (x + th) = pi h2i − pi hi .
dt t=0 i i
Observing that p_i > 0 and Σ_i p_i = 1, we see that ω is the variance (the expectation of the square minus the squared expectation) of a discrete random variable taking values h_i with probabilities p_i, and it is well known that the variance of any random variable is always nonnegative. Here is a direct verification of this fact:
(Σ_i p_i h_i)² = (Σ_i √p_i (√p_i h_i))² ≤ (Σ_i p_i)(Σ_i p_i h_i²) = Σ_i p_i h_i²,
and the concluding set is convex (as a sublevel set of a convex function). ♢
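The variance identity above is easy to probe numerically; the following sketch (not part of the text, assuming NumPy; the tolerances are my own) compares the formula with a finite-difference second derivative of the log-sum-exp function along random directions:

# Illustration only: for f(x) = ln(sum_i exp(x_i)), the second-order directional
# derivative along h equals sum_i p_i h_i^2 - (sum_i p_i h_i)^2 with
# p_i = exp(x_i)/sum_j exp(x_j); we compare it with a central finite difference.
import numpy as np

def f(x):
    m = x.max()
    return m + np.log(np.exp(x - m).sum())   # numerically stable log-sum-exp

rng = np.random.default_rng(2)
n, t = 5, 1e-4
for _ in range(20):
    x, h = rng.normal(size=n), rng.normal(size=n)
    p = np.exp(x - x.max()); p /= p.sum()
    variance = (p * h**2).sum() - (p * h).sum() ** 2
    fd = (f(x + t*h) - 2*f(x) + f(x - t*h)) / t**2   # finite-difference d^2/dt^2
    assert abs(variance - fd) < 1e-4 * max(1.0, abs(variance))
    assert variance >= -1e-12
print("second directional derivative of log-sum-exp matched the variance formula")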
Example III.13.5 The function f : Rⁿ₊ → R given by
f(x) = −Π_{i=1}^n x_i^{α_i},
where α_i > 0 for all 1 ≤ i ≤ n and satisfy Σ_i α_i ≤ 1, is convex.
To prove convexity of f via Corollary III.13.4 all we need is to verify that for any x ∈ Rⁿ satisfying x > 0 and for any h ∈ Rⁿ, we have d²/dt²|_{t=0} f(x + th) ≥ 0.
Let η_i := h_i/x_i; then direct computation shows that
d²/dt²|_{t=0} f(x + th) = [(Σ_i α_i η_i)² − Σ_i α_i η_i²] f(x).
Since Σ_i α_i ≤ 1, the Cauchy–Schwarz inequality gives (Σ_i α_i η_i)² ≤ (Σ_i α_i)(Σ_i α_i η_i²) ≤ Σ_i α_i η_i², so the bracketed factor is nonpositive; as f(x) ≤ 0 on Rⁿ₊, the second-order directional derivative is indeed nonnegative.
Note also that the function Σ_i α_i(−ln x_i) is convex on int Rⁿ₊, as the function g(y) = −ln y is convex on the positive ray. It remains to note that taking the exponent preserves convexity by the Convex Monotone superposition rule, so that Π_i x_i^{−α_i} = exp(Σ_i α_i(−ln x_i)) is convex on int Rⁿ₊ as well. ♢
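A quick numerical check (an illustration of mine, not from the text; it assumes NumPy and uses exponents of my own choosing) of the convexity claim via midpoint convexity on random points of the positive orthant:

# Illustration only: midpoint-convexity check for f(x) = -prod_i x_i**alpha_i
# on the positive orthant, with alpha_i > 0 and sum_i alpha_i <= 1.
import numpy as np

rng = np.random.default_rng(9)
n = 4
alpha = rng.uniform(0.05, 0.3, size=n)
alpha *= 0.9 / alpha.sum()                      # rescale so that sum(alpha) = 0.9 <= 1

def f(x):
    return -np.prod(x ** alpha)

for _ in range(2000):
    x, y = rng.uniform(0.1, 10.0, size=n), rng.uniform(0.1, 10.0, size=n)
    assert f((x + y) / 2) <= (f(x) + f(y)) / 2 + 1e-12
print("midpoint convexity of -prod x_i^alpha_i held on all sampled pairs")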
13.4 Gradient inequality
Proof. Let y ∈ Q. There is nothing to prove if y ̸∈ Dom f (since then the left
hand side of the gradient inequality is +∞). Similarly, there is nothing to prove
when y = x. Thus, we can assume that y ̸= x and y ∈ Dom f . Let us set
yτ := x + τ (y − x), where 0 < τ ≤ 1,
so that y0 = x, y1 = y and yτ is an interior point of the segment [x, y] for
0 < τ < 1. Applying Lemma III.13.1 to the triple (x, x′ , x′′ ) taken as (x, yτ , y),
we get
(f(x + τ(y − x)) − f(x)) / (τ∥y − x∥₂) ≤ (f(y) − f(x)) / ∥y − x∥₂ ;
as τ → +0, the left hand side in this inequality, by the definition of the gradient, tends to (y − x)⊤∇f(x)/∥y − x∥₂, and so we get
(y − x)⊤∇f(x) / ∥y − x∥₂ ≤ (f(y) − f(x)) / ∥y − x∥₂ ,
and as ∥y − x∥₂ > 0 this is equivalent to
(y − x)⊤∇f(x) ≤ f(y) − f(x).
Note that this inequality is exactly the same as (13.4).
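As an illustration (mine, not from the text; it assumes NumPy), one can test the gradient inequality (13.4) numerically for a particular smooth convex function; the choice f(x) = ∥x∥₂² + exp(x₁) below is arbitrary:

# Illustration only: numerical check of the gradient inequality
# f(y) >= f(x) + (y-x)^T grad f(x) for the smooth convex f(x) = ||x||_2^2 + exp(x_1).
import numpy as np

def f(x):
    return x @ x + np.exp(x[0])

def grad_f(x):
    g = 2.0 * x
    g[0] += np.exp(x[0])
    return g

rng = np.random.default_rng(3)
for _ in range(1000):
    x, y = rng.normal(size=4), rng.normal(size=4)
    assert f(y) >= f(x) + grad_f(x) @ (y - x) - 1e-9
print("gradient inequality held on all sampled pairs")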
and only if the gradient inequality (13.4) is valid for every pair x ∈ int Q
and y ∈ Q.
Proof. Indeed, the “only if” part, i.e., the convexity of f on Q implying the
gradient inequality for all x ∈ int Q and all y ∈ Q, is given by Proposition III.13.7.
Let us prove the “if” part, i.e., establish the reverse implication. Suppose that
f satisfies the gradient inequality for all x ∈ int Q and all y ∈ Q, and let us
verify that f is convex on Q. As f is continuous on Q and Q is convex, by
Proposition III.13.3 it suffices to prove that f is convex on int Q. Recall also that
by Theorem I.1.29 int Q is convex. Moreover, due to the gradient inequality, on
int Q function f is the supremum of the family of affine (and therefore convex)
functions, i.e., for all y ∈ int Q we have
f(y) = sup_{x∈int Q} f_x(y), where f_x(y) := f(x) + (y − x)⊤∇f(x).
As affine functions are convex and by stability of convexity under taking pointwise
supremums, we conclude f is convex on int Q.
We shall prove this Theorem later in this section, after some preliminary effort.
Remark III.13.10 In Theorem III.13.9, all three assumptions on K, (1) closed-
ness, (2) boundedness, and (3) K ⊆ rint (Dom f ), are essential. The following
three examples illustrate their importance:
• Suppose f (x) = 1/x, then Dom f = (0, +∞). Consider K = (0, 1]. We have
assumptions (2) and (3) satisfied, but not (1). Note that f is neither bounded
nor Lipschitz continuous on K.
• Suppose f (x) = x2 , then Dom f = R. Consider K = R. We have (1) and (3)
satisfied, but not (2). Note that f is neither bounded nor Lipschitz continuous
on K.
• Suppose f(x) = −√x, then Dom f = [0, +∞). Consider K = [0, 1]. We have (1) and (2) satisfied, but not (3). Note that f is not Lipschitz continuous on K (indeed, we have lim_{t→+0} (f(0) − f(t))/t = lim_{t→+0} t^{−1/2} = +∞, while for a Lipschitz continuous f the ratios t⁻¹(f(0) − f(t)) should be bounded). On the other hand, f is bounded on K. With a properly chosen convex function f of two variables and non-polyhedral compact domain (e.g., with Dom f being the unit circle), we can demonstrate also that lack of (3), even in presence of (1) and (2), may cause unboundedness of f on K as well.
■
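The blow-up of the difference quotients in the third example is easy to see numerically; the following tiny sketch (mine, not from the text; plain Python) prints (f(0) − f(t))/t = t^{−1/2} for f(x) = −√x as t → +0:

# Illustration only: f(x) = -sqrt(x) on K = [0,1] is bounded but not Lipschitz,
# since its difference quotients at 0 blow up as t -> +0.
import math

for t in [1e-2, 1e-4, 1e-6, 1e-8]:
    quotient = (0.0 - (-math.sqrt(t))) / t
    print(f"t = {t:.0e}:  (f(0) - f(t))/t = {quotient:.1f}")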
Theorem III.13.9 says that a convex function f is bounded on every compact (i.e., closed and bounded) subset of rint(Dom f). In fact, as far as boundedness from below is concerned, we can make a much stronger statement: any convex function f is bounded below on any bounded subset of Rⁿ!
Proof.
(i): 1⁰. We start with proving the boundedness of f from above in a neighborhood of x̄. This is immediate: by the premise of the proposition we have x̄ ∈ rint (Dom f), so there exists r̄ > 0 such that the neighborhood U_r̄(x̄) is contained in Dom f. Now, we can find a small simplex ∆ of the dimension m := dim(Aff(Dom f)) with the vertices x^0, . . . , x^m in U_r̄(x̄) in such a way that x̄ will be a convex combination of the vectors x^i with positive coefficients, even with the coefficients 1/(m + 1), i.e.,
x̄ = Σ_{i=0}^m (1/(m + 1)) x^i.
Here is the justification of this claim that such a simplex ∆ exists: First, when
Dom f is a singleton, the claim is evident. So, we assume that dim(Dom f ) =
m ≥ 1. Without loss of generality, we may assume that x̄ = 0, so that 0 ∈ Dom f
and therefore Aff(Dom f ) = Lin(Dom f ). Then, by Linear Algebra, we can find
m vectors y 1 , . . . , y m in Dom f which form a basis of Lin(Dom f ) = Aff(Dom f ).
Setting y^0 := −Σ_{i=1}^m y^i and taking into account that 0 = x̄ ∈ rint (Dom f), we can find ϵ > 0 such that the vectors x^i := ϵ y^i, i = 0, . . . , m, belong to U_r̄(x̄). By construction, x̄ = 0 = (1/(m + 1)) Σ_{i=0}^m x^i.
Note that x̄ ∈ rint (∆) (see Exercise I.3). Since ∆ spans the same affine sub-
space as Dom f , we can find a sufficiently small r > 0 such that r ≤ r̄ and
Ur (x̄) ⊆ ∆. Now, by definition,
∆ = { Σ_{i=0}^m λ_i x^i : λ_i ≥ 0 ∀i, Σ_{i=0}^m λ_i = 1 },
Proof. If f is the convex function that is identical to +∞, Dom f = ∅ and there
is nothing to prove. So, we assume that Dom f ̸= ∅.
We will first show that any local minimizer of f is also a global minimizer
of f . Let x∗ be a local minimizer of f on Q. Consider any y ∈ Q such that
y ̸= x∗ . We need to prove that f (y) ≥ f (x∗ ). If f (y) = +∞, this relation is
automatically satisfied. So, we assume that y ∈ Dom f . Note that by definition
of a local minimizer, we also have x∗ ∈ Dom f for sure. Now, for any τ ∈ (0, 1),
by Lemma III.13.1 we have
(f(x∗ + τ(y − x∗)) − f(x∗)) / (τ∥y − x∗∥₂) ≤ (f(y) − f(x∗)) / ∥y − x∗∥₂.
Since x∗ is a local minimizer of f, the left hand side in this inequality is nonnegative for all small enough values of τ > 0. Thus, we conclude that the right hand
side is nonnegative as well, i.e., f (y) ≥ f (x∗ ).
Note that Argmin_Q f is nothing but the sublevel set lev_α(f) of f associated with α taken as the minimal value min_Q f of f on Q. Recall by Proposition III.12.6 any sublevel set of a convex function is convex, so this sublevel set Argmin_Q f is convex.
Finally, let us prove that the set Argmin_Q f associated with a strictly convex f is, if nonempty, a singleton. Assume for contradiction that there are two distinct minimizers x′, x″ in Argmin_Q f. Then, from strict convexity of f, we would have
f((1/2)x′ + (1/2)x″) < (1/2)(f(x′) + f(x″)) = min_Q f,
where the equality follows from x′, x″ ∈ Argmin_Q f. But, this strict inequality is impossible since (1/2)x′ + (1/2)x″ ∈ Q as Q is convex, and by definition of min_Q f we cannot have a point in Q with objective value strictly smaller than min_Q f.
Proof. The necessity of the condition ∇f (x∗ ) = 0 for local optimality is due to
Calculus, and so it has nothing to do with convexity. The essence of the matter
is, of course, the sufficiency of the condition ∇f (x∗ ) = 0 for global optimality of
x∗ in the case of convex function f . In fact, this sufficiency is readily given by the
gradient inequality (13.4). In particular, when ∇f (x∗ ) = 0 holds, (13.4) becomes
f(y) ≥ f(x∗) + (y − x∗)⊤∇f(x∗) = f(x∗), ∀y ∈ Q.
Given a convex set Q ⊆ Rⁿ and a point x∗ ∈ Q, consider the set
T_Q(x∗) := {h ∈ Rⁿ : x∗ + th ∈ Q, ∀ small enough t > 0}.
Geometrically, this is the set of all directions “looking” from x∗ towards Q, so
that a small enough positive step from x∗ along the direction, i.e., adding to x∗
a small enough positive multiple of the direction, keeps the point in Q. That
is, T_Q(x∗) is the set of all “feasible” directions at x∗: those along which, starting from x∗, we can travel a positive distance and remain in Q. From the convexity of Q it immediately follows that the radial cone indeed is a cone (not necessarily closed).
For example, when x∗ ∈ int Q, we have TQ (x∗ ) = Rn . Let us examine a more
interesting example, e.g., the polyhedral set
Q = {x ∈ Rⁿ : a_i⊤x ≤ b_i, i = 1, . . . , m},   (14.3)
and its radial cone. For any x∗ ∈ Q, we define I(x∗) := {i : a_i⊤x∗ = b_i} as the set of indices of constraints that are active at x∗ (i.e., those satisfied at x∗ as equalities). The radial cone of Q at x∗ is then the polyhedral cone
T_Q(x∗) = {h ∈ Rⁿ : a_i⊤h ≤ 0, i ∈ I(x∗)}.   (14.4)
Proof. The necessity of this condition is an evident fact which has nothing to
do with convexity. Suppose that x∗ is a local minimizer of f on Q. Assume for
contradiction that there exists h ∈ T_Q(x∗) such that h⊤∇f(x∗) < 0. Then, by differentiability of f at x∗, we have f(x∗ + th) = f(x∗) + t h⊤∇f(x∗) + o(t) < f(x∗) for all small enough positive t; besides this, since h ∈ T_Q(x∗), the points x∗ + th belong to Q for all small enough positive t. These two observations together thus imply that
in every neighborhood of x∗ there are points x from Q with values f (x) strictly
smaller than f (x∗ ). This clearly contradicts the assumption that x∗ is a local
minimizer of f on Q.
Once again, the sufficiency of this condition is given by the gradient inequality,
exactly as in the case when x∗ ∈ int Q discussed in the proof of Theorem III.14.2.
Proposition III.14.3 states that under its premise the necessary and sufficient
condition for x∗ to minimize f on Q is the inclusion ∇f (x∗ ) ∈ −NQ (x∗ ). What
does this condition actually mean? The answer depends on what the normal cone
is: whenever we have an explicit description of it, we have an explicit form of the
optimality condition. For example,
• Consider the case of TQ (x∗ ) = Rn , i.e., x∗ ∈ int Q. Then, the normal cone
NQ (x∗ ) is the cone of all the vectors h that have nonpositive inner products with
every vector in Rn , i.e., NQ (x∗ ) = {0}. Consequently, in this case the necessary
and sufficient optimality condition of Proposition III.14.3 becomes the Fermat
rule ∇f (x∗ ) = 0, which we already know.
• When Q is an affine plane given by linear equalities Ax = b, A ∈ Rm×n , the
radial cone at every point x ∈ Q is the linear subspace {d : Ad = 0}, the normal
cone is the orthogonal complement {u = A⊤ v : v ∈ Rm } to this linear subspace,
and the optimality condition reads: ∇f(x∗) = A⊤v for some v ∈ Rᵐ, i.e., ∇f(x∗) should be orthogonal to the linear subspace {d : Ad = 0} of feasible directions.
• When Q is the polyhedral set (14.3), the radial cone is the polyhedral cone
(14.4), i.e., it is the set of all directions which have nonpositive inner products with
all a_i for i ∈ I(x∗) (recall that these a_i are coming from the constraints a_i⊤x ≤ b_i specifying Q that are satisfied as equalities at x∗). The corresponding normal cone is thus the set of all vectors which have nonpositive inner products with all these directions in T_Q(x∗), i.e., of vectors a such that the inequality h⊤a ≤ 0 is a consequence of the inequalities h⊤a_i ≤ 0, i ∈ I(x∗) = {i : a_i⊤x∗ = b_i}. From
the Homogeneous Farkas Lemma we conclude that the normal cone is simply
the conic hull of the vectors ai , i ∈ I(x∗ ). Thus, in this case our necessary and
sufficient optimality condition becomes:
Given Q = {x ∈ Rⁿ : a_i⊤x ≤ b_i, i = 1, . . . , m}, a point x∗ ∈ Q, and a function f that is convex on Q and differentiable at x∗, x∗ is a minimizer of f on Q if and only if there exist nonnegative reals λ_i∗ (“Lagrange multipliers”) associated with the “active” indices i (those from I(x∗)) such that
∇f(x∗) + Σ_{i∈I(x∗)} λ_i∗ a_i = 0.
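Here is a small numerical sketch (an illustration of mine, not from the text; it assumes NumPy) instantiating this condition for a particular choice of data: minimizing f(x) = ∥x − c∥₂² over the nonnegative orthant, i.e. the polyhedral set with a_i = −e_i and b_i = 0, where the minimizer is the coordinatewise projection max(c, 0):

# Illustration only: KKT check for min ||x - c||_2^2 over Q = {x : -x_i <= 0}.
import numpy as np

rng = np.random.default_rng(4)
n = 6
c = rng.normal(size=n)
x_star = np.maximum(c, 0.0)                 # projection of c onto the orthant
grad = 2.0 * (x_star - c)                   # gradient of f at x*
active = np.where(x_star == 0.0)[0]         # I(x*) = {i : a_i^T x* = b_i}

lam = np.zeros(n)
lam[active] = grad[active]                  # multipliers for the constraints a_i = -e_i
assert np.all(lam >= 0)                     # nonnegative Lagrange multipliers
residual = grad.copy()
for i in active:
    residual += lam[i] * (-np.eye(n)[i])    # grad f(x*) + sum_i lam_i a_i
assert np.allclose(residual, 0.0)
# inactive constraints: partial derivatives must vanish (componentwise Fermat rule)
assert np.allclose(grad[np.setdiff1d(np.arange(n), active)], 0.0)
print("KKT conditions verified for the projection onto the nonnegative orthant")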
Figure III.2. Physical illustration of KKT optimality conditions for the optimization problem min_{x∈R²} {f(x) : a_i(x) ≤ 0, i = 1, 2, 3}.
The white area represents the feasible domain Q, while the ellipses A, B, C represent the sets a₁(x) ≤ 0, a₂(x) ≤ 0, a₃(x) ≤ 0. The point x is a candidate feasible solution located at the intersection {u ∈ R² : a₁(u) = a₂(u) = 0} of the boundaries of A and B. g = −∇f(x) is the external force acting at a particle located at x, and p and q are the reaction forces created by the obstacles A and B. The condition for x to be at equilibrium reduces to g + p + q = 0, as in the picture. The equilibrium condition g + p + q = 0 translates to the KKT equation
∇f(x) + λ₁∇a₁(x) + λ₂∇a₂(x) = 0
holding for some nonnegative λ₁, λ₂.
In particular, Theorem III.14.6 states that given a convex function f the only
way for a point x∗ ∈ rint (Dom f ) to be a global maximizer of f is if the function
f is constant over its domain.
Next, we provide further information on maxima of convex functions.
Since the preceding inequality holds for any x̄ ∈ Conv E, we conclude that sup_{x∈Conv E} f(x) ≤ sup_{x∈E} f(x) holds as well, as desired.
To prove (14.7), note that when S is a nonempty convex compact set, by Krein-
Milman Theorem (Theorem II.8.6) we have S = Conv(Ext(S)). Then, (14.7)
follows immediately from (14.6).
Our last theorem on maxima of convex functions is as follows.
Subgradients
is a convex set. Thus, there is no essential difference between convex functions and
convex sets: a convex function generates a convex set, i.e., its epigraph, which of
course remembers everything about the function. And the only specific property
of the epigraph as a convex set is that it always possesses a very specific recessive
direction, namely h = [0; 1]. That is, the ray {z + th : t ≥ 0} directed by h
belongs to the epigraph set whenever the starting point z of the ray is in the set.
Whenever a convex set possesses a nonzero recessive direction h such that −h is
not a recessive direction, the set in appropriate coordinates becomes the epigraph
of a convex function. Thus, a convex function is, basically, nothing but a way to
look, in the literal meaning of the latter verb, at a convex set.
Now, we know that the convex sets that are “actually nice” are the closed ones:
they possess a lot of important properties (e.g., admit a good outer description)
which are not shared by arbitrary convex sets. Therefore, among convex functions
there also are “actually nice” ones, namely those with closed epigraphs. Closed-
ness of the epigraph of a function can be “translated” to the functional language
and there it becomes a special kind of continuity, namely lower semicontinuity.
Before formally defining lower semicontinuity, let us do a brief preamble on con-
vergence of sequences on the extended real line. In the sequel, we will oper-
ate with limits of sequences {ai }i≥1 with terms ai from the extended real line
R := R ∪ {+∞} ∪ {−∞}. These limits are defined in the natural way: the rela-
tion
lim_{i→∞} a_i = a ∈ R
With this “encoding,” R becomes the segment [−1, 1], and the relation a =
limi→∞ ai as defined above is the same as θ(a) = limi→∞ θ(ai ), that is, this
relation stands for the usual convergence, as i → ∞, of reals θ(ai ) to the real
θ(a). Note also that for a, b ∈ R the relation a ≤ b (a < b) is exactly the same as
the usual arithmetic inequality θ(a) ≤ θ(b) (θ(a) < θ(b), respectively).
With convergence and limits of sequences {ai }i ⊆ R already defined, we can
speak about upper (lower) limits of these sequences. For example, we can define
lim inf i→∞ ai as a ∈ R uniquely specified by the relation θ(a) = lim inf i→∞ θ(ai ).
Same as with lower limits of sequences of reals, lim inf i→∞ ai is the smallest (in
terms of the relation ≤ on R!) of the limits of converging (in R!) subsequences
of the sequence {ai }i .
It is time to come back to lower semicontinuity.
Proof. First, suppose epi{f } is closed, and let us prove that f is lsc. Consider
a sequence {xi }i such that xi → x as i → ∞, and let us prove that f (x) ≤ a :=
lim inf i→∞ f (xi ). There is nothing to prove when a = +∞. Assuming a < +∞, by
the definition of lim inf there exists a sequence i1 < i2 < . . . such that f (xij ) → a
as j → ∞. Let us assume that a > −∞ (we will verify later on that this is in fact
the case). Then, as the points [xij ; f (xij )] ∈ epi{f } converge to [x; a] and epi{f }
is closed, we see that [x; a] ∈ epi{f }, that is, f (x) ≤ a, as claimed. It remains
to verify that a > −∞. Indeed, assuming a = −∞, we conclude that for every
t ∈ R the points [xij ; t] belong to epi{f } for all large enough values of j, which,
as above, implies that [x; t] ∈ epi{f }, that is, t ≥ f (x). The latter inequality
cannot hold true for all real t, since f does not take value −∞; thus, a = −∞ is
impossible.
Now, for the opposite direction, let f be lsc, and let us prove that epi{f } is
closed. So, we should prove that if [xi ; ti ] → [x; t] as i → ∞ and [xi ; ti ] ∈ epi{f },
that is, ti ≥ f (xi ) for all i, then [x; t] ∈ epi{f }, that is, t ≥ f (x). Indeed, since f is
lsc and f (xi ) ≤ ti , we have f (x) ≤ lim inf i→∞ f (xi ) ≤ lim inf i→∞ ti = limi→∞ ti =
t.
An immediate consequence of Proposition III.15.2 is as follows:
is lower semicontinuous.
Proof. The epigraph of the function f is the intersection of the epigraphs of all
functions fα , and the intersection of closed sets is always closed.
Now let us look at convex, proper, and lower semicontinuous functions, that
is, functions Rn → R ∪ {+∞} with closed convex and nonempty epigraphs. To
save words, let us call these functions regular.
What we are about to do is to translate to the functional language several con-
structions and results related to convex sets. In the usual life, a translation (e.g.,
of poetry) typically results in something less rich than the original. In contrast
to this, in mathematics this is a powerful source of new ideas and constructions.
“Outer description” of a proper lower semicontinuous convex function.
We know that any closed convex set is the intersection of closed half-spaces. What
does this fact imply when the set is the epigraph of a regular function f ? First
of all, note that the epigraph is not a completely arbitrary convex set in Rn+1 : it
has the recessive direction e := [0n ; 1], i.e., the basic orth of the t-axis in the space
of variables x ∈ Rn , t ∈ R where the epigraph lives. This direction, of course,
should be recessive for every closed half-space
Π = {[x; t] ∈ Rⁿ⁺¹ : αt ≥ d⊤x − a}, where |α| + ∥d∥₂ > 0,   (*)
containing epi{f }. Note that in (*) we are adopting a specific form of the nonstrict
linear inequality describing the closed half-space Π among many possible forms
in the space where the epigraph lives; this form is the most convenient for us
now. Thus, e should be a recessive direction of Π ⊇ epi{f }, and the recessiveness
of e for Π means exactly that α ≥ 0. Thus, speaking about closed half-spaces
containing epi{f }, we in fact are considering some of the half-spaces (*) with
α ≥ 0.
Now, there are two essentially different possibilities for α to be nonnegative:
(A) α > 0, and (B) α = 0. In the case of (B) the boundary hyperplane of Π is
“vertical,” i.e., it is parallel to e, and in fact it “bounds” only x. And, in such
cases, Π is the set of all vectors [x; t] with x belonging to certain half-space in the
x-subspace and t being an arbitrary real number. These “vertical” half-spaces
will be of no interest to us.
The half-spaces which indeed are of interest to us are the “nonvertical” ones:
those given by the case (A), i.e., with α > 0. For a non-vertical half-space Π,
we can always divide the inequality defining Π by α and make α = 1. Thus,
a “nonvertical” candidate eligible for the role of a closed half-space containing
epi{f } can always be written as
Π = {[x; t] ∈ Rⁿ⁺¹ : t ≥ d⊤x − a}.   (**)
That is, a “nonvertical” closed half-space containing epi{f } can be represented
as the epigraph of an affine function of x.
Now, when is such a candidate indeed a half-space containing epi{f }? It is
clear that the answer is yes if and only if the affine function d⊤ x − a is less than
or equal to f(x) for all x ∈ Rⁿ. This is precisely what we mean by “d⊤x − a
is an affine minorant of f .” In fact, we have a very nice characterization of proper
lsc convex functions through their affine minorants!
there exists an affine function fx̄ (x) such that fx̄ (x) ≤ f (x) for all x ∈ Rn
and fx̄ (x̄) = f (x̄).
Proof. I. We will first prove that at every x̄ ∈ rint (Dom f ) there exists an affine
function fx̄ (x) such that fx̄ (x) ≤ f (x) for all x ∈ Rn and fx̄ (x̄) = f (x̄).
I.1⁰. First of all, we can easily reduce the situation to the one when Dom f is
full-dimensional. Indeed, by shifting f we can make Aff(Dom f ) to be a linear
subspace L in Rn ; restricting f onto this linear subspace, we clearly get a proper
function on L. If we believe that our statement is true for the case when Dom f
is full-dimensional, we can conclude that there exists an affine function on L, i.e.,
d⊤ x − a [where x ∈ L]
impossible, since in such a case the t-coordinate of the new endpoint would be
< f (x̄) while the x-component of it still would be x̄. Thus, ȳ ∈ rbd(epi{f }).
Next, we claim that ȳ ′ is an interior point of epi{f }. This is immediate: we know
from Theorem III.13.9 that f is continuous at x̄ (recall that x̄ ∈ int(Dom f )), so
that there exists a neighborhood U of x̄ in Aff(Dom f ) = Rn such that f (x) ≤
f (x̄) + 0.5 whenever x ∈ U , or, in other words, the set
V := {[x; t] : x ∈ U, t > f (x̄) + 0.5}
is contained in epi{f }; but this set clearly contains a neighborhood of ȳ ′ in Rn+1 .
We see that epi{f } is full-dimensional, so that rint(epi{f }) = int(epi f ) and
rbd(epi{f }) = bd(epi f ).
Now let us look at a hyperplane Π supporting cl(epi{f }) at the point ȳ ∈
rbd(epi{f }). W.l.o.g., we can represent this hyperplane via a nontrivial (i.e.,
with |α| + ∥d∥2 > 0) linear inequality
αt ≥ d⊤ x − a. (15.2)
satisfied everywhere on cl(epi{f }), specifically, as the hyperplane where this in-
equality holds true as equality. Now, inequality (15.2) is satisfied everywhere on
epi{f }, and therefore at the point ȳ ′ := [x̄; f (x̄) + 1] ∈ epi{f } as well, and is
satisfied as equality at ȳ = [x̄; f (x̄)] (since ȳ ∈ Π). These two observations clearly
imply that α ≥ 0. We claim that α > 0. Indeed, inequality (15.2) says that the
linear form h⊤ [x; t] := αt − d⊤ x attains its minimum over y ∈ cl(epi{f }), equal
to −a, at the point ȳ. Were α = 0, we would have h⊤ ȳ = h⊤ ȳ ′ , implying that
the set of minimizers of the linear form h⊤ y on the set cl(epi{f }) contains an
interior point (namely, ȳ ′ ) of the set. This is possible only when h = 0, that is,
α = 0, d = 0, which is not the case.
Now, as α > 0, by dividing both sides of (15.2) by α, we get a new inequality
of the form
t ≥ d⊤ x − a, (15.3)
(here we keep the same notation for the right hand side coefficients as we will
never come back to the old coefficients) which is valid on epi{f } and is equality
at ȳ = [x̄; f (x̄)]. Its validity on epi{f } implies that for all [x; t] with x ∈ Dom f
and t = f (x), we have
f (x) ≥ d⊤ x − a, ∀x ∈ Dom f. (15.4)
Thus, we conclude that the function d⊤ x − a is an affine minorant of f on Dom f
and therefore on Rn (f = +∞ outside Dom f !). Finally, note that the inequality
(15.4) becomes an equality at x̄, since (15.3) holds as equality at ȳ. The affine
minorant we have just built justifies the validity of the first claim of the propo-
sition.
II. Let F be the set of all affine functions which are minorants of f , and define
the function
f̄(x) := sup_{ϕ∈F} ϕ(x).
We have proved that f¯(x) is equal to f on rint (Dom f ) (and at any x ∈ rint (Dom f )
in fact sup in the right hand side can be replaced with max). To complete
the proof of the proposition, we should prove that f¯ is equal to f outside of
cl(Dom f ) as well. Note that this is the same as proving that f¯(x) = +∞ for all
x ∈ Rn \ cl(Dom f ). To see this, consider any x̄ ∈ Rn \ cl(Dom f ). As cl(Dom f )
is a closed convex set, x̄ can be strongly separated from Dom f , see Separation
Theorem (ii) (Theorem II.7.3). Thus, there exist ζ > 0 and z ∈ Rn such that
z ⊤ x̄ ≥ z ⊤ x + ζ, ∀x ∈ Dom f. (15.5)
In addition, we already know that there exists at least one affine minorant of f ,
i.e., there exist a and d such that
f (x) ≥ d⊤ x − a, ∀x ∈ Dom f. (15.6)
Multiplying both sides of (15.5) by a positive weight λ and then adding it to
(15.6), we get
f(x) ≥ (d + λz)⊤x + [λζ − a − λz⊤x̄] =: ϕ_λ(x), ∀x ∈ Dom f.
This inequality clearly says that ϕλ (·) is an affine minorant of f on Rn for every
λ > 0. The value of this minorant at x = x̄ is equal to d⊤ x̄ − a + λζ and therefore
it goes to +∞ as λ → +∞. We see that the supremum of affine minorants of f at
x̄ indeed is +∞, as claimed. This concludes the proof of Proposition III.15.5.
Let us now prove Proposition III.15.4.
Proof of Proposition III.15.4. Under the premise of the proposition, f is a
proper lsc convex function. Let F be the set of all affine functions which are
minorants of f , and let
f̄(x) := sup_{ϕ∈F} ϕ(x).
(as it is the supremum of a family of affine and thus convex functions), we have
f¯(xi ) ≤ (1 − λi )f¯(x̄) + λi f¯(x′ ).
Noting that f¯(x′ ) = f (x′ ) (recall x′ ∈ rint (Dom f ) and apply Proposition III.15.5)
as well and putting things together, we get
f (xi ) ≤ (1 − λi )f¯(x̄) + λi f (x′ ).
Moreover, as i → ∞, we have λi → +0 and so the right hand side in our inequality
converges to f¯(x̄) = c. In addition, as i → ∞, we have xi → x̄ and since f is
lower semicontinuous, we get f (x̄) ≤ c.
We see why “translation of mathematical facts from one mathematical language
to another” – in our case, from the language of convex sets to the language of
convex functions – may be fruitful: because we invest a lot into the process rather
than run it mechanically.
Closure of a convex function. Proposition III.15.4 presents a nice result on
the outer description of a proper lower semicontinuous convex function: it is the
supremum of a family of affine functions. Note that, the reverse is also true: the
supremum of every family of affine functions is a proper lsc convex function,
provided that this supremum is finite at least at one point. This is because we
know from section 13.1 that supremum of every family of convex functions is
convex and from Corollary III.15.3 that supremum of lsc functions, e.g., affine
ones (these are in fact even continuous), is lower semicontinuous.
Now, what to do with a convex function which is not lower semicontinuous?
There is a similar question about convex sets: what to do with a convex set which
is not closed? We can resolve this question very simply by passing from the set to
its closure and thus getting a “much easier to handle” object which is very “close”
to the original one: the “main part” of the original set – its relative interior –
remains unchanged, and the “correction” adds to the set something relatively
small – (part of) its relative boundary. The same approach works for convex
functions as well: if a proper convex function f is not lower semicontinuous (i.e.,
its epigraph is convex and nonempty, but is not closed), we can “correct” the
function by replacing it with a new function with the epigraph being the closure
of epi{f }. To justify this approach, we, of course, should be sure that the closure
of the epigraph of a convex function is also an epigraph of such a function. This
indeed is the case, and to see it, it suffices to note that a set G in Rn+1 is the
epigraph of a function taking values in R ∪ {+∞} if and only if the intersection
of G with every vertical line {x = const, t ∈ R} is either empty, or is a closed
ray of the form {x = const, t ≥ t̄ > −∞}. Now, it is absolutely evident that if
G = cl(epi{f }), then the intersection of G with a vertical line is either empty, or
is a closed ray, or is the entire line (the last case indeed can take place – look at
the closure of the epigraph of the function equal to − x1 for x > 0 and +∞ for
x ≤ 0). We see that in order to justify our idea of “proper correction” of a convex
function we should prove that if f is convex, then the last of the indicated three
cases, i.e., the intersection of cl(epi{f }) with a vertical line is the entire line,
never occurs. However, we know from Proposition III.13.11 that every convex
function f is bounded from below on every compact set. Thus, cl(epi{f }) indeed
cannot contain an entire vertical line. Therefore, we conclude that the closure of
the epigraph of a convex function f is the epigraph of a certain function called
the closure of f [notation: cl f ] defined as:
cl(epi{f }) = epi{cl f }.
Of course, the function cl f is convex (its epigraph is convex as it is cl(epi{f })
and epi{f } itself is convex). Moreover, since the epigraph of cl f is closed, cl f is
lsc. And of course we have the following immediate observation.
Proof. Indeed, when f is convex, epi{f } is convex, and when f is lsc, epi{f }
is closed by Proposition III.15.2. Hence, under the premise of this observation,
epi{f } is convex and closed and thus, by definition of cl f , is the same as epi{cl f },
implying that f = cl f .
The following statement gives an instructive alternative description of cl f in
terms of f .
Also, for every x ∈ rint (Dom(cl f )) = rint (Dom f ), we can replace sup in
the right hand side of (15.7) with max.
Moreover,
(a) f (x) ≥ cl f (x), ∀x ∈ Rn ,
(b) f (x) = cl f (x), ∀x ∈ rint (Dom f ), (15.8)
(c) f (x) = cl f (x), ∀x ̸∈ cl(Dom f ).
Thus, the “correction” f 7→ cl f may vary f only at the points from rbd(Dom f ),
implying that
Dom f ⊆ Dom(cl f ) ⊆ cl(Dom f ),
hence rint (Dom f ) = rint (Dom(cl f )).
In addition, cl f is the supremum of all convex lower semicontinuous mi-
norants of f .
(ii) For all x ∈ Rⁿ, we have
cl f(x) = lim_{r→+0} inf_{x′: ∥x′−x∥₂≤r} f(x′).
Proof. If f is identically +∞, then cl f is identically +∞ as well, so that Dom f = Dom cl f = ∅, and all claims are trivially satisfied. Thus, assume from now on that f is proper.
(i): We will prove this part in several simple steps.
1o . By construction, epi{cl f } = cl(epi{f }) ⊇ epi{f }, which implies (15.8.a).
Also, as cl(epi{f }) ⊆ [cl(Dom f )] × R, we arrive at (15.8.c).
2o . Note that from (15.8.a) we deduce that every affine minorant of cl f is an
affine minorant of f as well. Moreover, the reverse is also true. Indeed, let g(x) be
an affine minorant of f . Then, we clearly have epi{g} ⊇ epi{f }, and as epi{g} is
closed, we also get epi{g} ⊇ cl(epi{f }) = epi{cl f }. Note that epi{g} ⊇ epi{cl f }
is simply the same as saying that g is an affine minorant of cl f . Thus, affine
minorants of f and of cl f indeed are the same. Then, as cl f is lsc and proper
(since cl f ≤ f and f is proper), by applying Proposition III.15.4 to cl f and also
applying Proposition III.15.5 to f , we deduce (15.8.b) and (15.7).
Finally, if g is a convex lsc minorant of f , then g is definitely proper, and
thus by Proposition III.15.4 it is the supremum of all its affine minorants. These
minorants of g are affine minorants of f as well, and thus also affine minorants
of cl f. The bottom line is that g is majorized by the supremum of all affine minorants of f, which, as we already know, is cl f. Thus, every convex lsc minorant of f is a minorant of cl f, implying that the supremum f̃ of these lsc convex minorants of f satisfies f̃ ≤ cl f. The latter inequality is in fact an equality since cl f itself is a convex lsc minorant of f by (15.8.a). This completes the proof of part (i).
3⁰. To verify (ii) we need to prove the following two facts:
(ii-1) whenever x_i → x̄ as i → ∞ and s := lim inf_{i→∞} f(x_i), we have cl f(x̄) ≤ s;
(ii-2) for every x̄ there exists a sequence x_i → x̄ such that f(x_i) → cl f(x̄) as i → ∞.
To prove (ii-1), note that under the premise of this claim we have s ≠ −∞ since
f is below bounded on bounded subsets of Rn (Proposition III.13.11). There is
nothing to verify when s = +∞. So, suppose s ∈ R. Then, the point [x̄; s] is in
cl(epi{f }) = epi{cl f }, and thus cl f (x̄) ≤ s, as claimed.
To prove (ii-2), note that the claim is trivially true when cl f (x̄) = +∞. Indeed,
in this case f (x̄) = +∞ as well due to (15.8.a), and for all i = 1, 2, . . . we can
take xi = x̄. Now, consider a point x̄ such that cl f (x̄) < ∞. Then, we have
[x̄; cl f (x̄)] ∈ epi{cl f } = cl(epi{f }). Thus, there exists a sequence [xi ; ti ] ∈ epi{f }
such that [xi ; ti ] → [x̄; cl f (x̄)] as i → ∞. Passing to a subsequence, we can assume
that f (xi ) have a limit, finite or infinite, as i → ∞. Hence, limi→∞ xi = x̄ and
limi→∞ f (xi ) = lim inf i→∞ f (xi ) ≤ limi→∞ ti = cl f (x̄). Recall also that from
(ii-1) we have limi→∞ f (xi ) ≥ cl f (x̄), so we conclude limi→∞ f (xi ) = cl f (x̄) as
desired.
15.2 Subgradients
Let f : Rn → R∪{+∞} be a convex function, and let x ∈ Dom f . Recall from our
discussion in the preceding section that f may admit an affine minorant d⊤ x − a
which coincides with f at x, i.e.,
f (y) ≥ d⊤ y − a, ∀y ∈ Rn , and f (x) = d⊤ x − a.
The equality relation above is equivalent to a = d⊤ x − f (x), and substituting this
representation of a into the first inequality, we get
f (y) ≥ f (x) + d⊤ (y − x), ∀y ∈ Rn . (15.9)
Thus, if f admits an affine minorant which is exact at x, then there exists d ∈ Rn
which gives rise to the inequality (15.9). In fact the reverse is also true: if d is
such that (15.9) holds, then the right hand side of (15.9), regarded as a function
of y, is an affine minorant of f which coincides with f at x.
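Before formalizing this, here is a toy numerical illustration (mine, not from the text; it assumes NumPy) of (15.9) for the nonsmooth convex function f(x) = |x| at x = 0, where the admissible vectors d form the segment [−1, 1]:

# Illustration only: for f(x) = |x|, any d with |d| <= 1 satisfies (15.9) at x = 0,
# i.e. f(y) >= f(0) + d*(y - 0) for all y, while no d with |d| > 1 does.
import numpy as np

ys = np.linspace(-5, 5, 1001)
for d in [-1.0, -0.3, 0.0, 0.7, 1.0]:
    assert np.all(np.abs(ys) >= d * ys - 1e-12)     # (15.9) holds: d is a subgradient at 0
for d in [-1.5, 1.2]:
    assert np.any(np.abs(ys) < d * ys)              # (15.9) fails: d is not a subgradient
print("subdifferential of |x| at 0 is the segment [-1, 1] (numerically confirmed)")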
Now note that (15.9) expresses a specific property of a vector d and leads to
the following very important definition which generalizes the notion of gradient
for smooth convex functions to nonsmooth convex functions.
Subgradients of convex functions play an important role in the theory and nu-
merical methods for Convex Optimization – they are quite reasonable surrogates
of gradients in the cases when the latter do not exist. Let us present a simple and
instructive illustration of this. Recall that Theorem III.14.2 states that
A necessary and sufficient condition for a convex function f : Rn →
R ∪ {+∞} to attain its minimum at a point x∗ ∈ int(Dom f ) where f is
differentiable is that ∇f (x∗ ) = 0.
The “nonsmooth” version of this statement is as follows.
Proof. (i): Closedness and convexity of ∂f (x) are evident from their definition as
(15.9) is an infinite system of nonstrict linear inequalities, indexed by y ∈ Dom f ,
on variable d.
When x ∈ rint (Dom f ), Proposition III.15.5 provides us with an affine function
which underestimates f everywhere and coincides with f at x. The slope of this
affine function is clearly a subgradient of f at x, and thus ∂f (x) ̸= ∅.
Boundedness of ∂f (x) when x ∈ int(Dom f ) is an immediate consequence of
item (iv) to be proved soon.
(ii): Suppose x ∈ int(Dom f ) and f is differentiable at x. Then, by the gradient
inequality we have ∇f (x) ∈ ∂f (x). Let us prove that in this case, ∇f (x) is the
only subgradient of f at x. Consider any d ∈ ∂f (x). Then, by the definition of
subgradient, we have
f(y) ≥ f(x) + d⊤(y − x), ∀y ∈ Rⁿ.
Now, consider any fixed direction h ∈ Rn and any real number t > 0. By substi-
tuting y = x + th in the preceding inequality and then dividing both sides of the
resulting inequality by t, we obtain
(f(x + th) − f(x))/t ≥ d⊤h.
Taking the limit of both sides of this inequality as t → +0, we get
h⊤ ∇f (x) ≥ h⊤ d.
Since h was an arbitrary direction, this inequality is valid for all h ∈ Rn , which
is possible if and only if d = ∇f (x).
(iii): Under the premise of this part, for every y ∈ Rn and for all i = 1, 2, . . .,
we have
♢
It is important to note that at the points from the relative boundary of the
domain of a convex function, even a “good” one, we may not have any subgradi-
ents. That is, it is possible to have ∂f (x) = ∅ for a convex function f at a point
x ∈ rbd(Dom f ). We give an example of this next.
Example III.15.2 Consider the function
f(x) = −√x if x ≥ 0,   f(x) = +∞ if x < 0.
Convexity of this function follows from convexity of its domain and Example III.12.2.
Consider the point [0; f (0)] ∈ rbd(epi{f }). It is clear that at this point [0; f (0)]
there is no non-vertical supporting line to the set epi{f }, and, consequently, there
is no affine minorant of the function which is exact at x = 0. ♢
= Tr(X[sign(λ_X) e_X e_X⊤]),
where the third equality follows from the choice of e_X, and the last equality is due to Tr(X[e_X e_X⊤]) = λ_X. Then, using item 2 in our subgradient calculus rules, we conclude that the symmetric matrix E(X) := sign(λ_X) e_X e_X⊤ is a subgradient of the function ∥·∥ at X. That is,
∥Y∥ ≥ Tr(Y E(X)) = ∥X∥ + Tr(E(X)(Y − X)), ∀Y ∈ Sⁿ,
where the equality holds due to the choice of E(X) guaranteeing ∥X∥ = Tr(XE(X)). To see that the above is indeed a subgradient inequality, recall that the inner product on Sⁿ is the Frobenius inner product ⟨A, B⟩ = Tr(AB).
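A quick numerical sketch (mine, not from the text; it assumes NumPy, and the random test matrices are my own choice) of the subgradient inequality just derived for the spectral norm on Sⁿ:

# Illustration only: E(X) = sign(lam_X) * e_X e_X^T is a subgradient of the
# spectral norm ||X|| = max_i |lambda_i(X)| on S^n, i.e.
# ||Y|| >= ||X|| + Tr(E(X)(Y - X)) for all symmetric Y (Frobenius inner product).
import numpy as np

def spectral_norm(S):
    return np.abs(np.linalg.eigvalsh(S)).max()

rng = np.random.default_rng(5)
n = 5
M = rng.normal(size=(n, n)); X = (M + M.T) / 2           # random symmetric X
w, V = np.linalg.eigh(X)
i = np.argmax(np.abs(w))                                  # largest-in-magnitude eigenvalue
E = np.sign(w[i]) * np.outer(V[:, i], V[:, i])            # E(X)

assert abs(np.trace(X @ E) - spectral_norm(X)) < 1e-10    # Tr(X E(X)) = ||X||
for _ in range(500):
    B = rng.normal(size=(n, n)); Y = (B + B.T) / 2
    lhs = spectral_norm(Y)
    rhs = spectral_norm(X) + np.trace(E @ (Y - X))
    assert lhs >= rhs - 1e-9
print("subgradient inequality for the spectral norm held on all sampled Y")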
Let us close this example by discussing smoothness properties of ∥X∥. Recall
that every norm ∥y∥ in Rm is nonsmooth at the origin y = 0. However, ∥X∥ is
a nonsmooth function of X even at points other than X = 0. In fact, ∥ · ∥ is
continuously differentiable in a neighborhood of every point X ∈ Sⁿ where exactly one eigenvalue of X attains the maximal magnitude ∥X∥. ♢
15.3 Subdifferentials and directional derivatives of convex functions
Proof of Theorem III.15.12. In this proof we will show that Theorem III.15.13
implies Theorem III.15.12.
Consider any d ∈ ∂f(x). Then, by Lemma III.15.11, d belongs to the subdifferential of the function Df(x)[·] at the origin, i.e., Df(x)[h] ≥ d⊤h for all h. Thus, we conclude
Df(x)[h] ≥ max_d {d⊤h : d ∈ ∂f(x)}.
To prove the opposite inequality, let us fix h ∈ Rn , and let us verify that
g := Df (x)[h] ≤ maxd {d⊤ h : d ∈ ∂f (x)}. There is nothing to prove when
h = 0, so let h ̸= 0. Setting ϕ(t) := Df (x)[th], we get a convex (since, as we
already know, Df (x)[·] is convex) univariate function such that ϕ(t) = gt for
t ≥ 0. Then, this together with the convexity of ϕ implies that ϕ(t) ≥ gt for all
t ∈ R. By applying Hahn-Banach Theorem to the function D(z) := Df (x)[z] (we
already know that this function satisfies the premise of Hahn-Banach Theorem),
E := R(h) and the linear form e⊤ (th) = gt, t ∈ R, on E, we conclude that there
exists e ∈ Rn such that e⊤ u ≤ Df (x)[u] for all u ∈ Rn and e⊤ h = g = Df (x)[h].
Thus, the right hand side in (15.11) is greater than or equal to the left hand side.
The opposite inequality has already been proved, so that (15.11) is an equality.
Remark III.15.14 The reasoning in the proof of Theorem III.15.12 implies the
following fact:
Let f be a convex function. Consider any x ∈ int(Dom f ) and any affine
plane M such that x ∈ M . Then, “every subgradient, taken at x, of the
restriction f|_M of f onto M can be obtained from the subgradients of f.”
That is, if e is such that f (y) ≥ f (x) + e⊤ (y − x) for all y ∈ M , then
there exists e′ ∈ ∂f (x) such that e⊤ (y − x) = (e′ )⊤ (y − x) for all y ∈ M .
■
greater than or equal to every possible value of the right hand side in (b), that is, that D(g + h) − e⊤h ≥ −D(−g + h′) + e⊤h′ whenever h, h′ ∈ E. By rearranging the terms, we thus need to show that
D(g + h) + D(−g + h′) ≥ e⊤h + e⊤h′, ∀h, h′ ∈ E.   (15.12)
This is indeed so:
D(g + h) + D(−g + h′) = 2[(1/2)D(g + h) + (1/2)D(−g + h′)] ≥ 2D((1/2)(g + h) + (1/2)(−g + h′)) = D(h + h′) ≥ e⊤(h + h′) = e⊤h + e⊤h′,
where the first inequality follows from convexity of the function D, the second
equality is due to D being positively homogeneous of degree 1, and the last
inequality is due to the facts that h + h′ ∈ E and e⊤ z is majorized by D(z) on
E. Hence, (15.12) is proved.
Remark III.15.15 The advantage of the preceding proof of Hahn-Banach The-
orem in finite dimensional case is that it straightforwardly combines with what is
called transfinite induction to yield Hahn-Banach Theorem in the case when Rn
is replaced with arbitrary, perhaps infinite dimensional, linear space and exten-
sion of linear functional from linear subspace on the entire space which preserves
majorization by a given convex and positively homogeneous, of degree 1, function
on the space.
In the finite-dimensional case, alternatively, we can prove Hahn-Banach The-
orem, via Separation Theorem as follows: without loss of generality we can as-
sume that E ̸= Rn . Define the sets T := {[x; t] : x ∈ Rn , t ≥ D(x)} and
S := {[h; t] : h ∈ E, t = e⊤ h}. Thus, we get two nonempty convex sets with
non-intersecting relative interiors (as E ̸= Rn and D(h) majorizes e⊤ h on E).
Then, by Separation Theorem there exists a nontrivial (r ̸= 0) linear functional
r⊤ [x; t] ≡ d⊤ x+at, which separates S and T , i.e., inf y∈T r⊤ y ≥ supy∈S r⊤ y. More-
over, since S is a linear subspace, we deduce that supy∈S r⊤ y is either 0 or +∞.
Also, as T ̸= ∅, we conclude +∞ > inf y∈T r⊤ y ≥ supy∈S r⊤ y, and thus we must
have sup_{y∈S} r⊤y = 0. In addition, we claim that a > 0. Indeed,
0 = sup_{y∈S} r⊤y ≤ inf_{y∈T} r⊤y = inf_{x∈Rⁿ, t∈R} {d⊤x + at : t ≥ D(x)}.
As the right hand side value must be strictly greater than −∞, we see a ≥ 0. Also, if a = 0 were to hold, then from r ≠ 0 we must have d ≠ 0. Moreover, when a = 0 and d ≠ 0, since D(·) is a finite valued function we have inf_{x∈Rⁿ, t∈R} {d⊤x + at : t ≥ D(x)} = −∞. But this contradicts the infimum being bounded below by 0. Now that a > 0, by multiplying r by a⁻¹, we get a
= sup_{h∈E} {(−e′)⊤h + e⊤h},
and so we conclude that (e′)⊤h = e⊤h holds for all h ∈ E. Note also that the relation 0 = sup_{[h;t]∈S} r⊤[h; t] ≤ inf_{[x;t]∈T} r⊤[x; t] is nothing but (e′)⊤x ≤ D(x) for all x ∈ Rⁿ. ■
Hahn-Banach Theorem is extremely important on its own right, and our way
of proving Theorem III.15.12 was motivated by the desire to acquaint the reader
with Hahn-Banach Theorem. If justification of Theorem III.15.12 were to be our
sole goal, we could have achieved this goal in a much broader setting and at a
cheaper cost, see solution to Exercise IV.29.D.4.
16
⋆ Legendre transform
f(x) ≥ d⊤x − a, ∀x ∈ Rⁿ,
or, which is the same,
a ≥ d⊤x − f(x), ∀x ∈ Rⁿ,
i.e.,
a ≥ sup_{x∈Rⁿ} [d⊤x − f(x)].
The supremum in the right hand side of this inequality is a certain function of d,
and we arrive at the following important definition.
Let us see some examples of simple functions and their Legendre transforms.
Example III.16.1
1. Given a ∈ R, consider the constant function
f (x) ≡ a.
Its Legendre transform is given by
f∗(d) = sup_{x∈Rⁿ} [d⊤x − f(x)] = sup_{x∈Rⁿ} [d⊤x − a] = −a if d = 0, and +∞ otherwise.
2. Consider the affine function
f (x) = c⊤ x + a, ∀x ∈ Rn .
Its Legendre transform is given by
f∗(d) = sup_{x∈Rⁿ} [d⊤x − f(x)] = sup_{x∈Rⁿ} [d⊤x − (c⊤x + a)] = −a if d = c, and +∞ otherwise.
3. Consider the strictly convex quadratic function
f(x) = (1/2) x⊤Ax,
where A ∈ Sⁿ is positive definite. Its Legendre transform is given by
f∗(d) = sup_{x∈Rⁿ} [d⊤x − f(x)] = sup_{x∈Rⁿ} [d⊤x − (1/2) x⊤Ax] = (1/2) d⊤A⁻¹d,
where the final equality holds by examining the first-order necessary and sufficient optimality condition (for a maximization type objective) of differentiable concave functions.
4. Consider the function f : R → R given by f (x) = |x|p /p, where p ∈ (1, ∞).
Then, using the first-order optimality conditions we see that the Legendre
transform of f is given by
f∗(d) = sup_{x∈R} [dx − |x|^p/p] = |d|^q/q,
where q satisfies 1/p + 1/q = 1.
5. Suppose f is a proper convex function and the function g is defined to be
g(x) = f (x − a). Then, the Legendre transform of g satisfies
g∗(d) = sup_{x∈Rⁿ} [d⊤x − g(x)] = sup_{x∈Rⁿ} [d⊤x − f(x − a)] = sup_{y∈Rⁿ} [d⊤(y + a) − f(y)] = d⊤a + f∗(d).
♢
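These closed forms are easy to sanity-check numerically; the sketch below (not from the text; it assumes NumPy, and the grid and tolerances are my own) computes the Legendre transform of f(x) = |x|^p/p on a fine grid and compares it with |d|^q/q:

# Illustration only: numerical Legendre transform of f(x) = |x|**p / p compared
# with the closed form f*(d) = |d|**q / q, where 1/p + 1/q = 1.
import numpy as np

p = 3.0
q = p / (p - 1.0)
xs = np.linspace(-50, 50, 200_001)
fx = np.abs(xs) ** p / p
for d in [-2.0, -0.5, 0.0, 1.0, 3.0]:
    legendre = (d * xs - fx).max()          # sup_x [ d*x - f(x) ] on the grid
    closed_form = abs(d) ** q / q
    assert abs(legendre - closed_form) < 1e-3
print("numerical Legendre transform matched |d|^q / q")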
16.2 Legendre transform: Main properties
Proof. First, by Fact III.16.2 f ∗ is a proper lsc convex function, so that f ∗∗ , once
again by Fact III.16.2, is a proper lsc convex function as well. Next, by definition,
f∗∗(x) = (f∗)∗(x) = sup_{d∈Rⁿ} [x⊤d − f∗(d)] = sup_{d∈Rⁿ, a≥f∗(d)} [d⊤x − a].
Now, recall from the origin of the Legendre transform that a ≥ f∗(d) if and only if the affine function d⊤x − a is a minorant of f. Thus, sup_{d∈Rⁿ, a≥f∗(d)} [d⊤x − a]
is exactly the supremum of all affine minorants of f , and this supremum, by
Proposition III.15.7 is nothing but the closure of f . Finally, when f is proper
convex and lsc, f = cl f by Observation III.15.6, that is f ∗∗ = cl f is the same as
f ∗∗ = f .
The Legendre transform is a very powerful descriptive tool, i.e., it is a “global”
transformation, so that local properties of f ∗ correspond to global properties
of f. Below we give a number of important consequences of the Legendre transform highlighting this.
Let f be a proper convex lsc function.
A. By Proposition III.16.3, the Legendre transform f∗(d) = sup_x [x⊤d − f(x)] is a proper convex lsc function and f(x) = sup_d [x⊤d − f∗(d)]. Since f∗(d) ≥ d⊤x − f(x) for all x, we have
x⊤ d ≤ f (x) + f ∗ (d), ∀x, d ∈ Rn . (16.1)
Moreover, inequality in (16.1) becomes equality if and only if x ∈ Dom f and
d ∈ ∂f (x), same as if and only if d ∈ Dom f ∗ and x ∈ ∂f ∗ (d).
Proof. All we need is to justify the “moreover” part of the claim. Let x, d ∈
Rn , and let us prove that d⊤ x = f (x) + f ∗ (d) if and only if d ∈ ∂f (x). In one
To justify this claim, note that when x = 0 we can select any y with ∥y∥_q = 1. When x ≠ 0 and p < ∞ we can set y to be
y_i := ∥x∥_p^{1−p} |x_i|^{p−1} sign(x_i), ∀i = 1, . . . , n,
where we set 0^{p−1} = 0 when p = 1. Finally, when p = ∞, that is q = 1, we can find an index i∗ of the largest in magnitude entry of x and set
y_i = sign(x_{i∗}) if i = i∗, and y_i = 0 if i ≠ i∗.
These observations altogether lead us to an extremely important, although
simple, fact:
∥x∥_p = max_y { y⊤x : ∥y∥_q ≤ 1 },  where 1/p + 1/q = 1.   (16.4)
Based on this, we, in particular, deduce that ∥x∥p is convex (as an upper bound
of a family of linear functions). Hence, by its convexity we deduce that for any
x′ , x′′ we have
∥x′ + x″∥_p = 2 ∥(1/2)x′ + (1/2)x″∥_p ≤ 2 (∥x′∥_p/2 + ∥x″∥_p/2) = ∥x′∥_p + ∥x″∥_p,
which is nothing but the triangle inequality. Thus, ∥x∥_p satisfies the triangle inequality; it clearly possesses the other two characteristic properties of a norm, namely positivity and homogeneity, as well. Consequently, ∥·∥_p is a norm, a fact that we have announced twice and have now proved (see Example III.13.1).
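For completeness, here is a short numerical check (mine, not from the text; it assumes NumPy) that the vector y constructed above indeed realizes the maximum in (16.4):

# Illustration only: for 1 < p < infinity, the vector
# y_i = ||x||_p**(1-p) * |x_i|**(p-1) * sign(x_i) satisfies ||y||_q = 1 and
# y^T x = ||x||_p, which realizes the maximum in (16.4).
import numpy as np

rng = np.random.default_rng(6)
for p in [1.5, 2.0, 4.0]:
    q = p / (p - 1.0)
    x = rng.normal(size=7)
    norm_p = np.linalg.norm(x, p)
    y = norm_p ** (1.0 - p) * np.abs(x) ** (p - 1.0) * np.sign(x)
    assert abs(np.linalg.norm(y, q) - 1.0) < 1e-10
    assert abs(y @ x - norm_p) < 1e-10
print("maximizing y in (16.4) verified for several values of p")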
Proof. Let ρ, σ ∈ (0, 1] be such that ρ ̸= σ. Consider any λ ∈ (0, 1) and set
π := λρ+(1−λ)σ. By defining θ := λρ/π, we get 0 < θ < 1 and 1−θ = (1−λ)σ/π.
Let r := 1/ρ, s := 1/σ, and p := 1/π, so we have p = θr + (1 − θ)s. Then,
Σ_{i=1}^n |a_i|^p = Σ_{i=1}^n |a_i|^{θr} |a_i|^{(1−θ)s} ≤ (Σ_{i=1}^n |a_i|^r)^θ (Σ_{i=1}^n |a_i|^s)^{1−θ},
where the inequality follows from Hölder’s inequality. Raising both sides of the resulting inequality to the power 1/p we arrive at
∥a∥_p ≤ ∥a∥_r^{rθ/p} ∥a∥_s^{s(1−θ)/p} = ∥a∥_{1/ρ}^λ ∥a∥_{1/σ}^{1−λ}.
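A numerical sanity check of the Moment inequality (an illustration of mine, not from the text; it assumes NumPy and random data of my own choosing):

# Illustration only: ||a||_{1/pi} <= ||a||_{1/rho}**lam * ||a||_{1/sigma}**(1-lam)
# with pi = lam*rho + (1-lam)*sigma, i.e. convexity of rho -> ln ||a||_{1/rho} on (0,1].
import numpy as np

rng = np.random.default_rng(7)
for _ in range(200):
    a = rng.normal(size=8)
    rho, sigma = rng.uniform(0.05, 1.0, size=2)
    lam = rng.uniform(0.0, 1.0)
    pi = lam * rho + (1 - lam) * sigma
    lhs = np.linalg.norm(a, 1.0 / pi)
    rhs = np.linalg.norm(a, 1.0 / rho) ** lam * np.linalg.norm(a, 1.0 / sigma) ** (1 - lam)
    assert lhs <= rhs * (1 + 1e-12)
print("Moment inequality held on all sampled (a, rho, sigma, lambda)")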
As its name implies, one can indeed show that this function ∥d∥∗ is a norm. For example, when p ∈ [1, ∞], (16.4) says that the norm conjugate to ∥·∥_p is ∥·∥_q where 1/p + 1/q = 1.
We also have the following characterization of the Legendre transform of norms.
where the inequality follows from the convexity of f and the second equality is
due to the fact that f is permutation symmetric.
Our developments will also rely on the following fundamental fact.
= Tr(e_i⊤ (U⊤ Diag{λ₁(X), . . . , λₙ(X)} U) e_i)
= Tr(U e_i e_i⊤ U⊤ Diag{λ₁(X), . . . , λₙ(X)})
= Σ_{j=1}^n u_{ji}² λ_j(X) = [Πλ(X)]_i.
Let us denote by Oₙ the set of all n × n orthogonal matrices. Lemmas III.17.1
and III.17.2 together give us the following very useful relation.
Proof. The first claim immediately follows from Lemmas III.17.1 and III.17.2.
To see the second claim, consider any V ∈ On . Note that the matrix V ⊤ XV has
the same eigenvalues as X. Then, as f is a convex and permutation symmetric
function, applying the first claim of this proposition to the matrix V ⊤ XV , we
conclude
f (Dg{V ⊤ XV }) ≤ f (λ(V ⊤ XV )) = f (λ(X)).
Taking the supremum over V ∈ On of both sides of this relation gives us
f(λ(X)) ≥ sup_{V∈Oₙ} f(Dg{V⊤XV}).
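As a numerical illustration (mine, not from the text; it assumes NumPy) of the inequality f(Dg{V⊤XV}) ≤ f(λ(X)), with f taken to be the sum of the two largest entries, a convex and permutation symmetric function:

# Illustration only: for convex permutation-symmetric f (here the sum of the 2
# largest entries), f(Dg{V^T X V}) <= f(lambda(X)) for every orthogonal V;
# random orthogonal V are drawn via QR factorization.
import numpy as np

def f(z, k=2):
    return np.sort(z)[::-1][:k].sum()       # sum of k largest entries (convex, symmetric)

rng = np.random.default_rng(8)
n = 5
M = rng.normal(size=(n, n)); X = (M + M.T) / 2
lam = np.linalg.eigvalsh(X)
for _ in range(500):
    Q, _ = np.linalg.qr(rng.normal(size=(n, n)))     # random orthogonal matrix
    diag = np.diag(Q.T @ X @ Q)                      # Dg{V^T X V}
    assert f(diag) <= f(lam) + 1e-10
print("f(Dg{V^T X V}) <= f(lambda(X)) held for all sampled orthogonal V")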
Note that ∥x − y∥2 is a convex function of x and y over the convex domain
{[x; y] ∈ Rn ×Rn : y ∈ Q}, and as convexity is preserved by partial minimization,
f (x) is a convex real-valued function. Permutation symmetry of Q and ∥·∥2 clearly
implies permutation symmetry of f . Then, by Proposition III.17.3 the function
F (X) := f (λ(X)) is a convex function of X. From the definition of the function
f , we have z ∈ Q if and only if f (z) = 0, which holds if and only if f (z) ≤ 0.
Thus,
Q = {X ∈ Sn : f (λ(X)) ≤ 0} = {X ∈ Sn : F (X) ≤ 0} ,
that is Q is a sublevel set of a convex function F (X) of X, and is hence convex.
Consider a univariate real-valued function g defined on some set Dom g ⊆ R.
In section D.1.5 we have associated with the function g(·) the matrix-valued
map X 7→ g(X) : Sn → Sn as follows: the domain of this map is composed
of all matrices X ∈ Sn with the spectrum σ(X) (subset of R composed of all
eigenvalues of X) contained in Dom g, and for such a matrix X, we set
g(X) := U Diag{g(λ1 ), . . . , g(λn )}U ⊤ ,
where X = U Diag{λ1 , . . . , λn }U ⊤ is an eigenvalue decomposition of X.
The following fact is quite important:
whenever g is convex, the function F : Sⁿ → R ∪ {+∞} given by
F(X) := Tr(g(X)) if σ(X) ⊆ Dom g, and F(X) := +∞ otherwise,
is convex.
We close this chapter with the following very useful fact from majorization.
is convex.
3. Let A_i, i ≤ I, be as in item 2. Is it true that the function
g(x) = ln Det((Σ_i x_i A_i)^{−1}) : {x ∈ R^I : x > 0} → R
is convex?
4. Let B_i, i ≤ I, be m_i × n matrices such that Σ_i B_i⊤B_i ≻ 0, and let
is convex.
5. Let B_i, i ≤ I, and Λ be as in the previous item. Prove that the matrix-valued function
F(λ) = [Σ_i B_i⊤ λ_i^{−1} B_i]^{−1} : Λ → int Sⁿ₊
Given r ∈ Rᵐ, consider the optimization problem
Opt(r) = sup_x {f(x) : x ∈ X, G(x) ≤ r},   (P[r])
where X ⊂ Rⁿ is a nonempty convex set, f(·) : X → R is concave, and G(x) = [g₁(x); . . . ; g_m(x)] : X → Rᵐ is a vector function with convex components, and let R be the set of those r for which (P[r]) is feasible. Prove that
1. R is a convex set with nonempty interior and this set is monotone, meaning that when r ∈ R
and r′ ≥ r, one has r′ ∈ R.
2. The function Opt(r) : R → R ∪ {+∞} satisfies the concavity inequality:
∀(r, r′ ∈ R, λ ∈ [0, 1]) : Opt(λr + (1 − λ)r′ ) ≥ λOpt(r) + (1 − λ)Opt(r′ ). (!)
3. If Opt(r) is finite at some point r̄ ∈ int R, then Opt(r) is real-valued everywhere on R.
Moreover, when X = Rn , and f and the components of G are affine, so that (P [r]) is an
LP program, we can replace in the above claim the inclusion r ∈ int R with the inclusion
r ∈ R: in the LP case, the function Opt(r) is either identically +∞ everywhere on R, or is
real-valued at every point of R.
Comment. Think about problem (P [r]) as about problem where r is the vector of resources
you create, and f (·) is your profit, so that the problem is to maximize your profit given your
resources and “technological constraints” x ∈ X. Now let r̄ ∈ R and e be a nonnegative vector,
and let us look what happens when you select your vector of resources on the ray R = r̄ + R+ e,
assuming that Opt(r) on this ray is real-valued. Restricted on this ray, your best profit becomes
a function ϕ(t) of nonnegative variable t:
ϕ(t) = Opt(r̄ + te).
Since e ≥ 0, this function is nondecreasing, as it should be: the larger t, the more resources you
have, and the larger is your profit. A not so nice news is that ϕ(t) is concave in t, meaning that
the slope of this function does not increase as t grows. In other words, if it costs you $1 to pass
from resources r̄ + te to resources r̄ + (t + 1)e, the return ϕ(t + 1) − ϕ(t) on one extra dollar of
your investment goes down (or at least does not go up) as t grows. This is called The Law of
Diminishing Marginal Returns.
Exercise III.5 ▲ [follow-up to Exercise III.4] There are n goods j with per-unit prices
cj > 0, per-unit utilities vj > 0, and the maximum available amounts xj , j ≤ n. Given budget
R ≥ 0, you want to decide on amounts xj of goods to be purchased to maximize the total utility
of the purchased goods, while respecting the budget and the availability constraints. Pose the problem as an explicit optimization problem and verify that the optimal value Opt(R) is a piecewise linear function of R. What
are the breakpoints of this function? What are the slopes between breakpoints?
where, as always, s_k(x) = Σ_{i=1}^k x_{(i)}. As we know from Exercise I.29, the functions s_k(x), k < n, are polyhedrally representable:
t ≥ s_k(x) ⟺ ∃ z ≥ 0, s : x_i ≤ z_i + s, i ≤ n, Σ_i z_i + ks ≤ t,
This polyhedral representation has 2n2 − n linear inequalities and n2 + n − 2 extra variables.
Now goes the exercise:
1. Find an alternative polyhedral representation of f with n2 + 1 linear inequalities and 2n
extra variables.
2. [computational study] Generate at random orthogonal n × n matrix U and vector β with
nonincreasing entries and solve numerically the problem
min_x { f(x) := Σ_k β_k x_{(k)} : ∥Ux∥_∞ ≤ 1 }
utilising the above polyhedral representations of f . For n = 8, 16, 32, . . . , 1024, compare the
running times corresponding to the 2 representations in question.
Exercise III.7 ♦ Let a ∈ Rn be a nonzero vector, and let f (ρ) = ln(∥a∥1/ρ ), ρ ∈ [0, 1].
Moment inequality, see section 16.3.3, states that f is convex. Prove that the function is also
nonincreasing and Lipschitz continuous, with Lipschitz constant ln n, or, which is the same, that
1 ≤ p ≤ p′ ≤ ∞ ⟹ ∥a∥_p ≥ ∥a∥_{p′} ≥ n^{1/p′ − 1/p} ∥a∥_p.
Exercise III.8 ▲ This Exercise demonstrates power of Symmetry Principle. Consider the
situation as follows: you are given noisy observations
ω = Ax + ξ, A = Diag{αi , i ≤ n}
of unknown signal x known to belong to the unit ball B = {x ∈ Rn : ∥x∥2 ≤ 1}; here αi > 0
are given, and ξ is the standard (zero mean, unit covariance) Gaussian observation noise. Your
goal is to recover from this observation the vector y = Bx, B = Diag{βi , i ≤ n} being given.
You intend to recover y by a linear estimate ŷ_H(ω) = Hω, where H is an n × n matrix you are allowed to choose. For example, selecting H = BA⁻¹ = Diag{β_i α_i⁻¹}, you get an unbiased estimate:
E{ŷ_H(Ax + ξ) − y} = 0.
so that Err²_x(H) is the expected squared ∥·∥₂-distance between the estimate and the estimated quantity, and on the entire set B of possible signals by the risk Risk[H] = max_{x∈B} Err_x(H).
1. Find closed form expressions for Errx (H) and Risk(H).
2. Formulate the problem of finding the linear estimate with minimal risk as the problem of
minimizing a convex function and prove that the problem is solvable, and admits an optimal
solution H ∗ which is diagonal: H ∗ = Diag{ηi , i ≤ n}.
3. Reduce the problem yielded by item 2 to the problem of minimizing an easy-to-compute convex univariate function. Consider the case when β_i = i⁻¹ and α_i = [σi²]⁻¹, 1 ≤ i ≤ n, set n = 10000 and fill the following table:
where H ∗ is the minimum risk linear estimate as yielded by the solution to univariate problem
you end up with, and Risk[BA−1 ] is the risk of unbiased linear estimate.
You should see from your numerical results that minimal risk of linear estimation is much
smaller than the risk of the unbiased linear estimate. Explain on qualitative level why allowing
for bias reduces the risk.
Exercise III.9 ♦ 1 Given the sets of d-dimensional tentative nodes (d = 2 or d = 3) and of
tentative bars of a TTD problem satisfying assumption R, let V = RM be the space of virtual
displacements of the nodes, N be the number of tentative bars, and W > 0 be the allowed total
bar volume, see Exercise I.16. Let, next, C(t, f) : RN+ × V → R ∪ {+∞} be the compliance of
truss t ≥ 0 w.r.t. load f (we identify trusses with the corresponding vectors t of bar volumes).
Prove that
1. C(t, f) is a convex lsc function of [t; f], positively homogeneous of homogeneity degree 1, with RN++ × V ⊂ Dom C, where RN++ = int RN+ = {t ∈ RN : t > 0}. This function is positively homogeneous, of degree −1, in t when f is fixed, and positively homogeneous, of degree 2, in f when t is fixed. Besides this, C(t, f) is nonincreasing in t ≥ 0: if 0 ≤ t′ ≤ t, then C(t, f) ≤ C(t′, f) for every f.
2. The function Opt(W, f) = inf_t {C(t, f) : t ≥ 0, Σi ti = W} – the optimal value in the TTD problem (5.2) – with W restricted to reside in R++ = {W > 0} is a convex continuous function with the domain R++ × V. This function is positively homogeneous, of degree −1, in W > 0 and homogeneous, of homogeneity degree 2, in f:
To formulate the next two tasks, let us associate with a free node p the set F p of all single-force loads stemming from forces g of magnitude ∥g∥2 not exceeding 1 and acting at node p. For a set S of free nodes, F S is the set of all loads with nonzero forces acting solely at the nodes from S and with the sum of ∥ · ∥2-magnitudes of the forces not exceeding 1, so that
F S = Conv(∪p∈S F p)
(why?)
1 Preceding exercises in the TTD series are I.16, I.18.
4. Let S = {p1 , . . . , pK } be a K-element collection of free nodes from the nodal set. Assume
that for every node p from S and every load f ∈ F p there exists a truss of a given total
weight W such that its compliance w.r.t. f does not exceed 1. Which, if any, of the following
statements
(i) For every load f ∈ F S , there exists a truss of total volume W with compliance w.r.t. f
not exceeding 1
(ii) There exists a truss of total volume W with compliance w.r.t. every load from F S not
exceeding 1
(iii) For properly selected γ depending solely on d, there exists a truss of total volume γKW
with compliance w.r.t. every load from F S not exceeding 1
is true?
⋆5. Prove the following statement:
In the situation of item 4 above, let γ = 4 when d = 2 and γ = 7 when d = 3. For every k ≤ K there exists a truss t̂k of total volume γW such that the compliance of t̂k w.r.t. every load from F pk does not exceed 1. As a result, there exists a truss t̃ of total volume γKW with compliance w.r.t. every load from F S not exceeding 1.
Exercise III.10 Let X ⊂ Rn be a nonempty convex set and let χX be its characteristic function: χX(x) = 0 when x ∈ X and χX(x) = +∞ otherwise. As is immediately seen, this function is convex and proper. The Legendre transform of this function is called the support function ϕX(x) of X:
ϕX(x) = sup_{y∈X} x⊤y.
1. Prove that χX is lower semicontinuous (lsc) if and only if X is closed, and that the support functions of X and cl X are the same.
In the remaining part of the Exercise, we are interested in properties of support functions, and in view of item 1, it makes sense to assume from now on that X, on top of being nonempty and convex, is also closed.
Prove the following facts:
2. ϕX(·) is a proper lsc convex function which is positively homogeneous of degree 1:
(x ∈ Dom ϕX, λ ≥ 0) =⇒ ϕX(λx) = λϕX(x).
In particular, the domain of ϕX is a cone. Demonstrate by example that this cone is not necessarily closed (look at the support function of the closed convex set {[v; w] ∈ R² : v > 0, w ≤ ln v}).
3. Vice versa, every proper convex lsc function ϕ which is positively homogeneous of degree 1,
(x ∈ Dom ϕ, λ ≥ 0) =⇒ ϕ(λx) = λϕ(x),
is the support function of a nonempty closed convex set, specifically, of its subdifferential ∂ϕ(0)
taken at the origin. In particular, ϕX (·) “remembers” X: if X, Y are nonempty closed convex
sets, then ϕX (·) ≡ ϕY (·) if and only if X = Y .
4. Let X, Y be two nonempty closed convex sets. Then ϕX (·) ≥ ϕY (·) if and only if Y ⊂ X.
5. Dom ϕX = Rn if and only if X is bounded.
6. Let X be the unit ball of some norm ∥ · ∥. Then ϕX is nothing but the norm ∥ · ∥∗ conjugate
to ∥ · ∥. In particular, when p ∈ [1, ∞] and X = {x ∈ Rn : ∥x∥p ≤ 1}, we have ϕX(x) ≡ ∥x∥q, where 1/q + 1/p = 1.
7. Let x ↦ Ax + b : Rn → Rm be an affine mapping, and let Y = AX + b = {Ax + b : x ∈ X}.
Then
ϕY (v) = ϕX (A⊤ v) + b⊤ v.
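As a simple illustration of how items 6 and 7 work together: the Euclidean ball X = {x ∈ Rn : ∥x − c∥2 ≤ r} (r ≥ 0) is the image of the unit ∥ · ∥2-ball under the affine mapping x ↦ rx + c, whence ϕX(v) = r∥v∥2 + c⊤v, the support function of the unit ∥ · ∥2-ball being ∥ · ∥2 itself.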
Exercise III.11 ♦ [Minkowski functions of convex sets] The goal of this Exercise is to acquaint
the reader with an important special family of convex functions – Minkowski functions of convex
sets.
Consider a proper nonnegative lower semicontinuous function f : Rn → R ∪ {+∞} which is
positively homogeneous of degree 1, meaning that
x ∈ Dom f, t ≥ 0 =⇒ tx ∈ Dom f & f (tx) = tf (x).
Note that from the latter property of f and its properness it follows that 0 ∈ Dom f and
f (0) = 0.
We can associate with f its basic sublevel set
X = {x ∈ Rn : f (x) ≤ 1}.
Note that X “remembers” f, specifically
∀t > 0 : f(x) ≤ t ⇐⇒ f(t⁻¹x) ≤ 1 ⇐⇒ t⁻¹x ∈ X,
whence also
∀x ∈ Rn : f(x) = inf {t : t > 0, t⁻¹x ∈ X}    (18.1)
[here, as always, the infimum over an empty set is +∞].
Note that the basic sublevel set of our f cannot be arbitrary: it is convex and closed (since f is
convex lsc) and contains the origin (since f (0) = 0).
Now, given a closed convex set X ⊂ Rn containing the origin, we can associate with it a
function f : Rn → R ∪ {+∞} by construction from (18.1), specifically, as
f(x) = inf {t : t > 0, t⁻¹x ∈ X}.    (18.2)
This function is called the Minkowski function (M.f.) of X.
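To get a feeling for the construction: for the unit ball X = {x : ∥x∥ ≤ 1} of a norm ∥ · ∥, the Minkowski function is f(x) = ∥x∥ (indeed, for t > 0 one has t⁻¹x ∈ X exactly when t ≥ ∥x∥), and for the ellipsoid X = {x : x⊤Cx ≤ 1} with C ≻ 0 it is f(x) = √(x⊤Cx).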
Here goes your first task:
1. Prove that when X ⊂ Rn is convex, closed, bounded, and contains the origin, the function f given by (18.2) is a proper, nonnegative, convex lsc function positively homogeneous of degree 1, and X is the basic sublevel set of f. Moreover, f is nothing but the support function ϕX∗ of the polar X∗ of X.
Your next tasks are as follows:
2. What are the Minkowski functions of
• the singleton {0} ?
• a linear subspace ?
• a closed cone K ?
4. Let X = ∩k≤K Xk , where X1 , . . . , XK are closed convex sets in Rn such that XK ∩ int X1 ∩
int X2 . . . ∩ int XK−1 ̸= ∅. Prove that ϕX (y) ≤ a if and only if there exist yk , k ≤ K, such
that
y = Σk yk  &  Σk ϕXk(yk) ≤ a.    (∗)
In words: In the situation in question, the supremum of a linear form on ∩k Xk does not
exceed some a if and only if the form can be decomposed into the sum of K forms with the
sum of their suprema over the respective sets Xk not exceeding a.
5. Prove the following polyhedral version of the claim in item 4:
Let Xk = {x ∈ Rn : Ak x ≤ bk }, k ≤ K, be polyhedral sets with nonempty intersection X.
A linear form does not exceed some a ∈ R everywhere on X if and only if the form can be
decomposed into the sum of K linear forms with the sum of their maxima on the respective
sets Xk not exceeding a.
Exercise III.13 ▲ Let X ⊂ Rn be a nonempty polyhedral set given by polyhedral represen-
tation
X = {x : ∃u : Ax + Bu ≤ r}.
Build polyhedral representation of the epigraph of the support function of X. For non-polyhedral
extension, see Exercise IV.36.
Exercise III.14 ▲ Compute in closed analytic form the support functions of the following
sets:
1. The ellipsoid {x ∈ Rn : (x − c)⊤ C(x − c) ≤ 1} with C ≻ 0
2. The probabilistic simplex {x ∈ Rn+ : Σi xi = 1}
3. The nonnegative part of the unit ∥ · ∥p-ball: X = {x ∈ Rn+ : ∥x∥p ≤ 1}, p ∈ [1, ∞]
4. The positive semidefinite part of the unit ball of the norm ∥ · ∥p,Sh: X = {x ∈ Sn+ : ∥x∥p,Sh ≤ 1}
2. Verify that ∥ · ∥2,Sh is nothing but the Frobenius norm of X, and ∥X∥∞,Sh is the same as the
spectral norm of X.
Exercise III.18 ♦ [chain rule for subdifferentials] Let Y ⊂ Rm and X ⊂ Rn be nonempty convex sets, ȳ ∈ Y, x̄ ∈ X, f(·) : Y → R be a convex function, and A(·) : X → Y be a mapping with A(x̄) = ȳ. Let, further, K be a closed cone in Rm. A function f is called K-monotone on Y if for y, y′ ∈ Y such that y′ − y ∈ K it holds f(y′) ≥ f(y), and A is called K-convex on X if for all x, x′ ∈ X and λ ∈ [0, 1] it holds λA(x) + (1 − λ)A(x′) − A(λx + (1 − λ)x′) ∈ K.²
Prove that
1. A is K-convex on X if and only if for every ϕ ∈ K∗ the real-valued function ϕ⊤ A(x) is convex
on X.
2. Let A be K-convex on X and differentiable at x̄. Prove that
∀x ∈ X : A(x) − [A(x̄) + A′(x̄)[x − x̄]] ∈ K. (∗)
3. Let f be K-monotone on Y and A be K-convex on X. Prove that the real-valued function f◦A(x) = f(A(x)) is convex on X.
4. Let f be K-monotone on Y. Prove that ∂f(y) ⊂ K∗ provided y ∈ int Y.
5. [chain rule] Let ȳ ∈ int Y, x̄ ∈ int X, let f be K-monotone on Y, A be K-convex on X and differentiable at x̄. Prove that
∂f◦A(x̄) = [A′(x̄)]⊤ ∂f(ȳ) = {[A′(x̄)]⊤g : g ∈ ∂f(ȳ)} (!)
Exercise III.19 ♦ Recall that the sum Sk(X) of the k ≤ n largest eigenvalues of X ∈ Sn is a convex function of X, see Remark III.17.4. Point out a subgradient of Sk(·) at a point X ∈ Sn.
2 We shall study cone-monotonicity and cone-convexity in more details in Part IV.
where ξ ∈ Rn , τ ∈ R are parameters. Is the problem convex3) ? What is the domain in the
space of parameters where the problem is solvable? What is the optimal value? Is it convex in
the parameters?
Exercise III.23 ♦ Consider the optimization problem
where a, b ∈ R are parameters. Is the problem convex? What is the domain in space of parame-
ters where the problem is solvable? What is the optimal value? Is it convex in the parameters?
Exercise III.24 ▲ Compute Legendre transforms of the following functions:
• [“geometric mean”] f(x) = −Πi≤n xi^{πi} : Rn+ → R, where πi > 0 sum up to 1 and n > 1.
|x|p = ( Σ_{j=1}^{n} qj |xj|^p )^{1/p}
This clearly is a norm which becomes the standard norm ∥ · ∥p when qj = 1, j ≤ n. Same as
∥x∥p , the quantity |x|p has limit, namely, ∥x∥∞ , as p → ∞, and we define | · |∞ as this limit.
Now let pi, i ≤ k, be positive reals such that
Σ_{i=1}^{k} 1/pi = 1.
3 A maximization problem with objective f (·) and certain constraints and domain is called convex if
the equivalent minimization problem with the objective (−f ) and the original constraints and
domain is convex.
Prove that
|x¹ x² . . . x^k|₁ ≤ Σ_{i=1}^{k} |xⁱ|_{pi}^{pi} / pi .    (∗)
provided some measurability conditions are satisfied. In this textbook we, however, do not touch
infinite-dimensional spaces of functions and related norms.
Exercise III.26 ♦ [Muirhead’s inequality] For any u ∈ Rn and z ∈ Rn++ := {z ∈ Rn : z > 0} define
fz(u) = (1/n!) Σ_σ z_{σ(1)}^{u1} · · · z_{σ(n)}^{un},
where the sum is over all permutations σ of {1, . . . , n}. Show that if P is a doubly stochastic n × n matrix, then
fz(Pu) ≤ fz(u)  ∀(u ∈ Rn, z ∈ Rn++).
Exercise III.27 ♦ Prove that a convex lsc function f with polyhedral domain is continuous
on its domain. Does the conclusion remain true when lifting either one of the assumptions that
(a) convex f is lsc, and (b) Dom f is polyhedral?
Exercise III.28 ▲ Let a1 , . . . , an > 0, α, β > 0. Solve the optimization problem
min_x { Σ_{i=1}^{n} ai/xi^α : x > 0, Σi xi^β ≤ 1 }
Exercise III.29 ▲ [computational study] Consider the following situation: there are K “radars”, the k-th of them capable of locating targets within the ellipsoid Ek = {x ∈ Rn : (x − ck)⊤Ck(x − ck) ≤ 1} (Ck ≻ 0); the measured position of the target is
yk = x + σk ζk ,
where x is the actual position of the target, and ζk is the standard (zero mean, unit covariance)
Gaussian observation noise; ζk are independent across k. Given measurements y1 , . . . , yK of
target’s location x known to belong to the “common field of view” E = ∩k Ek of the radars,
which we assume to possess a nonempty interior, we want to estimate a given linear form e⊤ x
of x by using linear estimate
x̂ = Σk hk⊤yk + h.
We are interested in finding the estimate (i.e., the parameters h1, . . . , hK, h) minimizing the risk
Risk2 = max_{x∈E} √( E{ [ e⊤x − Σk hk⊤[x + σkζk] − h ]² } )
Fact III.16.4 Let ∥ · ∥ be a norm on Rn . Then, its dual norm ∥ · ∥∗ is indeed a norm.
Moreover, the norm dual to ∥ · ∥∗ is the original norm ∥ · ∥, and the unit balls of norms conjugate to each other are polars of each other.
Proof. From definition of ∥ · ∥∗ it immediately follows that ∥ · ∥∗ satisfies all three conditions
specifying a norm. To justify that the norm dual to ∥ · ∥∗ is ∥ · ∥, note that the unit ball of the
dual norm is, by the definition of this norm, the polar of the unit ball of ∥ · ∥, and the latter
set, as the unit ball of any norm, is closed, convex, and contains the origin. As a result, the unit
balls of ∥ · ∥, ∥ · ∥∗ are polars of each other (Proposition II.8.37), and the norm dual to dual is
the original one – its unit ball is the polar of the unit ball of ∥ · ∥∗ .
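For instance, for ∥ · ∥ = ∥ · ∥1 the dual norm is ∥d∥∗ = max{d⊤x : ∥x∥1 ≤ 1} = maxi |di| = ∥d∥∞, since the maximum of a linear form over the ∥ · ∥1-ball is attained at one of the vertices ±ei; symmetrically, the norm dual to ∥ · ∥∞ is ∥ · ∥1, in full accordance with Fact III.16.4.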
That is, the Legendre transform of ∥ · ∥ is the characteristic function of the unit ball
of the conjugate norm.
Proof. Consider any fixed d ∈ Rn . By the definition of Legendre transform we have
f∗(d) = sup_{x∈Rn} { d⊤x − f(x) } = sup_{x∈Rn} { d⊤x − ∥x∥ }.
Now, consider the function gd (x) := d⊤ x − ∥x∥ so that f∗ (d) = supx∈Rn gd (x). The function
gd (x) is positively homogeneous, of degree 1, in x, so that its supremum over the entire space
is either 0 (this happens when the function is nonpositive everywhere), or +∞. By the same
homogeneity, the function gd (x) is nonpositive everywhere if and only if it is nonpositive when
∥x∥ = 1, that is, if and only if d⊤ x ≤ 1 whenever ∥x∥ = 1, or, which is the same, when d⊤ x ≤ 1
whenever ∥x∥ ≤ 1. The bottom line is that f ∗ (d) = supx gd (x) is either 0, or +∞, with the first
option taking place if and only if d⊤ x ≤ 1 whenever ∥x∥ ≤ 1, that is, if and only if ∥d∥∗ ≤ 1.
x↓⊤y↓ ≥ x⊤y ≥ x↓⊤y↑.    (∗)
Indeed, by continuity, it suffices to verify this relation when all entries of x, same as all entries in y, are distinct from each other. In such a case, observe that the inequalities to be proved remain intact when we simultaneously reorder, in the same order, entries in x and in y, so that we can assume without loss of generality that x1 ≥ x2 ≥ . . . ≥ xn. Taking into account that ac + bd − [ad + bc] = [a − b][c − d], we see that if i < j and ȳ is obtained from y by swapping its i-th and j-th entries, we have x⊤ȳ ≤ x⊤y when yi > yj and x⊤ȳ ≥ x⊤y otherwise. Thus, the minimum (the maximum) of inner products x⊤z over the set of vectors z obtained by reordering entries in y is achieved when z = y↑ (respectively, z = y↓), as claimed.
In the situation of (i), by Birkhoff Theorem, P y is a convex combination of vectors obtained
from y by reordering entries, and so the relation in (i) is immediately implied by (∗).
(ii): Let A = U Diag{λ(A)}U ⊤ be the eigenvalue decomposition of A. Then,
where µ is the diagonal of the matrix U ⊤ BU , i.e., µ = Dg{U ⊤ BU }. Then, by Lemma III.17.2
we deduce that there exists a doubly stochastic matrix P such that
Thus, Tr(AB) = (λ(A))⊤ µ = (λ(A))⊤ P λ(B). The desired inequality then follows from ap-
plying part (i) to the vectors x := λ(A) and y := λ(B).
Part IV
20
Convex Programming problems and Convex Theorem on Alternative

where
– [below boundedness] the problem is called below bounded, if its optimal value
is > −∞, i.e., if the objective is bounded from below on the feasible set.
• [optimal solution] a point x ∈ Rn is called an optimal solution to (20.1), if x
is feasible and f (x) ≤ f (x′ ) for any other feasible solution x′ , i.e., if
x ∈ Argmin {f (x′ ) : x′ ∈ X, g(x′ ) ≤ 0, h(x′ ) = 0} .
Now, what does it mean that (20.4) has no solutions in the domain X? A
necessary and sufficient condition for this is that the infimum of the left hand
side of (20.4) over the domain x ∈ X is greater than or equal to c. Thus, we
arrive at the following evident result.
Let us stress that Proposition IV.20.1 is completely general; it does not require
any assumptions (not even convexity) on the entities involved.
That said, Proposition IV.20.1, unfortunately, is not so helpful: the actual
power of the Theorem on Alternative (and the key fact utilized in the proof of the
Linear Programming Duality Theorem) is not the sufficiency of the condition of
Proposition for infeasibility of (I), but the necessity of this condition. Justification
of necessity of the condition in question has nothing to do with the evident
reasoning that established its sufficiency. In the linear case (X = Rn , f , g1 , . . . , gm
are linear), we established the necessity via the Homogeneous Farkas Lemma. We
will next prove the necessity of the condition for the convex case. At this step, we
already need some additional, although minor, assumptions; and in the general
nonconvex case the sufficient condition stated in Proposition IV.20.1 simply is
not necessary for the infeasibility of (I). This, of course, is very bad-yet-expected
news – this is the reason why there are difficult optimization problems that we
do not know how to solve efficiently.
The just presented “preface” outlines our action plan. Let us carry out our
plan by formally defining the aforementioned “minor regularity assumptions.”
In the case where some of the constraints are linear, we rely on a slightly relaxed
regularity condition.
Clearly, the validity of Slater condition implies the validity of the Relaxed Slater
condition (why?). We are about to establish the following fundamental fact.
In our developments, we will frequently examine the dual cones as well. There-
fore, we introduce the following elementary fact on the regularity of dual cones.
Fact IV.20.6 (i) A cone K ⊆ Rn is regular if and only if its dual cone
K∗ = {y ∈ Rn : y ⊤ x ≥ 0, ∀x ∈ K} is regular.
(ii) Given regular cones K1 , . . . , Km , their direct product K1 × . . . × Km is
also regular.
There are a number of “magic cones” that are regular and play a crucial role
in Convex Optimization. In particular, many convex optimization problems from
practice can be posed as optimization problems involving domains expressed using
these cones as the basic building blocks.
Fact IV.20.7 The following cones (see Examples discussed in section 1.2.4)
are regular:
(i) Nonnegative ray, R+ .
(ii) Lorentz (a.k.a. second-order, or ice-cream) cone, Ln = {x ∈ Rn : xn ≥ √(x1² + . . . + x²n−1)} (L1 := R+).
(iii) Positive semidefinite cone, Sn+ = {X ∈ Sn : a⊤ Xa ≥ 0, ∀a ∈ Rn }.
See chapter 25 for other instructive examples of K-convex functions and their
“calculus.”
Indeed, K-convexity can be expressed in terms of the usual convexity due to
the following immediate observation.
a≥K b ⇐⇒ b≤K a ⇐⇒ a − b ∈ K.
the one of the usual ≥ and > inequalities. Verification of the claims made in
Fact IV.20.11 is immediate and is left to the reader.
In the standard approach to nonlinear convex optimization, a convex Mathematical Programming problem has the following form
min_{x∈X} {f(x) : g(x) := [g1(x); . . . ; gm(x)] ≤ 0, [h1(x); . . . ; hk(x)] = 0},
system (I). We call system (ConI) convex if, in addition to the already assumed convexity of X, the function f is convex on X, and the map ĝ is K-convex on X.
Denoting by K∗ the cone dual to K, a sufficient condition for the infea-
sibility of (ConI) is the feasibility of the following system of constraints
inf_{x∈X} [ f(x) + λ⊤g(x) + λ̂⊤ĝ(x) ] ≥ c,
λ ≥ 0,                                    (ConII)
λ̂ ≥_{K∗} 0  [ ⇐⇒ λ̂ ∈ K∗ ]
Note: In some cases (ConI) may have no affine (i.e., polyhedral) part g(x) := Ax − b ≤ 0 and/or no “general part” ĝ(x) ≤K 0; the absence of one or both of these parts leads to self-evident modifications in (ConII). To unify our forthcoming considerations, it is convenient to assume that both of these parts are present. This assumption is for free: it is immediately seen that in our context, in the absence of one or both of the g-constraints in (ConI) we lose nothing when adding the artificial affine part g(x) := 0⊤x − 1 ≤ 0 instead of the missing affine part, and/or the artificial general part ĝ(x) := 0⊤x − 1 ≤K 0 with K = R+ instead of the missing general part. Thus, we lose nothing when assuming from the very beginning that both polyhedral and general parts are present.
It is immediately seen that Convex Theorem on Alternative (Theorem IV.20.4)
is a special case of Theorem IV.20.13 corresponding to the case when K is a
nonnegative orthant.
In the proof of Theorem IV.20.13, we will use Lemma IV.20.14, a generalization
of the Inhomogeneous Farkas Lemma. We will present the proof of this result after
the proof of Theorem IV.20.13.
Proof of Theorem IV.20.13. The first part of the statement – “if (ConII) has
a solution, then (ConI) has no solutions” – has been already verified. What we
need is to prove the reverse statement. Thus, let us assume that (ConI) has no
solutions, and let us prove that then (ConII) has a solution.
0⁰. Without loss of generality we may assume that X is full-dimensional: rint X = int X (indeed, otherwise we can replace our “universe” Rn with the affine span of X). Besides this, if needed shifting f by a constant, we can assume that c = 0. Thus, we are in the case where
f(x) < 0,
g(x) := Ax − b ≤ 0,
ĝ(x) ≤K 0,  [ ⇐⇒ ĝ(x) ∈ −K]          (ConI)
x ∈ X;
inf_{x∈X} [ f(x) + λ⊤g(x) + λ̂⊤ĝ(x) ] ≥ 0,
λ ≥ 0,                                    (ConII)
λ̂ ≥_{K∗} 0  [ ⇐⇒ λ̂ ∈ K∗ ].
2⁰. We now claim that α0 > 0. Note that the point t̄ := [t̄0; t̄1] with the components t̄0 := f(0) and t̄1 := ĝ(0) belongs to T (since 0 ∈ Y by (20.7)), thus by (20.9) it holds that α0 f(0) + α1⊤ĝ(0) ≥ 0. Assume for contradiction that α0 = 0. Then, we deduce α1⊤ĝ(0) ≥ 0, which due to −ĝ(0) ∈ int K (see (20.7))
As W is nonempty, we have inf [x;τ ]∈W [e⊤ x + βτ ] < +∞. Then, by taking into
account the definition of Q, we deduce β ≥ 0 (since otherwise the left hand side
in the preceding inequality would be +∞). With β ≥ 0 in mind and considering
the definitions of Q and W , the preceding inequality reads
sup_x { e⊤x : x ∈ X, g(x) ≤ 0 } ≤ inf_x { e⊤x + βh(x) : x ∈ X }.    (20.11)
Since (−f ) ∈ (M1 ∩ M2 )∗ , there exist ψ ∈ (M1 )∗ and ϕ ∈ (M2 )∗ such that
f = [d; δ] = −ϕ − ψ. The inclusion ϕ ∈ (M2 )∗ means that the homogeneous
linear inequality ϕ⊤ y ≥ 0 in variables y ∈ Rn+1 is a consequence of the system
of homogeneous linear inequalities given by [A, a]y ≤ 0. Hence, by Homogeneous
Farkas Lemma (Lemma I.4.1) −ϕ is a conic combination of the transposes of the
rows of the matrix [A, a], so that ϕ⊤ [x; 1] = −µ⊤ g(x) for some nonnegative µ
and all x ∈ Rn . Thus, for all x ∈ Rn , we deduce
d⊤ x + δ = [d; δ]⊤ [x; 1] = f ⊤ [x; 1] = −ϕ⊤ [x; 1] − ψ ⊤ [x; 1] = µ⊤ g(x) − ψ ⊤ [x; 1].
Finally, note that [x; 1] ∈ M1 whenever x ∈ X. Then, as ψ ∈ (M1 )∗ , we have
ψ⊤[x; 1] ≥ 0 for all x ∈ X. Thus, for all x ∈ X, we have 0 ≤ ψ⊤[x; 1] = µ⊤g(x) − [d⊤x + δ], and so µ satisfies precisely the requirements stated in the lemma.
To complete the story about Convex Theorem on Alternative, let us present
an example which demonstrates that the relaxed Slater condition is crucial for
the validity of Theorem IV.20.13.
Example IV.20.1 Consider the following special case of (ConI):
f(x) ≡ x < 0,  g(x) ≡ 0 ≤ 0,  ĝ(x) ≡ x² ≤ 0    (ConI)
(here the embedding space is R, X = R, c = 0, and K = R+ , that is, this is
just a system of scalar convex constraints). System (ConII) here is the system of
constraints
inf_{x∈R} [ x + λ·0 + λ̂x² ] ≥ 0,  λ ≥ 0,  λ̂ ≥ 0    (ConII)
on variables λ, λ̂ ∈ R.
System (ConI) clearly is infeasible. System (ConII) is infeasible as well – it is immediately seen that whenever λ and λ̂ are nonnegative, the quantity x + λ·0 + λ̂x² is negative for all small in magnitude x < 0, that is, the first inequality in (ConII) is incompatible with the remaining two inequalities of the system.
Note that in this example the only missing component of the premise in Theo-
rem IV.20.13 is the relaxed Slater condition. Let us now examine what happens
when we replace the constraint ĝ(x) ≡ x² ≤ 0 with ĝ(x) ≡ x² − 2ϵx ≤ 0, where ϵ > 0. In this case, we keep (ConI) infeasible, and gain the validity of the relaxed (relaxed, not plain!) Slater condition. Then, as all the conditions of Convex Theorem on Alternative are now met, we deduce that (ConII), which now reads
inf_x [ x + λ·0 + λ̂(x² − 2ϵx) ] ≥ 0,  λ ≥ 0,  λ̂ ≥ 0,
must be feasible. In fact, λ = 0, λ̂ = 1/(2ϵ) is a feasible solution to (ConII). ♢
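To verify the concluding claim: with λ = 0 and λ̂ = 1/(2ϵ), the quantity to be minimized in the first constraint of (ConII) becomes x + (x² − 2ϵx)/(2ϵ) = x²/(2ϵ), whose infimum over x ∈ R is 0, so all three constraints of (ConII) are indeed satisfied.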
21
Lagrange Function and Lagrange Duality
from which L(λ) originates. The aggregate function in (21.2) is called the La-
grange (or Lagrangian) function of the inequality constrained optimization pro-
gram
min {f (x) : gj (x) ≤ 0, j = 1, . . . , m, x ∈ X}
(IC)
[where f, g1 , . . . , gm are real-valued functions on X] .
The Lagrange function L(x, λ) of an optimization problem is a very important
entity as most of the optimality conditions are expressed in terms of this function.
Let us start with translating our developments from section 20.2 to the language
of the Lagrange function.
Then,
(i) [Weak Duality] For every λ ≥ 0, the infimum of the Lagrange function in
x ∈ X, that is,
L(λ) := inf_{x∈X} L(x, λ),
is a lower bound on the optimal value of (IC), so that the optimal value of the optimization problem
sup_{λ≥0} L(λ)    (IC∗)

L(x, λ) = f(x) + Σ_{j=1}^{m} λj gj(x) ≤ f(x),
where the inequality follows from the facts that λ ∈ Rm + and the feasibility of x
for (IC) implies that gj (x) ≤ 0 for all j = 1, . . . , m. Then, we immediately arrive
at
as desired.
(ii): This part is an immediate consequence of the Convex Theorem on Alter-
native. Note that the system
has no solutions in X, and by Theorem IV.20.4, the system (II) associated with
c = c∗ has a solution, i.e., there exists λ∗ ≥ 0 such that L(λ∗ ) ≥ c∗ . But, we
know from part (i) that the strict inequality here is impossible and, moreover,
that L(λ) ≤ c∗ for every λ ≥ 0. Thus, L(λ∗ ) = c∗ and λ∗ is a maximizer of L over
λ ≥ 0.
Here, the variables λ of the dual problem are called the Lagrange multipliers of
the primal problem. Theorem IV.21.1 states that the optimal value of the dual
problem is at most that of the primal, and under some favorable circumstances
(i.e., when the primal problem is convex, below bounded, and satisfies the Relaxed
Slater condition) the optimal values in this pair of problems are equal to each
other.
In our formulation there may seem to be some asymmetry between the primal
and the dual problems. In fact, both of these problems are related to the Lagrange
function in a quite symmetric way. Indeed, consider the problem
min_{x∈X} L(x), where L(x) := sup_{λ≥0} L(x, λ).
By definition of the Lagrange function L(x, λ), the function L(x) is clearly +∞ at
every point x ∈ X which is not feasible for (IC) and is f (x) on the feasible set of
(IC), so that this problem is equivalent to (IC). We see that both the primal and
the dual problems originate from the Lagrange function: in the primal problem,
we minimize over X the result of maximization of L(x, λ) in λ ≥ 0, i.e., the primal
problem is
min_{x∈X} sup_{λ∈Rm+} L(x, λ),
This then leads to the following easily demonstrable fact (do it by yourself or
look at Theorem IV.27.2).
The results from sections 21.1 and 21.2 related to convex optimization problems
in the standard MP format admit instructive extensions to the case of convex
problems in cone-constrained form. We next present these extensions.
Example IV.22.1 Recall that the positive semidefinite cone Sn+ and the notation
A ⪰ B, B ⪯ A, A ≻ B, B ≺ A for the associated non-strict and strict conic
inequalities were introduced in section D.2.2. As we know from Fact IV.20.7 and
Example II.8.9, the cone Sn+ is regular and self-dual. Recall from Lemma IV.20.9
that the function from Sn to Sn given by ĝ(x) = xx⊤ = xx = x² is ⪰-convex. As
a result, the problem
Opt(P) = min_{x=(t,y)∈R×Sn} { t : Tr(y) ≤ t, y² ⪯ B }            (22.1)
        = min_{x=(t,y)∈R×Sn} { t : ⟨y, In⟩ − t ≤ 0, y² ⪯ B },
where B is a positive definite matrix and ⟨·, ·⟩ is the Frobenius inner product, is
a convex program in cone-constrained form. ♢
Cone-constrained Lagrange dual of (P ) is the optimization problem
Note that the only nontrivial part (ii) of Theorem IV.21.1 is nothing but the
special case of Theorem IV.22.1 where K is a nonnegative orthant.
Proof of Theorem IV.22.1. This proof is immediate. Under the premise of the
theorem, c := Opt(P ) is a real, and the system of constraints (ConI) associated
with this c has no solutions. Relaxed Slater Condition along with Convex Theorem
on Alternative in cone-constrained form (Theorem IV.20.13) imply the feasibility
of (ConII), i.e., the existence of λ∗ = [λ∗; λ̂∗] ∈ Λ such that
L(λ∗) = inf_{x∈X} { f(x) + λ∗⊤g(x) + λ̂∗⊤ĝ(x) } ≥ c = Opt(P).
Thus, we deduce that (D) has a feasible solution with objective value ≥ Opt(P).
By Weak Duality, this value is exactly Opt(P ), the solution in question is optimal
for (D), and Opt(P ) = Opt(D).
Example IV.22.4 (continued from Example IV.22.1) Problem (22.1) is clearly
below bounded and satisfies Slater condition (since B ≻ 0). By Theorem IV.22.1
the dual problem (22.3) is solvable and has the same optimal value as (22.1).
The solution for the (convex!) dual problem (22.3) can be found by applying the
Fermat rule. To this end, note also that for a positive definite n × n matrix y and
h ∈ Sn it holds that
d/dt|_{t=0} (y + th)⁻¹ = −y⁻¹hy⁻¹
(why?). Then, the Fermat rule says that the optimal solution to (22.3) is
λ∗ = 1,  λ̂∗ = ½B^{−1/2},
and Opt(P) = Opt(D) = −Tr(B^{1/2}). ♢
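This conclusion is easy to check numerically. Below is a minimal sketch (assuming CVXPY with an SDP-capable solver such as SCS is installed; the random generation of B is an illustrative assumption); the constraint y² ⪯ B is passed to the solver via the Schur Complement Lemma as [B, y; y, In] ⪰ 0.

import numpy as np
import cvxpy as cp

n = 5
rng = np.random.default_rng(0)
M = rng.standard_normal((n, n))
B = M @ M.T + n * np.eye(n)              # a positive definite B

y = cp.Variable((n, n), symmetric=True)
# y^2 <= B  <=>  [[B, y], [y, I]] >= 0   (Schur Complement Lemma)
prob = cp.Problem(cp.Minimize(cp.trace(y)),
                  [cp.bmat([[B, y], [y, np.eye(n)]]) >> 0])
prob.solve(solver=cp.SCS)

print(prob.value, -np.sqrt(np.linalg.eigvalsh(B)).sum())  # the two numbers should nearly coincide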
Since L∆(x, λ) = L0(x, λ) − λ⊤δ − λ̂⊤δ̂, we deduce L∆(λ) = L0(λ) − λ⊤δ − λ̂⊤δ̂, where by definition L0(λ) = inf_{x∈X} { f(x) + λ⊤g(x) + λ̂⊤ĝ(x) }. Thus,
Opt(D∆) = max_{λ∈Λ} {L∆(λ)} = max_{λ∈Λ} { L0(λ) − λ⊤δ − λ̂⊤δ̂ }.
We have the following nice and instructive fact that provides further insights
to the optimum value sensitivity of these parametric families of problems.
The premises in (i) and (ii) definitely take place when (P0 ) satisfies the
Relaxed Slater condition and is below bounded.
Example IV.22.5 (continued from Example IV.22.1) Problem (22.1) can be
embedded into the parametric family of problems
Opt(PR) := min_{x=(t,y)∈R×Sn} { t : Tr(y) ≤ t, y² ⪯ B + R }    (P[R])
with R varying through Sn . Taking into account all we have established so far
for this problem and also considering Fact IV.22.2, we arrive at
Tr((B + R)^{1/2}) = −Opt(PR) ≤ Tr(B^{1/2}) + ½Tr(B^{−1/2}R),  ∀(R ≻ −B).    (22.4)
Note that this is nothing but the Gradient inequality for the concave (see Fact
III.17.6) function Tr(X 1/2 ) : Sn+ → R, see Fact D.24. ♢
Recall that by Fact IV.20.6.i the cone dual to a regular cone also is a regular
cone. As a result, the problem (D), called the conic dual of the conic problem
(22.5), also is a conic problem. An immediate computation (utilizing the fact that
(K∗ )∗ = K for every regular cone K) shows that conic duality is symmetric.
Fact IV.22.5 Conic duality is symmetric, i.e., the conic dual to conic prob-
lem (D) is (equivalent to) conic problem (22.5).
Proof. This proof is immediate. Weak duality has already been verified. To
verify the second claim, note that by primal-dual symmetry we can assume that
the bounded problem satisfying Relaxed Slater condition is (P ). But, then the
claim in question is given by Theorem IV.22.1. Finally, if both problems satisfy
Relaxed Slater condition (and in particular are feasible), by Weak Duality, both
are bounded, and therefore solvable with equal optimal values by the preceding
claim.
Application example: S-Lemma. The S-Lemma is an extremely useful fact that has applications in optimization, engineering, and control.
Note that S-Lemma is the statement of the same flavor as Homogeneous Farkas
Lemma: the latter states that a homogeneous linear inequality b⊤ x ≥ 0 is a
consequence of a system of homogeneous linear inequalities a⊤i x ≥ 0, 1 ≤ i ≤ k,
if and only if the target inequality can be obtained from the inequalities of the
system by taking weighted sum with nonnegative weights; we could add to “taking
weighted sum” also “and adding identically true homogeneous linear inequality”
– by the simple reason that there exists only one inequality of the latter type,
0⊤ x ≥ 0. Similarly, S-Lemma says that (whenever (22.6) holds) homogeneous
quadratic inequality x⊤ Bx ≥ 0 is a consequence of (single-inequality) system of
homogeneous quadratic inequalities x⊤ Ax ≥ 0 if and only if the target inequality
can be obtained by taking weighted sum, with nonnegative weights, of inequalities
of the system (that is, by taking a nonnegative multiple of the inequality x⊤ Ax ≥
0) and adding an identically true homogeneous quadratic inequality (there are
plenty of them, these are inequalities x⊤ Cx ≥ 0 with C ⪰ 0).
Note that the possibility for the target inequality to be obtained by summing
up, with nonnegative weights, inequalities from certain system and adding an
identically true inequality is clearly a sufficient condition for the target inequal-
ity to be a consequence of the system. The actual power of Homogeneous Farkas
Lemma and S-Lemma is in the fact that this evident sufficient condition is also
necessary for the conclusion in question to be valid (in the case of linear inequal-
ities – whenever the system is finite, in the case of S-Lemma – when the system
is a single-inequality one and (22.6) takes place). The fact that in the quadratic case the system should be a single-inequality one to guarantee necessity, however unpleasant, is a must. In fact, a straightforward “quadratic version” of the Homogeneous Farkas Lemma fails, in general, to be true already when there are just two quadratic inequalities in the system. This being said, even though it is poor as compared to its linear-inequalities analog, the S-Lemma is extremely useful. . .
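As a toy illustration of the homogeneous statement: with A = Diag{1, −1} and B = Diag{2, −1}, the premise is strictly feasible (take x̄ = [1; 0]), the implication x⊤Ax ≥ 0 ⟹ x⊤Bx ≥ 0 clearly holds (x1² ≥ x2² implies 2x1² − x2² ≥ x2² ≥ 0), and indeed B − λA = Diag{1, 0} ⪰ 0 for λ = 1, exactly as the S-Lemma predicts.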
In preparation to S-Lemma, we will first prove the following weaker statement.
Lemma IV.22.8 Let A, B ∈ Sn . Suppose ∃x̄ satisfying x̄⊤ Ax̄ > 0. Then,
the implication
{X ⪰ 0, Tr(AX) ≥ 0} =⇒ Tr(BX) ≥ 0 (22.9)
holds if and only if B ⪰ λA for some λ ≥ 0.
Proof. The “if” part of this lemma is evident. To prove the “only if” part,
consider the conic problem
Opt(P ) = min {Tr(BX) : Tr(AX) ≥ 0, X ⪰ 0} (P )
X
(derive the conic dual of (P ) yourself by utilizing the fact that Sn+ is self-dual).
Note that from the premise of (22.6), we deduce that for large enough nonnegative
t, the solution X̄ := In + tx̄x̄⊤ will ensure that the Slater condition holds true.
Moreover, under the premise of (22.9) Opt(P ) is bounded from below by 0 as
well. Then, by Conic Duality Theorem, the dual (D) is solvable, implying that
B ⪰ λA for some λ ≥ 0, as required in this lemma and completing the proof.
Note that (22.7) is nothing but (22.9) with X restricted to be of rank ≤ 1.
Indeed, X ⪰ 0 is of rank ≤ 1 if and only if X = xx⊤ for some vector x, and
holds true: the homogeneous inequality in the premise of (∗∗) is strictly feasible
along with (A), so that by the homogeneous S-Lemma (∗∗) holds true if and only
if
∃λ ≥ 0 : [B, b; b⊤, β] ⪰ λ [A, a; a⊤, α].    (22.10)
Thus, if we knew that in the case of strictly feasible (A) the validity of implication
(∗) is the same as the validity of implication (∗∗), we could be sure that the first
of these implications takes place if and only if (22.10) takes place. The above ”if”
indeed is true.
Proof. Suppose the premise holds, i.e., x̄⊤ Ax̄ + 2a⊤ x̄ + α > 0 for some x̄. Based
on the discussion preceding this lemma all we need to verify is that the validity
of (∗) is exactly the same as the validity of (∗∗). Clearly, the validity of (∗∗)
implies the validity of (∗), so our task boils down to demonstrating that under
the premise of the lemma, the validity of (∗) implies the validity of (∗∗). Thus,
assume that (∗) is valid, and let us prove that (∗∗) is valid as well. All we need
to prove is that y ⊤ Ay ≥ 0 implies y ⊤ By ≥ 0. Thus, assume that y is such that
y⊤Ay ≥ 0, and let us prove that y⊤By ≥ 0 as well. Define xt := tx̄ + (1 − t)y, and consider the univariate quadratic functions qa(t) := xt⊤Axt + 2ta⊤xt + αt², qb(t) := xt⊤Bxt + 2tb⊤xt + βt². We have so far seen that
(a) for all t ̸= 0, qa (t) ≥ 0 =⇒ qb (t) ≥ 0,
(b) qa (1) > 0 and qa (0) ≥ 0,
and we would like to show that qb (0) ≥ 0. Note that qa and qb are linear or
quadratic functions of t and thus they are continuous in t. Now, consider the
following cases (draw qa in these cases!):
• If qa (0) > 0, by continuity of qa we have qa (t) > 0 for all small enough nonzero
t, and so in such a case, by (a) we also get qb (t) ≥ 0 for all small enough in
magnitude nonzero t, implying, by continuity, that qb (0) ≥ 0.
• If qa (0) = 0, the reasoning goes as follows. When t varies from 0 to 1, the linear
or quadratic function qa (t) varies from 0 to something positive. It follows that
– either qa (t) ≥ 0, 0 ≤ t ≤ 1, implying by (a) that qb (t) ≥ 0 for t ∈ (0, 1], and
so qb (0) ≥ 0 holds by continuity of qb (t) at t = 0,
– or qa (t̄) < 0 holds for some t̄ ∈ (0, 1). Assuming that this is the case, the
linear or quadratic function qa (t) is zero at t = 0, negative somewhere on
(0, 1), and positive at t = 1. Therefore, qa is quadratic, and not linear,
function of t which has exactly one root in the interval (0, 1). Let this root in
(0, 1) be t1 . Recall that the other root of the quadratic function qa is t = 0,
thus we must have qa (t) = c(t − 0)(t − t1 ) for some c ∈ R. From t1 < 1 and
qa (1) > 0 it follows that c > 0; this, in turn, combines with t1 > 0 to imply
that qa (t) > 0 when t < 0. By (a) it follows that qb (t) ≥ 0 for t < 0, whence
by continuity qb (0) ≥ 0.
Thus, the dual is precisely the problem of maximizing under the outlined restric-
tion the right hand side of the aggregated inequality.
Since problem (22.11) satisfies the Slater condition (as B ≻ 0) and is below
bounded (why?), the dual problem is solvable and Opt(P ) = Opt(D). Moreover,
the dual problem also satisfies the Relaxed Slater condition (why?), so that both
the primal and the dual problems are solvable. ♢
23
Optimality Conditions in Convex Programming

Using our results on convex optimization duality, we next derive optimality conditions for convex programs.

Let x∗ ∈ X. Then,
(i) A sufficient condition for x∗ to be an optimal solution to (IC) is the
existence of the vector of Lagrange multipliers λ∗ ≥ 0 such that (x∗ , λ∗ ) is
a saddle point of the Lagrange function L(x, λ), i.e., a point where L(x, λ)
attains its minimum as a function of x ∈ X and attains its maximum as a
function of λ ≥ 0:
L(x, λ∗ ) ≥ L(x∗ , λ∗ ) ≥ L(x∗ , λ) ∀x ∈ X, ∀λ ≥ 0. (23.1)
(ii) Furthermore, if the problem (IC) is convex and satisfies the Relaxed
Slater condition, then the above condition is necessary for optimality of x∗ : if
x∗ is optimal for (IC), then there exists λ∗ ≥ 0 such that (x∗ , λ∗ ) is a saddle
point of the Lagrange function.
Proof. (i): Assume that for a given x∗ ∈ X there exists λ∗ ≥ 0 such that (23.1)
is satisfied, and let us prove that then x∗ is optimal for (IC). First, we claim
that x∗ is feasible. Assume for contradiction that gj (x∗ ) > 0 for some j. Then, of
course, sup_{λ≥0} L(x∗, λ) = +∞ (look what happens when all λ’s, except λj, are fixed,
Recall that for any x feasible for (IC), we have gj(x) ≤ 0 for all j. Together with λ∗ ≥ 0, we then deduce that for any x feasible for (IC), we have f(x) ≥ f(x) + Σ_{j=1}^{m} λ∗_j gj(x). But then the above relation immediately implies that x∗ is optimal.
(ii): Assume that (IC) is a convex program, x∗ is its optimal solution and
the problem satisfies the Relaxed Slater condition. We will prove that then there
exists λ∗ ≥ 0 such that (x∗ , λ∗ ) is a saddle point of the Lagrange function, i.e., that
(23.1) is satisfied. As we know from the Convex Programming Duality Theorem
(Theorem IV.21.1.ii), the dual problem (IC∗ ) has a solution λ∗ ≥ 0 and the
optimal value of the dual problem is equal to the optimal value of the primal one,
i.e., to f (x∗ ):
f(x∗) = L(λ∗) ≡ inf_{x∈X} L(x, λ∗).    (23.2)
the terms in the summation expression in the right hand side are nonpositive
(since x∗ is feasible for (IC) and λ∗ ≥ 0), and the sum itself is nonnegative
due to our inequality. Note that this is possible if and only if all the terms in the
summation expression are zero, and this is precisely the complementary slackness.
From the complementary slackness we immediately conclude that f (x∗ ) =
L(x∗, λ∗), so that (23.2) results in
L(x∗, λ∗) = f(x∗) = inf_{x∈X} L(x, λ∗).
On the other hand, since x∗ is feasible for (IC), from the definition of the La-
grangian function we deduce that L(x∗ , λ) ≤ f (x∗ ) whenever λ ≥ 0. Combining
our observations, we conclude that
L(x∗ , λ) ≤ L(x∗ , λ∗ ) ≤ L(x, λ∗ )
(D)
Suppose that (P) is bounded and satisfies the Relaxed Slater condition. Then, a point x∗ ∈ X is an optimal solution to (P) if and only if x∗ can be augmented by a properly selected λ∗ ∈ Λ := Rk+ × [K∗] to be a saddle point of the cone-constrained Lagrange function
L(x; [λ; λ̂]) := f(x) + λ⊤g(x) + λ̂⊤ĝ(x)
on X × Λ.
Proof. The proof basically repeats the one of Theorem IV.23.1. In one direction: assume that x∗ ∈ X can be augmented by λ∗ = [λ∗; λ̂∗] ∈ Λ to form a saddle point of L(x; λ) on X × Λ, and let us prove that x∗ is an optimal solution to (P). Observe, first, that from
L(x∗; λ∗) = sup_{λ∈Λ} L(x∗; [λ; λ̂]) = f(x∗) + sup_{λ∈Λ} [ λ⊤g(x∗) + λ̂⊤ĝ(x∗) ]    (23.3)
it follows that the linear form λ̂⊤ĝ(x∗) of λ̂ is bounded from above on the cone K∗, implying that −ĝ(x∗) ∈ [K∗]∗ = K. Similarly, (23.3) says that the linear form λ⊤g(x∗) of λ is bounded from above on the cone Rk+, implying that −g(x∗) belongs to the dual of this cone, that is, to Rk+. Thus, x∗ is feasible for (P).
As x∗ is feasible for (P), the right hand side of the second equality in (23.3)
is f (x∗ ), and thus (23.3) says that L(x∗ ; λ∗ ) = f (x∗ ). With this in mind, the
relation L(x; λ∗ ) ≥ L(x∗ ; λ∗ ) (which is satisfied for all x ∈ X, since (x∗ , λ∗ ) is
a saddle point of L) reads L(x; λ∗ ) ≥ f (x∗ ). This combines with the relation
f (x) ≥ L(x; λ∗ ) (which, due to λ∗ ∈ Λ, holds true for all x feasible for (P )) to
imply that Opt(P ) ≥ f (x∗ ). The bottom line is that x∗ is a feasible solution to
(P ) satisfying Opt(P ) ≥ f (x∗ ), thus, x∗ is an optimal solution to (P ), as claimed.
In the opposite direction: let x∗ be an optimal solution to (P ), and let us
verify that x∗ is the first component of a saddle point of L(x; λ) on X × Λ.
Indeed, (P) is a convex essentially strictly feasible cone-constrained problem; being solvable, it is below bounded. Applying the Convex Programming Duality Theorem in cone-constrained form (Theorem IV.22.1), the dual problem (D) is solvable with optimal value Opt(D) = Opt(P). Denoting by λ∗ = [λ∗; λ̂∗] an optimal solution to (D), we have
f(x∗) = Opt(P) = Opt(D) = L(λ∗) = inf_{x∈X} L(x; λ∗),    (23.4)
whence f(x∗) ≤ L(x∗; λ∗) = f(x∗) + [λ∗]⊤g(x∗) + [λ̂∗]⊤ĝ(x∗), that is, [λ∗]⊤g(x∗) + [λ̂∗]⊤ĝ(x∗) ≥ 0. Both terms in the latter sum are nonpositive (as x∗ is feasible for (P) and λ∗ ∈ Λ), while their sum is nonnegative, so that [λ∗]⊤g(x∗) = 0 and [λ̂∗]⊤ĝ(x∗) = 0. We conclude that the inequality f(x∗) ≤ L(x∗; λ∗) is in fact an equality, so that (23.4) reads inf_{x∈X} L(x; λ∗) = L(x∗; λ∗). Next, L(x∗; λ) ≤ f(x∗)
equality, so that (23.4) reads inf x∈X L(x; λ∗ ) = L(x∗ ; λ∗ ). Next, L(x∗ ; λ) ≤ f (x∗ )
for λ ∈ Λ due to feasibility of x∗ for (P ), which combines with the already
proved equality L(x∗ ; λ∗ ) = f (x∗ ) to imply that supλ∈Λ L(x∗ ; λ) = L(x∗ ; λ∗ ).
Thus, (x∗ , λ∗ ) is the desired saddle point of L.
λ∗_j gj(x∗) = 0, j = 1, . . . , m,    [complementary slackness]    (23.5)
and ∇f(x∗) + Σ_{j=1}^{m} λ∗_j ∇gj(x∗) ∈ −NX(x∗),    [KKT equation]    (23.6)
where NX(x∗) is the normal cone of X at x∗.
We are now ready to state “more explicit” optimality conditions for convex
programs based on KKT points.
Proof. (i): Suppose x∗ is a KKT point of problem (IC), and let us prove that x∗ is an optimal solution to (IC). By Theorem IV.23.1, it suffices to demonstrate that augmenting x∗ by a properly selected λ ≥ 0, we get a saddle point (x∗, λ) of the Lagrange function on X × Rm+. Let λ∗ be the Lagrange multipliers associated with x∗ according to the definition of a KKT point. We claim that (x∗, λ∗) is a saddle point of the Lagrange function. Indeed, complementary slackness says that L(x∗, λ∗) = f(x∗), while due to feasibility of x∗ we have sup_{λ≥0} Σ_{j=1}^{m} λj gj(x∗) = 0. Taken together, these observations say that L(x∗, λ∗) = sup_{λ≥0} L(x∗, λ). Moreover,
x∗ , then x∗ is a KKT point. Indeed, let x∗ and (IC) satisfy the above “if.” By
Theorem IV.23.1(ii), x∗ can be augmented by some λ∗ ≥ 0 to yield a saddle point
(x∗, λ∗) of L(x, λ) on X × Rm+. Then, the saddle point inequalities (23.1) give us
f(x∗) + Σ_{j=1}^{m} λ∗_j gj(x∗) = L(x∗, λ∗) ≥ sup_{λ≥0} L(x∗, λ) = f(x∗) + sup_{λ≥0} Σ_{j=1}^{m} λj gj(x∗).    (23.7)
Moreover, as x∗ is feasible for (IC), we have gj(x∗) ≤ 0 for all j, whence
sup_{λ≥0} Σj λj gj(x∗) = 0.
Therefore (23.7) implies that Σj λ∗_j gj(x∗) ≥ 0. This inequality, in view of λ∗_j ≥ 0
and gj (x∗ ) ≤ 0 for all j, implies that λ∗j gj (x∗ ) = 0 for all j, i.e., complementary
slackness (23.5) condition holds. The relation L(x, λ∗ ) ≥ L(x∗ , λ∗ ) for all x ∈ X,
implies that the function ϕ(x) := L(x, λ∗ ) attains its minimum on X at x = x∗ .
Note also that ϕ(x) is convex on X and differentiable at x∗ , thus, by Proposition
III.14.3, we deduce that the KKT equation (23.6) also holds.
Note that the optimality conditions stated in Theorem III.14.2 and Proposition
III.14.3 are particular cases of Theorem IV.23.4 corresponding to m = 0.
Remark IV.23.5 A standard special case of Theorem IV.23.4 that is worth
discussing explicitly is when x∗ is in the (relative) interior of X.
When x∗ ∈ int X, we have NX(x∗) = {0}, so that (23.6) reads
∇f(x∗) + Σ_{j=1}^{m} λ∗_j ∇gj(x∗) = 0.
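Here is a small worked illustration, with data chosen purely for the sake of the example: take X = R², f(x) = x1² + x2², and the single constraint g1(x) = 2 − x1 − x2 ≤ 0. For x∗ ∈ int X = R², the KKT equation reads 2x∗1 − λ∗1 = 0, 2x∗2 − λ∗1 = 0, so x∗1 = x∗2 = λ∗1/2; complementary slackness with λ∗1 > 0 forces x∗1 + x∗2 = 2, whence x∗ = [1; 1] and λ∗1 = 2. By Theorem IV.23.4 this feasible point is optimal, the optimal value being 2.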
min {f(x) : g(x) := Ax − b ≤ 0, ĝ(x) ≤K 0, x ∈ X},
where X ⊆ Rn, f : X → R, g : Rn → Rk, ĝ : X → Rν, K ⊂ Rν is a regular cone.    (ConeC)
The proof of this theorem follows verbatim the proof of Theorem IV.23.4, with Theorem IV.23.2 in the role of Theorem IV.23.1.
Application: Optimal value in parametric convex cone-constrained prob-
lem. What follows is a far-reaching extension of subgradient interpretation of
Lagrange multipliers presented in section 22.3.A. Consider a parametric family
of convex cone-constrained problems defined by a parameter p ∈ P
Opt(p) := min_{x∈X} {f(x, p) : g(x, p) ≤M 0},    (P[p])
where
(a) X ⊆ Rn and P ⊆ Rµ are nonempty convex sets,
(b) M is a regular cone in some Rν .
(c) f : X × P → R is convex, and g : X × P → Rν is M-convex. 1
1 In what follows, splitting the constraint in a cone-constrained problem into a system of scalar linear inequalities and a conic inequality does not play any role, and in order to save notation, (P[p]) uses the “single-constraint” format of a cone-constrained problem. Of course, the two-constraint format ḡ(x) := Ax − b ≤ 0, ĝ(x) ≤K 0 reduces to the single-constraint one by setting g(x) = [ḡ(x); ĝ(x)] and M = Rk+ × K.
is a subgradient of Opt(·) at p̄:
Opt(p) ≥ Opt(p̄) + [p − p̄]⊤[Fp + Gp⊤µ],  ∀p ∈ P.    (23.11)

≥ Opt(p̄) + (Fp + Gp⊤µ)⊤[p − p̄],
where the first inequality follows from µ ∈ M∗ and g(x, p) ≤M 0, the second is due to (23.12) and (23.13), the equality holds by recalling that x̄ is optimal for (P[p̄]), and the last inequality is due to (23.10). As the resulting inequality holds true for all x feasible for (P[p]), it justifies (23.11).
To complete the proof, we need to verify the convexity of Opt(·). By the relation
in (23.11), Opt(p) for p ∈ P is either a real, or +∞, as is required for a convex
function. For any p′ , p′′ ∈ P ∩ Dom(Opt(·)) and λ ∈ [0, 1], we need to check
Opt(λp′ + (1 − λ)p′′ ) ≤ λOpt(p′ ) + (1 − λ)Opt(p′′ ). (23.14)
This is immediate (cf. proof of Fact IV.22.2): given ϵ > 0, we can find x′ , x′′ ∈ X
such that
g(x′ , p′ ) ≤M 0, g(x′′ , p′′ ) ≤M 0, f (x′ , p′ ) ≤ Opt(p′ ) + ϵ, f (x′′ , p′′ ) ≤ Opt(p′′ ) + ϵ.
Setting p := λp′ + (1 − λ)p′′ , x := λx′ + (1 − λ)x′′ and invoking convexity of f
and M-convexity of g, we get
g(x, p) ≤M 0,  f(x, p) ≤ [λOpt(p′) + (1 − λ)Opt(p′′)] + ϵ.
Finally, since ϵ > 0 is arbitrary, we arrive at (23.14).
Note that the result of section 22.3.A is nothing but what Proposition IV.23.8 states in the case of f independent of p, M = Rk+ × K, p = [δ; δ̂] ∈ Rk × Rν, and g(x, p) = [g(x) − δ; ĝ(x) − δ̂].
Suppose that both problems satisfy the Relaxed Slater condition. Then, a pair of feasible solutions x∗ to (P) and λ∗ := [λ∗; λ̂∗] to (D) is optimal for the respective problems if and only if
DualityGap(x∗; λ∗) := c⊤x∗ − [−b⊤λ∗ − p⊤λ̂∗] = 0,    [Zero Duality Gap]
Remark IV.23.10 Under the premise of Theorem IV.23.9, from the feasibility of x∗ and λ∗ for their respective problems it follows that b − Ax∗ ≥ 0 and p − Px∗ ∈ K and λ∗ ≥ 0 and λ̂∗ ∈ K∗. Therefore, Complementary Slackness (which says that the sum of two inner products, each of a vector from a regular cone and a vector from the dual of this cone, and as such automatically nonnegative, is zero) is a really strong restriction. This comment is applicable to the relation [λ̂∗]⊤ĝ(x∗) = 0 in (23.8). ■
Proof of Theorem IV.23.9. By Conic Duality Theorem (Theorem IV.22.6)
we are in the case when Opt(P ) = Opt(D) ∈ R, and therefore for any x and
λ := [λ; λ̂], we have
Now, when x is feasible for (P ), the primal optimality gap c⊤ x − Opt(P ) is non-
negative and is zero if and only if x is optimal for (P). Similarly, when λ = [λ; λ̂] is feasible for (D), the dual optimality gap Opt(D) − (−b⊤λ − p⊤λ̂) is nonnegative and is zero if and only if λ is optimal for (D). We conclude that whenever x is feasible for (P), and λ is feasible for (D), the duality gap DualityGap(x; λ) (which, as we have seen, is the sum of the corresponding optimality gaps) is nonnegative
as we have seen, is the sum of the corresponding optimality gaps) is nonnegative
and is zero if and only if both these optimality gaps are zero, that is, if and only
if x is optimal for (P ), and λ is optimal for (D).
It remains to note that Complementary Slackness condition is equivalent to
Zero Duality Gap one. To this end, note that since x∗ and λ∗ are feasible for
their respective problems, we have
DualityGap(x∗; λ∗) = c⊤x∗ + b⊤λ∗ + p⊤λ̂∗
 = −[A⊤λ∗ + P⊤λ̂∗]⊤x∗ + b⊤λ∗ + p⊤λ̂∗
 = λ∗⊤[b − Ax∗] + λ̂∗⊤[p − Px∗].
Therefore, Complementary Slackness, for the solutions x∗ and λ∗ that are feasible
for the respective problems, is exactly the same as Zero Duality Gap.
Example IV.23.1 (continued from Example IV.22.1) Consider the primal-dual
pair of conic problems (22.11) and (22.12). We claim that the primal solution
y = −B^{1/2}, t = −Tr(B^{1/2}) and the dual solution λ = 1, U = ½B^{−1/2}, V = ½In, W = ½B^{1/2} are optimal for the respective problems. Indeed, it is immediately seen
that these solutions are feasible for the respective problems (to check feasibility of
the dual solution, use Schur Complement Lemma). Moreover, the objective value
of the primal solution equals to the objective value of the dual solution, and both
these quantities are equal to −Tr(B 1/2 ). Thus, the zero duality gap indeed holds
true. ♢
24
Duality in Linear and Convex Quadratic Programming
The fundamental role of the Lagrange function and Lagrange Duality in Op-
timization is clear already from the Optimality Conditions given by Theorem
IV.23.1, but this role is not restricted by this theorem only. There are several
cases when we can explicitly write down the Lagrange dual, and whenever it is
the case, we get a pair of explicitly formulated and closely related to each other
optimization programs – the primal-dual pair; analyzing the problems simulta-
neously, we get more information about their properties (and get a possibility to
solve the problems numerically in a more efficient way) than is possible when we restrict ourselves to only one problem of the pair. The detailed investigation
of Duality in “well-structured” Convex Programming deals with cone-constrained
Lagrange duality and conic problems. This being said, there are cases where al-
ready “plain” Lagrange duality is quite appropriate. Let us look at two of these
particular cases.
on X × Rm+: equalities (23.5) taken together with feasibility of x∗ state that L(x∗, λ) attains its maximum in λ ≥ 0 at λ∗, and (23.6) states that when λ is fixed at λ∗ the function L(x, λ∗) attains its minimum in x ∈ X at x = x∗.
Now consider the particular case of (IC) where X = Rn is the entire space, the
objective f is convex and everywhere differentiable and the constraints g1 , . . . , gm
are linear. In this case the Relaxed Slater Condition holds whenever there is a fea-
sible solution to (IC), and when that is the case, Theorem IV.23.4 states that the
KKT (Karush-Kuhn-Tucker) condition is necessary and sufficient for optimality
of x∗ ; as we just have explained, this is the same as to say that the necessary
and sufficient condition of optimality for x∗ is that x∗ along with certain λ∗ ≥ 0
form a saddle point of the Lagrange function. Combining these observations with
Proposition IV.21.2, we get the following simple result.
Let us look at what Proposition IV.24.1 says in the Linear Programming case, i.e., when (IC) is the problem given by
min_x { f(x) := c⊤x : gj(x) := bj − aj⊤x ≤ 0, j = 1, . . . , m }.    (P)
In order to get to the Lagrange dual, we should form the Lagrange function of
(IC) given by
L(x, λ) = f(x) + Σ_{j=1}^{m} λj gj(x) = [c − Σ_{j=1}^{m} λj aj]⊤x + Σ_{j=1}^{m} λj bj,
and minimize it in x ∈ Rn; this will give us the dual objective. In our case the minimization in x is immediate: the minimal value is equal to −∞ if c − Σ_{j=1}^{m} λj aj ≠ 0, and it is Σ_{j=1}^{m} λj bj otherwise. Hence, we see that the Lagrange dual is given by
max_λ { b⊤λ : Σ_{j=1}^{m} λj aj = c, λ ≥ 0 }.    (D)
Therefore, the Lagrange dual problem is precisely the usual LP dual to (P ), and
Proposition IV.24.1 is one of the equivalent forms of the Linear Programming
Duality Theorem (Theorem I.4.9) which we already know.
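A tiny illustration, with data chosen only for concreteness: for n = 1, m = 2, c = 1, a1 = a2 = 1, b1 = 1, b2 = 2, the primal (P) is min{x : x ≥ 1, x ≥ 2}, with optimal value 2, while the dual (D) is max{λ1 + 2λ2 : λ1 + λ2 = 1, λ ≥ 0}, also with optimal value 2, attained at λ = [0; 1]; the optimal Lagrange multipliers single out the binding constraint.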
where the objective is a strictly convex quadratic form, so that the matrix Q = Q⊤
is positive definite, i.e., x⊤ Qx > 0 whenever x ̸= 0. It is convenient to rewrite the
constraints in the vector-matrix form using the notation
g(x) = b − Ax ≤ 0, where b := [b1; . . . ; bm], A := [a1⊤; . . . ; am⊤].
In order to form the Lagrange dual to the program (P), we write down the Lagrange function
L(x, λ) = f(x) + Σ_{j=1}^{m} λj gj(x) = ½x⊤Qx + c⊤x + λ⊤(b − Ax) = ½x⊤Qx − (A⊤λ − c)⊤x + b⊤λ,
and minimize it in x. Since the function is convex and differentiable in x, the minimum, if it exists, is given by the Fermat rule
∇x L(x, λ) = 0,
which in our situation becomes
Qx = A⊤ λ − c.
Since Q is positive definite, it is nonsingular, so that the Fermat equation has a
unique solution which is the minimizer of L(·, λ). This solution is
x(λ) := Q−1 (A⊤ λ − c).
Substituting the expression of x(λ) into the expression for the Lagrange function,
we get the dual objective
L(λ) = −½(A⊤λ − c)⊤Q⁻¹(A⊤λ − c) + b⊤λ.
Thus, the dual problem is to maximize this objective over the nonnegative or-
thant. Let us rewrite this dual problem equivalently by introducing additional
variables
t := −Q−1 (A⊤ λ − c) =⇒ (A⊤ λ − c)⊤ Q−1 (A⊤ λ − c) = t⊤ Qt.
With this substitution, the dual problem becomes
max_{λ,t} { −½t⊤Qt + b⊤λ : A⊤λ + Qt = c, λ ≥ 0 }.    (D)
We see that the dual problem also turns out to be linearly constrained convex
quadratic program.
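The resulting primal-dual pair is easy to experiment with numerically. Here is a minimal sketch (assuming CVXPY and NumPy are installed; the random data generation below is an illustrative assumption) which solves both problems and compares the optimal values.

import numpy as np
import cvxpy as cp

n, m = 6, 10
rng = np.random.default_rng(1)
M = rng.standard_normal((n, n))
Q = M @ M.T + np.eye(n)                  # positive definite Q
c = rng.standard_normal(n)
A = rng.standard_normal((m, n))
b = A @ rng.standard_normal(n) - 1.0     # guarantees that Ax >= b is (strictly) feasible

# primal: min 0.5 x'Qx + c'x  s.t.  Ax >= b
x = cp.Variable(n)
primal = cp.Problem(cp.Minimize(0.5 * cp.quad_form(x, Q) + c @ x), [A @ x >= b])
primal.solve()

# dual: max -0.5 t'Qt + b'lam  s.t.  A'lam + Qt = c, lam >= 0
lam, t = cp.Variable(m, nonneg=True), cp.Variable(n)
dual = cp.Problem(cp.Maximize(-0.5 * cp.quad_form(t, Q) + b @ lam),
                  [A.T @ lam + Q @ t == c])
dual.solve()

print(primal.value, dual.value)          # should coincide up to solver tolerance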
Remark IV.24.2 Note also that a feasible quadratic program in the form of (P) with a positive definite matrix Q is automatically solvable. This relies on the following simple general fact:
Proof. (i): Proposition IV.24.1 implies that the optimal value in minimization
problem (P ) is equal to the optimal value in the maximization problem (D).
It follows that the value of the primal objective at any primal feasible solution
is ≥ the value of the dual objective at any dual feasible solution, and equality
is possible if and only if these values coincide with the optimal values in the
problems, as claimed in (i).
(ii): Let ∆(x, (λ, t)) be the difference between the primal objective value of
the primal feasible solution x and the dual objective value of the dual feasible
solution (λ, t)
∆(x, (λ, t)) := (c⊤x + ½x⊤Qx) − (b⊤λ − ½t⊤Qt)
 = (A⊤λ + Qt)⊤x + ½x⊤Qx + ½t⊤Qt − b⊤λ
 = λ⊤(Ax − b) + ½(x + t)⊤Q(x + t),
where the second equation follows since A⊤ λ + Qt = c. Whenever x is primal
feasible, we have Ax − b ≥ 0, and similarly dual feasibility of (λ, t) implies that
λ ≥ 0. Since Q is positive definite as well, we then deduce that the first and
the second terms in the above representation of ∆(x, (λ, t)) are nonnegative for
every pair (x; (λ, t)) of primal and dual feasible solutions. Thus, for such a pair
∆(x, (λ, t)) = 0 holds if and only if λ⊤ (Ax − b) = 0 and (x + t)⊤ Q(x + t) = 0. The
first of these equalities, due to λ ≥ 0 and Ax ≥ b, is equivalent to λj (Ax − b)j = 0 for all j, and the second, due to Q ≻ 0, is equivalent to t = −x.
25.2.1 Cone-monotonicity
Let us start with a new (for us) notion which will play an important role in
“calculus of cone-convexity.”
Our next example is less trivial and relies on the following well-known result, which is important in its own right.
Justification of this well-known fact requires tools (the Cauchy integral formula) which go beyond the Calculus prerequisites we take for granted in this book.
We are now ready for our next example.
Example IV.25.5 The function f(x) = x^{1/2} : Sm+ → Sm is (Sm+, Sm+)-monotone and Sm+-concave (the latter, of course, means that −f(x) is Sm+-convex).
Here is the justification. By Proposition D.26, f(x) is continuous on Sm+, so that it suffices to prove that the function possesses the announced monotonicity and concavity properties in the interior of Sm+. By Theorem IV.25.4, f(x) is infinitely many times differentiable on int Sm+. Let us compute the derivative of f(x). Given x ≻ 0 and h ∈ Sm and setting d := Df(x)[h], we have, by differentiating the identity f²(x) ≡ x,
x^{1/2} · d + d · x^{1/2} = h.    (25.1)
Rewriting this linear equation in variable d ∈ Sm in an orthonormal eigenbasis of
x we see that this equation has a unique solution. A (and therefore the) solution
to this equation is given by
d = ∫₀^∞ exp{−x^{1/2} t} h exp{−x^{1/2} t} dt.    (25.2)
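The relations (25.1)–(25.2) are easy to probe numerically; the sketch below (an added illustration with arbitrary random data) solves the Lyapunov-type equation x^{1/2}·d + d·x^{1/2} = h with scipy and compares d with a finite-difference approximation of the directional derivative of the matrix square root.

```python
# Hedged numerical check of (25.1)-(25.2): for x > 0 (in the PSD sense) and
# symmetric h, d = Df(x)[h] with f(x) = x^{1/2} solves x^{1/2} d + d x^{1/2} = h.
import numpy as np
from scipy.linalg import sqrtm, solve_sylvester

rng = np.random.default_rng(1)
m = 5
G = rng.standard_normal((m, m))
x = G @ G.T + m * np.eye(m)              # x positive definite
H = rng.standard_normal((m, m))
h = (H + H.T) / 2                        # symmetric direction

sx = np.real(sqrtm(x))
# solve_sylvester solves  a @ d + d @ b = q;  here a = b = x^{1/2}, q = h
d = solve_sylvester(sx, sx, h)

eps = 1e-6
d_fd = np.real(sqrtm(x + eps * h) - sqrtm(x - eps * h)) / (2 * eps)
print(np.max(np.abs(d - d_fd)))          # small, e.g. ~1e-8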
B. Let
• K ⊂ Rν be a closed pointed cone,
• F (y) : Dom F → Rν be a mapping with convex domain Dom F ⊆ Rn1 ×
. . . × RnK , so that an argument y = [y1 ; . . . ; yK ] of F is a block vector with
blocks yk of dimension nk , 1 ≤ k ≤ K,
• Uk ⊂ Rnk , k ≤ K, be closed pointed cones,
• fk (x) : D → Rnk , 1 ≤ k ≤ K, be mappings with common convex domain
D ⊆ Rn .
in full accordance with the formula D[√s][ds] = ds/(2√s) for the derivative of the univariate function √s. Thus, (25.2) is a (not that predictable in advance) matrix version of the formula for the derivative of the usual square root.
is (Sm+, Sm+)-monotone and Sm+-concave.
Finally, in contrast to these, the matrix exponent exp{x} : Sm → Sm possesses
no monotonicity/convexity properties unless m = 1. ♢
Example IV.25.7 Let the function f(x) : Dom f → int Sm+ be continuous on the convex domain Dom f ⊆ Rn with a nonempty interior, and assume that f is twice differentiable on int Dom f. When f is Sm+-concave, the function g(x) := (f(x))⁻¹ is Sm+-convex.
Indeed, for x ∈ Dom f and h ∈ Rn we have
Dg(x)[h] = −g(x) Df(x)[h] g(x),
D²g(x)[h, h] = 2 g(x) Df(x)[h] g(x) Df(x)[h] g(x) − g(x) D²f(x)[h, h] g(x)
             = 2 g^{1/2}(x) [g^{1/2}(x) Df(x)[h] g^{1/2}(x)]² g^{1/2}(x) − g(x) D²f(x)[h, h] g(x) ⪰ 0
(recall that D²f(x)[h, h] ⪯ 0 by Proposition IV.25.3(i) as f is Sm+-concave). ♢
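A special case worth keeping in mind is f(x) = x on int Sm+ (clearly Sm+-concave), for which the example says that X ↦ X⁻¹ is Sm+-convex on the interior of the semidefinite cone. The following short check (an added illustration with random data) verifies the resulting matrix inequality numerically.

```python
# Hedged numerical check of a special case of Example IV.25.7:
# for X, Y positive definite and lam in [0,1],
#   (lam X + (1-lam) Y)^{-1}  <=  lam X^{-1} + (1-lam) Y^{-1}   (in the PSD order).
import numpy as np

rng = np.random.default_rng(2)
m = 6
def random_pd(seed_mat):
    return seed_mat @ seed_mat.T + m * np.eye(m)

X = random_pd(rng.standard_normal((m, m)))
Y = random_pd(rng.standard_normal((m, m)))

for lam in np.linspace(0.0, 1.0, 11):
    lhs = np.linalg.inv(lam * X + (1 - lam) * Y)
    rhs = lam * np.linalg.inv(X) + (1 - lam) * np.linalg.inv(Y)
    # the difference rhs - lhs must be positive semidefinite
    assert np.linalg.eigvalsh(rhs - lhs).min() >= -1e-9
print("operator convexity of the matrix inverse confirmed on this sample")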
min_x { f(x) : gj(x) ≤ 0, j = 1, . . . , m, hi(x) = 0, i = 1, . . . , k },    (26.1)
where m, k are nonnegative integers, and the objective f and the constraints gj, hi are real-valued functions, each well-defined on its own subset of Rn. This
topic seemingly “goes beyond convexity;” however, related developments utilize
Convex Analysis tools, and it would be unwise to skip it completely.
Same as in the case of convex programs, optimality conditions are aimed at
answering the following question:
Given a feasible solution x∗ to (26.1), what are necessary/sufficient con-
ditions for x∗ to be an optimal solution to the problem?
The conditions we are looking for should be verifiable: given x∗ and local informa-
tion on the objective and the constraints (i.e., their values and derivatives taken
at x∗ ) we should be able to check whether the conditions are/are not satisfied.
Beyond the convex case, no verifiable sufficient conditions for x∗ to be globally optimal (i.e., f(x∗) ≤ f(x) for every feasible x) are known. The existing
optimality conditions focus on local optimality of x∗ defined as follows:
Geometrically and informally, regularity of a solution x∗ means that the set cut out of a small enough neighborhood of x∗ by the equality versions of the constraints active at x∗ is a smooth surface.
Note that under our Default Assumption, if x∗ is a KKT point of (26.1), then the corresponding Lagrange multipliers λ∗, µ∗ are uniquely defined by x∗. Indeed, the KKT equation says that −∇f(x∗) = Σj λ∗j ∇gj(x∗) + Σi µ∗i ∇hi(x∗).
By complementary slackness, the Lagrange multipliers λ∗j for non-active at x∗
inequality constraints are zero, and the rest of λ∗j ’s taken together with µ∗i ’s form
the coefficients in a representation of −∇f (x∗ ) as a linear combination of lin-
early independent (since x∗ is regular) gradients, taken at x∗ , of the active at x∗
constraints and thus are uniquely defined.
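To see this uniqueness "in action," here is a tiny added illustration on a hypothetical toy problem: at a regular KKT point the multipliers of the active constraints are recovered by expressing −∇f(x∗) in the basis of the active constraints' gradients.

```python
# Hedged illustration: recovering the (unique) Lagrange multipliers at a
# regular KKT point of the toy problem
#   min x1^2 + x2^2   s.t.  g(x) = 1 - x1 - x2 <= 0,  h(x) = x1 - x2 = 0,
# whose solution is x* = (1/2, 1/2) with both constraints active.
import numpy as np

x_star = np.array([0.5, 0.5])
grad_f = 2 * x_star                      # grad f(x*) = (1, 1)
grad_g = np.array([-1.0, -1.0])          # grad g(x*)
grad_h = np.array([1.0, -1.0])           # grad h(x*)

# Solve -grad f = lam * grad g + mu * grad h; the gradients are linearly
# independent (regularity), so the solution is unique.
G = np.column_stack([grad_g, grad_h])
mult, *_ = np.linalg.lstsq(G, -grad_f, rcond=None)
lam, mu = mult
print(lam, mu)                           # expected: lam = 1, mu = 0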
Pay attention to the fact that by complementary slackness λ∗j > 0 only when the
constraint gj (x) ≤ 0 is active at x∗ . As a result, the linear subspaces participating
in the second order parts of these two Optimality conditions are embedded one
into another: Tn ⊆ Ts , and in general this inclusion is strict. In fact, Tn = Ts
holds if and only if all active at x∗ inequality constraints gj are associated with
positive, and not just nonnegative, Lagrange multipliers λ∗j .
in both problems are exactly the same, since these conditions are expressed in
terms of two entities:
(a) the functions of x obtained from the respective Lagrange functions by setting
λ, µ to λ∗ , µ∗ ; note that complementary slackness enforces these functions for
both problems in question to be the same;
(b) linear subspaces Tn and Ts ; these linear subspaces are defined in terms of the
taken at x∗ gradients of the active at x∗ constraints and for both problems in
question are the same as well.
The bottom line is that it suffices to verify validity of our optimality conditions
in the special case when all inequality constraints in (26.1) are active at x∗ , and
this is what we assume from now on.
26.2.2 Strategy
Let us apply the Implicit Function Theorem to the m + k functions given by
ϕj (x) := gj (x), j ≤ m, ϕm+i (x) := hi (x), i ≤ k. (26.3)
Since we are in the case when x∗ is a regular solution and all constraints of
the problem of interest are active at x∗ , the resulting p = m + k functions ϕℓ
satisfy the premise in the Implicit Function Theorem, with κ set to 2. Let X,
Y , x(·), y(·) be the entities given by the theorem as applied to our ϕℓ and x∗ .
Taking into account (26.2), substituting x = x(y) and defining ϕ(y) := f (x(y)),
we convert the original problem (26.1) into the linearly constrained Mathematical Programming problem
min_{y∈Y} { ϕ(y) : yj ≤ 0, 1 ≤ j ≤ m, ym+i = 0, 1 ≤ i ≤ k }.    (26.4)
Taking into account that y(x) and x(y) are twice continuously differentiable in-
verse to each other mappings of neighborhoods X of x∗ and Y of y∗ = y(x∗ ) = 0
onto each other, the function ϕ(y) is twice continuously differentiable in a neigh-
borhood of y∗ = 0, and x∗ is a locally optimal solution to (26.1) if and only if
y∗ = 0 is a locally optimal solution to (26.4). Moreover, by looking at the problem
(26.4), we conclude that y∗ = 0 is a regular solution to it.
Our course of actions will be as follows:
Note that when implementing our strategy we should not bother about comple-
mentary slackness, since in both problems (26.1), (26.4) all constraints are active
at x∗ , resp. y∗ , making complementary slackness trivially true.
L̂(y; λ, µ) := ϕ(y) + Σ_{j=1}^m λj yj + Σ_{i=1}^k µi ym+i
N.(i) [first order part] y∗ = 0 is a KKT point of (26.4), i.e., there exist λ∗ ∈ Rm+ and µ∗ ∈ Rk such that ∇y L̂(y; λ∗, µ∗)|_{y=0} = 0.
N.(ii) [second order part] The second order directional derivatives of the function L̂(·; λ∗, µ∗), or, which is the same under the circumstances, of the function ϕ(y), taken at the point y∗ = 0 along any direction from the linear space
Tn := {d ∈ Rn : eℓ⊤ d = 0, ∀ℓ ≤ m + k}
is positive, i.e.,
d ≠ 0 & eℓ⊤ d = 0, ∀(ℓ ≤ m : λ∗ℓ > 0) & e_{m+ℓ}⊤ d = 0, ∀ℓ ≤ k
⟹ d⊤ ∇²y L̂(y; λ∗, µ∗)|_{y=0} d = d⊤ ∇²ϕ(0) d > 0.
F := {d ∈ Rn : eℓ⊤ d ≤ 0, ∀ℓ ≤ m, e_{m+ℓ}⊤ d = 0, ∀ℓ ≤ k},
−ej⊤ d ≥ 0, ∀j ≤ m,    ±e_{m+i}⊤ d ≥ 0, ∀i ≤ k.
for some nonnegative λ∗j . The bottom line is that C.(a) is nothing but the fact
that y∗ = 0 is a KKT point of problem (26.4).
Now, let us define and examine the set
K := {d ∈ F : d⊤∇ϕ(0) = 0}.
We have
K = {d ∈ F : d⊤∇ϕ(0) = 0}
  = {d ∈ F : d⊤(Σ_{j=1}^m λ∗j ej + Σ_{i=1}^k µ∗i e_{m+i}) = 0}
  = {d ∈ Rn : ej⊤ d ≤ 0, ∀j ≤ m, e_{m+i}⊤ d = 0, ∀i ≤ k, Σ_{j=1}^m λ∗j dj + Σ_{i=1}^k µ∗i d_{m+i} = 0}
  = {d ∈ Rn : ej⊤ d ≤ 0, ∀j ≤ m, e_{m+i}⊤ d = 0, ∀i ≤ k, Σ_{j=1}^m λ∗j dj = 0}
  = {d ∈ Rn : ej⊤ d ≤ 0, ∀(j ≤ m : λ∗j = 0), ej⊤ d = 0, ∀(j ≤ m : λ∗j > 0), e_{m+i}⊤ d = 0, ∀i ≤ k}.
Note that C.(b) requires the quadratic form d⊤∇²ϕ(0)d to be nonnegative on the cone K. The bottom line is that we have justified the following:
Fact N: The condition N∗:
y∗ = 0 is a KKT point of (26.4) such that the second order directional derivatives d⊤∇²ϕ(0)d of ϕ (or, which is the same, of the Lagrange function L̂(·; λ∗, µ∗), with λ∗, µ∗ given by the KKT property of y∗) taken at y∗ = 0 along all directions from the cone K are nonnegative
where eℓ are the standard basic orths in Rn, we get functions, twice continuously differentiable in a neighborhood X of x∗, such that
ϕℓ(x) = gℓ(x) for ℓ ≤ m,  ϕℓ(x) = h_{ℓ−m}(x) for m < ℓ ≤ m + k,    ∀x ∈ X
∇x L(x; λ, µ)|_{x=x∗} = J⊤ ∇y L̂(y; λ, µ)|_{y=y∗=0},    ∀(λ ∈ Rm, µ ∈ Rk).
— in the second order part of the claim of Proposition IV.26.4, the requirement d⊤Hd > 0 for all nonzero directions d from the linear subspace
Ts = {d : d⊤∇hi(x∗) = 0, ∀i ≤ k, d⊤∇gj(x∗) = 0, ∀(j ≤ m : λ∗j > 0)}
is relaxed to d⊤Hd > 0 for all nonzero directions d from the cone K contained in Ts.
As we have explained, the rationale to spoil N+ , S+ is the desire to end up with
efficiently verifiable optimality conditions. Let us also examine the special cases
where the tight conditions are verifiable “as is.” Here are these cases (below, x∗
is a regular feasible solution to (26.1) which happens to be a KKT point of the
problem, and λ∗ is the associated vector of Lagrange multipliers for the inequality
constraints):
• [nondegeneracy] all active at x∗ inequality constraints gj are associated with positive, and not just nonnegative, Lagrange multipliers λ∗j.
where a and b are the orthoprojections of ∇gj (x∗ ) and ∇gj ′ (x∗ ) onto Ts . Note
that a and b are linearly independent due to the regularity of x∗ and the origin
of Ts . As a result, a homogeneous quadratic form d⊤ Hd is nonnegative/positive
outside of the origin on the cone K if and only if it is so everywhere on the set
{d ∈ Ts : d⊤ E d ≥ 0},  where E := ab⊤ + ba⊤,
the necessary optimality condition (e.g., the gradient of smooth objective which
we want to minimize over the entire space is nonzero), the condition usually sug-
gests a way to “improve” the iterate (say, in the example above moving along
the antigradient direction reduces the value of the objective, provided the step is
chosen properly), and the algorithm utilizes this improvement and in this sense is
guided by the optimality condition. However, in this textbook, we do not touch algorithms at all. Therefore, to illustrate the role of optimality conditions, we consider the rare situation where these conditions allow us to establish an important theoretical result, namely, the S-Lemma (Lemma IV.22.7).
Since (A) is strictly feasible, this problem (P ) is feasible, and since (B) is a
consequence of (A), we have Opt ≥ 0. Assume for a moment that the problem is
solvable, and let x∗ be its optimal solution. Note that x∗ is a regular solution to
our problem, since ∇h(x∗) = 2Ax∗ ≠ 0 due to x∗⊤Ax∗ = 1. Applying the Necessary optimality condition, we get a µ∗ such that
∇f (x∗ ) + µ∗ ∇h(x∗ ) = 0, and
d⊤ (∇2 f (x∗ ) + µ∗ ∇2 h(x∗ ))d ≥ 0, ∀(d : d⊤ ∇h(x∗ ) = 0).
That is, by defining H := B + µ∗ A, we arrive at
(a) : Hx∗ = 0, and (b) : d⊤ Hd ≥ 0, ∀(d : d⊤ Ax∗ = 0).
Thus, we conclude that x∗⊤Hx∗ = x∗⊤(B + µ∗A)x∗ = 0, that is, Opt + µ∗ = 0, implying that µ∗ ≤ 0 due to Opt ≥ 0. Next, for any g ∈ Rn, setting γ := g⊤Ax∗ and d := g − (g⊤Ax∗)x∗ = g − γx∗, we get
(c) : d⊤Ax∗ = 0
due to x∗⊤Ax∗ = 1. Consequently, for all g ∈ Rn we have
Saddle points
The notion of a saddle point admits natural interpretation in game terms. Con-
sider what is called a two-person zero-sum game where player I chooses x ∈ X
and player II chooses λ ∈ Λ; after the players have chosen their decisions, player
I pays to player II the sum L(x, λ). Of course, I is interested to minimize his pay-
ment, while II is interested to maximize his income. What is the natural notion
of the equilibrium in such a game, i.e., what are the choices (x, λ) of the players
I and II such that neither of the players is interested in varying his choice, independently of whether he knows the choice of his opponent? It is immediately seen that the equilibria are exactly the saddle points of the cost function L. Indeed, if the players' decisions are chosen to be a saddle point (x∗, λ∗) satisfying (27.1), then player I is not interested in passing from x∗ to another
choice, given that II keeps his choice λ = λ∗ fixed: the first inequality in (27.1)
shows that such a choice cannot decrease the payment of I. Similarly, player II is not interested in choosing something different from λ∗, given that I keeps his choice x = x∗, since such an action cannot increase the income of II. On the other hand, if the players' decisions (x, λ) do not form a saddle point, then either player I can decrease his payment by passing from x to another choice, given that II keeps his choice at λ (this is the case when the first inequality in (27.1) is violated),
or similarly for the player II. Thus, we conclude that equilibria are exactly the
saddle points.
The game interpretation of the notion of a saddle point motivates deep insight
into the structure of the set of saddle points. Consider the following two situations:
(A) player I makes his choice first, and player II makes his choice already
knowing the choice of player I;
(B) vice versa, player II chooses first, and player I makes his choice already
knowing the choice of player II.
In the case (A) the reasoning of player I is as follows: If I choose some x, then
player II of course will choose λ which maximizes, for my x, my payment L(x, λ),
so that I will pay the sum
L̄(x) := sup_{λ∈Λ} L(x, λ);
thus, the natural way for player I to act is to minimize L̄(x) over x ∈ X, i.e., to solve the optimization problem
Opt(P) = inf_{x∈X} L̄(x).    (P)
In the case (B), similar reasoning of player II results in his profit function given by
L̲(λ) := inf_{x∈X} L(x, λ),
and thus player II's objective becomes to maximize L̲(λ) in λ, resulting in the optimization problem
Opt(D) = sup_{λ∈Λ} L̲(λ).    (D)
Note that these two reasonings relate to two different games: the one with
priority of player II (when making his decision, player II already knows the choice
of player I) in the case of (A), and the one with similar priority of player I in
the case of (B). Therefore, we should not, generally speaking, expect that the
anticipated loss of player I in (A) is equal to the anticipated profit of player II
in (B). What can be guessed is that the anticipated loss of player I in (B) is less
than or equal to the anticipated profit of player II in (A), since the conditions
of the game (B) are better for player I than those of (A). Thus, we may guess
that independent of any structural property of the function L(x, λ), the following
inequality holds:
Opt(D) = sup_{λ∈Λ} inf_{x∈X} L(x, λ) ≤ Opt(P) = inf_{x∈X} sup_{λ∈Λ} L(x, λ).    (27.2)
This inequality indeed is true, which is seen from the following reasoning:
inf_{x∈X} L(x, λ) ≤ L(y, λ),    ∀(y ∈ X, λ ∈ Λ).
Taking the supremum over λ ∈ Λ, we see that the quantity sup_{λ∈Λ} inf_{x∈X} L(x, λ) is a lower bound for the function L̄(y) for every y ∈ X, and therefore it is a lower bound for the infimum of the latter function over y ∈ X, i.e., it is a lower bound for inf_{y∈X} sup_{λ∈Λ} L(y, λ).
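The inequality (27.2) can be strict. A quick grid-based illustration (added here; the cost function below is a standard textbook choice, not necessarily the one used elsewhere in this chapter) is L(x, λ) = (x − λ)² on X = Λ = [0, 1]: one finds sup_λ inf_x L = 0 while inf_x sup_λ L = 1/4.

```python
# Hedged illustration of (27.2) with a strict inequality:
# L(x, l) = (x - l)^2 on X = Lambda = [0, 1].
import numpy as np

grid = np.linspace(0.0, 1.0, 201)
L = (grid[:, None] - grid[None, :]) ** 2     # L[i, j] = (x_i - l_j)^2

opt_D = L.min(axis=0).max()                  # sup_l inf_x L(x, l)
opt_P = L.max(axis=1).min()                  # inf_x sup_l L(x, l)
print(opt_D, opt_P)                          # ~0.0 and ~0.25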
Now let us look at what happens when the game in question has a saddle point
(x∗ , λ∗ ), so that the following relations hold:
L(x, λ∗ ) ≥ L(x∗ , λ∗ ) ≥ L(x∗ , λ), ∀(x, λ) ∈ X × Λ. (27.3)
We claim that if it is the case, then we have the following property:
On the other hand, from (27.2) we deduce that sup_{λ∈Λ} L̲(λ) ≤ inf_{x∈X} L̄(x), which is possible if and only if all the inequalities in the chain are equalities, which is exactly what is said in property (*).
Thus, if (x∗, λ∗) is a saddle point of L, then (*) takes place. We are about to demonstrate that the converse is also true.
Proof. We have already established “half” of the theorem: if there are saddle
points of L, then their components are optimal solutions to (P), respectively, (D),
and the optimal objective values, Opt(P) and Opt(D), of these two problems are
equal to each other and to the value of L at the saddle point in question.
To complete the proof, we should demonstrate that if x∗ is an optimal solution
to (P), λ∗ is an optimal solution to (D) and the optimal objective values of these
two problems are equal to each other, then (x∗ , λ∗ ) is a saddle point of L. But,
this is also immediate as for all x ∈ X, λ ∈ Λ we have
L(x, λ∗) ≥ L̲(λ∗) = L̄(x∗) ≥ L(x∗, λ).
Here, the first inequality holds by definition of L̲, the equality holds by the assumption that the optimal objective values of the problems (P) and (D) are the same, and the last inequality follows from the definition of L̄. Hence,
L(x, λ∗ ) ≥ L(x∗ , λ) ∀x ∈ X, λ ∈ Λ.
By substituting λ = λ∗ in the right hand side of this inequality, we get L(x, λ∗) ≥ L(x∗, λ∗), and substituting x = x∗ in the left hand side of our inequality, we get L(x∗, λ∗) ≥ L(x∗, λ). Thus, (x∗, λ∗) is indeed a saddle point of L.
so that the optimal objective value of (P) is 1/4, and the optimal objective value
of (D) is 0. Hence, Theorem IV.27.2 implies that L has no saddle points. ♢
On the other hand, there are generic cases when L has a saddle point. For
example, when
L(x, λ) = f(x) + Σ_{i=1}^m λi gi(x) : X × Rm+ → R
is the Lagrange function of a solvable convex program satisfying the Slater con-
dition. Setting Λ = Rm+ , we get
L̄(x) = { f(x), if gj(x) ≤ 0, j ≤ m; +∞, otherwise } : X → R ∪ {+∞},
so that, in the notation from Theorem IV.27.2, problem (P) stemming from
L, Λ, X is (equivalent to) the problem
min_x {f(x) : gj(x) ≤ 0, j ≤ m, x ∈ X}    (∗)
underlying the Lagrange function in question, and problem (D) is the standard
Lagrange dual of (∗), cf. section 21.3. Theorem IV.23.1 states that in this case
saddle points exist, which, in particular, combines with Theorem IV.27.2 to imply
that the value of L at a saddle point is equal to the optimal value of (P ), that is,
of (∗) – a fact that was not explicitly articulated in Theorem IV.23.1 and which
we will use later.
Note that in the case in question Λ = Rm + and X are nonempty and convex,
and L is convex in x for every fixed λ ∈ Λ and is linear (and therefore concave)
in λ for every fixed x ∈ X. As we shall see in a while, these structural properties
of L –convexity in x for every fixed λ and concavity in λ for every fixed x– take
upon themselves the “main responsibility” for the fact that in this case the saddle
points exist. We state this formally as the following result.
In the proof of Theorem IV.27.3 we will use a basic fact from Analysis.
In order to prove Theorem IV.27.3, we need one more critical ingredient that is quite significant in its own right.
The interesting part of the Minimax Lemma states that if fi are convex and
continuous on a convex compact set X, then the indicated inequality is in fact
equality. You can easily verify that this is nothing but the claim that the function
M possesses a saddle point. Thus, the Minimax Lemma is in fact a particular
case of the Sion-Kakutani Theorem (i.e., Theorem IV.27.3). ■
We will first give a direct proof of this particular case of Theorem IV.27.3
stated in Lemma IV.27.5 and then use it to prove the general case given in
Theorem IV.27.3.
This is clearly a convex problem with the optimal objective value equal to
t∗ := min_{x∈X} max_{i=0,...,N} fi(x).
Note that (t, x) is a feasible solution for (S) if and only if x ∈ X and t ≥ max_{i=0,...,N} fi(x). Problem (S) clearly satisfies the Slater condition and is solvable (since X is a compact set and fi, i = 0, . . . , N, are continuous on X; therefore their maximum is also continuous on X and thus attains its minimum on the
compact set X). Let (t∗ , x∗ ) be an optimal solution to Problem (S). According
to Theorem IV.23.1, there exists λ∗ ≥ 0 such that ((t∗ , x∗ ), λ∗ ) is a saddle point
of the corresponding Lagrange function
L(t, x; λ) = t + Σ_{i=0}^N λi (fi(x) − t) = t (1 − Σ_{i=0}^N λi) + Σ_{i=0}^N λi fi(x)
on (t, x; λ) ∈ [R × X] × R^{N+1}_+, and the value of this function at ((t∗, x∗), λ∗) is equal to t∗, i.e., the optimal objective value of (S); see the discussion preceding Theorem IV.27.3.
Now, since L(t, x; λ∗) attains its minimum in (t, x) over the set R × X at (t∗, x∗), we should have
Σ_{i=0}^N λ∗i = 1
(otherwise the minimum of L in (t, x) would be −∞; look what happens when Σi λi < 1 and t → −∞, and what happens when Σi λi > 1 and t → +∞). Thus,
min_{x∈X} max_{i=0,...,N} fi(x) = t∗ = min_{t∈R, x∈X} ( t·0 + Σ_{i=0}^N λ∗i fi(x) ),
so that
min_{x∈X} max_{i=0,...,N} fi(x) = min_{x∈X} Σ_{i=0}^N λ∗i fi(x)
for some λ∗i ≥ 0, 0 ≤ i ≤ N, satisfying Σ_{i=0}^N λ∗i = 1, as desired.
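A concrete instance (added here as an illustration) makes the lemma transparent: for f0(x) = x² and f1(x) = (1 − x)² on X = [0, 1], both min_x max_i fi(x) and min_x [½f0(x) + ½f1(x)] equal 1/4, so the weights λ∗ = (½, ½) witness the claimed equality.

```python
# Hedged illustration of the Minimax Lemma with two convex functions on [0, 1]:
# f0(x) = x^2, f1(x) = (1 - x)^2, weights (1/2, 1/2).
import numpy as np

x = np.linspace(0.0, 1.0, 2001)
f0, f1 = x ** 2, (1.0 - x) ** 2

print(np.maximum(f0, f1).min())          # min_x max_i f_i(x)       ~ 0.25
print((0.5 * f0 + 0.5 * f1).min())       # min_x sum_i l*_i f_i(x)  ~ 0.25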
We are now ready to prove Theorem IV.27.3, a.k.a. the Sion-Kakutani Theorem. We already know from (27.2) (which by the way makes no structural assumptions on L, X, or Λ) that
inf_{x∈X} sup_{λ∈Λ} L(x, λ) ≥ sup_{λ∈Λ} inf_{x∈X} L(x, λ).
So, all we need is to prove the reverse inequality. Without loss of generality we can assume that Opt(D) is zero, i.e., Opt(D) = sup_{λ∈Λ} inf_{x∈X} L(x, λ) = 0 (we can achieve this by shifting the function L by a constant if needed). Then, all we need to prove is that Opt(P) cannot be positive, i.e., Opt(P) = inf_{x∈X} sup_{λ∈Λ} L(x, λ) ≤ 0.
1°. What does it mean that Opt(D) is zero? When Opt(D) = 0, the function L̲(λ) = inf_{x∈X} L(x, λ) is nonpositive for every λ ∈ Λ, or, equivalently, for every λ ∈ Λ, the function L(x, λ), which is a convex and continuous function of x ∈ X, has nonpositive minimal value over x ∈ X. Since X is compact, this
of x ∈ X, has nonpositive minimal value over x ∈ X. Since X is compact, this
minimal value is achieved, so that for any λ ∈ Λ the set
X(λ) := {x ∈ X : L(x, λ) ≤ 0}
is nonempty. Moreover, as X is convex and for every λ ∈ Λ the function L is
convex in x ∈ X, the set X(λ) is convex (it is nothing but the sublevel set of
a convex function, so Proposition III.12.6 applies here). Note also that the set
X(λ) is closed since X is closed and L(x, λ) is continuous in x ∈ X. Thus, if
Opt(D) = 0, then the set X(λ) is a nonempty convex compact set for every
λ ∈ Λ.
2°. What does it mean that Opt(P) ≤ 0? It means exactly that there is a point x ∈ X where the function L̄ is nonpositive, i.e., there exists a point x ∈ X where
L(x, λ) ≤ 0 for all λ ∈ Λ. In other words, to prove that Opt(P ) ≤ 0 is the same
as to prove that the sets X(λ), λ ∈ Λ, have a point in common.
3°. With the above observations we see that the situation is as follows: we are
given a family of nonempty closed convex subsets X(λ), λ ∈ Λ, of a compact
set X, and we need to prove that these sets have a point in common. By Helly
Theorem II, in order to prove that all X(λ) have a point in common, it suffices to
prove that every (N + 1) sets of this family, where N := dim(X), have a point in
common. Let X(λ0 ), . . . , X(λN ) be N + 1 sets from our family; we should prove
that the sets have a point in common. To this end, by defining
fi (x) := L(x, λi ), i = 0, . . . , N ;
all we need to prove is that there exists a point x where all our functions are
nonpositive, or, equivalently prove that the minimax of our collection of functions
fi for i = 0, . . . , N , i.e., the quantity
α := min_{x∈X} max_{i=0,...,N} fi(x),
is nonpositive.
Note that as L is convex and continuous in x, all of the functions fi are convex
and continuous. Since X is compact and all fi are convex and continuous, we can
then apply the Minimax Lemma (i.e., Lemma IV.27.5) and deduce that α is the minimum in x ∈ X of a certain convex combination ϕ(x) := Σ_{i=0}^N νi fi(x) (where νi ≥ 0, Σi νi = 1) of the functions fi(x). Thus, we arrive at
ϕ(x) = Σ_{i=0}^N νi fi(x) = Σ_{i=0}^N νi L(x, λi) ≤ L(x, Σ_{i=0}^N νi λi),
where the last inequality follows from the concavity of L in λ (this is the only
–and crucial– point where we use this assumption). Then, we see that ϕ(·) is majorized by L(·, λ̄), where λ̄ := Σ_{i=0}^N νi λi for some convex combination weights νi, so that λ̄ ∈ Λ by convexity of Λ. Thus, the minimum of ϕ in x ∈ X –and
we already know that this minimum is exactly α– is nonpositive (recall that the
minimum of L in x is nonpositive for every λ ∈ Λ).
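A quick added illustration of the theorem just proved (toy data, not from the text): for a cost function that is convex in x and concave in λ on a product of compact convex sets, a grid computation shows the two optimal values coinciding.

```python
# Hedged grid check of the Sion-Kakutani conclusion for a convex-concave cost:
# L(x, l) = x^2 + x*l - l^2 on X = Lambda = [-1, 1].
import numpy as np

grid = np.linspace(-1.0, 1.0, 401)
X, Lm = np.meshgrid(grid, grid, indexing="ij")
L = X ** 2 + X * Lm - Lm ** 2

print(L.max(axis=1).min())   # inf_x sup_l L  ~ 0
print(L.min(axis=0).max())   # sup_l inf_x L  ~ 0  (equal, as the theorem asserts)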
may not exist), yet we can switch the order in which we take the minimum over
x and the maximum over λ, i.e., the optimum objective values of the associated
problems are still the same.
Proof. Once again, we already know from (27.2) (which by the way makes no
structural assumptions on L, X, or Λ) that
inf_{x∈X} sup_{λ∈Λ} L(x, λ) ≥ sup_{λ∈Λ} inf_{x∈X} L(x, λ).
Hence, there is nothing to prove when the right hand side in (27.4) is +∞. So,
we assume that this is not the case. Since X is compact and L is continuous in
x ∈ X, inf x∈X L(x, λ) > −∞ for every λ ∈ Λ, so that the right hand side in (27.4)
cannot be −∞, either. Then, sup_{λ∈Λ} inf_{x∈X} L(x, λ) ∈ (−∞, ∞), and thus it must be a
real number. By shifting the function L by a constant (if needed), without loss
of generality we can assume that this real number is 0, i.e.,
sup_{λ∈Λ} inf_{x∈X} L(x, λ) = 0.
All we need to prove now is that the left hand side in (27.4) is nonpositive.
Assume, on the contrary, that it is positive and thus it is strictly greater than
some number c > 0. Then, for every x ∈ X there exists λx ∈ Λ such that
L(x, λx ) > c. By continuity, there exists a neighborhood Vx of x in X (i.e., the
intersection with X of an open set containing x) such that L(x′ , λx ) ≥ c for
all x′ ∈ Vx. Since X is compact, we can find finitely many points x1, . . . , xp in X such that the union ∪_{i=1}^p Vxi is exactly X. Then, we deduce max_{1≤i≤p} L(x, λxi) ≥ c for every x ∈ X.
Now, define
Λ̄ := Conv {λx1 , . . . , λxp } .
Then, Λ̄ is compact by design and maxλ∈Λ̄ L(x, λ) ≥ c for every x ∈ X, so that
c ≤ minx∈X maxλ∈Λ̄ L(x, λ). We can now apply Theorem IV.27.3 to L and the
convex compact sets X and Λ̄ to get the equality in the following chain:
c ≤ min_{x∈X} max_{λ∈Λ̄} L(x, λ) = max_{λ∈Λ̄} min_{x∈X} L(x, λ) ≤ sup_{λ∈Λ} min_{x∈X} L(x, λ) = 0.
This implies c ≤ 0 and gives the desired contradiction (recall that c > 0).
In the case when one of the sets, say Λ, is unbounded, a slightly stronger premise
of Theorem IV.27.7 still allows us to replace (27.4) with the existence of a saddle
point.
x0 = 0,  xt+1 = A xt + B ut , t = 0, 1, . . . , N − 1,    (LDS)
pose the problem of maximizing f ⊤ xN (f is a given vector) as a conic problem on the product
of Lorentz cones, write down the conic dual of this problem, and answer the following questions:
1. Is the problem essentially strictly feasible?
2. Is the problem bounded?
3. Is the problem solvable?
4. Is the dual problem feasible?
5. Is the dual problem solvable?
6. Are the optimal values equal to each other?
7. What do optimality conditions say?
Exercise IV.2 ▲ Consider conic constraint Ax − b ∈ K where K ⊂ Rm is a regular cone and
matrix A is of full column rank (i.e., has linearly independent columns, or, which is the same,
has trivial kernel). Suppose that the constraint is feasible. Show that the following properties
are all equivalent to each other:
(i) the feasible region {x ∈ Rn : Ax − b ∈ K} is bounded;
(ii) Im(A) ∩ K = { 0 }, where Im(A) := {Ax : x ∈ Rn };
(iii) the following system of vector inequalities is solvable
A⊤ λ = 0, λ ∈ int K∗ .
Using these conclude that the property of whether a given conic optimization problem has a
bounded feasible region or not is independent of the choice of b, provided that the problem is
feasible.
Exercise IV.3 ♦ Given a cone K in a Euclidean space E with an inner product ⟨·, ·⟩, we call
a pair of elements x ∈ K and y ∈ K∗ complementary if they satisfy ⟨x, y⟩ = 0.
In this question, we will examine complementarity relations for the second-order cones Ln
and the positive semidefinite cone Sn+.
asymptote of Γ, so that the shift v = ϵ of the line v = 0 makes the intersection of the shifted line with H nonempty, however small ϵ > 0 is. The outlined shift of ℓ in our original x-coordinates reduces to passing from b = [0; 1; 0] to bϵ = [0; 1; −ϵ]. The bottom line is that b ∉ B and b ∈ cl B, since b = lim_{ϵ→+0} bϵ and bϵ ∈ B.
The result of item 3 attracts our attention to the following question: What are natural suffi-
cient conditions which guarantee the closedness of the set ARn − K ? Here is a simple answer:
4. Prove that when the only common point of the image space L := {y ∈ Rm : ∃x : y = Ax} of
A and of K is the origin, the set B := ARn − K = L − K is closed. Prove that the same holds
true when the condition L ∩ K = {0} is “heavily violated,” meaning that L ∩ int K ̸= ∅.
Exercise IV.5 ♦ [follow-up to Exercise IV.4] Let K ⊂ Rm be a regular cone, P ∈ Rm×n ,
Q ∈ Rm×k , and p ∈ Rm . Consider the set
K = {x ∈ Rn : ∃u ∈ Rk : P x + Qu + p ∈ K}
This set clearly is convex. When the cone K is polyhedral, the above description of K is its
polyhedral representation, so that the set K is polyhedral and as such is closed.
The goal of this exercise is to understand what happens with closedness of K when K is a
general-type regular cone.
1. Is it true that K is closed whenever K is a regular cone?
Hint: Look what happens when K = L3, P = I3, Q = [0; 1; 1] ∈ R3×1, and p = [0; 0; 0].
2. Prove that when K is a regular cone and Im Q ∩ K = {0}, the set K is closed.
Exercise IV.6 ♦ Let n(x) be a norm on Rn such that n is continuously differentiable outside
of the origin, and let
n∗(y) = max_x {y⊤x : n(x) ≤ 1}
be the norm conjugate to n (see Fact III.16.4), so that n∗ (·) is a norm such that
x⊤ y ≤ n(x)n∗ (y) ∀x, y ∈ Rn
and (n∗ )∗ = n, implying that for every x ̸= 0 there exists y ̸= 0 such that
x⊤ y = n(x)n∗ (y).
Here are your tasks:
1. Let M be a d × d matrix, d ≥ 2, with diagonal entries equal to 1. Assume that M λ ≤ 0 for
some nonzero vector λ ≥ 0. How large could be mini,j Mij ?
2. For d ≥ 2, let p1, . . . , pd be n∗(·)-unit vectors, w1, . . . , wd be n(·)-unit vectors, and let pi⊤wi = 1, 1 ≤ i ≤ d. Assume that 0 ∈ Conv{p1, . . . , pd}. How small could max_{i≠j} n(wi − wj) be?
3. Let x ∈ Rn be nonzero.
1. Let g = ∇n(x).
1. What is n∗ (g) ?
2. What is g ⊤ x ?
3. Let e be such that n∗ (e) ≤ n∗ (g) and e⊤ x = g ⊤ x. Is it true that e = g ?
2. Given N points yi ∈ Rn , consider the problem of finding the smallest n(·)-ball containing
y1 , . . . , y N .
1. Write down the problem as a conic one, and write down the conic dual of this problem.
Are both the problems solvable with equal optimal values?
2. Assume that the data are such that the optimal value in (P ) is equal to 1. How small
can be maxi,j n(yi − yj ) ?
Hint: write down and analyze optimality conditions.
3. In the situation of item 3.2.2, assume that n(x) = ∥x∥2 is the standard Euclidean
norm. How small can be maxi,j n(yi − yj ) now?
where K is a regular cone in certain RN . As is immediately seen, the conic dual of (P ) reduces
to the problem
Opt(D) = max_{y,z} { b⊤y + p⊤z : y ∈ K∗, A⊤y + P⊤z = c }    (D)
Assumption: The systems of linear equality constraints in (P ) and (D) are solvable.
P x = p & A⊤ y + P ⊤ z = c. (#)
(ii) Vice versa, every feasible solution ξ to (P) is of the form Ax − b for some feasible solution
x to (P ).
The bottom line is that (P ) can be reformulated equivalently as (P), and the optimal values
of these two problems are linked by the relation
Opt(P) = Opt(P ) − [y⊤b + z⊤p].
max_{y,z′,z′′} { b⊤y + [z′ − z′′]⊤p : A⊤y + P⊤[z′ − z′′] = c, y ∈ K∗, z′ ≥ 0, z′′ ≥ 0 }.
We comment that the geometric formulation of (P), (D) – "find orthogonal to each other vectors in the intersections of given affine planes with given cones2" – for the authors, who in their high-school years were taught traditional geometry, sounds like a problem from their old geometry
textbooks (modulo the fact that the dimensionality now is arbitrary, and not 2 or 3). It is
extremely surprising that in spite of its quite old-fashioned appearance, this geometric problem
happens to be responsible for an extremely wide spectrum of applications, ranging from feeding
poultry and cattle to decision making, signal and image processing, engineering design, etc., etc.
xT Ax ≥ 0 =⇒ xT Bx ≥ 0 [A, B ∈ Sn ]
is the same as the existence of λ ≥ 0 such that B ⪰ λA only under the assumption that the
inequality xT Ax ≥ 0 is strictly feasible. Does the lemma remain true when this assumption is
lifted?
Exercise IV.9 ♦ Given A ∈ Sn , consider the set QA = {x ∈ Rn : x⊤ Ax ≤ 0}.
1 Let B ∈ Sn be such that B ̸= A and QB = QA . Then, is it always true that there exists
ρ > 0 such that B = ρA?
2 Suppose that A ∈ Sn satisfies Aij ≥ 0 for all i, j. Under this condition, does your answer
to item 1 change?
3 Suppose that A ∈ Sn satisfies λmin (A) < 0 < λmax (A). Under this condition, does your
answer to item 1 change?
2 not arbitrary planes and arbitrary cones: the planes should be shifts of linear subspaces which are
orthogonal complements of each other, the cones should be duals of each other.
Exercise IV.10 ♦ For two nonzero reals a, b, one has 2|ab| = min_{λ>0}[λ⁻¹a² + λb²], implying by the Schur Complement Lemma that 2|ab| ≤ c if and only if there exists λ > 0 such that [c − λb², a; a, λ] ⪰ 0. Assuming b ≠ 0, we also have 2|ab| ≤ c if and only if there exists λ ≥ 0 such that [c − λb², a; a, λ] ⪰ 0. Note also that c ≥ 2|ab| is the same as c ≥ 2aδb for all δ ∈ [−1, 1].
Prove the following matrix analogy of the above observation3:
Let A ∈ Rp×r, B ∈ Rp×s, B ≠ 0, and let D = {∆ ∈ Rr×s : ∥∆∥ ≤ 1}, where ∥·∥ is the spectral norm. Then C ⪰ A∆B⊤ + B∆⊤A⊤ for all ∆ ∈ D if and only if there exists λ ≥ 0 such that [C − λBB⊤, A; A⊤, λIr] ⪰ 0. In particular, when a, b ∈ Rp and b ≠ 0, one has C ⪰ ±[ab⊤ + ba⊤] if and only if there exists λ ≥ 0 such that [C − λbb⊤, a; a⊤, λ] ⪰ 0.
Exercise IV.11 ♦ [Robust TTD] 4 Let us come back to TTD problem (5.2). Assume we have
solved this problem and have at our disposal the resulting nominal truss withstanding best of
all, the total truss volume being a given W > 0, the load of interest f . Now, we cannot ignore
the possibility that “in real life” the truss can be affected, aside of the load of interest f , by
perhaps small, but still nonzero, occasional load composed of forces acting at the free nodes
utilized by the nominal truss (think of railroad bridge and wind). In order for our truss to be
useful, it should withstand well all small enough occasional loads of this type. Note that our
design gives no guarantees of this type – when building the nominal truss, we took into account
just one loading scenario f .
1. To get an impression of the potential dangers of "small occasional loads," run a numerical study as
follows:
• Compute the optimal console t∗ (see “Console design” Exercise I.16)
• Looking one by one at the free nodes p1, . . . , pµ actually used by the nominal console, associate with every one of them a single-force occasional load, the corresponding force acting at the node under consideration; generate this force as a random 2D vector of Euclidean length 0.01 (that is, 1% of the magnitude of the single nonzero force in the load of interest), and compute the compliance of the nominal truss w.r.t. the resulting occasional load.
Conclude that the nominal console can be crushed by small occasional load and is therefore
completely impractical.
2. The proposed cure is, of course, to use the Robust Optimization methodology – to immunize the truss against small occasional loads, that is, to control its compliance w.r.t. the load of interest and all small occasional loads. An immediate question is where the occasional loads should be applied. There is no sense in allowing them to act at all free nodes from the original set of tentative nodes – we have all reasons to believe that some, if not most, of these nodes will not be used in the optimal truss, so that we should not bother about forces acting at these nodes. On the other hand, we should take into account occasional loads acting at the nodes actually used by the optimal robust truss, and we do not know in advance what these nodes are. A reasonable compromise here is as follows. After the nominal optimal truss is built, we can reduce the nodal set to the nodes actually used in this truss, allow for all pairwise connections of these nodes, and re-solve the TTD problem on this reduced set of tentative nodes and tentative bars, now taking into account not only the load of interest, but all small occasional loads distributed along the nodes of our new nodal set. This approach can be implemented as follows.
as follows.
3 S. Boyd, L. El Ghaoui, E. Feron, V. Balakrishnan, Linear Matrix Inequalities in System and Control
Theory – SIAM 1994.
4 Preceding exercises in the TTD series are I.16, I.18, III.9.
• We specify V as the set of virtual displacements of nodes of our reduced nodal set, pre-
serving the original status (”fixed” – ”free”) of these nodes, and denote by f the natural
projection of the load of interest on V; note that all nonzero blocks in f – those represent-
ing nonzero physical forces from the collection specifying f – are inherited by f , since the
free nodes where these nonzero forces are applied should clearly be used by the nominal
truss.
• We specify F as the “ellipsoidal envelope” of f and all small in magnitude (measured in
∥ · ∥2 -norm) loads from V. Specifically, we use f as one of the half-axes of F; the other
M − 1 half-axes of F (M = dim V) are orthogonal to each other and to f vectors from
V of ∥ · ∥2 -norm ρ∥f ∥2 , where the “uncertainty level” ρ ∈ [0, 1] is a parameter of our
construction. Note that
F = {g = P h : h⊤ h ≤ 1}
where B is the matrix built for the new TTD data in the same fashion as the matrix B was
built for the original data.
Here go your tasks:
1. Reformulate (#) as a “normal” convex optimization problem – one with efficiently com-
putable convex objective and finitely many explicitly verifiable convex constraints.
2. Solve the Console design version of the latter problem and subject the resulting robust
truss to the same tests as those proposed above for quantifying the “real-life” quality of
the nominal truss.
f (x) = c⊤ x
on the set
Vp = {x ∈ Rn : Σ_{i=1}^n |xi|^p ≤ 1};
here p, 1 < p < ∞, is a parameter. What happens with the solution when the parameter becomes
0.5?
Exercise IV.13 ▲ Every one of 3 random variables ξ1 , ξ2 , ξ3 takes values 0 and 1 with
probabilities 0.5, and every two of these 3 variables are independent. Is it true that all 3 variables
are mutually independent? If not, how large could the probability of the event ξ1 = ξ2 = ξ3 = 1 be?
Exercise IV.14 ▲ [computational study] Consider the following situation: at discrete time instants t = 1, 2, . . . , T we observe the states yt ∈ Rκ of a dynamical system; our observations are
ωt = yt + σξt , t = 1, 2, . . . , T,
where σ > 0 is a given noise intensity and ξt are independent across t zero mean Gaussian noises
with unit covariance matrix. All we know about the trajectory of the system is that
∥yt+1 − 2yt + yt−1 ∥2 ≤ dt2 α,
where dt > 0 is the continuous time interval between consecutive discrete time instants; in other
words, the Euclidean norm of the (finite-difference approximation of the) acceleration of the
system is ≤ α. Given time delay d, we want to estimate the linear form f ⊤ yT +d of the system’s
state at time T + d ≥ 1, and we intend to use a linear estimate
ŷ = Σ_{t=1}^T ht⊤ ωt .
t=1
1. Write down the optimization problem specifying the minimum risk linear estimate, with the risk of an estimate defined as
Risk[ŷ] = sup_{y∈Y} √( E{|ŷ − f⊤yT+d|²} ),
where Y is the set of all trajectories y = {yt , −∞ < t < ∞} satisfying all constraints (!).
2. Use Conic Duality to convert the problem from the previous item into a Conic Quadratic
problem.
3. Carry out numerical experimentation with minimum risk linear estimate.
Exercise IV.15 ▲ [computational study] Consider the following problem:
A particle is moving through Rd . Given positions and velocities of the particle at
times t = 0 and t = 1, find the trajectory of the particle on [0, 1] with the minimum possible (upper bound on) acceleration.
1. Formulate the (discretized in time version of the) problem as a Conic Quadratic problem
and write down its conic dual. Are the problems solvable? Are the optimal values equal to
each other? What do the optimality conditions say?
2. Run numerical experiments in 2D and 3D and look at the results.
Exercise IV.16 ▲ [computational study] The study offered to you in this Exercise is aimed at answering the following question:
A steel rod is heated at time t = 0, the magnitude of the temperature being ≤ R, and is left to cool, the temperature at the endpoints being kept at 0 at all times. We
measure the temperature of the rod at locations si and times ti > 0, 1 ≤ i ≤ m; the
measurements are affected by Gaussian noise with zero mean and covariance matrix
σ 2 Im . Given the measurements, we want to recover the distribution of temperature
of the rod at time t̄ > 0.
Building the model. With properly selected units of temperature and length (so that the
rod becomes the segment [0, 1]), evolution of the temperature u(t, s) (t ≥ 0 is time, s ∈ [0, 1] is
location) is governed by the Heat equation
∂u(t, s)/∂t = ∂²u(t, s)/∂s²    [u(t, 0) = u(t, 1) ≡ 0]
It is convenient to represent functions on [0, 1] as
f(s) = Σ_{k=1}^∞ fk ϕk(s),  ϕk(s) = √2 sin(πks).
The functions ϕk form an orthonormal basis in the space L2 = L2[0, 1] of square summable real-valued functions on [0, 1] equipped with the inner product
⟨f, g⟩ = ∫₀¹ f(s) g(s) ds,
the corresponding norm being ∥f∥2 = (∫₀¹ f²(s) ds)^{1/2}.
The claim that the functions ϕk form an orthonormal basis in L2 means that a series
Σ_{k=1}^∞ fk ϕk(s),
u(t, s) = Σ_{k=1}^∞ uk(t) ϕk(s),  uk(t) = ∫₀¹ u(t, s) ϕk(s) ds.
Σ_k u_k²(0) ≤ R²,    (28.1)
and in terms of the coefficients uk (t) of the rod’s temperature, the Heat equation becomes very
simple:
(d/dt) uk(t) = −π²k² uk(t)  =⇒  uk(t) = exp{−π²k²t} uk(0).
As a result, when t > 0, the coefficients uk(t) go to 0 exponentially fast as k → ∞, so that the series
Σ_k uk(t) ϕk(s)
converges to the solution u(t, s) of the heat equation not only in ∥·∥2, but uniformly on [0, 1] as well, implying, due to ϕk(0) = ϕk(1) = 0, that the series does satisfy the boundary conditions u(t, 0) = u(t, 1) = 0, t > 0.
Now our problem can be posed as follows:
The sequence of coefficients {u^t_k}_{k≥1} of u(t, ·) in the orthonormal basis {ϕk(·)}_{k≥1} of L2 evolves according to
u^t_k = exp{−π²k²t} u^0_k ,
with
u⁰ := {u^0_k}_{k≥1} ∈ B := {{ck}_{k≥1} : Σ_k c_k² ≤ R²}.
where ξ1, . . . , ξm are independent of each other N(0, 1) observation noises, and ti > 0, si ∈ [0, 1] are given, we want to recover the sequence {u^t̄_k}_{k≥1}.
We quantify the performance of a candidate estimate ω := (ω1, . . . , ωm) ↦ û := {ûk(ω)}_{k≥1} by the risk
Risk[û] = √( max_{u⁰∈B} Eξ { Σ_{k≥1} [ûk(Ω1[u⁰] + ξ1, . . . , Ωm[u⁰] + ξm) − exp{−π²k²t̄} u^0_k]² } ),
that is, Risk² is the worst, with respect to the distribution of temperature at time t = 0 of ∥·∥2-norm not exceeding R, expected squared norm ∥·∥²₂ of the recovery error.
Our last modeling step is to replace the infinite sequences {u^0_k}_{k≥1} with their finite initial segments {u^0_k}_{1≤k≤K}, that is, to approximate the situation by the one where u^0_k = 0 when k > K. The simplest way to do it is as follows. Let t̲ := min[min_i ti , t̄], so that t̲ > 0. For u⁰ ∈ B and K ≥ 1, the magnitude of the total contribution of the coefficients u^0_k, k > K, to u(t, s) with t ≥ t̲ does not exceed
Σ_{k=K+1}^∞ max_s |ϕk(s)| exp{−π²k²t} |u^0_k| ≤ δ := √2 R Σ_{k=K+1}^∞ exp{−π²k²t̲}.
Given a "really small" tolerance δ̄ > 0, say, δ̄ = 10⁻¹⁰, we can easily find K = K(δ̄) such that δ ≤ δ̄. Thus, as far as the temperatures we measure and the temperatures we want to recover are concerned, zeroing out the coefficients u^0_k with k > K(δ̄) changes these temperatures by at most δ̄. Common sense (which can be easily justified by formal analysis) says that with δ̄ as small as 10⁻¹⁰, these changes have no effect on the quality of our recovery, at least when σ ≫ δ̄.
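For instance, the following added sketch finds the smallest K with δ ≤ δ̄ by accumulating the tail sum √2·R·Σ_{k>K} exp{−π²k²t̲}; R and δ̄ follow the recommended setup below, while t_low is a hypothetical value of t̲ chosen only for illustration.

```python
# Hedged sketch: find K(delta_bar) such that
#   sqrt(2) * R * sum_{k > K} exp(-pi^2 k^2 t_low) <= delta_bar.
import math

R, delta_bar, t_low = 1.0e4, 1.0e-10, 1.0e-3   # t_low: hypothetical value of t_

def truncation_level(R, delta_bar, t_low, k_max=10000):
    """Smallest K with sqrt(2)*R*sum_{k=K+1}^{k_max} exp(-pi^2 k^2 t_low) <= delta_bar."""
    terms = [math.sqrt(2.0) * R * math.exp(-(math.pi * k) ** 2 * t_low)
             for k in range(1, k_max + 1)]
    tail = sum(terms)          # tail for K = 0
    K = 0
    while K < k_max and tail > delta_bar:
        tail -= terms[K]       # drop the term with index k = K + 1
        K += 1
    return K

print(truncation_level(R, delta_bar, t_low))   # roughly 60 for these values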
Now goes your task:
1. Assuming u^0_k = 0 for k > K, model the problem of interest as the following estimation problem:
"In the nature" there exists a K-dimensional signal u known to belong to the centered at the origin Euclidean ball B_R = {u ∈ RK : u⊤u ≤ R²} of a given radius R. Given noisy observations
ω = Au + σξ,    [A : m × K, ξ ∼ N(0, Im)]
we want to recover Bu, quantifying the recovery error of a candidate estimate ω ↦ û(ω) by its risk
Risk2[û] = sup_{u∈B_R} √( E_{ξ∼N(0,Im)} {[û(Au + σξ) − Bu]⊤[û(Au + σξ) − Bu]} ).
2. Build the convex optimization problem responsible for the minimum risk linear estimate – an estimate of the form û(ω) = H⊤ω.
3. Compute the minimum risk linear estimate and run simulations to test its performance.
Recommended setup:
• t̄ ∈ {0.01, 0.001, 0.0001, 0.00001}
• m = 100, ti are drawn at random from the uniform distribution on [t̄, 2t̄], si are drawn at
random from the uniform distribution on [0, 1];
• R = 104 , σ = 10, δ̄ = 10−10 ;
• To accelerate computations, truncate K(δ̄) at the level 100.
Exercise IV.17 ▲ Given positive definite A ∈ Sn , let us set
P[A] = {X ∈ Sn : X ⪰ 0, X² ⪯ A},   Q[A] = {X ∈ Sn : X ⪰ 0, X ⪯ A^{1/2}}.
From the ⪰-monotonicity of the matrix square root on Sn+ (Example IV.25.5 in section 25.2) it
follows that P [A] ⊆ Q[A]. Your task is to answer the following question:
Are P [A] and Q[A] ”comparable,” meaning that for some c independent of A (but
perhaps depending on n) one has
Q[A] ⊂ c · P [A] ?
Exercise IV.18 Find the optimal value in the convex optimization problem
Opt(a) = min_x { Σ_{i=1}^n [−(1 + ai)xi + xi ln xi] : x ≥ 0, Σ_i xi ≤ 1 },
where 0 ln 0 = 0 by definition, so that the function x ln x is well defined and continuous on the
nonnegative ray x ≥ 0.
Exercise IV.19 ♦ Given an m × n matrix A with trivial kernel, consider the matrix-valued function F(X) = [A⊤X⁻¹A]⁻¹ : Dom F := {X ∈ Sm : X ≻ 0} → Sn+. Prove that F is ⪰-concave on its domain.
meaning that
λ∗i [g(x∗)]i = 0 ∀i  &  λ̂∗ ĝ(x∗) = 0    [sdp complementary slackness]
∇x ( f(x) + [λ∗]⊤ g(x) + Tr(λ̂∗ ĝ(x)) ) |_{x=x∗} ∈ −NX(x∗)    [KKT equation]
(here, as always, NX(x) is the normal cone of X, see (14.5)), then x∗ is an optimal solution to (SDP).
(ii) If x∗ is an optimal solution to (SDP) and, in addition to the above premise, (SDP) satisfies the cone-constrained Relaxed Slater condition, then x∗ is an sdp KKT point, as defined in (i).
Exercise IV.21 ▲ [follow-up to Exercise IV.20] In the sequel, we fix the dimension n of the
embedding space and denote by EC = {x ∈ Rn : x⊤ Cx ≤ 1} the centered at the origin ellipsoid
associated with a positive definite n × n matrix C. Given a positive integer K and K ellipsoids EAk , k ≤ K,
consider two optimization problems:
— O: find the smallest volume centered at the origin ellipsoid containing ∪k≤K EAk
— I: find the largest volume centered at the origin ellipsoid contained in ∩k≤K EAk .
1. Pose O and I as solvable convex cone-constrained semidefinite programs
2. Prove that problems O and I reduce to each other at the cost of appropriate modification
of the data
3. Prove that there exist matrices Λk ⪰ 0 such that Λ := Σ_k Λk ≻ 0 and
Λk = Λk Ak Λ,  k ≤ K.
Exercise IV.22 ▲ Recall the convex cone-constrained problem from Example IV.22.1, section 22.1:
Opt(P) = min_{x=(t,y)∈R×Sn} { t : t ≥ Tr(y) [⟺ ⟨y, In⟩ − t ≤ 0], y² ⪯ B },    (22.1)
p = (In, B).
Prove that Opt(p) is a convex function of p ∈ P and find a subgradient of this function at the point p.
Exercise IV.23 ▲ [follow-up to Exercise IV.4] Given positive integers m, n, consider two
parametric families of convex sets:
• S1[P] = {(X, Y) ∈ R1 := Sm × Sn : [X, P; P⊤, Y] ⪰ 0}, where the "parameter" P runs through the space Rm×n of m × n matrices, let it be temporarily denoted P1;
• S2[P] = {(X, Y) ∈ R2 := Sm × Rm×n : [X, Y; Y⊤, P] ⪰ 0}, where the "parameter" P runs through the positive semidefinite cone Sn+, let it be temporarily denoted P2.
Prove that for χ = 1, 2 the set-valued mappings P ↦ Sχ[P] are super-additive on their domains:
P, Q ∈ Pχ =⇒ P + Q ∈ Pχ & Sχ[P] + Sχ[Q] ⊂ Sχ[P + Q].    (∗)
where A^i_j, B^i ∈ S^{qi}. In the formulation (P) as opposed to the formulation (P ) we have a bunch of positive semidefinite cone constraints, i.e., A^i x − B^i ⪰ 0, i ≤ m, instead of a single constraint
Ax − B ⪰ 0. We can always rewrite (P) in the form of (P ) by assembling A^i_j, B^i into block-diagonal matrices Aj = Diag{A^1_j, . . . , A^m_j}, B = Diag{B^1, . . . , B^m}. Taking into account that a block-diagonal symmetric matrix is positive semidefinite if and only if all the diagonal blocks are positive semidefinite, we deduce that (P) is equivalent to the problem
min_{x∈Rn} { c⊤x : Ax − b := Σ_j xj aj − b ≥ 0, Ax − B := Σ_j xj Aj − B ⪰ 0 }
of the form (P ). When proving theorems, it is usually better to work with a program in the form of (P ) – it saves notation; in contrast, when working with "real life" semidefinite programs, it is usually better to operate with problems in the more detailed form (P).
Your task is as follows:
1. Verify that the conic dual of (P) is the semidefinite program
( m
)
⊤
X i λ ∈ Rp+ , Λi ∈ Sq+i , i ≤ m
max b λ+ Tr(Λi B ) : , (D)
A⊤ λ + m ∗
P
λ,{Λi ,i≤m}
i=1 i=1 Ai Λi = c,
n
→ Sq its conjugate linear mapping
P
where for the linear mapping x 7→ j xj Aj : R
∗ q n
X 7→ A X : S → R is given by the identity
Tr(X[Ax]) ≡ [A∗ X]⊤ x ∀(x ∈ Rn , X ∈ Sq ),
or, which is the same,
A∗ X = [Tr(A1 X); . . . ; Tr(An X)].
X = {x ∈ Rm : ∃t ∈ T : x⊤Tk x ≤ tk, k ≤ K}
with the above restrictions on Tk and T. Such a set is called an ellitope, and this notion covers many interesting sets, e.g.,
• finite and bounded intersections of centered at the origin ellipsoids/elliptic cylinders; this is what you get when you take T = {[1; . . . ; 1]}. In particular, the intersection of K symmetric with respect to the origin stripes — sets of the form {x : |ak⊤x| ≤ 1}, like the unit box {x ∈ Rm : ∥x∥∞ ≤ 1} — is an ellitope, provided that the intersection is bounded: set T = {[1; . . . ; 1]} and Tk = ak ak⊤;
• ∥·∥p-balls with 2 ≤ p ≤ ∞: {x ∈ Rm : ∥x∥p ≤ 1} = {x ∈ Rm : ∃t ∈ T : x²k ≤ tk, k ≤ K = m} with T = {t ∈ Rm+ : ∥t∥p/2 ≤ 1}.
It is known that computing the maximum Opt∗ = max_{x∈X} x⊤Ax of a quadratic form over an ellitope, even as simple as the unit box, is a computationally intractable problem, even when A is restricted to be positive semidefinite. However, the difficult to compute quantity Opt∗ admits the semidefinite relaxation bound built as follows: whenever z ∈ RK+ is such that A ⪯ Σ_k zk Tk and x ∈ X, there exists t ∈ T such that x⊤Tk x ≤ tk, k ≤ K, and we have
x⊤Ax ≤ x⊤ [ Σ_k zk Tk ] x = Σ_k zk x⊤Tk x ≤ Σ_k zk tk ≤ ϕT(z),
implying that the optimal value Opt of (a) is an upper bound on Opt∗ . For reasonable T , in particular,
those in the above examples, Opt, in contrast to Opt∗ , is efficiently computable.
Now, the fact that Opt is the optimal value not only of (a), but of (b) as well, also admits a useful
interpretation. Namely, instead of maximizing x⊤ Ax over x ∈ X , let us maximize the expectation of
ξ⊤Aξ over randomized solutions ξ, that is, random vectors ξ which satisfy the constraints specifying X on average — random vectors ξ with distributions P satisfying the condition
Setting Λ = E_{ξ∼P}{ξξ⊤}, (#) implies that Λ ⪰ 0 and Tr(ΛTk) ≤ tk, k ≤ K, for some t ∈ T, that is, (Λ, t) is feasible for (b). Vice versa, if (Λ, t) is feasible for (b), then, representing Λ as the covariance matrix of a random vector ξ (which always can be done, in many ways, since Λ ⪰ 0), we get a randomized solution ξ such that (ξ, t) satisfies (#). The bottom line is that the equality in (a), (b) allows for an alternative interpretation of the semidefinite relaxation upper bound Opt on Opt∗ — as the maximal expected value of ξ⊤Aξ over all random vectors ξ belonging to X "on average." This interpretation underlies all
Opt∗ ≤ Opt ≤ 3 ln(√3K) Opt∗.
Derivation of this (in fact unimprovable, up to the absolute constants involved, without additional restrictions on A and X) bound on the quality of semidefinite relaxations, while not being too difficult, goes beyond the scope of this textbook. It should be added that in special cases better approximation bounds can be found. For example, when A ⪰ 0 and X = {x : ∥x∥∞ ≤ 1} is the unit box (or, more generally, the matrices T1, . . . , TK commute with each other), the "tightness factor" 3 ln(√3K) can be improved to π/2 ≈ 1.571 (Nesterov's π/2 Theorem), and when A is further restricted to have nonpositive off-diagonal entries and zero row sums – to 1.138 (MAXCUT Theorem of Goemans and Williamson).
Exercise IV.28 ♦ 5 What follows is the concluding exercise in the “Truss Topology Design”
series. We have already used the TTD problem to present instructive "real life" illustrations of the power of several results of Convex Analysis, specifically, the Caratheodory Theorem (Exercise I.18), the epigraph description of convexity and the Helly Theorem (Exercise III.9), and the S-Lemma (Exercise IV.11), not to speak of the Schur Complement Lemma, which was instrumental in all these exercises. Now it is time to illustrate the power of conic duality.
In the sequel, we assume that the reader is reasonably well acquainted with the Truss Topology Design story as told in Exercise I.16 and use without additional comments the notions, notation, and results presented in that Exercise, including the default assumption R, which remains in force below. In addition, we assume from now on that the load of interest f is nonzero – this is the only nontrivial case in TTD.
Recall that the TTD problem as posed in Exercise I.16.2 reads
Opt = min_{τ,t} { τ : [B Diag{t}B⊤, f ; f⊤, 2τ] ⪰ 0, t ≥ 0, Σ_i ti = W }    (P )
In our present language, this is a semidefinite program, and we know from Exercise I.16 that
this problem is solvable.
Your first task is easy:
1. Build the semidefinite dual of (P ) and prove that the dual problem is solvable with the same
optimal value Opt as the primal problem (P ).
Since passing from a semidefinite problem to its dual is a purely mechanical process, and the subsequent tasks will be formulated in terms of the dual problem, here is the dual as given by Conic Duality:
max_{V,g,θ,λ,µ} { −2f⊤g − Wµ : 2θ = 1, bi⊤V bi + λi − µ = 0 ∀i, λ ≥ 0, [V, g ; g⊤, θ] ⪰ 0 }
5 Preceding exercises in the TTD series are I.16, I.18, III.9, IV.11.
330 Exercises for Part IV
Eliminating the variable θ (which is fixed by the corresponding constraint), we rewrite the dual as
max_{V,g,λ,µ} { −2f⊤g − Wµ : bi⊤V bi + λi − µ = 0 ∀i, λ ≥ 0, [V, g ; g⊤, ½] ⪰ 0 }    (D)
What is left to you is to verify the derivation and to prove that (D) is solvable with the same
optimal value Opt as (P ).
Your next task still is easy:
2. Verify that eliminating, by partial optimization, the variables V and λ, problem (D) reduces to the problem
max_{g,µ} { −2f⊤g − Wµ : [µ, bi⊤g ; bi⊤g, ½] ⪰ 0 ∀i }    (D)
and the latter problem is solvable with the same optimal value Opt as (P ) and (D).
Pay attention to the first surprising fact: semidefinite constraints in (D) involve the cone S2+
of 2 × 2 positive semidefinite matrices, and this cone, as we know, is, up to one-to-one linear
transformation, just the Lorentz cone L3 . Thus, (D) is a conic quadratic problem.
Your next task is
3. Pass from problem (D̄) to its semidefinite dual (P̄) and prove that the latter problem is solvable with optimal value Opt.
At first glance, the task seems crazy: the dual of the dual is the primal! Note, however, that (D̄) is not the plain conic dual of (P) – that dual is problem (D); (D̄) is obtained from (D) by eliminating part of the variables, and nobody told us that this elimination keeps the dual of (D̄) equivalent to the dual of (D), that is, to (P).
For the same reasons as in item 1, we take upon ourselves writing down (P̄):
min_{s,t,q} { ½ Σᵢ sᵢ : Σᵢ tᵢ = W,  Σᵢ qᵢbᵢ = f,  [tᵢ, qᵢ ; qᵢ, sᵢ] ⪰ 0 ∀i }        (P̄)
What is left to you is to prove that (P̄) is solvable with optimal value Opt.
Now – the main surprise:
4. Verify that (P̄) allows eliminating, by partial minimization, the variables tᵢ and sᵢ, which reduces (P̄) to the solvable optimization problem
min_q { (1/(2W)) (Σᵢ |qᵢ|)² : Σᵢ qᵢbᵢ = f }        (#.1)
with the same optimal value Opt as all the preceding problems, (P) included.
This indeed is a great surprise – (#.1) is equivalent to the Linear Programming problem
G = min_q { ∥q∥₁ : Σᵢ qᵢbᵢ = f }.        (#.2)
The challenge is, of course, to extract from an optimal solution to (#.2) an optimal truss t∗ – one with total bar volume W and compliance, w.r.t. the load f, equal to Opt – and this is your final task:
5. Extract from an optimal solution to (#.2) an optimal truss.
28.5 ⋆ Cone-convexity
Exercise IV.29 ♦ [elementary properties of cone-convex functions] The goal of this exercise is to extend elementary properties of convex functions to cone-convex mappings.
A. Let X , Y be Euclidean spaces equipped with norms ∥·∥X, ∥·∥Y. Let, next, X be a closed pointed cone in X , Y be a closed pointed cone in Y, and f : X → Y be a mapping defined on a nonempty convex set X ⊂ X . Recall that for a closed and pointed cone K in a Euclidean space E and x, x′ ∈ E, the relation x ≤K x′, same as x′ ≥K x, means that x′ − x ∈ K.
Recall that f is called
• (X, Y)-monotone on X, if x, x′ ∈ X, x ≤X x′ =⇒ f(x) ≤Y f(x′);
• Y-convex on X, if
f(λx + (1 − λ)x′) ≤Y λf(x) + (1 − λ)f(x′) for all x, x′ ∈ X and λ ∈ [0, 1].
A.1. In the situation of A, let Y∗ be the cone dual to Y. For e ∈ Y, let fe (x) = ⟨e, f (x)⟩Y : X → R.
Prove that f is
— Y-convex on X if and only if the function fe is convex on X whenever e ∈ Y∗
— (X, Y)-monotone on X if and only if the function fe is X-monotone on X (i.e., x, x′ ∈
X, x ≤X x′ =⇒ fe (x) ≤ fe (x′ )) for every e ∈ Y∗ .
A.2. In the situation of A, let f be Y-convex. Prove that f is locally bounded and locally Lipschitz continuous on the interior of X, meaning that if X̄ ⊂ int X is a closed and bounded set, then there exists M < ∞ such that ∥f(x)∥Y ≤ M holds for all x ∈ X̄ (this is local boundedness) and there exists L < ∞ such that ∥f(x) − f(x′)∥Y ≤ L∥x − x′∥X holds for all x, x′ ∈ X̄ (this is local Lipschitz continuity).
B. Now let us look at elementary operations preserving cone convexity. From now on,
Lin(X , Y) denotes the linear space of linear mappings acting from Euclidean space X to Eu-
clidean space Y. Prove the following statements:
B.1. [“nonnegative linear combinations”] Let X be a nonempty convex subset of Euclidean space
X , Yj , j ≤ J, and Y be Euclidean spaces equipped with pointed closed cones Yj , Y, and
αj ∈ Lin(Yj, Y) be "nonnegative coefficients," meaning that αj yj ∈ Y whenever yj ∈ Yj. When the mappings fj(x) : X → Yj are Yj-convex, j ≤ J, their "linear combination with
coefficients αj ” – the mapping
X
f (x) = αj fj (x) : X → Y
j
– is Y-convex.
B.2. [affine substitution of variables] In the situation of A, let z 7→ Az + a : Z → X be an affine
mapping, and let f be Y-convex on X. Then, the function g(z) := f (Az + a) is Y-convex
on the set Z = {z : Az + a ∈ X}.
B.3. [monotone composition] Let Uj , j ≤ J, be Euclidean spaces equipped with closed pointed
cones Uj , let U = U1 ×. . .×UJ , U = U1 ×. . .×UJ , and let Y be an Euclidean space equipped
with closed pointed cone Y. Next, let X be nonempty convex set in Euclidean space X , U
be a nonempty convex set in U, let fj (x) : X → Uj be Uj -convex functions, j ≤ J, such
D.4. In the situation of A, let f be Y-convex on X. Let also x̄ ∈ int X and d ∈ X . Then for a properly selected g ∈ Jf(x̄) one has
Df(x̄)[d] = gd,
while for every g′ ∈ Jf(x̄) one has
Df(x̄)[d] ≥Y g′d.
D.5. In the situation of A, let f be Y-convex on X and x̄ ∈ int X. For e ∈ Y∗ , h ∈ ∂fe (x̄) (that
is, fe (y) ≥ fe (x̄) + ⟨h, y − x̄⟩X for all y ∈ X) if and only if h = g ∗ e for some g ∈ Jf (x̄).
Finally, the chain rule:
D.6. [chain rule] Let Uj , j ≤ J, be Euclidean spaces equipped with closed pointed cones Uj , let
U = U1 × . . . × UJ , U = U1 × . . . × UJ , and let Y be an Euclidean space equipped with
closed pointed cone Y. Next, let X be nonempty convex set in Euclidean space X , U be a
nonempty convex set in U, let fj (x) : X → Uj be Uj -convex on X functions, j ≤ J, such
that f (x) = [f1 (x); . . . ; fJ (x)] ∈ U whenever x ∈ X. Finally, let mapping F : U → Y be
(U, Y)-monotone and Y-convex on U . As we know from B.3, the composition
G(x) = F (f (x)) : X → Y
is Y-convex on X. Now let x̄ ∈ int X, ūj = fj (x̄) be such that ū = [ū1 ; . . . ; ūJ ] ∈ int U .
Finally, let gj ∈ Jfⱼ(x̄), j ≤ J, and g ∈ JF(ū). Then the linear mapping [u1; . . . ; uJ] ↦ g[u1; . . . ; uJ] is (U, Y)-monotone, and the linear mapping
h ↦ ĝh := g[g1h; . . . ; gJh] : X → Y
is a sub-Jacobian of G at x̄.
Exercise IV.30 The univariate function f(x) = x^{−1/2} : {x > 0} → R is nonincreasing and convex, and ∇f(x) = −x^{−3/2}/2, x > 0. Now let P be an m × n matrix with trivial kernel.
X = {x ∈ Rn : ∃u : P x + Qu − r ∈ K} (28.2)
{P0 y − r0 ≥ 0, Pi y − ri ∈ Ki, 1 ≤ i ≤ I}
  ⟺ [P0; P1; . . . ; PI]y − [r0; r1; . . . ; rI] ∈ K := R₊ × . . . × R₊ (dim r0 times) × K1 × K2 × . . . × KI,
and K ∈ K (since R₊ ∈ K and K is closed with respect to taking finite direct products).
As a result, representation of a set X as
X = {x : ∃u : P0 x + Q0 u − r0 ≥ 0, Pi x + Qi u − ri ∈ Ki , 1 ≤ i ≤ I} [Ki ∈ K] (!)
– as the projection of the solution set of a finite system of linear and K-conic inequalities in
variables x, u onto the plane of x-variables where X lives, can be straightforwardly converted
into a K-r. of X.
Important: Item 1 allows us from now on to refer to representations of the form (!) as K-representations of X, skipping the (always straightforward and purely mechanical) conversion of such a representation into the "canonical" representation (28.2).
2. A K-r. of a function straightforwardly induces K-r.'s of its sublevel sets:
{t ≥ f(x)} ⟺ {∃u : Px + tp + Qu − r ∈ K}
  =⇒ Xa := {x : f(x) ≤ a} = {x : ∃u : Px + Qu − [r − ap] ∈ K}        [a ∈ R, K ∈ K]
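As a small illustration of this rule (ours; it uses the Lorentz representation of the Euclidean norm from Exercise IV.38 below): the representation t ≥ ∥x∥₂ ⟺ [x; t] ∈ L^{n+1} is of the form Px + tp − r ∈ K with P = [In; 0], p = [0; . . . ; 0; 1], r = 0, K = L^{n+1}, so for every a ∈ R the rule gives
{x : ∥x∥₂ ≤ a} = {x : Px − [r − ap] ∈ K} = {x : [x; a] ∈ L^{n+1}},
a K-r. (with no extra variables u) of the Euclidean ball of radius a.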
X = {x ∈ Rⁿ : ∃u : P_X x + Q_X u − r_X ∈ K_X},
epi{f} = {[x; t] : ∃v : P_f x + tp_f + Q_f v − r_f ∈ K_f}        [K_X ∈ K, K_f ∈ K]
As a result, a solver S capable of solving conic problems on cones from K can be straightforwardly utilized when solving problems (∗) with X and f given by K-r.'s.
4. Given a conic problem
min_x { c⊤x : Ax − b ∈ K, Rx ≥ r },        (P)
its conic dual
max_{y,z} { ⟨b, y⟩ + r⊤z : A∗y + R⊤z = c, y ∈ K∗, z ≥ 0 }        (D)
[⟨·, ·⟩ is the inner product in the Euclidean space where K lives, K∗ is the cone dual to K, and A∗ is the conjugate of A: ⟨Ax, y⟩ ≡ x⊤A∗y ∀x, y]
also is a conic problem on a cone from K (since K is closed with respect to passing from a cone to its dual and contains the nonnegative orthants).
Note that the option mentioned in the last item of Exercise IV.31 is implemented in "CVX: MATLAB software for disciplined convex programming" due to M. Grant and S. Boyd, http://cvxr.com/cvx – a tool second to none in its scope and user-friendliness for numerical processing of well-structured convex problems, the underlying family K being the semidefinite family S. We conclude that it makes sense to develop a kind of calculus allowing one to recognize K-representability of sets/functions and to build, when possible, their K-representations. The desired calculus exists and is pretty simple, general, and fully algorithmic. The goal of the subsequent exercises is to make you acquainted with the most frequently used elements of this calculus; for more on this subject, see [BTN].
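For readers working in Python rather than MATLAB, here is a sketch of the same mechanism (an illustration of the idea only, not part of the text's toolchain): the cvxpy package accepts a problem written in terms of convex "atoms" and compiles it, behind the scenes, into a conic program on a direct product of cones before passing it to a solver.

    import numpy as np
    import cvxpy as cp

    rng = np.random.default_rng(1)
    A, b = rng.standard_normal((30, 10)), rng.standard_normal(30)

    x = cp.Variable(10)
    # a well-structured convex problem stated via atoms (norms), with no explicit cones
    prob = cp.Problem(cp.Minimize(cp.norm(A @ x - b, 2) + cp.norm(x, 1)), [x >= 0])
    print(prob.is_dcp())                      # True: the problem obeys the "disciplined" rules

    # the conic program it is converted to (here, for the SCS solver) can be inspected explicitly:
    data, chain, inverse = prob.get_problem_data(cp.SCS)
    print(data["A"].shape, data["dims"])      # constraint matrix and the cones it involves
    prob.solve()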
⋂_{i≤I} Xi = {x ∈ Rⁿ : ∃u = [u¹; . . . ; u^I] : Px + Qu − r := [P1x + Q1u¹; . . . ; PIx + QIu^I] − [r1; . . . ; rI] ∈ K := K1 × . . . × KI}, where K ∈ K.
3. The affine image Y = {y = Ax + b : x ∈ X} of a K-r. set X = {x ∈ Rⁿ : ∃u : Px + Qu − r ∈ K} is K-r.:
Y = {y : ∃[x; u] : Ax + b = y, Px + Qu − r ∈ K}
is the projection onto the y-plane of a set given by an explicit finite system of linear and K-conic inequalities and as such admits an explicit K-r. by item 1 of Exercise IV.31.
4. The inverse affine image Y = {y : Ay + b ∈ X} of a K-r. set X = {x ∈ Rⁿ : ∃u : Px + Qu − r ∈ K} is K-r.:
Y = {y : ∃u : PAy + Qu − [r − Pb] ∈ K}.
2. Direct summation:
t ≥ fᵢ(xⁱ) ⟺ ∃uⁱ : Pᵢxⁱ + tpᵢ + Qᵢuⁱ − rᵢ ∈ Kᵢ, i ≤ I
  ⇓
t ≥ f(x¹, . . . , x^I) := Σ_{i=1}^I fᵢ(xⁱ) ⟺ ∃[t1; . . . ; tI; u¹; . . . ; u^I] : t ≥ Σᵢ tᵢ, Pᵢxⁱ + tᵢpᵢ + Qᵢuⁱ − rᵢ ∈ Kᵢ, i ≤ I
In fact, claims in items 1–4 are special cases of the following observation:
5. Monotone superposition: let the functions fᵢ(x), i ≤ I, be K-r. with the first K of the functions being affine, and let F(y) : R^I → R ∪ {+∞} be K-r. and monotonically nondecreasing in y_{K+1}, . . . , y_I:
y, y′ ∈ R^I, y ≥ y′, yᵢ = y′ᵢ, i ≤ K =⇒ F(y) ≥ F(y′).
Then the function
g(x) = { F(f1(x), . . . , fI(x)), if fᵢ(x) < ∞ ∀i;  +∞, otherwise }
is K-r, specifically,
fᵢ are affine, i ≤ K,  &  [t ≥ fᵢ(x) ⟺ ∃uⁱ : Pᵢx + tpᵢ + Qᵢuⁱ − rᵢ ∈ Kᵢ], K < i ≤ I
t ≥ F(y) ⟺ ∃u : Py + tp + Qu − r ∈ K
  ⇓
t ≥ g(x) ⟺ ∃ tᵢ, 1 ≤ i ≤ I, uⁱ, K < i ≤ I, u :
  tᵢ − fᵢ(x) = 0, i ≤ K  [linear equations]
  Pᵢx + tᵢpᵢ + Qᵢuⁱ − rᵢ ∈ Kᵢ, K < i ≤ I
  P[t1; . . . ; tI] + tp + Qu − r ∈ K
implying by item 1 that cones from L admit explicit S-representations and thus that Lorentz-
representable sets and functions are semidefinite representable as well, with S-r.’s readily
given by L-r.’s.
Exercise IV.35 ♦ It is easy "to see" the nonnegative orthant Rⁿ₊ in the semidefinite cone Sⁿ₊ (indeed, x ≥ 0 if and only if Diag{x} ⪰ 0). Similar possibilities exist for the Lorentz cone Lⁿ, including the possibility to reformulate a conic problem involving direct products of Lorentz cones as a semidefinite program. Specifically,
1. Prove that x ∈ Lⁿ if and only if the following "arrow" matrix
Arrow(x) := [x_n, x_1, x_2, . . . , x_{n−1} ; x_1, x_n, 0, . . . , 0 ; x_2, 0, x_n, . . . , 0 ; . . . ; x_{n−1}, 0, . . . , 0, x_n]
(the n × n symmetric matrix with all diagonal entries equal to x_n, first row and first column [x_n, x_1, . . . , x_{n−1}], and all other entries zero) is positive semidefinite.
2. Represent Lⁿ as the image of Sⁿ₊ under a linear mapping.
This representation induces a K-r. of the support function ϕ_X(y) = sup_{x∈X} y⊤x, specifically,
t ≥ ϕ_X(y) ⟺ ∃(λ, ξ) : A⊤λ + P∗ξ + y = 0, B⊤λ + Q∗ξ = 0, c⊤λ + ⟨r, ξ⟩ + t ≥ 0, λ ≥ 0, ξ ∈ K∗,
where ⟨·, ·⟩ is the inner product in the Euclidean space where K lives and, as always, K∗ is the cone dual to K. In addition, (∗) induces a K-r. of the polar Polar(X) of X:
Polar(X) := {y : y⊤x ≤ 1 ∀x ∈ X}
  = {y : ∃(λ, ξ) : A⊤λ + P∗ξ + y = 0, B⊤λ + Q∗ξ = 0, c⊤λ + ⟨r, ξ⟩ + 1 ≥ 0, λ ≥ 0, ξ ∈ K∗}.
Exercise IV.37 ♦ Let f : Rⁿ → R ∪ {+∞} be a proper convex lower semicontinuous function given by an essentially strictly feasible K-representation:
t ≥ f(x) ⟺ ∃u : Ax + tq + Bu ≥ c, Px + tp + Qu − r ∈ K
  & ∃x, t, u : Ax + tq + Bu ≥ c, Px + tp + Qu − r ∈ int K.
Build a K-r. of the Legendre transform
f∗(y) = sup_x [ y⊤x − f(x) ]
of f.
We understand well what the "atomic" R-representable functions and sets are – these are half-spaces and affine functions. Other polyhedrally representable sets are intersections of finite families of half-spaces, and other polyhedrally representable functions are maxima of finitely many affine functions restricted to a polyhedral domain. In other words, all R-representable functions and sets are obtained from the above atoms via the calculus we have just outlined.
In the next two exercises we present instructive examples of L-r functions and sets.
Exercise IV.38 ▲ [L-representability of ∥·∥₂ and ∥·∥₂²] Check that the functions ∥x∥₂ and x⊤x on Rⁿ admit L-r.'s as follows:
{[x; t] ∈ Rⁿ × R : t ≥ ∥x∥₂} = {[x; t] ∈ Rⁿ × R : [x; t] ∈ L^{n+1}},
{[x; t] ∈ Rⁿ × R : t ≥ x⊤x} = {[x; t] ∈ Rⁿ × R : [2x; t − 1; t + 1] ∈ L^{n+2}}.
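For the reader's convenience, here is the short computation behind the second identity: [2x; t − 1; t + 1] ∈ L^{n+2} means t + 1 ≥ ∥[2x; t − 1]∥₂, i.e., t + 1 ≥ 0 and (t + 1)² ≥ 4x⊤x + (t − 1)²; since (t + 1)² − (t − 1)² = 4t, the latter pair of conditions is equivalent to t ≥ x⊤x (which automatically yields t + 1 ≥ 0).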
1 ≤ i ≤ 2ℓ , 1 ≤ ℓ < k
[2t; u1,1 − u2,1 ; u1,1 + u2,1 ] ∈ L3 .
Surprisingly, item 1 paves the road to L-representations of power functions.
2. Build explicit L-r.'s of the following univariate functions:
2.1. f(x) = max[0, x]^θ with rational θ = p/q ≥ 1.
2.2. f(x) = { x^{p₊/q₊}, x ≥ 0; |x|^{p₋/q₋}, x ≤ 0 }, where p±, q± are positive integers with p₊/q₊ ≥ 1, p₋/q₋ ≥ 1.
2.3. f(x) = { −x^{p/q}, x ≥ 0; +∞, x < 0 } with positive integers p, q such that p/q ≤ 1.
2.4. f(x) = { x^{−p/q}, x > 0; +∞, x ≤ 0 } with positive integers p, q.
3. Build L-r.'s of the following sets:
3.1. The hypograph
{[x; t] ∈ Rⁿ₊ × R : t ≤ f(x) := x_1^{π_1} x_2^{π_2} · · · x_n^{π_n}}
of an algebraic monomial of n nonnegative variables, where the πᵢ are positive rationals such that Σᵢ πᵢ ≤ 1 (for nonnegative πᵢ's the latter inequality is necessary and sufficient for f to be concave on Rⁿ₊).
3.2. The epigraph of the algebraic monomial f(x) = x_1^{−π_1} x_2^{−π_2} · · · x_n^{−π_n} of n positive variables, where the πᵢ are positive rationals.
3.3. The epigraph of ∥·∥_π on Rⁿ with rational π ≥ 1.
By Exercise IV.34, expressive abilities of semidefinite representations are at least as strong
as those of Lorentz representability. In fact, S-representability is strong enough to bring, ”for
all practical purposes,” the entire Convex Optimization within the grasp of Semidefinite Opti-
mization. In our next exercise we are just touching the tip of the “semidefinite iceberg.”
Exercise IV.40 ♦
1. For starters, build S-r’s of the maximum eigenvalue of a symmetric matrix and of the spectral
norm ∥ · ∥2,2 (the maximum singular value) of a rectangular matrix.
Hint: Note that for a p × q matrix A, the eigenvalues of the symmetric (p + q) × (p + q) matrix
[0, A ; A⊤, 0]
are the singular values of A, minus these singular values, and perhaps a number of zeros.
As a matter of fact, the single most valuable S-representation is the one for the sums Sk (X)
of k largest eigenvalues of a symmetric matrix X; convexity of these sums in X was established
in chapter 17.
2. Build S-r. of the sum Sk (X) of k ≤ m largest eigenvalues of m × m symmetric matrix X.
Hint: Recall the polyhedral representation, built in Exercise I.29, of the “vector analogy” of
Sk (X) – the sum sk (x) of k largest entries in m-dimensional vector x:
t ≥ s_k(x) ⟺ ∃z ≥ 0, s : x ≤ z + s·1, Σᵢ zᵢ + ks ≤ t,
recall that F is convex by Proposition III.17.3. Show that F(X) admits the following representation:
t ≥ F(X) ⟺ ∃u ∈ R^m : f(u) ≤ t (a),  u_1 ≥ u_2 ≥ . . . ≥ u_m (b),  S_k(X) ≤ u_1 + . . . + u_k, 1 ≤ k < m (c_k),  Tr(X) = u_1 + . . . + u_m (c_m).        (28.3)
Corollary IV.28.1 In the situation of item 3, assume that f is not just symmetric, but is S-representable as well. An S-r. of f then gives rise to an explicit S-r. of F(X).
This corollary underlies S-representations of numerous highly important functions and sets, e.g., Schatten norms of rectangular matrices – p-norms of the vector of the matrix's singular values – or the hypograph t ≤ Det^{1/m}(X) of the (appropriate power of the) determinant of X ∈ S^m₊, or the epigraph of the function Det^{−1}(X) of X ≻ 0.
Exercise IV.41 ♦ An interesting example of S-representable sets deals with matrix square
and matrix square root:
1. [⪰-epigraph of the matrix square] Prove that the function F(X) = X⊤X : R^{m×n} → Sⁿ is ⪰-convex and find an S-r. of its ⪰-epigraph {(X, Y) ∈ R^{m×n} × Sⁿ : Y ⪰ X⊤X}.
2. [⪰-hypograph of the matrix square root] Prove that the set {(X, Y ) ∈ Sn × Sn : X ⪰ 0, Y ⪯
X 1/2 } is convex and find its S-r.
Note: Solutions to items 1–2 provide us with S-r’s of the sets {(X, Y ) ∈ Sn × Sn : X ⪰ 0, 0 ⪯
X ⪯ Y 1/2 } and {(X, Y ) ∈ Sn × Sn : X ⪰ 0, X 2 ⪯ Y }. These sets are different, and the second
is “essentially smaller” than the first one, see Exercise IV.17.
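Here is a small 2 × 2 illustration (ours) of the fact that the two sets indeed differ: take X = [1, 0; 0, 0] and Y = [5, 3; 3, 2], so that Y^{1/2} = [2, 1; 1, 1]. Then Y^{1/2} − X = [1, 1; 1, 1] ⪰ 0, i.e., 0 ⪯ X ⪯ Y^{1/2}, while Y − X² = [4, 3; 3, 2] has determinant −1 < 0 and thus is not positive semidefinite; so (X, Y) belongs to the first set but not to the second.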
M(S, s, σ; P, p; λ) ⪯ 0
  ⇕
∃V ∈ S^{m+1} : [V, [P, p]⊤S^{1/2} ; S^{1/2}[P, p], I_n] ⪰ 0  &  V + [−λQ, P⊤s − λq ; s⊤P − λq⊤, 2s⊤p + σ − λκ] ⪯ 0,
and we arrive at an S-r. of the set of parameters P, p of affine mappings mapping B into Q.
2. When P, p are fixed, (!) is a linear matrix inequality in the variables S, s, σ; as a result, we get an S-r. of the set of parameters S, s, σ of quadratic forms resulting in P(B) ⊂ Q for fixed P(·).
Item 1 here allows us to pose as an explicit semidefinite program the problem, arising in numerous applications, of finding the largest volume ellipsoid contained in the intersection S of finitely many ellipsoids (or, more generally, sublevel sets of convex quadratic functions, e.g., of linear functions). To this end it suffices to specify the basic set as the unit Euclidean ball in R^m and to note that an ellipsoid ("flat" or full-dimensional) in R^m is the image of this ball under an affine transformation u ↦ Pu + p with symmetric positive semidefinite P. Item 1 says that the set of parameters (P ⪰ 0, p) of ellipsoids contained in S admits an explicit S-r. Taking into account that the volume of the ellipsoid PB + p with P ⪰ 0 is proportional to Det(P) and that the function −Det^{1/m}(P) of P ∈ S^m₊ admits an explicit S-r., see Exercise IV.40.3, we arrive at the semidefinite reformulation of the problem of interest.
Similarly, item 2 allows us to handle another problem with a wide spectrum of applications – the problem of finding the smallest volume ellipsoid containing the union of finitely many given ellipsoids. Indeed, representing these ellipsoids as the images of the unit ball in R^m under given affine mappings, and representing the ellipsoid of interest as
E = {x ∈ Rⁿ : (x − c)⊤S(x − c) ≤ 1},        (!!)
so that the fact that E contains a given ellipsoid P B + p is equivalent to the existence of λ ≥ 0 such that
M(S, s, σ(S, s); P, p; λ) = [P, p]⊤S[P, p] + [−λQ, P⊤s − λq ; s⊤P − λq⊤, 2s⊤p + s⊤S^{−1}s − 1 − λκ] ⪯ 0,
or, equivalently (by the Schur Complement Lemma), to the existence of λ ≥ 0 and µ such that
[P, p]⊤S[P, p] + [−λQ, P⊤s − λq ; s⊤P − λq⊤, 2s⊤p + µ − 1 − λκ] ⪯ 0,  [S, s ; s⊤, µ] ⪰ 0.
Thus, we can point out an explicit S-r. of the set of parameters (S ≻ 0, s) of ellipsoids containing all ellipsoids
from a given finite collection. The volume of the ellipsoid (!!) is proportional to Det^{−1/2}(S), so that the problem of interest is to maximize Det(S) over a subset of S^m₊ given by an S-r. Recalling that the function −Det^{1/m}(S) of S ⪰ 0 admits an explicit S-r., we again reduce the problem of interest to an explicit semidefinite program.
As a simple illustration, consider the inscribed ellipsoid algorithm.⁷ This algorithm is aimed at solving the optimization problem
Opt = min_{x∈Rⁿ} {f(x) : ∥x∥∞ ≤ 1},
where f is a convex continuous function on the unit box Bn = {x ∈ Rⁿ : ∥x∥∞ ≤ 1}. The algorithm is applicable when one can compute the value and a subgradient of f at any desired point x ∈ int Bn and works as follows: at the beginning of iteration i = 1, 2, . . . we have at our disposal a localizer Gi – a polytope known to belong to Bn and to contain the optimal set X∗ of the problem, with G1 = Bn. At iteration i we
• compute i-th search point xi – the center of the largest volume ellipsoid Ei contained in Gi ; we already
know that this requires solving auxiliary semidefinite problem and can be done efficiently;
• compute f(xi) and a subgradient gi of f at xi. We define the approximate solution xⁱ generated by the first i iterations as the best – with the smallest value of the objective – among the search points x1, . . . , xi, and set fⁱ = f(xⁱ);
• update the localizer by setting
Gi+1 = {x ∈ Gi : f(xi) + gi⊤(x − xi) ≤ fⁱ}.
Note that the polytope Gi+1 indeed is a localizer. Indeed, X∗ ⊂ Gi, since Gi is a localizer; the only possibility for the inclusion X∗ ⊂ Gi+1 to be violated is for some x∗ ∈ X∗ to satisfy f(xi) + gi⊤(x∗ − xi) > fⁱ, which is impossible – the left hand side in this inequality is ≤ f(x∗) due to gi ∈ ∂f(xi), and f(x∗) ≤ fⁱ = f(xⁱ).
The above recurrence is terminated when the average linear size Vol^{1/n}(Ei) of Ei becomes less than ϵ, where ϵ ∈ (0, 1/2) is a prescribed tolerance. Theory says that
• the number I of iterations before termination is bounded by O(1)·n·ln(1/ϵ), and
• the resulting approximate solution xᴵ ∈ Bn satisfies the error bound
f(xᴵ) − Opt ≤ ϵ·[max_{Bn} f − min_{Bn} f].
Moreover, the outlined algorithm possesses provably optimal (in certain precise sense not to be discussed here)
complexity.
An interested reader is highly recommended to implement and run this algorithm, preferably on a low-
dimensional problem (say, with n = 5 or n = 10) in order to make solving auxiliary problems not too time-
consuming.
Recommended setup:
• n = 5 or n = 10
• f(x) = max_{i≤1000n} |aᵢ⊤x + bᵢ| with i.i.d. N(0, 1) entries in the aᵢ and b
• ϵ = 1.e-6
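Should you implement the algorithm, the only nontrivial ingredient is the auxiliary problem of finding the largest volume ellipsoid inscribed in the current localizer. Below is a minimal Python sketch of this subproblem (ours, not part of the text): it assumes the localizer is given as the polytope {x : Cx ≤ d}, uses the standard containment reformulation ∥Pcⱼ∥₂ + cⱼ⊤xc ≤ dⱼ, and relies on the cvxpy package; the helper name is hypothetical.

    import numpy as np
    import cvxpy as cp

    def inscribed_ellipsoid(C, d):
        """Largest-volume ellipsoid {P u + xc : ||u||_2 <= 1} inside the polytope {x : C x <= d}.
        Returns the shape matrix P and the center xc (the next search point of the algorithm)."""
        m, n = C.shape
        P = cp.Variable((n, n), PSD=True)
        xc = cp.Variable(n)
        # the ellipsoid lies in {x : c_j^T x <= d_j}  iff  ||P c_j||_2 + c_j^T xc <= d_j
        cons = [cp.norm(P @ C[j, :], 2) + C[j, :] @ xc <= d[j] for j in range(m)]
        prob = cp.Problem(cp.Maximize(cp.log_det(P)), cons)   # volume is proportional to det(P)
        prob.solve()
        return P.value, xc.value

    # sanity check on the initial localizer G_1 = B_n, the unit box:
    n = 5
    C = np.vstack([np.eye(n), -np.eye(n)])
    d = np.ones(2 * n)
    P, xc = inscribed_ellipsoid(C, d)
    print(np.round(xc, 6))   # the center of the box, i.e., the first search point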
7 S.P Tarasov, L.G. Khachiyan, I.I. Erlikh, ”The method of inscribed ellipsoids”, Dokl. Akad. Nauk
SSSR, 298:5 (1988), 1081–1085; English translation: Dokl. Math., 37:1 (1988), 226–230.
29 Proofs of Facts from Part IV
Fact IV.20.6
(i) A cone K ⊆ Rn is regular if and only if its dual cone K∗ = {y ∈ Rn : y ⊤ x ≥
0, ∀x ∈ K} is regular.
(ii) Given regular cones K1 , . . . , Km , their direct product K1 × . . . × Km is also
regular.
Proof. (i): This is an immediate consequence of Fact II.8.23.
(ii): This is evident.
Fact IV.20.7
The following cones (see Examples discussed in section 1.2.4) are regular:
The premises in (i) and (ii) definitely take place when (P0 ) satisfies the Relaxed
Slater condition and is below bounded.
Proof. Under the premise of (i), by Weak duality we have for all ∆
Conic duality is symmetric, i.e., the conic dual to conic problem (D) is (equivalent
to) conic problem (P ), i.e., (22.5).
Proof. In order to apply our recipe for building the conic dual to a conic problem, let us rewrite (D) in the minimization form with ≤ 0-type affine constraints and a single conic constraint, i.e.,
−Opt(D) = min_{λ,λ̂} { b⊤λ + p⊤λ̂ : A⊤λ + P⊤λ̂ + c ≤ 0, −A⊤λ − P⊤λ̂ − c ≤ 0, −λ̂ ≤ 0, −λ ∈ −K∗ }.        (D)
so that the function L is continuous on X. This, together with the fact that X is compact, implies that the problem (P) is solvable. The same reasoning applied to the λ variables leads to the conclusion that problem (D) is solvable.
Appendix A  Prerequisites from Linear Algebra
A.1.1 A point in Rn
A point in Rn (called also an n-dimensional vector) is an ordered collection
x = [x1 ; . . . ; xn ] of n reals, called the coordinates, or components, or entries
of vector x; the space Rn itself is the set of all collections of this type. Note
that we follow MATLAB notation for representing vectors and matrices. That is,
the notation x = [x1 ; . . . ; xn ] represents a column vector, while x = [x1 , . . . , xn ]
represents a row vector.
λx = [λx1 ; . . . ; λxn ].
1. The entire Rn .
2. The trivial subspace containing the single zero vector 0 = [0; . . . ; 0] (this vec-
tor/point is called also the origin).
Here, pay attention to the notation: we use the same symbol 0 to denote the
real zero and the n-dimensional vector with all coordinates equal to zero; these
two zeros are not the same, and one should understand from the context (it is
always very easy) which zero is meant.
3. The set {x ∈ Rn : x1 = 0} of all vectors x with the first coordinate equal to
zero.
In fact, the last example in the above list admits a natural extension:
Example A.3 The set of all solutions to a homogeneous (i.e., with zero right-hand side) system of linear equations in n variables is a linear subspace of Rⁿ.
By definition, the linear span of the empty set is the trivial linear subspace, the origin, i.e., Lin(∅) = {0}.
Fact A.7 For X ⊂ Rn , Lin(X) is the smallest linear subspace which con-
tains X: if L is a linear subspace such that L ⊇ X, then L ⊇ Lin(X).
We will see in Theorem A.16 that the “linear span” example is generic as well.
That is, every linear subspace in Rn is the linear span of an appropriately chosen
finite set of vectors from Rn . Note that this latter characterization of the linear
span is of “inner” description type.
With this convention, the trivial linear subspace L = {0} also has a basis,
specifically, an empty set of vectors. This convention also is fully compatible with
our convention Lin(∅) = {0}.
We have seen that Rn admits a basis composed of n elements, i.e., the standard
basis vectors ei . From Theorem A.16, it follows that every basis of Rn contains
exactly n vectors, and the dimension of Rn is n.
We have seen that dim(Rn ) = n; according to the above convention, the trivial
linear subspace {0} of Rn admits an empty basis, so that its dimension is 0.
Since every linear subspace L of Rn satisfies {0} ⊆ L ⊆ Rn , it follows from
Theorem A.18 that the dimension of a linear subspace in Rn is an integer between
0 and n.
A.2.6 Determinant
Let A = [aij], 1 ≤ i, j ≤ n, be a square matrix. A diagonal of the matrix A is the col-
4. Det(In ) = 1.
Note: The last three properties uniquely define Det(·).
5. [multiplicativity] For two n×n matrices A, B we have Det(AB) = Det(A)Det(B).
6. An n × n matrix A is nonsingular, that is, AB = In for properly selected B,
if and only if Det(A) ̸= 0.
7. [Cramer's rule] For a nonsingular n × n matrix A, the linear system Ax = b in variables x has a unique solution, and the entries of this solution are given by
xi = Det(A, i, b)/Det(A),  1 ≤ i ≤ n,
where Det(A, i, b) is the determinant of the matrix obtained from A by replacing its i-th column with the right hand side vector b (a small worked instance is given right after this list).
8. [decomposition] Let A be an n × n matrix with n > 1. Then, for every i ∈ {1, . . . , n} we have
Det(A) = Σ_{j=1}^n aij Cij,
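A tiny worked instance of Cramer's rule (our illustration): for A = [1, 2; 3, 4] and b = [5; 6] we have Det(A) = 1·4 − 2·3 = −2, Det(A, 1, b) = Det([5, 2; 6, 4]) = 20 − 12 = 8 and Det(A, 2, b) = Det([1, 5; 3, 6]) = 6 − 15 = −9, so that x1 = 8/(−2) = −4 and x2 = (−9)/(−2) = 4.5; indeed, 1·(−4) + 2·4.5 = 5 and 3·(−4) + 4·4.5 = 6.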
A.2.7 Rank
Let A be an m × n matrix. Every matrix is associated with two linear subspaces,
namely its image space and kernel. The image space of A [notation: Im(A)] is
defined as the linear span of columns of A, i.e., the subspace of the destination
space Rm formed by the vectors that admit a representation of the form Ax for
some x ∈ Rn . The kernel (a.k.a. nullspace) of A [notation: Ker(A)] is given by
Ker(A) := {x : Ax = 0}. Note that Ker(A) is the orthogonal complement of
Im(A⊤ ).
Let Rp := {i1 < i2 < . . . < ip} be a collection of p ≥ 1 distinct indices of rows of A, and Cq := {j1 < j2 < . . . < jq} be a collection of q ≥ 1 distinct indices of columns of A. The matrix B ∈ R^{p×q} with entries Bkℓ = A_{ik,jℓ}, 1 ≤ k ≤ p, 1 ≤ ℓ ≤ q, is called the submatrix of A with row
indices Rp and column indices Cq ; this is precisely what we get from the matrix
A in the intersection of rows with indices from Rp and columns with indices from
Cq .
The rank of A (which is denoted by rank(A)) is, by definition, the largest of the row (or, which is the same, the column) sizes of nonsingular square submatrices of A. When no such submatrix exists, that is, when A is the zero matrix, the rank of A is by definition 0.
The main properties of rank are as follows:
1. rank(A) is the dimension of the image space Im A, i.e., rank(A) = dim(Im A).
Equivalently, rank(A) is the maximum of cardinalities of linearly indepen-
dent collections of columns of A. Moreover, a collection of columns of A with
rank(A) distinct indices is linearly independent if and only if its intersection
with the collection of rank(A) properly selected rows is a nonsingular subma-
trix of A.
2. rank(A) = rank(A⊤ ), so that rank(A) is the maximum of cardinalities of
linearly independent collections of rows of A. Thus, rank(A) is the codimension
of the kernel of A. That is, for a given A ∈ Rm×n
dim(Ker(A)) = n − rank(A).
A collection of rows of A with rank(A) distinct indices is linearly independent
if and only if its intersection with the collection of rank(A) properly selected
columns is a nonsingular submatrix of A.
3. Whenever the product AB of two matrices makes sense, we have
rank(AB) ≤ min {rank(A), rank(B)} .
For two matrices A, B of the same size, we have
| rank(A) − rank(B)| ≤ rank(A + B) ≤ rank(A) + rank(B).
4. An n × n matrix B is nonsingular if and only if rank(B) = n.
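A quick numerical illustration of items 1–2 (ours; it assumes numpy and an arbitrary sample matrix):

    import numpy as np

    A = np.array([[1., 2., 3.],
                  [2., 4., 6.],     # twice the first row
                  [0., 1., 1.]])
    n = A.shape[1]
    r = np.linalg.matrix_rank(A)
    print(r)                              # 2: only two linearly independent rows (or columns)
    print(n - r)                          # 1 = dim Ker(A), in accordance with dim(Ker(A)) = n - rank(A)
    print(A @ np.array([1., 1., -1.]))    # [0. 0. 0.]: a nonzero vector spanning the kernel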
Remark A.28 From Theorem A.27.iii and the Dimension formula (Theorem
A.22) it follows that for every subspace L in Rn we have
dim(L) + dim(L⊥ ) = n.
Moreover, every vector x ∈ Rn admits a unique decomposition as a sum of two
vectors
x = xL + xL ⊥ ,
where xL belongs to L and xL⊥ belongs to L⊥ . This decomposition is called the
Proof. To prove (i), consider any two vectors x, y ∈ L along with their basis representations, i.e., x = Σ_{i=1}^m λi(x)fⁱ and y = Σ_{i=1}^m λi(y)fⁱ. Then,
⟨x, y⟩ = ⟨Σ_{i=1}^m λi(x)fⁱ, Σ_{i=1}^m λi(y)fⁱ⟩
  = Σ_{i=1}^m Σ_{j=1}^m λi(x)λj(y)⟨fⁱ, fʲ⟩   [by bilinearity of the inner product]
  = Σ_{i=1}^m λi(x)λi(y).   [by orthonormality of the vectors {fⁱ}]
From Corollary A.37 it immediately follows that for every nonempty subset Y
of Rn there exists the smallest affine subspace containing Y ; this is the intersec-
tion of all affine subspaces containing Y . This smallest affine subspace containing
Y is called the affine hull of Y [notation: Aff(Y )].
All this resembles a lot the story about linear spans. In fact, we can further
extend this analogy and get an “inner” description of the affine hull Aff(Y ) in
terms of elements of Y similar to the one of the linear span (recall that the linear
span of X is also characterized as the set of all linear combinations of vectors
from X).
Given a nonempty set Y , let us choose an arbitrary point y 0 ∈ Y , and consider
the set
X := Y − y 0 .
All affine subspaces containing Y should also contain y 0 . Therefore, by Proposi-
tion A.36, Aff(Y ) can be represented as M = y 0 +L, where L is a linear subspace.
It is absolutely evident that an affine subspace M = y 0 + L contains Y if and
only if the subspace L contains X, and that the larger is L, the larger is M :
L ⊂ L′ =⇒ M = y 0 + L ⊂ M ′ = y 0 + L′ .
Thus, in order to find the smallest among the affine subspaces containing Y , it
suffices to find the smallest among the linear subspaces containing X and then
translate the latter space by y 0 :
Aff(Y ) = y 0 + Lin(X) = y 0 + Lin(Y − y 0 ). (A.6)
Now, recall that by definition Lin(Y − y⁰) is the set of all linear combinations of vectors from Y − y⁰, so that a generic element of Lin(Y − y⁰) is
x = Σ_{i=1}^k µi(yⁱ − y⁰)   [here k may depend on x]
with yⁱ ∈ Y, and therefore a generic element of Aff(Y) = y⁰ + Lin(Y − y⁰) is of the form
y⁰ + x = λ0 y⁰ + Σ_{i=1}^k λi yⁱ,
where
λ0 := 1 − Σᵢ µi, and λi := µi, i = 1, . . . , k.
When Y itself is an affine subspace, it, of course, coincides with its affine hull,
and the previous proposition leads to the following consequence.
the coefficients λi are uniquely determined by the vector x. This equivalent form
reflects the essence of the matter – what we indeed need, is the uniqueness of
the coefficients in expansions. Accordingly, this equivalent form is the prototype
for the notion of an affinely independent set: we want to introduce this notion in
such a way that the coefficients λi in an affine combination
y = Σ_{i=0}^k λi yⁱ
y = Σ_{i=0}^k λi yⁱ = Σ_{i=0}^k λ′i yⁱ,
which implies that the vectors y⁰, . . . , y^k are linearly dependent. Moreover, there exists a nontrivial combination of these vectors which represents the zero vector, and the sum of the coefficients in this representation satisfies Σᵢ(λ′i − λi) = Σᵢ λ′i − Σᵢ λi = 1 − 1 = 0. Our reasoning can be reversed: if there exists a nontrivial linear
combination of y i ’s with zero sum of coefficients which results in the zero vector,
then the coefficients in the representation of any vector as an affine combination
of y i ’s are not uniquely defined. Thus, in order to get uniqueness we should for
(To compare against the definition of linear independence, see Definition A.9.)
With this definition of affinely independent set of vectors, we arrive at the fol-
lowing result analogous to Theorem A.20.
Based on this last proposition we deduce for example that the set of vectors
0, e1 , . . . , en composed of the origin and the standard basis vectors is affinely
independent. Note that this collection is linearly dependent (as every set of vectors
containing zero is linearly dependent). The difference between the two notions
of independence we deal with is important to keep in mind: linear independence
means that no nontrivial linear combination of the vectors can be zero, while
affine independence means that no nontrivial linear combination from certain
restricted class of them (with zero sum of coefficients) can be zero. Therefore,
there are more affinely independent sets than the linearly independent ones: a
linearly independent set is for sure affinely independent, but not vice versa.
366 Prerequisites from Linear Algebra
Affine bases and affine dimension. Propositions A.38 and A.40 reduce the no-
tions of affine spanning/affinely independent sets to the notions of spanning/linearly
independent ones. Combined with Theorem A.16, they result in the following
analogies of the latter two statements:
We already know that the standard basis vectors e1, . . . , en form a basis of the entire space Rⁿ. And what about affine bases in Rⁿ? According to Theorem A.45(i), you can choose as such an affine basis any collection of vectors e0, e0 + e1, . . . , e0 + en, where e0 is an arbitrary vector from Rⁿ.
Barycentric coordinates. Let M be an affine subspace, and let y 0 , . . . , y k be
an affine basis of M . Since the basis, by definition, affinely spans M , every vector
y from M is an affine combination of the vectors of the basis, i.e.,
y = Σ_{i=0}^k λi yⁱ   [ where Σ_{i=0}^k λi = 1 ].
Moreover, since the vectors of the affine basis are affinely independent, the co-
efficients of this combination are uniquely defined by y (Corollary A.42). These
coefficients are called barycentric coordinates of y with respect to the affine basis
in question. In contrast to the usual coordinates with respect to a (linear) basis,
the barycentric coordinates could not be quite arbitrary: their sum should be
equal to 1.
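A small illustration (ours): in R², the vectors y⁰ = 0, y¹ = e1, y² = e2 form an affine basis, and for y = [0.2; 0.3] the barycentric coordinates are obtained from y = λ0·0 + λ1e1 + λ2e2 with λ0 + λ1 + λ2 = 1, which gives λ1 = 0.2, λ2 = 0.3 and λ0 = 1 − 0.2 − 0.3 = 0.5.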
Recall from Example A.3 that the solution set to a homogeneous system of linear
equations with n variables is always a linear subspace in Rn . Thus, Proposition
A.46 is indeed an “if and only if” statement.
From Proposition A.46 and the facts about the dimension of linear subspaces
we can easily derive several important consequences:
• The systems of equations (A.8) which define a given linear subspace L are exactly the systems given by vectors a1, . . . , am which span L⊥.1)
• If m is the smallest possible number of equations in (A.8), then m is the dimension of L⊥, that is, by Remark A.28, m equals codim(L) := n − dim(L).2)
Now, an affine subspace M is, by definition, a translation of a linear subspace,
i.e., M = ā + L. Recall that the vectors x from L are exactly the solutions of
certain homogeneous system of linear equations
ai⊤x = 0, i = 1, . . . , m.
It is absolutely clear that adding to these vectors a fixed vector ā, we get exactly the set of solutions to the inhomogeneous feasible system of linear equations
ai⊤x = bi, i = 1, . . . , m,
where bi := ai⊤ā for all i. The reverse is also true, i.e., the set of solutions to a feasible system of linear equations
ai⊤x = bi, i = 1, . . . , m,
with n variables is the sum of a particular solution to the system and the solution
set to the underlying homogeneous system of linear equations (the latter set, as
we already know, is a linear subspace in Rn ). That is, the set of solutions to a
feasible system of linear equations is an affine subspace. Thus, we arrive at the
following result.
Remark A.48 The “outer” description of a linear or affine subspace (the artist’s
one) is in many cases much more useful than the “inner” description via lin-
ear/affine combinations (the worker’s one). For example, using the outer descrip-
tion, it is very easy to check whether or not a given vector belongs to a given
linear (or affine) subspace. In contrast, this task is not that easy with the inner
description3) . In fact both descriptions are “complementary” to each other and
work perfectly well in parallel: what is difficult to see with one of them, is clear
with another. The idea of using “inner” and “outer” descriptions of the entities
we meet with (e.g., linear subspaces, affine subspaces, convex sets, optimization
problems) – the general idea of duality – is, in our humble opinion, the main driv-
ing force of Convex Analysis and Optimization, and so not surprisingly in this
book we will all the time meet with different implementations of this fundamental
idea.
A.5 Exercises
Exercise 1 1. Mark in the list below those subsets of Rn which are linear subspaces. For the
ones that are linear subspaces, find out their dimensions and point out bases. For the ones
that are not linear subspaces provide counterexamples.
1. Rn
2. {0}
3. ∅
4. {x ∈ Rⁿ : Σ_{i=1}^n i·xi = 0}
5. {x ∈ Rⁿ : Σ_{i=1}^n i·xi² = 0}
6. {x ∈ Rⁿ : Σ_{i=1}^n i·xi = 1}
7. {x ∈ Rⁿ : Σ_{i=1}^n i·xi² = 1}
Point out the linear subspace parallel to M and find an affine basis in M .
Exercise 5 Let ∅ ̸= C ⊆ Rn and x ∈ Rn be given.
1. Is it always true that Aff(C − {x}) = Aff(C) − {x}?
2. Is it always true that Lin(C − {x}) = Aff(C) − {x}?
3. Do your answers to the previous questions change if you further assume x ∈ Aff(C)?
Exercise 6 Suppose that we are given n sets E1 , E2 , . . . , En in R100 that are distinct from each
other and they satisfy
E1 ⊂ E2 ⊂ . . . ⊂ En .
How large can n be, if
1. every one of Ei is a linear subspace?
2. every one of Ei is an affine subspace?
3. every one of Ei is a convex set?
Exercise 7 Prove that the Triangle inequality in Euclidean norm, i.e., ∥x + y∥2 ≤ ∥x∥2 + ∥y∥2 ,
holds true as equality if and only if x and y are nonnegative multiples of some vector (which
always can be taken to be x + y).
Then, for any j, by taking inner products of both sides of the above equality with fʲ, we arrive at
(fʲ)⊤x = (fʲ)⊤(x − xL) + Σ_{i=1}^m λi(fʲ)⊤fⁱ = Σ_{i=1}^m λi(fʲ)⊤fⁱ = λj.
Proof. Without loss of generality we may assume that both x and y are nonzero
(otherwise the Cauchy inequality is clearly equality, and one of the vectors is
constant times (specifically, zero times) the other one, as desired). Assuming
x, y ̸= 0, consider the function
f (λ) = (x − λy)⊤ (x − λy) = x⊤ x − 2λx⊤ y + λ2 y ⊤ y.
By positive definiteness of the inner product, this function – which is a second
order polynomial – is nonnegative on the entire axis, hence the discriminant
(x⊤ y)2 − (x⊤ x)(y ⊤ y) of f is nonpositive:
(x⊤ y)2 ≤ (x⊤ x)(y ⊤ y).
By taking square roots of both sides, we arrive at the Cauchy Inequality. We
also see that the inequality holds as equality if and only if the discriminant of
the second order polynomial f (λ) is zero, i.e., if and only if the polynomial has
a (multiple) real root. But, due to the positive definiteness of the inner product,
f (·) has a root λ if and only if x = λy, which proves the second part of the
theorem.
From Cauchy’s Inequality to the Triangle Inequality: Let x, y ∈ Rn . Then,
∥x + y∥₂² = (x + y)⊤(x + y)   [by the definition of the Euclidean norm]
  = x⊤x + y⊤y + 2x⊤y   [by opening the parentheses]
  ≤ ∥x∥₂² + ∥y∥₂² + 2∥x∥₂∥y∥₂   [by Cauchy's Inequality, using x⊤x = ∥x∥₂², y⊤y = ∥y∥₂²]
  = (∥x∥₂ + ∥y∥₂)²
  =⇒ ∥x + y∥₂ ≤ ∥x∥₂ + ∥y∥₂.
We also have the following simple and useful fact.
The properties of a norm (i.e., of the distance to the origin) we have estab-
lished induce properties of the distances between pairs of arbitrary points in
Rn . Specifically, d2 (x, y) = ∥x − y∥2 , same as any other norm-induced distance
d∥·∥ (x, y) = ∥x − y∥, possesses the following standard properties of a general
distance d(x, y) : Rn × Rn → R:
1. Positivity: The distance d(x, y) between two points x, y ∈ Rn is positive,
except for the case when the points coincide (x = y) in which case the distance
between x and y is zero:
d(x, y) ≥ 0, ∀(x, y ∈ Rn ); d(x, y) = 0 ⇐⇒ x = y.
2. Symmetry: For any x, y ∈ Rn , the distance from x to y is the same as the
distance from y to x:
d(x, y) = d(y, x), ∀(x, y ∈ Rn ).
3. Triangle inequality: For every x, y, z ∈ Rn , the distance from x to z does not
exceed the sum of distances between x and y and between y and z:
d(x, z) ≤ d(x, y) + d(y, z), ∀(x, y, z ∈ Rn ).
A norm-induced distance d(x, y) = ∥x − y∥ possesses the following additional
properties as well:
4. Shift invariance: d(x + h, y + h) = d(x, y) for all x, y, h ∈ Rn .
5. Homogeneity: d(λx, λy) = |λ| d(x, y) for all x, y ∈ Rn and λ ∈ R.
Proof. Clearly it suffices to justify equivalence of any norm to a fixed one, say,
the Euclidean norm ∥ · ∥2 . Thus, let us take ∥ · ∥′ ≡ ∥ · ∥2 . It suffices to prove that
there exist positive constants c and C such that both of the following conditions
hold:
(a) ∥x∥ ≤ C∥x∥2 , ∀x ̸= 0;
(B.2)
(b) ∥x∥2 ≤ c∥x∥, ∀x ̸= 0.
To prove (a), all we need is to set C := Σ_{i=1}^n ∥ei∥, where e1, . . . , en are the standard basis vectors. This is because
∥x∥ = ∥Σ_{i=1}^n xi ei∥ ≤ Σ_{i=1}^n |xi|∥ei∥ ≤ (max_{i=1,...,n} |xi|) Σ_{i=1}^n ∥ei∥ ≤ C∥x∥₂.
In order to prove (b), we will show that 0 < inf x {∥x∥/∥x∥2 : x ̸= 0}. Assume, on
the contrary, that the infimum in question is 0. Then, there exists a sequence of
nonzero vectors xt , t = 1, 2, . . . such that ∥xt ∥/∥xt ∥2 → 0 as t → ∞. Since both
∥ · ∥ and ∥ · ∥2 are homogeneous, we lose nothing by scaling xt to have ∥xt ∥2 = 1
for all t, whence we have ∥xt ∥ → 0 as t → ∞. As ∥xt ∥2 = 1 for all t, for any
1 ≤ i ≤ n, the sequences xti of the i-th entries in xt are bounded. Passing to
a subsequence, we can assume w.l.o.g. that all these sequences have limits as
t → ∞, i.e., lim_{t→∞} xᵢᵗ = zi for some z ∈ Rⁿ. Passing to the limit in the equality
Σ_{i=1}^n (xᵢᵗ)² = 1, we get ∥z∥₂ = 1. Clearly, ∥xᵗ − z∥₂ → 0 as t → ∞. Moreover, from
(a) we have ∥xt − z∥ ≤ C∥xt − z∥2 which implies that ∥xt − z∥ → 0 as t → ∞.
Then, by Triangle inequality, we deduce ∥z∥ ≤ ∥z − xt ∥ + ∥xt ∥ → 0 as t → ∞,
and thus ∥z∥ = 0. This is the desired contradiction since z ̸= 0 and therefore
∥z∥ > 0.
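For the most familiar norms the constants in this equivalence can be written down explicitly; e.g., for ∥·∥ = ∥·∥₁ one may take C = √n and c = 1: indeed,
∥x∥₁ = Σᵢ |xi| ≤ √n ∥x∥₂
(by the Cauchy inequality applied to the vectors [|x1|; . . . ; |xn|] and [1; . . . ; 1]), while
∥x∥₂² = Σᵢ xi² ≤ (Σᵢ |xi|)² = ∥x∥₁², i.e., ∥x∥₂ ≤ ∥x∥₁.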
In the preceding proof, we took for granted the following fundamental property of the real line (stemming from the centuries-long development of a rigorous theory of real numbers):
(✠) Every bounded sequence of real numbers {xt }t≥1 has a converging
subsequence, that is, for properly selected t1 < t2 < . . . and x̄ ∈ R the
sequence xti converges to x̄ as i → ∞: for every ϵ > 0 we have |x̄−xti | < ϵ
for all but finitely many indices i.
As an immediate consequence of this statement, from every bounded sequence
{xt }t≥1 of vectors from Rn (boundedness meaning that the sequences of i-th
entries in xt are bounded for every i ≤ n) we can extract a subsequence {xts : s =
1, 2, . . . , t1 < t2 < . . .} such that for every i ≤ n the sequences of i-th coordinates
xtis of vectors xts converge as s → ∞ to some reals – the fact we used in the proof
of Proposition B.3. Indeed, the sequences {xti }t≥1 are bounded for every i. By (✠)
we can extract from {xt }t≥1 a subsequence {xts , s = 1, 2, . . . , t1 < t2 < . . .} with
converging first entries, from this subsequence – a subsequence with converging
second entries, and so on. In n steps of this process we get a subsequence of
the original sequence of vectors such that i-th entries in the vectors from our
subsequence converge to some reals x̄i , i ≤ n. This is formalized as Theorem B.15
in Section B.1.4.
B.1.2 Convergence
Equipped with distances, we can define the fundamental notion of convergence
of a sequence of vectors. Specifically, we say that a sequence x1 , x2 , . . . of vectors
from Rⁿ converges to a vector x̄, or, equivalently, that x̄ is the limit of the sequence, if ∥x̄ − xⁱ∥₂ → 0 as i → ∞, or, equivalently, for every ϵ > 0 there exists i = i(ϵ) such that the distance between every point xⁱ, i ≥ i(ϵ), and x̄ does not exceed ϵ:
∥x̄ − xⁱ∥₂ → 0 as i → ∞ ⟺ [∀ϵ > 0, ∃i(ϵ) : i ≥ i(ϵ) =⇒ ∥x̄ − xⁱ∥₂ ≤ ϵ].
Note that by Proposition B.3 every two norm-induced distances on Rn are within
multiplicative factors from each other, meaning that replacing in the definition
of convergence the Euclidean distance with any other norm-induced distance, we
do not affect convergence per se – the fact that a sequence converges, and the
limit of such a sequence.
That is,
lim inf_{t→∞} xt = sup_{t≥0} inf_{m≥t} xm = sup {inf{xm : m ≥ t} : t ≥ 0}.
Example B.6 (Examples of closed sets) Each of the following sets is closed:
1. Rn .
2. ∅.
3. The set composed of the points xi = (i, 0, . . . , 0), i = 1, 2, 3, . . .
4. Any finite subset of Rn .
5. Any linear subspace in Rⁿ, i.e., any set of the form
{x ∈ Rⁿ : Σ_{j=1}^n aij xj = 0, i = 1, . . . , m} (see Proposition A.46).
6. Any affine subspace of Rⁿ, i.e., any nonempty set of the form
{x ∈ Rⁿ : Σ_{j=1}^n aij xj = bi, i = 1, . . . , m} (see Proposition A.47).
Example B.7 (Examples of non-closed sets) Each of the following sets is not closed, provided n > 0:
1. Rⁿ \ {0}.
2. The set composed of the points xⁱ = (1/i, 0, . . . , 0), for i = 1, 2, 3, . . .
3. The set {x ∈ Rⁿ : xj > 0, ∀j = 1, . . . , n}.
4. The set {x ∈ Rⁿ : Σ_{j=1}^n xj > 5}.
Note that based on these definitions, open sets are exactly the sets such that
every point of the set is its interior point.
Based on these examples, we see that Rⁿ and ∅ are both open and closed. Also, note that there are sets that are neither closed nor open, e.g., {x ∈ R² : x1 > 0, x2 ≥ 0}.
Theorem B.15 From every bounded sequence {xⁱ}_{i=1}^∞ of points from Rⁿ we can extract a converging subsequence {x^{i_j}}_{j=1}^∞.
Proof. Let {xi ∈ Rn }i≥1 be a bounded sequence. Since every bounded sequence
of reals has a converging subsequence, we can extract from {xi } a subsequence
with the first entries of its members converging to a limit. Note that this sub-
sequence itself will be bounded. Then, from this subsequence, we can extract a
subsequence with the second entries of the members converging to a limit. Ex-
tracting in this fashion subsequences from already generated ones, after n steps
we get a subsequence of the original sequence with every entry of the selected
members converging to a limit. That is, the selected final subsequence converges
entrywise (and then - in the Euclidean norm as well) to the vector composed of
the above limits.
We next introduce an important concept related to the closed sets.
sequence, and the latter does not belong to X), which once again contradicts X being compact.
Given a set X ⊆ Rⁿ, we refer to a family C of open sets such that X ⊆ ⋃_{U∈C} U as an open covering of X. We first establish a general statement regarding open coverings, and then show that a much stronger version of this property gives an alternative characterization of compact sets.
Proposition B.18 Given a set X ⊆ Rⁿ and its open covering C, we can extract a countable subcovering – there exists a sequence U1, U2, . . . of members of C such that X ⊆ ⋃_{i≥1} Ui.
Proof. Observe, first, that the family V of all open balls {x ∈ Rn : ∥x − x̄∥2 < r}
where the radius r and all coordinates of the centers x̄ are rational is countable,
i.e., all balls from V can be arranged into sequence V1 , V2 , . . .. This is because
the rational data – radius and coordinates of center – of a ball from V form a
collection of n + 1 rational numbers, or, which is the same, a collection of 2(n + 1)
integers, and all collections of the latter type can be arranged into a sequence:
we first list all collections of 2(n + 1) integers with the total of their magnitudes
not exceeding 1 (there are finitely many collections of this type), then list the
yet unlisted collections with the total of magnitudes of members not exceeding
2, and so on. This construction clearly arranges into a sequence all collections
of 2(n + 1) integers, that is, all balls with rational centers and positive rational
radii.
Now let us build a countable subcovering U1 , U2 , . . . of a given open covering
C of X ⊆ Rn . We will process the balls V1 , V2 ,. . . one by one. When processing
Vi , we check whether this set is covered by a member of C. If Vi is covered by
a member of C, we add this member to the sequence U1 , U2 , . . . we are building.
Otherwise, we add to the U -sequence nothing and pass to processing Vi+1 .
Now, every point x of X belongs to a certain member Wx of the covering C, as X ⊆ ⋃_{C∈C} C. Since Wx is open, among the balls Vi we can find a ball, let us call its index i(x), such that the center of V_{i(x)} is close to x and its radius is small enough
to ensure that x ∈ Vi(x) ⊂ Wx . Then, at the step i(x) of processing V1 , V2 , . . . the
family C contains a member, namely, Wx , which covers Vi(x) . Therefore, at this
step we will add to the U -sequence a set which contains Vi(x) and hence contains x.
We conclude that every x ∈ X is contained in some set of the sequence U1 , U2 , . . .
of members of C, and thus this sequence is a countable subcovering of X as
desired.
We close by providing a characterization of compact sets through their open
coverings.
Proof. First, suppose that X satisfies the stated property. We will prove that
X is closed and bounded, which by Theorem B.17 will imply that X is com-
pact. Suppose X is unbounded, and consider the covering of X given by all
open balls centered at points from X. Then, we would be unable to extract a
finite subcovering of X from this open covering. Now, assume for contradiction
that X is not closed, and consider the covering of X given by the open sets
{x : ∥x − x̄∥2 > 1/i}, i = 1, 2, . . . associated with a point x̄ ∈ cl(X) \ X. Then,
we would be unable to extract a finite subcovering of X from this open covering
of X which is a contradiction.
Now, let X be closed and bounded (and thus compact by Theorem B.17).
Consider an open covering C ofS X. By Proposition B.18, we can extract from C
a countable subcovering: X ⊂ i≥1 Ui , Ui ∈ C. We claim that in fact just finitely
many of the sets Ui suffices to cover X. This, of course, is enough to show that X
obeys the desired property stated in the theorem. Now, S assume for contradiction
that for every i = 1, 2, . . . there is a point xi ∈ X \ j≤i Uj . Since X is compact,
by definition we can extract from {xi } a subsequence converging to some x̄ ∈ X.
Since Ui are open and cover X, there exists i∗ such that x̄ ∈ Ui∗ . Also, as Ui∗
is open and the subsequence in question converges to x̄, the subsequence (and
therefore the entire sequence {xi }) visits Ui∗ infinitely many times. This gives us
the desired contradiction, since by construction all points xi with i > i∗ do not
belong to Ui∗ .
[Figure: two plots over the interval [0, 1]; only the axis tick labels survived extraction.]
Remark B.27 It is important to note the difference between the usual continuity
and the uniform continuity. The usual continuity of a function f means that given
ϵ > 0 and a point x ∈ X, it is possible to choose δ(ϵ, x) > 0 such that for all
y ∈ X, ∥x − y∥2 < δ(ϵ, x) =⇒ ∥f (x) − f (y)∥2 < ϵ; here the appropriate value of
δ can depend on ϵ and on x. The uniform continuity means that this positive δ
may be chosen as a function of only ϵ.
Example B.28 (Uniformly continuous functions) All of the following mappings
are uniformly continuous on X := [1, 100]:
• f (x) := ax + b
• f (x) := |x|
• f (x) = x2
• f (x) = 1/x
Proof. Assume for contradiction that there exists ϵ > 0 such that for every δ > 0
we can find a pair of points x, y ∈ X such that ∥x−y∥2 < δ and ∥f (x)−f (y)∥2 ≥ ϵ.
In particular, for every i = 1, 2, . . ., we can find two points xⁱ, yⁱ in X such that ∥xⁱ − yⁱ∥₂ ≤ 1/i and ∥f(xⁱ) − f(yⁱ)∥₂ ≥ ϵ. As X is compact, by Theorem B.15, we can extract from the sequence {xⁱ} a subsequence {x^{i_j}}_{j=1}^∞ which converges to a certain point x̄ ∈ X. Since ∥y^{i_j} − x^{i_j}∥₂ ≤ 1/i_j → 0 as j → ∞, the sequence {y^{i_j}}_{j=1}^∞ converges to the same point x̄ as the sequence {x^{i_j}}_{j=1}^∞ (why?). Since f is continuous, we have
lim_{j→∞} f(y^{i_j}) = f(x̄) = lim_{j→∞} f(x^{i_j}),
whence lim_{j→∞} (f(x^{i_j}) − f(y^{i_j})) = 0. But this contradicts the assumption that ∥f(x^{i_j}) − f(y^{i_j})∥₂ ≥ ϵ > 0 for all j.
Remark B.30 The fact that a function that is continuous on a compact set is automatically uniformly continuous on that set is one of the most useful features of compact sets. Note also that compactness – closedness and boundedness – of the domain is crucial here: each of the functions f(x) = 1/x : (0, 1] → R and f(x) = x² : R → R is continuous on its respective domain, and neither is uniformly continuous on it.
Proof. We will prove that f attains its maximum on X; the proof for the minimum is completely analogous. Since f is bounded on X by Theorem B.25, the quantity
f∗ := sup_{x∈X} f(x)
is finite. Thus, we can find a sequence {xⁱ} of points from X such that f∗ = lim_{i→∞} f(xⁱ). As X is compact, we can extract from the sequence {xⁱ} a subsequence {x^{i_j}}_{j=1}^∞ which converges to a certain point x̄ ∈ X. Since f is continuous on X, we have
f(x̄) = lim_{j→∞} f(x^{i_j}) = lim_{i→∞} f(xⁱ) = f∗,
B.3 Semicontinuity
Theorem B.32 admits a useful extension in which we can relax the requirement
for f to be continuous. The related framework is as follows.
Proof. (i) =⇒ (ii): We want to prove that if epi{f} is closed and α ∈ R, then levα(f) is closed. This is immediate: if xⁱ ∈ levα(f) and xⁱ → x, i → ∞, then f(xⁱ) ≤ α for all i, so that the points [xⁱ; α] ∈ Rⁿ × R belong to epi{f}; as i → ∞, these points converge to [x; α], and since epi{f} is closed, [x; α] ∈ epi{f}, implying that f(x) ≤ α, that is, x ∈ levα(f).
(ii) =⇒ (iii): Assume that the sublevel sets of f are closed, and let us prove that
whenever xi → x as i → ∞, (B.3) holds true. Passing to a subsequence of {xi },
B.3 Semicontinuity 387
it suffices to consider the situation when lim inf i→∞ f (xi ) = limi→∞ f (xi ) =: β.
Note that β is either +∞, or a real number, or −∞. In the first case we clearly
have f (x) ≤ β. Suppose the second or the third case holds, and assume for the
contradiction that β < f (x). Let γ be a real number such that β < γ < f (x).
Since γ > β, for all large enough i we have f (xi ) ≤ γ, that is, xi ∈ levγ (f ),
implying that x = limi→∞ xi is the limit of a converging sequence of points from
levγ (f ). As the sublevel sets of f are closed, we conclude that x ∈ levγ (f ), that
is, f (x) ≤ γ, which contradicts the origin of γ.
(iii) =⇒ (i): Assume that f obeys (iii), and let us prove that epi{f} is closed. Suppose the points [xⁱ; tⁱ] ∈ epi{f} converge as i → ∞ to [x; t]; we need to prove that [x; t] ∈ epi{f}. As [xⁱ; tⁱ] ∈ epi{f}, we have f(xⁱ) ≤ tⁱ, and as tⁱ → t, i → ∞, we deduce lim inf_{i→∞} f(xⁱ) ≤ t. Since f obeys (iii), we conclude that f(x) ≤ t, that is, [x; t] ∈ epi{f}.
Clearly, all finite-valued continuous functions are also lower semicontinuous. Moreover, there are many finite-valued discontinuous functions that are lower semicontinuous – precisely those discontinuous functions with closed epigraphs. Based on the definition of lower semicontinuity, we also see that there are finite-valued discontinuous functions that are not lsc. For example, the discontinuous function in Figure V.1 is not lsc.
The following corollary shows that a large class of continuous functions with
extended values are also lsc.
Proof. Since f is proper, there exists α ∈ R such that the set levα (f ) is nonempty.
Since f is lsc, levα (f ) is closed. Moreover, under the premise of the theorem
levα (f ) is also bounded, and thus it is compact. Now, let xi ∈ levα (f ) be a
sequence such that f (xi ) → inf x∈levα (f ) f (x) (note also that inf x∈levα (f ) f (x) =
inf x∈Rn f (x)). Since levα (f ) is compact, passing to a subsequence, we can assume
that xi → x̄ as i → ∞, whence, by Theorem B.33.iii, f (x̄) ≤ lim inf i→∞ f (xi ) =
inf_{x∈Rⁿ} f(x), implying that inf_{x∈Rⁿ} f(x) equals f(x̄) ∈ R and is therefore achieved and finite.
B.4 Exercises
Exercise 8 Mark in the list below those sets which are closed and those which are open (sets
are in Rn , ∥ · ∥ is a norm on Rn , where n > 0):
1. All vectors with integer coordinates.
2. All vectors with rational coordinates.
3. All vectors with positive coordinates.
4. All vectors with nonnegative coordinates.
5. {x ∈ Rn : ∥x∥ < 1}.
6. {x ∈ Rn : ∥x∥ = 1}.
7. {x ∈ Rn : ∥x∥ ≤ 1}
8. {x ∈ Rn : ∥x∥ ≥ 1}
9. {x ∈ Rn : ∥x∥ > 1}
10. {x ∈ Rn : 1 < ∥x∥ ≤ 2}
sets is open.
Proof.
(i): Clearly, a point x̄ ∈ Rⁿ is a limit of a sequence of points from a set X ⊆ Rⁿ if and only if every ball of positive radius centered at x̄ contains points from X. Consequently,
• when X is closed and x̄ ∉ X, there is a ball of positive radius centered at x̄ not intersecting X, implying that the complement of a closed set X is open;
• when X is open, none of the points from X is the limit of a sequence of points from the complement of X, implying that the limit of every converging sequence of points from the complement belongs to the complement, that is, the complement of an open set is closed.
(ii): Evident.
(iii): When X1, . . . , XN are closed and {xⁱ} is a converging sequence of points from X = ∪_{i≤N} Xi, the sequence visits a certain Xj infinitely many times, that is, the sequence has a subsequence with all terms from a certain Xj. The limit of this subsequence (which is the limit of {xⁱ} as well) belongs to Xj since this set is closed, and therefore the limit of {xⁱ} belongs to X. Thus, X is closed. Passing from the sets to their complements and invoking (i), we conclude from what has just been proved that the intersection of finitely many open sets is open.
f ′ (x) := lim_{∆x→0} (f (x + ∆x) − f (x))/∆x. 1)
This definition does not work when we pass from functions of single real variable
to functions of several real variables, or, which is the same, to functions with
vector arguments. Indeed, in this case the shift in the argument ∆x should be a
vector, and we do not know what it means to divide by a vector. . .
A proper way to extend the notion of the derivative to real- and vector-
valued functions of vector argument is to realize what in fact the meaning of
the derivative is in the univariate case: f ′ (x) gives us the precise description of
how to approximate f in a neighborhood of x by a linear function. Specifically,
if f ′ (x) exists, then the linear function f ′ (x)∆x of ∆x approximates the change
f (x + ∆x) − f (x) in f up to a remainder which is of higher order as compared
to ∆x as ∆x → 0:
f (x + ∆x) − f (x) = f ′ (x)∆x + ō(|∆x|).
In the above formula, we meet with the notation ō(|∆x|), and here is the expla-
nation of this notation:
1 Here and in what follows, when speaking about limits like the one in the right hand side of the definition
of f ′ , we exclude the value ∆x = 0 (at which the quantity we are looking at is undefined). In other
words, we are speaking about limit taken w.r.t. “percolated neighborhood” of the origin: by
definition, the relation a = lim∆x→0 g(∆x)/∆x means that for every ϵ > 0 there exists δ > 0 such
that |g(∆x)/∆x − a| ≤ ϵ for all nonzero ∆x satisfying |∆x| ≤ δ.
1. The right hand side limit in (C.2) is an important entity called the directional
derivative of f taken at x along (a direction) ∆x; note that this quantity is
defined in the “purely univariate” fashion – by dividing the change in f by the
magnitude of a shift in a direction ∆x and passing to the limit as the magnitude
of the shift approaches 0. Relation (C.2) states that the derivative, if it exists,
is, at every ∆x, nothing but the directional derivative of f taken at x along
∆x. Note, however, that differentiability is much more than the existence of
directional derivatives along all directions ∆x. In particular, differentiability
requires also the directional derivatives to be “well-organized” – to depend
linearly on the direction ∆x. It is easily seen that the mere existence of di-
rectional derivatives does not imply their “good organization.” For example,
the Euclidean norm
f (x) = ∥x∥2
at x = 0 possesses directional derivatives along all directions:
lim_{t→+0} (f (0 + t∆x) − f (0))/t = ∥∆x∥2 .
These derivatives, however, depend non-linearly on ∆x, so that the Euclidean norm is not differentiable at the origin (although it is differentiable everywhere outside the origin, but this is another story); a numerical illustration appears right after this list.
2. It should be stressed that the derivative, if it exists, is what it is: a linear
function of ∆x ∈ Rn taking values in Rm . As we shall see in a while, we can
represent this function by something “tractable,” like a vector or a matrix,
and we can understand how to compute such a representation. However, a
careful reader should bear in mind that a representation is not exactly the
same as the represented entity. Sometimes the difference between derivatives
and the entities which represent them is reflected in the terminology: what we
call the derivative, is also called the differential, while the word “derivative”
is reserved for the vector/matrix representing the differential.
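To make the discussion of directional derivatives in item 1 concrete, here is a minimal numerical sketch, assuming Python with NumPy; the helper dir_deriv and the sample directions are ours, introduced only for illustration. It shows that f (x) = ∥x∥2 has directional derivatives at the origin along every direction, yet they fail to be additive in the direction, so no derivative exists there.

import numpy as np

def dir_deriv(f, x, d, t=1e-8):
    # one-sided difference approximating lim_{t->+0} (f(x + t d) - f(x)) / t
    return (f(x + t * d) - f(x)) / t

f = lambda x: np.linalg.norm(x)              # Euclidean norm
x0 = np.zeros(3)
d1 = np.array([1.0, 0.0, 0.0])
d2 = np.array([0.0, 1.0, 0.0])

g1 = dir_deriv(f, x0, d1)                    # equals ||d1||_2 = 1
g2 = dir_deriv(f, x0, d2)                    # equals ||d2||_2 = 1
g12 = dir_deriv(f, x0, d1 + d2)              # equals ||d1 + d2||_2 = sqrt(2), not g1 + g2 = 2
print(g1, g2, g12)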
Df (x)[∆x] = f ′ (x)∆x = [[∇f1 (x)]⊤ ∆x; . . . ; [∇fm (x)]⊤ ∆x],
i.e., the derivative is readily expressed via the gradients of the real-valued components f1 , . . . , fm of f . The reverse statement is not exactly true. In particular, the existence of all first-order partial derivatives ∂fi (x)/∂xj for all i, j does not necessarily imply the existence of the derivative. In fact, we need a bit more, which is described next.
Theorem C.5
(i) [Differentiability and linear operations] Let f1 (x), f2 (x) be mappings
defined in a neighborhood of a point x̄ ∈ Rn and taking values in Rm ,
and λ1 (x), λ2 (x) be real-valued functions defined in a neighborhood of x̄.
Whenever f1 , f2 , λ1 , λ2 are differentiable at x̄, so is the function f (x) :=
λ1 (x)f1 (x) + λ2 (x)f2 (x), and its derivative at x̄ is given by
Df (x̄)[∆x] = [Dλ1 (x̄)[∆x]]f1 (x̄) + λ1 (x̄)Df1 (x̄)[∆x]
+ [Dλ2 (x̄)[∆x]]f2 (x̄) + λ2 (x̄)Df2 (x̄)[∆x],
=⇒ f ′ (x̄) = f1 (x̄)[∇λ1 (x̄)]⊤ + λ1 (x̄)f1′ (x̄)
+ f2 (x̄)[∇λ2 (x̄)]⊤ + λ2 (x̄)f2′ (x̄).
(ii) [Chain rule] Let a mapping f : Rn → Rm be differentiable at x̄, and a
mapping g : Rm → Rk be differentiable at ȳ := f (x̄). Then, the superposition
function given by h(x) = g(f (x)) is differentiable at x̄, and its derivative at
x̄ is given by
Dh(x̄)[∆x] = Dg(ȳ)[Df (x̄)[∆x]],
=⇒ h′ (x̄) = g ′ (ȳ)f ′ (x̄).
If the outer function g is real-valued, then the latter formula implies that
∇h(x̄) = [f ′ (x̄)]⊤ ∇g(ȳ)
(recall that for a real-valued function ϕ, ϕ′ = (∇ϕ)⊤ ).
is differentiable at every point (Theorem C.4) and its gradient, of course, equals g:
(∇f (x))⊤ ∆x = lim_{t→+0} t⁻¹ (f (x + t∆x) − f (x))   [by (C.2)]
= lim_{t→+0} t⁻¹ g ⊤ (t∆x) = g ⊤ ∆x.
Hence, we arrive at
∇(a + g ⊤ x) = g.
Example C.7 (Gradient of a quadratic form) For now, let us define a homoge-
neous quadratic form on Rn as a function
f (x) = Σ_{i=1}^n Σ_{j=1}^n Aij xi xj = x⊤ Ax,
where A is an n × n matrix. Note that the matrices A and A⊤ define the same quadratic form, and therefore the symmetric matrix B := (1/2)(A + A⊤ ) also produces the same quadratic form as A and A⊤ . Thus, we may always assume (and do assume from now on) that the matrix A producing the quadratic form in question is symmetric.
A quadratic form is a simple polynomial and as such is differentiable at ev-
ery point (Theorem C.4). What is the gradient of f at a point x? Here is the
computation:
(∇f (x))⊤ ∆x = Df (x)[∆x]
= lim_{t→+0} t⁻¹ [(x + t∆x)⊤ A(x + t∆x) − x⊤ Ax]
= lim_{t→+0} t⁻¹ [x⊤ Ax + t(∆x)⊤ Ax + t x⊤ A∆x + t² (∆x)⊤ A∆x − x⊤ Ax]
= lim_{t→+0} t⁻¹ [2t(Ax)⊤ ∆x + t² (∆x)⊤ A∆x]
= 2(Ax)⊤ ∆x,
where the second equality follows from (C.2), the third equality is obtained by opening the parentheses, and in the last line we used that A is symmetric.
Hence, we conclude that for a symmetric matrix A, we have
∇(x⊤ Ax) = 2Ax.
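As a quick sanity check of the formula ∇(x⊤ Ax) = 2Ax, here is a small sketch assuming Python with NumPy; the random symmetric A and the central finite-difference gradient are our illustrative choices.

import numpy as np

rng = np.random.default_rng(0)
n = 5
M = rng.standard_normal((n, n))
A = 0.5 * (M + M.T)                          # symmetric, as assumed in the example
x = rng.standard_normal(n)

f = lambda y: y @ A @ y                      # the quadratic form y^T A y

eps = 1e-6
num_grad = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps) for e in np.eye(n)])
print(np.allclose(num_grad, 2 * A @ x, atol=1e-5))   # True: the gradient is 2Ax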
Example C.8 (Derivative of X −1 on the domain of nonsingular n × n matrices)
Define the mapping F (X) := X −1 on the open set of nonsingular n × n matrices.
Suppose X ∈ Rn×n is nonsingular. Then, for any ∆X ∈ Rn×n we have
DF (X)[∆X] = lim_{t→+0} t⁻¹ [(X + t∆X)⁻¹ − X⁻¹]
= lim_{t→+0} t⁻¹ (X + t∆X)⁻¹ [X − (X + t∆X)] X⁻¹
= −lim_{t→+0} (X + t∆X)⁻¹ ∆X X⁻¹
= −X⁻¹ ∆X X⁻¹.
Therefore, we arrive at the important relation
D(X −1 )[∆X] = −X −1 ∆XX −1 .
(cf., the derivative of the univariate function x−1 at x ̸= 0 is −x−2 ).
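A hedged numerical sketch of this relation (Python with NumPy assumed; the random nonsingular X and the direction ∆X are ours): the finite-difference quotient of X ↦ X⁻¹ along ∆X is compared with −X⁻¹∆X X⁻¹.

import numpy as np

rng = np.random.default_rng(1)
n = 4
X = rng.standard_normal((n, n)) + n * np.eye(n)      # diagonally shifted, hence nonsingular
dX = rng.standard_normal((n, n))

t = 1e-7
fd = (np.linalg.inv(X + t * dX) - np.linalg.inv(X)) / t     # difference quotient along dX
analytic = -np.linalg.inv(X) @ dX @ np.linalg.inv(X)
print(np.max(np.abs(fd - analytic)))                 # small, of order t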
Example C.9 (Derivative of the log-det barrier) The log-det barrier is given by
F (X) = ln Det(X),
where X is an n × n matrix (or, if you prefer, n2 -dimensional vector). The log-
det barrier plays an extremely important role in modern optimization. In this
example, we will compute its derivative.
Note that F (X) is well-defined and differentiable in a neighborhood of every
point X̄ with positive determinant. (Indeed, Det(X) is a polynomial of the entries
of X and thus it is everywhere continuous and differentiable with continuous
partial derivatives, while the function ln(t) is continuous and differentiable on
the positive ray. Then, by Theorems B.24.ii and C.5.ii, F is differentiable at
every X such that Det(X) > 0). While the computation of the derivative of F
by the standard techniques will not be very pleasant, we next illustrate that this
can be done easily by resorting to the fundamental definition of the derivative.
Let us consider a point X̄ such that Det(X̄) > 0, and define G(X) := Det(X).
Then, we have
DF (X̄)[∆X] = D ln(G(X̄))[DG(X̄)[∆X]]
= (G(X̄))⁻¹ DG(X̄)[∆X]
= (Det(X̄))⁻¹ lim_{t→+0} t⁻¹ [Det(X̄ + t∆X) − Det(X̄)]
= (Det(X̄))⁻¹ lim_{t→+0} t⁻¹ [Det(X̄(I + tX̄⁻¹ ∆X)) − Det(X̄)]
= (Det(X̄))⁻¹ lim_{t→+0} t⁻¹ [Det(X̄) Det(I + tX̄⁻¹ ∆X) − Det(X̄)]
= lim_{t→+0} t⁻¹ [Det(I + tX̄⁻¹ ∆X) − 1]
= Tr(X̄⁻¹ ∆X),
where the first equality follows from the chain rule, the second one from the fact that ln′ (x) = x⁻¹ for any x > 0, the third one from the definition of G and (C.2), the fifth one follows from the relation Det(AB) = Det(A)Det(B) for every A, B of appropriate size, and the last equality, i.e.,
lim_{t→+0} t⁻¹ (Det(I + tA) − 1) = Tr(A) ≡ Σ_{i=1}^n Aii ,   (C.6)
is immediately given by recalling what Det(I + tA) is. Note that Det(I + tA) is
a polynomial of t which is the sum of products, taken along all diagonals of an
n × n matrix and assigned certain signs, of the entries of I + tA. At every one
of these diagonals, except for the main one, there are at least two cells with the
entries proportional to t, so that the corresponding products do not contribute
to the constant and the linear in t terms in Det(I + tA) and thus do not affect
the limit in (C.6). The only product which does contribute to the linear and the
constant terms in Det(I + tA) is the product (1 + tA11 )(1 + tA22 ) . . . (1 + tAnn )
coming from the main diagonal. Moreover, it is clear that in this product the
constant term is 1, and the linear in t term is t(A11 + . . . + Ann ), and thus (C.6)
follows.
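The derivative of the log-det barrier and the relation (C.6) are both easy to probe numerically. The following sketch assumes Python with NumPy and uses a random positive definite X (so that Det(X) > 0) and random directions; it is only an illustration of the formulas above.

import numpy as np

rng = np.random.default_rng(2)
n = 4
B = rng.standard_normal((n, n))
X = B @ B.T + np.eye(n)                      # symmetric positive definite, so Det(X) > 0
dX = rng.standard_normal((n, n))

t = 1e-7
fd = (np.log(np.linalg.det(X + t * dX)) - np.log(np.linalg.det(X))) / t
print(fd, np.trace(np.linalg.solve(X, dX)))  # D ln Det(X)[dX] vs Tr(X^{-1} dX): nearly equal

A = rng.standard_normal((n, n))
print((np.linalg.det(np.eye(n) + t * A) - 1) / t, np.trace(A))   # (C.6): nearly equal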
= D2 f (x)[∆x2 , ∆x1 ]   [once again by (C.9)]
3. Now it is clear how to proceed: to define D3 f (x)[∆x1 , ∆x2 , ∆x3 ], we fix in
the second-order derivative D2 f (x)[∆x1 , ∆x2 ] the arguments ∆x1 , ∆x2 and
treat it as a function of x only, thus arriving at a mapping which maps U
into Rm and depends on ∆x1 , ∆x2 as on parameters (linearly in every one
of them). Differentiating the resulting mapping in x, we arrive at a func-
tion D3 f (x)[∆x1 , ∆x2 , ∆x3 ] which by construction is linear in every one of
the arguments ∆x1 , ∆x2 , ∆x3 and satisfies (C.7); the latter relation, due
to the Calculus result on the symmetry of partial derivatives, implies that
D3 f (x)[∆x1 , ∆x2 , ∆x3 ] is symmetric in ∆x1 , ∆x2 , ∆x3 . After we have at our
disposal the third derivative D3 f , we can build from it in the already explained
fashion the fourth derivative, and so on, until the k-th derivative is defined.
Remark C.10 Since Dℓ f (x)[∆x1 , . . . , ∆xℓ ] is linear in every one of ∆xi , we can
expand the derivative in a multiple sum:
∆xi = Σ_{j=1}^n ∆xi_j ej ,
=⇒ Dℓ f (x)[∆x1 , . . . , ∆xℓ ] = Dℓ f (x)[Σ_{j1=1}^n ∆x1_{j1} ej1 , . . . , Σ_{jℓ=1}^n ∆xℓ_{jℓ} ejℓ ]   (C.10)
= Σ_{1≤j1 ,...,jℓ ≤n} Dℓ f (x)[ej1 , . . . , ejℓ ] ∆x1_{j1} · · · ∆xℓ_{jℓ} .
What is the origin of the coefficients Dℓ f (x)[ej1 , . . . , ejℓ ]? According to (C.7), one
has
Dℓ f (x)[ej1 , . . . , ejℓ ] = ∂ℓ/(∂tℓ ∂tℓ−1 . . . ∂t1 )|_{t1 =...=tℓ =0} f (x + t1 ej1 + t2 ej2 + . . . + tℓ ejℓ )
= ∂ℓ f (x)/(∂xjℓ ∂xjℓ−1 . . . ∂xj1 ),
so that the coefficients in (C.10) are nothing but the partial derivatives, of order ℓ, of f .
Remark C.11 An important particular case of the relation (C.7) is the one
when the vectors ∆x1 , ∆x2 , . . . , ∆xℓ are all the same. Let d := ∆x1 = . . . = ∆xℓ .
According to (C.7), we have
Dℓ f (x)[d, d, . . . , d] = ∂ℓ/(∂tℓ ∂tℓ−1 . . . ∂t1 )|_{t1 =...=tℓ =0} f (x + t1 d + t2 d + . . . + tℓ d).
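Equivalently, with all arguments equal to d, Dℓ f (x)[d, . . . , d] is the ℓ-th derivative at t = 0 of the univariate function t ↦ f (x + td). A small sketch (Python with NumPy assumed; the quadratic form and the central second difference are our illustrative choices) checks this for ℓ = 2, where D2 f (x)[d, d] = 2d⊤ Ad for f (x) = x⊤ Ax with symmetric A (cf. the quadratic-form example below).

import numpy as np

rng = np.random.default_rng(3)
n = 5
M = rng.standard_normal((n, n))
A = 0.5 * (M + M.T)
x = rng.standard_normal(n)
d = rng.standard_normal(n)

f = lambda y: y @ A @ y
g = lambda t: f(x + t * d)                   # univariate restriction along d

h = 1e-4
second_diff = (g(h) - 2 * g(0.0) + g(-h)) / h**2     # approximates g''(0) = D^2 f(x)[d, d]
print(second_diff, 2 * d @ A @ d)            # nearly equal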
Theorem C.12
(i) Let U be an open set in Rn , f1 (·), f2 (·) : Rn → Rm be Ck in U , and let
real-valued functions λ1 (·), λ2 (·) be Ck in U . Then, the function
f (x) = λ1 (x)f1 (x) + λ2 (x)f2 (x)
is Ck in U .
(ii) Let U be an open set in Rn , V be an open set in Rm , let a mapping
f : Rn → Rm be such that it is Ck in U and f (x) ∈ V for x ∈ U , and,
finally, let a mapping g : Rm → Rp be Ck in V . Then, the superposition
h(x) = g(f (x))
is Ck in U .
Remark C.13 For higher order derivatives, in contrast to the first-order ones,
there is no simple “chain rule” for computing the derivative of superposition. For
example, the second-order derivative of the superposition h(x) = g(f (x)) of two
C2 -mappings is given by the formula
D2 h(x)[∆x1 , ∆x2 ]
= Dg(f (x))[D2 f (x)[∆x1 , ∆x2 ]] + D2 g(f (x))[Df (x)[∆x1 ], Df (x)[∆x2 ]]
(check it!). We see that both the first- and the second-order derivatives of f and
g contribute to the second-order derivative of the superposition h.
The only case when there does exist a simple formula for high order derivatives
Hence,
D2 f (x)[∆x1 , ∆x2 ] = 2(∆x2 )⊤ A ∆x1 .
Note that the second derivative of a quadratic form is independent of x. Conse-
quently, the third, the fourth, etc., derivatives of a quadratic form are identically
zero.
Example C.16 (Second-order derivative of the log-det barrier) Consider F (X) =
ln Det(X). As we have seen, this function of an n × n matrix is well-defined and
differentiable on the set U of matrices with positive determinant (which is an
open set in the space Rn×n of n × n matrices). In fact, this function is C∞ in U .
Let us compute its second-order derivative. Recall from Example C.9 that
DF (X)[∆X 1 ] = Tr(X −1 ∆X 1 ). (C.12)
To differentiate the right hand side in X, we will need the derivative of the
mapping G(X) = X −1 which is defined on the open set of nonsingular n × n
matrices. Recall from Example C.8 that we have the relation
D(X⁻¹)[∆X] = −X⁻¹ ∆X X⁻¹,  X ∈ Rn×n , Det(X) ̸= 0.
Differentiating (C.12) in X along a direction ∆X 2 , we get
D2 F (X)[∆X 1 , ∆X 2 ] = Tr(D(X⁻¹)[∆X 2 ] ∆X 1 ) = Tr(−X⁻¹ ∆X 2 X⁻¹ ∆X 1 ),
where the last equation follows from D(X⁻¹)[∆X 2 ] = −X⁻¹ ∆X 2 X⁻¹ for all nonsingular X ∈ Rn×n . Hence, we arrive at the formula
D2 F (X)[∆X 1 , ∆X 2 ] = −Tr(X −1 ∆X 2 X −1 ∆X 1 ) [X ∈ Rn×n , Det(X) > 0] .
Since Tr(AB) = Tr(BA) (check it!) for all matrices A, B such that the product
AB makes sense and is square, the right hand side in the above formula is sym-
metric in ∆X 1 , ∆X 2 , as it should be for the second derivative of a C2 function.
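Here is a hedged numerical sketch of the second-order derivative of the log-det barrier (Python with NumPy assumed; the random positive definite X and directions ∆X 1 , ∆X 2 are our illustrative choices): the difference quotient of DF (·)[∆X 1 ] along ∆X 2 is compared with −Tr(X⁻¹ ∆X 2 X⁻¹ ∆X 1 ).

import numpy as np

rng = np.random.default_rng(4)
n = 4
B = rng.standard_normal((n, n))
X = B @ B.T + np.eye(n)                      # Det(X) > 0
dX1 = rng.standard_normal((n, n))
dX2 = rng.standard_normal((n, n))

DF = lambda X, H: np.trace(np.linalg.solve(X, H))    # DF(X)[H] = Tr(X^{-1} H), see (C.12)

t = 1e-7
fd = (DF(X + t * dX2, dX1) - DF(X, dX1)) / t         # differentiate DF(.)[dX1] along dX2
Xi = np.linalg.inv(X)
print(fd, -np.trace(Xi @ dX2 @ Xi @ dX1))            # nearly equal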
Taylor expansion of order k of f , built at the point x̄, is the function defined as
Fk (x) := f (x̄) + (1/1!) Df (x̄)[x − x̄] + (1/2!) D2 f (x̄)[x − x̄, x − x̄]
+ (1/3!) D3 f (x̄)[x − x̄, x − x̄, x − x̄] + . . . + (1/k!) Dk f (x̄)[x − x̄, . . . , x − x̄]   (k arguments x − x̄ in the last term).   (C.13)
We are already acquainted with the Taylor expansion of order 1
F1 (x) = f (x̄) + Df (x̄)[x − x̄],
this is the affine function of x which approximates “very well” f (x) in a neigh-
borhood of x̄, namely, within approximation error ō(|x − x̄|). We next state that
a similar fact is true for Taylor expansions of higher order.
such that
∥x − x̄∥2 ≤ δ =⇒ ∥Fk (x) − f (x)∥2 ≤ ϵ ∥x − x̄∥_2^k .
(ii) Fk (·) is the unique polynomial with components of full degree less than or equal to k which approximates f up to a remainder which is ō(∥x − x̄∥k ).
(iii) The value and the derivatives of Fk of orders 1, 2, . . . , k, taken at x̄,
are the same as the value and the corresponding derivatives of f taken at
the same point.
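A short numerical sketch of the order-2 case (Python with NumPy assumed; the particular function f (x) = exp(a⊤ x) and the test radii are ours): the error of the Taylor polynomial F2 , divided by ∥x − x̄∥², tends to 0 as x approaches x̄.

import numpy as np

rng = np.random.default_rng(5)
n = 3
a = rng.standard_normal(n)
xbar = rng.standard_normal(n)

f = lambda x: np.exp(a @ x)
g = np.exp(a @ xbar) * a                     # gradient of f at xbar
H = np.exp(a @ xbar) * np.outer(a, a)        # Hessian of f at xbar

F2 = lambda x: f(xbar) + g @ (x - xbar) + 0.5 * (x - xbar) @ H @ (x - xbar)

d = rng.standard_normal(n)
for r in [1e-1, 1e-2, 1e-3]:
    x = xbar + r * d
    print(r, abs(f(x) - F2(x)) / r**2)       # the ratio decreases: the error is o(||x - xbar||^2)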
rectangular m × n matrices with real entries. From the viewpoint of their linear
structure (i.e., the operations of addition and multiplication by real numbers)
Sm is nothing but the arithmetic linear space R^{m(m+1)/2} of dimension m(m+1)/2: by arranging the elements of a symmetric m × m matrix X in a single column, say, in the row-by-row order, you get a usual m²-dimensional column vector; multiplication of a matrix by a real number and addition of matrices correspond to the same operations with the “representing vector(s).” When X runs through Sm , the vector representing X runs through the m(m+1)/2-dimensional subspace of R^{m²} consisting of vectors satisfying the “symmetry condition” – the coordinates coming from symmetric to each other pairs of entries in X are equal to each other. Similarly, Rm×n as a linear space is just R^{mn} , and it is natural to equip Rm×n with the inner product defined as the usual inner product of the vectors representing the matrices:
⟨X, Y ⟩ := Σ_{i=1}^m Σ_{j=1}^n Xij Yij = Tr(XY ⊤ )   [= Σ_{i=1}^m Σ_{j=1}^n Yij Xij = Tr(Y X ⊤ )].
Here, Tr stands for the trace – the sum of diagonal elements of a (square) matrix.
With this inner product (called the Frobenius inner product), Rm×n becomes
a legitimate Euclidean space, and we may use in connection with this space
all notions based upon the Euclidean structure, e.g., the (Frobenius) norm of a
matrix
∥X∥2 := √⟨X, X⟩ = √(Σ_{i=1}^m Σ_{j=1}^n Xij² ) = √Tr(XX ⊤ ),
⟨X, Y ⟩ = Tr(XY ), X, Y ∈ Sm .
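A two-line numerical sketch of the Frobenius inner product and norm (Python with NumPy assumed; the random matrices are ours):

import numpy as np

rng = np.random.default_rng(14)
X = rng.standard_normal((3, 5))
Y = rng.standard_normal((3, 5))

print(np.isclose((X * Y).sum(), np.trace(X @ Y.T)))          # sum_{i,j} X_ij Y_ij = Tr(X Y^T)
print(np.isclose(np.linalg.norm(X, 'fro'), np.sqrt(np.trace(X @ X.T))))   # Frobenius norm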
Fact D.1 Let X, Y be rectangular matrices such that XY makes sense and
is a square matrix. Then, Tr(Y X) also makes sense and
Tr(XY ) = Tr(Y X).
In connection with Theorem D.3, let us recall the following notions and facts:
D.1.1.A. Eigenvectors and eigenvalues.
Theorem D.3 states, in particular, that for a symmetric matrix A, all eigenval-
ues are real, and the corresponding eigenvectors can be chosen to be real and to
form an orthonormal basis in Rn .
Remark D.5 When Q is a square and nonsingular n × n matrix, the simi-
larity transformation A 7→ QAQ−1 preserves the characteristic polynomial (and
therefore the eigenvalues). Indeed,
The proof of Fact D.9 relies on two simple facts that are important in their own right. We state these facts next.
Based on VCE, in order to get the largest eigenvalue λ1 (A), we maximize the
quadratic form x⊤ Ax over the unit sphere S = {x ∈ Rn : x⊤ x = 1}, and the
maximum is exactly λ1 (A). To get the second largest eigenvalue λ2 (A), we act
as follows: we choose a linear subspace E of dimension n − 1 and maximize the
quadratic form x⊤ Ax over the cross-section of S by this subspace; the maximum
value of the form depends on E, and we minimize this maximum over linear
subspaces E of the dimension n − 1; the resulting value is exactly λ2 (A). To get
λ3 (A), we replace in the latter construction subspaces of the dimension n − 1 by
Thus, for appropriately chosen E ∈ Eℓ , the inner maximum in the right hand side
of (D.3) equals to λℓ (A). Hence, we have
It remains to prove the reverse inequality. To this end, consider a linear sub-
space E of the dimension n − ℓ + 1 and observe that its intersection with the
linear subspace Fℓ of the dimension ℓ is nontrivial (indeed, dim E + dim Fℓ = (n − ℓ + 1) + ℓ > n, so that dim(E ∩ Fℓ ) > 0 by the Dimension formula). Consider any unit vector y ∈ E ∩ Fℓ . Since y is a unit vector from Fℓ , it admits a
representation of the form,
y = Σ_{i=1}^ℓ ηi ei with Σ_{i=1}^ℓ ηi² = 1,
whence, by (D.4),
y ⊤ Ay = Σ_{i=1}^ℓ λi (A) ηi² ≥ min_{1≤i≤ℓ} λi (A) = λℓ (A).
Since E is an arbitrary subspace from Eℓ , we conclude that the right hand side
in (D.3) is ≥ λℓ (A).
Our reasoning in the proof of the VCE uses the relation (D.4) which is a simple
and useful byproduct. Therefore, we state it formally as a corollary.
also that by VCE, the minimum value of x⊤ Ax over the unit sphere is exactly
the minimum eigenvalue of A.
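The extreme cases of the variational characterization are easy to test numerically. The sketch below (Python with NumPy assumed; the random symmetric A and the sampled unit vectors are ours) checks that the quadratic form on the unit sphere never exceeds λ1 (A), never drops below λn (A), and attains these values at the corresponding eigenvectors.

import numpy as np

rng = np.random.default_rng(6)
n = 6
M = rng.standard_normal((n, n))
A = 0.5 * (M + M.T)

w, V = np.linalg.eigh(A)                     # ascending eigenvalues and orthonormal eigenvectors
lam, V = w[::-1], V[:, ::-1]                 # reorder so that lam[0] = lambda_1(A) >= ... >= lam[-1]

S = rng.standard_normal((20000, n))
S /= np.linalg.norm(S, axis=1, keepdims=True)          # random points on the unit sphere
vals = np.einsum('ij,jk,ik->i', S, A, S)               # x^T A x for each sample
print(vals.max() <= lam[0] + 1e-9, vals.min() >= lam[-1] - 1e-9)   # True True
print(np.isclose(V[:, 0] @ A @ V[:, 0], lam[0]))       # maximum attained at the top eigenvector
print(np.isclose(V[:, -1] @ A @ V[:, -1], lam[-1]))    # minimum attained at the bottom eigenvector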
D.1.2.B. ⪰-Monotonicity of the vector of eigenvalues.
We write A ⪰ B (A ≻ B) to express that A, B are symmetric matrices of the
same size such that A − B is positive semidefinite (respectively, positive definite).
max_{x∈E: x⊤ x=1} x⊤ Ax ≥ max_{x∈E: x⊤ x=1} x⊤ Bx
where Ēℓ is the family of all linear subspaces of the dimension n−k−ℓ+1 contained
in the linear subspace {x ∈ Rn : xn−k+1 = xn−k+2 = . . . = xn = 0}. Since Ēℓ ⊂
Eℓ+k , we have
We have proved the left inequality in (D.5). Applying this inequality to the matrix
−A, we get
−λℓ (Ā) = λn−k−ℓ (−Ā) ≥ λn−ℓ (−A) = −λℓ (A),
or, which is the same, λℓ (Ā) ≤ λℓ (A). This proves the remaining inequality in (D.5).
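A hedged numerical sketch of the interlacement inequalities (Python with NumPy assumed). We read Ā as the leading principal (n − k) × (n − k) submatrix of A, which is one natural instance of the construction above; the random A is ours.

import numpy as np

rng = np.random.default_rng(7)
n, k = 7, 2
M = rng.standard_normal((n, n))
A = 0.5 * (M + M.T)
A_bar = A[:n - k, :n - k]                    # leading principal submatrix (our reading of A-bar)

lam = np.sort(np.linalg.eigvalsh(A))[::-1]           # lambda_1(A) >= ... >= lambda_n(A)
lam_bar = np.sort(np.linalg.eigvalsh(A_bar))[::-1]

print(all(lam[l + k] <= lam_bar[l] + 1e-12 and lam_bar[l] <= lam[l] + 1e-12
          for l in range(n - k)))            # True: lambda_{l+k}(A) <= lambda_l(A-bar) <= lambda_l(A)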
The spectral norm is indeed a special case of the induced norm of a linear mapping x 7→ y = Ax : E → F , where E and F are (finite-dimensional) linear spaces
equipped with norms ∥ · ∥E and ∥ · ∥F respectively. The norm ∥A∥F,E of A induced
by these two norms is defined as
∥A∥F,E := max_x {∥Ax∥F : ∥x∥E ≤ 1} .
The definition of the induced norm immediately implies (check it!) that, first, for all
x ∈ E, y ∈ F we have
∥Ax∥F ≤ ∥A∥F,E ∥x∥E ,
and, second, the latter inequality is equality for properly selected nonzero x ∈ E.
In other words, the induced norm is the largest “amplification factor” – the largest
factor by which the norm ∥ · ∥F of Ax can be larger than the norm ∥ · ∥E of a
nonzero vector x.
Clearly, when E = Rn , F = Rm , and ∥ · ∥E , ∥ · ∥F are the standard Euclidean
norms on the respective spaces, the induced norm of A is exactly the spectral
norm of A.
Fact D.19 (i) Let E and F be finite-dimensional linear spaces. The induced
norm ∥ · ∥F,E is indeed a norm on the space Lin(E, F ) of linear mappings
from E to F .
(ii) Let E, F, G be finite-dimensional linear spaces equipped with the norms
∥ · ∥E , ∥ · ∥F , ∥ · ∥G respectively. Let y = Ax : E → F and z = By : F → G
be linear mappings. Then,
∥BA∥G,E ≤ ∥B∥G,F ∥A∥F,E .
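The sketch below (Python with NumPy assumed; the random rectangular A and B are ours) illustrates the spectral norm as an induced norm: ∥A∥ = √λmax (A⊤ A) = ∥A⊤ ∥, together with the submultiplicativity ∥BA∥ ≤ ∥B∥∥A∥ from Fact D.19(ii).

import numpy as np

rng = np.random.default_rng(8)
A = rng.standard_normal((5, 3))
B = rng.standard_normal((4, 5))

spec = np.linalg.norm(A, 2)                  # spectral norm (largest singular value)
print(np.isclose(spec, np.sqrt(np.linalg.eigvalsh(A.T @ A).max())))   # ||A|| = sqrt(lambda_max(A^T A))
print(np.isclose(spec, np.linalg.norm(A.T, 2)))                       # ||A|| = ||A^T||
print(np.linalg.norm(B @ A, 2) <= np.linalg.norm(B, 2) * np.linalg.norm(A, 2) + 1e-12)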
We also have the following useful properties of the spectral norm ∥·∥ on Rm×n .
we have
f (A) = a0 In + Σ_{i=1}^k ai A^i .
3. The mapping f (·) → f (A) “preserves nonnegativity,” i.e., the matrix f (A) is
positive semidefinite whenever f (·) is nonnegative on σ(A).
4. When a sequence fi (·) ∈ R(σ(A)) pointwise on σ(A) converges to f (·) as
i → ∞, the matrices fi (A) converge to f (A) as i → ∞. Moreover, for real-
valued on σ(A) functions f, g we have
∥f (A) − g(A)∥ ≤ max_{s∈σ(A)} |f (s) − g(s)|,
In other words, ϕ(·) is continuously differentiable on the open set D and its gradient, taken w.r.t. the Frobenius Euclidean structure on Sm , is ∇ϕ(X) = f ′ (X).
Hint: In order to prove Fact D.24, it makes sense to verify its validity for algebraic
polynomials and then use the fact that a continuously differentiable real-valued function on an interval (a, b) of the real axis is the limit, in the sense of uniform
convergence along with the first order derivative on compact subsets of (a, b), of
a sequence of polynomials.
Remark D.25 Note that when a complex-valued function f (z) of complex-
valued variable z can be represented as the sum of everywhere converging power
series (these functions are called entire):
f (z) = Σ_{ν=0}^∞ cν z^ν ,   [where cν ∈ C]
as required in (vi).
(vi) =⇒ (v): Evident.
(v) =⇒ (iv): Suppose A = B 2 with certain symmetric B. Let bi be the i-th
column of B for all i ≤ n. Applying the Gram-Schmidt orthogonalization process
(see proof of Theorem A.31(ii)), we can find an orthonormal system of vectors
u1 , . . . , un and lower triangular matrix L such that
bi = Σ_{j=1}^i Lij uj ,
or, which is the same, B ⊤ = LU , where U is the orthogonal matrix with the rows u1⊤ , . . . , un⊤ . Then, we have A = B 2 = B ⊤ (B ⊤ )⊤ = LU U ⊤ L⊤ = LL⊤ . Recalling that L is a lower triangular matrix, we see that A = ∆⊤ ∆, where the matrix ∆ = L⊤ is upper triangular.
(iv) =⇒ (iii): Evident.
(iii) =⇒ (i): If A = D⊤ D, then x⊤ Ax = (Dx)⊤ (Dx) ≥ 0 for all x.
We have proved the equivalence of the properties (i) – (vi). Slightly modifying
the reasoning (do it yourself!), one can prove the equivalence of the properties
(i′ ) – (vi′ ).
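Several of the equivalences above are easy to probe numerically. The sketch below (Python with NumPy assumed; the random factor D and the tolerance values are ours) builds A = D⊤ D, checks nonnegativity of the quadratic form and of the eigenvalues, and recovers a triangular factorization A = LL⊤ of the kind used in the proof.

import numpy as np

rng = np.random.default_rng(9)
n = 5
D = rng.standard_normal((n, n))
A = D.T @ D                                  # of the form D^T D, hence positive semidefinite

xs = rng.standard_normal((1000, n))
print((np.einsum('ij,jk,ik->i', xs, A, xs) >= -1e-10).all())   # (i): x^T A x >= 0
print((np.linalg.eigvalsh(A) >= -1e-10).all())                 # eigenvalues are nonnegative

L = np.linalg.cholesky(A + 1e-10 * np.eye(n))   # tiny shift only to guard against round-off
print(np.allclose(L @ L.T, A, atol=1e-6))       # A = L L^T with L lower triangular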
Remark D.28 (i) [Checking positive semidefiniteness] Given matrix A ∈ Sn , we
can check whether it is positive semidefinite by a purely algebraic finite algorithm
(the so called Lagrange diagonalization of a quadratic form) which requires at
most O(n3 ) arithmetic operations. Positive definiteness of a matrix can be checked
This fact is also the same as the following one stated in section 1.2.4.
Fact D.30 The set of n × n positive semidefinite matrices forms a cone Sn+ in the Euclidean space Sn of symmetric n × n matrices, the Euclidean structure being given by the Frobenius inner product ⟨A, B⟩ = Tr(AB) = Σ_{i,j} Aij Bij .
The cone Sn+ is called the positive semidefinite cone of size n. It is immedi-
ately seen that the semidefinite cone Sn+ is quite nice. Specifically, it satisfies the
following properties:
• Sn+ is closed: the limit of a converging sequence of positive semidefinite matrices
is positive semidefinite.
• Sn+ is pointed: the only n × n matrix A such that both A and −A are positive
semidefinite is the n × n matrix of all zeros.
• Sn+ possesses a nonempty interior (the subset composed of all interior points of
Sn+ ) which is exactly the set of positive definite matrices.
In fact, Sn+ is a regular cone, see section 20.3. Note that the relation A ⪰ B
means exactly that A − B ∈ Sn+ , while A ≻ B is equivalent to A − B ∈ int Sn+ .
The “matrix inequalities” A ⪰ B (A ≻ B) match the standard properties of the
usual scalar inequalities, e.g.,
A⪰A [reflexivity]
A ⪰ B, B ⪰ A =⇒ A = B [anti-symmetry]
A ⪰ B, B ⪰ C =⇒ A ⪰ C [transitivity]
A ⪰ B, C ⪰ D =⇒ A + C ⪰ B + D [compatibility with linear operations, I]
A ⪰ B, λ ≥ 0 =⇒ λA ⪰ λB [compatibility with linear operations, II]
Ai ⪰ Bi , Ai → A, Bi → B as i → ∞ =⇒ A ⪰ B [closedness]
with evident modifications when ⪰ is replaced with ≻, like
A ⪰ B, C ≻ D =⇒ A + C ≻ B + D,
etc. Along with these standard properties of inequalities, the inequality ⪰ pos-
sesses the following nice additional property.
Fact D.31 For any A, B ∈ Sn such that A ⪰ B, and for any V ∈ Rn×m , we
always have
V ⊤ AV ⪰ V ⊤ BV.
Another important property of the positive semidefinite cone is that its dual
cone is equal to itself, i.e., the positive semidefinite cone is self-dual.
Proof. We first prove the “if” part. Assume that X ⪰ 0, and let us prove that
then Tr(XY ) ≥ 0 for every Y ⪰ 0. Using the eigenvalue decomposition of X, we
can write it as
X = Σ_{i=1}^n λi (X) ei ei⊤ ,
whence
Tr(XY ) = Σ_{i=1}^n λi (X) Tr(ei ei⊤ Y ) = Σ_{i=1}^n λi (X) Tr(ei⊤ Y ei ),
where the last equality is given by Fact D.1. Recall that the ei ’s are just vectors, and
thus Tr(ei⊤ Y ei ) = ei⊤ Y ei . Also, as Y ⪰ 0, we have ei⊤ Y ei ≥ 0 for all i. Moreover, since X ⪰ 0, we have λi (X) ≥ 0 by Proposition D.16, and thus we conclude
Tr(XY ) = Σ_{i=1}^n λi (X) Tr(ei⊤ Y ei ) ≥ 0.
In order to prove the “only if” part, suppose that X is such that Tr(XY ) ≥ 0
for all matrices Y ⪰ 0. Note that for every vector y, the matrix Y = yy ⊤ is
positive semidefinite (Theorem D.27.iii). Then, for any y ∈ Rn we have 0 ≤
Tr(Xyy ⊤ ) = Tr(y ⊤ Xy) = y ⊤ Xy, where the first equality follows from Fact D.1.
Thus, X ⪰ 0 as desired.
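A quick numerical sketch of the self-duality just proved (Python with NumPy assumed; the random PSD samples and the explicit indefinite X are ours): Tr(XY ) is nonnegative for positive semidefinite X, Y , while a matrix X with a negative eigenvalue is detected by Y = yy⊤ built from the corresponding eigenvector.

import numpy as np

rng = np.random.default_rng(10)
n = 5

def random_psd():
    B = rng.standard_normal((n, n))
    return B @ B.T                           # positive semidefinite

print(all(np.trace(random_psd() @ random_psd()) >= -1e-10 for _ in range(1000)))   # "if" part

X = np.diag([1.0, -0.5, 2.0, 1.0, 3.0])      # not positive semidefinite
w, V = np.linalg.eigh(X)
y = V[:, 0]                                  # eigenvector for the negative eigenvalue
print(np.trace(X @ np.outer(y, y)))          # = y^T X y = -0.5 < 0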
Graphical illustration. The single-dimensional positive semidefinite cone S1+ is just the nonnegative ray, i.e.,
S1+ = {x ∈ R : x ≥ 0} = R+ .
The cone
S2+ = {[x, y; y, z] ∈ S2 : [x, y; y, z] ⪰ 0}
is nothing but the set
{(x, y, z) ∈ R3 : x ≥ 0, z ≥ 0, xz ≥ y 2 } = {(x, y, z) ∈ R3 : x + z ≥ √((x − z)2 + 4y 2 )}.
[Figure: three 3D cross-sections of S3+ : the set {(x, y, z) ∈ R3 : [z, x, y; x, z, 0; y, 0, z] ⪰ 0}, which is the 3D Lorentz cone {z ≥ √(x2 + y 2 )}; the set {(x, y, z) ∈ R3 : Diag{x, y, z} ⪰ 0}, which is the nonnegative orthant {x ≥ 0, y ≥ 0, z ≥ 0}; and the set {(x, y, z) ∈ R3 : [z, x, y; x, z, x; y, x, z] ⪰ 0}, a random 3D cross-section of S3+ .]
matrix
P − QR−1 Q⊤
is positive definite (resp. semidefinite).
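A hedged numerical sketch of the Schur Complement Lemma, in the form recalled in Exercise 15 below (Python with NumPy assumed; the random blocks P, Q, R with R ≻ 0 are ours): the block matrix A = [P, Q⊤ ; Q, R] is positive semidefinite exactly when the Schur complement P − Q⊤ R⁻¹ Q is.

import numpy as np

rng = np.random.default_rng(11)
p, r = 3, 4
Q = rng.standard_normal((r, p))
B = rng.standard_normal((r, r))
R = B @ B.T + np.eye(r)                      # positive definite
Mp = rng.standard_normal((p, p))
P = 0.5 * (Mp + Mp.T)

A = np.block([[P, Q.T], [Q, R]])
S = P - Q.T @ np.linalg.solve(R, Q)          # Schur complement of R in A

is_psd = lambda M: np.linalg.eigvalsh(0.5 * (M + M.T)).min() >= -1e-9
print(is_psd(A) == is_psd(S))                # True: A >= 0 iff P - Q^T R^{-1} Q >= 0 (R > 0)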
D.3 Exercises
Exercise 12 1. Find the dimension of Rm×n and point out a basis in this space.
2. Build an orthonormal basis in Sm .
Exercise 13 In the space Rm×m of square m × m matrices, there are two interesting subsets:
the set Sm of symmetric matrices A : A = A⊤ and the set Jm of skew-symmetric matrices
{A = [Aij ] : Aij = −Aji , ∀i, j}.
1. Verify that both Sm and Jm are linear subspaces of Rm×m .
2. Find the dimension of Sm and point out a basis in Sm .
3. Find the dimension of Jm and point out a basis in Jm .
4. What is the sum of Sm and Jm ? What is the intersection of Sm and Jm ?
Exercise 14 Is the “3-factor” extension of Fact D.1 valid, at least in the case of square matrices
X, Y, Z of the same size? That is, for square matrices X, Y, Z of the same size, is it always true
that Tr(XY Z) = Tr(Y XZ)?
Exercise 15 Given P ∈ Sp , Q ∈ Rr×p , and R ∈ Sr , consider the matrices
A = [P, Q⊤ ; Q, R],  B = [P, −Q⊤ ; −Q, R],  C = [R, Q; Q⊤ , P ],  D = [R, −Q; −Q⊤ , P ].
Prove that λ(A) = λ(B) = λ(C) = λ(D). Thus, the matrices A, B, C, D simultaneously are/are
not positive semidefinite. As a consequence: Schur Complement Lemma says that when R ≻ 0,
we have A ⪰ 0 iff P − Q⊤ R−1 Q ⪰ 0; since A ⪰ 0 iff C ⪰ 0, we see that the same Lemma says
that when P ≻ 0, we have A ⪰ 0 iff R − QP −1 Q⊤ ⪰ 0.
Exercise 16 Let Sn++ := int Sn+ = {X ∈ Sn : X ≻ 0}, and consider X, Y ∈ Sn++ . Then, X ⪯ Y holds if and only if X −1 ⪰ Y −1 (“⪰-antimonotonicity of X −1 ”, X ∈ Sn++ ). Is it true that from 0 ≺ X ⪯ Y it always follows that X −2 ⪰ Y −2 ?
Exercise 17 Let A, B ∈ Sn be such that 0 ⪯ A ⪯ B. For each one of the following, either prove
the statement or produce a counter example:
1. A2 ⪯ B 2 ;
2. 0 ⪯ A1/2 ⪯ B 1/2 .
Exercise 18 A matrix A ∈ Sn is called diagonally dominant if it satisfies the relation
aii ≥ Σ_{j̸=i} |aij |,  i = 1, . . . , n.
For each one of the following, either prove the statement or produce a counter example:
• A1 A1⊤ + . . . + Ak Ak⊤ ⪯ In ;
• [A1 A1⊤ , A1 A2⊤ , . . . , A1 Ak⊤ ; A2 A1⊤ , A2 A2⊤ , . . . , A2 Ak⊤ ; . . . ; Ak A1⊤ , Ak A2⊤ , . . . , Ak Ak⊤ ] ⪯ Ikn .
Fact D.1 Let X, Y be rectangular matrices such that XY makes sense and is a
square matrix. Then, Tr(Y X) also makes sense and
Tr(XY ) = Tr(Y X).
Proof. In order for XY to make sense and be a square matrix, the sizes of the matrices should be m × n (for X) and n × m (for Y ) with some m and n. We now have
Tr(XY ) = Σ_{i=1}^m Σ_{j=1}^n Xij Yji ,   Tr(Y X) = Σ_{j=1}^n Σ_{i=1}^m Yji Xij ,
and the two sums clearly coincide.
Fact D.2. If X, Y ∈ Rm×n , the Frobenius inner product of X and Y is equal to the
Frobenius inner product of X ⊤ and Y ⊤ :
Tr(XY ⊤ ) = Tr((X ⊤ )(Y ⊤ )⊤ ).
where the first and the last equalities follow from the fact that U, V are orthogonal matrices
and thus V ⊤ V = V V ⊤ = In and U ⊤ U = U U ⊤ = Im , and the second equality follows from Fact
D.1.
where the first implication is due to L being an invariant subspace of A, and the second impli-
cation is due to Ay ∈ L and x ∈ L⊥ . Then, from (A⊤ x)⊤ y = 0 for all y ∈ L we deduce that A⊤ x ∈ L⊥ .
D.10, subspaces Eℓ are invariant for matrices A1 , . . . , Ak . Then, by the inductive hypothesis,
we deduce that for every ℓ ≤ s there exists an orthonormal basis in Eℓ with basic vectors being eigenvectors of every one of A1 , . . . , Ak . In addition, these vectors, as all vectors from Eℓ , are eigenvectors of Ak+1 . So, the union, over ℓ ≤ s, of the vectors from these orthonormal bases forms an orthonormal basis in the entire space (recall that E1 , . . . , Es are mutually orthogonal, and their sum is the entire Rn ), and in this basis all matrices A1 , . . . , Ak+1 become diagonal.
Fact D.19 (i) Let E and F be finite-dimensional linear spaces. The induced norm
∥ · ∥F,E is indeed a norm on the space Lin(E, F ) of linear mappings from E to F .
(ii) Let E, F, G be finite-dimensional linear spaces equipped with the norms ∥ · ∥E ,
∥ · ∥F , ∥ · ∥G respectively. Let y = Ax : E → F and z = By : F → G be linear
mappings. Then,
∥BA∥G,E ≤ ∥B∥G,F ∥A∥F,E .
Proof. (i): Homogeneity and positivity of ∥ · ∥F,E are evident. To prove the Triangle inequality,
consider A, B ∈ Lin(E, F ). Then, for any x we have
∥(A + B)x∥F = ∥Ax + Bx∥F ≤ ∥Ax∥F + ∥Bx∥F ≤ (∥A∥F,E + ∥B∥F,E ) ∥x∥E . Taking the maximum over x with ∥x∥E ≤ 1, we get ∥A + B∥F,E ≤ ∥A∥F,E + ∥B∥F,E .
where the last inequality is due to the Cauchy inequality. Note also that y ⊤ Ax = maxi |λi | when x
is the unit norm eigenvector of A corresponding to the maximum magnitude eigenvalue, λi∗ , of
A, and y = sign(λi∗ )x. Thus, when A is symmetric, we do have ∥A∥ = maxi |λi (A)|.
For an arbitrary matrix A we have
∥A∥^2 = max_{∥x∥2 ≤1} ∥Ax∥2^2 = max_{∥x∥2 ≤1} x⊤ (A⊤ A)x = λmax (A⊤ A),
where the last equality is given by the variational characterization of the largest eigenvalue of the symmetric positive semidefinite matrix A⊤ A (Theorem D.12). Moreover, ∥A⊤ A∥ = λmax (A⊤ A) due to the already proved “symmetric” part of (ii). Thus, ∥A∥ = √∥A⊤ A∥ = √λmax (A⊤ A). Recalling that ∥A∥ = ∥A⊤ ∥, we complete the verification of (ii).
and the right hand side is clearly a Lipschitz continuous, with constant 1 w.r.t. ∥ · ∥, function of A.
Fact D.23 Let A be a positive semidefinite m × m matrix, and B be a positive
semidefinite matrix such that B 2 = A. Then, B = A1/2 .
Proof. When B 2 = A, B clearly commutes with A, implying by Fact D.9 that the matrices
admit simultaneous diagonalization, i.e.,
A = U Diag{λ1 , . . . , λm }U ⊤ and B = U Diag{υ1 , . . . , υm }U ⊤ , whence A = B 2 = U Diag{υ1² , . . . , υm² }U ⊤ , and so λi = υi² for all i. Moreover, B is positive semidefinite, implying that υi ≥ 0. The bottom line is that υi = λi^{1/2} for all i, that is, B = A1/2 .
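A small numerical sketch of Fact D.23 (Python with NumPy assumed; the random PSD matrix A is ours): the positive semidefinite square root built from the eigenvalue decomposition squares back to A, while flipping the sign of one eigenvalue produces another symmetric square root that is not positive semidefinite.

import numpy as np

rng = np.random.default_rng(12)
n = 4
C = rng.standard_normal((n, n))
A = C @ C.T                                  # positive semidefinite

w, U = np.linalg.eigh(A)                     # A = U Diag(w) U^T with w >= 0
s = np.sqrt(np.clip(w, 0.0, None))           # clip guards against tiny negative round-off
sqrtA = U @ np.diag(s) @ U.T                 # the PSD square root A^{1/2}
print(np.allclose(sqrtA @ sqrtA, A))         # True

s2 = s.copy(); s2[0] = -s2[0]                # flip the sign of one eigenvalue
B = U @ np.diag(s2) @ U.T
print(np.allclose(B @ B, A), np.linalg.eigvalsh(B).min() < 0)   # True True: B^2 = A but B is not PSD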
In other words, ϕ(·) is continuously differentiable on the open set D and its gradient,
taken w.r.t. the Frobenius Euclidean structure on Sm , is ∇ϕ(X) = f ′ (X).
Hint: In order to prove Fact D.24, it makes sense to verify its validity for algebraic
polynomials and then use the fact that a continuously differentiable real-valued function on an interval (a, b) of the real axis is the limit, in the sense of uniform
convergence along with the first order derivative on compact subsets of (a, b), of
a sequence of polynomials.
Proof. Following the hint, let us start with the case when f (x) = Σ_{i=0}^k ai x^i is an algebraic polynomial. In this case the matrix-valued function f (X) = a0 Im + a1 X + . . . + ak X^k : Sm → Sm
is clearly infinitely many times differentiable, and for all X, H ∈ Sm we have
(d/dt)|_{t=0} f (X + tH) = Σ_{i=1}^k ai Σ_{j=1}^i X^{j−1} H X^{i−j} ,
and thus
(d/dt)|_{t=0} ϕ(X + tH) = (d/dt)|_{t=0} Tr(f (X + tH))
= Tr((d/dt)|_{t=0} f (X + tH))   [as Tr(Z) is a linear function of Z]
= Tr(Σ_{i=1}^k ai Σ_{j=1}^i X^{j−1} H X^{i−j} )
= Σ_{i=1}^k ai Σ_{j=1}^i Tr(X^{j−1} H X^{i−j} )
= Σ_{i=1}^k ai Σ_{j=1}^i Tr(H X^{i−1} )   [by Fact D.1]
= Tr(H Σ_{i=1}^k ai i X^{i−1} )
= Tr(H f ′ (X)).
Next, Corollary D.22 implies that D is an open set. Now let f be a continuously differentiable real-valued function on the interval (a, b) ⊆ R, and let ft (x), t = 1, 2, . . ., be polynomials such that ft (·) and ft′ (·) converge as t → ∞, uniformly on every segment [a′ , b′ ] ⊂ (a, b), to f (·) and f ′ (·), respectively; it is well known that such a sequence exists. Taking into account that the spectral norm (the maximum of magnitudes of eigenvalues) of g(X) is upper-bounded by the uniform norm of g(·) on σ(X), we immediately conclude that as t → ∞, the functions ϕt (X) := Tr(ft (X)) converge to ϕ(X) uniformly on closed and bounded subsets of D, and the gradients ∇ϕt (X) = ft′ (X) converge to f ′ (X) uniformly on closed and bounded subsets of D. By the standard results of Analysis, it follows that ϕ(·) is continuously differentiable on D, and its gradient is given by ∇ϕ(X) = f ′ (X).
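Finally, here is a hedged numerical sketch of Fact D.24 itself (Python with NumPy assumed; the particular choice f (x) = x³, so that f ′ (x) = 3x², and the random symmetric X, H are ours): the difference quotient of ϕ(X) = Tr(f (X)) along H matches ⟨f ′ (X), H⟩ = Tr(f ′ (X)H).

import numpy as np

rng = np.random.default_rng(13)
n = 4
M = rng.standard_normal((n, n)); X = 0.5 * (M + M.T)
N = rng.standard_normal((n, n)); H = 0.5 * (N + N.T)   # symmetric direction

phi = lambda X: np.trace(X @ X @ X)          # phi(X) = Tr(f(X)) with f(x) = x^3
grad = 3 * X @ X                             # f'(X) for f'(x) = 3x^2

t = 1e-7
fd = (phi(X + t * H) - phi(X)) / t           # directional derivative of phi along H
print(fd, np.trace(grad @ H))                # nearly equal: <grad, H>_Frobenius = Tr(grad H)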
Fact D.31 For any A, B ∈ Sn such that A ⪰ B, and for any V ∈ Rn×m , we always
have
V ⊤ AV ⪰ V ⊤ BV.
Proof. Indeed, we should prove that if A − B ⪰ 0, then also V ⊤ (A − B)V ⪰ 0. This is immediate
as the quadratic form y ⊤ (V ⊤ (A − B)V )y = (V y)⊤ (A − B)(V y) of y is nonnegative along with
the quadratic form x⊤ (A − B)x of x.
Index
vertex, 92
w.l.o.g.
without loss of generality, xv
w.r.t.
with respect to, xv
Weak
duality
in convex cone-constrained programming, 258
in Convex Programming, 253
convex, 3
calculus of, 12
topological properties of, 18
interior, 378
open, 377
polyhedral
calculus of, 40
extreme points of, 112
structure of, 121
polyhedral see polyhedral sets
similarity transformation, 410 in LP, 55
simplex, 9 Duality Theorem in LP, 53
Slater condition, 242, 242
in cone-constrained form, 247
relaxed
in cone-constrained form, 247
solution
regular, 290
span, linear see linear span
standard basic orth, 349
subdifferential, 198
subgradient, 198
subspace
spectral of matrix, 411
Symmetry Principle, 183
Taylor expansion, 406
Theorem
Caratheodory, 26
in conic form, 29
Eigenvalue Decomposition, 409
Eigenvalue Interlacement, 415
Hahn-Banach, 205
Helly, 30
Helly, II, 34
Kirchberger, 67
Krein-Milman, 93
in conic form, 109
on Alternative, Convex, 243
in cone-constrained form see Convex
Theorem on Alternative in
cone-constrained form
on Alternative, General see General
Theorem on Alternative
on Convex Programming Duality see
Convex Programming Duality
Theorem
on Linear Programming Duality see Linear
Programming Duality Theorem
Radon, 29
Separation, 85
Shapley-Folkman, 16, 67
Sion-Kakutani, 308, 312, 313
trace inequality, 221, 234
transform
closed conic, 79
conic, see conic transform, 22
perspective, see perspective transform, 22