
C.O.R.E.

Summer School
on
Modern Convex Optimization
August 26-30, 2002

FIVE LECTURES
ON
MODERN CONVEX OPTIMIZATION

Arkadi Nemirovski
[email protected],
http://iew3.technion.ac.il/Home/Users/Nemirovski.phtml
Faculty of Industrial Engineering and Management
and Minerva Optimization Center
Technion – Israel Institute of Technology
Technion City, Haifa 32000, Israel

Preface

Mathematical Programming deals with optimization programs of the form


minimize  f(x)
subject to  g_i(x) ≤ 0, i = 1, ..., m        (P)
[x ∈ R^n]
and includes the following general areas:
1. Modelling: methodologies for posing various applied problems as optimization programs;
2. Optimization Theory, focusing on existence, uniqueness and on characterization of optimal
solutions to optimization programs;
3. Optimization Methods: development and analysis of computational algorithms for various
classes of optimization programs;
4. Implementation, testing and application of modelling methodologies and computational
algorithms.
Essentially, Mathematical Programming was born in 1948, when George Dantzig invented
Linear Programming – the class of optimization programs (P) with linear objective f(·) and
constraints g_i(·). This breakthrough discovery included
• the methodological idea that a natural desire of a human being to look for the best possible
decisions can be posed in the form of an optimization program (P) and thus subject to
mathematical and computational treatment;
• the theory of LP programs, primarily the LP duality (this is in part due to the great
mathematician John von Neumann);
• the first computational method for LP – the Simplex method, which over the years turned
out to be an extremely powerful computational tool.
As often happens with first-rate discoveries (and to some extent is characteristic of such
discoveries), today the above ideas and constructions look quite traditional and simple. Well,
the same is true of the wheel.
In the 50-plus years since its birth, Mathematical Programming has progressed rapidly along
all the outlined avenues, "in width" as well as "in depth". I have no intention (and no time) to trace
the history of the subject decade by decade; instead, let me outline the major achievements in
Optimization during the last 20 years or so, those which, I believe, allow us to speak about modern
optimization as opposed to the "classical" one as it existed circa 1980. The reader should be
aware that the summary to follow is highly subjective and reflects the personal attitudes and
preferences of the author. Thus, in my opinion, the major achievements in Mathematical Programming
during the last 15-20 years can be outlined as follows:
♠ Realizing which generic optimization programs one can solve well ("efficiently solv-
able" programs) and when such a possibility is, mildly speaking, problematic ("computationally
intractable" programs). At this point, I do not intend to explain what it means exactly
that "a generic optimization program is efficiently solvable"; we will arrive at this issue later
in the course. However, I intend to answer the question (right now, not well posed!) "what are the
generic optimization programs we can solve well":

(!) As far as numerical processing of programs (P) is concerned, there exists a
"solvable case" – the one of convex optimization programs, where the objective f
and the constraints g_i are convex functions.
Under minimal additional "computability assumptions" (which are satisfied in basi-
cally all applications), a convex optimization program is "computationally tractable"
– the computational effort required to solve the problem to a given accuracy "grows
moderately" with the dimensions of the problem and the required number of accuracy
digits.
In contrast to this, general-type non-convex problems are too difficult for numerical
solution – the computational effort required to solve such a problem by the best
numerical methods known so far grows prohibitively fast with the dimensions of
the problem and the number of accuracy digits, and there are serious theoretical
reasons to guess that this is an intrinsic feature of non-convex problems rather than
a drawback of the existing optimization techniques.
Just to give an example, consider a pair of optimization problems. The first is

minimize  −Σ_{i=1}^n x_i
subject to
  x_i^2 − x_i = 0, i = 1, ..., n;        (A)
  x_i x_j = 0  ∀(i, j) ∈ Γ,

Γ being a given set of pairs (i, j) of indices i, j. This is a fundamental combinatorial problem of computing the
stability number of a graph; the corresponding "covering story" is as follows:

Assume that we are given n letters which can be sent through a telecommunication channel, say,
n = 256 usual bytes. When passing through the channel, an input letter can be corrupted by errors;
as a result, two distinct input letters can produce the same output and thus cannot necessarily be
distinguished at the receiving end. Let Γ be the set of "dangerous pairs of letters" – pairs (i, j) of
distinct letters i, j which can be converted by the channel into the same output. If we are interested
in error-free transmission, we should restrict the set S of letters we actually use to be independent
– such that no pair (i, j) with i, j ∈ S belongs to Γ. And in order to utilize the capacity of the
channel as much as possible, we are interested in using a maximal independent sub-alphabet – one
with the maximum possible number of letters. It turns out that minus the optimal value in (A) is exactly the cardinality
of such a maximal independent sub-alphabet.
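To make the combinatorics behind (A) tangible, here is a tiny brute-force sketch (an illustration added for this text, not part of the original lectures; the 5-letter alphabet and the set Γ are invented) that enumerates all sub-alphabets and reports the stability number, i.e., minus the optimal value of (A):

from itertools import combinations

n = 5                                              # a 5-letter "alphabet"
Gamma = {(1, 2), (2, 3), (3, 4), (4, 5), (1, 5)}   # dangerous pairs: a 5-cycle

def independent(S):
    # no dangerous pair may lie inside the sub-alphabet S
    return all((i, j) not in Gamma and (j, i) not in Gamma
               for i, j in combinations(sorted(S), 2))

stability_number = max(len(S)
                       for k in range(n + 1)
                       for S in combinations(range(1, n + 1), k)
                       if independent(S))
print(stability_number)                            # 2 for the 5-cycle

The enumeration visits all 2^n sub-alphabets, which is exactly the kind of exponential effort discussed below; for n = 256 it is hopeless.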

Our second problem is

minimize  −2 Σ_{i=1}^k Σ_{j=1}^m c_{ij} x_{ij} + x_{00}
subject to

         ⎡ x_1                                 Σ_{j=1}^m b_{pj} x_{1j} ⎤
 λ_min   ⎢         ⋱                                    ⋮              ⎥ ≥ 0,  p = 1, ..., N,        (B)
         ⎢                 x_k                 Σ_{j=1}^m b_{pj} x_{kj} ⎥
         ⎣ Σ_{j=1}^m b_{pj} x_{1j}  ⋯  Σ_{j=1}^m b_{pj} x_{kj}  x_{00} ⎦

 Σ_{i=1}^k x_i = 1,

where λ_min(A) denotes the minimum eigenvalue of a symmetric matrix A. This problem is responsible for the
design of a truss (a mechanical construction comprised of thin elastic bars linked to each other, like an electric
mast, a bridge or the Eiffel Tower) capable of withstanding, as well as possible, k given loads.
When looking at the analytical forms of (A) and (B), it seems that the first problem is easier than the second:
the constraints in (A) are simple explicit quadratic equations, while the constraints in (B) involve much more
complicated functions of the design variables – the eigenvalues of certain matrices depending on the design vector.
The truth, however, is that the first problem is, in a sense, "as difficult as an optimization problem can be", and
the worst-case computational effort to solve this problem within absolute inaccuracy 0.5 by all known optimization
methods is about 2^n operations; for n = 256 (just 256 design variables corresponding to the "alphabet of bytes"),
the quantity 2^n ≈ 10^77, for all practical purposes, is the same as +∞. In contrast to this, the second problem is
quite "computationally tractable". E.g., for k = 6 (6 loads of interest) and m = 100 (100 degrees of freedom of
the construction) the problem has about 600 variables (twice that of the "byte" version of (A)); however, it
can be reliably solved within 6 accuracy digits in a couple of minutes. The dramatic difference in the computational
effort required to solve (A) and (B) ultimately comes from the fact that (A) is a non-convex optimization problem,
while (B) is convex.
Note that realizing what is easy and what is difficult in Optimization is, aside from its theoretical
importance, extremely important methodologically. Indeed, mathematical models of real world
situations are in any case incomplete and therefore flexible to some extent. When you know in
advance what you can process efficiently, you perhaps can use this flexibility to build a tractable
(in our context – a convex) model. The "traditional" Optimization did not pay much attention
to complexity and focused on easy-to-analyze, purely asymptotical "rate of convergence" results.
From this viewpoint, the most desirable property of f and g_i is smoothness (plus, perhaps,
certain "nondegeneracy" at the optimal solution), and not their convexity; choosing between
the above problems (A) and (B), a "traditional" optimizer would, perhaps, prefer the first of
them. I suspect that a non-negligible part of the "applied failures" of Mathematical Programming
came from the traditional (I would say, heavily misleading) "order of preferences" in model-building.
Surprisingly, some advanced users (primarily in Control) realized the crucial
role of convexity much earlier than some members of the Optimization community. Here is a
real story. About 7 years ago, we were working on a certain Convex Optimization method, and
I sent an e-mail to the people maintaining CUTE (a benchmark of test problems for constrained
continuous optimization) requesting the list of convex programs in their collection. The
answer was: "We do not care which of our problems are convex, and this be a lesson for those
developing Convex Optimization techniques." In their opinion, I am stupid; in my opinion, they
are obsolete. Who is right, this I do not know...
♠ Discovery of interior-point polynomial time methods for "well-structured" generic convex
programs and thorough investigation of these programs.
By itself, the "efficient solvability" of generic convex programs is a theoretical rather than
a practical phenomenon. Indeed, assume that all we know about (P) is that the program is
convex, its objective is called f, the constraints are called g_i and that we can compute f and g_i,
along with their derivatives, at any given point at the cost of M arithmetic operations. In this
case the computational effort for finding an ε-solution turns out to be at least O(1)·n·M·ln(1/ε).
Note that this is a lower complexity bound, and the best upper bound known so far is much
worse: O(1)·n·(n^3 + M)·ln(1/ε). Although the bounds grow "moderately" – polynomially – with
the design dimension n of the program and the required number ln(1/ε) of accuracy digits, from
the practical viewpoint the upper bound becomes prohibitively large already for n around 1000.
This is in striking contrast with Linear Programming, where one can routinely solve problems
with tens and hundreds of thousands of variables and constraints. The reasons for this huge
difference come from the fact that

When solving an LP program, our a priori knowledge is far beyond the fact that the
objective is called f, the constraints are called g_i, that they are convex and we can
compute their values and derivatives at any given point. In LP, we know in advance
what the analytical structure of f and g_i is, and we heavily exploit this knowledge
when processing the problem. In fact, all successful LP methods never compute
the values and the derivatives of f and g_i – they do something completely different.

One of the most important recent developments in Optimization is the realization of the simple fact
that a jump from linear f and g_i's to "completely structureless" convex f and g_i's is too long: in-
between these two extremes, there are many interesting and important generic convex programs.
These "in-between" programs, although non-linear, still possess nice analytical structure, and
one can use this structure to develop dedicated optimization methods, methods which turn
out to be incomparably more efficient than those exploiting solely the convexity of the program.
The aforementioned "dedicated methods" are Interior Point polynomial time algorithms,
and the most important "well-structured" generic convex optimization programs are those of
Linear, Conic Quadratic and Semidefinite Programming; the last two simply did not
exist as established research subjects just 15 years ago. In my opinion, the discovery of Interior
Point methods and of non-linear "well-structured" generic convex programs, along with the
subsequent progress in these novel research areas, is one of the most impressive achievements in
Mathematical Programming. It is my pleasure to add that one of the key roles in these break-
through developments, and definitely the key role as far as nonlinear programs are concerned,
was and is played by Professor Yuri Nesterov from CORE.
♠ I have outlined the most revolutionary, in my appreciation, changes in the theoretical core
of Mathematical Programming in the last 15-20 years. During this period, we have witnessed
perhaps less dramatic, but still quite important, progress in the methodological and application-
related areas as well. The major novelty here is a certain shift from the applications traditional
for Operations Research in Industrial Engineering (production planning, etc.) to applications in
"genuine" Engineering. I believe it is completely fair to say that the theory and methods
of Convex Optimization, especially those of Semidefinite Programming, have become a kind
of new paradigm in Control and are becoming more and more frequently used in Mechanical
Engineering, Design of Structures, Medical Imaging, etc.

The aim of the course is to outline some of the novel research areas which have arisen in
Optimization during the past decade or so. I intend to focus solely on Convex Programming,
specifically, on

• Conic Programming, with emphasis on the most important particular cases – those of
Linear, Conic Quadratic and Semidefinite Programming (LP, CQP and SDP, respectively).
Here the focus will be on

– basic Duality Theory for conic programs;


– investigation of “expressive abilities” of CQP and SDP;
– overview of the theory of Interior Point polynomial time methods for LP, CQP and
SDP.

• “Efficient (polynomial time) solvability” of generic convex programs.

• “Low cost” optimization methods for extremely large-scale optimization programs.

Acknowledgements. The first four lectures of the five comprising the course are based upon
the recent book
Ben-Tal, A., Nemirovski, A., Lectures on Modern Convex Optimization: Analysis, Algo-
rithms, Engineering Applications, MPS-SIAM Series on Optimization, SIAM, Philadelphia,
2001.

I am greatly indebted to my colleagues, primarily to Yuri Nesterov, Aharon Ben-Tal, Stephen
Boyd, Claude Lemarechal and Kees Roos, who over the years have significantly influenced my
understanding of our subject as expressed in this course. Needless to say, I am the only person
responsible for the drawbacks in what follows.

Arkadi Nemirovski,
Haifa, Israel, May 2002.
Contents

1 From Linear to Conic Programming 9


1.1 Linear programming: basic notions . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.2 Duality in linear programming . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.2.1 Certificates for solvability and insolvability . . . . . . . . . . . . . . . . . 10
1.2.2 Dual to an LP program: the origin . . . . . . . . . . . . . . . . . . . . . . 14
1.2.3 The LP Duality Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
1.3 From Linear to Conic Programming . . . . . . . . . . . . . . . . . . . . . . . . . 19
1.4 Orderings of Rm and convex cones . . . . . . . . . . . . . . . . . . . . . . . . . . 19
1.5 “Conic programming” – what is it? . . . . . . . . . . . . . . . . . . . . . . . . . . 22
1.6 Conic Duality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
1.6.1 Geometry of the primal and the dual problems . . . . . . . . . . . . . . . 25
1.7 The Conic Duality Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
1.7.1 Is something wrong with conic duality? . . . . . . . . . . . . . . . . . . . 31
1.7.2 Consequences of the Conic Duality Theorem . . . . . . . . . . . . . . . . 32

2 Conic Quadratic Programming 39


2.1 Conic Quadratic problems: preliminaries . . . . . . . . . . . . . . . . . . . . . . . 39
2.2 Examples of conic quadratic problems . . . . . . . . . . . . . . . . . . . . . . . . 41
2.2.1 Contact problems with static friction [10] . . . . . . . . . . . . . . . . . . 41
2.3 What can be expressed via conic quadratic constraints? . . . . . . . . . . . . . . 43
2.3.1 More examples of CQ-representable functions/sets . . . . . . . . . . . . . 58
2.4 More applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
2.4.1 Robust Linear Programming . . . . . . . . . . . . . . . . . . . . . . . . . 62

3 Semidefinite Programming 77
3.1 Semidefinite cone and Semidefinite programs . . . . . . . . . . . . . . . . . . . . 77
3.1.1 Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
3.2 What can be expressed via LMI’s? . . . . . . . . . . . . . . . . . . . . . . . . . . 80
3.3 Applications of Semidefinite Programming in Engineering . . . . . . . . . . . . . 95
3.3.1 Dynamic Stability in Mechanics . . . . . . . . . . . . . . . . . . . . . . . . 96
3.3.2 Design of chips and Boyd’s time constant . . . . . . . . . . . . . . . . . . 98
3.3.3 Lyapunov stability analysis/synthesis . . . . . . . . . . . . . . . . . . . . 100
3.4 Semidefinite relaxations of intractable problems . . . . . . . . . . . . . . . . . . . 108
3.4.1 Semidefinite relaxations of combinatorial problems . . . . . . . . . . . . . 108
3.4.2 Matrix Cube Theorem and interval stability analysis/synthesis . . . . . . 121
3.4.3 Robust Quadratic Programming . . . . . . . . . . . . . . . . . . . . . . . 128
3.5 Appendix: S-Lemma . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132


4 Polynomial Time Interior Point algorithms for LP, CQP and SDP 137
4.1 Complexity of Convex Programming . . . . . . . . . . . . . . . . . . . . . . . . . 137
4.1.1 Combinatorial Complexity Theory . . . . . . . . . . . . . . . . . . . . . . 137
4.1.2 Complexity in Continuous Optimization . . . . . . . . . . . . . . . . . . . 140
4.1.3 Difficult continuous optimization problems . . . . . . . . . . . . . . . . . 144
4.2 Interior Point Polynomial Time Methods for LP, CQP and SDP . . . . . . . . . . 145
4.2.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145
4.2.2 Interior Point methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146
4.2.3 But... . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149
4.3 Interior point methods for LP, CQP, and SDP: building blocks . . . . . . . . . . 151
4.3.1 Canonical cones and canonical barriers . . . . . . . . . . . . . . . . . . . . 151
4.3.2 Elementary properties of canonical barriers . . . . . . . . . . . . . . . . . 153
4.4 Primal-dual pair of problems and primal-dual central path . . . . . . . . . . . . . 155
4.4.1 The problem(s) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155
4.4.2 The central path(s) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155
4.5 Tracing the central path . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162
4.5.1 The path-following scheme . . . . . . . . . . . . . . . . . . . . . . . . . . 162
4.5.2 Speed of path-tracing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164
4.5.3 The primal and the dual path-following methods . . . . . . . . . . . . . . 165
4.5.4 The SDP case . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 168
4.6 Complexity bounds for LP, CQP, SDP . . . . . . . . . . . . . . . . . . . . . . . . 181
4.6.1 Complexity of LP b . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181
4.6.2 Complexity of CQP b . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 182
4.6.3 Complexity of SDP b . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 182
4.7 Concluding remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183

5 Simple methods for extremely large-scale problems 187


5.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 187
5.2 Information-based complexity of Convex Programming . . . . . . . . . . . . . . . 189
5.3 The Bundle-Mirror scheme . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 192
5.4 Implementation issues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 199
5.5 Illustration: PET Image Reconstruction problem . . . . . . . . . . . . . . . . . . 204
5.6 Appendix: strong convexity of ω(·) for standard setups . . . . . . . . . . . . . . . 211
Lecture 1

From Linear to Conic Programming

1.1 Linear programming: basic notions


A Linear Programming (LP) program is an optimization program of the form

min_x { c^T x : Ax ≥ b },        (LP)
where
• x ∈ R^n is the design vector;
• c ∈ R^n is a given vector of coefficients of the objective function c^T x;
• A is a given m × n constraint matrix, and b ∈ R^m is a given right hand side of the
constraints.
(LP) is called
– feasible, if its feasible set
F = {x | Ax − b ≥ 0}
is nonempty; a point x ∈ F is called a feasible solution to (LP);
– bounded below, if it is either infeasible, or its objective c^T x is bounded below on F.
For a feasible bounded-below problem (LP), the quantity

c* ≡ inf_{x: Ax−b≥0} c^T x

is called the optimal value of the problem. For an infeasible problem, we set c* = +∞,
while for a feasible problem unbounded below we set c* = −∞.
(LP) is called solvable, if it is feasible, bounded below and the optimal value is attained, i.e.,
there exists x ∈ F with c^T x = c*. An x of this type is called an optimal solution to (LP).
A priori it is unclear whether a feasible and bounded-below LP program is solvable: why should
the infimum be achieved? It turns out, however, that a feasible and bounded-below program
(LP) always is solvable. This nice fact (we shall establish it later) is specific for LP. Indeed, a
very simple nonlinear optimization program

min { 1/x : x ≥ 1 }

is feasible and bounded below, but it is not solvable.


1.2 Duality in linear programming


The most important and interesting feature of linear programming as a mathematical entity
(i.e., aside from computations and applications) is the wonderful LP duality theory we are about
to consider. We motivate this topic by first addressing the following question:
Given an LP program

c* = min_x { c^T x : Ax − b ≥ 0 },        (LP)

how can we find a systematic way to bound from below its optimal value c*?

Why this is an important question, and how the answer helps one deal with LP, will be seen
in the sequel. For the time being, let us just believe that the question is worthy of the effort.
A trivial answer to the posed question is: solve (LP) and look at the optimal value.
There is, however, a smarter and much more instructive way to answer our question. Just to
get an idea of this way, let us look at the following example:

min { x_1 + x_2 + ... + x_2002 :
      x_1 + 2x_2 + ... + 2001x_2001 + 2002x_2002 − 1 ≥ 0,
      2002x_1 + 2001x_2 + ... + 2x_2001 + x_2002 − 100 ≥ 0,
      ..... }

We claim that the optimal value in the problem is ≥ 101/2003. How could one certify this bound?
This is immediate: add the first two constraints to get the inequality

2003(x_1 + x_2 + ... + x_2001 + x_2002) − 101 ≥ 0,

and divide the resulting inequality by 2003. LP duality is nothing but a straightforward gener-
alization of this simple trick.
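As a quick sanity check of this trick (a small illustration added here, not part of the original text): the coefficient of x_i is i in the first constraint and 2003 − i in the second, so the sum of the two constraints has every coefficient equal to 2003.

coeff1 = [i for i in range(1, 2003)]             # x_1 + 2x_2 + ... + 2002x_2002 >= 1
coeff2 = [2003 - i for i in range(1, 2003)]      # 2002x_1 + ... + x_2002 >= 100
summed = [a + b for a, b in zip(coeff1, coeff2)]
assert all(c == 2003 for c in summed)
print("certified lower bound:", (1 + 100) / 2003)   # 101/2003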

1.2.1 Certificates for solvability and insolvability


Consider a (finite) system of scalar inequalities with n unknowns. To be as general as possible,
we do not assume for the time being the inequalities to be linear, and we allow for both non-
strict and strict inequalities in the system, as well as for equalities. Since an equality can be
represented by a pair of non-strict inequalities, our system can always be written as

f_i(x) Ω_i 0, i = 1, ..., m,        (S)

where every Ω_i is either the relation " > " or the relation " ≥ ".


The basic question about (S) is
(?) Whether (S) has a solution or not.
Knowing how to answer the question (?), we are able to answer many other questions. E.g., to
verify whether a given real a is a lower bound on the optimal value c* of (LP) is the same as to
verify whether the system

−c^T x + a > 0,  Ax − b ≥ 0

has no solutions.
The general question above is too difficult, and it makes sense to pass from it to a seemingly
simpler one:

(??) How to certify that (S) has, or does not have, a solution.

Imagine that you are very smart and know the correct answer to (?); how could you convince
somebody that your answer is correct? What could be an “evident for everybody” certificate of
the validity of your answer?
If your claim is that (S) is solvable, a certificate could be just to point out a solution x∗ to
(S). Given this certificate, one can substitute x∗ into the system and check whether x∗ indeed
is a solution.
Assume now that your claim is that (S) has no solutions. What could be a "simple certificate"
of this claim? How could one certify a negative statement? This is a highly nontrivial problem
not just for mathematics; consider, for example, criminal law: how should someone accused of a murder
prove his innocence? The "real life" answer to the question "how to certify a negative statement"
is discouraging: such a statement normally cannot be certified (this is where the rule "a person
is presumed innocent until proven guilty" comes from). In mathematics, however, the situation
is different: in some cases there exist “simple certificates” of negative statements. E.g., in order
to certify that (S) has no solutions, it suffices to demonstrate that a consequence of (S) is a
contradictory inequality such as
−1 ≥ 0.
For example, assume that λ_i, i = 1, ..., m, are nonnegative weights. Combining inequalities from
(S) with these weights, we come to the inequality

Σ_{i=1}^m λ_i f_i(x) Ω 0        (Cons(λ))

where Ω is either " > " (this is the case when the weight of at least one strict inequality from
(S) is positive), or " ≥ " (otherwise). Since the resulting inequality, due to its origin, is a
consequence of the system (S), i.e., it is satisfied by every solution to (S), it follows that if
(Cons(λ)) has no solutions at all, we can be sure that (S) has no solution. Whenever this is the
case, we may treat the corresponding vector λ as a "simple certificate" of the fact that (S) is
infeasible.
Let us look at what the outlined approach means when (S) is comprised of linear inequalities:

(S):  { a_i^T x Ω_i b_i, i = 1, ..., m }    [Ω_i is " > " or " ≥ "]

Here the "combined inequality" is linear as well:

(Cons(λ)):  ( Σ_{i=1}^m λ_i a_i )^T x  Ω  Σ_{i=1}^m λ_i b_i

(Ω is " > " whenever λ_i > 0 for at least one i with Ω_i = " > ", and Ω is " ≥ " otherwise). Now,
when can a linear inequality

d^T x Ω e

be contradictory? Of course, it can happen only when d = 0. Whether in this case the inequality
is contradictory depends on the relation Ω: if Ω = " > ", then the inequality is
contradictory if and only if e ≥ 0, and if Ω = " ≥ ", it is contradictory if and only if e > 0. We
have established the following simple result:

Proposition 1.2.1 Consider a system of linear inequalities

(S):  a_i^T x > b_i, i = 1, ..., m_s;
      a_i^T x ≥ b_i, i = m_s + 1, ..., m,

with an n-dimensional vector of unknowns x. Let us associate with (S) two systems of linear
inequalities and equations with an m-dimensional vector of unknowns λ:

T_I:   (a)   λ ≥ 0;
       (b)   Σ_{i=1}^m λ_i a_i = 0;
       (c_I) Σ_{i=1}^m λ_i b_i ≥ 0;
       (d_I) Σ_{i=1}^{m_s} λ_i > 0.

T_II:  (a)    λ ≥ 0;
       (b)    Σ_{i=1}^m λ_i a_i = 0;
       (c_II) Σ_{i=1}^m λ_i b_i > 0.

Assume that at least one of the systems T_I, T_II is solvable. Then the system (S) is infeasible.

Proposition 1.2.1 says that in some cases it is easy to certify infeasibility of a linear system of
inequalities: a "simple certificate" is a solution to another system of linear inequalities. Note,
however, that the existence of a certificate of this latter type is, at this point, only a sufficient,
but not a necessary, condition for the infeasibility of (S). A fundamental result in the theory of
linear inequalities is that the sufficient condition in question is in fact also necessary:

Theorem 1.2.1 [General Theorem on Alternative] In the notation from Proposition 1.2.1, sys-
tem (S) has no solutions if and only if either T_I, or T_II, or both these systems, are solvable.
There are numerous proofs of the Theorem on Alternative; to my taste, the most instructive one is to
reduce the Theorem to its particular case – the Homogeneous Farkas Lemma:
[Homogeneous Farkas Lemma] A homogeneous nonstrict linear inequality

a^T x ≤ 0

is a consequence of a system of homogeneous nonstrict linear inequalities

a_i^T x ≤ 0, i = 1, ..., m

if and only if it can be obtained from the system by taking a weighted sum with nonnegative
weights:

(a) a_i^T x ≤ 0, i = 1, ..., m  ⇒  a^T x ≤ 0,
                 ⇕                                (1.2.1)
(b) ∃ λ_i ≥ 0 :  a = Σ_i λ_i a_i.

The reduction of the Theorem on Alternative to the HFL is easy. As for the HFL itself, there are, essentially, two ways to prove the
statement:
• The "quick and dirty" one, based on separation arguments, which is as follows:

1. First, we demonstrate that if A is a nonempty closed convex set in R^n and a is a point from
   R^n \ A, then a can be strongly separated from A by a linear form: there exists x ∈ R^n such
   that

        x^T a < inf_{b∈A} x^T b.        (1.2.2)

   To this end, it suffices to verify that
   (a) In A, there exists a point closest to a w.r.t. the standard Euclidean norm ‖b‖_2 = √(b^T b),
       i.e., that the optimization program

            min_{b∈A} ‖a − b‖_2

       has a solution b*;
   (b) Setting x = b* − a, one ensures (1.2.2).
   Both (a) and (b) are immediate.
2. Second, we demonstrate that the set

        A = {b : ∃λ ≥ 0 : b = Σ_{i=1}^m λ_i a_i}

   – the cone spanned by the vectors a_1, ..., a_m – is convex (which is immediate) and closed (the
   proof of this crucial fact also is not difficult).
3. Combining the above facts, we immediately see that
   – either a ∈ A, i.e., (1.2.1.b) holds,
   – or there exists x such that x^T a < inf_{λ≥0} x^T Σ_i λ_i a_i.
   The latter inf is finite if and only if x^T a_i ≥ 0 for all i, and in this case the inf is 0, so that
   the "or" statement says exactly that there exists x with a_i^T x ≥ 0 for all i and a^T x < 0, or, which is the
   same, that (1.2.1.a) does not hold.
   Thus, among the statements (1.2.1.a) and the negation of (1.2.1.b) at least one (and, as is
   immediately seen, at most one as well) always is valid, which is exactly the equivalence
   (1.2.1).
• "Advanced" proofs based purely on Linear Algebra facts. The advantage of these purely Linear
Algebra proofs is that they, in contrast to the outlined separation-based proof, do not use the
completeness of R^n as a metric space and thus work when we pass from systems with real coefficients
and unknowns to systems with rational (or algebraic) coefficients. As a result, an advanced proof
allows one to establish the Theorem on Alternative for the case when the coefficients and unknowns in
(S), T_I, T_II are restricted to belong to a given "real field" (e.g., are rational).
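The Theorem on Alternative also suggests a computational recipe: to certify infeasibility of a nonstrict system a_i^T x ≥ b_i, one may search for a solution of T_II itself, which is a linear feasibility problem. A sketch (an illustration under the assumption that numpy and scipy are available; the two-inequality system is invented): since a certificate can be rescaled, we normalize Σ_i λ_i b_i = 1 and hand the search to an LP solver.

import numpy as np
from scipy.optimize import linprog

A = np.array([[1.0], [-1.0]])   # rows a_i^T: the system  x >= 1,  -x >= 0
b = np.array([1.0, 0.0])        # infeasible: it requires x >= 1 and x <= 0

m, n = A.shape
res = linprog(c=np.zeros(m),                             # pure feasibility problem
              A_eq=np.vstack([A.T, b[None, :]]),         # sum_i lambda_i a_i = 0
              b_eq=np.concatenate([np.zeros(n), [1.0]]), # sum_i lambda_i b_i = 1
              bounds=[(0, None)] * m)                    # lambda >= 0
print(res.success, res.x)       # True, lambda = (1, 1): an infeasibility certificate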
We formulate here explicitly two very useful principles following from the Theorem on Al-
ternative:

A. A system of linear inequalities

a_i^T x Ω_i b_i, i = 1, ..., m

has no solutions if and only if one can combine the inequalities of the system in
a linear fashion (i.e., multiplying the inequalities by nonnegative weights, adding
the results and passing, if necessary, from an inequality a^T x > b to the inequality
a^T x ≥ b) to get a contradictory inequality, namely, either the inequality 0^T x ≥ 1, or
the inequality 0^T x > 0.
B. A linear inequality

a_0^T x Ω_0 b_0

is a consequence of a solvable system of linear inequalities

a_i^T x Ω_i b_i, i = 1, ..., m

if and only if it can be obtained by combining, in a linear fashion, the inequalities of
the system and the trivial inequality 0 > −1.
It should be stressed that the above principles are highly nontrivial and very deep. Consider,
e.g., the following system of 4 linear inequalities with two variables u, v:

−1 ≤ u ≤ 1,
−1 ≤ v ≤ 1.

From these inequalities it follows that

u² + v² ≤ 2,        (!)

which in turn implies, by the Cauchy inequality, the linear inequality u + v ≤ 2:

u + v = 1 × u + 1 × v ≤ √(1² + 1²) · √(u² + v²) ≤ (√2)² = 2.        (!!)

The concluding inequality is linear and is a consequence of the original system, but in the
demonstration of this fact both steps (!) and (!!) are "highly nonlinear". It is absolutely
unclear a priori why the same consequence can, as stated by Principle A, be derived
from the system in a linear manner as well [of course it can – it suffices just to add the two
inequalities u ≤ 1 and v ≤ 1].
Note that the Theorem on Alternative and its corollaries A and B heavily exploit the fact
that we are speaking about linear inequalities. E.g., consider the following 2 quadratic and
2 linear inequalities with two variables:

(a) u² ≥ 1;
(b) v² ≥ 1;
(c) u ≥ 0;
(d) v ≥ 0;

along with the quadratic inequality

(e) uv ≥ 1.

The inequality (e) is clearly a consequence of (a) – (d). However, if we extend the system of
inequalities (a) – (d) by all "trivial" (i.e., identically true) linear and quadratic inequalities
with 2 variables, like 0 > −1, u² + v² ≥ 0, u² + 2uv + v² ≥ 0, u² − uv + v² ≥ 0, etc.,
and ask whether (e) can be derived in a linear fashion from the inequalities of the extended
system, the answer will be negative. Thus, Principle A fails to be true already for quadratic
inequalities (which is a great sorrow – otherwise there would be no difficult problems at all!)

We are about to use the Theorem on Alternative to obtain the basic results of the LP duality
theory.

1.2.2 Dual to an LP program: the origin


As already mentioned, the motivation for constructing the problem dual to an LP program

c* = min_x { c^T x : Ax − b ≥ 0 },   where A = [a_1^T; a_2^T; ...; a_m^T] ∈ R^{m×n},        (LP)

is the desire to generate, in a systematic way, lower bounds on the optimal value c* of (LP).
An evident way to bound from below a given function f(x) in the domain given by a system of
inequalities

g_i(x) ≥ b_i, i = 1, ..., m,        (1.2.3)

is offered by what is called Lagrange duality, and is as follows:

Lagrange Duality:
• Let us look at all inequalities which can be obtained from (1.2.3) by linear aggre-
gation, i.e., at the inequalities of the form

Σ_i y_i g_i(x) ≥ Σ_i y_i b_i        (1.2.4)

with the "aggregation weights" y_i ≥ 0. Note that the inequality (1.2.4), due to its
origin, is valid on the entire set X of solutions of (1.2.3).
• Depending on the choice of aggregation weights, it may happen that the left hand
side in (1.2.4) is ≤ f(x) for all x ∈ R^n. Whenever this is the case, the right hand side
Σ_i y_i b_i of (1.2.4) is a lower bound on f in X.

Indeed, on X the quantity Σ_i y_i b_i is a lower bound on Σ_i y_i g_i(x), and for the y in
question the latter function of x is everywhere ≤ f(x).

It follows that
• The optimal value in the problem

max_y { Σ_i y_i b_i :  y ≥ 0 (a);  Σ_i y_i g_i(x) ≤ f(x) ∀x ∈ R^n (b) }        (1.2.5)

is a lower bound on the values of f on the set of solutions to the system (1.2.3).
Let us look at what happens with the Lagrange duality when f and g_i are homogeneous linear
functions: f = c^T x, g_i(x) = a_i^T x. In this case, the requirement (1.2.5.b) merely says that
c = Σ_i y_i a_i (or, which is the same, A^T y = c due to the origin of A). Thus, problem (1.2.5)
becomes the Linear Programming problem

max_y { b^T y : A^T y = c, y ≥ 0 },        (LP*)

which is nothing but the LP dual of (LP).


By the construction of the dual problem,
[Weak Duality] The optimal value in (LP∗ ) is less than or equal to the optimal value
in (LP).
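The construction is easy to try numerically. The following sketch (illustrative data; numpy and scipy assumed) solves a small primal (LP) and its dual (LP*) and compares the optimal values; here the weak-duality bound is attained, in line with the discussion that follows.

import numpy as np
from scipy.optimize import linprog

A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
b = np.array([0.0, 0.0, 1.0])
c = np.array([1.0, 2.0])

# Primal: linprog minimizes c^T x subject to A_ub x <= b_ub, so the constraint
# Ax >= b is rewritten as -Ax <= -b; the variables are left unbounded.
primal = linprog(c, A_ub=-A, b_ub=-b, bounds=[(None, None)] * 2)

# Dual: max b^T y = -min (-b)^T y subject to A^T y = c, y >= 0.
dual = linprog(-b, A_eq=A.T, b_eq=c, bounds=[(0, None)] * 3)

print(primal.fun, -dual.fun)    # both 1.0: no duality gap on this instance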
In fact, the "less than or equal to" in the latter statement is "equal", provided that the optimal
value c* in (LP) is a number (i.e., (LP) is feasible and bounded below). To see that this indeed
is the case, note that a real a is a lower bound on c* if and only if c^T x ≥ a whenever Ax ≥ b,
or, which is the same, if and only if the system of linear inequalities

(S_a):  −c^T x + a > 0,  Ax ≥ b


has no solution. We know by the Theorem on Alternative that the latter fact means that some
other system of linear inequalities (more exactly, at least one of a certain pair of systems) does
have a solution. More precisely,
(*) (S_a) has no solutions if and only if at least one of the following two systems with
m + 1 unknowns:

T_I:   (a)   λ = (λ_0, λ_1, ..., λ_m) ≥ 0;
       (b)   −λ_0 c + Σ_{i=1}^m λ_i a_i = 0;
       (c_I) −λ_0 a + Σ_{i=1}^m λ_i b_i ≥ 0;
       (d_I) λ_0 > 0,

or

T_II:  (a)    λ = (λ_0, λ_1, ..., λ_m) ≥ 0;
       (b)    −λ_0 c + Σ_{i=1}^m λ_i a_i = 0;
       (c_II) −λ_0 a + Σ_{i=1}^m λ_i b_i > 0

– has a solution.
Now assume that (LP) is feasible. We claim that under this assumption (S_a) has no solutions
if and only if T_I has a solution.

The implication "T_I has a solution ⇒ (S_a) has no solution" is readily given by the above
remarks. To verify the inverse implication, assume that (S_a) has no solutions and the system
Ax ≥ b has a solution, and let us prove that then T_I has a solution. If T_I has no solution, then
by (*) T_II has a solution and, moreover, λ_0 = 0 for (every) solution to T_II (since a solution
to the latter system with λ_0 > 0 solves T_I as well). But the fact that T_II has a solution λ
with λ_0 = 0 is independent of the values of a and c; if this were the case, it would
mean, by the same Theorem on Alternative, that, e.g., the following instance of (S_a):

0^T x > −1,  Ax ≥ b

has no solutions. The latter means that the system Ax ≥ b has no solutions – a contradiction
with the assumption that (LP) is feasible.

Now, if T_I has a solution, then it has a solution with λ_0 = 1 as well (to see this, pass from
a solution λ to λ/λ_0; this construction is well-defined, since λ_0 > 0 for every solution
to T_I). Further, an (m + 1)-dimensional vector λ = (1, y) is a solution to T_I if and only if the
m-dimensional vector y solves the system of linear inequalities and equations

y ≥ 0;
A^T y ≡ Σ_{i=1}^m y_i a_i = c;        (D)
b^T y ≥ a.

Summarizing our observations, we come to the following result.
Proposition 1.2.2 Assume that system (D) associated with the LP program (LP) has a solution
(y, a). Then a is a lower bound on the optimal value in (LP). Vice versa, if (LP) is feasible and
a is a lower bound on the optimal value of (LP), then a can be extended by a properly chosen
m-dimensional vector y to a solution to (D).

We see that the entity responsible for lower bounds on the optimal value of (LP) is the system
(D): every solution to the latter system induces a bound of this type, and in the case when
(LP) is feasible, all lower bounds can be obtained from solutions to (D). Now note that if
(y, a) is a solution to (D), then the pair (y, b^T y) also is a solution to the same system, and the
lower bound b^T y on c* is not worse than the lower bound a. Thus, as far as lower bounds on
c* are concerned, we lose nothing by restricting ourselves to the solutions (y, a) of (D) with
a = b^T y; the best lower bound on c* given by (D) is therefore the optimal value of the problem

max_y { b^T y : A^T y = c, y ≥ 0 },

which is nothing but the problem (LP*) dual to (LP). Note that (LP*) is also a Linear
Programming program.
All we know about the dual problem at the moment is the following:
Proposition 1.2.3 Whenever y is a feasible solution to (LP*), the corresponding value of the
dual objective b^T y is a lower bound on the optimal value c* of (LP). If (LP) is feasible, then for
every a ≤ c* there exists a feasible solution y of (LP*) with b^T y ≥ a.

1.2.3 The LP Duality Theorem


Proposition 1.2.3 is in fact equivalent to the following

Theorem 1.2.2 [Duality Theorem in Linear Programming] Consider a linear programming
program

min_x { c^T x : Ax ≥ b }        (LP)

along with its dual

max_y { b^T y : A^T y = c, y ≥ 0 }        (LP*)
Then
1) The duality is symmetric: the problem dual to dual is equivalent to the primal;
2) The value of the dual objective at every dual feasible solution is ≤ the value of the primal
objective at every primal feasible solution
3) The following 5 properties are equivalent to each other:

(i) The primal is feasible and bounded below.


(ii) The dual is feasible and bounded above.
(iii) The primal is solvable.
(iv) The dual is solvable.
(v) Both primal and dual are feasible.

Whenever (i) ≡ (ii) ≡ (iii) ≡ (iv) ≡ (v) is the case, the optimal values of the primal and the dual
problems are equal to each other.

Proof. 1) is quite straightforward: writing the dual problem (LP*) in our standard form, we
get

min_y { −b^T y :  [I_m; A^T; −A^T] y − [0; c; −c] ≥ 0 },

where I_m is the m-dimensional unit matrix (the three blocks of constraints say y ≥ 0,
A^T y ≥ c and A^T y ≤ c, i.e., A^T y = c). Applying the duality transformation to the latter
problem, we come to the problem

max_{ξ,η,ζ} { 0^T ξ + c^T η + (−c)^T ζ :  ξ ≥ 0, η ≥ 0, ζ ≥ 0, ξ + Aη − Aζ = −b },

which is clearly equivalent to (LP) (set x = ζ − η).


2) is readily given by Proposition 1.2.3.
3):

(i)⇒(iv): If the primal is feasible and bounded below, its optimal value c* (which
of course is a lower bound on itself) can, by Proposition 1.2.3, be (non-strictly)
majorized by a quantity b^T y*, where y* is a feasible solution to (LP*). In the
situation in question, of course, b^T y* = c* (by the already proved item 2)); on the other
hand, in view of the same Proposition 1.2.3, the optimal value in the dual is ≤ c*. We
conclude that the optimal value in the dual is attained and is equal to the optimal
value in the primal.
(iv)⇒(ii): evident;
(ii)⇒(iii): This implication, in view of the primal-dual symmetry, follows from the
implication (i)⇒(iv).
(iii)⇒(i): evident.
We have seen that (i)≡(ii)≡(iii)≡(iv) and that the first (and consequently each) of
these 4 equivalent properties implies that the optimal value in the primal problem
is equal to the optimal value in the dual one. All that remains is to prove the
equivalence between (i)–(iv), on one hand, and (v), on the other hand. This is
immediate: (i)–(iv), of course, imply (v); vice versa, in the case of (v) the primal is
not only feasible, but also bounded below (this is an immediate consequence of the
feasibility of the dual problem, see 2)), and (i) follows.

An immediate corollary of the LP Duality Theorem is the following necessary and sufficient
optimality condition in LP:

Theorem 1.2.3 [Necessary and sufficient optimality conditions in linear programming] Con-
sider an LP program (LP) along with its dual (LP*). A pair (x, y) of primal and dual feasible
solutions is comprised of optimal solutions to the respective problems if and only if

y_i [Ax − b]_i = 0, i = 1, ..., m,        [complementary slackness]

and likewise if and only if

c^T x − b^T y = 0.        [zero duality gap]

Indeed, the "zero duality gap" optimality condition is an immediate consequence of the fact
that the value of the primal objective at every primal feasible solution is ≥ the value of the
dual objective at every dual feasible solution, while the optimal values in the primal and the
dual are equal to each other, see Theorem 1.2.2. The equivalence between the "zero duality
gap" and the "complementary slackness" optimality conditions is given by the following
computation: whenever x is primal feasible and y is dual feasible, the products y_i [Ax − b]_i,
i = 1, ..., m, are nonnegative, while the sum of these products is precisely the duality gap:

y^T [Ax − b] = (A^T y)^T x − b^T y = c^T x − b^T y.

Thus, the duality gap can vanish at a primal-dual feasible pair (x, y) if and only if all products
y_i [Ax − b]_i for this pair are zeros.
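This computation can be checked mechanically. In the sketch below (an illustration with a hand-picked optimal primal-dual pair for a tiny LP; numpy assumed), the printed quantities reproduce the complementary slackness products, the duality gap, and the identity connecting them.

import numpy as np

A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
b = np.array([0.0, 0.0, 1.0])
c = np.array([1.0, 2.0])
x = np.array([1.0, 0.0])        # primal optimal for  min c^T x, Ax >= b
y = np.array([0.0, 1.0, 1.0])   # dual optimal for    max b^T y, A^T y = c, y >= 0

slack = A @ x - b
print(y * slack)                              # all zeros: complementary slackness
print(c @ x - b @ y)                          # 0.0: zero duality gap
print(np.isclose(y @ slack, c @ x - b @ y))   # the identity y^T(Ax - b) = c^T x - b^T y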

1.3 From Linear to Conic Programming


Linear Programming models cover numerous applications. Whenever applicable, LP allows one to
obtain useful quantitative and qualitative information on the problem at hand. The specific
analytic structure of LP programs gives rise to a number of general results (e.g., those of the LP
Duality Theory) which provide us in many cases with valuable insight and understanding. At
the same time, this analytic structure underlies some specific computational techniques for LP;
these techniques, which by now are perfectly well developed, allow one to routinely solve quite large
(tens/hundreds of thousands of variables and constraints) LP programs. Nevertheless, there
are situations in reality which cannot be covered by LP models. To handle these "essentially
nonlinear" cases, one needs to extend the basic theoretical results and computational techniques
known for LP beyond the bounds of Linear Programming.
For the time being, the widest class of optimization problems to which the basic results of
LP were extended, is the class of convex optimization programs. There are several equivalent
ways to define a general convex optimization problem; the one we are about to use is not the
traditional one, but it is well suited to encompass the range of applications we intend to cover
in our course.
When passing from a generic LP problem

min_x { c^T x : Ax ≥ b }    [A : m × n]        (LP)

to its nonlinear extensions, we should expect to encounter some nonlinear components in the
problem. The traditional way here is to say: "Well, in (LP) there are a linear objective function
f(x) = c^T x and inequality constraints f_i(x) ≥ b_i with linear functions f_i(x) = a_i^T x, i = 1, ..., m.
Let us allow some/all of these functions f, f_1, ..., f_m to be nonlinear." In contrast to this tra-
ditional way, we intend to keep the objective and the constraints linear, but introduce "nonlin-
earity" in the inequality sign ≥.

1.4 Orderings of R^m and convex cones


The constraint inequality Ax ≥ b in (LP) is an inequality between vectors; as such, it requires a
definition, and the definition is well-known: given two vectors a, b ∈ R^m, we write a ≥ b if the
coordinates of a majorize the corresponding coordinates of b:

a ≥ b ⇔ {a_i ≥ b_i, i = 1, ..., m}.        (" ≥ ")

In the latter relation, we again meet with the inequality sign ≥, but now it stands for the
"arithmetic ≥" – a well-known relation between real numbers. The above "coordinate-wise"
partial ordering of vectors in R^m satisfies a number of basic properties of the standard ordering
of reals; namely, for all vectors a, b, c, d, ... ∈ R^m one has

1. Reflexivity: a ≥ a;

2. Anti-symmetry: if both a ≥ b and b ≥ a, then a = b;

3. Transitivity: if both a ≥ b and b ≥ c, then a ≥ c;

4. Compatibility with linear operations:

(a) Homogeneity: if a ≥ b and λ is a nonnegative real, then λa ≥ λb


(”One can multiply both sides of an inequality by a nonnegative real”)
(b) Additivity: if both a ≥ b and c ≥ d, then a + c ≥ b + d
(”One can add two inequalities of the same sign”).

It turns out that

• A significant part of the nice features of LP programs comes from the fact that the vector
inequality ≥ in the constraint of (LP) satisfies the properties 1. – 4.;

• The standard inequality ” ≥ ” is neither the only possible, nor the only interesting way to
define the notion of a vector inequality fitting the axioms 1. – 4.

As a result,

A generic optimization problem which looks exactly the same as (LP), up to the
fact that the inequality ≥ in (LP) is now replaced with an ordering which differs
from the component-wise one, inherits a significant part of the properties of LP
problems. Specifying properly the ordering of vectors, one can obtain from (LP)
generic optimization problems covering many important applications which cannot
be treated by the standard LP.

So far, what was said is just a declaration. Let us look at how this declaration comes to
life.
We start with clarifying the "geometry" of a "vector inequality" satisfying the axioms 1. –
4. Thus, we consider vectors from a finite-dimensional Euclidean space E with an inner product
⟨·, ·⟩ and assume that E is equipped with a partial ordering, denoted by ≽: in other
words, we say which pairs of vectors a, b from E are linked by the inequality a ≽ b. We call
the ordering "good" if it obeys the axioms 1. – 4., and we are interested to understand what
these good orderings are.
Our first observation is:

A. A good inequality ≽ is completely identified by the set K of ≽-nonnegative vectors:

K = {a ∈ E | a ≽ 0}.

Namely,

a ≽ b ⇔ a − b ≽ 0 [⇔ a − b ∈ K].

Indeed, let a ≽ b. By 1. we have −b ≽ −b, and by 4.(b) we may add the latter
inequality to the former one to get a − b ≽ 0. Vice versa, if a − b ≽ 0, then, adding
to this inequality the one b ≽ b, we get a ≽ b.

The set K in Observation A cannot be arbitrary. It is easy to verify that it must be a pointed
convex cone, i.e., it must satisfy the following conditions:

1. K is nonempty and closed under addition:

a, a′ ∈ K ⇒ a + a′ ∈ K;

2. K is a conic set:

a ∈ K, λ ≥ 0 ⇒ λa ∈ K;

3. K is pointed:

a ∈ K and −a ∈ K ⇒ a = 0.

Geometrically: K does not contain straight lines passing through the origin.

Thus, every nonempty pointed convex cone K in E induces a partial ordering on E which
satisfies the axioms 1. – 4. We denote this ordering by ≥_K:

a ≥_K b ⇔ a − b ≥_K 0 ⇔ a − b ∈ K.
What is the cone responsible for the standard coordinate-wise ordering ≥ on E = R^m we have
started with? The answer is clear: this is the cone comprised of vectors with nonnegative entries
– the nonnegative orthant

R^m_+ = {x = (x_1, ..., x_m)^T ∈ R^m : x_i ≥ 0, i = 1, ..., m}.

(Thus, in order to express the fact that a vector a is greater than or equal to, in the component-
wise sense, a vector b, we were supposed to write a ≥_{R^m_+} b. However, we are not going to be
that formal and shall use the standard shorthand notation a ≥ b.)
The nonnegative orthant R^m_+ is not just a pointed convex cone; it possesses two useful
additional properties:

I. The cone is closed: if a sequence of vectors a_i from the cone has a limit, the latter also
belongs to the cone.

II. The cone possesses a nonempty interior: there exists a vector such that a ball of positive
radius centered at the vector is contained in the cone.

These additional properties are very important. For example, I is responsible for the possi-
bility to pass to the term-wise limit in an inequality:

a_i ≥ b_i ∀i, a_i → a, b_i → b as i → ∞ ⇒ a ≥ b.

It makes sense to restrict ourselves to good partial orderings coming from cones K sharing
the properties I, II. Thus,

From now on, speaking about good partial orderings ≥_K, we always assume that the
underlying set K is a pointed and closed convex cone with a nonempty interior.

Note that the closedness of K makes it possible to pass to limits in ≥_K-inequalities:

a_i ≥_K b_i, a_i → a, b_i → b as i → ∞ ⇒ a ≥_K b.
The nonemptiness of the interior of K allows us to define, along with the "non-strict" inequality
a ≥_K b, also the strict inequality according to the rule

a >_K b ⇔ a − b ∈ int K,

where int K is the interior of the cone K. E.g., the strict coordinate-wise inequality a >_{R^m_+} b
(shorthand: a > b) simply says that the coordinates of a are strictly greater, in the usual
arithmetic sense, than the corresponding coordinates of b.

Examples. The partial orderings we are especially interested in are given by the following
cones:

• The nonnegative orthant R^m_+ in R^m;

• The Lorentz (or the second-order, or, less scientifically, the ice-cream) cone

L^m = { x = (x_1, ..., x_{m−1}, x_m)^T ∈ R^m : x_m ≥ √( Σ_{i=1}^{m−1} x_i^2 ) };

• The positive semidefinite cone S^m_+. This cone "lives" in the space E = S^m of m × m
symmetric matrices (equipped with the Frobenius inner product ⟨A, B⟩ = Tr(AB) = Σ_{i,j} A_{ij} B_{ij})
and consists of all m × m matrices A which are positive semidefinite, i.e.,

A = A^T;  x^T A x ≥ 0 ∀x ∈ R^m.
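For a quick numerical feel of these three cones, here is a small membership-test sketch (an illustration added here, assuming numpy; the tolerances are arbitrary): the orthant test is coordinate-wise, the Lorentz test compares the last coordinate with the Euclidean norm of the rest, and the semidefinite test inspects eigenvalues.

import numpy as np

def in_orthant(x, tol=1e-9):
    return bool(np.all(x >= -tol))

def in_lorentz(x, tol=1e-9):
    # x_m >= sqrt(x_1^2 + ... + x_{m-1}^2)
    return x[-1] >= np.linalg.norm(x[:-1]) - tol

def in_psd(X, tol=1e-9):
    # symmetric with nonnegative eigenvalues
    return np.allclose(X, X.T) and bool(np.all(np.linalg.eigvalsh(X) >= -tol))

print(in_orthant(np.array([1.0, 0.0, 2.0])))         # True
print(in_lorentz(np.array([-1.0, -1.0, 2.0])))       # True: 2 >= sqrt(2)
print(in_psd(np.array([[2.0, -1.0], [-1.0, 2.0]])))  # True: eigenvalues 1 and 3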

1.5 “Conic programming” – what is it?


Let K be a cone in E (convex, pointed, closed and with a nonempty interior). Given an
objective c ∈ R^n, a linear mapping x ↦ Ax : R^n → E and a right hand side b ∈ E, consider the
optimization problem

min_x { c^T x : Ax ≥_K b }        (CP)

We shall refer to (CP) as a conic problem associated with the cone K. Note that the only
difference between this program and an LP problem is that the latter deals with the particular
choice E = R^m, K = R^m_+. With the formulation (CP), we get the possibility to cover a much
wider spectrum of applications which cannot be captured by LP; we shall look at numerous
examples in the sequel.
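To see what a (CP) instance looks like in practice, below is a minimal sketch with the 3-dimensional ice-cream cone K = L^3, written with the cvxpy modelling package (the choice of cvxpy, the data A, b, c and the default conic solver are all assumptions of this illustration, not part of the lectures).

import cvxpy as cp
import numpy as np

A = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
b = np.array([0.0, 0.0, -2.0])
c = np.array([1.0, 1.0])

x = cp.Variable(2)
y = A @ x - b                    # the constraint of (CP):  y = Ax - b in L^3,
prob = cp.Problem(cp.Minimize(c @ x),
                  [cp.SOC(y[2], y[:2])])   # i.e.  ||(y_1, y_2)|| <= y_3
prob.solve()
print(prob.value, x.value)       # about -2.828 at x = (-sqrt(2), -sqrt(2))

Here y = (x_1, x_2, 2), so the feasible set is the disk x_1^2 + x_2^2 ≤ 4, and minimizing x_1 + x_2 over it gives −2√2.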

1.6 Conic Duality


Aside from algorithmic issues, the most important theoretical result in Linear Programming is the
LP Duality Theorem; can this theorem be extended to conic problems? What is the extension?
The source of the LP Duality Theorem was the desire to get in a systematic way a lower
bound on the optimal value c* in an LP program

c* = min_x { c^T x : Ax ≥ b }.        (LP)

The bound was obtained by looking at the inequalities of the type

⟨λ, Ax⟩ ≡ λ^T Ax ≥ λ^T b        (Cons(λ))

with weight vectors λ ≥ 0. By its origin, an inequality of this type is a consequence of the system
of constraints Ax ≥ b of (LP), i.e., it is satisfied at every solution to the system. Consequently,
whenever we are lucky enough to get, as the left hand side of (Cons(λ)), the expression c^T x, i.e.,
whenever a nonnegative weight vector λ satisfies the relation

A^T λ = c,

the inequality (Cons(λ)) yields a lower bound b^T λ on the optimal value in (LP). And the dual
problem

max { b^T λ : λ ≥ 0, A^T λ = c }

was nothing but the problem of finding the best lower bound one can get in this fashion.
The same scheme can be used to develop the dual to a conic problem

min { c^T x : Ax ≥_K b },  K ⊂ E.        (CP)
Here the only step which needs clarification is the following one:

(?) What are the “admissible” weight vectors λ, i.e., the vectors such that the scalar
inequality
λ, Ax ≥ λ, b
is a consequence of the vector inequality Ax ≥K b?

In the particular case of coordinate-wise partial ordering, i.e., in the case of E = Rm , K = Rm


+,
the admissible vectors were those with nonnegative coordinates. These vectors, however, not
necessarily are admissible for an ordering ≥K when K is different from the nonnegative orthant:

Example 1.6.1 Consider the ordering ≥_{L^3} on E = R^3 given by the 3-dimensional ice-cream
cone:

(a_1, a_2, a_3)^T ≥_{L^3} (0, 0, 0)^T ⇔ a_3 ≥ √(a_1^2 + a_2^2).

The inequality

(−1, −1, 2)^T ≥_{L^3} (0, 0, 0)^T

is valid; however, aggregating this inequality with the aid of the positive weight vector
λ = (1, 1, 0.1)^T, we get the false inequality

−1.8 ≥ 0.

Thus, not every nonnegative weight vector is admissible for the partial ordering ≥_{L^3}.
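A short numeric companion to Example 1.6.1 (an illustration, assuming numpy): it verifies that the vector inequality is valid and that the aggregated scalar inequality nevertheless fails.

import numpy as np

def in_L3(a, tol=1e-9):
    return a[2] >= np.hypot(a[0], a[1]) - tol

a = np.array([-1.0, -1.0, 2.0])
lam = np.array([1.0, 1.0, 0.1])

print(in_L3(a))    # True: the vector inequality a >=_{L^3} 0 is valid
print(lam @ a)     # -1.8 < 0: the aggregated scalar inequality is false,
                   # so lam is not an admissible weight vector for >=_{L^3}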

To answer the question (?) is the same as to say what the weight vectors λ are such that

∀a ≥_K 0 : ⟨λ, a⟩ ≥ 0.        (1.6.1)

Whenever λ possesses the property (1.6.1), the scalar inequality

⟨λ, a⟩ ≥ ⟨λ, b⟩

is a consequence of the vector inequality a ≥_K b:

a ≥_K b
⇔ a − b ≥_K 0        [additivity of ≥_K]
⇒ ⟨λ, a − b⟩ ≥ 0     [by (1.6.1)]
⇔ ⟨λ, a⟩ ≥ ⟨λ, b⟩.

Vice versa, if λ is an admissible weight vector for the partial ordering ≥_K:

∀(a, b : a ≥_K b) : ⟨λ, a⟩ ≥ ⟨λ, b⟩,

then, of course, λ satisfies (1.6.1).

Thus the weight vectors λ which are admissible for a partial ordering ≥_K are exactly the
vectors satisfying (1.6.1), or, which is the same, the vectors from the set

K_* = {λ ∈ E : ⟨λ, a⟩ ≥ 0 ∀a ∈ K}.

The set K_* is comprised of vectors whose inner products with all vectors from K are nonnegative.
K_* is called the cone dual to K. The name is legitimate due to the following fact:

Theorem 1.6.1 [Properties of the dual cone] Let E be a finite-dimensional Euclidean space
with inner product ⟨·, ·⟩ and let K ⊂ E be a nonempty set. Then
(i) The set

K_* = {λ ∈ E : ⟨λ, a⟩ ≥ 0 ∀a ∈ K}

is a closed convex cone.
(ii) If int K ≠ ∅, then K_* is pointed.
(iii) If K is a closed convex pointed cone, then int K_* ≠ ∅.
(iv) If K is a closed convex cone, then so is K_*, and the cone dual to K_* is K itself:

(K_*)_* = K.

An immediate corollary of the Theorem is as follows:

Corollary 1.6.1 A set K ⊂ E is a closed convex pointed cone with a nonempty interior if and
only if the set K_* is so.

From the dual cone to the problem dual to (CP). Now we are ready to derive the dual
problem of a conic problem (CP). As in the case of Linear Programming, we start with the
observation that whenever x is a feasible solution to (CP) and λ is an admissible weight vector,
i.e., λ ∈ K_*, then x satisfies the scalar inequality

(A*λ)^T x ≡ ⟨λ, Ax⟩ ≥ ⟨λ, b⟩ 1)

– this observation is an immediate consequence of the definition of K_*. It follows that whenever
λ is an admissible weight vector satisfying the relation

A*λ = c,

one has

c^T x = (A*λ)^T x = ⟨λ, Ax⟩ ≥ ⟨b, λ⟩

1) For a linear operator x ↦ Ax : R^n → E, A* is the conjugate operator given by the identity

⟨y, Ax⟩ = x^T A*y   ∀(y ∈ E, x ∈ R^n).

When representing the operators by their matrices in orthogonal bases in the argument and the range spaces,
the matrix representing the conjugate operator is exactly the transpose of the matrix representing the operator
itself.
1.6. CONIC DUALITY 25

for all x feasible for (CP), so that the quantity ⟨b, λ⟩ is a lower bound on the optimal value of
(CP). The best bound one can get in this fashion is the optimal value in the problem

max {⟨b, λ⟩ | A∗ λ = c, λ ≥K∗ 0} (D)

and this program is called the program dual to (CP).


So far, what we know about the duality we have just introduced is the following
Proposition 1.6.1 [Weak Duality Theorem] The optimal value of (D) is a lower bound on the
optimal value of (CP).
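For K = R^m_+ (so that K∗ = R^m_+ as well) the construction behind Weak Duality is easy to replay numerically; in the following Python/numpy sketch a primal-dual feasible pair is fabricated by construction (all data are made up for illustration):

    import numpy as np
    rng = np.random.default_rng(1)

    A   = rng.normal(size=(4, 2))
    x0  = rng.normal(size=2)
    b   = A @ x0 - rng.uniform(0.1, 1.0, size=4)  # then A x0 - b > 0: x0 is strictly feasible
    lam = rng.uniform(size=4)                     # lam >= 0, i.e. lam lies in K* = R^4_+
    c   = A.T @ lam                               # enforce the dual constraint A* lam = c

    gap = c @ x0 - b @ lam                        # duality gap at the pair (x0, lam)
    print(gap, (A @ x0 - b) @ lam)                # the two numbers coincide and are >= 0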

1.6.1 Geometry of the primal and the dual problems


The structure of problem (D) looks quite different from the one of (CP). However, a more
careful analysis demonstrates that the difference in structures comes just from how we represent
the data: geometrically, the problems are completely similar. Indeed, in (D) we are asked to
maximize a linear objective ⟨b, λ⟩ over the intersection of an affine plane L∗ = {λ | A∗ λ = c}
with the cone K∗ . And what about (CP)? Let us pass in this problem from the “true design
variables” x to their images y = Ax − b ∈ E. When x runs through Rn , y runs through the affine
plane L = {y = Ax − b | x ∈ Rn }; x ∈ Rn is feasible for (CP) if and only if the corresponding
y = Ax − b belongs to the cone K. Thus, in (CP) we also deal with the intersection of an affine
plane, namely, L, and a cone, namely, K. Now assume that our objective cT x can be expressed
in terms of y = Ax − b:
cT x = ⟨d, Ax − b⟩ + const.
This assumption is clearly equivalent to the inclusion

c ∈ ImA∗ . (1.6.2)

Indeed, in the latter case we have c = A∗ d for some d, whence

cT x = (A∗ d)T x = ⟨d, Ax⟩ = ⟨d, Ax − b⟩ + ⟨d, b⟩ ∀x. (1.6.3)

In the case of (1.6.2) the primal problem (CP) can be posed equivalently as the following problem:

min_y {⟨d, y⟩ | y ∈ L, y ≥K 0} ,

where L = ImA − b and d is (any) vector satisfying the relation A∗ d = c. Thus,


In the case of (1.6.2) the primal problem, geometrically, is the problem of minimizing
a linear form over the intersection of the affine plane L with the cone K, and the
dual problem, similarly, is to maximize another linear form over the intersection of
the affine plane L∗ with the dual cone K∗ .
Now, what happens if the condition (1.6.2) is not satisfied? The answer is very simple: in this
case (CP) makes no sense – it is either unbounded below, or infeasible.
Indeed, assume that (1.6.2) is not satisfied. Then, by Linear Algebra, the vector c is not
orthogonal to the null space of A, so that there exists e such that Ae = 0 and cT e > 0. Now
let x be a feasible solution of (CP); note that all points x − µe, µ ≥ 0, are feasible, and
cT (x − µe) → −∞ as µ → ∞. Thus, when (1.6.2) is not satisfied, problem (CP), whenever
feasible, is unbounded below.

From the above observation we see that if (1.6.2) is not satisfied, then we may reject (CP) from
the very beginning. Thus, from now on we assume that (1.6.2) is satisfied. In fact in what
follows we make a bit stronger assumption:

A. The mapping A is of full column rank, i.e., it has trivial null space.
Assuming that the mapping x ↦ Ax has the trivial null space (“we have eliminated
from the very beginning the redundant degrees of freedom – those not affecting the
value of Ax”), the equation
A∗ d = q
is solvable for every right hand side vector q.

In view of A, problem (CP) can be reformulated as a problem (P) of minimizing a linear objective
⟨d, y⟩ over the intersection of an affine plane L and a cone K. Conversely, a problem (P) of this
latter type can be posed in the form of (CP) – to this end it suffices to represent the plane L as
the image of an affine mapping x ↦ Ax − b (i.e., to parameterize somehow the feasible plane)
and to “translate” the objective ⟨d, y⟩ to the space of x-variables – to set c = A∗ d, which yields

y = Ax − b ⇒ ⟨d, y⟩ = cT x + const.

Thus, when dealing with a conic problem, we may pass from its “analytic form” (CP) to the
“geometric form” (P) and vice versa.
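Numerically, the passage between the two forms is nothing but the identity (1.6.3); a quick Python/numpy check on random data (our own illustration):

    import numpy as np
    rng = np.random.default_rng(2)

    A = rng.normal(size=(5, 3))   # full column rank with probability 1, as assumption A requires
    b = rng.normal(size=5)
    d = rng.normal(size=5)
    c = A.T @ d                   # c = A* d, so c lies in Im A*, as (1.6.2) requires

    x = rng.normal(size=3)
    y = A @ x - b                 # the "geometric" variable corresponding to x
    print(c @ x, d @ y + d @ b)   # equal: c^T x = <d, y> + <d, b>, cf. (1.6.3)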
What are the relations between the “geometric data” of the primal and the dual problems?
We already know that the cone K∗ associated with the dual problem is dual of the cone K
associated with the primal one. What about the feasible planes L and L∗ ? The answer is
simple: they are orthogonal to each other! More exactly, the affine plane L is the translation,
by vector −b, of the linear subspace

L = ImA ≡ {y = Ax | x ∈ Rn }.

And L∗ is the translation, by any solution λ0 of the system A∗ λ = c, e.g., by the solution d to
the system, of the linear subspace

L∗ = Null(A∗ ) ≡ {λ | A∗ λ = 0}.

A well-known fact of Linear Algebra is that the linear subspaces L and L∗ are orthogonal
complements of each other:

L = {y | ⟨y, λ⟩ = 0 ∀λ ∈ L∗ }; L∗ = {λ | ⟨y, λ⟩ = 0 ∀y ∈ L}.

Thus, we come to a nice geometrical conclusion:

A conic problem2) (CP) is the problem

min_y {⟨d, y⟩ | y ∈ L − b, y ≥K 0} (P)

of minimizing a linear objective ⟨d, y⟩ over the intersection of a cone K with an affine
plane L = L − b given as a translation, by vector −b, of a linear subspace L.

2) recall that we have restricted ourselves to the problems satisfying the assumption A

The dual problem is the problem

max_λ {⟨b, λ⟩ | λ ∈ L⊥ + d, λ ≥K∗ 0} (D)

of maximizing the linear objective ⟨b, λ⟩ over the intersection of the dual cone K∗
with an affine plane L∗ = L⊥ + d given as a translation, by the vector d, of the
orthogonal complement L⊥ of L.
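The orthogonality of L = Im A and L⊥ = Null(A∗ ) underlying this description is easy to exhibit numerically (a Python/numpy sketch of ours; in the matrix case A∗ is just AT):

    import numpy as np
    rng = np.random.default_rng(3)

    A = rng.normal(size=(5, 3))        # generic 5x3 matrix: rank 3 with probability 1
    U, s, Vt = np.linalg.svd(A)        # full SVD of A
    L_basis     = U[:, :3]             # columns span L = Im A
    Lperp_basis = U[:, 3:]             # columns span Null(A^T) = (Im A)^perp

    print(np.abs(L_basis.T @ Lperp_basis).max())  # ~ 0: the two subspaces are orthogonal
    print(np.abs(A.T @ Lperp_basis).max())        # ~ 0: A* indeed vanishes on L^perp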

What we get is an extremely transparent geometric description of the primal-dual pair of conic
problems (P), (D). Note that the duality is completely symmetric: the problem dual to (D) is
(P)! Indeed, we know from Theorem 1.6.1 that (K∗ )∗ = K, and of course (L⊥ )⊥ = L. Switch
from maximization to minimization corresponds to the fact that the “shifting vector” in (P) is
(−b), while the “shifting vector” in (D) is d. The geometry of the primal-dual pair (P), (D) is
illustrated on the below picture:
b
L*

K* L

Figure 1.1. Primal-dual pair of conic problems


[bold: primal (vertical segment) and dual (horizontal ray) feasible sets]

Finally, note that in the case when (CP) is an LP program (i.e., in the case when K is the
nonnegative orthant), the “conic dual” problem (D) is exactly the usual LP dual; this fact
immediately follows from the observation that the cone dual to R^m_+ is R^m_+ itself.
We have explored the geometry of a primal-dual pair of conic problems: the “geometric
data” of such a pair are given by a pair of dual to each other cones K, K∗ in E and a pair of
affine planes L = L − b, L∗ = L⊥ + d, where L is a linear subspace in E and L⊥ is its orthogonal
complement. The first problem from the pair – let it be called (P) – is to minimize ⟨d, y⟩ over
y ∈ K ∩ L, and the second (D) is to maximize ⟨b, λ⟩ over λ ∈ K∗ ∩ L∗ . Note that the “geometric
data” (K, K∗ , L, L∗ ) of the pair do not specify completely the problems of the pair: given L, L∗ ,
we can uniquely define L, but not the shift vectors (−b) and d: b is known up to shift by a
vector from L, and d is known up to shift by a vector from L⊥ . However, this non-uniqueness
is of absolutely no importance: replacing a chosen vector d ∈ L∗ by another vector d′ ∈ L∗ , we
pass from (P) to a new problem (P′ ) which is completely equivalent to (P): indeed, both (P)
and (P′ ) have the same feasible set, and on the (common) feasible plane L of the problems their
objectives ⟨d, y⟩ and ⟨d′ , y⟩ differ from each other by a constant:

y ∈ L = L − b, d′ − d ∈ L⊥ ⇒ ⟨d′ − d, y + b⟩ = 0 ⇒ ⟨d′ − d, y⟩ = −⟨d′ − d, b⟩ ∀y ∈ L.

Similarly, shifting b along L, we do modify the objective in (D), but in a trivial way – on the
feasible plane L∗ of the problem the new objective differs from the old one by a constant.

1.7 The Conic Duality Theorem


The Weak Duality (Proposition 1.6.1) we have established so far for conic problems is much
weaker than the Linear Programming Duality Theorem. Is it possible to get results similar to
those of the LP Duality Theorem in the general conic case as well? The answer is affirmative,
provided that the primal problem (CP) is strictly feasible, i.e., that there exists x such that
Ax − b >K 0, or, geometrically, L ∩ int K ≠ ∅.
The advantage of the geometrical definition of strict feasibility is that it is independent of
the particular way in which the feasible plane is defined; hence, with this definition it is clear
what it means that the dual problem (D) is strictly feasible.
Our main result is the following

Theorem 1.7.1 [Conic Duality Theorem] Consider a conic problem

c∗ = min_x {cT x | Ax ≥K b} (CP)

along with its conic dual

b∗ = max {⟨b, λ⟩ | A∗ λ = c, λ ≥K∗ 0} . (D)

1) The duality is symmetric: the dual problem is conic, and the problem dual to dual is the
primal.
2) The value of the dual objective at every dual feasible solution λ is ≤ the value of the primal
objective at every primal feasible solution x, so that the duality gap

cT x − ⟨b, λ⟩

is nonnegative at every “primal-dual feasible pair” (x, λ).

3.a) If the primal (CP) is bounded below and strictly feasible (i.e. Ax >K b for some x), then
the dual (D) is solvable and the optimal values in the problems are equal to each other: c∗ = b∗ .
3.b) If the dual (D) is bounded above and strictly feasible (i.e., there exists λ >K∗ 0 such that
A∗ λ = c), then the primal (CP) is solvable and c∗ = b∗ .

4) Assume that at least one of the problems (CP), (D) is bounded and strictly feasible. Then
a primal-dual feasible pair (x, λ) is a pair of optimal solutions to the respective problems
4.a) if and only if
⟨b, λ⟩ = cT x [zero duality gap]

and
4.b) if and only if
⟨λ, Ax − b⟩ = 0 [complementary slackness]

Proof. 1): The result was already obtained when discussing the geometry of the primal and
the dual problems.
2): This is the Weak Duality Theorem.
3): Assume that (CP) is strictly feasible and bounded below, and let c∗ be the optimal value
of the problem. We should prove that the dual is solvable with the same optimal value. Since
we already know that the optimal value of the dual is ≤ c∗ (see 2)), all we need is to point out
a dual feasible solution λ∗ with ⟨b, λ∗ ⟩ ≥ c∗ .
Consider the convex set
M = {y = Ax − b | x ∈ Rn , cT x ≤ c∗ }.
Let us start with the case of c ≠ 0. We claim that in this case
(i) The set M is nonempty;
(ii) the set M does not intersect the interior int K of the cone K: M ∩ int K = ∅.
(i) is evident (why?). To verify (ii), assume, on the contrary, that there exists a point x̄, cT x̄ ≤ c∗ ,
such that ȳ ≡ Ax̄ − b >K 0. Then, of course, Ax − b >K 0 for all x close enough to x̄, i.e., all
points x in a small enough neighbourhood of x̄ are also feasible for (CP). Since c ≠ 0, there are
points x in this neighbourhood with cT x < cT x̄ ≤ c∗ , which is impossible, since c∗ is the optimal
value of (CP).
Now let us make use of the following basic fact:
Theorem 1.7.2 [Separation Theorem for Convex Sets] Let S, T be nonempty non-
intersecting convex subsets of a finite-dimensional Euclidean space E with inner prod-
uct ⟨·, ·⟩. Then S and T can be separated by a linear functional: there exists a nonzero
vector λ ∈ E such that

sup_{u∈S} ⟨λ, u⟩ ≤ inf_{u∈T} ⟨λ, u⟩ .

Applying the Separation Theorem to S = M and T = int K, we conclude that there exists λ ∈ E
such that

sup_{y∈M} ⟨λ, y⟩ ≤ inf_{y∈int K} ⟨λ, y⟩ . (1.7.1)

From the inequality it follows that the linear form ⟨λ, y⟩ of y is bounded below on int K.
Since this interior is a conic set:
y ∈ int K, µ > 0 ⇒ µy ∈ int K
(why?), this boundedness implies that ⟨λ, y⟩ ≥ 0 for all y ∈ int K. Consequently, ⟨λ, y⟩ ≥ 0 for all
y from the closure of int K, i.e., for all y ∈ K. We conclude that λ ≥K∗ 0, so that the inf in (1.7.1)
is nonnegative. On the other hand, the infimum of a linear form over a conic set clearly cannot
be positive; we conclude that the inf in (1.7.1) is 0, so that the inequality reads

sup_{u∈M} ⟨λ, u⟩ ≤ 0.

Recalling the definition of M , we get

[A∗ λ]T x ≤ ⟨λ, b⟩ (1.7.2)

for all x from the half-space cT x ≤ c∗ . But the linear form [A∗ λ]T x can be bounded above on
the half-space if and only if the vector A∗ λ is proportional, with a nonnegative coefficient, to
the vector c:
A∗ λ = µc

for some µ ≥ 0. We claim that µ > 0. Indeed, assuming µ = 0, we get A∗ λ = 0, whence ⟨λ, b⟩ ≥ 0
in view of (1.7.2). It is time now to recall that (CP) is strictly feasible, i.e., Ax̄ − b >K 0 for
some x̄. Since λ ≥K∗ 0 and λ ≠ 0, the product ⟨λ, Ax̄ − b⟩ should be strictly positive (why?),
while in fact we know that the product is −⟨λ, b⟩ ≤ 0 (since A∗ λ = 0 and, as we have seen,
⟨λ, b⟩ ≥ 0).
Thus, µ > 0. Setting λ∗ = µ−1 λ, we get

λ∗ ≥K∗ 0 [since λ ≥K∗ 0 and µ > 0]
A∗ λ∗ = c [since A∗ λ = µc]
cT x ≤ ⟨λ∗ , b⟩ ∀x : cT x ≤ c∗ [see (1.7.2)]

We see that λ∗ is feasible for (D), the value of the dual objective at λ∗ being at least c∗ , as
required.
It remains to consider the case c = 0. Here, of course, c∗ = 0, and the existence of a dual
feasible solution with the value of the objective ≥ c∗ = 0 is evident: the required solution is
λ = 0. 3.a) is proved.
3.b): the result follows from 3.a) in view of the primal-dual symmetry.
4): Let x be primal feasible, and λ be dual feasible. Then

cT x − ⟨b, λ⟩ = (A∗ λ)T x − ⟨b, λ⟩ = ⟨Ax − b, λ⟩ .

We get a useful identity as follows:

(!) For every primal-dual feasible pair (x, λ) the duality gap cT x − ⟨b, λ⟩ is equal to
the inner product of the primal slack vector y = Ax − b and the dual vector λ.

Note that (!) in fact does not require “full” primal-dual feasibility: x may be ar-
bitrary (i.e., y should belong to the primal feasible plane ImA − b), and λ should
belong to the dual feasible plane A∗ λ = c, but y and λ need not belong to the
respective cones.
In view of (!) the complementary slackness holds if and only if the duality gap is zero; thus, all
we need is to prove 4.a).
The “primal residual” cT x − c∗ and the “dual residual” b∗ − ⟨b, λ⟩ are nonnegative, provided
that x is primal feasible, and λ is dual feasible. It follows that the duality gap

cT x − ⟨b, λ⟩ = [cT x − c∗ ] + [b∗ − ⟨b, λ⟩] + [c∗ − b∗ ]

is nonnegative (recall that c∗ ≥ b∗ by 2)), and it is zero if and only if c∗ = b∗ and both primal
and dual residuals are zero (i.e., x is primal optimal, and λ is dual optimal). All these arguments
hold without any assumptions of strict feasibility. We see that the condition “the duality gap
at a primal-dual feasible pair is zero” is always sufficient for primal-dual optimality of the pair;
and if c∗ = b∗ , this sufficient condition is also necessary. Since in the case of 4) we indeed have
c∗ = b∗ (this is stated by 3)), 4.a) follows.
A useful consequence of the Conic Duality Theorem is the following
Corollary 1.7.1 Assume that both (CP) and (D) are strictly feasible. Then both problems are
solvable, the optimal values are equal to each other, and each one of the conditions 4.a), 4.b) is
necessary and sufficient for optimality of a primal-dual feasible pair.

Indeed, by the Weak Duality Theorem, if one of the problems is feasible, the other is bounded,
and it remains to use the items 3) and 4) of the Conic Duality Theorem.
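To see the theorem at work on a well-behaved instance, consider the toy problem min{x3 | (x1 − 1, x2 − 1, x3 )T ≥L3 0}, which is strictly feasible together with its dual. Below is a hedged sketch assuming the cvxpy package is available (that package is not part of these notes, and the hand-built dual solution is ours):

    import numpy as np
    import cvxpy as cp

    x = cp.Variable(3)
    b = np.array([1.0, 1.0, 0.0])
    # (x - b) in L3, encoded as the second-order cone constraint ||(x1 - 1, x2 - 1)|| <= x3
    cone = cp.SOC(x[2] - b[2], x[:2] - b[:2])
    prob = cp.Problem(cp.Minimize(x[2]), [cone])
    prob.solve()

    lam = np.array([0.0, 0.0, 1.0])  # dual feasible by hand: A* lam = c, lam interior to (L3)* = L3
    print(prob.value, b @ lam)       # both ~ 0: zero duality gap, cf. items 3.a-b)
    print(lam @ (x.value - b))       # ~ 0: complementary slackness, cf. item 4.b)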

1.7.1 Is something wrong with conic duality?


The statement of the Conic Duality Theorem is weaker than that of the LP Duality Theorem:
in the LP case, feasibility (even non-strict) and boundedness of either the primal or the dual problem
implies solvability of both the primal and the dual and equality between their optimal values.
In the general conic case something “nontrivial” is stated only in the case of strict feasibility
(and boundedness) of one of the problems. It can be demonstrated by examples that this
phenomenon reflects the nature of things and is not an artifact of our analysis. The case
of a non-polyhedral cone K is truly more complicated than that of the nonnegative orthant;
as a result, a “word-by-word” extension of the LP Duality Theorem to the conic case is false.
Example 1.7.1 Consider the following conic problem with 2 variables x = (x1 , x2 )T and the
3-dimensional ice-cream cone K:

min { x1 | Ax − b ≡ (x1 − x2 , 1, x1 + x2 )T ≥L3 0 } .

Recalling the definition of L3 , we can write the problem equivalently as

min { x1 | √((x1 − x2 )^2 + 1) ≤ x1 + x2 } ,

i.e., as the problem

min { x1 | 4x1 x2 ≥ 1, x1 + x2 > 0 } .

Geometrically the problem is to minimize x1 over the intersection of the 3D ice-cream cone with
a 2D plane; the inverse image of this intersection in the “design plane” of variables x1 , x2 is the
part of the 2D nonnegative orthant above the hyperbola x1 x2 = 1/4. The problem is clearly
strictly feasible (a strictly feasible solution is, e.g., x = (1, 1)T ) and bounded below, with the
optimal value 0. This optimal value, however, is not achieved – the problem is unsolvable!
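A one-line numeric illustration of the unattained infimum (a Python sketch of ours): along the feasible branch of the hyperbola the objective tends to 0 but stays positive.

    # feasible points x = (1/(4t), t) satisfy 4 x1 x2 = 1 and x1 + x2 > 0 for every t > 0
    for t in [1.0, 1e2, 1e4, 1e6]:
        x1 = 1.0 / (4.0 * t)
        print(x1)   # 0.25, 0.0025, 2.5e-05, 2.5e-07: the objective tends to 0, yet x1 > 0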

Example 1.7.2 Consider the following conic problem with two variables x = (x1 , x2 )T and the
3-dimensional ice-cream cone K:

min { x2 | Ax − b = (x1 , x2 , x1 )T ≥L3 0 } .

The problem is equivalent to the problem

min { x2 | √(x1^2 + x2^2 ) ≤ x1 } ,

i.e., to the problem

min { x2 | x2 = 0, x1 ≥ 0 } .

The problem is clearly solvable, and its optimal set is the ray {x1 ≥ 0, x2 = 0}.
Now let us build the conic dual to our (solvable!) primal. It is immediately seen that the
cone dual to an ice-cream cone is this ice-cream cone itself. Thus, the dual problem is

max_λ { 0 | (λ1 + λ3 , λ2 )T = (0, 1)T , λ ≥L3 0 } .

In spite of the fact that the primal is solvable, the dual is infeasible: indeed, assuming that λ is dual
feasible, we have λ ≥L3 0, which means that λ3 ≥ √(λ1^2 + λ2^2 ); since also λ1 + λ3 = 0, we come to
λ2 = 0, which contradicts the equality λ2 = 1.

We see that the weakness of the Conic Duality Theorem as compared to the LP Duality one
reflects pathologies which indeed may happen in the general conic case.

1.7.2 Consequences of the Conic Duality Theorem


Sufficient condition for infeasibility. Recall that a necessary and sufficient condition for
infeasibility of a (finite) system of scalar linear inequalities (i.e., for a vector inequality with
respect to the partial ordering ≥) is the possibility to combine these inequalities in a linear
fashion in such a way that the resulting scalar linear inequality is contradictory. In the case of
cone-generated vector inequalities a slightly weaker result can be obtained:
Proposition 1.7.1 Consider a linear vector inequality

Ax − b ≥K 0. (I)

(i) If there exists λ satisfying

λ ≥K∗ 0, A∗ λ = 0, ⟨λ, b⟩ > 0, (II)

then (I) has no solutions.
(ii) If (II) has no solutions, then (I) is “almost solvable” – for every positive ε there exists b′
such that ∥b′ − b∥2 < ε and the perturbed system

Ax − b′ ≥K 0

is solvable.
Moreover,
(iii) (II) is solvable if and only if (I) is not “almost solvable”.
Note the difference between the simple case when ≥K is the usual partial ordering ≥ and the
general case. In the former case, one can replace “almost solvable” in (ii) by “solvable”; however,
in the general conic case “almost” is unavoidable.
Example 1.7.3 Let system (I) be given by

Ax − b ≡ (x + 1, x − 1, √2 x)T ≥L3 0.

Recalling the definition of the ice-cream cone L3 , we can write the inequality equivalently as

√2 x ≥ √((x + 1)^2 + (x − 1)^2 ) ≡ √(2x^2 + 2), (i)

which of course is unsolvable. The corresponding system (II) is


" & '
λ3 ≥ λ21 + λ22 ⇔ λ ≥L3∗ 0
√ & '
λ1 + λ2 + 2λ3 = 0 ⇔ AT λ = 0 (ii)
& '
λ2 − λ 1 > 0 ⇔ bT λ > 0
1.7. THE CONIC DUALITY THEOREM 33

From the second of these relations, λ3 = −(λ1 + λ2 )/√2, so that from the first inequality we get
0 ≥ (λ1 − λ2 )^2 , whence λ1 = λ2 . But then the third inequality in (ii) is impossible! We see that
here both (i) and (ii) have no solutions.
The geometry of the example is as follows. (i) asks to find a point in the intersection of
the 3D ice-cream cone and a line. This line is an asymptote of the cone (it belongs to a 2D
plane which crosses the cone in such way that the boundary of the cross-section is a branch of
a hyperbola, and the line is one of two asymptotes of the hyperbola). Although the intersection
is empty ((i) is unsolvable), small shifts of the line make the intersection nonempty (i.e., (i) is
unsolvable and “almost solvable” at the same time). And it turns out that one cannot certify
the fact that (i) itself is unsolvable by providing a solution to (ii).
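The “almost solvable” claim can also be confirmed numerically: relaxing the third coordinate of b by any δ > 0 turns (i) into √2 x + δ ≥ √(2x^2 + 2), which holds for all x ≥ (2 − δ^2 )/(2√2 δ). A Python/numpy sketch (our own illustration):

    import numpy as np

    for delta in [1.0, 1e-2, 1e-4]:
        x  = (2 - delta**2) / (2 * np.sqrt(2) * delta) + 1.0  # a point past the threshold
        ok = np.sqrt(2) * x + delta >= np.sqrt(2 * x**2 + 2)  # the perturbed system at x
        print(delta, ok)                                      # True for every delta > 0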

Proof of the Proposition. (i) is evident (why?).

Let us prove (ii). To this end it suffices to verify that if (I) is not “almost solvable”, then (II) is
solvable. Let us fix a vector σ >K 0 and look at the conic problem

min_{x,t} { t | Ax + tσ − b ≥K 0 } (CP)

in variables (x, t). Clearly, the problem is strictly feasible (why?). Now, if (I) is not almost solvable, then,
first, the matrix [A, σ] of the problem satisfies the full column rank condition A (otherwise the image of
the mapping (x, t) ↦ Ax + tσ − b would coincide with the image of the mapping x ↦ Ax − b, which is
not the case – the first of these images does intersect K, while the second does not). Second, the optimal
value in (CP) is strictly positive (otherwise the problem would admit feasible solutions with t close to 0,
and this would mean that (I) is almost solvable). From the Conic Duality Theorem it follows that the
dual problem of (CP)

max_λ { ⟨b, λ⟩ | A∗ λ = 0, ⟨σ, λ⟩ = 1, λ ≥K∗ 0 }

has a feasible solution with positive ⟨b, λ⟩, i.e., (II) is solvable.


It remains to prove (iii). Assume first that (I) is not almost solvable; then (II) must be solvable by
(ii). Vice versa, assume that (II) is solvable, and let λ be a solution to (II). Then λ also solves all systems
of the type (II) associated with all vectors b′ close enough to b; by (i), this implies that all inequalities
obtained from (I) by small enough perturbations of b are unsolvable, i.e., (I) is not almost solvable.

When is a scalar linear inequality a consequence of a given linear vector inequality?


The question we are interested in is as follows: given a linear vector inequality

Ax ≥K b (V)

and a scalar inequality


cT x ≥ d (S)

we want to check whether (S) is a consequence of (V). If K is the nonnegative orthant, the
answer is given by the Inhomogeneous Farkas Lemma:

Inequality (S) is a consequence of a feasible system of linear inequalities Ax ≥ b if


and only if (S) can be obtained from (V) and the trivial inequality 1 ≥ 0 in a linear
fashion (by taking weighted sum with nonnegative weights).

In the general conic case we can get a slightly weaker result:



Proposition 1.7.2 (i) If (S) can be obtained from (V) and from the trivial inequality 1 ≥ 0 by
admissible aggregation, i.e., there exists a weight vector λ ≥K∗ 0 such that

A∗ λ = c, ⟨λ, b⟩ ≥ d,

then (S) is a consequence of (V).
(ii) If (S) is a consequence of a strictly feasible linear vector inequality (V), then (S) can be
obtained from (V) by an admissible aggregation.

The difference between the case of the partial ordering ≥ and a general partial ordering ≥K is
in the word “strictly” in (ii).
Proof of the proposition. (i) is evident (why?). To prove (ii), assume that (V) is strictly feasible and
(S) is a consequence of (V), and consider the conic problem

min_{x,t} { t | Ā(x, t) − b̄ ≡ (Ax − b, d − cT x + t)T ≥K̄ 0 } ,
K̄ = {(y, t) | y ∈ K, t ≥ 0}.

The problem is clearly strictly feasible (choose x to be a strictly feasible solution to (V) and then choose
t to be large enough). The fact that (S) is a consequence of (V) says exactly that the optimal value in
the problem is nonnegative. By the Conic Duality Theorem, the dual problem

max_{λ,µ} { ⟨b, λ⟩ − dµ | A∗ λ − µc = 0, µ = 1, (λ, µ) ≥K̄∗ 0 }

has a feasible solution with the value of the objective ≥ 0. Since, as is easily seen, K̄∗ = {(λ, µ) | λ ∈
K∗ , µ ≥ 0}, the indicated solution satisfies the requirements

λ ≥K∗ 0, A∗ λ = c, ⟨b, λ⟩ ≥ d,

i.e., (S) can be obtained from (V) by an admissible aggregation.
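In the LP case K = R^m_+ , part (i) of the proposition can be replayed mechanically: any λ ≥ 0 with AT λ = c and ⟨λ, b⟩ ≥ d certifies that cT x ≥ d on the feasible set. A Python/numpy sketch with fabricated data (ours):

    import numpy as np
    rng = np.random.default_rng(4)

    A   = rng.normal(size=(4, 2))
    x0  = rng.normal(size=2)
    b   = A @ x0 - rng.uniform(0.1, 1.0, size=4)  # x0 is feasible: A x0 - b >= 0
    lam = rng.uniform(size=4)                     # admissible weights for K = R^4_+
    c   = A.T @ lam
    d   = lam @ b                                 # aggregation gives c^T x >= <lam, b> = d

    print(c @ x0 - d >= 0)                        # True at the feasible point x0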

“Robust solvability status”. Examples 1.7.2 – 1.7.3 make it clear that in the general conic
case we may meet “pathologies” which do not occur in LP. E.g., a feasible and bounded problem
may be unsolvable, the dual to a solvable conic problem may be infeasible, etc. Where do the
pathologies come from? Looking at our “pathological examples”, we arrive at the following
guess: the source of the pathologies is that in these examples, the “solvability status” of the
primal problem is non-robust – it can be changed by small perturbations of the data. This issue
of robustness is very important in modelling, and it deserves a careful investigation.

Data of a conic problem. When asked “What are the data of an LP program min{cT x |
Ax − b ≥ 0}”, everybody will give the same answer: “the objective c, the constraint matrix A
and the right hand side vector b”. Similarly, for a conic problem
 
min cT x | Ax − b ≥K 0 , (CP)

its data, by definition, is the triple (c, A, b), while the sizes of the problem – the dimension n
of x and the dimension m of K – as well as the underlying cone K itself, are considered as the
structure of (CP).

Robustness. A question of primary importance is whether the properties of the program (CP)
(feasibility, solvability, etc.) are stable with respect to perturbations of the data. The reasons
which make this question important are as follows:

• In actual applications, especially those arising in Engineering, the data are normally inex-
act: their true values, even when they “exist in the nature”, are not known exactly when
the problem is processed. Consequently, the results of the processing say something defi-
nite about the “true” problem only if these results are robust with respect to small data
perturbations, i.e., if the properties of (CP) we have discovered are shared not only by the
particular (“nominal”) problem we were processing, but also by all problems with nearby
data.

• Even when the exact data are available, we should take into account that in processing them
computationally we unavoidably add “noise” like rounding errors (you simply cannot load
something like 1/7 into the standard computer). As a result, a real-life computational routine
can recognize only those properties of the input problem which are stable with respect to
small perturbations of the data.

Due to the above reasons, we should study not only whether a given problem (CP) is feasi-
ble/bounded/solvable, etc., but also whether these properties are robust – remain unchanged
under small data perturbations. As it turns out, the Conic Duality Theorem allows us to
recognize “robust feasibility/boundedness/solvability...”.
Let us start with introducing the relevant concepts. We say that (CP) is

• robust feasible, if all “sufficiently close” problems (i.e., those of the same structure
(n, m, K) and with data close enough to those of (CP)) are feasible;

• robust infeasible, if all sufficiently close problems are infeasible;

• robust bounded below, if all sufficiently close problems are bounded below (i.e., their
objectives are bounded below on their feasible sets);

• robust unbounded, if all sufficiently close problems are not bounded;

• robust solvable, if all sufficiently close problems are solvable.
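In simple situations these definitions can at least be probed numerically; below is a crude Python/numpy sketch (entirely our own illustration device: a random-search feasibility probe, not a reliable algorithm), applied to the robust feasible LP system −1 ≤ x ≤ 1:

    import numpy as np
    rng = np.random.default_rng(5)

    def probably_feasible(A, b, trials=5000):
        # crude probe of {x : Ax - b >= 0}; False may only mean "no point was found"
        X = rng.normal(scale=5.0, size=(trials, A.shape[1]))
        return ((X @ A.T - b) >= 0).all(axis=1).any()

    A = np.array([[1.0], [-1.0]]); b = np.array([-1.0, -1.0])   # encodes -1 <= x <= 1
    for _ in range(3):
        dA = 1e-3 * rng.normal(size=A.shape)
        db = 1e-3 * rng.normal(size=b.shape)
        print(probably_feasible(A + dA, b + db))                # True under small perturbations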

Note that a problem which is not robust feasible is not necessarily robust infeasible, since among
close problems there may be both feasible and infeasible ones (look at Example 1.7.2 – slightly
shifting and rotating the plane Im A − b, we may get whatever we want – a feasible bounded problem,
a feasible unbounded problem, an infeasible problem...). This is why we need two kinds of
definitions: one of “robust presence of a property” and one of “robust absence of the same
property”.
Now let us look at necessary and sufficient conditions for the most important robust
forms of the “solvability status”.

Proposition 1.7.3 [Robust feasibility] (CP) is robust feasible if and only if it is strictly feasible,
in which case the dual problem (D) is robust bounded above.

Proof. The statement is nearly tautological. Let us fix δ >K 0. If (CP) is robust feasible, then for small
enough t > 0 the perturbed problem min{cT x | Ax − b − tδ ≥K 0} should be feasible; a feasible solution
to the perturbed problem clearly is a strictly feasible solution to (CP). The converse implication is evident
(a strictly feasible solution to (CP) remains feasible for all problems with close enough data). It remains
to note that if all problems sufficiently close to (CP) are feasible, then their duals, by the Weak Duality
Theorem, are bounded above, so that (D) is robust bounded above.

Proposition 1.7.4 [Robust infeasibility] (CP) is robust infeasible if and only if the system

⟨b, λ⟩ = 1, A∗ λ = 0, λ ≥K∗ 0

is robust feasible, or, which is the same (by Proposition 1.7.3), if and only if the system

⟨b, λ⟩ = 1, A∗ λ = 0, λ >K∗ 0 (1.7.3)

has a solution.
Proof. First assume that (1.7.3) is solvable, and let us prove that all problems sufficiently close to (CP)
are infeasible. Let us fix a solution λ̄ to (1.7.3). Since A is of full column rank, simple Linear Algebra
says that the systems [A′ ]∗ λ = 0 are solvable for all matrices A′ from a small enough neighbourhood U
of A; moreover, the corresponding solution λ(A′ ) can be chosen to satisfy λ(A) = λ̄ and to be continuous
in A′ ∈ U . Since λ(A′ ) is continuous and λ(A) >K∗ 0, we have λ(A′ ) >K∗ 0 for A′ in a neighbourhood of A;
shrinking U appropriately, we may assume that λ(A′ ) >K∗ 0 for all A′ ∈ U . Now, ⟨b, λ̄⟩ = 1; by continuity
reasons, there exist a neighbourhood V of b and a neighbourhood U ′ ⊂ U of A such that for all b′ ∈ V and
all A′ ∈ U ′ one has ⟨b′ , λ(A′ )⟩ > 0.
Thus, we have seen that there exist a neighbourhood U ′ of A and a neighbourhood V of b, along with
a function λ(A′ ), A′ ∈ U ′ , such that

⟨b′ , λ(A′ )⟩ > 0, [A′ ]∗ λ(A′ ) = 0, λ(A′ ) ≥K∗ 0

for all b′ ∈ V and A′ ∈ U ′ . By Proposition 1.7.1.(i) it means that all the problems

min { [c′ ]T x | A′ x − b′ ≥K 0 }

with b′ ∈ V and A′ ∈ U ′ are infeasible, so that (CP) is robust infeasible.


Now let us assume that (CP) is robust infeasible, and let us prove that then (1.7.3) is solvable. Indeed,
by the definition of robust infeasibility, there exist neighbourhoods U of A and V of b such that all vector
inequalities
A′ x − b′ ≥K 0
with A′ ∈ U and b′ ∈ V are unsolvable. It follows that whenever A′ ∈ U and b′ ∈ V , the vector inequality

A′ x − b′ ≥K 0

is not almost solvable (see Proposition 1.7.1). We conclude from Proposition 1.7.1.(ii) that for every
A′ ∈ U and b′ ∈ V there exists λ = λ(A′ , b′ ) such that

⟨b′ , λ(A′ , b′ )⟩ > 0, [A′ ]∗ λ(A′ , b′ ) = 0, λ(A′ , b′ ) ≥K∗ 0.

Now let us choose λ0 >K∗ 0. For all small enough positive ε we have Aε = A + εb[A∗ λ0 ]T ∈ U . Let us
choose an ε with the latter property so small that ε⟨b, λ0 ⟩ > −1 and set A′ = Aε , b′ = b. According
to the previous observation, there exists λ = λ(A′ , b) such that

⟨b, λ⟩ > 0, [A′ ]∗ λ ≡ A∗ [λ + ε⟨b, λ⟩λ0 ] = 0, λ ≥K∗ 0.

Setting λ̄ = λ + ε⟨b, λ⟩λ0 , we get λ̄ >K∗ 0 (since λ ≥K∗ 0, λ0 >K∗ 0 and ε⟨b, λ⟩ > 0), while A∗ λ̄ = 0 and
⟨b, λ̄⟩ = ⟨b, λ⟩(1 + ε⟨b, λ0 ⟩) > 0. Multiplying λ̄ by an appropriate positive factor, we get a solution to (1.7.3).

Now we are able to formulate our main result on “robust solvability”.



Proposition 1.7.5 For a conic problem (CP) the following conditions are equivalent to each
other
(i) (CP) is robust feasible and robust bounded (below);
(ii) (CP) is robust solvable;
(iii) (D) is robust solvable;
(iv) (D) is robust feasible and robust bounded (above);
(v) Both (CP) and (D) are strictly feasible.
In particular, under every one of these equivalent assumptions, both (CP) and (D) are solv-
able with equal optimal values.
Proof. (i) ⇒ (v): If (CP) is robust feasible, it also is strictly feasible (Proposition 1.7.3). If, in addition,
(CP) is robust bounded below, then (D) is robust solvable (by the Conic Duality Theorem); in particular,
(D) is robust feasible and therefore strictly feasible (again Proposition 1.7.3).
(v) ⇒ (ii): The implication is given by the Conic Duality Theorem.
(ii) ⇒ (i): trivial.
We have proved that (i)≡(ii)≡(v). Due to the primal-dual symmetry, we also have proved that
(iii)≡(iv)≡(v).
