1. From linear to conic programming
Summer School
on
Modern Convex Optimization
August 26-30, 2002
FIVE LECTURES
ON
MODERN CONVEX OPTIMIZATION
Arkadi Nemirovski
[email protected],
http://iew3.technion.ac.il/Home/Users/Nemirovski.phtml
Faculty of Industrial Engineering and Management
and Minerva Optimization Center
Technion – Israel Institute of Technology
Technion City, Haifa 32000, Israel
Preface
Γ being a given set of pairs (i, j) of indices i, j. This is a fundamental combinatorial problem of computing the
stability number of a graph; the corresponding “covering story” is as follows:
Assume that we are given n letters which can be sent through a telecommunication channel, say,
n = 256 usual bytes. When passing through the channel, an input letter can be corrupted by errors;
as a result, two distinct input letters can produce the same output and thus cannot necessarily be
distinguished at the receiving end. Let Γ be the set of "dangerous pairs of letters" – pairs (i, j) of
distinct letters i, j which can be converted by the channel into the same output. If we are interested
in error-free transmission, we should restrict the set S of letters we actually use to be independent
– such that no pair (i, j) with i, j ∈ S belongs to Γ. And in order to make the best use of the
capacity of the channel, we are interested in using a maximal – with the maximum possible number
of letters – independent sub-alphabet. It turns out that minus the optimal value in (A) is exactly
the cardinality of such a maximal independent sub-alphabet.
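For tiny alphabets, the stability number in this story can be computed by brute-force enumeration; here is a minimal sketch (the graph Γ below is a made-up example, not taken from the text):

```python
from itertools import combinations

def is_independent(subset, gamma):
    """True if no dangerous pair (i, j) from gamma lies inside the subset."""
    s = set(subset)
    return not any(i in s and j in s for (i, j) in gamma)

def stability_number(n, gamma):
    """Largest independent sub-alphabet, by enumerating subsets (2^n worst case)."""
    for size in range(n, -1, -1):
        if any(is_independent(c, gamma) for c in combinations(range(n), size)):
            return size

# A 5-letter alphabet whose "dangerous pairs" form a 5-cycle:
gamma = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
print(stability_number(5, gamma))  # 2
```

The enumeration touches up to 2^n subsets, which is exactly the kind of blow-up that, as discussed below, makes the n = 256 instance hopeless for methods of this sort.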
(B)   max_x { λmin(A(x)) :  x ≥ 0,  Σ_{i=1}^k xi = 1 },

with A(x) assembled from the linear forms Σj bpj x1j , ..., Σj bpj xkj , p = 1, ..., N,
where λmin (A) denotes the minimum eigenvalue of a symmetric matrix A. This problem is responsible for the
design of a truss (a mechanical construction comprised of thin elastic bars linked to each other, like an electric
mast, a bridge or the Eiffel Tower) capable of withstanding k given loads as well as possible.
When looking at the analytical forms of (A) and (B), it seems that the first problem is easier than the second:
the constraints in (A) are simple explicit quadratic equations, while the constraints in (B) involve much more
complicated functions of the design variables – the eigenvalues of certain matrices depending on the design vector.
The truth, however, is that the first problem is, in a sense, “as difficult as an optimization problem can be”, and
the worst-case computational effort to solve this problem within absolute inaccuracy 0.5 by all known optimization
methods is about 2^n operations; for n = 256 (just 256 design variables corresponding to the "alphabet of bytes"),
the quantity 2^n ≈ 10^77, for all practical purposes, is the same as +∞. In contrast to this, the second problem is
quite "computationally tractable". E.g., for k = 6 (6 loads of interest) and m = 100 (100 degrees of freedom of
the construction) the problem has about 600 variables (more than twice the number in the "byte" version of (A)); however, it
can be reliably solved within 6 accuracy digits in a couple of minutes. The dramatic difference in computational
effort required to solve (A) and (B) finally comes from the fact that (A) is a non-convex optimization problem,
while (B) is convex.
Note that realizing what is easy and what is difficult in Optimization is, aside from its theoretical
importance, extremely important methodologically. Indeed, mathematical models of real world
situations in any case are incomplete and therefore are flexible to some extent. When you know in
advance what you can process efficiently, you perhaps can use this flexibility to build a tractable
(in our context – a convex) model. The “traditional” Optimization did not pay much attention
to complexity and focused on easy-to-analyze purely asymptotical “rate of convergence” results.
From this viewpoint, the most desirable property of f and gi is smoothness (plus, perhaps,
certain “nondegeneracy” at the optimal solution), and not their convexity; choosing between
the above problems (A) and (B), a “traditional” optimizer would, perhaps, prefer the first of
them. I suspect that a non-negligible part of “applied failures” of Mathematical Programming
came from the traditional (I would say, heavily misleading) “order of preferences” in model-
building. Surprisingly, some advanced users (primarily in Control) have realized the crucial
role of convexity much earlier than some members of the Optimization community. Here is a
real story. About 7 years ago, we were working on a certain Convex Optimization method, and
I sent an e-mail to the people maintaining CUTE (a benchmark of test problems for constrained
continuous optimization) requesting the list of convex programs from their collection. The
answer was: “We do not care which of our problems are convex, and this be a lesson for those
developing Convex Optimization techniques.” In their opinion, I am stupid; in my opinion, they
are obsolete. Who is right, this I do not know...
♠ Discovery of interior-point polynomial time methods for "well-structured" generic convex
programs and thorough investigation of these programs.
By itself, the “efficient solvability” of generic convex programs is a theoretical rather than
a practical phenomenon. Indeed, assume that all we know about (P) is that the program is
convex, its objective is called f , the constraints are called gi , and that we can compute f and gi ,
along with their derivatives, at any given point at the cost of M arithmetic operations. In this
case the computational effort for finding an ε-solution turns out to be at least O(1)nM ln(1/ε).
Note that this is a lower complexity bound, and the best known so far upper bound is much
worse: O(1)n(n^3 + M) ln(1/ε). Although the bounds grow "moderately" – polynomially – with
the design dimension n of the program and the required number ln(1/ε) of accuracy digits, from
the practical viewpoint the upper bound becomes prohibitively large already for n like 1000.
This is in striking contrast with Linear Programming, where one can solve routinely problems
with tens and hundreds of thousands of variables and constraints. The reasons for this huge
difference come from the fact that
When solving an LP program, our a priori knowledge is far beyond the fact that the
objective is called f , the constraints are called gi , that they are convex and that we can
compute their values and derivatives at any given point. In LP, we know in advance
what the analytical structure of f and gi is, and we heavily exploit this knowledge
when processing the problem. In fact, all successful LP methods never compute
the values and the derivatives of f and gi – they do something completely different.
One of the most important recent developments in Optimization is realizing the simple fact
that a jump from linear f and gi ’s to “completely structureless” convex f and gi ’s is too long: in-
between these two extremes, there are many interesting and important generic convex programs.
These “in-between” programs, although non-linear, still possess nice analytical structure, and
one can use this structure to develop dedicated optimization methods, the methods which turn
out to be incomparably more efficient than those exploiting solely the convexity of the program.
The aforementioned “dedicated methods” are Interior Point polynomial time algorithms,
and the most important “well-structured” generic convex optimization programs are those of
Linear, Conic Quadratic and Semidefinite Programming; the last two entities simply did not
exist as established research subjects just 15 years ago. In my opinion, the discovery of Interior
Point methods and of non-linear "well-structured" generic convex programs, along with the
subsequent progress in these novel research areas, is one of the most impressive achievements in
Mathematical Programming. It is my pleasure to add that one of the key roles in these break-
through developments, and definitely the key role as far as nonlinear programs are concerned,
was and is played by Professor Yuri Nesterov from CORE.
♠ I have outlined the most revolutionary, in my appreciation, changes in the theoretical core
of Mathematical Programming in the last 15-20 years. During this period, we have witnessed
perhaps less dramatic, but still quite important progress in the methodological and application-
related areas as well. The major novelty here is a certain shift from the applications traditional
for Operations Research in Industrial Engineering (production planning, etc.) to applications in
"genuine" Engineering. I believe it is completely fair to say that the theory and methods
of Convex Optimization, especially those of Semidefinite Programming, have become a kind
of new paradigm in Control and are becoming more and more frequently used in Mechanical
Engineering, Design of Structures, Medical Imaging, etc.
The aim of the course is to outline some of the novel research areas which have arisen in
Optimization during the past decade or so. I intend to focus solely on Convex Programming,
specifically, on
• Conic Programming, with emphasis on the most important particular cases – those of
Linear, Conic Quadratic and Semidefinite Programming (LP, CQP and SDP, respectively).
Here the focus will be on
Acknowledgements. The first four lectures of the five comprising the course are based upon
the recent book
Ben-Tal, A., Nemirovski, A., Lectures on Modern Convex Optimization: Analysis, Algo-
rithms, Engineering Applications, MPS-SIAM Series on Optimization, SIAM, Philadelphia,
2001.
Arkadi Nemirovski,
Haifa, Israel, May 2002.
Contents
3 Semidefinite Programming
3.1 Semidefinite cone and Semidefinite programs
3.1.1 Preliminaries
3.2 What can be expressed via LMI's?
3.3 Applications of Semidefinite Programming in Engineering
3.3.1 Dynamic Stability in Mechanics
3.3.2 Design of chips and Boyd's time constant
3.3.3 Lyapunov stability analysis/synthesis
3.4 Semidefinite relaxations of intractable problems
3.4.1 Semidefinite relaxations of combinatorial problems
3.4.2 Matrix Cube Theorem and interval stability analysis/synthesis
3.4.3 Robust Quadratic Programming
3.5 Appendix: S-Lemma
4 Polynomial Time Interior Point algorithms for LP, CQP and SDP
4.1 Complexity of Convex Programming
4.1.1 Combinatorial Complexity Theory
4.1.2 Complexity in Continuous Optimization
4.1.3 Difficult continuous optimization problems
4.2 Interior Point Polynomial Time Methods for LP, CQP and SDP
4.2.1 Motivation
4.2.2 Interior Point methods
4.2.3 But...
4.3 Interior point methods for LP, CQP, and SDP: building blocks
4.3.1 Canonical cones and canonical barriers
4.3.2 Elementary properties of canonical barriers
4.4 Primal-dual pair of problems and primal-dual central path
4.4.1 The problem(s)
4.4.2 The central path(s)
4.5 Tracing the central path
4.5.1 The path-following scheme
4.5.2 Speed of path-tracing
4.5.3 The primal and the dual path-following methods
4.5.4 The SDP case
4.6 Complexity bounds for LP, CQP, SDP
4.6.1 Complexity of LP_b
4.6.2 Complexity of CQP_b
4.6.3 Complexity of SDP_b
4.7 Concluding remarks
where
• x ∈ Rn is the design vector
• c ∈ Rn is a given vector of coefficients of the objective function cT x
• A is a given m × n constraint matrix, and b ∈ Rm is a given right hand side of the
constraints.
(LP) is called
– feasible, if its feasible set
F = {x | Ax − b ≥ 0}
is nonempty; a point x ∈ F is called a feasible solution to (LP);
– bounded below, if it is either infeasible, or its objective cT x is bounded below on F.
For a feasible bounded below problem (LP), the quantity
c∗ ≡ inf { cT x : Ax − b ≥ 0 }
is called the optimal value of the problem. For an infeasible problem, we set c∗ = +∞,
while for a feasible problem unbounded below we set c∗ = −∞.
(LP) is called solvable, if it is feasible, bounded below and the optimal value is attained, i.e.,
there exists x ∈ F with cT x = c∗ . An x of this type is called an optimal solution to (LP).
A priori it is unclear whether a feasible and bounded below LP program is solvable: why should
the infimum be achieved? It turns out, however, that a feasible and bounded below program
(LP) always is solvable. This nice fact (we shall establish it later) is specific for LP. Indeed, a
very simple nonlinear optimization program
min { 1/x : x ≥ 1 }
is feasible and bounded below, but it is not solvable.
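These definitions are easy to exercise numerically; a minimal sketch with scipy on a made-up instance (linprog expects constraints in the form A_ub x ≤ b_ub, so Ax − b ≥ 0 is passed with flipped signs):

```python
import numpy as np
from scipy.optimize import linprog

# (LP): min c^T x  s.t.  Ax - b >= 0, with x free in sign.
c = np.array([1.0, 1.0])
A = np.eye(2)                  # constraints: x1 >= 1, x2 >= 2
b = np.array([1.0, 2.0])

res = linprog(c, A_ub=-A, b_ub=-b, bounds=(None, None))
print(res.x)    # an optimal solution
print(res.fun)  # the optimal value c*
```

Here the program is feasible, bounded below and solvable: c∗ = 3, attained at x = (1, 2).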
how to find a systematic way to bound from below its optimal value c∗ ?
Why this is an important question, and how the answer helps to deal with LP, will be seen
in the sequel. For the time being, let us just believe that the question is worthy of the effort.
A trivial answer to the posed question is: solve (LP) and look at its optimal value.
There is, however, a smarter and much more instructive way to answer our question. Just to
get an idea of this way, let us look at the following example:
min { x1 + x2 + ... + x2002 :   x1 + 2x2 + ... + 2001x2001 + 2002x2002 − 1 ≥ 0,
                                2002x1 + 2001x2 + ... + 2x2001 + x2002 − 100 ≥ 0,
                                ..... }
We claim that the optimal value in the problem is ≥ 101/2003. How could one certify this bound?
This is immediate: add the first two constraints to get the inequality
2003(x1 + x2 + ... + x2002) − 101 ≥ 0,
and divide the resulting inequality by 2003. LP duality is nothing but a straightforward gener-
alization of this simple trick.
alization of this simple trick.
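The certificate can be checked numerically. A sketch with scipy, under the assumption that only the two displayed constraints are kept (the elided ones are dropped) and the variables are free in sign; solving this relaxation already reproduces the bound 101/2003:

```python
import numpy as np
from scipy.optimize import linprog

n = 2002
i = np.arange(1, n + 1)
# First constraint: sum_i i*x_i >= 1; second: sum_i (2003 - i)*x_i >= 100.
A = np.vstack([i, 2003 - i]).astype(float)
b = np.array([1.0, 100.0])

res = linprog(np.ones(n), A_ub=-A, b_ub=-b, bounds=(None, None))
print(res.fun, 101 / 2003)  # both ≈ 0.0504...
```

That the relaxation attains exactly 101/2003 can also be seen by hand: the only nonnegative y with y1·i + y2·(2003 − i) = 1 for all i is y1 = y2 = 1/2003, giving the dual value (1 + 100)/2003.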
(?) How to certify that (S) has, or does not have, a solution.
Imagine that you are very smart and know the correct answer to (?); how could you convince
somebody that your answer is correct? What could be an “evident for everybody” certificate of
the validity of your answer?
If your claim is that (S) is solvable, a certificate could be just to point out a solution x∗ to
(S). Given this certificate, one can substitute x∗ into the system and check whether x∗ indeed
is a solution.
Assume now that your claim is that (S) has no solutions. What could be a "simple certificate"
of this claim? How could one certify a negative statement? This is a highly nontrivial problem
not just for mathematics; for example, in criminal law: how should someone accused of a murder
prove his innocence? The "real life" answer to the question "how to certify a negative statement"
is discouraging: such a statement normally cannot be certified (this is where the rule "a person
is presumed innocent until proven guilty" comes from). In mathematics, however, the situation
is different: in some cases there exist “simple certificates” of negative statements. E.g., in order
to certify that (S) has no solutions, it suffices to demonstrate that a consequence of (S) is a
contradictory inequality such as
−1 ≥ 0.
For example, assume that λi , i = 1, ..., m, are nonnegative weights. Combining inequalities from
(S) with these weights, we come to the inequality
Σ_{i=1}^m λi fi (x) Ω 0      (Cons(λ))
where Ω is either ” > ” (this is the case when the weight of at least one strict inequality from
(S) is positive), or " ≥ " (otherwise). Since the resulting inequality, due to its origin, is a
consequence of the system (S), i.e., it is satisfied by every solution to (S), it follows that if
(Cons(λ)) has no solutions at all, we can be sure that (S) has no solution. Whenever this is the
case, we may treat the corresponding vector λ as a “simple certificate” of the fact that (S) is
infeasible.
Let us see what the outlined approach means when (S) is comprised of linear inequal-
ities:

(S) :   { aTi x Ωi bi , i = 1, ..., m },   where for every i the relation Ωi is either " > " or " ≥ ".
Here the “combined inequality” is linear as well:
(Cons(λ)) :   ( Σ_{i=1}^m λi ai )T x  Ω  Σ_{i=1}^m λi bi
(Ω is ” > ” whenever λi > 0 for at least one i with Ωi = ” > ”, and Ω is ” ≥ ” otherwise). Now,
when can a linear inequality
dT x Ω e
be contradictory? Of course, it can happen only when d = 0. Whether in this case the inequality
is contradictory depends on the relation Ω: if Ω = " > ", then the inequality is
contradictory if and only if e ≥ 0, and if Ω = " ≥ ", it is contradictory if and only if e > 0. We
have established the following simple result:
with n-dimensional vector of unknowns x. Let us associate with (S) two systems of linear
inequalities and equations with m-dimensional vector of unknowns λ:
TI :
(a) λ ≥ 0;
(b) Σ_{i=1}^m λi ai = 0;
(cI ) Σ_{i=1}^m λi bi ≥ 0;
(dI ) Σ_{i: Ωi = " > "} λi > 0.

TII :
(a) λ ≥ 0;
(b) Σ_{i=1}^m λi ai = 0;
(cII ) Σ_{i=1}^m λi bi > 0.
Assume that at least one of the systems TI , TII is solvable. Then the system (S) is infeasible.
Proposition 1.2.1 says that in some cases it is easy to certify infeasibility of a linear system of
inequalities: a “simple certificate” is a solution to another system of linear inequalities. Note,
however, that the existence of a certificate of this latter type is so far only a sufficient,
but not a necessary, condition for the infeasibility of (S). A fundamental result in the theory of
linear inequalities is that the sufficient condition in question is in fact also necessary:
Theorem 1.2.1 [General Theorem on Alternative] In the notation from Proposition 1.2.1, sys-
tem (S) has no solutions if and only if either TI , or TII , or both these systems, are solvable.
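The Theorem suggests a numeric recipe: to certify infeasibility of a system of nonstrict inequalities Ax ≥ b, search for λ ≥ 0 with AT λ = 0 and bT λ > 0 (system TII); normalizing bT λ = 1 turns the search into an LP feasibility problem. A sketch with scipy on a made-up infeasible system:

```python
import numpy as np
from scipy.optimize import linprog

# Infeasible system of nonstrict inequalities Ax >= b:  x >= 1  and  -x >= 0.
A = np.array([[1.0], [-1.0]])
b = np.array([1.0, 0.0])

# T_II certificate: lam >= 0, A^T lam = 0, b^T lam > 0 (normalized to 1).
A_eq = np.vstack([A.T, b[np.newaxis, :]])
res = linprog(np.zeros(len(b)), A_eq=A_eq, b_eq=[0.0, 1.0], bounds=(0, None))
print(res.status, res.x)  # status 0 means a certificate lam was found
```

For this system λ = (1, 1) works: adding the two inequalities gives 0 ≥ 1, a contradiction.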
There are numerous proofs of the Theorem on Alternative; to my taste, the most instructive one is to
reduce the Theorem to its particular case – the Homogeneous Farkas Lemma:
[Homogeneous Farkas Lemma] A homogeneous nonstrict linear inequality
aT x ≤ 0
is a consequence of a system of homogeneous nonstrict linear inequalities
aTi x ≤ 0, i = 1, ..., m
if and only if it can be obtained from the system by taking a weighted sum with nonnegative
weights:
(a) aTi x ≤ 0, i = 1, ..., m ⇒ aT x ≤ 0
⇔                                            (1.2.1)
(b) ∃λi ≥ 0 : a = Σi λi ai .
The reduction of TA to HFL is easy. As for the HFL itself, there are, essentially, two ways to prove the
statement:
• The “quick and dirty” one based on separation arguments, which is as follows:
1. First, we demonstrate that if A is a nonempty closed convex set in Rn and a is a point from
Rn \A, then a can be strongly separated from A by a linear form: there exists x ∈ Rn such
that
xT a < inf_{b∈A} xT b.      (1.2.2)
To this end it suffices to verify that
(a) the optimization program
min { ‖a − b‖ : b ∈ A }
has a solution b∗ ;
(b) Setting x = b∗ − a, one ensures (1.2.2).
Both (a) and (b) are immediate.
2. Second, we demonstrate that the set
A = {b : ∃λ ≥ 0 : b = Σ_{i=1}^m λi ai }
– the cone spanned by the vectors a1 , ..., am – is convex (which is immediate) and closed (the
proof of this crucial fact also is not difficult).
3. Combining the above facts, we immediately see that
— either a ∈ A, i.e., (1.2.1.b) holds,
— or there exists x such that xT a < inf_{λ≥0} xT Σi λi ai .
The latter inf is finite if and only if xT ai ≥ 0 for all i, and in this case the inf is 0, so that
the “or” statement says exactly that there exists x with aTi x ≥ 0, aT x < 0, or, which is the
same, that (1.2.1.a) does not hold.
Thus, among the statements (1.2.1.a) and the negation of (1.2.1.b) at least one (and, as it
is immediately seen, at most one as well) always is valid, which is exactly the equivalence
(1.2.1).
• “Advanced” proofs based purely on Linear Algebra facts. The advantage of these purely Linear
Algebra proofs is that they, in contrast to the outlined separation-based proof, do not use the
completeness of Rn as a metric space and thus work when we pass from systems with real coefficients
and unknowns to systems with rational (or algebraic) coefficients. As a result, an advanced proof
allows one to establish the Theorem on Alternative for the case when the coefficients and unknowns in
(S), TI , TII are restricted to belong to a given "real field" (e.g., are rational).
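The Homogeneous Farkas Lemma also admits a direct numeric check: a belongs to the cone spanned by a1 , ..., am exactly when nonnegative least squares drives the residual of Aλ ≈ a to zero. A minimal sketch (the vectors are made-up examples):

```python
import numpy as np
from scipy.optimize import nnls

# Cone spanned by a1 = (1, 0) and a2 = (1, 1), as columns of A.
A = np.column_stack([[1.0, 0.0], [1.0, 1.0]])

lam, resid = nnls(A, np.array([3.0, 1.0]))     # (3,1) = 2*a1 + 1*a2: in the cone
print(lam, resid)                              # zero residual -> (1.2.1.b) holds
lam2, resid2 = nnls(A, np.array([-1.0, 0.0]))  # (-1,0): outside the cone
print(resid2)                                  # positive residual
```

A positive residual corresponds, via the separation argument above, to the existence of an x with aTi x ≥ 0 for all i but aT x < 0.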
We formulate here explicitly two very useful principles following from the Theorem on Al-
ternative:
A. A system of linear inequalities
aTi x Ωi bi , i = 1, ..., m
has no solutions if and only if one can combine the inequalities of the system in
a linear fashion (i.e., multiplying the inequalities by nonnegative weights, adding
the results and passing, if necessary, from an inequality aT x > b to the inequality
aT x ≥ b) to get a contradictory inequality, namely, either the inequality 0T x ≥ 1, or
the inequality 0T x > 0.
B. A linear inequality
aT0 x Ω0 b0
The inequality (e) is clearly a consequence of (a) – (d). However, if we extend the system of
inequalities (a) – (d) by all "trivial" (i.e., identically true) linear and quadratic inequalities
with 2 variables, like 0 > −1, u^2 + v^2 ≥ 0, u^2 + 2uv + v^2 ≥ 0, u^2 − uv + v^2 ≥ 0, etc.,
and ask whether (e) can be derived in a linear fashion from the inequalities of the extended
system, the answer will be negative. Thus, Principle A fails to be true already for quadratic
inequalities (which is a great sorrow – otherwise there would be no difficult problems at all!)
We are about to use the Theorem on Alternative to obtain the basic results of the LP duality
theory.
is the desire to generate, in a systematic way, lower bounds on the optimal value c∗ of (LP).
An evident way to bound from below a given function f (x) on the domain given by a system of
inequalities
gi (x) ≥ bi , i = 1, ..., m,      (1.2.3)
is offered by what is called the Lagrange duality and is as follows:
Lagrange Duality:
• Let us look at all inequalities which can be obtained from (1.2.3) by linear aggre-
gation, i.e., at the inequalities of the form
Σi yi gi (x) ≥ Σi yi bi      (1.2.4)
with the “aggregation weights” yi ≥ 0. Note that the inequality (1.2.4), due to its
origin, is valid on the entire set X of solutions of (1.2.3).
• Depending on the choice of aggregation weights, it may happen that the left hand
side in (1.2.4) is ≤ f (x) for all x ∈ Rn . Whenever it is the case, the right hand side
Σi yi bi of (1.2.4) is a lower bound on f in X.
Indeed, on X the quantity Σi yi bi is a lower bound on Σi yi gi (x), and for y in question
the latter function of x is everywhere ≤ f (x).
It follows that
• The optimal value in the problem
max_y { Σi yi bi :  y ≥ 0  (a),   Σi yi gi (x) ≤ f (x) ∀x ∈ Rn  (b) }      (1.2.5)
is a lower bound on the values of f on the set of solutions to the system (1.2.3).
Let us see what happens with the Lagrange duality when f and gi are homogeneous linear
functions: f (x) = cT x, gi (x) = aTi x. In this case, the requirement (1.2.5.b) merely says that
c = Σi yi ai (or, which is the same, AT y = c, due to the origin of A). Thus, problem (1.2.5)
becomes the Linear Programming problem

max_y { bT y : AT y = c, y ≥ 0 },      (LP∗ )
has no solution. We know by the Theorem on Alternative that the latter fact means that some
other system of linear inequalities and equations (more exactly, at least one of a certain pair of systems) does
have a solution. More precisely,
(*) (Sa ) has no solutions if and only if at least one of the following two systems with
m + 1 unknowns:

TI :
(a) λ = (λ0 , λ1 , ..., λm ) ≥ 0;
(b) −λ0 c + Σ_{i=1}^m λi ai = 0;
(cI ) −λ0 a + Σ_{i=1}^m λi bi ≥ 0;
(dI ) λ0 > 0,

or

TII :
(a) λ = (λ0 , λ1 , ..., λm ) ≥ 0;
(b) −λ0 c + Σ_{i=1}^m λi ai = 0;
(cII ) −λ0 a + Σ_{i=1}^m λi bi > 0

– has a solution.
Now assume that (LP) is feasible. We claim that under this assumption (Sa ) has no solutions
if and only if TI has a solution.
The implication ”TI has a solution ⇒ (Sa ) has no solution” is readily given by the above
remarks. To verify the inverse implication, assume that (Sa ) has no solutions and the system
Ax ≥ b has a solution, and let us prove that then TI has a solution. If TI has no solution, then
by (*) TII has a solution and, moreover, λ0 = 0 for (every) solution to TII (since a solution
to the latter system with λ0 > 0 solves TI as well). But the fact that TII has a solution λ
with λ0 = 0 is independent of the values of a and c; if this were the case, it would
mean, by the same Theorem on Alternative, that, e.g., the following instance of (Sa ):
0T x > −1, Ax ≥ b
has no solutions. The latter means that the system Ax ≥ b has no solutions – a contradiction
with the assumption that (LP) is feasible.
Now, if TI has a solution, this system has a solution with λ0 = 1 as well (to see this, pass from
a solution λ to the one λ/λ0 ; this construction is well-defined, since λ0 > 0 for every solution
to TI ). Now, an (m + 1)-dimensional vector λ = (1, y) is a solution to TI if and only if the
m-dimensional vector y solves the system of linear inequalities and equations
y ≥ 0;
AT y ≡ Σ_{i=1}^m yi ai = c;      (D)
bT y ≥ a.
Summarizing our observations, we come to the following result.
Proposition 1.2.2 Assume that system (D) associated with the LP program (LP) has a solution
(y, a). Then a is a lower bound on the optimal value in (LP). Vice versa, if (LP) is feasible and
a is a lower bound on the optimal value of (LP), then a can be extended by a properly chosen
m-dimensional vector y to a solution to (D).
We see that the entity responsible for lower bounds on the optimal value of (LP) is the system
(D): every solution to the latter system induces a bound of this type, and in the case when
(LP) is feasible, all lower bounds can be obtained from solutions to (D). Now note that if
(y, a) is a solution to (D), then the pair (y, bT y) also is a solution to the same system, and the
lower bound bT y on c∗ is not worse than the lower bound a. Thus, as far as lower bounds on
c∗ are concerned, we lose nothing by restricting ourselves to the solutions (y, a) of (D) with
a = bT y; the best lower bound on c∗ given by (D) is therefore the optimal value of the problem
max_y { bT y : AT y = c, y ≥ 0 }, which is nothing but the dual to (LP) problem (LP∗ ). Note that
(LP∗ ) is also a Linear Programming program.
All we know about the dual problem to the moment is the following:
Proposition 1.2.3 Whenever y is a feasible solution to (LP∗ ), the corresponding value of the
dual objective bT y is a lower bound on the optimal value c∗ in (LP). If (LP) is feasible, then for
every a ≤ c∗ there exists a feasible solution y of (LP∗ ) with bT y ≥ a.
Then
1) The duality is symmetric: the problem dual to dual is equivalent to the primal;
2) The value of the dual objective at every dual feasible solution is ≤ the value of the primal
objective at every primal feasible solution
3) The following 5 properties are equivalent to each other:
(i) The primal is feasible and bounded below;
(ii) The dual is feasible and bounded above;
(iii) The primal is solvable;
(iv) The dual is solvable;
(v) Both the primal and the dual are feasible.
Whenever (i) ≡ (ii) ≡ (iii) ≡ (iv) ≡ (v) is the case, the optimal values of the primal and the dual
problems are equal to each other.
Proof. 1) is quite straightforward: writing the dual problem (LP∗ ) in our standard form, we
get
min_y { −bT y :  [ Im ; AT ; −AT ] y − [ 0 ; c ; −c ] ≥ 0 },
18 LECTURE 1. FROM LINEAR TO CONIC PROGRAMMING
where Im is the m-dimensional unit matrix. Applying the duality transformation to the latter
problem, we come to the problem

max_{ξ,η,ζ} { 0T ξ + cT η + (−c)T ζ :  ξ ≥ 0,  η ≥ 0,  ζ ≥ 0,  ξ + Aη − Aζ = −b },
(i)⇒(iv): If the primal is feasible and bounded below, its optimal value c∗ (which
of course is a lower bound on itself) can, by Proposition 1.2.3, be (non-strictly)
majorized by a quantity bT y ∗ , where y ∗ is a feasible solution to (LP∗ ). In the
situation in question, of course, bT y ∗ = c∗ (by already proved item 2)); on the other
hand, in view of the same Proposition 1.2.3, the optimal value in the dual is ≤ c∗ . We
conclude that the optimal value in the dual is attained and is equal to the optimal
value in the primal.
(iv)⇒(ii): evident;
(ii)⇒(iii): This implication, in view of the primal-dual symmetry, follows from the
implication (i)⇒(iv).
(iii)⇒(i): evident.
We have seen that (i)≡(ii)≡(iii)≡(iv) and that the first (and consequently each) of
these 4 equivalent properties implies that the optimal value in the primal problem
is equal to the optimal value in the dual one. All that remains is to prove the
equivalence between (i)–(iv), on one hand, and (v), on the other hand. This is
immediate: (i)–(iv), of course, imply (v); vice versa, in the case of (v) the primal is
not only feasible, but also bounded below (this is an immediate consequence of the
feasibility of the dual problem, see 2)), and (i) follows.
An immediate corollary of the LP Duality Theorem is the following necessary and sufficient
optimality condition in LP:
Theorem 1.2.3 [Necessary and sufficient optimality conditions in linear programming] Con-
sider an LP program (LP) along with its dual (LP∗ ). A pair (x, y) of primal and dual feasible
solutions is comprised of optimal solutions to the respective problems if and only if
cT x − bT y = 0      [zero duality gap]
and if and only if
yi [Ax − b]i = 0, i = 1, ..., m.      [complementary slackness]
Indeed, the “zero duality gap” optimality condition is an immediate consequence of the fact
that the value of primal objective at every primal feasible solution is ≥ the value of the
dual objective at every dual feasible solution, while the optimal values in the primal and the
dual are equal to each other, see Theorem 1.2.2. The equivalence between the “zero duality
gap” and the “complementary slackness” optimality conditions is given by the following
computation: whenever x is primal feasible and y is dual feasible, the products yi [Ax − b]i ,
i = 1, ..., m, are nonnegative, while the sum of these products is precisely the duality gap:
Σi yi [Ax − b]i = yT (Ax − b) = (AT y)T x − bT y = cT x − bT y.
Thus, the duality gap can vanish at a primal-dual feasible pair (x, y) if and only if all products
yi [Ax − b]i for this pair are zeros.
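Both optimality conditions are easy to inspect numerically: solve the primal and the dual as two LPs and look at the gap and at the products yi [Ax − b]i. A sketch with scipy on a made-up instance:

```python
import numpy as np
from scipy.optimize import linprog

# Primal (LP): min c^T x  s.t.  Ax >= b, x free.
c = np.array([1.0, 2.0])
A = np.array([[1.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
b = np.array([1.0, 0.0, 0.0])
primal = linprog(c, A_ub=-A, b_ub=-b, bounds=(None, None))

# Dual (LP*): max b^T y  s.t.  A^T y = c, y >= 0, solved as min -b^T y.
dual = linprog(-b, A_eq=A.T, b_eq=c, bounds=(0, None))

gap = primal.fun - (-dual.fun)            # zero duality gap
products = dual.x * (A @ primal.x - b)    # complementary slackness
print(gap, products)
```

For this instance both the gap and every product come out (numerically) zero, in accordance with Theorem 1.2.3.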
to its nonlinear extensions, we should expect to encounter some nonlinear components in the
problem. The traditional way here is to say: “Well, in (LP) there are a linear objective function
f (x) = cT x and inequality constraints fi (x) ≥ bi with linear functions fi (x) = aTi x, i = 1, ..., m.
Let us allow some/all of these functions f, f1 , ..., fm to be nonlinear.” In contrast to this tra-
ditional way, we intend to keep the objective and the constraints linear, but introduce “nonlin-
earity” in the inequality sign ≥.
In the latter relation, we again meet the inequality sign ≥, but now it stands for the
"arithmetic ≥" – a well-known relation between real numbers. The above "coordinate-wise"
partial ordering of vectors in Rm satisfies a number of basic properties of the standard ordering
of reals; namely, for all vectors a, b, c, d, ... ∈ Rm one has
1. Reflexivity: a ≥ a;
2. Anti-symmetry: if both a ≥ b and b ≥ a, then a = b;
3. Transitivity: if both a ≥ b and b ≥ c, then a ≥ c;
4. Compatibility with linear operations:
(a) Homogeneity: if a ≥ b and λ is a nonnegative real, then λa ≥ λb;
(b) Additivity: if both a ≥ b and c ≥ d, then a + c ≥ b + d.
• A significant part of the nice features of LP programs comes from the fact that the vector
inequality ≥ in the constraint of (LP) satisfies the properties 1. – 4.;
• The standard inequality ” ≥ ” is neither the only possible, nor the only interesting way to
define the notion of a vector inequality fitting the axioms 1. – 4.
As a result,
A generic optimization problem which looks exactly the same as (LP), up to the
fact that the inequality ≥ in (LP) is now replaced with an ordering which differs
from the component-wise one, inherits a significant part of the properties of LP
problems. Specifying properly the ordering of vectors, one can obtain from (LP)
generic optimization problems covering many important applications which cannot
be treated by the standard LP.
To the moment what is said is just a declaration. Let us look how this declaration comes to
life.
We start with clarifying the “geometry” of a “vector inequality” satisfying the axioms 1. –
4. Thus, we consider vectors from a finite-dimensional Euclidean space E with an inner product
⟨·, ·⟩ and assume that E is equipped with a partial ordering, denoted by ≽: in other
words, we say what are the pairs of vectors a, b from E linked by the inequality a ≽ b. We call
the ordering “good” if it obeys the axioms 1. – 4., and we are interested to understand what
these good orderings are.
Our first observation is:
A. A good partial ordering ≽ is completely identified by the set K of ≽-nonnegative vectors:
K = {a ∈ E | a ≽ 0}.
Namely,
a ≽ b ⇔ a − b ≽ 0 [⇔ a − b ∈ K].
Indeed, let a ≽ b. By 1. we have −b ≽ −b, and by 4.(b) we may add the latter
inequality to the former one to get a − b ≽ 0. Vice versa, if a − b ≽ 0, then, adding
to this inequality the inequality b ≽ b, we get a ≽ b.
The set K in Observation A cannot be arbitrary. It is easy to verify that it must be a pointed
convex cone, i.e., it must satisfy the following conditions:
1. K is nonempty and closed under addition:
a, a′ ∈ K ⇒ a + a′ ∈ K;
2. K is a conic set:
a ∈ K, λ ≥ 0 ⇒ λa ∈ K.
3. K is pointed:
a ∈ K and − a ∈ K ⇒ a = 0.
Geometrically: K does not contain straight lines passing through the origin.
Thus, every nonempty pointed convex cone K in E induces a partial ordering on E which
satisfies the axioms 1. – 4. We denote this ordering by ≥K :
a ≥K b ⇔ a − b ≥K 0 ⇔ a − b ∈ K.
What is the cone responsible for the standard coordinate-wise ordering ≥ on E = Rm we have
started with? The answer is clear: this is the cone comprised of vectors with nonnegative entries
– the nonnegative orthant
Rm+ = {x = (x1, ..., xm)T ∈ Rm : xi ≥ 0, i = 1, ..., m}.
(Thus, in order to express the fact that a vector a is greater than or equal to, in the component-
wise sense, a vector b, we were supposed to write a ≥Rm+ b. However, we are not going to be
that formal and shall use the standard shorthand notation a ≥ b.)
The nonnegative orthant Rm+ is not just a pointed convex cone; it possesses two useful
additional properties:
I. The cone is closed: if a sequence of vectors ai from the cone has a limit, the latter also
belongs to the cone.
II. The cone possesses a nonempty interior: there exists a vector such that a ball of positive
radius centered at the vector is contained in the cone.
These additional properties are very important. For example, I is responsible for the possi-
bility to pass to the term-wise limit in an inequality:
ai ≥ bi ∀i, ai → a, bi → b as i → ∞ ⇒ a ≥ b.
It makes sense to restrict ourselves with good partial orderings coming from cones K sharing
the properties I, II. Thus,
From now on, speaking about good partial orderings ≥K , we always assume that the
underlying set K is a pointed and closed convex cone with a nonempty interior.
Note that the closedness of K makes it possible to pass to limits in ≥K -inequalities:
ai ≥K bi , ai → a, bi → b as i → ∞ ⇒ a ≥K b.
The nonemptiness of the interior of K allows to define, along with the “non-strict” inequality
a ≥K b, also the strict inequality according to the rule
a >K b ⇔ a − b ∈ int K,
where int K is the interior of the cone K. E.g., the strict coordinate-wise inequality a >Rm+ b
(shorthand: a > b) simply says that the coordinates of a are strictly greater, in the usual
arithmetic sense, than the corresponding coordinates of b.
Examples. The partial orderings we are especially interested in are given by the following
cones:
• The nonnegative orthant Rm+ in Rm;
• The Lorentz (or the second-order, or, less scientifically, the ice-cream) cone
Lm = { x = (x1, ..., xm−1, xm)T ∈ Rm : xm ≥ √(x1² + ... + xm−1²) };
• The semidefinite cone Sm+. This cone “lives” in the space E = Sm of m × m
symmetric matrices (equipped with the Frobenius inner product ⟨A, B⟩ = Tr(AB) =
Σi,j AijBij) and consists of all m × m matrices A which are positive semidefinite, i.e.,
A = AT; xT Ax ≥ 0 ∀x ∈ Rm.
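Membership in each of these cones is easy to test numerically. A sketch (the helper names are ours, not the text's; the semidefinite test is specialized to 2 × 2 matrices via the trace/determinant criterion):

```python
import math

def in_orthant(x, tol=1e-9):
    """Membership in the nonnegative orthant R^m_+."""
    return all(xi >= -tol for xi in x)

def in_lorentz(x, tol=1e-9):
    """Membership in the ice-cream cone L^m: x_m >= sqrt(x_1^2 + ... + x_{m-1}^2)."""
    return x[-1] >= math.sqrt(sum(xi * xi for xi in x[:-1])) - tol

def in_psd2(A, tol=1e-9):
    """Membership in S^2_+: a symmetric 2x2 matrix is PSD iff its trace and
    determinant are both nonnegative."""
    (a, b), (c, d) = A
    return abs(b - c) <= tol and a + d >= -tol and a * d - b * c >= -tol

assert in_orthant([0.0, 1.0, 2.0]) and not in_orthant([-1.0, 1.0])
assert in_lorentz([3.0, 4.0, 5.0]) and not in_lorentz([3.0, 4.0, 4.9])
assert in_psd2([[2.0, 1.0], [1.0, 2.0]]) and not in_psd2([[1.0, 2.0], [2.0, 1.0]])
```

For general m × m matrices one would, of course, test positive semidefiniteness via eigenvalues or a Cholesky attempt rather than the 2 × 2 shortcut used here.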
We shall refer to the optimization problem
min { cT x | Ax − b ≥K 0 } (CP)
as a conic problem associated with the cone K. Note that the only
difference between this program and an LP problem is that the latter deals with the particular
choice E = Rm, K = Rm+. With the formulation (CP), we get a possibility to cover a much
wider spectrum of applications which cannot be captured by LP; we shall look at numerous
examples in the sequel.
⟨λ, Ax⟩ ≡ λT Ax ≥ λT b (Cons(λ))
with weight vectors λ ≥ 0. By its origin, an inequality of this type is a consequence of the system
of constraints Ax ≥ b of (LP), i.e., it is satisfied at every solution to the system. Consequently,
whenever we are lucky to get, as the left hand side of (Cons(λ)), the expression cT x, i.e.,
whenever a nonnegative weight vector λ satisfies the relation
AT λ = c,
the inequality (Cons(λ)) yields a lower bound bT λ on the optimal value in (LP). And the dual
problem
max { bT λ | λ ≥ 0, AT λ = c }
was nothing but the problem of finding the best lower bound one can get in this fashion.
The same scheme can be used to develop the dual to a conic problem
min { cT x | Ax ≥K b } , K ⊂ E. (CP)
Here the only step which needs clarification is the following one:
(?) What are the “admissible” weight vectors λ, i.e., the vectors such that the scalar
inequality
⟨λ, Ax⟩ ≥ ⟨λ, b⟩
is a consequence of the vector inequality Ax ≥K b?
Example 1.6.1 Consider the ordering ≥L3 on E = R3 given by the 3-dimensional ice-cream
cone:
(a1, a2, a3)T ≥L3 0 ⇔ a3 ≥ √(a1² + a2²).
The inequality
(−1, −1, 2)T ≥L3 0
is valid; however, aggregating this inequality with the aid of the positive weight vector
λ = (1, 1, 0.1)T, we get the false inequality
−1.8 ≥ 0.
Thus, not every nonnegative weight vector is admissible for the partial ordering ≥L3 .
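The arithmetic of this example can be replayed directly (a small sketch in Python):

```python
import math

a = [-1.0, -1.0, 2.0]                 # a >=_{L3} 0, since 2 >= sqrt(1 + 1)
assert a[2] >= math.hypot(a[0], a[1])

lam = [1.0, 1.0, 0.1]                 # a coordinate-wise nonnegative weight vector
aggregated = sum(l * ai for l, ai in zip(lam, a))
assert abs(aggregated - (-1.8)) < 1e-12
assert aggregated < 0                 # the aggregated inequality "-1.8 >= 0" is false
```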
To answer the question (?) is the same as to say what are the weight vectors λ such that
∀a ≥K 0 : ⟨λ, a⟩ ≥ 0. (1.6.1)
Indeed, for every λ with this property the scalar inequality ⟨λ, a⟩ ≥ ⟨λ, b⟩ is a consequence of
the vector inequality a ≥K b:
a ≥K b
⇔ a − b ≥K 0 [additivity of ≥K]
⇒ ⟨λ, a − b⟩ ≥ 0 [by (1.6.1)]
⇔ ⟨λ, a⟩ ≥ ⟨λ, b⟩.
K∗ = {λ ∈ E : ⟨λ, a⟩ ≥ 0 ∀a ∈ K}.
The set K∗ is comprised of vectors whose inner products with all vectors from K are nonnegative.
K∗ is called the cone dual to K. The name is legitimate due to the following fact:
Theorem 1.6.1 [Properties of the dual cone] Let E be a finite-dimensional Euclidean space
with inner product ·, · and let K ⊂ E be a nonempty set. Then
(i) The set
K∗ = {λ ∈ E : ⟨λ, a⟩ ≥ 0 ∀a ∈ K}
is a closed convex cone.
(ii) If int K ≠ ∅, then K∗ is pointed.
(iii) If K is a closed convex pointed cone, then int K∗ ≠ ∅.
(iv) If K is a closed convex cone, then so is K∗ , and the cone dual to K∗ is K itself:
(K∗ )∗ = K.
Corollary 1.6.1 A set K ⊂ E is a closed convex pointed cone with a nonempty interior if and
only if the set K∗ is so.
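For instance, the ice-cream cone L3 is self-dual: (L3)∗ = L3. The sketch below probes this numerically on a small sample of vectors (the helper in_L3 is ours; sampling of course only illustrates, it does not prove, self-duality):

```python
import itertools, math

def in_L3(x, tol=1e-9):
    """Membership test for the 3D ice-cream cone."""
    return x[2] >= math.hypot(x[0], x[1]) - tol

# Boundary rays of L^3: for each (x, y), the point (x, y, |(x, y)|) lies in L^3.
sample = [(x, y, math.hypot(x, y)) for x in (-1.0, 0.0, 1.0) for y in (-1.0, 0.0, 1.0)]

# All pairwise inner products are nonnegative, illustrating L^3 ⊆ (L^3)*.
for a, lam in itertools.product(sample, repeat=2):
    assert sum(ai * li for ai, li in zip(a, lam)) >= -1e-9

# A vector outside L^3 also fails to lie in (L^3)*: some member of L^3
# has negative inner product with it (cf. Example 1.6.1).
lam = (1.0, 1.0, 0.1)
assert not in_L3(lam)
a = (-1.0, -1.0, math.sqrt(2.0))          # a lies on the boundary of L^3
assert in_L3(a) and sum(ai * li for ai, li in zip(a, lam)) < 0
```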
From the dual cone to the problem dual to (CP). Now we are ready to derive the dual
problem of a conic problem (CP). As in the case of Linear Programming, we start with the
observation that whenever x is a feasible solution to (CP) and λ is an admissible weight vector,
i.e., λ ∈ K∗, then x satisfies the scalar inequality
⟨λ, Ax⟩ ≥ ⟨λ, b⟩.
It follows that whenever a vector λ ∈ K∗ satisfies the relation
A∗λ = c,
one has
cT x = (A∗λ)T x = ⟨λ, Ax⟩ ≥ ⟨b, λ⟩
1) For a linear operator x ↦ Ax : Rn → E, A∗ is the conjugate operator given by the identity
⟨y, Ax⟩ = xT A∗y ∀(y ∈ E, x ∈ Rn).
When representing the operators by their matrices in orthogonal bases in the argument and the range spaces,
the matrix representing the conjugate operator is exactly the transpose of the matrix representing the operator
itself.
for all x feasible for (CP), so that the quantity ⟨b, λ⟩ is a lower bound on the optimal value of
(CP). The best bound one can get in this fashion is the optimal value in the problem
max { ⟨b, λ⟩ | A∗λ = c, λ ≥K∗ 0 } (D)
called the problem dual to (CP). Note that (D) is feasible only when
c ∈ Im A∗. (1.6.2)
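The footnote's identity defining the conjugate operator is easy to verify for a concrete matrix (the 3 × 2 matrix and the vectors below are arbitrary illustrations):

```python
# Identity defining the conjugate (adjoint) of x -> Ax under the standard
# inner products: <y, Ax> = <A^T y, x>.
A = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
x = [1.0, -1.0]
y = [2.0, 0.0, 1.0]

Ax = [sum(A[i][j] * x[j] for j in range(2)) for i in range(3)]
ATy = [sum(A[i][j] * y[i] for i in range(3)) for j in range(2)]

lhs = sum(y[i] * Ax[i] for i in range(3))       # <y, Ax>
rhs = sum(ATy[j] * x[j] for j in range(2))      # <A^T y, x> = x^T A^T y
assert abs(lhs - rhs) < 1e-12
```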
In the case of (1.6.2) the primal problem (CP) can be posed equivalently as the following problem:
min { ⟨d, y⟩ | y ∈ L, y ≥K 0 } ,
From the above observation we see that if (1.6.2) is not satisfied, then we may reject (CP) from
the very beginning. Thus, from now on we assume that (1.6.2) is satisfied. In fact in what
follows we make a bit stronger assumption:
A. The mapping A is of full column rank, i.e., it has trivial null space.
Assuming that the mapping x ↦ Ax has the trivial null space (“we have eliminated
from the very beginning the redundant degrees of freedom – those not affecting the
value of Ax”), the equation
A∗ d = q
is solvable for every right hand side vector q.
In view of A, problem (CP) can be reformulated as a problem (P) of minimizing a linear objective
⟨d, y⟩ over the intersection of an affine plane L and a cone K. Conversely, a problem (P) of this
latter type can be posed in the form of (CP) – to this end it suffices to represent the plane L as
the image of an affine mapping x ↦ Ax − b (i.e., to parameterize somehow the feasible plane)
and to “translate” the objective ⟨d, y⟩ to the space of x-variables – to set c = A∗d, which yields
y = Ax − b ⇒ ⟨d, y⟩ = cT x + const.
Thus, when dealing with a conic problem, we may pass from its “analytic form” (CP) to the
“geometric form” (P) and vice versa.
What are the relations between the “geometric data” of the primal and the dual problems?
We already know that the cone K∗ associated with the dual problem is the dual of the cone K
associated with the primal one. What about the feasible planes L and L∗ ? The answer is
simple: they are orthogonal to each other! More exactly, the affine plane L is the translation,
by vector −b, of the linear subspace
L = ImA ≡ {y = Ax | x ∈ Rn }.
And L∗ is the translation, by any solution λ0 of the system A∗ λ = c, e.g., by the solution d to
the system, of the linear subspace
L∗ = Null(A∗ ) ≡ {λ | A∗ λ = 0}.
A well-known fact of Linear Algebra is that the linear subspaces L and L∗ are orthogonal
complements of each other: L∗ = L⊥. Thus, the primal problem (CP) is, geometrically, the problem (P)
of minimizing a linear objective ⟨d, y⟩ over the intersection of a cone K with an affine
plane L = L − b given as a translation, by vector −b, of a linear subspace L.
2) Recall that we have restricted ourselves to the problems satisfying the assumption A.
The dual problem (D) is, geometrically, the problem of maximizing the linear objective ⟨b, λ⟩ over the intersection of the dual cone K∗
with an affine plane L∗ = L⊥ + d given as a translation, by the vector d, of the
orthogonal complement L⊥ of L.
What we get is an extremely transparent geometric description of the primal-dual pair of conic
problems (P), (D). Note that the duality is completely symmetric: the problem dual to (D) is
(P)! Indeed, we know from Theorem 1.6.1 that (K∗ )∗ = K, and of course (L⊥ )⊥ = L. Switch
from maximization to minimization corresponds to the fact that the “shifting vector” in (P) is
(−b), while the “shifting vector” in (D) is d. The geometry of the primal-dual pair (P), (D) is
illustrated by the picture below:
[Figure: the primal-dual pair of cones K, K∗ and feasible planes L, L∗, with the shift vector b.]
Finally, note that in the case when (CP) is an LP program (i.e., in the case when K is the
nonnegative orthant), the “conic dual” problem (D) is exactly the usual LP dual; this fact
immediately follows from the observation that the cone dual to Rm+ is Rm+ itself.
We have explored the geometry of a primal-dual pair of conic problems: the “geometric
data” of such a pair are given by a pair of dual to each other cones K, K∗ in E and a pair of
affine planes L = L − b, L∗ = L⊥ + d, where L is a linear subspace in E and L⊥ is its orthogonal
complement. The first problem from the pair – let it be called (P) – is to minimize ⟨d, y⟩ over
y ∈ K ∩ L, and the second (D) is to maximize ⟨b, λ⟩ over λ ∈ K∗ ∩ L∗. Note that the “geometric
data” (K, K∗ , L, L∗ ) of the pair do not specify completely the problems of the pair: given L, L∗ ,
we can uniquely define L, but not the shift vectors (−b) and d: b is known up to shift by a
vector from L, and d is known up to shift by a vector from L⊥ . However, this non-uniqueness
is of absolutely no importance: replacing a chosen vector d ∈ L∗ by another vector d′ ∈ L∗, we
pass from (P) to a new problem (P′) which is completely equivalent to (P): indeed, both (P)
and (P′) have the same feasible set, and on the (common) feasible plane L of the problems their
objectives differ by a constant:
y ∈ L = L − b, d − d′ ∈ L⊥ ⇒ ⟨d − d′, y + b⟩ = 0 ⇒ ⟨d − d′, y⟩ = −⟨d − d′, b⟩ ∀y ∈ L.
Similarly, shifting b along L, we do modify the objective in (D), but in a trivial way – on the
feasible plane L∗ of the problem the new objective differs from the old one by a constant.
Theorem 1.7.1 [Conic Duality Theorem] Consider a conic problem (CP) along with its dual (D). Then
1) The duality is symmetric: the dual problem is conic, and the problem dual to dual is the
primal.
2) The value of the dual objective at every dual feasible solution λ is ≤ the value of the primal
objective at every primal feasible solution x, so that the duality gap
cT x − ⟨b, λ⟩
is nonnegative at every primal-dual feasible pair (x, λ).
3.a) If the primal (CP) is bounded below and strictly feasible, then the dual (D) is solvable,
and the optimal values in (CP) and (D) are equal to each other.
3.b) If the dual (D) is bounded above and strictly feasible, then the primal (CP) is solvable,
and the optimal values are equal to each other.
4) Assume that at least one of the problems (CP), (D) is bounded and strictly feasible. Then
a primal-dual feasible pair (x, λ) is a pair of optimal solutions to the respective problems
4.a) if and only if
⟨b, λ⟩ = cT x [zero duality gap]
and
4.b) if and only if
⟨λ, Ax − b⟩ = 0 [complementary slackness]
Proof. 1): The result was already obtained when discussing the geometry of the primal and
the dual problems.
2): This is the Weak Duality Theorem.
3): Assume that (CP) is strictly feasible and bounded below, and let c∗ be the optimal value
of the problem. We should prove that the dual is solvable with the same optimal value. Since
we already know that the optimal value of the dual is ≤ c∗ (see 2)), all we need is to point out
a dual feasible solution λ∗ with bT λ∗ ≥ c∗ .
Consider the convex set
M = {y = Ax − b | x ∈ Rn , cT x ≤ c∗ }.
Let us start with the case of c ≠ 0. We claim that in this case
(i) The set M is nonempty;
(ii) the set M does not intersect the interior of the cone K: M ∩ int K = ∅.
(i) is evident (why?). To verify (ii), assume, on the contrary, that there exists a point x̄, cT x̄ ≤ c∗,
such that ȳ ≡ Ax̄ − b >K 0. Then, of course, Ax − b >K 0 for all x close enough to x̄, i.e., all
points x in a small enough neighbourhood of x̄ are also feasible for (CP). Since c ≠ 0, there are
points x in this neighbourhood with cT x < cT x̄ ≤ c∗, which is impossible, since c∗ is the optimal
value of (CP).
Now let us make use of the following basic fact:
Theorem 1.7.2 [Separation Theorem for Convex Sets] Let S, T be nonempty non-
intersecting convex subsets of a finite-dimensional Euclidean space E with inner prod-
uct ·, · . Then S and T can be separated by a linear functional: there exists a nonzero
vector λ ∈ E such that
sup_{u∈S} ⟨λ, u⟩ ≤ inf_{u∈T} ⟨λ, u⟩.
Applying the Separation Theorem to the nonempty non-intersecting convex sets S = M and
T = int K, we get a nonzero λ ∈ E such that
sup_{u∈M} ⟨λ, u⟩ ≤ inf_{u∈int K} ⟨λ, u⟩. (1.7.1)
From (1.7.1) it follows that the linear form ⟨λ, y⟩ of y is bounded below on int K.
Since the interior of K is a conic set:
y ∈ int K, µ > 0 ⇒ µy ∈ int K
(why?), this boundedness implies that ⟨λ, y⟩ ≥ 0 for all y ∈ int K. Consequently, ⟨λ, y⟩ ≥ 0 for all
y from the closure of int K, i.e., for all y ∈ K. We conclude that λ ≥K∗ 0, so that the inf in (1.7.1)
is nonnegative. On the other hand, the infimum of a linear form over a conic set clearly cannot
be positive; we conclude that the inf in (1.7.1) is 0, so that (1.7.1) reads
sup_{u∈M} ⟨λ, u⟩ ≤ 0, i.e.,
⟨λ, Ax − b⟩ ≤ 0 ∀x : cT x ≤ c∗. (1.7.2)
The latter inequality says that the linear form x ↦ ⟨A∗λ, x⟩ of x is bounded above on the
half-space {x : cT x ≤ c∗}, which (since c ≠ 0) is possible only when
A∗λ = µc
for some µ ≥ 0. We claim that µ > 0. Indeed, assuming µ = 0, we get A∗λ = 0, whence ⟨λ, b⟩ ≥ 0
in view of (1.7.2). It is time now to recall that (CP) is strictly feasible, i.e., Ax̄ − b >K 0 for
some x̄. Since λ ≥K∗ 0 and λ ≠ 0, the product ⟨λ, Ax̄ − b⟩ should be strictly positive (why?),
while in fact we know that the product is −⟨λ, b⟩ ≤ 0 (since A∗λ = 0 and, as we have seen,
⟨λ, b⟩ ≥ 0).
Thus, µ > 0. Setting λ∗ = µ−1λ, we get
λ∗ ≥K∗ 0 [since λ ≥K∗ 0 and µ > 0]
A∗λ∗ = c [since A∗λ = µc]
cT x ≤ ⟨λ∗, b⟩ ∀x : cT x ≤ c∗ [see (1.7.2)]
We see that λ∗ is feasible for (D), the value of the dual objective at λ∗ being at least c∗ , as
required.
It remains to consider the case c = 0. Here, of course, c∗ = 0, and the existence of a dual
feasible solution with the value of the objective ≥ c∗ = 0 is evident: the required solution is
λ = 0. 3.a) is proved.
3.b): the result follows from 3.a) in view of the primal-dual symmetry.
4): Let x be primal feasible, and λ be dual feasible. Then
cT x − ⟨b, λ⟩ = (A∗λ)T x − ⟨b, λ⟩ = ⟨Ax − b, λ⟩.
We get a useful identity as follows:
(!) For every primal-dual feasible pair (x, λ) the duality gap cT x − ⟨b, λ⟩ is equal to
the inner product of the primal slack vector y = Ax − b and the dual vector λ.
Note that (!) in fact does not require “full” primal-dual feasibility: x may be ar-
bitrary (i.e., y should belong to the primal feasible plane Im A − b), and λ should
belong to the dual feasible plane A∗λ = c, but y and λ need not belong
to the respective cones.
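This algebraic identity can be checked on arbitrary data, with λ and x deliberately chosen infeasible (a sketch; all numbers below are illustrative):

```python
# The identity c^T x - <b, lam> = <Ax - b, lam> needs only A* lam = c;
# neither x nor lam has to be feasible.
A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b = [0.5, -0.5, 1.0]
lam = [1.0, 2.0, -3.0]                                           # not in any cone
c = [sum(A[i][j] * lam[i] for i in range(3)) for j in range(2)]  # c = A* lam
x = [4.0, -7.0]                                                  # not feasible either

slack = [sum(A[i][j] * x[j] for j in range(2)) - b[i] for i in range(3)]
gap = sum(c[j] * x[j] for j in range(2)) - sum(b[i] * lam[i] for i in range(3))
assert abs(gap - sum(slack[i] * lam[i] for i in range(3))) < 1e-12
```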
In view of (!) the complementary slackness holds if and only if the duality gap is zero; thus, all
we need is to prove 4.a).
The “primal residual” cT x − c∗ and the “dual residual” b∗ − ⟨b, λ⟩ (c∗ and b∗ being the optimal
values in (CP) and (D), respectively) are nonnegative, provided that x is primal feasible, and λ
is dual feasible. It follows that the duality gap
cT x − ⟨b, λ⟩ = [cT x − c∗] + [b∗ − ⟨b, λ⟩] + [c∗ − b∗]
is nonnegative (recall that c∗ ≥ b∗ by 2)), and it is zero if and only if c∗ = b∗ and both primal
and dual residuals are zero (i.e., x is primal optimal, and λ is dual optimal). All these arguments
hold without any assumptions of strict feasibility. We see that the condition “the duality gap
at a primal-dual feasible pair is zero” is always sufficient for primal-dual optimality of the pair;
and if c∗ = b∗ , this sufficient condition is also necessary. Since in the case of 4) we indeed have
c∗ = b∗ (this is stated by 3)), 4.a) follows.
A useful consequence of the Conic Duality Theorem is the following
Corollary 1.7.1 Assume that both (CP) and (D) are strictly feasible. Then both problems are
solvable, the optimal values are equal to each other, and each one of the conditions 4.a), 4.b) is
necessary and sufficient for optimality of a primal-dual feasible pair.
Indeed, by the Weak Duality Theorem, if one of the problems is feasible, the other is bounded,
and it remains to use the items 3) and 4) of the Conic Duality Theorem.
Example 1.7.2 Consider the following conic problem with two variables x = (x1, x2)T and the
3-dimensional ice-cream cone K:
min { x2 | Ax − b = (x1, x2, x1)T ≥L3 0 } .
The feasible set is {x : x1 ≥ √(x1² + x2²)} = {x : x2 = 0, x1 ≥ 0}, so that the primal is
solvable with optimal value 0. In spite of the fact that the primal is solvable, the dual is infeasible:
indeed, assuming that λ is dual feasible, we have λ ≥L3 0, which means that λ3 ≥ √(λ1² + λ2²);
since also λ1 + λ3 = 0, we come to λ2 = 0, which contradicts the equality λ2 = 1.
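One can replay this example numerically: every x with x2 = 0, x1 ≥ 0 is primal feasible, while no λ with λ1 + λ3 = 0, λ2 = 1 lies in L3 (a sketch; the sample points are ours):

```python
import math

# Primal: (x1, x2, x1) in L^3 forces x1 >= sqrt(x1^2 + x2^2), i.e. x2 = 0, x1 >= 0,
# so every (x1, 0) with x1 >= 0 is feasible and the optimal value min x2 is 0.
for x1 in (0.0, 1.0, 5.0):
    assert x1 >= math.hypot(x1, 0.0)

# Dual feasibility would require lam in L^3 with lam1 + lam3 = 0 and lam2 = 1,
# i.e. lam3 >= sqrt(lam3^2 + 1) -- false for every value of lam3.
for lam3 in (0.0, 0.5, 1.0, 10.0, 1000.0):
    lam1, lam2 = -lam3, 1.0
    assert lam3 < math.sqrt(lam1 ** 2 + lam2 ** 2)
```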
We see that the weakness of the Conic Duality Theorem as compared to the LP Duality one
reflects pathologies which indeed may happen in the general conic case.
Ax − b ≥K 0. (I)
We say that (I) is almost solvable if, for every ε > 0, one can find b′ with ‖b − b′‖ ≤ ε such
that the perturbed system
Ax − b′ ≥K 0
is solvable. Along with (I), consider the system
λ ≥K∗ 0, A∗λ = 0, ⟨b, λ⟩ > 0. (II)
Then
(i) if (II) is solvable, then (I) is infeasible;
(ii) if (II) is not solvable, then (I) is almost solvable.
Moreover,
(iii) (II) is solvable if and only if (I) is not “almost solvable”.
Note the difference between the simple case when ≥K is the usual partial ordering ≥ and the
general case. In the former, one can replace in (ii) “almost solvable” by “solvable”; however, in
the general conic case “almost” is unavoidable.
Example 1.7.3 Let system (I) be given by
Ax − b ≡ (x + 1, x − 1, √2 x)T ≥L3 0.
Recalling the definition of the ice-cream cone L3, we can write the inequality equivalently as
√2 x ≥ √((x + 1)² + (x − 1)²) ≡ √(2x² + 2), (i)
which clearly has no solutions. The corresponding system (II) here reads
λ3 ≥ √(λ1² + λ2²), λ1 + λ2 + √2 λ3 = 0, λ2 − λ1 > 0. (ii)
From the second of these relations, λ3 = −(1/√2)(λ1 + λ2), so that the first inequality yields
(λ1 − λ2)² ≤ 0, whence λ1 = λ2. But then the third relation in (ii) is impossible! We see that
here both (i) and (ii) have no solutions.
The geometry of the example is as follows. (i) asks to find a point in the intersection of
the 3D ice-cream cone and a line. This line is an asymptote of the cone (it belongs to a 2D
plane which crosses the cone in such a way that the boundary of the cross-section is a branch of
a hyperbola, and the line is one of the two asymptotes of the hyperbola). Although the intersection
is empty ((i) is unsolvable), small shifts of the line make the intersection nonempty (i.e., (i) is
unsolvable and “almost solvable” at the same time). And it turns out that one cannot certify
the fact that (i) itself is unsolvable by providing a solution to (ii).
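The “unsolvable yet almost solvable” phenomenon is easy to see numerically: shifting the last coordinate of b by any ε > 0 makes (i) solvable. A sketch (the helper function and the explicit choice of witness x are ours):

```python
import math

def shifted_solvable(eps):
    """Is there an x with sqrt(2)*x + eps >= sqrt(2x^2 + 2)?

    For eps > 0, squaring (legitimate once sqrt(2)*x + eps >= 0) reduces this
    to 2*sqrt(2)*eps*x >= 2 - eps**2, so any large enough x works.
    """
    if eps <= 0:
        # sqrt(2)*x < sqrt(2x^2 + 2) for every x: the squares differ by 2
        return False
    x = (2 - eps ** 2) / (2 * math.sqrt(2) * eps) + 1.0   # explicit witness
    return math.sqrt(2) * x + eps >= math.sqrt(2 * x ** 2 + 2)

assert not shifted_solvable(0.0)          # (i) itself is unsolvable
for eps in (1.0, 0.1, 1e-3):              # ...but arbitrarily small shifts cure it
    assert shifted_solvable(eps)
```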
min_{x,t} { t | Ax + tσ − b ≥K 0 } (CP)
in variables (x, t). Clearly, the problem is strictly feasible (why?). Now, if (I) is not almost solvable, then,
first, the matrix of the problem [A; σ] satisfies the full column rank condition A (otherwise the image of
the mapping (x, t) ↦ Ax + tσ − b would coincide with the image of the mapping x ↦ Ax − b, which is
not the case – the first of these images does intersect K, while the second does not). Second, the optimal
value in (CP) is strictly positive (otherwise the problem would admit feasible solutions with t close to 0,
and this would mean that (I) is almost solvable). From the Conic Duality Theorem it follows that the
dual problem of (CP)
max_λ { ⟨b, λ⟩ | A∗λ = 0, ⟨σ, λ⟩ = 1, λ ≥K∗ 0 }
Ax ≥K b (V)
we want to check whether the scalar inequality
cT x ≥ d (S)
is a consequence of (V). If K is the nonnegative orthant, the
answer is given by the Inhomogeneous Farkas Lemma:
Proposition 1.7.2 (i) If (S) can be obtained from (V) and from the trivial inequality 1 ≥ 0 by
admissible aggregation, i.e., there exists a weight vector λ ≥K∗ 0 such that
A∗λ = c, ⟨λ, b⟩ ≥ d,
then (S) is a consequence of (V).
(ii) If (S) is a consequence of a strictly feasible vector inequality (V), then (S) can be obtained
from (V) and from the trivial inequality 1 ≥ 0 by admissible aggregation.
The difference between the case of the partial ordering ≥ and a general partial ordering ≥K is
in the word “strictly” in (ii).
Proof of the proposition. (i) is evident (why?). To prove (ii), assume that (V) is strictly feasible and
(S) is a consequence of (V) and consider the conic problem
min_{x,t} { t | Ā(x; t) − b̄ ≡ (Ax − b; d − cT x + t) ≥K̄ 0 } ,
where
K̄ = {(y, t) | y ∈ K, t ≥ 0}.
The problem is clearly strictly feasible (choose x to be a strictly feasible solution to (V) and then choose
t to be large enough). The fact that (S) is a consequence of (V) says exactly that the optimal value in
the problem is nonnegative. By the Conic Duality Theorem, the dual problem
max_{λ,µ} { ⟨b, λ⟩ − dµ | A∗λ − c = 0, µ = 1, (λ; µ) ≥K̄∗ 0 }
has a feasible solution with the value of the objective ≥ 0. Since, as is easily seen, K̄∗ = {(λ, µ) | λ ∈
K∗, µ ≥ 0}, the indicated solution satisfies the requirements
λ ≥K∗ 0, A∗λ = c, ⟨b, λ⟩ ≥ d,
as required in (ii).
“Robust solvability status”. Examples 1.7.2 – 1.7.3 make it clear that in the general conic
case we may meet “pathologies” which do not occur in LP. E.g., a feasible and bounded problem
may be unsolvable, the dual to a solvable conic problem may be infeasible, etc. Where do the
pathologies come from? Looking at our “pathological examples”, we arrive at the following
guess: the source of the pathologies is that in these examples, the “solvability status” of the
primal problem is non-robust – it can be changed by small perturbations of the data. This issue
of robustness is very important in modelling, and it deserves a careful investigation.
Data of a conic problem. When asked “What are the data of an LP program min{cT x |
Ax − b ≥ 0}”, everybody will give the same answer: “the objective c, the constraint matrix A
and the right hand side vector b”. Similarly, for a conic problem
min { cT x | Ax − b ≥K 0 } , (CP)
its data, by definition, is the triple (c, A, b), while the sizes of the problem – the dimension n
of x and the dimension m of K – along with the underlying cone K itself, are considered as the
structure of (CP).
Robustness. A question of primary importance is whether the properties of the program (CP)
(feasibility, solvability, etc.) are stable with respect to perturbations of the data. The reasons
which make this question important are as follows:
• In actual applications, especially those arising in Engineering, the data are normally inex-
act: their true values, even when they “exist in the nature”, are not known exactly when
the problem is processed. Consequently, the results of the processing say something definite
about the “true” problem only if these results are robust with respect to small data
perturbations, i.e., the properties of (CP) we have discovered are shared not only by the
particular (“nominal”) problem we were processing, but also by all problems with nearby
data.
• Even when the exact data are available, we should take into account that processing them
computationally we unavoidably add “noise” like rounding errors (you simply cannot load
something like 1/7 to the standard computer). As a result, a real-life computational routine
can recognize only those properties of the input problem which are stable with respect to
small perturbations of the data.
Due to the above reasons, we should study not only whether a given problem (CP) is feasi-
ble/bounded/solvable, etc., but also whether these properties are robust – remain unchanged
under small data perturbations. As it turns out, the Conic Duality Theorem allows to recognize
“robust feasibility/boundedness/solvability...”.
Let us start with introducing the relevant concepts. We say that (CP) is
• robust feasible, if all “sufficiently close” problems (i.e., those of the same structure
(n, m, K) and with data close enough to those of (CP)) are feasible;
• robust bounded below, if all sufficiently close problems are bounded below (i.e., their
objectives are bounded below on their feasible sets);
• robust solvable, if all sufficiently close problems are solvable;
• robust infeasible, if all sufficiently close problems are infeasible.
Note that a problem which is not robust feasible is not necessarily robust infeasible, since among
the close problems there may be both feasible and infeasible ones (look at Example 1.7.2 – slightly shifting
and rotating the plane Im A − b, we may get whatever we want – a feasible bounded problem,
a feasible unbounded problem, an infeasible problem...). This is why we need two kinds of
definitions: one of “robust presence of a property” and one more of “robust absence of the same
property”.
Now let us look what are necessary and sufficient conditions for the most important robust
forms of the “solvability status”.
Proposition 1.7.3 [Robust feasibility] (CP) is robust feasible if and only if it is strictly feasible,
in which case the dual problem (D) is robust bounded above.
Proof. The statement is nearly tautological. Let us fix δ >K 0. If (CP) is robust feasible, then for small
enough t > 0 the perturbed problem min{cT x | Ax − b − tδ ≥K 0} should be feasible; a feasible solution
to the perturbed problem clearly is a strictly feasible solution to (CP). The inverse implication is evident
(a strictly feasible solution to (CP) remains feasible for all problems with close enough data). It remains
to note that if all problems sufficiently close to (CP) are feasible, then their duals, by the Weak Duality
Theorem, are bounded above, so that (D) is robust bounded above.
Proposition 1.7.4 [Robust infeasibility] (CP) is robust infeasible if and only if the system
⟨b, λ⟩ = 1, A∗λ = 0, λ ≥K∗ 0
is robust feasible, or, which is the same (by Proposition 1.7.3), if and only if the system
⟨b, λ⟩ = 1, A∗λ = 0, λ >K∗ 0 (1.7.3)
has a solution.
Proof. First assume that (1.7.3) is solvable, and let us prove that all problems sufficiently close to (CP)
are infeasible. Let us fix a solution λ̄ to (1.7.3). Since A is of full column rank, simple Linear Algebra
says that the systems [A′]∗λ = 0 are solvable for all matrices A′ from a small enough neighbourhood U
of A; moreover, the corresponding solution λ(A′) can be chosen to satisfy λ(A) = λ̄ and to be continuous
in A′ ∈ U. Since λ(A′) is continuous and λ(A) = λ̄ >K∗ 0, we have λ(A′) >K∗ 0 in a neighbourhood of A;
shrinking U appropriately, we may assume that λ(A′) >K∗ 0 for all A′ ∈ U. Now, ⟨b, λ̄⟩ = 1; by continuity
reasons, there exist a neighbourhood V of b and a neighbourhood U′ ⊂ U of A such that for all b′ ∈ V and
all A′ ∈ U′ one has ⟨b′, λ(A′)⟩ > 0.
Thus, there exist a neighbourhood U′ of A and a neighbourhood V of b, along with
a function λ(A′), A′ ∈ U′, such that
λ(A′) >K∗ 0, [A′]∗λ(A′) = 0, ⟨b′, λ(A′)⟩ > 0
for all b′ ∈ V and A′ ∈ U′. By Proposition 1.7.1.(i) it means that all the problems
min { [c′]T x | A′x − b′ ≥K 0 }
with A′ ∈ U′, b′ ∈ V are infeasible, i.e., (CP) is robust infeasible.
Now assume that (CP) is robust infeasible, so that for some open neighbourhoods U of A, V of b every
system
A′x − b′ ≥K 0, A′ ∈ U, b′ ∈ V,
is infeasible and, the neighbourhoods being open, even
is not almost solvable (see Proposition 1.7.1). We conclude from Proposition 1.7.1.(ii) that for every
A′ ∈ U and b′ ∈ V there exists λ = λ(A′, b′) such that
λ(A′, b′) ≥K∗ 0, [A′]∗λ(A′, b′) = 0, ⟨b′, λ(A′, b′)⟩ > 0.
Now let us choose λ0 >K∗ 0. For all small enough positive ε we have Aε = A + εb[A∗λ0]T ∈ U. Let us
choose an ε with the latter property so small that ε⟨b, λ0⟩ > −1, and set A′ = Aε, b′ = b. According
to the previous observation, there exists λ = λ(A′, b) such that
λ ≥K∗ 0, [A′]∗λ = 0, ⟨b, λ⟩ > 0.
Setting λ̄ = λ + ε⟨b, λ⟩λ0, we get λ̄ >K∗ 0 (since λ ≥K∗ 0, λ0 >K∗ 0 and ⟨b, λ⟩ > 0), while A∗λ̄ = 0 and
⟨b, λ̄⟩ = ⟨b, λ⟩(1 + ε⟨b, λ0⟩) > 0. Multiplying λ̄ by an appropriate positive factor, we get a solution to (1.7.3).
Proposition 1.7.5 For a conic problem (CP) the following conditions are equivalent to each
other
(i) (CP) is robust feasible and robust bounded (below);
(ii) (CP) is robust solvable;
(iii) (D) is robust solvable;
(iv) (D) is robust feasible and robust bounded (above);
(v) Both (CP) and (D) are strictly feasible.
In particular, under every one of these equivalent assumptions, both (CP) and (D) are solv-
able with equal optimal values.
Proof. (i) ⇒ (v): If (CP) is robust feasible, it also is strictly feasible (Proposition 1.7.3). If, in addition,
(CP) is robust bounded below, then (D) is robust solvable (by the Conic Duality Theorem); in particular,
(D) is robust feasible and therefore strictly feasible (again Proposition 1.7.3).
(v) ⇒ (ii): The implication is given by the Conic Duality Theorem.
(ii) ⇒ (i): trivial.
We have proved that (i)≡(ii)≡(v). Due to the primal-dual symmetry, we also have proved that
(iii)≡(iv)≡(v).